Cutoff phenomenon in Markov chains

February 24, 2024

Introduction

A sequence of random variables $(X_0, X_1, \ldots)$ is said to be a Markov chain with state space $\mathcal{X}$ and transition matrix $P$ if for all $x, y \in \mathcal{X}$ and all $t \geq 1$, $$\mathbf{P}\left(X_{t+1}=y \mid X_t=x, X_{t-1}=x_{t-1}, \ldots, X_0=x_0\right) = P (x,y)$$ whenever the conditioning event has positive probability; that is, the probability of moving to the next state depends only on the present state and not on the previous states. Consider an irreducible, aperiodic Markov chain $X_t$ with transition matrix $P$ on a finite state space $\mathcal{X}$. A distribution $\pi$ satisfying $\pi P=\pi$ is said to be stationary for the Markov chain. Since the Markov chain is irreducible, the stationary distribution is unique ([8], Corollary 1.17).

To quantify the convergence of Markov chains, we need a way to measure distances between distributions. Let $\mu$ and $\nu$ be two probability distributions on the finite state space $\mathcal{X}$. The distance between them can be defined in many ways; one standard choice is the total variation distance, $$\left\vert \mu-\nu \right\vert_{\mathrm{TV}} :=\frac{1}{2}\sum_{x \in \mathcal{X}} |\mu (x)-\nu (x)|.$$
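For concreteness, the definition is immediate to compute numerically. Below is a minimal Python sketch; the helper name and the example distributions are our own.

```python
import numpy as np

def tv_distance(mu, nu):
    """Total variation distance between two distributions on a finite set."""
    return 0.5 * np.abs(mu - nu).sum()

# Example: uniform versus a biased distribution on four states.
mu = np.array([0.25, 0.25, 0.25, 0.25])
nu = np.array([0.40, 0.30, 0.20, 0.10])
print(tv_distance(mu, nu))  # 0.2
```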

This distance can be used to measure how far the distribution of the chain at time $t$ is from the stationary distribution, maximized over starting states: $$ d (t) := \max_{x \in \mathcal{X}} \left\vert P^t (x, \cdot)-\pi \right\vert_{\mathrm{TV}}, $$

where $P^t (x, \cdot)$ denotes the $x^{\mathrm{th}}$ row of $P^t$. By the convergence theorem ([8], Theorem 4.9), $d (t)\rightarrow 0$ as $t\rightarrow \infty$.

This motivates the definition of the $\varepsilon$-mixing time.

Definition

$$ t_{\mathrm{mix}}(\varepsilon):= \min \left\{t \in \mathbb{N} : d (t)<\varepsilon\right\}. $$ The mixing time is defined to be $t_{\mathrm{mix}}:= t_{\mathrm{mix}}\left(\frac{1}{4}\right)$.
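For a chain small enough that the full transition matrix fits in memory, both $d (t)$ and $t_{\mathrm{mix}} (\varepsilon)$ can be computed by brute force; a minimal sketch (our own helper names, using repeated matrix powers):

```python
import numpy as np

def d(P, pi, t):
    """d(t) = max_x |P^t(x, .) - pi|_TV over all starting states x."""
    Pt = np.linalg.matrix_power(P, t)
    return 0.5 * np.abs(Pt - pi).sum(axis=1).max()

def t_mix(P, pi, eps=0.25, t_max=10_000):
    """Smallest t with d(t) < eps; None if not reached within t_max steps."""
    for t in range(1, t_max + 1):
        if d(P, pi, t) < eps:
            return t
    return None

# Lazy random walk on the 3-cycle; its stationary distribution is uniform.
I = np.eye(3)
P = 0.5 * I + 0.25 * (np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1))
pi = np.ones(3) / 3
print(t_mix(P, pi))
```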

Although the mixing time is rarely possible to evaluate exactly for the chains of interest, we can still bound it. To obtain a lower bound, it is often useful to find a function $f$ for which the expectation of $f (X_t)$ differs markedly from the expectation of $f$ under $\pi$.

Proposition

Let $X_t$ be a Markov chain with transition matrix $P$, and let $f$ be a function on the finite state space $\mathcal{X}$. If $| \operatorname{E}_x f(X_t) - \operatorname{E}_\pi f| \geq r\sigma_*$ for some $r>0$, where $\sigma_*=\sqrt{\max\left\{ \operatorname{Var}_x [f (X_t)], \operatorname{Var}_\pi (f)\right\}}$, then $$ |P^t (x, \cdot)-\pi|_{\mathrm{TV}}\geq 1-\frac{8}{r^2}. $$

The mixing time can likewise be bounded from above by several methods, for instance by constructing suitable couplings or strong stationary times. Another method uses the spectral properties of the transition matrix. This technique works for \textit{reversible} chains, i.e. chains for which $P (x,y)\pi (x)=P (y,x)\pi (y)$ for all $x, y$. Note that the eigenvalues of a reversible Markov chain are always real.
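Reversibility can be checked numerically via detailed balance; a small sketch (our own helper, illustrated on a three-state birth and death chain):

```python
import numpy as np

def is_reversible(P, pi, tol=1e-12):
    """Detailed balance: pi(x) P(x, y) == pi(y) P(y, x) for all x, y."""
    F = pi[:, None] * P          # flow matrix F(x, y) = pi(x) P(x, y)
    return np.allclose(F, F.T, atol=tol)

P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])           # a birth and death chain on 3 states
pi = np.array([0.25, 0.5, 0.25])           # its stationary distribution
print(is_reversible(P, pi))                # True: birth and death chains are reversible
```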

Definition

Let $\lambda_*:=\max \left\{ |\lambda| : \lambda \text{ is an eigenvalue of } P,\ \lambda \neq 1\right\}$. The absolute spectral gap is defined as $\gamma_*:=1-\lambda_*$, and for reversible Markov chains the spectral gap is defined to be $\gamma:=1-\lambda_2$, where $\lambda_2$ is the second largest eigenvalue of $P$. The relaxation time is defined as $t_\mathrm{rel} := \frac{1}{\gamma_*}$.

To avoid periodicity we use \textit{lazy} chains, for which $P(x,x)\geq 1/2$ for all $x$. The lazy version of a Markov chain is defined to be the chain with transition matrix $P_L=(P+I)/2$. Since all eigenvalues of a lazy chain are nonnegative, it can be verified that $\gamma_*=\gamma$ for lazy chains.
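The following sketch (our own helpers) illustrates the laziness trick: the two-state flip chain is periodic with $\gamma_*=0$, while its lazy version has a nonnegative spectrum and $\gamma_*=\gamma$.

```python
import numpy as np

def lazy(P):
    """Lazy version P_L = (P + I)/2; each eigenvalue lam moves to (1 + lam)/2 >= 0."""
    return 0.5 * (P + np.eye(len(P)))

def gaps(P):
    """Return (gamma, gamma_*); the spectrum is real for reversible chains."""
    lam = np.sort(np.linalg.eigvals(P).real)[::-1]   # descending; lam[0] = 1
    return 1 - lam[1], 1 - np.abs(lam[1:]).max()

# The 2-state flip chain is periodic (eigenvalues 1 and -1, so gamma_* = 0);
# its lazy version has eigenvalues 1 and 0, giving gamma_* = gamma = 1.
P = np.array([[0.0, 1.0], [1.0, 0.0]])
print(gaps(P), gaps(lazy(P)))   # (2.0, 0.0) and (1.0, 1.0)
```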

Definition

Consider $n$ Markov chains with state spaces $\mathcal{X}_j$ and transition matrices $P_j$ for $j=1, \ldots, n$. A product chain is defined on the state space $\mathcal{X}_1\times\cdots\times\mathcal{X}_n$ with the transition matrix $$ P_{\mathrm{prod}} (\mathbf{x}, \mathbf{y}) = \sum_{j=1}^n w_j P_j (x_j, y_j)\prod_{i\neq j}1\left\{x_i=y_i\right\}, $$ where $ (w_1, \ldots, w_n)$ is a probability vector on $\left\{1, \ldots, n\right\}$, $\mathbf{x}= (x_1, \ldots, x_n)$, and $\mathbf{y}= (y_1, \ldots, y_n)$.

The eigenvalues of the product chain are convex combinations of the eigenvalues of the individual chains, with the $w_j$ as weights ([8], Lemma 12.12).
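A brute-force construction of $P_{\mathrm{prod}}$ makes both the definition and the eigenvalue lemma easy to check on small examples; a sketch (our own code, feasible only for small product spaces):

```python
import numpy as np
from itertools import product as cartesian

def product_chain(Ps, w):
    """P_prod on X_1 x ... x X_n: pick coordinate j with probability w[j]
    and move it according to P_j, leaving all other coordinates fixed."""
    sizes = [P.shape[0] for P in Ps]
    states = list(cartesian(*[range(s) for s in sizes]))
    idx = {x: i for i, x in enumerate(states)}
    Pprod = np.zeros((len(states), len(states)))
    for x in states:
        for j, Pj in enumerate(Ps):
            for yj in range(sizes[j]):
                y = x[:j] + (yj,) + x[j + 1:]
                Pprod[idx[x], idx[y]] += w[j] * Pj[x[j], yj]
    return Pprod

# Two copies of the two-state lazy chain with equal weights: this is the
# lazy hypercube walk on {0, 1}^2 discussed later.
P1 = np.full((2, 2), 0.5)                  # eigenvalues 0 and 1
lam = np.linalg.eigvals(product_chain([P1, P1], [0.5, 0.5])).real
print(np.sort(lam))   # [0. 0.5 0.5 1.]: the combinations (lam_1 + lam_2)/2
```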

A chain $X_t$ is called transitive if for each pair $x,y \in \mathcal{X}$ there exists a bijection $\phi_{xy}$ of $\mathcal{X}$ such that $\phi_{xy} (x)=y$ and $P (u,v)=P (\phi_{xy} (u),\phi_{xy} (v))$ for all $u,v \in \mathcal{X}$.

Proposition ([8], Lemma 12.18)

Assume that a reversible Markov chain is transitive, with eigenvalues $\lambda_1\geq\lambda_2\geq \cdots \geq \lambda_{|\mathcal{X}|}$. Then $$ 4|P^t (x, \cdot)-\pi|_{\mathrm{TV}}^2\leq \sum_{j=2}^{|\mathcal{X}|} \lambda_j^{2t}. $$

Hence the above proposition lets us upper bound the mixing time of random walks on Cayley graphs whenever the eigenvalues can be computed explicitly.
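For example, the lazy random walk on the $n$-cycle is a random walk on a Cayley graph of $\mathbb{Z}_n$, with explicit spectrum $\left\{\frac{1}{2}\left(1+\cos\frac{2\pi k}{n}\right) : k=0,\ldots,n-1\right\}$ ([8], Section 12.3). A sketch evaluating the resulting bound (our own code):

```python
import numpy as np

def transitive_bound(eigenvalues, t):
    """d(t) <= (1/2) sqrt(sum_{j>=2} lambda_j^{2t}) for a transitive reversible chain."""
    lam = np.sort(np.asarray(eigenvalues))[::-1][1:]   # drop the eigenvalue 1
    return 0.5 * np.sqrt((lam ** (2 * t)).sum())

# Lazy walk on the n-cycle: eigenvalues (1 + cos(2 pi k / n)) / 2, k = 0..n-1.
n = 10
lam = (1 + np.cos(2 * np.pi * np.arange(n) / n)) / 2
print(transitive_bound(lam, t=50))
```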

A natural question is how strongly the mixing time depends on $\varepsilon$: if we want to get within one-tenth of the stationary distribution, how long must we run, and if we then want to get within one-thousandth, how much longer? We say that there is cutoff if $t_\mathrm{mix} (\varepsilon)$ hardly depends on $\varepsilon$. The cutoff phenomenon was introduced in the literature in the 1980s; Aldous [1] and Diaconis and Shahshahani [5] initiated its systematic study.

Definition

Let us consider a family of Markov chains $\left\{X_t^{ (n)}\right\}_{n=1}^\infty$ with state spaces $\mathcal{X}^{ (n)}$. Such a family is said to exhibit a cutoff if for every $0<\varepsilon<1$, $$ \lim_{n\rightarrow \infty} \frac{t_\mathrm{mix}^{ (n)} (\varepsilon)}{t_\mathrm{mix}^{ (n)} (1-\varepsilon)} = 1. $$

A Markov chain is said to have a cutoff at $t_n$ with a cutoff window of size $\mathcal{O}(w_n)$ if $w_n=o (t_n)$ and $$ \lim_{\alpha \rightarrow -\infty}\liminf_{n \rightarrow \infty} d_n (t_n+\alpha w_n)=1, \qquad \lim_{\alpha \rightarrow \infty}\limsup_{n \rightarrow \infty} d_n (t_n+\alpha w_n)=0. $$ In general it is difficult to characterize this behaviour, and most properties are known only for specific examples, for instance the top-to-random shuffle [3] and the biased random walk on an interval [4]. However, there exist chains which do not exhibit this phenomenon, for example the random walk on a cycle. We will discuss the cutoff phenomenon in detail for one example: the random walk on the hypercube.

Cutoff for the random walk on the hypercube

Consider the lazy random walk $(X_t^n)$ on the $n$-dimensional hypercube graph $\left\{0,1\right\}^n$ with transition matrix $P$: at each step the walker chooses a coordinate uniformly at random and replaces its bit with an independent fair coin flip (equivalently, flips it with probability $1/2$). We will derive bounds on $d_n (t)$ to determine whether this family exhibits a cutoff (see [8] for detailed proofs).
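Before turning to the proofs, it is instructive to simulate the walk. The sketch below (our own code) estimates $d_n (t)$ by Monte Carlo, using the fact that, by symmetry, $P^t (\mathbf{1}, \cdot)$ is constant on Hamming-weight classes, so the total variation distance to $\pi$ equals the distance between the induced weight distributions; note that the empirical estimate of a total variation distance is biased upward when the number of runs is small.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)

def d_hypercube(n, t, runs=20_000):
    """Monte Carlo estimate of d_n(t) for the lazy walk started at all-ones.

    The Hamming weight under pi is Binomial(n, 1/2); under P^t(1, .) we
    estimate its distribution empirically and compare the two in TV."""
    counts = np.zeros(n + 1)
    for _ in range(runs):
        x = np.ones(n, dtype=int)
        coords = rng.integers(0, n, size=t)   # coordinate chosen at each step
        bits = rng.integers(0, 2, size=t)     # refreshed value of that bit
        x[coords] = bits                      # only the last refresh of a coordinate matters
        counts[x.sum()] += 1
    binom = np.array([comb(n, k) / 2**n for k in range(n + 1)])
    return 0.5 * np.abs(counts / runs - binom).sum()

n = 15
for t in (10, 20, 30, 40):
    print(t, d_hypercube(n, t))   # the drop happens near (1/2) n ln n, about 20
```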

Theorem

For a lazy random walk $(X_t^n)$ on the $n$-dimensional hypercube graph with transition matrix $P$, we have $$ d_n (t) \leq \frac{1}{2}\sqrt{\left(1+e^{-\frac{2t}{n}}\right)^n -1}. $$ Therefore $t_\mathrm{mix}^{ (n)} (\varepsilon)\leq \frac{1}{2}n \ln n - \frac{n}{2} \ln \ln \left(4\varepsilon^2+1\right). $

Proof

The Markov chain is a product chain of $n$ two-state lazy chains on the vertex set $\{0, 1\}$, with weights $w_i=\frac{1}{n}$ and transition matrices $P_i=\begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}$. The eigenvalues of each $P_i$ are $0$ and $1$, so the set of eigenvalues of $P$ is $$\left\{\frac{1}{n}\sum_{i=1}^n \lambda_i : \lambda_i\in\left\{0,1\right\}\right\};$$ that is, the eigenvalues are $k/n$ with multiplicity $\binom{n}{k}$ for $k=0,\ldots,n$. Note that the Markov chain is transitive, as it is a random walk on a group. Thus by the Proposition above, together with the inequality $1-\frac{k}{n}\leq e^{-\frac{k}{n}}$, $$ 4\max_{x \in \mathcal{X}} |P^t (x, \cdot)-\pi|_{\mathrm{TV}}^2 \leq \sum_{k=0}^{n-1}\binom{n}{k}\left(\frac{k}{n}\right)^{2t} \leq \left(1+e^{-\frac{2t}{n}}\right)^n -1. $$ The upper bound for the mixing time follows from this.
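Both sides of the last display are easy to evaluate numerically; a quick check (our own code):

```python
from math import comb, exp

def eigen_sum(n, t):
    """Exact sum over the nontrivial eigenvalues: sum_k C(n, k) (k/n)^{2t}, k < n."""
    return sum(comb(n, k) * (k / n) ** (2 * t) for k in range(n))

def closed_form(n, t):
    """The bound (1 + e^{-2t/n})^n - 1, obtained from 1 - j/n <= e^{-j/n}."""
    return (1 + exp(-2 * t / n)) ** n - 1

n, t = 15, 30
assert eigen_sum(n, t) <= closed_form(n, t)
# Both as bounds on d_n(t), i.e. after taking (1/2) sqrt(.):
print(0.5 * eigen_sum(n, t) ** 0.5, 0.5 * closed_form(n, t) ** 0.5)
```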

Theorem

For $X_t^n$ as above we have, $$ d_n (t)\geq 1-\frac{8}{n}e^\frac{2t}{n-1}. $$ Therefore $t_\mathrm{mix}^{ (n)} (\varepsilon)\geq \frac{1}{2} (n-1)\ln n + \frac{1}{2} (n-1)\ln \frac{1-\varepsilon}{8}. $

Proof

Assume that the random walk starts at $\mathbf{1}= (1, \ldots, 1)$. Define $\mathbf{W} (x):=|x|_1$, the Hamming weight of a vector. Let $R_t$ denote the number of coordinates not yet chosen by time $t$; note that $R_t$ is distributed as the number of uncollected coupons in the coupon collector's problem. If the chain starts from $\pi$, then $\mathbf{W}\sim \mathrm{Binomial} (n, \frac{1}{2})$. On the other hand, if it starts from $\mathbf{1}$, then $\mathbf{W} (X_t)\mid R_t\sim R_t + \mathrm{Binomial} (n-R_t, \frac{1}{2})$, which yields $\operatorname{E}\left[\mathbf{W} (X_t)\right]$ and $\operatorname{Var}\left[\mathbf{W} (X_t)\right]$. Hence $\sigma_{*}^2=\max \left\{ \operatorname{Var}[\mathbf{W} (X_t)], \operatorname{Var}_\pi[\mathbf{W}]\right\}=\frac{n}{4}$ and $$ |\operatorname{E}[\mathbf{W} (X_t)]-\operatorname{E}_\pi [\mathbf{W}]| \geq \sqrt{n}\,\sigma_*e^{-\frac{t}{n-1}}. $$ Hence by the Proposition, $$ d_n (t)\geq |P^t (\mathbf{1}, \cdot)-\pi|_{\mathrm{TV}} \geq 1-\frac{8}{n}e^\frac{2t}{n-1}, $$ and the theorem follows directly from this.

Theorem

A lazy random walk on the $n$-dimensional hypercube has a cutoff at $\frac{1}{2}n\ln n$ with a window of size $\mathcal{O} (n)$.

Proof

From the theorems giving the upper and lower bounds, $t_\mathrm{mix}^{ (n)} (\varepsilon) = \frac{1}{2}n \ln n\, (1+o (1))$ for every fixed $0<\varepsilon<1$, so $$ \lim_{n\rightarrow\infty}\frac{t_\mathrm{mix}^{ (n)} (\varepsilon)}{t_\mathrm{mix}^{ (n)} (1-\varepsilon)}=1. $$ Hence the family has a cutoff.

Moreover, $\frac{n}{2}\ln n - c_1 n\leq t_\mathrm{mix}^{ (n)}\leq \frac{n}{2}\ln n + c_2 n$ for some $c_1,c_2>0$, which clearly shows $n=o (t_\mathrm{mix}^{ (n)})$. Hence $$d_n (t_\mathrm{mix}^{ (n)}+\alpha n) \leq \frac{1}{2}\sqrt{e^{e^{-2 (\alpha-c_1)}}-1}, \qquad d_n (t_\mathrm{mix}^{ (n)}+\alpha n) \geq 1-8e^{\frac{1}{n-1}\ln n + \frac{2n (\alpha+c_2)}{n-1}}.$$ Taking $n\rightarrow\infty$ and then $\alpha\rightarrow\pm\infty$ in these two bounds verifies the claim.

The simulation results in \autoref{fig:hyp_cutoff} show how the cutoff phenomenon occurs in the random walk on the hypercube: the drop in the total variation distance becomes sharper as $n$ increases. \begin{figure} \centering \captionsetup{justification=raggedright} \includegraphics[width=\textwidth]{hyp_plot.pdf} \caption{The cutoff phenomenon in the random walk on the hypercube for $n=10$ and $n=15$.}

\end{figure}

Necessary condition for cutoff

In 2004, Peres [9] provided a necessary condition for a cutoff to occur, which he named the \textit{product condition}.

Theorem

Consider a family of reversible, irreducible and aperiodic Markov chains $ (X_t^{ (n)})$ exhibiting cutoff. Then $\lim_{n\rightarrow \infty}\frac{t_\mathrm{mix}^{ (n)}}{t_\mathrm{rel}^{ (n)}-1} = \infty$, i.e., $t_\mathrm{rel}^{ (n)}=o (t_\mathrm{mix}^{ (n)})$.

Proof

Assume that $ (t_\mathrm{rel}^{ (n)}-1)/t_\mathrm{mix}^{ (n)} \nrightarrow 0$. Then there exist $\delta>0$ and an infinite set $M\subset \mathbb{N}$ such that $ (t_\mathrm{rel}^{ (n)}-1)/t_\mathrm{mix}^{ (n)} \geq \delta$ for all $n \in M$. By the spectral lower bound on the mixing time ([8], Theorem 12.5), $$ \frac{t_\mathrm{mix}^{ (n)} (\varepsilon)}{t_\mathrm{mix}^{ (n)} (1-\varepsilon)}\geq\frac{t_\mathrm{mix}^{ (n)} (\varepsilon)}{t_\mathrm{mix}^{ (n)}}\geq \frac{t_{\mathrm{rel}}^{ (n)}-1}{t_\mathrm{mix}^{ (n)}}\ln \left(\frac{1}{2\varepsilon}\right) \geq \delta \ln \left(\frac{1}{2\varepsilon}\right) \quad \text{for all } n \in M. $$ Since $\ln (1/2\varepsilon) \rightarrow \infty$ as $\varepsilon \rightarrow 0$, this contradicts the assumption that the family has a cutoff.

Sufficient condition for cutoff

Peres conjectured that for many classes of chains, \textit{cutoff occurs if and only if $t_{\mathrm{rel}}^{ (n)}=o (t_\mathrm{mix}^{ (n)})$}. However, this is not true in general: there are two counterexamples, due to Aldous [2] and Pak ([8], Example 18.7).

Aldous’s Example

Consider the random walk on the graph shown in \autoref{fig:no-cutoff}. We give a heuristic argument for why this Markov chain fails to have a cutoff. As the figure suggests, most of the mass of the stationary distribution is concentrated near $y$, so the time taken to travel from $x$ to $y$ is a good estimate of the mixing time. It takes about $12n$ steps to reach the fork from $x$. From the fork, the walker takes the bottom path with probability about $1/2$, reaching $y$ after about $6n$ further steps; otherwise it takes the top path, reaching $y$ after about $12n$ further steps. Moreover, by the central limit theorem these times are within order $\sqrt{n}$ of their expectations with high probability. Thus $d_n (t)$ drops by about $1/2$ in a window of order $\sqrt{n}$ around time $18n$, and by another $1/2$ in a window of the same order around time $24n$. This shows that the chain does not have a cutoff.

We also ran a simulation of the random walk on this graph; \autoref{fig:no-cutoff-simulation} confirms that the chain fails to exhibit cutoff.

Birth and Death chains

The paper [6] proves that for birth and death chains the product condition is sufficient for cutoff. Let $X_t$ be a lazy irreducible birth and death chain; all the following theorems concern this family of chains.

Theorem

For all $0<\varepsilon<\frac{1}{2}$ there exists a constant $c_\varepsilon$ such that $$ t_\mathrm{mix} (\varepsilon)-t_\mathrm{mix} (1-\varepsilon)\leq c_\varepsilon \sqrt{t_\mathrm{mix} t_\mathrm{rel}}. $$

This shows that a family of lazy irreducible birth and death chains has cutoff with a window of size at most $\sqrt{t_\mathrm{mix}^{ (n)} t_{\mathrm{rel}}^{ (n)}}$ if and only if the product condition holds. The bound is tight: [6] discusses an example that attains it, as well as examples where the true window is strictly smaller than $\sqrt{t_\mathrm{mix} t_\mathrm{rel}}$.

It is enough to prove the theorem when $t_\mathrm{rel}$ is sufficiently small compared to $t_\mathrm{mix}$, in the following sense.

Theorem

If there exists $0<\varepsilon<\frac{1}{16}$ with $t_\mathrm{rel} < \varepsilon^5 t_\mathrm{mix}$, then $$ t_\mathrm{mix} (4\varepsilon)-t_\mathrm{mix} (1-2\varepsilon)\leq \frac{6}{\varepsilon} \sqrt{t_\mathrm{mix} t_\mathrm{rel}}. $$

The implication follows by considering separately the cases where $t_\mathrm{rel}/t_\mathrm{mix}$ is large or small, and noting that $t_\mathrm{mix} (\varepsilon)-t_\mathrm{mix} (1-\varepsilon)$ decreases as $\varepsilon$ increases.

Definition

The $\varepsilon$-quantile of the distribution $\pi$ for $0<\varepsilon<1$ is defined as, $$ Q (\varepsilon):=\min \left\{k: \sum_{j=0}^{k} \pi (j) \geq \varepsilon\right\}. $$
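Numerically, $Q (\varepsilon)$ is just a search over the cumulative sums of $\pi$; a minimal sketch with an arbitrary example distribution (our own):

```python
import numpy as np

def Q(pi, eps):
    """Q(eps) = min{k : pi(0) + ... + pi(k) >= eps} for pi on {0, ..., n}."""
    return int(np.searchsorted(np.cumsum(pi), eps))

pi = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
print(Q(pi, 0.25), Q(pi, 0.5), Q(pi, 0.75))   # 1 2 3
```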

We define the hitting time of a state $y$ to be the first time the chain visits $y$, and denote this random time by $\tau_y$.

Lemma

For any $\varepsilon \in (0, 1)$, any $k\in [n]$ and $t>0$ we have $$ |P^t (0, \cdot)-\pi|_{\mathrm{TV}} \leq \mathbf{P}_0 (\tau_{Q (1-\varepsilon)}>t)+\varepsilon, $$ $$ |P^t (k, \cdot)-\pi|_{\mathrm{TV}} \leq \mathbf{P}_k (\max\left\{\tau_{Q (\varepsilon)}, \tau_{Q (1-\varepsilon)}\right\}>t)+2\varepsilon. $$

This can be proved by introducing a \textit{no-crossing} coupling. Assume without loss of generality that $\operatorname{E}_0\tau_{Q (1-\varepsilon)}\geq\operatorname{E}_n\tau_{Q (\varepsilon)}$. With this assumption, the lemma and Markov's inequality give $$ t_\mathrm{mix}\leq 16 \operatorname{E}_0\tau_{Q (1-\varepsilon)}. $$

Lemma

Assume the state space is $\mathcal{X}=\left\{0, 1, \ldots ,d\right\}$, with $d$ an absorbing state. Then $ \operatorname{Var}_0\tau_d\leq t_\mathrm{rel} \operatorname{E}_0\tau_d$.

Proof

The proof relies on a special property of birth and death chains, noted in [7]: started from $0$, the hitting time $\tau_d$ is distributed as a sum of independent geometric random variables with parameters $1-\lambda_i$, where the $\lambda_i \neq 1$ are the eigenvalues of the transition matrix $P$. Hence $$ \operatorname{Var}_0\tau_d = \sum_i \frac{\lambda_i}{ (1-\lambda_i)^2} \leq t_\mathrm{rel} \sum_i \frac{1}{1-\lambda_i} = t_\mathrm{rel} \operatorname{E}_0\tau_d. $$
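This decomposition can be verified numerically on a small example. The sketch below (with our own choice of birth and death rates) compares $\sum_i 1/ (1-\lambda_i)$ against the expected absorption time obtained from the linear system $ (I-P_{\mathrm{sub}})h=\mathbf{1}$, where $P_{\mathrm{sub}}$ is $P$ with the absorbing state removed, and checks the variance bound.

```python
import numpy as np

# A lazy birth and death chain on {0, ..., d} with state d absorbing.
d = 6
P = np.zeros((d + 1, d + 1))
for i in range(d):
    P[i, i + 1] = 0.3                      # birth probability p_i
    if i > 0:
        P[i, i - 1] = 0.2                  # death probability q_i
    P[i, i] = 1 - P[i].sum()               # holding probability >= 1/2 (lazy)
P[d, d] = 1.0

sub = P[:d, :d]                            # P with the absorbing state removed
lam = np.linalg.eigvals(sub).real          # the eigenvalues lambda_i != 1

E_spec = (1 / (1 - lam)).sum()             # E_0 tau_d = sum_i 1/(1 - lambda_i)
V_spec = (lam / (1 - lam) ** 2).sum()      # Var_0 tau_d = sum_i lambda_i/(1 - lambda_i)^2

# Cross-check the expectation against the linear system (I - sub) h = 1.
h = np.linalg.solve(np.eye(d) - sub, np.ones(d))
print(E_spec, h[0])                        # the two values agree
print(V_spec <= E_spec / (1 - lam.max()))  # Var_0 tau_d <= t_rel * E_0 tau_d
```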

This subtle argument breaks down once one moves out of the realm of birth and death chains. Now, if $l=Q (1-\varepsilon)$ is turned into an absorbing state, then $ \operatorname{Var}_0\tau_l\leq \frac{\operatorname{E}_0\tau_l}{\gamma_l}$, where $\gamma_l$ is the spectral gap of the truncated chain. Using the Rayleigh-Ritz characterization it is easy to show that $\gamma_l \geq \varepsilon \gamma$. This in turn shows that for all $0<\varepsilon<1$, $$ \operatorname{Var}_0\tau_{Q (1-\varepsilon)} \leq \frac{t_\mathrm{rel} \operatorname{E}_0\tau_{Q (1-\varepsilon)}}{\varepsilon}. $$

Lemma

Suppose there exists $0< \varepsilon < \frac{1}{16}$ such that $t_\mathrm{rel}<\varepsilon^4 \operatorname{E}_0 \tau_{Q (1-\varepsilon)}$. Then $$ \operatorname{E}_{Q (\varepsilon)}\tau_{Q (1-\varepsilon)}\leq\frac{3}{2\varepsilon} \sqrt{t_\mathrm{rel}\operatorname{E}_0\tau_{Q (\frac{1}{2})}}. $$

Proof

Let $\nu$ be the probability distribution supported on $[Q (\varepsilon)]:=\left\{0,\ldots,Q (\varepsilon)\right\}$ defined by $\nu (k)=\frac{\pi (k)}{\pi ([Q (\varepsilon)])}\,1\left\{k\in[Q (\varepsilon)]\right\}$, and let $w$ be the vector with $w (k)=\frac{1}{\pi ([Q (\varepsilon)])}\,1\left\{k\in[Q (\varepsilon)]\right\}$. Then $$ |P^t (\nu, \cdot)-\pi|_{\mathrm{TV}} \leq \frac{\lambda_2^t}{2\sqrt{\varepsilon}}. $$ Now consider a time after which the total variation distance is at most $\varepsilon/2$; we choose $t_\varepsilon=\lceil \frac{3}{2}\ln (\frac{1}{\varepsilon})t_\mathrm{rel} \rceil$, so that $$ \mathbf{P}_\nu (\tau_{Q (1-\varepsilon)}\leq t_\varepsilon) \geq \frac{\varepsilon}{2}. $$ This probability is then bounded from above using Chebyshev's inequality, together with the fact that $\operatorname{Var}_{Q (\varepsilon)}\tau_{Q (1-\varepsilon)} \leq \operatorname{Var}_0\tau_{Q (1-\varepsilon)}$. Combining this with the variance bound above shows that $$ \operatorname{E}_{Q (\varepsilon)} \tau_{Q (1-\varepsilon)} \leq 2 \ln\left(\frac{1}{\varepsilon}\right)t_\mathrm{rel} + \frac{1}{\varepsilon}\sqrt{2t_\mathrm{rel} \operatorname{E}_0\tau_{Q (1-\varepsilon)}}. $$ Finally, the assumption $t_\mathrm{rel} < \varepsilon^4 \operatorname{E}_0\tau_{Q (1-\varepsilon)}$ completes the proof.

Proof of the theorem

Define $$ t^+ (\Delta) :=\operatorname{E}_0\tau_{Q\left(\frac{1}{2}\right)}+\Delta \sqrt{t_\mathrm{rel}\operatorname{E}_0\tau_{Q\left(\frac{1}{2}\right)}}, \qquad t^- (\Delta) :=\operatorname{E}_0\tau_{Q\left(\frac{1}{2}\right)}-\Delta \sqrt{t_\mathrm{rel}\operatorname{E}_0\tau_{Q\left(\frac{1}{2}\right)}}. $$ We now use the relation between $t_\mathrm{rel}$ and $t_\mathrm{mix}$ already derived: $$ t_\mathrm{rel}<\varepsilon^5 t_\mathrm{mix} < 16\varepsilon^5 \operatorname{E}_0\tau_{Q (1-\varepsilon)}< \varepsilon^4 \operatorname{E}_0\tau_{Q (1-\varepsilon)}. $$ As in the lower bound for the hitting time, we upper bound $|P^{t^-} (0, \cdot)-\pi|_{\mathrm{TV}}$ by a hitting time probability. To bound the term $\mathbf{P}_0 (\tau_{Q (1-\varepsilon)}\leq t^-)$ we use Chebyshev's inequality; by the Lemma and the bound on the variance, we can bound $\operatorname{E}_0\tau_{Q (1-\varepsilon)}$ and $ \operatorname{Var}_0\tau_{Q (1-\varepsilon)}$. Taking $\Delta=\frac{2}{\varepsilon}$ yields the lower bound $t_\mathrm{mix} (1-2\varepsilon)\geq t^- (\frac{2}{\varepsilon})$, as required for the statement.

For the upper bound we use the no-crossing coupling to get $$ |P^{t^+} (k, \cdot)-\pi|_{\mathrm{TV}}\leq 2\varepsilon + \mathbf{P}_0 (\tau_{Q (1-\varepsilon)}>t^+)+\mathbf{P}_n (\tau_{Q (\varepsilon)}>t^+), $$ and again apply Chebyshev's inequality. First we bound the respective expectations and variances: $$ \operatorname{E}_{n} \tau_{Q (\varepsilon)} \leq \operatorname{E}_0 \tau_{Q (1-\varepsilon)} \leq \operatorname{E}_0 \tau_{Q\left(\frac{1}{2}\right)}+\frac{3}{2 \varepsilon} \sqrt{t_\mathrm{rel} \operatorname{E}_0 \tau_{Q\left(\frac{1}{2}\right)}}, $$ $$ \operatorname{Var}_0 \tau_{Q (1-\varepsilon)} \leq \frac{t_\mathrm{rel}}{\varepsilon} \operatorname{E}_0 \tau_{Q (1-\varepsilon)} \leq \frac{t_\mathrm{rel}}{\varepsilon\left(1-\frac{3 \varepsilon}{2}\right)} \operatorname{E}_0 \tau_{Q\left(\frac{1}{2}\right)}. $$ Following the same idea as the variance bound, we conclude that $$ \operatorname{Var}_n \tau_{Q (\varepsilon)} \leq \frac{t_\mathrm{rel}}{\varepsilon} \operatorname{E}_n \tau_{Q (\varepsilon)} \leq \frac{2 t_\mathrm{rel}}{\varepsilon} \operatorname{E}_0 \tau_{Q\left(\frac{1}{2}\right)}. $$ With the appropriate choice of $\Delta$ this yields $t_\mathrm{mix} (4 \varepsilon) \leq t^{+}\left(\frac{3}{\varepsilon}\right)$, and rewriting this in terms of $t_\mathrm{mix}$ gives the final statement.

Acknowledgments

I would like to acknowledge the immense help and support I received from Dr. Anirban Basak and Dr. Riddhipratim Basu. I am grateful to the ICTS-S. N. Bhatt Program 2022 for providing me with the opportunity to work with them. Last but not least, I am thankful to everyone who has supported me throughout my mathematical journey.

References

[1] D. Aldous. Random walks on finite groups and rapidly mixing Markov chains. In Séminaire de Probabilités XVII 1981/82, pages 243–297. Springer, 1983.

[2] D. Aldous. American Institute of Mathematics (AIM) research workshop "Sharp Thresholds for Mixing Times" (Palo Alto), 2004.

[3] P. Diaconis. Applications of non-commutative Fourier analysis to probability problems. In École d'Été de Probabilités de Saint-Flour XV–XVII, 1985–87, pages 51–100. Springer, 1988.

[4] P. Diaconis and J. A. Fill. Strong stationary times via a new form of duality. The Annals of Probability, 18(4):1483–1522, 1990.

[5] P. Diaconis and M. Shahshahani. Generating a random permutation with random transpositions. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 57(2):159–179, 1981.

[6] J. Ding, E. Lubetzky, and Y. Peres. Total variation cutoff in birth-and-death chains. Probability Theory and Related Fields, 146(1-2), 2008.

[7] S. Karlin and J. McGregor. Coincidence properties of birth and death processes. Pacific Journal of Mathematics, 9(4):1109–1140, 1959.

[8] D. A. Levin and Y. Peres. Markov Chains and Mixing Times, volume 107. American Mathematical Society, 2017.

[9] Y. Peres. American Institute of Mathematics (AIM) research workshop "Sharp Thresholds for Mixing Times" (Palo Alto), 2004.