★ See also the **PDF version of this chapter** (better formatting/references) ★

See any bugs/typos/confusing explanations? Open a GitHub issue. You can also comment below

- See examples of randomized algorithms
- Get more comfort with analyzing probabilistic processes and tail
bounds
- Success amplification using tail bounds

“in 1946 .. (I asked myself) what are the chances that a Canfield solitaire laid out with 52 cards will come out successfully? After spending a lot of time trying to estimate them by pure combinatorial calculations, I wondered whether a more practical method … might not be to lay it our say one hundred times and simple observe and count”, Stanislaw Ulam, 1983

“The salient features of our method are that it is probabilistic … and with a controllable miniscule probability of error.”, Michael Rabin, 1977

In early computer systems, much effort was taken to drive *out*
randomness and noise. Hardware components were prone to
non-deterministic behavior from a number of causes, whether it was
vacuum tubes overheating or actual physical bugs causing short circuits
(see bugfig). This motivated John von Neumann, one of the
early computing pioneers, to write a paper on how to *error correct*
computation, introducing the notion of *redundancy*.

So it is quite surprising that randomness turned out not just a
hindrance but also a *resource* for computation, enabling to achieve
tasks much more efficiently than previously known. One of the first
applications involved the very same John von Neumann. While he was sick
in bed and playing cards, Stan Ulam came up with the observation that
calculating statistics of a system could be done much faster by running
several randomized simulations. He mentioned this idea to von Neumann,
who became very excited about it; indeed, it turned out to be crucial
for the neutron transport calculations that were needed for development
of the Atom bomb and later on the hydrogen bomb. Because this project
was highly classified, Ulam, von Neumann and their collaborators came up
with the codeword “Monte Carlo” for this approach (based on the famous
casinos where Ulam’s uncle gambled). The name stuck, and randomized
algorithms are known as Monte Carlo algorithms to this day.

In this chapter, we will see some examples of randomized algorithms that use randomness to compute a quantity in a faster or simpler way than was known otherwise. We will describe the algorithms in an informal / “pseudo-code” way, rather than as NAND or NAND++ programs. In chapmodelrand we will discuss how to augment the NAND and NAND++ models to incorporate the ability to “toss coins”.

We start with the following example. Recall the *maximum cut problem* of
finding, given a graph \(G=(V,E)\), the cut that maximizes the number of
edges. This problem is \(\mathbf{NP}\)-hard, which means that we do not
know of any efficient algorithm that can solve it, but randomization
enables a simple algorithm that can cut at least half of the edges:

There is an efficient probabilistic algorithm that on input an \(n\)-vertex \(m\)-edge graph \(G\), outputs a cut \((S,\overline{S})\) that cuts at least \(m/2\) of the edges of \(G\) in expectation.

We simply choose a *random cut*: we choose a subset \(S\) of vertices by
choosing every vertex \(v\) to be a member of \(S\) with probability \(1/2\)
independently. It’s not hard to see that each edge is cut with
probability \(1/2\) and so the expected number of cut edges is \(m/2\).

The algorithm is extremely simple:

Algorithm Random Cut:

Input:Graph \(G=(V,E)\) with \(n\) vertices and \(m\) edges. Denote \(V = \{ v_0,v_1,\ldots, v_{n-1}\}\).

Operation:

- Pick \(x\) uniformly at random in \(\{0,1\}^n\).
- Let \(S \subseteq V\) be the set \(\{ v_i \;:\; x_i = 1 \;,\; i\in [n] \}\) that includes all vertices corresponding to coordinates of \(x\) where \(x_i=1\).
- Output the cut \((S,\overline{S})\).

We claim that the expected number of edges cut by the algorithm is \(m/2\). Indeed, for every edge \(e \in E\), let \(X_e\) be the random variable such that \(X_e(x)=1\) if the edge \(e\) is cut by \(x\), and \(X_e(x)=0\) otherwise. For every such edge \(e =\{ i,j \}\), \(X_e(x)=1\) if and only if \(x_i \neq x_j\). Since the pair \((x_i,x_j)\) obtains each of the values \(00,01,10,11\) with probability \(1/4\), the probability that \(x_i \neq x_j\) is \(1/2\) and hence \(\E[X_e]=1/2\). If we let \(X\) be the random variable corresponding to the total number of edges cut by \(S\), then \(X = \sum_{e\in E} X_e\) and hence by linearity of expectation

\[\E[X] = \sum_{e\in E} \E[X_e] = m(1/2) = m/2 \;.\]

maxcutthm gives us an algorithm that cuts \(m/2\) edges in
*expectation*. But, as we saw before, expectation does not immediately
imply concentration, and so a priori, it may be the case that when we
run the algorithm, most of the time we don’t get a cut matching the
expectation. Luckily, we can *amplify* the probability of success by
repeating the process several times and outputting the best cut we find.
We start by arguing that the probability the algorithm above succeeds in
cutting at least \(m/2\) edges is not *too* tiny.

The probability that a random cut in an \(m\) edge graph cuts at least \(m/2\) edges is at least \(1/(2m)\).

To see the idea behind the proof, think of the case that \(m=1000\). In this case one can show that we will cut at least \(500\) edges with probability at least \(0.001\) (and so in particular larger than \(1/(2m)=1/2000\)). Specifically, if we assume otherwise, then this means that with probability more than \(0.999\) the algorithm cuts \(499\) or fewer edges. But since we can never cut more than the total of \(1000\) edges, given this assumption, the highest value the expected number of edges cut is if we cut exactly \(499\) edges with probability \(0.999\) and cut \(1000\) edges with probability \(0.001\). Yet even in this case the expected number of edges will be \(0.999 \cdot 499 + 0.001 \cdot 1000 < 500\), which contradicts the fact that we’ve calculated the expectation to be at least \(500\) in maxcutthm.

Let \(p\) be the probability that we cut at least \(m/2\) edges and suppose, towards a contradiction, that \(p<1/(2m)\). Since the number of edges cut is an integer, and \(m/2\) is a multiple of \(0.5\), by definition of \(p\), with probability \(1-p\) we cut at most \(m/2 - 0.5\) edges. Moreover, since we can never cut more than \(m\) edges, under our assumption that \(p<m/2\), we can bound the expected number of edges cut by

\[ pm + (1-p)(m/2-0.5) \leq pm + m/2-0.5 \] But if \(p<1/(2m)\) then \(pm<0.5\) and so the righthand side is smaller than \(m/2\), which contradicts the fact that (as proven in maxcutthm) the expected number of edges cut is at least \(m/2\).

**Success amplification.** cutprob shows that our algorithm
succeeds at least *some* of the time, but we’d like to succeed almost
*all* of the time. The approach to do that is to simply *repeat* our
algorithm many times, with fresh randomness each time, and output the
best cut we get in one of these repetitions. It turns out that with
extremely high probability we will get a cut of size at least \(m/2\). For
example, if we repeat this experiment \(2000m\) times, then (using the
inequality \((1-1/k)^k \leq 1/e \leq 1/2\)) we can show that the
probability that we will never cut at least \(m/2\) edges is at most

\[ (1-1/(2m))^{2000 m} \leq 2^{-1000} \;. \]

More generally, the same calculations can be used to show the following lemma:

There is a algorithm that on input a graph \(G=(V,E)\) and a number \(k\), runs in time polynomial in \(|V|\) and \(k\) and outputs a cut \((S,\overline{S})\) such that \[ \Pr[ \text{number of edges cut by $(S,\overline{S})$ } \geq |E|/2 ] \geq 1- 2^{-k} \;. \]

The algorithm will work as follows:

Algorithm AMPLIFY RANDOM CUT:

Input:Graph \(G=(V,E)\) with \(n\) vertices and \(m\) edges. Denote \(V = \{ v_0,v_1,\ldots, v_{n-1}\}\). Number \(k>0\).

Operation:

Repeat the following \(200km\) times:

- Pick \(x\) uniformly at random in \(\{0,1\}^n\).
- Let \(S \subseteq V\) be the set \(\{ v_i \;:\; x_i = 1 \;,\; i\in [n] \}\) that includes all vertices corresponding to coordinates of \(x\) where \(x_i=1\).
- If \((S,\overline{S})\) cuts at least \(m/2\) then halt and output \((S,\overline{S})\).
- Output “failed”

We leave completing the analysis as an exercise to the reader (see cutalgorithmamplificationlemex).

The analysis above relied on the fact that the maximum cut has *one
sided error*. By this we mean that if we get a cut of size at least
\(m/2\) then we know we have succeeded. This is common for randomized
algorithms, but is not the only case. In particular, consider the task
of computing some Boolean function \(F:\{0,1\}^* \rightarrow \{0,1\}\). A
randomized algorithm \(A\) for computing \(F\), given input \(x\), might toss
coins and succeed in outputting \(F(x)\) with probability, say, \(0.9\). We
say that \(A\) has *two sided errors* if there is positive probability
that \(A(x)\) outputs \(1\) when \(F(x)=0\), and positive probability that
\(A(x)\) outputs \(0\) when \(F(x)=1\). In such a case, to simplify \(A\)’s
success, we cannot simply repeat it \(k\) times and output \(1\) if a single
one of those repetitions resulted in \(1\), nor can we output \(0\) if a
single one of the repetitions resulted in \(0\). But we can output the
*majority value* of these repetitions. By the Chernoff bound
(chernoffthm), with probability *exponentially close* to \(1\)
(i.e., \(1-2^{\Omega(k)}\)), the fraction of the repetitions where \(A\)
will output \(F(x)\) will be at least, say \(0.89\), and in such cases we
will of course output the correct answer.

The above translates into the following theorem

If \(F:\{0,1\}^* \rightarrow \{0,1\}\) is a function such that there is a polynomial-time algorithm \(A\) satisfying \[ \Pr[A(x) = F(x)] \geq 0.51 \] for every \(x\in \{0,1\}^*\), then there is a polynomial time algorithm \(B\) satisfying \[ \Pr[ B(x) = F(x) ] \geq 1 - 2^{-|x|} \] for every \(x\in \{0,1\}^*\).

We omit the proof of amplifyalg, since we will prove a more general result later on in amplificationthm.

We have shown a probabilistic algorithm that on any \(m\) edge graph \(G\), will output a cut of at least \(m/2\) edges with probability at least \(1-2^{-1000}\). Does it mean that we can consider this problem as “easy”? Should we be somewhat wary of using a probabilistic algorithm, since it can sometimes fail?

First of all, it is important to emphasize that this is still a *worst
case* guarantee. That is, we are not assuming anything about the *input
graph*: the probability is only due to the *internal randomness of the
algorithm*. While a probabilistic algorithm might not seem as nice as a
deterministic algorithm that is *guaranteed* to give an output, to get a
sense of what a failure probability of \(2^{-1000}\) means, note that:

- The chance of winning the Massachusetts Mega Million lottery is one over \((75)^5\cdot 15\) which is roughly \(2^{-35}\). So \(2^{-1000}\) corresponds to winning the lottery about \(300\) times in a row, at which point you might not care so much about your algorithm failing.
- The chance for a U.S. resident to be struck by lightning is about \(1/700000\), which corresponds to about \(2^{-45}\) chance that you’ll be struck by lightning the very second that you’re reading this sentence (after which again you might not care so much about the algorithm’s performance).
- Since the earth is about 5 billion years old, we can estimate the chance that an asteroid of the magnitude that caused the dinosaurs’ extinction will hit us this very second to be about \(2^{-60}\). It is quite likely that even a deterministic algorithm will fail if this happens.

So, in practical terms, a probabilistic algorithm is just as good as a
deterministic one. But it is still a theoretically fascinating question
whether randomized algorithms actually yield more power, or whether is
it the case that for any computational problem that can be solved by
probabilistic algorithm, there is a deterministic algorithm with nearly
the same performance.

The 3SAT problem is \(\mathbf{NP}\) hard, and so it is unlikely that it has a polynomial (or even subexponential) time algorithm. But this does not mean that we can’t do at least somewhat better than the trivial \(2^n\) algorithm for \(n\)-variable 3SAT. The best known worst-case algorithms for 3SAT are randomized, and are related to the following simple algorithm, variants of which are also used in practice:

Algorithm WalkSAT:

Input:An \(n\) variable 3CNF formula \(\varphi\).

Parameters:\(T,S \in \N\)

Operation:

Repeat the following \(T\) steps:

Choose a random assignment \(x\in \{0,1\}^n\) and repeat the following for \(S\) steps:

- If \(x\) satisfies \(\varphi\) then output \(x\).
- Otherwise, choose a random clause \((\ell_i \vee \ell_j \vee \ell_k)\) that \(x\) does not satisfy, choose a random literal in \(\ell_i,\ell_j,\ell_k\) and modify \(x\) to satisfy this literal.
- If all the \(T\cdot S\) repetitions above did not result in a satisfying assignment then output
`Unsatisfiable`

The running time of this algorithm is \(S\cdot T \cdot poly(n)\), and so
the key question is how small we can make \(S\) and \(T\) so that the
probability that WalkSAT outputs `Unsatisfiable`

on a satisfiable
formula \(\varphi\) is small. It is known that we can do so with
\(ST = \tilde{O}((4/3)^n) = \tilde{O}(1.333\ldots^n)\) (see
walksatex for a weaker result), but we’ll show below a simpler
analysis yielding \(ST= \tilde{O}(\sqrt{3}^n) = \tilde{O}(1.74^n)\), which
is still much better than the trivial \(2^n\) bound.

If we set \(T=100\cdot \sqrt{3}^{n}\) and \(S= n/2\), then the probability
we output `Unsatisfiable`

for a satisfiable \(\varphi\) is at most \(1/2\).

Suppose that \(\varphi\) is a satisfiable formula and let \(x^*\) be a satisfying assignment for it. For every \(x\in \{0,1\}^n\), denote by \(\Delta(x,x^*)\) the number of coordinates that differ between \(x\) and \(x^*\). The heart of the proof is the following claim:

**Claim I:** For every \(x,x^*\) as above, in every local improvement
step, the value \(\Delta(x,x^*)\) is decreased by one with probability at
least \(1/3\).

**Proof of Claim I:** Since \(x^*\) is a *satisfying* assignment, if \(C\)
is a clause that \(x\) does *not* satisfy, then at least one of the
variables involve in \(C\) must get different values in \(x\) and \(x^*\).
Thus when we change \(x\) by one of the three literals in the clause, we
have probability at least \(1/3\) of decreasing the distance.

The second claim is that our starting point is not that bad:

**Claim 2:** With probability at least \(1/2\) over a random
\(x\in \{0,1\}^n\), \(\Delta(x,x^*) \leq n/2\).

**Proof of Claim II:** Consider the map
\(FLIP:\{0,1\}^n \rightarrow \{0,1\}^n\) that simply “flips” all the bits
of its input from \(0\) to \(1\) and vice versa. That is,
\(FLIP(x_0,\ldots,x_{n-1}) = (1-x_0,\ldots,1-x_{n-1})\). Clearly \(FLIP\) is
one to one. Moreover, if \(x\) is of distance \(k\) to \(x^*\), then \(FLIP(x)\)
is distance \(n-k\) to \(x^*\). Now let \(B\) be the “bad event” in which \(x\)
is of distance \(>n/2\) from \(x^*\). Then the set
\(A = FLIP(B) = \{ FLIP(x) \;:\; x\in \{0,1\}^n \}\) satisfies \(|A|=|B|\)
and that if \(x\in A\) then \(x\) is of distance \(<n/2\) from \(x^*\). Since
\(A\) and \(B\) are disjoint events, \(\Pr[A] + \Pr[B] \leq 1\). Since they
have the same cardinality, they have the same probability and so we get
that \(2\Pr[B] \leq 1\) or \(\Pr[B] \leq 1/2\). (See also
flipaanalysisfig).

Claims I and II imply that each of the \(T\) iterations of the outer loop succeeds with probability at least \(0.5 \cdot \sqrt{3}^{-n/2}\). Indeed, by Claim II, the original guess \(x\) will satisfy \(\Delta(x,x^*) \leq n/2\), and by Claim I, even conditioned on all the history so far, for each of the \(S=n/2\) steps we have probability \(\geq 1/3\) of being “lucky” and decreasing the distance at one. The chance we will be lucky in all \(n/2\) steps is hence at least \((1/3)^{n/2} = \sqrt{3}^{n/2}\).

Since any single iteration of the outer loop succeeds with probability at least \(\tfrac{1}{2} \cdot \sqrt{3}^{-n}\), the probability that we never do so in \(T=100 \sqrt{3}^{n}\) repetitions is at most \((1-\tfrac{1}{2\sqrt{3}^{n}})^{100\cdot \sqrt{3}^n} \leq (1/e)^{50}\).

The *matching* problem is one of the canonical optimization problems,
arising in all kinds of applications: matching residents with hospitals,
kidney donors with patients, flights with crews, and many others. One
prototypical variant is *bipartite perfect matching*. In this problem,
we are given a bipartite graph \(G = (L\cup R,E)\) which has \(2n\) vertices
partitioned into \(n\)-sized sets \(L\) and \(R\), where all edges have one
endpoint in \(L\) and the other in \(R\). The goal is to determine whether
there is a *perfect matching*, a subset \(M \subseteq E\) of \(n\) disjoint
edges. That is, \(M\) matches every vertex in \(L\) to a unique vertex in
\(R\).

The bipartite matching problem turns out to have a polynomial-time algorithm, since we can reduce finding a matching in \(G\) to finding a maximum flow (or equivalently, minimum cut) in a related graph \(G'\) (see matchingfig). However, we will see a different probabilistic algorithm to determine whether a graph contains such a matching.

Let us label \(G\)’s vertices as \(L = \{ \ell_0,\ldots,\ell_{n-1} \}\) and
\(R = \{ r_0, \ldots, r_{n-1} \}\). A matching \(M\) corresponds to a
*permutation* \(\pi \in S_n\) (i.e., one-to-one and onto function
\(\pi: [n] \rightarrow [n]\)) where for every \(i\in [n]\), we define
\(\pi(i)\) to be the unique \(j\) such that \(M\) contains the edge
\(\{ \ell_i ,r_j \}\). Define an \(n\times n\) matrix \(A=A(G)\) where
\(A_{i,j}=1\) if and only if the edge \(\{\ell_i,r_j\}\) is present and
\(A_{i,j}=0\) otherwise. The correspondence between matchings and
permutations implies the following claim:

Define \(P=P(G)\) to be the polynomial mapping \(\R^{n^2}\) to \(\R\) where \[
P(x_{0,0},\ldots,x_{n-1,n-1}) = \sum_{\pi \in S_n} \left( \prod_{i=0}^{n-1} sign(\pi)A_{i,\pi(i)} \right) \prod_{i=0}^{n-1} x_{i,\pi(i)} \label{matchpolyeq}
\] Then \(G\) has a perfect matching if and only if \(P\) is not identically
zero. That is, \(G\) has a perfect matching if and only if there exists
some assignment \(x=(x_{i,j})_{i,j\in [n]} \in \R^{n^2}\) such that
\(P(x) \neq 0\).*inverted* by \(\pi\)). The importance of the term \(sign(\pi)\) is that
it makes \(P\) equal to the *determinant* of the matrix \((x_{i,j})\)
and hence efficiently computable.

If \(G\) has a perfect matching \(M^*\), then let \(\pi^*\) be the permutation corresponding to \(M\) and let \(x^* \in \Z^{n^2}\) defined as follows: \(x_{i,j}=1\) if \(j=\pi(i)\) and \(x_{i,j}=0\). Note that for every \(\pi \neq \pi^*\), \(\prod_{i=0}^{n-1} x_{i,\pi(i)}=0\) but \(\prod_{i=0}^{n-1} x^*_{i,\pi^*(i)}=1\). Hence \(P(x^*)\) will equal \(\prod_{i=0}^{n-1} A_{i,\pi^*(i)}\). But since \(M^*\) is a perfect matching in \(G\), \(\prod_{i=0}^{n-1} A_{i,\pi^*(i)} = 1\).

On the other hand, suppose that \(P\) is not identically zero. By \eqref{matchpolyeq}, this means that at least one of the terms \(\prod_{i=0}^{n-1}A_{i,\pi(i)}\) is not equal to zero. But then this permutation \(\pi\) must be a perfect matching in \(G\).

As we’ve seen before, for every \(x \in \R^{n^2}\), we can compute \(P(x)\)
by simply computing the *determinant* of the matrix \(A(x)\), which is
obtained by replacing \(A_{i,j}\) with \(A_{i,j}x_{i,j}\). This reduces
testing perfect matching to the *zero testing* problem for polynomials:
given some polynomial \(P(\cdot)\), test whether \(P\) is identically zero
or not. The intuition behind our randomized algorithm for zero testing
is the following:

If a polynomial is not identically zero, then it can’t have “too many” roots.

This intuition sort of makes sense. For one variable polynomials, we know that a nonzero linear function has at most one root, a quadratic function (e.g., a parabola) has at most two roots, and generally a degree \(d\) equation has at most \(d\) roots. While in more than one variable there can be an infinite number of roots (e.g., the polynomial \(x_0+y_0\) vanishes on the line \(y=-x\)) it is still the case that the set of roots is very “small” compared to the set of all inputs. For example, the root of a bivariate polynomial form a curve, the roots of a three-variable polynomial form a surface, and more generally the roots of an \(n\)-variable polynomial are a space of dimension \(n-1\).

This intuition leads to the following simple randomized algorithm:

To decide if \(P\) is identically zero, choose a “random” input \(x\) and check if \(P(x)\neq 0\).

This makes sense: if there are only “few” roots, then we expect that with high probability the random input \(x\) is not going to be one of those roots. However, to transform this into an actual algorithm, we need to make both the intuition and the notion of a “random” input precise. Choosing a random real number is quite problematic, especially when you have only a finite number of coins at your disposal, and so we start by reducing the task to a finite setting. We will use the following result

For every integer \(q\), and polynomial \(P:\R^n \rightarrow \R\) with integer coefficients. If \(P\) has degree at most \(d\) and is not identically zero, then it has at most \(dq^{n-1}\) roots in the set \([q]^n = \{ (x_0,\ldots,x_{n-1}) : x_i \in \{0,\ldots,q-1\} \}\).

We omit the (not too complicated) proof of szlem. We remark that it holds not just over the real numbers but over any field as well. Since the matching polynomial \(P\) of matchpolylem has degree at most \(n\), szlem leads directly to a simple algorithm for testing if it is nonzero:

Algorithm Perfect-Matching:

Input:Bipartite graph \(G\) on \(2n\) vertices \(\{ \ell_0,\ldots,\ell_{n-1} , r_0,\ldots,r_{n-1} \}\).

Operation:

- For every \(i,j \in [n]\), choose \(x_{i,j}\) independently at random from \([2n]=\{0,\ldots 2n-1\}\).
- Compute the determinant of the matrix \(A(x)\) whose \((i,j)^{th}\) entry equals \(x_{i,j}\) if the edge \(\{\ell_i,r_j\}\) is present and \(0\) otherwise.
- Output
`no perfect matching`

if this determinant is zero, and output`perfect matching`

otherwise.

This algorithm can be improved further (e.g., see matchingmodex). While it is not necessarily faster than the cut-based algorithms for perfect matching, it does have some advantages. In particular, it is more amenable for parallelization. (However, it also has the significant disadvantage that it does not produce a matching but only states that one exists.) The Schwartz–Zippel Lemma, and the associated zero testing algorithm for polynomials, is widely used across computer science, including in several settings where we have no known deterministic algorithm matching their performance.

- Using concentration results, we can
*amplify*in polynomial time the success probability of a probabilistic algorithm from a mere \(1/p(n)\) to \(1-2^{-q(n)}\) for every polynomials \(p\) and \(q\). - There are several randomized algorithms that are better in various senses (e.g., simpler, faster, or other advantages) than the best known deterministic algorithm for the same problem.

Most of the exercises have been written in the summer of 2018 and haven’t yet been fully debugged. While I would prefer people do not post online solutions to the exercises, I would greatly appreciate if you let me know of any bugs. You can do so by posting a GitHub issue about the exercise, and optionally complement this with an email to me with more details about the attempted solution.

Our model for probability involves tossing \(n\) coins, but sometimes
algorithm require sampling from other distributions, such as selecting a
uniform number in \(\{0,\ldots,M-1\}\) for some \(M\). Fortunately, we can
simulate this with an exponentially small probability of error: prove
that for every \(M\), if \(n>k\lceil \log M \rceil\), then there is a
function \(F:\{0,1\}^n \rightarrow \{0,\ldots,M-1\} \cup \{ \bot \}\) such
that **(1)** The probability that \(F(x)=\bot\) is at most \(2^{-k}\) and
**(2)** the distribution of \(F(x)\) conditioned on \(F(x) \neq \bot\) is
equal to the uniform distribution over \(\{0,\ldots,M-1\}\).**Hint:** Think of \(x\in \{0,1\}^n\) as choosing \(k\) numbers
\(y_1,\ldots,y_k \in \{0,\ldots, 2^{\lceil \log M \rceil}-1 \}\).
Output the first such number that is in \(\{0,\ldots,M-1\}\).

- Prove that for every \(\epsilon>0\), if \(n\) is large enough then for
every \(x^*\in \{0,1\}^n\)
\(\Pr_{x \sim \{0,1\}^n}[ \Delta(x,x^*) \leq n/3 ] \leq 2^{-(1-H(1/3)-\epsilon)n}\)
where \(H(p)=p\log(1/p) + (1-p)\log(1/(1-p))\) is the same function as
in entropybinomex.
- Prove that \(2^{1-H(1/4)+(1/4) \log 3}=(3/2)\).
- Use the above to prove that for every \(\delta>0\) and large enough
\(n\), if we set \(T=1000\cdot (3/2+\delta)^n\) and \(S=n/4\) in the
WalkSAT algorithm then for every satisfiable 3CNF \(\varphi\), the
probability that we output
`unsatisfiable`

is at most \(1/2\).

monte carlo history:
`http://permalink.lanl.gov/object/tr?what=info:lanl-repo/lareport/LA-UR-88-9068`

Some topics related to this chapter that might be accessible to advanced students include: (to be completed)

Copyright 2018, Boaz Barak.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

HTML version is produced using the Distill template, Copyright 2018, The Distill Template Authors.

## Comments

Comments are posted on the GitHub repository using the utteranc.es app. A GitHub login is required to comment. If you don't want to authorize the app to post on your behalf, you can also comment directly on the GitHub issue for this page.