
NP, NP completeness, and the Cook-Levin Theorem

  • Introduce the class $\mathbf{NP}$ capturing a great many important computational problems
  • $\mathbf{NP}$-completeness: evidence that a problem might be intractable.
  • The $\mathbf{P}$ vs $\mathbf{NP}$ problem.

“In this paper we give theorems that suggest, but do not imply, that these problems, as well as many others, will remain intractable perpetually”, Richard Karp, 1972

“Sad to say, but it will be many more years, if ever before we really understand the Mystical Power of Twoness… 2-SAT is easy, 3-SAT is hard, 2-dimensional matching is easy, 3-dimensional matching is hard. Why? oh, Why?” Eugene Lawler

So far we have shown that 3SAT is no harder than Quadratic Equations, Independent Set, Maximum Cut, and Longest Path. But to show that these problems are computationally equivalent we need to give reductions in the other direction, reducing each one of these problems to 3SAT as well. It turns out we can reduce all of these problems to 3SAT in one fell swoop.

In fact, this result extends far beyond these particular problems. All of the problems we discussed in Chapter 14, and a great many other problems, share the same commonality: they are all search problems, where the goal is to decide, given an instance $x$, whether there exists a solution $y$ that satisfies some condition that can be verified in polynomial time. For example, in 3SAT, the instance is a formula and the solution is an assignment to the variables; in Max-Cut the instance is a graph and the solution is a cut in the graph; and so on and so forth. It turns out that every such search problem can be reduced to 3SAT.

In this chapter we will see the definition of the complexity class $\mathbf{NP}$, one of the most important definitions in this book, and the Cook-Levin Theorem, one of the most important theorems in it. Intuitively, the class $\mathbf{NP}$ corresponds to the class of problems where it is easy to verify a solution (i.e., verification can be done by a polynomial-time algorithm). For example, finding a satisfying assignment to a 2SAT or 3SAT formula is such a problem, since if we are given an assignment to the variables of a 2SAT or 3SAT formula then we can efficiently verify that it satisfies all constraints. More precisely, $\mathbf{NP}$ is the class of decision problems (i.e., Boolean functions or languages) corresponding to determining the existence of such a solution, though we will see in Chapter 16 that the decision and search problems are closely related.

As the examples of 2SAT and 3SAT show, there are some computational problems (i.e., functions) in $\mathbf{NP}$ for which we have a polynomial-time algorithm, and some for which no such algorithm is known. It is an outstanding open question whether or not all functions in $\mathbf{NP}$ have a polynomial-time algorithm, or in other words (to use just a little bit of math) whether or not $\mathbf{P}=\mathbf{NP}$. In this chapter we will see that there are some functions in $\mathbf{NP}$ that are, in a precise sense, the “hardest in all of $\mathbf{NP}$”: if even one of these functions has a polynomial-time algorithm then all functions in $\mathbf{NP}$ have such an algorithm. Such functions are known as $\mathbf{NP}$ complete. The Cook-Levin Theorem states that 3SAT is $\mathbf{NP}$ complete. Using a complex web of polynomial-time reductions, researchers have derived from the Cook-Levin theorem the $\mathbf{NP}$-completeness of thousands of computational problems from all areas of mathematics, natural and social sciences, engineering, and more. These results provide strong evidence that none of these problems can be solved in the worst case by a polynomial-time algorithm.

Figure 15.1: Overview of the results of this chapter. We define $\mathbf{NP}$ to contain all decision problems for which a solution can be efficiently verified. The main result of this chapter is the Cook-Levin Theorem (Theorem 15.6) which states that $3\mathit{SAT}$ has a polynomial-time algorithm if and only if every problem in $\mathbf{NP}$ has a polynomial-time algorithm. Another way to state this theorem is that $3\mathit{SAT}$ is $\mathbf{NP}$ complete. We will prove the Cook-Levin theorem by defining the two intermediate problems $\mathit{NANDSAT}$ and $3\mathit{NAND}$, proving that $\mathit{NANDSAT}$ is $\mathbf{NP}$ complete, and then proving that $\mathit{NANDSAT} \leq_p 3\mathit{NAND} \leq_p 3\mathit{SAT}$.

The class $\mathbf{NP}$

To make the above precise, we will make the following mathematical definition. We define the class $\mathbf{NP}$ to contain all Boolean functions that correspond to a search problem of the form above. That is, a Boolean function $F$ is in $\mathbf{NP}$ if $F$ has the form that on input a string $x$, $F(x)=1$ if and only if there exists a “solution” string $w$ such that the pair $(x,w)$ satisfies some polynomial-time checkable condition. Formally, $\mathbf{NP}$ is defined as follows:

Figure 15.2: The class $\mathbf{NP}$ corresponds to problems where solutions can be efficiently verified. That is, this is the class of functions $F$ such that $F(x)=1$ if there is a “solution” $w$ of length polynomial in $|x|$ that can be verified by a polynomial-time algorithm $V$.

We say that $F:\{0,1\}^* \rightarrow \{0,1\}$ is in $\mathbf{NP}$ if there exists some integer $a>0$ and $V:\{0,1\}^* \rightarrow \{0,1\}$ such that $V\in \mathbf{P}$ and for every $x\in \{0,1\}^n$,

$$F(x)=1 \Leftrightarrow \exists_{w \in \{0,1\}^{n^a}} \text{ s.t. } V(xw)=1 \;. \tag{15.1}$$

In other words, for $F$ to be in $\mathbf{NP}$, there needs to exist some polynomial-time computable verification function $V$, such that if $F(x)=1$ then there must exist $w$ (of length polynomial in $|x|$) such that $V(xw)=1$, and if $F(x)=0$ then for every such $w$, $V(xw)=0$. Since the existence of this string $w$ certifies that $F(x)=1$, $w$ is often referred to as a certificate, witness, or proof that $F(x)=1$.

See also Figure 15.2 for an illustration of Definition 15.1. The name $\mathbf{NP}$ stands for “non-deterministic polynomial time” and is used for historical reasons; see the bibliographical notes. The string $w$ in Equation 15.1 is sometimes known as a solution, certificate, or witness for the instance $x$.

Show that the condition that $|w|=|x|^a$ in Definition 15.1 can be replaced by the condition that $|w| \leq p(|x|)$ for some polynomial $p$. That is, prove that for every $F:\{0,1\}^* \rightarrow \{0,1\}$, $F \in \mathbf{NP}$ if and only if there is a polynomial-time Turing machine $V$ and a polynomial $p:\N \rightarrow \N$ such that for every $x\in \{0,1\}^*$, $F(x)=1$ if and only if there exists $w\in \{0,1\}^*$ with $|w| \leq p(|x|)$ such that $V(x,w)=1$.

The “only if” direction (namely that if $F\in \mathbf{NP}$ then there is an algorithm $V$ and a polynomial $p$ as above) follows immediately from Definition 15.1 by letting $p(n)=n^a$. For the “if” direction, the idea is that if a string $w$ is of size at most $p(n)$ for a degree $d$ polynomial $p$, then there is some $n_0$ such that for all $n > n_0$, $|w| < n^{d+1}$. Hence we can encode $w$ by a string of length exactly $n^{d+1}$ by padding it with $1$ and an appropriate number of zeroes. Hence if there is an algorithm $V$ and polynomial $p$ as above, then we can define an algorithm $V'$ that does the following on input $x,w'$ with $|x|=n$ and $|w'|=n^a$ (where $a=d+1$):

  • If $n \leq n_0$ then $V'(x,w')$ ignores $w'$ and enumerates over all $w$ of length at most $p(n)$ and outputs $1$ if there exists $w$ such that $V(x,w)=1$. (Since $n \leq n_0$, this only takes a constant number of steps.)

  • If $n> n_0$ then $V'(x,w')$ “strips out” the padding by dropping all the rightmost zeroes from $w'$ until it reaches the first $1$ (which it drops as well) and obtains a string $w$. If $|w| \leq p(n)$ then $V'$ outputs $V(x,w)$; otherwise it outputs $0$.

Since $V$ runs in polynomial time, $V'$ runs in polynomial time as well, and by definition for every $x$, there exists $w' \in \{0,1\}^{|x|^a}$ such that $V'(xw')=1$ if and only if there exists $w \in \{0,1\}^*$ with $|w| \leq p(|x|)$ such that $V(xw)=1$.
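The padding step in this solution can be sketched in Python as follows. This is only an illustration under stated assumptions: strings are Python `str` values over `'0'`/`'1'`, the names `pad`, `unpad`, and `make_padded_verifier` are hypothetical helpers introduced here, and the sketch covers only the case $n > n_0$ (the brute-force case for small inputs is omitted).

```python
from typing import Callable, Optional


def pad(w: str, target_len: int) -> str:
    """Encode w as exactly target_len bits: w, then a single 1, then zeroes."""
    if len(w) + 1 > target_len:
        raise ValueError("witness too long to pad to target_len")
    return w + "1" + "0" * (target_len - len(w) - 1)


def unpad(w_padded: str) -> Optional[str]:
    """Drop the rightmost zeroes and the 1 before them; None if no 1 exists."""
    stripped = w_padded.rstrip("0")
    if not stripped:
        return None  # all zeroes: not a valid padded witness
    return stripped[:-1]


def make_padded_verifier(V: Callable[[str, str], int],
                         p: Callable[[int], int],
                         d: int) -> Callable[[str, str], int]:
    """From V(x, w) with |w| <= p(|x|), build V'(x, w') with |w'| = |x|**(d+1)."""
    def V_prime(x: str, w_padded: str) -> int:
        n = len(x)
        if len(w_padded) != n ** (d + 1):
            return 0
        w = unpad(w_padded)
        if w is None or len(w) > p(n):
            return 0
        return V(x, w)
    return V_prime
```

Because the marker `1` is always the last non-zero symbol of the padded string, a witness that itself ends in zeroes is still recovered correctly.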

The definition of $\mathbf{NP}$ means that for every $F\in \mathbf{NP}$ and string $x\in \{0,1\}^*$, $F(x)=1$ if and only if there is a short and efficiently verifiable proof of this fact. That is, we can think of the function $V$ in Definition 15.1 as a verifier algorithm, similar to what we’ve seen in Section 11.1. The verifier checks whether a given string $w\in \{0,1\}^*$ is a valid proof for the statement “$F(x)=1$”. Essentially all proof systems considered in mathematics involve line-by-line checks that can be carried out in polynomial time. Thus the heart of $\mathbf{NP}$ is asking for statements that have short (i.e., polynomial in the size of the statements) proofs. Indeed, as we will see in Chapter 16, Kurt Gödel phrased the question of whether $\mathbf{NP}=\mathbf{P}$ as asking whether “the mental work of a mathematician [in proving theorems] could be completely replaced by a machine”.

Definition 15.1 is asymmetric in the sense that there is a difference between an output of $1$ and an output of $0$. You should make sure you understand why this definition does not guarantee that if $F \in \mathbf{NP}$ then the function $1-F$ (i.e., the map $x \mapsto 1-F(x)$) is in $\mathbf{NP}$ as well.

In fact, it is believed that there do exist functions $F$ such that $F\in \mathbf{NP}$ but $1-F \not\in \mathbf{NP}$. For example, as shown below, $3\mathit{SAT} \in \mathbf{NP}$, but the function $\overline{3\mathit{SAT}}$ that on input a 3CNF formula $\varphi$ outputs $1$ if and only if $\varphi$ is not satisfiable is not known (nor believed) to be in $\mathbf{NP}$. This is in contrast to the class $\mathbf{P}$ which does satisfy that if $F\in \mathbf{P}$ then $1-F$ is in $\mathbf{P}$ as well.

Examples of functions in $\mathbf{NP}$

We now present some examples of functions that are in the class $\mathbf{NP}$. We start with the canonical example of the $3\mathit{SAT}$ function.

$3\mathit{SAT}$ is in $\mathbf{NP}$ since for every $\ell$-variable formula $\varphi$, $3\mathit{SAT}(\varphi)=1$ if and only if there exists a satisfying assignment $x \in \{0,1\}^\ell$ such that $\varphi(x)=1$, and we can check this condition in polynomial time.

The above reasoning explains why $3\mathit{SAT}$ is in $\mathbf{NP}$, but since this is our first example, we will now belabor the point and expand out in full formality the precise representation of the witness $w$ and the algorithm $V$ that demonstrate that $3\mathit{SAT}$ is in $\mathbf{NP}$. Since demonstrating that functions are in $\mathbf{NP}$ is fairly straightforward, in future cases we will not use as much detail, and the reader can also feel free to skip the rest of this example.

Using Solved Exercise 15.1, it is OK if the witness is of size at most polynomial in the input length $n$, rather than of precisely size $n^a$ for some integer $a>0$. Specifically, we can represent a 3CNF formula $\varphi$ with $k$ variables and $m$ clauses as a string of length $n=O(m\log k)$, since every one of the $m$ clauses involves three variables or their negations, and the identity of each variable can be represented using $\lceil \log_2 k \rceil$ bits. We assume that every variable participates in some clause (as otherwise it can be ignored) and hence that $k \leq 3m$, which in particular means that the input length $n$ is at least as large as $m$ and $k$.

We can represent an assignment to the $k$ variables using a $k$-length string $w$. The following algorithm checks whether a given $w$ satisfies the formula $\varphi$:

Algorithm 15.4 Verifier for 3SAT

Input: 3CNF formula $\varphi$ on $k$ variables and with $m$ clauses, string $w \in \{0,1\}^k$

Output: $1$ iff $w$ satisfies $\varphi$

for $j \in [m]$
    Let $\ell_1 \vee \ell_2 \vee \ell_3$ be the $j$-th clause of $\varphi$
    if $w$ violates all three literals
        return $0$
    endif
endfor
return $1$

Algorithm 15.4 takes $O(m)$ time to enumerate over all clauses, and will return $1$ if and only if $w$ satisfies all the clauses.
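For readers who prefer running code, here is a minimal Python sketch of Algorithm 15.4. The representation is an assumption made for illustration only: a formula is a list of clauses, each clause a tuple of three nonzero integers, where the literal `i` stands for variable $i$ (numbered from 1) and `-i` for its negation; the assignment `w` is a string of `'0'`/`'1'` characters. The function name `verify_3sat` is hypothetical.

```python
def verify_3sat(clauses, w):
    """Return 1 iff the assignment w satisfies every clause, else 0.

    clauses: list of 3-tuples of nonzero ints (i means x_i, -i means NOT x_i).
    w: string over '0'/'1'; w[i-1] is the value assigned to variable i.
    """
    for clause in clauses:
        # A positive literal is satisfied when its variable is 1,
        # a negative literal when its variable is 0.
        satisfied = any((w[abs(lit) - 1] == "1") == (lit > 0) for lit in clause)
        if not satisfied:  # w violates all three literals of this clause
            return 0
    return 1


# Example: (x1 OR NOT x2 OR x3) AND (NOT x1 OR x2 OR x3)
example = [(1, -2, 3), (-1, 2, 3)]
```

Each clause is inspected once and each inspection takes constant time, matching the $O(m)$ bound stated above.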

Here are some more examples of problems in $\mathbf{NP}$. For each one of these problems we merely sketch how the witness is represented and why it is efficiently checkable, but working out the details can be a good way to get more comfortable with Definition 15.1:

  • $\mathit{QUADEQ}$ is in $\mathbf{NP}$ since for every $\ell$-variable instance of quadratic equations $E$, $\mathit{QUADEQ}(E)=1$ if and only if there exists an assignment $x\in \{0,1\}^\ell$ that satisfies $E$. We can check the condition that $x$ satisfies $E$ in polynomial time by enumerating over all the equations in $E$, and for each such equation $e$, plugging in the values of $x$ and verifying that $e$ is satisfied.

  • $\mathit{ISET}$ is in $\mathbf{NP}$ since for every graph $G$ and integer $k$, $\mathit{ISET}(G,k)=1$ if and only if there exists a set $S$ of at least $k$ vertices that contains no pair of neighbors in $G$. We can check the condition that $S$ is an independent set of size $\geq k$ in polynomial time by first checking that $|S| \geq k$ and then enumerating over all edges $\{u,v \}$ in $G$, and for each such edge verifying that either $u\not\in S$ or $v\not\in S$.

  • $\mathit{LONGPATH}$ is in $\mathbf{NP}$ since for every graph $G$ and integer $k$, $\mathit{LONGPATH}(G,k)=1$ if and only if there exists a simple path $P$ in $G$ that is of length at least $k$. We can check the condition that $P$ is a simple path of length $k$ in polynomial time by checking that it has the form $(v_0,v_1,\ldots,v_k)$ where each $v_i$ is a vertex in $G$, no $v_i$ is repeated, and for every $i \in [k]$, the edge $\{v_i,v_{i+1}\}$ is present in the graph.

  • $\mathit{MAXCUT}$ is in $\mathbf{NP}$ since for every graph $G$ and integer $k$, $\mathit{MAXCUT}(G,k)=1$ if and only if there exists a cut $(S,\overline{S})$ in $G$ that cuts at least $k$ edges. We can check the condition that $(S,\overline{S})$ is a cut of value at least $k$ in polynomial time by checking that $S$ is a subset of $G$’s vertices and enumerating over all the edges $\{u,v\}$ of $G$, counting those edges such that $u\in S$ and $v\not\in S$ or vice versa.
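The checks sketched in the list above can be written down directly. The following Python sketch covers the $\mathit{ISET}$ and $\mathit{MAXCUT}$ verifiers, assuming (for illustration only) that a graph is given as a vertex count `n` with vertices `0..n-1` plus a list of edges `(u, v)`, and that the witness is a collection `S` of vertices. The function names are hypothetical.

```python
def verify_iset(n, edges, k, S):
    """Return 1 iff S is an independent set of size at least k in the graph."""
    S = set(S)
    if len(S) < k or not all(0 <= v < n for v in S):
        return 0
    for u, v in edges:
        # An edge with both endpoints in S violates independence.
        if u in S and v in S:
            return 0
    return 1


def verify_maxcut(n, edges, k, S):
    """Return 1 iff the cut (S, complement of S) cuts at least k edges."""
    S = set(S)
    if not all(0 <= v < n for v in S):
        return 0
    # An edge is cut when exactly one of its endpoints lies in S.
    cut_value = sum(1 for u, v in edges if (u in S) != (v in S))
    return 1 if cut_value >= k else 0


# Example graph: the path 0 - 1 - 2 - 3
path_edges = [(0, 1), (1, 2), (2, 3)]
```

Both verifiers make a single pass over the edge list, so they run in time linear in the size of the graph (up to the cost of set operations).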

Basic facts about $\mathbf{NP}$

The definition of $\mathbf{NP}$ is one of the most important definitions of this book, and it is worthwhile taking the time to digest and internalize it. The following solved exercises establish some basic properties of this class. As usual, I highly recommend that you try to work out the solutions yourself.

Prove that $\mathbf{P} \subseteq \mathbf{NP}$.

Suppose that $F \in \mathbf{P}$. Define the following function $V$: $V(x0^n)=1$ iff $n=|x|$ and $F(x)=1$. ($V$ outputs $0$ on all other inputs.) Since $F\in \mathbf{P}$ we can clearly compute $V$ in polynomial time as well.

Let $x\in \{0,1\}^n$ be some string. If $F(x)=1$ then $V(x0^n)=1$. On the other hand, if $F(x)=0$ then for every $w\in \{0,1\}^n$, $V(xw)=0$. Therefore, setting $a=1$ (i.e. $w\in \{0,1\}^{n^1}$), we see that $V$ satisfies Equation 15.1, and establishes that $F \in \mathbf{NP}$.
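As a small illustration of this construction, the following Python sketch wraps an arbitrary function `F` (assumed to be a polynomial-time Python callable on `'0'`/`'1'` strings returning 0 or 1) into the verifier $V$ described above. The name `make_trivial_verifier` is hypothetical.

```python
def make_trivial_verifier(F):
    """Build V with V(x + '0'*len(x)) = F(x), and V = 0 on all other inputs."""
    def V(z):
        if len(z) % 2 != 0:
            return 0  # z cannot have the form x 0^n with n = |x|
        n = len(z) // 2
        x, w = z[:n], z[n:]
        # The "witness" carries no information: it must be the all-zero string.
        return 1 if w == "0" * n and F(x) == 1 else 0
    return V
```

The verifier simply recomputes $F(x)$ itself, which is why membership in $\mathbf{NP}$ says nothing about hardness.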

People sometimes think that $\mathbf{NP}$ stands for “non-polynomial time”. As Solved Exercise 15.2 shows, this is far from the truth, and in fact every polynomial-time computable function is in $\mathbf{NP}$ as well.

If $F$ is in $\mathbf{NP}$ it certainly does not mean that $F$ is hard to compute (though it does not, as far as we know, necessarily mean that it’s easy to compute either). Rather, it means that $F$ is easy to verify, in the technical sense of Definition 15.1.

Prove that $\mathbf{NP} \subseteq \mathbf{EXP}$.

Suppose that $F\in \mathbf{NP}$ and let $V$ be the polynomial-time computable function that satisfies Equation 15.1 and $a$ the corresponding constant. Then given every $x\in \{0,1\}^n$, we can check whether $F(x)=1$ in time $poly(n)\cdot 2^{n^a} = o(2^{n^{a+1}})$ by enumerating over all the $2^{n^a}$ strings $w\in \{0,1\}^{n^a}$ and checking whether $V(xw)=1$, in which case we return $1$. If $V(xw)=0$ for every such $w$ then we return $0$. By construction, the algorithm above will run in time at most exponential in its input length and by the definition of $\mathbf{NP}$ it will return $F(x)$ for every $x$.
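The exhaustive-search algorithm in this solution can be sketched in a few lines of Python. The sketch assumes a verifier `V` given as a Python callable on the concatenated string $xw$; the toy verifier `toy_V` below is a hypothetical example (it accepts when the witness has exactly one `1`, placed at a position where $x$ also has a `1`), not something from the text.

```python
from itertools import product


def decide_by_enumeration(V, x, a):
    """Compute F(x) by trying all 2^(n^a) candidate witnesses w of length n^a."""
    n = len(x)
    for bits in product("01", repeat=n ** a):
        if V(x + "".join(bits)) == 1:
            return 1  # found a witness, so F(x) = 1
    return 0  # no witness exists, so F(x) = 0


def toy_V(z):
    """Toy verifier with |w| = |x|: the corresponding F(x) is 1 iff x contains a 1."""
    n = len(z) // 2
    x, w = z[:n], z[n:]
    return 1 if w.count("1") == 1 and x[w.index("1")] == "1" else 0
```

The loop makes $2^{n^a}$ calls to `V`, so this is only usable for very small inputs, which is exactly the point of the exponential upper bound.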

Solved Exercise 15.2 and Solved Exercise 15.3 together imply that

$$\mathbf{P} \subseteq \mathbf{NP} \subseteq \mathbf{EXP}\;.$$

The time hierarchy theorem (Theorem 13.9) implies that $\mathbf{P} \subsetneq \mathbf{EXP}$ and hence at least one of the two inclusions $\mathbf{P} \subseteq \mathbf{NP}$ or $\mathbf{NP} \subseteq \mathbf{EXP}$ is strict. It is believed that both of them are in fact strict inclusions. That is, it is believed that there are functions in $\mathbf{NP}$ that cannot be computed in polynomial time (this is the $\mathbf{P} \neq \mathbf{NP}$ conjecture) and that there are functions $F$ in $\mathbf{EXP}$ for which we cannot even efficiently certify that $F(x)=1$ for a given input $x$. One function $F$ that is believed to lie in $\mathbf{EXP} \setminus \mathbf{NP}$ is the function $\overline{3\mathit{SAT}}$ defined as $\overline{3\mathit{SAT}}(\varphi)= 1 - 3\mathit{SAT}(\varphi)$ for every 3CNF formula $\varphi$. The conjecture that $\overline{3\mathit{SAT}}\not\in \mathbf{NP}$ is known as the “$\mathbf{NP} \neq \mathbf{co-NP}$” conjecture. It implies the $\mathbf{P} \neq \mathbf{NP}$ conjecture (see Exercise 15.2).

We have previously informally equated the notion of $F \leq_p G$ with $F$ being “no harder than $G$” and in particular have seen in Solved Exercise 14.1 that if $G \in \mathbf{P}$ and $F \leq_p G$, then $F \in \mathbf{P}$ as well. The following exercise shows that if $F \leq_p G$ then it is also “no harder to verify” than $G$. That is, regardless of whether or not it is in $\mathbf{P}$, if $G$ has the property that solutions to it can be efficiently verified, then so does $F$.

Let $F,G:\{0,1\}^* \rightarrow \{0,1\}$. Show that if $F \leq_p G$ and $G\in \mathbf{NP}$ then $F \in \mathbf{NP}$.

Suppose that $G$ is in $\mathbf{NP}$ and in particular there exists $a$ and $V \in \mathbf{P}$ such that for every $y \in \{0,1\}^*$, $G(y)=1 \Leftrightarrow \exists_{w\in \{0,1\}^{|y|^a}} V(yw)=1$. Suppose also that $F \leq_p G$ and so in particular there is an $n^b$-time computable function $R$ such that $F(x) = G(R(x))$ for all $x\in \{0,1\}^*$. Define $V'$ to be a Turing machine that on input a pair $(x,w)$ computes $y=R(x)$ and returns $1$ if and only if $|w|=|y|^a$ and $V(yw)=1$. Then $V'$ runs in polynomial time, and for every $x\in \{0,1\}^*$, $F(x)=1$ iff there exists $w$ of size $|R(x)|^a$, which is at most polynomial in $|x|$, such that $V'(x,w)=1$, hence demonstrating that $F \in \mathbf{NP}$.
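The verifier $V'$ built in this solution is just a composition of the reduction $R$ with the verifier $V$ for $G$. A minimal Python sketch, assuming `R` and `V` are given as Python callables on `'0'`/`'1'` strings; the toy choices `flip_bits` and `toy_V_G` are hypothetical examples introduced only to exercise the construction.

```python
def make_verifier_via_reduction(R, V, a):
    """Given a reduction R from F to G and a verifier V for G, build V' for F."""
    def V_prime(x, w):
        y = R(x)                    # map the F-instance to a G-instance
        if len(w) != len(y) ** a:   # the witness must have length |y|^a
            return 0
        return V(y + w)             # reuse G's verifier on the pair (y, w)
    return V_prime


def flip_bits(x):
    """Toy reduction: complement every bit."""
    return "".join("1" if c == "0" else "0" for c in x)


def toy_V_G(z):
    """Toy verifier for G(y) = 'y contains a 1', with witness length |y|."""
    n = len(z) // 2
    y, w = z[:n], z[n:]
    return 1 if w.count("1") == 1 and y[w.index("1")] == "1" else 0
```

With these toy choices, the resulting verifier certifies the function "x contains a 0", since flipping bits maps that question to "y contains a 1".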

From $\mathbf{NP}$ to 3SAT: The Cook-Levin Theorem

We have seen several examples of problems for which we do not know if their best algorithm is polynomial or exponential, but we can show that they are in $\mathbf{NP}$. That is, we don’t know if they are easy to solve, but we do know that it is easy to verify a given solution. There are many, many, many, more examples of interesting functions we would like to compute that are easily shown to be in $\mathbf{NP}$. What is quite amazing is that if we can solve 3SAT then we can solve all of them!

The following is one of the most fundamental theorems in Computer Science:

For every $F\in \mathbf{NP}$, $F \leq_p 3\mathit{SAT}$.

We will soon show the proof of Theorem 15.6, but note that it immediately implies that $\mathit{QUADEQ}$, $\mathit{LONGPATH}$, and $\mathit{MAXCUT}$ all reduce to $3\mathit{SAT}$. Combining it with the reductions we’ve seen in Chapter 14, it implies that all these problems are equivalent! For example, to reduce $\mathit{QUADEQ}$ to $\mathit{LONGPATH}$, we can first reduce $\mathit{QUADEQ}$ to $3\mathit{SAT}$ using Theorem 15.6 and use the reduction we’ve seen in Theorem 14.12 from $3\mathit{SAT}$ to $\mathit{LONGPATH}$. That is, since $\mathit{QUADEQ} \in \mathbf{NP}$, Theorem 15.6 implies that $\mathit{QUADEQ} \leq_p 3\mathit{SAT}$, and Theorem 14.12 implies that $3\mathit{SAT} \leq_p \mathit{LONGPATH}$, which by the transitivity of reductions (Solved Exercise 14.2) means that $\mathit{QUADEQ} \leq_p \mathit{LONGPATH}$. Similarly, since $\mathit{LONGPATH} \in \mathbf{NP}$, we can use Theorem 15.6 and Theorem 14.4 to show that $\mathit{LONGPATH} \leq_p 3\mathit{SAT} \leq_p \mathit{QUADEQ}$, concluding that $\mathit{LONGPATH}$ and $\mathit{QUADEQ}$ are computationally equivalent.

There is of course nothing special about $\mathit{QUADEQ}$ and $\mathit{LONGPATH}$ here: by combining Theorem 15.6 with the reductions we saw, we see that just like $3\mathit{SAT}$, every $F\in \mathbf{NP}$ reduces to $\mathit{LONGPATH}$, and the same is true for $\mathit{QUADEQ}$ and $\mathit{MAXCUT}$. All these problems are in some sense “the hardest in $\mathbf{NP}$” since an efficient algorithm for any one of them would imply an efficient algorithm for all the problems in $\mathbf{NP}$. This motivates the following definition:

Let $G:\{0,1\}^* \rightarrow \{0,1\}$. We say that $G$ is $\mathbf{NP}$ hard if for every $F\in \mathbf{NP}$, $F \leq_p G$.

We say that $G$ is $\mathbf{NP}$ complete if $G$ is $\mathbf{NP}$ hard and $G \in \mathbf{NP}$.

The Cook-Levin Theorem (Theorem 15.6) can be rephrased as saying that $3\mathit{SAT}$ is $\mathbf{NP}$ hard, and since it is also in $\mathbf{NP}$, this means that $3\mathit{SAT}$ is $\mathbf{NP}$ complete. Together with the reductions of Chapter 14, Theorem 15.6 shows that despite their superficial differences, 3SAT, quadratic equations, longest path, independent set, and maximum cut are all $\mathbf{NP}$-complete. Many thousands of additional problems have been shown to be $\mathbf{NP}$-complete, arising from all the sciences, mathematics, economics, engineering and many other fields. (For a few examples, see this Wikipedia page and this website.)

If a single $\mathbf{NP}$-complete problem has a polynomial-time algorithm, then there is such an algorithm for every decision problem that corresponds to the existence of an efficiently-verifiable solution.

What does this mean?

As we’ve seen in Solved Exercise 15.2, $\mathbf{P} \subseteq \mathbf{NP}$. The most famous conjecture in Computer Science is that this containment is strict. That is, it is widely conjectured that $\mathbf{P} \neq \mathbf{NP}$. One way to refute the conjecture that $\mathbf{P} \neq \mathbf{NP}$ is to give a polynomial-time algorithm for even a single one of the $\mathbf{NP}$-complete problems such as 3SAT, Max Cut, or the thousands of others that have been studied in all fields of human endeavors. The fact that these problems have been studied by so many people, and yet not a single polynomial-time algorithm for any of them has been found, supports the conjecture that indeed $\mathbf{P} \neq \mathbf{NP}$. In fact, for many of these problems (including all the ones we mentioned above), we don’t even know of a $2^{o(n)}$-time algorithm! However, to the frustration of computer scientists, we have not yet been able to prove that $\mathbf{P}\neq\mathbf{NP}$ or even rule out the existence of an $O(n)$-time algorithm for 3SAT. Resolving whether or not $\mathbf{P}=\mathbf{NP}$ is known as the $\mathbf{P}$ vs $\mathbf{NP}$ problem. A million-dollar prize has been offered for the solution of this problem, a popular book has been written, and every year a new paper comes out claiming a proof of $\mathbf{P}=\mathbf{NP}$ or $\mathbf{P}\neq\mathbf{NP}$, only to wither under scrutiny.

Figure 15.3: The world if $\mathbf{P}\neq \mathbf{NP}$ (left) and $\mathbf{P}=\mathbf{NP}$ (right). In the former case the set of $\mathbf{NP}$-complete problems is disjoint from $\mathbf{P}$ and Ladner’s theorem shows that there exist problems that are neither in $\mathbf{P}$ nor are $\mathbf{NP}$-complete. (There are remarkably few natural candidates for such problems, with some prominent examples being decision variants of problems such as integer factoring, lattice shortest vector, and finding Nash equilibria.) In the latter case that $\mathbf{P}=\mathbf{NP}$ the notion of $\mathbf{NP}$-completeness loses its meaning, as essentially all functions in $\mathbf{P}$ (save for the trivial constant zero and constant one functions) are $\mathbf{NP}$-complete.

One of the mysteries of computation is that people have observed a certain empirical “zero-one law” or “dichotomy” in the computational complexity of natural problems, in the sense that many natural problems are either in $\mathbf{P}$ (often in $\mathit{TIME}(O(n))$ or $\mathit{TIME}(O(n^2))$), or they are $\mathbf{NP}$ hard. This is related to the fact that for most natural problems, the best known algorithm is either exponential or polynomial, with not too many examples where the best running time is some strange intermediate complexity such as $2^{2^{\sqrt{\log n}}}$. However, it is believed that there exist problems in $\mathbf{NP}$ that are neither in $\mathbf{P}$ nor are $\mathbf{NP}$-complete, and in fact a result known as “Ladner’s Theorem” shows that if $\mathbf{P} \neq \mathbf{NP}$ then this is indeed the case (see also Exercise 15.1 and Figure 15.3).

Figure 15.4: A rough illustration of the (conjectured) status of problems in exponential time. Darker colors correspond to higher running time, and the circle in the middle is the problems in $\mathbf{P}$. $\mathbf{NP}$ is a (conjectured to be proper) superclass of $\mathbf{P}$ and the $\mathbf{NP}$-complete problems (or $\mathbf{NPC}$ for short) are the “hardest” problems in $\mathbf{NP}$, in the sense that a solution for one of them implies a solution for all other problems in $\mathbf{NP}$. It is conjectured that all the $\mathbf{NP}$-complete problems require at least $\exp(n^\epsilon)$ time to solve for a constant $\epsilon>0$, and many require $\exp(\Omega(n))$ time. The permanent is not believed to be contained in $\mathbf{NP}$ though it is $\mathbf{NP}$-hard, which means that a polynomial-time algorithm for it implies that $\mathbf{P}=\mathbf{NP}$.

The Cook-Levin Theorem: Proof outline

We will now prove the Cook-Levin Theorem, which is the underpinning to a great web of reductions from 3SAT to thousands of problems across many great fields. Some problems that have been shown to be $\mathbf{NP}$-complete include: minimum-energy protein folding, minimum surface-area foam configuration, map coloring, optimal Nash equilibrium, quantum state entanglement, minimum supersequence of a genome, minimum codeword problem, shortest vector in a lattice, minimum genus knots, positive Diophantine equations, integer programming, and many many more. The worst-case complexity of all these problems is (up to polynomial factors) equivalent to that of 3SAT, and through the Cook-Levin Theorem, to all problems in $\mathbf{NP}$.

To prove Theorem 15.6 we need to show that $F \leq_p 3\mathit{SAT}$ for every $F\in \mathbf{NP}$. We will do so in three stages. We define two intermediate problems: $\mathit{NANDSAT}$ and $3\mathit{NAND}$. We will shortly show the definitions of these two problems, but Theorem 15.6 will follow from combining the following three results:

  1. $\mathit{NANDSAT}$ is $\mathbf{NP}$ hard (Lemma 15.8).

  2. $\mathit{NANDSAT} \leq_p 3\mathit{NAND}$ (Lemma 15.9).

  3. $3\mathit{NAND} \leq_p 3\mathit{SAT}$ (Lemma 15.10).

By the transitivity of reductions, it will follow that for every $F \in \mathbf{NP}$,

$$F \leq_p \mathit{NANDSAT} \leq_p 3\mathit{NAND} \leq_p 3\mathit{SAT}$$

hence establishing Theorem 15.6.

We will prove these three results (Lemma 15.8, Lemma 15.9 and Lemma 15.10) one by one, providing the requisite definitions as we go along.

The $\mathit{NANDSAT}$ Problem, and why it is $\mathbf{NP}$ hard

The function $\mathit{NANDSAT}:\{0,1\}^* \rightarrow \{0,1\}$ is defined as follows:

  • The input to $\mathit{NANDSAT}$ is a string $Q$ representing a NAND-CIRC program (or equivalently, a circuit with NAND gates).

  • The output of $\mathit{NANDSAT}$ on input $Q$ is $1$ if and only if there exists a string $w\in \{0,1\}^n$ (where $n$ is the number of inputs to $Q$) such that $Q(w)=1$.

Prove that $\mathit{NANDSAT} \in \mathbf{NP}$.

We have seen that the circuit (or straightline program) evaluation problem can be computed in polynomial time. Specifically, given a NAND-CIRC program $Q$ of $s$ lines and $n$ inputs, and $w\in \{0,1\}^n$, we can evaluate $Q$ on the input $w$ in time which is polynomial in $s$ and hence verify whether or not $Q(w)=1$.
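To make the evaluation step concrete, here is a minimal Python sketch of a $\mathit{NANDSAT}$ verifier. The program representation is a simplifying assumption made for this illustration and is not the book's exact syntax: a program is a list of triples `(out, a, b)` meaning `out = NAND(a, b)`, the inputs are the variables `X[0]`, `X[1]`, ..., and the single output is `Y[0]`. The function names are hypothetical.

```python
def eval_nand_circ(program, w):
    """Evaluate a straight-line NAND program on the input string w."""
    values = {f"X[{i}]": int(bit) for i, bit in enumerate(w)}
    for out, a, b in program:
        # NAND(a, b) = 1 - a*b for bits a, b
        values[out] = 1 - values[a] * values[b]
    return values["Y[0]"]


def verify_nandsat(program, w):
    """Return 1 iff w is a satisfying input for the program, i.e. Q(w) = 1."""
    return 1 if eval_nand_circ(program, w) == 1 else 0


# Example: AND of two inputs, built from two NAND lines.
and_program = [("t", "X[0]", "X[1]"), ("Y[0]", "t", "t")]
```

Each line is executed once with a constant number of dictionary operations, so the running time is linear in the number of lines $s$.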

We now prove that $\mathit{NANDSAT}$ is $\mathbf{NP}$ hard.

$\mathit{NANDSAT}$ is $\mathbf{NP}$ hard.

The proof closely follows the proof that $\mathbf{P} \subseteq \mathbf{P_{/poly}}$ (Theorem 13.12, see also Section 13.6.2). Specifically, if $F\in \mathbf{NP}$ then there is a polynomial time Turing machine $M$ and positive integer $a$ such that for every $x\in \{0,1\}^n$, $F(x)=1$ iff there is some $w \in \{0,1\}^{n^a}$ such that $M(xw)=1$. The proof that $\mathbf{P} \subseteq \mathbf{P_{/poly}}$ gave us a way (via “unrolling the loop”) to come up in polynomial time with a Boolean circuit $C$ on $n^a$ inputs that computes the function $w \mapsto M(xw)$. We can then translate $C$ into an equivalent NAND circuit (or NAND-CIRC program) $Q$. We see that there is a string $w \in \{0,1\}^{n^a}$ such that $Q(w)=1$ if and only if there is such $w$ satisfying $M(xw)=1$ which (by definition) happens if and only if $F(x)=1$. Hence the translation of $x$ into the circuit $Q$ is a reduction showing $F \leq_p \mathit{NANDSAT}$.

The proof is a little bit technical but ultimately follows quite directly from the definition of $\mathbf{NP}$, as well as the ability to “unroll the loop” of NAND-TM programs as discussed in Section 13.6.2. If you find it confusing, try to pause here and think how you would implement in your favorite programming language the function unroll which on input a NAND-TM program $P$ and numbers $T,n$ outputs an $n$-input NAND-CIRC program $Q$ of $O(T)$ lines (for a fixed program $P$) such that for every input $z\in \{0,1\}^n$, if $P$ halts on $z$ within at most $T$ steps and outputs $y$, then $Q(z)=y$.

Let $F \in \mathbf{NP}$. To prove Lemma 15.8 we need to give a polynomial-time computable function that will map every $x^* \in \{0,1\}^*$ to a NAND-CIRC program $Q$ such that $F(x^*)=\mathit{NANDSAT}(Q)$.

Let $x^* \in \{0,1\}^*$ be such a string and let $n=|x^*|$ be its length. By Definition 15.1 there exists $V \in \mathbf{P}$ and positive $a \in \mathbb{N}$ such that $F(x^*)=1$ if and only if there exists $w\in \{0,1\}^{n^a}$ satisfying $V(x^*w)=1$.

Let $m=n^a$. Since $V\in \mathbf{P}$ there is some NAND-TM program $P^*$ that computes $V$ on inputs of the form $xw$ with $x\in \{0,1\}^n$ and $w\in \{0,1\}^m$ in at most ${(n+m)}^c$ time for some constant $c$. Using our “unrolling the loop NAND-TM to NAND compiler” of Theorem 13.14, we can obtain a NAND-CIRC program $Q'$ that has $n+m$ inputs and at most $O((n+m)^{2c})$ lines such that $Q'(xw)= P^*(xw)$ for every $x\in \{0,1\}^n$ and $w \in \{0,1\}^m$.

We can then use a simple “hardwiring” technique, reminiscent of Remark 9.11, to map $Q'$ into a circuit/NAND-CIRC program $Q$ on $m$ inputs such that $Q(w)= Q'(x^*w)$ for every $w\in \{0,1\}^m$.

CLAIM: There is a polynomial-time algorithm that on input a NAND-CIRC program $Q'$ on $n+m$ inputs and $x^* \in \{0,1\}^n$, outputs a NAND-CIRC program $Q$ such that for every $w\in \{0,1\}^m$, $Q(w)=Q'(x^*w)$.

PROOF OF CLAIM: We can do so by adding a few lines to ensure that the variables zero and one are $0$ and $1$ respectively, and then simply replacing any reference in $Q'$ to an input $x_i$ with $i\in [n]$ with the corresponding value based on $x^*_i$. See Figure 15.5 for an implementation of this reduction in Python.

Our final reduction maps an input $x^*$ into the NAND-CIRC program $Q$ obtained above. By the above discussion, this reduction runs in polynomial time. Since we know that $F(x^*)=1$ if and only if there exists $w\in \{0,1\}^m$ such that $P^*(x^*w)=1$, this means that $F(x^*)=1$ if and only if $\mathit{NANDSAT}(Q)=1$, which is what we wanted to prove.

15.5: Given a $T$-line NAND-CIRC program $Q$ that has $n+m$ inputs and some $x^*\in \{0,1\}^n$, we can transform $Q$ into a $T+3$ line NAND-CIRC program $Q'$ that computes the map $w \mapsto Q(x^*w)$ for $w\in \{0,1\}^m$ by simply adding code to compute the zero and one constants, replacing all references to X[$i$] with either zero or one depending on the value of $x^*_i$, and then replacing the remaining references to X[$j$] with X[$j-n$]. Above is Python code that implements this transformation, as well as an example of its execution on a simple program.
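The hardwiring step described in the caption above can be sketched as follows. This is an illustration under stated assumptions rather than the book's own code: programs are lists of `(target, a, b)` triples meaning `target = NAND(a, b)`, inputs are named `X[i]`, the names `_t`, `one` and `zero` are reserved for the three added lines, and at least one input remains after hardwiring ($m \geq 1$).

```python
import re


def eval_nand_circ(prog, w):
    """Evaluate a program given as (target, a, b) triples, target = NAND(a, b)."""
    values = {f"X[{i}]": bit for i, bit in enumerate(w)}
    for target, a, b in prog:
        values[target] = 1 - values.get(a, 0) * values.get(b, 0)
    return values["Y[0]"]


def hardwire(prog, x_star):
    """Return a program computing w -> prog(x_star + w); assumes one input remains."""
    n = len(x_star)

    def rename(var):
        match = re.fullmatch(r"X\[(\d+)\]", var)
        if match is None:
            return var  # not an input variable, keep as is
        i = int(match.group(1))
        if i < n:
            # Hardwired input: replace by the matching constant.
            return "one" if x_star[i] == 1 else "zero"
        return f"X[{i - n}]"  # remaining inputs are shifted down by n

    # Three extra lines computing the constants from the (new) first input.
    constants = [
        ("_t", "X[0]", "X[0]"),  # _t = NOT X[0]
        ("one", "X[0]", "_t"),   # NAND(b, NOT b) = 1
        ("zero", "one", "one"),  # NAND(1, 1) = 0
    ]
    return constants + [(target, rename(a), rename(b)) for target, a, b in prog]


# XOR of two bits, then hardwire the first input to 1: the result computes NOT w.
XOR2 = [("u", "X[0]", "X[1]"), ("v", "X[0]", "u"), ("t", "X[1]", "u"), ("Y[0]", "v", "t")]
NOT1 = hardwire(XOR2, [1])
print(len(NOT1), [eval_nand_circ(NOT1, [w]) for w in (0, 1)])  # 7 [1, 0]
```

The output program has exactly three more lines than the input program, matching the $T+3$ count in the caption.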

The $3\mathit{NAND}$ problem

The $3\mathit{NAND}$ problem is defined as follows:

  • The input is a logical formula $\Psi$ on a set of variables $z_0,\ldots,z_{r-1}$ which is an AND of constraints of the form $z_i = \mathit{NAND}(z_j,z_k)$.

  • The output is $1$ if and only if there is an input $z\in \{0,1\}^r$ that satisfies all of the constraints.

For example, the following is a $3\mathit{NAND}$ formula with $5$ variables and $3$ constraints:

$$\Psi = \left( z_3 = \mathit{NAND}(z_0,z_2) \right) \wedge \left( z_1 = \mathit{NAND}(z_0,z_2) \right) \wedge \left( z_4 = \mathit{NAND}(z_3,z_1) \right) \;.$$

In this case $3\mathit{NAND}(\Psi)=1$ since the assignment $z = 01010$ satisfies it. Given a $3\mathit{NAND}$ formula $\Psi$ on $r$ variables and an assignment $z\in \{0,1\}^r$, we can check in polynomial time whether $\Psi(z)=1$, and hence $3\mathit{NAND} \in \mathbf{NP}$. We now prove that $3\mathit{NAND}$ is $\mathbf{NP}$-hard:
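The polynomial-time check mentioned above takes only a few lines. In this sketch a formula is encoded, by assumption, as a list of index triples `(i, j, k)` standing for the constraint $z_i = \mathit{NAND}(z_j,z_k)$; the example is the formula $\Psi$ and the assignment $01010$ from the text.

```python
def satisfies_3nand(constraints, z):
    """Return True if the assignment z satisfies every constraint z_i = NAND(z_j, z_k)."""
    return all(z[i] == 1 - z[j] * z[k] for i, j, k in constraints)


# The example formula: z3 = NAND(z0, z2), z1 = NAND(z0, z2), z4 = NAND(z3, z1).
PSI = [(3, 0, 2), (1, 0, 2), (4, 3, 1)]

print(satisfies_3nand(PSI, [0, 1, 0, 1, 0]))  # True
print(satisfies_3nand(PSI, [0, 0, 0, 0, 0]))  # False: z3 would have to be NAND(0, 0) = 1
```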

Lemma 15.9: $\mathit{NANDSAT} \leq_p 3\mathit{NAND}$.

To prove Lemma 15.9 we need to give a polynomial-time map from every NAND-CIRC program $Q$ to a 3NAND formula $\Psi$ such that there exists $w$ such that $Q(w)=1$ if and only if there exists $z$ satisfying $\Psi$. For every line $i$ of $Q$, we define a corresponding variable $z_i$ of $\Psi$. If the line $i$ has the form foo = NAND(bar,blah) then we will add the clause $z_i = \mathit{NAND}(z_j,z_k)$ where $j$ and $k$ are the last lines in which bar and blah were written to. We will also set variables corresponding to the input variables, as well as add a clause to ensure that the final output is $1$. The resulting reduction can be implemented in about a dozen lines of Python, see Figure 15.6.

15.6: Python code to reduce an instance $Q$ of $\mathit{NANDSAT}$ to an instance $\Psi$ of $3\mathit{NAND}$. In the example above we transform the NAND-CIRC program xor5 which has $5$ input variables and $16$ lines, into a $3\mathit{NAND}$ formula $\Psi$ that has $24$ variables and $20$ clauses. Since xor5 outputs $1$ on the input $1,0,0,1,1$, there exists an assignment $z \in \{0,1\}^{24}$ to $\Psi$ such that $(z_0,z_1,z_2,z_3,z_4)=(1,0,0,1,1)$ and $\Psi$ evaluates to true on $z$.

To prove Lemma 15.9 we need to give a reduction from $\mathit{NANDSAT}$ to $3\mathit{NAND}$. Let $Q$ be a NAND-CIRC program with $n$ inputs, one output, and $m$ lines. We can assume without loss of generality that $Q$ contains the variables one and zero as usual.

We map $Q$ to a $3\mathit{NAND}$ formula $\Psi$ as follows:

  • $\Psi$ has $m+n$ variables $z_0,\ldots,z_{m+n-1}$.

  • The first $n$ variables $z_0,\ldots,z_{n-1}$ will correspond to the inputs of $Q$. The next $m$ variables $z_n,\ldots,z_{n+m-1}$ will correspond to the $m$ lines of $Q$.

  • For every $\ell\in \{n,n+1,\ldots,n+m-1 \}$, if the $(\ell-n)$-th line of the program $Q$ is foo = NAND(bar,blah) then we add to $\Psi$ the constraint $z_\ell = \mathit{NAND}(z_j,z_k)$ where $j-n$ and $k-n$ correspond to the last lines in which the variables bar and blah (respectively) were written to. If one or both of bar and blah was not written to before then we use $z_{\ell_0}$ instead of the corresponding value $z_j$ or $z_k$ in the constraint, where $\ell_0-n$ is the last line in which zero is assigned a value. If one or both of bar and blah is an input variable X[i] then we use $z_i$ in the constraint.

  • Let $\ell^*$ be such that $\ell^*-n$ is the last line in which the output y_0 is assigned a value. Then we add the constraint $z_{\ell^*} = \mathit{NAND}(z_{\ell_0},z_{\ell_0})$ where $\ell_0-n$ is, as above, the last line in which zero is assigned a value. Note that this is effectively the constraint $z_{\ell^*}=\mathit{NAND}(0,0)=1$.

To complete the proof we need to show that there exists $w\in \{0,1\}^n$ s.t. $Q(w)=1$ if and only if there exists $z\in \{0,1\}^{n+m}$ that satisfies all constraints in $\Psi$. We now show both sides of this equivalence.

Part I: Completeness. Suppose that there is $w\in \{0,1\}^n$ s.t. $Q(w)=1$. Let $z\in \{0,1\}^{n+m}$ be defined as follows: for $i\in [n]$, $z_i=w_i$, and for $i\in \{n,n+1,\ldots,n+m-1\}$, $z_i$ equals the value that is assigned in the $(i-n)$-th line of $Q$ when executed on $w$. Then by construction $z$ satisfies all of the constraints of $\Psi$ (including the constraint that $z_{\ell^*}=\mathit{NAND}(0,0)=1$, since $Q(w)=1$).

Part II: Soundness. Suppose that there exists $z\in \{0,1\}^{n+m}$ satisfying $\Psi$. Soundness will follow by showing that $Q(z_0,\ldots,z_{n-1})=1$ (and hence in particular there exists $w\in \{0,1\}^n$, namely $w=z_0\cdots z_{n-1}$, such that $Q(w)=1$). To do this we will prove the following claim $(*)$: for every $\ell \in [m]$, $z_{\ell+n}$ equals the value assigned in the $\ell$-th step of the execution of the program $Q$ on $z_0,\ldots,z_{n-1}$. Note that because $z$ satisfies the constraints of $\Psi$, $(*)$ is sufficient to prove the soundness condition since these constraints imply that the last value assigned to the variable y_0 in the execution of $Q$ on $z_0\cdots z_{n-1}$ is equal to $1$. To prove $(*)$ suppose, towards a contradiction, that it is false, and let $\ell$ be the smallest number such that $z_{\ell+n}$ is not equal to the value assigned in the $\ell$-th step of the execution of $Q$ on $z_0,\ldots,z_{n-1}$. But since $z$ satisfies the constraints of $\Psi$, we get that $z_{\ell+n}=\mathit{NAND}(z_i,z_j)$ where (by the assumption above that $\ell$ is smallest with this property) these values do correspond to the values last assigned to the variables on the right-hand side of the assignment operator in the $\ell$-th line of the program. But this means that the value assigned in the $\ell$-th step is indeed simply the NAND of $z_i$ and $z_j$, contradicting our assumption on the choice of $\ell$.
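The construction in the proof can be sketched in Python and checked by brute force on tiny programs. This is an illustrative sketch, not the code of Figure 15.6. It assumes programs are lists of `(target, a, b)` triples meaning `target = NAND(a, b)`, that the output is named `Y[0]`, that $n \geq 1$, and that the names `_t`, `one` and `zero` are reserved: three lines computing the constants are prepended so that `zero` is always defined, and they count toward the $m$ lines.

```python
from itertools import product


def eval_nand_circ(prog, w):
    """Evaluate a program given as (target, a, b) triples on input bits w."""
    values = {f"X[{i}]": bit for i, bit in enumerate(w)}
    for target, a, b in prog:
        values[target] = 1 - values.get(a, 0) * values.get(b, 0)
    return values["Y[0]"]


def nandsat_to_3nand(prog, n):
    """Map an n-input program to (r, constraints): a 3NAND formula on r variables."""
    prelude = [("_t", "X[0]", "X[0]"), ("one", "X[0]", "_t"), ("zero", "one", "one")]
    lines = prelude + list(prog)
    # last[v] is the index of the formula variable holding the latest value of v.
    last = {f"X[{i}]": i for i in range(n)}

    def index_of(var):
        # Operands that were never written to are treated like `zero`.
        return last[var] if var in last else last["zero"]

    constraints = []
    for line_no, (target, a, b) in enumerate(lines):
        ell = n + line_no  # variable z_ell corresponds to this line
        constraints.append((ell, index_of(a), index_of(b)))
        last[target] = ell
    # Force the last value of the output to equal NAND(0, 0) = 1.
    constraints.append((last["Y[0]"], last["zero"], last["zero"]))
    return n + len(lines), constraints


def nandsat(prog, n):
    """Brute-force NANDSAT (exponential time, for checking small examples only)."""
    return int(any(eval_nand_circ(prog, w) == 1 for w in product((0, 1), repeat=n)))


def three_nand(r, constraints):
    """Brute-force 3NAND (exponential time, for checking small examples only)."""
    return int(any(all(z[i] == 1 - z[j] * z[k] for i, j, k in constraints)
                   for z in product((0, 1), repeat=r)))


XOR2 = [("u", "X[0]", "X[1]"), ("v", "X[0]", "u"), ("t", "X[1]", "u"), ("Y[0]", "v", "t")]
ALWAYS_ZERO = [("u", "X[0]", "X[0]"), ("v", "X[0]", "u"), ("Y[0]", "v", "v")]

for prog, n in [(XOR2, 2), (ALWAYS_ZERO, 1)]:
    print(nandsat(prog, n), three_nand(*nandsat_to_3nand(prog, n)))  # 1 1, then 0 0
```

The reduction itself is a single pass over the program; only the two brute-force checkers are exponential, and they are there just to confirm on small inputs that satisfiability is preserved in both directions.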

15.7: A $3\mathit{NAND}$ instance that is obtained by taking a NAND-TM program for computing the $\mathit{AND}$ function, unrolling it to obtain a $\mathit{NANDSAT}$ instance, and then composing it with the reduction of Lemma 15.9.

From $3\mathit{NAND}$ to $3\mathit{SAT}$

The final step in the proof of Theorem 15.6 is the following:

Lemma 15.10: $3\mathit{NAND} \leq_p 3\mathit{SAT}$.

To prove Lemma 15.10 we need to map a 3NAND formula $\varphi$ into a 3SAT formula $\psi$ such that $\varphi$ is satisfiable if and only if $\psi$ is. The idea is that we can transform every NAND constraint of the form $a=\mathit{NAND}(b,c)$ into the AND of ORs involving the variables $a,b,c$ and their negations, where each of the ORs contains at most three terms. The construction is fairly straightforward, and the details are given below.

It is a good exercise for you to try to find a 3CNF formula $\xi$ on three variables $a,b,c$ such that $\xi(a,b,c)$ is true if and only if $a = \mathit{NAND}(b,c)$. Once you do so, try to see why this implies a reduction from $3\mathit{NAND}$ to $3\mathit{SAT}$, and hence completes the proof of Lemma 15.10.

15.8: Code and example output for the reduction given in Lemma 15.10 of $3\mathit{NAND}$ to $3\mathit{SAT}$.

The constraint

$$z_i = \mathit{NAND}(z_j,z_k) \quad (15.2)$$

is satisfied if and only if $z_i=0$ when $(z_j,z_k) = (1,1)$ and $z_i=1$ otherwise. By going through all cases, we can verify that Equation 15.2 is equivalent to the constraint

$$(\overline{z_i} \vee \overline{z_j} \vee \overline{z_k}) \wedge (z_i \vee z_j) \wedge (z_i \vee z_k) \;. \quad (15.3)$$

Indeed if $z_j=z_k=1$ then the first constraint of Equation 15.3 is satisfied only if $z_i=0$. On the other hand, if either of $z_j$ or $z_k$ equals $0$ then unless $z_i=1$ either the second or third constraints will fail. This means that, given any 3NAND formula $\varphi$ over $n$ variables $z_0,\ldots,z_{n-1}$, we can obtain a 3SAT formula $\psi$ over the same variables by replacing every $3\mathit{NAND}$ constraint of $\varphi$ with three $3\mathit{OR}$ constraints as in Equation 15.3. (See footnote 1.)

Because of the equivalence of Equation 15.2 and Equation 15.3, the formula $\psi$ satisfies that $\psi(z_0,\ldots,z_{n-1})=\varphi(z_0,\ldots,z_{n-1})$ for every assignment $(z_0,\ldots,z_{n-1}) \in \{0,1\}^n$ to the variables. In particular $\psi$ is satisfiable if and only if $\varphi$ is, thus completing the proof.
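The case analysis behind Equation 15.3 is small enough to check mechanically. The sketch below is not the code of Figure 15.8. It assumes a 3NAND formula is a list of index triples `(i, j, k)` for $z_i = \mathit{NAND}(z_j,z_k)$, and a CNF formula is a list of clauses whose literals are pairs `(index, is_positive)`; both encodings are choices made here for illustration.

```python
from itertools import product


def nand_constraint_to_clauses(i, j, k):
    """The three clauses of Equation 15.3 for the constraint z_i = NAND(z_j, z_k)."""
    return [
        [(i, False), (j, False), (k, False)],  # not z_i OR not z_j OR not z_k
        [(i, True), (j, True)],                # z_i OR z_j
        [(i, True), (k, True)],                # z_i OR z_k
    ]


def three_nand_to_3sat(constraints):
    """Replace every NAND constraint by its three clauses."""
    return [clause for (i, j, k) in constraints
            for clause in nand_constraint_to_clauses(i, j, k)]


def satisfies_cnf(clauses, z):
    """A positive literal holds when z[v] == 1, a negated one when z[v] == 0."""
    return all(any(z[v] == int(positive) for v, positive in clause) for clause in clauses)


# Go through all eight cases: the clauses hold exactly when a == NAND(b, c).
for a, b, c in product((0, 1), repeat=3):
    assert satisfies_cnf(nand_constraint_to_clauses(0, 1, 2), [a, b, c]) == (a == 1 - b * c)

# The example formula from the 3NAND section and its satisfying assignment 01010.
PSI = [(3, 0, 2), (1, 0, 2), (4, 3, 1)]
CNF = three_nand_to_3sat(PSI)
print(len(CNF), satisfies_cnf(CNF, [0, 1, 0, 1, 0]))  # 9 True
```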

15.9: An instance of the independent set problem obtained by applying the reductions $\mathit{NANDSAT} \leq_p 3\mathit{NAND} \leq_p 3\mathit{SAT} \leq_p \mathit{ISET}$ starting with the xor5 NAND-CIRC program.

Wrapping up

We have shown that for every function $F$ in $\mathbf{NP}$, $F \leq_p \mathit{NANDSAT} \leq_p 3\mathit{NAND} \leq_p 3\mathit{SAT}$, and so $3\mathit{SAT}$ is $\mathbf{NP}$-hard. Since in Chapter 14 we saw that $3\mathit{SAT} \leq_p \mathit{QUADEQ}$, $3\mathit{SAT} \leq_p \mathit{ISET}$, $3\mathit{SAT} \leq_p \mathit{MAXCUT}$ and $3\mathit{SAT} \leq_p \mathit{LONGPATH}$, all these problems are $\mathbf{NP}$-hard as well. Finally, since all the aforementioned problems are in $\mathbf{NP}$, they are all in fact $\mathbf{NP}$-complete and have equivalent complexity. There are thousands of other natural problems that are $\mathbf{NP}$-complete as well. Finding a polynomial-time algorithm for any one of them would imply a polynomial-time algorithm for all of them.

15.10: We believe that $\mathbf{P} \neq \mathbf{NP}$ and all $\mathbf{NP}$-complete problems lie outside of $\mathbf{P}$, but we cannot rule out the possibility that $\mathbf{P}=\mathbf{NP}$. However, we can rule out the possibility that some $\mathbf{NP}$-complete problems are in $\mathbf{P}$ and others are not, since we know that if even one $\mathbf{NP}$-complete problem is in $\mathbf{P}$ then $\mathbf{P}=\mathbf{NP}$. The relation between $\mathbf{P_{/poly}}$ and $\mathbf{NP}$ is not known, though it can be shown that if one $\mathbf{NP}$-complete problem is in $\mathbf{P_{/poly}}$ then $\mathbf{NP} \subseteq \mathbf{P_{/poly}}$.
  • Many of the problems for which we don’t know polynomial-time algorithms are $\mathbf{NP}$-complete, which means that finding a polynomial-time algorithm for one of them would imply a polynomial-time algorithm for all of them.
  • It is conjectured that $\mathbf{NP}\neq \mathbf{P}$ which means that we believe that polynomial-time algorithms for these problems are not merely unknown but are non-existent.
  • While an $\mathbf{NP}$-hardness result means for example that a full-fledged “textbook” solution to a problem such as MAX-CUT that is as clean and general as the algorithm for MIN-CUT probably does not exist, it does not mean that we need to give up whenever we see a MAX-CUT instance. Later in this course we will discuss several strategies to deal with $\mathbf{NP}$-hardness, including average-case complexity and approximation algorithms.

Exercises

Prove that if there is no $n^{O(\log^2 n)}$ time algorithm for $3\mathit{SAT}$ then there is some $F\in \mathbf{NP}$ such that $F \not\in \mathbf{P}$ and $F$ is not $\mathbf{NP}$-complete. See footnote 2 for a hint.

Let $\overline{3\mathit{SAT}}$ be the function that on input a 3CNF formula $\varphi$ returns $1-3\mathit{SAT}(\varphi)$. Prove that if $\overline{3\mathit{SAT}} \not\in \mathbf{NP}$ then $\mathbf{P} \neq \mathbf{NP}$. See footnote 3 for a hint.

Define $\mathit{WSAT}$ to be the following function: the input is a CNF formula $\varphi$ where each clause is the OR of one to three variables (without negations), and a number $k\in \mathbb{N}$. For example, the following formula can be used for a valid input to $\mathit{WSAT}$: $\varphi = (x_5 \vee x_{2} \vee x_1) \wedge (x_1 \vee x_3 \vee x_0) \wedge (x_2 \vee x_4 \vee x_0)$. The output $\mathit{WSAT}(\varphi,k)=1$ if and only if there exists a satisfying assignment to $\varphi$ in which exactly $k$ of the variables get the value $1$. For example for the formula above $\mathit{WSAT}(\varphi,2)=1$ since the assignment $(1,1,0,0,0,0)$ satisfies all the clauses. However $\mathit{WSAT}(\varphi,1)=0$ since there is no single variable appearing in all clauses.

Prove that $\mathit{WSAT}$ is $\mathbf{NP}$-complete.
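To get a feel for the definition (this is not part of a solution), the two example values above can be confirmed with a brute-force search. The sketch assumes a clause is encoded as a list of variable indices; it tries every set of exactly $k$ variables, so it takes exponential time in general.

```python
from itertools import combinations


def wsat(clauses, num_vars, k):
    """Return 1 if some assignment with exactly k ones satisfies every clause."""
    for ones in combinations(range(num_vars), k):
        chosen = set(ones)
        # A negation-free clause is satisfied when it contains a chosen variable.
        if all(chosen & set(clause) for clause in clauses):
            return 1
    return 0


# (x5 OR x2 OR x1) AND (x1 OR x3 OR x0) AND (x2 OR x4 OR x0)
PHI = [[5, 2, 1], [1, 3, 0], [2, 4, 0]]

print(wsat(PHI, 6, 2), wsat(PHI, 6, 1))  # 1 0
```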

In the employee recruiting problem we are given a list of potential employees, each of whom has some subset of $m$ potential skills, and a number $k$. We need to assemble a team of $k$ employees such that for every skill there is at least one member of the team with this skill.

For example, if Alice has the skills “C programming”, “NAND programming” and “Solving Differential Equations”, Bob has the skills “C programming” and “Solving Differential Equations”, and Charlie has the skills “NAND programming” and “Coffee Brewing”, then if we want a team of two people that covers all the four skills, we would hire Alice and Charlie.

Define the function $\mathit{EMP}$ s.t. on input the skills $L$ of all potential employees (in the form of a sequence $L$ of $n$ lists $L_1,\ldots,L_n$, each containing distinct numbers between $0$ and $m-1$), and a number $k$, $\mathit{EMP}(L,k)=1$ if and only if there is a subset $S$ of $k$ potential employees such that for every skill $j$ in $[m]$, there is an employee in $S$ that has the skill $j$.

Prove that $\mathit{EMP}$ is $\mathbf{NP}$-complete.
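The Alice, Bob and Charlie example above can be checked against this definition by exhaustive search. The numbering of the skills (0 for “C programming”, 1 for “NAND programming”, 2 for “Solving Differential Equations”, 3 for “Coffee Brewing”) is an assumption made for this sketch, and the search over all teams of size $k$ is exponential in general, so it only illustrates the definition.

```python
from itertools import combinations


def emp(skill_lists, m, k):
    """Return 1 if some team of exactly k employees covers all skills 0..m-1."""
    needed = set(range(m))
    for team in combinations(range(len(skill_lists)), k):
        covered = set().union(*(skill_lists[i] for i in team))
        if needed <= covered:
            return 1
    return 0


# Alice, Bob, Charlie, with skills numbered 0..3 as described above.
SKILLS = [[0, 1, 2], [0, 2], [1, 3]]

print(emp(SKILLS, 4, 2), emp(SKILLS, 4, 1))  # 1 0
```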

Prove that the “balanced variant” of the maximum cut problem is $\mathbf{NP}$-complete. This variant is the function $\mathit{BMC}:\{0,1\}^* \rightarrow \{0,1\}$ such that for every graph $G=(V,E)$ and $k\in \mathbb{N}$, $\mathit{BMC}(G,k)=1$ if and only if there exists a cut $S$ in $G$ cutting at least $k$ edges such that $|S|=|V|/2$.

Let $\mathit{MANYREGS}$ be the following function: On input a list of regular expressions $exp_0,\ldots,exp_m$ (represented as strings in some standard way), output $1$ if and only if there is a single string $x \in \{0,1\}^*$ that matches all of them. Prove that $\mathit{MANYREGS}$ is $\mathbf{NP}$-hard.

Bibliographical notes

Aaronson’s 120 page survey (Aaronson, 2016) is a beautiful and extensive exposition of the $\mathbf{P}$ vs $\mathbf{NP}$ problem, its importance and status. See also Chapter 3 in Wigderson’s excellent book (Wigderson, 2019). Johnson (Johnson, 2012) gives a survey of the historical development of the theory of $\mathbf{NP}$ completeness. The following web page keeps a catalog of failed attempts at settling $\mathbf{P}$ vs $\mathbf{NP}$. At the time of this writing, it lists about 110 papers claiming to resolve the question, of which about 60 claim to prove that $\mathbf{P}=\mathbf{NP}$ and about 50 claim to prove that $\mathbf{P} \neq \mathbf{NP}$.

Ladner’s Theorem was proved by Richard Ladner in 1975. Ladner, who was born to deaf parents, later switched his research focus into computing for assistive technologies, where he has made many contributions. In 2014, he wrote a personal essay on his path from theoretical CS to accessibility research.

Eugene Lawler’s quote on the “mystical power of twoness” was taken from the wonderful book “The Nature of Computation” by Moore and Mertens. See also this memorial essay on Lawler by Lenstra.

  1. The resulting formula will have some of the OR’s involving only two variables. If we wanted to insist on each clause involving three distinct variables we can always add a “dummy variable” $z_n$ and include it in all the OR’s involving only two variables, and add a constraint requiring this dummy variable to be zero.

  2. Hint: Use the function $F$ that on input a formula $\varphi$ and a string of the form $1^t$, outputs $1$ if and only if $\varphi$ is satisfiable and $t=|\varphi|^{\log|\varphi|}$.

  3. Hint: Prove and then use the fact that $\mathbf{P}$ is closed under complement.

Compiled on 12/06/2023 00:07:00

Copyright 2023, Boaz Barak. Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Produced using pandoc and panflute with templates derived from gitbook and bookdown.