- See that computation can be precisely modeled.
- Learn the computational model of *Boolean circuits*/*straightline programs*.
- See the NAND operation and also why the specific choice of NAND is not important.
- Examples of computing in the physical world.
- Equivalence of circuits and programs.

“there is no reason why mental as well as bodily labor should not be economized by the aid of machinery”, Charles Babbage, 1852

“If, unwarned by my example, any man shall undertake and shall succeed in constructing an engine embodying in itself the whole of the executive department of mathematical analysis upon different principles or by simpler mechanical means, I have no fear of leaving my reputation in his charge, for he alone will be fully able to appreciate the nature of my efforts and the value of their results.”, Charles Babbage, 1864

“To understand a program you must become both the machine and the program.”, Alan Perlis, 1982

People have been computing for thousands of years, with aids that
include not just pen and paper, but also abacuses, slide rules, various
mechanical devices, and modern electronic computers. A priori, the
notion of computation seems to be tied to the particular mechanism that
you use. You might think that the “best” algorithm for multiplying
numbers will be different if you implement it in *Python* on a modern
laptop than if you use pen and paper. However, as we saw in the introduction
(chapintro), an algorithm that is asymptotically better would
eventually beat a worse one regardless of the underlying technology.
This gives us hope for a *technology independent* way of defining
computation, which is what we will do in this chapter.

The name “algorithm” is derived from the Latin transliteration of
Muhammad ibn Musa al-Khwarizmi’s name. Al-Khwarizmi was a Persian
scholar during the 9th century whose books introduced the western world
to the decimal positional numeral system, as well as the solutions of
linear and quadratic equations (see alKhwarizmi). However,
al-Khwarizmi’s descriptions of algorithms were rather informal by
today’s standards. Rather than use “variables” such as \(x,y\), he used
concrete numbers such as 10 and 39, and trusted the reader to be able to
extrapolate from these examples.

Here is how al-Khwarizmi described the algorithm for solving an equation
of the form \(x^2 +bx = c\):

[How to solve an equation of the form] “roots and squares are equal to numbers”: For instance “one square, and ten roots of the same, amount to thirty-nine dirhems” that is to say, what must be the square which, when increased by ten of its own root, amounts to thirty-nine? The solution is this: you halve the number of the roots, which in the present instance yields five. This you multiply by itself; the product is twenty-five. Add this to thirty-nine; the sum is sixty-four. Now take the root of this, which is eight, and subtract from it half the number of roots, which is five; the remainder is three. This is the root of the square which you sought for; the square itself is nine.

For the purposes of this course, we will need a much more precise way to
describe algorithms. Fortunately (or is it unfortunately?), at least at
the moment, computers lag far behind school-age children in learning
from examples. Hence in the 20th century people have come up with exact
formalisms for describing algorithms, namely *programming languages*.
Here is al-Khwarizmi’s quadratic equation solving algorithm described in
the *Python* programming language:
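A minimal sketch of those steps follows. The function name `solve_eq` and the restriction to positive \(b\) and \(c\) (as in al-Khwarizmi’s own example) are assumptions made here for illustration.

```python
from math import sqrt

def solve_eq(b, c):
    """Return a solution x of x^2 + b*x = c, following al-Khwarizmi's steps.

    Assumes b and c are positive, as in his worked example.
    """
    half_b = b / 2             # "halve the number of the roots"
    square = half_b * half_b   # "this you multiply by itself"
    total = square + c         # "add this to thirty-nine"
    root = sqrt(total)         # "now take the root of this"
    return root - half_b       # "subtract from it half the number of roots"

print(solve_eq(10, 39))  # al-Khwarizmi's example: prints 3.0
```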

We can define algorithms informally as follows:

**Informal definition of an algorithm:** An *algorithm* is a set of instructions for how to compute an output from an input by following a sequence of “elementary steps”. An algorithm \(A\) *computes* a function \(F\) if for every input \(x\), if we follow the instructions of \(A\) on the input \(x\), we obtain the output \(F(x)\).

In this chapter we will use an ultra-simple “programming language” to
give a *formal* (that is, *precise*) definition of algorithms. (In fact,
our programming language will be so simple that it is hardly worthy of
this name.) However, it will take us some time to get there. We will
start by discussing what “elementary operations” are and also how we
map a description of an algorithm into an actual physical process that
produces an output from an input in the real world.

An algorithm breaks down a complex calculation into a series of simpler steps. These steps can be executed by:

- Writing down symbols on a piece of paper
- Modifying the current flowing on electrical wires.
- Binding a protein to a strand of DNA
- Response to a stimulus by a member of a collection (e.g., a bee in a colony, a trader in a market).

Let us try to “err on the side of simplicity” and model computation in the simplest possible way. We will think of the most basic of computational steps. For example, here are some very simple functions:

- \(OR:\{0,1\}^2 \rightarrow \{0,1\}\) defined as

\[OR(a,b) = \begin{cases} 0 & a=b=0 \\ 1 & \text{otherwise} \end{cases}\]

- \(AND:\{0,1\}^2 \rightarrow \{0,1\}\) defined as

\[AND(a,b) = \begin{cases} 1 & a=b=1 \\ 0 & \text{otherwise} \end{cases}\]

- \(NOT:\{0,1\} \rightarrow \{0,1\}\) defined as \(NOT(a) = 1-a\).

Each one of these functions takes either one or two single bits as
input, and produces a single bit as output. Clearly, it cannot get much
more basic than these. However, the power of computation comes from
*composing* simple building blocks together.

Let us see how we can obtain a different function from these building blocks: Define \(XOR:\{0,1\}^2 \rightarrow \{0,1\}\) to be the function \(XOR(a,b)= a + b \mod 2\). That is, \(XOR(0,0)=XOR(1,1)=0\) and \(XOR(1,0)=XOR(0,1)=1\). We claim that we can construct \(XOR\) using only \(AND\), \(OR\), and \(NOT\).

Here is an algorithm to compute \(XOR(a,b)\) using \(AND,NOT,OR\) as basic operations:

- Compute \(w1 = AND(a,b)\)
- Compute \(w2 = NOT(w1)\)
- Compute \(w3 = OR(a,b)\)
- Output \(AND(w2,w3)\)

We can also express this algorithm graphically, see
andornotcircxorfig. Such diagrams are often known as *Boolean
circuits*, and each basic operation is known as a *gate*. This is a
point of view that we will revisit often in this course.

Last but not least, we can also express it in Python code (see below).
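One way to write this in Python, as a minimal sketch: the arithmetic definitions of `AND`, `OR` and `NOT` on the integers 0 and 1 are an illustrative choice, and `XOR` follows the four steps listed above.

```python
def AND(a, b):
    return a * b

def OR(a, b):
    return 1 if a + b else 0

def NOT(a):
    return 1 - a

def XOR(a, b):
    w1 = AND(a, b)
    w2 = NOT(w1)
    w3 = OR(a, b)
    return AND(w2, w3)

# Print the full truth table of XOR.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, XOR(a, b))
```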

Extending the same ideas, we can use these basic operations to compute the function \(XOR_3:\{0,1\}^3 \rightarrow \{0,1\}\) defined as \(XOR_3(a,b,c) = a + b + c (\mod 2)\) by computing first \(d=XOR(a,b)\) and then outputting \(XOR(d,c)\). In Python this is done as follows:
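A minimal sketch, with the helper definitions repeated so that the block runs on its own (the name `XOR3` is chosen here for illustration):

```python
def AND(a, b):
    return a * b

def OR(a, b):
    return 1 if a + b else 0

def NOT(a):
    return 1 - a

def XOR(a, b):
    return AND(NOT(AND(a, b)), OR(a, b))

def XOR3(a, b, c):
    d = XOR(a, b)      # first combine a and b
    return XOR(d, c)   # then combine the result with c

print(XOR3(1, 1, 1))  # prints 1, since 1 + 1 + 1 is odd
```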

Make sure you see how to generalize this and obtain a way to compute \(XOR_n:\{0,1\}^n \rightarrow \{0,1\}\) for every \(n\) using at most \(4n\) basic steps involving applications of a function in \(\{ AND, OR , NOT \}\) to inputs or previously computed values.

Here is another function we can compute using \(AND,OR,NOT\). The \(NAND\) function maps \(\{0,1\}^2\) to \(\{0,1\}\) and is defined as

\[NAND(a,b) = \begin{cases} 0 & a=b=1 \\ 1 & \text{otherwise} \end{cases}\]

As its name implies, \(NAND\) is the NOT of AND (i.e., \(NAND(a,b)= NOT(AND(a,b))\)), and so we can clearly compute \(NAND\) using \(AND\) and \(NOT\). Interestingly, the opposite direction also holds:

We can compute \(AND\), \(OR\), and \(NOT\) by composing only the \(NAND\) function.

We start with the following observation. For every \(a\in \{0,1\}\), \(AND(a,a)=a\). Hence, \(NAND(a,a)=NOT(AND(a,a))=NOT(a)\). This means that \(NAND\) can compute \(NOT\). By the principle of “double negation”, \(AND(a,b)=NOT(NOT(AND(a,b)))=NOT(NAND(a,b))\), and so we can use \(NAND\) to compute \(AND\) as well. Once we can compute \(AND\) and \(NOT\), we can compute \(OR\) using the so-called “De Morgan’s Law”: \(OR(a,b)=NOT(AND(NOT(a),NOT(b)))\) for every \(a,b \in \{0,1\}\).

univnandonethm’s proof is very simple, but you should make
sure that **(i)** you understand the statement of the theorem, and
**(ii)** you follow its proof completely. In particular, you should make
sure you understand why De Morgan’s law is true.

If you are so inclined, you can also verify the proof of univnandonethm using Python:
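A minimal sketch of such a check: `NOT`, `AND` and `OR` are built only from `NAND`, exactly as in the proof, and compared against Python’s own bit operations on every input. The helper name `check_universality` is an assumption made here.

```python
def NAND(a, b):
    return 1 - a * b

def NOT(a):
    return NAND(a, a)

def AND(a, b):
    return NOT(NAND(a, b))           # double negation

def OR(a, b):
    return NAND(NOT(a), NOT(b))      # De Morgan's law

def check_universality():
    """Compare the NAND-based gates with Python's operators on all inputs."""
    for a in (0, 1):
        if NOT(a) != 1 - a:
            return False
        for b in (0, 1):
            if AND(a, b) != (a & b) or OR(a, b) != (a | b):
                return False
    return True

print(check_universality())  # prints True
```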

Let \(MAJ_3: \{0,1\}^3 \rightarrow \{0,1\}\) be the function that on input \(a,b,c\) outputs \(1\) iff \(a+b+c \geq 2\). Show how to compute \(MAJ_3\) using a composition of \(NAND\)’s.

To solve this problem, we will first express \(MAJ_3\) using \(AND\), \(OR\), \(NOT\), and then use univnandonethm to replace those with only \(NAND\). We can very naturally express the statement “At least two of \(a,b,c\) are equal to \(1\)” using OR’s and AND’s. Specifically, this is true if at least one of the values \(AND(a,b)\), \(AND(a,c)\), \(AND(b,c)\) is true. So we can write \[MAJ_3(a,b,c) = OR(OR(AND(a,b),AND(a,c)),AND(b,c)) \label{eqmajusingandor} \;.\]

Now we can use the equivalences \(AND(a,b)=NOT(NAND(a,b))\), \(OR(a,b)=NAND(NOT(a),NOT(b))\), and \(NOT(a)=NAND(a,a)\) to replace the right-hand side of \eqref{eqmajusingandor} with an expression involving only \(NAND\), yielding \[ MAJ_3(a,b,c) = NAND(NAND(NAND(NAND(a,b),NAND(a,c)),NAND(NAND(a,b),NAND(a,c))),NAND(b,c)) \;.\]
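As a quick check of this expression, the following minimal Python sketch evaluates it on all eight inputs. The repeated sub-expression \(NAND(NAND(a,b),NAND(a,c))\) is computed once and stored in a variable; the variable names are chosen here for illustration.

```python
from itertools import product

def NAND(a, b):
    return 1 - a * b

def MAJ3(a, b, c):
    ab = NAND(a, b)
    ac = NAND(a, c)
    bc = NAND(b, c)
    p = NAND(ab, ac)        # equals OR(AND(a,b), AND(a,c))
    not_p = NAND(p, p)      # equals NOT(p)
    return NAND(not_p, bc)  # equals OR(p, AND(b,c))

# Compare with the definition: output 1 iff a + b + c >= 2.
for a, b, c in product((0, 1), repeat=3):
    assert MAJ3(a, b, c) == (1 if a + b + c >= 2 else 0)
```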

This corresponds to the following circuit with \(NAND\) gates:

univnandonethm tells us that we can use applications of the single function \(NAND\) to obtain \(AND\), \(OR\), \(NOT\), and so by extension all the other functions that can be built up from them. So, if we wanted to decide on a “basic operation”, we might as well choose \(NAND\), as we’ll get “for free” the three other operations \(AND\), \(OR\) and \(NOT\). This suggests the following definition of an “algorithm”:

**Semi-formal definition of an algorithm:** An *algorithm* consists of a sequence of steps of the form “store the NAND of variables `bar` and `blah` in variable `foo`”. An algorithm \(A\) *computes* a function \(F\) if for every input \(x\) to \(F\), if we feed \(x\) as input to the algorithm, the value computed in its last step is \(F(x)\).

There are several concerns that are raised by this definition:

- First and foremost, this definition is indeed too informal. We do not specify exactly what each step does, nor what it means to “feed \(x\) as input”.
- Second, the choice of \(NAND\) as a basic operation seems arbitrary. Why just \(NAND\)? Why not \(AND\), \(OR\) or \(NOT\)? Why not allow operations like addition and multiplication? What about any other logical constructions such as `if`/`then` or `while`?
- Third, do we even know that this definition has anything to do with actual computing? If someone gave us a description of such an algorithm, could we use it to actually compute the function in the real world?

These concerns will to a large extent guide us in the upcoming chapters. Thus you would be well advised to re-read the above informal definition and see what you think about these issues.

A large part of this course will be devoted to addressing the above issues. We will see that:

- We can make the definition of an algorithm fully formal, and so give a precise mathematical meaning to statements such as “Algorithm \(A\) computes function \(F\)”.
- While the choice of \(NAND\) is arbitrary, and we could just as well choose some other functions, we will also see this choice does not matter much. Our notion of an algorithm is not more restrictive because we only think of \(NAND\) as a basic step. We have already seen that allowing \(AND\), \(OR\), \(NOT\) as basic operations will not add any power (because we can compute them from \(NAND\)’s via univnandonethm). We will see that the same is true for addition, multiplication, and essentially every other operation that could be reasonably thought of as a basic step.
- It turns out that we can and do compute such “\(NAND\) based algorithms” in the real world. First of all, such an algorithm is clearly well specified, and so can be executed by a human with a pen and paper. Second, there are a variety of ways to *mechanize* this computation. We’ve already seen that we can write Python code that corresponds to following such a list of instructions. But in fact we can directly implement operations such as \(NAND\), \(AND\), \(OR\), \(NOT\), etc. via electronic signals using components known as *transistors*. This is how modern electronic computers operate.

In the remainder of this chapter, we will begin to answer some of these
questions. We will see more examples of the power of simple operations
like \(NAND\) (or equivalently, \(AND\)/\(OR\)/\(NOT\), as well as many other
choices) to compute more complex operations including addition,
multiplication, sorting and more. We will then discuss how to
*physically implement* simple operations such as NAND using a variety of
technologies. Finally we will define *The NAND programming language*
that will be our formal model of computation.

We have seen that using \(NAND\), we can compute \(AND\), \(OR\), \(NOT\) and \(XOR\). But this still seems a far cry from being able to add and multiply numbers, not to mention more complex programs such as sorting and searching, solving equations, manipulating images, and so on. We now give a few examples demonstrating how we can use these simple operations to do some more complicated tasks. While we will not go as far as implementing Call of Duty using \(NAND\), we will at least show how we can compose \(NAND\) operations to perform tasks such as addition, multiplication, and comparison.

We can describe the computation of a function
\(F:\{0,1\}^n \rightarrow \{0,1\}\) via a composition of \(NAND\) operations
in terms of a *circuit*, as was done in andornotcircxorfig.
Since in our case, all the gates are the same function (i.e., \(NAND\)),
the description of the circuit is even simpler. We can think of the
circuit as a directed graph. It has a vertex for every one of the input
bits, and also for every intermediate value we use in our computation.
If we compute a value \(u\) by applying \(NAND\) to \(v\) and \(w\) then we put
directed edges from \(v\) to \(u\) and from \(w\) to \(u\). We will follow the
convention of using “\(x\)” for inputs and “\(y\)” for outputs, and hence
write \(x_0,x_1,\ldots\) for our inputs and \(y_0,y_1,\ldots\) for our
outputs. (We will sometimes also write these as `X[0]`, `X[1]`, \(\ldots\)
and `Y[0]`, `Y[1]`, \(\ldots\) respectively.) Here is a more formal
definition:

Before reading the formal definition, it would be an extremely good
exercise for you to pause here and try to think how *you* would formally
define the notion of a NAND circuit. Sometimes working out the
definition for yourself is easier than parsing its text.

Let \(n,m,s > 0\). A NAND circuit \(C\) with \(n\) inputs, \(m\) outputs, and \(s\) gates is a labeled directed acyclic graph (DAG) with \(n+s\) vertices such that:

- \(C\) has \(n\) vertices with no incoming edges, which are called the *input vertices* and are labeled with `X[`\(0\)`]`, \(\ldots\), `X[`\(n-1\)`]`.
- \(C\) has \(s\) vertices each with exactly two (possibly parallel) incoming edges, which are called the *gates*.
- \(C\) has \(m\) gates which are called the *output vertices* and are labeled with `Y[`\(0\)`]`, \(\ldots\), `Y[`\(m-1\)`]`. The output vertices have no outgoing edges.

For \(x\in \{0,1\}^n\), the *output* of \(C\) on input \(x\), denoted by
\(C(x)\), is computed in the natural way. For every \(i\in [n]\), we assign
to the input vertex `X[`\(i\)`]` the value \(x_i\), and then repeatedly
assign to every gate the value which is the NAND of the values assigned
to its two incoming neighbors. The output is the string \(y\in \{0,1\}^m\)
such that for every \(j \in [m]\), \(y_j\) is the value assigned to the
output gate labeled with `Y[`\(j\)`]`.

nandcircdef is perhaps our first encounter with a somewhat complicated definition. When you are faced with such a definition, there are several strategies to try to understand it:

- First, as we suggested above, you might want to see how *you* would formalize the intuitive notion that the definition tries to capture. If we made different choices than you would, try to think why that is the case.
- Then, you should read the definition carefully, making sure you understand all the terms that it uses, and all the conditions it imposes.
- Finally, try to see how the definition corresponds to simple examples such as the NAND circuit presented in eqmajusingandor, as well as the examples illustrated below.

We now present some examples of \(NAND\) circuits for various natural problems:

Recall the \(XOR\) function which maps \(x_0,x_1 \in \{0,1\}\) to \(x_0 + x_1 \mod 2\). We have seen in XORandornotexample that we can compute this function using \(AND\), \(OR\), and \(NOT\), and so by univnandonethm we can compute it using only \(NAND\)’s. However, the following is a direct construction of computing \(XOR\) by a sequence of NAND operations:

- Let \(u = NAND(x_0,x_1)\).
- Let \(v = NAND(x_0,u)\).
- Let \(w = NAND(x_1,u)\).
- The \(XOR\) of \(x_0\) and \(x_1\) is \(y_0 = NAND(v,w)\).

(We leave it to you to verify that this algorithm does indeed compute \(XOR\).)
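If you would like to check your answer mechanically, here is a minimal Python sketch that runs the four steps on every input and compares the result with \(x_0 + x_1 \mod 2\). The function name `xor_from_nand` is chosen here for illustration.

```python
def NAND(a, b):
    return 1 - a * b

def xor_from_nand(x0, x1):
    u = NAND(x0, x1)
    v = NAND(x0, u)
    w = NAND(x1, u)
    return NAND(v, w)   # this is y_0

for x0 in (0, 1):
    for x1 in (0, 1):
        assert xor_from_nand(x0, x1) == (x0 + x1) % 2
```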

We can also represent this algorithm graphically as a circuit:

We now present a few more examples of computing natural functions by a sequence of \(NAND\) operations.

Consider the task of computing, given as input a string \(x\in \{0,1\}^n\) that represents a natural number \(X\in \N\), the representation of \(X+1\). That is, we want to compute the function \(INC_n:\{0,1\}^n \rightarrow \{0,1\}^{n+1}\) such that for every \(x_0,\ldots,x_{n-1}\), \(INC_n(x)=y\) which satisfies \(\sum_{i=0}^n y_i 2^i = \left( \sum_{i=0}^{n-1} x_i 2^i \right)+1\).

The increment operation can be very informally described as follows:
*“Add \(1\) to the least significant bit and propagate the carry”*. A
little more precisely, in the case of the binary representation, to
obtain the increment of \(x\), we scan \(x\) from the least significant bit
onwards, and flip all \(1\)’s to \(0\)’s until we encounter a bit equal to
\(0\), in which case we flip it to \(1\) and stop. (Please verify you
understand why this is the case.)

Thus we can compute the increment of \(x_0,\ldots,x_{n-1}\) by doing the following:

- Set \(c_0=1\) (we pretend we have a “carry” of \(1\) initially)
- For \(i=0,\ldots, n-1\) do the following:

- Let \(y_i = XOR(x_i,c_i)\).
- If \(c_i=x_i=1\) then \(c_{i+1}=1\), else \(c_{i+1}=0\).

- Set \(y_n = c_n\).

The above is a very precise description of an algorithm to compute the
increment operation, and can be easily transformed into *Python* code
that performs the same computation, but it does not seem to directly
yield a NAND circuit to compute this. However, we can transform this
algorithm line by line to a NAND circuit. For example, since for every
\(a\), \(NAND(a,NOT(a))=1\), we can replace the initial statement \(c_0=1\)
with \(c_0 = NAND(x_0,NAND(x_0,x_0))\). We already know how to compute
\(XOR\) using NAND, so line 2.a can be replaced by some NAND operations.
Next, we can write line 2.b as simply saying \(c_{i+1} = AND(c_i,x_i)\),
or in other words \(c_{i+1}=NAND(NAND(c_i,x_i),NAND(c_i,x_i))\). Finally,
the assignment \(y_n = c_n\) can be written as
\(y_n = NAND(NAND(c_n,c_n),NAND(c_n,c_n))\). Combining these observations
yields for every \(n\in \N\), a \(NAND\) circuit to compute \(INC_n\). For
example, this is what this circuit looks like for \(n=4\).
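The line-by-line translation can be sketched in Python using only calls to `NAND`. In this sketch the carry \(c_{i+1}\) is computed as \(AND(c_i,x_i)\), that is, a carry is produced exactly when both the incoming carry and the current bit equal \(1\). Bits are stored least significant first; the names `INC`, `to_bits` and `from_bits` are assumptions made here.

```python
def NAND(a, b):
    return 1 - a * b

def XOR(a, b):
    u = NAND(a, b)
    return NAND(NAND(a, u), NAND(b, u))

def INC(x):
    """x is a list of n bits, least significant first; returns n+1 bits."""
    c = NAND(x[0], NAND(x[0], x[0]))   # c_0 = 1, since NAND(a, NOT(a)) = 1
    y = []
    for i in range(len(x)):
        y.append(XOR(x[i], c))         # y_i = XOR(x_i, c_i)
        t = NAND(c, x[i])
        c = NAND(t, t)                 # c_{i+1} = AND(c_i, x_i)
    y.append(c)                        # y_n = c_n
    return y

def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

print(INC([1, 1, 0, 0]))  # 3 + 1 = 4: prints [0, 0, 1, 0, 0]
```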

Once we have the increment operation, we can certainly compute addition by repeatedly incrementing (i.e., compute \(x+y\) by performing \(INC(x)\) \(y\) times). However, that would be quite inefficient and unnecessary. With the same idea of keeping track of carries we can implement the “grade-school” algorithm for addition to compute the function \(ADD_n:\{0,1\}^{2n} \rightarrow \{0,1\}^{n+1}\) that on input \(x\in \{0,1\}^{2n}\) outputs the binary representation of the sum of the numbers represented by \(x_0,\ldots,x_{n-1}\) and \(x_n,\ldots,x_{2n-1}\):

- Set \(c_0=0\).
- For \(i=0,\ldots,n-1\):

- Let \(y_i = x_i + x_{n+i} + c_i (\mod 2)\).
- If \(x_i + x_{n+i} + c_i \geq 2\) then \(c_{i+1}=1\), else \(c_{i+1}=0\).

- Let \(y_n = c_n\).

Once again, this can be translated into a NAND circuit. To transform Step 2.b to a NAND circuit we use the fact (shown in majbynandex) that the function \(MAJ_3:\{0,1\}^3 \rightarrow \{0,1\}\) can be computed using \(NAND\)s.
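A minimal Python sketch of this translation, again using only calls to `NAND`: the sum bit is two nested `XOR`s and the carry is `MAJ3`. The input holds the first number in positions \(0,\ldots,n-1\) and the second in positions \(n,\ldots,2n-1\), least significant bit first. The helper names `to_bits` and `from_bits` are assumptions made here.

```python
def NAND(a, b):
    return 1 - a * b

def XOR(a, b):
    u = NAND(a, b)
    return NAND(NAND(a, u), NAND(b, u))

def MAJ3(a, b, c):
    p = NAND(NAND(a, b), NAND(a, c))
    return NAND(NAND(p, p), NAND(b, c))

def ADD(x):
    """x is a list of 2n bits; returns the n+1 bits of the sum."""
    n = len(x) // 2
    one = NAND(x[0], NAND(x[0], x[0]))
    c = NAND(one, one)                          # c_0 = 0
    y = []
    for i in range(n):
        y.append(XOR(XOR(x[i], x[n + i]), c))   # x_i + x_{n+i} + c_i mod 2
        c = MAJ3(x[i], x[n + i], c)             # 1 iff the sum is at least 2
    y.append(c)                                 # y_n = c_n
    return y

def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

print(from_bits(ADD(to_bits(3, 3) + to_bits(5, 3))))  # prints 8
```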

*Computation* is an abstract notion, that is distinct from its physical
*implementations*. While most modern computing devices are obtained by
mapping logical gates to semiconductor-based transistors, over history
people have computed using a huge variety of mechanisms, including
mechanical systems, gas and liquid (known as *fluidics*), biological and
chemical processes, and even living creatures (e.g., see crabfig or this
video for how crabs or slime mold can be used to do computations).

In this section we will review some of these implementations, both so
you can get an appreciation of how it is possible to directly translate
NAND programs to the physical world, without going through the entire
stack of architecture, operating systems, compilers, etc., as well as
to emphasize that silicon-based processors are by no means the only way
to perform computation. Indeed, as we will see much later in this
course, a very exciting recent line of work involves using different
media for computation that would allow us to take advantage of *quantum
mechanical effects* to enable different types of algorithms.

A *transistor* can be thought of as an electric circuit with two inputs,
known as *source* and *gate* and an output, known as the *sink*. The
gate controls whether current flows from the source to the sink. In a
*standard transistor*, if the gate is “ON” then current can flow from
the source to the sink and if it is “OFF” then it can’t. In a
*complementary transistor* this is reversed: if the gate is “OFF” then
current can flow from the source to the sink and if it is “ON” then it
can’t.

There are several ways to implement the logic of a transistor. For
example, we can use faucets to implement it using water pressure (e.g.
transistor-water-fig). Early electronic implementations used *vacuum tubes*. As its name implies, a vacuum tube
is a tube containing nothing (i.e., vacuum) and where a priori electrons
could freely flow from source (a wire) to the sink (a plate). However,
there is a gate (a grid) between the two, where modulating its voltage
can block the flow of electrons.

Early vacuum tubes were roughly the size of lightbulbs (and looked very
much like them too). In the 1950’s they were supplanted by
*transistors*, which implement the same logic using *semiconductors*,
materials that normally do not conduct electricity but whose
conductivity can be modified and controlled by inserting impurities
(“doping”) and applying an external electric field (this is known as the
*field effect*). In the 1960’s computers started to be implemented using
*integrated circuits*, which enabled much greater density. In 1965,
Gordon Moore predicted that the number of transistors per circuit would
double every year (see moorefig), and that this would lead to
“such wonders as home computers —or at least terminals connected to a
central computer— automatic controls for automobiles, and personal
portable communications equipment”. Since then, (adjusted versions of)
this so-called “Moore’s law” has been running strong, though exponential
growth cannot be sustained forever, and some physical limitations are
already becoming apparent.

We can use transistors to implement a *NAND gate*, which would be a
system with two input wires \(x,y\) and one output wire \(z\), such that if
we identify high voltage with “\(1\)” and low voltage with “\(0\)”, then the
wire \(z\) will be equal to “\(1\)” if and only if the NAND of the values of
the wires \(x\) and \(y\) is \(1\) (see transistor-nand-fig).

This means that if there exists a NAND circuit to compute a function \(F:\{0,1\}^n \rightarrow \{0,1\}^m\), then we can compute \(F\) in the physical world using transistors as well.

Electronic transistors are in no way the only technology that can implement computation. There are many mechanical, chemical, biological, or even social systems that can be thought of as computing devices. We now discuss some of these examples.

Computation can be based on biological or chemical systems.
For example the *lac* operon produces the enzymes needed to digest
lactose only if the conditions \(x \wedge (\neg y)\) hold where \(x\) is
“lactose is present” and \(y\) is “glucose is present”. Researchers have
managed to create transistors, and from them the NAND function and other
logic gates, based on DNA molecules (see also transcriptorfig). One
motivation for DNA computing is to achieve increased parallelism or
storage density; another is to create “smart biological agents” that
could perhaps be injected into bodies, replicate themselves, and fix or
kill cells that were damaged by a disease such as cancer. Computing in
biological systems is not restricted of course to DNA. Even larger
systems such as flocks of birds can be considered as computational
processes.

*Cellular automata* are a model of a system composed of a sequence of
*cells*, each of which can be in one of a finite number of states. At
each step, a cell updates its state based on the states of its
*neighboring cells* and some simple rules. As we will discuss later in
this course, cellular automata such as Conway’s “Game of Life” can be
used to simulate computation gates, see gameoflifefig.

One computation device that we all carry with us is our own *brain*.
Brains have served humanity throughout history, doing computations that
range from distinguishing prey from predators, through making scientific
discoveries and artistic masterpieces, to composing witty 280 character
messages. The exact working of the brain is still not fully understood,
but it seems that to a first approximation it can be modeled by a (very
large) *neural network*.

A neural network is a Boolean circuit that instead of \(NAND\) (or even
\(AND\)/\(OR\)/\(NOT\)) uses some other gates as its basis. For example,
one particular basis we can use is *threshold gates*. For every vector
\(w= (w_0,\ldots,w_{k-1})\) of integers and integer \(t\) (some or all of
which could be negative), the *threshold function corresponding to \(w,t\)*
is the function \(T_{w,t}:\{0,1\}^k \rightarrow \{0,1\}\) that maps
\(x\in \{0,1\}^k\) to \(1\) if and only if
\(\sum_{i=0}^{k-1} w_i x_i \geq t\). For example, the threshold function
\(T_{w,t}\) corresponding to \(w=(1,1,1,1,1)\) and \(t=3\) is simply the
majority function \(MAJ_5\) on \(\{0,1\}^5\). The function
\(NAND:\{0,1\}^2 \rightarrow \{0,1\}\) is the threshold function
corresponding to \(w=(-1,-1)\) and \(t=-1\), since \(NAND(x_0,x_1)=1\) if and
only if \(x_0 + x_1 \leq 1\) or equivalently, \(-x_0 - x_1 \geq -1\).
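The two examples above can be checked with a short Python sketch. The function name `threshold` is an assumption made here; it returns \(T_{w,t}\) as a Python function on tuples of bits.

```python
from itertools import product

def threshold(w, t):
    """Return the threshold function T_{w,t} for integer weights w and integer t."""
    def T(x):
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= t else 0
    return T

MAJ5 = threshold((1, 1, 1, 1, 1), 3)
NAND_T = threshold((-1, -1), -1)

# NAND as a threshold gate agrees with the usual definition on all inputs.
for x0, x1 in product((0, 1), repeat=2):
    assert NAND_T((x0, x1)) == 1 - x0 * x1

# MAJ5 outputs 1 exactly when at least three of the five inputs are 1.
for x in product((0, 1), repeat=5):
    assert MAJ5(x) == (1 if sum(x) >= 3 else 0)
```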

Threshold gates can be thought of as an approximation for *neuron cells*
that make up the core of human and animal brains. To a first
approximation, a neuron has \(k\) inputs and a single output and the
neuron “fires” or “turns on” its output when its input signals pass some
threshold. Unlike the cases above, when we considered the number of
inputs to a gate \(k\) to be a small constant, in such neural networks we
often do not put any bound on the number of inputs. However, since any
threshold function on \(k\) inputs can be computed by a NAND circuit of at
most \(poly(k)\) gates (see threshold-nand-ex), NAND circuits
are no less powerful than neural networks.


We now turn to formally defining the notion of algorithm. We use a
*programming language* to do so. We define the *NAND Programming
Language* to be a programming language where every line has the
following form:

`foo = NAND(bar,blah)`

where `foo`, `bar` and `blah` are variable identifiers. (We use `foo`,
`bar`, `baz`, `blah` as stand-ins for generic identifiers. Generally
a variable identifier in the NAND programming language can be any
combination of letters and numbers, and we will also sometimes have
identifiers such as `Foo[12]` that end with a number inside square
brackets. Later in the course we will introduce programming
languages where such identifiers carry special meaning as *arrays*.
At the moment you can treat them as simply any other identifier. The
appendix contains a full formal specification of the NAND
programming language.)

Here is an example of a NAND program:

`u = NAND(X[0],X[1])`

`v = NAND(X[0],u)`

`w = NAND(X[1],u)`

`Y[0] = NAND(v,w)`

Do you know what function this program computes? Hint: you have seen it before.

As you might have guessed from this example, we have two special types
of variables in the NAND language: *input variables* have the form
`X[`\(i\)`]` where \(i\) is a natural number, and *output variables* have the
form `Y[`\(j\)`]` where \(j\) is a natural number. When a NAND program is
*executed* on input \(x \in \{0,1\}^n\), the variable `X[`\(i\)`]` is
assigned the value \(x_i\) for all \(i\in [n]\). The *output* of the program
is the list of \(m\) values `Y[0]`, \(\ldots\), `Y[`\(m-1\)`]`, where \(m-1\) is
the largest index for which the variable `Y[`\(m-1\)`]` is assigned a
value in the program. If a line of the form `foo = NAND(bar,blah)`
appears in the program, then if `bar` is *not* an input variable of the
form `X[`\(i\)`]`, then it must have been assigned a value in a previous
line, and the same holds for `blah`. We also forbid assigning a value to
an input variable, and applying the NAND operation to an output
variable.
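These execution rules can be sketched as a tiny interpreter in Python. This is an illustration under simplifying assumptions, not the formal specification from the appendix: the function name `run_nand` is hypothetical, a program is given as a string with one instruction per line, and identifiers are assumed to contain no commas or `=` signs.

```python
def run_nand(program, x):
    """Execute a NAND program on the list of input bits x; return the output bits."""
    values = {f"X[{i}]": bit for i, bit in enumerate(x)}
    for line in program.strip().splitlines():
        foo, rhs = (part.strip() for part in line.split("="))
        if not (rhs.startswith("NAND(") and rhs.endswith(")")):
            raise ValueError(f"not a NAND line: {line!r}")
        bar, blah = (arg.strip() for arg in rhs[5:-1].split(","))
        if foo.startswith("X["):
            raise ValueError("cannot assign a value to an input variable")
        for name in (bar, blah):
            if name.startswith("Y["):
                raise ValueError("cannot apply NAND to an output variable")
            if name not in values:
                raise ValueError(f"{name} is used before it is assigned")
        values[foo] = 1 - values[bar] * values[blah]
    # m - 1 is the largest index j for which Y[j] was assigned a value.
    m = 1 + max(int(name[2:-1]) for name in values if name.startswith("Y["))
    return [values[f"Y[{j}]"] for j in range(m)]

example = """
u = NAND(X[0],X[1])
v = NAND(X[0],u)
w = NAND(X[1],u)
Y[0] = NAND(v,w)
"""
print([run_nand(example, [a, b]) for a in (0, 1) for b in (0, 1)])
```

Running the interpreter on all four inputs is one way to answer the question posed about the example program above.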

We can now formally define the notion of a function being computed by a NAND program:

Let \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) be some function, and let \(P\) be
a NAND program. We say that \(P\) *computes* the function \(F\) if:

- \(P\) has \(n\) input variables `X[`\(0\)`]`\(,\ldots,\)`X[`\(n-1\)`]` and \(m\) output variables `Y[`\(0\)`]`, \(\ldots\), `Y[`\(m-1\)`]`.
- For every \(x\in \{0,1\}^n\), if we execute \(P\) when we assign to `X[`\(0\)`]`\(,\ldots,\)`X[`\(n-1\)`]` the values \(x_0,\ldots,x_{n-1}\), then at the end of the execution, the output variables `Y[`\(0\)`]`, \(\ldots\), `Y[`\(m-1\)`]` have the values \(y_0,\ldots,y_{m-1}\), where \(y=F(x)\).
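Since a NAND program has finitely many possible inputs, whether \(P\) computes \(F\) can be checked by trying all \(2^n\) of them. The sketch below assumes a program is represented as a list of `(foo, bar, blah)` triples, one per line; this representation and the names `evaluate` and `computes` are choices made here for illustration.

```python
from itertools import product

def evaluate(prog, x):
    """Run the program on input bits x and return the final value of every variable."""
    val = {f"X[{i}]": b for i, b in enumerate(x)}
    for foo, bar, blah in prog:
        val[foo] = 1 - val[bar] * val[blah]
    return val

def computes(prog, F, n, m):
    """Return True if prog computes F: {0,1}^n -> {0,1}^m, given as a Python function."""
    outputs = [f"Y[{j}]" for j in range(m)]
    assigned = {foo for foo, _, _ in prog}
    if not all(name in assigned for name in outputs):
        return False                      # some output variable is never assigned
    for x in product((0, 1), repeat=n):
        val = evaluate(prog, x)
        if tuple(val[name] for name in outputs) != tuple(F(x)):
            return False
    return True

xor_prog = [("u", "X[0]", "X[1]"), ("v", "X[0]", "u"),
            ("w", "X[1]", "u"), ("Y[0]", "v", "w")]
print(computes(xor_prog, lambda x: ((x[0] + x[1]) % 2,), 2, 1))  # prints True
```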

NANDcomp is one of the most important definitions in this book. Please make sure to read it time and again until you are sure that you understand it. A full formal specification of the execution model of NAND programs appears in the appendix.

You might have heard of a term called “Turing Complete” to describe
programming languages. (If you haven’t, feel free to ignore the rest of
this remark: we will encounter this term later in this course and define
it properly.) If so, you might wonder if the NAND programming language
has this property. The answer is no, or perhaps more accurately, the
term is not really applicable for the NAND programming language. The
reason is that, by design, the NAND programming language can only
compute *finite* functions \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) that take
a fixed number of input bits and produce a fixed number of output bits.
The term “Turing Complete” is really only applicable to programming
languages for *infinite* functions that can take inputs of arbitrary
length. We will come back to this distinction later on in the course.

So far we have described two models of computation:

- *NAND circuits*, which are obtained by applying NAND gates to inputs.
- *NAND programs*, which are obtained by repeatedly applying operations of the form `foo = NAND(bar,blah)`.

A central result is that these two models are actually equivalent:

Let \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) and \(s\in \N\). Then \(F\) is computable by a NAND program of \(s\) lines if and only if it is computable by a NAND circuit of \(s\) gates.

To understand the proof, you can first work out for yourself the
equivalence between the NAND program of NANDprogramexample and
the circuit we have seen in xornandexample; see also
progandcircfig. Generally, if we have a NAND program, we can
transform it into a circuit by mapping every line `foo = NAND(bar,blah)`
of the program into a gate `foo` that is applied to the result of the
previous gates `bar` and `blah`. (Since every line applies NAND only to
variables that have been assigned before or are input variables, we can
assume that `bar` and `blah` are either gates we already constructed or
are inputs to the circuit.) In the reverse direction, to map a circuit
\(C\) into a program \(P\) we use topological
sorting to sort the vertices of the graph of \(C\)
into an order \(v_0,v_1,\ldots,v_{n+s-1}\) such that if there is an edge
from \(v_i\) to \(v_j\) then \(j>i\). Thus we can transform every gate
(i.e., non-input vertex) of the circuit into a line in a program in an
analogous way: if \(v\) is a gate that has two incoming edges from \(u\) and
\(w\), then we add a variable `foo` corresponding to \(v\) and a line
`foo = NAND(bar,blah)` where `bar` and `blah` are the variables
corresponding to \(u\) and \(w\).

Let \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) be a function. Suppose that there exists a program \(P\) of \(s\) lines that computes \(F\). We construct a NAND circuit \(C\) to compute \(F\) as follows: the circuit will include \(n\) input vertices, and will include \(s\) gates, one for each of the lines of \(P\). We let \(I(0),\ldots,I(n-1)\) denote the vertices corresponding to the inputs and \(G(0),\ldots,G(s-1)\) denote the vertices corresponding to the lines. We connect our gates in the natural way as follows:

If the \(\ell\)-th line of \(P\) has the form `foo = NAND(bar,blah)` where
`bar` and `blah` are variables *not* of the form `X[`\(i\)`]`, then `bar`
and `blah` must have been assigned a value before. We let \(j\) and \(k\) be
the last lines before the \(\ell\)-th line in which the variables `bar`
and `blah` respectively were assigned a value. In such a case, we will
add the edges \(\overrightarrow{G(j)\;G(\ell)}\) and
\(\overrightarrow{G(k)\;G(\ell)}\) to our circuit \(C\). That is, we will
apply the gate \(G(\ell)\) to the outputs of the gates \(G(j)\) and \(G(k)\).
If `bar` is an input variable of the form `X[`\(i\)`]` then we instead add
the edge \(\overrightarrow{I(i)\;G(\ell)}\) from the corresponding input
vertex \(I(i)\), and do the analogous
step if `blah` is an input variable. Finally, for every \(j\in [m]\), if
\(\ell(j)\) is the last line which assigns a value to `Y[`\(j\)`]`, then we
mark the gate \(G(\ell(j))\) as the \(j\)-th output gate of the circuit \(C\).

We claim that the circuit \(C\) computes the same function as the program \(P\). Indeed, one can show by induction on \(\ell\) that for every input \(x\in \{0,1\}^n\), if we execute \(P\) on input \(x\), then the value assigned to the variable in the \(\ell\)-th line is the same as the value output by the gate \(G(\ell)\) in the circuit \(C\). (To see this, note that by the induction hypothesis, this is true for the values that the \(\ell\)-th line uses, as they were assigned a value in earlier lines or are inputs, and both the gate and the line compute the NAND function on these values.) Hence in particular the output variables of the program will have the same values as the output gates of the circuit.
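The program-to-circuit direction can be sketched in Python as follows. This is an illustrative sketch only, not the code from the appendix: the function names, the simplified line syntax, and the representation of a circuit as a list of gates whose two sources are tagged `("I", i)` for the input vertex \(I(i)\) or `("G", j)` for the gate \(G(j)\) are all assumptions made here.

```python
import re

NAME = r"[A-Za-z_]\w*(?:\[\d+\])?"
LINE = re.compile(rf"\s*({NAME})\s*=\s*NAND\(\s*({NAME})\s*,\s*({NAME})\s*\)\s*")

def program_to_circuit(program):
    """Return (gates, outputs): gates[l] holds the two sources of gate G(l),
    and outputs[j] is the index of the gate marked as the j-th output."""
    gates = []
    last_assigned = {}  # variable name -> last line that assigned it

    def source(var):
        is_input = re.fullmatch(r"X\[(\d+)\]", var)
        if is_input:
            return ("I", int(is_input.group(1)))  # edge from input vertex I(i)
        return ("G", last_assigned[var])          # edge from the last gate writing var

    for ell, line in enumerate(program.strip().splitlines()):
        target, bar, blah = LINE.fullmatch(line).groups()
        gates.append((source(bar), source(blah)))  # look up sources before updating
        last_assigned[target] = ell

    outputs = {}
    for var, ell in last_assigned.items():
        is_output = re.fullmatch(r"Y\[(\d+)\]", var)
        if is_output:
            outputs[int(is_output.group(1))] = ell  # gate G(l(j)) is the j-th output
    return gates, outputs

def eval_circuit(gates, outputs, x):
    """Evaluate the circuit gate by gate; gates are already in a valid order."""
    value = []
    for sources in gates:
        a, b = (x[i] if kind == "I" else value[i] for kind, i in sources)
        value.append(1 - a * b)  # NAND
    return [value[outputs[j]] for j in range(len(outputs))]

XOR_PROGRAM = """
u = NAND(X[0],X[1])
v = NAND(X[0],u)
w = NAND(X[1],u)
Y[0] = NAND(v,w)
"""

gates, outputs = program_to_circuit(XOR_PROGRAM)
print(gates, outputs)
```

Note that the sources of a line are looked up before the line's own target is recorded, so a line such as `foo = NAND(foo,bar)` refers to the previous assignment of `foo`, matching the "last line before the \(\ell\)-th line" rule in the proof.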

In the other direction, given a circuit \(C\) of \(s\) gates that computes
\(F\), we can construct a program of \(s\) lines that computes the same
function. We use a topological sort to ensure that the \(n+s\) vertices of
the graph of \(C\) are sorted so that all edges go from earlier vertices
to later ones, and ensure the first \(n\) vertices \(0,1,\ldots,n-1\)
correspond to the \(n\) inputs. (This can be ensured as input vertices
have no incoming edges.) Then for every \(\ell \in [s]\), the \(\ell\)-th
line of the program \(P\) will correspond to the vertex \(n+\ell\) of the
circuit. If vertex \(n+\ell\)’s incoming neighbors are \(j\) and \(k\), then
the \(\ell\)-th line will be of the form
`Temp[`\(\ell\)`] = NAND(Temp[`\(j-n\)`],Temp[`\(k-n\)`])`
(if \(j\) and/or \(k\)
are among the first \(n\) vertices, then we will use the corresponding
input variable `X[`\(j\)`]` and/or `X[`\(k\)`]` instead). If vertex \(n+\ell\)
is the \(j\)-th output gate, then we use `Y[`\(j\)`]` as the variable on the
lefthand side of the \(\ell\)-th line. Once again by a similar inductive
proof we can show that the program \(P\) we constructed computes the same
function as the circuit \(C\).
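The circuit-to-program direction can be sketched in a similar way. The sketch below is hedged: the function names and the circuit representation (a list of input vertices, a dictionary mapping each gate to its two incoming neighbors, and a list of output gates) are assumptions for illustration. It also assumes that each output is a distinct gate, and it uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) for the topological sort.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def circuit_to_program(inputs, gates, outputs):
    """inputs: input vertices in order; gates: gate -> (u, w) incoming neighbors;
    outputs: outputs[j] is the gate marked as the j-th output.
    Returns the lines of a NAND program with one line per gate."""
    name = {v: f"X[{i}]" for i, v in enumerate(inputs)}
    out_index = {g: j for j, g in enumerate(outputs)}
    predecessors = {g: set(pair) for g, pair in gates.items()}
    # Topological order of all vertices; keep only the gates (inputs have no line).
    order = [v for v in TopologicalSorter(predecessors).static_order() if v in gates]
    lines = []
    for ell, v in enumerate(order):
        # Output gates are assigned to Y[j]; all other gates get a fresh Temp[l].
        name[v] = f"Y[{out_index[v]}]" if v in out_index else f"Temp[{ell}]"
        u, w = gates[v]
        lines.append(f"{name[v]} = NAND({name[u]},{name[w]})")
    return lines

def run(lines, x):
    """Tiny evaluator for the generated lines, used only to check the result."""
    env = {f"X[{i}]": bit for i, bit in enumerate(x)}
    for line in lines:
        target, call = line.split(" = ")
        bar, blah = call[len("NAND("):-1].split(",")
        env[target] = 1 - env[bar] * env[blah]
    return env

# The XOR circuit, with hypothetical vertex names.
XOR_GATES = {"g3": ("g1", "g2"), "g1": ("a", "g0"), "g2": ("b", "g0"), "g0": ("a", "b")}
xor_lines = circuit_to_program(["a", "b"], XOR_GATES, ["g3"])
print("\n".join(xor_lines))
```

Unlike the proof, this sketch names temporaries by their position among the gates rather than by vertex number minus \(n\); the two conventions coincide once the inputs are placed first in the order.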

The proof of nandcircuitthm is *constructive*, in the sense
that it yields an explicit transformation from a program to a circuit and
vice versa. The appendix contains code of a *Python* function that
outputs the circuit corresponding to a program.

There is nothing special about NAND. For every set of functions
\(\mathcal{G} = \{ G_0,\ldots,G_{k-1} \}\), we can define a notion of
circuits that use elements of \(\mathcal{G}\) as gates, and a notion of a
“\(\mathcal{G}\) programming language” where every line involves assigning
to a variable `foo` the result of applying some \(G_i \in \mathcal{G}\) to
previously defined or input variables. We can use the same proof idea of
nandcircuitthm to show that \(\mathcal{G}\) circuits and
\(\mathcal{G}\) programs are equivalent. We have seen that for
\(\mathcal{G} = \{ AND,OR, NOT\}\), the resulting circuits/programs are
equivalent in power to the NAND programming language, as we can compute
\(NAND\) using \(AND\)/\(OR\)/\(NOT\) and vice versa. This turns out to be a
special case of a general phenomenon, the *universality* of \(NAND\) and
other gate sets, that we will explore in more depth later in this
course.
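To make the notion of a “\(\mathcal{G}\) programming language” concrete, here is a small Python sketch. The tuple-based line format `(target, gate, arguments)` and the names `eval_program`, `AON`, and `NAND_ONLY` are assumptions made for this illustration and do not come from the text. It evaluates straightline programs over an arbitrary gate set, then checks that \(NAND\) is computable over \(\{AND,OR,NOT\}\) and that \(OR\), \(AND\), and \(NOT\) are computable over \(\{NAND\}\).

```python
from itertools import product

def eval_program(lines, gate_set, x):
    """Run a straightline program whose lines are (target, gate name, argument names)."""
    env = {f"X[{i}]": bit for i, bit in enumerate(x)}
    for target, gate, args in lines:
        env[target] = gate_set[gate](*(env[a] for a in args))
    return env["Y[0]"]

# Two gate sets: {AND, OR, NOT} and {NAND}.
AON = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b, "NOT": lambda a: 1 - a}
NAND_ONLY = {"NAND": lambda a, b: 1 - a * b}

# NAND as an {AND, OR, NOT} program.
NAND_VIA_AON = [("t", "AND", ("X[0]", "X[1]")),
                ("Y[0]", "NOT", ("t",))]

# NOT, AND, and OR as NAND programs.
NOT_VIA_NAND = [("Y[0]", "NAND", ("X[0]", "X[0]"))]
AND_VIA_NAND = [("t", "NAND", ("X[0]", "X[1]")),
                ("Y[0]", "NAND", ("t", "t"))]
OR_VIA_NAND = [("a", "NAND", ("X[0]", "X[0]")),
               ("b", "NAND", ("X[1]", "X[1]")),
               ("Y[0]", "NAND", ("a", "b"))]

for x in product([0, 1], repeat=2):
    print(x,
          eval_program(NAND_VIA_AON, AON, x),
          eval_program(AND_VIA_NAND, NAND_ONLY, x),
          eval_program(OR_VIA_NAND, NAND_ONLY, x))
```

Replacing each gate of one gate set by the corresponding short program over the other is what makes the two languages equivalent in power, at the cost of a constant-factor increase in the number of lines.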

- An *algorithm* is a recipe for performing a computation as a sequence of “elementary” or “simple” operations.
- One candidate definition for an “elementary” operation is the \(NAND\) operation. It is an operation that is easily implementable in the physical world in a variety of methods including by electronic transistors.
- We can use \(NAND\) to compute many other functions, including majority, increment, and others.
- There are other equivalent choices, including the set \(\{AND,OR,NOT\}\).
- We can formally define the notion of a function \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) being computable using the *NAND Programming language*.
- The notions of being computable by a \(NAND\) circuit and being computable by a \(NAND\) program are equivalent.

Most of the exercises have been written in the summer of 2018 and haven’t yet been fully debugged. While I would prefer people do not post online solutions to the exercises, I would greatly appreciate if you let me know of any bugs. You can do so by posting a GitHub issue about the exercise, and optionally complement this with an email to me with more details about the attempted solution.

Define a set \(\mathcal{G}\) of functions to be a *universal basis* if we
can compute \(NAND\) using \(\mathcal{G}\). For every one of the following
sets, either prove that it is a universal basis or prove that it is not.
1. \(B = \{ \wedge, \vee, \neg \}\). (To make all of them functions on
two inputs, define \(\neg(x,y)=\overline{x}\).)

2. \(B = \{ \wedge, \vee \}\).

3. \(B= \{ \oplus,0,1 \}\) where \(\oplus:\{0,1\}^2 \rightarrow \{0,1\}\) is
the XOR function and \(0\) and \(1\) are the constant functions that output
\(0\) and \(1\).

4. \(B = \{ LOOKUP_1,0,1 \}\) where \(0\) and \(1\) are the constant functions
as above and \(LOOKUP_1:\{0,1\}^3 \rightarrow \{0,1\}\) satisfies
\(LOOKUP_1(a,b,c)\) equals \(a\) if \(c=0\) and equals \(b\) if \(c=1\).

Prove that for every subset \(B\) of the functions from \(\{0,1\}^k\) to
\(\{0,1\}\), if \(B\) is universal then there is a \(B\)-circuit of at most
\(O(k)\) gates to compute the \(NAND\) function (you can start by showing
that there is a \(B\) circuit of at most \(O(k^{16})\) gates).

Prove that for every \(w,t\), the function \(T_{w,t}\) can be computed by a
NAND program of at most \(O(k^3)\) lines.

Some topics related to this chapter that might be accessible to advanced students include:

- Efficient constructions of circuits: finding circuits of minimal size that compute certain functions.

TBC

Copyright 2018, Boaz Barak.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

HTML version is produced using the Distill template, Copyright 2018, The Distill Template Authors.
