# Defining computation

- See that computation can be precisely modeled.

- Learn the computational model of *Boolean circuits*/*straight-line programs*.
- Equivalence of circuits and straight-line programs.
- Equivalence of AND/OR/NOT and NAND.
- Examples of computing in the physical world.

“there is no reason why mental as well as bodily labor should not be economized by the aid of machinery”, Charles Babbage, 1852

“If, unwarned by my example, any man shall undertake and shall succeed in constructing an engine embodying in itself the whole of the executive department of mathematical analysis upon different principles or by simpler mechanical means, I have no fear of leaving my reputation in his charge, for he alone will be fully able to appreciate the nature of my efforts and the value of their results.”, Charles Babbage, 1864

“To understand a program you must become both the machine and the program.”, Alan Perlis, 1982

People have been computing for thousands of years, with aids that include not just pen and paper, but also abacuses, slide rules, various mechanical devices, and modern electronic computers. A priori, the notion of computation seems to be tied to the particular mechanism that you use. You might think that the “best” algorithm for multiplying numbers will be different if you implement it in *Python* on a modern laptop than if you use pen and paper. However, as we saw in the introduction (Chapter 0), an algorithm that is asymptotically better will eventually beat a worse one regardless of the underlying technology. This gives us hope for a *technology independent* way of defining computation, which is what we will do in this chapter.

## Defining computation

The name “algorithm” is derived from the Latin transliteration of Muhammad ibn Musa al-Khwarizmi’s name. Al-Khwarizmi was a Persian scholar during the 9th century whose books introduced the western world to the decimal positional numeral system, as well as to the solutions of linear and quadratic equations (see Figure 3.3). However Al-Khwarizmi’s descriptions of algorithms were rather informal by today’s standards. Rather than use “variables” such as \(x,y\), he used concrete numbers such as 10 and 39, and trusted the reader to be able to extrapolate from these examples.^{1}

Here is how al-Khwarizmi described the algorithm for solving an equation of the form \(x^2 +bx = c\):^{2}

[How to solve an equation of the form ] “roots and squares are equal to numbers”: For instance “one square, and ten roots of the same, amount to thirty-nine dirhems” that is to say, what must be the square which, when increased by ten of its own root, amounts to thirty-nine? The solution is this: you halve the number of the roots, which in the present instance yields five. This you multiply by itself; the product is twenty-five. Add this to thirty-nine; the sum is sixty-four. Now take the root of this, which is eight, and subtract from it half the number of roots, which is five; the remainder is three. This is the root of the square which you sought for; the square itself is nine.

For the purposes of this book, we will need a much more precise way to describe algorithms. Fortunately (or is it unfortunately?), at least at the moment, computers lag far behind school-age children in learning from examples. Hence in the 20th century people have come up with exact formalisms for describing algorithms, namely *programming languages*. Here is al-Khwarizmi’s quadratic equation solving algorithm described in the *Python* programming language:^{3}

```
from math import sqrt
# Pythonspeak to enable use of the sqrt function to compute square roots.

def solve_eq(b,c):
    # return solution of x^2 + bx = c following Al Khwarizmi's instructions
    # Al Khwarizmi demonstrates this for the case b=10 and c= 39
    val1 = b/2.0 # "halve the number of the roots"
    val2 = val1*val1 # "this you multiply by itself"
    val3 = val2 + c # "Add this to thirty-nine"
    val4 = sqrt(val3) # "take the root of this"
    val5 = val4 - val1 # "subtract from it half the number of roots"
    return val5 # "This is the root of the square which you sought for"

# Test: solve x^2 + 10*x = 39
print(solve_eq(10,39))
# 3.0
```

We can define algorithms informally as follows:

**Informal definition of an algorithm:** An *algorithm* is a set of instructions of how to compute an output from an input by following a sequence of “elementary steps”.

An algorithm \(A\) *computes* a function \(F\) if for every input \(x\), if we follow the instructions of \(A\) on the input \(x\), we obtain the output \(F(x)\).

In this chapter we will use an ultra-simple “programming language” to give a *formal* (that is, *precise*) definition of algorithms. (In fact, our programming language will be so simple that it is hardly worthy of this name.) However, it will take us some time to get there. We will start by discussing what “elementary operations” are, and how we map a description of an algorithm into an actual physical process that produces an output from an input in the real world.

## Boolean formulas with AND, OR, and NOT.

An algorithm breaks down a *complex* calculation into a series of *simpler* steps. These steps can be executed in a variety of different ways, including:

- Writing down symbols on a piece of paper.

- Modifying the current flowing on electrical wires.

- Binding a protein to a strand of DNA.

- Response to a stimulus by a member of a collection (e.g., a bee in a colony, a trader in a market).

To formally define algorithms, let us try to “err on the side of simplicity” and model our “basic steps” as truly minimal. For example, here are some very simple functions:

- \(\ensuremath{\mathit{OR}}:\{0,1\}^2 \rightarrow \{0,1\}\) defined as

\[\ensuremath{\mathit{OR}}(a,b) = \begin{cases} 0 & a=b=0 \\ 1 & \text{otherwise} \end{cases}\]

- \(\ensuremath{\mathit{AND}}:\{0,1\}^2 \rightarrow \{0,1\}\) defined as

\[\ensuremath{\mathit{AND}}(a,b) = \begin{cases} 1 & a=b=1 \\ 0 & \text{otherwise} \end{cases}\]

- \(\ensuremath{\mathit{NOT}}:\{0,1\} \rightarrow \{0,1\}\) defined as \(\ensuremath{\mathit{NOT}}(a) = 1-a\).

The functions \(\ensuremath{\mathit{AND}}\), \(\ensuremath{\mathit{OR}}\) and \(\ensuremath{\mathit{NOT}}\) are the basic logical operators used in logic and many computer systems. Each one of these functions takes either one or two single bits as input, and produces a single bit as output. Clearly, it cannot get much more basic than that. However, the power of computation comes from *composing* such simple building blocks together.

Consider the function \(\ensuremath{\mathit{MAJ}}:\{0,1\}^3 \rightarrow \{0,1\}\) that is defined as follows:

\[\ensuremath{\mathit{MAJ}}(x) = \begin{cases}1 & x_0 + x_1 + x_2 \geq 2 \\ 0 & \text{otherwise}\end{cases} \;.\]

That is, for every \(x\in \{0,1\}^3\), \(\ensuremath{\mathit{MAJ}}(x)=1\) if and only if the majority (i.e., at least two out of the three) of \(x\)’s coordinates are equal to \(1\). Can you come up with a formula involving \(\ensuremath{\mathit{AND}}\), \(\ensuremath{\mathit{OR}}\) and \(\ensuremath{\mathit{NOT}}\) to compute \(\ensuremath{\mathit{MAJ}}\)? (It would be useful for you to pause at this point and work out the formula for yourself. As a hint, although the \(\ensuremath{\mathit{NOT}}\) operator is needed to compute some functions, you will not need to use it to compute \(\ensuremath{\mathit{MAJ}}\).)

Let us first try to rephrase \(\ensuremath{\mathit{MAJ}}(x)\) in words: “\(\ensuremath{\mathit{MAJ}}(x)=1\) if and only if there exists some pair of distinct coordinates \(i,j\) such that both \(x_i\) and \(x_j\) are equal to \(1\).” In other words it means that \(\ensuremath{\mathit{MAJ}}(x)=1\) iff *either* both \(x_0=1\) *and* \(x_1=1\), *or* both \(x_1=1\) *and* \(x_2=1\), *or* both \(x_0=1\) *and* \(x_2=1\). Since the \(\ensuremath{\mathit{OR}}\) of three conditions \(c_0,c_1,c_2\) can be written as \(\ensuremath{\mathit{OR}}(c_0,\ensuremath{\mathit{OR}}(c_1,c_2))\), we can now translate this into a formula as follows:

\[ \ensuremath{\mathit{MAJ}}(x_0,x_1,x_2) = \ensuremath{\mathit{OR}}\left(\, \ensuremath{\mathit{AND}}(x_0,x_1)\;,\; \ensuremath{\mathit{OR}} \bigl( \ensuremath{\mathit{AND}}(x_1,x_2) \;,\; \ensuremath{\mathit{AND}}(x_0,x_2) \bigr) \, \right) \;. \;\;(3.4) \]

It is common to use \(a \vee b\) for \(\ensuremath{\mathit{OR}}(a,b)\) and \(a \wedge b\) for \(\ensuremath{\mathit{AND}}(a,b)\), as well as write \(a \vee b \vee c\) as shorthand for \((a \vee b) \vee c\). (\(\ensuremath{\mathit{NOT}}(a)\) is often written as either \(\neg a\) or \(\overline{a}\); we will use both notations in this book.) With this notation, Equation 3.4 can also be written as

\[\ensuremath{\mathit{MAJ}}(x_0,x_1,x_2) = (x_0 \wedge x_1) \vee (x_1 \wedge x_2) \vee (x_0 \wedge x_2)\;.\]

We can also write Equation 3.4 in a “programming language” format, expressing it as a set of instructions for computing \(\ensuremath{\mathit{MAJ}}\) given the basic operations \(\ensuremath{\mathit{AND}},\ensuremath{\mathit{OR}},\ensuremath{\mathit{NOT}}\):
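One way to write these instructions is the following minimal Python sketch. The helper definitions of `AND` and `OR` and the variable names (`firstpair`, `secondpair`, `thirdpair`, `temp`) are illustrative assumptions; each line applies one basic operation to inputs or previously computed values, mirroring Equation 3.4.

```python
def AND(a, b): return a * b
def OR(a, b): return 1 - (1 - a) * (1 - b)

def MAJ(x0, x1, x2):
    firstpair = AND(x0, x1)            # x0 and x1 are both 1
    secondpair = AND(x1, x2)           # x1 and x2 are both 1
    thirdpair = AND(x0, x2)            # x0 and x2 are both 1
    temp = OR(secondpair, thirdpair)
    return OR(firstpair, temp)

# Compare against the definition of MAJ on all eight inputs.
for x0 in [0, 1]:
    for x1 in [0, 1]:
        for x2 in [0, 1]:
            assert MAJ(x0, x1, x2) == (1 if x0 + x1 + x2 >= 2 else 0)
```

The sketch uses three applications of `AND` and two of `OR`, and no `NOT`, in line with the hint above.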

### Extended example: Computing \(\ensuremath{\mathit{XOR}}\) from \(\ensuremath{\mathit{AND}}\),\(\ensuremath{\mathit{OR}}\),\(\ensuremath{\mathit{NOT}}\).

Let us see how we can obtain a different function from the same building blocks. Define \(\ensuremath{\mathit{XOR}}:\{0,1\}^2 \rightarrow \{0,1\}\) to be the function \(\ensuremath{\mathit{XOR}}(a,b)= a + b \mod 2\). That is, \(\ensuremath{\mathit{XOR}}(0,0)=\ensuremath{\mathit{XOR}}(1,1)=0\) and \(\ensuremath{\mathit{XOR}}(1,0)=\ensuremath{\mathit{XOR}}(0,1)=1\). We claim that we can construct \(\ensuremath{\mathit{XOR}}\) using only \(\ensuremath{\mathit{AND}}\), \(\ensuremath{\mathit{OR}}\), and \(\ensuremath{\mathit{NOT}}\).

As usual, it is a good exercise to try to work out the algorithm for \(\ensuremath{\mathit{XOR}}\) using \(\ensuremath{\mathit{AND}}\), \(\ensuremath{\mathit{OR}}\) and \(\ensuremath{\mathit{NOT}}\) on your own before reading further.

The following algorithm computes \(\ensuremath{\mathit{XOR}}\) using \(\ensuremath{\mathit{AND}}\), \(\ensuremath{\mathit{OR}}\), and \(\ensuremath{\mathit{NOT}}\):

**Input:** \(a,b \in \{0,1\}\).

**Operations:**

Compute \(w1 = \ensuremath{\mathit{AND}}(a,b)\)

Compute \(w2 = \ensuremath{\mathit{NOT}}(w1)\)

Compute \(w3 = \ensuremath{\mathit{OR}}(a,b)\)

Output \(\ensuremath{\mathit{AND}}(w2,w3)\)

For every \(a,b\in \{0,1\}\), on input \(a,b\) Algorithm 3.2 will return \(a+b \mod 2\).

For every \(a,b\), \(\ensuremath{\mathit{XOR}}(a,b)=1\) if and only if \(a\) is *different* from \(b\). On input \(a,b\in \{0,1\}\), Algorithm 3.2 outputs \(\ensuremath{\mathit{AND}}(w2,w3)\) where \(w2=\ensuremath{\mathit{NOT}}(\ensuremath{\mathit{AND}}(a,b))\) and \(w3=\ensuremath{\mathit{OR}}(a,b)\).

If \(a=b=0\) then \(w3=\ensuremath{\mathit{OR}}(a,b)=0\) and so the output will be \(0\).

If \(a=b=1\) then \(\ensuremath{\mathit{AND}}(a,b)=1\) and so \(w2=\ensuremath{\mathit{NOT}}(\ensuremath{\mathit{AND}}(a,b))=0\) and the output will be \(0\).

If \(a=1\) and \(b=0\) (or vice versa) then both \(w3=\ensuremath{\mathit{OR}}(a,b)=1\) and \(w1=\ensuremath{\mathit{AND}}(a,b)=0\), in which case the algorithm will output \(\ensuremath{\mathit{AND}}(\ensuremath{\mathit{NOT}}(w1),w3)=1\).

We can also express Algorithm 3.2 via a programming language. Specifically, the following is a *Python* program that computes the \(\ensuremath{\mathit{XOR}}\) function:

```
def AND(a,b): return a*b
def OR(a,b): return 1-(1-a)*(1-b)
def NOT(a): return 1-a

def XOR(a,b):
    w1 = AND(a,b)
    w2 = NOT(w1)
    w3 = OR(a,b)
    return AND(w2,w3)

# Test out the code
print([f"XOR({a},{b})={XOR(a,b)}" for a in [0,1] for b in [0,1]])
# ['XOR(0,0)=0', 'XOR(0,1)=1', 'XOR(1,0)=1', 'XOR(1,1)=0']
```

Let \(\ensuremath{\mathit{XOR}}_3:\{0,1\}^3 \rightarrow \{0,1\}\) be the function defined as \(\ensuremath{\mathit{XOR}}_3(a,b,c) = a + b +c \mod 2\). That is, \(\ensuremath{\mathit{XOR}}_3(a,b,c)=1\) if \(a+b+c\) is odd, and \(\ensuremath{\mathit{XOR}}_3(a,b,c)=0\) otherwise. Show that you can compute \(\ensuremath{\mathit{XOR}}_3\) using AND, OR, and NOT. You can express it as a formula, use a programming language such as Python, or use a Boolean circuit.

Addition modulo two satisfies the same properties of *associativity* (\((a+b)+c=a+(b+c)\)) and *commutativity* (\(a+b=b+a\)) as standard addition. This means that, if we define \(a \oplus b\) to equal \(a + b \mod 2\), then \[
\ensuremath{\mathit{XOR}}_3(a,b,c) = (a \oplus b) \oplus c
\] or in other words \[
\ensuremath{\mathit{XOR}}_3(a,b,c) = \ensuremath{\mathit{XOR}}(\ensuremath{\mathit{XOR}}(a,b),c) \;.
\]

Since we know how to compute \(\ensuremath{\mathit{XOR}}\) using AND, OR, and NOT, we can compose this to compute \(\ensuremath{\mathit{XOR}}_3\) using the same building blocks. In Python this corresponds to the following program:

```
def XOR3(a,b,c):
    w1 = AND(a,b)
    w2 = NOT(w1)
    w3 = OR(a,b)
    w4 = AND(w2,w3)
    w5 = AND(w4,c)
    w6 = NOT(w5)
    w7 = OR(w4,c)
    return AND(w6,w7)

# Let's test this out
print([f"XOR3({a},{b},{c})={XOR3(a,b,c)}" for a in [0,1] for b in [0,1] for c in [0,1]])
# ['XOR3(0,0,0)=0', 'XOR3(0,0,1)=1', 'XOR3(0,1,0)=1', 'XOR3(0,1,1)=0', 'XOR3(1,0,0)=1', 'XOR3(1,0,1)=0', 'XOR3(1,1,0)=0', 'XOR3(1,1,1)=1']
```

Try to generalize the above examples to obtain a way to compute \(\ensuremath{\mathit{XOR}}_n:\{0,1\}^n \rightarrow \{0,1\}\) for every \(n\) using at most \(4n\) basic steps involving applications of a function in \(\{ \ensuremath{\mathit{AND}}, \ensuremath{\mathit{OR}} , \ensuremath{\mathit{NOT}} \}\) to inputs or previously computed values.
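One possible approach to this exercise, sketched in Python, is to chain the four-step \(\ensuremath{\mathit{XOR}}\) construction: fold the bits from left to right, applying \(\ensuremath{\mathit{XOR}}\) to the running result and the next bit. The step counter `steps` and the name `XORn` are illustrative assumptions. Each two-bit \(\ensuremath{\mathit{XOR}}\) costs four basic steps, so \(n\) inputs take \(4(n-1) \leq 4n\) steps.

```python
from itertools import product

steps = 0  # number of AND/OR/NOT applications so far

def AND(a, b):
    global steps
    steps += 1
    return a * b

def OR(a, b):
    global steps
    steps += 1
    return 1 - (1 - a) * (1 - b)

def NOT(a):
    global steps
    steps += 1
    return 1 - a

def XOR(a, b):
    # the four-step construction of XOR from AND, OR and NOT
    w1 = AND(a, b)
    w2 = NOT(w1)
    w3 = OR(a, b)
    return AND(w2, w3)

def XORn(x):
    """XOR of a non-empty sequence of bits, folding left to right."""
    result = x[0]
    for bit in x[1:]:
        result = XOR(result, bit)
    return result

# Check correctness and the step count on all inputs of length 5.
n = 5
for x in product([0, 1], repeat=n):
    steps = 0
    assert XORn(x) == sum(x) % 2
    assert steps == 4 * (n - 1)
```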

### Informally defining “basic operations” and “algorithms”

We have seen that we can obtain at least some examples of interesting functions by composing together applications of \(\ensuremath{\mathit{AND}}\), \(\ensuremath{\mathit{OR}}\), and \(\ensuremath{\mathit{NOT}}\). This suggests that we can use \(\ensuremath{\mathit{AND}}\), \(\ensuremath{\mathit{OR}}\), and \(\ensuremath{\mathit{NOT}}\) as our “basic operations”, hence obtaining the following definition of an “algorithm”:

**Semi-formal definition of an algorithm:** An *algorithm* consists of a sequence of steps of the form “compute a new value by applying \(\ensuremath{\mathit{AND}}\), \(\ensuremath{\mathit{OR}}\), or \(\ensuremath{\mathit{NOT}}\) to previously computed values”.

An algorithm \(A\) *computes* a function \(f\) if for every input \(x\) to \(f\), if we feed \(x\) as input to the algorithm, the value computed in its last step is \(f(x)\).

There are several concerns that are raised by this definition:

- First and foremost, this definition is indeed too informal. We do not specify exactly what each step does, nor what it means to “feed \(x\) as input”.

- Second, the choice of \(\ensuremath{\mathit{AND}}\), \(\ensuremath{\mathit{OR}}\) or \(\ensuremath{\mathit{NOT}}\) seems rather arbitrary. Why not \(\ensuremath{\mathit{XOR}}\) and \(\ensuremath{\mathit{MAJ}}\)? Why not allow operations like addition and multiplication? What about any other logical constructions such as `if`/`then` or `while`?

- Third, do we even know that this definition has anything to do with actual computing? If someone gave us a description of such an algorithm, could we use it to actually compute the function in the real world?

These concerns will to a large extent guide us in the upcoming chapters. Thus you would be well advised to re-read the above informal definition and see what you think about these issues.

A large part of this book will be devoted to addressing the above issues. We will see that:

- We can make the definition of an algorithm fully formal, and so give a precise mathematical meaning to statements such as “Algorithm \(A\) computes function \(f\)”.

- While the choice of \(\ensuremath{\mathit{AND}}\)/\(\ensuremath{\mathit{OR}}\)/\(\ensuremath{\mathit{NOT}}\) is arbitrary, and we could just as well choose some other functions, we will also see that this choice does not matter much. We will see that we would obtain the same computational power if we instead used addition and multiplication, and essentially every other operation that could be reasonably thought of as a basic step.

- It turns out that we can and do compute such “\(\ensuremath{\mathit{AND}}\)/\(\ensuremath{\mathit{OR}}\)/\(\ensuremath{\mathit{NOT}}\) based algorithms” in the real world. First of all, such an algorithm is clearly well specified, and so can be executed by a human with a pen and paper. Second, there are a variety of ways to *mechanize* this computation. We’ve already seen that we can write Python code that corresponds to following such a list of instructions. But in fact we can directly implement operations such as \(\ensuremath{\mathit{AND}}\), \(\ensuremath{\mathit{OR}}\), \(\ensuremath{\mathit{NOT}}\), etc. via electronic signals using components known as *transistors*. This is how modern electronic computers operate.

In the remainder of this chapter, and the rest of this book, we will begin to answer some of these questions. We will see more examples of the power of simple operations to compute more complex operations including addition, multiplication, sorting and more. We will also discuss how to *physically implement* simple operations such as \(\ensuremath{\mathit{AND}}\), \(\ensuremath{\mathit{OR}}\) and \(\ensuremath{\mathit{NOT}}\) using a variety of technologies.

## Boolean Circuits

*Boolean circuits* provide a precise notion of “composing basic operations together”. A Boolean circuit (see Figure 3.5) is composed of *gates* and *inputs* that are connected by *wires*. The *wires* carry a signal that is either the value \(0\) or \(1\).^{4} Each gate corresponds to either the *OR*, *AND*, or *NOT* operation. An *OR gate* has two incoming wires, and one or more outgoing wires. If these two incoming wires carry the signals \(a\) and \(b\) (for \(a,b \in \{0,1\}\)), then the signal on the outgoing wires will be \(\ensuremath{\mathit{OR}}(a,b)\). AND and NOT gates are defined similarly. The *inputs* have only outgoing wires. If we set a certain input to a value \(a\in \{0,1\}\), then this value is propagated on all the wires outgoing from it. We also designate some gates as *output gates*, and their value corresponds to the result of evaluating the circuit. We evaluate an \(n\)-input Boolean circuit \(C\) on an input \(x\in \{0,1\}^n\) by placing the bits of \(x\) on the inputs, and then propagating the values on the wires until we reach an output, see Figure 3.5.

Equation 3.4 gave a formula for computing the \(\ensuremath{\mathit{MAJ}}:\{0,1\}^3 \rightarrow \{0,1\}\) via \(\ensuremath{\mathit{AND}}\)’s and \(\ensuremath{\mathit{OR}}\)’s. We can express the same formula as a Boolean circuit as well, see Figure 3.6. Since the formula Equation 3.4 involves three AND’s and two OR’s, the circuit has five gates (three AND gates and two OR gates).

Algorithm 3.2 can also be presented as a circuit for computing the function \(\ensuremath{\mathit{XOR}}\), see Figure 3.7. The four operations of Algorithm 3.2 translate into the four gates of the circuit.

### Boolean circuits: a formal definition

We defined Boolean circuits informally as obtained by connecting AND, OR, and NOT gates via wires so as to produce an output from an input. However, for us to be able to make precise and prove statements such as *“there is a Boolean circuit of 5 gates that computes the function \(\ensuremath{\mathit{MAJ}}:\{0,1\}^3 \rightarrow \{0,1\}\)”* we need to:

- Formally define a Boolean circuit as a mathematical object.

- Formally define what it means for a circuit \(C\) to compute a function \(f\).

We now proceed to do so. We will define a Boolean circuit as a labeled *Directed Acyclic Graph (DAG)*. The *vertices* of the graph correspond to the gates and inputs of the circuit, and the *edges* of the graph correspond to the wires. A wire from an input or gate \(u\) to a gate \(v\) in the circuit corresponds to a directed edge between the corresponding vertices. The inputs are vertices with no incoming edges, while each gate has the appropriate number of incoming edges based on the function it computes. (That is, \(\ensuremath{\mathit{AND}}\) and \(\ensuremath{\mathit{OR}}\) gates have two in-neighbors, while \(\ensuremath{\mathit{NOT}}\) gates have one in-neighbor.) The formal definition is as follows (see also Figure 3.8):

Let \(n,m,s\) be positive integers with \(s \geq m\). A *Boolean circuit* with \(n\) inputs, \(m\) outputs, and \(s\) gates, is a labeled directed acyclic graph (DAG) \(G=(V,E)\) with \(s+n\) vertices satisfying the following properties:

- Exactly \(n\) of the vertices have no in-neighbors. These vertices are known as *inputs* and are labeled with the \(n\) labels `X[`\(0\)`]`, \(\ldots\), `X[`\(n-1\)`]`.

- The other \(s\) vertices are known as *gates*. Each gate is labeled with \(\wedge\), \(\vee\) or \(\neg\). Gates labeled with \(\wedge\) or \(\vee\) have two in-neighbors. Gates labeled with \(\neg\) have one in-neighbor. We will allow parallel edges (and so for example an AND gate can have both its in-neighbors be the same vertex).

- Exactly \(m\) of the gates are also labeled with the \(m\) labels `Y[`\(0\)`]`, \(\ldots\), `Y[`\(m-1\)`]` (in addition to their label \(\wedge\)/\(\vee\)/\(\neg\)). These are known as *outputs*.

This is our first example of a non trivial mathematical definition, so it is worth taking the time to read it slowly and carefully. As in all mathematical definitions, we are using a known mathematical object - a directed acyclic graph - to define a new object - a Boolean circuit. This might be a good time to review some of the basic properties of DAGs and in particular the fact that they can be *topologically sorted*, see Section 1.7.

If \(C\) is a circuit with \(n\) inputs and \(m\) outputs, and \(x\in \{0,1\}^n\), then we can compute the output of \(C\) on the input \(x\) in the natural way: assign the input vertices `X[`\(0\)`]`, \(\ldots\), `X[`\(n-1\)`]` the values \(x_0,\ldots,x_{n-1}\), apply each gate on the values of its in-neighbors, and then output the values that correspond to the output vertices. Formally, this is defined as follows:

Let \(C\) be a Boolean circuit with \(n\) inputs and \(m\) outputs. For every \(x\in \{0,1\}^n\), the *output* of \(C\) on the input \(x\), denoted by \(C(x)\), is defined as the result of the following process:

We let \(h:V \rightarrow \N\) be the *minimal layering* of \(C\) (see Theorem 1.30). We let \(L\) be the maximum layer of \(h\), and for \(\ell=0,1,\ldots,L\) we do the following:

For every \(v\) in the \(\ell\)-th layer (i.e., \(v\) such that \(h(v)=\ell\)) do:

- If \(v\) is an input vertex labeled with `X[`\(i\)`]` for some \(i\in [n]\), then we assign to \(v\) the value \(x_i\).^{5}

- If \(v\) is a gate vertex labeled with \(\wedge\) and with two in-neighbors \(u,w\) then we assign to \(v\) the AND of the values assigned to \(u\) and \(w\).^{6}

- If \(v\) is a gate vertex labeled with \(\vee\) and with two in-neighbors \(u,w\) then we assign to \(v\) the OR of the values assigned to \(u\) and \(w\).

- If \(v\) is a gate vertex labeled with \(\neg\) and with one in-neighbor \(u\) then we assign to \(v\) the negation of the value assigned to \(u\).

The result of this process is the value \(y\in \{0,1\}^m\) such that for every \(j\in [m]\), \(y_j\) is the value assigned to the vertex with label `Y[`\(j\)`]`.

Let \(f:\{0,1\}^n \rightarrow \{0,1\}^m\). We say that the circuit \(C\) *computes* \(f\) if for every \(x\in \{0,1\}^n\), \(C(x)=f(x)\).
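To make the evaluation process concrete, here is a minimal Python sketch. The representation is an assumption made for illustration: a circuit is a dictionary mapping each gate name to a pair (label, list of in-neighbors), inputs are named `X[i]`, and `outputs` lists the gates labeled `Y[0]`, ..., `Y[m-1]` in order. Instead of computing the minimal layering explicitly, the sketch evaluates vertices recursively and caches the results. This assigns the same values, since a gate is evaluated only after all of its in-neighbors.

```python
def eval_circuit(gates, outputs, x):
    """Evaluate a circuit on the bit sequence x.

    gates:   dict mapping gate name -> (label, list of in-neighbor names),
             where label is "AND", "OR" or "NOT" (illustrative encoding)
    outputs: list of gate names labeled Y[0], ..., Y[m-1]
    """
    value = {f"X[{i}]": bit for i, bit in enumerate(x)}  # input vertices

    def val(v):
        if v not in value:
            label, ins = gates[v]
            args = [val(u) for u in ins]  # evaluate in-neighbors first
            if label == "AND":
                value[v] = args[0] & args[1]
            elif label == "OR":
                value[v] = args[0] | args[1]
            else:  # "NOT"
                value[v] = 1 - args[0]
        return value[v]

    return [val(v) for v in outputs]

# The XOR circuit corresponding to Algorithm 3.2
xor_gates = {
    "w1": ("AND", ["X[0]", "X[1]"]),
    "w2": ("NOT", ["w1"]),
    "w3": ("OR", ["X[0]", "X[1]"]),
    "out": ("AND", ["w2", "w3"]),
}
for a in [0, 1]:
    for b in [0, 1]:
        assert eval_circuit(xor_gates, ["out"], [a, b]) == [(a + b) % 2]
```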

### Equivalence of circuits and straight-line programs

We have seen two ways to describe how to compute a function \(f\) using AND, OR and NOT:

- A *Boolean circuit*, defined in Definition 3.5, computes \(f\) by connecting via wires AND, OR, and NOT gates to the inputs.

- We can also describe such a computation using a *straight-line program* that has lines of the form `foo = AND(bar,blah)`, `foo = OR(bar,blah)` and `foo = NOT(bar)` where `foo`, `bar` and `blah` are variable names. (We call this a *straight-line program* since it contains no loops or if/then statements.)

We will now formally define the AON-CIRC programming language (“AON” stands for AND/OR/NOT) which has the above operations, and show that it is equivalent to Boolean circuits.

An *AON-CIRC program* is a string of lines of the form `foo = AND(bar,blah)`, `foo = OR(bar,blah)` and `foo = NOT(bar)` where `foo`, `bar` and `blah` are variable names.^{7} Variables of the form `X[`\(i\)`]` are known as *input* variables, and variables of the form `Y[`\(j\)`]` are known as *output* variables. In every line, the variables on the right-hand side of the assignment operators must either be input variables or variables that have already been assigned a value before.

If an AON-CIRC program \(P\) has input variables `X[`\(0\)`]`, \(\ldots\), `X[`\(n-1\)`]` and output variables `Y[`\(0\)`]`, \(\ldots\), `Y[`\(m-1\)`]` then for every \(x\in \{0,1\}^n\), we define the *output of \(P\) on input \(x\)*, denoted by \(P(x)\), to be the string \(y\in \{0,1\}^m\) corresponding to the values of the output variables `Y[`\(0\)`]`, \(\ldots\), `Y[`\(m-1\)`]` in the execution of \(P\) where we initialize the input variables `X[`\(0\)`]`, \(\ldots\), `X[`\(n-1\)`]` to the values \(x_0,\ldots,x_{n-1}\).

We say that such an AON-CIRC program \(P\) *computes* a function \(f:\{0,1\}^n \rightarrow \{0,1\}^m\) if \(P(x)=f(x)\) for every \(x\in \{0,1\}^n\).

Consider the following function \(\ensuremath{\mathit{CMP}}:\{0,1\}^4 \rightarrow \{0,1\}\) that on input four bits \(a,b,c,d\in \{0,1\}\), outputs \(1\) iff the number represented by \((a,b)\) is larger than the number represented by \((c,d)\). That is \(\ensuremath{\mathit{CMP}}(a,b,c,d)=1\) iff \(2a+b>2c+d\).

Write an AON-CIRC program to compute \(\ensuremath{\mathit{CMP}}\).

Writing such a program is tedious but not truly hard. To compare two numbers we first compare their most significant digit, and then go down to the next digit and so on and so forth. In this case where the numbers have just two binary digits, these comparisons are particularly simple: The number represented by \((a,b)\) is larger than the number represented by \((c,d)\) if and only if one of the following conditions happens:

- The most significant bit \(a\) of \((a,b)\) is larger than the most significant bit \(c\) of \((c,d)\).

or

- The two most significant bits \(a\) and \(c\) are equal, but \(b>d\).

Another way to express the same condition is the following: the number \((a,b)\) is larger than \((c,d)\) iff \(a>c\) **OR** (**NOT** \((c>a)\) **AND** \(b>d\)).

For binary digits \(\alpha,\beta\), the condition \(\alpha>\beta\) is simply that \(\alpha=1\) and \(\beta=0\), or in other words \(\ensuremath{\mathit{AND}}(\alpha,\ensuremath{\mathit{NOT}}(\beta))=1\). Together these observations can be used to give the following AON-CIRC program to compute \(\ensuremath{\mathit{CMP}}\):

```
temp_1 = NOT(X[2])
temp_2 = AND(X[0],temp_1)
temp_3 = OR(X[0],temp_1)
temp_4 = NOT(X[3])
temp_5 = AND(X[1],temp_4)
temp_6 = AND(temp_5,temp_3)
Y[0] = OR(temp_2,temp_6)
```

We can also present this 7-line program as a circuit with 7 gates, see Figure 3.9.
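As a sanity check on the reasoning above, the following Python sketch writes the condition "\(a>c\) OR (NOT \((c>a)\) AND \(b>d\))" using only AND, OR, and NOT, and compares it with the definition \(2a+b>2c+d\) on all 16 inputs. The function names `GT` and `CMP` are illustrative assumptions.

```python
def AND(a, b): return a * b
def OR(a, b): return 1 - (1 - a) * (1 - b)
def NOT(a): return 1 - a

def GT(alpha, beta):
    # For single bits, alpha > beta holds exactly when alpha = 1 and beta = 0.
    return AND(alpha, NOT(beta))

def CMP(a, b, c, d):
    # a > c, OR (NOT (c > a) AND b > d)
    return OR(GT(a, c), AND(NOT(GT(c, a)), GT(b, d)))

# Exhaustive comparison with the definition of CMP.
for a in [0, 1]:
    for b in [0, 1]:
        for c in [0, 1]:
            for d in [0, 1]:
                assert CMP(a, b, c, d) == int(2 * a + b > 2 * c + d)
```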

AON-CIRC is not a practical programming language: it was designed for pedagogical purposes only, as a way to model computation as composition of \(\ensuremath{\mathit{AND}}\), \(\ensuremath{\mathit{OR}}\), and \(\ensuremath{\mathit{NOT}}\). However, AON-CIRC can still be easily implemented on a computer. Specifically the following Python program will evaluate an AON-CIRC program (given as a string) on an input of our choice:^{8}

```
def EVAL(code,X):
    """Evaluate code on input X."""
    n,m = numinout(code) # helper function - get number of inputs and outputs of program by searching for substrings of form X[i] and Y[j]
    vtable = { f"X[{i}]":int(X[i]) for i in range(n)}
    # table of variable values, initially only contains input variables
    for line in code.split("\n"):
        if not line: continue
        foo,op,bar,blah = parseline(line,2)
        # helper function - split "foo = OP(bar,blah)" to list ["foo","OP","bar","blah"]
        # 2 is num of arguments to expect: blah is empty if it's missing
        if op=="NOT": vtable[foo] = NOT(vtable[bar])
        if op=="AND": vtable[foo] = AND(vtable[bar],vtable[blah])
        if op=="OR": vtable[foo] = OR(vtable[bar],vtable[blah])
    return [vtable[f"Y[{j}]"] for j in range(m)]
```
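The helper functions `numinout` and `parseline` used by `EVAL` are not shown in the text. As a rough guess at what they might look like, one could write the following. The regular expressions and the exact behavior are assumptions made for illustration, not the implementation from the book's repository.

```python
import re

def numinout(code):
    """Return (n, m), where n-1 is the largest i with X[i] in code and m-1 the largest j with Y[j]."""
    xs = [int(i) for i in re.findall(r"X\[(\d+)\]", code)]
    ys = [int(j) for j in re.findall(r"Y\[(\d+)\]", code)]
    return max(xs, default=-1) + 1, max(ys, default=-1) + 1

def parseline(line, numargs=2):
    """Split "foo = OP(bar,blah)" into ["foo","OP","bar","blah"].

    Missing arguments are padded with empty strings, so that
    "foo = NOT(bar)" becomes ["foo","NOT","bar",""].
    """
    m = re.fullmatch(r"\s*(\S+)\s*=\s*(\w+)\((.*)\)\s*", line)
    foo, op, args = m.group(1), m.group(2), m.group(3)
    arglist = [a.strip() for a in args.split(",")]
    arglist += [""] * (numargs - len(arglist))
    return [foo, op] + arglist

assert parseline("temp_1 = NOT(X[2])") == ["temp_1", "NOT", "X[2]", ""]
assert parseline("Y[0] = OR(temp_2,temp_6)") == ["Y[0]", "OR", "temp_2", "temp_6"]
assert numinout("temp_1 = NOT(X[2])\nY[0] = OR(X[0],temp_1)") == (3, 1)
```

This sketch assumes every line is well formed; a more careful version would report malformed lines instead of failing on the unmatched pattern.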

It turns out that AON-CIRC programs and Boolean circuits have exactly the same power:

Let \(f:\{0,1\}^n \rightarrow \{0,1\}^m\) and \(s \geq m\) be some number. Then \(f\) is computable by a Boolean circuit of \(s\) gates if and only if \(f\) is computable by an AON-CIRC program of \(s\) lines.

The idea is simple - AON-CIRC programs and Boolean circuits are just different ways of describing the exact same computational process. For example, an AND gate in a Boolean circuit corresponds to computing the AND of two previously-computed values. In an AON-CIRC program this will correspond to the line that stores in a variable the AND of two previously-computed variables.

This proof is simple at heart, but all the details it contains can make it a little cumbersome to read. You might be better off trying to work it out yourself before reading it. We will also show a “proof by Python” that might help in clarifying these details.

Let \(f:\{0,1\}^n \rightarrow \{0,1\}^m\). Since the theorem is an “if and only if” statement, to prove it we need to show both directions: translating an AON-CIRC program that computes \(f\) into a circuit that computes \(f\), and translating a circuit that computes \(f\) into an AON-CIRC program that does so.

We start with the first direction. Let \(P\) be an \(s\)-line AON-CIRC program that computes \(f\). We define a circuit \(C\) as follows: the circuit will have \(n\) inputs and \(s\) gates. For every \(i \in [s]\), if the \(i\)-th line has the form `foo = AND(bar,blah)` then the \(i\)-th gate in the circuit will be an AND gate that is connected to gates \(j\) and \(k\) where \(j\) and \(k\) correspond to the last lines before \(i\) where the variables `bar` and `blah` (respectively) were written to. (For example, if \(i=57\) and the last line `bar` was written to is \(35\) and the last line `blah` was written to is \(17\) then the two in-neighbors of gate \(57\) will be gates \(35\) and \(17\).) If either `bar` or `blah` is an input variable then we connect the gate to the corresponding input vertex instead. If `foo` is an output variable of the form `Y[`\(j\)`]` then we add the same label to the corresponding gate to mark it as an output gate. We do the analogous operations if the \(i\)-th line involves an `OR` or a `NOT` operation (except that we use an OR, or a NOT, gate, and in the latter case have only one in-neighbor instead of two). For every input \(x\in \{0,1\}^n\), if we run the program \(P\) on \(x\), then the value that is computed in the \(i\)-th line is exactly the value that will be assigned to the \(i\)-th gate if we evaluate the circuit \(C\) on \(x\). Hence \(C(x)=P(x)\) for every \(x\in \{0,1\}^n\).

For the other direction, let \(C\) be a circuit of \(s\) gates and \(n\) inputs that computes the function \(f\). We sort the gates according to a topological order and write them as \(v_0,\ldots,v_{s-1}\). We now can create a program \(P\) of \(s\) lines as follows. For every \(i\in [s]\), if \(v_i\) is an AND gate with in-neighbors \(v_j,v_k\) then we will add a line to \(P\) of the form `temp_`\(i\) `= AND(temp_`\(j\)`,temp_`\(k\)`)`, unless one of the vertices is an input vertex or an output gate, in which case we change this to the form `X[.]` or `Y[.]` appropriately. Because we work in topological ordering, we are guaranteed that the in-neighbors \(v_j\) and \(v_k\) correspond to variables that have already been assigned a value. We do the same for OR and NOT gates. Once again, one can verify that for every input \(x\), the value \(P(x)\) will equal \(C(x)\) and hence the program computes the same function as the circuit.

### “Proof by Python”

The proof of Theorem 3.9 is *constructive*. It yields a way of transforming an AON-CIRC program into an equivalent Boolean circuit and vice versa. Below is the code that carries out this transformation. (It uses some “helper” functions and objects: see our GitHub repository for the full implementation.) Figure 3.10 shows an example of the result of this transformation.

```
def circuit2prog(C):
    """Transform circuit to a program."""
    code = ""
    def key2var(key):
        """Helper function: translate key identifying a node into a variable name"""
        if key[:6]=="input_": return f"X[{key[6:]}]"
        elif key in C.outputs: return f"Y[{C.outputs[key]}]"
        return key
    # every gate is translated into a line
    for (key,n) in C.nodes.items():
        # we assume nodes are in topological ordering, otherwise should layer first
        if n[0]!="GATE": continue # ignore input (non gate) nodes
        args = ",".join(map(key2var,C.in_neighbors[key]))
        code += f"{key2var(key)} = {n[1].__name__}({args})\n"
    return code
```

```
def prog2circuit(code,gateset=None):
    """Transform a straight-line program into a circuit.
    Takes as input the basic gates one uses (otherwise use all functions currently defined)"""
    if not gateset: gateset = globals()
    n,m = numinout(code) # helper function - extract number of inputs and outputs from code
    C = Circuit(n) # create circuit with n inputs
    nodes = { f"X[{i}]" : C.X[i] for i in range(n) }
    # initially we have n nodes corresponding to n inputs.
    for line in code.split("\n"): # every line is translated to a new gate
        if not line: continue
        foo,op,bar,blah = parseline(line,2)
        # parseline takes "foo = OP(bar,blah)" to the list ["foo","OP","bar","blah"]
        if blah: g = C.gate(gateset[op],nodes[bar],nodes[blah])
        else: g = C.gate(gateset[op],nodes[bar])
        nodes[foo] = g
        if foo[0]=="Y": C.output(g,int(foo[2:-1]))
    return C
```

## Digression: physical implementations of computing devices.

*Computation* is an abstract notion, that is distinct from its physical *implementations*. While most modern computing devices are obtained by mapping logical gates to semi-conductor based transistors, over history people have computed using a huge variety of mechanisms, including mechanical systems, gas and liquid (known as *fluidics*), biological and chemical processes, and even living creatures (e.g., see Figure 3.11 or this video for how crabs or slime mold can be used to do computations).

In this section we will review some of these implementations, both so that you can get an appreciation of how it is possible to directly translate Boolean circuits to the physical world, without going through the entire stack of architecture, operating systems, and compilers, and to emphasize that silicon-based processors are by no means the only way to perform computation. Indeed, as we will see in Chapter 22, a very exciting recent line of work involves using different media for computation that would allow us to take advantage of *quantum mechanical effects* to enable different types of algorithms.

### Transistors

A *transistor* can be thought of as an electric circuit with two inputs, known as *source* and *gate* and an output, known as the *sink*. The gate controls whether current flows from the source to the sink. In a *standard transistor*, if the gate is “ON” then current can flow from the source to the sink and if it is “OFF” then it can’t. In a *complementary transistor* this is reversed: if the gate is “OFF” then current can flow from the source to the sink and if it is “ON” then it can’t.

There are several ways to implement the logic of a transistor. For example, we can use faucets to implement it using water pressure (e.g. Figure 3.12).^{9} However, the standard implementation uses electrical current. One of the original implementations used *vacuum tubes*. As its name implies, a vacuum tube is a tube containing nothing (i.e., vacuum) and where a priori electrons could freely flow from source (a wire) to the sink (a plate). However, there is a gate (a grid) between the two, where modulating its voltage can block the flow of electrons.

Early vacuum tubes were roughly the size of lightbulbs (and looked very much like them too). In the 1950’s they were supplanted by *transistors*, which implement the same logic using *semiconductors*, which are materials that normally do not conduct electricity but whose conductivity can be modified and controlled by inserting impurities (“doping”) and applying an external electric field (this is known as the *field effect*). In the 1960’s computers started to be implemented using *integrated circuits*, which enabled much greater density. In 1965, Gordon Moore predicted that the number of transistors per circuit would double every year (see Figure 3.13), and that this would lead to “such wonders as home computers —or at least terminals connected to a central computer— automatic controls for automobiles, and personal portable communications equipment”. Since then, (adjusted versions of) this so-called “Moore’s law” has been running strong, though exponential growth cannot be sustained forever, and some physical limitations are already becoming apparent.

### Logical gates from transistors

We can use transistors to implement various Boolean functions such as \(\ensuremath{\mathit{AND}}\), \(\ensuremath{\mathit{OR}}\), and \(\ensuremath{\mathit{NOT}}\). For each two-input gate \(G:\{0,1\}^2 \rightarrow \{0,1\}\), such an implementation would be a system with two input wires \(x,y\) and one output wire \(z\), such that if we identify high voltage with “\(1\)” and low voltage with “\(0\)”, then the wire \(z\) will be equal to “\(1\)” if and only if applying \(G\) to the values of the wires \(x\) and \(y\) is \(1\) (see Figure 3.16 and Figure 3.17). This means that if there exists an AND/OR/NOT circuit to compute a function \(g:\{0,1\}^n \rightarrow \{0,1\}^m\), then we can compute \(g\) in the physical world using transistors as well.

### Biological computing

Computation can be based on biological or chemical systems. For example the *lac* operon produces the enzymes needed to digest lactose only if the conditions \(x \wedge (\neg y)\) hold where \(x\) is “lactose is present” and \(y\) is “glucose is present”. Researchers have managed to create transistors, and from them logic gates, based on DNA molecules (see also Figure 3.18). One motivation for DNA computing is to achieve increased parallelism or storage density; another is to create “smart biological agents” that could perhaps be injected into bodies, replicate themselves, and fix or kill cells that were damaged by a disease such as cancer. Computing in biological systems is not restricted of course to DNA. Even larger systems such as flocks of birds can be considered as computational processes.

### Cellular automata and the game of life

*Cellular automata* is a model of a system composed of a sequence of *cells*, each of which can be in one of a finite number of states. At each step, a cell updates its state based on the states of its *neighboring cells* and some simple rules. As we will discuss later in this course, cellular automata such as Conway’s “Game of Life” can be used to simulate computation gates (see Figure 3.19).

### Neural networks

One computation device that we all carry with us is our own *brain*. Brains have served humanity throughout history, doing computations that range from distinguishing prey from predators, through making scientific discoveries and artistic masterpieces, to composing witty 280 character messages. The exact working of the brain is still not fully understood, but it seems that to a first approximation it can be modeled by a (very large) *neural network*.

A neural network can be thought of as a Boolean circuit that instead of \(\ensuremath{\mathit{AND}}\)/\(\ensuremath{\mathit{OR}}\)/\(\ensuremath{\mathit{NOT}}\) uses some other gates as the basis. For example, one particular basis we can use is *threshold gates*. For every vector \(w= (w_0,\ldots,w_{k-1})\) of integers and integer \(t\) (some or all of which could be negative), the *threshold function corresponding to \(w,t\)* is the function \(T_{w,t}:\{0,1\}^k \rightarrow \{0,1\}\) that maps \(x\in \{0,1\}^k\) to \(1\) if and only if \(\sum_{i=0}^{k-1} w_i x_i \geq t\). For example, the threshold function \(T_{w,t}\) corresponding to \(w=(1,1,1,1,1)\) and \(t=3\) is simply the majority function \(\ensuremath{\mathit{MAJ}}_5\) on \(\{0,1\}^5\). As another example, the negation of AND (known as \(\ensuremath{\mathit{NAND}}\)) corresponds to the threshold function corresponding to \(w=(-1,-1)\) and \(t=-1\), since \(\ensuremath{\mathit{NAND}}(x_0,x_1)=1\) if and only if \(x_0 + x_1 \leq 1\) or equivalently, \(-x_0 - x_1 \geq -1\).^{10}

Threshold gates can be thought of as an approximation for *neuron cells* that make up the core of human and animal brains. To a first approximation, a neuron has \(k\) inputs and a single output, and the neuron “fires” or “turns on” its output when the input signals pass some threshold.
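To make the definition of threshold gates concrete, here is a minimal Python sketch. The names `threshold`, `maj5`, and `nand_threshold` are chosen here for illustration and are not part of the book's code; the sketch only checks, by brute force, the two examples of \(\ensuremath{\mathit{MAJ}}_5\) and \(\ensuremath{\mathit{NAND}}\) given above.

```python
from itertools import product

def threshold(w, t):
    """Return the threshold function T_{w,t}: output 1 iff sum_i w[i]*x[i] >= t."""
    def T(x):
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= t else 0
    return T

# Illustrative instances of the two examples in the text.
maj5 = threshold((1, 1, 1, 1, 1), 3)       # majority on five bits
nand_threshold = threshold((-1, -1), -1)   # NAND as a threshold gate

# Brute-force comparison with the direct definitions.
assert all(maj5(x) == (1 if sum(x) >= 3 else 0)
           for x in product((0, 1), repeat=5))
assert all(nand_threshold((a, b)) == 1 - a * b
           for a, b in product((0, 1), repeat=2))
```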

### The marble computer

We can implement computation using many other physical media, without need for any electronic, biological, or chemical components. Many suggestions for *mechanical* computers have been put forward, starting with Charles Babbage’s 1837 plan for a mechanical “Analytical Engine”.

As one example, Figure 3.20 shows a simple implementation of a NAND (negation of AND, see Subsection 3.5.3) gate using marbles going through pipes. We represent a logical value in \(\{0,1\}\) by a pair of pipes, such that there is a marble flowing through exactly one of the pipes. We call one of the pipes the “\(0\) pipe” and the other the “\(1\) pipe”, and so the identity of the pipe containing the marble determines the logical value. A NAND gate corresponds to a mechanical object with two pairs of incoming pipes and one pair of outgoing pipes, such that for every \(a,b \in \{0,1\}\), if two marbles are rolling toward the object in the \(a\) pipe of the first pair and the \(b\) pipe of the second pair, then a marble will roll out of the object in the \(\ensuremath{\mathit{NAND}}(a,b)\)-pipe of the outgoing pair. In fact, there is even a commercially-available educational game that uses marbles as a basis of computing, see Figure 3.22.

## The NAND function

Here is another function we can compute using \(\ensuremath{\mathit{AND}},\ensuremath{\mathit{OR}},\ensuremath{\mathit{NOT}}\). The \(\ensuremath{\mathit{NAND}}\) function maps \(\{0,1\}^2\) to \(\{0,1\}\) and is defined as

\[\ensuremath{\mathit{NAND}}(a,b) = \begin{cases} 0 & a=b=1 \\ 1 & \text{otherwise} \end{cases}\]

As its name implies, \(\ensuremath{\mathit{NAND}}\) is the NOT of AND (i.e., \(\ensuremath{\mathit{NAND}}(a,b)= \ensuremath{\mathit{NOT}}(\ensuremath{\mathit{AND}}(a,b))\)), and so we can clearly compute \(\ensuremath{\mathit{NAND}}\) using \(\ensuremath{\mathit{AND}}\) and \(\ensuremath{\mathit{NOT}}\). Interestingly, the opposite direction also holds:

We can compute \(\ensuremath{\mathit{AND}}\), \(\ensuremath{\mathit{OR}}\), and \(\ensuremath{\mathit{NOT}}\) by composing only the \(\ensuremath{\mathit{NAND}}\) function.

We start with the following observation. For every \(a\in \{0,1\}\), \(\ensuremath{\mathit{AND}}(a,a)=a\). Hence, \(\ensuremath{\mathit{NAND}}(a,a)=\ensuremath{\mathit{NOT}}(\ensuremath{\mathit{AND}}(a,a))=\ensuremath{\mathit{NOT}}(a)\). This means that \(\ensuremath{\mathit{NAND}}\) can compute \(\ensuremath{\mathit{NOT}}\). By the principle of “double negation”, \(\ensuremath{\mathit{AND}}(a,b)=\ensuremath{\mathit{NOT}}(\ensuremath{\mathit{NOT}}(\ensuremath{\mathit{AND}}(a,b)))\), and hence we can use \(\ensuremath{\mathit{NAND}}\) to compute \(\ensuremath{\mathit{AND}}\) as well. Once we can compute \(\ensuremath{\mathit{AND}}\) and \(\ensuremath{\mathit{NOT}}\), we can compute \(\ensuremath{\mathit{OR}}\) using “De Morgan’s Law”: \(\ensuremath{\mathit{OR}}(a,b)=\ensuremath{\mathit{NOT}}(\ensuremath{\mathit{AND}}(\ensuremath{\mathit{NOT}}(a),\ensuremath{\mathit{NOT}}(b)))\) (which can also be written as \(a \vee b = \overline{\overline{a} \wedge \overline{b}}\)) for every \(a,b \in \{0,1\}\).

Theorem 3.10’s proof is very simple, but you should make sure that **(i)** you understand the statement of the theorem, and **(ii)** you follow its proof completely. In particular, you should make sure you understand why De Morgan’s law is true.

If you are so inclined, you can also verify the proof of Theorem 3.10 by Python. The following program contains an implementation of \(\ensuremath{\mathit{OR}}\) using \(\ensuremath{\mathit{NAND}}\) and code to test that the output of this implementation is indeed correct on all the four inputs in \(\{0,1\}^2\).
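A minimal sketch of such a program, assuming bits are represented by the Python integers `0` and `1` (the function names are illustrative and may differ from the book's own listing), might look as follows:

```python
def NAND(a, b):
    """NAND of two bits."""
    return 1 - a * b

def ORwithNAND(a, b):
    """OR computed using only NAND, following De Morgan's law."""
    return NAND(NAND(a, a), NAND(b, b))

def test_OR():
    """Compare ORwithNAND with the truth table of OR on all four inputs."""
    for a in (0, 1):
        for b in (0, 1):
            assert ORwithNAND(a, b) == (1 if (a or b) else 0)
    return True

test_OR()
```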

Let \(\ensuremath{\mathit{MAJ}}: \{0,1\}^3 \rightarrow \{0,1\}\) be the function that on input \(a,b,c\) outputs \(1\) iff \(a+b+c \geq 2\). Show how to compute \(\ensuremath{\mathit{MAJ}}\) using a composition of \(\ensuremath{\mathit{NAND}}\)’s.

Recall that Equation 3.4 states that

\[ \ensuremath{\mathit{MAJ}}(x_0,x_1,x_2) = \ensuremath{\mathit{OR}}\left(\, \ensuremath{\mathit{AND}}(x_0,x_1)\;,\; \ensuremath{\mathit{OR}} \bigl( \ensuremath{\mathit{AND}}(x_1,x_2) \;,\; \ensuremath{\mathit{AND}}(x_0,x_2) \bigr) \, \right) \;. \;\;(3.9) \]

We can use Theorem 3.10 to replace all the occurrences of \(\ensuremath{\mathit{AND}}\) and \(\ensuremath{\mathit{OR}}\) with \(\ensuremath{\mathit{NAND}}\)’s. Specifically, we can use the equivalence \(\ensuremath{\mathit{AND}}(a,b)=\ensuremath{\mathit{NOT}}(\ensuremath{\mathit{NAND}}(a,b))\), \(\ensuremath{\mathit{OR}}(a,b)=\ensuremath{\mathit{NAND}}(\ensuremath{\mathit{NOT}}(a),\ensuremath{\mathit{NOT}}(b))\), and \(\ensuremath{\mathit{NOT}}(a)=\ensuremath{\mathit{NAND}}(a,a)\) to replace the righthand side of Equation 3.9 with an expression involving only \(\ensuremath{\mathit{NAND}}\), yielding that \(\ensuremath{\mathit{MAJ}}(a,b,c)\) is equivalent to the (somewhat unwieldy) expression

\[ \begin{gathered} \ensuremath{\mathit{NAND}} \biggl(\, \ensuremath{\mathit{NAND}}\Bigl(\, \ensuremath{\mathit{NAND}}\bigl(\ensuremath{\mathit{NAND}}(a,b),\ensuremath{\mathit{NAND}}(a,c)\bigr), \\ \ensuremath{\mathit{NAND}}\bigl(\ensuremath{\mathit{NAND}}(a,b),\ensuremath{\mathit{NAND}}(a,c)\bigr)\, \Bigr),\\ \ensuremath{\mathit{NAND}}(b,c) \, \biggr) \end{gathered} \]

This corresponds to the following circuit with \(\ensuremath{\mathit{NAND}}\) gates:
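As a sanity check on the NAND expression for \(\ensuremath{\mathit{MAJ}}\) above, the following sketch evaluates it on all eight inputs. The function name `MAJ_nand` and the intermediate variable names are illustrative; the repeated subexpression is computed once and reused, which does not change the value.

```python
from itertools import product

def NAND(a, b):
    """NAND of two bits."""
    return 1 - a * b

def MAJ_nand(a, b, c):
    """MAJ on three bits, written using only NAND as in the expression above."""
    p = NAND(NAND(a, b), NAND(a, c))   # equals OR(AND(a,b), AND(a,c))
    not_p = NAND(p, p)                 # equals NOT(p)
    return NAND(not_p, NAND(b, c))     # equals OR(p, AND(b,c))

# Compare with the definition: output 1 iff a + b + c >= 2.
for a, b, c in product((0, 1), repeat=3):
    assert MAJ_nand(a, b, c) == (1 if a + b + c >= 2 else 0)
```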

### NAND Circuits

We can define *NAND Circuits* to be circuits in which all the gates are NAND operations. Such a circuit again corresponds to a directed acyclic graph (DAG) but it is even simpler than general Boolean circuits: all the gates correspond to the same function (i.e., NAND) and all of them have in-degree exactly two. Despite their simplicity, NAND circuits can be quite powerful.

Recall the \(\ensuremath{\mathit{XOR}}\) function which maps \(x_0,x_1 \in \{0,1\}\) to \(x_0 + x_1 \mod 2\). We have seen that we can compute \(\ensuremath{\mathit{XOR}}\) using \(\ensuremath{\mathit{AND}}\), \(\ensuremath{\mathit{OR}}\), and \(\ensuremath{\mathit{NOT}}\), and so by Theorem 3.10 we can compute it using only \(\ensuremath{\mathit{NAND}}\)’s. The following is a direct construction of computing \(\ensuremath{\mathit{XOR}}\) by a sequence of NAND operations:

- Let \(u = \ensuremath{\mathit{NAND}}(x_0,x_1)\).
- Let \(v = \ensuremath{\mathit{NAND}}(x_0,u)\).
- Let \(w = \ensuremath{\mathit{NAND}}(x_1,u)\).
- The \(\ensuremath{\mathit{XOR}}\) of \(x_0\) and \(x_1\) is \(y_0 = \ensuremath{\mathit{NAND}}(v,w)\).

One can verify that this algorithm does indeed compute \(\ensuremath{\mathit{XOR}}\) by enumerating all the four choices for \(x_0,x_1 \in \{0,1\}\).
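This enumeration can also be carried out mechanically. The following is a small sketch (the function name `XOR_nand` is illustrative) that follows the four steps above and checks the result against \(x_0 + x_1 \mod 2\):

```python
def NAND(a, b):
    """NAND of two bits."""
    return 1 - a * b

def XOR_nand(x0, x1):
    """XOR computed by the four NAND steps listed above."""
    u = NAND(x0, x1)
    v = NAND(x0, u)
    w = NAND(x1, u)
    return NAND(v, w)

# Enumerate all four choices of (x0, x1).
for x0 in (0, 1):
    for x1 in (0, 1):
        assert XOR_nand(x0, x1) == (x0 + x1) % 2
```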

We can also represent this algorithm graphically as a circuit:

In fact, we can show the following theorem:

For every Boolean circuit \(C\) of \(s\) gates, there exists a NAND circuit \(C'\) of at most \(3s\) gates that computes the same function as \(C\).

The idea of the proof is to just replace every \(\ensuremath{\mathit{AND}}\), \(\ensuremath{\mathit{OR}}\) and \(\ensuremath{\mathit{NOT}}\) gate with their NAND implementation following the proof of Theorem 3.10.

If \(C\) is a Boolean circuit, then since, as we’ve seen in the proof of Theorem 3.10, for every \(a,b \in \{0,1\}\)

\(\ensuremath{\mathit{NOT}}(a) = \ensuremath{\mathit{NAND}}(a,a)\)

\(\ensuremath{\mathit{AND}}(a,b) = \ensuremath{\mathit{NAND}}(\ensuremath{\mathit{NAND}}(a,b),\ensuremath{\mathit{NAND}}(a,b))\)

\(\ensuremath{\mathit{OR}}(a,b) = \ensuremath{\mathit{NAND}}(\ensuremath{\mathit{NAND}}(a,a),\ensuremath{\mathit{NAND}}(b,b))\)

we can replace every gate of \(C\) with at most three \(\ensuremath{\mathit{NAND}}\) gates to obtain an equivalent circuit \(C'\). The resulting circuit will have at most \(3s\) gates.

Once we have shown that two models such as AND/OR/NOT circuits and NAND circuits are *computationally equivalent*, we can translate from one model to the other freely. Therefore we can always choose the model that is most convenient for the task at hand.

### More examples of NAND circuits (optional)

Here are some more sophisticated examples of NAND circuits.

Consider the task of computing, given as input a string \(x\in \{0,1\}^n\) that represents a natural number \(X\in \N\), the representation of \(X+1\). That is, we want to compute the function \(\ensuremath{\mathit{INC}}_n:\{0,1\}^n \rightarrow \{0,1\}^{n+1}\) such that for every \(x_0,\ldots,x_{n-1}\), \(\ensuremath{\mathit{INC}}_n(x)=y\) which satisfies \(\sum_{i=0}^n y_i 2^i = \left( \sum_{i=0}^{n-1} x_i 2^i \right)+1\). (For simplicity of notation in this example we will use the representation where the least significant digit is first rather than last.)

The increment operation can be very informally described as follows: *“Add \(1\) to the least significant bit and propagate the carry”*. A little more precisely, in the case of the binary representation, to obtain the increment of \(x\), we scan \(x\) from the least significant bit onwards, and flip all \(1\)’s to \(0\)’s until we encounter a bit equal to \(0\), in which case we flip it to \(1\) and stop. (Please verify you understand why this is the case.)

Thus we can compute the increment of \(x_0,\ldots,x_{n-1}\) by doing the following:

- Set \(c_0=1\) (we pretend we have a “carry” of \(1\) initially).
- For \(i=0,\ldots, n-1\) do the following:
  - Let \(y_i = \ensuremath{\mathit{XOR}}(x_i,c_i)\).
  - If \(c_i=x_i=1\) then \(c_{i+1}=1\), else \(c_{i+1}=0\).
- Set \(y_n = c_n\).

The above is a very precise description of an algorithm to compute the increment operation, and can be easily transformed into *Python* code that performs the same computation, but it does not seem to directly yield a NAND circuit to compute this. However, we can transform this algorithm line by line to a NAND circuit. For example, since for every \(a\), \(\ensuremath{\mathit{NAND}}(a,\ensuremath{\mathit{NOT}}(a))=1\), we can replace the initial statement \(c_0=1\) with \(c_0 = \ensuremath{\mathit{NAND}}(x_0,\ensuremath{\mathit{NAND}}(x_0,x_0))\). We already know how to compute \(\ensuremath{\mathit{XOR}}\) using NAND, so line 2.a can be replaced by some NAND operations. Next, since line 2.b sets \(c_{i+1}=1\) exactly when \(c_i=x_i=1\), we can write it as simply saying \(c_{i+1} = \ensuremath{\mathit{AND}}(c_i,x_i)\), or in other words \(c_{i+1}=\ensuremath{\mathit{NAND}}(\ensuremath{\mathit{NAND}}(c_i,x_i),\ensuremath{\mathit{NAND}}(c_i,x_i))\). Finally, the assignment \(y_n = c_n\) can be written as \(y_n = \ensuremath{\mathit{NAND}}(\ensuremath{\mathit{NAND}}(c_n,c_n),\ensuremath{\mathit{NAND}}(c_n,c_n))\). Combining these observations yields for every \(n\in \N\), a \(\ensuremath{\mathit{NAND}}\) circuit to compute \(\ensuremath{\mathit{INC}}_n\). For example, this is what this circuit looks like for \(n=4\).
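The line-by-line translation can be sketched in Python as follows. This is an illustration under stated assumptions rather than the book's own code: the names `INC` and `to_int` are chosen here, bits are Python integers with the least significant bit first, and \(n \geq 1\). Every operation is a NAND, either directly or through the four-NAND \(\ensuremath{\mathit{XOR}}\).

```python
from itertools import product

def NAND(a, b):
    """NAND of two bits."""
    return 1 - a * b

def XOR(a, b):
    """XOR built from four NANDs."""
    u = NAND(a, b)
    return NAND(NAND(a, u), NAND(b, u))

def INC(x):
    """Increment an n-bit number (least significant bit first); returns n+1 bits."""
    n = len(x)
    c = NAND(x[0], NAND(x[0], x[0]))         # c_0 = 1
    y = []
    for i in range(n):
        y.append(XOR(x[i], c))               # y_i = XOR(x_i, c_i)
        t = NAND(c, x[i])
        c = NAND(t, t)                       # c_{i+1} = AND(c_i, x_i)
    y.append(NAND(NAND(c, c), NAND(c, c)))   # y_n = c_n
    return y

def to_int(bits):
    """Value of a bit sequence given least significant bit first."""
    return sum(b << i for i, b in enumerate(bits))

# Exhaustive check for n = 4.
for x in product((0, 1), repeat=4):
    assert to_int(INC(list(x))) == to_int(x) + 1
```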

Once we have the increment operation, we can certainly compute addition by repeatedly incrementing (i.e., compute \(x+y\) by performing \(\ensuremath{\mathit{INC}}(x)\) \(y\) times). However, that would be quite inefficient and unnecessary. With the same idea of keeping track of carries we can implement the “grade-school” addition algorithm and compute the function \(\ensuremath{\mathit{ADD}}_n:\{0,1\}^{2n} \rightarrow \{0,1\}^{n+1}\) that on input \(x\in \{0,1\}^{2n}\) outputs the binary representation of the sum of the numbers represented by \(x_0,\ldots,x_{n-1}\) and \(x_n,\ldots,x_{2n-1}\):

- Set \(c_0=0\).
- For \(i=0,\ldots,n-1\):
  - Let \(y_i = x_i + x_{n+i} + c_i \mod 2\).
  - If \(x_i + x_{n+i} + c_i \geq 2\) then \(c_{i+1}=1\), else \(c_{i+1}=0\).
- Let \(y_n = c_n\).

Once again, this can be translated into a NAND circuit. To transform Step 2.b to a NAND circuit we use the fact (shown in Solved Exercise 3.3) that the function \(\ensuremath{\mathit{MAJ}}_3:\{0,1\}^3 \rightarrow \{0,1\}\) can be computed using \(\ensuremath{\mathit{NAND}}\)s.
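As an illustration, the addition algorithm can be sketched in Python using only NAND-based building blocks. The names `ADD` and `to_int` are chosen here; the sketch assumes \(n \geq 1\), that both numbers are given least significant bit first, and that the second number occupies positions \(n,\ldots,2n-1\) of the input. Step 2.a is computed as the XOR of the three bits and Step 2.b as their majority.

```python
from itertools import product

def NAND(a, b):
    """NAND of two bits."""
    return 1 - a * b

def XOR(a, b):
    """XOR built from four NANDs."""
    u = NAND(a, b)
    return NAND(NAND(a, u), NAND(b, u))

def MAJ(a, b, c):
    """Majority of three bits built from NANDs."""
    p = NAND(NAND(a, b), NAND(a, c))
    return NAND(NAND(p, p), NAND(b, c))

def ADD(x):
    """Add two n-bit numbers packed into 2n bits; returns n+1 bits."""
    n = len(x) // 2
    one = NAND(x[0], NAND(x[0], x[0]))       # constant 1
    c = NAND(one, one)                       # c_0 = 0
    y = []
    for i in range(n):
        y.append(XOR(XOR(x[i], x[n + i]), c))  # y_i = x_i + x_{n+i} + c_i mod 2
        c = MAJ(x[i], x[n + i], c)             # c_{i+1} = 1 iff the sum is >= 2
    y.append(c)                              # y_n = c_n
    return y

def to_int(bits):
    """Value of a bit sequence given least significant bit first."""
    return sum(b << i for i, b in enumerate(bits))

# Exhaustive check for n = 3.
for x in product((0, 1), repeat=6):
    assert to_int(ADD(list(x))) == to_int(x[:3]) + to_int(x[3:])
```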

### The NAND-CIRC Programming language

Just like we did for Boolean circuits, we can define a programming-language analog of NAND circuits. It is even simpler than the AON-CIRC language since we only have a single operation. We define the *NAND-CIRC Programming Language* to be a programming language where every line has the form `foo = NAND(bar,blah)`, where `foo`, `bar` and `blah` are variable identifiers.

Here is an example of a NAND-CIRC program:

Do you know what function this program computes? Hint: you have seen it before.

We can formally define the notion of computation by a NAND-CIRC program in the natural way:

Let \(f:\{0,1\}^n \rightarrow \{0,1\}^m\) be some function, and let \(P\) be a NAND-CIRC program. We say that \(P\) *computes* the function \(f\) if:

- \(P\) has \(n\) input variables `X[`\(0\)`]`\(,\ldots,\)`X[`\(n-1\)`]` and \(m\) output variables `Y[`\(0\)`]`\(,\ldots,\)`Y[`\(m-1\)`]`.
- For every \(x\in \{0,1\}^n\), if we execute \(P\) when we assign to `X[`\(0\)`]`\(,\ldots,\)`X[`\(n-1\)`]` the values \(x_0,\ldots,x_{n-1}\), then at the end of the execution, the output variables `Y[`\(0\)`]`\(,\ldots,\)`Y[`\(m-1\)`]` have the values \(y_0,\ldots,y_{m-1}\) where \(y=f(x)\).
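To illustrate this definition, here is a minimal sketch of an evaluator for NAND-CIRC programs. It is not the book's implementation: the name `eval_nand_circ` and the regular-expression parsing are assumptions made for this example, and it expects every non-empty line to have exactly the form `foo = NAND(bar,blah)`. The sample program is the XOR construction from earlier in this section, rewritten with `X[0]`, `X[1]` and `Y[0]`.

```python
import re

# Pattern for one line of the form: foo = NAND(bar,blah)
LINE_RE = re.compile(r"\s*(\S+)\s*=\s*NAND\(\s*([^,\s]+)\s*,\s*([^)\s]+)\s*\)\s*$")

def eval_nand_circ(code, x):
    """Run a NAND-CIRC program on input bits x and return [Y[0], Y[1], ...]."""
    env = {f"X[{i}]": bit for i, bit in enumerate(x)}   # assign the inputs
    for line in code.splitlines():
        if not line.strip():
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"not a NAND-CIRC line: {line!r}")
        foo, bar, blah = match.groups()
        env[foo] = 1 - env[bar] * env[blah]             # foo = NAND(bar, blah)
    m = 0
    while f"Y[{m}]" in env:                             # count the output variables
        m += 1
    return [env[f"Y[{j}]"] for j in range(m)]

xor_prog = """u = NAND(X[0],X[1])
v = NAND(X[0],u)
w = NAND(X[1],u)
Y[0] = NAND(v,w)"""

# The program computes XOR if its output agrees with XOR on every input.
for a in (0, 1):
    for b in (0, 1):
        assert eval_nand_circ(xor_prog, (a, b)) == [(a + b) % 2]
```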

As before we can show that NAND circuits are equivalent to NAND-CIRC programs (see Figure 3.23):

For every \(f:\{0,1\}^n \rightarrow \{0,1\}^m\) and \(s \geq m\), \(f\) is computable by a NAND-CIRC program of \(s\) lines if and only if \(f\) is computable by a NAND circuit of \(s\) gates.

We omit the proof of Theorem 3.18 since it follows along exactly the same lines as the equivalence of Boolean circuits and AON-CIRC programs (Theorem 3.9). Given Theorem 3.18 and Theorem 3.13, we know that we can translate every \(s\)-line AON-CIRC program \(P\) into an equivalent NAND-CIRC program of at most \(3s\) lines. In fact, this translation can be easily done by replacing every line of the form `foo = AND(bar,blah)`, `foo = OR(bar,blah)` or `foo = NOT(bar)` with the equivalent 1-3 lines that use the `NAND` operation.

Here is a “proof by code”: a simple Python program that translates an input AON-CIRC program into an equivalent NAND-CIRC program:

```
def AON2NAND(code):
    """Translate an AON-CIRC program to an equivalent NAND-CIRC program"""
    output = ""
    counter = 0
    for line in code.split("\n"):
        if not line: continue
        # parseline is the same helper used in prog2circuit above
        foo,op,bar,blah = parseline(line,2)
        if op=="NOT":
            output += f"{foo} = NAND({bar},{bar})\n"
        if op=="AND":
            output += f"temp_{counter} = NAND({bar},{blah})\n"
            output += f"{foo} = NAND(temp_{counter},temp_{counter})\n"
            counter +=1
        if op=="OR":
            output += f"temp_{counter} = NAND({bar},{bar})\n"
            output += f"temp_{counter+1} = NAND({blah},{blah})\n"
            output += f"{foo} = NAND(temp_{counter},temp_{counter+1})\n"
            counter +=2
    return output

# The AON-CIRC code
#   t1 = AND(X[0],X[1])
#   notx0 = NOT(X[0])
#   t2 = AND(notx0,X[2])
#   Y[0] = OR(t1,t2)
# will be translated into the NAND-CIRC code
#   temp_0 = NAND(X[0],X[1])
#   t1 = NAND(temp_0,temp_0)
#   notx0 = NAND(X[0],X[0])
#   temp_1 = NAND(notx0,X[2])
#   t2 = NAND(temp_1,temp_1)
#   temp_2 = NAND(t1,t1)
#   temp_3 = NAND(t2,t2)
#   Y[0] = NAND(temp_2,temp_3)
```

You might have heard of a term called “Turing Complete” that is sometimes used to describe programming languages. (If you haven’t, feel free to ignore the rest of this remark: we will encounter this term in Chapter 7 where we will define it precisely.) If so, you might wonder if the NAND-CIRC programming language has this property. The answer is **no**, or perhaps more accurately, the term “Turing Completeness” is not really applicable for the NAND-CIRC programming language. The reason is that, by design, the NAND-CIRC programming language can only compute *finite* functions \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) that take a fixed number of input bits and produce a fixed number of output bits. The term “Turing Complete” is only applicable to programming languages for *infinite* functions that can take inputs of arbitrary length. We will come back to this distinction later on in this book.

## Equivalence of all these models

If we put together Theorem 3.9, Theorem 3.13, and Theorem 3.18, we obtain the following result:

For every sufficiently large \(s,n,m\) and \(f:\{0,1\}^n \rightarrow \{0,1\}^m\), the following conditions are all equivalent to one another:

\(f\) can be computed by a Boolean circuit (with \(\wedge,\vee,\neg\) gates) of at most \(O(s)\) gates.

\(f\) can be computed by an AON-CIRC straight-line program of at most \(O(s)\) lines.

\(f\) can be computed by a NAND circuit of at most \(O(s)\) gates.

\(f\) can be computed by a NAND-CIRC straight-line program of at most \(O(s)\) lines.

By “\(O(s)\)” we mean that the bound is at most \(c\cdot s\) where \(c\) is a constant that is independent of \(n\). For example, if \(f\) can be computed by a Boolean circuit of \(s\) gates, then it can be computed by a NAND-CIRC program of at most \(3s\) lines, and if \(f\) can be computed by a NAND circuit of \(s\) gates, then it can be computed by an AON-CIRC program of at most \(2s\) lines.

We omit the formal proof, which is obtained by combining Theorem 3.9, Theorem 3.13, and Theorem 3.18. The key observation is that the results we have seen allow us to translate a program/circuit that computes \(f\) in one of the above models into a program/circuit that computes \(f\) in another model by increasing the lines/gates by at most a constant factor (in fact this constant factor is at most \(3\)).

We can describe a finite computation that uses some set of basic operations using either a *circuit* or a *straightline program*, and these two representations are *equivalent* to one another. Moreover, if we can implement one set of basic operations using another and vice versa, then circuits/programs using one of these sets are equivalent in power to circuits/programs using the other.

Theorem 3.9 is a special case of a more general result. We can consider even more general models of computation, where instead of AND/OR/NOT or NAND, we use other operations (see Subsection 3.6.1 below). It turns out that Boolean circuits are equivalent in power to such models as well. The fact that all these different ways to define computation lead to equivalent models shows that we are “on the right track”. It justifies the seemingly arbitrary choices that we’ve made of using AND/OR/NOT or NAND as our basic operations, since these choices do not affect the power of our computational model.

Equivalence results such as Theorem 3.20 mean that we can easily translate between Boolean circuits, NAND circuits, NAND-CIRC programs and the like. We will use this ability later on in this book, often shifting to the most convenient formulation without making a big deal about it. Hence we will not worry too much about the distinction between, for example, Boolean circuits and NAND-CIRC programs.

In contrast, we will continue to take special care to distinguish between *circuits/programs* and *functions* (recall Big Idea 2). A function corresponds to a *specification* of a computational task, and it is a fundamentally different object than a program or a circuit, which corresponds to the *implementation* of the task.

### Circuits with other gate sets

There is nothing special about AND/OR/NOT or NAND. For every set of functions \(\mathcal{G} = \{ G_0,\ldots,G_{k-1} \}\), we can define a notion of circuits that use elements of \(\mathcal{G}\) as gates, and a notion of a “\(\mathcal{G}\) programming language” where every line involves assigning to a variable `foo` the result of applying some \(G_i \in \mathcal{G}\) to previously defined or input variables. Specifically, we can make the following definition:

Let \(\mathcal{F} = \{ f_0,\ldots, f_{t-1} \}\) be a finite collection of Boolean functions, such that \(f_i:\{0,1\}^{k_i} \rightarrow \{0,1\}\) for some \(k_i \in \N\). An *\(\mathcal{F}\) program* is a sequence of lines, each of which assigns to some variable the result of applying some \(f_i \in \mathcal{F}\) to \(k_i\) other variables. As above, we use `X[`\(i\)`]` and `Y[`\(j\)`]` to denote the input and output variables.

We say that \(\mathcal{F}\) is a *universal set of operations* (also known as a universal gate set) if there exists an \(\mathcal{F}\) program to compute the function \(\ensuremath{\mathit{NAND}}\).

AON-CIRC programs correspond to \(\{\ensuremath{\mathit{AND}},\ensuremath{\mathit{OR}},\ensuremath{\mathit{NOT}}\}\) programs, and NAND-CIRC programs correspond to \(\mathcal{F}\) programs for the set \(\mathcal{F}\) that only contains the \(\ensuremath{\mathit{NAND}}\) function, but we can also define \(\{ \ensuremath{\mathit{XOR}},0,1\}\) programs, or use any other set.

We can also define *\(\mathcal{F}\) circuits*, which will be directed graphs in which the *gates* correspond to applying a function \(f_i \in \mathcal{F}\), and will each have \(k_i\) incoming wires and a single outgoing wire.^{11} As in Theorem 3.9, we can show that \(\mathcal{F}\) circuits and \(\mathcal{F}\) programs are equivalent. We have seen that for \(\mathcal{F} = \{ \ensuremath{\mathit{AND}},\ensuremath{\mathit{OR}}, \ensuremath{\mathit{NOT}}\}\), the resulting circuits/programs are equivalent in power to the NAND-CIRC programming language, as we can compute \(\ensuremath{\mathit{NAND}}\) using \(\ensuremath{\mathit{AND}}\)/\(\ensuremath{\mathit{OR}}\)/\(\ensuremath{\mathit{NOT}}\) and vice versa. This turns out to be a special case of a general phenomenon, the *universality* of \(\ensuremath{\mathit{NAND}}\) and other gate sets, that we will explore in more depth later in this book.

Let \(\mathcal{F} = \{ \ensuremath{\mathit{IF}} , \ensuremath{\mathit{ZERO}}, \ensuremath{\mathit{ONE}} \}\) where \(\ensuremath{\mathit{ZERO}}:\{0,1\} \rightarrow \{0,1\}\) and \(\ensuremath{\mathit{ONE}}:\{0,1\} \rightarrow \{0,1\}\) are the constant zero and one functions,^{12} and \(\ensuremath{\mathit{IF}}:\{0,1\}^3 \rightarrow \{0,1\}\) is the function that on input \((a,b,c)\) outputs \(b\) if \(a=1\) and \(c\) otherwise. Then \(\mathcal{F}\) is universal.

Indeed, we can demonstrate that \(\{ \ensuremath{\mathit{IF}}, \ensuremath{\mathit{ZERO}} , \ensuremath{\mathit{ONE}} \}\) is universal using the following formula for \(\ensuremath{\mathit{NAND}}\):

\[ \ensuremath{\mathit{NAND}}(a,b) = \ensuremath{\mathit{IF}}(a,\ensuremath{\mathit{IF}}(b,\ensuremath{\mathit{ZERO}},\ensuremath{\mathit{ONE}}),\ensuremath{\mathit{ONE}}) \;. \]
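This formula can be checked on all four inputs with a few lines of Python. In the sketch below the function names are illustrative, and `ZERO` and `ONE` are modeled as one-input functions that ignore their argument, matching the signatures in the statement above.

```python
from itertools import product

def IF(a, b, c):
    """Output b if a == 1 and c otherwise."""
    return b if a == 1 else c

def ZERO(a):
    """Constant zero function on one bit."""
    return 0

def ONE(a):
    """Constant one function on one bit."""
    return 1

def NAND_from_IF(a, b):
    """NAND(a,b) = IF(a, IF(b, ZERO, ONE), ONE); the constants ignore their input."""
    return IF(a, IF(b, ZERO(a), ONE(a)), ONE(a))

# Compare with the definition of NAND on all four inputs.
for a, b in product((0, 1), repeat=2):
    assert NAND_from_IF(a, b) == 1 - a * b
```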

There are also some sets \(\mathcal{F}\) that are more restricted in power. For example, it can be shown that if we use only AND or OR gates (without NOT) then we do *not* get an equivalent model of computation. The exercises cover several examples of universal and non-universal gate sets.

- An *algorithm* is a recipe for performing a computation as a sequence of “elementary” or “simple” operations.
- One candidate definition for “elementary” operations is the set \(\ensuremath{\mathit{AND}}\), \(\ensuremath{\mathit{OR}}\) and \(\ensuremath{\mathit{NOT}}\).
- Another candidate definition for an “elementary” operation is the \(\ensuremath{\mathit{NAND}}\) operation. It is an operation that is easily implementable in the physical world in a variety of methods including by electronic transistors.
- We can use \(\ensuremath{\mathit{NAND}}\) to compute many other functions, including majority, increment, and others.
- There are other equivalent choices, including the set \(\{\ensuremath{\mathit{AND}},\ensuremath{\mathit{OR}},\ensuremath{\mathit{NOT}}\}\).
- We can formally define the notion of a function \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) being computable using the *NAND-CIRC Programming language*.
- For every set of basic operations, the notions of being computable by a circuit and being computable by a straight-line program are equivalent.

## Exercises

Prove that the set \(\{ \ensuremath{\mathit{OR}} , \ensuremath{\mathit{NOT}} \}\) is *universal*, in the sense that one can compute NAND from it.

Prove that for every \(n\)-bit input circuit \(C\) that contains only AND and OR gates, as well as gates that compute the constant functions \(0\) and \(1\), \(C\) is *monotone*, in the sense that if \(x,x' \in \{0,1\}^n\) satisfy \(x_i \leq x'_i\) for every \(i\in [n]\), then \(C(x) \leq C(x')\).

Conclude that the set \(\{ \ensuremath{\mathit{AND}} , \ensuremath{\mathit{OR}}, 0 , 1\}\) is *not* universal.

Prove that for every \(n\)-bit input circuit \(C\) that contains only XOR gates, as well as gates that compute the constant functions \(0\) and \(1\), \(C\) is *affine or linear modulo two*, in the sense that there exists some \(a\in \{0,1\}^n\) and \(b\in \{0,1\}\) such that for every \(x\in \{0,1\}^n\), \(C(x) = \sum_{i=0}^{n-1}a_ix_i + b \mod 2\).

Conclude that the set \(\{ \ensuremath{\mathit{XOR}} , 0 , 1\}\) is *not* universal.

Prove that \(\{ \ensuremath{\mathit{MAJ}},\ensuremath{\mathit{NOT}} \}\) is a universal set of gates.

Let \(\ensuremath{\mathit{NOR}}:\{0,1\}^2 \rightarrow \{0,1\}\) be defined as \(\ensuremath{\mathit{NOR}}(a,b) = \ensuremath{\mathit{NOT}}(\ensuremath{\mathit{OR}}(a,b))\). Prove that \(\{ \ensuremath{\mathit{NOR}} \}\) is a universal set of gates.

Prove that \(\{ \ensuremath{\mathit{LOOKUP}}_1,0,1 \}\) is a universal set of gates, where \(0\) and \(1\) are the constant functions and \(\ensuremath{\mathit{LOOKUP}}_1:\{0,1\}^3 \rightarrow \{0,1\}\) satisfies that \(\ensuremath{\mathit{LOOKUP}}_1(a,b,c)\) equals \(a\) if \(c=0\) and equals \(b\) if \(c=1\).

Prove that for every subset \(B\) of the functions from \(\{0,1\}^k\) to \(\{0,1\}\), if \(B\) is universal then there is a \(B\)-circuit of at most \(O(k)\) gates to compute the \(\ensuremath{\mathit{NAND}}\) function (you can start by showing that there is a \(B\) circuit of at most \(O(k^{16})\) gates).^{13}

Prove that there is some constant \(c\) such that for every \(n>1\), and integers \(a_0,\ldots,a_{n-1},b \in \{-2^n,-2^n+1,\ldots,-1,0,+1,\ldots,2^n\}\), there is a NAND circuit with at most \(c\cdot n^4\) gates that computes the *threshold* function \(f_{a_0,\ldots,a_{n-1},b}:\{0,1\}^n \rightarrow \{0,1\}\) that on input \(x\in \{0,1\}^n\) outputs \(1\) if and only if \(\sum_{i=0}^{n-1} a_i x_i > b\).

Prove that there is some constant \(c\) such that for every \(n>1\), there is a NAND circuit of at most \(c\cdot n\) gates that computes the majority function \(\ensuremath{\mathit{MAJ}}_n:\{0,1\}^n \rightarrow \{0,1\}\) on \(n\) input bits. That is, \(\ensuremath{\mathit{MAJ}}_n(x)=1\) iff \(\sum_{i=0}^{n-1}x_i > n/2\).^{14}
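When experimenting with the last two exercises, it can help to have a direct reference implementation of the functions involved, against which a candidate circuit can be tested. The Python sketch below implements only the *specification* of the threshold and majority functions as stated above, using ordinary integer arithmetic. It is not a NAND circuit and says nothing about gate counts. The function names are hypothetical names chosen for this illustration.

```python
from itertools import product

def threshold(a, b, x):
    """f_{a_0,...,a_{n-1},b}(x): outputs 1 iff sum_i a_i * x_i > b."""
    return 1 if sum(ai * xi for ai, xi in zip(a, x)) > b else 0

def majority(x):
    """MAJ_n(x): outputs 1 iff sum_i x_i > n/2 (compared as 2*sum > n to stay in integers)."""
    return 1 if 2 * sum(x) > len(x) else 0

def majority_as_threshold(x):
    # Since the sum is an integer, sum > n/2 holds exactly when sum > floor(n/2),
    # so MAJ_n is the threshold function with all weights 1 and b = floor(n/2).
    n = len(x)
    return threshold([1] * n, n // 2, x)

# Compare the two definitions of majority on all inputs of length 1..6.
agree = all(
    majority(x) == majority_as_threshold(x)
    for n in range(1, 7)
    for x in product([0, 1], repeat=n)
)
print(agree)
```

In particular, majority is a special case of threshold, so any construction for the threshold exercise also yields a (larger) circuit for \(\ensuremath{\mathit{MAJ}}_n\).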

## Biographical notes

Charles Babbage (1791-1871) was a visionary scientist, mathematician, and inventor (see (Swade, 2002) and (Collier, MacLachlan, 2000)). More than a century before the invention of modern electronic computers, Babbage realized that computation can be in principle mechanized. His first design for a mechanical computer was the *difference engine* that was designed to do polynomial interpolation. He then designed the *analytical engine* which was a much more general machine and the first prototype for a programmable general purpose computer. Unfortunately, Babbage was never able to complete the design of his prototypes. One of the earliest people to realize the engine’s potential and far reaching implications was Ada Lovelace (see the notes to Chapter 6).

Boolean algebra was first investigated by Boole and DeMorgan in the 1840s (Boole, 1847) (De Morgan, 1847), but the definition of Boolean circuits and the connection to electrical relay circuits was given in Shannon’s master’s thesis (Shannon, 1938). (Howard Gardner called Shannon’s thesis “possibly the most important, and also the most famous, master’s thesis of the [20th] century”.) Savage’s book (Savage, 1998), like this one, introduces the theory of computation starting with Boolean circuits as the first model. Jukna’s book (Jukna, 2012) contains a modern exposition of Boolean circuits. See also Wegener’s book (Wegener, 1987).

The NAND function was shown to be universal by Sheffer (Sheffer, 1913) (though apparently this was shown even earlier by Peirce, see (Peirce, Eisele, 1976) (Burks, 1978)). Whitehead and Russell used NAND as the basis for their logic in their magnum opus *Principia Mathematica* (Whitehead, Russell, 1912). In her Ph.D. thesis, Ernst (Ernst, 2009) investigates empirically the minimal NAND circuits for various functions. Nisan and Schocken’s book (Nisan, Schocken, 2005) builds a computing system starting from NAND gates and ending with high-level programs and games (“NAND to Tetris”); see also the website nandtotetris.org.

1. Indeed, extrapolation from examples is still the way most of us first learn algorithms such as addition and multiplication (see Figure 3.4).

2. Translation from “The Algebra of Ben-Musa”, Fredric Rosen, 1831.

3. This is not a programming textbook, and it is absolutely fine if you don’t know Python. Still, the code below should be fairly self-explanatory.

4. When Boolean circuits are implemented physically, the signal is often implemented by electric potential or *voltage* on a wire, where for example voltage above a certain level is interpreted as a logical value of \(1\), and below a certain level is interpreted as a logical value of \(0\).

5. Since \(h\) is a minimal layering, all the input vertices will be in the \(0\)-th layer.

6. Note that since \(u,w\) are in-neighbors of \(v\), they are in a lower layer than \(v\), and hence their value has already been assigned.

7. We follow the common programming languages convention of using names such as `foo`, `bar`, `baz`, `blah` as stand-ins for generic identifiers. A variable identifier in our programming language can be any combination of letters, numbers, underscores, and brackets. The appendix contains a full formal specification of our programming language.

8. This program uses two “helper functions” `numinout` and `parseline`. The (short) code of these functions is available on the GitHub repository for this book.

9. This might seem as merely a curiosity but there is a field known as fluidics concerned with implementing logical operations using liquids or gasses. Some of the motivations include operating in extreme environmental conditions such as in space or a battlefield, where standard electronic equipment would not survive.

10. Threshold is just one example of gates that can be used by neural networks. More generally, a neural network is often described as operating on signals that are real numbers, rather than \(0/1\) values, and where the output of a gate on inputs \(x_0,\ldots,x_{k-1}\) is obtained by applying \(f(\sum_i w_i x_i)\) where \(f:\R \rightarrow \R\) is an activation function such as rectified linear unit (ReLU), Sigmoid, or many others. However, for the purpose of our discussion, all of the above are equivalent. In particular we can reduce the real case to the binary case by representing a real number in the binary basis, and multiplying the weight of the bit corresponding to the \(i^{th}\) digit by \(2^i\).

11. There is a minor technical complication when using gates corresponding to *non symmetric* functions. A function \(f:\{0,1\}^k \rightarrow \{0,1\}\) is *symmetric* if re-ordering its inputs does not make a difference to the output. For example, the functions \(\ensuremath{\mathit{NAND}}\), \(\ensuremath{\mathit{AND}}\), \(\ensuremath{\mathit{OR}}\) are symmetric. If we consider circuits with gates that are non-symmetric functions, then we need to label each wire entering a gate as to which parameter of the function it corresponds to.

12. One can also define these functions as taking a length zero input. This makes no difference for the computational power of the model.

13. Thanks to Alec Sun for solving this problem.

14. *Hint:* One approach to solve this is using recursion and analyzing it using the so called “Master Theorem”.

Compiled on 04/18/2019 14:47:35

Copyright 2019, Boaz Barak.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Produced using pandoc and panflute with templates derived from gitbook and bookdown.