Loops and infinity
 Learn the model of Turing machines, which can compute functions of arbitrary input lengths.
 See a programming-language description of Turing machines, using NANDTM programs, which add loops and arrays to NANDCIRC.
 See some basic syntactic sugar and equivalence of variants of Turing machines and NANDTM programs.
“The bounds of arithmetic were however outstepped the moment the idea of applying the [punched] cards had occurred; and the Analytical Engine does not occupy common ground with mere ‘calculating machines.’ … In enabling mechanism to combine together general symbols, in successions of unlimited variety and extent, a uniting link is established between the operations of matter and the abstract mental processes of the most abstract branch of mathematical science.”, Ada Augusta, countess of Lovelace, 1843^{1}
“What is the difference between a Turing machine and the modern computer? It’s the same as that between Hillary’s ascent of Everest and the establishment of a Hilton hotel on its peak.” , Alan Perlis, 1982.
The model of Boolean circuits (or equivalently, the NANDCIRC programming language) has one very significant drawback: a Boolean circuit can only compute a finite function \(f\), and in particular the number of inputs of \(f\) is always smaller than (twice) the number of gates of the circuit.^{2}
This does not capture our intuitive notion of an algorithm as a single recipe to compute a potentially infinite function. For example, the standard elementary school multiplication algorithm is a single algorithm that multiplies numbers of all lengths, yet we cannot express it as a single NANDCIRC program. Instead, we need a different NANDCIRC program for every input length (see Figure 6.1).
Let us consider the case of the simple parity or XOR function \(\ensuremath{\mathit{XOR}}:\{0,1\}^* \rightarrow \{0,1\}\), where \(\ensuremath{\mathit{XOR}}(x)\) equals \(1\) iff the number of \(1\)’s in \(x\) is odd. (In other words, \(\ensuremath{\mathit{XOR}}(x) = \sum_{i=0}^{|x|-1} x_i \mod 2\) for every \(x\in \{0,1\}^*\).) As simple as it is, the \(\ensuremath{\mathit{XOR}}\) function cannot be computed by a NANDCIRC program. Rather, for every \(n\), we can compute \(\ensuremath{\mathit{XOR}}_n\) (the restriction of \(\ensuremath{\mathit{XOR}}\) to \(\{0,1\}^n\)) using a different NANDCIRC program. For example, Figure 6.2 presents the NANDCIRC program (or equivalently the circuit) to compute \(\ensuremath{\mathit{XOR}}_5\).
This code for computing \(\ensuremath{\mathit{XOR}}_5\) is rather repetitive, and more importantly, does not capture the fact that there is a single algorithm to compute the parity on all inputs. Typical programming languages use the notion of loops to express such an algorithm, and so we might have wanted to use code such as:
# s is the "running parity", initialized to 0
while i<len(X):
    u = NAND(s,X[i])
    v = NAND(s,u)
    w = NAND(X[i],u)
    s = NAND(v,w)
    i+= 1
Y[0] = s
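The loop above is written in an imagined extension of NANDCIRC. As a minimal sketch of the same idea in ordinary Python (the helper names `NAND` and `xor_parity` are ours and are not part of NANDCIRC), the following function keeps a running parity and updates it with four NAND operations per input bit:

```python
def NAND(a, b):
    """Return the NAND of two bits (each 0 or 1)."""
    return 1 - (a & b)

def xor_parity(X):
    """Return the parity of the bit list X using only NAND operations."""
    s = 0  # running parity, initialized to 0
    i = 0
    while i < len(X):
        u = NAND(s, X[i])
        v = NAND(s, u)
        w = NAND(X[i], u)
        s = NAND(v, w)  # s is now (old s) XOR X[i]
        i += 1
    return s

print(xor_parity([1, 0, 1, 1, 0]))  # three ones, so the parity is 1
```

The same four-line loop body works for every input length, which is exactly what a single NANDCIRC program cannot express.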
In this chapter we will show how we can extend the model of Boolean circuits / straight-line programs so that it can capture these kinds of constructs. We will see two ways to do so:
Turing machines, invented by Alan Turing in 1936, are hypothetical abstract devices that yield a finite description of an algorithm that can handle arbitrarily long inputs.
The NANDTM Programming language extends NANDCIRC with the notion of loops and arrays to allow a finite program that can compute a function with arbitrarily long inputs.
It turns out that these two models are equivalent, and in fact they are equivalent to a great many other computational models including programming languages you may be familiar with such as C, Java, Python, Javascript, OCaml, and so on and so forth. This notion, known as Turing equivalence or Turing completeness, will be discussed in Chapter 7. We start off by presenting Turing machines and then show their equivalence to NANDTM, though it is also possible to present these in the opposite order.
Previously in this book we studied the computation of finite functions \(f:\{0,1\}^n \rightarrow \{0,1\}^m\). Such a function \(f\) can always be described by listing all the \(2^n\) values it takes on inputs \(x\in \{0,1\}^n\).
In this chapter we consider functions that take inputs of unbounded size, such as the function \(\ensuremath{\mathit{XOR}}:\{0,1\}^* \rightarrow \{0,1\}\) that maps \(x\) to \(\sum_{i=0}^{|x|-1} x_i \mod 2\). While we can describe \(\ensuremath{\mathit{XOR}}\) using a finite number of symbols (in fact we just did so in the previous sentence), it takes infinitely many possible inputs and so we cannot just write down all of its values. The same is true for many other functions capturing important computational tasks including addition, multiplication, sorting, finding paths in graphs, fitting curves to points, and so on and so forth.
To contrast with the finite case, we will sometimes call a function \(F:\{0,1\}^* \rightarrow \{0,1\}\) (or \(F:\{0,1\}^* \rightarrow \{0,1\}^*\)) infinite, but we emphasize that the functions we are interested in always take an input which is a finite string. It’s just that, unlike the finite case, this string can be arbitrarily long and is not fixed to some particular length \(n\).
Turing Machines
“Computing is normally done by writing certain symbols on paper. We may suppose that this paper is divided into squares like a child’s arithmetic book.. The behavior of the [human] computer at any moment is determined by the symbols which he is observing, and of his ‘state of mind’ at that moment… We may suppose that in a simple operation not more than one symbol is altered.”,
“We compare a man in the process of computing … to a machine which is only capable of a finite number of configurations… The machine is supplied with a ‘tape’ (the analogue of paper) … divided into sections (called ‘squares’) each capable of bearing a ‘symbol’”, Alan Turing, 1936
The “granddaddy” of all models of computation is the Turing Machine. Turing machines were defined in 1936 by Alan Turing in an attempt to formally capture all the functions that can be computed by human “computers” (see Figure 6.4) that follow a well-defined set of rules, such as the standard algorithms for addition or multiplication.
Turing thought of such a person as having access to as much “scratch paper” as they need. For simplicity we can think of this scratch paper as a one dimensional piece of graph paper (or tape, as it is commonly referred to), which is divided into “cells”, where each “cell” can hold a single symbol (e.g., one digit or letter, and more generally some element of a finite alphabet). At any point in time, the person can read from and write to a single cell of the paper, and based on the contents can update his/her finite mental state, and/or move to the cell immediately to the left or right of the current one.
Thus, Turing modeled such a computation by a “machine” that maintains one of \(k\) states, and at each point can read and write a single symbol from some alphabet \(\Sigma\) (containing \(\{0,1\}\)) from its “work tape” (see Figure 6.6). To perform computation using this machine, we write the input \(x\in \{0,1\}^*\) on the tape, and the goal of the machine is to ensure that at the end of the computation, the value \(F(x)\) will be written on the tape. Specifically, a computation of a Turing Machine \(M\) with \(k\) states and alphabet \(\Sigma\) on input \(x\in \{0,1\}^*\) proceeds as follows:
Initially the machine is at state \(0\) (known as the “starting state”) and the tape is initialized to \(\triangleright,x_0,\ldots,x_{n-1},\varnothing,\varnothing,\ldots\).^{3}
The location \(i\) to which the machine points is set to \(0\).
At each step, the machine reads the symbol \(\sigma = T[i]\) that is in the \(i^{th}\) location of the tape, and based on this symbol and its state \(s\) decides on:
 What symbol \(\sigma'\) to write on the tape
 Whether to move Left (i.e., \(i \leftarrow i-1\)), Right (i.e., \(i \leftarrow i+1\)), Stay in place, or Halt the computation.
 What is going to be the new state \(s \in [k]\)
The set of rules the Turing machine follows is known as its transition function.
When the machine halts then its output is obtained by reading off the tape from the second location (just after the \(\triangleright\)) onwards, stopping at the first point where the symbol is not \(0\) or \(1\).
Example: A Turing machine for palindromes
Let \(\ensuremath{\mathit{PAL}}\) (for palindromes) be the function that on input \(x\in \{0,1\}^*\), outputs \(1\) if and only if \(x\) is an (even length) palindrome, in the sense that \(x = w_0 \cdots w_{n-1}w_{n-1}w_{n-2}\cdots w_0\) for some \(n\in \N\) and \(w\in \{0,1\}^n\).
We now show a Turing Machine \(M\) that computes \(\ensuremath{\mathit{PAL}}\). To specify \(M\) we need to specify (i) \(M\)’s tape alphabet \(\Sigma\) which should contain at least the symbols \(0\), \(1\), \(\triangleright\) and \(\varnothing\), and (ii) \(M\)’s transition function which determines what action \(M\) takes when it reads a given symbol while it is in a particular state.
In our case, \(M\) will use the alphabet \(\{ 0,1,\triangleright, \varnothing, \times \}\) and will have \(k=13\) states. Though the states are simply numbers between \(0\) and \(k-1\), for convenience we will give them the following labels:
State | Label
0 | START
1 |
2 |
3 | LOOK_FOR_0
4 | LOOK_FOR_1
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 |
We describe the operation of our Turing Machine \(M\) in words:
\(M\) starts in state START and will go right, looking for the first symbol that is \(0\) or \(1\). If we find \(\varnothing\) before we hit such a symbol then we will move to the OUTPUT_1 state that we describe below.
Once \(M\) has found such a symbol \(b \in \{0,1\}\), \(M\) deletes \(b\) from the tape by writing the \(\times\) symbol, enters either the RIGHT_0 or RIGHT_1 mode according to the value of \(b\), and starts moving rightwards until it hits the first \(\varnothing\) or \(\times\) symbol.
Once we have found this symbol we move into the state LOOK_FOR_0 or LOOK_FOR_1 depending on whether we were in the state RIGHT_0 or RIGHT_1, and make one left move.
In the state LOOK_FOR_\(b\), we check whether the value on the tape is \(b\). If it is, then we delete it by changing its value to \(\times\), and move to the state RETURN. Otherwise, we change to the OUTPUT_0 state.
The RETURN state means we go back to the beginning. Specifically, we move leftward until we hit the first symbol that is not \(0\) or \(1\), in which case we change our state to START.
The OUTPUT_\(b\) states mean that we are going to output the value \(b\). In both these states we go left until we hit \(\triangleright\). Once we do so, we make a right step, and change to the 1_AND_BLANK or 0_AND_BLANK states respectively. In the latter states, we write the corresponding value, and then move right and change to the BLANK_AND_STOP state, in which we write \(\varnothing\) to the tape and halt.
The above description can be turned into a table describing, for each one of the \(13\cdot 5\) combinations of state and symbol, what the Turing machine will do when it is in that state and it reads that symbol. This table is known as the transition function of the Turing machine.
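To see the description in action, here is a minimal Python sketch of it. It is our own transcription rather than the author's table: states are referred to by their labels instead of numbers, the characters `>`, `_` and `x` stand in for \(\triangleright\), \(\varnothing\) and \(\times\), and the behavior on state/symbol combinations that the prose leaves unspecified (for example, skipping over \(\times\) in START) is an assumption.

```python
BITS = ("0", "1")
BLANK, LEFT_END, DELETED = "_", ">", "x"  # stand-ins for the blank, start and "times" symbols

def delta(state, sym):
    """Return (new_state, symbol_to_write, move) following the prose description."""
    if state == "START":
        if sym in BITS:
            return "RIGHT_" + sym, DELETED, "R"   # delete b and remember it
        if sym == BLANK:
            return "OUTPUT_1", sym, "S"           # nothing left to match
        return "START", sym, "R"                  # assumption: skip over > and x
    if state.startswith("RIGHT_"):
        if sym in (BLANK, DELETED):
            return "LOOK_FOR_" + state[-1], sym, "L"
        return state, sym, "R"
    if state.startswith("LOOK_FOR_"):
        if sym == state[-1]:
            return "RETURN", DELETED, "L"         # matched: delete and go back
        return "OUTPUT_0", sym, "S"
    if state == "RETURN":
        if sym in BITS:
            return "RETURN", sym, "L"
        return "START", sym, "S"
    if state.startswith("OUTPUT_"):
        if sym == LEFT_END:
            return state[-1] + "_AND_BLANK", sym, "R"
        return state, sym, "L"
    if state.endswith("_AND_BLANK"):
        return "BLANK_AND_STOP", state[0], "R"    # write the output bit
    return state, BLANK, "H"                      # BLANK_AND_STOP: write blank, halt

def run_pal(x):
    """Run the machine on the bit string x and read the output off the tape."""
    tape = [LEFT_END] + list(x) + [BLANK]
    state, i = "START", 0
    while True:
        state, tape[i], move = delta(state, tape[i])
        if move == "H":
            break
        if move == "R":
            i += 1
        elif move == "L":
            i = max(i - 1, 0)
        if i == len(tape):
            tape.append(BLANK)                    # the tape is blank further right
    out = []
    for sym in tape[1:]:
        if sym not in BITS:
            break
        out.append(sym)
    return "".join(out)

for x in ["", "0110", "01", "010"]:
    print(repr(x), "->", run_pal(x))
```

On odd-length inputs the machine eventually looks for a match in a cell it has already deleted, which is why such inputs are rejected.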
Turing machines: a formal definition
The formal definition of Turing machines is as follows:
A (one tape) Turing machine with \(k\) states and alphabet \(\Sigma \supseteq \{0,1, \triangleright, \varnothing \}\) is represented by a transition function \(\delta_M:[k]\times \Sigma \rightarrow [k] \times \Sigma \times \{\mathsf{L},\mathsf{R}, \mathsf{S}, \mathsf{H} \}\).
For every \(x\in \{0,1\}^*\), the output of \(M\) on input \(x\), denoted by \(M(x)\), is the result of the following process:
We initialize \(T\) to be the sequence \(\triangleright,x_0,x_1,\ldots,x_{n-1},\varnothing,\varnothing,\ldots\), where \(n=|x|\). (That is, \(T[0]=\triangleright\), \(T[i+1]=x_{i}\) for \(i\in [n]\), and \(T[i]=\varnothing\) for \(i>n\).)
We also initialize \(i=0\) and \(s=0\).
We then repeat the following process:
 Let \((s',\sigma',D) = \delta_M(s,T[i])\).
 Set \(s \leftarrow s'\), \(T[i] \leftarrow \sigma'\).
 If \(D=\mathsf{R}\) then set \(i \leftarrow i+1\), if \(D=\mathsf{L}\) then set \(i \leftarrow \max\{i-1,0\}\). (If \(D = \mathsf{S}\) then we keep \(i\) the same.)
 If \(D=\mathsf{H}\) then halt.
The result of the process, which we denote by \(M(x)\), is the string \(T[1],\ldots,T[m]\) where \(m>0\) is the smallest integer such that \(T[m+1] \not\in \{0,1\}\). If the process never ends then we write \(M(x)=\bot\).
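The process in the definition translates almost line by line into code. The following is a minimal sketch, not a reference implementation: the function name `run_tm`, the dictionary encoding of \(\delta_M\), the characters `>` and `_` standing in for \(\triangleright\) and \(\varnothing\), and the `max_steps` cap are all our assumptions. The cap is needed only because a simulator cannot tell in general whether a machine will ever halt; returning `None` means "did not halt within the cap", which is weaker than \(M(x)=\bot\).

```python
def run_tm(delta, x, max_steps=10_000):
    """Simulate a one-tape Turing machine given as a dict
    delta[(state, symbol)] = (new_state, new_symbol, direction)."""
    tape = [">"] + list(x)        # cells beyond the end are implicitly blank
    i, s = 0, 0                   # head location and state both start at 0
    for _ in range(max_steps):
        if i == len(tape):
            tape.append("_")      # materialize a blank cell on demand
        s, tape[i], d = delta[(s, tape[i])]
        if d == "H":
            break
        if d == "R":
            i += 1
        elif d == "L":
            i = max(i - 1, 0)
        # d == "S": the head stays in place
    else:
        return None               # no halt within max_steps
    out = []
    for sym in tape[1:]:          # read T[1], T[2], ... up to the first non-bit
        if sym not in ("0", "1"):
            break
        out.append(sym)
    return "".join(out)

# A one-state example machine (hypothetical): flip every input bit, then halt.
flip = {
    (0, ">"): (0, ">", "R"),
    (0, "0"): (0, "1", "R"),
    (0, "1"): (0, "0", "R"),
    (0, "_"): (0, "_", "H"),
}
print(run_tm(flip, "0110"))

# A machine whose transition function never outputs H never halts.
stuck = {(0, sym): (0, sym, "S") for sym in ">01_"}
print(run_tm(stuck, "0110"))
```

Note that the whole machine is the finite dictionary `flip`, while the function it computes accepts inputs of every length.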
You should make sure you see why this formal definition corresponds to our informal description of a Turing Machine. To get more intuition on Turing Machines, you can explore some of the online available simulators such as Martin Ugarte’s, Anthony Morphett’s, or Paul Rendell’s.
One should not confuse the transition function \(\delta_M\) of a Turing machine \(M\) with the function that the machine computes. The transition function \(\delta_M\) is a finite function, with \(k|\Sigma|\) inputs and \(4k|\Sigma|\) outputs. (Can you see why?) The machine can compute an infinite function \(F\) that takes as input a string \(x\in \{0,1\}^*\) of arbitrary length and might also produce an arbitrary length string as output.
In our formal definition, we identified the machine \(M\) with its transition function \(\delta_M\) since the transition function tells us everything we need to know about the Turing machine, and hence serves as a good mathematical representation of it. This choice of representation is somewhat arbitrary, and is based on our convention that the state space is always the numbers \(\{0,\ldots,k-1\}\) with \(0\) as the starting state. Other texts use different conventions and so their mathematical definition of a Turing machine might look superficially different, but these definitions describe the same computational process and have the same computational power.
For example, Sipser’s text (Sipser, 1997) allows a more general set of states \(Q\). Sipser also restricts attention to Turing machines that output only a single bit and therefore designates two special halting states: the “\(0\) halting state” (often known as the rejecting state) and the “\(1\) halting state” (often known as the accepting state). Thus instead of writing \(0\) or \(1\) on an output tape, the machine will enter into one of these states and halt. This again makes no difference to the computational power, though we prefer to consider the more general model of multi-bit outputs.
Sipser also considers functions with input in \(\Sigma^*\) for an arbitrary alphabet \(\Sigma\) (and hence distinguishes between the input alphabet which he denotes as \(\Sigma\) and the tape alphabet which he denotes as \(\Gamma\)), while we restrict attention to functions with binary strings as input. Again this is not a major issue, since we can always encode an element of \(\Sigma\) using a binary string of length \(\ceil{\log |\Sigma|}\). Finally (and this is a very minor point) Sipser requires the machine to either move left or right in every step, without the \(\mathsf{S}\)tay operation, though staying in place is very easy to emulate by simply moving right and then back left.
The bottom line is that Sipser defines Turing machines as a seven tuple consisting of the state space, input alphabet, tape alphabet, transition function, starting state, accepting state, and rejecting state. Superficially this might look like a very different definition than Definition 6.2 but it is simply a different representation of the same concept, just as a graph can be represented in either adjacency list or adjacency matrix form.
Computable functions
We now turn to making one of the most important definitions in this book, that of computable functions.
Let \(F:\{0,1\}^* \rightarrow \{0,1\}^*\) be a (total) function and let \(M\) be a Turing machine. We say that \(M\) computes \(F\) if for every \(x\in \{0,1\}^*\), \(M(x)=F(x)\).
We say that a function \(F\) is computable if there exists a Turing machine \(M\) that computes it.
Defining a function “computable” if and only if it can be computed by a Turing machine might seem “reckless” but, as we’ll see in Chapter 7, it turns out that being computable in the sense of Definition 6.4 is equivalent to being computable in essentially any reasonable model of computation. This is known as the Church-Turing Thesis.^{4}
Definition 6.4 is, as we mentioned above, one of the most important definitions in this book. Please reread it (and Definition 6.2, which it relies upon).
This is a good point to remind the reader of the distinction between functions and programs:
\[ \text{Functions} \;\neq\; \text{Programs} \;.\]
A Turing machine (or program) \(M\) can compute some function \(F\), but it is not the same as \(F\). In particular there can be more than one program to compute the same function. Being computable is a property of functions, not of machines.
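As a small illustration of the distinction, here are two different Python programs (both hypothetical examples of ours) that compute the same function, namely the parity of a bit string. The programs differ as pieces of text and in how they work, yet as functions they are identical:

```python
def parity_loop(x):
    """Compute the parity of x by flipping a running bit for every '1'."""
    s = 0
    for bit in x:
        if bit == "1":
            s = 1 - s
    return s

def parity_count(x):
    """Compute the parity of x by counting the '1's and reducing modulo 2."""
    return x.count("1") % 2

samples = ["", "0", "1", "0110", "10111"]
print(all(parity_loop(x) == parity_count(x) for x in samples))
```

Agreement on a handful of samples is of course only evidence and not a proof that the two programs compute the same function on all of \(\{0,1\}^*\).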
We will often pay special attention to functions \(F:\{0,1\}^* \rightarrow \{0,1\}\) that have a single bit of output. Hence we give a special name for the set of functions of this form that are computable.
We define \(\mathbf{R}\) to be the set of all computable functions \(F:\{0,1\}^* \rightarrow \{0,1\}\).
Many texts use the terminology of “languages” rather than functions to refer to computational tasks. The name “language” has its roots in formal language theory as pursued by linguists such as Noam Chomsky. A formal language is a subset \(L \subseteq \{0,1\}^*\) (or more generally \(L \subseteq \Sigma^*\) for some finite alphabet \(\Sigma\)). The membership or decision problem for a language \(L\), is the task of determining, given \(x\in \{0,1\}^*\), whether or not \(x\in L\). A Turing machine \(M\) decides a language \(L\) if for every input \(x\in \{0,1\}^*\), \(M(x)\) outputs \(1\) if and only if \(x\in L\). This is equivalent to computing the Boolean function \(F:\{0,1\}^* \rightarrow \{0,1\}\) defined as \(F(x)=1\) iff \(x\in L\). A language \(L\) is decidable if there is a Turing machine \(M\) that decides it. For historical reasons, some texts also call such a language recursive (which is the reason that the letter \(\mathbf{R}\) is often used to denote the set of computable Boolean functions / decidable languages defined in Definition 6.5).^{5}
In this book we stick to the terminology of functions rather than languages, but all definitions and results can be easily translated back and forth by using the equivalence between the function \(F:\{0,1\}^* \rightarrow \{0,1\}\) and the language \(L = \{ x\in \{0,1\}^* \;|\; F(x) = 1 \}\).
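The translation between the two views can be sketched in a few lines of Python. Everything here is illustrative: `F` is an example Boolean function of our choosing (even-length palindromes), and since a language is typically infinite we can only list its members up to some length bound.

```python
from itertools import product

def F(x):
    """Example Boolean function: 1 iff x is an even-length palindrome."""
    return 1 if len(x) % 2 == 0 and x == x[::-1] else 0

def language_up_to(F, max_len):
    """List the members of L = {x : F(x) = 1} that have length at most max_len."""
    members = []
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            x = "".join(bits)
            if F(x) == 1:
                members.append(x)
    return members

def function_from_language(is_member):
    """Turn a membership test for L back into the Boolean function F."""
    return lambda x: 1 if is_member(x) else 0

L4 = language_up_to(F, 4)
print(L4)

# Going back: a membership test for (this finite part of) L gives F again.
G = function_from_language(lambda x: x in L4)
print(all(G(x) == F(x) for x in ["", "01", "0110", "111", "1001"]))
```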
Infinite loops and partial functions
One crucial difference between circuits/straight-line programs and Turing machines is the following. Looking at a NANDCIRC program \(P\), we can always tell how many inputs and how many outputs it has (by simply looking at the X and Y variables). Furthermore, we are guaranteed that if we invoke \(P\) on any input then some output will be produced.
In contrast, given any Turing machine \(M\), we cannot determine a priori the length of the output. In fact, we don’t even know if an output would be produced at all! For example, it is very easy to come up with a Turing machine whose transition function never outputs \(\mathsf{H}\) and hence never halts.
If a machine \(M\) fails to stop and produce an output on some input \(x\), then it cannot compute any total function \(F\), since clearly on input \(x\), \(M\) will fail to output \(F(x)\). However, \(M\) can still compute a partial function.^{6}
For example, consider the partial function \(\ensuremath{\mathit{DIV}}\) that on input a pair \((a,b)\) of natural numbers, outputs \(\ceil{a/b}\) if \(b > 0\), and is undefined otherwise. We can define a Turing machine \(M\) that computes \(\ensuremath{\mathit{DIV}}\) on input \(a,b\) by outputting the first \(c=0,1,2,\ldots\) such that \(cb \geq a\). If \(a>0\) and \(b=0\) then the machine \(M\) will never halt, but this is OK, since \(\ensuremath{\mathit{DIV}}\) is undefined on such inputs. If \(a=0\) and \(b=0\), the machine \(M\) will output \(0\), which is also OK, since we don’t care about what the program outputs on inputs on which \(\ensuremath{\mathit{DIV}}\) is undefined. Formally, we define computability of partial functions as follows:
Let \(F\) be either a total or partial function mapping \(\{0,1\}^*\) to \(\{0,1\}^*\) and let \(M\) be a Turing machine. We say that \(M\) computes \(F\) if for every \(x\in \{0,1\}^*\) on which \(F\) is defined, \(M(x)=F(x)\).^{7}
We say that a (partial or total) function \(F\) is computable if there is a Turing machine that computes it.
We often use \(\bot\) as our special “failure symbol”. If a Turing machine \(M\) fails to halt on some input \(x\in \{0,1\}^*\) then we denote this by \(M(x) = \bot\). This does not mean that \(M\) outputs some encoding of the symbol \(\bot\) but rather that \(M\) enters into an infinite loop when given \(x\) as input.
If a partial function \(F\) is undefined on \(x\) then we can also write \(F(x) = \bot\). Therefore one might think that Definition 6.7 can be simplified to requiring that \(M(x) = F(x)\) for every \(x\in \{0,1\}^*\), which would imply that for every \(x\), \(M\) halts on \(x\) if and only if \(F\) is defined on \(x\). However this is not the case: for a Turing Machine \(M\) to compute a partial function \(F\) it is not necessary for \(M\) to enter an infinite loop on inputs \(x\) on which \(F\) is not defined. All that is needed is for \(M\) to output \(F(x)\) on \(x\)’s on which \(F\) is defined: on other inputs it is OK for \(M\) to output an arbitrary value or not to halt at all.^{8}
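The \(\ensuremath{\mathit{DIV}}\) example shows both behaviors that are allowed on undefined inputs. The sketch below mirrors the search for the first \(c\) with \(cb \geq a\) as a Python loop instead of a Turing machine; the name `div_search` and the optional `max_steps` cap (which lets us observe non-termination without actually looping forever) are our assumptions.

```python
def div_search(a, b, max_steps=None):
    """Return the first c = 0, 1, 2, ... with c*b >= a.

    With max_steps=None this loops forever when a > 0 and b == 0.
    With a cap, None is returned once the cap is exceeded.
    """
    c = 0
    while c * b < a:
        c += 1
        if max_steps is not None and c > max_steps:
            return None  # gave up: stands in for "has not halted so far"
    return c

print(div_search(7, 2))                  # ceil(7/2)
print(div_search(6, 3))                  # ceil(6/3)
print(div_search(0, 0))                  # DIV undefined, yet the search halts with 0
print(div_search(5, 0, max_steps=1000))  # DIV undefined, and the search never succeeds
```

Both of the last two calls are on inputs where \(\ensuremath{\mathit{DIV}}\) is undefined, so neither outcome affects whether the procedure computes \(\ensuremath{\mathit{DIV}}\).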
Turing machines as programming languages
The name “Turing machine”, with its “tape” and “head”, evokes a physical object, while in contrast we think of a program as a piece of text. But we can think of a Turing machine as a program as well. For example, consider the Turing Machine \(M\) of Subsection 6.1.1 that computes the function \(\ensuremath{\mathit{PAL}}\) such that \(\ensuremath{\mathit{PAL}}(x)=1\) iff \(x\) is a palindrome. We can also describe this machine as a program using Python-like pseudocode of the form below:
# Gets an array Tape that is initialized to [">", x_0 , x_1 , .... , x_(n-1), "∅", "∅", ...]
# At the end of the execution, Tape[1] is equal to 1 if x is a palindrome and is equal to 0 otherwise
def PAL(Tape):
    head = 0
    state = 0 # START
    while (state != 12):
        if (state == 0 and Tape[head]=='0'):
            state = 3 # LOOK_FOR_0
            Tape[head] = 'x'
            head += 1 # move right
        if (state == 0 and Tape[head]=='1'):
            state = 4 # LOOK_FOR_1
            Tape[head] = 'x'
            head += 1 # move right
        ... # more if statements here
The particular details of this program are not important. What is important is that we can describe Turing machines as programs. Moreover, note that when translating a Turing machine into a program, the tape becomes a list or array that can hold values from the finite set \(\Sigma\).^{9} The head position can be thought of as an integer valued variable that can hold integers of unbounded size. The state is a local register that can hold one of a fixed number of values in \([k]\).
More generally we can think of every Turing Machine \(M\) as equivalent to a program along the following lines:
# Gets an array Tape that is initialized to [">", x_0 , x_1 , .... , x_(n-1), "∅", "∅", ...]
def M(Tape):
    state = 0
    i = 0 # holds head location
    while (True):
        # Move head, modify state and write to tape based on
        # current state and value of tape at head location
        if Tape[i]=="0" and state==7:
            Tape[i]="1" # write to the current cell before moving the head
            i += 1
            state = 19
        elif Tape[i]==">" and state == 13:
            Tape[i]="0"
            state = 15
        elif ...
            ...
        elif Tape[i]==">" and state == 29:
            break # Halt
If we wanted to use only Boolean (i.e., \(0\)/\(1\)-valued) variables then we can encode the state variable using \(\ceil{\log k}\) bits. Similarly, we can represent each element of the alphabet \(\Sigma\) using \(\ceil{\log |\Sigma|}\) bits and hence we can replace the \(\Sigma\)-valued array Tape[] with \(\ceil{\log |\Sigma|}\) Boolean-valued arrays Tape0[], \(\ldots\), Tape\(\ell-1\)[] for \(\ell = \ceil{\log |\Sigma|}\).
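The replacement of one \(\Sigma\)-valued array by several Boolean arrays can be sketched as follows. The function names and the particular numbering of the symbols are our assumptions; any fixed injective numbering of \(\Sigma\) would do. We compute \(\ceil{\log |\Sigma|}\) with integer arithmetic to avoid floating point rounding.

```python
def encode_tape(tape, alphabet):
    """Split a tape over `alphabet` into ell Boolean arrays.

    Array number j holds bit j of the code of each tape symbol.
    """
    # (n - 1).bit_length() equals ceil(log2(n)) for every n >= 1.
    ell = max(1, (len(alphabet) - 1).bit_length())
    code = {sym: idx for idx, sym in enumerate(alphabet)}
    return [[(code[sym] >> j) & 1 for sym in tape] for j in range(ell)]

def decode_cell(arrays, alphabet, i):
    """Recover the symbol stored at location i from the Boolean arrays."""
    idx = sum(arrays[j][i] << j for j in range(len(arrays)))
    return alphabet[idx]

alphabet = [">", "0", "1", "_", "x"]   # five symbols, so three bits per cell
tape = [">", "0", "1", "1", "_"]
arrays = encode_tape(tape, alphabet)
for j, arr in enumerate(arrays):
    print(f"Tape{j} =", arr)
print([decode_cell(arrays, alphabet, i) for i in range(len(tape))] == tape)
```

Reading or writing one tape cell now touches the same location `i` in each of the Boolean arrays, so a single head position still suffices.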
The NANDTM Programming language
We now introduce the NANDTM programming language, which aims to capture the power of a Turing machine in a programming language formalism. Just like the difference between Boolean circuits and Turing Machines, the main difference between NANDTM and NANDCIRC is that NANDTM models a single uniform algorithm that can compute a function that takes inputs of arbitrary lengths. To do so, we extend the NANDCIRC programming language with two constructs:
Loops: NANDCIRC is a straight-line programming language: a NANDCIRC program of \(s\) lines takes exactly \(s\) steps of computation and hence in particular cannot even touch more than \(3s\) variables. Loops allow us to capture in a short program the instructions for a computation that can take an arbitrary amount of time.
Arrays: A NANDCIRC program of \(s\) lines touches at most \(3s\) variables. While we can use variables with names such as Foo_17 or Bar[22], they are not true arrays, since the number in the identifier is a constant that is “hardwired” into the program.
Thus a good way to remember NANDTM is using the following informal equation:
\[ \text{NANDTM} \;=\; \text{NANDCIRC} \;+\; \text{loops} \;+\; \text{arrays} \;\;(6.2) \]
As we will see, adding loops and arrays to NANDCIRC is enough to capture the full power of all programming languages! Hence we could replace “NANDTM” with any of Python, C, Javascript, OCaml, etc. in the left-hand side of Equation 6.2. But we’re getting ahead of ourselves: this issue will be discussed in Chapter 7.
Concretely, the NANDTM programming language adds the following features on top of NANDCIRC (see Figure 6.7):
We add a special integer valued variable i. All other variables in NANDTM are Boolean valued (as in NANDCIRC).
We add arrays to the language by allowing variable identifiers to have the form Foo[i] with i being the special integer-valued variable mentioned above. Foo is an array of Boolean values, and Foo[i] refers to the value of this array at location equal to the current value of the variable i.
We use the convention that arrays always start with a capital letter, and scalar variables (which are never indexed with i) start with lowercase letters. Hence Foo is an array and bar is a scalar variable.
The input and output X and Y are now considered arrays with values of zeroes and ones.
We add a special MODANDJUMP instruction that takes two Boolean variables \(a,b\) as input and does the following:
 If \(a=1\) and \(b=1\) then MODANDJUMP(\(a,b\)) increments i by one and jumps to the first line of the program.
 If \(a=0\) and \(b=1\) then MODANDJUMP(\(a,b\)) decrements i by one and jumps to the first line of the program. (If i is already equal to \(0\) then it stays at \(0\).)
 If \(a=1\) and \(b=0\) then MODANDJUMP(\(a,b\)) jumps to the first line of the program without modifying i.
 If \(a=b=0\) then MODANDJUMP(\(a,b\)) halts execution of the program.
Since MODANDJUMP always either jumps to the first line of the program or halts the computation, the instructions following it will never get executed. Hence in NANDTM programs the MODANDJUMP instruction appears in the last line of the program and nowhere else.
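The four cases of MODANDJUMP can be summarized as a small function on the index variable. This is only a sketch of the semantics stated above; the name `modandjump` and the convention of returning a pair `(new_i, halted)` are our own.

```python
def modandjump(a, b, i):
    """Return (new_i, halted) after executing MODANDJUMP(a, b) with index i."""
    if a == 1 and b == 1:
        return i + 1, False          # increment i and jump to the first line
    if a == 0 and b == 1:
        return max(i - 1, 0), False  # decrement i (never below 0) and jump
    if a == 1 and b == 0:
        return i, False              # jump without modifying i
    return i, True                   # a = b = 0: halt

for a, b in [(1, 1), (0, 1), (1, 0), (0, 0)]:
    print((a, b), modandjump(a, b, 5))
```

In every non-halting case control returns to the first line, so a NANDTM program is in effect one big loop whose body is a NANDCIRC program.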
We also need one more convention to handle “default values”. In a Turing machine the tape could contain in addition to \(0\) and \(1\) (and possibly other values) also the special symbol \(\varnothing\) to indicate that this location is “blank”. All our variables, including our arrays, will be Boolean, and so they can contain either the value \(0\) or \(1\). All values default to \(0\) if they have not been initialized to another value.
To keep track of whether a \(0\) in an array corresponds to a true zero or to an uninitialized cell, a programmer can always add to an array Foo a “companion array” Foo_nonblank and set Foo_nonblank[i] to \(1\) whenever the i’th location is written to. In particular we will use this convention for the input and output arrays X and Y. A NANDTM program has four special arrays X, X_nonblank, Y, and Y_nonblank. When a NANDTM program is executed on input \(x\in \{0,1\}^*\) of length \(n\), the first \(n\) cells of the array X are initialized to \(x_0,\ldots,x_{n-1}\) and the first \(n\) cells of the array X_nonblank are initialized to \(1\). (All uninitialized cells default to \(0\).)
The output of a NANDTM program is the string Y[\(0\)], \(\ldots\), Y[\(m-1\)] where \(m\) is the smallest integer such that Y_nonblank[\(m\)]\(=0\). A NANDTM program gets called with X and X_nonblank initialized to contain the input, and writes to Y and Y_nonblank to produce the output.
Formally, NANDTM programs are defined as follows:
A NANDTM program consists of a sequence of lines of the form foo = NAND(bar,blah) ending with a line of the form MODANDJUMP(foo,bar), where foo, bar, blah are either scalar variables (sequences of letters, digits, and underscores) or array variables of the form Foo[i] (starting with capital letter and indexed by i).
If \(P\) is a NANDTM program and \(x\in \{0,1\}^*\) is an input then an execution of \(P\) on \(x\) is the following process:
The arrays X and X_nonblank are initialized by X[\(i\)]\(=x_i\) and X_nonblank[\(i\)]\(=1\) for all \(i\in [|x|]\). All other variables and cells are initialized to \(0\). The index variable i is also initialized to \(0\).
The program is executed line by line. When the last line MODANDJUMP(foo,bar) is executed we do as follows:
 If foo\(=1\) and bar\(=0\) then jump to the first line without modifying the value of i.
 If foo\(=1\) and bar\(=1\) then increment i by one and jump to the first line.
 If foo\(=0\) and bar\(=1\) then decrement i by one (unless it is already zero) and jump to the first line.
 If foo\(=0\) and bar\(=0\) then halt and output Y[\(0\)], \(\ldots\), Y[\(m-1\)] where \(m\) is the smallest integer such that Y_nonblank[\(m\)]\(=0\).
Sneak peek: NANDTM vs Turing machines
As the name implies, NANDTM programs are a direct implementation of Turing machines in programming language form. We will show the equivalence below but you can already see how the components of Turing machines and NANDTM programs correspond to one another:
Turing Machine | NANDTM program
State: single register that takes values in \([k]\) | Scalar variables: Several variables such as foo, bar
Tape: One tape containing values in a finite set \(\Sigma\). Potentially infinite but \(T[t]\) defaults to \(\varnothing\) for all locations \(t\) that have not been accessed. | Arrays: Several arrays such as Foo, Bar
Head location: A number \(i\in \mathbb{N}\) that encodes the position of the head. | Index variable: The variable i
Accessing memory: At every step the Turing machine has access to its local state, but can only access the tape at the position of the current head location. | Accessing memory: At every step a NANDTM program has access to all the scalar variables, but can only access the arrays at the location i
Control of location: In each step the machine can move the head location by at most one position. | Control of index variable: In each iteration of its main loop the program can modify the index i by at most one
Examples
We now present some examples of NANDTM programs.
The following is a NANDTM program to compute the XOR function on inputs of arbitrary length. That is, \(\ensuremath{\mathit{XOR}}:\{0,1\}^* \rightarrow \{0,1\}\) such that \(\ensuremath{\mathit{XOR}}(x) = \sum_{i=0}^{|x|-1} x_i \mod 2\) for every \(x\in \{0,1\}^*\).
We now present a NANDTM program to compute the increment function. That is, \(\ensuremath{\mathit{INC}}:\{0,1\}^* \rightarrow \{0,1\}^*\) such that for every \(x\in \{0,1\}^n\), \(\ensuremath{\mathit{INC}}(x)\) is the \(n+1\) bit long string \(y\) such that if \(X = \sum_{i=0}^{n-1}x_i \cdot 2^i\) is the number represented by \(x\), then \(y\) is the (least-significant digit first) binary representation of the number \(X+1\).
We start by showing the program using “syntactic sugar”: shorthand for NANDCIRC programs we have seen before that compute simple functions such as IF, XOR and AND (as well as the constant one function and the function COPY that just maps a bit to itself).
carry = IF(started,carry,one(started))
started = one(started)
Y[i] = XOR(X[i],carry)
carry = AND(X[i],carry)
Y_nonblank[i] = one(started)
MODANDJUMP(X_nonblank[i],X_nonblank[i])
The above is not, strictly speaking, a valid NANDTM program. If we “open up” all of the syntactic sugar, we get the following valid program that computes the same function.
temp_0 = NAND(started,started)
temp_1 = NAND(started,temp_0)
temp_2 = NAND(started,started)
temp_3 = NAND(temp_1,temp_2)
temp_4 = NAND(carry,started)
carry = NAND(temp_3,temp_4)
temp_6 = NAND(started,started)
started = NAND(started,temp_6)
temp_8 = NAND(X[i],carry)
temp_9 = NAND(X[i],temp_8)
temp_10 = NAND(carry,temp_8)
Y[i] = NAND(temp_9,temp_10)
temp_12 = NAND(X[i],carry)
carry = NAND(temp_12,temp_12)
temp_14 = NAND(started,started)
Y_nonblank[i] = NAND(started,temp_14)
MODANDJUMP(X_nonblank[i],X_nonblank[i])
Working out the above two examples can go a long way towards understanding the NANDTM language. See the appendix and our [GitHub repository](https://github.com/boazbk/tcscode) for a full specification of the NANDTM language.
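One way to work through the increment example is to execute it mechanically. The following is a minimal Python sketch of an interpreter for sugar-free NANDTM programs, run on the listing above. It is not the reference implementation from the repository: the function name `run_nandtm`, the step bound `max_iterations`, and the conventions that all variables start at \(0\) and that the output is read from `Y` up to the first position where `Y_nonblank` is \(0\) are assumptions of this sketch. The `MODANDJUMP` semantics follow the encoding used later in this chapter (\(11\) moves right, \(01\) moves left, \(10\) stays, \(00\) halts).

```python
import re
from collections import defaultdict

# The sugar-free increment program, copied from the listing above.
INC_PROGRAM = """
temp_0 = NAND(started,started)
temp_1 = NAND(started,temp_0)
temp_2 = NAND(started,started)
temp_3 = NAND(temp_1,temp_2)
temp_4 = NAND(carry,started)
carry = NAND(temp_3,temp_4)
temp_6 = NAND(started,started)
started = NAND(started,temp_6)
temp_8 = NAND(X[i],carry)
temp_9 = NAND(X[i],temp_8)
temp_10 = NAND(carry,temp_8)
Y[i] = NAND(temp_9,temp_10)
temp_12 = NAND(X[i],carry)
carry = NAND(temp_12,temp_12)
temp_14 = NAND(started,started)
Y_nonblank[i] = NAND(started,temp_14)
MODANDJUMP(X_nonblank[i],X_nonblank[i])
"""

NAND_LINE = re.compile(r"([\w\[\]]+) = NAND\(([\w\[\]]+),([\w\[\]]+)\)")
JUMP_LINE = re.compile(r"MODANDJUMP\(([\w\[\]]+),([\w\[\]]+)\)")


def run_nandtm(source, x, max_iterations=10_000):
    """Run a sugar-free NANDTM program on the bit list x and return the output bits."""
    lines = [line.strip() for line in source.strip().splitlines()]
    scalars = defaultdict(int)                      # every variable defaults to 0
    arrays = defaultdict(lambda: defaultdict(int))
    for pos, bit in enumerate(x):                   # load the input
        arrays["X"][pos] = bit
        arrays["X_nonblank"][pos] = 1
    i = 0

    def read(name):
        return arrays[name[:-3]][i] if name.endswith("[i]") else scalars[name]

    def write(name, value):
        if name.endswith("[i]"):
            arrays[name[:-3]][i] = value
        else:
            scalars[name] = value

    for _ in range(max_iterations):
        for line in lines[:-1]:                     # the straight-line body
            target, a, b = NAND_LINE.fullmatch(line).groups()
            write(target, 1 - read(a) * read(b))    # NAND of two bits
        a, b = (read(name) for name in JUMP_LINE.fullmatch(lines[-1]).groups())
        if (a, b) == (0, 0):                        # 00: halt
            break
        if (a, b) == (1, 1):                        # 11: move right
            i += 1
        elif (a, b) == (0, 1):                      # 01: move left (never below 0)
            i = max(i - 1, 0)
        # 10: stay in place and run the body again
    else:
        raise RuntimeError("program did not halt within max_iterations")

    output, pos = [], 0
    while arrays["Y_nonblank"][pos]:
        output.append(arrays["Y"][pos])
        pos += 1
    return output
```

For instance, `run_nandtm(INC_PROGRAM, [1, 1])` feeds in the number \(3\) (least-significant bit first); tracing the loop by hand gives three iterations, the last of which halts because `X_nonblank[2]` is \(0\).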
Equivalence of Turing machines and NANDTM programs
Given the above discussion, it might not be surprising that Turing machines turn out to be equivalent to NANDTM programs. Indeed, we designed the NANDTM language to have this property. Nevertheless, this is an important result, and the first of many other such equivalence results we will see in this book.
For every \(F:\{0,1\}^* \rightarrow \{0,1\}^*\), \(F\) is computable by a NANDTM program \(P\) if and only if there is a Turing Machine \(M\) that computes \(F\).
To prove such an equivalence theorem, we need to show two directions. We need to be able to (1) transform a Turing machine \(M\) to a NANDTM program \(P\) that computes the same function as \(M\) and (2) transform a NANDTM program \(P\) into a Turing machine \(M\) that computes the same function as \(P\).
The idea of the proof is illustrated in Figure 6.8. To show (1), given a Turing machine \(M\), we will create a NANDTM program \(P\) that will have an array `Tape` for the tape of \(M\) and scalar (i.e., non-array) variable(s) `state` for the state of \(M\). Specifically, since the state of a Turing machine is not in \(\{0,1\}\) but rather in a larger set \([k]\), we will use \(\ceil{\log k}\) variables `state_0`, \(\ldots\), `state_`\(\ceil{\log k}-1\) to store the representation of the state. Similarly, to encode the larger alphabet \(\Sigma\) of the tape, we will use \(\ceil{\log |\Sigma|}\) arrays `Tape_0`, \(\ldots\), `Tape_`\(\ceil{\log |\Sigma|}-1\), such that the \(i^{th}\) location of these arrays encodes the \(i^{th}\) symbol of the tape. Using the fact that every finite function can be computed by a NANDCIRC program, we will be able to compute the transition function of \(M\), replacing moving left and right by decrementing and incrementing `i` respectively.
We show (2) using very similar ideas. Given a program \(P\) that uses \(a\) array variables and \(b\) scalar variables, we will create a Turing machine with about \(2^b\) states to encode the values of scalar variables, and an alphabet of about \(2^a\) so we can encode the arrays using our tape. (The reason the sizes are only “about” \(2^a\) and \(2^b\) is that we will need to add some symbols and steps for bookkeeping purposes.) The Turing Machine \(M\) will simulate each iteration of the program \(P\) by updating its state and tape accordingly.
We start by proving the “if” direction of Theorem 6.14. Namely we show that given a Turing machine \(M\), we can find a NANDTM program \(P_M\) such that for every input \(x\), if \(M\) halts on input \(x\) with output \(y\) then \(P_M(x)=y\). Since our goal is just to show such a program \(P_M\) exists, we don’t need to write out the full code of \(P_M\) line by line, and can take advantage of our various “syntactic sugar” in describing it.
The key observation is that by Theorem 4.11 we can compute every finite function using a NANDCIRC program. In particular, consider the transition function \(\delta_M:[k]\times \Sigma \rightarrow [k] \times \Sigma \times \{\mathsf{L},\mathsf{R},\mathsf{S},\mathsf{H} \}\) of our Turing Machine. We can encode its components as follows:
We encode \([k]\) using \(\{0,1\}^\ell\) and \(\Sigma\) using \(\{0,1\}^{\ell'}\), where \(\ell = \ceil{\log k}\) and \(\ell' = \ceil{\log |\Sigma|}\).
We encode the set \(\{\mathsf{L},\mathsf{R}, \mathsf{S},\mathsf{H} \}\) using \(\{0,1\}^2\). We will choose the encoding \(\mathsf{L} \mapsto 01\), \(\mathsf{R} \mapsto 11\), \(\mathsf{S} \mapsto 10\), \(\mathsf{H} \mapsto 00\). (This conveniently corresponds to the semantics of the `MODANDJUMP` operation.)
Hence we can identify \(\delta_M\) with a function \(\overline{M}:\{0,1\}^{\ell+\ell'} \rightarrow \{0,1\}^{\ell+\ell'+2}\), mapping strings of length \(\ell+\ell'\) to strings of length \(\ell+\ell'+2\). By Theorem 4.11 there exists a finite length NANDCIRC program `ComputeM` that computes this function \(\overline{M}\). The NANDTM program to simulate \(M\) will essentially be the following:
where we use `state` as shorthand for the tuple of variables `state_0`, \(\ldots\), `state_`\(\ell-1\) and `Tape[i]` as shorthand for `Tape_0[i]`, \(\ldots\), `Tape_`\(\ell'-1\)`[i]`. (We need to add a little “bookkeeping” to translate the input and the output between the tape and the `X` and `Y` variables, but the above is basically it.) Every step of the main loop of the above program perfectly mimics one step of the Turing Machine \(M\), since `ComputeM` computes the transition function of \(M\), and so the program carries out exactly the definition of computation by a Turing Machine as per Definition 6.2.
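The shape of this simulation can be sketched in Python: a single loop whose body applies the transition function once and then updates the index according to two direction bits, using the encoding \(\mathsf{L} \mapsto 01\), \(\mathsf{R} \mapsto 11\), \(\mathsf{S} \mapsto 10\), \(\mathsf{H} \mapsto 00\) from the text. This is only an illustration under stated assumptions: states and symbols are kept as Python values rather than bit tuples, the names `make_compute_m`, `simulate` and `PARITY` are hypothetical, `">"` and `"_"` stand for the start-of-tape and blank symbols, and the example machine (which leaves the parity of the input under the head when it halts) is made up for this sketch.

```python
from collections import defaultdict

# Two-bit encoding of head movements used in the text: L=01, R=11, S=10, H=00.
MOVE_BITS = {"L": (0, 1), "R": (1, 1), "S": (1, 0), "H": (0, 0)}


def make_compute_m(delta):
    """Turn a transition table into a function that also emits the two direction bits."""
    def compute_m(state, symbol):
        new_state, new_symbol, move = delta[(state, symbol)]
        dir0, dir1 = MOVE_BITS[move]
        return new_state, new_symbol, dir0, dir1
    return compute_m


def simulate(delta, x, max_steps=10_000):
    """Run the single-loop simulation; return (final state, head position, tape)."""
    compute_m = make_compute_m(delta)
    tape = defaultdict(lambda: "_")      # "_" stands for the blank symbol
    tape[0] = ">"                        # ">" stands for the start-of-tape marker
    for pos, bit in enumerate(x):
        tape[pos + 1] = str(bit)
    state, i = 0, 0
    for _ in range(max_steps):
        # Body of the main loop: one application of the transition function.
        state, tape[i], dir0, dir1 = compute_m(state, tape[i])
        # MODANDJUMP(dir0, dir1): halt, or update i and run the body again.
        if (dir0, dir1) == (0, 0):
            return state, i, tape
        if (dir0, dir1) == (1, 1):
            i += 1
        elif (dir0, dir1) == (0, 1):
            i = max(i - 1, 0)
    raise RuntimeError("machine did not halt within max_steps")


# Hypothetical two-state machine: the state tracks the parity of the bits read so far.
PARITY = {}
for q in (0, 1):
    PARITY[(q, ">")] = (q, ">", "R")
    PARITY[(q, "0")] = (q, "0", "R")
    PARITY[(q, "1")] = (1 - q, "1", "R")
    PARITY[(q, "_")] = (q, str(q), "H")  # write the parity and halt
```

In an actual NANDTM program the call to `compute_m` would be replaced by the lines of the NANDCIRC program `ComputeM`, acting on the bits that encode the state and the current tape symbol.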
For the other direction, suppose that \(P\) is a NANDTM program with \(s\) lines, \(\ell\) scalar variables, and \(\ell'\) array variables. We will show that there exists a Turing machine \(M_P\) with \(2^\ell+C\) states and alphabet \(\Sigma\) of size \(C' + 2^{\ell'}\) that computes the same function as \(P\) (where \(C\), \(C'\) are some constants to be determined later).
Specifically, consider the function \(\overline{P}:\{0,1\}^\ell \times \{0,1\}^{\ell'} \rightarrow \{0,1\}^\ell \times \{0,1\}^{\ell'}\) that, given the contents of \(P\)’s scalar variables and the contents of the array variables at location `i` at the beginning of an iteration, outputs all the new values of these variables at the last line of the iteration, right before the `MODANDJUMP` instruction is executed.
If `foo` and `bar` are the two variables that are used as input to the `MODANDJUMP` instruction, then this means that based on the values of these variables we can compute whether `i` will increase, decrease or stay the same, and whether the program will halt or jump back to the beginning. Hence a Turing machine can simulate one iteration of \(P\) by applying a finite function to its current state and the symbol under its head. The overall operation of the Turing machine will be as follows:
The machine \(M_P\) encodes the contents of the array variables of \(P\) in its tape, and the contents of the scalar variables in (part of) its state. Specifically, if \(P\) has \(\ell\) scalar variables and \(\ell'\) arrays, then the state space of \(M_P\) will be large enough to encode all \(2^\ell\) assignments to the scalar variables and the alphabet \(\Sigma\) of \(M_P\) will be large enough to encode all \(2^{\ell'}\) assignments for the array variables at each location. The head location corresponds to the index variable `i`.
Recall that every line of the program \(P\) corresponds to reading and writing either a scalar variable, or an array variable at the location `i`. In one iteration of \(P\) the value of `i` remains fixed, and so the machine \(M_P\) can simulate this iteration by reading the values of all array variables at `i` (which are encoded by the single symbol in the alphabet \(\Sigma\) located at the `i`-th cell of the tape), reading the values of all scalar variables (which are encoded by the state), and updating both. The transition function of \(M_P\) can output \(\mathsf{L},\mathsf{S},\mathsf{R}\) depending on whether the values given to the `MODANDJUMP` operation are \(01\), \(10\) or \(11\) respectively.
When the program halts (i.e., `MODANDJUMP` gets \(00\)) then the Turing machine will enter into a special loop to copy the results of the `Y` array into the output and then halt. We can achieve this by adding a few more states.
The above is not a full formal description of a Turing Machine, but our goal is just to show that such a machine exists. One can see that \(M_P\) simulates every step of \(P\), and hence computes the same function as \(P\).
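As a rough illustration of this direction, the sketch below tabulates the per-iteration function of a program into a Turing-machine-style transition table whose states are assignments to the scalar variables and whose symbols are assignments to the array variables at one location. Everything here is a simplification for illustration: the names `build_transition_table`, `run_machine` and `parity_iteration` are hypothetical, the per-iteration function is given directly as a Python function rather than derived from NANDTM code, and the extra bookkeeping states and symbols (the constants \(C\) and \(C'\)) and the final output-copying loop are omitted.

```python
from itertools import product

# Reading of the two MODANDJUMP arguments as head movements, as in the text.
MOVES = {(0, 1): "L", (1, 0): "S", (1, 1): "R", (0, 0): "H"}


def build_transition_table(iteration, num_scalars, num_arrays):
    """Tabulate one loop iteration over all 2^l states and 2^l' symbols."""
    table = {}
    for state in product((0, 1), repeat=num_scalars):
        for symbol in product((0, 1), repeat=num_arrays):
            new_state, new_symbol, a, b = iteration(state, symbol)
            table[(state, symbol)] = (new_state, new_symbol, MOVES[(a, b)])
    return table


def run_machine(table, cells, num_scalars, num_arrays, max_steps=10_000):
    """Run the tabulated machine; each tape cell holds one bit per array variable."""
    tape = dict(enumerate(cells))
    blank = (0,) * num_arrays
    state, head = (0,) * num_scalars, 0
    for _ in range(max_steps):
        state, symbol, move = table[(state, tape.get(head, blank))]
        tape[head] = symbol
        if move == "H":
            return state, tape
        if move == "R":
            head += 1
        elif move == "L":
            head = max(head - 1, 0)
    raise RuntimeError("machine did not halt within max_steps")


def parity_iteration(scalars, cells):
    """Hypothetical one-iteration function: one scalar acc, arrays X and X_nonblank."""
    (acc,) = scalars
    x, x_nonblank = cells
    # New scalar values, new array values at i, and the two MODANDJUMP arguments.
    return (acc ^ x,), (x, x_nonblank), x_nonblank, x_nonblank
```

With one scalar and two arrays the table has \(2^1 \cdot 2^2 = 8\) entries, matching the \(2^\ell\) states and \(2^{\ell'}\) symbols counted in the proof (before the bookkeeping additions).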
If we examine the proof of Theorem 6.14 then we can see that every iteration of the loop of a NANDTM program corresponds to one step in the execution of the Turing machine. We will come back to this question of measuring the number of computation steps later in this course. For now the main takeaway is that NANDTM programs and Turing Machines are essentially equivalent in power even when taking running time into account.
Specification vs implementation (again)
Once you understand the definitions of both NANDTM programs and Turing Machines, Theorem 6.14 is fairly straightforward. Indeed, NANDTM programs are not as much a different model from Turing Machines as they are simply a reformulation of the same model using programming language notation. You can think of the difference between a Turing machine and a NANDTM program as the difference between representing a number using decimal or binary notation. In contrast, the difference between a function \(F\) and a Turing machine that computes \(F\) is much more profound: it is like the difference between an equation \(E\) and a number that is a solution for \(E\). For this reason, while we take special care in distinguishing functions from programs or machines, we will often identify the two latter concepts. We will move freely between describing an algorithm as a Turing machine or as a NANDTM program (as well as some of the other equivalent computational models we will see in Chapter 7 and beyond).
| Setting | Specification | Implementation |
|---|---|---|
| Finite computation | Functions mapping \(\{0,1\}^n\) to \(\{0,1\}^m\) | Circuits, Straight-line programs |
| Infinite computation | Functions mapping \(\{0,1\}^*\) to \(\{0,1\}\) or to \(\{0,1\}^*\). | Algorithms, Turing Machines, Programs |
NANDTM syntactic sugar
Just like we did with NANDCIRC in Chapter 4, we can use “syntactic sugar” to make NANDTM programs easier to write. For starters, we can use all of the syntactic sugar of NANDCIRC, and so have access to macro definitions and conditionals (i.e., if/then). But we can go beyond this and achieve for example:
Inner loops such as the `while` and `for` operations common to many programming languages.
Multiple index variables (e.g., not just `i` but we can add `j`, `k`, etc.).
Arrays with more than one dimension (e.g., `Foo[i][j]`, `Bar[i][j][k]` etc.)
In all of these cases (and many others) we can implement the new feature as mere “syntactic sugar” on top of standard NANDTM, which means that the set of functions computable by NANDTM with this feature is the same as the set of functions computable by standard NANDTM. Similarly, we can show that the set of functions computable by Turing Machines that have more than one tape, or tapes of more dimensions than one, is the same as the set of functions computable by standard Turing machines.
“GOTO” and inner loops
We can implement more advanced looping constructs than the simple `MODANDJUMP`. For example, we can implement `GOTO`. A `GOTO` statement corresponds to jumping to a certain line in the execution. If our code has the form
then the program will only do `foo` and `blah`, since when it reaches the line `GOTO("gohere")` it will jump to the line labeled with “gohere”. We can achieve the effect of `GOTO` in NANDTM using conditionals. In the code below, we assume that we have a variable `pc` that can take strings of some constant length. This can be encoded using a finite number of Boolean variables `pc_0`, `pc_1`, \(\ldots\), `pc_`\(k-1\), and so when we write below `pc = "line1"` what we mean is something like `pc_0 = 0`, `pc_1 = 1`, \(\ldots\) (where the bits \(0,1,\ldots\) correspond to the encoding of the finite string `"line1"` as a string of length \(k\)). We also assume that we have access to conditionals (i.e., `if` statements), which we can emulate using syntactic sugar in the same way as we did in NANDCIRC.
To emulate a `GOTO` statement, we will first modify a program \(P\) of the form
to have the following form (using syntactic sugar for `if`):
pc = "line1"
if (pc=="line1"):
    do foo
    pc = "line2"
if (pc=="line2"):
    do bar
    pc = "line3"
if (pc=="line3"):
    do blah
These two programs do the same thing. The variable `pc` corresponds to the “program counter” and tells the program which line to execute next. We can see that if we wanted to emulate a `GOTO("line3")` then we could simply modify the instruction `pc = "line2"` to be `pc = "line3"`.
In NANDCIRC we could only have `GOTO`s that go forward in the code, but since in NANDTM everything is encompassed within a large outer loop, we can use the same ideas to implement `GOTO`s that can go backwards, as well as conditional loops.
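The program-counter trick can be tried out in Python, which also has no `GOTO`. The sketch below is only an analogy: Python's `while` plays the role of NANDTM's implicit outer loop, the variables hold Python values rather than bits, and the function names `run_foo_bar_blah` and `sum_down` are made up for this illustration. The first function reproduces the `foo`/`bar`/`blah` program with an optional `GOTO("line3")`. The second shows a backward jump: once `pc` is set to an earlier label, none of the remaining `if` blocks fire, and the earlier block runs on the next pass through the outer loop.

```python
def run_foo_bar_blah(goto_line3):
    """Return the steps executed; with goto_line3 the 'bar' step is skipped."""
    executed = []
    pc = "line1"
    while pc != "halt":                  # the single outer loop
        if pc == "line1":
            executed.append("foo")
            pc = "line3" if goto_line3 else "line2"   # GOTO("line3") vs. fall through
        if pc == "line2":
            executed.append("bar")
            pc = "line3"
        if pc == "line3":
            executed.append("blah")
            pc = "halt"
    return executed


def sum_down(n):
    """Add n + (n-1) + ... + 1 using a conditional backward jump to 'line1'."""
    total = 0
    pc = "line1"
    while pc != "halt":
        if pc == "line1":
            total += n
            pc = "line2"
        if pc == "line2":
            n -= 1
            pc = "line3"
        if pc == "line3":
            # Backward GOTO: takes effect on the next pass of the outer loop.
            pc = "line1" if n > 0 else "halt"
    return total
```

Forward jumps are resolved within the same pass of the outer loop, while backward jumps cost one extra pass, which is exactly what the enclosing loop of NANDTM makes possible.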
Other loops. Once we have `GOTO`, we can emulate all the standard loop constructs such as `while`, `do .. until` or `for` in NANDTM as well. For example, we can replace the code
with
The `GOTO` statement was a staple of most early programming languages, but has largely fallen out of favor and is not included in many modern languages such as Python, Java, and JavaScript. In 1968, Edsger Dijkstra wrote a famous letter titled “Go to statement considered harmful.” (see also Figure 6.9). The main trouble with `GOTO` is that it makes analysis of programs more difficult by making it harder to argue about invariants of the program.
When a program contains a loop of the form:
you know that the line of code `do blah` can only be reached if the loop ended, in which case you know that `j` is equal to \(100\), and might also be able to argue other properties of the state of the program. In contrast, if the program might jump to `do blah` from any other point in the code, then it’s very hard for you as the programmer to know what you can rely upon in this code. As Dijkstra said, such invariants are important because “our intellectual powers are rather geared to master static relations and … our powers to visualize processes evolving in time are relatively poorly developed” and so “we should … do … our utmost best to shorten the conceptual gap between the static program and the dynamic process.”
That said, `GOTO` is still a major part of lower level languages where it is used to implement higher level looping constructs such as `while` and `for` loops. For example, even though Java doesn’t have a `GOTO` statement, the Java Bytecode (which is a lower level representation of Java) does have such a statement. Similarly, Python bytecode has instructions such as `POP_JUMP_IF_TRUE` that implement the `GOTO` functionality, and similar instructions are included in many assembly languages. The way we use `GOTO` to implement a higher level functionality in NANDTM is reminiscent of the way these various jump instructions are used to implement higher level looping constructs.
Extended example: well formed programs (optional)
The notion of passing between different variants of programs can be extremely useful, as often, given a program \(P\) that we want to analyze, it would be simpler for us to first modify it to an equivalent program \(P'\) that has some convenient properties. You can think of this as the NANDTM equivalent of enforcing “coding conventions” that are often used for programming languages. For example, while this is not part of the Python language, Google’s Python style guide stipulates that variables that are initialized to a value and never changed (i.e., constants) are written in all capital letters. (Similar requirements are used in other style guides.) Of course this does not really restrict the power of Google-conforming Python programs, since every Python program can be transformed to an equivalent one that satisfies this requirement. In fact, many programming languages have automatic tools known as linters that can detect violations of such standards and sometimes modify the program to fit them.
The following solved exercise is an example of that. We will define the notion of a well-formed program and show that every NANDTM program can be transformed into an equivalent one that is well-formed.
We say that a NANDTM program \(P\) is well formed if it satisfies the following properties:
Every reference to a variable in \(P\) either has the form `foo` or `foo_123` (a scalar variable: alphanumerical string starting with a lowercase letter and no brackets) or the form `Bar[i]` or `Bar_12[i]` (an array variable: alphanumerical string starting with a capital letter and ending with `[i]`).
\(P\) contains the scalar variables `zero`, `one`, `dir0`, and `dir1` such that `zero` and `one` are always the constants \(0\) and \(1\) respectively, and `MODANDJUMP` is always invoked with the variables `dir0` and `dir1`.
\(P\) contains the array variables `Visited` and `Atstart` and code to ensure that `Atstart[`\(i\)`]` equals \(1\) if and only if \(i=0\), and `Visited[`\(i\)`]` equals \(1\) for all the positions \(i\) such that the program finished an iteration with the index variable `i` equalling \(i\).
The following exercise shows that we can transform every NANDTM program \(P\) into a well-formed program \(P'\) that is equivalent to it. Hence if we are given a NANDTM program \(P\), we can (and will) often assume without loss of generality that it is well-formed.
For every NANDTM program \(P\), there exists a NANDTM program \(P'\) equivalent to \(P\) that is well-formed as per Definition 6.17. That is, for every input \(x\in \{0,1\}^*\), either both \(P\) and \(P'\) do not halt on \(x\), or both \(P\) and \(P'\) halt on \(x\) and produce the same output \(y\in \{0,1\}^*\).
Prove Lemma 6.18.
As usual, I would recommend you try to solve this exercise yourself before looking up the solution.
Since variable identifiers on their own have no meaning in (enhanced) NANDTM (other than the special ones `X`, `X_nonblank`, `Y`, `Y_nonblank` that already have the desired properties), we can easily achieve the property that scalar variables start with lowercase and arrays with uppercase using “search and replace”. We just have to take care that we don’t make two distinct identifiers become the same. For example, we can do so by changing all scalar variable identifiers to lower case and adding to them the prefix `scalar_`, and adding the prefix `Array_` to all array variable identifiers.
The property that an array variable is never referenced with a numerical index is more challenging. We need to remove all references to an array variable with an actual numerical index rather than `i`. One thought might be to simply convert a reference of the form `Arr[17]` to the scalar variable `arr_17`. However, this will not necessarily preserve the functionality of the program. The reason is that we want to ensure that when `i` \(=17\) then `Arr[i]` would give us the same value as `arr_17`.
Nevertheless, we can use the approach above with a slight twist. We will demonstrate the solution in a concrete case. (Needless to say, if you needed to solve this question in a problem set or an exam, such a demonstration of a special case would not be sufficient; but this example should be good enough for you to extrapolate a full solution.) Suppose that there are only three references to array variables with numerical indices in the program: `Foo[5]`, `Bar[12]` and `Blah[22]`. We will include three scalar variables `foo_5`, `bar_12` and `blah_22` which will serve as a cache for the values of these arrays. We will change all references to `Foo[5]` to `foo_5`, `Bar[12]` to `bar_12` and so on and so forth. But in addition to that, whenever in the code we refer to `Foo[i]` we will check if `i` \(=5\) and if so use the value `foo_5` instead, and similarly with `Bar[i]` or `Blah[i]`.
Specifically, we will change our program as follows. We will create an array `Is_5` such that `Is_5[i]` \(=1\) if and only if `i` \(=5\), and similarly create arrays `Is_12`, `Is_22`.
We can then change code of the following form
to
and similarly code of the form
to
To create the arrays we can add code of the following form in the beginning of the program (here we’re using enhanced NANDTM syntax, `GOTO`, and the constant `one`, but this syntactic sugar can of course be avoided):
# initialization of arrays
GOTO("program body",init_done)
i += one
i += one
i += one
i += one
i += one
Is_5[i] = one
i += one
... # repeat i += one 6 more times
Is_12[i] = one
i += one
... # repeat i += one 9 more times
Is_22[i] = one
i -= one
... # repeat i -= one 21 more times
init_done = one
LABEL("program body")
original code of program..
Using `IF` statements (which can easily be replaced with syntactic sugar) we can handle the conditions that `loop`, `Y`, and `Y_nonblank` are not written to once `loop` is set to \(0\). We leave completing all the details as an exercise to the reader.
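The caching idea from the solution can be sketched in Python for a single array and a single constant index. This is a simplified model, not NANDTM code: the class name `CachedArray` and its methods are hypothetical, `cells` plays the role of `Foo` accessed only at the current index, `cache` plays the role of the scalar `foo_5`, and `is_fixed` plays the role of the marker array `Is_5`.

```python
from collections import defaultdict


class CachedArray:
    """Models Foo[i] and Foo[5] using only index-i accesses plus one scalar."""

    def __init__(self, fixed_index):
        self.cells = defaultdict(int)      # role of Foo, touched only at the current i
        self.cache = 0                     # role of the scalar foo_5
        self.is_fixed = defaultdict(int)   # role of Is_5: 1 exactly at the fixed index
        self.is_fixed[fixed_index] = 1

    def read_at(self, i):
        """Replacement for reading Foo[i]: use the cache when i is the fixed index."""
        return self.cache if self.is_fixed[i] else self.cells[i]

    def write_at(self, i, value):
        """Replacement for writing Foo[i]: redirect to the cache at the fixed index."""
        if self.is_fixed[i]:
            self.cache = value
        else:
            self.cells[i] = value

    def read_fixed(self):
        """Replacement for reading Foo[5]."""
        return self.cache

    def write_fixed(self, value):
        """Replacement for writing Foo[5]."""
        self.cache = value
```

Because every access at the fixed position is routed through the same scalar, a write through `write_at(5, ...)` is visible to `read_fixed()` and vice versa, which is the consistency property the transformation needs.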
Uniformity, and NAND vs NANDTM (discussion)
While NANDTM adds extra operations over NANDCIRC, it is not exactly accurate to say that NANDTM programs or Turing machines are “more powerful” than NANDCIRC programs. NANDCIRC programs, having no loops, are simply not applicable for computing functions with an unbounded number of inputs. Thus, to compute a function \(F:\{0,1\}^* \rightarrow \{0,1\}^*\) using NANDCIRC (or equivalently, Boolean circuits) we need a collection of programs/circuits: one for every input length.
The key difference between NANDCIRC and NANDTM is that NANDTM allows us to express the fact that the algorithm for computing parities of length-\(100\) strings is really the same one as the algorithm for computing parities of length-\(5\) strings (or similarly the fact that the algorithm for adding \(n\)-bit numbers is the same for every \(n\), etc.). That is, one can think of the NANDTM program for general parity as the “seed” out of which we can grow NANDCIRC programs for length \(10\), length \(100\), or length \(1000\) parities as needed.
This notion of a single algorithm that can compute functions of all input lengths is known as uniformity of computation and hence we think of Turing machines / NANDTM as a uniform model of computation, as opposed to Boolean circuits or NANDCIRC which is a non-uniform model, where we have to specify a different program for every input length.
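To make the “seed” metaphor concrete, here is a small Python sketch that grows a straight-line NAND program for parity from the input length \(n\). It is one possible unrolling written for illustration, not a procedure taken from the text: the function names `xor_program` and `evaluate` are hypothetical, the variable names `X[j]`, `Y[0]` and `temp_k` follow the style of the listings in this chapter, and the sketch assumes \(n \geq 2\). Each additional input bit contributes the same four NAND lines that compute the XOR of the running parity with that bit.

```python
import re


def xor_program(n):
    """Return the lines of a straight-line NAND program for the parity of n >= 2 bits."""
    lines = []
    acc = "X[0]"                       # running parity of the bits seen so far
    t = 0                              # counter for fresh temporary names
    for j in range(1, n):
        u, v, w = f"temp_{t}", f"temp_{t + 1}", f"temp_{t + 2}"
        out = "Y[0]" if j == n - 1 else f"temp_{t + 3}"
        lines += [
            f"{u} = NAND({acc},X[{j}])",
            f"{v} = NAND({acc},{u})",
            f"{w} = NAND(X[{j}],{u})",
            f"{out} = NAND({v},{w})",   # out = acc XOR X[j]
        ]
        acc = out
        t += 4
    return lines


def evaluate(lines, x):
    """Evaluate a straight-line NAND program on the bit list x and return Y[0]."""
    env = {f"X[{j}]": bit for j, bit in enumerate(x)}
    for line in lines:
        target, a, b = re.fullmatch(r"(\S+) = NAND\(([^,]+),([^)]+)\)", line).groups()
        env[target] = 1 - env[a] * env[b]
    return env["Y[0]"]
```

The generator is a single finite description, while its outputs `xor_program(5)`, `xor_program(100)`, and so on are different programs of growing length (four lines per extra input bit), one for each input length.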
Looking ahead, we will see that this uniformity leads to another crucial difference between Turing machines and circuits. Turing machines can have inputs and outputs that are longer than the description of the machine as a string, and in particular there exists a Turing machine that can “self replicate” in the sense that it can print its own code. This notion of “self replication”, and the related notion of “self reference”, is crucial to many aspects of computation, as well as, of course, to life itself, whether in the form of digital or biological programs.
For now, what you ought to remember is the following differences between uniform and non uniform computational models:
Non-uniform computational models: Examples are NANDCIRC programs and Boolean circuits. These are models where each individual program/circuit can compute a finite function \(f:\{0,1\}^n \rightarrow \{0,1\}^m\). We have seen that every finite function can be computed by some program/circuit. To discuss computation of an infinite function \(F:\{0,1\}^* \rightarrow \{0,1\}^*\) we need to allow a sequence \(\{ P_n \}_{n\in \mathbb{N}}\) of programs/circuits (one for every input length), but this does not capture the notion of a single algorithm to compute the function \(F\).
Uniform computational models: Examples are Turing machines and NANDTM programs. These are models where a single program/machine can take inputs of arbitrary length and hence compute an infinite function \(F:\{0,1\}^* \rightarrow \{0,1\}^*\). The number of steps that a program/machine takes on some input is not bounded in advance and in particular there is a chance that it will enter into an infinite loop. Unlike the non-uniform case, we have not shown that every infinite function can be computed by some NANDTM program/Turing Machine. We will come back to this point in Chapter 8.
 Turing machines capture the notion of a single algorithm that can evaluate functions of every input length.
 They are equivalent to NANDTM programs, which add loops and arrays to NANDCIRC.
 Unlike NANDCIRC or Boolean circuits, the number of steps that a Turing machine takes on a given input is not fixed in advance. In fact, a Turing machine or a NANDTM program can enter into an infinite loop on certain inputs, and not halt at all.
Exercises
Produce the code of a (syntactic-sugar free) NANDTM program \(P\) that computes the (unbounded input length) Majority function \(Maj:\{0,1\}^* \rightarrow \{0,1\}\) where for every \(x\in \{0,1\}^*\), \(Maj(x)=1\) if and only if \(\sum_{i=0}^{|x|-1} x_i > |x|/2\). We say “produce” rather than “write” because you do not have to write the code of \(P\) by hand, but rather can use the programming language of your choice to compute this code.
Prove the following closure properties of the set \(\mathbf{R}\) defined in Definition 6.5:
If \(F \in \mathbf{R}\) then the function \(G(x) = 1 - F(x)\) is in \(\mathbf{R}\).
If \(F,G \in \mathbf{R}\) then the function \(H(x) = F(x) \vee G(x)\) is in \(\mathbf{R}\).
If \(F \in \mathbf{R}\) then the function \(F^*\) is in \(\mathbf{R}\) where \(F^*\) is defined as follows: \(F^*(x)=1\) iff there exist some strings \(w_0,\ldots,w_{k-1}\) such that \(x = w_0 w_1 \cdots w_{k-1}\) and \(F(w_i)=1\) for every \(i\in [k]\).
If \(F \in \mathbf{R}\) then the function \[ G(x) = \begin{cases} 1 & \exists_{y \in \{0,1\}^{|x|}} F(xy) = 1 \\ 0 & \text{otherwise} \end{cases} \] is in \(\mathbf{R}\).
Prove that for every \(F:\{0,1\}^* \rightarrow \{0,1\}^*\), the function \(F\) is computable if and only if the following function \(G:\{0,1\}^* \rightarrow \{0,1\}\) is computable, where \(G\) is defined as follows: \(G(x,i,\sigma) = \begin{cases} F(x)_i & i < |F(x)|, \sigma =0 \\ 1 & i < |F(x)|, \sigma = 1 \\ 0 & i \geq |F(x)| \end{cases}\)
Recall that \(\mathbf{R}\) is the set of all total functions from \(\{0,1\}^*\) to \(\{0,1\}\) that are computable by a Turing machine (see Definition 6.5). Prove that \(\mathbf{R}\) is countable. That is, prove that there exists a one-to-one map \(DtN:\mathbf{R} \rightarrow \mathbb{N}\). You can use the equivalence between Turing machines and NANDTM programs.
Prove that the set of all total functions from \(\{0,1\}^* \rightarrow \{0,1\}\) is not countable. You can use the results of Subsection 2.3.1. (We will see an explicit uncomputable function in Chapter 8.)
Bibliographical notes
Augusta Ada Byron, countess of Lovelace (1815-1852) lived a short but turbulent life, though is today best known for her collaboration with Charles Babbage (see (Stein, 1987) for a biography). Ada took an immense interest in Babbage’s analytical engine, which we mentioned in Chapter 3. In 1842-3, she translated from Italian a paper of Menabrea on the engine, adding copious notes (longer than the paper itself) including some examples of programs. Because of these programs, some call Ada “the first computer programmer” though recent evidence shows they were likely written by Babbage himself (Holt, 2001). Regardless, Ada was clearly one of very few people (perhaps the only one outside of Babbage himself) to fully appreciate how significant and revolutionary the idea of mechanizing computation truly is.
The books of Shetterly (Shetterly, 2016) and Sobel (Sobel, 2017) discuss the history of human computers (which were more often than not women) and their important contributions to scientific discoveries.
Alan Turing was one of the intellectual giants of the 20th century. He was not only the first person to define the notion of computation, but also invented and used some of the world’s earliest computational devices as part of the effort to break the Enigma cipher during World War II, saving millions of lives. Tragically, Turing committed suicide in 1954, following his conviction in 1952 for homosexual acts and a court-mandated hormonal treatment. In 2009, British prime minister Gordon Brown made an official public apology to Turing, and in 2013 Queen Elizabeth II granted Turing a posthumous pardon. Turing’s life is the subject of a great book and a mediocre movie.
One of the first programming-language formulations of Turing machines was given by Wang (Wang, 1957). Our formulation is aimed at making the connection with circuits more direct, with the eventual goal of using it for the Cook-Levin Theorem, as well as results such as \(\mathbf{P} \subseteq \mathbf{P_{/poly}}\) and \(\mathbf{BPP} \subseteq \mathbf{P_{/poly}}\). The website esolangs.org features a large variety of esoteric Turing-complete programming languages. One of the most famous of them is Brainf*ck.
In this book, we use the terminology of functions as opposed to languages. These are easily related by identifying a language \(L \subseteq \{0,1\}^*\) with its characteristic function \(1_L:\{0,1\}^* \rightarrow \{0,1\}\) defined as \(1_L(x)=1\) iff \(x\in L\). The characteristic function of a language is always a total function. The language-like object that corresponds to a partial function is known as a promise problem.
Translation of “Sketch of the Analytical Engine” by L. F. Menabrea, Note A.
The reason the number of inputs is at most twice the number of gates is that we consider gates such as AND, OR, NOT and NAND that each have at most two inputs. For different sets of gates the constant factor can be different. However, the conceptual point of being only able to handle a finite number of inputs holds for any model of circuits or straight-line programming language.
We use the symbol \(\triangleright\) to denote the beginning of the tape, and the symbol \(\varnothing\) to denote an empty cell. Hence we will assume that \(\Sigma\) contains these symbols, along with \(0\) and \(1\).
Unlike the extended Church-Turing Thesis which we discussed in Section 5.5, the Church-Turing thesis itself is widely believed and there are no candidate devices that attack it.
Another definition used in the literature is that a Turing machine \(M\) recognizes a language \(L\) if for every \(x\in L\), \(M(x)=1\) and for every \(x\not\in L\) \(M(x) \in \{0,\bot \}\). A language \(L\) is recursively enumerable if there exists a Turing machine \(M\) that recognizes it, and the set of all recursively enumerable languages is often denoted by \(\mathbf{RE}\). We will not use this terminology in this book.
A partial function \(F\) from a set \(A\) to a set \(B\) is a function that is only defined on a subset of \(A\), (see Subsection 1.5.4). We can also think of such a function as mapping \(A\) to \(B \cup \{ \bot \}\) where \(\bot\) is a special “failure” symbol such that \(F(a)=\bot\) indicates the function \(F\) is not defined on \(a\).
Note that if \(F\) is a total function, then it is defined on every \(x\in \{0,1\}^*\) and hence in this case, this definition is identical to Definition 6.4.
To borrow a term from the `C` programming language, on inputs \(x\) on which \(F\) is not defined, what \(M\) does is “undefined behaviour”.
Most programming languages use arrays of fixed size, while a Turing machine’s tape is unbounded. But of course there is no need to store an infinite number of \(\varnothing\) symbols. If you want, you can think of the tape as a list that starts off just long enough to store the input, but is dynamically grown in size as the Turing machine’s head explores new positions.
Compiled on 04/30/2019 13:03:47
Copyright 2019, Boaz Barak.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Produced using pandoc and panflute with templates derived from gitbook and bookdown.