Equivalent models of computation

  • Learn about RAM machines and λ calculus, which are important models of computation.
  • See the equivalence between these models and Turing machines.
  • See how many other models turn out to be “Turing complete”.
  • Understand the Church-Turing thesis.

“All problems in computer science can be solved by another level of indirection”, attributed to David Wheeler.

“Because we shall later compute with expressions for functions, we need a distinction between functions and forms and a notation for expressing this distinction. This distinction and a notation for describing it, from which we deviate trivially, is given by Church.”, John McCarthy, 1960 (in paper describing the LISP programming language)

So far we have defined the notion of computing a function based on Turing machines, which don’t really correspond to the way computation is done in practice. In this chapter we justify this choice by showing that the definition of computable functions will remain the same under a wide variety of computational models. In fact, a widely believed claim known as the Church-Turing Thesis holds that every “reasonable” definition of computable function is equivalent to ours. We will discuss the Church-Turing Thesis and the potential definitions of “reasonable” in Section 7.8.

RAM machines and NAND-RAM

One of the limitations of NAND-TM (and Turing machines) is that we can only access one location of our arrays/tape at a time. If i\(=22\) and we want to access Foo[\(957\)] then it will take us at least 935 steps to get there, one for each of the \(957-22\) positions the index must move. In contrast, almost every programming language has a formalism for directly accessing memory locations. Hardware implementations also provide so called Random Access Memory (RAM) which can be thought of as a large array Memory, such that given an index \(p\) (i.e., memory address, or a pointer), we can read from and write to the \(p^{th}\) location of Memory.1

A RAM Machine contains a finite number of local registers, each of which holds an integer, and an unbounded memory array. It can perform arithmetic operations on its registers as well as load to a register \(r\) the contents of the memory at the address indexed by the number in register \(r'\).

The computational model that allows access to such a memory is known as a RAM machine (sometimes also known as the Word RAM model), as depicted in Figure 7.1. In this model the memory is an array of unbounded size where each cell can store a single word, which we think of as a string in \(\{0,1\}^w\) and also as a number in \([2^w]\). For example, many modern computing architectures use \(64\) bit words, in which every memory location holds a string in \(\{0,1\}^{64}\) which can also be thought of as a number between \(0\) and \(2^{64}-1= 18,446,744,073,709,551,615\). The parameter \(w\) is known as the word size and (when doing theory) is chosen as some function of the input length \(n\). A typical choice is that \(w = c\log n\) for some constant \(c\). In addition to the memory array, a RAM machine also contains a constant number of registers \(r_0,\ldots,r_{k-1}\), each of which can also contain a single word. The operations in this model include loops, arithmetic on registers, and most importantly the ability to read and write to memory at the location specified by one of the registers. Hence RAM machines can directly access each location of memory without having to move the “head” to that position as one needs to do in Turing machines.
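To make the model concrete, here is a minimal Python sketch of the state of a RAM machine and its memory instructions. The class name `ToyRAM`, the method names, the choice of four registers, and the wrap-around on overflow are all illustrative assumptions and are not part of any formal definition given in this chapter.

```python
W = 64                      # word size in bits (assumed fixed here)
MASK = (1 << W) - 1         # largest value a word can hold: 2^64 - 1


class ToyRAM:
    """Hypothetical toy RAM machine: k registers plus an unbounded memory."""

    def __init__(self, k=4):
        self.reg = [0] * k  # registers r_0, ..., r_{k-1}
        self.mem = {}       # unbounded memory; cells never written read as 0

    def load(self, r, r_addr):
        # r := Memory[ value of register r_addr ]
        self.reg[r] = self.mem.get(self.reg[r_addr], 0)

    def store(self, r, r_addr):
        # Memory[ value of register r_addr ] := r
        self.mem[self.reg[r_addr]] = self.reg[r]

    def add(self, r, a, b):
        # r := a + b, truncated to one word (an assumption of this sketch)
        self.reg[r] = (self.reg[a] + self.reg[b]) & MASK


m = ToyRAM()
m.reg[0] = 957              # an address
m.reg[1] = 42               # a value
m.store(1, 0)               # one step, no head movement
m.load(2, 0)                # read the same cell back into register 2
```

The point of the sketch is that `store` and `load` reach address 957 in a single operation, whereas a Turing machine head would have to walk there one cell at a time.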

We will not give a formal definition of RAM Machines, though the bibliographical notes section (Section 7.10) contains sources for such definitions. Rather, we will use an extension of the NAND-TM programming language to capture RAM algorithms. Specifically, we define the NAND-RAM programming language to be the following extension of NAND-TM:

  • The variables are allowed to be (non-negative) integer valued rather than only Boolean. That is, a scalar variable foo holds a non-negative integer in \(\N\) (rather than only a bit in \(\{0,1\}\)), and an array variable Bar holds an array of integers. As in the case of RAM machines, we will not allow integers of unbounded size. Concretely, each variable holds a number between \(0\) and \(T\), where \(T\) is the number of steps that have been executed by the program so far.2

  • We allow indexed access to arrays. If foo is a scalar and Bar is an array, then Bar[foo] refers to the location of Bar indexed by the value of foo. (Note that this means we don’t need to have a special index variable i any more.)

  • As is often the case in programming languages, we will assume that for Boolean operations such as NAND, a zero valued integer is considered as false, and a nonzero valued integer is considered as true.

To make NAND-RAM more realistic and similar to modern computer architecture, we make NAND-RAM “batteries included” and so the following features are built into NAND-RAM (as opposed to using “syntactic sugar”):3

  • In addition to NAND, NAND-RAM also includes all the basic arithmetic operations of addition, subtraction, multiplication, (integer) division, as well as comparisons (equal, greater than, less than, etc.).

  • We will also include as part of the language basic control flow structures such as if and while.

The full description of the NAND-RAM programming language is in the appendix. However, the most important fact you need to know about NAND-RAM is the following:

For every function \(F:\{0,1\}^* \rightarrow \{0,1\}^*\), \(F\) is computable by a NAND-TM program if and only if \(F\) is computable by a NAND-RAM program.

Clearly NAND-RAM is only more powerful than NAND-TM, and so if a function \(F\) is computable by a NAND-TM program then it can be computed by a NAND-RAM program. The challenging direction is of course to transform a NAND-RAM program \(P\) to an equivalent NAND-TM program \(Q\). To describe the proof in full we will need to cover the full formal specification of the NAND-RAM language, and show how we can implement every one of its features as syntactic sugar on top of NAND-TM.

This can be done, but going over all the operations in detail is rather tedious. Hence we will focus on describing the main ideas behind this transformation. The transformation has three steps:

  1. Indexed access of bit arrays: NAND-RAM generalizes NAND-TM in two main ways: (a) adding indexed access to the arrays (i.e., Foo[bar] syntax) and (b) moving from Boolean valued variables to integer valued ones. We will start by showing how to handle (a). Namely, we will show how we can implement in NAND-TM the operation Setindex(Bar) such that if Bar is an array that encodes some integer \(j\), then after executing Setindex(Bar) the value of i will equal \(j\). This will allow us to simulate syntax of the form Foo[Bar] by Setindex(Bar) followed by Foo[i].

  2. Two dimensional bit arrays: We will then show how we can use “syntactic sugar” to augment NAND-TM with two dimensional arrays. That is, have two indices i and j and two dimensional arrays, such that we can use the syntax Foo[i][j] to access the (i,j)-th location of Foo.

  3. Arrays of integers: Finally we will encode a one dimensional array Arr of integers by a two dimensional Arrbin of bits. The idea is simple: if \(a_{i,0},\ldots,a_{i,\ell}\) is a binary (prefix-free) representation of Arr[\(i\)], then Arrbin[\(i\)][\(j\)] will be equal to \(a_{i,j}\).

Once we have arrays of integers, we can use our usual syntactic sugar for functions, GOTO etc. to implement the arithmetic and control flow operations of NAND-RAM.

The gory details (optional)

We do not show the full formal proof of Theorem 7.1 but focus on the most important parts: implementing indexed access, and simulating two dimensional arrays with one dimensional ones. Even these are already quite tedious to describe, as will not be surprising to anyone that has ever written a compiler. Hence you can feel free to merely skim this section. The important point is not for you to know all details by heart but to be convinced that in principle it is possible to transform a NAND-RAM program to an equivalent NAND-TM program, and even be convinced that, with sufficient time and effort, you could do it if you wanted to.

Indexed access in NAND-TM

Let us choose some prefix-free representation for the natural numbers (see Subsection 2.4.2). For example, if a natural number \(k\) is equal to \(\sum_{i=0}^{\ell} k_i \cdot 2^i\) for \(\ell=\lfloor \log k \rfloor\), then we can represent it as the string \((k_0,k_0,k_1,k_1,\ldots,k_\ell,k_\ell,1,0)\).
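As an illustration of this representation, the following Python sketch encodes a natural number by doubling each of its bits (least significant first) and appending the terminator \(1,0\), and decodes it again by reading pairs until it meets a pair of unequal bits. The function names `encode` and `decode` are hypothetical, and treating \(k=0\) as the single bit \(0\) is an assumption made here because \(\log 0\) is undefined.

```python
def encode(k):
    """Prefix-free encoding: (k_0,k_0,k_1,k_1,...,k_l,k_l,1,0), least significant bit first."""
    bits = []
    while True:                     # runs at least once, so k = 0 yields the single bit 0
        bits.append(k & 1)
        k >>= 1
        if k == 0:
            break
    out = []
    for b in bits:
        out += [b, b]               # every bit of k is written twice
    return out + [1, 0]             # the only pair with unequal bits marks the end


def decode(stream):
    """Read one encoded number from the front of `stream`.

    Returns (value, number of bits consumed)."""
    value, pos, i = 0, 0, 0
    while stream[pos] == stream[pos + 1]:   # doubled pair: a data bit
        value |= stream[pos] << i
        i += 1
        pos += 2
    return value, pos + 2                   # skip the 1,0 terminator
```

Because no encoding is a prefix of another, several numbers can be written back to back and still be separated: `decode(encode(5) + encode(3))` reads the value 5 and reports that it consumed the first 8 bits.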

To implement indexed access in NAND-TM, we need to be able to do the following. Given an array Bar, implement the operation Setindex(Bar) that will set i to the value encoded by Bar. This can be achieved as follows:

  1. Set i to zero, by decrementing it until we reach the point where Atzero[i]\(=1\) (where Atzero is an array that has \(1\) only in position \(0\)).

  2. Let Temp be an array encoding the number \(0\).

  3. While the number encoded by Temp differs from the number encoded by Bar:

    1. Increment Temp
    2. GOTO the same line (while performing a MODANDJUMP operation that increments i by one).

At the end of the loop, i is equal to the value at Bar, and so we can use this to read or write to arrays at the location corresponding to this value. In code, we can implement the above operations as follows:
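The following Python sketch mirrors the three steps above. It is not NAND-TM code: for brevity the arrays Temp and Bar are modelled as plain integers rather than bit arrays, and the function name `setindex` and the step counter are assumptions added for illustration. What it preserves is the key restriction that i only ever changes by one per step.

```python
def setindex(bar_value, i):
    """Move the index i to bar_value using only single-step moves.

    Returns (final value of i, number of moves performed)."""
    steps = 0

    # Step 1: decrement i until Atzero[i] = 1, i.e. until i is 0.
    while i != 0:
        i -= 1
        steps += 1

    # Step 2: Temp encodes the number 0.
    temp = 0

    # Step 3: while Temp differs from Bar, increment Temp and move i right by one.
    while temp != bar_value:
        temp += 1
        i += 1
        steps += 1

    return i, steps
```

Starting from i \(=22\), `setindex(957, 22)` ends with i \(=957\) after \(22 + 957 = 979\) moves in this sketch, since it first returns to position \(0\) before counting up.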

Two dimensional arrays in NAND-TM

To implement two dimensional arrays, we want to embed them in a one dimensional array. The idea is that we come up with a one to one function \(embed:\N \times \N \rightarrow \N\), and so embed the location \((i,j)\) of the two dimensional array Two in the location \(embed(i,j)\) of the array One.

Since the set \(\N \times \N\) seems “much bigger” than the set \(\N\), a priori it might not be clear that such a one to one mapping exists. However, once you think about it more, it is not that hard to construct. For example, you could ask a child to use scissors and glue to transform a 10" by 10" piece of paper into a 1" by 100" strip. If you think about it, this is essentially a one to one map from \([10]\times [10]\) to \([100]\). We can generalize this to obtain a one to one map from \([n]\times [n]\) to \([n^2]\) and more generally a one to one map from \(\N \times \N\) to \(\N\). Specifically, the following map \(embed\) would do (see Figure 7.2):

\[embed(x,y) = \tfrac{1}{2}(x+y)(x+y+1)+x\;\;.\]

We ask you to prove that \(embed\) is indeed one to one, as well as computable by a NAND-TM program, in Exercise 7.2.

Illustration of the map \(embed(x,y) = \tfrac{1}{2}(x+y)(x+y+1)+x\) for \(x,y \in [10]\). One can see that for every two distinct pairs \((x,y)\) and \((x',y')\), \(embed(x,y) \neq embed(x',y')\).

So, we can replace code of the form Two[Foo][Bar] = something (i.e., access the two dimensional array Two at the integers encoded by the one dimensional arrays Foo and Bar) by code of the form:

Computing embed is left for you the reader as Exercise 7.2, but let us hint that this can be done by simply following the grade-school algorithms for multiplication, addition, and division.
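To see the idea in a high level language, here is a Python sketch of \(embed\) together with a two dimensional array stored inside a one dimensional one. The class name `TwoDim` and the use of a dictionary to stand in for the unbounded array One are assumptions of this sketch; it does not replace the NAND-TM implementation asked for in the exercise.

```python
def embed(x, y):
    """The map embed(x, y) = (x+y)(x+y+1)/2 + x from N x N to N."""
    return (x + y) * (x + y + 1) // 2 + x   # (x+y)(x+y+1) is always even


class TwoDim:
    """Hypothetical two dimensional array Two backed by a one dimensional store One."""

    def __init__(self):
        self.one = {}                        # stands in for the unbounded array One

    def get(self, i, j):
        return self.one.get(embed(i, j), 0)  # unwritten cells read as 0

    def set(self, i, j, value):
        self.one[embed(i, j)] = value


two = TwoDim()
two.set(2, 3, 1)     # writes location embed(2, 3) = 17 of One
two.set(3, 2, 7)     # a different cell, since embed(3, 2) = 18
```

Checking that the 100 pairs in \([10]\times[10]\) are sent to 100 different values is a quick sanity test of the one to one property, though of course not a proof.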

All the rest

Once we have two dimensional arrays and indexed access, simulating NAND-RAM with NAND-TM is just a matter of implementing the standard algorithms for arithmetic operations and comparators in NAND-TM. While this is cumbersome, it is not difficult, and the end result is to show that every NAND-RAM program \(P\) can be simulated by an equivalent NAND-TM program \(Q\), thus completing the proof of Theorem 7.1.

Turing equivalence (discussion)

A punched card corresponding to a Fortran statement.

All of the standard programming languages, such as C, Java, Python, Pascal, and Fortran, have operations very similar to those of NAND-RAM. (Indeed, ultimately they can all be executed by machines which have a fixed number of registers and a large memory array.) Hence using Theorem 7.1, we can simulate any program in such a programming language by a NAND-TM program. In the other direction, it is a fairly easy programming exercise to write an interpreter for NAND-TM in any of the above programming languages. Hence we can also simulate NAND-TM programs (and so by Theorem 6.11, Turing machines) using these programming languages. A model with this property of being equivalent in power to Turing Machines / NAND-TM is called Turing equivalent (or sometimes Turing complete). Thus all programming languages we are familiar with are Turing equivalent.4

One concept that appears in many programming languages but we did not include in NAND-RAM programs is recursion. However, recursion (and function calls in general) can be implemented in NAND-RAM using the stack data structure. A stack is a data structure containing a sequence of elements, where we can “push” elements into it and “pop” them from it in “first in last out” order.

We can implement a stack using an array of integers Stack and a scalar variable stackpointer that will be the number of items in the stack. We implement push(foo) by

and implement bar = pop() by

We implement a function call to \(F\) by pushing the arguments for \(F\) into the stack. The code of \(F\) will “pop” the arguments from the stack, perform the computation (which might involve making recursive or non recursive calls) and then “push” its return value into the stack. Because of the “first in last out” nature of a stack, we do not return control to the calling procedure until all the recursive calls are done.
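As a sketch of how this works, the Python code below implements the stack with an array and a counter, and then computes the factorial function using only loops and the stack, with no recursive calls. The fixed array size of 1024, the choice of factorial as the example, and the `depth` counter used to unwind the pending calls are assumptions made for this illustration.

```python
Stack = [0] * 1024        # array of integers (fixed size assumed for the sketch)
stackpointer = 0          # number of items currently in the stack


def push(foo):
    global stackpointer
    Stack[stackpointer] = foo
    stackpointer += 1


def pop():
    global stackpointer
    stackpointer -= 1
    return Stack[stackpointer]


def factorial(n):
    """Compute n! the way a recursive F(n) = n * F(n-1) would, but with an explicit stack."""
    push(n)                       # the argument of the outermost call
    depth = 0                     # number of calls waiting for a return value
    while True:
        k = pop()                 # the current call pops its argument
        if k == 0:
            push(1)               # base case: push the return value
            break
        push(k)                   # save the local variable k
        push(k - 1)               # argument for the "recursive" call
        depth += 1
    for _ in range(depth):        # return from the calls, innermost first
        result = pop()            # return value of the inner call
        k = pop()                 # saved local variable of the outer call
        push(k * result)          # return value of the outer call
    return pop()
```

Here the innermost call finishes first and the outermost call finishes last, which is exactly the “first in last out” behaviour described above.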

The fact that we can implement recursion using a non-recursive language is not surprising. Indeed, machine languages typically do not have recursion (or function calls in general), and hence a compiler implements function calls using a stack and GOTO. You can find online tutorials on how recursion is implemented via stack in your favorite programming language, whether it’s Python, JavaScript, or Lisp/Scheme.

The “Best of both worlds” paradigm

The equivalence between Turing Machines and RAM machines allows us to choose the most convenient language for the task at hand:

  • When we want to prove a theorem about all programs/algorithms, we can use Turing machines (or NAND-TM) since they are simpler and easier to analyze. In particular, if we want to show that a certain function can not be computed, then we will use Turing machines.

  • When we want to show that a function can be computed we can use RAM machines or NAND-RAM, because they are easier to program in and correspond more closely to high level programming languages we are used to. In fact, we will often describe NAND-RAM programs in an informal manner, trusting that the reader can fill in the details and translate the high level description to the precise program. (This is just like the way people typically use informal or “pseudocode” descriptions of algorithms, trusting that their audience will know to translate these descriptions to code if needed.)

Our usage of Turing Machines / NAND-TM and RAM Machines / NAND-RAM is very similar to the way people use high and low level programming languages in practice. When one wants to produce a device that executes programs, it is convenient to do so for a very simple and “low level” programming language. When one wants to describe an algorithm, it is convenient to use as high level a formalism as possible.

By having the two equivalent languages NAND-TM and NAND-RAM, we can “have our cake and eat it too”, using NAND-TM when we want to prove that programs can’t do something, and using NAND-RAM or other high level languages when we want to prove that programs can do something.

Using equivalence results such as those between Turing and RAM machines, we can “have our cake and eat it too”.

We can use a simpler model such as Turing machines when we want to prove something can’t be done, and use a feature-rich model such as RAM machines when we want to prove something can be done.

Let’s talk about abstractions.

“The programmer is in the unique position that … he has to be able to think in terms of conceptual hierarchies that are much deeper than a single mind ever needed to face before.”, Edsger Dijkstra, “On the cruelty of really teaching computing science”, 1988.

At some point in any theory of computation course, the instructor and students need to have the talk. That is, we need to discuss the level of abstraction in describing algorithms. In algorithms courses, one typically describes algorithms in English, assuming readers can “fill in the details” and would be able to convert such an algorithm into an implementation if needed. For example, we might describe the breadth first search algorithm to find if two vertices \(u,v\) are connected as follows:

Input: Graph \(G\), vertices \(u,v\)

Operation:

  1. Put \(u\) in queue \(Q\).

  2. While \(Q\) is not empty:

    • Remove the top vertex \(w\) from \(Q\)
    • If \(w=v\) then declare “connected” and exit.
    • Mark \(w\) and add all unmarked neighbors of \(w\) to \(Q\).
  3. Declare “unconnected”.

We call such a description a high level description.

If we wanted to give more details on how to implement breadth first search in a programming language such as Python or C (or NAND-RAM / NAND-TM for that matter), we would describe how we implement the queue data structure using an array, and similarly how we would use arrays to implement the marking. We call such an “intermediate level” description an implementation level or pseudocode description. Finally, if we want to describe the implementation precisely, we would give the full code of the program (or another fully precise representation, such as in the form of a list of tuples). We call this a formal or low level description.
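For comparison with the high level description, here is what an implementation level version of breadth first search might look like in Python, with the queue and the marks both realized as arrays. The adjacency-list input format and the function name `connected` are assumptions of this sketch. It also marks a vertex when it is added to the queue rather than when it is removed, a small deviation chosen so that each vertex enters the queue at most once and an array of length \(n\) suffices.

```python
def connected(adj, u, v):
    """Return True if v is reachable from u.

    adj[w] is the list of neighbors of vertex w; vertices are 0, ..., n-1."""
    n = len(adj)
    queue = [0] * n          # array-backed queue
    head, tail = 0, 0        # queue contents are queue[head:tail]
    marked = [False] * n     # marked[w] is True once w has been added to the queue

    queue[tail] = u          # put u in the queue
    tail += 1
    marked[u] = True

    while head < tail:       # while the queue is not empty
        w = queue[head]      # remove the first vertex
        head += 1
        if w == v:
            return True      # "connected"
        for x in adj[w]:     # add all unmarked neighbors of w
            if not marked[x]:
                marked[x] = True
                queue[tail] = x
                tail += 1
    return False             # "unconnected"
```

On the graph `[[1], [0, 2], [1], []]`, vertex 2 is reachable from vertex 0 while the isolated vertex 3 is not.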

We can describe an algorithm at different levels of granularity/detail and precision. At the highest level we just write the idea in words, omitting all details on representation and implementation. In the intermediate level (also known as implementation or pseudocode) we give enough details of the implementation that would allow someone to derive it, though we still fall short of providing the full code. The lowest level is where the actual code or mathematical description is fully spelled out. These different levels of detail all have their uses, and moving between them is one of the most important skills for a computer scientist.

While we started off by describing NAND-CIRC, NAND-TM, and NAND-RAM programs at the full formal level, as we progress in this book we will move to implementation and high level description. After all, our goal is not to use these models for actual computation, but rather to analyze the general phenomenon of computation. That said, if you don’t understand how the high level description translates to an actual implementation, going “down to the metal” is often an excellent exercise. (One of the most important skills for a computer scientist is the ability to move up and down hierarchies of abstractions.)

A similar distinction applies to the notion of representation of objects as strings. Sometimes, to be precise, we give a low level specification of exactly how an object maps into a binary string. For example, we might describe an encoding of \(n\) vertex graphs as length \(n^2\) binary strings, by saying that we map a graph \(G\) over the vertices \([n]\) to a string \(x\in \{0,1\}^{n^2}\) such that the \(n\cdot i + j\)-th coordinate of \(x\) is \(1\) if and only if the edge \(\overrightarrow{i \; j}\) is present in \(G\). We can also use an intermediate or implementation level description, by simply saying that we represent a graph using the adjacency matrix representation.
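The low level encoding just described is short enough to write out in full. In the Python sketch below, the function name `graph_to_string` and the representation of the input as a list of directed edge pairs are assumptions made for illustration.

```python
def graph_to_string(n, edges):
    """Encode a directed graph on vertices 0..n-1 as a binary string of length n^2.

    Coordinate n*i + j is '1' if and only if the edge i -> j is present."""
    x = [0] * (n * n)
    for (i, j) in edges:
        x[n * i + j] = 1
    return "".join(str(bit) for bit in x)


encoded = graph_to_string(3, [(0, 1), (2, 0)])
```

For this three vertex example the edges \(0 \rightarrow 1\) and \(2 \rightarrow 0\) set coordinates \(1\) and \(6\), so the string reads `010000100`, which is the adjacency matrix written out row by row.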

Finally, because translating between the various representations of graphs (and objects in general) can be done via a NAND-RAM (and hence a NAND-TM) program, when talking at a high level we also suppress discussion of representation altogether. For example, the fact that graph connectivity is a computable function is true regardless of whether we represent graphs as adjacency lists, adjacency matrices, lists of edge-pairs, and so on and so forth. Hence, in cases where the precise representation doesn’t make a difference, we would often talk about our algorithms as taking as input an object \(O\) (that can be a graph, a vector, a program, etc.) without specifying how \(O\) is encoded as a string.

Definition of “Algorithm”

Up until now we have used the term “algorithm” informally. However, Turing Machines and the range of equivalent models give a way to precisely and formally define algorithms. Hence whenever we refer to an algorithm in this book, we will mean that it is an instance of one of the Turing equivalent models, such as Turing machines, NAND-TM, RAM machines, etc. Because of the equivalence of all these models, in many contexts, it will not matter which of these we use.

Lambda calculus and functional programming languages

The λ calculus is another way to define computable functions. It was proposed by Alonzo Church in the 1930’s around the same time as Alan Turing’s proposal of the Turing Machine. Interestingly, while Turing Machines are not used for practical computation, the λ calculus has inspired functional programming languages such as LISP, ML and Haskell, and indirectly the development of many other programming languages as well. In this section we will present the λ calculus and show that its power is equivalent to NAND-TM programs (and hence also to Turing machines). Our GitHub repository contains a Jupyter notebook with a Python implementation of the λ calculus that you can experiment with to get a better feel for this topic.

The λ operator. At the core of the λ calculus is a way to define “anonymous” functions. For example, instead of defining the squaring function as

\[ square(x) = x\times x \]

we can write it as

\[ \lambda x. x\times x \]

and so \((\lambda x.x\times x)(7)=49\). That is, you can think of \(\lambda x. exp(x)\), where \(exp\) is some expression, as a way of specifying the anonymous function \(x \mapsto exp(x)\).5 Clearly, the name of the argument to a function doesn’t matter, and so \(\lambda y.y\times y\) is the same as \(\lambda x.x \times x\), as both correspond to the squaring function.

To reduce notational clutter, when writing λ calculus expressions we often drop the parentheses for function evaluation. Hence instead of writing \(f(x)\) for the result of applying the function \(f\) to the input \(x\), we can also write this as simply \(f\; x\). Therefore we can write \((\lambda x.x\times x) 7=49\). In this chapter, we will use both the \(f(x)\) and \(f\; x\) notations for function application. Function application associates from left to right, and hence \(f\;g\;h\) is the same as \((f g) h\).

Applying functions to functions

A key feature of the λ calculus is that we can use functions as arguments to other functions. For example, can you guess what number is the following expression equal to?

\[(((\lambda f.(\lambda y.(f \;(f\; y)))) (\lambda x. x\times x))\; 3) \;\;(7.4)\]

The expression Equation 7.4 might seem daunting, but before you look at the solution below, try to break it apart to its components, and evaluate each component at a time. Working out this example would go a long way toward understanding the λ calculus.

Let’s evaluate Equation 7.4 one step at a time. As nice as it is for the λ calculus to allow anonymous functions, adding names can be very helpful for understanding complicated expressions. So, let us write \(F = \lambda f.(\lambda y.(f (f y)))\) and \(g = \lambda x.x\times x\).

Therefore Equation 7.4 becomes \[ ((F \; g)\; 3) \;. \]

On input a function \(f\), \(F\) outputs the function \(\lambda y.(f (f\; y))\), or in other words \(F f\) is the function \(y \mapsto f(f(y))\). Our function \(g\) is simply \(g(x)=x^2\) and so \((F g)\) is the function that maps \(y\) to \((y^2)^2\) or in other words to \(y^4\). Hence \(((F g) 3) = 3^4 = 81\).
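Python's `lambda` keyword is modelled on the λ operator, so the calculation above can be replayed directly. The variable names `F`, `g`, and `result` follow the text; the only assumption is that Python's built-in integers and multiplication stand in for the numbers and the \(\times\) operation used in the example.

```python
# F takes a function f and returns the function y -> f(f(y)).
F = lambda f: (lambda y: f(f(y)))

# g is the squaring function.
g = lambda x: x * x

# Equation 7.4 written out without the names F and g.
result = ((lambda f: (lambda y: f(f(y))))(lambda x: x * x))(3)

# The same computation using the names: (F g) 3.
named_result = F(g)(3)
```

Both `result` and `named_result` evaluate to \(81\), matching \(3^4\).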

Here is another example of a λ expression:

\[(((\lambda x.(\lambda y.x)) \; 2)\; 9) \;. \;\;(7.6)\]

Let us denote \((\lambda y.x)\) by \(F\). Then Equation 7.6 has the form

\[(((\lambda x. F) \; 2) \; 9)\]

Now \((\lambda x.F) 2\) is equal to \(F[x \rightarrow 2]\). Since \(F\) is \(\lambda y.x\) this means that \((\lambda x.F) 2\) is the function \(\lambda y.2\) that ignores its input and outputs \(2\) no matter what it is equal to. Hence Equation 7.6 is equivalent to \((\lambda y. 2) 9\) which is the result of applying the function \(y \mapsto 2\) on the input \(9\), which is simply the number \(2\).

Obtaining multi-argument functions via Currying

As we’ve seen, in a λ expression of the form \(\lambda x. e\), the expression \(e\) can itself involve the λ operator. Thus for example the function

\[ \lambda x. (\lambda y. x+y) \;\;(7.8) \]

maps \(x\) to the function \(y \mapsto x+y\).

In particular, if we invoke the function Equation 7.8 on \(a\), and then invoke the result of this invocation on \(b\), we will get the value \(a+b\). We can see that the one-argument function Equation 7.8 corresponding to \(a \mapsto (b \mapsto a+b)\) can also be thought of as the two-argument function \((a,b) \mapsto a+b\). In general, we will use the λ expression \(\lambda x.(\lambda y.f(x,y))\) to simulate the effect of a two argument function \((x,y) \mapsto f(x,y)\). This technique is known as Currying. We will use the shorthand \(\lambda x,y. e\) for \(\lambda x. (\lambda y. e)\). If \(f= \lambda x.(\lambda y.e)\) then \(((f a) b)\) corresponds to applying \(f\) to \(a\) and then invoking the resulting function on \(b\), obtaining the result of replacing in \(e\) the occurrences of \(x\) with \(a\) and the occurrences of \(y\) with \(b\). By our rules of associativity, this is the same as \((f a b)\) which we’ll sometimes also write as \(f(a,b)\).
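Currying can be tried out in Python as well. In the sketch below, `add` is the function from Equation 7.8 and `first` is the function \(\lambda x.(\lambda y.x)\) from Equation 7.6; the helper names `curry` and `uncurry` are hypothetical and are introduced only to show the conversion in both directions.

```python
# Equation 7.8: maps x to the function y -> x + y.
add = lambda x: (lambda y: x + y)

# The function from Equation 7.6: returns its first argument and ignores the second.
first = lambda x: (lambda y: x)


def curry(f):
    """Turn a two-argument function (x, y) -> f(x, y) into x -> (y -> f(x, y))."""
    return lambda x: (lambda y: f(x, y))


def uncurry(h):
    """Turn a curried function x -> (y -> e) back into a two-argument function."""
    return lambda x, y: h(x)(y)


add_three = add(3)    # the function with x = 3 "hardwired" into it
```

With these definitions `add(3)(4)` and `uncurry(add)(3, 4)` both give \(7\), and `first(2)(9)` gives \(2\), as computed for Equation 7.6.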

In the “currying” transformation, we can create the effect of a two parameter function \(f(x,y)\) with the λ expression \(\lambda x.(\lambda y. f(x,y))\) which on input \(x\) outputs a one-parameter function \(f_x\) that has \(x\) “hardwired” into it and such that \(f_x(y)=f(x,y)\). This can be illustrated by a circuit diagram; see Chelsea Voss’s site.

Formal description of the λ calculus.

We now provide a formal description of the λ calculus. We start with “basic expressions” that contain a single variable such as \(x\) or \(y\) and build more complex expressions using the following two rules:

  • Application: If \(e\) and \(e'\) are λ expressions, then the λ expression \((e \; e')\) corresponds to applying the function described by \(e\) to the input \(e'\).

  • Abstraction: If \(e\) is a λ expression and \(x\) is a variable, then the λ expression \(\lambda x.(e)\) corresponds to the function that on any input \(z\) returns the expression \(e[x \rightarrow z]\) obtained by replacing all (free) occurrences of \(x\) in \(e\) with \(z\).6

Formally λ expressions are defined as follows:

A λ expression is either a single variable identifier or an expression that is built from other expressions using the application and abstraction operations.

Definition 7.6 is a recursive definition since we defined the concept of λ expressions in terms of itself. Specifically, a λ expression can either be the “base case” of the form \(\mathit{foo}\) for a variable identifier \(\mathit{foo}\), or it can be of the form \((e e')\) or \(\lambda \mathit{bar}.(e)\) where \(e\) and \(e'\) are other λ expressions and \(\mathit{bar}\) is a variable identifier. Such a recursive definition might seem confusing at first, but in fact you have known recursive definitions since you were an elementary school student. Consider how we define an arithmetic expression: it is an expression that is either just a number, or has one of the forms \((e + e')\), \((e - e')\), \((e \times e')\), or \((e \div e')\), where \(e\) and \(e'\) are other arithmetic expressions.

We will use the following rules to allow us to drop some parentheses. Function application associates from left to right, and so \(fgh\) is the same as \((fg)h\). Function application has a higher precedence than the λ operator, and so \(\lambda x.fgx\) is the same as \(\lambda x.((fg)x)\).

This is similar to how we use the precedence rules in arithmetic operations to allow us to use fewer parentheses and so write the expression \((7 \times 3) + 2\) as \(7\times 3 + 2\).

As mentioned in the discussion of Currying above, we also use the shorthand \(\lambda x,y.e\) for \(\lambda x.(\lambda y.e)\) and the shorthand \(f(x,y)\) for \((f\; x)\; y\). This plays nicely with the “Currying” transformation of simulating multi-input functions using λ expressions.

As we have seen in Equation 7.6, the rule that \((\lambda x. exp) exp'\) is equivalent to \(exp[x \rightarrow exp']\) enables us to modify λ expressions and obtain simpler equivalent forms for them. Another rule that we can use is that the name of the parameter does not matter and hence for example \(\lambda y.y\) is the same as \(\lambda z.z\). Together these rules define the notion of equivalence of λ expressions:

Two λ expressions are equivalent if they can be made into the same expression by repeated applications of the following rules:7

  1. Evaluation (aka \(\beta\) reduction): The expression \((\lambda x.exp) exp'\) is equivalent to \(exp[x \rightarrow exp']\).

  2. Variable renaming (aka \(\alpha\) conversion): The expression \(\lambda x.exp\) is equivalent to \(\lambda y.exp[x \rightarrow y]\).
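To make the two rules concrete, here is a small Python sketch that represents λ expressions as nested tuples and implements substitution, a single evaluation step, and renaming. The tuple representation and the function names `subst`, `beta`, and `alpha` are assumptions of this sketch. It also assumes, for simplicity, that the expression being substituted has no free variable that is bound inside the target, so it does not perform the extra renaming a full implementation needs to avoid variable capture.

```python
# Representation (an assumption of this sketch):
#   a variable is a string, e.g. "x"
#   an application (e e') is ("app", e, e')
#   an abstraction lambda x.(e) is ("lam", "x", e)

def subst(e, x, s):
    """Return e[x -> s]: replace the free occurrences of variable x in e by s."""
    if isinstance(e, str):
        return s if e == x else e
    if e[0] == "app":
        return ("app", subst(e[1], x, s), subst(e[2], x, s))
    _, y, body = e
    if y == x:                 # x is rebound here, so it has no free occurrences below
        return e
    return ("lam", y, subst(body, x, s))


def beta(e):
    """Evaluation: rewrite (lambda x.body) arg to body[x -> arg] at the top of e."""
    if not isinstance(e, str) and e[0] == "app":
        f, arg = e[1], e[2]
        if not isinstance(f, str) and f[0] == "lam":
            return subst(f[2], f[1], arg)
    return e                   # no step applies at the top level


def alpha(e, y):
    """Variable renaming: rewrite lambda x.body to lambda y.body[x -> y]."""
    _, x, body = e
    return ("lam", y, subst(body, x, y))


# ((lambda x.(lambda y.x)) 2) 9, with "2" and "9" treated as opaque identifiers.
inner = beta(("app", ("lam", "x", ("lam", "y", "x")), "2"))
outer = beta(("app", inner, "9"))
```

Here `inner` is the representation of \(\lambda y.2\) and `outer` is `"2"`, reproducing the evaluation of Equation 7.6.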

If \(exp\) is a λ expression of the form \(\lambda x.exp'\) then it naturally corresponds to the function that maps any input \(z\) to \(exp'[x \rightarrow z]\). Hence the λ calculus naturally implies a computational model. Since in the λ calculus the inputs can themselves be functions, we need to decide how to evaluate an expression such as

\[ (\lambda x.f)\;((\lambda y.g)\; z) \;. \;\;(7.9) \]

There are two natural conventions for this:

  • Call by name: We evaluate Equation 7.9 by first plugging in the righthand expression \(((\lambda y.g)\; z)\) as input to the lefthand side function, obtaining \(f[x \rightarrow ((\lambda y.g)\; z)]\) and then continue from there.

  • Call by value: We evaluate Equation 7.9 by first evaluating the righthand side and obtaining \(h=g[y \rightarrow z]\), and then plugging this into the lefthand side to obtain \(f[x \rightarrow h]\).

Because the λ calculus has only pure functions, which do not have “side effects”, in many cases the order does not matter. In fact, it can be shown that if we obtain a definite irreducible expression (for example, a number) in both strategies, then it will be the same one. However, there could be situations where “call by value” goes into an infinite loop while “call by name” does not. Hence we will use “call by name” henceforth.8
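The difference between the two conventions can be imitated in Python, which itself evaluates arguments before calling a function (that is, it uses call by value). The following is only an analogy, with the hypothetical names `loop_forever` and `const_zero` chosen for illustration: wrapping an argument in a zero-argument `lambda` delays its evaluation, which mimics call by name.

```python
def loop_forever():
    """Stands in for an expression whose evaluation never terminates."""
    while True:
        pass

def const_zero(arg):
    """Ignores its argument, like the lambda expression that maps every x to 0."""
    return 0

# Imitating call by name: the argument is passed unevaluated, as a thunk.
# const_zero never forces the thunk, so the call returns immediately.
result = const_zero(lambda: loop_forever())
print(result)  # 0

# Call by value: Python would evaluate the argument first and never return.
# const_zero(loop_forever())   # deliberately left commented out
```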

Functions as first class objects

The key property of the λ calculus (and functional languages in general) is that functions are “first-class citizens” in the sense that they can be used as parameters and return values of other functions. Thus, we can invoke one λ expression on another. For example if \(\ensuremath{\mathit{DOUBLE}}\) is the λ expression \(\lambda f.(\lambda x. f(fx))\), then for every function \(f\), \(\ensuremath{\mathit{DOUBLE}}\; f\) corresponds to the function that invokes \(f\) twice on \(x\) (i.e., first computes \(fx\) and then invokes \(f\) on the result). In particular, if \(f=\lambda y.(y+1)\) then \(\ensuremath{\mathit{DOUBLE}}\; f = \lambda x.(x+2)\).
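Python's `lambda` supports the same idea, so the \(\ensuremath{\mathit{DOUBLE}}\) example can be tried out directly. This is a small sketch; the names `succ`, `add_two` and `add_four` are our own.

```python
# DOUBLE takes a function f and returns the function x -> f(f(x)).
DOUBLE = lambda f: (lambda x: f(f(x)))

succ = lambda y: y + 1          # the function y -> y + 1
add_two = DOUBLE(succ)          # behaves like x -> x + 2
print(add_two(5))               # 7

# Functions are ordinary values, so DOUBLE can be applied to its own output.
add_four = DOUBLE(add_two)
print(add_four(5))              # 9
```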

Unlike most programming languages, the pure λ-calculus doesn’t have the notion of types. Every object in the λ calculus can also be thought of as a λ expression and hence as a function that takes one input and returns one output. All functions take one input and return one output, and if you feed a function an input of a form it didn’t expect, it still evaluates the λ expression via “search and replace”, replacing all instances of its parameter with copies of the input expression you fed it.

The “Enhanced” λ calculus

We now discuss the λ calculus as a computational model. We will start by describing an “enhanced” version of the λ calculus that contains some “superfluous features” but is easier to wrap your head around. We will first show how the enhanced λ calculus is equivalent to Turing machines in computational power. Then we will show how all the features of “enhanced λ calculus” can be implemented as “syntactic sugar” and even the “pure” (i.e., non enhanced) λ calculus is equivalent in power to Turing machines (and hence also to RAM machines and all other Turing-equivalent models).

The enhanced λ calculus includes the following set of objects and operations:

  • Boolean constants and IF function: The enhanced λ calculus has the constants \(0\) and \(1\) and the \(\ensuremath{\mathit{IF}}\) function such that for every \(cond \in \{0,1\}\) and λ expressions \(a,b\), \(\ensuremath{\mathit{IF}} cond\;a\;b\) outputs \(a\) if \(cond=1\) and outputs \(b\) if \(cond=0\).9 Using \(\ensuremath{\mathit{IF}}\) and the constants \(0,1\) we can also compute logical operations such as \(\ensuremath{\mathit{AND}},\ensuremath{\mathit{OR}},\ensuremath{\mathit{NOT}},\ensuremath{\mathit{NAND}}\). For example, \(\ensuremath{\mathit{NOT}} = \lambda a. \ensuremath{\mathit{IF}} a 0 1\) and \(\ensuremath{\mathit{AND}} = \lambda a,b. \ensuremath{\mathit{IF}} a b 0\).

  • Pairs: We have the function \(\ensuremath{\mathit{PAIR}}\) such that \(\ensuremath{\mathit{PAIR}}\; x\; y\) returns the pair \((x,y)\) that holds \(x\) and \(y\). We also have the functions \(\ensuremath{\mathit{HEAD}}\) and \(\ensuremath{\mathit{TAIL}}\) to extract the first and second member of a pair respectively. Hence, \(\ensuremath{\mathit{HEAD}} (\ensuremath{\mathit{PAIR}} a b) = a\) and \(\ensuremath{\mathit{TAIL}} (\ensuremath{\mathit{PAIR}} a b) = b\).

  • Lists and strings: Using \(\ensuremath{\mathit{PAIR}}\) we can also construct lists. The idea is that \(\ensuremath{\mathit{PAIR}}\; a\; L\) corresponds to the list obtained by adding the element \(a\) to the beginning of a list \(L\). By repeating this operation, we can construct lists of any length. Specifically, we will have a special λ expression \(\ensuremath{\mathit{NIL}}\) that corresponds to the empty list, which we also denote by \(\langle \rangle\). If \(c\) is some λ expression, then \(\ensuremath{\mathit{PAIR}}\; c \; \ensuremath{\mathit{NIL}}\) corresponds to the single-element list \(\langle c \rangle\). Now for every two λ expressions \(b,c\), the expression \(\ensuremath{\mathit{PAIR}} \; b \; (\ensuremath{\mathit{PAIR}} \; c \; \ensuremath{\mathit{NIL}})\) corresponds to the two-element list \(\langle b , c \rangle\). Similarly the expression \(\ensuremath{\mathit{PAIR}} \; a \; (\ensuremath{\mathit{PAIR}} \; b \; (\ensuremath{\mathit{PAIR}} \; c \; \ensuremath{\mathit{NIL}}))\) corresponds to the list \(\langle a,b,c \rangle\) and so on and so forth.10 The function \(\ensuremath{\mathit{ISEMPTY}}\) returns \(1\) on \(\ensuremath{\mathit{NIL}}\) and returns \(0\) on every other list. A string is simply a list of bits.

  • List operations: The enhanced λ calculus also contains the list-processing functions \(\ensuremath{\mathit{MAP}}\), \(\ensuremath{\mathit{REDUCE}}\), and \(\ensuremath{\mathit{FILTER}}\). Given a list \(L= \langle x_0,\ldots,x_{n-1}\rangle\) and a function \(f\), \(\ensuremath{\mathit{MAP}}\; L \; f\) applies \(f\) on every member of the list to obtain the new list \(L'= \langle f(x_0),\ldots,f(x_{n-1})\rangle\). Given a list \(L\) as above and a function \(f\) whose output is either \(0\) or \(1\), \(\ensuremath{\mathit{FILTER}}\; L\; f\) returns the list \(\langle x_i \rangle_{f x_i = 1}\) containing all the elements of \(L\) for which \(f\) outputs \(1\). The function \(\ensuremath{\mathit{REDUCE}}\) applies a “combining” operation to a list. For example, \(\ensuremath{\mathit{REDUCE}}\; L \; + \; 0\) will return the sum of all the elements in the list \(L\). The sum of a list is defined recursively as follows: the sum of the empty list is \(0\), and the sum of a non-empty list \(L\) is obtained by recursively summing \(\ensuremath{\mathit{TAIL}}\;L\) (i.e., all elements of \(L\) except the first) and adding the result to \(\ensuremath{\mathit{HEAD}}\;L\) (which is the first element of \(L\)). More generally, \(\ensuremath{\mathit{REDUCE}}\) takes a list \(L\), an operation \(f\) (which we think of as taking two arguments) and a λ expression \(z\) (which we think of as the “neutral element” for the operation \(f\), such as \(0\) for addition and \(1\) for multiplication). The output is defined via

\[\ensuremath{\mathit{REDUCE}}\;L\;f\;z = \begin{cases}z & L=\ensuremath{\mathit{NIL}} \\ f\;(\ensuremath{\mathit{HEAD}} L) \; (\ensuremath{\mathit{REDUCE}}\;(\ensuremath{\mathit{TAIL}} L)\;f\;z) & \text{otherwise}\end{cases}\;.\] See Figure 7.8 for an illustration of the three list-processing operations.

  • Recursion: Finally, we want to be able to execute recursive functions. Since in λ calculus functions are anonymous, we can’t write a definition of the form \(f(x) = blah\) where \(blah\) includes calls to \(f\). Instead we use functions \(f\) that take an additional input \(me\) as a parameter. The operator \(\ensuremath{\mathit{RECURSE}}\) will take such a function \(f\) as input and return a “recursive version” of \(f\) where all the calls to \(me\) are replaced by recursive calls to this function. That is, if we have a function \(F\) taking two parameters \(me\) and \(x\), then \(\ensuremath{\mathit{RECURSE}}\; F\) will be the function \(f\) taking one parameter \(x\) such that \(f(x) = F(f,x)\) for every \(x\).
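To get a feel for the objects listed above, here is a minimal Python sketch of them. It is an illustration rather than a definition: representing pairs by Python tuples and \(\ensuremath{\mathit{NIL}}\) by `None` is our own assumption, and the recursive functions use Python's conditional expression instead of `IF` because Python evaluates all arguments of a function call eagerly.

```python
NIL = None                                  # assumed stand-in for the empty list

def IF(cond, a, b):                         # cond is expected to be 0 or 1
    return a if cond == 1 else b

def PAIR(x, y):  return (x, y)
def HEAD(p):     return p[0]
def TAIL(p):     return p[1]
def ISEMPTY(L):  return 1 if L is NIL else 0

def MAP(L, f):
    # Apply f to every member of the list.
    return NIL if ISEMPTY(L) else PAIR(f(HEAD(L)), MAP(TAIL(L), f))

def FILTER(L, f):
    # Keep exactly the members on which f outputs 1.
    if ISEMPTY(L):
        return NIL
    rest = FILTER(TAIL(L), f)
    return PAIR(HEAD(L), rest) if f(HEAD(L)) == 1 else rest

def REDUCE(L, f, z):
    # z for the empty list, otherwise f(HEAD L, REDUCE (TAIL L) f z).
    return z if ISEMPTY(L) else f(HEAD(L), REDUCE(TAIL(L), f, z))

L = PAIR(1, PAIR(2, PAIR(3, NIL)))          # the list <1,2,3>
print(MAP(L, lambda x: 2 * x))              # (2, (4, (6, None)))
print(FILTER(L, lambda x: x % 2))           # (1, (3, None))
print(REDUCE(L, lambda a, b: a + b, 0))     # 6
```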

Give a λ expression \(N\) such that \(N\;x\;y = \ensuremath{\mathit{NAND}}(x,y)\) for every \(x,y \in \{0,1\}\).

This can be done in a similar way to how we computed \(\ensuremath{\mathit{XOR}}_2\). The \(\ensuremath{\mathit{NAND}}\) of \(x,y\) is equal to \(1\) unless \(x=y=1\). Hence we can write

\[ N = \lambda x,y.\ensuremath{\mathit{IF}}(x,\ensuremath{\mathit{IF}}(y,0,1),1) \]
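The expression can be checked against the truth table of \(\ensuremath{\mathit{NAND}}\) with a few lines of Python. Here `IF` is modeled by an ordinary Python function, which is an assumption made for illustration.

```python
def IF(cond, a, b):
    # Outputs a if cond is 1 and b if cond is 0.
    return a if cond == 1 else b

# Direct transcription of N = lambda x,y. IF(x, IF(y,0,1), 1)
N = lambda x, y: IF(x, IF(y, 0, 1), 1)

table = {(x, y): N(x, y) for x in (0, 1) for y in (0, 1)}
print(table)  # {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0}
```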

Let us see how we can compute the XOR of a list in the enhanced λ calculus. First, we note that we can compute XOR of two bits as follows: \[ \ensuremath{\mathit{NOT}} = \lambda a. \ensuremath{\mathit{IF}}(a,0,1) \;\;(7.12) \] and \[ \ensuremath{\mathit{XOR}}_2 = \lambda a,b. \ensuremath{\mathit{IF}}(b,\ensuremath{\mathit{NOT}}(a),a) \;\;(7.13) \]

(We are using here a bit of syntactic sugar to describe the functions. To obtain the λ expression for XOR we simply substitute the expression for \(\ensuremath{\mathit{NOT}}\) from Equation 7.12 into Equation 7.13.)

Now recursively we can define the XOR of a list as follows:

\[ \ensuremath{\mathit{XOR}}(L) = \begin{cases} 0 & \text{$L$ is empty} \\ \ensuremath{\mathit{XOR}}_2(\ensuremath{\mathit{HEAD}}(L),\ensuremath{\mathit{XOR}}(\ensuremath{\mathit{TAIL}}(L))) & \text{otherwise} \end{cases} \]

This means that \(\ensuremath{\mathit{XOR}}\) is equal to

\[ \ensuremath{\mathit{RECURSE}} \; \bigl(\lambda me,L. \ensuremath{\mathit{IF}}(\ensuremath{\mathit{ISEMPTY}}(L),0,\ensuremath{\mathit{XOR}}_2(\ensuremath{\mathit{HEAD}}\;L\;\;,\;\;me(\ensuremath{\mathit{TAIL}} \; L)))\bigr) \;. \]

That is, \(\ensuremath{\mathit{XOR}}\) is obtained by applying the \(\ensuremath{\mathit{RECURSE}}\) operator to the function that on inputs \(me\), \(L\), returns \(0\) if \(\ensuremath{\mathit{ISEMPTY}}(L)\) and otherwise returns \(\ensuremath{\mathit{XOR}}_2\) applied to \(\ensuremath{\mathit{HEAD}}(L)\) and to \(me(\ensuremath{\mathit{TAIL}}(L))\).

We could have also computed \(\ensuremath{\mathit{XOR}}\) using the \(\ensuremath{\mathit{REDUCE}}\) operation; we leave working this out as an exercise to the reader.

A list \(\langle x_0,x_1,x_2 \rangle\) in the λ calculus is constructed from the tail up, building the pair \(\langle x_2,\ensuremath{\mathit{NIL}}\rangle\), then the pair \(\langle x_1, \langle x_2,\ensuremath{\mathit{NIL}}\rangle \rangle\) and finally the pair \(\langle x_0,\langle x_1,\langle x_2,\ensuremath{\mathit{NIL}} \rangle\rangle\rangle\). That is, a list is a pair where the first element of the pair is the first element of the list and the second element is the rest of the list. The figure on the left renders this “pairs inside pairs” construction, though it is often easier to think of a list as a “chain”, as in the figure on the right, where the second element of each pair is thought of as a link, pointer or reference to the remainder of the list.
Illustration of the \(\ensuremath{\mathit{MAP}}\), \(\ensuremath{\mathit{FILTER}}\) and \(\ensuremath{\mathit{REDUCE}}\) operations.

Enhanced λ expressions

An enhanced λ expression is obtained by composing the objects above with the application and abstraction rules. We can now define the notion of computing a function using the λ calculus. We will define the simplification of a λ expression as the following recursive process:

  1. (Evaluation / \(\beta\) reduction.) If the expression has the form \((exp_L exp_R)\), where \(exp_L\) is of the form \(\lambda x.exp'_L\), then replace the expression with \(exp'_L[x \rightarrow exp_R]\).

  2. (Renaming / \(\alpha\) conversion.) When we cannot simplify any further, rename the variables so that the first bound variable in the expression is \(v_0\), the second one is \(v_1\), and so on and so forth.

Please make sure you understand why this recursive procedure simply corresponds to the “call by name” evaluation strategy.

The result of simplifying a λ expression is an equivalent expression, and hence if two expressions have the same simplification then they are equivalent.

Let \(F:\{0,1\}^* \rightarrow \{0,1\}^*\) be a function and \(exp\) a λ expression. For every \(x\in \{0,1\}^n\), we denote by \(\ensuremath{\mathit{LIST}}(x)\) the λ list \(\ensuremath{\mathit{PAIR}}(x_0, \ensuremath{\mathit{PAIR}}( x_1 , \ensuremath{\mathit{PAIR}}(\cdots \ensuremath{\mathit{PAIR}}(x_{n-1}, \ensuremath{\mathit{NIL}}))))\) that corresponds to \(x\).

We say that \(exp\) computes \(F\) if for every \(x\in \{0,1\}^*\), the expressions \((exp \ensuremath{\mathit{LIST}}(x))\) and \(\ensuremath{\mathit{LIST}}(F(x))\) are equivalent, and moreover they have the same simplification.

Enhanced λ calculus is Turing-complete

The basic operations of the enhanced λ calculus more or less amount to the Lisp or Scheme programming languages.11 Given that, it is perhaps not surprising that the enhanced λ-calculus is equivalent to Turing machines:

For every function \(F:\{0,1\}^* \rightarrow \{0,1\}^*\), \(F\) is computable in the enhanced λ calculus if and only if it is computable by a Turing machine.

To prove the theorem, we need to show that (1) if \(F\) is computable by a λ calculus expression then it is computable by a Turing machine, and (2) if \(F\) is computable by a Turing machine, then it is computable by an enhanced λ calculus expression.

Showing (1) is fairly straightforward. Applying the simplification rules to a λ expression basically amounts to “search and replace” which we can implement easily in, say, NAND-RAM, or for that matter Python (both of which are equivalent to Turing machines in power). Showing (2) essentially amounts to simulating a Turing machine (or writing a NAND-TM interpreter) in a functional programming language such as LISP or Scheme. Showing how this can be done is a good exercise in mastering some functional programming techniques that are useful in their own right.

We only sketch the proof. The direction from the λ calculus to Turing machines is simple. As mentioned above, evaluating λ expressions basically amounts to “search and replace”. It is also a fairly straightforward programming exercise to implement all the above basic operations in an imperative language such as Python or C, and using the same ideas we can do so in NAND-RAM as well, which we can then transform to a NAND-TM program.

For the other direction, we need to simulate a Turing machine, or equivalently a NAND-TM program, using a λ expression. First, by Solved Exercise 7.1 we can compute the \(\ensuremath{\mathit{NAND}}\) function, and hence every finite function, using the λ calculus. Thus proving the theorem boils down to simulating the arrays of NAND-TM using the lists of the enhanced λ calculus.

We will encode each array A of the NAND-TM program by a list \(L\) in the enhanced λ calculus. We encode the index variable i by a special list \(I\) that has \(1\) in the location corresponding to the value of i and \(0\)’s everywhere else. To simulate moving i to the left, we need to remove the first item from the list, while to simulate moving i to the right, we add a zero to the head of the list.12

To extract the i-th bit of the array corresponding to \(L\), we need to compute the following function \(get\) that on input a pair of lists \(I\) and \(L\) of bits of the same length \(n\), \(get(I,L)\) outputs \(1\) if and only if there is some \(j \in [n]\) such that the \(j\)-th element of both \(I\) and \(L\) is equal to \(1\). This turns out to be not so hard. The key is to implement the function \(zip\) that on input a pair of lists \(I\) and \(L\) of the same length \(n\), outputs a list of \(n\) pairs \(M\) such that the \(j\)-th element of \(M\) (which we denote by \(M_j\)) is the pair \((I_j,L_j)\). Thus \(zip\) “zips together” these two lists of elements into a single list of pairs.13 It is a good exercise to give a recursive implementation of \(zip\), which we can then implement using the \(\ensuremath{\mathit{RECURSE}}\) operator. Once we have \(zip\), we can implement \(get\) by applying an appropriate \(\ensuremath{\mathit{REDUCE}}\) on the list \(zip(I,L)\).

Setting the list \(L\) at the \(i\)-th location to a certain value requires computing the function \(set(I,L,v)\) that outputs a list \(L'\) such that \(L'_j = L_j\) if \(I_j = 0\) and \(L'_j = v\) otherwise. The function \(set\) can be implemented by applying \(\ensuremath{\mathit{MAP}}\) with an appropriate operator to the list \(zip(I,L)\).
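The following Python sketch mirrors the \(zip\), \(get\) and \(set\) functions described above. It is an illustration under several assumptions: it uses ordinary Python lists instead of nested pairs, the names `zip_lists` and `set_at` are our own, and `functools.reduce` folds from the left rather than from the right, which makes no difference here because the combining operation is an OR.

```python
from functools import reduce

def zip_lists(I, L):
    """Recursive zip: pair up the j-th elements of two equal-length lists."""
    if not I:
        return []
    return [(I[0], L[0])] + zip_lists(I[1:], L[1:])

def get(I, L):
    """Return 1 iff some position j has I[j] == L[j] == 1."""
    return reduce(lambda acc, pair: acc | (pair[0] & pair[1]), zip_lists(I, L), 0)

def set_at(I, L, v):
    """Return a copy of L with v written at the position where I holds a 1."""
    return list(map(lambda pair: v if pair[0] == 1 else pair[1], zip_lists(I, L)))

I = [0, 0, 1, 0]                  # encodes the index value i = 2
L = [1, 0, 1, 1]
print(get(I, L))                  # 1
print(set_at(I, L, 0))            # [1, 0, 0, 1]
print(get(I, set_at(I, L, 0)))    # 0
```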

We omit the full details of implementing \(set\) and \(get\), but the bottom line is that for every NAND-TM program \(P\), we can obtain a λ expression \(\ensuremath{\mathit{NEXT}}_P\) such that, if we let \(\sigma = (loop,foo,bar,\ldots,I,X,X\_nonblank,Y,Y\_nonblank,Baz,Blah,\ldots)\) be the set of Boolean values and lists that encode the current state of \(P\) (with a list for each array and for the index variable i), then \(\ensuremath{\mathit{NEXT}}_P \sigma\) will encode the state after performing one iteration of \(P\).

Now we can use the following “pseudocode” to simulate the program \(P\). The function \(\ensuremath{\mathit{SIM}}_P\) will obtain an encoding \(\sigma_0\) of the initial state of \(P\), and output the encoding \(\sigma^*\) of the state of \(P\) after it halts. It will be computed as follows:

Input: \(\sigma\): encoding of the state of a NAND-TM program

Operation:

  1. Let \(\sigma' = \ensuremath{\mathit{NEXT}}_P \sigma\).

  2. If \(halt(\sigma') = 1\) then return \(\sigma'\), where \(halt(\sigma')\) is equal to \(1\) if the configuration \(\sigma'\) is one in which the program will halt when executing the MODANDJUMP operation.

  3. Otherwise return \(\ensuremath{\mathit{SIM}}_P(\sigma')\).

We can write this algorithm as the λ expression

\[ \ensuremath{\mathit{RECURSE}} \; \bigl(\lambda m,\sigma. \ensuremath{\mathit{IF}}(loop(\ensuremath{\mathit{NEXT}}_P \sigma)\;,\; m(\ensuremath{\mathit{NEXT}}_P \sigma)\;,\;\ensuremath{\mathit{NEXT}}_P \sigma) \bigr) \]

Given \(\ensuremath{\mathit{SIM}}_P\), we can compute the function computed by \(P\) by writing expressions for encoding the input as the initial state, and decoding the output from the final state. We omit the details, though this is fairly straightforward.14
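The structure of \(\ensuremath{\mathit{SIM}}_P\) can be sketched in Python. Everything specific here is an assumption made for illustration: `toy_next` is a made-up stand-in for \(\ensuremath{\mathit{NEXT}}_P\) whose state is just a pair of a loop flag and a counter, and `make_sim` is a hypothetical helper name.

```python
def make_sim(next_p, loop):
    """Build SIM_P from a one-step function next_p and a loop-flag reader."""
    def sim(sigma):
        sigma_next = next_p(sigma)
        if loop(sigma_next) == 0:      # the program halts in this configuration
            return sigma_next
        return sim(sigma_next)         # otherwise keep simulating
    return sim

# Toy stand-in for NEXT_P: the state is (loop, counter) and each
# iteration decrements the counter, clearing loop when it reaches zero.
def toy_next(sigma):
    _, counter = sigma
    counter -= 1
    return (1 if counter > 0 else 0, counter)

toy_loop = lambda sigma: sigma[0]

sim = make_sim(toy_next, toy_loop)
print(sim((1, 5)))   # (0, 0)
```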

The pure λ calculus

While the collection of “basic” functions we allowed for λ calculus is smaller than what’s provided by most Lisp dialects, coming from NAND-TM it still seems a little “bloated”. Can we make do with less? In other words, can we find a subset of these basic operations that can implement the rest?

This is a good point to pause and think how you would implement these operations yourself. For example, start by thinking how you could implement \(\ensuremath{\mathit{MAP}}\) using \(\ensuremath{\mathit{REDUCE}}\), and then \(\ensuremath{\mathit{REDUCE}}\) using \(\ensuremath{\mathit{RECURSE}}\) combined with \(0,1,\ensuremath{\mathit{IF}},\ensuremath{\mathit{PAIR}},\ensuremath{\mathit{HEAD}},\ensuremath{\mathit{TAIL}},\ensuremath{\mathit{NIL}},\ensuremath{\mathit{ISEMPTY}}\) together with the λ operations.

Now you can think how you could implement \(\ensuremath{\mathit{PAIR}}\), \(\ensuremath{\mathit{HEAD}}\) and \(\ensuremath{\mathit{TAIL}}\) based on \(0,1,\ensuremath{\mathit{IF}}\). The idea is that we can represent a pair as a function.

It turns out that there is in fact a proper subset of these basic operations that can be used to implement the rest. That subset is the empty set. That is, we can implement all the operations above using the λ formalism only, even without using \(0\)’s and \(1\)’s. It’s λ’s all the way down! The idea is that we encode \(0\) and \(1\) themselves as λ expressions, and build things up from there. This is known as Church encoding, as it was originated by Church in his effort to show that the λ calculus can be a basis for all computation.

There are λ expressions that implement the functions \(0\),\(1\),\(\ensuremath{\mathit{IF}}\),\(\ensuremath{\mathit{PAIR}}\), \(\ensuremath{\mathit{HEAD}}\), \(\ensuremath{\mathit{TAIL}}\), \(\ensuremath{\mathit{NIL}}\), \(\ensuremath{\mathit{ISEMPTY}}\), \(\ensuremath{\mathit{MAP}}\), \(\ensuremath{\mathit{REDUCE}}\), and \(\ensuremath{\mathit{RECURSE}}\).

We will not write the full formal proof of Theorem 7.14 but outline the ideas involved in it:

  • We define \(0\) to be the function that on two inputs \(x,y\) outputs \(y\), and \(1\) to be the function that on two inputs \(x,y\) outputs \(x\). Of course we use Currying to achieve the effect of two-input functions and hence \(0 = \lambda x. \lambda y.y\) and \(1 = \lambda x.\lambda y.x\).15

  • The above implementation makes the \(\ensuremath{\mathit{IF}}\) function trivial: \(\ensuremath{\mathit{IF}}(cond,a,b)\) is simply \(cond \; a\; b\) since \(0ab = b\) and \(1ab = a\). We can write \(\ensuremath{\mathit{IF}} = \lambda x.x\) to achieve \(\ensuremath{\mathit{IF}}(cond,a,b) = (((\ensuremath{\mathit{IF}} cond) a) b) = cond \; a \; b\).

  • To encode a pair \((x,y)\) we will produce a function \(f_{x,y}\) that has \(x\) and \(y\) “in its belly” and satisfies \(f_{x,y}g = g x y\) for every function \(g\). That is, we write \(\ensuremath{\mathit{PAIR}} = \lambda x,y. \lambda g. gxy\). Note that now we can extract the first element of a pair \(p\) by writing \(p1\) and the second element by writing \(p0\), and so \(\ensuremath{\mathit{HEAD}} = \lambda p. p1\) and \(\ensuremath{\mathit{TAIL}} = \lambda p. p0\).

  • We define \(\ensuremath{\mathit{NIL}}\) to be the function that ignores its input and always outputs \(1\). That is, \(\ensuremath{\mathit{NIL}} = \lambda x.1\). The \(\ensuremath{\mathit{ISEMPTY}}\) function checks, given an input \(p\), whether we get \(1\) if we apply \(p\) to the function \(z = \lambda x,y.0\) that ignores both its inputs and always outputs \(0\). For every valid pair of the form \(p = \ensuremath{\mathit{PAIR}}\; x\; y\), \(p z = z x y = 0\) while \(\ensuremath{\mathit{NIL}} z=1\). Formally, \(\ensuremath{\mathit{ISEMPTY}} = \lambda p. p (\lambda x,y.0)\).
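These encodings can be written down almost verbatim with Python's `lambda`, using one-argument functions throughout to reflect Currying. This is a sketch for experimentation; the helper `to_bit`, which converts an encoded Boolean back to a Python integer, is our own addition and not part of the λ calculus.

```python
ZERO = lambda x: lambda y: y                 # 0 outputs its second input
ONE  = lambda x: lambda y: x                 # 1 outputs its first input
IF   = lambda cond: cond                     # IF cond a b = cond a b

PAIR = lambda x: lambda y: lambda g: g(x)(y) # a pair holds x and y "in its belly"
HEAD = lambda p: p(ONE)
TAIL = lambda p: p(ZERO)

NIL     = lambda x: ONE                      # ignores its input, outputs 1
ISEMPTY = lambda p: p(lambda x: lambda y: ZERO)

to_bit = lambda b: b(1)(0)                   # decoding helper for printing only

p = PAIR(ONE)(ZERO)
print(to_bit(HEAD(p)), to_bit(TAIL(p)))            # 1 0
print(to_bit(ISEMPTY(NIL)), to_bit(ISEMPTY(p)))    # 1 0
print(IF(ONE)("a")("b"), IF(ZERO)("a")("b"))       # a b
```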

There is nothing special about Boolean values. You can use similar tricks to implement natural numbers using λ terms. The standard way to do so is to represent the number \(n\) by the function \(\ensuremath{\mathit{ITER}}_n\) that on input a function \(f\) outputs the function \(x \mapsto f(f(\cdots f(x)))\) (\(n\) times). That is, we represent the natural number \(1\) as \(\lambda f.f\), the number \(2\) as \(\lambda f.(\lambda x.f(fx))\), the number \(3\) as \(\lambda f.(\lambda x.f(f(fx)))\), and so on and so forth. (Note that this is not the same representation we used for \(1\) in the Boolean context: this is fine; we already know that the same object can be represented in more than one way.) The number \(0\) is represented by the function that maps any function \(f\) to the identity function \(\lambda x.x\). (That is, \(0 = \lambda f.(\lambda x.x)\).)

In this representation, we can compute \(\ensuremath{\mathit{PLUS}}(n,m)\) as \(\lambda f.\lambda x.(n f)((m f)x)\) and \(\ensuremath{\mathit{TIMES}}(n,m)\) as \(\lambda f.n(m f)\). Subtraction and division are trickier, but can be achieved using recursion. (Working this out is a great exercise.)
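The numeral encoding and the two arithmetic expressions above can be tried out in Python. In this sketch the helper `to_int`, which counts how many times a numeral applies its function, is our own addition for inspecting results.

```python
ZERO  = lambda f: lambda x: x                # maps f to the identity
ONE   = lambda f: f                          # applies f once
TWO   = lambda f: lambda x: f(f(x))
THREE = lambda f: lambda x: f(f(f(x)))

PLUS  = lambda n, m: lambda f: lambda x: n(f)(m(f)(x))
TIMES = lambda n, m: lambda f: n(m(f))

# Decode a numeral by applying "add one" to 0 the encoded number of times.
to_int = lambda n: n(lambda k: k + 1)(0)

print(to_int(PLUS(TWO, THREE)))     # 5
print(to_int(TIMES(TWO, THREE)))    # 6
print(to_int(TIMES(ZERO, THREE)))   # 0
```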

List processing

Now we come to a bigger hurdle, which is how to implement \(\ensuremath{\mathit{MAP}}\), \(\ensuremath{\mathit{FILTER}}\), and \(\ensuremath{\mathit{REDUCE}}\) in the λ calculus. It turns out that we can build \(\ensuremath{\mathit{MAP}}\) and \(\ensuremath{\mathit{FILTER}}\) from \(\ensuremath{\mathit{REDUCE}}\). For example \(\ensuremath{\mathit{MAP}}(L,f)\) is the same as \(\ensuremath{\mathit{REDUCE}}(L,g)\) where \(g\) is the operation that on input \(x\) and \(y\), outputs \(\ensuremath{\mathit{PAIR}}(f(x),\ensuremath{\mathit{NIL}})\) if \(y\) is NIL and otherwise outputs \(\ensuremath{\mathit{PAIR}}(f(x),y)\). (I leave checking this as a (recommended!) exercise for you, the reader.) So, it all boils down to implementing \(\ensuremath{\mathit{REDUCE}}\). We can define \(\ensuremath{\mathit{REDUCE}}(L,g)\) recursively, by setting \(\ensuremath{\mathit{REDUCE}}(\ensuremath{\mathit{NIL}},g)=\ensuremath{\mathit{NIL}}\) and stipulating that given a non-empty list \(L\), which we can think of as a pair \((head,rest)\), \(\ensuremath{\mathit{REDUCE}}(L,g) = g(head, \ensuremath{\mathit{REDUCE}}(rest,g))\). Thus, we might try to write a λ expression for \(\ensuremath{\mathit{REDUCE}}\) as follows

\[ \ensuremath{\mathit{REDUCE}} = \lambda L,g. \ensuremath{\mathit{IF}}(\ensuremath{\mathit{ISEMPTY}}(L),\ensuremath{\mathit{NIL}},g \ensuremath{\mathit{HEAD}}(L) \ensuremath{\mathit{REDUCE}}(\ensuremath{\mathit{TAIL}}(L),g)) \;\;(7.17) \;. \]

The only fly in this ointment is that the λ calculus does not have the notion of recursion, and so this is an invalid definition. But of course we can use our \(\ensuremath{\mathit{RECURSE}}\) operator to solve this problem. We will replace the recursive call to “\(\ensuremath{\mathit{REDUCE}}\)” with a call to a function \(me\) that is given as an extra argument, and then apply \(\ensuremath{\mathit{RECURSE}}\) to this. Thus \(\ensuremath{\mathit{REDUCE}} = \ensuremath{\mathit{RECURSE}}\;myREDUCE\) where

\[ myREDUCE = \lambda me,L,g. \ensuremath{\mathit{IF}}(\ensuremath{\mathit{ISEMPTY}}(L),\ensuremath{\mathit{NIL}},g \ensuremath{\mathit{HEAD}}(L) me(\ensuremath{\mathit{TAIL}}(L),g)) \;\;(7.18) \;. \]

So everything boils down to implementing the \(\ensuremath{\mathit{RECURSE}}\) operator, which we now deal with.
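Before turning to \(\ensuremath{\mathit{RECURSE}}\), the claim that \(\ensuremath{\mathit{MAP}}\) can be built from the two-argument \(\ensuremath{\mathit{REDUCE}}\) of this section can be checked in Python, where named recursion is available. The sketch assumes tuples for pairs and `None` for \(\ensuremath{\mathit{NIL}}\), which are our own choices.

```python
NIL  = None                        # assumed stand-in for the empty list
PAIR = lambda x, y: (x, y)
HEAD = lambda p: p[0]
TAIL = lambda p: p[1]

def REDUCE(L, g):
    # REDUCE(NIL, g) = NIL, and REDUCE(L, g) = g(head, REDUCE(rest, g)) otherwise.
    return NIL if L is NIL else g(HEAD(L), REDUCE(TAIL(L), g))

def MAP(L, f):
    # g rebuilds the list from the tail up, applying f to each head.
    g = lambda x, y: PAIR(f(x), NIL) if y is NIL else PAIR(f(x), y)
    return REDUCE(L, g)

L = PAIR(1, PAIR(2, PAIR(3, NIL)))
print(MAP(L, lambda x: x * x))     # (1, (4, (9, None)))
```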

The Y combinator, or recursion without recursion

How can we implement recursion without recursion? We will illustrate this using a simple example - the \(\ensuremath{\mathit{XOR}}\) function. As shown in Example 7.10, we can write the \(\ensuremath{\mathit{XOR}}\) function of a list recursively as follows:

\[ \ensuremath{\mathit{XOR}}(L) = \begin{cases} 0 & L \text{ is empty} \\ \ensuremath{\mathit{XOR}}_2(\ensuremath{\mathit{HEAD}}(L),\ensuremath{\mathit{XOR}}(\ensuremath{\mathit{TAIL}}(L))) & \text{otherwise} \end{cases} \]

where \(\ensuremath{\mathit{XOR}}_2:\{0,1\}^2 \rightarrow \{0,1\}\) is the XOR on two bits. In Python we would write this as
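A minimal sketch of such a definition follows; the helper names `xor2`, `head` and `tail` are chosen here for illustration.

```python
def xor2(a, b): return 1 - b if a else b   # XOR of two bits
def head(L): return L[0]                    # first element of the list
def tail(L): return L[1:]                   # everything except the first element

def xor(L):
    # 0 on the empty list, otherwise XOR the head with the XOR of the tail.
    return xor2(head(L), xor(tail(L))) if L else 0

print(xor([0, 1, 1, 0, 0, 1]))  # 1
```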

Now, how could we eliminate this recursive call? The main idea is that since functions can take other functions as input, it is perfectly legal in Python (and the λ calculus of course) to give a function itself as input. So, our idea is to try to come up with a non recursive function tempxor that takes two inputs: a function and a list, and such that tempxor(tempxor,L) will output the XOR of L!

At this point you might want to stop and try to implement this on your own in Python or any other programming language of your choice (as long as it allows functions as inputs).

Our first attempt might be to simply use the idea of replacing the recursive call by me. Let’s define this function as myxor

Let’s test this out:
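Here is a sketch of the definition together with the test, repeating the helpers so that the snippet runs on its own (the names `xor2`, `head` and `tail` are illustrative choices). The failing call is wrapped in `try`/`except` so that the complaint can be inspected.

```python
def xor2(a, b): return 1 - b if a else b
def head(L): return L[0]
def tail(L): return L[1:]

def myxor(me, L):
    # The recursive call to xor is replaced by a call to the parameter me.
    return xor2(head(L), me(tail(L))) if L else 0

try:
    myxor(myxor, [1, 0, 1])
    error = None
except TypeError as e:
    error = e

print(type(error).__name__)  # TypeError
```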

If you do this, you will get the following complaint from the interpreter: TypeError: myxor() missing 1 required positional argument. The problem is that myxor expects two inputs- a function and a list- while in the call to me we only provided a list. To correct this, we modify the call to also provide the function itself:

Note the call me(me,..) in the definition of tempxor: given a function me as input, tempxor will actually call the function me with itself as the first input. If we test this out now, we see that we actually get the right result!
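A sketch of the corrected definition and the test, again with the illustrative helpers `xor2`, `head` and `tail` repeated so the snippet is self-contained:

```python
def xor2(a, b): return 1 - b if a else b
def head(L): return L[0]
def tail(L): return L[1:]

def tempxor(me, L):
    # me is called with itself as its first argument.
    return xor2(head(L), me(me, tail(L))) if L else 0

print(tempxor(tempxor, [1, 0, 1]))     # 0
print(tempxor(tempxor, [1, 0, 1, 1]))  # 1
```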

and so we can define xor(L) as simply return tempxor(tempxor,L).

The approach above is not specific to XOR. Given a recursive function f that takes an input x, we can obtain a non recursive version as follows:

  1. Create the function myf that takes a pair of inputs me and x, and replaces recursive calls to f with calls to me.

  2. Create the function tempf that converts calls in myf of the form me(x) to calls of the form me(me,x).

  3. The function f(x) will be defined as tempf(tempf,x)

Here is the way we implement the RECURSE operator in Python. It will take a function myf as above, and replace it with a function g such that g(x)=myf(g,x) for every x.
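The following sketch follows the three steps listed above. The `myxor` function and its helpers are repeated for self-containment and use illustrative names.

```python
def RECURSE(myf):
    # tempf passes myf a one-argument version of "me" that re-supplies me itself.
    def tempf(me, x):
        return myf(lambda y: me(me, y), x)
    return lambda x: tempf(tempf, x)

def xor2(a, b): return 1 - b if a else b
def head(L): return L[0]
def tail(L): return L[1:]

def myxor(me, L):
    return xor2(head(L), me(tail(L))) if L else 0

xor = RECURSE(myxor)

print(xor([0, 1, 1, 0, 0, 1]))        # 1
print(xor([1, 1, 0, 0, 1, 1, 1, 1]))  # 0
```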

From Python to the λ calculus. In the λ calculus, a two input function \(g\) that takes a pair of inputs \(me,y\) is written as \(\lambda me.(\lambda y. g)\). So the function \(y \mapsto me(me,y)\) is simply written as \(me\;me\) and similarly the function \(x \mapsto tempf(tempf,x)\) is simply \(tempf\; tempf\). (Can you see why?) Therefore in the λ calculus, the function tempf is \(\lambda me. f(me\; me)\), and similarly the expression lambda x: tempf(tempf,x) is the same as \(tempf\; tempf\). Therefore, if we denote the input of RECURSE by \(f\), then the output is equal to \(F F\) where \(F = \lambda m. f (m m)\), which means that \[ \ensuremath{\mathit{RECURSE}} = \lambda f.\bigl( (\lambda m. f(m\; m))\;\; (\lambda m. f(m \;m)) \bigr) \]
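Transcribing this expression literally into Python would not terminate, because Python evaluates the argument \(m\;m\) before calling \(f\) (call by value). A common workaround, sketched below, is to delay that evaluation by writing `lambda y: m(m)(y)` in place of `m(m)`; this variant is often called the Z combinator. The curried `myxor` here is our own illustrative example and uses Python's `^` operator for the XOR of two bits.

```python
# Eta-expanded variant of RECURSE that works under Python's eager evaluation.
RECURSE = lambda f: (lambda m: f(lambda y: m(m)(y)))(lambda m: f(lambda y: m(m)(y)))

# Curried, non-recursive XOR: takes me, then the list L.
myxor = lambda me: lambda L: (L[0] ^ me(L[1:])) if L else 0

xor = RECURSE(myxor)
print(xor([1, 0, 1]))      # 0
print(xor([1, 0, 1, 1]))   # 1
```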

The online appendix contains an implementation of the λ calculus using Python. Here is an implementation of the recursive XOR function from that appendix:16

The \(\ensuremath{\mathit{RECURSE}}\) operator above is better known as the Y combinator.

It is one of a family of fixed point operators that, given a lambda expression \(F\), find a fixed point \(f\) of \(F\) such that \(f = F f\). If you think about it, \(\ensuremath{\mathit{XOR}}\) is the fixed point of \(myXOR\) above. \(\ensuremath{\mathit{XOR}}\) is the function such that if we plug in \(\ensuremath{\mathit{XOR}}\) as the first argument of \(myXOR\) then we get back \(\ensuremath{\mathit{XOR}}\), or in other words \(\ensuremath{\mathit{XOR}} = myXOR\; \ensuremath{\mathit{XOR}}\). Hence finding a fixed point for \(myXOR\) is the same as applying \(\ensuremath{\mathit{RECURSE}}\) to it.

Infinite loops in the λ calculus

The fact that λ-expressions can simulate NAND-TM programs means that, like Turing machines and NAND-TM programs, the λ calculus can also enter into an infinite loop. For example, consider the λ expression

\[ (\lambda x.xxx)(\lambda x.xxx) \]

If we try to evaluate it then the first step is to invoke the lefthand function on the righthand one and then obtain

\[ (\lambda x.xxx)(\lambda x.xxx)(\lambda x.xxx) \]

To evaluate this, the next step would be to apply the second term on the third term,17 which would result in

\[ (\lambda x.xxx)(\lambda x.xxx)(\lambda x.xxx)(\lambda x.xxx) \]

We can see that continuing in this way we get longer and longer expressions, and this process never concludes.
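A rough Python analogue of this non-terminating expression is sketched below. Python evaluates eagerly and bounds the depth of its call stack, so instead of an ever-growing expression we observe a `RecursionError`; the name `w` is our own.

```python
# w corresponds to the lambda expression that maps x to x x x.
w = lambda x: x(x)(x)

try:
    w(w)                       # evaluating w(w) requires evaluating w(w) again
    outcome = "terminated"
except RecursionError:
    outcome = "RecursionError"

print(outcome)  # RecursionError
```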

More Turing-complete computational models

There is a great variety of models that are computationally equivalent to Turing machines (and hence to NAND-TM/NAND-RAM programs). We briefly mention a few examples.

Parallel algorithms and cloud computing

The models of computation we considered so far are inherently sequential, but these days much computation happens in parallel, whether using multi-core processors or in massively parallel distributed computation in data centers or over the Internet. Parallel computing is important in practice, but it does not really make much difference for the question of what can and can’t be computed. After all, if a computation can be performed using \(m\) machines in \(t\) time, then it can be computed by a single machine in time \(mt\).

Game of life, tiling and cellular automata

Many physical systems can be described as consisting of a large number of elementary components that interact with one another. One way to model such systems is using cellular automata. This is a system that consists of a large (or even infinite) number of cells. Each cell only has a constant number of possible states. At each time step, a cell updates to a new state by applying some simple rule to the state of itself and its neighbors.

A canonical example of a cellular automaton is Conway’s Game of Life. In this automaton the cells are arranged in an infinite two dimensional grid. Each cell has only two states: “dead” (which we can encode as \(0\) and identify with \(\varnothing\)) or “alive” (which we can encode as \(1\)). The next state of a cell depends on its previous state and the states of its 8 vertical, horizontal and diagonal neighbors. A dead cell becomes alive only if exactly three of its neighbors are alive. A live cell continues to live if it has two or three live neighbors. Even though the number of cells is potentially infinite, we can have a finite encoding for the state by only keeping track of the live cells. If we initialize the system in a configuration with a finite number of live cells, then the number of live cells will stay finite in all future steps.
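The finite encoding mentioned above, which keeps track only of the live cells, leads to a short Python sketch of one step of the Game of Life. The function name `life_step` and the choice of representing a configuration as a set of `(row, column)` coordinates are our own assumptions.

```python
from collections import Counter

def life_step(live):
    """Given the set of live cells, return the set of live cells one step later."""
    # For every cell adjacent to at least one live cell, count its live neighbors.
    counts = Counter(
        (r + dr, c + dc)
        for (r, c) in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth with exactly three live neighbors; survival with two or three.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

blinker = {(0, -1), (0, 0), (0, 1)}          # three live cells in a row
print(sorted(life_step(blinker)))             # [(-1, 0), (0, 0), (1, 0)]
print(life_step(life_step(blinker)) == blinker)  # True
```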

We can think of such a system as encoding a computation by starting it in some initial configuration, and then defining some halting condition (e.g., we halt if the cell at position \((0,0)\) becomes dead) and some way to define an output (e.g., we output the state of the cell at position \((1,1)\)). Clearly, given any starting configuration \(x\), we can simulate the game of life starting from \(x\) using a NAND-RAM (or NAND-TM) program, and hence every “Game-of-Life computable” function is computable by a NAND-RAM program. Surprisingly, it turns out that the other direction is true as well: as simple as its rules seem, we can simulate a Turing machine using the game of life (see Figure 7.9). The Wikipedia page for the Game of Life contains some beautiful figures and animations of configurations that produce some very interesting evolutions. See also the book The Nature of Computation.
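To make the "easy" direction concrete, here is a minimal Python sketch of a single Game of Life step that uses the finite encoding described above, namely the set of live cells. The function name `life_step` and the `blinker` example are choices made for this illustration.

```python
from collections import Counter


def life_step(live):
    """One step of Conway's Game of Life.

    `live` is a set of (x, y) coordinates of live cells; every other cell
    of the infinite grid is dead.
    """
    # For every cell adjacent to a live cell, count its live neighbors.
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly three live neighbors; survival on two or three.
    return {
        cell
        for cell, count in neighbor_counts.items()
        if count == 3 or (count == 2 and cell in live)
    }


blinker = {(0, 1), (1, 1), (2, 1)}   # three live cells in a row
after_one = life_step(blinker)
after_two = life_step(after_one)
print(sorted(after_one))
```

Only cells adjacent to a live cell can be alive in the next step, so the loop touches finitely many cells even though the grid is infinite.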

A Game-of-Life configuration simulating a Turing Machine. Figure by Paul Rendell.

Configurations of Turing machines and one dimensional cellular automata

It turns out that even one dimensional cellular automata can be Turing complete (see Figure 7.11). In a one dimensional automaton, the cells are laid out in one infinitely long line. The next state of each cell is only a function of its past state and the states of its two neighbors.

Let \(\Sigma\) be a finite set containing the symbol \(\varnothing\). A one dimensional cellular automaton over alphabet \(\Sigma\) is described by a transition rule \(r:\Sigma^3 \rightarrow \Sigma\), which satisfies \(r(\varnothing,\varnothing,\varnothing) = \varnothing\).

A configuration of the automaton is specified by a string \(\alpha \in \Sigma^*\). We can also think of \(\alpha\) as the infinite sequence \((\alpha_0,\alpha_1,\ldots,\alpha_{n-1},\varnothing,\varnothing,\varnothing,\ldots)\), where \(n=|\alpha|\). If \(\alpha\) is a configuration and \(r\) is a transition rule, then the next step configuration, denoted by \(\alpha' = \ensuremath{\mathit{NEXT}}_r(\alpha)\), is defined as follows: \[\alpha'_i = r(\alpha_{i-1},\alpha_i,\alpha_{i+1})\] for \(i=0,\ldots,n\). If \(j\) is smaller than \(0\) or larger than \(n-1\) then we set \(\alpha_j = \varnothing\).

In other words, the next state of the automaton at point \(i\) is obtained by applying the rule \(r\) to the values of \(\alpha\) at \(i\) and its two neighbors.
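The definition above can be sketched in a few lines of Python. The blank symbol is represented here by the string `"_"`, and the toy rule `shift_right` is hypothetical; both are assumptions of this sketch rather than part of the definition.

```python
BLANK = "_"  # stands in for the blank symbol (an assumption of this sketch)


def next_config(rule, alpha):
    """Apply a one dimensional transition rule once.

    `rule` maps a triple (left, center, right) to the new center symbol and
    `alpha` is a list of symbols. Positions outside `alpha` are treated as
    blank, and the result covers positions 0..n where n = len(alpha).
    """
    n = len(alpha)

    def at(j):
        return alpha[j] if 0 <= j < n else BLANK

    return [rule(at(i - 1), at(i), at(i + 1)) for i in range(n + 1)]


def shift_right(left, center, right):
    """Hypothetical toy rule: every cell copies its left neighbor."""
    return left


start = ["a", "b", "c"]
one = next_config(shift_right, start)
two = next_config(shift_right, one)
print(one, two)
```

Note that `shift_right` maps three blanks to a blank, as the definition requires of every transition rule.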

For every Turing machine \(M\), there is a one dimensional cellular automaton that can simulate \(M\) on every input \(x\).

To make this more precise, what Theorem 7.18 says is that for every Turing machine \(M\), there is a one-dimensional cellular automaton \(\mathcal{A}\) over some alphabet \(\Sigma\), such that:

  • There are computable maps to encode and decode configurations of \(M\) as configurations of \(\mathcal{A}\).

  • If we initialize \(\mathcal{A}\) in the configuration encoding the starting state of \(M\), then at every step \(t\), the configuration of \(\mathcal{A}\) at step \(t\) encodes the configuration of \(M\) at the same step.

A configuration of \(M\) contains its full state after a particular iteration. That is, the contents of all (non empty) cells of its tape, its current state, as well as the head position. We can encode such a configuration as a string \(\alpha\) over some large alphabet \(\Sigma\). At position \(j\), the symbol \(\alpha_j\) will encode the value of the \(j\)-th symbol in \(M\)’s tape. If the head position is \(i\) then \(\alpha_i\) will encode this fact as well, and also contain an encoding of which state the machine is in.

Given this notion of an encoding, and the fact that the head moves only one position in each step, we can see that after one step of the machine \(M\), the configuration largely stays the same except at the locations \(i,i-1,i+1\) corresponding to the current head position \(i\) and its immediate neighbors. Once we realize this, we can phrase the progression from one configuration to the next as a one dimensional cellular automaton! From this observation, Theorem 7.18 follows in a fairly straightforward manner.

Before proving Theorem 7.18, let us formally define the notion of a configuration of a Turing machine (see also Figure 7.10). We will come back to this notion in later chapters as well.

A configuration of a Turing machine \(M\) with alphabet \(\Sigma\) and state space \([k]\) encodes the state of \(M\) at a particular step in its execution as a string \(\alpha\) over the alphabet \(\overline{\Sigma} = \Sigma \times (\{\cdot \} \cup [k])\). The string is of length \(t\) where \(t\) is such that \(M\)’s tape contains \(\varnothing\) in all positions \(t\) and larger and \(M\)’s head is in a position smaller than \(t\). If \(M\)’s head is in the \(i\)-th position, then for \(j \neq i\), \(\alpha_j\) encodes the value of the \(j\)-th cell of \(M\)’s tape, while \(\alpha_i\) encodes both this value as well as the current state of \(M\). If the machine writes the value \(\tau\), changes state to \(s'\), and moves right, then the next configuration will contain at position \(i\) the value \((\tau,\cdot)\) and at position \(i+1\) the value \((\sigma,s')\), where \(\sigma\) is the tape symbol at position \(i+1\).

Definition 7.19 below has some technical details, but is not actually that deep or complicated. You would probably understand it better if before starting to read it, you take a moment to stop and think how you would encode as a string the state of a Turing machine at a given point in an execution.

Think what are all the components that you need to know in order to be able to continue the execution from this point onwards, and what is a simple way to encode them using a list of strings (which in turn can be encoded as a string). In particular, with an eye towards our future applications, try to think of an encoding which will make it as simple as possible to map a configuration at step \(t\) to the configuration at step \(t+1\).

Let \(M\) be a Turing machine with tape alphabet \(\Sigma\) and state space \([k]\). A configuration of \(M\) is a string \(\alpha \in \overline{\Sigma}^*\) where \(\overline{\Sigma} = \Sigma \times \left( \{\cdot\} \cup [k] \right)\) that satisfies that there is exactly one coordinate \(i\) for which \(\alpha_i = (\sigma,s)\) for some \(\sigma \in \Sigma\) and \(s\in [k]\). For all other coordinates \(j\), \(\alpha_j = (\sigma',\cdot)\) for some \(\sigma'\in \Sigma\).

A configuration \(\alpha \in \overline{\Sigma}^*\) of \(M\) corresponds to the following state of its execution:

  • \(M\)’s tape contains \(\alpha_{j,0}\) for all \(j<|\alpha|\) and contains \(\varnothing\) for all positions that are at least \(|\alpha|\), where we let \(\alpha_{j,0}\) be the value \(\sigma\) such that \(\alpha_j = (\sigma,t)\) with \(\sigma \in \Sigma\) and \(t \in \{\cdot \} \cup [k]\). (In other words, since \(\alpha_j\) is a pair of an alphabet symbol \(\sigma\) and either a state in \([k]\) or the symbol \(\cdot\), \(\alpha_{j,0}\) is the first component \(\sigma\) of this pair.)

  • \(M\)’s head is in the unique position \(i\) for which \(\alpha_i\) has the form \((\sigma,s)\) for \(s\in [k]\), and \(M\)’s state is equal to \(s\).
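As a minimal sketch of this definition, the Python code below represents a configuration as a list of pairs, with `None` standing in for the symbol \(\cdot\). The transition table format (a dictionary from `(state, symbol)` to `(new state, written symbol, move)`), the convention that moving left at position \(0\) stays put, and the toy machine `flip` are all assumptions made for illustration.

```python
BLANK = "_"     # stands in for the blank symbol
NO_HEAD = None  # stands in for the "." marker on cells without the head


def make_config(tape, head, state):
    """Encode tape contents, head position and state as a list of pairs."""
    return [(sym, state if j == head else NO_HEAD) for j, sym in enumerate(tape)]


def decode_config(alpha):
    """Recover (tape, head position, state) from a configuration."""
    heads = [j for j, (_, s) in enumerate(alpha) if s is not NO_HEAD]
    assert len(heads) == 1, "exactly one coordinate must carry the state"
    head = heads[0]
    return [sym for sym, _ in alpha], head, alpha[head][1]


def next_config(delta, alpha):
    """Map a configuration to the configuration after one machine step."""
    tape, head, state = decode_config(alpha)
    new_state, written, move = delta[(state, tape[head])]
    tape[head] = written
    head = max(head + {"L": -1, "R": 1, "S": 0}[move], 0)
    if head == len(tape):
        tape.append(BLANK)  # extend the string so the head stays inside it
    return make_config(tape, head, new_state)


# Hypothetical toy machine: flip bits while moving right, stop on a blank.
flip = {
    (0, "0"): (0, "1", "R"),
    (0, "1"): (0, "0", "R"),
    (0, BLANK): (1, BLANK, "S"),
}

c0 = make_config(["0", "1"], head=0, state=0)
c1 = next_config(flip, c0)
c2 = next_config(flip, c1)
c3 = next_config(flip, c2)
print(c3)
```

Comparing `c0`, `c1`, `c2` and `c3` coordinate by coordinate shows that each step changes the string only at the head position and its immediate neighbors.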

Definition 7.19 is a little cumbersome, but ultimately a configuration is simply a string that encodes a snapshot of the state of the Turing machine at a given point in the execution. (In operating-systems lingo, it would be a “core dump”.) Such a snapshot needs to encode the following components:

  1. The current head position.

  2. The full contents of the large scale memory, that is the tape.

  3. The contents of the “local registers”, that is the state of the machine.

The precise details of how we encode a configuration are not important, but we do want to record the following simple fact:

Let \(M\) be a Turing machine and let \(\ensuremath{\mathit{NEXT}}_M:\overline{\Sigma}^* \rightarrow \overline{\Sigma}^*\) be the function that maps a configuration of \(M\) to the configuration at the next step of the execution. Then for every \(i \in \N\), the value of \(\ensuremath{\mathit{NEXT}}_M(\alpha)_i\) only depends on the coordinates \(\alpha_{i-1},\alpha_i,\alpha_{i+1}\).18

We leave proving Lemma 7.20 as Exercise 7.4. It is not a hard exercise, but doing it is a great way to ensure that you are comfortable with the definition of configurations.

Once we have Lemma 7.20 in place, we see that the function \(\ensuremath{\mathit{NEXT}}_M\) that maps a configuration of \(M\) into the next one is in fact a valid rule for a one dimensional automaton, hence completing the proof of Theorem 7.18. The automaton arising from the proof of Theorem 7.18 has a large alphabet, and furthermore one whose size depends on the machine \(M\) that is being simulated. It turns out that one can obtain an automaton with an alphabet of fixed size that is independent of the program being simulated, and in fact the alphabet of the automaton can be the minimal set \(\{0,1\}\)! See Figure 7.11 for an example of such a Turing-complete automaton.
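For a concrete binary-alphabet automaton, here is a short sketch of the “Rule 110” automaton. The rule table is the standard one, read off from the binary expansion of the number \(110\). The finite window that treats all cells outside it as \(0\) is a simplification made for this sketch: it is exact only as long as the pattern, which grows one cell to the left per step, has not reached the left edge of the window.

```python
def rule_110(left, center, right):
    """Rule 110: the new state is bit number 4*left + 2*center + right of 110."""
    return (110 >> (4 * left + 2 * center + right)) & 1


def step(cells):
    """One step on a finite window; cells beyond the window are treated as 0."""
    padded = [0] + cells + [0]
    return [
        rule_110(padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


# Start with a single live cell at the right end of a window of width 8.
rows = [[0, 0, 0, 0, 0, 0, 0, 1]]
for _ in range(4):
    rows.append(step(rows[-1]))

for row in rows:
    print("".join(str(bit) for bit in row))
```

Each printed row is one configuration, with time running from top to bottom.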

Evolution of a one dimensional automaton. Each row in the figure corresponds to a configuration. The initial configuration corresponds to the top row and contains only a single “live” cell. This figure corresponds to the “Rule 110” automaton of Stephen Wolfram, which is Turing complete. Figure taken from Wolfram MathWorld.

We can represent a configuration \(\alpha \in \overline{\Sigma}^*\) as a binary string by simply encoding each coordinate of \(\alpha\) using \(\lceil \log |\overline{\Sigma}| \rceil\) bits. When we refer to a configuration as a binary string (for example when feeding it as input to other programs) we will assume that this string represents the configuration via the above encoding.
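The fixed-width encoding can be sketched as follows. The ordering of the alphabet (and hence which bit pattern each symbol gets) is an arbitrary choice of this sketch, as are the names `to_bits` and `from_bits` and the use of `None` for the symbol \(\cdot\).

```python
def bits_per_symbol(alphabet):
    """Smallest width w such that 2**w >= len(alphabet), and at least 1."""
    return max(1, (len(alphabet) - 1).bit_length())


def to_bits(alpha, alphabet):
    """Encode each coordinate of `alpha` by its index in `alphabet`, in binary."""
    width = bits_per_symbol(alphabet)
    index = {symbol: k for k, symbol in enumerate(alphabet)}
    return "".join(format(index[symbol], f"0{width}b") for symbol in alpha)


def from_bits(bits, alphabet):
    """Invert `to_bits` by reading the string in blocks of fixed width."""
    width = bits_per_symbol(alphabet)
    return [alphabet[int(bits[p:p + width], 2)] for p in range(0, len(bits), width)]


# Extended alphabet for tape symbols {0, 1, blank} and two states {0, 1};
# None plays the role of the "." marker.
alphabet = [(sym, q) for sym in "01_" for q in (None, 0, 1)]
alpha = [("1", None), ("0", None), ("_", 1)]
bits = to_bits(alpha, alphabet)
print(bits)
```

Because every coordinate uses the same number of bits, decoding needs no separators between symbols.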

We can use the same approach to define configurations of a NAND-TM program. Such a configuration will need to encode:

  1. The current value of the variable i.

  2. For every scalar variable foo, the value of foo.

  3. For every array variable Bar, the value Bar[\(j\)] for every \(j \in \{0,\ldots, t-1\}\) where \(t-1\) is the largest value that the index variable i ever achieved in the computation.

Turing completeness and equivalence, a formal definition (optional)

A computational model is some way to define what it means for a program (which is represented by a string) to compute a (partial) function. A computational model \(\mathcal{M}\) is Turing complete, if we can map every Turing machine (or equivalently NAND-TM program) \(N\) into a program \(P\) for \(\mathcal{M}\) that computes the same function as \(N\). It is Turing equivalent if the other direction holds as well (i.e., we can map every program in \(\mathcal{M}\) to a Turing machine that computes the same function). Formally, we can define this notion as follows:19

Let \(\mathcal{F}\) be the set of all partial functions from \(\{0,1\}^*\) to \(\{0,1\}^*\). A computational model is a map \(\mathcal{M}:\{0,1\}^* \rightarrow \mathcal{F}\).

We say that a program \(P \in \{0,1\}^*\) \(\mathcal{M}\)-computes a function \(F\in \mathcal{F}\) if \(\mathcal{M}(P) = F\).

A computational model \(\mathcal{M}\) is Turing complete if there is a computable map \(\ensuremath{\mathit{ENCODE}}_{\mathcal{M}}:\{0,1\}^* \rightarrow \{0,1\}^*\) such that for every Turing machine \(N\) (represented as a string), \(\mathcal{M}(\ensuremath{\mathit{ENCODE}}_{\mathcal{M}}(N))\) is equal to the partial function computed by \(N\).

A computational model \(\mathcal{M}\) is Turing equivalent if it is Turing complete and there exists a computable map \(\ensuremath{\mathit{DECODE}}_{\mathcal{M}}:\{0,1\}^* \rightarrow \{0,1\}^*\) such that for every string \(P\in \{0,1\}^*\), \(N=\ensuremath{\mathit{DECODE}}_{\mathcal{M}}(P)\) is a string representation of a Turing machine that computes the function \(\mathcal{M}(P)\).

Some examples of Turing equivalent models include:

  • Turing machines
  • NAND-TM programs
  • NAND-RAM programs
  • λ calculus
  • Game of life (mapping programs and inputs/outputs to starting and ending configurations)
  • Programming languages such as Python, C, JavaScript, Lisp, OCaml… (allowing for unbounded storage)

The Church-Turing Thesis (discussion)

“[In 1934], Church had been speculating, and finally definitely proposed, that the λ-definable functions are all the effectively calculable functions …. When Church proposed this thesis, I sat down to disprove it … but, quickly realizing that [my approach failed], I became overnight a supporter of the thesis.”, Stephen Kleene, 1979.

“[The thesis is] not so much a definition or to an axiom but … a natural law.”, Emil Post, 1936.

We have defined functions to be computable if they can be computed by a NAND-TM program, and we’ve seen that the definition would remain the same if we replaced NAND-TM programs by Python programs, Turing machines, λ calculus, cellular automata, and many other computational models. The Church-Turing thesis is that this is the only sensible definition of “computable” functions. Unlike the “Physical Extended Church-Turing Thesis” (PECTT) which we saw before, the Church-Turing thesis does not make a concrete physical prediction that can be experimentally tested, but it certainly motivates predictions such as the PECTT. One can think of the Church-Turing thesis as either advocating a definitional choice, making some prediction about all potential computing devices, or suggesting some laws of nature that constrain the natural world. In Scott Aaronson’s words, “whatever it is, the Church-Turing thesis can only be regarded as extremely successful”. No candidate computing device (including quantum computers, and also much less reasonable models such as the hypothetical “closed timelike curve” computers we mentioned before) has so far mounted a serious challenge to the Church-Turing thesis. These devices might potentially make some computations more efficient, but they do not change the difference between what is finitely computable and what is not.20

Different models of computation

We can summarize the models we have seen in the following table:

Different models for computing finite functions and functions with arbitrary input length.

| Computational problems | Type of model | Examples |
|---|---|---|
| Finite functions \(f:\{0,1\}^n \rightarrow \{0,1\}^m\) | Non uniform computation (algorithm depends on input length) | Boolean circuits, NAND circuits, straight-line programs (e.g., NAND-CIRC) |
| Functions with unbounded inputs \(F:\{0,1\}^* \rightarrow \{0,1\}^*\) | Sequential access to memory | Turing machines, NAND-TM programs |
| | Indexed access / RAM | RAM machines, NAND-RAM, modern programming languages |
| | Other | Lambda calculus, cellular automata |

Later on, in Chapter 16, we will study memory bounded computation. It turns out that NAND-TM programs with a constant amount of memory are equivalent to the model of finite automata (also known as finite state machines; the adjectives “deterministic” or “nondeterministic” are sometimes added as well), which in turn captures the notion of regular languages (those that can be described by regular expressions), a concept we will see in Chapter 9.

  • While we defined computable functions using NAND-TM programs, we could just as well have done so using many other models, including not just NAND-RAM but also Turing machines, RAM machines, the λ-calculus and many other models.
  • Very simple models turn out to be “Turing complete” in the sense that they can simulate arbitrarily complex computation.

Exercises

This exercise shows part of the proof that NAND-TM can simulate NAND-RAM. Produce the code of a NAND-TM program that computes the function \(\ensuremath{\mathit{LOOKUP}}:\{0,1\}^* \rightarrow \{0,1\}\) that is defined as follows. On input \(pf(i)x\), where \(pf(i)\) denotes a prefix-free encoding of an integer \(i\), \(\ensuremath{\mathit{LOOKUP}}(pf(i)x)=x_i\) if \(i<|x|\) and \(\ensuremath{\mathit{LOOKUP}}(pf(i)x)=0\) otherwise. (We don’t care what \(\ensuremath{\mathit{LOOKUP}}\) outputs on inputs that are not of this form.) You can use any prefix-free encoding you like, and you can also use your favorite programming language to produce this code.

Let \(embed:\N^2 \rightarrow \N\) be the function defined as \(embed(x^0,x^1)= \tfrac{1}{2}(x^0+x^1)(x^0+x^1+1) + x^1\).

  1. Prove that for every \(x^0,x^1 \in \N\), \(embed(x^0,x^1)\) is indeed a natural number.

  2. Prove that \(embed\) is one-to-one.

  3. Construct a NAND-TM program \(P\) such that for every \(x^0,x^1 \in \N\), \(P(pf(x^0)pf(x^1))=pf(embed(x^0,x^1))\), where \(pf\) is the prefix-free encoding map defined above. You can use the syntactic sugar for inner loops, conditionals, and incrementing/decrementing the counter.

  4. Construct NAND-TM programs \(P_0,P_1\) such that for every \(x^0,x^1 \in \N\) and \(i \in \{0,1\}\), \(P_i(pf(embed(x^0,x^1)))=pf(x^i)\). You can use the syntactic sugar for inner loops, conditionals, and incrementing/decrementing the counter.
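Before attempting the proofs, it can help to experiment with \(embed\) numerically. The Python sketch below is not a solution to the exercise (which asks for proofs and for NAND-TM programs); it only checks the claims on a small range. The inverse `unembed` is a hypothetical helper written for this check.

```python
def embed(x0, x1):
    """The pairing function from the exercise, using integer arithmetic."""
    s = x0 + x1
    return s * (s + 1) // 2 + x1   # s * (s + 1) is always even


def unembed(z):
    """Hypothetical inverse: recover (x0, x1) from z = embed(x0, x1)."""
    s = 0
    # Find the largest s with s * (s + 1) / 2 <= z.
    while (s + 1) * (s + 2) // 2 <= z:
        s += 1
    x1 = z - s * (s + 1) // 2
    return s - x1, x1


pairs = [(a, b) for a in range(20) for b in range(20)]
values = [embed(a, b) for (a, b) in pairs]
print(values[:5])
```

On this range, all values are distinct and `unembed` recovers each pair, which is consistent with \(embed\) being one-to-one but of course does not prove it.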

Prove that for every λ-expression \(e\) with no free variables there is an equivalent λ-expression \(f\) that only uses the variables \(x\),\(y\), and \(z\).21

Prove Lemma 7.20 and use it to complete the proof of Theorem 7.18.

Bibliographical notes

Chapter 7 in the wonderful book of Moore and Mertens (Moore, Mertens, 2011) contains a great exposition of much of this material. Chapter 3 in Savage’s book (Savage, 1998) contains a more formal description of RAM machines; see also the paper (Hagerup, 1998). A study of RAM algorithms that are independent of the input size (known as the “transdichotomous RAM model”) was initiated by (Fredman, Willard, 1993).

The RAM model can be very useful in studying the concrete complexity of practical algorithms. However, the exact set of operations that are allowed in the RAM model at unit cost can vary between texts and contexts. One needs to be careful in making such definitions, especially if the word size grows, as was already shown by Shamir (Shamir, 1979) .

The λ-calculus was described by Church in (Church, 1941). Pierce’s book (Pierce, 2002) is a canonical textbook; see also (Barendregt, 1984). The “Currying technique” is named after the logician Haskell Curry (the Haskell programming language is named after Haskell Curry as well). Curry himself attributed this concept to Moses Schönfinkel, though for some reason the term “Schönfinkeling” never caught on.

Tao has proposed showing the Turing completeness of fluid dynamics (a “water computer”) as a way of settling the question of the behavior of the Navier-Stokes equations, see this popular article.

  1. “Random access memory” is quite a misnomer, since it has nothing to do with probability. Indexed access would have been more appropriate. However, the term “random access” is standard in both the theoretical and practical literature, and hence we will use it as well.

  2. You can ignore this restriction for now: if we want to hold larger numbers, we can simply execute dummy instructions. This restriction will be useful in later chapters, where we will be interested in a more realistic accounting of running time. Also, while RAM machines have a single memory array, we allow several arrays in NAND-RAM. This does not make any difference. For example, one can simulate five arrays Array0[], \(\ldots\), Array4[] using a single array Array[] by replacing calls to Array\(i\)[\(j\)] with Array[\(5j+i\)].

  3. The difference between having “built in” vs “syntactic sugar” features is immaterial at this point in the book, but we do so with an eye toward the later parts of this book, when we start counting the number of operations of our algorithms. Even then, the effect of including these features vs implementing them via syntactic sugar will not be very dramatic.

  4. Some programming languages have fixed (even if extremely large) bounds on the amount of memory they can access, which formally prevent them from being applicable to computing infinite functions and hence simulating Turing machines. We ignore such issues in this discussion and assume access to some storage device without a fixed upper bound on its capacity.

  5. Anonymous functions, using either \(\lambda x.f(x)\), \(x \mapsto f(x)\) or other closely related notation, appear in many programming languages. For example, in Python we can define the squaring function using lambda x: x*x while in JavaScript we can use x => x*x or (x) => x*x. In Scheme we would define it as (lambda (x) (* x x)).

  6. Strictly speaking we should replace only the free occurrences of the variable and not the ones that are bound by some other λ operator. For example, if we have the λ expression \(\lambda x.(\lambda x. x+1)(x)\) and invoke it on the number \(7\) then we get \((\lambda x.x+1)(7)=8\) and not the nonsensical expression \((\lambda 7.7+1)(7)\). To avoid such annoyances, we can adopt the convention that every instance of \(\lambda \mathit{var}.e\) uses a unique variable identifier \(\mathit{var}\). See Subsection 1.5.8 for more discussion on bound and free variables.

  7. These two rules are commonly known as “\(\beta\) reduction” and “\(\alpha\) conversion” in the literature on the λ calculus.

  8. “Call by value” is also sometimes known as eager evaluation, since it means we always evaluate parameters to functions before they are executed, while “call by name” is also known as lazy evaluation, since it means that we hold off on evaluating parameters until we are sure we need them. Most programming languages use eager evaluation, though there are some exceptions (notably Haskell). For programming languages that involve non pure functions, call by value has the advantage that it is much easier to understand when the side effects will take place in the program.

  9. We use currying to implement multi-input functions, and so \(\ensuremath{\mathit{IF}}\) is the function \(cond \mapsto f_{cond}\) where \(f_1\) is the function \(x \mapsto (y \mapsto x)\) and \(f_0\) is the function \(x \mapsto (y \mapsto y)\). Can you see why? If not, then working this out is a great exercise.

  10. Note that if \(L\) is a list, then \(\ensuremath{\mathit{HEAD}} L\) is its first element, but \(\ensuremath{\mathit{TAIL}} L\) is not the last element but rather all the elements except the first. The second element of a list \(L\) can be extracted using \(\ensuremath{\mathit{HEAD}} (\ensuremath{\mathit{TAIL}} L)\). Once again, working out why this is the case is a great exercise.

  11. In Lisp, the \(\ensuremath{\mathit{PAIR}}\), \(\ensuremath{\mathit{HEAD}}\) and \(\ensuremath{\mathit{TAIL}}\) functions are traditionally called cons, car and cdr.

  12. In fact, it will be convenient for us to make sure all lists are of the same length, and so at the end of each step we will add a sufficient number of zeroes to the end of each list. This can be done with a simple REDUCE operation.

  13. The name \(zip\) is a common name for this operation, for example in Python. It should not be confused with the zip compression file format.

  14. For example, if X is a list representing the input, then we can obtain a list X_nonblank of \(1\)’s of the same length by simply writing X_nonblank \(= \ensuremath{\mathit{MAP}}(\)X\(,\lambda x.1)\).

  15. This representation scheme is the common convention for representing false and true but there are many other alternative representations for \(0\) and \(1\) that would have worked just as well.

  16. Because of specific issues of Python syntax, in this implementation we use f * g for applying f to g rather than fg, and use λx(exp) rather than λx.exp for abstraction. We also use _0 and _1 for the λ terms for \(0\) and \(1\) so as not to confuse with the Python constants.

  17. This assumes we use the “call by value” evaluation ordering which states that to evaluate a λ expression \(fg\) we first evaluate the righthand expression \(g\) and then invoke \(f\) on it. The “Call by name” or “lazy evaluation” ordering would first evaluate the lefthand expression \(f\) and then invoke it on \(g\). In this case both strategies would result in an infinite loop. There are examples though when “call by name” would not enter an infinite loop while “call by value” would. The SML and OCaml programming languages use “call by value” while Haskell uses (a close variant of) “call by name”.

  18. For simplicity of notation and of phrasing this lemma, we use the convention that if \(i\) is “out of bounds”, such as \(i<0\) or \(i \geq |\alpha|\), then we assume that \(\alpha_i = (\varnothing,\cdot)\).

  19. The formal definition is very cumbersome to state, and not crucial for the remainder of this book. Feel free to skip it as long as you understand the general concept of Turing equivalence. This notion is sometimes referred to in the literature as Gödel numbering or admissible numbering.

  20. The extended Church-Turing thesis, which we discuss in Section 12.6, stipulates that Turing machines also capture the limit of what can be efficiently computed. Just like its physical version, quantum computing presents the main challenge to this thesis.

  21. Hint: You can reduce the number of variables a function takes by “pairing them up”. That is, define a λ expression \(\ensuremath{\mathit{PAIR}}\) such that for every \(x,y\) \(\ensuremath{\mathit{PAIR}} xy\) is some function \(f\) such that \(f0=x\) and \(f1=y\). Then use \(\ensuremath{\mathit{PAIR}}\) to iteratively reduce the number of variables used.

Compiled on 03/19/2019 09:08:31

Copyright 2019, Boaz Barak.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Produced using pandoc and panflute with templates derived from gitbook and bookdown.