
- Recall basic mathematical notions such as sets, functions, numbers, logical operators and quantifiers, strings, and graphs.
- Rigorously define Big-\(O\) notation.
- Proofs by induction.
- Practice with reading mathematical *definitions*, *statements*, and *proofs*.
- Transform an intuitive argument into a rigorous proof.

“When you have mastered numbers, you will in fact no longer be reading numbers, any more than you read words when reading books. You will be reading meanings.”, W. E. B. Du Bois

“I found that every number, which may be expressed from one to ten, surpasses the preceding by one unit: afterwards the ten is doubled or tripled … until a hundred; then the hundred is doubled and tripled in the same manner as the units and the tens … and so forth to the utmost limit of numeration.”, Muhammad ibn Mūsā al-Khwārizmī, 820, translation by Fredric Rosen, 1831.

In this chapter, we review some of the mathematical concepts that we will use in this course. Most of these are not very complicated, but do require some practice and exercise to get comfortable with. If you have not previously encountered some of these concepts, there are several excellent freely-available resources online that cover them. In particular, the CS 121 webpage contains a program for self study of all the needed notions using the lecture notes, videos, and assignments of MIT course 6.042j Mathematics for Computer science. (The MIT lecture notes were also used in the past in Harvard CS 20.)

Before explaining the math background, perhaps I should explain why this
course is so “mathematically heavy”. After all, this is supposed to be
a course about *computation*; one might think we should be talking
mostly about *programs*, rather than more “mathematical” objects such as
*sets*, *functions*, and *graphs*, and doing more *coding* on an actual
computer than writing mathematical proofs with pen and paper. So, why
are we doing so much math in this course? Is it just some form of
hazing? Perhaps a revenge of the “math nerds” against the
“hackers”?

At the end of the day, mathematics is simply a language for modeling
concepts in a precise and unambiguous way. In this course, we will be
mostly interested in the concept of *computation*. For example, we will
look at questions such as *“is there an efficient algorithm to find the
prime factors of a given integer?”*. To even *phrase* such a
question, we need to give a precise *definition* of the notion of an
*algorithm*, and of what it means for an algorithm to be *efficient*.
Also, if the answer to this or similar questions turns out to be
*negative*, then this cannot be shown by simply writing and executing
some code. After all, there is no empirical experiment that will prove
the *nonexistence* of an algorithm. Thus, our only way to show this type
of *negative result* is to use *mathematical proofs*. So you can see why
our main tools in this course will be mathematical proofs and
definitions.

Depending on your background, you can approach this chapter in different ways:

- If you already have taken some proof-based courses, and are very familiar with the notions of discrete mathematics, you can take a quick look at the overview below to see the main tools we will use, and skip ahead to the rest of this book. Alternatively, you can sit back, relax, and read this chapter just to get familiar with our notation, as well as to enjoy (or not) my philosophical musings and attempts at humor. You might also want to start brushing up on *discrete probability*, which we’ll use later in this course.
- If your background is not as extensive, you should lean forward, and read this chapter with a pen and paper handy, making notes and working out derivations as you go along. We cannot fit a semester-length discrete math course in a single chapter, and hence will be necessarily brief in our discussions. Thus you might want to occasionally pause to watch some discrete math lectures, read some of the resources mentioned above, and do some exercises to make sure you internalized the material.

The main notions we will use in this course are the following:

- **Proofs:** First and foremost, this course will involve a heavy dose of formal mathematical reasoning, which includes mathematical *definitions*, *statements*, and *proofs*.
- **Sets:** The basic set *relationships* of membership (\(\in\)) and containment (\(\subseteq\)), and set *operations*, principally union, intersection, set difference and Cartesian product (\(\cup,\cap,\setminus\) and \(\times\)).
- **Tuples and strings:** The set \(\Sigma^k\) of length-\(k\) strings/lists over elements in \(\Sigma\), where \(\Sigma\) is some finite set which is called the *alphabet* (quite often \(\Sigma = \{0,1\}\)). We use \(\Sigma^*\) for the set of all strings of finite length.
- **Some special sets:** The set \(\N\) of natural numbers. We will index from zero in this course and so write \(\N = \{0,1,2,\ldots \}\). We will use \([n]\) for the set \(\{0,1,2,\ldots,n-1\}\). We use \(\{0,1\}^*\) for the set of all binary strings and \(\{0,1\}^n\) for the set of strings of length \(n\). If \(x\) is a string of length \(n\), then we refer to its coordinates by \(x_0,\ldots,x_{n-1}\).
- **Functions:** The *domain* and *codomain* of a function, properties such as being *one-to-one* (also known as *injective*) or *onto* (also known as *surjective*) functions, as well as *partial functions* (that, unlike standard or “total” functions, are not necessarily defined on all elements of their domain).
- **Logical operations:** The operations AND, OR, and NOT (\(\wedge,\vee,\neg\)) and the quantifiers “there exists” and “for all” (\(\exists\),\(\forall\)).
- **Basic combinatorics:** Notions such as \(\binom{n}{k}\) (the number of \(k\)-sized subsets of a set of size \(n\)).
- **Graphs:** Undirected and directed graphs, connectivity, paths, and cycles.
- **Big-\(O\) notation:** \(O,o,\Omega,\omega,\Theta\) notation for analyzing asymptotic growth of functions.
- **Discrete probability:** Later on we will use *probability theory*, and specifically probability over *finite* sample spaces such as tossing \(n\) coins, including notions such as *random variables*, *expectation*, and *concentration*. We will only use probability theory in the second half of this text, and will review it beforehand. However, probabilistic reasoning is a subtle (and extremely useful!) skill, and it’s always good to start early in acquiring it.

In the rest of this chapter we briefly review the above notions. This is partially to remind the reader and reinforce material that might not be fresh in your mind, and partially to introduce our notation and conventions which might occasionally differ from those you’ve encountered before.

In this course, we will eventually tackle some fairly complex definitions. For example, let us consider one of the definitions that we will encounter towards the very end of this text:

If \(G:\{0,1\}^n \rightarrow \{0,1\}\) is a finite function and \(Q\) is a
Quantum circuit then we say that *\(Q\) computes \(G\)* if for every
\(x\in \{0,1\}^n\), \(\Pr[ Q(x)=G(x) ] \geq 2/3\).

The class \(\mathbf{BQP}\) (which stands for “bounded-error quantum polynomial time”) is the set of all functions \(F:\{0,1\}^* \rightarrow \{0,1\}\) such that there exists a polynomial-time Turing Machine \(M\) that satisfies the following: for every \(n\in \N\), \(M(1^n)\) is a Quantum circuit \(Q_n\) that computes \(F_n\), where \(F_n:\{0,1\}^n \rightarrow \{0,1\}\) is the restriction of \(F\) to inputs of length \(n\). That is, \(\Pr[ M(1^n)(x) = F(x)] \geq 2/3\) for every \(n\in \N\) and \(x\in \{0,1\}^n\).

We will also see the following theorem:

Let \(F:\{0,1\}^* \rightarrow \{0,1\}\) be the function that on input a string representation of a pair \((m,i)\) of natural numbers, outputs \(1\) if and only if the \(i\)-th bit of the smallest prime factor of \(m\) is equal to \(1\). Then \(F \in \mathbf{BQP}\).

While it should make sense to you by the end of the term, at the current
point in time it is perfectly fine if the definition of \(\mathbf{BQP}\) and
the theorem above seem to you a meaningless combination of
inscrutable terms. Indeed, to a large extent they *are* such a
combination, as they contain many terms that we have not defined (and
that we would need to build on a semester’s worth of material to be able
to define). Yet, even when faced with what seems like completely
incomprehensible gibberish, it is still possible for us to try to make
*some* sense of it, and to at least be able to “know what we
don’t know”. Let’s use the definition and the theorem above
as examples. For starters, let me tell you what this definition and this
theorem are about. *Quantum computing* is an approach to use the
peculiarities of quantum mechanics to build computing devices that can
solve certain problems exponentially faster than current computers. Many
large companies and
governments
are extremely excited about this possibility, and are investing hundreds
of millions of dollars in trying to make this happen. To a first order
of approximation, the reason they are so excited is Shor’s Algorithm
(i.e., the theorem above), which says that the problem of *integer
factoring*, with history going back thousands of years, and whose
difficulty is (as we’ll see) closely tied to the security of many
current encryption schemes, can be solved efficiently using quantum
computers.

The theorem above was proven by Peter Shor in 1994. However, Shor
could not even have *stated* this theorem, let alone proved it, without
having the proper definition (i.e., the definition of \(\mathbf{BQP}\)) in place.
That definition introduces the class \(\mathbf{BQP}\) of functions that
can be computed in polynomial time by quantum computers. Like any
mathematical definition, it defines a new concept (in this case the
class \(\mathbf{BQP}\)) in terms of other concepts. In this case the
concepts that are needed are

- The notion of a *function*, which is a mapping of one set to another. In this particular case we use functions whose output is a single number that is either zero or one (i.e., a *bit*) and the input is a list of bits (i.e., a *string*) which can either have a fixed length \(n\) (this is denoted as the set \(\{0,1\}^n\)) or have length that is not a priori bounded (this is denoted by \(\{0,1\}^*\)).
- *Restrictions* of functions. If \(F\) is a function that takes strings of arbitrary length as input (i.e., members of the set \(\{0,1\}^*\)) then \(F_n\) is the restriction of \(F\) to inputs of length \(n\) (i.e., members of \(\{0,1\}^n\)).
- We use the notion of a *Quantum circuit*, which will be our computational model for quantum computers, and which we will encounter later on in the course. Quantum circuits can compute functions with a fixed input length \(n\), and we define the notion of computing a function \(G\) as outputting on input \(x\) the value \(G(x)\) with probability at least \(2/3\).
- We will also use the notion of *Turing machines*, which will be our computational model for “classical” computers. (As we’ll see, there is a great variety of ways to model “classical computers”, including RAM machines, \(\lambda\)-calculus, and *NAND++ programs*.)
- We require that for every \(n\in \N\), the quantum circuit \(Q_n\) for \(F_n\) can be generated efficiently, in the sense that there is a polynomial-time classical program \(P\) that on input a string of \(n\) ones (which we shorthand as \(1^n\)) outputs \(Q_n\).

The point of this example is not for you to understand the definition of \(\mathbf{BQP}\) and the theorem above. Fully understanding them will require background that will take us weeks to develop. The point is to show that you should not be afraid of even the most complicated looking definitions and mathematical terminology. No matter how convoluted the notation, and how many layers of indirection, you can always look at mathematical definitions and at least attempt to answer the following questions:

1. What is the intuitive notion that this definition aims at modeling?
2. How is each new concept defined in terms of other concepts?
3. Which of these prior concepts am I already familiar with, and which ones do I still need to look up?

Dealing with mathematical text is in many ways not so different from dealing with any other complex text, whether it’s a legal argument, a philosophical treatise, an English Renaissance play, or even the source code of an operating system. You should not expect it to be clear in a first reading, but you need not despair. Rather you should engage with the text, trying to figure out both the high level intentions as well as the underlying details. Luckily, compared to philosophers or even programmers, mathematicians have a greater discipline of introducing definitions in linear order, making sure that every new concept is defined only in terms of previously defined notions. As you read through the rest of this chapter and this text, try to ask yourself questions 1-3 above every time that you encounter a new definition.

Here is a simpler mathematical definition, which you may have encountered in the past (and will encounter again shortly):

A function \(f:S \rightarrow T\) is *one to one* if for every two elements
\(x,x' \in S\), if \(x \neq x'\) then \(f(x) \neq f(x')\).

This definition captures a simple concept, but even so it uses quite a
bit of notation. When reading this definition, or any other piece of
mathematical text, it is often useful to annotate it with a pen as
you’re going through it, as in the figure showing an annotated version
of this definition. For
every identifier you encounter (for example \(f,S,T,x,x'\) in this case),
make sure that you realize what sort of object it is: is it a set, a
function, an element, a number, a gremlin? Make sure you understand how
the identifiers are *quantified*. For example, in the definition above
there is a *universal* or “for all” (sometimes denoted by \(\forall\))
quantifier over pairs \((x,x')\) of distinct elements in \(S\). Finally, and
most importantly, make sure that aside from being able to parse the
definition formally, you also have an intuitive understanding of what
it is that the text is actually saying. For example, the definition above
says that a one to one function is a function where every output is
obtained by a unique input.

Reading mathematical texts in this way takes time, but it gets easier with practice. Moreover, this is one of the most transferable skills you could take from this course. Our world is changing rapidly, not just in the realm of technology, but also in many other human endeavors, whether it is medicine, economics, law or even culture. Whatever your future aspirations, it is likely that you will encounter texts that use new concepts that you have not seen before (for semi-random recent examples from current “hot areas”, see the AlphaGo Zero and Zerocash figures). Being able to internalize and then apply new definitions can be hugely important. It is a skill that’s much easier to acquire in the relatively safe and stable context of a mathematical course, where one at least has the guarantee that the concepts are fully specified, and you have access to your teaching staff for questions.

We now quickly review some of the mathematical objects (the “basic data structures” of mathematics, if you will) we use in this course.

A *set* is an unordered collection of objects. For example, when we
write \(S = \{ 2,4, 7 \}\), we mean that \(S\) denotes the set that contains
the numbers \(2\), \(4\), and \(7\). (We use the notation “\(2 \in S\)” to
denote that \(2\) is an element of \(S\).) Note that the set \(\{ 2, 4, 7 \}\)
and \(\{ 7 , 4, 2 \}\) are identical, since they contain the same
elements. Also, a set either contains an element or does not contain it
– there is no notion of containing it “twice” – and so we could even
write the same set \(S\) as \(\{ 2, 2, 4, 7\}\) (though that would be a
little weird). The *cardinality* of a finite set \(S\), denoted by \(|S|\),
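is the number of elements it contains. (Later in this course we will discuss how to extend the notion of cardinality to *infinite* sets.) We say that a set \(S\) is a *subset* of a set \(T\), denoted by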
\(S \subseteq T\), if every element of \(S\) is also an element of \(T\). (We
can also describe this by saying that \(T\) is a *superset* of \(S\).) For
example, \(\{2,7\} \subseteq \{ 2,4,7\}\). The set that contains no
elements is known as the *empty set* and it is denoted by \(\emptyset\).

We can define sets by either listing all their elements or by writing down a rule that they satisfy such as \[ \text{EVEN} = \{ x \;:\; \text{ $x=2y$ for some non-negative integer $y$} \} \;. \]

Of course there is more than one way to write the same set, and often we will use intuitive notation listing a few examples that illustrate the rule. For example, we can also define \(\text{EVEN}\) as

\[ \text{EVEN} = \{ 0,2,4, \ldots \} \;. \]

Note that a set can be either finite (such as the set \(\{2,4,7\}\)) or
infinite (such as the set \(\text{EVEN}\)). Also, the elements of a set
don’t have to be numbers. We can talk about sets such as the set
\(\{a,e,i,o,u \}\) of all the vowels in the English language, or the set
\(\{\) `New York`, `Los Angeles`, `Chicago`, `Houston`, `Philadelphia`, `Phoenix`, `San Antonio`, `San Diego`, `Dallas` \(\}\) of all cities in
the U.S. with population more than one million per the 2010 census. A
set can even have other sets as elements, such as the set
\(\{ \emptyset, \{1,2\},\{2,3\},\{1,3\} \}\) of all even-sized subsets of
\(\{1,2,3\}\).

**Operations on sets:** The *union* of two sets \(S,T\), denoted by
\(S \cup T\), is the set that contains all elements that are either in \(S\)
*or* in \(T\). The *intersection* of \(S\) and \(T\), denoted by \(S \cap T\),
is the set of elements that are both in \(S\) *and* in \(T\). The *set
difference* of \(S\) and \(T\), denoted by \(S \setminus T\) (and in some
texts also by \(S-T\)), is the set of elements that are in \(S\) but *not*
in \(T\).

**Tuples, lists, strings, sequences:** A *tuple* is an *ordered*
collection of items. For example \((1,5,2,1)\) is a tuple with four
elements (also known as a \(4\)-tuple or quadruple). Since order matters,
this is not the same tuple as the \(4\)-tuple \((1,1,5,2)\) or the \(3\)-tuple
\((1,5,2)\). A \(2\)-tuple is also known as a *pair*. We use the terms
*tuples* and *lists* interchangeably. A tuple where every element comes
from some finite set \(\Sigma\) (such as \(\{0,1\}\)) is also known as a
*string*. Analogously to sets, we denote the *length* of a tuple \(T\) by
\(|T|\). Just like sets, we can also think of infinite analogues of
tuples, such as the ordered collection \((1,4,9,\ldots )\) of all
perfect squares. Infinite ordered collections are known as *sequences*;
we might sometimes use the term “infinite sequence” to emphasize this,
and use “finite sequence” as a synonym for a tuple. (We can identify a
sequence \((a_0,a_1,a_2,\ldots)\) of elements in some set \(S\) with a
*function* \(A:\N \rightarrow S\) (where
\(a_n = A(n)\) for every \(n\in \N\)). Similarly, we can identify a
\(k\)-tuple \((a_0,\ldots,a_{k-1})\) of elements in \(S\) with a function
\(A:[k] \rightarrow S\).)

**Cartesian product:** If \(S\) and \(T\) are sets, then their *Cartesian
product*, denoted by \(S \times T\), is the set of all ordered pairs
\((s,t)\) where \(s\in S\) and \(t\in T\). For example, if \(S = \{1,2,3 \}\)
and \(T = \{10,12 \}\), then \(S\times T\) contains the \(6\) elements
\((1,10),(2,10),(3,10),(1,12),(2,12),(3,12)\). Similarly if \(S,T,U\) are
sets then \(S\times T \times U\) is the set of all ordered triples
\((s,t,u)\) where \(s\in S\), \(t\in T\), and \(u\in U\). More generally, for
every positive integer \(n\) and sets \(S_0,\ldots,S_{n-1}\), we denote by
\(S_0 \times S_1 \times \cdots \times S_{n-1}\) the set of ordered
\(n\)-tuples \((s_0,\ldots,s_{n-1})\) where \(s_i\in S_i\) for every
\(i \in \{0,\ldots, n-1\}\). For every set \(S\), we denote the set
\(S\times S\) by \(S^2\), \(S\times S\times S\) by \(S^3\),
\(S\times S\times S \times S\) by \(S^4\), and so on and so forth.

To get more comfortable with sets, one can also play with the `set` data
structure in Python. (The `set` data structure only corresponds to *finite* sets;
infinite sets are much more cumbersome to handle in programming
languages, though mechanisms such as Python
generators and lazy
evaluation in general can be helpful.)
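As a minimal sketch of this suggestion, the snippet below replays the set notions from this section with Python's built-in `set`. The variable names (`A`, `B`, `pairs`, `even_numbers`) are illustrative choices and not notation from the text; the generator is only one possible way to model the infinite set \(\text{EVEN}\) lazily.

```python
from itertools import count, islice, product

S = {2, 4, 7}
T = {7, 4, 2, 2}  # order and repetition do not matter

# Membership, equality, cardinality, and containment.
assert 2 in S
assert S == T
assert len(S) == 3
assert {2, 7} <= S  # {2,7} is a subset of {2,4,7}

# Union, intersection, and set difference.
A = {1, 2, 3}
B = {3, 4}
union = A | B
intersection = A & B
difference = A - B

# Cartesian product of {1,2,3} and {10,12}: a set of ordered pairs.
pairs = set(product({1, 2, 3}, {10, 12}))


def even_numbers():
    """Lazily yield the elements 0, 2, 4, ... of the infinite set EVEN."""
    for y in count():
        yield 2 * y


# Only a finite prefix of an infinite set can ever be materialized.
first_five_even = list(islice(even_numbers(), 5))
```

Since a generator never finishes enumerating an infinite set, questions such as "is \(x\) in \(\text{EVEN}\)?" are better answered by the defining rule (here, `x % 2 == 0`) than by enumeration.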

There are several sets that we will use in this course time and again, and so we find it useful to introduce explicit notation for them. For starters we define

\[
\N = \{ 0, 1,2, \ldots \}
\] to be the set of all *natural numbers*, i.e., non-negative integers.
For any natural number \(n\), we define the set \([n]\) as
\(\{0,\ldots, n-1\} = \{ k\in \N : k < n \}\).

We will also occasionally use the set
\(\Z=\{\ldots,-2,-1,0,+1,+2,\ldots \}\) of (negative and non-negative)
*integers* (the letter Z stands for the German word “Zahlen”, which means *numbers*), as well as the set \(\R\) of *real* numbers. (This is the
set that includes not just the integers, but also fractional and even
irrational numbers; e.g., \(\R\) contains numbers such as \(+0.5\), \(-\pi\),
etc.) We denote by \(\R_+\) the set \(\{ x\in \R : x > 0 \}\) of *positive*
real numbers. This set is sometimes also denoted as \((0,\infty)\).

**Strings:** Another set we will use time and again is

\[ \{0,1\}^n = \{ (x_0,\ldots,x_{n-1}) \;:\; x_0,\ldots,x_{n-1} \in \{0,1\} \} \] which is the set of all \(n\)-length binary strings for some natural number \(n\). That is \(\{0,1\}^n\) is the set of all \(n\)-tuples of zeroes and ones. This is consistent with our notation above: \(\{0,1\}^2\) is the Cartesian product \(\{0,1\} \times \{0,1\}\), \(\{0,1\}^3\) is the product \(\{0,1\} \times \{0,1\} \times \{0,1\}\) and so on.

We will write the string \((x_0,x_1,\ldots,x_{n-1})\) as simply \(x_0x_1\cdots x_{n-1}\) and so for example

\[ \{0,1\}^3 = \{ 000 , 001, 010 , 011, 100, 101, 110, 111 \} \;. \]

For every string \(x\in \{0,1\}^n\) and \(i\in [n]\), we write \(x_i\) for the
\(i^{th}\) coordinate of \(x\). If \(x\) and \(y\) are strings, then \(xy\)
denotes their *concatenation*. That is, if \(x \in \{0,1\}^n\) and
\(y\in \{0,1\}^m\), then \(xy\) is equal to the string \(z\in \{0,1\}^{n+m}\)
such that for \(i\in [n]\), \(z_i=x_i\) and for \(i\in \{n,\ldots,n+m-1\}\),
\(z_i = y_{i-n}\).

We will also often talk about the set of binary strings of *all*
lengths, which is

\[ \{0,1\}^* = \{ (x_0,\ldots,x_{n-1}) \;:\; n\in\N \;,\; x_0,\ldots,x_{n-1} \in \{0,1\} \} \;. \]

Another way to write this set is as \[ \{0,1\}^* = \{0,1\}^0 \cup \{0,1\}^1 \cup \{0,1\}^2 \cup \cdots \] or more concisely as

\[ \{0,1\}^* = \cup_{n\in\N} \{0,1\}^n \;. \]

The set \(\{0,1\}^*\) contains also the “string of length \(0\)” or “the
empty string”, which we will denote by \(\mathtt{""}\).

**Generalizing the star operation:** For every set \(\Sigma\), we define

\[\Sigma^* = \cup_{n\in \N} \Sigma^n \;.\] For example, if \(\Sigma = \{a,b,c,d,\ldots,z \}\) then \(\Sigma^*\) denotes the set of all finite length strings over the alphabet a-z.

**Concatenation:** As mentioned above, the
*concatenation* of two strings \(x\in \Sigma^n\) and \(y\in \Sigma^m\) is
the \((n+m)\)-length string \(xy\) obtained by writing \(y\) after \(x\).
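To make the string notation concrete, here is a small sketch that enumerates \(\{0,1\}^n\) and checks the coordinate-wise definition of concatenation. Representing strings over \(\{0,1\}\) as Python `str` values, and the helper names `binary_strings` and `concat`, are assumptions made for illustration only.

```python
from itertools import product


def binary_strings(n):
    """Return all strings in {0,1}^n, in lexicographic order."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def concat(x, y):
    """Return z = xy, checked coordinate by coordinate against the definition."""
    z = x + y
    n, m = len(x), len(y)
    # z_i = x_i for i in [n]
    assert all(z[i] == x[i] for i in range(n))
    # z_i = y_{i-n} for i in {n, ..., n+m-1}
    assert all(z[i] == y[i - n] for i in range(n, n + m))
    return z


length_three = binary_strings(3)
length_zero = binary_strings(0)  # contains only the empty string
z = concat("01", "110")
```

Note that `binary_strings(0)` is a list with one element (the empty string), matching the convention that \(\{0,1\}^0\) is not empty.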

If \(S\) and \(T\) are nonempty sets, a *function* \(F\) mapping \(S\) to \(T\),
denoted by \(F:S \rightarrow T\), associates with every element \(x\in S\)
an element \(F(x)\in T\). The set \(S\) is known as the *domain* of \(F\) and
the set \(T\) is known as the *codomain* of \(F\). The *image* of a function
\(F\) is the set \(\{ F(x) \;|\; x\in S\}\) which is the subset of \(F\)’s
codomain consisting of all output elements that are mapped from some
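input. (Some texts use *range* to denote the image of a function, while
other texts use *range* to denote the codomain of a function. Hence
we will avoid using the term “range” altogether.) As in the case of sets, we can write a function either by listing the table of all the values it gives for elements of \(S\), or by using a rule. For example, if \(S = \{0,1,\ldots,9\}\) and \(T = \{0,1\}\), then the table below defines a function \(F:S \rightarrow T\), which is the same as the function defined by the rule \(F(x) = (x \mod 2)\). (For two natural numbers \(x\) and \(a\), \(x \mod a\) denotes the *remainder* of \(x\) when it is divided by \(a\). That is, it is the
number \(r\) in \(\{0,\ldots,a-1\}\) such that \(x = ak +r\) for some
integer \(k\). We sometimes also use the notation \(x = y (\mod a)\) to
denote the assertion that \(x \mod a\) is the same as \(y \mod a\).)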

Input | Output |
---|---|
0 | 0 |
1 | 1 |
2 | 0 |
3 | 1 |
4 | 0 |
5 | 1 |
6 | 0 |
7 | 1 |
8 | 0 |
9 | 1 |
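The table above can be read as a function from \(\{0,\ldots,9\}\) to \(\{0,1\}\). As a hedged illustration, the sketch below stores that table as a Python dictionary and compares it with the rule "remainder modulo 2"; the names `F_table` and `F_rule` are hypothetical.

```python
S = range(10)  # the domain {0,...,9}
T = {0, 1}     # the codomain

# The function given by listing its table of values.
F_table = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1, 6: 0, 7: 1, 8: 0, 9: 1}


def F_rule(x):
    """The same function given by a rule: the remainder of x divided by 2."""
    return x % 2


# Two descriptions define the same function if they agree on every input.
same_function = all(F_table[x] == F_rule(x) for x in S)

# The image is the set of outputs actually attained; here it equals the codomain.
image = {F_rule(x) for x in S}
```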

If \(F:S \rightarrow T\) satisfies that \(F(x)\neq F(y)\) for all \(x \neq y\)
then we say that \(F\) is *one-to-one* (also known as an *injective*
function or simply an *injection*). If \(F\) satisfies that for every
\(y\in T\) there is some \(x\in S\) such that \(F(x)=y\) then we say that \(F\)
is *onto* (also known as a *surjective* function or simply a
*surjection*). A function that is both one-to-one and onto is known as a
*bijective* function or simply a *bijection*. A bijection from a set \(S\)
to itself is also known as a *permutation* of \(S\). If
\(F:S \rightarrow T\) is a bijection then for every \(y\in T\) there is a
unique \(x\in S\) s.t. \(F(x)=y\). We denote this value \(x\) by \(F^{-1}(y)\).
Note that \(F^{-1}\) is itself a bijection from \(T\) to \(S\) (can you see
why?).
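For finite sets, these properties can be checked mechanically. The following sketch assumes a function is represented as a Python dictionary mapping each element of \(S\) to an element of \(T\); the helper names and the example functions `F` and `H` are illustrative and not from the text.

```python
def is_one_to_one(F, S):
    """True if F(x) != F(y) for all distinct x, y in S."""
    outputs = [F[x] for x in S]
    return len(set(outputs)) == len(outputs)


def is_onto(F, S, T):
    """True if every y in T equals F(x) for some x in S."""
    return {F[x] for x in S} == set(T)


def inverse(F, S, T):
    """Return F^{-1} as a dictionary; only defined when F is a bijection."""
    assert is_one_to_one(F, S) and is_onto(F, S, T)
    return {F[x]: x for x in S}


S = {0, 1, 2}
T = {"a", "b", "c"}
F = {0: "b", 1: "c", 2: "a"}  # a bijection from S to T
H = {0: "a", 1: "a", 2: "b"}  # neither one-to-one nor onto

F_inv = inverse(F, S, T)
```

Here `F_inv` maps `"c"` back to `1`, and applying `F_inv` after `F` returns every element of `S` to itself.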

Giving a bijection between two sets is often a good way to show they
have the same size. In fact, the standard mathematical definition of the
notion that “\(S\) and \(T\) have the same cardinality” is that there exists
a bijection \(f:S \rightarrow T\). In particular, the cardinality of a set
\(S\) is defined to be \(n\) if there is a bijection from \(S\) to the set
\(\{0,\ldots,n-1\}\). As we will see later in this course, this is a
definition that generalizes to defining the cardinality of
*infinite* sets.

**Partial functions:** We will sometimes be interested in *partial*
functions from \(S\) to \(T\). A partial function is allowed to be undefined
on some subset of \(S\). That is, if \(F\) is a partial function from \(S\) to
\(T\), then for every \(s\in S\), either there is (as in the case of
standard functions) an element \(F(s)\) in \(T\), or \(F(s)\) is undefined.
For example, the partial function \(F(x)= \sqrt{x}\) is only defined on
non-negative real numbers. When we want to distinguish between partial
functions and standard (i.e., non-partial) functions, we will call the
latter *total* functions. When we say “function” without any qualifier
then we mean a *total* function.

The notion of partial functions is a strict generalization of functions, and so every function is a partial function, but not every partial function is a function. (That is, for every nonempty \(S\) and \(T\), the set of partial functions from \(S\) to \(T\) is a proper superset of the set of total functions from \(S\) to \(T\).) When we want to emphasize that a function \(f\) from \(A\) to \(B\) might not be total, we will write \(f: A \rightarrow_p B\). We can think of a partial function \(F\) from \(S\) to \(T\) also as a total function from \(S\) to \(T \cup \{ \bot \}\) where \(\bot\) is some special “failure symbol”, and so instead of saying that \(F\) is undefined at \(x\), we can say that \(F(x)=\bot\).
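The "failure symbol" view translates directly into code. In the sketch below, Python's `None` plays the role of \(\bot\); this choice, as well as the names `BOT`, `partial_sqrt`, and `is_defined`, is an assumption made for illustration.

```python
import math

BOT = None  # stand-in for the failure symbol ⊥


def partial_sqrt(x):
    """The partial function sqrt on the reals, viewed as a total function into R ∪ {⊥}."""
    return math.sqrt(x) if x >= 0 else BOT


def is_defined(F, x):
    """True if the partial function F is defined at x."""
    return F(x) is not BOT


value_at_nine = partial_sqrt(9.0)
value_at_minus_one = partial_sqrt(-1.0)
```

Using a dedicated sentinel only works if the sentinel is not itself an element of the codomain, which mirrors the requirement that \(\bot\) be a "special" symbol outside \(T\).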

**Basic facts about functions:** Verifying that you can prove the
following results is an excellent way to brush up on functions:

- If \(F:S \rightarrow T\) and \(G:T \rightarrow U\) are one-to-one functions, then their *composition* \(H:S \rightarrow U\) defined as \(H(s)=G(F(s))\) is also one to one.
- If \(F:S \rightarrow T\) is one to one, then there exists an onto function \(G:T \rightarrow S\) such that \(G(F(s))=s\) for every \(s\in S\).
- If \(G:T \rightarrow S\) is onto then there exists a one-to-one function \(F:S \rightarrow T\) such that \(G(F(s))=s\) for every \(s\in S\).
- If \(S\) and \(T\) are finite sets then the following conditions are equivalent to one another: **(a)** \(|S| \leq |T|\), **(b)** there is a one-to-one function \(F:S \rightarrow T\), and **(c)** there is an onto function \(G:T \rightarrow S\).

You can find the proofs of these results in many discrete math texts, including for example, section 4.5 in the Lehman-Leighton-Meyer notes. However, I strongly suggest you try to prove them on your own, or at least convince yourself that they are true by proving special cases of those for small sizes (e.g., \(|S|=3,|T|=4,|U|=5\)).

Let us prove one of these facts as an example:

If \(S,T\) are non-empty sets and \(F:S \rightarrow T\) is one to one, then there exists an onto function \(G:T \rightarrow S\) such that \(G(F(s))=s\) for every \(s\in S\).

Let \(S\), \(T\) and \(F:S \rightarrow T\) be as in the Lemma’s statement, and
choose some \(s_0 \in S\). We will define the function \(G:T \rightarrow S\)
as follows: for every \(t\in T\), if there is some \(s\in S\) such that
\(F(s)=t\) then set \(G(t)=s\) (the choice of \(s\) is well defined since by
the one-to-one property of \(F\), there cannot be two distinct \(s,s'\) that
both map to \(t\)). Otherwise, set \(G(t)=s_0\). Now for every \(s\in S\), by
the definition of \(G\), if \(t=F(s)\) then \(G(t)=G(F(s))=s\). Moreover, this
also shows that \(G\) is *onto*, since it means that for every \(s\in S\)
there is some \(t\) (namely \(t=F(s)\)) such that \(G(t)=s\).
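For finite sets, the construction in this proof can be carried out explicitly. The sketch below follows the proof step by step, assuming \(F\) is given as a Python dictionary; the function name `onto_left_inverse` and the example sets are hypothetical.

```python
def onto_left_inverse(F, S, T):
    """Given a one-to-one F: S -> T with S nonempty, build an onto G: T -> S
    such that G(F(s)) = s for every s in S, as in the proof above."""
    S = list(S)
    assert S, "S must be nonempty"
    assert len({F[s] for s in S}) == len(S), "F must be one-to-one"

    s0 = S[0]               # the arbitrary fixed element s_0 of S
    G = {t: s0 for t in T}  # default: G(t) = s_0 when t is not an output of F
    for s in S:
        G[F[s]] = s         # well defined because F is one-to-one
    return G


S = [0, 1, 2]
T = ["a", "b", "c", "d"]
F = {0: "b", 1: "d", 2: "a"}

G = onto_left_inverse(F, S, T)
```

In this example the element `"c"` is not an output of `F`, so `G` sends it to the default element `s0`.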

*Graphs* are ubiquitous in Computer Science, and many other fields as
well. They are used to model a variety of data types including social
networks, road networks, deep neural nets, gene interactions,
correlations between observations, and a great many more. The formal
definitions of graphs are below, but if you have not encountered them
before then I urge you to read up on them in one of the sources linked
above. Graphs come in two basic flavors: *undirected* and
*directed*.

An *undirected graph* \(G = (V,E)\) consists of a set \(V\) of *vertices*
and a set \(E\) of edges. Every edge is a size two subset of \(V\). We say
that two vertices \(u,v \in V\) are *neighbors*, denoted by \(u \sim v\), if
the edge \(\{u,v\}\) is in \(E\).

Given this definition, we can define several other properties of graphs
and their vertices. We define the *degree* of \(u\) to be the number of
neighbors \(u\) has. A *path* in the graph is a tuple
\((u_0,\ldots,u_k) \in V^{k+1}\), for some \(k>0\) such that \(u_{i+1}\) is a
neighbor of \(u_i\) for every \(i\in [k]\). A *simple path* is a path
\((u_0,\ldots,u_{k-1})\) where all the \(u_i\)’s are distinct. A *cycle* is
a path \((u_0,\ldots,u_k)\) where \(u_0=u_{k}\). We say that two vertices
\(u,v\in V\) are *connected* if either \(u=v\) or there is a path
\((u_0,\ldots,u_k)\) where \(u_0=u\) and \(u_k=v\). We say that the graph \(G\)
is *connected* if every pair of vertices in it is connected.
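These definitions map closely onto code. The sketch below represents each edge as a two-element `frozenset`, mirroring "a size two subset of \(V\)"; the example graph and helper names are assumptions for illustration, and breadth-first search is just one standard way to test connectivity.

```python
from collections import deque

# A small example graph: a triangle on {0,1,2} plus the separate edge {3,4}.
V = {0, 1, 2, 3, 4}
E = {frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 2}), frozenset({3, 4})}


def neighbors(u, V, E):
    return {v for v in V if frozenset({u, v}) in E}


def degree(u, V, E):
    return len(neighbors(u, V, E))


def is_path(path, V, E):
    """A tuple (u_0,...,u_k) with k > 0 whose consecutive vertices are neighbors."""
    return len(path) >= 2 and all(
        frozenset({path[i], path[i + 1]}) in E for i in range(len(path) - 1)
    )


def connected(u, v, V, E):
    """Breadth-first search from u; u is connected to itself by definition."""
    seen = {u}
    queue = deque([u])
    while queue:
        w = queue.popleft()
        for x in neighbors(w, V, E):
            if x not in seen:
                seen.add(x)
                queue.append(x)
    return v in seen


def is_connected_graph(V, E):
    return all(connected(u, v, V, E) for u in V for v in V)
```

In the example graph, vertices `0` and `2` are connected but `0` and `3` are not, so the graph as a whole is not connected.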

Here are some basic facts about undirected graphs. We give some informal arguments below, but leave the full proofs as exercises. (The proofs can also be found in most basic texts on graph theory.)

In any undirected graph \(G=(V,E)\), the sum of the degrees of all vertices is equal to twice the number of edges.

This lemma can be shown by seeing that every edge \(\{ u,v\}\) contributes twice to the sum of the degrees (once for \(u\) and the second time for \(v\)).
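A numerical check is no substitute for a proof, but it can help build intuition for the degree-sum fact. The sketch below tests it on a pseudo-random graph; the generator `random_graph` and its parameters are hypothetical helpers, not part of the text.

```python
import random
from itertools import combinations


def random_graph(n, p, seed):
    """Return (V, E) where each possible edge is included with probability p."""
    rng = random.Random(seed)
    V = set(range(n))
    E = {frozenset(pair) for pair in combinations(V, 2) if rng.random() < p}
    return V, E


def degree(u, E):
    """Each edge containing u contributes exactly one neighbor of u."""
    return sum(1 for e in E if u in e)


V, E = random_graph(8, 0.4, seed=0)
degree_sum = sum(degree(u, E) for u in V)
```

Whatever edges the generator happens to pick, `degree_sum` equals `2 * len(E)`, since every edge is counted once from each of its two endpoints.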

The connectivity relation is *transitive*, in the sense that if \(u\) is
connected to \(v\), and \(v\) is connected to \(w\), then \(u\) is connected to
\(w\).

This lemma can be shown by simply attaching a path of the form \((u,u_1,u_2,\ldots,u_{k-1},v)\) to a path of the form \((v,u'_1,\ldots,u'_{k'-1},w)\) to obtain the path \((u,u_1,\ldots,u_{k-1},v,u'_1,\ldots,u'_{k'-1},w)\) that connects \(u\) to \(w\).

For every undirected graph \(G=(V,E)\) and connected pair \(u,v\), the shortest path from \(u\) to \(v\) is simple. In particular, for every connected pair there exists a simple path that connects them.

This lemma can be shown by “shortcutting” any non-simple path of the form \((u,u_1,\ldots,u_{i-1},w,u_{i+1},\ldots,u_{j-1},w,u_{j+1},\ldots,u_{k-1},v)\) where the same vertex \(w\) appears in both the \(i\)-th and \(j\)-th positions, to obtain the shorter path \((u,u_1,\ldots,u_{i-1},w,u_{j+1},\ldots,u_{k-1},v)\).
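The shortcutting step can be applied repeatedly until no vertex repeats. The sketch below is one possible way to do this in a single pass, assuming the input is a valid path given as a list whose endpoints are distinct; the function name `shortcut` is hypothetical.

```python
def shortcut(path):
    """Cut out the segment between repeated occurrences of a vertex,
    returning a simple path with the same endpoints."""
    simple = []    # the simple path built so far
    position = {}  # position[w] = index of w in `simple`
    for u in path:
        if u in position:
            # u already appears: drop everything after its first occurrence.
            cut = position[u]
            for w in simple[cut + 1:]:
                del position[w]
            simple = simple[:cut + 1]
        else:
            position[u] = len(simple)
            simple.append(u)
    return simple


shortened = shortcut([0, 1, 2, 3, 1, 4, 5])
```

In this example the vertex `1` appears twice, so the detour through `2` and `3` is removed. Every consecutive pair in the output is also a consecutive pair in the input, so the result is still a path in the same graph.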

If you haven’t seen these proofs before, it is indeed a great exercise to transform the above informal arguments into fully rigorous proofs.

A *directed graph* \(G=(V,E)\) consists of a set \(V\) and a set
\(E \subseteq V\times V\) of *ordered pairs* of \(V\). We denote the edge
\((u,v)\) also as \(\overrightarrow{u v}\). If the edge
\(\overrightarrow{u v}\) is present in the graph then we say that \(v\) is
an *out-neighbor* of \(u\) and \(u\) is an *in-neighbor* of \(v\).

A directed graph might contain both \(\overrightarrow{u v}\) and
\(\overrightarrow{v u}\) in which case \(u\) will be both an in-neighbor and
an out-neighbor of \(v\) and vice versa. The *in-degree* of \(u\) is the
number of in-neighbors it has, and the *out-degree* of \(v\) is the number
of out-neighbors it has. A *path* in the graph is a tuple
\((u_0,\ldots,u_k) \in V^{k+1}\), for some \(k>0\), such that \(u_{i+1}\) is an
out-neighbor of \(u_i\) for every \(i\in [k]\). As in the undirected case, a
*simple path* is a path \((u_0,\ldots,u_{k-1})\) where all the \(u_i\)’s are
distinct and a *cycle* is a path \((u_0,\ldots,u_k)\) where \(u_0=u_{k}\).
One type of directed graph we often care about is *directed acyclic
graphs* or *DAGs*, which, as their name implies, are directed graphs
without any cycles.

The lemmas we mentioned above have analogs for directed graphs. We again leave the proofs (which are essentially identical to their undirected analogs) as exercises for the reader:

In any directed graph \(G=(V,E)\), the sum of the in-degrees is equal to the sum of the out-degrees, which is equal to the number of edges.

In any directed graph \(G\), if there is a path from \(u\) to \(v\) and a path from \(v\) to \(w\), then there is a path from \(u\) to \(w\).

For every directed graph \(G=(V,E)\) and a pair \(u,v\) such that there is a
path from \(u\) to \(v\), the *shortest path* from \(u\) to \(v\) is simple.
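The first of these directed analogs is easy to check on an example. The following Python sketch assumes a directed graph is given as a list of ordered pairs; the names `directed_edges`, `in_degree`, and `out_degree` are illustrative.

```python
from collections import Counter

# A hypothetical directed graph: the pair (u, v) stands for the edge u -> v.
directed_edges = [(0, 1), (1, 0), (1, 2), (3, 2)]

# Each edge u -> v adds one to the out-degree of u and one to the in-degree of v.
out_degree = Counter(u for u, _ in directed_edges)
in_degree = Counter(v for _, v in directed_edges)

sum_out = sum(out_degree.values())
sum_in = sum(in_degree.values())
```

Both sums count each edge exactly once, which is why they agree with the number of edges.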

The word *graph* in the sense above was coined by the mathematician
Sylvester in 1878 in analogy with the chemical graphs used to visualize
molecules. There is an unfortunate confusion with the more common usage
of the term as a way to plot data, and in particular a plot of some
function \(f(x)\) as a function of \(x\). We can merge these two meanings by
thinking of a function \(f:A \rightarrow B\) as a special case of a
directed graph over the vertex set \(V= A \cup B\) where we put the edge
\(\overrightarrow{x f(x)}\) for every \(x\in A\). In a graph constructed in
this way every vertex in \(A\) has out-degree one.
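To make this correspondence concrete, here is a small Python sketch that builds the edge set of the directed graph of a function. The helper name `function_graph` and the choice of the squaring function are assumptions for illustration.

```python
def function_graph(f, domain):
    """Return the edge set {(x, f(x)) : x in domain} of the directed graph of f."""
    return {(x, f(x)) for x in domain}

A = range(5)
square_edges = function_graph(lambda x: x * x, A)

# Out-degree of each x in A: the number of edges that start at x.
out_degrees = {x: sum(1 for (s, _) in square_edges if s == x) for x in A}
```

Every element of `A` has out-degree one because a function assigns exactly one output to each input.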

The following lecture of Berkeley CS70 provides an excellent overview of graph theory.

If \(P\) and \(Q\) are some statements that can be true or false, then \(P\)
AND \(Q\) (denoted as \(P \wedge Q\)) is the statement that is true if and
only if both \(P\) *and* \(Q\) are true, and \(P\) OR \(Q\) (denoted as
\(P \vee Q\)) is the statement that is true if and only if either \(P\) *or*
\(Q\) is true. The *negation* of \(P\), denoted as \(\neg P\) or
\(\overline{P}\), is the statement that is true if and only if \(P\) is
false.

Suppose that \(P(x)\) is a statement that depends on some *parameter* \(x\)
(also sometimes known as an *unbound* variable) in the sense that for
every instantiation of \(x\) with a value from some set \(S\), \(P(x)\) is
either true or false. For example, \(x>7\) is a statement that is not a
priori true or false, but does become true or false whenever we
instantiate \(x\) with some real number. In such a case we denote by
\(\forall_{x\in S} P(x)\) the statement that is true if and only if \(P(x)\)
is true *for every* \(x\in S\). We denote by \(\exists_{x\in S} P(x)\) the
statement that is true if and only if *there exists* some \(x\in S\)
such that \(P(x)\) is true.

For example, the following is a formalization of the true statement that there exists a natural number \(n\) larger than \(100\) that is not divisible by \(3\):

\[ \exists_{n\in \N} (n>100) \wedge \left(\forall_{k\in \N} k+k+k \neq n\right) \;. \]
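Quantifiers over finite sets correspond closely to Python's built-in `any` (for \(\exists\)) and `all` (for \(\forall\)). The sketch below evaluates the statement above on a finite prefix of \(\N\); the bound of 200 and the helper name `not_divisible_by_three` are assumptions made so that the search terminates.

```python
def not_divisible_by_three(n):
    # forall k in N: k + k + k != n  (only k <= n can possibly give equality)
    return all(k + k + k != n for k in range(n + 1))

# exists n in N: (n > 100) and (forall k: k + k + k != n),
# searched over the finite prefix {0, ..., 199} of N.
witness_exists = any(n > 100 and not_divisible_by_three(n) for n in range(200))

# The smallest witness in that prefix.
first_witness = next(n for n in range(200) if n > 100 and not_divisible_by_three(n))
```

A finite search can confirm an existential statement by finding a witness, but it cannot by itself confirm a universal statement over all of \(\N\).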

*“For sufficiently large \(n\)”:* One expression that comes up time and
again is the claim that some statement \(P(n)\) is true “for sufficiently
large \(n\)”. What this means is that there exists an integer \(N_0\) such
that \(P(n)\) is true for every \(n>N_0\). We can formalize this as
\(\exists_{N_0\in \N} \forall_{n>N_0} P(n)\).

The following shorthands for summing up or taking products of several numbers are often convenient. If \(S = \{s_0,\ldots,s_{n-1} \}\) is a finite set and \(f:S \rightarrow \R\) is a function, then we write \(\sum_{x\in S} f(x)\) as shorthand for

\[ f(s_0) + f(s_1) + f(s_2) + \ldots + f(s_{n-1}) \;, \]

and \(\prod_{x\in S} f(x)\) as shorthand for

\[ f(s_0) \cdot f(s_1) \cdot f(s_2) \cdot \ldots \cdot f(s_{n-1}) \;. \]

For example, the sum of the squares of all numbers from \(1\) to \(100\) can be written as

\[ \sum_{i\in \{1,\ldots,100\}} i^2 \;. \label{eqsumsquarehundred} \]

Since summing up over intervals of integers is so common, there is a special notation for it, and for every two integers \(a \leq b\), \(\sum_{i=a}^b f(i)\) denotes \(\sum_{i\in S} f(i)\) where \(S =\{ x\in \Z : a \leq x \leq b \}\). Hence we can write the sum \eqref{eqsumsquarehundred} as

\[ \sum_{i=1}^{100} i^2 \;. \]
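These shorthands translate directly into code. As a minimal sketch, Python's built-in `sum` and the standard library's `math.prod` play the roles of \(\sum\) and \(\prod\); the variable names are illustrative.

```python
import math

# sum_{i=1}^{100} i^2; note that range(1, 101) is the set {1, ..., 100}
sum_of_squares = sum(i**2 for i in range(1, 101))

# prod_{i=1}^{5} i
product_1_to_5 = math.prod(range(1, 6))
```

The value of `sum_of_squares` agrees with the closed form \(\tfrac{n(n+1)(2n+1)}{6}\) at \(n=100\).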

In mathematics, as in coding, we often have symbolic “variables” or
“parameters”. It is important to be able to understand, given some
formula, whether a given variable is *bound* or *free* in this formula.
For example, in the following statement \(n\) is free but \(a\) and \(b\) are
bound by the \(\exists\) quantifier:

\[ \exists_{a,b \in \N} (a \neq 1) \wedge (a \neq n) \wedge (n = a \times b) \label{aboutnstmt} \]

Since \(n\) is free, it can be set to any value, and the truth of the statement \eqref{aboutnstmt} depends on the value of \(n\). For example, if \(n=8\) then \eqref{aboutnstmt} is true, but for \(n=11\) it is false. (Can you see why?)
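One way to see the difference between free and bound variables is to write the statement as a function: the free variable becomes the function's argument, while the bound variables only live inside the function's body. The sketch below does this in Python; the name `statement` is an assumption, and the search is restricted to \(a,b \leq n\), which loses nothing for \(n \geq 1\).

```python
def statement(n):
    """Truth value of: exists a, b in N with a != 1, a != n, and n == a * b.

    For n >= 1, any such a and b are between 1 and n, so a finite search suffices.
    """
    return any(a != 1 and a != n and n == a * b
               for a in range(n + 1) for b in range(n + 1))

true_for_8 = statement(8)    # for example a = 2, b = 4
true_for_11 = statement(11)  # 11 has no divisor other than 1 and 11
```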

The same issue appears when parsing code. For example, in a C++ loop such as `for (int i=0; i<n; i=i+1) { printf("*"); }`, the variable `i` is bound to the `for` operator but the variable `n` is free.

The main property of bound variables is that we can change them to a different name (as long as it doesn’t conflict with another used variable) without changing the meaning of the statement. Thus for example the statement

\[ \exists_{x,y \in \N} (x \neq 1) \wedge (x \neq n) \wedge (n = x \times y) \label{aboutnstmttwo} \] is equivalent to \eqref{aboutnstmt} in the sense that it is true for exactly the same set of \(n\)’s. Similarly, a loop such as `for (int j=0; j<n; j=j+1) { printf("*"); }` produces the same result as the code above that used `i` instead of `j`.

Mathematical notation has a lot of similarities with programming
languages, and for the same reasons. Both are formalisms meant to convey
complex concepts in a precise way. However, there are some cultural
differences. In programming languages, we often try to use meaningful
variable names such as `NumberOfVertices` while in math we often use
short identifiers such as \(n\). (Part of it might have to do with the
tradition of mathematical proofs as being handwritten and verbally
presented, as opposed to typed up and compiled.)

One consequence of that is that in mathematics we often end up reusing identifiers, and also “run out” of letters and hence use Greek letters too, as well as distinguish between lowercase and capital letters. Similarly, mathematical notation tends to use quite a lot of “overloading”, using operators such as \(+\) for a great variety of objects (e.g., real numbers, matrices, finite field elements, etc.), and assuming that the meaning can be inferred from the context.

Both fields have a notion of “types”, and in math we often try to reserve certain letters for variables of a particular type. For example, variables such as \(i,j,k,\ell,m,n\) will often denote integers, and \(\epsilon\) will often denote a small positive real number. When reading or writing mathematical texts, we usually don’t have the advantage of a “compiler” that will check type safety for us. Hence it is important to keep track of the type of each variable, and see that the operations that are performed on it “make sense”.

“\(\log\log\log n\) has been proved to go to infinity, but has never been observed to do so.”, Anonymous, quoted by Carl Pomerance (2000)

It is often very cumbersome to describe quantities such as running time
precisely, and it is also usually unnecessary, since we are typically mostly
interested in the “higher order terms”. That is, we want to understand
the *scaling behavior* of the quantity as the input variable grows. For
example, as far as running time goes, the difference between an
\(n^5\)-time algorithm and an \(n^2\)-time one is much more significant than
the difference between a \(100n^2 + 10n\) time algorithm and a \(10n^2\)
time algorithm. For this purpose, \(O\)-notation is extremely useful as a
way to “declutter” our text and focus our attention on what really
matters. For example, using \(O\)-notation, we can say that both
\(100n^2 + 10n\) and \(10n^2\) are simply \(\Theta(n^2)\) (which informally
means “the same up to constant factors”), while \(n^2 = o(n^5)\) (which
informally means that \(n^2\) is “much smaller than” \(n^5\)).

Generally (though still informally), if \(F,G\) are two functions mapping natural numbers to non-negative reals, then “\(F=O(G)\)” means that \(F(n) \leq G(n)\) if we don’t care about constant factors, while “\(F=o(G)\)” means that \(F\) is much smaller than \(G\), in the sense that no matter by what constant factor we multiply \(F\), if we take \(n\) to be large enough then \(G\) will be bigger (for this reason, sometimes \(F=o(G)\) is written as \(F \ll G\)). We will write \(F= \Theta(G)\) if \(F=O(G)\) and \(G=O(F)\), which one can think of as saying that \(F\) is the same as \(G\) if we don’t care about constant factors. More formally, we define Big-\(O\) notation as follows:

For \(F,G: \N \rightarrow \R_+\), we define \(F=O(G)\) if there exist
numbers \(a,N_0 \in \N\) such that \(F(n) \leq a\cdot G(n)\) for every
\(n>N_0\). We define \(F=\Omega(G)\) if \(G=O(F)\).

We write \(F =o(G)\) if for every \(\epsilon>0\) there is some \(N_0\) such that \(F(n) <\epsilon G(n)\) for every \(n>N_0\). We write \(F =\omega(G)\) if \(G=o(F)\). We write \(F= \Theta(G)\) if \(F=O(G)\) and \(G=O(F)\).
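To get a feel for the roles of \(a\) and \(N_0\) in the definition of \(F=O(G)\), the following Python sketch tests a proposed pair \((a,N_0)\) on a finite range. The function name `check_big_o_witness` and the cutoff `upto` are assumptions for illustration. A finite check like this can refute a proposed pair, but it cannot prove \(F=O(G)\), which is a statement about every \(n>N_0\).

```python
def check_big_o_witness(F, G, a, N0, upto=10_000):
    """Check that F(n) <= a * G(n) for every n with N0 < n <= upto."""
    return all(F(n) <= a * G(n) for n in range(N0 + 1, upto + 1))

F = lambda n: 100 * n**2 + 10 * n
G = lambda n: n**2

works_with_110 = check_big_o_witness(F, G, a=110, N0=0)  # 10n <= 10n^2 for n >= 1
works_with_100 = check_big_o_witness(F, G, a=100, N0=0)  # fails: 10n > 0
other_direction = check_big_o_witness(G, F, a=1, N0=0)   # n^2 <= 100n^2 + 10n
```

In this example both directions have witnesses, consistent with \(100n^2+10n = \Theta(n^2)\); the constant \(a=100\) is too small, while \(a=110\) suffices.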

We can also use the notion of *limits* to define Big- and Little-\(O\)
notation. You can verify that \(F=o(G)\) (or, equivalently, \(G=\omega(F)\))
if and only if
\(\lim\limits_{n\rightarrow\infty} \tfrac{F(n)}{G(n)} = 0\). Similarly, if
the limit \(\lim\limits_{n\rightarrow\infty} \tfrac{F(n)}{G(n)}\) exists
and is a finite number then \(F=O(G)\). If you are familiar with the
notion of *supremum*, then you can verify that \(F=O(G)\) if and only if
\(\limsup\limits_{n\rightarrow\infty} \tfrac{F(n)}{G(n)} < \infty\).

Using the equality sign for \(O\)-notation is extremely common, but is
somewhat of a misnomer, since a statement such as \(F = O(G)\) really
means that \(F\) is in the set
\(\{ G' : \exists_{N,c} \text{ s.t. } \forall_{n>N} G'(n) \leq c G(n) \}\).
For this reason, some texts write \(F \in O(G)\) instead of \(F = O(G)\). If
anything, it would have made more sense to use *inequalities* and write
\(F \leq O(G)\) and \(F \geq \Omega(G)\), reserving equality for
\(F = \Theta(G)\), but by now the equality notation is quite firmly
entrenched. Nevertheless, you should remember that a statement such as
\(F = O(G)\) means that \(F\) is “at most” \(G\) in some rough sense when we
ignore constants, and a statement such as \(F = \Omega(G)\) means that \(F\)
is “at least” \(G\) in the same rough sense.

It’s often convenient to use “anonymous functions” in the context of \(O\)-notation, and also to emphasize the input parameter to the function. For example, when we write a statement such as \(F(n) = O(n^3)\), we mean that \(F=O(G)\) where \(G\) is the function defined by \(G(n)=n^3\). Chapter 7 in Jim Aspnes’ notes on discrete math provides a good summary of \(O\) notation; see also this tutorial for a gentler and more programmer-oriented introduction.

There are some simple heuristics that can help when trying to compare two functions \(F\) and \(G\):

- Multiplicative constants don’t matter in \(O\)-notation, and so if \(F(n)=O(G(n))\) then \(100F(n)=O(G(n))\).
- When adding two functions, we only care about the larger one. For example, for the purpose of \(O\)-notation, \(n^3+100n^2\) is the same as \(n^3\), and in general in any polynomial, we only care about the largest exponent.
- For every two constants \(a,b>0\), \(n^a = O(n^b)\) if and only if \(a \leq b\), and \(n^a = o(n^b)\) if and only if \(a<b\). For example, combining the two observations above, \(100n^2 + 10n + 100 = o(n^3)\).
- Polynomial is always smaller than exponential: \(n^a = o(2^{n^\epsilon})\) for every two constants \(a>0\) and \(\epsilon>0\) even if \(\epsilon\) is much smaller than \(a\). For example, \(100n^{100} = o(2^{\sqrt{n}})\).
- Similarly, logarithmic is always smaller than polynomial: \((\log n)^a\) (which we write as \(\log^a n\)) is \(o(n^\epsilon)\) for every two constants \(a,\epsilon>0\). For example, combining the observations above, \(100n^2 \log^{100} n = o(n^3)\).
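These heuristics describe what happens for sufficiently large \(n\), and “sufficiently large” can be quite large. The Python sketch below compares \(100n^{100}\) with \(2^{\sqrt{n}}\) by comparing their base-2 logarithms, which avoids computing astronomically large numbers. The function names are assumptions for illustration.

```python
import math

def log2_poly(n):
    # log2 of 100 * n**100
    return math.log2(100) + 100 * math.log2(n)

def log2_exp(n):
    # log2 of 2**sqrt(n)
    return math.sqrt(n)

poly_wins_at_1e4 = log2_poly(10**4) > log2_exp(10**4)
exp_wins_at_1e8 = log2_exp(10**8) > log2_poly(10**8)
```

At \(n=10^4\) the polynomial is still far larger than the exponential, and the exponential only takes over somewhere between \(n=10^6\) and \(n=10^7\). This is consistent with \(100n^{100} = o(2^{\sqrt{n}})\), since little-\(o\) only constrains the behavior as \(n\) grows.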

In most (though not all!) cases where we use \(O\)-notation, the constants hidden by it are not too huge, and so on an intuitive level you can think of \(F=O(G)\) as saying something like \(F(n) \leq 1000 G(n)\) and \(F=\Omega(G)\) as saying something like \(F(n) \geq 0.001 G(n)\).

Many people think of mathematical proofs as a sequence of logical deductions that starts from some axioms and ultimately arrives at a conclusion. In fact, some dictionaries define proofs that way. This is not entirely wrong, but in reality a mathematical proof of a statement X is simply an argument that convinces the reader that X is true beyond a shadow of a doubt. To produce such a proof you need to:

- Understand precisely what X means.
- Convince *yourself* that X is true.
- Write your reasoning down in plain, precise and concise English (using formulas or notation only when they help clarity).

In many cases, Step 1 is the most important one. Understanding what a statement means is often more than halfway towards understanding why it is true. In Step 3, to convince the reader beyond a shadow of a doubt, we will often want to break down the reasoning to “basic steps”, where each basic step is simple enough to be “self evident”. The combination of all steps yields the desired statement.

There is a great deal of similarity between the process of writing
*proofs* and that of writing *programs*, and both require a similar set
of skills. Writing a *program* involves:

- Understanding what is the *task* we want the program to achieve.
- Convincing *yourself* that the task can be achieved by a computer, perhaps by planning on a whiteboard or notepad how you will break it up to simpler tasks.
- Converting this plan into code that a compiler or interpreter can understand, by breaking up each task into a sequence of the basic operations of some programming language.

In programs as in proofs, step 1 is often the most important one. A key
difference is that the reader for proofs is a human being and for
programs is a compiler. (This difference might be eroding with time, as
more proofs are written in *machine verifiable form* and progress in
artificial intelligence allows expressing programs in more human
friendly ways, such as “programming by example”. Interestingly, much
of the progress in automatic proof verification and proof assistants
relies on a much deeper correspondence
between *proofs* and *programs*. We *might* see this correspondence
later in this course.) Thus our emphasis is on *readability* and
having a *clear logical flow* for the proof (which is not a bad idea for
programs as well…). When writing a proof, you should think of your
audience as an intelligent but highly skeptical and somewhat petty
reader, who will “call foul” at every step that is not well justified.

To illustrate these ideas, let us consider the following example of a true theorem:

Every connected undirected graph of \(n\) vertices has at least \(n-1\) edges.

We are going to take our time to understand how one would come up with a proof for this theorem, and how to write such a proof down. This will not be the shortest way to prove it, but hopefully following this process will give you some general insights on reading, writing, and discovering mathematical proofs.

Before trying to prove the theorem, we need to understand what
it means. Let’s start with the terms in the theorem. We defined
undirected graphs and the notion of connectivity in the discussion of graphs
above. In particular, an undirected graph \(G=(V,E)\) is *connected* if
for every pair \(u,v \in V\), there is a path \((u_0,u_1,\ldots,u_k)\) such
that \(u_0=u\), \(u_k=v\), and \(\{ u_i,u_{i+1} \} \in E\) for every
\(i\in [k]\).

It is crucial that at this point you pause and verify that you completely understand the definition of connectivity. Indeed, you should make a habit of pausing after any statement of a theorem, even before looking at the proof, and verifying that you understand all the terms that the theorem refers to.

To prove the theorem we need to show that there is no
\(2\)-vertex connected graph with fewer than \(1\) edge, no \(3\)-vertex
connected graph with fewer than \(2\) edges, and so on and so forth. One
of the best ways to prove a theorem is to first try to *disprove it*. By
trying and failing to come up with a counterexample, we often understand
why the theorem cannot be false. For example, if you try to draw a
\(4\)-vertex graph with only two edges, you can see that there are
basically only two choices for such a graph as depicted in
figurefourvertexgraph, and in both there will remain some
vertices that cannot be connected.

In fact, we can see that if we have a budget of \(2\) edges and we choose
some vertex \(u\), we will not be able to connect to \(u\) more than two
other vertices, and similarly with a budget of \(3\) edges we will not be
able to connect to \(u\) more than three other vertices. We can keep
trying to draw such examples until we convince ourselves that the
theorem is probably true, at which point we want to see how we can
*prove* it.

If you have not seen the proof of this theorem before (or don’t remember
it), this would be an excellent point to pause and try to prove it
yourself. One way to do it would be to describe an *algorithm* that on
input a graph \(G\) on \(n\) vertices and \(n-2\) or fewer edges, finds a pair
\(u,v\) of vertices such that \(u\) is disconnected from \(v\).

There are several ways to prove the theorem. One approach
is to start by proving it for small graphs, such as graphs with 2, 3,
or 4 edges, for which we can check all the cases, and then try to extend
the proof for larger graphs. The technical term for this proof approach
is *proof by induction*.

*Induction* is simply an application of the self-evident Modus Ponens
rule that says that if
**(a)** \(P\) is true and **(b)** \(P\) implies \(Q\) then \(Q\) is true. In the
setting of proofs by induction we typically have a statement \(Q(k)\) that
is parameterized by some integer \(k\), and we prove that **(a)** \(Q(0)\)
is true and **(b)** for every \(k>0\), if \(Q(0),\ldots,Q(k-1)\) are all
true then \(Q(k)\) is true. (Usually proving **(b)** is the hard part, though there are
examples where the “base case” **(a)** is quite subtle.) By repeatedly applying
Modus Ponens, we can deduce from **(a)** and **(b)** that \(Q(1)\) is true, and then from
**(a)**, **(b)** and \(Q(1)\) that \(Q(2)\) is true, and so on and so forth
to obtain that \(Q(k)\) is true for every \(k\). The statement **(a)** is
called the “base case”, while **(b)** is called the “inductive step”.
The assumption in **(b)** that \(Q(i)\) holds for \(i<k\) is called the
“inductive hypothesis”.

Proofs by induction are closely related to algorithms by recursion. In both cases we reduce solving a larger problem to solving a smaller instance of itself. In a recursive algorithm to solve some problem P on an input of length \(k\) we ask ourselves “what if someone handed me a way to solve P on instances smaller than \(k\)?”. In an inductive proof to prove a statement Q parameterized by a number \(k\), we ask ourselves “what if I already knew that \(Q(k')\) is true for \(k'<k\)?”. Both induction and recursion are crucial concepts for this course and Computer Science at large (and even other areas of inquiry, including not just mathematics but other sciences as well). Both can be initially (and even post-initially) confusing, but with time and practice they become clearer. For more on proofs by induction and recursion, you might find the following Stanford CS 103 handout, this MIT 6.00 lecture or this excerpt of the Lehman-Leighton book useful.
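As a small illustration of this parallel, consider the following Python sketch of a recursive function. The example (summing the numbers \(0,\ldots,k\)) and the name `triangle` are assumptions chosen for illustration. The structure of the code mirrors an inductive proof that the sum equals \(k(k+1)/2\): the `if` branch plays the role of the base case, and the recursive call plays the role of the inductive hypothesis.

```python
def triangle(k):
    """Compute 0 + 1 + ... + k recursively."""
    if k == 0:
        return 0                  # base case: mirrors proving Q(0)
    return triangle(k - 1) + k    # assume the answer for k-1, extend it to k

# Compare against the closed form k(k+1)/2 for a range of small values.
matches_closed_form = all(triangle(k) == k * (k + 1) // 2 for k in range(200))
```

In the inductive proof, the step from \(k-1\) to \(k\) is the calculation \(\tfrac{(k-1)k}{2} + k = \tfrac{k(k+1)}{2}\), which corresponds to the last line of the function.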

There are several ways to use induction to prove the theorem. We will do so by following our intuition above that with a budget of \(k\) edges, we cannot connect more than \(k\) other vertices to a given vertex. That is, we will define the statement \(Q(k)\) as follows:

\(Q(k)\) is

“For every graph \(G=(V,E)\) with at most \(k\) edges and every \(u\in V\), the number of vertices that are connected to \(u\) (including \(u\) itself) is at most \(k+1\)”

Note that \(Q(n-2)\) implies our theorem, since it means that in an \(n\)
vertex graph of \(n-2\) edges, there would be at most \(n-1\) vertices that
are connected to \(u\), and hence in particular there would be *some*
vertex that is not connected to \(u\). More formally, if we define, given
any undirected graph \(G\) and vertex \(u\) of \(G\), the set \(C_G(u)\) to
contain all vertices connected to \(u\), then the statement \(Q(k)\) is that
for every undirected graph \(G=(V,E)\) with \(|E| \leq k\) and \(u\in V\),
\(|C_G(u)| \leq k+1\).
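The set \(C_G(u)\) can be computed by a standard breadth-first search. The following Python sketch is one way to do so, assuming the vertices are numbered \(0,\ldots,n-1\) and the graph is given as a list of edges; the name `connected_set` and the example graph are illustrative.

```python
from collections import deque

def connected_set(n, edges, u):
    """Return C_G(u) for the graph on vertices 0..n-1 with the given edges."""
    neighbors = {v: set() for v in range(n)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen = {u}               # u is always connected to itself
    queue = deque([u])
    while queue:
        v = queue.popleft()
        for w in neighbors[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

edges = [(0, 1), (1, 2), (3, 4)]   # hypothetical 6-vertex graph with k = 3 edges
component = connected_set(6, edges, 0)
```

In this example \(|C_G(0)| = 3 \leq k+1 = 4\), as \(Q(3)\) predicts, and the isolated vertex 5 is connected only to itself.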

To prove that \(Q(k)\) is true for every \(k\) by induction, we will first
prove that **(a)** \(Q(0)\) is true, and then prove **(b)** if
\(Q(0),\ldots,Q(k-1)\) are true then \(Q(k)\) is true as well. In fact, we
will prove the stronger statement **(b’)** that if \(Q(k-1)\) is true then
\(Q(k)\) is true as well. (**(b’)** is a stronger statement than **(b)**
because it has the same conclusion with a weaker assumption.) Thus, if we
show both **(a)** and **(b’)** then we complete the proof of
the theorem.

Proving **(a)** (i.e., the “base case”) is actually quite easy. The
statement \(Q(0)\) says that if \(G\) has zero edges, then \(|C_G(u)|=1\), but
this is clear because in a graph with zero edges, \(u\) is only connected
to itself. The heart of the proof, as is typical with induction proofs,
is in proving a statement such as **(b’)** (or even the weaker statement
**(b)**). Since we are trying to prove an *implication*, we can *assume*
the so-called “inductive hypothesis” that \(Q(k-1)\) is true and need to
prove from this assumption that \(Q(k)\) is true. So, suppose that
\(G=(V,E)\) is a graph of \(k\) edges, and \(u\in V\). Since we can use
induction, a natural approach would be to remove an edge \(e\in E\) from
the graph to create a new graph \(G'\) of \(k-1\) edges. We can use the
induction hypothesis to argue that \(|C_{G'}(u)| \leq k\). Now if we could
only argue that removing the edge \(e\) reduced the connected component of
\(u\) by at most a single vertex, then we would be done, as we could argue
that \(|C_G(u)| \leq |C_{G'}(u)|+1 \leq k+1\).

Please ensure that you understand why showing that \(|C_G(u)| \leq |C_{G'}(u)|+1\) completes the inductive proof.

Alas, this might not be the case. It could be that removing a single edge \(e\) will greatly reduce the size of \(C_{G}(u)\). For example, that edge might be a “bridge” between two large connected components; such a situation is illustrated in effectofoneedgefig. This might seem like a real stumbling block, and at this point we might go back to the drawing board to see if perhaps the theorem is false after all. However, if we look at various concrete examples, we see that in any concrete example, there is always a “good” choice of an edge, adding which will increase the component connected to \(u\) by at most one vertex.

The crucial observation is that this always holds if we choose an edge \(e = \{ s, w\}\) where \(w \in C_G(u)\) has degree one in the graph \(G\), see addingdegreeonefig. The reason is simple. Since every path from \(u\) to \(w\) must pass through \(s\) (which is \(w\)’s only neighbor), removing the edge \(\{ s,w \}\) merely has the effect of disconnecting \(w\) from \(u\), and hence \(C_{G'}(u) = C_G(u) \setminus \{ w \}\) and in particular \(|C_{G'}(u)|=|C_G(u)|-1\), which is exactly the condition we needed.

Now the question is whether there will always be a degree one vertex in \(C_G(u) \setminus \{u \}\). Of course generally we are not guaranteed that a graph would have a degree one vertex, but we are not dealing with a general graph here but rather a graph with a small number of edges. We can assume that \(|C_G(u)| > k+1\) (otherwise we’re done) and each vertex in \(C_G(u)\) must have degree at least one (as otherwise it would not be connected to \(u\)). Thus, the only case where there is no vertex \(w\in C_G(u) \setminus \{u\}\) of degree one is when the degrees of all vertices in \(C_G(u) \setminus \{u\}\) are at least \(2\). But then, since the sum of the degrees is twice the number of edges, the number of edges in the graph is at least \(\tfrac{1}{2}\cdot 2 \cdot (k+1)>k\), which contradicts our assumption that the graph \(G\) has at most \(k\) edges. Thus we can conclude that either \(|C_G(u)| \leq k+1\) (in which case we’re done) or there is a degree one vertex \(w\neq u\) that is connected to \(u\). By removing the single edge \(e\) that touches \(w\), we obtain a \(k-1\) edge graph \(G'\) which (by the inductive hypothesis) satisfies \(|C_{G'}(u)| \leq k\), and hence \(|C_G(u)|=|C_{G'}(u) \cup \{ w \}| \leq k+1\). This suffices to complete an inductive proof of statement \(Q(k)\).

All of the above was a discussion of how we *discover* the proof, and
convince *ourselves* that the statement is true. However, once we do
that, we still need to write it down. When writing the proof, we use the
benefit of hindsight, and try to streamline what was a messy journey
into a linear and easy-to-follow flow of logic that starts with the word
**“Proof:”** and ends with **“QED”** or the symbol \(\blacksquare\). Just as we
break a program into functions, we can break the proof into pieces
(known as *lemmas* or *claims* in math language), which will be smaller
statements that help us prove the main result. However, it should always
be crystal-clear to the reader what stage of the proof we are in. Just
as it should always be clear to which function a line of code
belongs, it should always be clear whether an individual sentence is part of
a proof of some intermediate result, or is part of the argument showing
that this intermediate result implies the theorem. Sometimes we
highlight this partition by noting after each occurrence of **“QED”** to
which lemma or claim it belongs.

Let us see how the proof of the theorem looks in this streamlined fashion. We start by repeating the theorem statement:

Every connected undirected graph of \(n\) vertices has at least \(n-1\) edges.

The proof will follow from the following lemma:

For every \(k\in \N\), undirected graph \(G=(V,E)\) of at most \(k\) edges, and \(u\in V\), the number of vertices connected to \(u\) in \(G\) is at most \(k+1\).

We start by showing that the lemma implies the theorem:

Proof of the theorem from the lemma: We will show that for every undirected graph \(G=(V,E)\) of \(n\) vertices and at most \(n-2\) edges, there is a pair \(u,v\) of vertices that are disconnected in \(G\). Let \(G\) be such a graph and \(u\) be some vertex of \(G\). By the lemma (applied with \(k=n-2\)), the number of vertices connected to \(u\) is at most \(n-1\), and hence (since \(|V|=n\)) there is a vertex \(v\in V\) that is not connected to \(u\), thus completing the proof. QED (Proof of the theorem from the lemma)

We now turn to proving the lemma. Let \(G=(V,E)\) be an undirected graph of \(k\) edges and \(u\in V\). We define \(C_G(u)\) to be the set of vertices connected to \(u\). To complete the proof of the lemma, we need to prove that \(|C_G(u)| \leq k+1\). We will do so by induction on \(k\).

The *base* case that \(k=0\) is true because in a graph with zero edges, \(u\)
is only connected to itself.

Now suppose that the lemma is true for \(k-1\); we will
prove it for \(k\). Let \(G=(V,E)\) and \(u\in V\) be as above, where \(|E|=k\),
and suppose (towards a contradiction) that \(|C_G(u)| \geq k+2\). Let
\(S = C_G(u) \setminus \{u \}\). Denote by \(deg(v)\) the degree of any
vertex \(v\). By the lemma above on the sum of the degrees,
\(\sum_{v\in S} deg(v) \leq \sum_{v\in V} deg(v) = 2|E|=2k\). Hence in
particular, under our assumption that \(|S|+1=|C_G(u)| \geq k+2\), we get
that \(\tfrac{1}{|S|}\sum_{v\in S} deg(v) \leq 2k/(k+1)< 2\). In other
words, the *average* degree of a vertex in \(S\) is smaller than \(2\), and
hence in particular there is *some* vertex \(w\in S\) with degree smaller
than \(2\). Since \(w\) is connected to \(u\), it must have degree at least
one, and hence (since \(w\)’s degree is smaller than two) degree *exactly*
one. In other words, \(w\) has a single neighbor which we denote by \(s\).

Let \(G'\) be the graph obtained by removing the edge \(\{ s, w\}\) from \(G\). Since \(G'\) has at most \(k-1\) edges, by the inductive hypothesis we can assume that \(|C_{G'}(u)| \leq k\). The proof of the lemma is concluded by showing the following claim:

Claim: Under the above assumptions, \(|C_G(u)| \leq |C_{G'}(u)|+1\).

Proof of claim: The claim says that \(C_{G'}(u)\) has at most one fewer element than \(C_G(u)\). Thus it follows from the following statement \((*)\): \(C_{G'}(u) \supseteq C_G(u) \setminus \{ w \}\). To prove \((*)\) we need to show that for every \(v \neq w\) that is connected to \(u\) in \(G\), \(v \in C_{G'}(u)\). Indeed, for every such \(v\), the lemma above on shortest paths implies that there must be some *simple* path \((t_0,t_1,\ldots,t_{i-1},t_i)\) in the graph \(G\) where \(t_0=u\) and \(t_i=v\). But \(w\) cannot belong to this path, since \(w\) is different from the endpoints \(u\) and \(v\) of the path and can’t equal one of the intermediate points either, since it has degree one and that would make the path not simple. More formally, if \(w=t_j\) for \(0 < j < i\), then since \(w\) has only a single neighbor \(s\), it would have to hold that \(s=t_{j-1}=t_{j+1}\), contradicting the simplicity of the path. Hence the path from \(u\) to \(v\) is also a path in the graph \(G'\), which means that \(v \in C_{G'}(u)\), which is what we wanted to prove. QED (claim)

The claim implies the lemma since by the inductive
assumption, \(|C_{G'}(u)| \leq k\), and hence by the claim
\(|C_G(u)| \leq k+1\), which is what we wanted to prove. This concludes
the proof of the lemma and hence also of
the theorem. **QED (lemma)**, **QED (theorem)**
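A proof establishes the theorem for graphs of every size, but it can still be reassuring to confirm it by brute force on small cases, in the spirit of first trying to disprove it. The Python sketch below enumerates every graph on \(n\) vertices for small \(n\) and checks that each connected one has at least \(n-1\) edges. The function names are assumptions for illustration, and the enumeration is only feasible for very small \(n\), since there are \(2^{n(n-1)/2}\) graphs.

```python
from itertools import combinations

def is_connected(n, edges):
    """Check connectivity by growing the set of vertices reachable from vertex 0."""
    reached = {0}
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            # an edge with exactly one endpoint reached extends the reached set
            if (a in reached) != (b in reached):
                reached |= {a, b}
                changed = True
    return len(reached) == n

def theorem_holds_for(n):
    """Check every graph on n vertices: connected implies at least n-1 edges."""
    all_pairs = list(combinations(range(n), 2))
    for m in range(len(all_pairs) + 1):
        for edges in combinations(all_pairs, m):
            if is_connected(n, edges) and m < n - 1:
                return False  # a counterexample to the theorem
    return True

no_counterexample = all(theorem_holds_for(n) for n in range(1, 6))
```

Such a check is not a substitute for the proof: it says nothing about graphs with six or more vertices.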

The proof above used the observation that if the *average* of some \(n\)
numbers \(x_0,\ldots,x_{n-1}\) is at most \(X\), then there must *exist* at
least a single number \(x_i \leq X\). (In this particular proof, the
numbers were the degrees of vertices in \(S\).) This is known as the
*averaging principle*, and despite its simplicity, it is often extremely
useful.
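The averaging principle is simple enough to state in a few lines of Python. The list of numbers below is a hypothetical example, and the variable names are illustrative.

```python
from statistics import mean

numbers = [1, 3, 2, 1, 2]   # hypothetical values x_0, ..., x_{n-1}
X = mean(numbers)           # their average
smallest = min(numbers)     # the minimum can never exceed the average
has_small_element = any(x <= X for x in numbers)
```

The reason is that if every \(x_i\) were larger than \(X\), then the sum of the \(x_i\)'s would be larger than \(nX\), and so their average would be larger than \(X\).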

Reading a proof is no less of an important skill than producing one. In
fact, just like understanding code, it is a highly non-trivial skill in
itself. Therefore I strongly suggest that you re-read the above proof,
asking yourself at every sentence whether the assumptions it makes are
justified, and whether this sentence truly demonstrates what it purports
to achieve. Another good habit is to ask yourself when reading a proof
for every variable you encounter (such as \(u\), \(t_i\), \(G'\), etc. in the
above proof) the following questions: **(1)** What *type* of variable is
it? Is it a number? A graph? A vertex? A function? **(2)** What do
we know about it? Is it an arbitrary member of the set? Have we shown
some facts about it? **(3)** What are we *trying* to show about
it?

A mathematical proof is a piece of writing, but it is a specific genre of writing with certain conventions and preferred styles. As in any writing, practice makes perfect, and it is also important to revise your drafts for clarity.

In a proof for the statement \(X\), all the text between the words
**“Proof:”** and **“QED”** should be focused on establishing that \(X\) is
true. Digressions, examples, or ruminations should be kept outside these
two words, so they do not confuse the reader. The proof should have a
clear logical flow in the sense that every sentence or equation in it
should have some purpose and it should be crystal-clear to the reader
what this purpose is. When you write a proof, for every equation or
sentence you include, ask yourself:

- Is this sentence or equation stating that some statement is true?
- If so, does this statement follow from the previous steps, or are we going to establish it in the next step?
- What is the *role* of this sentence or equation? Is it one step towards proving the original statement, or is it a step towards proving some intermediate claim that you have stated before?
- Finally, would the answers to questions 1-3 be clear to the reader? If not, then you should reorder, rephrase or add explanations.

Some helpful resources on mathematical writing include this handout by Lee, this handout by Hutching, as well as several of the excellent handouts in Stanford’s CS 103 class.

“If it was so, it might be; and if it were so, it would be; but as it isn’t, it ain’t. That’s logic.”, Lewis Carroll, *Through the Looking-Glass*.

Just like in programming, there are several common patterns of proofs that occur time and again. Here are some examples:

**Proofs by contradiction:** One way to prove that \(X\) is true is to
show that if \(X\) was false then we would get a contradiction as a
result. Such proofs often start with a sentence such as “Suppose,
towards a contradiction, that \(X\) is false” and end with deriving some
contradiction (such as a violation of one of the assumptions in the
theorem statement). Here is an example:

There are no natural numbers \(a,b\) such that \(\sqrt{2} = \tfrac{a}{b}\).

Suppose, for the sake of contradiction, that this is false, and so
let \(a\in \N\) be the smallest number such that there exists some
\(b\in\N\) satisfying \(\sqrt{2}=\tfrac{a}{b}\). Squaring this equation we
get that \(2=a^2/b^2\) or \(a^2=2b^2\) \((*)\). But this means that \(a^2\) is
*even*, and since the product of two odd numbers is odd, it means that
\(a\) is even as well, or in other words, \(a = 2a'\) for some \(a' \in \N\).
Yet plugging this into \((*)\) shows that \(4a'^2 = 2b^2\) which means
\(b^2 = 2a'^2\) is an even number as well. By the same considerations as
above we get that \(b\) is even and hence \(a/2\) and \(b/2\) are two natural
numbers satisfying \(\tfrac{a/2}{b/2}=\sqrt{2}\), contradicting the
minimality of \(a\).
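
A finite search proves nothing about *all* natural numbers, but it can be a useful sanity check before attempting a proof like the one above. The following Python sketch (the function name `rational_sqrt2_candidates` and the search bound are hypothetical choices) looks for pairs \(a,b\) with \(a^2 = 2b^2\), which is the equation \((*)\) from the proof.

```python
def rational_sqrt2_candidates(limit):
    """Return all pairs (a, b) with 1 <= b <= limit and a*a == 2*b*b.

    Since 1 < sqrt(2) < 2, any such a must satisfy b <= a <= 2b,
    so it suffices to search that range for each b.
    """
    return [
        (a, b)
        for b in range(1, limit + 1)
        for a in range(b, 2 * b + 1)
        if a * a == 2 * b * b
    ]
```

Using exact integer arithmetic (rather than comparing floating-point values of \(a/b\) with \(\sqrt{2}\)) avoids rounding issues. The search comes back empty for every bound, as the proof guarantees.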

**Proofs of a universal statement:** Often we want to prove a statement
\(X\) of the form “Every object of type \(O\) has property \(P\).” Such proofs
often start with a sentence such as “Let \(o\) be an object of type \(O\)”
and end by showing that \(o\) has the property \(P\). Here is a simple
example:

For every natural number \(n\in \N\), either \(n\) or \(n+1\) is even.

Let \(n\in \N\) be some number. If \(n/2\) is a whole number then we are done, since then \(n=2(n/2)\) and hence it is even. Otherwise, \(n/2+1/2\) is a whole number, and hence \(2(n/2+1/2)=n+1\) is even.

**Proofs of an implication:** Another common case is that the statement
\(X\) has the form “\(A\) implies \(B\)”. Such proofs often start with a
sentence such as “Assume that \(A\) is true” and end with a derivation of
\(B\) from \(A\). Here is a simple example:

If \(a \neq 0\) and \(b^2 \geq 4ac\) then there is a solution to the quadratic equation \(ax^2 + bx + c =0\).

Suppose that \(b^2 \geq 4ac\). Then \(d = b^2 - 4ac\) is a non-negative number and hence it has a square root \(s\). Thus \(x = (-b+s)/(2a)\) satisfies \[ \begin{aligned} ax^2 + bx + c &= a(-b+s)^2/(4a^2) + b(-b+s)/(2a) + c \\ &= (b^2-2bs+s^2)/(4a)+(-b^2+bs)/(2a)+c \;. \label{eq:quadeq} \end{aligned} \]

Rearranging the terms of \eqref{eq:quadeq} and using \(s^2 = b^2-4ac\) we get \[ s^2/(4a)+c- b^2/(4a) = (b^2-4ac)/(4a) + c - b^2/(4a) = 0 \;, \] which means that \(x\) is indeed a solution of the equation.
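
The computation above can also be checked numerically. The sketch below is only an illustration under stated assumptions: the function name `quadratic_root` is hypothetical, it uses floating-point arithmetic (so the result is exact only up to rounding), and it explicitly rejects \(a = 0\) because the formula divides by \(2a\).

```python
import math

def quadratic_root(a, b, c):
    """Return x = (-b + s) / (2a), where s is the square root of d = b^2 - 4ac."""
    if a == 0:
        raise ValueError("the formula divides by 2a, so it needs a != 0")
    d = b * b - 4 * a * c
    if d < 0:
        raise ValueError("the hypothesis b^2 >= 4ac does not hold")
    s = math.sqrt(d)
    return (-b + s) / (2 * a)

# Example: x^2 - 3x + 2 = 0 has d = 9 - 8 = 1, so s = 1 and x = (3 + 1) / 2.
x = quadratic_root(1, -3, 2)
residual = 1 * x * x - 3 * x + 2  # should vanish if x is a solution
```

For inputs whose discriminant is not a perfect square, one should compare `residual` to zero with a small tolerance rather than with exact equality.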

**Proofs of equivalence:** If a statement has the form “\(A\) if and only
if \(B\)” (often shortened as “\(A\) iff \(B\)”) then we need to prove both
that \(A\) implies \(B\) and that \(B\) implies \(A\). We call the implication
that \(A\) implies \(B\) the “only if” direction, and the implication that
\(B\) implies \(A\) the “if” direction.

**Proofs by combining intermediate claims:** When a proof is more
complex, it is often helpful to break it apart into several steps. That
is, to prove the statement \(X\), we might first prove statements
\(X_1\),\(X_2\), and \(X_3\) and then prove that \(X_1 \wedge X_2 \wedge X_3\)
implies \(X\).

**Proofs by case distinction:** This is a special case of the above,
where to prove a statement \(X\) we split into several cases
\(C_1,\ldots,C_k\), and prove that **(a)** the cases are *exhaustive*, in
the sense that *one* of the cases \(C_i\) must happen and **(b)** go one
by one and prove that each one of the cases \(C_i\) implies the result \(X\)
that we are after.

**“Without loss of generality (w.l.o.g.)”:** This term can be initially
quite confusing to students. It is essentially a way to shorten case
distinctions such as the above. The idea is that if Case 1 is equal to
Case 2 up to a change of variables or a similar transformation, then the
proof of Case 1 will also imply the proof of Case 2. It is always a
statement that should be viewed with suspicion. Whenever you see it in a
proof, ask yourself if you understand *why* the assumption made is truly
without loss of generality, and when you use it, try to see if the use
is indeed justified. Sometimes it might be easier to just repeat the
proof of the second case (adding a remark that the proof is very similar
to the first one).

**Proofs by induction:** We can think of such proofs as a variant of the
above, where we have an unbounded number of intermediate claims
\(X_0,X_1,\ldots,X_k\), and we prove that \(X_0\) is true, as well as that
\(X_0\) implies \(X_1\), and that \(X_0 \wedge X_1\) implies \(X_2\), and so on
and so forth. The website for CMU course 15-251 contains a useful
handout on potential pitfalls when making proofs by induction.

Mathematical proofs are ultimately written in English prose. The
well-known computer scientist Leslie Lamport argued that this
is a problem, and proofs should be written in a more formal and rigorous
way. In his manuscript he proposes an approach for *structured
hierarchical proofs*, which have the following form:

- A proof for a statement of the form “If \(A\) then \(B\)” is a sequence of numbered claims, starting with the assumption that \(A\) is true, and ending with the claim that \(B\) is true.
- Every claim is followed by a proof showing how it is derived from the previous assumptions or claims.
- The proof for each claim is itself a sequence of subclaims.

The advantage of Lamport’s format is that the role of every sentence in the proof is very clear. It is also much easier to transform such proofs into a machine-checkable format. The disadvantage is that such proofs can be more tedious to read and write, with less differentiation between the important parts of the argument and the more routine ones.

Most of the notation we discussed above is standard and is used in most mathematical texts. The main points where we diverge are:

- We index the natural numbers \(\N\) starting with \(0\) (though many other texts, especially in computer science, do the same).
- We also index the set \([n]\) starting with \(0\), and hence define it as \(\{0,\ldots,n-1\}\). In most texts it is defined as \(\{1,\ldots, n \}\). Similarly, we index coordinates of our strings starting with \(0\), and hence a string \(x\in \{0,1\}^n\) is written as \(x_0x_1\cdots x_{n-1}\).
- We use *partial* functions, which are functions that are not necessarily defined on all inputs. When we write \(f:A \rightarrow B\) this will refer to a *total* function unless we say otherwise. When we want to emphasize that \(f\) can be a partial function, we will sometimes write \(f: A \rightarrow_p B\).
- As we will see later on in the course, we will mostly describe our computational problems in terms of computing a *Boolean function* \(f: \{0,1\}^* \rightarrow \{0,1\}\). In contrast, most textbooks will refer to this as the task of *deciding a language* \(L \subseteq \{0,1\}^*\). These two viewpoints are equivalent, since for every set \(L\subseteq \{0,1\}^*\) there is a corresponding function \(f = 1_L\) such that \(f(x)=1\) if and only if \(x\in L\). Computing *partial functions* corresponds to the task known in the literature as solving a *promise problem*. Because the language notation is so prevalent in textbooks, we will occasionally remind the reader of this correspondence.
- Some other notation we use is \(\ceil{x}\) and \(\floor{x}\) for the “ceiling” and “floor” operators that correspond to “rounding up” or “rounding down” a number to the nearest integer. We use \((x \mod y)\) to denote the “remainder” of \(x\) when divided by \(y\). That is, \((x \mod y) = x - y\floor{x/y}\). In contexts where an integer is expected we’ll typically “silently round” the quantities to an integer. For example, if we say that \(x\) is a string of length \(\sqrt{n}\) then we’ll typically mean that \(x\) is of length \(\lceil \sqrt{n} \rceil\). (In most such cases, it will not make a difference whether we round up or down.)
- Like most Computer Science texts, we default to the logarithm in base two. Thus, \(\log n\) is the same as \(\log_2 n\).
- We will also use the notation \(f(n)=poly(n)\) as shorthand for \(f(n)=n^{O(1)}\) (i.e., as shorthand for saying that there are some constants \(a,b\) such that \(f(n) \leq a\cdot n^b\) for every sufficiently large \(n\)). Similarly, we will use \(f(n)=polylog(n)\) as shorthand for \(f(n)=poly(\log n)\) (i.e., as shorthand for saying that there are some constants \(a,b\) such that \(f(n) \leq a\cdot (\log n)^b\) for every sufficiently large \(n\)).
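
Several of the conventions above translate directly into code. The following Python sketch is one possible rendering, not part of the text's formal development; the helper names `bracket`, `mod`, and `indicator` are hypothetical, and `mod` is restricted to integers with \(y \neq 0\).

```python
import math

def bracket(n):
    """The set [n] = {0, ..., n-1}: indexing starts at 0."""
    return set(range(n))

def mod(x, y):
    """(x mod y) = x - y * floor(x / y), for integers x and y != 0.

    Python's // operator is floor division on integers, so this
    matches the definition in the text exactly.
    """
    return x - y * (x // y)

def indicator(L):
    """The Boolean function 1_L, where 1_L(x) = 1 if and only if x is in L."""
    return lambda x: 1 if x in L else 0

# A language (set of strings) and the Boolean function that corresponds to it.
L = {"0", "11", "101"}
f = indicator(L)

# "Silent rounding": a string of length sqrt(10) means length ceil(sqrt(10)).
rounded_length = math.ceil(math.sqrt(10))

# Logarithms default to base two.
log_of_eight = math.log2(8)
```

With these definitions, `mod(-7, 3)` follows the floor-based formula and so is non-negative, which differs from the truncating remainder used by some programming languages.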

- The basic “mathematical data structures” we’ll need are *numbers*, *sets*, *tuples*, *strings*, *graphs* and *functions*.
- We can use basic objects to define more complex notions. For example, *graphs* can be defined as a list of *pairs*.
- Given precise *definitions* of objects, we can state unambiguous and precise *statements*. We can then use mathematical *proofs* to determine whether these statements are true or false.
- A mathematical proof is not a formal ritual but rather a clear, precise and “bulletproof” argument certifying the truth of a certain statement.
- Big-\(O\) notation is an extremely useful formalism to suppress less significant details and allow us to focus on the high level behavior of quantities of interest.
- The only way to get comfortable with mathematical notions is to apply them in the context of solving problems. You should expect to need to go back time and again to the definitions and notation in this lecture as you work through problems in this course.

Most of the exercises were written in the summer of 2018 and haven’t yet been fully debugged. While I would prefer that people do not post solutions to the exercises online, I would greatly appreciate it if you let me know of any bugs. You can do so by posting a GitHub issue about the exercise, and optionally complement this with an email to me with more details about the attempted solution.

- Write a logical expression \(\varphi(x)\) involving the variables \(x_0,x_1,x_2\) and the operators \(\wedge\) (AND), \(\vee\) (OR), and \(\neg\) (NOT), such that \(\varphi(x)\) is true if the majority of the inputs are *True*.
- Write a logical expression \(\varphi(x)\) involving the variables \(x_0,x_1,x_2\) and the operators \(\wedge\) (AND), \(\vee\) (OR), and \(\neg\) (NOT), such that \(\varphi(x)\) is true if the sum \(\sum_{i=0}^{2} x_i\) (identifying “true” with \(1\) and “false” with \(0\)) is *odd*.

Use the logical quantifiers \(\forall\) (for all), \(\exists\) (there exists), as well as \(\wedge,\vee,\neg\) and the arithmetic operations \(+,\times,=,>,<\) to write the following:

- An expression \(\varphi(n,k)\) such that for all natural numbers \(n,k\), \(\varphi(n,k)\) is true if and only if \(k\) divides \(n\).
- An expression \(\varphi(n)\) such that for every natural number \(n\), \(\varphi(n)\) is true if and only if \(n\) is a power of three.

Describe in words the following sets:

- \(S = \{ x\in \{0,1\}^{100} : \forall_{i\in \{0,\ldots, 99\}} x_i = x_{99-i} \}\)
- \(T = \{ x\in \{0,1\}^* : \forall_{i,j \in \{2,\ldots,|x|-1 \} } i\cdot j \neq |x| \}\)

For each one of the following pairs of sets \((S,T)\), prove or disprove the following statement: there is a one to one function \(f\) mapping \(S\) to \(T\).

- Let \(n>10\). \(S = \{0,1\}^n\) and \(T= [n] \times [n] \times [n]\).
- Let \(n>10\). \(S\) is the set of all functions mapping \(\{0,1\}^n\) to \(\{0,1\}\). \(T = \{0,1\}^{n^3}\).
- Let \(n>100\). \(S = \{k \in [n] \;|\; k \text{ is prime} \}\), \(T = \{0,1\}^{\ceil{\log n -1}}\).

- Let \(A,B\) be finite sets. Prove that
\(|A\cup B| = |A|+|B|-|A\cap B|\).
- Let \(A_0,\ldots,A_{k-1}\) be finite sets. Prove that
\(|A_0 \cup \cdots \cup A_{k-1}| \geq \sum_{i=0}^{k-1} |A_i| - \sum_{0 \leq i < j < k} |A_i \cap A_j|\).
- Let \(A_0,\ldots,A_{k-1}\) be finite subsets of \(\{1,\ldots, n\}\), such that \(|A_i|=m\) for every \(i\in [k]\). Prove that if \(k>100n\), then there exist two distinct sets \(A_i,A_j\) s.t. \(|A_i \cap A_j| \geq m^2/(10n)\).

Prove that if \(S,T\) are finite and \(F:S \rightarrow T\) is one to one then \(|S| \leq |T|\).

Prove that if \(S,T\) are finite and \(F:S \rightarrow T\) is onto then \(|S| \geq |T|\).

Prove that for every finite \(S,T\), there are \((|T|+1)^{|S|}\) partial functions from \(S\) to \(T\).

Suppose that \(\{ S_n \}_{n\in \N}\) is a sequence such that \(S_0 \leq 10\) and for \(n>1\) \(S_n \leq 5 S_{\lfloor \tfrac{n}{5} \rfloor} + 2n\). Prove by induction that \(S_n \leq 100 n \log n\) for every \(n\).

Describe the following statement in English words: \(\forall_{n\in\N} \exists_{p>n} \forall_{a,b \in \N} (a\times b \neq p) \vee (a=1)\).

Prove that for every undirected graph \(G\) of \(100\) vertices, if every vertex has degree at most \(4\), then there exists a subset \(S\) of at least \(20\) vertices such that no two vertices in \(S\) are neighbors of one another.

For every pair of functions \(F,G\) below, determine which of the following relations holds: \(F=O(G)\), \(F=\Omega(G)\), \(F=o(G)\) or \(F=\omega(G)\).

- \(F(n)=n\), \(G(n)=100n\).
- \(F(n)=n\), \(G(n)=\sqrt{n}\).
- \(F(n)=n\log n\), \(G(n)=2^{(\log (n))^2}\).
- \(F(n)=\sqrt{n}\), \(G(n)=2^{\sqrt{\log n}}\)
- \(F(n) = \binom{n}{\ceil{0.2 n}}\) , \(G(n) = 2^{0.1 n}\).

Give an example of a pair of functions \(F,G:\N \rightarrow \N\) such that neither \(F=O(G)\) nor \(G=O(F)\) holds.

Prove that for every directed acyclic graph (DAG) \(G=(V,E)\), there
exists a map \(f:V \rightarrow \N\) such that \(f(u)<f(v)\) for every edge
\(\overrightarrow{u \; v}\) in the graph. (Hint: it may help to first show
that every DAG contains a *sink*: a vertex without an outgoing edge.)

Prove that for every undirected graph \(G\) on \(n\) vertices, if \(G\) has at least \(n\) edges then \(G\) contains a cycle.

The section heading “A Mathematician’s Apology” refers, of course, to Hardy’s classic book. Even when Hardy is wrong, he is very much worth reading.

Copyright 2018, Boaz Barak.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

HTML version is produced using the Distill template, Copyright 2018, The Distill Template Authors.
