# Content

- Part 1: Introduction - Computers, Complexity, and Intractability
- Part 2: The Theory of NP-Completeness

Jesus A. Gonzalez

July 17, 2019

- Computers and Intractability: A Guide to the Theory of NP-Completeness, Michael R. Garey and David S. Johnson, Series of Books in the Mathematical Sciences, W. H. Freeman, 1979.


- Provides techniques to prove that a problem is *just as hard* as a set of other problems that are recognized as being difficult
- Prove that a problem is equivalent to another problem that is *hard*
- Finding out that a problem is NP-complete is the first step to work with it
- The problem at hand (i.e. in the industry) still needs a solution
- Discovering a problem is NP-complete provides information
- About the approach(es) we need to follow to solve the problem
- Trying to design an *exact* algorithm should have a *low priority*

- With NP-complete problems, we look for *less ambitious* approaches
- Concentrate on special cases of the general problem
- Algorithms that do not run quickly but find a solution *most of the time*
- Relax the problem by removing some constraints

- Why We Need NP-completeness Theory
- Assist algorithm designers to focus toward approaches that lead to useful algorithms

- Problem
- A general question to be answered
- Has several parameters or free variables with unspecified values

- Problem description
- A general description of all the problem parameters
- A statement of what properties the answer or solution to the problem must satisfy

- Instance of a problem
- Specification of particular values for all parameters of a problem

- Example of the “traveling salesman problem”
- *cities*: \(C = \{c_1, c_2, \dots, c_m\}\)
- *distances*: \(d(c_i, c_j)\), the distance between each pair of cities
- *solution*: an ordering \(\langle c_{\pi(1)}, c_{\pi(2)}, \dots, c_{\pi(m)} \rangle\) of the given cities minimizing \(\Biggl[\sum\limits_{i=1}^{m-1} d(c_{\pi(i)}, c_{\pi(i+1)}) + d(c_{\pi(m)}, c_{\pi(1)})\Biggr]\)

- Example of the “traveling salesman problem” \(\dots\)
- A tour that starts at the first city, visits each city in sequence, and returns to the first city from the last city
- First city: \(c_{\pi(1)}\)
- Last city: \(c_{\pi(m)}\)

- An instance
- \(C = \{c_1, c_2, c_3, c_4\}\), \(d(c_1, c_2) = 10\), \(d(c_1, c_3) = 5\), \(d(c_1, c_4) = 9\), \(d(c_2, c_3) = 6\), \(d(c_2, c_4) = 9\), \(d(c_3, c_4) = 3\)
- Ordering \(\langle c_1, c_2, c_4, c_3 \rangle\) is a solution with length 27
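The tour length above can be checked with a short script (a sketch; the helper names `dist` and `tour_length` are ours):

```python
# Tour length for the ordering <c1, c2, c4, c3> in the instance above.
d = {(1, 2): 10, (1, 3): 5, (1, 4): 9, (2, 3): 6, (2, 4): 9, (3, 4): 3}

def dist(a, b):
    """Distances are symmetric; look up the pair in canonical order."""
    return d[(min(a, b), max(a, b))]

def tour_length(tour):
    """Sum the consecutive legs plus the closing leg back to the start."""
    legs = zip(tour, tour[1:] + tour[:1])
    return sum(dist(a, b) for a, b in legs)

print(tour_length([1, 2, 4, 3]))  # 27
```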

- Example of the “traveling salesman problem” \(\dots\)

- Algorithm
- Step by step procedure to solve problems
- A computer program
- An algorithm *solves* a problem \(\Pi\) if it can be applied to any instance \(I\) of \(\Pi\) and is always *guaranteed* to produce a solution for that instance \(I\)
- An algorithm *does not* solve the traveling salesman problem unless it always constructs an ordering that gives a minimum length tour

- Efficient
- We are interested in finding the most *efficient* algorithm to solve a problem
- Efficiency involves the various computing resources needed to execute an algorithm
- We concentrate on *time*

- Time requirements
- Conveniently expressed in terms of a single variable: the *size* of a problem instance
- Shows the amount of input data needed to describe the instance
- We expect the relative difficulty of problem instances to vary with their size

- Time requirements \(\dots\)
- The size of a problem might be measured in an informal way
- In the TSP, number of cities is used as its size
- There is more to consider: numbers defining distances between cities, size of these numbers
- We should take all these factors into account

- Problem instance description associated to a single finite string of symbols chosen from a finite alphabet
- Each problem is associated with a fixed encoding scheme

- Encoding scheme
- Maps problem instances into strings describing them

- Input length
- The *input length* for an instance \(I\) of a problem \(\Pi\) is defined as the number of symbols in the description of \(I\) obtained from the encoding scheme for \(\Pi\)
- This *input length* is used as the formal measure of instance size

- Example of encoding scheme
- Alphabet: \(\{c, [, ], /, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}\)
- Representation of an instance encoded with the string “\(c[1]c[2]c[3]c[4]//10/5/9//6/9//3\)”
- Input length: 32
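One way to produce this encoding mechanically (a sketch; the function name `encode_tsp` and its argument layout are our own choices, inferred from the example string):

```python
def encode_tsp(m, d):
    """Encode a TSP instance over the alphabet {c, [, ], /, 0-9}:
    city labels first, then the upper-triangular distance rows,
    each row introduced by '//' and separated internally by '/'."""
    s = "".join("c[%d]" % i for i in range(1, m + 1))
    for i in range(1, m):
        s += "//" + "/".join(str(d[(i, j)]) for j in range(i + 1, m + 1))
    return s

d = {(1, 2): 10, (1, 3): 5, (1, 4): 9, (2, 3): 6, (2, 4): 9, (3, 4): 3}
x = encode_tsp(4, d)
print(x)       # c[1]c[2]c[3]c[4]//10/5/9//6/9//3
print(len(x))  # 32
```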

- The *time complexity function* of an algorithm expresses its time requirements
- For each input length, it gives the largest amount of time needed by the algorithm to solve a problem instance of that size
- We need to define the encoding scheme to determine the input length
- We need to define the computer model to determine the execution time
- *These choices have little effect on the broad distinctions made in the theory of NP-completeness*
- We focus on time complexity using input lengths and execution times

- Different algorithms \(\rightarrow\) different time complexity functions

- Which of these are *efficient enough*? Which are too *inefficient*?
- This depends on the problem at hand

- Computer scientists make a distinction
- Distinction between *polynomial time algorithms* and *exponential time algorithms*
- Let us say that a function \(f(n)\) is \(O(g(n))\) whenever there exists a constant \(c\) such that \(|f(n)| \leq c \cdot |g(n)|\) for all values of \(n \geq 0\)

- Polynomial time algorithm
- One whose time complexity function is \(O(p(n))\) for some polynomial function \(p\), where \(n\) is used to denote the input length

- Exponential time algorithm
- Any algorithm whose time complexity function cannot be so bounded (including functions such as \(n^{\log n}\), which are not normally considered exponential functions)

- This distinction is important
- When considering the solution to large problem instances
- Explosive growth rates for the exponential complexity functions

- These tables show why polynomial time algorithms are much more desirable than exponential time ones
- Theory of NP-completeness based on
- Two types of algorithms
- Polynomial time algorithms are much more desirable

- Exponential time algorithms
- Should not be considered *good* algorithms
- Most exponential time algorithms are variations on exhaustive search
- Polynomial time algorithms
- Made possible through gaining deeper insight into the structure of the problem
- Agreement that a problem has not been *well solved* until a polynomial time algorithm is known for it

- Intractable
- A problem is intractable if it is so hard that no polynomial time algorithm can possibly solve it

- Distinction between *efficient* polynomial time algorithms and *inefficient* exponential time algorithms
- Exceptions arise when the problem instances of interest have limited size
- e.g. a \(2^n\) algorithm is faster than an \(n^5\) algorithm for \(n \leq 20\) (fig. 1.2)
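This crossover can be verified directly (a quick sketch comparing raw operation counts, not the running times of fig. 1.2):

```python
# Find the smallest n >= 2 at which 2**n exceeds n**5, i.e. where the
# exponential time algorithm stops beating the polynomial one.
crossover = next(n for n in range(2, 100) if 2**n > n**5)
print(crossover)  # 23
```

So the \(2^n\) algorithm is indeed faster for all \(n \leq 20\) (in fact up to \(n = 22\)).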

- Many exponential time algorithms are useful in practice
- Time complexity is defined as a *worst-case* measure
- An algorithm with time complexity \(2^n\) means
- At least one instance of size \(n\) requires that much time
- Most problem instances may require less time
- The simplex algorithm (linear programming) was shown to have exponential complexity, but in practice it *runs quickly*
- Branch and bound algorithms for the knapsack problem are successful

- Many exponential time algorithms are useful in practice \(\dots\)
- Examples like these are rare
- Many algorithms are known to be exponential, but *few of them are considered useful in practice*

- *Probably efficient* algorithms
- Those with degree 2 or 3 at worst
- Do not involve extremely large coefficients
- **A desired property**

- *Intractability* of a problem
- Independent of the encoding scheme and computer model used to obtain time complexity
- Encoding schemes seem to differ at most polynomially from one another
- A *reasonable* encoding scheme would not differ more than polynomially from the others
- *Reasonable* is not a formal definition

- Reasonable encoding
- *The encoding of an instance \(I\) should be concise and not “padded” with unnecessary information or symbols*
- *Numbers occurring in \(I\) should be represented in binary (or decimal, or octal, or in any fixed base other than 1)*

- For encodings that follow these conditions
- The encoding scheme should not affect the determination of whether a given problem is intractable

- Similar guidelines for the choice of computer models
- One-tape Turing machines, multi-tape Turing machines, random-access machines (RAMs) \(\rightarrow\) equivalent

- **Reasonable**: there is a polynomial bound on the amount of work that can be done in a *single unit of time*
- Performing arbitrarily many operations in parallel is not considered *reasonable*
- Not considered for now

- Two causes of intractability allowed by our definition
- So difficult problem that an exponential amount of time is required to solve it
- The solution is so extensive that it cannot be described with an expression having length bounded by a polynomial function of the input length
- e.g. add a parameter \(B\) to the TSP and ask for all tours having total length \(B\) or less
- Listing all solutions cannot be done in polynomial time
- This type of complexity is apparent from the problem definition
- The problem is not defined realistically

- We restrict attention to the first type of intractability

- Earliest intractability results
- Decidability results of Alan Turing
- Turing demonstrated that certain problems are so hard that they are *undecidable*

- Undecidable problems
- No algorithm can be given to solve the problem
- It is impossible to specify any algorithm which
- Given an arbitrary computer program
- Given an arbitrary input to the program
- Can decide whether or not the program will eventually halt when applied to that input

- These problems cannot be solved by any algorithm
- Much less in polynomial time
- **They are intractable in an especially strong sense**

- Intractable decidable problems
- First ones obtained in the early 1960’s
- Work on complexity hierarchies (Hartmanis and Stearns, 1965)
- Only artificially constructed problems

- In the early 1970’s
- Work by (Meyer and Stockmeyer, 1972), (Fischer and Rabin, 1974)
- Proved some *natural* decidable problems to be intractable
- Proofs show these problems cannot be solved in polynomial time even using a *nondeterministic* computer model (computing in parallel)

- First ones obtained in the early 1960’s

- Probably intractable problems fall into two categories
- Undecidable
- Non-deterministically intractable

- Most of the apparently intractable problems in practice
- Are decidable
- Can be solved in polynomial time with the aid of a nondeterministic computer

- Study how problems are related with respect to their difficulty
- These relations provide useful information to algorithm designers
- Reduction
- Technique used to demonstrate that two problems are related to each other
- Transformation that maps any instance of the first problem into an equivalent instance of the second
- Converts any algorithm that solves the second problem into an algorithm for solving the first one

- Foundations of NP-completeness
- The Complexity of Theorem Proving Procedures, Stephen Cook, 1971

- Important things of this paper
- Emphasized the significance of *polynomial time reducibility*
- The transformation (reduction) can be executed by a polynomial time algorithm
- A polynomial time reduction from one problem to another implies that any polynomial time algorithm for the second problem can be converted into a corresponding polynomial time algorithm for the first problem
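As a concrete illustration (our own example, not from the slides): the classic transformation from Hamiltonian Cycle to the TSP decision problem is a polynomial time reduction:

```python
# Sketch: a polynomial time transformation from Hamiltonian Cycle to
# the TSP decision problem.  Hamiltonian Cycle is used here only as a
# familiar illustration; the slides discuss reducibility in general.
def ham_cycle_to_tsp(vertices, edges):
    """Map a graph to a TSP instance: distance 1 on edges, 2 otherwise,
    bound B = |V|.  The graph has a Hamiltonian cycle iff the TSP
    instance has a tour of total length <= B."""
    edge_set = {frozenset(e) for e in edges}
    d = {}
    for i, u in enumerate(vertices):
        for v in vertices[i + 1:]:
            d[(u, v)] = 1 if frozenset((u, v)) in edge_set else 2
    return vertices, d, len(vertices)

cities, d, B = ham_cycle_to_tsp(["a", "b", "c", "d"],
                                [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])
print(B, d[("a", "b")], d[("a", "c")])  # 4 1 2
```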


- Focused on the *class NP of decision problems* solvable in polynomial time by a nondeterministic computer
- Decision problem: its solution is either *yes* or *no*
- Most apparently intractable problems (encountered in practice), when described as decision problems, belong to this class

- Proved that the *satisfiability* problem has the property that every other problem in NP can be polynomially reduced to it
- If the satisfiability problem can be solved with a polynomial time algorithm \(\rightarrow\) every other problem in \(NP\) can also be solved in polynomial time
- If any problem in \(NP\) is intractable \(\rightarrow\) the satisfiability problem must also be intractable
- In a sense, the satisfiability problem is the *hardest* problem in \(NP\)

- Cook suggested other problems in \(NP\) might be as *hard* as the satisfiability problem
- e.g. Does a given graph \(G\) contain a complete subgraph on a given number \(k\) of vertices?
- (Richard Karp, 1972) proved that the decision versions of many other problems (e.g. the TSP) are just as *hard* as the satisfiability problem

- The hardest problems in \(NP\) are known as the class of *NP-complete problems*
- There is no proof that *NP*-completeness implies intractability
- Knowing that a problem is *NP*-complete suggests that a major breakthrough would be required to solve it in polynomial time

- NP-Completeness theory
- Designed to apply only to *decision problems* (DP)
- DPs have only two possible solutions: *yes* or *no*
- Definition
- “A Decision Problem \(\Pi\) consists of a set \(D_{\Pi}\) of *instances* and a subset \(Y_{\Pi} \subseteq D_{\Pi}\) of *yes-instances*”
- Most problems have more structure than a DP
- We describe them so that they emphasize this structure

- We use a standard format to specify problems
- A *generic instance* of the problem, described using *sets*, *graphs*, *functions*, *numbers*, etc.
- A yes-no *question* asked in terms of the generic instance

- An instance belongs to \(D_{\Pi}\) iff it can be obtained from the generic instance by substituting objects of the specified types for all generic components
- An instance belongs to \(Y_{\Pi}\) iff the answer to the question for that instance is *yes*

- Subgraph Isomorphism
- INSTANCE: Two graphs, \(G_1 = (V_1, E_1)\) and \(G_2 = (V_2, E_2)\).
- QUESTION: Does \(G_1\) contain a subgraph isomorphic to \(G_2\), that is, a subset \(V' \subseteq V_1\) and a subset \(E' \subseteq E_1\) such that \(|V'| = |V_2|, |E'| = |E_2|\), and there exists a one-to-one function \(f:V_2 \rightarrow V'\) satisfying \(\{u, v\} \in E_2\) if and only if \(\{f(u), f(v)\} \in E'\)?
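A brute-force decision procedure for this question might look as follows (a sketch; exhaustive search over all injective maps, hence exponential time):

```python
from itertools import permutations

def subgraph_isomorphic(V1, E1, V2, E2):
    """Brute-force check of the Subgraph Isomorphism question:
    try every injective (one-to-one) map f: V2 -> V1 and test that
    every edge of G2 maps to an edge of G1."""
    E1 = {frozenset(e) for e in E1}
    for image in permutations(V1, len(V2)):
        f = dict(zip(V2, image))
        if all(frozenset((f[u], f[v])) in E1 for (u, v) in E2):
            return True
    return False

# A 4-cycle contains a path on 3 vertices but not a triangle.
V1, E1 = [1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)]
print(subgraph_isomorphic(V1, E1, ["x", "y", "z"],
                          [("x", "y"), ("y", "z")]))              # True
print(subgraph_isomorphic(V1, E1, ["x", "y", "z"],
                          [("x", "y"), ("y", "z"), ("z", "x")]))  # False
```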

- Traveling Salesman
- INSTANCE: A finite set \(C = \{c_1, c_2, \dots, c_m\}\) of “cities”, a “distance” \(d(c_i, c_j) \in Z^+\) for each pair of cities \(c_i, c_j \in C\), and a bound \(B \in Z^+\) (where \(Z^+\) denotes the positive integers).
- QUESTION: Is there a “tour” of all the cities in \(C\) having total length no more than \(B\), that is, an ordering \(\langle c_{\pi(1)}, c_{\pi(2)}, \dots, c_{\pi(m)} \rangle\) of \(C\) such that
\(\Biggl[\sum\limits_{i=1}^{m-1} d(c_{\pi(i)}, c_{\pi(i+1)}) \Biggr] + d(c_{\pi(m)}, c_{\pi(1)}) \leq B\) **?**
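An exhaustive-search procedure answering this decision question could be sketched as follows (the helper name `tsp_decision` is ours):

```python
from itertools import permutations

def tsp_decision(C, d, B):
    """Exhaustive answer to the TSP decision question: is there an
    ordering of the cities whose closed tour has total length <= B?
    Fixing the first city avoids re-checking rotations of a tour."""
    first, rest = C[0], tuple(C[1:])
    for perm in permutations(rest):
        tour = (first,) + perm
        legs = zip(tour, tour[1:] + tour[:1])
        if sum(d[frozenset(l)] for l in legs) <= B:
            return True
    return False

d = {frozenset(p): v for p, v in
     {(1, 2): 10, (1, 3): 5, (1, 4): 9, (2, 3): 6, (2, 4): 9, (3, 4): 3}.items()}
print(tsp_decision([1, 2, 3, 4], d, 27))  # True
print(tsp_decision([1, 2, 3, 4], d, 26))  # False
```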

- An alphabet: a set of symbols \(\Sigma\)
- \(\Sigma^*\): the set of all finite strings of symbols from \(\Sigma\)
- A language over the alphabet \(\Sigma\): any subset of \(\Sigma^*\)
- \(\{01, 10, 001, 110, 0001, 1110\}\) is a language over \(\{0, 1\}\)

- Correspondence between decision problems and languages
- Related to the encoding schemes to specify problem instances
- An encoding scheme \(e\) for a problem \(\Pi\) is used to describe instances of \(\Pi\) with strings over a fixed alphabet \(\Sigma\)

- \(\Pi\) and \(e\) partition \(\Sigma^*\) into three classes of strings
- Strings that are not encodings of instances of \(\Pi\)
- Strings that encode instances of \(\Pi\) with answer *no*
- Strings that encode instances of \(\Pi\) with answer *yes*

The \(3^{rd}\) class of strings is the language associated with \(\Pi\) and \(e\):

- \(L[\Pi, e] = \{x \in \Sigma^* : \Sigma \text{ is the alphabet used by } e\text{, and } x \text{ is the encoding under } e \text{ of an instance } I \in Y_{\Pi}\}\)

- We apply our language theory to decision problems
- If a result holds for language \(L[\Pi, e]\), it also holds for the problem \(\Pi\) under the encoding scheme \(e\)

- We use TMs to formalize the notion of algorithm
- Deterministic one-tape Turing machine (DTM)

- A DTM program is specified by:
- A finite set of *tape symbols* \(\Gamma\), including the *input symbols* \(\Sigma \subset \Gamma\) and the *blank symbol* \(b \in \Gamma - \Sigma\)
- A finite set \(Q\) of *states*, including a distinguished *start state* \(q_0\) and two distinguished *halt states* \(q_Y\) and \(q_N\)
- A *transition function* \(\delta:(Q-\{q_Y, q_N\}) \times \Gamma \rightarrow Q \times \Gamma \times \{-1, +1\}\)

- Image from (Garey and Johnson, 1979)

- Image from (Garey and Johnson, 1979)

- The computation finishes after 8 steps in state \(q_Y\)
- The answer for instance “10100” is **yes**

- A DTM program \(M\) with input alphabet \(\Sigma\) *accepts* \(x \in \Sigma^*\) iff \(M\) halts in state \(q_Y\) when applied to input \(x\)
- The language \(L_M\) recognized by the program \(M\) is given by
- \(L_M = \{x \in \Sigma^*: M \text{ accepts } x\}\)

- The DTM recognizes the language
- \(\lbrace x \in \{0, 1\}^* : \text{ the rightmost two symbols of x are both 0 }\rbrace\)

- For a DTM program to be an *algorithm*, it must halt on all possible strings over its input alphabet
- The DTM program of Fig. 22.2 is an algorithm: it halts for any input string from \(\{0, 1\}^*\)
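Such a DTM computation is easy to simulate. Below is a sketch with one possible transition table for the language above; on input “10100” it halts in \(q_Y\) after 8 steps, matching the example, although the table may differ in detail from the program in the figure:

```python
# A small DTM simulator.  delta maps (state, symbol read) to
# (new state, symbol to write, head move).  This particular program
# recognizes {x in {0,1}* : the rightmost two symbols of x are both 0}:
# q0 scans right to the first blank, then q1 and q2 check the last
# two symbols while moving left.
BLANK = "b"
delta = {
    ("q0", "0"): ("q0", "0", +1), ("q0", "1"): ("q0", "1", +1),
    ("q0", BLANK): ("q1", BLANK, -1),
    ("q1", "0"): ("q2", "0", -1), ("q1", "1"): ("qN", "1", -1),
    ("q1", BLANK): ("qN", BLANK, -1),
    ("q2", "0"): ("qY", "0", -1), ("q2", "1"): ("qN", "1", -1),
    ("q2", BLANK): ("qN", BLANK, -1),
}

def run_dtm(x):
    """Run the program on input x (placed in squares 1..|x|, head
    starting at square 1); return (halt state, number of steps)."""
    tape = {i + 1: s for i, s in enumerate(x)}
    state, head, steps = "q0", 1, 0
    while state not in ("qY", "qN"):
        state, write, move = delta[(state, tape.get(head, BLANK))]
        tape[head] = write   # write at the current square...
        head += move         # ...then move the head one square
        steps += 1
    return state, steps

print(run_dtm("10100"))  # ('qY', 8)
```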

- Correspondence between *recognizing* languages and *solving* decision problems
- A DTM program \(M\) *solves* the decision problem \(\Pi\) under encoding scheme \(e\) if \(M\) halts for all input strings over its input alphabet and \(L_M = L[\Pi, e]\)

- Formal definition of *time complexity*
- The time used in the computation of a DTM program \(M\) on an input \(x\) is the number of steps occurring in that computation up until a halt state is entered
- For a DTM program \(M\) that halts for all inputs \(x \in \Sigma^*\), its *time complexity function* \(T_M(n) : Z^+ \rightarrow Z^+\) is given by

\(\begin{aligned} T_M(n) = {} & \text{max} \lbrace m : \mbox{there is an } x \in \Sigma^* \mbox{, } \mbox{with } |x|=n \mbox{, such that}\\ & \mbox{the computation of } M \mbox{ on input } x \mbox{ takes time } m \rbrace \end{aligned}\)

- Program \(M\) is called a *polynomial time DTM program* if there exists a polynomial \(p\) such that, for all \(n \in Z^+\), \(T_M(n) \leq p(n)\)

\(\begin{aligned} P = {} & \lbrace L : \mbox{there is a polynomial time DTM program } M\\ & \mbox{ for which } L = L_M \rbrace \end{aligned}\)

- A decision problem \(\Pi\) belongs to \(P\) under encoding scheme \(e\) if \(L[\Pi, e] \in P\)
- Language \(L\) represents instances of \(\Pi\) with encoding \(e\)
- \(L\) is equivalent to the language of program \(M\) (\(L_M\))

- If there is a polynomial time DTM program that *solves* \(\Pi\) under \(e\)
- We say that the decision problem \(\Pi\) belongs to \(P\)
- \(M\) always gives the right answer for the problem at hand

- Suppose we have a problem for which we do not know a polynomial time algorithm
- The Traveling Salesman problem
- Subgraph Isomorphism problem

- Somebody claims to have found a solution
- We want to check whether the claim is true, i.e. **VERIFY A yes ANSWER** for that particular instance of the problem
- Prove the required properties (as in the problem description)

- *Verification procedure* \(\rightarrow\) a general algorithm with *time complexity* polynomial in Length[\(I\)]
- *Polynomial time verifiability* **DOES NOT IMPLY POLYNOMIAL TIME SOLVABILITY**
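For the TSP decision problem, such a verification procedure is straightforward (a sketch; the function name `verify_tsp` is ours):

```python
# Polynomial time verification for the TSP decision problem: given an
# instance (C, d, B) and a claimed tour (the guessed structure), check
# the yes-answer in time polynomial in the instance length.
def verify_tsp(C, d, B, tour):
    """Return True iff `tour` visits every city of C exactly once and
    its closed tour length is at most B.  Only O(m) lookups needed."""
    if sorted(tour) != sorted(C):          # must be a permutation of C
        return False
    legs = zip(tour, tour[1:] + tour[:1])  # consecutive legs + return leg
    return sum(d[frozenset(l)] for l in legs) <= B

d = {frozenset(p): v for p, v in
     {(1, 2): 10, (1, 3): 5, (1, 4): 9, (2, 3): 6, (2, 4): 9, (3, 4): 3}.items()}
print(verify_tsp([1, 2, 3, 4], d, 27, [1, 2, 4, 3]))  # True
```

Note that verifying one guessed tour is fast even though *finding* a good tour may not be.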

- Define NP in terms of a *nondeterministic* algorithm; given a problem instance \(I\):
- *Guessing stage*: guesses some structure \(S\)
- *Checking stage*: takes \(I\) and \(S\) as inputs
- Computes in the normal *deterministic* way, halting with *yes*, halting with *no*, or computing forever

- A *nondeterministic* algorithm *solves* a decision problem \(\Pi\) if the following two properties hold for all instances \(I \in D_{\Pi}\)
- If \(I \in Y_{\Pi}\)
- There exists a structure \(S\) that, when guessed for input \(I\), will lead the checking stage to respond *yes* for \(I\) and \(S\)
- If \(I \notin Y_{\Pi}\)
- There exists **no** structure \(S\) that, when guessed for input \(I\), will lead the checking stage to respond *yes* for \(I\) and \(S\)

- A *nondeterministic* algorithm *solves* a decision problem \(\Pi\) in polynomial time if:
- There exists a polynomial \(p\) such that, for each instance \(I \in Y_{\Pi}\), there is some guess \(S\) that leads the deterministic checking stage to respond *yes* for \(I\) and \(S\) within time \(p(\text{Length}[I])\)

- Informally, the *class NP* is defined as
- The class of all decision problems \(\Pi\) that, under reasonable encoding schemes, can be solved by polynomial time nondeterministic algorithms

- Polynomial time nondeterministic algorithm
- Our device for polynomial time verifiability
- Not a realistic method to solve decision problems
- For a given input, it has many possible computations, one for each possible guess

- Formal counterpart of a nondeterministic algorithm
- Program for a nondeterministic one-tape Turing Machine (NDTM)

- We augment our DTM with a *guessing module* that has its own *write-only* head
- Writes the *guess* to the tape