Beyond Classical Search
Jesus A. Gonzalez
September 19, 2016
Beyond Classical Search
- We now examine what happens when we relax some assumptions about the environment:
- Observable
- Deterministic
- Known
- Topics
- Local search
- Simulated Annealing
- Genetic Algorithms
- Online search
Local Search
Local Search and Optimization Problems
- Previous methods find a path to the goal state
- There are problems in which the path to the goal is irrelevant
- 8-queens problem
- Integrated-circuit design
- Factory-floor layout
- Job-shop scheduling
- Automatic programming
- Telecommunications network optimization
- Vehicle routing
- Portfolio management
Local Search and Optimization Problems
- Local Search algorithms
- Use a single current node
- Generally move only to neighbors of the current node
- Usually, paths are not kept
- Key advantages
- Low memory usage, usually a constant amount
- Able to find reasonable solutions even in large or infinite state spaces
- Local Search algorithms
- Also useful for solving pure optimization problems
- Find the best state according to an objective function
Local Search and Optimization Problems
- State-space landscape
- Location, defined by state
- Elevation, defined by the value of the objective function (or heuristic cost function)
- If elevation corresponds to cost, we are looking for the lowest valley: global minimum
- If elevation corresponds to an objective function, we are looking for the highest peak: global maximum
- We convert one to the other by adding a minus sign
- Local search algorithms explore this landscape
- Complete: Always finds a goal if it exists
- Optimal: Always finds a global minimum/maximum
Hill-climbing Search
- Hill-climbing, steepest-ascent version
- Continually moves in direction of increasing value, uphill
- Finishes when it reaches a peak \(\rightarrow\) no neighbor has a higher value
- Doesn’t keep a search tree \(\rightarrow\) only keeps record of current state and value of the objective function
- Doesn’t look ahead beyond the immediate neighbors of the current state
- Sometimes called greedy local search
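The loop described above can be sketched as follows; this is a minimal illustration, where `neighbors` and `value` are assumed problem-specific helpers, not part of the slides:

```python
def hill_climb(initial, neighbors, value):
    """Steepest-ascent hill climbing: repeatedly move to the best
    neighbor; stop when no neighbor improves on the current state.
    `neighbors(s)` and `value(s)` are problem-specific (assumptions)."""
    current = initial
    while True:
        succ = neighbors(current)
        if not succ:
            return current
        best = max(succ, key=value)
        if value(best) <= value(current):
            return current  # reached a peak (possibly only a local one)
        current = best
```

Note that the loop keeps only the current state, matching the constant-memory claim above.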
Hill-climbing Search
- 8-queens problem with hill-climbing
- Typically use a complete-state formulation
- Successor states generated from current state
- By changing the configuration (e.g., in the 8-queens problem, by moving a single queen to another square in the same column: 8 × 7 = 56 successors)
- Heuristic cost function, h
- Number of pairs of queens attacking each other
- Global minimum is \(h = 0\), reached only by perfect solutions
- Chooses randomly among best successors (when more than one)
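A possible implementation of the heuristic \(h\) for this complete-state formulation (a sketch; the column-per-index encoding is an assumption, not prescribed by the slides):

```python
def attacking_pairs(state):
    """h for the 8-queens complete-state formulation: the number of
    pairs of queens attacking each other. `state[c]` is the row of the
    queen in column c (one queen per column, as in the formulation)."""
    n = len(state)
    h = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_row = state[i] == state[j]
            same_diag = abs(state[i] - state[j]) == j - i
            if same_row or same_diag:
                h += 1
    return h
```

For example, eight queens on one diagonal give \(h = 28\) (every pair attacks), while a solution gives \(h = 0\).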
Hill-climbing Search
- \(h\) is the number of pairs of queens attacking each other

Hill-climbing Search
- Hill-climbing often gets stuck
- Local maxima: a peak higher than each of its neighbors but lower than the global maximum (see figure 3b)
- Ridge: a sequence of local maxima difficult to navigate for a greedy algorithm
- Plateau: a flat area of the state-space landscape, can be a flat local maximum (no uphill exit exists)
- Shoulder: a plateau in which progress is possible (figure 1)
- Sideways move: moving in the hope that a plateau is a shoulder
- Might enter an infinite loop when it reaches a flat local maximum (not a shoulder)
- Put limit on number of consecutive sideways moves allowed
Hill-climbing Search
- Stochastic hill climbing
- Chooses at random from among uphill moves
- Probability of selection can vary with steepness of uphill move
- Slower than steepest ascent but might find better solutions
- First-choice hill climbing
- Implementation of stochastic hill climbing
- Generates successors randomly until one is better than the current state
- Good when a state has many (e.g., thousands of) successors
Hill-climbing Search
- Random-restart hill climbing
- Performs a series of hill-climbing searches from randomly generated initial states, until a goal is reached
- Trivially complete with probability approaching 1 (it eventually generates a goal state as the initial state)
- With probability of success \(p\), the expected number of restarts is \(1/p\); 8-queens has \(p \approx 0.14\), so roughly 7 iterations to find a goal
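The restart wrapper is independent of the inner search; a minimal sketch, where `search`, `random_state`, and `is_goal` are assumed problem-specific helpers:

```python
import random

def random_restart(search, random_state, is_goal, max_restarts=100):
    """Random-restart hill climbing: run `search` from fresh random
    initial states until some run ends in a goal. With per-run success
    probability p, about 1/p restarts are expected."""
    for _ in range(max_restarts):
        result = search(random_state())
        if is_goal(result):
            return result
    return None  # budget exhausted without reaching a goal
```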
Hill-climbing Search
- Success in hill-climbing depends on shape of the state-space
- NP-hard problems might have an exponential number of local maxima
- Reasonably good local maxima can be found after a few restarts
Simulated Annealing
- A hill-climbing algorithm that never makes a downhill move is incomplete (it can get stuck on a local maximum)
- A purely random walk is complete but extremely inefficient
- Simulated annealing combines hill climbing with a random walk, yielding both efficiency and completeness
- Annealing
- Metallurgy: a process for tempering or hardening metals and glass by heating them to a high temperature and then gradually cooling them, allowing the material to reach a low-energy crystalline state
Simulated Annealing
- Gradient descent (minimizing cost)
- Example: imagine task of getting a ping-pong ball into the deepest crevice in a bumpy surface
- Shake the surface to get out of local minimum
- Simulated annealing
- Start by shaking hard (at high temperature)
- Gradually reduce the intensity of the shaking (lower the temperature)
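The standard acceptance rule is: always take an improving move, and take a worsening move with probability \(e^{\Delta E / T}\), where \(T\) falls according to a cooling schedule. A minimal sketch, with `random_neighbor`, `value`, and `schedule` as assumed helpers:

```python
import math
import random

def simulated_annealing(initial, random_neighbor, value, schedule):
    """Simulated annealing: like hill climbing, but a worsening move
    (dE < 0) is accepted with probability e^(dE/T); T decreases over
    time via `schedule`, and the search stops when T reaches 0."""
    current = initial
    t = 1
    while True:
        T = schedule(t)
        if T <= 0:
            return current
        nxt = random_neighbor(current)
        dE = value(nxt) - value(current)
        if dE > 0 or random.random() < math.exp(dE / T):
            current = nxt
        t += 1
```

At high \(T\) almost any move is accepted (random walk); as \(T \rightarrow 0\) the behavior approaches pure hill climbing.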
Local Beam Search
- Keeps track of \(k\) states rather than only one
- Begins with \(k\) randomly generated states
- Each step, it generates all successors of the \(k\) states
- If one is a goal, the algorithm finishes
- Otherwise, selects the \(k\) best successors from the whole list, repeats the process
- These \(k\) best successors are a guide for the next iteration
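The steps above can be sketched as follows (a minimal illustration; `neighbors`, `value`, `is_goal`, and the step budget are assumed helpers):

```python
def local_beam_search(initial_states, neighbors, value, is_goal, max_steps=100):
    """Local beam search: keep k states, generate all their successors
    each step, stop if a successor is a goal, otherwise keep the k best
    successors from the pooled list and repeat."""
    k = len(initial_states)
    beam = list(initial_states)
    for _ in range(max_steps):
        pool = []
        for state in beam:
            for s in neighbors(state):
                if is_goal(s):
                    return s
                pool.append(s)
        if not pool:
            return max(beam, key=value)
        beam = sorted(pool, key=value, reverse=True)[:k]
    return max(beam, key=value)
```

Unlike \(k\) independent restarts, the pooled selection lets useful information pass between the parallel searches.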
Local Beam Search
- Stochastic beam search
- To avoid concentrating in a small region of the state space
- Chooses \(k\) successors randomly
- Probability of choosing a given successor is an increasing function of its value
- Resembles natural selection
- Successors (offspring) of a state (organism) populate the next generation according to the state's value (fitness)
Genetic Algorithms
- Genetic Algorithms (GA) are a variant of stochastic beam search
- Successors generated by combining two parent states (instead of modifying a single state)
Genetic Algorithms
- GAs
- Population: \(k\) randomly generated states to start the process
- Individual: a state represented as a string over a finite alphabet
- e.g., a string of 0s and 1s (see figure 6a)
- Fitness function: states rated by the fitness function
- Should return higher values for better states
- For 8-queens: the number of nonattacking pairs of queens, 28 for a solution
- Probability of being chosen for reproduction is directly proportional to fitness value
Genetic Algorithms
- GAs
- Selection: the fittest individuals are chosen for reproduction (with probability proportional to fitness)
- Crossover: When two individuals are mated, a crossover point is chosen randomly
- Mutation: Each location subject to random mutation with a very small probability
- Also combine uphill tendency with random exploration (as in stochastic beam search)
- Also exchange information among parallel search threads
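One possible GA loop matching the mechanics above: fitness-proportional selection, single-point crossover, and small-probability mutation. This is a sketch under assumptions; the population size, generation count, and `mutate` helper are illustrative choices, not from the slides:

```python
import random

def genetic_algorithm(population, fitness, mutate, generations=100):
    """Simple generational GA: parents are sampled with probability
    proportional to fitness, combined at a random crossover point,
    and the child is passed through a mutation operator."""
    for _ in range(generations):
        weights = [fitness(ind) for ind in population]
        new_pop = []
        for _ in range(len(population)):
            x, y = random.choices(population, weights=weights, k=2)
            c = random.randrange(1, len(x))  # random crossover point
            child = x[:c] + y[c:]
            new_pop.append(mutate(child))
        population = new_pop
    return max(population, key=fitness)
```

On an easy toy problem (maximizing the number of 1s in a bit string), this loop reliably drives the population toward the optimum.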
Genetic Algorithms
- GAs
- Great and widespread impact on optimization problems
Local Search in Continuous Spaces
- Most real-world environments are continuous
- States are represented with a vector of numerical variables
- \(\vec{x} = (x_1, x_2, x_3, \dots, x_n)\), \(x_i \in \mathbb{R}\)
- Objective function
- \(f(x_1, x_2, x_3, \dots, x_n)\), or \(f(\vec{x})\)
Gradient Descent/Ascent
- Many algorithms use the gradient of the landscape to find a maximum/minimum
- Gradient of the objective function is a vector \(\nabla f\)
- Gives magnitude and direction of the steepest slope
- \(\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \frac{\partial f}{\partial x_3}, \dots, \frac{\partial f}{\partial x_n} \right)\)
Gradient Descent/Ascent
- Sometimes we can find a maximum by solving \(\nabla f = 0\)
- But there are cases in which we cannot
- Cases in which the equation has no closed-form solution (i.e., one computable with a finite number of operations) for the general case
- For problems without a closed-form solution
- We can still solve the problem locally with hill climbing
- We set the direction of movement toward the steepest slope
Gradient Descent/Ascent
- Steepest-ascent hill climbing
- \(\vec{x} \leftarrow \vec{x} + \alpha \nabla f(\vec{x})\), gradient ascent
- \(\vec{x} \leftarrow \vec{x} - \alpha \nabla f(\vec{x})\), gradient descent
- \(\alpha\) is the step size, a small constant
- Many methods to adjust \(\alpha\)
- If \(\alpha\) is too small \(\rightarrow\) too many steps to find the solution
- If \(\alpha\) is too large \(\rightarrow\) the search could jump/miss the solution
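The update rule with a fixed step size is a one-liner in code; a 1-D sketch (the test function and \(\alpha\) value are illustrative assumptions):

```python
def gradient_ascent(x, grad, alpha=0.1, steps=1000):
    """Plain gradient ascent, x <- x + alpha * grad f(x), with a fixed
    step size alpha. Converges for suitable alpha; too large an alpha
    can overshoot, too small an alpha converges slowly."""
    for _ in range(steps):
        x = x + alpha * grad(x)
    return x
```

For \(f(x) = -(x-3)^2\), with \(\nabla f(x) = -2(x-3)\) and \(\alpha = 0.1\), each step shrinks the distance to the maximum at \(x = 3\) by a factor of 0.8.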
Gradient Descent/Ascent
- Adjusting \(\alpha\)
- Line search
- Newton-Raphson
- General method for solving equations of the form \(g(x) = 0\)
- Computes a new estimate for the root \(x\) according to Newton’s formula
- \(x \leftarrow x - g(x)/g'(x)\).
- To optimize \(f\), set \(g = \nabla f\); in matrix-vector form:
- \(\vec{x} \leftarrow \vec{x} - H_f^{-1}(\vec{x}) \nabla f(\vec{x})\)
- \(H_f(\vec{x})\) is the Hessian matrix of second derivatives, with \(H_{ij} = \frac{\partial^2 f(\vec{x})}{\partial x_i \partial x_j}\)
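The 1-D root-finding form of Newton's formula can be sketched directly; the example equation \(g(x) = x^2 - 2\) (whose root is \(\sqrt{2}\)) is an illustrative assumption:

```python
def newton_root(g, dg, x, iters=20):
    """Newton-Raphson for g(x) = 0: iterate x <- x - g(x)/g'(x).
    Applied to g = f', the same iteration finds stationary points
    of f, which is the optimization use described above."""
    for _ in range(iters):
        x = x - g(x) / dg(x)
    return x
```

Starting from \(x = 1\), the iterates converge quadratically to \(\sqrt{2} \approx 1.41421356\).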