Quantifying Uncertainty
Jesus A. Gonzalez
July 4, 2016
Introduction
- Acting Under Uncertainty
- Agents may need to handle uncertainty
- Partial observability: which state has the agent reached?
- Nondeterminism: where will the agent end up after a sequence of actions?
- Combination of the two
- Agents can be designed to handle uncertainty by
- Keeping track of a belief state
- Representation of the set of all possible world states that it might be in
- Generating a contingency plan
- To handle eventualities that its sensors report during execution
Introduction
- Acting Under Uncertainty
- There are drawbacks to this approach
- It leads to impossibly large and complex belief-state representations
- The agent is dealing with partial sensor information
- Must consider every logically possible explanation of observations (even if some are very unlikely)
- A correct contingent plan handling every eventuality can be too large
- Must consider even those very unlikely contingencies
- Sometimes there is no plan that guarantees achieving the goal
Introduction
- Example: Automated taxi with goal of taking a passenger to the airport on time
- Agent makes a plan: \(A_{90}\)
- Leave home 90 minutes before the flight departs and drive at a reasonable speed
- Airport is close, about 5 miles away
- Taxi agent cannot be certain that Plan \(A_{90}\) will get the passenger to the airport on time
- Can only draw the weaker conclusion: Plan \(A_{90}\) will get the passenger to the airport on time if the car doesn’t break down or run out of gas, and they don’t get into an accident, and there are no accidents on the bridge, and the plane doesn’t leave early, and no meteorite hits the car, and ….
- This is called the qualification problem
Introduction
- Example: Automated taxi with goal of taking a passenger to the airport on time
- \(A_{90}\) is the correct plan to take, expected to maximize the agent’s performance measure
- Getting to the airport on time
- Avoid long unproductive wait at the airport
- Avoid speeding tickets
Introduction
- Example: Automated taxi with goal of taking a passenger to the airport on time
- Agent cannot guarantee these outcomes but can provide a degree of belief that they will be achieved
- \(A_{180}\) might increase the agent’s degree of belief that it will get to the airport on time
- Also increases likelihood of waiting a long time
- Rational decision
- Depends on the relative importance of the various goals, and on the likelihood and degree to which they will be achieved
Summarizing Uncertainty
- Example of uncertain reasoning
- Diagnosing a dental patient’s toothache
- Diagnosis almost always involves uncertainty
- Medicine, car repair, etc.
- Rules for dental diagnosis using propositional logic
- We will see how the logical approach breaks down
- \(Toothache \Rightarrow Cavity\)
- This rule is wrong, not all patients with a \(toothache\) have a \(cavity\), some have gum disease, an abscess, or other problem
- \(Toothache \Rightarrow Cavity \lor GumProblem \lor Abscess \dots\)
- Need to add all possibilities to make the rule true
Summarizing Uncertainty
- We can invert the rule
- \(Cavity \Rightarrow Toothache\)
- This rule isn’t right either, not all cavities cause pain
- To make the rule true, we would need to add all the qualifications required for a cavity to cause a toothache
Summarizing Uncertainty
- Trying to use logic for the medical-diagnosis domain fails for three reasons
- Laziness
- It is too much work to list the complete set of antecedents or consequents needed to ensure an exceptionless rule
- And such a rule would be too hard to use
- Theoretical ignorance
- Medical science has no complete theory about the domain
- Practical ignorance
- We might be uncertain about a particular patient because not all the necessary tests have been run
Summarizing Uncertainty
- Connection between toothaches and cavities is not a logical consequence
- Typical in medical domain and most judgmental domains: law, business, design, automobile repair, \(\dots\)
- Agent can at best provide
- Degree of belief in the relevant sentences
Summarizing Uncertainty
- Tool to deal with degrees of belief
- Probability theory
- Ontological commitments of logic and probability theory are the same
- The world is composed of facts that hold or do not hold in any particular case
- Epistemological commitments are different
- A logical agent believes each sentence is true or false or has no opinion
- A probabilistic agent has a degree of belief from 0 (false) to 1 (true)
Summarizing Uncertainty
- Probability can be used as a way to summarize the uncertainty that comes from our laziness and ignorance
- Solving the qualification problem
- We might not know for sure about the problem of a particular patient
- We have a degree of belief: we believe there is an 80% chance (probability 0.8) that a patient with a toothache has a cavity
- This belief can be derived from statistical data
- 80% of the toothache patients seen so far also had a cavity
Summarizing Uncertainty
- Probability statements are made with respect to a knowledge state, not with respect to the real world
- The probability that a patient has a cavity, given that she has a toothache, is 0.8
- The probability that a patient has a cavity, given that she has a toothache and a history of gum disease, is 0.4
- The probability that a patient has a cavity, given all we now know, is almost 0
Uncertainty and Rational Decisions
- Which of these plans to go to the airport is a rational choice?
- \(A_{90}\) (90 minutes before), \(A_{180}\) (180 minutes before), or \(A_{1440}\) (1 day before)
- It depends on the agent’s preferences; to make such choices, an agent must have preferences
- We want to arrive on time, but without waiting too long
- How intolerable are a long wait and the airport food?
- We can use utility theory
- To reason and represent preferences
- In utility theory, every state has a degree of usefulness or utility
- An agent prefers states with high utility
Uncertainty and Rational Decisions
- Utility of a state is relative to an agent
- If one agent is winning a game, the other is losing it
- In a chess game against the world champion
- Most people would be extremely happy with a draw
- The former world champion might not
- A utility function can account for any set of preferences
- Preferences might be quirky or atypical, noble or perverse
Uncertainty and Rational Decisions
- Decision theory
- Combines preferences (expressed with utilities) with probabilities in the theory of rational decisions
- Decision theory = probability theory + utility theory
- Rational agent in decision theory
- An agent is rational iff it chooses the action that yields the highest expected utility, averaged over all the possible outcomes of the action
- This is the maximum expected utility (MEU) principle
- expected means the “average” or “statistical mean” of the outcomes weighted by the probability of the outcome
Uncertainty and Rational Decisions
- Agent using decision theory to select actions
- Identical (abstract level) to agents that maintain a belief state of the world
- Through a history of percepts
- Primary difference
- The decision-theoretic agent’s belief state represents: possibilities for world states + their probabilities
- Given the belief state, can make probabilistic predictions of action outcomes
- Can select action with highest expected utility
Basic Probability Notation
- In order to represent/use probabilistic information
- We need a formal language
- Probabilistic assertions are about possible worlds
- About how probable worlds are
- Sample space is the set of all possible worlds
- The possible worlds are
- Mutually exclusive
- Exhaustive
- \(\Omega\) refers to the sample space
- \(\omega\) refers to elements of the space, particular possible worlds
Basic Probability Notation
- Probability model
- A fully specified probability model associates a probability \(P(\omega)\) with each possible world
- Basic axioms of probability say
- Every possible world has probability between 0 and 1: \(0 \leq P(\omega) \leq 1\) for every \(\omega\)
- The total probability of the set of possible worlds is 1: \(\sum\limits_{\omega \in \Omega} P(\omega) = 1\)
Basic Probability Notation
- Example: rolling two dice, assuming each die is fair
- Possible worlds (1,1), (1,2), \(\dots\), (6,6)
- Each with probability \(1/36\)
- Example: rolling two dice, assuming the dice have a tendency to produce the same number
- Worlds (1,1), (2,2), (3,3), etc. might have higher probabilities
- This leaves the other worlds with lower probabilities
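A minimal sketch in Python of such a probability model for two fair dice, enumerating the 36 possible worlds and checking both axioms (nothing beyond the standard library is assumed):

```python
from itertools import product

# Sample space Omega for two fair dice: 36 equally likely possible worlds.
omega = {(d1, d2): 1 / 36 for d1, d2 in product(range(1, 7), repeat=2)}

assert all(0 <= p <= 1 for p in omega.values())  # every P(w) is in [0, 1]
assert abs(sum(omega.values()) - 1.0) < 1e-12    # probabilities sum to 1
```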
Basic Probability Notation
- Probabilistic assertions and queries
- Usually about sets of worlds (not over particular possible worlds)
- e.g., cases where the two dice add up to 11, or where doubles are rolled
- Events
- A set of outcomes of an experiment
- In AI these sets are described by propositions in a formal language
- Proposition
- Describes a set of worlds
- Namely, the possible worlds in which the proposition holds
Basic Probability Notation
- Probability associated with a proposition
- Sum of the probabilities of the worlds in which it holds
- For any proposition \(\phi\), \(P(\phi) = \sum\limits_{\omega \in \phi} P(\omega)\)
- Example: \(P(Total=11) = P((5,6)) + P((6,5)) = 1/36 + 1/36 = 1/18\)
- Probability theory doesn’t require complete knowledge of the probability of each possible world
- If we believe dice conspire to produce the same number, we might assert \(P(doubles) = 1/4\)
- But we don’t know if dice prefer double 6 or double 2
Basic Probability Notation
- Unconditional probabilities or prior
- The degree of belief in propositions
- In the absence of any other information
- Evidence
- Information that has been revealed
- Conditional probability or posterior
- e.g., rolling doubles given that the first die is a 5
- \(P(doubles | Die_1 = 5)\)
Basic Probability Notation
- Example: Going to the dentist for a checkup
- The prior probability of a cavity: \(P(cavity) = 0.2\)
- If I go to the dentist because I have a toothache, the conditional (posterior) probability applies
- \(P(cavity | toothache) = 0.6\)
- Whenever toothache is true and we have no further information, we conclude that cavity is true with probability 0.6
- This does NOT mean that whenever toothache is true, cavity is true with probability 0.6
- Additional evidence can change the posterior, e.g. \(P(cavity | toothache \land \neg cavity) = 0\)
Basic Probability Notation
- Conditional probabilities are defined in terms of unconditional probabilities
- \(P(a|b) = \frac{P(a \land b)}{P(b)}\), holds when \(P(b) > 0\)
- \(P(doubles | Die_1 = 5) = \frac{P(doubles \land Die_1 = 5)}{P(Die_1 = 5)}\)
- Product rule
- The definition of conditional probability can be rewritten as the product rule
- \(P(a \land b) = P(a|b) P(b)\)
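Continuing the two-dice sketch, the definition can be applied directly; `prob` is a small hypothetical helper (not a library function) that sums the probabilities of the worlds where a proposition holds:

```python
from itertools import product

omega = {(d1, d2): 1 / 36 for d1, d2 in product(range(1, 7), repeat=2)}

def prob(event):
    """P(phi): sum of P(w) over the worlds w where the proposition holds."""
    return sum(p for w, p in omega.items() if event(w))

# P(doubles | Die1 = 5) = P(doubles AND Die1 = 5) / P(Die1 = 5)
p_joint = prob(lambda w: w[0] == w[1] and w[0] == 5)  # only (5,5): 1/36
p_die1_5 = prob(lambda w: w[0] == 5)                  # six worlds: 6/36
print(p_joint / p_die1_5)                             # 1/6 ≈ 0.1667
```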
Basic Probability Notation
- The language of propositions in probability assertions
- We use a notation combining propositional logic and constraint satisfaction
- To write propositions describing sets of possible worlds
- A factored representation
- A possible world is represented by a set of attribute/value pairs
Basic Probability Notation
- Random variables
- In probability theory, variables are called random variables
- Names of variables begin with an uppercase letter
- Domain of a variable is the set of possible values that it can take
- A boolean variable has a domain of \(\{true, false\}\)
- e.g., \(Doubles = true\), for the proposition that doubles are rolled
- \(A = true\), abbreviated as \(a\)
- \(A = false\), abbreviated as \(\neg a\)
- Domains can be finite or infinite
- Finite: \(\{ red, blue, green\}\)
- Infinite discrete: like integer numbers
- Infinite continuous: like real numbers
Basic Probability Notation
- We can combine elementary propositions by using connectives (from propositional logic)
- i.e. " The probability that the patient has a cavity, given that she is a teenager with no toothache, is 0.1"
- \(P(cavity | \neg toothache \land teen) = 0.1\)
- To specify probabilities of all possible values of a random variable
- \(P(Weather = sunny) = 0.6\)
- \(P(Weather = rain) = 0.1\)
- \(P(Weather = cloudy) = 0.29\)
- \(P(Weather = snow) = 0.01\)
- We can abbreviate the previous statements as follows (we refer to these abbreviations as the P notation)
- P\((Weather) = \langle 0.6, 0.1, 0.29, 0.01 \rangle\), where P indicates a vector of values
Basic Probability Notation
- Probability distributions (for single variables)
- The probability of each measurable subset of the possible outcomes of a random experiment
- P\((Weather) = \langle 0.6, 0.1, 0.29, 0.01 \rangle\) defines a probability distribution for random variable Weather
- P\((X|Y)\) gives the values for \(P(X=x_i | Y=y_j)\) for each possible \(i\), \(j\) pair
Basic Probability Notation
- For continuous variables we use a probability density function or pdf
- Describes relative likelihood for a random variable to take a given range of values (including very small ranges)
- We can still write an expression for the value at a particular point \(x\) (strictly a density, not a probability)
- \(P(NoonTemp = x) = Uniform_{[18C, 26C]}(x)\)
- Expresses the belief that the temperature at noon is uniformly distributed between 18 and 26 degrees Celsius
- \(P(x) = \lim\limits_{dx \rightarrow 0} P(x \leq X \leq x + dx)/dx\)
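A small sketch of the uniform-density example; `uniform_pdf` is a hypothetical helper for the \(Uniform_{[18C, 26C]}\) distribution above:

```python
# Uniform density on [18, 26] degrees Celsius (the NoonTemp example).
def uniform_pdf(x, lo=18.0, hi=26.0):
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

print(uniform_pdf(20.5))   # density 1/8 = 0.125 per degree
# Probabilities come from integrating the density over an interval:
# P(20 <= NoonTemp <= 22) = 0.125 * (22 - 20) = 0.25
```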
Basic Probability Notation
- Joint Probability Distribution
- Distributions on multiple variables
- P\((Weather, Cavity)\)
- Denotes probabilities of all combinations of the values of Weather and Cavity
- A 4 \(\times\) 2 table, called the joint probability distribution of Weather and Cavity
- Can mix variables with and without values: P\((sunny, Cavity)\)
- A vector with 2 elements, the probabilities of a sunny day with a cavity and a sunny day with no cavity
- We can use the P notation to make an expression more concise
- e.g., the product rule applied to all possible values of Weather and Cavity \[P(Weather, Cavity) = P(Weather | Cavity)P(Cavity)\]
- instead of writing these 4 \(\times\) 2 = 8 equations
Basic Probability Notation
- The probability associated with a proposition is defined as the sum of the probabilities of the worlds in which it holds
- For proposition \(\phi\), \(P(\phi) = \sum\limits_{\omega \in \phi} P(\omega)\)
- A possible world is an assignment of values to all of the random variables under consideration
- With variables Cavity, Toothache, and Weather, we have \(2 \times 2 \times 4 = 16\) possible worlds
- A probability model is completely determined by the joint distribution for all of the random variables
- The full joint probability distribution
- For variables Cavity, Toothache, and Weather:
- Full joint probability distribution
- P\((Cavity, Toothache, Weather)\), a \(2 \times 2 \times 4\) table with 16 entries
- A proposition’s probability is the sum over possible worlds
Inference Using Full Joint Distributions
- Probabilistic Inference
- Computation of posterior probabilities for query propositions given observed evidence
- Our knowledge base is the full joint distribution
- Use it to answer questions!
- Techniques for manipulating equations involving probabilities
Inference Using Full Joint Distributions
- Inference example
- Domain with 3 variables, booleans:
- \(Toothache\)
- \(Cavity\)
- \(Catch\): the dentist’s steel probe catches in the tooth
- Full joint distribution: \(2 \times 2 \times 2\) table
Inference Using Full Joint Distributions
- Probabilities sum to 1
- Equation \(P(\phi) = \sum\limits_{\omega \in \phi} P(\omega)\) gives direct way to compute the probability of any proposition
- \(P(cavity \lor toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28\)
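A sketch of this knowledge base in Python. The six probabilities used in these slides appear below; the two remaining \(\neg toothache, \neg cavity\) entries are filled in from the textbook’s table so that the distribution sums to 1:

```python
# Full joint distribution P(Toothache, Catch, Cavity) as a table.
joint = {
    # (toothache, catch, cavity): probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.016,
    (True,  False, True):  0.012, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.144,
    (False, False, True):  0.008, (False, False, False): 0.576,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9   # second axiom

def prob(event):
    """P(phi) = sum of P(w) over the worlds where phi holds."""
    return sum(p for w, p in joint.items() if event(*w))

print(prob(lambda t, c, cav: cav or t))   # P(cavity or toothache) = 0.28
```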
Inference Using Full Joint Distributions
- Marginal probability
- Used to extract the distribution over some subset of variables or a single variable
- Adding entries in the first row: gives unconditional or marginal probability of cavity
- \(P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2\)
- This process is called marginalization, or summing out
- Sum probabilities for each possible value of the other variables
- Marginalization rule for any sets of variables Y and Z
- \(P(Y) = \sum\limits_{z \in Z} P(Y,z)\)
- \(\sum\limits_{z \in Z}\) refers to the sum over all possible combinations of values of the set of variables Z
- \(P(Cavity) = \sum\limits_{z \in \{Catch, Toothache\}} P(Cavity, z)\).
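Continuing the table sketch above, marginalization is just a restricted sum over the same table:

```python
# Marginalization (summing out): P(cavity) from the full joint table above.
p_cavity = prob(lambda t, c, cav: cav)
print(p_cavity)   # 0.108 + 0.012 + 0.072 + 0.008 = 0.2
```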
Inference Using Full Joint Distributions
- Marginal probability
- Conditioning
- Marginalization with conditional probabilities instead of joint probabilities using the product rule
- \(P(Y) = \sum\limits_z P(Y|z)P(z)\).
- Compute the probability of a cavity, given evidence of a toothache
- \(P(cavity|toothache) = \frac{P(cavity \land toothache)}{P(toothache)}\)
- \(=\frac{0.108 + 0.012}{0.108 + 0.012 + 0.016 + 0.064} = 0.6\)
- Compute the probability that there is no cavity, given a toothache
- \(P(\neg cavity | toothache) = \frac{P(\neg cavity \land toothache)}{P(toothache)}\)
- \(=\frac{0.016 + 0.064}{0.108 + 0.012 + 0.016 + 0.064} = 0.4\)
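The same two conditional probabilities, computed from the table sketch:

```python
# Conditioning: P(cavity | toothache) = P(cavity AND toothache) / P(toothache)
p_toothache = prob(lambda t, c, cav: t)           # 0.2
p_cav_and_t = prob(lambda t, c, cav: cav and t)   # 0.12
print(p_cav_and_t / p_toothache)                  # 0.6
print(prob(lambda t, c, cav: (not cav) and t) / p_toothache)   # 0.4
```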
Inference Using Full Joint Distributions
- Normalization
- As they should, both values sum to 1.0
- In these computations the term \(1/P(toothache)\) remains constant, no matter which value of \(Cavity\) we compute
- This term can be viewed as a Normalization constant for the distribution \(P(Cavity | toothache)\)
- Ensuring it adds up to 1
- We use \(\alpha\) to denote these constants
\[
\begin{aligned}
P(Cavity | toothache) &= \alpha P(Cavity, toothache) \\
&= \alpha [P(Cavity, toothache, catch) + P(Cavity, toothache, \neg catch)] \\
&= \alpha [\langle 0.108, 0.016 \rangle + \langle 0.012, 0.064 \rangle] = \alpha \langle 0.12, 0.08 \rangle = \langle 0.6, 0.4 \rangle
\end{aligned}
\]
Inference Using Full Joint Distributions
- Normalization
- We can compute \(P(Cavity | toothache)\) even without knowing \(P(toothache)\)
- Add up values of \(cavity\) and \(\neg cavity\), getting 0.12 and 0.08
- These are the correct relative proportions but don’t sum to 1
- We normalize by dividing each by \(0.12 + 0.08\)
- We get the true probabilities of 0.6 and 0.4
- Normalization is a useful shortcut in many probability calculations
- It makes computation easier
- It lets us proceed even when some probability assessment (such as \(P(toothache)\)) is unavailable
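A sketch of the normalization shortcut, reusing the table above: compute the unnormalized values and divide by their sum (\(\alpha\)); \(P(toothache)\) itself is never needed:

```python
# Unnormalized values for each value of Cavity, then normalize by alpha.
unnorm = [prob(lambda t, c, cav: t and cav),        # 0.12
          prob(lambda t, c, cav: t and not cav)]    # 0.08
alpha = 1.0 / sum(unnorm)
print([alpha * v for v in unnorm])                  # [0.6, 0.4]
```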
Inference Using Full Joint Distributions
- Normalization
- General inference procedure
- with a single variable \(X\) (\(Cavity\) in the example)
- with a list of evidence variables \(E\) (\(Toothache\) in the example)
- with \(e\) as the list of observed values
- with \(Y\) the remaining unobserved variables (just \(Catch\) in the example)
- The query is \(P(X | e)\)
- Evaluated as: \(P(X | e) = \alpha P(X, e) = \alpha \sum\limits_y P(X, e, y)\)
- summation is over all possible \(y\)’s, all possible combinations of values of the unobserved variables \(Y\)
- But this procedure doesn’t scale well
- For a domain described by \(n\) Boolean variables:
- Input size: the full joint table has \(O(2^n)\) entries
- Time to process: \(O(2^n)\) to process the table
- We can easily have \(n > 100\), and \(O(2^n)\) is impractical
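A sketch of this general procedure for Boolean variables, reusing the `joint` table defined earlier; the function name and signature are illustrative only:

```python
def enumerate_query(var, evidence, joint, var_names):
    """P(X | e) = alpha * sum_y P(X, e, y): sum the full joint over the
    unobserved variables, then normalize over the query variable's values.
    Assumes `joint` maps tuples of Boolean values (ordered as var_names)
    to probabilities."""
    dist = {}
    for x in (True, False):
        total = 0.0
        for world, p in joint.items():
            assignment = dict(zip(var_names, world))
            if assignment[var] == x and all(assignment[e] == v
                                            for e, v in evidence.items()):
                total += p          # the sum over y happens implicitly
        dist[x] = total
    alpha = 1.0 / sum(dist.values())
    return {x: alpha * p for x, p in dist.items()}

# P(Cavity | toothache) from the dental joint table above:
print(enumerate_query('Cavity', {'Toothache': True}, joint,
                      ['Toothache', 'Catch', 'Cavity']))
# {True: 0.6, False: 0.4}
```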
Independence
- If we add 1 variable to our joint distribution,
- \(P(Toothache, Catch, Cavity, Weather)\), with \(2 \times 2 \times 2 \times 4 = 32\) entries
- We now have four copies of the \(P(Toothache, Catch, Cavity)\) table, one for each kind of weather
- But a variable might be independent of the others
- We can reduce the size of the full joint distribution!
Independence
- Independence (also marginal independence and absolute independence)
- \(P(a | b) = P(a)\) or \(P(b | a) = P(b)\) or \(P(a \land b) = P(a) P(b)\)
- Independence among variables is written as:
- \(P(X | Y) = P(X)\) or \(P(Y | X) = P(Y)\) or \(P(X, Y) = P(X) P(Y)\)
Independence
- Factoring
- We can factor a large joint distribution into smaller distributions
- Divide the complete set of variables into independent subsets
- e.g., the full joint distribution of the outcomes of \(n\) independent coin flips
- \(P(C_1, \dots C_n)\) has \(2^n\) entries
- Can be represented as product of \(n\) single-variable distributions \(P(C_i)\)
- When available, independence assertions help to reduce the size of the domain representation and the complexity of the inference problem
- Unfortunately
- Clean separation of entire sets of variables by independence is not common
- If a connection exists between two variables, they are not independent
- We need more than independence
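A tiny illustration of the saving from the coin-flip factorization:

```python
# n independent coin flips: instead of a 2**n-entry joint table, we keep
# n single-variable distributions and multiply them as needed.
n = 100
p_heads = 0.5
p_sequence = 1.0
for _ in range(n):
    p_sequence *= p_heads   # n multiplications, not a 2**100-entry table
print(p_sequence)           # 2**-100; the full table would be intractable
```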
Bayes’ Rule and its Use
- The product rule can be written in two ways
- \(P(a \land b) = P(a|b) P(b)\) and \(P(a \land b) = P(b|a) P(a)\)
- Equating the two right hand sides and dividing by \(P(a)\) we get
- \(P(b | a) = \frac{P(a|b) P(b)}{P(a)}\)
- This is the Bayes’ rule or Bayes’ law or Bayes’ theorem
- Most modern AI systems use Bayes’ rule for probabilistic inference
Bayes’ Rule and its Use
- General case of Bayes’ rule for multivalued variables
- \(P(Y | X) = \frac{P(X|Y) P(Y)}{P(X)}\)
- Represents a set of equations, each dealing with specific values of the variables
- General version conditionalized on some background evidence \(e\)
- \(P(Y | X, e) = \frac{P(X|Y, e)P(Y|e)}{P(X|e)}\)
Bayes’ Rule and its Use
- Applying Bayes’ rule: the simple case
- Often we have good estimates for three of these terms and need to compute the fourth
- When we perceive the effect of an unknown cause and want to compute that cause:
- \(P(cause|effect) = \frac{P(effect|cause) P(cause)}{P(effect)}\)
- \(P(effect | cause)\) quantifies the relationship in the causal direction
- \(P(cause | effect)\) describes the diagnostic direction
- In medical diagnosis we often have conditional probabilities on causal relationships
- Doctor knows: \(P(symptoms | disease)\)
- Want to derive a diagnosis: \(P(disease | symptoms)\)
Bayes’ Rule and its Use
- Example:
- Doctor knows that meningitis causes the patient to have a stiff neck 70% of the time
- Doctor knows some unconditional facts
- Prior probability that patient has meningitis is 1/50,000
- Prior probability that a patient has a stiff neck is 1%
- \(s\) is the proposition: the patient has a stiff neck and
- \(m\) is the proposition: the patient has meningitis
- \(P(s|m) = 0.7\)
- \(P(m) = 1/50000\)
- \(P(s) = 0.01\)
- \(P(m|s) = \frac{P(s|m)P(m)}{P(s)} = \frac{0.7 \times 1/50000}{0.01} = 0.0014\).
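The same calculation as a short sketch:

```python
# Bayes' rule for the meningitis example: P(m | s) = P(s | m) P(m) / P(s)
p_s_given_m = 0.7        # stiff neck given meningitis
p_m = 1 / 50000          # prior probability of meningitis
p_s = 0.01               # prior probability of a stiff neck
print(p_s_given_m * p_m / p_s)   # 0.0014
```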
Bayes’ Rule and its Use
- Normalization also applies when working with the Bayes’ rule
- We can avoid using the probability of the evidence, \(P(s)\)
- Instead computing a posterior probability for each value of the query variable (\(m\) and \(\neg m\)) and then normalizing the results
- \(P(M|s) = \alpha \langle P(s|m) P(m), P(s|\neg m) P(\neg m) \rangle\)
- Now, we need \(P(s|\neg m)\) instead of \(P(s)\)
- General form of Bayes’ rule with normalization is
- \(P(Y|X) = \alpha P(X|Y) P(Y)\)
- \(\alpha\) is the normalization constant, to make entries in \(P(Y|X)\) sum to 1
Using Bayes’ rule: Combining Evidence
- What do we do when we have 2 or more values as evidence?
- e.g., what can a dentist conclude if her probe catches in the aching tooth of a patient?
- With the full joint distribution:
- \(P(Cavity|toothache \land catch) = \alpha \langle 0.108, 0.016 \rangle \approx \langle 0.871, 0.129 \rangle\)
- This doesn’t scale up well…
- We may reformulate the problem using the Bayes’ rule
- \(P(Cavity | toothache \land catch) = \alpha P(toothache \land catch | Cavity) P(Cavity)\)
- But now we need the conditional probabilities of the conjunction \(toothache \land catch\) for each value of \(Cavity\)
- Might be feasible for just 2 evidence variables
- But it doesn’t scale up to many evidence variables (e.g., X-rays, diet, oral hygiene, etc.)
- \(2^n\) combinations…
- We need to simplify the expressions
Using Bayes’ rule: Combining Evidence
- Independence could help simplify the expressions
- It would be convenient if \(Toothache\) and \(Catch\) were independent
- But they are not
- If probe catches in the tooth, then it is likely that the tooth has a cavity and that the cavity causes a toothache
- But these variables are independent given the presence or absence of a cavity
- Each is directly caused by the cavity but neither has a direct effect on the other
- Toothache depends on the state of the nerves in the tooth
- Probe’s accuracy depends on the dentist but the toothache is irrelevant here
- \(P(toothache \land catch | Cavity) = P(toothache | Cavity) P(catch | Cavity)\)
- We can replace this in the equation for Bayes’ rule
- \(P(Cavity | toothache \land catch) = \alpha P(toothache \land catch | Cavity) P(Cavity)\)
- To get: \(P(Cavity | toothache \land catch) = \alpha P(toothache | Cavity) P(catch | Cavity) P(Cavity)\)
- General definition of conditional independence of two variables \(X\) and \(Y\) given \(Z\)
- \(P(X,Y|Z) = P(X|Z)P(Y|Z)\)
Using Bayes’ rule: Combining Evidence
- Conditional independence assertions allow probabilistic systems to scale up
- SEPARATION
- We say that \(Cavity\) separates \(Toothache\) and \(Catch\)
- It is a direct cause of both of them
- A pattern in which a single cause directly influences a number of effects
- Effects are conditionally independent
- The full joint distribution is
- \(P(Cause, Effect_1, \dots, Effect_n) = P(Cause) \prod\limits_i P(Effect_i | Cause)\).
- This is the Naive Bayes model
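A sketch of naive Bayes on the dental example; the conditional probabilities below are derived from the joint table used earlier in these slides (e.g., \(P(toothache | cavity) = 0.12 / 0.2 = 0.6\)):

```python
# Naive Bayes: P(Cavity | toothache, catch) is proportional to
#   P(Cavity) * P(toothache | Cavity) * P(catch | Cavity).
p_cavity = {True: 0.2, False: 0.8}
p_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
p_catch_given = {True: 0.9, False: 0.2}       # P(catch | Cavity)

unnorm = {cav: p_cavity[cav] * p_toothache_given[cav] * p_catch_given[cav]
          for cav in (True, False)}           # {True: 0.108, False: 0.016}
alpha = 1.0 / sum(unnorm.values())
print({cav: round(alpha * p, 3) for cav, p in unnorm.items()})
# {True: 0.871, False: 0.129} -- matches the full-joint computation above
```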
Using Bayes’ rule: Combining Evidence
- Naive Bayes
- “Naive” because it is often used (as a simplifying assumption) in cases where the effect variables are not actually conditionally independent given the cause variable
- Naive Bayes is also used as a Bayesian classifier
- It often works surprisingly well, even when the conditional independence assumption isn’t true
The Wumpus World Revisited
- Probabilistic reasoning in the Wumpus World
- We deal with uncertainty in the wumpus world
- Agent’s sensors give partial information about the world
- e.g., each of the three reachable squares [1,3], [2,2], [3,1] might contain a pit
- With logical inference we cannot conclude which square is safest to move to
- We have to choose randomly
- A probabilistic agent can do better than this
The Wumpus World Revisited
- Compute the probability that each of those squares contains a pit
- A pit causes breezes in all neighboring squares
- Each square other than [1,1] contains a pit with probability 0.2
- Our random variables are:
- \(P_{i,j}\), a boolean variable for each square, true iff square [i,j] contains a pit
- \(B_{ij}\), a boolean variable for each square, true iff square [i,j] is breezy
- Only known for the observed squares [1,1], [1,2], [2,1]
- We specify the full joint distribution:
- \(P(P_{1,1}, \dots, P_{4,4}, B_{1,1}, B_{1,2}, B_{2,1})\)
- We apply the product rule
- \(P(P_{1,1}, \dots, P_{4,4}, B_{1,1}, B_{1,2}, B_{2,1})\) =
- \(P(B_{1,1}, B_{1,2}, B_{2,1} | P_{1,1}, \dots P_{4,4}) P(P_{1,1}, \dots P_{4,4})\)
- First term: conditional probability distribution of a breeze configuration given a pit configuration
- Second term: prior probability of a pit configuration
- Each square contains a pit with probability 0.2 independently of the other squares, then
- \(P(P_{1,1}, \dots, P_{4,4}) = \prod\limits_{i,j=1,1}^{4,4} P(P_{i,j})\)
- For a configuration with exactly \(n\) pits, \(P(P_{1,1}, \dots, P_{4,4}) = 0.2^n \times 0.8^{16-n}\)
The Wumpus World Revisited
- Situation of figure 13.5(a)
- Evidence:
- We have observed a breeze (or its absence) in each visited square, and those squares contain no pits
- Abbreviate these facts:
- \(b = \neg b_{1,1} \land b_{1,2} \land b_{2,1}\)
- \(known = \neg p_{1,1} \land \neg p_{1,2} \land \neg p_{2,1}\)
- Query:
- \(P(P_{1,3} | known, b)\): how likely is it that [1,3] contains a pit, given the observations so far?
The Wumpus World Revisited
- Situation of figure 13.5(a)
- To answer the query we can use the general inference procedure (equation 9)
- Sum over entries from the full joint distribution
- \(P(X|e)= \alpha P(X,e) = \alpha \sum\limits_{y} P(X, e, y)\)
- Let Unknown be the set of \(P_{i,j}\) variables for squares other than the Known squares and the query square [1,3]
- \(P(P_{1,3} | known, b) = \alpha \sum\limits_{unknown} P(P_{1,3}, unknown, known,b)\)
- This solves our problem but it requires a lot of computation
- 12 unknown squares, the summation contains \(2^{12} = 4096\) terms
- The summation grows exponentially
The Wumpus World Revisited
- Situation of figure 13.5(a)
- But we can take advantage of the conditional independence property
- The observed breezes are conditionally independent of the other variables given the known, frontier, and query variables.
- We manipulate the query formula so that breezes are conditioned on all other variables, then we apply conditional independence
- \(P(P_{1,3} | known, b)\)
- \(= \alpha \sum\limits_{unknown} P(P_{1,3}, known, b, unknown)\) (by equation 9)
- \(=\alpha \sum\limits_{unknown} P(b|P_{1,3}, known, unknown) P(P_{1,3}, known, unknown)\) (by the product rule)
- \(=\alpha \sum\limits_{frontier}\sum\limits_{other} P(b|known, P_{1,3},frontier,other)P(P_{1,3}, known,frontier,other)\) (splitting \(unknown\) into \(frontier \cup other\))
- \(=\alpha \sum\limits_{frontier}\sum\limits_{other} P(b|known, P_{1,3},frontier)P(P_{1,3}, known,frontier,other)\)
- Final step uses conditional independence: \(b\) is independent of \(other\) given \(known\), \(P_{1,3}\), and \(frontier\)
- We move the summation inward:
- \(P(P_{1,3}|known,b)=\alpha\sum\limits_{frontier}P(b|known,P_{1,3},frontier)\sum\limits_{other}P(P_{1,3},known,frontier,other)\)
- By independence (equation 20), the prior term can be factored and terms reordered
- \(P(P_{1,3}|known,b)=\alpha\sum\limits_{frontier}P(b|known,P_{1,3},frontier)\sum\limits_{other}P(P_{1,3})P(known)P(frontier)P(other)\)
- \(=\alpha P(known)P(P_{1,3})\sum\limits_{frontier}P(b|known,P_{1,3},frontier)P(frontier)\sum\limits_{other}P(other)\)
- \(=\alpha ' P(P_{1,3})\sum\limits_{frontier} P(b|known,P_{1,3},frontier)P(frontier)\)
- the last step folds \(P(known)\) into the normalizing constant and uses the fact that \(\sum\limits_{other}P(other)\) equals 1
- Now we have just 4 terms in the summation over the frontier variables \(P_{2,2}\) and \(P_{3,1}\)
- Use of independence and conditional independence eliminated the other squares from consideration
- Notice that \(P(b|known, P_{1,3}, frontier)\) is 1 when the frontier is consistent with the breeze observations and 0 otherwise
- Thus, for each value of \(P_{1,3}\) we sum over the logical models for the frontier variables that are consistent with the known facts (models and associated prior probabilities P(frontier) are shown in fig. 6)
- \(P(P_{1,3}|known,b) = \alpha' \langle0.2(0.04 +0.16+0.16), 0.8(0.04+0.16)\rangle \approx \langle 0.31,0.69 \rangle\)
- Then [1,3] (and [3,1] by symmetry) contains a pit with approx. 31% probability
- A similar calculation gives that [2,2] contains a pit with approx. 86% probability.
- Then the agent should avoid [2,2]
- Logic tells us that it is unknown whether there is a pit in [2,2], but we need probability to tell us how likely it is!
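A sketch of this frontier computation for the situation of figure 13.5(a); the square adjacencies and the query are hard-coded for this particular case:

```python
from itertools import product

# Frontier squares for the query P(P13 | known, b): [2,2] and [3,1].
# A frontier model is consistent with b (breeze in [1,2] and [2,1], none
# in [1,1]) iff every observed breezy square has at least one adjacent pit.
P_PIT = 0.2

def consistent(p13, p22, p31):
    breeze_12 = p13 or p22          # [1,2] is adjacent to [1,3] and [2,2]
    breeze_21 = p22 or p31          # [2,1] is adjacent to [2,2] and [3,1]
    return breeze_12 and breeze_21  # [1,1]'s neighbors are known pit-free

unnorm = {}
for p13 in (True, False):
    total = 0.0
    for p22, p31 in product((True, False), repeat=2):
        if consistent(p13, p22, p31):
            prior = ((P_PIT if p22 else 1 - P_PIT) *
                     (P_PIT if p31 else 1 - P_PIT))
            total += prior          # P(b | ...) is 1 for consistent models
    unnorm[p13] = (P_PIT if p13 else 1 - P_PIT) * total

alpha = 1.0 / sum(unnorm.values())
print({k: round(alpha * v, 2) for k, v in unnorm.items()})
# {True: 0.31, False: 0.69} -- matches the slide's result
```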