Preparation for Third Midterm - Answers

Answers

  1. Compute P(fire | alarm), given the following information:
    P(alarm | fire) = A
    P(alarm | not fire) = B
    P(fire) = C
    Answer:

    P(fire | alarm) = P(alarm | fire) * P(fire) / P(alarm) = A * C / P(alarm).

    P(alarm) = P(alarm, fire) + P(alarm, not fire)
    = P(alarm | fire) * P(fire) + P(alarm | not fire) * P(not fire)
    = A * C + B * (1 - C)
    So, if we set D = P(alarm) = A * C + B * (1 - C), our final answer is:
    P(fire | alarm) = A * C / D
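The derivation above can be sketched as a small function. This is a minimal illustration: the function name and the numeric inputs are hypothetical and do not come from the exam.

```python
def posterior_fire_given_alarm(a: float, b: float, c: float) -> float:
    """Return P(fire | alarm).

    a = P(alarm | fire), b = P(alarm | not fire), c = P(fire).
    """
    # D = P(alarm), by the law of total probability.
    d = a * c + b * (1.0 - c)
    if d == 0.0:
        raise ValueError("P(alarm) is zero, so P(fire | alarm) is undefined")
    # Bayes' rule: P(fire | alarm) = A * C / D.
    return a * c / d


# Hypothetical values for A, B, C, chosen only for illustration.
example = posterior_fire_given_alarm(0.9, 0.1, 0.01)
print(round(example, 4))
```

With these illustrative inputs the posterior stays small even though the alarm is reliable, because the prior P(fire) is small.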
  2. We are given the following information:
    P(fire) = 0.1
    P(earthquake) = 0.2
    P(flood) = 0.4
    1. Suppose that we do not know whether fire, earthquake, and flood are independent events. Can we compute the probability P(fire and earthquake and flood)? If yes, what is P(fire and earthquake and flood)?

      Answer:

      No. If we do not know whether fire, earthquake, and flood are independent events, we need additional information (such as a joint distribution table) to compute P(fire and earthquake and flood).

    2. Suppose that we know that fire, earthquake, and flood are independent events. Can we compute the probability P(fire and earthquake and flood)? If yes, what is P(fire and earthquake and flood)?

      Answer:

      Yes.

      P(fire and earthquake and flood) = P(fire) * P(earthquake) * P(flood) = 0.1 * 0.2 * 0.4 = 0.008.

    3. Suppose that we know that fire, earthquake, and flood are not independent events. Can we compute the probability P(fire and earthquake and flood)? If yes, what is P(fire and earthquake and flood)?

      Answer:

      No, if we know that fire, earthquake, and flood are not independent events, then we would need some additional information (such as a joint distribution table) to compute P(fire and earthquake and flood).
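To see why the marginals alone are not enough, the sketch below computes the range of values that P(fire and earthquake and flood) can take without an independence assumption, using the standard Fréchet bounds for an intersection of events. The function name is hypothetical; the bounds themselves are not part of the original answer and are shown only as an illustration.

```python
import math


def joint_bounds(marginals):
    """Return (lower, upper) bounds on the probability that all events occur.

    Lower bound: max(0, sum of marginals - (n - 1)).
    Upper bound: the smallest marginal.
    """
    n = len(marginals)
    lower = max(0.0, sum(marginals) - (n - 1))
    upper = min(marginals)
    return lower, upper


marginals = [0.1, 0.2, 0.4]  # P(fire), P(earthquake), P(flood)
lower, upper = joint_bounds(marginals)
independent = math.prod(marginals)  # value under independence only

print(lower, upper, round(independent, 3))
```

Any value between the two bounds is consistent with the given marginals; the independence assumption is what singles out the product 0.008.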

  3. Compute P(commute time < 20 min | temperature > 80), given the following joint probability distribution:
    Commute time   40-60 Fahrenheit   60-80 Fahrenheit   above 80 Fahrenheit
    < 20 min       0.1                0.05               0.1
    20-40 min      0.2                0.1                0.1
    > 40 min       0.05               0.1                0.2

    Answer:

    P(commute time < 20 min | temperature > 80) = P(commute time < 20 min AND temperature > 80) / P(temperature > 80)

    P(commute time < 20 min AND temperature > 80) = 0.1

    P(temperature > 80) = 0.1 + 0.1 + 0.2 = 0.4

    P(commute time < 20 min | temperature > 80) = 0.1 / 0.4 = 0.25
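The same computation can be sketched in code by storing the joint distribution as a dictionary and dividing the joint entry by the marginal of the conditioning event. The dictionary layout and function name are illustrative choices, not part of the original answer.

```python
# Joint distribution P(commute time, temperature), copied from the table above.
joint = {
    ("< 20 min", "40-60"): 0.10,
    ("< 20 min", "60-80"): 0.05,
    ("< 20 min", "above 80"): 0.10,
    ("20-40 min", "40-60"): 0.20,
    ("20-40 min", "60-80"): 0.10,
    ("20-40 min", "above 80"): 0.10,
    ("> 40 min", "40-60"): 0.05,
    ("> 40 min", "60-80"): 0.10,
    ("> 40 min", "above 80"): 0.20,
}


def conditional(joint, commute, temperature):
    """Return P(commute | temperature) from a joint table."""
    # Marginal P(temperature): sum the column for that temperature range.
    p_temperature = sum(p for (_, t), p in joint.items() if t == temperature)
    return joint[(commute, temperature)] / p_temperature


print(round(conditional(joint, "< 20 min", "above 80"), 4))
```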
  4. For the Bayesian network shown in textbook figure 14.2: is P(Earthquake | Alarm) larger, equal to, or smaller than P(Earthquake | Alarm and Burglary)? You can either (not recommended) compute both probabilities, or (recommended) provide an intuitive (but correct) justification for your answer.

    Answer:

    We expect that P(Earthquake | Alarm) is larger than P(Earthquake | Alarm and Burglary). Burglary and Earthquake are competing causes for the Alarm event. Given that Alarm is true, if we know that one possible cause (Burglary) is true, the other competing cause (Earthquake) becomes less likely.
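The intuitive argument can also be checked numerically by enumerating the joint distribution over Burglary, Earthquake, and Alarm. The conditional probability values below are an assumption: they are the numbers commonly reproduced for the textbook's burglary network and should be checked against the figure. The helper names are hypothetical.

```python
from itertools import product

# Assumed CPT values for the burglary network; verify against the figure.
P_B = 0.001  # P(Burglary)
P_E = 0.002  # P(Earthquake)
# P(Alarm | Burglary, Earthquake), keyed by (burglary, earthquake).
P_A = {
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}


def joint(b, e, a):
    """P(Burglary=b, Earthquake=e, Alarm=a) from the network factorization."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    return pb * pe * pa


def prob(query, evidence):
    """P(query | evidence); both are predicates over (b, e, a)."""
    worlds = list(product([True, False], repeat=3))
    denominator = sum(joint(*w) for w in worlds if evidence(*w))
    numerator = sum(joint(*w) for w in worlds if evidence(*w) and query(*w))
    return numerator / denominator


p_e_given_a = prob(lambda b, e, a: e, lambda b, e, a: a)
p_e_given_ab = prob(lambda b, e, a: e, lambda b, e, a: a and b)
print(p_e_given_a > p_e_given_ab)
```

Under these assumed values, learning that Burglary is true lowers the probability of Earthquake, which matches the "competing causes" argument.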

  5. For the Bayesian network shown in textbook figure 14.2: is P(Earthquake | Alarm) larger, equal to, or smaller than P(Earthquake | Alarm and MaryCalls)? You can either (not recommended) compute both probabilities, or (recommended) provide an intuitive (but correct) justification for your answer.

    Answer:

    P(Earthquake | Alarm) is equal to P(Earthquake | Alarm and MaryCalls). Earthquake and MaryCalls are conditionally independent given the value for the Alarm event.

  6. We are building a decision tree to determine if the next car of a person will be a regular car or a minivan. We have 100 cases as examples. The following is true for those cases: What is the entropy gain of selecting the "over 35 years of age" attribute as a test for the root node of the decision tree?

    Answer:

    We call "parent" the node with the 100 training examples, "child1" the child node that receives the examples where the age is over 35 years, and "child2" the child node that receives the examples where the age is 35 or under. Node child1 receives 42 examples, and node child2 receives 58 examples. We denote by log2(x) the logarithm base 2 of x. Then:

    Entropy gain = Entropy(parent) - 42/100 * Entropy(child1) - 58/100 * Entropy(child2).

    Entropy(parent) = -0.4 * log2(0.4) - 0.6 * log2(0.6) = 0.971

    Entropy(child1) = -(30/42) * log2(30/42) - (12/42) * log2(12/42) = 0.8631

    Entropy(child2) = -(10/58) * log2(10/58) - (48/58) * log2(48/58) = 0.6632

    Entropy gain = Entropy(parent) - 42/100 * Entropy(child1) - 58/100 * Entropy(child2)
    = 0.971 - 0.42 * 0.8631 - 0.58 * 0.6632
    = 0.2238
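The arithmetic above can be reproduced with a short script. The class counts (40/60 at the parent, 30/12 in child1, 10/48 in child2) are read off the fractions used in the answer; the function names are illustrative.

```python
from math import log2


def entropy(counts):
    """Entropy in bits of a node with the given class counts."""
    total = sum(counts)
    # Classes with zero examples contribute nothing to the sum.
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent_counts, children_counts):
    """Parent entropy minus the size-weighted entropy of the children."""
    total = sum(parent_counts)
    remainder = sum(
        (sum(child) / total) * entropy(child) for child in children_counts
    )
    return entropy(parent_counts) - remainder


gain = information_gain([40, 60], [[30, 12], [10, 48]])
print(round(gain, 4))
```

Working from counts rather than rounded entropies avoids accumulating rounding error in the intermediate values.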

  7. Given a set of training examples, is there always a decision tree that perfectly classifies all training examples in that set? If yes, prove your answer. If no, provide a counter example.

    Answer:

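    If no two training examples have exactly the same values for all attributes but different class labels, then the answer is yes: a tree that keeps splitting on attributes until every leaf contains examples of a single class classifies all training examples correctly. If there are two training examples with exactly the same values for all attributes but different class labels, then the answer is no: both examples reach the same leaf, and that leaf can predict only one label.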

  8. There are two types of candy bags, type A and type B. Both types of bags contain an infinite number of candies. A bag of type A contains 80% chocolate candies and 20% vanilla candies. A bag of type B contains 40% chocolate candies and 60% vanilla candies. The prior probability P(A) of having a bag of type A is 0.99, and the prior probability P(B) of having a bag of type B is 0.01. What is the posterior probability that we have a bag of type A if the first candy that we pick is a vanilla candy?

    Answer:

    P(A | vanilla) = P(vanilla | A) * P(A) / P(vanilla) = 0.2 * 0.99 / P(vanilla)

    P(vanilla) = P(vanilla AND A) + P(vanilla AND B)
    = P(vanilla | A) * P(A) + P(vanilla | B) * P(B)
    = 0.2 * 0.99 + 0.6 * 0.01
    = 0.2040
    Consequently:
    P(A | vanilla) = P(vanilla | A) * P(A) / P(vanilla) = 0.2 * 0.99 / 0.204 = 0.9706
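The same update can be written as a small function that normalizes prior times likelihood over the two bag types. The function name and dictionary layout are illustrative choices.

```python
def posterior(priors, likelihoods):
    """Return P(hypothesis | observation) for every hypothesis.

    priors[h] = P(h); likelihoods[h] = P(observation | h).
    """
    # P(observation), by the law of total probability.
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}


# Observation: the first candy is vanilla.
post = posterior({"A": 0.99, "B": 0.01}, {"A": 0.2, "B": 0.6})
print(round(post["A"], 4))
```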

  9. Textbook exercise 11.2 (second edition), exercise 10.2 (third edition).

    Answer:

    Fly(P1, JFK, SFO)
    Fly(P2, SFO, JFK)
    Fly(P1, JFK, JFK)
    Fly(P2, SFO, SFO)
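The four instances can be enumerated mechanically. The sketch below assumes a state in which P1 is at JFK and P2 is at SFO, as the answer list suggests, and assumes that the Fly schema does not require the origin and destination to differ; both assumptions should be checked against the exercise text.

```python
# Assumed At(plane, airport) facts; not stated in this answer key.
at_facts = {("P1", "JFK"), ("P2", "SFO")}
airports = ["JFK", "SFO"]


def applicable_fly_actions(at_facts, airports):
    """List every Fly(p, from, to) whose preconditions hold in the state.

    A plane can fly from the airport it is at to any airport, including
    the same one, because the assumed schema has no from != to constraint.
    """
    return [
        f"Fly({plane}, {origin}, {destination})"
        for plane, origin in sorted(at_facts)
        for destination in airports
    ]


for action in applicable_fly_actions(at_facts, airports):
    print(action)
```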

  10. Textbook figure 10.3 in 3rd edition (figure 11.4 in 2nd edition) provides a description for a deterministic version of the blocks world. We want to make a modification to that description, so as to model a nondeterministic version, in which the effect of action move(b, x, y) is sometimes on(b, y) and sometimes on(b, table). How would you modify the description of the move action to make it reflect the above two possible outcomes of move(b, x, y)?

    Answer:

    We would replace the effect of the move action shown in that figure with the following effect:

    (On(b, y) and Clear(x) and (not On(b, x)) and (not Clear(y))) or
    (On(b, table) and Clear(x) and (not (On(b, x))))
  11. In the nondeterministic blocks world described in the previous exercise, suppose that the initial state and the goal are as follows:
    Initial state:
    On(A, Table) and On(B, Table) and On(C, Table) and Block(A)
    and Block(B) and Block(C) and Clear(A) and Clear(B) and Clear(C)

    Goal:
    On(A, B) and On(B, C)
    Is there a conditional plan that achieves this goal with guaranteed success? If yes, list the sequence of actions in that plan. If no, explain why not.

    Answer:

    No. No conditional plan guarantees success: every attempt at move(b, x, y) may leave b on the table instead of on y, so no finite number of moves is certain to place one block on top of another.