Preparation for Third Midterm

Practice Questions

- Is the following function P a valid probability function? If you answer "no", explain why not.
```
P(day == Monday) = 0.4
P(day == Tuesday) = 0.4
P(day == Wednesday) = 0.2
```
- Is the following function P a valid probability function? If you answer "no", explain why not.
```
P(day == Monday) = 0.4
P(day == Tuesday) = 0.4
P(day == Wednesday) = 0.2
P(day == Thursday) = 0.1
```
- Is the following function P a valid probability density function? If you answer "no", explain why not.
```
P(x) = 0.1, if 10 <= x <= 20.
P(x) = 0 otherwise.
```
- Is the following function P a valid probability density function? If you answer "no", explain why not.
```
P(x) = 0.01, if 10 <= x <= 20.
P(x) = 0 otherwise.
```
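The four questions above can be checked mechanically. The sketch below assumes the usual conditions: a discrete probability function must be nonnegative and sum to 1, and a density must be nonnegative and integrate to 1. The helper names `is_valid_pmf` and `uniform_density_area` are illustrative and not part of the course material.

```python
import math

def is_valid_pmf(pmf):
    """Return True if every value lies in [0, 1] and the values sum to 1."""
    values = list(pmf.values())
    return all(0.0 <= p <= 1.0 for p in values) and math.isclose(sum(values), 1.0)

def uniform_density_area(height, low, high):
    """Area under a density equal to `height` on [low, high] and 0 elsewhere."""
    return height * (high - low)

three_days = {"Monday": 0.4, "Tuesday": 0.4, "Wednesday": 0.2}
four_days = {"Monday": 0.4, "Tuesday": 0.4, "Wednesday": 0.2, "Thursday": 0.1}

print(is_valid_pmf(three_days))
print(is_valid_pmf(four_days))
# A valid density must have total area 1.
print(math.isclose(uniform_density_area(0.1, 10, 20), 1.0))
print(math.isclose(uniform_density_area(0.01, 10, 20), 1.0))
```

Tolerant comparison with `math.isclose` is used because sums of decimal fractions are not always exact in floating point.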
Compute P(fire | alarm) in terms of A, B, and C, given the following information:
```
P(alarm | fire) = A
P(alarm | not fire) = B
P(fire) = C
```
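One way to set this up is Bayes' rule, with the law of total probability expanding P(alarm) in the denominator. The derivation below is a sketch that assumes P(alarm) = AC + B(1 - C) is nonzero.

```latex
\begin{align*}
P(\text{fire} \mid \text{alarm})
  &= \frac{P(\text{alarm} \mid \text{fire})\, P(\text{fire})}
          {P(\text{alarm} \mid \text{fire})\, P(\text{fire})
           + P(\text{alarm} \mid \text{not fire})\, P(\text{not fire})} \\
  &= \frac{A\,C}{A\,C + B\,(1 - C)}
\end{align*}
```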
We are given the following information:
```
P(fire) = 0.1
P(earthquake) = 0.2
P(flood) = 0.4
```
1. Suppose that we do not know whether fire, earthquake, and flood are independent events. Can we compute the probability P(fire and earthquake and flood)? If yes, what is P(fire and earthquake and flood)?
2. Suppose that we know that fire, earthquake, and flood are independent events. Can we compute the probability P(fire and earthquake and flood)? If yes, what is P(fire and earthquake and flood)?
3. Suppose that we know that fire, earthquake, and flood are not independent events. Can we compute the probability P(fire and earthquake and flood)? If yes, what is P(fire and earthquake and flood)?
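The following sketch illustrates what the three marginal probabilities do and do not determine. It assumes "independent" means mutually independent, in which case the joint probability is the product of the marginals. Without that assumption the marginals only give bounds on the joint probability; the bounds shown are the standard ones for an intersection of events.

```python
import math

p_fire, p_earthquake, p_flood = 0.1, 0.2, 0.4
probs = [p_fire, p_earthquake, p_flood]

# Under mutual independence, the joint probability is the product.
p_all_independent = math.prod(probs)

# With no independence assumption, the marginals only bound the joint:
# max(0, sum(p_i) - (n - 1)) <= P(all events) <= min(p_i)
lower = max(0.0, sum(probs) - (len(probs) - 1))
upper = min(probs)

print(p_all_independent, lower, upper)
```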

Compute P(commute time < 20 min | temperature > 80), given the following joint probability distribution:

| Commute time | 40-60 Fahrenheit | 60-80 Fahrenheit | above 80 Fahrenheit |
|---|---|---|---|
| < 20 min | 0.1 | 0.05 | 0.1 |
| 20-40 min | 0.2 | 0.1 | 0.1 |
| > 40 min | 0.05 | 0.1 | 0.2 |
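
A conditional probability can be read off a joint table by dividing one cell by the total of its column: P(X | Y) = P(X and Y) / P(Y). The sketch below encodes the table above; the names `joint`, `temperature_ranges`, and `conditional` are illustrative.

```python
import math

temperature_ranges = ["40-60 F", "60-80 F", "above 80 F"]

# Rows: commute time. Columns: temperature range, in the order above.
joint = {
    "< 20 min":  [0.1, 0.05, 0.1],
    "20-40 min": [0.2, 0.1, 0.1],
    "> 40 min":  [0.05, 0.1, 0.2],
}

def conditional(commute, temperature):
    """P(commute | temperature) = P(commute and temperature) / P(temperature)."""
    col = temperature_ranges.index(temperature)
    p_joint = joint[commute][col]
    p_temperature = sum(row[col] for row in joint.values())
    return p_joint / p_temperature

print(conditional("< 20 min", "above 80 F"))
```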

For the Bayesian network shown in textbook figure 14.2: is P(Earthquake | Alarm) larger than, equal to, or smaller than P(Earthquake | Alarm and Burglary)? You may compute both probabilities (not recommended) or give an intuitive but correct justification for your answer (recommended).
For the Bayesian network shown in textbook figure 14.2: is P(Earthquake | Alarm) larger than, equal to, or smaller than P(Earthquake | Alarm and MaryCalls)? You may compute both probabilities (not recommended) or give an intuitive but correct justification for your answer (recommended).
We are building a decision tree to determine whether a person's next car will be a regular car or a minivan. We have 100 cases as examples. The following is true for those cases:
- 40 people bought minivans. Out of those 40 people, 30 people were over 35 years of age, and 10 people were under 35 years of age.
- 60 people bought regular cars. Out of those 60 people, 12 people were over 35 years of age, and 48 people were under 35 years of age.

What is the entropy gain (information gain) of selecting the "over 35 years of age" attribute as a test for the root node of the decision tree?
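
The gain can be computed as the entropy at the root minus the weighted average entropy of the two branches. The sketch below assumes entropy is measured in bits (base-2 logarithm); the function names are illustrative.

```python
import math

def entropy(counts):
    """Entropy in bits of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts):
    """Parent entropy minus the size-weighted entropy of the children."""
    total = sum(parent_counts)
    remainder = sum(sum(child) / total * entropy(child) for child in child_counts)
    return entropy(parent_counts) - remainder

# Counts are [minivan, regular car].
parent = [40, 60]
over_35 = [30, 12]
under_35 = [10, 48]

gain = information_gain(parent, [over_35, under_35])
print(f"{gain:.4f}")
```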
Given a set of training examples, is there always a decision tree that perfectly classifies all training examples in that set? If yes, prove your answer. If no, provide a counterexample.
There are two types of candy bags, type A and type B. Both types of bags contain an infinite number of candies. A bag of type A contains 80% chocolate candies and 20% vanilla candies. A bag of type B contains 40% chocolate candies and 60% vanilla candies. The prior probability P(A) of having a bag of type A is 0.99, and the prior probability P(B) of having a bag of type B is 0.01. What is the posterior probability that we have a bag of type A if the first candy that we pick is a vanilla candy?
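
This is a direct application of Bayes' rule with two hypotheses. The sketch below assumes the two bag types are the only possibilities, so the evidence term is the sum over both; the function name `posterior` is illustrative.

```python
import math

def posterior(prior_a, likelihood_a, prior_b, likelihood_b):
    """P(A | observation) when A and B are the only two hypotheses."""
    evidence = prior_a * likelihood_a + prior_b * likelihood_b
    return prior_a * likelihood_a / evidence

# Likelihoods are P(vanilla | A) = 0.2 and P(vanilla | B) = 0.6.
p_a_given_vanilla = posterior(prior_a=0.99, likelihood_a=0.2,
                              prior_b=0.01, likelihood_b=0.6)
print(f"{p_a_given_vanilla:.4f}")
```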
Design a perceptron that takes two inputs X1 and X2, and that outputs +1 if X1 > X2 + 5 and 0 if X1 <= X2 + 5.
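
One possible design rewrites the condition as X1 - X2 - 5 > 0, which suggests weights (1, -1) and a bias of -5. The sketch below assumes a step activation that outputs 1 when the weighted sum is strictly positive and 0 otherwise; other threshold conventions would need adjusted values.

```python
def perceptron(x1, x2, w1=1.0, w2=-1.0, bias=-5.0):
    """Step-activation perceptron: 1 if w1*x1 + w2*x2 + bias > 0, else 0."""
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0

print(perceptron(10, 4))  # 10 > 4 + 5
print(perceptron(9, 4))   # 9 == 4 + 5, so the strict inequality fails
```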
Consider a function F that takes three Boolean inputs and gives a +1 response when exactly two (no more, no fewer) of those inputs are set to true (for the inputs, true is encoded by value 1, false is encoded by value 0). Can we construct a perceptron that models function F perfectly? Why, or why not?
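
A quick experiment can build intuition before writing the argument. The sketch below searches a small grid of integer weights and biases for a single step-activation perceptron that matches a target function on all eight inputs. It assumes F outputs 0 whenever it does not output +1, which the question leaves unspecified. The function `at_least_two` is included only as a contrast case, and all names are illustrative.

```python
from itertools import product

INPUTS = list(product([0, 1], repeat=3))

def exactly_two(x):
    """Target function F: 1 when exactly two inputs are 1, else 0."""
    return 1 if sum(x) == 2 else 0

def at_least_two(x):
    """Contrast case: 1 when two or more inputs are 1, else 0."""
    return 1 if sum(x) >= 2 else 0

def find_perceptron(target, grid=range(-5, 6)):
    """Return (w1, w2, w3, bias) matching `target` on all inputs, or None."""
    for w1, w2, w3, b in product(grid, repeat=4):
        if all((1 if w1 * x[0] + w2 * x[1] + w3 * x[2] + b > 0 else 0) == target(x)
               for x in INPUTS):
            return (w1, w2, w3, b)
    return None

print(find_perceptron(exactly_two))
print(find_perceptron(at_least_two))
```

A search over a finite grid that finds nothing is not a proof. A complete answer still needs an argument about whether the inputs mapped to +1 can be separated from the rest by a single hyperplane.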
Design a neural network that implements the XOR function. You can use any number and any type of perceptrons you like. You do not have to specify the weights inside each perceptron, but you need to specify what function each perceptron implements (and, of course, the function should be a function that a perceptron can indeed model).
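
One common construction uses three perceptrons: XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)). The sketch below assumes step-activation units with inputs encoded as 0 and 1. The specific weights are illustrative choices, since the question does not require them.

```python
def step_unit(inputs, weights, bias):
    """Step-activation perceptron: 1 if the weighted sum plus bias is > 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def or_unit(x1, x2):
    return step_unit((x1, x2), (1, 1), -0.5)

def nand_unit(x1, x2):
    return step_unit((x1, x2), (-1, -1), 1.5)

def and_unit(x1, x2):
    return step_unit((x1, x2), (1, 1), -1.5)

def xor_network(x1, x2):
    """Two hidden perceptrons (OR, NAND) feeding one output perceptron (AND)."""
    return and_unit(or_unit(x1, x2), nand_unit(x1, x2))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```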