Part a: If the first e-mail you got from sensor S indicates a daily high over 80 degrees, what is the probability that the sensor is placed in Maine?
Part b: If the first e-mail you got from sensor S indicates a daily high over 80 degrees, what is the probability that the second e-mail also indicates a daily high over 80 degrees?
Part c: What is the probability that the first three e-mails all indicate daily highs over 80 degrees?
Part a: How many numbers do you need to store in the joint distribution table of these 11 variables?
Part b: What is the most space-efficient representation (in terms of how many numbers you need to store) of the joint probability distribution of these 11 variables? How many numbers do you need to store in your solution? Your answer should work with any variables satisfying the assumptions stated above.
Figure 2: A decision tree for estimating whether the patron will be willing to wait for a table at a restaurant.
Part a: Suppose that, on the entire set of training samples available for constructing the decision tree of Figure 2, 80 people decided to wait, and 20 people decided not to wait. What is the initial entropy at node A (before the test is applied)?
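The entropy asked for in part a follows from the standard formula H = -sum_i p_i log2(p_i) applied to the class counts at the node. The following is a minimal Python sketch, assuming base-2 logarithms (entropy measured in bits) and using only the 80/20 counts stated in part a. The function name `entropy` and the variable `h_node_a` are illustrative and not part of the original problem.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    # Classes with a zero count contribute nothing (0 * log 0 is taken as 0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Counts from part a: 80 patrons decided to wait, 20 decided not to wait.
h_node_a = entropy([80, 20])
print(round(h_node_a, 4))  # 0.7219
```

The same helper gives 1 bit for an even 50/50 split and 0 bits for a node whose samples all share one class, which are the two extremes for a two-class problem.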
Part b: As mentioned in the previous part, at node A 80 people decided to wait, and 20 people decided not to wait.
Part c: In the decision tree of Figure 2, node E uses the exact same test (whether it is weekend or not) as node A. What is the information gain, at node E, of using the weekend test?
Part d: We have a test case of a hungry patron who came in on a rainy Tuesday. Which leaf node does this test case end up in? What does the decision tree output for that case?
Part e: We have a test case of a not hungry patron who came in on a sunny Saturday. Which leaf node does this test case end up in? What does the decision tree output for that case?
Class | A | B | C |
X | 1 | 2 | 1 |
X | 2 | 1 | 2 |
X | 3 | 2 | 2 |
X | 1 | 3 | 3 |
X | 1 | 2 | 2 |
Y | 2 | 1 | 1 |
Y | 3 | 1 | 1 |
Y | 2 | 2 | 2 |
Y | 3 | 3 | 1 |
Y | 2 | 1 | 1 |
What is the information gain of each attribute at the root? Which attribute achieves the highest information gain at the root?
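One way to check a hand calculation for this question is to compute the gains directly from the ten rows of the table. The sketch below assumes base-2 logarithms and the usual definition of information gain (entropy at the root minus the weighted average entropy of the children, with one child per attribute value). The names `ROWS`, `ATTRS`, `entropy`, and `information_gain` are illustrative choices, not part of the original problem.

```python
import math
from collections import Counter

# Training rows copied from the table above: (class, A, B, C).
ROWS = [
    ("X", 1, 2, 1), ("X", 2, 1, 2), ("X", 3, 2, 2), ("X", 1, 3, 3), ("X", 1, 2, 2),
    ("Y", 2, 1, 1), ("Y", 3, 1, 1), ("Y", 2, 2, 2), ("Y", 3, 3, 1), ("Y", 2, 1, 1),
]
ATTRS = {"A": 1, "B": 2, "C": 3}  # attribute name -> column index in each row

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, col):
    """Root entropy minus the weighted entropy after splitting on column `col`."""
    base = entropy([r[0] for r in rows])
    remainder = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[0] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

gains = {name: information_gain(ROWS, col) for name, col in ATTRS.items()}
best = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: {gain:.4f}")  # A: 0.4000, B: 0.1510, C: 0.3145
print("highest gain:", best)      # highest gain: A
```

With 5 samples of each class the root entropy is 1 bit, so each gain is simply 1 minus the weighted child entropy. Attribute A does best here because the value A = 1 isolates three X samples into a pure child.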
Part a: What are the highest possible and lowest possible entropy values at node N?
Part b: Suppose that, at node N, we choose an attribute K. What are the highest possible and lowest possible information gain values for that attribute?