Figure 1: A decision tree for estimating whether the patron will be willing to wait for a table at a restaurant.
Suppose that, on the entire set of training samples available for constructing the decision tree of Figure 1, 80 people decide to wait, and 20 people decide not to wait. What is the initial entropy at node A (before the test is applied)? Write an expression that fully specifies the answer numerically.
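As a way to check an answer numerically, the entropy of a node can be computed from its class counts. The sketch below assumes the standard Shannon entropy in bits (base-2 logarithm); the helper name `entropy` is illustrative and does not come from the problem statement.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:  # a class with zero samples contributes nothing
            p = c / total
            h -= p * math.log2(p)
    return h

# 80 samples "wait", 20 samples "do not wait", as stated in the question.
initial_entropy = entropy([80, 20])
print(round(initial_entropy, 4))
```

Written out, the same quantity is -(0.8 log2 0.8 + 0.2 log2 0.2).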
In the decision tree of Figure 1, node D uses the exact same test (whether it is weekend or not) as node A. What is the information gain, at node D, of using the weekend test? Justify your answer.
Your boss at a software company gives you a binary classifier H that predicts, for any basketball game, whether the home team will win or not. Classifier H has an accuracy of 28%, and your boss assigns you the task of improving it so that its accuracy is better than 60%. How do you achieve that task?
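One possible line of reasoning: a binary classifier that is wrong 72% of the time becomes right 72% of the time if every prediction is inverted. The sketch below illustrates this on synthetic data. The games, the labels, and the stand-in classifier `H` are all made up for illustration and are only constructed to reproduce the 28% figure from the question.

```python
def invert(classifier):
    """Wrap a binary classifier so that it returns the opposite prediction."""
    def inverted(game):
        return not classifier(game)
    return inverted

def accuracy(classifier, games, labels):
    """Fraction of games for which the classifier matches the true label."""
    correct = sum(1 for g, y in zip(games, labels) if classifier(g) == y)
    return correct / len(games)

# Synthetic setup (assumption): 100 games identified by index,
# with True meaning "home team wins".
games = list(range(100))
labels = [g % 2 == 0 for g in games]

def H(game):
    # Hypothetical stand-in for H: correct on the first 28 games, wrong on the rest.
    truth = labels[game]
    return truth if game < 28 else not truth

print(accuracy(H, games, labels))          # accuracy of the original classifier
print(accuracy(invert(H), games, labels))  # accuracy after inverting its output
```

In general, inverting the output of a binary classifier with accuracy p gives accuracy 1 - p on the same data.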
You are a meteorologist who places temperature sensors all over the world, and you set them up so that they automatically e-mail you, each day, the high temperature for that day. Unfortunately, you have forgotten whether you placed a certain sensor S in Maine or in the Sahara desert (but you are sure you placed it in one of those two places). The probability that you placed sensor S in Maine is 5%. The probability of getting a daily high temperature of 80 degrees or more is 20% in Maine and 90% in the Sahara. Assume that the daily high for any day is independent of the daily high for the previous day.
In the following questions, you do not have to perform all numerical calculations, but you have to fully specify the numerical value of each answer.
a. (15 points) Given that the first e-mail you got from sensor S indicates a daily high under 80 degrees, what is the maximum likelihood estimate that sensor S is placed in Maine? Show how you derive your answer.
b. (15 points) Given that the first e-mail you got from sensor S indicates a daily high under 80 degrees, what is the probability that the sensor is placed in Maine?
c. (15 points) Given that the first e-mail you got from sensor S indicates a daily high under 80 degrees, what is the probability that the second e-mail also indicates a daily high under 80 degrees?
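The three parts above can be checked numerically with a short script. This is a sketch under stated assumptions: the variable names are illustrative, part (a) is read as comparing the likelihood of the observation under each location (ignoring the prior), and part (c) assumes the daily highs are independent once the location is known.

```python
# Quantities given in the problem statement.
p_maine = 0.05                 # prior probability that S is in Maine
p_sahara = 1 - p_maine         # S is in one of the two places
p_under_given_maine = 1 - 0.20   # P(high < 80 | Maine)
p_under_given_sahara = 1 - 0.90  # P(high < 80 | Sahara)

# (a) Maximum likelihood: pick the location under which the observation
# is most likely, without using the prior.
likelihoods = {"Maine": p_under_given_maine, "Sahara": p_under_given_sahara}
ml_location = max(likelihoods, key=likelihoods.get)

# (b) Posterior via Bayes' rule.
p_under = p_under_given_maine * p_maine + p_under_given_sahara * p_sahara
post_maine = p_under_given_maine * p_maine / p_under
post_sahara = 1 - post_maine

# (c) Predictive probability for the second e-mail: average the per-location
# probabilities, weighted by the posterior from (b).
p_second_under = (p_under_given_maine * post_maine
                  + p_under_given_sahara * post_sahara)

print(ml_location)
print(round(post_maine, 4))
print(round(p_second_under, 4))
```

Part (c) uses the posterior rather than the prior because the first e-mail changes the belief about where the sensor is, even though the two daily highs are independent given the location.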