Assignment 4
Written and Programming Assignment - Probabilities, Bayesian Networks & Decision Trees
Note: The assignment is worth 120 points.
Task 1
10 points
In a certain probability problem, we have 11 variables: A, B1, B2, ..., B10.
- Variable A has 5 possible values.
- Each of the variables B1, ..., B10 has 7 possible values. Each Bi is conditionally independent of the other 9 variables Bj (with j != i) given A.
Based on these facts:
Part a: How many numbers do you need to store in the joint distribution table of these 11 variables?
Part b: What is the most space-efficient representation (in terms of how many numbers you need to store) for the joint probability distribution of these 11 variables? How many numbers do you need to store in your solution? Your answer should work for any variables satisfying the assumptions stated above.
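As a toy illustration of the trade-off between the two representations (with made-up sizes, not the numbers from this task): suppose one variable A has 3 values and two variables B1, B2 have 4 values each, with B1 and B2 conditionally independent given A. A minimal sketch in Python:

```python
# Toy illustration with made-up sizes (3 values for A, 4 values for each of
# two B variables); these are NOT the numbers from Task 1.
a_vals, b_vals, n_b = 3, 4, 2

# Full joint table: one entry per combination of variable values.
full_joint = a_vals * b_vals ** n_b           # 3 * 4**2 = 48 entries

# Factored form using conditional independence: P(A) plus one table
# P(Bi | A) per Bi. (Entries forced by rows summing to 1 are not deducted.)
factored = a_vals + n_b * a_vals * b_vals     # 3 + 2 * 3 * 4 = 27 entries

print(full_joint, factored)
```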
Task 2
40 points
The task in this part is to implement a system that:
- Can determine the posterior probability of different hypotheses, given priors for these hypotheses and given a sequence of observations.
- Can determine the probability that the next observation will be of a specific type, given priors for different hypotheses and given a sequence of observations.
As in the slides that we saw in class, there are five types of bags of candies. Each bag contains an infinite number of candies. We have one of those bags, and we are picking candies out of it. We don't know what type of bag we have, so we want to figure out the probability of each type based on the candies that we have picked.
The five possible hypotheses for our bag are:
- h1 (prior: 10%): This type of bag contains 100% cherry candies.
- h2 (prior: 20%): This type of bag contains 75% cherry candies and 25% lime candies.
- h3 (prior: 40%): This type of bag contains 50% cherry candies and 50% lime candies.
- h4 (prior: 20%): This type of bag contains 25% cherry candies and 75% lime candies.
- h5 (prior: 10%): This type of bag contains 100% lime candies.
Command line arguments:
The program takes a single command line argument, which is a string, for example CLLCCCLLL. This string represents a sequence of observations, i.e., a sequence of candies that we have already picked. Each character is C if we picked a cherry candy, and L if we picked a lime candy. Assuming that characters in the string are numbered starting with 1, the i-th character of the string corresponds to the i-th observation. The program should be invoked from the command line as follows:
compute_a_posteriori observations
For example:
compute_a_posteriori CLLCCLLLCCL
We also allow the case of no command line argument at all; this represents the case where we have made no observations yet.
Output:
Your program should create a text file called "result.txt" that is formatted exactly as shown below. ??? is used where your program should print values that depend on its command line argument. Five decimal places should appear for any floating point number.
Observation sequence Q: ???
Length of Q: ???
After Observation ??? = ???: (This and all remaining lines are repeated for every observation)
P(h1 | Q) = ???
P(h2 | Q) = ???
P(h3 | Q) = ???
P(h4 | Q) = ???
P(h5 | Q) = ???
Probability that the next candy we pick will be C, given Q: ???
Probability that the next candy we pick will be L, given Q: ???
Sample output for the Tasks is given here.
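A minimal sketch of the underlying Bayesian update, assuming the priors and candy proportions listed above (variable names are illustrative, and the exact "result.txt" formatting required above is omitted):

```python
# Sketch of the Bayesian update for Task 2 (illustrative, not a full
# solution: the required "result.txt" formatting is omitted).

priors = [0.1, 0.2, 0.4, 0.2, 0.1]       # P(h1) ... P(h5)
p_cherry = [1.0, 0.75, 0.5, 0.25, 0.0]   # P(C | hi) for each hypothesis

def update(posteriors, candy):
    """One Bayesian update step after observing 'C' or 'L'."""
    likelihood = [pc if candy == 'C' else 1.0 - pc for pc in p_cherry]
    unnorm = [p * l for p, l in zip(posteriors, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def next_candy_prob(posteriors, candy):
    """P(next observation = candy | Q) = sum_i P(candy | hi) * P(hi | Q)."""
    return sum((pc if candy == 'C' else 1.0 - pc) * p
               for pc, p in zip(p_cherry, posteriors))

posteriors = priors
for candy in "CLLCCLLLCCL":              # example observation sequence
    posteriors = update(posteriors, candy)
print(posteriors)
print(next_candy_prob(posteriors, 'C'), next_candy_prob(posteriors, 'L'))
```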
Task 3
10 points
George doesn't watch much TV in the evening, unless there is a baseball game on. When there is baseball on TV, George is very likely to watch. George has a cat that he feeds most evenings, although he forgets every now and then. He's much more likely to forget when he's watching TV. He's also very unlikely to feed the cat if he has run out of cat food (although sometimes he gives the cat some of his own food). Design a Bayesian network for modeling the relations between these four events:
- baseball_game_on_TV
- George_watches_TV
- out_of_cat_food
- George_feeds_cat
Your task is to connect these nodes with arrows pointing from causes to effects. No programming is needed for this part; just include an electronic document (PDF, Word file, or OpenOffice document) showing your Bayesian network design.
Task 4
10 points
For the Bayesian network of Task 3, the text file at this link contains training data from every evening of an entire year. Every line in this text file corresponds to an evening and contains four numbers. Each number is a 0 or a 1, and the four numbers correspond to the four events listed in Task 3, in that order (1 meaning the event occurred that evening, 0 meaning it did not).
Based on the data in this file, determine the probability table for each node in the Bayesian network you have designed for Task 3. You need to include these four tables in the drawing that you produce for Task 3. You also need to submit the code/script that computes these probabilities.
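A minimal counting sketch for this task, assuming a whitespace-separated data file named training_data.txt (the file name is an assumption) with the four columns in the order above, and assuming one plausible Task 3 structure (game -> watches; watches and out_of_cat_food -> feeds), which is an illustration rather than the required answer:

```python
# Sketch: estimating conditional probability tables by counting.
# Assumes a file "training_data.txt" (hypothetical name) whose lines hold
# four 0/1 numbers in the order: baseball_game_on_TV, George_watches_TV,
# out_of_cat_food, George_feeds_cat. The network structure used here is
# one plausible Task 3 answer, not the only one.

rows = [tuple(map(int, line.split()))
        for line in open("training_data.txt") if line.strip()]

def p(pred):
    """Fraction of evenings satisfying pred."""
    return sum(1 for r in rows if pred(r)) / len(rows)

def p_cond(pred, cond):
    """Conditional probability P(pred | cond), estimated by counting."""
    matching = [r for r in rows if cond(r)]
    return sum(1 for r in matching if pred(r)) / len(matching)

print("P(game) =", p(lambda r: r[0] == 1))
print("P(out_of_food) =", p(lambda r: r[2] == 1))
print("P(watches | game) =", p_cond(lambda r: r[1] == 1, lambda r: r[0] == 1))
print("P(watches | no game) =", p_cond(lambda r: r[1] == 1, lambda r: r[0] == 0))
for w in (1, 0):
    for f in (1, 0):
        print(f"P(feeds | watches={w}, out_of_food={f}) =",
              p_cond(lambda r: r[3] == 1,
                     lambda r: r[1] == w and r[2] == f))
```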
Task 5
15 points
Figure 1: Yet another Bayesian Network.
Part a: On the network shown in Figure 1, what is the Markovian blanket of node N?
Part b: On the network shown in Figure 1, what is P(A, F)? How is it derived?
Part c: On the network shown in Figure 1, what is P(N, not(D) | I)? How is it derived?
Hint: For parts b and c there are easier ways to arrive at the answer than inference by enumeration.
Task 6
15 points
Class | A | B | C
X | 1 | 2 | 1
X | 2 | 1 | 2
X | 3 | 2 | 2
X | 1 | 3 | 3
X | 1 | 2 | 2
Y | 2 | 1 | 1
Y | 3 | 1 | 1
Y | 2 | 2 | 2
Y | 3 | 3 | 1
Y | 2 | 1 | 1
We want to build a decision tree that determines whether a certain pattern is of type X or type Y. The decision tree can only use tests that are based on attributes A, B, and C. Each attribute has 3 possible values: 1, 2, 3 (we do not apply any thresholding). We have the 10 training examples shown in the table above (each row corresponds to a training example).
What is the information gain of each attribute at the root?
Which attribute achieves the highest information gain at the root?
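A minimal sketch of the entropy and information-gain computation on this training set, assuming base-2 logarithms (illustrative, as a way to check a hand calculation):

```python
from math import log2
from collections import Counter

# Training examples from the table above: (class, A, B, C).
data = [('X',1,2,1), ('X',2,1,2), ('X',3,2,2), ('X',1,3,3), ('X',1,2,2),
        ('Y',2,1,1), ('Y',3,1,1), ('Y',2,2,2), ('Y',3,3,1), ('Y',2,1,1)]

def entropy(examples):
    """Entropy (base 2) of the class labels of a set of examples."""
    counts = Counter(e[0] for e in examples)
    total = len(examples)
    return -sum(c / total * log2(c / total) for c in counts.values())

def info_gain(examples, attr):
    """Information gain of splitting on attribute index attr (1=A, 2=B, 3=C)."""
    remainder = 0.0
    for value in {e[attr] for e in examples}:
        subset = [e for e in examples if e[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(examples) - remainder

for name, attr in (('A', 1), ('B', 2), ('C', 3)):
    print(f"gain({name}) = {info_gain(data, attr):.5f}")
```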
Task 7
20 points
Figure 3: A decision tree for estimating whether the patron will be willing to wait for a table at a restaurant.
Part a (5 points): Suppose that, on the entire set of training samples available for constructing the decision tree of Figure 3, 80 people decided to wait, and 20 people decided not to wait. What is the initial entropy at node A (before the test is applied)?
Part b (5 points): As mentioned in the previous part, at node A 80 people decided to wait and 20 people decided not to wait.
- Out of the cases where people decided to wait, in 20 cases it was weekend and in 60 cases it was not weekend.
- Out of the cases where people decided not to wait, in 15 cases it was weekend and in 5 cases it was not weekend.
What is the information gain for the weekend test at node A?
Part c (5 points): In the decision tree of Figure 3, node E uses the exact same test (whether it is weekend or not) as node A. What is the information gain, at node E, of using the weekend test?
Part d (5 points): We have a test case of a hungry patron who came in on a rainy Sunday. Which leaf node does this test case end up in? What does the decision tree output for that case?
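For reference, a minimal sketch of the entropy and information-gain arithmetic used in parts a and b, assuming base-2 logarithms (the counts below are the ones given in the text):

```python
from math import log2

def entropy2(p, q):
    """Entropy (base 2) of a two-class distribution; 0*log2(0) is treated as 0."""
    return -sum(x * log2(x) for x in (p, q) if x > 0)

# Part a: 80 wait vs. 20 not-wait at node A.
h_a = entropy2(80 / 100, 20 / 100)

# Part b: the weekend test splits the 100 cases into weekend
# (20 wait, 15 not-wait) and not-weekend (60 wait, 5 not-wait).
h_weekend = entropy2(20 / 35, 15 / 35)
h_not_weekend = entropy2(60 / 65, 5 / 65)
gain = h_a - (35 / 100 * h_weekend + 65 / 100 * h_not_weekend)

print(f"H at node A = {h_a:.5f}")
print(f"Information gain of weekend test = {gain:.5f}")
```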