Written Assignment - Probabilities, Bayesian Networks & Decision Trees

Max points:
The assignment should be submitted via Canvas.

Instructions



Task 1

12 points.

Consider the following joint probability distribution over two variables, Color and Vehicle:


                   Color = Red   Color = Green   Color = Blue
Vehicle = Car        0.1299        0.0195          0.0322
Vehicle = Van        0.1681        0.0252          0.0417
Vehicle = Truck      0.1070        0.0160          0.0265
Vehicle = SUV        0.3103        0.0465          0.0769

Part a: Calculate P ( Color is not Green | Vehicle is Truck )

Part b: Prove that Vehicle and Color are independent of each other.
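
The two parts above both reduce to sums over the joint table. The following is a minimal sketch of that bookkeeping, not a model answer. The helper names and the rounding tolerance are assumptions made for illustration. The table entries are rounded to four decimals, so the numeric check can only suggest independence; a written proof still has to argue from the definition P(Color, Vehicle) = P(Color) P(Vehicle).

```python
# Joint table copied from the assignment, indexed as joint[vehicle][color].
joint = {
    "Car":   {"Red": 0.1299, "Green": 0.0195, "Blue": 0.0322},
    "Van":   {"Red": 0.1681, "Green": 0.0252, "Blue": 0.0417},
    "Truck": {"Red": 0.1070, "Green": 0.0160, "Blue": 0.0265},
    "SUV":   {"Red": 0.3103, "Green": 0.0465, "Blue": 0.0769},
}


def vehicle_marginal(vehicle):
    """P(Vehicle = vehicle): sum the row over all colors."""
    return sum(joint[vehicle].values())


def color_marginal(color):
    """P(Color = color): sum the column over all vehicles."""
    return sum(row[color] for row in joint.values())


def color_given_vehicle(color_set, vehicle):
    """P(Color in color_set | Vehicle = vehicle) = P(both) / P(vehicle)."""
    numerator = sum(joint[vehicle][c] for c in color_set)
    return numerator / vehicle_marginal(vehicle)


def max_independence_gap():
    """Largest |P(c, v) - P(c) * P(v)| over all cells of the table."""
    return max(
        abs(joint[v][c] - vehicle_marginal(v) * color_marginal(c))
        for v in joint
        for c in joint[v]
    )


# "Not Green" is the event Color in {Red, Blue}.
p_not_green_given_truck = color_given_vehicle({"Red", "Blue"}, "Truck")

# Assumed tolerance: entries are rounded to 4 decimals, so small gaps are expected.
ROUNDING_TOLERANCE = 2e-4
gap = max_independence_gap()
looks_independent = gap < ROUNDING_TOLERANCE

print(p_not_green_given_truck, gap, looks_independent)
```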


Task 2

15 points.

In a certain probability problem, we have 11 variables: A, B1, B2, ..., B10. Based on these facts:

Part a: How many numbers do you need to store in the joint distribution table of these 11 variables?

Part b: What is the most space-efficient representation (in terms of how many numbers you need to store) of the joint probability distribution of these 11 variables? How many numbers do you need to store in your solution? Your answer should work with any variables satisfying the assumptions stated above.

Part c: Does this scenario follow the Naive-Bayes model?



Task 3

10 points

George doesn't watch much TV in the evening, unless there is a baseball game on. When there is baseball on TV, George is very likely to watch. George has a cat that he feeds most evenings, although he forgets every now and then. He's much more likely to forget when he's watching TV. He's also very unlikely to feed the cat if he has run out of cat food (although sometimes he gives the cat some of his own food). Design a Bayesian network for modeling the relations between these four events:

Your task is to connect these nodes with arrows pointing from causes to effects. No programming is needed for this part, just include an electronic document (PDF, Word file, or OpenOffice document) showing your Bayesian network design.


Task 4 

10 points

For the Bayesian network of the previous task, the text file at this link contains training data from every evening of an entire year. Every line in this text file corresponds to an evening, and contains four numbers. Each number is a 0 or a 1. In more detail:

Based on the data in this file, determine the probability table for each node in the Bayesian network you designed for Task 3. You need to include these four tables in the drawing that you produce for Task 3. You also need to submit the code/script that computes these probabilities.
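
Estimating a probability table from 0/1 data amounts to counting. The sketch below shows one way to do it under stated assumptions: values on each line are separated by whitespace, columns are referred to by position (0 to 3), and the parent columns passed in come from your own Task 3 design. The function names, the file path argument, and the toy rows are all hypothetical and are not taken from the actual data file.

```python
from collections import defaultdict


def load_rows(path):
    """Read one evening per line; assumes whitespace-separated 0/1 values."""
    with open(path) as f:
        return [tuple(int(tok) for tok in line.split()) for line in f if line.strip()]


def estimate_cpt(rows, child, parents):
    """Estimate P(child = 1 | parents) by relative frequency.

    child   -- column index of the node
    parents -- tuple of column indices of its parents (empty for a root node)
    Returns a dict mapping each observed tuple of parent values to a probability.
    """
    counts = defaultdict(lambda: [0, 0])  # parent values -> [rows seen, rows with child == 1]
    for row in rows:
        key = tuple(row[p] for p in parents)
        counts[key][0] += 1
        counts[key][1] += row[child]
    return {key: ones / total for key, (total, ones) in counts.items()}


# Hypothetical toy rows, only to show the call pattern (not from the real file).
toy_rows = [
    (1, 1, 0, 0),
    (1, 1, 0, 1),
    (0, 0, 0, 1),
    (0, 0, 1, 0),
    (0, 1, 0, 0),
    (1, 0, 0, 1),
]

prior = estimate_cpt(toy_rows, child=0, parents=())            # table for a root node
conditional = estimate_cpt(toy_rows, child=1, parents=(0,))    # table for a node with one parent
print(prior, conditional)
```

Parent value combinations that never occur in the data are simply absent from the returned dict; how to handle them (for example with smoothing) is a separate design choice.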


Task 5

10 points

Given the network obtained in the previous two tasks, calculate P ( Baseball Game on TV | not(George Feeds Cat) ) using inference by enumeration.
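
Inference by enumeration computes a conditional probability by summing the full joint over every assignment of the hidden variables, then normalizing. The sketch below illustrates the technique on a hypothetical three-node network with placeholder names (U, V, W) and made-up probabilities. It is not the network from Task 3, and its numbers are not derived from the training data. The network encoding (a dict of parents and P(node = True | parents) entries) is likewise an assumption chosen for brevity.

```python
from itertools import product


def joint_probability(assignment, network):
    """Multiply each node's table entry under a complete True/False assignment."""
    p = 1.0
    for var, (parents, cpt) in network.items():
        p_true = cpt[tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p


def enumerate_query(query_var, evidence, network):
    """P(query_var = True | evidence), summing the joint over hidden variables."""
    hidden = [v for v in network if v != query_var and v not in evidence]
    totals = {True: 0.0, False: 0.0}
    for q_val in (True, False):
        for values in product((True, False), repeat=len(hidden)):
            assignment = dict(evidence)
            assignment[query_var] = q_val
            assignment.update(zip(hidden, values))
            totals[q_val] += joint_probability(assignment, network)
    # Normalize so the two query values sum to 1.
    return totals[True] / (totals[True] + totals[False])


# Hypothetical network: U and V are roots, W depends on both. All numbers are made up.
toy_network = {
    "U": ((), {(): 0.3}),
    "V": ((), {(): 0.1}),
    "W": (("U", "V"), {
        (True, True): 0.05,
        (True, False): 0.6,
        (False, True): 0.1,
        (False, False): 0.9,
    }),
}

p_u_given_not_w = enumerate_query("U", {"W": False}, toy_network)
print(p_u_given_not_w)
```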


Task 6

18 points

Class   A   B   C
X       1   2   1
X       2   1   2
X       3   2   2
X       1   3   3
X       1   2   1
Y       2   1   2
Y       3   1   1
Y       2   2   2
Y       3   3   1
Y       2   1   1

We want to build a decision tree that determines whether a pattern is of type X or type Y. The decision tree can only use tests that are based on attributes A, B, and C. Each attribute has 3 possible values: 1, 2, 3 (we do not apply any thresholding). We have 10 training examples, shown in the table above (each row corresponds to a training example). What is the information gain of each attribute at the root? Which attribute achieves the highest information gain at the root?
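
Information gain at the root is the entropy of the class labels minus the weighted average entropy of the subsets produced by splitting on an attribute. The following sketch applies that definition to the 10 training examples from the table, using base-2 logarithms and one branch per attribute value. The function names are illustrative assumptions, and the code is meant as a way to check hand calculations rather than as a replacement for showing the work.

```python
from collections import Counter
from math import log2

# Training examples copied from the table: (Class, A, B, C).
rows = [
    ("X", 1, 2, 1),
    ("X", 2, 1, 2),
    ("X", 3, 2, 2),
    ("X", 1, 3, 3),
    ("X", 1, 2, 1),
    ("Y", 2, 1, 2),
    ("Y", 3, 1, 1),
    ("Y", 2, 2, 2),
    ("Y", 3, 3, 1),
    ("Y", 2, 1, 1),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr_index):
    """Entropy at the root minus the weighted entropy after splitting on one attribute."""
    labels = [r[0] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attr_index], []).append(r[0])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Columns 1, 2, 3 of each row hold attributes A, B, C.
gains = {name: information_gain(rows, i) for i, name in enumerate("ABC", start=1)}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```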