Written Assignment - Probabilities, Bayesian Networks & Decision Trees

Max points:
The assignment should be submitted via Canvas.

Instructions



Task 1

12 points.

Consider the given joint probability distribution for a domain of two variables (Color, Vehicle):


                   Color = Red   Color = Green   Color = Blue
Vehicle = Car         0.1184        0.1280          0.0736
Vehicle = Van         0.0444        0.0480          0.0276
Vehicle = Truck       0.1554        0.1680          0.0966
Vehicle = SUV         0.0518        0.0560          0.0322

Part a: Calculate P(Color is not Green | Vehicle is Truck).

Part b: Check whether Vehicle and Color are completely independent of each other.
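
As a sanity check, here is a minimal Python sketch of how both parts could be computed directly from the joint table (the dictionary layout and the numerical tolerance are just illustrative choices):

```python
from itertools import product

# Joint distribution P(Vehicle, Color) copied from the table above.
joint = {
    ("Car",   "Red"): 0.1184, ("Car",   "Green"): 0.1280, ("Car",   "Blue"): 0.0736,
    ("Van",   "Red"): 0.0444, ("Van",   "Green"): 0.0480, ("Van",   "Blue"): 0.0276,
    ("Truck", "Red"): 0.1554, ("Truck", "Green"): 0.1680, ("Truck", "Blue"): 0.0966,
    ("SUV",   "Red"): 0.0518, ("SUV",   "Green"): 0.0560, ("SUV",   "Blue"): 0.0322,
}

vehicles = ["Car", "Van", "Truck", "SUV"]
colors = ["Red", "Green", "Blue"]

# Part a: P(Color != Green | Vehicle = Truck)
#        = sum of P(Truck, color) over non-green colors, divided by P(Truck).
p_truck = sum(joint[("Truck", c)] for c in colors)
p_not_green_and_truck = sum(joint[("Truck", c)] for c in colors if c != "Green")
print("P(Color != Green | Vehicle = Truck) =", p_not_green_and_truck / p_truck)

# Part b: independence holds iff P(v, c) == P(v) * P(c) for every cell.
p_vehicle = {v: sum(joint[(v, c)] for c in colors) for v in vehicles}
p_color = {c: sum(joint[(v, c)] for v in vehicles) for c in colors}
independent = all(
    abs(joint[(v, c)] - p_vehicle[v] * p_color[c]) < 1e-9
    for v, c in product(vehicles, colors)
)
print("Vehicle and Color independent?", independent)
```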


Task 2

15 points.

In a certain probability problem, we have 11 variables: A, B1, B2, ..., B10. Based on these facts:

Part a: How many numbers do you need to store in the joint distribution table of these 11 variables?

Part b: What is the most space-efficient representation (in terms of how many numbers you need to store) of the joint probability distribution of these 11 variables? How many numbers do you need to store in your solution? Your answer should work for any variables satisfying the assumptions stated above.

Part c: Does this scenario follow the Naive-Bayes model?
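
The facts for this task are not reproduced above. Purely as an illustrative sketch, if one assumes the omitted facts say that all 11 variables are Boolean and that B1, ..., B10 are conditionally independent of each other given A (a Naive-Bayes-style structure), the two storage counts compare as follows:

```python
# Illustrative counts only -- the Boolean-variable and conditional-independence
# assumptions stand in for the facts that are omitted from this task.
n_b = 10

# Full joint table over 11 Boolean variables: 2^11 entries,
# of which 2^11 - 1 are free numbers (the entries must sum to 1).
full_joint = 2 ** (n_b + 1) - 1

# Factored form under the assumed structure: P(A) is 1 number,
# plus P(Bi | A) and P(Bi | not A) for each Bi (2 numbers per Bi).
factored = 1 + 2 * n_b

print(full_joint, factored)   # 2047 vs 21 under these assumptions
```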



Task 3

12 points.

Bayesian Network

Given the network above, calculate P(not(Baseball Game on TV) | not(George Feeds Cat)) using Inference by Enumeration.
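
The network figure itself is not reproduced here. As a generic illustration of inference by enumeration, the Python sketch below assumes a hypothetical chain Baseball Game on TV -> George Watches TV -> George Feeds Cat with made-up CPT values; only the enumeration procedure, not the structure or the numbers, is meant to match the actual task.

```python
from itertools import product

# Hypothetical network (NOT the one in the figure): B -> W -> F
# B = Baseball Game on TV, W = George Watches TV, F = George Feeds Cat.
p_B = {True: 0.5, False: 0.5}            # P(B)  -- made-up value
p_W_given_B = {True: 0.8, False: 0.1}    # P(W = true | B)  -- made-up values
p_F_given_W = {True: 0.3, False: 0.7}    # P(F = true | W)  -- made-up values

def joint(b, w, f):
    """P(B=b, W=w, F=f) as the product of the CPT entries."""
    pw = p_W_given_B[b] if w else 1 - p_W_given_B[b]
    pf = p_F_given_W[w] if f else 1 - p_F_given_W[w]
    return p_B[b] * pw * pf

# Inference by enumeration: sum the full joint over the hidden variable W,
# once for the query assignment B = false and once for the normalizer,
# with the evidence fixed at F = false.
numerator = sum(joint(False, w, False) for w in (True, False))
denominator = sum(joint(b, w, False) for b, w in product((True, False), repeat=2))
print("P(not B | not F) =", numerator / denominator)
```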


Task 4

18 points.

Class   A   B   C
  X     1   2   1
  X     2   1   2
  X     3   2   2
  X     1   3   3
  X     1   2   1
  Y     2   1   2
  Y     3   1   1
  Y     2   2   2
  Y     3   3   1
  Y     2   1   1

We want to build a decision tree that determines whether a certain pattern is of type X or of type Y. The decision tree can only use tests based on attributes A, B, and C. Each attribute has 3 possible values: 1, 2, 3 (we do not apply any thresholding). We have the 10 training examples shown in the table (each row corresponds to a training example). What is the information gain of each attribute at the root? Which attribute achieves the highest information gain at the root?
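
A minimal Python sketch of how the root information gains could be computed from this table (the standard entropy-based definition of information gain is assumed):

```python
from collections import Counter
from math import log2

# Training examples from the table: (class, A, B, C).
examples = [
    ("X", 1, 2, 1), ("X", 2, 1, 2), ("X", 3, 2, 2), ("X", 1, 3, 3), ("X", 1, 2, 1),
    ("Y", 2, 1, 2), ("Y", 3, 1, 1), ("Y", 2, 2, 2), ("Y", 3, 3, 1), ("Y", 2, 1, 1),
]

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(attr_index):
    """Gain of splitting on one attribute (1 = A, 2 = B, 3 = C) at the root."""
    labels = [e[0] for e in examples]
    base = entropy(labels)
    remainder = 0.0
    for value in {e[attr_index] for e in examples}:
        subset = [e[0] for e in examples if e[attr_index] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder

for name, idx in (("A", 1), ("B", 2), ("C", 3)):
    print(f"Gain({name}) = {information_gain(idx):.4f}")
```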


Task 5

18 points.

Class    A    B    C
  X     25   24   31
  X     22   14   24
  X     28   22   25
  X     24   13   30
  X     26   20   24
  Y     20   31   17
  Y     18   32   14
  Y     21   25   20
  Y     13   32   15
  Y     12   27   18

We want to build a decision tree (with thresholding) that determines whether a certain pattern is of type X or of type Y. The decision tree can only use tests based on attributes A, B, and C. We have the 10 training examples shown in the table (each row corresponds to a training example). Which attribute-threshold combination achieves the highest information gain at the root? For each attribute, try thresholds of 15, 20, and 25.
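
A minimal Python sketch of how the nine candidate splits could be compared (the test is written here as "attribute <= threshold"; whether the split is intended as <= or < is an assumption):

```python
from collections import Counter
from math import log2

# Training examples from the table: (class, A, B, C).
examples = [
    ("X", 25, 24, 31), ("X", 22, 14, 24), ("X", 28, 22, 25), ("X", 24, 13, 30), ("X", 26, 20, 24),
    ("Y", 20, 31, 17), ("Y", 18, 32, 14), ("Y", 21, 25, 20), ("Y", 13, 32, 15), ("Y", 12, 27, 18),
]

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def gain(attr_index, threshold):
    """Information gain of the binary test 'attribute <= threshold' at the root."""
    labels = [e[0] for e in examples]
    base = entropy(labels)
    left = [e[0] for e in examples if e[attr_index] <= threshold]
    right = [e[0] for e in examples if e[attr_index] > threshold]
    remainder = sum(len(part) / len(examples) * entropy(part)
                    for part in (left, right) if part)
    return base - remainder

# Evaluate every attribute-threshold combination and report the best one.
combos = [(name, t) for name, _ in (("A", 1), ("B", 2), ("C", 3)) for t in (15, 20, 25)]
for name, idx in (("A", 1), ("B", 2), ("C", 3)):
    for t in (15, 20, 25):
        print(f"Gain({name} <= {t}) = {gain(idx, t):.4f}")
best = max(((n, t, gain(i, t)) for n, i in (("A", 1), ("B", 2), ("C", 3)) for t in (15, 20, 25)),
           key=lambda x: x[2])
print("Highest-gain combination:", best)
```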