Written Assignment - Probabilities, Bayesian Networks & Decision Trees
Max points:
- CSE 4308: 75
- CSE 5360: 75
The assignment should be submitted via Canvas.
Instructions
- The answers can be typed as a document or handwritten and scanned.
- Name files as assignment4_<net-id>.<format>
- Accepted document format is .pdf.
- If you are using Word, OpenOffice or LibreOffice, make sure to save as .pdf.
- If you are using LaTeX, compile into a .pdf file.
- Please do not submit .txt files.
- If you are scanning handwritten documents, scan them at a minimum of 600 dpi and save as a .pdf or .png file. Do not insert the images into a Word document and submit that.
- If there are multiple files in your submission, zip them together as assignment4_<net-id>.zip and submit the .zip file.
Task 1
12 points.
Consider the given joint probability distribution for a domain of two variables (Color, Vehicle):
|                 | Color = Red | Color = Green | Color = Blue |
| Vehicle = Car   | 0.1184      | 0.1280        | 0.0736       |
| Vehicle = Van   | 0.0444      | 0.0480        | 0.0276       |
| Vehicle = Truck | 0.1554      | 0.1680        | 0.0966       |
| Vehicle = SUV   | 0.0518      | 0.0560        | 0.0322       |
Part a: Calculate P(Color is not Green | Vehicle is Truck).
Part b: Check whether Vehicle and Color are totally independent of each other.
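For reference, here is a minimal sketch (not part of the required submission) showing how both parts can be checked numerically; its only inputs are the twelve joint-table entries above.

```python
# Minimal sketch: conditional probability and independence check,
# using only the twelve joint-table entries above.
joint = {
    ("Car",   "Red"): 0.1184, ("Car",   "Green"): 0.1280, ("Car",   "Blue"): 0.0736,
    ("Van",   "Red"): 0.0444, ("Van",   "Green"): 0.0480, ("Van",   "Blue"): 0.0276,
    ("Truck", "Red"): 0.1554, ("Truck", "Green"): 0.1680, ("Truck", "Blue"): 0.0966,
    ("SUV",   "Red"): 0.0518, ("SUV",   "Green"): 0.0560, ("SUV",   "Blue"): 0.0322,
}

# Marginals, obtained by summing out the other variable.
p_vehicle = {v: sum(p for (veh, _), p in joint.items() if veh == v) for v, _ in joint}
p_color = {c: sum(p for (_, col), p in joint.items() if col == c) for _, c in joint}

# Part a: P(Color != Green | Vehicle = Truck) = P(Color != Green, Truck) / P(Truck)
p_not_green_and_truck = joint[("Truck", "Red")] + joint[("Truck", "Blue")]
print(p_not_green_and_truck / p_vehicle["Truck"])

# Part b: Vehicle and Color are independent iff P(v, c) = P(v) * P(c) for every cell.
print(all(abs(joint[(v, c)] - p_vehicle[v] * p_color[c]) < 1e-9 for v, c in joint))
```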
Task 2
15 points.
In a certain probability problem, we have 11 variables: A, B1, B2, ..., B10.
- Variable A has 7 values.
- Each of the variables B1, ..., B10 has 8 possible values. Each Bi is conditionally independent of all other 9 Bj variables (with j != i) given A.
Based on these facts:
Part a: How many numbers do you need to store in the joint distribution table of these 11 variables?
Part b: What is the most space-efficient representation (in terms of how many numbers you need to store) for the joint probability distribution of these 11 variables? How many numbers do you need to store in your solution? Your answer should work with any variables satisfying the assumptions stated above.
Part c: Does this scenario follow the Naive-Bayes model?
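As a rough counting aid (a sketch only; whether you subtract the entries implied by the sum-to-1 constraint is a convention your answer should state explicitly), the sizes involved can be computed directly:

```python
# Sketch: stored numbers for the full joint table versus a factored
# representation that exploits the stated conditional independence.
a_values = 7   # values of A
b_values = 8   # values of each Bi
num_b = 10     # number of Bi variables

# Full joint table: one entry per combination of all 11 variables
# (one fewer if you exploit the fact that the entries sum to 1).
full_joint = a_values * b_values ** num_b
print(full_joint)

# Factored form P(A) * prod_i P(Bi | A): the P(A) table plus one
# a_values-by-b_values table per Bi (again, each distribution's last
# entry can be left implicit under the sum-to-1 convention).
factored = a_values + num_b * (a_values * b_values)
print(factored)
```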
Task 3
12 points
Given the network above, calculate P(not(Baseball Game on TV) | not(George Feeds Cat)) using Inference by Enumeration.
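As a reminder of the method (the specific hidden variables and factors depend on the network in the figure, which is not reproduced here), inference by enumeration expresses the query as a normalized sum of full-joint entries:

```latex
P(\neg \text{Baseball} \mid \neg \text{Feeds})
  = \alpha \, P(\neg \text{Baseball}, \neg \text{Feeds})
  = \alpha \sum_{\mathbf{h}} P(\neg \text{Baseball}, \neg \text{Feeds}, \mathbf{h})
```

where the sum ranges over all assignments to the hidden variables, each joint entry factors into a product of the network's conditional probability tables, and the normalization constant is alpha = 1 / P(not(George Feeds Cat)).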
Task 4
18 points
| Class | A | B | C |
| X     | 1 | 2 | 1 |
| X     | 2 | 1 | 2 |
| X     | 3 | 2 | 2 |
| X     | 1 | 3 | 3 |
| X     | 1 | 2 | 1 |
| Y     | 2 | 1 | 2 |
| Y     | 3 | 1 | 1 |
| Y     | 2 | 2 | 2 |
| Y     | 3 | 3 | 1 |
| Y     | 2 | 1 | 1 |
We want to build a decision tree that determines whether a certain pattern is of type X or type Y. The decision tree can only use tests that are based on attributes A, B, and C. Each attribute has 3 possible values: 1, 2, 3 (we do not apply any thresholding). We have the 10 training examples shown in the table (each row corresponds to a training example).
What is the information gain of each attribute at the root?
Which attribute achieves the highest information gain at the root?
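For checking your arithmetic, here is a minimal sketch of the standard entropy-based information-gain computation (not part of the required submission); the data is copied directly from the table above.

```python
from math import log2
from collections import Counter

# Training examples from the table: (Class, A, B, C).
examples = [
    ("X", 1, 2, 1), ("X", 2, 1, 2), ("X", 3, 2, 2), ("X", 1, 3, 3), ("X", 1, 2, 1),
    ("Y", 2, 1, 2), ("Y", 3, 1, 1), ("Y", 2, 2, 2), ("Y", 3, 3, 1), ("Y", 2, 1, 1),
]

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(attr_index):
    """Gain of splitting on the attribute at position attr_index (1=A, 2=B, 3=C)."""
    base = entropy([e[0] for e in examples])
    remainder = 0.0
    for value in {e[attr_index] for e in examples}:
        subset = [e[0] for e in examples if e[attr_index] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder

for name, idx in (("A", 1), ("B", 2), ("C", 3)):
    print(name, round(information_gain(idx), 4))
```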
Task 5
18 points
| Class | A  | B  | C  |
| X     | 25 | 24 | 31 |
| X     | 22 | 14 | 24 |
| X     | 28 | 22 | 25 |
| X     | 24 | 13 | 30 |
| X     | 26 | 20 | 24 |
| Y     | 20 | 31 | 17 |
| Y     | 18 | 32 | 14 |
| Y     | 21 | 25 | 20 |
| Y     | 13 | 32 | 15 |
| Y     | 12 | 27 | 18 |
We want to build a decision tree (with thresholding) that determines whether a certain pattern is of type X or type Y. The decision tree can only use tests that are based on attributes A, B, and C. We have the 10 training examples shown in the table (each row corresponds to a training example). Which attribute-threshold combination achieves the highest information gain at the root? For each attribute, try the thresholds 15, 20, and 25.
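A sketch of the thresholded variant follows (not part of the required submission); it assumes the binary test "attribute <= threshold", which is one common convention; if your course defines the split as "<" instead, adjust accordingly. The data is copied from the table above.

```python
from math import log2
from collections import Counter

# Training examples from the table: (Class, A, B, C).
examples = [
    ("X", 25, 24, 31), ("X", 22, 14, 24), ("X", 28, 22, 25),
    ("X", 24, 13, 30), ("X", 26, 20, 24),
    ("Y", 20, 31, 17), ("Y", 18, 32, 14), ("Y", 21, 25, 20),
    ("Y", 13, 32, 15), ("Y", 12, 27, 18),
]

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def gain(attr_index, threshold):
    """Information gain of the binary test 'attribute <= threshold'."""
    base = entropy([e[0] for e in examples])
    below = [e[0] for e in examples if e[attr_index] <= threshold]
    above = [e[0] for e in examples if e[attr_index] > threshold]
    remainder = sum(len(s) / len(examples) * entropy(s) for s in (below, above) if s)
    return base - remainder

for name, idx in (("A", 1), ("B", 2), ("C", 3)):
    for t in (15, 20, 25):
        print(f"{name} <= {t}: gain = {gain(idx, t):.4f}")
```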