Samples for Exam 4
Task 1
Class | A | B | C |
X | 1 | 2 | 1 |
X | 2 | 1 | 2 |
X | 3 | 2 | 2 |
X | 1 | 3 | 3 |
X | 1 | 2 | 1 |
Y | 2 | 1 | 2 |
Y | 3 | 1 | 1 |
Y | 2 | 2 | 2 |
Y | 3 | 3 | 1 |
Y | 2 | 1 | 1 |
We want to build a decision tree that determines whether a certain
pattern is of type X or type Y. The decision tree can only use tests
that are based on attributes A, B, and C. Each attribute has 3 possible
values: 1, 2, 3 (we do not apply any thresholding). The 10 training
examples are shown in the table (each row corresponds to one training
example).
What is the information gain of each attribute at the root?
Which attribute achieves the highest information gain at the root?
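The calculation asked for above can be checked with a short script. This
is a minimal sketch, assuming the usual definition of information gain
(entropy of the class labels at the root, in bits, minus the weighted
entropy of the groups produced by splitting on one attribute). The names
ROWS, ATTRS, entropy and information_gain are illustrative and not part
of the exam material.

```python
from collections import Counter
from math import log2

# Training examples from the Task 1 table: (class, A, B, C).
ROWS = [
    ("X", 1, 2, 1), ("X", 2, 1, 2), ("X", 3, 2, 2), ("X", 1, 3, 3),
    ("X", 1, 2, 1), ("Y", 2, 1, 2), ("Y", 3, 1, 1), ("Y", 2, 2, 2),
    ("Y", 3, 3, 1), ("Y", 2, 1, 1),
]
ATTRS = {"A": 1, "B": 2, "C": 3}  # attribute name -> column index in ROWS


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, col):
    """Entropy at the root minus the weighted entropy after a multiway
    split on the attribute stored in column `col`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[0])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[0] for row in rows]) - remainder


gains = {name: information_gain(ROWS, col) for name, col in ATTRS.items()}
best = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"IG({name}) = {gain:.4f}")
print("highest gain:", best)
```

Each attribute value gets its own branch here, which matches the "no
thresholding" condition stated in the task.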
Task 2
Class | A | B | C |
X | 25 | 24 | 31 |
X | 22 | 14 | 24 |
X | 28 | 22 | 25 |
X | 24 | 13 | 30 |
X | 26 | 20 | 24 |
Y | 20 | 31 | 17 |
Y | 18 | 32 | 14 |
Y | 21 | 25 | 20 |
Y | 13 | 32 | 15 |
Y | 12 | 27 | 18 |
We want to build a decision tree (with thresholding) that determines
whether a certain pattern is of type X or type Y. The decision tree can
only use tests that are based on attributes A, B, and C. The 10
training examples are shown in the table (each row corresponds to one
training example). Which attribute-threshold combination achieves the
highest information gain at the root? For each attribute, try the
thresholds 15, 20, and 25.
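The search over attribute-threshold pairs can be sketched as follows.
This is a minimal sketch under one stated assumption: a test sends an
example to one branch when its value is strictly less than the threshold
and to the other branch otherwise. The task does not say how values
equal to the threshold are handled, and the other convention (less than
or equal) can change which pair wins. The names ROWS, ATTRS, THRESHOLDS,
entropy and threshold_gain are illustrative.

```python
from collections import Counter
from math import log2

# Training examples from the Task 2 table: (class, A, B, C).
ROWS = [
    ("X", 25, 24, 31), ("X", 22, 14, 24), ("X", 28, 22, 25),
    ("X", 24, 13, 30), ("X", 26, 20, 24), ("Y", 20, 31, 17),
    ("Y", 18, 32, 14), ("Y", 21, 25, 20), ("Y", 13, 32, 15),
    ("Y", 12, 27, 18),
]
ATTRS = {"A": 1, "B": 2, "C": 3}  # attribute name -> column index in ROWS
THRESHOLDS = (15, 20, 25)


def entropy(labels):
    """Shannon entropy, in bits; an empty group contributes 0."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def threshold_gain(rows, col, t):
    """Information gain of the binary test `value < t` on column `col`.
    ASSUMPTION: values equal to t go to the `>= t` branch."""
    below = [row[0] for row in rows if row[col] < t]
    above = [row[0] for row in rows if row[col] >= t]
    n = len(rows)
    remainder = (len(below) / n) * entropy(below) \
        + (len(above) / n) * entropy(above)
    return entropy([row[0] for row in rows]) - remainder


gains = {
    (name, t): threshold_gain(ROWS, col, t)
    for name, col in ATTRS.items()
    for t in THRESHOLDS
}
best = max(gains, key=gains.get)

for (name, t), gain in gains.items():
    print(f"IG({name} < {t}) = {gain:.4f}")
print("highest gain:", best)
```

Changing the two comparisons to `<=` and `>` gives the other convention,
which is worth trying because several table values coincide with the
thresholds.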
Task 3
Consider a classification problem with patterns that have 30 attributes,
where each attribute takes a value between 0 and 127, and each pattern
has to be classified as one of 10 possible class labels.
- If you are building a Pseudo Bayes classifier with P(X|C) modeled
  using a histogram, how many numbers do you have to store:
  - If no binning is done
  - If values that are 4 numbers apart (e.g., 0-3, 1-4) are
    considered very similar and so have their counts binned together
- If you instead build it with P(X|C) approximated using a PDF, how
  many numbers do you have to store:
  - If you use 1-D Gaussians (with the Naive Bayes assumption)
  - If you use a mixture of 3 1-D Gaussians (again with the Naive
    Bayes assumption)
  - If you use a 30-D Gaussian
  - If you use a mixture of 3 30-D Gaussians
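
The counts asked for above depend on the counting convention, so the
sketch below is one possible way to tally them, not an answer key. It
assumes: one histogram per attribute per class; non-overlapping bins of
width 4 (32 bins per attribute); a mean and a variance per 1-D Gaussian;
a mean vector and a covariance matrix per 30-D Gaussian, counted either
in full or as unique symmetric entries; one weight per mixture
component; and class priors not counted. All function and constant
names are illustrative.

```python
N_ATTRS = 30     # attributes per pattern
N_VALUES = 128   # each attribute takes a value in 0..127
N_CLASSES = 10   # class labels


def histogram_count(bin_width=1):
    """One histogram per (class, attribute).
    ASSUMPTION: bins are non-overlapping and bin_width divides 128."""
    bins = N_VALUES // bin_width
    return N_CLASSES * N_ATTRS * bins


def gaussian_1d_count(components=1):
    """Mean and variance per 1-D Gaussian, per (class, attribute).
    ASSUMPTION: a mixture also stores one weight per component."""
    per_attr = 2 * components
    if components > 1:
        per_attr += components
    return N_CLASSES * N_ATTRS * per_attr


def gaussian_nd_count(components=1, symmetric=False):
    """Mean vector plus covariance matrix per 30-D Gaussian, per class.
    symmetric=True counts only the d*(d+1)/2 unique covariance entries."""
    d = N_ATTRS
    cov = d * (d + 1) // 2 if symmetric else d * d
    per_class = components * (d + cov)
    if components > 1:
        per_class += components  # ASSUMPTION: one weight per component
    return N_CLASSES * per_class


print("histogram, no binning:      ", histogram_count())
print("histogram, bins of width 4: ", histogram_count(4))
print("1-D Gaussians:              ", gaussian_1d_count())
print("mixture of 3 1-D Gaussians: ", gaussian_1d_count(3))
print("30-D Gaussian (full cov):   ", gaussian_nd_count())
print("30-D Gaussian (symmetric):  ", gaussian_nd_count(symmetric=True))
print("3 x 30-D Gaussians (full):  ", gaussian_nd_count(3))
print("3 x 30-D Gaussians (sym.):  ", gaussian_nd_count(3, symmetric=True))
```

Other conventions shift these totals slightly, for example dropping one
mixture weight per mixture because the weights sum to 1, or adding the
10 class priors.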