Material
The material covered in the final consists of the following chapters and sections in the textbook:
(material for first midterm)
- Chapter 2, all sections.
- Chapter 3, all sections except 3.6.
- Chapter 4, sections 4.1 and 4.2.
- Chapter 6, all sections except 6.6.
- Chapter 7, sections 7.1, 7.2, 7.3, 7.4, and 7.5 up to and NOT INCLUDING the "forward and backward chaining" subsection, which starts on page 217.
(material for second midterm)
- Chapter 8, sections 8.1, 8.2, 8.3.
- Chapter 9, sections 9.1, 9.2, 9.3, 9.4, 9.5.
- Chapter 11, sections 11.1, 11.2, 11.3, 11.4, 11.5.
- Chapter 12, sections 12.3, 12.4, 12.5.
- Chapter 13, sections 13.1, 13.2, 13.3, 13.4, 13.5, 13.6.
- Chapter 14, sections 14.1, 14.2, 14.3.
(additional material)
- Chapter 18, sections 18.1, 18.2, 18.3 (up to and NOT INCLUDING the "broadening the applicability of decision trees" subsection, which starts on page 663), and 18.4.
- Chapter 20, sections 20.1, 20.2, and 20.5 up to and NOT INCLUDING the paragraph starting with "For the mathematically inclined" on page 746.
50% of the points will be assigned to questions on the additional material that was not covered in the midterms; the remaining 50% will be assigned to questions on the material covered in the first two midterms. The practice questions on this page refer only to the additional material. For practice questions on the material covered earlier in the course, see the two midterms and their practice questions, accessible from the exams page.
Practice Questions
- We are building a decision tree to predict whether a person's next car will be a regular car or a minivan. We have 100 past cases as training examples. The following is true for those cases:
- 40 people bought minivans. Out of those 40 people, 30 people were over 35 years of age, and 10 people were under 35 years of age.
- 60 people bought regular cars. Out of those 60 people, 12 people were over 35 years of age, and 48 people were under 35 years of age.
What is the entropy gain of selecting the "over 35 years of age" attribute as a test for the root node of the decision tree?
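If you want to check your arithmetic on this question, here is a minimal Python sketch (the counts come from the problem statement; the `entropy` helper is just illustrative):

```python
import math

def entropy(counts):
    """Entropy (in bits) of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Class distribution at the root: 40 minivans, 60 regular cars.
root = entropy([40, 60])

# Split on "over 35": 42 people are over 35 (30 minivan, 12 regular),
# 58 people are under 35 (10 minivan, 48 regular).
over_35 = entropy([30, 12])
under_35 = entropy([10, 48])

gain = root - (42 / 100) * over_35 - (58 / 100) * under_35
print(f"entropy gain = {gain:.4f} bits")
```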
- Given a set of training examples, is there always a decision tree that perfectly classifies all training examples in that set? If yes, prove your answer. If no, provide a counter example.
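Without giving the answer away entirely: a decision tree computes a deterministic function of the attribute values, so it is worth asking what happens when training examples agree on every attribute. A tiny illustrative dataset (the attribute names here are made up):

```python
# Two training examples with identical attribute values but different
# labels:
training_set = [
    ({"over_35": True, "has_kids": True}, "minivan"),
    ({"over_35": True, "has_kids": True}, "regular car"),
]
# A decision tree maps each attribute vector to exactly one label,
# so consider whether any tree can classify both examples correctly.
```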
- We are running AdaBoost on a set of 100 training examples.
- What is the weight of the 20th training example initially?
- The 20th training example is misclassified by the first weak classifier chosen by the algorithm. Is the new weight of the 20th training example (i.e., the weight assigned to the 20th example after choosing the first weak classifier) larger or smaller than the weight assigned to that example initially?
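A minimal sketch of the weight bookkeeping, following one standard formulation of AdaBoost (the error value `epsilon` is an arbitrary illustrative assumption; any weighted error below 0.5 behaves the same way qualitatively):

```python
import math

N = 100
weights = [1.0 / N] * N   # each example initially gets weight 1/N = 0.01

# Illustrative assumption: the first weak classifier has weighted error
# epsilon < 0.5.
epsilon = 0.3
alpha = 0.5 * math.log((1 - epsilon) / epsilon)

# One standard update: misclassified examples are scaled by e^alpha,
# correctly classified ones by e^(-alpha), then all weights are
# renormalized to sum to 1. Here example 20 (index 19) is misclassified.
scaled = [w * math.exp(alpha if i == 19 else -alpha)
          for i, w in enumerate(weights)]
total = sum(scaled)
new_weights = [w / total for w in scaled]

print(new_weights[19] > weights[19])  # the misclassified example's weight grows
```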
- There are two types of candy bags, type A and type B. Both types of bags contain an infinite number of candies. A bag of type A contains 80% chocolate candies and 20% vanilla candies. A bag of type B contains 40% chocolate candies and 60% vanilla candies. The prior probability P(A) of having a bag of type A is 0.99, and the prior probability P(B) of having a bag of type B is 0.01.
- What is the posterior probability that we have a bag of type A if the first candy that we pick is a vanilla candy?
- We want to estimate whether the bag we have is a type A bag or a type B bag. If the first candy is vanilla, what is the MAP estimate, and what is the ML estimate?
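A short sketch of the Bayes-rule computation for the first sub-question, using only quantities given in the problem:

```python
# Priors and likelihoods from the problem statement.
p_A, p_B = 0.99, 0.01
p_vanilla_given_A = 0.20
p_vanilla_given_B = 0.60

# Bayes rule: P(A | vanilla) = P(vanilla | A) P(A) / P(vanilla).
p_vanilla = p_vanilla_given_A * p_A + p_vanilla_given_B * p_B
p_A_given_vanilla = p_vanilla_given_A * p_A / p_vanilla
print(f"P(A | vanilla) = {p_A_given_vanilla:.4f}")

# For the second sub-question: the MAP estimate picks the hypothesis with
# the highest posterior, while the ML estimate picks the hypothesis with
# the highest likelihood P(vanilla | hypothesis).
```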
- Design a perceptron that takes two inputs X1 and X2, outputs +1 if X1 > X2 + 5, and outputs 0 if X1 <= X2 + 5.
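One possible solution sketch, assuming a step activation that outputs 1 on a strictly positive weighted sum (many other weight choices work):

```python
def perceptron(x1, x2):
    """Weights (1, -1) and bias -5 with a step activation. The weighted
    sum is x1 - x2 - 5, which is positive exactly when x1 > x2 + 5."""
    s = 1.0 * x1 + (-1.0) * x2 - 5.0
    return 1 if s > 0 else 0

assert perceptron(10, 2) == 1   # 10 > 2 + 5
assert perceptron(7, 2) == 0    # 7 <= 2 + 5 (boundary case)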
- Consider a function F that takes three Boolean inputs and gives a +1 response when exactly two (no more, no fewer) of those inputs are set to true (for the inputs, true is encoded by value 1, false is encoded by value 0). Given a large amount of training data, can we use the perceptron learning algorithm to construct a perceptron that models function F perfectly? Why, or why not?
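One way to investigate this question empirically is to brute-force the search for a separating hyperplane over the 8 possible inputs. This sketch assumes small integer weights and half-integer thresholds, which is a reasonable search space for three Boolean inputs:

```python
from itertools import product

def F(x1, x2, x3):
    """+1 when exactly two of the three inputs are 1, otherwise 0."""
    return 1 if x1 + x2 + x3 == 2 else 0

inputs = list(product([0, 1], repeat=3))

# Search for weights and a threshold such that w.x > theta holds exactly
# on the inputs where F is +1.
found = False
for w1, w2, w3 in product(range(-4, 5), repeat=3):
    for t2 in range(-9, 10):
        theta = t2 / 2.0  # half-integer thresholds avoid ties
        if all((w1*a + w2*b + w3*c > theta) == (F(a, b, c) == 1)
               for a, b, c in inputs):
            found = True

print("linearly separable?", found)  # an empty search suggests F is not
```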
- Design a neural network that implements the XOR function. You can use any number and any type of perceptrons you like. You do not have to specify the weights inside each perceptron, but you need to specify what function each perceptron implements (and, of course, the function should be a function that a perceptron can indeed model).
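One classic construction expresses XOR as a two-layer network of step perceptrons, using XOR(a, b) = AND(OR(a, b), NAND(a, b)). A sketch (the specific weights are one choice among many; each unit below is a function a single perceptron can model):

```python
def step(s):
    return 1 if s > 0 else 0

# Hidden-layer and output-layer perceptrons.
def OR(a, b):   return step(a + b - 0.5)
def NAND(a, b): return step(-a - b + 1.5)
def AND(a, b):  return step(a + b - 1.5)

def xor(a, b):
    # Two-layer network: hidden layer (OR, NAND), output layer (AND).
    return AND(OR(a, b), NAND(a, b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))   # prints 0, 1, 1, 0 for the four inputs
```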