Due dates for CSE 4308 (assessment):
- Parts 1 and 2: Tuesday, November 20, 2007, 11:59pm (by e-mail).
- Part 3: Thursday, November 29, 2007, 11:59pm (by e-mail).
- Part 4: Tuesday, December 4, 2007, 11:59pm (by e-mail).
For CSE 5360 Only: All parts due Tuesday, December 4, 2007, 11:59pm (by e-mail).
Summary
The goal of this assignment/assessment is to evaluate theoretical understanding of Bayesian networks and the practical ability to design and implement such networks. Four tasks need to be completed:
- Answering some written questions about Bayesian networks.
- Designing a Bayesian network graph given a description of the problem in English.
- Using training data to learn probability distributions for a Bayesian network.
- Implementing a Bayesian network in software. Your implementation must be able to compute the probability of any event.
Part 1 (Theoretical Understanding): Questions about Bayesian Networks.
Questions 1-6 refer to the Bayesian network in Figure 1. Questions 7 and 8 refer to the Bayesian network in Figure 2.
1. Consider the event E = "battery age is less than three years". Let A = P(E | "battery dead"=true AND "no oil"=true) and B = P(E | "battery dead"=true AND "no oil"=false). Which of the following three cases can possibly be true: A > B, A = B, or A < B? Why?
2. Consider again the event E = "battery age is less than three years". Let A = P(E | "battery dead"=true AND "no oil"=true) and B = P(E | "battery dead"=true AND "starter broken"=true). Which of the following three cases can possibly be true: A > B, A = B, or A < B? Why?
3. The "alternator broken" event and the "fanbelt broken" event are both causes of the "no charging" event. Let A = P("alternator broken"=true | "no charging"=true) and B = P("alternator broken"=true | "no charging"=true AND "fanbelt broken"=true). Which of the following three cases do you expect to be true: A > B, A = B, or A < B? Why?
4. The "no gas" event is a cause of the "car won't start" event. Let A = P("no gas"=true) and B = P("no gas"=true | "car won't start"=true). Which of the following three cases do you expect to be true: A > B, A = B, or A < B? Why?
5. Suppose that:
   - P("alternator broken"=true) = 0.02
   - P("no charging"=true | "alternator broken"=true) = 0.95
   - P("no charging"=true | "alternator broken"=false) = 0.01.
   What is P("no charging"=false)? How is it derived?
6. Suppose that:
   - P("battery age" <= 3 years) = 0.7
   - P("battery dead"=true | "battery age" <= 3 years) = 0.02
   - P("battery dead"=true | "battery age" > 3 years) = 0.1.
   What is P("battery age" <= 3 years | "battery dead"=true)? How is it derived?
7. In Figure 2, what is the probability of the following event: burglary=false AND earthquake=true AND alarm=false AND JohnCalls=true AND MaryCalls=false?
8. In Figure 2, what is the probability of the following event: earthquake=true AND alarm=false AND JohnCalls=true AND MaryCalls=false?

Figure 1: A Bayesian network graph establishing relations between various car problems and their causes.

Figure 2: A Bayesian network establishing relations between events, together with complete specifications of all probability distributions.
Part 2 (Solution Design): Designing a Bayesian network graph.
George doesn't watch much TV in the evening, unless there is a baseball game on. When there is baseball on TV, George is very likely to watch. George has a cat that he feeds every night, although he forgets every now and then. He's much more likely to forget when he's watching TV. He's also very unlikely to feed the cat if he has run out of cat food (although sometimes he gives the cat some of his own food).
Design a Bayesian network for modeling the relations between these four events:
- baseball_game_on_TV
- George_watches_TV
- out_of_cat_food
- George_feeds_cat
Your task is to connect these nodes with arrows pointing from causes to effects. No programming is needed for this part; just submit a document showing your Bayesian network design.
Part 3 (Data Analysis): Learning Probabilities from Training Data.
For the Bayesian network of Part 2, the text file at this link contains training data from every evening of an entire year. Every line in this text file corresponds to an evening, and contains four numbers. Each number is a 0 or a 1. In more detail:
- The first number is 0 if there is no baseball game on TV, and 1 if there is a baseball game on TV.
- The second number is 0 if George does not watch TV, and 1 if George watches TV.
- The third number is 0 if George is not out of cat food, and 1 if George is out of cat food.
- The fourth number is 0 if George does not feed the cat, and 1 if George feeds the cat.
Based on the data in this file, determine the probability table for each node in the Bayesian network you have designed for Part 2. You need to submit these four tables by e-mail. Optionally (to help with grading) you can also submit the code that computes these probabilities.
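As an illustration of the counting involved, here is a minimal sketch of estimating a conditional probability table by relative frequency. It is written in Python purely for brevity and is not a required approach. The function names parse_rows and estimate_table, the assumption that the four numbers on each line are separated by whitespace, and the two-column toy data are all hypothetical; which columns act as parents of which node depends on your own Part 2 design.

```python
from collections import Counter

def parse_rows(lines):
    """Turn lines such as '0 1 0 1' into tuples of ints, skipping blank lines.

    Assumes whitespace-separated values; adjust if the data file differs.
    """
    return [tuple(int(tok) for tok in line.split()) for line in lines if line.strip()]

def estimate_table(rows, child, parents):
    """Estimate P(child=1 | parents) by relative frequency.

    rows    -- list of tuples of 0/1 values, one tuple per evening
    child   -- column index of the node whose table is estimated
    parents -- tuple of column indices of that node's parents (may be empty)
    Returns a dict mapping each observed parent assignment to P(child=1 | assignment).
    Parent assignments that never occur in the data are absent from the dict.
    """
    totals = Counter()  # how often each parent assignment occurs
    hits = Counter()    # how often child == 1 under that assignment
    for row in rows:
        key = tuple(row[p] for p in parents)
        totals[key] += 1
        hits[key] += row[child]
    return {key: hits[key] / totals[key] for key in totals}

# Hypothetical toy data with two columns (NOT the course data file).
toy_rows = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (0, 0), (1, 1)]

print(estimate_table(toy_rows, child=0, parents=()))    # table of a node with no parents
print(estimate_table(toy_rows, child=1, parents=(0,)))  # column 1 given column 0
```

A node with no parents gets a table with a single entry keyed by the empty tuple; a node with k parents gets up to 2^k entries, one per combination of parent values seen in the data.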
Part 4 (Software Implementation): Implementing a Bayesian Network in Software.
For the Bayesian network of Figure 2, implement a program that computes the probability of any combination of events. For example, if the executable is called bnet, one possible invocation would be:
bnet 0 1 0 1 1
In general, bnet takes exactly 5 (no more, no fewer) command line arguments, and each argument is either 0 or 1. The arguments provide to the program the following information:
- The first argument is 1 if Burglary=true, and 0 if Burglary=false.
- The second argument is 1 if Earthquake=true, and 0 if Earthquake=false.
- The third argument is 1 if Alarm=true, and 0 if Alarm=false.
- The fourth argument is 1 if JohnCalls=true, and 0 if JohnCalls=false.
- The fifth argument is 1 if MaryCalls=true, and 0 if MaryCalls=false.
A correct implementation will not contain hardcoded values for all 32 combinations of arguments, but instead will use the tables shown in Figure 2 and the appropriate formulas to evaluate the probability of the specified event.
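To make "the appropriate formulas" concrete, the sketch below shows the general idea on a deliberately different, hypothetical two-node network (Rain -> WetGrass) with made-up numbers; it does not use the tables of Figure 2. The joint probability of a full assignment is the product, over all nodes, of the probability of each node's value given its parents' values. The dictionary layout and the function names joint_probability and marginal_probability are illustrative assumptions, and Python is used only for brevity.

```python
from itertools import product

# Hypothetical network with made-up numbers (NOT Figure 2).
# Each node lists its parents and a table: parent values -> P(node = 1 | parents).
NETWORK = {
    "Rain":     {"parents": (),        "table": {(): 0.2}},
    "WetGrass": {"parents": ("Rain",), "table": {(1,): 0.9, (0,): 0.1}},
}

def joint_probability(network, assignment):
    """Return P(assignment) as the product of P(node | parents) over all nodes.

    assignment maps every node name to 0 or 1.
    """
    prob = 1.0
    for name, node in network.items():
        parent_values = tuple(assignment[p] for p in node["parents"])
        p_true = node["table"][parent_values]
        prob *= p_true if assignment[name] == 1 else 1.0 - p_true
    return prob

def marginal_probability(network, partial):
    """Return P(partial) by summing the joint over all values of the unassigned nodes."""
    hidden = [name for name in network if name not in partial]
    total = 0.0
    for values in product((0, 1), repeat=len(hidden)):
        full = dict(partial)
        full.update(zip(hidden, values))
        total += joint_probability(network, full)
    return total

# P(Rain=1, WetGrass=0) = 0.2 * (1 - 0.9)
print(joint_probability(NETWORK, {"Rain": 1, "WetGrass": 0}))
# P(WetGrass=1) = 0.2 * 0.9 + 0.8 * 0.1
print(marginal_probability(NETWORK, {"WetGrass": 1}))
```

Only the node list, the parent links, and the per-node tables are written down explicitly; every probability is computed from them, which is the kind of structure the grading rubric asks for.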
Grading Rubric
Each of the four parts is worth 25 points. For the assessment, in addition to the points, there will also be a qualitative grade for each part, and for the assessment as a whole. The qualitative grade can have the following five values: poor, fair, acceptable, good, excellent. The qualitative grade will be assigned as follows:
- 0%-29%: poor
- 30%-59%: fair
- 60%-74%: acceptable
- 75%-89%: good
- 90%-100%: excellent
Here is additional information about how each part will be graded:
- Part 1 (Theoretical Understanding): Each question is worth roughly 3 points (25/8, to be exact).
- Part 2 (Solution Design):
   - 10 points: Establishing a correct correspondence between nodes in the Bayesian network and events in the problem description.
   - 8 points: Establishing correct connections between nodes in the Bayesian network, according to the problem description.
   - 7 points: Using the correct direction for each arrow in the Bayesian network.
- Part 3 (Data Analysis):
   - 10 points: Correctly identifying the form of the probability table that needs to be estimated for each node (i.e., the probability of what event given what other events).
   - 15 points: Estimating correctly each probability table from the training data.
- Part 4 (Software Implementation):
   - 10 points: Creating an executable that provides the correct output for each input.
   - 10 points: Writing code that correctly implements probability estimation for Bayesian networks, and explicitly models each node of the network and each connection between nodes. There should be no hard coding except for specifying the nodes of the network, the connections between the nodes, and the probability table for each node.
   - 5 points: Elegance of the software implementation: modular design, good code organization, code that is easy to read and understand, proper comments.
Submissions
All submissions are via e-mail. E-mail your submission to BOTH the
instructor and the TA. Use subject "CSE 4308 assessment, part X" or "CSE 5360, assignment
6, part X", where X is the part of the assignment that you are submitting.
For part 4, implementations in LISP, C, C++, and Java will be accepted. If you would like
to use another language, please first check with the instructor via e-mail.
For part 4, submit a zipped directory (no other forms of compression accepted) via
e-mail, with subject "CSE 4308/5360, assignment
6". THE ATTACHMENT SHOULD NOT EXCEED 800KB in size (e-mail the instructor
if the 800KB limit is a concern). The directory should
contain source code, and optionally, binaries that are appropriate for gamma
or omega. The directory should also contain a file called readme.txt,
which should specify precisely:
- Where the code runs (i.e., on gamma or on omega).
- What programming language is used.
- How to run the code (including compilation instructions, if compilation
is needed and no appropriate binaries are included).
Insufficient or unclear instructions will be penalized severely.
Code that does not run on at least one of the gamma and omega machines gets
zero points.
Some hints
- How to submit homeworks using gzip and tar:
1. Log into omega or gamma.
2. Create a homework directory CSE4308HWNo (replace No with the
homework number being submitted).
3. Use the following commands to create a gzip-compressed archive:
   1. tar -cvf CSE4308HWNo.tar CSE4308HWNo
   2. gzip CSE4308HWNo.tar
- To list all the files in the archive CSE4308HWNo.tar you can use
tar -tvf CSE4308HWNo.tar.
- To show the compressed and uncompressed sizes of CSE4308HWNo.tar.gz
you can use gzip -l CSE4308HWNo.tar.gz.
- To extract all the files in the archive CSE4308HWNo.tar you can use
tar -xvf CSE4308HWNo.tar.
- To decompress CSE4308HWNo.tar.gz you can use gzip -d
CSE4308HWNo.tar.gz.