Written and Programming Assignment - Probabilities
The assignment should be submitted via Blackboard.
Task 1
24 points
You are a meteorologist who places temperature sensors all over the
world, and you set them up so that they automatically e-mail you, each
day, the high temperature for that day. Unfortunately, you have
forgotten whether you placed a certain sensor S in Maine or in the
Sahara desert (but you are sure you placed it in one of those two
places). The probability that you placed sensor S in Maine is 5%. The
probability of getting a daily high temperature of 80 degrees or more
is 20% in Maine and 90% in the Sahara. Assume that the daily high for
any day is conditionally independent of the daily high for the previous
day, given the location of the sensor.
Part a: If the first e-mail you got from sensor S indicates a daily
high under 80 degrees, what is the probability that the sensor is
placed in Maine?
Part b: If the first e-mail you got from sensor S indicates a daily
high under 80 degrees, what is the probability that the second e-mail
also indicates a daily high under 80 degrees?
Part c: What is the probability that the first three e-mails all
indicate daily highs under 80 degrees?
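As a reminder of the mechanics behind questions like these, here is a minimal sketch of Bayes' rule for a finite set of hypotheses. The function name `posterior`, the hypothesis labels, and all numbers are hypothetical and chosen only for illustration; they are not the values from Task 1.

```python
def posterior(prior, likelihood):
    """Return P(h | e) for each hypothesis h using Bayes' rule.

    prior:      dict mapping hypothesis -> P(h)
    likelihood: dict mapping hypothesis -> P(e | h)
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(e), by total probability
    return {h: p / evidence for h, p in unnormalized.items()}


# Illustrative values only (NOT the numbers from Task 1).
prior = {"site_x": 0.5, "site_y": 0.5}
likelihood_of_reading = {"site_x": 0.3, "site_y": 0.6}
result = posterior(prior, likelihood_of_reading)
print(result)
```

Because observations are conditionally independent given the hypothesis, the posterior after one observation can be passed back in as the prior for the next one.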
Task 2
10 points.
In a certain probability problem, we have 11 variables: A, B1, B2,
..., B10.
- Variable A has 5 values.
- Each of the variables B1, ..., B10 has 7 possible values. Each Bi is
conditionally independent of all other 9 Bj variables (with j != i)
given A.
Based on these facts:
Part a: How many numbers do you need to store in the joint
distribution table of these 11 variables?
Part b: What is the most space-efficient representation (in terms of
how many numbers you need to store) for the joint probability
distribution of these 11 variables? How many numbers do you need to
store in your solution? Your answer should work with any variables
satisfying the assumptions stated above.
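The following sketch shows one way to count table entries, using a small hypothetical problem rather than the sizes given in this task. It compares a full joint table with one possible factored layout that stores P(A) plus one table P(Bi | A) per child variable. The function names and the example sizes are assumptions made for illustration, and whether to count every entry or only the free parameters (entries not determined by the sum-to-one constraint) is a convention you should state in your answer.

```python
def full_joint_size(domain_sizes):
    """Number of entries in a table with one row per joint assignment."""
    size = 1
    for n in domain_sizes:
        size *= n
    return size


def factored_size(parent_size, child_sizes):
    """Entries in P(parent) plus one table P(child | parent) per child.

    Counts every stored entry, not just the free parameters.
    """
    return parent_size + sum(parent_size * c for c in child_sizes)


# Hypothetical sizes: a parent with 2 values and three children with 3 values.
parent = 2
children = [3, 3, 3]
print(full_joint_size([parent] + children))  # 2 * 3 * 3 * 3
print(factored_size(parent, children))       # 2 + 3 * (2 * 3)
```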
Task 3
16 points.
For the following joint probability distribution, answer the questions below.
        |      C = T      |      C = F      |
        | B = T  | B = F  | B = T  | B = F  |
A = T   | 0.048  | 0.192  | 0.196  | 0.084  |
A = F   | 0.012  | 0.048  | 0.294  | 0.126  |
a. Calculate P(A | B)
b. Calculate P(A | B, C)
c. Calculate P(A, C | B)
d. Given B, is A conditionally independent of C? Justify.
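Conditional probabilities like these can be read off a joint table by summing the matching entries and dividing. The sketch below shows that procedure on a made-up joint distribution over three Boolean variables; the helper names `probability` and `conditional` and every number in the table are hypothetical and differ from the table in this task.

```python
NAMES = ("A", "B", "C")

# Hypothetical joint distribution, keyed by (A, B, C). NOT the Task 3 table.
joint = {
    (True, True, True): 0.10, (True, True, False): 0.15,
    (True, False, True): 0.05, (True, False, False): 0.20,
    (False, True, True): 0.05, (False, True, False): 0.20,
    (False, False, True): 0.10, (False, False, False): 0.15,
}


def probability(table, **fixed):
    """Marginal probability: sum the entries consistent with `fixed`."""
    total = 0.0
    for assignment, p in table.items():
        row = dict(zip(NAMES, assignment))
        if all(row[name] == value for name, value in fixed.items()):
            total += p
    return total


def conditional(table, query, given):
    """P(query | given) = P(query, given) / P(given)."""
    return probability(table, **query, **given) / probability(table, **given)


print(conditional(joint, {"A": True}, {"B": True}))
```

Two variables are conditionally independent given a third when the product of their separate conditionals matches their joint conditional for every combination of values, which can be checked with the same helpers.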
Task 4
50 points
The task in this part is to implement a system that:
- Can determine the posterior probability of different hypotheses,
given priors for these hypotheses, and given a sequence of
observations.
- Can determine the probability that the next observation will be of a
specific type, given priors for different hypotheses, and given a
sequence of observations.
As in the slides that we saw in class, there are five types of bags of
candies. Each bag has an infinite number of candies. We have one of
those bags, and we are picking candies out of it. We don't know what
type of bag we have, so we want to figure out the probability of each
type based on the candies that we have picked.
The five possible hypotheses for our bag are:
- h1 (prior: 10%): This type of bag contains 100% cherry candies.
- h2 (prior: 20%): This type of bag contains 75% cherry candies and 25% lime candies.
- h3 (prior: 40%): This type of bag contains 50% cherry candies and 50% lime candies.
- h4 (prior: 20%): This type of bag contains 25% cherry candies and 75% lime candies.
- h5 (prior: 10%): This type of bag contains 100% lime candies.
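The hypotheses above can be encoded as two parallel lists, and a single observation can be folded into the current beliefs with Bayes' rule. The sketch below is one possible way to structure that step, not a required design; the names `PRIORS`, `P_CHERRY`, `update`, and `next_cherry_probability` are hypothetical.

```python
# Priors and cherry fractions for h1..h5, taken from the list above.
PRIORS = [0.10, 0.20, 0.40, 0.20, 0.10]
P_CHERRY = [1.00, 0.75, 0.50, 0.25, 0.00]


def update(beliefs, candy):
    """Return the posterior over h1..h5 after observing one candy.

    candy is "C" for cherry or "L" for lime.
    """
    likelihoods = [pc if candy == "C" else 1.0 - pc for pc in P_CHERRY]
    unnormalized = [b * l for b, l in zip(beliefs, likelihoods)]
    total = sum(unnormalized)  # probability of this candy under the beliefs
    return [u / total for u in unnormalized]


def next_cherry_probability(beliefs):
    """P(next candy is C) = sum over hypotheses of P(C | h) * P(h | Q)."""
    return sum(b * pc for b, pc in zip(beliefs, P_CHERRY))


beliefs = update(PRIORS, "L")
print(beliefs)
print(next_cherry_probability(PRIORS))
```

Processing a whole sequence then amounts to calling `update` once per character, starting from `PRIORS`; the probability of lime is one minus the probability of cherry.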
Command line arguments:
The program takes a single command line argument, which is a string,
for example CLLCCCLLL. This string represents a sequence of
observations, i.e., a sequence of candies that we have already picked.
Each character is C if we picked a cherry candy, and L if we picked a
lime candy. Assuming that characters in the string are numbered
starting with 1, the i-th character of the string corresponds to the
i-th observation. The program should be invoked from the command line
as follows:
compute_a_posteriori observations
For example:
compute_a_posteriori CLLCCLLLCCL
We also allow the case of not having a command line argument at all;
this represents the case where we have made no observations yet.
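A minimal sketch of reading the argument, including the no-argument case, might look like the following. The function name `read_observations` is hypothetical, and rejecting characters other than C and L is an assumption: the assignment does not say how invalid input should be handled.

```python
import sys


def read_observations(argv):
    """Return the observation string, or "" if no argument was given.

    argv is expected to have the same shape as sys.argv.
    """
    sequence = argv[1] if len(argv) > 1 else ""
    invalid = set(sequence) - {"C", "L"}
    if invalid:
        # Assumption: treat unexpected characters as an error.
        raise ValueError("unexpected characters: " + "".join(sorted(invalid)))
    return sequence


# In the real program this would be read_observations(sys.argv).
print(read_observations(["compute_a_posteriori", "CLLCCLLLCCL"]))
print(repr(read_observations(["compute_a_posteriori"])))
```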
Output:
Your program should create a text file called "result.txt" that is
formatted exactly as shown below. ??? is used where your program
should print values that depend on its command line argument. Five
decimal places should appear for any floating point number.
Observation sequence Q: ???
Length of Q: ???
After Observation ??? = ???: (This and all remaining lines are repeated for every observation)
P(h1 | Q) = ???
P(h2 | Q) = ???
P(h3 | Q) = ???
P(h4 | Q) = ???
P(h5 | Q) = ???
Probability that the next candy we pick will be C, given Q: ???
Probability that the next candy we pick will be L, given Q: ???
Sample output for the Tasks is given here.
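For the five-decimal requirement, a fixed-point format specifier is one straightforward option. The sketch below shows only the formatting of a posterior line and a small file-writing helper; the function names and the example values are hypothetical, and the complete layout of result.txt must still follow the template above.

```python
def format_posterior_line(index, value):
    """Format one posterior line with exactly five digits after the point."""
    return "P(h{} | Q) = {:.5f}".format(index, value)


def write_lines(path, lines):
    """Write the given lines to a text file, one per line."""
    with open(path, "w") as handle:
        for line in lines:
            handle.write(line + "\n")


# Hypothetical values, used only to show the formatting.
example = [format_posterior_line(i, v) for i, v in enumerate([0.1, 0.25], start=1)]
print(example)
```

Calling `write_lines("result.txt", lines)` with the fully assembled list of lines would then produce the output file.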
How to submit
Submissions should be made using Blackboard.
Submit a ZIPPED directory called assignment8_<netid>.zip (no other
forms of compression accepted; contact the instructor or TA if you do
not know how to produce a ZIP file) that contains all the files for
your submission.
- Your submission for Task 4 needs to execute on omega; otherwise, no
credit will be given.
- Submit solutions to the remaining Tasks in .pdf format or as scanned
.png images.
- The submission should also contain a file called readme.txt, which
should specify precisely:
  - Name and UTA ID of the student.
  - Any additional instructions to compile/run your program.