layout: post
title: "AI 笔记 Week 07 Bayes Nets"
date: "2017-10-12 02:24:19"
categories: 计算机科学
excerpt: "Week 07 This week you should watch Lesson 6, Bayes Nets, and read Chapte..."
auth: conge
Week 07
This week you should watch Lesson 6, Bayes Nets, and read Chapter 14 in AIMA (Russell & Norvig).
Assignment 3: Bayes Nets Sampling
Due: October 10 at 11:59PM UTC-12 (Anywhere on Earth time)
Challenge Question
- Given the Bayes net, what is P(O2|O1), i.e., the probability that a student gets an offer from company 2 given that s/he got an offer from company 1?
a simple Bayes net
- How many parameters (probabilities) are needed to specify the Bayes network above? (answer is 3)
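(Presumably this refers to a two-node network A → B with binary variables: one number for P(A) plus two for P(B|A) and P(B|-A), so 1 + 2 = 3.)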
Computing Bayes Rule
- Bayes rule can be applied without knowing P(B), which is hard to calculate in the first place.
- Since we know P(A|B) + P(-A|B) = 1, we can use the pseudo-probability method: compute only the numerators of Bayes rule and then normalize the results with a normalization factor.
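A minimal sketch of this normalization trick, with made-up numbers for P(A), P(B|A), and P(B|-A):

```python
# Pseudo-probability trick: compute only the numerators of Bayes rule,
# then normalize so they sum to 1. The numbers below are made up for illustration.
p_b_given_a = 0.8      # P(B|A)
p_b_given_not_a = 0.1  # P(B|-A)
p_a = 0.3              # P(A)

num_a = p_b_given_a * p_a                 # proportional to P(A|B)
num_not_a = p_b_given_not_a * (1 - p_a)   # proportional to P(-A|B)

alpha = 1.0 / (num_a + num_not_a)         # normalization factor
p_a_given_b = alpha * num_a
print(p_a_given_b)                        # P(A|B) without ever computing P(B) directly
```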
Two Test Cancer example
- note here
P(C|+,+) = P(C,+,+) / [P(C,+,+) + P(-C,+,+)]
- similarly,
P(C|+,-) = P(C,+,-) / [P(C,+,-) + P(-C,+,-)]
conditional independence
- When C is known, T1 and T2 are independent.
- what if we don't know C?
- the answer is that without knowing C, T1 and T2 are, in general, not independent.
- the hard part for me is to come up with P(+<sub>2</sub>|+<sub>1</sub>) = P(+<sub>2</sub>|+<sub>1</sub>,C)P(C|+<sub>1</sub>) + P(+<sub>2</sub>|+<sub>1</sub>,-C)P(-C|+<sub>1</sub>)
- the calculation of P(C|+<sub>1</sub>) is covered in the lecture of Week 06. P(C|+<sub>1</sub>) = P(+<sub>1</sub>|C)P(C)/P(+<sub>1</sub>)
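A small sketch that puts the whole two-test example together; the CPT values (P(C) = 0.01, P(+|C) = 0.9, P(+|-C) = 0.2) are what I remember from the lecture, so treat them as assumptions:

```python
# Two-test cancer example: the tests are conditionally independent given C.
p_c = 0.01           # prior P(C); assumed lecture value
p_pos_c = 0.9        # P(+|C)
p_pos_not_c = 0.2    # P(+|-C)

def posterior(results):
    """P(C | test results), where results is a list of booleans (True = '+')."""
    num_c, num_not_c = p_c, 1 - p_c
    for pos in results:
        num_c *= p_pos_c if pos else (1 - p_pos_c)
        num_not_c *= p_pos_not_c if pos else (1 - p_pos_not_c)
    return num_c / (num_c + num_not_c)

p_c_pp = posterior([True, True])    # P(C|+,+)  ~ 0.17
p_c_pm = posterior([True, False])   # P(C|+,-)  ~ 0.006

# P(+2|+1) = P(+2|C)P(C|+1) + P(+2|-C)P(-C|+1); conditional independence given C
# lets us drop the "+1" from P(+2|+1,C).
p_c_p = posterior([True])           # P(C|+1)   ~ 0.043
p_pos2_given_pos1 = p_pos_c * p_c_p + p_pos_not_c * (1 - p_c_p)
print(p_c_pp, p_c_pm, p_pos2_given_pos1)
```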
the relationship between absolute and conditional independence
Independence
- S and R (Sunny and Raise) are not connected in the Bayes network, i.e., they are independent
- I still did not get the first part of the Bayes rule here
- the point the lecture is trying to make here is that P(R|S) = P(R) since R and S are independent
- this is related to the explaining-away effect
- apply Bayes rule so that P(R|H) = P(H|R)P(R)/P(H)
- calculating P(H) (by summing over S and R) is the key here
- P(H|R) is easy; the steps are similar to the calculation of P(H) (see the sketch below)
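A sketch of these computations, assuming the lecture's numbers (P(S) = 0.7, P(R) = 0.01, and the P(H|S,R) table below); the point is that P(H), P(H|R), and P(H|S) are all obtained by summing out the variables we do not condition on:

```python
# Happiness network: S (sunny) and R (raise) are independent parents of H (happy).
p_s, p_r = 0.7, 0.01                            # assumed lecture values
p_h = {(True, True): 1.0, (True, False): 0.7,   # P(H | S, R)
       (False, True): 0.9, (False, False): 0.1}

def pS(s): return p_s if s else 1 - p_s
def pR(r): return p_r if r else 1 - p_r

# P(H) by summing out S and R
p_happy = sum(p_h[(s, r)] * pS(s) * pR(r)
              for s in (True, False) for r in (True, False))

# P(H|R) only sums out S (S is independent of R, so P(S|R) = P(S))
p_happy_given_r = sum(p_h[(s, True)] * pS(s) for s in (True, False))

# Bayes rule: P(R|H) = P(H|R) P(R) / P(H)
p_r_given_h = p_happy_given_r * p_r / p_happy

# Conditioning on S as well: P(R|H,S) = P(H|S,R) P(R) / P(H|S)
p_happy_given_s = sum(p_h[(True, r)] * pR(r) for r in (True, False))
p_r_given_h_s = p_h[(True, True)] * p_r / p_happy_given_s

print(p_r_given_h, p_r_given_h_s)   # a raise is a bit more likely once we see happiness
```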
Conditional Dependence
- we know that R and S are independent, but adding the information about H creates a dependence between the two; e.g., being happy on a non-sunny day increases the probability of a raise.
general definition of Bayes Network
- the quizzes above showed the advantage of a Bayes network vs. an unstructured joint distribution: the Bayes net is more concise.
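For a concrete count (using AIMA's five-node alarm network as an illustration, not necessarily the network in the quiz): the Bayes net needs 1 + 1 + 4 + 2 + 2 = 10 probabilities, while the unstructured joint distribution over the same five binary variables needs 2^5 - 1 = 31.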
D Separation
- A and C are not independent because A can influence B and then influence C through B. But if B is known, then A becomes irrelevant to C.
- C and D are not independent because knowing D tells us something about A, which can in turn influence C. But if A is known, then D cannot tell us anything more about C.
- E and C are similar to C and D: they are independent if A, B, or D is known; if A, B, and D are all unknown, C and E can be dependent.
- A matters in determining E, and so does B, because both A and B can influence C and C influences E. If C is known, A and B become irrelevant to E.
- A and B are independent, but if C is known we can use A to infer B (or vice versa); they are then no longer independent given C. This is the explaining-away effect.
- "active" means the path transmits dependence between the nodes, given which of the in-between nodes are known and unknown.
- notice that the third one is the application of the last rule in the previous slide.
Now, the applications of Bayes networks:
Probabilistic Inference
In the remaining part of this lesson, you will learn about probabilistic inference using Bayes Nets, i.e. how to answer questions that you are interested in, given certain inputs.
- With Bayes net, we can infer the probability distribution of some nodes given certain input.
- the inputs are called evidence variables, the outputs are called query variables, and all the other nodes are hidden variables.
Enumeration
The calculation becomes reading entries from the Bayes net's tables and summing them up. But when the number of nodes gets larger, this method requires a lot of computation (see the sketch below).
- e.g., if A is unknown, J and M are dependent.
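A sketch of inference by enumeration on AIMA's burglary-alarm network, computing P(B|+j,+m) by summing the full joint over the hidden variables E and A; the CPT values are quoted from memory of AIMA Figure 14.2, so treat them as assumptions:

```python
from itertools import product

# AIMA alarm network CPTs (values from memory; treat as assumptions).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,    # P(A=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                    # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                    # P(M=true | A)

def joint(b, e, a, j, m):
    """Full joint probability of one assignment, read off the CPTs."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# P(B | +j, +m): enumerate the hidden variables E and A, then normalize.
num = {b: sum(joint(b, e, a, True, True)
              for e, a in product((True, False), repeat=2))
       for b in (True, False)}
print(num[True] / (num[True] + num[False]))   # ~0.284
```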
Bayes net in Causal direction
Variable elimination
Variable elimination is a repeated two-step operation: join the factors that mention a hidden variable, then eliminate (sum out) that variable, until only the query remains.
Joining factors and elimination
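A sketch of joining and eliminating factors on a small Rain → Traffic → Late chain (the CPT numbers below are illustrative assumptions), computing P(+l): first join P(R) with P(T|R) and sum out R, then join the result with P(L|T) and sum out T:

```python
# Factors are dicts mapping assignments to probabilities.
# Chain: R -> T -> L, binary variables. Numbers are assumptions for illustration.
f_R = {(True,): 0.1, (False,): 0.9}                  # P(R)
f_T_R = {(True, True): 0.8, (False, True): 0.2,      # P(T | R), keyed (t, r)
         (True, False): 0.1, (False, False): 0.9}
f_L_T = {(True, True): 0.3, (False, True): 0.7,      # P(L | T), keyed (l, t)
         (True, False): 0.1, (False, False): 0.9}

# Step 1: join P(R) and P(T|R) into a factor over (R, T).
f_RT = {(r, t): f_R[(r,)] * f_T_R[(t, r)]
        for r in (True, False) for t in (True, False)}

# Step 2: eliminate R by summing it out, giving P(T).
f_T = {(t,): sum(f_RT[(r, t)] for r in (True, False)) for t in (True, False)}

# Repeat: join with P(L|T), then sum out T, giving P(L).
f_TL = {(t, l): f_T[(t,)] * f_L_T[(l, t)]
        for t in (True, False) for l in (True, False)}
f_L = {(l,): sum(f_TL[(t, l)] for t in (True, False)) for l in (True, False)}
print(f_L[(True,)])   # P(+l) ~ 0.134
```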
Approximate Inference
- the sampling procedure's estimated distribution gets arbitrarily close to the true distribution if sampling is repeated infinitely; a procedure with this property is called consistent.
- the sampling method can be used to estimate the complete joint probability or individual probabilities
- to estimate a conditional probability, we can still use sampling, but we need to cross off the samples that do not match the evidence. E.g., if we need P(R|C), then all samples where C is not true are thrown away; this is called rejection sampling (see the sketch below).
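A sketch of prior sampling plus rejection on the cloudy/sprinkler/rain/wet-grass network, estimating P(+r|+c); the CPT values are assumptions in the spirit of the lecture's example:

```python
import random

# Cloudy -> {Sprinkler, Rain} -> WetGrass network; CPT values are assumptions.
P_C = 0.5
P_S = {True: 0.1, False: 0.5}          # P(+s | C)
P_R = {True: 0.8, False: 0.2}          # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,   # P(+w | S, R)
       (False, True): 0.90, (False, False): 0.01}

def prior_sample():
    """Sample every node in topological order from its CPT."""
    c = random.random() < P_C
    s = random.random() < P_S[c]
    r = random.random() < P_R[c]
    w = random.random() < P_W[(s, r)]
    return c, s, r, w

def rejection_sample_R_given_c(n=100_000):
    """Estimate P(+r | +c): throw away every sample where C is false."""
    kept = [r for c, s, r, w in (prior_sample() for _ in range(n)) if c]
    return sum(kept) / len(kept)

print(rejection_sample_R_given_c())    # ~0.8, but about half of the samples were rejected
```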
Rejection Sampling
- the problem with rejection sampling is that when the evidence we condition on is unlikely, it rejects a lot of samples, which is a waste of resources. We can address this by fixing (clamping) the evidence variables; this is called likelihood weighting.
- the catch is that simply fixing the evidence variables is inconsistent: the samples cannot recover the true probability distribution unless each sample is properly weighted.
Likelihood Weighting
Specify a weight for the clamped evidence values +s and +w when sampling. In the quiz, +w gets the weight 0.99 because we already know its parents are +S (due to clamping) and +R (due to sampling).
With the likelihood weights, this evidence-clamping sampling scheme becomes consistent
- but it does not solve all the problems; for example:
- likelihood weighting will get good values for WetGrass because its parents S and R are already set when W is handled.
- however, the sampling of C does not look at S or R, so the process will still generate values of C that do not go well with the evidence on R and S; those samples end up with very small weights and contribute little (see the sketch below).
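A sketch of likelihood weighting on the same sprinkler network, estimating P(R|+s,+w): the evidence variables S and W are clamped instead of sampled, and each sample is weighted by the probability of the clamped values given their parents. CPT values are assumptions as above:

```python
import random

P_C = 0.5
P_S = {True: 0.1, False: 0.5}          # P(+s | C)
P_R = {True: 0.8, False: 0.2}          # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,   # P(+w | S, R)
       (False, True): 0.90, (False, False): 0.01}

def weighted_sample():
    """Sample only the non-evidence nodes; weight by likelihood of evidence +s, +w."""
    c = random.random() < P_C          # sampled as usual
    weight = P_S[c]                    # S clamped to +s -> multiply by P(+s | c)
    r = random.random() < P_R[c]       # sampled as usual
    weight *= P_W[(True, r)]           # W clamped to +w -> multiply by P(+w | +s, r)
    return r, weight

def lw_estimate_R_given_s_w(n=100_000):
    samples = [weighted_sample() for _ in range(n)]
    total = sum(w for r, w in samples)
    return sum(w for r, w in samples if r) / total

print(lw_estimate_R_given_s_w())
```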
Gibbs Sampling
- Markov Chain Monte Carlo (MCMC): 1) generate an initial sample by randomly assigning all the nodes; 2) pick one node from the current sample at random and resample it while keeping all the other nodes fixed, producing the next sample; 3) repeat 2) with each new sample to generate the one after it, and so on...
- MCMC is consistent.
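A minimal Gibbs sampler (one form of MCMC) for P(R|+s,+w) on the same sprinkler network; each step resamples one non-evidence node conditioned on everything else (its Markov blanket). For simplicity this sketch alternates between C and R instead of picking a node at random; CPT values are assumptions as above:

```python
import random

# Sprinkler network; Gibbs sampling for P(R | +s, +w). Evidence stays fixed,
# non-evidence nodes (C and R) are resampled one at a time given everything else.
P_C = 0.5
P_S = {True: 0.1, False: 0.5}
P_R = {True: 0.8, False: 0.2}
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}

def bern(p):
    return random.random() < p

def resample_C(r):
    """P(C | +s, r) is proportional to P(C) * P(+s|C) * P(r|C)."""
    num_t = P_C * P_S[True] * (P_R[True] if r else 1 - P_R[True])
    num_f = (1 - P_C) * P_S[False] * (P_R[False] if r else 1 - P_R[False])
    return bern(num_t / (num_t + num_f))

def resample_R(c):
    """P(R | c, +s, +w) is proportional to P(R|c) * P(+w | +s, R)."""
    num_t = P_R[c] * P_W[(True, True)]
    num_f = (1 - P_R[c]) * P_W[(True, False)]
    return bern(num_t / (num_t + num_f))

def gibbs_R_given_s_w(n=100_000):
    c, r = bern(0.5), bern(0.5)        # step 1: random initial assignment
    count = 0
    for i in range(n):                 # steps 2-3: resample one node per step
        if i % 2 == 0:
            c = resample_C(r)
        else:
            r = resample_R(c)
        count += r
    return count / n

print(gibbs_R_given_s_w())
```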
Monty Hall Problem
- Even Monty Hall himself does not understand the Monty Hall problem.