Lab exercise for Introduction to Graphical Models and Monte Carlo
Taylan Cemgil, Cambridge, June 20, 2007
1 Stochastic Inference
In this exercise you will implement the toy model (OSSS) described in the lecture.
s1 ∼ p(s1) = N(s1; µ1, P1)    s2 ∼ p(s2) = N(s2; µ2, P2)
x | s1, s2 ∼ p(x | s1, s2) = N(x; s1 + s2, R)
[Figure 1: The OSSS model. A directed graphical model in which s1 ∼ N(s1; µ1, P1) and s2 ∼ N(s2; µ2, P2) are the parents of the observed node x ∼ N(x; s1 + s2, R).]
We will use the following parameters: µ1 = 3, µ2 = 5, P1 = P2 = 0.5 and R = 0.3.
1.1 Forward sampling
Write a program to sample from the model. Find p(x) analytically and verify your result by plotting a histogram.
1.1.1 Solution
p(x) = N(x; µ1 + µ2, P1 + P2 + R)
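A minimal MATLAB sketch of this check (the use of hist and normpdf, the latter from the Statistics Toolbox, is an implementation choice, not prescribed by the handout):

% Forward sampling from the OSSS model and comparison with the
% analytic marginal p(x) = N(x; mu1+mu2, P1+P2+R).
mu1 = 3; mu2 = 5; P1 = 0.5; P2 = 0.5; R = 0.3;
N = 10000;
s1 = mu1 + sqrt(P1)*randn(1, N);
s2 = mu2 + sqrt(P2)*randn(1, N);
x  = s1 + s2 + sqrt(R)*randn(1, N);
[h, c] = hist(x, 50);
bar(c, h/(N*(c(2)-c(1))));     % histogram normalized to a density
hold on;
xx = linspace(min(x), max(x), 200);
plot(xx, normpdf(xx, mu1+mu2, sqrt(P1+P2+R)), 'r');
hold off;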
1.2 Exact result
Suppose we observe x = 9. Find p(s1, s2 | x = 9) analytically. Plot the posterior. You can use the function ellipse_line to draw the isocontours of a Gaussian in 2-D.
1.3 Solution
Since $\log \phi(s_1, s_2) =^{+} \log p(s_1, s_2 \mid x = 9)$ is a second-order polynomial in $(s_1, s_2)$ (here $=^{+}$ denotes equality up to an additive constant), the exact posterior is also Gaussian. Let

$$
p(s_1, s_2 \mid x = \hat{x}) = \mathcal{N}\!\left( \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}; \tilde{m}, \Sigma \right)
$$
$$
\begin{aligned}
\log \phi(s_1, s_2) &= \log p(x = \hat{x} \mid s_1, s_2) + \log p(s_1) + \log p(s_2) \\
&=^{+} \left( \mu_1^\top P_1^{-1} + \hat{x}^\top R^{-1} \right) s_1
      + \left( \mu_2^\top P_2^{-1} + \hat{x}^\top R^{-1} \right) s_2 \\
&\qquad - \tfrac{1}{2} \operatorname{Tr}\left( P_1^{-1} + R^{-1} \right) s_1 s_1^\top
        - s_1^\top R^{-1} s_2
        - \tfrac{1}{2} \operatorname{Tr}\left( P_2^{-1} + R^{-1} \right) s_2 s_2^\top
\end{aligned}
$$
By rewriting the expression for $\log \phi$, we obtain

$$
\log \phi(s_1, s_2) =^{+} -\frac{1}{2}
\begin{pmatrix} s_1 & s_2 \end{pmatrix}
\begin{pmatrix} P_1^{-1} + R^{-1} & R^{-1} \\ R^{-1} & P_2^{-1} + R^{-1} \end{pmatrix}
\begin{pmatrix} s_1 \\ s_2 \end{pmatrix}
+ h^\top \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}
$$

where

$$
h^\top \equiv \begin{pmatrix} \mu_1^\top P_1^{-1} + \hat{x}^\top R^{-1} & \mu_2^\top P_2^{-1} + \hat{x}^\top R^{-1} \end{pmatrix}
$$
We identify the coefficients of the quadratic term as the inverse covariance matrix of the exact posterior,

$$
\Sigma^{-1} = \begin{pmatrix} P_1^{-1} + R^{-1} & R^{-1} \\ R^{-1} & P_2^{-1} + R^{-1} \end{pmatrix}
$$
The mean is found by $\tilde{m} = \Sigma h$. The numerical result is

$$
\tilde{m} = \begin{pmatrix} 3.3846 \\ 5.3846 \end{pmatrix} \qquad
\Sigma = \begin{pmatrix} 0.3077 & -0.1923 \\ -0.1923 & 0.3077 \end{pmatrix}
$$
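This can be verified numerically; a minimal MATLAB check using the parameter values above:

% Numerical check of the exact posterior for x = 9.
mu1 = 3; mu2 = 5; P = 0.5; R = 0.3; xhat = 9;
Sig_inv = [1/P + 1/R, 1/R; 1/R, 1/P + 1/R];   % Sigma^{-1}
Sig = inv(Sig_inv)                            % posterior covariance
h = [mu1/P + xhat/R; mu2/P + xhat/R];         % h as defined above
m = Sig*h                                     % posterior mean, [3.3846; 5.3846]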
1.4 Gibbs sampling
Find the full conditionals analytically and implement
$$
s_1^{(t)} \sim p(s_1 \mid s_2^{(t)}, x = \hat{x}) \qquad
s_2^{(t+1)} \sim p(s_2 \mid s_1^{(t)}, x = \hat{x})
$$
Compute the mean and the variance of the generated samples. Is the sample average close to the exact posterior mean?
1.4.1 Solution
We denote the conditional as

$$
p(s_1 \mid s_2^{(t)}, x = \hat{x}) = \mathcal{N}(s_1; \tilde{m}_{1|2}, \Sigma_{1|2})
$$

Substituting $s_2 = s_2^{(t)}$ into the exact posterior,

$$
\log \phi(s_1, s_2 = s_2^{(t)}) =^{+} -\frac{1}{2}
\begin{pmatrix} s_1 & s_2^{(t)} \end{pmatrix}
\begin{pmatrix} P_1^{-1} + R^{-1} & R^{-1} \\ R^{-1} & P_2^{-1} + R^{-1} \end{pmatrix}
\begin{pmatrix} s_1 \\ s_2^{(t)} \end{pmatrix}
+ \left( \mu_1^\top P_1^{-1} + \hat{x}^\top R^{-1} \right)^\top s_1
$$

Reorganizing the terms,

$$
\log \phi(s_1, s_2 = s_2^{(t)}) =^{+} -\frac{1}{2} s_1^\top \left( P_1^{-1} + R^{-1} \right) s_1
+ \left( P_1^{-1} \mu_1 + R^{-1} \hat{x} - R^{-1} s_2^{(t)} \right)^\top s_1
$$

so that

$$
\Sigma_{1|2}^{-1} = P_1^{-1} + R^{-1} \qquad
\tilde{m}_{1|2} = \Sigma_{1|2} \left( P_1^{-1} \mu_1 + R^{-1} (\hat{x} - s_2^{(t)}) \right)
$$
% Gibbs sampler for p(s1, s2 | x = 9).
R = 0.3;
P = 0.5;                      % P1 = P2 = P
mu1 = 3;
mu2 = 5;
x = 9;
T = 1000;
s = zeros(2, T);
s(:,1) = [5; 5];              % arbitrary initial state
Sig = 1/(1/P + 1/R);          % conditional variance, same for s1 and s2
for t=2:T,
  % sample s1 from p(s1 | s2, x), then s2 from p(s2 | s1, x)
  m1 = Sig*(mu1/P + (x - s(2,t-1))/R);
  s(1,t) = sqrt(Sig)*randn + m1;
  m2 = Sig*(mu2/P + (x - s(1,t))/R);
  s(2,t) = sqrt(Sig)*randn + m2;
end
plot(s(1,:), s(2,:), '.')
Sig_exact = inv([1/P+1/R 1/R; 1/R 1/P+1/R]);  % exact covariance, for comparison
1.5 Relaxation
Repeat the previous experiment with R = 0.005. How many iterations does it take until convergence if the chain is initialized at θ^(0) = (µ1, µ2)?
During the simulation, change the parameter R slowly from 2 to 0.005. Can you find a schedule that speeds up convergence? Try to get a plot like Figure 2; a sketch of one possible schedule is given after the figure.
[Figure 2: Relaxation. Isocontours in the (s1, s2) plane of the exact posterior and the factorized mean-field (MF) approximation, together with the intermediate posteriors for R1, R2, ..., Rτ.]
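A minimal sketch of such a relaxation run, assuming a logarithmically spaced schedule for R (the schedule itself is what the exercise asks you to experiment with):

% Gibbs sampling while slowly relaxing R from 2 down to 0.005.
mu1 = 3; mu2 = 5; P = 0.5; x = 9;
T = 1000;
Rs = logspace(log10(2), log10(0.005), T);  % assumed schedule R_1, ..., R_T
s = zeros(2, T);
s(:,1) = [mu1; mu2];
for t=2:T,
  R = Rs(t);
  Sig = 1/(1/P + 1/R);
  m1 = Sig*(mu1/P + (x - s(2,t-1))/R);
  s(1,t) = sqrt(Sig)*randn + m1;
  m2 = Sig*(mu2/P + (x - s(1,t))/R);
  s(2,t) = sqrt(Sig)*randn + m2;
end
plot(s(1,:), s(2,:), '.')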
2 AR(1) Model
[Figure: the AR(1) model, with parameter nodes A and R and the chain x0 → x1 → · · · → x_{k−1} → x_k → · · · → x_K.]
$$
A \sim \mathcal{N}(A; 0, 1.2) \qquad R \sim \mathcal{IG}(R; 0.4, 250) \qquad
x_k \mid x_{k-1}, A, R \sim \mathcal{N}(x_k; A x_{k-1}, R)
$$

with observations $x_0 = 1$ and $x_1 = -6$. The densities are parameterized as

$$
\mathcal{N}(x; m, r) = \exp\left\{ -\frac{1}{2} \frac{x^2 + m^2 - 2xm}{r} - \frac{1}{2}\log(2\pi r) \right\}
$$

$$
\mathcal{IG}(r; a, b) = \exp\left\{ -(a+1)\log r - \frac{1}{br} - \log\Gamma(a) - a\log b \right\}
$$
1. Draw the factor graph.
2. Write the expression for the full joint distribution and assign terms to the individual factors on the factor graph.
3. Derive the full conditional distributions p(A|R, x0, x1) and p(R|A, x0, x1).
4. Implement the Gibbs sampler.
5. Implement simulated annealing and iterative improvement.
The mode of an inverse gamma distribution is at r = 1/((a + 1)b). To generate an inverse gamma random variate r ∼ IG(r; a, b), draw g from a gamma distribution with shape a and scale b (gamrnd(a, b) in MATLAB) and set r = 1/g.
2.1 Solution
Writing the prior as $\mathcal{IG}(R; \nu, \beta/\nu)$ with $\nu = 0.4$ and $\beta/\nu = 250$, and with $P = 1.2$,

$$
\begin{aligned}
p(A, R \mid x_0, x_1) &\propto p(x_1 \mid x_0, A, R)\, p(A)\, p(R) \\
&= \mathcal{N}(x_1; A x_0, R)\, \mathcal{N}(A; 0, P)\, \mathcal{IG}(R; \nu, \beta/\nu) \\
&\propto \exp\left( -\frac{1}{2}\frac{x_1^2}{R} + \frac{x_0 x_1 A}{R} - \frac{1}{2}\frac{x_0^2 A^2}{R} - \frac{1}{2}\log 2\pi R \right)
 \exp\left( -\frac{1}{2}\frac{A^2}{P} \right) \\
&\qquad \times \exp\left( -(\nu+1)\log R - \frac{\nu}{\beta}\frac{1}{R} \right)
\end{aligned}
$$
$$
\begin{aligned}
\log p(A \mid R, x_0, x_1) &=^{+} \frac{x_0 x_1 A}{R} - \frac{1}{2}\frac{x_0^2 A^2}{R} - \frac{1}{2}\frac{A^2}{P} \\
&= -\frac{1}{2}\left( \frac{x_0^2}{R} + \frac{1}{P} \right) A^2 + \frac{x_0 x_1}{R} A
\end{aligned}
$$

so

$$
p(A \mid R, x_0, x_1) = \mathcal{N}(A; \mu_A, \Sigma_A) \qquad
\Sigma_A = \left( \frac{x_0^2}{R} + \frac{1}{P} \right)^{-1} \qquad
\mu_A = \Sigma_A \frac{x_0 x_1}{R}
$$
$$
\begin{aligned}
\log p(R \mid A, x_0, x_1) &=^{+} -\frac{1}{2}\frac{x_1^2}{R} + \frac{x_0 x_1 A}{R} - \frac{1}{2}\frac{x_0^2 A^2}{R} - \frac{1}{2}\log 2\pi R - (\nu+1)\log R - \frac{\nu}{\beta}\frac{1}{R} \\
&=^{+} -\left( \nu + \frac{1}{2} + 1 \right)\log R - \left( \frac{1}{2}x_1^2 - x_0 x_1 A + \frac{1}{2}x_0^2 A^2 + \frac{\nu}{\beta} \right)\frac{1}{R}
\end{aligned}
$$

$$
p(R \mid A, x_0, x_1) = \mathcal{IG}\left( R;\; \nu + \frac{1}{2},\; \left( \frac{1}{2}(x_1 - A x_0)^2 + \frac{\nu}{\beta} \right)^{-1} \right)
$$
% Gibbs sampler for p(A, R | x0, x1).
beta_nu = 250;          % beta/nu, the scale parameter of the IG prior
nu = 0.4;
P = 1.2;
x_0 = 1;
x_1 = -6;
T = 10000;
R = zeros(1, T);
A = zeros(1, T);
A(1) = -6;              % arbitrary initial state
R(1) = 0.00001;
for t=2:T,
  % sample A from p(A | R, x0, x1)
  Sig = 1/(x_0^2/R(t-1) + 1/P);
  mu = Sig*x_0*x_1/R(t-1);
  A(t) = sqrt(Sig)*randn + mu;
  % sample R from p(R | A, x0, x1) = IG(R; nu+1/2, 1/b)
  b = 0.5*(x_1 - A(t)*x_0).^2 + 1/beta_nu;
  R(t) = 1/(gamrnd(nu+0.5, 1/b));   % inverse gamma via 1/gamma
end;
plot(A, log(R), '.');
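For item 5, one possibility (an assumption on my part, not spelled out in the handout) is to temper both full conditionals by a temperature τ(t) that decreases towards zero: τ = 1 recovers the Gibbs sweep above, and the limit τ → 0, where each variable is set to the mode of its conditional, gives iterative improvement. A minimal sketch, reusing the parameters and initialization from the code above:

% Simulated annealing: sample from p(.)^(1/tau) with tau -> 0.
tau_s = logspace(0, -2, T);            % assumed cooling schedule
A(1) = -6; R(1) = 0.00001;
for t=2:T,
  tau = tau_s(t);
  Sig = 1/(x_0^2/R(t-1) + 1/P);
  mu = Sig*x_0*x_1/R(t-1);
  A(t) = sqrt(tau*Sig)*randn + mu;     % N(mu, Sig)^(1/tau) = N(mu, tau*Sig)
  b = 0.5*(x_1 - A(t)*x_0).^2 + 1/beta_nu;
  % IG(R; a, 1/b)^(1/tau) is IG with shape (a+1)/tau - 1 and rate b/tau
  R(t) = 1/(gamrnd((nu + 1.5)/tau - 1, tau/b));
end;
% (A(T), R(T)) should end up near a mode of the posterior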
3 Importance Sampling
Consider a fully connected graph with 3 nodes A, B and C. The edge lengths are distributed as

xAB ∼ E(x; u1)    xAC ∼ E(x; u2)    xBC ∼ E(x; u3)

where u1 = 1, u2 = 2, u3 = 3.
1. Find an expression for the length of the shortest path from A to C, denoted by L.
2. Simulate from the distribution of L.
3. Compute the probability Pr(L ≥ 10) by importance sampling. What is the variance of the weights?
4. Use a proposal of the same form but with u1 = 2, u2 = 3, u3 = 4 and repeat. What is the variance of the weights?
5. Could one adapt the proposal?
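A minimal MATLAB sketch for items 2-4, assuming E(x; u) denotes an exponential distribution with mean u (exprnd uses the mean parameterization):

% Importance sampling estimate of Pr(L >= 10), where
% L = min(x_AC, x_AB + x_BC) is the shortest path length from A to C.
u = [1 2 3];                  % target means for x_AB, x_AC, x_BC
q = [2 3 4];                  % proposal means (item 4); q = u gives weights of 1 (item 3)
N = 100000;
x = exprnd(repmat(q, N, 1));  % N x 3 draws from the proposal
% importance weights w = p(x)/q(x), computed in the log domain
logw = sum(log(repmat(q./u, N, 1)) - x.*repmat(1./u - 1./q, N, 1), 2);
w = exp(logw);
L = min(x(:,2), x(:,1) + x(:,3));
p_est = mean(w .* (L >= 10))  % estimate of Pr(L >= 10)
w_var = var(w)                % variance of the importance weights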