Lab exercise for Introduction to Graphical Models and Monte Carlo
Taylan Cemgil, Cambridge, June 20, 2007
1 Stochastic Inference
In this exercise you will implement the toy model (OSSS) described in the lecture.
s1 ∼ p(s1) = N(s1; µ1, P1)    s2 ∼ p(s2) = N(s2; µ2, P2)
x | s1, s2 ∼ p(x | s1, s2) = N(x; s1 + s2, R)
[Figure 1: The OSSS model. A directed graphical model in which s1 ∼ N(s1; µ1, P1) and s2 ∼ N(s2; µ2, P2) are the parents of the observed node x ∼ N(x; s1 + s2, R).]
We will use the following parameters: µ1 = 3, µ2 = 5, P1 = P2 = 0.5 and R = 0.3.
1.1 Forward sampling
Write a program to sample from the model. Find p(x) analytically and verify your result by plotting a histogram.
1.1.1 Solution
p(x) = N(x; µ1 + µ2, P1 + P2 + R)
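A minimal MATLAB sketch of this check (the use of hist and normpdf, the latter from the Statistics Toolbox, is an implementation choice, not prescribed by the handout):

% Forward sampling from the OSSS model and comparison with the
% analytic marginal p(x) = N(x; mu1+mu2, P1+P2+R).
mu1 = 3; mu2 = 5; P1 = 0.5; P2 = 0.5; R = 0.3;
N = 10000;
s1 = mu1 + sqrt(P1)*randn(1, N);
s2 = mu2 + sqrt(P2)*randn(1, N);
x  = s1 + s2 + sqrt(R)*randn(1, N);
[h, c] = hist(x, 50);
bar(c, h/(N*(c(2)-c(1))));     % histogram normalized to a density
hold on;
xx = linspace(min(x), max(x), 200);
plot(xx, normpdf(xx, mu1+mu2, sqrt(P1+P2+R)), 'r');
hold off;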
1.2 Exact result
Suppose we observe x = 9. Find p(s1, s2 | x = 9) analytically. Plot the posterior. You can use the function ellipse_line to draw the isocontours of a Gaussian in 2-D.
1.3 Solution
Since $\log \phi(s_1, s_2) =^{+} \log p(s_1, s_2 \mid x = 9)$ is a second-order polynomial in $(s_1, s_2)$ (here $=^{+}$ denotes equality up to an additive constant), the exact posterior is also Gaussian. Let

$$
p(s_1, s_2 \mid x = \hat{x}) = \mathcal{N}\!\left( \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}; \tilde{m}, \Sigma \right)
$$
$$
\begin{aligned}
\log \phi(s_1, s_2) &= \log p(x = \hat{x} \mid s_1, s_2) + \log p(s_1) + \log p(s_2) \\
&=^{+} \left( \mu_1^\top P_1^{-1} + \hat{x}^\top R^{-1} \right) s_1
      + \left( \mu_2^\top P_2^{-1} + \hat{x}^\top R^{-1} \right) s_2 \\
&\qquad - \tfrac{1}{2} \operatorname{Tr}\left( P_1^{-1} + R^{-1} \right) s_1 s_1^\top
        - s_1^\top R^{-1} s_2
        - \tfrac{1}{2} \operatorname{Tr}\left( P_2^{-1} + R^{-1} \right) s_2 s_2^\top
\end{aligned}
$$
By rewriting the expression for $\log \phi$, we obtain

$$
\log \phi(s_1, s_2) =^{+} -\frac{1}{2}
\begin{pmatrix} s_1 & s_2 \end{pmatrix}
\begin{pmatrix} P_1^{-1} + R^{-1} & R^{-1} \\ R^{-1} & P_2^{-1} + R^{-1} \end{pmatrix}
\begin{pmatrix} s_1 \\ s_2 \end{pmatrix}
+ h^\top \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}
$$

where

$$
h^\top \equiv \begin{pmatrix} \mu_1^\top P_1^{-1} + \hat{x}^\top R^{-1} & \mu_2^\top P_2^{-1} + \hat{x}^\top R^{-1} \end{pmatrix}
$$
We identify the coefficients of the quadratic term as the inverse covariance matrix of the exact posterior,

$$
\Sigma^{-1} = \begin{pmatrix} P_1^{-1} + R^{-1} & R^{-1} \\ R^{-1} & P_2^{-1} + R^{-1} \end{pmatrix}
$$
The mean is found by $\tilde{m} = \Sigma h$. The numerical result is

$$
\tilde{m} = \begin{pmatrix} 3.3846 \\ 5.3846 \end{pmatrix} \qquad
\Sigma = \begin{pmatrix} 0.3077 & -0.1923 \\ -0.1923 & 0.3077 \end{pmatrix}
$$
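This can be verified numerically; a minimal MATLAB check using the parameter values above:

% Numerical check of the exact posterior for x = 9.
mu1 = 3; mu2 = 5; P = 0.5; R = 0.3; xhat = 9;
Sig_inv = [1/P + 1/R, 1/R; 1/R, 1/P + 1/R];   % Sigma^{-1}
Sig = inv(Sig_inv)                            % posterior covariance
h = [mu1/P + xhat/R; mu2/P + xhat/R];         % h as defined above
m = Sig*h                                     % posterior mean, [3.3846; 5.3846]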
1.4 Gibbs sampling
Find the full conditionals analytically and implement
$$
s_1^{(t)} \sim p(s_1 \mid s_2^{(t)}, x = \hat{x}) \qquad
s_2^{(t+1)} \sim p(s_2 \mid s_1^{(t)}, x = \hat{x})
$$
Compute the mean and the variance of the generated samples. Is the sample average close to the exact posterior mean?
1.4.1 Solution
We denote the conditional as

$$
p(s_1 \mid s_2^{(t)}, x = \hat{x}) = \mathcal{N}(s_1; \tilde{m}_{1|2}, \Sigma_{1|2})
$$

Substituting $s_2 = s_2^{(t)}$ into the exact posterior,

$$
\log \phi(s_1, s_2 = s_2^{(t)}) =^{+} -\frac{1}{2}
\begin{pmatrix} s_1 & s_2^{(t)} \end{pmatrix}
\begin{pmatrix} P_1^{-1} + R^{-1} & R^{-1} \\ R^{-1} & P_2^{-1} + R^{-1} \end{pmatrix}
\begin{pmatrix} s_1 \\ s_2^{(t)} \end{pmatrix}
+ \left( \mu_1^\top P_1^{-1} + \hat{x}^\top R^{-1} \right)^\top s_1
$$

Reorganizing the terms,

$$
\log \phi(s_1, s_2 = s_2^{(t)}) =^{+} -\frac{1}{2} s_1^\top \left( P_1^{-1} + R^{-1} \right) s_1
+ \left( P_1^{-1} \mu_1 + R^{-1} \hat{x} - R^{-1} s_2^{(t)} \right)^\top s_1
$$

so that

$$
\Sigma_{1|2}^{-1} = P_1^{-1} + R^{-1} \qquad
\tilde{m}_{1|2} = \Sigma_{1|2} \left( P_1^{-1} \mu_1 + R^{-1} (\hat{x} - s_2^{(t)}) \right)
$$
% Gibbs sampler for p(s1, s2 | x = 9).
R = 0.3;
P = 0.5;                      % P1 = P2 = P
mu1 = 3;
mu2 = 5;
x = 9;
T = 1000;
s = zeros(2, T);
s(:,1) = [5; 5];              % arbitrary initial state
Sig = 1/(1/P + 1/R);          % conditional variance, same for s1 and s2
for t=2:T,
  % sample s1 from p(s1 | s2, x), then s2 from p(s2 | s1, x)
  m1 = Sig*(mu1/P + (x - s(2,t-1))/R);
  s(1,t) = sqrt(Sig)*randn + m1;
  m2 = Sig*(mu2/P + (x - s(1,t))/R);
  s(2,t) = sqrt(Sig)*randn + m2;
end
plot(s(1,:), s(2,:), '.')
Sig_exact = inv([1/P+1/R 1/R; 1/R 1/P+1/R]);  % exact covariance, for comparison
1.5 Relaxation
Repeat the previous experiment with R = 0.005. How many iterations does it take until convergence if the chain is initialized at θ^(0) = (µ1, µ2)?
During the simulation, change the parameter R slowly from 2 to 0.005. Can you find a schedule that speeds up convergence? Try to get a plot like Figure 2; a sketch of one possible schedule is given after the figure.
[Figure 2: Relaxation. Isocontours in the (s1, s2) plane of the exact posterior and the factorized mean-field (MF) approximation, together with the intermediate posteriors for R1, R2, ..., Rτ.]
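A minimal sketch of such a relaxation run, assuming a logarithmically spaced schedule for R (the schedule itself is what the exercise asks you to experiment with):

% Gibbs sampling while slowly relaxing R from 2 down to 0.005.
mu1 = 3; mu2 = 5; P = 0.5; x = 9;
T = 1000;
Rs = logspace(log10(2), log10(0.005), T);  % assumed schedule R_1, ..., R_T
s = zeros(2, T);
s(:,1) = [mu1; mu2];
for t=2:T,
  R = Rs(t);
  Sig = 1/(1/P + 1/R);
  m1 = Sig*(mu1/P + (x - s(2,t-1))/R);
  s(1,t) = sqrt(Sig)*randn + m1;
  m2 = Sig*(mu2/P + (x - s(1,t))/R);
  s(2,t) = sqrt(Sig)*randn + m2;
end
plot(s(1,:), s(2,:), '.')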
2 AR(1) Model
[Figure: the AR(1) model, with parameter nodes A and R and the chain x0 → x1 → · · · → x_{k−1} → x_k → · · · → x_K.]
$$
A \sim \mathcal{N}(A; 0, 1.2) \qquad R \sim \mathcal{IG}(R; 0.4, 250) \qquad
x_k \mid x_{k-1}, A, R \sim \mathcal{N}(x_k; A x_{k-1}, R)
$$

with observations $x_0 = 1$ and $x_1 = -6$. The densities are parameterized as

$$
\mathcal{N}(x; m, r) = \exp\left\{ -\frac{1}{2} \frac{x^2 + m^2 - 2xm}{r} - \frac{1}{2}\log(2\pi r) \right\}
$$

$$
\mathcal{IG}(r; a, b) = \exp\left\{ -(a+1)\log r - \frac{1}{br} - \log\Gamma(a) - a\log b \right\}
$$
1. Draw the factor graph.
2. Write the expression for the full joint distribution and assign terms to the individual factors on the factor graph.
3. Derive the full conditional distributions p(A|R, x0, x1) and p(R|A, x0, x1).
4. Implement the Gibbs sampler.
5. Implement simulated annealing and iterative improvement.
The mode of an inverse gamma distribution is at r = 1/((a + 1)b). To generate an inverse gamma random variate r ∼ IG(r; a, b), draw g from a gamma distribution with shape a and scale b (gamrnd(a, b) in MATLAB) and set r = 1/g.
2.1 Solution
Writing the prior as $\mathcal{IG}(R; \nu, \beta/\nu)$ with $\nu = 0.4$ and $\beta/\nu = 250$, and with $P = 1.2$,

$$
\begin{aligned}
p(A, R \mid x_0, x_1) &\propto p(x_1 \mid x_0, A, R)\, p(A)\, p(R) \\
&= \mathcal{N}(x_1; A x_0, R)\, \mathcal{N}(A; 0, P)\, \mathcal{IG}(R; \nu, \beta/\nu) \\
&\propto \exp\left( -\frac{1}{2}\frac{x_1^2}{R} + \frac{x_0 x_1 A}{R} - \frac{1}{2}\frac{x_0^2 A^2}{R} - \frac{1}{2}\log 2\pi R \right)
 \exp\left( -\frac{1}{2}\frac{A^2}{P} \right) \\
&\qquad \times \exp\left( -(\nu+1)\log R - \frac{\nu}{\beta}\frac{1}{R} \right)
\end{aligned}
$$
$$
\begin{aligned}
\log p(A \mid R, x_0, x_1) &=^{+} \frac{x_0 x_1 A}{R} - \frac{1}{2}\frac{x_0^2 A^2}{R} - \frac{1}{2}\frac{A^2}{P} \\
&= -\frac{1}{2}\left( \frac{x_0^2}{R} + \frac{1}{P} \right) A^2 + \frac{x_0 x_1}{R} A
\end{aligned}
$$

so

$$
p(A \mid R, x_0, x_1) = \mathcal{N}(A; \mu_A, \Sigma_A) \qquad
\Sigma_A = \left( \frac{x_0^2}{R} + \frac{1}{P} \right)^{-1} \qquad
\mu_A = \Sigma_A \frac{x_0 x_1}{R}
$$
$$
\begin{aligned}
\log p(R \mid A, x_0, x_1) &=^{+} -\frac{1}{2}\frac{x_1^2}{R} + \frac{x_0 x_1 A}{R} - \frac{1}{2}\frac{x_0^2 A^2}{R} - \frac{1}{2}\log 2\pi R - (\nu+1)\log R - \frac{\nu}{\beta}\frac{1}{R} \\
&=^{+} -\left( \nu + \frac{1}{2} + 1 \right)\log R - \left( \frac{1}{2}x_1^2 - x_0 x_1 A + \frac{1}{2}x_0^2 A^2 + \frac{\nu}{\beta} \right)\frac{1}{R}
\end{aligned}
$$

$$
p(R \mid A, x_0, x_1) = \mathcal{IG}\left( R;\; \nu + \frac{1}{2},\; \left( \frac{1}{2}(x_1 - A x_0)^2 + \frac{\nu}{\beta} \right)^{-1} \right)
$$
% Gibbs sampler for p(A, R | x0, x1).
beta_nu = 250;          % beta/nu, the scale parameter of the IG prior
nu = 0.4;
P = 1.2;
x_0 = 1;
x_1 = -6;
T = 10000;
R = zeros(1, T);
A = zeros(1, T);
A(1) = -6;              % arbitrary initial state
R(1) = 0.00001;
for t=2:T,
  % sample A from p(A | R, x0, x1)
  Sig = 1/(x_0^2/R(t-1) + 1/P);
  mu = Sig*x_0*x_1/R(t-1);
  A(t) = sqrt(Sig)*randn + mu;
  % sample R from p(R | A, x0, x1) = IG(R; nu+1/2, 1/b)
  b = 0.5*(x_1 - A(t)*x_0).^2 + 1/beta_nu;
  R(t) = 1/(gamrnd(nu+0.5, 1/b));   % inverse gamma via 1/gamma
end;
plot(A, log(R), '.');
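For item 5, one possibility (an assumption on my part, not spelled out in the handout) is to temper both full conditionals by a temperature τ(t) that decreases towards zero: τ = 1 recovers the Gibbs sweep above, and the limit τ → 0, where each variable is set to the mode of its conditional, gives iterative improvement. A minimal sketch, reusing the parameters and initialization from the code above:

% Simulated annealing: sample from p(.)^(1/tau) with tau -> 0.
tau_s = logspace(0, -2, T);            % assumed cooling schedule
A(1) = -6; R(1) = 0.00001;
for t=2:T,
  tau = tau_s(t);
  Sig = 1/(x_0^2/R(t-1) + 1/P);
  mu = Sig*x_0*x_1/R(t-1);
  A(t) = sqrt(tau*Sig)*randn + mu;     % N(mu, Sig)^(1/tau) = N(mu, tau*Sig)
  b = 0.5*(x_1 - A(t)*x_0).^2 + 1/beta_nu;
  % IG(R; a, 1/b)^(1/tau) is IG with shape (a+1)/tau - 1 and rate b/tau
  R(t) = 1/(gamrnd((nu + 1.5)/tau - 1, tau/b));
end;
% (A(T), R(T)) should end up near a mode of the posterior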
3 Importance Sampling
Consider a fully connected graph with 3 nodes A, B and C. The edge lengths are distributed as

xAB ∼ E(x; u1)    xAC ∼ E(x; u2)    xBC ∼ E(x; u3)

where u1 = 1, u2 = 2, u3 = 3.
1. Find an expression for the length of the shortest path from A to C, denoted by L.
2. Simulate from the distribution of L.
3. Compute the probability Pr(L ≥ 10) by importance sampling. What is the variance of the weights?
4. Use a proposal of the same form but with u1 = 2, u2 = 3, u3 = 4 and repeat. What is the variance of the weights?
5. Could one adapt the proposal?
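A minimal MATLAB sketch for items 2-4, assuming E(x; u) denotes an exponential distribution with mean u (exprnd uses the mean parameterization):

% Importance sampling estimate of Pr(L >= 10), where
% L = min(x_AC, x_AB + x_BC) is the shortest path length from A to C.
u = [1 2 3];                  % target means for x_AB, x_AC, x_BC
q = [2 3 4];                  % proposal means (item 4); q = u gives weights of 1 (item 3)
N = 100000;
x = exprnd(repmat(q, N, 1));  % N x 3 draws from the proposal
% importance weights w = p(x)/q(x), computed in the log domain
logw = sum(log(repmat(q./u, N, 1)) - x.*repmat(1./u - 1./q, N, 1), 2);
w = exp(logw);
L = min(x(:,2), x(:,1) + x(:,3));
p_est = mean(w .* (L >= 10))  % estimate of Pr(L >= 10)
w_var = var(w)                % variance of the importance weights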