
An Introduction to

Graphical Models and Monte Carlo methods

A. Taylan Cemgil

Signal Processing and Communications Lab.

Birkbeck School of Economics, Mathematics and Statistics June 19, 2007

Cemgil An Introduction to Graphical Models and Monte Carlo Methods. June 19, 2007

Goals of this Tutorial

To Provide ...

• a basic understanding of underlying principles of probabilistic modeling and inference

• an introduction to Graphical models and associated concepts

• a succinct overview of (perhaps interesting) applications from engineering and computer science

– Statistical Signal Processing, Pattern Recognition
– Machine Learning, Artificial Intelligence

• an initial orientation in the broad literature of Monte Carlo methods


First Part, Basic Concepts and MCMC

• Introduction

– Bayes’ Theorem,

– Trivial toy example to clarify notation

• Graphical Models

– Bayesian Networks
– Undirected Graphical models, Markov Random Fields
– Factor graphs

• Maximum Likelihood and Bayesian Learning

• Some Applications

– (classical AI) Medical Expert systems, (Statistics) Variable selection, (Engineering-CS) Computer vision
– Time Series – terminology and applications
– Audio processing
– Non-Bayesian applications

• Probability Models

– Exponential family, Conjugacy

– Motivation for Approximate Inference

• Markov Chain Monte Carlo

– A Gaussian toy example
– The Gibbs sampler
– Sketch of Markov Chain theory
– Metropolis-Hastings, MCMC Transition Kernels
– Sketch of Convergence proofs for Metropolis-Hastings and the Gibbs sampler
– Optimisation versus Integration: Simulated annealing and iterative improvement

Second Part, Time Series Models and SMC

• Latent State-Space Models

– Hidden Markov Models (HMM)
– Kalman Filter Models
– Switching State Space models
– Changepoint models

• Inference in HMM

– Forward Backward Algorithm
– Viterbi

– Exact inference in Graphical models by message passing

• Sequential Monte Carlo

– Importance Sampling
– Particle Filtering

• Final Remarks and Bibliography

Bayes’ Theorem

Thomas Bayes (1702-1761)

“What you know about a parameter λ after the data D arrive is what you knew before about λ and what the data D told you.”¹

p(λ|D) = p(D|λ)p(λ) / p(D)

Posterior = (Likelihood × Prior) / Evidence

¹ (Jaynes 2003, ed. by Bretthorst; MacKay 2003)


An application of Bayes’ Theorem: “Source Separation”

Given two fair dice with outcomes λ and y,

D = λ + y

What is λ when D = 9 ?


An application of Bayes’ Theorem: “Source Separation”

D = λ + y = 9

D = λ + y   y = 1   y = 2   y = 3   y = 4   y = 5   y = 6
λ = 1       2       3       4       5       6       7
λ = 2       3       4       5       6       7       8
λ = 3       4       5       6       7       8       9
λ = 4       5       6       7       8       9       10
λ = 5       6       7       8       9       10      11
λ = 6       7       8       9       10      11      12

Bayes’ theorem “upgrades” p(λ) into p(λ|D).

But you have to provide an observation model: p(D|λ)

“Bureaucratic” derivation

Formally we write

p(λ) = C(λ; [1/6 1/6 1/6 1/6 1/6 1/6])
p(y) = C(y; [1/6 1/6 1/6 1/6 1/6 1/6])
p(D|λ, y) = δ(D − (λ + y))

Kronecker delta function denoting a degenerate (deterministic) distribution:

δ(x) = 1 if x = 0, 0 if x ≠ 0

p(λ, y|D) = (1/p(D)) × p(D|λ, y) × p(λ)p(y)   Posterior = (1/Evidence) × Likelihood × Prior

p(λ|D) = Σy p(λ, y|D)   Posterior Marginal

Prior

p(λ) × p(y)   y = 1   y = 2   y = 3   y = 4   y = 5   y = 6

λ = 1 1/36 1/36 1/36 1/36 1/36 1/36

λ = 2 1/36 1/36 1/36 1/36 1/36 1/36

λ = 3 1/36 1/36 1/36 1/36 1/36 1/36

λ = 4 1/36 1/36 1/36 1/36 1/36 1/36

λ = 5 1/36 1/36 1/36 1/36 1/36 1/36

λ = 6 1/36 1/36 1/36 1/36 1/36 1/36

• A table with indices λ and y

• Each cell denotes the probability p(λ, y)


Likelihood

p(D = 9|λ, y)

p(D = 9|λ, y) y = 1 y = 2 y = 3 y = 4 y = 5 y = 6

λ = 1 0 0 0 0 0 0

λ = 2 0 0 0 0 0 0

λ = 3 0 0 0 0 0 1

λ = 4 0 0 0 0 1 0

λ = 5 0 0 0 1 0 0

λ = 6 0 0 1 0 0 0

• A table with indices λ and y

• The likelihood is not a probability distribution, but a positive function.


Likelihood × Prior

φD(λ, y) = p(D = 9|λ, y)p(λ)p(y)

φD(λ, y)   y = 1   y = 2   y = 3   y = 4   y = 5   y = 6

λ = 1 0 0 0 0 0 0

λ = 2 0 0 0 0 0 0

λ = 3 0 0 0 0 0 1/36

λ = 4 0 0 0 0 1/36 0

λ = 5 0 0 0 1/36 0 0

λ = 6 0 0 1/36 0 0 0


Evidence

p(D = 9) = Σλ,y p(D = 9|λ, y)p(λ)p(y)
= 0 + 0 + · · · + 1/36 + 1/36 + 1/36 + 1/36 + 0 + · · · + 0
= 4/36 = 1/9

φD(λ, y)   y = 1   y = 2   y = 3   y = 4   y = 5   y = 6

λ = 1 0 0 0 0 0 0

λ = 2 0 0 0 0 0 0

λ = 3 0 0 0 0 0 1/36

λ = 4 0 0 0 0 1/36 0

λ = 5 0 0 0 1/36 0 0

λ = 6 0 0 1/36 0 0 0

Posterior

p(λ, y|D = 9) = (1/p(D)) p(D = 9|λ, y)p(λ)p(y)

p(λ, y|D = 9)   y = 1   y = 2   y = 3   y = 4   y = 5   y = 6

λ = 1 0 0 0 0 0 0

λ = 2 0 0 0 0 0 0

λ = 3 0 0 0 0 0 1/4

λ = 4 0 0 0 0 1/4 0

λ = 5 0 0 0 1/4 0 0

λ = 6 0 0 1/4 0 0 0

1/4 = (1/36)/(1/9)


Marginal Posterior

p(λ|D) = Σy (1/p(D)) p(D|λ, y)p(λ)p(y)

p(λ|D = 9)   |   y = 1   y = 2   y = 3   y = 4   y = 5   y = 6
λ = 1   0    |   0   0   0   0   0   0
λ = 2   0    |   0   0   0   0   0   0
λ = 3   1/4  |   0   0   0   0   0   1/4
λ = 4   1/4  |   0   0   0   0   1/4   0
λ = 5   1/4  |   0   0   0   1/4   0   0
λ = 6   1/4  |   0   0   1/4   0   0   0

(first column: the marginal p(λ|D = 9); remaining columns: p(λ, y|D = 9))

The “proportional to” notation

p(λ|D = 9) ∝ p(λ, D = 9) = Σy p(D = 9|λ, y)p(λ)p(y)

p(λ, D = 9)    |   y = 1   y = 2   y = 3   y = 4   y = 5   y = 6
λ = 1   0     |   0   0   0   0   0   0
λ = 2   0     |   0   0   0   0   0   0
λ = 3   1/36  |   0   0   0   0   0   1/36
λ = 4   1/36  |   0   0   0   0   1/36   0
λ = 5   1/36  |   0   0   0   1/36   0   0
λ = 6   1/36  |   0   0   1/36   0   0   0

(first column: the unnormalised marginal p(λ, D = 9))
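The posterior tables above can be reproduced by brute-force enumeration. A minimal Python sketch (exact arithmetic via fractions; not from the slides):

```python
from fractions import Fraction

# Enumerate the joint of two fair dice and condition on D = lam + y = 9.
prior = {k: Fraction(1, 6) for k in range(1, 7)}

# Unnormalised posterior: likelihood [D == lam + y] times prior p(lam)p(y)
phi = {(lam, y): prior[lam] * prior[y]
       for lam in range(1, 7) for y in range(1, 7)
       if lam + y == 9}

evidence = sum(phi.values())                      # p(D = 9)
posterior = {k: v / evidence for k, v in phi.items()}

# Posterior marginal p(lam | D = 9): sum over y
marginal = {}
for (lam, y), p in posterior.items():
    marginal[lam] = marginal.get(lam, Fraction(0)) + p

print(evidence)        # 1/9
for lam in sorted(marginal):
    print(lam, marginal[lam])
```

The four surviving cells each carry mass 1/4, matching the table.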

Exercise

p(x1, x2)   x2 = 1   x2 = 2
x1 = 1      0.3      0.3
x1 = 2      0.1      0.3

1. Find the following quantities

• Marginals: p(x1), p(x2)

• Conditionals: p(x1|x2), p(x2|x1)

• Posterior: p(x1, x2 = 2), p(x1|x2 = 2)

• Evidence: p(x2 = 2)

• p({})

• Max: max_{x1} p(x1|x2 = 1)

• Mode: x1∗ = arg max_{x1} p(x1|x2 = 1)

• Max-marginal: max_{x1} p(x1, x2)

2. Are x1 and x2 independent? (i.e., is p(x1, x2) = p(x1)p(x2)?)

Answers

p(x1, x2)   x2 = 1   x2 = 2
x1 = 1      0.3      0.3
x1 = 2      0.1      0.3

• Marginals:

p(x1):  x1 = 1 → 0.6,  x1 = 2 → 0.4
p(x2):  x2 = 1 → 0.4,  x2 = 2 → 0.6

• Conditionals:

p(x1|x2)   x2 = 1   x2 = 2
x1 = 1     0.75     0.5
x1 = 2     0.25     0.5

p(x2|x1)   x2 = 1   x2 = 2
x1 = 1     0.5      0.5
x1 = 2     0.25     0.75


Answers

p(x1, x2)   x2 = 1   x2 = 2
x1 = 1      0.3      0.3
x1 = 2      0.1      0.3

• Posterior:

p(x1, x2 = 2):  x1 = 1 → 0.3,  x1 = 2 → 0.3
p(x1|x2 = 2):   x1 = 1 → 0.5,  x1 = 2 → 0.5

• Evidence:

p(x2 = 2) = Σ_{x1} p(x1, x2 = 2) = 0.6

• Normalisation constant:

p({}) = Σ_{x1} Σ_{x2} p(x1, x2) = 1

Answers

p(x1, x2)   x2 = 1   x2 = 2
x1 = 1      0.3      0.3
x1 = 2      0.1      0.3

• Max (get the value): max_{x1} p(x1|x2 = 1) = 0.75

• Mode (get the index): arg max_{x1} p(x1|x2 = 1) = 1

• Max-marginal (get the “skyline”):

max_{x1} p(x1, x2):  x2 = 1 → 0.3,  x2 = 2 → 0.3
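All of the quantities in the exercise can be checked mechanically from the joint table. A small Python sketch using exact fractions:

```python
from fractions import Fraction

# The joint p(x1, x2) from the exercise, with exact arithmetic.
joint = {(1, 1): Fraction(3, 10), (1, 2): Fraction(3, 10),
         (2, 1): Fraction(1, 10), (2, 2): Fraction(3, 10)}

# Marginals: sum out the other variable
p_x1 = {a: sum(p for (i, _), p in joint.items() if i == a) for a in (1, 2)}
p_x2 = {b: sum(p for (_, j), p in joint.items() if j == b) for b in (1, 2)}

# Conditional p(x1 | x2 = 1): renormalise the x2 = 1 column
p_x1_given = {a: joint[(a, 1)] / p_x2[1] for a in (1, 2)}

max_val = max(p_x1_given.values())             # the value, 3/4
mode = max(p_x1_given, key=p_x1_given.get)     # the index, 1
max_marginal = {b: max(joint[(a, b)] for a in (1, 2)) for b in (1, 2)}

print(p_x1, p_x2, max_val, mode, max_marginal)
```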

Another application of Bayes’ Theorem: “Model Selection”

Given an unknown number of fair dice with outcomes λ1, λ2, . . . , λn,

D = Σ_{i=1}^{n} λi

How many dice are there when D = 9? Assume that any number n is equally likely.

Another application of Bayes’ Theorem: “Model Selection”

Given all n are equally likely (i.e., p(n) is flat), we calculate (formally)

p(n|D = 9) = p(D = 9|n) p(n) / p(D) ∝ p(D = 9|n)

p(D|n = 1) = Σ_{λ1} p(D|λ1) p(λ1)

p(D|n = 2) = Σ_{λ1} Σ_{λ2} p(D|λ1, λ2) p(λ1) p(λ2)

. . .

p(D|n) = Σ_{λ1, ..., λn} p(D|λ1, . . . , λn) Π_{i=1}^{n} p(λi)


p(D|n) = Σ_λ p(D|λ, n) p(λ|n)

[Figure: p(D|n) for n = 1, . . . , 5, plotted over D = 1, . . . , 20]

Another application of Bayes’ Theorem: “Model Selection”

[Figure: posterior p(n|D = 9) versus n = Number of Dice]

• Complex models are more flexible but they spread their probability mass

• Bayesian inference inherently prefers “simpler models” – Occam’s razor

• Computational burden: We need to sum over all parameters λ
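The model-selection posterior p(n|D = 9) can be computed by convolving the single-die distribution n times. A Python sketch (the range of n considered here is an illustrative choice):

```python
from fractions import Fraction

def die_sum_dist(n):
    """Distribution of the sum of n fair dice, by repeated convolution."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for s, p in dist.items():
            for face in range(1, 7):
                new[s + face] = new.get(s + face, Fraction(0)) + p / 6
        dist = new
    return dist

# Flat prior over n, so p(n | D = 9) is proportional to p(D = 9 | n)
lik = {n: die_sum_dist(n).get(9, Fraction(0)) for n in range(1, 10)}
Z = sum(lik.values())
post = {n: p / Z for n, p in lik.items()}
print({n: float(p) for n, p in post.items()})
```

Note that p(D = 9|n = 1) = 0, since a single die cannot show 9.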

Probabilistic Inference

A huge spectrum of applications – all boil down to computation of

• expectations of functions under probability distributions: Integration

⟨f(x)⟩ = ∫_X dx p(x) f(x)      ⟨f(x)⟩ = Σ_{x∈X} p(x) f(x)

• modes of functions under probability distributions: Optimization

x∗ = argmax_{x∈X} p(x) f(x)

• any “mix” of the above: e.g.,

x∗ = argmax_{x∈X} p(x) = argmax_{x∈X} ∫ dz p(z) p(x|z)

Graphical Models

“By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and in effect increases the mental power of the race.” A.N. Whitehead


Graphical Models

• formal languages for specification of probability distributions and associated inference algorithms

• historically, introduced in probabilistic expert systems (Pearl 1988) as a visual guide for representing expert knowledge

• today, a standard tool in machine learning, statistics and signal processing


Graphical Models

• provide graph based algorithms for derivations and computation

• pedagogical insight/motivation for model/algorithm construction

– Statistics:

“Kalman filter models and hidden Markov models (HMM) are equivalent up to parametrisation”

– Signal processing:

“Fast Fourier transform is an instance of sum-product algorithm on a factor graph”

– Computer Science:

“Backtracking in Prolog is equivalent to inference in Bayesian networks with deterministic tables”

• Automated tools for code generation start to emerge, making the design/implement/test cycle shorter


Important types of Graphical Models

• Useful for Model Construction

– Directed Acyclic Graphs (DAG), Bayesian Networks
– Undirected Graphs, Markov Networks, Random Fields
– Influence diagrams
– ...

• Useful for Inference

– Factor Graphs
– Junction/Clique graphs
– Region graphs
– ...

Directed Acyclic Graphical (DAG) Models and Factor Graphs

Directed Graphical models

• Each random variable is associated with a node in the graph,

• We draw an arrow from A → B if p(B| . . . , A, . . . ) (A ∈ parent(B)),

• The edges tell us qualitatively about the factorization of the joint probability

• For N random variables x1, . . . , xN, the distribution admits

p(x1, . . . , xN) = Π_{i=1}^{N} p(xi|parent(xi))

• Describes in a compact way an algorithm to “generate” the data – “Generative models”

DAG Example: Two dice

p(λ) p(y)

λ y

D p(D|λ, y)

p(D, λ, y) = p(D|λ, y)p(λ)p(y)


DAG with observations

p(λ) p(y)

λ y

D p(D = 9|λ, y)

φD(λ, y) = p(D = 9|λ, y)p(λ)p(y)

Examples

Model        Factorization
Full         p(x1)p(x2|x1)p(x3|x1, x2)p(x4|x1, x2, x3)
Markov(2)    p(x1)p(x2|x1)p(x3|x1, x2)p(x4|x2, x3)
Markov(1)    p(x1)p(x2|x1)p(x3|x2)p(x4|x3)
             p(x1)p(x2|x1)p(x3|x1)p(x4)
Factorized   p(x1)p(x2)p(x3)p(x4)

(the graph diagrams were lost in extraction)

Removing edges eliminates a term from the conditional probability factors.

Undirected Graphical Models

• Define a distribution by local compatibility functions φ(xα)

p(x) = (1/Z) Π_α φ(xα)

where α runs over cliques: fully connected subsets

• Markov Random Fields

Undirected Graphical Models

• Examples (two undirected graphs on x1, x2, x3, x4; diagrams lost in extraction)

p(x) = (1/Z) φ(x1, x2)φ(x1, x3)φ(x2, x4)φ(x3, x4)

p(x) = (1/Z) φ(x1, x2, x3)φ(x2, x3, x4)

Factor graphs

(Kschischang et al.)

• A bipartite graph. A powerful graphical representation of the inference problem

– Factor nodes: black squares. Factor potentials (local functions) defining the posterior.
– Variable nodes: white nodes. Define collections of random variables.
– Edges: denote membership. A variable node is connected to a factor node if a member variable is an argument of the local function.

[Factor graph of the two-dice model: factors p(λ), p(y) and p(D = 9|λ, y) attached to variables λ and y]

φD(λ, y) = p(D = 9|λ, y)p(λ)p(y) = φ1(λ, y)φ2(λ)φ3(y)

Exercise

• For the following Graphical models, write down the factors of the joint distribution and plot an equivalent factor graph.

[Diagrams lost in extraction: Full, Markov(1), HMM, MIX, IFA, Factorized]

Answer (Markov(1))

Chain: x1 → x2 → x3 → x4

Factor graph: factors p(x1), p(x2|x1), p(x3|x2), p(x4|x3) connected to x1, x2, x3, x4

Grouping factors pairwise gives

φ(x1, x2) = p(x1)p(x2|x1),   φ(x2, x3) = p(x3|x2),   φ(x3, x4) = p(x4|x3)

Answer (IFA – Factorial)

[Diagram lost in extraction]

p(h1)p(h2) Π_{i=1}^{4} p(xi|h1, h2)

Factor graph: hidden nodes h1, h2 and observations x1, x2, x3, x4

Answer (IFA – Factorial)

h1 h2

x1 x2 x3 x4

• We can also cluster nodes together

h1, h2

x1 x2 x3 x4

Inference and Learning

• Data set

D = {x1, . . . , xN}

• Model with parameter λ

p(D|λ)

• Maximum Likelihood (ML)

λ_ML = arg max_λ log p(D|λ)

• Predictive distribution

p(xN+1|D) ≈ p(xN+1|λ_ML)

Regularisation

• Prior

p(λ)

• Maximum a-posteriori (MAP): Regularised Maximum Likelihood

λ_MAP = arg max_λ log p(D|λ)p(λ)

• Predictive distribution

p(xN+1|D) ≈ p(xN+1|λ_MAP)

Bayesian Learning

• We treat parameters on the same footing as all other variables

• We integrate over unknown parameters rather than using point estimates (remember the many-dice example)

– Avoids overfitting

– Natural setup for online adaptation – Model selection


Bayesian Learning

• Predictive distribution

p(xN+1|D) = ∫ dλ p(xN+1|λ) p(λ|D)

[DAG: λ is the parent of x1, x2, . . . , xN, xN+1]

• Bayesian learning is just inference ...

Some Applications

Medical Expert Systems

[DAG over nodes A, S (causes), T, L, B, E (diseases), X, D (symptoms)]

Medical Expert Systems

Visit to Asia?   Smoking?
Tuberculosis?   Lung Cancer?   Bronchitis?
Either T or L?
Positive X Ray?   Dyspnoea?

Medical Expert Systems

(marginal probabilities, no evidence entered)

Node              0        1
Visit to Asia?    99%      1%
Smoking?          50%      50%
Tuberculosis?     99%      1%
Lung Cancer?      94.5%    5.5%
Bronchitis?       55%      45%
Either T or L?    93.5%    6.5%
Positive X Ray?   89%      11%
Dyspnoea?         56.4%    43.6%

Medical Expert Systems

(after observing a positive X-ray)

Node              0        1
Visit to Asia?    98.7%    1.3%
Smoking?          31.2%    68.8%
Tuberculosis?     90.8%    9.2%
Lung Cancer?      51.1%    48.9%
Bronchitis?       49.4%    50.6%
Either T or L?    42.4%    57.6%
Positive X Ray?   0%       100%
Dyspnoea?         35.9%    64.1%

Medical Expert Systems

(after observing a positive X-ray in a non-smoker)

Node              0        1
Visit to Asia?    98.5%    1.5%
Smoking?          100%     0%
Tuberculosis?     85.2%    14.8%
Lung Cancer?      85.8%    14.2%
Bronchitis?       70%      30%
Either T or L?    71.1%    28.9%
Positive X Ray?   0%       100%
Dyspnoea?         56%      44%

Model Selection: Variable selection in Polynomial Regression

• Given D = {tj, x(tj)}_{j=1...J}, what is the order N of the polynomial?

x(t) = Σ_{i=0}^{N} s_{i+1} t^i + ε(t)

[Figure: noisy data points and candidate polynomial fits]

Bayesian Variable Selection

[Graphical model:]

ri ∼ C(ri; π),  i = 1, . . . , W
si | ri ∼ N(si; μ(ri), Σ(ri))
x | s1:W ∼ N(x; C s1:W, R)

• Generalized Linear Model – columns of C are the basis vectors

• The exact posterior is a mixture of 2^W Gaussians

• When W is large, computation of posterior features becomes intractable.

Regression

t = [t1 t2 . . . tJ]

C ≡ [t^0 t^1 . . . t^(W−1)]

>> C = fliplr(vander(0:4)) % Vandermonde matrix

1 0 0 0 0
1 1 1 1 1
1 2 4 8 16
1 3 9 27 81
1 4 16 64 256

ri ∼ C(ri; [0.5, 0.5]),  ri ∈ {on, off}
si | ri ∼ N(si; 0, Σ(ri)),  x | s1:W ∼ N(x; C s1:W, R)
Σ(ri = on) ≫ Σ(ri = off)
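The same basis matrix can be built in NumPy; np.vander with increasing=True mirrors the MATLAB fliplr(vander(...)) call:

```python
import numpy as np

# Polynomial design matrix: column i of C holds t**i, the monomial basis
# evaluated at the points t (same matrix as the MATLAB snippet above).
t = np.arange(5)
C = np.vander(t, N=5, increasing=True)
print(C)
```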

Regression

To find the “active” basis functions we need to calculate

r∗_1:W ≡ argmax_{r1:W} p(r1:W|x) = argmax_{r1:W} ∫ ds1:W p(x|s1:W) p(s1:W|r1:W) p(r1:W)

Then, the reconstruction is given by

x̂(t) = ⟨ Σ_{i=0}^{W−1} s_{i+1} t^i ⟩_{p(s1:W|x, r1:W)} = Σ_{i=0}^{W−1} ⟨s_{i+1}⟩_{p(s_{i+1}|x, r1:W)} t^i

Regression

[Figure: log p(x, r1:W) for all 2^W on/off configurations, ordered from “all on” to “all off”]

[Figure: data points, true polynomial, and the approximation]

Clustering

Clustering

[Graphical model:]

π — label probability
c1, c2, . . . , cN — labels ∈ {a, b}
x1, x2, . . . , xN — data points
μa, μb — cluster centers

(μ∗a, μ∗b, π∗) = argmax_{μa, μb, π} Σ_{c1:N} Π_{i=1}^{N} p(xi|μa, μb, ci) p(ci|π)
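For a tiny data set the sum over all label configurations c1:N can be done exactly. A brute-force Python sketch (the 1-D data, unit observation variance, fixed π = 0.5 and the search grid are all illustrative assumptions, not from the slides):

```python
import itertools, math

# Tiny 1-D data set; two clusters a, b with unit-variance Gaussian likelihoods.
xs = [0.0, 0.1, 4.0, 4.1]

def marginal_lik(mu_a, mu_b):
    # sum over all 2^N label vectors of prod_i p(x_i | mu_{c_i}) p(c_i | pi)
    total = 0.0
    for labels in itertools.product("ab", repeat=len(xs)):
        term = 1.0
        for x, c in zip(xs, labels):
            mu = mu_a if c == "a" else mu_b
            term *= 0.5 * math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)
        total += term
    return total

# Coarse grid search over the two cluster centers
grid = [i * 0.5 for i in range(-2, 12)]
best = max(((a, b) for a in grid for b in grid),
           key=lambda ab: marginal_lik(*ab))
print(best)
```

The maximiser lands near the two cluster means, 0.05 and 4.05.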

Computer vision / Cognitive Science

How many rectangles are there in this image?

[Image: a grid of coloured rectangles, roughly 60 × 40 pixels]

Computer vision / Cognitive Science

[Graphical model:]

π1, π2, . . . , πN — label probabilities
c1, c2, . . . , cN — labels ∈ {a, b, . . .}
x1, x2, . . . , xN — pixel values
μa, μb, . . . — rectangle colors

[Image: the rectangle image again]

Computer Vision

How many people are there in these images?


Visual Tracking

[Figure: four video frames with tracked subjects]

Navigation, Robotics

[Figures: estimated trajectories and landmark positions (axes f, Lx, Ly)]

Navigation, Robotics

GPS?t — GPS status
Gt — GPS reading
. . . — other sensors (magnetic, pressure, etc.)
lt — linear acceleration sensor
ωt — gyroscope
Et−1, Et — attitude variables
Xt−1, Xt — linear kinematic variables
{·1:Nt}t — set of feature points (camera frame)
{x1:Mt}t — set of feature points (world coordinates)
ρ(x) — global static map (intensity function)

Time series models and Inference, Terminology

Generic structure of dynamical system models

x0 x1 . . . xk−1 xk . . . xK   (latent states)
y1 . . . yk−1 yk . . . yK      (observations)

xk ∼ p(xk|xk−1)   Transition Model
yk ∼ p(yk|xk)     Observation Model

• x are the latent states

• y are the observations

• In a full Bayesian setting, x includes unknown model parameters

Online Inference, Terminology

• Filtering: p(xk|y1:k)

– Distribution of current state given all past information
– Realtime/Online/Sequential Processing

x0 x1 . . . xk−1 xk . . . xK
y1 . . . yk−1 yk . . . yK
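The filtering recursion for a discrete HMM is a matrix-vector update: predict with the transition model, weight by the observation likelihood, normalise. A Python sketch (the 2-state transition and observation tables are hypothetical):

```python
import numpy as np

# alpha_k(x) ∝ p(y_k | x) * sum_{x'} p(x | x') alpha_{k-1}(x')
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])       # A[i, j] = p(x_k = j | x_{k-1} = i)
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])       # B[i, y] = p(y_k = y | x_k = i)
p0 = np.array([0.5, 0.5])        # prior over the initial state

def filter_hmm(ys):
    alpha = p0.copy()
    for y in ys:
        alpha = B[:, y] * (A.T @ alpha)   # predict, then weight by likelihood
        alpha /= alpha.sum()              # normalise -> p(x_k | y_{1:k})
    return alpha

print(filter_hmm([0, 0, 1]))
```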

Online Inference, Terminology

• Prediction: p(yk:K, xk:K|y1:k−1)

– evaluation of possible future outcomes; like filtering without observations

x0 x1 . . . xk−1 xk . . . xK
y1 . . . yk−1 yk . . . yK

• Tracking, Restoration

Offline Inference, Terminology

• Smoothing: p(x0:K|y1:K) – better estimate of past states, essential for learning

• Most likely trajectory – Viterbi path: arg max_{x0:K} p(x0:K|y1:K)

x0 x1 . . . xk−1 xk . . . xK
y1 . . . yk−1 yk . . . yK

• Interpolation: p(yk, xk|y1:k−1, yk+1:K) – fill in lost observations given past and future

x0 x1 . . . xk−1 xk . . . xK
y1 . . . yk−1 yk . . . yK

Time Series Analysis

• Stationary

[Figure: a stationary time series, k = 0 . . . 100]

– What is the true state of the process given noisy data?
– Parameters?
– Markovian? Order?

Time Series Analysis

• Nonstationary, time varying variance – stochastic volatility

[Figure: volatility vk and observations yk over k = 0 . . . 1000; true values versus VB estimates]

Time Series Analysis

• Nonstationary, time varying intensity – nonhomogeneous Poisson Process

[Figure: intensity λk and changepoint indicators ck versus arrival time; true values versus VB estimates]

Time Series Analysis

• Piecewise constant

[Figure: a piecewise constant series with noise, k = 0 . . . 100]

Time Series Analysis

• Piecewise linear

[Figure: a piecewise linear series with noise, k = 0 . . . 200]

• Segmentation and Changepoint detection

– What is the true state of the process given noisy data?
– Where are the changepoints?
– How many changepoints?

Audio Processing

[Figure: waveforms xt of a speech signal and a piano signal]

x = [x1 . . . xt . . .]

Audio Restoration

• During download or transmission, some samples of audio are lost

• Estimate missing samples given clean ones

[Figure: a waveform with missing segments, samples 0 . . . 500]

Examples: Audio Restoration

p(x¬κ|xκ) ∝ ∫ dH p(x¬κ|H) p(xκ|H) p(H),   H ≡ (parameters, hidden states)

H → x¬κ (missing), xκ (observed)

[Figure: the restored waveform]

Probabilistic Phase Vocoder

(Cemgil and Godsill 2005)

[Graphical model: per-band chains sν0 · · · sνK−1 with parameters Aν, Qν, for ν = 0 . . . W − 1, generating observations x0 . . . xK−1]

sνk ∼ N(sνk; Aν sνk−1, Qν)

Aν ∼ N(Aν; [cos(ων) −sin(ων); sin(ων) cos(ων)], Ψ)

Restoration

• Piano

– Signal with missing samples (37%)
– Reconstruction, 7.68 dB improvement
– Original

• Trumpet

– Signal with missing samples (37%)
– Reconstruction, 7.10 dB improvement
– Original

Pitch Tracking

Monophonic Pitch Tracking = Online estimation (filtering) of p(rt, mt|y1:t).

[Figure: audio waveform and the estimated pitch trajectory]

Pitch Tracking

[Graphical model: chains r0 . . . rT, m0 . . . mT, s0 . . . sT with observations y1 . . . yT]

Monophonic transcription

• Detecting onsets, offsets and pitch (Cemgil et al. 2006, IEEE TSALP)

[Figure: exact inference results on the signal]

Tracking Pitch Variations

• Allow m to change with k.

[Figure: tracked pitch variations]

• Intractable, need to resort to approximate inference (Mixture Kalman Filter – Rao-Blackwellized Particle Filter)

Source Separation

[Graphical model: sources sk,1 . . . sk,N generate observations xk,1 . . . xk,M for k = 1 . . . K, with channel parameters a1, r1, . . . , aM, rM]

• Joint estimation of sources, channel noise and mixing system

xk,1:M ∼ N(xk,1:M; A sk,1:N, R)

Spectrogram

[Spectrograms (f/Hz versus t/sec): Speech and Piano]

• A linear expansion using a collection of basis functions φ(t; τ, ω) centered around time τ and frequency ω

xt = Σ_{τ,ω} α(τ, ω) φ(t; τ, ω)

• Spectrogram displays log |α(τ, ω)|² or |α(τ, ω)|²

Source Separation

[Spectrograms: Speech, Piano, Guitar and their Mix]

Reconstructions

[Spectrograms of the separated sources: Speech, Piano, Guitar]

Polyphonic Music Transcription

• from sound ...

[Spectrogram of the polyphonic recording, 0–8 sec]

• ... to score

Generative Models for Music

Score → Expression → Piano-Roll → Signal


Hierarchical Modeling of Music

[Figure: hierarchical graphical model connecting score, expression, piano-roll and signal layers]

A few non-Bayesian applications where Monte Carlo is useful

Combinatorics

• Counting

Example: What is the probability that a solitaire laid out with 52 cards comes out successfully, given all permutations have equal probability?

|A| = Σ_{x∈X} [x ∈ A],   [x ∈ A] ≡ 1 if x ∈ A, 0 if x ∉ A

p(x ∈ A) = |A| / |X| = ?,   |X| = 52! ≈ 2^225

Geometry

• Given a simplex S in N-dimensional space by S = {x : Ax ≤ b, x ∈ R^N}, find the volume |S|
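A hit-or-miss Monte Carlo sketch for the volume problem, using the standard simplex (whose true volume 1/N! is known) as a sanity check:

```python
import random

# Estimate the volume of S = {x : x_i >= 0, sum x_i <= 1} in R^N by
# drawing uniform points in the unit cube and counting the hits in S.
N, M = 3, 200_000
random.seed(0)
hits = sum(1 for _ in range(M)
           if sum(random.random() for _ in range(N)) <= 1.0)
vol = hits / M          # should be close to 1/3! = 1/6
print(vol)
```

The same hit-or-miss idea applies to a general {x : Ax ≤ b} once it is enclosed in a box.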

Rare Events

• Given a graph with random edge lengths xi ∼ p(xi), find the probability that the shortest path from A to B is larger than γ.

[Graph: nodes A and B connected through edges x1, x2, x3, x4, x5]

Rare Events

x1, x2, x3, x4, x5 — edge lengths
L = ShortestPath(A, B)

Pr(L ≥ γ) = ∫ dx1:5 [L(x1:5) ≥ γ] p(x1:5)

Rare Events

[Graph with mean edge lengths ⟨x1⟩ = 4, ⟨x2⟩ = 1, ⟨x3⟩ = 1, ⟨x4⟩ = 1, ⟨x5⟩ = 4]

xi ∼ E(xi; ui) ≡ (1/ui) exp(−xi/ui),   ui = ⟨xi⟩ ≡ ∫ xi p(xi) dxi

[Histogram: Monte Carlo counts of ShortestPath(A, B)]
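A plain Monte Carlo estimate of the rare-event probability; the 5-edge bridge topology and the threshold γ below are assumptions for illustration (the slide's exact graph is not recoverable from the extraction):

```python
import random

# Edge i is Exponential with mean u_i, as on the slide.
u = [4.0, 1.0, 1.0, 1.0, 4.0]
gamma = 6.0                     # illustrative threshold
random.seed(1)

def shortest(x):
    # Assumed bridge network: paths A-1-4-B, A-2-5-B, A-1-3-5-B, A-2-3-4-B
    x1, x2, x3, x4, x5 = x
    return min(x1 + x4, x2 + x5, x1 + x3 + x5, x2 + x3 + x4)

M = 100_000
hits = sum(1 for _ in range(M)
           if shortest([random.expovariate(1.0 / ui) for ui in u]) >= gamma)
print(hits / M)
```

For genuinely rare events (large γ) plain sampling wastes almost all samples; importance sampling over the edge distributions is the standard remedy.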

Probability Models


Example: AR(1) model

[Figure: a sample path of the process, k = 0 . . . 100]

xk = A xk−1 + εk,   k = 1 . . . K

εk is i.i.d., zero mean and normal with variance R.

Estimation problem:

Given x0, . . . , xK, determine coefficient A and variance R (both scalars).
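The estimation problem has a closed-form ML solution: least squares for A, residual variance for R. A Python sketch on simulated data:

```python
import random

# Simulate x_k = A x_{k-1} + eps_k and recover A, R.
A_true, R_true, K = 0.8, 0.25, 5000
random.seed(0)
x = [0.0]
for _ in range(K):
    x.append(A_true * x[-1] + random.gauss(0.0, R_true ** 0.5))

prev, curr = x[:-1], x[1:]
# ML estimates: A_hat = sum x_{k-1} x_k / sum x_{k-1}^2, R_hat = mean residual^2
A_hat = sum(p * c for p, c in zip(prev, curr)) / sum(p * p for p in prev)
R_hat = sum((c - A_hat * p) ** 2 for p, c in zip(prev, curr)) / K
print(A_hat, R_hat)
```

The Bayesian treatment on the next slide replaces these point estimates with a posterior over (A, R).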

AR(1) model, Generative Model notation

A ∼ N(A; 0, P)
R ∼ IG(R; ν, β/ν)
xk | xk−1, A, R ∼ N(xk; A xk−1, R),   x0 = x̂0

[DAG: A and R are parents of every xk in the chain x0 → x1 → . . . → xK]

Observed variables are shown with double circles

Example, Univariate Gaussian

The Gaussian distribution with mean m and variance S has the form

N(x; m, S) = (2πS)^(−1/2) exp{−(x − m)²/(2S)}
= exp{−(x² + m² − 2xm)/(2S) − (1/2) log(2πS)}
= exp{(m/S) x − (1/(2S)) x² − ((1/2) log(2πS) + m²/(2S))}
= exp{θᵀψ(x) − c(θ)},   θ = [m/S, −1/(2S)],   ψ(x) = [x, x²]

Hence by matching coefficients we have

exp{−(1/2) K x² + h x + g}   ⇔   S = K^(−1),   m = K^(−1) h

The Multivariate Gaussian Distribution

µ is the mean and P is the covariance:

N(s; µ, P) = |2πP|^(−1/2) exp{−(1/2)(s − µ)ᵀP^(−1)(s − µ)}
= exp{−(1/2) sᵀP^(−1)s + µᵀP^(−1)s − (1/2) µᵀP^(−1)µ − (1/2) log |2πP|}

log N(s; µ, P) = −(1/2) sᵀP^(−1)s + µᵀP^(−1)s + const
= −(1/2) Tr P^(−1) s sᵀ + µᵀP^(−1)s + const
=+ −(1/2) Tr P^(−1) s sᵀ + µᵀP^(−1)s

Notation: log f(x) =+ g(x) ⇔ f(x) ∝ exp(g(x)) ⇔ ∃c ∈ R : f(x) = c exp(g(x))

log p(s) =+ −(1/2) Tr K s sᵀ + hᵀs  ⇒  p(s) = N(s; K^(−1)h, K^(−1))

Example, Inverse Gamma

The inverse Gamma distribution with shape a and scale b:

IG(r; a, b) = (1/(Γ(a) b^a)) r^(−(a+1)) exp(−1/(br))
= exp{−(a + 1) log r − (1/b)(1/r) − log Γ(a) − a log b}

Hence by matching coefficients, we have

exp{α log r + β (1/r) + c}   ⇔   a = −α − 1,   b = −1/β

Example, Inverse Gamma

[Figure: inverse Gamma densities for (a = 1, b = 1), (a = 1, b = 0.5), (a = 2, b = 1)]

Basic Distributions : Exponential Family

• The following distributions are often used as elementary building blocks:

– Gaussian
– Gamma, Inverse Gamma (Exponential, Chi-square, Wishart)
– Dirichlet
– Discrete (Categorical), Bernoulli, Multinomial

• All of these distributions can be written as

p(x|θ) = exp{θᵀψ(x) − c(θ)}

c(θ) = log ∫_X dx exp(θᵀψ(x))   log-partition function
θ   canonical parameters
ψ(x)   sufficient statistics

Conjugate priors: Posterior is in the same family as the prior.

Example: posterior inference for the variance R of a zero mean Gaussian.

p(x|R) = N(x; 0, R),   p(R) = IG(R; a, b)

p(R|x) ∝ p(R) p(x|R)
∝ exp{−(a + 1) log R − (1/b)(1/R)} exp{−(x²/2)(1/R) − (1/2) log R}
= exp{−(a + 1 + 1/2) log R − (1/b + x²/2)(1/R)}
∝ IG(R; a + 1/2, 2/(x² + 2/b))

Like the prior, this is an inverse-Gamma distribution.
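The conjugate update reduces to two lines of code. A Python sketch in the slide's scale convention:

```python
from math import fsum

# Conjugate update for the variance R of a zero-mean Gaussian with an
# inverse-Gamma prior IG(R; a, b): the posterior is
# IG(R; a + N/2, 2 / (sum x_i^2 + 2/b)).
def ig_posterior(a, b, xs):
    n = len(xs)
    ss = fsum(x * x for x in xs)   # the additive sufficient statistic
    return a + n / 2, 2.0 / (ss + 2.0 / b)

print(ig_posterior(1.0, 1.0, [2.0]))   # single observation: (1.5, 1/3)
```

Because the sufficient statistic is just a running sum, observations can be absorbed one at a time or all at once with the same result.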

Conjugate priors: Posterior is in the same family as the prior.

Example: posterior inference of variance R from x1, . . . , xN.

[DAG: R is the parent of x1, x2, . . . , xN, xN+1]

p(R|x1:N) ∝ p(R) Π_{i=1}^{N} p(xi|R)
∝ exp{−(a + 1) log R − (1/b)(1/R)} exp{−(1/2) Σi xi² (1/R) − (N/2) log R}
= exp{−(a + 1 + N/2) log R − (1/b + (1/2) Σi xi²)(1/R)}
∝ IG(R; a + N/2, 2/(Σi xi² + 2/b))

Sufficient statistics are additive.

[Figure: posterior inverse Gamma density for Σi xi² = 10, N = 10]

[Figure: posterior inverse Gamma density for Σi xi² = 100, N = 100]
