The Probability Hypothesis Density Filter

(1)

A. Taylan Cemgil

Signal Processing and Communications Lab.

7 December 2007 NIPS 07 Workshop

Approximate Bayesian Inference in Continuous/Hybrid Systems

(2)

(3)

Sinusoidal Tracking

with Clark, Peeling and Godsill

[Figure: "Tracking Results". Axis labels: Time, Frequency (Hz), Amplitude.]

(4)

Outline

• Introduction to Multi Object Tracking

• The Probability Hypothesis Density Filter

– A toy model

– A short summary of point process theory

• Summary

(5)

Stochastic Dynamical System

• Generic Model

$x_k \sim p(x_k \mid x_{k-1})$ (Transition Model)

$y_k \sim p(y_k \mid x_k)$ (Observation Model)

• Examples: Hidden Markov Model, Kalman filter

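[Graphical model: a Markov chain of latent states $x_0, x_1, \dots, x_K$, with an observation $y_k$ attached to each $x_k$, $k = 1, \dots, K$.]

• Observations $y_k \in \mathcal{Y}$ are projections of the latent states $x_k \in \mathcal{X}$

• Exact inference is possible only in a few cases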
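A minimal sketch of drawing a realisation from the generic model, assuming a scalar linear-Gaussian instance. The coefficient `a`, the noise variances `q` and `r`, and the standard normal prior on $x_0$ are illustrative assumptions, not values from the talk.

```python
import random

def simulate_ssm(K, a=0.95, q=0.1, r=0.5, seed=0):
    """Draw x_0..x_K and y_1..y_K from a scalar linear-Gaussian model.

    a, q, r and the N(0, 1) prior on x_0 are assumed for illustration.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]                    # x_0 from the assumed prior
    y = []
    for _ in range(K):
        x_k = rng.gauss(a * x[-1], q ** 0.5)     # x_k ~ p(x_k | x_{k-1})
        y_k = rng.gauss(x_k, r ** 0.5)           # y_k ~ p(y_k | x_k)
        x.append(x_k)
        y.append(y_k)
    return x, y

states, observations = simulate_ssm(K=100)
```

Swapping the two `rng.gauss` calls for other samplers gives other members of the same model family, such as a discrete-state hidden Markov model.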

(6)

Tracking - Filtering

[Figure: two image panels, state against time and observation against time, shown with the chain graphical model of latent states $x_0, \dots, x_K$ and their observations.]

(7)

Tracking - Kalman Filtering

[Figure: the predictive density $p(y_k \mid y_{1:k-1})$. Axis labels: Phase, Period.]

• Propagate exact sufficient statistics of the filtering density
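For a linear-Gaussian model the sufficient statistics are the mean and variance of the filtering density. The sketch below shows one predict and update step for a scalar model $x_k = a x_{k-1} + \text{noise}$, $y_k = x_k + \text{noise}$. This model and its parameters are assumptions made for illustration and are not the phase/period model in the figure.

```python
def kalman_step(mean, var, y, a=0.95, q=0.1, r=0.5):
    """One scalar Kalman filter step.

    (mean, var) describe p(x_{k-1} | y_{1:k-1}); the return value describes
    p(x_k | y_{1:k}). a, q (state noise variance) and r (observation noise
    variance) are assumed model parameters.
    """
    # Predict: push the Gaussian through the transition model.
    m_pred = a * mean
    v_pred = a * a * var + q
    # Update: condition on the new observation y.
    gain = v_pred / (v_pred + r)
    m_new = m_pred + gain * (y - m_pred)
    v_new = (1.0 - gain) * v_pred
    return m_new, v_new
```

With `a=1`, `q=0`, `r=1` and a unit-variance prior at zero, an observation `y=2` moves the mean halfway to the observation and halves the variance.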

(8)

A harder scenario – clutter and missing detections

[Figure: two image panels, state against time and observation against time.]

• At each time $k$ we receive $n_k$ observations, at most one of which comes from the true target

(9)

A harder scenario – clutter and missing detections

[Graphical model: a latent chain $x_0, \dots, x_K$; at each time $k$ there are $N$ observations $y_k^1, \dots, y_k^N$; a chain of switch variables $a_0, \dots, a_K$.]

• The latent discrete switch variables a_k denote which of the observations is the true one

(10)

Inference in Switching State Space models

• Unlike in HMMs or KFMs, summing over the indicators $a_k$ does not simplify the filtering density.

• The number of Gaussian kernels needed to represent the exact filtering density grows exponentially with the number of time steps


(11)

Approximate Inference

• Sequential Monte Carlo (Particle Filtering)

– Sample branches with probability proportional to the evidence, as in the mixture Kalman filter (Chen and Liu 2001)

• Deterministic Approximations

– Assumed density filter (ADF): project the filtering density onto a tractable family by moment matching

• See “Bayesian inference in dynamic models – an overview” by Tom Minka

(12)

An even harder scenario

[Figure: two image panels, state against time and observation against time.]

• At each time slice, a chain can cease to exist, or a new chain is born

• Clutter and misdetections

(13)

An even harder scenario

[Graphical model: $M$ latent chains $x_0^i, \dots, x_K^i$, $i = 1, \dots, M$; at each time $k$ there are $N$ observations $y_k^1, \dots, y_k^N$; a chain of switch variables $a_0, \dots, a_K$.]

• Factorial dynamic model with changing number of latent chains (not modeled here)

• Association problem: combinatorial explosion in the number of switch variables

• Multi-hypothesis tracker (MHT), Joint Probabilistic Data Association filter (JPDA), Harmonic analysis on finite groups (Kondor, Howard, Jebara (2007))

(14)

[Figure: image panel of observations against time.]

p_sur = 0.98; % Survival probability
b = 0.01; % Birth intensity
p_det = 0.5; % Detection probability
c = 1/30; % Clutter intensity
lam0 = 1; % Prior intensity

(15)

[Figure: image panel, state against time.]

(16)

[Figure: image panel, state against time.]

(17)

A simplified Model

• Suppose we just want to track the number of objects $s_k$

[Graphical model: $s_{k-1} \to \bar{s}_k \to s_k \to \hat{s}_k \to m_k$, with the steps labelled Survive, Birth, Detect and Clutter; $s_k$ also feeds $\bar{s}_{k+1}$.]

• We want to design a filter that tracks the number of objects, i.e. one that computes the filtering density $p(s_k \mid y_{1:k})$ sequentially

(18)

Notation

• Poisson distribution

$PO(s; \lambda) = \frac{e^{-\lambda} \lambda^{s}}{s!}$

• Binomial distribution – Number of successful outcomes in $n$ independent trials with success probability $\pi$

$BI(s; n, \pi) = \binom{n}{s} \pi^{s} (1 - \pi)^{n-s}$

(19)

Basic Model

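[Graphical model: $s_{k-1} \to \bar{s}_k \to s_k \to \hat{s}_k \to m_k$, with the steps labelled Survive, Birth, Detect and Clutter.]

Survive: $\bar{s}_k \mid s_{k-1} \sim BI(\bar{s}_k; s_{k-1}, \pi_{sur})$

Birth: $s_k = \bar{s}_k + v_k$, $v_k \sim PO(v_k; b)$

Detect: $\hat{s}_k \mid s_k \sim BI(\hat{s}_k; s_k, \pi_{det})$

Observe in Clutter: $y_k = \hat{s}_k + e_k$, $e_k \sim PO(e_k; c)$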
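The four steps (survive, birth, detect, observe in clutter) can be simulated directly. This is a sketch using only the standard library. The helper samplers and the choice $s_0 \sim PO(\lambda_0)$ for the initial count are assumptions. The default parameter values are the ones quoted on the "Realisation from the process" slide.

```python
import math
import random

def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for moderate lam."""
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count

def sample_binomial(rng, n, p):
    """Number of successes in n independent Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))

def simulate_counts(K, p_sur=0.9, b=3.0, p_det=0.5, c=20.0, lam0=10.0, seed=0):
    """Return (true counts s_1..s_K, observed counts y_1..y_K)."""
    rng = random.Random(seed)
    s = sample_poisson(rng, lam0)               # assumed prior: s_0 ~ PO(lam0)
    true_counts, observed = [], []
    for _ in range(K):
        s_bar = sample_binomial(rng, s, p_sur)  # Survive
        s = s_bar + sample_poisson(rng, b)      # Birth
        s_hat = sample_binomial(rng, s, p_det)  # Detect
        y = s_hat + sample_poisson(rng, c)      # Observe in clutter
        true_counts.append(s)
        observed.append(y)
    return true_counts, observed
```

Calling `simulate_counts(100)` yields two sequences of length 100, comparable in kind to the "True s" and "Obs y" curves.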

(20)

Realisation from the process

p_sur = 0.9; % Survival probability
b = 3; % Birth intensity
p_det = 0.5; % Detection probability
c = 20; % Clutter intensity
lam0 = 10; % Prior intensity

[Figure: number of objects against time (0 to 100), showing the observations $y_t$ and the true counts $s_t$.]

(21)

Superposition of Poisson random variables

$s \sim PO(s; \lambda)$, $e \sim PO(e; \nu)$, $y = s + e$

$p(y) = PO(y; \lambda + \nu)$
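The superposition property says that the sum $y = s + e$ is Poisson with rate $\lambda + \nu$. A quick numerical check is to convolve the two probability mass functions. The function names are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """PO(k; lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def sum_pmf(y, lam, nu):
    """p(y) for y = s + e, by explicit convolution over s = 0..y."""
    return sum(poisson_pmf(s, lam) * poisson_pmf(y - s, nu)
               for s in range(y + 1))

# The convolution agrees with a single Poisson of rate lam + nu.
difference = abs(sum_pmf(4, 2.0, 3.0) - poisson_pmf(4, 5.0))
```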

(22)

Thinning Poisson Random variables

• Suppose we have n objects. Each object survives independently with probability π.

$s \mid n \sim BI(s; n, \pi)$, $n \sim PO(n; \lambda)$

$p(s) = \sum_{n} BI(s; n, \pi)\, PO(n; \lambda) = PO(s; \lambda\pi)$
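The thinning identity can be checked numerically by truncating the sum over $n$. The truncation point `n_max` is an assumption, chosen large enough that the neglected tail is negligible for small $\lambda$.

```python
import math

def poisson_pmf(k, lam):
    """PO(k; lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def binomial_pmf(s, n, p):
    """BI(s; n, p)."""
    return math.comb(n, s) * p ** s * (1.0 - p) ** (n - s)

def thinned_pmf(s, lam, p, n_max=100):
    """Sum over n of BI(s; n, p) PO(n; lam), truncated at n_max."""
    return sum(binomial_pmf(s, n, p) * poisson_pmf(n, lam)
               for n in range(s, n_max + 1))

# Thinning PO(lam = 4) with survival probability 0.25 gives PO(1).
difference = abs(thinned_pmf(2, 4.0, 0.25) - poisson_pmf(2, 1.0))
```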

(23)

Observing the sum of two Poisson Random variables

$s \sim PO(s; \lambda)$, $e \sim PO(e; \nu)$, $y = s + e$

[Figure: plot over the $(s, e)$ plane, both axes from 0 to 20.]

$p(s \mid y) = BI(s; y, \lambda/(\lambda + \nu))$
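The binomial posterior follows from normalising the joint $PO(s; \lambda)\, PO(y - s; \nu)$ over $s = 0, \dots, y$. The sketch below does this numerically so the result can be compared with $BI(s; y, \lambda/(\lambda + \nu))$. The function names are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """PO(k; lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def binomial_pmf(s, n, p):
    """BI(s; n, p)."""
    return math.comb(n, s) * p ** s * (1.0 - p) ** (n - s)

def posterior_signal_count(y, lam, nu):
    """p(s | y) for s = 0..y, obtained by normalising the joint."""
    joint = [poisson_pmf(s, lam) * poisson_pmf(y - s, nu)
             for s in range(y + 1)]
    total = sum(joint)
    return [j / total for j in joint]

y, lam, nu = 8, 3.0, 5.0
posterior = posterior_signal_count(y, lam, nu)
reference = [binomial_pmf(s, y, lam / (lam + nu)) for s in range(y + 1)]
```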

(24)

Moment matching

$\lambda^{*} = \arg\min_{\lambda} KL\bigl(BI(s; m, \pi) \,\|\, PO(s; \lambda)\bigr) = m\pi$

[Figure: $Binom(s; y, \lambda/(\lambda + \nu))$ compared with $Poisson(s; y\lambda/(\lambda + \nu))$ for $s$ from 0 to 20.]
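The minimiser can be verified by brute force: evaluate the KL divergence from the binomial to the Poisson on a grid of rates and locate the smallest value. The grid range and step are arbitrary choices for this sketch.

```python
import math

def kl_binomial_poisson(m, p, lam):
    """KL( BI(s; m, p) || PO(s; lam) ), summed over s = 0..m."""
    kl = 0.0
    for s in range(m + 1):
        q = math.comb(m, s) * p ** s * (1.0 - p) ** (m - s)
        if q > 0.0:
            log_po = -lam + s * math.log(lam) - math.lgamma(s + 1)
            kl += q * (math.log(q) - log_po)
    return kl

m, p = 10, 0.3
grid = [0.5 + 0.01 * i for i in range(600)]   # candidate rates in [0.5, 6.5)
best = min(grid, key=lambda lam: kl_binomial_poisson(m, p, lam))
```

Because the KL divergence is convex in $\lambda$, the grid minimiser `best` lies within one grid step of the mean $m\pi = 3$.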

(25)

Probability generating functions (z-transforms)

• For a discrete random variable n ∼ p(n)

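$G(z) = \sum_{n=0}^{\infty} p(n) z^{n}$

$G(1) = 1$, $G'(1) = \langle n \rangle$

$BI(s; N, \pi) \Leftrightarrow G_{BI}(z) = (1 - \pi + \pi z)^{N}$

$G'_{BI}(z) = N\pi (1 - \pi + \pi z)^{N-1} \Rightarrow \langle s \rangle = N\pi$

$PO(s; \lambda) \Leftrightarrow G_{PO}(z) = \exp(\lambda(z - 1))$

$G'_{PO}(z) = \lambda \exp(\lambda(z - 1)) \Rightarrow \langle s \rangle = \lambda$

First moment approximation to BI: set $\lambda = N\pi$.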

(26)

Sketch of the derivation of the ADF

• Start: $p(s_{k-1} \mid m_{1:k-1}) = PO(s_{k-1}; \lambda_{k-1|k-1})$

• Prediction Step (Survive + Birth):

$p(s_k \mid m_{1:k-1}) = PO(s_k; \lambda_{k|k-1})$

$\lambda_{k|k-1} = b + \pi_{sur} \lambda_{k-1|k-1}$

(27)

Sketch of the derivation of the ADF

• Update and Project Step (Observe in Clutter) ($\pi_{det} = 1$)

$p(s_k \mid m_{1:k}) \propto BI(s_k; m_k, \lambda_{k|k-1}/(c + \lambda_{k|k-1})) \approx PO(s_k; \lambda_{k|k})$

$\lambda_{k|k} = m_k \frac{\lambda_{k|k-1}}{c + \lambda_{k|k-1}}$

(28)

Sketch of the derivation of the ADF

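• Update and Project Step (Observe in Clutter) ($\pi_{det} < 1$)

$p(s_k \mid m_{1:k}) \approx PO(s_k; \lambda_{k|k})$

$\lambda_{k|k} = (1 - \pi_{det}) \lambda_{k|k-1} + m_k \frac{\pi_{det} \lambda_{k|k-1}}{c + \pi_{det} \lambda_{k|k-1}}$

(29)

Assumed Density Filter

$\lambda_{k|k-1} = b + \pi_{sur} \lambda_{k-1|k-1}$

$\lambda_{k|k} = (1 - \pi_{det}) \lambda_{k|k-1} + m_k \frac{\pi_{det} \lambda_{k|k-1}}{c + \pi_{det} \lambda_{k|k-1}}$

[Figure: observations, true number and filtered estimate.]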
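A sketch of the assumed density filter for the count model, propagating a single Poisson rate. It assumes the prediction $\lambda_{k|k-1} = b + \pi_{sur} \lambda_{k-1|k-1}$ and the update $\lambda_{k|k} = (1 - \pi_{det}) \lambda_{k|k-1} + m_k \pi_{det} \lambda_{k|k-1} / (c + \pi_{det} \lambda_{k|k-1})$, where $m_k$ is the observed count at time $k$. The function names are illustrative.

```python
def adf_predict(lam, p_sur, b):
    """Survive + birth: thin the rate by p_sur, then add the birth rate b."""
    return b + p_sur * lam

def adf_update(lam_pred, m, p_det, c):
    """Observe m counts in clutter of rate c and project back to a Poisson."""
    undetected = (1.0 - p_det) * lam_pred
    detected = m * p_det * lam_pred / (c + p_det * lam_pred)
    return undetected + detected

def adf_filter(counts, lam0, p_sur, b, p_det, c):
    """Run the recursion over observed counts; returns the filtered rates."""
    lam, rates = lam0, []
    for m in counts:
        lam = adf_update(adf_predict(lam, p_sur, b), m, p_det, c)
        rates.append(lam)
    return rates
```

For example, with `p_det = 1` and a predicted rate equal to the clutter rate, the update attributes half of the observed count to objects: `adf_update(4.0, 10, 1.0, 4.0)` gives 5.0.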

(30)

Point Process, Definition

• A random countable set

• Realizations are from the state space

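$\mathcal{X}^{\cup} = \bigcup_{n=0}^{\infty} \mathcal{X}^{n}$, $\mathcal{X}^{0} \equiv \emptyset$

• Defined by the restrictions $P^{(n)}$ on $\mathcal{X}^{n}$, where the probability of observing $n$ points in $A$ is denoted by

$P^{(n)}(A \times \cdots \times A)$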

(31)

Point Process

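$P^{(0)}$ on $\mathcal{X}^{0}$, $P^{(1)}(dx_1)$ on $\mathcal{X}^{1}$, $P^{(2)}(dx_1, dx_2)$ on $\mathcal{X}^{2}$, . . .

• All $P^{(n)}$ are symmetric, e.g. $P^{(2)}(dx_1, dx_2) = P^{(2)}(dx_2, dx_1)$

• Normalisation

$\sum_{n=0}^{\infty} P^{(n)}(\mathcal{X}^{n}) = 1$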

(32)

Realisations from a point process

• Choose a set A

• Generate the number of points in $A$

$N_A \sim p(n) = P^{(n)}(A^{n})$

(33)

Realisations from a point process

• Generate the coordinates from the joint distribution

$(x_1, \dots, x_n) \sim P^{(n)}(dx_1, \dots, dx_n) / P^{(n)}(A^{n})$

[Figure: two panels showing realisations, both axes from −2 to 2.]

(34)

Poisson Point Process

• For any $A \subset \mathcal{X}$, we denote a Poisson point process by $X \sim PP_A(X; \lambda)$, $X \subset A$

• The number of points observed in $A$ is distributed as

$|X \cap A| = N_A \sim PO(N_A; \Lambda_A)$

Intensity function: $\lambda(x) : \mathcal{X} \to \mathbb{R}^{+}$

Intensity measure: $\Lambda_A = \int_A \lambda(x)\,dx$

(35)

Sampling from a Poisson Point Process

• Each $P^{(n)}$ is fully factorised

$P^{(n)} \propto \prod_{i=1}^{n} \lambda(x_i)$

• Only $\lambda(x)$, a positive function, needs to be represented

(36)

Sampling from a Poisson Point Process

• Generate the number of points on $A$

$n \sim P^{(n)}(A^{n}) = PO(n; \Lambda_A) = \frac{\exp(-\Lambda_A)\, \Lambda_A^{n}}{n!}$

• For $i = 1 \dots n$, generate the coordinates independently from

$x_i \sim \lambda(x)/\Lambda_A$
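The two-step recipe (draw the count, then draw the coordinates independently) can be sketched for a one-dimensional window with a piecewise-constant intensity. The piecewise-constant form, the bin `edges` and the `rates` are assumptions made so that $\Lambda_A$ and the normalised density $\lambda(x)/\Lambda_A$ are available in closed form.

```python
import math
import random

def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for moderate lam."""
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count

def sample_ppp(rng, edges, rates):
    """Sample a Poisson point process on [edges[0], edges[-1]).

    rates[j] is the constant intensity on [edges[j], edges[j+1]);
    at least one rate is assumed to be positive.
    """
    widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
    masses = [r * w for r, w in zip(rates, widths)]
    big_lambda = sum(masses)                      # Lambda_A
    n = sample_poisson(rng, big_lambda)           # n ~ PO(n; Lambda_A)
    # x_i ~ lambda(x) / Lambda_A: pick a bin by its mass, then a uniform offset.
    bins = rng.choices(range(len(rates)), weights=masses, k=n)
    return [edges[j] + widths[j] * rng.random() for j in bins]

rng = random.Random(1)
points = sample_ppp(rng, [0.0, 1.0, 2.0, 3.0], [5.0, 0.0, 2.0])
```

Here $\Lambda_A = 7$, so the number of points averages 7 over repeated draws, and no point falls in the middle bin, where the intensity is zero.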

(37)

Superposition of Poisson Processes

$X_i \sim PP_A(X_i; \lambda_i(x))$, $i = 1 \dots L$

$X = X_1 \cup X_2 \cup \cdots \cup X_L$

$X \sim PP_A(X; \sum_i \lambda_i(x))$

(38)

Thinning a Poisson Process

$X \sim PP_A$ . . .

$\int_A dx_t\, p(x_{t+1} \mid x_t)\, \lambda(x_t)$

(40)

Partially observing a Poisson Point Process

$X \sim PP_A(X; \lambda(x))$, $C \sim PP_A(C; c(x))$, $Y = X \cup C$

• The posterior process $p(X \mid Y)$ is in general not a Poisson process

• Find the probability generating functional of the posterior process

• Find the first (functional) derivative and calculate the first moment

• Moment matching: this gives the intensity function of the “nearest” Poisson process in the KL sense

(41)

Probability generating functional

Daley and Vere-Jones 2003

• Generalises the probability generating function

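$G[z] = \sum_{n=0}^{\infty} \int_{\mathcal{X}^{n}} P_X^{(n)}(dx_1, \dots, dx_n)\, z(x_1) \cdots z(x_n)$

• The functional derivative

$G^{(1)}[z; \zeta] = \lim_{\epsilon \to 0} \frac{G[z + \epsilon\zeta] - G[z]}{\epsilon}$

• The intensity is recovered by

$\Lambda_A = \lim_{z \to 1} G^{(1)}[z; I_A]$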
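A worked example for the Poisson case, as a check on these definitions. The closed form of the probability generating functional of a Poisson process with intensity $\lambda(x)$ is a standard result that the slide does not state. It is taken as given here.

```latex
\begin{align*}
G[z] &= \exp\!\left( \int_{\mathcal{X}} \lambda(x)\,\bigl(z(x) - 1\bigr)\,dx \right) \\
G^{(1)}[z; \zeta] &= G[z] \int_{\mathcal{X}} \lambda(x)\,\zeta(x)\,dx \\
\lim_{z \to 1} G^{(1)}[z; I_A] &= G[1] \int_{\mathcal{X}} \lambda(x)\, I_A(x)\,dx
  = \int_A \lambda(x)\,dx = \Lambda_A
\end{align*}
```

The last line uses $G[1] = 1$, so the first functional derivative in the direction of the indicator $I_A$ returns the expected number of points in $A$.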

(42)

Probability Hypothesis Density Filter – The intensity recursion

• Predict

$\lambda_{t|t-1}(x_t) = b_t(x_t) + \pi_{sur}(x_t) \int dx_{t-1}\, p(x_t \mid x_{t-1})\, \lambda_{t-1}(x_{t-1})$

• Update

$\lambda_t(x_t) = (1 - \pi_{det}(x_t))\, \lambda_{t|t-1}(x_t) + \sum_{y_t \in Y_t} \frac{\pi_{det}(x_t)\, p(y_t \mid x_t)\, \lambda_{t|t-1}(x_t)}{c_t(y_t) + \int dx'\, \pi_{det}(x')\, p(y_t \mid x')\, \lambda_{t|t-1}(x')}$
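A minimal grid-based sketch of the intensity recursion on a one-dimensional state space, with integrals replaced by Riemann sums. The assumptions are constant survival and detection probabilities, constant birth and clutter intensities, and Gaussian transition and observation densities with standard deviations `q_std` and `r_std`. None of these choices come from the talk, and practical implementations typically use sequential Monte Carlo or Gaussian mixtures instead of a grid.

```python
import math

def gauss(x, mean, std):
    """Gaussian density N(x; mean, std^2)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def phd_predict(lam, grid, dx, birth, p_sur, q_std):
    """lam_{t|t-1}(x) = b + p_sur * sum_x' p(x | x') lam_{t-1}(x') dx."""
    return [birth + p_sur * dx * sum(gauss(x, xp, q_std) * l
                                     for xp, l in zip(grid, lam))
            for x in grid]

def phd_update(lam_pred, grid, dx, obs, p_det, clutter, r_std):
    """Missed-detection term plus one normalised term per observation."""
    new = [(1.0 - p_det) * l for l in lam_pred]
    for y in obs:
        lik = [p_det * gauss(y, x, r_std) * l for x, l in zip(grid, lam_pred)]
        denom = clutter + dx * sum(lik)
        for i, v in enumerate(lik):
            new[i] += v / denom
    return new

grid = [-5.0 + 0.1 * i for i in range(101)]
dx = 0.1
lam = [0.1] * len(grid)                       # flat prior intensity (assumed)
pred = phd_predict(lam, grid, dx, birth=0.01, p_sur=0.98, q_std=0.5)
post = phd_update(pred, grid, dx, [0.3, -2.0], p_det=0.5, clutter=1.0 / 30.0, r_std=0.5)
expected_count = dx * sum(post)               # integral of the intensity
```

The integral of the posterior intensity, `expected_count`, estimates the number of objects. Each observation contributes at most one unit of mass, and less when clutter is a plausible explanation for it.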

(43)

[Figure: image panel of observations against time.]

p_sur = 0.98; % Survival probability
b = 0.01; % Birth intensity
p_det = 0.5; % Detection probability
c = 1/3; % Clutter intensity
lam0 = 1; % Prior intensity

(44)

[Figure: image panel, state against time.]

(45)

[Figure: image panel, state against time.]

(46)

Summary

• The PHD filter is an ADF

– Retains only the first moment (the intensity)

– A second moment approximation exists but is costly (Singh et al. 2007)

• The PHD filter does not solve the association problem; it bypasses it

– Tracks the intensity field over time

• Implemented in practice via sequential Monte Carlo

– A stochastic approximation to a variational approximation

• Future ideas

– A PHD Smoother is not known

(47)

Bibliography

D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes, volume I. Springer, New York, second edition, 2003.

R. P. S. Mahler. Multitarget Bayes filtering via first-order multitarget moments. IEEE Transactions on Aerospace and Electronic Systems, October 2003.

S. S. Singh, B.-N. Vo, A. Baddeley, and S. Zuyev. Filters for spatial point processes. Technical Report CUED/F-INFENG/TR.591, University of Cambridge, 2007.

B. Vo, S. Singh, and A. Doucet. Sequential Monte Carlo methods for multitarget filtering with random finite sets. IEEE Transactions on Aerospace and Electronic Systems, October 2005.

D. Clark, A. T. Cemgil, P. Peeling, and S. Godsill. Multi-object tracking of sinusoidal components in audio with the Gaussian mixture probability hypothesis density filter. Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, October 2007.
