
Factorisation based models for analysis of musical audio


Academic year: 2021


(1)

Variations on a theme:

Factorisation based models for analysis of musical audio

A. Taylan Cemgil

Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey

17 Dec 2011, NIPS Workshops, Granada

(2)

Acknowledgements

• Umut Simsekli, Kenan Yılmaz, Orhan Sönmez, Barış Kurt (Boğaziçi)

• Cumhur Erkut, Antti Jylhä (Aalto, Helsinki)

• Onur Dikmen, Cédric Févotte (CNRS, Telecom ParisTech)

• Tuomas Virtanen (Tampere)

• Evrim Acar (Copenhagen)

• Funding for our work

– TUBITAK, BAP, CNRS

(3)

Outline

• Introduction, Motivations

• Matrix Factorisation,

• Tensors

• Probabilistic Latent Tensor Factorisation

• Example models

• Inference framework

• Coupled Tensor Factorisations

• Applications

• Results and Conclusions

(4)

Statistical Approach

• Machine Listening ⇔ inverse synthesis via Bayesian inference

$$p(\text{Structure} \mid \text{Audio}) \propto p(\text{Audio} \mid \text{Structure})\, p(\text{Structure})$$

– Hierarchical signal models to incorporate prior knowledge

– Consistent framework for developing inference algorithms

• Contrast with traditional/procedural approaches, where there is no clear distinction between “what” and “how”

(5)

Superposition

• Signals from sound sources are mixed

– Denoising

– Separation

– Dereverberation

• Feature extraction is hard:

$$\text{Feature}\Big(\sum_i s_i\Big) \neq \sum_i \text{Feature}(s_i)$$

piano + piccolo + cymbals
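The non-additivity of features under superposition can be illustrated with a small sketch. It assumes, purely for illustration, that the "feature" is the magnitude spectrum, computed with a naive DFT; the signals and the helper name `magnitude_spectrum` are hypothetical and not from the talk.

```python
import cmath
import math

def magnitude_spectrum(signal):
    """Magnitude of the DFT of a real signal (naive O(N^2) transform)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]

n = 64
s1 = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]  # tone at bin 4
s2 = [-x for x in s1]                                       # same tone, opposite phase
mix = [a + b for a, b in zip(s1, s2)]                       # the two sources cancel

# Feature of the mixture versus the sum of the individual features.
feature_of_sum = magnitude_spectrum(mix)
sum_of_features = [a + b for a, b in zip(magnitude_spectrum(s1),
                                         magnitude_spectrum(s2))]
```

The mixture is silent, so its magnitude spectrum is zero everywhere, while the summed features still show a strong peak at the tone's frequency bin.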

(6)

Polyphonic Music Transcription

• from sound ...

[Figure: spectrogram of a recording, t/sec (0 to 8) against f/Hz (0 to 5000)]

• ... to score

(7)

Computational issues

• Parameter Estimation

Which pitch, rhythm, tempo, meter, time signature ... ?

• Model Selection

How many notes, onsets, sections ... ?

• Online/Offline inference

(8)

Signal Models for Audio

• Time domain – state space, dynamical models

– Conditional Linear Dynamical Systems, Gaussian processes (e.g.

AR, ARMA), switching state space models

– Flexible, Physically realistic, Analysis down to sample precision

• Transform domain – Fourier representations, Generalised Linear model

– Models on (orthogonal) transform coefficients, Energy compaction – Practical, can make use of fast transforms (FFT, MDCT, ...)

• Feature based

– MFCC’s, Chroma,

(9)

Matrix Factorisation

• An Inverse problem: estimate Z1, Z2 given X.

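$$X \approx Z_1 Z_2, \qquad X(\nu, \tau) \approx \sum_i Z_1(\nu, i)\, Z_2(i, \tau)$$

[Figure: schematic of the factorisation X ≈ X̂ = Z1 Z2, with a mask M applied elementwise to X (X ◦ M)]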


• Many well-known algorithms can be cast as matrix factorisation problems

– Clustering: Z1 is arbitrary, columns of Z2 are unit vectors

– NMF: Z1, Z2 are nonnegative (Paatero and Tapper, 1994; Lee and Seung, 1999, 2000)

– PCA, Latent Semantic Indexing, Latent Dirichlet Allocation ...

• Minimise a suitable error function

$$(Z_1, Z_2)^* = \arg\min_{Z_1, Z_2} D(X \,\|\, Z_1 Z_2)$$

(11)

NMF in Acoustic and Music modeling

• We seek a factorisation of the spectrogram (Smaragdis and Brown 2003; Sha, Saul, Lee; Virtanen; Abdallah and Plumbley; Schmidt and Olsson; Févotte)

$$X \approx Z_1 \times Z_2$$

Spectrogram ≈ Templates × Excitations

[Figure: spectrogram with its frequency templates Z1 (Freq.) and excitations Z2 (Intensity)]

(12)

Underlying Generative Model

• Rank one

$$X(\nu, \tau) \sim p(\cdot\,; Z_1(\nu) Z_2(\tau))$$

• Higher rank (composite structure)

$$S(\nu, \tau, i) \sim p(\cdot\,; Z_1(\nu, i) Z_2(\tau, i)), \qquad X(\nu, \tau) = \sum_i S(\nu, \tau, i)$$
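A minimal sketch of sampling from the composite model follows. The slide leaves the observation density p generic; the Poisson choice here is an assumption (it corresponds to the KL cost discussed later in the talk), and the sizes, seed and helper `sample_poisson` are hypothetical.

```python
import math
import random

def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, k, prod = math.exp(-rate), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(0)
n_freq, n_time, n_comp = 4, 5, 2

# Nonnegative factors: Z1[nu][i] (templates) and Z2[tau][i] (excitations).
Z1 = [[rng.uniform(0.5, 3.0) for _ in range(n_comp)] for _ in range(n_freq)]
Z2 = [[rng.uniform(0.5, 3.0) for _ in range(n_comp)] for _ in range(n_time)]

# Latent sources S(nu, tau, i) ~ Poisson(Z1(nu, i) * Z2(tau, i)).
S = [[[sample_poisson(Z1[nu][i] * Z2[tau][i], rng) for i in range(n_comp)]
      for tau in range(n_time)]
     for nu in range(n_freq)]

# The observation is the superposition of the latent sources.
X = [[sum(S[nu][tau]) for tau in range(n_time)] for nu in range(n_freq)]

# Its mean is the usual low-rank product sum_i Z1(nu, i) Z2(tau, i).
X_mean = [[sum(Z1[nu][i] * Z2[tau][i] for i in range(n_comp))
           for tau in range(n_time)]
          for nu in range(n_freq)]
```

Under the Poisson assumption the sum of independent sources is again Poisson with the summed rate, which is why fitting X alone recovers the product structure.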

(13)

Music Analysis

[Figure: three panels. Log |MDCT| coefficients (ν/frequency bin against τ/frame) ≈ estimated scale parameters of the template prior (ν/frequency index against i/key index) × excitations (pitch against τ/frame index)]

One needs to incorporate a lot of prior knowledge to arrive at the “desired factorisation”

• Provide Priors (Spectral continuity, Gamma chains, HMM’s)

• Provide Side Information

(14)

Side Information for guiding the factorisation

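[Figure: coupled factorisation. Observed tensors: X1 (audio spectrum, indices f, t) and X2 (isolated notes, indices f, p). Hidden tensors: D (spectral templates, indices f, i), E (excitations of X1, indices i, t) and F (excitations of X2, indices i, p)]

$$[X_1, X_2] \approx D\,[E, F]$$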

Many other extensions for audio (Smaragdis, Raj; Morup, Schmidt; Vincent; Virtanen; Fevotte; Coyle, FitzGerald; Bertin, Liutkus, Badeau, Richard)

(15)

Factorisation based audio models

• Need highly structured (and complicated) models

• A unifying and practical framework inspired by graphical models:

– Probabilistic Latent Tensor Factorisation

– Generalised Coupled Tensor Factorisation

(16)

Main Research Questions

• Understand several popular models in audio and music processing, and invent new ones

Incorporation of prior knowledge via hierarchical modeling

• A general framework for derivation of decomposition algorithms

Inspiration from probabilistic graphical models, factor graphs, message passing

• Understand the statistical interpretation of Matrix/Tensor factorisation.

Certain Error criteria lead to hierarchical probabilistic models

• Model Selection, sparsity

Bayesian Model Selection via Variational free energy minimisation (VB) or MCMC (not here)

(17)

Tensors

• Tensor ≡ Multidimensional Array (X(i, j, k, . . .))

Disclaimer: not a tensor field

(18)

Tensor Factorisations

Kolda and Bader; Cichocki, Zdunek, Amari

• PARAFAC (parallel factors, Harshman 1970), CANDECOMP (canonical decomposition, Carroll and Chang 1970)

$$X(\nu, \xi, \tau) \approx \sum_i T(\nu, i)\, V(\tau, i)\, W(\xi, i)$$

• Three-mode FA, Higher order SVD (Tucker, 1966; De Lathauwer et al., 2000)

$$X(\nu, \xi, \tau) \approx \sum_{i,j,k} G(i, j, k)\, T(\nu, i)\, V(\tau, j)\, W(\xi, k)$$

• N-mode generalisation of Tucker and dozens of variations

$$X(\nu_1, \nu_2, \ldots, \nu_N) \approx \sum_{i_1, i_2, \ldots, i_N} G(i_1, i_2, \ldots, i_N)\, V_1(\nu_1, i_1)\, V_2(\nu_2, i_2) \cdots V_N(\nu_N, i_N)$$

Tensor factorisations are quite useful even if the data are not multiway.
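As a small illustration of how the two decompositions relate, the sketch below evaluates both reconstructions with plain nested lists and checks that the PARAFAC/CANDECOMP model is the Tucker model with a superdiagonal core. The factor values and the function names are hypothetical.

```python
def cp_reconstruct(T, V, W):
    """X[nu][xi][tau] = sum_i T[nu][i] * V[tau][i] * W[xi][i]."""
    rank = len(T[0])
    return [[[sum(T[n][i] * V[t][i] * W[x][i] for i in range(rank))
              for t in range(len(V))]
             for x in range(len(W))]
            for n in range(len(T))]

def tucker_reconstruct(G, T, V, W):
    """X[nu][xi][tau] = sum_{i,j,k} G[i][j][k] * T[nu][i] * V[tau][j] * W[xi][k]."""
    I, J, K = len(G), len(G[0]), len(G[0][0])
    return [[[sum(G[i][j][k] * T[n][i] * V[t][j] * W[x][k]
                  for i in range(I) for j in range(J) for k in range(K))
              for t in range(len(V))]
             for x in range(len(W))]
            for n in range(len(T))]

rank = 2
T = [[1, 2], [3, 4], [5, 6]]            # T[nu][i]
V = [[1, 0], [2, 1]]                    # V[tau][i]
W = [[1, 1], [0, 2], [3, 1], [1, 0]]    # W[xi][i]

# Superdiagonal core: G(i, j, k) = 1 only when i == j == k.
G_diag = [[[1 if i == j == k else 0 for k in range(rank)]
           for j in range(rank)]
          for i in range(rank)]

X_cp = cp_reconstruct(T, V, W)
X_tucker = tucker_reconstruct(G_diag, T, V, W)
```

With a general (dense) core G the Tucker model lets every combination of columns interact, which is what makes it strictly more flexible than the PARAFAC form.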

(19)

Example 1: Deconvolution as (latent) tensor factorisation

• X: Observed Signal

• Z1: Original Signal

• Z2: Filter impulse response

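$$X(i) \approx \hat X(i) = \sum_t Z_1(t)\, Z_2(\overbrace{i - t}^{d}) = \sum_t \sum_d Z_1(t)\, Z_2(d)\, \delta(d - i + t) = \sum_{t,d} Z_1(t)\, Z_2(d)\, Z_3(d, i, t)$$

where $Z_3(d, i, t) = \delta(d - i + t)$ is a fixed indicator tensor.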
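The rewriting of a convolution as a contraction with a fixed indicator tensor can be checked numerically. The sketch below is illustrative only; the function names and the test signals are hypothetical.

```python
def convolve_direct(z1, z2):
    """Ordinary discrete convolution: out[i] = sum_t z1[t] * z2[i - t]."""
    out = [0.0] * (len(z1) + len(z2) - 1)
    for t, a in enumerate(z1):
        for d, b in enumerate(z2):
            out[t + d] += a * b
    return out

def convolve_as_tensor_factorisation(z1, z2):
    """Same result via out[i] = sum_{t,d} z1[t] * z2[d] * z3[d][i][t]."""
    n_out = len(z1) + len(z2) - 1
    # Z3(d, i, t) = delta(d - i + t): 1 exactly when d = i - t.
    z3 = [[[1.0 if d - i + t == 0 else 0.0 for t in range(len(z1))]
           for i in range(n_out)]
          for d in range(len(z2))]
    return [sum(z1[t] * z2[d] * z3[d][i][t]
                for t in range(len(z1)) for d in range(len(z2)))
            for i in range(n_out)]

signal = [1.0, 2.0, 3.0]     # plays the role of Z1
impulse = [1.0, -1.0]        # plays the role of Z2
direct = convolve_direct(signal, impulse)
factorised = convolve_as_tensor_factorisation(signal, impulse)
```

The dense indicator tensor is wasteful in practice; it is shown here only to make the equivalence explicit.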

(20)

Example 1: Hierarchical modeling (1)

Assume that the original signal can be confined into a subspace

• X: Observed Signal

• U : Original Signal

• Z2: Filter impulse response

$$X(i) \approx \hat X(i) = \sum_{t,d} \overbrace{\Big(\sum_r Z_4(t, r)\, Z_1(r)\Big)}^{U(t)} Z_2(d)\, Z_3(d, i, t) = \sum_t \sum_d \sum_r Z_1(r)\, Z_2(d)\, Z_3(d, i, t)\, Z_4(t, r)$$

(21)

Example 1: Hierarchical modeling (1)

$$U(t) = \sum_r Z_4(t, r)\, Z_1(r)$$

[Figure: the signal U shown as the product of the basis matrix Z4 (seven basis functions over about 700 samples) and the coefficient vector Z1]

(22)

Example 1: Hierarchical modeling (2)

Assume the filter can also be confined into a subspace

• X: Observed Signal

• Z1: Expansion coefficients of the original signal

• H: Filter impulse response

$$X(i) \approx \hat X(i) = \sum_t \sum_d \sum_r Z_1(r) \overbrace{\Big(\sum_q Z_5(d, q)\, Z_2(q)\Big)}^{H(d)} Z_3(d, i, t)\, Z_4(t, r) = \sum_t \sum_d \sum_r \sum_q Z_1(r)\, Z_2(q)\, Z_3(d, i, t)\, Z_4(t, r)\, Z_5(d, q)$$

This process may continue until we run out of letters in the alphabet.

(23)

Deconvolution

[Figure: image deconvolution example. Panels: Synthetically blurred image; Original image; Original filter; z1 and z2 convolved; z1; z2]

(24)

Ex2: Nonnegative Matrix deconvolution (NMFD)

(Smaragdis, 2004)

$$\hat X(f, t) = \sum_{\tau, i} W(f, i, \tau)\, H(i, \overbrace{t - \tau}^{d}) = \sum_{\tau, i, d} W(f, i, \tau)\, H(i, d)\, Z(d, t, \tau)$$

X: Spectrogram, W: Basis, H: Weights

(25)

Excitation Filter Models

• (NMF2D) Nonnegative Matrix 2-D Deconvolution (Schmidt and Mørup 2008)

$$\hat X(f, t) = \sum_{i, \phi, \tau, \nu, d} D(\nu, \tau, i)\, E(\phi, d, i)\, Z_1(\nu, f, \phi)\, Z_2(d, t, \tau)$$

• (SF-SSNTF) Source Filter Sinusoidal Shifted Nonnegative Tensor Factorisation (FitzGerald, Cranitch, Coyle 2008)

$$\hat X(c, t, f) = \sum_{i, p, r, \tau, d} G(c, i)\, H(f, i)\, N(f, p, r)\, W(p, i, \tau)\, E(r, i, d)\, Z(d, t, \tau)$$

– mimics physically inspired source-filter models in the spectral domain

– harmonic excitation multiplied by spectral envelope

• Excitation filter model (Klapuri and Virtanen)

$$\hat X(t, k) = \sum_{i, j, n} G(n, t)\, E(n, k)\, C(i, j)\, A(j, k)\, Z(i, n, t)$$

(26)

Probabilistic Latent Tensor Factorisation

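$$X(v_0) \approx \hat X(v_0) = \sum_{\bar v_0} \prod_\alpha Z_\alpha(v_\alpha)$$

$$Z^*_{1:|\alpha|} = \arg\min_{Z_{1:|\alpha|}} D\big(X(v_0) \,\|\, \hat X(v_0)\big)$$

$v \in V$: all model indices,

$v_0 \in V_0$: indices of observation X,

$v_\alpha \in V_\alpha$: indices of factor $Z_\alpha$ for $\alpha = 1 \ldots |\alpha|$,

$\bar v_0 \in \bar V_0 = V \setminus V_0$,

$\bar v_\alpha \in \bar V_\alpha = V \setminus V_\alpha$, $\alpha = 1 \ldots |\alpha|$.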

(27)

Example: NMF

Nonnegative Matrix factorisation

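$$X(v_0) \approx \hat X(v_0) = \sum_{\bar v_0} \prod_\alpha Z_\alpha(v_\alpha)$$

$$X(f, t) \approx \hat X(f, t) = \sum_i Z_1(f, i)\, Z_2(i, t)$$

$v \in V = \{f, t, i\}$: all model indices,

$v_0 \in V_0 = \{f, t\}$: indices of observation X,

$v_1 \in V_1 = \{f, i\}$: indices of factor Z1,

$v_2 \in V_2 = \{i, t\}$: indices of factor Z2,

$\bar v_0 \in \bar V_0 = \{i\}$, $\bar v_1 \in \bar V_1 = \{t\}$, $\bar v_2 \in \bar V_2 = \{f\}$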
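The index bookkeeping for the NMF example can be reproduced with ordinary sets. This is only a sketch of the notation; the variable names are hypothetical.

```python
# Index sets of the NMF model X(f, t) ≈ sum_i Z1(f, i) Z2(i, t).
V = {"f", "t", "i"}                                # all model indices
V0 = {"f", "t"}                                    # indices of the observation X
factors = {"Z1": {"f", "i"}, "Z2": {"i", "t"}}     # indices of each factor

# Latent indices: everything that is summed out of the observation.
V0_bar = V - V0

# Complement of each factor's index set, as used when updating that factor.
V_alpha_bar = {name: V - indices for name, indices in factors.items()}

# Sanity check: the factors together must mention every model index.
covered = set().union(*factors.values())
```

The same three lines describe any model in the framework once `V`, `V0` and `factors` are changed, which is the point of the notation.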

(28)

Factor Graph Representation for TF Models

• Multivariate Probability densities have a Factor graph representation

• Tensor Factorisation models can be represented similarly – Clique Potentials ↔ Factors

– Random variables ↔ Indices

• Example: NMF

V = {f, i, t}, V0 = {f, t}, V1 = {f, i}, V2 = {i, t}

$$X(f, t) \approx \sum_i Z_1(f, i)\, Z_2(i, t)$$

[Figure: factor graph with factor nodes Z1 and Z2 and index nodes f, i, t]

(29)

Factor Graph Representation for TF Models

Model | V | Observed V0 | Latent V̄0 | Factors
NMF | {f, t, i} | {f, t} | {i} | {f, i}, {i, t}
NMFD | {f, t, τ, i, d} | {f, t} | {τ, i, d} | {f, τ, i}, {d, i}, {d, t, τ}
NMF2D | {f, t, ν, τ, i, φ, d} | {f, t} | {ν, τ, i, φ, d} | {ν, τ, i}, {φ, d, i}, {ν, f, φ}, {d, t, τ}
SF-SSNTF | {c, t, f, i, p, r, τ, d} | {c, t, f} | {i, p, r, τ, d} | {c, i}, {f, i}, {f, p, r}, {p, i, τ}, {r, i, d}, {d, t, τ}

[Figure: factor graphs of the four models: NMF (factors D, E), NMFD (D, E, Z), NMF2D (D, E, Z1, Z2) and SF-SSNTF (G, H, N, W, E, Z)]

(30)

Bayesian Networks for MF

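[Figure: Bayesian network for NMF with hyperparameters Θv and Θt, excitations v(i,τ), templates t(ν,i), latent sources s(ν,i,τ) and observations x(ν,τ) for τ = 1 . . . K; plates over i = 1 . . . I and ν = 1 . . . W]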

• A Directed acyclic graph, not to be confused with our notation

• Nodes denote Random variables, Rectangles denote plates (repeat nodes inside)

• However, fairly complicated even for NMF → not very insightful

(31)

Update Rules for Non-Negative GTF

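From the GLM formulation one can derive multiplicative update rules (MUR) (Yilmaz et al., NIPS 2011).

$$Z_\alpha \leftarrow Z_\alpha \circ \frac{\Delta_\alpha(M \circ W(\hat X) \circ X)}{\Delta_\alpha(M \circ W(\hat X) \circ \hat X)} \quad \text{s.t. } Z_\alpha(v_\alpha) > 0$$

Here $W(\hat X)$ is the inverse variance function, i.e. $W(\hat X) = 1/v(\hat X)$. For the Gaussian, Poisson, Exponential and Inverse Gaussian distributions we have simply $W(\hat X) = \hat X^{-p}$ with $p = 0, 1, 2, 3$ respectively.

$$\Delta_\alpha(Q) = \Big[ \sum_{\bar v_\alpha} Q(v_0) \prod_{\alpha' \neq \alpha} Z_{\alpha'}(v_{\alpha'}) \Big] \qquad (1)$$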

(32)

PLTF: Iterative Maximum Likelihood

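• Specialise to β-divergences (Yilmaz and Cemgil, 2010)

$$Z_\alpha \leftarrow Z_\alpha \circ \frac{\Delta_\alpha(M \circ X \circ \hat X^{\beta - 2})}{\Delta_\alpha(M \circ \hat X^{\beta - 1})}$$

D(·, ·) | IS | KL | Euclidean
β | 0 | 1 | 2

• M: mask tensor (M(v0) = 1 if X(v0) is observed, 0 otherwise)

• Evaluating Δα is equivalent to computing marginal potentials ⇒ via message passing on a factor graph

$$\Delta_\alpha(Q) \equiv \sum_{\bar v_\alpha} Q(v_0) \prod_{\alpha' \neq \alpha} Z_{\alpha'}(v_{\alpha'})$$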
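For the two-factor (NMF) case the masked β-divergence update can be sketched in plain Python. The normalisation of the β-divergence below is one common convention and is an assumption, as are the problem sizes, the mask pattern, the seed and all function names; for NMF, Δ1(Q) reduces to Q Z2ᵀ and Δ2(Q) to Z1ᵀ Q.

```python
import math
import random

def beta_divergence(x, y, beta):
    """Scalar beta-divergence: beta = 0 (IS), 1 (KL), 2 (half squared error)."""
    if beta == 0:
        return x / y - math.log(x / y) - 1
    if beta == 1:
        return x * math.log(x / y) - x + y
    return (x ** beta + (beta - 1) * y ** beta
            - beta * x * y ** (beta - 1)) / (beta * (beta - 1))

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def masked_divergence(x, xhat, mask, beta):
    """Divergence summed over observed entries only."""
    return sum(m * beta_divergence(xv, xh, beta)
               for rows in zip(x, xhat, mask) for xv, xh, m in zip(*rows))

def sweep(x, mask, z1, z2, beta):
    """One multiplicative update of Z1 followed by one of Z2."""
    n_nu, n_tau, n_i = len(x), len(x[0]), len(z2)

    def num_den(z1, z2):
        xhat = matmul(z1, z2)
        num = [[mask[n][t] * x[n][t] * xhat[n][t] ** (beta - 2)
                for t in range(n_tau)] for n in range(n_nu)]
        den = [[mask[n][t] * xhat[n][t] ** (beta - 1)
                for t in range(n_tau)] for n in range(n_nu)]
        return num, den

    num, den = num_den(z1, z2)
    # Delta_1(Q)(nu, i) = sum_tau Q(nu, tau) Z2(i, tau)
    z1 = [[z1[n][i] * sum(num[n][t] * z2[i][t] for t in range(n_tau))
           / sum(den[n][t] * z2[i][t] for t in range(n_tau))
           for i in range(n_i)] for n in range(n_nu)]
    num, den = num_den(z1, z2)
    # Delta_2(Q)(i, tau) = sum_nu Q(nu, tau) Z1(nu, i)
    z2 = [[z2[i][t] * sum(num[n][t] * z1[n][i] for n in range(n_nu))
           / sum(den[n][t] * z1[n][i] for n in range(n_nu))
           for t in range(n_tau)] for i in range(n_i)]
    return z1, z2

def fit(x, mask, beta, n_i, n_sweeps, rng):
    z1 = [[rng.uniform(0.5, 1.5) for _ in range(n_i)] for _ in x]
    z2 = [[rng.uniform(0.5, 1.5) for _ in x[0]] for _ in range(n_i)]
    history = [masked_divergence(x, matmul(z1, z2), mask, beta)]
    for _ in range(n_sweeps):
        z1, z2 = sweep(x, mask, z1, z2, beta)
        history.append(masked_divergence(x, matmul(z1, z2), mask, beta))
    return z1, z2, history

rng = random.Random(0)
truth_w = [[rng.uniform(0.5, 2.0) for _ in range(2)] for _ in range(5)]
truth_h = [[rng.uniform(0.5, 2.0) for _ in range(6)] for _ in range(2)]
X = matmul(truth_w, truth_h)                       # strictly positive data
M = [[0 if (n + t) % 4 == 0 else 1 for t in range(6)] for n in range(5)]

_, _, history_kl = fit(X, M, beta=1, n_i=2, n_sweeps=50, rng=rng)
_, _, history_euc = fit(X, M, beta=2, n_i=2, n_sweeps=50, rng=rng)
```

Setting β = 1 makes the denominator Δα(M) and the numerator Δα(M ◦ X/X̂), which is the KL update used on the NMF slides; entries with M = 0 never influence the factors.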

(33)

General Update Rule for GTF

By dropping the non-negativity requirement we obtain the following update equation:

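$$Z_\alpha \leftarrow Z_\alpha + \frac{2}{\lambda_{\alpha \backslash 0}} \frac{\Delta_\alpha(W \circ (X - \hat X))}{\Delta^2_\alpha(W)} \quad \text{with } \lambda_{\alpha \backslash 0} = |v_\alpha \cap \bar v_0|$$

$$\Delta^\varepsilon_\alpha(Q) = \Big[ \sum_{\bar v_\alpha} Q(v_0) \prod_{\alpha' \neq \alpha} Z_{\alpha'}(v_{\alpha'})^\varepsilon \Big]$$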

(34)

NMF

[Figure: factor graph with factors Z1, Z2 and indices ν, i, τ]

$$\hat X(\nu, \tau) \leftarrow \sum_i Z_1(\nu, i)\, Z_2(i, \tau)$$

(35)

NMF

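[Figure: factor graph for the Z1 update, with messages M ◦ X/X̂ and M combined with Z2 over indices ν, i, τ]

$$Z_1 \leftarrow Z_1 \circ \frac{\Delta_1(M \circ X / \hat X)}{\Delta_1(M)}$$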

(36)

NMF

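[Figure: factor graph for the Z2 update, with messages M ◦ X/X̂ and M combined with Z1 over indices ν, i, τ]

$$Z_2 \leftarrow Z_2 \circ \frac{\Delta_2(M \circ X / \hat X)}{\Delta_2(M)}$$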

(37)

MAP estimation (for the KL case)

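• Conjugate prior for a Poisson: Gamma, with $G(x; a, b) = b^a x^{a-1} \exp(-bx) / \Gamma(a)$

$$Z_\alpha \sim G(Z_\alpha; A_\alpha, B_\alpha / A_\alpha)$$

• Update rule

$$Z_\alpha \leftarrow \frac{(A_\alpha - 1) + Z_\alpha \circ \Delta_\alpha(M \circ X / \hat X)}{A_\alpha / B_\alpha + \Delta_\alpha(M)} \qquad (2)$$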

(38)

Deriving the update equations

• Matrix factorisations as a Generalised Linear Model

• Consider a MF model

$$g(\hat X) = Z_1 Z_2$$

where Z1, Z2 and g(X̂) are matrices of compatible sizes.

• Use $\mathrm{vec}(AXB) = (B^\top \otimes A)\, \mathrm{vec}(X)$ to obtain

$$\mathrm{vec}(g(\hat X)) = (I_{|j|} \otimes Z_1)\, \mathrm{vec}(Z_2) \equiv L z$$

• We can compute a factorisation using the general GLM update equation by alternating between Z1 and Z2

• Readily generalised to arbitrary tensors
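The vectorisation identity behind this linearisation, in its standard column-stacking form vec(AXB) = (Bᵀ ⊗ A) vec(X), can be verified on small integer matrices. The helper names and matrix values below are hypothetical.

```python
def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def vec(m):
    """Stack the columns of a matrix into one long vector."""
    return [m[r][c] for c in range(len(m[0])) for r in range(len(m))]

def kron(a, b):
    """Kronecker product: block (i, j) of the result is a[i][j] * b."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def matvec(m, v):
    return [sum(row[c] * v[c] for c in range(len(v))) for row in m]

def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

A = [[1, 2, 0], [0, 1, 3]]          # 2 x 3
X = [[1, 4], [2, 5], [3, 6]]        # 3 x 2
B = [[1, 2], [0, 1]]                # 2 x 2

# General identity: vec(A X B) = (B^T kron A) vec(X).
lhs_general = vec(matmul(matmul(A, X), B))
rhs_general = matvec(kron(transpose(B), A), vec(X))

# Special case used for the factorisation: A = Z1, X = Z2, B = I.
Z1, Z2 = A, X
lhs_mf = vec(matmul(Z1, Z2))
rhs_mf = matvec(kron(identity(len(Z2[0])), Z1), vec(Z2))
```

With Z1 held fixed, the matrix `kron(identity(...), Z1)` plays the role of the design matrix L and `vec(Z2)` the role of the coefficient vector z in a generalised linear model.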

(39)

Extensions to NMFD and NMF2D

• Convolutive model (I)

$$E(\phi, d, i) = \sum_{k, l} B(k, l)\, C(k, \overbrace{d - l}^{\alpha}, \phi, i) \qquad \text{(I)} \quad (3)$$

• Basis spline model (II)

$$E(\phi, d, i) = \sum_k B(k, d)\, C(k, \phi, i) \qquad \text{(II)} \quad (4)$$

φ: note index, i: source index, d: local time index

(40)

Extensions to NMFD and NMF2D

[Figure: factor graphs of the four extended models: (a) NMFD+I, (b) NMFD+II, (c) NMF2D+I, (d) NMF2D+II]

(41)

Application: Missing audio restoration

• 50 short mono audio examples sampled at 44.1kHz (from FitzGerald, Cranitch, Coyle 2008).

• Compute a spectrogram using 1024-sample windows with no overlap.

• Randomly remove blocks of 10 consecutive time frames, giving gaps of approx. 250 ms.

• 20 per cent of each audio file is removed with long gaps.
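The gap length implied by this setup, and one possible way to build the corresponding time-frame mask, can be sketched as follows. The number of frames, the seed and the function `block_mask` are hypothetical; the talk does not say how the blocks were drawn.

```python
import random

SAMPLE_RATE = 44100      # Hz, from the setup above
WINDOW = 1024            # samples per frame, no overlap
BLOCK_FRAMES = 10        # consecutive frames removed per gap

frame_ms = 1000.0 * WINDOW / SAMPLE_RATE     # duration of one frame
gap_ms = BLOCK_FRAMES * frame_ms             # duration of one removed block

def block_mask(n_frames, block, fraction, rng):
    """Return a 0/1 list over frames with whole blocks set to 0 (missing)."""
    slots = n_frames // block                      # non-overlapping block slots
    n_removed = round(fraction * slots)
    removed = set(rng.sample(range(slots), n_removed))
    return [0 if (t // block) in removed else 1 for t in range(n_frames)]

# Hypothetical excerpt of 500 frames with 20 per cent of the frames removed.
mask = block_mask(500, BLOCK_FRAMES, 0.2, random.Random(0))
```

Each block lasts about 232 ms, consistent with the "approx. 250 ms" quoted above; adjacent removed blocks merge into longer gaps. The same frame mask would be applied to every frequency bin of the spectrogram.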

Table 1: Evaluation of the models on missing audio restoration

Model | SNR (IS) | SNR (KL) | SNR (EUC) | MSE (IS) | MSE (KL) | MSE (EUC)
NMFD | 2.99 | 4.74 | 5.05 | 4.43 | 2.91 | 2.68
SF-SSNTF | −0.28 | 5.09 | 5.06 | 15.00 | 2.57 | 2.59
NMFD + I | 3.01 | 6.00 | 6.91 | 5.89 | 2.23 | 1.68
NMFD + II | 5.00 | 5.79 | 5.80 | 2.74 | 2.20 | 2.17

(42)

Application: Source Separation

• 50 short mono audio examples sampled at 44.1kHz (from FitzGerald, Cranitch, Coyle 2008)

• Mix pairs of examples

• Operate on Constant-Q magnitude spectrograms (computed via Schoerkhuber and Klapuri)

• Using KL cost

• Reconstruct sources using the estimated magnitudes and phase of the mixture

Model | SDR | SIR | SAR
NMF2D | 6.10 | 19.00 | 7.50
NMF2D + I | 6.19 | 19.84 | 6.84
SF-SSNTF | ≈ 8.00 | ≈ 24.00 | ≈ 8.00

(43)

Coupled Tensor Factorisations

Example Problem

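$$X_1^{i,j,k} \approx \hat X_1^{i,j,k} = \sum_r A^{i,r} B^{j,r} C^{k,r}$$

$$X_2^{j,p} \approx \hat X_2^{j,p} = \sum_r B^{j,r} D^{p,r}$$

$$X_3^{j,q} \approx \hat X_3^{j,q} = \sum_r B^{j,r} E^{q,r}$$

[Figure: graph linking the factors A, B, C, D, E to the observed tensors X1, X2, X3; B is shared by all three]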

(44)

Coupled Tensor Factorisations

• Factorise multiple observed tensors simultaneously: Xν for ν = 1 . . . |ν|.

• Each observed tensor Xν now has a corresponding index set V0,ν and a particular configuration will be denoted by v0,ν ≡ uν

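• We define a |ν| × |α| coupling matrix R where

$$R^{\nu,\alpha} = \begin{cases} 1 & X_\nu \text{ and } Z_\alpha \text{ connected} \\ 0 & \text{otherwise} \end{cases} \qquad \hat X_\nu(u_\nu) = \sum_{\bar u_\nu} \prod_\alpha Z_\alpha(v_\alpha)^{R^{\nu,\alpha}} \qquad (5)$$

Example

$$X_1^{i,j,k} \approx \sum_r Z_1^{i,r} Z_2^{j,r} Z_3^{k,r} \qquad X_2^{j,p} \approx \sum_r Z_2^{j,r} Z_4^{p,r} \qquad X_3^{j,q} \approx \sum_r Z_2^{j,r} Z_5^{q,r} \qquad (6)$$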
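For this example the coupling matrix and the three model outputs can be written out directly. The sketch below uses random nonnegative factors; the dimension sizes, the rank, the seed and the dictionary layout are hypothetical.

```python
import random

# Which latent factors each observed tensor is connected to.
connections = {
    "X1": ["Z1", "Z2", "Z3"],
    "X2": ["Z2", "Z4"],
    "X3": ["Z2", "Z5"],
}
factor_names = ["Z1", "Z2", "Z3", "Z4", "Z5"]

# Coupling matrix R: one row per observed tensor, one column per factor.
R = [[1 if name in connections[obs] else 0 for name in factor_names]
     for obs in ("X1", "X2", "X3")]

rng = random.Random(0)
rank = 2
sizes = {"Z1": 2, "Z2": 3, "Z3": 2, "Z4": 4, "Z5": 2}   # sizes of i, j, k, p, q
Z = {name: [[rng.random() for _ in range(rank)] for _ in range(n)]
     for name, n in sizes.items()}

# Model outputs; the factor Z2 (index j) is shared by all three.
X1_hat = [[[sum(Z["Z1"][i][r] * Z["Z2"][j][r] * Z["Z3"][k][r]
                for r in range(rank))
            for k in range(sizes["Z3"])]
           for j in range(sizes["Z2"])]
          for i in range(sizes["Z1"])]
X2_hat = [[sum(Z["Z2"][j][r] * Z["Z4"][p][r] for r in range(rank))
           for p in range(sizes["Z4"])]
          for j in range(sizes["Z2"])]
X3_hat = [[sum(Z["Z2"][j][r] * Z["Z5"][q][r] for r in range(rank))
           for q in range(sizes["Z5"])]
          for j in range(sizes["Z2"])]

# Number of observed tensors each factor participates in.
column_sums = [sum(col) for col in zip(*R)]
```

The column of R belonging to Z2 contains three ones, which is why its update accumulates contributions from all three observed tensors.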

(45)

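Update rules for Coupled Tensor Factorisations

$$\Delta^\varepsilon_{\alpha,\nu}(Q) = \Big[ \sum_{u_\nu \cap \bar v_\alpha} Q(u_\nu) \prod_{\alpha' \neq \alpha} Z_{\alpha'}(v_{\alpha'})^{R^{\nu,\alpha'} \varepsilon} \Big] \qquad (7)$$

• Update for Nonnegative CTF

$$Z_\alpha \leftarrow Z_\alpha \circ \frac{\sum_\nu R^{\nu,\alpha}\, \Delta_{\alpha,\nu}(W_\nu \circ X_\nu)}{\sum_\nu R^{\nu,\alpha}\, \Delta_{\alpha,\nu}(W_\nu \circ \hat X_\nu)} \qquad (8)$$

• In the special case of the Tweedie family, i.e. for distributions whose precision is $W_\nu = \hat X_\nu^{-p}$, the update is

$$Z_\alpha \leftarrow Z_\alpha \circ \frac{\sum_\nu R^{\nu,\alpha}\, \Delta_{\alpha,\nu}(\hat X_\nu^{-p} \circ X_\nu)}{\sum_\nu R^{\nu,\alpha}\, \Delta_{\alpha,\nu}(\hat X_\nu^{1-p})} \qquad (9)$$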

(46)

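Update rules for Coupled Tensor Factorisations

$$\Delta^\varepsilon_{\alpha,\nu}(Q) = \Big[ \sum_{u_\nu \cap \bar v_\alpha} Q(u_\nu) \prod_{\alpha' \neq \alpha} Z_{\alpha'}(v_{\alpha'})^{R^{\nu,\alpha'} \varepsilon} \Big] \qquad (10)$$

• General Update for CTF

$$Z_\alpha \leftarrow Z_\alpha + \frac{2}{\lambda_{\alpha \backslash 0}} \frac{\sum_\nu R^{\nu,\alpha}\, \Delta_{\alpha,\nu}\big(W_\nu \circ (X_\nu - \hat X_\nu)\big)}{\sum_\nu R^{\nu,\alpha}\, \Delta^2_{\alpha,\nu}(W_\nu)} \qquad (11)$$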

(47)

GCTF Application: Audio restoration

[Figure: three spectrograms: Ground Truth, Observed Spectrogram, Reconstructed Spectrogram]

Sampling rate 11025 Hz, frame length 93 msec; 50 missing chunks (average length 0.23 sec., max. length 1.07 sec.); with side information: Bach chorales; SNR: 4.38 dB

(48)

GCTF Application: Score aided Audio restoration

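[Figure: coupled model for score aided restoration. Observed tensors: X1 (Audio with Missing Parts, indices f, t), X2 (MIDI file, indices i, n), X3 (Isolated Notes, indices f, p). Hidden tensors: D (Spectral Templates, indices f, i), E (Excitations of X1, indices i, t), B (Chord Templates, indices i, τ, k), C (Excitations of E, indices k, d), F (Excitations of X3, indices i, p), G (Excitations of X2, indices k, m)]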

(49)

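GCTF Application: Score aided Audio restoration

$$\hat X_1(f, t) = \sum_{i, \tau, k, d} D(f, i)\, B(i, \tau, k)\, C(k, d)\, Z(d, t, \tau) \qquad \text{Test file} \quad (12)$$

$$\hat X_2(i, n) = \sum_{\tau, k, m} B(i, \tau, k)\, G(k, m)\, Y(m, n, \tau) \qquad \text{MIDI file} \quad (13)$$

$$\hat X_3(f, p) = \sum_i D(f, i)\, F(i, p)\, T(i, p) \qquad \text{Merged training files} \quad (14)$$

$$R = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \end{pmatrix} \quad \text{with} \quad \begin{aligned} \hat X_1 &= \sum D^1 B^1 C^1 Z^1 G^0 Y^0 F^0 T^0 \\ \hat X_2 &= \sum D^0 B^1 C^0 Z^0 G^1 Y^1 F^0 T^0 \\ \hat X_3 &= \sum D^1 B^0 C^0 Z^0 G^0 Y^0 F^1 T^1 \end{aligned} \qquad (15)$$

The columns of R are ordered D, B, C, Z, G, Y, F, T.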

(50)

GCTF Application: Score aided Audio restoration

[Figure 1 panels: X3 (Isolated Recordings), X1, X1hat (Restored) and Ground Truth, each as Frequency (Hz) against Time (sec); X2 (Transcription Data) as Notes against Time (sec); Performance as SNR (dB) against Missing Data Percentage (%), showing reconstruction SNR and initial SNR]

Figure 1: Observed matrices. X1: spectrum of the piano performance (missing data (70%) are shown in white); X2: a piano roll obtained from a musical score of the piece; X3: spectra of 88 isolated notes from a piano.

(51)

Summary and Future Work

• Cover broad class of models and topologies using a graphical model formalism

• Generalised Linear models for ML estimation in TF models

• Full Bayesian treatment (inference, model selection) is possible via MCMC or Variational Inference

• Automatic code generation a-la WinBUGS or Infer.net, given a model specification

• Fast Computations on a GPU

• Prior structures, Online inference

• Applications!!

(52)

A Toolbox based on GPU computation

Probabilistic Latent Tensor Factorization Matlab Toolkit

http://www.cmpe.boun.edu.tr/pilab/pilabfiles/pltftoolbox/

(53)

References http://www.cmpe.boun.edu.tr/~cemgil

• Simsekli, U., Cemgil, A. T. and Yilmaz, K., Score Guided Audio Restoration via Generalised Coupled Tensor Factorisation, Submitted 2011

• Yilmaz, K., Cemgil, A. T. and Simsekli, U., Generalised Coupled Tensor Factorisation, NIPS, 2011

• Cemgil, Simsekli and Subakan, Probabilistic Latent Tensor Factorisation framework for audio modeling, IEEE WASPAA, 2011

• Yilmaz, K. and Cemgil, A. T., Algorithms for Probabilistic Latent Tensor Factorization, Signal Processing, Elsevier, 2011

– Longer version of Yilmaz, K. and Cemgil, A. T., Probabilistic Latent Tensor Factorisation, Proc. of ICA/LVA, 2010

• C. Févotte and A. T. Cemgil, Nonnegative matrix factorisations as probabilistic inference in composite models, Eusipco 2009, Glasgow

• A. T. Cemgil. Bayesian inference in non-negative matrix factorisation models. Computational Intelligence and Neuroscience, 2009

(54)

Change Point Models

• Real time pitch/event detection with NMF style models

• Tempo tracking and real time interaction

[Figure: HMM and change point model (CPM) compared on a note with attack, sustain and release phases; panels show the log templates, the state r, the volume v and the log observations log x for each model]

(55)

HMM versus Change Point Model

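[Figure: two graphical models over consecutive frames τ − 1 and τ. HMM: variables r, v and observations x(ν, τ). Change point model: the same variables plus change point indicators c]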

(56)

Detection Results (HMM versus CP)

[Figure: Precision (%), Recall (%) and Latency (ms) against Lag (ms), 0 to 400, for the HMM and the CPM]

(57)

Realtime Interaction

• Combine Perception with Control

[Figure: block diagram with components Claves, Perception, Controller, Lego Robot, Midi Synthesizer and Loud Speaker, and signals Tempo, Bar Position, Motor Command and Motor Feedback]

(58)

Tempo Tracker, Graphical Model

[Figure: graphical model over consecutive frames τ − 1 and τ with variables n, m, r, v and observations x(ν, τ)]

(59)

Tempo Tracker

[Figure: tempo tracker output against Time (sec): Tempo (BPM), Bar Position, Acoustic Events and Audio Spectra x(ν, τ) (Frequency against Time); a further panel shows the Spectral Templates t(ν, i) (Frequency against Acoustic Events)]

(60)

Controller

[Figure: controller trajectory in the (m, n) plane for τ = 0, 1, 2, 3, 4]

(61)

Controller

[Figure: four panels against Time (sec): Bar Position, Tempo (BPM), Robot’s Position (0 to 2π) and Robot’s Velocity]

(62)

Results

[Figure: Bar Position against Time (seconds), Robot’s Position compared with Tracker’s Position; Tempo (beat/min) against Time (seconds), Robot Speed compared with Tracker Speed]

(63)

References (Realtime Tracking and Interaction)

• U. Simsekli, O. Sonmez, B. Kurt, A. T. Cemgil, Combined Perception and Control for Timing in Robotic Music Performances EURASIP Journal on Audio, Speech, and Music Processing, 2011

• U. Simsekli, A. T. Cemgil, Probabilistic Models for Real-Time Acoustic Event Detection with Application to Pitch Tracking, Journal of New Music Research, 2011

• U. Simsekli, A. Jylha, C. Erkut and A. T. Cemgil, Real-Time Recognition of Percussive Sounds by a Model-Based Method, EURASIP Journal on Advances in Signal Processing, 2011.
