NONNEGATIVE MATRIX FACTORIZATIONS AS PROBABILISTIC INFERENCE IN COMPOSITE MODELS

Cédric FÉVOTTE¹ and A. Taylan CEMGIL²

¹CNRS LTCI; Télécom ParisTech, 37-39, rue Dareau, 75014 Paris, France, [email protected]

²Department of Computer Engineering, Boğaziçi University, 34342 Bebek, Istanbul, Turkey, [email protected]

ABSTRACT

We develop an interpretation of nonnegative matrix factorization (NMF) methods based on Euclidean distance, Kullback-Leibler and Itakura-Saito divergences in a probabilistic framework. We describe how these factorizations are implicit in a well-defined statistical model of superimposed components, either Gaussian or Poisson distributed, and are equivalent to maximum likelihood estimation of either mean, variance or intensity parameters. By treating the components as hidden variables, NMF algorithms can be derived in a typical data augmentation setting. This setting can in particular accommodate regularization constraints on the matrix factors through Bayesian priors. We describe multiplicative, Expectation-Maximization, Markov chain Monte Carlo and Variational Bayes algorithms for the NMF problem. This paper describes in a unified framework both new and known algorithms and aims at providing statistical insights into NMF.

1. INTRODUCTION

Given a data matrix $V$ of dimensions $F \times N$ with nonnegative entries, NMF is the problem of finding a factorization

$$V \approx WH = \hat{V} \tag{1}$$

where $W$ and $H$ are nonnegative matrices of dimensions $F \times K$ and $K \times N$, respectively. $K$ is usually chosen such that $FK + KN \ll FN$, hence $\hat{V}$ is a low-rank matrix with a reduced number of parameters. In the following, the entries of matrices $V$, $W$, $H$ and $\hat{V}$ are denoted $v_{fn}$, $w_{fk}$, $h_{kn}$ and $\hat{v}_{fn}$, respectively. We use the colon notation ":" to denote all column or row indices, so that $W = [w_{:,1}, \ldots, w_{:,K}]$ and $H = [h_{1,:}^T, \ldots, h_{K,:}^T]^T$.

NMF has been applied to diverse problems (such as pattern recognition, clustering, mining, source separation, collaborative filtering) in many areas (such as bioinformatics, audio and image processing, and finance). In the literature, the factorization (1) is usually sought through the minimization problem

$$\min_{W,H \ge 0} D(V|WH) = D(V|\hat{V}) \stackrel{\text{def}}{=} \sum_{f=1}^{F} \sum_{n=1}^{N} d(v_{fn}|\hat{v}_{fn}) \tag{2}$$

where $d(x|y)$ is a scalar cost function. Popular choices are the squared Euclidean distance, the (generalized) Kullback-Leibler (KL) divergence, also referred to as I-divergence, and the Itakura-Saito (IS) divergence, defined as

$$d_{EUC}(x|y) = \frac{1}{2}(x - y)^2 \tag{3}$$

$$d_{KL}(x|y) = x \log\frac{x}{y} - x + y \tag{4}$$

$$d_{IS}(x|y) = \frac{x}{y} - \log\frac{x}{y} - 1 \tag{5}$$

All three cost functions are nonnegative and attain their single minimum 0 at $x = y$.
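As a minimal sketch of Eqs. (2) to (5), the three scalar costs and the summed criterion can be written in plain Python. The function names (`d_euc`, `d_kl`, `d_is`, `total_cost`) are illustrative choices, not notation from the paper, and strictly positive arguments are assumed so that the logarithms are defined.

```python
import math

def d_euc(x, y):
    """Squared Euclidean cost, Eq. (3)."""
    return 0.5 * (x - y) ** 2

def d_kl(x, y):
    """Generalized Kullback-Leibler divergence, Eq. (4); assumes x, y > 0."""
    return x * math.log(x / y) - x + y

def d_is(x, y):
    """Itakura-Saito divergence, Eq. (5); assumes x, y > 0."""
    return x / y - math.log(x / y) - 1.0

def total_cost(d, V, V_hat):
    """Criterion (2): sum of the scalar cost d over all matrix entries."""
    return sum(d(v, vh)
               for row_v, row_h in zip(V, V_hat)
               for v, vh in zip(row_v, row_h))
```

Each cost vanishes when its two arguments coincide and is positive otherwise, which is what makes the sum in (2) a sensible measure of fit.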

The interpretation of NMF as a low-rank matrix approximation in the sense of minimizing a given cost $d$ may be sufficient for the derivation of useful signal decomposition algorithms. Certainly, many alternative divergence criteria could also be contemplated [4, 3, 12]. However, for many applications it is not clear which cost to choose or what the dimension of the latent matrices $W$ and $H$ should be. Such model selection questions are inherently related to the underlying statistical properties of $V$ and can be approached in a principled manner via a Bayesian treatment.

We recast NMF with the popular Euclidean, KL and IS costs from a statistical perspective. We show in Section 2 how these factorizations are underlain by a well-defined statistical model of superimposed components, either Gaussian or Poisson distributed, and are equivalent to maximum likelihood estimation of either mean, variance or intensity parameters. By treating the components as hidden variables, we derive NMF algorithms in Section 3, based on Expectation-Maximization (EM), Markov chain Monte Carlo (MCMC) and Variational Bayes (VB). We also review standard multiplicative algorithms and elaborate on the connections between cost functions (3), (4), (5) and Bregman and β-divergences [3, 4]. Finally, we discuss in Section 4 the potential of such probabilistic interpretations of NMF. Parts of the statistical analysis and some of the algorithms presented here have already been published in the literature (see subsequent references); this paper aims at describing these related works in a unified statistical setting.

2. STATISTICAL MODELS

2.1 Observation models

The choice of a certain cost function $d(\cdot|\cdot)$ to measure the fit between $v_{fn}$ and $\hat{v}_{fn}$ implies certain statistical assumptions about how $v_{fn}$ is generated from $\hat{v}_{fn}$. It was already pointed out in various papers, e.g., [5, 2], that Euclidean, KL and IS NMF are underlain by the following generative models:

$$v_{fn} \sim \mathcal{N}(v_{fn}; \hat{v}_{fn}, \sigma^2) \qquad \text{EUC-NMF} \tag{6}$$

$$v_{fn} \sim \mathcal{P}(v_{fn}; \hat{v}_{fn}) \qquad \text{KL-NMF} \tag{7}$$

$$v_{fn} \sim \mathcal{G}(v_{fn}; a, a/\hat{v}_{fn}) \qquad \text{IS-NMF} \tag{8}$$

where $\mathcal{N}$, $\mathcal{P}$, $\mathcal{G}$ refer to the Gaussian, Poisson and Gamma distributions, respectively, defined in the Appendix, and where $\hat{v}_{fn}$ obeys the parametrization $\hat{v}_{fn} = \sum_k w_{fk} h_{kn}$. The likelihood of the parameters $W$ and $H$ under these models can be mapped to the corresponding cost function (2), so that NMF is actually equivalent to maximum likelihood estimation. In other words, EUC-NMF assumes additive Gaussian noise, KL-NMF assumes Poisson noise and IS-NMF assumes multiplicative Gamma noise.

As a matter of fact, all three cost functions belong to the family of regular Bregman divergences, which are in one-to-one correspondence with families of regular exponential distributions [1]. For scalars, a Bregman divergence is defined with respect to a (differentiable) convex function $\phi$ as follows (see, e.g., [1, 12]):

$$d_\phi(x|y) = \phi(x) - \left(\phi(y) + \phi'(y)(x - y)\right).$$

We have the following correspondences: $d_{EUC}(x|y) \leftrightarrow \phi(y) = y^2/2$, $d_{KL}(x|y) \leftrightarrow \phi(y) = y \log y - y$, $d_{IS}(x|y) \leftrightarrow \phi(y) = -\log y$. NMF with Bregman divergences has been studied in [4], where various multiplicative algorithms are described.
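The three correspondences can be checked numerically. The sketch below is illustrative only: `bregman` and the `GENERATORS` table are hypothetical names, each generator $\phi$ is paired with its hand-derived derivative $\phi'$, and positive arguments are assumed.

```python
import math

def bregman(phi, dphi, x, y):
    """Scalar Bregman divergence d_phi(x|y) = phi(x) - (phi(y) + phi'(y)(x - y))."""
    return phi(x) - (phi(y) + dphi(y) * (x - y))

# Generating functions phi and their derivatives for the three NMF costs.
GENERATORS = {
    "EUC": (lambda t: 0.5 * t * t, lambda t: t),
    "KL": (lambda t: t * math.log(t) - t, lambda t: math.log(t)),
    "IS": (lambda t: -math.log(t), lambda t: -1.0 / t),
}

def divergence(name, x, y):
    """Evaluate the Bregman divergence generated by the named phi."""
    phi, dphi = GENERATORS[name]
    return bregman(phi, dphi, x, y)
```

For instance, `divergence("IS", x, y)` expands to $-\log x + \log y + (x - y)/y$, which is Eq. (5) after rearrangement.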

2.2 Composite models

An interesting property of the Gaussian and Poisson distributions is that they are closed under summation: when $x = \sum_k c_k$ and the $c_k$ are independent Poisson (or Gaussian) variables, $x$ is Poisson (or Gaussian). Conversely, any such $x$ can be decomposed as $\sum_k c_k$ without changing the underlying model. In the sequel, we elaborate on these specific models by pointing out and exploiting their composite structure. We introduce the following generative model:

$$x_{fn} = \sum_k c_{k,fn} \tag{9}$$

$$c_{k,fn} \sim p(c_{k,fn}|\theta_k) \tag{10}$$

where $\theta_k = \{w_{:,k}, h_{k,:}\}$. The next paragraphs describe how Euclidean, KL and IS NMF are equivalent to ML estimation of $\theta = \{\theta_1, \ldots, \theta_K\}$ in specific cases of this model, with either $v_{fn} = x_{fn}$ or $v_{fn} = |x_{fn}|^2$. We denote by $C_k$ and $X$ the $F \times N$ matrices with entries $\{c_{k,fn}\}_{fn}$ and $\{x_{fn}\}_{fn}$, respectively. In the sequel we will refer to $C_k$ as a component.

NMF with the Euclidean distance (EUC-NMF). The corresponding generative model is

$$c_{k,fn} \sim \mathcal{N}\left(c_{k,fn}; w_{fk} h_{kn}, \frac{\sigma^2}{K}\right) \tag{11}$$

It is easily shown that

$$-\log p(X|W, H, \sigma^2) = \frac{1}{\sigma^2} D_{EUC}(X|WH) + \frac{NF}{2} \log(2\pi\sigma^2)$$

Hence, ML estimation of $W$ and $H$ is equivalent to NMF of $V = X$ into $WH$ where the Euclidean distance is used.

There is however an interpretability ambiguity with the generative model defined by Eqs. (9), (10), (11), as it may produce negative data. As such, even though the resulting optimization problem is in the end the same provided that the available data $X$ is nonnegative, there is a semantic difference between the two points of view given by EUC-NMF and ML estimation in the Gaussian composite generative model. A more suitable approach would be to assume the components to be generated from a truncated normal distribution, but this would break the formal correspondence between the two approaches due to the necessary renormalization of the component distributions.

NMF with the generalized KL divergence (KL-NMF). Assume the following generative model:

$$c_{k,fn} \sim \mathcal{P}(c_{k,fn}; w_{fk} h_{kn}) \tag{12}$$

It is easily shown that

$$-\log p(X|W, H) \stackrel{c}{=} D_{KL}(X|WH)$$

where $\stackrel{c}{=}$ denotes equality up to a constant. Hence, ML estimation of $W$ and $H$ is equivalent to NMF of $V = X$ into $WH$ where the KL divergence is used. The data $X$ produced by the generative model defined by Eqs. (9), (10), (12) is nonnegative, but there is still an interpretability ambiguity with real-valued data, as the Poisson process produces integers.
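The "equality up to a constant" can be made concrete for a single entry. Since the sum of the Poisson components is itself Poisson with intensity $\hat{x}_{fn} = \sum_k w_{fk} h_{kn}$, it suffices to compare the Poisson negative log-likelihood with $d_{KL}$. The following sketch uses hypothetical helper names and shows that their difference does not depend on the intensity.

```python
import math

def poisson_nll(x, lam):
    """-log P(x | lam) for a nonnegative integer count x and intensity lam > 0."""
    return lam - x * math.log(lam) + math.lgamma(x + 1)

def d_kl(x, y):
    """Generalized KL divergence, Eq. (4), with the convention 0 log 0 = 0."""
    return (x * math.log(x / y) if x > 0 else 0.0) - x + y

def offset(x, lam):
    """Difference between the two; depends on the data x only, not on lam."""
    return poisson_nll(x, lam) - d_kl(x, lam)
```

Because the offset is constant in the intensity, minimizing the KL criterion over $W$ and $H$ and maximizing the Poisson likelihood select the same factors.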

NMF with the IS divergence (IS-NMF). Assume the following generative model:

$$c_{k,fn} \sim \mathcal{N}_c(c_{k,fn}; 0, w_{fk} h_{kn})$$

The data $X$ generated from this model is complex (but we could also assume a real Gaussian pdf instead of a complex one). It is easily shown that [5]

$$-\log p(X|W, H) \stackrel{c}{=} D_{IS}(|X|^{.2}\,|\,WH),$$

where $|X|^{.2}$ is the matrix with entries $|x_{fn}|^2$. Hence, ML estimation of $W$ and $H$ is equivalent to NMF of $V = |X|^{.2}$ into $WH$ where the IS divergence is used. This also corresponds to $a = 1$, i.e., exponential multiplicative noise, in Eq. (8).
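The same check can be sketched for the IS case. The sum of independent zero-mean complex Gaussian components is zero-mean complex Gaussian with variance $\hat{v}_{fn} = \sum_k w_{fk} h_{kn}$, so one compares the negative log-density of $\mathcal{N}_c(x; 0, \hat{v})$, using the Appendix definition with $c = 1$, against $d_{IS}(|x|^2 | \hat{v})$. Function names are illustrative.

```python
import math

def complex_gauss_nll(x, var):
    """-log N_c(x; 0, var) = log(pi * var) + |x|^2 / var for a complex scalar x."""
    return math.log(math.pi * var) + abs(x) ** 2 / var

def d_is(x, y):
    """Itakura-Saito divergence, Eq. (5); assumes x, y > 0."""
    return x / y - math.log(x / y) - 1.0

def offset(x, var):
    """Likelihood minus IS cost on the power |x|^2; constant in var."""
    return complex_gauss_nll(x, var) - d_is(abs(x) ** 2, var)
```

The offset evaluates to $\log \pi + \log |x|^2 + 1$, which involves the data only.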

3. ALGORITHMS

3.1 Multiplicative algorithms

The multiplicative gradient descent approach taken in [8, 3] amounts to updating each parameter by multiplying its value at the previous iteration by the ratio of the negative and positive parts of the derivative of the criterion w.r.t. this parameter, namely $\theta \leftarrow \theta \,.\, [\nabla f(\theta)]_- / [\nabla f(\theta)]_+$, where $\nabla f(\theta) = [\nabla f(\theta)]_+ - [\nabla f(\theta)]_-$ and the summands are both nonnegative. This ensures nonnegativity of the parameter updates, provided the initialization is nonnegative. A fixed point $\theta^\star$ of the algorithm implies either $\nabla f(\theta^\star) = 0$ or $\theta^\star = 0$. This leads to the following updates:

$$H \leftarrow H \,.\, \frac{W^T\left((WH)^{.[\beta-2]} \,.\, X\right)}{W^T (WH)^{.[\beta-1]}} \tag{13}$$

$$W \leftarrow W \,.\, \frac{\left((WH)^{.[\beta-2]} \,.\, X\right) H^T}{(WH)^{.[\beta-1]} H^T} \tag{14}$$

where $\beta = 2$ corresponds to EUC-NMF, $\beta = 1$ to KL-NMF and $\beta = 0$ to IS-NMF, and '.' and './.' denote entrywise operations (the fraction bars in (13) and (14) are entrywise divisions). Other values of $\beta$ correspond to performing NMF with the β-divergence $d_\beta(x|y)$ [3, 5], which is actually the Bregman divergence corresponding to $\phi(y) = \frac{1}{\beta(\beta-1)} y^\beta$ for $\beta \notin \{0, 1\}$, and which takes the KL and IS costs as limiting cases when $\beta$ goes to 1 and 0, respectively.
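Updates (13) and (14) can be sketched in plain Python on lists of rows. This is an illustrative implementation under stated assumptions, not the authors' code: the data matrix is called `V` (it plays the role of $X$ in (13) and (14), or of $|X|^{.2}$ for IS-NMF), all entries are assumed strictly positive, and the closed form used in `beta_divergence` is the Bregman divergence generated by $\phi(y) = y^\beta / (\beta(\beta-1))$.

```python
import math

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def beta_divergence(x, y, beta):
    """d_beta(x|y), with KL (beta=1) and IS (beta=0) as limiting cases."""
    if beta == 1:
        return x * math.log(x / y) - x + y
    if beta == 0:
        return x / y - math.log(x / y) - 1.0
    return (x ** beta + (beta - 1) * y ** beta
            - beta * x * y ** (beta - 1)) / (beta * (beta - 1))

def cost(V, W, H, beta):
    """Criterion (2) for the beta-divergence."""
    V_hat = matmul(W, H)
    return sum(beta_divergence(v, vh, beta)
               for rv, rh in zip(V, V_hat) for v, vh in zip(rv, rh))

def update_H(V, W, H, beta):
    """Eq. (13): H <- H . [W^T((WH)^(beta-2) . V)] / [W^T (WH)^(beta-1)]."""
    V_hat = matmul(W, H)
    num_arg = [[v * vh ** (beta - 2) for v, vh in zip(rv, rh)] for rv, rh in zip(V, V_hat)]
    den_arg = [[vh ** (beta - 1) for vh in rh] for rh in V_hat]
    Wt = transpose(W)
    num, den = matmul(Wt, num_arg), matmul(Wt, den_arg)
    return [[h * a / b for h, a, b in zip(rh, ra, rb)] for rh, ra, rb in zip(H, num, den)]

def update_W(V, W, H, beta):
    """Eq. (14), obtained from (13) applied to the transposed problem V^T ~ H^T W^T."""
    return transpose(update_H(transpose(V), transpose(H), transpose(W), beta))
```

Alternating `update_H` and `update_W` from a positive initialization keeps both factors nonnegative, since every factor in the updates is a ratio of nonnegative quantities.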

Lee & Seung [8] showed that criterion (2) is nonincreasing under the latter updates for $\beta = 2$ (Euclidean distance) and $\beta = 1$ (KL divergence), and the proof was extended by Kompass [6] to values $1 \le \beta \le 2$, i.e., where $d_\beta(x|y)$ is convex w.r.t. $y$. Considering the simpler problem $v_{:,n} \approx W h_{:,n}$ with $W$ fixed, the proof is based on the construction of the functional

$$G(h_{:,n}, \tilde{h}_{:,n}) = \sum_f \sum_k \lambda_{kfn}\, d\!\left(v_{fn} \,\Big|\, \frac{w_{fk} h_{kn}}{\lambda_{kfn}}\right) \quad \text{with} \quad \lambda_{kfn} = \frac{w_{fk} \tilde{h}_{kn}}{[W \tilde{h}]_{fn}}$$

which is easily shown to be a suitable auxiliary function for $C(h) = D(v|Wh)$ (i.e., $G(h, h) = C(h)$ and $G(h, \tilde{h}) \ge C(h)$) by convexity of $d(x|y)$ in $y$ and Jensen's inequality. A similar auxiliary function can be built to solve for $v_{f,:}^T \approx H^T w_{f,:}^T$ with $H$ fixed.
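The two defining properties of the auxiliary function can be verified numerically for one column. The sketch below assumes the KL cost (β = 1), strictly positive entries, and hypothetical function names; `v` is one column of the data, `h` and `h_tilde` the corresponding columns of coefficients.

```python
import math

def d_kl(x, y):
    """Generalized KL divergence, Eq. (4); assumes x, y > 0."""
    return x * math.log(x / y) - x + y

def criterion(v, W, h):
    """C(h) = D(v | W h) for a single data column v."""
    return sum(d_kl(vf, sum(w * hk for w, hk in zip(w_row, h)))
               for vf, w_row in zip(v, W))

def auxiliary(v, W, h, h_tilde):
    """G(h, h_tilde) built from the convex weights lambda_kf = w_fk h~_k / [W h~]_f."""
    total = 0.0
    for vf, w_row in zip(v, W):
        denom = sum(w * ht for w, ht in zip(w_row, h_tilde))
        for w, hk, ht in zip(w_row, h, h_tilde):
            lam = w * ht / denom          # weights sum to 1 over k
            total += lam * d_kl(vf, w * hk / lam)
    return total
```

Since the weights sum to one over $k$, Jensen's inequality applied to the convex map $y \mapsto d(v|y)$ gives $G(h, \tilde{h}) \ge C(h)$, with equality at $\tilde{h} = h$.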

However, many authors [4, 3, 5] have observed that the criterion is still nonincreasing under updates (13) and (14) for values of $\beta$ outside the $[1, 2]$ interval (in particular for $\beta = 0$, corresponding to the IS divergence), but no proof is available.

Though popularized by Lee & Seung for NMF within the machine learning community in the last decade, the multiplicative updates for each factor in Euclidean and KL NMF correspond to well-known algorithms for image restoration in the inverse problems community; see [7] and references therein.

3.2 EM algorithms

In Section 2 we have shown how EUC, KL and IS-NMF are underlain by statistical composite models. The components act as latent variables and may be used as complete data in the EM algorithm. In this setting the following functional, which is defined with a leading minus sign, has to be minimized iteratively:

$$Q(\theta|\theta') \stackrel{\text{def}}{=} -\int_C \log p(C|\theta)\, p(C|X, \theta')\, dC$$

where $\theta = \{W, H\}$ and $C$ is the tensor with slices $C_k$ and elements $c_{k,fn}$. Convergence of this algorithm to a stationary point is guaranteed. Using the conditional independence $p(C|\theta) = \prod_k p(C_k|\theta_k)$, the EM functional can be written

$$Q(\theta|\theta') = \sum_k Q_k(\theta_k|\theta'), \qquad Q_k(\theta_k|\theta') \stackrel{\text{def}}{=} -\int_{C_k} \log p(C_k|\theta_k)\, p(C_k|X, \theta')\, dC_k. \tag{15}$$

Under suitable i.i.d. assumptions the functional further reduces to

$$Q_k(\theta_k|\theta') = -\sum_{f,n} \int_{c_{k,fn}} \log p(c_{k,fn}|\theta_k)\, p(c_{k,fn}|x_{fn}, \theta')\, dc_{k,fn}. \tag{16}$$

We now make the EM algorithm explicit in the specific cases of Euclidean, KL and IS NMF. Note that in the following we are not able to minimize $Q_k(w_{:,k}, h_{k,:}|\theta')$ jointly in $w_{:,k}$ and $h_{k,:}$, but only to perform coordinate descent, i.e., produce $w^{(i+1)}_{:,k}$ and $h^{(i+1)}_{k,:}$ such that $Q_k(w^{(i+1)}_{:,k}, h^{(i+1)}_{k,:}|\theta^{(i)}) \le Q_k(w^{(i)}_{:,k}, h^{(i+1)}_{k,:}|\theta^{(i)}) \le Q_k(w^{(i)}_{:,k}, h^{(i)}_{k,:}|\theta^{(i)})$, which leads, strictly speaking, to a (converging) generalized EM (GEM) algorithm instead of pure EM. In the following, the prime $'$ refers to parameter values at the previous iteration $(i)$.

3.2.1 EUC-NMF

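$$-\log p(c_{k,fn}|\theta_k) \stackrel{c}{=} \frac{1}{2}(c_{k,fn} - w_{fk} h_{kn})^2$$

(up to the positive scale factor $K/\sigma^2$, which does not affect the minimizer), and

$$p(c_{k,fn}|x_{fn}, \theta) = \mathcal{N}(c_{k,fn}|\mu^{post}_{k,fn}, \lambda^{post}_{k,fn})$$

with

$$\mu^{post}_{k,fn} = w_{fk} h_{kn} + \frac{1}{K}(x_{fn} - \hat{x}_{fn}), \qquad \lambda^{post}_{k,fn} = \frac{K-1}{K^2}\sigma^2 \tag{17}$$

where here $\hat{x}_{fn} = \hat{v}_{fn} = \sum_k w_{fk} h_{kn}$. Hence, the minimization of functional (16) subject to nonnegativity constraints leads to

$$h_{kn} = \left\lfloor \frac{\sum_f w_{fk}\left[\frac{1}{K}(x_{fn} - \hat{x}'_{fn}) + w'_{fk} h'_{kn}\right]}{\sum_f w_{fk}^2} \right\rfloor_+ \tag{18}$$

$$w_{fk} = \left\lfloor \frac{\sum_n h_{kn}\left[\frac{1}{K}(x_{fn} - \hat{x}'_{fn}) + w'_{fk} h'_{kn}\right]}{\sum_n h_{kn}^2} \right\rfloor_+ \tag{19}$$

where $\lfloor x \rfloor_+ = \max\{x, 0\}$. These update equations differ from the usual multiplicative updates given by Eqs. (13) and (14).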
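As an illustrative sketch of update (18), the function below recomputes one row of $H$ from the posterior means of Eq. (17). It assumes that `W` and `H` hold the previous-iteration values (so the current and primed factors coincide) and that column `k` of `W` is not identically zero; the function name is hypothetical.

```python
def em_update_h_euc(X, W, H, k):
    """One GEM update of row k of H under the Gaussian composite model, Eq. (18)."""
    F, K, N = len(W), len(H), len(H[0])
    X_hat = [[sum(W[f][j] * H[j][n] for j in range(K)) for n in range(N)]
             for f in range(F)]
    norm = sum(W[f][k] ** 2 for f in range(F))
    new_row = []
    for n in range(N):
        # Posterior mean of c_{k,fn}, Eq. (17): w_fk h_kn + (x_fn - x_hat_fn) / K
        numerator = sum(W[f][k] * (W[f][k] * H[k][n] + (X[f][n] - X_hat[f][n]) / K)
                        for f in range(F))
        # Projection onto the nonnegative orthant, i.e. max{., 0}
        new_row.append(max(numerator / norm, 0.0))
    return new_row
```

When the data is reproduced exactly ($X = WH$) the residual vanishes and the row is left unchanged, so exact factorizations are fixed points of the update.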

3.2.2 KL-NMF

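$$-\log p(c_{k,fn}|\theta_k) \stackrel{c}{=} w_{fk} h_{kn} - c_{k,fn} \log(w_{fk} h_{kn})$$

$$p(c_{k,fn}|x_{fn}, \theta) = \mathcal{B}(c_{k,fn}|x_{fn}, \pi_{k,fn})$$

where $\pi_{k,fn} = w_{fk} h_{kn} / \hat{x}_{fn}$ and here $\hat{x}_{fn} = \hat{v}_{fn} = \sum_k w_{fk} h_{kn}$. This leads to

$$h_{kn} = h'_{kn} \frac{\sum_f w'_{fk}\, \frac{x_{fn}}{\hat{x}'_{fn}}}{\sum_f w_{fk}}, \qquad w_{fk} = w'_{fk} \frac{\sum_n h'_{kn}\, \frac{x_{fn}}{\hat{x}'_{fn}}}{\sum_n h_{kn}} \tag{20}$$

which coincides with the usual multiplicative updates given by Eqs. (13) and (14).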
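The coincidence with the multiplicative rule can be illustrated by computing the update of $H$ in two ways: through an explicit E-step (posterior means $x_{fn} \pi_{k,fn}$ of the binomial) followed by the M-step, and through Eq. (13) with $\beta = 1$. This is a sketch with hypothetical function names, with $W$ held at its previous value and all entries assumed positive.

```python
def reconstruct(W, H):
    """x_hat_fn = sum_k w_fk h_kn."""
    return [[sum(w * H[k][n] for k, w in enumerate(w_row)) for n in range(len(H[0]))]
            for w_row in W]

def em_update_H_kl(X, W, H):
    """EM update of H for the Poisson composite model, Eq. (20), with W fixed."""
    F, K, N = len(W), len(H), len(H[0])
    X_hat = reconstruct(W, H)
    H_new = [[0.0] * N for _ in range(K)]
    for k in range(K):
        col_sum = sum(W[f][k] for f in range(F))
        for n in range(N):
            # E-step: posterior mean of the binomial, x_fn * pi_{k,fn}
            c_mean = [X[f][n] * W[f][k] * H[k][n] / X_hat[f][n] for f in range(F)]
            # M-step: ratio of expected counts to total intensity weight
            H_new[k][n] = sum(c_mean) / col_sum
    return H_new

def multiplicative_update_H_kl(X, W, H):
    """Eq. (13) with beta = 1: H <- H . [W^T (X / WH)] / [W^T 1]."""
    F, K, N = len(W), len(H), len(H[0])
    X_hat = reconstruct(W, H)
    return [[H[k][n] * sum(W[f][k] * X[f][n] / X_hat[f][n] for f in range(F))
             / sum(W[f][k] for f in range(F)) for n in range(N)] for k in range(K)]
```

Both routes evaluate the same expression, which is why the latent-component view recovers the classical KL update exactly.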

3.2.3 IS-NMF

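$$-\log p(c_{k,fn}|\theta_k) \stackrel{c}{=} \log(w_{fk} h_{kn}) + \frac{|c_{k,fn}|^2}{w_{fk} h_{kn}}$$

$$p(c_{k,fn}|x_{fn}, \theta) = \mathcal{N}_c(c_{k,fn}|\mu^{post}_{k,fn}, \lambda^{post}_{k,fn})$$

with

$$\mu^{post}_{k,fn} = \frac{w_{fk} h_{kn}}{\sum_l w_{fl} h_{ln}}\, x_{fn}, \qquad \lambda^{post}_{k,fn} = \frac{w_{fk} h_{kn}}{\sum_l w_{fl} h_{ln}} \sum_{l \ne k} w_{fl} h_{ln}. \tag{21}$$

This leads to

$$h_{kn} = \frac{1}{F} \sum_f \frac{v'_{k,fn}}{w_{fk}}, \qquad w_{fk} = \frac{1}{N} \sum_n \frac{v'_{k,fn}}{h_{kn}}, \tag{22}$$

with $v'_{k,fn} = |\mu^{post\,\prime}_{k,fn}|^2 + \lambda^{post\,\prime}_{k,fn}$. These update equations differ from the multiplicative updates given by Eqs. (13) and (14), and are equivalent to the SAGE algorithm described in [5].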
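A minimal sketch of the update of $h_{kn}$ in Eq. (22) follows. It assumes complex data `X`, strictly positive factors, and that `W` and `H` hold the previous-iteration values used to form the posterior statistics of Eq. (21); the function name is illustrative.

```python
def em_update_h_is(X, W, H, k):
    """EM update of row k of H for the complex Gaussian composite model, Eq. (22)."""
    F, K, N = len(W), len(H), len(H[0])
    new_row = []
    for n in range(N):
        total = 0.0
        for f in range(F):
            v_hat = sum(W[f][l] * H[l][n] for l in range(K))
            lam_k = W[f][k] * H[k][n]
            gain = lam_k / v_hat                 # Wiener-like gain
            mu = gain * X[f][n]                  # posterior mean, Eq. (21)
            post_var = gain * (v_hat - lam_k)    # posterior variance, Eq. (21)
            v_k = abs(mu) ** 2 + post_var        # posterior power of component k
            total += v_k / W[f][k]
        new_row.append(total / F)
    return new_row
```

If the observed power equals the model variance, $|x_{fn}|^2 = \hat{v}_{fn}$, the posterior power of component $k$ reduces to $w_{fk} h_{kn}$ and the update returns the row unchanged.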

3.2.4 Bayesian maximum a posteriori

It is interesting to note that the EM framework readily accommodates Bayesian approaches for which prior information about the parameters $W$ and $H$ is available in the form of prior distributions $p(H)$ and $p(W)$. The complete data likelihood term $-\log p(C_k|\theta_k)$ need only be replaced by $-\log p(\theta_k|C_k)$ in Eq. (15), leading to the following functional to be minimized:

$$Q^{MAP}_k(\theta_k|\theta') = Q_k(\theta_k|\theta') - \log p(w_{:,k}) - \log p(h_{k,:})$$

so that only the M-step is changed.

3.3 MCMC algorithms

Monte Carlo methods [9] are powerful computational techniques to estimate expectations of the form

$$E = \langle \psi(\theta) \rangle_{p(\theta)} \approx \frac{1}{L} \sum_{i=1}^{L} \psi(\theta^{(i)}) = \tilde{E}_L$$

where the $\theta^{(i)}$ are samples drawn from $p(\theta)$. Under mild conditions on the test function $\psi$, the estimate $\tilde{E}_L$ converges to the true expectation for $L \to \infty$. The difficulty here is obtaining independent samples $\{\theta^{(i)}\}_{i=1 \ldots L}$ from complicated distributions. MCMC techniques generate subsequent samples from a Markov chain. One particularly convenient and simple procedure is the Gibbs sampler, where one samples each block of variables from its full conditional distribution. In the Bayesian setting for the NMF model, a possible Gibbs sampler is

$C^{(i)} \sim p(C|W^{(i-1)}, H^{(i-1)}, X)$
for $k = 1 : K$ do
    $h^{(i)}_{k,:} \sim p(h_{k,:}|C^{(i)}_k, w^{(i-1)}_{:,k})$
    $w^{(i)}_{:,k} \sim p(w_{:,k}|C^{(i)}_k, h^{(i)}_{k,:})$
end for

Denoting $c_{fn} = [c_{1,fn}, \ldots, c_{K,fn}]^T$, the posterior of the hidden components factorizes as

$$p(C|W, H, X) = \prod_{f,n} p(c_{fn}|w_{f,:}, h_{:,n}, x_{fn})$$

Next, we derive the full conditionals for the three considered models.

3.3.1 EUC-NMF

The posterior of $c_{fn}$ is given by

$$p(c_{fn}|w_{f,:}, h_{:,n}, x_{fn}) = \mathcal{N}(c_{fn}|\mu^{post}_{fn}, \Sigma^{post}_{fn})$$

with $\mu^{post}_{fn} = [\mu^{post}_{1,fn} \ldots \mu^{post}_{K,fn}]^T$, where $\mu^{post}_{k,fn}$ is defined in Eq. (17), and $\Sigma^{post}_{fn} = \frac{\sigma^2}{K}\left(I_K - \frac{1}{K} e_K e_K^T\right)$, where $e_K$ denotes the vector of ones of dimension $K$. The diagonal terms correspond to the posterior variance in Eq. (17). In the unconstrained case, conjugate priors for $h_{:,n}$ and $w_{f,:}$ would be Gaussian. However, more sophisticated sampling schemes are required to enforce nonnegativity, typically by using Gamma priors; see, e.g., [10, 11].

3.3.2 KL-NMF

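The full conditional of $c_{fn}$ is given by

$$p(c_{fn}|w_{f,:}, h_{:,n}, x_{fn}) = \mathcal{M}(c_{fn}|x_{fn}, \pi_{fn})$$

where $\mathcal{M}$ refers to the multinomial distribution defined in the Appendix and $\pi_{fn} = [\pi_{1,fn}, \ldots, \pi_{K,fn}]$, with $\pi_{k,fn} = w_{fk} h_{kn} / \hat{x}_{fn}$, as defined in Section 3.2.2. Using the conjugate priors

$$p(w_{fk}) = \mathcal{G}(w_{fk}|\alpha_w, \beta_w), \qquad p(h_{kn}) = \mathcal{G}(h_{kn}|\alpha_h, \beta_h),$$

with the Gamma distribution parametrized as in the Appendix, the full conditionals can be derived as [2]

$$p(w_{fk}|C_k, h_{k,:}) = \mathcal{G}\left(w_{fk} \,\Big|\, \alpha_w + \sum_n c_{k,fn},\; \beta_w + \sum_n h_{kn}\right)$$

$$p(h_{kn}|C_k, w_{:,k}) = \mathcal{G}\left(h_{kn} \,\Big|\, \alpha_h + \sum_f c_{k,fn},\; \beta_h + \sum_f w_{fk}\right)$$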
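One sweep of the Gibbs sampler for the KL case can be sketched with the standard library alone. The code assumes integer count data `X`, Gamma priors in the shape/rate parametrization of the Appendix (hyperparameter names `a_w`, `b_w`, `a_h`, `b_h` are illustrative), and a naive multinomial sampler written for clarity rather than speed. Note that `random.gammavariate` takes a scale argument, hence the inverse rate.

```python
import random

def sample_multinomial(rng, n, probs):
    """Draw multinomial counts by n independent categorical draws."""
    counts = [0] * len(probs)
    for _ in range(n):
        u, acc = rng.random(), 0.0
        for k, p in enumerate(probs):
            acc += p
            if u < acc:
                counts[k] += 1
                break
        else:  # guard against rounding when probs sum to slightly below 1
            counts[-1] += 1
    return counts

def gibbs_sweep(rng, X, W, H, a_w, b_w, a_h, b_h):
    """One Gibbs sweep for the Poisson composite model; updates W and H in place."""
    F, K, N = len(W), len(H), len(H[0])
    # Sample the hidden components: c_fn ~ Multinomial(x_fn, pi_fn)
    C = [[[0] * N for _ in range(F)] for _ in range(K)]
    for f in range(F):
        for n in range(N):
            rates = [W[f][k] * H[k][n] for k in range(K)]
            total = sum(rates)
            draw = sample_multinomial(rng, X[f][n], [r / total for r in rates])
            for k in range(K):
                C[k][f][n] = draw[k]
    for k in range(K):
        # h_kn ~ Gamma(shape a_h + sum_f c, rate b_h + sum_f w_fk), w at its old value
        for n in range(N):
            shape = a_h + sum(C[k][f][n] for f in range(F))
            rate = b_h + sum(W[f][k] for f in range(F))
            H[k][n] = rng.gammavariate(shape, 1.0 / rate)
        # w_fk ~ Gamma(shape a_w + sum_n c, rate b_w + sum_n h_kn), h freshly sampled
        for f in range(F):
            shape = a_w + sum(C[k][f][n] for n in range(N))
            rate = b_w + sum(H[k][n] for n in range(N))
            W[f][k] = rng.gammavariate(shape, 1.0 / rate)
    return C
```

Repeating the sweep and averaging $W$ and $H$ after a burn-in period would give Monte Carlo estimates of posterior expectations; the number of sweeps and the burn-in length are left to the user.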

3.3.3 IS-NMF

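Denoting $\lambda_{fn} = [w_{f1} h_{1n} \ldots w_{fK} h_{Kn}]^T$, the posterior of $c_{fn}$ is given by

$$p(c_{fn}|w_{f,:}, h_{:,n}, x_{fn}) = \mathcal{N}_c(c_{fn}|\mu^{post}_{fn}, \Sigma^{post}_{fn})$$

with $\mu^{post}_{fn} = [\mu^{post}_{1,fn} \ldots \mu^{post}_{K,fn}]^T$, where $\mu^{post}_{k,fn}$ is defined in Eq. (21), and $\Sigma^{post}_{fn} = \mathrm{diag}(\lambda_{fn}) - \frac{1}{\hat{v}_{fn}} \lambda_{fn} \lambda_{fn}^T$. The diagonal terms correspond to the posterior variance in Eq. (21).

Using the conjugate inverse-Gamma priors $p(h_{kn}) = \mathcal{IG}(h_{kn}|\alpha_h, \beta_h)$ and $p(w_{fk}) = \mathcal{IG}(w_{fk}|\alpha_w, \beta_w)$, the full conditionals of $h_{k,:}$ and $w_{:,k}$ are

$$p(w_{fk}|C_k, h_{k,:}) = \mathcal{IG}\left(w_{fk} \,\Big|\, \alpha_w + N,\; \beta_w + \sum_n |c_{k,fn}|^2 / h_{kn}\right)$$

$$p(h_{kn}|C_k, w_{:,k}) = \mathcal{IG}\left(h_{kn} \,\Big|\, \alpha_h + F,\; \beta_h + \sum_f |c_{k,fn}|^2 / w_{fk}\right)$$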

3.4 Variational Bayes

We finally describe how the composite structure of Euclidean, KL and IS NMF can be exploited to derive a variational Bayes algorithm [13]. The idea is to bound the marginal likelihood from below:

$$\mathcal{L}_X(\vartheta) \equiv \log p(X|\vartheta) \ge \mathcal{B}_{VB}[q] \equiv \int q \log \frac{p(X, C, W, H|\vartheta)}{q}\, d(C, W, H) = \langle \log p(X, C, H, W|\vartheta) \rangle_q + H[q]$$

where $\vartheta$ denotes the hyperparameters, $H[q]$ is the entropy of $q$, and $q$ is defined as

$$q = \left(\prod_{f,n} q(c_{fn})\right) \left(\prod_{f,k} q(w_{fk})\right) \left(\prod_{k,n} q(h_{kn})\right) = \prod_{\alpha \in \mathcal{C}} q_\alpha$$

The integral over $C$ becomes a summation when the components are discrete (i.e., Poisson components in the KL case). Here, $\alpha \in \mathcal{C} = \{C, W, H\}$ indexes the disjoint clusters of variables.

A local optimum can be attained by the following fixed point iteration:

$$q^{(i+1)}_\alpha \propto \exp\left(\langle \log p(X, C, W, H|\vartheta) \rangle_{q^{(i)}_{\neg\alpha}}\right)$$

where $q_{\neg\alpha} = q / q_\alpha$. The expectations $\langle \log p(X, C, W, H|\vartheta) \rangle$ are functions of the sufficient statistics of $q$. It turns out that the variational update equations have very similar forms to the full conditionals derived for the Gibbs sampler. Here, due to lack of space, we only give the equations for the KL case:

$$q(c_{fn}) = \mathcal{M}(c_{fn}|x_{fn}, \pi_{fn})$$

where $\pi_{fn} = [\pi_{1,fn}, \ldots, \pi_{K,fn}]$ and $\langle c_{k,fn} \rangle = x_{fn} \pi_{k,fn}$ with

$$\pi_{k,fn} \equiv \frac{\exp(\langle \log w_{fk} \rangle + \langle \log h_{kn} \rangle)}{\sum_k \exp(\langle \log w_{fk} \rangle + \langle \log h_{kn} \rangle)}$$

The factors for $W$ and $H$ can be derived as [2]

$$q(w_{fk}) = \mathcal{G}\left(w_{fk} \,\Big|\, \alpha_w + \sum_n \langle c_{k,fn} \rangle,\; \beta_w + \sum_n \langle h_{kn} \rangle\right)$$

$$q(h_{kn}) = \mathcal{G}\left(h_{kn} \,\Big|\, \alpha_h + \sum_f \langle c_{k,fn} \rangle,\; \beta_h + \sum_f \langle w_{fk} \rangle\right)$$

One attractive feature of VB is that the hyperparameters can be optimized by maximizing the variational bound $\mathcal{B}_{VB}[q]$. While this is not guaranteed to increase the true marginal likelihood, it leads in this application to algorithms that enable full Bayesian model selection much faster than MCMC-based sampling approaches, where calculation of the marginal likelihood is trickier. For a detailed discussion see [2].
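The variational updates for the KL case can be sketched as one sweep over the three clusters. The code below is an illustration under stated assumptions: each Gamma factor is stored as a shape and a rate (arrays `Aw`, `Bw`, `Ah`, `Bh`, hypothetical names), so that its mean is shape/rate and its expected logarithm is $\psi(\text{shape}) - \log(\text{rate})$, and the digamma function $\psi$ is approximated by a recurrence plus an asymptotic series because the standard library does not provide it.

```python
import math

def digamma(x):
    """Digamma function for x > 0: upward recurrence, then asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))

def vb_sweep(X, Aw, Bw, Ah, Bh, a_w, b_w, a_h, b_h):
    """One VB sweep for KL-NMF; updates the Gamma factor parameters in place."""
    F, K, N = len(Aw), len(Ah), len(Ah[0])
    Elog_w = [[digamma(Aw[f][k]) - math.log(Bw[f][k]) for k in range(K)] for f in range(F)]
    Elog_h = [[digamma(Ah[k][n]) - math.log(Bh[k][n]) for n in range(N)] for k in range(K)]
    # q(c_fn) is multinomial; only its mean x_fn * pi_{k,fn} is needed
    Ec = [[[0.0] * N for _ in range(F)] for _ in range(K)]
    for f in range(F):
        for n in range(N):
            logits = [Elog_w[f][k] + Elog_h[k][n] for k in range(K)]
            top = max(logits)                      # subtract max for stability
            unnorm = [math.exp(v - top) for v in logits]
            z = sum(unnorm)
            for k in range(K):
                Ec[k][f][n] = X[f][n] * unnorm[k] / z
    # q(w_fk): shape a_w + sum_n <c>, rate b_w + sum_n <h_kn>
    for f in range(F):
        for k in range(K):
            Aw[f][k] = a_w + sum(Ec[k][f][n] for n in range(N))
            Bw[f][k] = b_w + sum(Ah[k][n] / Bh[k][n] for n in range(N))
    # q(h_kn): shape a_h + sum_f <c>, rate b_h + sum_f <w_fk>
    for k in range(K):
        for n in range(N):
            Ah[k][n] = a_h + sum(Ec[k][f][n] for f in range(F))
            Bh[k][n] = b_h + sum(Aw[f][k] / Bw[f][k] for f in range(F))
    return Ec
```

The structure mirrors the Gibbs sweep for the same model, with sampled values replaced by expectations under the current factors.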

4. DISCUSSION

In this overview paper, we have discussed the probabilistic interpretation of various NMF models in maximum likelihood, MAP and full Bayesian settings. In all the algorithms we discuss, we exploit the closure-under-summation property of the observation model and the closed-form availability of all the full conditionals. It should be noted that this is not the case for all divergence measures; in other cases, other optimization techniques need to be employed.

Prior structures are needed to control the decompositions for exploratory data analysis or various problems in signal processing. There is an emphasis on optimization strategies for maximum likelihood or MAP estimation in NMF models, but less research on efficient Bayesian integration methods (with a few exceptions such as [14, 2, 11]). Moreover, as the number of alternatives for data modelling increases (consider, for example, the number of factorization options with increasing data dimension in tensor factorization), there is a need to do model order selection and model averaging in a principled manner, for which ML approaches are known to be inappropriate. Due to lack of space, we do not give simulation results with the developed algorithms in this paper, but refer the reader to other work, such as [2, 5]. A detailed and exhaustive comparison of the algorithms in terms of effectiveness for various signal decomposition tasks is a natural next step and is currently in progress.

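A. STANDARD DISTRIBUTIONS

Multivariate Gaussian, with $c = 1/2$ or $1$ (real/complex case): $\mathcal{N}(x|\mu, \Sigma) = |\pi\Sigma/c|^{-c} \exp\left(-c\,(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$

Poisson: $\mathcal{P}(x|\lambda) = \exp(-\lambda)\, \frac{\lambda^x}{x!}$

Binomial: $\mathcal{B}(x|n, p) = \binom{n}{x} p^x (1 - p)^{n-x}$

Multinomial: $\mathcal{M}(c|n, p) = \binom{n}{c_1\, c_2 \ldots c_K}\, p_1^{c_1} p_2^{c_2} \cdots p_K^{c_K}\, \delta\left(n - \sum_k c_k\right)$

Gamma: $\mathcal{G}(u|\alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, u^{\alpha-1} \exp(-\beta u)$, $u \ge 0$

Inverse-Gamma: $\mathcal{IG}(u|\alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, u^{-(\alpha+1)} \exp\left(-\frac{\beta}{u}\right)$, $u \ge 0$

Acknowledgements

We wish to thank the reviewers for many very helpful comments and suggestions, as well as O. Cappé for discussions related to this work.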

REFERENCES

[1] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Clustering with Bregman divergences. Journal of Machine Learning Research, 6:1705–1749, 2005.

[2] A. T. Cemgil. Bayesian inference in non-negative matrix factorisation models. Technical Report CUED/F-INFENG/TR.609, University of Cambridge, July 2008. Accepted for publication in Computational Intelligence and Neuroscience.

[3] A. Cichocki, R. Zdunek, and S. Amari. Csiszar's divergences for non-negative matrix factorization: Family of new algorithms. In Proc. 6th International Conference on Independent Component Analysis and Blind Signal Separation (ICA'06), pages 32–39, Charleston SC, USA, Mar. 2006.

[4] I. S. Dhillon and S. Sra. Generalized nonnegative matrix approximations with Bregman divergences. Advances in Neural Information Processing Systems (NIPS), 19, 2005.

[5] C. Févotte, N. Bertin, and J.-L. Durrieu. Nonnegative matrix factorization with the Itakura-Saito divergence. With application to music analysis. Neural Computation, 21(3), Mar. 2009.

[6] R. Kompass. A generalized divergence measure for nonnegative matrix factorization. Neural Computation, 19(3):780–791, 2007.

[7] H. Lantéri, M. Roche, O. Cuevas, and C. Aime. A general method to devise maximum-likelihood signal restoration multiplicative algorithms with non-negativity constraints. Signal Processing, 81(5):945–974, May 2001.

[8] D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural and Information Processing Systems 13, pages 556–562, 2001.

[9] J. S. Liu. Monte Carlo strategies in scientific computing. Springer, 2002.

[10] S. Moussaoui, D. Brie, A. Mohammad-Djafari, and C. Carteret. Separation of non-negative mixture of non-negative sources using a Bayesian approach and MCMC sampling. IEEE Trans. on Signal Processing, 54(11):4133–4145, Nov. 2006.

[11] M. N. Schmidt, O. Winther, and L. K. Hansen. Bayesian non-negative matrix factorization. In Proc. 8th International Conference on Independent Component Analysis and Signal Separation (ICA'09), Paraty, Brazil, Mar. 2009.

[12] A. P. Singh and G. J. Gordon. A unified view of matrix factorization models. In Proc. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2008), Part II, number 5212 in LNAI, pages 358–373. Springer, 2008.

[13] M. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1:1–305, 2008.

[14] O. Winther and K. B. Petersen. Bayesian independent component analysis: Variational methods and non-negative decompositions. Digital Signal Processing, 17(5):858–872, Sep. 2007.
