Abstract—In this paper, we propose a Bayesian methodology for receiver function analysis, a key tool in determining the deep structure of the Earth's crust.


IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 48, NO. 12, DECEMBER 2010

The most crucial point in applying Bayesian methods is the choice of the form of the prior distribution, that is, the statistical modeling of sparsity. In the literature, one approach is to model the coefficients of a sparse process with a Bernoulli–Gaussian distribution, which is popular particularly in seismic deconvolution problems [17]–[20], [30]. An early example can be seen in the paper of Mendel et al. [17], where the authors used this prior for the deconvolution of reflectivity sequences. For Bayesian methods developed under this model for the blind deconvolution problem, see [19], [20], and [30]. A modification of this prior to model dependencies between successive reflectivity sequences is used in [21] and [22], where a Bernoulli–Markov random field is introduced for the locations of layer discontinuities. The Bernoulli–Gaussian model is popular in the literature because it is very intuitive and its statistical properties allow easy implementation of Bayesian methods such as Gibbs sampling [31] and the expectation-maximization (EM) algorithm [32]. However, this model suffers from multimodality and needs very good initialization. Modeling the samples of a sparse signal with Gaussian random variables whose variances are inverse-gamma random variables is an alternative approach for the deconvolution of sparse signals. This prior has already been used in many problems, such as 1) source separation [25]–[28] and 2) blind image deconvolution [29]. Like the Bernoulli–Gaussian prior, the statistical properties of this model allow easy implementation of Gibbs sampling and the EM algorithm. It also enables us to use variational Bayes techniques for closed-form approximate posterior inference [33]. Moreover, the model has been reported to have smooth objective functions and to be robust to initialization [27]–[29], [34].

C. Contribution and Structure of the Paper
Fig. 1. Illustration of the seismological process and the resulting receiver function. (a) Locations of the source signal and the receiver. (b) Propagation of P- and S-waves with different velocities and the corresponding receiver function. [(b) is taken from [1].]

… reliable results. However, when this is not the case in seismic recordings, where the vertical component is as long as the radial component, we have a truncated observation, i.e., missing data. Bayesian theory allows us to formulate our prior knowledge about the data, which is ignored by the classical approaches [16]. Since sparsity is important information about the characteristics of a signal, using Bayesian methods for receiver function estimation can improve performance. Moreover, since Bayesian methods are model-based, they can handle observation noise and missing-data problems. Sparsity has been exploited in a Bayesian manner in many application areas. An example very similar to our application area is seismic deconvolution; see [17]–[23]. The use of sparsity for seismic deconvolution has also been justified by Larue et al. [24], who showed that the sparsity assumption is more relevant than the whiteness assumption. Other applications where sparsity is exploited in a Bayesian manner include source separation [25]–[28] and image deconvolution [29].

In this paper, we introduce a Bayesian deconvolution approach for receiver function analysis. Our main motivation is the observation that a receiver function can be modeled as a sparse trace convolved with a narrow smooth pulse, such as a Gaussian pulse. We are also motivated by the fact that, interestingly, exploiting the sparsity of receiver functions in a Bayesian framework has not been introduced so far, though there are many such works for very similar problems, such as marine seismic deconvolution.
Rather than the standard choice, the Bernoulli prior, we choose the inverse-gamma prior for the sample variances of the sparse trace. We investigate Gibbs sampling and variational Bayes techniques for our specific posterior inference problem. For the estimation of the unknown parameters in our model, we simply use the EM algorithm. Since our main concern is to improve the performance of receiver function analysis, we show through experiments with both simulated and real data that Bayesian deconvolution can overcome the disadvantages of iterative deconvolution and produces better results. (A detailed comparison of the inverse-gamma and Bernoulli–Gaussian priors is already available in [34].) For the real-life experiments, we use three sets of recordings taken from different seismic stations in Turkey by Boğaziçi University Kandilli Observatory and Earthquake Research Institute. The organization of this paper is as follows. In Section II, the deconvolution problem is stated. In Section III, the iterative deconvolution algorithm is summarized. In Section IV, the

YILDIRIM et al.: BAYESIAN DECONVOLUTION APPROACH FOR RECEIVER FUNCTION ANALYSIS

methodology for Bayesian deconvolution is investigated. In Section V, comparisons between iterative deconvolution and Bayesian deconvolution are given based on simulated- and real-data experiments. In the last section, we discuss the results with concluding remarks and give possible directions for future work.

II. Problem Definition

As stated in Section I, via a three-component seismic station, the vertical and radial components of motion are available. The vertical component s corresponds to the source signal, whereas the radial component y is the observation signal, which is the convolution of the source signal and the receiver function r of the earth structure near the receiver. Some additive noise v is present in the observation, and here it is assumed to be independent and identically distributed (i.i.d.) white Gaussian. Therefore, to estimate the receiver function, a deconvolution operation must be performed on the noisy observation using the source. The noisy convolution can be formulated as follows:

y_t = \sum_{i=1}^{T_s} s_i r_{t-i+1} + v_t, \quad t = 1, 2, \ldots, T_y.   (1)

For simplicity, we will use the notation

y = s * r + v   (2)

where * denotes the convolution operation, which is the summation part of (1). In (2), the vertical component is represented by a T_s × 1 vector of coefficients, s = [s_1 s_2 \cdots s_{T_s}]^T, where [\cdot]^T is used for matrix-vector transposition. Likewise, r = [r_1 r_2 \cdots r_{T_r}]^T denotes the vector of receiver function coefficients, and v = [v_1 v_2 \cdots v_{T_y}]^T is the observation noise vector, with the v_k being i.i.d. Gaussian with zero mean and variance \sigma_v^2. Finally, y = [y_1 y_2 \cdots y_{T_y}]^T is the vector of samples of the noisy radial component.

Gaussian Filter for the Receiver Function: Rather than a spiky waveform, the receiver function is preferred to be modeled as a sequence of smooth and slowly varying pulses.
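As a concrete illustration, the noisy convolution model in (1) and (2) can be sketched with NumPy. This is a toy sketch, not the paper's data: the signal lengths, spike locations, and noise level below are made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

Ts, Tr = 50, 30                   # toy lengths for the source s and receiver function r
s = rng.standard_normal(Ts)       # vertical component (source signal), arbitrary here
r = np.zeros(Tr)
r[[0, 9, 21]] = [1.0, 0.5, -0.3]  # a sparse receiver function with three pulses
sigma_v = 0.1                     # observation noise standard deviation (assumed)

# Noisy convolution y = s * r + v as in (2); the full convolution has
# Ty = Ts + Tr - 1 samples. In the truncated (missing-data) case discussed
# later, fewer than Ts + Tr - 1 samples of y would be observed.
y = np.convolve(s, r) + sigma_v * rng.standard_normal(Ts + Tr - 1)
```

In the truncated setting described above, one would simply keep only the first few entries of `y`, which is how the missing-data case enters the model later in the paper.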
For that purpose, the receiver function is assumed to be a sparse trace h filtered with a Gaussian pulse g. Therefore, the receiver function r can be expressed by

r = g * h   (3)

where g is the vector of length T_g used for the discrete approximation of the Gaussian pulse, and h ≡ [h_1 h_2 \cdots h_{T_h}]^T is the vector of sparse trace coefficients, with T_h = T_r − T_g + 1. In the literature, the Gaussian pulse is commonly defined in the frequency domain as [12]

G(\omega) = \exp\left(-\frac{\omega^2}{4a^2}\right)   (4)

where ω = 2πf denotes the angular frequency. Note that the frequency content is controlled by the Gaussian filter-width parameter a, and the filter is a unit-area pulse (i.e., the filter gain at ω = 0 is unity). It is common in seismology to quantify the filter by the frequency at which G(2πf) has a value of 0.1 [12].

Since the convolution operation is associative, using (3), we can rewrite (2) as

y = s * r + v = s * (g * h) + v = (s * g) * h + v.   (5)

Notice that z ≡ s * g in (5) is given to us. Hence, we can perform deconvolution in order to estimate the receiver function. Given y and s, we shall follow the steps below for deconvolution of the receiver function r:
• Calculate z = s * g.
• Given y and z, perform deconvolution of h.
• Calculate the estimated receiver function r = h * g.

III. Iterative Deconvolution

Iterative deconvolution is a least squares method used to find the minimum least squares solution to y = h * z for h, given y and z. It was described and used in [12] and [14] for receiver function estimation and inversion of complex body waves. The iterative procedure for the discrete version of the algorithm, as expressed in [14], is as follows. Define the delay operator \Delta_t as

[\Delta_t z]_\tau = \begin{cases} 0, & \text{if } \tau \le t \\ z_{\tau - t}, & \text{if } t < \tau \le T_z + t. \end{cases}   (6)

Start with i = 1.
1) Find t_i and m_i such that \sum_\tau (y_\tau − m_i [\Delta_{t_i} z]_\tau)^2 is minimized. For such a t_i, m_i = y^T \Delta_{t_i} z / z^T z. (Zero-pad either of the vectors y and \Delta_{t_i} z if needed.)
2) Update y ← y − m_i \Delta_{t_i} z.
3) Update h_{t_i} ← h_{t_i} + m_i.
4) If the algorithm has not converged, set i ← i + 1 and go to 1).

The algorithm is easy to implement and very fast. In receiver function analysis, it is terminated after some number of iterations; therefore, convergence is not assured. Moreover, it handles neither sparsity nor the existence of observation noise. Also, in the case of truncated convolution data, it automatically assumes the truncated part to be zero. We propose a Bayesian approach to overcome these problems.

IV. Bayesian Deconvolution

The idea behind Bayesian deconvolution is to model h in a statistical way so that it is a sparse process. In other words, a prior distribution is assigned to h. One suitable approach is to model the samples of h as i.i.d. Gaussian random variables, each with zero mean and with a variance that is itself an inverse-gamma random variable. That is,

h_t \sim N(h_t; 0, \sigma_{h,t}^2), \quad \sigma_{h,t}^2 \sim IG(\sigma_{h,t}^2; \alpha, \beta).

Here, the notation N(x; μ, σ²) denotes the well-known Gaussian distribution for the random variable x with mean μ and variance σ² [35]:

N(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right).

There also exists the multivariate Gaussian distribution for a random vector x with mean vector m and covariance

matrix Σ [35]:

N(x; m, \Sigma) = \frac{1}{\sqrt{|2\pi\Sigma|}} \exp\left(-\frac{1}{2}(x - m)^T \Sigma^{-1} (x - m)\right).

Additionally, the inverse-gamma distribution with shape parameter α and scale parameter β is given by [35]

IG(x; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} \left(\frac{1}{x}\right)^{\alpha+1} \exp\left(-\frac{\beta}{x}\right)

where Γ(·) denotes the gamma function [35]. Choosing suitable values for α and β, one can obtain sparse and impulsive traces (see Fig. 2). Note that in this model, the samples h_t are Student-t random variables [27].

Fig. 2. Sparse trace obtained when α = 0.5, β = 0.0001.

Having introduced our model, the deconvolution problem can be redefined in a Bayesian framework. For the proposed statistical model, we have
• latent statistical variables: h and \sigma_h^2 \equiv \sigma_{h,(1:T_h)}^2;
• deterministic parameters: \Theta \equiv (\sigma_v^2, \alpha, \beta).
Given the observation data y, we wish to calculate the posterior distribution, and thus the mean estimate, of the unknown filter h (and σ_h²), and to find the maximum likelihood solution for the model parameters Θ.

A. Inference of the Posterior Distribution

We have to calculate the posterior distribution p(h, σ_h²|y, Θ). We can write the logarithm of the unnormalized posterior, which is the joint density of y, h, and σ_h², as follows:

\log p(h, \sigma_h^2, y|\Theta) = \log p(\sigma_h^2|\Theta) + \log p(h|\sigma_h^2) + \log p(y|h, \sigma_h^2, \Theta)
= \sum_{t=1}^{T_h} \log IG(\sigma_{h,t}^2; \alpha, \beta) + \sum_{t=1}^{T_h} \log N(h_t; 0, \sigma_{h,t}^2) + \log N(y; Zh, \sigma_v^2 I_{T_y})   (7)

where Z is the T_y × T_h convolution matrix corresponding to z,

Z_{ij} = \begin{cases} z_{i-j+1}, & \text{if } 0 \le i - j \\ 0, & \text{else} \end{cases}   (8)

and I_{T_y} is the identity matrix of size T_y. It can be observed from (7) that the posterior distribution p(h, σ_h²|y, Θ) does not have a closed-form expression in terms of well-known distributions; hence, the expectations of functions of (h, σ_h²) under this posterior distribution cannot be evaluated easily. In other words, it is intractable. However, the full conditional densities p(h|σ_h², y, Θ) and p(σ_h²|h, y, Θ) have well-known closed-form expressions that can be derived easily. Motivated by this fact, we can use Gibbs sampling [31] and variational Bayes techniques [33], [36] in order to approximate the exact posterior of (h, σ_h²). The iterations of the two methods are very similar to derive: one uses samples of the latent variables, whereas the other defines a fixed-point iteration between their sufficient statistics.

1) Gibbs Sampling for (h, σ_h²): Gibbs sampling is a Markov chain Monte Carlo (MCMC) sampling method [37] which can be used when the dimension d of the variable to be sampled, say x, is at least two. All Monte Carlo techniques are based on the idea that an analytically inexpressible probability density function (pdf) P(x) is approximated by N drawn samples x^{(n)}, n = 1, \ldots, N:

\hat{P}(x) = \frac{1}{N} \sum_{n=1}^{N} \delta(x - x^{(n)})   (9)

where δ(·) is the Dirac delta function. Then, using (9), the expected value of any function of x under P(x) can be approximated as

\langle f(x) \rangle_{P(x)} \approx \frac{1}{N} \sum_{n=1}^{N} f(x^{(n)})   (10)

where \langle f(x) \rangle_{P(x)} denotes the expectation of f(x) under the distribution P(x). This approach is called Monte Carlo integration [38]. The idea of Gibbs sampling is that, even though it is difficult to draw samples from P(x) = p(x_1, x_2, \ldots, x_d), it may be easier to sample from the conditional densities p(x_i|x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_d). The general sampling scheme of Gibbs sampling for P(x) can be expressed as follows [36], [37]:

x_1^{(\tau+1)} \sim p(x_1 \mid x_2^{(\tau)}, x_3^{(\tau)}, \ldots, x_d^{(\tau)})
x_2^{(\tau+1)} \sim p(x_2 \mid x_1^{(\tau+1)}, x_3^{(\tau)}, \ldots, x_d^{(\tau)})
\vdots
x_d^{(\tau+1)} \sim p(x_d \mid x_1^{(\tau+1)}, x_2^{(\tau+1)}, \ldots, x_{d-1}^{(\tau+1)}).

It has been shown that, under mild conditions, when τ → ∞, the samples are guaranteed to come from the exact distribution P(x). From (7), we can deduce the full conditional densities of h and σ_h². The posterior of h is given by

p(h|\sigma_h^2, y, \Theta) = N(h; m_h, \Sigma_h)   (11)

with \Sigma_h = (D_h^{-1} + (1/\sigma_v^2) Z^T Z)^{-1} and m_h = (1/\sigma_v^2) \Sigma_h Z^T y, where D_h is the diagonal matrix of size T_h × T_h with D_{h\,i,i} = \sigma_{h,i}^2.

The full conditional posterior of σ_h² is, in fact, conditioned only on the coefficients of the sparse filter h. It can be shown from (7) that

p(\sigma_h^2|h) \propto \exp\left\{ \sum_{t=1}^{T_h} \left[ -(\alpha + 1.5) \log \sigma_{h,t}^2 - \frac{\beta + 0.5 h_t^2}{\sigma_{h,t}^2} \right] \right\}.   (12)

Since (12) is factorized, the posterior distributions of the variances can be found separately for each t as

p(\sigma_h^2|h) = \prod_{t=1}^{T_h} p(\sigma_{h,t}^2|h_t) = \prod_{t=1}^{T_h} IG(\sigma_{h,t}^2; \alpha + 0.5, \beta + 0.5 h_t^2).   (13)

As we can see from (11) and (13), a Gibbs sampler can be used to draw samples from the joint posterior distribution.

2) Variational Bayes for (h, σ_h²): As an alternative to Monte Carlo methods, variational Bayesian methods, also called ensemble learning, are another family of techniques for approximating intractable integrals arising in Bayesian statistics [33], [36]. In variational inference, the posterior distribution of x given the data y, p(x|y), is approximated by a variational distribution Q(x) with respect to some dissimilarity function, such as the Kullback–Leibler (KL) divergence between Q(x) and p(x|y), KL(Q‖p). We observe that any Q(x) gives rise to a lower bound on the log-evidence of the data:

\log p(y) \ge \langle \log p(y, x) \rangle_{Q(x)} + H(Q) \equiv VB_Q   (14)

where H(Q) is the differential entropy of Q(x), and the inequality is a direct result of Jensen's inequality and the concavity of the log(·) function [39]. The lower bound VB_Q in (14) is called the variational bound. On the other hand, we have

\log p(y) = KL(Q \| p) + VB_Q.   (15)

Since log p(y) is fixed with respect to Q(x), maximizing the variational bound VB_Q minimizes the KL divergence, which is our objective.

Mean Field Approximation of the Posterior: It can easily be shown, by substituting Q(x) = p(x|y) in (14) and observing the equality, that the log-evidence log p(y) is attained by the exact posterior, i.e., when Q(x) = p(x|y). However, since p(x|y) is not in a friendly form, we try to find the best-fitting Q(x) among a limited group of distributions, and the approximation is done by maximizing VB_Q over this limited group of distributions. The mean field approximation is the case in which Q(x) is fully factorized over the hidden variables x_1, x_2, \ldots, x_d:

Q(x) = \prod_{i=1}^{d} q(x_i).   (16)

A coordinate ascent maximization of the variational bound VB_Q over the factorized distribution in (16) can be performed iteratively as follows [33]:

q(x_1)^{(\tau+1)} \propto \exp \langle \log p(y, x) \rangle_{\prod_{i=2}^{d} q(x_i)^{(\tau)}}
q(x_2)^{(\tau+1)} \propto \exp \langle \log p(y, x) \rangle_{q(x_1)^{(\tau+1)} \prod_{i=3}^{d} q(x_i)^{(\tau)}}
\vdots
q(x_d)^{(\tau+1)} \propto \exp \langle \log p(y, x) \rangle_{\prod_{i=1}^{d-1} q(x_i)^{(\tau+1)}}.

We have shown that the full conditional densities of h and σ_h² can easily be calculated. Moreover, since they are in closed form in terms of known distributions, and the required sufficient statistics of the variables can be calculated from these distributions, we can use a variational Bayes algorithm with mean field approximation to infer the joint posterior. In this case, we approximate the full joint posterior as

p(h, \sigma_h^2|y, \Theta) \approx q(h)\, q(\sigma_h^2)

and the update rules for this factorized distribution are as follows:

q(h)^{(\tau+1)} \propto \exp \langle \log p(h, \sigma_h^2, y|\Theta) \rangle_{q(\sigma_h^2)^{(\tau)}}
q(\sigma_h^2)^{(\tau+1)} \propto \exp \langle \log p(h, \sigma_h^2, y|\Theta) \rangle_{q(h)^{(\tau+1)}}.

Using steps similar to the ones used for deriving the Gibbs sampling iterations, we can show that

q(h)^{(\tau+1)} = N(h; m_h^{(\tau+1)}, \Sigma_h^{(\tau+1)})   (17)

with \Sigma_h^{(\tau+1)} = (\langle D_h^{-1} \rangle_{q(\sigma_h^2)^{(\tau)}} + (1/\sigma_v^2) Z^T Z)^{-1} and m_h^{(\tau+1)} = (1/\sigma_v^2) \Sigma_h^{(\tau+1)} Z^T y, and

q(\sigma_h^2)^{(\tau+1)} = \prod_{t=1}^{T_h} IG(\sigma_{h,t}^2; \alpha_h, \beta_{h,t}^{(\tau+1)})   (18)

with \alpha_h = \alpha + 0.5 and \beta_{h,t}^{(\tau+1)} = \beta + 0.5 \langle h_t^2 \rangle_{q(h)^{(\tau+1)}}. The sufficient statistics in (17) and (18) are easy to derive. Since q(\sigma_{h,t}^2)^{(\tau)} is an inverse-gamma with shape and scale parameters \alpha_h and \beta_{h,t}, 1/\sigma_{h,t}^2 is a gamma random variable with shape and scale parameters (\alpha_h, 1/\beta_{h,t}). It is well known that the mean of a gamma random variable is the product of its shape and scale parameters; therefore, we have

\langle 1/\sigma_{h,t}^2 \rangle_{q(\sigma_{h,t}^2)^{(\tau)}} = \frac{\alpha_h}{\beta_{h,t}}.

For (18), we only need to know the sufficient statistics \langle h_t^2 \rangle_{q(h)^{(\tau+1)}} for each t. Since q(h)^{(\tau+1)} is a Gaussian with mean m_h^{(\tau+1)} and covariance \Sigma_h^{(\tau+1)},

\langle h_t^2 \rangle_{q(h)^{(\tau+1)}} = \left(m_{h,t}^{(\tau+1)}\right)^2 + \Sigma_{h\,t,t}^{(\tau+1)}.
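The fixed-point iteration between (17) and (18) can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation: the convolution matrix Z is built as in (8), and the data y, the pulse-filtered source z, and the parameters are placeholders supplied by the caller.

```python
import numpy as np

def vb_deconvolve(y, z, Th, sigma_v2, alpha, beta, n_iter=50):
    """Mean-field VB updates (17)-(18) for h_t ~ N(0, s2_t), s2_t ~ IG(alpha, beta)."""
    Ty = len(y)
    # Convolution matrix Z of (8): column j of Z is z delayed by j samples (0-based)
    Z = np.zeros((Ty, Th))
    for j in range(Th):
        n = min(len(z), Ty - j)
        Z[j:j + n, j] = z[:n]
    ZtZ, Zty = Z.T @ Z, Z.T @ y

    alpha_h = alpha + 0.5
    beta_h = np.full(Th, beta)          # beta_{h,t}, refined at every iteration
    for _ in range(n_iter):
        # q(h) update (17), using <1/s2_t> = alpha_h / beta_{h,t}
        Sigma_h = np.linalg.inv(np.diag(alpha_h / beta_h) + ZtZ / sigma_v2)
        m_h = Sigma_h @ Zty / sigma_v2
        # q(s2) update (18), using <h_t^2> = m_{h,t}^2 + Sigma_{h,t,t}
        beta_h = beta + 0.5 * (m_h**2 + np.diag(Sigma_h))
    return m_h, Sigma_h
```

On a toy noiseless problem with a single spike, the posterior mean `m_h` concentrates on the true spike while the remaining coefficients are shrunk toward zero, which is the sparsifying behavior described above.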

B. Parameter Estimation: EM

To estimate the unknown parameters, we use EM-type algorithms [32]. EM consists of two iterating steps: 1) calculating the posterior of the statistical latent variables given the current set of parameters, which is the expectation (E) step, and 2) updating the parameters under the calculated posterior distribution, which is the maximization (M) step. Performing these steps iteratively, we eventually hope to find the posterior of the latent variables for the maximum likelihood parameter set. Recall that we have the latent statistical variables h and σ_h² and the deterministic parameters Θ = (σ_v², α, β). Thus, the EM steps are as follows.

E-step: Requires inference of the posterior to compute the EM quantity

\Omega^{(k)} = \langle \log p(y, h, \sigma_h^2|\Theta) \rangle_{p(h, \sigma_h^2|y, \Theta^{(k-1)})}.   (19)

M-step: Parameter estimation: maximize \Omega^{(k)} over Θ,

\Theta^{(k)} = \arg\max_{\Theta} \Omega^{(k)}.   (20)

It can be shown that \log p(y|\Theta^{(k)}) monotonically increases [32]:

\log p(y|\Theta^{(k+1)}) - \log p(y|\Theta^{(k)}) = \Omega^{(k)}(\Theta^{(k+1)}) - \Omega^{(k)}(\Theta^{(k)}) + KL\!\left( p(h, \sigma_h^2|y, \Theta^{(k)}) \,\big\|\, p(h, \sigma_h^2|y, \Theta^{(k+1)}) \right) \ge 0   (21)

since, by (20), \Omega^{(k)}(\Theta^{(k+1)}) = \max_\Theta \Omega^{(k)}(\Theta) \ge \Omega^{(k)}(\Theta^{(k)}), and the KL divergence is nonnegative [39]. Also, notice that we have equality in (21) when \Theta^{(k+1)} = \Theta^{(k)}; hence, the algorithm has then converged to a local maximum point.

In (7), we have already found the term inside the expectation in (19). Maximizing (19) with respect to σ_v² gives the following update at iteration k:

\hat{\sigma}_v^{2(k)} = \frac{1}{T_y} \left( y^T y + z^T \langle C_H \rangle z - 2 y^T \langle H \rangle z \right)   (22)

where H is the convolution matrix constructed from h in the same manner as Z in (8), C_H = H^T H, and by \langle \cdot \rangle we mean \langle \cdot \rangle_{p(h, \sigma_h^2|y; \Theta^{(k-1)})}. It can be shown that

\langle C_H \rangle_{i,j} = \sum_{l=\max(i,j)}^{T_h + \min(i,j) - 1} \left[ m_{h(l-i+1)} m_{h(l-j+1)} + \Sigma_{h(l-i+1,\, l-j+1)} \right], \quad i, j = 1, \ldots, T_s.

For \hat{\alpha}^{(k)} and \hat{\beta}^{(k)}, notice from (7) that

\Omega^{(k)} = \sum_{t=1}^{T_h} \left[ \alpha \log \beta - \alpha \langle \log \sigma_{h,t}^2 \rangle - \beta \langle 1/\sigma_{h,t}^2 \rangle - \log \Gamma(\alpha) \right] + C

where C does not depend on \hat{\alpha}^{(k)} and \hat{\beta}^{(k)}, and by \langle \cdot \rangle we mean \langle \cdot \rangle_{p(h, \sigma_h^2|y; \Theta^{(k-1)})}. Note that \langle \log \sigma_{h,t}^2 \rangle = \log \beta_{h,t} - \psi(\alpha_h), where \psi(x) = d(\log \Gamma(x))/dx, is also available for the variational Bayes method. A closed-form expression for the \hat{\alpha}^{(k)} and \hat{\beta}^{(k)} that maximize \Omega^{(k)} is not available. However, for a given estimate of \hat{\alpha}^{(k)}, we have

\hat{\beta}^{(k)} = \frac{T_h \hat{\alpha}^{(k)}}{\sum_{t=1}^{T_h} \langle 1/\sigma_{h,t}^2 \rangle}.   (23)

Using (23), one can obtain \Omega^{(k)} as a function of α only. Fortunately, this function is known to be well behaved, and maximization algorithms such as gradient descent [40] or Newton's method [41] work well to determine \hat{\alpha}^{(k)}. In order to favor low values of α supporting sparsity, α may be modeled as an exponential random variable and β may be assigned a conjugate gamma prior. Then, the posterior distributions of α and β are evaluated rather than estimating their maximum likelihood values, and the hyperparameters of their priors may be estimated using a parameter estimation method, most probably EM. For an example of such a case, see [28].
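The β update in (23), combined with a one-dimensional search for α, can be sketched as follows. This is an illustrative sketch, not the paper's exact procedure: the paper suggests gradient descent or Newton's method, whereas a simple grid search is used here for brevity, and the inputs `inv_s2` and `log_s2` stand for the E-step statistics ⟨1/σ²_{h,t}⟩ and ⟨log σ²_{h,t}⟩.

```python
import math
import numpy as np

def update_alpha_beta(inv_s2, log_s2, alpha_grid=None):
    """M-step for (alpha, beta): plug the beta of (23) into Omega, grid-search alpha.

    inv_s2[t] = <1/sigma_{h,t}^2>, log_s2[t] = <log sigma_{h,t}^2>.
    """
    if alpha_grid is None:
        alpha_grid = np.linspace(0.05, 5.0, 500)
    Th = len(inv_s2)
    S_inv, S_log = float(np.sum(inv_s2)), float(np.sum(log_s2))

    def omega(alpha):
        beta = Th * alpha / S_inv            # optimal beta for this alpha, as in (23)
        # (alpha, beta)-dependent part of Omega^(k)
        return Th * (alpha * math.log(beta) - math.lgamma(alpha)) \
               - alpha * S_log - beta * S_inv

    alpha_hat = max(alpha_grid, key=omega)   # 1-D maximization over alpha only
    return alpha_hat, Th * alpha_hat / S_inv
```

Because the profiled objective is unimodal in α, the grid search (or any of the 1-D methods the text mentions) recovers the maximizer reliably; with statistics matching a true IG(α = 1, β = 2) prior, the routine returns estimates close to those values.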
The general EM procedure applies when we choose the variational Bayes method to approximate p(h, σ_h²|y; Θ^{(k−1)}), because we have a closed-form expression for it and can evaluate the sufficient statistics needed for maximization. This is also called the variational interpretation of EM [33]. However, if we choose Gibbs sampling for the state inference, then we do not have closed-form expressions but samples. In this case, we perform Monte Carlo-related EM methods as follows [32].
• Start with an initial \hat{\Omega}^{(0)}.
• For k = 1, 2, \ldots
— Simulation: Draw N_k samples (\hat{h}, \hat{\sigma}_h^2)^{(k,1)}, \ldots, (\hat{h}, \hat{\sigma}_h^2)^{(k,N_k)} from p(h, \sigma_h^2|y; \Theta^{(k-1)}).
— Maximization: Compute the \Theta^{(k)} that maximizes \hat{\Omega}^{(k)}, where

\hat{\Omega}^{(k)} = \frac{1}{N_k} \sum_{n=1}^{N_k} \log p\!\left(y, (\hat{h}, \hat{\sigma}_h^2)^{(k,n)} \,\big|\, \Theta^{(k-1)}\right).

Note that the approximation for any sufficient statistic of (h, σ_h²), S_f^{(k)} = \langle f(h, \sigma_h^2) \rangle_{p(h, \sigma_h^2|y; \Theta^{(k-1)})}, is

\hat{S}_f^{(k)} = \frac{1}{N_k} \sum_{n=1}^{N_k} f\!\left((\hat{h}, \hat{\sigma}_h^2)^{(k,n)}\right).

Constant or increasing N_k are reasonable choices. Another valid approach is to maximize (1 - \nu_k)\hat{\Omega}^{(k-1)} + \nu_k \hat{\Omega}^{(k)} over Θ for \{\nu_k\}_{k \ge 1} > 0, \nu_1 = 1, and \nu_{k+1} \le \nu_k \le 1 [32], which leads to the stochastic approximation

\hat{S}_f^{(k)} = (1 - \nu_k)\hat{S}_f^{(k-1)} + \nu_k \frac{1}{N_k} \sum_{n=1}^{N_k} f\!\left((\hat{h}, \hat{\sigma}_h^2)^{(k,n)}\right).
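The stochastic-approximation recursion above can be written generically as a one-line update. This is a toy sketch: `f` is any sufficient statistic, and the sample list stands in for posterior draws from the Gibbs sampler, which is not shown.

```python
import numpy as np

def sa_update(S_prev, samples, f, nu_k):
    """One stochastic-approximation step:
    S_k = (1 - nu_k) * S_prev + nu_k * (1/Nk) * sum_n f(sample_n)."""
    mc_estimate = np.mean([f(s) for s in samples], axis=0)
    return (1.0 - nu_k) * S_prev + nu_k * mc_estimate

# Toy usage with the decreasing step size nu_k = 1/k (nu_1 = 1, as required):
S = 0.0
for k in range(1, 6):
    samples = [2.9, 3.0, 3.1]   # stand-in for posterior draws at iteration k
    S = sa_update(S, samples, lambda x: x, 1.0 / k)
```

With `nu_1 = 1`, the initial `S_prev` is discarded at the first step, and later steps average new draws into the running statistic with decreasing weight.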

Fig. 3. Illustration of the relaxation effect of the inverse-gamma model: log-priors and log-posteriors (superimposed on the log-priors) are shown for both models with respect to y = 2h_1 + h_2 + \sigma_v v (y = 3, \sigma_0^2 = 0.0004, \sigma_1^2 = 4, \lambda = 0.03, \alpha = 0.5, \beta = 0.0001, and \sigma_v^2 = 0.05). (a) Bernoulli–Gaussian log-prior. (b) Inverse-gamma log-prior. (c) Bernoulli–Gaussian log-posterior. (d) Inverse-gamma log-posterior.

C. Relaxing Effect of the Inverse-Gamma Prior

So far, we have not discussed analytically why we have chosen the inverse-gamma distribution as our prior for the sample variances \sigma_{h,t}^2. The reason for our choice is that the resulting inference problems tend to be easier, since the inverse-gamma assumption leads to smoother distributions and objective functions. We can see this better when we compare our model for sparsity with another well-known one, the Bernoulli–Gaussian model. In this model, the h_t's are assumed to be i.i.d. Bernoulli–Gaussian random variables:

h_t \sim N(h_t; 0, \sigma_{c_t}^2), \quad c_t \in \{0, 1\}, \quad \Pr(c_t = 1) = \lambda

where Pr(A) denotes the probability of the event A.

The relaxing effect of the inverse-gamma model is illustrated in Fig. 3. Fig. 3(a) shows the log-prior distribution of [h_1, h_2] if p(h_i) is Bernoulli–Gaussian with parameters \sigma_0^2 = 0.0004, \sigma_1^2 = 4, and \lambda = 0.03, whereas Fig. 3(b) is the log-prior of [h_1, h_2] when they are normal random variables with their variances coming from the inverse-gamma distribution with parameters \alpha = 0.5 and \beta = 0.0001. Note that the two priors produce qualitatively similar traces. Fig. 3(c) and (d) show the log-posteriors of [h_1, h_2] given y for the Bernoulli–Gaussian and inverse-gamma models, respectively. Here, the system admits y = 2h_1 + h_2 + \sigma_v v, with y = 3 and \sigma_v^2 = 0.05. The log-posteriors are illustrated superimposed on the log-prior distributions for the sake of clarity.
It can be seen from the bottom figures that the posterior in the inverse-gamma model is much smoother than the one in the Bernoulli–Gaussian model. Roughly speaking, if we start searching for the [h_1, h_2] that maximize the posterior from a point near (0, 3), we are much more likely to get stuck in that local maximum in the Bernoulli–Gaussian model. A more detailed comparison between these two models is available in [34].

V. Experiments and Results

A. Experiments With Simulated Data

We have already stated the disadvantages of the iterative deconvolution method in Section I and claimed that a Bayesian approach might improve the estimates in cases where there is noise and the complete convolution signal is not available. Before dealing with real data, we will now support those theoretical arguments with simulation experiments. Recall that one of the disadvantages is that iterative deconvolution assumes no observation noise, which may lead to wrong estimates in the noisy convolution case. Because of its observation noise assumption, we therefore expect better results from Bayesian deconvolution. In our first experiment, we aim to compare the performances of iterative deconvolution and Bayesian deconvolution under noise. To do this, we evaluated the normalized mean squared error (nMSE)

Fig. 4. nMSE versus SNR values for iterative deconvolution and Bayesian deconvolution.

Fig. 5. nMSE versus number of truncated samples from the observation signal for iterative deconvolution and Bayesian deconvolution.

values for the two types of deconvolution using different signal-to-noise ratios (SNRs). The nMSE for an estimate \hat{x} of x is given by

\mathrm{nMSE}_x(\hat{x}) = \frac{\sum_t (x_t - \hat{x}_t)^2}{\sum_t x_t^2}.   (24)

By SNR, we mean the ratio obtained by dividing the power of r by the power of the observation noise, which is \sigma_v^2 in our case. In order to make an objective comparison, we generate h using a different model for sparsity, namely the Bernoulli–Gaussian model. We averaged our results over 100 simulations. For each simulation:
• we generated a sparse process of length T_h = 50 with parameters \sigma_0^2 = 0.01, \sigma_1^2 = 1, and \lambda = 0.05;
• the source signal s of length T_s = 50 was generated randomly;
• for SNR values −10, −9, \ldots, 50:
— we generated the observation noise with power corresponding to the current SNR and added it to the convolution to obtain the noisy convolution y;
— we performed Bayesian deconvolution and iterative deconvolution separately for the estimation of r;
— for each result, we calculated the nMSE in the estimation of r.

Fig. 4 shows the nMSEs versus the SNR values for both deconvolution types. It can be observed that the iterative deconvolution method exhibits larger errors as the SNR decreases. Bayesian deconvolution suffers from noise, too; however, it is still more reliable than iterative deconvolution. Moreover, Bayesian deconvolution performs even better when there is very little, almost no, noise. That is due to the fact that in Bayesian deconvolution we can introduce prior knowledge about h, whereas in iterative deconvolution the algorithm does not take the statistical properties of h into account.
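The error measure in (24) is straightforward to compute; a minimal sketch:

```python
import numpy as np

def nmse(x, x_hat):
    """Normalized mean squared error of the estimate x_hat of x, as in (24)."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    return np.sum((x - x_hat) ** 2) / np.sum(x ** 2)
```

An exact estimate gives an nMSE of 0, while the all-zero estimate gives exactly 1; this is why nMSE values near or above one indicate that an estimate is no better than reporting zero everywhere.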
Another disadvantage of iterative deconvolution is that it cannot handle missing data correctly. The missing data problem in deconvolution usually occurs when the convolution signal is truncated, that is, when the length of the convolution is less than the sum of the lengths of the convolving components. Indeed, we have such a problem in our earthquake data, where the source signal, the observation signal, and the receiver function are very close in length, say, 500, 500, and 450 samples, respectively. (The reason for these numbers will become apparent in Section V-B.) This means that almost half of the convolution data are missing, which is a serious problem for iterative deconvolution. The Bayesian deconvolution method, however, is based on a generative model, and the missing data case can be introduced to the model by suitable choices of the lengths of the vectors s, h, and y. We can show this with a simulation experiment in which we obtain nMSE values versus the number of truncated samples in the convolution. The steps and the system parameters of the experiment are the same as those above; the only difference is that we truncate the observation signal instead of adding noise to it. Fig. 5 shows the result of the experiment. It is obvious that Bayesian deconvolution keeps estimating h with an acceptable error even if the length of the observation signal is equal to that of the source signal, whereas in the iterative deconvolution method the nMSE approaches one, and even exceeds it, as the observation signal gets shorter.

Fig. 6. Mean estimates of r, parameter estimates, and likelihood functions through the iterations: d = 2, SNR = 5.

Fig. 6 shows the results for the mean estimate of the sparse trace, as well as the parameter estimates and likelihood functions through the iterations, both when variational Bayes and when Gibbs sampling are used in the E-step of the EM algorithm, for a single experiment.
The likelihood functions at iteration k are approximated by Monte Carlo integration as follows. We draw particles h^{(n)} from p(h|\Theta^{(k)}) for n = 1, \ldots, N_k for some large N_k, evaluate the function f^{(n)}(y) = p(y|h^{(n)}, \Theta^{(k)}), and then calculate

\log p(y|\Theta^{(k)}) \approx \log \frac{1}{N_k} \sum_{n=1}^{N_k} f^{(n)}(y).

It can be observed

that around 100 EM iterations are enough for convergence of the parameter estimates α, β, and \sigma_v^2 for both methods. It is more difficult to sample from an inverse-gamma distribution when the shape parameter α is small, particularly when it is less than one; that is why we observe a slight decrease in the likelihood function for the Gibbs sampling method. Nevertheless, the mean estimates of the sparse trace are too close to distinguish. That is why we used only the variational Bayes technique in the E-step of Bayesian deconvolution in the experiments above. For the same reason, we used only variational Bayes for our experiments on real data.

Fig. 7. Receiver function estimates of iterative deconvolution and Bayesian deconvolution for the station LOD0. (a) Iterative deconvolution. (b) Bayesian deconvolution.

B. Experiments With Real Earthquake Data

We have tested the method using data from the National Earthquake Monitoring Center (UDIM) of Boğaziçi University Kandilli Observatory and Earthquake Research Institute. We chose three broadband stations, 1) BNN0, 2) LOD0, and 3) KARS, from western, central, and eastern Turkey, respectively. All earthquakes that occurred between 2005 and 2007 at a distance between 30° and 90° and with a magnitude greater than 5.5 were selected. The horizontal components were rotated to obtain the radial component according to the back-azimuth angle of each earthquake. Recordings with low SNR were eliminated based on the level of polarization of the vertical and radial components, which is tested using the ratio of the eigenvalues of the covariance matrix [15]. This step rejects nearly 30% of the total number of receiver functions, and the final number is reduced to about 150. The receiver functions corresponding to the data were already estimated by Özakın and Aktar [15], who applied iterative deconvolution to the records.
We apply Bayesian deconvolution to the data and compare our results with those obtained in [15]. The frequency at which the Gaussian filter G(2πf) has a value of 0.1 is determined as f = 1.2 Hz for these data, which can be realized by selecting a = 2.5 in (4). The corresponding essential width of the Gaussian filter is then approximately 1 s. The original data sampling rate is 50 Hz, but since it is known that no information is contained in frequencies higher than 5 Hz, we resample the data to 5 Hz before applying Bayesian deconvolution. Therefore, the length Tg of the Gaussian filter is chosen as 5. To compare the results of iterative deconvolution and Bayesian deconvolution, the results obtained for each station are plotted as follows: first, the records are sorted with respect to their back-azimuth values, i.e., the angle of direction from the receiver to the source of the earthquake [42]. Then, all of the estimated receiver functions belonging to that station are stacked in top-to-bottom order. For visual clarity, the positive and negative parts of each receiver

Fig. 8. Receiver function estimates of iterative deconvolution and Bayesian deconvolution for the station BNN0. (a) Iterative deconvolution. (b) Bayesian deconvolution.

function estimate are shaded with different colors. In addition, all of the receiver function estimates are scaled by a constant to improve the visual quality of the images. In fact, there is almost no analytical criterion that can be used to determine and compare the practical quality of our results. In a hypothetical situation where all earthquakes occur at a fixed location on Earth, all receiver functions are expected to be equal, and all pulses, both the direct ones and the ones generated at boundaries (main pulses as well as reflected ones), are expected to align on a vertical line. However, in the real situation, where earthquakes are scattered around the globe, the vertical alignment can be observed only over a limited part of the back-azimuth range. The deviation from verticality is due both to the nonhorizontality of the reflecting layers and to the variation of the incidence angle with earthquake distance. The latter can be removed by applying a correction term to each receiver function, since the distance of each earthquake is known with good accuracy. The nonhorizontality of reflective layers produces different reflection arrivals for waves coming from different azimuths. This causes the pulses to align along a gently undulating curve instead of a straight line. This undulation is often used as the key to estimating the dipping angle of the subsurface layers. It is, therefore, of prime importance to be able to locate all coherent pulses in the receiver functions with accuracy. The performance of a deconvolution technique is therefore directly related to its ability to reveal these coherent pulses.
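The filter settings quoted earlier (a = 2.5 giving G(2πf) = 0.1 at f ≈ 1.2 Hz, with an essential width of about 1 s) can be verified with a short computation, assuming the Gaussian filter takes the common receiver-function form G(2πf) = exp(−(2πf)²/(4a²)); the 0.1-level width definition is likewise an assumption consistent with the quoted numbers:

```python
import numpy as np

def corner_frequency(a, level=0.1):
    """Frequency (Hz) at which G(2*pi*f) = exp(-(2*pi*f)^2 / (4*a^2))
    drops to `level`; solving gives f = (a / pi) * sqrt(-ln(level))."""
    return (a / np.pi) * np.sqrt(-np.log(level))

def essential_width(a, level=0.1):
    """Full time width (s) over which the impulse-response envelope
    exp(-(a*t)^2) stays above `level`: 2 * sqrt(-ln(level)) / a."""
    return 2.0 * np.sqrt(-np.log(level)) / a
```

With a = 2.5, this gives f ≈ 1.21 Hz and a width of about 1.2 s, matching the values used for the data.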
In this paper, we do not attempt a detailed modeling of the crustal structure, as would be the case in a standard geophysical survey. Instead, we focus on comparing the two deconvolution methods with respect to their performance in detecting clear and coherent reflection pulses in the final receiver functions. In Turkey, over the last five years, most of the recorded distant earthquakes originated from two geographical locations, the Kurile–Japan subduction (KJS) zone and the Java–Sumatra subduction (JSS) zone, each of which can be assumed to have a fixed azimuth and a fixed distance. We therefore expect that, regardless of the horizontality of the subsurface layers, the receiver functions from any seismic station in Turkey should show a vertical alignment close to a straight line at the azimuths corresponding to these two subduction zones. This a priori knowledge is used as a checkpoint for the coherency of the receiver functions. Figs. 7–9 show the results of the iterative deconvolution and the Bayesian deconvolution methods for the three stations. Almost all receiver functions, independently of the recording station and the method used, show two azimuth ranges where a vertical lineament of pulses is clearly observed (receiver functions 10–60 and 90–110). This confirms the two subduction zones, KJS and JSS, as the main suppliers of the source

Fig. 9. Receiver function estimates of iterative deconvolution and Bayesian deconvolution for the station KARS. (a) Iterative deconvolution. (b) Bayesian deconvolution.

events and, therefore, constitutes a verification of the validity of the receiver functions. Furthermore, the comparison of the two alternative methods for each station, namely the existence and position of similar pulses in both cases, shows that the results of the Bayesian deconvolution are consistent with those of iterative deconvolution. This is an important test for the proposed method, since iterative deconvolution is a well-proven procedure tested with many types of data from different origins. A close inspection of the vertical patterns produced by both methods reveals that the Bayesian deconvolution results have some clear advantages in terms of detection capability. In all the examples, one can see that the Bayesian deconvolution results reveal extra vertical lines that are hardly perceptible, or even nonexistent, in the iterative deconvolution results. A clear example of this feature is receiver functions 30–50 in Fig. 7. The image on the right-hand side, obtained through Bayesian deconvolution, shows a clear vertical alignment of pulses at around 8 s, which is not seen in the image on the left-hand side, obtained by iterative deconvolution. A similar situation can be observed in the receiver function image of the KARS station. The Bayesian deconvolution reveals three primary pulses at 5.5, 12, and 19 s for waveforms 30–100. The first two pulses are known as the primary Moho conversion (Ps) and reflection (PpPs) and are important in the sense that they constitute the main basis for the depth estimation of the crust–mantle boundary [6]. We notice that only the second one (at 12 s) is clearly seen in the iterative deconvolution case, while the other two are blurred by noise.
We also note that the vertical line patterns in the images of the Bayesian deconvolution are straighter than those from the iterative deconvolution. This difference can easily be seen in the figures for the stations LOD0 and KARS. We interpret this to mean that, in low-SNR environments, the Bayesian deconvolution is more robust in terms of the phase estimation of pulses. A corollary of this feature is the improved pulse-separation ability of the Bayesian method. This can be clearly illustrated at station BNN0 (Fig. 8), where receiver functions 1–120 have a primary conversion (Ps) at 4 s, while this conversion jumps to 5.0 s for the later ones, 120–180. This can possibly be due to the existence of different boundary depths when looking from two different azimuth angles. More interestingly, we observe a narrow intermediate transition zone where both pulses can be seen in the same receiver functions (110–120). This corresponds to a crustal transition zone where both boundaries coexist. This interesting feature is completely missed by the iterative deconvolution, whose resolution capability is too low to detect such details.

VI. CONCLUSION

The overall performance of the Bayesian deconvolution suggests a potential improvement in the deconvolution process for receiver-function estimation. It offers a promising tool for the reevaluation of low-SNR data, which in turn will not only shorten the waiting period for significant earthquakes but will also allow the study of certain azimuth angles where only low-magnitude events were observed and were therefore discarded until now. The computational load of Bayesian deconvolution is many orders of magnitude larger (about 10^4) than that of the conventional methods. This, however, is not expected to be a serious limitation considering the cost of the alternative, which consists of extending the duration of the field deployment. We have stated that we expect to see straight vertical lines of pulses when the receiver functions are arranged top to bottom with respect to their back-azimuth values. Hence, given a set of receiver functions ordered with respect to their back-azimuth values, successive receiver functions should look alike. It would be interesting to exploit this similarity by using the current receiver function estimate to initialize the estimation of the next one. A more principled method to exploit this structure can be based on a Markov chain between the locations of reflections in successive receiver functions; for a similar approach in reflectivity sequence estimation, see [21] and [22]. The promising results in the Bayesian deconvolution of seismological signals open the door for further research on receiver function analysis in a Bayesian framework, such as blind deconvolution of receiver functions when the source is not known.
Though it is known that classical inference methods fail to give accurate estimates when the source is very long [43], more sophisticated inference techniques, such as sequential Monte Carlo samplers [44], can be applied to this challenging problem.

ACKNOWLEDGMENT

The authors would like to thank UDIM of Boğaziçi University Kandilli Observatory and Earthquake Research Institute for supplying the earthquake data and K. Davaslıoğlu for coding some of the software.

REFERENCES

[1] C. J. Ammon, An Overview of Receiver Function Analysis. [Online]. Available: http://eqseis.geosc.psu.edu/cammon/HTML/RftnDocs/rftn01.html
[2] L. J. Burdick and C. A. Langston, "Modeling crustal structure through use of converted phases in teleseismic body-wave forms," Bull. Seismol. Soc. Amer., vol. 67, no. 3, pp. 677–691, Jun. 1977.
[3] C. A. Langston, "Effect of planar dipping structure on source and receiver responses for constant ray parameter," Bull. Seismol. Soc. Amer., vol. 67, no. 4, pp. 1029–1050, Aug. 1977.
[4] R. A. Phinney, "Structure of the Earth's crust from spectral behavior of long-period body waves," J. Geophys. Res., vol. 69, no. 14, pp. 2997–3017, 1964.
[5] C. A. Langston, "Structure under Mount Rainier, Washington, inferred from teleseismic body waves," J. Geophys. Res., vol. 84, no. B9, pp. 4749–4762, 1979.
[6] C. Ammon, "A comparison of deconvolution techniques," Lawrence Livermore Nat. Lab., Livermore, CA, Rep. UCID–ID–111667, 1992.
[7] J. Park and V. Levin, "Receiver functions from multiple-taper spectral correlation estimates," Bull. Seismol. Soc. Amer., vol. 90, no. 6, pp. 1507–1520, Dec. 2000.
[8] C. J. Ammon, "The isolation of receiver effects from teleseismic P-waveforms," Bull. Seismol. Soc. Amer., vol. 81, no. 6, pp. 2504–2510, Dec. 1991.
[9] M. Bostock and M. Sacchi, "Deconvolution of teleseismic recordings for mantle structure," Geophys. J. Int., vol. 129, no. 1, pp. 143–152, Apr. 1997.
[10] H. Gurrola, G. Baker, and J. Minster, "Simultaneous time-domain deconvolution with application to the computation of receiver functions," Geophys. J. Int., vol. 120, no. 3, pp. 537–543, Mar. 1995.
[11] A. F. Sheehan, G. A. Abers, C. H. Jones, and A. Lerner-Lam, "Crustal thickness variations across the Colorado Rocky Mountains from teleseismic receiver functions," J. Geophys. Res., vol. 100, no. B10, pp. 20391–20404, Oct. 1995.
[12] J. P. Ligorría and C. J. Ammon, "Iterative deconvolution and receiver-function estimation," Bull. Seismol. Soc. Amer., vol. 89, no. 5, pp. 1395–1400, Oct. 1999.
[13] J. Park and V. Levin, "Receiver functions from regional P waves," Geophys. J. Int., vol. 147, no. 1, pp. 1–11, Sep. 2001.
[14] M. Kikuchi and H. Kanamori, "Inversion of complex body waves," Bull. Seismol. Soc. Amer., vol. 72, no. 2, pp. 491–506, Apr. 1982.
[15] Y. Özakın and M. Aktar, "The Moho topography of Turkey inferred from receiver functions," in Proc. ESC Gen. Assem., Crete, Greece, 2008.
[16] A. Gelman, J. Carlin, H. S. Stern, and D. B. Rubin, Bayesian Data Analysis. London, U.K.: Chapman & Hall, 1995.
[17] J. M. Mendel, F. Kormylo, F. Aminzadeh, J. S. Lee, and F. Habibi-Ashrafi, "A novel approach to seismic signal processing and modeling," Geophysics, vol. 46, no. 10, pp. 1398–1414, Oct. 1981.
[18] C. Labat and J. Idier, "Sparse blind deconvolution accounting for time-shift ambiguity," in Proc. IEEE ICASSP, 2006, pp. 616–619.
[19] O. Cappé, A. Doucet, M. Lavielle, and E. Moulines, "Simulation based methods for blind maximum likelihood filter identification," Signal Process., vol. 73, no. 1/2, pp. 3–25, Jan. 1999.
[20] O. Rosec, J. M. Boucher, B. Nsiri, and T. Chonavel, "Blind marine seismic deconvolution using statistical MCMC methods," IEEE J. Ocean. Eng., vol. 28, no. 3, pp. 502–512, Jul. 2003.
[21] J. Idier and Y. Goussard, "Multichannel seismic deconvolution," IEEE Trans. Geosci. Remote Sens., vol. 31, no. 5, pp. 961–979, Sep. 1993.
[22] A. Heimer and I. Cohen, "Multichannel seismic deconvolution using Markov–Bernoulli random field modeling," IEEE Trans. Geosci. Remote Sens., vol. 47, no. 7, pp. 2047–2058, Jul. 2009.
[23] B. Nsiri, T. Chonavel, J.-M. Boucher, and H. Nouze, "Blind submarine seismic deconvolution for long source wavelets," IEEE J. Ocean. Eng., vol. 32, no. 3, pp. 729–743, Jul. 2007.
[24] A. Larue, M. Van Der Baan, J. Mars, and C. Jutten, Sparsity or Whiteness: What Criterion to Use for Blind Deconvolution of Seismic Data?, Society of Exploration Geophysicists, France, 2005. [Online]. Available: http://hal.archives-ouvertes.fr/hal-00097184/en/
[25] A. T. Cemgil, C. Fevotte, and S. J. Godsill, "Variational and stochastic inference for Bayesian source separation," Dig. Signal Process., vol. 17, no. 5, pp. 891–913, Sep. 2007.
[26] A. T. Cemgil, C. Fevotte, and S. J. Godsill, "Blind separation of sparse sources using variational EM," in Proc. 13th Eur. Signal Process. Conf., EURASIP, Antalya, Turkey, 2005.
[27] C. Fevotte and S. Godsill, "A Bayesian approach for blind separation of sparse sources," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 6, pp. 2174–2188, Nov. 2006.
[28] C. Fevotte, Bayesian Audio Source Separation. New York: Springer-Verlag, 2007, ch. 11, pp. 305–335.
[29] D. Tzikas, A. Likas, and N. Galatsanos, "Variational Bayesian blind image deconvolution with Student-t priors," in Proc. IEEE ICIP, 2007, pp. 109–112.
[30] M. Lavielle, "A stochastic algorithm for parametric and non-parametric estimation in the case of incomplete data," Signal Process., vol. 42, no. 1, pp. 3–17, Feb. 1995.
[31] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-6, no. 6, pp. 721–741, Nov. 1984.
[32] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models. New York: Springer-Verlag, 2005.
[33] M. Beal, "Variational algorithms for approximate Bayesian inference," Ph.D. dissertation, Gatsby Comput. Neurosci. Unit, Univ. College London, London, U.K., 2003.
[34] S. Yıldırım, A. T. Cemgil, and A. B. Ertüzün, "A hybrid method for deconvolution of Bernoulli–Gaussian processes," in Proc. IEEE ICASSP, 2009, pp. 3417–3420.
[35] D. S. Everitt, The Cambridge Dictionary of Statistics. Cambridge, U.K.: Cambridge Univ. Press, 2006.

[36] D. MacKay, Information Theory, Inference and Learning Algorithms. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[37] W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, Markov Chain Monte Carlo in Practice. London, U.K.: Chapman & Hall, 1996.
[38] R. E. Caflisch, Monte Carlo and Quasi-Monte Carlo Methods. Cambridge, U.K.: Cambridge Univ. Press, 1998.
[39] T. Cover and J. Thomas, Elements of Information Theory. Hoboken, NJ: Wiley, 1991.
[40] J. A. Snyman, Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms. New York: Springer-Verlag, 2005.
[41] T. J. Ypma, "Historical development of the Newton–Raphson method," SIAM Rev., vol. 37, no. 4, pp. 531–551, Dec. 1995.
[42] S. Stein and M. Wysession, An Introduction to Seismology, Earthquakes, and Earth Structure. New York: Wiley-Blackwell, 2003.
[43] S. Yıldırım, "Bayesian methods for deconvolution of sparse processes," M.S. thesis, Boğaziçi Univ., Istanbul, Turkey, 2009.
[44] P. Del Moral, A. Doucet, and A. Jasra, "Sequential Monte Carlo samplers," J. R. Stat. Soc., vol. 68, no. 3, pp. 411–436, Jun. 2006.

Sinan Yıldırım was born in Istanbul, Turkey, in 1984. He received the B.S. and M.S. degrees in electrical and electronics engineering from Boğaziçi University, Istanbul, Turkey. He is currently working toward the Ph.D. degree in the Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge, U.K. His current research interests are in the areas of Bayesian inference, Markov chain Monte Carlo methods, and sequential Monte Carlo methods.

A. Taylan Cemgil (M'95) received the B.Sc. and M.Sc. degrees in computer engineering from Boğaziçi University, Istanbul, Turkey, and the Ph.D. degree from Radboud University Nijmegen, Nijmegen, The Netherlands, in 2004. He worked as a Postdoctoral Researcher at the University of Amsterdam, Amsterdam, The Netherlands, on vision-based multi-object tracking and as a Research Associate at the Signal Processing and Communications Laboratory, University of Cambridge, Cambridge, U.K., on Bayesian signal processing. He is currently an Assistant Professor with Boğaziçi University, where he cultivates his interests in machine learning methods, stochastic processes, and statistical signal processing. He also acts as a consultant to industry and the professional audio community. His research is focused on developing computational techniques for audio, music, and multimedia processing.

Mustafa Aktar was born in Istanbul, Turkey, in 1952. He received the B.Sc. degree in electronics (with honors) from the University of Manchester Institute of Science and Technology, Manchester, U.K., in 1976, and the M.Sc. and Ph.D. degrees in electrical engineering from Boğaziçi University, Istanbul, Turkey, in 1978 and 1984, respectively. Since 1996, he has been with the Kandilli Observatory and Earthquake Research Institute of Boğaziçi University, where he is currently a Professor and Head of the Department of Geophysics. He has authored and coauthored 26 papers in scientific journals and more than 90 papers in proceedings. His current research interests are in the areas of seismology, in particular source mechanisms of earthquakes, earthquake-generating processes, analysis of crustal-lithospheric structures, and tectonic processes. Dr. Aktar is a member of the American Geophysical Union (AGU), the Seismological Society of America (SSA), and the Chamber of Geophysical Engineers of Turkey (JMO).

Yaman Özakın was born in Istanbul, Turkey, in 1980. He received the B.S. degree in physics and the M.S. degree in geophysics from Boğaziçi University, Istanbul, Turkey. He is currently working toward the Ph.D. degree in geophysics at the University of Southern California, Los Angeles. His current research interests are in the areas of seismology and artificial neural networks.

Ayşın Ertüzün (M'94) was born in Salihli, Turkey, in 1959. She received the B.S. degree (with honors) from Boğaziçi University, Istanbul, Turkey, in 1981, the M.Eng. degree (with first class standing) from McMaster University, Hamilton, ON, Canada, in 1984, and the Ph.D. degree from Boğaziçi University, Istanbul, Turkey, in 1989, all in electrical engineering. Since 1988, she has been with the Department of Electrical and Electronic Engineering, Boğaziçi University, where she is currently a Professor of electrical and electronic engineering. She has authored and coauthored more than 70 scientific papers in journals and conference proceedings. Her current research interests are in the areas of independent component analysis and its applications, blind signal processing, Bayesian methods, wavelets, and adaptive systems with applications to communication systems, image processing, and texture analysis. Dr. Ertüzün is a member of the IEEE Signal Processing and Communications Societies, the International Association for Pattern Recognition (IAPR), the Institute of Electronics, Information and Communication Engineers (IEICE), and the Turkish Pattern Recognition and Image Processing Society (TOTIAD).

