O B E , A T C , H E * clusters foropticalcharacterizationofnanoparticle ApproximateBayesiancomputationtechniques

(1)

1

Approximate Bayesian computation techniques

2

for optical characterization of nanoparticle

3

clusters

4 OZAN BURAK ERICOK,¹ ALI TAYLAN CEMGIL,² AND HAKAN ERTURK^1,*

5 ¹Department of Mechanical Engineering, Bogazici University, 34342 Bebek, Istanbul, Turkey 6 ²Department of Computer Engineering, Bogazici University, 34342 Bebek, Istanbul, Turkey 7 *Corresponding author: [email protected]

8 Received 8 May 2017; revised 6 November 2017; accepted 10 November 2017; posted 10 November 2017 (Doc. ID 295428);

9 published 0 MONTH 0000

10 Characterization of nanoparticle aggregates from observed scattered light leads to a highly complex inverse 11 problem. Even the forward model is so complex that it prohibits the use of classical likelihood-based inference 12 methods. In this study, we compare four so-called likelihood-free methods based on approximate Bayesian com- 13 putation (ABC) that requires only numeric simulation of the forward model without the need of evaluating a

14 likelihood. In particular, rejection, Markov chain Monte Carlo, population Monte Carlo, and adaptive popu-

15 lation Monte Carlo (APMC) are compared in terms of accuracy. In the current model, we assume that the nano-

16 particle aggregates are mutually well separated and made up of particles of same size. Filippov’s particle-cluster

17 algorithm is used to generate aggregates, and discrete dipole approximation is used to estimate scattering behav- 18 ior. It is found that the APMC algorithm is superior to others in terms of time and acceptance rates, although all 19 algorithms produce similar posterior distributions. Using ABC techniques and utilizing unpolarized light 20 experiments at 266 nm wavelength, characterization of soot aggregates is performed with less than 2 nm deviation 21 in nanoparticle radius and 3–4 deviation in number of nanoparticles forming the monodisperse aggregates.

OCIS codes: (000.5490) Probability theory, stochastic processes, and statistics; (290.0290) Scattering; (290.3200) Inverse 24 scattering; (160.4236) Nanomaterials.

25

https://doi.org/10.1364/JOSAA.99.099999

26 1. INTRODUCTION

27 Developments in nanotechnology have brought new opportu- 28 nities to various fields from medicine [1] to combustion 29 diagnosis [2,3]. Synthesizing nanostructures with unique opti- 30 cal and physical properties necessitates new and improved 31 characterization tools, as manufacturing unique structures 32 with desired properties is only possible through using proper 33 characterization.

34 There are different techniques for characterization of the

35 size, shape, and configuration of the nanostructures.

36 Although the direct methods relying on microscopy such as

37 transmission electron microscopy (TEM), scanning electron 38 microscopy (SEM), and atomic force microscopy (AFM) are 39 capable of producing high-quality images and are considered

40 as the most reliable means of characterization, they usually

41 are not suitable for in-line monitoring. Recently, characteriza-

42 tion of the nanostructures based on their emission profiles

43 via time-resolved laser-induced incandescence (TR-LII) and

scattering profiles via light scattering have been gaining interest. 44

While these methods have in situ characterization capability, 45

the light-scattering methods might be preferable over 46

TR-LII methods, as they are not only limited to absorbing 47

materials. 48

There are different methods to predict scattering behavior of 49

aggregates in the literature. Rayleigh–Debye–Gans (RDG) 50

approximation [4] and RDG for fractal aggregates (RDG- 51

FA) [5,6], are often used when the scatterer is irregularly 52

shaped and smaller than the wavelength. RDG-FA is valid if 53

jm − 1j ≪ 1 and kjm − 1jdp≪ 1, where m, k, and dp are 54

the refractive index, wavenumber, and primary particle diam- 55

eter of aggregate. These approximations are often used to esti- 56

mate the optical properties of combustion-generated particles, 57

as in [7] and [8]. However, Klusek et al. [9] stated that soot 58

aggregates cannot be treated as Rayleigh scatterers and Köylü 59

and Faeth [10] stated that RDG-FA theory cannot be used for 60

soot aggregates with large refractive index. Besides these, they 61

1

2

(2)

62 neglect the multiple and self-induced scattering effects and the

63 error in angular scattering properties such as S₁₁and S₁₂, which

64 can be as high as 50% for large aggregates [11]. The T-matrix

65 method [12,13,14], on the other hand, is a fast and accurate

66 method to model scattering of light by aggregates. Although

67 there is no limitation on primary particle size, the T-matrix

68 method is limited to nonoverlapping spheres. In reality, aggre-

69 gates may overlap, and overlapping has an impact on scattering

70 properties [15]. Discrete dipole approximation (DDA), pro-

71 posed by Purcell and Pennypacker [16], is another method that

72 is often used in the literature. DDA is a semianalytical volume

73 integral method that can handle the most complex geometries

74 by representing them with a finite set of interacting dipoles.

75 Through the interactions of these dipoles, DDA can easily cal-

76 culate the optical behavior of any geometry with any morphol- 77 ogy. Though it can handle very complex geometries accurately,

78 it is slow compared to approximate methods [17].

79 Applications of light-scattering methods for soot characteri- 80 zation is a well-studied problem in the literature. RDG-FA 81 theory is one of the most frequently used light-scattering meth- 82 ods. For example, Reimannet al. [18] combined elastic light- 83 scattering techniques with LII, whereas Huberet al. [19] com-

84 bined wide angle light-scattering methods with the TR-LII

85 method to estimate the primary particle radius and radius of

86 gyration in the aggregate. Using the Mueller matrix for char-

87 acterization of nanoparticle aggregates is another approach.

88 Mengüç and Manickavasagam [20] outlined a procedure to ob- 89 tain the number of nanoparticles in the aggregate using polari- 90 zation information stored in elements such as S₁₂, S₃₃, and S₃₄ 91 if the primary particle diameter is known via anex situ mea- 92 surement such as SEM. In different studies, Charnigo et al.

93 used the normalized versions and derivatives of S₁₁, S₁₂,

94 S₃₃, and S₃₄to estimate the agglomeration level [21] and nano-

95 particle diameter [22]. The characterization problem is also for-

96 mulated as a parameter estimation problem of identifying the

97 average size and the number of nanoparticles forming the 98 aggregate that can be estimated through the solution of the op- 99 timization problem. More recently, Ericok and Erturk [23]

100 used only the nonpolarization information, S₁₁, to estimate 101 the number and radius of the nanoparticles in the aggregate 102 using the tabu search algorithm, while their analysis showed 103 that characterization is possible for aggregates larger than

104 20 nm using a light source at 266 nm. However, in all of these

105 deterministic methods, it is not possible to validate the results,

106 since the analysis focuses on matching the measured scattering

107 behavior rather than the characterized parameters directly.

108 Therefore, presenting the range of existence of the parameters 109 with a desired confidence level would be more appropriate for 110 characterization unless an independent experiment is designed 111 to measure these parameters directly.

112 Statistical inversion based on Bayesian inference is a prob- 113 abilistic approach that is capable of predicting the probability

114 distribution of the unknown parameters based on measure-

115 ments. The basic idea of the Bayesian inference relies on up-

116 dating the information about the unknown parameters based

117 on observations. Once the final probability distribution of 118 the unknown parameters is obtained, credible intervals with 119 a desired confidence level can be constructed.

The Bayesian inference approach is relatively new for 120 121 nanoparticle characterization. Burr et al. [24] used Bayesian inference to recover soot aggregate size distribution from 122 multiangle elastic light-scattering data using maximum 123 a posteriori (MAP) inference. Sipkens et al. [25] used the 124

TR-LII method to characterize gas-borne silicon nanoparticles 125

and analyzed their measurements with the Bayesian approach 126

to find the most probable nanoparticle size distribution param- 127

eters. Oteroet al. [26] studied the potentials of parametric and 128 nonparametric Bayesian schemes on the inverse problems of 129 light scattering. In another study, Clementi et al. [27] used 130 the Bayesian inversion method based on multidirectional 131 dynamic light-scattering measurements to estimate the particle 132 size distribution of latexes. Charnigoet al. [28] used different 133 elements of the Mueller matrix and their derivatives to 134

construct credible intervals for nanoparticle diameter and 135

agglomeration level. They introduced both systematic and sto- 136

chastic error in their analysis and utilized polarization informa- 137

tion via normalized S₁₁, S₁₂, S₃₃, and S₃₄for different cases. In 138 a more recent work, Hadwinet al. [29] used Bayesian inference 139 to calculate the MAP and credible intervals of soot volume frac- 140

tion and peak temperature for different likelihoods and prior 141

densities, whereas Huber et al. [30] used Bayesian inference 142

to estimate the size and morphology parameters of soot-laden 143

aerosols. Ericok and Erturk [31] used Bayesian inference for 144

characterization of the radius and number of nanoparticles 145

of the soot aggregates with different credible interval levels. 146

Computing the likelihood function is vital for Bayesian in- 147

ference. However, it is not always available in a closed analytical 148

form or feasible to compute due to complex physical models 149

that require too many parameters. In general, there are three 150

ways to overcome this problem: (1) nonparametric estimation 151

of the likelihood. In this method, one can run M auxiliary ex- 152

periments based on the parameters that affect the outcome. The 153

auxiliary experiments must be run at discrete points for a 154

single parameter, and intermediate points can be interpolated. 155

However, if the number of parameters is not small, this method 156

suffers from what is known as the “curse of dimensionality.” 157

(2) Parametric estimation of likelihood. One can assume an 158

analytical form, e.g., Gaussian or Poisson distributions, for like- 159

lihood. Although this method is very practical, it may misrep- 160

resent the physical reality. (3) Parametric estimation after 161

dimension reduction. This method does not use the whole out- 162

come of the system, but only its sufficient statistics. If a suffi- 163

cient statistics such as mean, standard deviation, etc. exists that 164

represents the outcome well, one may reduce the number of 165

unknowns and use parametric estimation on the reduced 166

unknowns. More information can be found in [28]. 167

One alternative way to cope with this problem is not using 168

likelihood altogether. If likelihood is not available or cannot be 169

obtained by the methods described above, Bayesian inference is 170

performed via what is known as approximate Bayesian compu- 171

tation (ABC) techniques. ABC consists of methods that do not 172

require the calculation of the likelihood function to obtain the 173

posterior distribution. The idea is to sample points from the 174

posterior distribution directly, if a model exists that relates 175

the parameters of interest to observable quantities. While 176

ABC methods are widely used in different fields such as 177

(3)

178 genetics [32], epidemiology [33], cosmology [34], and signal

179 detection theory [35], etc., they have not been used for solution 180 of inverse radiative transfer and optical characterization 181 problems.

182 In radiative transfer and optical characterization, Bayesian 183 inference methods are generally performed by assuming an ana-

184 lytical form to the likelihood function as in [24–31]. However,

185 most of the problems considered are complex, where the

186 physical behavior of the system cannot be represented by an

187 analytical expression, necessitating the use of numerical mod- 188 eling. For the analysis of such problems, using ABC methods is 189 a more reasonable approach. In this study, some of the ABC 190 methods will be introduced (for the first time, to our best 191 knowledge) and applied rigorously for characterization of nano- 192 particle aggregates to demonstrate the strengths and limitations 193 of the proposed methods.

194 2. ABC

195 The very fabric of the statistical inversion is to reconstruct the

196 solution of the inverse problem in terms of probability distri-

197 butions. The inverse problems generally comprise identifying 198 the parameters of interest that are related to the measured 199 quantities through a physical model. Likewise, the statistical 200 inversion theory takes all the information about the system that 201 is known to the observer prior to the measurement, as well as 202 the measurement itself, and estimates the probability distribu- 203 tions of the parameters of interest. Therefore, the solution of

204 the inverse problem is not a single set of parameters, but a prob-

205 ability distribution referred to as the posterior probability

206 distribution. The fundamental principles and some details of

207 the concepts can be found in [36] and [37].

208 In Bayesian statistical inference, all the variables are treated 209 as random variables, which are denoted by capital letters. The 210 observable random variable Y ∈ R^m, the measurement, and the 211 unobservable random variable Θ ∈ Rⁿ, the parameters of 212 interest, are related to each other by

Y f θ; E; (1)

213 where E ∈ R^k is the noise.

214 The conditional probability of θ given Y y_observed,

215 πθjy_observed, is the solution of the statistical inverse problem,

216 as it represents the probability distribution ofΘ θ based on

217 the measurement y_observed

πpostθ πθjyobserved R πprθπy_observedjθ

Rⁿπprθπy_observedjθdθ; (2) 218 whereπprθ is the prior density that contains all the knowledge 219 aboutθ before the measurement. The conditional probability 220 of random variable Y given Θ θ, πyjθ, is called the 221 likelihood function.

222 Equation (2) suggests that the posterior probability distribu- 223 tion,πpostθ, can easily be obtained if the analytical forms of

224 both the prior and the likelihood are available. The easiest part

225 in Eq. (2) is the prior density, since it is either known or

226 assumed by the researcher apart from the model. However, rep-

227 resenting the likelihood analytically can be impossible for some 228 cases, as in most of the computational or simulation-based 229 models. In these cases, the standard methods of Bayesian

estimation cannot be applied. As stated earlier, ABC methods 230 overcome this problem by not calculating the likelihood func- 231 tion but by sampling enough points from the posterior distri- 232 bution itself. In this study, four ABC algorithms–rejection 233 sampler, Markov chain Monte Carlo (MCMC), population 234

Monte Carlo (PMC) and adaptive population Monte Carlo 235

(APMC)—are considered. 236

237 A. Rejection Sampler

The first ABC algorithm introduced by Tavaréet al. [38] was a 238 simple rejection algorithm to estimate the common ancestor 239 times for DNA sequence data. Tavaré’s idea was extended 240

by Pritchardet al. [39]. First, a point, θ, is generated from 241

the prior density. Then, the artificial data set, x, associated with 242

the generated pointθis created using the simulation model, 243

f . They proposed that if the generated data set and the actual 244

data set are close enough,ρx; y < ϵ, then the point is sampled 245

from the posterior. In this inequality,ρ is a scalar metric, and ϵ 246

is the user-defined tolerance. If these steps are repeated until N 247

points are drawn from the posterior, a good approximate of the 248

posterior can be obtained without calculating the likelihood 249

function. Algorithm 1, presented below, is the rejection 250

algorithm proposed by Pritchard. Although it is simple to 251

implement, the rejection algorithm suffers from high rejection 252

rates—in other words, low acceptance rates of sampled 253

points. 254

256 Algorithm 1: ABC rejection sampler

257258 1: count 0

2: for i 1…N do 259

260 3: whileρx; y > ϵ do

261 4: count count 1

262 5: generateθ∼ πprθ

6: simulate x∼ f xjθ 263 7: storeθi θ 264

8: r_accept N ∕count 265

266 9: return.

B. MCMC 267

The main disadvantage of the ABC rejection sampler algorithm 268

is the high rejection rates caused by sampling points from the 269

prior distribution. Rejection sampler algorithm samples the 270

new point without using any information from the previously 271

sampled points. However, it is possible, in principle, to reduce 272

the rejection rates by utilizing the information from the previ- 273

ously sampled points. The MCMC algorithm samples a new 274

point by making a jump from the current point with the 275

assumption that the new point will possibly be near the last 276

sampled point. Thus, it eliminates the complete randomness 277

and makes use of available information. 278

The MCMC algorithm is similar to the Metropolis–Hasting 279

algorithm. The only difference is that there are no likelihood 280 calculations involved in MCMC. Like the Metropolis–Hasting 281

282 algorithm, MCMC starts with an initial point,θ⁰, sampled from the prior. Then, the new points are generated by a tran- 283 sition kernel, q, which relates the point from the previous MC 284

step,θⁱ⁻¹, to the new one,θ. Data set x is generated by using 285

θ, andρx; y is calculated. If ρx; y > ϵ, then θⁱ⁻¹is a better 286

(4)

287 approximation than θ; therefore no replacement is made.

288 Otherwise,θis accepted with probability, h, where h is a uni- 289 formly distributed random number between 0 and 1, U 0; 1.

290 It should be noted that the MCMC algorithm only uses the 291 last sampled point, not the ones that are sampled earlier.

292 Algorithm 2, given below, is the MCMC algorithm proposed 293 by Marjoramet al. [40]. More detailed discussions about the

294 algorithm can be found in [41] and [42].

296 Algorithm 2: ABC MCMC 297298 accepted 0

299 2: generateθ⁰∼ πprθ

300 for i 1…N do

301 4: generateθ∼ qθjθⁱ⁻¹;Σ

302 simulate x∼ f xjθ 303 6: ifρx; y > ϵ then

304 θⁱ θⁱ⁻¹

305 8: else

306 calculate h _π^π^pr^θ

prθⁱ⁻¹×^qθ_qθⁱ⁻¹jθⁱ⁻¹^jθ^;Σ;Σ

307 10: if U 0; 1 < h then

308 accepted accepted 1

309 12: storeθⁱ θ

310 else

311 14: storeθⁱ θⁱ⁻¹

312 raccept accepted∕N

313 16: return

314 Although the acceptance rates of MCMC are higher than

315 those of the rejection algorithm, there are several disadvantages.

316 First, sampled points are highly correlated to each other due to

317 a transition kernel. This is not a major drawback unless 318 noncorrelated points are needed. Second, smallϵ together with 319 high correlation property might cause MCMC to stack in a low 320 probability region, resulting in a harder transition from one 321 point to another. Third, a covariance matrix should be defined 322 by the usera priori to the algorithm. This has a great impact on 323 the performance of MCMC, since the covariance matrix

324 determines the nature of the transition kernel. Finally, it is hard

325 to parallelize MCMC chains [43].

326 C. PMC

327 Using the points that are successfully sampled previously and 328 adjusting the tolerance level ϵ are two good ways to increase 329 the acceptance rate. The partial rejection control (PRC) 330 algorithm that Sisson et al. [44] proposed uses weighted 331 resampling from the points sampled previously and a decreas- 332 ing tolerance level in each iteration. Beaumont et al. [45]

333 corrected the bias noted in the original algorithm proposed

334 by Sisson et al. and suggested the PMC algorithm.

335 Algorithm 3, given below, is the PMC algorithm suggested

336 by Beaumont et al.

337 PMC divides the calculations into t T iterations and the 338 corresponding tolerance levels,ϵt’s, are defined by the user with 339 ϵ1>ϵt>ϵT. Thus, more points are accepted at the begin-

340 ning, since a relatively relaxed tolerance is applied. The main

341 advantage of PMC is that it samples new points using the

342 points that represent the posterior well, which results in

343 the elimination of the points that represent the posterior

344 poorly.

Algorithm 3: ABC PMC 346

347348 count 0

t 1 349

350 3: for i 1…N do

351 whileρx; y < ϵ1do

352 count count 1

353 6: drawθ^ti ∼ πprθ

simulate x∼ f xjθi 354

355 let w^t_i 1∕N

9: letΣ twice the empirical covariance of fθ^ti g_i1;2;…N 356

357 for t 2…T do

358 for i 1…N do

359 12: whileρx; y < ϵt do

count count 1 360

pickθi ∼ θ^t−1j with probabilities w^t−1_j 361 15: sampleθ^ti ∼ qθ^ti jθi;Σ 362

simulate x∼ f xjθ^ti 363

calculate and normalize w^t_i ∝P ^π^pr^θ^tⁱ 364

jw^t−1_j qθ^t_i jθ^t−1_j ;Σ

18: letΣ twice the empirical covariance of fθ^ti g_i1;2;…N 365 r_accept N ∕count 366

return. 367

At t 1, the algorithm samples N points satisfying 368

ρx; y < ϵ1 from the prior distribution and assigns a weight, 369

w¹_i , to each of them. Thus, the algorithm has a set of points, 370

fθ¹i g_i1;2;…N, whose elements satisfy the corresponding 371

tolerance level. The set at iteration t, fθ^ti g_i1;2;…N, is formed 372

by selecting a point,θi, from the previous set fθ^t−1i g_i1;2;…N 373

with probability w^t−1_i and sampling a new point,θ^ti , by using 374

a transition kernel, qθ^ti jθi;Σ. Unlike MCMC, the covari- 375

ance matrix at each iteration,Σ, is calculated by taking twice 376

the empirical covariance of the set fθ^t−1i g_i1;2;…N. 377

The major drawback of PMC is the user-defined tolerance 378

levels. The user may define the tolerances too sharply and too 379

shallowly, which leads to an increase or decrease in the 380 computation time required. 381

382 D. APMC

The problem of user-defined tolerances is solved by defining 383 online tolerance at each step as suggested by Wegmannet al. 384

[46], Drovandi and Pettitt [47], and Del Moralet al. [48]. The 385

APMC algorithm proposed by Lenormand et al. [49] is 386

designed to overcome this problem and is based on the basic 387

principles of these works and that of Beaumontet al. 388 APMC goes a step further than PMC by adjusting tolerance 389 levels adaptively. Algorithm 4 requires three parameters: N ,α, 390 and p_acc 391

min, where N is the total number of points that will be sampled,α ∈ 0; 1 is the ratio of points that will be kept in 392 each iteration to the total number of points, and p_acc 393

min is the minimal acceptance rate. The number of points to keep 394

in each iteration, N_α, isαN . 395

At t 1, the algorithm draws N points from the prior dis- 396

tribution and calculates theρ⁰i and sets w⁰_i 1. Then, the 397

first tolerance level, ϵ1, is taken as the first α quantile of 398 fρ⁰i g_i1;2;…N. After the first tolerance level is determined, 399 APMC constructs the new set, fθ¹i ; w¹_i ;ρ¹i g, from the ele- 400

ments of the old set, whereρ⁰i is smaller thanϵ1. Thus, a total 401

3

(5)

402 of N_αelements are obtained at the first iteration. Since the new

403 set consists of the points with the best metrics, ρ¹i , the

404 covariance matrix is taken as twice the empirical covariance

405 of this set. At this point, the rest of the iterations start by setting

406 p_acc to 1.

407 From this point, the algorithm tries to find the best N − Nα 408 points to complete the set at each iteration. At iterations t > 1,

409 a new point is generated by drawing a point from the transition

410 kernel, qθ^t−1i jθi;Σ, around the point θi, which is picked

411 from the previous set,θ^t−1j , with normalized weight (i.e., prob-

412 abilities), where j 1; 2; …N_α. After the new point is gener-

413 ated, the data set is simulated, and the corresponding metric

414 and the weight of the old point, θ^t−1i , and the point itself,

415 are replaced with the new one.

416 When the process of replacing old points with new ones is

417 finished, APMC calculates the new tolerance level for the next

418 iteration as the firstα quantile of fρ^t−1i g_i1;2;…N and deter-

419 mines the minimal acceptance rate, p_acc. The algorithm repeats

420 these steps until p_acc ≤ p_acc_min.

421 The main advantage of this algorithm is that the tolerance

422 levels are calculated and updated in each step, resulting in

423 utilization of more optimized tolerance levels.

425 Algorithm 4: ABC APMC 426427 count 0

428 N_α αN 429 for t 1 do 430 4: for i 1…N do 431 count count 1 432 drawθ⁰i ∼ πprθ

433 simulate x∼ f xjθ⁰i 434 8: calculateρ⁰i ρx; y

435 let w⁰_i 1

436 defineϵ1 Q_ρ0α the first α-quantile of fρ⁰i g_i1;2;…N 437 fθ¹i ; w¹_i ;ρ¹i g fθ⁰i ; w⁰_i ;ρ⁰i jρ⁰i ≤ ϵ1g_i1;2;…N 438 12: letΣ twice the empirical covariance of fθ¹i g_i1;2;…N_α 439 define p_acc 1

440 while p_acc> p_acc_min do 441 t t 1

442 16: for i N_α 1…N do 443 count count 1

444 pickθi ∼ θ^t−1j with probability fw^t−1_j g_j1…N_α 445 drawθ^t−1i ∼ qθ^t−1i jθi;Σ

446 20: simulate x∼ f xjθ^t−1i

447 calculate and updateρ^t−1i ρx; y

448 update w^t−1_i ∝P ^π^pr^θ^t−1ⁱ

jw^t−1_j qθ^t−1_i jθ^t−1_j ;Σ j 1…Nα 449 calculate and update p_acc_N−N¹

α

P_N

kN_α11_ρ^t−1

i <ϵt−1

450 24: letϵt Q_ρt−1α the first α-quantile of fρ^t−1i g_i1;2;…N 451 fθ^ti ; w^t_i ;ρ^ti g fθ^t−1i ; w^t−1_i ;ρ^t−1i jρ^t−1i ≤ ϵtg_i1…N 452 letΣ twice the empirical covariance of fθ^ti g_i1;2;…N_α 453 r_accept N_α∕count

454 28: return.

455 3. OPTICAL SCATTERING

456 Stokes showed that four elements,K IQU V ^Tare enough

457 to fully describe the intensity and polarization state of an

458 electromagnetic wave, where I , Q, U , and V represent the total

459 intensity, the difference between horizontally and vertically

polarized intensities, the difference between 45° and −45° 460

polarized intensities, and the difference between the right- 461

handed and left-handed circularly polarized intensities, 462

respectively, with I² Q² U² V² [50]. 463

Incident,Ki, and scattered,Ks, intensities are related to each 464

other by a 4 × 4 matrix known as the Mueller matrix, Sij, by 465

Ks S_ijKi. The Mueller matrix is the complete representa- 466

tion of the scattering event. Therefore, it can be used for 467

characterization of nanoparticles. In this work, mainly the first 468

element of the Mueller matrix, S₁₁, is used for characterization 469

purposes. S₁₁ is known as the differential scattering cross 470

section and it relates the incident total intensity, Ii, to scattered 471

total intensity, Is, if unpolarized light is incident upon the 472

particle. 473

4. REPRESENTATION OF AGGREGATES 474

It is required to represent the structure of the nanoparticle 475

aggregates, since real-time light-scattering experiments are re- 476

placed with numerical experiments in this study. The aggregates 477

are mathematically modeled with the fractal equation, 478

Np kf

Rg

rp

_D_f

; (3)

where Npis the number of particles in the aggregate, rpis the 479

primary particle radius, Rgis the radius of gyration, and Df and 480

kf are the fractal dimension and fractal prefactor, respectively. 481

There are different techniques to generate fractal-like aggre- 482

gates such as ballistic aggregation, diffusion-limited aggregation 483

(DLA) and reaction-limited aggregation (RLA) [51]. The most 484

commonly used one is Filippov’s sequential algorithm [52]. 485

Filippov’s algorithm can be implemented by two alternative 486

approaches. The sequential algorithm is a particle-cluster 487

aggregation method that adds spherical particles to each other 488

one by one, while satisfying the fractal equation at every step. 489

Generation of fractal-like aggregates is also possible with 490

cluster-cluster aggregation. This method first generates small 491

aggregates and adds them to each other to form a larger aggre- 492

gate. However, the particle-cluster algorithm is preferred to the 493

cluster-cluster algorithm due to its implementation and tuning. 494

Moreover, Filippov’s algorithm was originally designed to 495

generate monodisperse aggregates; it is easy to implement 496

polydispersity, as illustrated in [53]. 497

5. PROBLEM STATEMENT 498

Characterization of soot aggregates using ABC techniques is 499

presented in this study to illustrate the strengths and weak- 500

nesses of the proposed ABC algorithms via numerical light- 501

scattering experiments. It must be noted that although this 502

work focuses on a specific problem, ABC methods are flexible 503

enough to be applied to characterization of different nanopar- 504

ticle aggregates. 505

This work considers dilute suspensions of identical, but ran- 506

domly oriented soot aggregates that are well separated from 507

each other. The total scattering behavior of the whole measure- 508

ment volume comprises the scattering behavior of individual 509

aggregates and can be approximated by the scattering behavior 510

of a single aggregate averaged over many orientations [17,54]. 511

Orientation averaging compensates for the effects of random 512

(6)

513 orientations of many aggregates as well as movement of these

514 aggregates during the measurement time. Both monodisperse

515 and polydisperse aggregates with log-normal particle size

516 distribution of nonoverlapping spheres are considered.

517 The fractal-like aggregates are generated by Filippov’s

518 algorithm [52], and the implementation of Filippov’s

519 particle-cluster algorithm given by Skorupski et al. [53] is

520 followed in this work. The particle-cluster algorithm relies

521 on random numbers that affect the aggregate geometry.

522 Therefore, different geometries will be obtained at each time

523 when the particle-cluster algorithm is executed with exactly

524 the same parameters. Finally, although the particle-cluster algo-

525 rithm fails to produce the expected slope of pair-correlation

526 function [52,53], the particle-cluster algorithm is preferred,

527 since the generated aggregates would be more similar to

528 aggregates in real ensembles.

529 The values of the parameters Df and kf are assumed to be

530 knowna priori following the results of two independent experi-

531 ments conducted by Yonet al. [55] and taken as 1.81 and 1.37,

532 respectively, as inputs to the particle-cluster algorithm. The

533 wavelength of the incident light is considered as 266 nm, which

534 lies in the wavelength range of a commercial device [56], as

535 suggested by [20] in all of the calculations, and the optical

536 properties of soot at 266 nm are adopted from [55].

537 Tianet al. [57] showed experimentally that the radius of the

538 primary particles of soot aggregates is in the range of 6–25 nm,

539 and the probability of aggregates with Np< 30 is higher than

540 those with Np> 30. Moreover, in our previous study [23], it

541 was shown that the characterization of N_p and r_p highly de-

542 pends on the effective radius, r_eff, given in Eq. (4). When

543 r_eff is smaller than 20 nm, the deviations in estimated values

544 for r_pand N_pbegin to increase, since order of magnitude of S₁₁

545 at 266 nm becomes comparable with that of the uncertainty of

546 the commercial device. Therefore, we have selected test cases

547 according to these observations. Table 1 shows the cases

548 selected to test the proposed methods. The effective radius

549 is defined as

r_eff X^N^p

i1

r³_i

!_1∕3

: (4)

550 6. SOLUTION METHODOLOGY

551 Obtaining the scattering behavior of a given aggregate with

552 known geometry and morphology requires the solution of

553 the direct problem by solving Maxwell’s equations. Since an

554 analytical solution of the Maxwell’s equation for a complex

555 geometry such as an aggregate is not available, use of numerical

556 methods is required. In this work, Fortran implementation of

557 DDA and DDSCAT, developed by Draine and Flatau [58], is

558 used for the solution of the direct problem.

559 A. Synthetic Measurements

560 Since the algorithms considered are evaluated by the numerical

561 experiments, the actual experimental measurements are

562 replaced with synthetic measurements that are obtained by add-

563 ing random measurement error to the simulated scattering

564 behavior of known aggregates. The scattering behavior of

the entire measurement volume is approximated by the orien- 565

tation-averaged scattering profile of a single aggregate, and 566

490 orientations are considered as recommended in [20]. 567

The effects of the randomness in Filippov’s algorithm should 568

be taken into account while constructing the synthetic mea- 569

surements. Every fractal geometry generated with exactly the 570

same parameters will be different due to the randomness of 571

the algorithm. Therefore, 10 different fractal geometries are 572

created for monodisperse cases (Cases 1–6), presented in 573

Table1. Moreover, the measurement error may also have some 574

impact on the final results. Therefore, 10 different random er- 575

ror vectors are introduced to the scattering profile of each 576

geometry created for each case. The measurement error is as- 577

sumed to have a normal distribution with zero mean (μ 0) 578

and a known standard deviation (σ 10⁻⁴), considering the 579

uncertainty of a commercial device [56]. Therefore, the syn- 580

thetic measurements are created according to the formula below 581

y_i;j S_11;iN_p; r_p;σr e_i;jμ 0; σ 10⁻⁴; (5) where i and j are the indices of geometry and error, respectively. 582

As a result, 100 different synthetic measurements (10 different 583

measurement errors to each of 10 geometries) are obtained with 584

Eq. (5) for each monodisperse case. 585

The synthetic measurement for a polydisperse case (Case 7) 586

is created in a slightly different way. Since the primary particle 587

radii vary significantly among aggregate realizations generated 588

with the same fractal parameters, the scattering behavior of the 589

ensemble cannot be assumed as the scattering behavior of a sin- 590

gle aggregate. Therefore, an average over aggregate realizations 591

is also required for Case 7. Although our preliminary analysis 592

showed that taking an average over 40 realizations is sufficient 593

to represent the scattering behavior of the ensemble, 100 ag- 594

gregate realizations are created. After random error vectors 595

are added to the scattering behavior of these realizations, the 596

synthetic measurement is created by taking the average of these 597

100 simulated measurements. 598

The physical limitations of the actual experimental setup 599

must be considered while creating synthetic measurements. 600

The measurements at ϕ 0° and ϕ 180° are not possible 601

in the same plane. Moreover, measurements near these angles 602

are highly disturbed by the incident light. Considering these 603

limitations, the initial and final polar angles are taken as 604

ϕi 5° and ϕf 165°, respectively. In our previous study 605

[23], it was observed that the increment between these angles 606

may not affect the results of characterization much if 607

Δϕ < 16°. Therefore, the scattering profiles of every parameter 608

set contain measurements between ϕi 5° and ϕf 165° 609

withΔϕ 8°. 610

Table 1. Cases Studied in This Work

T1:1 Cases Np rp[nm] σr [nm] reff [nm]

Case 1 34 4 0 13 T1:2

Case 2 4 10 0 16 T1:3

Case 3 10 10 0 21 T1:4

Case 4 16 16 0 40 T1:5

Case 5 34 16 0 52 T1:6

Case 6 40 22 0 75 T1:7

Case 7 30 15 1.25 50 T1:8

(7)

611 B. Metric Function

612 The main objective of the ABC methods is to sample points

613 from the posterior to approximate the actual posterior distribu-

614 tion of the parameters of interest. This is achieved by defining a

615 metric between the scattering behavior of the actual parameters,

616 y, and that of the simulated parameters, x. Therefore, the pro-

617 posed method utilizes the first element of the Mueller matrix of

618 a given aggregate at N_mpolar angles,ϕi, where i 1…Nm.

619 Then, a small distance between x S₁₁ϕ1…S11ϕm^T and

620 y S₁₁ϕ1…S11ϕm^T means that the parameters leading

621 to x may be the parameters of interest, where indicates

622 the synthetic measurements. The ratio of the norm of the dif-

623 ference between x and y to the norm of y is used as a metric.

624 Since the ABC methods rely on the distance between the simu-

625 lated estimate, x, and synthetic measurement, y, a threshold

626 value,ϵ, must be defined to decide whether a point is sampled

627 from the posterior or not. A point is considered to be sampled

628 from the posterior if the metric is smaller than the user-defined

629 tolerance,

ρx; y kx − yk

kyk <ϵ: (6)

630 The definition of the distance metric is the same for all the

631 algorithms considered in this work. Second, all algorithms

632 assume a uniform prior density, meaning that there is no

633 knowledge about the parameters of interest before the

634 experiments. Choice of uniform prior density is a common

635 practice in the literature, such as the cases in [25] and [28].

636 Third, a preliminary study is conducted to determine how

637 many points an algorithm should sample in order to have re-

638 sults that do not depend on sample size. Algorithm 1 is

639 executed with different numbers of samples, and it is found

640 that sampling 100 points or more from the posterior distribu-

641 tion for a single synthetic measurement yields almost identical

642 results. Therefore, all algorithms except for Algorithm 2 sample

643 100 points, which results in 10⁴points 100 × 100 in total for

644 each case; a sample of 100 points is sufficient. Since Algorithm

645 2 is MCMC, the number of points that will be sampled cannot

646 be controlled with given implementation. Moreover, the

647 transition kernel, q, is assumed as Gaussian with a known

648 covariance matrix for Algorithms 2, 3, and 4. However, some

649 parameters are different from each other for different

650 algorithms.

651 The main purpose of the transition kernel is to propose a

652 new point based on the previous one, and the very nature

653 of the transition kernel must be chosen considering the physics

654 of the problem. While we have tried different transition kernels

655 before selecting the one used in this study, the choice is not

656 arbitrary. Since all algorithms start from the points that are ran-

657 domly selected within the domain, the initial point might be

658 very far away from the actual posterior distribution region. In

659 such cases, a transition kernel with small covariance might not

660 propose acceptable jumps, whereas the rate of acceptance of a

661 too large covariance will still be small. Therefore, an ideal kernel

662 should have covariance that is not too small or too large. The

663 covariance matrix is then taken as 36 − 10; −10 36 for

664 Algorithm 2. Finally, the negative sign of the off-diagonal terms

665 is because rpand Npare inversely proportional to each other for

a constant effective radius through Eq. (4). The covariance ma- 666

trices in Algorithms 3 and 4 are calculated empirically based on 667

the previously sampled points. 668

The threshold value is taken asϵ 0.03 for Algorithms 1 669

and 2, whereas the interval [0.1 0.03] is divided into T 9 670

equal spaces for Algorithm 3, where T 9 is the total number 671

of iterations in Algorithm 3. The minimum acceptable toler- 672

ance considering the normal measurement error is calculated 673

with Eq. (6) for a given parameter set and found around 674

0.023. The threshold value of ϵ 0.03 considers both the 675

measurement error and the interpolation error within the data- 676

base. Besides, larger values of tolerance would result in larger 677

prediction zones for parameters of interest. While this increases 678

the probability of an accurate or acceptable prediction, which 679

can be identified in the test cases by checking if the actual 680

parameters are contained within the predicted zone or not, pre- 681

dicting a smaller zone is also desired. Therefore, ϵ 0.03 682

would yield a smaller prediction zone while encapsulating both 683

the measurement and interpolation error. Finally,α 0.5 and 684

p_acc 685

min 0.01 are taken as inputs for Algorithm 4.

It should be noted that although Algorithms 3 and 4 have 686

their stopping criteria, the covariance matrix may also cause 687

these algorithms to stop. The covariance matrix is symmetric 688

and positive-definite by definition. If all the sampled points 689

are identical or similar to each other, then, covariance among 690

the sampled points would be equal to or very close to 0. 691

Algorithms 3 and 4 terminate as the covariance matrix is no 692

more positive-definite, and it is assumed that the algorithm 693

converges to that particular point. 694 C. Database 695

The presented algorithms require the solution of the direct 696

problem of calculating the scattering profile of a given set of 697

parameters. The computational time to obtain the orienta- 698

tion-averaged scattering profile of a given parameter set is 699

demanding, as 490 orientations are considered. Solving the di- 700

rect problem for a parameter set Np 15, rp 8 nm, and 701

σr 0 with the parallel implementation of DDSCAT using 702

3393 dipoles and averaged over 490 orientations takes 653 s 703

in an 8-core system with 2.50 GHz frequency. The computa- 704

tional time required would increase if the number or size of 705

nanoparticles in the aggregate increases. Therefore, construct- 706

ing a database and interpolating the solutions of the intermedi- 707

ate parameter sets would be a feasible approach to sample 708

points from the posterior distribution. 709

Another challenge that must be considered while creating a 710

database is the randomness of fractal geometries. The geom- 711

etries generated with same parameters according to Eq. (3) 712

would be all different due to the randomness in Filippov’s 713

algorithm. Therefore, two separate aggregates generated based 714

on the exact same parameters would have different geometries, 715

and hence, their scattering profile would be different even with- 716

out error being introduced. If a database is to be constructed, 717

scattering behaviors of different geometries with the same par- 718

ticle size and number must be averaged to reduce geometry 719

dependency. 720

The database is constructed as follows: Fifteen different 721

numbers of particles ranging from three to 45 with an incre- 722

4