1
Approximate Bayesian computation techniques
2
for optical characterization of nanoparticle
3
clusters
4 OZAN BURAK ERICOK,1 ALI TAYLAN CEMGIL,2 AND HAKAN ERTURK1,*
5 1Department of Mechanical Engineering, Bogazici University, 34342 Bebek, Istanbul, Turkey 6 2Department of Computer Engineering, Bogazici University, 34342 Bebek, Istanbul, Turkey 7 *Corresponding author: hakan.erturk@boun.edu.tr
8 Received 8 May 2017; revised 6 November 2017; accepted 10 November 2017; posted 10 November 2017 (Doc. ID 295428);
9 published 0 MONTH 0000
10 Characterization of nanoparticle aggregates from observed scattered light leads to a highly complex inverse 11 problem. Even the forward model is so complex that it prohibits the use of classical likelihood-based inference 12 methods. In this study, we compare four so-called likelihood-free methods based on approximate Bayesian com- 13 putation (ABC) that requires only numeric simulation of the forward model without the need of evaluating a
14 likelihood. In particular, rejection, Markov chain Monte Carlo, population Monte Carlo, and adaptive popu-
15 lation Monte Carlo (APMC) are compared in terms of accuracy. In the current model, we assume that the nano-
16 particle aggregates are mutually well separated and made up of particles of same size. Filippov’s particle-cluster
17 algorithm is used to generate aggregates, and discrete dipole approximation is used to estimate scattering behav- 18 ior. It is found that the APMC algorithm is superior to others in terms of time and acceptance rates, although all 19 algorithms produce similar posterior distributions. Using ABC techniques and utilizing unpolarized light 20 experiments at 266 nm wavelength, characterization of soot aggregates is performed with less than 2 nm deviation 21 in nanoparticle radius and 3–4 deviation in number of nanoparticles forming the monodisperse aggregates.
22 Promising results are also observed for the polydisperse aggregate with log-normal particle size 23 distribution. © 2017 Optical Society of America
OCIS codes: (000.5490) Probability theory, stochastic processes, and statistics; (290.0290) Scattering; (290.3200) Inverse 24 scattering; (160.4236) Nanomaterials.
25
https://doi.org/10.1364/JOSAA.99.099999
26 1. INTRODUCTION
27 Developments in nanotechnology have brought new opportu- 28 nities to various fields from medicine [1] to combustion 29 diagnosis [2,3]. Synthesizing nanostructures with unique opti- 30 cal and physical properties necessitates new and improved 31 characterization tools, as manufacturing unique structures 32 with desired properties is only possible through using proper 33 characterization.
34 There are different techniques for characterization of the
35 size, shape, and configuration of the nanostructures.
36 Although the direct methods relying on microscopy such as
37 transmission electron microscopy (TEM), scanning electron 38 microscopy (SEM), and atomic force microscopy (AFM) are 39 capable of producing high-quality images and are considered
40 as the most reliable means of characterization, they usually
41 are not suitable for in-line monitoring. Recently, characteriza-
42 tion of the nanostructures based on their emission profiles
43 via time-resolved laser-induced incandescence (TR-LII) and
scattering profiles via light scattering have been gaining interest. 44
While these methods have in situ characterization capability, 45
the light-scattering methods might be preferable over 46
TR-LII methods, as they are not only limited to absorbing 47
materials. 48
There are different methods to predict scattering behavior of 49
aggregates in the literature. Rayleigh–Debye–Gans (RDG) 50
approximation [4] and RDG for fractal aggregates (RDG- 51
FA) [5,6], are often used when the scatterer is irregularly 52
shaped and smaller than the wavelength. RDG-FA is valid if 53
jm − 1j ≪ 1 and kjm − 1jdp≪ 1, where m, k, and dp are 54
the refractive index, wavenumber, and primary particle diam- 55
eter of aggregate. These approximations are often used to esti- 56
mate the optical properties of combustion-generated particles, 57
as in [7] and [8]. However, Klusek et al. [9] stated that soot 58
aggregates cannot be treated as Rayleigh scatterers and Köylü 59
and Faeth [10] stated that RDG-FA theory cannot be used for 60
soot aggregates with large refractive index. Besides these, they 61
1
2
1084-7529/17/120001-01 Journal © 2017 Optical Society of America
62 neglect the multiple and self-induced scattering effects and the
63 error in angular scattering properties such as S11and S12, which
64 can be as high as 50% for large aggregates [11]. The T-matrix
65 method [12,13,14], on the other hand, is a fast and accurate
66 method to model scattering of light by aggregates. Although
67 there is no limitation on primary particle size, the T-matrix
68 method is limited to nonoverlapping spheres. In reality, aggre-
69 gates may overlap, and overlapping has an impact on scattering
70 properties [15]. Discrete dipole approximation (DDA), pro-
71 posed by Purcell and Pennypacker [16], is another method that
72 is often used in the literature. DDA is a semianalytical volume
73 integral method that can handle the most complex geometries
74 by representing them with a finite set of interacting dipoles.
75 Through the interactions of these dipoles, DDA can easily cal-
76 culate the optical behavior of any geometry with any morphol- 77 ogy. Though it can handle very complex geometries accurately,
78 it is slow compared to approximate methods [17].
79 Applications of light-scattering methods for soot characteri- 80 zation is a well-studied problem in the literature. RDG-FA 81 theory is one of the most frequently used light-scattering meth- 82 ods. For example, Reimannet al. [18] combined elastic light- 83 scattering techniques with LII, whereas Huberet al. [19] com-
84 bined wide angle light-scattering methods with the TR-LII
85 method to estimate the primary particle radius and radius of
86 gyration in the aggregate. Using the Mueller matrix for char-
87 acterization of nanoparticle aggregates is another approach.
88 Mengüç and Manickavasagam [20] outlined a procedure to ob- 89 tain the number of nanoparticles in the aggregate using polari- 90 zation information stored in elements such as S12, S33, and S34 91 if the primary particle diameter is known via anex situ mea- 92 surement such as SEM. In different studies, Charnigo et al.
93 used the normalized versions and derivatives of S11, S12,
94 S33, and S34to estimate the agglomeration level [21] and nano-
95 particle diameter [22]. The characterization problem is also for-
96 mulated as a parameter estimation problem of identifying the
97 average size and the number of nanoparticles forming the 98 aggregate that can be estimated through the solution of the op- 99 timization problem. More recently, Ericok and Erturk [23]
100 used only the nonpolarization information, S11, to estimate 101 the number and radius of the nanoparticles in the aggregate 102 using the tabu search algorithm, while their analysis showed 103 that characterization is possible for aggregates larger than
104 20 nm using a light source at 266 nm. However, in all of these
105 deterministic methods, it is not possible to validate the results,
106 since the analysis focuses on matching the measured scattering
107 behavior rather than the characterized parameters directly.
108 Therefore, presenting the range of existence of the parameters 109 with a desired confidence level would be more appropriate for 110 characterization unless an independent experiment is designed 111 to measure these parameters directly.
112 Statistical inversion based on Bayesian inference is a prob- 113 abilistic approach that is capable of predicting the probability
114 distribution of the unknown parameters based on measure-
115 ments. The basic idea of the Bayesian inference relies on up-
116 dating the information about the unknown parameters based
117 on observations. Once the final probability distribution of 118 the unknown parameters is obtained, credible intervals with 119 a desired confidence level can be constructed.
The Bayesian inference approach is relatively new for 120 121 nanoparticle characterization. Burr et al. [24] used Bayesian inference to recover soot aggregate size distribution from 122 multiangle elastic light-scattering data using maximum 123 a posteriori (MAP) inference. Sipkens et al. [25] used the 124
TR-LII method to characterize gas-borne silicon nanoparticles 125
and analyzed their measurements with the Bayesian approach 126
to find the most probable nanoparticle size distribution param- 127
eters. Oteroet al. [26] studied the potentials of parametric and 128 nonparametric Bayesian schemes on the inverse problems of 129 light scattering. In another study, Clementi et al. [27] used 130 the Bayesian inversion method based on multidirectional 131 dynamic light-scattering measurements to estimate the particle 132 size distribution of latexes. Charnigoet al. [28] used different 133 elements of the Mueller matrix and their derivatives to 134
construct credible intervals for nanoparticle diameter and 135
agglomeration level. They introduced both systematic and sto- 136
chastic error in their analysis and utilized polarization informa- 137
tion via normalized S11, S12, S33, and S34for different cases. In 138 a more recent work, Hadwinet al. [29] used Bayesian inference 139 to calculate the MAP and credible intervals of soot volume frac- 140
tion and peak temperature for different likelihoods and prior 141
densities, whereas Huber et al. [30] used Bayesian inference 142
to estimate the size and morphology parameters of soot-laden 143
aerosols. Ericok and Erturk [31] used Bayesian inference for 144
characterization of the radius and number of nanoparticles 145
of the soot aggregates with different credible interval levels. 146
Computing the likelihood function is vital for Bayesian in- 147
ference. However, it is not always available in a closed analytical 148
form or feasible to compute due to complex physical models 149
that require too many parameters. In general, there are three 150
ways to overcome this problem: (1) nonparametric estimation 151
of the likelihood. In this method, one can run M auxiliary ex- 152
periments based on the parameters that affect the outcome. The 153
auxiliary experiments must be run at discrete points for a 154
single parameter, and intermediate points can be interpolated. 155
However, if the number of parameters is not small, this method 156
suffers from what is known as the “curse of dimensionality.” 157
(2) Parametric estimation of likelihood. One can assume an 158
analytical form, e.g., Gaussian or Poisson distributions, for like- 159
lihood. Although this method is very practical, it may misrep- 160
resent the physical reality. (3) Parametric estimation after 161
dimension reduction. This method does not use the whole out- 162
come of the system, but only its sufficient statistics. If a suffi- 163
cient statistics such as mean, standard deviation, etc. exists that 164
represents the outcome well, one may reduce the number of 165
unknowns and use parametric estimation on the reduced 166
unknowns. More information can be found in [28]. 167
One alternative way to cope with this problem is not using 168
likelihood altogether. If likelihood is not available or cannot be 169
obtained by the methods described above, Bayesian inference is 170
performed via what is known as approximate Bayesian compu- 171
tation (ABC) techniques. ABC consists of methods that do not 172
require the calculation of the likelihood function to obtain the 173
posterior distribution. The idea is to sample points from the 174
posterior distribution directly, if a model exists that relates 175
the parameters of interest to observable quantities. While 176
ABC methods are widely used in different fields such as 177
178 genetics [32], epidemiology [33], cosmology [34], and signal
179 detection theory [35], etc., they have not been used for solution 180 of inverse radiative transfer and optical characterization 181 problems.
182 In radiative transfer and optical characterization, Bayesian 183 inference methods are generally performed by assuming an ana-
184 lytical form to the likelihood function as in [24–31]. However,
185 most of the problems considered are complex, where the
186 physical behavior of the system cannot be represented by an
187 analytical expression, necessitating the use of numerical mod- 188 eling. For the analysis of such problems, using ABC methods is 189 a more reasonable approach. In this study, some of the ABC 190 methods will be introduced (for the first time, to our best 191 knowledge) and applied rigorously for characterization of nano- 192 particle aggregates to demonstrate the strengths and limitations 193 of the proposed methods.
194 2. ABC
195 The very fabric of the statistical inversion is to reconstruct the
196 solution of the inverse problem in terms of probability distri-
197 butions. The inverse problems generally comprise identifying 198 the parameters of interest that are related to the measured 199 quantities through a physical model. Likewise, the statistical 200 inversion theory takes all the information about the system that 201 is known to the observer prior to the measurement, as well as 202 the measurement itself, and estimates the probability distribu- 203 tions of the parameters of interest. Therefore, the solution of
204 the inverse problem is not a single set of parameters, but a prob-
205 ability distribution referred to as the posterior probability
206 distribution. The fundamental principles and some details of
207 the concepts can be found in [36] and [37].
208 In Bayesian statistical inference, all the variables are treated 209 as random variables, which are denoted by capital letters. The 210 observable random variable Y ∈ Rm, the measurement, and the 211 unobservable random variable Θ ∈ Rn, the parameters of 212 interest, are related to each other by
Y f θ; E; (1)
213 where E ∈ Rk is the noise.
214 The conditional probability of θ given Y yobserved,
215 πθjyobserved, is the solution of the statistical inverse problem,
216 as it represents the probability distribution ofΘ θ based on
217 the measurement yobserved
πpostθ πθjyobserved R πprθπyobservedjθ
Rnπprθπyobservedjθdθ; (2) 218 whereπprθ is the prior density that contains all the knowledge 219 aboutθ before the measurement. The conditional probability 220 of random variable Y given Θ θ, πyjθ, is called the 221 likelihood function.
222 Equation (2) suggests that the posterior probability distribu- 223 tion,πpostθ, can easily be obtained if the analytical forms of
224 both the prior and the likelihood are available. The easiest part
225 in Eq. (2) is the prior density, since it is either known or
226 assumed by the researcher apart from the model. However, rep-
227 resenting the likelihood analytically can be impossible for some 228 cases, as in most of the computational or simulation-based 229 models. In these cases, the standard methods of Bayesian
estimation cannot be applied. As stated earlier, ABC methods 230 overcome this problem by not calculating the likelihood func- 231 tion but by sampling enough points from the posterior distri- 232 bution itself. In this study, four ABC algorithms–rejection 233 sampler, Markov chain Monte Carlo (MCMC), population 234
Monte Carlo (PMC) and adaptive population Monte Carlo 235
(APMC)—are considered. 236
237 A. Rejection Sampler
The first ABC algorithm introduced by Tavaréet al. [38] was a 238 simple rejection algorithm to estimate the common ancestor 239 times for DNA sequence data. Tavaré’s idea was extended 240
by Pritchardet al. [39]. First, a point, θ, is generated from 241
the prior density. Then, the artificial data set, x, associated with 242
the generated pointθis created using the simulation model, 243
f . They proposed that if the generated data set and the actual 244
data set are close enough,ρx; y < ϵ, then the point is sampled 245
from the posterior. In this inequality,ρ is a scalar metric, and ϵ 246
is the user-defined tolerance. If these steps are repeated until N 247
points are drawn from the posterior, a good approximate of the 248
posterior can be obtained without calculating the likelihood 249
function. Algorithm 1, presented below, is the rejection 250
algorithm proposed by Pritchard. Although it is simple to 251
implement, the rejection algorithm suffers from high rejection 252
rates—in other words, low acceptance rates of sampled 253
points. 254
256 Algorithm 1: ABC rejection sampler
257258 1: count 0
2: for i 1…N do 259
260 3: whileρx; y > ϵ do
261 4: count count 1
262 5: generateθ∼ πprθ
6: simulate x∼ f xjθ 263 7: storeθi θ 264
8: raccept N ∕count 265
266 9: return.
B. MCMC 267
The main disadvantage of the ABC rejection sampler algorithm 268
is the high rejection rates caused by sampling points from the 269
prior distribution. Rejection sampler algorithm samples the 270
new point without using any information from the previously 271
sampled points. However, it is possible, in principle, to reduce 272
the rejection rates by utilizing the information from the previ- 273
ously sampled points. The MCMC algorithm samples a new 274
point by making a jump from the current point with the 275
assumption that the new point will possibly be near the last 276
sampled point. Thus, it eliminates the complete randomness 277
and makes use of available information. 278
The MCMC algorithm is similar to the Metropolis–Hasting 279
algorithm. The only difference is that there are no likelihood 280 calculations involved in MCMC. Like the Metropolis–Hasting 281
282 algorithm, MCMC starts with an initial point,θ0, sampled from the prior. Then, the new points are generated by a tran- 283 sition kernel, q, which relates the point from the previous MC 284
step,θi−1, to the new one,θ. Data set x is generated by using 285
θ, andρx; y is calculated. If ρx; y > ϵ, then θi−1is a better 286
287 approximation than θ; therefore no replacement is made.
288 Otherwise,θis accepted with probability, h, where h is a uni- 289 formly distributed random number between 0 and 1, U 0; 1.
290 It should be noted that the MCMC algorithm only uses the 291 last sampled point, not the ones that are sampled earlier.
292 Algorithm 2, given below, is the MCMC algorithm proposed 293 by Marjoramet al. [40]. More detailed discussions about the
294 algorithm can be found in [41] and [42].
296 Algorithm 2: ABC MCMC 297298 accepted 0
299 2: generateθ0∼ πprθ
300 for i 1…N do
301 4: generateθ∼ qθjθi−1;Σ
302 simulate x∼ f xjθ 303 6: ifρx; y > ϵ then
304 θi θi−1
305 8: else
306 calculate h ππprθ
prθi−1×qθqθi−1jθi−1jθ;Σ;Σ
307 10: if U 0; 1 < h then
308 accepted accepted 1
309 12: storeθi θ
310 else
311 14: storeθi θi−1
312 raccept accepted∕N
313 16: return
314 Although the acceptance rates of MCMC are higher than
315 those of the rejection algorithm, there are several disadvantages.
316 First, sampled points are highly correlated to each other due to
317 a transition kernel. This is not a major drawback unless 318 noncorrelated points are needed. Second, smallϵ together with 319 high correlation property might cause MCMC to stack in a low 320 probability region, resulting in a harder transition from one 321 point to another. Third, a covariance matrix should be defined 322 by the usera priori to the algorithm. This has a great impact on 323 the performance of MCMC, since the covariance matrix
324 determines the nature of the transition kernel. Finally, it is hard
325 to parallelize MCMC chains [43].
326 C. PMC
327 Using the points that are successfully sampled previously and 328 adjusting the tolerance level ϵ are two good ways to increase 329 the acceptance rate. The partial rejection control (PRC) 330 algorithm that Sisson et al. [44] proposed uses weighted 331 resampling from the points sampled previously and a decreas- 332 ing tolerance level in each iteration. Beaumont et al. [45]
333 corrected the bias noted in the original algorithm proposed
334 by Sisson et al. and suggested the PMC algorithm.
335 Algorithm 3, given below, is the PMC algorithm suggested
336 by Beaumont et al.
337 PMC divides the calculations into t T iterations and the 338 corresponding tolerance levels,ϵt’s, are defined by the user with 339 ϵ1>ϵt>ϵT. Thus, more points are accepted at the begin-
340 ning, since a relatively relaxed tolerance is applied. The main
341 advantage of PMC is that it samples new points using the
342 points that represent the posterior well, which results in
343 the elimination of the points that represent the posterior
344 poorly.
Algorithm 3: ABC PMC 346
347348 count 0
t 1 349
350 3: for i 1…N do
351 whileρx; y < ϵ1do
352 count count 1
353 6: drawθti ∼ πprθ
simulate x∼ f xjθi 354
355 let wti 1∕N
9: letΣ twice the empirical covariance of fθti gi1;2;…N 356
357 for t 2…T do
358 for i 1…N do
359 12: whileρx; y < ϵt do
count count 1 360
pickθi ∼ θt−1j with probabilities wt−1j 361 15: sampleθti ∼ qθti jθi;Σ 362
simulate x∼ f xjθti 363
calculate and normalize wti ∝P πprθti 364
jwt−1j qθti jθt−1j ;Σ
18: letΣ twice the empirical covariance of fθti gi1;2;…N 365 raccept N ∕count 366
return. 367
At t 1, the algorithm samples N points satisfying 368
ρx; y < ϵ1 from the prior distribution and assigns a weight, 369
w1i , to each of them. Thus, the algorithm has a set of points, 370
fθ1i gi1;2;…N, whose elements satisfy the corresponding 371
tolerance level. The set at iteration t, fθti gi1;2;…N, is formed 372
by selecting a point,θi, from the previous set fθt−1i gi1;2;…N 373
with probability wt−1i and sampling a new point,θti , by using 374
a transition kernel, qθti jθi;Σ. Unlike MCMC, the covari- 375
ance matrix at each iteration,Σ, is calculated by taking twice 376
the empirical covariance of the set fθt−1i gi1;2;…N. 377
The major drawback of PMC is the user-defined tolerance 378
levels. The user may define the tolerances too sharply and too 379
shallowly, which leads to an increase or decrease in the 380 computation time required. 381
382 D. APMC
The problem of user-defined tolerances is solved by defining 383 online tolerance at each step as suggested by Wegmannet al. 384
[46], Drovandi and Pettitt [47], and Del Moralet al. [48]. The 385
APMC algorithm proposed by Lenormand et al. [49] is 386
designed to overcome this problem and is based on the basic 387
principles of these works and that of Beaumontet al. 388 APMC goes a step further than PMC by adjusting tolerance 389 levels adaptively. Algorithm 4 requires three parameters: N ,α, 390 and pacc 391
min, where N is the total number of points that will be sampled,α ∈ 0; 1 is the ratio of points that will be kept in 392 each iteration to the total number of points, and pacc 393
min is the minimal acceptance rate. The number of points to keep 394
in each iteration, Nα, isαN . 395
At t 1, the algorithm draws N points from the prior dis- 396
tribution and calculates theρ0i and sets w0i 1. Then, the 397
first tolerance level, ϵ1, is taken as the first α quantile of 398 fρ0i gi1;2;…N. After the first tolerance level is determined, 399 APMC constructs the new set, fθ1i ; w1i ;ρ1i g, from the ele- 400
ments of the old set, whereρ0i is smaller thanϵ1. Thus, a total 401
3
402 of Nαelements are obtained at the first iteration. Since the new
403 set consists of the points with the best metrics, ρ1i , the
404 covariance matrix is taken as twice the empirical covariance
405 of this set. At this point, the rest of the iterations start by setting
406 pacc to 1.
407 From this point, the algorithm tries to find the best N − Nα 408 points to complete the set at each iteration. At iterations t > 1,
409 a new point is generated by drawing a point from the transition
410 kernel, qθt−1i jθi;Σ, around the point θi, which is picked
411 from the previous set,θt−1j , with normalized weight (i.e., prob-
412 abilities), where j 1; 2; …Nα. After the new point is gener-
413 ated, the data set is simulated, and the corresponding metric
414 and the weight of the old point, θt−1i , and the point itself,
415 are replaced with the new one.
416 When the process of replacing old points with new ones is
417 finished, APMC calculates the new tolerance level for the next
418 iteration as the firstα quantile of fρt−1i gi1;2;…N and deter-
419 mines the minimal acceptance rate, pacc. The algorithm repeats
420 these steps until pacc ≤ paccmin.
421 The main advantage of this algorithm is that the tolerance
422 levels are calculated and updated in each step, resulting in
423 utilization of more optimized tolerance levels.
425 Algorithm 4: ABC APMC 426427 count 0
428 Nα αN 429 for t 1 do 430 4: for i 1…N do 431 count count 1 432 drawθ0i ∼ πprθ
433 simulate x∼ f xjθ0i 434 8: calculateρ0i ρx; y
435 let w0i 1
436 defineϵ1 Qρ0α the first α-quantile of fρ0i gi1;2;…N 437 fθ1i ; w1i ;ρ1i g fθ0i ; w0i ;ρ0i jρ0i ≤ ϵ1gi1;2;…N 438 12: letΣ twice the empirical covariance of fθ1i gi1;2;…Nα 439 define pacc 1
440 while pacc> paccmin do 441 t t 1
442 16: for i Nα 1…N do 443 count count 1
444 pickθi ∼ θt−1j with probability fwt−1j gj1…Nα 445 drawθt−1i ∼ qθt−1i jθi;Σ
446 20: simulate x∼ f xjθt−1i
447 calculate and updateρt−1i ρx; y
448 update wt−1i ∝P πprθt−1i
jwt−1j qθt−1i jθt−1j ;Σ j 1…Nα 449 calculate and update paccN−N1
α
PN
kNα11ρt−1
i <ϵt−1
450 24: letϵt Qρt−1α the first α-quantile of fρt−1i gi1;2;…N 451 fθti ; wti ;ρti g fθt−1i ; wt−1i ;ρt−1i jρt−1i ≤ ϵtgi1…N 452 letΣ twice the empirical covariance of fθti gi1;2;…Nα 453 raccept Nα∕count
454 28: return.
455 3. OPTICAL SCATTERING
456 Stokes showed that four elements,K IQU V Tare enough
457 to fully describe the intensity and polarization state of an
458 electromagnetic wave, where I , Q, U , and V represent the total
459 intensity, the difference between horizontally and vertically
polarized intensities, the difference between 45° and −45° 460
polarized intensities, and the difference between the right- 461
handed and left-handed circularly polarized intensities, 462
respectively, with I2 Q2 U2 V2 [50]. 463
Incident,Ki, and scattered,Ks, intensities are related to each 464
other by a 4 × 4 matrix known as the Mueller matrix, Sij, by 465
Ks SijKi. The Mueller matrix is the complete representa- 466
tion of the scattering event. Therefore, it can be used for 467
characterization of nanoparticles. In this work, mainly the first 468
element of the Mueller matrix, S11, is used for characterization 469
purposes. S11 is known as the differential scattering cross 470
section and it relates the incident total intensity, Ii, to scattered 471
total intensity, Is, if unpolarized light is incident upon the 472
particle. 473
4. REPRESENTATION OF AGGREGATES 474
It is required to represent the structure of the nanoparticle 475
aggregates, since real-time light-scattering experiments are re- 476
placed with numerical experiments in this study. The aggregates 477
are mathematically modeled with the fractal equation, 478
Np kf
Rg
rp
Df
; (3)
where Npis the number of particles in the aggregate, rpis the 479
primary particle radius, Rgis the radius of gyration, and Df and 480
kf are the fractal dimension and fractal prefactor, respectively. 481
There are different techniques to generate fractal-like aggre- 482
gates such as ballistic aggregation, diffusion-limited aggregation 483
(DLA) and reaction-limited aggregation (RLA) [51]. The most 484
commonly used one is Filippov’s sequential algorithm [52]. 485
Filippov’s algorithm can be implemented by two alternative 486
approaches. The sequential algorithm is a particle-cluster 487
aggregation method that adds spherical particles to each other 488
one by one, while satisfying the fractal equation at every step. 489
Generation of fractal-like aggregates is also possible with 490
cluster-cluster aggregation. This method first generates small 491
aggregates and adds them to each other to form a larger aggre- 492
gate. However, the particle-cluster algorithm is preferred to the 493
cluster-cluster algorithm due to its implementation and tuning. 494
Moreover, Filippov’s algorithm was originally designed to 495
generate monodisperse aggregates; it is easy to implement 496
polydispersity, as illustrated in [53]. 497
5. PROBLEM STATEMENT 498
Characterization of soot aggregates using ABC techniques is 499
presented in this study to illustrate the strengths and weak- 500
nesses of the proposed ABC algorithms via numerical light- 501
scattering experiments. It must be noted that although this 502
work focuses on a specific problem, ABC methods are flexible 503
enough to be applied to characterization of different nanopar- 504
ticle aggregates. 505
This work considers dilute suspensions of identical, but ran- 506
domly oriented soot aggregates that are well separated from 507
each other. The total scattering behavior of the whole measure- 508
ment volume comprises the scattering behavior of individual 509
aggregates and can be approximated by the scattering behavior 510
of a single aggregate averaged over many orientations [17,54]. 511
Orientation averaging compensates for the effects of random 512
513 orientations of many aggregates as well as movement of these
514 aggregates during the measurement time. Both monodisperse
515 and polydisperse aggregates with log-normal particle size
516 distribution of nonoverlapping spheres are considered.
517 The fractal-like aggregates are generated by Filippov’s
518 algorithm [52], and the implementation of Filippov’s
519 particle-cluster algorithm given by Skorupski et al. [53] is
520 followed in this work. The particle-cluster algorithm relies
521 on random numbers that affect the aggregate geometry.
522 Therefore, different geometries will be obtained at each time
523 when the particle-cluster algorithm is executed with exactly
524 the same parameters. Finally, although the particle-cluster algo-
525 rithm fails to produce the expected slope of pair-correlation
526 function [52,53], the particle-cluster algorithm is preferred,
527 since the generated aggregates would be more similar to
528 aggregates in real ensembles.
529 The values of the parameters Df and kf are assumed to be
530 knowna priori following the results of two independent experi-
531 ments conducted by Yonet al. [55] and taken as 1.81 and 1.37,
532 respectively, as inputs to the particle-cluster algorithm. The
533 wavelength of the incident light is considered as 266 nm, which
534 lies in the wavelength range of a commercial device [56], as
535 suggested by [20] in all of the calculations, and the optical
536 properties of soot at 266 nm are adopted from [55].
537 Tianet al. [57] showed experimentally that the radius of the
538 primary particles of soot aggregates is in the range of 6–25 nm,
539 and the probability of aggregates with Np< 30 is higher than
540 those with Np> 30. Moreover, in our previous study [23], it
541 was shown that the characterization of Np and rp highly de-
542 pends on the effective radius, reff, given in Eq. (4). When
543 reff is smaller than 20 nm, the deviations in estimated values
544 for rpand Npbegin to increase, since order of magnitude of S11
545 at 266 nm becomes comparable with that of the uncertainty of
546 the commercial device. Therefore, we have selected test cases
547 according to these observations. Table 1 shows the cases
548 selected to test the proposed methods. The effective radius
549 is defined as
reff XNp
i1
r3i
!1∕3
: (4)
550 6. SOLUTION METHODOLOGY
551 Obtaining the scattering behavior of a given aggregate with
552 known geometry and morphology requires the solution of
553 the direct problem by solving Maxwell’s equations. Since an
554 analytical solution of the Maxwell’s equation for a complex
555 geometry such as an aggregate is not available, use of numerical
556 methods is required. In this work, Fortran implementation of
557 DDA and DDSCAT, developed by Draine and Flatau [58], is
558 used for the solution of the direct problem.
559 A. Synthetic Measurements
560 Since the algorithms considered are evaluated by the numerical
561 experiments, the actual experimental measurements are
562 replaced with synthetic measurements that are obtained by add-
563 ing random measurement error to the simulated scattering
564 behavior of known aggregates. The scattering behavior of
the entire measurement volume is approximated by the orien- 565
tation-averaged scattering profile of a single aggregate, and 566
490 orientations are considered as recommended in [20]. 567
The effects of the randomness in Filippov’s algorithm should 568
be taken into account while constructing the synthetic mea- 569
surements. Every fractal geometry generated with exactly the 570
same parameters will be different due to the randomness of 571
the algorithm. Therefore, 10 different fractal geometries are 572
created for monodisperse cases (Cases 1–6), presented in 573
Table1. Moreover, the measurement error may also have some 574
impact on the final results. Therefore, 10 different random er- 575
ror vectors are introduced to the scattering profile of each 576
geometry created for each case. The measurement error is as- 577
sumed to have a normal distribution with zero mean (μ 0) 578
and a known standard deviation (σ 10−4), considering the 579
uncertainty of a commercial device [56]. Therefore, the syn- 580
thetic measurements are created according to the formula below 581
yi;j S11;iNp; rp;σr ei;jμ 0; σ 10−4; (5) where i and j are the indices of geometry and error, respectively. 582
As a result, 100 different synthetic measurements (10 different 583
measurement errors to each of 10 geometries) are obtained with 584
Eq. (5) for each monodisperse case. 585
The synthetic measurement for a polydisperse case (Case 7) 586
is created in a slightly different way. Since the primary particle 587
radii vary significantly among aggregate realizations generated 588
with the same fractal parameters, the scattering behavior of the 589
ensemble cannot be assumed as the scattering behavior of a sin- 590
gle aggregate. Therefore, an average over aggregate realizations 591
is also required for Case 7. Although our preliminary analysis 592
showed that taking an average over 40 realizations is sufficient 593
to represent the scattering behavior of the ensemble, 100 ag- 594
gregate realizations are created. After random error vectors 595
are added to the scattering behavior of these realizations, the 596
synthetic measurement is created by taking the average of these 597
100 simulated measurements. 598
The physical limitations of the actual experimental setup 599
must be considered while creating synthetic measurements. 600
The measurements at ϕ 0° and ϕ 180° are not possible 601
in the same plane. Moreover, measurements near these angles 602
are highly disturbed by the incident light. Considering these 603
limitations, the initial and final polar angles are taken as 604
ϕi 5° and ϕf 165°, respectively. In our previous study 605
[23], it was observed that the increment between these angles 606
may not affect the results of characterization much if 607
Δϕ < 16°. Therefore, the scattering profiles of every parameter 608
set contain measurements between ϕi 5° and ϕf 165° 609
withΔϕ 8°. 610
Table 1. Cases Studied in This Work
T1:1 Cases Np rp[nm] σr [nm] reff [nm]
Case 1 34 4 0 13 T1:2
Case 2 4 10 0 16 T1:3
Case 3 10 10 0 21 T1:4
Case 4 16 16 0 40 T1:5
Case 5 34 16 0 52 T1:6
Case 6 40 22 0 75 T1:7
Case 7 30 15 1.25 50 T1:8
611 B. Metric Function
612 The main objective of the ABC methods is to sample points
613 from the posterior to approximate the actual posterior distribu-
614 tion of the parameters of interest. This is achieved by defining a
615 metric between the scattering behavior of the actual parameters,
616 y, and that of the simulated parameters, x. Therefore, the pro-
617 posed method utilizes the first element of the Mueller matrix of
618 a given aggregate at Nmpolar angles,ϕi, where i 1…Nm.
619 Then, a small distance between x S11ϕ1…S11ϕmT and
620 y S11ϕ1…S11ϕmT means that the parameters leading
621 to x may be the parameters of interest, where indicates
622 the synthetic measurements. The ratio of the norm of the dif-
623 ference between x and y to the norm of y is used as a metric.
624 Since the ABC methods rely on the distance between the simu-
625 lated estimate, x, and synthetic measurement, y, a threshold
626 value,ϵ, must be defined to decide whether a point is sampled
627 from the posterior or not. A point is considered to be sampled
628 from the posterior if the metric is smaller than the user-defined
629 tolerance,
ρx; y kx − yk
kyk <ϵ: (6)
630 The definition of the distance metric is the same for all the
631 algorithms considered in this work. Second, all algorithms
632 assume a uniform prior density, meaning that there is no
633 knowledge about the parameters of interest before the
634 experiments. Choice of uniform prior density is a common
635 practice in the literature, such as the cases in [25] and [28].
636 Third, a preliminary study is conducted to determine how
637 many points an algorithm should sample in order to have re-
638 sults that do not depend on sample size. Algorithm 1 is
639 executed with different numbers of samples, and it is found
640 that sampling 100 points or more from the posterior distribu-
641 tion for a single synthetic measurement yields almost identical
642 results. Therefore, all algorithms except for Algorithm 2 sample
643 100 points, which results in 104points 100 × 100 in total for
644 each case; a sample of 100 points is sufficient. Since Algorithm
645 2 is MCMC, the number of points that will be sampled cannot
646 be controlled with given implementation. Moreover, the
647 transition kernel, q, is assumed as Gaussian with a known
648 covariance matrix for Algorithms 2, 3, and 4. However, some
649 parameters are different from each other for different
650 algorithms.
651 The main purpose of the transition kernel is to propose a
652 new point based on the previous one, and the very nature
653 of the transition kernel must be chosen considering the physics
654 of the problem. While we have tried different transition kernels
655 before selecting the one used in this study, the choice is not
656 arbitrary. Since all algorithms start from the points that are ran-
657 domly selected within the domain, the initial point might be
658 very far away from the actual posterior distribution region. In
659 such cases, a transition kernel with small covariance might not
660 propose acceptable jumps, whereas the rate of acceptance of a
661 too large covariance will still be small. Therefore, an ideal kernel
662 should have covariance that is not too small or too large. The
663 covariance matrix is then taken as 36 − 10; −10 36 for
664 Algorithm 2. Finally, the negative sign of the off-diagonal terms
665 is because rpand Npare inversely proportional to each other for
a constant effective radius through Eq. (4). The covariance ma- 666
trices in Algorithms 3 and 4 are calculated empirically based on 667
the previously sampled points. 668
The threshold value is taken asϵ 0.03 for Algorithms 1 669
and 2, whereas the interval [0.1 0.03] is divided into T 9 670
equal spaces for Algorithm 3, where T 9 is the total number 671
of iterations in Algorithm 3. The minimum acceptable toler- 672
ance considering the normal measurement error is calculated 673
with Eq. (6) for a given parameter set and found around 674
0.023. The threshold value of ϵ 0.03 considers both the 675
measurement error and the interpolation error within the data- 676
base. Besides, larger values of tolerance would result in larger 677
prediction zones for parameters of interest. While this increases 678
the probability of an accurate or acceptable prediction, which 679
can be identified in the test cases by checking if the actual 680
parameters are contained within the predicted zone or not, pre- 681
dicting a smaller zone is also desired. Therefore, ϵ 0.03 682
would yield a smaller prediction zone while encapsulating both 683
the measurement and interpolation error. Finally,α 0.5 and 684
pacc 685
min 0.01 are taken as inputs for Algorithm 4.
It should be noted that although Algorithms 3 and 4 have 686
their stopping criteria, the covariance matrix may also cause 687
these algorithms to stop. The covariance matrix is symmetric 688
and positive-definite by definition. If all the sampled points 689
are identical or similar to each other, then, covariance among 690
the sampled points would be equal to or very close to 0. 691
Algorithms 3 and 4 terminate as the covariance matrix is no 692
more positive-definite, and it is assumed that the algorithm 693
converges to that particular point. 694 C. Database 695
The presented algorithms require the solution of the direct 696
problem of calculating the scattering profile of a given set of 697
parameters. The computational time to obtain the orienta- 698
tion-averaged scattering profile of a given parameter set is 699
demanding, as 490 orientations are considered. Solving the di- 700
rect problem for a parameter set Np 15, rp 8 nm, and 701
σr 0 with the parallel implementation of DDSCAT using 702
3393 dipoles and averaged over 490 orientations takes 653 s 703
in an 8-core system with 2.50 GHz frequency. The computa- 704
tional time required would increase if the number or size of 705
nanoparticles in the aggregate increases. Therefore, construct- 706
ing a database and interpolating the solutions of the intermedi- 707
ate parameter sets would be a feasible approach to sample 708
points from the posterior distribution. 709
Another challenge that must be considered while creating a 710
database is the randomness of fractal geometries. The geom- 711
etries generated with same parameters according to Eq. (3) 712
would be all different due to the randomness in Filippov’s 713
algorithm. Therefore, two separate aggregates generated based 714
on the exact same parameters would have different geometries, 715
and hence, their scattering profile would be different even with- 716
out error being introduced. If a database is to be constructed, 717
scattering behaviors of different geometries with the same par- 718
ticle size and number must be averaged to reduce geometry 719
dependency. 720
The database is constructed as follows: Fifteen different 721
numbers of particles ranging from three to 45 with an incre- 722
4