Digital Signal Processing
Contents lists available at ScienceDirect — www.elsevier.com/locate/dsp
Resource-aware event triggered distributed estimation over adaptive networks

Ihsan Utlu a,b, O. Fatih Kilic a, Suleyman S. Kozat a,∗

a Department of Electrical and Electronics Engineering, Bilkent University, Ankara, Turkey
b ASELSAN Research Center, Ankara 06370, Turkey
Article history: Available online 1 June 2017
Keywords: Distributed estimation; Adaptive networks; Event-triggered communication; Level-crossing quantization

Abstract

We propose a novel algorithm for distributed processing applications constrained by the available communication resources, using diffusion strategies, that achieves up to a 10³-fold reduction in the communication load over the network while delivering comparable performance with respect to the state of the art. After computation of local estimates, the information is diffused among the processing elements (or nodes) non-uniformly in time by conditioning the information transfer on level-crossings of the diffused parameter, resulting in a greatly reduced communication requirement. We provide the mean and mean-square stability analyses of our algorithms, and illustrate the gain in communication efficiency compared to other reduced-communication distributed estimation schemes.

© 2017 Elsevier Inc. All rights reserved.
1. Introduction
In tandem with the increasing computational capabilities of processing units and the growing amount of generated data, the demand for distributed networks and decentralized data processing algorithms has remained an area of growing interest [1–3]. With intrinsic characteristics such as robustness and scalability, distributed architectures provide enhanced efficiency and performance for a wide variety of applications, ranging from adaptive filtering, sequential detection, and sensor networks to distributed resource allocation [4–9]. However, successful implementation of such applications depends on a substantial amount of communication resources. As an example, in smart grid applications, measurement units operating with high frequency put the communication infrastructure of the grid under significant pressure [10]. This calls for resource-efficient, event-triggered distributed estimation solutions that incorporate event-driven communication [11–15]. To this end, in this paper, we construct distributed architectures that have a significantly reduced communication load without compromising performance. We achieve this by introducing novel event-triggered communication architectures over distributed networks.
In a distributed processing framework, a group of measurement-capable agents, termed nodes, in a network cooperate with one another in order to estimate an unknown common phenomenon [16].
∗ Corresponding author.
E-mail addresses: utlu@ee.bilkent.edu.tr (I. Utlu), kilic@ee.bilkent.edu.tr (O. Fatih Kilic), kozat@ee.bilkent.edu.tr (S.S. Kozat).
Among the different approaches for distributed estimation, we specifically consider diffusion-based protocols that exploit the spatial diversity of the network by restricting information sharing to neighboring nodes, without considering any central processing unit or a fusion center [16,17]. Diffusion protocols provide an inherently scalable data processing framework that is resilient to changes in network topology, such as link failures, as well as to changes in the statistical properties of the unknown phenomenon that is measured [16]. However, the requirement for all nodes to exchange their current estimates with their neighbors at each iteration places a heavy burden on the available communication resources [18].
Here, we propose novel event-triggered distributed estimation algorithms for communication-constrained applications that achieve up to a 10³-fold reduction in the communication load over the network. We achieve this by leveraging the uneven distribution of the events over time to efficiently reduce the communication load in real-life applications. In particular, we condition an information exchange between the neighboring nodes on the level-crossings of the diffused parameter [19], unlike using a fixed rate of diffusion, cf. [16,17]. Furthermore, we show that it is sufficient to only diffuse the information indicating the direction of the change in the levels, which can be handled using only two bits for a slowly-varying parameter.
Reduced-communication diffusion is extensively studied in the signal processing literature [18,20–23]. In [18,20,21], the authors restrict the number of active links between neighbors using a probabilistic framework, or by adaptively choosing a single link of communication for each node. In [22], local estimates are randomly projected, and the information transfer between the nodes is reduced to a single bit. In [23], only certain dimensions of the parameter vector are transmitted. In this paper, on the other hand, we reduce the communication load down to only a single bit or a couple of bits, unlike [18,20,21,23], in which the authors diffuse parameters in full precision. Furthermore, we regulate the frequency of information exchange depending on the rate of change of the parameter, unlike [22], where the authors transfer information at each single time instant.

http://dx.doi.org/10.1016/j.dsp.2017.05.011

Fig. 1. An example distributed network with bidirectional connections. The circular area represents the neighborhood of the ith node.
Our main contributions are as follows. We introduce algorithms for distributed estimation that (i) significantly reduce the communication load on the network, while (ii) continuing to deliver performance on par with the state of the art. We also perform the mean and mean-square stability analyses of our algorithms. Through numerical examples, we show that our algorithms provide a significant reduction in the communication load over the network.
The paper is organized as follows. In Section 2, we introduce the distributed estimation framework and discuss the adapt-then-combine (ATC) diffusion strategy. We further detail our algorithms in Section 3, where we formulate the level-triggered distributed estimation algorithm. In Section 4, we present the algorithmic description of the proposed scheme. In Sections 5 and 6, we provide, respectively, the mean and mean-square stability analyses of the proposed distributed adaptive filter and state the conditions for stability. We provide experimental verification of the algorithm in Section 7, and concluding remarks in Section 8.
2. Problem description
Consider a network with N nodes that are distributed spatially as shown in Fig. 1. Each node sequentially observes a noise-corrupted transformation of an unknown parameter $w^o$ through a linear model
$$d_{i,t} = u_{i,t}^T w^o + v_{i,t}, \quad i = 1,\dots,N, \tag{1}$$
and diffuses information to its neighboring nodes $j \in \mathcal{N}_i$,¹ where $w^o \in \mathbb{R}^M$ is the unknown phenomenon, with $u_{i,t}$ and $v_{i,t}$ representing the regressor and the noise processes, respectively. The additive observation noise $v_{i,t}$ and the regressor $u_{i,t}$ are assumed to be temporally and spatially independent, and independent of one another, with $E[u_{i,t} u_{i,t}^T] = \sigma_{u,i}^2 I_M$ and $E[v_{i,t}^2] = \sigma_{v,i}^2$. For each node i, we assume that at time t only the regressor $u_{i,t}$ and the

¹ We represent vectors (matrices) by bold lower (upper) case letters. For a vector a (a matrix A), $a^T$ ($A^T$) is the transpose, and $\|a\|$ represents the Euclidean norm. diag{A} returns a new matrix with only the main diagonal of A, while diag{a} puts a on the main diagonal of the new matrix. col{a₁, …, a_N} produces a column vector formed by stacking its arguments on top of one another. $I_M$ represents the M × M identity matrix, ⊗ stands for the Kronecker product, and Tr{·} stands for the trace.
observation $d_{i,t}$, along with the parameter estimates $\phi_{j,t}$, $j \in \mathcal{N}_i$, from the neighboring nodes are available to it. Therefore each node incurs the following cost for the parameter w [17]:
$$J_i(w) = \frac{1}{2} E\big|d_{i,t} - u_{i,t}^T w\big|^2 + \frac{1}{2} \sum_{j \in \mathcal{N}_i \setminus \{i\}} \alpha_{i,j} \|w - \phi_j\|_2^2, \tag{2}$$
where $\alpha_{i,j}$ is a non-negative, real coefficient satisfying $\sum_{j=1}^{N} \alpha_{i,j} = 1$ that assigns different weights to different neighbors. In order to minimize (2) in an online manner, we employ the stochastic gradient approach [24]. To this end, we calculate the gradient of (2) as
$$\nabla_w J_i(w)^T = R_{u,i} w - R_{du,i} + \sum_{j \in \mathcal{N}_i \setminus \{i\}} \alpha_{i,j} (w - \phi_j), \tag{3}$$
where $R_{u,i} = E[u_{i,t} u_{i,t}^T]$ and $R_{du,i} = E[u_{i,t} d_{i,t}]$. Using the instantaneous approximations $R_{u,i} \approx u_{i,t} u_{i,t}^T$ and $R_{du,i} \approx u_{i,t} d_{i,t}$ in (3), we obtain an approximate expression for the gradient of the cost function as
$$\nabla_w J_i(w)^T \approx u_{i,t}\big(u_{i,t}^T w_{i,t} - d_{i,t}\big) - \sum_{j \in \mathcal{N}_i \setminus \{i\}} \alpha_{i,j} (\phi_j - w_{i,t}). \tag{4}$$
Considering that we are optimizing a sum of two convex cost functions in (2) with the use of (4), we note that we can carry out the optimization using incremental solutions over (2), where the update is performed in two steps. Since we consider the adapt-then-combine (ATC) diffusion strategy in this paper, we first create an intermediate estimate using the gradient of the first summand in (2), and then update the estimate using the second summand in (2) as [17]
$$\phi_{i,t+1} = w_{i,t} + \mu_i u_{i,t}\big(d_{i,t} - u_{i,t}^T w_{i,t}\big), \tag{5}$$
$$w_{i,t+1} = \phi_{i,t+1} + \eta_i \sum_{j \in \mathcal{N}_i \setminus \{i\}} \alpha_{i,j}\big(\phi_{j,t+1} - \phi_{i,t+1}\big), \tag{6}$$
where $\mu_i$ and $\eta_i$ are positive step sizes. Note that we have replaced the estimates $\phi_j$ coming from the neighbors with their instantaneous approximations $\phi_{j,t+1}$. Now, we represent the equation in (6) as
$$w_{i,t+1} = \sum_{j \in \mathcal{N}_i} p_{i,j} \phi_{j,t+1}, \tag{7}$$
where we have defined $p_{i,i} = 1 - \sum_{j \in \mathcal{N}_i \setminus \{i\}} \eta_i \alpha_{i,j}$ and $p_{i,j} = \eta_i \alpha_{i,j}$ for $j \neq i$ to obtain (7), yielding the network matrix $P = [p_{i,j}]$ comprised of the combination weights, with $\sum_{j=1}^{N} p_{i,j} = 1$ and $p_{i,j} \geq 0$.

3. Distributed estimation with level-triggered sampling

The well-known ATC full diffusion scheme (7) requires all nodes in the network to communicate their current estimates (i) in their entirety, and (ii) at a fixed rate, to all their neighboring nodes [17]. We propose a new scheme which achieves increased communication efficiency by conditioning the diffusion of information on the trigger of an event, instead of relying on a fixed rate of diffusion. Our approach considerably reduces the load on communication resources, since only "significant changes" in the diffused parameter, e.g., an abrupt change in the local estimate, are conveyed, based on the particular realization of the signal.
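Before introducing the event-triggered variant, the baseline ATC diffusion LMS of (5) and (7) can be sketched in a few lines. The following is an illustrative simulation, not code from the paper; the network size, step sizes, and uniform combination weights are arbitrary choices for the toy example.

```python
import numpy as np

def atc_diffusion_lms(U, D, P, mu, M):
    """Run the ATC diffusion LMS of (5) and (7).

    U: T x N x M regressors, D: T x N observations,
    P: N x N row-stochastic combination matrix, mu: length-N step sizes.
    Returns the N x M matrix of final node estimates.
    """
    T, N, _ = U.shape
    W = np.zeros((N, M))                  # w_{i,t} for every node i
    for t in range(T):
        # Adapt step (5): local LMS update towards the new observation.
        Phi = np.empty((N, M))
        for i in range(N):
            err = D[t, i] - U[t, i] @ W[i]
            Phi[i] = W[i] + mu[i] * err * U[t, i]
        # Combine step (7): convex combination of neighbor intermediates.
        W = P @ Phi
    return W

# Toy example: 3 fully connected nodes estimating a common w_o.
rng = np.random.default_rng(0)
N, M, T = 3, 4, 3000
w_o = rng.standard_normal(M)
U = rng.standard_normal((T, N, M))
D = np.einsum('tnm,m->tn', U, w_o) + 0.05 * rng.standard_normal((T, N))
P = np.full((N, N), 1.0 / N)              # uniform combination weights
W = atc_diffusion_lms(U, D, P, mu=np.full(N, 0.05), M=M)
print(np.allclose(W, w_o, atol=0.1))      # all nodes converge near w_o
```

Note that this baseline diffuses the full-precision vector $\phi_{i,t+1}$ at every iteration, which is exactly the communication cost the level-triggered scheme below removes.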
To clarify the framework, we consider the diffusion of a scalar parameter $\xi_{i,t}$ from a given node i to a neighboring node j. As an example, this information can be a single component of the estimates [23], or the error associated with an additional estimation layer [22]. In our distributed framework, due to communication constraints, a quantized version $\xi_{i,t}^q$ of the original parameter is shared. We aim to form a quantization scheme which guarantees that $\xi_{i,t}$ and $\xi_{i,t}^q$ are approximately equal to each other for all t, while at the same time keeping the load on communication resources relatively small.

Fig. 2. Illustration of the operation of the LC quantizer. Blue dots represent the original node estimates, while red ones represent the quantized versions of the corresponding estimates. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
To solve this problem, we propose an event-triggered communication algorithm in which, as the event-triggering mechanism, we specifically use level-crossing (LC) quantization [19]. To clarify the framework, suppose we have a discrete-time signal $\xi_{i,t}$, as shown in Fig. 2, that represents the information to be communicated from node i to node j, e.g., the estimated parameter or the estimation error. In conventional quantization, we sample and quantize this parameter at each time instant. In LC quantization, on the other hand, we consider a set of levels $\mathcal{S} \triangleq \{l_1, \dots, l_K\}$, illustrated in Fig. 2. At each discrete time index t, node i checks whether a level-crossing has occurred on $\xi_{i,t}$. When the parameter $\xi_{i,t}$ crosses a level $l_{i,t}$, i.e.,
$$(\xi_{i,t-1} - l_{i,t})(\xi_{i,t} - l_{i,t}) < 0$$
for some $l_{i,t} \in \mathcal{S}$, node i transmits information to its neighboring nodes. For example, this information can be the direction of the level-crossing [19]. A neighboring node j uses this received information to form an estimate $\xi_{i,t}^q$ of $\xi_{i,t}$.

If there is an information transfer by node i at time t, the receiving node j estimates the parameter as the level through which the crossing has occurred:
$$\xi_{i,t}^q = l_{i,t}. \tag{8}$$
For the time instants when node i is silent, node j infers that no significant change in the parameter has taken place, and uses the estimated parameter value from the previous time instant:
$$\xi_{i,t}^q = \xi_{i,t-1}^q. \tag{9}$$
We note that the set of levels $\mathcal{S}$ is known by all nodes in the network. Hence, as the diffused information, it is sufficient for node i to only convey how $\xi_{i,t}^q$ changes compared to the previously-crossed level $\xi_{i,t-1}^q$. In particular, we note the following two cases. In the first case, the parameter $\xi_{i,t}$ changes slowly enough that a crossing through multiple levels does not occur, so that node i only needs to indicate the direction of the change in levels. We therefore transmit two bits for this case: one indicating that a single level crossing has occurred, and the other indicating the direction of the crossing. In the second case, we may have multiple crossings, where we directly code the full location information of the new level value $\xi_{i,t}^q$, together with a flag bit indicating that multiple level crossings have occurred, using $\log_2(K) + 1$ bits. As shown, this approach significantly lowers the amount of communication while maintaining estimation performance.

4. Algorithm description
In this section, we present the full algorithmic description of the proposed diffusion scheme with the level-crossing quantization [19]. At time t, a given node i in the network makes the scalar observation $d_{i,t}$ through the linear model $d_{i,t} = u_{i,t}^T w^o + v_{i,t}$, which is then used to update its intermediary local estimate using the LMS adaptation
$$\phi_{i,t+1} = \big(I_M - \mu_i u_{i,t} u_{i,t}^T\big) w_{i,t} + \mu_i u_{i,t} d_{i,t}.$$
Due to the quantized communication framework, a neighboring node j does not have access to the true value of the parameter $\phi_{i,t+1}$, which has M entries. As such, based on the limited information it receives from node i, node j tries to estimate this parameter as the M-entry vector $\phi_{i,t+1}^q$. Specifically, in the LC quantization, node j receives information about how the current values of the entries of the parameter $\phi_{i,t+1}$ have changed relative to the most recent estimate node j has access to, namely $\phi_{i,t}^q$. Node i records the most recent estimate $\phi_{i,t}^q$ as a reference and diffuses information to the neighboring nodes $j \in \mathcal{N}_i$ indicating how the current estimate $\phi_{i,t+1}$ compares to this reference on a per-entry basis. In particular, node i makes this comparison by checking for a level crossing between corresponding entries of the two vector quantities $\phi_{i,t}^q$ and $\phi_{i,t+1}$. If there is a level crossing on an entry, node i transmits information to its neighbors through a channel frequency allocated to this particular entry. If there is a single level-crossing, this information indicates the direction of the level crossing; otherwise, the transmitted information directly specifies the location of the new level. A neighboring node j then constructs the estimate $\phi_{i,t+1}^q$ using (8) or (9) on a per-entry basis, depending on whether node i diffuses information or not, respectively, at time t.

While diffusing information related to its own local estimate, node i also receives information from the neighboring nodes j representing their local estimates $\phi_{j,t+1}$. For each neighboring node j, node i uses this diffused information to reconstruct $\phi_{j,t+1}^q$ using (8) or (9). The final estimate $w_{i,t+1}$ is then constructed using the combination
$$w_{i,t+1} = p_{i,i} \phi_{i,t+1} + \sum_{j \in \mathcal{N}_i \setminus \{i\}} p_{i,j} \phi_{j,t+1}^q.$$
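The full per-entry scheme described above can be sketched end-to-end. The following toy simulation is illustrative, not the paper's code: it assumes a uniform level grid known network-wide, tracks the shared references directly instead of encoding/decoding bits, and uses arbitrary network and step-size choices.

```python
import numpy as np

def lc_round(W, Phi_q, U_t, d_t, P, levels, mu):
    """One time step of the event-triggered ATC scheme (illustrative sketch).

    W: N x M current estimates, Phi_q: N x M last quantized intermediaries,
    assumed known network-wide since all nodes track the same references.
    Returns updated (W, Phi_q) and the number of entries transmitted.
    """
    N, M = W.shape
    # Adapt: local LMS update at every node.
    err = d_t - np.einsum('nm,nm->n', U_t, W)
    Phi = W + mu * err[:, None] * U_t
    # Quantize: per entry, snap to the last level crossed, else keep old level.
    sent = 0
    Phi_q_new = Phi_q.copy()
    for i in range(N):
        for m in range(M):
            crossed = levels[(Phi_q[i, m] - levels) * (Phi[i, m] - levels) < 0]
            if crossed.size:              # event: at least one level crossed
                Phi_q_new[i, m] = crossed[-1] if Phi[i, m] > Phi_q[i, m] else crossed[0]
                sent += 1
    # Combine: own intermediary in full precision, neighbors' quantized.
    W_new = np.diag(np.diag(P)) @ Phi + (P - np.diag(np.diag(P))) @ Phi_q_new
    return W_new, Phi_q_new, sent

# Toy run: 4 nodes, M = 3, uniform combination weights.
rng = np.random.default_rng(1)
N, M, T = 4, 3, 2000
w_o = rng.standard_normal(M); w_o /= np.linalg.norm(w_o)
P = np.full((N, N), 1.0 / N)
levels = np.linspace(-2.0, 2.0, 161)      # level spacing 0.025
W = np.zeros((N, M)); Phi_q = np.zeros((N, M)); total = 0
for t in range(T):
    U_t = rng.standard_normal((N, M))
    d_t = U_t @ w_o + 0.05 * rng.standard_normal(N)
    W, Phi_q, s = lc_round(W, Phi_q, U_t, d_t, P, levels, mu=0.05)
    total += s
print(np.max(np.abs(W - w_o)), total, N * M * T)
```

In this sketch the nodes converge near $w^o$ while the event counter `total` stays well below the `N * M * T` transmissions a fixed-rate scheme would use, which is the qualitative behavior the algorithm targets.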
Remark. In order to keep the presentation clear, we illustrate the special case M = 1 of the proposed algorithm in Algorithm 1; it can be generalized to arbitrary M in a straightforward manner.

Remark. We note that an alternative approach to dealing with the M > 1 case is to have the nodes in the network transmit only a certain entry of their intermediary estimates $\phi_{i,t}$. As an example, in this case, the nodes can cycle through different entries across time in a round-robin fashion. The non-communicated entries are replaced by the corresponding entries in the local intermediary estimate [23]. This approach is explored in Section 7.
5. Mean stability analysis

Algorithm 1: ATC diffusion LMS with the LC quantization, M = 1.
1: for i = 1 to N do {Initialization}
2:   $w_{i,0} = \phi_{i,0}^q = 0$
3: end for
4: for t ≥ 0 do
5:   for i = 1 to N do {Local adaptation}
6:     $\phi_{i,t+1} = (1 - \mu_i u_{i,t}^2) w_{i,t} + \mu_i u_{i,t} d_{i,t}$
       {Check for level crossing}
7:     if $\exists\, l_{i,t} \in \mathcal{S}$ such that $(\phi_{i,t}^q - l_{i,t})(\phi_{i,t+1} - l_{i,t}) < 0$ then
8:       if the crossing is to an adjacent level then
9:         Diffuse the direction of the crossing
10:      else
11:        Diffuse the location of the new level
12:      end if
13:      Locally store $\phi_{i,t+1}^q = l_{i,t}$ in record
14:    else
15:      Remain silent
16:      Locally set $\phi_{i,t+1}^q = \phi_{i,t}^q$
17:    end if
       {Reconstruction}
18:    for all $j \in \mathcal{N}_i \setminus \{i\}$ do
19:      if node j is silent then
20:        Reconstruct as $\phi_{j,t+1}^q = \phi_{j,t}^q$
21:      else
22:        Reconstruct $\phi_{j,t+1}^q$ using the diffused information
23:      end if
24:    end for
       {Combination}
25:    $w_{i,t+1} = p_{i,i} \phi_{i,t+1} + \sum_{j \in \mathcal{N}_i \setminus \{i\}} p_{i,j} \phi_{j,t+1}^q$
26:  end for
27: end for

To continue with the stability analysis of the proposed scheme, we assume that the regressors $u_{i,t}$ are temporally and spatially independent, zero mean and white, with covariance matrix $\Lambda_i \triangleq E[u_{i,t} u_{i,t}^T] = \sigma_{u,i}^2 I_M$. The observation $d_{i,t}$ at node i is assumed to follow a linear model of the form
$$d_{i,t} = u_{i,t}^T w^o + v_{i,t}, \tag{10}$$
where $\{v_{i,t}\}_{t \geq 1}$ is a zero-mean white Gaussian noise process with variance $\sigma_{v,i}^2$, independent of $\{u_{j,t}\}_{t \geq 1}$ for all i, j. In our proposed level-triggered estimation framework, at each node i, the diffusion LMS update for the ATC strategy takes the form
$$\phi_{i,t+1} = \big(I_M - \mu_i u_{i,t} u_{i,t}^T\big) w_{i,t} + \mu_i u_{i,t} d_{i,t}, \tag{11}$$
$$w_{i,t+1} = p_{i,i} \phi_{i,t+1} + \sum_{j \in \mathcal{N}_i \setminus \{i\}} p_{i,j} \phi_{j,t+1}^q, \tag{12}$$
where the combination matrix P is taken to be stochastic, with its rows summing up to unity. We rewrite the expressions (11) and (12) as
$$\phi_{i,t+1} = \big(I_M - \mu_i u_{i,t} u_{i,t}^T\big) w_{i,t} + \mu_i u_{i,t} d_{i,t}, \tag{13}$$
$$w_{i,t+1} = \sum_{j \in \mathcal{N}_i} p_{i,j} \phi_{j,t+1} - \sum_{j \in \mathcal{N}_i \setminus \{i\}} p_{i,j} \alpha_{j,t+1}, \tag{14}$$
by defining the quantization error for node j as
$$\alpha_{j,t} \triangleq \phi_{j,t} - \phi_{j,t}^q.$$
We represent the diffusion update over the network $\mathcal{N}$ in state-space form by introducing the following global quantities:
$$d_t \triangleq \mathrm{col}\{d_{1,t},\dots,d_{N,t}\}, \quad v_t \triangleq \mathrm{col}\{v_{1,t},\dots,v_{N,t}\}, \quad w^o \triangleq \mathrm{col}\{w^o,\dots,w^o\},$$
$$U_t \triangleq \mathrm{diag}\{u_{1,t},\dots,u_{N,t}\}, \quad \mathcal{M} \triangleq \mathrm{diag}\{\mu_1 I_M,\dots,\mu_N I_M\}, \quad w_t \triangleq \mathrm{col}\{w_{1,t},\dots,w_{N,t}\},$$
$$\phi_t \triangleq \mathrm{col}\{\phi_{1,t},\dots,\phi_{N,t}\}, \quad \phi_t^q \triangleq \mathrm{col}\{\phi_{1,t}^q,\dots,\phi_{N,t}^q\}, \quad \alpha_t \triangleq \mathrm{col}\{\alpha_{1,t},\dots,\alpha_{N,t}\},$$
$$G \triangleq P \otimes I_M, \quad P_C \triangleq P - \mathrm{diag}\{P\}, \quad G_C \triangleq P_C \otimes I_M.$$
Using the above-defined quantities, the diffusion updates (13), (14) take the following global state-space form:
$$\phi_{t+1} = \big(I_{MN} - \mathcal{M} U_t U_t^T\big) w_t + \mathcal{M} U_t d_t, \tag{15}$$
$$w_{t+1} = G \phi_{t+1} - G_C \alpha_{t+1}. \tag{16}$$
Similarly, the data model (10) can be expressed in terms of the global quantities as
$$d_t = U_t^T w^o + v_t. \tag{17}$$
To facilitate the mean stability analysis, we define the global deviation parameters
$$\tilde{w}_t \triangleq w^o - w_t, \quad \tilde{\phi}_t \triangleq w^o - \phi_t.$$
After substituting (17) and subtracting both sides of (15), (16) from $w^o$, the diffusion updates in terms of the deviation parameters take the following form:
$$\tilde{\phi}_{t+1} = \big(I_{MN} - \mathcal{M} U_t U_t^T\big) \tilde{w}_t - \mathcal{M} U_t v_t, \tag{18}$$
$$\tilde{w}_{t+1} = G \tilde{\phi}_{t+1} + G_C \alpha_{t+1}, \tag{19}$$
where we have used the relation $G w^o = w^o$, which results from the stochastic nature of P. The expressions (18), (19) can be expressed compactly as
$$\tilde{w}_{t+1} = G \big(I_{MN} - \mathcal{M} U_t U_t^T\big) \tilde{w}_t - G \mathcal{M} U_t v_t + G_C \alpha_{t+1}. \tag{20}$$
Assumption. The quantization error $\alpha_t$ over the network has zero mean. This is a reasonable assumption for the analysis of quantization effects [24]. The applicability of the assumption is verified by our experiments in Section 7.
Taking expectations of both sides of (20) yields
$$E\,\tilde{w}_{t+1} = G \big(I_{MN} - \mathcal{M} \Lambda\big) E\,\tilde{w}_t, \tag{21}$$
where $\Lambda \triangleq \mathrm{diag}\{\Lambda_1,\dots,\Lambda_N\}$ is block diagonal. For mean stability and asymptotic unbiasedness of the distributed filter (11)–(12), we require that the spectral radius satisfies $\rho\big(G(I_{MN} - \mathcal{M}\Lambda)\big) < 1$, which, noting that G is stochastic with nonnegative entries, is equivalent to requiring
$$\rho\big(I_{MN} - \mathcal{M}\Lambda\big) < 1, \tag{22}$$
by Theorem 4.4 of [25]. Noting that the set of eigenvalues of the block diagonal matrix $I_{MN} - \mathcal{M}\Lambda$ is the union of the eigenvalues of its individual blocks $I_M - \mu_i \Lambda_i$, where $\Lambda_i = \sigma_{u,i}^2 I_M$, we conclude that the distributed filter is mean stable if $|1 - \mu_i \sigma_{u,i}^2| < 1$, $i = 1,\dots,N$, i.e., if
$$0 < \mu_i < \frac{2}{\sigma_{u,i}^2}, \quad i = 1,\dots,N.$$
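The mean-stability condition can also be checked numerically. The sketch below (illustrative, not from the paper; the network sizes and statistics are arbitrary) builds $B = G(I_{MN} - \mathcal{M}\Lambda)$ for a toy network and tests whether its spectral radius is below one.

```python
import numpy as np

def mean_stable(P, mu, sigma_u2, M):
    """Check the mean-stability condition of (21) numerically.

    Builds B = G (I_{MN} - M Lambda) with G = P kron I_M and returns
    whether the spectral radius of B is below one.
    """
    N = P.shape[0]
    G = np.kron(P, np.eye(M))
    Mmat = np.kron(np.diag(mu), np.eye(M))        # step-size matrix
    Lam = np.kron(np.diag(sigma_u2), np.eye(M))   # regressor covariances
    B = G @ (np.eye(M * N) - Mmat @ Lam)
    return np.max(np.abs(np.linalg.eigvals(B))) < 1.0

# Three nodes with heterogeneous regressor powers.
sigma_u2 = np.array([1.0, 4.0, 0.25])
P = np.full((3, 3), 1.0 / 3)                      # uniform combination
print(mean_stable(P, np.array([0.1, 0.1, 0.1]), sigma_u2, M=2))   # True
# With P = I (no cooperation), node 2 violates mu < 2/sigma_u2.
print(mean_stable(np.eye(3), np.array([0.1, 0.6, 0.1]), sigma_u2, M=2))  # False
```

The second call illustrates that the per-node bound $0 < \mu_i < 2/\sigma_{u,i}^2$ is what fails when a step size is too large relative to the local regressor power; cooperation through a stochastic P can only help, consistent with the sufficiency argument above.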
6. Mean-square stability

We utilize the weighted energy relation approach [24] to proceed with the mean-square transient analysis of the distributed filter. For a positive-definite weighting matrix $\Sigma$, taking the weighted norm of both sides of (20) yields
$$\begin{aligned} \tilde{w}_{t+1}^T \Sigma \tilde{w}_{t+1} ={}& \tilde{w}_t^T \big(I_{MN} - \mathcal{M} U_t U_t^T\big)^T G^T \Sigma G \big(I_{MN} - \mathcal{M} U_t U_t^T\big) \tilde{w}_t \\ &- 2\, v_t^T U_t^T \mathcal{M} G^T \Sigma G \big(I_{MN} - \mathcal{M} U_t U_t^T\big) \tilde{w}_t \\ &+ 2\, \alpha_{t+1}^T G_C^T \Sigma G \big(I_{MN} - \mathcal{M} U_t U_t^T\big) \tilde{w}_t \\ &- 2\, v_t^T U_t^T \mathcal{M} G^T \Sigma G_C \alpha_{t+1} + v_t^T U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t v_t \\ &+ \alpha_{t+1}^T G_C^T \Sigma G_C \alpha_{t+1}. \end{aligned} \tag{23}$$
Noting that $v_t$ is zero-mean and independent of $U_t$ and $\tilde{w}_t$, taking the expected value of both sides of (23) yields the following variance relation:
$$\begin{aligned} E\|\tilde{w}_{t+1}\|_{\Sigma}^2 ={}& E\|\tilde{w}_t\|_{\Sigma'}^2 + 2\, E\,\alpha_{t+1}^T G_C^T \Sigma G \big(I_{MN} - \mathcal{M} U_t U_t^T\big) \tilde{w}_t \\ &- 2\, E\,v_t^T U_t^T \mathcal{M} G^T \Sigma G_C \alpha_{t+1} + E\,v_t^T U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t v_t \\ &+ E\,\alpha_{t+1}^T G_C^T \Sigma G_C \alpha_{t+1}, \end{aligned} \tag{24}$$
where
$$\Sigma' \triangleq G^T \Sigma G - G^T \Sigma G \mathcal{M} U_t U_t^T - U_t U_t^T \mathcal{M} G^T \Sigma G + U_t U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t U_t^T.$$
By the temporal independence of the regressor process $U_t$ and the independence of the noise process $v_t$ from $U_t$, we have the result that $U_t$ is independent of $\tilde{w}_t$. Hence, the random weighting matrix $\Sigma'$ can be replaced by its mean value $E\,\Sigma'$ in (24). Thus,
$$\Sigma' = G^T \Sigma G - G^T \Sigma G \mathcal{M} \Lambda - \Lambda \mathcal{M} G^T \Sigma G + E\, U_t U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t U_t^T, \tag{25}$$
where $\Lambda \triangleq E\, U_t U_t^T$. Substituting the $\tilde{\phi}_{t+1}$ expression from (18) into (24) yields the following final form of the variance relation:
$$E\|\tilde{w}_{t+1}\|_{\Sigma}^2 = E\|\tilde{w}_t\|_{\Sigma'}^2 + 2\, E\,\alpha_{t+1}^T G_C^T \Sigma G \tilde{\phi}_{t+1} + E\,\alpha_{t+1}^T G_C^T \Sigma G_C \alpha_{t+1} + E\,v_t^T U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t v_t. \tag{26}$$
To capture the mean-square behavior of the adaptive network, we express the relations (25), (26) in a compact form by using the convenient vector notation of [24]. In particular, we use the bvec{·} block vectorization operation [16], which transforms an arbitrary MN × MN block matrix $\Sigma$ with (i,j)th block $\Sigma_{ij}$ of size M × M into the vector $\mathrm{col}\{\sigma_1,\dots,\sigma_N\}$, where $\sigma_j \triangleq \mathrm{col}\{\mathrm{vec}\{\Sigma_{1j}\},\dots,\mathrm{vec}\{\Sigma_{Nj}\}\}$. We also use the block Kronecker product $A \odot B$, defined as having the (i,j)th block
$$[A \odot B]_{ij} = \begin{bmatrix} A_{ij} \otimes B_{11} & \cdots & A_{ij} \otimes B_{1N} \\ \vdots & \ddots & \vdots \\ A_{ij} \otimes B_{N1} & \cdots & A_{ij} \otimes B_{NN} \end{bmatrix}, \tag{27}$$
which is related to the bvec{·} operator via $\mathrm{bvec}\{ABC\} = (C^T \odot A)\,\mathrm{bvec}\{B\}$. Defining $\sigma \triangleq \mathrm{bvec}\{\Sigma\}$ and vectorizing both sides of (25) yields
$$\mathrm{bvec}\{\Sigma'\} = \big( (I_{MN} \odot I_{MN}) - (\Lambda\mathcal{M} \odot I_{MN}) - (I_{MN} \odot \Lambda\mathcal{M}) \big) (G^T \odot G^T)\, \sigma + \mathrm{bvec}\{E\, U_t U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t U_t^T\}. \tag{28}$$
The term $E\, U_t U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t U_t^T$ on the right-hand side of (28) can be vectorized by resorting to the Gaussian factorization theorem [16,17]. We let $\tilde{\Sigma} = \mathcal{M} G^T \Sigma G \mathcal{M}$, with (k,l)th block $\tilde{\Sigma}_{kl}$ and with the vectorized form $\mathrm{bvec}\{\tilde{\Sigma}\} = \mathrm{col}\{\tilde{\sigma}_1,\dots,\tilde{\sigma}_N\}$, where $\tilde{\sigma}_j = \mathrm{col}\{\tilde{\sigma}_{1j},\dots,\tilde{\sigma}_{Nj}\}$. Then the (k,l)th block $\Phi_{kl}$ of $E\, U_t U_t^T \tilde{\Sigma} U_t U_t^T$ is given by
$$\Phi_{kl} = \begin{cases} \Lambda_k \tilde{\Sigma}_{kl} \Lambda_l, & k \neq l, \\ \Lambda_k \tilde{\Sigma}_{kk} \Lambda_k + 2 \Lambda_k \mathrm{Tr}\{\tilde{\Sigma}_{kk} \Lambda_k\}, & k = l, \end{cases}$$
with the vectorized form
$$\gamma_{kl} = \begin{cases} (\Lambda_l \otimes \Lambda_k)\, \tilde{\sigma}_{kl}, & k \neq l, \\ \big( (\Lambda_l \otimes \Lambda_k) + 2 r_k r_k^T \big)\, \tilde{\sigma}_{kl}, & k = l, \end{cases}$$
by the factorization theorem, where $\Lambda_k \triangleq E\, u_{k,t} u_{k,t}^T$ and $r_k \triangleq \mathrm{vec}\{\Lambda_k\}$. Letting $\mathrm{bvec}\{\Phi\} = \mathrm{col}\{\gamma_1,\dots,\gamma_N\}$, where $\gamma_j = \mathrm{col}\{\gamma_{1j},\dots,\gamma_{Nj}\}$, we observe that we can express $\gamma_j$ in the form $\gamma_j = \mathcal{A}_j \tilde{\sigma}_j$, where
$$\mathcal{A}_j \triangleq \mathrm{diag}\big\{\Lambda_j \otimes \Lambda_1,\; \dots,\; \Lambda_j \otimes \Lambda_j + 2 r_j r_j^T,\; \dots,\; \Lambda_j \otimes \Lambda_N\big\}.$$
Further defining $\mathcal{A} \triangleq \mathrm{diag}\{\mathcal{A}_1,\dots,\mathcal{A}_N\}$, we arrive at the representation
$$\mathrm{bvec}\{\Phi\} = \mathcal{A}\, \mathrm{bvec}\{\tilde{\Sigma}\} = \mathcal{A}\, (\mathcal{M} \odot \mathcal{M}) (G^T \odot G^T)\, \sigma. \tag{29}$$
Substituting (29) into (28) yields
$$\mathrm{bvec}\{\Sigma'\} = \big( (I_{MN} \odot I_{MN}) - (\Lambda\mathcal{M} \odot I_{MN}) - (I_{MN} \odot \Lambda\mathcal{M}) + \mathcal{A}\,(\mathcal{M} \odot \mathcal{M}) \big) (G^T \odot G^T)\, \sigma. \tag{30}$$
The term $E\, v_t^T U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t v_t$ in (26) can be verified to be
$$E\, v_t^T U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t v_t = E\,\mathrm{Tr}\{v_t^T U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t v_t\} = E\,\mathrm{Tr}\{\Sigma G \mathcal{M} U_t v_t v_t^T U_t^T \mathcal{M} G^T\} = \mathrm{Tr}\{\Sigma G \mathcal{M} H \mathcal{M} G^T\}, \tag{31}$$
where we have defined $H \triangleq E\, U_t v_t v_t^T U_t^T$. We observe that H has the (k,l)th block $H_{kl} = \sigma_{v,k}^2 \Lambda_k \delta_{kl}$, which yields $H = (\Lambda_v \otimes I_M) \Lambda$, where $\Lambda_v \triangleq E\, v_t v_t^T$. Thus (31) becomes
$$E\, v_t^T U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t v_t = \mathrm{Tr}\{\Sigma G \mathcal{M} (\Lambda_v \otimes I_M) \Lambda \mathcal{M} G^T\} = \big( (G\mathcal{M} \odot G\mathcal{M})\, \mathrm{bvec}\{(\Lambda_v \otimes I_M)\Lambda\} \big)^T \sigma. \tag{32}$$
Similarly, the remaining terms on the RHS of (26) can be verified to be
$$E\,\alpha_{t+1}^T G_C^T \Sigma G \tilde{\phi}_{t+1} = \big( (G \odot G_C)\, \mathrm{bvec}\{E[\alpha_{t+1} \tilde{\phi}_{t+1}^T]\} \big)^T \sigma,$$
$$E\,\alpha_{t+1}^T G_C^T \Sigma G_C \alpha_{t+1} = \big( (G_C \odot G_C)\, \mathrm{bvec}\{E[\alpha_{t+1} \alpha_{t+1}^T]\} \big)^T \sigma. \tag{33}$$
Defining the quantities
$$b_t \triangleq (G\mathcal{M} \odot G\mathcal{M})\, \mathrm{bvec}\{(\Lambda_v \otimes I_M)\Lambda\} + (G \odot G_C)\, \mathrm{bvec}\{E[\alpha_t \tilde{\phi}_t^T]\} + (G_C \odot G_C)\, \mathrm{bvec}\{E[\alpha_t \alpha_t^T]\},$$
$$F \triangleq \big( (I_{MN} \odot I_{MN}) - (\Lambda\mathcal{M} \odot I_{MN}) - (I_{MN} \odot \Lambda\mathcal{M}) + \mathcal{A}\,(\mathcal{M} \odot \mathcal{M}) \big) (G^T \odot G^T), \tag{34}$$
and further using the shorthand $E\|\tilde{w}_t\|_{\sigma}^2$ for $E\|\tilde{w}_t\|_{\mathrm{bvec}^{-1}(\sigma)}^2$, yields the following compact form for the weighted energy recursion:
$$E\|\tilde{w}_{t+1}\|_{\sigma}^2 = E\|\tilde{w}_t\|_{F\sigma}^2 + b_{t+1}^T \sigma. \tag{35}$$
Remark. We note that the expectations $E[\alpha_{t+1}\tilde{\phi}_{t+1}^T]$ and $E[\alpha_{t+1}\alpha_{t+1}^T]$ present some difficulty for further analytical simplification in closed form, in exact or approximate terms. This is caused by the large degree to which the quantization error term $\alpha_t$ is coupled, nonlinearly, with itself as well as with the intermediary parameter deviation $\tilde{\phi}_t$, through the non-deterministic reference levels $\{\phi_{i,t'}^q\}_{t' \leq t}$ against which the level-crossing events are checked, and which evolve through (13)–(14). We further note that invoking an approximation based on independence arguments for $E[\alpha_{t+1}\tilde{\phi}_{t+1}^T]$, which captures the covariances between the intermediary parameter deviations and the quantization errors over arbitrary pairs of nodes on the network, is not feasible in general unless further assumptions are made on the number of quantization levels employed, so that the deviations become statistically less sensitive to the error terms. We stress that the lack of closed-form expressions for these expectations does not hamper our analysis of the mean-square stability, since requiring that the aforementioned terms remain bounded is sufficient for the purpose of establishing a bound on the (weighted) mean-square deviation $E\|\tilde{w}_t\|_{\sigma}^2$.

Iteration of (35) yields the recursions
$$E\|\tilde{w}_{t+1}\|_{\sigma}^2 = E\|\tilde{w}_t\|_{F\sigma}^2 + b_{t+1}^T \sigma,$$
$$E\|\tilde{w}_{t+1}\|_{F\sigma}^2 = E\|\tilde{w}_t\|_{F^2\sigma}^2 + b_{t+1}^T F\sigma,$$
$$\vdots$$
$$E\|\tilde{w}_{t+1}\|_{F^{N^2M^2-1}\sigma}^2 = E\|\tilde{w}_t\|_{F^{N^2M^2}\sigma}^2 + b_{t+1}^T F^{N^2M^2-1}\sigma. \tag{36}$$
Using the Cayley–Hamilton theorem with the characteristic polynomial p(x) of F results in
$$F^{N^2M^2} = -p_{N^2M^2-1} F^{N^2M^2-1} - \dots - p_1 F - p_0 I.$$
Substituting into (36) then results in the expression
$$E\|\tilde{w}_{t+1}\|_{F^{N^2M^2-1}\sigma}^2 = -p_{N^2M^2-1}\, E\|\tilde{w}_t\|_{F^{N^2M^2-1}\sigma}^2 - \dots - p_0\, E\|\tilde{w}_t\|_{\sigma}^2 + b_{t+1}^T F^{N^2M^2-1}\sigma,$$
which can be placed into the state-space form
$$\mathcal{W}_{t+1} = \mathcal{F}\, \mathcal{W}_t + \mathcal{Y}_{t+1}, \tag{37}$$
where
$$\mathcal{W}_t \triangleq \begin{bmatrix} E\|\tilde{w}_t\|_{\sigma}^2 \\ E\|\tilde{w}_t\|_{F\sigma}^2 \\ \vdots \\ E\|\tilde{w}_t\|_{F^{N^2M^2-1}\sigma}^2 \end{bmatrix}, \quad \mathcal{Y}_t \triangleq \begin{bmatrix} b_t^T \sigma \\ b_t^T F\sigma \\ \vdots \\ b_t^T F^{N^2M^2-1}\sigma \end{bmatrix}, \tag{38}$$
$$\mathcal{F} \triangleq \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -p_0 & -p_1 & -p_2 & \cdots & -p_{N^2M^2-1} \end{bmatrix}. \tag{39}$$
To make the mean-square stability analysis more tractable, we introduce the following assumption:
Assumption. The quantization error covariances $E[\alpha_{t+1}\tilde{\phi}_{t+1}^T]$ and $E[\alpha_{t+1}\alpha_{t+1}^T]$ remain bounded, with $\|E[\alpha_{t+1}\tilde{\phi}_{t+1}^T]\|_F,\; \|E[\alpha_{t+1}\alpha_{t+1}^T]\|_F < A$ for some A > 0, in the Frobenius norm.

Using the assumption, we obtain a bound on the norm $\|b_t\|^2$ as
$$\begin{aligned} \|b_t\|^2 &\leq \big\|(G\mathcal{M} \odot G\mathcal{M})\, \mathrm{bvec}\{(\Lambda_v \otimes I_M)\Lambda\}\big\|^2 + \|G \odot G_C\|^2 \big\|\mathrm{bvec}\{E[\alpha_t \tilde{\phi}_t^T]\}\big\|^2 + \|G_C \odot G_C\|^2 \big\|\mathrm{bvec}\{E[\alpha_t \alpha_t^T]\}\big\|^2 \\ &\leq \big\|(G\mathcal{M} \odot G\mathcal{M})\, \mathrm{bvec}\{(\Lambda_v \otimes I_M)\Lambda\}\big\|^2 + A\big(\|P\|^2 + \|P_C\|^2\big)\|P_C\|^2 \triangleq B. \end{aligned}$$
Inspecting (39), we observe that the boundedness of $\|b_t\|^2$ implies the boundedness of $\|\mathcal{Y}_t\|^2$; hence there exists C > 0 such that $\|\mathcal{Y}_t\|^2 < C$ for all t. The recursion (37) can be solved for $\mathcal{W}_t$ in closed form as
$$\mathcal{W}_t = \mathcal{F}^t \mathcal{W}_0 + \sum_{n=0}^{t-1} \mathcal{F}^n \mathcal{Y}_{t-n}. \tag{40}$$
Using (40), we can obtain a bound on $\|\mathcal{W}_t\|^2$ as
$$\|\mathcal{W}_t\|^2 \leq \|\mathcal{F}^t\|^2 \|\mathcal{W}_0\|^2 + \sum_{n=0}^{t-1} \|\mathcal{F}^n\|^2 \|\mathcal{Y}_{t-n}\|^2 \leq \|\mathcal{F}\|^{2t} \|\mathcal{W}_0\|^2 + C\, \frac{1 - \|\mathcal{F}\|^{2t}}{1 - \|\mathcal{F}\|^2}, \tag{41}$$
where we have used the fact that, since $\mathcal{F}$ is in the form of a companion matrix for F, they share the same set of eigenvalues. We note that requiring $\|\mathcal{W}_t\|^2$ to remain bounded is sufficient to guarantee the mean-square stability of the overall system, since doing so ensures that $E\|\tilde{w}_t\|_{\sigma}^2$ remains bounded. Thus, by (41), the mean-square stability condition reduces to the matrix F given by (34) being stable. Hence, in order to ensure MS stability, it is sufficient that the step sizes $\mu_i$ are chosen such that the matrix F is stable.

7. Experiments
In this section, we demonstrate the significant reduction in the communication load achieved by our algorithms while providing equal performance with respect to the state of the art.
For the first part of the simulations, we consider a sample network consisting of N = 10 nodes, where each node makes its observation through the linear model
$$d_{i,t} = u_{i,t}^T w^o + v_{i,t}, \quad i = 1,\dots,N. \tag{42}$$
The regressor data $u_{i,t}$ are zero-mean i.i.d. Gaussian with standard deviations $\sigma_{u,i}$ chosen randomly from the interval (0.3, 0.8). The observation noises are generated from a Normal distribution with standard deviations $\sigma_{v,i}$ chosen randomly from the interval (0.1, 0.3). In Fig. 3, we depict the network topology and the network's statistical profile to show how the signal power and the noise power vary across the network.

The unknown vector parameter $w^o$ with M = 10 components is randomly chosen from a Normal distribution and normalized to have unit energy. We changed the source statistics in the middle of the simulations to observe how well the proposed algorithm is able to track sudden changes in the unknown parameter.
We use the Metropolis combination rule to generate the network matrix P such that
$$p_{i,j} = \begin{cases} \dfrac{1}{\max(n_i, n_j)} & \text{if } i \neq j \text{ are linked}, \\ 0 & \text{for } i \text{ and } j \text{ not linked}, \\ 1 - \sum_{j \in \mathcal{N}_i \setminus \{i\}} p_{i,j} & \text{for } i = j, \end{cases}$$
where $n_i = |\mathcal{N}_i|$ denotes the neighborhood size of node i, using the randomly selected network adjacency matrix given by
$$\begin{bmatrix} 1&1&0&0&0&1&1&0&0&0 \\ 1&1&1&0&1&0&1&0&0&0 \\ 0&1&1&1&0&1&0&0&0&1 \\ 0&0&1&1&0&1&0&0&1&1 \\ 0&1&0&0&1&1&0&0&1&1 \\ 1&0&1&1&1&1&1&0&0&0 \\ 1&1&0&0&0&1&1&1&0&1 \\ 0&0&0&0&0&0&1&1&1&0 \\ 0&0&0&1&1&0&0&1&1&0 \\ 0&0&1&1&1&0&1&0&0&1 \end{bmatrix}.$$
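The Metropolis combination matrix for this adjacency structure can be built in a few lines. This is an illustrative sketch of the standard Metropolis rule; here the degrees $n_i$ are taken to include the node itself (the self-loops on the diagonal of the adjacency matrix), which is an assumption about the paper's convention.

```python
import numpy as np

def metropolis_weights(A):
    """Build a Metropolis combination matrix from a 0/1 adjacency
    matrix A (with self-loops on the diagonal)."""
    N = A.shape[0]
    deg = A.sum(axis=1)                    # neighborhood sizes |N_i|
    P = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i != j and A[i, j]:
                P[i, j] = 1.0 / max(deg[i], deg[j])
        P[i, i] = 1.0 - P[i].sum()         # diagonal absorbs the remainder
    return P

A = np.array([[1,1,0,0,0,1,1,0,0,0],
              [1,1,1,0,1,0,1,0,0,0],
              [0,1,1,1,0,1,0,0,0,1],
              [0,0,1,1,0,1,0,0,1,1],
              [0,1,0,0,1,1,0,0,1,1],
              [1,0,1,1,1,1,1,0,0,0],
              [1,1,0,0,0,1,1,1,0,1],
              [0,0,0,0,0,0,1,1,1,0],
              [0,0,0,1,1,0,0,1,1,0],
              [0,0,1,1,1,0,1,0,0,1]])
P = metropolis_weights(A)
print(np.allclose(P.sum(axis=1), 1.0), np.allclose(P, P.T))  # row sums 1, symmetric
```

Because the adjacency matrix is symmetric, the resulting P is symmetric and row-stochastic, hence doubly stochastic, which satisfies the combination-weight requirements of (7).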
We configure the nodes such that they cycle through the entries of the intermediary estimates $\phi_{i,t}$ in a round-robin fashion, and exchange only this selected L = 1 dimension out of M in one time instant. For instance, for an L = 1, M = 3 system at time instants t = 1, …, 4, the ith node will send the entries of its intermediary estimate $\phi_{i,t}$ as in (43):
$$\phi_{1,i} = \begin{bmatrix} \phi_{1,1,i} \\ 0 \\ 0 \end{bmatrix}, \quad \phi_{2,i} = \begin{bmatrix} 0 \\ \phi_{2,2,i} \\ 0 \end{bmatrix}, \quad \phi_{3,i} = \begin{bmatrix} 0 \\ 0 \\ \phi_{3,3,i} \end{bmatrix}, \quad \phi_{4,i} = \begin{bmatrix} \phi_{1,4,i} \\ 0 \\ 0 \end{bmatrix}, \tag{43}$$
where $\phi_{l,t,i}$ is the lth dimension of the intermediary estimate $\phi_{i,t}$ of the ith node at time t that is sent to the neighbors.
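The round-robin selection in (43) amounts to masking all but one entry per time instant. The snippet below is a minimal sketch of that masking (0-indexed time, whereas the paper starts at t = 1):

```python
import numpy as np

def round_robin_mask(phi, t):
    """Keep only the entry of phi scheduled for time t (L = 1 round-robin);
    all other entries are zeroed out before transmission, as in (43)."""
    out = np.zeros_like(phi)
    l = t % phi.size                      # entry index cycles 0, 1, ..., M-1
    out[l] = phi[l]
    return out

phi = np.array([0.5, -1.2, 0.7])
for t in range(4):
    print(round_robin_mask(phi, t))       # cycles through the three entries
```

At t = 3 the cycle wraps around and the first entry is sent again, matching the fourth vector in (43).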
We evaluate the communication reduction performance of the proposed algorithm with respect to the algorithm in [23], where only one entry of the intermediate estimates is exchanged by the nodes at each round, in the sequential order explained in (43).
In Fig. 4, the MSD performance of the proposed algorithm is demonstrated, where, as references, we have considered the algorithm in [23] with an adaptive Lloyd–Max quantizer and a no-quantization (scalar diffusion) implementation of the system. Note that both in the scalar diffusion algorithm and in the Lloyd–Max quantized algorithm, which is referred to as the conventionally quantized algorithm later, the nodes exchange the information of one dimension per communication round. However, in the scalar diffusion algorithm, the information of the exchanged dimension is diffused with full precision, while in the Lloyd–Max case it is quantized with finite precision. We selected the quantization interval so that we do not suffer from any saturation effects, and we chose the number of quantization levels so that no further significant improvement can be made in the MSD performance of the algorithms by increasing the number of levels. We observed that 53 quantization levels for the LC algorithm and 31 quantization levels for the conventional algorithm were sufficient. We use a step size of μ = 0.05 during the simulations due to its good learning rate and convergence results. The results that we obtained in the experiments are averaged over 100 independent trials.

Fig. 3. Network topology and statistical profile.

Fig. 4. The global MSD curves of the proposed algorithm, displayed with the label 'LC', in comparison with the conventional quantization and the scalar diffusion algorithms (N = 10, M = 10). The magnified figure provides the transient performance of the algorithms. Source statistics change at time t = 2 × 10⁴.

Fig. 5. Time evolution of the number of bits transmitted across the network. The sudden increase in the 'LC' curve corresponds to the time where the source statistics are changed.
From these simulations, we observe that the convergence rates of the scalar diffusion and the conventionally quantized diffusion algorithms are superior to that of the proposed algorithm, while the steady-state MSD values of all three systems are identical. We note that it was our aim to obtain equal steady-state MSD values, allowing a fair comparison in terms of the convergence speeds. Also, it is observed that the proposed algorithm is able to adapt well when faced with a sudden change in the source statistics.
In Fig. 5, we present the communication load that each algorithm incurs on the network. We exclude the scalar (infinite-precision) diffusion algorithm from this comparison since it requires an infinite number of bits to encode the information exchanged among the nodes. We observe a substantial enhancement in the communication efficiency achieved by the proposed algorithm, in terms of the total number of bits exchanged between the nodes across the entire adaptive network, with respect to the algorithm that uses the conventional quantization. In particular, for this N = 10 node network, the proposed algorithm incurs a 10³-fold smaller communication load than the reference implementation with the same steady-state MSD values. We also observe a sudden increase in the number of bits used by the proposed algorithm at the time of the change in the source statistics. This occurs because the abrupt change in the parameter of interest causes multiple level crossings at that time, which require more than two bits to encode. However, the system quickly adapts itself back to using two bits. The same behavior is absent in the conventional quantization case, since it already encodes the true values of the levels at every single time instant. We further stress that we achieve this improvement with relatively little complexity, since we have shown that a simple non-adaptive quantizer is sufficient to realize the improvements.

In the second part of the experiments, we aim to observe the performance of the proposed algorithm on high-dimensional data. We therefore modify the former setup so that the unknown vector parameter w_o with M = 100 components is randomly chosen from a normal distribution and normalized to unit energy. We use the same distributed network with the connections given in Fig. 3c. The quantization levels for the algorithms are again chosen so that no further significant improvement can be made by increasing the number of levels; we observed that 53 quantization levels for the LC algorithm and 31 quantization levels for the conventional algorithm were sufficient. We again use a step size of
μ = 0.05, and the results are averaged over 10 independent trials. We have decreased the number of independent trials since processing high-dimensional data takes significantly more time.

We present the MSD performance of the proposed algorithm in comparison with the sequential variant of the algorithm in [23], with the parameters M = 100, L = 1, in Fig. 6. We observe that in the high-dimensional case, the convergence rate of the proposed algorithm is the same as that of the compared algorithms, and they attain the same steady-state MSD values. These results indicate that the adaptation performance of the scalar diffusion and the conventionally quantized diffusion algorithms deteriorates in the high-dimensional case, since each node is allowed to share only one dimension per round, which prevents it from quickly sending its entire intermediary estimate to its neighboring nodes. Therefore, we observe that for such systems, the proposed algorithm performs similarly to the scalar diffusion and the conventionally quantized algorithms.

In Fig. 7, we illustrate the communication load for each algorithm. We observe an improvement in the communication requirements in a similar vein to the previous experiments. Ultimately, the proposed algorithm incurs a 10²-fold smaller communication load than the baseline, where the number of transmitted bits is significantly reduced. The magnitude of this reduction is smaller than in the non-high-dimensional case, mainly due to the extra bits required to encode the higher dimensions for multiple level crossings in the LC quantization.
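The bit counts discussed above stem from the event-triggered nature of the LC scheme: a node transmits only when its parameter crosses one of the quantization levels, an isolated crossing can be encoded with two bits, and a burst of crossings (e.g. after an abrupt change in w_o) costs extra bits. The following sketch illustrates this accounting under an encoding of our own assumption (no bits when no crossing occurs, two bits for a single crossing, one extra bit per additional level crossed); it is not the exact coder used in the paper:

```python
import numpy as np

def lc_encode(x, prev_idx, levels):
    """Level-crossing encoder for one scalar parameter entry.

    x:        current value of the diffused parameter entry
    prev_idx: index of the last transmitted quantization level
    levels:   sorted 1-D array of quantization levels
    Returns (new_idx, bits), where bits is the number of bits sent
    at this iteration under the assumed encoding.
    """
    new_idx = int(np.clip(np.searchsorted(levels, x), 0, len(levels) - 1))
    crossings = abs(new_idx - prev_idx)
    if crossings == 0:
        bits = 0                    # no event: nothing is transmitted
    else:
        bits = 2 + (crossings - 1)  # two bits per crossing event, plus extras
    return new_idx, bits
```

Summing `bits` over all nodes, neighbors, and iterations yields communication-load curves of the kind shown in Figs. 5, 7 and 9; the conventional quantizer instead spends a fixed number of bits per transmitted value at every iteration.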
In the third part of the experiments, in order to observe the possible effects of the number of quantization levels, we simulate the algorithms in an identical experimental setup, except that the number of quantization levels is no longer optimized as in the previous cases. To this end, we arbitrarily choose 25 quantization levels for the LC algorithm and, again, 25 levels for the conventional algorithm. We use the same distributed network connections given in Fig. 3c, a step size of μ = 0.05, and the results are averaged over 100 independent trials.

Fig. 6. The global MSD curves of the proposed algorithm, displayed with the label 'LC', in comparison with the conventional quantization and the scalar diffusion algorithms over high-dimensional data (N = 10, M = 100). The magnified figure provides the transient performance of the algorithms. The source statistics change at time t = 10⁴.
Fig. 7. Time evolution of the number of bits transmitted by the algorithms across the network over high-dimensional data (N = 10, M = 100). The sudden increase in the 'LC' curve corresponds to the time at which the source statistics are changed.
We present the MSD performances of the algorithms in Fig. 8. We observe that when sub-optimal quantization levels are used, the compared algorithms exhibit superior performance to the proposed algorithm, both in terms of the convergence rate and the steady-state MSD. We also note that the quantized algorithms could not reach the steady-state performance of the scalar diffusion due to the deliberately poor selection of the number of quantization levels.
These results are observed due to the system's failure to satisfy the assumed quantization error model. The statistical model that we use for the quantization error φ_i^q assumes that it has zero mean, such that E[φ_i^q] = 0 [24]. However, when such a low number of quantization levels is selected, this model ceases to be applicable, and the quantized algorithms are no longer guaranteed to converge to the steady-state MSD values of the scalar diffusion algorithm.

Fig. 8. The global MSD curves of the proposed algorithm, displayed with the label 'LC', in comparison with the conventional quantization and the scalar diffusion algorithms with sub-optimal quantization levels (N = 10, M = 10). The source statistics change at time t = 10⁴.

Fig. 9. Time evolution of the number of bits transmitted by the algorithms across the network with sub-optimal quantization levels (N = 10, M = 10). The sudden increase in the 'LC' curve corresponds to the time at which the source statistics are changed.
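The breakdown of the zero-mean quantization-error model can be reproduced numerically: when the quantizer has enough levels over the data range, the error averages out to roughly zero, but with very few levels the data saturate at the extreme levels and the error acquires a bias. The nearest-level quantizer below is a generic construction of our own for illustration, not the exact quantizer used in the experiments:

```python
import numpy as np

def quantize(x, levels):
    """Map each sample in x to the nearest entry of `levels`."""
    idx = np.abs(x[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]

# Data concentrated in [2, 3], quantizer designed for [-3, 3]
x = np.linspace(2.0, 3.0, 10_001)

fine = np.linspace(-3.0, 3.0, 61)    # enough levels: error is ~zero-mean
coarse = np.linspace(-3.0, 3.0, 3)   # 3 levels: every sample snaps to +3

err_fine = quantize(x, fine) - x
err_coarse = quantize(x, coarse) - x  # saturated: mean error is biased
```

Here the empirical mean of `err_fine` stays near zero, while `err_coarse` has a mean of 0.5, violating the E[φ_i^q] = 0 assumption on which the steady-state analysis relies.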
In Fig. 9, we present the communication load of the algorithms over the network for the case of a sub-optimal level selection. We again observe a similar behavior, where the proposed algorithm diffuses more than 10³ times fewer bits through the network compared to the baseline. We note that the difference in the number of bits exchanged between the two algorithms is larger compared with the previous results. This can be explained by the fact that we use fewer quantization levels for the LC algorithm, which makes the occurrence of multiple level crossings a rarer phenomenon. Thus, it becomes less likely for each node to send out more than two bits of information in a given iteration. Ultimately, this particular experiment illustrates the existence of a trade-off between the estimation performance and the communication load imposed on the network.