Digital Signal Processing
Contents lists available at ScienceDirect — www.elsevier.com/locate/dsp
Resource-aware event triggered distributed estimation over adaptive networks

Ihsan Utlu a,b, O. Fatih Kilic a, Suleyman S. Kozat a,∗

a Department of Electrical and Electronics Engineering, Bilkent University, Ankara, Turkey
b ASELSAN Research Center, Ankara 06370, Turkey
Article history: Available online 1 June 2017
Keywords: Distributed estimation; Adaptive networks; Event-triggered communication; Level-crossing quantization

Abstract

We propose a novel algorithm for distributed processing applications constrained by the available communication resources, using diffusion strategies, that achieves up to a 10³-fold reduction in the communication load over the network while delivering comparable performance with respect to the state of the art. After computation of local estimates, the information is diffused among the processing elements (or nodes) non-uniformly in time by conditioning the information transfer on level-crossings of the diffused parameter, resulting in a greatly reduced communication requirement. We provide the mean and mean-square stability analyses of our algorithms, and illustrate the gain in communication efficiency compared to other reduced-communication distributed estimation schemes.

© 2017 Elsevier Inc. All rights reserved.
1. Introduction
In tandem with the increasing computational capabilities of processing units and the growing amount of generated data, the demand for distributed networks and decentralized data processing algorithms has remained an area of growing interest [1–3]. With intrinsic characteristics such as robustness and scalability, distributed architectures provide enhanced efficiency and performance for a wide variety of applications, ranging from adaptive filtering, sequential detection, and sensor networks to distributed resource allocation [4–9]. However, successful implementation of such applications depends on a substantial amount of communication resources. As an example, in smart grid applications, measurement units operating with high frequency put the communication infrastructure of the grid under significant pressure [10]. This calls for resource-efficient, event-triggered distributed estimation solutions that incorporate event-driven communication [11–15]. To this end, in this paper, we construct distributed architectures that have a significantly reduced communication load without compromising performance. We achieve this by introducing novel event-triggered communication architectures over distributed networks.
In a distributed processing framework, a group of measurement-capable agents, termed nodes, in a network cooperate with one another in order to estimate an unknown common phenomenon [16].
∗ Corresponding author.
E-mail addresses: utlu@ee.bilkent.edu.tr (I. Utlu), kilic@ee.bilkent.edu.tr (O. Fatih Kilic), kozat@ee.bilkent.edu.tr (S.S. Kozat).
Among the different approaches for distributed estimation, we specifically consider diffusion-based protocols that exploit the spatial diversity of the network by restricting information sharing to neighboring nodes, without considering any central processing unit or a fusion center [16,17]. Diffusion protocols provide an inherently scalable data processing framework that is resilient to changes in network topology, such as link failures, as well as to changes in the statistical properties of the unknown phenomenon that is measured [16]. However, the requirement for all nodes to exchange their current estimates with their neighbors at each iteration places a heavy burden on the available communication resources [18].
Here, we propose novel event-triggered distributed estimation algorithms for communication-constrained applications that achieve up to a 10³-fold reduction in the communication load over the network. We achieve this by leveraging the uneven distribution of the events over time to efficiently reduce the communication load in real-life applications. In particular, we condition an information exchange between the neighboring nodes on the level-crossings of the diffused parameter [19], unlike using a fixed rate of diffusion, cf. [16,17]. Furthermore, we show that it is sufficient to only diffuse the information indicating the direction of the change in the levels, which can be handled using only two bits for a slowly-varying parameter.
Reduced-communication diffusion is extensively studied in the signal processing literature [18,20–23]. In [18,20,21], the authors restrict the number of active links between neighbors using a probabilistic framework, or by adaptively choosing a single link of communication for each node. In [22], local estimates are randomly projected, and the information transfer between the nodes is reduced to a single bit. In [23], only certain dimensions of the parameter vector are transmitted. In this paper, on the other hand, we reduce the communication load down to only a single bit or a couple of bits, unlike [18,20,21,23], in which the authors diffuse parameters in full precision. Furthermore, we regulate the frequency of information exchange depending on the rate of change of the parameter, unlike [22], where the authors transfer information at each single time instant.

http://dx.doi.org/10.1016/j.dsp.2017.05.011

Fig. 1. An example distributed network with bidirectional connections. The circular area represents the neighborhood of the ith node.
Our main contributions are as follows. We introduce algorithms for distributed estimation that (i) significantly reduce the communication load on the network, while (ii) continuing to deliver performance on par with the state of the art. We also perform the mean and mean-square stability analyses of our algorithms. Through numerical examples, we show that our algorithms provide a significant reduction in the communication load over the network.
The paper is organized as follows. In Section 2, we introduce the distributed estimation framework and discuss the adapt-then-combine (ATC) diffusion strategy. We further detail our algorithms in Section 3, where we formulate the level-triggered distributed estimation algorithm. In Section 4, we present the algorithmic description of the proposed scheme. In Sections 5 and 6, we provide, respectively, the mean and mean-square stability analyses of the proposed distributed adaptive filter and state the conditions for stability. We provide experimental verification of the algorithm in Section 7, and concluding remarks in Section 8.
2. Problem description
Consider a network with N nodes that are distributed spatially as shown in Fig. 1. Each node sequentially observes a noise-corrupted transformation of an unknown parameter $w^o$ through a linear model
$$d_{i,t} = u_{i,t}^T w^o + v_{i,t}, \quad i = 1,\dots,N, \tag{1}$$
and diffuses information to its neighboring nodes $j \in \mathcal{N}_i$,¹ where $w^o \in \mathbb{R}^M$ is the unknown phenomenon, with $u_{i,t}$ and $v_{i,t}$ representing the regressor and the noise processes, respectively. The additive observation noise $v_{i,t}$ and the regressor $u_{i,t}$ are assumed to be temporally and spatially independent, and independent of one another, with $E[u_{i,t} u_{i,t}^T] = \sigma_{u,i}^2 I_M$ and $E[v_{i,t}^2] = \sigma_{v,i}^2$. For each node i, we assume that at time t only the regressor $u_{i,t}$ and the

¹ We represent vectors (matrices) by bold lower (upper) case letters. For a vector a (a matrix A), $a^T$ ($A^T$) is the transpose, and $\|a\|$ represents the Euclidean norm. diag{A} returns a new matrix with only the main diagonal of A, while diag{a} puts a on the main diagonal of the new matrix. col{a₁, …, a_N} produces a column vector formed by stacking its arguments on top of one another. $I_M$ represents the M × M identity matrix, ⊗ stands for the Kronecker product, and Tr{·} stands for the trace.
observation $d_{i,t}$, along with the parameter estimates $\phi_{j,t}$, $j \in \mathcal{N}_i$, from the neighboring nodes are available to it. Therefore each node incurs the following cost for the parameter w [17]:
$$J_i(w) = \frac{1}{2} E\big|d_{i,t} - u_{i,t}^T w\big|^2 + \frac{1}{2} \sum_{j \in \mathcal{N}_i \setminus \{i\}} \alpha_{i,j} \|w - \phi_j\|_2^2, \tag{2}$$
where $\alpha_{i,j}$ is a non-negative, real coefficient satisfying $\sum_{j=1}^{N} \alpha_{i,j} = 1$ that assigns different weights to different neighbors. In order to minimize (2) in an online manner, we employ the stochastic gradient approach [24]. To this end, we calculate the gradient of (2) as
$$\nabla_w J_i(w)^T = R_{u,i} w - R_{du,i} + \sum_{j \in \mathcal{N}_i \setminus \{i\}} \alpha_{i,j} (w - \phi_j), \tag{3}$$
where $R_{u,i} = E[u_{i,t} u_{i,t}^T]$ and $R_{du,i} = E[u_{i,t} d_{i,t}]$. Using the instantaneous approximations $R_{u,i} \approx u_{i,t} u_{i,t}^T$ and $R_{du,i} \approx u_{i,t} d_{i,t}$ in (3), we obtain an approximate expression for the gradient of the cost function as
$$\nabla_w J_i(w)^T \approx u_{i,t}\big(u_{i,t}^T w_{i,t} - d_{i,t}\big) - \sum_{j \in \mathcal{N}_i \setminus \{i\}} \alpha_{i,j} (\phi_j - w_{i,t}). \tag{4}$$
Considering that we are optimizing a sum of two convex cost functions in (2) with the use of (4), we note that we can carry out the optimization using incremental solutions over (2), where the update is performed in two steps. Since we consider the adapt-then-combine (ATC) diffusion strategy in this paper, we first create an intermediate estimate using the gradient of the first summand in (2), and then update the estimate using the second summand in (2) as [17]
$$\phi_{i,t+1} = w_{i,t} + \mu_i u_{i,t}\big(d_{i,t} - u_{i,t}^T w_{i,t}\big), \tag{5}$$
$$w_{i,t+1} = \phi_{i,t+1} + \eta_i \sum_{j \in \mathcal{N}_i \setminus \{i\}} \alpha_{i,j}\big(\phi_{j,t+1} - \phi_{i,t+1}\big), \tag{6}$$
where $\mu_i$ and $\eta_i$ are positive step sizes. Note that we have replaced the estimates $\phi_j$ coming from the neighbors with their instantaneous approximations $\phi_{j,t+1}$. Now, we represent the equation in (6) as
$$w_{i,t+1} = \sum_{j \in \mathcal{N}_i} p_{i,j} \phi_{j,t+1}, \tag{7}$$
where we have defined $p_{i,i} = 1 - \sum_{j \in \mathcal{N}_i \setminus \{i\}} \eta_i \alpha_{i,j}$ and $p_{i,j} = \eta_i \alpha_{i,j}$ for $j \neq i$ to obtain (7), yielding the network matrix $P = [p_{i,j}]$ comprised of the combination weights, with $\sum_{j=1}^{N} p_{i,j} = 1$ and $p_{i,j} \geq 0$.

3. Distributed estimation with level-triggered sampling

The well-known ATC full diffusion scheme (7) requires all nodes in the network to communicate their current estimates (i) in their entirety, and (ii) at a fixed rate, to all their neighboring nodes [17]. We propose a new scheme which achieves increased communication efficiency by conditioning the diffusion of information on the trigger of an event, instead of relying on a fixed rate of diffusion. Our approach considerably reduces the load on communication resources, since only "significant changes" in the diffused parameter, e.g., an abrupt change in the local estimate, are conveyed, based on the particular realization of the signal.
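Before introducing the event-triggered variant, the baseline ATC diffusion LMS of (5) and (7) can be sketched in a few lines. The following is an illustrative simulation, not code from the paper; the network size, step sizes, and uniform combination weights are arbitrary choices for the toy example.

```python
import numpy as np

def atc_diffusion_lms(U, D, P, mu, M):
    """Run the ATC diffusion LMS of (5) and (7).

    U: T x N x M regressors, D: T x N observations,
    P: N x N row-stochastic combination matrix, mu: length-N step sizes.
    Returns the N x M matrix of final node estimates.
    """
    T, N, _ = U.shape
    W = np.zeros((N, M))                  # w_{i,t} for every node i
    for t in range(T):
        # Adapt step (5): local LMS update towards the new observation.
        Phi = np.empty((N, M))
        for i in range(N):
            err = D[t, i] - U[t, i] @ W[i]
            Phi[i] = W[i] + mu[i] * err * U[t, i]
        # Combine step (7): convex combination of neighbor intermediates.
        W = P @ Phi
    return W

# Toy example: 3 fully connected nodes estimating a common w_o.
rng = np.random.default_rng(0)
N, M, T = 3, 4, 3000
w_o = rng.standard_normal(M)
U = rng.standard_normal((T, N, M))
D = np.einsum('tnm,m->tn', U, w_o) + 0.05 * rng.standard_normal((T, N))
P = np.full((N, N), 1.0 / N)              # uniform combination weights
W = atc_diffusion_lms(U, D, P, mu=np.full(N, 0.05), M=M)
print(np.allclose(W, w_o, atol=0.1))      # all nodes converge near w_o
```

Note that this baseline diffuses the full-precision vector $\phi_{i,t+1}$ at every iteration, which is exactly the communication cost the level-triggered scheme below removes.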
To clarify the framework, we consider the diffusion of a scalar parameter $\xi_{i,t}$ from a given node i to a neighboring node j. As an example, this information can be a single component of the estimates [23], or the error associated with an additional estimation layer [22]. In our distributed framework, due to communication constraints, a quantized version $\xi_{i,t}^q$ of the original parameter is shared. We aim to form a quantization scheme which guarantees that $\xi_{i,t}$ and $\xi_{i,t}^q$ are approximately equal to each other for all t, while at the same time keeping the load on communication resources relatively small.

Fig. 2. Illustration of the operation of the LC quantizer. Blue dots represent the original node estimates, while red ones represent the quantized versions of the corresponding estimates. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
To solve this problem, we propose an event-triggered communication algorithm in which, as the event-triggering mechanism, we specifically use level-crossing (LC) quantization [19]. To clarify the framework, suppose we have a discrete-time signal $\xi_{i,t}$, as shown in Fig. 2, that represents the information to be communicated from node i to node j, e.g., the estimated parameter or the estimation error. In conventional quantization, we sample and quantize this parameter at each time instant. In LC quantization, on the other hand, we consider a set of levels $\mathcal{S} \triangleq \{l_1, \dots, l_K\}$, illustrated in Fig. 2. At each discrete time index t, node i checks whether a level-crossing has occurred on $\xi_{i,t}$. When the parameter $\xi_{i,t}$ crosses a level $l_{i,t}$, i.e.,
$$(\xi_{i,t-1} - l_{i,t})(\xi_{i,t} - l_{i,t}) < 0$$
for some $l_{i,t} \in \mathcal{S}$, node i transmits information to its neighboring nodes. For example, this information can be the direction of the level-crossing [19]. A neighboring node j uses this received information to form an estimate $\xi_{i,t}^q$ of $\xi_{i,t}$.

If there is an information transfer by node i at time t, the receiving node j estimates the parameter as the level through which the crossing has occurred:
$$\xi_{i,t}^q = l_{i,t}. \tag{8}$$
For the time instants when node i is silent, node j infers that no significant change in the parameter has taken place, and uses the estimated parameter value from the previous time instant:
$$\xi_{i,t}^q = \xi_{i,t-1}^q. \tag{9}$$
We note that the set of levels $\mathcal{S}$ is known by all nodes in the network. Hence, as the diffused information, it is sufficient for node i to only convey how $\xi_{i,t}^q$ changes compared to the previously-crossed level $\xi_{i,t-1}^q$. In particular, we note the following two cases. In the first case, the parameter $\xi_{i,t}$ changes slowly enough that a crossing through multiple levels does not occur, so that node i only needs to indicate the direction of the change in levels. We therefore transmit two bits for this case: one indicating that a single level crossing has occurred, and the other indicating the direction of the crossing. In the second case, we may have multiple crossings, where we directly code the full location information of the new level value $\xi_{i,t}^q$, together with a flag bit indicating that multiple level crossings have occurred, using $\log_2(K) + 1$ bits. As shown, this approach significantly lowers the amount of communication while maintaining estimation performance.

4. Algorithm description
In this section, we present the full algorithmic description of the proposed diffusion scheme with the level-crossing quantization [19]. At time t, a given node i in the network makes the scalar observation $d_{i,t}$ through the linear model $d_{i,t} = u_{i,t}^T w^o + v_{i,t}$, which is then used to update its intermediary local estimate using the LMS adaptation
$$\phi_{i,t+1} = \big(I_M - \mu_i u_{i,t} u_{i,t}^T\big) w_{i,t} + \mu_i u_{i,t} d_{i,t}.$$
Due to the quantized communication framework, a neighboring node j does not have access to the true value of the parameter $\phi_{i,t+1}$, which has M entries. As such, based on the limited information it receives from node i, node j tries to estimate this parameter as the M-entry vector $\phi_{i,t+1}^q$. Specifically, in the LC quantization, node j receives information about how the current values of the entries of the parameter $\phi_{i,t+1}$ have changed relative to the most recent estimate node j has access to, namely $\phi_{i,t}^q$. Node i records the most recent estimate $\phi_{i,t}^q$ as a reference and diffuses information to the neighboring nodes $j \in \mathcal{N}_i$ indicating how the current estimate $\phi_{i,t+1}$ compares to this reference on a per-entry basis. In particular, node i makes this comparison by checking for a level crossing between corresponding entries of the two vector quantities $\phi_{i,t}^q$ and $\phi_{i,t+1}$. If there is a level crossing on an entry, node i transmits information to its neighbors through a channel frequency allocated to this particular entry. If there is a single level-crossing, this information indicates the direction of the level crossing; otherwise, the transmitted information directly specifies the location of the new level. A neighboring node j then constructs the estimate $\phi_{i,t+1}^q$ using (8) or (9) on a per-entry basis, depending on whether node i diffuses information or not, respectively, at time t.

While diffusing information related to its own local estimate, node i also receives information from the neighboring nodes j representing their local estimates $\phi_{j,t+1}$. For each neighboring node j, node i uses this diffused information to reconstruct $\phi_{j,t+1}^q$ using (8) or (9). The final estimate $w_{i,t+1}$ is then constructed using the combination
$$w_{i,t+1} = p_{i,i} \phi_{i,t+1} + \sum_{j \in \mathcal{N}_i \setminus \{i\}} p_{i,j} \phi_{j,t+1}^q.$$
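The full per-entry scheme described above can be sketched end-to-end. The following toy simulation is illustrative, not the paper's code: it assumes a uniform level grid known network-wide, tracks the shared references directly instead of encoding/decoding bits, and uses arbitrary network and step-size choices.

```python
import numpy as np

def lc_round(W, Phi_q, U_t, d_t, P, levels, mu):
    """One time step of the event-triggered ATC scheme (illustrative sketch).

    W: N x M current estimates, Phi_q: N x M last quantized intermediaries,
    assumed known network-wide since all nodes track the same references.
    Returns updated (W, Phi_q) and the number of entries transmitted.
    """
    N, M = W.shape
    # Adapt: local LMS update at every node.
    err = d_t - np.einsum('nm,nm->n', U_t, W)
    Phi = W + mu * err[:, None] * U_t
    # Quantize: per entry, snap to the last level crossed, else keep old level.
    sent = 0
    Phi_q_new = Phi_q.copy()
    for i in range(N):
        for m in range(M):
            crossed = levels[(Phi_q[i, m] - levels) * (Phi[i, m] - levels) < 0]
            if crossed.size:              # event: at least one level crossed
                Phi_q_new[i, m] = crossed[-1] if Phi[i, m] > Phi_q[i, m] else crossed[0]
                sent += 1
    # Combine: own intermediary in full precision, neighbors' quantized.
    W_new = np.diag(np.diag(P)) @ Phi + (P - np.diag(np.diag(P))) @ Phi_q_new
    return W_new, Phi_q_new, sent

# Toy run: 4 nodes, M = 3, uniform combination weights.
rng = np.random.default_rng(1)
N, M, T = 4, 3, 2000
w_o = rng.standard_normal(M); w_o /= np.linalg.norm(w_o)
P = np.full((N, N), 1.0 / N)
levels = np.linspace(-2.0, 2.0, 161)      # level spacing 0.025
W = np.zeros((N, M)); Phi_q = np.zeros((N, M)); total = 0
for t in range(T):
    U_t = rng.standard_normal((N, M))
    d_t = U_t @ w_o + 0.05 * rng.standard_normal(N)
    W, Phi_q, s = lc_round(W, Phi_q, U_t, d_t, P, levels, mu=0.05)
    total += s
print(np.max(np.abs(W - w_o)), total, N * M * T)
```

In this sketch the nodes converge near $w^o$ while the event counter `total` stays well below the `N * M * T` transmissions a fixed-rate scheme would use, which is the qualitative behavior the algorithm targets.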
Remark. In order to keep the presentation clear, we illustrate the special case M = 1 of the proposed algorithm in Algorithm 1; it can be generalized to arbitrary M in a straightforward manner.

Remark. We note that an alternative approach to dealing with the M > 1 case is to have the nodes in the network transmit only a certain entry of their intermediary estimates $\phi_{i,t}$. As an example, in this case, the nodes can cycle through different entries across time in a round-robin fashion. The non-communicated entries are replaced by the corresponding entries in the local intermediary estimate [23]. This approach is explored in Section 7.
5. Mean stability analysis

Algorithm 1: ATC diffusion LMS with the LC quantization, M = 1.
1: for i = 1 to N do {Initialization}
2:   $w_{i,0} = \phi_{i,0}^q = 0$
3: end for
4: for t ≥ 0 do
5:   for i = 1 to N do {Local adaptation}
6:     $\phi_{i,t+1} = (1 - \mu_i u_{i,t}^2) w_{i,t} + \mu_i u_{i,t} d_{i,t}$
       {Check for level crossing}
7:     if $\exists\, l_{i,t} \in \mathcal{S}$ such that $(\phi_{i,t}^q - l_{i,t})(\phi_{i,t+1} - l_{i,t}) < 0$ then
8:       if the crossing is to an adjacent level then
9:         Diffuse the direction of the crossing
10:      else
11:        Diffuse the location of the new level
12:      end if
13:      Locally store $\phi_{i,t+1}^q = l_{i,t}$ in record
14:    else
15:      Remain silent
16:      Locally set $\phi_{i,t+1}^q = \phi_{i,t}^q$
17:    end if
       {Reconstruction}
18:    for all $j \in \mathcal{N}_i \setminus \{i\}$ do
19:      if node j is silent then
20:        Reconstruct as $\phi_{j,t+1}^q = \phi_{j,t}^q$
21:      else
22:        Reconstruct $\phi_{j,t+1}^q$ using the diffused information
23:      end if
24:    end for
       {Combination}
25:    $w_{i,t+1} = p_{i,i} \phi_{i,t+1} + \sum_{j \in \mathcal{N}_i \setminus \{i\}} p_{i,j} \phi_{j,t+1}^q$
26:  end for
27: end for

To continue with the stability analysis of the proposed scheme, we assume that the regressors $u_{i,t}$ are temporally and spatially independent, zero mean and white, with covariance matrix $\Lambda_i \triangleq E[u_{i,t} u_{i,t}^T] = \sigma_{u,i}^2 I_M$. The observation $d_{i,t}$ at node i is assumed to follow a linear model of the form
$$d_{i,t} = u_{i,t}^T w^o + v_{i,t}, \tag{10}$$
where $\{v_{i,t}\}_{t \geq 1}$ is a zero-mean white Gaussian noise process with variance $\sigma_{v,i}^2$, independent of $\{u_{j,t}\}_{t \geq 1}$ for all i, j. In our proposed level-triggered estimation framework, at each node i, the diffusion LMS update for the ATC strategy takes the form
$$\phi_{i,t+1} = \big(I_M - \mu_i u_{i,t} u_{i,t}^T\big) w_{i,t} + \mu_i u_{i,t} d_{i,t}, \tag{11}$$
$$w_{i,t+1} = p_{i,i} \phi_{i,t+1} + \sum_{j \in \mathcal{N}_i \setminus \{i\}} p_{i,j} \phi_{j,t+1}^q, \tag{12}$$
where the combination matrix P is taken to be stochastic, with its rows summing up to unity. We rewrite the expressions (11) and (12) as
$$\phi_{i,t+1} = \big(I_M - \mu_i u_{i,t} u_{i,t}^T\big) w_{i,t} + \mu_i u_{i,t} d_{i,t}, \tag{13}$$
$$w_{i,t+1} = \sum_{j \in \mathcal{N}_i} p_{i,j} \phi_{j,t+1} - \sum_{j \in \mathcal{N}_i \setminus \{i\}} p_{i,j} \alpha_{j,t+1}, \tag{14}$$
by defining the quantization error for node j as
$$\alpha_{j,t} \triangleq \phi_{j,t} - \phi_{j,t}^q.$$
We represent the diffusion update over the network $\mathcal{N}$ in state-space form by introducing the following global quantities:
$$d_t \triangleq \mathrm{col}\{d_{1,t},\dots,d_{N,t}\}, \quad v_t \triangleq \mathrm{col}\{v_{1,t},\dots,v_{N,t}\}, \quad w^o \triangleq \mathrm{col}\{w^o,\dots,w^o\},$$
$$U_t \triangleq \mathrm{diag}\{u_{1,t},\dots,u_{N,t}\}, \quad \mathcal{M} \triangleq \mathrm{diag}\{\mu_1 I_M,\dots,\mu_N I_M\}, \quad w_t \triangleq \mathrm{col}\{w_{1,t},\dots,w_{N,t}\},$$
$$\phi_t \triangleq \mathrm{col}\{\phi_{1,t},\dots,\phi_{N,t}\}, \quad \phi_t^q \triangleq \mathrm{col}\{\phi_{1,t}^q,\dots,\phi_{N,t}^q\}, \quad \alpha_t \triangleq \mathrm{col}\{\alpha_{1,t},\dots,\alpha_{N,t}\},$$
$$G \triangleq P \otimes I_M, \quad P_C \triangleq P - \mathrm{diag}\{P\}, \quad G_C \triangleq P_C \otimes I_M.$$
Using the above-defined quantities, the diffusion updates (13), (14) take the following global state-space form:
$$\phi_{t+1} = \big(I_{MN} - \mathcal{M} U_t U_t^T\big) w_t + \mathcal{M} U_t d_t, \tag{15}$$
$$w_{t+1} = G \phi_{t+1} - G_C \alpha_{t+1}. \tag{16}$$
Similarly, the data model (10) can be expressed in terms of the global quantities as
$$d_t = U_t^T w^o + v_t. \tag{17}$$
To facilitate the mean stability analysis, we define the global deviation parameters
$$\tilde{w}_t \triangleq w^o - w_t, \quad \tilde{\phi}_t \triangleq w^o - \phi_t.$$
After substituting (17) and subtracting both sides of (15), (16) from $w^o$, the diffusion updates in terms of the deviation parameters take the following form:
$$\tilde{\phi}_{t+1} = \big(I_{MN} - \mathcal{M} U_t U_t^T\big) \tilde{w}_t - \mathcal{M} U_t v_t, \tag{18}$$
$$\tilde{w}_{t+1} = G \tilde{\phi}_{t+1} + G_C \alpha_{t+1}, \tag{19}$$
where we have used the relation $G w^o = w^o$, which results from the stochastic nature of P. The expressions (18), (19) can be expressed compactly as
$$\tilde{w}_{t+1} = G \big(I_{MN} - \mathcal{M} U_t U_t^T\big) \tilde{w}_t - G \mathcal{M} U_t v_t + G_C \alpha_{t+1}. \tag{20}$$
Assumption. The quantization error $\alpha_t$ over the network has zero mean. This is a reasonable assumption for the analysis of quantization effects [24]. The applicability of the assumption is verified by our experiments in Section 7.
Taking expectations of both sides of (20) yields
$$E\,\tilde{w}_{t+1} = G \big(I_{MN} - \mathcal{M} \Lambda\big) E\,\tilde{w}_t, \tag{21}$$
where $\Lambda \triangleq \mathrm{diag}\{\Lambda_1,\dots,\Lambda_N\}$ is block diagonal. For mean stability and asymptotic unbiasedness of the distributed filter (11)–(12), we require that the spectral radius satisfies $\rho\big(G(I_{MN} - \mathcal{M}\Lambda)\big) < 1$, which, noting that G is stochastic with nonnegative entries, is equivalent to requiring
$$\rho\big(I_{MN} - \mathcal{M}\Lambda\big) < 1, \tag{22}$$
by Theorem 4.4 of [25]. Noting that the set of eigenvalues of the block diagonal matrix $I_{MN} - \mathcal{M}\Lambda$ is the union of the eigenvalues of its individual blocks $I_M - \mu_i \Lambda_i$, where $\Lambda_i = \sigma_{u,i}^2 I_M$, we conclude that the distributed filter is mean stable if $|1 - \mu_i \sigma_{u,i}^2| < 1$, $i = 1,\dots,N$, i.e., if
$$0 < \mu_i < \frac{2}{\sigma_{u,i}^2}, \quad i = 1,\dots,N.$$
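The mean-stability condition can also be checked numerically. The sketch below (illustrative, not from the paper; the network sizes and statistics are arbitrary) builds $B = G(I_{MN} - \mathcal{M}\Lambda)$ for a toy network and tests whether its spectral radius is below one.

```python
import numpy as np

def mean_stable(P, mu, sigma_u2, M):
    """Check the mean-stability condition of (21) numerically.

    Builds B = G (I_{MN} - M Lambda) with G = P kron I_M and returns
    whether the spectral radius of B is below one.
    """
    N = P.shape[0]
    G = np.kron(P, np.eye(M))
    Mmat = np.kron(np.diag(mu), np.eye(M))        # step-size matrix
    Lam = np.kron(np.diag(sigma_u2), np.eye(M))   # regressor covariances
    B = G @ (np.eye(M * N) - Mmat @ Lam)
    return np.max(np.abs(np.linalg.eigvals(B))) < 1.0

# Three nodes with heterogeneous regressor powers.
sigma_u2 = np.array([1.0, 4.0, 0.25])
P = np.full((3, 3), 1.0 / 3)                      # uniform combination
print(mean_stable(P, np.array([0.1, 0.1, 0.1]), sigma_u2, M=2))   # True
# With P = I (no cooperation), node 2 violates mu < 2/sigma_u2.
print(mean_stable(np.eye(3), np.array([0.1, 0.6, 0.1]), sigma_u2, M=2))  # False
```

The second call illustrates that the per-node bound $0 < \mu_i < 2/\sigma_{u,i}^2$ is what fails when a step size is too large relative to the local regressor power; cooperation through a stochastic P can only help, consistent with the sufficiency argument above.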
6. Mean-square stability

We utilize the weighted energy relation approach [24] to proceed with the mean-square transient analysis of the distributed filter. For a positive-definite weighting matrix $\Sigma$, taking the weighted norm of both sides of (20) yields
$$\begin{aligned} \tilde{w}_{t+1}^T \Sigma \tilde{w}_{t+1} ={}& \tilde{w}_t^T \big(I_{MN} - \mathcal{M} U_t U_t^T\big)^T G^T \Sigma G \big(I_{MN} - \mathcal{M} U_t U_t^T\big) \tilde{w}_t \\ &- 2\, v_t^T U_t^T \mathcal{M} G^T \Sigma G \big(I_{MN} - \mathcal{M} U_t U_t^T\big) \tilde{w}_t \\ &+ 2\, \alpha_{t+1}^T G_C^T \Sigma G \big(I_{MN} - \mathcal{M} U_t U_t^T\big) \tilde{w}_t \\ &- 2\, v_t^T U_t^T \mathcal{M} G^T \Sigma G_C \alpha_{t+1} + v_t^T U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t v_t \\ &+ \alpha_{t+1}^T G_C^T \Sigma G_C \alpha_{t+1}. \end{aligned} \tag{23}$$
Noting that $v_t$ is zero-mean and independent of $U_t$ and $\tilde{w}_t$, taking the expected value of both sides of (23) yields the following variance relation:
$$\begin{aligned} E\|\tilde{w}_{t+1}\|_{\Sigma}^2 ={}& E\|\tilde{w}_t\|_{\Sigma'}^2 + 2\, E\,\alpha_{t+1}^T G_C^T \Sigma G \big(I_{MN} - \mathcal{M} U_t U_t^T\big) \tilde{w}_t \\ &- 2\, E\,v_t^T U_t^T \mathcal{M} G^T \Sigma G_C \alpha_{t+1} + E\,v_t^T U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t v_t \\ &+ E\,\alpha_{t+1}^T G_C^T \Sigma G_C \alpha_{t+1}, \end{aligned} \tag{24}$$
where
$$\Sigma' \triangleq G^T \Sigma G - G^T \Sigma G \mathcal{M} U_t U_t^T - U_t U_t^T \mathcal{M} G^T \Sigma G + U_t U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t U_t^T.$$
By the temporal independence of the regressor process $U_t$ and the independence of the noise process $v_t$ from $U_t$, we have the result that $U_t$ is independent of $\tilde{w}_t$. Hence, the random weighting matrix $\Sigma'$ can be replaced by its mean value $E\,\Sigma'$ in (24). Thus,
$$\Sigma' = G^T \Sigma G - G^T \Sigma G \mathcal{M} \Lambda - \Lambda \mathcal{M} G^T \Sigma G + E\, U_t U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t U_t^T, \tag{25}$$
where $\Lambda \triangleq E\, U_t U_t^T$. Substituting the $\tilde{\phi}_{t+1}$ expression from (18) into (24) yields the following final form of the variance relation:
$$E\|\tilde{w}_{t+1}\|_{\Sigma}^2 = E\|\tilde{w}_t\|_{\Sigma'}^2 + 2\, E\,\alpha_{t+1}^T G_C^T \Sigma G \tilde{\phi}_{t+1} + E\,\alpha_{t+1}^T G_C^T \Sigma G_C \alpha_{t+1} + E\,v_t^T U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t v_t. \tag{26}$$
To capture the mean-square behavior of the adaptive network, we express the relations (25), (26) in a compact form by using the convenient vector notation of [24]. In particular, we use the bvec{·} block vectorization operation [16], which transforms an arbitrary MN × MN block matrix $\Sigma$ with (i,j)th block $\Sigma_{ij}$ of size M × M into the vector $\mathrm{col}\{\sigma_1,\dots,\sigma_N\}$, where $\sigma_j \triangleq \mathrm{col}\{\mathrm{vec}\{\Sigma_{1j}\},\dots,\mathrm{vec}\{\Sigma_{Nj}\}\}$. We also use the block Kronecker product $A \odot B$, defined as having the (i,j)th block
$$[A \odot B]_{ij} = \begin{bmatrix} A_{ij} \otimes B_{11} & \cdots & A_{ij} \otimes B_{1N} \\ \vdots & \ddots & \vdots \\ A_{ij} \otimes B_{N1} & \cdots & A_{ij} \otimes B_{NN} \end{bmatrix}, \tag{27}$$
which is related to the bvec{·} operator via $\mathrm{bvec}\{ABC\} = (C^T \odot A)\,\mathrm{bvec}\{B\}$. Defining $\sigma \triangleq \mathrm{bvec}\{\Sigma\}$ and vectorizing both sides of (25) yields
$$\mathrm{bvec}\{\Sigma'\} = \big( (I_{MN} \odot I_{MN}) - (\Lambda\mathcal{M} \odot I_{MN}) - (I_{MN} \odot \Lambda\mathcal{M}) \big) (G^T \odot G^T)\, \sigma + \mathrm{bvec}\{E\, U_t U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t U_t^T\}. \tag{28}$$
The term $E\, U_t U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t U_t^T$ on the right-hand side of (28) can be vectorized by resorting to the Gaussian factorization theorem [16,17]. We let $\tilde{\Sigma} = \mathcal{M} G^T \Sigma G \mathcal{M}$, with (k,l)th block $\tilde{\Sigma}_{kl}$ and with the vectorized form $\mathrm{bvec}\{\tilde{\Sigma}\} = \mathrm{col}\{\tilde{\sigma}_1,\dots,\tilde{\sigma}_N\}$, where $\tilde{\sigma}_j = \mathrm{col}\{\tilde{\sigma}_{1j},\dots,\tilde{\sigma}_{Nj}\}$. Then the (k,l)th block $\Phi_{kl}$ of $E\, U_t U_t^T \tilde{\Sigma} U_t U_t^T$ is given by
$$\Phi_{kl} = \begin{cases} \Lambda_k \tilde{\Sigma}_{kl} \Lambda_l, & k \neq l, \\ \Lambda_k \tilde{\Sigma}_{kk} \Lambda_k + 2 \Lambda_k \mathrm{Tr}\{\tilde{\Sigma}_{kk} \Lambda_k\}, & k = l, \end{cases}$$
with the vectorized form
$$\gamma_{kl} = \begin{cases} (\Lambda_l \otimes \Lambda_k)\, \tilde{\sigma}_{kl}, & k \neq l, \\ \big( (\Lambda_l \otimes \Lambda_k) + 2 r_k r_k^T \big)\, \tilde{\sigma}_{kl}, & k = l, \end{cases}$$
by the factorization theorem, where $\Lambda_k \triangleq E\, u_{k,t} u_{k,t}^T$ and $r_k \triangleq \mathrm{vec}\{\Lambda_k\}$. Letting $\mathrm{bvec}\{\Phi\} = \mathrm{col}\{\gamma_1,\dots,\gamma_N\}$, where $\gamma_j = \mathrm{col}\{\gamma_{1j},\dots,\gamma_{Nj}\}$, we observe that we can express $\gamma_j$ in the form $\gamma_j = \mathcal{A}_j \tilde{\sigma}_j$, where
$$\mathcal{A}_j \triangleq \mathrm{diag}\big\{\Lambda_j \otimes \Lambda_1,\; \dots,\; \Lambda_j \otimes \Lambda_j + 2 r_j r_j^T,\; \dots,\; \Lambda_j \otimes \Lambda_N\big\}.$$
Further defining $\mathcal{A} \triangleq \mathrm{diag}\{\mathcal{A}_1,\dots,\mathcal{A}_N\}$, we arrive at the representation
$$\mathrm{bvec}\{\Phi\} = \mathcal{A}\, \mathrm{bvec}\{\tilde{\Sigma}\} = \mathcal{A}\, (\mathcal{M} \odot \mathcal{M}) (G^T \odot G^T)\, \sigma. \tag{29}$$
Substituting (29) into (28) yields
$$\mathrm{bvec}\{\Sigma'\} = \big( (I_{MN} \odot I_{MN}) - (\Lambda\mathcal{M} \odot I_{MN}) - (I_{MN} \odot \Lambda\mathcal{M}) + \mathcal{A}\,(\mathcal{M} \odot \mathcal{M}) \big) (G^T \odot G^T)\, \sigma. \tag{30}$$
The term $E\, v_t^T U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t v_t$ in (26) can be verified to be
$$E\, v_t^T U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t v_t = E\,\mathrm{Tr}\{v_t^T U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t v_t\} = E\,\mathrm{Tr}\{\Sigma G \mathcal{M} U_t v_t v_t^T U_t^T \mathcal{M} G^T\} = \mathrm{Tr}\{\Sigma G \mathcal{M} H \mathcal{M} G^T\}, \tag{31}$$
where we have defined $H \triangleq E\, U_t v_t v_t^T U_t^T$. We observe that H has the (k,l)th block $H_{kl} = \sigma_{v,k}^2 \Lambda_k \delta_{kl}$, which yields $H = (\Lambda_v \otimes I_M) \Lambda$, where $\Lambda_v \triangleq E\, v_t v_t^T$. Thus (31) becomes
$$E\, v_t^T U_t^T \mathcal{M} G^T \Sigma G \mathcal{M} U_t v_t = \mathrm{Tr}\{\Sigma G \mathcal{M} (\Lambda_v \otimes I_M) \Lambda \mathcal{M} G^T\} = \big( (G\mathcal{M} \odot G\mathcal{M})\, \mathrm{bvec}\{(\Lambda_v \otimes I_M)\Lambda\} \big)^T \sigma. \tag{32}$$
Similarly, the remaining terms on the RHS of (26) can be verified to be
$$E\,\alpha_{t+1}^T G_C^T \Sigma G \tilde{\phi}_{t+1} = \big( (G \odot G_C)\, \mathrm{bvec}\{E[\alpha_{t+1} \tilde{\phi}_{t+1}^T]\} \big)^T \sigma,$$
$$E\,\alpha_{t+1}^T G_C^T \Sigma G_C \alpha_{t+1} = \big( (G_C \odot G_C)\, \mathrm{bvec}\{E[\alpha_{t+1} \alpha_{t+1}^T]\} \big)^T \sigma. \tag{33}$$
Defining the quantities
$$b_t \triangleq (G\mathcal{M} \odot G\mathcal{M})\, \mathrm{bvec}\{(\Lambda_v \otimes I_M)\Lambda\} + (G \odot G_C)\, \mathrm{bvec}\{E[\alpha_t \tilde{\phi}_t^T]\} + (G_C \odot G_C)\, \mathrm{bvec}\{E[\alpha_t \alpha_t^T]\},$$
$$F \triangleq \big( (I_{MN} \odot I_{MN}) - (\Lambda\mathcal{M} \odot I_{MN}) - (I_{MN} \odot \Lambda\mathcal{M}) + \mathcal{A}\,(\mathcal{M} \odot \mathcal{M}) \big) (G^T \odot G^T), \tag{34}$$
and further using the shorthand $E\|\tilde{w}_t\|_{\sigma}^2$ for $E\|\tilde{w}_t\|_{\mathrm{bvec}^{-1}(\sigma)}^2$, yields the following compact form for the weighted energy recursion:
$$E\|\tilde{w}_{t+1}\|_{\sigma}^2 = E\|\tilde{w}_t\|_{F\sigma}^2 + b_{t+1}^T \sigma. \tag{35}$$
Remark. We note that the expectations $E[\alpha_{t+1}\tilde{\phi}_{t+1}^T]$ and $E[\alpha_{t+1}\alpha_{t+1}^T]$ present some difficulty for further analytical simplification in closed form, in exact or approximate terms. This is caused by the large degree to which the quantization error term $\alpha_t$ is coupled, nonlinearly, with itself as well as with the intermediary parameter deviation $\tilde{\phi}_t$, through the non-deterministic reference levels $\{\phi_{i,t'}^q\}_{t' \leq t}$ against which the level-crossing events are checked, and which evolve through (13)–(14). We further note that invoking an approximation based on independence arguments for $E[\alpha_{t+1}\tilde{\phi}_{t+1}^T]$, which captures the covariances between the intermediary parameter deviations and the quantization errors over arbitrary pairs of nodes on the network, is not feasible in general unless further assumptions are made on the number of quantization levels employed, so that the deviations become statistically less sensitive to the error terms. We stress that the lack of closed-form expressions for these expectations does not hamper our analysis of the mean-square stability, since requiring that the aforementioned terms remain bounded is sufficient for the purpose of establishing a bound on the (weighted) mean-square deviation $E\|\tilde{w}_t\|_{\sigma}^2$.

Iteration of (35) yields the recursions
$$E\|\tilde{w}_{t+1}\|_{\sigma}^2 = E\|\tilde{w}_t\|_{F\sigma}^2 + b_{t+1}^T \sigma,$$
$$E\|\tilde{w}_{t+1}\|_{F\sigma}^2 = E\|\tilde{w}_t\|_{F^2\sigma}^2 + b_{t+1}^T F\sigma,$$
$$\vdots$$
$$E\|\tilde{w}_{t+1}\|_{F^{N^2M^2-1}\sigma}^2 = E\|\tilde{w}_t\|_{F^{N^2M^2}\sigma}^2 + b_{t+1}^T F^{N^2M^2-1}\sigma. \tag{36}$$
Using the Cayley–Hamilton theorem with the characteristic polynomial p(x) of F results in
$$F^{N^2M^2} = -p_{N^2M^2-1} F^{N^2M^2-1} - \dots - p_1 F - p_0 I.$$
Substituting into (36) then results in the expression
$$E\|\tilde{w}_{t+1}\|_{F^{N^2M^2-1}\sigma}^2 = -p_{N^2M^2-1}\, E\|\tilde{w}_t\|_{F^{N^2M^2-1}\sigma}^2 - \dots - p_0\, E\|\tilde{w}_t\|_{\sigma}^2 + b_{t+1}^T F^{N^2M^2-1}\sigma,$$
which can be placed into the state-space form
$$\mathcal{W}_{t+1} = \mathcal{F}\, \mathcal{W}_t + \mathcal{Y}_{t+1}, \tag{37}$$
where
$$\mathcal{W}_t \triangleq \begin{bmatrix} E\|\tilde{w}_t\|_{\sigma}^2 \\ E\|\tilde{w}_t\|_{F\sigma}^2 \\ \vdots \\ E\|\tilde{w}_t\|_{F^{N^2M^2-1}\sigma}^2 \end{bmatrix}, \quad \mathcal{Y}_t \triangleq \begin{bmatrix} b_t^T \sigma \\ b_t^T F\sigma \\ \vdots \\ b_t^T F^{N^2M^2-1}\sigma \end{bmatrix}, \tag{38}$$
$$\mathcal{F} \triangleq \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -p_0 & -p_1 & -p_2 & \cdots & -p_{N^2M^2-1} \end{bmatrix}. \tag{39}$$
To make the mean-square stability analysis more tractable, we introduce the following assumption:
Assumption. The quantization error covariances $E[\alpha_{t+1}\tilde{\phi}_{t+1}^T]$ and $E[\alpha_{t+1}\alpha_{t+1}^T]$ remain bounded, with $\|E[\alpha_{t+1}\tilde{\phi}_{t+1}^T]\|_F,\; \|E[\alpha_{t+1}\alpha_{t+1}^T]\|_F < A$ for some A > 0, in the Frobenius norm.

Using the assumption, we obtain a bound on the norm $\|b_t\|^2$ as
$$\begin{aligned} \|b_t\|^2 &\leq \big\|(G\mathcal{M} \odot G\mathcal{M})\, \mathrm{bvec}\{(\Lambda_v \otimes I_M)\Lambda\}\big\|^2 + \|G \odot G_C\|^2 \big\|\mathrm{bvec}\{E[\alpha_t \tilde{\phi}_t^T]\}\big\|^2 + \|G_C \odot G_C\|^2 \big\|\mathrm{bvec}\{E[\alpha_t \alpha_t^T]\}\big\|^2 \\ &\leq \big\|(G\mathcal{M} \odot G\mathcal{M})\, \mathrm{bvec}\{(\Lambda_v \otimes I_M)\Lambda\}\big\|^2 + A\big(\|P\|^2 + \|P_C\|^2\big)\|P_C\|^2 \triangleq B. \end{aligned}$$
Inspecting (39), we observe that the boundedness of $\|b_t\|^2$ implies the boundedness of $\|\mathcal{Y}_t\|^2$; hence there exists C > 0 such that $\|\mathcal{Y}_t\|^2 < C$ for all t. The recursion (37) can be solved for $\mathcal{W}_t$ in closed form as
$$\mathcal{W}_t = \mathcal{F}^t \mathcal{W}_0 + \sum_{n=0}^{t-1} \mathcal{F}^n \mathcal{Y}_{t-n}. \tag{40}$$
Using (40), we can obtain a bound on $\|\mathcal{W}_t\|^2$ as
$$\|\mathcal{W}_t\|^2 \leq \|\mathcal{F}^t\|^2 \|\mathcal{W}_0\|^2 + \sum_{n=0}^{t-1} \|\mathcal{F}^n\|^2 \|\mathcal{Y}_{t-n}\|^2 \leq \|\mathcal{F}\|^{2t} \|\mathcal{W}_0\|^2 + C\, \frac{1 - \|\mathcal{F}\|^{2t}}{1 - \|\mathcal{F}\|^2}, \tag{41}$$
where we have used the fact that, since $\mathcal{F}$ is in the form of a companion matrix for F, they share the same set of eigenvalues. We note that requiring $\|\mathcal{W}_t\|^2$ to remain bounded is sufficient to guarantee the mean-square stability of the overall system, since doing so ensures that $E\|\tilde{w}_t\|_{\sigma}^2$ remains bounded. Thus, by (41), the mean-square stability condition reduces to the matrix F given by (34) being stable. Hence, in order to ensure MS stability, it is sufficient that the step sizes $\mu_i$ are chosen such that the matrix F is stable.

7. Experiments
In this section, we demonstrate the significant reduction in the communication load achieved by our algorithms while providing equal performance with respect to the state of the art.
For the first part of the simulations, we consider a sample network consisting of N = 10 nodes, where each node makes its observation through the linear model
$$d_{i,t} = u_{i,t}^T w^o + v_{i,t}, \quad i = 1,\dots,N. \tag{42}$$
The regressor data $u_{i,t}$ are zero-mean i.i.d. Gaussian with standard deviations $\sigma_{u,i}$ chosen randomly from the interval (0.3, 0.8). The observation noises are generated from a Normal distribution with standard deviations $\sigma_{v,i}$ chosen randomly from the interval (0.1, 0.3). In Fig. 3, we depict the network topology and the network's statistical profile to show how the signal power and the noise power vary across the network.

The unknown vector parameter $w^o$ with M = 10 components is randomly chosen from a Normal distribution and normalized to have unit energy. We changed the source statistics in the middle of the simulations to observe how well the proposed algorithm is able to track sudden changes in the unknown parameter.
We use the Metropolis combination rule to generate the network matrix P such that
$$p_{i,j} = \begin{cases} \dfrac{1}{\max(n_i, n_j)} & \text{if } i \neq j \text{ are linked}, \\ 0 & \text{for } i \text{ and } j \text{ not linked}, \\ 1 - \sum_{j \in \mathcal{N}_i \setminus \{i\}} p_{i,j} & \text{for } i = j, \end{cases}$$
where $n_i = |\mathcal{N}_i|$ denotes the neighborhood size of node i, using the randomly selected network adjacency matrix given by
$$\begin{bmatrix} 1&1&0&0&0&1&1&0&0&0 \\ 1&1&1&0&1&0&1&0&0&0 \\ 0&1&1&1&0&1&0&0&0&1 \\ 0&0&1&1&0&1&0&0&1&1 \\ 0&1&0&0&1&1&0&0&1&1 \\ 1&0&1&1&1&1&1&0&0&0 \\ 1&1&0&0&0&1&1&1&0&1 \\ 0&0&0&0&0&0&1&1&1&0 \\ 0&0&0&1&1&0&0&1&1&0 \\ 0&0&1&1&1&0&1&0&0&1 \end{bmatrix}.$$
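The Metropolis combination matrix for this adjacency structure can be built in a few lines. This is an illustrative sketch of the standard Metropolis rule; here the degrees $n_i$ are taken to include the node itself (the self-loops on the diagonal of the adjacency matrix), which is an assumption about the paper's convention.

```python
import numpy as np

def metropolis_weights(A):
    """Build a Metropolis combination matrix from a 0/1 adjacency
    matrix A (with self-loops on the diagonal)."""
    N = A.shape[0]
    deg = A.sum(axis=1)                    # neighborhood sizes |N_i|
    P = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i != j and A[i, j]:
                P[i, j] = 1.0 / max(deg[i], deg[j])
        P[i, i] = 1.0 - P[i].sum()         # diagonal absorbs the remainder
    return P

A = np.array([[1,1,0,0,0,1,1,0,0,0],
              [1,1,1,0,1,0,1,0,0,0],
              [0,1,1,1,0,1,0,0,0,1],
              [0,0,1,1,0,1,0,0,1,1],
              [0,1,0,0,1,1,0,0,1,1],
              [1,0,1,1,1,1,1,0,0,0],
              [1,1,0,0,0,1,1,1,0,1],
              [0,0,0,0,0,0,1,1,1,0],
              [0,0,0,1,1,0,0,1,1,0],
              [0,0,1,1,1,0,1,0,0,1]])
P = metropolis_weights(A)
print(np.allclose(P.sum(axis=1), 1.0), np.allclose(P, P.T))  # row sums 1, symmetric
```

Because the adjacency matrix is symmetric, the resulting P is symmetric and row-stochastic, hence doubly stochastic, which satisfies the combination-weight requirements of (7).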
We configure the nodes such that they cycle through the entries of the intermediary estimates $\phi_{i,t}$ in a round-robin fashion, and exchange only this selected L = 1 dimension out of M in one time instant. For instance, for an L = 1, M = 3 system at time instants t = 1, …, 4, the ith node will send the entries of its intermediary estimate $\phi_{i,t}$ as in (43):
$$\phi_{1,i} = \begin{bmatrix} \phi_{1,1,i} \\ 0 \\ 0 \end{bmatrix}, \quad \phi_{2,i} = \begin{bmatrix} 0 \\ \phi_{2,2,i} \\ 0 \end{bmatrix}, \quad \phi_{3,i} = \begin{bmatrix} 0 \\ 0 \\ \phi_{3,3,i} \end{bmatrix}, \quad \phi_{4,i} = \begin{bmatrix} \phi_{1,4,i} \\ 0 \\ 0 \end{bmatrix}, \tag{43}$$
where $\phi_{l,t,i}$ is the lth dimension of the intermediary estimate $\phi_{i,t}$ of the ith node at time t that is sent to the neighbors.
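The round-robin selection in (43) amounts to masking all but one entry per time instant. The snippet below is a minimal sketch of that masking (0-indexed time, whereas the paper starts at t = 1):

```python
import numpy as np

def round_robin_mask(phi, t):
    """Keep only the entry of phi scheduled for time t (L = 1 round-robin);
    all other entries are zeroed out before transmission, as in (43)."""
    out = np.zeros_like(phi)
    l = t % phi.size                      # entry index cycles 0, 1, ..., M-1
    out[l] = phi[l]
    return out

phi = np.array([0.5, -1.2, 0.7])
for t in range(4):
    print(round_robin_mask(phi, t))       # cycles through the three entries
```

At t = 3 the cycle wraps around and the first entry is sent again, matching the fourth vector in (43).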
We evaluate the communication reduction performance of the proposed algorithm with respect to the algorithm in [23], where only one entry of the intermediate estimates is exchanged by the nodes at each round, in the sequential order explained in (43).
In Fig. 4, the MSD performance of the proposed algorithm is demonstrated, where, as references, we have considered the algorithm in [23] with an adaptive Lloyd–Max quantizer and a no-quantization (scalar diffusion) implementation of the system. Note that both in the scalar diffusion algorithm and in the Lloyd–Max quantized algorithm, which is referred to as the conventionally quantized algorithm later, the nodes exchange the information of one dimension per communication round. However, in the scalar diffusion algorithm, the information of the exchanged dimension is diffused with full precision, while in the Lloyd–Max case it is quantized with finite precision. We selected the quantization interval so that we do not suffer from any saturation effects, and we chose the number of quantization levels so that no further significant improvement can be made in the MSD performance of the algorithms by increasing the number of levels. We observed that 53 quantization levels for the LC algorithm and 31 quantization levels for the conventional algorithm were sufficient. We use a step size of μ = 0.05 during the simulations due to its good learning rate and convergence results. The results that we obtained in the experiments are averaged over 100 independent trials.

Fig. 3. Network topology and statistical profile.

Fig. 4. The global MSD curves of the proposed algorithm, displayed with the label 'LC', in comparison with the conventional quantization and the scalar diffusion algorithms (N = 10, M = 10). The magnified figure provides the transient performance of the algorithms. Source statistics change at time t = 2 × 10⁴.

Fig. 5. Time evolution of the number of bits transmitted across the network. The sudden increase in the 'LC' curve corresponds to the time where the source statistics are changed.
From these simulations, we observe that the convergence rates of the scalar diffusion and the conventionally quantized diffusion algorithms are superior to that of the proposed algorithm, while the steady-state MSD values of all three systems are identical. We note that it was our aim to obtain equal steady-state MSD values, allowing a fair comparison in terms of the convergence speeds. Also, it is observed that the proposed algorithm is able to adapt well when faced with a sudden change in the source statistics.
In Fig. 5, we present the communication load that each algorithm incurs on the network. We exclude the scalar (infinite-precision) diffusion algorithm from this comparison since it requires an infinite number of bits to encode the information exchanged among the nodes. We observe a substantial enhancement in the communication efficiency achieved by the proposed algorithm, in terms of the total number of bits exchanged between the nodes across the entire adaptive network, with respect to the algorithm that uses the conventional quantization. In particular, for this N = 10 node network, the proposed algorithm incurs a 10³-fold smaller communication load than the reference implementation with the same steady-state MSD values. We also observe a sudden increase in the number of bits used by the proposed algorithm at the time of the change in the source statistics. This occurs because the abrupt change in the parameter of interest causes multiple level crossings at that time, which require more than two bits to encode. However, the system quickly adapts itself back to using two bits. The same behavior is absent in the conventional quantization case, since it already encodes the true values of the levels at every single time instant. We further stress that we achieve this improvement with relatively little complexity, since we have shown that a simple non-adaptive quantizer is sufficient to realize the improvements.

In the second part of the experiments, we aim to observe the performance of the proposed algorithm on high-dimensional data. We therefore modify the former setup so that the unknown vector parameter w_o with M = 100 components is randomly chosen from a normal distribution and normalized to unit energy. We use the same distributed network with the connections given in Fig. 3c. The quantization levels for the algorithms are again chosen so that no further significant improvement can be made by increasing the number of levels; we observed that 53 quantization levels for the LC algorithm and 31 quantization levels for the conventional algorithm were sufficient. We again use a step size of
μ = 0.05, and the results are averaged over 10 independent trials. We have decreased the number of independent trials since processing high-dimensional data takes significantly more time.

We present the MSD performance of the proposed algorithm in comparison with the sequential variant of the algorithm in [23], with the parameters M = 100, L = 1, in Fig. 6. We observe that in the high-dimensional case, the convergence rate of the proposed algorithm is the same as that of the compared algorithms, and they attain the same steady-state MSD values. These results indicate that the adaptation performance of the scalar diffusion and the conventionally quantized diffusion algorithms deteriorates in the high-dimensional case, since each node is allowed to share only one dimension per round, which prevents it from quickly sending its entire intermediary estimate to its neighboring nodes. Therefore, we observe that for such systems, the proposed algorithm performs similarly to the scalar diffusion and the conventionally quantized algorithms.

In Fig. 7, we illustrate the communication load for each algorithm. We observe an improvement in the communication requirements in a similar vein to the previous experiments. Ultimately, the proposed algorithm incurs a 10²-fold smaller communication load than the baseline, where the number of transmitted bits is significantly reduced. The magnitude of this reduction is smaller than in the non-high-dimensional case, mainly due to the extra bits required to encode the higher dimensions for multiple level crossings in the LC quantization.
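The bit counts discussed above stem from the event-triggered nature of the LC scheme: a node transmits only when its parameter crosses one of the quantization levels, an isolated crossing can be encoded with two bits, and a burst of crossings (e.g. after an abrupt change in w_o) costs extra bits. The following sketch illustrates this accounting under an encoding of our own assumption (no bits when no crossing occurs, two bits for a single crossing, one extra bit per additional level crossed); it is not the exact coder used in the paper:

```python
import numpy as np

def lc_encode(x, prev_idx, levels):
    """Level-crossing encoder for one scalar parameter entry.

    x:        current value of the diffused parameter entry
    prev_idx: index of the last transmitted quantization level
    levels:   sorted 1-D array of quantization levels
    Returns (new_idx, bits), where bits is the number of bits sent
    at this iteration under the assumed encoding.
    """
    new_idx = int(np.clip(np.searchsorted(levels, x), 0, len(levels) - 1))
    crossings = abs(new_idx - prev_idx)
    if crossings == 0:
        bits = 0                    # no event: nothing is transmitted
    else:
        bits = 2 + (crossings - 1)  # two bits per crossing event, plus extras
    return new_idx, bits
```

Summing `bits` over all nodes, neighbors, and iterations yields communication-load curves of the kind shown in Figs. 5, 7 and 9; the conventional quantizer instead spends a fixed number of bits per transmitted value at every iteration.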
In the third part of the experiments, in order to observe the possible effects of the number of quantization levels, we simulate the algorithms in an identical experimental setup, except that the number of quantization levels is no longer optimized as in the previous cases. To this end, we arbitrarily choose 25 quantization levels for the LC algorithm and, again, 25 levels for the conventional algorithm. We use the same distributed network connections given in Fig. 3c, a step size of μ = 0.05, and the results are averaged over 100 independent trials.

Fig. 6. The global MSD curves of the proposed algorithm, displayed with the label 'LC', in comparison with the conventional quantization and the scalar diffusion algorithms over high-dimensional data (N = 10, M = 100). The magnified figure provides the transient performance of the algorithms. The source statistics change at time t = 10⁴.
Fig. 7. Time evolution of the number of bits transmitted by the algorithms across the network over high-dimensional data (N = 10, M = 100). The sudden increase in the 'LC' curve corresponds to the time at which the source statistics are changed.
We present the MSD performances of the algorithms in Fig. 8. We observe that when sub-optimal quantization levels are used, the compared algorithms exhibit superior performance to the proposed algorithm, both in terms of the convergence rate and the steady-state MSD. We also note that the quantized algorithms could not reach the steady-state performance of the scalar diffusion due to the deliberately poor selection of the number of quantization levels.
These results are observed due to the system's failure to satisfy the assumed quantization error model. The statistical model that we use for the quantization error φ_i^q assumes that it has zero mean, such that E[φ_i^q] = 0 [24]. However, when such a low number of quantization levels is selected, this model ceases to be applicable, and the quantized algorithms are no longer guaranteed to converge to the steady-state MSD values of the scalar diffusion algorithm.

Fig. 8. The global MSD curves of the proposed algorithm, displayed with the label 'LC', in comparison with the conventional quantization and the scalar diffusion algorithms with sub-optimal quantization levels (N = 10, M = 10). The source statistics change at time t = 10⁴.

Fig. 9. Time evolution of the number of bits transmitted by the algorithms across the network with sub-optimal quantization levels (N = 10, M = 10). The sudden increase in the 'LC' curve corresponds to the time at which the source statistics are changed.
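The breakdown of the zero-mean quantization-error model can be reproduced numerically: when the quantizer has enough levels over the data range, the error averages out to roughly zero, but with very few levels the data saturate at the extreme levels and the error acquires a bias. The nearest-level quantizer below is a generic construction of our own for illustration, not the exact quantizer used in the experiments:

```python
import numpy as np

def quantize(x, levels):
    """Map each sample in x to the nearest entry of `levels`."""
    idx = np.abs(x[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]

# Data concentrated in [2, 3], quantizer designed for [-3, 3]
x = np.linspace(2.0, 3.0, 10_001)

fine = np.linspace(-3.0, 3.0, 61)    # enough levels: error is ~zero-mean
coarse = np.linspace(-3.0, 3.0, 3)   # 3 levels: every sample snaps to +3

err_fine = quantize(x, fine) - x
err_coarse = quantize(x, coarse) - x  # saturated: mean error is biased
```

Here the empirical mean of `err_fine` stays near zero, while `err_coarse` has a mean of 0.5, violating the E[φ_i^q] = 0 assumption on which the steady-state analysis relies.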
In Fig. 9, we present the communication load of the algorithms over the network for the case of a sub-optimal level selection. We again observe a similar behavior, where the proposed algorithm diffuses more than 10³ times fewer bits through the network compared to the baseline. We note that the difference in the number of bits exchanged between the two algorithms is larger compared with the previous results. This can be explained by the fact that we use fewer quantization levels for the LC algorithm, which makes the occurrence of multiple level crossings a rarer phenomenon. Thus, it becomes less likely for each node to send out more than two bits of information in a given iteration. Ultimately, this particular experiment illustrates the existence of a trade-off between the estimation performance and the communication load imposed on the network.