Contents lists available at ScienceDirect
Digital Signal Processing
www.elsevier.com/locate/dsp
Robust least squares methods under bounded data uncertainties
N. Denizcan Vanli a,∗, Mehmet A. Donmez b, Suleyman S. Kozat a
a Department of Electrical and Electronics Engineering, Bilkent University, Ankara, Turkey
b Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, IL, USA
Article info
Article history: Available online 24 October 2014
Keywords: Data estimation, Least squares, Robust, Minimax, Regret
Abstract
We study the problem of estimating an unknown deterministic signal that is observed through an unknown deterministic data matrix under additive noise. In particular, we present a minimax optimization framework for least squares problems in which the estimator has imperfect data matrix and output vector information. We define the performance of an estimator relative to the performance of the optimal least squares (LS) estimator tuned to the underlying unknown data matrix and output vector, which is defined as the regret of the estimator. We then introduce an efficient robust LS estimation approach that minimizes this regret for the worst possible data matrix and output vector, where we refrain from any structural assumptions on the data. We demonstrate that minimizing this worst-case regret can be cast as a semi-definite programming (SDP) problem. We then consider the regularized and structured LS problems and present novel robust estimation methods by demonstrating that these problems can also be cast as SDP problems. We illustrate the merits of the proposed algorithms with respect to the well-known alternatives in the literature through our simulations.
© 2014 Elsevier Inc. All rights reserved.
1. Introduction
In this paper, we investigate estimation of an unknown deterministic signal that is observed through a deterministic data matrix under additive noise, which models a wide range of problems in signal processing applications [1–14]. In this framework, the data matrix and the output vector are not exactly known; however, estimates for both of them as well as uncertainty bounds on the estimates are given [2,8,15–19]. Since the model parameters are not known exactly, the performance of the classical LS estimators may significantly degrade, especially when the perturbations on the data matrix and the output vector are relatively high [9,15,16,20–22]. Hence, robust estimation algorithms are needed to obtain a satisfactory performance under such perturbations. This generic framework models several real-life applications, which require estimation of a signal observed through a linear model [9,16]. As an example, this setup models realistic channel equalization scenarios, where the data matrix represents a communication channel and the data vector is the transmitted information. The channel is usually unknown, especially for wireless communications applications, and can possibly be time-varying. Hence, in practical applications, the communication channel is estimated, where this estimate is usually subject to distortions [9,16]. Under such possible perturbations, robust equalization methods can be used to obtain a more consistent and acceptable performance compared to the LS (or MMSE) equalizer. In this sense, this formulation is comprehensive and can be used in other applications, such as feedback control systems, to estimate a desired data under imperfect system knowledge.

∗ Corresponding author. E-mail addresses: [email protected] (N.D. Vanli), [email protected] (M.A. Donmez), [email protected] (S.S. Kozat).
A prevalent approach to find robust solutions to such estimation problems is the robust minimax LS method [8,9,16,23–27], in which the uncertainties in the data matrix and the output vector are incorporated into the optimization framework via a minimax residual formulation and a worst-case optimization within the uncertainty bounds is performed. Although the robust LS methods are able to minimize the LS error for the worst-case perturbations, they usually provide unsatisfactory results on the average [15,23–27] due to their conservative nature. This issue is significantly exacerbated especially when the actual perturbations do not result in significant performance degradation. Another well-known approach to compensate for errors in the data matrix and the output vector is the total least squares (TLS) method [15], which may yield undesirable results since it employs a conservative approach due to data de-regularization. On the other hand, the data matrix usually has a known special structure, such as Toeplitz and Hankel, in many linear regression problems [9,15]. Hence, in [9,15], the authors illustrate that the performances of the estimators based on minimax approaches improve when such a prior knowledge on data matrix structure is integrated into the problem formulation.
http://dx.doi.org/10.1016/j.dsp.2014.10.004
In all these methods, LS estimators under worst case perturbations are introduced to achieve robustness. However, due to this conservative problem formulation, in many practical applications, these approaches yield unsatisfactory performances [2,8,18,28–30].
In order to counterbalance this conservative nature of the robust LS methods [9], we propose a novel robust LS approach that minimizes a worst case “regret” that is defined as the difference between the squared residual error and the smallest attainable squared residual error with an LS estimator [2,8,18,28–30]. By this regret formulation, we seek a linear estimator whose performance is as close as possible to that of the optimal estimator for all possible perturbations on the data matrix and the output vector. Our main goal in proposing the minimax regret formulation is to provide a trade-off between the robust LS methods tuned to the worst possible data parameters (under the uncertainty bounds) and the optimal LS estimator tuned to the underlying unknown model parameters. Minimax regret approaches have been presented in signal processing literature to alleviate the pessimistic nature of the worst case optimization methods [2,8,18,28–30]. In [18,29], linear minimax regret estimators are introduced to minimize the mean squared error (MSE) under imperfect knowledge of channel statistics and true parameters, respectively. In [28], a minimum mean squared error (MMSE) estimation technique under imperfect channel and data knowledge is investigated. In [2], these robust estimation methods are extended to flat fading channels to perform channel equalization. These methods are shown to provide a better average performance compared to the minimax estimators, whereas under large perturbations the robustness of the minimax estimators is superior to these competitive methods. On the other hand, in this paper, the optimization frameworks investigated here are significantly different than [9,16,23–27], where the regret terms are directly adjoined in the cost functions. In particular, unlike [2,18,28,29], where the uncertainties are in the statistics of the transmitted signal or channel parameters, in this paper, the uncertainty is both on the data matrix and the output vector without any statistical assumptions. While in [8], the authors have considered a similar framework, the results of this paper build upon them and provide a complete solution to the regret based robust LS estimation methods unlike [8]. We emphasize that perturbation bounds on the data matrix and the output vector heavily depend on the estimation algorithms employed to obtain them. Since our methods are formulated for given perturbation bounds, different estimation algorithms can be readily incorporated into our framework with the corresponding perturbation bounds [16].
Our main contributions in this paper are as follows. i) We introduce a novel and efficient robust LS estimation method in which we find the transmitted signal by minimizing the worst-case regret, i.e., the worst-case difference between the residual error of the LS estimator and the residual error of the optimal LS estimator tuned to the underlying model. In this sense, we present a robust estimation method that achieves a tradeoff between the robust LS estimation methods and the direct LS estimation method tuned to the estimates of the data matrix and output vector. ii) We next propose a minimax regret formulation for the regularized LS estimation problem. iii) We then introduce a structured robust LS estimation method in which the data matrix is known to have a special structure such as Toeplitz or Hankel. iv) We demonstrate that the robust estimation methods we propose can be cast as SDP problems, hence our methods can be efficiently implemented in real-time [31]. v) In our simulations, we observe that our approaches provide better performance compared to the robust methods that are optimized with respect to the worst-case residual error [9,32], and the conventional methods that directly solve the estimation problem using the perturbed data.
The organization of the paper is as follows. An overview of the problem is provided in Section 2. In Section 3.1, we first introduce the LS estimation method based on our regret formulation, and then present the regularized LS estimation approach in Section 3.2. We then consider the structured LS approach in Section 3.3 and provide the explicit SDP formulations for all problems. The numerical examples are demonstrated in Section 4. Finally, the paper concludes with certain remarks in Section 5.
2. System overview
2.1. Notation
In this paper, all vectors are column vectors and represented by boldface lowercase letters. Matrices are represented by boldface uppercase letters. For a matrix $\mathbf{H}$, $\mathbf{H}^{H}$ is the conjugate transpose, $\|\mathbf{H}\|$ is the spectral norm, $\mathbf{H}^{+}$ is the pseudo-inverse, $\mathbf{H} > 0$ represents a positive definite matrix and $\mathbf{H} \geq 0$ represents a positive semi-definite matrix. For a square matrix $\mathbf{H}$, $\mathrm{Tr}(\mathbf{H})$ is the trace. Naturally, for a vector $\mathbf{x}$, $\|\mathbf{x}\| = \sqrt{\mathbf{x}^{H}\mathbf{x}}$ is the 2-norm. Here, $\mathbf{0}$ denotes a vector or matrix with all zero elements and the dimensions can be understood from the context. Similarly, $\mathbf{I}$ represents the appropriate sized identity matrix. The operator $\mathrm{vec}(\cdot)$ is the vectorization operator, i.e., it stacks the columns of a matrix of dimension $m \times n$ into an $mn \times 1$ column vector. Finally, the operator $\otimes$ is the Kronecker product [33].

2.2. Problem description
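We investigate the problem of estimating an unknown deterministic vector $\mathbf{x} \in \mathbb{C}^{n}$ which is observed through a deterministic data matrix. However, instead of the actual data matrix and the output vector, their estimates $\mathbf{H} \in \mathbb{C}^{m \times n}$ and $\mathbf{y} \in \mathbb{C}^{m}$ and uncertainty bounds on these estimates are provided. In this sense, our aim is to find a solution to the following data estimation problem

$$\mathbf{y} \approx \mathbf{H}\mathbf{x},$$

such that

$$\mathbf{y} + \Delta\mathbf{y} = (\mathbf{H} + \Delta\mathbf{H})\,\mathbf{x},$$

for deterministic perturbations $\Delta\mathbf{H} \in \mathbb{C}^{m \times n}$, $\Delta\mathbf{y} \in \mathbb{C}^{m}$. Although these perturbations are unknown, a bound on each perturbation is provided, i.e., $\|\Delta\mathbf{H}\| \leq \delta_H$ and $\|\Delta\mathbf{y}\| \leq \delta_Y$, where $\delta_H, \delta_Y \geq 0$. In this sense, we refrain from any assumptions on the data matrix and the output vector, yet consider that the estimates $\mathbf{H}$ and $\mathbf{y}$ are at least accurate to “some degree” but their actual values under these uncertainties are completely unknown to the estimator.

Even in the presence of these uncertainties, the symbol vector $\mathbf{x}$ can be naively estimated by simply substituting the estimates $\mathbf{H}$ and $\mathbf{y}$ into the LS estimator [10]. For the LS estimator we have

$$\hat{\mathbf{x}} = \mathbf{H}^{+}\mathbf{y},$$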
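where $\mathbf{H}^{+}$ is the pseudo-inverse of $\mathbf{H}$ [33]. However, this approach yields unsatisfactory results when the errors in the estimates of the data matrix and the output vector are relatively high [9,18,29,32]. A common approach to find a robust solution is to employ a worst-case residual minimization [9]

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in \mathbb{C}^{n}} \; \max_{\|\Delta\mathbf{H}\| \leq \delta_H,\; \|\Delta\mathbf{y}\| \leq \delta_Y} \left\| (\mathbf{y} + \Delta\mathbf{y}) - (\mathbf{H} + \Delta\mathbf{H})\mathbf{x} \right\|^{2},$$

where $\mathbf{x}$ is chosen to minimize the worst-case residual error in the uncertainty region. However, since the solution is found with respect to the worst possible data matrix and output vector in the uncertainty regions, it may be highly conservative [15,18,29].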
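To make the inner maximization above concrete, the following NumPy sketch evaluates the worst-case residual norm for a fixed $\mathbf{x}$. It relies on a closed form that is not stated in the text but follows from the triangle inequality: the maximum of $\|(\mathbf{y}+\Delta\mathbf{y}) - (\mathbf{H}+\Delta\mathbf{H})\mathbf{x}\|$ over the two norm balls equals $\|\mathbf{y}-\mathbf{H}\mathbf{x}\| + \delta_H\|\mathbf{x}\| + \delta_Y$, attained by a rank-one $\Delta\mathbf{H}$. The function names, the random test data, and the bound values 0.4 and 0.3 are illustrative assumptions.

```python
import numpy as np

def worst_case_residual(H, y, x, delta_H, delta_Y):
    """Largest residual norm over ||dH||_2 <= delta_H and ||dy||_2 <= delta_Y."""
    r = y - H @ x
    return np.linalg.norm(r) + delta_H * np.linalg.norm(x) + delta_Y

def worst_case_perturbation(H, y, x, delta_H, delta_Y):
    """Perturbation pair attaining the bound (assumes y - Hx and x are nonzero)."""
    r = y - H @ x
    u = r / np.linalg.norm(r)                     # unit vector along the residual
    dH = -delta_H * np.outer(u, x.conj()) / np.linalg.norm(x)   # rank one
    dy = delta_Y * u
    return dH, dy

# Hypothetical test data (complex Gaussian), not taken from the paper.
rng = np.random.default_rng(0)
m, n = 5, 3
H = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
y = rng.standard_normal(m) + 1j * rng.standard_normal(m)

x_ls = np.linalg.pinv(H) @ y                      # naive LS estimate x = H^+ y
dH, dy = worst_case_perturbation(H, y, x_ls, 0.4, 0.3)
attained = np.linalg.norm((y + dy) - (H + dH) @ x_ls)
bound = worst_case_residual(H, y, x_ls, 0.4, 0.3)
```

Here `attained` and `bound` agree up to rounding, and squaring either one gives the worst-case squared residual that appears in the minimax problem.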
Here, we propose a novel LS estimation approach that provides a tradeoff between performance and robustness in order to mitigate the conservative nature of the worst-case residual minimization approach as well as to preserve robustness [18,29]. The regret for not using the optimal LS estimator is defined as the difference between the residual error with an estimate of the input vector and the residual error with the optimal LS estimator, i.e.,

$$R(\mathbf{x}; \Delta\mathbf{H}, \Delta\mathbf{y}) \triangleq \left\| (\mathbf{y} + \Delta\mathbf{y}) - (\mathbf{H} + \Delta\mathbf{H})\mathbf{x} \right\|^{2} - \min_{\mathbf{w} \in \mathbb{C}^{n}} \left\| (\mathbf{y} + \Delta\mathbf{y}) - (\mathbf{H} + \Delta\mathbf{H})\mathbf{w} \right\|^{2}. \tag{1}$$

By making such a regret definition, we force our estimator not to construct the symbol vector according to the worst possible scenario, considering that it may be too conservative. Instead, we define the regret of any estimator by the difference in the estimation performances of that estimator and the “smartest” estimator knowing both data matrix and output vector in hindsight, so that we achieve a tradeoff between robustness and estimation performance.
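As an illustration of the regret in (1), the short NumPy sketch below evaluates it for one fixed perturbation pair. The function name `regret`, the random data, and the bound values are assumptions made for the example; in the actual problem the perturbations are unknown and the regret is maximized over them.

```python
import numpy as np

def regret(x, H, y, dH, dy):
    """Regret (1): residual of x minus the best LS residual on the true data."""
    H_true, y_true = H + dH, y + dy
    residual_x = np.linalg.norm(y_true - H_true @ x) ** 2
    w_opt = np.linalg.pinv(H_true) @ y_true       # hindsight LS solution
    residual_opt = np.linalg.norm(y_true - H_true @ w_opt) ** 2
    return residual_x - residual_opt

# Hypothetical data and perturbations, scaled onto the uncertainty bounds.
rng = np.random.default_rng(0)
m, n, delta_H, delta_Y = 5, 3, 0.5, 0.5
H = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
y = rng.standard_normal(m) + 1j * rng.standard_normal(m)
dH = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
dH *= delta_H / np.linalg.norm(dH, 2)             # spectral norm equals delta_H
dy = rng.standard_normal(m) + 1j * rng.standard_normal(m)
dy *= delta_Y / np.linalg.norm(dy)

x_naive = np.linalg.pinv(H) @ y                   # LS tuned to the estimates
r_naive = regret(x_naive, H, y, dH, dy)
r_oracle = regret(np.linalg.pinv(H + dH) @ (y + dy), H, y, dH, dy)
```

The regret is nonnegative for every `x` and vanishes for the hindsight LS solution, which is why it measures the loss relative to the best attainable residual rather than the residual itself.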
We emphasize that the regret defined in (1) is completely different than the regret formulation introduced in [18,29]. In (1), the uncertainty is on the data matrix where the desired data vector $\mathbf{x}$ is completely unknown, unlike [18,29]. We emphasize that we use the residual error $\|(\mathbf{y} + \Delta\mathbf{y}) - (\mathbf{H} + \Delta\mathbf{H})\mathbf{x}\|^{2}$ instead of the estimation error $\|\hat{\mathbf{x}} - \mathbf{x}\|$ since the estimation error directly depends on the vector $\mathbf{x}$ and cannot be used in the regret formulation since $\mathbf{x}$ is assumed to be unknown in the presence of data uncertainties. Moreover, in our formulation, the estimate $\hat{\mathbf{x}}$ is not constrained to be linear unlike [18,29] since our regret formulation is well-defined without any limitations on the estimated $\hat{\mathbf{x}}$.
In the next sections, the proposed approaches to the robust LS estimation problems are provided. We first introduce the regret based unstructured LS estimation method. We next present the unstructured regularized LS estimation approach in which the worst-case regret is optimized. Finally, we investigate the structured LS estimation approach.
3. Robust least squares estimation methods
3.1. Unstructured robust least squares estimation
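In this section, we provide a novel robust unstructured LS estimator based on a certain minimax criterion. We consider the most generic estimation problem

$$\min_{\mathbf{x} \in \mathbb{C}^{n}} \; \max_{\|\Delta\mathbf{H}\| \leq \delta_H,\; \|\Delta\mathbf{y}\| \leq \delta_Y} R(\mathbf{x}; \Delta\mathbf{H}, \Delta\mathbf{y}), \tag{2}$$

where $R(\mathbf{x}; \Delta\mathbf{H}, \Delta\mathbf{y})$ is defined as in (1). Now considering the second term in (1), we define $\tilde{\mathbf{H}} \triangleq \mathbf{H} + \Delta\mathbf{H}$, $\tilde{\mathbf{y}} \triangleq \mathbf{y} + \Delta\mathbf{y}$, where $\tilde{\mathbf{H}}$ is a full rank matrix, and denote the estimation performance of the optimal LS estimator for some given $\tilde{\mathbf{H}}$ and $\tilde{\mathbf{y}}$ by

$$f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}}) \triangleq \min_{\mathbf{w} \in \mathbb{C}^{n}} \left\| \tilde{\mathbf{y}} - \tilde{\mathbf{H}}\mathbf{w} \right\|^{2}.$$

Since we consider an unconstrained minimization over $\mathbf{w}$, we have [10]

$$\mathbf{w}^{*} \triangleq \arg\min_{\mathbf{w} \in \mathbb{C}^{n}} \left\| \tilde{\mathbf{y}} - \tilde{\mathbf{H}}\mathbf{w} \right\|^{2} = \tilde{\mathbf{H}}^{+}\tilde{\mathbf{y}}, \tag{3}$$

as the optimal data vector minimizing the residual error. Then we have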
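$$\begin{aligned}
f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}}) &= \left\| \tilde{\mathbf{y}} - \tilde{\mathbf{H}}\mathbf{w}^{*} \right\|^{2} \\
&= \left( \tilde{\mathbf{y}} - \tilde{\mathbf{H}}\mathbf{w}^{*} \right)^{H} \left( \tilde{\mathbf{y}} - \tilde{\mathbf{H}}\mathbf{w}^{*} \right) \\
&= \tilde{\mathbf{y}}^{H} \left( \tilde{\mathbf{y}} - \tilde{\mathbf{H}}\mathbf{w}^{*} \right) \\
&= \tilde{\mathbf{y}}^{H} \tilde{\mathbf{P}} \tilde{\mathbf{y}},
\end{aligned}$$

where the third line follows from $\tilde{\mathbf{H}}^{H}\tilde{\mathbf{H}}\mathbf{w}^{*} = \tilde{\mathbf{H}}^{H}\tilde{\mathbf{y}}$ [10] and $\tilde{\mathbf{P}} \triangleq \mathbf{I} - \tilde{\mathbf{H}}\tilde{\mathbf{H}}^{+}$ is the projection matrix of the space perpendicular to the range space of $\tilde{\mathbf{H}}$.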
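The chain of equalities above can be checked numerically. The following NumPy sketch compares the minimum residual obtained from $\mathbf{w}^{*} = \tilde{\mathbf{H}}^{+}\tilde{\mathbf{y}}$ with the projection form $\tilde{\mathbf{y}}^{H}\tilde{\mathbf{P}}\tilde{\mathbf{y}}$ and also verifies the normal equations used in the third line. The dimensions and the random complex data are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical full-column-rank data matrix and output vector.
rng = np.random.default_rng(1)
m, n = 6, 3
H_t = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
y_t = rng.standard_normal(m) + 1j * rng.standard_normal(m)

H_pinv = np.linalg.pinv(H_t)
w_star = H_pinv @ y_t                             # optimal LS solution, as in (3)

# Minimum residual computed directly from the definition of f.
f_direct = np.linalg.norm(y_t - H_t @ w_star) ** 2

# Same quantity through the projector onto the orthogonal complement of range(H_t).
P_t = np.eye(m) - H_t @ H_pinv
f_projection = np.real(np.vdot(y_t, P_t @ y_t))   # np.vdot conjugates its first argument

# Normal equations: H^H H w* = H^H y.
normal_eq_gap = np.linalg.norm(H_t.conj().T @ (H_t @ w_star) - H_t.conj().T @ y_t)
```

Both `f_direct` and `f_projection` describe the same number, so the regret can be written without an explicit inner minimization.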
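If we use the Taylor series expansion based on Wirtinger calculus [33] for $f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})$ around $\tilde{\mathbf{H}} = \mathbf{H}$ and $\tilde{\mathbf{y}} = \mathbf{y}$, then

$$f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}}) = f(\mathbf{H}, \mathbf{y}) + 2\,\mathrm{Re}\left\{ \mathrm{Tr}\left( \nabla f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})^{H} \Big|_{\tilde{\mathbf{H}} = \mathbf{H},\, \tilde{\mathbf{y}} = \mathbf{y}} \left[ \Delta\mathbf{H} \;\; \Delta\mathbf{y} \right] \right) \right\} + O\left( \left\| \left[ \Delta\mathbf{H} \;\; \Delta\mathbf{y} \right] \right\|^{2} \right). \tag{4}$$

Note that the first order Taylor approximation is introduced in order to obtain a tractable solution. Clearly, the effect of using this approximation vanishes as $\|[\Delta\mathbf{H} \;\; \Delta\mathbf{y}]\|$ decreases and for distortions with larger $\|[\Delta\mathbf{H} \;\; \Delta\mathbf{y}]\|$, one can easily use higher order approximations instead. However, we observe through our simulations that even for relatively large perturbations, a satisfactory performance is obtained using this approximation.

We now introduce the following lemma in order to obtain the first order Taylor approximation in (4) in a closed form.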
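Lemma 1. Let $\tilde{\mathbf{H}} = \mathbf{H} + \Delta\mathbf{H}$ be a full rank matrix and $\tilde{\mathbf{y}} = \mathbf{y} + \Delta\mathbf{y}$, where $\tilde{\mathbf{H}} \in \mathbb{C}^{m \times n}$ and $\tilde{\mathbf{y}} \in \mathbb{C}^{m}$. Then defining $f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}}) \triangleq \tilde{\mathbf{y}}^{H}\tilde{\mathbf{P}}\tilde{\mathbf{y}}$, where $\tilde{\mathbf{P}} \triangleq \mathbf{I} - \tilde{\mathbf{H}}\tilde{\mathbf{H}}^{+}$, we have

$$\frac{\partial f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})}{\partial \tilde{\mathbf{H}}} \bigg|_{\tilde{\mathbf{H}} = \mathbf{H},\, \tilde{\mathbf{y}} = \mathbf{y}} = -\mathbf{P}\mathbf{y}\left( \mathbf{H}^{+}\mathbf{y} \right)^{H},$$

and

$$\frac{\partial f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})}{\partial \tilde{\mathbf{y}}} \bigg|_{\tilde{\mathbf{H}} = \mathbf{H},\, \tilde{\mathbf{y}} = \mathbf{y}} = \mathbf{P}\mathbf{y},$$

where $\mathbf{P} \triangleq \mathbf{I} - \mathbf{H}\mathbf{H}^{+}$.

Proof of Lemma 1. Since $\tilde{\mathbf{H}}$ is full rank and $m \geq n$, the pseudo-inverse of $\tilde{\mathbf{H}}$ is found by [33]

$$\tilde{\mathbf{H}}^{+} \triangleq \left( \tilde{\mathbf{H}}^{H}\tilde{\mathbf{H}} \right)^{-1} \tilde{\mathbf{H}}^{H}.$$

Hence, we have [33]

$$\begin{aligned}
\mathbf{D} &= \frac{\partial}{\partial \tilde{\mathbf{H}}} \left( \tilde{\mathbf{y}}^{H}\tilde{\mathbf{y}} - \tilde{\mathbf{y}}^{H}\tilde{\mathbf{H}} \left( \tilde{\mathbf{H}}^{H}\tilde{\mathbf{H}} \right)^{-1} \tilde{\mathbf{H}}^{H}\tilde{\mathbf{y}} \right) \bigg|_{\tilde{\mathbf{H}} = \mathbf{H},\, \tilde{\mathbf{y}} = \mathbf{y}} \\
&= \mathbf{H}\left( \mathbf{H}^{H}\mathbf{H} \right)^{-1}\mathbf{H}^{H}\mathbf{y}\mathbf{y}^{H}\mathbf{H}\left( \mathbf{H}^{H}\mathbf{H} \right)^{-1} - \mathbf{y}\mathbf{y}^{H}\mathbf{H}\left( \mathbf{H}^{H}\mathbf{H} \right)^{-1} \\
&= \mathbf{H}\mathbf{H}^{+}\mathbf{y}\left( \mathbf{H}^{+}\mathbf{y} \right)^{H} - \mathbf{y}\left( \mathbf{H}^{+}\mathbf{y} \right)^{H} \\
&= -\mathbf{P}\mathbf{y}\left( \mathbf{H}^{+}\mathbf{y} \right)^{H},
\end{aligned} \tag{5}$$

and

$$\mathbf{b} = \frac{\partial}{\partial \tilde{\mathbf{y}}} \left( \tilde{\mathbf{y}}^{H}\tilde{\mathbf{y}} - \tilde{\mathbf{y}}^{H}\tilde{\mathbf{H}} \left( \tilde{\mathbf{H}}^{H}\tilde{\mathbf{H}} \right)^{-1} \tilde{\mathbf{H}}^{H}\tilde{\mathbf{y}} \right) \bigg|_{\tilde{\mathbf{H}} = \mathbf{H},\, \tilde{\mathbf{y}} = \mathbf{y}} = \mathbf{P}\mathbf{y}, \tag{6}$$

where the last line of the equality follows since $\mathbf{H}\mathbf{H}^{+}$ is a symmetric matrix according to the definition of the pseudo-inverse operation. This concludes the proof of Lemma 1. □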
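The closed forms in Lemma 1 can be sanity-checked by comparing the exact value of $f$ at a perturbed point with the first order expansion built from $\mathbf{D} = -\mathbf{P}\mathbf{y}(\mathbf{H}^{+}\mathbf{y})^{H}$ and $\mathbf{b} = \mathbf{P}\mathbf{y}$. In the NumPy sketch below, the function names, the random data, and the perturbation sizes are assumptions; the only claim tested is that the approximation error shrinks roughly quadratically with the perturbation size.

```python
import numpy as np

def f(H, y):
    """Optimal LS residual y^H P y with P = I - H H^+."""
    P = np.eye(H.shape[0]) - H @ np.linalg.pinv(H)
    return np.real(np.vdot(y, P @ y))

def lemma1_gradients(H, y):
    """Closed forms of Lemma 1: D = -P y (H^+ y)^H and b = P y."""
    H_pinv = np.linalg.pinv(H)
    P = np.eye(H.shape[0]) - H @ H_pinv
    D = -np.outer(P @ y, (H_pinv @ y).conj())
    b = P @ y
    return D, b

def first_order(H, y, dH, dy):
    """kappa + 2 Re{ Tr(D^H dH) + b^H dy }."""
    D, b = lemma1_gradients(H, y)
    linear = np.vdot(D, dH) + np.vdot(b, dy)      # vdot flattens and conjugates D, b
    return f(H, y) + 2 * np.real(linear)

# Hypothetical data and a fixed unit-norm perturbation direction.
rng = np.random.default_rng(2)
m, n = 6, 3
H = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
y = rng.standard_normal(m) + 1j * rng.standard_normal(m)
dH0 = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
dH0 /= np.linalg.norm(dH0, 2)
dy0 = rng.standard_normal(m) + 1j * rng.standard_normal(m)
dy0 /= np.linalg.norm(dy0)

def approx_error(eps):
    exact = f(H + eps * dH0, y + eps * dy0)
    return abs(exact - first_order(H, y, eps * dH0, eps * dy0))

err_big, err_small = approx_error(1e-2), approx_error(1e-3)
```

Reducing the perturbation size by a factor of ten should reduce the error by about a factor of one hundred, which is consistent with the second order remainder term.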
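Now turning our attention back to (4), we denote

$$\mathbf{D} \triangleq \frac{\partial f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})}{\partial \tilde{\mathbf{H}}} \bigg|_{\tilde{\mathbf{H}} = \mathbf{H},\, \tilde{\mathbf{y}} = \mathbf{y}}, \quad \text{and} \quad \mathbf{b} \triangleq \frac{\partial f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})}{\partial \tilde{\mathbf{y}}} \bigg|_{\tilde{\mathbf{H}} = \mathbf{H},\, \tilde{\mathbf{y}} = \mathbf{y}},$$

where we emphasize that the closed form definitions of $\mathbf{D}$ and $\mathbf{b}$ can be obtained from Lemma 1. We then approximate (4) and obtain the first order Taylor approximation as follows

$$\begin{aligned}
f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}}) &\approx f(\mathbf{H}, \mathbf{y}) + 2\,\mathrm{Re}\left\{ \mathrm{Tr}\left( \left[ \mathbf{D} \;\; \mathbf{b} \right]^{H} \left[ \Delta\mathbf{H} \;\; \Delta\mathbf{y} \right] \right) \right\} \\
&= \kappa + 2\,\mathrm{Re}\left\{ \mathrm{vec}(\mathbf{D})^{H}\,\mathrm{vec}(\Delta\mathbf{H}) + \mathbf{b}^{H}\Delta\mathbf{y} \right\} \\
&= \kappa + \mathbf{d}^{H}\Delta\mathbf{h} + \Delta\mathbf{h}^{H}\mathbf{d} + \mathbf{b}^{H}\Delta\mathbf{y} + \Delta\mathbf{y}^{H}\mathbf{b},
\end{aligned} \tag{7}$$

where $\kappa \triangleq f(\mathbf{H}, \mathbf{y})$, $\mathbf{d} \triangleq \mathrm{vec}(\mathbf{D})$, and $\Delta\mathbf{h} \triangleq \mathrm{vec}(\Delta\mathbf{H})$. Hence we can approximate the regret in (1) as follows

$$R(\mathbf{x}; \Delta\mathbf{H}, \Delta\mathbf{y}) \approx \left\| \tilde{\mathbf{y}} - \tilde{\mathbf{H}}\mathbf{x} \right\|^{2} - \left( \kappa + \mathbf{d}^{H}\Delta\mathbf{h} + \Delta\mathbf{h}^{H}\mathbf{d} + \mathbf{b}^{H}\Delta\mathbf{y} + \Delta\mathbf{y}^{H}\mathbf{b} \right). \tag{8}$$

In the following theorem, we illustrate how the optimization (or equivalently estimation) problem in (8) can be put in an SDP form.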
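Theorem 1. Let $\mathbf{H} \in \mathbb{C}^{m \times n}$ and $\mathbf{y} \in \mathbb{C}^{m}$ be the estimates of the data matrix and the output vector, respectively, both having deterministic additive perturbations $\|\Delta\mathbf{H}\| \leq \delta_H$ and $\|\Delta\mathbf{y}\| \leq \delta_Y$, respectively, i.e., $\tilde{\mathbf{H}} = \mathbf{H} + \Delta\mathbf{H}$ and $\tilde{\mathbf{y}} = \mathbf{y} + \Delta\mathbf{y}$, where $\tilde{\mathbf{H}}$ is the full rank data matrix, $\tilde{\mathbf{y}}$ is the output vector, and $m \geq n$. Then the problem

$$\min_{\mathbf{x} \in \mathbb{C}^{n}} \; \max_{\|\Delta\mathbf{H}\| \leq \delta_H,\; \|\Delta\mathbf{y}\| \leq \delta_Y} R(\mathbf{x}; \Delta\mathbf{H}, \Delta\mathbf{y}), \tag{9}$$

where $R(\mathbf{x}; \Delta\mathbf{H}, \Delta\mathbf{y})$ is defined as in (8), is equivalent to solving the following SDP problem

$$\min \gamma$$

subject to $\tau_1 \geq 0$, $\tau_2 \geq 0$, and

$$\begin{bmatrix}
\gamma + \kappa - \tau_1 - \tau_2 & (\mathbf{y} - \mathbf{H}\mathbf{x})^{H} & \delta_Y \mathbf{b}^{H} & \delta_H \mathbf{d}^{H} \\
\mathbf{y} - \mathbf{H}\mathbf{x} & \mathbf{I} & -\delta_Y \mathbf{I} & \delta_H \mathbf{X} \\
\delta_Y \mathbf{b} & -\delta_Y \mathbf{I} & \tau_1 \mathbf{I} & \mathbf{0} \\
\delta_H \mathbf{d} & \delta_H \mathbf{X}^{H} & \mathbf{0} & \tau_2 \mathbf{I}
\end{bmatrix} \geq 0, \tag{10}$$

where $\mathbf{X}$ is the $m \times mn$ matrix defined as $\mathbf{X} \triangleq \mathbf{x}^{H} \otimes \mathbf{I}$.

The proof of Theorem 1 is provided in Appendix A.

Remark 1. In the proof of Theorem 1, we use Proposition 1 that relies on the lossless S-procedure. The S-procedure is lossless with two constraints when the corresponding quadratic (Hermitian) forms are defined on a complex linear space [34]. However, the classical S-procedure for quadratic forms is, in general, lossy with two constraints in the real case [35]. Hence, Theorem 1 cannot be extended to real linear spaces.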
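Now we can consider two important corollaries of Theorem 1. The first is a special case of Theorem 1 in which the uncertainty is only in the data matrix. We emphasize that perturbation errors only in the data matrix are also common in a wide range of real life applications [10]. Here, we can define the regret as follows

$$R(\mathbf{x}; \Delta\mathbf{H}) \triangleq \left\| \mathbf{y} - \tilde{\mathbf{H}}\mathbf{x} \right\|^{2} - \min_{\mathbf{w} \in \mathbb{C}^{n}} \left\| \mathbf{y} - \tilde{\mathbf{H}}\mathbf{w} \right\|^{2}, \tag{11}$$

and similar to the previous case, we calculate the optimal estimation performance under a given uncertainty bound

$$\begin{aligned}
f(\tilde{\mathbf{H}}) \triangleq \min_{\mathbf{w} \in \mathbb{C}^{n}} \left\| \mathbf{y} - \tilde{\mathbf{H}}\mathbf{w} \right\|^{2} &\approx \kappa + 2\,\mathrm{Re}\left\{ \mathrm{Tr}\left( \nabla f(\tilde{\mathbf{H}}, \mathbf{y})^{H} \Big|_{\tilde{\mathbf{H}} = \mathbf{H}} \Delta\mathbf{H} \right) \right\} \\
&= \kappa + 2\,\mathrm{Re}\left\{ \mathrm{vec}(\mathbf{D})^{H}\,\mathrm{vec}(\Delta\mathbf{H}) \right\} \\
&= \kappa + \mathbf{d}^{H}\Delta\mathbf{h} + \Delta\mathbf{h}^{H}\mathbf{d}.
\end{aligned}$$

Hence we approximate the regret in (11) as follows

$$R(\mathbf{x}; \Delta\mathbf{H}) \approx \left\| \mathbf{y} - \tilde{\mathbf{H}}\mathbf{x} \right\|^{2} - \left( \kappa + \mathbf{d}^{H}\Delta\mathbf{h} + \Delta\mathbf{h}^{H}\mathbf{d} \right). \tag{12}$$

Corollary 1. Let $\mathbf{H} \in \mathbb{C}^{m \times n}$ and $\mathbf{y} \in \mathbb{C}^{m}$ be the estimates of the data matrix and the output vector, respectively, where $m \geq n$. Suppose there is a bounded uncertainty on the full rank data matrix $\tilde{\mathbf{H}}$, i.e., $\tilde{\mathbf{H}} = \mathbf{H} + \Delta\mathbf{H}$, $\|\Delta\mathbf{H}\| \leq \delta_H$. Then the problem

$$\min_{\mathbf{x} \in \mathbb{C}^{n}} \; \max_{\|\Delta\mathbf{H}\| \leq \delta_H} R(\mathbf{x}; \Delta\mathbf{H}), \tag{13}$$

where $R(\mathbf{x}; \Delta\mathbf{H})$ is defined as in (12), is equivalent to solving the following SDP problem

$$\min \gamma$$

subject to $\tau \geq 0$ and

$$\begin{bmatrix}
\gamma + \kappa - \tau & (\mathbf{y} - \mathbf{H}\mathbf{x})^{H} & \delta_H \mathbf{d}^{H} \\
\mathbf{y} - \mathbf{H}\mathbf{x} & \mathbf{I} & \delta_H \mathbf{X} \\
\delta_H \mathbf{d} & \delta_H \mathbf{X}^{H} & \tau \mathbf{I}
\end{bmatrix} \geq 0. \tag{14}$$

Outline of the proof of Corollary 1. The proof of Corollary 1 can be explicitly derived from the proof of Theorem 1 by simply setting $\delta_Y = 0$ and $\tau_1 = 0$, hence is omitted. □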
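Second, we consider another special case of Theorem 1 in which the uncertainty is only in the output vector. We emphasize that, similar to the previous case, this one is also a common case in a wide range of real-life applications [10], and is studied under a similar framework in [18]. Here, we can define the regret as follows

$$R(\mathbf{x}; \Delta\mathbf{y}) \triangleq \left\| \tilde{\mathbf{y}} - \mathbf{H}\mathbf{x} \right\|^{2} - \min_{\mathbf{w} \in \mathbb{C}^{n}} \left\| \tilde{\mathbf{y}} - \mathbf{H}\mathbf{w} \right\|^{2}, \tag{15}$$

and similar to the previous case, we also calculate the optimal estimation performance under a given uncertainty bound

$$\begin{aligned}
f(\tilde{\mathbf{y}}) \triangleq \min_{\mathbf{w} \in \mathbb{C}^{n}} \left\| \tilde{\mathbf{y}} - \mathbf{H}\mathbf{w} \right\|^{2} &\approx \kappa + 2\,\mathrm{Re}\left\{ \mathrm{Tr}\left( \nabla f(\mathbf{H}, \tilde{\mathbf{y}})^{H} \Big|_{\tilde{\mathbf{y}} = \mathbf{y}} \Delta\mathbf{y} \right) \right\} \\
&= \kappa + 2\,\mathrm{Re}\left\{ \mathbf{b}^{H}\Delta\mathbf{y} \right\} \\
&= \kappa + \mathbf{b}^{H}\Delta\mathbf{y} + \Delta\mathbf{y}^{H}\mathbf{b}.
\end{aligned}$$

Hence we approximate the regret in (15) as follows

$$R(\mathbf{x}; \Delta\mathbf{y}) \approx \left\| \tilde{\mathbf{y}} - \mathbf{H}\mathbf{x} \right\|^{2} - \left( \kappa + \mathbf{b}^{H}\Delta\mathbf{y} + \Delta\mathbf{y}^{H}\mathbf{b} \right). \tag{16}$$

Corollary 2. Let $\mathbf{H} \in \mathbb{C}^{m \times n}$ and $\mathbf{y} \in \mathbb{C}^{m}$ be the estimates of the data matrix and the output vector, respectively, where $m \geq n$. Suppose there is a bounded uncertainty on the output vector $\tilde{\mathbf{y}}$, i.e., $\tilde{\mathbf{y}} = \mathbf{y} + \Delta\mathbf{y}$, $\|\Delta\mathbf{y}\| \leq \delta_Y$. Then the problem

$$\min_{\mathbf{x} \in \mathbb{C}^{n}} \; \max_{\|\Delta\mathbf{y}\| \leq \delta_Y} R(\mathbf{x}; \Delta\mathbf{y}), \tag{17}$$

where $R(\mathbf{x}; \Delta\mathbf{y})$ is defined as in (16), is equivalent to solving the following SDP problem

$$\min \gamma$$

subject to $\tau \geq 0$ and

$$\begin{bmatrix}
\gamma + \kappa - \tau & (\mathbf{y} - \mathbf{H}\mathbf{x})^{H} & \delta_Y \mathbf{b}^{H} \\
\mathbf{y} - \mathbf{H}\mathbf{x} & \mathbf{I} & -\delta_Y \mathbf{I} \\
\delta_Y \mathbf{b} & -\delta_Y \mathbf{I} & \tau \mathbf{I}
\end{bmatrix} \geq 0. \tag{18}$$

Outline of the proof of Corollary 2. The proof of Corollary 2 can be explicitly derived from the proof of Theorem 1 by simply setting $\delta_H = 0$ and $\tau_2 = 0$, hence is omitted. □

Remark 2. Corollaries 1 and 2 follow from the proof of Theorem 1, which relies on the lossless S-procedure. Under the frameworks presented in Corollaries 1 and 2, one can safely extend the same conclusions to the real case also, since the S-procedure is lossless for quadratic forms with one constraint both in complex and real spaces [36,37].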
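3.2. Unstructured robust regularized least squares estimation

In this section, we introduce a worst-case regret optimization approach to solve the regularized LS estimation problem in [32]. The regret for not using the optimal regularized LS estimator is defined by

$$R(\mathbf{x}; \Delta\mathbf{H}, \Delta\mathbf{y}) \triangleq \left\| \tilde{\mathbf{y}} - \tilde{\mathbf{H}}\mathbf{x} \right\|^{2} + \mu \|\mathbf{x}\|^{2} - \min_{\mathbf{w} \in \mathbb{C}^{n}} \left\{ \left\| \tilde{\mathbf{y}} - \tilde{\mathbf{H}}\mathbf{w} \right\|^{2} + \mu \|\mathbf{w}\|^{2} \right\}, \tag{19}$$

where $\mu > 0$ is the regularization parameter. We emphasize that there are different approaches to choose $\mu$; however, for the focus of this paper, we assume that it is already set before the optimization so that these methods can be readily incorporated in our framework. Hence, we solve the regularized LS estimation problem for an arbitrary $\mu > 0$ and note that we have already covered the $\mu = 0$ case in Section 3.1.

Similar to the previous case, we denote the estimation error of the optimal LS estimator for some estimated data matrix $\mathbf{H}$ and output vector $\mathbf{y}$ by

$$f(\mathbf{H}, \mathbf{y}) \triangleq \min_{\mathbf{w} \in \mathbb{C}^{n}} \left\{ \left\| \mathbf{y} - \mathbf{H}\mathbf{w} \right\|^{2} + \mu \|\mathbf{w}\|^{2} \right\} = \left\| \mathbf{P}^{-1/2}\mathbf{y} \right\|^{2} = \mathbf{y}^{H}\mathbf{P}^{-1}\mathbf{y},$$

where $\mathbf{P} \triangleq \mathbf{I} + \mu^{-1}\mathbf{H}\mathbf{H}^{H}$. Considering the first order Taylor series expansion based on Wirtinger calculus [33] for $f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})$ around $\tilde{\mathbf{H}} = \mathbf{H}$ and $\tilde{\mathbf{y}} = \mathbf{y}$

$$\begin{aligned}
f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}}) &\approx \kappa + 2\,\mathrm{Re}\left\{ \mathrm{Tr}\left( \nabla f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})^{H} \Big|_{\tilde{\mathbf{H}} = \mathbf{H},\, \tilde{\mathbf{y}} = \mathbf{y}} \left[ \Delta\mathbf{H} \;\; \Delta\mathbf{y} \right] \right) \right\} \\
&= \kappa + \mathbf{d}^{H}\Delta\mathbf{h} + \Delta\mathbf{h}^{H}\mathbf{d} + \mathbf{b}^{H}\Delta\mathbf{y} + \Delta\mathbf{y}^{H}\mathbf{b},
\end{aligned}$$

where $\mathbf{d} \triangleq \mathrm{vec}(\mathbf{D})$, $\Delta\mathbf{h} \triangleq \mathrm{vec}(\Delta\mathbf{H})$,

$$\mathbf{D} \triangleq \frac{\partial f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})}{\partial \tilde{\mathbf{H}}} \bigg|_{\tilde{\mathbf{H}} = \mathbf{H},\, \tilde{\mathbf{y}} = \mathbf{y}} = -\mathbf{P}^{-1}\mathbf{y}\mathbf{y}^{H}\mathbf{P}^{-1}\mathbf{H}, \tag{20}$$

and

$$\mathbf{b} \triangleq \frac{\partial f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})}{\partial \tilde{\mathbf{y}}} \bigg|_{\tilde{\mathbf{H}} = \mathbf{H},\, \tilde{\mathbf{y}} = \mathbf{y}} = \mathbf{P}^{-1}\mathbf{y},$$

where the last line follows since $\mathbf{P}$ is symmetric. Hence we can approximate the regret in (19) as follows

$$R(\mathbf{x}; \Delta\mathbf{H}, \Delta\mathbf{y}) \approx \left\| \tilde{\mathbf{y}} - \tilde{\mathbf{H}}\mathbf{x} \right\|^{2} + \mu \|\mathbf{x}\|^{2} - \left( \kappa + \mathbf{d}^{H}\Delta\mathbf{h} + \Delta\mathbf{h}^{H}\mathbf{d} + \mathbf{b}^{H}\Delta\mathbf{y} + \Delta\mathbf{y}^{H}\mathbf{b} \right), \tag{21}$$

similar to (8). In the following theorem, we illustrate how the optimization problem in (21) can be put in an SDP form.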
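Theorem 2. Let $\mathbf{H} \in \mathbb{C}^{m \times n}$ and $\mathbf{y} \in \mathbb{C}^{m}$ be the estimates of the data matrix and the output vector, respectively, both having deterministic additive perturbations $\|\Delta\mathbf{H}\| \leq \delta_H$ and $\|\Delta\mathbf{y}\| \leq \delta_Y$, respectively, i.e., $\tilde{\mathbf{H}} = \mathbf{H} + \Delta\mathbf{H}$ and $\tilde{\mathbf{y}} = \mathbf{y} + \Delta\mathbf{y}$, where $\tilde{\mathbf{H}}$ is the full rank data matrix, $\tilde{\mathbf{y}}$ is the output vector, and $m \geq n$. Then the problem

$$\min_{\mathbf{x} \in \mathbb{C}^{n}} \; \max_{\|\Delta\mathbf{H}\| \leq \delta_H,\; \|\Delta\mathbf{y}\| \leq \delta_Y} R(\mathbf{x}; \Delta\mathbf{H}, \Delta\mathbf{y}), \tag{22}$$

where $R(\mathbf{x}; \Delta\mathbf{H}, \Delta\mathbf{y})$ is defined as in (21), is equivalent to solving the following SDP problem

$$\min \gamma$$

subject to $\tau_1 \geq 0$, $\tau_2 \geq 0$, and

$$\begin{bmatrix}
\gamma + \kappa - \tau_1 - \tau_2 & (\mathbf{y} - \mathbf{H}\mathbf{x})^{H} & \mathbf{x}^{H} & \delta_Y \mathbf{b}^{H} & \delta_H \mathbf{d}^{H} \\
\mathbf{y} - \mathbf{H}\mathbf{x} & \mathbf{I} & \mathbf{0} & -\delta_Y \mathbf{I} & \delta_H \mathbf{X} \\
\mathbf{x} & \mathbf{0} & \mu \mathbf{I} & \mathbf{0} & \mathbf{0} \\
\delta_Y \mathbf{b} & -\delta_Y \mathbf{I} & \mathbf{0} & \tau_1 \mathbf{I} & \mathbf{0} \\
\delta_H \mathbf{d} & \delta_H \mathbf{X}^{H} & \mathbf{0} & \mathbf{0} & \tau_2 \mathbf{I}
\end{bmatrix} \geq 0. \tag{23}$$

Proof of Theorem 2. The proof of Theorem 2 follows similar lines to the proof of Theorem 1, hence is omitted here. □

Remark 3. Under the framework introduced in this section, one can straightforwardly obtain corollaries similar to Corollaries 1 and 2 by considering cases in which the uncertainty is either only on the data matrix or only on the output vector, i.e., the $\delta_Y = 0$ and $\delta_H = 0$ cases, respectively. The derivations follow similar lines to Corollaries 1, 2 and Theorem 2, hence are omitted. However, similar results can be readily derived from the result in Theorem 2 with suitable changes in the SDP formulations.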
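The closed form $\mathbf{y}^{H}\mathbf{P}^{-1}\mathbf{y}$ with $\mathbf{P} = \mathbf{I} + \mu^{-1}\mathbf{H}\mathbf{H}^{H}$, used in this section for the optimal regularized LS cost, can be verified numerically. The NumPy sketch below compares it with the cost obtained from the standard regularized LS solution $(\mathbf{H}^{H}\mathbf{H} + \mu\mathbf{I})^{-1}\mathbf{H}^{H}\mathbf{y}$; the dimensions, the value of $\mu$, and the random data are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical data and regularization parameter.
rng = np.random.default_rng(3)
m, n, mu = 5, 3, 0.5
H = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
y = rng.standard_normal(m) + 1j * rng.standard_normal(m)

# Regularized LS solution of min_w ||y - H w||^2 + mu ||w||^2.
w_opt = np.linalg.solve(H.conj().T @ H + mu * np.eye(n), H.conj().T @ y)
f_direct = np.linalg.norm(y - H @ w_opt) ** 2 + mu * np.linalg.norm(w_opt) ** 2

# Closed form y^H P^{-1} y with P = I + H H^H / mu.
P = np.eye(m) + (H @ H.conj().T) / mu
f_closed = np.real(np.vdot(y, np.linalg.solve(P, y)))

# The gradient with respect to the output vector is b = P^{-1} y.
b = np.linalg.solve(P, y)
```

The two values agree up to rounding, so the inner minimization in (19) can be replaced by a closed-form expression in $\mathbf{H}$ and $\mathbf{y}$ before the Taylor expansion is applied.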
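3.3. Structured robust least squares estimation

There are various communication systems where the data matrix and the perturbation on it have a special structure such as Toeplitz, Hankel, or Vandermonde [9,15]. Incorporating this prior knowledge into the estimation framework could improve the performance of the regret based minimax LS estimation approach [9,15]. Hence, in this section, we investigate a special case of the problem in (2), where the associated perturbations for the data matrix $\mathbf{H}$ and the output vector $\mathbf{y}$ have special structures. The structure on the perturbations is defined as follows

$$\Delta\mathbf{H} = \sum_{i=1}^{p} \alpha_i \mathbf{H}_i, \tag{24}$$

and

$$\Delta\mathbf{y} = \sum_{i=1}^{p} \beta_i \mathbf{y}_i, \tag{25}$$

where $\mathbf{H}_i \in \mathbb{C}^{m \times n}$, $\mathbf{y}_i \in \mathbb{C}^{m}$, and $p$ are known but $\alpha_i, \beta_i \in \mathbb{C}$, $i = 1, \ldots, p$, are unknown. However, the bounds on the norm of $\boldsymbol{\alpha} \triangleq [\alpha_1, \ldots, \alpha_p]^{H}$ and $\boldsymbol{\beta} \triangleq [\beta_1, \ldots, \beta_p]^{H}$ are provided as $\|\boldsymbol{\alpha}\| \leq \delta_\alpha$ and $\|\boldsymbol{\beta}\| \leq \delta_\beta$, where $\delta_\alpha, \delta_\beta \geq 0$. We emphasize that this formulation can represent a wide range of constraints on the structure of perturbations of the data matrix and the output vector such as Toeplitz and Hankel [9,10].

Our aim is to solve the following optimization problem

$$\min_{\mathbf{x} \in \mathbb{C}^{n}} \; \max_{\|\boldsymbol{\alpha}\| \leq \delta_\alpha,\; \|\boldsymbol{\beta}\| \leq \delta_\beta} R(\mathbf{x}; \Delta\mathbf{H}, \Delta\mathbf{y}),$$

where

$$R(\mathbf{x}; \Delta\mathbf{H}, \Delta\mathbf{y}) \triangleq \left\| \tilde{\mathbf{y}} - \tilde{\mathbf{H}}\mathbf{x} \right\|^{2} - \min_{\mathbf{w} \in \mathbb{C}^{n}} \left\| \tilde{\mathbf{y}} - \tilde{\mathbf{H}}\mathbf{w} \right\|^{2}, \tag{26}$$

$$\tilde{\mathbf{H}} \triangleq \mathbf{H} + \Delta\mathbf{H} = \mathbf{H} + \sum_{i=1}^{p} \alpha_i \mathbf{H}_i, \tag{27}$$

$$\tilde{\mathbf{y}} \triangleq \mathbf{y} + \Delta\mathbf{y} = \mathbf{y} + \sum_{i=1}^{p} \beta_i \mathbf{y}_i. \tag{28}$$

After following similar lines to Section 3.1, and introducing the first order Taylor approximation to $f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})$ around $\boldsymbol{\alpha} = \mathbf{0}$ and $\boldsymbol{\beta} = \mathbf{0}$, we obtain

$$f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}}) \approx \kappa + 2\,\mathrm{Re}\left\{ \mathrm{Tr}\left( \nabla f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})^{H} \Big|_{\boldsymbol{\alpha} = \mathbf{0},\, \boldsymbol{\beta} = \mathbf{0}} \left[ \boldsymbol{\alpha} \;\; \boldsymbol{\beta} \right] \right) \right\}, \tag{29}$$

where $f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}}) = \tilde{\mathbf{y}}^{H}\tilde{\mathbf{P}}\tilde{\mathbf{y}}$ and $\tilde{\mathbf{P}} = \mathbf{I} - \tilde{\mathbf{H}}\tilde{\mathbf{H}}^{+}$. We next introduce the following lemma to calculate the first order Taylor approximation in (29) in a closed form.

Lemma 2. Let $\tilde{\mathbf{H}} = \mathbf{H} + \Delta\mathbf{H}$ be a full rank matrix and $\tilde{\mathbf{y}} = \mathbf{y} + \Delta\mathbf{y}$, where $\tilde{\mathbf{H}} \in \mathbb{C}^{m \times n}$, $\tilde{\mathbf{y}} \in \mathbb{C}^{m}$, $\Delta\mathbf{H}$ and $\Delta\mathbf{y}$ are defined as in (24) and (25), respectively. Then denoting $f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}}) \triangleq \tilde{\mathbf{y}}^{H}\tilde{\mathbf{P}}\tilde{\mathbf{y}}$, where $\tilde{\mathbf{P}} \triangleq \mathbf{I} - \tilde{\mathbf{H}}\tilde{\mathbf{H}}^{+}$, we have

$$\frac{\partial f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})}{\partial \boldsymbol{\alpha}} \bigg|_{\boldsymbol{\alpha} = \mathbf{0},\, \boldsymbol{\beta} = \mathbf{0}} = \left[ -\mathbf{y}^{H}\mathbf{P}^{H}\mathbf{H}_1\mathbf{H}^{+}\mathbf{y}, \; \ldots, \; -\mathbf{y}^{H}\mathbf{P}^{H}\mathbf{H}_p\mathbf{H}^{+}\mathbf{y} \right]^{H}, \tag{30}$$

and

$$\frac{\partial f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})}{\partial \boldsymbol{\beta}} \bigg|_{\boldsymbol{\alpha} = \mathbf{0},\, \boldsymbol{\beta} = \mathbf{0}} = \left[ \mathbf{y}^{H}\mathbf{P}\mathbf{y}_1, \; \ldots, \; \mathbf{y}^{H}\mathbf{P}\mathbf{y}_p \right]^{H}, \tag{31}$$

where $\mathbf{P} \triangleq \mathbf{I} - \mathbf{H}\mathbf{H}^{+}$.

Proof of Lemma 2. Note that the derivative of $f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})$ is taken with respect to $[\boldsymbol{\alpha} \;\; \boldsymbol{\beta}]$, hence we can use the chain rule to calculate the derivatives by using the results we have obtained in Lemma 1. First, we consider the derivative of $f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})$ with respect to $\alpha_i$, $i = 1, \ldots, p$, i.e.,

$$\begin{aligned}
d_i \triangleq \frac{\partial f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})}{\partial \alpha_i} \bigg|_{\boldsymbol{\alpha} = \mathbf{0},\, \boldsymbol{\beta} = \mathbf{0}} &= \mathrm{Tr}\left( \left( \frac{\partial f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})}{\partial \tilde{\mathbf{H}}} \right)^{H} \frac{\partial \tilde{\mathbf{H}}}{\partial \alpha_i} \right) \bigg|_{\boldsymbol{\alpha} = \mathbf{0},\, \boldsymbol{\beta} = \mathbf{0}} \\
&= \mathrm{Tr}\left( -\mathbf{H}^{+}\mathbf{y}\mathbf{y}^{H}\mathbf{P}^{H}\mathbf{H}_i \right) \\
&= -\mathbf{y}^{H}\mathbf{P}^{H}\mathbf{H}_i\mathbf{H}^{+}\mathbf{y},
\end{aligned}$$

where the last line follows from the cyclic property of the trace operator.

Similarly, we next consider the derivative of $f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})$ with respect to $\beta_i$, $i = 1, \ldots, p$, i.e.,

$$b_i \triangleq \frac{\partial f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})}{\partial \beta_i} \bigg|_{\boldsymbol{\alpha} = \mathbf{0},\, \boldsymbol{\beta} = \mathbf{0}} = \mathrm{Tr}\left( \left( \frac{\partial f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})}{\partial \tilde{\mathbf{y}}} \right)^{H} \frac{\partial \tilde{\mathbf{y}}}{\partial \beta_i} \right) \bigg|_{\boldsymbol{\alpha} = \mathbf{0},\, \boldsymbol{\beta} = \mathbf{0}} = \mathbf{y}^{H}\mathbf{P}\mathbf{y}_i.$$

This concludes the proof of Lemma 2. □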
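The scalar derivatives $d_i$ and $b_i$ from the proof of Lemma 2 can be checked in the same way as Lemma 1. The NumPy sketch below uses a hypothetical Toeplitz structure in which each $\mathbf{H}_i$ has ones on a single diagonal, together with randomly drawn $\mathbf{y}_i$; these basis choices, the function names, and the step sizes are assumptions. The code works directly with the scalars $d_i$ and $b_i$, so the first order change is evaluated as $2\,\mathrm{Re}\{\sum_i d_i\alpha_i + \sum_i b_i\beta_i\}$.

```python
import numpy as np

def f(H, y):
    """Optimal LS residual y^H P y with P = I - H H^+."""
    P = np.eye(H.shape[0]) - H @ np.linalg.pinv(H)
    return np.real(np.vdot(y, P @ y))

def lemma2_entries(H, y, H_basis, y_basis):
    """Scalars d_i = -y^H P^H H_i H^+ y and b_i = y^H P y_i."""
    H_pinv = np.linalg.pinv(H)
    P = np.eye(H.shape[0]) - H @ H_pinv
    r, w = P @ y, H_pinv @ y
    d = np.array([-np.vdot(r, Hi @ w) for Hi in H_basis])
    b = np.array([np.vdot(r, yi) for yi in y_basis])
    return d, b

# Hypothetical structure: one Toeplitz basis matrix per diagonal.
rng = np.random.default_rng(4)
m, n = 6, 3
p = m + n - 1
H_basis = [np.eye(m, n, k) for k in range(-(m - 1), n)]
y_basis = [rng.standard_normal(m) + 1j * rng.standard_normal(m) for _ in range(p)]
H = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
y = rng.standard_normal(m) + 1j * rng.standard_normal(m)

alpha0 = rng.standard_normal(p) + 1j * rng.standard_normal(p)
alpha0 /= np.linalg.norm(alpha0)
beta0 = rng.standard_normal(p) + 1j * rng.standard_normal(p)
beta0 /= np.linalg.norm(beta0)
d, b = lemma2_entries(H, y, H_basis, y_basis)

def approx_error(eps):
    dH = sum(eps * a * Hi for a, Hi in zip(alpha0, H_basis))   # structure (24)
    dy = sum(eps * c * yi for c, yi in zip(beta0, y_basis))    # structure (25)
    linear = np.sum(d * eps * alpha0) + np.sum(b * eps * beta0)
    return abs(f(H + dH, y + dy) - (f(H, y) + 2 * np.real(linear)))

err_big, err_small = approx_error(1e-2), approx_error(1e-3)
```

As in the unstructured case, the error of the first order model drops by roughly two orders of magnitude when the coefficients shrink by one order.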
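Now turning our attention back to (29), we denote

$$\mathbf{d} \triangleq \frac{\partial f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})}{\partial \boldsymbol{\alpha}} \bigg|_{\boldsymbol{\alpha} = \mathbf{0},\, \boldsymbol{\beta} = \mathbf{0}}, \quad \text{and} \quad \mathbf{b} \triangleq \frac{\partial f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}})}{\partial \boldsymbol{\beta}} \bigg|_{\boldsymbol{\alpha} = \mathbf{0},\, \boldsymbol{\beta} = \mathbf{0}},$$

where we emphasize that the closed form definitions of $\mathbf{d}$ and $\mathbf{b}$ can be obtained from Lemma 2. We then approximate (29) and obtain the first order Taylor approximation as follows

$$f(\tilde{\mathbf{H}}, \tilde{\mathbf{y}}) \approx \kappa + \mathbf{d}^{H}\boldsymbol{\alpha} + \boldsymbol{\alpha}^{H}\mathbf{d} + \mathbf{b}^{H}\boldsymbol{\beta} + \boldsymbol{\beta}^{H}\mathbf{b}.$$

Therefore, we can approximate the regret in (26) as follows

$$R(\mathbf{x}; \Delta\mathbf{H}, \Delta\mathbf{y}) \approx \left\| \tilde{\mathbf{y}} - \tilde{\mathbf{H}}\mathbf{x} \right\|^{2} - \left( \kappa + \mathbf{d}^{H}\boldsymbol{\alpha} + \boldsymbol{\alpha}^{H}\mathbf{d} + \mathbf{b}^{H}\boldsymbol{\beta} + \boldsymbol{\beta}^{H}\mathbf{b} \right). \tag{32}$$

In the following theorem, we illustrate how the optimization problem in (32) can be put in an SDP form.

Theorem 3. Let $\mathbf{H}, \mathbf{H}_1, \ldots, \mathbf{H}_p \in \mathbb{C}^{m \times n}$, $\mathbf{y}, \mathbf{y}_1, \ldots, \mathbf{y}_p \in \mathbb{C}^{m}$, $\delta_\alpha, \delta_\beta \geq 0$, $m \geq n$, where $\tilde{\mathbf{H}}$ is the full rank data matrix defined as in (27), $\tilde{\mathbf{y}}$ is the output vector defined as in (28), with the corresponding estimates $\mathbf{H}$ and $\mathbf{y}$, respectively. Then the problem

$$\min_{\mathbf{x} \in \mathbb{C}^{n}} \; \max_{\|\boldsymbol{\alpha}\| \leq \delta_\alpha,\; \|\boldsymbol{\beta}\| \leq \delta_\beta} R(\mathbf{x}; \Delta\mathbf{H}, \Delta\mathbf{y}), \tag{33}$$

where $R(\mathbf{x}; \Delta\mathbf{H}, \Delta\mathbf{y})$ is defined as in (32), is equivalent to solving the following SDP problem

$$\min \gamma$$

subject to $\tau_1 \geq 0$, $\tau_2 \geq 0$, and

$$\begin{bmatrix}
\gamma + \kappa - \tau_1 - \tau_2 & (\mathbf{y} - \mathbf{H}\mathbf{x})^{H} & \delta_\alpha \mathbf{d}^{H} & \delta_\beta \mathbf{b}^{H} \\
\mathbf{y} - \mathbf{H}\mathbf{x} & \mathbf{I} & -\delta_\alpha \mathbf{G} & \delta_\beta \mathbf{Q} \\
\delta_\alpha \mathbf{d} & -\delta_\alpha \mathbf{G}^{H} & \tau_1 \mathbf{I} & \mathbf{0} \\
\delta_\beta \mathbf{b} & \delta_\beta \mathbf{Q}^{H} & \mathbf{0} & \tau_2 \mathbf{I}
\end{bmatrix} \geq 0, \tag{34}$$

where $\mathbf{G} \triangleq [\mathbf{H}_1\mathbf{x}, \ldots, \mathbf{H}_p\mathbf{x}]$ and $\mathbf{Q} \triangleq [\mathbf{y}_1, \ldots, \mathbf{y}_p]$.

Proof of Theorem 3. The proof of Theorem 3 follows similar lines to the proof of Theorem 1, hence is omitted here. □

Remark 4. Under the framework introduced in this section, one can straightforwardly obtain corollaries similar to Corollaries 1 and 2 by considering cases in which the uncertainty is either only on the data matrix or only on the output vector, i.e., the $\delta_\beta = 0$ and $\delta_\alpha = 0$ cases, respectively. The derivations follow similar lines to Corollaries 1, 2 and Theorem 3, hence are omitted. However, similar results can be readily derived from the result in Theorem 3 with suitable changes in the SDP formulations.

Remark 5. The proofs of Theorem 2 and Theorem 3 follow from the results of Theorem 1, which relies on the lossless S-procedure. The S-procedure is lossless with two constraints when the corresponding quadratic (Hermitian) forms are defined on a complex linear space [34]. However, the classical S-procedure for quadratic forms is, in general, lossy with two constraints in the real case [35]. Hence, Theorem 2 and Theorem 3 cannot be extended to real linear spaces. On the other hand, under the frameworks described in Remark 3 and Remark 4, one can safely extend the same conclusions to the real case also, since the S-procedure is lossless for quadratic forms with one constraint both in complex and real spaces [36,37].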
Fig. 1. Sorted residual errors for the rgrt-LS, rbst-LS, LS, and TLS estimators over 1000 trials when $\delta_H = \delta_Y = 1.2$, $m = 5$, and $n = 3$.
4. Simulations

We provide numerical examples in different scenarios in order to illustrate the merits of the proposed algorithms. In the first set of the experiments, we randomly generate a data matrix of size $m \times n$ and an output vector of size $m \times 1$, which are normalized to have unit norms. Then, we generate 1000 random perturbations $\Delta\mathbf{H}$, $\Delta\mathbf{y}$, where $\|\Delta\mathbf{H}\| \leq \delta_H$, $\|\Delta\mathbf{y}\| \leq \delta_Y$, $m = 5$, $n = 3$, and $\delta_H = \delta_Y = 1.2$. Here, we label the algorithm in Theorem 1 as “rgrt-LS”, the robust LS algorithm of [9] as “rbst-LS”, the total LS algorithm [9] as “TLS”, and finally the LS algorithm tuned to the estimates of the data matrix and the output vector as “LS”, where we directly use $\hat{\mathbf{x}} = \mathbf{H}^{+}\mathbf{y}$. For each algorithm and for each random perturbation, we find the corresponding $\hat{\mathbf{x}}$ and calculate the error $\|\tilde{\mathbf{H}}\hat{\mathbf{x}} - \tilde{\mathbf{y}}\|^{2}$. After we calculate the errors for each algorithm and for all random perturbations, we plot the corresponding sorted errors in ascending order in Fig. 1 for 1000 perturbations. Since the rbst-LS algorithm optimizes the worst-case residual error with respect to the worst possible disturbance, it usually yields the smallest worst-case residual error among all algorithms for these simulations. On the other hand, since the LS algorithm directly uses the estimates, it usually yields the smallest residual error when the perturbations on the data matrix and the output vector are small.

These results can be observed in Fig. 1, where in one extreme, the largest residual errors are observed as 2.9762 for the TLS estimator, 2.2557 for the LS estimator, 1.9275 for the rbst-LS estimator, and 1.9325 for the rgrt-LS estimator. In the other extreme, i.e., when there is almost no perturbation, the smallest estimation errors are observed as 0.3035 for the LS estimator, 0.4036 for the TLS estimator, 0.8727 for the rbst-LS estimator, and 0.6387 for the rgrt-LS estimator. While the LS estimator can be preferable when the perturbations are relatively small and the rbst-LS estimator can be preferable when the perturbations are significantly higher, the introduced algorithm provides a tradeoff between these algorithms and achieves a significantly smaller average error. The average residual error of the rgrt-LS estimator is observed as 1.1928, whereas this value is 1.2180 for the LS estimator, 1.2708 for the rbst-LS estimator, and 1.3826 for the TLS estimator. Hence, the rgrt-LS estimator is not only robust but also efficient in terms of the average error performance compared to its well-known alternatives. Owing to the competitive formulation of our estimators, we achieve such average performance gains especially when the perturbations are moderate.

Fig. 2. Averaged residual errors for the rgrt-LS, rbst-LS, LS, and TLS estimators over 2000 trials for $m = 5$ and $n = 3$, when $\delta = \delta_H = \delta_Y \in [0.5, 1]$.

Fig. 3. Averaged residual errors for the rgrt-LS, rbst-LS, LS, and TLS estimators over 2000 trials for $m = 5$ and $n = 3$, when $\delta_H \in [0.5, 1]$ and $\delta_Y = 1$.
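The Monte Carlo procedure of the first experiment can be sketched for the “LS” baseline alone, since it needs no SDP solver. The NumPy code below is only an illustration under stated assumptions: the text does not specify how the data and the perturbations are drawn, so complex Gaussian directions with uniformly distributed magnitudes inside the bounds are assumed here, and the matrix is normalized in the spectral norm. The resulting numbers are therefore not expected to reproduce the values reported for Fig. 1.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, delta_H, delta_Y, trials = 5, 3, 1.2, 1.2, 1000

def crandn(*shape):
    """Complex standard normal samples (assumed distribution)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Nominal data, normalized to unit spectral norm and unit Euclidean norm.
H = crandn(m, n)
H /= np.linalg.norm(H, 2)
y = crandn(m)
y /= np.linalg.norm(y)

x_ls = np.linalg.pinv(H) @ y                      # "LS" baseline: x_hat = H^+ y

errors = np.empty(trials)
for t in range(trials):
    dH = crandn(m, n)
    dH *= delta_H * rng.uniform() / np.linalg.norm(dH, 2)   # ||dH|| <= delta_H
    dy = crandn(m)
    dy *= delta_Y * rng.uniform() / np.linalg.norm(dy)      # ||dy|| <= delta_Y
    errors[t] = np.linalg.norm((H + dH) @ x_ls - (y + dy)) ** 2

sorted_errors = np.sort(errors)                   # ascending, as plotted in Fig. 1
average_error = errors.mean()

# Upper bound on any residual inside the uncertainty region (triangle inequality).
worst_case = (np.linalg.norm(y - H @ x_ls)
              + delta_H * np.linalg.norm(x_ls) + delta_Y) ** 2
```

Other estimators would be evaluated by replacing `x_ls` with their own estimate and reusing the same perturbation draws, which keeps the sorted error curves comparable.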
In the second set of experiments, we illustrate the perfor-mancesofthe proposedalgorithms undervarious
δ
H andδ
Y val-ues. For theseexperiments, we generate 2000 random perturba-tionsH,
y, where
H
≤ δ
H,y
≤ δ
Y, m=
5, n=
3 for differentperturbationboundsandcomputetheaveragederrorover 2000 trialsforthergrt-LS,LS,rbst-LS,andTLSalgorithms.InFig. 2, wepresenttheaveragedresidualerrorsofthesealgorithmsfor dif-ferent values of perturbation bounds, i.e.,δ
= δ
H= δ
Y∈ [
0.
5,
1]
. Weobservethattheproposedrgrt-LSalgorithmhasthebest aver-ageresidualerrorperformanceoverdifferentperturbationbounds compared tothe LS, therbst-LS andtheTLS algorithms. Further-more,inFig. 3andFig. 4,wepresenttheaveragedresidualerrors of these algorithms for different perturbation bounds, i.e., whenδ
H= δ
Y.Particularly,inFig. 3,we setδ
H∈ [
0.
5,
1]
,δ
Y=
1 andinFig. 4,weset
δ
H=
1,δ
Y∈ [
0.
5,
1]
.As can be observed from Fig. 2, as the perturbation bounds increase, the performancesof the LSandthe TLS estimators sig-nificantly deteriorate, whereas the rgrt-LS estimator provides an excellent performance.Theresidualerroroftherbst-LSestimator, on the other hand, slightlyincreases asthe perturbation bounds increase,i.e.,itisthemostrobustalgorithmagainstthe
perturba-Fig. 4. Averagedresidualerrorsforthergrt-LS,rbst-LS,LS,andTLSestimatorsover 2000 trialsform=5 andn=3,whenδH=1 andδY∈ [0.5,1].
Fig. 5. Sortedresidualerrorsforthestr-rgrt-LS,str-rbst-LS,SLS-BDU,andLS estima-torsover1000 trialswhenδH= δY=0.75,m=5,andn=3.
tions dueto its highly conservative nature. Yet, the performance ofthis estimatoris significantly inferior to the rgrt-LS estimator. Furthermore,the rgrt-LSestimatorprovidesthebest performance under different
$\delta_H$ and $\delta_Y$ values. Particularly, in Fig. 3, we observe a behavior similar to the one in Fig. 2, where our algorithm provides a robust performance while also providing the smallest residual error (especially for high $\delta_H$). On the other hand, in Fig. 4, we observe that the performance of the rgrt-LS estimator is less sensitive to the changes in $\delta_Y$ compared to the rbst-LS, LS, and TLS estimators.
In the next experiment, we examine a system identification problem [15], which can be formulated as $H_0 x = y_0$, where $H = H_0 + W$ is the observed noisy Toeplitz matrix and $y = y_0 + w$ is the observed noisy output vector. Here, the convolution matrix $H$ (which is Toeplitz) is constructed from $h$, which is selected as a random sequence of $\pm 1$'s. We then generate 1000 random structured perturbations for $H_0$ and $y_0$, where $\|\alpha\| \le 0.75\,\|H_0\|$, and plot the sorted estimation errors in ascending order in Fig. 5. The average residual errors are observed as 1.1155 for the structured regret LS estimator "str-rgrt-LS" of Remark 4, 1.1807 for the structured robust LS algorithm "str-rbst-LS", 1.1138 for the LS estimator, and 1.2576 for the structured least squares bounded data uncertainties estimator "SLS-BDU" of [15]. Therefore, we observe that the str-rgrt-LS algorithm yields a smaller average residual error than the other robust estimators and closely matches the average performance of the LS estimator. In addition, the maximum residual error is 1.5554 for the str-rgrt-LS estimator, whereas it is 1.6659 for the LS estimator. Hence, the introduced algorithm can be used to obtain robustness without significant losses in the average estimation performance, unlike the conventional robust estimation methods. Nevertheless, we emphasize that for a structured system, the performance of these algorithms is highly sensitive to the structures of the matrices and the vectors. If the perturbation bound is quite high, the robustness may not be preserved.
Fig. 6. Sorted residual errors for the rgrt-reg-LS, rbst-reg-LS, and LS estimators over 1000 trials when $\delta_H = \delta_Y = 0.65$, $\mu = 0.5$, $m = 3$, and $n = 2$.
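The structured setup described above can be illustrated with a minimal sketch, which is not the authors' implementation. It builds a Toeplitz convolution matrix from a random $\pm 1$ sequence, draws structure-preserving perturbations, and sorts the residual errors of the plain LS estimate only. The helper names (`convolution_matrix`, `random_in_ball`, `sorted_ls_residuals`), the use of the spectral norm for the bound, the reuse of the same bound for the output perturbation, and the residual criterion $\|(H + \Delta H)\hat{x} - (y + \Delta y)\|$ are all assumptions made for illustration.

```python
import numpy as np

def convolution_matrix(h, n):
    """Toeplitz matrix T of shape (len(h) + n - 1, n) with T @ x == np.convolve(h, x)."""
    h = np.asarray(h, dtype=float)
    T = np.zeros((h.size + n - 1, n))
    for j in range(n):
        T[j:j + h.size, j] = h  # column j holds h shifted down by j rows
    return T

def random_in_ball(rng, dim, radius):
    """Random vector with Euclidean norm at most `radius` (not uniform in the ball)."""
    v = rng.standard_normal(dim)
    return v * (radius * rng.uniform() / np.linalg.norm(v))

def sorted_ls_residuals(trials=1000, m=5, n=3, scale=0.75, seed=0):
    """Sorted residual errors of the LS estimate under structured perturbations."""
    rng = np.random.default_rng(seed)
    taps = m - n + 1                          # filter length giving an m x n matrix
    h = rng.choice([-1.0, 1.0], size=taps)    # random +/-1 sequence
    H = convolution_matrix(h, n)
    y = H @ rng.standard_normal(n)            # assumed: output of an arbitrary input x
    x_ls, *_ = np.linalg.lstsq(H, y, rcond=None)
    bound = scale * np.linalg.norm(H, 2)      # assumed: bound relative to spectral norm
    errors = np.empty(trials)
    for t in range(trials):
        alpha = random_in_ball(rng, taps, bound)
        dH = convolution_matrix(alpha, n)     # perturbation keeps the Toeplitz structure
        dy = random_in_ball(rng, m, bound)    # assumed: same bound for the output
        errors[t] = np.linalg.norm((H + dH) @ x_ls - (y + dy))
    return np.sort(errors)

errors = sorted_ls_residuals()
print(errors.mean(), errors.max())
```

Perturbing the filter taps rather than the matrix entries is what keeps every perturbed matrix Toeplitz; an unstructured perturbation of the same norm would generally leave this class.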
In the fourth experiment, i.e., in Fig. 6, we provide the errors sorted in ascending order for the algorithm in Theorem 2 as "rgrt-reg-LS", for the robust regularized LS algorithm in [16] as "rbst-reg-LS", and finally for the regularized LS algorithm [10] as "reg-LS", where the experiment setup is the same as in the first experiment except that the perturbation bounds are set to 0.65 and the regularization parameter is chosen as $\mu = 0.5$. In Fig. 6, we observe the tradeoff between robustness and performance (between the rbst-reg-LS and the reg-LS algorithms) offered by the introduced rgrt-reg-LS algorithm. When there are small perturbations on the data matrix and the output vector, i.e., in the best-case scenario, the residual error of the reg-LS estimator is 0.1045, whereas it is 0.2416 for the rgrt-reg-LS estimator and 0.4282 for the rbst-reg-LS estimator. As can be observed from Fig. 6, for higher perturbations, the performance of the reg-LS estimator significantly deteriorates, whereas the rgrt-reg-LS and rbst-reg-LS algorithms provide a robust performance. On the other hand, the rgrt-reg-LS estimator significantly outperforms the rbst-reg-LS estimator in terms of the average error performance and achieves an even more desirable error performance than the reg-LS estimator. The average residual errors are calculated as 0.9059 for the rgrt-reg-LS estimator, 0.9177 for the reg-LS estimator, and 1.0316 for the rbst-reg-LS estimator. This experiment illustrates the sensitivity of the reg-LS estimator to the perturbations. On the other hand, the rgrt-reg-LS and rbst-reg-LS estimators provide more robust performances compared to the reg-LS estimator. Yet, the highly pessimistic nature of the rbst-reg-LS estimator deteriorates its estimation performance and yields an unacceptable performance. Our algorithm, on the other hand, not only yields a robust performance compared to the reg-LS estimator but also does not cause any average performance degradation, unlike the conventional robust estimation methods.
Fig. 7. BER performances of the rgrt-LS, rbst-LS, and TLS estimators (equalizers) over 1 000 000 trials under various SNRs, when $m = 3$ and $n = 2$.
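As a rough illustration of how such sorted error curves can be produced, the following sketch evaluates the regularized residual $\|(H + \Delta H)\hat{x} - (y + \Delta y)\|^2 + \mu \|\hat{x}\|^2$ over random bounded perturbations for the LS and reg-LS estimates. It covers only these two baselines, not the SDP-based rgrt-reg-LS or rbst-reg-LS estimators. The function names, the Gaussian nominal data, the fixed nominal pair $(H, y)$ across trials, and the choice of spectral and Euclidean norms for the bounds are assumptions; the dimensions follow the caption of Fig. 6.

```python
import numpy as np

def bounded_matrix(rng, shape, bound):
    """Random matrix whose spectral norm is at most `bound`."""
    M = rng.standard_normal(shape)
    return M * (bound * rng.uniform() / np.linalg.norm(M, 2))

def bounded_vector(rng, dim, bound):
    """Random vector whose Euclidean norm is at most `bound`."""
    v = rng.standard_normal(dim)
    return v * (bound * rng.uniform() / np.linalg.norm(v))

def reg_ls(H, y, mu):
    """Regularized LS estimate (H^T H + mu I)^{-1} H^T y; mu = 0 gives plain LS."""
    n = H.shape[1]
    return np.linalg.solve(H.T @ H + mu * np.eye(n), H.T @ y)

def sorted_errors(mu_est, mu=0.5, delta=0.65, m=3, n=2, trials=1000, seed=0):
    """Sorted regularized residuals of the estimate computed with `mu_est`."""
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((m, n))           # assumed nominal data matrix
    y = rng.standard_normal(m)                # assumed nominal output vector
    x_hat = reg_ls(H, y, mu_est)              # estimator only sees nominal data
    errors = np.empty(trials)
    for t in range(trials):
        dH = bounded_matrix(rng, (m, n), delta)
        dy = bounded_vector(rng, m, delta)
        r = (H + dH) @ x_hat - (y + dy)
        errors[t] = r @ r + mu * (x_hat @ x_hat)
    return np.sort(errors)

ls_errors = sorted_errors(mu_est=0.0)         # plain LS estimate
reg_errors = sorted_errors(mu_est=0.5)        # reg-LS estimate
print(ls_errors.mean(), reg_errors.mean())
```

Because both calls share the same seed, the two estimators are scored on identical nominal data and identical perturbations, so differences between the curves come from the estimators alone.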
Finally, we illustrate possible applications of our algorithm in different frameworks. Particularly, we consider the channel equalization problem and illustrate the bit error rate (BER) performance of our algorithm with respect to its well-known alternatives in the literature as follows.
In these simulations, we define the signal-to-noise ratio (SNR) as follows
$$\mathrm{SNR} = 20 \log \frac{\|x\|}{\delta},$$
where $\|H\| = 1$ and $\log(\cdot)$ is the common (i.e., base 10) logarithm. For a given SNR, we randomly generate 1 000 000 symbol vectors $x$ (of length 2) from a binary alphabet and 1 000 000 estimates of the (MIMO) channel matrix $H$ (of size $3 \times 2$), both having unit norms. For every pair of symbol vector and channel estimate, we randomly generate perturbations $\Delta H$ and $\Delta y$, calculate the corresponding perturbed output vector, and feed this information to the algorithms. We quantize the estimate of the symbol vector $\hat{x}$ and count the incorrect bits to obtain the BER (i.e., we consider the BER rather than the symbol error rate).
In Fig. 7, we provide the BERs for various SNRs. We observe that the proposed algorithm outperforms its competitors in terms of equalization performance and successfully reconstructs the transmitted bits. While Fig. 7 illustrates the BER of the proposed algorithms averaged over a huge number of channel uses, we also illustrate the robustness of our algorithm over a small number of channel uses in Fig. 8 and Fig. 9. In these experiments, we perform 100 independent trials, in each of which 10 000 symbol vectors and channel matrix estimates are generated and sent over the channel as in the previous experiment, for SNR = 20 and SNR = 25, respectively.
In Fig. 8 and Fig. 9, we observe that our algorithm not only provides a superior averaged performance with respect to its well-known alternatives but also provides a robust performance. The conventional robust LS estimators provide unsatisfactory results since these algorithms adapt themselves to the worst-case scenario. However, the rgrt-LS estimator has a significantly smaller BER compared to the rbst-LS and TLS estimators, since our algorithm does not tune itself to the worst possible perturbation but considers the worst possible regret. Particularly, when the perturbations on the estimates are relatively small, our algorithm provides significant performance improvements compared to the conventional methods, as can be seen in Fig. 8 and Fig. 9.
Fig. 8. Sorted BERs for the rgrt-LS, rbst-LS, and TLS estimators (equalizers) over 100 trials, where in each trial 10 000 symbol vectors are sent for SNR = 20, $m = 3$, and $n = 2$.
Fig. 9. Sorted BERs for the rgrt-LS, rbst-LS, and TLS estimators (equalizers) over 100 trials, where in each trial 10 000 symbol vectors are sent for SNR = 25, $m = 3$, and $n = 2$.
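The simulation loop described above can be sketched as follows, using a plain LS equalizer as a stand-in for the compared estimators; it is not the authors' implementation. The sketch assumes that the SNR definition reads $\mathrm{SNR} = 20 \log_{10}(\|x\| / \delta)$, that $\delta_H = \delta_Y = \delta$, that the channel estimate is normalized in the spectral norm, and that the received vector is $(H + \Delta H)x + \Delta y$. The names `delta_from_snr` and `ls_equalizer_ber` are hypothetical.

```python
import numpy as np

def delta_from_snr(snr_db, x_norm=1.0):
    """Invert SNR = 20 log10(||x|| / delta) for the perturbation bound delta."""
    return x_norm * 10.0 ** (-snr_db / 20.0)

def ls_equalizer_ber(snr_db, trials=20000, m=3, n=2, seed=0):
    """Bit error rate of an LS equalizer that only knows the channel estimate H."""
    rng = np.random.default_rng(seed)
    delta = delta_from_snr(snr_db)            # symbol vectors are scaled to unit norm
    bit_errors = 0
    for _ in range(trials):
        bits = rng.choice([-1.0, 1.0], size=n)        # binary alphabet
        x = bits / np.linalg.norm(bits)               # unit-norm symbol vector
        H = rng.standard_normal((m, n))
        H /= np.linalg.norm(H, 2)                     # unit-norm channel estimate
        dH = rng.standard_normal((m, n))
        dH *= delta * rng.uniform() / np.linalg.norm(dH, 2)   # ||dH|| <= delta
        dy = rng.standard_normal(m)
        dy *= delta * rng.uniform() / np.linalg.norm(dy)      # ||dy|| <= delta
        y = (H + dH) @ x + dy                         # perturbed output vector
        x_hat, *_ = np.linalg.lstsq(H, y, rcond=None)
        bit_errors += int(np.sum(np.sign(x_hat) != bits))     # quantize and compare
    return bit_errors / (trials * n)

for snr in (10.0, 20.0, 25.0):
    print(snr, ls_equalizer_ber(snr, trials=5000))
```

Replacing the `np.linalg.lstsq` call with another estimator that also receives `delta` would give the corresponding curve for that estimator under the same assumptions.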
5. Conclusion
In this paper, we introduce a robust approach to LS estimation problems under bounded data uncertainties based on a novel regret formulation. We study the robust LS estimation problems in the presence of unstructured and structured perturbations under residual and regularized residual error criteria. In all cases, the data vectors that minimize the worst-case regrets are found by solving certain SDP problems. In our simulations, we observed that the proposed estimation methods provide an efficient tradeoff between the performance and robustness. Owing to the regret-based formulation of the proposed method, we obtain significant improvements in terms of the average estimation performance with respect to the conventional robust minimax estimation methods, while maintaining the robustness as shown in our experiments.