
Computer Networks

Journal homepage: www.elsevier.com/locate/comnet

An intelligent cyber security system against DDoS attacks in SIP networks

Murat Semerci^a,∗, Ali Taylan Cemgil^a, Bülent Sankur^b

a Department of Computer Engineering, Bogazici University, Bebek, Istanbul 34342, Turkey

b Department of Electrical and Electronics Engineering, Bogazici University, Bebek, Istanbul 34342, Turkey

Article info

Article history: Received 26 September 2017; Revised 11 January 2018; Accepted 25 February 2018; Available online 7 March 2018

Keywords: Anomaly detection; Malicious user detection; DDoS; Mahalanobis distances; Sequence alignment kernel

Abstract

Distributed Denial of Service (DDoS) attacks are among the most encountered cyber criminal activities in communication networks and can result in considerable financial and prestige losses for corporations or governmental organizations. Therefore, autonomous detection of a DDoS attack and identification of its sources is essential for taking counter-measures. This study proposes an intelligent security system against DDoS attacks in communication networks that is composed of two components: a monitor for detection of DDoS attacks and a discriminator for detection of users in the system with malicious intents. A novel adaptive real-time change-point model that tracks the changes in Mahalanobis distances between sampled feature vectors in the monitored system accounts for possible DDoS attacks. A clustering model that runs over the similarity scores of behavioral patterns between the users is used to segregate the malicious from the innocent. The proposed model is deployed over a simulated telephone network that uses a Session Initiation Protocol (SIP) server. The performance of the models is evaluated on data generated by this high-throughput simulation environment.

1. Introduction

Distributed Denial of Service (DDoS) attacks are one of the major cyber threats on communications networks. DDoS attacks occur very frequently because they are fairly simple and cheap to initiate, while their broad impact on users and service providers can potentially be severe. Such an attack incapacitates the victim server and renders it unable to provide services at all or at desired quality of service levels to its subscribers. With the cost-effective deployment of cloud systems, DDoS attacks might affect the overall availability of the services by targeting more than one server [1]. They can even be a tool for political struggle on a grander scale; a case in point is the set of DDoS attacks on Turkey’s domain name servers by hacktivist groups in December 2015 [2]. As a more radical case, they can be exerted over smart power transmission grids, with potentially more catastrophic consequences [3]. Therefore, automatic detection of DDoS attacks and identification of malicious users are crucial in protecting the network entities and for non-degraded service continuity.

Telephone service providers follow the trend of changing their circuit-switched networks to packet-switched ones in view of the cost-effectiveness and maturity of the Voice-over-IP (VoIP) technology. The most popular protocol for control signaling between communicating parties in VoIP is currently the Session Initiation Protocol (SIP) [4]. SIP is based on a simple, HTTP-like, text-based request-response transaction model. It provides basic signaling functionalities required for registering clients, checking their presence and on-line availability, exchanging their communication capabilities, and overall managing the sessions. With the deployment of 5G, VoIP is expected to be one of the major instruments for multimedia communication. The wide deployment of VoIP networks and the key importance of telephone networks have made the security issues of SIP servers extremely important.

∗ Corresponding author. E-mail address: murat.semerci@boun.edu.tr (M. Semerci).

VoIP networks are under a variety of cyber threats and the intensity of attacks seems only to be growing [5]. The attacks can be motivated by potential financial benefits, such as pilfering call charges or causing data leakage masqueraded as a stealth threat. Conversely, they may be part of a plan to cause financial losses to the service providers via heavy service disruption [6].

In this paper we introduce a novel real-time online intrusion detection and prevention system for communication networks, particularly for networks with SIP traffic. The proposed system both detects the presence of an attack and identifies the attackers. The system focuses on the DDoS attacks that flood and suffocate a server with an excessive amount of requests. One clue for the occurrence of a DDoS attack is a marked change in the messaging traffic patterns in the network. To this effect, we develop a change detection algorithm that monitors the network traffic intensities at the server side. Significant changes in the characteristics of messaging flows are interpreted as the onset or offset of a potential DDoS attack. We assume tacitly that in a DDoS attack, the attackers are always acting in a coordinated manner.

https://doi.org/10.1016/j.comnet.2018.02.025

A novel aspect of the proposed change-point detection method is that it relies on the adaptive tracking of Mahalanobis distances between successive state vectors as a way to monitor abnormal changes in messaging traffic. This enables the monitor to adapt itself to the normal traffic regime and/or to the diurnal or seasonal variations while at the same time remaining sensitive to abnormal changes. One advantage of our method is that it is model-free, that is, it is an unsupervised approach to detect traffic anomalies. The system makes use only of the observed messaging traffic type and intensity, and does not require any additional information such as tracebacks. An abnormal change in the traffic regime is declared if the Mahalanobis distance sequence of the state vectors in successive time windows exceeds a threshold function. This threshold value can be set to a constant as a function of system parameters or can be adaptively set. A preliminary version of the Mahalanobis-based anomaly detection algorithm was presented in a conference paper [7]. The first part of this paper presents an extension of the model with an adaptive thresholding function. Notice that the attacker identification, as described in the second part of the paper, was not part of the conference paper. The second novelty of our study is that the algorithm, besides detecting the occurrence of an attack, can also pinpoint the set of attackers. In other words, under certain realistic assumptions, it can discriminate between messaging patterns of the attackers and those of the non-malicious, i.e., normal users. Similarly, the attacker identification model runs in an unsupervised mode and is independent of the underlying attack model except for the assumption of attacker coordination. Performance results of the algorithm are studied under extensive network traffic and attack traffic simulations.

In Section 2, we give a brief overview of cyber threats related to SIP and proposed remedies for them. In Section 3 we define the variables and symbols used to describe the time series corresponding to the messaging history of the users and the state of the system. In Section 4, we introduce our change point monitor based on Mahalanobis distances as an instrument to detect (D)DoS attacks. The method for normal versus malicious user discrimination is detailed in Section 5. The performance of the proposed methods is evaluated using simulation data and compared against those of competitor algorithms in Section 6. Finally, conclusions are drawn in Section 7, where we also discuss the future work in the context of IoT.

2. Literature review

In addition to the session layer attacks, telecommunication networks are also susceptible to a plethora of other threats below the session layer [8]. Since these are discussed in detail elsewhere, in this work, we focus solely on SIP-specific threats.

SIP attacks typically exploit vulnerabilities in the SIP protocol. Signature-based attacks utilize properties of the SIP grammar, and can be detected by pattern matching between ongoing traffic and the set of signatures. In other words, this type of attack can be determined or even prevented by inspecting the steps that the attacker must follow through. The non-signature based threats, e.g., behavior-based attacks such as DDoS, are harder to detect. SIP threats can be roughly categorized into 4 groups [8]:

• Service Abuse Threats: These attacks include commercial abuse of services to gain some financial benefit such as toll fraud or billing avoidance.

• Eavesdropping, Interception and Modification Threats: These attacks concentrate on illegally intervening in the call with the goal of capturing sensitive information.

• Social Threats: These attacks use protocol shortcomings, misconfigurations or implementation bugs of SIP server implementations and use these weaknesses to misrepresent the identity of malicious parties to the subscribers.

• (Distributed) Denial of Service ((D)DoS): These attacks focus on the SIP server to prevent it from giving service to the subscribers or to cause significant degradation in the quality of network services. An attacker can achieve this by flooding the server with SIP messages and depleting the network and server resources, such as CPU, memory, and bandwidth. In the DoS attack, only one machine is involved to mount the attack on the SIP server. If the attacks are simultaneously performed by many machines, possibly coordinated, the attack becomes a DDoS attack. The botnet attack, where the attack is staged by many zombie machines that are controlled by a master node, is a well-known instance of DDoS.

There is a large variety of possible DDoS attacks, such as the Domain Name Server (DNS) attack and the fuzzing attack [9,10]. The DNS flooding attack wastes the bandwidth resources by injecting fake addresses, tying up the call during address resolution, and causing unnecessary messaging traffic between the DNS and SIP servers. The fuzzing attack, on the other hand, wastes CPU time by forcing it to parse invalid SIP messages. DDoS attacks in SIP networks can be grouped into four classes: SIP message payload tampering, SIP message flow tampering, SIP message flooding, and finally exploiting SIP vulnerabilities, e.g., for toll fraud [11].

Many methods have been proposed to detect and prevent DDoS attacks in VoIP networks. For example, for the SIP message flooding varieties, an extended finite state machine (EFSM) can be designed for SIP transactions in order to monitor transaction anomalies [12]. Selected network traffic variables are tracked and if an undefined transaction occurs or any traffic variable count exceeds a pre-determined threshold, a preventive action is triggered. A full protocol stack intrusion detection and prevention system for VoIP systems is proposed in [13]. This is a table-based system that collects, correlates and tuples data from different protocols on the communication stack, e.g., MAC addresses, IP addresses, subscriber IDs, packet timestamps. The decisions, such as dropping packets, are given by certain rules applied over these tuples.

In [14], the packets are labeled with respect to their transmission control protocol (TCP) flags. An alarm is raised if the packet counts in a time window deviate from the distribution fitted for the normal traffic. In an alternate line of research, a naive Bayesian classifier has been constructed as a DDoS detector based on network traffic variables. In [7,15], a Bayesian change point model that detects traffic surges or dips, which possibly correspond to DDoS attacks, is proposed. The model is a hierarchical hidden Markov model that links the features extracted from SIP network traffic and server load to latent variables. One set of these variables tracks the hidden dynamics of the system and the others serve as change point indicators. The output of the model is the posterior probability of a change indication, which is calculated at fixed time intervals.

As for the SIP message payload tampering variety, an N-gram technique has been considered to detect the fuzzing attacks exploiting malformed SIP messages. In this case, based on a corpus of SIP messages, which contains both valid and malformed messages, 4-grams, i.e., sequential 4-byte blocks in SIP messages, are extracted. The 4-grams which exceed a given frequency threshold are designated as significant features and their occurrence count vectors are used as features to train classifiers [16]. An experimental study of applying 5 different machine learning models to detect DDoS attacks in SIP-deployed networks has been conducted [17]. The authors have implemented a simulation environment in order to train and evaluate the performance of the models. The classifiers are trained with pre-generated training data collected from SIP message headers, which contain both attacks and normal traffic. The models are required to be re-trained whenever the network or service operating conditions are changed. The trained classifiers are evaluated in terms of accuracy and time overhead required to run them on-line for each message. A recent study proposes using an autoregressive integrated moving average (ARIMA) time series model to classify the normal traffic, DoS and DDoS attacks [18] for IP networks. The number of packets and the number of IP sources are tracked for each time unit and their ratios are stored. The local Lyapunov exponents are calculated for these ratios and these values are compared with a threshold to discriminate malicious from non-malicious traffic type.

A statistical anomaly detection model, to which our method has resemblances, was proposed in [19,20]. This method detects significant deviations in the 3G mobile network traffic patterns based on a variant of Kullback–Leibler (KL) divergence between two empirical distributions. The collected data samples for each observed feature within a time window are fitted into respective univariate histograms. Then, these empirical distributions are compared with reference distributions of the observed features based on the proposed divergence metric. If the distance of any of the inspected feature distributions to that of its corresponding reference exceeds an empirically set threshold, then an alarm is raised to declare a detected anomaly. A human expert gives the final decision about the detected anomaly as to whether it is an attack or not.

The spread of intelligent mobile devices has resulted in a new facet of mobile botnets. The distributed characteristics of the mobile network (capability to change IP addresses frequently) and the huge number of zombie devices easily hacked by malware make it hard to prevent the DDoS attacks with conventional PC-centric solutions. Besides using the Internet for command propagation, the bot master can coordinate the zombies in some exceptional ways such as Bluetooth communication or SMS/MMS messaging. Three different command and control architectures (coordination of zombies by the master) to start a mobile botnet DDoS attack are discussed in [21]. A recent study uses machine learning techniques to discriminate applications that are malware used in mobile botnets. The manifest files of Android Application Packages (APK) are processed to extract features. After some pre-processing steps, the selected features are used in training classifiers to detect the malware [22].

A detailed survey on the historical evolution of botnets is provided in [23]. A detailed review of network intrusion systems which are capable of detecting DDoS attacks and the specific methods used for detection can be found in [24].

Analysis of time series for classification, prediction, change and outlier detection has been an active research topic for decades, with particular focus on financial markets [25]. Among the plethora of methods proposed one can mention: (i) methods that map the time series into a new feature space, such as spectral entropy, autocorrelation etc. [26]; (ii) kernel methods for time-series classification with emphasis on sequence alignment [27–29]; (iii) clustering time series with a combined distance function based on the triangle similarity, which is the cosine value between two vectors, and the dynamic time warping distance [30]; (iv) approaches fitting the data to a number of possible models, such as a hidden Markov model with dynamic time warping, or an autoregressive moving average model with dynamic time warping, and clustering the data based on the model instance with the best fit [31,32]; (v) singular spectral analysis, where data is embedded and the embedding matrix is decomposed and reconstructed into trend, noise and oscillatory components.

Metrics, which are functions to calculate distances between two entities in a set, can be used to detect anomalies in the network traffic, and in [33] two such information metrics have been proposed for DDoS attacks. Similarly, a DDoS detector which uses the Tsallis entropy has been proposed [34]. The Mahalanobis distance, based on the inverse covariance matrix, has been previously used in the detection of abnormal callers (outliers) by inspecting their SIP message flows [35]. In this study, however, we use an adaptively on-line trained variety of the Mahalanobis distance for a time series. We use the time series of Mahalanobis distances accompanying the input time series to detect DDoS attacks as well as to identify the malicious users from their messaging behavior analysis.

One of the first IDS system architectures that uses behavioral analysis to detect DDoS attacks and the malicious attackers was proposed in [36]. The attacking entities aiming for a distributed DoS attack are characterized by a common messaging pattern. This, however, cannot be represented by a rule-based system. The proposed system consists of three components: a sniffer to capture the packets, a preprocessor to extract informative features from the packets and a classifier to detect the anomalies in the traffic.

3. Mathematical notation

We first introduce the notation specific to the communication control, e.g., SIP, messaging. Time is discrete, represented by the instants $t = i\Delta$ at which user behavior data is collected and then processed to output a feature vector. $\Delta$ is an observation interval, e.g., 1 s long, within which user messaging activities are monitored. A messaging activity observed at the server side is the arrival of one of the SIP messages (invite, bye, 200 etc.) from a user or the transmission of such a message to a user.

At the end of this interval, the $r$th user’s activity is denoted by the $d$-dimensional vector $v_r$, where $d$ is the number of different SIP request or response message types taken into consideration. The vector $v_r$ is an integer vector whose components correspond to the number of times each one of the $d$ message types has occurred within the $i$th time frame ($(i-1)\Delta < t < i\Delta$). Not all users are active in each observation interval. An active user, for example the $r$th one, is a registered user that has sent and/or received at least one SIP message within the given observation interval, and it is indicated by $u_r$, $r = 1, \ldots, |U|$, where $|U|$ is the cardinality of this set.

Next let’s look into the details of the users’ count vectors. A count vector results from the sum of individual messaging activities of an active user. The $r$th active user is assumed to run $P_r > 0$ messaging activities within the observation interval. Each messaging activity is represented by $v_r^p$, $p = 1, \ldots, P_r$, which is a unit vector with one component being 1 and the rest 0. Let’s call this a message indicator vector, because it indicates which one of the $d$ messages has occurred. Then $v_r = \sum_{p=1}^{P_r} v_r^p$ is simply the count vector of messages sent by the $r$th user, as shown in Fig. 1.

Finally, let us introduce the $d$-dimensional count vector $x$, called the state vector, that represents the collective activities of all $|U|$ active users within a time frame. The state vector, which is the total message count vector from all users at the server side, is simply the sum of the active user count vectors, $x = \sum_{r=1}^{|U|} v_r$, and this is illustrated in Fig. 2.
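The two sums above can be illustrated with a minimal Python sketch. The message type list, the user names and the traffic are hypothetical and chosen only for illustration; the paper fixes only that $d$ message types are counted per observation interval.

```python
# Hypothetical message alphabet; the paper only says d SIP message types are tracked.
MESSAGE_TYPES = ["INVITE", "BYE", "ACK", "200"]
INDEX = {name: i for i, name in enumerate(MESSAGE_TYPES)}


def indicator(message_type):
    """Unit vector v_r^p with a single 1 at the position of the message type."""
    vec = [0] * len(MESSAGE_TYPES)
    vec[INDEX[message_type]] = 1
    return vec


def count_vector(messages):
    """User count vector v_r: the sum of the user's indicator vectors."""
    total = [0] * len(MESSAGE_TYPES)
    for m in messages:
        total = [a + b for a, b in zip(total, indicator(m))]
    return total


def state_vector(per_user_messages):
    """Server state vector x: the sum of the count vectors of all active users."""
    total = [0] * len(MESSAGE_TYPES)
    for messages in per_user_messages.values():
        total = [a + b for a, b in zip(total, count_vector(messages))]
    return total


# One observation interval with two active users (made-up traffic).
interval = {
    "alice": ["INVITE", "ACK", "BYE"],
    "bob": ["INVITE", "INVITE", "200"],
}
v_alice = count_vector(interval["alice"])
x = state_vector(interval)
```

Here `v_alice` plays the role of one $v_r$ and `x` the role of the state vector for that interval.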

We have so far omitted any specific index to denote the time frames to avoid notational clutter. However, we will use the notation $x_i, x_j \in \mathbb{R}^d$ to denote server state vectors at the $i$th and $j$th observation intervals. These feature vectors or server state vectors can be used to monitor the traffic regime changes in a network.

Let $M$ be a $d \times d$ positive (semi)definite matrix ($M \in S_+$ or $M \in S_{++}$). $D_M(x_i, x_j)$ is the distance between the feature vectors $x_i$ and $x_j$ calculated over the metric matrix $M$. $f(M \mid x_n : x_{n-k-1})$ is a function of $M$ defined over the time window of length $k$ tracked between feature vectors from $x_{n-k-1}$ to $x_n$, i.e., from time index $n-k-1$ to time index $n$. $D_{ld}(A, B)$ is a function defined over any two matrices $A$ and $B$ of the same dimensions.

Fig. 1. The $r$th user count vector resulting from the accumulation of message indicator vectors ($v_r = \sum_{p=1}^{P_r} v_r^p$) in an observation interval.

Fig. 2. The server state vector is the sum of user count vectors ($x = \sum_{r=1}^{|U|} v_r$).

Fig. 3. The time-stamped user message vector is the concatenation of the user unit message vector and the time it is sent ($w_r^p \in \mathbb{R}^{d+1}$).

Notice that up to this point we have neglected the time stamp information, that is, the actual time instances $t_r^1, \ldots, t_r^{P_r}$ within a generic $\Delta$-long time frame at which the $P_r$ messaging activities of, say, the $r$th user are occurring. We can incorporate this information by augmenting the dimensionality of the message indicator vector $v_r^p$ by one, as follows: $(w_r^p)^\top = ((v_r^p)^\top, t_r^p)$. Thus, $w_r^p$ is the timestamp-enriched version of the message indicator vector $v_r^p$. Notice that $w_r^p \in \mathbb{R}^{d+1}$ consists of the concatenation of the message indicator vector $v_r^p$ and the time instance at which the message occurs, $t_r^p$, as given in Fig. 3.

Given the definitions above, any user $u_r$ can be mapped to a time series, which can be represented as one of these two matrices: $V_r = [v_r^1 \mid v_r^2 \mid \ldots \mid v_r^{P_r}]$ or $W_r = [w_r^1 \mid w_r^2 \mid \ldots \mid w_r^{P_r}]$. The kernel function that measures the similarity of any user pair $(u_q, u_r)$ is represented as $K(u_q, u_r)$. $\kappa(w_r^{p_r}, w_q^{p_q})$ is defined as the heat kernel to calculate the similarity between time-stamped message vectors of any two users in the same interval: the $p_r$th message of the $r$th user and the $p_q$th message of the $q$th user. Using the user pair kernel functions for that time interval, we can calculate the kernel matrix $K$ of size $|U| \times |U|$.

4. Adaptive distance-based change point detection estimator

Feature instances extracted from adjacent intervals within the correlation length of a stationary process tend to have high statistical similarity. On the other hand, features originating from different generative processes or from different sections of a non-stationary process can be expected to have large pairwise distances. Based on this premise, a significant change in the distances between consecutive feature vectors in a time series can be interpreted as an indicator of a change in the data generating process. The Hidden Markov Model (HMM) can capture these regime changes as a switching variable from one generator to another in the hidden layer. In the context of communication networks, such an abrupt change in feature vectors corresponding to traffic intensity patterns and/or server resource utilization rates can be conjectured to signal a DDoS attack. A Distance-based Change Point Method (DCPM), as used in our work, first tracks the distances between sequential feature vectors and then computes the statistics of these distances to decide whether a change has occurred. Judicious choice of a distance function can prove critical in the performance of machine learning algorithms. To this effect, one can use one of the well-known distance functions or attempt to learn a distance function specific to the problem at hand. In this work, we have opted to use a learning scheme for the Mahalanobis distance.

4.1. Mahalanobis distance

The Mahalanobis distance $D_M$ between $x_i, x_j \in \mathbb{R}^d$ can be calculated as in Eq. (1). The Mahalanobis distance is defined over $d \times d$ symmetric positive semi-definite (PSD) matrices ($M \in S_+$), and the choice of $M$ can be made to account for the correlations between features and the differences between scales. The inverse of a full rank sample covariance matrix, $\Sigma$, gives rise to a special case of the Mahalanobis distance ($M \in S_{++}$), which assumes the data is generated from a multivariate Gaussian distribution. Under this choice and the Gaussian assumption, it can be shown that $M$ maps the data to uncorrelated and unit-variance Gaussian variables. Conversely, if the features follow a standard Gaussian distribution with uncorrelated components, then we have $M = \Sigma = I$.

$$D_M(x_i, x_j) = (x_i - x_j)^\top M (x_i - x_j) \tag{1}$$

Any symmetric positive semi-definite matrix can be factorized as $M = A^\top A$ such that $A$ is an $e \times d$ projection matrix and $e \le d$. Thus, the relation below can be obtained.

$$
\begin{aligned}
D_M(x_i, x_j) &= (x_i - x_j)^\top M (x_i - x_j) \\
&= (x_i - x_j)^\top A^\top A (x_i - x_j) \\
&= (A(x_i - x_j))^\top A (x_i - x_j) \\
&= (A x_i - A x_j)^\top (A x_i - A x_j) \\
&= \lVert a_i - a_j \rVert_2^2 = D_E(a_i, a_j) = D_A(x_i, x_j)
\end{aligned}
\tag{2}
$$

where $a_i = A x_i$ is the projected vector, and $D_E$ is the (squared) Euclidean distance. Eq. (2) shows that the Mahalanobis distance in the feature space is equivalent to the Euclidean distance in a projected space.
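The equivalence in Eq. (2) can be checked numerically. The sketch below uses plain Python lists; the matrix `A` and the two vectors are made-up values, and the helper names are introduced here only for illustration.

```python
def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def gram(A):
    """Return M = A^T A for an e x d matrix A."""
    d = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(d)] for i in range(d)]


def mahalanobis(M, xi, xj):
    """D_M(xi, xj) = (xi - xj)^T M (xi - xj), as in Eq. (1)."""
    diff = [a - b for a, b in zip(xi, xj)]
    return sum(u * v for u, v in zip(diff, matvec(M, diff)))


def squared_euclidean(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


# Illustrative 2 x 3 projection (e = 2, d = 3) and two made-up state vectors.
A = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 1.0]]
M = gram(A)
x_i = [4.0, 1.0, 0.0]
x_j = [1.0, 2.0, 2.0]

d_feature = mahalanobis(M, x_i, x_j)                              # left side of Eq. (2)
d_projected = squared_euclidean(matvec(A, x_i), matvec(A, x_j))   # right side of Eq. (2)
```

Both quantities evaluate to 26 for these inputs, which is what Eq. (2) predicts for any choice of `A`.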

4.2. Distance-based change point model

The distance-based change detection is achieved by inspecting the sum of distances over a sliding window, called the moving distance, where distances between the current feature vector and its immediate predecessors in a time frame of size $k$ are summed. The result of the sliding window sum is compared with a threshold value, $\mathrm{th}$, and, if it exceeds the threshold, an alarm is raised for the potential occurrence of a regime change. This step is followed by the malicious user discrimination algorithm, as detailed in Section 5. The main novelty of this method is that we learn the weight matrix $M$ (called the Mahalanobis metric from now on) under a loss function so that the detection algorithm is adapted to inlier variations and trends in the traffic intensity to avoid false alarms. The inlier variations can be due to diurnal or week-day based changes or to short-lived sporadic flurries of call activities.

The moving distance over a $k$-sized time frame can be defined as a function of the symmetric positive definite matrix, $M \in S_{++}$, as follows:

$$f(M \mid x_n : x_{n-k-1}) = \sum_{j=n-k-1}^{n-1} (x_n - x_j)^\top M (x_n - x_j) \tag{3}$$

If the moving distance computed using the current Mahalanobis metric is above the threshold, $f(M_{n-1} \mid x_n : x_{n-k-1}) > \mathrm{th}$, then an alarm is raised. The Mahalanobis metric is updated periodically at each time interval under the loss function given below:

$$\min_{M \in S_{++}} \; f(M \mid x_n : x_{n-k-1}) + \lambda D_{ld}(M, M_{n-1}) + \beta D_{ld}(M, I) \tag{4}$$

In Eq. (4), the second and the third terms, $\lambda D_{ld}(M, M_{n-1})$ and $\beta D_{ld}(M, I)$, respectively, are regularization functions based on the logarithmic determinant divergence [37] (LogDet). The LogDet function is a pseudo-metric that measures the distance between two matrices and is defined in Eq. (5). Detailed information about the LogDet function is given in the Appendix section. The former regularizer forces the updated matrix to be as similar as possible to its predecessor. The latter one forces it to be as close as possible to the identity matrix to prevent it from converging to an irrelevant matrix and at the same time to induce sparsity. Thus, their relative weights can be gauged to trade off the update rate of the Mahalanobis metric and the aging of the effect of the past measurements. The four parameters to be set are the sliding window size $k$ (time frame size), the two regularization cost weights, $\lambda$ and $\beta$, and the parameter $\alpha$ for thresholding. At the start of the algorithm, $M_0$ is initialized as the identity matrix, $M_0 = I$. Since the LogDet is a convex function of $M$, we are guaranteed to find the optimal positive definite matrix that minimizes the criterion in Eq. (4).

$$D_{ld}(M, M_{t-1}) = \operatorname{tr}\left(M M_{t-1}^{-1}\right) - \log\det\left(M M_{t-1}^{-1}\right) - d \tag{5}$$

where $\operatorname{tr}(\cdot)$ is the trace function for matrices.
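As a small numerical illustration of Eq. (5), the following NumPy sketch evaluates the LogDet divergence on 2 x 2 matrices. The function name and the test matrices are illustrative assumptions, not part of the paper.

```python
import numpy as np


def logdet_divergence(M, M_ref):
    """D_ld(M, M_ref) = tr(M M_ref^-1) - log det(M M_ref^-1) - d, per Eq. (5)."""
    d = M.shape[0]
    ratio = M @ np.linalg.inv(M_ref)
    sign, logabsdet = np.linalg.slogdet(ratio)
    if sign <= 0:
        # Cheap sanity check only; it is not a full positive-definiteness test.
        raise ValueError("both arguments must be positive definite")
    return float(np.trace(ratio) - logabsdet - d)


I2 = np.eye(2)
same = logdet_divergence(I2, I2)              # zero when the matrices coincide
forward = logdet_divergence(2.0 * I2, I2)     # 4 - 2 ln 2 - 2
backward = logdet_divergence(I2, 2.0 * I2)    # 1 + 2 ln 2 - 2
```

The divergence vanishes for identical arguments, and `forward` differs from `backward`, which shows that the function is not symmetric in its two arguments.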

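The optimal Mahalanobis metric ($M^*$) can be found by taking the derivative of Eq. (4) and setting it to zero.

$$M^* = \left( \frac{\lambda}{\lambda+\beta} M_{n-1}^{-1} + \frac{\beta}{\lambda+\beta} I + \frac{1}{\lambda+\beta} \sum_{j=n-k-1}^{n-1} (x_n - x_j)(x_n - x_j)^\top \right)^{-1} \tag{6}$$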

This Mahalanobis metric update is repeated at each time index. The change detection algorithm is given in Algorithm 1.
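The step from the loss in Eq. (4) to the closed form in Eq. (6) is only stated, so a sketch of the intermediate algebra may help. It assumes symmetric matrices and the standard identities $\partial \operatorname{tr}(MB)/\partial M = B$ and $\partial \log\det M/\partial M = M^{-1}$; the shorthand $S$ for the sum of outer products and the generic reference matrix $B$ are introduced here for brevity.

```latex
\begin{aligned}
S &:= \sum_{j=n-k-1}^{n-1} (x_n - x_j)(x_n - x_j)^\top,
  \qquad f(M \mid x_n : x_{n-k-1}) = \operatorname{tr}(M S), \\
\frac{\partial}{\partial M}\, D_{ld}(M, B)
  &= \frac{\partial}{\partial M}\Big[\operatorname{tr}(M B^{-1}) - \log\det M + \log\det B - d\Big]
   = B^{-1} - M^{-1}, \\
0 &= S + \lambda\big(M_{n-1}^{-1} - M^{-1}\big) + \beta\big(I - M^{-1}\big), \\
(\lambda + \beta)\, M^{-1} &= \lambda M_{n-1}^{-1} + \beta I + S, \\
M^{*} &= \Big(\tfrac{\lambda}{\lambda+\beta} M_{n-1}^{-1} + \tfrac{\beta}{\lambda+\beta} I
        + \tfrac{1}{\lambda+\beta} S\Big)^{-1}.
\end{aligned}
```

The last line is term by term the expression in Eq. (6).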

4.3. Thresholding of the moving distances

The characteristics of the moving average of distances depend on the traffic volume intensity, the dimension of the feature vector, the size of the time frame etc., and hence it becomes critical to set a threshold value judiciously to detect regime anomalies or abrupt changes. In this study we comparatively test two different threshold functions.

Algorithm 1. Adaptive Online Distance-Based Change Point Detection Algorithm.

Initialize $M_0$ (default $I$).
Set $k$, $\lambda$, $\beta$ and $\alpha$ (for $\mathrm{th}$).
repeat
    Inspect the SIP traffic in the time window of size $k$, and compute the count vector.
    if $f(M_{n-1} \mid x_n : x_{n-k-1}) > \mathrm{th}$ then
        Raise alarm.
        Run the malicious user detector defined in Algorithm 4.
    end if
    Evaluate $M^*$.
    Set $M_{n-1} = M^*$.
until the flow ends
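The loop of Algorithm 1 can be sketched in Python with NumPy as follows. This is an illustrative reading, not the authors' implementation: the class name, the toy stream and all parameter values are made up, the threshold is passed in as a plain number, and the malicious user detector (Algorithm 4) is left out.

```python
from collections import deque

import numpy as np


class ChangePointMonitor:
    """Sketch of Algorithm 1; the threshold is passed in as a plain number."""

    def __init__(self, dim, k, lam, beta, threshold):
        self.M = np.eye(dim)                 # M_0 = I
        self.lam, self.beta = lam, beta
        self.threshold = threshold
        # Eq. (3) sums over j = n-k-1, ..., n-1, i.e. k + 1 predecessors.
        self.history = deque(maxlen=k + 1)

    def step(self, x_n):
        """Process one state vector; return (moving_distance, alarm)."""
        x_n = np.asarray(x_n, dtype=float)
        if not self.history:                 # nothing to compare with yet
            self.history.append(x_n)
            return 0.0, False
        diffs = x_n - np.stack(list(self.history))   # rows are x_n - x_j
        scatter = diffs.T @ diffs                    # sum of outer products
        # Eq. (3): tr(M S) equals the sum of the quadratic forms.
        distance = float(np.trace(self.M @ scatter))
        alarm = distance > self.threshold
        # (Algorithm 4, the malicious user detector, would run here on alarm.)
        # Eq. (6): closed-form update of the Mahalanobis metric.
        inner = (self.lam * np.linalg.inv(self.M)
                 + self.beta * np.eye(len(x_n))
                 + scatter) / (self.lam + self.beta)
        self.M = np.linalg.inv(inner)
        self.history.append(x_n)
        return distance, alarm


# Toy stream: steady traffic, then a sudden flood in the first message type.
stream = [[10, 5]] * 8 + [[200, 5]] + [[10, 5]] * 3
monitor = ChangePointMonitor(dim=2, k=5, lam=1.0, beta=1.0, threshold=5.0)
alarms = [monitor.step(x)[1] for x in stream]
```

With these made-up settings the only alarm is raised at the flood sample (index 8 of the stream); how a real deployment behaves depends on the choice of $k$, $\lambda$, $\beta$ and the threshold.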

Experimental evidence has shown that we can approximate the distribution of the moving sum of distances as a Chi-squared distribution. It is then assumed that Mahalanobis distances are obtained from a Gaussian distribution such that $\mu = x_n$ in the immediate past observation interval, and $\Sigma = M^{-1}$. If $y$, an observation in the current sliding window, is a $d$-dimensional random vector drawn from a Gaussian distribution with a mean vector $\mu$ and a $d$-rank covariance matrix $\Sigma$, then $z = (y - x_n)^\top M (y - x_n) = (y - \mu)^\top \Sigma^{-1} (y - \mu)$ becomes Chi-squared distributed with $d$ degrees of freedom.

Let $z_i$ denote one of $k$ independent, identically distributed random variables that follow a chi-square distribution such as $z_1 \sim \chi^2_{\alpha, d_1}$, $z_2 \sim \chi^2_{\alpha, d_2}$, ..., $z_k \sim \chi^2_{\alpha, d_k}$. Due to the additive property of independent chi-squared variables, the sum of the random variables follows a chi-square distribution with $d_1 + d_2 + \cdots + d_k$ degrees of freedom. That is,

$$Z = z_1 + z_2 + \cdots + z_k, \qquad Z \sim \chi^2_{\alpha,\, d_1 + d_2 + \cdots + d_k} \tag{7}$$

Thus, the threshold of our anomaly detection model becomes $\mathrm{th} = \chi^2_{\alpha,\, k \ast d}$. The $\alpha$ parameter is the probability of accepting a chance fluctuation as an anomaly. In other words, in the absence of an attack, the score of the moving average of distances, denoted by $Z$ above, has a probability less than $\alpha$ to exceed the threshold $\mathrm{th}$. Conversely, the event of $Z$ exceeding the threshold can be accepted as an anomaly with probability $1 - \alpha$. The value of $\alpha$ depends on the requirements of the system and it is typically set by a human expert to some such value as $\alpha \in \{0.1, 0.05, 0.02, 0.01\}$. This is a statistical approach that is based on the sum of observed distances.
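To give a feel for the magnitude of this threshold, the sketch below computes the upper-$\alpha$ chi-square quantile with $k \ast d$ degrees of freedom. It uses the Wilson-Hilferty approximation so that only the Python standard library is needed; this is a substitute chosen for the example, and an exact quantile function such as `scipy.stats.chi2.ppf` would normally be preferred. The values $k = 10$ and $d = 6$ are hypothetical.

```python
from statistics import NormalDist


def chi2_upper_quantile(alpha, dof):
    """Approximate the value exceeded with probability alpha by a chi-square
    variable with `dof` degrees of freedom (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    h = 2.0 / (9.0 * dof)
    return dof * (1.0 - h + z * h ** 0.5) ** 3


def chi2_threshold(alpha, k, d):
    """th = chi^2_{alpha, k*d}: window size k, feature dimension d."""
    return chi2_upper_quantile(alpha, k * d)


# Hypothetical setting: k = 10 intervals, d = 6 message types.
thresholds = {a: chi2_threshold(a, k=10, d=6) for a in (0.1, 0.05, 0.02, 0.01)}
```

A smaller $\alpha$ yields a larger threshold, so fewer chance fluctuations are flagged at the cost of reacting later to genuine changes.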

An alternate, empirically found constant threshold, which is a function of two system parameters, is given below:

$$\mathrm{th} = c\,k \left( \frac{d}{2} \right)^2 \tag{8}$$

and is found to work equally well. This fixed threshold value only depends on the time frame size ($k$), the number of dimensions ($d$) and a constant $c$. As a plausible argument for the fact that the constant thresholding function works equally well, we observe that the same parameters ($k$ and $d$) are also inherent in the $\chi^2$ thresholding. Notice also that there is some liberty in adjusting this threshold by setting the constant $c$ according to the requirements of the deployed system. A case in point could be to make the constant indexed by time periods, $c_n$, e.g., to account for seasonal trends.

More importantly, even though the threshold is set to a constant, the system is still an adaptive model due to the adaptation inherent in the updates of the Mahalanobis metric. At each observation interval, the Mahalanobis metric is updated to accommodate the new distances between the observations. Therefore, whenever the threshold is exceeded, it means that the quadratic smoother could not smooth out the new measurement digressions, and therefore it is very likely to be an anomaly.

Fig. 4. All possible alignments of $W_q = [w_q^1 \mid w_q^2 \mid w_q^3]$ and $W_r = [w_r^1 \mid w_r^2]$.

5. Malicious user discrimination

If a detected anomaly is in fact a DDoS attack, the next task is to identify the set of malicious users that are presumably coordinating to mount a distributed attack. For this analysis, each subscriber’s behavior history in the observation interval is represented as a time series, as given in Fig. 1. We process the time series using a similarity function so that the subscribers with similar behavior patterns are clustered into the same group. We have proposed and evaluated two different attacker discrimination methods. The first one is based on a global time series alignment kernel that makes use of both epoch differences and feature distances between message sequences. The second one uses the user message count vectors at the end of periodic observation intervals, i.e., the information on message time instants is ignored. The pairwise similarity of any two users is calculated using their count vectors.

5.1.Sequencealignmentkernel

We consider the ensembleof the timestampedmessages sent byauserwithinatime frameofkunits,say(ⁿ− k− 1),...,(ⁿ− 1),asmessagesequences.Eachuser’ssequencecanhaveadiffer- entnumberofmessagingevents,eacheventoccuringatadifferent timeinstant.Inotherwords,auser’smessagesequenceortimese- riescorrespondstotheensembleofmessagessentbyaregistered terminalwithinthedesignatedobservationinterval,eacheventbe- ing characterized by the type of SIP message andits timestamp.

Our goal is to estimate the similarity of messaging activities of the users via a kernel-based scheme. For this purpose, the message sequences must be aligned without pair repetition. The similarity between two sequences of possibly different lengths, i.e., number of messaging events, can be determined as the sum of similarities of all their feasible alignments. Thus two sequences are more similar as a pair if their messaging types, e.g., invite or bye, and their occurrences in time resemble each other.

Let us assume the user time series, i.e., timestamped message sequences, $(W_q, W_r)$ of the user pair $(u_q, u_r)$, $W_q = [w_q^1 | w_q^2 | w_q^3]$ and $W_r = [w_r^1 | w_r^2]$, with three and two messaging events, respectively. Fig. 4 shows an example of all possible alignments for these two sequences. In this specific example, there are 5 possible alignments, as follows:

• $(w_q^1, w_r^1), (w_q^1, w_r^2), (w_q^2, w_r^2), (w_q^3, w_r^2)$

• $(w_q^1, w_r^1), (w_q^2, w_r^2), (w_q^3, w_r^2)$

• $(w_q^1, w_r^1), (w_q^2, w_r^1), (w_q^2, w_r^2), (w_q^3, w_r^2)$

• $(w_q^1, w_r^1), (w_q^2, w_r^1), (w_q^3, w_r^2)$

• $(w_q^1, w_r^1), (w_q^2, w_r^1), (w_q^3, w_r^1), (w_q^3, w_r^2)$
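The five alignments listed above can be reproduced by a short enumeration. The following Python sketch is only an illustration: the function name `enumerate_alignments` is hypothetical, and it assumes that an alignment starts at the first pair, ends at the last pair, and advances one index or both indices at each step, which is the reading consistent with the list.

```python
def enumerate_alignments(P_q, P_r):
    """Return every alignment path from pair (1, 1) to pair (P_q, P_r).

    Each step advances the first index, the second index, or both,
    so no index pair is repeated along a path.
    """
    paths = []

    def extend(path):
        i, j = path[-1]
        if (i, j) == (P_q, P_r):
            paths.append(list(path))
            return
        for di, dj in ((1, 0), (0, 1), (1, 1)):
            ni, nj = i + di, j + dj
            if ni <= P_q and nj <= P_r:
                path.append((ni, nj))
                extend(path)
                path.pop()

    extend([(1, 1)])
    return paths


alignments = enumerate_alignments(3, 2)
for path in alignments:
    print(path)
print(len(alignments))  # 5
```

Each printed path is a list of (event index of $u_q$, event index of $u_r$) pairs. The number of paths grows quickly with the sequence lengths, which is why a dynamic programming recursion is preferable to explicit enumeration.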

A global alignment kernel has been proposed in [38], which uses dynamic programming to compute the similarity of all possible alignments of two sequences. We use a variation of this algorithm, where we employ a pairwise heat kernel that is based on the Mahalanobis distance and differences of timestamps.

5.1.1. Global sequence alignment kernel

Given the two message sequences $W_q = [w_q^1 | w_q^2 | \ldots | w_q^{P_q}]$ and $W_r = [w_r^1 | w_r^2 | \ldots | w_r^{P_r}]$ for the user pair $(u_q, u_r)$ in a state space, we set the doubly-indexed series $T_{p_q,p_r}$ as $T_{p_q,0} = 0$ for $p_q = 1, \ldots, P_q$, $T_{0,p_r} = 0$ for $p_r = 1, \ldots, P_r$, and $T_{0,0} = 1$. We also assume that there is a function to measure the similarity between the $p_q$th signaling event of user $u_q$ and the $p_r$th signaling event of the other user $u_r$, $\kappa(w_q^{p_q}, w_r^{p_r})$. Computing the terms recursively for $(p_q, p_r) \in \{1, \ldots, P_q\} \times \{1, \ldots, P_r\}$, one has:

$$T_{p_q,p_r} = \left(T_{p_q,p_r-1} + T_{p_q-1,p_r-1} + T_{p_q-1,p_r}\right) \kappa\left(w_q^{p_q}, w_r^{p_r}\right) \tag{9}$$

Finally, the unnormalized similarity between two users $(u_q, u_r)$ is measured when the recursion has considered all possible alignments, that is:

$$K_{\mathrm{unnormed}}(u_q, u_r) = T_{P_q,P_r} \tag{10}$$

After the kernel matrix for all user pairs has been obtained, we unit-diagonal normalize the $|U| \times |U|$ kernel matrix, where $|U|$ is the number of active users in the system, in order to eliminate any scaling issues:

$$K(u_q, u_r) = \frac{K_{\mathrm{unnormed}}(u_q, u_r)}{\sqrt{K_{\mathrm{unnormed}}(u_q, u_q)\, K_{\mathrm{unnormed}}(u_r, u_r)}}, \quad q, r = 1, \ldots, |U|, \quad K(u_q, u_r) \in [0, 1] \tag{11}$$

We will call this kernel the time series kernel.
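The recursion of Eq. (9), the read-out of Eq. (10) and the unit-diagonal normalization of Eq. (11) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names, the toy similarity `toy_kappa` (which looks at timestamps only) and the sample sequences are all assumptions made for the example, and every sequence is assumed to be non-empty.

```python
import math


def alignment_kernel(W_q, W_r, kappa):
    """Unnormalized similarity T[P_q][P_r] from the recursion of Eq. (9)."""
    P_q, P_r = len(W_q), len(W_r)
    # Borders: T[0][0] = 1, every other entry of row 0 and column 0 is 0.
    T = [[0.0] * (P_r + 1) for _ in range(P_q + 1)]
    T[0][0] = 1.0
    for i in range(1, P_q + 1):
        for j in range(1, P_r + 1):
            local = kappa(W_q[i - 1], W_r[j - 1])
            T[i][j] = (T[i][j - 1] + T[i - 1][j - 1] + T[i - 1][j]) * local
    return T[P_q][P_r]


def normalized_kernel_matrix(sequences, kappa):
    """Unit-diagonal normalized kernel matrix over non-empty sequences."""
    n = len(sequences)
    raw = [[alignment_kernel(a, b, kappa) for b in sequences] for a in sequences]
    return [[raw[q][r] / math.sqrt(raw[q][q] * raw[r][r]) for r in range(n)]
            for q in range(n)]


def toy_kappa(t_a, t_b):
    # Hypothetical stand-in: similarity decays with the timestamp gap only.
    return math.exp(-abs(t_a - t_b))


# Three users described only by made-up message timestamps.
sequences = [[0.0, 1.0, 2.0], [0.1, 1.1], [0.5, 5.0, 9.0, 9.5]]
K = normalized_kernel_matrix(sequences, toy_kappa)
for row in K:
    print(["%.3f" % value for value in row])
```

With a similarity function that always returns 1, `alignment_kernel` simply counts the alignments, so sequences of lengths 3 and 2 give 5, in agreement with the example of Fig. 4. The cost of one kernel evaluation is proportional to $P_q P_r$.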

5.1.2. Pairwise heat kernel

Each user in a time window can be represented in terms of her ordered timestamped message sequence. Recall that user sequences can have differing lengths and can consist of different types of messages.

A kernel function (pairwise heat function) for any two timestamped vectors, $w_q^{p_q} = (v_q^{p_q}, t_q^{p_q})$ and $w_r^{p_r} = (v_r^{p_r}, t_r^{p_r})$, is evaluated as:

$$\kappa\left(w_q^{p_q}, w_r^{p_r}\right) = \exp\left(-\gamma\, D_M\left(v_q^{p_q}, v_r^{p_r}\right) - \rho \left|t_q^{p_q} - t_r^{p_r}\right|\right) \tag{12}$$

$$D_M\left(v_q^{p_q}, v_r^{p_r}\right) = \left(v_q^{p_q} - v_r^{p_r}\right)^\top M \left(v_q^{p_q} - v_r^{p_r}\right)$$

where $M$ is the Mahalanobis metric evaluated at that observation interval as in Eq. (6). Note that $\kappa(w_q^{p_q}, w_r^{p_r}) = 1$ iff $v_q^{p_q} = v_r^{p_r}$ and $t_q^{p_q} = t_r^{p_r}$. The coefficients $\gamma$ and $\rho$ determine the weights of message type distance and timing distance, respectively. In this study we have assumed $\gamma = \rho = 1$.
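As an illustration of Eq. (12), the sketch below evaluates the pairwise heat kernel for two events in plain Python. The one-hot encoding of the message type, the identity matrix used for $M$ and the helper names are assumptions made for this example only; in the proposed system $M$ is the metric of Eq. (6).

```python
import math


def mahalanobis_sq(v_a, v_b, M):
    """Quadratic form (v_a - v_b)^T M (v_a - v_b) for a d x d metric M."""
    diff = [a - b for a, b in zip(v_a, v_b)]
    d = len(diff)
    return sum(diff[i] * M[i][j] * diff[j] for i in range(d) for j in range(d))


def heat_kernel(w_a, w_b, M, gamma=1.0, rho=1.0):
    """Pairwise heat kernel of Eq. (12) for events given as (v, t) tuples."""
    (v_a, t_a), (v_b, t_b) = w_a, w_b
    return math.exp(-gamma * mahalanobis_sq(v_a, v_b, M) - rho * abs(t_a - t_b))


# Hypothetical encoding: one-hot vectors over the message types [INVITE, BYE].
M = [[1.0, 0.0], [0.0, 1.0]]   # identity used as a placeholder metric
invite_at_0 = ([1.0, 0.0], 0.0)
bye_at_half = ([0.0, 1.0], 0.5)

print(heat_kernel(invite_at_0, invite_at_0, M))  # 1.0 for identical events
print(heat_kernel(invite_at_0, bye_at_half, M))  # exp(-2.5)
```

For the second pair the type distance is 2 and the timing distance is 0.5, so with $\gamma = \rho = 1$ the kernel value is $\exp(-2.5)$. Increasing $\gamma$ penalizes mismatched message types more strongly, while increasing $\rho$ penalizes timing gaps.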

5.2. User distance kernel

A kernel matrix of pairwise user-to-user similarities can be created based on their Mahalanobis distances. User pairs would have high similarity (close to 1) if their Mahalanobis distance is close to 0; conversely, if the pair similarity is small (close to 0), then their distance is large. The Mahalanobis distance kernel can be regarded as a variant of the Gaussian kernel.

Any two users, $u_q$ and $u_r$, can be compared based on their messaging count vectors $v_q, v_r \in \mathbb{R}^d$, as follows:

$$K(u_q, u_r) = \exp\left(-(v_q - v_r)^\top M (v_q - v_r)\right) \tag{13}$$

We will call this kernel simply the distance kernel. $K(u_q, u_r)$ is 1 iff $v_q = v_r$. Note that this feature vector does not take into account the occurrence timing of the messages, but averages the messaging traffic in that interval. We would like to point out again the difference between the two ways of measuring user behavior differences. In Eq. (13), we consider the messaging events integrated over the observation interval, represented by the $d$-dimensional count vector of messaging events according to their types. In Eqs. (11) and (12), we calculate the difference of user behaviors by comparing and measuring distances, messaging event by messaging event, as they occur during the observation interval.
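The distance kernel of Eq. (13) can be evaluated for all user pairs at once. The NumPy sketch below is illustrative only: the count vectors are made up and the scaled identity used for $M$ is a placeholder, not the metric of Eq. (6).

```python
import numpy as np


def distance_kernel_matrix(V, M):
    """K[q, r] = exp(-(v_q - v_r)^T M (v_q - v_r)) for the rows v_q of V."""
    diff = V[:, None, :] - V[None, :, :]               # shape (n, n, d)
    quad = np.einsum("qri,ij,qrj->qr", diff, M, diff)  # quadratic form per pair
    return np.exp(-quad)


# Made-up count vectors over d = 2 message types for three users.
V = np.array([[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]])
M = 0.25 * np.eye(2)   # hypothetical metric, not the one of Eq. (6)
K = distance_kernel_matrix(V, M)
print(np.round(K, 4))
```

Users 0 and 2 have identical count vectors, so their similarity is exactly 1, whereas users 0 and 1 differ by $(2, -2)$ and obtain $\exp(-2)$ under this placeholder metric. No normalization step is needed here because the diagonal is already 1.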

5.3. Spectral clustering

A matrix of pairwise user-to-user similarities is created from the users' messages as in Eq. (11) or (13). The kernel matrix, $K$, then corresponds to a fully connected weighted adjacency graph, where the users are the vertices and the similarities are the edge weights. The graph is expected to consist of two sub-graphs: one representing the malicious users, characterized by similar behavior patterns, and the other representing the non-malicious users with random-like behavior patterns. In order to partition this graph into these two sub-graphs, we have used the normalized Laplacian spectral clustering algorithm. Such algorithms are conceived to find graph partitioning solutions in clustering problems. There are various spectral clustering algorithms in the literature. We have preferred normalized Laplacian spectral clustering because we want not only the similar nodes to be projected close to each other, but also the dissimilar nodes to be projected far from each other. The normalized spectral methods satisfy both of these criteria, as discussed in [39].

The degree of the $q$th active user in the kernel matrix, which is the sum of all the weight entries related to the $q$th active user, at a given time frame is evaluated as:

$$dg_q = \sum_{r=1}^{|U|} K_{q,r} \tag{14}$$

where $K_{q,r} = K(u_q, u_r)$.

The degree matrix $D$ is a diagonal matrix whose diagonal elements contain the degree values, $dg_1, dg_2, \ldots, dg_{|U|}$. The Laplacian matrix, $L$, is evaluated as in Eq. (15) and the spectral clustering algorithm is given in Algorithm 2.

Algorithm 2 Normalized Laplacian Spectral Clustering.

1: Given $K$, evaluate $D$ and $L$, which are all in $\mathbb{R}^{|U| \times |U|}$.

2: Compute the first two eigenvectors, $\psi_1$ and $\psi_2$, of the two smallest eigenvalues $0 = \lambda_1 < \lambda_2$ for the generalized eigenproblem $L\Psi = D\Psi\Lambda$, where $\Lambda$ is the diagonal matrix of eigenvalues $\lambda_1, \ldots, \lambda_{|U|}$.

3: Matricize the $\psi_1$ and $\psi_2$ vectors to obtain $\Psi \in \mathbb{R}^{|U| \times 2}$. Use the rows of $\Psi$ as the new feature vectors in the mapped space, $y \in \mathbb{R}^2$. Apply 2-means clustering.

4: Return the cluster label vector $C$ from 2-means clustering.

$$L = D - K \tag{15}$$

where $K$ is the $|U| \times |U|$ kernel matrix whose entries, $K(u_q, u_r)$, are calculated as in Eq. (11) or (13).
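The steps of Algorithm 2 can be sketched with NumPy as follows. This is a minimal illustration under several assumptions: the generalized eigenproblem is solved through the equivalent symmetric matrix $D^{-1/2} L D^{-1/2}$, the 2-means step is a plain Lloyd iteration seeded with the two extremes of $\psi_2$, and the function name and the toy similarity matrix are made up for the example.

```python
import numpy as np


def spectral_bipartition(K, n_iter=50):
    """Two-way normalized spectral clustering of a symmetric similarity matrix."""
    deg = K.sum(axis=1)                        # degrees dg_q, Eq. (14)
    L = np.diag(deg) - K                       # Laplacian, Eq. (15)
    # L psi = lambda D psi is solved through the symmetric matrix
    # D^{-1/2} L D^{-1/2}; its eigenvectors phi give psi = D^{-1/2} phi.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, phi = np.linalg.eigh(L_sym)             # eigenvalues in ascending order
    Y = d_inv_sqrt[:, None] * phi[:, :2]       # rows are the embedded users

    # Plain 2-means (Lloyd iterations), seeded with the extremes of psi_2.
    centers = Y[[np.argmin(Y[:, 1]), np.argmax(Y[:, 1])]]
    for _ in range(n_iter):
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            Y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in (0, 1)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Toy similarity matrix: users 0-2 resemble each other, as do users 3-5.
K = np.full((6, 6), 0.1)
K[:3, :3] = 0.9
K[3:, 3:] = 0.9
np.fill_diagonal(K, 1.0)
labels = spectral_bipartition(K)
print(labels)
```

The sign of an eigenvector is arbitrary, so which group receives label 0 or 1 carries no meaning; only the grouping itself does. Deciding which of the two groups is the malicious one is a separate step.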

5.4. Automatic identification of the malicious user cluster

The malicious users are conjectured to be characterized by repetitive and correlated behaviors, and the rest of the users by uncoordinated and diverse behaviors. Once the two clusters are obtained, the final task is to distinguish the attacker set.

For each of the two clusters, we compute the sample covariance matrix of the user message sequence vectors in that cluster. Since the malicious user cluster is assumed to consist of similar messaging behaviors, such message vectors are expected to be more strongly aligned along a few particular axes. In fact, in the extreme case when all messages in the cluster are of the same type, the sample covariance matrix would be the 0 matrix. Therefore, we assign the cluster with significantly higher eigenvalue concentration to malicious users. This algorithm, which is based on the heuristic that malicious users must be somewhat coordinated to mount an attack, and therefore that the data vectors must concentrate along a few eigenvectors, is given in Algorithm 3. Each cluster is assumed to contain at least two subscribers.

Algorithm 3 Cluster Selection Heuristics.

1: For the given cluster label vector $C$, determine the two clusters, $C_1$ and $C_2$.

2: For the two clusters, evaluate the sample covariance matrix of the projected message vectors.

3: if a cluster has a covariance matrix = 0 then

4: Return this cluster.

5: else

6: Evaluate the eigenvalues of the cluster covariance matrices.

7: Return the cluster with the highest eigenvalue

8: end if
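One possible reading of the cluster selection heuristic is sketched below with NumPy. The text does not define "eigenvalue concentration" precisely, so this example assumes it to be the share of the total variance carried by the largest eigenvalue of the cluster covariance matrix; comparing the raw largest eigenvalues would be another reading of step 7. The function name and the sample vectors are hypothetical.

```python
import numpy as np


def select_malicious_cluster(X, labels, tol=1e-12):
    """Return the label (0 or 1) of the cluster judged to be malicious.

    X holds one feature vector per user (rows); labels holds the cluster
    index of each user. Each cluster must contain at least two users.
    """
    scores = []
    for c in (0, 1):
        members = X[labels == c]
        cov = np.atleast_2d(np.cov(members, rowvar=False))
        if np.allclose(cov, 0.0, atol=tol):
            return c                          # identical behavior: steps 3-4
        eigvals = np.linalg.eigvalsh(cov)     # ascending order
        # Assumed concentration measure: largest eigenvalue over the trace.
        scores.append(eigvals[-1] / eigvals.sum())
    return int(np.argmax(scores))


# Made-up count vectors: cluster 0 varies along one axis only,
# cluster 1 is spread over all three message types.
X = np.array([
    [10.0, 0.0, 0.0], [12.0, 0.0, 0.0], [14.0, 0.0, 0.0],
    [1.0, 4.0, 2.0], [3.0, 1.0, 5.0], [2.0, 6.0, 1.0], [5.0, 2.0, 3.0],
])
labels = np.array([0, 0, 0, 1, 1, 1, 1])
print(select_malicious_cluster(X, labels))  # 0
```

In this toy data all the variance of cluster 0 lies along a single axis, so its concentration is 1 and it is selected. A ratio is used here so that the comparison does not depend on the overall traffic volume of a cluster, which is a design choice of this sketch rather than something stated in the text.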

Putting all of these steps together, the algorithm to detect the attackers is summarized in Algorithm 4.

Algorithm 4 Attacker Detection.

1: if Global Sequence Kernel is used then

2: Set the weight parameters $\gamma$ and $\rho$ of the pairwise heat kernel.

3: Evaluate the kernel matrix $K_{\mathrm{unnormed}}$ such that $\forall (u_q, u_r) \in U \times U$, we have $K_{q,r} = K_{\mathrm{unnormed}}(u_q, u_r)$, where we use the time-stamped message sequences $W_q$, $W_r$ of the $q$th and $r$th users in the given time interval, respectively, with the alignment kernel, and $U$ is the set of active users.

4: Unit-diagonal normalize $K_{\mathrm{unnormed}}$ to obtain $K$.

5: end if

6: if User Distance Kernel is used then

7: Evaluate the kernel matrix $K$ such that $\forall (u_q, u_r) \in U \times U$, we have $K_{q,r} = K(u_q, u_r)$, where we use the total message count vectors $v_q$, $v_r$ of the $q$th and $r$th users in the given time interval, respectively, with the distance kernel, and $U$ is the set of active users.

8: end if

9: Apply the normalized Laplacian spectral clustering algorithm over $K$ such that #clusters = 2, as defined in Algorithm 2.

10: Use the cluster label vector $C$ returned by the spectral clustering in cluster selection heuristics as defined in Algorithm 3.

11: Return the selected cluster members as the set of attackers.

6. Experiments

As is often reported in the literature, we have also found that obtaining VoIP server datasets and the permission to use them proves to be very problematic, mostly due to the privacy concerns of the subscribers and the commercial secrecy concerns of the telecommunication operators. Therefore, we have used simulated datasets to analyze the performance of the change point detection