Contents lists available at ScienceDirect

Computer Networks

journal homepage: www.elsevier.com/locate/comnet

An intelligent cyber security system against DDoS attacks in SIP networks

Murat Semerci a,*, Ali Taylan Cemgil a, Bülent Sankur b

a Department of Computer Engineering, Bogazici University, Bebek, Istanbul 34342, Turkey

b Department of Electrical and Electronics Engineering, Bogazici University, Bebek, Istanbul 34342, Turkey

Article info

Article history:
Received 26 September 2017
Revised 11 January 2018
Accepted 25 February 2018
Available online 7 March 2018

Keywords:
Anomaly detection
Malicious user detection
DDoS
Mahalanobis distances
Sequence alignment kernel

Abstract

Distributed Denial of Service (DDoS) attacks are among the most frequently encountered cybercriminal activities in communication networks and can result in considerable financial and prestige losses for corporations or governmental organizations. Therefore, autonomous detection of a DDoS attack and identification of its sources is essential for taking counter-measures. This study proposes an intelligent security system against DDoS attacks in communication networks that is composed of two components: a monitor for detection of DDoS attacks and a discriminator for detection of users in the system with malicious intents. A novel adaptive real-time change-point model that tracks the changes in Mahalanobis distances between sampled feature vectors in the monitored system accounts for possible DDoS attacks. A clustering model that runs over the similarity scores of behavioral patterns between the users is used to segregate the malicious from the innocent. The proposed model is deployed over a simulated telephone network that uses a Session Initiation Protocol (SIP) server. The performance of the models is evaluated on data generated by this high-throughput simulation environment.

© 2018 Elsevier B.V. All rights reserved.

1. Introduction

Distributed Denial of Service (DDoS) attacks are one of the major cyber threats on communications networks. DDoS attacks occur very frequently because they are fairly simple and cheap to initiate, while their broad impact on users and service providers can potentially be severe. Such an attack incapacitates the victim server and renders it unable to provide services at all, or at the desired quality of service levels, to its subscribers. With the cost-effective deployment of cloud systems, DDoS attacks might affect the overall availability of the services by targeting more than one server [1]. They can even be a tool for political struggle on a grander scale; a case in point is the set of DDoS attacks on Turkey’s domain name servers by hacktivist groups in December 2015 [2]. As a more radical case, they can be exerted over smart power transmission grids, with potentially more catastrophic consequences [3]. Therefore, automatic detection of DDoS attacks and identification of malicious users are crucial for protecting the network entities and for non-degraded service continuity.

Telephone service providers follow the trend of changing their circuit-switched networks to packet-switched ones in view of the cost-effectiveness and maturity of the Voice-over-IP (VoIP) technology. The most popular protocol for control signaling between communicating parties in VoIP is currently the Session Initiation Protocol (SIP) [4]. SIP is based on a simple, HTTP-like text-based request-response transaction model. It provides basic signaling functionalities required for registering clients, checking their presence and on-line availability, exchanging their communication capabilities, and overall managing the sessions. With the deployment of 5G, VoIP is expected to be one of the major instruments for multimedia communication. The wide deployment of VoIP networks and the key importance of telephone networks have made the security issues of SIP servers extremely important.

* Corresponding author.
E-mail address: murat.semerci@boun.edu.tr (M. Semerci).

VoIP networks are under a variety of cyber threats and the intensity of attacks seems only to be growing [5]. The attacks can be motivated by potential financial benefits, such as pilfering call charges or causing data leakage masqueraded as a stealth threat. Conversely, they may be part of a plan to cause financial losses to the service providers via heavy service disruption [6].

In this paper we introduce a novel real-time online intrusion detection and prevention system for communication networks, particularly for networks with SIP traffic. The proposed system both detects the presence of an attack and identifies the attackers. The system focuses on the DDoS attacks that flood and suffocate a server with an excessive amount of requests. One clue for the occurrence of a DDoS attack is a marked change in the messaging traffic patterns in the network. To this effect, we develop a change detection algorithm that monitors the network traffic intensities at the server side. Significant changes in the characteristics of messaging flows are interpreted as the onset or offset of a potential DDoS attack. We assume tacitly that in a DDoS attack, the attackers are always acting in a coordinated manner.

https://doi.org/10.1016/j.comnet.2018.02.025
1389-1286/© 2018 Elsevier B.V. All rights reserved.

A novel aspect of the proposed change-point detection method is that it relies on the adaptive tracking of Mahalanobis distances between successive state vectors as a way to monitor abnormal changes in messaging traffic. This enables the monitor to adapt itself to the normal traffic regime and/or to the diurnal or seasonal variations while at the same time remaining sensitive to abnormal changes. One advantage of our method is that it is model-free, that is, it is an unsupervised approach to detect traffic anomalies. The system makes use only of the observed messaging traffic type and intensity, and does not require any additional information such as tracebacks. An abnormal change in the traffic regime is declared if the Mahalanobis distance sequence of the state vectors in successive time windows exceeds a threshold function. This threshold value can be set to a constant as a function of system parameters or can be adaptively set. A preliminary version of the Mahalanobis-based anomaly detection algorithm was presented in a conference paper [7]. The first part of this paper presents an extension of that model with an adaptive thresholding function. Notice that the attacker identification, as described in the second part of the paper, was not part of the conference paper. The second novelty of our study is that the algorithm, besides detecting the occurrence of an attack, can also pinpoint the set of attackers. In other words, under certain realistic assumptions, it can discriminate between messaging patterns of the attackers and those of the non-malicious, i.e., normal users. Similarly, the attacker identification model runs in an unsupervised mode and it is independent of the underlying attack model except for the assumption of attacker coordination. Performance results of the algorithm are studied under extensive network traffic and attack traffic simulations.

In Section 2, we give a brief overview of cyber threats related to SIP and proposed remedies for them. In Section 3 we define the variables and symbols used to describe the time series corresponding to the messaging history of the users and the state of the system. In Section 4, we introduce our change point monitor based on Mahalanobis distances as an instrument to detect (D)DoS attacks. The method for normal versus malicious user discrimination is detailed in Section 5. The performance of the proposed methods is evaluated using simulation data and compared against those of competitor algorithms in Section 6. Finally, conclusions are drawn in Section 7 where we also discuss the future work in the context of IoT.

2. Literature review

In addition to the session layer attacks, telecommunication networks are also susceptible to a plethora of other threats below the session layer [8]. Since these are discussed in detail elsewhere, in this work, we focus solely on SIP-specific threats.

SIP attacks typically exploit vulnerabilities in the SIP protocol. Signature-based attacks utilize properties of the SIP grammar, and can be detected by pattern matching between ongoing traffic and the set of signatures. In other words, this type of attack can be determined or even prevented by inspecting the steps that the attacker must follow through. The non-signature based threats, e.g., behavior-based attacks such as DDoS, are harder to detect. SIP threats can be roughly categorized into 4 groups [8]:

• Service Abuse Threats: These attacks include commercial abuse of services to gain some financial benefit such as toll fraud or billing avoidance.

• Eavesdropping, Interception and Modification Threats: These attacks concentrate on illegally intervening in the call with the goal of capturing sensitive information.

• Social Threats: These attacks use protocol shortcomings, misconfigurations or implementation bugs of the SIP server implementation and use these weaknesses to misrepresent the identity of malicious parties to the subscribers.

• (Distributed) Denial of Service ((D)DoS): These attacks focus on the SIP server to prevent it from giving service to the subscribers or to cause significant degradation in the quality of network services. An attacker can achieve this by flooding the server with SIP messages and depleting the network and server resources, such as CPU, memory, bandwidth. In the DoS attack, only one machine is involved to mount the attack on the SIP server. If the attacks are simultaneously performed by many machines, possibly coordinated, the attack becomes a DDoS attack. The botnet attack, where the attack is staged by many zombie machines that are controlled by a master node, is a well-known instance of DDoS.

There is a large variety of possible DDoS attacks, such as the Domain Name Server (DNS) attack and the fuzzing attack [9,10]. The DNS flooding attack wastes the bandwidth resources by injecting fake addresses, tying up the call during address resolution, and causing unnecessary messaging traffic between the DNS and the SIP server. The fuzzing attack, on the other hand, wastes CPU time by forcing the server to parse invalid SIP messages. DDoS attacks in SIP networks can be grouped into four classes: SIP message payload tampering, SIP message flow tampering, SIP message flooding, and finally exploiting SIP vulnerabilities, e.g., for toll fraud [11].

Many methods have been proposed to detect and prevent DDoS attacks in VoIP networks. For example, for the SIP message flooding varieties, an extended finite state machine (EFSM) can be designed for SIP transactions in order to monitor transaction anomalies [12]. Selected network traffic variables are tracked and if an undefined transaction occurs or any traffic variable count exceeds a pre-determined threshold, a preventive action is triggered. A full protocol stack intrusion detection and prevention system for VoIP systems is proposed in [13]. This is a table-based system that collects, correlates and tuples data from different protocols on the communication stack, e.g., MAC addresses, IP addresses, subscriber IDs, packet timestamps. The decisions, such as dropping packets, are given by certain rules applied over these tuples.

In [14], the packets are labeled with respect to their transmission control protocol (TCP) flags. An alarm is raised if the packet counts in a time window deviate from the distribution fitted for the normal traffic. In an alternate line of research, a naive Bayesian classifier has been constructed as a DDoS detector based on network traffic variables. In [7,15], a Bayesian change point model that detects traffic surges or dips, which possibly correspond to DDoS attacks, is proposed. The model is a hierarchical hidden Markov model that links the features extracted from SIP network traffic and server load to latent variables. One set of these variables tracks the hidden dynamics of the system and the others serve as change point indicators. The output of the model is the posterior probability of a change indication, which is calculated at fixed time intervals.

As for the SIP message payload tampering variety, an N-gram technique has been considered to detect the fuzzing attacks exploiting malformed SIP messages. In this case, based on a corpus of SIP messages, which contains both valid and malformed messages, 4-grams, i.e., sequential 4-byte blocks in SIP messages, are extracted. The 4-grams which exceed a given frequency threshold are designated as significant features and their occurrence count vectors are used as features to train classifiers [16]. An experimental study of applying 5 different machine learning models to detect DDoS attacks in SIP-deployed networks has been conducted [17]. The authors have implemented a simulation environment in order to train and evaluate the performance of the models. The classifiers are trained with pre-generated training data collected from SIP message headers, which contain both attacks and normal traffic. The models are required to be re-trained whenever the network or service operating conditions are changed. The trained classifiers are evaluated in terms of accuracy and time overhead required to run them on-line for each message. A recent study proposes using an autoregressive integrated moving average (ARIMA) time series model to classify the normal traffic, DoS and DDoS attacks [18] for IP networks. The number of packets and the number of IP sources are tracked for each time unit and their ratios are stored. The local Lyapunov exponents are calculated for these ratios and these values are compared with a threshold to discriminate malicious from non-malicious traffic type.

A statistical anomaly detection model, to which our method has resemblances, was proposed in [19,20]. This method detects significant deviations in the 3G mobile network traffic patterns based on a variant of Kullback–Leibler (KL) divergence between two empirical distributions. The collected data samples for each observed feature within a time window are fitted into respective univariate histograms. Then, these empirical distributions are compared with reference distributions of the observed features based on the proposed divergence metric. If the distance of any of the inspected feature distributions to that of its corresponding reference exceeds an empirically set threshold, then an alarm is raised to declare a detected anomaly. A human expert gives the final decision about the detected anomaly as to whether it is an attack or not.

The spread of intelligent mobile devices has resulted in a new facet of mobile botnets. The distributed characteristics of the mobile network (capability to change IP addresses frequently) and the huge number of zombie devices easily hacked by malware make it hard to prevent the DDoS attacks with conventional PC-centric solutions. Besides using the Internet for command propagation, the bot master can coordinate the zombies in some exceptional ways such as Bluetooth communication or SMS/MMS messaging. Three different command and control architectures (coordination of zombies by the master) to start a mobile botnet DDoS attack are discussed in [21]. A recent study uses machine learning techniques to discriminate applications that are malware used in mobile botnets. The manifest files of Android Application Packages (APK) are processed to extract features. After some pre-processing steps, the selected features are used in training classifiers to detect the malware [22].

A detailed survey on the historical evolution of botnets is provided in [23]. A detailed review of network intrusion systems which are capable of detecting DDoS attacks and the specific methods used for detection can be found in [24].

Analysis of time series for classification, prediction, change and outlier detection has been an active research topic for decades with particular focus on financial markets [25]. Among the plethora of methods proposed one can mention: (i) methods that map the time series into a new feature space, such as spectral entropy, autocorrelation etc. [26]; (ii) kernel methods for time-series classification with emphasis on sequence alignment [27–29]; (iii) clustering time series with a combined distance function satisfying the triangle similarity, which is the cosine value between two vectors, and dynamic time warping distance [30]; (iv) approaches fitting the data to a number of possible models, such as a hidden Markov model with dynamic time warping, or an autoregressive moving average model with dynamic time warping, and clustering the data based on the model instance with the best fit [31,32]; (v) singular spectral analysis where data is embedded, and the embedding matrix is decomposed and reconstructed into trend, noise and oscillatory components.

Metrics, which are functions to calculate distances between two entities in a set, can be used to detect anomalies in the network traffic, and in [33] two such information metrics have been proposed for DDoS attacks. Similarly, a DDoS detector which uses the Tsallis entropy has been proposed [34]. The Mahalanobis distance, based on the inverse covariance matrix, has been previously used in the detection of abnormal callers (outliers) by inspecting their SIP message flows [35]. In this study, however, we use an adaptively on-line trained variety of the Mahalanobis distance for a time series. We use the time series of Mahalanobis distances accompanying the input time series to detect DDoS attacks as well as to identify the malicious users from their messaging behavior analysis.

One of the first IDS system architectures that uses behavioral analysis to detect DDoS attacks and the malicious attackers was proposed in [36]. The attacking entities aiming for a distributed DoS attack are characterized by a common messaging pattern. This, however, cannot be represented by a rule-based system. The proposed system consists of three components: a sniffer to capture the packets, a preprocessor to extract informative features from the packets and a classifier to detect the anomalies in the traffic.

3. Mathematical notation

We first introduce the notation specific to the communication control, e.g., SIP, messaging. Time is discrete, represented by the instants $t = i\Delta$ at which user behavior data is collected and then processed to output a feature vector. $\Delta$ is an observation interval, e.g., 1 s long, within which user messaging activities are monitored. A messaging activity observed at the server side is the arrival of one of the SIP messages (invite, bye, 200 etc.) from a user or the transmission of such a message to a user.

At the end of this interval, the $r$th user’s activity is denoted by the $d$-dimensional vector $v_r$, where $d$ is the number of different SIP request or response message types taken into consideration. The vector $v_r$ is an integer vector whose components correspond to the number of times each one of the $d$ message types has occurred within the $i$th time frame ($(i-1)\Delta < t < i\Delta$). Not all users are active in each observation interval. An active user, for example the $r$th one, is a registered user that has sent and/or received at least one SIP message within the given observation interval, and it is indicated by $u_r$, $r = 1, \dots, |U|$, where $|U|$ is the cardinality of the set $U$ of active users.

Next let’s look into the details of the user’s count vectors. A count vector results from the sum of individual messaging activities of an active user. The $r$th active user is assumed to run $P_r > 0$ messaging activities within the observation interval. Each messaging activity is represented by $v_r^p$, $p = 1, \dots, P_r$, which is a unit vector with one component being 1, and the rest 0. Let’s call this a message indicator vector, because it indicates which one of the $d$ messages has occurred. Then $v_r = \sum_{p=1}^{P_r} v_r^p$ is simply the count vector of messages sent by the $r$th user, as shown in Fig. 1.

Finally, let us introduce the $d$-dimensional count vector, $x$, called the state vector, that represents the collective activities of all $|U|$ active users within a time frame. The state vector, which is the total message count vector from all users at the server side, is simply the sum of the active user count vectors, $x = \sum_{r=1}^{|U|} v_r$, and this is illustrated in Fig. 2.
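The accumulation of message indicator vectors into user count vectors and then into the server state vector can be sketched as follows. This is a minimal illustration only: the three message types, the user names and the event list are assumptions made for the example and are not taken from the simulation environment.

```python
from collections import defaultdict

# Hypothetical ordering of the d message types; the text only names a few
# of them (invite, bye, 200) as examples.
MESSAGE_TYPES = ["INVITE", "BYE", "200"]


def indicator(msg_type):
    """Unit vector v_r^p marking which of the d message types occurred."""
    return [1 if t == msg_type else 0 for t in MESSAGE_TYPES]


def count_vectors(events):
    """Sum the indicator vectors of each active user: v_r = sum_p v_r^p."""
    d = len(MESSAGE_TYPES)
    counts = defaultdict(lambda: [0] * d)
    for user, msg_type in events:
        unit = indicator(msg_type)
        counts[user] = [c + u for c, u in zip(counts[user], unit)]
    return dict(counts)


def state_vector(counts):
    """Server state x = sum_r v_r over all active users."""
    x = [0] * len(MESSAGE_TYPES)
    for v in counts.values():
        x = [a + b for a, b in zip(x, v)]
    return x


# One observation interval with two active users (made-up events).
events = [("alice", "INVITE"), ("alice", "200"), ("bob", "INVITE"),
          ("alice", "BYE"), ("bob", "INVITE")]
counts = count_vectors(events)
x = state_vector(counts)
print(counts)
print(x)
```

Only users that appear in the event list receive a count vector, which mirrors the definition of an active user.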

We have so far omitted any specific index to denote the time frames to avoid notational clutter. However, we will use the notation $x_i, x_j \in \mathbb{R}^d$ to denote server state vectors at the $i$th and $j$th observation intervals. These feature vectors or server state vectors can be used to monitor the traffic regime changes in a network.

Let $M$ be a $d \times d$ positive (semi)definite matrix ($M \in S_{+}$ or $M \in S_{++}$). $D_M(x_i, x_j)$ is the distance between the feature vectors $x_i$ and $x_j$ calculated over the metric matrix $M$. $f(M \mid x_n : x_{n-k-1})$ is a function of $M$ defined over the time window of length $k$ tracked between feature vectors from $x_{n-k-1}$ to $x_n$, i.e., from the time index $n-k-1$ to the time index $n$. $D_{ld}(A, B)$ is a function defined over any two matrices, $A$ and $B$, of the same dimension.

Fig. 1. The $r$th user count vector resulting from the accumulation of message indicator vectors ($v_r = \sum_{p=1}^{P_r} v_r^p$) in an observation interval between $(i-1)\Delta$ and $i\Delta$.

Fig. 2. The server state vector is the sum of user count vectors ($x = \sum_{r=1}^{|U|} v_r$) in an observation interval between $(i-1)\Delta$ and $i\Delta$.

Fig. 3. The time-stamped user message vector is the concatenation of the user unit message vector and the time it is sent ($w_r^p \in \mathbb{R}^{d+1}$).

Notice that up to this point we have neglected the time stamp information, that is, the actual time instances $t_r^1, \dots, t_r^{P_r}$ within a generic $\Delta$-long time frame, at which the $P_r$ messaging activities, say, of the $r$th user, are occurring. We can incorporate this information by augmenting the dimensionality of the message indicator vector, $v_r^p$, by one, as follows: $(w_r^p)^\top = ((v_r^p)^\top, t_r^p)$. Thus, $w_r^p$ is the timestamp-enriched version of the message indicator vector $v_r^p$. Notice that $w_r^p \in \mathbb{R}^{d+1}$ consists of the concatenation of the message indicator vector $v_r^p$ and the time instance at which the message occurs, $t_r^p$, as given in Fig. 3.

Given the definitions above, any user $u_r$ can be mapped to a time series, which can be represented as one of these two matrices: $V_r = [v_r^1 \mid v_r^2 \mid \dots \mid v_r^{P_r}]$ or $W_r = [w_r^1 \mid w_r^2 \mid \dots \mid w_r^{P_r}]$. The kernel function that measures the similarity of any user pair $(u_q, u_r)$ is represented as $K(u_q, u_r)$. $\kappa(w_r^{p_r}, w_q^{p_q})$ is defined as the heat kernel to calculate the similarity between time-stamped message vectors of any two users in the same interval: the $p_r$th message of the $r$th user and the $p_q$th message of the $q$th user. Using the user pair kernel functions, for that time interval, we can calculate the kernel matrix $K$ of size $|U| \times |U|$.

4. Adaptive distance-based change point detection estimator

Feature instances extracted from adjacent intervals within the correlation length of a stationary process tend to have high statistical similarity. On the other hand, features originating from different generative processes or from different sections of a non-stationary process can be expected to have large pairwise distances. Based on this premise, a significant change in the distances between consecutive feature vectors in a time series can be interpreted as an indicator of a change in the data generating process. The Hidden Markov Model (HMM) can capture these regime changes as a switching variable from one generator to another in the hidden layer. In the context of communication networks, such an abrupt change in feature vectors corresponding to traffic intensity patterns and/or server resource utilization rates can be conjectured to signal a DDoS attack. A Distance-based Change Point Method (DCPM), as used in our work, first tracks the distances between sequential feature vectors and then computes the statistics of these distances to decide whether a change has occurred. Judicious choice of a distance function can prove critical in the performance of machine learning algorithms. To this effect, one can use one of the well-known distance functions or attempt to learn a distance function specific to the problem at hand. In this work, we have opted to use a learning scheme for the Mahalanobis distance.

4.1. Mahalanobis distance

The Mahalanobis distance $D_M$ between $x_i, x_j \in \mathbb{R}^d$ can be calculated as in Eq. (1). The Mahalanobis distance is defined over symmetric positive semi-definite (PSD) matrices, $M \in S_{+}^{d \times d}$, and the choice of $M$ can be made to account for the correlations between features and the differences between scales. The inverse of a full rank sample covariance matrix, $\Sigma$, gives rise to a special case of the Mahalanobis distance ($M \in S_{++}$), which assumes the data is generated from a multivariate Gaussian distribution. Under this choice and the Gaussian assumption, it can be shown that $M$ maps the data to uncorrelated and unit-variance Gaussian variables. Conversely, if the features follow a standard Gaussian distribution with uncorrelated components, then we have: $M = \Sigma = I$.

$$D_M(x_i, x_j) = (x_i - x_j)^\top M (x_i - x_j) \tag{1}$$

Any symmetric positive semi-definite matrix can be factorized as $M = A^\top A$ such that $A$ is an $e \times d$ projection matrix and $e \le d$. Thus, the relation below can be obtained.

$$
\begin{aligned}
D_M(x_i, x_j) &= (x_i - x_j)^\top M (x_i - x_j) \\
&= (x_i - x_j)^\top A^\top A (x_i - x_j) \\
&= \big(A (x_i - x_j)\big)^\top A (x_i - x_j) \\
&= (A x_i - A x_j)^\top (A x_i - A x_j) \\
&= \lVert a_i - a_j \rVert_2^2 = D_E(a_i, a_j) = D_A(x_i, x_j)
\end{aligned}
\tag{2}
$$

where $a_i = A x_i$ is the projected vector, and $D_E$ is the (squared) Euclidean distance. Eq. (2) shows that the Mahalanobis distance in the feature space is equivalent to the Euclidean distance in a projected space.
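The equivalence in Eq. (2) can be checked numerically with a few lines of code. The sketch below is only an illustration: the projection matrix `A` and the two feature vectors are arbitrary values chosen for the example, and the helper functions are not part of the original work.

```python
def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matvec(A, x):
    """Matrix-vector product A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def matmul(A, B):
    """Matrix-matrix product A B."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def mahalanobis(M, xi, xj):
    """Eq. (1): (xi - xj)^T M (xi - xj)."""
    diff = [a - b for a, b in zip(xi, xj)]
    return sum(v * w for v, w in zip(diff, matvec(M, diff)))


def squared_euclidean(ai, aj):
    """Squared Euclidean distance between two projected vectors."""
    return sum((a - b) ** 2 for a, b in zip(ai, aj))


# Arbitrary e x d projection matrix with e = 2 and d = 3.
A = [[1.0, 2.0, 0.0],
     [0.0, 1.0, -1.0]]
M = matmul(transpose(A), A)          # M = A^T A is positive semi-definite

xi = [3.0, 1.0, 4.0]
xj = [1.0, 5.0, 9.0]

d_m = mahalanobis(M, xi, xj)
d_e = squared_euclidean(matvec(A, xi), matvec(A, xj))
print(d_m, d_e)
```

Both quantities agree because the quadratic form with $M = A^\top A$ is the squared norm of the projected difference vector.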

4.2. Distance-based change point model

The distance-based change detection is achieved by inspecting the sum of distances over a sliding window, called the moving distance, where distances between the current feature vector and its immediate predecessors in a time frame of size $k$ are summed. The result of the sliding window sum is compared with a threshold value, $\tau_{th}$, and an alarm is raised for the potential occurrence of a regime change when the threshold is exceeded. This step is followed by the malicious user discrimination algorithm, as detailed in Section 5. The main novelty of this method is that we learn the weight matrix $M$ (called the Mahalanobis metric from now on) under a loss function so that the detection algorithm is adapted to inlier variations and trends in the traffic intensity to avoid false alarms. The inlier variations can be due to diurnal or week-day based changes or to short-lived sporadic flurries of call activities.

The moving distance over a $k$-sized time frame can be defined as a function of the symmetric positive definite matrix, $M \in S_{++}$, as follows:

$$f(M \mid x_n : x_{n-k-1}) = \sum_{j=n-k-1}^{n-1} (x_n - x_j)^\top M (x_n - x_j) \tag{3}$$

If the moving distance computed using the current Mahalanobis metric is above the threshold, $f(M_{n-1} \mid x_n : x_{n-k-1}) > \tau_{th}$, then an alarm is raised. The Mahalanobis metric is updated periodically at each time interval under the loss function given below:

$$\min_{M \in S_{++}} \; f(M \mid x_n : x_{n-k-1}) + \lambda D_{ld}(M, M_{n-1}) + \beta D_{ld}(M, I) \tag{4}$$

In Eq. (4), the second and the third terms, $\lambda D_{ld}(M, M_{n-1})$ and $\beta D_{ld}(M, I)$, respectively, are regularization functions based on the logarithmic determinant divergence [37] (LogDet). The LogDet function is a pseudo-metric that measures the distance between two matrices and is defined in Eq. (5). Detailed information about the LogDet function is given in the Appendix. The former regularizer forces the updated matrix to be as similar as possible to its predecessor. The latter one forces it to be as close as possible to the identity matrix to prevent it from converging to an irrelevant matrix and at the same time to induce sparsity. Thus, their relative weights can be gauged to trade off the update rate of the Mahalanobis metric and the aging of the effect of the past measurements. The four parameters to be set are the sliding window size $k$ (time frame size), the two regularization cost weights, $\lambda$ and $\beta$, and the parameter $\alpha$ for thresholding. At the start of the algorithm, $M_0$ is initialized as the identity matrix, $M_0 = I$. Since the LogDet is a convex function of $M$, we are guaranteed to find the optimal positive definite matrix that minimizes the criterion in Eq. (4).

$$D_{ld}(M, M_{t-1}) = \operatorname{tr}\!\left(M M_{t-1}^{-1}\right) - \log\det\!\left(M M_{t-1}^{-1}\right) - d \tag{5}$$

where $\operatorname{tr}(\cdot)$ is the trace function for the matrices.

The optimal Mahalanobis metric ($M$) can be found by taking the derivative of Eq. (4) and setting it to zero.

$$M = \left( \frac{\lambda}{\lambda + \beta} M_{n-1}^{-1} + \frac{\beta}{\lambda + \beta} I + \frac{1}{\lambda + \beta} \sum_{j=n-k-1}^{n-1} (x_n - x_j)(x_n - x_j)^\top \right)^{-1} \tag{6}$$

This Mahalanobis metric update is repeated at each time index. The change detection algorithm is given in Algorithm 1.
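For readers who want the intermediate step between Eq. (4) and Eq. (6), the derivation can be sketched with standard matrix-calculus identities. The shorthand $S_n$ for the scatter matrix of the window differences is introduced here only for this sketch and does not appear in the original notation.

```latex
% S_n is shorthand (introduced for this sketch) for the scatter of the window differences.
S_n = \sum_{j=n-k-1}^{n-1} (x_n - x_j)(x_n - x_j)^{\top},
\qquad
f(M \mid x_n : x_{n-k-1}) = \operatorname{tr}(M S_n).

% Gradients with respect to a symmetric positive definite M, using Eq. (5):
\frac{\partial}{\partial M}\operatorname{tr}(M S_n) = S_n, \qquad
\frac{\partial}{\partial M} D_{ld}(M, M_{n-1}) = M_{n-1}^{-1} - M^{-1}, \qquad
\frac{\partial}{\partial M} D_{ld}(M, I) = I - M^{-1}.

% Setting the gradient of Eq. (4) to zero:
S_n + \lambda \left( M_{n-1}^{-1} - M^{-1} \right) + \beta \left( I - M^{-1} \right) = 0
\;\Longrightarrow\;
(\lambda + \beta)\, M^{-1} = \lambda M_{n-1}^{-1} + \beta I + S_n ,

% which rearranges to Eq. (6):
M = \left( \frac{\lambda}{\lambda + \beta} M_{n-1}^{-1}
  + \frac{\beta}{\lambda + \beta} I
  + \frac{1}{\lambda + \beta} S_n \right)^{-1}.
```

The inverse of the new metric is therefore a weighted combination of the previous inverse metric, the identity, and the scatter of the most recent window, which is what makes the update cheap to repeat at every time index.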

4.3. Thresholding of the moving distances

The characteristics of the moving average of distances depend on the traffic volume intensity, the dimension of the feature vector, the size of the time frame etc., and hence it becomes critical to set a threshold value judiciously to detect regime anomalies or abrupt changes. In this study we comparatively test two different threshold functions.

Algorithm 1 Adaptive Online Distance-Based Change Point Detection Algorithm.

Initialize $M_0$ (default $I$).
Set $k$, $\lambda$, $\beta$ and $\alpha$ (for $\tau_{th}$).
repeat
    Inspect the SIP traffic in the time window of size $k$, and compute the count vector.
    if $f(M_{n-1} \mid x_n : x_{n-k-1}) > \tau_{th}$ then
        Raise alarm.
        Run the malicious user detector defined in Algorithm 4.
    end if
    Evaluate $M$ using Eq. (6).
    Set $M_{n-1} = M$.
until the flow ends
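The loop of Algorithm 1 can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the fixed `threshold` argument, the synthetic state vectors and the small Gauss-Jordan inverse are all choices made for the example, and the call to the malicious user detector is reduced to recording the alarm index.

```python
def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (adequate for small d)."""
    d = len(A)
    aug = [list(row) + ident for row, ident in zip(A, identity(d))]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def moving_distance(M, window, x_n):
    """Eq. (3): sum of Mahalanobis distances from x_n to the window vectors."""
    total = 0.0
    for x_j in window:
        diff = [a - b for a, b in zip(x_n, x_j)]
        Md = [sum(m * v for m, v in zip(row, diff)) for row in M]
        total += sum(v * w for v, w in zip(diff, Md))
    return total


def update_metric(M_prev, window, x_n, lam, beta):
    """Eq. (6): closed-form minimiser of the regularised loss in Eq. (4)."""
    d = len(x_n)
    M_prev_inv = inverse(M_prev)
    acc = [[lam * M_prev_inv[i][j] + (beta if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    for x_j in window:
        diff = [a - b for a, b in zip(x_n, x_j)]
        for i in range(d):
            for j in range(d):
                acc[i][j] += diff[i] * diff[j]
    scaled = [[v / (lam + beta) for v in row] for row in acc]
    return inverse(scaled)


def detect(states, k, lam, beta, threshold):
    """Return the indices n at which the moving distance exceeds threshold."""
    M = identity(len(states[0]))          # M_0 = I
    alarms = []
    for n in range(k + 1, len(states)):
        window = states[n - k - 1:n]      # x_{n-k-1}, ..., x_{n-1}
        if moving_distance(M, window, states[n]) > threshold:
            alarms.append(n)              # attacker discrimination would start here
        M = update_metric(M, window, states[n], lam, beta)
    return alarms


# Synthetic example: steady traffic followed by a sudden surge.
states = [[10.0, 5.0]] * 6 + [[60.0, 40.0]]
alarms = detect(states, k=3, lam=1.0, beta=1.0, threshold=100.0)
print(alarms)
```

The metric is updated after every interval whether or not an alarm was raised, so large differences observed in one window reduce the weight given to similar differences in the next one.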

Experimental evidence has shown that we can approximate the distribution of the moving sum of distances as a Chi-squared distribution. It is then assumed that Mahalanobis distances are obtained from a Gaussian distribution such that $\mu = x_n$ in the immediate past observation interval, and $\Sigma = M^{-1}$. If $y$, which is the set of observations in the current sliding window, is a $d$-dimensional random vector drawn from a Gaussian distribution with a mean vector $\mu$ and a $d$-rank covariance matrix $\Sigma$, then $z = (y - x_n)^\top M (y - x_n) = (y - \mu)^\top \Sigma^{-1} (y - \mu)$ becomes Chi-squared distributed with $d$ degrees of freedom.

Let $z_i$ denote one of $k$ independent, identically distributed random variables that follow a chi-square distribution such as $z_1 \sim \chi^2_{\alpha, d_1}$, $z_2 \sim \chi^2_{\alpha, d_2}$, ..., $z_k \sim \chi^2_{\alpha, d_k}$. Due to the additive property of independent chi-squared variables, the sum of the random variables follows a chi-square distribution with $d_1 + d_2 + \dots + d_k$ degrees of freedom. That is,

$$Z = z_1 + z_2 + \dots + z_k, \qquad Z \sim \chi^2_{\alpha,\, d_1 + d_2 + \dots + d_k} \tag{7}$$

Thus, the threshold of our anomaly detection model becomes $\tau_{th} = \chi^2_{\alpha,\, k \ast d}$. The $\alpha$ parameter is the probability of accepting a chance fluctuation as an anomaly. In other words, in the absence of an attack, the score of the moving average of distances, denoted by $Z$ above, has a probability less than $\alpha$ to exceed the threshold $\tau_{th}$. The converse event of $Z$ exceeding the threshold can be accepted as an anomaly with probability $1 - \alpha$. The value of $\alpha$ depends on the requirements of the system and it is typically set by a human expert to some value such as $\alpha \in \{0.1, 0.05, 0.02, 0.01\}$. This is a statistical approach that is based on the sum of observed distances.

An alternate, empirically found constant threshold, which is a function of two system parameters, is given below:

$$\tau_{th} = c\,k \left( \frac{d}{2} \right)^{2} \tag{8}$$

and is found to work equally well. This fixed threshold value only depends on the time frame size ($k$), the number of dimensions ($d$) and a constant $c$. As a plausible argument for the fact that the constant thresholding function works equally well, we observe that the same parameters ($k$ and $d$) are also inherent in the $\chi^2$ thresholding. Notice also that there is some liberty in adjusting this threshold by setting the constant $c$ according to the requirements of the deployed system. A case in point could be to make the constant indexed by time periods, $c_n$, e.g., to account for seasonal trends.
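The two threshold choices can be compared with a short sketch. Two assumptions are made here and should be read as such: the chi-square critical value is approximated with the Wilson-Hilferty formula (a swapped-in approximation that keeps the example within the Python standard library, not the procedure used in the study), and Eq. (8) is read as $c\,k\,(d/2)^2$. The function names and the example values of `alpha`, `k`, `d` and `c` are illustrative.

```python
from statistics import NormalDist


def chi2_upper_quantile(alpha, dof):
    """Approximate x such that P(chi2_dof > x) = alpha (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    h = 2.0 / (9.0 * dof)
    return dof * (1.0 - h + z * h ** 0.5) ** 3


def chi2_threshold(alpha, k, d):
    """Statistical threshold: chi-square critical value with k * d degrees of freedom."""
    return chi2_upper_quantile(alpha, k * d)


def constant_threshold(c, k, d):
    """Empirical threshold, assuming Eq. (8) reads c * k * (d / 2) ** 2."""
    return c * k * (d / 2.0) ** 2


# Example values only: a window of k = 5 intervals and d = 2 message types.
print(chi2_threshold(0.05, k=5, d=2))
print(constant_threshold(1.0, k=5, d=2))
```

A smaller `alpha` yields a larger critical value and hence fewer alarms, while the constant `c` plays the corresponding tuning role for the fixed threshold.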

More importantly, even though the threshold is set to a constant, the system is still an adaptive model due to the adaptation inherent in the updates of the Mahalanobis metric. At each observation interval, the Mahalanobis metric is updated to accommodate the new distances between the observations. Therefore whenever the threshold is exceeded, it means that the quadratic smoother could not smooth out the new measurement digressions, and therefore it is very likely to be an anomaly.

Fig. 4. All possible alignments of $W_q = [w_q^1 \mid w_q^2 \mid w_q^3]$ and $W_r = [w_r^1 \mid w_r^2]$.

5. Malicious user discrimination

If a detected anomaly is in fact a DDoS attack, the next task is to identify the set of malicious users that are presumably coordinating to mount a distributed attack. For this analysis, each subscriber’s behavior history in the observation interval is represented as a time series, as given in Fig. 1. We process the time series using a similarity function so that the subscribers with similar behavior patterns are clustered into the same group. We have proposed and evaluated two different attacker discrimination methods. The first one is based on a global time series alignment kernel that makes use of both epoch differences and feature distances between message sequences. The second one uses the user message count vectors at the end of periodic observation intervals, i.e., the information on message time instants is ignored. The pairwise similarity of any two users is calculated using their count vectors.

5.1. Sequence alignment kernel

We consider the ensemble of the timestamped messages sent by a user within a time frame of $k$ units, say $(n-k-1), \ldots, (n-1)$, as message sequences. Each user's sequence can have a different number of messaging events, each event occurring at a different time instant. In other words, a user's message sequence or time series corresponds to the ensemble of messages sent by a registered terminal within the designated observation interval, each event being characterized by the type of SIP message and its timestamp.

Our goal is to estimate the similarity of messaging activities of the users via a kernel-based scheme. For this purpose, the message sequences must be aligned without pair repetition. The similarity between two sequences of possibly different lengths, i.e., number of messaging events, can be determined as the sum of similarities of all their feasible alignments. Thus two sequences are more similar as a pair if their messaging types, e.g., invite or bye, and their occurrences in time resemble each other.

Let us assume the user time series, i.e., timestamped message sequences, $(W^q, W^r)$ of the user pair $(u_q, u_r)$, $W^q = [w_1^q \mid w_2^q \mid w_3^q]$ and $W^r = [w_1^r \mid w_2^r]$, with three and two messaging events, respectively. Fig. 4 shows an example of all possible alignments for these two sequences. In this specific example, there are 5 possible alignments, as follows:

$(w_1^q, w_1^r), (w_1^q, w_2^r), (w_2^q, w_2^r), (w_3^q, w_2^r)$

$(w_1^q, w_1^r), (w_2^q, w_2^r), (w_3^q, w_2^r)$

$(w_1^q, w_1^r), (w_2^q, w_1^r), (w_2^q, w_2^r), (w_3^q, w_2^r)$

$(w_1^q, w_1^r), (w_2^q, w_1^r), (w_3^q, w_2^r)$

$(w_1^q, w_1^r), (w_2^q, w_1^r), (w_3^q, w_1^r), (w_3^q, w_2^r)$
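The enumeration above can be reproduced with a short sketch. It assumes that an alignment is a path of index pairs that starts at the first pair, ends at the last pair, and advances one or both indices by one at each step; the function name `enumerate_alignments` is hypothetical and used only for illustration.

```python
def enumerate_alignments(p_q, p_r):
    """List all global alignments of two sequences with p_q and p_r events.

    Each alignment is a list of 1-based index pairs (i, j), meaning that the
    i-th event of user q is matched with the j-th event of user r.
    Assumes p_q >= 1 and p_r >= 1.
    """
    def extend(path):
        i, j = path[-1]
        if (i, j) == (p_q, p_r):
            return [path]
        found = []
        # Advance r only, both, or q only; no pair is ever repeated.
        for di, dj in ((0, 1), (1, 1), (1, 0)):
            ni, nj = i + di, j + dj
            if ni <= p_q and nj <= p_r:
                found.extend(extend(path + [(ni, nj)]))
        return found

    return extend([(1, 1)])


alignments = enumerate_alignments(3, 2)
for alignment in alignments:
    print(alignment)
print(len(alignments))  # 5 alignments for three and two events
```

The recursion is exponential in the sequence lengths, so it is only useful for checking small examples by hand.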

A global alignment kernel has been proposed in [38], which uses dynamic programming to compute the similarity of all possible alignments of two sequences. We use a variation of this algorithm, where we employ a pairwise heat kernel that is based on the Mahalanobis distance and differences of timestamps.

5.1.1. Global sequence alignment kernel

Given the two message sequences $W^q = [w_1^q \mid w_2^q \mid \ldots \mid w_{P_q}^q]$ and $W^r = [w_1^r \mid w_2^r \mid \ldots \mid w_{P_r}^r]$ for the user pair $(u_q, u_r)$ in a state space, we set the doubly-indexed series $T_{p_q,p_r}$ as $T_{p_q,0} = 0$ for $p_q = 1, \ldots, P_q$, $T_{0,p_r} = 0$ for $p_r = 1, \ldots, P_r$, and $T_{0,0} = 1$. We also assume that there is a function to measure the similarity between the $p_q$-th signaling event of user $u_q$ and the $p_r$-th signaling event of the other user $u_r$, $\kappa(w^q_{p_q}, w^r_{p_r})$. Computing the terms recursively for $(p_q, p_r) \in \{1, \ldots, P_q\} \times \{1, \ldots, P_r\}$, one has:

$$T_{p_q,p_r} = \left(T_{p_q,p_r-1} + T_{p_q-1,p_r-1} + T_{p_q-1,p_r}\right) \kappa\left(w^q_{p_q}, w^r_{p_r}\right) \tag{9}$$

Finally, the unnormalized similarity between two users $(u_q, u_r)$ is measured when the recursion has considered all possible alignments, that is:

$$K_{\mathrm{unnormed}}(u_q, u_r) = T_{P_q,P_r} \tag{10}$$
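The recursion of Eq. (9) and the read-out of Eq. (10) can be sketched as a small dynamic program. This is a minimal illustration rather than the authors' implementation: the function name `alignment_kernel` is hypothetical, and the pairwise similarity is passed in as an arbitrary callable `kappa`.

```python
def alignment_kernel(seq_q, seq_r, kappa):
    """Unnormalized global alignment similarity of two event sequences.

    seq_q, seq_r: sequences of events (any objects accepted by kappa).
    kappa: callable returning the similarity of two events.
    """
    p_q, p_r = len(seq_q), len(seq_r)
    # Row 0 and column 0 hold the boundary conditions:
    # T[0][0] = 1, all other boundary entries are 0.
    T = [[0.0] * (p_r + 1) for _ in range(p_q + 1)]
    T[0][0] = 1.0
    for i in range(1, p_q + 1):
        for j in range(1, p_r + 1):
            # Eq. (9): sum over the three predecessor cells, then weight by
            # the similarity of the current pair of events.
            T[i][j] = (T[i][j - 1] + T[i - 1][j - 1] + T[i - 1][j]) * kappa(
                seq_q[i - 1], seq_r[j - 1]
            )
    return T[p_q][p_r]  # Eq. (10)


# With a constant similarity of 1, the kernel simply counts the alignments.
print(alignment_kernel("abc", "ab", lambda a, b: 1.0))  # 5.0
```

With a constant similarity of 1 the result equals the number of alignments, which gives a quick sanity check against the five alignments of the three-event and two-event example. The cost is proportional to the product of the two sequence lengths.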

After the kernel matrix for all user pairs has been obtained, we unit-diagonal normalize the $|U| \times |U|$ kernel matrix, where $|U|$ is the number of active users in the system, in order to eliminate any scaling issues:

$$K(u_q, u_r) = \frac{K_{\mathrm{unnormed}}(u_q, u_r)}{\sqrt{K_{\mathrm{unnormed}}(u_q, u_q)\, K_{\mathrm{unnormed}}(u_r, u_r)}}, \qquad q, r = 1, \ldots, |U|, \qquad K(u_q, u_r) \in [0, 1] \tag{11}$$

We will call this kernel the time series kernel.
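As a small illustration of the unit-diagonal normalization in Eq. (11), the following sketch divides each entry by the geometric mean of the two corresponding diagonal entries. The function name `unit_diagonal_normalize` and the toy matrix are assumptions made for the example only.

```python
import math


def unit_diagonal_normalize(K_unnormed):
    """Scale a square similarity matrix so that every diagonal entry is 1."""
    n = len(K_unnormed)
    diag = [K_unnormed[i][i] for i in range(n)]
    return [
        [K_unnormed[q][r] / math.sqrt(diag[q] * diag[r]) for r in range(n)]
        for q in range(n)
    ]


# Toy 2-user example (values are illustrative, not from the paper).
K = unit_diagonal_normalize([[4.0, 1.0], [1.0, 9.0]])
print(K)
```

After normalization a user's similarity with itself is 1, so users with long message sequences no longer dominate the matrix merely because their unnormalized scores are larger.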

5.1.2. Pairwise heat kernel

Each user in a time window can be represented in terms of her ordered timestamped message sequence. Recall that user sequences can have differing lengths and can consist of different types of messages.

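A kernel function (pairwise heat function) for any two timestamped vectors, $w^q_{p_q} = (v^q_{p_q}, t^q_{p_q})$ and $w^r_{p_r} = (v^r_{p_r}, t^r_{p_r})$, is evaluated as:

$$\kappa\left(w^q_{p_q}, w^r_{p_r}\right) = \exp\left(-\gamma\, D_M\left(v^q_{p_q}, v^r_{p_r}\right) - \rho\, \left|t^q_{p_q} - t^r_{p_r}\right|\right), \qquad D_M\left(v^q_{p_q}, v^r_{p_r}\right) = \left(v^q_{p_q} - v^r_{p_r}\right)^\top M \left(v^q_{p_q} - v^r_{p_r}\right) \tag{12}$$

where $M$ is the Mahalanobis metric evaluated at that observation interval as in Eq. (6). Note that $\kappa(w^q_{p_q}, w^r_{p_r}) = 1$ iff $v^q_{p_q} = v^r_{p_r}$ and $t^q_{p_q} = t^r_{p_r}$. The coefficients $\gamma$ and $\rho$ determine the weights of the message type distance and the timing distance, respectively. In this study we have assumed $\gamma = \rho = 1$.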
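A minimal sketch of the pairwise heat kernel of Eq. (12) is given below. Several details are assumptions made for illustration only: each event is a `(vector, timestamp)` tuple, message types are one-hot encoded, and the identity matrix stands in for the Mahalanobis metric $M$, which in the paper is learned from the data.

```python
import math


def mahalanobis_form(v_a, v_b, M):
    """Quadratic form (v_a - v_b)^T M (v_a - v_b) for list-based inputs."""
    diff = [a - b for a, b in zip(v_a, v_b)]
    d = len(diff)
    return sum(diff[i] * M[i][j] * diff[j] for i in range(d) for j in range(d))


def heat_kernel(w_a, w_b, M, gamma=1.0, rho=1.0):
    """Similarity of two timestamped events, each given as (vector, time)."""
    (v_a, t_a), (v_b, t_b) = w_a, w_b
    exponent = -gamma * mahalanobis_form(v_a, v_b, M) - rho * abs(t_a - t_b)
    return math.exp(exponent)


# Hypothetical encoding: INVITE = [1, 0], BYE = [0, 1]; identity metric.
M = [[1.0, 0.0], [0.0, 1.0]]
invite_at_0 = ([1.0, 0.0], 0.0)
bye_at_half = ([0.0, 1.0], 0.5)
print(heat_kernel(invite_at_0, invite_at_0, M))  # 1.0 for identical events
print(heat_kernel(invite_at_0, bye_at_half, M))
```

Larger values of `gamma` make the kernel more sensitive to differences in message type, while larger values of `rho` penalize timing differences more strongly; both default to 1 as in the study.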

5.2. User distance kernel

A kernel matrix of pairwise user-to-user similarities can be created based on their Mahalanobis distances. User pairs would have high similarity (close to 1) if their Mahalanobis distance is close to 0; conversely, if the pair similarity is small (close to 0), then their distance is large. The Mahalanobis distance kernel can be regarded as a variant of the Gaussian kernel.

Any two users, $u_q$ and $u_r$, can be compared based on their messaging count vectors $v_q, v_r \in \mathbb{R}^d$, as follows:

$$K(u_q, u_r) = \exp\left(-\left(v_q - v_r\right)^\top M \left(v_q - v_r\right)\right) \tag{13}$$


We will call this kernel simply the distance kernel. $K(u_q, u_r)$ is 1 iff $v_q = v_r$. Note that this feature vector does not take into account the occurrence timing of the messages, but averages the messaging traffic in that interval. We would like to point out again the difference between the two ways of measuring user behavior differences. In Eq. (13), we consider the messaging events integrated over the observation interval, represented by the $d$-dimensional count vector of messaging events according to their types. In Eqs. (11) and (12), we calculate the difference of user behaviors by comparing and measuring distances, messaging event by messaging event, as they occur during the observation interval.
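The distance kernel of Eq. (13) can be evaluated for all user pairs at once. The sketch below assumes NumPy is available, that the count vectors are stacked row-wise in a matrix `V`, and that a metric `M` is supplied by the caller; the function name `distance_kernel` and the toy counts are hypothetical.

```python
import numpy as np


def distance_kernel(V, M):
    """Pairwise kernel matrix exp(-(v_q - v_r)^T M (v_q - v_r)).

    V: array of shape (n_users, d), one message count vector per user.
    M: array of shape (d, d), the Mahalanobis metric.
    """
    V = np.asarray(V, dtype=float)
    M = np.asarray(M, dtype=float)
    # diffs[q, r, :] = v_q - v_r for every pair of users.
    diffs = V[:, None, :] - V[None, :, :]
    # Quadratic form evaluated for every pair (q, r).
    sq_dist = np.einsum("qri,ij,qrj->qr", diffs, M, diffs)
    return np.exp(-sq_dist)


# Three users, two message types; the first two users behave identically.
V = np.array([[3.0, 1.0], [3.0, 1.0], [0.0, 5.0]])
M = 0.1 * np.eye(2)
print(distance_kernel(V, M))
```

Because only the per-interval counts enter the computation, two users who send the same numbers of each message type obtain similarity 1 regardless of when the messages were sent.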

5.3. Spectral clustering

A matrix of pairwise user-to-user similarities is created from the users' messages as in Eq. (11) or (13). The kernel matrix, $K$, then corresponds to a fully connected weighted adjacency graph, where the users are the vertices and the similarities are the edge weights. The adjacency matrix is expected to consist of two sub-graphs: one representing the malicious users characterized by similar behavior patterns, and the other representing the non-malicious users with random-like behavior patterns. In order to partition this graph into these two sub-graphs, we have used the normalized Laplacian spectral clustering algorithm. Such algorithms are conceived to find graph partitioning solutions in clustering problems. In the literature there are various spectral clustering algorithms. We have preferred to use normalized Laplacian spectral clustering because we want not only to have similar nodes projected close to each other, but also to have dissimilar nodes projected far from each other. The normalized spectral methods satisfy both of these criteria, as discussed in [39].

The degree of the $q$th active user in the kernel matrix, which is the sum of all the weight entries related to the $q$th active user, at a given time frame is evaluated as:

$$\mathrm{dg}_q = \sum_{r=1}^{|U|} K_{q,r} \tag{14}$$

where $K_{q,r} = K(u_q, u_r)$.

The degree matrix $D$ is a diagonal matrix whose diagonal elements contain the degree values, $\mathrm{dg}_1, \mathrm{dg}_2, \ldots, \mathrm{dg}_{|U|}$. The Laplacian matrix, $L$, is evaluated as in Eq. (15) and the spectral clustering algorithm is given in Algorithm 2.

Algorithm 2 Normalized Laplacian Spectral Clustering.

1: Given $K$, evaluate $D$ and $L$, which are all in $\mathbb{R}^{|U| \times |U|}$.

2: Compute the first two eigenvectors, $\psi_1$ and $\psi_2$, of the two smallest eigenvalues $0 = \lambda_1 < \lambda_2$ for the generalized eigenproblem $L\psi = \lambda D\psi$, whose eigenvalues $\lambda_1, \ldots, \lambda_{|U|}$ form the diagonal matrix $\Lambda$.

3: Matricize the $\psi_1$ and $\psi_2$ vectors to obtain $\Psi \in \mathbb{R}^{|U| \times 2}$. Use the rows of $\Psi$ as the new feature vectors in the mapped space, $y \in \mathbb{R}^2$. Apply 2-means clustering.

4: Return the cluster label vector $C$ from 2-means clustering.

$$L = D - K \tag{15}$$

where $K$ is the $|U| \times |U|$ kernel matrix whose entries, $K(u_q, u_r)$, are calculated as in Eq. (11) or (13).
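The steps of Algorithm 2 can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the generalized eigenproblem is solved through the equivalent symmetric problem $D^{-1/2} L D^{-1/2} z = \lambda z$ with $\psi = D^{-1/2} z$, and 2-means is a plain Lloyd iteration seeded with the two extreme users along $\psi_2$. The function name `spectral_bipartition` and the toy similarity matrix are hypothetical.

```python
import numpy as np


def spectral_bipartition(K, n_iter=100):
    """Split users into two clusters from a positive similarity matrix K."""
    K = np.asarray(K, dtype=float)
    dg = K.sum(axis=1)                     # degrees, Eq. (14)
    L = np.diag(dg) - K                    # Laplacian, Eq. (15)
    # Solve L psi = lambda D psi via the symmetric matrix D^-1/2 L D^-1/2.
    d_inv_sqrt = 1.0 / np.sqrt(dg)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, Z = np.linalg.eigh(L_sym)           # eigenvalues in ascending order
    Psi = d_inv_sqrt[:, None] * Z[:, :2]   # one 2-D row per user
    # 2-means (Lloyd's algorithm), seeded with the extremes along psi_2.
    centers = Psi[[np.argmin(Psi[:, 1]), np.argmax(Psi[:, 1])]]
    for _ in range(n_iter):
        dists = np.linalg.norm(Psi[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([Psi[labels == c].mean(axis=0) for c in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Toy matrix: users 0-2 are mutually similar, users 3-4 are mutually similar.
K = np.full((5, 5), 0.05)
K[:3, :3] = 0.9
K[3:, 3:] = 0.9
np.fill_diagonal(K, 1.0)
print(spectral_bipartition(K))
```

The cluster labels 0 and 1 are arbitrary; deciding which of the two clusters contains the attackers is a separate step.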

5.4. Automatic identification of malicious users cluster

The malicious users are conjectured to be characterized by repetitive and correlated behaviors, and the rest of the users are characterized by uncoordinated and diverse behaviors. Once the two clusters are obtained, the final task is that of distinguishing the attacker set.

For each of the two clusters, we compute the sample covariance matrix of the user message sequence vectors in that cluster. Since the malicious user cluster is assumed to consist of similar messaging behaviors, such message vectors are expected to be more strongly aligned along a few particular axes. In fact, in the extreme case when all messages in the cluster are of the same type, the sample covariance matrix would be the 0 matrix. Therefore, we assign the cluster with significantly higher eigenvalue concentration to malicious users. This heuristic, which assumes that malicious users must be somewhat coordinated to mount an attack and that their data vectors must therefore concentrate along a few eigenvectors, is given in Algorithm 3. Each cluster is assumed to contain at least two subscribers.

Algorithm 3 Cluster Selection Heuristics.

1: For the given cluster label vector $C$, determine the two clusters, $C_1$ and $C_2$.

2: For the two clusters, evaluate the sample covariance matrix of the projected message vectors.

3: if a cluster has a covariance matrix $= 0$ then

4: Return this cluster.

5: else

6: Evaluate the eigenvalues of the cluster covariance matrices.

7: Return the cluster with the highest eigenvalue.

8: end if
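The selection heuristic of Algorithm 3 can be sketched as below, following the steps literally: a cluster with zero covariance is returned immediately, otherwise the cluster whose covariance matrix has the largest eigenvalue is chosen. The choice of per-user vectors in `X`, the tolerance `tol`, and the function name `select_malicious_cluster` are assumptions made for illustration; each cluster is taken to contain at least two users.

```python
import numpy as np


def select_malicious_cluster(X, labels, tol=1e-12):
    """Return the label of the cluster chosen by the heuristic.

    X: array of shape (n_users, d), one vector per user.
    labels: cluster label of each user (two distinct values expected).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    top_eigenvalue = {}
    for c in np.unique(labels):
        members = X[labels == c]
        # Sample covariance of the cluster, users as observations.
        cov = np.atleast_2d(np.cov(members, rowvar=False))
        if np.allclose(cov, 0.0, atol=tol):
            return c  # identical behavior within the cluster
        # eigvalsh returns eigenvalues in ascending order.
        top_eigenvalue[c] = np.linalg.eigvalsh(cov)[-1]
    return max(top_eigenvalue, key=top_eigenvalue.get)


# Cluster 0 sends identical traffic; cluster 1 is diverse.
X = np.array([[5.0, 0.0], [5.0, 0.0], [5.0, 0.0], [1.0, 2.0], [2.0, 1.0]])
labels = np.array([0, 0, 0, 1, 1])
print(select_malicious_cluster(X, labels))  # 0
```

Note that "eigenvalue concentration" could also be read as the share of the largest eigenvalue in the total variance; the sketch does not implement that alternative reading.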

Putting all of these steps together, the algorithm to detect the attackers is summarized in Algorithm 4.

Algorithm 4 Attacker Detection.

1: if Global Sequence Kernel is used then

2: Set the weight parameters $\gamma$ and $\rho$ of the pairwise heat kernel.

3: Evaluate the kernel matrix $K$ such that $\forall (u_q, u_r) \in U \times U$, we have $K_{q,r} = K(u_q, u_r)$, where we use the time-stamped message sequences $W^q$, $W^r$ of the $q$th and $r$th users in the given time interval, respectively, with the alignment kernel, and $U$ is the set of active users.

4: Unit-diagonal normalize $K_{\mathrm{unnormed}}$ to obtain $K$.

5: end if

6: if User Distance Kernel is used then

7: Evaluate the kernel matrix $K$ such that $\forall (u_q, u_r) \in U \times U$, we have $K_{q,r} = K(u_q, u_r)$, where we use the total message count vectors $v_q$, $v_r$ of the $q$th and $r$th users in the given time interval, respectively, with the distance kernel, and $U$ is the set of active users.

8: end if

9: Apply the normalized Laplacian spectral clustering algorithm over $K$ such that #clusters $= 2$, as defined in Algorithm 2.

10: Use the cluster label vector $C$ returned by the spectral clustering in the cluster selection heuristics as defined in Algorithm 3.

11: Return the selected cluster members as the set of attackers.

6. Experiments

As is often reported in the literature, we have also found that obtaining and getting the permission to use VoIP server datasets proves to be very problematic, mostly due to privacy concerns of the subscribers and the commercial secrecy concerns of the telecommunication operators. Therefore, we have used simulated datasets to analyze the performance of the change point detection
