A Bayesian change point model for detecting SIP-based DDoS attacks

Digital Signal Processing

Contents lists available at ScienceDirect

www.elsevier.com/locate/dsp

Barış Kurt ^a,∗, Çağatay Yıldız ^a, Taha Yusuf Ceritli ^a, Bülent Sankur ^b, Ali Taylan Cemgil ^a

^a Department of Computer Engineering, Bogazici University, 34342 Bebek, Istanbul, Turkey

^b Department of Electrical and Electronics Engineering, Bogazici University, 34342 Bebek, Istanbul, Turkey

Article info

Article history: Available online xxxx

Keywords: VoIP security; SIP; DDoS; Simulation; Bayesian change point models

Abstract

Session Initiation Protocol (SIP), as one of the most common signaling mechanisms for Voice over Internet Protocol (VoIP) applications, is a popular target for flooding-based Distributed Denial of Service (DDoS) attacks. In this paper, we propose a DDoS attack detection framework based on the Bayesian multiple change model, which can detect different types of flooding attacks. Additionally, we propose a probabilistic SIP network simulation system that provides a test environment for network security tools.

© 2017 Elsevier Inc. All rights reserved.

1. Introduction

VoIP is the technology of carrying voice and multimedia communications through Internet Protocol (IP) networks. Due to their multimedia support and low infrastructure cost, VoIP systems are taking over circuit-switched telephone networks worldwide. With the introduction of 5G, VoIP is predicted to become the dominant methodology for voice and multimedia communications.

VoIP systems transfer voice and multimedia data between communicating parties through packet-switched IP networks based on data transfer protocols, such as the Real-time Transport Protocol (RTP). In addition, they require session-level signaling protocols for managing their communication sessions. Owing to its lightweight nature, simplicity and ease of implementation, SIP [1] is one of the most popular open standard signaling protocols designed for VoIP. SIP provides the signaling functions necessary to register clients, check their locations and availability, exchange information on their data transmission capabilities, and provide the handshakes necessary for connection setups.

Despite all their attractive features, the downside is that VoIP systems are more vulnerable to security threats than their circuit-switched predecessors. There are two basic sources of security threats for VoIP systems. Firstly, VoIP systems are affected by all threats that target the lower protocol layers. Secondly, VoIP systems using open standard protocols suffer from many protocol-specific vulnerabilities; in other words, they are prone to security threats specifically designed to exploit the vulnerabilities of the underlying signaling protocols [2], [3]. These protocol-specific attacks are usually not classified as network attacks by conventional network-level security systems. Therefore, VoIP systems need extra security mechanisms for detecting and preventing VoIP-specific attacks.

∗ Corresponding author. E-mail address: bariskurt@gmail.com (B. Kurt).

One of the most frequently observed types of cyber-attack is the DDoS flooding attack [4], which is typically realized by sending a vast number of network protocol messages to a victim. These types of attacks aim to exploit the weaknesses in the SIP protocol or faults due to poor implementation. An example of such a DDoS flooding attack is the INVITE attack. In this case, the attacker tries to set up communication with many SIP users by sending INVITE requests to the SIP proxy server. The server, which maintains a table for each SIP session, holds an entry for each INVITE request and awaits a response from the call receiver for a fixed amount of time. Eventually, the server reaches its memory capacity while trying to keep track of an excessive number of connections. Another typical DDoS attack is SYN flooding [5], where a target network proxy is forced to maintain a barrage of Transmission Control Protocol (TCP) sessions, and eventually becomes unresponsive due to overutilization of its resources. Thus, DDoS flooding attacks aim to cripple a target system by overusing and eventually depleting its resources, such as bandwidth, CPU or memory, and making it unable to respond to the requests of its legitimate subscribers.

DDoS attacks can have a negative impact on business since a target system cannot provide services to its customers during attacks. The downtime of servers creates revenue loss and reputation damage, which in turn leads to further loss of revenue, for service providers. Furthermore, the productivity of the workforce is reduced as employees cannot use the affected systems for operations. Well-known companies can be found among the victims of DDoS attacks. For instance, GitHub was under attack for six days [6]. Another victim of such attacks was the BBC, where an online DDoS tool named BangStresser, which delivers attacks as a service, may have been used [7].

https://doi.org/10.1016/j.dsp.2017.10.009

1051-2004/© 2017 Elsevier Inc. All rights reserved.


A recent survey reports an increase in DDoS attacks, arguing that it might be a result of the proliferation of cheap and easy-to-launch attack tools [8]. According to another report [9], the number of attacks decreases while the average peak attack size increases. For attacks targeting SIP-based VoIP systems, there has been an upward trend as well [10]. Defense strategies for these common DDoS attacks have been studied extensively [11,12].

Many network security systems have been developed for the detection of SIP-based DDoS attacks [13]. The majority of these systems use supervised methods, such as thresholding [14] and rule-based pattern matching, as in [15,16]. These supervised methods require a training phase for learning patterns for each type of attack and for building a dictionary of known attacks. Attack detection is based on matching the current network state against one of the known attack patterns in the training set. However, when an unprecedented attack occurs, a supervised system can easily fail to detect it, since the pattern of the new attack will possibly be different from all the learned attack patterns. It then becomes imperative to re-train the system with an extended training data set which includes samples from the new attack traffic.

In this paper, we focus on the detection of SIP-specific DDoS flooding attacks [3,17]. We aim therefore to develop a more robust and generalizable DDoS monitor based on anomaly detection principles. Anomaly detection [18] is an unsupervised methodology where the system is programmed to recognize significant deviations from its learned data patterns, and mark them as anomalous events. In our case, anomalous events are interpreted and marked, subject to further analysis, as security threats. We assume that the SIP server state has a stationary behavior under the so-called normal, "non-attack" SIP traffic, but that these statistics will change noticeably under a DDoS flooding attack. To sense these attacks, we have designed our feature vectors as consisting of a combination of incoming and outgoing SIP message counts plus the vector of resource usage measurements of the SIP proxy software. Our DDoS monitor is based on the Bayesian change point model [19], which models the normal SIP server behavior and infers changes that are possibly due to DDoS attacks.

Collecting real-world VoIP network traces and annotating them without violating the privacy of the users is a tedious task. Therefore, for the proof of our concept, we conducted our experiments in a simulated environment. We developed a real-time SIP network simulator system, which models a social network for a group of users. The simulator generates actual voice conversation calls by setting up SIP sessions between users through a SIP proxy server. Our DDoS detection mechanism is deployed next to the SIP proxy server, so that it does not track RTP traffic between users. Therefore, the simulated SIP sessions are silent communications, i.e., actual data transfer via RTP is not generated. We generate DDoS attacks with the help of a commercial network vulnerability scanning tool, NOVA V-Spy [20], simultaneously with the VoIP simulation.

The contributions of our work can be listed as:

– We develop a Bayesian change point model for detecting SIP-oriented DDoS attacks. The proposed framework extends and generalizes the previous change-point based detection methods. Our change point-based DDoS monitor can be customized with different server parameters and different probabilistic observation models.

– A real-time SIP network traffic simulator based on social network modeling is developed and the software made publicly available. The proposed framework is tested with real-time data generated by the simulator, interleaved with DDoS attack data generated by a commercial network vulnerability scanning tool.

1.1. Paper structure

The remainder of this paper is organized as follows. Section 2 presents previous studies on SIP DDoS detection and change point models. Section 3 presents SIP network terminology and details of the protocol. Section 4 presents our change point model in detail. Section 5 describes the experimental setup we used to evaluate our methodology. The experimental results are given in Section 6. Finally, Section 7 evaluates the results of the experiments and draws conclusions.

2. Related work

There are comprehensive literature surveys on vulnerabilities of the SIP protocol [2], VoIP security research [3], DoS attacks targeting SIP networks [17], and security systems to counter SIP-based DoS attacks [13]. One of the earliest and simplest attempts to prevent single-source DoS flooding attacks in SIP systems was proposed by Iancu [14], where a rate limiter is deployed at the server to limit per-host SIP traffic. More elaborate methods were proposed to detect both single and distributed DoS attacks, employing rule-based schemes, statistical methods, anomaly detection approaches, and machine learning tools.

Rule-based methods maintain a list of rules, or protocol finite state machines, and check the current server state against consistent patterns described in the rule set [15,21–23]. Ormazabal et al. [24] propose a large-scale SIP firewall solution by combining several rule-based filters and attack mitigation mechanisms. While such rule-based systems are useful in detecting DoS attacks, they require carefully designed and perpetually updated rule books and fine-tuned thresholds. Since these systems can easily miss a novel attack whose descriptive rule has not yet been learned, they need to be reinforced with additional tools based on statistical approaches.

Machine learning methods were proposed as an alternative to rule-based and statistical methods for DDoS flooding detection, including support vector machines [25], evolutionary algorithms [26], naive Bayes, and decision trees [27]. Tsiatsikas et al. [28,29] give a comparison of 5 supervised classifiers and conclude that these methods provide good results on low-rate DoS attacks with little classification time overhead. Inherently, the success of supervised algorithms depends on the quality of the data set used during their training. For example, in [29] the authors employ different training sets for different basic scenarios. Obtaining such high-quality training data can be difficult in a real-world implementation of a supervised system. In contrast, we propose an unsupervised system, with an optional training phase to optimize its parameters. We show that setting those parameters empirically with the help of domain expertise is sufficient.

Reynolds and Ghosal [30] were the first to propose applying change point detection in SIP networks. They present a cumulative sum (CUSUM) algorithm in order to detect INVITE flooding. Later, Rebahi and Sisalem [31] developed a parametric version of this algorithm. Zhang et al. [32] proposed to use additional features to enhance the accuracy of CUSUM. Geneiatakis et al. [33] propose Bloom filters to efficiently track incomplete SIP sessions and to raise an alarm if these exceed a certain threshold. The major disadvantage of these algorithms is that one needs to engineer different sets of features in order to detect different types of flooding attacks.

The works closest to our approach are the distance-based anomaly detection methods [34,35], where a distance metric is used to measure the dissimilarity between the distributions of normal and observed traffic features. If the distance between the normal and observed distributions is above a threshold, an alarm is generated. Similar to our approach, these methods can be used to detect any type of network attack provided that the full SIP message histogram is included in the feature set. Our method extends and generalizes these anomaly detection methods by introducing a Bayesian framework, which models the SIP server state with a set of features that incorporates both network traffic and SIP server resource usage data. In our method, the attack decision relies on a robust posterior probability calculation rather than simple thresholding. To the best of our knowledge, this work presents the first Bayesian framework tailored specifically to model a SIP server in order to detect SIP anomalies, and hence fills an important gap in the literature.

3. SIP network traffic

3.1. SIP terminology

SIP is designed to initiate, modify and terminate communication sessions among agents. Four general types of SIP entities are defined in RFC 3261 [1]: user agents, proxy servers, redirect servers and registrars. A user agent (UA) is the endpoint entity that generates and receives SIP messages. In a typical SIP session, UAs communicate by sending request and response messages to each other. The registrar is responsible for registering the UAs and storing their location information. The registered UAs communicate with each other via the intermediary of proxy servers. The proxy servers deliver the request and response messages between UAs. Finally, the redirect servers allow proxy servers to communicate with other servers from external domains.

SIP messages are divided into two basic categories: SIP requests and SIP responses. Each SIP request sent by a UA is answered by a corresponding SIP response. For example, a UA can make a REGISTER request to the registrar in order to get online, make an INVITE request to another UA to start a call, or make a BYE request to terminate an ongoing conversation. A SIP response message generated for a request belongs to one of the 6 SIP response categories: 1xx-provisional, 2xx-success, 3xx-redirection, 4xx-client error, 5xx-server error or 6xx-global failure. For example, a UA may respond with a 200-OK message to accept an incoming request.
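The six response categories follow directly from the leading digit of the status code. As a minimal illustrative sketch (the function name `sip_response_class` is hypothetical and not part of the paper's software), the mapping can be written as:

```python
def sip_response_class(status_code: int) -> str:
    """Map a SIP status code to its response category by its leading digit."""
    classes = {
        1: "provisional",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
        6: "global failure",
    }
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a valid SIP status code: {status_code}")
    return classes[status_code // 100]


if __name__ == "__main__":
    for code in (180, 200, 486, 603):
        print(code, sip_response_class(code))
```

A monitor that counts responses per category, rather than per individual code, would only need this mapping to bin incoming messages.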

3.2. An example message flow

An illustrative example of SIP message communication is given in Fig. 1. It shows the flow of exchanged messages between a server and two users during a normal call. In this scenario, Alice initiates a call to Bob by sending an INVITE packet to the SIP server. After authentication, the SIP server forwards this request to Bob. Similarly, the response of Bob, in this case an ACK packet showing that the call is accepted, is transmitted to Alice through the SIP server. At the end, BYE messages terminate the conversation between Alice and Bob.

Once a SIP session is established, the two endpoints start exchanging multimedia data such as audio conversations, video streams, etc. Recall that SIP, being a signaling protocol, is not involved in the multimedia data exchange between agents. The handshake on the kind and encoding type of data, on the address and ports to be used for transfer, and on other details regarding the data exchange is usually achieved using the Session Description Protocol (SDP) [36]. Additionally, real-time media delivery relies on RTP [37].

Real-world SIP packet exchange scenarios usually involve more than two servers and two agents. The above call setup case is illustrative but simplistic. For example, it does not specify how the server reaches out to the caller. A setup in which Alice and Bob are not registered to the same server would require a location server and the re-transmission of the INVITE message. Similarly, other features supported by SIP, such as call transfer, call park and conference, lead to distinct call flows. In summary, SIP message traffic data can be quite complex.

Fig. 1. A call scenario in SIP network.

3.3. DDoS attacks in SIP networks

DDoS flooding attacks can rapidly affect network traffic characteristics and cause service degradation. Their impact is contingent on the attack parameters and differs substantially from one attack to another. A DDoS detection method is expected to signal an attack, in principle, independently of its configuration. Therefore, a DDoS detector must be robust and highly sensitive, that is, it must have a high detection rate and a low probability of false alarm under a wide range of realistic network conditions.

Mirkovic et al. [4] classified DDoS attack mechanisms on the basis of their impact on the victim. First, one would expect an increase in the incoming network traffic, even beyond the server's bandwidth, as a result of a flooding attack. The severity of this increase, however, is directly related to the resources the attacker possesses and cannot be forecast beforehand. Second, the flooding rate does not necessarily stay the same throughout the attack. Mirkovic et al. also noted that slow rate boosts, i.e., creeping attacks, typically result in detection latency.

Another significant parameter is the SIP packet type used in the flooding. Typical DDoS scenarios consider a SIP server being flooded by one type of packet such as INVITE, SUBSCRIBE, or BYE. Nevertheless, DDoS attacks can also be performed using a judiciously selected mixture of SIP requests. Thus, intelligently mounted schemes such as slow boost attacks, multi-SIP packet attacks, and multi-agent attacks that try to obfuscate their synchronism by time jittering require more advanced defense mechanisms and concomitantly more computation time and power. Despite these variabilities, the DDoS shield is expected to have low latency in order to initiate attack prevention in a timely manner.

4. Methodology

In this work, we use a Bayesian approach for the detection of abrupt changes in SIP traffic that could possibly correspond to a DDoS attack. This approach is based on a hierarchical probability model, more precisely a hidden Markov model, that relates observed features from network packet traffic and server load measurements (such as CPU usage and system calls) to hidden variables. These hidden variables indicate the state of the system, e.g., change and no-change, along with other hidden dynamical quantities. Formally, we will refer to the features or observations as $v$ (visible) and the state change indicators as $s$. Other variables that are not of direct relevance for our problem, but are needed for describing the state of the dynamical system, will be referred to as $h$ (hidden). Once the model is specified, the inferential goal is to calculate the posterior probability

$$
p(s \mid v) = \frac{p(v \mid s)\, p(s)}{p(v)} \propto \int p(v \mid h, s)\, p(h \mid s)\, p(s)\, \mathrm{d}h
$$

Apart from specific special cases, the above integral is intractable. Fortunately for the change-point problem, we can describe a model where exact calculation becomes possible for relatively short time sequences. We show that, with a rather simple approximation heuristic such as pruning, we can get a constant-space algorithm that can be feasibly implemented in real time.

In the sequel, we first present the way the observations ($v$) are composed from the features collected from the SIP server; then we describe the probability model, which is an instance of a Bayesian change point model, and the way this model can be used for online estimation of DDoS attacks. Finally, we give details of our approximation and provide a complexity analysis.

4.1. SIP server features as observations

The first step in a Bayesian approach is to provide a probabilistic generative model for the observations collected from the system. However, before going into the mathematical details of our generative model, let us first give a clear definition of the observations. As we continuously monitor a SIP server, we collect real-time statistics for a period of $\Delta t$ and form an observation vector $v_t$ as a summary of the statistics collected during that period. $v_t$ is an $N$-dimensional vector composed of the number of SIP request and response messages, server log messages, and server statistics such as the number of TCP connections, together with the CPU and memory usage measured at the end of the period. The complete list of the features collected from the system is given in Table 3 and explained in further detail in Section 5.
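To make the construction of an observation vector concrete, the following is a minimal sketch of how one monitoring window could be summarized. The feature names in `MESSAGE_FEATURES` and `RESOURCE_FEATURES` are hypothetical placeholders chosen for illustration; the features actually used by the authors are the ones listed in their Table 3.

```python
from collections import Counter

# Hypothetical feature layout, for illustration only.
MESSAGE_FEATURES = ["INVITE", "REGISTER", "BYE", "ACK", "200", "486"]
RESOURCE_FEATURES = ["tcp_connections", "cpu_percent", "memory_percent"]


def build_observation(messages, resources):
    """Summarize one monitoring window as a fixed-length count vector.

    messages:  iterable of SIP method names / status codes seen in the window
    resources: dict of server measurements taken at the end of the window
    """
    counts = Counter(messages)
    # Message types outside the tracked list are ignored in this sketch.
    vector = [counts.get(name, 0) for name in MESSAGE_FEATURES]
    vector += [resources[name] for name in RESOURCE_FEATURES]
    return vector


if __name__ == "__main__":
    window = ["INVITE", "INVITE", "200", "BYE", "OPTIONS"]
    usage = {"tcp_connections": 12, "cpu_percent": 37, "memory_percent": 41}
    print(build_observation(window, usage))
```

Keeping the message counts in the leading positions and the resource measurements at the end matches the split into ratio-type and magnitude-type features that the observation model relies on later.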

4.2. Multiple change point model

The multiple change point model is a special form of hierarchical Markov models [19], where the observations conditionally depend on latent states, and the states either follow the previous regime or jump randomly to a new one. As far as network monitoring is concerned, these regime changes imply anomalous events, which may be related to security threats. The generative equations of the multiple change point model can be given as

$$
h_0 \sim \Omega(h_0; w) \tag{1}
$$

$$
s_t \sim [s_t = 0]\,\pi + [s_t = 1]\,(1 - \pi) \tag{2}
$$

$$
h_t \mid s_t, h_{t-1} \sim [s_t = 0]\,\delta(h_t - h_{t-1}) + [s_t = 1]\,\Omega(h_t; w) \tag{3}
$$

$$
v_t \mid h_t \sim \Theta(v_t; h_t) \tag{4}
$$

where $\delta$ is the Dirac delta function.

The observation $v_t$, at time $t$, is assumed to be a random variable sampled from a $\Theta(v; h)$ distribution with an unknown parameter $h_t$. Initially, $h_0$ is drawn from a $\Omega(h; w)$ distribution. Afterwards, at each time instance $t$, $h_t$ is either re-drawn from the same initial distribution or set to the previous value $h_{t-1}$. The decision for change is given by a Bernoulli random variable $s_t$. The model allows $h_t$ to change as many times as required during the run of the algorithm. The graphical representation of the multiple change point model is given in Fig. 2.

Fig. 2. Bayesian change point graphical model.

Our DDoS detection system includes a monitoring unit for observing and collecting network traffic data as well as SIP server activities. The monitoring unit collects and compiles network and server statistics into an observation vector, i.e., a feature vector, $v_t$ at each $\Delta t$ (1 second) time interval, as the summary of events that have occurred in the SIP server during that last observation interval. For each such feature vector, the model infers whether the observation vector is generated by the previous regime, that is $s_t = 0$ and $h_t = h_{t-1}$, or whether the server state has jumped to a new regime, which means $s_t = 1$ and $h_t \sim \Omega(h_t; w)$. The observation model and its prior distribution are selected according to the features collected from the server. The details of the data features and the distributions used in the change point model are given in Section 4.3.

The prior probability of change, $\pi$, and the parameters $w$ of the prior distribution $\Omega(h; w)$ are the hyperparameters of our model. Provided that these hyperparameters are known, and the system is fully observable, meaning that the change points $s_{1:T}$, hidden states $h_{1:T}$ and observations $v_{1:T}$ are known, we can calculate the full joint likelihood as follows:

$$
p(s_{1:T}, h_{0:T}, v_{1:T}) = p(h_0) \prod_{t=1}^{T} p(s_t)\, p(h_t \mid h_{t-1}, s_t)\, p(v_t \mid h_t) \tag{5}
$$

In reality, the change point events $s_{1:T}$ and the hidden states $h_{1:T}$ are not observed, and the problem of detecting a change point event at time $t$ is formulated as calculating the posterior probability $p(s_t = 1 \mid v_{1:T})$. From Bayes' rule, we can write

$$
p(s_t \mid v_{1:T}) = \frac{p(v_{1:T}, s_t)}{p(v_{1:T})} \propto p(v_{1:T}, s_t) \tag{6}
$$

The probability of change at time $t$ can be inferred online by calculating the filtering distribution $p(s_t \mid v_{1:t})$, or in an offline manner by the smoothing distribution $p(s_t \mid v_{1:T})$. The calculations can be done efficiently via the recursive Forward–Backward algorithm [38]. The filtering density is calculated by the forward recursion of the $\alpha$ messages:

$$
\alpha(s_t, h_t) \equiv p(s_t, h_t, v_{1:t}) \tag{7}
$$

$$
= \sum_{s_{t-1}} \int p(h_t \mid h_{t-1}, s_t)\, \alpha(s_{t-1}, h_{t-1})\, \mathrm{d}h_{t-1} \times p(v_t \mid h_t) \times p(s_t) \tag{8}
$$

Then, the change probability is calculated as

$$
p(s_t \mid v_{1:t}) \propto p(s_t, v_{1:t}) = \int \alpha(s_t, h_t)\, \mathrm{d}h_t \tag{9}
$$

In an offline setting, where we can calculate decisions using the full observations of the time series $v_{1:T}$, we can smooth the filtering distribution with backward recursions to get a stronger estimate $p(s_t \mid v_{1:T})$. The backward recursion can be written as

$$
\beta(s_t, h_t) \equiv p(v_{t+1:T} \mid s_t, h_t) \tag{10}
$$

$$
= \sum_{s_{t+1}} \int p(h_{t+1} \mid h_t, s_{t+1})\, \beta(s_{t+1}, h_{t+1}) \times p(v_{t+1} \mid h_{t+1}) \times p(s_{t+1})\, \mathrm{d}h_{t+1} \tag{11}
$$

The smoothed density is calculated as

$$
p(s_t \mid v_{1:T}) \propto \int p(s_t, h_t, v_{1:t})\, p(v_{t+1:T} \mid s_t, h_t)\, \mathrm{d}h_t \tag{12}
$$

$$
= \int \alpha(s_t, h_t)\, \beta(s_t, h_t)\, \mathrm{d}h_t \tag{13}
$$

Real-time anomaly detection tracks streaming data, so that processing the $v_{1:T}$ observation sequence is not feasible in practice since $T$ is not bounded. Furthermore, anomaly detection is a time-critical task, which implies that the change points must be recognized as soon as possible. Therefore, calculating a smoothing distribution is feasible only if the system is allowed to make change point decisions deferred by a fixed amount of time $L$, which is called the lag. In such a case, the process is called fixed-lag smoothing, where the change point inference for $s_t$ is done at time $t+L$ by calculating the density $p(s_t \mid v_{1:t+L})$ in lieu of $p(s_t \mid v_{1:T})$. It is important to note that this process requires calculating a backward recursion for $L$ steps starting at each time point $t+L$, and this increases the processing complexity.

4.3. DDoS detection via multiple change point model

We have described a multiple change point model with an arbitrary hidden state distribution $\Omega$ and observation model $\Theta$, with the assumption that $\Omega$ is the conjugate prior of $\Theta$ for computational simplicity. Now we assign actual probability distributions for the hidden state and observation models.

We let the observation model $\Theta$ be a coupled distribution of multinomial and Poisson distributions. The multinomial distribution is used to model the ratios of the signal magnitudes in an observed vector. The Poisson distribution, on the other hand, models the magnitudes of the tracked signals. Without loss of generality, we assume that the features whose ratios will be modeled are stored in the first $M$ positions of the observation vector $v$, denoted as $v_{1:M}$, and the remaining $N - M$ positions are filled with the features whose magnitudes are modeled. Then, we can write the observation model as

$$
\Theta(v) = \mathcal{M}(v_{1:M}; p) \times \prod_{i=M+1}^{N} \mathcal{PO}(v_i; \lambda_i) \tag{14}
$$

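where the multinomial and Poisson distributions are defined as

$$
\mathcal{M}(x; p) = \frac{\Gamma\left(\sum_i x_i + 1\right)}{\prod_i \Gamma(x_i + 1)} \prod_i p_i^{x_i} \tag{15}
$$

$$
\mathcal{PO}(x; \lambda) = \frac{\lambda^{x} e^{-\lambda}}{\Gamma(x + 1)} \tag{16}
$$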
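As a minimal sketch of how the coupled multinomial and Poisson observation model could be evaluated numerically, the functions below compute its logarithm with the standard library's `math.lgamma`. The function names are illustrative assumptions, not the authors' C++ implementation, and the code assumes strictly positive rates and strictly positive probabilities wherever a count is nonzero.

```python
import math


def log_multinomial(x, p):
    """log M(x; p) for a count vector x and probability vector p."""
    log_coef = math.lgamma(sum(x) + 1) - sum(math.lgamma(xi + 1) for xi in x)
    # Zero counts contribute nothing, so they are skipped to avoid log(0).
    return log_coef + sum(xi * math.log(pi) for xi, pi in zip(x, p) if xi > 0)


def log_poisson(x, lam):
    """log PO(x; lam) for a single count x and rate lam > 0."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def log_observation(v, p, lam):
    """Log of the coupled model: first len(p) entries multinomial, rest Poisson."""
    m = len(p)
    ratio_part = log_multinomial(v[:m], p)
    magnitude_part = sum(log_poisson(x, l) for x, l in zip(v[m:], lam))
    return ratio_part + magnitude_part


if __name__ == "__main__":
    print(log_observation([3, 1, 0, 7], p=[0.6, 0.3, 0.1], lam=[5.0]))
```

Working in the log domain avoids overflow in the Gamma functions when the message counts in a window are large.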

In this setup, the hidden state vectors $h = (p, \lambda)$ are the respective parameters of the multinomial and Poisson distributions. Since the conjugate prior of the multinomial is the Dirichlet distribution and that of the Poisson is the Gamma distribution, the prior of our state vector $h$ becomes the product of these conjugate priors, namely Dirichlet and Gamma, and can be written as:

Table 1
Model variables and parameters.

Variable        Description
$s_{1:T}$       Reset switches
$h_{1:T}$       Hidden state vectors
$v_{1:T}$       Observation vectors
$\pi$           Reset probability
$\Omega$        Prior distribution of hidden states
$w$             Parameters of the $\Omega$ distribution
$\Theta$        Observation model
$\alpha$        Dirichlet distribution parameter
$a, b$          Gamma distribution parameters
$\Gamma(\cdot)$ Gamma function

Fig. 3. Expansion in the forward variable messages.

$$
\Omega(p, \lambda) = \mathcal{D}ir(p; \alpha) \times \prod_{i=M+1}^{N} \mathcal{G}(\lambda_i; a_i, b_i) \tag{17}
$$

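The Dirichlet and Gamma distributions are given as

$$
\mathcal{D}ir(p; \alpha) = \frac{\Gamma\left(\sum_{i=1}^{M} \alpha_i\right)}{\prod_{i=1}^{M} \Gamma(\alpha_i)} \prod_{i=1}^{M} p_i^{\alpha_i - 1} \tag{18}
$$

$$
\mathcal{G}(\lambda; a, b) = \frac{b^{a}}{\Gamma(a)} \lambda^{a-1} e^{-b\lambda} \tag{19}
$$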

where $\alpha$ is an $M$-dimensional vector and $a$ and $b$ are $(N-M)$-dimensional vectors such that each pair $\{a_i, b_i\}$ holds the hyper-parameters of one Gamma distribution. Hence, the hyper-parameters $w$ of the model are the set $w = (\alpha, a, b)$. The complete set of model variables and parameters is given in Table 1.
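The generative process of Sections 4.2 and 4.3 can be simulated with the Python standard library alone. The sketch below is illustrative and rests on several assumptions that are not taken from the paper: `p_change` denotes the prior probability that $s_t = 1$, the multinomial part draws a fixed number `n_ratio_events` of events per window, and the Poisson sampler is Knuth's simple method, which is only suitable for moderate rates.

```python
import math
import random


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small to moderate rates."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_reset_state(rng, alpha, a, b):
    """Draw h = (p, lam) from the Dirichlet x Gamma prior."""
    g = [rng.gammavariate(ai, 1.0) for ai in alpha]
    total = sum(g)
    p = [x / total for x in g]  # normalized Gamma draws give a Dirichlet sample
    # random.gammavariate takes a scale parameter, so rate b becomes scale 1/b.
    lam = [rng.gammavariate(ai, 1.0 / bi) for ai, bi in zip(a, b)]
    return p, lam


def sample_observation(rng, p, lam, n_ratio_events):
    """Multinomial counts for the ratio features, Poisson counts for the rest."""
    counts = [0] * len(p)
    for idx in rng.choices(range(len(p)), weights=p, k=n_ratio_events):
        counts[idx] += 1
    return counts + [sample_poisson(rng, l) for l in lam]


def simulate(T, p_change, alpha, a, b, n_ratio_events=50, seed=0):
    """Return (switches, observations) for T steps of the change point model."""
    rng = random.Random(seed)
    h = sample_reset_state(rng, alpha, a, b)  # h_0 drawn from the prior
    switches, observations = [], []
    for _ in range(T):
        s = 1 if rng.random() < p_change else 0
        if s == 1:
            h = sample_reset_state(rng, alpha, a, b)  # jump to a new regime
        switches.append(s)
        observations.append(sample_observation(rng, h[0], h[1], n_ratio_events))
    return switches, observations


if __name__ == "__main__":
    s, v = simulate(T=10, p_change=0.1, alpha=[1.0, 1.0, 1.0], a=[2.0, 2.0], b=[0.5, 0.5])
    for st, vt in zip(s, v):
        print(st, vt)
```

Synthetic sequences of this kind are useful for sanity-checking an inference routine, because the true switch positions are known.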

4.4. Implementation details and complexity analysis

The inference for the change point model requires calculating the $\alpha(s_t, h_t)$ and $\beta(s_t, h_t)$ messages at the end of each observation period. For a discrete hidden state, we simply need to store a table of probabilities for each $\alpha(s_t = i, h_t = j)$, such that $i \in \{0, 1\}$ and $j \in \mathrm{Dom}(\Omega)$, and update this table according to the equations in (8). The same is true for the $\beta$ messages. When the hidden state distribution has a continuous domain, we have to express the $\alpha$ and $\beta$ messages as mixtures of $\Omega$ potentials. An $\Omega$ potential is described as

$$
\phi(p, \lambda) = \exp(l)\, \Omega(p, \lambda; \alpha, a, b) \tag{20}
$$

where $l$ is the logarithm of the normalizing constant, and $\alpha$, $a$ and $b$ are the parameters of the reset and observation distributions, respectively.
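The reason a potential stays in the same family after an observation is conjugacy. As a worked sketch (standard Gamma–Poisson and Dirichlet–multinomial algebra based on the definitions in (15), (16), (18) and (19), not a reproduction of the paper's Appendix A), multiplying a prior term by the likelihood of a count gives

$$
\mathcal{G}(\lambda; a, b)\, \mathcal{PO}(x; \lambda)
= \frac{b^{a}}{\Gamma(a)\,\Gamma(x+1)}\, \lambda^{a+x-1} e^{-(b+1)\lambda}
= \frac{b^{a}\, \Gamma(a+x)}{\Gamma(a)\, \Gamma(x+1)\, (b+1)^{a+x}}\; \mathcal{G}(\lambda; a+x, b+1)
$$

$$
\mathcal{D}ir(p; \alpha)\, \mathcal{M}(x; p)
= \frac{\Gamma\left(\sum_i \alpha_i\right) \Gamma\left(\sum_i x_i + 1\right)}{\Gamma\left(\sum_i (\alpha_i + x_i)\right)} \prod_i \frac{\Gamma(\alpha_i + x_i)}{\Gamma(\alpha_i)\, \Gamma(x_i + 1)}\; \mathcal{D}ir(p; \alpha + x)
$$

Under these identities, an observation update of a potential amounts to adding the logarithm of the leading constant to $l$ and replacing the parameters by their posterior values: $\alpha \to \alpha + v_{1:M}$ for the Dirichlet part, and $a_i \to a_i + v_i$, $b_i \to b_i + 1$ for each Gamma part.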

At each time step, the switching variable attains one of two values; therefore, an additional potential is added to the $\alpha$ and $\beta$ messages to indicate the change potential. $\alpha(h_t, s_t = 1)$ is a single potential and $\alpha(h_t, s_t = 0)$ is a mixture of $t$ potentials transferred from the previous $\alpha$ message. This linear growth of the $\alpha$ messages for the forward recursion is illustrated in Fig. 3. Details of the operations required to implement forward and backward recursions are presented in Appendix A.
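The linear growth of the forward message can be seen in a small sketch restricted to a single count feature with a Gamma prior and a Poisson observation model. This is an illustrative simplification, not the authors' implementation: each potential is stored as a tuple `(a, b, log_c)`, `p_change` is assumed to denote the prior probability of $s_t = 1$, and the change component is kept as the last element of the list by convention.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def predict(message, p_change, a0, b0):
    """Carry every potential forward (no change) and append one reset potential."""
    log_total = logsumexp([c for _, _, c in message])
    keep = [(a, b, c + math.log1p(-p_change)) for a, b, c in message]
    reset = (a0, b0, log_total + math.log(p_change))
    return keep + [reset]


def update(message, v):
    """Multiply each Gamma potential by the Poisson likelihood of count v."""
    out = []
    for a, b, c in message:
        # Log of the Gamma-Poisson predictive probability of v.
        log_pred = (a * math.log(b) - math.lgamma(a) + math.lgamma(a + v)
                    - math.lgamma(v + 1) - (a + v) * math.log(b + 1))
        out.append((a + v, b + 1, c + log_pred))
    return out


def filter_counts(counts, p_change, a0, b0):
    """Return filtered change probabilities and the final forward message."""
    message = [(a0, b0, 0.0)]  # p(h_0): one prior potential with unit mass
    probs = []
    for v in counts:
        message = update(predict(message, p_change, a0, b0), v)
        log_total = logsumexp([c for _, _, c in message])
        probs.append(math.exp(message[-1][2] - log_total))
    return probs, message


if __name__ == "__main__":
    counts = [5, 4, 6, 5, 40, 42, 39]
    probs, message = filter_counts(counts, p_change=0.05, a0=10.0, b0=1.0)
    print([round(p, 3) for p in probs])
    print("components in final message:", len(message))
```

Without any pruning, the message holds one more component after every observation, which is the growth that the complexity discussion in this section addresses.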

This linear growth is not sustainable for online continuous tracking of the server state. One has to limit, then, the number of mixture potentials in the forward message by $K$, indicating the maximum number of components. Once a message reaches the maximum number of allowed components, at each subsequent step, the component with the minimum normalizing constant is pruned. Therefore, in the worst case, an $\alpha$ message has $K$ components and a $\beta$ message $L$ components, since we had decided to run the backward recursion for only $L$ steps. It follows then that, during filtering, at most $K$ observation updates are required, and during the smoothing operations, where we multiply an $\alpha$ message with a $\beta$ message, $K \times L$ multiplications are performed. Therefore, the number of operations at any time instance is $O(KL)$. We empirically set $K = 100$ and the lag parameter $L = 5$. In our experimental setup, using more than 100 potentials made no significant contribution, and while a lag value of 5 significantly improved the accuracy of the system, bigger lag values did not yield much of an improvement.

Algorithm 1 Bayesian Change Point Detection.

function BCPM(π, w, v_{1:T}, LAG, THRESHOLD)
    alpha ← [ ]
    for t = 1 . . . T do
        alpha_p = Predict(alpha, π, w)
        alpha = Update(alpha_p, v_t)
        if LAG > 0 then
            beta = BackwardFilter(π, w, v_{t+1−LAG:t})
            gamma = Smooth(alpha, beta)
            cpp = ComputeCPP(gamma, len(beta))
        else
            cpp = ComputeCPP(alpha, 1)
        end if
        if cpp > THRESHOLD then
            Alarm()
        end if
    end for
end function

Table 2
Average run time of the algorithm.

Routine            Time (μs)
PREDICT            35
UPDATE             648
BACKWARD_FILTER    17
SMOOTH             1400
COMPUTE_CPP        7
Total              2107

4.5. Real-time analysis

Algorithm 1 presents the offline version of the main detection loop of our algorithm. Here, offline is in the sense that the whole data set is available at the beginning of the algorithm. In the online version, the body of the loop in the algorithm is run exactly once after each observation period. We time the individual functions of the algorithm, whose descriptions are also given in Appendix D. The actual run time of the algorithm depends on the number of features used, the value of the lag $L$, and the maximum number of components $K$. In this measurement we set $L = 5$, $K = 100$ and used all available features. The algorithm is coded in C++ and the experiments are run offline, on an Intel i7 CPU @ 2.7 GHz, on a data sequence of 2000 observations. We can see from Table 2 that one iteration of the main loop executes in about 2 ms (2107 μs) on average, which allows online deployment of our algorithm.

4.6. Parameter learning

During the inference stage, we had assumed that the hyper-parameters of our multiple change point model, namely the reset probability $\pi$ and the latent state prior parameters $w$, were given. In practice, these parameters must be set to appropriate values for accurate change point estimation. For a small number of parameters, a grid search method can give good parameter estimates; however, for large models, i.e., for high-dimensional $w$, the search method is not applicable. Thus, we use a maximum likelihood approach to find the best hyper-parameters as a function of observations. Given observations $v_{1:T}$, we would like to find the parameters that maximize the log-likelihood

$$
\mathcal{L}_{\pi,w}(v_{1:T}) \equiv \log p(v_{1:T} \mid \pi, w) \tag{21}
$$

$$
= \log \sum_{s_{1:T}} \int p(s_{1:T}, h_{0:T}, v_{1:T} \mid \pi, w)\, \mathrm{d}h_{0:T} \tag{22}
$$

Since this log-likelihood expression is intractable due to the summation over latent variables, we employ an iterative Expectation-Maximization (EM) scheme to find the $\{\pi, w\}$ estimates. By Jensen's inequality, the log-likelihood is lower bounded as

$$
\mathcal{L}_{\pi,w}(v_{1:T}) \geq \left\langle \log p(s_{1:T}, h_{0:T}, v_{1:T} \mid \pi, w) \right\rangle_{q(z)} - \left\langle \log q(z) \right\rangle_{q(z)} \tag{23}
$$

This bound is tight for $q(z) = p(s_{1:T}, h_{0:T} \mid v_{1:T}, \pi, w)$. The log-likelihood can then be maximized iteratively as follows:

E-Step:

$$
q(z)^{new} = p(s_{1:T}, h_{0:T} \mid v_{1:T}, \pi^{old}, w^{old}) \tag{24}
$$

M-Step:

$$
(\pi^{new}, w^{new}) = \arg\max_{\pi, w} \left\langle \log p(s_{1:T}, h_{0:T}, v_{1:T} \mid \pi, w) \right\rangle_{q(z)^{new}} \tag{25}
$$

The detailed derivations of the EM algorithm for Dirichlet-Multinomial and Gamma-Poisson change point potentials are given in Appendix B.

5. Experimental setup

5.1. Data generation

Our data generator, detailed in [39], is made up of four distinct modules: (1) a SIP server, (2) a traffic simulator, (3) a DDoS attack generator and (4) a network traffic monitor. As a registrar and SIP proxy server, we have used an Asterisk-based PBX software named Trixbox [40]. To mimic the normal message traffic on a SIP server, we have built Boun-Sim [39], a probabilistic SIP network simulation tool that generates calls between a number of SIP endpoint entities in real time. Concurrently with the normal traffic simulation, a rich variety of DDoS attacks were generated by a commercial vulnerability scanning tool called Nova V-Spy [20]. The fourth and final component in the setup is the network monitor, a module that tracks the server, extracts features and delivers them to the change point monitor.

Our simulation tool is driven by a probabilistic generative model to recreate typical user behaviors, such as making calls, answering, rejecting or ignoring an incoming call, and holding an ongoing call. User actions generated by the Simulator are realized as actual SIP communications, where SIP messages are exchanged between the user agents (UAs) and Trixbox. The Simulator omits the RTP messages carrying the actual conversational data between users, since these do not pass through the SIP server and are not relevant to the outcome of the simulation. Details of the Simulator parameters are presented in Appendix C.

Note that the simulator parameters should be set according to the capacity of Trixbox, as an excessive number of calls per second may prevent the server from handling the traffic generated by the simulator Boun-Sim. A detailed performance analysis of Asterisk server version 1.6 can be found in [41], where the performance degrades after the number of simultaneous calls exceeds 600.

5.2. Data traces

We conduct experiments on four different simulated data sets. Our data set generating mechanism is controlled by two bi-level variables, one for network traffic intensity and the other for attack intensity, each of which can be set to either low or high. To set the network traffic intensity, we tune the call rate parameters of the users, since phone calls constitute the main source of the network traffic. The average numbers of SIP packets passing through the SIP server per second in the low and high data sets are 75 and 90, respectively. To set the flood rate, we change the number of network packets delivered from V-Spy to the server in each second. In a low attack setting, the server is flooded by 100 packets per second, whereas 500 packets per second are used in high attack settings. In the sequel, we refer to these data sets as LOW–LOW, LOW–HIGH, HIGH–LOW and HIGH–HIGH, where the two adjectives qualify, in order, the network traffic intensity and the flood rate. All simulations are realized with 500 active users registered to the server.

In order to demonstrate the robustness of our change point model, we tested it with a data set consisting of 40 different DDoS attacks. These attacks are generated by tuning the following options of V-Spy:

• Attack Type: We flood the server with five different SIP request packets, randomly chosen from among REGISTER, INVITE, OPTIONS, CANCEL and BYE requests. Each type of attack generates different types of changes in the SIP server state.

• Transport Protocol: Since SIP operates independently of the transport protocol, we generated attacks over both TCP and User Datagram Protocol (UDP).

• Fluctuation: Nova V-Spy can generate floods with both constant and fluctuating rates. In half of our experiments, we generated floods with fluctuating rates.

• Content Size: Nova V-Spy can optionally insert dummy strings at the end of SIP messages. The attack detector must also be able to cope with this manipulation, since it affects the bandwidth consumption.

An example attack could be an “INVITE attack through a UDP port without any fluctuation and with large content size”. The attacks last for about 20 seconds and we have left an interval of at least 25 seconds between two consecutive attacks; this results in a simulation sequence of around half an hour in duration for the 40 attacks to occur. The IP addresses and the user IDs of the attackers were randomly chosen, and the attacker terminals were unregistered throughout the simulation.
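The attack set described above can be enumerated with a short sketch. It assumes that the 40 attacks are the full cross product of the four options (5 request types × 2 transports × 2 fluctuation settings × 2 content sizes), which the text does not state explicitly but which is consistent with the reported count. The dictionary keys and function names are illustrative and are not part of Nova V-Spy.

```python
from itertools import product

# Option values as listed in the text; the key names are illustrative only.
ATTACK_OPTIONS = {
    "attack_type": ["REGISTER", "INVITE", "OPTIONS", "CANCEL", "BYE"],
    "transport": ["TCP", "UDP"],
    "fluctuation": [False, True],
    "large_content": [False, True],
}


def enumerate_attacks(options=ATTACK_OPTIONS):
    """Return one dict per combination of the option values."""
    keys = list(options)
    return [dict(zip(keys, values)) for values in product(*options.values())]


def min_schedule_seconds(n_attacks, attack_s=20, gap_s=25):
    """Rough trace length: each attack followed by the minimum gap."""
    return n_attacks * (attack_s + gap_s)


attacks = enumerate_attacks()
print(len(attacks))                              # 40
print(min_schedule_seconds(len(attacks)) / 60)   # 30.0 (minutes)
```

With 20-second attacks and 25-second gaps, the schedule comes to 1800 seconds, matching the half-hour duration quoted above.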

5.3. Data features

Table 3 shows the features monitored for DDoS attack detection. We can divide the feature space roughly into five categories, the first two collected from the server's network connection side, and the last three collected from the server's resource management side. The first two categories consist of a variety of packet types in a SIP network: SIP Requests and SIP Responses (Fig. 4); these packet types have different well-defined roles [1]. Notice that the actual features used in the detection model are the count statistics, or histograms, of these message type occurrences within an observation interval (e.g., 1 sec). The underlying assumption here is that a significant change in the pattern of SIP message histograms is a direct reflection of a change in messaging traffic behavior, possibly indicating an anomaly, i.e., an attack.
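To make the notion of count features concrete, the following minimal sketch buckets timestamped messages into fixed observation intervals and produces one count vector per interval. The function name, the input format and the toy trace are assumptions made for illustration; they do not describe the actual network monitor.

```python
from collections import Counter


def count_features(messages, feature_names, interval=1.0):
    """Bucket (timestamp, message_type) pairs into fixed-length intervals.

    Returns a list of count vectors, one per interval, ordered as
    feature_names. Message types not in feature_names are ignored.
    """
    if not messages:
        return []
    start = min(t for t, _ in messages)
    end = max(t for t, _ in messages)
    n_bins = int((end - start) // interval) + 1
    bins = [Counter() for _ in range(n_bins)]
    for t, msg_type in messages:
        bins[int((t - start) // interval)][msg_type] += 1
    return [[b[name] for name in feature_names] for b in bins]


# Hypothetical toy trace: (timestamp in seconds, SIP message type).
trace = [(0.1, "INVITE"), (0.4, "100"), (0.9, "INVITE"),
         (1.2, "BYE"), (2.7, "INVITE")]
features = ["INVITE", "BYE", "100"]
print(count_features(trace, features))
# [[2, 0, 1], [0, 1, 0], [1, 0, 0]]
```

Each row is the histogram of message types observed in one interval, which is the kind of vector the change point model consumes at each time step.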

The other three feature categories are named Resource Usage, Asterisk Stats and Asterisk Logs. The first of these consists of the pair of CPU usage and memory usage of the virtual machine in which Trixbox is installed. The second one is made up of features that reflect the load created by Asterisk. The last category counts the keywords in the log files generated by Asterisk. We conjecture that all these features would diverge from their average values in the case of an attack and hence potentially qualify as anomaly indicators.

5.4. Evaluation

We measure the performance of our DDoS monitor on the basis of the F-score, which is defined as the harmonic mean of the precision (P) and recall (R) measures. The F-score gets closer to 1 when both precision and recall are close to 1, corresponding to good performance; conversely, the F-score diminishes to 0 when the system performs poorly due to low precision, low recall, or both.

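$$\text{F-score} = \frac{2 \times P \times R}{P + R} \tag{26}$$

$$\text{Precision } (P) = \frac{\#\ \text{true alarms}}{\#\ \text{alarms}} = \frac{T_a}{T_a + F_a} \tag{27}$$

$$\text{Recall } (R) = \frac{\#\ \text{true alarms}}{\#\ \text{ground truth}} = \frac{T_a}{G_a} \tag{28}$$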

where $T_a$ and $F_a$ are the numbers of true alarms (true positives) and false alarms (false positives), and $G_a$ is the true number of change points. An alarm $g_t$ signals a change event, and it is triggered whenever the change point probability in Eq. (14) at time t exceeds a certain threshold $\lambda_a$:

$$g_t = \begin{cases} 0 & \text{if } p(s_t \mid v_{1:t+L}) < \lambda_a \\ 1 & \text{otherwise} \end{cases} \tag{29}$$

The true and false alarms are calculated by matching the alarms $g_{1:T}$ with the ground truth of change events $\hat{g}_{1:T}$. In the design of our experiment, the time stamps for the beginning of attacks are manually set, but the actual effect of an attack is observed with some delay due to the combined emergent behavior of the SIP server, the simulation and the vulnerability scanning tool Nova V-Spy. Therefore, we initialize the ground truth with the scheduled attack time stamps and adjust them manually afterwards.

We declare a correct detection if the alarm $g_i$ is within a tolerance vicinity of the corresponding ground truth event $\hat{g}_j$, that is, $|i - j| < w$, and increment the number of true alarms. Alarms not matched with any ground truth are regarded as false positives.
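The evaluation procedure of Eqs. (26)–(29) can be sketched as follows. The text does not specify how several alarms near the same ground truth event are handled, so this sketch assumes a one-to-one matching in which each event is credited at most once; the function names and the toy probabilities are illustrative.

```python
def alarms_from_probs(change_probs, threshold):
    """Eq. (29): raise an alarm at time t unless the probability is below the threshold."""
    return [t for t, p in enumerate(change_probs) if p >= threshold]


def score_alarms(alarm_times, truth_times, tolerance):
    """Match alarms to ground truth events with |i - j| < tolerance.

    Assumption: each ground truth event is matched by at most one alarm;
    further alarms near an already matched event count as false alarms.
    Returns (precision, recall, f_score).
    """
    unmatched = sorted(truth_times)
    true_alarms = 0
    for i in sorted(alarm_times):
        hit = next((j for j in unmatched if abs(i - j) < tolerance), None)
        if hit is not None:
            unmatched.remove(hit)
            true_alarms += 1
    # len(alarm_times) equals T_a + F_a, len(truth_times) equals G_a.
    precision = true_alarms / len(alarm_times) if alarm_times else 0.0
    recall = true_alarms / len(truth_times) if truth_times else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


probs = [0.0, 0.1, 0.9, 0.2, 0.0, 0.0, 0.8, 0.0, 0.0, 0.7]
alarms = alarms_from_probs(probs, threshold=0.5)  # alarms at t = 2, 6, 9
print(score_alarms(alarms, truth_times=[1, 6], tolerance=2))
```

In this toy run two of the three alarms fall within the tolerance of a ground truth event, so precision is 2/3 and recall is 1, while the unmatched alarm at t = 9 is a false positive.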

6. Results

We evaluate the performance of our proposed DDoS monitor on the simulated data using various input feature combinations. We exhaustively test the 5 feature categories in all category combinations, where each category is either excluded or included with a Poisson or a Multinomial observation model. This results in a total of $3^5 - 1 = 242$ observation models (the combination that excludes every category is discarded), where each observation model corresponds to one particular instance of category combination.
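The count of observation models can be reproduced with a small enumeration. This is only an illustrative sketch: the labels "PG" (Poisson-Gamma) and "DM" (Dirichlet-Multinomial) and the dictionary representation are chosen here for readability.

```python
from itertools import product

CATEGORIES = ["SIP Requests", "SIP Responses", "Resource Usage",
              "Asterisk Stats", "Asterisk Logs"]
# None = category excluded, "PG" = Poisson-Gamma, "DM" = Dirichlet-Multinomial.
CHOICES = [None, "PG", "DM"]


def observation_models():
    """All category-to-model assignments except the one excluding every category."""
    models = []
    for assignment in product(CHOICES, repeat=len(CATEGORIES)):
        if any(choice is not None for choice in assignment):
            models.append({c: m for c, m in zip(CATEGORIES, assignment)
                           if m is not None})
    return models


models = observation_models()
print(len(models))  # 242, i.e. 3**5 - 1
```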

The hyper-parameters for each observation model need to be adjusted to obtain the best F-scores. For this purpose, we first perform a grid search inside the parameter space. Since grid search is feasible for only a limited number of parameters, we use shared parameters for the priors of the Dirichlet and Gamma distributions. In this setup, we assign a single parameter α for the Dirichlet priors by setting $w_D = [\alpha, \alpha, \ldots, \alpha]$ and a single parameter a for all


Table 3
The five categories of features collected from the network side and resource side of the SIP server.

| Category | Feature | Description |
|---|---|---|
| SIP Requests | REGISTER | Num. of “register” requests |
| | INVITE | Num. of “invite” requests |
| | SUBSCRIBE | Num. of “subscription” requests |
| | NOTIFY | Num. of “notification” requests |
| | OPTIONS | Num. of “options” requests |
| | ACK | Num. of “acknowledgment” requests |
| | BYE | Num. of “bye” requests |
| | CANCEL | Num. of “cancellation” requests |
| | PRACK | Num. of “provisional acknowledgement” requests |
| | PUBLISH | Num. of “event publish” requests |
| | INFO | Num. of “information update” requests |
| | REFER | Num. of “call transfer” requests |
| | MESSAGE | Num. of “instant message” requests |
| | UPDATE | Num. of “session state update” requests |
| SIP Responses | 100 | Num. of “trying” responses |
| | 180 | Num. of “ringing” responses |
| | 183 | Num. of “session progress” responses |
| | 200 | Num. of “success” responses |
| | 400 | Num. of “bad request” errors |
| | 401 | Num. of “unauthorized” errors |
| | 403 | Num. of “forbidden” errors |
| | 404 | Num. of “not found” errors |
| | 405 | Num. of “not allowed” errors |
| | 481 | Num. of “dialog does not exist” errors |
| | 486 | Num. of “busy” errors |
| | 487 | Num. of “request terminated” errors |
| | 500 | Num. of “server internal” errors |
| | 603 | Num. of “decline” errors |
| Resource Usage | TOT_CPU | Percentage of total CPU usage |
| | TOT_MEM | Percentage of total virtual memory usage |
| Asterisk Stats | A_CPU | Percentage of CPU used by Asterisk |
| | MEM | Percentage of physical memory utilized by Asterisk |
| | FH | Num. of Asterisk file descriptors |
| | THREADS | Num. of Asterisk threads |
| | TCP_CONN | Num. of Asterisk TCP connections |
| | UDP_CONN | Num. of Asterisk UDP connections |
| Asterisk Logs | A_WARNING | Num. of Asterisk “warning” log messages |
| | NOTICE | Num. of Asterisk “notice” log messages |
| | VERBOSE | Num. of Asterisk “verbose” log messages |
| | ERROR | Num. of Asterisk “error” log messages |
| | DEBUG | Num. of Asterisk “debug” log messages |

Table 4
Grid search space.

| Parameter | Search values |
|---|---|
| α | 1, 10, 100 |
| a | 1, 10, 100 |
| π | 10⁻², 10⁻⁴, 10⁻⁸ |

Gamma priors. We also set the scale parameter of the Gamma priors to b = 1. The search space is given in Table 4.
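A grid search over the values of Table 4 amounts to scoring every combination and keeping the one with the best average F-score across the traces. The sketch below illustrates this loop under stated assumptions: `evaluate` is a hypothetical callback standing in for a full run of the change point model, and `fake_evaluate` is a toy stand-in used only to exercise the code.

```python
from itertools import product
from statistics import mean

# Search values from Table 4.
GRID = {"alpha": [1, 10, 100], "a": [1, 10, 100], "pi": [1e-2, 1e-4, 1e-8]}


def grid_search(evaluate, traces, grid=GRID):
    """Return (best_params, best_avg_f) over the Cartesian product of the grid.

    `evaluate(params, trace)` is a hypothetical callback that runs the
    detector with the given hyper-parameters on one trace and returns
    its F-score.
    """
    best_params, best_score = None, float("-inf")
    for values in product(*grid.values()):
        params = dict(zip(grid, values))
        avg_f = mean(evaluate(params, trace) for trace in traces)
        if avg_f > best_score:
            best_params, best_score = params, avg_f
    return best_params, best_score


# Toy stand-in for the real detector; it carries no experimental meaning.
def fake_evaluate(params, trace):
    return 1.0 / (1.0 + abs(params["alpha"] - 10)
                  + abs(params["a"] - 1) + params["pi"])


traces = ["LOW-LOW", "LOW-HIGH", "HIGH-LOW", "HIGH-HIGH"]
best_params, best_score = grid_search(fake_evaluate, traces)
print(best_params)
```

The grid has only 3 × 3 × 3 = 27 points because the prior parameters are shared; without sharing, the number of combinations would grow exponentially with the dimension of w.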

The configurations with the best average F-scores on the 4 different traces after the grid search are reported in Tables 5, 6 and 7. Table 5 presents the results when change point probabilities are computed online (using only forward recursion). In order to test the conjecture that deferred change point decisions should yield better performance, we run the online smoothing algorithm (see Table 6). Since the grid search may not be feasible for higher-dimensional vectors, we also develop a maximum likelihood scheme for estimating the hyper-parameters. To this effect, we employ the EM algorithm described in Section 4.6 for the models that attained the highest F-scores according to the grid search. The results are given in Table 7.

From Tables 5–7 one can observe that SIP Requests contribute to the feature set in all cases, hence they seem to be the most important features collected from data. Second in importance, the Resource Usage features help improve the accuracy of our system. We also notice that the Dirichlet-Multinomial (DM) model usually gives better accuracy than the Poisson-Gamma (PG) model. Furthermore, as the average F-scores in Table 6 are greater than those in Table 5, we deduce that the accuracy of change estimation increases provided we allow deferred change point decisions with a lag value of L = 5 seconds. Increasing the lag further enhances the results only very slightly, whereas the cost of latency in the attack signal may become prohibitive.

We observe that maximizing the likelihood of the hyper-parameters under the proposed models does not necessarily maximize the F-scores; in other words, the F-scores obtained after the maximum likelihood estimation of hyper-parameter values are below the scores obtained by the grid search. We conjecture that this may be due to the mismatch between the model and the actual data; investigating this will be the subject of future research.

6.1. Comparison to a distance based method

We employed a simple distance based method for classifying the normal and attack traffic in our data set, based on the previous works [34] and [35]. In this method, we use the Hellinger distance between a normal traffic feature vector p, which is learned from the data, and the traffic vectors $q_t$ collected at each time instance t.
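For reference, a minimal sketch of the Hellinger distance between two discrete distributions is given below. It assumes that the feature vectors are first normalized to sum to one and uses the common convention with the factor 1/√2, so that the distance lies in [0, 1]; the exact form used in [34] and [35] may differ. The count vectors are hypothetical.

```python
import math


def normalize(counts):
    """Turn a non-negative count vector into a probability vector."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalize an all-zero vector")
    return [c / total for c in counts]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2.0)


normal = normalize([50, 20, 5])    # hypothetical baseline counts
attack = normalize([400, 20, 5])   # hypothetical counts during a flood
print(hellinger(normal, normal))   # 0.0
print(hellinger(normal, attack))   # strictly between 0 and 1
```

A detector of this kind would flag time t as an attack when the distance between p and $q_t$ exceeds a chosen threshold.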


Fig. 4. Histogram of features from the HIGH–HIGH data set for 200 seconds.

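Table 5
Best models after the grid search with scores collected by filtering. The model labels (DM: Dirichlet-Multinomial, PG: Poisson-Gamma) are listed in the order of the feature columns SIP Requests, SIP Responses, Resource Usage, Asterisk Stats and Asterisk Logs; columns left empty in a row are omitted.

| Feature models | Low–Low P | Low–Low R | Low–High P | Low–High R | High–Low P | High–Low R | High–High P | High–High R | F-score (Avg.) |
|---|---|---|---|---|---|---|---|---|---|
| DM DM | 0.86 | 0.83 | 0.93 | 0.95 | 0.85 | 0.94 | 0.99 | 0.94 | 0.91 |
| PG DM | 0.94 | 0.88 | 0.93 | 0.95 | 0.88 | 0.71 | 0.95 | 1.00 | 0.90 |
| PG | 0.91 | 0.88 | 0.92 | 0.95 | 0.88 | 0.72 | 0.95 | 1.00 | 0.90 |
| PG DM | 0.89 | 0.86 | 0.90 | 0.95 | 0.89 | 0.71 | 0.98 | 0.98 | 0.89 |
| PG DM | 0.92 | 0.83 | 0.89 | 0.95 | 0.90 | 0.66 | 0.96 | 1.00 | 0.88 |
| PG DM DM | 0.92 | 0.82 | 0.94 | 0.93 | 0.90 | 0.65 | 0.95 | 1.00 | 0.88 |
| DM | 0.90 | 0.77 | 0.91 | 0.89 | 0.82 | 0.82 | 0.96 | 0.94 | 0.88 |
| DM DM | 0.94 | 0.74 | 0.91 | 0.96 | 0.89 | 0.64 | 0.98 | 0.98 | 0.87 |
| DM DM DM | 0.76 | 0.88 | 0.94 | 0.89 | 0.85 | 0.80 | 0.88 | 0.98 | 0.87 |
| DM DM | 0.81 | 0.83 | 0.84 | 0.94 | 0.86 | 0.82 | 0.90 | 0.95 | 0.87 |
| Average | 0.89 | 0.83 | 0.91 | 0.94 | 0.87 | 0.75 | 0.95 | 0.98 | 0.89 |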
