Fast and flexible packed string matching ✩

Journal of Discrete Algorithms
www.elsevier.com/locate/jda

Simone Faro a,∗, M. Oğuzhan Külekci b

a Dipartimento di Matematica e Informatica, Università di Catania, Italy
b İstanbul Medipol University, Faculty of Engineering and Natural Sciences, Turkey

Article info

Article history: Available online 24 July 2014

Keywords: Exact string matching; Text algorithms; Experimental algorithms; Online searching; Information retrieval

Abstract

Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology. In the last two decades a general trend has appeared trying to exploit the power of the word RAM model to speed up the performance of classical string matching algorithms. In this model an algorithm operates on words of length w, grouping blocks of characters, and arithmetic and logic operations on the words take one unit of time.

In this paper we use specialized word-size packed string matching instructions, based on the Intel streaming SIMD extensions (SSE) technology, to design a very fast string matching algorithm. We evaluate our solution in terms of efficiency, stability and flexibility, where we propose to use the deviation in running time of an algorithm on distinct equal-length patterns as a measure of stability.

From our experimental results it turns out that, despite its quadratic worst case time complexity, the newly presented algorithm becomes the clear winner on average in many cases, when compared against the most recent and effective algorithms known in the literature.

1. Introduction

Given a text t of length n and a pattern p of length m over some alphabet $\Sigma$ of size $\sigma$, the exact string matching problem consists in finding all occurrences of the pattern p in t. This problem has been extensively studied in computer science because of its direct application to many areas. Moreover, string matching algorithms are basic components in many software applications and play an important role in theoretical computer science by providing challenging problems.

In a computational model where the matching algorithm is restricted to read all the characters of the text one by one, the optimal complexity is $O(n)$, and was achieved for the first time by the well known Knuth–Morris–Pratt algorithm [26] (KMP). However, in many practical cases it is possible to avoid reading all the characters of the text, achieving sub-linear performance on average. The optimal average $O\left(\frac{n \log_\sigma m}{m}\right)$ time complexity [35] was reached for the first time by the Backward-DAWG-Matching algorithm [11] (BDM). However, all algorithms with a sub-linear average behavior may have to read all the text characters in the worst case. It is interesting to note that many of those algorithms have an even worse $O(nm)$-time complexity in the worst case [10,19,22].

✩ A preliminary version of the results presented in this paper has been previously published in [15].
∗ Corresponding author.
E-mail addresses: faro@dmi.unict.it (S. Faro), okulekci@medipol.edu.tr (M.O. Külekci).
http://dx.doi.org/10.1016/j.jda.2014.07.003

In the last two decades a lot of work has been done to exploit the power of the word RAM model of computation to speed up classical string matching algorithms. In this model, the computer operates on words of length w, thus blocks of characters are read and processed at once. This means that the usual arithmetic and logic operations on the words all take one unit of time.

Most of the solutions which exploit the word RAM model are based on either the bit-parallelism technique or the packed string matching technique.

The bit-parallelism technique [1] takes advantage of the intrinsic parallelism of the bit operations inside a computer word, allowing to cut down the number of operations that an algorithm performs by a factor up to w. Bit-parallelism is particularly suitable for the efficient simulation of nondeterministic automata. The Shift-Or [1] (SO) algorithm is the first of this genre; it efficiently simulates the nondeterministic version of the KMP automaton and runs in $O(n\lceil m/w\rceil)$ time. It is still considered among the best practical algorithms in the case of very short patterns and small alphabets [22,19]. Later a very fast BDM-like algorithm (BNDM), based on the bit-parallel simulation of the nondeterministic suffix automaton, was presented in [31]. Some variants of the BNDM algorithm [16,18,12,32] are among the most efficient practical solutions in the literature (see [22,19]). However, the bit-parallel encoding requires one bit per pattern symbol, for a total of $\lceil m/w\rceil$ computer words. Thus, as long as a pattern fits in a single computer word, bit-parallel algorithms are extremely fast; otherwise their performance degrades considerably as $\lceil m/w\rceil$ grows. Though there are a few techniques to maintain good performance in the case of long patterns [28,13,8,9], such a limitation is intrinsic.
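To make the bit-parallel idea concrete, the following is a minimal Python sketch of the classic Shift-Or scheme described above. It is an illustration only, not the implementation benchmarked in this paper; the function name `shift_or` is a hypothetical choice, and Python's unbounded integers stand in for a single machine word, so the $\lceil m/w\rceil$ limitation discussed above is not visible here.

```python
def shift_or(pattern, text):
    """Return the start positions of all occurrences of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    all_ones = (1 << m) - 1
    # mask[c] has a 0 at bit i exactly when pattern[i] == c.
    mask = {}
    for i, c in enumerate(pattern):
        mask[c] = mask.get(c, all_ones) & ~(1 << i)
    state = all_ones  # bit i == 0 means pattern[0..i] matches a suffix of the text read so far
    hits = []
    for pos, c in enumerate(text):
        # One shift and one OR update all m automaton states at once.
        state = ((state << 1) | mask.get(c, all_ones)) & all_ones
        if state & (1 << (m - 1)) == 0:
            hits.append(pos - m + 1)
    return hits
```

For instance, `shift_or("ab", "abcab")` visits each text character once and performs a constant number of word operations per character.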

In the packed string matching technique multiple characters are packed into one larger word, so that the characters can be compared in bulk rather than individually. In this context, if the characters of a string are drawn from an alphabet of size $\sigma$, then $\lfloor w/\log\sigma \rfloor$ different characters fit in a single word, using $\lceil\log\sigma\rceil$ bits per character. The packing factor is $\alpha = w/\log\sigma$ (see footnote 1).

A first theoretical result in packed string matching was proposed by Fredriksson [23]. He presented a general scheme that can be applied to speed up many pattern matching algorithms. His approach relies on the use of the four-Russians technique (i.e. tabulation), achieving in favorable cases an $O(n^{\varepsilon} m)$-space and $O\left(\frac{n}{\log_\sigma n} + n^{\varepsilon} m + occ\right)$-time complexity, where $\varepsilon > 0$ denotes an arbitrarily small constant, and occ denotes the number of occurrences of p in t. Bille [5] presented an alternative solution with $O\left(\frac{n}{\log_\sigma n} + m + occ\right)$-time and $O(n^{\varepsilon} + m)$-space complexities by an efficient segmentation and coding of the KMP automaton. Belazzougui [2] proposed a packed string matching algorithm which works in $O\left(\frac{n}{m} + \frac{n}{\alpha} + m + occ\right)$ time and $O(m)$ space, reaching the optimal $O\left(\frac{n}{\alpha} + occ\right)$-time bound for $\alpha \le m \le \frac{n}{\alpha}$. More recently, Belazzougui and Raffinot [3] introduced an average-optimal time string matching algorithm for packed strings, which achieves $O(n/m)$ query time. However, none of these results leads to practical algorithms.

The first algorithm that achieves good practical and theoretical results was very recently proposed by Ben-Kiki et al. [4]. The algorithm is based on two specialized packed string instructions, the pcmpestrm and the pcmpestri instructions [29], and reaches the optimal $O\left(\frac{n}{\alpha} + occ\right)$-time complexity requiring only $O(1)$ extra space. Moreover the authors showed that their algorithm turns out to be among the fastest string matching solutions in the case of very short patterns. However, it has to be noticed that on the family of Intel Sandy Bridge processors, which we consider as the benchmark platform for the implementations throughout the study, pcmpestrm and pcmpestri have 2-cycle throughput and 7- and 8-cycle latency, respectively [29].

When the length of the searched pattern increases, another algorithm named Streaming SIMD Extensions Filter (SSEF), presented by Külekci in [27] (and extended to multiple pattern matching in [14]), exploits the advantages of the word-RAM model. Specifically it uses a filter method that inspects blocks of characters instead of reading them one by one. Despite its $O(nm)$ worst case time complexity, the SSEF algorithm turns out to be among the fastest solutions when searching for long patterns [22,19].

Efficient solutions have also been designed for searching on packed DNA sequences [33,17]. However, in this paper we do not take into account this type of solution since it requires a different type of data representation.

Streaming SIMD technology offers single instructions to perform a variety of tests on packed strings. Unfortunately those instructions are heavier than other instructions provided in the same family, as a consequence of their relatively high latencies. Hence, in this paper we focus on the design of algorithms using instructions with low latency and high throughput, when compared with those used in [4].

Specifically we introduce a new practical and efficient algorithm for the exact packed string matching problem that turns out to be faster than the best algorithms known in the literature in most practical cases [15].

The newly presented algorithm, named Exact Packed String Matching (EPSM), is based on four different search procedures used for, respectively, very short patterns ($0 < m < \alpha/2$), short patterns ($\alpha/2 \le m < \alpha$), medium length patterns ($m \ge \alpha$) and long patterns ($m \ge w$). All search procedures have an $O(nm)$ worst case time complexity. However, they have very good performance on average. In the case of very short patterns, i.e. when $m \le \alpha/2$, the first two search procedures achieve, respectively, an $O(n + occ)$ and an optimal $O\left(\frac{n}{\alpha} + occ\right)$-time complexity.

The paper is organized as follows. In Section 2, we introduce some notions and terminology, while in Section 3 we describe the model of computation we assume for describing our solutions. We then present a new algorithm for the packed string matching problem in Section 4 and report experimental results under various conditions in Section 5. Conclusions are given in Section 6.

1 However, it is noteworthy that in practice supporting varying packing factors does not seem feasible in today's SIMD technologies such as Intel's SSE instruction set. The practical implementations assume the ASCII alphabet with 8 bits per symbol, and the packing factor used is 16 (32) symbols per block in 128-bit (256-bit) SIMD technologies.

Fig. 1. An example of the application of wscmp(a, b), assuming $w = 48$, $\gamma = 4$ and $\alpha = 12$.

2. Notions and terminology

Throughout the paper we will make use of the following notations and terminology. A string p of length $m > 0$ is represented as a finite array $p[0..m-1]$ of characters from a finite alphabet $\Sigma$ of size $\sigma$. Thus, $p[i]$ will denote the $(i+1)$-st character of p, for $0 \le i < m$, and $p[i..j]$ will denote the factor (or substring) of p contained between the $(i+1)$-st and the $(j+1)$-st characters of p, for $0 \le i \le j < m$. In some cases we will denote by $p_i$ the $(i+1)$-st character of p, so that $p_i = p[i]$ and $p = p_0 p_1 \ldots p_{m-1}$.

We indicate with the symbol w the number of bits in a computer word and with the symbol $\gamma = \lceil\log\sigma\rceil$ the number of bits used for encoding a single character of the alphabet $\Sigma$. The number of characters of the alphabet that fit in a single word is denoted by $\alpha = \lfloor w/\gamma \rfloor$. Without loss of generality we will assume throughout the paper that $\gamma$ divides w and that $\alpha$ is an even value.

In chunks of $\alpha$ characters, the string p is represented by an array $P[0..k-1]$ of length $k = \lfloor (m-1)/\alpha \rfloor + 1$. In particular we denote $P = P_0 P_1 P_2 \ldots P_{k-1}$, where $P_i = p_{i\alpha}\, p_{i\alpha+1}\, p_{i\alpha+2} \ldots p_{i\alpha+\alpha-1}$, for $0 \le i < k$. The last block $P_{k-1}$ is not complete if $m \bmod \alpha \ne 0$. In that case, the rightmost remaining characters of the block are set to zero.
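The block representation above can be sketched in a few lines of Python. This is only an illustration under the assumption $\gamma = 8$ (one byte per character); the helper name `pack_blocks` is hypothetical and does not appear in the paper.

```python
def pack_blocks(p, alpha=16):
    """Split the byte string p into k = floor((m-1)/alpha) + 1 blocks of alpha bytes.

    The last block is padded on the right with zero bytes when
    len(p) is not a multiple of alpha.
    """
    m = len(p)
    k = (m - 1) // alpha + 1
    padded = p.ljust(k * alpha, b"\x00")
    return [padded[i * alpha:(i + 1) * alpha] for i in range(k)]
```

With `alpha=4`, the 5-character string `b"abcde"` yields two blocks, the second holding `e` followed by three zero bytes.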

Although different values of $\alpha$ and $\gamma$ are possible, in most cases we assume that $\alpha = 16$ and $\gamma = 8$, which is the most common case when working with characters in ASCII code and in a word RAM model with 128-bit registers, which are available in almost all recent commodity processors supporting single instruction multiple data (SIMD) operations.

Finally, we recall the notation of some bitwise infix operators on computer words, namely the bitwise and "&", the bitwise or "|" and the left shift "≪" operator (which shifts its first argument to the left by a number of bits equal to its second argument).

3. The model

In the design of our algorithms we use specialized word-size packed string matching instructions, based on the Intel streaming SIMD extensions (SSE) technology. SIMD instructions exist in many recent microprocessors, supporting parallel execution of some operations on multiple data simultaneously via a set of special instructions working on a limited number of special registers.

Although the usage of SIMD has been explored deeply in multimedia processing, in the implementation of encryption/decryption algorithms, and in some scientific calculations, it has not been much addressed in pattern matching.

In the design of our algorithms we make use of the following specialized word-size packed instructions. For each instruction we describe how it could be emulated by using SSE specialized intrinsics.

3.1. wscmp(a, b) (word-size compare instruction)

The wscmp instruction compares two w-bit words, handled as a block of $\alpha$ characters. In particular if $a = a_0 a_1 \ldots a_{\alpha-1}$ and $b = b_0 b_1 \ldots b_{\alpha-1}$ are the two w-bit integer parameters, wscmp(a, b) returns an $\alpha$-bit value $r = r_0 r_1 \ldots r_{\alpha-1}$, where $r_i = 1$ if and only if $a_i = b_i$, and $r_i = 0$ otherwise. Fig. 1 shows an example of the application of wscmp(a, b), assuming $w = 48$, $\gamma = 4$ and $\alpha = 12$.

The wscmp specialized instruction can be emulated in constant time by using the following sequence of specialized SIMD instructions

    h ← _mm_cmpeq_epi8(a, b)
    r ← _mm_movemask_epi8(h)

Specifically the _mm_cmpeq_epi8 instruction compares two 128-bit words, handled as a block of sixteen 8-bit values, and returns a 128-bit value $h = h_0 h_1 \ldots h_{15}$, where $h_i = 1^8$ if and only if $a_i = b_i$, and $h_i = 0^8$ otherwise. It has a 0.5-cycle throughput and a 1-cycle latency.

The _mm_movemask_epi8 instruction gets a 128-bit parameter h, handled as sixteen 8-bit integers, creates a 16-bit mask from the most significant bits of the 16 integers in h, and zero-extends the upper bits.
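The semantics of wscmp can be modeled without any SIMD hardware. The sketch below is a plain Python emulation, assuming one byte per character; the function name `wscmp` mirrors the instruction name, but the bit layout is an assumption of this sketch: bit i of the returned integer (least significant bit first, as produced by a byte move-mask) plays the role of $r_i$.

```python
def wscmp(a, b):
    """Model of the word-size compare: bit i of the result is 1 iff a[i] == b[i]."""
    if len(a) != len(b):
        raise ValueError("both blocks must hold the same number of characters")
    r = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            r |= 1 << i  # plays the role of r_i in the text
    return r
```

Because bit 0 here corresponds to the leftmost character, the paper's left shift of a register corresponds to a right shift of the Python integer.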

Fig. 2. An example of the application of wsmatch(a, b), assuming $w = 48$, $\gamma = 4$, $\alpha = 12$ and $k = 3$.

Fig. 3. An example of the application of wsblend(a, b), assuming $w = 48$, $\gamma = 4$ and $\alpha = 12$.

3.2. wsmatch(a, b) (word-size matching instruction)

The wsmatch instruction reports all occurrences of a short string b in a w-bit parameter a, handled as a string of $\alpha$ characters. The parameter b is a string of length $k \le \alpha$.

Specifically, if $a = a_0 a_1 \ldots a_{\alpha-1}$ and $b = b_0 b_1 \ldots b_{k-1}$, then the wsmatch(a, b) instruction returns an $\alpha$-bit integer value, $r = r_0 r_1 \ldots r_{\alpha-1}$, where $r_i = 1$ if and only if $a_{i+j} = b_j$ for $j = 0 \ldots k-1$, i.e. an occurrence of b in a begins at position i.

Notice that $r_i = 0$ for $\alpha - k < i < \alpha$, since no occurrence of b in a could begin at a position greater than $\alpha - k$. Fig. 2 shows an example of the application of wsmatch(a, b), assuming $w = 48$, $\gamma = 4$, $\alpha = 12$ and $k = 3$.

The wsmatch(a, b) instruction can be emulated in constant time by using the following sequence of SIMD specialized instructions

    h ← _mm_mpsadbw_epu8(a, b)
    h′ ← _mm_cmpeq_epi8(h, z)
    r ← _mm_movemask_epi8(h′)

where z is a 128-bit register with all bits set to 0, i.e. $z = 0^{128}$.

Specifically the _mm_mpsadbw_epu8(a, b) instruction gets two 128-bit words, handled as a block of sixteen 8-bit values, and returns a 128-bit value $r = r_0 r_1 \ldots r_7$ (handled as a block of eight 16-bit values), where $r_i$ is computed as $r_i = \sum_{j=0}^{3} |a_{i+j} - b_j|$ for $i = 0 \ldots 7$.

Thus we have that $r_i = 0^{16}$ if and only if $a_{i+j} = b_j$ for $j = 0 \ldots 3$, i.e. an occurrence of the prefix of b with length 4 begins in a at position i. The _mm_mpsadbw_epu8 instruction has 1-cycle throughput and a 4-cycle latency. The _mm_cmpeq_epi8 and _mm_movemask_epi8 instructions have been described above.
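The following Python sketch models both the abstract wsmatch semantics and the sum-of-absolute-differences step on which the emulation relies. It is a scalar illustration under the assumption of one byte per character; the names `mpsadbw` and `wsmatch` are chosen to echo the text, and the bit layout (bit i of the result stands for $r_i$) is an assumption of this sketch rather than the exact register layout of the hardware.

```python
def mpsadbw(a, b):
    """Eight sums r_i = sum_{j=0..3} |a[i+j] - b[j]|, for i = 0..7.

    a must hold at least 11 bytes and b at least 4 bytes.
    """
    return [sum(abs(a[i + j] - b[j]) for j in range(4)) for i in range(8)]


def wsmatch(a, b):
    """Bit i of the result is 1 iff an occurrence of b begins at position i of a."""
    alpha, k = len(a), len(b)
    r = 0
    for i in range(alpha - k + 1):  # no occurrence can begin after alpha - k
        if a[i:i + k] == b:
            r |= 1 << i
    return r
```

A zero entry in the list returned by `mpsadbw` marks a position where the 4-character prefix of `b` occurs, which is the test performed by the comparison against the all-zero register.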

3.3. wsblend(a, b) (word-size blend instruction)

The wsblend instruction blends two w-bit parameters, handled as two blocks of $\alpha$ characters. Specifically if $a = a_0 a_1 \ldots a_{\alpha-1}$ and $b = b_0 b_1 \ldots b_{\alpha-1}$, the instruction returns a w-bit integer $r = r_0 r_1 \ldots r_{\alpha-1}$, where $r_i = a_{i+\alpha/2}$ if $0 \le i < \alpha/2$, and $r_i = b_{i-\alpha/2}$ if $\alpha/2 \le i < \alpha$, i.e. $r = a_{\alpha/2}\, a_{\alpha/2+1} \ldots a_{\alpha-1}\, b_0 b_1 \ldots b_{\alpha/2-1}$. Fig. 3 shows an example of the application of wsblend(a, b), assuming $w = 48$, $\gamma = 4$ and $\alpha = 12$.

The wsblend(a, b) instruction can be emulated in constant time by using the following sequence of SIMD specialized instructions

    h ← _mm_blend_epi16(a, b, c)
    SHUFFLE ← _MM_SHUFFLE(1, 0, 3, 2)
    r ← _mm_shuffle_epi32(h, SHUFFLE)

The _mm_blend_epi16 instruction blends two 128-bit integers, $a = a_0 a_1 \ldots a_7$ and $b = b_0 b_1 \ldots b_7$, handled as packed 16-bit integers, according to a third parameter c. In particular it returns a 128-bit integer $r = r_0 r_1 \ldots r_7$ where $r_i = a_i$ if $c_i = 0$, and $r_i = b_i$ otherwise. If we set $c = 1^{64}0^{64}$ we get $r = b_0 b_1 b_2 b_3 a_4 a_5 a_6 a_7$. The _mm_blend_epi16 instruction has 0.5-cycle throughput and a 1-cycle latency.

The _mm_shuffle_epi32 instruction shuffles a w-bit parameter, $a = a_0 a_1 a_2 a_3$, handled as four 32-bit values, according to the order of the _MM_SHUFFLE macro. In this case we get $r = a_2 a_3 a_0 a_1$. The _mm_shuffle_epi32 instruction has 1-cycle throughput and a 1-cycle latency.

3.4. wscrc(a) (word-size cyclic redundancy check)

The wscrc instruction computes the 32-bit cyclic redundancy checksum (CRC) signature for a w-bit parameter. It is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to raw data, and it can also be used as a hash function.

The wscrc(a) instruction can be emulated in constant time by using the following SIMD specialized instruction

    r ← _mm_crc32_u64(a)

Specifically the _mm_crc32_u64(a) instruction computes the 32-bit cyclic redundancy check of a 64-bit block according to a polynomial. This instruction has a 1-cycle throughput and a 3-cycle latency, and thus provides a robust and fast way of computing hash values.
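A CRC-based fingerprint can be sketched in Python as follows. Note the substitution: `zlib.crc32` from the standard library uses a different CRC polynomial from the hardware CRC32 instruction (which implements CRC-32C), so the values differ from those computed by the intrinsic; only the hashing idea is illustrated. The function name `fingerprint` and the default k = 11 bits are assumptions of this sketch.

```python
import zlib


def fingerprint(block, k=11):
    """Hash the first 8 bytes (64 bits) of block to a k-bit value.

    zlib.crc32 is a stand-in for the hardware CRC instruction.
    """
    return zlib.crc32(block[:8]) & ((1 << k) - 1)
```

Only the first 64 bits of the block contribute, mirroring the fact that the checksum is computed on a 64-bit operand.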

3.5. Additional specialized instructions

In addition to the above listed instructions, given an $\alpha$-bit register r, in our description we make use of the symbol $\{r\}$ to indicate the set of bits in r whose value is set. More formally, given an $\alpha$-bit register $r = r_0 r_1 r_2 \ldots r_{\alpha-1}$, we have $\{r\} = \{\, i \mid 0 \le i < \alpha \text{ and } r_i = 1 \,\}$. Moreover, given a value $s \in \mathbb{N}$, we use for simplicity the expression $s + \{r\}$ to indicate the set of values $\{\, s + i \mid i \in \{r\} \,\}$.

The cardinality of the set $\{r\}$ can be computed in constant time by using the SIMD specialized instruction

    n ← _mm_popcnt_u32(r)

which calculates the number of bits of the parameter r that are set to 1. This instruction has 1-cycle throughput and a 3-cycle latency.

The values in $\{r\}$, on the other hand, can be listed in $O(\alpha)$-time and $O(1)$-space or, using a tabulation approach, in $O(|\{r\}|)$-time and $O(2^{\alpha})$-space. In the latter case we need an $O(\alpha 2^{\alpha})$-time preprocessing phase in order to address the $2^{\alpha}$ possible registers.
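The set notation above can be illustrated with a short Python sketch. It uses a third option not discussed in the text, namely repeatedly clearing the lowest set bit, which visits only the set bits without a lookup table. The function names are hypothetical, and bit i of the integer stands for $r_i$.

```python
def bit_positions(r):
    """Return the sorted list of indices i such that bit i of r is set, i.e. {r}."""
    out = []
    while r:
        low = r & -r                      # isolate the lowest set bit
        out.append(low.bit_length() - 1)  # its index
        r ^= low                          # clear it
    return out


def popcount(r):
    """Cardinality of {r}."""
    return bin(r).count("1")


def shifted_positions(s, r):
    """The set s + {r}, as a sorted list."""
    return [s + i for i in bit_positions(r)]
```

For a block starting at text position `s`, `shifted_positions(s, r)` gives the text positions flagged by the register `r`.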

4. A new packed string matching algorithm

In this section we present the new packed string matching algorithm, named Exact Packed String Matching (EPSM) algorithm. EPSM is based on three different auxiliary algorithms, which we name EPSMa, EPSMb and EPSMc, respectively. The EPSMa, EPSMb and EPSMc procedures have been previously described in a preliminary result presented in [14].

The first two auxiliary algorithms, EPSMa and EPSMb, are designed to search for patterns of length at most $\alpha/2$. When the length of the pattern is longer than $\alpha/2$ the algorithms adopt a filter mechanism: they first search for a substring of the pattern of length $\alpha/2$ and, when a candidate occurrence has been found, a naive check follows. The EPSMc algorithm adopts a filtering-based solution and has been designed for searching medium length and long patterns.

All three algorithms run in $O(nm)$ worst case time complexity and use, respectively, $O(\min\{m, \alpha\})$, $O(1)$ and $O(2^k)$ additional space, where k is a constant parameter. However, when $m \le \alpha/2$ the EPSMa and EPSMb algorithms reach, respectively, an $O\left(m\alpha + \frac{mn}{\alpha} + occ\right)$ and $O\left(\frac{n}{\alpha} + occ\right)$ time complexity.

The EPSMa procedure is designed to be extremely fast in the case of very short patterns, i.e. when $m \le \alpha/2$; the EPSMb procedure turns out to be a good choice when $\alpha/2 \le m < \alpha$, while EPSMc turns out to be effective when $\alpha \le m < w$.

In practical cases we tuned the EPSM algorithm in order to run EPSMa when $0 < m < 4$, EPSMb when $4 \le m < 16$, and EPSMc when $m \ge 16$. The pseudocode of the three algorithms is shown in Fig. 4.

4.1. EPSMa: searching for very short patterns

The EPSMa algorithm is designed to be extremely fast in the case of very short patterns. Although it could be adapted to work for longer patterns, its performance degrades as the length of the pattern increases.

The main idea in the EPSMa algorithm is to mark the positions of the very short pattern's symbols on the investigated text chunk. Assume we have m bitmaps, each $\alpha$ bits long, where the bits of the i-th bitmap are set to 1 at the positions where the corresponding symbol $p_i$ appears, and to 0 elsewhere. For instance, if P = ab, the first bitmap will indicate the positions where the letter a is observed, and the second one will do the same for the letter b. If ab appears in the current block such that $t_i t_{i+1} = ab$, for $0 \le i < (\alpha - 1)$, then position i of the first bitmap and position $i + 1$ of the second bitmap should be set to 1. Thus, the bitwise and between the second bitmap shifted left by one bit and the first bitmap should report a 1 at position i. Careful readers will quickly realize that an occurrence of the reverse pattern ba could also produce a 1 at the i-th position if the shift were applied in the wrong direction. To avoid this error, we follow a sequential procedure such that at each step we perform the and operation between the previous bitmask and the newly computed bitmap that marks the positions of the current pattern symbol. Notice that initially the bitmask is set to all 1s. The details of EPSMa are as follows.

The preprocessing of the algorithm (lines 1–4) is computed on the prefix of the pattern of length $m' = \min\{m, \alpha/2\}$. When $m > \alpha/2$ the algorithm works as a filter, searching for occurrences of the prefix with length $m'$ and, after an occurrence has been found, naively checking the whole occurrence of the pattern.

    EPSMa(p, m, t, n)
     1. m′ ← min{m, α/2}
     2. for i ← 0 to (m′ − 1) do
     3.     for j ← 0 to α − 1 do
     4.         B_i[j] ← p[i]
     5. for i ← 0 to (n/α) − 1 do
     6.     r ← 1^α
     7.     for j ← 0 to m′ − 1 do
     8.         s_j ← wscmp(T_i, B_j)
     9.         r ← r & (s_j ≪ j)
    10.     if m′ = m
    11.         then report occurrences at iα + {r}
    12.         else check positions iα + {r}
    13.     for j ← 0 to m − 2 do
    14.         check position (i + 1)α − j

    EPSMb(p, m, t, n)
     1. m′ ← min{m, α/2}
     2. p′ ← p[0..m′ − 1]
     3. for i ← 0 to (n/α) − 1 do
     4.     r ← wsmatch(T_i, p′)
     5.     if r ≠ 0^α then
     6.         if m = m′
     7.             then report occurrences at iα + {r}
     8.             else check positions iα + {r}
     9.     S ← wsblend(T_i, T_{i+1})
    10.     r ← wsmatch(S, p′)
    11.     if r ≠ 0^α then
    12.         if m = m′
    13.             then report occurrences at iα + α/2 + {r}
    14.             else check positions iα + α/2 + {r}

    EPSMc(p, m, t, n)
     1. mask ← 0^(α−k) 1^k
     2. for i ← 0 to m − α do
     3.     v ← wscrc(p[i..i + α − 1])
     4.     v ← v & mask
     5.     L[v] ← L[v] ∪ {i}
     6. sh ← (m/(α/2) − 1)
     7. for i ← 0 to (n/(α/2)) − 1 do
     8.     v ← wscrc(T_i)
     9.     v ← v & mask
    10.     for all j ∈ L[v] do
    11.         if 0 ≤ i − j < n − m
    12.             then check position i − j
    13.     i ← i + sh

    EPSM(p, m, t, n)
     1. if m ≤ α/2 then return EPSMa(p, m, t, n)
     2. if m ≤ α then return EPSMb(p, m, t, n)
     3. return EPSMc(p, m, t, n)

Fig. 4. The EPSM algorithm and its EPSMa, EPSMb and EPSMc procedures.

Specifically the preprocessing phase consists in constructing an array B of $m'$ different strings of length $\alpha$. Each string of the array exactly fits in a word of w bits. The i-th string in the array B consists of $\alpha$ copies of the character $p_i$. More formally the string $B[i]$, for $0 \le i < m'$, is defined as $B[i] = (p_i)^{\alpha}$.

For instance, if p = ab is a pattern of length $m = 2$, $\gamma = 8$ and $w = 128$, then B consists of two strings of length $\alpha = 16$, defined as $B[0] = a^{16}$ and $B[1] = b^{16}$. The preprocessing phase of the algorithm requires $O(\min\{m, \alpha/2\} \cdot \alpha)$-time and $O(\min\{m, \alpha/2\})$-space.

The searching phase of the algorithm (lines 5–14) processes the text t in chunks of $\alpha$ characters. Let $N = \lceil n/\alpha \rceil - 1$ and let $T = T_0 T_1 \ldots T_N$ be the string t represented in chunks of characters. Each block of the text, $T_i$, is compared with the strings in the array B using the instruction wscmp.

Let $s_j = b_0 b_1 \ldots b_{\alpha-1}$ be the $\alpha$-bit register returned by the instruction wscmp($T_i$, $B[j]$), for $0 \le j < m'$. It can be easily proved that $b_k = 1$ if and only if the k-th character of the block $T_i$ is equal to $p_j$, i.e. if and only if $T_i[k] = p_j$ (remember that $B[j] = (p_j)^{\alpha}$). Finally let $r = r_0 r_1 \ldots r_{\alpha-1}$ be the $\alpha$-bit register defined as

$r = s_0 \;\&\; (s_1 \ll 1) \;\&\; (s_2 \ll 2) \;\&\; \cdots \;\&\; (s_{m'-1} \ll (m'-1))$.

It is easy to prove that $p[0..m'-1]$ has an occurrence beginning at position j of $T_i$ if and only if $r_j = 1$. In fact $r_j = 1$ only if $s_k[j+k] = 1$, for $k = 0 \ldots m'-1$, which implies that $T_i[j+k] = p_k$, for $k = 0 \ldots m'-1$.

Then, if $m' = m$ the algorithm reports the occurrences of the pattern at positions $i\alpha + \{r\}$, if any. Otherwise we know that occurrences of the prefix of the pattern with length $\alpha/2$ begin at positions $i\alpha + \{r\}$. Thus the algorithm checks the occurrences beginning at those positions.

If we maintain, for each value r, with $0 \le r < 2^{\alpha}$, a list of the values in the set $\{r\}$, the naive check of the occurrences can be done in $O(|\{r\}|\, m)$-time. When $m' = m$ the occurrences can be reported in $O(|\{r\}|)$-time.

Finally, observe that the $m - 1$ possible occurrences crossing the blocks $T_i$ and $T_{i+1}$ are naively checked by the algorithm (lines 13–14).

The overall time complexity of the EPSMa algorithm is $O(nm)$, because in the worst case a naive check is required for each position of the text. However, when $m \le \alpha/2$ the EPSMa algorithm achieves an $O(n + occ)$ time complexity, where occ is the number of occurrences of the pattern p in the text t.
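The EPSMa procedure can be prototyped in scalar Python to check the logic of the bitmap combination. The sketch below is an illustration under several assumptions, not the SSE implementation evaluated in this paper: characters are bytes, bit i of an integer stands for $r_i$ (so the paper's left shift becomes a right shift), every candidate is verified against the text rather than reported directly, and the name `epsm_a` is hypothetical.

```python
def epsm_a(p, t, alpha=16):
    """Scalar model of EPSMa: return sorted start positions of p in t."""
    m, n = len(p), len(t)
    mp = min(m, alpha // 2)                         # m' in the text
    B = [bytes([p[i]]) * alpha for i in range(mp)]  # B[i] = alpha copies of p[i]
    hits = set()

    def check(pos):
        # Naive verification of a candidate position.
        if 0 <= pos <= n - m and t[pos:pos + m] == p:
            hits.add(pos)

    for start in range(0, n, alpha):
        block = t[start:start + alpha].ljust(alpha, b"\x00")
        r = (1 << alpha) - 1                        # all ones
        for j in range(mp):
            # s_j models wscmp(T_i, B[j]).
            s_j = sum(1 << i for i in range(alpha) if block[i] == B[j][i])
            r &= s_j >> j                           # align symbol j with the candidate start
        for i in range(alpha):
            if (r >> i) & 1:
                check(start + i)
        # Candidates whose m'-prefix crosses into the next block.
        for j in range(1, mp):
            check(start + alpha - j)
    return sorted(hits)
```

The right shift by j moves the mark for the j-th pattern symbol back to the position where the candidate occurrence would start, so after the loop a set bit means that all $m'$ symbols line up.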

4.2. EPSMb: searching for short patterns

The EPSMb algorithm searches for the whole pattern when its length is less than or equal to $\alpha/2$ and works as a filter algorithm for longer patterns. However, it is based on a more efficient filtering technique and turns out to be faster in the second case.

In a chunk of $\alpha$ characters, the occurrences of the pattern are investigated via the simple wsmatch function described above. Since the length of P is less than or equal to $\alpha/2$, the appearances beginning in the first half of the investigated block ordinarily end within the second half, and need no further processing. However, it is possible that an occurrence beginning in the second half of the chunk may extend into the next chunk. Thus, instead of scanning in chunks of $\alpha$ symbols, we traverse the text in chunks of $\alpha/2$ characters. We perform the wsblend operation to create an $\alpha$-symbol-long chunk by concatenating the second half of the current chunk with the first half of the next chunk, and check whether an occurrence exists on the boundary of the text blocks. The formal definition of EPSMb is as follows.

Let $m'$ be the minimum between $\alpha/2$ and m. Moreover let $p'$ be the prefix of p of length $m'$. The searching phase of the algorithm (lines 3–14) processes the text t in chunks of $\alpha$ characters.

Let $N = \lceil n/\alpha \rceil - 1$ and let $T = T_0 T_1 \ldots T_N$ be the string t represented in chunks of characters. Each block of the text, $T_i$, is searched one by one for occurrences of the string $p'$ using the instruction wsmatch.

Specifically, let $r = r_0 r_1 \ldots r_{\alpha-1}$ be the $\alpha$-bit register returned by the instruction wsmatch($T_i$, $p'$), for $0 \le j < m'$. We have that $r_j = 1$ if and only if an occurrence of $p'$ begins at position j of the block $T_i$, for $0 \le j < \alpha/2$. Then, if $m' = m$ (and hence $p' = p$) the algorithm simply returns positions $i\alpha + j$ such that $r_j = 1$. Otherwise, if $m' < m$, the algorithm naively checks for the whole occurrences of the pattern starting at positions $i\alpha + j$ such that $r_j = 1$.

Notice that generally packed string matching instructions allow reading only blocks $T_i$ of $\alpha$ characters (128 bits in the case of SSE instructions), where $T_i = t[i\alpha..(i+1)\alpha - 1]$. Occurrences of the pattern beginning in the second half of the block $T_i$ are checked separately. In particular a new block, S, obtained by applying the instruction wsblend($T_i$, $T_{i+1}$), is processed in a similar way as block $T_i$. In this case we report all occurrences of the pattern beginning at positions $i\alpha + \alpha/2 + j$, with $0 \le j < \alpha/2$. One may ask why blending is used instead of simply shifting the window. The reason is that the SSE instructions used in this context require their operands to be 16-byte aligned in memory, and performance degrades significantly otherwise. Thus, blending is more advantageous.

The resulting algorithm has an $O(nm)$ worst case time complexity and requires $O(1)$ additional space. When $m \le \alpha/2$ the algorithm reaches the optimal $O(n/\alpha + occ)$ worst case time complexity.
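A scalar Python model of the EPSMb scan is given below as an illustration of the block-plus-blended-block scheme. It is not the SSE implementation: the inner helper `wsmatch_positions` is a hypothetical stand-in for the wsmatch instruction that returns positions instead of a bit mask, every candidate is verified against the text, and blocks are padded with zero bytes at the end of the text.

```python
def epsm_b(p, t, alpha=16):
    """Scalar model of EPSMb: return sorted start positions of p in t."""
    m, n = len(p), len(t)
    half = alpha // 2
    mp = min(m, half)              # m' in the text
    prefix = p[:mp]                # p' in the text
    hits = set()

    def wsmatch_positions(block):
        # Stand-in for wsmatch(block, p'): positions where the prefix begins.
        return [i for i in range(alpha - mp + 1) if block[i:i + mp] == prefix]

    def check(pos):
        if 0 <= pos <= n - m and t[pos:pos + m] == p:
            hits.add(pos)

    for start in range(0, n, alpha):
        cur = t[start:start + alpha].ljust(alpha, b"\x00")
        nxt = t[start + alpha:start + 2 * alpha].ljust(alpha, b"\x00")
        for i in wsmatch_positions(cur):
            check(start + i)
        s = cur[half:] + nxt[:half]  # models wsblend(T_i, T_{i+1})
        for i in wsmatch_positions(s):
            check(start + half + i)
    return sorted(hits)
```

Since $m' \le \alpha/2$, any prefix occurrence that does not fit in the current block starts in its second half and therefore fits entirely in the blended block, so no candidate is missed.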

4.3. EPSMc: searching for long patterns

The EPSMc algorithm is designed to be faster for medium and long patterns. It is based on a simple filtering method and uses a hash function for computing fingerprint values on blocks of $\alpha$ characters, in a similar way as in the Rabin–Karp algorithm [25]. The fingerprint values are computed by using a hash function $h : \Sigma^{\alpha} \to \{0, 1, \ldots, 2^k - 1\}$, for a constant parameter $k \le \alpha$ that may vary according to the text or the pattern structures. In practical cases we chose a value of $k = 11$, which gave us the best results during the benchmarks.

The hash function h used for computing the fingerprint value is computed in a very fast way by using the wscrc specialized instruction, and in particular

$h(a) = \mathrm{wscrc}(a) \;\&\; 0^{\alpha-k}1^{k}$

for each $a \in \Sigma^{\alpha}$, where we recall that & is the bitwise and operation.

During the preprocessing phase (lines 1–6) a fingerprint value of k bits is computed for all substrings of the pattern of length $\alpha$. Then a table L of size $2^k$ is computed in order to store the starting positions of all substrings of the pattern, indexed by their fingerprint values. In particular we have

$L[v] = \{\, i \mid h(p[i..i+\alpha-1]) = v \,\}$.

Let $N = \lceil n/\alpha \rceil - 1$ and let $T = T_0 T_1 \ldots T_N$ be the string t represented in chunks of characters.

During the searching phase (lines 7–13) the EPSMc algorithm inspects the blocks of the text in steps of $(m/(\alpha/2) - 1)$ positions (see footnote 2). For each inspected block $T_i$ the fingerprint value $h(T_i)$ is computed and all positions in the set $\{\, i\alpha - j \mid j \in L[h(T_i)] \,\}$ are naively checked.

It is easy to observe that the EPSMc algorithm has an $O(nm)$ worst case time complexity. However, despite its worst case time behavior it turns out to be very effective in practical cases.
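The fingerprint filter can be sketched in Python as follows. Several choices here are assumptions of the sketch rather than details taken from the paper: fingerprints are taken on blocks of `beta = 8` bytes (64 bits), `zlib.crc32` replaces the hardware CRC instruction (a different polynomial), the table L is a dictionary, the inspected blocks are `beta`-aligned with a stride of `(m // beta - 1) * beta` characters, and the name `epsm_c` is hypothetical.

```python
import zlib


def epsm_c(p, t, beta=8, k=11):
    """Fingerprint-filter model in the spirit of EPSMc; requires len(p) >= 2 * beta."""
    m, n = len(p), len(t)
    if m < 2 * beta:
        raise ValueError("pattern too short for this filter")
    mask = (1 << k) - 1

    def h(block):
        return zlib.crc32(block) & mask  # k-bit fingerprint

    # L[v] = start offsets i of pattern substrings of length beta with fingerprint v.
    L = {}
    for i in range(m - beta + 1):
        L.setdefault(h(p[i:i + beta]), []).append(i)

    # Every window of length m fully contains at least one inspected block.
    stride = (m // beta - 1) * beta
    hits = set()
    for x in range(0, n - beta + 1, stride):
        for j in L.get(h(t[x:x + beta]), ()):
            pos = x - j                  # candidate start of the pattern
            if 0 <= pos <= n - m and t[pos:pos + m] == p:
                hits.add(pos)
    return sorted(hits)
```

The stride argument is the key design point: an occurrence spans at least `(m // beta) * beta` characters, so it always covers a whole inspected block, whose offset inside the pattern is then found in `L`.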

5. Experimental results

In this section we present experimental results in order to compare the performance of our newly presented algorithms against the best solutions known in the literature in the case of short patterns. We consider all the fastest algorithms in the case of short patterns as listed in a recent experimental evaluation by Faro and Lecroq [22,19]. In particular we compared EPSM with the following algorithms:

– the Hash algorithm using groups of q characters [30] (HASHq);
– the Extended Backward Oracle Matching algorithm [16,18] (EBOM);
– the Fast-Search algorithm using h sliding windows [6,7,21] (FS-Wh);
– the TVSBS algorithm using h sliding windows [34,21] (TVSBS-Wh);
– the Shift-Or algorithm [1] (SO);
– the Shift-Or algorithm with q-grams [12] (UFNDMq);
– the Fast-Average-Optimal-Shift-Or algorithm [24] (FAOSOq);
– the q-gram filtering algorithm [13] (QFqf);
– the Forward Simplified BNDM algorithm using q-grams and f forward characters [16,18,32] (FSBNDMqf);
– the Forward Simplified BNDM algorithm using h sliding windows [16,18,21] (FSBNDM-Wh);
– the Packed SSE-Filter algorithm using SIMD instructions [27] (SSEF);
– the Packed Crochemore-Perrin algorithm using SIMD instructions [4] (SSECP).

Recall that the EPSM algorithm consists of the EPSMa algorithm when $m < 4$, of the EPSMb algorithm when $4 \le m \le 16$, and of the EPSMc algorithm when $m > 16$.

In the case of algorithms making use of q-grams, the value of q ranges in the set $\{2, 4, 6\}$. All algorithms have been implemented in the C programming language and have been tested using the Smart tool [20] for exact string matching. The experiments were executed locally on a machine running Ubuntu 11.10 (oneiric) with an Intel i7-2600 processor and 16 GB of memory. Algorithms have been compared in terms of running times, including any preprocessing time. For the evaluation we used a genome sequence, a protein sequence and a natural language text (English language), all sequences of 4 MB. The sequences are provided by the Smart research tool. For each input file, we have searched sets of 1000 patterns of fixed length m randomly extracted from the text, for m ranging from 2 to 32 (short patterns). Then, the mean of the running times has been reported.

Table 1, Table 2 and Table 3 show the experimental results obtained for a genome sequence, a protein sequence and a natural language text, respectively.

In the case of algorithms using q-grams we have reported only the best result obtained by their variants. The values of q which obtained the best running times are reported as superscripts (shown in parentheses in the tables). Running times are expressed in hundredths of seconds; best results have been boldfaced and underlined.

5.1. Efficiency

From the experimental results it turns out that the EPSM algorithm mostly has the best performance for short patterns. When searching on a genome sequence it is second only to the BNDMq algorithm for $12 \le m \le 14$ and to the SSECP algorithm when $m = 6$. Observe however that the EPSM algorithm is (up to 2 times) faster than the SSECP algorithm in most cases.

When searching on a natural language text the EPSM algorithm obtains the best results in most cases, and is second to BNDM-based algorithms only for $20 \le m \le 22$.

For increasing lengths of the pattern the performance of the EPSM algorithm remains stable, underlining a linear trend on average. However, the performance of other algorithms based on shift heuristics improves slightly. This is more evident when searching on a protein sequence, where the algorithms based on bit-parallelism and q-grams turn out to be the faster solutions for longer patterns. However, in these latter cases the EPSM algorithm is always very close to the best solutions.

It is interesting to observe that the EPSM algorithm is faster than the SSECP algorithm in almost all cases, and the gap is more evident in the case of longer patterns. In fact, despite its optimal worst case time complexity, the SSECP algorithm shows an increasing trend on average, while the EPSM algorithm shows a linear behavior.

2 Actually, using the α/2 term instead of α directly stems from the limitation in practice that the crc value can be computed on 64 bits rather than 128 bits in the current SSE instruction sets. Thus, any crc of a block defines the crc of the largest possible initial portion of the block.


Table 1

Experimental results for searching short (on top) and long (on bottom) patterns on a genome sequence. Running times are expressed in hundredths of seconds, best results have been boldfaced and underlined.

m           2           4           6           8           12          16          20          24
HASHq       –           10.87(3)    7.98(3)     7.40(3)     5.78(3)     4.57(3)     4.05(3)     3.70(5)
EBOM        8.01        7.76        7.96        7.61        7.62        6.68        6.08        5.52
FS-Wh       12.73(2)    9.70(2)     8.67(2)     8.33(4)     7.71(4)     7.84(4)     7.44(?)     7.61(4)
TVSBS-Wh    11.93(2)    9.72(2)     8.75(2)     8.34(2)     7.62(2)     7.81(2)     7.38(2)     7.54(2)
SO          7.86        7.80        7.91        7.89        7.77        7.93        7.80        7.88
FAOSOq      –           10.65(2)    8.19(2)     6.35(2)     5.55(2)     4.40(4)     3.65(4)     3.46(4)
QFqs        –           7.54(3,3)   6.12(4,3)   5.04(4,3)   3.11(4,3)   2.65(4,3)   2.42(4,3)   2.22(6,2)
FSBNDMqf    10.38(2,0)  7.61(4,2)   5.98(4,1)   4.71(4,1)   3.58(4,1)   3.06(6,2)   2.67(6,2)   2.44(6,2)
UFNDMq      8.54(2)     6.12(4)     4.71(4)     4.07(4)     3.30(6)     2.84(6)     2.55(6)     2.36(6)
SSECP       2.65        2.87        3.17        3.60        6.53        5.96        5.80        5.72
EPSM        2.09(a)     2.27(b)     3.23(b)     3.25(b)     3.28(b)     2.39(b)     2.47(c)     1.91(c)

m           32          64          128         256         512         1024        2048        4096
HASHq       3.10(5)     2.61(5)     2.15(5)     1.84(5)     1.66(5)     1.56(5)     1.54(5)     1.53(5)
EBOM        4.85        3.48        2.56        2.13        1.99        2.17        2.81        4.26
FS-Wh       7.86(2)     7.37(2)     7.04(2)     6.11(2)     5.99(2)     5.32(2)     4.97(2)     4.61(2)
TVSBS-Wh    7.15(2)     6.62(2)     6.69(2)     6.52(2)     6.52(2)     6.66(2)     6.59(2)     6.58(2)
SO          7.80        6.68        6.77        6.70        6.71        6.54        6.49        6.62
FAOSOq      4.30(4)     4.33(4)     4.34(4)     4.31(4)     4.35(4)     4.32(4)     4.34(4)     4.33(4)
QFqs        1.99(6,2)   1.66(6,2)   1.40(6,2)   1.26(6,2)   1.20(6,2)   1.17(6,2)   1.13(6,2)   1.13(6,2)
FSBNDM-Wh   3.56(2)     3.55(2)     3.57(2)     3.55(2)     3.55(2)     3.56(2)     3.57(2)     3.55(2)
FSBNDMqf    2.15(6,1)   2.16(6,1)   2.16(6,1)   2.15(6,1)   2.15(6,1)   2.16(6,1)   2.16(6,1)   2.01(6,2)
UFNDMq      2.24(6)     2.24(6)     2.23(6)     2.23(6)     2.23(6)     2.23(6)     2.24(6)     2.24(6)
SSEF        2.91        2.03        1.53        1.33        1.26        1.31        1.37        1.49
SSECP       5.52        5.32        5.20        5.18        5.17        5.10        5.20        5.26
EPSM        1.75(c)     1.46(c)     1.26(c)     1.21(c)     1.19(c)     1.21(c)     1.26(c)     1.43(c)

Table 2

Experimental results for searching short (on top) and long (on bottom) patterns on a protein sequence. Running times are expressed in hundredths of seconds, best results have been boldfaced and underlined.

m           2           4           6           8           12          16          20          24
HASHq       –           10.70(3)    7.86(3)     7.63(3)     5.30(3)     4.23(3)     3.62(3)     3.31(3)
EBOM        6.54        3.58        2.83        2.62        2.33        2.20        2.11        2.03
FS-Wh       7.40(6)     5.07(6)     4.00(6)     3.42(6)     2.85(6)     2.60(6)     2.46(6)     2.40(6)
TVSBS-Wh    7.45(4)     6.16(6)     4.89(6)     4.24(6)     3.45(6)     2.56(6)     2.68(6)     2.50(6)
SO          7.88        7.91        7.83        7.78        7.79        8.03        7.79        7.99
FAOSOq      –           6.14(2)     5.50(2)     4.22(4)     3.41(4)     3.37(4)     2.77(6)     2.72(6)
QFqs        –           4.72(2,8)   3.25(2,6)   2.96(3,4)   2.49(3,4)   2.18(3,4)   2.00(3,4)   1.89(3,4)
FSBNDM-Wh   8.66(8)     5.52(4)     4.24(4)     3.61(4)     3.03(4)     2.74(4)     2.54(4)     2.40(4)
FSBNDMqf    7.80(2,1)   4.53(2,0)   3.11(2,0)   3.00(3,1)   2.42(3,1)   2.11(3,1)   1.96(3,1)   1.88(3,1)
UFNDMq      6.95(2)     4.53(2)     3.55(2)     3.13(2)     2.63(2)     2.37(2)     2.18(4)     2.04(4)
SSECP       2.67        2.87        3.17        3.62        3.97        3.70        3.55        3.47
EPSM        2.11(a)     1.95(b)     2.26(b)     2.25(b)     2.24(b)     2.37(b)     2.44(c)     1.91(c)

m           32          64          128         256         512         1024        2048        4096
HASHq       2.91(3)     2.55(5)     2.06(5)     1.78(5)     1.59(5)     1.52(5)     1.52(5)     1.53(5)
EBOM        1.93        1.77        1.65        1.61        1.65        1.94        2.67        4.19
FS-Wh       2.31(6)     2.17(6)     2.04(6)     1.98(6)     1.95(6)     1.94(6)     1.92(6)     1.92(6)
TVSBS-Wh    2.27(6)     2.02(6)     1.71(6)     1.53(6)     1.44(6)     1.41(6)     1.39(6)     1.38(6)
SO          7.90        7.62        6.72        6.74        6.36        6.75        6.68        6.77
FAOSOq      4.30(4)     4.27(4)     4.31(4)     4.33(4)     4.35(4)     4.29(4)     4.27(4)     4.33(4)
QFqs        1.75(3,4)   1.50(4,3)   1.28(4,3)   1.16(4,3)   1.09(4,3)   1.07(4,3)   1.07(4,3)   1.06(4,3)
FSBNDM-Wh   2.21(4)     2.22(4)     2.20(?)     2.22(4)     2.21(4)     2.21(4)     2.22(4)     2.21(4)
FSBNDMqf    1.74(3,1)   1.74(3,1)   1.74(3,1)   1.74(3,1)   1.74(3,1)   1.74(3,1)   1.75(3,1)   1.74(3,1)
UFNDMq      1.94(4)     1.94(4)     1.95(4)     1.95(4)     1.95(4)     1.95(4)     1.95(4)     1.95(4)
SSEF        2.90        2.02        1.57        1.35        1.29        1.30        1.36        1.50
SSECP       3.35        3.28        3.24        3.21        3.20        3.22        3.24        3.27
EPSM        1.73(c)     1.45(c)     1.24(c)     1.18(c)     1.17(c)     1.19(c)     1.25(c)     1.41(c)


Table 3

Experimental results for searching short (on top) and long (on bottom) patterns on a natural language text (English). Running times are expressed in hundredths of seconds, best results have been boldfaced and underlined.

m           2           4           6           8           12          16          20          24
HASHq       –           10.43(3)    7.79(3)     7.59(3)     5.31(3)     4.23(3)     3.67(3)     3.29(3)
EBOM        7.14        4.42        3.76        3.45        3.25        3.08        2.98        2.87
FS-Wh       7.44(6)     6.11(6)     5.02(6)     4.36(6)     3.48(6)     3.24(6)     3.02(6)     2.87(6)
TVSBS-Wh    7.49(6)     6.42(6)     5.34(6)     4.74(6)     3.68(6)     3.25(6)     2.94(6)     2.74(6)
SO          7.88        7.66        7.87        7.73        7.81        7.69        7.84        7.95
FAOSOq      –           7.05(2)     5.75(2)     4.75(4)     3.49(4)     3.40(4)     2.82(6)     2.73(6)
QFqs        –           5.93(2,6)   4.38(2,6)   3.67(3,4)   2.85(4,3)   2.41(4,3)   2.20(4,3)   2.08(4,3)
FSBNDM-Wh   8.53(1)     6.65(2)     5.31(2)     4.56(4)     3.80(4)     3.39(4)     3.16(4)     2.93(4)
FSBNDMqf    7.75(2,0)   5.77(2,0)   4.04(2,0)   3.60(3,1)   3.01(3,1)   2.65(3,1)   2.40(4,1)   2.22(4,1)
UFNDMq      7.23(2)     5.12(2)     4.23(4)     3.54(4)     2.91(4)     2.55(4)     2.33(4)     2.18(4)
SSECP       2.66        2.87        3.17        3.62        4.64        4.17        4.05        3.90
EPSM        2.11(a)     2.29(b)     2.58(b)     2.58(b)     2.57(b)     2.41(b)     2.48(c)     1.93(c)

m           32          64          128         256         512         1024        2048        4096
HASHq       2.92(3)     2.65(5)     2.11(5)     1.80(3)     1.59(3)     1.47(3)     1.42(3)     1.40(3)
EBOM        2.75        2.46        2.14        1.91        1.88        2.12        2.80        4.24
FS-Wh       2.68(6)     2.39(6)     2.05(6)     1.85(6)     1.71(6)     1.60(6)     1.55(6)     1.52(6)
TVSBS-Wh    2.49(6)     2.23(6)     1.87(6)     1.64(6)     1.52(6)     1.43(6)     1.40(6)     1.38(6)
SO          7.86        6.62        6.91        6.79        6.69        6.80        6.67        6.80
FAOSOq      4.27(4)     4.31(4)     4.35(4)     4.29(4)     4.31(4)     4.34(4)     4.32(4)     4.38(4)
QFqs        1.91(4,3)   1.62(4,3)   1.38(4,3)   1.08(6,2)   1.15(6,2)   1.11(6,2)   1.09(6,2)   1.08(6,2)
FSBNDM-Wh   2.72(4)     2.72(4)     2.72(4)     2.73(4)     2.73(4)     2.74(4)     2.72(4)     2.74(4)
FSBNDMqf    2.07(4,1)   2.07(4,1)   2.07(4,1)   2.08(4,1)   2.08(4,1)   2.08(4,1)   2.07(4,1)   2.08(4,1)
UFNDMq      2.08        2.08        2.07        2.09        2.09        2.08        2.08        2.08
SSEF        2.89        2.00        1.48        1.30        1.26        1.32        1.36        1.50
SSECP       3.79        3.31        3.03        2.85        2.86        2.85        2.85        2.86
EPSM        1.76(c)     1.47(c)     1.26(c)     1.19(c)     1.18(c)     1.20(c)     1.26(c)     1.44(c)

5.2. Flexibility

Flexibility is used as an attribute of various types of systems. In the field of string matching, it refers to algorithms that can adapt when changes in the input data occur. Thus a string matching algorithm can be considered flexible when, for instance, it maintains good performance for both short and long patterns, or in the case of both small and large alphabets. Most string matching algorithms obtain good performance only in the case of long patterns, sacrificing their performance for short ones. This is a common behavior, for instance, for all algorithms which make use of a sliding window approach (HASHq, EBOM, FS-Wh and TVSBS-Wh). Such an approach allows the pattern to slide along the text by performing subsequent shifts. Each shift can be at most as long as the length of the pattern. It turns out that statistically the shift increases when the length of the pattern increases, or when the size of the alphabet increases.
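The bound on the shift length can be illustrated with a minimal sliding window matcher. The sketch below uses Horspool's bad-character rule, a classical shift heuristic chosen here only for brevity; it is not one of the algorithms evaluated in the tables, and the function name `horspool_search` is hypothetical.

```python
def horspool_search(text, pattern):
    """Return (match positions, shifts performed) for a bytes text and pattern."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], []
    # Bad-character table: distance from the rightmost occurrence of each
    # byte (ignoring the last pattern position) to the end of the pattern.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    matches, shifts = [], []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Bytes that do not occur in the pattern allow the maximal shift m.
        step = shift.get(text[pos + m - 1], m)
        shifts.append(step)
        pos += step
    return matches, shifts
```

Every recorded shift lies between 1 and m, so a pattern of length 2 can never advance the window by more than two positions per attempt, whereas a long pattern over a large alphabet frequently advances by its full length. This is the reason why the running times of such algorithms decrease as m grows.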

A decreasing trend in running times can also be observed in the case of suffix automata based algorithms (FSBNDM-Wh, FSBNDMqf and QFqs). Although bit-parallel algorithms are designed to be extremely efficient in the case of short patterns, this class of algorithms also suffers from a lack of flexibility.

Only packed string matching algorithms turn out to have good performance for short patterns. This is the case of the SSECP algorithm whose performance, unfortunately, degrades when the length of the pattern increases.

On the contrary, the performance of the EPSM algorithm does not depend on pattern length and thus it is the only algorithm which maintains very good performance for both short and long patterns. The performance of the EPSM algorithm is also maintained when the size of the alphabet decreases.

Thus we can state that the EPSM algorithm is the most flexible algorithm among the best solutions known in literature to date.

5.3. Stability

We evaluate the stability of an algorithm as the standard deviation of the running times observed during the evaluation. Algorithm stability is an important feature in string matching when real time processing is needed. Such a value shows how much variation exists from the average, i.e. the mean of the running times. A low standard deviation indicates that the running times tend to be very close to the mean, indicating a high stability of the algorithm. On the other hand a high standard deviation indicates that the running times are spread out over a large range of values, thus indicating a low stability. See Tables 4–6.
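The stability measure can be computed as in the following sketch. The text does not say whether the population or the sample standard deviation is used; the population form is assumed here, and the function names are hypothetical.

```python
import statistics


def stability(running_times):
    """Return (mean, standard deviation) of the observed running times.

    Assumption: population standard deviation (divide by N); the source
    does not specify which estimator was used.
    """
    mean = statistics.mean(running_times)
    deviation = statistics.pstdev(running_times)
    return mean, deviation


def more_stable(times_a, times_b):
    """True when the first algorithm has the lower standard deviation."""
    return stability(times_a)[1] < stability(times_b)[1]
```

Two algorithms with the same mean running time can differ widely under this measure: running times of 5, 5, 5 and 5 give a deviation of 0, while running times of 1, 9, 1 and 9 give the same mean and a deviation of 4.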


Table 4

Values of standard deviation observed while searching short patterns on a genome sequence. Values are expressed in hundredths of seconds, best results have been boldfaced and underlined.

m           2      4      6      8      10     12     14     16     18     20     24     28
HASHq       –      2.35   2.43   1.93   0.94   0.67   0.43   0.54   0.26   0.33   0.30   0.21
EBOM        2.58   2.65   2.55   2.43   2.14   1.67   1.59   1.27   0.93   0.87   0.79   0.68
FS-Wh       2.47   2.57   2.61   2.71   2.50   2.56   2.52   2.79   2.60   2.64   2.55   2.50
TVSBS-Wh    2.42   2.31   2.23   2.56   2.48   2.45   2.53   2.33   2.14   2.07   1.84   1.65
SO          2.47   2.52   2.42   2.50   2.46   2.48   2.51   2.46   2.49   2.49   2.52   2.47
FAOSOq      –      2.48   2.38   1.10   0.58   0.71   0.74   0.60   0.72   0.67   0.63   0.18
QFqs        –      2.45   1.01   0.66   0.34   0.14   0.14   0.14   0.14   0.09   0.12   0.08
FSBNDMqf    2.36   2.41   1.00   0.40   0.28   0.29   0.21   0.14   0.17   0.10   0.13   0.09
UFNDMq      2.55   0.86   0.57   0.27   0.22   0.11   0.11   0.14   0.11   0.11   0.14   0.11
SSECP       0.05   0.07   0.09   0.25   2.44   1.97   1.68   1.50   1.35   1.19   0.95   0.88
EPSM        0.09   0.11   0.34   0.36   0.34   0.33   0.33   0.11   0.10   0.07   0.10   0.07

Table 5

Values of standard deviation observed while searching short patterns on a protein sequence. Values are expressed in hundredths of seconds, best results have been boldfaced and underlined.

m           2      4      6      8      10     12     14     16     18     20     24     28
HASHq       –      2.57   2.46   1.41   0.86   0.62   0.41   0.29   0.25   0.33   0.08   0.11
EBOM        1.34   0.18   0.39   0.13   0.11   0.15   0.11   0.09   0.12   0.09   0.11   0.07
FS-Wh       2.23   0.91   0.52   0.37   0.32   0.30   0.25   0.24   0.23   0.23   0.23   0.21
TVSBS-Wh    2.60   1.21   0.81   0.56   0.46   0.33   0.31   0.77   0.22   0.21   0.19   0.15
SO          2.52   2.52   2.49   2.44   2.50   2.44   2.52   2.45   2.43   2.41   2.44   2.49
FAOSOq      –      1.13   2.58   0.44   0.40   0.24   0.17   0.18   0.29   0.13   0.14   0.09
QFqs        –      1.47   0.21   0.08   0.09   0.10   0.09   0.09   0.09   0.09   0.08   0.10
FSBNDM-Wh   –      1.30   0.65   0.38   0.26   0.22   0.17   0.16   0.14   0.14   0.11   0.13
FSBNDMqf    2.75   0.68   0.50   0.12   0.09   0.10   0.10   0.09   0.09   0.09   0.09   0.08
UFNDMq      1.33   0.40   0.33   0.15   0.20   0.16   0.11   0.11   0.11   0.10   0.09   0.08
SSECP       0.06   0.15   0.09   0.22   2.15   1.72   1.29   1.24   1.06   0.96   0.78   0.60
EPSM        0.11   0.53   0.10   0.10   0.10   0.13   0.10   0.09   0.08   0.08   0.07   0.06

Table 6

Values of standard deviation observed while searching short patterns on a natural language text (English). Values are expressed in hundredths of seconds, best results have been boldfaced and underlined.

m           2      4      6      8      10     12     14     16     18     20     24     28
HASHq       –      2.16   2.54   1.48   0.89   0.63   0.49   0.29   0.24   0.16   0.18   0.13
EBOM        1.68   1.32   1.02   0.89   0.78   0.74   0.61   0.60   0.56   0.51   0.45   0.41
FS-Wh       3.16   2.36   1.58   1.27   1.05   0.92   0.82   0.77   0.71   0.65   0.59   0.52
TVSBS-Wh    3.08   2.63   1.94   1.50   1.16   0.96   0.82   0.78   0.68   0.60   0.50   0.43
SO          2.47   2.53   2.47   2.49   2.51   2.56   2.47   2.44   2.40   2.51   2.40   2.42
FAOSOq      –      1.90   0.85   0.55   0.72   0.77   0.52   0.82   0.67   0.64   0.57   0.18
QFqs        –      2.31   1.03   0.85   0.67   0.23   0.18   0.16   0.17   0.14   0.12   0.11
FSBNDMqf    –      2.23   0.76   0.55   0.34   0.31   0.25   0.24   0.23   0.21   0.17   0.16
UFNDMq      2.13   0.94   0.36   0.30   0.12   0.08   0.10   0.15   0.10   0.12   0.11   0.10
SSECP       0.10   0.11   0.10   0.14   2.03   1.74   1.44   1.28   1.22   1.17   1.09   1.02
EPSM        0.11   0.13   0.82   0.84   0.79   0.78   0.77   0.12   0.08   0.12   0.07   0.10

It turns out from our observations that almost all algorithms have a low stability for short patterns, while their stability increases when the length of the pattern increases. Such behavior becomes more evident for larger alphabets.

Sometimes an opposite behavior can be observed when searching on texts over a small alphabet like DNA sequences. This is the case, for instance, of the FS-Wh and TVSBS-Wh algorithms, whose stability decreases for small alphabets when the length of the pattern gets longer. Observe also that the SSECP algorithm shows such behavior for both small and large alphabets.

6. Conclusions

We presented a new packed exact string matching algorithm based on the Intel streaming SIMD extensions technology. The presented algorithm, named EPSM, is based on three auxiliary algorithms which are used when 0 < m < 4, 4 ≤ m ≤ 16, and m > 16, respectively. Despite the O(nm) worst case time complexity, the resulting algorithm turns out to be very fast in the case of very short patterns. From our experimental results it turns out that the EPSM algorithm is in general the best solution when m ≤ 32. It could be interesting to investigate the possibility of improving the performance of packed string matching algorithms by introducing shift heuristics.

References

[1] R. Baeza-Yates, G.H. Gonnet, A new approach to text searching, Commun. ACM 35 (10) (1992) 74–82.

[2] D. Belazzougui, Worst case efficient single and multiple string matching in the RAM model, in: Proceedings of the 21st International Workshop on Combinatorial Algorithms, IWOCA, 2010, pp. 90–102.

[3] D. Belazzougui, M. Raffinot, Average optimal string matching in packed strings, in: P.G. Spirakis, M. Serna (Eds.), Proceedings of the 8th International Conference on Algorithms and Complexity, CIAC, in: Lecture Notes in Computer Science, vol. 7878, Springer-Verlag, Berlin, Heidelberg, 2013, pp. 37–48.

[4] O. Ben-Kiki, P. Bille, D. Breslauer, L. Gąsieniec, R. Grossi, O. Weimann, Optimal packed string matching, in: IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2011, in: Leibniz International Proceedings in Informatics (LIPIcs), vol. 13, Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2011, pp. 423–432.

[5] P. Bille, Fast searching in packed strings, J. Discrete Algorithms 9 (1) (2011) 49–56.

[6] D. Cantone, S. Faro, Fast-Search: a new efficient variant of the Boyer–Moore string matching algorithm, in: Proceedings of the Second International Workshop Experimental and Efficient Algorithms, WEA, Ascona, Switzerland, in: Lecture Notes in Computer Science, vol. 2647, Springer-Verlag, Berlin, 2003, pp. 247–258.

[7] D. Cantone, S. Faro, Fast-search algorithms: new efficient variants of the Boyer–Moore pattern-matching algorithm, J. Autom. Lang. Comb. 10 (5/6) (2005) 589–608.

[8] D. Cantone, S. Faro, E. Giaquinta, A compact representation of nondeterministic (suffix) automata for the bit-parallel approach, in: Combinatorial Pattern Matching, 2010, pp. 288–298.

[9] D. Cantone, S. Faro, E. Giaquinta, A compact representation of nondeterministic (suffix) automata for the bit-parallel approach, Inf. Comput. 213 (2012) 3–12.

[10] C. Charras, T. Lecroq, Handbook of Exact String Matching Algorithms, King's College, 2004.

[11] M. Crochemore, A. Czumaj, L. Gąsieniec, S. Jarominek, T. Lecroq, W. Plandowski, W. Rytter, Speeding up two string-matching algorithms, Algorithmica 12 (4) (1994) 247–267.

[12] B. Durian, J. Holub, H. Peltola, J. Tarhio, Tuning BNDM with q-grams, in: Proceedings of the Workshop on Algorithm Engineering and Experiments, ALENEX, 2009, pp. 29–37.

[13] B. Durian, H. Peltola, L. Salmela, J. Tarhio, Bit-parallel search algorithms for long patterns, in: Paola Festa (Ed.), Proceedings of the 9th International Symposium on Experimental Algorithms, SEA, Ischia Island, Naples, Italy, in: Lecture Notes in Computer Science, vol. 6049, Springer-Verlag, Berlin, 2010, pp. 129–140.

[14] S. Faro, M.O. Külekci, Fast multiple string matching using streaming SIMD extensions technology, in: Liliana Calderón-Benavides, Cristina N. González-Caro, Edgar Chávez, Nivio Ziviani (Eds.), Proceedings of the 19th International Symposium on String Processing and Information Retrieval, SPIRE, Colombia, in: Lecture Notes in Computer Science, vol. 7608, Springer-Verlag, Berlin, 2012, pp. 217–228.

[15] S. Faro, M.O. Külekci, Fast packed string matching for short patterns, in: Peter Sanders, Norbert Zeh (Eds.), Proceedings of the 15th Meeting on Algorithm Engineering and Experiments, ALENEX, SIAM, New Orleans, LA, USA, 2013, pp. 113–121.

[16] S. Faro, T. Lecroq, Efficient variants of the backward-oracle-matching algorithm, in: Jan Holub, Jan Žďárek (Eds.), Proceedings of the Prague Stringology Conference 2008, Czech Technical University in Prague, Czech Republic, 2008, pp. 146–160.

[17] S. Faro, T. Lecroq, An efficient matching algorithm for encoded DNA sequences and binary strings, in: Combinatorial Pattern Matching, in: Lecture Notes in Computer Science, vol. 5577, 2009, pp. 106–115.

[18] S. Faro, T. Lecroq, Efficient variants of the backward-oracle-matching algorithm, Int. J. Found. Comput. Sci. 20 (6) (2009) 967–984.

[19] S. Faro, T. Lecroq, The exact string matching problem: a comprehensive experimental evaluation, preprint, arXiv:1012.2547, 2010.

[20] S. Faro, T. Lecroq, Smart: a string matching algorithm research tool, University of Catania and University of Rouen, http://www.dmi.unict.it/~faro/smart/, 2011.

[21] S. Faro, T. Lecroq, A multiple sliding windows approach to speed up string matching algorithms, in: R. Klasing (Ed.), 11th International Symposium on Experimental Algorithms, SEA 2012, in: Lecture Notes in Computer Science, vol. 7276, Springer-Verlag, Bordeaux, France, 2012, pp. 172–183.

[22] S. Faro, T. Lecroq, The exact online string matching problem: a review of the most recent results, ACM Comput. Surv. 45 (2) (2013) 1–42.

[23] K. Fredriksson, Faster string matching with super-alphabets, in: String Processing and Information Retrieval, Springer, 2002, pp. 207–214.

[24] K. Fredriksson, S. Grabowski, Practical and optimal string matching, in: M.P. Consens, G. Navarro (Eds.), Proceedings of the International Symposium on String Processing and Information Retrieval, SPIRE, in: Lecture Notes in Computer Science, vol. 3772, Springer-Verlag, Berlin, 2005, pp. 376–387.

[25] R.M. Karp, M.O. Rabin, Efficient randomized pattern-matching algorithms, IBM J. Res. Dev. 31 (2) (1987) 249–260.

[26] D.E. Knuth, J.H. Morris Jr., V.R. Pratt, Fast pattern matching in strings, SIAM J. Comput. 6 (1977) 323.

[27] M.O. Külekci, Filter based fast matching of long patterns by using SIMD instructions, in: Proceedings of the Prague Stringology Conference, 2009, pp. 118–128.

[28] M.O. Külekci, Blim: a new bit-parallel pattern matching algorithm overcoming computer word size limitation, Math. Comput. Sci. 3 (4) (2010) 407–420.

[29] Intel(R) 64 and IA-32 Architectures Optimization Reference Manual, Intel Corporation, 2011.

[30] T. Lecroq, Fast exact string matching algorithms, Inf. Process. Lett. 102 (6) (2007) 229–235.

[31] G. Navarro, M. Raffinot, A bit-parallel approach to suffix automata: fast extended string matching, in: Combinatorial Pattern Matching, Springer, 1998, pp. 14–33.

[32] H. Peltola, J. Tarhio, Variations of forward-SBNDM, in: Jan Holub, Jan Žďárek (Eds.), Proceedings of the Prague Stringology Conference 2011, Czech Technical University in Prague, Czech Republic, 2011, pp. 3–14.

[33] J. Rautio, J. Tanninen, J. Tarhio, String matching with stopper encoding and code splitting, in: Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching, CPM'02, Springer-Verlag, London, UK, 2002, pp. 42–52.

[34] R. Thathoo, A. Virmani, S. Sai Lakshmi, N. Balakrishnan, K. Sekar, TVSBS: a fast exact pattern matching algorithm for biological sequences, J. Indian Acad. Sci., Current Sci. 91 (1) (2006) 47–53.
