A prescription fraud detection model


journal homepage: www.intl.elsevierhealth.com/journals/cmpb


Karca Duru Aral a, Halil Altay Güvenir b,∗, İhsan Sabuncuoğlu c, Ahmet Ruchan Akar d,e

a INSEAD, Technology & Operations Management Area, Fontainebleau, France

b Department of Computer Engineering, Bilkent University, Ankara, Turkey

c Department of Industrial Engineering, Bilkent University, Ankara, Turkey

d Department of Cardiovascular Surgery, Ankara University School of Medicine, Ankara, Turkey

e Ankara University Stem Cell Institute, Ankara, Turkey

Article info

Article history: Received 23 November 2010; Received in revised form 12 September 2011; Accepted 13 September 2011

Keywords: Healthcare fraud; Prescription fraud; Data mining; Outlier detection

Abstract

Prescription fraud is a main problem that causes substantial monetary loss in health care systems. We aimed to develop a model for detecting cases of prescription fraud and test it on real-world data from a large multi-center medical prescription database. Conventionally, prescription fraud detection is conducted on random samples by human experts. However, the samples might be misleading and manual detection is costly. We propose a novel distance-based data-mining approach for assessing the fraud risk of prescriptions with respect to cross-features. Final tests have been conducted on an adult cardiac surgery database. The results obtained from experiments reveal that the proposed model works considerably well, with a true positive rate of 77.4% and a false positive rate of 6% for the fraudulent medical prescriptions. The proposed model has potential advantages including on-line risk prediction for prescription fraud, off-line analysis of high-risk prescriptions by human experts, and self-learning ability through regular updates of the integrative data sets. We conclude that incorporating such a system in health authorities, social security agencies and insurance companies would improve the efficiency of internal review to ensure compliance with the law, and radically decrease human-expert auditing costs.

1. Introduction

Fraud is defined as the abuse of a profit organization's system without necessarily leading to direct legal consequences. Levi and Burrows define fraud as a mechanism through which the fraudster gains an unlawful advantage or causes unlawful loss [1]. Fraud constitutes a critical problem in many areas such as health care [2], banking [3], insurance [4], and telecommunications [5]. Prescription fraud is defined as the illegal acquisition of prescription drugs for personal use or profit, and can be observed in numerous ways. Any effort aiming to identify the fraudulent transactions in such domains is called a fraud detection process. Recent data suggest that traditional manual detection conducted by human experts is quite costly as a result of high expert wages and the large size of the databases. Other main drawbacks of manual detection are that individual human experts cannot recognize newly emerged fraud patterns spread out in the database, and cannot detect fraudulent behavior the moment it is attempted. Thus, customized data mining algorithms should analyze the enormous databases of these large businesses, and then human experts can further inspect the identified risky transactions.

∗ Corresponding author. Tel.: +90 312 290 1252; fax: +90 312 266 4047. E-mail address: guvenir@cs.bilkent.edu.tr (H.A. Güvenir).

Having seen a yearly exponential increase in spending, abuse of health care systems is becoming more critical in Turkey, as in many other countries [6]. As for the USA, according to the General Accounting Office, annual health care expenditures have approached two trillion dollars, which was 15.3% of the Gross Domestic Product by 2007 [7]. The National Health Care Anti-Fraud Association (NHCAA) estimated that 3% of all health care spending, which adds up to $68 billion, is lost to health care fraud in the United States. Other estimates are around 10%, or $170 billion, for this lost amount [8]. Examples of fraud in a health care system include billing for services and goods that are not rendered, performing medically unnecessary operations, or prescribing unnecessary medicines.

Table 1 – Health care spending in Turkey by years.

| Billion TL | 2002 | 2007 | 2008 |
| --- | --- | --- | --- |
| Total social insurance spending | 7.6 | 20 | 24 |
| Total medicament spending | 4.3 | 8.6 | 10.5 |
| Total hospital spending | 2.8 | 10.3 | 13 |
| State hospital payments by SSA | 1.8 | 6.4 | 7.5 |

SSA: Social Security Agency in Turkey, known as SGK (Sosyal Güvenlik Kurumu).

The experts from the Social Security Agency (SSA, known as SGK) in Turkey commonly detect prescription fraud in their audits. Currently, while auditing the hospitals, an SSA officer examines a small sample of the hospital prescriptions and then the SSA charges the hospital a proportional amount. This method is costly to conduct and does not guarantee any efficiency coefficient. It is worth noting, however, that undetected fraud continues to be an enormous burden on the Turkish health care system. According to the Turkish Health Care Syndicate 2008 Health Care Report, fraud in health care has boomed in Turkey recently [6]. With the yearly exponential increase in spending shown in Table 1, abuse of health care systems is becoming more and more critical. In 2008, health care fraud was committed principally in Van, Eskişehir, Erzurum, Siirt, Adana, Bursa, Zonguldak, Diyarbakır, and many other cities, and even in the Head Center of the Tuberculosis Fighting Department. These fraudulent acts took the form of fake medicament reports, fake invoices, and billing the SSA for examinations and treatments that were not rendered. The total cost of these fraudulent acts was millions of TL, and about 300 people were arrested on fraud charges recently. Indeed, Turkish health care laws provide significant legal sanctions for fraud and abuse control (Turkish Penal Law-26.09.2004, No: 5237/204). In contrast, the perception in Turkish society that prescription fraud is a victimless crime makes it even more widespread and strengthens the fraudulent chain between the pharmaceutical companies, physicians, pharmacies, and patients. Since nearly half of the spending of the SSA is on medical drug payments, which summed up to 10.5 billion TL in 2008 [6], we see that the cost of fraudulent prescriptions to the SSA is not tolerable. This type of fraud comprises excessive medicine prescription and mismatch between patients' features and the prescribed medicines. The conventional manual detection is conducted by a committee of assigned medical doctors in the SSA. When inspecting a hospital, a human expert goes through a relatively small sample of the prescriptions associated with the hospital. If there are fraudulent and abusive claims in the sample, then the agency charges the hospital the amount obtained by multiplying the percentage of fraudulent claims detected in the sample by the total cost of the prescriptions issued by the hospital in that inspection period. This method is both costly to conduct and does not guarantee any efficiency coefficient for the outcome.

To provide an automated, user-friendly system that overcomes the above-mentioned handicaps, in this paper we propose a prescription fraud detection tool that highlights the prescriptions whose fraud risk exceeds a threshold set by the user. Risk measurements are calculated for cross-features in a knowledge-based setting, using certain distance metrics to compare each prescription to the common practice. The system incorporates an efficient on-line structure that can be integrated with the electronic on-line prescription provision systems already in use in health care institutions. Although the methodology is originally intended for prescription fraud detection, the supervision of any other medical claim (blood tests, X-rays, MRI scans, biopsies, etc.) constitutes a promising area of future application. The underlying assumption for building such a system is that the fraudulent behaviors related to a cross-feature are outliers when considering the total data set.

The rest of the paper is organized as follows. Section 2 provides a comprehensive literature review on fraud detection studies. This survey indicates that there are three main types of fraud detection techniques proposed for health care: supervised, unsupervised, and hybrid systems. Since we work on a data set without any prior labels marking prescriptions as fraudulent or not, the proposed system is considered unsupervised. Section 3 discusses the data structure, the proposed methodology, and the related risk assessment formulations. Section 4 presents the results of computational experiments for both the off-line and on-line applications using real data. The empirical validation of the proposed system and its performance compared to a human expert are also given in this section. Finally, we give concluding remarks and further research directions in Section 5.

2. Related work

There are various resources relating to fraud detection. Fraud detection being a relatively large field, most of the studies consider outlier detection as a primary tool [9]. The investigators mainly incorporate artificial intelligence, data mining, expert systems, fuzzy logic, statistics and visualization. Nonetheless, studies on health care insurance fraud detection are limited. We can group the existing methodologies of fraud detection as supervised, unsupervised, or hybrids of the two.

2.1. Supervised approaches

Supervised algorithms are trained on a previously labeled training set of fraudulent and legitimate transactions. Then, the algorithms apply mathematical methodologies to assign scores of similarity to the fraudulent profiles. The most popular applications of supervised algorithms are neural networks. In this context, Kim et al. propose a neural network model for telecommunication subscription fraud [10]. In another study, Barse et al. introduce a multi-layer neural network to handle a synthetic Video-on-Demand database [11]. For the credit card fraud detection problem, Syeda et al. develop a fuzzy neural network model that works on parallel machines [12]. A three-layer feed-forward radial basis function neural network is introduced by Ghosh and Reilly [13]. This neural network is trained in two phases to assign risk scores to new credit card transactions periodically.

Maes et al. compare neural networks and Bayesian networks. The backpropagation algorithm is used to train the neural networks [14]. The results indicate that even though Bayesian networks are more accurate and require a short training time, they are slower when applied to new instances. Another Bayesian network is developed by Ezawa and Norton, which has four stages and two parameters [15]. The authors assert that regression, nearest neighbor, and neural network methods are all too slow for their data.

Other methods in the literature are decision trees, rule induction, and case-based reasoning. Metan et al. introduce a real-time dispatching rule selection system that extracts knowledge from the data stream coming from the manufacturer [16]. The incorporated decision tree dynamically updates in response to changes in the manufacturer's conditions. Enabling flexible and higher quality decisions, the system is tested in simulation runs, which reveal that the proposed model outperforms the existing algorithms in the literature.

As for statistical modeling, Foster and Stine employ least squares regression and stepwise selection of predictors [17]. They assert that traditional statistical methods are effective for fraud detection. Belhadji et al. propose the cooperation of human experts for choosing the best indicators (attributes) for fraud detection [18]. Then, the conditional probabilities of fraud for each indicator are calculated accordingly. Afterwards, Probit regressions are used to identify the most important indicators. The flexible thresholds are adjustable for customization regarding the company's fraud policy. Some other techniques in the literature incorporate expert systems, association rules, and genetic algorithms. Pejic-Bach gives an overview of profiling intelligent systems applications in fraud detection and prevention [19].

2.2. Unsupervised approaches

In the area of telecommunications fraud detection, Cortes et al. study the temporal evolution of large dynamic graphs [20]. The graphs are built up from sub-graphs named Communities of Interest (COI). An exponentially weighted average method is used to update the sub-graphs daily. COIs are built from the mobile phone accounts using call quantity and durations. The study yields the specifications of the telecommunication fraudsters. In the medical insurance domain, Yamanishi et al. present the unsupervised SmartSifter [21]. This algorithm works with categorical and continuous variables. SmartSifter investigates statistical outliers using the Hellinger distance. On automobile insurance data, Brockett et al. employ Principal Component Analysis of RIDIT scores on rank-ordered categorical attributes [22].

2.3. Hybrid approaches

Two sub-categories are identified in the literature: supervised hybrids and unsupervised hybrids.

2.3.1. Supervised hybrids

In this category, supervised neural networks, Bayesian networks, and decision trees are the methodologies most often used to create hybrids. Chan et al. combine naive Bayes, C4.5, CART, and RIPPER classifiers [23]. The results give better efficiency on credit card transactions. Kim and Kim develop a decision tree algorithm to classify the data in hand [24]. They use a weighting function to compute fraud density, and then a backpropagation neural network is used to generate a weighted risk score on credit card transactions. He et al. classify the general practitioner data set by the k-nearest neighbor algorithm [25]. The optimal weights of the attributes are computed by genetic algorithms.

2.3.2. Unsupervised hybrids

Cortes and Pregibon propose the use of daily updated telecommunication account summaries (signatures) [20]. The signatures labeled as fraudulent are then inserted into the training set. This training set is used for training supervised algorithms such as tree, slipper, and model-averaged regression. The algorithm allows the authors to draw conclusions on the nature of the fraudulent calls. Moreover, Cortes et al. propose a graph-theoretic method [26]. This method is used to visually detect fraudulent international calls. Cahill et al. compute a risk score for each call regarding its similarity to fraudulent profiles and dissimilarity to the account's signature [27]. The signatures are updated with low-score calls. In this updating process, recent calls are given higher weight than older calls. The study by Moreau et al. indicates that supervised neural network and rule induction algorithms perform better than two types of unsupervised neural networks in identifying the shifts between short and long term account behavior profiles [28]. The investigators used the area under the receiver operating characteristic curve (AUC) as the performance measure. There are also studies in which unsupervised approaches are used to classify the insurance data into clusters for incorporating supervised approaches. A three-step procedure is proposed by Williams and Huang, in which k-means is employed for cluster detection, C4.5 is used for decision tree rule induction, and domain knowledge, statistical summaries and visualization tools are then utilized for rule evaluation [29]. Williams employs a genetic algorithm for the second step to generate rules. This enables the user to explore the rules [30].

Brause et al. present RBF neural networks for screening the outputs of association rules for credit card transactions [31]. Ormerod et al. present a Mass Detection Tool (MDT) for detection of medical insurance fraud [32]. Ethnography is the core element of the proposal for capturing expertise to design the methodology. The MDT uses a dynamic Bayesian Belief Network of fraud indicators. Ortega et al. describe another medical claim fraud/abuse detection system based on data mining, used by a Chilean private health insurance company [33]. The proposed detection system employs multi-layer perceptron (MLP) neural networks. Huang et al. apply a filter-based feature selection method, using an inconsistency rate measure and discretization, to a medical claims database to predict the adequacy of duration of antidepressant medication utilization [34].

Table 2 – Attributes in the database.

| Feature | Type | Number of values | Explanation |
| --- | --- | --- | --- |
| Commercial name of the prescribed drug | Categorical | 2659 | 2659 medicines of different commercial names seen in the database. |
| Market price of the prescribed drug | Continuous | 2659 | Prices of each medicine in the Turkish market in 2007, fixed by the Health Ministry. |
| Prescription I.D. number | Categorical | 26,419 | Identifying numbers for the 26,419 prescriptions in the database. |
| Age | Continuous | 85 | All ages between 0 and 85 |
| Sex | Categorical | 2 | Female, male |
| Diagnosis | Categorical | 332 | 332 different diagnoses seen in the database |

This study differs from the existing ones in health care fraud detection in that the domain knowledge learned can be used as: (a) an on-line system to check whether a given prescription carries risks of fraud and, if so, in what respects; (b) an off-line system to process a set of prescriptions and filter out those with a risk greater than a threshold for further checking by human experts; (c) a self-learning system, through regular updates of the integrative data sets. The next section introduces the proposed methodology.

3. Proposed approach

In general, fraud detection research focuses on nonlinear, black-box supervised algorithms; nonetheless, we can assert that less complex, reliable and faster algorithms are needed. Given that the instances (prescriptions) in our database do not have labels as fraudulent or legitimate, we adopt an unsupervised approach.

For auditing medical transactions, we need two tools. One is for batch screening/auditing, which is an off-line system, and the other is for on-line/on-time transaction control. This requires building two systems that work interactively. Clearly, the on-line system should incorporate strategies to avoid re-processing the whole batch of prescriptions for every new transaction. The data structure and size are also design considerations. We fulfill these requirements under the assumption that the fraudulent cases are outliers in the database.

3.1. Data structure

The database in hand is already anonymized and allows us to consider the following features in prescription fraud detection: commercial name of the prescribed drug, market price of the prescribed drug, prescription number, age, sex, and the diagnosis for which the drug is prescribed. The characteristics of these features are given in Table 2. As we examine the nature of the data in hand, we also see that the following feature pairs are correlated: medicine and diagnosis; medicine and age; medicine and sex; diagnosis and the total cost of drugs prescribed for this diagnosis; and medicine and medicine interactions in a prescription.

Since there is no correlation between features like age and sex, we ignore such cross-features. On the other hand, considering the interactions between diagnosis and age as well as diagnosis and sex, we can reason that we do not need to include these cross-features, since any such diagnosis should call for specific medicines in the prescription. These specific medicines should reveal any mismatch between the diagnosis and age or sex. These arguments transform our domain of 6 dimensions into sub-domains of 2 dimensions, which are illustrated by the interactions discussed above.

3.2. Methodology

As argued above, the 6-dimensional domain reduces to 2-dimensional sub-domains given by the above-mentioned interactions. Therefore, our problem is refined to deal with five two-dimensional spaces. Working with incidence and risk matrices, which are defined subsequently, and having two parts of consideration, on-line and off-line processing, our methodology's flowchart is as shown in Fig. 1.

3.3. Off-line processing

We developed a Matlab 2008A m-file for the off-line batch processing of the database. This code processes the database to create the incidence matrices for all the domains.

• Medicine and age domain incidence matrix: MA.
• Medicine and sex domain incidence matrix: MS.
• Medicine and diagnosis domain incidence matrix: MD.
• Medicine and medicine domain incidence matrix: MM.
• Diagnosis and cost domain incidence matrix: DC.

An incidence matrix entry (i, j) corresponds to the number of times the ith and jth traits of the corresponding features are seen together in the database. As for the DC matrix, the row labels are diagnoses and the column labels are indices from 1 to 204. These indices represent 5 TL (Turkish currency) intervals, but the last interval is for the diagnosis costs that are above 2500 TL. For every diagnosis within a prescription, the total cost of the corresponding medicines is calculated, and the number of times diagnosis i's total cost falls into cost interval j is the incidence matrix entry DC(i, j).
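The counting step described above can be sketched in a few lines. This is only an illustrative sketch in Python, not the authors' Matlab m-file; the function names `build_incidence` and `cost_interval`, the toy records, and the treatment of interval boundaries (half-open intervals, with the last interval collecting everything beyond the earlier ones) are assumptions made for the example.

```python
from collections import defaultdict


def build_incidence(pairs):
    """Count co-occurrences: result[row_trait][col_trait] = times seen together."""
    counts = defaultdict(lambda: defaultdict(int))
    for row_trait, col_trait in pairs:
        counts[row_trait][col_trait] += 1
    return counts


def cost_interval(total_cost, width, n_intervals):
    """Map a total cost to a 1-based interval index.

    Assumption: interval j covers [(j-1)*width, j*width) and the last
    interval collects every cost beyond the earlier ones.
    """
    return min(int(total_cost // width) + 1, n_intervals)


# Hypothetical (medicine, sex) records, not taken from the real database.
records = [("Cosopt", "M"), ("Cosopt", "M"), ("Cosopt", "F"), ("Iliadin", "F")]
ms = build_incidence(records)
print(ms["Cosopt"]["M"])  # incidence of Cosopt with male patients

# Hypothetical (diagnosis, total cost in TL) pairs for a DC-style matrix.
costs = [("Glaukoma", 37.79), ("Glaukoma", 36.10)]
dc = build_incidence(
    (diagnosis, cost_interval(cost, width=5.0, n_intervals=204))
    for diagnosis, cost in costs
)
print(dict(dc["Glaukoma"]))
```

A nested dictionary is used here instead of a dense matrix because most medicine and trait pairs never occur together; a dense array would work equally well for data of the size given in Table 2.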

Now having all the incidence matrices in hand, the code creates the risk matrices below:

• Medicine and age risks: MAR.
• Medicine and sex risks: MSR.
• Medicine and diagnosis risks: MDR.
• Medicine and medicine risks: MMR.
• Diagnosis and cost couple's risks: DCR.

These matrices are built by calculating the risks for the corresponding entries of the corresponding incidence matrices. For example, for calculating MSR(i, j), we use the corresponding risk metric for MS(i, j). We need to keep the incidence matrices for on-line processing, so we do not overwrite the incidence matrices during risk computations.

Having all the risk matrices in hand, the code goes through all the risks that are greater than the thresholds given by the user. The user can set any threshold for any of the risk matrices, keeping in mind that more prescriptions will be classified as risky when the threshold is kept small. That is, there is a tradeoff between the true positive rate and the human expert screening time. The user should predefine the level of tradeoff they are ready to accept.

Given the thresholds, the code outputs the fraudulent prescriptions, indicating which types of fraud are seen within each prescription. That way, the human expert has the chance to review the marked prescriptions, which saves time and money in auditing large databases, besides acquiring a list of possible fraudulent transaction styles for the given database.
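The thresholding step can be illustrated with a short sketch. It is written in Python for illustration and is not the authors' code; the function name `flag_prescriptions`, the data layout, the domain codes and all numeric values are hypothetical. It assumes a prescription is reported when at least one of its risks is strictly greater than the threshold of the corresponding domain.

```python
def flag_prescriptions(findings, thresholds):
    """Keep, per prescription, only the findings whose risk exceeds the domain threshold.

    findings:   {prescription_id: [(domain, description, risk), ...]}
    thresholds: {domain: threshold in [0, 1]}
    """
    report = {}
    for prescription_id, items in findings.items():
        risky = [item for item in items if item[2] > thresholds[item[0]]]
        if risky:
            report[prescription_id] = risky
    return report


# Hypothetical risks and thresholds, for illustration only.
findings = {
    "P1": [("MD", "drug A / diagnosis X", 0.97), ("MS", "drug A / female", 0.10)],
    "P2": [("MD", "drug B / diagnosis Y", 0.40)],
}
thresholds = {"MD": 0.85, "MS": 0.95}
report = flag_prescriptions(findings, thresholds)
for prescription_id, risky in report.items():
    print(prescription_id, risky)
```

Lowering the thresholds in this sketch makes more prescriptions appear in the report, which mirrors the tradeoff between the true positive rate and the expert's screening time.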

3.4. On-line processing

The on-line prescription fraud detection tool is an interactive tool coded in Matlab that has a graphical user interface. Considering the nature of the health care sector, where on-line transaction of incoming invoices is the common practice, we can assert that this kind of on-line tool is fundamental for instant real-time auditing.

This interface is designed to enable the user to insert new prescriptions into the database and audit a new prescription without the need to re-run the off-line code. Thus, new prescriptions can be audited once the off-line code has been run on the available prescription database. Please note that since the database we used is in Turkish, all the generated listings in the on-line user interface are in Turkish. Fig. 2 shows a screenshot of the graphical user interface of the auditing tool. As seen in Fig. 2, the user first needs to input the prescription number as well as the age and sex of the patient. Then, in the box below, the user enters the prescribed drug and the corresponding diagnosis with the add button. The drug and diagnosis list boxes are populated with the Turkish drug names and diagnosis lists of the database, which are outputs of the off-line fraud detection code. The user can check whether the input is correct with the view prescription button. If the prescription input is correctly specified, the user might choose to add the prescription directly to the database. That is achieved by fetching the corresponding rows of the incidence and risk matrices and updating them with the incoming prescription's specifications. Alternatively, the user might want to audit the prescription directly. That way, the prescription input is not used to update the incidence and risk matrices permanently. This is preferable because, if the incoming prescription is fraudulent, updating the incidence and risk matrices with it would slightly degrade the performance of the code. This is because increasing the number of outliers in a database would eventually turn the outliers into common transactions, which would hinder the tool from detecting those fraudulent transactions. As a result, the user should add the incoming prescription to the database only if the prescription is not fraudulent, perhaps after the auditing process. On pushing the audit button, the user instantly receives a message indicating each level of fraud risk regarding the prescription. Lastly, the new prescription button enables the user to enter a new prescription right after auditing another one.
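The difference between auditing a prescription and adding it to the database can be sketched as follows. This is a hedged Python illustration, not the Matlab tool itself: the names `risk`, `audit` and `add_prescription`, the dictionary layout and the counts are hypothetical, and the sketch uses the categorical risk metric that Section 3.5 defines in Eq. (1). Treating a never-seen medicine as maximum risk is an additional assumption.

```python
import math


def risk(count, row_max):
    """Categorical risk metric of Section 3.5, Eq. (1)."""
    return (math.exp(-count / row_max) - math.exp(-1.0)) / (1.0 - math.exp(-1.0))


def audit(incidence, medicine, trait):
    """Score a (medicine, trait) pair without modifying the stored counts."""
    row = incidence.get(medicine)
    if not row:
        return 1.0  # assumption: a medicine never seen before gets the maximum risk
    return risk(row.get(trait, 0), max(row.values()))


def add_prescription(incidence, risks, medicine, trait):
    """Permanently record the pair, then refresh only the affected risk row."""
    row = incidence.setdefault(medicine, {})
    row[trait] = row.get(trait, 0) + 1
    row_max = max(row.values())
    risks[medicine] = {t: risk(c, row_max) for t, c in row.items()}


# Hypothetical Medicine and Sex counts.
incidence = {"drug A": {"M": 9, "F": 1}}
risks = {}
before = audit(incidence, "drug A", "F")           # counts are left untouched
add_prescription(incidence, risks, "drug A", "F")  # counts and one risk row change
after = risks["drug A"]["F"]
print(before, after)
```

Only the row of the affected medicine is recomputed, which is what lets the on-line tool avoid re-running the off-line batch. The sketch also shows why adding a fraudulent prescription is undesirable: every insertion of a rare pair lowers that pair's future risk.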

3.5. Risk assessment

We introduce the risk assessment formulas, which calculate risks from the incidence matrices. As stated previously, incidence matrices hold the information regarding the number of times an instance shows up in the data set.

3.5.1. Risk metric for categorical features

Sex, diagnosis, and prescription medicines are the un-ordered categorical features in the data set. The incidence matrix entry (i, j) is the number of times medicine i is issued to the corresponding un-ordered categorical entry j. The Medicine–Sex (MS), Medicine–Diagnosis (MD), and Medicine–Medicine (MM) incidence matrices are the categorical matrices.

Let us denote the maximum incidence entry of the ith medicine of an incidence matrix MF by MaxMF(i), where F represents the feature domain. MaxMF(i) is the number of times medicine i is issued to the trait to which it is most often issued.

At this point we introduce a risk estimation function, hereafter denoted as riskMF(i, j), that represents the likelihood of fraud when the ith medicine is prescribed for the jth trait. We required this function to return a real value between 0 and 1. Here, the risk value 1 represents the highest possible risk of fraud, whereas the value 0 represents the lowest possible risk. The highest risk value is obtained when MF(i, j) has the lowest value, that is, the rarest case. Further, we wanted the risk function to drop exponentially as MF(i, j) increases, and reach the value 0 when it is equal to MaxMF(i), the most common case. Having tried many risk functions that satisfy these criteria, we found that the risk function in Eq. (1) was the most successful one.

$$\mathrm{risk}_{MF}(i,j) = \frac{e^{-\left(MF(i,j)/\mathrm{Max}_{MF}(i)\right)} - e^{-1}}{1 - e^{-1}} \qquad (1)$$

Then, the risk matrix of the Medicine and a feature domain F can be defined as: MFR(i, j) = riskMF(i, j).
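Eq. (1) and the resulting risk matrix can be sketched as follows. The code is a minimal Python illustration, not the authors' implementation; the names `categorical_risk` and `risk_matrix`, the list-of-lists layout and the toy counts are hypothetical, and assigning risk 1 to every entry of an all-zero row is an assumption the paper does not address.

```python
import math

E_INV = math.exp(-1.0)  # the constant e^{-1} used in Eq. (1)


def categorical_risk(count, row_max):
    """Eq. (1): 1 for a never-seen pair, 0 for the most common pair of the row."""
    return (math.exp(-count / row_max) - E_INV) / (1.0 - E_INV)


def risk_matrix(incidence):
    """Apply Eq. (1) row by row; the incidence matrix itself is left unchanged."""
    risks = []
    for row in incidence:
        row_max = max(row)
        if row_max == 0:
            # Assumption: a medicine with no recorded incidences is maximally risky.
            risks.append([1.0] * len(row))
        else:
            risks.append([categorical_risk(count, row_max) for count in row])
    return risks


# Hypothetical incidence matrix: two medicines, three traits.
MF = [[0, 5, 50],
      [12, 12, 1]]
MFR = risk_matrix(MF)
for row in MFR:
    print([round(value, 3) for value in row])
```

Because each row is normalized by its own maximum, a medicine that is rarely prescribed overall is still scored relative to its own most common trait.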

The risk function in Eq. (1) employs an exponential function in order to achieve a steep trend, since we preferred high values of fraud risk only for very small values of MF(i, j)/MaxMF(i). That is, the sensitivity of the risk function to detect fraud should increase as the ratio MF(i, j)/MaxMF(i) becomes smaller, since the magnitude of the derivative of $e^{-x}$ increases as x gets smaller. We then normalize the value of $e^{-(MF(i,j)/\mathrm{Max}_{MF}(i))}$ by subtracting $e^{-1}$ and dividing by $1 - e^{-1}$ in order to get risk values between 0 and 1 for a straightforward interpretation of the risk levels. Note that here $e^{-1}$ and $1 - e^{-1}$ are constant values.

[Figure: flowchart with two panels, OFF-LINE SYSTEM and ON-LINE SYSTEM.]

Fig. 1 – A schematic view of the flowchart model of the proposed system. P.A:

3.5.2. Risk metric for ordered features

Ordered features are features over which we can make a magnitude comparison. Those are often called continuous features. Here, we define the refined formulations for the Age and Cost ordered features of our database. Consider the Medicine–Age incidence matrix, denoted by MA. Let Max(i) and Min(i) denote the maximum and minimum of the ages that medicine i is prescribed to, respectively. In other words, Max(i) = {j : MaxMA(i) = MA(i, j)} and Min(i) = {j : MinMA(i) = MA(i, j)}. Then the age range of medicine i is $r_i$ = Max(i) − Min(i). The modified risk metric is:

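$$\mathrm{risk}_{MA}(i,j) = \frac{e^{-\left(MA(i,j)/\mathrm{Max}_{MA}(i)\right)\times\left(1 - d_i(j)/r_i\right)} - e^{-1}}{1 - e^{-1}} \qquad (2)$$

where

$$V_i = \frac{\sum_k k \times MA(i,k)}{\sum_k MA(i,k)}$$

(centroid age for the ith medicine), and

$$d_i(j) = |j - V_i|$$

(distance of the jth age to the centroid age of the ith medicine).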

Then, the risk matrix of the Medicine and Age domain is defined as MAR(i, j) = riskMA(i, j). For the Diagnosis–Cost domain, the formulation is analogous, except that we define the entry DC(i, j) as the number of times diagnosis i is prescribed medicines of total cost falling into interval j.
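A small sketch may help to see how Eq. (2) combines rarity with distance from the centroid age. It is an illustrative Python version under stated assumptions, not the authors' code: the names `centroid_age` and `ordered_risk` and the toy row are hypothetical, the age range is taken as the span between the largest and smallest ages with a nonzero count (following the prose reading of Max(i) and Min(i)), and a zero range is treated as no distance penalty. The row is assumed to contain at least one nonzero count.

```python
import math

E_INV = math.exp(-1.0)


def centroid_age(row):
    """V_i: incidence-weighted mean age, where row[k] is the count at age k."""
    return sum(age * count for age, count in enumerate(row)) / sum(row)


def ordered_risk(row, age):
    """Eq. (2) for one medicine (one row of MA) and one age."""
    seen = [k for k, count in enumerate(row) if count > 0]
    age_range = max(seen) - min(seen)  # assumption: r_i spans the observed ages
    if age_range == 0:
        closeness = 1.0  # assumption: a single observed age gives no distance penalty
    else:
        closeness = 1.0 - abs(age - centroid_age(row)) / age_range
    exponent = -(row[age] / max(row)) * closeness
    return (math.exp(exponent) - E_INV) / (1.0 - E_INV)


# Hypothetical counts of one medicine at ages 0..6.
row = [0, 0, 1, 5, 10, 5, 1]
print(centroid_age(row))
for age in range(len(row)):
    print(age, round(ordered_risk(row, age), 3))
```

In this toy row the most common age coincides with the centroid, so its risk is 0, while ages with no incidences get risk 1 regardless of their distance, since the exponent is then zero.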

4. Computational results

We developed the code of the proposed framework in the Matlab 2008A release. In this system, the user can set any threshold for any of the risk matrices, keeping in mind that there is a tradeoff between the true positive rate and the human expert screening time. Given the thresholds, the code outputs the fraudulent prescriptions, indicating which types of fraud are seen within each prescription. That way, the human expert has the chance to review the flagged prescriptions, which saves time and money when auditing large databases. The on-line prescription fraud detection tool is an interactive tool that has a graphical user interface. This interface is designed to enable the user to insert new prescriptions into the database and audit a new prescription without the need to re-run the off-line code. We ran the off-line code on the database of 87,785 prescribed drugs. The tests were run on a PC with a 64-bit Core 2 Duo (3 GHz). The code takes 414 seconds to process the whole data set. As stated above, a run requires the user to specify riskiness thresholds for each kind of confirmation check procedure. The code reveals the prescriptions which possess higher risks than the thresholds. We made several runs in order to refine the preferable threshold for each of the domains.

The results indicate that the sensitivity levels of each of the criteria are different. The reason lies in the fact that the sizes of the incidence matrices differ from each other, and thus the sparseness and intensity characteristics of each differ. That is to say, the maximum numbers in a risk matrix's rows and the rows themselves change from matrix to matrix for each medicine, leading to different sets of risk indicators for the corresponding features. Thus, each threshold needs a separate refinement. Knowledge inferred needs to be validated and refined by human experts [35]. We carried out this refinement under the supervision of a medical doctor who assessed the significance levels of the outputs, since we are interested in building a system that produces outputs meaningful to the human expert fraud auditors, who are medical doctors in Turkey. The refined model for each auditing task uses the following threshold values:

• Medicine–Diagnosis Domain: 0.85.
• Medicine–Age Domain: 0.90.
• Medicine–Sex Domain: 0.96.
• Medicine–Medicine Domain: 0.95.
• Diagnosis–Cost Domain: 0.85.

We consider false positive, false negative, and true positive rates as well as the agreement rate as performance indicators for our system. A medical doctor labeled the fraudulent prescriptions in a random sample of 249 prescriptions taken from the database. The comparison between the human expert labeling and the proposed system led to the following results: 17 false positives, 19 false negatives, 72 true positives, and 141 true negatives. The results are summarized in Table 3. The AUC (Area Under ROC Curve) is 85.7%.
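The performance indicators can be written out as a short sketch that follows the definitions given in Table 3. It is an illustrative Python snippet; the function name `performance_indicators` is hypothetical, and taking the number of real positives as true positives plus false negatives is an assumption. With the four counts quoted above, the false negative rate and the agreement rate come out at the reported 7.63% and 85.54%; the reported false positive and true positive rates are not exactly reproduced by these four counts, so the sketch illustrates the definitions rather than recomputing Table 3.

```python
def performance_indicators(tp, fp, fn, tn):
    """Indicators as defined in Table 3, computed from confusion counts."""
    total = tp + fp + fn + tn
    return {
        "false_positive_rate": fp / total,    # false positives / all instances
        "false_negative_rate": fn / total,    # false negatives / all instances
        "true_positive_rate": tp / (tp + fn), # assumption: real positives = tp + fn
        "agreement_rate": (tp + tn) / total,  # accuracy
    }


# Counts quoted in the text for the 249 labeled prescriptions.
indicators = performance_indicators(tp=72, fp=17, fn=19, tn=141)
for name, value in indicators.items():
    print(f"{name}: {100 * value:.2f}%")
```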

We have compared our system with two existing methods. EFD [36] performed worse, with a true positive rate of 26.4%, a false positive rate of 5.9%, and an AUC of 60%. The medical claim fraud/abuse detection system proposed by Ortega et al. [33] achieved a true positive rate of 71% and a false positive rate of 6%, with an AUC of 82.5%.

An interesting observation about the audit results is that the prescriptions labeled as fraudulent tend to have multiple reasons for risk. For example, let us consider prescription 1592467, whose database values are given in Table 4.

The output for this prescription is:

Prescription Number: 1592467

• Incompatibility between Medicine: Iliadin, Diagnosis: Glaukoma, Risk: 0.96.
• Incompatibility between Medicine: Coraspin, Diagnosis: Glaukoma, Risk: 0.92.
• Incompatibility between Diagnosis: Glaukoma, Cost (TL): 70, Risk: 0.87.

Cosopt, being an ophthalmic suspension, is a legitimate item in the prescription. Nonetheless, Iliadin is a nasal spray and Coraspin contains acetylsalicylic acid. This might be an indicator that fraudsters tend to add several fraudulent items to a prescription that could have been legitimate without them.

Table 3 – Performance indicators.

| Performance indicator | Explanation | Performance |
| --- | --- | --- |
| False positive rate | Number of false positives / Total number of instances | 6.09% |
| False negative rate | Number of false negatives / Total number of instances | 7.63% |
| True positive rate | Number of true positives / Number of real positives | 77.4% |
| Agreement rate (accuracy) | (Number of true positives + number of true negatives) / Total number of instances | 85.54% |

Table 4 – Prescription 1592467.

| Prescription no. | Drug | Age | Sex | Diagnosis | Price (TL) |
| --- | --- | --- | --- | --- | --- |
| 1592467 | Iliadin | 57 | M | Glaukoma | 4.59 |
| 1592467 | Cosopt | 57 | M | Glaukoma | 30.80 |
| 1592467 | Coraspin | 57 | M | Glaukoma | 2.40 |

Fig. 3 – Inserting a prescription to the prescription auditing tool.

The on-line code can be run once the off-line processing is done. To illustrate the effectiveness of the on-line fraud detection tool, let us consider a prescription given to a 55-year-old woman. Note that the database we work with is in Turkish, which means that we have Turkish listings in the on-line tool. She is diagnosed with an upper respiratory tract infection and is given the medicines Sudafed Syrup, Otrivine Pediatric Spray and Stafine Pomade. The initial user interface after inputting the prescription is as seen in Fig. 3. If the user chooses to view the prescription, a message box appears as in Fig. 4. After validating the prescription input, the user might choose to add the prescription to the database. If so, the message box appears as in Fig. 5.

When the user chooses to audit the prescription, a message box appears as in Fig. 6. Here, the Medicine and Age non-conformity risk assessments are stated in the input order of the medicines, as are the Medicine and Sex non-conformity risks. Considering the diagnoses, the Medicine and Diagnosis risks appear on the screen in the order in which the medicine and diagnosis couples appear in the prescription. Lastly, we see one value for the Diagnosis and Cost non-conformity risk since there is only one diagnosis in the prescription.

Considering the prescription, where the diagnosis is upper respiratory tract infection and the prescribed medicines are Sudafed Syrup, Stafine Pomade and Otrivine Pediatric Spray, we can state that the tool is effective: it calculates no risk in the medicine and diagnosis domain for the first and the last medicines and a high risk for the second, since Stafine Pomade is a skin care medicine. For this second medicine the tool calculates a high risk (0.85), which is expected. There is no risk associated with the sex of the patient and the medicines. Nonetheless, both Sudafed Syrup and Otrivine Pediatric Spray are pediatric medicines. Thus, the tool identifies high risks regarding the age of the patient: 0.97 for Sudafed Syrup and 0.99 for Otrivine Pediatric Spray.

Fig. 5 – Database update notification.

Fig. 6 – Risk assessment screen.
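The audit output for this example prescription can be organized per domain, as in the following minimal Python sketch. The risk values are the ones reported in the text. Representing "no risk" as 0.0, the domain names, the cut-off `ALARM_THRESHOLD` and the helper `alarms` are assumptions made for illustration and are not taken from the tool itself.

```python
# Risk values reported for the example prescription, keyed by domain and
# medicine. "No risk" is represented as 0.0 (an assumption); values that the
# text does not state are left out.
example_risks = {
    "medicine_diagnosis": {
        "Sudafed Syrup": 0.0,
        "Stafine Pomade": 0.85,
        "Otrivine Pediatric Spray": 0.0,
    },
    "medicine_sex": {
        "Sudafed Syrup": 0.0,
        "Stafine Pomade": 0.0,
        "Otrivine Pediatric Spray": 0.0,
    },
    "medicine_age": {
        "Sudafed Syrup": 0.97,
        "Otrivine Pediatric Spray": 0.99,
    },
}

ALARM_THRESHOLD = 0.5  # hypothetical cut-off, not a value from the paper


def alarms(risks, threshold):
    """Return (domain, medicine, risk) triples whose risk exceeds threshold."""
    return [
        (domain, medicine, risk)
        for domain, per_medicine in risks.items()
        for medicine, risk in per_medicine.items()
        if risk > threshold
    ]


flagged = alarms(example_risks, ALARM_THRESHOLD)
for domain, medicine, risk in flagged:
    print(f"{domain}: {medicine} -> {risk:.2f}")
```

Under this assumed cut-off, the skin care medicine is flagged in the medicine and diagnosis domain and the two pediatric medicines are flagged in the medicine and age domain, which matches the reading of the risk assessment screen given above.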

5. Concluding remarks and further research direction

We conclude by proposing a novel model for detecting cases of prescription fraud, intended to provide efficient and user-friendly platforms and to save financial resources at the institutional and national levels. Our methodology divides the six-dimensional feature domain into several two-dimensional sub-domains, considering the interaction levels between the features. It consists of populating incidence matrices for each of these sub-domains and then applying a distance-based data-mining approach. The risk metrics employed in this approach return a risk measure for each of the sub-domains. This risk measure is scaled to lie between 0 and 1 in order to give a straightforward definition of the risk level. For each of the domains, the user can specify a threshold. That way, the program raises an alarm only for those prescriptions with risk levels higher than the thresholds.
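To make the incidence-matrix idea concrete, the sketch below populates one two-dimensional sub-domain (medicine and diagnosis) from past prescriptions and derives a score in [0, 1]. The class name `IncidenceMatrix`, its methods and the sample records are hypothetical, and the frequency-based score is only a simple stand-in for the distance-based risk metrics of the proposed model, which are not reproduced here.

```python
from collections import Counter


class IncidenceMatrix:
    """Sparse co-occurrence counts for one two-feature sub-domain."""

    def __init__(self):
        self.counts = Counter()      # (row, col) -> number of co-occurrences
        self.row_totals = Counter()  # row -> total number of occurrences

    def add(self, row, col):
        """Record one observed (row, col) pair, e.g. (medicine, diagnosis)."""
        self.counts[(row, col)] += 1
        self.row_totals[row] += 1

    def risk(self, row, col):
        """Stand-in risk in [0, 1]: 1 minus the share of `row` seen with `col`.

        This is NOT the paper's metric. A row that was never observed is
        treated as maximal risk.
        """
        total = self.row_totals[row]
        if total == 0:
            return 1.0
        return 1.0 - self.counts[(row, col)] / total


# Hypothetical historical records: (medicine, diagnosis).
history = [
    ("Cosopt", "Glaucoma"),
    ("Cosopt", "Glaucoma"),
    ("Cosopt", "Glaucoma"),
    ("Iliadin", "Rhinitis"),
    ("Iliadin", "Glaucoma"),
]

matrix = IncidenceMatrix()
for medicine, diagnosis in history:
    matrix.add(medicine, diagnosis)

print(matrix.risk("Cosopt", "Glaucoma"))    # pair seen in every Cosopt record
print(matrix.risk("Iliadin", "Glaucoma"))   # pair seen in half of the Iliadin records
print(matrix.risk("Coraspin", "Glaucoma"))  # medicine never seen in the history
```

Because `add` only increments counts, newly validated prescriptions can be folded into the matrix one at a time, which is one simple way to picture the regular updates of the data set that the model relies on.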

The automated fraud detection methodology gives results that are considerably compatible with human expert auditing. The system is flexible enough for an integrated on-line/on-time user interface, and its on-line incorporation is computationally inexpensive. It presents a novel and easy way to keep track of healthcare transactions in incidence matrices for auditing.

The approach proposed here is able to handle both categorical and ordered features. The output of the system is easy for human users to understand and interpret. Besides, the system can learn and process accordingly as the input data shifts. Finally, its core methodology is adaptable to many other areas in healthcare and possibly in other industries.

Given the performance measurements, with a true positive rate of 77.4% and a false positive rate of 6%, we can conclude that the proposed system works reasonably well for the prescription fraud detection problem. Nonetheless, further refinement of the tool would require scaling the risk outputs across all domains. This would mean that incorporating different parameters for different domains would lead to the same risk measurements across all domains. Besides, a tool can be built in which the user can specify the domains they want to work on. Efforts must be undertaken to promote cost-effective fraud detection models for other healthcare practices and interventions that may have an impact on the quality of healthcare.

Conflicts of interest

There is no undisclosed ethical problem or conflict of interest related to this paper.

Acknowledgements

We thank Cagdas Baran for assistance in preparation of the types of fraud and abuse in medical practice and Murat Kurtcephe for various discussions about the topic of this paper.

r

e

f

e

r

e

n

c

e

s

[1] M.Levi,M.Burrows,Measuringtheimpactoffraudinthe UK:aconceptualandempiricaljourney,BritishJournalof Criminology48(3)(2008)293–318.

[2] A.S.Kesselheim,D.M.Studdert,M.M.Mello,

Whistle-blowers’experiencesinfraudlitigationagainst pharmaceuticalcompanies,NewEnglandJournalof Medicine362(19)(2010)1832–1839.

[3] R.Wheeler,S.Aitken,Multiplealgorithmsforfraud detection,Knowledge-BasedSystems13(2–3)(2000)93–99. [4] S.Viaene,R.A.Derrig,B.Baesens,G.Dedene,Acomparison

ofstate-of-the-artclassiﬁcationtechniquesforexpert automobileinsuranceclaimfrauddetection,JournalofRisk andInsurance69(3)(2002)373–421.

[5] C.S.Hilas,P.A.Mastorocostas,Anapplicationofsupervised andunsupervisedlearningapproachesto

telecommunicationsfrauddetection,Knowledge-Based Systems21(7)(2008)721–726.

[6] TurkishHealthCareSyndicate2008HealthCareReport,2008 (Sa ˘glıkta2008Raporu,TürkSa ˘glıkSen).

[7] J.Li,K.Huang,J.Jin,J.Shi,Asurveyonstatisticalmethods forhealthcarefrauddetection,JournalofHealthCare ManagementScience11(3)(2008)275–287.

[8] USA’sNationalHealthCareAnti-FraudAssociationWeb Page,2009,http://www.nhcaa.org/eweb/StartPage.aspx. [9] X.Weng,J.Shen,Detectingoutliersamplesinmultivariate

timeseriesdataset,Knowledge-BasedSystems21(8)(2008) 807–812.

(10)

[10] H.Kim,S.Pang,H.Je,D.Kim,S.Bang,Constructingsupport vectormachineensemble,PatternRecognition36(2003) 2757–2767.

[11] E.Barse,H.Kvarnstrom,E.Jonsson,Synthesizingtestdata forfrauddetectionsystems,in:Proceedingsofthe19th AnnualComputerSecurityApplicationsConference,2003, pp.384–395.

[12] M.Syeda,Y.Zhang,Y.Pan,Parallelgranularneuralnetworks forfastcreditcardfrauddetection,in:Proceedingsofthe 2002IEEEInternationalConferenceonFuzzySystems,2002. [13] R.Ghosh,D.Reilly,Creditcardfrauddetectionwitha

neural-network,in:ProceedingsoftheTwenty-Seventh AnnualHawaiiInternationalConferenceonSystem Sciences,1994.

[14] S.Maes,K.Tuyls,B.Vanschoenwinkel,B.Manderick,Credit cardfrauddetectionusingBayesianandneuralnetworks, in:Proceedingsofthe1stInternationalNAISOCongresson NeuroFuzzyTechnologies,2002.

[15] K.Ezawa,S.Norton,ConstructingBayesiannetworksto predictuncollectibletelecommunicationsaccounts,IEEE Expert11(5)(1996)45–51.

[16] G.Metan,I.Sabuncuoglu,H.Pierreval,Realtimeselectionof schedulingrulesandknowledgeextractionviadynamically controlleddatamining,InternationalJournalofProduction Research48(23)(2010)6909–6938.

[17] D.Foster,R.Stine,Variableselectionindatamining:building apredictivemodelforbankruptcy,JournalofAmerican StatisticalAssociation99(466)(2004)303–313.

[18] E.Belhadji,G.Dionne,F.Tarkhani,Amodelforthedetection ofinsurancefraud,TheGenevaPapersonRiskand

Insurance25(4)(2000)517–538.

[19] M.Pejic-Bach,Proﬁlingintelligentsystemsapplicationsin frauddetectionandprevention:surveyofresearcharticles, in:ProceedingsofInternationalConferenceonIntelligent Systems,ModellingandSimulation,2010,pp.80–85. [20] C.Cortes,D.Pregibon,Signature-basedmethodsfordata

streams,DataMiningandKnowledgeDiscovery5(2001) 167–182.

[21] K.Yamanishi,J.Takeuchi,G.Williams,P.Milne,On-line unsupervisedoutlierdetectionusingﬁnitemixtureswith discountinglearningalgorithms,DataMiningand KnowledgeDiscovery8(2004)275–300.

[22] P.L.Brockett,R.A.Derrig,L.L.Golden,A.Levine,M.Alpert, Fraudclassiﬁcationusingprincipalcomponentanalysisof RIDITs,JournalofRiskandInsurance69(3)(2002)341–371. [23] C.L.Chan,C.H.Lan,Adataminingtechniquecombining

fuzzysetstheoryandBayesianclassiﬁer– anapplicationof auditingthehealthinsurancefee,in:H.R.Arabnia(Ed.), ProceedingsoftheInternationalConferenceonArtiﬁcial IntelligenceIC-AI’2001,2001,pp.402–408.

[24] M.Kim,T.Kim,Aneuralclassiﬁerwithfrauddensitymap foreffectivecreditcardfrauddetection,in:Proceedingsof IDEAL2002,2002,pp.378–383.

[25] H.He,W.Graco,X.Yao,Applicationofgeneticalgorithms andk-nearestneighbourmethodinmedicalfrauddetection, in:ProceedingsofSEAL1998,1999,pp.74–81.

[26] C.Cortes,D.Pregibon,C.Volinsky,Computationalmethods fordynamicgraphs,JournalofComputationalandGraphical Statistics12(4)(2003)950–970.

[27] M.Cahill,F.Chen,D.Lambert,J.Pinheiro,D.Sun,Detecting fraudintherealworld,in:HandbookofMassiveDatasets, 2002,pp.911–930.

[28] Y.Moreau,E.Lerouge,H.Verrelst,J.Vandewalle,C. Stormann,P.Burge,BRUTUS:ahybridsystemforfraud detectioninmobilecommunications,in:Proceedingsof EuropeanSymposiumonArtiﬁcialNeuralNetworks,1999, pp.447–454.

[29] G.Williams,Z.Huang,Miningtheknowledgemine:thehot spotsmethodologyformininglargerealworlddatabases, LectureNotesinComputerScience(1997)340–348. [30] G.Williams,Evolutionaryhotspotsdatamining:an

architectureforexploringforinterestingdiscoveries,in: ProceedingsofPAKDD99,1999.

[31] R.Brause,T.Langsdorf,M.Hepp,Neuraldataminingfor creditcardfrauddetection,in:Proceedingsof11thIEEE InternationalConferenceonToolswithArtiﬁcial Intelligence,1999.

[32] T.Ormerod,N.Morley,L.Ball,C.Langley,C.Spenser,Using ethnographytodesignaMassDetectionTool(MDT)forthe earlydiscoveryofinsurancefraud,in:CHI’03Extended AbstractsonHumanFactorsinComputingSystems,2003, pp.650–651.

[33] P.Ortega,C.Figueroa,G.Ru,Amedicalclaimfraud/abuse detectionsystembasedondatamining:acasestudyin Chile,in:ProceedingsofDMIN’06,2006,pp.224–231. [34] S.H.Huang,L.R.Wulsin,L.Hua,J.Guo,Dimensionality

reductionforknowledgediscoveryinmedicalclaims database:applicationtoantidepressantmedication utilizationstudy,ComputerMethodsandProgramsin Biomedicine93(2)(2009)115–123.

[35] T.Aydın,H.A.Güvenir,Modelinginterestingnessof streamingassociationrulesasabeneﬁt-maximizing classiﬁcationproblem,KnowledgeBasedSystems37(2) (2009)1713–1718.

[36] A.J.Major,D.R.Riedinger,EFD:ahybrid

knowledge/statistical-basedsystemforthedetectionof fraud,JournalofRiskandInsurance69(3)(2002)309–324.
