
A prescription fraud detection model


journal homepage: www.intl.elsevierhealth.com/journals/cmpb


Karca Duru Aral (a), Halil Altay Güvenir (b, ∗), İhsan Sabuncuoğlu (c), Ahmet Ruchan Akar (d, e)

(a) INSEAD, Technology & Operations Management Area, Fontainebleau, France
(b) Department of Computer Engineering, Bilkent University, Ankara, Turkey
(c) Department of Industrial Engineering, Bilkent University, Ankara, Turkey
(d) Department of Cardiovascular Surgery, Ankara University School of Medicine, Ankara, Turkey
(e) Ankara University Stem Cell Institute, Ankara, Turkey

Article info

Article history: Received 23 November 2010; received in revised form 12 September 2011; accepted 13 September 2011.

Keywords: Healthcare fraud; Prescription fraud; Data mining; Outlier detection

Abstract

Prescription fraud is a major problem that causes substantial monetary loss in healthcare systems. We aimed to develop a model for detecting cases of prescription fraud and to test it on real-world data from a large multi-center medical prescription database. Conventionally, prescription fraud detection is conducted on random samples by human experts. However, the samples might be misleading, and manual detection is costly. We propose a novel distance-based data-mining approach for assessing the fraud risk of prescriptions with respect to cross-features. Final tests were conducted on an adult cardiac surgery database. The experimental results reveal that the proposed model works considerably well, with a true positive rate of 77.4% and a false positive rate of 6% for fraudulent medical prescriptions. The proposed model has potential advantages, including on-line risk prediction for prescription fraud, off-line analysis of high-risk prescriptions by human experts, and a self-learning ability through regular updates of the integrative data sets. We conclude that incorporating such a system in health authorities, social security agencies and insurance companies would improve the efficiency of internal review to ensure compliance with the law, and would radically decrease human-expert auditing costs.

© 2011 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

Fraud is defined as the abuse of a profit organization's system without necessarily leading to direct legal consequences. Levi and Burrows define fraud as a mechanism through which the fraudster gains an unlawful advantage or causes unlawful loss [1]. Fraud constitutes a critical problem in many areas, such as healthcare [2], banking [3], insurance [4], and telecommunications [5]. Prescription fraud is defined as the illegal acquisition of prescription drugs for personal use or profit, and it can take numerous forms. Any effort aiming to identify fraudulent transactions in such domains is called a fraud detection process. Recent data suggest that traditional manual detection by human experts is quite costly because of high expert wages and the large size of the databases. Other main drawbacks of manual detection are that individual human experts cannot recognize newly emerging fraud patterns spread throughout the database, and cannot detect fraudulent behavior at the moment it is attempted. Thus, customized data mining algorithms should analyze the enormous databases of these large businesses, after which human experts can further inspect the identified risky transactions.

∗ Corresponding author. Tel.: +90 312 290 1252; fax: +90 312 266 4047. E-mail address: guvenir@cs.bilkent.edu.tr (H.A. Güvenir).

Having seen a yearly exponential increase in spending, abuse of healthcare systems is becoming more critical in Turkey, as in many other countries [6]. As for the USA, according to the General Accounting Office, annual healthcare expenditures had approached two trillion dollars, or 15.3% of the Gross Domestic Product, by 2007 [7]. The National Health Care Anti-Fraud Association (NHCAA) estimated that 3% of all healthcare spending, adding up to $68 billion, is lost to healthcare fraud in the United States. Other estimates put this lost amount at around 10%, or $170 billion [8]. Examples of fraud in a healthcare system include billing for services and goods that are not rendered, performing medically unnecessary operations, and prescribing unnecessary medicines.

Table 1 – Healthcare spending in Turkey by years.

Billion TL                          2002    2007    2008
Total social insurance spending      7.6      20      24
Total medicament spending            4.3     8.6    10.5
Total hospital spending              2.8    10.3      13
State hospital payments by SSA       1.8     6.4     7.5

SSA: Social Security Agency in Turkey, known as SGK (Sosyal Güvenlik Kurumu).

0169-2607/$ – see front matter © 2011 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.cmpb.2011.09.003

Experts from the Social Security Agency (SSA, known as SGK) in Turkey commonly detect prescription fraud in their audits. Currently, when auditing a hospital, an SSA officer examines a small sample of the hospital's prescriptions, and the SSA then charges the hospital a proportional amount. It is worth noting, however, that undetected fraud continues to be an enormous burden on the Turkish healthcare system. According to the Turkish Health Care Syndicate 2008 Health Care Report, fraud in healthcare has recently boomed in Turkey [6]. Given the yearly exponential increase in spending shown in Table 1, abuse of healthcare systems is becoming more and more critical. In 2008, healthcare fraud was committed principally in Van, Eskişehir, Erzurum, Siirt, Adana, Bursa, Zonguldak, Diyarbakır, and many other cities, even in the Head Center of the Tuberculosis Fighting Department. These fraudulent acts took the form of fake medicament reports, fake invoices, and billing the SSA for examinations and treatments that were not rendered. The total cost of these fraudulent acts amounted to millions of TL, and about 300 people were recently arrested on fraud charges. Indeed, Turkish healthcare laws provide significant legal sanctions for fraud and abuse control (Turkish Penal Law, 26.09.2004, No. 5237/204). In contrast, the perception in Turkish society that prescription fraud is a victimless crime makes it even more widespread and strengthens the fraudulent chain between pharmaceutical companies, physicians, pharmacies, and patients. Since nearly half of the SSA's spending goes to medical drug payments, which summed to 10.5 billion TL in 2008 [6], the cost of fraudulent prescriptions to the SSA is not tolerable. This type of fraud consists of excessive prescription of medicines and mismatches between patients' features and the prescribed medicines. The orthodox manual detection is conducted by a committee of assigned medical doctors in the SSA. When inspecting a hospital, a human expert goes through a relatively small sample of the prescriptions associated with that hospital. If there are fraudulent and abusive claims in the sample, the agency charges the hospital the amount obtained by multiplying the percentage of fraudulent claims detected in the sample by the total cost of the prescriptions issued by the hospital in that inspection period. This method is both costly to conduct and does not guarantee any efficiency coefficient for the outcome.
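The charging rule described above extrapolates the fraud share found in the audit sample to the whole inspection period. A minimal sketch of that arithmetic follows; the function name, the input format, and all figures are hypothetical and do not come from SSA data.

```python
def extrapolated_charge(sample_is_fraud, total_period_cost):
    """Charge = (share of sampled prescriptions judged fraudulent) x (total period cost).

    sample_is_fraud: one boolean per sampled prescription (hypothetical format).
    total_period_cost: total cost of the prescriptions issued in the period, in TL.
    """
    if not sample_is_fraud:
        raise ValueError("the audit sample must not be empty")
    fraud_share = sum(sample_is_fraud) / len(sample_is_fraud)  # True counts as 1
    return fraud_share * total_period_cost


# Hypothetical audit: 3 of 50 sampled prescriptions are judged fraudulent,
# and the hospital issued prescriptions worth 1,000,000 TL in the period.
sample = [True] * 3 + [False] * 47
charge = extrapolated_charge(sample, 1_000_000)
print(f"charge: {charge:,.0f} TL")  # prints "charge: 60,000 TL"
```

In this example each sampled prescription shifts the charge by 2% of the period total (20,000 TL), which shows how strongly the outcome depends on a small sample.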

To provide an automated, user-friendly system that overcomes the above-mentioned handicaps, in this paper we propose a prescription fraud detection tool that highlights the prescriptions whose fraud probability exceeds a threshold set by the user. Risk measurements are calculated for cross-features in a knowledge-based setting, comparing each prescription with common practice by means of distance metrics. The system incorporates an efficient on-line structure that can be integrated with the electronic on-line prescription provision systems already in use in healthcare institutions. Although the tool was originally intended for prescription fraud detection, the supervision of other medical claims (blood tests, X-rays, MRI scans, biopsies, etc.) constitutes a promising area for future applications of the proposed methodology. The underlying assumption for building such a system is that fraudulent behaviors related to a cross-feature are outliers with respect to the total data set.

The rest of the paper is organized as follows. Section 2 provides a comprehensive literature review of fraud detection studies. This survey indicates that three main types of fraud detection techniques have been proposed for healthcare: supervised, unsupervised, and hybrid systems. Since we work on a data set without any prior knowledge of whether prescriptions are fraudulent, the proposed system is unsupervised. Section 3 discusses the data structure, the proposed methodology, and the related risk assessment formulations. Section 4 presents the results of computational experiments on real data for both the off-line and on-line applications. The empirical validation of the proposed system and its performance compared to a human expert are also given in this section. Finally, we give concluding remarks and further research directions in Section 5.

2. Related work

There are various resources relating to fraud detection. Since fraud detection is a relatively large field, most studies consider outlier detection as a primary tool [9]. Investigators mainly incorporate artificial intelligence, data mining, expert systems, fuzzy logic, statistics, and visualization. Nonetheless, studies on healthcare insurance fraud detection are limited. We can group the existing fraud detection methodologies as supervised, unsupervised, or hybrids of the two.

2.1. Supervised approaches

Supervised algorithms are trained on a previously labeled training set of fraudulent and legitimate transactions. The algorithms then apply mathematical methodologies to assign scores of similarity to the fraudulent profiles. The most popular supervised algorithms in this setting are neural networks. In this context, Kim et al. propose a neural network model for telecommunication subscription fraud [10]. In another study, Barse et al. introduce a multi-layer neural network to handle a synthetic Video-on-Demand database [11]. For the credit card fraud detection problem, Syeda et al. develop a fuzzy neural network model that works on parallel machines [12]. A three-layer feed-forward radial basis function neural network is introduced by Ghosh and Reilly [13]. This neural network is trained in two phases to periodically assign risk scores to new credit card transactions.

Maes et al. compare neural networks and Bayesian networks, using the backpropagation algorithm to train the neural networks [14]. The results indicate that even though Bayesian networks are more accurate and require a shorter training time, they are slower when applied to new instances. Another Bayesian network, with four stages and two parameters, is developed by Ezawa and Norton [15]. The authors assert that regression, nearest neighbor, and neural network methods are all too slow for their data.

Other methods in the literature are decision trees, rule induction, and case-based reasoning. Metan et al. introduce a real-time dispatching rule selection system that extracts knowledge from the data stream coming from the manufacturer [16]. The incorporated decision tree updates dynamically in response to changes in the manufacturer's conditions. The system, which enables flexible and higher-quality decisions, is tested on simulation runs, which reveal that the proposed model outperforms the existing algorithms in the literature.

As for statistical modeling, Foster and Stine employ least squares regression and stepwise selection of predictors [17]. They assert that traditional statistical methods are effective for fraud detection. Belhadji et al. propose involving human experts in choosing the best indicators (attributes) for fraud detection [18]. The conditional probabilities of fraud for each indicator are then calculated accordingly. Afterwards, probit regressions are used to identify the most important indicators. The flexible thresholds can be adjusted to the company's fraud policy. Other techniques in the literature incorporate expert systems, association rules, and genetic algorithms. Pejic-Bach gives an overview of profiling intelligent systems applications in fraud detection and prevention [19].

2.2. Unsupervised approaches

In the area of telecommunications fraud detection, Cortes et al. study the temporal evolution of large dynamic graphs [20]. The graphs are built from sub-graphs named Communities of Interest (COI). An exponentially weighted average method is used to update the sub-graphs daily. COIs are built from mobile phone accounts using call quantities and durations. The study yields the characteristics of telecommunication fraudsters. In the medical insurance domain, Yamanishi et al. present the unsupervised SmartSifter [21]. This algorithm works with categorical and continuous variables, and investigates statistical outliers by means of the Hellinger distance. On automobile insurance data, Brockett et al. employ Principal Component Analysis of RIDIT scores on rank-ordered categorical attributes [22].

2.3. Hybrid approaches

Two sub-categories are identified in the literature: supervised hybrids and unsupervised hybrids.

2.3.1. Supervised hybrids

In this category, supervised neural networks, Bayesian networks, and decision trees are the methodologies most often used to create hybrids. Chan et al. combine naive Bayes, C4.5, CART, and RIPPER classifiers [23], obtaining better efficiency on credit card transactions. Kim and Kim develop a decision tree algorithm to classify the data in hand [24]. They use a weighting function to compute fraud density, and a back-propagation neural network then generates a weighted risk score for credit card transactions. He et al. classify a general practitioner data set with the k-nearest neighbor algorithm [25], computing the optimal weights of the attributes by genetic algorithms.

2.3.2. Unsupervised hybrids

Cortes and Pregibon propose the use of daily updated telecommunication account summaries (signatures) [20]. Signatures labeled as fraudulent are then inserted into the training set, which is used to train supervised algorithms such as trees, SLIPPER, and model-averaged regression. This allows the authors to draw conclusions on the nature of the fraudulent calls. Moreover, Cortes et al. propose a graph-theoretic method [26] that is used to visually detect fraudulent international calls. Cahill et al. compute a risk score for each call based on its similarity to fraudulent profiles and its dissimilarity to the account's signature [27]. The signatures are updated with low-scoring calls, and in this updating process recent calls are given higher weight than older ones. The study by Moreau et al. indicates that supervised neural network and rule induction algorithms perform better than two types of unsupervised neural networks in identifying shifts between short- and long-term account behavior profiles [28]. The investigators used the area under the receiver operating characteristic curve (AUC) as the performance measure. There are also studies in which unsupervised approaches are used to cluster the insurance data before supervised approaches are applied. Williams and Huang propose a three-step procedure in which k-means is employed for cluster detection, C4.5 is used for decision tree rule induction, and domain knowledge, statistical summaries, and visualization tools are used for rule evaluation [29]. Williams employs a genetic algorithm in the second step to generate rules, which enables the user to explore them [30].

Brause et al. present RBF neural networks for screening the outputs of association rules for credit card transactions [31]. Ormerod et al. present a Mass Detection Tool (MDT) for the detection of medical insurance fraud [32]. Ethnography is the core element of their proposal for capturing expertise to design the methodology, and the MDT uses a dynamic Bayesian Belief Network of fraud indicators. Ortega et al. describe another medical claim fraud/abuse detection system based on data mining, used by a Chilean private health insurance company [33]; it employs multi-layer perceptron (MLP) neural networks. Huang et al. apply a filter-based feature selection method, using an inconsistency rate measure and discretization, to a medical claims database to predict the adequacy of the duration of antidepressant medication utilization [34].

Table 2 – Attributes in the database.

Feature                                  Type          Number of values   Explanation
Commercial name of the prescribed drug   Categorical   2659               2659 medicines of different commercial names seen in the database.
Market price of the prescribed drug      Continuous    2659               Prices of each medicine in the Turkish market in 2007, fixed by the Health Ministry.
Prescription I.D. number                 Categorical   26,419             Identifying numbers for the 26,419 prescriptions in the database.
Age                                      Continuous    85                 All ages between 0 and 85.
Sex                                      Categorical   2                  Female, male.
Diagnosis                                Categorical   332                332 different diagnoses seen in the database.

This study differs from existing studies of healthcare fraud detection in that the domain knowledge learned can be used as (a) an on-line system to check whether a given prescription carries a risk of fraud and, if so, in what respects, and (b) an off-line system to process a set of prescriptions and filter out those whose risk exceeds a threshold for further checking by human experts; in addition, (c) the system has a self-learning ability through regular updates of the integrative data sets. The next section introduces the proposed methodology.

3. Proposed approach

In general, fraud detection research focuses on nonlinear, black-box supervised algorithms; nonetheless, we assert that less complex, reliable, and faster algorithms are needed. Since the instances (prescriptions) in our database are not labeled as fraudulent or legitimate, we adopt an unsupervised approach.

Auditing medical transactions requires two tools: an off-line system for batch screening/auditing, and an on-line system for on-time transaction control. This requires building two systems that work interactively. Clearly, the on-line system should avoid re-processing the whole batch of prescriptions for every new transaction. The data structure and size are further design considerations. We meet these requirements under the assumption that fraudulent cases are outliers in the database.

3.1. Data structure

The database in hand is already anonymized and allows us to consider the following features in prescription fraud detection: the commercial name of the prescribed drug, the market price of the prescribed drug, the prescription number, age, sex, and the diagnosis for which the drug is prescribed. The characteristics of these features are given in Table 2. Examining the nature of the data, we also see that the following features are correlated: medicine and diagnosis; medicine and age; medicine and sex; diagnosis and the total cost of the drugs prescribed for that diagnosis; and medicine–medicine interactions within a prescription.

Since there is no correlation between features such as age and sex, we ignore these cross-features. On the other hand, considering the interactions between diagnosis and age, as well as between diagnosis and sex, we reason that we do not need to include these cross-features, since any such diagnosis should call for specific medicines in the prescription, and these medicines should reveal any mismatch between the diagnosis and the patient's age or sex. These arguments transform our 6-dimensional domain into 2-dimensional sub-domains, given by the interactions discussed above.

3.2. Methodology

Therefore, our problem is refined to dealing with five two-dimensional spaces. Working with the incidence and risk matrices defined subsequently, and with two parts of processing, on-line and off-line, our methodology's flowchart is as shown in Fig. 1.

3.3. Off-line processing

We developed a Matlab 2008A m-file for the off-line batch processing of the database. This code processes the database to create the incidence matrices for all the domains:

• Medicine and age domain incidence matrix: MA.
• Medicine and sex domain incidence matrix: MS.
• Medicine and diagnosis domain incidence matrix: MD.
• Medicine and medicine domain incidence matrix: MM.
• Diagnosis and cost domain incidence matrix: DC.
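The four medicine-centred incidence matrices can be sketched as co-occurrence counts over prescribed-drug lines. This is a Python stand-in for the authors' Matlab m-file, not their code: the record keys are hypothetical, the matrices are stored sparsely as `Counter` objects keyed by (row trait, column trait), and the rule for counting MM (each ordered pair of distinct drugs counted once per prescription) is an assumption, since the text does not fix it.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_incidence(records):
    """Count co-occurrences for the MA, MS, MD and MM domains.

    Each record is one prescribed-drug line with hypothetical keys
    'rx', 'drug', 'age', 'sex', 'diagnosis'. Absent keys mean a count of 0.
    """
    MA, MS, MD, MM = Counter(), Counter(), Counter(), Counter()
    drugs_by_rx = defaultdict(set)
    for r in records:
        MA[(r["drug"], r["age"])] += 1
        MS[(r["drug"], r["sex"])] += 1
        MD[(r["drug"], r["diagnosis"])] += 1
        drugs_by_rx[r["rx"]].add(r["drug"])
    # MM: each ordered pair of distinct drugs in the same prescription is
    # counted once per prescription (assumed rule).
    for drugs in drugs_by_rx.values():
        for a, b in permutations(sorted(drugs), 2):
            MM[(a, b)] += 1
    return {"MA": MA, "MS": MS, "MD": MD, "MM": MM}


# The four drug lines of prescription 1592467 (Table 4).
rows = [("Iliadin", 4.59), ("Cosopt", 30.80), ("Cosopt", 30.80), ("Coraspin", 2.40)]
records = [
    {"rx": 1592467, "drug": d, "age": 57, "sex": "M", "diagnosis": "Glaukoma", "price": p}
    for d, p in rows
]
inc = build_incidence(records)
print(inc["MD"][("Cosopt", "Glaukoma")], inc["MM"][("Cosopt", "Iliadin")])  # prints "2 1"
```

A sparse representation fits here because most medicine/trait combinations never occur; reading a missing key from a `Counter` returns 0.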

An incidence matrix entry (i,j) is the number of times the ith and jth traits of the corresponding features are seen together in the database. For the DC matrix, the row labels are diagnoses and the column labels are indices from 1 to 204. These indices represent 5 TL (Turkish currency) intervals, except that the last interval is for diagnosis costs above 2500 TL. For every diagnosis within a prescription, the total cost of the corresponding medicines is calculated, and the number of times the total cost of diagnosis i falls into cost interval j is the incidence matrix entry DC(i,j).
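The DC construction can be sketched as "sum the drug prices per (prescription, diagnosis), then bin the total". The interval layout is an assumption here: the text gives 5 TL intervals, 204 indices and a 2500 TL cap, which a single uniform grid cannot satisfy at once (204 intervals of 5 TL reach only about 1015 TL), so the sketch exposes the width and cap as parameters and defaults to uniform 5 TL intervals below 2500 TL plus one open-ended interval. Record keys are hypothetical.

```python
from collections import Counter, defaultdict


def cost_bin(total_cost, width=5.0, cap=2500.0):
    """Map a total cost in TL to a 1-based interval index.

    Uniform `width`-TL intervals below `cap`, plus one open-ended interval
    for costs at or above `cap`. Both defaults are assumptions.
    """
    if total_cost < 0:
        raise ValueError("cost must be non-negative")
    last = int(cap // width) + 1
    if total_cost >= cap:
        return last
    return int(total_cost // width) + 1


def build_dc(records, width=5.0, cap=2500.0):
    """DC(i, j): how often diagnosis i's per-prescription total cost falls in interval j."""
    totals = defaultdict(float)
    for r in records:
        totals[(r["rx"], r["diagnosis"])] += r["price"]
    dc = Counter()
    for (_, diagnosis), cost in totals.items():
        dc[(diagnosis, cost_bin(cost, width, cap))] += 1
    return dc


# Prescription 1592467 (Table 4): all four lines are for Glaukoma, 68.59 TL in total.
records = [{"rx": 1592467, "diagnosis": "Glaukoma", "price": p} for p in (4.59, 30.80, 30.80, 2.40)]
dc = build_dc(records)
print(dict(dc))  # prints {('Glaukoma', 14): 1}
```

Under this layout, interval 14 covers 65 to 70 TL.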

With all the incidence matrices in hand, the code creates the following risk matrices:

• Medicine and age risks: MAR.
• Medicine and sex risks: MSR.
• Medicine and diagnosis risks: MDR.
• Medicine and medicine risks: MMR.
• Diagnosis and cost couple risks: DCR.

These matrices are built by calculating the risks for the corresponding entries of the corresponding incidence matrices. For example, to calculate MSR(i,j), we apply the corresponding risk metric to MS(i,j). Since the incidence matrices are needed for on-line processing, the risk computations do not overwrite them; the risks are stored in separate matrices.

With all the risk matrices in hand, the code goes through all the risks that are greater than the thresholds given by the user. The user can set any threshold for any of the risk matrices, keeping in mind that more prescriptions will be classified as risky when the threshold is small. That is, there is a trade-off between the true positive rate and the human expert screening time, and the user should decide in advance what level of trade-off is acceptable.

Given the thresholds, the code outputs the fraudulent prescriptions, indicating which types of fraud are seen within each prescription. The human expert then has the chance to review the marked prescriptions, which saves time and money in auditing large databases, and also obtains a list of the possible fraudulent transaction styles in the database.
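The thresholding and reporting step can be sketched as a filter over precomputed risks. This is a minimal illustration, not the authors' Matlab code: the input layout and the domain keys are hypothetical, the threshold values are the refined ones reported in Section 4, the risk values are made up, and a risk must be strictly greater than its domain's threshold to be flagged, as stated above.

```python
def flag_prescriptions(risks, thresholds):
    """Report, per prescription, every (domain, item, risk) above that domain's threshold.

    risks: {rx_id: [(domain, item, risk), ...]} computed from the risk matrices.
    thresholds: {domain: threshold}.
    """
    report = {}
    for rx, entries in risks.items():
        hits = [(d, item, r) for d, item, r in entries if r > thresholds[d]]
        if hits:
            report[rx] = hits
    return report


# Threshold values from the refined model in Section 4; the risk values are hypothetical.
thresholds = {"MD": 0.85, "MA": 0.90, "MS": 0.96, "MM": 0.95, "DC": 0.85}
risks = {
    101: [("MD", ("drugX", "dx1"), 0.91), ("MA", ("drugX", 40), 0.20)],
    102: [("MD", ("drugY", "dx2"), 0.10), ("DC", ("dx2", 12), 0.30)],
}
for rx, hits in flag_prescriptions(risks, thresholds).items():
    for domain, item, r in hits:
        print(f"Prescription {rx}: {domain} incompatibility {item}, risk {r:.2f}")
```

Lowering every threshold to 0 would flag both prescriptions, which is the screening-time side of the trade-off described above.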

3.4. On-line processing

The on-line prescription fraud detection tool is an interactive tool, coded in Matlab, with a graphical user interface. Given the nature of the healthcare sector, where on-line processing of incoming invoices is common practice, we assert that this kind of on-line tool is fundamental for instant, real-time auditing.

This interface is designed to let the user insert new prescriptions into the database and audit a new prescription without re-running the off-line code. Thus, new prescriptions can be audited once the off-line code has been run on the available prescription database. Please note that since the database we used is in Turkish, all the generated listings in the on-line user interface are in Turkish. Fig. 2 shows a screenshot of the graphical user interface of the auditing tool. As seen in Fig. 2, the user first inputs the prescription number as well as the age and sex of the patient. Then, in the box below, the user enters each prescribed drug and the corresponding diagnosis with the add button. The drug and diagnosis list boxes are populated with the Turkish drug names and diagnosis lists of the database, which are outputs of the off-line fraud detection code. The user can check whether the input is correct with the view prescription button. If the prescription input is correctly specified, the user might choose to add the prescription directly to the database. This is achieved by fetching the corresponding rows of the incidence and risk matrices and updating them with the specifications of the incoming prescription. Alternatively, the user might want to audit the prescription directly; in that case, the prescription input is not used to update the incidence and risk matrices permanently. This is preferable because, if the incoming prescription is fraudulent, updating the incidence and risk matrices with it would slightly degrade the performance of the code: increasing the number of outliers in a database would eventually turn the outliers into common transactions, which would prevent the tool from detecting those fraudulent transactions. As a result, the user should add an incoming prescription to the database only if it is not fraudulent, perhaps after the auditing process. On pushing the audit button, the user instantly receives a message indicating each level of fraud risk for the prescription. Lastly, the new prescription button lets the user enter a new prescription right after auditing another one.
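The difference between auditing and adding a prescription can be sketched for a single domain. The class below is a hypothetical Python stand-in for the Matlab tool, restricted to Medicine–Diagnosis and using the risk function of Eq. (1) in Section 3.5: `audit` scores lines against the current counts without changing them, while `add` increments only the touched entries and row maxima, so nothing is re-processed in batch. Treating a never-seen medicine as maximal risk is an assumption.

```python
import math
from collections import Counter

E_INV = math.exp(-1.0)


class MedicineDiagnosisAuditor:
    """Hypothetical on-line helper for the Medicine-Diagnosis domain only."""

    def __init__(self, incidence):
        self.incidence = Counter(incidence)  # (medicine, diagnosis) -> count
        self.row_max = Counter()             # medicine -> MaxMD(i)
        for (med, _), count in self.incidence.items():
            self.row_max[med] = max(self.row_max[med], count)

    def risk(self, med, diagnosis):
        """Eq. (1); reading a Counter with a missing key returns 0 without inserting it."""
        row_max = self.row_max[med]
        if row_max == 0:
            return 1.0  # medicine never seen before: maximal risk (assumption)
        count = self.incidence[(med, diagnosis)]
        return (math.exp(-count / row_max) - E_INV) / (1.0 - E_INV)

    def audit(self, lines):
        """Score (medicine, diagnosis) lines; the stored counts are left untouched."""
        return [self.risk(med, dx) for med, dx in lines]

    def add(self, lines):
        """Commit a prescription judged legitimate: update only the affected rows."""
        for med, dx in lines:
            self.incidence[(med, dx)] += 1
            self.row_max[med] = max(self.row_max[med], self.incidence[(med, dx)])


tool = MedicineDiagnosisAuditor({("drugA", "dx1"): 8, ("drugA", "dx2"): 2})
print(tool.audit([("drugA", "dx1"), ("drugA", "dx3")]))  # prints [0.0, 1.0]; audit only
tool.add([("drugA", "dx2")])                              # commit after review
```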

3.5. Risk assessment

We now introduce the risk assessment formulas, which calculate risks from the incidence matrices. As stated previously, the incidence matrices hold the number of times each instance shows up in the data set.

3.5.1. Risk metric for categorical features

Sex, diagnosis, and prescribed medicines are the unordered categorical features in the data set. The incidence matrix entry (i,j) is the number of times medicine i is issued to the corresponding unordered categorical entry j. The Medicine–Sex (MS), Medicine–Diagnosis (MD), and Medicine–Medicine (MM) incidence matrices are the categorical matrices.

Let us denote the maximum incidence entry of the ith medicine in an incidence matrix MF by MaxMF(i), where F represents the feature domain. MaxMF(i) is the number of times medicine i is issued to the trait it is most often issued to.

At this point we introduce a risk estimation function, hereafter denoted riskMF(i,j), that represents the likelihood of fraud when the ith medicine is prescribed for the jth trait. We required this function to return a real value between 0 and 1, where the value 1 represents the highest possible risk of fraud and the value 0 the lowest possible risk. The highest risk value is obtained when MF(i,j) has its lowest value, that is, in the rarest case. Further, we wanted the risk function to drop exponentially as MF(i,j) increases, and to reach 0 when MF(i,j) equals MaxMF(i), the most common case. Having tried many risk functions that satisfy these criteria, we found the risk function in Eq. (1) to be the most successful one.

$$\mathrm{risk}_{MF}(i,j) = \frac{e^{-MF(i,j)/\mathrm{Max}_{MF}(i)} - e^{-1}}{1 - e^{-1}} \qquad (1)$$

Then, the risk matrix of the Medicine and feature F domain can be defined as MFR(i,j) = riskMF(i,j).

Fig. 1 – A schematic view of the flowchart model of the proposed system. P.A:
[Flowchart with two parts, OFF-LINE SYSTEM and ON-LINE SYSTEM. Recoverable box labels: Historical Prescription Database; Pre-processing; Generate Incidence Matrices; Incidence Matrices; Compute the Prescription Risks; Generate Report for the Given Thresholds; Fraudulent Prescriptions Report; New Prescription; Insert the Prescription to P.A. Tool; Prescription Risks; Legal Alarm for Investigation; Allow the transaction; Add to database; Update Incidence Matrices; decision branches Yes / No.]

The risk function in Eq. (1) uses an exponential in order to achieve a steep trend, since we wanted high fraud risk values only for very small values of MF(i,j)/MaxMF(i). That is, the sensitivity of the risk function should increase as the ratio MF(i,j)/MaxMF(i) becomes smaller, which the exponential provides because the magnitude of the derivative of $e^{-x}$ increases as x gets smaller. We then normalize $e^{-MF(i,j)/\mathrm{Max}_{MF}(i)}$ by subtracting $e^{-1}$ and dividing by $1 - e^{-1}$, so that the risk values lie between 0 and 1 and the risk levels can be interpreted directly. Note that $e^{-1}$ and $1 - e^{-1}$ are constants.
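Eq. (1) and the construction of a risk matrix from an incidence matrix can be sketched as follows. The sparse `Counter` layout keyed by (medicine, trait) and the function names are assumptions of this sketch; the risk matrix is returned as a new dict so the incidence counts stay intact, as Section 3.3 requires.

```python
import math
from collections import Counter

E_INV = math.exp(-1.0)


def risk(count, row_max):
    """Eq. (1): 1.0 for an unseen pairing (count 0), 0.0 for the row's most common pairing."""
    if row_max <= 0:
        raise ValueError("row_max must be positive")
    return (math.exp(-count / row_max) - E_INV) / (1.0 - E_INV)


def risk_matrix(incidence):
    """Build MFR from a sparse incidence Counter keyed by (medicine, trait).

    Returns a new dict, so the incidence counts are not modified.
    Pairs absent from `incidence` have count 0 and hence risk 1.0 by Eq. (1).
    """
    row_max = Counter()
    for (med, _), count in incidence.items():
        row_max[med] = max(row_max[med], count)
    return {key: risk(count, row_max[key[0]]) for key, count in incidence.items()}


ms = Counter({("drugA", "F"): 40, ("drugA", "M"): 4})  # hypothetical Medicine-Sex counts
msr = risk_matrix(ms)
print(round(msr[("drugA", "M")], 3), msr[("drugA", "F")])
```

With these counts, a sex seen a tenth as often as the most common one already scores about 0.85, which illustrates the steepness discussed above.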

3.5.2. Risk metric for ordered features

Ordered features are features over which we can make a magnitude comparison; they are often called continuous features. Here, we define refined formulations for the Age and Cost ordered features of our database. Consider the Medicine–Age incidence matrix, denoted by MA. Let Max(i) and Min(i) denote the maximum and minimum of the ages to which medicine i is prescribed, respectively. In other words, $\mathrm{Max}(i)=\{j : \mathrm{Max}_{MA}(i)=MA(i,j)\}$ and $\mathrm{Min}(i)=\{j : \mathrm{Min}_{MA}(i)=MA(i,j)\}$. Then the age range of medicine i is $r_i = \mathrm{Max}(i) - \mathrm{Min}(i)$. The modified risk metric is:

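$$\mathrm{risk}_{MA}(i,j) = \frac{e^{-\left(MA(i,j)/\mathrm{Max}_{MA}(i)\right)\times\left(1 - d_i(j)/r_i\right)} - e^{-1}}{1 - e^{-1}} \qquad (2)$$

where

$$V_i = \frac{\sum_k k \times MA(i,k)}{\sum_k MA(i,k)}$$

is the centroid age for the ith medicine, and

$$d_i(j) = \lvert j - V_i \rvert$$

is the distance of the jth age to the centroid age of the ith medicine.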

Then, the risk matrix of the Medicine and Age domain is defined as MAR(i,j) = riskMA(i,j). For the Diagnosis–Cost domain, the formulation is analogous, except that the entry DC(i,j) is defined as the number of times diagnosis i is prescribed medicines whose total cost falls into interval j.
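Eq. (2) can be sketched for one medicine's row of MA. Two choices here are assumptions: r_i is taken as the span between the youngest and oldest ages at which the medicine was observed, following the prose definition (the set notation in the text reads differently), and a zero span (a single observed age) leaves the Eq. (1) ratio unscaled. With this reading, ages inside the observed span keep 1 − d_i(j)/r_i between 0 and 1, and ages outside it have a count of 0 and therefore risk 1. Reusing the same function with cost-interval indices in place of ages is one plausible reading of "analogous" for the Diagnosis–Cost domain.

```python
import math

E_INV = math.exp(-1.0)


def age_risk(age_counts, age):
    """Eq. (2) for one medicine i; age_counts maps age j -> MA(i, j)."""
    seen = {a: c for a, c in age_counts.items() if c > 0}
    if not seen:
        return 1.0  # medicine with no history: maximal risk (assumption)
    max_count = max(seen.values())                                       # MaxMA(i)
    centroid = sum(a * c for a, c in seen.items()) / sum(seen.values())  # V_i
    age_range = max(seen) - min(seen)                                    # r_i (prose reading)
    distance = abs(age - centroid)                                       # d_i(j)
    scale = 1.0 if age_range == 0 else 1.0 - distance / age_range
    ratio = seen.get(age, 0) / max_count
    return (math.exp(-ratio * scale) - E_INV) / (1.0 - E_INV)


counts = {40: 2, 50: 10, 60: 2}  # hypothetical MA row: centroid 50, range 20
for age in (50, 60, 80):
    print(age, round(age_risk(counts, age), 3))
```

Age 60 is seen a fifth as often as the most common age, and because it lies halfway across the range its exponent is halved, giving a risk of about 0.85 instead of the roughly 0.57 that Eq. (1) alone would give.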

4. Computational results

We developed the code of the proposed framework in the Matlab 2008A release. In this system, the user can set any threshold for any of the risk matrices, keeping in mind that there is a trade-off between the true positive rate and the human expert screening time. Given the thresholds, the code outputs the fraudulent prescriptions, indicating which types of fraud are seen within each prescription. The human expert then has the chance to review the output prescriptions, which saves time and money in auditing large databases. The on-line prescription fraud detection tool is an interactive tool with a graphical user interface, designed to let the user insert new prescriptions into the database and audit a new prescription without re-running the off-line code. We ran the off-line code on the database of 87,785 prescribed drugs. The tests were run on a PC with a 64-bit Core 2 Duo processor (3 GHz), and the code takes 414 seconds to process the whole data set. As stated above, a run requires the user to specify riskiness thresholds for each kind of confirmation check procedure, and the code reveals the prescriptions whose risks exceed these thresholds. We performed several runs in order to refine the preferable threshold for each of the domains.

The results indicate that the criteria have different sensitivity levels. The reason is that the incidence matrices differ in size, and thus in their sparseness and intensity characteristics. That is to say, the maximum numbers in a risk matrix's rows, and the rows themselves, change from matrix to matrix for each medicine, leading to different sets of risk indicators for the corresponding features. Thus, each threshold needs a separate refinement. Inferred knowledge needs to be validated and refined by human experts [35]. We carried out this refinement under the supervision of a medical doctor who assessed the significance levels of the outputs, since we are interested in building a system that produces outputs meaningful to the human expert fraud auditors, who are medical doctors in Turkey. The refined model for each auditing task uses the following threshold values:

• Medicine–Diagnosis domain: 0.85.
• Medicine–Age domain: 0.90.
• Medicine–Sex domain: 0.96.
• Medicine–Medicine domain: 0.95.
• Diagnosis–Cost domain: 0.85.

We consider the false positive, false negative, and true positive rates, as well as the agreement rate, as performance indicators for our system. A medical doctor labeled the fraudulent prescriptions in a random sample of 249 prescriptions taken from the database. Comparing the human expert's labels with the output of the proposed system gave 17 false positives, 19 false negatives, 72 true positives, and 141 true negatives. The results are summarized in Table 3. The AUC (area under the ROC curve) is 85.7%.
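Table 3 defines each indicator as a ratio of counts. The sketch below applies those definitions to the four counts just reported; interpreting "real positives" as true positives plus false negatives is an assumption. It reproduces the agreement rate (85.54%) and the false negative rate (7.63%) of Table 3, but yields about 6.83% and 79.12% for the false positive and true positive rates, where Table 3 lists 6.09% and 77.4%; the text does not state which denominators produce the listed values.

```python
def indicators(tp, fp, fn, tn):
    """Performance indicators as defined in Table 3 (returned as fractions)."""
    total = tp + fp + fn + tn
    return {
        "false_positive_rate": fp / total,     # false positives / all instances
        "false_negative_rate": fn / total,     # false negatives / all instances
        "true_positive_rate": tp / (tp + fn),  # true positives / real positives (assumed TP + FN)
        "agreement_rate": (tp + tn) / total,   # (TP + TN) / all instances
    }


for name, value in indicators(tp=72, fp=17, fn=19, tn=141).items():
    print(f"{name}: {100 * value:.2f}%")
```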

We have compared our system with two existing methods. EFD [36] performed worse, with a true positive rate of 26.4%, a false positive rate of 5.9%, and an AUC of 60%. The medical claim fraud/abuse detection system proposed by Ortega et al. [33] achieved a true positive rate of 71%, a false positive rate of 6%, and an AUC of 82.5%.

An interesting observation about the audit results is that prescriptions labeled as fraudulent tend to have multiple reasons for risk. For example, consider prescription 1592467, whose database values are given in Table 4.

The output for this prescription is as follows:

Prescription Number: 1592467

• Incompatibility between Medicine: Iliadin Diagnosis: Glaukoma, Risk: 0.96.
• Incompatibility between Medicine: Coraspin Diagnosis: Glaukoma, Risk: 0.92.
• Incompatibility between Diagnosis: Glaukoma Cost (TL): 70, Risk: 0.87.

Cosopt, being an ophthalmic suspension, is a legitimate item in the prescription. However, Iliadin is a nasal spray and Coraspin contains acetylsalicylic acid. This might indicate that fraudsters tend to add several fraudulent items to a prescription that would have been legitimate without them.

Table 3 – Performance indicators.

Performance indicator       Explanation                                                                          Performance
False positive rate         Number of false positives / Total number of instances                                6.09%
False negative rate         Number of false negatives / Total number of instances                                7.63%
True positive rate          Number of true positives / Number of real positives                                  77.4%
Agreement rate (accuracy)   (Number of true positives + Number of true negatives) / Total number of instances    85.54%

Table 4 – Prescription 1592467.

Prescription no.   Drug       Age   Sex   Diagnosis   Price (TL)
1592467            Iliadin    57    M     Glaukoma    4.59
1592467            Cosopt     57    M     Glaukoma    30.80
1592467            Cosopt     57    M     Glaukoma    30.80
1592467            Coraspin   57    M     Glaukoma    2.40

Fig. 3 – Inserting a prescription into the prescription auditing tool.

The on-line code can be run once the off-line processing is done. To illustrate the effectiveness of the on-line fraud detection tool, consider a prescription given to a 55-year-old woman. Note that the database we work with is in Turkish, so the listings in the on-line tool are in Turkish. She is diagnosed with an upper respiratory tract infection and is given the medicines Sudafed Syrup, Otrivine Pediatric Spray, and Stafine Pomade. After the prescription is entered, the user interface appears as in Fig. 3. If the user chooses to view the prescription, a message box appears as in Fig. 4. After validating the prescription input, the user might choose to add the prescription to the database, in which case the message box in Fig. 5 appears.

When the user chooses to audit the prescription, a message box appears as in Fig. 6. Here, the Medicine and Age non-conformation risk assessments are listed in the input order of the medicines, as are the Medicine and Sex non-conformation risks. The Medicine and Diagnosis risks are shown in the order in which the medicine and diagnosis couples appear in the prescription. Lastly, there is a single value for the Diagnosis and Cost non-conformation risk, since there is only one diagnosis in the prescription.

For this prescription, where the diagnosis is upper respiratory tract infection and the prescribed medicines are Sudafed Syrup, Stafine Pomade and Otrivine Pediatric Spray, the tool correctly calculates no risk in the Medicine and Diagnosis domain for the first and last medicines, and a high risk (0.85) for the second, as expected, since Stafine Pomade is a skin care medicine. There is no risk associated with the sex of the patient and the medicines. However, both Sudafed Syrup and Otrivine Pediatric Spray are pediatric medicines, so the tool identifies high risks with respect to the age of the patient: 0.97 for Sudafed Syrup and 0.99 for Otrivine Pediatric Spray.

Fig. 5 – Database update notification.

Fig. 6 – Risk assessment screen.

5. Concluding remarks and further research direction

We have proposed a novel model for detecting cases of prescription fraud, intended to provide efficient and user-friendly platforms and to save financial resources at the institutional and national levels. Our methodology divides the 6-dimensional feature domain into several 2-dimensional sub-domains, chosen according to the interaction levels between the features. It consists of populating an incidence matrix for each of these sub-domains and then applying a distance-based data-mining approach. The risk metrics employed in this approach return a risk measure for each of the domains, scaled to lie between 0 and 1 so that the risk level has a straightforward interpretation. For each domain, the user can specify a threshold, so that the program raises an alarm only for prescriptions whose risk levels exceed the thresholds.
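The thresholding step can be illustrated with a minimal Python sketch. It takes the risk values reported for the example prescription discussed above (0.85 for Stafine Pomade in the Medicine and Diagnosis domain; 0.97 for Sudafed Syrup and 0.99 for Otrivine Pediatric Spray in the Medicine and Age domain) and flags those exceeding a per-domain threshold. The function name `raise_alarms`, the dictionary layout, the domain labels and the threshold values are all hypothetical; the paper does not specify how thresholds are stored or chosen.

```python
def raise_alarms(risks, thresholds):
    """Return (domain, item, risk) triples whose risk exceeds the domain threshold.

    Hypothetical layout: `risks` maps (domain, item) -> risk scaled to [0, 1];
    `thresholds` maps domain -> user-specified threshold.
    """
    alarms = []
    for (domain, item), risk in risks.items():
        # Risks are assumed to be already scaled to [0, 1], as described in the text.
        if not 0.0 <= risk <= 1.0:
            raise ValueError(f"risk for {item} in {domain} is not scaled to [0, 1]")
        if risk > thresholds[domain]:
            alarms.append((domain, item, risk))
    return alarms


# Risk values reported for the example prescription (Fig. 6).
example_risks = {
    ("Medicine-Diagnosis", "Stafine Pomade"): 0.85,
    ("Medicine-Age", "Sudafed Syrup"): 0.97,
    ("Medicine-Age", "Otrivine Pediatric Spray"): 0.99,
}
# Hypothetical per-domain thresholds chosen by a user (not from the paper).
thresholds = {"Medicine-Diagnosis": 0.8, "Medicine-Age": 0.98}

for domain, item, risk in raise_alarms(example_risks, thresholds):
    print(f"{domain}: {item} ({risk})")
```

With these made-up thresholds the sketch flags Stafine Pomade and Otrivine Pediatric Spray, while Sudafed Syrup (0.97) stays below the stricter 0.98 Medicine and Age threshold: each domain's threshold acts independently of the others.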

The automated fraud detection methodology gives results that are largely consistent with human expert auditing. The system is flexible enough for an integrated on-line/on-time user interface, and its on-line incorporation is computationally inexpensive. It also presents a novel and easy way to keep track of healthcare transactions in incidence matrices for auditing.
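One plausible reading of why on-line incorporation is inexpensive is that, with incidence matrices, recording a new prescription item only increments one cell per 2-dimensional sub-domain. The Python sketch below illustrates this for a single sub-domain. The class name is hypothetical and, in particular, the risk formula is an assumption: it uses a simple frequency-based score (one minus the pair's count relative to the most frequent partner of the same row value), scaled to [0, 1], as a stand-in for the paper's distance-based metric, which is not reproduced here.

```python
from collections import defaultdict


class IncidenceMatrix:
    """Co-occurrence counts for one 2-D sub-domain, e.g. (medicine, diagnosis).

    Hypothetical sketch: risk() is a simple frequency-based stand-in,
    NOT the distance-based risk metric described in the paper.
    """

    def __init__(self):
        # counts[row_value][col_value] -> number of observed prescription items
        self.counts = defaultdict(lambda: defaultdict(int))

    def add(self, row_value, col_value):
        # On-line update: recording one item increments a single cell.
        self.counts[row_value][col_value] += 1

    def risk(self, row_value, col_value):
        """Return a risk in [0, 1] for the pair (row_value, col_value)."""
        cells = self.counts.get(row_value)  # .get() does not create a new row
        if not cells:
            # Assumption: a row value never seen before is treated as maximal risk.
            return 1.0
        most_common = max(cells.values())
        # 0.0 for the most frequent partner, 1.0 for a pair never observed.
        return 1.0 - cells.get(col_value, 0) / most_common


# Toy usage with made-up labels and counts.
matrix = IncidenceMatrix()
for _ in range(3):
    matrix.add("drug_A", "diag_X")
matrix.add("drug_A", "diag_Y")

print(matrix.risk("drug_A", "diag_X"))  # most frequent pairing
print(matrix.risk("drug_A", "diag_Y"))  # rare pairing
print(matrix.risk("drug_A", "diag_Z"))  # never observed
```

The three calls return 0.0, approximately 0.67 and 1.0. Because `add` touches a single cell, new transactions can be absorbed as they arrive, which is in the spirit of the self-learning behaviour described in the text.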

The approach proposed here is able to handle both categorical and ordered features. The output of the system is easy for human users to understand and interpret. Moreover, the system can learn and adapt as the input data shift. Finally, its core methodology can be adopted in many other areas of healthcare and possibly in other industries.

Given the performance measurements, with a true positive rate of 77.4% and a false positive rate of 6%, we conclude that the proposed system works reasonably well for the prescription fraud detection problem. Nonetheless, further refinement of the tool would require scaling the risk outputs across all domains, so that incorporating different parameters for different domains would still yield risk measurements on the same scale across all domains. In addition, a tool could be built in which the user specifies the domains he wants to work on. Efforts must be undertaken to promote cost-effective fraud detection models for other healthcare practices and interventions that may have an impact on the quality of health care.
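The indicators quoted above are defined in Table 3. The minimal Python sketch below computes them from confusion-matrix counts exactly as Table 3 defines them: the false positive and false negative rates are divided by the total number of instances, whereas the true positive rate is divided by the number of real positives. These denominators differ from the more common convention, in which the false positive rate is FP / (FP + TN). The function name and the counts in the usage example are hypothetical and are not the paper's data.

```python
def performance_indicators(tp, fp, tn, fn):
    """Compute the indicators of Table 3 from confusion-matrix counts.

    Following Table 3, the false positive and false negative rates are
    divided by the total number of instances, while the true positive
    rate is divided by the number of real positives (tp + fn).
    """
    total = tp + fp + tn + fn
    return {
        "false_positive_rate": fp / total,
        "false_negative_rate": fn / total,
        "true_positive_rate": tp / (tp + fn),
        "accuracy": (tp + tn) / total,  # agreement rate
    }


# Hypothetical counts for illustration only (not the paper's data).
rates = performance_indicators(tp=8, fp=1, tn=9, fn=2)
for name, value in rates.items():
    print(f"{name}: {value:.2%}")
```

For these made-up counts the sketch prints rates of 5.00%, 10.00%, 80.00% and 85.00%.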

Conflicts of interest

There is no undisclosed ethical problem or conflict of interest related to this paper.

Acknowledgements

We thank Cagdas Baran for assistance in preparing the types of fraud and abuse in medical practice, and Murat Kurtcephe for various discussions about the topic of this paper.

r

e

f

e

r

e

n

c

e

s

[1] M.Levi,M.Burrows,Measuringtheimpactoffraudinthe UK:aconceptualandempiricaljourney,BritishJournalof Criminology48(3)(2008)293–318.

[2] A.S.Kesselheim,D.M.Studdert,M.M.Mello,

Whistle-blowers’experiencesinfraudlitigationagainst pharmaceuticalcompanies,NewEnglandJournalof Medicine362(19)(2010)1832–1839.

[3] R.Wheeler,S.Aitken,Multiplealgorithmsforfraud detection,Knowledge-BasedSystems13(2–3)(2000)93–99. [4] S.Viaene,R.A.Derrig,B.Baesens,G.Dedene,Acomparison

ofstate-of-the-artclassificationtechniquesforexpert automobileinsuranceclaimfrauddetection,JournalofRisk andInsurance69(3)(2002)373–421.

[5] C.S.Hilas,P.A.Mastorocostas,Anapplicationofsupervised andunsupervisedlearningapproachesto

telecommunicationsfrauddetection,Knowledge-Based Systems21(7)(2008)721–726.

[6] TurkishHealthCareSyndicate2008HealthCareReport,2008 (Sa ˘glıkta2008Raporu,TürkSa ˘glıkSen).

[7] J.Li,K.Huang,J.Jin,J.Shi,Asurveyonstatisticalmethods forhealthcarefrauddetection,JournalofHealthCare ManagementScience11(3)(2008)275–287.

[8] USA’sNationalHealthCareAnti-FraudAssociationWeb Page,2009,http://www.nhcaa.org/eweb/StartPage.aspx. [9] X.Weng,J.Shen,Detectingoutliersamplesinmultivariate

timeseriesdataset,Knowledge-BasedSystems21(8)(2008) 807–812.

(10)

[10] H.Kim,S.Pang,H.Je,D.Kim,S.Bang,Constructingsupport vectormachineensemble,PatternRecognition36(2003) 2757–2767.

[11] E.Barse,H.Kvarnstrom,E.Jonsson,Synthesizingtestdata forfrauddetectionsystems,in:Proceedingsofthe19th AnnualComputerSecurityApplicationsConference,2003, pp.384–395.

[12] M.Syeda,Y.Zhang,Y.Pan,Parallelgranularneuralnetworks forfastcreditcardfrauddetection,in:Proceedingsofthe 2002IEEEInternationalConferenceonFuzzySystems,2002. [13] R.Ghosh,D.Reilly,Creditcardfrauddetectionwitha

neural-network,in:ProceedingsoftheTwenty-Seventh AnnualHawaiiInternationalConferenceonSystem Sciences,1994.

[14] S.Maes,K.Tuyls,B.Vanschoenwinkel,B.Manderick,Credit cardfrauddetectionusingBayesianandneuralnetworks, in:Proceedingsofthe1stInternationalNAISOCongresson NeuroFuzzyTechnologies,2002.

[15] K.Ezawa,S.Norton,ConstructingBayesiannetworksto predictuncollectibletelecommunicationsaccounts,IEEE Expert11(5)(1996)45–51.

[16] G.Metan,I.Sabuncuoglu,H.Pierreval,Realtimeselectionof schedulingrulesandknowledgeextractionviadynamically controlleddatamining,InternationalJournalofProduction Research48(23)(2010)6909–6938.

[17] D.Foster,R.Stine,Variableselectionindatamining:building apredictivemodelforbankruptcy,JournalofAmerican StatisticalAssociation99(466)(2004)303–313.

[18] E.Belhadji,G.Dionne,F.Tarkhani,Amodelforthedetection ofinsurancefraud,TheGenevaPapersonRiskand

Insurance25(4)(2000)517–538.

[19] M.Pejic-Bach,Profilingintelligentsystemsapplicationsin frauddetectionandprevention:surveyofresearcharticles, in:ProceedingsofInternationalConferenceonIntelligent Systems,ModellingandSimulation,2010,pp.80–85. [20] C.Cortes,D.Pregibon,Signature-basedmethodsfordata

streams,DataMiningandKnowledgeDiscovery5(2001) 167–182.

[21] K.Yamanishi,J.Takeuchi,G.Williams,P.Milne,On-line unsupervisedoutlierdetectionusingfinitemixtureswith discountinglearningalgorithms,DataMiningand KnowledgeDiscovery8(2004)275–300.

[22] P.L.Brockett,R.A.Derrig,L.L.Golden,A.Levine,M.Alpert, Fraudclassificationusingprincipalcomponentanalysisof RIDITs,JournalofRiskandInsurance69(3)(2002)341–371. [23] C.L.Chan,C.H.Lan,Adataminingtechniquecombining

fuzzysetstheoryandBayesianclassifier– anapplicationof auditingthehealthinsurancefee,in:H.R.Arabnia(Ed.), ProceedingsoftheInternationalConferenceonArtificial IntelligenceIC-AI’2001,2001,pp.402–408.

[24] M.Kim,T.Kim,Aneuralclassifierwithfrauddensitymap foreffectivecreditcardfrauddetection,in:Proceedingsof IDEAL2002,2002,pp.378–383.

[25] H.He,W.Graco,X.Yao,Applicationofgeneticalgorithms andk-nearestneighbourmethodinmedicalfrauddetection, in:ProceedingsofSEAL1998,1999,pp.74–81.

[26] C.Cortes,D.Pregibon,C.Volinsky,Computationalmethods fordynamicgraphs,JournalofComputationalandGraphical Statistics12(4)(2003)950–970.

[27] M.Cahill,F.Chen,D.Lambert,J.Pinheiro,D.Sun,Detecting fraudintherealworld,in:HandbookofMassiveDatasets, 2002,pp.911–930.

[28] Y.Moreau,E.Lerouge,H.Verrelst,J.Vandewalle,C. Stormann,P.Burge,BRUTUS:ahybridsystemforfraud detectioninmobilecommunications,in:Proceedingsof EuropeanSymposiumonArtificialNeuralNetworks,1999, pp.447–454.

[29] G.Williams,Z.Huang,Miningtheknowledgemine:thehot spotsmethodologyformininglargerealworlddatabases, LectureNotesinComputerScience(1997)340–348. [30] G.Williams,Evolutionaryhotspotsdatamining:an

architectureforexploringforinterestingdiscoveries,in: ProceedingsofPAKDD99,1999.

[31] R.Brause,T.Langsdorf,M.Hepp,Neuraldataminingfor creditcardfrauddetection,in:Proceedingsof11thIEEE InternationalConferenceonToolswithArtificial Intelligence,1999.

[32] T.Ormerod,N.Morley,L.Ball,C.Langley,C.Spenser,Using ethnographytodesignaMassDetectionTool(MDT)forthe earlydiscoveryofinsurancefraud,in:CHI’03Extended AbstractsonHumanFactorsinComputingSystems,2003, pp.650–651.

[33] P.Ortega,C.Figueroa,G.Ru,Amedicalclaimfraud/abuse detectionsystembasedondatamining:acasestudyin Chile,in:ProceedingsofDMIN’06,2006,pp.224–231. [34] S.H.Huang,L.R.Wulsin,L.Hua,J.Guo,Dimensionality

reductionforknowledgediscoveryinmedicalclaims database:applicationtoantidepressantmedication utilizationstudy,ComputerMethodsandProgramsin Biomedicine93(2)(2009)115–123.

[35] T.Aydın,H.A.Güvenir,Modelinginterestingnessof streamingassociationrulesasabenefit-maximizing classificationproblem,KnowledgeBasedSystems37(2) (2009)1713–1718.

[36] A.J.Major,D.R.Riedinger,EFD:ahybrid

knowledge/statistical-basedsystemforthedetectionof fraud,JournalofRiskandInsurance69(3)(2002)309–324.

