Contents lists available at ScienceDirect
Journal of Neuroscience Methods
journal homepage: www.elsevier.com/locate/jneumeth
Clustering fMRI data with a robust unsupervised learning algorithm for neuroscience data mining
Hadeel K. Aljobouri a,b,∗, Hussain A. Jaber a, Orhan M. Koçak c, Oktay Algin d,e, Ilyas Çankaya a

a Electrical and Electronics Engineering Department, Graduate School of Natural Science, Ankara Yıldırım Beyazıt University, Ankara, Turkey
b Biomedical Engineering Department, College of Engineering, Al-Nahrain University, Baghdad, Iraq
c Psychiatry Department, School of Medicine, Kırıkkale University, Kırıkkale, Turkey
d Department of Radiology, Ataturk Training and Research Hospital, Ankara Yıldırım Beyazıt University, Ankara, Turkey
e National MR Research Center, Bilkent University, Ankara, Turkey
highlights

• A novel application of a robust unsupervised learning approach is proposed in the current study. The robust growing neural gas (RGNG) algorithm, which has not previously been used for this purpose or in any other medical application, was applied to fMRI data and compared with the growing neural gas (GNG) algorithm.
• The learning algorithms proposed in the current study are applied to real, freely available auditory fMRI datasets.
• Another comparison was conducted with the model-based (hypothesis) data analysis approach using the statistical parametric mapping (SPM) package, which is based on the general linear model.
• The fMRI result obtained by running RGNG was within the expected outcome and is similar to that found with the hypothesis method in detecting active areas within the expected auditory cortices.
• Results show that the fMRI application of the presented RGNG approach is superior to the other approaches in terms of its insensitivity to different initializations and to the presence of outliers, as well as its ability to determine the actual number of clusters successfully, as indicated by its performance measured by minimum description length (MDL) and receiver operating characteristic (ROC) analysis.
article info

Article history:
Received 17 December 2017
Received in revised form 13 February 2018
Accepted 14 February 2018
Available online 20 February 2018

Keywords:
Clustering technique
Data mining
Growing neural gas (GNG)
Robust growing neural gas (RGNG)
abstract

Background: Clustering approaches used in functional magnetic resonance imaging (fMRI) research use brain activity to divide the brain into various parcels with some degree of homogeneous characteristics, but choosing the appropriate clustering algorithm remains a problem.
New method: A novel application of a robust unsupervised learning approach is proposed in the current study. The robust growing neural gas (RGNG) algorithm, which has not previously been used for this purpose or in any other medical application, was applied to fMRI data and compared with the growing neural gas (GNG) algorithm. The learning algorithms proposed in the current study are applied to real, freely available auditory fMRI datasets.
Results: The fMRI result obtained by running RGNG was within the expected outcome and is similar to that found with the hypothesis method in detecting active areas within the expected auditory cortices.
Comparison with existing method(s): The fMRI application of the presented RGNG approach is superior to the other approaches in terms of its insensitivity to different initializations and to the presence of outliers, as well as its ability to determine the actual number of clusters successfully, as indicated by its performance measured by minimum description length (MDL) and receiver operating characteristic (ROC) analysis.
Conclusions: The RGNG can detect the active zones in the brain, analyze brain function, and determine the optimal number of underlying clusters in fMRI datasets. This algorithm can define the positions of the center of an output cluster corresponding to the minimal MDL value.
© 2018 Elsevier B.V. All rights reserved.
∗ Corresponding author at: Biomedical Engineering Department, College of Engineering, Al-Nahrain University, Baghdad, Iraq.
E-mail address: hadeelbme77@eng.nahrainuniv.edu.iq (H.K. Aljobouri).
https://doi.org/10.1016/j.jneumeth.2018.02.007
0165-0270

1. Introduction

Functional magnetic resonance imaging (fMRI) is a powerful tool used by neuroscientists to examine brain activity by measuring the level of oxygen in the blood. The blood oxygenation level dependent (BOLD) signal represents the ratio of oxygenated to deoxygenated hemoglobin in the blood and is closely related to neural activity. fMRI relies on metabolic function to measure neural activity because it captures the hemodynamic response function (HRF), or metabolic demand (oxygen consumption), in the brain or spinal cord (Aljobouri et al., 2015).
fMRI is used to understand the neuronal mechanisms behind many disorders, such as bipolar disorder, schizophrenia, Parkinson's disease, autism spectrum disorders, and Alzheimer's disease.
The fMRI dataset is acquired from the scanner as raw data in the form of sequences of 3D images that capture the variations of voxel intensities over time. Under different experimental conditions, the acquired fMRI data are a combination of BOLD signal changes and noise or artifacts. These artifacts are attributed to the hardware (the MRI scanner itself), to the individuals themselves (e.g., head motion), or to physiological effects.
Clustering techniques in fMRI research are considered model-free or exploratory data analysis approaches. These techniques can define the active zones and find structures in the brain and in fMRI data competently without the need for prior knowledge about activation patterns or experiments. However, choosing the appropriate clustering algorithm remains a problem. Independent component analysis (ICA) and principal component analysis (PCA) are regarded as fine methods to separate fMRI signals into a group of defined components. These algorithms cannot easily predict occurrences during acquisition and have limitations in terms of independence and orthogonality, respectively (Korczak, 2012). Various clustering algorithms are therefore applied to fMRI data mining instead of these methods. The classical clustering methods include K-means, fuzzy classification, hierarchical classification, Linde–Buzo–Gray (LBG), clustering using representatives (CURE), the neural models Kohonen's self-organizing map (SOM) and neural gas (NG), and Fritzke's growing neural gas (GNG). However, one of the main problems of fMRI clustering algorithms is that the number of clusters must be decided as an input (Dimitriadou et al., 2004; Wismuller et al., 2004). Results with a high level of interpretability have been obtained using clustering approaches, but these approaches are associated with a high cost in terms of computing time and memory space (Bock and Diday, 2000; Lindquist, 2008; Goutte et al., 1999; Baumgartner et al., 1998; Liao et al., 2008; Katwal, 2011; Pereira et al., 2009).
The GNG algorithm exhibits the best clustering performance together with acceptable robustness; however, it is sensitive to initialization (the choice of the initial set of neuron vectors), to the order of the input vectors, and to the existence of many outliers (Qin and Suganthan, 2004). Therefore, a novel application that relies on the robust growing neural gas (RGNG) algorithm, which has not yet been used for this purpose, is proposed to detect the active zones in the brain from fMRI datasets. This algorithm was compared with the GNG algorithm.
RGNG is proposed to identify activated regions in the brain from various fMRI datasets and, unlike other clustering approaches, has several important features. The RGNG network is robust in several respects: it is insensitive to initialization, input sequence ordering, and outliers; it determines the optimal number of underlying clusters during the different growth stages; and it deals with multimodal datasets effectively.
The use of RGNG with fMRI datasets is the first attempt of its kind in the literature. The current study is organized as follows: Section 2 reviews the most important packages used for fMRI data analysis in comparison with clustering, and especially with the proposed RGNG approach. Section 3 describes the proposed work and the algorithms using simple flowcharts and tables. Section 4 describes the preprocessing and the performance measures. Section 5 presents the experimental results. Finally, Section 6 concludes the paper and introduces future research directions.
2. fMRI data analysis techniques

fMRI data analysis methods can be divided mainly into two categories, namely, model-driven or model-based (hypothesis) approaches and data-driven or model-free (exploratory) approaches. The model-driven methods deal with definite activation patterns, response functions, or experiments. These methods require prior knowledge and statistically test the analyzed data for the presence or absence of a response. The methods in this category differ in either the statistical method or the signal estimation procedure used to detect the activation. An example is the commonly used general linear model (GLM), which is the most fundamental and basic approach to fMRI data analysis and is used with statistical parametric mapping (SPM) (SPM, 1991).
Data-driven methods, in contrast, can consider all of the voxels simultaneously and can define the active zones and find structures in the brain and in fMRI data competently without prior knowledge about activation patterns or experimental paradigms. These methods can be divided mainly into two groups, namely, blind source separation (BSS) and clustering approaches.
BSS attempts to find unobserved signals or “sources” from several observed mixtures and to generate a model of the data. Various methods are used for BSS: PCA (Friston et al., 1993; Friston et al., 1996), ICA (Hyvarinen et al., 2001; McKeown et al., 1998; Calhoun et al., 2001; Mckeown, 2000), and canonical correlation analysis (CCA) (Friman et al., 2002) are used to separate these mixtures to obtain the source signals. The FMRIB Software Library (FSL) package (The Analysis Group, 2012) uses MELODIC ICA, which is a data-driven (model-free) approach but is insufficient for most fMRI datasets because ICA has some limitations. ICA attempts to find maximally independent maps and splits wide activation areas into a number of maps, which have a strong correlation between the time courses (TCs) of different components. The independent components (ICs) from the ICA decomposition are unordered; this feature is associated with model order selection for linear model-based region extraction, which remains an open problem. Thus, determining whether or not the ICs are correlated with nonlinear activation is difficult.
Clustering analysis (Chen et al., 2006; Seghier et al., 2007) groups voxels whose TC signals show a similar hemodynamic response (HDR) over time. The advantages of the proposed RGNG clustering algorithm are explained below. This approach is a data-driven or model-free (exploratory) approach. Table 1 compares the statistical, transformation, and clustering methods.
Clustering analysis is widely used for fMRI data processing to detect the active areas of the brain effectively. In the following paragraphs, data mining based on the GNG network is reviewed.
Lachiche et al. (2005) introduced a new interactive data mining approach to fMRI images, which has not been used for the purpose of the current study, and showed that GNG successfully recognized the active areas in fMRI images of the brain (Lachiche et al., 2005). The idea of defining a distance between voxels of fMRI images was argued, and this distance is proposed to be based on the signal only.
Korczak (2007) introduced a new interactive data mining technique for fMRI images to observe cerebral activity; this technique is based on a data-driven approach (Korczak, 2007). Different unsupervised clustering algorithms were presented, developed, and tested on sequences of fMRI images. Five clustering algorithms (GNG, SOM, LBG, K-means, and CURE) were applied to synthetic [...] The performance of the GNG algorithm was the best among all the clustering methods, with acceptable robustness.

Table 1
Comparison among statistical, transformation, and clustering methods.

Statistical method
  fMRI analysis methods: Model-driven / model-based / hypothesis. Used with the SPM package and based on GLM.
  Approach properties: The most fundamental, basic, and commonly used approach for fMRI data analysis, but it needs prior knowledge about activation patterns or experiments.

Transformation method
  fMRI analysis methods: Data-driven / model-free / exploratory. Used with the FSL package and based on MELODIC IC analysis.
  Approach properties: It is based on linear mixing and is unordered. Thus, it must deal with independent data.

Clustering method
  fMRI analysis methods: Data-driven / model-free / exploratory. RGNG, which was used in this study, is an example.
  Approach properties: It can define active zones and identify structures in the brain and in fMRI data competently without the need for prior knowledge about activation patterns or experiments.
Heydar et al. (2009) developed a GNG-based algorithm that can determine the optimal number of clusters automatically (Heydar et al., 2009). The experiments applied the proposed algorithm, which is an improved version of the GNG algorithm, to artificial and real fMRI datasets. They compared the Jaccard coefficient of the proposed algorithm with those of some well-known clustering algorithms, such as K-means, NG, GNG, and fuzzy C-means (FCM); the results showed that the proposed algorithm outperformed the other algorithms.
The GNG was derived from the NG algorithm by Fritzke (Fritzke, 1995; Fritzke, 1997), and the RGNG algorithm was introduced by Qin and Suganthan (2004) within the GNG structure. The work presented in this paper, which uses RGNG with fMRI data, is the first attempt of its kind in the literature. The current study shows how to apply RGNG to real, freely available auditory fMRI datasets.
GNG and RGNG, both artificial neural network approaches based on unsupervised clustering for fMRI analysis, are compared in Table 2. This table presents the researchers who introduced these approaches, the researchers who used these approaches in fMRI research, and the advantages and limitations of each approach (AlJobouri et al., 2017).
3. Methodology and proposed work

The GNG algorithm is reviewed before the proposed RGNG algorithm for fMRI data is introduced. The GNG and RGNG algorithms are extensive and complex. Thus, flowcharts and a mathematical model were developed for convenience and to make the related code easier to write.
3.1. GNG algorithm

The GNG algorithm was developed by Fritzke (1995, 1997), who proposed changing the number of units (mostly increasing it) in a SOM network with a variable topological structure. The GNG is a growing soft competitive learning algorithm that combines the topology formation rules of competitive Hebbian learning (Martinetz and Schulten, 1991) with growing cell structures (Fritzke, 1994) into a new model.
Before the GNG algorithm is run, its parameters must be defined. In the subsequent experiments, the parameter settings are fixed for each algorithm, with typical values proposed in the literature. The GNG algorithm was set with typical values as in (Fritzke, 1997): $\varepsilon_b = 0.05$, $\varepsilon_n = 0.006$, $\alpha_{max} = 100$, $\beta = 0.0005$, and $\lambda = 300$.
Fig. 1 presents the flowchart of the GNG algorithm and shows that inactive neurons, which do not win during a long time interval, may be detected by tracing the changes of an age variable associated with each edge. The proposed flowchart can be summarized as follows. The GNG starts with a minimal network size, and a small number of new neurons and connections are inserted into a growing structure using vector quantization until the desired quality of the model is achieved (e.g., net size, time limit, predefined number of inserted neurons, or some performance measure).
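As a rough illustration of the adaptation step summarized above, the following Python sketch follows the commonly described GNG update (find the winner and runner-up, move the winner and its neighbors, age and prune edges). It is a minimal sketch under stated assumptions, not the authors' implementation: the function names and data layout are hypothetical, the learning rates and maximum age are the values quoted in the text, and the periodic insertion of new neurons and the decay of accumulated errors are omitted.

```python
# Minimal, assumption-laden sketch of one GNG adaptation step.
# Names and data layout are illustrative; node insertion and error decay are omitted.

EPS_B = 0.05    # learning rate of the winning neuron (value quoted in the text)
EPS_N = 0.006   # learning rate of its topological neighbors (value quoted in the text)
AGE_MAX = 100   # edges older than this are removed (value quoted in the text)


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def gng_adapt(nodes, errors, edges, x):
    """Adapt the network to one input vector x.

    nodes:  dict node_id -> reference vector (list of floats)
    errors: dict node_id -> accumulated squared error
    edges:  dict frozenset({i, j}) -> age of the edge
    Returns the ids of the winner and the runner-up.
    """
    # Winner s1 and runner-up s2 are the two nodes closest to x.
    ranked = sorted(nodes, key=lambda i: sq_dist(nodes[i], x))
    s1, s2 = ranked[0], ranked[1]

    # Accumulate the winner's error (uses its position before the move).
    errors[s1] += sq_dist(nodes[s1], x)

    # Move the winner toward x.
    nodes[s1] = [w + EPS_B * (xi - w) for w, xi in zip(nodes[s1], x)]

    # Age every edge leaving the winner and move its neighbors toward x.
    for edge in list(edges):
        if s1 in edge:
            edges[edge] += 1
            (n,) = edge - {s1}
            nodes[n] = [w + EPS_N * (xi - w) for w, xi in zip(nodes[n], x)]

    # Connect winner and runner-up (or refresh the existing edge).
    edges[frozenset((s1, s2))] = 0

    # Remove edges that are too old, then neurons left without any edge.
    for edge in [e for e, age in edges.items() if age > AGE_MAX]:
        del edges[edge]
    connected = set().union(*edges)
    for i in [i for i in nodes if i not in connected]:
        del nodes[i]
        del errors[i]
    return s1, s2
```

In this sketch a neuron that never wins keeps aging out of the graph only through its edges, which is the mechanism the flowchart uses to detect inactive neurons.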
3.2. Robust growing neural gas (RGNG) algorithm

The “dead node” problem occurs in the GNG algorithm because of its growth scheme. Dead nodes arise from inappropriate initializations, which cause some prototypes to never win during the training process. Even with initialization-insensitive clustering methods, good clustering results may not be obtained if the order of the input sequence is not chosen properly.
Aside from the sensitivity to initialization and to the order of the input vectors, other problems arise from the presence and position of outliers. When many outliers exist in a dataset, the GNG network may fail to differentiate the outliers from the inliers through the original prototype updating rule.
Because of these limitations of the GNG algorithm, a novel RGNG was presented within the GNG structure (Qin and Suganthan, 2004). The RGNG is more robust to initialization, to input vector ordering, and to the existence and position of outliers, and it can find the optimal number of neurons dynamically at run time.
Fig. 2 presents the flowchart of the RGNG algorithm. Before the RGNG algorithm is run, its parameters must be defined. In the subsequent experiments, the parameter settings were fixed for each algorithm, with typical values proposed in the literature. The RGNG algorithm was set with typical values as in [12]: $\varepsilon_{bi} = 0.1$, $\varepsilon_{bf} = 0.01$, $\varepsilon_{ni} = 0.005$, $\varepsilon_{nf} = 0.0005$, $\alpha_{max} = 100$, $k = 1.3$, and $\eta = 1 \times 10^{-4}$. The proposed flowchart can be summarized in the following steps.

Table 2
Comparison between two artificial neural network approaches based on unsupervised clustering for fMRI analysis.

GNG
  Introduced by: Fritzke (1995)
  Used with fMRI: Lachiche et al. (2005); Korczak (2007); Heydar et al. (2009)
  Advantages:
  • Its ability to modify the network topology by removing edges using their age variable
  • The neighborhood sorting step is unnecessary
  • It can find a network size and structure automatically, continue learning, and add units and connections until a performance criterion is fulfilled
  • The number of classes is not fixed in advance as in most clustering algorithms
  Limitations: Its sensitivity to:
  • Initialization
  • The order of input vectors
  • Existence of many outliers

RGNG
  Introduced by: Qin and Suganthan (2004)
  Used with fMRI: It was not previously proposed
  Advantages:
  • Insensitive to initialization, input sequence ordering, and the presence of outliers during different growth stages
  • Can automatically determine the optimal number of clusters
  • Deals with multimodal datasets effectively

Fig. 2. Flowchart design of the RGNG algorithm.
For each reference vector $w_i$, $i = 1, 2, \ldots, c$, a series of edges emanates from its location to join it with its direct topological neighbors. Similar to the GNG, the RGNG algorithm starts in step 1 with the initialization of a few prototype vectors (usually two), $W = \{w_1, w_2\}$.
In fMRI, $W$ represents the TCs of the fMRI dataset (see Fig. 3), $w_i$ denotes the TC of exemplar $i$, and $w_c$ is the TC of the closest exemplar $c$. The prototype vectors $w_1, w_2$ are chosen randomly as reference vectors from the TCs of all voxels $P(x)$, and a data voxel $x$ is generated as an input signal from the fMRI dataset used for training, $X = \{x_1, x_2, \ldots, x_N\}$.
The maximum number of neurons to grow is defined as prenumnode, and the maximum predefined number of training epochs during each growth stage with a certain prototype number is defined as Max iter. The initial or current training epoch number is set as $m = 0$, and the iteration point within training epoch (task period) $m$ is set as $t = 0$. Thus, the full iteration step $iter$ over each growth step is expressed as:

$$iter = m \cdot N + t, \tag{1}$$

where $N$ is the length of the fMRI TC.
Clustering algorithms attempt to classify the TC signals of the voxels into different groups according to their similarity. The temporal information is ordered in clusters and is independent of its spatial neighborhood. These clusters are described by an average TC or a cluster center obtained by averaging all of the TCs of the cluster. The fMRI data are transformed into a TC of voxel intensity variations relative to its average, as follows:

$$I^{x}_{av} = \frac{1}{N} \sum_{i=1}^{N} I^{x}_{i}, \tag{2}$$

where $I^{x}_{av}$ is the average intensity of voxel $x$ over a series of $N$ images;

$$W_i = I^{x}_{av} - I^{x}_{i}. \tag{3}$$

The distance between two fMRI signals $W_a$ and $W_b$ may be computed as a Euclidean distance:

$$d_E = \sqrt{\sum_{i=1}^{N} \left(W_{ai} - W_{bi}\right)^2}. \tag{4}$$
The activity level of the dataset is generally based on the distance between the input vectors $x$ and all of the exemplar TCs $W_i$. In RGNG, the smallest Euclidean distance $\lVert x - w_i \rVert$ can be used to define the best matching node.
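The transformation of Eqs. (2) and (3), the distance of Eq. (4), and the best-matching-node search can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, a voxel TC is assumed to be a plain list of intensities, and the sign convention of Eq. (3) (average minus sample) is taken from the text as printed.

```python
import math


def variation_tc(intensities):
    """Eqs. (2)-(3): deviation of each sample from the voxel's temporal mean."""
    mean = sum(intensities) / len(intensities)   # Eq. (2)
    return [mean - value for value in intensities]  # Eq. (3)


def euclidean(wa, wb):
    """Eq. (4): Euclidean distance between two time courses of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(wa, wb)))


def best_matching_node(x, prototypes):
    """Index of the prototype TC with the smallest Euclidean distance to x."""
    return min(range(len(prototypes)), key=lambda i: euclidean(x, prototypes[i]))
```

For example, a voxel with intensities `[1.0, 2.0, 3.0]` has mean 2.0, so its variation TC is `[1.0, 0.0, -1.0]`, and it is matched to whichever prototype TC lies closest to that vector.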
The RGNG algorithm uses the MDL value as the clustering validity index: the optimal number of clusters and their center positions are those corresponding to the smallest MDL value. Thus, the optimal number of clusters is determined automatically by searching for the minimum of the MDL measure throughout the network-growing process. The RGNG approach has the smallest MDL value recorded with respect to the GNG combined with the MDL principle; the approach can find the optimal number of clusters and their center positions corresponding to the smallest MDL value.

Fig. 3. Proposed data mining system architecture. (a) Main block diagram. (b) Experimental paradigm “silence” and “talk”.
4. Simulation design

The block diagram shows the process of brain function data analysis performed in the current study. The process is composed of five stages (see Fig. 3a):
• preprocessing of the raw data;
• clustering voxels together based on the similarity of their intensity profiles in the TCs of the image;
• overlay with the structural image;
• visual fMRI image;
• validation.
4.1. Image spatial preprocessing

The experiments in the present work were performed in MATLAB 2016a, with the SPM12 package used for the preprocessing stage. Various noise factors interfere with the fMRI signals of interest, and the subject is typically never completely motionless. Thus, the preprocessing steps must be adapted to each identified artifact before the clustering phase. The present work used SPM for the spatial preprocessing of the auditory fMRI data. The fMRI dataset is preprocessed by applying the following steps:
• Realignment;
• Coregistration;
• Segmentation;
• Normalization;
• Smoothing using FWHM = 6 mm.
The functional images were reoriented to MNI space, a standard brain formed from a large series of MRI scans of normal controls and developed at the Montreal Neurological Institute. Then the functional raw data were realigned to correct for head movements. The high-resolution anatomical T1 images were coregistered with the realigned functional images to enable anatomical localization of the activations. The segmentation process is not mandatory. SPM12 uses MNI template images, which are the most common templates used for fMRI spatial normalization. In this step, the anatomical and functional images were spatially normalized into MNI space. Finally, the functional raw data were spatially smoothed with a Gaussian smoothing kernel with a full width at half maximum (FWHM) of 6 mm.
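For readers who reproduce the smoothing step outside SPM, the FWHM of a Gaussian kernel is related to its standard deviation by $\sigma = \mathrm{FWHM} / (2\sqrt{2\ln 2})$. The short Python sketch below shows this conversion; the function name and the 3 mm voxel size passed in the usage note are assumptions made for illustration (the voxel size of the normalized images is not stated in the text), and the sketch does not reproduce SPM's own smoothing code.

```python
import math


def fwhm_to_sigma(fwhm, voxel_size=1.0):
    """Convert a Gaussian FWHM to a standard deviation.

    fwhm and voxel_size must share the same unit (e.g., mm); with the default
    voxel_size of 1.0 the result is in that unit, otherwise it is in voxels.
    """
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return sigma / voxel_size
```

With FWHM = 6 mm this gives a standard deviation of about 2.55 mm, or about 0.85 voxels if the voxels were 3 mm wide.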
4.2. fMRI dataset

The quantitative performance assessment uses an auditory fMRI dataset. The auditory data are composed of whole-brain BOLD/EPI images acquired on a modified 2T Siemens MAGNETOM Vision system (John et al., 2013). Each acquisition consists of 64 contiguous slices (64 × 64 × 64, 3 × 3 × 3 mm voxels). Each acquisition took 6.05 s, with the scan-to-scan repetition time (TR) set arbitrarily to 7 s. A total of 96 acquisitions were made from a single subject in blocks of 6 scans (all acquired during the same condition, either stimulation or rest), yielding 16 blocks of 42 s each.
The experimental paradigm for successive blocks alternated between rest and auditory stimulation, starting with rest (see Fig. 3B). The functional data start at acquisition 4, functional image (fM4). Auditory stimulation was composed of bisyllabic words (e.g., “mother,” “house,” “weather,” and “movie”) presented binaurally at a rate of 60/min. The first few scans must be discarded because of T1 effects (the scans did not include a “dummy” lead-in).
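The block paradigm described above (96 scans in blocks of 6, alternating rest and stimulation and starting with rest, with TR = 7 s) can be written as a simple label vector. The Python sketch below is illustrative only: the function name and the 0/1 coding are assumptions, and the discarding of the initial scans is not modeled.

```python
TR_SECONDS = 7        # scan-to-scan repetition time quoted in the text
SCANS_PER_BLOCK = 6   # scans per block quoted in the text


def block_paradigm(n_scans=96, block_len=SCANS_PER_BLOCK, first="rest"):
    """Return one label per scan: 0 for rest, 1 for auditory stimulation."""
    labels = []
    for scan in range(n_scans):
        block = scan // block_len
        # Even-numbered blocks carry the condition named in `first`.
        is_rest = (block % 2 == 0) == (first == "rest")
        labels.append(0 if is_rest else 1)
    return labels
```

With the default arguments the vector holds 16 blocks, half rest and half stimulation, and each block lasts 6 × 7 = 42 s, matching the figures given in the text.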
4.3. Performance measure

The novel application of the RGNG algorithm was compared with GNG and SPM using two performance measures, namely, the MDL value and receiver operating characteristic (ROC) analysis.
The MDL value is one of the well-known information theory evaluation measures and is used in the RGNG algorithm as the clustering validity index (Rissanen, 1983). The average MDL values during the growth stages have been plotted versus the length [...] approaches combined with the MDL criterion; the length of the fMRI TC is selected randomly as N = 16. Each detected cluster number corresponds to an MDL value. The RGNG approach has the smallest MDL value recorded with respect to GNG combined with the MDL principle; thus, it can successfully determine the actual number of clusters.

Fig. 4. MDL values versus N.
Fig. 5. ROC curve analyses of the auditory fMRI dataset.
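The selection rule described here reduces to taking the cluster number with the smallest recorded MDL value across the growth stages. The Python sketch below shows only that selection step; the function name is hypothetical, the MDL formula itself is not given in this text and is therefore not implemented, and the numbers in the usage note are made-up placeholders rather than results from the study.

```python
def select_by_mdl(mdl_by_cluster_count):
    """Return the cluster count whose recorded MDL value is smallest.

    mdl_by_cluster_count: dict mapping a cluster count (one per growth
    stage) to the MDL value recorded at that stage.
    """
    return min(mdl_by_cluster_count, key=mdl_by_cluster_count.get)
```

For instance, with the purely illustrative values `{2: 510.0, 3: 470.5, 4: 455.2, 5: 462.9}` the function returns 4, the growth stage at which the placeholder MDL curve reaches its minimum.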
The ROC analysis is another index of the performance of RGNG in comparison with SPM (Skudlarski et al., 1999). The ROC is well known in medical imaging and machine learning applications; the ROC space consists of the false positive ratio (FPR) on the x-axis and the true positive ratio (TPR) on the y-axis (Sun and Xu, 2014). The good classifier space is indicated by a high TPR and a low FPR, whereas the bad classifier space is indicated by a low TPR and a high FPR.
The curves in Fig. 5 generally indicate that the two methods work as good classifiers, with a high TPR and a low FPR. The RGNG method can detect real activations at the same FPR.
In fMRI, the FPR is calculated by dividing the number of misclassified inactivated voxels by the total number of voxels considered, whereas the TPR is calculated by dividing the number of correct classifications of activated voxels by the total number of voxels considered (Lange et al., 1999). In the same situation, the ROC curves for the RGNG and SPM methods are compared, as shown in Fig. 5.

Fig. 6. Active areas in the brain auditory cortex area within the SPM package.
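One point of an ROC curve can be computed from a reference activation map and a detected activation map as sketched below in Python. This is a hedged illustration, not the authors' evaluation code: the function name is hypothetical, both maps are assumed to be flat sequences of 0/1 labels, and "the total number of voxels considered" is interpreted here in the conventional way, that is, all truly inactive voxels for the FPR and all truly active voxels for the TPR.

```python
def roc_point(truth, detected):
    """Return (FPR, TPR) for binary per-voxel labels.

    truth:    reference labels, 1 = truly active, 0 = truly inactive
    detected: labels produced by the method under test
    Assumes both classes occur at least once in `truth`.
    """
    pairs = list(zip(truth, detected))
    true_pos = sum(1 for t, d in pairs if t and d)        # active, detected
    false_pos = sum(1 for t, d in pairs if not t and d)   # inactive, detected
    n_active = sum(1 for t in truth if t)
    n_inactive = len(truth) - n_active
    return false_pos / n_inactive, true_pos / n_active
```

Sweeping the detection threshold of a method and collecting these points traces its ROC curve; a good classifier yields points near the top-left corner (high TPR, low FPR).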
5. fMRI results

The principles behind prototype-based clustering algorithms were introduced in this work. The performance of the RGNG was analyzed and verified with fMRI experiments. Such verification requires known areas and functions of the brain, so that common and expected results can be used in the experiments. One of these areas is the auditory cortex. Real auditory fMRI data, which are freely available for education and evaluation purposes, were used in the experiments [http://www.fil.ion.ucl.ac.uk/spm/data/auditory/]. These data were used in previous works (Lachiche et al., 2005; Korczak, 2007; Heydar et al., 2009).
One of the decisive advantages of fMRI is that fMRI studies do not require the analysis of a group of volunteers but can produce valuable results at the level of single individuals. The analysis of single volunteers is crucial in analyzing small structures that exhibit strong interindividual variation (Campain and Minckler, 1976; Francesco et al., 2003), such as the auditory cortex, as shown in Fig. 6.
5.1. Comparing auditory data running RGNG with that of GNG

A block design experiment was conducted using an auditory stimulus. Figs. 7 and 8A and B show the active areas in the auditory cortex of the entire brain when running the GNG and RGNG algorithms. Although auditory cortex regions were found by both algorithms, with GNG other areas outside this cortex are also activated. With RGNG, these areas are reduced or nearly absent under the same experiment and the same auditory stimulus of the whole brain.
Fig. 7 shows the clusters in a transparent or glass brain image, which is a more flexible approach that specifies a real RGB (red-green-blue) color value for every voxel in the image. Fig. 8A and B show the alignment of the obtained clusters into the structural space of the brain when running the GNG and RGNG, respectively. With regard to the results obtained by running the three unsupervised clustering algorithms, spatial information is visualized as fine clusters in the auditory cortex area. The GNG algorithm was used in [...] 2009). The ROI obtained within the auditory cortex when running the GNG algorithm (Fig. 8A) is similar to that obtained by the same approach in the literature. In general, a cluster corresponds to a group of voxels with a similar HDR over a TC.

Fig. 7. Clusters in a transparent brain image when running the (a) GNG and (b) RGNG clustering techniques.
Fig. 8. Clusters overlaid onto the anatomical image when running the (a) GNG, (b) RGNG, and (c) SPM.
The block design experiment was conducted by running the proposed RGNG approach on the auditory data. The activation shown in Fig. 8B is located in the temporal lobe. The spatial information shows that the areas of activation obtained, which are detected as variations of voxel intensity over time, are similar to those expected from auditory cortex experiments. In the current study, the data were separated by the RGNG approach according to the TC signals of voxel intensity variations relative to their average. Similar to all clustering algorithms, the RGNG attempted to partition the brain into homogeneous areas of activation that were comparable to the areas located using other approaches and found in the recognized cortices related to the experiment. These areas or clusters are described by an average TC or a cluster center obtained by averaging all of the TCs of the cluster.
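The cluster center described above is simply the sample-wise mean of the TCs assigned to a cluster. A minimal Python sketch, assuming each TC is a list of equal length and using a hypothetical function name:

```python
def cluster_center(time_courses):
    """Average TC of a cluster: mean over member TCs at every time point."""
    n_members = len(time_courses)
    # zip(*...) groups the samples of all member TCs that share a time point.
    return [sum(samples) / n_members for samples in zip(*time_courses)]
```

For two member TCs `[1.0, 2.0]` and `[3.0, 4.0]`, the center is `[2.0, 3.0]`.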
The clustering results of the novel RGNG application are better than those of the GNG approach because the clusters delineate the specific auditory cortex area. Moreover, the fMRI results obtained by running the RGNG matched the outcome obtained by running SPM on the same dataset and the same paradigm, as discussed in the next subsection.
5.2. Comparing auditory data running RGNG with that of SPM

The paradigm of the block design experiment alternates two conditions, namely, no stimulus and auditory stimuli, which consist of repetitions of two-syllable words, such as “mother,” “house,” “weather,” and “movie”. Fig. 8B shows the ability of the RGNG clustering technique to identify winner nodes, determine the optimal number of underlying clusters, and produce a TC for activation detection in an auditory dataset. Fig. 8C shows the area of activation in the auditory cortex of the whole brain obtained by running SPM with an F-contrast test, a family-wise error (FWE) corrected threshold of p = 0.05, and no masking.
The SPM results are based on the GLM, which uses the paradigm as a reference signal and thereby introduces bias into the experiment. By contrast, the RGNG approach did not use the paradigm as a reference signal because it works as a model-free method. In summary, the RGNG results were within the expected outputs and are similar to those found with the hypothesis method in detecting active areas within the expected auditory cortices. With RGNG, the signal changes over a TC in auditory fMRI datasets can be calculated by labeling the pixels of the same cluster (membership TC) or by plotting the distance of the TCs to a given cluster center (distance TC).
Novel and extensive simulation studies on real fMRI datasets were conducted using the RGNG unsupervised clustering algorithm. A potential problem associated with the GLM is the requirement of an accurate estimate of the fMRI paradigm design. In many cases, it is difficult to provide precise model designs: subjects may perform the task incorrectly (and the same subject may respond differently to the same paradigm at a different time), or different subjects may give different BOLD signals during the same paradigm. The result in Fig. 8B shows that this method can complement the model-based method in coping with the difficulties and challenges of fMRI data analysis. The findings can improve the understanding of the nature of fMRI data and the underlying mechanisms.
6. Conclusions

The major objective of this study is to detect and classify the activated areas of the brain using a robust and efficient algorithm. This type of study has not yet been conducted, and the current study, which uses RGNG with fMRI, is the first attempt to do so.
In conclusion, the RGNG can detect the active zones in the brain, analyze brain function, and determine the optimal number of underlying clusters in fMRI datasets. This algorithm can define the positions of the center of an output cluster corresponding to the minimal MDL value. The validity of the performance of the RGNG algorithm was tested using real auditory fMRI data, which are based [...]
Some difficulties of the conventional clustering algorithms were addressed. For example, the number of clusters must be defined in advance, and the cluster detection problem has different dimensions within the same dataset. The RGNG merges the GNG structure with robust properties and uses the MDL to define the problems of optimal network representation and parameters, which makes the RGNG insensitive to initialization, input sequence ordering, and outliers and more robust to noisy input data. During the network-growing process, the RGNG can effectively determine the optimal number of clusters and their corresponding positions, which are closer to the actual cluster centers (with the smallest MDL value), with minimal influence from the outliers.
The experimental results showed the superior performance of the RGNG over model-based approaches and one of the prototype-based clustering algorithms on real fMRI datasets, as revealed by their performance measured by MDL and ROC analysis. This work proposed novel and powerful methods for fMRI data analysis, which integrate the advantages of the hypothesis and exploratory analysis methods.
Two types of fMRI analysis methods were compared, namely, the GLM and data-driven analysis using machine learning classifiers. The GLM is the most common method for fMRI data analysis but relies heavily on an a priori BOLD model design. In some cases, the GLM cannot be used for brain activation detection because prior information about the data is unavailable. Examples are research involving mental subjects or daydreaming and mind-wandering (default mode of brain function) (Yongnan, 2010). Thus, effective alternative approaches using data-driven analysis were introduced to detect brain activity based on the data structure. The proposed application of RGNG to a real fMRI dataset was evaluated on single-subject auditory fMRI data. This method can also be extended to multi-subject data-driven analysis of fMRI datasets. The RGNG approach may be preferable for multiple-subject studies rather than for analyses of single-subject data such as the auditory data used here.
The paradigm of the auditory dataset used in the experiment was a block design. In the future, this work can be extended to experiments with an event-related design.
The RGNG can deal well with fMRI data, which are composed of multimodal datasets. Thus, the approach can be applied to other real multimodal datasets, such as MRI image segmentation in the brain and in other regions of the body. Clusters of different organ shapes in the body can be detected using other distance metrics, because the Euclidean distance metric used with the RGNG detects clusters in the brain, which is an approximately spherical or ellipsoidal region with minimal differences in the variance in each dimension (Frigui and Krishnapuram, 1999).
In future studies, cluster validity measures other than the MDL criterion can be used with RGNG. Minimum message length, the Bayesian information criterion, and Akaike's information criterion can be applied as alternatives to the common MDL validity index used in this work. The findings from this work can help address the various difficulties that neurologists and psychologists encounter during analysis and improve the interpretation of fMRI data.
Acknowledgment

The authors would like to thank the reviewers, whose comments greatly improved the quality of the manuscript.
References
AlJobouri, H.K., Jaber, H.A., Çankaya, I., 2017. Performance evaluation of prototype-based clustering algorithms combined MDL index. Comput. Appl. Eng. Educ. 25 (4), 642–654 (Wiley Inc.).
Aljobouri, H.K., Çankaya, I., Karal, O., 2015. From biomedical signal processing techniques to fMRI parcellation. Biosci. Biotechnol. Res. Asia 12, 1115–1138.
Baumgartner, R., Windischberger, C., Moser, E., 1998. Quantification in functional magnetic resonance imaging: fuzzy clustering vs correlation analysis. Magn. Reson. Imaging 16, 115–125.
Bock, H.H., Diday, E., 2000. Analysis of Symbolic Data, Exploratory Methods for Extracting Statistical Information from Complex Data. Studies in Classification, Data Analysis, and Knowledge Organization. Springer-Verlag.
Calhoun, V., Adali, T., Pearlson, G., Pekar, J., 2001. Spatial and temporal independent component analysis of functional MRI data containing a pair of task-related waveforms. Hum. Brain Mapp. 13, 43–53.
Campain, R., Minckler, J., 1976. A note on the gross configurations of the human auditory cortex. Brain Lang. 3, 318–323.
Chen, H., Yuan, H., Yao, D., Chen, L., Chen, W., 2006. An integrated neighborhood correlation and hierarchical clustering approach of functional MRI. IEEE Trans. Biomed. Eng. 53, 452–458.
Dimitriadou, E., Barth, M., Windischberger, C., Hornika, K., Moser, E., 2004. A quantitative comparison of functional cluster analysis. Artif. Intell. Med. 31, 57–71.
Francesco, D.S., Fabrizio, E., Tommaso, S., Elia, F., Elio, M., Claudio, S., Sossio, C., Raffaele, E., Klaus, S., Erich, S., 2003. fMRI of the auditory system: understanding the neural basis of auditory gestalt. Magn. Reson. Imaging 21, 1213–1224.
Frigui, H., Krishnapuram, R., 1999. A robust competitive clustering algorithm with applications in computer vision. IEEE Trans. Patt. Anal. Mach. Intell. 21, 450–465.
Friman, O., Borga, M., Lundberg, P., Knutsson, H., 2002. Exploratory fMRI analysis by autocorrelation maximization. Neuroimage 16, 454–464.
Friston, K.J., Frith, C.D., Liddle, P.F., Frackowiak, R.S., 1993. Functional connectivity: the principal component analysis of large PET datasets. J. Cereb. Blood Flow Metab. 13, 5–14.
Friston, K.J., Poline, J.B., Strother, S., Holmes, A.P., Frith, C.D., Frackowiak, R.S., 1996. A multivariate analysis of PET activation studies. Hum. Brain Mapp. 4, 140–151.
Fritzke, B., 1994. Growing cells structures: a self-organizing network for unsupervised and supervised learning. Neural Netw. 7, 1441–1460.
Fritzke, B., 1995. A Growing Neural Gas Network Learns Topologies, Advances in Neural Information Processing Systems 7. MIT Press, Cambridge, pp. 625–632.
Fritzke, B., 1997. Some Competitive Learning Methods (draft), Technique Report. Institute for Neural Computation, Ruhr-University, Bochum.
Goutte, C., Toft, P., Rostrup, E., Nielsen, E.F., Hansen, L., 1999. On clustering fMRI time series. Neuroimage 9, 298–310.
Heydar, D., Ali, T., Emad, F., 2009. Extracting activated regions of fMRI data using unsupervised learning. In: Proceedings of International Joint Conference on Neural Networks, Atlanta, Georgia USA, pp. 641–645.
Hyvarinen, A., Karhunen, J., Oja, E., 2001. Independent Component Analysis. John Wiley & Sons.
John, A., Gareth, B., Chun-Chuan, C., Jean, D., Guillaume, F., Karl, F., Stefan, K., James, K., Vladimir, L., Rosalyn, M., Will, P., Maria, R., Klaas, S., Darren, G., Rik, H., Chloe, H., Volkmar, G., Jeremie, M., Christophe, P., 2013. SPM8 Manual, Functional Imaging Laboratory, Trust Centre for Neuroimaging. Institute of Neurology, London UK.
Katwal, S.B., 2011. Analyzing fMRI data with graph-based visualizations of self-organizing maps. In: IEEE International Symposium on Biomedical Imaging, Chicago, pp. 1577–1580.
Korczak, J., 2007. Interactive Mining of Functional MRI Data, Signal-Image Technologies and Internet-Based System (SITIS ’07). IEEE Computer Society, Washington, DC USA, pp. 912–917.
Korczak, J., 2012. Visual exploration of functional MRI data. In: Karahoca, A., INTECH (Eds.), Data Mining Applications in Engineering and Medicine, pp. 249–264.
Lachiche, N., Hommet, J., Korczak, J., Braud, A., 2005. Neuronal clustering of brain fMRI images. Proceeding of Pattern Recognition and Machine Inference, 300–305.
Lange, N., Strother, S.C., Anderson, J.R., Nielsen, F.A., Holmes, A.P., Kolenda, T., Savoy, R., Hansen, L.K., 1999. Plurality and resemblance in fMRI data analysis. Neuroimage 10, 1999.
Liao, W., Chen, H., Yang, Q., Lei, X., 2008. Analysis of fMRI data using improved self-organizing mapping and spatio-temporal metric hierarchical clustering. IEEE Trans. Med. Imaging 27, 1472–1483.
Lindquist, M.A., 2008. The statistical analysis of fMRI data. Stat. Sci. 23, 439–464.
Martinetz, T., Schulten, K., 1991. A Neural Gas Network Learns Topologies, Artificial Neural Networks. Elsevier, pp. 397–402.
McKeown, M., Makeig, S., Brown, G., Jung, T., Kindermann, S., Bell, A., Sejnowski, T., 1998. Analysis of fMRI data by blind separation into independent spatial components. Hum. Brain Mapp. 6, 160–188.
McKeown, M.J., 2000. Detection of consistently task-related activations in fMRI data with hybrid independent component analysis. Neuroimage 11, 24–35.
Pereira, F., Mitchell, T., Botvinick, M., 2009. Machine learning classifiers and fMRI: a tutorial overview. Neuroimage 45.
Qin, A.K., Suganthan, P.N., 2004. Robust growing neural gas algorithm with application in cluster analysis. Neural Netw. 17, 1135–1148.
Rissanen, J., 1983. A universal prior for integers and estimation by minimum description length. Ann. Stat. 11, 416–431.
SPM, 1991. Statistical Parametric Mapping. http://www.fil.ion.ucl.ac.uk/spm/.
Seghier, M.L., Friston, K.J., Price, C.J., 2007. Detecting subject-specific activations using fuzzy clustering. Neuroimage 36, 594–605.
Skudlarski, P., Constable, R.T., Gore, J.C., 1999. ROC analysis of statistical methods used in functional MRI: individual subjects. Neuroimage 9, 311–329.
Sun, X., Xu, W., 2014. Fast implementation of DeLong’s algorithm for comparing the areas under correlated receiver operating characteristic curves. IEEE Signal Process Lett. 21, 1389–1393.
The Analysis Group, 2012. FMRIB (Oxford, UK). http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/.
Wismuller, A., Meyer-Base, A., Lange, O., Auer, D., Reiser, M.F., Sumners, D., 2004. Model-free functional MRI analysis based on unsupervised clustering. J. Biomed. Inform. 37, 10–18.
Yongnan, J., 2010. Data-driven fMRI Data Analysis Based on Parcellation, Ph.D Thesis. University of Nottingham (October).