A window-based time series feature extraction method

Deniz Katircioglu-Öztürk a,*, H. Altay Güvenir b, Ursula Ravens c,1, Nazife Baykal a

a Middle East Technical University, Institute of Informatics, Medical Informatics Department, 06800 Ankara, Turkey
b Bilkent University, Computer Engineering Department, 06800 Ankara, Turkey
c Technische Universität Dresden, Institut für Pharmakologie und Toxikologie, 01187 Dresden, Germany

* Corresponding author. E-mail address: denizkatircioglu@gmail.com (D. Katircioglu-Öztürk).
1 Present address: Freiburg University, Institute for Experimental Cardiovascular Medicine, University Heart Centre, 79110 Freiburg, Germany.

http://dx.doi.org/10.1016/j.compbiomed.2017.08.011
Received 17 May 2017; Received in revised form 8 July 2017; Accepted 6 August 2017
0010-4825/© 2017 Elsevier Ltd. All rights reserved.

A R T I C L E I N F O

Keywords: Time series analysis; Feature extraction; Cardiac action potential; Atrial fibrillation; Electrocardiography; Myocardial infarction

A B S T R A C T

This study proposes a robust similarity score-based time series feature extraction method termed Window-based Time series Feature ExtraCtion (WTC). WTC generates domain-interpretable results and involves significantly lower computational complexity, rendering it useful for densely sampled and populated time series datasets. In this study, WTC is applied to a proprietary action potential (AP) time series dataset on human cardiomyocytes and to three precordial leads from a publicly available electrocardiogram (ECG) dataset. WTC is then compared, in terms of predictive accuracy and computational complexity, with the shapelet transform and the fast shapelet transform (an accelerated variant of the shapelet transform). The results indicate that WTC achieves slightly higher classification performance with significantly lower execution time than its shapelet-based alternatives. With respect to its interpretable features, WTC has the potential to enable medical experts to explore definitive common trends in novel datasets.

1. Introduction

In the medical domain, biophysical signals in the form of time series are frequently used for diagnostic and prognostic purposes and are specifically relevant to documenting the history [1] and clinical course of a disease [2,3]. Many decisive traits emerge from biophysical signals as rules of thumb suggested by health professionals based on a visual inspection [4]. Conversely, computer-aided methods extract common patterns among time series and establish a more objective data assessment framework. Thus, they are of immense practical value in interpreting data from diagnostic, prognostic, and therapeutic perspectives [5].

Computerized analysis of medical data is an active research area. In the computational geometry domain, the "B-spline" is considered one of the most efficient surface representation methods [6,7]. Significant curve reconstruction efforts have focused on various spline paradigms [8,9], especially in the cardiac domain [10]. However, spline-based methods applied to biophysical signals are of limited help to professionals in diagnosing or assessing indicators of diseased conditions, because splines do not produce human-readable summarizing models. Previous studies examined the construction of local and piecewise polynomial models to represent time series [11,12]. Recently, Gaussian Process modeling has been considered an attractive option for analyzing time series data, especially in the medical domain. Within a Bayesian framework, [13] conducted an analysis by solving a regression problem under the assumption that a dataset consists of observations and outcome variables. This approach is suitable for predicting missing outcome values of biological time series, which typically suffer from irregularities in sampling and missing observations [13,14]. Another feature extraction method was recently developed to discover significant subsequences termed "shapelets" within a dataset [15,16]. A shapelet is a time series subsequence identified as representative of a certain dataset. The discovery of shapelets requires brute-force traversal of all overlapping subsequences, while the task of determining the individual length and number of these subsequences is left, unguided, to the algorithm users. The shapelet transformation produces human-readable outputs, and it is therefore used for benchmarking purposes in the present study.

This study addresses the problem of extracting definitive time-domain features from a given time series dataset for use in data mining applications, with the ultimate goal of devising a computationally efficient algorithm. A time series feature extraction method is designed to summarize class-dependent behavior within consecutive time windows and to derive an overall similarity score for this behavior. Specifically, the proposed method identifies temporal features that define the behavior of instances in a certain target class, which are later used to discriminate these instances from those of other classes.


In order to demonstrate the effectiveness of the proposed method, a human cardiac action potential (AP) dataset and an ECG dataset consisting of recordings from three different leads are employed.

Cardiac APs are bioelectrical signals recorded in cardiac tissues obtained during heart surgery from patients with defined heart rhythms. Based on a patient's heart rhythm at the time of surgery, APs may either exhibit a "spike-and-dome" shape for physiological sinus rhythm (SR) or a triangular shape for atrial fibrillation (AF) [17]. Specifically, AF is considered the most common arrhythmia in clinical practice, with an approximate prevalence of 0.4–1% in the general population, and is usually associated with stroke, heart failure, and a significant increase in all-cause mortality [18]. Patients with AF are diagnosed based on thorough clinical examinations and ECG recordings. However, the resulting AP signals may exhibit SR characteristics depending on the stage of the disease. The cardiac AP is a highly informative signal that attracts significant attention from researchers, with a key focus on revealing the dynamics of the cardiac ion channels that govern the electrical activity within the heart. An understanding of ion channels is essential in identifying the effects of certain agents in related drug studies [17]. Additionally, AP signal acquisition is a labor-intensive task, and this is considered a limiting factor for widespread studies, especially in the area of statistical time series analysis. Extant studies produced desired AP properties by fitting empirical models [8] to simulate the real-life behavior of the AP. The AP dataset used in the present study, in addition to validating the proposed model, also serves the purpose of stress testing the proposed method on a dataset consisting of crowded and densely sampled time series. To the best of the authors' knowledge, this is the first study in which a data mining analysis is conducted on a cardiac AP time series dataset.

Conversely, the ECG dataset, consisting of patients with acute myocardial infarction and control subjects, is specifically selected to present a clinical application of the proposed method. Myocardial infarction (MI) is a fatal cardiac disease corresponding to the irreversible loss of heart muscle due to ischemia caused by blood flow interruption [19]. Prominent guidelines recommend the use of ECGs, especially for emergency cases, in addition to certain cardiac biomarkers with their superior sensitivity to MI [20]. Additionally, the ECG is an effective and non-invasive technique to confirm an MI diagnosis during an initial evaluation. With respect to ST elevation myocardial infarction (STEMI), 12-lead ECGs exhibit certain morphological changes in the expected waveform, such as ST-segment elevations or depressions and wave inversions or losses in different leads depending on the localization of the MI [21]. Public repositories publish anonymized digital datasets in the form of time series; an example is "PhysioNet", a repository containing a large and balanced number of MI and control subjects [22]. The present study examines a specific selection of the PTB Diagnostic ECG Database available in the PhysioNet repository [23].

The rest of the paper is organized as follows: Section 2 describes the proposed method, WTC. Section 3 presents the performance evaluation results for the WTC method in conjunction with two shapelet-based benchmark methods on the cardiac AP and the three-lead ECG time series datasets. The medical and technical aspects of these results are discussed in Section 4, followed by the final section that presents the conclusions of the study.

2. WTC method

In this section, the proposed Window-based Time Series Feature Extraction (WTC) method is explained in a stepwise manner. Initially, Section 2.1 describes the necessity of the pre-processing performed on the time series instances and the importance of this preliminary step for the proposed method. Following that, Section 2.2 details the proposed method by elucidating the components that constitute its basis. The mathematical expressions for the major components are accompanied by a pseudocode that provides a further grasp of the computational complexity calculated at the end of this section. Table 1 summarizes the notation used throughout this section.

2.1. Dataset pre-processing

The proposed method requires processing local features in the time series of interest. Thus, time series instances are registered both in time and in signal amplitude. Registration in time refers to aligning time series to a fixed and known position, such as the time instant at which a certain event is expected to occur for a single-cycle phenomenon (e.g., ex-vivo action potential recordings) or the epoch of a cycle for a cyclic phenomenon (e.g., ECG and phonocardiogram recordings). In contrast, registration in signal amplitude facilitates performing time series analysis irrespective of the offset and scale of the recorded signal, which may well depend on the experimental setup. In this study, each instance in a time series dataset $S$ is denoted by $S_k$, where $k \in [1, K]$ and $K$ denotes the total number of instances available. For the rest of this section, it is assumed that the whole dataset $S$ is registered to obtain the dataset $T$, in which the instances are denoted by $T_k$. Registration constitutes a data-dependent step; thus, the method to obtain $T$ from $S$ is described in Section 3, which introduces the AP and ECG time series datasets used to validate the proposed method.

Table 1
Symbols in alphabetical order.

Symbol: Explanation
$\alpha_j$: Trajectory-based weight of $SCB_j$ based on $I_{j,k}$
$\beta_j$: Distance-based weight of $SCB_j$
$c$: PMF resolution
$d_{j,k}$: Euclidean distance of $T_k$ to $T^A$ within $SCB_j$
$\delta$: DCT energy threshold
$\mathbb{D}$: PMF of the deterministic distribution with atom at zero
$\mathcal{D}(\cdot,\cdot)$: Distance between two PMFs
$D^{T^A}$: One-dimensional DCT of $T^A$
$f_N$: Nyquist rate of $T^A$
$f_s$: Sampling rate of $S$
$\gamma_j$: Weight of the distance-based component of the similarity score for $SCB_j$
$H$: Maximum amplitude of the time series instances in $T$
$I_{j,k}$: Indicator of inclusion for $T_k$ within $SCB_j$
$j$: Time window index
$J$: Number of shape confidence bands
$k$: Time series instance index
$K$: Number of time series instances
$l^\star$: Cut-off index in the DCT domain
$L$: Shapelet length for the brute-force and fast shapelet transforms
$M$: Number of perceptually important time points (PIPs) in $T^A$
$n$: Sample point index of a registered time series instance
$N$: Number of sample points in a registered time series instance
$p$: Confidence level for the confidence interval
$\mathbb{P}_j$: Distance PMF for $SCB_j$
$r$: PIP oversampling ratio
$S$: Time series dataset
$S_k$: $k$th time series instance in the dataset $S$
$SCB_j$: $j$th shape confidence band
$Z_j$: Ensemble average similarity score for $SCB_j$
$Z_{j,k}$: Individual similarity score of $T_k$ for $SCB_j$
$Z^{(d)}_{j,k}$: Distance-based component of the similarity score of $T_k$ for $SCB_j$
$Z^{(t)}_{j,k}$: Trajectory-based component of the similarity score of $T_k$ for $SCB_j$
$T$: Registered time series dataset
$T^A$: Class representative average time series
$T_k$: $k$th time series instance in the registered dataset $T$
$T^L$: Lower bound of the confidence interval with level $p$ around $T^A$
$T^U$: Upper bound of the confidence interval with level $p$ around $T^A$
$\mathbb{U}$: PMF of the uniform distribution
$\phi_{j,k}$: Reward associated with the proximity of $T_k$ to $T^A$ for $W_j$
$w$: Number of sample points in a $W_j$ where $j \neq J$
$W_j$: $j$th time window


2.2. Proposed analysis

For notational simplicity, the proposed WTC method is described with respect to a single class label throughout this section. Note that WTC can be run independently for each class in a multi-class scenario, as shown in Section 3, in which datasets with two class labels are examined.

2.2.1. Determining the local time window

Time series instances continue to exhibit a certain degree of inter-subject variation even after registration. Averaging the registered time series instances makes the proposed method robust against noise and distortions that stem from trajectory variability and possible artifacts [24]. In this respect, the registered time series instances $T_k$ are averaged to obtain a class representative average time series, denoted by $T^A$, as follows:

$$T^A(n) = \frac{1}{K} \sum_{k=1}^{K} T_k(n) \qquad (1)$$

where $n$ denotes the sample point index of the time series. Restricting the averaging step to time series instances bearing the same class label emphasizes the local dynamics and their significance in defining class behavior.

The resulting representative trajectory constitutes a base time series from which the local features of the time series are extracted. The determination of suitable time windows, preferably extracted from the time series itself, is necessary to obtain local features. The domain of locality is defined as non-overlapping and contiguous time windows denoted by $W_j$ with a fixed length $w$, where $j \in \{1, 2, \ldots, J\}$ and $J$ corresponds to the total number of windows along $T^A$. $J$ is equal to $\lceil N/w \rceil$, where $N$ denotes the total number of sample points in $T^A$. Note that these time windows span the entire time series, and the length of the last time window, $W_J$, may be smaller than $w$.

In order to choose $w$, this study considers "Perceptually Important Points" (PIPs) [25,26] and the well-known Discrete Cosine Transform (DCT) as heuristic tools. The concept of PIPs is briefly reviewed first, followed by a description of how it is complemented by the DCT to select $w$. Specifically, PIPs are defined as observational points in a time series that exert a relatively high influence on human vision [27]. The PIP approach is useful in reducing dimensionality while indicating a hierarchy among data points based on their visual importance. PIPs are identified as follows [26]: the first and the last data points in the time series are taken as the first two points in the importance hierarchy, and the next PIP is the data point with the largest distance to these two points. Recursively, each candidate PIP at a given iteration corresponds to the data point with the largest distance to its two adjacent PIPs determined so far. The PIP selection process continues until all data points in the time series are exhausted. The original study [26] proposes that the distance between PIPs can be measured with any of the following metrics: Euclidean, vertical, or perpendicular distance. The present study employs the Euclidean distance.
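For illustration, a minimal Python sketch of this PIP ranking is given below. It implements one common variant in which a candidate's importance is the sum of its Euclidean distances to the two bracketing PIPs; the function name and the exact tie-breaking are choices made for this example rather than details taken from the paper.

```python
import numpy as np

def pip_indices(series, num_pips):
    """Rank the points of a 1-D series by perceptual importance (PIP).

    The first and last samples are the first two PIPs; each subsequent PIP
    is the remaining point with the largest summed Euclidean distance to
    the two PIPs that bracket it. Returns indices in selection order.
    """
    n = len(series)
    x = np.arange(n, dtype=float)
    pips = [0, n - 1]
    candidates = set(range(1, n - 1))
    while len(pips) < min(num_pips, n) and candidates:
        ordered = sorted(pips)
        best_idx, best_dist = None, -1.0
        # Only points lying between two consecutive PIPs are scored
        # against that bracketing pair.
        for left, right in zip(ordered[:-1], ordered[1:]):
            for c in range(left + 1, right):
                if c not in candidates:
                    continue
                d = (np.hypot(x[c] - x[left], series[c] - series[left]) +
                     np.hypot(x[c] - x[right], series[c] - series[right]))
                if d > best_dist:
                    best_idx, best_dist = c, d
        pips.append(best_idx)
        candidates.discard(best_idx)
    return pips
```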

In the context of the present study, the determination of PIPs in a time series fits closely with the aim of extracting class-representative patterns from a time series dataset. Essentially, the PIP algorithm sorts the data points of a given time series of total length $N$ by their perceptual importance. Next, the "most important" $M$ data points are selected from the $N$ points. For this purpose, it is assumed that $M = \lfloor N/r \rfloor$, where $r$ denotes the oversampling ratio of the signal, defined as

$$r = f_s / f_N \qquad (2)$$

The terms $f_s$ and $f_N$ in Eq. (2) denote the sampling and Nyquist rates, respectively, of the signal of interest. The well-known one-dimensional DCT is employed to calculate $r = N / l^\star$, where $r \in [1, N]$ and $l^\star \in \{1, 2, \ldots, N\}$. The term $l^\star$ denotes the lowest frequency index in the DCT domain representation of the signal below which $\delta$ percent of its total energy is contained. Let $D^{T^A}$ denote the one-dimensional DCT of the time series $T^A$; then $l^\star$ is determined from the following expression:

$$\arg\min_{l^\star} \; \frac{\sum_{l=1}^{l^\star} \left| D^{T^A}_l \right|^2}{\sum_{l=1}^{N} \left| D^{T^A}_l \right|^2} \geq \delta \qquad (3)$$

The best choice for the threshold parameter $\delta$ is explored in Section 3 in conjunction with the remaining parameters of the proposed method.

Next, $M$ PIPs are extracted from the time series, and $w$ is finally set to the maximum horizontal distance between consecutive PIPs, which guarantees that at least one PIP appears in $W_j$ for all $j$.
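The sketch below ties Eqs. (2) and (3) to the window length selection; it reuses the `pip_indices` helper sketched above, and the DCT normalization and the strictness of the energy criterion are illustrative choices, not specifications from the paper.

```python
import numpy as np
from scipy.fft import dct

def window_length(t_avg, delta=0.9990):
    """Heuristic window length w from the class average T^A (Section 2.2.1)."""
    n = len(t_avg)
    # l*: smallest DCT index capturing at least `delta` of the total energy (Eq. (3))
    energy = np.cumsum(np.abs(dct(np.asarray(t_avg, dtype=float), norm='ortho')) ** 2)
    l_star = int(np.searchsorted(energy / energy[-1], delta)) + 1
    r = n / l_star                      # oversampling ratio, r = N / l*  (Eq. (2))
    m = int(np.floor(n / r))            # number of PIPs to keep, M = floor(N / r)
    pips = sorted(pip_indices(t_avg, m))
    # w: the largest gap between consecutive PIPs, so that every window
    # of length w contains at least one PIP.
    w = int(np.max(np.diff(pips)))
    return w, r, pips
```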

2.2.2. Distance-based similarity

Local time windows $W_j$ are determined in order to calculate a distance-based similarity score for each time series instance $T_k$. The Euclidean distance of a specific time series instance $T_k$ to $T^A$ within the support of $W_j$ is denoted by $d_{j,k} = \lVert T^A(n) - T_k(n) \rVert$, where $n \in W_j$ and $d_{j,k} \in [0, \sqrt{w}\,H]$. $H$ denotes the maximum amplitude value of the time series instances in $T$. The distance-based weight $\beta_j$ corresponding to $W_j$ is expressed as follows:

$$\beta_j = \frac{\sum_{k=1}^{K} \phi_{j,k}}{\sum_{j=1}^{J} \sum_{k=1}^{K} \phi_{j,k}} \qquad (4)$$

where

$$\phi_{j,k} = \sqrt{w}\,H - d_{j,k} \qquad (5)$$

The term $\phi_{j,k}$ in Eq. (5), which falls in the interval $[0, \sqrt{w}\,H]$, corresponds to the reward associated with the proximity of the time series instance $T_k$ to $T^A$ in the implied time window $W_j$. The term $\phi_{j,k}$ provides a standardized scale for all local time windows that span the time series; thus, the windows are weighted against each other as shown in Eq. (4). Note that an increase in the proximity of the time series instances to $T^A$ for a particular $W_j$ increases the associated weight. The distance-based similarity score, denoted by $Z^{(d)}_{j,k}$, for time series instance $T_k$ and time window $W_j$ is calculated from $\phi_{j,k}$ as follows:

$$Z^{(d)}_{j,k} = \beta_j \, \frac{\phi_{j,k}}{\sqrt{w}\,H} \qquad (6)$$
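A minimal Python sketch of Eqs. (4)–(6) is given below; the array layout (instances in rows) and the function name are conventions of this example only.

```python
import numpy as np

def distance_based_scores(T, t_avg, windows, H):
    """Distance-based weights and scores (Eqs. (4)-(6)) for one class.

    T       : array of shape (K, N) holding the registered instances
    t_avg   : class representative average series T^A, shape (N,)
    windows : list of index arrays, one per local time window W_j
    H       : maximum amplitude of the instances in T
    """
    J, K = len(windows), T.shape[0]
    phi = np.empty((J, K))
    for j, w_idx in enumerate(windows):
        # d_{j,k}: Euclidean distance of T_k to T^A restricted to W_j
        d = np.linalg.norm(T[:, w_idx] - t_avg[w_idx], axis=1)
        phi[j] = np.sqrt(len(w_idx)) * H - d          # reward, Eq. (5)
    beta = phi.sum(axis=1) / phi.sum()                # Eq. (4)
    scale = np.array([np.sqrt(len(w_idx)) * H for w_idx in windows])
    Z_d = beta[:, None] * phi / scale[:, None]        # Eq. (6)
    return beta, Z_d
```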

2.2.3. Trajectory-based similarity

Distance-based similarity is complemented and reinforced with trajectory-based similarity. The average class representative time series, $T^A$, is used to obtain a similarity band as a class representative trajectory. Each sample point in the registered time series is assumed to follow a Gaussian distribution, as shown in Ref. [28], and an upper (lower) time series denoted by $T^U$ ($T^L$) is formed in which the sample points are higher (lower) than those in $T^A$ as dictated by a confidence level $p$. Consequently, the aforementioned similarity band is defined as the band bounded by $T^U$ and $T^L$, which is subsequently partitioned into time-localized bandlets termed Shape Confidence Bands (SCBs). Each SCB, denoted by $SCB_j$, is related to the corresponding $W_j$ as follows:

$$SCB_j := \left\{ (n, y) \mid n \in W_j \wedge y \in \mathbb{R} \wedge T^L(n) < y < T^U(n) \right\} \qquad (7)$$

Specifically, $SCB_j$ is essential in determining whether the temporally coinciding part of a time series of interest aligns with the class representative average time series within $W_j$. The rationale of SCBs is to quantify the similarity of a trajectory by restricting the orientation of an instance such that it is aligned with that of $T^A$, allowing for a certain margin of error.

For this purpose, each subsequence of a time series is tested as to whether it completely resides within the boundaries of the corresponding $SCB_j$. Let $I_{j,k}$ denote the indicator of inclusion for a time series instance $T_k$ within $SCB_j$, defined as follows:

$$I_{j,k} = \begin{cases} 1, & T_k(n) \in SCB_j \;\; \forall n \in W_j, \\ 0, & \text{otherwise.} \end{cases} \qquad (8)$$

Finally, the trajectory-based similarity score, denoted by $Z^{(t)}_{j,k}$, for each time series instance $T_k$ and $SCB_j$ is expressed as follows:

$$Z^{(t)}_{j,k} = I_{j,k} \, \alpha_j \qquad (9)$$

where

$$\alpha_j = \frac{\sum_{k=1}^{K} I_{j,k}}{\sum_{j=1}^{J} \sum_{k=1}^{K} I_{j,k}} \qquad (10)$$

The term $\alpha_j$ in Eq. (10) is defined as the trajectory-based weight of $SCB_j$, calculated over all the time series instances that reside within $SCB_j$. Note that an increase in the number of time series instances that follow the trajectory of $SCB_j$ increases the associated weight. Fig. 1 shows an example scenario that emphasizes the difference between the distance and trajectory metrics discussed above. Time series instances $T_{k_1}$ and $T_{k_2}$ have similar distances to $T^A$; however, the trajectory of $T_{k_1}$ falls outside the band $SCB_j$, in contrast to that of $T_{k_2}$.
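The sketch below illustrates Eqs. (7)–(10): forming the band under the stated Gaussian assumption, testing the inclusion of each instance in each $SCB_j$, and deriving $\alpha_j$ and $Z^{(t)}_{j,k}$. Using the per-sample standard deviation of the instances for the band width is an assumption of this example; the paper defers the exact band construction to Ref. [28].

```python
import numpy as np
from scipy.stats import norm

def trajectory_based_scores(T, windows, p=0.95):
    """Trajectory-based weights and scores (Eqs. (7)-(10)) for one class."""
    mean = T.mean(axis=0)                                   # T^A
    half = norm.ppf(0.5 + p / 2) * T.std(axis=0, ddof=1)    # band half-width (assumption)
    t_up, t_low = mean + half, mean - half                  # T^U, T^L
    J, K = len(windows), T.shape[0]
    I = np.empty((J, K))
    for j, w_idx in enumerate(windows):
        # I_{j,k} = 1 iff T_k stays strictly inside SCB_j on all of W_j (Eq. (8))
        inside = (T[:, w_idx] > t_low[w_idx]) & (T[:, w_idx] < t_up[w_idx])
        I[j] = inside.all(axis=1).astype(float)
    alpha = I.sum(axis=1) / max(I.sum(), 1e-12)             # Eq. (10)
    Z_t = I * alpha[:, None]                                # Eq. (9)
    return alpha, Z_t, (t_low, t_up)
```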

2.2.4. Overall similarity

The trajectory-based and distance-based similarity scores, $Z^{(t)}_{j,k}$ and $Z^{(d)}_{j,k}$, respectively, are linearly combined into a single similarity score of time series $T_k$ for each $SCB_j$, denoted by $Z_{j,k}$, as follows:

$$Z_{j,k} = \left(1 - \gamma_j\right) Z^{(t)}_{j,k} + \gamma_j Z^{(d)}_{j,k} \qquad (11)$$

where $(1 - \gamma_j)$ and $\gamma_j$ correspond to the weights of the trajectory-based and distance-based components, respectively. The average similarity score for each $SCB_j$ is then defined as follows:

$$Z_j = \sum_{k=1}^{K} Z_{j,k} \qquad (12)$$

It is proposed to calculate $\gamma_j$ based on the probability mass function (PMF) of the time series distances within $SCB_j$, denoted by $\mathbb{P}_j$, as follows:

$$\gamma_j = \frac{\mathcal{D}\!\left(\mathbb{D}, \mathbb{P}_j\right)}{\mathcal{D}\!\left(\mathbb{D}, \mathbb{P}_j\right) + \mathcal{D}\!\left(\mathbb{U}, \mathbb{P}_j\right)} \qquad (13)$$

The terms $\mathbb{U}$ and $\mathbb{D}$ in Eq. (13) denote the PMFs corresponding to the uniform and deterministic distance distributions, respectively. Additionally, $\mathcal{D}$ denotes an operator that computes the distance between two PMFs based on Algorithm 1 presented in Ref. [29]. These PMFs are constructed with a signal-level resolution denoted by $c$; throughout this study, $c$ is set to 0.001. Furthermore, the PMF $\mathbb{U}$ is defined over the support $[0, \sqrt{w}\,H]$, and $\mathbb{D}$ corresponds to an impulse at zero. A random variable associated with the PMF $\mathbb{U}$ ($\mathbb{D}$) possesses maximum (minimum) entropy for its support. The rationale of Eq. (13) is to assign a weight to the distance-based (trajectory-based) similarity score of $SCB_j$ inversely proportional to the distance of $\mathbb{P}_j$ to $\mathbb{U}$ ($\mathbb{D}$). Trajectory-based similarity scores are calculated over the time series instances residing within each $SCB_j$, whereas distance-based similarity scores are calculated over the entire set of time series instances, which are more dispersed with respect to the confidence band of interest. In the extreme cases where $\mathbb{P}_j$ tends to $\mathbb{U}$ ($\mathbb{D}$), the distance-based (trajectory-based) similarity scores dominate the overall similarity score.
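The following sketch shows one way to compute $\gamma_j$ and the combined scores of Eqs. (11)–(13). The total-variation distance used for $\mathcal{D}(\cdot,\cdot)$ is an illustrative stand-in; the paper relies on the PMF distance of Ref. [29], which is not reproduced here.

```python
import numpy as np

def gamma_weight(d, w, H, c=0.001):
    """Mixing weight gamma_j (Eq. (13)) for a single SCB_j, given the
    distances d_{j,k} of all instances within that band."""
    hi = np.sqrt(w) * H
    bins = np.arange(0.0, hi + c, c)               # signal-level resolution c
    P, _ = np.histogram(np.clip(d, 0.0, hi), bins=bins)
    P = P / P.sum()                                # empirical distance PMF, P_j
    U = np.full_like(P, 1.0 / len(P))              # uniform PMF (maximum entropy)
    D = np.zeros_like(P); D[0] = 1.0               # impulse at zero (minimum entropy)
    tv = lambda a, b: 0.5 * np.abs(a - b).sum()    # stand-in for the PMF distance D(.,.)
    return tv(D, P) / (tv(D, P) + tv(U, P))

def overall_scores(Z_t, Z_d, gamma):
    """Combine the two components per Eq. (11) and aggregate per band (Eq. (12))."""
    Z = (1.0 - gamma[:, None]) * Z_t + gamma[:, None] * Z_d
    return Z, Z.sum(axis=1)
```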

The overall algorithm is outlined in Algorithm 1, and the related source code is accessible online via [30].

Algorithm 1 WTC($T$, $p$)
1: $[T^A, T^U, T^L] \leftarrow$ RepresentativeTimeSeries($T$, $p$)   Eq. (1)
2: $r \leftarrow$ PIPCompressionRatio($T^A$)   Eq. (2)
3: PIPs $\leftarrow$ PIPIndices($T^A$)   [26]
4: $W_j \leftarrow$ LocalTimeWindows(PIPs, $r$)   Section 2.2.1
5: $SCB_j \leftarrow$ extractSCBs($T^U$, $T^L$, $W_j$)   Eq. (7)
6: $\alpha_j \leftarrow$ SCBWeightsAlpha($T$, $SCB_j$, $W_j$)   Eq. (10)
7: $\beta_j \leftarrow$ SCBWeightsBeta($T$, $T^A$, $W_j$)   Eq. (4)
8: $\gamma_j \leftarrow$ SCBWeightsGamma($T$, $SCB_j$, $W_j$)   Eq. (13)
9: $Z_j \leftarrow$ SimilarityScores($T$, $\alpha_j$, $\beta_j$, $\gamma_j$, $W_j$)   Eq. (12)
10: return [$SCB_j$, $Z_j$]

A prominent advantage of Algorithm 1 is its relatively low complexity, which is analyzed next. Step 1 of Algorithm 1 involves point-wise mean and confidence level computations with $O(KN)$ complexity. Steps 2 and 3 require the computation of the DCT and the execution of the PIP algorithm, respectively, over the single time series $T^A$, which correspond to $O(N \log N)$ when $N$ is a power of 2 ($O(N^2)$ otherwise) [31] and $O(N^2)$ [26]. Evidently, Steps 4 and 5 involve $O(N)$ complexity. As suggested by the corresponding equations, Steps 6, 7, 8, and 9 have $O(KN)$ complexity. Summing the individual complexities of its constituent steps, Algorithm 1 therefore possesses $O(KN + N^2)$ quadratic complexity. Note that the efficiency of Step 3, which constitutes the most computationally intensive part of the overall algorithm, can be improved by abandoning the related iterations immediately after determining the $100/r$ percentage of PIPs.

This section is concluded by stating that WTC generates $Z_j$, the intra-class average similarity score for each $SCB_j$, for a given time series dataset. Specifically, $Z_j$ provides an efficient representation of the dataset it pertains to and is therefore expected to increase the interpretability of the dataset for domain experts.

3. Results

In this section, the results of the proposed WTC method are presented for the AP and ECG time series datasets. First, in Section 3.1, time series features are extracted from these datasets with WTC and two benchmark methods, namely the shapelet transform [16] (hereafter referred to as the "brute-force shapelet transform") and one of its variants termed the "fast shapelet transform" [32].

Fig. 1. An example case with two time series instances $T_{k_1}$ and $T_{k_2}$ that have similar Euclidean distances from the representative average time series $T^A$ but follow different trajectories.


The fast shapelet transform was proposed to reduce the computational complexity of the legacy brute-force shapelet transform without sacrificing much accuracy. The details of the shapelet-based methods are given in Section 3.1.1 before moving on to the feature extraction results provided in Sections 3.1.2 and 3.1.3. Next, the extracted features are evaluated with a series of classifiers, and the corresponding results of WTC are compared with those of the brute-force and fast shapelet transforms in Section 3.2. Finally, statistical results regarding the ranking of WTC among the benchmark methods and the robustness of WTC are presented in Section 3.3.

3.1. Feature extraction

3.1.1. Benchmark methods for feature extraction

The brute-force shapelet transform is a feature extraction method that discovers so-called shapelets [16] from a given time series dataset. Shapelets are essentially subsequences extracted from the time series instances themselves through an exhaustive sliding window-based search. Let $L$ denote a desired shapelet length specified by the user of the brute-force shapelet transform. For $K$ time series instances of length $N$, there exist $(N - L + 1)$ subsequences, i.e., shapelet candidates, per time series, resulting in a total of $(N - L + 1)K$ candidates. The distance between each subsequence-time series pair is computed; note that a single distance computation requires sliding the subsequence across the time series of interest in all possible ways to determine the closest possible match. Next, a distance vector to all time series instances (with mixed class labels) is obtained for each subsequence. A distance threshold is then determined for each subsequence to yield the most homogeneous separation of instances, i.e., the highest information gain, in terms of class labels. Finally, a user-specified number of subsequences (with the highest information gain) is designated as shapelets. The corresponding distance vectors of the shapelets then constitute the transformed feature vectors generated by the brute-force shapelet transform. The computational complexity of the shapelet extraction routine is $O(K^2 N^4)$ [16,15]. The current state-of-the-art shapelet discovery method [33] improves on the brute-force method by reducing the worst-case running time to $O(K^2 N^3)$, which is still inconvenient for large datasets. More importantly, the temporal location of a shapelet is disregarded by the very nature of the shapelet extraction process, which could be of significant importance for biological interpretations.
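As an outline of the operations described above, the core subsequence-to-series distance and the candidate enumeration can be sketched as follows; this is a simplified illustration, not the optimized Java implementation [15] used in the experiments.

```python
import numpy as np

def subsequence_distance(candidate, series):
    """Minimum Euclidean distance between a shapelet candidate and a time
    series over all alignments (the inner loop of the brute-force search)."""
    L = len(candidate)
    return min(np.linalg.norm(series[i:i + L] - candidate)
               for i in range(len(series) - L + 1))

def candidate_subsequences(dataset, L):
    """Yield all (N - L + 1) candidates of length L from every instance."""
    for series in dataset:
        for i in range(len(series) - L + 1):
            yield series[i:i + L]
```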

One of the latest efforts to improve shapelet extraction is the fast shapelet transform [32]. In this approach, the complete dataset is projected to a lower-dimensional representation referred to as SAX [28], such that the entire shapelet discovery is completed in $O(KN^2)$ time at the expense of some degradation in accuracy. Although the fast shapelet transform avoids exhaustive distance computations, it does not guarantee that the same set of shapelets is reached as with its brute-force counterpart. The proposed WTC method, with $O(KN + N^2)$ complexity, clearly outperforms the family of shapelet methods in this respect. In the present study, an open-source Java solution [15] is employed to obtain results for the brute-force shapelet transform, and a proprietary C++ based implementation [32] is used for the fast shapelet transform.

3.1.2. Cardiac action potential (AP) dataset

As an application of the proposed method to a real-life dataset, AP recordings obtained from human right atrial biopsies are used [34]. The AP dataset was collected between January 2006 and February 2014 within the scope of the project "The European Network for Translational Research in Atrial Fibrillation (EUTRAF)" [35], funded by the European Community's Seventh Framework Programme. Each patient's written informed consent was obtained; the study conforms to the Declaration of Helsinki and was approved by the ethics committee of Dresden University of Technology (No. EK790799).

3.1.2.1. AP dataset description. The time series instances of the AP dataset are clinically attributed to the class label AF if the corresponding patient is in chronic atrial fibrillation (ICD-10 code I48.2) and to the class label SR for control patients with a normal sinus rhythm. Patients with paroxysmal or intermittent AF are excluded. The presence of AF was confirmed with pre-operative ECGs throughout the data collection. Eligible AP recordings were obtained from a total of 341 unique patients, comprising 142 AF patients (aged from 47 to 85 years with a mean of 72.29 years; 37.68% female, 62.32% male) and 199 SR patients (aged from 22 to 86 years with a mean of 67.12 years; 23.98% female, 76.02% male), with a temporal sampling rate of 10 kHz.

Fig. 2. (a) Schematic of a human cardiac AP in five phases, namely phase 0: rapid depolarization; phase 1: early rapid repolarization; phase 2: "plateau" phase; phase 3: final repolarization; and phase 4: resting membrane potential. (b) Schematic of human cardiac APs (upper row), ion channels (middle row), and major ion current flows (lower row). Nav1.5, cardiac Na+ channel conducting the Na+ current (INa); Kv4.3, K+ channel conducting the transient outward current (Ito); Cav1.2, L-type Ca2+ channel conducting the L-type Ca2+ current (ICa,L); Kv11.1, K+ channel conducting the rapidly activating, delayed outward rectifier K+ current (IKr); Kv7.1, slowly activating outward rectifier K+ current (IKs); Kir2.1, Kir2.3, inward rectifier K+ current (IK1).


In some cases, where an additional trabecula could be dissected for a separate experiment, more than one AP sequence per patient was recorded. For multiple recordings from the same patient, experiments were conducted either in parallel (but on different set-ups) or one after the other, with the remaining preparations stored in oxygenated Tyrode solution until use. For the class AF, 86 patients have a single recording, 52 patients have 2 recordings, and 4 patients have 3 recordings each, making a total of 202 time series instances. For the class SR, 179 patients have a single recording and 20 patients have 2 recordings each, making a total of 219 time series instances. Therefore, K = 202 AF time series instances were obtained from 142 AF patients and K = 219 SR time series instances were obtained from 199 SR patients.

In order to pump blood, all parts of the heart must contract in concert, and electrical excitation of the cardiomyocytes is a prerequisite for coordinated contraction. From the pacemaker cells in which the electrical activity is generated, APs propagate throughout the heart and trigger contraction [36].

Each AP exhibits the following distinct phases (as shown in Fig. 2a): the resting membrane potential (phase 4), rapid depolarization ("upstroke", phase 0), early rapid repolarization (phase 1), the "plateau" (phase 2), final repolarization (phase 3), and the return to the resting membrane potential (phase 4). Excitable cells possess an inside-negative resting membrane potential that is maintained by electrochemical gradients for the cations Na+ and K+ across the cell membrane, due to ion pumps as well as ion channels that open and close in a voltage-dependent and time-dependent manner (see Fig. 2b).

When the threshold for activation is reached by a small depolarization of the cell membrane, Na+ channels open rapidly and allow Na+ to enter the cell. The influx of positive charge further depolarizes the cell membrane. Repolarization is caused by an efflux of K+ through various K+ channels of different kinetics. An influx of Ca2+ via L-type Ca2+ channels triggers the release of further Ca2+ from intracellular stores, which is necessary for the activation of the contractile machinery. Thus, the shape of an AP is governed by the sum of all membrane currents, listed along the lower row of Fig. 2b, that flow through the various ion channels.

Atrial APs from SR patients have a characteristic "spike-and-dome" shape. During AF, the electrical activity becomes very fast and uncoordinated, leading to remodeling processes that change, amongst others, the expression of ion channels in the atrial cardiomyocytes and give rise to the typical triangular-shaped AP (see Fig. 3b; [17,37]).

3.1.2.2. AP dataset registration. AP time series instances are single event-triggered and acyclic, and therefore do not require segmentation for registration. All AF and SR instances are registered in time based on the instant at which the membrane potential of each time series peaks, i.e., the end of phase 0. The time window length of a time series from its peak membrane potential to the resting membrane potential actually constitutes a differentiating factor between the AF and SR classes due to the ion channel activities under investigation. Thus, in this study, AP registration is done with respect to the time course of the acquired signal, avoiding techniques such as re-sampling and dynamic time warping. Time domain alignment provides human-readability and brings localized time windows into prominence, thereby easing decision making in real-life practice. First, the peak membrane potentials of all time series instances are aligned in time. Then, the largest leftward and rightward supports are found for which each time series instance is guaranteed to have all corresponding sample points. For this particular AP dataset, the leftward and rightward supports turn out to be 24 and 4278 sample points, respectively. Adding the single sample point of the peak membrane potential, this amounts to an AP time series length of N = 4303 sample points for both classes.

Fig. 3. (a) Registered AP time series instances of AF (202 instances) and SR (219 instances) with 4303 sample points each. (b) Typical "spike-and-dome" shaped AP from right atrial tissue of an SR patient (top) and triangular-shaped AP from an AF patient (bottom).

Fig. 4. Identified PIPs from the class representative average time series of AF and SR classes.


The next step comprises registering the signal amplitudes. With respect to the AP dataset, a single cycle begins with the voltage level that indicates the resting membrane potential and ends with almost the same ground value after an AP is triggered. Given this fact, the peak membrane potential occurring at the end of the rapid depolarization phase, i.e., phase 0, and the resting membrane potential phase, i.e., phase 4, of the experiment are considered as reference points for the signal amplitude registration. Specifically, the registered time series $T_k$ is obtained as follows:

$$T_k(n) = \frac{S_k(n) - S_k(N)}{\max\{S_k(n)\} - S_k(N)} \qquad (14)$$
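As a brief illustration of Eq. (14), assuming the instance is already aligned in time so that its last sample corresponds to the resting membrane potential:

```python
import numpy as np

def register_ap_amplitude(s):
    """Amplitude registration of a time-aligned AP instance S_k (Eq. (14)):
    the resting membrane potential (last sample) maps to 0 and the peak
    membrane potential maps to 1."""
    s = np.asarray(s, dtype=float)
    rest = s[-1]                        # S_k(N): resting membrane potential
    return (s - rest) / (s.max() - rest)
```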

In Fig. 3a, all registered instances of the AF (blue) and SR (red) classes are shown.

The time series instances in Fig. 3a manifest substantial intra-class variability for both classes, an observation coherent with the study in Ref. [38].

3.1.2.3. WTC on AP dataset. An arbitrarily chosen set of 50 AF and 50 SR time series instances is used to extract features with the WTC method. The same set is also used for the brute-force and fast shapelet transform methods in the following section. A complementary set with 152 AF and 169 SR instances is spared to evaluate the classification performance of all three methods in Section 3.2.1.

The proposed WTC method is executed for the energy threshold parameter $\delta \in \{0.9900, 0.9990, 0.9999\}$ and the confidence level parameter $p \in \{0.95, 0.99\}$. However, only the results corresponding to $\delta = 0.9990$ and $p = 0.95$ are presented in this section; for these values of $\delta$ and $p$, WTC is shown in Section 3.2.1.1 to have peak performance in terms of classification accuracy. This selection of $\delta$ results in an oversampling ratio $r$ of 27.0629 and 21.4078 for AF and SR, respectively. Fig. 4 shows the selected PIPs from the class-representative average time series based on these values of $r$ for both the AF and SR classes.

Fig. 5. The weights $\alpha_j$, $\beta_j$ and $\gamma_j$ for SCBs in (a) AF and (b) SR.

Fig. 6. Color map plots for the average values of $Z_j$ for the classification set. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


The values of the localized time window length $w$, which corresponds to the maximum horizontal separation between adjacent PIPs, are 329 and 169 sample points (corresponding to 32.9 ms and 16.9 ms) for AF and SR, respectively. The window length determination step is immediately followed by the construction of confidence bands with a chosen confidence level of p = 0.95.

This is followed by segmenting the confidence bands, as suggested by the localized time windows $W_j$, into SCBs for each class. The calculated weights $\alpha_j$, $\beta_j$, and $\gamma_j$ for the corresponding $W_j$ are depicted in Fig. 5a and b for AF and SR, respectively.

For both classes, the results indicate that the distance-based weights $\beta_j$ are close to each other, implying that the rewards given based on the proximity to the class average for the corresponding $SCB_j$ exhibit a similar trend. Conversely, the trajectory-based weights $\alpha_j$ exhibit a more significant variation with respect to the variations in the $SCB_j$ and even decrease to zero for higher-indexed SCBs in both classes. This behavior indicates the failure of all time series instances to be completely encapsulated by the corresponding SCBs. The calculated weights $\gamma_j$ are much closer to 0 than to 1, indicating that they favor the trajectory-based component of the similarity score for both classes.

In order to visualize the predictive power of WTC in terms of $Z_j$, color map plots are presented in Fig. 6a and b for AF and SR, respectively.

Fig. 7. Top 100 brute-force shapelets extracted from AF and SR instances with (a, b) L = 50, (d, e) L = 300, (g, h) L = 400, and fast shapelets with (c) L = 50 from AF instances, (f) L = 300 and (i) L = 400 from SR instances, overlaid with the associated class representative average time series $T^A$.


These figures constitute the most interpretable output of WTC and show the similarity scores $Z_j$ calculated by averaging $Z_{j,k}$ over all AF and SR time series instances in the classification set. The more reddish tones imply regions with higher similarity, and thereby SCBs with higher predictive power. This observation coincides with the expected change in morphology for phases 2 and 3 of the AP signal (Fig. 2a) in the case of AF [39]. These figures also depict the double-sided confidence bands of the class representative average time series and the extracted $SCB_j$.

3.1.2.4. Benchmark methods on AP dataset. The following feature extraction results are obtained with the shapelet-based methods for the aforementioned subsets of the AP dataset (refer to Section 3.1.2.3). In order to achieve a meaningful level of parameter exploration for the brute-force shapelet transform in a reasonable time, the range for the shapelet length $L$ (in units of sample points) is quantized as $L \in \{50, 100, 150, 200, 250, 300, 350, 400\}$. Note that the chosen values for $L$ span a wider range than the values reached by the proposed WTC method for its time-localized window length $w$ with the aid of the DCT domain analysis. The default value of the Java implementation [15], which corresponds to 100, is used for the number of shapelets to be extracted. A set of extracted brute-force shapelets is shown in Fig. 7a, b, d, e, g and h along with the class average time series $T^A$ for both AF and SR.

This is followed by the execution of the fast shapelet transform. The fast shapelet procedure acts as an extension of the brute-force shapelet transform and builds an internal decision tree, termed a "fast shapelet tree", with nodes ordered by the information gain they individually offer. For fairness, $L$ takes the same range of values as for the brute-force shapelet transform. A set of extracted fast shapelets is depicted in Fig. 7c, f and i along with the class average time series $T^A$.

3.1.3. ECG dataset

In this section, three chest leads of an ECG dataset are examined to demonstrate the potential of the proposed WTC method from a clinical perspective. The ECG is highly clinically relevant and therefore constitutes one of the most studied biophysical signal types.

3.1.3.1. ECG dataset description. The instances of the ECG dataset used in this study are compiled from the Physikalisch-Technische Bundesanstalt DataBase (PTBDB), a publicly available repository of physiological signals in PhysioNet [23]. The PTBDB contains 549 12-lead ECG time series recordings from 290 unique patients. Each time series is digitized with a 1 kHz sampling rate and 16-bit resolution over a signal amplitude range of ±16.384 mV. The PTBDB contains instances with a wide variety of labels, including myocardial infarction, cardiomyopathy/heart failure, dysrhythmia, myocardial hypertrophy, and myocarditis. The largest subset of the PTBDB comprises time series instances belonging to patients diagnosed with acute myocardial infarction. There are identified differences in the expression of the P-QRS-T sequences of the ECG signal recorded from precordial leads during myocardial infarction [40]. Furthermore, it is known that the ECG contains noise originating from different sources [41–43]. Thus, this dataset is considered suitable to test the robustness of the proposed WTC method with respect to noise. Hereafter, the time series instances of this dataset are referred to as "MI" and "NR", corresponding to patients diagnosed with acute myocardial infarction and control patients with normal heart rhythms, respectively. In the present study, out of the 148 instances of MI patients, a sub-group of 40 unique patients (aged from 37 to 85 years with a mean age of 61.03 years; 22.50% female, 77.50% male) is randomly selected to obtain a balanced set relative to the 40 available NR patients (aged from 17 to 81 years with a mean age of 45.59 years; 25.00% female, 75.00% male).

3.1.3.2. ECG dataset registration. All MI and NR time series instances are registered for the proposed method. Each cyclic ECG time series instance is first segmented into multiple single-cycle P-QRS-T sequences, beginning from an S point to a subsequent point. In contrast to the acyclic recordings in the AP dataset, ECG recordings are self-repeating and their periodicities are determined by the heart rate of the patients. However, it is the signal variation among the P-QRS-T sequences, as opposed to the heart rate variability, that differentiates MI patients from NR patients. Hence, heart rate variability (i.e., the length of the time series from an S point to a subsequent point) is compensated by a signal processing technique termed re-sampling. Re-sampling essentially warps the specified time series, which in this specific case is a single-cycle ECG signal. Each single-cycle P-QRS-T time series is re-sampled in order to extend its length to that of the longest one. Let $S_k(n)$ denote the re-sampled single-cycle P-QRS-T time series; the registered time series $T_k(n)$ is obtained by normalizing its amplitude as follows:

$$T_k(n) = \frac{S_k(n) - \min\{S_k(n)\}}{\max\{S_k(n)\} - \min\{S_k(n)\}} \qquad (15)$$
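A short sketch of this registration step, pairing FFT-based re-sampling with the min-max normalization of Eq. (15); `target_len` stands for the length of the longest single-cycle sequence in the dataset and is a parameter of this example.

```python
import numpy as np
from scipy.signal import resample

def register_ecg_cycle(cycle, target_len):
    """Re-sample a single P-QRS-T cycle to a common length, then apply the
    min-max amplitude normalization of Eq. (15)."""
    s = resample(np.asarray(cycle, dtype=float), target_len)
    return (s - s.min()) / (s.max() - s.min())
```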

The three precordial leads v2, v3, and v4 are examined separately. Fig. 8a and b show $T_k(n)$ of the MI and NR classes, respectively, for lead v2.

Fig. 8. Registered and annotated ECG lead v2 time series instances of (a) MI (40 instances) and (b) NR (40 instances), each with 1308 sample points.


3.1.3.3. WTC on ECG dataset. Out of the 40 unique MI (NR) patients, 25 are reserved for the feature extraction phase, leaving 15 patients to evaluate the classification performance in Section 3.2.2. For the feature extraction set, a single P-QRS-T sequence is taken from each patient, resulting in 25 time series instances. However, for the classification set, multiple (either 3 or 4) sequences are collected from each patient to reach a total of 50 time series instances.

For all leads of the ECG dataset, the same sets of parameter values $\delta \in \{0.9900, 0.9990, 0.9999\}$ and confidence level $p \in \{0.95, 0.99\}$ as those of the AP dataset in Section 3.1.2.3 are used. Similarly, the results presented in this section are limited to those of $\delta = 0.9990$ and $p = 0.95$, which are shown to yield the best classification accuracy results in Section 3.2.2.1. Accordingly, Fig. 9a, b, and c show the selected PIPs in conjunction with the class-representative average time series for both the MI (red) and NR (blue) classes for leads v2, v3, and v4, respectively.

The resulting similarity scores $Z_j$ are shown in the form of color map plots for both MI and NR and for the three studied leads in Fig. 10a through f. These figures depict the degree of descriptiveness of the WTC method within each time window, as discussed in Section 3.1.2.3. The obvious difference in the thickness of the confidence bands compared to those of the AP dataset indicates a higher variability of amplitude among the ECG instances.

Thus, WTC forms intra-class similarity along the temporal dimension for each SCB and points out morphological differences between the time series classes of interest. The prevalent cardiological literature indicates that the earliest signs of acute MI include an increased T-wave amplitude (defined as "hyper-acute") over the affected area. These are termed hyper-acute T-waves and are most evident in the anterior precordial chest leads [44] (as shown in Fig. 11a with respect to the aforementioned MI behavior in leads v2, v3, and v4). Additionally, the MI SCBs around the T-wave (Fig. 10a, c and e) possess a relatively deeper red color. This observation aligns with the characteristic hyper-acute T-wave pattern that is expected to appear in the early phases of MI and is accompanied by a loss of the R-wave amplitude in the anterior chest leads v2, v3, and v4 [21,45].

3.1.3.4. Benchmark methods on ECG dataset. The following shapelet-based executions are performed using the aforementioned feature extraction set comprising 25 MI and 25 NR time series instances for each of the leads v2, v3, and v4 separately.

Fig. 9. Identified PIPs from the MI and NR class representative average time series for the (a) v2, (b) v3, and (c) v4 leads of the ECG dataset.


Fig. 10. Color map plots for the average values of $Z_j$ of leads v2, v3 and v4 for (a, c, e) MI and (b, d, f) NR. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


The desired shapelet length is sampled in the same manner as in Section 3.1.2.4 while obtaining results for the brute-force and fast shapelet transforms, and the number of shapelets to be extracted is limited to 100. For compactness, a limited set of extracted brute-force and fast shapelets from lead v2 is shown in Fig. 12.

3.2. Performance evaluation

A series of classifiers is used to compare the proposed WTC and the benchmark shapelet-based methods for the AP and ECG datasets in Sections 3.2.1 and 3.2.2, respectively. The following methods within the WEKA framework [46] are selected to maintain a broad coverage of the available classifiers: the well-known Naive Bayes classifier, J48 pruned tree (an implementation of the well-known C4.5 algorithm), random decision forest [47], adaptive boosting (AdaBoost.M1) [48], classification via regression (employing a type of decision tree with linear regression functions at the leaves [49]), bagging [50], multi-boosting (MultiBoostAB) [51], locally weighted learning (LWL) [52], partial decision tree classifier (PART) [53], ensemble of nested dichotomies (END) [54], decision stump [55], simple classification and regression tree (CART) [56], a proprietary algorithm termed "ranking instances by maximizing the area under the ROC curve (RIMARC)" [57], Bayesian network learning, dagging [58], random subspace method [59], decision table majority classifier, ripple-down rule learner (RIDOR) [60], alternating decision tree (ADTree) [61], and random tree construction and multi-class alternating decision tree with a logit-boost strategy (LADTree). The similarity scores $Z_{j,k}$ and the shapelet distances to the time series instances constitute the transformed feature vectors for WTC and the benchmark methods, respectively. The classification accuracy results are obtained with 10-fold cross validation.
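As an illustration of this evaluation protocol, the snippet below runs 10-fold cross-validation on a transformed feature matrix with scikit-learn; this is only a stand-in for the WEKA classifiers actually used in the study, with a random forest chosen as one representative of the listed methods.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def evaluate_features(X, y, seed=0):
    """10-fold cross-validated accuracy for a transformed feature matrix X
    (similarity scores Z_{j,k} for WTC, or shapelet distances for the
    benchmarks) and class labels y."""
    clf = RandomForestClassifier(random_state=seed)
    return cross_val_score(clf, X, y, cv=10, scoring='accuracy').mean()
```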

3.2.1. AP dataset

3.2.1.1. Classification performance of WTC on AP dataset. In order to assess its performance, the proposed WTC method is executed for the Cartesian product of the parameter sets $\delta \in \{0.9900, 0.9990, 0.9999\}$ and $p \in \{0.95, 0.99\}$. The resulting classification accuracies based on the feature extraction of WTC are tabulated in Table 2 for the spared classification set of 152 AF and 169 SR instances and for the various classifiers offered by WEKA. With the exception of a few entries in Table 2, WTC yields accuracies within 5% of each other, which underlines the relative insensitivity of WTC to its parameters. The success rate of WTC exceeds 94% for the majority of the cases. The highest average classification accuracy is achieved by the parameter pair $\delta = 0.9990$ and $p = 0.95$.

The proposed WTC method is self-sufficient in predicting a time-localized window length $w$. However, classification results are also obtained by artificially enforcing a window length. In this specific case, SCB extraction is performed by "by-passing" the prediction step for $w$. Table 3 lists the accuracies for variations in the enforced SCB lengths and for the selected classifiers.

A comparison of Tables 2 and 3 for the columns corresponding to ($\delta = 0.9990$, $p = 0.95$) and w = 200 (also w = 150), respectively, reveals the success of the PIP and DCT-based heuristics pursued by the proposed WTC method to determine the value of the time-localized window length $w$.

3.2.1.2. Classification performances of benchmark methods on AP dataset. The individual and average classification accuracies of the brute-force shapelet transform for varying values of the shapelet length $L$ and the selected classifiers are given in Table 4. The brute-force shapelet transform is shown to yield varying classification accuracies for variations in $L$, irrespective of the deployed classification method. Shapelet length L = 400 yields the most favorable ensemble average accuracy level (93.057%), and the corresponding first 100 brute-force shapelets are already shown in Fig. 7g and h along with the class average time series $T^A$ for both AF and SR. The figures reveal that the selected shapelets from different instances substantially overlap with each other for both classes. This observation is also interpreted as a redundancy in the computations, since the majority of the subsequences correspond to a narrow time interval.

The resulting classification accuracies of the fast shapelet transform are presented in Table 5. The distance values obtained from the "fast shapelet tree" are considered as inputs for the listed classifiers of WEKA. As shown in Table 5, the ensemble average classification accuracy (90.269%) peaks at L = 300, for which the single discovered shapelet among the instances of SR is shown in Fig. 7f.

Based on the results in Tables 2, 3 and 5, the classification accuracies calculated for WTC exceed those of the fast shapelet transform by approximately 4%. Conversely, despite the exhaustive search with an impractical computational complexity, the peak ensemble average classification accuracy of the brute-force shapelet transform (93.057%) for L = 400 (shown in Table 4) is slightly lower than that of WTC for the enforced case with w = 200 (also for w = 150) (94.125%) as well as for the non-enforced, i.e., the original case (94.333%).

3.2.2. ECG dataset

3.2.2.1. Classification performance of WTC on ECG dataset. The energy threshold $\delta$ and the confidence level $p$ parameters are explored within the same range of values as in Section 3.2.1.1 to determine the pair that yields the highest average classification accuracy with respect to the aforementioned 21 classifiers within WEKA.

Fig. 11. (a) Typical "hyper-acute" T-waves in ECG recordings collected from precordial leads v2, v3, and v4, indicating the condition of MI during the early stages (record id: patient036 s0111lre), and (b) waveforms from the corresponding leads of a control patient (record id: patient116 s0302lre). Each unit cell represents 0.2 s in time and 0.5 mV in amplitude. (Source: PhysioBank ATM).


Due to space limitations, only the classification accuracies for lead v2 are presented in Table 6 for variations in $\delta$ and $p$.

Table 6 reveals that the average accuracy for lead v2 of the ECG dataset peaks (98.095%) at the same set of parameter values as for the AP dataset in Section 3.2.1. Therefore, $\delta = 0.9990$ and $p = 0.95$ are also fixed for leads v3 and v4. For completeness, the corresponding classification results are listed in Table 7 for both of these leads.

3.2.2.2. Classification performances of benchmark methods on ECG dataset. The individual accuracies achieved by the chosen set of WEKA classifiers on lead v2 using features extracted by the brute-force and fast shapelet transform methods are listed in Tables 8 and 9, respectively. As shown in Tables 8 and 9, the classification success of both the brute-force and fast shapelet transforms is highly sensitive to $L$. Shapelet length L = 300 yields the highest ensemble average accuracy (94.381%) for the brute-force shapelet transform. The first 100 shapelets with L = 300 on lead v2 are already depicted in Fig. 12g and h, overlaid with the class average time series $T^A$ for MI and NR, respectively.

Fig. 12. Top 100 brute-force shapelets extracted from MI and NR instances with (a, b) L = 50, (d, e) L = 150, (g, h) L = 300, and fast shapelets selected only among MI instances with (c) L = 50, (f) L = 150 and (i) L = 300, overlaid with the associated class representative average time series $T^A$ of lead v2.


Fig. 12f also shows the fast shapelets with L = 150, which yield the highest ensemble average accuracy level (95.143%). For completeness, the resulting classification accuracies obtained for leads v3 and v4 are presented in Tables 10 through 13 for the benchmark methods, noting that observations similar to those for lead v2 are made.

Considering the classification accuracies of the proposed WTC method presented in Section 3.2.2.1, it is apparent that the brute-force and fast shapelet transforms yield relatively lower accuracies for all three leads of the ECG dataset. The brute-force and fast shapelet transform methods attain accuracies of (94.381%, 95.190%, 95.190%) and (95.143%, 94.667%, 88.286%) for leads (v2, v3, v4), respectively, and these are exceeded by those of the proposed WTC method, which correspond to (98.095%, 98.429%, 96.095%).

3.3. Statistical evaluation

Evaluation of the classification performance of the proposed WTC method is concluded by presenting a critical difference (CD) diagram [62]. The CD diagram is a representation that enables statistical evaluation of multiple methods over multiple datasets. Briefly, the methods of interest are ranked for each dataset to obtain corresponding rank vectors, which are subsequently averaged to determine average ranks. Methods whose rank differences exceed the value of CD are termed “critically different”. The CD is calculated as follows:

CD = q_{\alpha} \sqrt{\frac{F(F + 1)}{6R}},   (16)

where F, R, and qα denote the number of methods, the number of datasets, and the “critical value”, respectively. In the present study, F = 3 methods, R = 4 datasets, and q0.1 = 2.052 are used for a two-tailed Nemenyi test. Further details on the CD diagram are available in Ref. [62]. The analysis in the present study involves ranking the methods based on their average classification accuracies considered over the 21 classification algorithms.
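As a quick numerical check (not part of the original analysis), Eq. (16) can be evaluated directly with the values stated above; a minimal sketch in Python:

```python
import math

# Worked check of Eq. (16) with the values reported in the text:
# F = 3 compared methods, R = 4 datasets, and the two-tailed Nemenyi
# critical value q_0.1 = 2.052. Two methods are "critically different"
# only if their average ranks differ by more than CD.
F, R, q_alpha = 3, 4, 2.052
CD = q_alpha * math.sqrt(F * (F + 1) / (6 * R))
print(f"CD = {CD:.3f}")  # approximately 1.451
```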

Table 2
Individual and ensemble average classification accuracies of the proposed WTC method applied to the AP dataset for different values of the energy threshold δ, the confidence level p, and the selected classifiers. The highest average classification accuracy is shown in boldface.

                      δ = 0.9900            δ = 0.9990            δ = 0.9999
                      p = 0.95   p = 0.99   p = 0.95   p = 0.99   p = 0.95   p = 0.99
Naive Bayes           74.766%    81.620%    92.212%    92.212%    89.720%    90.031%
J48                   91.589%    92.212%    93.770%    93.458%    93.458%    94.081%
Random forest         93.770%    93.770%    94.704%    94.704%    93.146%    94.081%
AdaBoost.M1           93.146%    93.146%    95.639%    95.639%    95.639%    95.950%
Classif. via regr.    95.327%    95.327%    95.327%    95.016%    93.458%    93.146%
Bagging               94.393%    94.393%    94.704%    94.704%    94.704%    94.704%
MultiBoostAB          93.458%    93.458%    94.704%    94.704%    94.704%    94.704%
LWL                   90.654%    90.654%    94.081%    94.704%    94.393%    94.704%
PART                  94.081%    94.081%    95.639%    95.639%    94.081%    93.770%
END                   91.900%    92.212%    93.458%    93.146%    92.835%    93.458%
Decision stump        90.654%    90.654%    95.016%    95.016%    95.016%    95.016%
Simple CART           92.835%    92.835%    94.704%    94.704%    94.393%    94.704%
RIMARC                94.081%    94.081%    95.016%    95.016%    94.704%    94.704%
Bayes NET             92.835%    92.835%    93.458%    93.458%    94.081%    94.081%
Dagging               94.393%    93.770%    94.704%    94.081%    92.835%    94.704%
Random SubSpace       93.770%    93.458%    93.146%    94.081%    94.081%    93.458%
Decision Table        92.523%    93.770%    92.523%    92.212%    93.146%    93.458%
Ridor                 92.523%    92.523%    94.704%    94.393%    93.770%    94.081%
ADTree                94.393%    94.081%    94.393%    94.393%    95.016%    95.016%
LAD Tree              93.146%    93.458%    94.393%    94.393%    95.016%    95.327%
Random Tree           90.343%    91.277%    94.704%    92.523%    93.146%    90.966%
AVERAGE               92.123%    92.553%    94.333%    94.200%    93.873%    94.007%

Table 3

Individual and ensemble average classification accuracies of the proposed WTC method applied to the AP dataset given variations in the enforced SCB length, w, and the selected classifiers. The highest average classification accuracy is shown in boldface.

                      w = 50     w = 100    w = 150    w = 200    w = 250    w = 300    w = 350    w = 400
Naive Bayes           91.277%    90.966%    92.212%    91.589%    91.277%    90.654%    90.654%    90.031%
J48                   93.770%    90.654%    93.458%    93.770%    94.081%    94.393%    93.458%    93.146%
Random forest         94.081%    94.704%    94.393%    94.393%    94.081%    93.770%    94.081%    93.458%
AdaBoost.M1           94.704%    95.016%    95.950%    94.393%    95.639%    94.393%    96.262%    95.327%
Classif. via regr.    94.393%    95.016%    95.950%    96.885%    95.327%    95.327%    95.950%    95.639%
Bagging               94.704%    95.016%    95.016%    94.704%    94.393%    94.704%    94.393%    94.393%
MultiBoostAB          95.016%    95.016%    94.704%    95.016%    94.704%    94.704%    93.770%    93.458%
LWL                   93.770%    94.704%    93.770%    94.393%    92.835%    94.081%    91.589%    91.900%
PART                  93.146%    92.212%    91.589%    91.900%    94.081%    92.835%    94.704%    94.704%
END                   93.458%    90.031%    93.458%    93.458%    94.081%    94.704%    93.146%    93.146%
Decision stump        94.393%    95.016%    93.770%    95.016%    94.081%    95.016%    91.900%    91.900%
Simple CART           95.327%    95.016%    94.081%    93.770%    93.458%    96.262%    93.146%    94.704%
RIMARC                95.327%    95.327%    95.327%    95.327%    95.327%    95.327%    95.327%    95.327%
Bayes NET             94.081%    93.770%    94.393%    94.081%    94.081%    94.081%    94.393%    94.081%
Dagging               93.458%    92.212%    94.393%    94.081%    91.900%    93.146%    92.835%    92.835%
Random SubSpace       93.770%    93.770%    94.393%    94.081%    93.770%    93.458%    94.393%    94.081%
Decision Table        94.081%    91.277%    90.343%    92.212%    93.146%    93.770%    92.523%    94.393%
Ridor                 94.081%    91.900%    95.016%    94.704%    94.393%    94.081%    94.081%    94.393%
ADTree                93.146%    95.016%    96.573%    95.327%    94.704%    94.393%    94.393%    94.393%
LAD Tree              94.704%    94.704%    95.639%    94.704%    93.770%    94.704%    95.016%    94.393%
Random Tree           92.835%    90.654%    92.212%    92.835%    94.704%    92.212%    92.835%    92.835%
AVERAGE               93.977%    93.428%    94.125%    94.125%    93.992%    94.096%    93.755%    93.740%


The shapelet-based benchmark methods are favored by selecting their “best” L, i.e., the value that yields the highest average classification accuracy for each dataset, as tabulated in Table 14.

The resulting CD diagram is shown in Fig. 13. The horizontal axis in the diagram represents the average rank of each feature extraction method; the average rank based on classification accuracy improves from left to right. The CD is marked above the axis and indicates the difference in average rank required for two compared methods to be termed “critically different”. The connecting blue lines in the diagram depict groups of methods that are not critically different. With respect to the datasets of the present study, the performance of the proposed WTC method significantly exceeds that of the fast shapelet transform. The average rank of WTC also exceeds that of the brute-force shapelet transform; however, the difference does not exceed the CD, so the two methods are not critically different.
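For concreteness, a hedged sketch of the ranking step behind the CD diagram is given below. The accuracy matrix is assembled from the ensemble average accuracies quoted in this section (AP and ECG leads v2–v4) and serves only to illustrate the procedure, not to reproduce Table 14 exactly:

```python
import numpy as np
from scipy.stats import rankdata

# Hedged sketch of the ranking step behind the CD diagram. Rows are the four
# datasets (AP, ECG v2, ECG v3, ECG v4); columns are the ensemble average
# accuracies quoted in this section for [WTC, brute-force shapelet, fast shapelet].
acc = np.array([
    [94.333, 93.057, 90.269],   # AP
    [98.095, 94.381, 95.143],   # ECG lead v2
    [98.429, 95.190, 94.667],   # ECG lead v3
    [96.095, 95.190, 88.286],   # ECG lead v4
])
ranks = np.vstack([rankdata(-row) for row in acc])  # rank 1 = highest accuracy
print(ranks.mean(axis=0))  # average rank of each method across the datasets
```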

Finally, the robustness aspect of WTC is demonstrated in a statistical sense. The robustness of WTC is attributed to the averaging step taken in obtaining TA, the class representative average time series from which the local time window length w is extracted. Making the computation of w robust against intra-class subject variability and noise is vital to the success of WTC.

In order to evaluate the robustness of WTC, random subsets are generated to obtain the corresponding w for each class of each dataset presented in this study, following the procedure described in Section 2.2.1. The sizes of the random subsets are chosen to be the same as those used in Section 3.1, namely 50 and 25 for the AP and ECG datasets, respectively. A total of 100 random subsets are generated for each class of each dataset. As a measure of robustness, the coefficient of variation (CoV) of the random variable w, denoted by Vw, is chosen. Vw is defined as follows:

V_w = \frac{\sigma_w}{\mu_w},   (17)

where σw and μw denote the standard deviation and the mean of w, respectively.
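For illustration only, Eq. (17) amounts to the following minimal sketch; the array of window lengths and its values are hypothetical placeholders rather than data from the study:

```python
import numpy as np

# Hypothetical sketch of Eq. (17): w_values stands for the window lengths w
# obtained from the random subsets of one class; the numbers below are
# placeholders, not values from the study.
w_values = np.array([198.0, 201.0, 199.0, 202.0, 200.0])

V_w = np.std(w_values, ddof=1) / np.mean(w_values)  # CoV: sigma_w / mu_w
print(f"V_w = {V_w:.4f}")
```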


Table 4
Individual and ensemble average classification accuracies of the brute-force shapelet transform applied to the AP dataset for variations in the shapelet length, L, and the selected classifiers. The highest average classification accuracy is shown in boldface.

                      L = 50     L = 100    L = 150    L = 200    L = 250    L = 300    L = 350    L = 400
Naive Bayes           74.766%    86.293%    89.408%    90.654%    91.900%    92.835%    93.146%    93.458%
J48                   85.358%    87.851%    90.966%    89.408%    94.393%    93.770%    92.835%    92.523%
Random forest         86.293%    90.654%    93.458%    92.835%    94.393%    95.639%    94.393%    94.704%
AdaBoost.M1           81.308%    88.785%    90.966%    91.277%    94.081%    94.081%    94.704%    95.016%
Classif. via regr.    83.489%    90.654%    90.966%    92.835%    95.327%    94.704%    94.081%    95.327%
Bagging               87.851%    90.654%    92.212%    92.212%    94.393%    92.835%    93.146%    93.770%
MultiBoostAB          84.424%    88.474%    91.589%    92.212%    94.393%    94.081%    95.016%    94.704%
LWL                   81.620%    81.932%    88.474%    87.851%    91.277%    87.227%    87.227%    90.031%
PART                  81.308%    90.031%    90.654%    89.408%    94.081%    92.212%    93.458%    93.770%
END                   85.358%    87.851%    90.966%    89.408%    94.393%    93.770%    92.835%    92.523%
Decision stump        81.620%    81.620%    88.785%    88.162%    88.785%    89.097%    90.654%    89.097%
Simple CART           84.112%    86.916%    91.277%    89.097%    94.081%    91.900%    92.835%    92.835%
RIMARC                83.801%    90.654%    92.523%    91.900%    91.900%    93.458%    94.393%    94.393%
Bayes NET             82.866%    89.097%    90.966%    90.654%    90.031%    91.900%    91.589%    93.146%
Dagging               83.489%    92.212%    93.146%    92.523%    95.016%    95.950%    94.393%    95.950%
Random SubSpace       80.685%    88.785%    91.589%    90.966%    90.343%    91.589%    91.589%    93.458%
Decision Table        67.601%    72.586%    79.128%    78.505%    83.801%    77.882%    80.685%    83.178%
Ridor                 84.424%    92.523%    90.966%    89.097%    94.393%    92.212%    94.704%    92.835%
ADTree                83.489%    89.720%    91.589%    91.277%    94.081%    95.016%    95.327%    96.573%
LAD Tree              87.851%    87.851%    90.966%    91.277%    94.393%    93.770%    94.704%    93.146%
Random Tree           80.685%    88.474%    88.785%    89.408%    90.031%    90.654%    92.835%    93.770%
AVERAGE               82.495%    87.791%    90.447%    90.046%    92.642%    92.123%    92.598%    93.057%

Table 5

Individual and ensemble average classification accuracies of the fast shapelet transform applied to the AP dataset for variations in the shapelet length, L, and the selected classifiers. The highest average classification accuracy is shown in boldface.

                      L = 50     L = 100    L = 150    L = 200    L = 250    L = 300    L = 350    L = 400
Naive Bayes           80.062%    81.308%    85.358%    86.604%    88.474%    90.654%    92.523%    88.474%
J48                   78.193%    82.555%    86.293%    87.851%    88.474%    90.031%    90.966%    87.227%
Random forest         72.274%    82.866%    79.751%    81.620%    82.866%    87.227%    90.031%    78.816%
AdaBoost.M1           77.882%    80.997%    86.916%    87.851%    88.785%    90.654%    90.031%    86.604%
Classif. via regr.    79.439%    79.751%    85.358%    85.981%    89.720%    90.031%    90.966%    88.474%
Bagging               78.193%    84.112%    86.293%    87.227%    89.097%    91.277%    89.720%    87.851%
MultiBoostAB          77.570%    82.555%    87.851%    88.474%    88.785%    90.654%    89.720%    86.916%
LWL                   78.816%    80.997%    85.047%    87.851%    88.785%    90.031%    89.408%    87.227%
PART                  77.259%    82.555%    85.981%    87.851%    88.474%    90.343%    91.589%    87.227%
END                   78.193%    82.555%    86.293%    87.851%    88.474%    90.654%    90.966%    87.227%
Decision Stump        78.816%    78.505%    87.227%    87.851%    88.785%    90.654%    90.343%    86.293%
Simple CART           78.505%    81.620%    86.604%    84.424%    89.097%    90.031%    90.343%    87.227%
RIMARC                80.062%    85.981%    87.227%    88.162%    90.654%    91.589%    93.458%    89.720%
Bayes NET             76.324%    82.866%    86.916%    87.851%    88.785%    90.654%    89.408%    86.604%
Dagging               78.505%    78.816%    78.193%    86.604%    89.408%    90.031%    90.654%    86.293%
Random SubSpace       77.570%    71.651%    86.916%    87.851%    88.785%    90.654%    84.735%    86.604%
Decision Table        78.816%    81.932%    86.916%    87.851%    88.785%    90.654%    88.474%    86.604%
Ridor                 77.570%    82.243%    83.489%    85.358%    87.227%    90.031%    88.162%    85.670%
ADTree                77.259%    82.866%    84.735%    87.227%    88.474%    91.277%    90.654%    87.539%
LAD Tree              75.389%    82.243%    85.047%    85.981%    87.227%    91.277%    90.654%    87.851%
Random Tree           71.963%    80.685%    78.505%    81.620%    82.555%    87.227%    89.097%    79.128%
AVERAGE               77.555%    81.412%    85.091%    86.664%    88.177%    90.269%    90.090%    86.456%


The resulting values of Vw are compared with those of a similar experiment in which w is extracted directly from the time series instances, omitting the averaging step. In that case, the number of values for w is equal to the number of instances; there exist a total of 202 AF and 219 SR instances in the AP dataset and 75 instances each for the MI and NR classes of the studied ECG dataset. As evident from Eq. (17), the CoV normalizes the standard deviation by the mean and is thus a suitable measure for comparing the standard deviations of two random variables with different mean values. As seen from Table 15, the averaging step considerably decreases the CoV of w, which implies an improved level of robustness for WTC.

4. Discussion

In this study, a method termed WTC is proposed to extract features from signals in the form of time series. WTC is applied to a cardiac AP dataset with labeled SR and AF patients and to three precordial leads of an ECG dataset consisting of control subjects and patients diagnosed with acute MI. The feature vectors extracted from these datasets are then examined for their classification accuracies, which yield favorable results. Subsequently, the brute-force and fast shapelet transforms are used to compare the performance of WTC in terms of predictive accuracy and computational complexity. Critical difference (CD) analysis performed on the datasets reveals that WTC is “critically better” than the fast shapelet transform. Although WTC and the brute-force shapelet transform are not “critically different”, the performance of the former is slightly better.

Analog signal acquisition and digitization involve thermal and quantization noise, and biophysical signals are no exception in this respect. In addition to thermal noise, motion artifacts originating from electrode, probe, sensor, and equipment movements, and even from patient movements, introduce disturbances into the desired signal. Furthermore, almost all biophysical signals exhibit a certain degree of individualized polymorphism among instances, as noted in Ref. [63] for ECGs, and this may hamper interpretation. To mitigate noise-originated effects and variability among individual instances, WTC constructs the class representative time series as an ensemble average of all instances sharing a common class label and devises a mean trajectory that the population statistically follows within a confidence bound. Hence, as opposed to directly extracting features from the instances themselves, WTC uses a class representative trajectory for this purpose. The study findings indicate that this approach yields favorable results for the AP and ECG datasets, the latter of which is recognized as a signal that is usually contaminated with high levels of noise [41–43].
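The ensemble-averaging idea described above can be sketched as follows. This is a simplified illustration assuming equally long, time-aligned instances; the function name is hypothetical, and the t-distribution band is one possible choice for the confidence bound rather than necessarily the exact procedure used here:

```python
import numpy as np
from scipy import stats

# Hedged sketch: build a class-representative average trajectory and a
# pointwise confidence band from all instances sharing one class label.
# 'instances' is a hypothetical (n_instances x n_samples) array of aligned,
# equally long time series; p is the confidence level (e.g., 0.95).
def class_average_with_band(instances, p=0.95):
    instances = np.asarray(instances, dtype=float)
    n = instances.shape[0]
    ta = instances.mean(axis=0)                        # ensemble average, T_A
    sem = instances.std(axis=0, ddof=1) / np.sqrt(n)   # pointwise standard error
    margin = stats.t.ppf(0.5 + p / 2, df=n - 1) * sem  # two-sided t-band
    return ta, ta - margin, ta + margin
```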

The DCT representation of the time series, in conjunction with the extraction of perceptually important points (PIPs), constitutes a central part of the proposed WTC method since it allows the determination of a proper window length for local features. Briefly, the DCT is utilized as a robust heuristic that feeds a cut-off percentage used to select the PIPs from the class representative average time series. Additionally, the DCT is a widely used digital processing technique [64] that serves the purpose of eliminating redundancy in the processed data. It is noted that WTC not only determines the time windows but also assigns scores to them, indicating an order based on their descriptive power.
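As a hedged illustration of the energy-threshold idea, the sketch below keeps the smallest number of leading DCT coefficients whose cumulative energy reaches δ; the function name is hypothetical, and the way WTC maps this cut-off to PIP selection is simplified here:

```python
import numpy as np
from scipy.fft import dct

# Hedged sketch of a DCT energy-threshold heuristic: return the number of
# leading DCT coefficients whose cumulative energy first reaches delta.
def dct_cutoff(ts, delta=0.9990):
    coeffs = dct(np.asarray(ts, dtype=float), norm="ortho")
    energy = np.cumsum(coeffs ** 2) / np.sum(coeffs ** 2)
    return int(np.searchsorted(energy, delta) + 1)

# Example on a synthetic, illustrative trajectory (not data from the study)
t = np.linspace(0.0, 1.0, 500)
ta = np.exp(-5.0 * t) + 0.01 * np.sin(40.0 * np.pi * t)
print(dct_cutoff(ta, delta=0.9990))
```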

The presented results from completely different datasets reveal that the values of the energy threshold (δ) and the confidence level (p) parameters that yield favorable results for the WTC method are almost the same.

Table 6
Individual and ensemble average classification accuracies of the proposed WTC method applied to lead v2 of the ECG dataset for different values of the energy threshold δ, the confidence level p, and the selected classifiers. The highest average classification accuracy is shown in boldface.

                      δ = 0.9900              δ = 0.9990              δ = 0.9999
                      p = 0.95    p = 0.99    p = 0.95    p = 0.99    p = 0.95    p = 0.99
Naive Bayes           95.000%     99.000%     96.000%     95.000%     96.000%     94.000%
J48                   97.000%     97.000%     98.000%     98.000%     98.000%     98.000%
Random forest         100.000%    100.000%    100.000%    99.000%     100.000%    99.000%
AdaBoost.M1           98.000%     98.000%     98.000%     98.000%     100.000%    100.000%
Classif. via regr.    99.000%     99.000%     100.000%    100.000%    94.000%     93.000%
Bagging               100.000%    99.000%     97.000%     97.000%     97.000%     97.000%
MultiBoostAB          100.000%    99.000%     98.000%     98.000%     98.000%     98.000%
LWL                   93.000%     95.000%     97.000%     96.000%     99.000%     99.000%
PART                  97.000%     97.000%     98.000%     98.000%     98.000%     98.000%
END                   97.000%     97.000%     98.000%     98.000%     98.000%     98.000%
Decision stump        93.000%     93.000%     93.000%     93.000%     93.000%     93.000%
Simple CART           99.000%     99.000%     100.000%    100.000%    100.000%    100.000%
RIMARC                100.000%    100.000%    100.000%    100.000%    100.000%    100.000%
Bayes NET             100.000%    100.000%    98.000%     98.000%     98.000%     98.000%
Dagging               91.000%     94.000%     94.000%     94.000%     96.000%     90.000%
Random SubSpace       96.000%     96.000%     99.000%     98.000%     98.000%     98.000%
Decision Table        98.000%     99.000%     98.000%     97.000%     99.000%     99.000%
Ridor                 99.000%     99.000%     100.000%    100.000%    100.000%    100.000%
ADTree                99.000%     99.000%     100.000%    100.000%    100.000%    100.000%
LAD Tree              99.000%     99.000%     100.000%    100.000%    100.000%    100.000%
Random Tree           98.000%     96.000%     98.000%     86.000%     92.000%     95.000%
AVERAGE               97.524%     97.810%     98.095%     97.286%     97.810%     97.476%

Table 7

Individual and ensemble average classification accuracies of the WTC method applied to leads v3 and v4 of the ECG dataset for the selected classifiers. The highest average classification accuracy is shown in boldface.

                      Lead V3     Lead V4
Naive Bayes           92.000%     96.000%
J48                   98.000%     97.000%
Random forest         99.000%     99.000%
AdaBoost.M1           99.000%     97.000%
Classif. via regr.    99.000%     94.000%
Bagging               99.000%     95.000%
MultiBoostAB          99.000%     94.000%
LWL                   99.000%     97.000%
PART                  98.000%     97.000%
END                   98.000%     97.000%
Decision stump        99.000%     94.000%
Simple CART           99.000%     95.000%
RIMARC                99.000%     99.000%
Bayes NET             99.000%     94.000%
Dagging               93.000%     94.000%
Random SubSpace       99.000%     93.000%
Decision Table        99.000%     97.000%
Ridor                 100.000%    98.000%
ADTree                100.000%    99.000%
LAD Tree              100.000%    99.000%
Random Tree           100.000%    93.000%
AVERAGE               98.429%     96.095%
