Paroxysmal Atrial Fibrillation Diagnosis Based on Feature Extraction and Classification

B. Pourbabaee, C. Lucas

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Montreal, QC, Canada, May 2-5, 2010. 978-1-4244-6766-2/10/$26.00 © 2010 IEEE

Abstract—Paroxysmal atrial fibrillation (PAF), a potentially life-threatening condition, results from irregular and repeated depolarization of the atria. In this paper, PAF patients and their different ECG episodes are detected by extracting statistical and morphological features from ECG signals and classifying them with an artificial neural network (ANN), a Bayes optimal classifier, and a k-nearest neighbor (k-NN) classifier. With this approach we correctly diagnose about 93% of PAF patients among healthy cases, and we distinguish the different episodes of their ECG signals, those far from the PAF episode and those immediately before it, with correct classification rates (CCR) above 90%.

I. INTRODUCTION

Electrocardiogram (ECG) signals reflect heart activity, and physicians diagnose different heart disorders by analyzing them. The signal is composed of four parts, the P-wave, QRS complex, T-wave and U-wave, which are indicated in Fig. 1 [1].

Fig. 1. ECG signal

Automated approaches are now widely used to detect different heart disorders, predict the onset of a heart attack, and classify arrhythmias. Atrial fibrillation (AF) is one of the most common cardiac arrhythmias; it causes irregular and repeated depolarization of the atria, raising the atrial rate above 400 beats per minute, and it carries serious risks of mortality and stroke [2]. Chronic AF is usually preceded by paroxysmal atrial fibrillation (PAF). Therefore, in addition to prescribing antiarrhythmic drugs, physicians are developing pacing devices to suppress the onset of AF [3]. Many researchers have worked on screening for and predicting PAF using feature analysis and conventional approaches [3-6].

Bahareh Pourbabaee, graduate student of Physiology, Center of Nonlinear Dynamics, Department of Physiology, McGill University, Montreal, QC, Canada (e-mail: [email protected]).

Caro Lucas, Professor, Department of Electrical and Computer Engineering, University of Tehran, Tehran, Iran (e-mail: [email protected]).

G. Schreier et al. successfully detected and classified the PAF dataset using an ECG pre-processing technique, correlation-based assessment of P-wave morphology, and a statistical test [3]. W. Zong et al. [4] developed an algorithm based on the number and timing of atrial premature complexes in the ECG signals.

In this paper, the dataset of the 2001 Computers in Cardiology Challenge is used. There are 25 subjects in each of the normal and PAF groups. For each patient there are two 30-minute ECG records, one immediately before a PAF episode and one distant from any episode, as well as a 5-minute ECG record of the PAF episode itself. Different groups of features are extracted from these records; the features with the greatest separability between classes are selected using several feature selection algorithms; and the selected feature vectors are used to solve three classification problems: diagnosing PAF patients among healthy cases, distinguishing records immediately before a PAF episode from those far from it, and distinguishing the PAF episode itself from the record immediately preceding it. To solve these classification problems, k-NN, Bayes and ANN classifiers are applied. A flow chart of the analysis is shown in Fig. 2.

Fig. 2. Flow chart of ECG analysis

II. ATRIAL FIBRILLATION

Paroxysmal atrial fibrillation, a potentially life-threatening condition, results from irregular and repeated depolarization of the atria. As Fig. 3 shows, the P-wave becomes distorted in PAF patients: an impulse strong enough to depolarize the atria rarely occurs, so several small distorted waves appear in place of a normal P-wave. PAF may eventually progress to a critical condition leading to strokes and thromboembolism. Automatic detection of PAF patients therefore enables a time- and cost-effective preliminary screening procedure during a short clinic visit [7].



Fig. 3. ECG signal: normal subject (top), PAF patient (bottom)

III. FEATURE EXTRACTION

The ECG data used in this study come from 50 subjects, half of them normal and the other half PAF patients. The signals are sampled at 128 Hz and are 30 minutes long. For the PAF patients there are three kinds of records: one recorded far from any PAF episode, one recorded immediately before the onset of a PAF episode, and one recorded during the PAF episode itself. To solve the three classification problems on this dataset, different features must first be extracted from the ECG signals.

Preprocessing is an important step that precedes feature extraction. The DC drift in the baseline of the ECG signals, together with noise generated by the collecting electrodes placed on the skin, is removed with a band-pass Butterworth filter with cutoff frequencies of 0.8 Hz and 35 Hz. The ECG signals are then normalized, after which feature extraction can begin. The extracted features fall into two groups: statistical and morphological.
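As an illustration, the sketch below shows how this preprocessing stage could be implemented with SciPy. The filter order (4) and the zero-mean/unit-variance normalization are our assumptions, since the paper does not specify them.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 128.0  # sampling frequency of the challenge records (Hz)

def preprocess_ecg(x, fs=FS, low=0.8, high=35.0, order=4):
    """Band-pass Butterworth filter (0.8-35 Hz) to remove baseline
    drift and electrode noise, then normalize the signal."""
    nyq = 0.5 * fs
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    filtered = filtfilt(b, a, x)  # zero-phase filtering avoids phase distortion
    # Normalization: zero mean, unit variance (one common convention).
    return (filtered - filtered.mean()) / filtered.std()
```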

A. Morphological Features

The morphological features are extracted directly from different parts of the ECG signal and fall into two groups: P-wave features and QRS-complex features. The ECGPUWAVE software [8] is used to find the onset, peak and end points of the different parts of the signal. In the first group we extract the P-wave duration, amplitude, onset-to-peak interval, and the mean and variance of the wave. In the second group, the R-wave amplitude, the RR interval and the QRS-complex duration are extracted.

B. Statistical Features

This group measures various time- and frequency-domain characteristics of the ECG signal: the mean and energy of wavelet transform coefficients in different frequency bands, curve length, nonlinear energy, the first five coefficients of an autoregressive model of the signal, fourth power, and power spectral density.

B.1. Wavelet Transform

The wavelet transform is useful for studying non-stationary signals such as the ECG [2]. In this study, wavelet packet decomposition with the 4th-order Daubechies mother wavelet is used, according to (1):

f(t) = \sum_{k=-\infty}^{\infty} c_k \Phi(t-k) + \sum_{j} \sum_{k=-\infty}^{\infty} d_{j,k} \Psi(2^j t - k)   (1)

where c_k and d_{j,k} are the approximation and detail coefficients, and \Phi(\cdot) and \Psi(\cdot) are the scaling and Daubechies wavelet functions, respectively. The first term captures the general pattern of the signal, while the second carries its detail information. Since the 4th-order Daubechies function is used with 5 decomposition levels, we obtain 2^5 = 32 frequency bands, and in each band the mean and energy of the coefficients are computed as a group of extracted features.
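A minimal sketch of these two feature groups using the PyWavelets package (an assumed tool; the paper does not name its implementation):

```python
import numpy as np
import pywt

def wpd_features(x, wavelet="db4", level=5):
    """Mean and energy of wavelet packet coefficients in each of the
    2**level = 32 terminal frequency bands."""
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, maxlevel=level)
    bands = wp.get_level(level, order="freq")  # 32 nodes, low to high band
    means = np.array([node.data.mean() for node in bands])
    energies = np.array([np.sum(node.data ** 2) for node in bands])
    return means, energies
```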

B.2. Curve Length

Fractal dimension is hard to measure in real-time systems because of its computational burden; therefore, the curve length, which approximates the fractal dimension, is measured as one of the statistical features. Note that all features are extracted in a window moving along the ECG signal that contains one ECG cycle. The curve length of each window is computed according to (2):

CL[n] = \sum_{i=n-N+D+1}^{n} |x(i) - x(i-1)|   (2)

where x, N and D are the ECG signal samples, the window length, and the number of samples shared by two consecutive windows, respectively.
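A sketch of this computation; the interpretation of D as the overlap (so windows advance by N - D samples) follows our reconstruction of (2):

```python
import numpy as np

def curve_length(x, N, D):
    """Curve length of each length-N window; consecutive windows share
    D samples, so each window advances by N - D samples."""
    diffs = np.abs(np.diff(x))                # |x(i) - x(i-1)|
    starts = range(0, len(x) - N + 1, N - D)  # window start indices
    return np.array([diffs[s:s + N - 1].sum() for s in starts])
```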

B.3. Energy and Mean of Nonlinear Energy

Over a window of length N, the signal energy is computed according to (3):

E[n] = \frac{1}{N} \sum_{i=(n-1)N+1}^{nN} x(i)^2   (3)

It is also useful to measure the nonlinear energy, which carries information about changes in the signal's amplitude and frequency and is computed according to (4):

NE[n] = x(n)^2 - x(n-1)\,x(n+1)   (4)

In this paper we use the mean value of the nonlinear energy.
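Both quantities are straightforward in NumPy; this sketch assumes non-overlapping windows for (3):

```python
import numpy as np

def window_energy(x, N):
    """Average energy of each non-overlapping length-N window, as in (3)."""
    n_win = len(x) // N
    frames = x[:n_win * N].reshape(n_win, N)
    return (frames ** 2).mean(axis=1)

def mean_nonlinear_energy(x):
    """Mean of the nonlinear (Teager) energy of (4):
    NE[n] = x(n)^2 - x(n-1) * x(n+1)."""
    ne = x[1:-1] ** 2 - x[:-2] * x[2:]
    return ne.mean()
```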

B.4. Autoregressive (AR) Model Coefficients

The AR model is the special case of the autoregressive moving average (ARMA) model in which L in (5) equals zero:

x(n) + \sum_{m=1}^{M} b(m)\,x(n-m) = \sum_{l=0}^{L} a(l)\,u(n-l), \quad \forall n   (5)

In the AR model, the b coefficients can be found with a least-squares algorithm [9]. In this study, the first five AR coefficients are used as extracted features.
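One way to obtain the coefficients is an ordinary least-squares fit of the linear prediction equations; this particular estimator is our assumption, since [9] admits several variants:

```python
import numpy as np

def ar_coefficients(x, order=5):
    """Least-squares estimate of the first `order` AR coefficients b(m)
    in x(n) + sum_m b(m) x(n-m) = u(n)."""
    rows = len(x) - order
    # Column m of the regressor matrix holds x(n - m - 1).
    X = np.column_stack([x[order - m - 1: order - m - 1 + rows]
                         for m in range(order)])
    y = x[order:]
    # x(n) = -sum_m b(m) x(n-m) + u(n), so regress y on X and negate.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return -coef
```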


B.5. Fourth Power

This feature carries information about the signal amplitude.

B.6. Power Spectral Density

The power spectral density (PSD) of the ECG signal can be expressed in radians per sample. For a signal of N discrete samples, the PSD can be estimated with the periodogram method, computed as the scaled absolute value of the FFT according to (6) [10]:

S(e^{j\omega}) = \frac{1}{N} \left| \sum_{l=1}^{N} x_l\, e^{-j\omega l} \right|^2   (6)

where \omega is the angular frequency in radians per sample. In total, 83 features are extracted by the above methods, which are too many for the PAF detection procedure. Therefore, feature selection algorithms are applied to reduce the number of features according to the class separation each selected feature provides.
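For reference, SciPy's periodogram implements (6) directly; reducing the spectrum to the single scalar listed in Table I (total power here) is our assumption:

```python
import numpy as np
from scipy.signal import periodogram

def psd_feature(x, fs=128.0):
    """Periodogram PSD estimate, summarized by total power (an assumed
    scalar reduction, since Table I counts PSD as one feature)."""
    f, Pxx = periodogram(x, fs=fs)
    return np.trapz(Pxx, f)
```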

IV. FEATURE SELECTION

Separation criteria are applied to evaluate the different features and find those that provide the greatest separation between classes. In this paper, the scatter matrices and the Fisher discriminant ratio (FDR) are found to be useful criteria.

If X denotes our data, classified into two groups, and P_1 and P_2 are the prior probabilities, then S_w, S_B and S are the within-class scatter, between-class scatter and total scatter matrices, respectively:

X = \{(x, \omega_i)\}, \quad i = 1, 2   (7)

S_w = \sum_{k=1}^{L} P_k S_k, \quad S_k = \mathrm{Cov}(\text{Class } k) = E[(x-\mu_k)(x-\mu_k)^T]   (8)

S_B = \sum_{k=1}^{L} P_k (\mu_k - \mu_{all})(\mu_k - \mu_{all})^T, \quad \mu_{all} = \sum_{k=1}^{L} P_k \mu_k   (9)

S = S_w + S_B   (10)

The separability measure used as a criterion is then

J = \mathrm{Trace}(S_w^{-1} S_B)   (11)

The Fisher discriminant ratio of each selected feature is calculated according to (12), where \mu and \sigma^2 are the mean and variance of that feature in each class:

FDR = \sum_{i=1}^{L} \sum_{j \neq i}^{L} \frac{(\mu_i - \mu_j)^2}{\sigma_i^2 + \sigma_j^2}   (12)

The more features we use, the more information we obtain about heart activity and ECG signal patterns, but this lengthens the processing time and degrades the classifier's generalization to test data even as the training classification error decreases. It is therefore advisable to apply feature selection algorithms to find the best features. Two such algorithms, principal component analysis (PCA) and sequential forward selection (SFS), are used in this paper.
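Both criteria are simple to compute in the two-class case. In the sketch below, the symmetric double sum of (12) equals twice the single-pair value for two classes, which does not affect the ranking of features:

```python
import numpy as np

def _cov(X):
    """Sample covariance of the rows of X, always returned as a matrix."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / (len(X) - 1)

def separability_J(X1, X2):
    """J = Trace(Sw^-1 SB) of (11) for two classes with equal priors."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    Sw = 0.5 * _cov(X1) + 0.5 * _cov(X2)                   # (8)
    mu_all = 0.5 * m1 + 0.5 * m2
    Sb = 0.5 * (np.outer(m1 - mu_all, m1 - mu_all) +
                np.outer(m2 - mu_all, m2 - mu_all))        # (9)
    return np.trace(np.linalg.solve(Sw, Sb))               # (11)

def fdr(x1, x2):
    """Fisher discriminant ratio of one feature for two classes, (12)."""
    return (x1.mean() - x2.mean()) ** 2 / (x1.var() + x2.var())
```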

A. Principal Component Analysis

This algorithm finds a lower-dimensional space that preserves the highest variance [10]. The feature vectors x_i can be transformed to the new space according to (13):

y_i = P x_i   (13)

where P is the transformation matrix. The variance of y in the new space is computed from (14) and (15):

\mathrm{Var}(y) = \frac{1}{n} \sum_i (P x_i)(P x_i)^T = \mathrm{Trace}(P C P^T)   (14)

C = \frac{1}{n} \sum_i x_i x_i^T   (15)

To transform feature vectors from a space of dimension D to a space of lower dimension d (d < D), the eigenvalues of the matrix C are arranged in descending order, the first d eigenvalues are selected, and their associated eigenvectors form the rows of the matrix P. In the new space, the eigenvectors serve as the coordinate axes and the eigenvalues give the variance of the selected features along each axis.
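A compact eigendecomposition sketch of (13)-(15), with the top-d eigenvectors placed as the rows of P:

```python
import numpy as np

def pca_transform(X, d):
    """Project the rows of X onto the d eigenvectors of the covariance
    matrix C with the largest eigenvalues, following (13)-(15)."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / len(X)                # (15)
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    P = eigvecs[:, ::-1][:, :d].T         # top-d eigenvectors as rows of P
    return Xc @ P.T                       # y_i = P x_i, (13)
```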

B. Sequential Forward Selection

This algorithm starts with an empty feature vector; at each step, the feature that yields the highest separation criterion (11) when added to the current vector is selected and appended. Thus the second selected feature is the one giving the highest separation in combination with the first, and the procedure continues until the desired number of features is reached or the criterion begins to decrease when a new feature is added. A sketch is given below.
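A greedy sketch of this procedure, reusing separability_J from the sketch in the previous subsection; the stopping rule mirrors the description above:

```python
import numpy as np

def sfs(X1, X2, n_features):
    """Sequential forward selection with criterion J of (11).
    X1, X2: (samples x features) class data. Returns selected indices."""
    selected, best_J = [], -np.inf
    remaining = list(range(X1.shape[1]))
    while remaining and len(selected) < n_features:
        J, f = max((separability_J(X1[:, selected + [f]],
                                   X2[:, selected + [f]]), f)
                   for f in remaining)
        if J <= best_J:      # stop once the criterion starts to decrease
            break
        selected.append(f)
        remaining.remove(f)
        best_J = J
    return selected
```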

V. CLASSIFICATION

The last step of the PAF detection process is the classification of the feature vectors, which is done here with four classifiers. First, the selected feature vectors from the previous step are divided into training and testing sets; then each classifier is trained on the training data to estimate its structural parameters; finally, the trained classifiers are used to classify the testing data. Classifier performance is measured on both sets.

A. k-NN Classifier

The aim of a k-NN classifier is to find the nearest neighbors of an unlabeled test pattern, within a hypersphere of predefined radius, in order to determine its true class. In other words, a k-NN classifier finds the k nearest samples in a reference set and takes a majority vote among the classes of these k samples [11]. Provided the number of training samples is large enough, this simple rule performs well, and it is well known for giving impressive results on two-class problems. It is chosen here because the data points of each class form a compact cluster in the feature space, so k-NN yields accurate results with a simple classifier. The value of k is chosen to maximize the correct classification rate on the training data.
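A minimal k-NN sketch (Euclidean distance, class labels encoded as non-negative integers); in practice k would be swept over candidate values and chosen by the training CCR as described:

```python
import numpy as np

def knn_classify(x, train_X, train_y, k):
    """Assign x to the majority class among its k nearest training samples."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = train_y[np.argsort(dists)[:k]]
    return np.bincount(nearest).argmax()
```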

B. Bayes Optimal Classifier

Bayes' formula [12-13], given in (16), shows that by observing the value of the feature vector x we can convert the prior probability P(\omega_i) into a posterior probability P(\omega_i|x), where P(x|\omega_i) is the likelihood of \omega_i with respect to x:

P(\omega_i|x) = \frac{P(x|\omega_i)\, P(\omega_i)}{P(x)}   (16)

In this study there are equal numbers of records in both classes, so P(\omega_1) = P(\omega_2) = 1/2. A selected feature vector is assigned to the class with the highest posterior probability. Two methods, k-NN and Gaussian estimation, are used to estimate P(x|\omega_i).

B.1. k-NN Estimation

For each testing vector we consider a hypercube of volume V = h^n, where h is chosen so that the hypercube contains the K data points nearest to the selected vector and n is the number of features in the vector. The probability density for the testing vector x is then calculated according to (17) [14]:

P(x|\omega_i) = \frac{K - 1}{N V}   (17)

where N is the total number of feature vectors in class i.

B.2. Gaussian Estimation

In this method, (18) is used to estimate the probability density of each testing feature vector x, where \mu_i and \Sigma_i are the mean vector and covariance matrix of the training data and n is the number of features in a vector:

P(x|\omega_i) = \frac{1}{(2\pi)^{n/2}\, |\Sigma_i|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)\right)   (18)
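A sketch of the Bayes classifier with the Gaussian estimate of (18), working in log-space for numerical stability (an implementation choice, not part of the paper):

```python
import numpy as np

def gaussian_bayes(x, class_data, priors=(0.5, 0.5)):
    """Assign x to the class with the highest posterior of (16), with the
    class-conditional density estimated by the Gaussian of (18)."""
    scores = []
    for X, p in zip(class_data, priors):
        mu = X.mean(axis=0)
        Xc = X - mu
        cov = Xc.T @ Xc / (len(X) - 1)   # covariance of the training data
        diff = x - mu
        log_like = (-0.5 * diff @ np.linalg.solve(cov, diff)
                    - 0.5 * np.linalg.slogdet(cov)[1]
                    - 0.5 * len(mu) * np.log(2 * np.pi))
        scores.append(log_like + np.log(p))
    return int(np.argmax(scores))
```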

C. ANN Classifier

In this paper, a multilayer perceptron (MLP) model is used to classify the feature vectors [10]. When the number of classes is limited, ANNs can achieve more accurate results than other classifiers in many cases. They are among the most common classifiers in pattern recognition, showing robust performance and a flexible structure that lets them perform many classification tasks and construct highly complicated decision boundaries [15].

The numbers of input and output neurons of the network equal the number of features and the number of classes, respectively. We use a sum-of-squares error cost function and a partial error back-propagation (PEBP) algorithm to train the network. The hidden- and output-layer weight matrices are the unknown network parameters, computed from the partial derivatives of the cost function; the resulting weight matrices define the structure of the network, and the trained network is then used to classify the testing data. The relations between the input, hidden and output layers and the learning rule are given below.

S_m = \sum_{i=0}^{N} W_{mi}\, x_i, \quad y_m = G(S_m)   (19)

P_j = \sum_{m=0}^{M} U_{jm}\, y_m, \quad Z_j = H(P_j)

where G and H are both tanh functions, and W and U are the weight matrices of the input and hidden layers. The cost function and its partial derivatives with respect to the unknown weight matrices are:

E = 0.5 \sum_{j=1}^{C} (Z_j - t_j)^2   (20)

\frac{\partial E}{\partial U_{jm}} = (Z_j - t_j)\, H'(P_j)\, y_m   (21)

\frac{\partial E}{\partial W_{mi}} = (Z_j - t_j)\, H'(P_j)\, U_{jm}\, G'(S_m)\, x_i   (22)

Finally, the unknown parameters (the weight matrices, collectively \alpha) are updated by the training algorithm according to (23):

\alpha^{+} = \alpha^{-} - \eta \frac{\partial E}{\partial \alpha}   (23)

Fig. 3. MLP Neural Network [16]
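The update rules (19)-(23) amount to per-pattern gradient descent, implemented below in NumPy with tanh units. Bias terms (the i = 0 and m = 0 inputs) are omitted for brevity, and the hidden-layer size, learning rate and epoch count are assumptions:

```python
import numpy as np

def train_mlp(X, T, hidden=8, eta=0.01, epochs=500, seed=0):
    """Single-hidden-layer MLP trained by per-pattern gradient descent
    on the sum-of-squares error, following (19)-(23).
    X: (samples x features); T: (samples x classes) targets in {-1, 1}."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(hidden, X.shape[1]))  # input -> hidden
    U = rng.normal(scale=0.1, size=(T.shape[1], hidden))  # hidden -> output
    for _ in range(epochs):
        for x, t in zip(X, T):
            y = np.tanh(W @ x)              # (19): y_m = G(S_m)
            z = np.tanh(U @ y)              # Z_j = H(P_j)
            dz = (z - t) * (1 - z ** 2)     # (Z_j - t_j) H'(P_j)
            dy = (U.T @ dz) * (1 - y ** 2)  # error propagated to hidden layer
            U -= eta * np.outer(dz, y)      # (21), (23)
            W -= eta * np.outer(dy, x)      # (22), (23)
    return W, U
```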

VI. PAF DETECTION RESULTS

As mentioned above, the ECG signals are first preprocessed to raise the signal-to-noise ratio; then 83 features, grouped into seven sets, are extracted from them. These feature sets are


displayed in Table I.

TABLE I
THE NUMBER AND DESCRIPTION OF ALL FEATURE SETS

Feature Set   Description
1             P-wave features (5)
2             QRS complex features (3)
3             PSD (1)
4             Curve length, signal mean value, mean of nonlinear energy, fourth power, signal energy (5)
5             AR model parameters (5)
6             Mean of WPD coefficients (32)
7             Energy of WPD coefficients (32)

In this study there are three classification tasks, for which the above feature sets are used.

A. PAF Patient Diagnosis

In this task there are 25 ECG signals in each of the normal and PAF groups. In each class, 17 signals are used as training data and the remaining signals as testing data. Fig. 4 shows the FDR of all features used for this first classification problem.


Fig. 4. FDR of all features used to distinguish normal subjects from PAF patients.

The PCA and SFS methods are used to find the features with the highest separability, and the four classifiers introduced in Section V are then applied to the selected feature vectors. Classification is performed for the features selected from each feature set; the best results come from the first, fourth and sixth feature sets, especially the features extracted from the P-wave. These results are shown in Tables II, III and IV, where only five features are selected from each feature set by the PCA and SFS methods.

These results also show that the P-wave is particularly helpful for diagnosing PAF patients, since during atrial fibrillation the P-wave is affected more than any other part of the ECG signal. Moreover, the MLP neural network is more powerful than the other classifiers for detecting PAF patients.

B. Detecting the 30-Minute ECG Records before the PAF Episode

After diagnosing the PAF patients, we try to distinguish the two 30-minute ECG records of each PAF patient: the record distant from any PAF episode and the record immediately before the PAF episode. This gives 50 records from the 25 patients, divided into two kinds. Seventeen records of each kind form the training set, and the remaining records are used to test the classifiers. Fig. 5 shows the FDR histogram of all extracted features for this second classification task.

In this task, the sixth and seventh feature sets separate the classes better than the others; that is, the mean and energy of the wavelet packet decomposition coefficients play an important role in separating the ECG records of PAF patients. The correct classification rates for these two feature sets are shown in Tables V and VI; as before, the features in each set are selected with the PCA and SFS methods. The MLP classifier and the Bayes classifier with k-NN estimation perform best.


Fig. 5. FDR of all features used to distinguish the 30-minute ECG records of PAF patients.

C. PAF Episode Detection

For each PAF patient we have a 30-minute ECG record from immediately before the PAF episode and a 5-minute ECG record of the episode itself, and in this task we try to distinguish these two kinds of records. The training and testing sets, and the seven feature sets, are formed as in the previous section. The FDR histogram of all features is displayed in Fig. 6.

According to Fig. 6 and the features selected by the PCA and SFS methods, the features extracted from the QRS complex, such as the RR interval and the R-wave, are much more powerful than the others for detecting the PAF episode. The seventh feature set, the WPD coefficient energies, also gives good results here. By selecting only these features we can detect all the PAF episodes in this task. The classification results for the second feature set are displayed in Table VII.


Fig. 6. FDR of all features used to detect the PAF episode.

VII. CONCLUSION

In this paper we solved three classification problems, detecting PAF patients among healthy cases and identifying the different periods of paroxysmal atrial fibrillation, by extracting a large group of features from ECG signals and comparing the ability of the various feature sets to solve each problem. The feature selection algorithms applied, PCA and SFS, also help reduce the processing time of diagnosis.

Comparing the above results, we conclude that the MLP is the best of the four classifiers, especially when trained with the error back-propagation algorithm. This algorithm usually gives a higher CCR, fewer neurons and lower numerical complexity than alternatives such as total error back-propagation and conjugate gradient algorithms. Moreover, with the EBP training algorithm the cost function is less likely to become trapped in local minima, since it is allowed to increase at some points.

According to the above results, the P-wave features are the best for diagnosing PAF patients, since the greatest changes in a patient's ECG occur during the P-wave. Moreover, the R-wave features, especially the RR interval, change markedly at the onset of a PAF episode, which helps greatly in distinguishing the PAF episode from the preceding periods. These informative features may in the future help predict the onset of a PAF episode by tracking how their values change as an episode begins.

Finally, we diagnosed PAF patients among healthy people in more than 93% of cases, and the different episodes of the PAF patients' ECG signals were distinguished from each other with a CCR above 90% using the various kinds of features. These results are much better than the top scores of the 2001 Computers in Cardiology competition presented in [17]. The powerful classification methods applied, together with the variety of extracted features chosen by the feature selection algorithms, help achieve these results and indicate the best features for each classification problem.

The above results, especially for the first and second classification problems, could be improved by applying classifier fusion methods such as the fuzzy integral, OWA and naive Bayes combination, to obtain better correct classification rates even with fewer features. In these methods, mutually independent classifiers are combined so that previously misclassified feature vectors can be classified correctly.

APPENDIX

TABLE II
CORRECT CLASSIFICATION RATES (%) OF FOUR CLASSIFIERS FOR 5 SELECTED FEATURES FROM THE FIRST FEATURE SET, USING THE PCA AND SFS METHODS, FOR TESTING (TE) AND TRAINING (TR) DATA. ENTRIES ARE TE/TR.

Feature   MLP                        Bayes (Gaussian)           Bayes (k-NN)               k-NN
No.       SFS          PCA           SFS          PCA           SFS          PCA           SFS          PCA
1         93.75/100    87.5/97.06    81.25/97.06  87.5/97.06    81.25/100    87.5/100      68.75/100    68.75/100
2         93.75/100    87.5/91.17    75/91.17     81.2/91.17    87.5/100     87.5/100      81.25/100    81.25/100
3         87.5/97.06   87.5/91.17    87.5/91.17   81.2/91.17    81.2/91.17   81.25/97.06   81.25/100    93.75/100
4         87.5/91.17   87.5/91.17    75/85.29     87.5/91.17    75/85.29     81.25/100     75/100       81.25/100
5         75/82.35     81.25/91.17   68.75/91.17  81.25/91.17   62.5/85.29   75/91.17      62.5/100     75/100


TABLE III
CORRECT CLASSIFICATION RATES (%) OF FOUR CLASSIFIERS FOR 5 SELECTED FEATURES FROM THE FOURTH FEATURE SET, USING THE PCA AND SFS METHODS, FOR TESTING (TE) AND TRAINING (TR) DATA. ENTRIES ARE TE/TR.

Feature   MLP                        Bayes (Gaussian)           Bayes (k-NN)               k-NN
No.       SFS          PCA           SFS          PCA           SFS          PCA           SFS          PCA
1         87.5/97.06   68.75/91.17   87.5/97.06   56.25/88.24   87.5/97.06   56.25/76.47   87.5/100     68.75/100
2         93.75/100    68.75/91.17   93.75/100    62.5/91.17    87.5/100     68.75/91.17   87.5/100     62.5/100
3         93.75/100    81.25/97.06   81.25/97.06  93.75/100     81.25/91.17  68.75/91.17   87.5/100     81.25/100
4         87.5/97.06   87.5/97.06    81.25/97.06  75/91.17      81.25/94.12  68.75/91.17   93.75/100    87.5/100
5         87.5/97.06   81.25/97.06   87.5/97.06   81.25/91.17   75/91.17     81.25/94.12   87.5/100     81.25/100

TABLE IV
CORRECT CLASSIFICATION RATES (%) OF FOUR CLASSIFIERS FOR 6 SELECTED FEATURES FROM THE SIXTH FEATURE SET, USING THE PCA AND SFS METHODS, FOR TESTING (TE) AND TRAINING (TR) DATA. ENTRIES ARE TE/TR.

Feature   MLP                        Bayes (Gaussian)           Bayes (k-NN)               k-NN
No.       SFS          PCA           SFS          PCA           SFS          PCA           SFS          PCA
1         75/91.17     81.25/91.17   75/91.17     56.25/91.17   87.5/97.06   81.25/94.12   87.5/100     81.25/100
2         81.25/94.12  93.75/97.06   62.5/91.17   62.5/91.17    81.25/97.06  81.25/94.12   81.25/100    87.5/100
3         81.25/94.12  75/91.17      62.5/91.17   75/91.17      81.25/97.06  81.25/94.12   87.5/100     87.5/100
4         81.25/94.12  75/91.17      75/91.17     87.5/91.17    75/97.06     81.25/94.12   81.25/100    87.5/100
5         75/94.12     75/91.17      81.25/91.17  93.75/100     81.25/97.06  75/94.12      81.25/100    87.5/100
6         75/94.12     81.25/91.17   75/91.17     93.75/100     75/97.06     75/94.12      81.25/100    81.25/100

TABLE V
CORRECT CLASSIFICATION RATES (%) OF FOUR CLASSIFIERS FOR 5 SELECTED FEATURES FROM THE SIXTH FEATURE SET, USING THE PCA AND SFS METHODS, FOR TESTING (TE) AND TRAINING (TR) DATA. ENTRIES ARE TE/TR.

Feature   MLP                        Bayes (Gaussian)           Bayes (k-NN)               k-NN
No.       SFS          PCA           SFS          PCA           SFS          PCA           SFS          PCA
1         62.5/91.17   62.5/91.17    56.25/88.24  56.25/88.24   68.75/91.17  68.75/91.17   75/100       62.5/100
2         62.5/91.17   81.25/94.12   62.5/91.17   62.5/91.17    62.5/91.17   62.5/91.17    68.75/100    62.5/100
3         62.5/91.17   68.75/91.17   56.25/88.24  62.5/91.17    68.75/91.17  75/91.17      68.75/100    56.25/100
4         68.75/91.17  68.75/91.17   56.25/88.24  56.25/88.24   75/91.17     62.5/91.17    62.5/100     62.5/100
5         62.5/91.17   68.75/91.17   62.5/91.17   56.25/88.24   68.75/91.17  75/91.17      75/100       50/100

TABLE VI
CORRECT CLASSIFICATION RATES (%) OF FOUR CLASSIFIERS FOR 5 SELECTED FEATURES FROM THE SEVENTH FEATURE SET, USING THE PCA AND SFS METHODS, FOR TESTING (TE) AND TRAINING (TR) DATA. ENTRIES ARE TE/TR.

Feature   MLP                        Bayes (Gaussian)           Bayes (k-NN)               k-NN
No.       SFS          PCA           SFS          PCA           SFS          PCA           SFS          PCA
1         75/94.12     62.5/91.17    62.5/91.17   56.25/88.24   62.5/91.17   75/91.17      75/100       62.5/100
2         81.25/94.12  81.25/94.12   68.75/91.17  81.25/94.12   81.25/94.12  68.75/91.17   68.75/100    62.5/100
3         81.25/94.12  81.25/94.12   68.75/91.17  81.25/94.12   75/91.17     68.75/91.17   81.25/100    62.5/100
4         81.25/94.12  81.25/94.12   62.5/91.17   68.75/91.17   75/91.17     75/91.17      75/100       62.5/100
5         81.25/94.12  87.5/97.06    62.5/91.17   68.75/91.17   93.75/100    62.5/91.17    81.25/100    81.25/100


TABLE VII
CORRECT CLASSIFICATION RATES (%) OF FOUR CLASSIFIERS FOR 3 SELECTED FEATURES FROM THE SECOND FEATURE SET, USING THE PCA AND SFS METHODS, FOR TESTING (TE) AND TRAINING (TR) DATA. ENTRIES ARE TE/TR.

Feature   MLP                        Bayes (Gaussian)           Bayes (k-NN)               k-NN
No.       SFS          PCA           SFS          PCA           SFS          PCA           SFS          PCA
1         93.75/100    100/100       87.5/97.06   100/100       81.25/97.06  100/100       93.75/100    100/100
2         100/100      100/100       100/100      100/100       100/100      100/100       100/100      100/100
3         100/100      100/100       100/100      100/100       100/100      100/100       100/100      100/100

REFERENCES

[1] L. Biel, O. Pettersson, L. Philipson, and P. Wide, "ECG analysis: A new approach in human identification," IEEE Trans. Instrumentation and Measurement, vol. 50, no. 3, June 2001.
[2] S. Kara and M. Okandan, "Atrial fibrillation classification with artificial neural networks," Pattern Recognition, vol. 40, no. 11, pp. 2967-2973, Jan. 2007.
[3] G. Schreier, P. Kastner, and W. Marko, "An automatic ECG processing algorithm to identify patients prone to PAF," Proc. Computers in Cardiology, pp. 133-135, 2001.
[4] W. Zong, "A methodology for predicting PAF based on ECG arrhythmia feature analysis," Proc. Computers in Cardiology, pp. 125-128, 2001.
[5] C. Maier, M. Bauch, and H. Dickhaus, "Screening and prediction of paroxysmal atrial fibrillation by analysis of heart rate variability parameters," Proc. Computers in Cardiology, pp. 129-132, 2001.
[6] K. S. Lynn and H. D. Chiang, "A two-stage solution algorithm for paroxysmal atrial fibrillation prediction," Proc. Computers in Cardiology, pp. 405-407, 2001.
[7] Y. V. Chesnokov, A. V. Holden, and H. Zhang, "Screening patients with paroxysmal atrial fibrillation (PAF) from non-PAF heart rhythm using HRV data analysis," Proc. Computers in Cardiology, pp. 459-462, 2007.
[8] G. B. Moody, "ECGPUWAVE software," available: http://www.Physionet.org/physiotools/ECGPUWAVE
[9] O. Nelles, Nonlinear System Identification, Springer-Verlag, Berlin, 2001.
[10] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, Prentice-Hall, pp. 730-742, 1989.
[11] A. R. Webb, Statistical Pattern Recognition, John Wiley, 2002.
[12] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience.
[13] S. Theodoridis and K. Koutroumbas, Pattern Recognition, Elsevier Academic Press, 2003.
[14] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed., Academic Press, 1990.
[15] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995.
[16] http://www.codeproject.com/KB/cpp/MLP
[17] http://physionet.org/challenge/2001/top-scores.shtml