

Automatic Seizure Detection based on imaged-EEG signals through statistical learning

Catalina Gómez Caballero
Department of Biomedical Engineering
Universidad de los Andes
Bogotá, Colombia
[email protected]

Pablo Arbeláez
Department of Biomedical Engineering
Universidad de los Andes
Bogotá, Colombia
[email protected]

Mario Valderrama
Department of Biomedical Engineering
Universidad de los Andes
Bogotá, Colombia
[email protected]

Abstract—We propose an automatic method to detect epileptic seizures using an "imaged-EEG" representation of brain signals. To accomplish this task, we analyzed scalp EEG recordings from two datasets: 24 pediatric patients belonging to the public CHB-MIT Scalp EEG Database, and 11 patients from the EPILEPSIAE project. To change from the signal to the image domain, we took the raw amplitude of the EEG signals as the pixel attribute and divided the recordings into four-second windows to generate ictal and interictal instances. We used Fully Convolutional Networks (FCN) to make predictions on variable-length recordings. The data from both classes was distributed into three cross-patient models: 3-fold Cross-Validation (3FCV), Leave-One-Out (LOO), and a first-seizures model. We evaluated the performance of the algorithms using the standard metrics of detection problems: precision, recall, F-measure and Average Precision (AP). Our method on the PhysioNet patients with the LOO model achieves an average precision of 51.4%, recall of 53.1%, F-measure of 46.6% and AP of 44.4% when evaluated on all the recordings (n=642). Even though hard patients decreased the overall performance, we obtained satisfactory results for patients with a detection precision over 90% and recall over 70% (n=196 recordings). Using the experimental setup on PhysioNet, we validated the models using imaged-EEG signals as inputs to distinguish between ictal and interictal activities. We applied the approach to a larger dataset, EPILEPSIAE, in which we obtained satisfactory results for patient-specific models including intracranial electrodes.

Index Terms—Deep Learning, EEG, epilepsy, seizure detection.

I. INTRODUCTION

The diagnosis of many brain diseases relies on the tools that have been designed to measure biological signals. For instance, monitoring brain activity through the electroencephalogram (EEG) is one of the standard methods to diagnose epilepsy [1]. EEG recordings are inherently extensive in time due to the spontaneous nature of epileptic seizures. Thus, their analysis is costly and requires vast amounts of time from well-trained specialists. Rapid, automated solutions for the characterization of such brain states are much needed and could lead to faster diagnostics.

Epilepsy is a chronic condition of the brain characterized by an enduring propensity to generate epileptic seizures, which are transient aberrations in the brain's electrical activity, and by the neurobiological, cognitive, psychological and social consequences of this condition. This disease affects over 50 million people worldwide [2]. EEG recordings of patients suffering from epilepsy show two types of activity: interictal (the time span between seizures) and ictal (the time span between seizure onset and offset). Continuous EEG recordings of the patients, lasting hours to days, are needed since their main purpose is to record seizures that occur without warning. In clinical practice, to diagnose epilepsy the EEG recordings are visually analyzed by experienced neurophysiologists to identify the characteristic patterns of interictal and ictal activity [3]. Thus, the main goal of seizure detection is to segment the brain's electrical activity in real time into seizure and non-seizure periods.

However, the visual inspection of EEG recordings is time-consuming and inefficient, especially for long recordings; specialists may disagree on the same recording due to the subjective nature of the analysis; and access to a neurologist may be unavailable in developing countries. In addition, the EEG patterns that characterize a seizure can be missed or confused with background noise or artifacts, and there is cross-patient variability in seizure and non-seizure activity [4]. The drawbacks of manual inspection call for methods that perform automatic seizure detection, emulating the visual analysis performed by experts. In a clinical setting, the acceptance of these algorithms depends on a low false-positive alarm rate and a high sensitivity. Furthermore, the algorithms must be tested and validated on a large number of EEG recordings.

The problem of identifying ictal and interictal states can be studied as a binary classification task, in which the time points of a brain recording are labeled with either class. However, due to the class imbalance, the problem becomes a detection task, where the main goal is to localize the seizures among the background activity. In this work we propose to use Convolutional Neural Networks for the detection of pathological states of epileptic patients. Moreover, we change from the signal domain to the image domain to analyze the EEG signals, imitating the work of the neurophysiologist in clinical practice.


II. OBJECTIVES

A. General Objective

To develop and evaluate algorithms for the automaticdetection of epileptic seizures based on EEG signals usingDeep Learning methods.

B. Specific Objectives

- To evaluate the data generation strategy from the EEG signals to train the models.
- To evaluate the performance of Convolutional Neural Networks using different experimental setups.
- To validate the performance of the models over the test data using patients from different datasets.

III. RELATED WORK

Scalp-EEG seizure detection algorithms should detect seizures with high sensitivity and selectivity across many patients. The gold standard for evaluating the performance of seizure detection algorithms relies on the visual annotations of seizures by EEG experts. Even though the agreement between experts is usually high, discrepancies arise for patients with high seizure rates and for seizures of unusually short or long duration. The common metrics to evaluate the performance of a seizure detection algorithm are true positives, false negatives, detection delay, false positives, sensitivity and specificity (false-positive alarm rate per hour) [5].

The characteristic frequency bands of the brain's activity have traditionally called for frequency analysis of EEG recordings. Traditional methods move signals to the frequency domain using the wavelet or Fourier transform to extract relevant features that characterize a seizure, and often combine them with time-domain features. These hand-crafted features are the inputs used to train a classifier that distinguishes between seizure and non-seizure episodes. A limitation of such domain-based methods is that they are susceptible to variations in seizure patterns caused by data acquisition artifacts and the non-stationarity of EEG data, whose statistical features change across subjects and over time [6].

Due to the cross-patient variability in seizure and normal EEG activity, some authors have used machine learning to construct patient-specific detectors that identify seizure onsets quickly. The authors of [4] proposed the evaluation of a seizure detection algorithm using a public database that is also used in this work. They encoded the time evolution of spectral and spatial properties of the brain's electrical activity into a feature vector. The feature vector is then mapped to a binary label using a discriminative function trained with examples of seizure and non-seizure feature vectors. To extract the spectral and spatial features, they passed two-second windows through a filterbank and measured the energy at each filter for each channel. The time-evolution component was included by stacking feature vectors of individual windows. The performance of their patient-specific seizure detector was evaluated using a leave-one-record-out (a seizure record for latency and sensitivity, or a non-seizure record for specificity) cross-validation scheme. The average results are shown in table I.

TABLE I
COMPARISON OF SEIZURE DETECTION STUDIES AND THEIR RESULTS ON THE CHB-MIT DATASET.

Author                Class.  Sens. [%]  Spec. [%]  Lat [s]  FP/h
Shoeb [4]             SVM     96         -          4.6      0.08
Khan et al. [7]       LDA     84         100        92       -
Thodoroff et al. [1]  RNN     85         -          -        0.8

A more recent approach [1] uses deep learning in a supervised learning framework to automatically learn more robust features. They proposed a Recurrent Neural Network (RNN) architecture that captures the spectral, spatial and temporal patterns that represent a seizure. As input, they create an image representation of the EEG signals using spatial information (the electrode montage) and the magnitude of different frequency bands encoded as the pixel attribute. They trained cross-patient models to learn a seizure representation and thus predict whether each image corresponds to a seizure or not. The leave-one-out scheme in the cross-patient models leaves all the records of one patient for testing the model, and this process is repeated as many times as there are patients. Table I compares the performance of their method with other approaches.

In this work we propose different inputs to the seizure detection algorithms, changing from the signal to the image analysis domain by representing signals as images without preprocessing steps. The robust state-of-the-art methods in Computer Vision for recognition tasks can then be used to address this problem, and performance can be evaluated with the metrics of an image detection problem. Even though we obtained good performance for some patients in terms of precision and recall when testing the detector on complete recordings of the CHB-MIT database, the other metrics of a detection problem can still be improved. Furthermore, we extended the experimental framework to a long-term EEG database including intracranial recordings.

IV. METHODOLOGY

A. Dataset Description

PhysioNet:
The epilepsy dataset (CHB-MIT) consists of scalp EEG signals from pediatric subjects with intractable seizures, recorded at the Boston Children's Hospital. These recordings were acquired using the International 10-20 System for electrode positioning in bipolar configuration, sampled at 256 Hz with 16-bit depth. Moreover, each recording was annotated by experts indicating seizure beginning and ending times in seconds. In summary, the dataset comprises more than 600 hours of EEG recordings with 141 annotated seizures.

EPILEPSIAE:
It is a European epilepsy database that contains high-quality neurophysiological scalp and intracranial recordings of more than 300 patients. The raw EEG recordings comprise at least 5 days of continuous recording for each patient, with more than 24 channels and sampling rates close to 1024 Hz [8]. This database contains metadata about the recordings and patients, including the EEG onset and offset times of the seizures (in a date-hour format), the electrodes involved in the seizure activity (primary and secondary propagations), and additional classifications of the seizures, among others. The raw channels of the recordings are stored in a monopolar configuration, and they can be subtracted to obtain the bipolar configuration. We selected 11 patients from this dataset that fulfilled the requirement of having both scalp EEG and intracranial recordings; the intracranial electrodes are arranged in grids and have a different number of channels for each patient.

B. Approach

Fully Convolutional Networks:
Convolutional Neural Networks (CNN) have been used to address object recognition tasks due to their ability to encode the input into multiple representations in their hidden layers and propagate it through the network to produce an output with a final classification layer, which is usually a fully connected layer. To make dense predictions for per-pixel tasks, CNNs were turned into Fully Convolutional Networks (FCN) by replacing the fully connected layers with convolutions. The advantage of using FCNs is that they produce dense outputs from arbitrary-sized inputs, so we can feed the network with recordings of different lengths [9]. For our classification problem, the inputs of the networks are the representations generated from the EEG recordings, which correspond to two classes: ictal and interictal. The labels or ground truths for a signal segment are given by the specialist annotations.

For classifying neural activity states we designed an FCN with 3 blocks. Two considerations guided the design: the temporal dimension should not be reduced too much, and the limited amount of training data means the model cannot have a large number of parameters. Each block consists of Convolution, Batch Normalization, a non-linear ReLU transformation, and Pooling layers, followed by two fully convolutional layers before the SoftMax layer, as shown in figure 1. The purpose of the last SoftMax layer is to rescale the elements of the output of the last convolutional layer into the range (0,1), such that the final output of the network is a vector of class probabilities. Since the input images have only one (temporal) dimension, the convolutions, batch normalization and pooling operations are applied in one dimension (* in the network diagram corresponds to 1D convolutions). Initially, the number of input channels of the convolutional layers corresponds to the channels of the recording; afterwards, the number of output channels is defined within each convolution. The dimensionality of the input is reduced only along the temporal axis in each pooling layer. The kernels of the last convolutional layers, which replace the fully connected ones, are defined such that the network produces a class label for every window of a fixed size. In this way, the input can be of variable length and the number of predictions depends on the length of the recording to evaluate.
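The architecture described above can be sketched as follows. This is a minimal illustration in PyTorch, not the authors' exact implementation: the paper specifies 3 blocks of Conv-BatchNorm-ReLU-Pool, 1D operations, two fully convolutional layers before the softmax, and one prediction every 64 sample points; the filter counts and the kernel size of 3 shown here are assumptions consistent with the text.

```python
import torch
import torch.nn as nn

class SeizureFCN(nn.Module):
    """Sketch of the 3-block 1D FCN described in the text.
    Filter counts are illustrative assumptions."""
    def __init__(self, in_channels=23, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=3, padding=1),
            nn.BatchNorm1d(32), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm1d(64), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128), nn.ReLU(), nn.MaxPool1d(4),
        )
        # Fully convolutional classifier: one prediction per pooled step
        self.classifier = nn.Sequential(
            nn.Conv1d(128, 64, kernel_size=1),
            nn.ReLU(),
            nn.Conv1d(64, n_classes, kernel_size=1),
        )

    def forward(self, x):  # x: (batch, channels, time)
        logits = self.classifier(self.features(x))
        return torch.softmax(logits, dim=1)  # class probabilities per window

model = SeizureFCN()
probs = model(torch.randn(1, 23, 1024))  # a 4 s window at 256 Hz
# three pooling layers of stride 4 reduce time by 64: 1024 -> 16 predictions
```

Because the classifier is convolutional, the same model accepts a full-length recording and emits one prediction per 64-sample step, which is how variable-length inputs are handled.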

Due to the limited number of patients and seizures, and their short duration, we used a transfer learning strategy called finetuning to train a model on a database that does not have sufficient size [10]. Once a model has been trained on a base dataset for a specific task, the learned features are transferred to a second network that is trained with samples from a new dataset for a new task. The weights of the pretrained model are said to be finetuned by continuing backpropagation. All the layers of the network can be finetuned, or only a fraction of them, by freezing the weights of the layers that are to be preserved. This process works well if the learned features are general and if the nature of both the tasks and the data is similar.
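Freezing a fraction of a pretrained network, as described above, amounts to disabling gradients for the preserved layers. A minimal PyTorch sketch, where the `features`/`classifier` attribute names and the choice of which layers to freeze are illustrative assumptions:

```python
import torch.nn as nn

class Demo(nn.Module):
    """Stand-in for a pretrained network with a feature extractor and a
    convolutional classifier (structure assumed for illustration)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Conv1d(23, 8, kernel_size=3)
        self.classifier = nn.Conv1d(8, 2, kernel_size=1)

def freeze_features(model):
    """Freeze the early (feature) layers so only the classifier is
    finetuned; return the parameters that still require gradients."""
    for p in model.features.parameters():
        p.requires_grad = False
    return [p for p in model.parameters() if p.requires_grad]

demo = Demo()
trainable = freeze_features(demo)  # pass only these to the optimizer
```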

C. Implementation Details

Signal Representation:
To change from the signal to the image domain, we took the amplitude of the signal as the attribute of each pixel. Additionally, for the training setup we defined a window width W (in seconds) to split the brain recordings into segments, such that the physiological activity of the event to be detected is present as a characteristic pattern. The resulting images have dimensions N × (W · SamplingRate), where N is the number of channels or electrodes that were recorded, and the sampling rate depends on the specifications of the recording tool. To remove noise sources from the scalp EEG signals, we set a threshold amplitude at ±250 µV and saturated the values outside this range. Examples of training instances for both classes are shown in figure 2, in which the amplitude of each time point is encoded with a color. This visualization allowed us to determine changes in the spatial and temporal dimensions during ictal and interictal periods, and to identify the electrodes that are most involved during a seizure.
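The windowing and saturation steps above can be sketched with NumPy; function and parameter names are ours, but the 4 s width, ±250 µV clipping and the N × (W · SamplingRate) shape follow the text:

```python
import numpy as np

def to_image_windows(eeg, fs=256, width_s=4, clip_uv=250.0):
    """Turn an (N_channels, T) EEG array into non-overlapping 4 s
    "image" windows of shape (N, width_s * fs), saturating amplitudes
    at +/-250 uV as described in the text."""
    eeg = np.clip(eeg, -clip_uv, clip_uv)
    w = width_s * fs
    n_windows = eeg.shape[1] // w
    return [eeg[:, i * w:(i + 1) * w] for i in range(n_windows)]

windows = to_image_windows(np.random.randn(23, 256 * 60) * 100)
# 60 s of 23-channel EEG -> 15 windows of shape (23, 1024)
```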

For the CHB-MIT epilepsy recordings, the order of the electrodes was standardized for all the patients by choosing the 23 bipolar channels common to all the records. After this selection, we kept 630 records with 136 seizures from the 24 patients. The windows from both classes (ictal and interictal states) were used to train the models and adjust the parameters of the network, while complete recordings (with the natural class imbalance) were used to evaluate the models.

Due to the spontaneous nature of epileptic seizures and their short duration compared to the extensive recording of background activity, we augmented the number of training examples of the ictal class using overlapping windows between the annotated start and ending times of seizures. The overlap was controlled with a time shift between consecutive windows, defined in seconds. Even after data augmentation, the class imbalance remained. The interictal instances were randomly subsampled such that a fraction of the instances was selected close to the seizures, by defining a closeness parameter, and the other fraction was selected far from the seizures. The interictal instances close to the seizures (pre-ictal and post-ictal states) were augmented with the same strategy mentioned before, but to a lesser extent.
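The overlapping-window augmentation of the ictal class can be sketched as follows; the 1/8 s shift comes from the experimental setup described later in the text, while the function and argument names are ours:

```python
import numpy as np

def augment_ictal(eeg, onset_s, offset_s, fs=256, width_s=4, shift_s=0.125):
    """Generate overlapping 4 s ictal windows between the annotated
    seizure onset and offset, sliding by `shift_s` seconds (1/8 s for
    ictal instances in the text)."""
    w, step = int(width_s * fs), int(shift_s * fs)
    start, stop = int(onset_s * fs), int(offset_s * fs)
    return [eeg[:, s:s + w] for s in range(start, stop - w + 1, step)]

eeg = np.zeros((23, 256 * 120))                 # 2 min dummy recording
ictal = augment_ictal(eeg, onset_s=30, offset_s=50)
# a 20 s seizure with a 1/8 s shift yields 129 overlapping 4 s windows
```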


Fig. 1. Proposed FCN architecture for the automated classification of EEG recordings into ictal and interictal states. * denotes a convolution operation in one dimension, n is the number of classes, W is the window width, and O is the number of output channels in the respective convolutional layer.

Fig. 2. Examples of the signal representation taking the amplitude as the pixel attribute. The height encodes the channels in bipolar configuration, and the width the window length (4 s), with the aspect ratio adjusted for visualization purposes. Pixel colors closer to yellow mean higher amplitudes, and closer to dark blue, lower values. Top row: ictal examples from four patients; bottom row: interictal examples.

Image Representation:
We used the ictal and interictal instances (4 s windows) that were extracted from the raw EEG signals to generate a visual representation of the signals as the specialists analyze them. All 23 channels were plotted in a graph such that the individual lines did not overlap. The resulting images had black lines representing each channel over a white background. To avoid overlap between lines, we adjusted the scale of the signal amplitude by defining global and local (channel-wise) normalization strategies, since the large amplitude of one channel could mask the oscillations of a channel with a smaller magnitude range. Each image was saved as a .png file with an approximate size of 120 kB. Examples of the images generated from different patients are shown in figures 3 and 4.
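The two normalization strategies can be sketched as simple min-max rescalings; the exact scaling used for plotting is not specified in the text, so this is an assumed minimal version that captures the global-vs-local distinction:

```python
import numpy as np

def normalize(window, mode="global"):
    """Rescale a (channels, time) window to [0, 1] before plotting.
    'global' uses one min/max for the whole window; 'local' rescales
    each channel independently, so low-amplitude channels are not
    masked by high-amplitude ones."""
    if mode == "global":
        lo, hi = window.min(), window.max()
        return (window - lo) / (hi - lo + 1e-12)
    lo = window.min(axis=1, keepdims=True)   # per-channel minimum
    hi = window.max(axis=1, keepdims=True)   # per-channel maximum
    return (window - lo) / (hi - lo + 1e-12)

w = np.vstack([np.linspace(-100, 100, 1024), np.linspace(-1, 1, 1024)])
g, l = normalize(w, "global"), normalize(w, "local")
# local rescaling keeps the low-amplitude channel fully visible
```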

Dataset Partition:
On the PhysioNet dataset, we defined cross-patient models by splitting the window instances in two ways. First, in the 3-fold Cross-Validation (3FCV), we divided the patients into three sets, training on two sets and evaluating on the remaining one, such that the three possible combinations were covered. Second, in the Leave-One-Patient-Out (LOO) scheme, 24 models were trained, each one using data from k-1 patients (where k is the number of patients) and evaluated on all the recordings of the patient left out.
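The leave-one-patient-out partition described above is a simple enumeration over patients; a minimal sketch (identifiers are illustrative):

```python
def leave_one_patient_out(patients):
    """Yield (train_patients, test_patient) pairs: each model trains on
    k-1 patients and is evaluated on all recordings of the held-out one."""
    for i, held_out in enumerate(patients):
        train = [p for j, p in enumerate(patients) if j != i]
        yield train, held_out

pairs = list(leave_one_patient_out(["chb01", "chb02", "chb03"]))
# 3 patients -> 3 models, each tested on a different held-out patient
```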

Fig. 3. Examples of segments of the EEG signals represented as images using global normalization for each window.

Fig. 4. Examples of segments of the EEG signals represented as images using local normalization for each window.

For the EPILEPSIAE dataset, we used a different approach to train the models, based on chronological information from the signals. Following the data-splitting strategy of [11], we defined a criterion to train the models with the first N% of the total seizures of each patient and evaluate on the remaining percentage of seizures. The model contained the first seizures from all the patients. The percentage of recordings with seizures used to train the models was set to 80%. For the scalp electrodes, we trained models from scratch only with examples from this database, and finetuned models pretrained on PhysioNet for the bipolar configuration. The models trained from scratch and those finetuned from PhysioNet with scalp electrodes serve as the weight initialization for the models that include intracranial channels, in addition to patient-specific models trained from scratch. For the latter, we selected the patient with the most seizures because of the possibility of generating more training instances of the ictal class.
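The chronological first-seizures split can be sketched as follows; the recording representation (dicts with a `start_time` key) is an assumption for illustration, while the 80% training fraction follows the text:

```python
def chronological_split(recordings, train_frac=0.8):
    """Split a patient's seizure recordings chronologically: train on
    the first `train_frac` of recordings with seizures, evaluate on
    the remaining (later) ones."""
    ordered = sorted(recordings, key=lambda r: r["start_time"])
    n_train = int(round(train_frac * len(ordered)))
    return ordered[:n_train], ordered[n_train:]

recs = [{"start_time": t} for t in [5, 1, 3, 2, 4]]
train, test = chronological_split(recs)
# 5 recordings -> first 4 for training, the last one for evaluation
```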

We evaluated the performance of the algorithms using the standard metrics of detection problems: precision (TP/(TP+FP)), recall (sensitivity, TP/(TP+FN)), F-measure (the harmonic mean of precision and recall) and Average Precision (AP), which is the area under the precision-recall (PR) curve built with different confidence thresholds. For an individual patient, to construct the PR curve for the positive (ictal) class, we took into account the predictions from all the recordings of the patient and thresholded over the probability of belonging to the ictal class. The precision and recall that we report are those that maximize the F-measure.
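The AP computation over thresholded ictal probabilities can be sketched as below; this uses the step-wise PR-curve summation common in detection benchmarks, which we assume rather than take from the text:

```python
import numpy as np

def average_precision(labels, scores):
    """Area under the precision-recall curve obtained by sweeping a
    threshold over the ictal-class probabilities (step-wise sum)."""
    order = np.argsort(-np.asarray(scores))      # rank by confidence
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)
    precision = tp / np.arange(1, len(labels) + 1)
    recall = tp / max(labels.sum(), 1)
    ap, prev_r = 0.0, 0.0
    for p, r, y in zip(precision, recall, labels):
        if y:                                    # new true positive
            ap += p * (r - prev_r)
            prev_r = r
    return ap

ap = average_precision([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
# a perfect ranking of two ictal windows gives AP = 1.0
```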

The global performance of a method is given by the average metrics, with standard deviation, of the patients that were evaluated. For instance, in the 3FCV experiments, the metrics for each fold are computed as the average metrics of the patients used to test the model in that fold. In the LOO approach, the global metrics are the average of the performance of each patient when evaluated. For the models trained with the first seizure recordings of all the patients, we calculated the metrics for each patient using only the remaining recordings with the last seizures, and then averaged them.

Training details: we trained the FCNs for 100 epochs with a batch size of 64, a Cross Entropy (CE) loss, and a Stochastic Gradient Descent (SGD) optimizer with a learning rate of 1 × 10^-3. To adjust the learning rate, we defined a criterion to reduce it when the test average classification accuracy had stopped improving. Since we were training the neural networks from scratch, we applied some strategies to reduce overfitting to the training data.
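The plateau criterion for the learning rate can be sketched as follows; the text only states that the rate is reduced when test accuracy stops improving, so the patience of 5 epochs and reduction factor of 0.1 are illustrative assumptions:

```python
def reduce_lr_on_plateau(lr, acc_history, patience=5, factor=0.1):
    """Cut the learning rate by `factor` when the monitored test
    accuracy has not improved over the last `patience` epochs
    (patience and factor are assumed values)."""
    if (len(acc_history) > patience
            and max(acc_history[-patience:]) <= max(acc_history[:-patience])):
        return lr * factor
    return lr
```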

V. RESULTS AND DISCUSSION

A. PhysioNet

We explored different FCN architectures, image generation strategies, augmentation parameters and interictal subsampling in the 3FCV model, and applied the best combination in the LOO model. To find the best subsampling strategy for the interictal instances, we tried a random interictal selection and the close-far instances approach. For all the experiments we set the training window width to 4 s, and the time shift for data augmentation to 1/8 s for ictal instances and 1/2 s for interictal instances. The interictal subsample of close and far instances was set to a 50/50 ratio. We modified the original FCN architecture and report the metrics for these experiments. Based on the original architecture and the pooling operations over the time dimension, the model made a prediction every 64 sample points.

To include frequency information, we added the signals filtered at different bands (0-7, 7-14 and 14-49 Hz) [1] to the original 23 channels, for a total of 92 channels. Then, we generated the ictal and interictal instances using the strategy described before and trained a model with the original FCN architecture. Additionally, we tried different kernel sizes in the convolutional layers: the base FCN used kernels of size 3 (corresponding to 12 ms) and the variation used kernels of size 51 (corresponding to 200 ms).
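Appending the band-limited channels can be sketched as below. The text does not specify the filter design, so this uses an ideal FFT-domain band-pass as a stand-in; the bands and the 23 → 92 channel count follow the text:

```python
import numpy as np

def add_band_channels(eeg, fs=256, bands=((0, 7), (7, 14), (14, 49))):
    """Append band-limited copies of each channel (0-7, 7-14 and
    14-49 Hz as in the text), turning 23 channels into 23 * 4 = 92.
    Uses an ideal FFT-domain band-pass as an assumed filter."""
    spectrum = np.fft.rfft(eeg, axis=1)
    freqs = np.fft.rfftfreq(eeg.shape[1], d=1.0 / fs)
    out = [eeg]
    for lo, hi in bands:
        mask = (freqs >= lo) & (freqs < hi)      # keep only this band
        out.append(np.fft.irfft(spectrum * mask, n=eeg.shape[1], axis=1))
    return np.concatenate(out, axis=0)
```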

The results of all the experiments are shown in table II. The name of each experiment specifies the network architecture or modification, and the interictal subsampling strategy. For instance, the experiment in the first row corresponds to the original architecture (base FCN) with the close-far strategy. The experiments with N filters correspond to changes in the number of outputs at each convolutional layer, and FC denotes a fully convolutional layer (an extra FC layer was added to the network). Within each experiment, each line corresponds to the test metrics of one fold, reported as the average and standard deviation over the test patients of that fold. The best configuration of parameters (base FCN + learning rate decay with a plateau function) was used to train the LOO model. The average results are shown at the bottom of table II.

The results of the 3FCV experiments showed that the test patients of each fold had a wide range of difficulty for seizure detection. For instance, in the second fold all metrics are better compared to the first and the third ones, with the latter having the lowest performance. These differences could be attributed to the distribution of patients, their recordings and the annotated seizures. Additionally, the magnitude of the standard deviations suggests that there is high intra-class variation, which means that the metrics for individual patients differ significantly. The two strategies to reduce overfitting during the training phase, reported as Dropout layers and Weight Decay (WD), allowed the model to perform better on the hard patients (from the third fold), improving the metrics by about 20 points while keeping the performance of the other folds almost constant. We included the dropout layers in the last convolutions (Last 0.3), and also in all of them (0.1-0.3), with a lower probability for the first layers because we do not want to lose first-level features. The dropout technique prevents the units of the model from co-adapting to the training data by randomly dropping out units and their connections [12]. The results were further improved by adding L2 regularization (weight decay), which helps the network to generalize better.

In the 3FCV models, the performance of some patients, suchas patient 12 and 14, decreased the global metrics reported ontable II. However, we obtained satisfactory results for patients

Page 6: Automatic Seizure Detection based on imaged-EEG signals

TABLE IIGLOBAL AVERAGED RESULTS (%) IN ALL THE RECORDS OF THE TEST

PATIENTS FOR THE 3FCV AND LOO MODELS. EACH ROW WITHIN ONEEXPERIMENT OF THE 3FCV MODELS CORRESPONDS TO ONE OF THE

THREE FOLDS (1-3).

3FCV
Experiment                      Fold   Precision    Recall       F-measure    AP
Base FCN Close-far               1     42.9±33.1    31.7±23.4    35.8±27.3    32.3±29.0
                                 2     64.1±26.5    50.4±24.5    55.3±25.7    52.5±28.0
                                 3     26.7±21.2    55.6±27.1    28.3±18.4    20.2±16.2
FCN 256 filters Close-far        1     48.1±37.5    53.0±27.8    39.8±30.7    37.6±33.8
                                 2     59.3±28.5    58.4±23.8    58.4±26.4    55.2±31.2
                                 3     61.4±29.6    54.1±26.5    56.6±29.3    50.4±30.7
Base FCN LR plateau Close-far    1     44.9±38.0    57.8±27.6    40.0±33.6    36.2±33.6
                                 2     60.7±23.4    46.8±21.2    51.5±21.3    47.3±23.6
                                 3     36.8±22.5    53.9±27.1    39.5±23.6    30.3±20.6
Base FCN + Filters Close-far     1     37.4±29.6    32.9±24.1    32.8±27.8    30.5±30.7
                                 2     63.0±29.9    58.8±24.7    60.4±27.3    58.5±31.3
                                 3     20.3±20.5    51.7±26.5    24.6±23.9    16.6±18.1
kernel size 51 TEST              1     40.6±36.0    32.6±28.2    33.1±31.0    28.8±33.0
                                 2     55.9±29.8    48.3±21.6    51.2±24.9    48.9±30.1
                                 3     35.5±27.3    52.4±29.6    34.6±22.6    26.0±22.4
Dropout 0.1-0.3 Close-far        1     36.8±33.2    45.2±33.0    34.5±30.5    32.1±33.3
                                 2     64.7±26.2    47.1±20.9    53.5±23.3    50.8±25.6
                                 3     51.2±35.3    44.7±30.2    46.0±32.9    43.2±36.2
Dropout 0.1-0.3 WD               1     45.0±36.8    35.4±25.7    35.6±30.7    33.6±33.5
                                 2     64.3±26.4    55.9±19.2    58.6±24.0    56.9±26.2
                                 3     51.2±36.0    45.5±29.3    46.6±33.0    43.4±36.3

LOO
Experiment                             Precision    Recall       F-measure    AP
Base FCN LR plateau Close-far          44.3±31.0    46.3±24.2    40.2±26.2    34.1±28.0
Base FCN DO 0.3 Close-far              50.2±33.5    49.4±25.5    46.9±29.8    44.3±33.3
Base FCN DO 0.1-0.3 Close-far          51.4±34.1    53.1±25.5    46.6±31.0    44.4±34.4

with a detection precision over 90% and recall over 70% (n=196 recordings), such as patients 10, 11 and 19.

In the LOO approach, we evaluated the generalization capacity of the model, since the test recordings had not been seen before. Additionally, this approach allowed us to study the difficulty of each patient individually. Figure 6 shows all the evaluation metrics for each patient using the base FCN model with dropout layers and weight decay. Additionally, the number of seizures for each patient is shown as purple diamonds. The performance tends to decrease with an increasing number of seizures, but the patients with fewer seizures have both good precision and recall. Qualitative results of good detections and limitations are shown in Figure 5. The top image shows an example of an accurate match between the annotated and predicted seizure times. The middle image shows accurate predictions in more than 16 hours of recordings (the ground truth lines are covered by the predictions). In the bottom image there is an example of false positives and disrupted predictions.

Based on the results of the LOO model, some patients had seizures harder to detect than others, and we did not want our results to be biased by the difficulty of the patients used to train the model. Thus, we did a 2-fold cross-validation experiment, in which we divided the patients into two sets by setting a

Fig. 5. Qualitative results of the best LOO model (base FCN + DO). Annotations (blue) and predictions (red) are plotted over the signals. The bipolar configuration of electrodes is shown on the y-axis. Each recording is a sample from a different patient.

threshold on the F-measure (60%) of the LOO results. The patients with an F-measure >60% are identified as easy, and the ones with a lower value as hard. The results of this experiment are shown in Table III, for the evaluation both on the recordings that were used to extract the training instances (TRAIN) and on those never seen before by the model (TEST). The good performance on the easy patients was maintained even when training with the hard ones, where the model did not overfit to the training data. On the other hand, training only with the easy patients led the model to overfit during training, and there is a significant decrease in the test performance, meaning that the model cannot generalize well to data that was never seen before.
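The easy/hard partition can be sketched as follows; the patient identifiers and F-measure values below are purely illustrative, not the paper's actual per-patient results:

```python
# Hypothetical per-patient LOO F-measures (illustrative values only).
loo_f_measure = {"chb01": 82.0, "chb06": 21.5, "chb10": 91.3,
                 "chb12": 18.7, "chb14": 25.0, "chb19": 88.0}

THRESHOLD = 60.0  # F-measure cut-off separating easy from hard patients

easy = sorted(p for p, f in loo_f_measure.items() if f > THRESHOLD)
hard = sorted(p for p, f in loo_f_measure.items() if f <= THRESHOLD)
# Fold 1 trains on the hard patients and tests on the easy ones;
# fold 2 does the reverse.
```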

When the model was trained with data from the hard patients, the training metrics showed that the model had learned something, because it had already seen signal segments from the complete recordings that are evaluated later. Thus, we proposed to train a model with data from the first 80% of recordings with seizures and evaluate on the remaining ones. The results of this model are shown in the bottom block of Table III. The average metrics were calculated over all the 24 patients. Compared to the LOO model results, which reflect the average performance of the patients, all the metrics improved


TABLE III
GLOBAL AVERAGED RESULTS (%) IN ALL THE RECORDS OF THE TEST PATIENTS FOR THE 2FCV MODEL. THE MODEL CONFIGURATION IS THE BEST FCN WITH DROPOUT LAYERS. EACH ROW CORRESPONDS TO A DIFFERENT EVALUATION SET.

Fold 1: training with hard patients
Set     Precision    Recall       F-measure    AP
TRAIN   60.7±23.0    59.6±20.5    59.2±22.4    57.2±25.2
TEST    85.0±10.0    69.5±9.94    76.2±9.11    79.2±10.3

Fold 2: training with easy patients
TRAIN   95.7±4.65    94.3±3.59    95.0±3.90    97.1±4.13
TEST    24.0±22.3    24.0±24.6    17.0±14.6    10.7±11.1

First seizures model
TEST    62.7±30.2    58.3±22.9    59.0±25.9    54.5±31.5

by more than 10 points when the model has seen seizures and interictal periods of each patient.
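The first-seizures split described above can be sketched as follows (a minimal illustration; the recording metadata fields are hypothetical):

```python
def first_seizures_split(recordings, train_fraction=0.8):
    """Chronological split: the first `train_fraction` of a patient's
    recordings that contain seizures go to training, the rest to test.
    `recordings` is assumed to be sorted by acquisition time."""
    with_seizures = [r for r in recordings if r["n_seizures"] > 0]
    n_train = int(len(with_seizures) * train_fraction)
    return with_seizures[:n_train], with_seizures[n_train:]

# Toy example: 7 recordings, 5 of which contain seizures
recs = [{"id": i, "n_seizures": s} for i, s in enumerate([1, 0, 2, 1, 1, 0, 3])]
train, test = first_seizures_split(recs)  # 4 recordings to train, 1 to test
```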

Image Representation

To train the FCNs with the visual representation of the ictal and interictal instances, we adapted the base architecture with two-dimensional operations. To generate the images of the signals, we fixed the height of the window (288 pixels) and defined a value in pixels (288) that corresponds to the temporal width of the windows (4 s), such that the kernel size of the last convolutional layers could be fixed to generate a prediction for each window. The pixel-time relationship must be preserved in the evaluation of new recordings, since the FCN receives inputs of variable length. In the last convolutional layers we changed the stride, that is, the distance between spatial locations where the convolution kernel is applied, such that the model made predictions every 288 pixels.
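The pixel-time bookkeeping implied by this design can be sketched as an illustrative calculation under the stated 288-pixel / 4-second geometry:

```python
# Geometry of the imaged-EEG inputs described above: a 4 s window is
# rendered as a 288x288 image, fixing the pixel/time ratio.
WINDOW_SECONDS = 4
WINDOW_PIXELS = 288
PX_PER_SECOND = WINDOW_PIXELS // WINDOW_SECONDS  # 72 pixels per second

def n_window_predictions(recording_seconds):
    """With the last convolution strided every 288 pixels, a recording
    of arbitrary length yields one prediction per complete 4 s window."""
    width_px = recording_seconds * PX_PER_SECOND
    return width_px // WINDOW_PIXELS

n_window_predictions(3600)  # a 1 h recording -> 900 window predictions
```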

The results for both the global and local normalization plotting strategies are shown in Table IV, for the training and testing sets, keeping the same patient division of the original 3FCV model. When testing the models on complete recordings, the signals must be represented in the same way as in the training procedure. The window length (in seconds) can be adjusted in the evaluation, taking into account that more seconds lead to images with larger resolution, and thus a higher computational load. Having images as inputs increased the number of parameters to be tuned; thus, to avoid overfitting, we trained models with the dropout layers. Overall, the metrics decreased when using the image representation, to a lesser extent for the recall, but the amplitude representation had higher recall and precision values for all three folds. We attribute the low performance on the training set to the scaling used to plot the signals, which depends on the properties of each window, either global or local. Even if we evaluate the model on the recordings that were used to extract the training instances (TRAIN), any segment of the signal represented as a plot will be a new instance for the model because of the window-related scaling.

B. EPILEPSIAE

The best configuration of the network parameters found in PhysioNet was applied to the sample of patients from EPILEPSIAE. We downsampled the signals from 1024 to 256 Hz by taking the average amplitude every 4 consecutive time

TABLE IV
GLOBAL AVERAGED RESULTS (%) IN ALL THE RECORDS OF THE TEST PATIENTS FOR THE 3FCV MODEL USING THE IMAGE REPRESENTATION OF THE SIGNALS. EVALUATION IN SIGNAL SEGMENTS OF 24 SECONDS. TRAIN MEANS THAT THE MODEL WAS EVALUATED IN THE RECORDINGS USED TO EXTRACT THE TRAINING INSTANCES.

3FCV: Global Norm (BaseFCN + Dropout)
Set     Fold   Precision    Recall       F-measure    AP
TRAIN    1     13.2±13.4    22.0±20.4    11.6±10.6    6.16±9.08
         2     11.5±18.5    23.5±17.2    11.4±15.5    7.44±1.43
         3     15.9±20.5    22.2±13.5    15.1±17.2    9.89±15.8
TEST     1     7.56±15.5    23.1±20.5    6.75±11.2    4.63±9.73
         2     5.57±6.19    39.4±37.0    6.67±6.77    2.82±3.63
         3     1.70±2.42    15.6±13.6    2.26±2.29    0.67±0.66

3FCV: Local Norm (BaseFCN + Dropout)
TRAIN    1     8.34±7.90    15.3±10.5    8.30±7.19    3.58±4.34
         2     19.4±21.4    25.5±17.9    17.5±17.4    11.1±16.0
         3     12.1±13.2    30.9±21.2    13.3±12.5    6.96±9.76
TEST     1     8.80±10.1    25.6±30.5    6.25±6.56    2.88±3.24
         2     9.05±5.46    8.70±5.89    7.32±4.03    2.14±1.20
         3     2.81±3.22    28.1±23.9    3.26±2.43    0.99±0.64

points. For the scalp electrodes, to finetune the best model trained with data from more patients and seizures, we defined the same bipolar configuration as in the recordings of PhysioNet.
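The downsampling step, averaging groups of four consecutive samples, can be sketched in NumPy as:

```python
import numpy as np

def downsample_by_mean(x, factor=4):
    """Average every `factor` consecutive samples, e.g. 1024 Hz -> 256 Hz."""
    n = (len(x) // factor) * factor          # drop any ragged tail
    return x[:n].reshape(-1, factor).mean(axis=1)

fs_in, fs_out = 1024, 256
x = np.arange(fs_in, dtype=float)     # one second of a 1024 Hz ramp signal
y = downsample_by_mean(x, fs_in // fs_out)
len(y)  # -> 256 samples; y[0] is the mean of samples 0..3, i.e. 1.5
```

Averaging before decimation also acts as a crude low-pass filter, attenuating content above the new Nyquist frequency.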

The window width was kept at 4 s and the saturation of the signals at ±250, but the time shift for data augmentation of ictal instances was changed to 1/16 s (approximately every 63 ms), and to 1/8 s for close interictal instances. We applied the same strategy to select interictal instances close to and far from the seizures. When selecting the ictal instances, we discarded recordings with seizures shorter than 9 s, and those with non-valid annotations (NULL start or end times, or durations > 1 h), for both training and evaluation.
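The ictal windowing with 1/16 s shifts and the 9 s minimum-duration filter can be sketched as follows (an illustration, not the extraction code used for EPILEPSIAE):

```python
WINDOW_S = 4.0          # window width in seconds
ICTAL_SHIFT_S = 1 / 16  # ~63 ms step between consecutive ictal windows
MIN_SEIZURE_S = 9.0     # seizures shorter than this are discarded

def ictal_window_starts(onset_s, offset_s):
    """Start times (s) of the overlapping 4 s ictal windows extracted
    from one seizure, stepping every 1/16 s."""
    if offset_s - onset_s < MIN_SEIZURE_S:
        return []       # too short: discarded from training and evaluation
    starts, t = [], onset_s
    while t + WINDOW_S <= offset_s:
        starts.append(t)
        t += ICTAL_SHIFT_S
    return starts

len(ictal_window_starts(100.0, 110.0))  # a 10 s seizure -> 97 windows
```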

Figure 7 summarizes the results on the evaluation set of the models trained only with the scalp electrodes from 11 patients (removing one outlier patient in the evaluation), when trained from scratch and when finetuned from PhysioNet. We took the FCN + DO model used in PhysioNet as the base architecture for the experiments in this dataset. For the models trained from scratch, we tested monopolar and bipolar configurations of electrodes, and only the bipolar configuration for the models finetuned from PhysioNet, since they were trained with bipolar channels. As pretrained models, we selected those trained with the first seizures from all the PhysioNet patients, denoted with an FT at the beginning of the experiment name. The metrics show a strong variation across patients, and a low performance for 8 patients. Even though the best results were obtained using a monopolar configuration, the differences between experiments were not notable. Finetuning from models that had already been trained to distinguish ictal from interictal activity in a bipolar configuration of electrodes did not improve the results as expected; the metrics are similar to those obtained when training from scratch. Patients P8 and P10 had acceptable results and the greatest number of seizures, thus we chose them for the tests with intracranial recordings.

To better understand the low performance in some patients,


Fig. 6. Results of the LOO model at the best configuration. Patients are on the x-axis, sorted by decreasing F-measure (star marker). The left y-axis corresponds to the metric values and the right one to the number of seizures.

Fig. 7. Comparison of experiments for the sample of EPILEPSIAE patients using scalp electrodes only. Each point on the x-axis corresponds to one patient, and the y-axis to each metric. The colors and patterns encode a different experiment and metric, respectively. The green diamonds (light and dark) show the average seizure length in seconds and the number of seizures of the recordings that were evaluated.

we included additional information from the seizures that were evaluated (green diamonds in the bottom plots of Figure 7). Concerning the number of seizures, there was a different pattern from the one found in Figure 6. The patients with the best results had the largest number of seizures, but P9, also with 5 seizures, had a very low performance. Hence, we cannot relate the difficulty of each patient to the number of seizures. Besides, if a patient has fewer seizures in the recordings to evaluate, the model was also trained with fewer examples from that patient. When analyzing the average seizure length, the two patients with the lowest performance (P5 and P9) had the shortest seizures, on average below 30 s, which could make them harder to identify due to the quick emergence of the seizure patterns. However, P4 had the largest average seizure duration, and the predictions were not accurate.

Additionally, we visualized the predictions of the models for some patients, as shown in Figure 8. In the top plot there are two seizures that were correctly detected in the signal, but there was not an exact match with the beginning and ending times. In the middle image, a false positive is shown, together with a short seizure that is not well delimited. P3, one of the patients with low performance, had noisy signals that could have limited the model's ability to identify a seizure (bottom image).

To include information from the intracranial electrodes, we defined three strategies: (i) finetuning a model pretrained on the first recordings with seizures of the 11 EPILEPSIAE patients, adapting the input size to the intracranial channels only (keeping the same electrode configuration); (ii) training from scratch using only intracranial or scalp electrodes, and their combination; and (iii) testing the models from (ii) in monopolar and bipolar configurations of electrodes. The finetuning must be performed separately for each patient, since each had a different number of intracranial electrodes. Given that the pretrained models were trained on scalp recordings, we had to include an initial convolutional layer that maps from I intracranial channels to 23 or 21. The parameters of this layer are learned by the model as well. The bipolar configuration for the intracranial channels was defined within each grid by subtracting consecutive electrodes. Figure 9 shows examples of the signal representation for the intracranial electrodes (57) of P8. The yellow patterns that appear for some electrodes correspond to higher amplitudes.
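The channel-mapping layer can be illustrated as a learned linear mix of channels applied at every time sample, i.e. the equivalent of a kernel-size-1 convolution; the sketch below uses a random (untrained) weight matrix in place of the learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_adapter(x, n_out=23, weights=None):
    """Linearly mix n_in input channels into n_out output channels at every
    time sample -- the matrix-multiply equivalent of a kernel-size-1
    convolution. In the actual model `weights` would be learned jointly
    with the rest of the network; here it is randomly initialized."""
    n_in = x.shape[0]
    if weights is None:
        weights = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_out, n_in))
    return weights @ x   # (n_out, n_in) @ (n_in, T) -> (n_out, T)

x = rng.normal(size=(57, 1024))  # e.g. P8's 57 intracranial channels
y = channel_adapter(x)           # -> (23, 1024), matching the scalp input size
```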

The evaluation metrics F-measure and AP for patients P8 and P10 are shown in Table V, in which each row corresponds to a different experimental configuration. For both patients, training with only ictal and interictal instances from the first 80% of recordings with seizures improved the predictive power of the model. The recordings and seizures in the test set for both patients are the same as those reported in Figure 7. The model in a bipolar configuration with scalp electrodes trained from scratch has results similar to those reported for P8 and P10 in Figure 7. Using only intracranial


Fig. 8. Qualitative results on the evaluation recordings of the best scalp model in the EPILEPSIAE patients. The amplitude of the signals is not saturated. The y-axis shows the monopolar electrodes, and the x-axis the time in seconds. Blue lines correspond to annotations, and red lines to predictions. Top image: accurate predictions for P10; middle image: shifted prediction and a false positive for P8; bottom image: challenging detection for P3.

Fig. 9. Signal representation of intracranial electrodes in monopolar configuration. The labels on the y-axis are the names of the electrodes according to the grid they belong to. Images on the left show ictal activity, and on the right an interictal period of 4 s.

electrodes improved all the metrics, especially for the bipolar configuration. The combination of both electrode types further improved the results for the monopolar configuration, and for P10 we obtained the best performance using both electrode types in the bipolar configuration (numbers in bold). Additionally, we took the best model trained with the scalp data from the 11 patients (monopolar FCN+DO) and finetuned a model using only intracranial monopolar electrodes, because the model had already seen the scalp electrodes of P8 and P10. The latter combination had the best metrics and performance for identifying the 5 seizures from the test recordings of P8. The advantage of finetuning a model is that it has been previously trained on a larger dataset, using the 11 patients, and the training can continue on a new, smaller set, such as the data from an individual patient.

TABLE V
EVALUATION METRICS OF THE FIRST SEIZURES MODEL FOR PATIENTS P8 AND P10 USING INTRACRANIAL CHANNELS. THE EXPERIMENT NAME REFERS TO THE ELECTRODE CONFIGURATION, TYPE AND MODEL INITIALIZATION, RESPECTIVELY.

First seizures model
                                    P8                    P10
Experiment                   F-measure    AP       F-measure    AP
Monopolar, intra, zero         66.3      67.7        72.1      77.3
Monopolar, scalp, zero         51.7      48.5        64.4      64.0
Monopolar, both, zero          63.3      64.5        79.3      86.2
Monopolar, intra, FT           88.4      90.3        61.1      68.6
Bipolar, intra, zero           64.1      64.9        70.6      79.3
Bipolar, scalp, zero           23.6      15.3        52.9      53.4
Bipolar, both, zero            53.9      52.3        90.0      89.3

Intracranial electrodes are closer to the epileptic foci, provide better spatial localization, and are free of the artifacts present in scalp EEG [13]. Seizure patterns may emerge more strongly over specific electrodes according to the origin of the seizure activity. To validate that intracranial electrodes actually contribute to the seizure detection problem, we must test these configurations on more patients. However, for some of them the number of seizures is not enough to train a model from scratch, since neural networks need many examples from each class to learn representative features.

VI. CONCLUSIONS

We validated the application of object recognition techniques used in Computer Vision to analyze signals and detect epileptic seizures. By changing from the signals to images


domain, our models do not need any preprocessing steps or hand-crafted features to discriminate between ictal and interictal episodes. The high variability in the evaluation metrics across patients reflects the difficulty of the problem, and the challenges of defining a cross-patient model that can generalize well to patients it has never seen before. Cross-patient models can be improved if the model learns from instances of the first seizures or recordings. Changing the representation from amplitude pixels to plots of the signals did not improve the results, and the quality of the predictions was very low, even though this is the visualization of the signals that specialists analyze to identify seizures.

In PhysioNet's patients, we obtained acceptable results (precision and recall over 80%) for 8 of the 24 patients in the experimental setup in which the model had never seen data from these patients before. Since there is no evaluation standard for seizure detection studies, each study reports its own metrics, so the results cannot be compared directly. For instance, in [1], the authors report a recall of 85% for cross-patient models, but not the precision. Instead, they report the false positive rate per hour (0.8/h), but it is not clear what time length they use to define a false positive. They show graphs with the metrics for each patient, but only as the difference between their performance and a previous approach, making the comparison complicated. Despite this, we agree on some of the patients that have the most challenging seizures to detect, such as 6, 12 and 14. The visual comparison between the predictions and labels for these patients shows the limitations of the model in distinguishing brain activity for data that does not have prominent seizure patterns.

The models trained with the EPILEPSIAE patients did not show satisfactory results using only scalp channels. We had fewer seizures and patients available to generate balanced examples of both classes, and the annotations had some issues that we had to review by hand. However, when we added the intracranial electrodes for a patient-specific model, we obtained an improvement in all the metrics compared to using only scalp electrodes.

REFERENCES

[1] Pierre Thodoroff, Joelle Pineau, and Andrew Lim. “Learning Robust Features using Deep Learning for Automatic Seizure Detection”. In: CoRR abs/1608.00220 (2016). arXiv: 1608.00220. URL: http://arxiv.org/abs/1608.00220.

[2] Joaquim Ferreira and Tiago Mestre. “Eslicarbazepine acetate: a new option for the treatment of focal epilepsy”. In: Expert Opinion on Investigational Drugs 18.2 (2009), pp. 221–229.

[3] Alexandros T Tzallas et al. “Automated epileptic seizure detection methods: a review study”. In: Epilepsy – Histological, Electroencephalographic and Psychological Aspects. InTech, 2012.

[4] Ali H Shoeb and John V Guttag. “Application of machine learning to epileptic seizure detection”. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010, pp. 975–982.

[5] Christoph Baumgartner and Johannes P Koren. “Seizure detection using scalp-EEG”. In: Epilepsia 59 (2018), pp. 14–22.

[6] Ramy Hussein et al. “Epileptic Seizure Detection: A Deep Learning Approach”. In: arXiv preprint arXiv:1803.09848 (2018).

[7] YU Khan, N Rafiuddin, and O Farooq. “Automated seizure detection in scalp EEG using multiple wavelet scales”. In: Proceedings of the IEEE International Conference on Signal Processing, Computing and Control (Mar. 2012), pp. 1–5.

[8] Juliane Klatt et al. “The EPILEPSIAE database: An extensive electroencephalography database of epilepsy patients”. In: Epilepsia 53.9 (2012), pp. 1669–1676.

[9] Jonathan Long, Evan Shelhamer, and Trevor Darrell. “Fully Convolutional Networks for Semantic Segmentation”. In: CoRR abs/1411.4038 (2014). arXiv: 1411.4038. URL: http://arxiv.org/abs/1411.4038.

[10] Jason Yosinski et al. “How transferable are features in deep neural networks?” In: CoRR abs/1411.1792 (2014). arXiv: 1411.1792. URL: http://arxiv.org/abs/1411.1792.

[11] Piotr W Mirowski et al. “Comparing SVM and convolutional networks for epileptic seizure prediction from intracranial EEG”. In: Machine Learning for Signal Processing, 2008. MLSP 2008. IEEE Workshop on. IEEE, 2008, pp. 244–249.

[12] Nitish Srivastava et al. “Dropout: a simple way to prevent neural networks from overfitting”. In: The Journal of Machine Learning Research 15.1 (2014), pp. 1929–1958.

[13] Gyorgy Buzsaki. Rhythms of the Brain. Oxford University Press, 2006.