acquisition of biomedical signals databases

8
Acquisition of Biomedical Signals Databases Aspects to Consider when Building a Database, Based on Experiences from the SIESTA Project B iomedical signal databases generally are built for reference purposes. In or- der to make use of such reference data- bases, a physiological or medical question first has to be specified. Like in any medi- cal study, hypotheses have to be formu- lated before setting up a biomedical signal database. The questions posed and the hy- potheses formulated naturally lead to the design of a study protocol. The study pro- tocol specifies the recording equipment and details the type and number of signals and the sampling rates of the signals. The questions also allow the estimation of the number of subjects from whom the data should be collected (“power analysis”). The number of subjects and the number of total investigations then determine whether a single center can collect the data or whether a multicenter study is needed to collect the data within a reason- able period of time. One other reason sup- ports the selection of multicenter studies. Different sites do use different equipment and follow slightly different protocols in terms of diagnosis and treatment. A study with many different partners can reflect these differences and may result being more general than it ever could be derived in a single-center study. With a thoughtful design the resulting database can answer the questions posed in the beginning. Due to the large amount of collected information, being the nature of biomedical signal databases, the data- base can also answer more questions by applying new analysis methodologies or hypotheses to the gathered data. But in general, any database, even with exten- sive data, cannot answer all questions in the field. Very often new data have to be acquired with different protocols, differ- ent channel configurations, different sam- pling rates, or different signal precon- ditioning, etc. As biomedical signal databases are used for scientific and reference purposes, one specific aim is to obtain the best signal quality possible. Therefore, quality assur- ance during data acquisition is a very im- portant task and has to follow structured guidelines. However, there is a lack of such general guidelines and often they need to be database specific at least to some extent. Often, signal databases are created in multicenter studies. Multicenter studies require additional specifications in terms of recording equipment and protocol compatibility. Special care must be taken for continuous quality assurance during the recording period at the different sites. Site visits at all recording laboratories are a very useful method to harmonize re- cording conditions. Such site visits should follow a checklist of items being tracked. Although the checklist can be planned ahead of the recordings, its should be re- vised after the first round of recordings from each participating laboratory. This test series often reveals unforeseeable de- viations from the intended protocol. Usually, the first objective of record- ing biomedical data is not to make a data- base but to monitor or diagnose a patient, or to do research. The recorded data strongly depend on this first objective. For instance, monitoring the vital signs of a patient in acute life-threatening states or being under surgical procedures or anes- thesia conditions requires online analysis and immediate visualization. This is needed to allow immediate intervention by personnel trained for such situations. When data is recorded for research or di- agnosis, then immediate visualization of analyzed data is usually not required, but continuous storage of data is. Annotations and expert scorings of the signals recorded are as important as the data itself. Only through the evaluation of May/June 2001 IEEE ENGINEERING IN MEDICINE AND BIOLOGY 25 0739-5175/01/$10.00©2001IEEE ©Digital Vision T. Penzel 1 , B. Kemp 2 , G. Klösch 3 , A. Schlögl 4 , J. Hasan 5 A. Värri 6 , I. Korhonen 7 1 Department of Medicine, Philipps-University, Marburg 2 Sleep Center, MCH-Westeinde Hospital, Den Haag 3 Department of Neurology, Allgemeines Krankenhaus, Wien 4 Institute for Biomedical Engineering, University of Technology, Graz 5 Tampere University Hospital, Tampere 6 Tampere University of Technology, Tampere 7 VTT Information Technology, Finland

Upload: i

Post on 22-Sep-2016

216 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Acquisition of biomedical signals databases

Acquisition of BiomedicalSignals DatabasesAspects to Consider when Building a Database, Based onExperiences from the SIESTA Project

Biomedical signal databases generallyare built for reference purposes. In or-

der to make use of such reference data-bases, a physiological or medical questionfirst has to be specified. Like in any medi-cal study, hypotheses have to be formu-lated before setting up a biomedical signaldatabase. The questions posed and the hy-potheses formulated naturally lead to thedesign of a study protocol. The study pro-tocol specifies the recording equipmentand details the type and number of signalsand the sampling rates of the signals. Thequestions also allow the estimation of thenumber of subjects from whom the datashould be collected (“power analysis”).The number of subjects and the number oftotal investigations then determinewhether a single center can collect thedata or whether a multicenter study isneeded to collect the data within a reason-able period of time. One other reason sup-ports the selection of multicenter studies.Different sites do use different equipmentand follow slightly different protocols interms of diagnosis and treatment. A studywith many different partners can reflectthese differences and may result beingmore general than it ever could be derivedin a single-center study.

With a thoughtful design the resultingdatabase can answer the questions posedin the beginning. Due to the large amountof collected information, being the natureof biomedical signal databases, the data-base can also answer more questions byapplying new analysis methodologies orhypotheses to the gathered data. But ingeneral, any database, even with exten-sive data, cannot answer all questions inthe field. Very often new data have to beacquired with different protocols, differ-ent channel configurations, different sam-pling rates, or different signal precon-ditioning, etc.

As biomedical signal databases areused for scientific and reference purposes,one specific aim is to obtain the best signalquality possible. Therefore, quality assur-ance during data acquisition is a very im-portant task and has to follow structuredguidelines. However, there is a lack ofsuch general guidelines and often theyneed to be database specific at least tosome extent.

Often, signal databases are created inmulticenter studies. Multicenter studiesrequire additional specifications in termsof recording equipment and protocolcompatibility. Special care must be takenfor continuous quality assurance duringthe recording period at the different sites.Site visits at all recording laboratories area very useful method to harmonize re-cording conditions. Such site visits shouldfollow a checklist of items being tracked.Although the checklist can be plannedahead of the recordings, its should be re-vised after the first round of recordingsfrom each participating laboratory. Thistest series often reveals unforeseeable de-viations from the intended protocol.

Usually, the first objective of record-ing biomedical data is not to make a data-base but to monitor or diagnose a patient,or to do research. The recorded datastrongly depend on this first objective. Forinstance, monitoring the vital signs of apatient in acute life-threatening states orbeing under surgical procedures or anes-thesia conditions requires online analysisand immediate visualization. This isneeded to allow immediate interventionby personnel trained for such situations.When data is recorded for research or di-agnosis, then immediate visualization ofanalyzed data is usually not required, butcontinuous storage of data is.

Annotations and expert scorings of thesignals recorded are as important as thedata itself. Only through the evaluation of

May/June 2001 IEEE ENGINEERING IN MEDICINE AND BIOLOGY 250739-5175/01/$10.00©2001IEEE

©Di

gital

Vision

T. Penzel1, B. Kemp2, G. Klösch3,A. Schlögl4, J. Hasan5 A. Värri6,

I. Korhonen7

1Department of Medicine,Philipps-University, Marburg

2Sleep Center,MCH-Westeinde Hospital, Den Haag

3Department of Neurology,Allgemeines Krankenhaus, Wien

4Institute for Biomedical Engineering,University of Technology, Graz

5Tampere University Hospital, Tampere6Tampere University of Technology, Tampere

7VTT Information Technology, Finland

Page 2: Acquisition of biomedical signals databases

annotations a human user or a computeralgorithm can learn the meaning of spe-cific signal patterns. Therefore, annota-tions and visual evaluation, by experts, ofthe recorded signals can be regarded as thekey for further signal processing and anal-ysis [1]. Especially in complex settingssuch as intensive care, it is often impossi-

ble to judge on the basis of recorded dataonly what was really happening.

A Case Study—the SIESTA ProjectAs a case study for a biosignal data-

base, the European project SIESTA is in-troduced here. Sleep recordings in sleeplaboratories are performed in order to

objectify sleep disorders after havingevaluated the subjective symptoms of in-somnia (“I cannot s leep”) andhypersomnia (“I am always tired and I dofall asleep even when trying to stayalert”). The International Classificationof Sleep Disorders (ICSD), which wasdeveloped in 1990 and revised in 1997,

26 IEEE ENGINEERING IN MEDICINE AND BIOLOGY May/June 2001

Table 1: Minimal and optimal requirements for digital polysomnography as a basis for automatic sleep scoring.The digital amplitude resolution is chosen according to the measurement precision

of the underlying instrument (n.a. = non applicable).

Function Signal Minimum Sampling Rate Optimal Sampling Rate Digital Resolution

Neurophysiology ElectroencephalogramElectro-oculogramElectromyogram

100 Hz100 Hz100 Hz

200 Hz200 Hz200 Hz

0.5 µV / Bit0.5 µV / Bit0.2 µV / Bit

Respiration Oro-Nasal AirflowRespiratory MovementsOesophageal PressureCapnographyOxygen SaturationTranscutaneous pO2, pCO2

Breathing Sounds

16 Hz16 Hz16 Hz16 Hz0.5 Hz0.5 Hz1 Hz

25 Hz25 Hz100 Hz25 Hz1 Hz1 Hz5000 Hz

n.a.n.a.0.5 mmHg / Bit0.1% / Bit1 % / Bit0.1 mmHg / Bitn.a.

Cardiovascular ECGHeart RateBlood Pressure

100 Hz1 Hz50 Hz

250 Hz4 Hz100 Hz

10 µV / Bit1 bpm1 mmHg / Bit

Auxiliary Body TemperatureBody Position

0.1 Hz0.1 Hz

1 Hz1 Hz

0.1 ° Celsius / Bitn.a.

EMG 1

EEG 1EEG 2

EOG 1

EMG 1EOG 2

EMG 2

High-ResolutionMonitor 21”

Printer

CD-ROM

Network

Body Position

ECG

Port

LarynxMicrophone

Resp.Effort

O SaturationHeart Rate

2

Resp.Flow

Esophagus

Abdomen

NasalOral

EMG 2

Video

RoomMicrophone

Preamplifier

Transducer

ECG

Blood PressureAmplifier

Snoring Filter

Respitrace

Pulseoximetry

Transducer

Copnography

PressureAmplifier

Personal Computer

Video ProcessingSound CardAD Converter 16-32 Chan

MonitoringAnalysisControl

Central Panel

Gain+OffsetSetting

ChannelSelection

TimeCode

Chart Recorder8 Channels

EEG 10 mm/s

Chart Recorder8 Channels

Respiration 1 mm/s

1. The diagram illustrates the modules comprising a cardiorespiratory polysomnography with all necessary transducers for sig-nals acquired during sleep studies, amplifiers, and data storage options. Paper chart writers are used for signal quality controlproofing at the time of the recording.

Page 3: Acquisition of biomedical signals databases

defines 88 sleep disorders based onsymptoms and findings from sleep re-cordings [2]. In order to objectify a diag-nosis, a sleep recording must be done in asleep laboratory. Biosignals reflectingneurophysiological, respiratory, and car-diac activities are recorded for eight toten hours during the night. During the re-cording, the signals are also monitored,thus allowing the attending personnel tomake notes on movements, talking dur-ing sleep, or other events being of possi-ble relevance. These data are evaluatedby sleep experts using rules developed bya committee chaired by A. Rechtschaffenand A. Kales in 1968 [3]. The rules werebased on chart recordings of electroen-cephalography, electro-oculography,and electromyography in 30-s epochs.This visual scoring results in fournon-REM (rapid eye movement) sleepstages, with 1and 2 being light sleep, 3and 4 being deep sleep, and REM sleep.Wakefulness and body movements arealso scored and noted. This scoring isstill state of the art in the evaluation ofpolygraphic sleep recordings.

Several l imitations of this pa-per-oriented approach became apparent inthe last 30 years and did lead to multipleapproaches for using computer-basedsleep analysis in order to overcome thelimitations [4]. The SIESTA project wasinitiated to acquire a large reference data-base of sleep recordings from healthy vol-unteers in different age groups and frompatients with sleep disorders selected ac-cording to their prevalence. The aims ofthis multicenter study were:

� to develop an enhanced com-puter-based system for analyzingpolysomnographies in a reliable, re-producible way based on a smalltemporal resolution and a high-am-plitude resolution;

� to obtain an increased understandingof the contribution of well-definedand computed variables to sleepanalysis;

� to achieve an improved descriptionof sleep for subjects which do not fallinto the categories of Rechtschaffenand Kales—e.g., elderly persons, pa-tients with sleep disorders;

� to develop a methodology to makethe new system adaptable andrefinable to sleep disorders otherthan those already included here;

� to compile a sleep scoring manualwith the definitions of the proceduresand terms developed in the SIESTAproject.

Signal Acquisition ConditionsAcquisition of raw physiological sig-

nals is highly dependent on the settings ofamplification and filtering. This affectssignal-to-noise ratio for the informationobtained. Despite similar settings, the re-sulting signals recorded by equipmentfrom different manufacturers often differ.This is due to different implementation ofsensors, amplifiers, and filters by differ-ent manufacturers. Signal-to-noise ratioin low-voltage signals such as brainwaves is especially sensitive to the imple-mentation of amplifiers and the circuitschosen. Therefore, the resulting data aredevice dependent and the device specifi-cation has to be documented with the data.

As different signal channels are inter-preted together, the inter-signal synchro-nization is also important and must bethought of when selecting the recordingequipment (or the analog-digital convert-ers) used throughout the study.Inter-signal synchronization becomes aserious problem when different data arerecorded using different devices with in-dependent clocks to different data files.This is the case in sleep recordings andparallel activity recording using awrist-worn actigraph. In intensive care itis a regular scenario to record data withdifferent devices in parallel. One has toguarantee that the start time matches and

one has to correct for a drift between clockrates of the devices.

Sampling rates must be chosen in sucha way that the requirements of subsequentanalysis are covered [5]. The specifica-t ions as they were appl ied forpolysomnographic recordings and as theywere used in the SIESTA project are pre-sented in Table 1. The different biomedi-cal device modules needed forpolysomnography are depicted in Fig. 1.Once all conditions are set, different re-cording sites were free to use their ownequipment for the study. In the SIESTAproject, minimal sampling rates and thefilter settings for all physiological datawere fixed. Due to the different equip-ment being in use in the different laborato-ries, different sampling rates wereaccepted. Processing of the signals usingthe different sampling rate was done usingtwo approaches. For many analysis meth-ods applied, resampling of the raw signalswas performed to come up with the agreedminimum sampling rate of 100 Hz. In thecase of QRS detection in the ECG, theoriginal sampling rate was preserved andthe analysis was adapted to the conversionrates 100 Hz, 128 Hz, 200 Hz, 250 Hz, 256Hz, and 400 Hz, respectively.

It became also clear that the influenceof sensors and transducers was importantin respiration and oxygen saturation re-

May/June 2001 IEEE ENGINEERING IN MEDICINE AND BIOLOGY 27

2. Visual evaluation of biosignals of a sleep recording is performed using segmentsof 30 s. Sleep stages are classified based on the patterns, and then one of the buttons(each corresponding to a sleep stage) is activated. Here a signal data viewer is pre-sented that was developed for EDF data used in the SIESTA project.

Page 4: Acquisition of biomedical signals databases

cording. For respiratory movement re-cording, piezo transducers of differentkinds, pneumatic belts, and inductiveplethysmography were used. The result-ing waveforms had different signal char-acteristics, so that no uniform analysis ofrespiration could be implemented. Forrespiratory flow, different kinds ofthermistors and thermocouples wereused. The differences between the result-ing waveforms were smaller than the dif-ferences found in respiratory movementsignals.

For oxygen saturation, pulse oximetrydevices from different manufacturers wereused. The devices were modules partiallyintegrated in the polysomnographic re-cording equipment itself. Pulse oximetersuse different settings for the averaging ofpulses and different algorithms when cal-

culating oxygen saturation, based on back-scattered or transmitted light in severalwavelengths. The signal recorded is not theraw signal but the result of a first algorithminvolved in feature extraction. These typesof algorithms are an inherent part of the da-tabase. Care must be taken when the data-base contains signals acquired with variousdevices that may use different algorithms.Practically, it was not possible to assurethat all recording sites did use the same pre-processing algorithm with their oximeters.In consequence, a careful documentationof the type and version of oximeters wasrequired.

Algorithms that make use of the re-corded signals, such as an automatic sleepstaging, are not an inherent part of the da-tabase, and therefore it is not compulsoryto include such algorithms in the database.

Database StructuresIn order to have a systematic access to

the recorded signals and the annotations,rules were settled in the study protocol,with minimum criteria for the signals re-corded by all partners in a multicenterstudy. In the SIESTA project all signalswere either directly recorded using theEuropean Data Format (EDF) or theywere converted into the EDF format afterusing the locally available equipment [6,7]. One single file containing the continu-ously recorded signals was produced persleep recording. Filename conventionsspecified the recording site, the runningnumber of the subject, and the number ofthe night being recorded (either 1 or 2).The order of the signals in the file was alsospecified and it was agreed that the first 16

28 IEEE ENGINEERING IN MEDICINE AND BIOLOGY May/June 2001

1

1

1

1

10

10

10

10

100

100

100

100

1000

1000

1000

1000

10000

10000

10000

10000

100000

100000

100000

100000

−1000 −1000

−1000

−1000

−2000

−1000

−1000

−1000

−2000

−1000

−1000

−1000

−1000

−1000

−2000 −1000

0 0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1000 1000

1000

1000

2000

1000

1000

1000

2000

1000

1000

1000

1000

1000

1000

#1:Fp1-M2

#5:C4-M1

#9:Pos18-M1

#13: Airflow

10^5

10^5

10^5

10^5

10^5

10^5

10^5

10^5

10^5

10^5

10^5

10^5

#2: C3-M2

#6: O2-M1

#10: EMG (chin)

#14: Chest wall mov.

#3: O1-M2

#7: M2-M1

#11: EMG (L+R tib.)

#15: Abdominal mov.

#4: Fp2-M1

#8: Pos 8-M1

#12: ECG

#16: Oxygen sat.

1

1

1

1

1

1

1

1

1

1

1

1200 400

3. Histograms of 16 channels (7 EEG, 2 EOG, 2 EMG, 1 ECG, 3 respiration, 1 oxygen saturation) from one polygraphic record-ing in logarithmic scale. The horizontal line and the vertical markers (green) indicate the mean µ ±1,3,5 times of the standarddeviation (µ ±1,3,5*σ); the left and right triangles indicate the maximum and minimum value; the light (blue) parabolic line in-dicates the Gaussian distribution with the same N, µ and σ2 ; cross markers (x) indicate the digital minimum and maximum val-ues as provided in the EDF header information.

Page 5: Acquisition of biomedical signals databases

signals must be arranged in that particularorder to simplify access for analysis algo-rithms. The possible sampling rates forthe different signals were specified. Thefilename convention was further used foraccompanying files, which were used fordescriptive data of all subjects and for ad-ditional test data [8].

Quality Control of Recorded SignalsIn the SIESTA project, site visits were

performed to check the recording condi-tions at the different sites. This helped inharmonizing the recording practice in thedifferent laboratories and thereby im-proved comparability and overall qualityof the recorded data.

Having acquired the data, quality con-trol first checked the formal criteria of thesignal database. It was advantageous herethat there existed only one data file ofbiosignals for each recording. The firstcheck tested the filename conventions.The second check tested the contents ofthe fields in the global header and the sig-nal headers according to the EDF file for-mat definition [6, 7]. The fields werechecked in terms of correct characters inthe various places. The order of signalswas checked by investigating the labelswritten in the signal headers. The allowedset of labels was defined in the study pro-tocol of the SIESTA project. When the en-tries of all header fields were checked,deviations from the strict definitions forthe contents were found regularly. Sometyping errors could be corrected automati-cally whereas others, such as a shuffledorder of signal channels, needs visual in-spection prior to corrections.

If the contents of the file headers werecorrect, but the data itself were shuffledwithin one signal channel, this could be-come obvious when checking signal prop-erties automatically. This was the thirdstep in checking the data files.

The digitized recordings were testedautomatically using a histogram analysisin order to identify technical failures andartifacts. The bin-width of the histogramswas chosen to be the quantization of theanalog-digital converter (ADC). In caseof 16-bit signed integer numbers, as usedin the EDF format, the histogram H i( ) hasbins in the range −32768 ≤ i ≤ 32767.

When the total number of samples Ngoes to infinity, the normalized histo-gram is the probability distribution p i( ).For large N, the probability distributionmay be approximated by the normalizedhistogram:

May/June 2001 IEEE ENGINEERING IN MEDICINE AND BIOLOGY 29

SaO2

Abdomen

Chest

Airflow

ECG

EMGleft

EMGmental

Pos18-M1

Pos8-M1

M2-M1

O2-M1

C4-M1

Fp2-M1

O1-M2

C3-M2

Fp1-M2

N=458

0 5 10 15Entropy [bit]

4. Box-whisker-plot of entropy values of polygraphic recordings. The boxes indicatethe 0.25 and 0.75 quantile, and the whiskers indicate the 1.5 inter-quantile range(IQR). The polygraphic data was stored in EDF format [6] using 16-bit integer val-ues. Each channel of each recording (458 in total) was calculated by the histogramand the entropy [10]. The polygraphic SIESTA database [8] was used.

5. This example of a sleep recording gives muscle artifacts and movement artifactsthat are found at major transitions from one sleep stage to another one. This is ac-companied by a change in body position, which is annotated in the sleep log writtenby the attending technicians.

Page 6: Acquisition of biomedical signals databases

$( )( )

p iH i

N=

(1)

and the entropy of information I [9] in bi-nary digits (bits) is defined as:

I p i p ii

= ∑ ( ) log ( )2 .(2)

The mean µ, varianceσ 2 , and the stan-dard deviation σ can be obtained from thehistogram H i( ):

µ µ= → =∑ ∑i i

i p iN

i H i( ) $ ( )1

(3)

$ ( $ ) ( )σ µ2 21= −∑N

i H ii

.(4)

The results of the quality control aregiven as an example in Fig. 3. This showsthe histograms of 16 channels from onepolygraphic all-night sleep recordingstored in EDF format [6]. For each chan-nel, the header information contained adigital minimum and maximum of −2048and +2047, respectively. Hence, a 12-bitADC was used. The “sidelobes”of the his-togram are clipping peaks due to satura-tion of the input. It can be caused by alimited dynamic range of the input ampli-fier and/or the ADC. The “smearing” ofthe peaks is probably due to digital filteror a temperature drift of the dynamicrange of the input amplifier. Several chan-nels show also a peak at the value zero.

Figure 4 shows that the entropy of thesignal channels is most often between 7

and 11 bits. Only the oxygen saturationshows a median value of 3 bits. This indi-cates that the quantization resolution islow. Pulse oximeters theoretically pro-duce digital-analog converted values be-tween 0% and 100% saturation with stepsof 1%. During the recordings of our sub-jects the range of values was found to bemuch lower due to oxygen saturation be-ing in normal ranges, and this results inthe very low entropy.

Unfortunately, the saturation values ofthe input amplifiers and the ADC used areoften not specified with the signal data[10]. This lack of information makes itdifficult to perform an automated qualitycontrol using an overflow check for thesignals. In order to overcome this diffi-culty, it is recommended that the initialsaturation values are stored together withthe data.

The final check of the signal quality isthe visual inspection of the signals by anexpert (Fig. 5). During the inspectionphase, quality-related annotations can beadded to the database. Signal artifacts andbiological artifacts are noted. Polygraphicrecordings may be superimposed by manydifferent types of noncortical sources. Inorder to describe the sleep process, thesenoncortical sources must be removed, orat least detected. For a systematic evalua-tion of various artifact processing meth-ods, 90-minute segments out of 15randomly selected sleep recordings werevisually analyzed and annotated. Ninetypes of artifacts (EOG, ECG, muscle,movement, failing electrode, sweat, 50

Hz, breathing, pulse) were visuallyidentified, resulting in a total of 5631921-s epochs. Different artifact detectionmethods (least mean squares algorithm,regression analysis, independent compo-nent and principal component analysis,etc.), which are able to remove technicaland signal artifacts, were tested on the se-lected set of recordings with the annotatedartifacts (Table 2). After validating differ-ent artifact processing algorithms [11],adaptive FIR filtering, regression analy-sis, and template removal were recom-mended to minimize the ECGinterference, 50-Hz notch filtering forminimizing the line interference, adaptiveinverse filtering for muscle and move-ment detection, and combined overflowand flat line detector for failing electrodeartifacts (Fig. 6).

Archiving ConsiderationsSignal data need storage space.

Luckily, digital storage space becomesless expensive and more condensed interms of physical volume dimensions.Using the signal specifications given inTable 1, a digital recording of one night ofsleep with 16 channels being recorded foreight to ten hours requires approximately130 MB of digital memory. A recordableCD-ROM can hold four recordings of thissize. Sleep recordings require two consec-utive nights with recording of all signals.The first night is called “adaptation night”and is examined to investigate the“first-night effect” compared to the sec-ond recorded night. Often subjects need

30 IEEE ENGINEERING IN MEDICINE AND BIOLOGY May/June 2001

Table 2. Possible artifacts in the EEG signal. The methods to detect and remove these and the solution chosenwithin the SIESTA project. (ICA = independent component analysis, PCA = principal component analysis)

Artifact Detection and Removing Solution Chosen

EOG Regression Analysis, PCA, ICA Minimizing

ECG Regression Analysis, PCA, ICA, TemplateRemoval

Minimizing

Muscle Activity Adaptive Inverse Filtering Detect and Mark Data

Movement Adaptive inverse filtering, overflow check Detect and Mark Data

Failing Electrodes Detect Typical Step Function with Exponen-tial Decay, Combined Overflow and FlatLine Detector

detect and mark data

Sweating High-Pass Filtering None

50 Hz interference Notch Filter Off-Line Filter

Breathing Not Important for EEG No Action

Pulse Cross Correlation in Time, (did not appear inthe SIESTA artifact database [11])

Page 7: Acquisition of biomedical signals databases

one night to become comfortable with therecording equipment attached to theirhead and body. In order to evaluate thedifference, both recorded nights need tobe stored. Thus, recordings from two sub-jects will fit on one CD-ROM. The data-base of sleep recordings set up by theSIESTA project consists of 200 healthyvolunteers and 100 patients with selectedsleep disorders. This sums up to approxi-mately 150 CD-ROMs, which in conse-quence require some physical space.Today, hard-disk capacities increase rap-idly, but the ~100 GB needed to store theraw data recorded within the SIESTAproject is still not the standard hard-disksize on desktop computers. In order tohave an easy and systematic access to alldata files and all information, it is favor-able to set up a conventional databasethat keeps track of all subjects recordedand the related files. This core databaseholds medical information about the sub-jects, file information about the availabledata, technical errors and signal informa-tion with artifact annotations, quality an-notations, and interpretation results. Thelarge signal data files may either resideon a central server or on CD-ROMswhere filename conventions are strictlyfollowed. DVD can serve as a practicalalternative to CD-ROMs in order to re-duce the number of separate disks. Withthese rules in mind, archiving and access-ing the database can be done in a conve-nient way.

ConclusionWhen setting up a study design for the

acquisition of a signal database, this has tobe done on the basis of defined hypotheseswith a fixed protocol. On the basis of a cal-culation on the number of subjects needed,the decision has to be made as to whether amulticenter study is need. For large signaldatabases this is usually the case. An agree-ment of all groups providing data must beobtained. Quality assurance in the begin-ning of the database collection period andduring the recording is extremely impor-tant. Site visits are very useful and regularchecks on signal files are necessary. Theitems to be checked are:

� filename consistency;� adherence to the recording file for-

mat specification and agreed labelingconventions;

� entries in file header fields can showminor deviations and major inconsis-tencies;

� saturation values should be alwaysavailable in the file header informa-tion;

� entropy values calculated as auto-matic signal quality checks definepossible compression rate and givequality indicators;

� visual inspection yields technical,signal, and biological artifacts.

The total volume of a signal database isso large that archival considerations haveto be clarified in the beginning. This be-gins with filename conventions and endsat the specific information derived fromanalyzing the files in order to obtain newresults for research and clinical work.

AcknowledgmentThe implementation of the database on

sleep was supported by a grant from theEuropean Union Biomed-2 BMH4-CT97-2040 under the name SIESTA “Anew Standard for Integrating PolygraphicSleep Recordings into a ComprehensiveModel of Human Sleep and its Validationin Sleep Disorders.”

Thomas Penzel studiedphysics and mathemat-ics in Göttingen, Berlin,and Marburg. He re-ceived his diploma intheoretical physics in1986. He then studiedhuman biology and ob-tained a doctorate thesis

in 1991. In 1995 he completed his habilita-tion in physiology at the University ofMarburg. He also studied medical infor-matics and completed this in 1997 with acertificate from the German Society onMedical Informatics (GMDS). Since 1982he has worked at the sleep laboratory of theUniversity in Marburg. Since 1987 he hasbeen leading the task group “Methodol-ogy” of the German Sleep Society(DGSM). Since 1992 has been the chair ofthe “commission for the accreditation ofsleep laboratories” in Germany. In 1993 hewas elected as an extended board memberof the German Sleep Society. Since 1997,he has been a member of the board of theInternational Society on Biotelemetry(ISOB). He has written more than 80 na-

May/June 2001 IEEE ENGINEERING IN MEDICINE AND BIOLOGY 31

6. The signals of this recording show a number of different artifacts. The mostprominent is the artifact in oxygen saturation (the bottom-most channel)valuesabove 100% are not possible. This is a technical artifact. The respiratory move-ment channels, marked as “chest” and “abdomen” do move opposite, which indi-cates a inverse polarity in one of the signals. In the linked reference channel(M1-M2) and in the EOG channel (Pos8-M1) ECG artifacts are visible. In thefrontal EEG channels (Fp1-M2 and Fp2-M1), EOG artifacts are very prominent.All these artifacts are neglected for visual analysis: this is a perfect sample ofREM (rapid eye movement) sleep.

Page 8: Acquisition of biomedical signals databases

tional and international papers and morethan 50 book chapters.

Bastiaan (Bob) Kempreceived the M.Sc. inelectrical engineeringand a Ph.D. on model-based monitoring ofsleep stages fromTwente University ofTechnology, Enschede,The Netherlands, in

1977 and 1987, respectively. He is a medi-cal physicist in the Department of Neurol-ogy at Leiden University Medical Centre,The Netherlands. He is also a clinicalphysicist and director of the Center forSleep and Wake Disorders in MCH-Westeinde Hospital in Den Haag, TheNetherlands. His interests are in sensors,models, and algorithms for the study ofsleep and movement disorders.

Gerhard Klösch studiedpsychology and politi-cal science in Vienna.Since 1989 he hasmainly been working inthe field of sleep re-search (neurophy-siology of sleep anddream research). He co-

ordinated quality control and data man-agement in the SIESTA project.

Alois Schlögl receivedhis Dr. techn. degree forhis PhD-thesis on “theelectroencephalogramand the adaptive auto-regressive model” in2000 from the Universityof Technology Graz,Austria. During his Ph.D.

study, he was a research assistant at the In-stitute of Biomedical Engineering. He wasworking for a project on an EEG-basedBrain Computer Interface and for the SI-ESTA project on multicenter sleep analysis.

Joel Hasan studiedmedicine and computerscience in Turku, Fin-land. He received hismedical degree in 1977and was certified as aspecialist in clinicalneurophysiology in

1987. He was the chiefphysician of clinical neurophysiology ofKanta-Häme Central Hospital from1987-89, has been the staff neuro-physiologist of the University Hospital ofTampere since 1990, and head of the De-partment of Clinical Neurophysiologysince 1997. He has written more than 50international and national papers, includ-ing book chapters. The main emphasis ofhis scientific work is computer analysis ofsleep recordings. He has participated inthe European research projectsComacBME and SIESTA.

Alpo Värri received theM.Sc. in electrical engi-neering in 1986 and theDr.Tech. degree in sig-nal processing in 1992,both from the TampereUniversity of Technol-ogy, Finland. Currently,he is a senior researcher

and the vice head of the Signal ProcessingLaboratory of Tampere University ofTechnology. Since 1994 he has partici-pated in health informatics standardiza-tion within CEN/TC251. His researchinterests include biomedical signal pro-cessing, particularly in sleep research andcomputer programming.

Ilkka Korhonen wasborn in 1968 inHankasalmi, Finland.He received his M.Sc.and Dr.Tech. degrees indigital signal process-ing from Tampere Uni-versity of Technologyin 1991 and 1998, re-

spectively. He is currently working atVTT Information Technology. He is a do-cent in medical informatics (with special-ity in biosignal processing) at the RagnarGranit Institute at Tampere University ofTechnology and a member of IEEE EMBSociety. His main research interests are toapply biosignal interpretation methods incritical-care patient monitoring, autono-mous nervous system research, and homeor remote health monitoring.

Address for Correspondence: Dr.Thomas Penzel, Department of InternalMedicine, University Hospital ofPhilipps-University, Baldingerstr. 1,

D-35033 Marburg, Germany. Fax: +496421 2865450. E-mail: [email protected].

References[1] I. Korhonen, J. Ojaniemi, K. Nieminen, M.van Gils, A. Heikelä, and A. Kari, “Building theIMPROVE data library, “IEEE Eng. Med. Biol.Mag., vol. 16, pp. 25-32, 1997.[2] M.J. Thorpy, “American Sleep Disorders As-sociation: International Classification of SleepDisorders—Revised Version: Diagnostic andCoding manual,” American Sleep Disorders As-sociation, Diagnostic Classification SteeringCommittee, Rochester, 1997.[3] A. Rechtschaffen and A. Kales, A Manual ofStandardized Terminology, Techniques, andScoring System for Sleep Stages of Human Sub-jects, Los Angeles, CA: BIS/BRI, Univ. of Cali-fornia, 1968.[4] T. Penzel and R. Conradt, “Computer basedsleep recording and analysis,” Sleep Med. Rev.,vol. 4, pp. 131-148, 2000.[5] T. Penzel, U. Brandenburg, J. Fischer, M.Jobert, B. Kurella, G. Mayer, H.J. Niewerth, J.H.Peter, T. Pollmächer, T. Schäfer, R. Steinberg, E.Trowitzsch, R. Warmuth, H. G. Weeß, C. Wölk,and J. Zul ley, “Empfehlungen zurcomputergestützten Aufzeichnung undAuswertung von Polygraphien,” Somnologie, vol.2, pp. 42-48, 1998.[6] B. Kemp, A. Värri, A.C. Rosa, K.D. Nielsen,and J. Gade, “A simple format for exchange ofdigi t ized polygraphic recordings,”Electroencephalogr. Clin. Neurophysiol., vol. 82,pp. 391-393, 1992.[7] A. Värri, B. Kemp, T. Penzel, and A. Schlögl,“Standards for biomedical signal databases,”IEEE Eng. Med. Biol. Mag., vol. 20, no. 3, pp.33-37, May/Jun. 2001.[8] G. Klösch, B. Kemp, T. Penzel, A. Schlögl, G.Gruber, W. Herrmann, P. Rappelsberger, E.Trenker, J. Hasan, A. Värri, and G. Dorffner, “TheSIESTA project polysomnographic and clinicaldatabase,” IEEE Eng. Med. Biol. Mag., vol. 20,no. 3, pp. 51-57, May/Jun. 2001.[9] C.E. Shannon, “The mathematical theory ofcommunication,” Bell Syst. Tech. J., vol. 27, pp.379-423, 623-656, 1948.[10] A. Schlögl, B. Kemp, T. Penzel, D. Kunz,S-L. Himanen, A. Värri, G. Dorffner, and G.Pfurtschel ler , “Qual i ty Control ofpolysomnographic Sleep Data by Histogram andEntropy Analysis,” Clin. Neurophysiol., vol. 110,pp. 2165-2170, 1999.[11] A. Schlögl, P. Anderer, M-J. Barbanoj, G.Klösch, G. Gruber, J.L. Lorenzo, O. Filz, M.Koivuluoma, I. Rezek, S.J. Roberts, A. Värri, P.Rappelsberger, G. Pfurtscheller, and G. Dorffner,“Artifact processing of the sleep EEG in the ‘SI-ESTA’-project, Proc. EMBEC’99, Vienna, Aus-tria, Nov. 1999, Part II, pp.1644-1645.

32 IEEE ENGINEERING IN MEDICINE AND BIOLOGY May/June 2001