
    Compression and Storage

    of Medical Data in Pacemakers

    PONTUS LINDQVIST

Master's Degree Project
Stockholm, Sweden 2005

    TRITA-NA-E05129


Numerisk analys och datalogi (Department of Numerical Analysis and Computer Science)
KTH (Royal Institute of Technology)
SE-100 44 Stockholm, Sweden

    PONTUS LINDQVIST

    TRITA-NA-E05129

Master's Thesis in Computer Science (20 credits)

at the School of Computer Science and Engineering,

Royal Institute of Technology, year 2005

Supervisor at Nada was Anders Lansner
Examiner was Anders Lansner

    Compression and Storage

    of Medical Data in Pacemakers


    Abstract

The heart is a highly sophisticated organ in the body that has fascinated humans for centuries. In order to target pathological conditions caused by electrical malfunction in the heart, it is possible to implant a small electrical device known as a pacemaker in the patient. In addition to stimulating the heart, the pacemaker collects data describing the electrical activity of the heart, thereby enabling monitoring of the patient. In current pacemakers relatively small datasets are stored, only small amounts of compression exist, and this is primarily controlled by hardware. In order to replace the ad hoc solutions available today, a new storage strategy was investigated. A wide range of compression techniques were evaluated to find an optimal compression strategy for pacemaker data. Wavelet compression techniques were found to be optimal in terms of compression but require much computing power due to their high complexity. An alternative Adaptive Peak compression algorithm was developed during the project.

This algorithm (together with the Fan algorithm) seems to be the best choice for pacemaker data. Also, databases designed for embedded systems were investigated in order to fulfill the requirements for storage of large diagnostic data collections in the pacemaker. Based on theoretical as well as empirical results, it was found that the Berkeley DB and eXtreme databases were the best choices for the pacemaker application.

Keywords: pacemaker, compression, storage, main memory database, MMDB, heart, real-time, embedded


    Sammanfattning

Title: Komprimering och lagring av medicinska data i pacemakers

The heart is a tremendously sophisticated organ that has fascinated humans for centuries. To treat pathological conditions caused by faults in the heart's electrical conduction system, a so-called pacemaker can be implanted in the patient. This device can stimulate the heart, but it also collects medical data from the patient's heart, enabling more reliable diagnoses and better treatment. In the pacemakers on the market today, this data collection is fairly simple, with small amounts of data, and to the extent that data compression exists, it is hardware controlled. Collecting larger amounts of data requires a more long-term strategy, and the goal of this project was to investigate different solutions for such a strategy. A large number of compression techniques were examined to find an optimal way of compressing data collected by a pacemaker. Compression using wavelet techniques gives a very good compression ratio for pacemaker data, but requires much computing power. As an alternative, a new compression technique (Adaptive Peak Compression) was developed, which turned out to be the most optimal (together with the Fan technique). As a second part of the project, a number of databases designed for embedded systems were examined. By using such databases, one obtains a flexible system for storing data in a pacemaker. Theoretical and empirical studies showed that the products Berkeley DB and eXtreme were the best choices for an application such as the pacemaker.

Keywords: pacemaker, compression, storage, database, MMDB, heart, real-time, embedded


    Preface

This report describes the results of my master's thesis in computer science at the Royal Institute of Technology (KTH) in Stockholm. The project has been performed at the Department of Numerical Analysis and Computer Science in cooperation with the commercial company St Jude Medical in Veddesta outside Stockholm. My supervisors have been professor Anders Lansner (KTH) and Göran Budgifvars (St Jude Medical).

    Pontus Lindqvist, Stockholm 2003


    Table of contents

1. Introduction
1.1. The Physiology of the Heart
1.2. Pacemaker Treatment
1.3. Data Storage in the Pacemaker
1.4. The need for a better storage solution

2. Goals and requirements
2.1. Data compression
2.2. Data storage

3. Background and theory
3.1. System characteristics
3.2. Stored data types and their susceptibility to compression
3.3. Compression algorithm theory

4. Investigated candidates
4.1. Investigated compression algorithms
4.2. Investigated database systems

5. Materials and methods
5.1. General approach
5.2. Compression algorithm evaluation setup
5.3. Setup of literature study of database systems

6. Results
6.1. Compression algorithm compression performances
6.2. Compression algorithm complexity performance
6.3. Literature study of database systems

7. Discussion
7.1. Compression algorithms
7.2. Database systems

8. Conclusions

References

Acknowledgements

Appendix A Heart Physiology
General vessel anatomy
The Cardiac Cycle
Electrophysiological Heart Disorders


    1. Introduction

The heart is a highly sophisticated organ in the body that has fascinated humans for centuries. Despite its modest weight and size, the heart is capable of performing a vast amount of work, 24 hours a day, year in and year out. The sole purpose of the heart is to pump blood to the lungs and to the rest of the body. In one day, the heart beats about 100,000 times, resulting in about 2,500,000,000 beats in the course of a lifetime lasting 70 years [1].

    1.1. The Physiology of the Heart

The heart is centrally located in the chest, slightly to the left of the body's midline. A cage of protective ribs surrounds the organ, which is situated between and in front of the lungs.

Figure 1.1 The general anatomy of a human heart.

Being a double pump formed from a network of nerve cells and muscle fibers, the heart is naturally divided in two parts. Each part is a separate pump and there is no direct exchange of blood between the two sections. The wall separating them is called the septum. Each side (right and left) of the heart has two cavities, an atrium and a larger ventricle, forming a two-staged pump.

1.1.1. The Electrical Activity of the Heart

The heart is not dependent upon nervous connections for contracting and pumping blood. This is a result of the heart's intrinsic electrical activity, which is responsible for the generation of heartbeats. The nervous system is thus only used for regulating the intrinsic mechanisms of the heart. It is imperative that the atrial and ventricular contractions of the heart take place in a specific sequence at appropriate intervals. This is accomplished by the conduction system of the heart, which is able to trigger and transmit the electrical impulses controlling the heart activity. These impulses are electrochemical in nature and they arise through depolarization (electrical discharge) of the cardiac cells.

The cells in a heart can be divided into two groups, depending on their electrical characteristics. The first collection is composed of cells that spontaneously trigger electrical impulses and quickly transmit them. These cells do not contract and are mainly found in the conduction system of the heart. The second and much larger group consists of cells that respond to electrical impulses by contraction. These cells are responsible for the actual mechanical work performed by the heart [1]. In the right atrium, a small oval group of cells called the sinoatrial (SA) node is situated (Figure 1.2). These cells belong to the first type of cardiac cells and have a special ability to initiate electrical impulses at regular intervals, being regulated by the autonomous nervous system, hormones and multiple other factors. The cells in the SA node are responsible for keeping the pace of the heart and are thus called pacemaker cells.

Figure 1.2 The components of the electrical conduction system of the heart.

The electrical activity of the SA node spreads across the atrial chambers of the heart, resulting in atrial contraction. At the same time, three separate conduction pathways lead the electrical signal from the SA node through the right atrium to the atrioventricular (AV) node (Figure 1.2). The AV node is responsible for conducting the depolarization from the atria to the ventricles, which are electrically insulated from each other. In addition, it is responsible for delaying the electrical signal so that the ventricular contraction is performed at an appropriate time after the atrial one. After the AV node, the electrical signal is conducted to the ventricular muscle cells through other parts of the conduction system, such as the bundle of His and the Purkinje fibers.

The electrical impulses then trigger the myocardium to contract, which in turn pumps the blood out of the heart [2;3].


1.1.2. Recordings of the Heart's Electrical Activity

By placing sensory electrodes on the surface of the body and recording the voltage differences, it is possible to obtain a measure of the electrical activity of the heart. This continuous recording is called an electrocardiogram (ECG) and is a highly useful diagnostic aid in the study of cardiac health (Figure 1.3).

Figure 1.3 A surface ECG describing the electrical activity of the heart. The P wave corresponds to atrial depolarization, the large QRS complex shows ventricular depolarization, while the T wave indicates ventricular repolarization [3].

By using electrodes positioned at various parts of the body, it is possible to obtain information about the electrical activity in specific cardiac regions, making the technique even more powerful [2]. ECG recordings can be used by a physician to study heart rate, heart rhythm and morphological changes along with many other parameters [1].

    1.2. Pacemaker Treatment

The heart is a complex organ and, as such, susceptible to errors. There are a number of pathological heart conditions described, ranging from muscle atrophies to myocardial infarction. One class of cardiac pathology is associated with disturbances in the electrical system of the heart. These conditions can often be treated by implanting an embedded device, such as a pacemaker or an implantable cardioverter defibrillator (ICD). For the interested reader, the various electrophysiological indications targeted by such devices are described in appendix A.

In order to target the pathological conditions described above, it is possible to implant a small electrical device known as a pacemaker in the patient. The pacemaker is implanted in the body and connected to the heart via one or more insulated wires called leads (Figure 1.4). The device is powered by a battery that delivers small low-voltage pulses to the heart, thereby enabling it to imitate the intrinsic functionality of the cardiac electrophysiological system.


Figure 1.4 The location of the implanted pacemaker with its electrodes leading to the heart chambers.

Modern pacemakers are small computers with complicated software controlling the treatment. With these implantable devices it is possible to treat all the indications described in appendix A. A sick SA node can be replaced by an atrial electrode, which mimics the action of the endogenous pacemaker cells. Furthermore, AV block can be treated by inserting electrodes in the ventricles, thereby bypassing the blocked AV node [1].

Many types of pacemakers exist today, differing in capabilities and indications. They range from relatively simple examples with one single atrial electrode to more complex systems with multiple electrodes and more sophisticated software. It is beyond the scope of this thesis to describe these pacemaker types, but an informative review of the current pacemaker types can be found in [1].

    1.3. Data Storage in the Pacemaker

When deciding how to stimulate the heart with electrical impulses, the pacemaker collects data describing the electrical activity of the heart. By evaluating this information, the pacemaker can adapt to every patient's special physiology and thereby enable an optimal treatment. The electrical leads terminating in the heart chambers thus serve both as sensory and stimulatory probes. Instead of discarding the collected data, it is continuously analyzed and stored by the pacemaker for future evaluation. By means of a technique known as telemetry, it is possible to transfer data between the pacemaker and external devices, thereby enabling the physician to draw conclusions about the historical condition of the patient.

Upon looking at the information collected by the pacemaker, it is possible to reprogram the pacemaker and thereby change the settings for the device. This further enables the device to be customized for the patient's specific needs.


1.3.1. The Storage Configuration in Current Pacemakers

Pacemakers have been able to store diagnostic data for many years, but due to limitations in hardware the use has been rather restricted. The storage of data in current commercial pacemaker platforms is characterized by the following:

Small data sets: hardware restrictions place a restrictive limit on the amount of data that can be stored. The data sets that are stored are thus relatively small.

Ad hoc solutions: there generally does not exist a general storage strategy. Rather, ad hoc solutions are common, with a low degree of standardization. This is probably a result of the small data sets stored, which do not require a more sophisticated data model.

Often hardware controlled: much functionality in current pacemakers is controlled by hardware. This results in lower flexibility and less generalized storage methods.

Low degree of compression: since much of the data storage is hardware controlled, much of the data is stored in a non-compressed format.

    1.4. The need for a better storage solution

As the development of new pacemaker technology focuses on a more software-based product, at the same time as hardware limitations become less restrictive, a new strategy for storing data is needed. To enable the storage of more complicated data with substantially larger data sets, a more generic method for storing data has to be conceived. The model has to be able to store all data types stored in the current pacemaker platform as well as providing support for possible future parameter types.


    2. Goals and requirements

This section will define the goals of this project as well as give a short introduction to the general approach used to achieve them. The general goal of the project has been to investigate new methods for improving the storage of data in the pacemaker. When analyzing the problem, it was found that it could be divided into the two rather separate sub problems of data compression and data storage. These sub problems, as well as the goals within them, are presented below.

    2.1. Data compression

By compressing the data recorded in the pacemaker, more information can be stored for future evaluation. As stated in section 1.3.1, the amount of compression in current pacemaker platforms is low. The first sub goal of this project has thus been to investigate different compression methods to find a potential candidate for future platforms.

2.1.1. Compression requirements

Before going into details about the different available compression techniques, some special remarks about the pacemaker platform and its requirements have to be made. A compression strategy usable in the pacemaker platform has to consider various issues. A strategy for compressing data has to fulfill these requirements:

Information preservation: due to diagnostic restrictions, it is imperative that the information found in the original data is preserved after compression.

Control of compression degree: another preference is the ability to control the amount of data compression. Recent information is preferably stored in an exact form with a low degree of compression. However, with older data a more aggressive (and information reducing) compression strategy is accepted [4]. The demand for differential compression degrees introduces a preference for a homogeneous data representation. This means that no matter how compressed (or non-compressed) a data section is, the same data structures are used for representing the data. This is not a cardinal requirement, but if the same method can be universally used, much complexity can be removed from the software.

Limited complexity: due to the limited processing capacity of the pacemaker, an algorithm for compressing data has to have a low complexity. This is partly due to competing processes and processor performance, but the prime reason is the need for energy preservation. The battery of the pacemaker has limited energy resources and every processor computation requires a small amount of energy [4]. This fact rules out many compression techniques involving extensive calculations, which could be potential candidates in other circumstances.


    2.2. Data storage

The second sub goal of the project has been to evaluate various database solutions with regard to the pacemaker platform. There exists a variety of commercial software designed for storing data in embedded systems. This part of the project has thus focused on investigating a variety of existing embedded database systems to find an appropriate candidate for future use in the pacemaker platform. The primary goal was to find such a candidate and, if none was found, to give recommendations as to how to develop such a solution.

2.2.1. Storage requirements

There are a number of requirements that have to be fulfilled by the storage solution in the pacemaker. These include such issues as:

Minimization of memory footprint: due to physical limitations, the amount of memory in the pacemaker is restricted. An acceptable overhead footprint is in the order of tens of kilobytes.

Reduction of resource allocations: in the pacemaker, as in most embedded systems, the database system is run on the same processor as the rest of the application [5]. To reduce power consumption and to allocate as much capacity as possible to other subsystems, it is imperative for the database to allocate minimum CPU bandwidth.

High availability: in contrast to traditional database systems, the embedded system in the pacemaker does not have a system administrator present during run-time. Therefore, the system must be able to run on its own.

Operating system support: the operating system environment in the pacemaker is very specific. It is important that the database be able to run on the current and possible future systems.

Support for specific data types: the data collected by the pacemaker is of various formats [6]. It is important that the database can handle these in an appropriate and efficient way.

Ease of expansion: the database solution must support possible future demands for expansion of the various forms of data it stores.


    3. Background and theory

In this section, some background information about the problem is introduced. Issues such as the pacemaker system characteristics and general theory of compression algorithms are presented.

    3.1. System characteristics

The pacemaker platform is a software environment characterized by its very restrictive properties. In order to understand the problems facing a developer, it is necessary to understand the properties of the platform. Without going into vendor-specific details, the general properties of a pacemaker software platform are described in the following subsections.

3.1.1. Hardware platform

The pacemaker hardware platform is designed to be very compact, sustainable and power efficient. Due to its potential life-sustaining nature, the pacemaker platform has to be very stable with extremely high requirements. The CPU architecture in the pacemaker platform under development is comparable to a 1 MHz Intel 486 processor [4].

There is no secondary storage facility in the pacemaker, such as a hard drive; all data is stored in RAM. The total capacity of the memory is in the low megabyte range. This memory is divided into two main parts, where one is a fast cache-like memory close to the processor and another larger and slower memory is located further away. The cache-like memory is primarily used for storage of application logic while the slower, larger memory is used for storage of diagnostic data [4].

Finally, the hardware platform is designed to be as energy efficient as possible. Although the battery of the pacemaker is small, it is capable of powering the pacemaker for as long as ten years without replacement.

3.1.2. Software platform

The software platform in pacemakers has traditionally been based on assembler programming. However, in future software releases, higher-level object-oriented paradigms will probably be used. The pacemaker application is characterized by its need for real-time applications and operating systems. Due to the energy preservation needs discussed in section 3.1.1, the software has to be very efficient. In fact, the goal is to use the processing power of the CPU only a fraction of the available time, producing a duty cycle of less than 5% [4].


3.2. Stored data types in the pacemaker and their susceptibility to compression

Current pacemakers are capable of storing a variety of data types and diagnostic information to assist the physician in the treatment of the patient. Prior to the discussion of potential storage features of future devices, this section provides a brief summary of the parameter types stored in current pacemakers.

3.2.1. Histogram

Using histogram counters, it is possible to get statistical information about events in a compact but still revealing form. In the present pacemaker platform there are two types of histograms stored. The first type can be characterized as a label histogram, where each bin has its own label describing the contents. One form of label histogram is the event histogram, in which information about the frequency of various cardiac events is shown (Figure 3.1). In this form of histogram, each bin has a specific name, and it collects the number of events corresponding to that bin.

[Figure: event histogram (% of total time) with bins PV, PR, AV and AR]

Figure 3.1 An event histogram belongs to the class of label histograms. In this histogram there are four types of events and the histogram shows their relative frequencies.

The other form of histogram used in the current pacemaker platform is the range histogram, in which each bin corresponds to a specific range. An example of this type is the heart rate histogram shown in Figure 3.2.


[Figure: heart rate histogram (% of total time) with rate bins -59, 60-69, 70-79, 80-89, 90-99 and 100- min-1]

Figure 3.2 The heart rate histogram shows information about the heart rate during a time interval. This is an example of a range histogram.

Several histograms of each type are stored in the present pacemaker, as well as information about sampling rates and average values.

3.2.2. Event records

During runtime, the pacemaker device records a number of frequent cardiac events occurring in the heart. Registered cardiac events can be categorized into one of two categories depending on the event trigger source. The first type is called a sensed cardiac event, indicating a physiological event elicited by the heart itself that has been sensed by the device. The pacemaker can for instance record endogenous atrial currents corresponding to the P wave in an electrocardiogram. The device can also record paced cardiac events, characterized by the fact that the events have been triggered by the pacemaker. In the absence of a sensed atrial P wave, the pacemaker paces the atrium and thus records a corresponding event. The cardiac events are a compact form of information about the device status. Along with each cardiac event, additional information such as timestamp and heart rate is stored. An example of a saved event record is shown in Figure 3.3 below.

Figure 3.3 A plot showing recordings of different cardiac events [7]. Each type of event is represented by a specific symbol. Time is displayed on the x axis and the y axis indicates the heart rate.

The event record is a more detailed way of storing event data than the histogram described in section 3.2.1. Each event occurring in the device is recorded and placed in a circular buffer with a specified size. The buffer adheres to the first-in-first-out (FIFO) paradigm in that new events overwrite older events stored in the buffer due to storage space limitations. Various triggers can, however, cause a part of the circular buffer to be more permanently stored for future reference. These triggers include several intrinsic heart events as well as extrinsic commands (such as placing a magnet on the patient's chest).


3.2.3. IEGMs and other samplings

The third type of data stored in newer types of pacemakers is samplings of intracardiac electrograms (IEGMs) and other parameters. An electrogram is a recording of the cardiac electric activity used for diagnosis of various medical conditions. The most familiar form of an electrogram is the type recorded by electrodes attached to the skin of a patient at different locations on the body. However, a pacemaker device is capable of registering an intracardiac electrogram, indicating that the electrodes used are situated within the heart. In fact, recordings of IEGMs are done with the same electrodes used for the pacing of the heart. An example of an IEGM sampling can be seen in Figure 3.4.

Figure 3.4 Graph displaying an intracardiac electrogram (IEGM) recorded by a pacemaker device [8]. Some event markers are also visible.

IEGM recordings are a more detailed view of the intracardiac electrical events than the event records described in section 3.2.2. The two types of information are often used in combination for determining the status of the pacemaker settings and diagnosing the patient. The IEGM samplings contain large amounts of data points as well as other information such as timestamps, sampling rate and measured parameter types. In addition to IEGM samplings, other parameters such as battery status, intrinsic heart rate and pacing thresholds are recorded in the same manner.

3.2.4. Compression candidates

There are two main candidate information types in the pacemaker when considering compression of data. First of all, the IEGM samplings (section 3.2.3) require large amounts of storage space due to significant data volumes. If this data could be compressed to a higher degree, great performance increases could be accomplished. Second, the event records (section 3.2.2) stored in association with different cardiac events could also be reduced in size. The histogram information stored in the pacemaker does not, however, seem to be a potential candidate for further compression. This is mainly due to the fact that the histogram itself is a very compressed form of presenting data and it does not require large amounts of storage space.

    3.3. Compression algorithm theory

A key component in the process of using the available memory as optimally as possible might be data compression. By compressing data in various ways, less space is needed to store information. Compression techniques that ensure that the result of decompression is exactly the same data as before compression are known as lossless. Conversely, with lossy compression techniques information is lost in the compression process, but a higher degree of size reduction can generally be accomplished. Using these techniques in the pacemaker could enable the storage of more information without the need for more storage space.

3.3.1. Negative effects of compression

There are not only benefits associated with data compression. Several downside factors exist which have to be considered before making any committing decisions in this issue. The main negative issues regarding compression are listed below:

Difficulty of processing: compressed information is often more difficult to process from within the software. Some compression techniques prevent random access to compressed information, and thus it is necessary to decompress an entire data stream to access any part of it. This in turn results in a requirement to have enough main memory to store all the decompressed information.

Software complexity: the software has to provide compression support, making it more complex to maintain.

Resource allocation: as mentioned earlier, compression strategies allocate processing power from the CPU. This could potentially result in both shorter battery lifetime and reduced real-time responsiveness of the device.

Predictability: with some techniques it is difficult to estimate the amount of compression that can be accomplished. This depends on the structure of the data to compress and the algorithm used. Thus, the amount of memory required to store a given amount of data could become less predictable.

3.3.2. Lossless compression techniques

The key idea behind compression is that most data contains a large amount of redundancy, that is, repeated information. The following sections explore several types of non-lossy techniques that use redundancy to enable compression of data.

Difference encoding: much information is constructed of series of data points which are accessed sequentially, beginning at the first item and then processing each item in turn. Continuous data sequences (such as IEGM samplings) are rarely random; in many sequences the values do not change very much between adjacent items, and there are parts where there is no change for several elements. A particularly attractive alternative for compressing data in the pacemaker environment is run-length encoding, which is applicable to streams of sequential data. The technique can be used to reduce the size of any stream of data, but the amount of compression depends on the character of the data. Run-length encoding compresses a run of duplicated items by storing the value of the duplicated item once, followed by the length of the run [15]. Another form of difference coding is the delta coding technique, which stores differences between adjacent items rather than the absolute values of the items themselves [16]. This method saves memory space since the differences can usually be stored as smaller numbers than the absolute values. Of course, a data sequence can first be compressed using delta coding and then subjected to run-length encoding, resulting in an even more compressed data stream. Difference encoding techniques cannot achieve as high an amount of compression as other lossless compression techniques, but they are easy to implement and fast to execute. Sequential operations on the compressed data can execute almost as fast as operations on native values, preserving real-time responsiveness and improving temporal performance [16].
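This minimal Python sketch (the function names are my own, not from the thesis) illustrates delta coding followed by run-length encoding:

    def delta_encode(samples):
        # Store the first value, then the difference to each preceding sample.
        if not samples:
            return []
        return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

    def run_length_encode(values):
        # Collapse runs of identical values into (value, count) pairs.
        runs = []
        for v in values:
            if runs and runs[-1][0] == v:
                runs[-1][1] += 1
            else:
                runs.append([v, 1])
        return [(v, n) for v, n in runs]

    # A flat IEGM baseline delta-codes to mostly zeros, which RLE then collapses.
    signal = [100, 100, 100, 101, 103, 103, 103, 103]
    print(run_length_encode(delta_encode(signal)))
    # [(100, 1), (0, 2), (1, 1), (2, 1), (0, 4)]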

Table compression: the general theory behind table compression is not very complicated. The key is that some data values occur more frequently than others. A statistical table is constructed with all data points and their corresponding frequencies. Then, each data value is mapped to a bit sequence where the most frequent data value gets the shortest bit sequence (usually just one bit). All values get mapped to a bit sequence where the longest bit sequence corresponds to the least frequent value. There are many implementations of table compression described, but the most well-known one is the Huffman code [15]. Table compression results in a moderate compression and the implementation is rather straightforward. However, the statistical calculations preceding the coding of compressed data may require some extra processing time (although not as much as the adaptive compression described below), thereby reducing the real-time performance. In addition, the method might be inappropriate for continuous data streams since it is difficult to map continuous data to a table.
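As an illustration (not from the thesis), a Huffman code table can be built with a priority queue in a few lines; this sketch returns only the symbol-to-bit-string mapping:

    import heapq
    from collections import Counter

    def huffman_code(symbols):
        # Each heap entry: [weight, tiebreaker, [symbol, code], [symbol, code], ...]
        heap = [[w, i, [s, ""]] for i, (s, w) in enumerate(Counter(symbols).items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            lo, hi = heapq.heappop(heap), heapq.heappop(heap)
            for pair in lo[2:]:
                pair[1] = "0" + pair[1]   # left branch
            for pair in hi[2:]:
                pair[1] = "1" + pair[1]   # right branch
            heapq.heappush(heap, [lo[0] + hi[0], count] + lo[2:] + hi[2:])
            count += 1
        return dict(heap[0][2:])

    print(huffman_code([3, 3, 3, 3, 7, 7, 1]))
    # e.g. {3: '1', 7: '01', 1: '00'}: the most frequent value gets the shortest code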

Adaptive compression: algorithms using adaptive compression analyze the data they are compressing and modify their behavior accordingly. These algorithms often provide very high compression ratios and are at work in several lossless techniques, including the commonly used gzip file format [17]. Describing the theory behind adaptive compression is beyond the scope of this thesis, but the main principle behind these algorithms is a detailed analysis of the data to compress and subsequent compression based on the results. A more detailed description of adaptive compression techniques can be found in [18]. The great negative aspect of adaptive compression techniques regarding the pacemaker domain is the need for processing time for data calculations. The comprehensive analysis of the data set prior to the actual compression requires allocation of considerable processing time, and thus the methods are not suitable for real-time work [16].

In conclusion, the only lossless compression candidates for the pacemaker application are the various difference encoding techniques. However, these techniques do not compress the data enough to be a feasible alternative. As an example, let us investigate the difference encoding compression techniques in the scenario of IEGM compression. First of all, all difference encoding techniques require discrete values in order to compress the signal. Obviously this is not the case in an IEGM recording. However, this can be solved by truncating the signal, dividing it into a histogram-like plot (in for instance 256 steps). By doing this, the result will in fact be a lossy compression, since information has been lost. However, even when this technique was used for the difference encoding of IEGM data, the compression achieved was not very high. Even if as few as 16 data bins were used (resulting in a very pixelated curve, see Figure 3.5 below), the average compression ratios achieved were significantly below 2.

Figure 3.5 The result of converting a continuous IEGM signal into a discrete signal with 16 levels.

Thus, the actual compression is lossless, but prior to that the signal has been severely degraded, as seen in Figure 3.5 above. The following section briefly describes how these techniques can be modified into lossy compression algorithms, thereby achieving much larger compression ratios with retained diagnostic capabilities.
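A minimal version of the discretization step mentioned above could look as follows (the bin count and function name are illustrative only):

    def discretize(samples, levels=16):
        # Map a continuous signal onto 'levels' evenly spaced bins so that
        # difference encoding (which needs discrete values) can be applied.
        lo, hi = min(samples), max(samples)
        step = (hi - lo) / (levels - 1) or 1.0
        return [round((x - lo) / step) for x in samples]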

3.3.3. Lossy compression techniques

In addition to the lossless compression methods described above, there exist many techniques for compressing data with an associated loss of information. With lossy compression, decompression produces an approximation of the original information rather than an exact copy. Lossy compression requires knowledge about the specific type of data being compressed, but the achieved compression ratios are usually a lot larger than with lossless methods. Many lossy techniques require relatively large amounts of processing time, making them inappropriate for the pacemaker application. However, as described below, there exist some very simple techniques for compressing data in a lossy manner.

Data thinning: perhaps the most intuitive way of reducing the size of a data point sequence is to simply reduce the number of data points. By reducing the sampling frequency through the removal of data points, it is possible to accomplish large gains in storage capacity. In principle, the method operates by removing temporally redundant data points. In other words, if a data point is to be stored, it has to differ to a certain degree from the previous data point. Data thinning can provide a very simple and fast way of reducing the memory requirements for storing large data sequences.


Precision elimination: an alternative way of reducing the memory requirements for storing data is to reduce the precision of the data. This can be accomplished in several ways. One method is to simply convert large real-type numbers to integers requiring much less storage space (typically 1-2 bytes per number as compared to 4-8 bytes). A potential disadvantage of this approach is that the uncompressed data is in a different format than the compressed information.

Curve fitting: in addition to the fairly simple lossy approaches described above, there are more advanced methods available. These share the characteristic that they require more calculations and thus call for a higher degree of processing power. One methodology might be to try to fit polynomial curves to data points, thereby describing the data in a very compact manner. Many numerical methods exist (including Bezier curves, interpolation and splines) for these purposes. Depending on the underlying data (a continuous data sequence is preferred), they all appear as suitable candidates for compression of data. Nevertheless, the calculations required for performing these tasks are currently beyond the capacity of the pacemaker [9].

Transform-based methods: by using mathematical transforms such as the Fourier or wavelet transforms, it is possible to compress data to a very high degree. However, this process usually requires large computing efforts.


    4. Investigated candidates

This section will describe the various compression algorithms and database systems that have been examined during this project.

    4.1. Investigated compression algorithms

Many ECG compression algorithms have been described in the scientific literature. The vast majority of these have been lossy algorithms, such as those described in section 3.3.3. However, in previous trials the focus has been on how much compression a specific algorithm can achieve without losing too much diagnostic information. These algorithms have been devised for applications with far more computing power than the pacemaker, and thus almost no attention has been given to the complexity of the algorithms.

Ten different lossy algorithms were investigated during the study. Eight of these were algorithms previously described in the scientific literature, but two of the algorithms (Template and Adaptive peak encoding) have been developed during the course of the project and have not been described previously. Almost all algorithms studied compress data by reducing the number of data points in the time domain. A few of them use transforms to compress data, but these algorithms usually require more computing power.

4.1.1. Data thinning algorithm

This is a very simple (and fast) algorithm that basically reduces the sampling frequency. Using a parameter (with values such as 2, 3, 4) describing the distance between them, data points are simply stored according to their original position in the data stream. Thus, if the parameter has the value 2, data points in positions 1, 3, 5, 7 and so on are kept. This algorithm is very crude and does not adapt in any way to respond to curve changes in the data to be compressed. It was mainly investigated to prove that more complex algorithms are needed.
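A minimal sketch (positions counted from zero here, unlike the text above):

    def thin(samples, step):
        # Keep every step-th data point; step=2 keeps positions 0, 2, 4, ...
        return samples[::step]

    print(thin([5, 6, 7, 8, 9, 10], 2))  # [5, 7, 9]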

4.1.2. Azsimple algorithm

The Azsimple algorithm is a form of run-length encoding algorithm [10]. It begins by selecting the first data point in the dataset to compress. Using a threshold value ε, it then determines the number of consecutive points that fall within a range of ε relative to the first data point.

Once a data point outside the range is found, the initial data point value along with the number of data points within the range are stored. The point outside the range is then chosen as a new base point. Using this approach, a data curve can be approximated in a stepwise manner, generating a reconstructed signal as shown in Figure 4.1.


[Figure: original signal and Azsimple reconstruction]

Figure 4.1 An example of an ECG signal reconstructed with the Azsimple algorithm (vertically displaced to simplify comparison).

The size of ε determines the accuracy as well as the compression ratio of the algorithm. The algorithm is very straightforward and does not require many calculations.
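A direct reading of the description above, as a Python sketch (eps plays the role of ε; the names are my own):

    def azsimple(samples, eps):
        # Approximate the curve as plateaus stored as (value, run length) pairs.
        # A run ends when a sample deviates more than eps from the base value.
        if not samples:
            return []
        runs = []
        base, count = samples[0], 1
        for x in samples[1:]:
            if abs(x - base) <= eps:
                count += 1
            else:
                runs.append((base, count))
                base, count = x, 1
        runs.append((base, count))
        return runs

    def reconstruct(runs):
        # Rebuild the stepwise signal from the stored plateaus.
        out = []
        for value, count in runs:
            out.extend([value] * count)
        return out

    print(azsimple([0, 0, 1, 0, 5, 5, 6], eps=1))  # [(0, 4), (5, 3)]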

4.1.3. AZTEC algorithm

The Azsimple algorithm suffers from a great drawback in that it is very inefficient at compressing slow slopes in an ECG curve. This is evident from Figure 4.1, where the slopes are constructed in a stepwise manner. This issue was addressed by Cox et al., who introduced the amplitude zone time epoch coding (AZTEC) algorithm as early as 1968 [11].

[Figure: original signal and AZTEC reconstruction]

Figure 4.2 An example of an ECG signal reconstructed with the AZTEC algorithm (vertically displaced to simplify comparison).

AZTEC behaves much like the Azsimple algorithm. However, instead of considering the ECG curve as a series of plateaus, it assumes that the data is constructed of both plateaus and slopes. The plateaus are produced exactly as in Azsimple, but when the number of points forming a plateau is less than 3, it is considered as the start of a slope. The algorithm continues to investigate the slope until 3 or more samples can form a plateau within the boundaries specified by ε. The slope length and direction are then stored.


The large benefit of AZTEC compared with the Azsimple algorithm is that sharp and uniform slopes are treated better. However, since all slopes in AZTEC are linear, it has difficulties reconstructing slopes with curved appearances. This is evident in the large QRS complex in Figure 4.2. Introducing limits on the length of a slope can solve this problem, but this increases the complexity of the algorithm and thereby restricts the efficiency.

4.1.4. Adaptive Aztec algorithm

The AZTEC algorithm described in the previous section has been widely used in older ECG compression systems [12]. The algorithm has great data reduction properties, but an AZTEC signal is usually not visually acceptable to a cardiologist. Furht and Perez have described a real-time data compression algorithm which is a modified version of AZTEC [12]. The algorithm calculates several statistical parameters of the ECG signal, which are used online to adapt the algorithm to the nature of the signal region.

4.1.5. Fan algorithm

One of the most commonly used algorithms in real-time ECG compression is the Fan algorithm [13]. The general idea behind the algorithm is to select which data points to retain by drawing cones (or fans) around successive data points. If the subsequent data point lies within the current cone, that point is discarded. An algorithm sharing many features with Fan is the scan-along polygonal approximation algorithm (SAPA-2). It has actually been shown that this algorithm is the same as the Fan method [14]; thus the SAPA-2 algorithm has not been investigated during this project.
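The following sketch shows one common first-order formulation of the Fan idea; the exact bookkeeping varies between descriptions in the literature, so treat this as an illustration rather than the reference implementation:

    def fan(samples, eps):
        # Keep a sample only when the cone of slopes allowed from the last
        # stored sample (tolerance eps) can no longer contain it.
        if len(samples) <= 2:
            return list(enumerate(samples))
        kept = [(0, samples[0])]
        origin_i, origin_v = 0, samples[0]
        up = samples[1] + eps - origin_v    # upper cone edge slope (dt = 1)
        low = samples[1] - eps - origin_v   # lower cone edge slope
        for i in range(2, len(samples)):
            dt = i - origin_i
            slope = (samples[i] - origin_v) / dt
            if low <= slope <= up:
                # Sample fits inside the cone: discard it, but tighten the cone.
                up = min(up, (samples[i] + eps - origin_v) / dt)
                low = max(low, (samples[i] - eps - origin_v) / dt)
            else:
                # Cone violated: keep the previous sample and restart the fan.
                origin_i, origin_v = i - 1, samples[i - 1]
                kept.append((origin_i, origin_v))
                up = samples[i] + eps - origin_v
                low = samples[i] - eps - origin_v
        kept.append((len(samples) - 1, samples[-1]))
        return kept  # list of (index, value) pairs to store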

4.1.6. Cortes algorithm

Yet another algorithm that has been tested is the CORTES algorithm (Coordinate Reduction Time Encoding System) [12;15]. It is essentially a mixture of the AZTEC and Fan algorithms. By combining features from both sources, it does not suffer from discontinuities like the AZTEC algorithm.

4.1.7. Discrete cosine transform based compression

The discrete cosine transform (DCT) [16] is currently widely used for compressing data such as images (JPEG), video (MPEG) and audio (MP3) [17]. However, it has also been used for compressing medical signals such as ECG data [18]. Using original data x with N data points, the DCT can be defined as

X(k) = \alpha(k) \sum_{n=0}^{N-1} x(n) \cos\left(\frac{(2n+1)k\pi}{2N}\right), \qquad k = 0, 1, \ldots, N-1

where


\alpha(0) = \sqrt{\frac{1}{N}}, \qquad \alpha(k) = \sqrt{\frac{2}{N}} \quad \text{for } 1 \le k \le N-1

Using this equation, it is possible to transform the original signal into the transform domain. By truncating the signal in the transformed domain and then run-length encoding it, it is possible to achieve high degrees of compression.
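A direct (unoptimized) implementation of the definition above, with a simple keep-the-largest-coefficients truncation (the helper names are mine):

    import math

    def dct(x):
        # DCT of x exactly as defined above, alpha(k) included.
        N = len(x)
        out = []
        for k in range(N):
            a = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
            out.append(a * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                               for n in range(N)))
        return out

    def truncate(coeffs, keep):
        # Zero all but the 'keep' largest-magnitude coefficients; the long
        # zero runs can then be stored cheaply with run-length encoding.
        thresh = sorted((abs(c) for c in coeffs), reverse=True)[keep - 1]
        return [c if abs(c) >= thresh else 0.0 for c in coeffs]

Note that a direct evaluation costs on the order of N^2 multiplications, which is exactly the kind of computational load that is problematic on the pacemaker CPU.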

4.1.8. Wavelet transform based compression

The wavelet transform is a newer transform type that has attracted much attention over the last couple of years. The transform decomposes an original signal x(t) into a weighted sum of basis functions, which in turn are dilated and translated versions of a prototype function ψ called the mother wavelet. Dilation is achieved by multiplying t by some scaling factor. Thus, the wavelet transform decomposes the signal into various scales with differing features. By eliminating scales with non-significant features, it is possible to reduce the size of the data. A description of the entire theory behind wavelet transforms is beyond the scope of this thesis, but there exist several good reviews on the subject [19;20].
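The simplest mother wavelet is the Haar wavelet; one decomposition level reduces to pairwise averages and differences, as in this sketch (assuming an even number of samples):

    def haar_step(x):
        # One Haar level: pairwise averages (coarse scale) and
        # pairwise differences (detail scale).
        avg = [(a + b) / 2 for a, b in zip(x[0::2], x[1::2])]
        det = [(a - b) / 2 for a, b in zip(x[0::2], x[1::2])]
        return avg, det

    # On a flat baseline the details are near zero and can be dropped (lossy);
    # recursing on 'avg' yields the coarser scales.
    print(haar_step([4.0, 4.0, 6.0, 2.0]))  # ([4.0, 4.0], [0.0, 2.0])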

The wavelet transform has been widely used as a method to compress clinical ECG data [21;22]. The reported compression ratios are very high, but little information is provided regarding the computational complexity of the algorithms. Thus, this algorithm type was tested in this project.

4.1.9. Template encoding

None of the algorithms described in the sections above take into account the specific characteristics of the ECG signal. Normally, the signal follows a specific rhythmic pattern with much redundant information. Every heartbeat gives rise to a new signal, but many heartbeats share almost the same appearance. To take advantage of this, an algorithm that stores template heartbeats was constructed within this project. For the heartbeats following a template, only the difference between the template and the original signal was truncated and stored with run-length encoding.

This method of compression is inherently more complex than most other time-domain methods. This is because the ECG signal has to be analyzed to be able to determine the distance between each heartbeat and apply the templates at correct time intervals. However, on a regular signal without singularities, the method was thought to achieve high degrees of compression. Thus, the method was developed, implemented and tested on the clinical data.
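The residual step could look like the following sketch (eps and the function names are illustrative choices; the thesis does not spell out the implementation):

    def template_residual(beat, template, eps):
        # Keep only samples deviating from the stored template by more than
        # eps; everything else is implied by the template itself.
        return [(i, b - t) for i, (b, t) in enumerate(zip(beat, template))
                if abs(b - t) > eps]

    def apply_residual(template, residuals):
        # Reconstruction: start from the template, patch in stored residuals.
        beat = list(template)
        for i, diff in residuals:
            beat[i] += diff
        return beat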

4.1.10. Adaptive peak encoding

The second time-domain encoding technique developed during this project has been named adaptive peak encoding (APE). The theory behind the algorithm is rather simple and takes into account the variability within an ECG signal. Within a signal there are periods when the baseline is very flat and thus requires few data points as description. However, there also exist places where many data points are needed due to fast changes in the baseline. The APE algorithm takes these features into account by looking at the ECG curve in discrete window sections.

Starting with an initial window of a specific length, the algorithm calculates the extreme curve values within the window. If these values are further apart than a specified value (high), the window size is reduced by a factor of two and new extreme values are calculated. On the other hand, if the extreme values are closer than another parameter value (low), the window is instead increased by a factor of two. When the difference between the extreme values is between the high and low parameters, the extreme points are stored by the algorithm. An illustration of typical window sizes for the algorithm is shown in Figure 4.3 below, followed by a sketch of the loop.

Figure 4.3 An illustration of typical window sizes used in the adaptive peak encoding algorithm.
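A hedged sketch of the APE loop described above (the parameter names win0, low and high, and the guard against re-growing a window that was just shrunk, are my own interpretation):

    def adaptive_peak_encode(samples, win0=16, low=2, high=10):
        # Adapt the window so the min-max spread inside it lies between
        # 'low' and 'high', then store the window's extreme values.
        stored = []
        i, win, shrunk = 0, win0, False
        while i < len(samples):
            window = samples[i:i + win]
            spread = max(window) - min(window)
            if spread > high and win > 1:
                win //= 2          # rapid signal changes: halve the window
                shrunk = True
                continue
            if spread < low and not shrunk and i + win < len(samples):
                win *= 2           # flat baseline: double the window
                continue
            stored.append((i, min(window), max(window)))
            i += win
            shrunk = False
        return stored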

    4.2. Investigated database systems

In addition to examining compression algorithms, the project focused on finding a feasible solution to storing the compressed data. The development of small intelligent devices found everywhere from pockets and purses to industrial systems has stimulated the need for more general data storage solutions. To support expanding feature sets, applications generally must manage larger volumes of more complex data. As the amount of information increases, it becomes increasingly important that data is managed and stored efficiently and in a uniform manner by the system. Current techniques adopted for storing and manipulating data in the pacemaker platform (as in most embedded systems) are ad hoc, and the data management is traditionally built as a part of the overall system [5].

Many device developers are now exploring moving from self-developed data management solutions to proven commercial database systems. Consequently, the market for commercial database solutions for small-footprint main memory applications has expanded significantly [5;23]. Due to the growing market for general databases for embedded systems, many existing and new database producers have developed small-footprint databases. These systems may provide a feasible solution for the data management problem in the pacemaker. To investigate this issue further, literature and other sources of information about databases for embedded systems have been studied.


4.2.1. Embedded databases

Traditional database management systems, with their roots in business computing, provide only partial solutions to the radically different application domain of embedded systems. Relational databases, the most widely used type of traditional databases, emerged almost two decades ago to support the business functions of large corporations. Their features include support for query languages such as SQL along with other characteristics such as lock arbitration and cache control [23]. However, on an embedded device such as the pacemaker, these features are superfluous and cause the application to exceed the memory and processing resources available [5].

The main objectives of a traditional enterprise database system are usually transaction throughput and low average response time [24]. In contrast, for embedded real-time databases the main goal is to achieve predictability with respect to response times, memory allocation and CPU usage. The issues of size and resource usage are not as important for traditional database systems as for those designed for embedded systems, since traditional hardware is relatively inexpensive. For embedded systems, however, there are often hardware limitations regarding both price and size, resulting in altered database requirements.

    An embedded system can usually be constructed for several operating system plat-forms. Thus it is important for database vendors to offer support for the various op-erating systems available for embedded use. Most embedded systems also function asreal-time applications making real-time performance of the database a significant fea-ture.

    Another aspect differing between traditional and embedded systems is the need for

    high availability in embedded systems. In contrast to a traditional database system,most embedded systems do not have a system administrator present during run-time.Therefore, an embedded database must be able to run on its own, without supervision[5].

All existing databases specifically designed for embedded systems thus focus on the same issues:

Minimization of memory footprint

Reduction of resource allocations

Support for multiple operating systems

Real-time performance

High availability

The databases covered in this survey all take these issues into consideration, but to varying extents.

4.2.2. Aspects of choosing an existing solution

When deciding whether to use an existing database solution designed for embedded systems or to develop a new solution, several aspects must be taken into account. Potential positive consequences of choosing an existing solution include


Reduction of development costs – reuse of existing software can significantly reduce both the economical and temporal expenses involved in the development of new software.

Integrated security – the programmer's task is made simpler, since most existing embedded database systems provide built-in support for consistent and safe manipulation of data [5].

Simplified maintainability – as the existing software evolves, updates can be made in a straightforward manner.

Facilitated integration with other systems – existing databases usually provide mechanisms that support transfer of data to other systems, such as large central databases [5].

There may also be some downsides to choosing an existing database system, resulting from the system's potential shortcomings. The system has to be able to handle the application requirements in terms of memory size, platform support and real-time performance. It is also imperative that the system can manage the various special data formats used in the application.

4.2.3. Examined database systems

In the initial investigation of existing database systems for embedded systems, a number of commercial and non-commercial solutions were reviewed. A short introduction of the systems examined is given here:

Pervasive.SQL by Pervasive Software Inc. This database was developed in three different versions for embedded systems, intended for smart cards, mobile systems and general embedded systems respectively. The three versions differ in memory usage and functionality [25]. The fact that these databases have very small memory footprints was one reason for investigating them.

Polyhedra by Polyhedra Plc. This database is claimed to be a real-time database with main memory storage [26].

Berkeley DB by Sleepycat Software Inc. A small-footprint database solution, which is distributed as open source [27]. This could potentially make it interesting, as it is possible to alter the software behavior.

eXtreme by McObject LLC. Another small-footprint main-memory database solution [28].

Birdstep RDM Mobile by Birdstep Technology. Claims to be a real-time database with a small footprint, ideal for mobile and embedded systems [29].

Pointbase Micro Edition by PointBase Inc. A small-footprint Java-supported database.


Solid Flowengine by Solidtech Ltd. Claims to be an embeddable lightweight database management system. The engine provides data access via SQL, with ODBC and JDBC interfaces provided [30].

c-tree Plus by FairCom. Distributed in source-code form, enabling users to port the product's source code to several embedded operating systems [31].

SQL Anywhere by Sybase Inc. A small-footprint database that has emerged as a subpart of the larger databases provided by Sybase Inc [32].


    5. Materials and methods

This section will describe the methods used for performing the investigation required by the project. Aspects such as the study of compression algorithms and the study of commercial database solutions will be discussed.

    5.1. General approach

When working on the project, the same general approach to problems was always used. Whenever a problem was found, a search for relevant literature was initiated. If such literature was found, it was studied to generate ideas of how to solve the specific problem. If no solution was found, colleagues and experts were consulted to get new ideas. Thus, there have been extensive contacts via email with experts on embedded systems from around the world during the course of this project.

5.1.1. Test platform

All tests were performed using an 800 MHz Pentium II hardware platform with a Microsoft Windows NT 4.0 operating system. The pacemaker architecture can be compared to an Intel 486 processor with a clock frequency of 1 MHz, and this information has been used when computing compression algorithm data rates and duty cycles.

    5.2. Compression algorithm evaluation setup

The evaluation of the various compression algorithms was done in a number of ways. Clinical data sets were obtained to use for compression evaluation. After choosing an error estimate (section 5.2.3), a maximum allowed error was chosen. The compression algorithms were then tested in terms of speed and compression ratio using this error criterion as a limit.

5.2.1. Implementation issues

To evaluate the compression ratios achieved by the algorithms, the algorithms were first implemented using Matlab 6.0 [33]. Initially, Matlab was also used to evaluate the speed of the algorithms. However, since Matlab uses built-in optimized functions for various computational tasks, the results of these studies were not conclusive.

By implementing the algorithms in C, a more exact time estimate was achieved. The algorithms were run on large data sets, and timers were used to calculate the number of data points per second a specific algorithm could handle. All of these tests were performed using the Pentium architecture described in section 5.1.1. This is not the optimal testing platform, since it does not exactly mimic the hardware architecture in the pacemaker. However, these tests gave an indication of the speed of the algorithms.
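As an illustration, timing code of roughly the following shape can produce such data-points-per-second figures. compress_signal() is a hypothetical stand-in for any of the implemented algorithms, and the repetition count and buffer size are arbitrary choices, not values from this project.

```c
#include <time.h>

/* Hypothetical compression routine: compresses n samples of x[] into out[]
   and returns the compressed size in bytes. */
extern int compress_signal(const short *x, int n, unsigned char *out);

/* Measure the throughput of compress_signal() in data points per second. */
double measure_sps(const short *x, int n)
{
    static unsigned char out[1 << 16];
    clock_t t0 = clock();

    for (int rep = 0; rep < 100; rep++)   /* repeat for a stable reading */
        compress_signal(x, n, out);

    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    return (100.0 * n) / secs;            /* total samples / elapsed time */
}
```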

To achieve an even more exact estimate of the speeds of the algorithms, they were analyzed at the assembler level to calculate the number of clock cycles needed for all operations. Using this approach, it was possible to determine exactly the rate at which each algorithm could compress the sample data on a pacemaker platform. The results of this analysis are presented in section 6.2.

5.2.2. Data set used

When evaluating the compression algorithms, actual clinical intracardiac recordings were used to achieve an accurate estimate. In cooperation with the research department at St Jude Medical, ten representative recordings were chosen for evaluation [34]. Together, they represented a variety of curve appearances recorded by an embedded pacemaker.

5.2.3. Error estimate

Compression algorithms all aim at removing redundancy within data, thereby discarding irrelevant information. In the case of ECG compression, data that does not contain diagnostic information can be removed without any loss to the physician. To be able to compare different compression algorithms, it is imperative that an error criterion is defined such that it will measure the ability of the reconstructed signal to preserve the relevant diagnostic information. Several techniques exist for evaluating the quality of compression algorithms; the ones most commonly used in scientific literature are presented below.

Subjective judgment – the most obvious way to determine the preservation of diagnostic information is to subject the reconstructed data to evaluation by a cardiologist. This approach might be accurate in some cases but suffers from many disadvantages. One drawback is that it is a subjective measure of the quality of reconstructed data and, depending on the cardiologist being consulted, different results may be presented. Another shortcoming of the approach is that it is highly inefficient. The subjective judgment solution is expensive and can generally be applied only for research purposes [35].

RMS – in some literature, the root mean square error (RMS) is used as an error estimate. The RMS measure is defined as

$$\mathrm{RMS} = \sqrt{\frac{\sum_{n=1}^{N}\left(x(n) - x'(n)\right)^2}{N}}$$

where x(n) is the original signal, x'(n) is the reconstructed signal and N is the length of the window over which the RMS is calculated [36]. This is a purely mathematical error estimate without any diagnostic considerations.

PRD – to enable comparison between signals with different amplitudes, a modification of the RMS error estimate has been devised. The percentage RMS difference (PRD) is defined as

$$\mathrm{PRD} = \sqrt{\frac{\sum_{n=1}^{N}\left(x(n) - x'(n)\right)^2}{\sum_{n=1}^{N} x(n)^2}} \times 100\%$$

This error estimate is the one most commonly used in scientific literature concerned with ECG compression techniques [35]. The main drawbacks are the inability to cope with baseline fluctuations and the inability to discriminate between the diagnostic portions of an ECG curve. However, its simplicity and its relative accuracy make it a popular error estimate among researchers.

WDD – a couple of years ago, a new error measure for ECG compression techniques was presented by Zigel et al [35;37]. This is called the weighted diagnostic distortion measure (WDD) and can be described as a combination of mathematical and subjective diagnostic measures. The estimate is based on comparing the PQRST complex features of the original and reconstructed ECG signals. The WDD measures the relative preservation of the diagnostic information in the reconstructed signal. The features investigated include the location, duration, amplitudes and shapes of the waves and complexes that exist in every heartbeat. The WDD is believed to be a diagnostically accurate error estimate while also being objective. However, it has been designed for surface ECG recordings, and its usefulness for intracardiac electrograms has not been established.

For this project, the PRD error measure has been chosen. The decision was based upon the fact that it is a rather simple measure requiring few non-complex calculations. Also, it is currently the prime error estimate used in almost all literature concerning ECG compression. The new WDD measure might have been interesting, but since it would have required a lot of effort to adapt it to IEGM recordings, it was discarded. In addition to the PRD error estimate, subjective judgment by competent colleagues was used to estimate errors.
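For reference, the chosen PRD measure is straightforward to implement; the sketch below is a direct transcription of the formula above (the function name and the use of double precision are implementation choices, not taken from the thesis).

```c
#include <math.h>

/* Percentage RMS difference between the original signal x[] and the
   reconstructed signal xr[], both of length n. */
double prd(const short *x, const short *xr, int n)
{
    double num = 0.0, den = 0.0;

    for (int i = 0; i < n; i++) {
        double d = (double)x[i] - (double)xr[i];
        num += d * d;                       /* sum of squared errors */
        den += (double)x[i] * (double)x[i]; /* signal energy */
    }
    return 100.0 * sqrt(num / den);         /* expressed in percent */
}
```

The 5% criterion used in section 5.2.5 then amounts to requiring prd(x, xr, n) <= 5.0 for the reconstructed data.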

5.2.4. Complexity evaluation

In traditional algorithm analysis, theoretical measures such as big-O notation are used. However, since this study focuses on actual performance, and most of the investigated algorithms are in O(n), another estimate had to be used. The most accurate estimate used has been a semi-theoretical measure. By analyzing the algorithms, it has been possible to estimate the number of additions, multiplications, divisions and other computational tasks needed by each algorithm to compress the sample data. This was mainly done by analyzing the assembler code required to run the algorithms. Then, by comparing how many clock cycles each of these operations takes on an Intel 486 processor, it has been possible to compare the algorithms in terms of speed [38].
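The following sketch illustrates the idea behind the semi-theoretical estimate. The per-operation cycle counts are rough illustrative figures for a 486-class processor, not the exact timings from [38], and the operation profile would come from inspecting the generated assembler code.

```c
/* Per-sample operation counts for one algorithm, obtained by inspecting
   its assembler code. */
typedef struct {
    long adds;   /* additions, subtractions, comparisons */
    long muls;   /* multiplications */
    long divs;   /* divisions */
    long mems;   /* memory loads and stores */
} op_profile;

/* Estimated samples per second on a processor running at clock_hz,
   using assumed (approximate) 486 cycle costs per operation class. */
double estimate_sps(op_profile p, double clock_hz)
{
    double cycles_per_sample = p.adds * 1.0
                             + p.muls * 13.0
                             + p.divs * 40.0
                             + p.mems * 1.0;
    return clock_hz / cycles_per_sample;
}
```

For example, a hypothetical algorithm costing about 43 cycles per sample would be estimated at roughly 1 000 000 / 43 ≈ 23 000 SPS on the 1 MHz pacemaker processor.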

5.2.5. Compression ratio evaluation

It was also necessary to compare the amount of compression achieved by each algorithm. This information could be obtained from literature describing the various algorithms. However, since different publishers have used different methods and differing data sets, it was decided to test the algorithms under homogeneous conditions. In addition, two of the algorithms tested have never been published, further motivating the performance tests.

Using the PRD error measure described in section 5.2.3, a top error of 5% was decided upon. This decision was taken in cooperation with colleagues at St Jude Medical, and the level of 5% was found to be an appropriate top error level. Thus, curves compressed and decompressed showing a 5% PRD value generally still shared the same diagnostic features as the uncompressed data. However, if a 10% PRD value was chosen (resulting in higher compression ratios), the reconstructed curves often showed significant differences from the original data. By using the 5% PRD estimate, it was possible to estimate how much each algorithm could compress the data without exceeding 5% PRD. The compression ratio was defined as

$$CR = \frac{\text{size of original data}}{\text{size of compressed data}}$$

Thus, if an algorithm compressed the data 3 times, the compression ratio would be 3.
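One way such a maximum ratio could be searched for is sketched below, assuming the algorithm exposes an error-tolerance parameter with which distortion grows monotonically. compress_fn, MAX_N and the bisection depth are hypothetical, the thesis does not specify this procedure, and prd() is the function sketched in section 5.2.3.

```c
#define MAX_N 48000   /* assumed upper bound on the test signal length */

double prd(const short *x, const short *xr, int n);  /* see section 5.2.3 */

/* Hypothetical interface: compresses x[0..n) with the given tolerance,
   writes the reconstructed signal to xr[] and returns the compressed size
   in bytes. */
typedef int (*compress_fn)(const short *x, int n, double tol, short *xr);

/* Largest compression ratio achievable while keeping PRD at or below 5%. */
double max_cr_at_5pct(const short *x, int n, compress_fn f,
                      double tol_lo, double tol_hi)
{
    static short xr[MAX_N];
    double best_cr = 1.0;

    for (int step = 0; step < 20; step++) {        /* bisect the tolerance */
        double tol = 0.5 * (tol_lo + tol_hi);
        int csize = f(x, n, tol, xr);

        if (prd(x, xr, n) <= 5.0) {
            best_cr = (double)(n * (int)sizeof(short)) / csize;
            tol_lo = tol;                          /* room for more loss */
        } else {
            tol_hi = tol;                          /* too much distortion */
        }
    }
    return best_cr;
}
```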

5.3. Setup of literature study of database systems

During the study of existing database systems for embedded use, a large literature study was first performed. Initially, the study focused on finding potential candidates for evaluation. When these candidates had been found (section 4.2.3), a more specific list of investigation factors was compiled. These factors were considered for every database investigated, and they are described in more detail in the following subsections.

5.3.1. Database model

There are essentially two different database models supported by the various suppliers of databases for embedded systems [5;39]. Each model has advantages and drawbacks and is popular for different reasons. The first model is the client-server model, in which the database server can be considered as an application running separately from the main real-time software on the embedded system, although on the same processor. The interaction between the database and the rest of the software is in this case handled by request-response protocols. This is the model most commonly adopted by traditional enterprise database systems on larger hardware platforms [40]. The main shortcomings of client-server systems are the extra run-time costs of communication between the client and the server and the additional complexity of maintaining a separate server process in an embedded system [5].

The second model describes a database system that is compiled together with the main application into one executable system. This model, known as the library model, describes databases explicitly designed for embedded system use [39]. In it, the database is linked into the same address space as the applications that use it. The database is thus more integrated, as a component of the system, resulting in faster execution since database operations do not need to communicate with a separate server process. Library databases possess one significant drawback in that they require developers to master non-standard programming interfaces to enable communication.

5.3.2. Data model

The data model describes how the data is logically structured in the database. The most common model for both traditional and embedded database systems is the relational model [5;40]. An advantage of the relational model is that columns in tables can be related to other tables, so that arbitrarily complex logical structures can be implemented. Another advantage is the enormous knowledge about relational database design that exists, facilitating modeling and development. A disadvantage of the model is the added data lookup overhead due to index searching. This can be a serious problem for databases that reside in time-critical applications [5]. Another limitation of the relational model is that it only supports a few basic data types such as numbers and strings. This could make it highly inefficient for storing the arbitrary data structures needed for a specific application [40].

A second data model is the object-oriented model, which describes a database highly integrated with object-oriented modeling and programming [41]. The database actually stores objects that can be retrieved by different applications. Object-oriented databases seem like good choices for embedded systems, but to date they have generally not received serious consideration. This has been suggested to be a result of the fact that few object-oriented databases provide support for embedded operating systems [39].

A third data model has evolved as a merge of the relational and the object-oriented models. This has been described as the object-relational model, and it combines features from both of its predecessors [5].

5.3.3. Source code availability

Open source products provide the user with the entire source code of the product. This could be an important feature for the pacemaker application, since it allows developers to alter the product to suit the needs of a particular application platform.

5.3.4. Interface

There has to be communication between the database and the main application for the system to function properly. Different databases supply dissimilar interfaces to handle data, but as a rule of thumb the client-server databases (see section 5.3.1) provide more standardized query methods [40]. Library databases, on the other hand, offer more implementation-specific methods with closer interaction with the main application software. The most common interfaces specified by database vendors include:

C/C++/Java – this is the native interface for many database applications, since it gives direct access to the database functionality. Without database drivers, direct access to the database is achieved, resulting in faster execution and less memory consumption. A downside of the native interface could be security issues when the database has several users, but this problem seldom occurs in embedded systems.

ODBC – developed by Microsoft Corporation, ODBC is today a well-defined standard for database connectivity. The standard uses SQL to query relational databases and does not handle non-relational databases efficiently [42].

JDBC – specifically designed for Java on top of the ODBC interface, this is the natural choice for higher-level database connectivity for Java [43].

5.3.5. Memory consumption

One of the key factors when choosing a database system for the embedded pacemaker is the memory footprint of the database. There are two interesting properties of memory consumption to consider for embedded databases with regard to the pacemaker application:

Memory footprint – the size of the database without any data elements in it

Compression – the ability of the database to compress data into a more compact form

5.3.6. Storage media

The pacemaker has no secondary storage capabilities and is thus limited to a main-memory database (MMDB). Such a system has to be constructed differently, and thus there are a number of key differences between MMDBs and conventional database systems [44]. A fundamental difference between the two system types is the access of data. In traditional databases, data has to be copied from secondary storage to the primary memory, resulting in prolonged access times. In an MMDB, however, an application can receive a direct memory address from which it can reach a specific object directly in the database. An MMDB also fetches data randomly, while a conventional disk-based database usually optimizes data retrieval through sequential data clustering [44].
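The access-path difference can be illustrated with a toy example. The names and the fixed-size record table below are purely hypothetical and do not belong to any of the surveyed products.

```c
#include <string.h>

#define NREC  16
#define RECSZ 32

/* A tiny in-memory "table" of fixed-size records. */
static char table[NREC][RECSZ];

/* Disk-style access: the record is copied into caller-owned memory,
   as when it must first be staged in from secondary storage. */
void get_copy(int id, char *dst)
{
    memcpy(dst, table[id], RECSZ);
}

/* MMDB-style access: the caller receives the record's memory address
   and reads the object in place - no copy is needed. */
const char *get_ref(int id)
{
    return table[id];
}
```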

5.3.7. Operating system platforms

A major factor when deciding which database is best suited for a specific task is of course which embedded operating systems it supports. In its present version, the pacemaker does not run a full commercial operating system. Instead, there exists a custom-made platform with exactly the functions needed for the pacemaker application. This is a definite problem, since it will be difficult to install any of the databases above in the pacemaker memory. One possible solution could be to use one of the open-source library databases (see section 5.3.3) and compile the database with the rest of the application. With the addition of the operating system framework functions needed by the database, this could be a potential solution.


5.3.8. Real-time properties

A key factor when designing databases for real-time usage is time predictability. This implies that the database user has to be able to predict, within a range, how long a database operation will take. It is imperative that this time remains constant as the database grows [45]. Real-time performance can be divided into hard and soft real-time properties. A hard real-time system is a system that does not permit any kind of increase in the delay to handle a transaction. A soft real-time system, however, does accept a certain degree of delay, but with a specified penalty [46].

5.3.9. Accessibility

The final factor investigated was whether the product is readily accessible. Another issue that was considered was the price of the various databases in the survey.

5.3.10. Factors not thoroughly investigated

In addition to the factors described above, there exist some factors that have been considered but not thoroughly investigated. These include

Features – the feature demands on a database used in a pacemaker are not extraordinary in any way, apart from the memory footprint issue (see section 5.3.5). In fact, only the most basic database operations are required by the application. As a result, all databases examined in this survey provide the necessary functionality. However, many of the databases offer numerous features not needed by the pacemaker application, resulting in larger memory footprints.

Concurrency control – the pacemaker software application runs as a single thread, and there are no other active threads that may want to use the database. Concurrency control deals with issues associated with multiple-user access to the stored data. Software for future pacemaker generations may be constructed as multi-threaded, but in present and near-future products, concurrency control issues are not important.


    6. Results

This section will describe the results of the investigations performed during the project.

6.1. Compression algorithm compression performance

Based on the 5% PRD error criterion described in section 5.2.3, the maximum compression ratios were investigated using the technique described in section 5.2.5. The results from this study are presented in Table 6.1 below.

Table 6.1 Average compression ratios (CR) achieved by the compression algorithms examined. ✗ indicates that the algorithm was not able to compress the data while satisfying the 5% PRD rule.

Algorithm         CR
Data thinning     ✗
AZ Simple         5.6
AZTEC             ✗
Adaptive AZTEC    ✗
Fan               5.4
Cortes            4.1
DCT               5.6
Wavelet           10.7
Template          5.4
Adaptive peak     7.6

It is obvious that the wavelet compression technique is very effective compared to all other methods. The other algorithms (except those that do not fulfill the 5% PRD criterion) all share compression ratios in the same range. Based on subjective judgment, however, the Fan and Adaptive peak algorithms provide reconstructed curves with very high degrees of diagnostic preservation.

6.2. Compression algorithm complexity performance

This section describes the results of the time complexity analysis of the compression algorithms.


6.2.1. Matlab time performance analysis

As described in section 5.2.4, Matlab was first used to get a rough estimate of the relative speeds of the investigated algorithms. The results from this investigation are presented in Table 6.2 below.

Table 6.2 Average speeds of the compression algorithms measured in samples per second (SPS). The figures have been scaled down to represent the pacemaker processor.

Algorithm         SPS
Data thinning     199
AZ Simple         50
AZTEC             32
Adaptive AZTEC    14
Fan               65
Cortes            38
DCT               43
Wavelet           42
Template          22
Adaptive peak     98

However, these results appeared to be poor estimates of the true speeds of the algorithms. First of all, it was not encouraging to see such slow data rates. An algorithm capable of handling only 30 samples per second (SPS) would take a very long time (equivalent to battery power in the pacemaker) for a 4 minute 200 Hz sampling. The cause of the low SPS results was thought to be slow Matlab code execution. There might be several causes of this, including insufficient use of the matrix optimization algorithms present in Matlab. Since Matlab is a high-level programming language, it is very difficult to control (and get access to) the low-level operations needed to implement an algorithm.

The fact that Matlab controls the lower-level operations leads to two difficulties when comparing algorithm efficiency. First, the speed of the algorithms might be affected as described above. Second, it is very difficult to compare the relative speeds of the algorithms in an accurate manner, since it is impossible to get information about how Matlab has optimized the algorithms. As a result of this, it was decided to implement the algorithms in lower-level languages to enable a more accurate estimate.

6.2.2. C and assembler time performance analysis

To get a better estimate of the time performances of the investigated algorithms, they were all implemented in C code. By running the C algorithms on the example ECG data, it was possible to get an estimate of the time complexity of the systems. By generating assembler code for the algorithms, it was possible to get an accurate estimate of the number of multiplications, additions and other time-consuming tasks that the algorithms required. By using the fact that the pacemaker processor architecture resembles an Intel 486 1 MHz processor, it was then possible to infer values for the true time complexity of the algorithms. These results are shown in Table 6.3 below.

Table 6.3 Average speeds of the compression algorithms measured in samples per second (SPS). These figures are time performance analyses based on assembler code.

Algorithm         SPS
Data thinning     53 000
AZ Simple         34 000
AZTEC             33 000
Adaptive AZTEC    16 000
Fan               6 200
Cortes            23 000
DCT               650
Wavelet           450
Template          8 000
Adaptive peak     20 000

These figures seem to be a much closer estimate and also look very promising for the pacemaker application.

    6.3. Literature study of database systems

The results of the literature study of existing database systems for embedded use are described in this section. The results are presented according to the factors described in section 5.3.

6.3.1. Database model

Looking at the databases examined in this investigation, their respective database models are described in Table 6.4.

Table 6.4 Database models supported by the different database systems.

Database           Client-Server   Library
Pervasive.SQL      ✓
Polyhedra          ✓
Berkeley DB                        ✓
eXtreme                            ✓
RDM                                ✓
Pointbase                          ✓
Solid Flowengine   ✓
c-tree Plus                        ✓
SQL Anywhere       ✓

With regard to the pacemaker application environment, the library database seems to be the preferred choice. The cardinal factor is that, as a rule, database systems distributed as libraries are far more compact than those distributed as stand-alone binaries [39].

6.3.2. Data model

Most of the systems in the survey support the relational data model, as seen in Table 6.5. Since the relational model only supports a few basic data types, several additions to it have been made. All of the relational databases examined support the storage of more complex data types, for instance Binary Large Objects (BLOBs) [5]. This indicates that arbitrary data structures can be stored in the relational databases without large overhead storage. Most relational systems also have ways of bypassing the resource-consuming index lookup routine associated with relational databases, resulting in faster access times.

Table 6.5 Data models supported by different embedded database systems.

Database           Relational   Object-Relational   Key-data   Reference
Pervasive.SQL      ✓                                           [5]
Polyhedra          ✓            ✓                              [26]
Berkeley DB                                         ✓          [27]
eXtreme                         ✓                              [28]
RDM                ✓                                           [29]
Pointbase          ✓                                           [47]
Solid Flowengine   ✓                                           [30]
c-tree Plus                                         ✓          [31]
SQL Anywhere       ✓                                           [32]

    The Berkley and the c-tree Plus databases support another distinct database model. Inthis model, every data is associated with a key. The search for data can be done fromthe key or as a sequential search. The data can be of any size and of virtually any struc-ture [27;31].

    It is somewhat problematic to decide which of the data models described above that isbest suited for the pacemaker application. The key-data model has drawbacks regard-ing the relation between different sets of data in the database. The relational modelhowever, has great advantages since it is well known and enables the user to modelcomplex structures. The differences between the relational and the object-relational

    models are rather diffuse and thus it is hard to make a clear recommendation as towhich of them is best suited.

6.3.3. Source code availability

The only products in this survey that provide full access to the source code are the Berkeley DB and c-tree Plus databases [27;31]. The eXtreme