

IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, VOL. 13, NO. 3, MAY 2009 313

Real-Time Analysis of Physiological Data to Support Medical Applications

Daniele Apiletti, Elena Baralis, Member, IEEE, Giulia Bruno, and Tania Cerquitelli

Abstract—This paper presents a flexible framework that performs real-time analysis of physiological data to monitor people’s health conditions in any context (e.g., during daily activities, in hospital environments). Given historical physiological data, different behavioral models tailored to specific conditions (e.g., a particular disease, a specific patient) are automatically learnt. A suitable model for the currently monitored patient is exploited in the real-time stream classification phase. The framework has been designed to perform both instantaneous evaluation and stream analysis over a sliding time window. To allow ubiquitous monitoring, real-time analysis could also be executed on mobile devices. As a case study, the framework has been validated in the intensive care scenario. Experimental validation, performed on 64 patients affected by different critical illnesses, demonstrates the effectiveness and the flexibility of the proposed framework in detecting different severity levels of monitored people’s clinical situations.

Index Terms—Anomaly detection, data mining, mobile applications, patient monitoring, physiological signal analysis.

I. INTRODUCTION

DEVELOPMENTS in sensing devices, miniaturization of low-power microelectronics, and wireless networks are becoming a significant opportunity for improving the quality of care services for patients and health professionals. Furthermore, since the population is greying, the need for high-quality and efficient healthcare, both at home and in hospital, is becoming more important. Mobile health applications play a key role in collecting data for medical monitoring and in significantly reducing the cost of medical services. Different mobile monitoring systems [1]–[3] have been proposed to enhance people’s comfort, healthcare efficiency, and illness prevention [4]. While many efforts have been devoted to improving the architecture and the connectivity among devices [5], [6], less attention has been devoted to the development of analysis techniques to assess the current health status of monitored people. In particular, a fundamental task is the definition of efficient algorithms that automatically detect unsafe situations in real time.

This paper presents a flexible framework that performs real-time analysis of physiological data to monitor people’s health conditions. Given historical physiological data, the framework automatically learns both common and uncommon behaviors and creates different behavioral models tailored to specific conditions (e.g., a particular disease, a single patient). The model is then exploited in the real-time classification of vital signs. To allow ubiquitous analysis, the algorithm has been designed to run with limited resources. Hence, it could also be executed directly onboard a mobile device. Even if mobility is not a requirement, the algorithm can be easily deployed into any device (e.g., bedside instrument, miniaturized personal computer) characterized by memory and power consumption constraints. Features extracted from physiological signals are used to feed a risk function that assesses the patient’s health condition. The framework performs both an instantaneous evaluation for emergency detection and a stream analysis over a sliding time window by considering the temporal context of interest, because current conditions depend on the recent past behavior. Since our contribution is focused on the data analysis approach, the proposed framework can be exploited in many different scenarios where the health status has to be continuously monitored (e.g., hospital environments, intensive care units, and elderly people during daily activities both inside and outside home).

Manuscript received November 30, 2007; revised September 26, 2008. First published January 20, 2009; current version published May 6, 2009.

The authors are with the Dipartimento di Automatica ed Informatica, Politecnico di Torino, Torino 10129, Italy (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TITB.2008.2010702

The proposed framework has been validated in the intensive care scenario. As discussed in [7], in critical care monitoring, many proposed approaches exploited simple threshold alarms, which lead to many false positives without real clinical meaning. To increase the robustness against artifacts and missing values, new alarm generation algorithms, which perform real-time analysis, have to be designed. In intensive care, a mobile architecture can be exploited when a patient is moved from one ward to another, or to a different hospital [2], or when specific analyses that are not provided by bedside instruments are required. Furthermore, the mobile device can be used by the medical staff to monitor patients’ data without being physically near them [8]. Experimental validation, performed on 64 patients affected by different critical illnesses [9], demonstrates the effectiveness and flexibility of the proposed framework in detecting different severity levels of monitored people’s clinical situation.

The paper is organized as follows. Section II discusses previous research. Section III thoroughly describes the proposed framework by addressing both behavioral model creation and real-time classification of different physiological signals. The results of the experiments evaluating the effectiveness of the proposed framework in the intensive care scenario are presented in Section IV. Finally, Section V draws conclusions and presents future developments of the proposed approach.

II. RELATED WORK

Several efforts have been devoted to the design of wearable medical systems [10], [11] and the reduction of power consumption of medical body sensors [12], [13]. Sensor devices, integrated into intelligent wearable accessories or smart clothes, such as watches [14], gloves [10], and T-shirts [15], can collect physiological signals and transmit data to a mobile device. Implementation on a mobile device was proposed, among others, by [1]. The mobile device performed physiological data analysis, based on threshold alarms, to detect dangerous situations. One step further toward a portable remote monitoring unit has been proposed in [16], where a personal digital assistant (PDA) is exploited to receive different medical data (e.g., vital biosignals, images) from sensors and to transmit them over 3G wireless networks. However, the focus was on telecommunication issues (i.e., simultaneous data transfers over bandwidth-limited wireless links), whereas our contribution is focused on the analysis algorithm needed to perform real-time monitoring of different vital signs and to detect unexpected situations. Furthermore, our algorithm has been designed to work with limited resources (e.g., onboard a mobile device).

1089-7771/$25.00 © 2009 IEEE

While many efforts have been dedicated to improving healthcare system architectures [5], [6], less attention has been devoted to investigating analysis techniques to support medical applications. In [17], different classifiers have been used for detecting sport activities by analyzing 3-D accelerometer signals. The main focus was on comparing supervised and unsupervised settings for activity recognition. However, real-time analysis and mobile processing, which are the focus of this paper, are not addressed. A low-power mobile system to investigate long-term patterns and classify different activities (e.g., rest, postural orientation) has been presented in [18]. Accelerometer signal processing is executed directly onboard the wearable unit, thus satisfying memory and time constraints. While body activity is a component of the health status, by using only the accelerometer signal [18] the physiological conditions of a patient cannot be properly analyzed. Our approach is able to simultaneously analyze different vital signs to assess the patient’s health condition.

The definition of efficient algorithms that automatically detect unsafe situations in real time is a difficult task. In [11], an algorithm to discover physiological problems (e.g., cardiac arrhythmias) has been proposed. Physiological time series recorded through sensors may be exploited for learning usual behavioral patterns on a long time scale. Any deviation is considered an unexpected and possibly dangerous situation. While [11] required some kind of a priori knowledge and exploited fixed thresholds, our approach automatically learns both common and uncommon behaviors and exploits features extracted from different physiological signals to assess the current health status of monitored people.

More recently, in [19], the extraction of temporal patterns from single or multiple physiological signals by means of statistical techniques (e.g., regression) has been proposed. The proposed signal analysis only provides trend descriptions such as increasing, decreasing, constant, and transient. The framework proposed in this paper is more general than [19]. It allows the evaluation of people’s health conditions by analyzing different signal features (e.g., sharp changes in the signal, long-term trends).

In the intensive care context, there is a need for algorithms that automatically detect risk situations, because clinicians have to process large amounts of clinical data [20]. Different medical informatics applications have been proposed to store and display patient information in the intensive care context [8], [21]. However, they exploit simple threshold alarms that lead to many false positives [7]. Many false positives may lead to a dangerous desensitization of the intensive care staff toward true alarms, thus compromising patient safety and care effectiveness. Hence, alarm generation algorithms should be improved by increasing their robustness against artifacts and missing values and by performing real-time analysis. The proposed approach overcomes these issues. In particular, our analysis algorithm is characterized by a good accuracy and a small number of false positives.

Fig. 1. Framework architecture.

A mobile diagnostic method based on multiple wearable sensors is presented in [22]. It is focused on the dynamic selection of the minimal set of sensors required to reach the right diagnosis. The mobile intelligence that locally processes physiological signals aims at determining the optimal set of sensors, whereas the severity level evaluation of the physiological conditions is actually performed in a remote central server. The framework proposed in this paper achieves this goal in real time directly on the mobile device. A preliminary version of the framework, with a limited set of features, has been introduced in [23].

III. METHOD

We present a flexible framework that performs real-time analysis of physiological data to monitor patients’ health conditions. Physiological measures, collected by sensors, are analyzed by means of data mining techniques to detect anomalies, thus assessing the severity level of monitored people’s clinical situation. To allow ubiquitous analysis, real-time processing is performed on mobile devices (e.g., pocket PCs and smart phones). When a high-risk situation is detected, a suitable alert may be triggered.

The framework architecture is depicted in Fig. 1. People’s health conditions are evaluated by analyzing different vital signs over a sliding time window. Given historical physiological data, the framework automatically learns both common and uncommon behaviors and creates different behavioral models tailored to specific conditions (e.g., a particular disease, a specific patient). For example, if a patient with respiratory problems has to be monitored, the model can be built either by using the patient’s historical data (i.e., by considering a specific patient profile), or other patients’ data with the same pathology, thus building a disease profile. Then, the suitable model for the currently monitored patient is selected and exploited in the real-time classification phase. Both phases are based on the (same) preprocessing, feature extraction, and risk component assessment steps detailed in the following.

A. Preprocessing

Since data collected from sensors are expected to be dirty, a preprocessing phase has been implemented to address the following issues: 1) removing measurements that are incompliant with human life and 2) handling missing values. We do not address communication errors, because we assume the use of protocols that guarantee data recovery and an admissible time delay.

Incompliant measurements are detected and removed (i.e., substituted with a null value) by comparing them with two thresholds associated with the current signal and derived from medical literature [24]. Missing values may be caused by sensor failures or obtained by null value substitution in 1). During the training phase on historical data, single missing values are replaced with the interpolation of the previous and following measurements. During the online classification, missing values are ignored and do not contribute to risk computation.
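The two preprocessing steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, `None` plays the role of the null value, the interpolation of a single missing value is taken to be the mean of its two neighbours, and the bounds 20 and 250 in the usage note are made-up values rather than those of Table I.

```python
from typing import List, Optional

Sample = Optional[float]  # None stands for the "null value" of the text


def remove_incompliant(samples: List[Sample], low: float, high: float) -> List[Sample]:
    """Step 1: substitute values outside [low, high] with None."""
    return [v if v is not None and low <= v <= high else None for v in samples]


def fill_single_gaps(samples: List[Sample]) -> List[Sample]:
    """Step 2 (training only): replace an isolated None with the mean of the
    previous and following measurements (assumed form of interpolation).
    Longer gaps are left untouched."""
    filled = list(samples)
    for i in range(1, len(samples) - 1):
        prev, nxt = samples[i - 1], samples[i + 1]
        if samples[i] is None and prev is not None and nxt is not None:
            filled[i] = (prev + nxt) / 2
    return filled
```

For instance, `remove_incompliant([72, 74, 900, 75], 20, 250)` yields `[72, 74, None, 75]`, and `fill_single_gaps` then turns the isolated gap into `74.5`. During online classification the second step would be skipped, so that null values are simply ignored.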

B. Feature Extraction

For each physiological signal x among the X monitored vital signs, we extract the following features.

Offset. The offset feature measures the difference between the current value x(t) and the moving average (i.e., mean value over the time window). It aims at evaluating the difference between the current value and the average conditions in the recent past. If a patient exhibits a strong deviation, this feature reveals such behavior. It is computed as

$$\mathrm{offset}_w(x(t)) = x(t) - \frac{1}{w}\int_{t-w}^{t} x(t)\,dt \tag{1}$$

where w is the size of the sliding time window. The sliding window size determines the temporal context of interest, as the present conditions depend on the recent past behavior. A very short time window focuses the analysis on instantaneous evaluation (e.g., for emergency detection).

Slope. The slope function evaluates the rate of the signal change. Hence, it assesses short-term trends, where abrupt variations may affect the patient’s health. It is computed as

$$\mathrm{slope}(x(t)) = \frac{dx(t)}{dt}. \tag{2}$$

Dist. The dist feature measures the drift of the current signal measurement from a given normality range. It is zero when the measurement is inside the normality range. The dist function is defined as

$$\mathrm{dist}(x(t)) = \begin{cases} 0, & N_x^- \le x(t) \le N_x^+ \\ x(t) - N_x^+, & x(t) > N_x^+ \\ x(t) - N_x^-, & x(t) < N_x^- \end{cases} \tag{3}$$

where $N_x^+$ and $N_x^-$ are, respectively, the upper and lower normality threshold values for the considered signal x(t). $N_x^+$ and $N_x^-$ define a range in which the physiological signal x is supposed to be normal for the patient. Physicians typically work with threshold values and the dist feature models this knowledge. For many human vital signs (e.g., heart rate, blood pressure, etc.) normality ranges are known from medical literature [24] and domain experts can easily set $N_x^+$ and $N_x^-$ according to specific conditions (e.g., patient, disease).
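On sampled data, the three features can be approximated as in the following Python sketch. It is not the authors' code: it assumes the window is a list of equally spaced samples whose last element is the current value, replaces the integral in (1) with the mean of the samples, and approximates the derivative in (2) with a backward difference. Function names and the example normality range are hypothetical.

```python
from typing import Sequence


def offset(window: Sequence[float]) -> float:
    """Discrete form of (1): current value minus the mean over the window."""
    return window[-1] - sum(window) / len(window)


def slope(window: Sequence[float], dt: float = 1.0) -> float:
    """Approximation of (2): backward difference of the last two samples,
    where dt is the sampling interval (assumed constant)."""
    return (window[-1] - window[-2]) / dt


def dist(value: float, n_low: float, n_high: float) -> float:
    """Eq. (3): signed distance from the normality range, zero inside it."""
    if value > n_high:
        return value - n_high
    if value < n_low:
        return value - n_low
    return 0.0
```

With the window `[80, 82, 81, 85, 92]` the mean is 84, so `offset` returns 8.0, and `slope` with `dt=1.0` returns 7.0. With an illustrative normality range of 60 to 90, `dist(92, 60, 90)` is 2 and `dist(55, 60, 90)` is -5, which shows that the feature keeps its sign until the risk components take the absolute value.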

C. Risk Components

The signal features contribute to the computation of the following risk components. Each feature contribution is weighted by an h function detailed next.

Sharp changes. The $z_1$ component aims at measuring the health risk deriving from sharp changes in the signal (e.g., quick changes in the blood pressure may cause fainting). It is obtained by considering the h-weighted slope over the time window w, offset by the most static condition

$$z_1(x(t)) = \int_{t-w}^{t} h_{\mathrm{slope}}\,|\mathrm{slope}(x(t))|\,dt - h_{\Delta}\,\Delta_w(x(t)). \tag{4}$$

The most static condition for a signal in the time window is defined as the smoothest change between its first value x(t − w) and its current value x(t)

$$\Delta_w(x(t)) = |x(t) - x(t-w)|. \tag{5}$$
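A small worked example may help to see why the most static condition is subtracted in (4). The Python sketch below is an illustration only: it assumes sampled data, a constant weight h for both terms (the special case in which the h function does not depend on the sign of the feature), and it uses the fact that |slope| dt summed over consecutive samples equals the sum of the absolute differences between them.

```python
from typing import Sequence


def z1(window: Sequence[float], h: float = 0.5) -> float:
    """Discrete sketch of (4) with a constant weight h (an assumption)."""
    # Integral of |slope| over the window: total variation of the samples.
    total_variation = sum(abs(b - a) for a, b in zip(window, window[1:]))
    # Eq. (5): net change between the first and the current value.
    net_change = abs(window[-1] - window[0])
    return h * total_variation - h * net_change
```

For a steady ramp such as `[100, 102, 104, 106]` the total variation equals the net change (6), so `z1` is 0.0: a smooth drift is not counted as a sharp change. For an oscillating window with the same end points, `[100, 110, 98, 106]`, the total variation is 30 and `z1` is 12.0.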

Long-term trends. The $z_2$ component measures the risk deriving from the h-weighted offset over the time window. While $z_1$ focuses on quick changes, $z_2$ evaluates long-term trends, as it is offset-based

$$z_2(x(t)) = \int_{t-w}^{t} h_{\mathrm{offset}}\,|\mathrm{offset}_w(x(t))|\,dt. \tag{6}$$

Distance from normal behavior. The $z_3$ component assesses the risk level given by the distance of the signal from the normality range. A patient with an instantaneous measurement outside the range may not be critical, but her/his persistence in such conditions contributes to the risk level

$$z_3(x(t)) = \int_{t-w}^{t} h_{\mathrm{dist}}\,|\mathrm{dist}(x(t))|\,dt. \tag{7}$$

All features are considered in absolute value to avoid compensation between positive and negative contributions, since both potentially increase the risk.

The contribution of each feature to the risk components can be personalized on a per signal basis by setting the $h_{\mathrm{slope}}$ (which equals $h_{\Delta}$), $h_{\mathrm{offset}}$, and $h_{\mathrm{dist}}$ values of the h function

$$h_y = \begin{cases} H_y, & y \ge 0 \\ 1 - H_y, & y < 0 \end{cases} \tag{8}$$

where y is the current feature value (i.e., slope, offset, dist). This function allows domain experts to discriminate between positive and negative feature contributions, by choosing an $H_y$ value in the range [0,1]. When $H_y$ is set to 1, only positive feature values are considered in the risk evaluation, and vice versa when $H_y$ is set to 0. The value 0.5 equally weighs the entire feature domain. For example, decreasing/low values of peripheral blood oxygenation lead to increasing health risk, while increasing/high measurements bring no contribution to the risk evaluation. Hence, a suitable $H_y$ value for this signal is 0 for all features.
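The effect of the h function can be sketched in Python as follows. The code is a hedged illustration rather than the authors' implementation: `h_weight` follows (8) directly, while `weighted_component` is a hypothetical helper that approximates integrals such as (6) and (7) with a sum over sampled feature values spaced `dt` apart.

```python
from typing import Sequence


def h_weight(y: float, H: float) -> float:
    """Eq. (8): weight H for non-negative feature values, 1 - H otherwise."""
    if not 0.0 <= H <= 1.0:
        raise ValueError("H must lie in [0, 1]")
    return H if y >= 0 else 1.0 - H


def weighted_component(features: Sequence[float], H: float, dt: float = 1.0) -> float:
    """Discrete sketch of an h-weighted integral of |feature| over the window."""
    return sum(h_weight(y, H) * abs(y) for y in features) * dt
```

Taking the dist values `[-3, -1, 0, 2]` as an example, `H = 0` counts only the values below the normality range and gives 4.0, which matches the SpO2 setting described above; `H = 1` counts only the value above the range and gives 2.0; `H = 0.5` weighs both sides equally and gives 3.0.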

D. Risk Function

The health risk associated with signal x at time t is obtained by combining its components. To avoid undesired unbalanced contributions, component values are clustered into discrete risk levels during the model training phase on historical data. To identify the discrete risk levels, different clustering algorithms [25] are exploited (see Section IV-A for more details on clustering algorithm selection issues). The number of discrete levels $C_{\max}$ is the same for every risk component and is set during model building. During real-time classification, component values are assigned to the appropriate risk level stored in the model. In the following, C(z) denotes the function that returns the risk level associated with risk component z, computed separately for each z. Hence, each component $z_i$ is characterized by its own $C_i(z_i)$.
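The training and classification steps for a single risk component can be sketched as below. The framework relies on existing clustering algorithms [25]; this sketch substitutes a minimal one-dimensional K-means written from scratch, so it should be read as an illustration only. Two further assumptions are made: initial centroids are evenly spaced between the minimum and maximum training value, and levels are numbered from 1 to Cmax in order of increasing centroid.

```python
from typing import List, Sequence


def train_levels(values: Sequence[float], c_max: int, iterations: int = 20) -> List[float]:
    """Cluster historical component values into c_max groups (c_max >= 2)
    and return the sorted centroids, which act as the stored model."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (c_max - 1) for i in range(c_max)]
    for _ in range(iterations):
        groups: List[List[float]] = [[] for _ in centroids]
        for v in values:
            nearest = min(range(c_max), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Keep the old centroid when a cluster receives no value.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return sorted(centroids)


def risk_level(z: float, centroids: Sequence[float]) -> int:
    """C(z): level of the nearest centroid, numbered from 1 (assumption)."""
    nearest = min(range(len(centroids)), key=lambda j: abs(z - centroids[j]))
    return nearest + 1
```

With the made-up training values `[0.1, 0.2, 0.3, 5.0, 5.2, 9.8, 10.0]` and `c_max = 3`, the centroids settle near 0.2, 5.1, and 9.9, and a new component value of 6.0 is assigned to level 2. Only the centroids need to be stored on the mobile device, which keeps the classification step cheap.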

The health risk for the signal x at time t is defined as

$$\mathrm{risk}_x(t) = \frac{\sum_i k_{i,x}\,C_i(z_i(x))}{\sum_i k_{i,x}} \times \frac{n}{C_{\max}} \tag{9}$$

where i ranges from 1 to 3 for the three $z_i$ risk components. In the risk function, coefficients $k_{i,x} \in [0, 1]$ are introduced to allow individual weighing of the ith component of a specific signal x. For example, assessing the severity level of a blood pressure signal (BP) by considering solely its closeness to normality thresholds is obtained by setting $k_{3,\mathrm{BP}} = 1$, $k_{2,\mathrm{BP}} = 0$, and $k_{1,\mathrm{BP}} = 0$.

The risk function is normalized to return a value indicating the severity level from 0 to n, where n may be personalized. For example, many emergency units in hospitals use a scale of four levels to quickly indicate the degree of health risk.
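Equation (9) amounts to a weighted mean of the three discrete levels, rescaled from the Cmax scale to the n scale. The following Python sketch illustrates it; the function name and the numeric levels in the usage note are hypothetical.

```python
from typing import Sequence


def signal_risk(levels: Sequence[float], k: Sequence[float], n: int, c_max: int) -> float:
    """Eq. (9): k-weighted mean of the component levels C_i, scaled by n / c_max."""
    total_weight = sum(k)
    if total_weight == 0:
        raise ValueError("at least one k coefficient must be non-zero")
    weighted = sum(ki * ci for ki, ci in zip(k, levels))
    return weighted / total_weight * n / c_max
```

Suppose the three components of a blood pressure signal fall into levels (2, 4, 5) with `c_max = 5` and `n = 5`. With `k = (1, 1, 1)` the risk is 11/3, about 3.67. With `k = (0, 0, 1)`, the setting described above in which only the distance from the normality range matters, the risk is 5.0.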

E. Global Risk Assessment

Patient conditions depend on the contribution of different physiological signals. Risk levels for each physiological signal x are combined to obtain a global health indicator for the patient. To determine the global risk level, we exploited the max function

$$\mathrm{risk}(t) = \max_{x \in X}\bigl(\mathrm{risk}_x(t)\bigr). \tag{10}$$

The max function detects any high-risk situation, even when highlighted by a single physiological signal. Other suitable functions (e.g., weighted mean) may be easily introduced.
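The difference between the max function and a weighted mean can be seen in a short Python sketch. The per-signal risk values and the weights are invented for illustration, and `weighted_mean_risk` is only one possible form of the alternative mentioned above.

```python
from typing import Dict


def global_risk(per_signal: Dict[str, float]) -> float:
    """Eq. (10): the global risk is the highest per-signal risk."""
    return max(per_signal.values())


def weighted_mean_risk(per_signal: Dict[str, float], weights: Dict[str, float]) -> float:
    """A possible alternative combination: weighted mean of the per-signal risks."""
    total = sum(weights[name] for name in per_signal)
    return sum(weights[name] * value for name, value in per_signal.items()) / total
```

With risks `{"ABPsys": 1.0, "ABPdias": 1.0, "HR": 2.0, "SpO2": 5.0}`, `global_risk` returns 5.0, whereas an equally weighted mean returns 2.25. The single critical SpO2 reading is therefore flagged by the max function but diluted by the mean.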

F. Domain Expert Knowledge

Physicians’ knowledge is integrated in the framework by setting the values of normality thresholds (N), absolute thresholds (A), h function, and k coefficients for each signal.

Normality thresholds may be inferred from medical literature [24] or by analyzing historical data. They may also be defined by the medical staff, thus suiting specific needs. To promptly detect extreme risks, absolute thresholds are introduced. They model the worst cases. Thus, outside the range defined by A values, the risk is always set to the highest level.
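The role of the absolute thresholds can be sketched as a final override applied to the risk of a signal. The Python function below is a hypothetical illustration: its name, its signature, and the threshold values used in the usage note are assumptions and are not taken from Table I.

```python
def apply_absolute_thresholds(value: float, risk: float,
                              a_low: float, a_high: float, n: int) -> float:
    """Force the highest level n when the measurement leaves the absolute range;
    otherwise keep the risk computed from the model."""
    if value < a_low or value > a_high:
        return n
    return risk
```

For example, with an illustrative absolute range of 85 to 100 for SpO2 and `n = 5`, a reading of 97 keeps its computed risk, while a reading of 70 is set to 5 regardless of the model output.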

G. Applicability of the Approach

The aforesaid processing may be applied to different types of physiological signals, e.g., 1) signals that in normal conditions do not suddenly deviate from the average value and do not present abrupt changes in their trend and 2) signals that in critical situations take values far from the average and whose variability grows. Examples of such signals are heart rate (HR), systolic and diastolic arterial blood pressure (ABP), oxygen saturation (SpO2), respiratory rate, ventilation, and body temperature. Signals not satisfying the aforesaid conditions cannot be analyzed by means of the proposed approach (e.g., motion data, electrocardiogram). For example, motion data do not have a normality range, and in normal conditions, sudden changes may happen without compromising the health status. Hence, to recognize activity and identify unsafe situations (e.g., falls), a different approach is needed.

IV. EXPERIMENTAL RESULTS

Publicly available clinical data have been selected to validate the effectiveness of the proposed framework. Focusing on the intensive care scenario, the multiparameter intelligent monitoring for intensive care (MIMIC) database [9] contains nearly 200 patient days of real-time signals. We used 64 records from the numerics section of the database, which provides measures sampled at 1 Hz. For each record, the gender, age, and disease of the patient are known.

Four physiological variables representative of health conditions have been analyzed: 1) systolic arterial blood pressure (ABPsys); 2) diastolic arterial blood pressure (ABPdias); 3) heart rate (HR); and 4) peripheral blood oxygenation (SpO2). These physiological measures are likely to be available also in a different mobile context, such as the monitoring of elderly people, even if sensors are expected to be different from the ICU environment (e.g., noninvasive, wearable, etc.). Furthermore, these signals provide a comprehensive view of a patient state.

The experimental session evaluated the behavior of the proposed approach by varying the clustering algorithm, the time window size w, and domain knowledge thresholds. Furthermore, several personalizations of the model have been defined and analyzed. Finally, the results of automated risk detection are compared to data classified by a domain expert to verify the effectiveness of the proposed method.

Evaluations are mainly performed by visual inspection of classified data, where colors and shapes are used to identify increasing severity levels. We used n = 5 levels: light green circle, dark green circle, yellow x-shape, orange cross shape, and red square. In the plots, red horizontal lines show absolute thresholds, while green horizontal lines denote normality thresholds. The x-axis shows the sample number and the y-axis the measured value (millimeters of mercury for ABPdias and ABPsys, beats per minute for HR, percentage for SpO2).

TABLE I. ABSOLUTE AND NORMALITY THRESHOLD DEFAULT VALUES

The overall mobile architecture has been simulated through Bluetooth connections between a PC and a PDA, where the implemented framework was installed. The PC is used for creating the patient/disease models from historical data. All models have been built on a selected subset of the dataset. The PC emulates sensors by sending new samples to the PDA over a Bluetooth channel. The PDA receives the measurements, as if they were coming from real sensors, and performs the described analysis in real time. The PC is equipped with an Intel Core 2 Duo T5500 at 1.67 GHz and 2048 MB of RAM. The mobile device used in our tests is an HTC P3600, equipped with a Samsung SC32442A processor at 400 MHz, 64 MB of RAM, and Windows Mobile 6.

The framework has been implemented in Java for the model-building phase, whereas .NET Compact is used for the mobile real-time classification phase. The performance of both model building and mobile classification is analyzed in terms of requested resources and processing time.

A. Experimental Design

For each experiment, disjoint record sets have been selected for building the model (training set) and detecting risk in real time on the mobile device (test set). The severity level of the clinical conditions associated with each physiological signal in test data has been evaluated in various operating conditions. For each operating condition, a disease-specific model has been built and then applied to classify the clinical conditions of a previously unseen record of a patient affected by the same disease.

The experiments addressed the following issues.

1) Clustering algorithms: Simple K-means (KM), farthest first (FF), and expectation maximization (EM) algorithms (default algorithm simple KM).

2) Sampling interval: 3, 5, 10, and 20 s (default value 3 s).

3) Sliding time window size: 1, 5, 10, 20, and 30 sampled measurements (default value ten samples).

Default values for absolute thresholds (A) and normality thresholds (N) are shown in Table I. They refer to human beings in healthy conditions. The reported values are derived from medical literature [24]. The h function and k coefficient default values are reported in [26].
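As an illustration of the experimental grid, the three factors listed above and their defaults can be written down as a small configuration object. The sketch below assumes that each factor is varied in turn while the others stay at their default values; the text does not state this explicitly, and the names `ExperimentConfig`, `FACTORS`, and `one_factor_at_a_time` are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ExperimentConfig:
    """One operating condition; the defaults follow the values given in the text."""
    clustering: str = "KM"          # simple K-means
    sampling_interval_s: int = 3    # seconds between samples
    window_size: int = 10           # sliding window length, in samples


# Values explored for each factor, as listed in the experimental design.
FACTORS = {
    "clustering": ["KM", "FF", "EM"],
    "sampling_interval_s": [3, 5, 10, 20],
    "window_size": [1, 5, 10, 20, 30],
}


def one_factor_at_a_time(defaults=ExperimentConfig()):
    """Vary a single factor per run (an assumption), skipping duplicate configurations."""
    configs = [defaults]
    for name, values in FACTORS.items():
        for value in values:
            candidate = replace(defaults, **{name: value})
            if candidate not in configs:
                configs.append(candidate)
    return configs


configs = one_factor_at_a_time()
for config in configs:
    print(config)
```

Under this assumption the grid contains the default configuration plus two, three, and four variations of the respective factors.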

Fig. 2. ABPdias signal of record 267 classified using FF (top), KM (middle), and EM (bottom) clustering algorithms.

B. Clustering Algorithms

Three categories of clustering algorithms (i.e., partitioning, hierarchical, model-based) have been exploited to identify the discrete risk levels. While model-based algorithms use statistical techniques to determine the best fit between the model and the data, partitional and hierarchical methods require the definition of a metric to compute distances between objects in the dataset [25]. The most common distinction between the last two categories is whether the set of clusters is nested (e.g., hierarchical clusters) or unnested (e.g., partitional clusters). In our experiments, we considered the simple KM, FF, and EM algorithms as representative of the partitional, hierarchical, and model-based approaches. They are, respectively, identified as KM, FF, and EM in the presented results.

Fig. 3. ABPdias signal of record 218 classified using a time window of 1 sample (top) and 20 samples (bottom).

Clustering algorithm comparisons were performed for all four physiological signals. The results of blood pressure classification on a patient affected by angina are reported in Fig. 2. Generally, FF creates one big cluster for normality conditions (low severity levels), leading to a classification with few high-risk measurements, whereas KM creates more balanced clusters, thus better discriminating among different levels of normality and risky conditions. Hence, FF is more prone to (unacceptable) false negatives and KM to (undesired but acceptable) false positives. EM behaves in between FF and KM, at the cost of a higher model creation time (see Section IV-F). Focusing on the ABPdias signal in Fig. 2, KM recognizes the risky situations both above (samples 200–300) and below (samples 400–500) the normality thresholds, whereas EM and FF each fail in one or the other, assigning medium-low severity levels.
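To make the difference in how FF and KM place their cluster centers concrete, the following minimal sketch runs both on a handful of hypothetical diastolic pressure values. It is not the implementation used in the experiments; the function names, the toy data, and the choice of seeding KM with the FF centers are assumptions made for illustration. FF selects observed points that are mutually as far apart as possible, so isolated extreme values become centers, whereas KM moves each center to the mean of its cluster.

```python
def farthest_first_centers(values, k):
    """Farthest-first traversal: each new center is the point farthest from all chosen centers."""
    centers = [values[0]]  # assumption: start from the first observation
    while len(centers) < k:
        next_center = max(values, key=lambda v: min(abs(v - c) for c in centers))
        centers.append(next_center)
    return centers


def kmeans_1d(values, initial_centers, max_iterations=100):
    """Lloyd's algorithm on scalar values, starting from the given centers."""
    centers = list(initial_centers)
    for _ in range(max_iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Empty clusters keep their previous center.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers


# Hypothetical ABPdias values (mmHg): a dense normal range plus one isolated peak.
values = [70, 72, 74, 76, 78, 80, 82, 84, 86, 120]

ff_centers = farthest_first_centers(values, k=3)
km_centers = kmeans_1d(values, ff_centers)

print("FF centers:", ff_centers)   # observed extreme points: [70, 120, 86]
print("KM centers:", km_centers)   # cluster means: [74.0, 120.0, 83.0]
```

On this toy input the FF centers sit on the boundary values 70, 120, and 86, while the KM centers settle inside the dense region at 74 and 83. The example only shows the mechanism and makes no claim about the cluster balance observed on the real records.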

C. Time Window Size and Sampling Interval

A large time window focuses on long-term trends of the signals and allows the identification of patients persisting in conditions of medium risk, which may become severe if excessively protracted. Instead, a short time window concentrates on emergency detection, such as sudden physiological parameter changes. This behavior is shown in Fig. 3, which plots the risk level for different time window sizes on the ABPdias signal of a patient affected by respiratory failure. Focusing on peaks (e.g., around samples 10 000 and 25 000), a short time window promptly raises the risk level, whereas a long time window keeps the severity level low because the peaks are short-lived.
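The effect of the window size on a short peak can be sketched with a simple moving average over instantaneous risk levels. The averaging rule is an assumption chosen for illustration, since the aggregation actually used by the framework is not restated in this section; `windowed_risk` and the synthetic risk sequence are hypothetical.

```python
from collections import deque


def windowed_risk(instant_risk, window_size):
    """Average the most recent `window_size` instantaneous risk levels (assumed rule)."""
    window = deque(maxlen=window_size)  # old samples are discarded automatically
    smoothed = []
    for level in instant_risk:
        window.append(level)
        smoothed.append(sum(window) / len(window))
    return smoothed


# Synthetic stream: low risk (1) with a three-sample peak at the highest level (5).
instant_risk = [1] * 30 + [5] * 3 + [1] * 30

short_window = windowed_risk(instant_risk, window_size=1)
long_window = windowed_risk(instant_risk, window_size=20)

print(max(short_window))  # 5.0: the peak is reported immediately
print(max(long_window))   # 1.6: (17 * 1 + 3 * 5) / 20, the peak is damped
```

With a window of one sample the peak reaches the highest level at once, while the 20-sample window never exceeds 1.6, which mirrors the qualitative behavior described for Fig. 3.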

As expected, increasing the sampling frequency allows abrupt changes in the clinical conditions to be detected. The lowest sampling interval we tested is 1 s, which corresponds to the time granularity of the database. Detailed results of these experiments may be found in [26].

Fig. 4. ABPsys signal of record 218 (zoomed on samples 15 000 to 20 000) classified by means of a respiratory failure model (top) and a general model (bottom).

D. General Versus Specific Models

By considering training data coming from patients affected by the same disease, a disease-specific model can be built. Furthermore, models may be trained by using measurements from a single patient, thus building a patient-specific model. A general model, which does not refer to a specific context, is needed in less-defined conditions. It can be built by considering records coming from patients affected by different diseases. In our experiments, a general intensive care model has been trained using one patient from each of the nine available diseases (see [26]), and a respiratory-failure-specific model has been trained on patients affected by that disease.

Fig. 4 compares the respiratory failure and the general models. They behave differently for measurements inside and outside the normality thresholds. The disease-specific model detects a higher risk than the general model when measurements are outside the normality thresholds. Evidence is visible in the plots for points below the 120 mmHg line (i.e., the lowest normality threshold). The three peaks around samples 17 000, 18 000, and 19 000 show a low-level risk in the respiratory-failure-specific model, while the generic model assigns mostly midlow risk labels. On the contrary, outside the normality thresholds, the specific model leads to higher risks, up to the midhigh level in the case of samples around 17 200, whereas in the generic model the highest risk is only at midlevel. Other signals, such as HR and SpO2, show very little difference between the two models, whereas the ABPdias signal presents a behavior similar to ABPsys.


Fig. 5. HR signal of record 221 classified using standard (top) and personalized (bottom) thresholds.

E. Patient Personalization

Domain expert personalization is especially useful to suit the characteristics of patients presenting particular conditions, with no need to rebuild the model. For example, the HR measurements of record 221 are continuously over threshold for a very long period. A physician who has diagnosed tachycardia is interested in deviations from this steady condition and wants to avoid interruptions by repetitive alarms caused by the already identified situation. The global risk levels obtained with standard and personalized threshold values are shown in Fig. 5. Normality threshold values have been changed from 60–80 to 80–100 beats per minute. Focusing on the two circled anomalies, the personalized thresholds lead to midlevel risks (instead of midhigh risks), and ordinary tachycardiac measurements (samples from 500 to 2500) are labeled as low risk, avoiding the midhigh risks obtained by applying standard thresholds. Given the specific context of patient 221, the risk conditions identified by standard values may be uninteresting for a physician. Changing the thresholds allows the domain expert to choose when an alert becomes meaningful in the patient context. Other examples of patient personalization may include bradycardia, hypertension, and hypotension, but personalization is not limited to patient disease-specific conditions, as it allows physicians to focus on their specific interests when analyzing monitored data.
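The effect of moving the normality thresholds can be sketched by measuring how far a heart rate sample lies outside the normality range. The distance-based `excess` measure and the `NormalityRange` class are assumptions introduced for illustration and do not reproduce the risk function of the framework; only the two threshold pairs (60–80 and 80–100 beats per minute) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalityRange:
    """Lower and upper normality thresholds for one signal (beats per minute for HR)."""
    low: float
    high: float

    def excess(self, value):
        """Distance of a sample from the normality range; 0 when it lies inside."""
        if value < self.low:
            return self.low - value
        if value > self.high:
            return value - self.high
        return 0.0


STANDARD_HR = NormalityRange(low=60, high=80)
PERSONALIZED_HR = NormalityRange(low=80, high=100)  # tachycardiac patient

# Hypothetical samples: a steady tachycardiac value and a sharper anomaly.
for hr in (95, 130):
    print(hr, STANDARD_HR.excess(hr), PERSONALIZED_HR.excess(hr))
```

With the standard range, the steady value of 95 beats per minute already lies 15 units outside normality, whereas with the personalized range it falls inside; the anomaly at 130 remains outside in both cases but by a smaller margin, so only the model inputs change and the model itself is left untouched.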

F. Resource Consumption

TABLE II
ACCURACY OF HIGH-RISK DETECTION

TABLE III
NUMBER OF FALSE POSITIVES OVER 30 000 SAMPLES

The resources considered are processing time and requested memory. Since classification is performed in real time, the performance of this phase is critical. Furthermore, given the continuous stream of incoming sensor data, the classification time of a single measurement has to be less than the sampling period of the sensors. The average time required for classifying a single measurement of every physiological signal (i.e., four values in our experiments) is always less than 0.5 ms (see [26]). Memory requirements are negligible. The memory needed by the classification process is of the order of hundreds of bytes (with a ten-cluster model and a time window of tens of samples) for each physiological signal to be monitored. The complete application can be executed on a 2-MB-equipped smartphone without restrictions. Since the smartphone battery lasted many hours, effective mobile monitoring proved to be feasible. The model-building phase is performed off-line, thus its resource requirements are less critical. Experiments showed that the choice of the clustering algorithm strongly affects the time required to build the model. The fastest clustering algorithm is FF, which completes in a few minutes at most. KM needs tens of minutes, whereas EM may require up to a few hours.
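The real-time constraint and the memory estimate above can be checked with a small sketch. It assumes a nearest-center classifier, hypothetical cluster centers for the four signals, and 8-byte values; none of these details is taken from the actual implementation, and timings measured on a desktop interpreter are not comparable to those of the PDA.

```python
import time

SAMPLING_PERIOD_S = 3.0   # default sampling interval from the experimental design
N_CLUSTERS = 10           # ten-cluster model, as in the memory estimate
WINDOW_SIZE = 30          # "tens of samples"
BYTES_PER_VALUE = 8       # assumption: one double-precision value per entry

# Hypothetical cluster centers for each monitored signal.
MODELS = {
    "ABPsys": [80 + 10 * i for i in range(N_CLUSTERS)],
    "ABPdias": [40 + 8 * i for i in range(N_CLUSTERS)],
    "HR": [40 + 15 * i for i in range(N_CLUSTERS)],
    "SpO2": [82 + 2 * i for i in range(N_CLUSTERS)],
}


def classify(value, centers):
    """Return the index of the nearest cluster center (assumed classification rule)."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))


def mean_classification_time(sample, models, repeats=1000):
    """Average wall-clock time needed to classify one value of every signal."""
    start = time.perf_counter()
    for _ in range(repeats):
        for signal, value in sample.items():
            classify(value, models[signal])
    return (time.perf_counter() - start) / repeats


sample = {"ABPsys": 135, "ABPdias": 85, "HR": 97, "SpO2": 96}
mean_time_s = mean_classification_time(sample, MODELS)
approx_bytes_per_signal = (N_CLUSTERS + WINDOW_SIZE) * BYTES_PER_VALUE

print(f"mean time per four-signal sample: {mean_time_s * 1e3:.4f} ms")
print("meets real-time constraint:", mean_time_s < SAMPLING_PERIOD_S)
print("approximate bytes per signal:", approx_bytes_per_signal)
```

Under these assumptions, ten centers and a 30-sample window occupy 320 bytes per signal, which is consistent with the order of magnitude reported above.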

G. Domain Expert Validation

Results obtained by automated classification have been compared to domain expert classification. The comparison focuses on high severity level conditions, because failing to identify them (a false negative) is critical in this context. Low-risk conditions are the most frequently encountered (i.e., most of the measurements are associated with low severity levels) and they are accurately recognized in almost all cases.

High-risk events have been independently classified by medical staff. The accuracy of their detection by means of automated classification is reported in Table II for record 218, which is characterized by a total of 26 h of recordings (30 000 measurements for each signal). The (default) clustering algorithm KM obtains the best classification accuracy, while FF shows the worst performance. Focusing on false alarms, Table III shows the number of false positives detected for each signal. All three clustering techniques lead to good performance; in particular, FF does not trigger any false alarm, while KM and EM detect a negligible number of false positives.
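A comparison of this kind can be sketched as follows. The sketch assumes that detection accuracy is the fraction of expert-labelled high-risk samples that the automated classification also flags, and that a false positive is a sample flagged as high risk that the expert did not label as such; the exact definitions behind Tables II and III are not given here, and the label sequences are invented for illustration.

```python
def high_risk_metrics(expert, predicted):
    """Return (detection rate, false positive count) for boolean high-risk labels."""
    if len(expert) != len(predicted):
        raise ValueError("label sequences must have the same length")
    true_positives = sum(1 for e, p in zip(expert, predicted) if e and p)
    false_negatives = sum(1 for e, p in zip(expert, predicted) if e and not p)
    false_positives = sum(1 for e, p in zip(expert, predicted) if p and not e)
    events = true_positives + false_negatives
    # The detection rate is undefined when the expert labelled no high-risk sample.
    detection_rate = true_positives / events if events else float("nan")
    return detection_rate, false_positives


# Invented labels: True marks a sample considered high risk.
expert = [False, False, True, True, False, True, False, False]
predicted = [False, True, True, False, False, True, False, False]

detection_rate, false_positives = high_risk_metrics(expert, predicted)
print(f"detection rate: {detection_rate:.2f}")   # 2 of 3 expert events detected
print("false positives:", false_positives)       # 1 sample flagged without expert label
```

Keeping the two quantities separate reflects the asymmetry stressed in the text: a missed high-risk event is unacceptable, whereas a false alarm is undesired but tolerable.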

In conclusion, when considering classification accuracy (reported in Table II), false positive rate (reported in Table III), and computation time, each clustering algorithm presents strengths and weaknesses. The best tradeoff between accuracy and false positive rate is provided by the KM algorithm, which is the most accurate and requires a limited computational time (a few minutes), while its false positive rate is acceptable. The worst performance is given by FF, which is the least accurate in identifying high-risk situations. Its low false positive rate and computational time become irrelevant in this context. EM is the slowest and only shows medium accuracy. For these reasons, we selected the KM algorithm as the default clustering algorithm.

V. CONCLUSION

A flexible framework has been proposed to perform real-time analysis of physiological data and to evaluate people’s health conditions. Patient or disease-specific models are built by means of data mining techniques. Models are exploited to perform real-time classification of physiological signals and continuously assess a person’s health conditions. The proposed framework allows both instantaneous evaluation and stream analysis over a sliding time window for physiological data. Experiments performed on 64 patients affected by different critical illnesses demonstrate the adaptability and flexibility of the proposed approach and its computational efficiency.

Future developments of the proposed framework will explore the correlation among medical data (e.g., physiological signals, motion data) to reduce the false positive rate. Furthermore, we plan to apply the approach to different medical contexts (e.g., elderly people monitoring, patient rehabilitation).

ACKNOWLEDGMENT

The authors would like to thank Dr. R. Potenza, from S. Giovanni Bosco Hospital, Turin, Italy, for his support and clinical advice, and for serving as domain expert.

REFERENCES

[1] H.-C. Wu, C.-H. Lin, K.-C. Wang, S.-C. Wang, C.-H. Chen, S.-T. Young, and T.-S. Kuo, “A mobile system for real-time patient-monitoring with integrated physiological signal processing,” in Proc. 1st Joint BMES/Eng. Med. Biol. Soc. Conf., Atlanta, GA, 1999, p. 712.

[2] Y.-H. Lin, I.-C. Jan, P. Chow-In Ko, Y.-Y. Chen, J.-M. Wong, and G.-J. Jan, “A wireless PDA-based physiological monitoring system for patient transport,” IEEE Trans. Inf. Technol. Biomed., vol. 8, no. 4, pp. 439–447, Dec. 2004.

[3] R.-G. Lee, K.-C. Chen, C.-C. Hsiao, and C.-L. Tseng, “A mobile care system with alert mechanism,” IEEE Trans. Inf. Technol. Biomed., vol. 11, no. 5, pp. 507–517, Sep. 2007.

[4] N. Saranummi, “Information technology in biomedicine,” IEEE Trans. Biomed. Eng., vol. 49, no. 12, pp. 1385–1386, Dec. 2002.

[5] K. Lorincz, D. J. Malan, T. R. F. Fulford-Jones, A. Nawoj, A. Clavel, V. Shnayder, G. Mainland, M. Welsh, and S. Moulton, “Sensor networks for emergency response: Challenges and opportunities,” IEEE Pervasive Comput., vol. 3, no. 4, pp. 16–23, Oct. 2004.

[6] V. Jones, A. van Halteren, I. Widya, N. Dokovsky, G. Koprinkov, R. Bults, D. Konstantas, and R. Herzog, “Mobihealth: Mobile health services based on body area networks,” Centre for Telematics and Information Technology, Univ. Twente, Enschede, The Netherlands, Tech. Rep. TR-CTIT-06-37, 2006.

[7] M. Imhoff and S. Kuhls, “Alarm algorithms in critical care monitoring,” Anesth. Analg., vol. 102, pp. 1525–1537, 2006.

[8] F. Lamberti and B. Montrucchio, “Ubiquitous real-time monitoring of critical-care patients in intensive care units,” in Proc. IEEE Eng. Med. Biol. Soc., 2003, pp. 318–321.

[9] The MIMIC database on PhysioBank (2007, Oct.) [Online]. Available: http://www.physionet.org/physiobank/database/mimicdb

[10] F. Axisa, A. Dittmar, and G. Delhomme, “Smart clothes for the monitoring in real time and conditions of physiological, emotional and sensorial reaction of human,” in Proc. 25th Conf. IEEE Eng. Med. Biol. Soc., 2003, pp. 3744–3747.

[11] P. Varady, Z. Benyo, and B. Benyo, “An open architecture patient monitoring system using standard technologies,” IEEE Trans. Inf. Technol. Biomed., vol. 6, no. 1, pp. 95–98, Mar. 2002.

[12] P.-T. Cheng, L.-M. Tsai, L.-W. Lu, and D.-L. Yang, “The design of PDA-based biomedical data processing and analysis for intelligent wearable health monitoring systems,” in Proc. 4th Conf. Comput. Inf. Technol., 2004, pp. 879–884.

[13] P. Branche and Y. Mendelson, “Signal quality and power consumption of a new prototype reflectance pulse oximeter sensor,” in Proc. 31st Northeast Bioeng. Conf., 2005, pp. 42–43.

[14] Skyaid Watch webpage (2007, Nov.) [Online]. Available: http://www.skyaid.org/LifeWatch/life_watch.htm

[15] J. L. Weber, D. Blanc, A. Dittmar, B. Comet, C. Corroy, N. Noury, R. Baghai, S. Vaysse, and A. Blinowska, “Telemonitoring of vital parameters with newly designed biomedical clothing,” Stud. Health Technol. Informat., vol. 108, pp. 260–265, 2004.

[16] S. Gupta and A. Ganz, “Design considerations and implementation of a cost-effective, portable remote monitoring unit using 3G wireless data networks,” in Proc. 26th Conf. IEEE Eng. Med. Biol. Soc., 2004, pp. 3286–3289.

[17] M. Ermes, J. Parkka, J. Mantyjarvi, and I. Korhonen, “Detection of daily activities and sports with wearable sensors in controlled and uncontrolled conditions,” IEEE Trans. Inf. Technol. Biomed., vol. 12, no. 1, pp. 20–26, Jan. 2008.

[18] D. M. Karantonis, M. R. Narayanan, M. Mathie, N. H. Lovell, and B. G. Celler, “Implementation of a real-time human movement classifier using a triaxial accelerometer for ambulatory monitoring,” IEEE Trans. Inf. Technol. Biomed., vol. 10, no. 1, pp. 156–167, Jan. 2006.

[19] S. Sharshar, L. Allart, and M. C. Chambrin, “A new approach to the abstraction of monitoring data in intensive care,” in Lect. Notes Comput. Sci., vol. 3581, pp. 13–22, 2005.

[20] N. Adhikari and S. E. Lapinsky, “Medical informatics in the intensive care unit: Overview of technology assessment,” J. Crit. Care, vol. 18, pp. 41–47, Mar. 2003.

[21] M. Mahmoud, “Real-time data acquisition system for monitoring patients in intensive care unit (ICU),” in Proc. SPIE Conf. Multisensor Multisource Inf. Fusion, vol. 5099, pp. 320–326, 2003.

[22] W. H. Wu, A. A. T. Bui, M. A. Batalin, D. Liu, and W. J. Kaiser, “Incremental diagnosis method for intelligent wearable sensor systems,” IEEE Trans. Inf. Technol. Biomed., vol. 11, no. 5, Sep. 2007.

[23] D. Apiletti, E. Baralis, G. Bruno, and T. Cerquitelli, “SAPhyRA: Stream analysis for physiological risk assessment,” in Proc. IEEE CBMS, 2007, pp. 193–198.

[24] D. L. Kasper, E. Braunwald, A. Fauci, S. Hauser, D. Longo, and J. L. Jameson, Harrison’s Principles of Internal Medicine. New York: McGraw-Hill, 1992, p. 315 and p. 1349.

[25] J. Han and M. Kamber, Data Mining: Concepts and Techniques (Morgan Kaufmann Series in Data Management Systems). San Mateo, CA: Morgan Kaufmann, 2000.

[26] D. Apiletti, E. Baralis, G. Bruno, and T. Cerquitelli. (2007, Dec.) [Online]. Real-time analysis of physiological data to support medical applications, Tech. Rep. Available: http://tasmania.polito.it/~daniele/pub/

Daniele Apiletti received the Master degree in computer engineering from the Politecnico di Torino, Turin, Italy, in 2005. Since January 2006, he has been working toward the Ph.D. degree at the Database and Data Mining Group, Dipartimento di Automatica e Informatica, Politecnico di Torino.

His current research interests include the fields of bioinformatics, sensor, and network data analysis. In bioinformatics, his activities are focused on microarray data classification and feature selection techniques. In the area of sensor data analysis, he is developing data mining techniques for physiological data processing on mobile devices. His activities also include automated classification of network traffic data.


Elena Baralis (M’99) received the Master degree in electrical engineering and the Ph.D. degree in computer engineering from the Politecnico di Torino, Turin, Italy.

She has managed several Italian and European Union research projects. Since January 2005, she has been a Full Professor at the Dipartimento di Automatica e Informatica, Politecnico di Torino. She is the author or coauthor of numerous papers in journals and conference proceedings. Her current research interests include the field of databases, in particular data mining, sensor databases, and bioinformatics.

Giulia Bruno received the Master degree in computer engineering from Politecnico di Torino, Turin, Italy, in September 2005. Since January 2006, she has been working toward the Ph.D. degree at the Database and Data Mining Group, Dipartimento di Automatica e Informatica, Politecnico di Torino.

She is currently engaged in research in the field of data mining and bioinformatics. In particular, her activity is focused on the analysis of microarray gene expression data, gene network modeling, data cleaning, and semantic information discovery. Her current research interests include classification of physiological signals in order to monitor patient conditions for clinical analysis.

Tania Cerquitelli received the Master degree in computer science from Universidad De Las Americas Puebla, Puebla, Mexico, and the Master degree in computer engineering and the Ph.D. degree from Politecnico di Torino, Turin, Italy.

Since January 2007, she has been a Postdoctoral Researcher in computer engineering at the Dipartimento di Automatica ed Informatica, Politecnico di Torino. Her current research interests include the integration of data mining algorithms into database system kernels and sensor databases.