
ARTMED-1026; No of Pages 18

Temporal similarity measures for querying clinical workflows

Carlo Combi a, Matteo Gozzi a, Barbara Oliboni a, Jose M. Juarez b,*, Roque Marin b

a Department of Computer Science, University of Verona, Strada le Grazie 15, I-37134 Verona, Italy
b Department of Information and Communication Engineering, University of Murcia, Campus de Espinardo, 30100 Murcia, Spain

Received 18 December 2007; received in revised form 28 July 2008; accepted 29 July 2008

Artificial Intelligence in Medicine (2008) xxx, xxx—xxx

http://www.intl.elsevierhealth.com/journals/aiim

KEYWORDS Clinical workflows; Temporal similarity; Clinical guidelines; Temporal constraint networks

Summary

Objective: In this paper, we extend a preliminary proposal and discuss, in a deeper and more formal way, an approach to evaluating temporal similarity between clinical workflow cases (i.e., executions of clinical processes). More precisely, we focus on (i) the representation of clinical processes by using a temporal conceptual workflow model; (ii) the definition of ad hoc temporal constraint networks to formally represent clinical workflow cases; (iii) the definition of temporal similarity for clinical workflow cases based on the comparison of temporal constraint networks; (iv) the management of the similarity of clinical processes related to the Italian guideline for stroke prevention and management (SPREAD).

Background: Clinical processes are composed of clinical activities to be performed by given actors in a given order, satisfying given temporal constraints. This means that clinical processes can be seen as organizational processes and modeled by workflow schemata. When a workflow schema represents a clinical process, its cases represent different instances derived from dealing with different patients in different situations. Among all the cases related to a workflow schema, each clinical case can differ in its structure and in its temporal aspects. Clinical cases can be stored in clinical databases, and information retrieval can be performed by evaluating the similarity between workflow cases.

Methodology: We first describe a possible approach to the conceptual modeling of a clinical process by using a temporally extended workflow model. Then, we define how a workflow case can be represented as a set of activities, and show how to express them through temporal constraint networks. Once we have built the temporal constraint networks related to the cases to compare, we propose a similarity function able to evaluate the differences between the considered cases with respect to the order and duration of corresponding activities, and with respect to the presence/absence of some activities.

Results: In this work, we propose an approach to evaluate temporal similarity between workflow cases. The proposed approach can be used (i) to query clinical databases storing clinical cases that represent activities related to the management of different patients in different situations; (ii) to evaluate the quality of the service by comparing the similarity between a (possibly synthetic) case, perceived as the good one with respect to a given clinical situation, and the other clinical cases; and (iii) to retrieve a particular class of cases similar to an interesting one.

© 2008 Elsevier B.V. All rights reserved.

* Corresponding author at: Departamento de Ingenieria de la Informacion y las Comunicaciones, Facultad de Informatica, Campus de Espinardo, Universidad de Murcia, 30100, Spain. Tel.: +34 968 367345; fax: +34 968 364151.

E-mail address: [email protected] (J.M. Juarez).

0933-3657/$ - see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.artmed.2008.07.013

Please cite this article in press as: Combi C, et al. Temporal similarity measures for querying clinical workflows. Artif Intell Med (2008), doi:10.1016/j.artmed.2008.07.013

1. Introduction

In health care organizations, clinical (business) processes are becoming particularly important: they allow healthcare actors to focus on crucial aspects such as evaluating the quality of healthcare services, suggesting the most suitable clinical pathway, and controlling the budget for each clinical activity [1]. Clinical business processes are related to medical care and can be considered and studied from different perspectives [2–5]. From a more clinically oriented point of view, clinical guidelines and protocols may be seen as specific medical processes having a widely acknowledged structure. Clinical guidelines describe, in natural language, the recommended behaviour of a medical team, the activities to apply to the patient, and their fulfilment with respect to time and to the state of the patient's health, in order to define the best way to manage patients. The definition and management of clinical processes (based on guidelines, protocols, or specific clinical and therapeutic actions) support physicians in their daily activities. They also help physicians improve patient care and health outcomes by providing recommendations usually based on scientific evidence and expert clinical opinion. According to this scenario, in the coming years a huge amount of clinical process-related data will be available for several different purposes: evaluating the quality of the provided care, extracting medical knowledge, assessing clinical procedures, and so on.

In a more general perspective, clinical processes can be managed by means of business modeling tools such as workflow management systems (WfMSs) [2,3]. Workflows are processes involving the coordinated execution of single atomic activities (named tasks), assigned to and executed by processing entities (named agents) to achieve a common goal. Clinical processes can be modeled by workflow schemata and enacted by suitable software systems, called WfMSs. Workflow cases, instances of the same workflow schema, can differ with respect to their structure, i.e., the activities composing the cases, and to the (temporal) order and length of those activities. When a workflow schema represents a clinical process, its cases represent different instances deriving from dealing with different patients in different situations. The best (worst) application of a clinical process can be represented by means of workflow cases. These cases can then be used to evaluate the quality of the service by comparing the similarity between several (possibly synthetic) cases, perceived as the good ones with respect to different clinical situations, and the other clinical cases. Moreover, a given case, representing something of interest, can be used to retrieve a particular class of cases similar to the given one. Thus, information retrieval can be performed by evaluating the similarity between workflow cases. The evaluation of temporal similarity seems to be an important issue in the clinical context, where cases differ slightly depending on the patient's situation.

In this work, we extend the proposal described in [6] and discuss, in a deeper and more formal way, an approach to evaluating temporal similarity between workflow cases. More precisely, in this paper we address some important methodological issues. These issues need to be suitably solved before designing and implementing clinical systems for managing and retrieving clinical process data. In particular, we focus on the following methodological aspects:

• use of a temporal conceptual workflow model to represent clinical processes;
• definition of ad hoc temporal constraint networks to formally represent clinical workflow cases;
• definition of temporal similarity for clinical workflow cases based on the comparison of temporal constraint networks;
• use, as a proof of concept, of real-world clinical processes derived from modeling some fragments of the Italian guideline for stroke prevention and management (SPREAD) [7].

The structure of the paper is as follows: Section 2 provides some background on the modeling of clinical processes by workflow systems and on the concept of temporal similarity. Section 3 presents a portion of the considered guideline and its conceptual schema obtained through a workflow temporal conceptual model. Section 4 defines the main concepts we propose to evaluate similarity between workflow cases and provides an overall example of similarity evaluation for clinical cases. Finally, Section 5 sketches some concluding remarks and future research directions.

2. Background

Any health or clinical process requires the coordinated execution of single activities to achieve a common goal. As an example, we may consider a general process related to the intensive care of patients, where diagnosis-related and therapeutic tasks have to be coordinated, possibly involving several different people such as physicians, nurses, and technicians.

Clinical guidelines may be considered as an informal, natural language specification of a clinical process for a given category of patients. Guidelines are acknowledged by the medical community as the recommendation for dealing in a sound way with the specified patients. For these reasons, issues related to the formal representation of guidelines have been considered by several research teams [1,3–5].

In general, there are some similarities between guidelines and organizational processes. On the one hand, in the clinical context, guidelines describe a sequence of activities to be done. On the other hand, in the business context, an organizational process can be defined as a description of tasks and consists of subprocesses, decisions and activities. In both cases, a sequence of activities must be done to reach a (given) goal: in the former case, to manage the patient's situation correctly; in the latter case, to satisfy the business needs. This means that guidelines can be seen as processes and can be managed by means of business modeling tools such as workflow management systems [3].

In general, WfMSs are software systems able to support the specification and the coordination of organizational processes [8]. A workflow formally describes process activities, including criteria to assign every single activity to an executing unit. We call process model or schema the structure of the workflow, which defines how the single atomic activities are coordinated and enacted in sequence over time. The workflow designer specifies the process model, along with criteria to assign tasks to the executing units, named agents. Given a process model, several instances or cases can be run, each one owning its specific data [8]. In general, WfMSs specify, control, and coordinate the flow of work cases (sequences of activities which form an organizational process). WfMSs support the execution of workflow instances and require the management of huge quantities of data. To the best of our knowledge, no WfMS does without a database management system (DBMS), and all the adopted DBMSs are based on the relational model.

Recently, some research efforts have focused on using WfMSs for managing clinical or health processes [2,3,9]. In the clinical workflow context, time is an important aspect to consider. For instance, activities described in a guideline must be executed according to given coordination rules, which also involve some temporal constraints. Thus, a workflow schema may contain qualitative and quantitative temporal constraints.

In clinical workflows, and in most models for clinical scenarios, it is fundamental to find the best way of representing time, processing temporal data, and comparing temporal information by similarity techniques. Similarity measures, taken from analogical reasoning, can be used to quantify how similar two elements are. These measures are traditionally defined by mathematical functions satisfying the properties of reflexivity and symmetry.
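As a minimal illustration of these two properties, here is our own sketch, not a measure proposed in this paper. A similarity can be derived from any distance d as s = 1/(1 + d). Reflexivity (s(x, x) = 1) and symmetry (s(x, y) = s(y, x)) then follow from the corresponding properties of d. The function names are hypothetical, and the Euclidean distance is just one possible choice.

```python
import math

def euclidean(x: list[float], y: list[float]) -> float:
    """Euclidean distance between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def similarity(x: list[float], y: list[float]) -> float:
    """Turn a distance in [0, inf) into a similarity in (0, 1]."""
    return 1.0 / (1.0 + euclidean(x, y))

a, b = [1.0, 2.0, 3.0], [2.0, 2.0, 5.0]
assert similarity(a, a) == 1.0               # reflexivity: s(x, x) = 1
assert similarity(a, b) == similarity(b, a)  # symmetry: s(x, y) = s(y, x)
print(round(similarity(a, b), 3))            # → 0.309
```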

In general, there are two kinds of medical temporal data: time series (biosignals) and temporal sequences (time-stamped clinical data). Proposals of similarity measures for time series usually work on the raw time series data (e.g., ECG or EEG directly obtained from monitoring) and aim to derive the most representative features from a large amount of data [10]. Some of the most successful strategies are based on dimensionality reduction (Discrete Fourier Transform, Discrete Wavelet, Time Warping Transformations), in order to obtain a feature vector or model parameters [10,11].
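The following is a minimal sketch of DFT-based dimensionality reduction, written in the spirit of the strategies cited above but not taken from [10,11]. Each series is mapped to its first k orthonormal DFT coefficients. By Parseval's theorem, the distance between the truncated feature vectors never exceeds the Euclidean distance between the raw series, so it can act as a cheap lower bound when filtering candidates. The signal values and the choice k = 2 are illustrative assumptions.

```python
import cmath
import math

def dft_features(series: list[float], k: int) -> list[complex]:
    """First k coefficients of the orthonormal DFT (scaled by 1/sqrt(n))."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * math.pi * f * t / n) for t, x in enumerate(series))
        / math.sqrt(n)
        for f in range(k)
    ]

def raw_distance(s1: list[float], s2: list[float]) -> float:
    """Euclidean distance between the raw series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))

def feature_distance(s1: list[float], s2: list[float], k: int) -> float:
    """Euclidean distance between the truncated DFT feature vectors."""
    f1, f2 = dft_features(s1, k), dft_features(s2, k)
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(f1, f2)))

# Two hypothetical, very short signal snippets.
ecg_a = [0.0, 0.2, 1.0, 0.1, -0.3, 0.0, 0.1, 0.0]
ecg_b = [0.0, 0.3, 0.9, 0.2, -0.2, 0.0, 0.0, 0.1]
# Keeping k = 2 of 8 coefficients never overestimates the raw distance.
print(feature_distance(ecg_a, ecg_b, 2) <= raw_distance(ecg_a, ecg_b))  # → True
```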

Temporal sequences are collections of occurrences of different event types, as, for example, the set of test results of a patient during a week in the Intensive Care Unit. Occurrences are usually associated with single time points. The work described in [12,13] defines the distance between two sequences using the Euclidean distance, by viewing a sequence as a point in a suitable multidimensional space. In [14], the similarity evaluates the relative position of an event occurrence within a window context. That is, event occurrences are similar if they occur in a similar context, and contexts are defined as the set of events happening within a predefined time window.
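The window-context idea can be made concrete as follows. This is a sketch under our own assumptions, not necessarily the formulation of [14]. The context of an occurrence is the set of event types within a symmetric time window around it. Two contexts are then compared with the Jaccard index, which is our choice. The function names and the ICU data are hypothetical.

```python
Event = tuple[str, float]  # (event type, time point)

def context(seq: list[Event], idx: int, window: float) -> set[str]:
    """Event types occurring within `window` time units of seq[idx], excluding seq[idx] itself."""
    _, t0 = seq[idx]
    return {etype for i, (etype, t) in enumerate(seq)
            if i != idx and abs(t - t0) <= window}

def context_similarity(c1: set[str], c2: set[str]) -> float:
    """Jaccard overlap between two contexts (1.0 if both are empty)."""
    union = c1 | c2
    return len(c1 & c2) / len(union) if union else 1.0

# Hypothetical ICU test results (hours since admission).
seq_a = [("glucose", 0.0), ("insulin", 1.0), ("potassium", 1.5), ("glucose", 10.0)]
seq_b = [("insulin", 2.0), ("glucose", 2.5), ("lactate", 3.0)]
ctx_a = context(seq_a, 0, window=2.0)  # {"insulin", "potassium"}
ctx_b = context(seq_b, 1, window=2.0)  # {"insulin", "lactate"}
print(context_similarity(ctx_a, ctx_b))  # → 0.3333333333333333
```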

Furthermore, it is common to find sequences composed of facts holding on intervals, such as the description of protocols, treatments, patient symptoms, or parameters abstracted from biosignals (e.g., ST-segment elevation of an ECG). This kind of temporal data is called an interval sequence and is of increasing interest in many research fields, such as temporal clinical abstraction [15,16] and temporal data mining [17,18]. Among the few proposals dealing with similarity for interval sequences, in [19] the authors discuss different solutions for defining similarity between two temporal sequences, according to the distance between their composing intervals.
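One simple way to measure "the distance between their composing intervals" is sketched below. It is an illustrative assumption, not the specific solutions discussed in [19]. The i-th intervals of the two sequences are paired, and the absolute differences of their begin and end points are summed. Sequences of equal length are assumed, and the treatment data are hypothetical.

```python
Interval = tuple[float, float]  # (begin, end)

def interval_distance(a: Interval, b: Interval) -> float:
    """Sum of the absolute differences of the two begins and the two ends."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def sequence_distance(s1: list[Interval], s2: list[Interval]) -> float:
    """Pair the i-th interval of each sequence and add up their distances."""
    if len(s1) != len(s2):
        raise ValueError("this sketch assumes sequences of equal length")
    return sum(interval_distance(a, b) for a, b in zip(s1, s2))

# Two hypothetical treatment sequences (days).
treatment_1 = [(0.0, 4.0), (5.0, 9.0)]
treatment_2 = [(1.0, 4.0), (6.0, 11.0)]
print(sequence_distance(treatment_1, treatment_2))  # → 4.0
```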

In [20], the authors consider the issue of recognizing similar clinical scenarios, composed of both events (point-based) and facts (interval-based). As the scenarios are represented through temporal constraint networks, similarity is reduced to the fusion of both networks. Temporal constraint networks are a powerful approach for representing and querying temporal information. They are represented as a constraint satisfaction problem (CSP) [21], where variables denote event types and constraints represent the temporal relations amongst them. The interval algebra (IA), introduced by James Allen to represent and manage interval relations [22], is one of the most considered, used and suitably modified/extended models. For instance, in [23] a general framework for qualitative-quantitative interval constraints is described.
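To make "temporal constraint network as a CSP" concrete, the sketch below uses the simplest quantitative case: the simple temporal problem (STP). In an STP, variables are time points and each constraint bounds a difference x_j - x_i. This is a swapped-in illustration, not the interval algebra and not the framework of [23]. An STP is consistent if and only if its distance graph has no negative cycle, which Floyd-Warshall detects. The numbers reuse the T4/T5 duration and delay values given later in Section 3; the function and variable names are hypothetical.

```python
import math

# A constraint (i, j, lo, hi) states lo <= x_j - x_i <= hi for time points x_i, x_j.
Constraint = tuple[int, int, float, float]

def stp_consistent(n: int, constraints: list[Constraint]) -> bool:
    """Floyd-Warshall on the distance graph; consistent iff there is no negative cycle."""
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j, lo, hi in constraints:
        d[i][j] = min(d[i][j], hi)   # edge i -> j: x_j - x_i <= hi
        d[j][i] = min(d[j][i], -lo)  # edge j -> i: x_i - x_j <= -lo
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return all(d[i][i] >= 0 for i in range(n))

# Time points: 0 = end of T4, 1 = begin of T5, 2 = end of T5 (minutes).
base = [(0, 1, 5, 7),   # delay T4 -> T5 in [5, 7]
        (1, 2, 3, 12)]  # duration of T5 in [3, 12]
print(stp_consistent(3, base))                   # → True
print(stp_consistent(3, base + [(0, 2, 0, 6)]))  # → False: T5 cannot end within 6 min of T4's end
```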

As for the stroke management domain, in [24] the authors analyse the discovery of time dependency patterns in clinical pathways related to the care of patients who have had a stroke. They propose a temporal similarity measure between pathways that considers the common edges between the time dependency networks.
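A similarity that "considers the common edges" can be read, for instance, as a Jaccard ratio over the two edge sets. The exact measure in [24] may differ, so treat this as an assumption. The pathway edges below are hypothetical.

```python
Edge = tuple[str, str]  # (from activity, to activity)

def common_edge_similarity(g1: set[Edge], g2: set[Edge]) -> float:
    """Share of edges the two networks have in common (Jaccard index)."""
    union = g1 | g2
    return len(g1 & g2) / len(union) if union else 1.0

# Hypothetical time dependency networks of two stroke pathways.
pathway_1 = {("admission", "CT"), ("CT", "thrombolysis"), ("CT", "ECG")}
pathway_2 = {("admission", "CT"), ("CT", "ECG"), ("ECG", "rehabilitation")}
print(common_edge_similarity(pathway_1, pathway_2))  # → 0.5
```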

3. Representing clinical temporal workflows

In this section, we consider a possible approach for conceptually modeling clinical workflows. We represent clinical workflows by using a temporally extended workflow model, which allows one to simply show the required clinical tasks (i.e., activities), the execution flow of tasks, and the temporal constraints on them [2].

Throughout the paper, we will deal with the representation and the management of workflows related to some fragments of the Italian guideline for stroke prevention and management (SPREAD). The fragments we use to explain our approach in practice are the following:


Synthesis 9.1: A stroke victim should rapidly be assessed after hospitalization (T1), by means of a general examination (T2) and [. . .]

Recommendation 9.1 and 9.2: an early and standardized neurological evaluation (T3) is recommended in the setting of a qualitatively adequate management of acute stroke (COND-1).

Recommendation 9.4: [. . .] the following blood exams (T4) are recommended: complete blood count including platelets, [. . .], and coagulation tests [. . .]

Recommendation 9.6: The electrocardiogram (T5) is recommended in all suspected stroke victims who are admitted to an Emergency Room.

Recommendation 9.7: A non-contrast CT scan (T7) is recommended as soon as possible in the emergency care, in any case not later than 24 h from stroke onset.

Recommendation 9.8: The use of adequate technical parameters and positioning criteria (T6) is recommended for CT scan assessment of acute stroke (COND-2).

Recommendation 9.9: Digital subtraction angiography (T8) is recommended in the acute stroke only if pre-procedural to an intra-arterial thrombolytic approach (COND-3). Otherwise, the study of arterial occlusion may be obtained by means of MRA (magnetic resonance angiography: T10) or CTA (computed tomography angiography: T9).

Recommendation 9.10: The repetition of non-contrast CT scan (T11) is suggested within 48 h and anyhow not later than 7 days from stroke onset. It is particularly recommended when stroke is severe (COND-5) [. . .].

Fig. 1 depicts the considered portion of the guideline by means of a workflow schema. The schema is a directed graph, called workflow graph, where nodes correspond to activities and edges represent control flows that define the execution order of activities. There are two different activity types: task and connector.

Tasks are the elementary work units that collectively achieve the process goal. Each task has one incoming edge and one outgoing edge and is graphically represented as a box. For example, task T1 in Fig. 1 models the hospital admission activity.

A connector can be initial (startcase), final (endcase), or intermediate. Each workflow graph must have exactly one startcase and at least one endcase. When any endcase is reached, the execution of the process is completed. More precisely, when the earliest endcase is performed, the execution is considered successful and all the other tasks that are still active are cancelled. Graphically, startcase and endcase are represented by two parallel lines.

Figure 1 The workflow schema of the considered guideline portion. Dashed arrows represent a constraint on the time distance between two non-subsequent tasks, where $[B/E, x, B/E]_{granularity}$ quantifies the maximum duration (x) between the begin (B) or end (E) of the tasks.

Intermediate connectors are used to model different execution paths. Paths that are (possibly) executed concurrently are split by the Total connector and then joined by the And connector. Similarly, paths that are executed alternatively are split by the Cond connector and then joined by the Or connector. Conditional connectors are graphically represented as diamonds.
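The structural rules just described can be encoded directly. This is a minimal sketch, not the data model of [2]. A workflow graph stores typed nodes (task, startcase, endcase, and the Total/And/Cond/Or connectors, written in lowercase) and control-flow edges. It checks for exactly one startcase, at least one endcase, and exactly one incoming and one outgoing edge per task. The class name, the node-kind strings and the task labels Ta, Tb, Tc are hypothetical.

```python
from dataclasses import dataclass, field

# Node kinds named after the constructs in the text (connector names lower-cased).
NODE_KINDS = {"task", "startcase", "endcase", "total", "and", "cond", "or"}

@dataclass
class WorkflowGraph:
    kinds: dict[str, str] = field(default_factory=dict)       # node name -> node kind
    edges: set[tuple[str, str]] = field(default_factory=set)  # control-flow edges

    def add_node(self, name: str, kind: str) -> None:
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.kinds[name] = kind

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.add((src, dst))

    def _count(self, kind: str) -> int:
        return sum(1 for k in self.kinds.values() if k == kind)

    def violations(self) -> list[str]:
        """Report breaches of the structural rules stated in the text."""
        problems = []
        if self._count("startcase") != 1:
            problems.append("exactly one startcase is required")
        if self._count("endcase") < 1:
            problems.append("at least one endcase is required")
        for node, kind in self.kinds.items():
            if kind == "task":
                indeg = sum(1 for _, dst in self.edges if dst == node)
                outdeg = sum(1 for src, _ in self.edges if src == node)
                if indeg != 1 or outdeg != 1:
                    problems.append(f"task {node} needs one incoming and one outgoing edge")
        return problems

# A hypothetical toy fragment (not Fig. 1): Tb and Tc run concurrently after Ta.
g = WorkflowGraph()
for name, kind in [("start", "startcase"), ("Ta", "task"), ("split", "total"),
                   ("Tb", "task"), ("Tc", "task"), ("join", "and"), ("end", "endcase")]:
    g.add_node(name, kind)
for src, dst in [("start", "Ta"), ("Ta", "split"), ("split", "Tb"), ("split", "Tc"),
                 ("Tb", "join"), ("Tc", "join"), ("join", "end")]:
    g.add_edge(src, dst)
print(g.violations())  # → []
```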

It is worth noting that data required and/or produced during the performance of different tasks may be related to several different databases. For example, data acquired during tasks T3 and T5, referring to the neurological assessment and to the ECG, respectively, are part of the electronic medical record of the considered patient. In contrast, data derived from task T6, referring to the tuning of parameters for the CT scan, could be either purely operational data, not stored permanently, or technical data stored in some radiology document.

The modeling of guideline activities is enriched by additional temporal information, such as allowed durations and delays or temporal constraints. The duration is a temporal property of both nodes and edges and is always expressed by using intervals. A node duration represents the temporal span of the corresponding activity and is the temporal distance between the beginning and ending instants of the activity itself. For instance, the allowed duration of task T5 is between 3 and 12 min. The edge duration, called delay, denotes the interval between the ending instant of the predecessor and the beginning instant of the successor. For instance, the allowed delay between the end of T4 and the beginning of T5 is between 5 and 7 min.

In some cases, only the maximum or the minimum duration may be defined; consequently, intervals may be only partially defined. In our model, duration and delay bounds are integer values if the interval is completely specified; otherwise, we use the symbol U to represent undefined values. For instance, the maximum delay between the end of T3 and the beginning of T4 is unbounded.
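Node durations and edge delays can be represented as integer ranges, with a missing bound playing the role of U. This is a minimal sketch under our own conventions: `None` stands for U, times are in minutes, and the lower bound 0 for the T3 to T4 delay is our assumption, since only its unbounded maximum is stated. The class and constant names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Bounds:
    """An allowed range [lo, hi] in minutes; None plays the role of the undefined value U."""
    lo: Optional[int]
    hi: Optional[int]

    def admits(self, value: float) -> bool:
        return ((self.lo is None or value >= self.lo)
                and (self.hi is None or value <= self.hi))

T5_DURATION = Bounds(3, 12)    # node duration of T5
T4_T5_DELAY = Bounds(5, 7)     # edge delay: end of T4 -> begin of T5
T3_T4_DELAY = Bounds(0, None)  # upper bound U; the lower bound 0 is our assumption

# Hypothetical execution (minutes from the case origin).
t4_end, t5_begin, t5_end = 20, 26, 35
print(T4_T5_DELAY.admits(t5_begin - t4_end),  # delay of 6 min
      T5_DURATION.admits(t5_end - t5_begin),  # duration of 9 min
      T3_T4_DELAY.admits(600))                # any non-negative delay is allowed
# → True True True
```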

Another important temporal feature is the capability of constraining the time distance (duration) between the beginning/ending instants of two non-subsequent tasks. It is expressed as a dashed arrow connecting the two non-subsequent tasks and a label $[IF, TimeDistance, IL]_{Granule}$. This label denotes that TimeDistance must be the maximum span of time between the beginning or ending execution instant (IF) of the first task and the beginning or ending execution instant (IL) of the last task. Fig. 1 provides an example of a relative constraint, which fixes to 48 h the maximum time distance between the beginning of task T7 and the end of T11, as required by the considered fragment of the guideline.

Figure 2 Examples of workflow cases of five different patient scenarios.

Moreover, this constraint can be used to link the task execution to external events, such as the Stroke Event. In our model, an event is represented as a rounded clock and indicates a time instant that could bias the execution of activities. For instance, task T7 must be completed not later than 24 h from the stroke event. Further constructs of the workflow conceptual model are discussed in [2]. It is worth noting that expressing a clinical guideline through a workflow schema raises several issues related to the interpretation of natural language sentences, requiring a sound background of medical knowledge. Another important issue is to verify the temporal consistency of a clinical guideline: by temporal consistency we mean that the guideline can be executed satisfying all the given temporal constraints. Checking the temporal consistency of a clinical guideline described through a workflow schema involves the study of some computational issues, which are out of the scope of this paper. In the following, we assume that these aspects have been thoroughly considered and that the resulting workflow schema is sound and describes the considered guideline properly.

According to the specified schema, there are several possible workflow cases, which correspond to the clinical paths of different patients. During the execution of workflow cases, a temporal WfMS is able to detect violations of the specified constraints and then either perform some compensation tasks or stop the execution of the current case [25].

A workflow case is defined as follows:

Definition 1. Given the set $C$ of cases corresponding to a given workflow schema, a workflow case $CASE \in C$ is an ordered set of labeled task intervals. A labeled task interval is a pair $(taskName, t)$, where $taskName$ is a task label (e.g., T1, T2, T3) and $t$ is the task interval. The task interval $t$ can be expressed as $(t^-, t^+)$, where $t^-$ and $t^+$ are timestamps describing the beginning and ending times of the task interval, respectively.

$$CASE = \langle taskInterval_1, taskInterval_2, \ldots, taskInterval_n \rangle = \langle (taskName_1, (t_1^-, t_1^+)), (taskName_2, (t_2^-, t_2^+)), \ldots, (taskName_n, (t_n^-, t_n^+)) \rangle$$

such that $\forall (taskName_i, t_i) \in CASE$, $t_i^- \le t_{i+1}^-$, $i = 1, \ldots, n-1$.
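Definition 1 maps naturally onto a list of labeled task intervals. The sketch below checks the ordering condition $t_i^- \le t_{i+1}^-$ and, as an illustration of the violation detection mentioned above, evaluates the two constraints from Fig. 1. The first is the relative 48 h constraint between the beginning of T7 and the end of T11. The second is the 24 h deadline for completing T7 after the stroke event. Several points are our assumptions. We additionally require begin ≤ end. Constraints involving tasks absent from a case are treated as vacuously satisfied. The case data and the stroke onset time are hypothetical.

```python
# A labeled task interval (taskName, (begin, end)); times in hours from the case origin.
TaskInterval = tuple[str, tuple[float, float]]

def is_well_formed(case: list[TaskInterval]) -> bool:
    """Definition 1 ordering (t_i^- <= t_{i+1}^-), plus begin <= end (our assumption)."""
    ordered = all(case[i][1][0] <= case[i + 1][1][0] for i in range(len(case) - 1))
    proper = all(begin <= end for _, (begin, end) in case)
    return ordered and proper

def instant(case: list[TaskInterval], task: str, which: str) -> float:
    """Begin ('B') or end ('E') instant of the first occurrence of `task`."""
    for name, (begin, end) in case:
        if name == task:
            return begin if which == "B" else end
    raise KeyError(task)

def satisfies_relative(case: list[TaskInterval], first: str, i_f: str,
                       max_distance: float, last: str, i_l: str) -> bool:
    """[IF, TimeDistance, IL]: IL of `last` is at most max_distance after IF of `first`.
    Constraints on tasks not executed in the case are treated as satisfied (our assumption)."""
    names = {name for name, _ in case}
    if first not in names or last not in names:
        return True
    return instant(case, last, i_l) - instant(case, first, i_f) <= max_distance

# Hypothetical case; the stroke onset happened 2 h before the case origin.
case = [("T1", (0.0, 0.5)), ("T2", (0.25, 1.0)), ("T3", (1.0, 1.5)),
        ("T7", (3.0, 3.5)), ("T11", (40.0, 40.5))]
STROKE_ONSET = -2.0
print(is_well_formed(case),
      satisfies_relative(case, "T7", "B", 48, "T11", "E"),  # 37.5 h <= 48 h
      instant(case, "T7", "E") - STROKE_ONSET <= 24)        # 5.5 h <= 24 h
# → True True True
```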

Fig. 2 depicts five different workflow cases for the workflow schema shown in Fig. 1. Each case is represented as a sequence of intervals labeled by the corresponding task name, on a timeline having the beginning of the case as origin.

In general, cases are only temporally constrained by the specification of the workflow schema. Thus, cases of the same schema could differ with respect to the order and duration of tasks, and with respect to the presence/absence of some tasks due to alternative paths. For instance, cases CASE_1 and CASE_3 in Fig. 2 differ in the order of tasks T1 and T2. CASE_3 differs from all the other cases in the absence of all tasks except T1 and T2, due to the alternative paths induced by the connector COND-1. CASE_4 differs from the other cases because it contains task T10.
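The two kinds of difference just described, presence/absence of tasks and the order of shared tasks, can be extracted mechanically from two cases. This is a sketch only, not the similarity function proposed in this paper. Order is compared on begin instants, and ties are not counted as swaps. The helper names and the case data, which loosely echo CASE_1 and CASE_3 but are not the actual Fig. 2 timings, are hypothetical.

```python
Case = list[tuple[str, tuple[float, float]]]  # (taskName, (begin, end)) as in Definition 1

def presence_diff(c1: Case, c2: Case) -> tuple[set[str], set[str]]:
    """Tasks only in c1 and tasks only in c2."""
    n1, n2 = {name for name, _ in c1}, {name for name, _ in c2}
    return n1 - n2, n2 - n1

def order_swaps(c1: Case, c2: Case) -> list[tuple[str, str]]:
    """Pairs of shared tasks whose begin instants are ordered differently in the two cases."""
    b1 = {name: begin for name, (begin, _) in c1}
    b2 = {name: begin for name, (begin, _) in c2}
    shared = sorted(b1.keys() & b2.keys())
    return [(x, y) for i, x in enumerate(shared) for y in shared[i + 1:]
            if (b1[x] - b1[y]) * (b2[x] - b2[y]) < 0]

case_1 = [("T1", (0.0, 1.0)), ("T2", (1.0, 2.0)), ("T3", (2.0, 3.0))]
case_3 = [("T2", (0.0, 1.0)), ("T1", (1.0, 2.0))]
print(presence_diff(case_1, case_3))  # → ({'T3'}, set())
print(order_swaps(case_1, case_3))    # → [('T1', 'T2')]
```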

According to this scenario, a hospital stroke unit will store a huge number of cases for the same guideline in a database. Querying and analyzing this database is, thus, extremely important for several clinical applications. For example, to evaluate the quality of the provided care, we could compare the similarity between a given case, considered as a good one, and the real clinical cases in the database. Moreover, a given case, representing something clinically interesting, may be used to retrieve a set of cases similar to the given one. In the next section, we propose a suitable definition of (temporal) similarity for comparing different cases.


Figure 3 Examples of BTR disjunction types. Graphically, in thleft to right, from the relationAbBwe reachAaB, by passing thronot the case for the convex disjunction (right), where it is pos

4. A temporal similarity measure forclinical cases

To evaluate temporal similarity between clinicalworkflow cases, we propose to (i) compare corre-sponding performed tasks; (ii) compare qualitative/quantitative temporal relations between corre-sponding tasks; and (iii) consider the presence/absence of some tasks.

Our proposal compares workflow cases by meansof an interval similarity function. The given casesare represented through constraint networks, andthe similarity is evaluated by considering the dis-tance between intervals representing correspondingtasks, and the distance for the relations betweencorresponding tasks.

4.1. Augmented basic interval network:ABIN

Temporal constraint networks (TCNs) are a temporalmodeling approach that provides an explicit repre-sentation of temporal relations for a given scenario.This is an advantage when two temporal scenariosmust be compared since, instead of comparingtemporal event data, we could directly identifythe overall differences between temporal relations.Moreover, these constraints can also contain differ-ent aspects of the temporal information (e.g., quan-titative/qualitative, crisp/fuzzy) enriching itsdescription and providing a flexible representationfor different purposes.

The interval algebra (IA) [22] deals with variables representing intervals and qualitative constraints between them. IA assumes 13 basic temporal relations (BTRs): before (b), meets (m), overlaps (o), starts (s), during (d), finishes (f), their inverses, and equal (e). However, from the point of view of our particular domain, some aspects of IA must be reviewed. On the one hand, IA is able to represent combinations of BTRs ($2^{13}$ possible disjunctions), which makes deciding consistency NP-complete, because non-convex disjunctions are allowed; intuitively, a disjunction of relations between two intervals is convex if the relations can be directly transformed into one another by continuously deforming the intervals (Fig. 3 depicts an example of non-convex and convex disjunctions). On the other hand, IA lacks quantitative information, which is useful to measure temporal similarity.

Figure 3 [...] non-convex disjunction (left), moving the interval A from [...] through several other relations, such as, for example, A m B. It is possible to move from A b B to A m B by simply moving A.

Figure 4 Example of expressivity of ABIN: sequence, language, and network. In the network on the right side, nodes represent intervals (with the range of allowed durations), while edges represent interval constraints (i.e., a qualitative relation followed by a quantitative constraint).
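Because the workflow cases considered in this paper record crisp begin and end times, the BTR holding between two task intervals can be read directly from their endpoints. The following is a minimal Python sketch of that classification, not the authors' implementation; it assumes proper intervals (begin strictly before end), and the function name `allen_relation` and the `i` suffix for inverse relations (bi, mi, oi, si, di, fi) are our own conventions.

```python
def allen_relation(a, b):
    """Return the basic temporal relation (BTR) holding from interval a to interval b.

    a and b are (begin, end) pairs of numbers with begin < end. Inverse
    relations are written with an 'i' suffix: bi, mi, oi, si, di, fi.
    """
    (a1, a2), (b1, b2) = a, b
    if not (a1 < a2 and b1 < b2):
        raise ValueError("intervals must satisfy begin < end")
    if a2 < b1:
        return "b"        # a ends before b begins
    if b2 < a1:
        return "bi"
    if a2 == b1:
        return "m"        # a ends exactly where b begins
    if b2 == a1:
        return "mi"
    if (a1, a2) == (b1, b2):
        return "e"
    if a1 == b1:
        return "s" if a2 < b2 else "si"
    if a2 == b2:
        return "f" if a1 > b1 else "fi"
    if b1 < a1 and a2 < b2:
        return "d"        # a lies strictly inside b
    if a1 < b1 and b2 < a2:
        return "di"
    return "o" if a1 < b1 else "oi"  # remaining case: proper overlap
```

For instance, `allen_relation((0, 3), (3, 5))` yields `"m"`, since the first interval ends exactly where the second one begins.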

Our approach is based on augmented basic interval networks. An augmented basic interval network (ABIN) allows one to represent explicitly the temporal information useful to measure temporal similarity between sequences of intervals. An ABIN is an interval constraint network that admits a restricted set of qualitative interval relations, namely BTRs and reducible disjunctions (those disjunctions which can be translated into conjunctions of relations between the bounds of the intervals [26]), together with quantitative constraints on intervals, i.e., constraints on interval durations and constraints on the distance between interval bounds.

We define an augmented basic interval network as follows:

Definition 2. Given a time domain isomorphic to the real numbers $\mathbb{R}$, an augmented basic interval network (ABIN) is a structure $S_{ABIN} = \langle T, R, D(T) \rangle$ formed by a set $T$ of intervals, a set $R$ of augmented interval constraints, and a set of unary metric constraints $D(T)$, where:

• $T = \{t_1, \ldots, t_n\}$ is the set of intervals. Each $t_j \in T$ is represented as $t_j = (t_j^-, t_j^+)$, where $t_j^-, t_j^+ \in \mathbb{R}$ characterize the begin and the end of the interval, respectively.
• $R = \{r_{i,j} \mid t_i, t_j \in T\}$ is the set of augmented interval constraints, with $r_{i,j} = (b_{i,j}, m_{i,j})$, where:
  ◦ $b_{i,j} \in R_b$ is either a BTR or a reducible disjunction of BTRs between $t_i, t_j \in T$;
  ◦ $m_{i,j} \in M$, with $M = \{[x, y] \mid x, y \in \mathbb{R}\}$ the set of quantitative constraints, is the quantitative constraint between $t_i^+$ and $t_j^-$.
• $D(T) = \{d_j \mid t_j \in T\}$, where $d_j = [x, y]$ with $x, y \in \mathbb{R}$; $d_j$ represents the range of the allowed durations for interval $t_j$, i.e., of $t_j^+ - t_j^-$.
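To make Definition 2 concrete, the sketch below stores an ABIN as Python dictionaries: $D(T)$ maps each interval name to a duration range, and $R$ maps an ordered pair of interval names to a set of BTR codes (standing for $b_{i,j}$) plus a delay range (standing for $m_{i,j}$). The class and method names are hypothetical, and the check covers only the quantitative constraints. Following the proof of Theorem 1, we assume $m_{i,j}$ bounds the absolute distance between $t_i^+$ and $t_j^-$.

```python
from dataclasses import dataclass, field

Range = tuple[float, float]  # closed range [x, y]


@dataclass
class ABIN:
    """Illustrative container for S_ABIN = <T, R, D(T)> (Definition 2)."""
    # D(T): interval name -> allowed duration range d_j
    durations: dict[str, Range] = field(default_factory=dict)
    # R: (t_i, t_j) -> (b_ij as a set of BTR codes, m_ij as a delay range)
    relations: dict[tuple[str, str], tuple[frozenset[str], Range]] = field(default_factory=dict)

    def add_interval(self, name: str, duration: Range) -> None:
        self.durations[name] = duration

    def add_relation(self, i: str, j: str, btrs: set[str], delay: Range) -> None:
        if i not in self.durations or j not in self.durations:
            raise KeyError("declare both intervals before relating them")
        self.relations[(i, j)] = (frozenset(btrs), delay)

    def metric_ok(self, times: dict[str, Range]) -> bool:
        """True if crisp (begin, end) times satisfy every duration and delay range.

        The qualitative part b_ij is deliberately not checked here.
        """
        for name, (lo, hi) in self.durations.items():
            begin, end = times[name]
            if not lo <= end - begin <= hi:
                return False
        for (i, j), (_btrs, (lo, hi)) in self.relations.items():
            # ASSUMPTION: m_ij bounds |begin(t_j) - end(t_i)|
            if not lo <= abs(times[j][0] - times[i][1]) <= hi:
                return False
        return True
```

For example, a network with A lasting 2, B lasting 3, and A before B with delay [1, 1] accepts the crisp assignment A = (0, 2), B = (3, 6) and rejects B = (4, 7), whose delay is 2.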


This formalism provides a flexible language to express qualitative and quantitative temporal information on intervals. For instance, Fig. 4 (left) depicts an example with three potential scenarios of intervals A, B, and C and, on the right, shows how they can be expressed using a single ABIN network. In the example, every depicted range has the same value for its upper and lower bounds, since there is no imprecision in the temporal scenario. Note that an ABIN describes the relevant information of the workflow case while discarding the absolute references of the temporal sequence (i.e., the time-stamped begin and end of each interval).

Our approach requires building a temporal constraint network for each clinical workflow case. To this end, we propose three essential steps to obtain an ABIN network from a given workflow case $CASE_i \in C$:
1. Obtain the intervals of the ABIN network from the intervals of the clinical workflow case.
2. Obtain the unary constraints (interval durations).
3. Obtain the remaining relations of the ABIN network from the temporal data of the workflow case.

Let us suppose that we want to describe an ABIN network $S_{ABIN} = \langle T, R, D(T) \rangle$ from the following workflow case:

$$CASE_i = \langle taskInterval_{i_1}, taskInterval_{i_2}, \ldots, taskInterval_{i_n} \rangle = \langle (taskName_{i_1}, (t^-_{i_1}, t^+_{i_1})), (taskName_{i_2}, (t^-_{i_2}, t^+_{i_2})), \ldots, (taskName_{i_n}, (t^-_{i_n}, t^+_{i_n})) \rangle$$

In the first step, we can state, by definition, a direct match between each task interval of the workflow case ($taskInterval_{i_j} \in CASE_i$) and a corresponding interval of the ABIN network ($t \in T$). Formally:

$$\forall\, taskInterval_{i_j} \in CASE_i,\ \exists\, t_j \in T \text{ with } d_j \in D(T),\quad d_j = \left[\,|t^-_{i_j} - t^+_{i_j}|,\ |t^-_{i_j} - t^+_{i_j}|\,\right]$$
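Steps 1 and 2 amount to a single pass over the case. A minimal sketch, assuming a case is given as a Python list of `(taskName, (begin, end))` pairs in case order; since the case is crisp, each duration range is degenerate. The helper name `intervals_and_durations` is hypothetical, and the duplicate-name check reflects the fact that task names univocally identify tasks within a case.

```python
def intervals_and_durations(case):
    """Steps 1 and 2: one ABIN node per task, with a degenerate duration range.

    case: list of (taskName, (begin, end)) pairs in case order.
    Returns {taskName: (d, d)} where d = |begin - end|.
    """
    nodes = {}
    for name, (begin, end) in case:
        if name in nodes:
            raise ValueError(f"task name {name!r} occurs twice in the case")
        d = abs(begin - end)
        nodes[name] = (d, d)
    return nodes
```

For a hypothetical case `[("T1", (0, 5)), ("T2", (6, 10))]`, the sketch returns `{"T1": (5, 5), "T2": (4, 4)}`.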


Then, we must state which relations ($r \in R$) belong to the ABIN network. At this point, an essential decision must be taken concerning the number of relations to be obtained. That is, we must choose between two approaches: (i) obtaining the complete network (by considering the relations between all pairs of intervals) and (ii) obtaining a partial network. The main advantage of the first option is that all relations are obtained directly from the temporal data of the workflow case, with an algorithm whose complexity, considering $|CASE_i| = n$, is $O(n^2)$. Furthermore, the inference of new relations is not needed. The main disadvantage of computing a complete network is the increased redundancy of temporal information compared to the information provided by the sequence of task intervals. For instance, given the intervals $t_k, t_j \in T$, one of the relations $r_{k,j}$ and $r_{j,k}$ is redundant. Although this particular type of redundancy can be avoided, the complexity order of the algorithms remains the same. In our application domain, when the similarity between workflow cases must be measured, we consider it more convenient to use only partial networks, avoiding temporal information redundancy. Because workflow cases have, by definition, a total task order (see Definition 1), the simplest mechanism to obtain partial networks is to obtain only the relations between consecutive tasks. This derivation has linear complexity with respect to the number of tasks of the workflow case.

Figure 5 The temporal networks obtained from the workflow cases of Fig. 2. Continuous edges represent relations between consecutive tasks of the case, while dashed edges are derived to compare the networks. The label m[0, 0] for edges is shortened as m.

The obtained network is composed of nodes and edges: nodes represent task intervals, while edges stand for qualitative relations between two task intervals, enriched by some quantitative information. Nodes are labeled by the corresponding task name and by the (upper and lower bounds of the) interval durations, while edges are labeled by a single Allen interval relation enriched with some quantitative data. Each task interval has a corresponding node in the network, while edges are introduced only for the relations between each task $taskInterval_i$ and its successor $taskInterval_{i+1}$.

For example, Fig. 5 depicts the networks corresponding to the cases reported in Fig. 2: in the ABIN corresponding to CASE_1, tasks T1 and T2 are represented by labeled nodes, and the relation before between them is represented as a directed edge labeled by b and by the interval [1, 1], describing the (minimum and maximum) delay between the end of T1 and the beginning of T2. It is worth noting that, as the network corresponding to a case has no uncertainty for task durations and delays, range values are redundant for them, and edges are labeled only by a single Allen relation. However, this expressiveness is useful in other scenarios, not considered in this paper, such as representing imprecise clinical records or describing general clinical pathways. For instance, the history of a patient could record that the electrocardiogram test (T5) lasted between 4 and 7 min. Moreover, uncertainty could arise from missing data, when some task features are not completely stored by the clinical information system: in this case, flexible and context-related techniques should be designed and implemented to estimate a range of possible values for the missing data. These aspects of the model and their applications will be considered as part of our future work.
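The construction of a partial network (step 3, edges only between consecutive tasks) can be sketched as a single linear scan. This is an illustrative sketch, not the authors' code: it assumes the case is an ordered Python list of `(taskName, (begin, end))` pairs with proper intervals, writes inverse relations with an `i` suffix, and takes the delay as the absolute distance between the end of a task and the begin of its successor. All names are hypothetical.

```python
def allen_relation(a, b):
    """BTR from crisp interval a to crisp interval b (begin < end); 'i' marks inverses."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "b"
    if b2 < a1:
        return "bi"
    if a2 == b1:
        return "m"
    if b2 == a1:
        return "mi"
    if (a1, a2) == (b1, b2):
        return "e"
    if a1 == b1:
        return "s" if a2 < b2 else "si"
    if a2 == b2:
        return "f" if a1 > b1 else "fi"
    if b1 < a1 and a2 < b2:
        return "d"
    if a1 < b1 and b2 < a2:
        return "di"
    return "o" if a1 < b1 else "oi"


def consecutive_edges(case):
    """Step 3 (partial network): label each (task, next task) pair with its BTR
    and the delay |begin(next) - end(task)|. Runs in O(n) for n tasks."""
    edges = {}
    for (name1, t1), (name2, t2) in zip(case, case[1:]):
        edges[(name1, name2)] = (allen_relation(t1, t2), abs(t2[0] - t1[1]))
    return edges
```

For the hypothetical case `[("T1", (0, 5)), ("T2", (6, 10)), ("T3", (10, 12))]`, the sketch yields a before edge with delay 1 from T1 to T2 and a meets edge with delay 0 from T2 to T3, in the spirit of the b[1, 1] and m labels of Fig. 5.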

One important facet of the use of a TCN is the tractability of the defined problem. The ABIN model is based on an enrichment of Allen's interval algebra. Although deciding the consistency of interval constraint networks is NP-complete [21] for the general problem (non-convex disjunctive relations allowed), there are tractable subclasses of the problem. In particular, as described in [21,27], there are three tractable subclasses:
• The pointisable subclass corresponds to an interval algebra allowing the use of interval relations that can be expressed as a disjunction of the point algebra relations ($\{<, =, >\}$) between interval bounds. The consistency problem associated with this class can be solved using path consistency algorithms in $O(n^4)$.
• The continuous endpoint subclass corresponds to an interval algebra allowing the use of interval relations that can be expressed as a disjunction of point relations excluding $\neq$ (i.e., the disjunction $\{<, >\}$). The consistency problem associated with this class can be solved using path consistency algorithms in $O(n^3)$.
• The ORD-Horn subclass corresponds to the maximal tractable fragment of IA, where a 5-consistency algorithm ($O(n^5)$) is required to obtain the minimal network.

In an ABIN, the model restricts the qualitative relations of the problem to BTRs and to the 181 reducible disjunctions proposed by Van Beek [26]. Thus, the problem belongs to the pointisable subclass and, therefore, it can be reduced to a point constraint network, which is a simple temporal problem (i.e., it can be solved in polynomial time).

Moreover, the quantitative aspects of ABIN imply the introduction of metric point constraints into the point constraint network. That is, the duration ($d_i \in D(T)$) expresses a metric constraint between the begin and end points of the interval, while the quantitative parameters of interval relations ($m_{i,j}$) are basic metric constraints between the end of the first interval and the begin of the second interval involved.
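Why BTRs fall into the pointisable subclass can be checked mechanically: each of the 13 relations fixes exactly one point relation between every pair of bounds. The table below is our own encoding (not taken from the paper) of the comparisons $t_i^- ? t_j^-$, $t_i^- ? t_j^+$, $t_i^+ ? t_j^-$, and $t_i^+ ? t_j^+$ for each BTR, with `i` marking inverse relations; the helper `signature` recomputes the same four comparisons from crisp intervals, which gives a quick way to validate the table. Reducible disjunctions are not covered by this sketch.

```python
# Point relations between the bounds of A = (a-, a+) and B = (b-, b+),
# in the order (a- ? b-, a- ? b+, a+ ? b-, a+ ? b+).
ENDPOINTS = {
    "b": ("<", "<", "<", "<"), "bi": (">", ">", ">", ">"),
    "m": ("<", "<", "=", "<"), "mi": (">", "=", ">", ">"),
    "o": ("<", "<", ">", "<"), "oi": (">", "<", ">", ">"),
    "s": ("=", "<", ">", "<"), "si": ("=", "<", ">", ">"),
    "d": (">", "<", ">", "<"), "di": ("<", "<", ">", ">"),
    "f": (">", "<", ">", "="), "fi": ("<", "<", ">", "="),
    "e": ("=", "<", ">", "="),
}


def compare(x, y):
    """Point-algebra relation between two time points."""
    return "<" if x < y else ">" if x > y else "="


def signature(a, b):
    """The four bound comparisons for crisp intervals a = (a-, a+), b = (b-, b+)."""
    (a1, a2), (b1, b2) = a, b
    return (compare(a1, b1), compare(a1, b2), compare(a2, b1), compare(a2, b2))


def point_constraints(btr):
    """Translate a single BTR into point-algebra constraints between named bounds."""
    pairs = [("A-", "B-"), ("A-", "B+"), ("A+", "B-"), ("A+", "B+")]
    return [(x, rel, y) for (x, y), rel in zip(pairs, ENDPOINTS[btr])]
```

For example, `point_constraints("m")` contains `("A+", "=", "B-")`: meets is exactly the equality of the end of A and the begin of B, plus the implied orderings of the other bounds.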


Theorem 1. An augmented basic interval network (ABIN) $S_{ABIN} = \langle T, R, D(T) \rangle$ can be reduced to a simple temporal problem.

Proof. A simple temporal problem (STP) is a problem that can be expressed by a metric constraint network, where nodes stand for time points and directed edges are labeled by a single interval of allowed distances between the connected nodes; the consistency of STPs can be checked in polynomial time [28]. In order to prove that an ABIN network can be reduced to an STP, the following statements must be considered:
1. In $S_{ABIN}$, the intervals of the network ($t \in T$) can be expressed by a pair of points describing the bounds of the interval ($t^-$ and $t^+$).
2. In $S_{ABIN}$, $d_j \in D(T)$ is, by definition, a range of allowed values for the distance between the bounds of interval $t_j$, $\forall t_j \in T$.
3. In $S_{ABIN}$, $\forall t_i, t_j \in T$, $r_{i,j} \in R$ can be decomposed into two constraints $m_{i,j}$ and $b_{i,j}$. By definition, $m_{i,j} = [x, y]$ is a range of allowed values for the (absolute value of the) distance between $t_i^+$ and $t_j^-$. The constraint $b_{i,j}$ can be translated into four metric constraints between the four nodes (time points) that describe the two intervals $t_i$ and $t_j$: these metric constraints are the quantitative version of the qualitative relations between the bounds of the intervals.

Therefore, once transformations 1 to 3 have been applied, $S_{ABIN}$ is reduced to an STP. □

Once the transformations are applied, all the STP algorithms proposed by Dechter and Meiri [28] can be used. Thus, the classical path consistency algorithms can be used to obtain the minimal network.
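As an illustration of the reduction, the sketch below solves an STP with the Floyd-Warshall all-pairs shortest-path algorithm, a standard way of computing the minimal network of an STP; it is a generic sketch, not code from [28]. Time points are numbered `0..n-1`, each constraint `(i, j, lo, hi)` means $lo \le x_j - x_i \le hi$, and a negative cycle signals inconsistency. Only metric constraints are encoded: strict point relations such as `<` would need an extra convention (e.g., a small positive lower bound), which we leave out.

```python
import math
from itertools import product


def stp_minimal_network(n, constraints):
    """Minimal network of an STP via Floyd-Warshall, or None if inconsistent.

    n: number of time points, numbered 0..n-1.
    constraints: iterable of (i, j, lo, hi), meaning lo <= x_j - x_i <= hi.
    Entry [i][j] of the result is the tightest upper bound on x_j - x_i.
    """
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j, lo, hi in constraints:
        dist[i][j] = min(dist[i][j], hi)    # x_j - x_i <= hi
        dist[j][i] = min(dist[j][i], -lo)   # x_i - x_j <= -lo
    for k, i, j in product(range(n), repeat=3):  # k varies slowest: outer loop
        if dist[i][k] + dist[k][j] < dist[i][j]:
            dist[i][j] = dist[i][k] + dist[k][j]
    if any(dist[i][i] < 0 for i in range(n)):
        return None  # negative cycle: the constraints cannot all hold
    return dist


# Hypothetical example. Points: 0 = A-, 1 = A+, 2 = B-, 3 = B+
constraints = [(0, 1, 2, 2),   # duration of A is 2
               (2, 3, 3, 3),   # duration of B is 3
               (1, 2, 1, 1)]   # A before B, delay 1 from A+ to B-
net = stp_minimal_network(4, constraints)
print(net[0][3])  # tightest upper bound on B+ - A-
```

In this example the minimal network forces $B^+ - A^- = 6$ exactly, and adding the constraint $B^+ - A^- \le 5$ makes the network inconsistent.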

4.2. Evaluating similarity measure

Our proposal is based on the steps defined in Algorithm 1, and compares workflow cases, represented as ABINs, by means of an interval similarity function.

Intuitively, the algorithm expresses the two given cases (line 1) through interval constraint networks (lines 2 and 3), evaluates the distance between intervals representing corresponding tasks (line 4) and the distance for relations between corresponding tasks (line 5), and uses intra-task and inter-task distances to evaluate the overall similarity, also considering possible dissimilarities of the cases with regard to the occurring tasks (line 6).

As already discussed, in this paper we do not deal with uncertainty for task durations and for delays between tasks. Thus, hereinafter we use a simplified notation for the quantitative part of ABIN constraints: in describing similarity measures, $m_{i,j}$ and $d_j$ are considered as single real numbers instead of ranges.

Once workflow cases are translated into temporal networks, in order to compare cases we need to establish a correspondence between the nodes and edges of two different networks. As task names univocally identify tasks within a case, the correspondence between nodes is built through task names; the correspondence between edges is built upon the correspondence of the connected nodes. When two connected nodes are consecutive in one case and not consecutive in the other, we need to derive the missing edge in order to compare corresponding (possibly derived) edges. For example, to compare CASE_1 and CASE_4 we need to derive the edges between T2 and T1, and between T1 and T3, in the temporal network representing CASE_1, and the edges between T1 and T2, and between T2 and T3, in the temporal network representing CASE_4. In Fig. 5, derived edges are represented through dashed edges.
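The set of edges to be compared can be sketched as follows, assuming each case is given only as the ordered list of its task names. A pair is kept when both tasks occur in both cases and the first is immediately followed by the second in at least one case; the case in which the pair is not consecutive needs a derived edge. The function name is hypothetical, and the task orders in the usage note are illustrative choices consistent with the CASE_1/CASE_4 discussion above, not data from Fig. 2.

```python
def corresponding_pairs(order1, order2):
    """Edges to compare between two cases (task names in case order).

    A pair (a, b) is kept when a and b occur in both cases and a is immediately
    followed by b in at least one case; if they are not consecutive in the other
    case, that case's network needs a derived edge for the pair.
    """
    common = set(order1) & set(order2)

    def consecutive(order):
        # Consecutiveness is judged on the full case, then restricted to common tasks.
        return {(a, b) for a, b in zip(order, order[1:]) if a in common and b in common}

    return consecutive(order1) | consecutive(order2)
```

With the orders `["T1", "T2", "T3"]` and `["T2", "T1", "T3"]`, the sketch returns the four pairs (T1, T2), (T2, T3), (T2, T1), and (T1, T3), matching the relations that must be compared (and partly derived) in the example above.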

4.3. Intra-task distance

The intra-task distance provides a direct method to evaluate the similarity of corresponding tasks with respect to their durations and to other atemporal features within the workflow process (such as the agent that performed the task).

The intra-task distance ($d_{in}$) is based on the duration of the task interval ($\Delta t = t^+ - t^-$) and is defined as follows:

Definition 3. Given two workflow cases having two tasks with the same name, represented by two nodes with the same name in the networks $S'$ and $S''$ obtained from the considered workflow cases, and given the intervals $t' \in T'$ and $t'' \in T''$ related to the two nodes, the intra-task distance function is defined as follows:

$$d_{in}: T \times T \to [0, 1), \qquad d_{in}(t', t'') = \alpha\, \frac{|\Delta t' - \Delta t''|}{|\Delta t'| + |\Delta t''|} + (1 - \alpha)\, d_{wf}(t', t'')$$

where $\alpha \in [0, 1]$ is the weight of the duration in the function, and the distance function $d_{wf}$ measures other workflow parameters not related to the temporal dimension.
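Definition 3 translates almost literally into code. In the sketch below, durations are plain numbers and `d_wf` is passed in as a precomputed value, since the atemporal distance is left open in the text; returning 0 for the duration term when both tasks have zero length is our own assumption, as the formula is undefined there. The function name is hypothetical.

```python
def d_in(dur1, dur2, alpha=1.0, d_wf=0.0):
    """Intra-task distance of Definition 3 between two corresponding tasks.

    dur1, dur2: task durations (Delta t' and Delta t'').
    alpha: weight of the duration term; d_wf: precomputed atemporal distance.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    denom = abs(dur1) + abs(dur2)
    # ASSUMPTION: two zero-length tasks have identical durations (term = 0).
    duration_term = abs(dur1 - dur2) / denom if denom else 0.0
    return alpha * duration_term + (1.0 - alpha) * d_wf
```

With $\alpha = 1$, as in the worked example of Section 4.7, only durations matter: hypothetical durations 4 and 6 give a distance of 0.2.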

The distance function $d_{in}$ fulfills the following properties:

Property 1 (Reflexivity). Given $S_{ABIN} = \langle T, R, D(T) \rangle$ and $t' \in T$, $d_{in}(t', t') = 0$ (assuming $d_{wf}(t', t') = 0$).

Property 2. Given
• $S'_{ABIN} = \langle T', R', D'(T') \rangle$,
• $S''_{ABIN} = \langle T'', R'', D''(T'') \rangle$,
• $t' \in T'$, $t'' \in T''$,
then $d_{in}(t', t'') = d_{in}(t'', t')$ (assuming that the distance function $d_{wf}$ is symmetric).

The proof of Property 2 is trivial since $|\Delta t' - \Delta t''| = |\Delta t'' - \Delta t'|$, where $\Delta t', \Delta t'' \in \mathbb{R}$.

4.4. Inter-task distance

After considering the intra-task distance, we have to take into account the inter-task distance. The inter-task similarity deals with similarities for temporal relations between corresponding tasks. To this aim, we define an inter-task distance function ($d_{IN}$), which takes into account both the qualitative (Q) and quantitative (q) components of the edge labels in the temporal network, by using functions Q and q, respectively (as shown in Fig. 6). Function Q measures the distance between two qualitative temporal relations. The distance between two Allen relations is evaluated according to the distance of the considered relations on one of the neighbour graphs proposed by Freksa [29]: two interval relations between the same intervals are neighbours if it is possible to move directly from one relation to the other by continuously deforming the intervals (i.e., shortening, shifting, ...). For example, if there is a before relation between two intervals, we can move from before to meets by simply shifting the first interval until it is contiguous to the second one: in this case, the distance between before and meets is 1. In this work, we adopt, without loss of generality, the A-neighbours graph depicted in Fig. 7.

Figure 6 Example of ABIN reduction to point constraint network.

Definition 4. Given two qualitative temporal relations $b'$, $b''$, the distance function Q is defined as follows:

$$Q: BTR \times BTR \to [0, 1], \qquad Q(b', b'') = \frac{path(b', b'', G)}{\max\{path(\cdot, \cdot, G)\}}$$

where G is the neighbour graph between interval relations, and the function $path(b', b'', G)$ measures the length of the shortest path on G between the relations $b'$ and $b''$. Function Q is normalized by the largest such shortest-path length on G.
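Definition 4 only needs shortest paths on an unweighted graph, so breadth-first search suffices. Because Fig. 7 is not reproduced in this text, the adjacency below is an assumption: it is the conceptual neighbourhood in which two relations are adjacent when one of the four interval bounds moves continuously (with `i` marking inverse relations). With this graph the normalising constant is 8 (from b to bi); if the graph of Fig. 7 differs, the resulting Q values will differ from those reported in Tables 5–8.

```python
from collections import deque
from itertools import combinations

# ASSUMPTION: two BTRs are neighbours when one of the four interval bounds moves
# continuously. Fig. 7 of the paper may use a different adjacency.
EDGES = [
    ("b", "m"), ("m", "o"), ("o", "s"), ("o", "fi"), ("s", "d"), ("s", "e"),
    ("fi", "e"), ("fi", "di"), ("d", "f"), ("e", "f"), ("e", "si"), ("di", "si"),
    ("f", "oi"), ("si", "oi"), ("oi", "mi"), ("mi", "bi"),
]
GRAPH = {}
for u, v in EDGES:
    GRAPH.setdefault(u, set()).add(v)
    GRAPH.setdefault(v, set()).add(u)


def path(src, dst, graph=GRAPH):
    """Length of the shortest path between two relations (breadth-first search)."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError(f"{dst!r} is not reachable from {src!r}")


# Normalising constant: the largest shortest-path distance in the graph.
MAX_PATH = max(path(u, v) for u, v in combinations(GRAPH, 2))


def Q(b1, b2):
    """Qualitative distance of Definition 4, in [0, 1]."""
    return path(b1, b2) / MAX_PATH
```

Under this assumed graph, `Q("b", "m")` is 0.125 and `Q("b", "bi")` is 1.0; symmetry (Property 4) follows because the graph is undirected.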

Function Q has the following properties:

Property 3 (Reflexivity). Given $r'_{i,j} \in R'$ and $r'_{i,j} = (b'_{i,j}, m'_{i,j})$, then $Q(b'_{i,j}, b'_{i,j}) = 0$.

Figure 7 The A-neighbours graph proposed by Freksa and an example of the Q function.

Property 4. Given $r'_{i,j} \in R'$ and $r''_{i,j} \in R''$, then $Q(b'_{i,j}, b''_{i,j}) = Q(b''_{i,j}, b'_{i,j})$.

The proof of Property 4 is trivial since $path(b_i, b_j, G)$ is the length of the shortest path between the nodes $b_i$ and $b_j$ of the (undirected) graph G, and $\max\{path(\cdot, \cdot, G)\}$ is constant for a given G.

Function q takes into account the quantitative aspects of an ABIN network, and is defined as follows:

Definition 5. Given two quantitative temporal constraints between time points in $M$ (i.e., the set of all possible allowed ranges), the distance function q is defined as follows:

$$q: M \times M \to [0, 1], \qquad q(m', m'') = \frac{|m' - m''|}{|m' + m''|}$$

Function q has the following properties:

Property 5 (Reflexivity). Given $r'_{i,j} \in R'$ and $r'_{i,j} = (b'_{i,j}, m'_{i,j})$, then $q(m'_{i,j}, m'_{i,j}) = 0$.

Property 6. Given $r'_{i,j} \in R'$ and $r''_{i,j} \in R''$, then $q(m'_{i,j}, m''_{i,j}) = q(m''_{i,j}, m'_{i,j})$.

The proof of Property 6 is trivial since $\forall m', m'' \in M$, $|m' - m''| = |m'' - m'|$ and $|m' + m''| = |m'' + m'|$.

Once the distance functions Q and q have been defined, the inter-task distance can be defined as follows:

Definition 6. Given two corresponding relations $r'$, $r''$ in two ABIN networks representing two different clinical workflow cases, the inter-task distance function $d_{IN}$ is defined as follows:

$$d_{IN}: R \times R \to [0, 1], \qquad d_{IN}(r', r'') = \begin{cases} Q(b', b'') & \text{if } Q(b', b'') > 0 \\ \beta\, q(m', m'') & \text{if } Q(b', b'') = 0 \end{cases}$$

where $r' = (b', m')$ represents the relation between the tasks of the first case through a qualitative value $b'$, i.e., one of the Allen relations, and a quantitative value $m'$, i.e., the distance between the end time of the first task and the beginning of the second one; analogously, $r'' = (b'', m'')$ represents the relation between the tasks of the second case. $\beta$ is a weight for the quantitative component within the function.
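Definitions 5 and 6 combine as sketched below. A relation is a `(btr, delay)` pair with a single-valued delay (the simplified notation adopted above); the qualitative distance is passed in as a function, so any neighbour graph can be plugged in; and `q(0, 0) = 0` is our own convention for two zero delays (e.g., two meets relations), where the formula is undefined. Function names are hypothetical, and the default $\beta = 0.1$ is the value used in the worked example.

```python
def q(m1, m2):
    """Quantitative distance of Definition 5 for single-valued delays m1, m2 >= 0."""
    denom = abs(m1 + m2)
    if denom == 0:
        return 0.0  # ASSUMPTION: two zero delays (e.g. two 'meets') are identical
    return abs(m1 - m2) / denom


def d_inter(r1, r2, qualitative_distance, beta=0.1):
    """Inter-task distance d_IN of Definition 6.

    r1, r2: (btr, delay) pairs for corresponding relations of the two cases.
    qualitative_distance: a function implementing Q (Definition 4).
    beta: weight of the quantitative component.
    """
    (b1, m1), (b2, m2) = r1, r2
    qualitative = qualitative_distance(b1, b2)
    if qualitative > 0:
        return qualitative        # different relations: Q alone decides
    return beta * q(m1, m2)       # same relation: compare the delays
```

For two before relations with hypothetical delays 1 and 5, `q` gives 4/6 and, with $\beta = 0.1$, $d_{IN} \approx 0.0667$, the same pattern as $r_{1,3}$ in Table 7.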

Moreover, the distance function $d_{IN}$ has the following properties:

Property 7 (Reflexivity). Given $S' \in ABIN$ and $r'_{i,j} \in R'$, then $d_{IN}(r'_{i,j}, r'_{i,j}) = 0$.

Property 8. Given $S', S'' \in ABIN$, $r'_{i,j} \in R'$ and $r''_{i,j} \in R''$, then $d_{IN}(r'_{i,j}, r''_{i,j}) = d_{IN}(r''_{i,j}, r'_{i,j})$.

The proof of Property 8 is trivial since both Q and q are reflexive and symmetric.

4.5. Overall distance: $d_{ABIN}$

The distance function $d_{ABIN}$ evaluates the overall (inter-task and intra-task) distance between two ABIN networks as follows:

Definition 7. Given $S' = \langle T', R', D'(T') \rangle$ and $S'' = \langle T'', R'', D''(T'') \rangle$, the function $d_{ABIN}: ABIN \times ABIN \to [0, 1]$ is defined as follows:
• If $|T' \cap T''| \neq 0$ (and, thus, $|R' \cap R''| \neq 0$), where $|T' \cap T''|$ is the number of common tasks and $|R' \cap R''|$ is the number of common (even derived) relations:

$$d_{ABIN}(S', S'') = \delta\, \frac{\sum_{t' \in T',\, t'' \in T''} d_{in}(t', t'')}{|T' \cap T''|} + (1 - \delta)\, \frac{\sum_{r' \in R',\, r'' \in R''} d_{IN}(r', r'')}{|R' \cap R''|}$$

where $t'$ and $t''$ stand for intervals of corresponding tasks, while $r'$ and $r''$ stand for relations between corresponding tasks.
• If $|T' \cap T''| = 0$ (and, thus, $|R' \cap R''| = 0$), then $d_{ABIN} = 1$.
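Given the values $d_{in}$ of the common tasks and $d_{IN}$ of the corresponding relations, Definition 7 is a weighted sum of two averages. The sketch below assumes those values have already been computed; raising an error when tasks are shared but no relation corresponds (e.g., a single common task) is our own choice, since the definition assumes this situation does not occur. The function name is hypothetical.

```python
def d_abin(task_distances, relation_distances, delta=0.5):
    """Overall distance of Definition 7.

    task_distances: d_in values of the common tasks.
    relation_distances: d_IN values of the corresponding (possibly derived) relations.
    delta: weight of the intra-task component.
    """
    if not task_distances:
        return 1.0  # no common tasks
    if not relation_distances:
        # Our choice: Definition 7 assumes common tasks imply corresponding relations.
        raise ValueError("common tasks but no corresponding relations")
    intra = sum(task_distances) / len(task_distances)
    inter = sum(relation_distances) / len(relation_distances)
    return delta * intra + (1.0 - delta) * inter
```

Fed with the per-task values of Table 1 and the per-relation values of Table 5, with $\delta = 0.5$, it returns approximately 0.04078, the value reported for CASE_2 in Section 4.7.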

The function $d_{ABIN}$ has the following properties:

Property 9 (Reflexivity). Given $S' = \langle T', R', D'(T') \rangle$, then $d_{ABIN}(S', S') = 0$, since $d_{in}(t, t) = 0$ and $d_{IN}(r, r) = 0$ $\forall t \in T', r \in R'$.

Property 10. Given $S', S''$, then $d_{ABIN}(S', S'') = d_{ABIN}(S'', S')$.

The proof of Property 10 is trivial since $d_{in}(t', t'') = d_{in}(t'', t')$ and $d_{IN}(r', r'') = d_{IN}(r'', r')$ for each corresponding $t'$, $t''$ and $r'$, $r''$, where $t' \in T'$, $r' \in R'$ and $t'' \in T''$, $r'' \in R''$.

4.6. Similarity measure: $S_{ABIN}$

In general, similarity is inversely proportional to the distance between the elements to be compared (i.e., tasks and temporal relations). Until now, we have defined the concept of distance between corresponding elements. In our case, when defining the overall similarity between two clinical workflow cases, we also have to consider the presence of tasks in one workflow case without corresponding tasks in the other workflow case. Such a presence of non-corresponding tasks is suitably represented, in the overall similarity function we define for clinical workflow cases, by the function p.

Table 1 Intra-task distance between CASE_1 and CASE_2

| CASE_2 | $d_{in}$ |
|---|---|
| T1 | 0 |
| T2 | 0.20 |
| T3 | 0 |
| T4 | 0 |
| T5 | 0.25 |
| T6 | 0 |
| T7 | 0 |
| T9 | 0 |
| $\sum d_{in}$ | 0.45 |

Table 2 Intra-task distance between CASE_1 and CASE_3

| CASE_3 | $d_{in}$ |
|---|---|
| T1 | 0.5 |
| T2 | 0 |
| $\sum d_{in}$ | 0.5 |

Table 3 Intra-task distance between CASE_1 and CASE_4

| CASE_4 | $d_{in}$ |
|---|---|
| T1 | 0 |
| T2 | 0 |
| T3 | 0 |
| T4 | 0 |
| T5 | 0 |
| T6 | 0 |
| T7 | 0 |
| $\sum d_{in}$ | 0 |

The function p is defined as follows:

Definition 8. Given two workflow cases $CASE'$ and $CASE''$, represented by the sets $T'$, $T''$ of tasks in the corresponding ABINs, the function p measures the presence of non-common nodes as follows:

$$p(T', T'') = \frac{|(T' \cup T'') \setminus (T' \cap T'')|}{|T'| + |T''|}$$

Moreover, $p(T', T') = 0$ and $p(T', T'') = p(T'', T')$ because of the commutative property of the intersection and union operators on sets.

Once function p is defined, we define the overall similarity measure using the functions $d_{ABIN}$ and p.

Definition 9. Given two cases $CASE'$ and $CASE''$, represented by $S' = \langle T', R', D'(T') \rangle$ and $S'' = \langle T'', R'', D''(T'') \rangle$, the overall similarity is defined as:

$$S_{ABIN}: C \times C \to [0, 1], \qquad S_{ABIN}(CASE', CASE'') = 1 - \left(\gamma\, d_{ABIN}(S', S'') + (1 - \gamma)\, p(T', T'')\right)$$

where $\gamma \in [0, 1]$ is the distance weight and $S'$, $S''$ are the ABIN networks that represent cases $CASE'$ and $CASE''$, respectively.

The similarity function $S_{ABIN}$ has the following properties:

Property 11 (Reflexivity). Given $CASE' \in C$, $S_{ABIN}(CASE', CASE') = 1$, because $d_{ABIN}(S', S') = 0$ and $p(T', T') = 0$.

Property 12. Given $CASE', CASE'' \in C$, $S_{ABIN}(CASE', CASE'') = S_{ABIN}(CASE'', CASE')$.

The proof of Property 12 is trivial since $d_{ABIN}(S', S'') = d_{ABIN}(S'', S')$ and $p(T', T'') = p(T'', T')$.
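Definitions 8 and 9 can be sketched together. Task sets are Python sets of task names, $d_{ABIN}$ is passed in as a precomputed number, and the guard for two empty cases is our own addition, since $p$ is undefined there. Function names are hypothetical.

```python
def p(tasks1, tasks2):
    """Presence of non-common tasks (Definition 8)."""
    t1, t2 = set(tasks1), set(tasks2)
    if not t1 and not t2:
        raise ValueError("p is undefined for two empty cases")
    return len((t1 | t2) - (t1 & t2)) / (len(t1) + len(t2))


def s_abin(d_abin_value, p_value, gamma=0.5):
    """Overall similarity of Definition 9, given d_ABIN and p."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return 1.0 - (gamma * d_abin_value + (1.0 - gamma) * p_value)
```

For instance, with the task sets {T1, ..., T7, T9} of CASE_1 and {T1, T2} of CASE_3, `p` evaluates to 6/10 = 0.6, as in the worked example below.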

4.7. An overall example of similarity evaluation

In this section, we show how to evaluate the similarity among the cases reported in Fig. 2. We consider CASE_1 as a reference case of the clinical workflow; thus, given the workflow cases of different patients (CASE_2, CASE_3, CASE_4, and CASE_5), we may want to determine which of them is closest to the reference case.

The similarity evaluation starts from the computation of the intra-task distance between CASE_1 and all the other cases. To this end, we quantify the distance between the performed tasks of each case and those of the reference case. Tables 1–4 show the intra-task distance values ($d_{in}$) between CASE_1 and the other cases, with $\alpha = 1$.

Once we have compared the local properties of the performed tasks (intra-task), the next step is to measure the difference between the order in which tasks were performed in the reference case CASE_1 and the order in the other cases. Let us suppose that, in this medical scenario, the relative order of tasks is more relevant than the quantitative aspects. For instance, the fact that the blood sample for testing was acquired before the ECG test is more significant than the fact that there is a 5-min delay between them. Therefore, parameter $\beta$ must be near 0. Tables 5–8 show the inter-interval distance values of the considered example between CASE_1 and the other cases, with $\beta = 0.1$.

Table 4 Intra-task distance between CASE_1 and CASE_5

| CASE_5 | $d_{in}$ |
|---|---|
| T1 | 0.5 |
| T2 | 0 |
| T3 | 0 |
| T4 | 0 |
| T5 | 0.14286 |
| T6 | 0 |
| T7 | 0 |
| T9 | 0 |
| $\sum d_{in}$ | 0.64286 |

Table 5 Inter-interval distance between CASE_1 and CASE_2

| CASE_2 | $r_{1,2}$ | $r_{2,3}$ | $r_{3,4}$ | $r_{4,5}$ | $r_{5,6}$ | $r_{6,7}$ | $r_{7,9}$ |
|---|---|---|---|---|---|---|---|
| Q | 0 | 0 | 0.16666 | 0 | 0 | 0 | 0 |
| q | 0 | 0 | - | 0.09090 | 0.00893 | 0 | 0.00509 |
| $d_{IN}$ | 0 | 0 | 0.16666 | 0.00909 | 0.00089 | 0 | 0.00051 |

Table 6 Inter-interval distance between CASE_1 and CASE_3

| CASE_3 | $r_{1,2}$ | $r_{2,1}$ |
|---|---|---|
| Q | 0.66667 | 0.66667 |
| q | - | - |
| $d_{IN}$ | 0.66667 | 0.66667 |

Table 7 Inter-interval distance between CASE_1 and CASE_4

| CASE_4 | $r_{1,2}$ | $r_{1,3}$ | $r_{2,3}$ | $r_{2,1}$ | $r_{3,4}$ | $r_{4,5}$ | $r_{5,6}$ | $r_{6,7}$ |
|---|---|---|---|---|---|---|---|---|
| Q | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| q | - | 0.66667 | 0.25 | - | 0 | 0 | 0 | 0 |
| $d_{IN}$ | 1 | 0.06667 | 0.025 | 1 | 0 | 0 | 0 | 0 |

Table 8 Inter-interval distance between CASE_1 and CASE_5

| CASE_5 | $r_{1,2}$ | $r_{2,3}$ | $r_{3,4}$ | $r_{4,5}$ | $r_{5,6}$ | $r_{6,7}$ | $r_{7,9}$ |
|---|---|---|---|---|---|---|---|
| Q | 0.33333 | 0 | 0 | 0 | 0 | 0 | 0 |
| q | - | 0 | 0 | 0 | 0.00444 | 0 | 0 |
| $d_{IN}$ | 0.33333 | 0 | 0 | 0 | 0.00044 | 0 | 0 |

In Tables 5–8, $r_{i,k}$ stands for corresponding relations (for example, $r_{1,2}$ stands for the relation between tasks T1 and T2); a dash means that q is not needed, because $Q > 0$.


Once we have computed $d_{in}$ and $d_{IN}$, we can easily obtain a similarity measure between the reference case (CASE_1) and the other cases. Let us assume two arbitrary values $\gamma = 0.5$ and $\delta = 0.5$; the similarity between CASE_1 and cases CASE_2, CASE_3, CASE_4, and CASE_5 can be calculated as follows.

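The similarity measure between CASE_1 and CASE_2 is:

$$d_{ABIN}(\mathrm{CASE\_1}, \mathrm{CASE\_2}) = 0.5\,\frac{0.45}{8} + 0.5\,\frac{0.17714}{7} = 0.04078$$

$$p(\mathrm{CASE\_1}, \mathrm{CASE\_2}) = \frac{|\{T1, T2, T3, T4, T5, T6, T7, T9\} \setminus \{T1, T2, T3, T4, T5, T6, T7, T9\}|}{8 + 8} = \frac{0}{16} = 0$$

$$S_{ABIN}(\mathrm{CASE\_1}, \mathrm{CASE\_2}) = 1 - (0.5 \times d_{ABIN}(\mathrm{CASE\_1}, \mathrm{CASE\_2}) + 0.5 \times p(\mathrm{CASE\_1}, \mathrm{CASE\_2})) = 1 - (0.5 \times 0.04078 + 0.5 \times 0) = 0.97961$$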

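The similarity measure between CASE_1 and CASE_3 is:

$$d_{ABIN}(\mathrm{CASE\_1}, \mathrm{CASE\_3}) = 0.5\,\frac{0.5}{2} + 0.5\,\frac{1.33334}{2} = 0.45834$$

$$p(\mathrm{CASE\_1}, \mathrm{CASE\_3}) = \frac{|\{T1, T2, T3, T4, T5, T6, T7, T9\} \setminus \{T1, T2\}|}{8 + 2} = \frac{6}{10} = 0.6$$

$$S_{ABIN}(\mathrm{CASE\_1}, \mathrm{CASE\_3}) = 1 - (0.5 \times d_{ABIN}(\mathrm{CASE\_1}, \mathrm{CASE\_3}) + 0.5 \times p(\mathrm{CASE\_1}, \mathrm{CASE\_3})) = 1 - (0.5 \times 0.45834 + 0.5 \times 0.6) = 0.47083$$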

The similarity measure between CASE_1 and CASE_4 is:

$$d_{ABIN}(\mathrm{CASE\_1}, \mathrm{CASE\_4}) = 0.5\,\frac{0}{7} + 0.5\,\frac{2.09167}{8} = 0.13073$$

$$p(\mathrm{CASE\_1}, \mathrm{CASE\_4}) = \frac{2}{16} = 0.125$$

$$S_{ABIN}(\mathrm{CASE\_1}, \mathrm{CASE\_4}) = 1 - (0.5 \times d_{ABIN}(\mathrm{CASE\_1}, \mathrm{CASE\_4}) + 0.5 \times p(\mathrm{CASE\_1}, \mathrm{CASE\_4})) = 1 - (0.5 \times 0.13073 + 0.5 \times 0.125) = 0.87214$$

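The similarity measure between CASE_1 and CASE_5 is:

$$d_{ABIN}(\mathrm{CASE\_1}, \mathrm{CASE\_5}) = 0.5\,\frac{0.64286}{8} + 0.5\,\frac{0.33377}{7} = 0.06402$$

$$p(\mathrm{CASE\_1}, \mathrm{CASE\_5}) = \frac{1}{17} = 0.05882$$

$$S_{ABIN}(\mathrm{CASE\_1}, \mathrm{CASE\_5}) = 1 - (0.5 \times d_{ABIN}(\mathrm{CASE\_1}, \mathrm{CASE\_5}) + 0.5 \times p(\mathrm{CASE\_1}, \mathrm{CASE\_5})) = 1 - (0.5 \times 0.06402 + 0.5 \times 0.05882) = 0.93858$$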
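The arithmetic of the example can be re-checked from the column sums of Tables 1–8 and from the counts of common and non-common tasks used in the computations of $p$. The short script below is our own check, not part of the proposal; it uses $\gamma = \delta = 0.5$ as in the example, recomputes each $S_{ABIN}$ value, and sorts the cases by decreasing similarity.

```python
GAMMA = DELTA = 0.5

# Per case, compared with CASE_1: (sum of d_in, common tasks,
# sum of d_IN, corresponding relations, non-common tasks, |T'| + |T''|)
CASES = {
    "CASE_2": (0.45, 8, 0.17714, 7, 0, 16),
    "CASE_3": (0.5, 2, 1.33334, 2, 6, 10),
    "CASE_4": (0.0, 7, 2.09167, 8, 2, 16),
    "CASE_5": (0.64286, 8, 0.33377, 7, 1, 17),
}


def similarity(sum_in, n_tasks, sum_inter, n_rel, non_common, total):
    d = DELTA * sum_in / n_tasks + (1 - DELTA) * sum_inter / n_rel   # d_ABIN
    p = non_common / total                                            # p
    return 1 - (GAMMA * d + (1 - GAMMA) * p)                          # S_ABIN


scores = {name: similarity(*row) for name, row in CASES.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(f"{name}: {scores[name]:.5f}")
```

Running the script prints one line per case, ordered from the most to the least similar to CASE_1.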

We can conclude, therefore, that CASE_2 is the closest case to CASE_1, since it obtained the highest similarity value (0.97961). This result seems reasonable, since both workflow cases followed the same path of the workflow schema and have similar performance. Other observations can be derived from this example. For instance, the second most similar case is CASE_5 because, unlike CASE_4, its tasks were performed in practically the same order as in CASE_1. Finally, CASE_3 is the least similar case, since it followed radically different paths of the workflow schema.

5. Discussion and conclusions

This work deals with the representation of clinical processes by using temporally extended workflow modeling techniques. In particular, we propose an approach to evaluate the temporal similarity between workflow cases representing different applications of the same guideline. The similarity measure proposed in this paper provides a simple but powerful way to compare workflow cases through temporal constraint networks, which provide explicit temporal information about interval distances. Therefore, we can propose similarity measures based on the temporal relations of temporal constraint networks.

Related proposals in the literature deal with event sequences [12–14] or interval sequences, as in the case of the quantitative approach proposed in [19]. Its authors state a direct comparison between the intervals of two sequences through the overall sum of the offsets of the intervals (comparing the start and end components). Unlike our proposal, these sequence similarity approaches do not consider (or only partially consider) the relative order of intervals within the overall sequence. In [24], a temporal similarity measure between time dependency patterns for clinical pathways was proposed for the stroke management domain. The described similarity function measures only the cardinality of common edges between time dependency networks. In our proposal, the similarity measure covers this aspect and also considers the presence and duration of tasks represented as nodes.

We present the similarity function as a linear combination of functions, to differentiate the intra-task distance (matching and comparing task intervals individually) from the inter-task distance (considering relative positions with respect to other task intervals). These formulae can be easily configured by only four parameters ($\alpha$, $\beta$, $\gamma$, and $\delta$), allowing a flexible way to set up the quantitative and qualitative aspects independently.

Parameters $\alpha$ and $\beta$ tune the components of the intra-task and inter-task distances, respectively. In particular, $\alpha$ states the importance of durations when comparing corresponding tasks, at the expense of other atemporal factors. The parameter $\beta$ indicates the relevance of the quantitative similarity when corresponding tasks share the same qualitative relation. For instance, in some medical situations the fact that one symptom is present before another one could be more relevant than the exact number of minutes or hours between them (or vice versa). Parameters $\delta$ and $\gamma$ concern a more general criterion. The parameter $\delta$ states whether the direct comparison between intervals (durations) is more relevant than the order within the task interval sequence of the case. For instance, let us suppose $\delta = 1$: then our proposal is similar to those proposals in the literature that only consider the direct comparison of corresponding task intervals to measure similarity. Finally, the parameter $\gamma$ is used to weight the fact that the two considered cases have non-matching tasks.

In this sense, the use of weights is a strategy broadly used in many similarity-based approaches, such as case-based reasoning or information retrieval [30,31]. This strategy provides the capability to modify the behaviour of the function depending on the knowledge domain. The main disadvantage is that the weight configuration demands a knowledge acquisition process; in particular, the direct assignment of weights by physicians is very costly in terms of resources and time. This work focuses on the definition of similarity measures, and weight-setting experiments in a concrete domain have not been carried out. Tuning these parameters in real clinical settings deserves further investigation and a strong involvement of clinicians.

Another relevant aspect of our approach is the potential capability of managing and inferring temporal knowledge from the ABIN network. Moreover, the use of temporal constraint networks also provides a flexible representation for evaluating the similarity of incomplete and imprecise descriptions of a temporal scenario (e.g., when it is not possible to obtain a crisp duration of tasks in a workflow case). In [20], temporal constraint networks represent clinical scenarios, and the consistency of the fusion of networks (incompatible, compatible, or satisfactory) is used for a qualitative similarity evaluation. In this sense, our proposal also covers this aspect, by considering the absence of some tasks in the scenario.

In our proposal, the ABIN network is reduced to a time point constraint network when consistency checking and inference capabilities are required. Translating intervals into points (the beginning and ending points of the intervals) is a classical approach to reducing the algorithmic complexity of many interval problems [21]. Therefore, for each pair of intervals to be compared, four time points must be considered. This implies a considerable increase in the number of relations in the time point constraint network. However, this approach lets us reuse a wealth of algorithms that check consistency and derive the minimal network in polynomial time [28].
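The interval-to-point reduction can be sketched with a Simple Temporal Problem (STP), the tractable point-based metric network studied in [28]: each task contributes a start point and an end point, every constraint bounds the distance between two points, and all-pairs shortest paths (Floyd-Warshall) both detect inconsistency and yield the minimal network in O(n^3) time. This sketch assumes every metric constraint is a single interval; the point numbering and the two tasks in the example are hypothetical, and the code is not the ABIN reduction itself.

```python
import math
from itertools import product

INF = math.inf


def minimal_stp(n, constraints):
    """Minimal network of a Simple Temporal Problem via Floyd-Warshall.

    n: number of time points; point 0 is used as the temporal origin.
    constraints: iterable of (i, j, lo, hi), meaning lo <= t_j - t_i <= hi.
    Returns a matrix d where d[i][j] is the tightest upper bound on
    t_j - t_i, or None if the constraints are inconsistent.
    """
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j, lo, hi in constraints:
        d[i][j] = min(d[i][j], hi)    # t_j - t_i <= hi
        d[j][i] = min(d[j][i], -lo)   # t_i - t_j <= -lo
    # All-pairs shortest paths on the distance graph: O(n^3).
    # product(...) varies k slowest, as Floyd-Warshall requires.
    for k, i, j in product(range(n), repeat=3):
        if d[i][k] + d[k][j] < d[i][j]:
            d[i][j] = d[i][k] + d[k][j]
    # A negative cycle shows up as a negative diagonal entry.
    if any(d[i][i] < 0 for i in range(n)):
        return None
    return d


def bounds(d, i, j):
    """Tightest [lo, hi] interval implied for t_j - t_i by the minimal network d."""
    return -d[j][i], d[i][j]


if __name__ == "__main__":
    # Hypothetical two-task case: 1 = A.start, 2 = A.end, 3 = B.start, 4 = B.end.
    network = [
        (0, 1, 0, 0),    # A starts at the origin
        (1, 2, 10, 20),  # A lasts 10 to 20 minutes
        (2, 3, 0, 30),   # B starts 0 to 30 minutes after A ends
        (3, 4, 5, 15),   # B lasts 5 to 15 minutes
    ]
    print(bounds(minimal_stp(5, network), 0, 4))  # (15, 65): implied end of B
```

Adding a constraint such as `(1, 4, 0, 10)` (B must end at most 10 minutes after A starts) makes `minimal_stp` return `None`, because the other constraints force at least 15 minutes.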

The ABIN network only allows a reduced number of interval relations between its elements, in order to deal with a tractable interval problem. Although the ABIN network is less expressive than the full IA, it is expressive enough to describe any kind of workflow case (a sequence of crisp intervals) for similarity measurement purposes.
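Since a workflow case is a sequence of crisp intervals, the qualitative relation between any two of its tasks is fixed by comparing endpoints. The sketch below returns which of Allen's thirteen interval relations [22] holds between two crisp intervals; it works on the full IA and does not encode the specific subset of relations admitted by the ABIN network. The function name and the relation labels follow common usage and are assumptions.

```python
def allen_relation(a, b):
    """Name of the Allen relation that holds between crisp intervals a and b.

    a, b: (start, end) pairs with start < end, e.g. task executions in minutes.
    """
    a_start, a_end = a
    b_start, b_end = b
    if not (a_start < a_end and b_start < b_end):
        raise ValueError("each interval needs start < end")
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    # From here on the two intervals share some time.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Remaining cases: proper partial overlap in one direction or the other.
    return "overlaps" if a_start < b_start else "overlapped-by"
```

For example, `allen_relation((0, 10), (5, 20))` returns `"overlaps"` and `allen_relation((0, 10), (10, 20))` returns `"meets"`.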

It is worth mentioning that in medical situations outside our case study (workflows), where the intervals of cases include vague information, the ABIN network is also expressive enough to represent fuzziness through imprecise values of the metric (unary and binary) relations. For instance, the ABIN network could be used to represent a generic path performed by a workflow case (i.e., a subset of the workflow schema without branches). An immediate practical application of this expressiveness is the representation of both workflow cases and generic paths for similarity purposes, which is part of our future research.
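Imprecise metric values are often modelled as possibility distributions. As an assumption (the shape is not fixed in this section), the sketch below uses a trapezoidal distribution (a, b, c, d): values in [b, c] are fully possible, possibility decreases linearly to 0 at a and d, and `possibility` returns how compatible a crisp observed duration is with the vague one. The class and its names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trapezoid:
    """Hypothetical trapezoidal possibility distribution with a <= b <= c <= d."""
    a: float
    b: float
    c: float
    d: float

    def __post_init__(self):
        if not (self.a <= self.b <= self.c <= self.d):
            raise ValueError("need a <= b <= c <= d")

    def possibility(self, x: float) -> float:
        """Degree in [0, 1] to which the crisp value x fits the vague value."""
        if self.b <= x <= self.c:
            return 1.0          # core: fully possible
        if x <= self.a or x >= self.d:
            return 0.0          # outside the support
        if x < self.b:
            return (x - self.a) / (self.b - self.a)   # rising edge
        return (self.d - x) / (self.d - self.c)       # falling edge
```

For a duration of "about 10 to 20 minutes, certainly between 5 and 30", `Trapezoid(5, 10, 20, 30).possibility(7.5)` gives 0.5, while any value between 10 and 20 gives 1.0; a crisp duration is the degenerate case a = b = c = d.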

The main focus of this paper is on some important methodological issues that need to be suitably solved before the design and implementation of clinical systems for managing and retrieving clinical process data. This proposal is a basic step towards the use of a formal approach to evaluate similarity between workflow cases representing clinical processes. To apply it to a concrete medical domain, such as the one described in the SPREAD guideline, real data from clinical domains are needed. Other future work will focus on the description of specific temporal constraint network models, to obtain more efficient similarity functions, and on their evaluation in concrete medical domains.

Acknowledgements

This work has been partially supported by contributions from the Spanish MEC under the FPU National Plan (Grant Ref. AP2003-4476), the National Project TIN2006-15460-C04-01 and the PETRI project PET2006-406, and the Department of Computer Science of the University of Verona.

References

[1] Adlassnig KP, Combi C, Das AK, Keravnou ET, Pozzi G. Temporal representation and reasoning in medicine: research directions and challenges. Artif Intell Med 2006;38(2):101–13.

[2] Combi C, Gozzi M, Juarez JM, Oliboni B, Pozzi G. Conceptual modeling of temporal clinical workflows. In: Goranko V, Wang XS, editors. Proceedings of the 14th International Symposium on Temporal Representation and Reasoning (TIME 2007). Los Alamitos: IEEE Computer Society; 2007. p. 70–81.

[3] Panzarasa S, Stefanelli M. Workflow management systems for guideline implementation. Neurol Sci 2006;27:245–9.

[4] Peleg M, Tu S, Bury J, Ciccarese P, Fox J, Greenes RA. Comparing computer-interpretable guideline models: a case-study approach. J Am Med Inform Assoc: JAMIA 2003;10(1).

[5] Quaglini S, Ciccarese P. Models for guideline representation. Neurol Sci 2006;27.

[6] Combi C, Gozzi M, Juarez JM, Marín R, Oliboni B. Querying clinical workflows by temporal similarity. In: Bellazzi R, Abu-Hanna A, Hunter J, editors. Artificial Intelligence in Medicine, 11th Conference on Artificial Intelligence in Medicine, AIME 2007, Proceedings, volume 4594 of Lecture Notes in Computer Science. Berlin: Springer; 2007. p. 469–78.

[7] The Stroke Prevention and Educational Awareness Diffusion (SPREAD) Collaboration. The Italian guidelines for stroke prevention. Neurol Sci 2000;21.

[8] Workflow Management Coalition and David Hollingsworth. The workflow reference model. http://www.wfmc.org/standards/framework.htm (Accessed 22 July 2008); 1995.

[9] Panzarasa S, Quaglini S, Micieli G, Marcheselli S, Pessina M, Pernice C. Improving compliance to guidelines through workflow technology: implementation and results in a stroke unit. In: Leong TY, editor. Proceedings of the 12th World Congress on Health (Medical) Informatics; Building Sustainable Health Systems. Amsterdam: IOS Press; 2007. p. 834–9.

[10] Keogh E, Chakrabarti K, Pazzani M, Mehrotra S. Dimensionality reduction for fast similarity search in large time series databases. Knowl Inf Syst 2001;3(3):263–86.

[11] Roddick JF, Spiliopoulou M. A survey of temporal knowledge discovery paradigms and methods. Knowl Data Eng 2002;14(4):750–67.

[12] Agrawal R, Faloutsos C, Swami AN. Efficient similarity search in sequence databases. In: Lomet D, editor. Proceedings of the 4th International Conference of Foundations of Data Organization and Algorithms (FODO). Berlin: Springer; 1993. p. 69–84.

[13] Kahveci T, Singh AK, Gurel A. Similarity searching for multi-attribute sequences. In: Proceedings of the 14th International Conference on Scientific and Statistical Database Management (SSDBM02). Los Alamitos: IEEE Computer Society; 2002. p. 175.

[14] Mannila H, Moen P. Similarity between event types in sequences. In: Mohania MK, Tjoa AM, editors. DaWaK’99: Proceedings of the First International Conference on Data Warehousing and Knowledge Discovery, volume 1676 of Lecture Notes in Computer Science. Berlin: Springer; 1999. p. 271–80.

[15] Chittaro L, Combi C. Visualizing queries on databases of temporal histories: new metaphors and their evaluation. Data Knowl Eng 2003;44(2):239–64.

[16] Shahar Y, Musen MA. Knowledge-based temporal abstraction in clinical domains. Artif Intell Med 1996;8(3):267–98.

[17] Hoppner F, Klawonn F. Finding informative rules in interval sequences. Intell Data Anal 2002;6(3):237–55.

[18] Villafane R, Hua KA, Tran D, Maulik B. Knowledge discovery from series of interval events. J Intell Inf Syst 2000;15(1):71–89.

[19] Yi BK, Roh JW. Similarity search for interval time sequences. In: Lee YJ, Li J, Whang KY, Lee D, editors. Database Systems for Advanced Applications, 9th International Conference, DASFAA 2004, Proceedings, volume 2973 of Lecture Notes in Computer Science. Berlin: Springer; 2004. p. 232–43.

[20] Dojat M, Ramaux N, Fontaine D. Scenario recognition for temporal reasoning in medical domains. Artif Intell Med 1998;14(1–2):139–55.

[21] Vilain M, Kautz H. Constraint propagation algorithms for temporal reasoning. In: Proceedings of the 5th National Conference on Artificial Intelligence (AAAI-86). San Francisco: Morgan Kaufmann; 1986. p. 377–82.

[22] Allen JF. Maintaining knowledge about temporal intervals. Commun ACM 1983;26(11):832–43.

[23] Pujari AK, Sattar A. A new framework for reasoning about points, intervals and durations. In: Dean T, editor. Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, IJCAI 99. San Francisco: Morgan Kaufmann; 1999. p. 1259–67.

[24] Lin F, Chou S, Pan S, Chen Y. Mining time dependency patterns in clinical pathways. Int J Med Inform 2001;62:11–25.

[25] Combi C, Pozzi G. Architectures for a temporal workflow management system. In: Haddad H, Omicini A, Wainwright RL, Liebrock LM, editors. Proceedings of the 2004 ACM Symposium on Applied Computing (SAC). New York: ACM; 2004. p. 659–66.

[26] van Beek P, Cohen R. Exact and approximate reasoning about temporal relations. Comput Intell 1990;6(3):132–44.

[27] Nebel B, Burckert HJ. Reasoning about temporal relations: a maximal tractable subclass of Allen’s interval algebra. J ACM 1995;42(1):43–66.

[28] Dechter R, Meiri I, Pearl J. Temporal constraint networks. Artif Intell 1991;49(1–3):61–95.

[29] Freksa C. Temporal reasoning based on semi-intervals. Artif Intell 1992;54(1):199–227.

[30] Finnie G, Sun Z. Similarity and metrics in case-based reasoning. Int J Intell Syst 2002;17:273–87.

[31] Pal SK, Shiu SCK. Foundations of soft case-based reasoning. San Francisco: John Wiley & Sons; 2004.
