
824 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 18, NO. 3, MAY 2014

Web-Based Workflow Planning Platform Supporting the Design and Execution of Complex Multiscale Cancer Models

Vangelis Sakkalis, Stelios Sfakianakis, Eleftheria Tzamali, Kostas Marias, Member, IEEE, Georgios Stamatakos, Member, IEEE, Fay Misichroni, Eleftherios Ouzounoglou, Student Member, IEEE, Eleni Kolokotroni, Dimitra Dionysiou, David Johnson, Steve McKeever, and Norbert Graf

Abstract—Significant Virtual Physiological Human efforts and projects have been concerned with cancer modeling, especially in the European Commission Seventh Framework research program, with the ambitious goal to approach personalized cancer simulation based on patient-specific data and thereby optimize therapy decisions in the clinical setting. However, building realistic in silico predictive models targeting the clinical practice requires interactive, synergetic approaches to integrate the currently fragmented efforts emanating from the systems biology and computational oncology communities all around the globe. To further this goal, we propose an intelligent graphical workflow planning system that exploits the multiscale and modular nature of cancer and allows building complex cancer models by intuitively linking/interchanging highly specialized models. The system adopts and extends current standardization efforts, key tools, and infrastructure in view of building a pool of reliable and reproducible models capable of improving current therapies and demonstrating the potential for clinical translation of these technologies.

Index Terms—Cancer systems biology, clinical translation, computational oncology, personalized medicine, scientific workflows.

I. INTRODUCTION

THE extreme complexity of the natural phenomenon of cancer, in conjunction with the prevalence of the disease, has dictated the development of highly demanding mathematical and computational cancer models aimed at optimizing individualized clinical decisions. A great diversity of cancer-related models already exists, focusing on various aspects of this complex phenomenon at different levels [1]. In the past decade, it has become evident that multiscale methods need to be applied to cancer modeling in order to address the various phases and scales of the disease across several levels of biocomplexity [2].

Manuscript received April 30, 2013; revised September 4, 2013 and November 15, 2013; accepted December 20, 2013. Date of publication January 2, 2014; date of current version May 1, 2014. This work was supported in part by the European Commission under the Transatlantic Tumor Model Repositories - TUMOR (FP7-ICT-2009.5.4-247754) and the Computational Horizons In Cancer - CHIC (FP7-ICT-2011.5.2-600841) projects.

V. Sakkalis, S. Sfakianakis, E. Tzamali, and K. Marias are with the Institute of Computer Science, Foundation for Research & Technology – Hellas, GR-70013 Heraklion, Greece (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

G. Stamatakos, F. Misichroni, E. Ouzounoglou, E. Kolokotroni, and D. Dionysiou are with the Institute of Communication and Computer Systems, School of Electrical and Computer Engineering, National Technical University of Athens, GR-15780 Athens, Greece (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).

D. Johnson is with the Department of Computing, Imperial College London, London, SW7 2AZ, U.K. (e-mail: [email protected]).

S. McKeever is with the Department of Informatics and Media, Uppsala University, 75120 Uppsala, Sweden (e-mail: [email protected]).

N. Graf is with the Department of Pediatric Hematology and Oncology, Saarland University Hospital, 66421 Homburg, Germany (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JBHI.2013.2297167

In general, two strategies to model the multiscale cancer phenomenon may be identified. The first is the bottom-up approach, which follows an inductive synthesis tactic when trying to predict tumor growth by linking together the elementary biological components of the underlying mechanisms. The second is the top-down deductive decomposition design, which phenotypically models the whole system without specifying in great detail the lower scales of biocomplexity, e.g., the molecular scale. The second approach is much easier to manipulate and much closer to clinical translation.

In the computational oncology domain, microscopic models attempt to describe individual cell dynamics, focusing on the subcellular and cellular levels. Macroscopic models, on the other hand, focus on the tissue level and assume that solid tumor behavior can be predicted by simulating the behavior of a group of cells and their global interaction with the properties of the surrounding and underlying tissue [3]–[6]. Both approaches are equally important for producing accurate and reliable models; in other words, one should be able to fine-tune macroscopic models using microscopically meaningful parameters.

From the mathematical point of view, such approaches to the multifaceted cancer phenomenon may be grouped into three main categories: continuous methods, discrete methods, and hybrid approaches [7]–[9]. Continuous approaches describe both cancer cell populations and their microenvironment (such as nutrients or signaling cues) using continuous variables, formulating a system of partial differential equations (PDEs). Discrete approaches, in contrast, describe cells as discrete elements that can change states and evolve in discretized time according to the changing dynamics (governed by deterministic or probabilistic laws), e.g., cellular automaton models [9] and agent-based models [10]. Hybrid approaches combine the benefits of continuous and discrete mathematics and offer the possibility of integrating phenomena at different time and length scales (from the tissue scale, for example, modeling neovascularization, to intracellular processes such as cell signaling and progression through the cell cycle). These models describe cancer cells as discrete variables and the tumor microenvironment using continuous reaction–diffusion equations, as opposed to typical discrete models.

2168-2194 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Digging deeper into the mathematical foundations of most clinically oriented continuous models, one has to deal with different numerical methods for approximating the solutions to PDEs (e.g., finite differences or finite elements), which involve different assumptions and convergence rates [11].
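As a concrete illustration of the finite-difference route mentioned above, the sketch below advances a 1-D proliferation–diffusion (Fisher–KPP) equation, u_t = D u_xx + ρ u (1 − u), where u is a normalized tumor cell density, using an explicit scheme. The equation, the zero-flux boundary treatment, the parameter values, and the function names are illustrative assumptions, not a model taken from this paper. The guard encodes the usual stability condition of the explicit scheme, D Δt / Δx² ≤ 1/2; an implicit scheme or a finite-element discretization would make different trade-offs.

```python
def fisher_kpp_step(u, D, rho, dx, dt):
    """One explicit finite-difference step of u_t = D*u_xx + rho*u*(1 - u)."""
    n = len(u)
    new = [0.0] * n
    for i in range(n):
        # Mirrored ghost nodes at both ends give a zero-flux (Neumann) boundary.
        left = u[i - 1] if i > 0 else u[1]
        right = u[i + 1] if i < n - 1 else u[n - 2]
        laplacian = (left - 2.0 * u[i] + right) / dx ** 2
        new[i] = u[i] + dt * (D * laplacian + rho * u[i] * (1.0 - u[i]))
    return new


def simulate_fisher_kpp(u0, D, rho, dx, dt, steps):
    """Advance the density profile u0 by `steps` explicit time steps."""
    if len(u0) < 2:
        raise ValueError("need at least two grid points")
    if D * dt / dx ** 2 > 0.5:
        raise ValueError("explicit scheme unstable: require D*dt/dx**2 <= 0.5")
    u = list(u0)
    for _ in range(steps):
        u = fisher_kpp_step(u, D, rho, dx, dt)
    return u


# Arbitrary illustrative values: a small tumor seed at the left end of a 5-unit domain.
u0 = [0.5] * 5 + [0.0] * 45
u = simulate_fisher_kpp(u0, D=0.1, rho=1.0, dx=0.1, dt=0.01, steps=500)
print(f"total density grew from {sum(u0):.2f} to {sum(u):.2f}")
```

Halving dx while keeping dt fixed would quadruple D Δt / Δx² and can trip the stability guard, which is one concrete example of how the choice of numerical method constrains a continuous model.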

It follows from the above that there is no single gold-standard or all-encompassing model that achieves the best possible results across all the heterogeneous cancer types under study. What is most critical to the success of computational oncology, and more specifically to the success of in silico systems modeling, is to promote interaction and collaboration among modelers, experimentalists, clinicians, and other specialists so as to develop advanced multicompartmental models of cancer development and response to treatment. Efforts at an international and even intercontinental level have already started in the course of the TUMOR project as a proof of concept, and the results are promising [12].

The systems biology community has been particularly active in standardizing the way biological models are formulated, stored, exchanged, and integrated, with a growing number of community-driven initiatives [13] to harmonize the development of the various standards and formats in systems biology, e.g., COMBINE [14]. However, there have not yet been any formal standardization efforts tailored to the specific needs of cancer modeling, aside from the TumorML language [15] developed within the TUMOR project.

In addition, important problems remain when dealing with scale and model linking. In order to translate models into the clinical setting as decision support tools, we should migrate from systems biology models to clinically driven models that are motivated by actual clinical problems and questions. It is also necessary to involve large and diverse communities of scientists, collaborating closely with clinicians, in the model development and validation process.

In this paper, we present a web-based scientific workflow planning platform (see Section IV) designed to support the development of complex multiscale cancer models, aiming to engage the wide cancer modeling audience (modelers, computational biologists, and clinicians) and to encourage scientists to collaborate constructively. The underlying foundation comprises a dedicated model repository, the parsing of SBML and TumorML information, and the execution of both SBML and proprietary models.

II. MODEL DESCRIPTION STANDARDS

To build the envisioned workflow environment, we had to select existing standards wherever possible and design new ones to cover missing domains. The idea is to facilitate model linking without the extra effort of porting existing models to a new framework or reimplementing them, both of which are costly and error-prone activities. Hence, the need to fuse disparate models together in the presented platform is addressed by using the Systems Biology Markup Language (SBML) to model the biochemical processes at the molecular scales, whereas the higher and more clinically relevant scales, specific to cancer modeling, are addressed using the newly developed TumorML markup language.

A. SBML

Among the numerous standards related to model description at the subcellular level, CellML [16] and SBML [17] are the most widely accepted. Both attempt to describe the structure and underlying mathematics of subcellular models. SBML is more specific and constrained in exchanging information about pathway and reaction models, and it uses successive hierarchical declarations of model constituents. SBML is also backed by a wide community, and tools exist to convert CellML to SBML. We prefer SBML mainly because of its constrained nature, which allows the language to be adopted quickly and to evolve with the requirements of representing and understanding systems biology.
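To make the hierarchical declarations concrete, the sketch below parses a small, hand-written SBML Level 3 Version 1 fragment with Python's standard-library ElementTree and lists its species and reactions. The toy model (species A and B, reaction r1) is hypothetical, and its attributes are trimmed for brevity, so it is illustrative rather than fully schema-valid; production code would more likely rely on a dedicated SBML library.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 3 Version 1 core documents.
SBML_NS = {"s": "http://www.sbml.org/sbml/level3/version1/core"}

# Hypothetical toy model A -> B: model > listOf* containers > individual elements.
TOY_SBML = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy">
    <listOfCompartments>
      <compartment id="cell" size="1" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="A" compartment="cell" initialConcentration="10"/>
      <species id="B" compartment="cell" initialConcentration="0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1" reversible="false">
        <listOfReactants><speciesReference species="A" stoichiometry="1"/></listOfReactants>
        <listOfProducts><speciesReference species="B" stoichiometry="1"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def summarize(sbml_text):
    """Return (species id -> initial concentration, [(reaction id, reactants, products)])."""
    model = ET.fromstring(sbml_text).find("s:model", SBML_NS)
    species = {
        sp.get("id"): float(sp.get("initialConcentration", "0"))
        for sp in model.iterfind("s:listOfSpecies/s:species", SBML_NS)
    }
    reactions = []
    for rxn in model.iterfind("s:listOfReactions/s:reaction", SBML_NS):
        reactants = [r.get("species") for r in rxn.iterfind("s:listOfReactants/s:speciesReference", SBML_NS)]
        products = [p.get("species") for p in rxn.iterfind("s:listOfProducts/s:speciesReference", SBML_NS)]
        reactions.append((rxn.get("id"), reactants, products))
    return species, reactions


species, reactions = summarize(TOY_SBML)
print(species)    # {'A': 10.0, 'B': 0.0}
print(reactions)  # [('r1', ['A'], ['B'])]
```

Note that nothing in the fragment says how long to simulate or which algorithm to use; SBML only declares the model.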

B. TumorML

The higher scale models included in our environment are described using TumorML [15], an XML-based markup language for describing cancer models. The development of TumorML contributes to several of the key interoperability aims of the TUMOR project.

First, by annotating cancer models with appropriate document metadata, TumorML facilitates digital curation, making the publishing, search, and retrieval of cancer models easier for researchers and clinicians using the TUMOR digital repository. Second, markup will be used to describe abstract interfaces to published implementations, allowing execution frameworks to run simulations using published models. Finally, TumorML markup facilitates the composition of compound models regardless of scale and source, enabling multiscale models to be developed in a modular fashion and allowing models from all around the globe to be integrated with related models in the TUMOR transatlantic platform. The TumorML model description will also incorporate and integrate the MIRIAM guidelines [18] in order to provide reference correspondence, attribution annotation, and external resource semantic annotation for the described models.

III. MODEL EXECUTION

There are two main execution frameworks in the TUMOR platform. The first is based on the SBML description of a model, whereas the second is more generic in the sense that a model can be provided as a self-contained executable. An SBML description of a model is a declarative artifact: it describes the mathematics required to implement the model, typically in the form of ordinary differential equations (ODEs), and nothing else. In order to implement the model, a solver is required to numerically resolve the equations and execute the corresponding reactions based on the kinetic laws and the prescribed parameter values. This solver can be a simulation environment, a compiler that links the SBML file with a numerical library and generates a standalone executable, or a partial evaluator that attempts to unfold the ODEs with respect to known solving algorithms. In general, SBML models can be classified as deterministic or stochastic, the latter using Monte Carlo simulation and related methods. The TUMOR execution infrastructure supports both deterministic and stochastic models through the incorporation of the COPASI simulator [19]. COPASI parses and executes SBML models; nevertheless, a couple of parameters need to be specified prior to execution:

1) the simulation time for the model;
2) the algorithm to be used, e.g., deterministic, stochastic, or hybrid.

These parameters are not specified by SBML, but they are essential for the models to produce the desired results. To support flexibility, users can input values for both parameters at runtime. These parameter values are then passed to the COPASI solver for simulating the models.
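The role of the two runtime parameters can be sketched as follows, assuming the SBML model has already been reduced to a list of mass-action reactions. This does not use COPASI: as stand-ins, it dispatches to a fixed-step forward-Euler integrator for the deterministic case and to Gillespie's direct method for the stochastic case, and leaves the hybrid option out. The reaction A → B with rate constant 0.5, and the names `simulate`, `rate`, and `fire`, are hypothetical.

```python
import math
import random

# Hypothetical mass-action network standing in for a parsed SBML model:
# each entry is (rate constant, reactant species, product species).
# Each species is assumed to appear at most once per reaction side.
REACTIONS = [(0.5, ["A"], ["B"])]


def rate(k, reactants, state):
    """Mass-action rate (or propensity): k times the product of reactant amounts."""
    return k * math.prod(state[s] for s in reactants)


def fire(state, reactants, products, amount):
    """Consume reactants and create products by the given amount."""
    for s in reactants:
        state[s] -= amount
    for s in products:
        state[s] += amount


def simulate(initial, t_end, method, dt=1e-3, seed=0):
    """Run the network up to t_end with the user-selected algorithm."""
    state = dict(initial)
    if method == "deterministic":
        # Forward-Euler integration of the mass-action ODEs.
        for _ in range(round(t_end / dt)):
            rates = [rate(k, r, state) for k, r, _ in REACTIONS]
            for (_, reactants, products), v in zip(REACTIONS, rates):
                fire(state, reactants, products, v * dt)
    elif method == "stochastic":
        # Gillespie's direct method on integer molecule counts.
        rng = random.Random(seed)
        t = 0.0
        while True:
            props = [rate(k, r, state) for k, r, _ in REACTIONS]
            total = sum(props)
            if total == 0:
                break
            t += rng.expovariate(total)  # waiting time to the next reaction event
            if t > t_end:
                break
            j = rng.choices(range(len(REACTIONS)), weights=props)[0]
            _, reactants, products = REACTIONS[j]
            fire(state, reactants, products, 1)
    else:
        raise ValueError(f"unsupported algorithm: {method!r}")
    return state


print(simulate({"A": 10.0, "B": 0.0}, t_end=2.0, method="deterministic"))
print(simulate({"A": 100, "B": 0}, t_end=2.0, method="stochastic"))
```

The same model description yields either a smooth trajectory or a single noisy realization depending solely on the runtime choice, which is why that choice cannot be left implicit.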

In the more generic case, the model is provided with no information on its internals. The supplied code, either in binary or in source format, should be runnable as a command-line program with its inputs and outputs specified either as command-line options or as files. For example, if the execution framework is a Linux 64-bit environment (as in our case), the supplied executable code should be compatible with it. Of course, when the source code of the model is available in a scripting language, such as Python or Perl, fewer restrictions are imposed on the model creators.
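The two requirements above, a runnable command-line interface and compatibility with the 64-bit Linux execution environment, can be sketched in Python as follows. The compatibility check reads only the ELF header fields for word size, byte order, and machine type, and assumes x86-64 is the target architecture; it cannot confirm the target operating system, because many Linux binaries leave the ELF OS/ABI byte at 0. Scripts with a shebang line are accepted, mirroring the looser constraints on scripting languages. The `--name=value` option convention, the function names, and the stub model are hypothetical, not part of TumorML or the TUMOR platform.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

EM_X86_64 = 62  # ELF e_machine value for x86-64


def is_supported_executable(header: bytes) -> bool:
    """Accept shebang scripts, or 64-bit little-endian x86-64 ELF binaries."""
    if header.startswith(b"#!"):
        return True  # interpreted scripts (e.g., Python, Perl) impose fewer restrictions
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return False
    is_64bit = header[4] == 2          # EI_CLASS: 2 = ELFCLASS64
    little_endian = header[5] == 1     # EI_DATA: 1 = ELFDATA2LSB
    machine = int.from_bytes(header[18:20], "little")  # e_machine field
    return is_64bit and little_endian and machine == EM_X86_64


def run_cli_model(command, params, output_name, timeout=3600):
    """Run a black-box model with '--name=value' options; return its output file's text."""
    with tempfile.TemporaryDirectory() as workdir:
        argv = list(command) + [f"--{k}={v}" for k, v in sorted(params.items())]
        # check=True raises CalledProcessError if the model exits with a nonzero status.
        subprocess.run(argv, cwd=workdir, check=True, timeout=timeout, capture_output=True)
        return (Path(workdir) / output_name).read_text()


# A stand-in "model": a Python one-liner that writes twice its --rate option to out.txt.
STUB = [sys.executable, "-c",
        "import sys, pathlib; a = dict(x[2:].split('=', 1) for x in sys.argv[1:]); "
        "pathlib.Path('out.txt').write_text(str(float(a['rate']) * 2))"]
print(run_cli_model(STUB, {"rate": 0.5}, "out.txt"))  # 1.0
```

Running each model in its own scratch directory keeps file-based inputs and outputs of different models from colliding.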

Irrespective of the model type (SBML or generic command-line format), TumorML offers a generic metadata “envelope” to describe both the model's interface, i.e., its input parameters and output results, and its execution requirements. The interface definition provides valuable information for linking models in the workflow editor, based on the required inputs and the generated outputs. The execution information, on the other hand, is used by the workflow runtime when the models are simulated or executed.
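The split between interface and execution information can be pictured as a small data structure. The sketch below is a hypothetical Python rendering of such an envelope: the class and field names, the port names, and the units are illustrative assumptions and do not reproduce the actual TumorML schema. The two helper functions show which part each consumer reads.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Port:
    """One input parameter or output result of a model (hypothetical fields)."""
    name: str
    data_type: str                     # e.g., "float", "int", "string"
    annotation: Optional[str] = None   # semantic (e.g., MIRIAM-style) annotation URI
    unit: Optional[str] = None


@dataclass
class ModelEnvelope:
    """Interface plus execution requirements, independent of how the model is implemented."""
    model_id: str
    inputs: list[Port] = field(default_factory=list)
    outputs: list[Port] = field(default_factory=list)
    kind: str = "sbml"                 # "sbml" or "cli" (standalone command-line program)
    platform: Optional[str] = None     # e.g., "linux-x86_64" for "cli" models


def editor_view(env):
    """What the workflow editor needs: the ports it can draw and connect."""
    return {"inputs": [p.name for p in env.inputs], "outputs": [p.name for p in env.outputs]}


def runtime_view(env):
    """What the workflow runtime needs to decide how to execute the model."""
    return {"kind": env.kind, "platform": env.platform}


metabolic = ModelEnvelope(
    "metabolism",
    inputs=[Port("plcg_level", "float", unit="uM")],
    outputs=[Port("proliferation_rate", "float", unit="1/h")],
)
print(editor_view(metabolic))   # {'inputs': ['plcg_level'], 'outputs': ['proliferation_rate']}
print(runtime_view(metabolic))  # {'kind': 'sbml', 'platform': None}
```

Keeping the two views separate means an SBML model and a compiled executable with the same ports look identical to the editor.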

IV. WORKFLOW DESIGN

Systems biology presents a new way to study biological systems, shifting from a “reductionist” approach to a more holistic one [20]. In this new perspective, complex biological systems are not studied through the isolated analysis of their components but through their investigation as whole integrated systems with dynamic relationships among their parts.

As a first step, we argue that the use of scientific workflows is a legitimate way to achieve this holistic view of systems biology. In general, a workflow can be described as a sequence of operations or tasks needed to manage a business process or a computational activity. The latter definition also applies to scientific workflows, which are meant to decompose complex scientific experiments into a series of repetitive computational steps that can be run on supercomputers or distributed on a cloud system [21].

The proposed scientific workflow management system has been designed and built around the requirements imposed by the domain users and scenarios. The main objectives of this new workflow design system are the following.

1) To provide an easy, intuitive, and secure environment for the design of integrative, predictive computational models represented as scientific workflows. The activities or steps in these workflows represent computational models at the microscopic or macroscopic level that interact by exchanging information through their adjustable parameters.

2) To follow a “Software as a Service” (SaaS) deployment approach. In particular, the system is accessible through the WWW using state-of-the-art web protocols and follows a cloud-based architecture in order to alleviate installation and maintenance costs.

3) To support the visual representation of the models and their simulation/execution at workflow runtime by building on the TumorML model descriptions.

4) To build upon an extensible architecture where the models are stored in potentially disparate model repositories [22].

In terms of its architecture, the TUMOR workflow management system consists of two components:

1) The workflow editor (or designer), which is a web application accessible through the user's web browser. This is the graphical front end for editing workflows, invoking their execution, and visualizing the results. A depiction of its interface can be found in Fig. 1.

2) The workflow engine, which is responsible for the management and execution of the workflows, the communication with the model repositories, etc.

The workflow designer depicts each model as a box with its abstract interface (inputs and outputs) shown as little circles attached to the model (see Fig. 1). The integration of the models into a scientific workflow is then driven by the user, who draws connecting lines between model outputs and inputs in a familiar box-and-arrows diagram. The connecting lines therefore represent “data flow,” i.e., the flow of data from an output of the source model to an input of the destination model. At the workflow level, model inputs that are “free” (i.e., not connected) serve as inputs to the whole workflow in the evaluation (execution) phase. Similarly, unconnected model outputs provide the high-level results of the workflow execution.
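The rule for free ports can be expressed in a few lines. In the sketch below, a workflow is a mapping from model ids to their input and output port names plus a list of output-to-input connections; the model and port names are hypothetical, loosely echoing the scenario of Section V, and the data layout is an assumption rather than the platform's actual representation.

```python
# Hypothetical workflow: model -> (input ports, output ports).
MODELS = {
    "egfr": (["glucose", "oxygen", "tgf_alpha"], ["plcg_change"]),
    "metabolism": (["plcg_change"], ["proliferation_rate", "lactate"]),
    "oncosimulator": (["proliferation_rate", "tumor_mask"], ["tumor_volume"]),
}
# Connections as (source model, source port, destination model, destination port).
CONNECTIONS = [
    ("egfr", "plcg_change", "metabolism", "plcg_change"),
    ("metabolism", "proliferation_rate", "oncosimulator", "proliferation_rate"),
]


def free_ports(models, connections):
    """Unconnected inputs become workflow inputs; unconnected outputs become workflow results."""
    bound_inputs = {(dst, dport) for _, _, dst, dport in connections}
    used_outputs = {(src, sport) for src, sport, _, _ in connections}
    wf_inputs = [(m, p) for m, (ins, _) in models.items() for p in ins
                 if (m, p) not in bound_inputs]
    wf_outputs = [(m, p) for m, (_, outs) in models.items() for p in outs
                  if (m, p) not in used_outputs]
    return wf_inputs, wf_outputs


inputs, outputs = free_ports(MODELS, CONNECTIONS)
print(inputs)   # [('egfr', 'glucose'), ('egfr', 'oxygen'), ('egfr', 'tgf_alpha'), ('oncosimulator', 'tumor_mask')]
print(outputs)  # [('metabolism', 'lactate'), ('oncosimulator', 'tumor_volume')]
```

The user is then prompted only for the four free inputs, and the two free outputs are what the workflow reports as its results.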

The connections between two models, representing the flow of data and information, are not arbitrary but are constrained by the information that the TumorML descriptions of the models provide. In particular, the connected parameters are checked at both the syntactic and the semantic level. At the syntactic level, the workflow designer validates that the parameters to be connected have the same data type, e.g., they both represent an integer or a character (string) value. At the semantic level, the designer takes advantage of the semantic, MIRIAM-based annotation of the parameters in the TumorML descriptions to make sure that they represent the same physiological or biological entity. Additional checks include the validation of the units used for the parameters and of their range of values. When the user tries to connect two models through their outputs and inputs by the familiar “drag-and-drop” operation, the application provides information on the matching parameters by highlighting the corresponding connectors. Therefore, users get an immediate visual indication of whether a connection between two models is legitimate, based on syntactic (e.g., data type) and semantic (e.g., units, high-level ontology annotation) criteria.

Fig. 1. The proposed workflow designer represents each model as a box with its abstract interface (inputs and outputs) as little circles attached to the model.
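The layered checks can be sketched as a single validation function that returns the reasons a connection is rejected, so that an editor could highlight valid connectors when the list is empty. The `Port` fields, the exact-match rule for units (no unit conversion), the requirement that the source range fit inside the target range, and the placeholder annotation strings are all simplifying assumptions, not the platform's actual rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Port:
    """Hypothetical parameter description as an editor might read it from model metadata."""
    name: str
    data_type: str
    annotation: Optional[str] = None                   # semantic annotation (placeholder strings here)
    unit: Optional[str] = None
    value_range: Optional[tuple[float, float]] = None  # (min, max), if declared


def connection_problems(source: Port, target: Port) -> list[str]:
    """Return why an output -> input connection is rejected; an empty list means it is legitimate."""
    problems = []
    if source.data_type != target.data_type:           # syntactic check
        problems.append(f"type {source.data_type} != {target.data_type}")
    if source.annotation != target.annotation:         # semantic check: same entity?
        problems.append("annotations refer to different entities")
    if source.unit != target.unit:                     # exact match only, no conversion
        problems.append(f"unit {source.unit} != {target.unit}")
    if source.value_range and target.value_range:
        lo_s, hi_s = source.value_range
        lo_t, hi_t = target.value_range
        if lo_s < lo_t or hi_s > hi_t:
            problems.append("source range exceeds what the target accepts")
    return problems


out = Port("proliferation_rate", "float", "example:proliferation_rate", "1/h", (0.0, 0.1))
inp = Port("proliferation_rate", "float", "example:proliferation_rate", "1/h", (0.0, 1.0))
bad = Port("glucose", "float", None, "mM")
print(connection_problems(out, inp))  # []
print(connection_problems(out, bad))  # ['annotations refer to different entities', 'unit 1/h != mM']
```

Returning reasons instead of a bare boolean lets the interface explain to the user why two connectors do not light up.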

The search and discovery of models is supported by the workflow engine, which has been configured to contact a specific list of model repositories. As noted previously, the model repositories need to comply with specific architectural constraints, notably the use of TumorML for describing the models and a set of web service interfaces for querying the models, based on the TumorML-defined metadata, and retrieving their definitions. In addition to this model query and retrieval functionality, the workflow engine is responsible for user authentication, the storage and retrieval of workflow definitions, and, last but not least, the execution of user-defined workflows. Since the constructed workflows are directed acyclic graphs, a workflow is executed by first performing a topological sort to determine the proper ordering of the model executions based on their data dependencies (connections). Subsequently, the TumorML descriptions of the models are consulted again to identify their execution requirements, and especially whether they are realized as SBML or as standalone, program-based models. In the case of SBML models, the user is asked to provide additional simulation information, as explained previously, such as the simulation time and the algorithm to be used. Alternatively, for standalone models, the system validates that it can execute the command-line program that represents the model. Such validation includes checking the compatibility of the binary files with the execution framework (e.g., Linux 64-bit), since this is the currently supported operating system and machine architecture.
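The topological sort step can be illustrated with Kahn's algorithm, which also detects cycles, i.e., workflows that are not valid directed acyclic graphs. The model ids and the edge list below are hypothetical. Ties are broken alphabetically only to make the order reproducible; any order consistent with the dependencies would be valid. Python's standard `graphlib.TopologicalSorter` (3.9+) offers the same service.

```python
from collections import deque


def execution_order(models, edges):
    """Kahn's algorithm: order models so each runs after every model that feeds it data.

    models: iterable of model ids; edges: (upstream, downstream) pairs derived from connections.
    """
    models = sorted(models)
    indegree = {m: 0 for m in models}
    downstream = {m: [] for m in models}
    for up, down in set(edges):  # several port connections between one pair count once
        downstream[up].append(down)
        indegree[down] += 1
    ready = deque(m for m in models if indegree[m] == 0)
    order = []
    while ready:
        m = ready.popleft()
        order.append(m)
        for d in sorted(downstream[m]):
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    if len(order) != len(models):
        raise ValueError("workflow contains a cycle; it is not a DAG")
    return order


print(execution_order(
    ["oncosimulator", "egfr", "imaging", "metabolism"],
    [("egfr", "metabolism"), ("metabolism", "oncosimulator"), ("imaging", "oncosimulator")],
))  # ['egfr', 'imaging', 'metabolism', 'oncosimulator']
```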

When the user provides the input parameter values for the workflow and any additional execution information needed, the workflow engine starts evaluating each model based on the given parameter values and the outputs of the preceding models. During the execution of the workflow, the user is able to “log out” of the application, and the execution will continue in a “headless” manner, i.e., running in the background on the server's premises. Alternatively, users can monitor the execution of the workflow, with a visual indication of which models are currently running and which are about to be launched.
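The evaluation loop and the monitoring described above can be sketched as follows. Given an execution order (for instance, from the topological sort described earlier), each model receives the user's values for its free inputs plus the outputs of its upstream models, and a shared status map records progress. The models here are plain Python callables with made-up toy formulas, not the scenario's models, and a thread pool stands in for the server-side background execution; all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(order, models, connections, workflow_inputs, status):
    """Evaluate models in order, wiring upstream outputs into downstream inputs.

    models: id -> callable(dict of inputs) -> dict of outputs (stand-ins for real executions)
    connections: (source model, source port, destination model, destination port)
    workflow_inputs: id -> user-supplied values for that model's free inputs
    status: shared dict updated as models start and finish, so a client can monitor progress
    """
    outputs = {}
    for m in order:
        args = dict(workflow_inputs.get(m, {}))
        for src, sport, dst, dport in connections:
            if dst == m:
                args[dport] = outputs[src][sport]
        status[m] = "running"
        outputs[m] = models[m](args)
        status[m] = "done"
    return outputs


# Toy stand-in models with made-up formulas.
toy_models = {
    "egfr": lambda a: {"plcg_change": 0.2 * a["tgf_alpha"]},
    "metabolism": lambda a: {"proliferation_rate": 0.05 / (1.0 + a["plcg_change"])},
}
toy_connections = [("egfr", "plcg_change", "metabolism", "plcg_change")]
status = {m: "pending" for m in toy_models}

with ThreadPoolExecutor(max_workers=1) as pool:
    job = pool.submit(run_workflow, ["egfr", "metabolism"], toy_models,
                      toy_connections, {"egfr": {"tgf_alpha": 1.0}}, status)
    # A client could disconnect here and poll `status` later; the job keeps running.
    result = job.result()

print(status)  # {'egfr': 'done', 'metabolism': 'done'}
print(result["metabolism"]["proliferation_rate"])
```

In a real deployment the job and its status would live in server-side storage rather than in process memory, so they survive the user's session ending.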

The results of the workflow are available after its successful completion, along with a detailed listing of all the intermediate results and files produced. An execution trace is thus produced and kept in the user's account for future reference, in order to facilitate reproducibility and validation of the workflow.
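One way to make such a trace useful for reproducibility checks is to store, for each executed model, its inputs and outputs together with a content hash computed over a canonical JSON encoding. The sketch below does this with Python's standard `hashlib` and `json` modules; the record layout and function names are hypothetical, not the platform's actual trace format. The timestamp is stored but deliberately excluded from the hash so that two runs can be compared.

```python
import hashlib
import json
from datetime import datetime, timezone


def trace_entry(model_id, inputs, outputs):
    """Record one executed step with a hash of its canonicalized inputs and outputs."""
    payload = json.dumps({"inputs": inputs, "outputs": outputs}, sort_keys=True)
    return {
        "model": model_id,
        "inputs": inputs,
        "outputs": outputs,
        "sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def same_results(trace_a, trace_b):
    """Two runs reproduce each other if every step has the same model and content hash."""
    return ([(e["model"], e["sha256"]) for e in trace_a]
            == [(e["model"], e["sha256"]) for e in trace_b])


run1 = [trace_entry("egfr", {"tgf_alpha": 1.0}, {"plcg_change": 0.2})]
run2 = [trace_entry("egfr", {"tgf_alpha": 1.0}, {"plcg_change": 0.2})]
print(same_results(run1, run2))  # True
archived = json.dumps(run1, indent=2)  # what might be stored in the user's account
```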

V. EXEMPLAR CLINICAL SCENARIO

To test the presented framework and better evaluate the outcome, a complex, clinically relevant scenario is presented as a test case. The scenario addresses glioblastoma multiforme under combined-modality treatment using radiation therapy and chemotherapy with temozolomide. The anonymized data were provided by the Institute of Pathology, University Hospital of Saarland, Germany. The simulated modular tumor is illustrated schematically in Fig. 2. The different modules, which include glioblastoma-specific epidermal growth factor receptor (EGFR) signaling, cancer metabolism, and the Oncosimulator, are indicated as different colored boxes.

Fig. 2. The first compartment of the simulated tumor evolution is the EGFR-related molecular entity, which forms a gene-protein interaction network (grey top box, modified figure from [25], with permission from Elsevier). PLCγ, a downstream element of the EGFR pathway, is used to constrain the rates of its corresponding metabolic reactions in the second, genome-scale metabolic modeling compartment (middle box, modified figure reprinted by permission from Macmillan Publishers Ltd: Nat. Rev. Cancer [24], copyright 2004). The metabolic model estimates the proliferation rate of the glycolytic cancer cells, providing a microscopic parameter to the tissue-level, macroscopic cytokinetic model (lower box).

The presented example reflects the multiscale fusion of three independently developed cancer models (as depicted in Fig. 1), in an attempt to link microscopic, genotype–phenotype characteristics of cancer cells into a macroscopic, tissue-level cancer model. Reprogramming of signaling, gene regulatory, and metabolic pathways is commonly observed in cancer cells, affecting proliferation, migratory response, and other phenotypic characteristics [23], [24]. Furthermore, these microscopic characteristics affect tumor evolution, morphology, invasion, and metastasis, as well as tumor response to treatment.

Although this is not incorporated in the presented case, it should be stressed that in a realistic scenario cells are in constant interaction with their microenvironment, which dynamically shapes their molecular pathways and phenotypic properties. Furthermore, tumors usually consist of heterogeneous cell populations with different traits. Therefore, depending on the structure and variables of the macroscopic model, different instances of the subcellular modeling components (e.g., EGFR signaling and cancer metabolism), corresponding to alternative environmental conditions and/or genetic traits, could be incorporated.

A. EGFR Signaling Pathway-Based Model

EGFR has been implicated in several cancers, including lung cancer, breast cancer, and glioblastoma, yet EGFR activity by itself is not capable of predicting the phenotype of cancer cells. As shown in Fig. 2 (top box), a microscopic model based on the EGFR gene-protein interaction network has been developed [25]. Given initial concentrations of important molecules in the tumor microenvironment, such as glucose, oxygen, and transforming growth factor α (TGFα), the model predicts whether the cell proceeds to proliferation or migration. Specifically, when the change in the concentration of PLCγ, an enzyme that lies downstream in the EGFR pathway, is below the migration threshold, cells proliferate rather than migrate. This key enzyme (depicted with a red arrow in Fig. 2) can be used to link EGFR signaling directly to metabolism through its regulatory effect on the rates of the metabolic reactions it catalyzes.
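The phenotypic switch described above reduces to a threshold rule on the predicted change in PLCγ. The sketch below states it explicitly; the threshold and the ΔPLCγ values are hypothetical placeholders, since the text does not give numbers, and treating a change exactly at the threshold as migration is an assumption because the text only specifies the "below" case.

```python
def egfr_phenotype(delta_plcg: float, migration_threshold: float) -> str:
    """Proliferate when the change in PLCγ is below the migration threshold; otherwise migrate."""
    # Assumption: a change exactly at the threshold counts as migration.
    return "proliferation" if delta_plcg < migration_threshold else "migration"


# Hypothetical threshold and ΔPLCγ values, for illustration only.
THRESHOLD = 0.3
for delta in (0.1, 0.29, 0.3, 0.8):
    print(delta, egfr_phenotype(delta, THRESHOLD))
```

In the linked workflow, the same ΔPLCγ value that drives this decision is also the quantity passed on to constrain the metabolic model.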

B. Cancer Metabolic Model

Fig. 2 (middle box) shows the metabolic shift of highly proliferating cancer cells toward inefficient glycolysis regardless of whether oxygen is present (aerobic glycolysis). This metabolic reprogramming can be modeled using genome-scale computational modeling approaches [26]. Following the work of Shlomi et al. [26], a genome-scale reconstruction of the human metabolic network consisting of 1496 ORFs, 3742 reactions, and 2766 metabolites [27] is used to account for the interconnectivity of the metabolic reactions. In addition, metabolic genes differentially expressed in glioblastoma multiforme [28], [29] are used as flux constraints on the corresponding metabolic reactions to construct a cancer-specific model. The PLCγ concentration predicted by the EGFR signaling-based model is also used to constrain the rates of the corresponding metabolic reactions. The model predicts the reaction rates of the interconnected metabolic network, thus allowing the estimation of important cellular properties of cancer cells, such as oxygen and glucose uptake, lactate production, and proliferation rate, which can feed the macroscopic modeling approaches. The metabolic model also shows that the proliferation rate decreases when the PLCγ-related fluxes increase [30], in accordance with the observed tradeoff between migration and proliferation.
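To illustrate the constraint-based principle at a scale that fits on a page, the following sketch solves a flux-balance problem for a deliberately tiny, hypothetical network with a single internal metabolite. It is not the genome-scale model of [26], [27]; the reaction set, the bounds, and the choice to express the PLCγ coupling as a lower bound on one PLCγ-catalysed flux are all assumptions. Because the toy problem has only one mass-balance constraint, the linear program can be solved exactly by enumerating vertices; a genome-scale model would instead require a dedicated LP solver.

```python
from itertools import product


def fba_one_metabolite(stoich, bounds, objective, tol=1e-9):
    """Maximize objective·v subject to sum(stoich[i] * v[i]) == 0 and
    bounds[i][0] <= v[i] <= bounds[i][1].

    With a single mass-balance row, every vertex of the feasible polytope has
    all fluxes but one at a bound, so enumerating those vertices yields the
    exact linear-programming optimum. Returns (None, None) if infeasible.
    """
    n = len(stoich)
    best_value, best_flux = None, None
    for free in range(n):
        if stoich[free] == 0:
            continue  # cannot solve the balance for a flux that does not appear
        others = [i for i in range(n) if i != free]
        for sides in product((0, 1), repeat=n - 1):  # 0 = lower, 1 = upper bound
            v = [0.0] * n
            for i, side in zip(others, sides):
                v[i] = float(bounds[i][side])
            # Solve the steady-state balance for the remaining (free) flux.
            v[free] = -sum(stoich[i] * v[i] for i in others) / stoich[free]
            lo, hi = bounds[free]
            if not (lo - tol <= v[free] <= hi + tol):
                continue
            value = sum(c * x for c, x in zip(objective, v))
            if best_value is None or value > best_value:
                best_value, best_flux = value, v
    return best_value, best_flux


# Hypothetical toy network around one internal metabolite X:
#   v0: glucose uptake -> X,  v1: X -> biomass,
#   v2: X -> PLCγ-catalysed product,  v3: X -> lactate (secreted)
STOICH = [1, -1, -1, -1]
OBJECTIVE = [0, 1, 0, 0]  # maximize the biomass (proliferation) flux


def max_growth(plc_lower_bound):
    bounds = [(0, 10), (0, 100), (plc_lower_bound, 100), (0, 100)]
    return fba_one_metabolite(STOICH, bounds, OBJECTIVE)[0]


if __name__ == "__main__":
    print(max_growth(0.0))  # unconstrained PLCγ-related flux
    print(max_growth(4.0))  # PLCγ-related flux forced to at least 4
```

Running the script prints 10.0 and then 6.0: forcing flux through the PLCγ-related branch diverts the shared precursor away from biomass, a miniature analogue of the proliferation/migration tradeoff reported in [30].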

C. Oncosimulator: The Macroscopic Tumor Model

In the Oncosimulator [31], the tumor region and surrounding tissue(s) are represented by a 3-D cubic mesh of “geometrical cells” (GCs), the elementary volumes of the mesh. The GCs that belong to the tumor region (occupied GCs) are assumed to contain a population of biological cancer cells. The cancer cells residing within each occupied GC are distributed into five categories: stem cells (unlimited mitotic potential), limited mitotic potential (LIMP) cells, terminally differentiated cells, apoptotic cells (cells that have died through apoptosis), and necrotic cells (cells that have died through necrosis). Each stem or LIMP cell can be either proliferating, residing in one of the cell cycle phases (G1, S, G2, M), or dormant (G0). The macroscopic model adopts a cytokinetic model that incorporates the biological mechanisms of cell cycling, quiescence, differentiation, and loss (spontaneous, starvation-induced, and treatment-induced) [32]. The cytokinetic model regulates the transitions between the considered cell categories/phases. The morphological rules [32] that govern cell movement throughout the tumor volume aim at a realistic simulation of tumor expansion and shrinkage, conformal to the initial shape of the tumor, in the cases of untreated tumor growth and chemotherapy/radiotherapy treatment, respectively.

The proliferation rate of cancer cells estimated by the previously described interconnected signaling and metabolic microscopic models is used as an input parameter to the macroscopic tumor model (depicted with a green arrow in Fig. 2). Based on the tumor imaging data, the occupied GCs are defined and the cancer cell populations residing therein are initialized by assuming a typical cancer cell density of 10⁶ biological cells/mm³ [32]. The glioma imaging data module in Fig. 1 performs a set of image processing tasks to isolate the tumor region of interest that is used as input to the Oncosimulator; this step has been addressed extensively in [33] and [34]. The output of the model is the time evolution of the total population of each cell category comprising the tumor, allowing treatment effectiveness to be evaluated in terms of treatment-induced tumor regression.
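The initialization step, in which occupied GCs are filled at a density of 10⁶ cells/mm³, can be sketched as follows. The Python data layout (a dictionary keyed by mesh index), the GC edge length, and the initial split across cell categories are hypothetical choices for illustration; the sketch also omits the subdivision of stem and LIMP cells into cell cycle phases and G0, as well as the cytokinetic and morphological rules of [32].

```python
from dataclasses import dataclass, field
from enum import Enum

# Typical cancer cell density used to initialize occupied GCs [32].
CELL_DENSITY_PER_MM3 = 1e6


class Category(Enum):
    STEM = "stem"
    LIMP = "limited mitotic potential"
    DIFFERENTIATED = "terminally differentiated"
    APOPTOTIC = "apoptotic"
    NECROTIC = "necrotic"


@dataclass
class GeometricalCell:
    index: tuple  # (i, j, k) position in the 3-D cubic mesh
    populations: dict = field(default_factory=dict)  # Category -> cell count

    def total(self) -> float:
        return sum(self.populations.values())


def initialize_tumor(occupied_indices, gc_edge_mm, initial_fractions):
    """Create occupied GCs and fill them at the assumed cell density.

    initial_fractions maps Category -> fraction of cells; the values are
    hypothetical inputs and must sum to 1.
    """
    if abs(sum(initial_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("initial fractions must sum to 1")
    cells_per_gc = CELL_DENSITY_PER_MM3 * gc_edge_mm ** 3
    mesh = {}
    for idx in occupied_indices:
        populations = {cat: 0.0 for cat in Category}  # every category starts empty
        for cat, frac in initial_fractions.items():
            populations[cat] = frac * cells_per_gc
        mesh[idx] = GeometricalCell(index=idx, populations=populations)
    return mesh


if __name__ == "__main__":
    # Hypothetical example: two occupied GCs with 1-mm edges.
    fractions = {Category.STEM: 0.1, Category.LIMP: 0.6,
                 Category.DIFFERENTIATED: 0.3}
    mesh = initialize_tumor([(0, 0, 0), (1, 0, 0)], 1.0, fractions)
    print(mesh[(0, 0, 0)].total())
```

Keeping per-category counts in each GC mirrors the description above, where the model output is the time evolution of each category's total population summed over all occupied GCs.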

D. Workflow Execution for the Exemplar Scenario

The three models are combined into the scientific workflow shown in Fig. 1. The connections among the models correspond exactly to the biological interactions depicted in Fig. 2. The EGFR signaling pathway-based model is encoded in SBML and is therefore simulated using COPASI. The cancer metabolic model has been implemented as a MATLAB script and is executed by a server installation of the MATLAB engine. Finally, the Oncosimulator is provided as compiled C++ code that accepts its parameters via input files or command-line options. After a small transformation step, specified as a Unix shell script that renders the evolution of the tumor in an image format, the output of the Oncosimulator becomes the final output of the workflow.

Fig. 3. Final output of the scientific workflow combining the EGFR signaling model, the cancer metabolic model, and the Oncosimulator. The number of tumor cells increases until therapy is applied, at which point a steep decrease in the population is observed. The effect of the PLCγ rate on tumor evolution is also depicted. The output of all three models is illustrated in the lower trace (PLC-g = 0.4), whereas the output of the metabolic model (unconstrained with respect to PLCγ-related reactions) and the Oncosimulator is illustrated in the upper trace (PLC-g = 0).
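The data flow of Fig. 1 can be sketched as a small dependency graph executed in topological order, so that each model runs only after the models feeding it have finished. The step names and dependency table below are hypothetical, the lambdas are stand-ins for the actual COPASI, MATLAB, Oncosimulator, and shell-script invocations, and the sketch does not describe how the platform itself schedules or executes workflows.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_workflow(steps, depends_on):
    """Execute workflow steps in dependency order.

    steps: name -> callable taking a dict of upstream outputs.
    depends_on: name -> set of upstream step names.
    """
    outputs = {}
    for name in TopologicalSorter(depends_on).static_order():
        upstream = {dep: outputs[dep] for dep in depends_on.get(name, ())}
        outputs[name] = steps[name](upstream)
    return outputs


# Placeholder steps; in the real workflow these would invoke COPASI, the
# MATLAB engine, the compiled Oncosimulator, and a shell script.
STEPS = {
    "egfr_signaling": lambda up: "PLCγ rate",
    "glioma_imaging": lambda up: "tumor region of interest",
    "metabolism": lambda up: f"proliferation rate given {up['egfr_signaling']}",
    "oncosimulator": lambda up: f"cell populations given {sorted(up)}",
    "render": lambda up: f"tumor evolution image from {sorted(up)}",
}
DEPENDS_ON = {
    "metabolism": {"egfr_signaling"},
    "oncosimulator": {"metabolism", "glioma_imaging"},
    "render": {"oncosimulator"},
}

if __name__ == "__main__":
    for name, result in run_workflow(STEPS, DEPENDS_ON).items():
        print(f"{name}: {result}")
```

Expressing the links as predecessor sets keeps the execution order derivable from the graph alone, so swapping one model for another only changes the entry in the step table.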

Fig. 3 shows the effect on tumor evolution of changing a microscopic parameter (namely, the PLCγ flux constraint in the metabolic model) that drastically affects cellular proliferation time. As mentioned previously, the PLCγ rate is the outcome of the EGFR signaling pathway and is used to constrain the corresponding reactions of the metabolic model. In the first example (dotted red line), the metabolic model is unconstrained with respect to the PLCγ molecule, whereas the second example (dotted blue line) shows the combined outcome of the EGFR signaling model, the cancer metabolic model, and the Oncosimulator.

VI. DISCUSSION

In this paper, we argue in favor of the adoption of the scientific workflow paradigm for the implementation of complex models in the domain of computational biomodeling.

We believe that this is a new application domain for the workflow methodology, where workflows can prove extremely useful. Several popular workflow management systems already exist for the bioinformatics domain. Taverna [35] is probably the best known, and its desktop version has recently been complemented by a social networking website where users can share their Taverna-based workflows [36]. Galaxy is a complete web-based workflow management system that features user-friendly, intuitive, “drag-and-drop” workflow editing functionality [37]. A detailed comparison of the currently available generic workflow systems may be found in [38].

However, neither of these tools focuses primarily on the VPH community concerned with cancer modeling. Our rationale is to design a tool that integrates models that have been tested individually and whose integration yields a valid workflow, in the sense that the biological mechanisms involved are compatible. In the current design, the tool supports linking models provided by research groups located worldwide; to safeguard the accuracy of model simulations, it currently does not allow feedback from one compartment to another, i.e., feedback that would allow parameters (e.g., of the microscopic model) to be re-estimated.
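Because the current design does not allow feedback between compartments, a set of linked models must form a directed acyclic graph. The sketch below shows one way such a restriction could be enforced when links are defined, using Python's standard graphlib module; the function name and model names are hypothetical, and this is not a description of how the platform actually validates workflows.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def validate_links(links):
    """Reject model links that would introduce feedback between compartments.

    links: iterable of (upstream_model, downstream_model) pairs.
    Returns a valid execution order, or raises ValueError on a cycle.
    """
    predecessors = {}
    for upstream, downstream in links:
        predecessors.setdefault(downstream, set()).add(upstream)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as err:
        cycle = err.args[1]  # list of nodes forming the detected cycle
        raise ValueError("feedback loop not allowed: " + " -> ".join(cycle)) from err


if __name__ == "__main__":
    forward = [("egfr_signaling", "metabolism"), ("metabolism", "oncosimulator")]
    print(validate_links(forward))
    try:
        validate_links(forward + [("oncosimulator", "metabolism")])
    except ValueError as exc:
        print(exc)
```

For the forward-only links the function returns ['egfr_signaling', 'metabolism', 'oncosimulator']; adding a link from the Oncosimulator back to the metabolic model closes a loop and is rejected.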

The decision to build a new workflow environment, as opposed to reusing an existing one, was made after a thorough evaluation of existing workflow management systems and of the requirements of the TUMOR project in the computational oncology domain. The most important factor in this decision was architectural: more specifically, the requirement to integrate different model repositories with dynamic, frequently updated content [39]. The goal of a fully web-based deployment was also decisive, as realistic modeling platforms increasingly take advantage of cloud and software-as-a-service (SaaS) models of computation.

The proposed workflow planning software is provided as “open source” upon request. More information may be found at http://tumor-project.eu.

VII. CONCLUSION

Scientific workflows are important infrastructures for the implementation of in silico modeling in cancer research. In this paper, we have described the design and a prototype implementation of a scientific workflow management system to support research in computational oncology. The proposed model linking mechanism is expected to simplify the integration of existing cancer models that are currently developed individually and curated by separate and diverse communities. In the current version, models described in SBML and TumorML may be linked and executed. The final system will, of course, need to be evaluated and further validated by the research community, but in any case it will be distributed as open-source software and provided as an open-access service over the Web.

REFERENCES

[1] G. Stamatakos, “In silico oncology part I: Clinically oriented cancer multilevel modeling based on discrete event simulation,” in Multiscale Cancer Modeling, T. Deisboeck and G. Stamatakos, Eds. Boca Raton, FL, USA: CRC Press, 2011, pp. 407–436.

[2] T. S. Deisboeck, Z. Wang, P. Macklin, and V. Cristini, “Multiscale cancer modeling,” Annu. Rev. Biomed. Eng., vol. 13, no. 1, pp. 127–155, Aug. 2011.

[3] A. Roniotis, V. Sakkalis, I. Karatzanis, M. E. Zervakis, and K. Marias, “In-depth analysis and evaluation of diffusive glioma models,” IEEE Trans. Inf. Technol. Biomed., vol. 16, no. 3, pp. 299–307, May 2012.

[4] V. Sakkalis, A. Roniotis, C. Farmaki, I. Karatzanis, and K. Marias, “Evaluation framework for the multilevel macroscopic models of solid tumor growth in the glioma case,” in Proc. IEEE 32nd Eng. Med. Biol. Soc. Conf., Buenos Aires, Argentina, 2010, pp. 6809–6812.

[5] A. Roniotis, G. C. Manikis, V. Sakkalis, M. E. Zervakis, I. Karatzanis, and K. Marias, “High grade glioma diffusive modeling using statistical tissue information and diffusion tensors extracted from atlases,” IEEE Trans. Inf. Technol. Biomed., vol. 16, no. 2, pp. 255–263, Mar. 2011.

[6] P. P. Delsanto, C. A. Condat, N. Pugno, A. S. Gliozzi, and M. Griffa, “A multilevel approach to cancer growth modeling,” J. Theor. Biol., vol. 250, no. 1, pp. 16–24, Jan. 2007.

[7] L. B. Edelman, J. A. Eddy, and N. D. Price, “In silico models of cancer,” Wiley Interdiscip. Rev. Syst. Biol. Med., vol. 2, no. 4, pp. 438–459, Jul./Aug. 2010.

[8] K. A. Rejniak and A. R. Anderson, “Hybrid models of tumor growth,” Wiley Interdiscip. Rev. Syst. Biol. Med., vol. 3, no. 1, pp. 115–125, Jan./Feb. 2010.

[9] T. Alarcón, H. M. Byrne, and P. K. Maini, “Towards whole-organ modelling of tumour growth,” Prog. Biophys. Mol. Biol., vol. 85, no. 2–3, pp. 451–472, Jun./Jul. 2004.

[10] Z. Wang, C. M. Birch, J. Sagotsky, and T. S. Deisboeck, “Cross-scale, cross-pathway evaluation using an agent-based non-small cell lung cancer model,” Bioinformatics, vol. 25, no. 18, pp. 2389–2396, Sep. 2009.

[11] A. Roniotis, K. Marias, V. Sakkalis, G. Stamatakos, and M. Zervakis, “Comparing finite elements and finite differences for developing diffusive models of glioma growth,” in Proc. IEEE 32nd Eng. Med. Biol. Soc. Conf., Buenos Aires, Argentina, 2010, pp. 6797–6800.

[12] V. Sakkalis, S. Sfakianakis, K. Marias, G. Stamatakos, F. Misichroni, D. Dionysiou, S. McKeever, D. Johnson, T. S. Deisboeck, and N. Graf, “The TUMOR project: Integrating cancer model repositories for supporting predictive oncology,” presented at the Abstract Booklet VPH 2012 Integrative Approaches Comput. Biomed., London, U.K., 2012.

[13] BioSharing Standards. (2009). [Online]. Available: http://www.biosharing.org/standards.

[14] COMBINE—The Computational Biology Network. [Online]. Available: http://mbine.org/

[15] D. Johnson, S. McKeever, G. Stamatakos, D. Dionysiou, N. Graf, V. Sakkalis, K. Marias, Z. Wang, and T. S. Deisboeck, “Dealing with diversity in computational cancer modeling,” Cancer Inf., vol. 12, pp. 115–124, 2013.

[16] C. M. Lloyd, M. D. Halstead, and P. F. Nielsen, “CellML: Its future, present and past,” Prog. Biophys. Mol. Biol., vol. 85, no. 2–3, pp. 433–450, Jun./Jul. 2004.

[17] M. Hucka, F. Bergmann, S. Hoops, S. M. Keating, S. Sahle, and D. J. Wilkinson, “The Systems Biology Markup Language (SBML): Language Specification for Level 3 Version 1,” Nature Precedings, 2010, doi:10.1038/npre.2010.4123.1.

[18] N. Le Novère, A. Finney, M. Hucka, U. S. Bhalla, F. Campagne, J. Collado-Vides, E. J. Crampin, M. Halstead, E. Klipp, P. Mendes, P. Nielsen, H. Sauro, B. Shapiro, J. L. Snoep, H. D. Spence, and B. L. Wanner, “Minimum information requested in the annotation of biochemical models (MIRIAM),” Nat. Biotech., vol. 23, no. 12, pp. 1509–1515, Dec. 2005.

[19] S. Hoops, S. Sahle, R. Gauges, C. Lee, J. Pahle, N. Simus, M. Singhal, L. Xu, P. Mendes, and U. Kummer, “COPASI—A complex pathway simulator,” Bioinformatics, vol. 22, no. 24, pp. 3067–3074, Dec. 2006.

[20] A. C. Ahn, M. Tewari, C. Poon, and R. S. Phillips, “The limits of reductionism in medicine: Could systems biology offer an alternative?” PLoS Med., vol. 3, no. 6, p. e208, May 2006.

[21] A. Belloum, E. Deelman, and Z. Zhao, “Scientific workflows,” Scientific Programming, vol. 14, no. 3, p. 171, 2006.

[22] S. Sfakianakis, V. Sakkalis, K. Marias, G. Stamatakos, S. McKeever, T. S. Deisboeck, and N. Graf, “An architecture for integrating cancer model repositories,” in Proc. IEEE 34th Eng. Med. Biol. Soc. Conf., San Diego, CA, USA, 2012, pp. 6628–6631.

[23] M. G. Vander Heiden, “Understanding the Warburg effect: The metabolic requirements of cell proliferation,” Science, vol. 324, no. 5940, pp. 1029–1033, May 2009.

[24] R. A. Gatenby and R. J. Gillies, “Why do cancers have high aerobic glycolysis?” Nat. Rev. Cancer, vol. 4, no. 11, pp. 891–899, Nov. 2004.

[25] C. Athale, Y. Mansury, and T. S. Deisboeck, “Simulating the impact of a molecular ‘decision-process’ on cellular phenotype and multicellular patterns in brain tumors,” J. Theor. Biol., vol. 233, no. 4, pp. 469–481, Apr. 2005.

[26] T. Shlomi, T. Benyamini, E. Gottlieb, R. Sharan, and E. Ruppin, “Genome-scale metabolic modeling elucidates the role of proliferative adaptation in causing the Warburg effect,” PLoS Comput. Biol., vol. 7, no. 3, p. e1002018, Mar. 2011.


[27] N. C. Duarte, S. A. Becker, N. Jamshidi, I. Thiele, M. L. Mo, T. D. Vo, R. Srivas, and B. O. Palsson, “Global reconstruction of the human metabolic network based on genomic and bibliomic data,” Proc. Nat. Acad. Sci. USA, vol. 104, pp. 1777–1782, Feb. 6, 2007.

[28] A. Wolf, S. Agnihotri, and A. Guha, “Targeting metabolic remodeling in glioblastoma multiforme,” Oncotarget, vol. 1, no. 7, pp. 552–562, Nov. 2010.

[29] C. Colin, N. Baeza, C. Bartoli, F. Fina, N. Eudes, I. Nanni, P. M. Martin, L. Ouafik, and D. Figarella-Branger, “Identification of genes differentially expressed in glioblastoma versus pilocytic astrocytoma using suppression subtractive hybridization,” Oncogene, vol. 25, no. 19, pp. 2818–2826, May 2006.

[30] E. Tzamali, V. Sakkalis, and K. Marias, “The effects of near optimal growth solutions in genome-scale human cancer metabolic model,” in Proc. IEEE Int. Conf. BioInformatics BioEng., Larnaca, Cyprus, 2012, pp. 626–631.

[31] G. S. Stamatakos, E. Kolokotroni, D. Dionysiou, C. Veith, Y. J. Kim, A. Franz, K. Marias, J. Sabczynski, R. Bohle, and N. Graf, “In silico oncology: Exploiting clinical studies to clinically adapt and validate multiscale oncosimulators,” in Proc. IEEE 35th Annu. Int. Conf. Eng. Med. Biol. Soc., Osaka, Japan, Jul. 3–7, 2013, pp. 5545–5549.

[32] G. S. Stamatakos, E. A. Kolokotroni, D. D. Dionysiou, E. Ch. Georgiadi, and C. Desmedt, “An advanced discrete state-discrete event multiscale simulation model of the response of a solid tumor to chemotherapy: Mimicking a clinical study,” J. Theor. Biol., vol. 266, no. 1, pp. 124–139, Sep. 2010.

[33] C. Farmaki, K. Marias, V. Sakkalis, and N. Graf, “Spatially adaptive active contours: A semi-automatic tumor segmentation framework,” Int. J. Comput. Assist. Radiol. Surg., vol. 5, no. 4, pp. 369–384, 2010.

[34] E. Skounakis, C. Farmaki, V. Sakkalis, A. Roniotis, K. Banitsas, N. Graf, and K. Marias, “DoctorEye: A clinically driven multifunctional platform, for accurate processing of tumors in medical images,” Open Med. Informatics J., vol. 4, pp. 105–115, 2010.

[35] T. Oinn, M. Addis, J. Ferris, D. Marvin, T. Carver, M. R. Pocock, and A. Wipat, “Taverna: A tool for the composition and enactment of bioinformatics workflows,” Bioinformatics, vol. 20, no. 17, pp. 3045–3054, Jun. 2004.

[36] C. A. Goble and D. C. De Roure, “myExperiment: Social networking for workflow-using e-scientists,” in Proc. 2nd Workshop Workflows Support Large-Scale Sci., Monterey, CA, USA, 2007, pp. 1–2.

[37] J. Goecks, A. Nekrutenko, J. Taylor, and The Galaxy Team, “Galaxy: A comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences,” Genome Biol., vol. 11, no. 8, p. R86, 2010.

[38] M. Abouelhoda, M. Ghanem, and S. Alaa, “Meta-workflows: Pattern-based interoperability between Galaxy and Taverna,” in Proc. 1st Int. Workshop Workflow Approaches New Data-Centric Sci., 2010.

[39] S. Sfakianakis, V. Sakkalis, K. Marias, G. Stamatakos, S. McKeever, T. Deisboeck, and N. Graf, “An architecture for integrating cancer model repositories,” in Proc. IEEE 34th Annu. Int. Conf. Eng. Med. Biol. Soc., San Diego, CA, USA, Aug. 28–Sep. 1, 2012, pp. 6628–6631.

Authors’ photographs and biographies not available at the time of publication.