
Web service catalogue for Biomedical Grid infrastructure

M. Garcia 1♦, J. Karlsson 2♦, O. Trelles 1*

1. Department of Computer Architecture, University of Malaga, Spain.

2. Research Laboratory, Fundacion IMABIS-Carlos Haya Hospital, Spain.

♦ Both authors contributed equally to this work

* Corresponding author

Abstract. A great variety of services have been developed to address problems in the field of biomedicine. The EU project Advancing Clinico-Genomics Trials on Cancer (ACGT - http://www.eu-acgt.org) provides a Grid-based platform for improved medical knowledge discovery and integration of biomedical data in clinical trials on cancer. Metadata describing biomedical services needs to be shared to enable discovery and service composition (as workflows). This paper reports a catalogue for knowledge-based discovery of service metadata and a software module to wrap existing command line programs as a secure Grid service able to handle sensitive information.

Keywords. Metadata repository discovery, distributed service integration, Grid, service execution

Introduction

Advances in post-genomic research open significant opportunities for personalized healthcare. Personalized healthcare promises individually adapted therapies, complementing diagnoses with gene profiling of the patient. However, clinical trials, which constitute the scientific basis of such healthcare, need increased informatics support. This support includes administrative tasks, trial monitoring, data management and data analysis.

Given technological advances in high-throughput sequencing, clinical trials could require the processing and analysis of massive volumes of data. Distributed and advanced parallel computing has been suggested as an effective way to process such data. Grid computing is particularly suitable for these tasks because of its inherent ability to use computational resources effectively. It has, however, become clear that current Grid architectures need to be augmented with advances from the semantic Web.

Distributed computing, in the form of service-oriented environments (SOA [1]), has become the de facto standard in the bioinformatics field (BioMOBY [2], MyGrid [3], etc.), with a diversity of data types, data formats and analysis tools deployed and freely available. However, this profusion of services causes problems for service discovery and composition (using the results of service invocations as input to other services). Service descriptions focused on syntax and protocol, such as WSDL [4], are not enough and need to be complemented with semantic information.

This paper describes a software component that enables the sharing and discovery of high-level service descriptions. Grid services are described as abstractions of software components that provide data analysis capabilities for clinical trials. These service descriptions include metadata about the service communication protocol (in the form of WSDL descriptions), free-text service documentation, the semantic data types of parameters and functional descriptions of services, which in turn can be exploited for automatic service interoperability.

The work reported in this paper is innovative in the following aspects:

• A common metadata schema is used for the different types of services (see section 3.1)

• A software component to easily develop and deploy secure Grid services based on command line programs (see section 2.2)

• To the best of our knowledge, this is the first repository of tools specifically used for biomedical research.

The objective of this paper is to describe the structure of our semantic descriptions, explain the motivation behind them, and outline how metadata is collected, stored and used in a Grid-based architecture for biomedical research. The utility of our software component is shown through examples of client usage (see section 4).

1. Background

1.1. Sharing and discovering semantic service descriptions

A common approach to deploying such distributed tools is to use Web services, which are designed to support interoperable machine-to-machine interaction over a network. An important side effect of interoperability is the ability to build workflows: predefined, organized lists of Web services that solve complex problems. Tool interfaces must therefore be described in a machine-friendly way. Clients that use these distributed tools need to be able to dynamically discover and use new tools and algorithms. Since the discovery process relies on tool metadata shared in a public repository, tools need to be annotated and registered appropriately so that search engines can use them effectively.

The approach of publishing tool metadata in public repositories is not new, and several approaches have been published in domains other than biomedicine. BioMOBY, myGrid and other systems use this strategy to implement integration architectures. On the Grid side, the Globus framework [5] includes a Monitoring and Discovery System (MDS), a suite of Web services for integrating new resources and tools in distributed systems. MDS focuses on disseminating and gathering information about Grids and Virtual Organizations rather than on elaborating a complete model schema for these tools.

In a context similar to the ACGT architecture, caGrid [6] provides components for biomedical research, including support for publishing, discovery, access and management of data source and tool metadata. Secure access is also implemented by restricting access to services according to Grid credentials and trust levels. However, the caGrid infrastructure is specifically focused on cancer research, and most of its programmatic support covers the structure and semantic aspects of these types of data. The discovery engines are based on complex queries that include input data semantics.

1.2. ACGT Architecture overview

ACGT [7] is an Integrated Project funded under the 6th Framework Programme of the European Commission, within the Action Line “Integrated biomedical information for better health”. It focuses on the domain of cancer research, with the main objective of designing and developing a Grid technology platform for improved medical knowledge discovery and integration of biomedical data.

To meet these objectives, the ACGT team has developed an integrated, Grid-compatible software platform that supports secure, multi-centre clinical trials.

Following a layered design pattern, the ACGT architecture is organized into:

• User access, the highest-level layer, including editors, portals, dedicated clients, etc.

• Business process services, covering knowledge discovery, ontology access, data mediation and any other high-level service required by clients.

• Advanced Grid middleware.

• Common Grid infrastructure, using the Globus toolkit.

• Hardware layer: computational resources, networks, databases, etc.

The advanced Grid middleware provides services for tools to use, making it easier to access the secure environment, manage data and execute jobs inside the Grid.

Note that users can enter the system through either a Web portal or a standalone client. Since a biomedical environment requires a high level of security, security is handled by the Gridge Authorization System (GAS [8]), which supports all necessary operations related to user credential maintenance. Inside the secure Grid environment, these credentials represent the privileges of each user and can be delegated to manage user data, supported by the Data Management System (DMS [8]), and to invoke services that perform the analysis (GRMS [8]).

This architecture clearly needs a specific component that standardizes the diversity of data types and data formats, facilitating the integration of services and allowing the discovery and invocation of tools.

2. System and methods

2.1. Repository Architecture

We have developed a repository for metadata related to service-oriented architectures in Grid systems. The repository database is accessed through the RepoServices API. RepoServices is split into three main sections: tools, functional categories and data types.

Each section is implemented in several layers (see Figure 1):

• Data storage layer: The metadata schema is implemented as MySQL database tables. Scripts have been created that allow the construction of the database instance.

• Persistence layer: Hibernate mappings are used to create persistent Java objects. The properties of the Java objects are mapped to columns in the database tables.

• Web service layer: Axis [9] is the Web service engine and Tomcat [10] is used as the Servlet container of the deployment. This layer implements the methods for management of credentials inside the secure environment.

RepoServices can be accessed locally (persistence layer) or remotely (Web service layer). The repository is deployed as two different instances: a production instance (more stable) and a development instance (more up to date), which allows testing before public deployment. Details about the instances, and about how to install and connect to this software, are available as supplementary material at http://www.bitlab-es.com/repository.

The ACGT tool metadata repository can be used to perform the following tasks:

• Publish (register) tool metadata by service providers.

• Find (discover) tool metadata by service clients.

• Provide necessary details for service clients to invoke tools.

• Modify existing tool metadata.

• Retrieve all tool metadata (for metadata browsing tools).
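The tasks listed above can be illustrated with a minimal in-memory sketch. It is not the RepoServices API: the class names `ToolMetadata` and `ToolCatalogue`, their fields and methods, and the example endpoints are hypothetical and chosen only for illustration. The real repository stores this metadata in MySQL through Hibernate rather than in a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class ToolMetadata:
    """Hypothetical, heavily reduced tool record (not the ACGT schema)."""
    name: str
    description: str
    endpoint: str
    categories: set = field(default_factory=set)


class ToolCatalogue:
    """In-memory stand-in for the tool section of a metadata repository."""

    def __init__(self):
        self._tools = {}

    def publish(self, tool):
        # Register new metadata; refuse to silently overwrite an entry.
        if tool.name in self._tools:
            raise ValueError(f"tool already registered: {tool.name}")
        self._tools[tool.name] = tool

    def find(self, keyword):
        # Discover tools whose name or description mentions the keyword.
        needle = keyword.lower()
        return [t for t in self._tools.values()
                if needle in t.name.lower() or needle in t.description.lower()]

    def invocation_details(self, name):
        # Details a client needs to invoke the tool (here only the endpoint).
        return self._tools[name].endpoint

    def modify(self, name, **changes):
        # Update selected fields of an existing entry.
        tool = self._tools[name]
        for attr, value in changes.items():
            if not hasattr(tool, attr):
                raise AttributeError(attr)
            setattr(tool, attr, value)

    def retrieve_all(self):
        # Full listing, e.g. for metadata browsing tools.
        return sorted(self._tools.values(), key=lambda t: t.name)


catalogue = ToolCatalogue()
catalogue.publish(ToolMetadata("normalize", "Normalizes gene expression data",
                               "http://example.org/ws/normalize"))
catalogue.publish(ToolMetadata("cluster", "Hierarchical clustering of samples",
                               "http://example.org/ws/cluster"))
print([t.name for t in catalogue.find("expression")])
```

Keeping the five tasks as separate methods mirrors the list above; a persistent implementation would replace the dictionary with database access while leaving the client-facing operations unchanged.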

Figure 1. Connectivity of the repository components. The data storage layer corresponds to the DB Server component. The data access layer, with object persistence, is managed by the Persistence Metadata Manager. The WS Metadata Manager exposes the same functionality as the Persistence Metadata Manager, but through a Web service interface.

[Figure 1 diagram. Labels recoverable from the extraction: Servlet Container / Web Server hosting the Tool, DataType and FunctionalCategory Administration artifacts; WS Metadata Manager components (port 8080, SOAP); Persistence Metadata Manager components (Hibernate); DB Server hosting the Tool, DataType and FunctionalCategory Repository components (port 3306, JDBC); components linked by <<use>> relations.]

2.2. Integration of command line programs

On the one hand, command line programs (CLPs) are the most common way to deploy computational services inside a Grid infrastructure. On the other hand, the standard definition of Web services allows automatic intercommunication between services. To exploit both the large number of available CLPs and the advantages of standard definitions, we developed a general interface using Axis as the SOAP engine and Tomcat as the Servlet container. Once interfaced, the Web services are registered in the repository.

This idea is not new, and other integration developments such as Opal [11] and gRAVI [12] offer similar wrapping options for scientific applications. In Opal, job status is not transparent to end users, and long data transfers and publishing need improvement. gRAVI provides a much better visual interface and covers many well-known Grid computing technologies (GridFTP, WSRF, etc.), but it makes ordinary tool registration considerably more complex. In any case, the ACGT environment has specific requirements (see below) that led us to the particular method used to integrate CLPs.

2.2.1. Secure access

All the available resources in the Grid (hosts, data, instruments, etc.) are accessed by many different users from many different organizations and places. At the low level of the Grid, the Globus toolkit provides a Grid Security Infrastructure (GSI) to address the resulting security issues, providing authentication with digital certificates, credential delegation and transport/message-level security.

ACGT makes use of the Gridge Authorization Service (GAS) to support authorization operations in Grid space. Our tools use GAS to request the credentials of the users who want to use them. These credentials are used to control the delegation of rights among the different components of the architecture.

2.2.2. Using file system

The Data Management Suite (DMS) centralizes all user data, providing fast access to and management of large amounts of data. User credentials obtained with GAS are used to authorize operations on this data. DMS can also manage metadata about the data. This metadata can be annotated directly from tools (execution times, author, dates, etc.) using different schemas, such as Dublin Core or new schemas defined by the user.

2.2.3. Execution

To manage the whole process of remote job submission, our tools use the Gridge Resource Management System (GRMS), which offers a friendly interface to launch, resume and monitor jobs.

2.2.4. CLP life cycle

Two main steps are required to include a new CLP: first, install the selected CLP on available servers in the Grid; second, register the service metadata in the ACGT Metadata Repository. Once the CLP is registered, the ACGT environment can use the service metadata to discover and execute it.

Registering. The ACGT portal at http://rd.siveco.ro/acgt is the endpoint for registering new tool metadata (see section 3 for more details). Providers have to check the available data types and functional categories and, if this metadata does not cover their needs, they must add new entries. Parameter information is later used by clients to build service interfaces automatically, and the tool location includes the endpoint information, which is essential in the execution step.

Figure 2. Simplified UML sequence diagram of events. See the main text for a detailed description; a full version of this diagram is available as supplementary material.

Discovering. Client applications with the correct credentials may access the metadata repository to discover tools based on descriptions, input/output data types, functional categories, etc. Service discovery is typically performed using the Magallanes [13] application (see section 4.4).

Generic Web service for CLP. This is the default Web service for executing CLPs. The CLPs themselves are registered as abstract tools, each with the endpoint of the actual Web service (the generic Web service). This service has been developed using Axis as the SOAP engine and Tomcat as the Servlet container. The Servlet container must comply with the secure access restrictions of ACGT. Currently, this Web service is installed on two servers in the ACGT Grid. The client can launch and control execution using three operations available in the generic Web service:

• RunAsync. Launches the execution in GRMS with the following parameters: host, relative path, DMS identifiers for inputs, options and metadata for generated output files (MIME types, data types, author, etc.). Sets of data files can also be used as input and output.

• GetJobStatus. Returns the current status of the execution.

• GetResult. Retrieves the DMS identifiers of the generated outputs. Collections are also supported.

Figure 2 shows a simplified UML sequence diagram of events between the software components. Assuming a scenario where the user is correctly identified by his or her credentials, the client program gets tool metadata from the ACGT Metadata Repository. With this metadata, the client has enough information to build a user interface in the application. The client program requests user data through the interface and creates a RunAsync call to the generic Web service. In this operation, a jobId is generated to be used in subsequent operations. The generic Web service then retrieves the necessary information about the DMS files and creates a job description to be submitted to the Grid through GRMS. The client application can monitor the current status of the job using the GetJobStatus operation. When the status is Finished, the client can retrieve the results using the GetResult operation.
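The launch, poll and retrieve sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ACGT client code: `FakeGenericService` is a hypothetical in-memory stand-in for the generic Web service, the method names are Python-style renderings of the three operations, the `Running` status and the `.out` suffix for outputs are invented, and credentials, SOAP transport and output metadata are omitted.

```python
import itertools
import time


class FakeGenericService:
    """Hypothetical in-memory stand-in for the generic CLP Web service."""

    def __init__(self, polls_until_finished=2):
        self._ids = itertools.count(1)
        self._jobs = {}
        self._polls = polls_until_finished

    def run_async(self, host, relative_path, input_ids, options):
        # Returns a job identifier to be used in subsequent operations.
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = {"remaining": self._polls,
                              "inputs": list(input_ids)}
        return job_id

    def get_job_status(self, job_id):
        # Reports "Running" a fixed number of times, then "Finished".
        job = self._jobs[job_id]
        if job["remaining"] > 0:
            job["remaining"] -= 1
            return "Running"
        return "Finished"

    def get_result(self, job_id):
        # Pretend each input produced one output file stored in DMS.
        return [f"{dms_id}.out" for dms_id in self._jobs[job_id]["inputs"]]


def run_and_wait(service, host, relative_path, input_ids, options,
                 poll_seconds=0.0, max_polls=100):
    """Launch a job, poll its status, and return the output identifiers."""
    job_id = service.run_async(host, relative_path, input_ids, options)
    for _ in range(max_polls):
        if service.get_job_status(job_id) == "Finished":
            return service.get_result(job_id)
        time.sleep(poll_seconds)
    raise TimeoutError(f"{job_id} did not finish after {max_polls} polls")


outputs = run_and_wait(FakeGenericService(), "grid-node", "bin/tool",
                       ["dms-1", "dms-2"], "--fast")
print(outputs)
```

Bounding the number of polls keeps a client from waiting forever on a job that never reaches the Finished status; a real client would also need to handle failure statuses, which the text does not enumerate.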

3. Results

3.1. Metadata schema

A well-defined metadata schema is essential to enable tool providers to publish metadata descriptions for their tools and to enable clients to find (discover) a tool based on its metadata. The scope of the tool catalogue includes descriptions of tools and their operations (specific functions of a tool) (see section 3.1.1), data types (see section 3.1.2) and functional categories (see section 3.1.3). In the following subsections, only the main concepts of the schema are described. The complete metadata schema can be found in the supplementary material at http://www.bitlab-es.com/repository. Figure 3 shows a simplified view of the objects used in the persistence metadata layer.

[Figure 3 diagram. Labels recoverable from the extraction: the Persistence Metadata Manager component with FunctionalCategoryManager, DataType Manager and Tool Manager, each offering retrieve(), save() and delete(); entities FunctionalCategory, FunctionalCategoryGraph, DataType, DataTypeGraph, Service, Operation, Parameter, ServiceLocation, CompuResource and Quality; entities linked by <<use>> relations.]

Figure 3. Overview of object relations in the Persistence Metadata Manager, showing only the main entities.

3.1.1. Tools

A Tool represents a group of software components that can be used to solve a specific type of problem and acts as a container of operations with closely related functionality. An operation is a software sub-component that solves a specific problem and has several parameters; either input or output. Each parameter is associated with a specific data type (see section 3.1.2).

The metadata for tools includes the author and authority (author’s affiliation). Each tool and operation can be associated with human-readable descriptions (long and short description). The short description is intended for quick browsing purposes and the longer version is intended for users that wish to more carefully study tool/operation documentation.
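The containment relation described above (a tool holds operations, an operation holds input and output parameters, and each parameter refers to a data type) can be sketched with plain data classes. The class and field names below are hypothetical simplifications, not the actual ACGT schema or its Hibernate-mapped objects, and the example values are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    name: str
    data_type: str   # name of a data type in the shared taxonomy
    is_input: bool   # True for an input, False for an output


@dataclass
class Operation:
    name: str
    short_description: str = ""   # for quick browsing
    long_description: str = ""    # for detailed documentation
    parameters: List[Parameter] = field(default_factory=list)

    def inputs(self):
        return [p for p in self.parameters if p.is_input]

    def outputs(self):
        return [p for p in self.parameters if not p.is_input]


@dataclass
class Tool:
    name: str
    author: str
    authority: str                # the author's affiliation
    short_description: str = ""
    long_description: str = ""
    operations: List[Operation] = field(default_factory=list)


# Invented example values, for illustration only.
tool = Tool(
    name="clustering-tool",
    author="A. Author",
    authority="example.org",
    short_description="Clusters expression profiles",
    operations=[
        Operation(
            name="cluster",
            parameters=[
                Parameter("matrix", "ExpressionMatrix", is_input=True),
                Parameter("tree", "Dendrogram", is_input=False),
            ],
        )
    ],
)
print([p.data_type for p in tool.operations[0].inputs()])
```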

The tool metadata also specifies the type of the tool. Several types of tools are currently supported: traditional SOAP services, BioMOBY services, BPEL [14] workflows and secure ACGT services (based on command line programs (see section 2.2) and the R package [15]).

Traditional SOAP Web services can include their WSDL descriptions when publishing. The WSDL contains all information needed for the client to bind to the service (such as the protocol specifics and endpoint). For BioMOBY Web services, WSDL is not used to specify the format of the data. In this case, the data type metadata is used to infer the exact format according to the BioMOBY specification.

Workflows are viewed as abstract tools (“black boxes”) which require inputs and produce outputs. Workflow metadata includes additional metadata such as an image representing the workflow and definition (in BPEL format).

For ACGT services based on command line program execution, the schema includes the command to execute (path). R-Script ACGT services include the R script code, which is retrieved by the service and executed on the Grid.

As a proof of concept, we have deployed and registered in the repository 30 services in the field of gene expression data processing and more than 30 general bioinformatics services (see Table 1). These tools share the repository with R-scripts and with queries to a mediator developed by other members of the ACGT consortium. This mediator [16] allows querying external databases and ontologies.

Table 1. Summary of current tools registered in the repository. The number of services can vary between development and stable versions of the repository.

Type of service | Services available | Scope
BioMOBY services | 31 | General biomedical and bioinformatics Web services.
Command line programs | 32 | Pre-processing methods to identify and remove sources of systematic and random variation in the measured gene expression data, and exploratory data analysis and visualization.
Mediator queries | 15 | Predefined queries on Ontology search.
R-Scripts | 50 | Statistical treatment of data, graphical techniques and data mining methods.

3.1.2. Data types

The data type metadata defines a shared taxonomy of data types. This enables tool composition (combination). The taxonomy follows the object oriented paradigm where data types are related to other data types. Data types can inherit parts from another data type and add additional structure. Additionally, a new data type may include (contain) or be arrays of other data types.

The interpretation of such relations between data types is domain specific for the service type. For example, BioMOBY Web services would interpret these relations as directly specifying the data format (internal structure of file data). For generic SOAP services, these metadata would only be used to specify a hierarchy of data types without any assumptions of the data formats (which are specified in the WSDL descriptions).
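The inheritance relation between data types can be sketched as a parent map with an "is-a" test. This is only an illustration: the class `DataTypeTaxonomy` and the data type names are hypothetical, and the sketch covers inheritance only, not the containment and array relations mentioned above.

```python
class DataTypeTaxonomy:
    """Single-inheritance taxonomy stored as a child -> parent map."""

    def __init__(self):
        self._parent = {}

    def add(self, name, parent=None):
        # A parent must be registered before its children, so the
        # structure can never contain a cycle.
        if name in self._parent:
            raise ValueError(f"data type already defined: {name}")
        if parent is not None and parent not in self._parent:
            raise KeyError(f"unknown parent data type: {parent}")
        self._parent[name] = parent

    def ancestors(self, name):
        # Walk up the hierarchy, nearest ancestor first.
        chain = []
        current = self._parent[name]
        while current is not None:
            chain.append(current)
            current = self._parent[current]
        return chain

    def is_a(self, name, candidate):
        # True when `name` is `candidate` or inherits from it.
        return name == candidate or candidate in self.ancestors(name)


taxonomy = DataTypeTaxonomy()
taxonomy.add("Object")                      # hypothetical root type
taxonomy.add("Sequence", parent="Object")
taxonomy.add("DNASequence", parent="Sequence")
print(taxonomy.ancestors("DNASequence"))
```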

3.1.3. Functional descriptions

Tools can be associated with one or more functional categories. A functional category is a keyword that describes the function of the tool. Functional categories can be related to other functional categories to create a taxonomy of keywords. If the keywords are arranged in a hierarchical structure, clients can discover the tools annotated with a generic functional keyword together with those annotated with any keyword inheriting from it. For example, if the functional category taxonomy consists of the keyword “clustering” and two sub-keywords inheriting from it, “hierarchical clustering” and “k-means clustering”, searching for a tool annotated with “clustering” would also return tools annotated with “hierarchical clustering” and “k-means clustering”.
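The clustering example can be sketched as a search that first expands the requested keyword to all of its descendants and then matches tool annotations against that set. The function names and the tool names are hypothetical; only the three clustering keywords come from the example above.

```python
def descendants(children, root):
    """Return `root` plus every category that inherits from it."""
    found, stack = {root}, [root]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def find_tools(annotations, children, category):
    """Tools annotated with `category` or any inheriting category."""
    wanted = descendants(children, category)
    return sorted(tool for tool, cats in annotations.items() if cats & wanted)


# Category taxonomy: parent keyword -> sub-keywords.
children = {"clustering": ["hierarchical clustering", "k-means clustering"]}

# Tool name -> set of functional categories (invented tools).
annotations = {
    "generic-cluster": {"clustering"},
    "hclust-tool": {"hierarchical clustering"},
    "kmeans-tool": {"k-means clustering"},
    "normalizer": {"normalization"},
}

print(find_tools(annotations, children, "clustering"))
# prints ['generic-cluster', 'hclust-tool', 'kmeans-tool']
```

Searching for the more specific keyword "k-means clustering" returns only `kmeans-tool`, since expansion goes down the hierarchy and never up.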

4. Discussion and conclusions

We have designed a software component that allows service providers to share service descriptions, including semantics. Compatibility between services is facilitated by enforcing the use of a shared data type taxonomy.

4.1. Scalability

An important issue for tool catalogues is the ability to curate the registry. Two approaches are readily apparent: (a) free registration and (b) controlled registration. MOBY Central at the University of Calgary [17], the reference catalogue for BioMOBY services, uses the first strategy. Unsurprisingly, it has collected a large number of registrations: 1600 services and nearly 800 data types by January 2009. However, service providers are in general reluctant to reuse existing data types, and the general trend is twofold: either define a new data type or use the most general one (valid for most cases). As can be observed in Figure 4, close to 50% of services use the most general “object” data type, at the top of the data type hierarchy, and a profusion of data types has been incorporated (the need for more than 800 different types of biological data is hard to justify). An extremely important side effect of this practice is that services declared as using a general data type (e.g. object) cannot be specifically interconnected with other services, and new data types used only by the service that declares them have no other compatible services. Thus the main strengths of a data type taxonomy remain underused.

Learning from this experience, we devised a two-step approach. In the first step, the service provider suggests the data types that best fit the service, or proposes the definition of a new data type (or even a new branch of the taxonomy). In the second step, an internal committee decides whether the incorporation is appropriate or suggests alternative solutions. This strategy is more costly and demanding, lengthens the registration process and limits the flexibility of service providers, but it yields a powerful, standard and coherent set of definitions. The end result is faster composition and more alternatives for orchestrating complex workflows.

The ACGT repository implements the controlled registration procedure as a good practice aimed at further improving interoperability.

Figure 4. Number of services using the different data types (Moby Central repository). Nearly 50% of services use the generic “object” data type, and a large number of services are associated with specific data types that cannot be interconnected with other services for further data processing.

Note that the data type approach proposed in this paper is a taxonomy, or hierarchical organization, rather than an ontology. The ACGT platform supports Cancer Ontology terms [18], which can be exploited through other mediators.

4.2. Compatibility with standard Web service descriptions

The metadata schema we developed is strongly influenced by the BioMOBY metadata. However, one requirement during development was the ability to modularize the metadata; to store the metadata on different servers and to be able to combine the metadata according to the needs of a specific project. For example, some projects might need only data types; some might need data types and services etc. Therefore, we extended the BioMOBY metadata with additional documentation details and split the metadata schema in several modules with various degrees of independence.

The default way to describe SOAP services is by WSDL descriptions. However, WSDL files are normally used to describe a single service while our repository is intended to store a larger catalogue of services (and other types of tools). Concepts in our metadata schema correspond to WSDL artifacts: Tools can be said to be similar to the WSDL document itself (both are wrappers of specific functionalities). Operation is an abstract concept (specific functionality) and has a similar role as WSDL port-types. Parameters in our schema are connected to an operation and a data type, so they match WSDL messages. Tool location metadata is used for a similar purpose as WSDL bindings. Data types in the ACGT repository correspond to WSDL types (XML Schema).

However, our metadata descriptions also extend the information from WSDL with:

• A shared data type taxonomy, which allows clients to dynamically discover compatible tools, where the output of one tool can be used as input for another tool that uses the same or a derived data type (via inheritance).

• Functional categories which describe the functionality of a tool. This is useful if the client wishes to discover only services that perform a certain task.

The situation regarding data types is more complex. WSDL traditionally uses XML Schema to describe the structure of the XML data for the inputs and outputs of a service. XML Schema is a more expressive data description format than the approach taken in our metadata schema. However, we believe it is very useful to maintain a shared hierarchy of data types in order to more easily answer discovery (find) queries such as “show all tools that have operations with data type X as part of the input”. In this sense, the data type taxonomy can be said to describe the semantics of input and output parameters instead of specifying exact data formats (except for BioMOBY services, see section 3.1.2).
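Two discovery queries of this kind can be sketched over a small, invented catalogue: the literal query quoted above ("tools with data type X as part of the input") and a compatibility query that uses inheritance. The function names, tool names and data types are hypothetical. The compatibility rule is one possible reading of the text and is an assumption: an output is accepted when its type equals the consumer's declared input type or inherits from it.

```python
def is_a(parents, data_type, ancestor):
    """True when `data_type` equals `ancestor` or inherits from it."""
    while data_type is not None:
        if data_type == ancestor:
            return True
        data_type = parents.get(data_type)
    return False


def tools_with_input(tools, data_type):
    """Tools that declare `data_type` among their inputs."""
    return sorted(name for name, (inputs, _outputs) in tools.items()
                  if data_type in inputs)


def compatible_successors(tools, parents, producer):
    """Tools that can consume at least one output of `producer`."""
    _inputs, outputs = tools[producer]
    return sorted(
        name for name, (inputs, _outs) in tools.items()
        if name != producer
        and any(is_a(parents, out, needed)
                for out in outputs for needed in inputs)
    )


# Invented taxonomy: data type -> parent data type.
parents = {"Object": None, "Identifier": "Object", "Sequence": "Object",
           "DNASequence": "Sequence", "Alignment": "Object",
           "Image": "Object"}

# Invented catalogue: tool -> (input data types, output data types).
tools = {
    "fetch_dna": (["Identifier"], ["DNASequence"]),
    "align": (["Sequence"], ["Alignment"]),
    "draw_alignment": (["Alignment"], ["Image"]),
}

print(tools_with_input(tools, "Sequence"))
print(compatible_successors(tools, parents, "fetch_dna"))
```

In this example `fetch_dna` produces a `DNASequence`, which inherits from `Sequence`, so `align` is reported as a compatible successor even though its declared input type differs from the producer's output type.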

4.3. Coverage

Note that the ACGT architecture (see section 1.2) includes many other components which also deal with metadata and thereby complement the data stored in the tool catalogue. For example, Virtual Organizations (VOs) contain metadata regarding a set of members (corresponding to users). Rights to use data or services are decided on the basis of the VO membership of users. The Data Management System (DMS) in the ACGT architecture stores the data files and is able to annotate the files with metadata (for example, simple provenance metadata).

4.4. Demonstration of suitability (usage and future work)

As a demonstration of the suitability of the catalogue, we describe several components and clients that rely on the catalogue metadata implemented in the context of ACGT Grid architecture.

The ACGT portal presents the ACGT system to end users. Several portlets (specific type of Web pages) have been developed to allow registry administrators to manage tool metadata. These pages provide a graphical interface to the metadata in the tool catalogue. Additionally, there are portlets that allows users to explore the tools, organized according to their functional categories and the data type taxonomies. Additionally, a novel service discovery component (Magallanes) has been integrated as a portlet. This component allows users to supply keywords which are matched (exactly or by approximation) to service and data type descriptions. The component is also able to learn from user selections which are used to continuously improve quality of the search results.

The workflow editor and enactor (AWE [19]) connects to the repository for retrieving tool metadata. AWE visualizes tool metadata in a browsable tree. Users can select tools from the tree and combine in a workflow. The editor uses metadata for the operations and parameters to verify the consistency of the workflow. When the workflow is saved, AWE collects the required information (such as the WSDL definitions) and includes them as a BPEL workflow. This workflow contains all information needed to enact the workflow.

jORCA [20] (http://www.bitlab-es.com/jorca) is a portable desktop client that can be customized to cover a broad range of user skills. The client provides several features: access to several repositories with different protocols; searching for compatible tools based on user data or keywords; an embedded file system for handling local user files; and user-defined lists of favourite tools.

Magallanes is also available as a stand-alone desktop application. Besides the functionality included within the ACGT portal, the application is able to use metadata (in particular parameter data types) to generate possible service compositions (workflows) based on user specifications of desired input and output data types.
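
Composition driven by parameter data types can be illustrated as a search over a graph in which data types are nodes and tools are edges. The sketch below uses a breadth-first search over hypothetical tools with one input and one output each; it is a simplified illustration and not the composition algorithm described in [13].

```python
from collections import deque

# Hypothetical single-input, single-output tools: name -> (input type, output type).
TYPED_TOOLS = {
    "alignSequences": ("FASTA", "Alignment"),
    "buildTree": ("Alignment", "Newick"),
    "renderTree": ("Newick", "Image"),
    "translate": ("FASTA", "ProteinFASTA"),
}


def compose(tools, source_type, target_type):
    """Return the shortest chain of tool names that converts source_type
    into target_type, or None if no such chain exists."""
    queue = deque([(source_type, [])])
    seen = {source_type}
    while queue:
        current_type, path = queue.popleft()
        if current_type == target_type:
            return path
        # Iterate in sorted order so the result is deterministic.
        for name in sorted(tools):
            in_type, out_type = tools[name]
            if in_type == current_type and out_type not in seen:
                seen.add(out_type)
                queue.append((out_type, path + [name]))
    return None
```

Given the desired input type "FASTA" and output type "Newick", the sketch proposes the chain `alignSequences` followed by `buildTree`; a request for which no chain exists returns `None`.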

Acknowledgements

Funding: This work has been partially financed by the National Institute for Bioinformatics (www.inab.org), a platform of Genoma España; the EC project "Advancing Clinico-Genomic Trials on Cancer" (Contract No. 026996); and the RIRAAF Spanish network on allergies (RD07/0064/0017).

References

[1] W3C - SOAP specifications (http://www.w3.org/TR/soap/)

[2] Wilkinson, M.D. et al. The Bio-MOBY Project Explores Open-Source, Simple, Extensible Protocols for Enabling Biological Database Inter-operability. Proceedings Virtual Conference Genomic and Bioinformatics (3):16-26. (ISSN 1547-383X). 2003.

[3] Stevens, R.D., et al. myGrid: personalised bioinformatics on the information Grid. Bioinformatics, 19, (Suppl. 1), i302–i304. 2003.

[4] Web Services Description Language (WSDL) http://www.w3.org/TR/wsdl

[5] I. Foster. Globus Toolkit Version 4: Software for Service-Oriented Systems. IFIP International Conference on Network and Parallel Computing, Springer-Verlag LNCS 3779, pp 2-13. 2006.

[6] Oster, S. et al. caGrid 1.0: An Enterprise Grid Infrastructure for Biomedical Research. J Am Med Inform Assoc. 2007.

[7] EU project Advancing Clinico-Genomics Trials on Cancer (ACGT - http://www.eu-acgt.org/)

[8] Pukacki, J. et al. Programming Grid Applications with Gridge. Computational Methods in Science and Technology; 12(1), 47-68. 2006.

[9] Axis, an implementation of the Simple Object Access Protocol (SOAP) http://ws.apache.org/axis/

[10] Apache Tomcat, an open source software implementation of Java Servlet and JavaServer Pages technologies http://tomcat.apache.org/

[11] Sriram, K. et al. Opal: Simple Web Services Wrappers for Scientific Applications. In proceedings of ICWS 2006, IEEE International Conference on Web Services. 2006.

[12] Chard, K. et al. Wrap Scientific Applications as WSRF Grid Services Using gRAVI. ICWS '09: Proceedings of the 2009 IEEE International Conference on Web Services. 2009.

[13] Ríos, J, et al. Magallanes: a Web services discovery and automatic workflow composition tool. BMC Bioinformatics 10:334. 2009

[14] Web Services Business Process Execution Language Version 2.0, (http://docs.oasis-open.org/wsbpel/2.0/wsbpel-v2.0.pdf)

[15] R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0. 2005

[16] Martín, L. et al. Data Access and Management in ACGT: Tools to Solve Syntactic and Semantic Heterogeneities Between Clinical and Image Databases. Lecture Notes in Computer Science, Volume 4802/2007, pag 24-33. 2007. 978-3-540-76291-1

[17] MOBY Central at the University of Calgary http://moby.ucalgary.ca/moby/MOBY-Central.pl

[18] Smith, B. et al. Establishing and harmonizing ontologies in an interdisciplinary health care clinical research environment. EHealth: Combining Health Telematics, Telemedicine, Biomedical Engineering and Bioinformatics on the Edge. Global Expert Summit Textbook. IOS Press, Amsterdam, 219-234.

[19] Stelios, S. et al, "Web-Based Authoring and Secure Enactment of Bioinformatics Workflows," gpc, pp.88-95, 2009 Workshops at the Grid and Pervasive Computing Conference, 2009

[20] Martín-Requena, V. et al. jORCA: easily integrating bioinformatics Web Services; Bioinformatics 2010 26(4):553-559; doi:10.1093/bioinformatics/btp709