semantic and personalised service discoverystevensr/papers/all-hands-discovery.pdf · molecular...

8
Semantic and Personalised Service Discovery Phillip Lord 1 , Chris Wroe 1 , Robert Stevens 1 ,Carole Goble 1 , Simon Miles 2 , Luc Moreau 2 , Keith Decker 2 , Terry Payne 2 and Juri Papay 2 1 Department of Computer Science University of Manchester Oxford Road Manchester M13 9PL UK and 2 School of Electronics and Computer Science University of Southampton Southampton SO17 1BJ UK [email protected], [chris.wroe,robert.stevens,carole]@cs.man.ac.uk, [sm,l.moreau,ksd,trp,jp]@ecs.soton.ac.uk http://www.mygrid.org.uk Abstract One of the most pervasive classes of services needed to support e-Science applications are those responsi- ble for the discovery of resources. We have devel- oped a solution to the problem of service discovery in a Semantic Web/Grid setting. We do this in the context of bioinformatics, which is the use of compu- tational and mathematical techniques to store, man- age, and analyse the data from molecular biology in order to answer questions about biological phe- nomena. Our specific application is my Grid (http: //www.mygrid.org.uk) that is developing open source, service-based middleware upon which bioin- formatics applications can be built. my Grid is specif- ically targeted at developing open source high-level service Grid middleware for bioinformatics. 1 Introduction Service discovery, the process of locating services, devices and resources, is an essential requirement for any distributed, open, dynamic environment. Al- though traditional service discovery methods may be effective when a priori knowledge of the services or agreements about implicitly shared ontologies can be assumed, they fail to scale to large, dynamic, open, environments, where a high degree of autonomy is re- quired. Semantic web service discovery overcomes this limitation by providing an ontological framework by which services may be described and processed. Whilst this is equally applicable to Grid and e-Science domains, these domains impose additional require- ments on the service discovery process, beyond sim- ply locating a service based on a description of its functionality. This paper examines the issues, and proposes a hybrid solution to the task of semantic web service discovery within the context of a Bioin- formatics Grid domain. This domain uses compu- tational and mathematical techniques to store, man- age, and analyse data from molecular biology in or- der to answer questions about biological phenomena. Our specific application is my Grid (http://www. mygrid.org.uk). Molecular biology is about collecting, comparing and analysing information from experimental data sets. Traditionally, these (typically small) data sets are manually obtained from specific “wet” bench ex- periments designed to test a specific hypothesis. In silico experimentation has allowed molecular biolo- gists to obtain relatively large datasets, by conducting experiments purely through computer based analysis of existing experimental data and associated knowl- edge to test a hypothesis, derive a summary, search for patterns or to demonstrate a known fact. Thus, experiments can be performed on a complete genome rather than an individual gene; to model the behaviour of a cell’s complement of genes, rather than one gene; and to compare between species rather than within

Upload: others

Post on 02-Jun-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Semantic and Personalised Service Discoverystevensr/papers/all-hands-discovery.pdf · Molecular biology is about collecting, comparing and analysing information from experimental

Semantic and Personalised Service Discovery

Phillip Lord1, Chris Wroe1, Robert Stevens1,Carole Goble1, Simon Miles2,Luc Moreau2, Keith Decker2, Terry Payne2 and Juri Papay2

1Department of Computer ScienceUniversity of Manchester

Oxford RoadManchester M13 9PL

UKand

2School of Electronics and Computer ScienceUniversity of Southampton

Southampton SO17 1BJUK

[email protected],[chris.wroe,robert.stevens,carole]@cs.man.ac.uk,

[sm,l.moreau,ksd,trp,jp]@ecs.soton.ac.ukhttp://www.mygrid.org.uk

Abstract

One of the most pervasive classes of services neededto support e-Science applications are those responsi-ble for the discovery of resources. We have devel-oped a solution to the problem of service discoveryin a Semantic Web/Grid setting. We do this in thecontext of bioinformatics, which is the use of compu-tational and mathematical techniques to store, man-age, and analyse the data from molecular biologyin order to answer questions about biological phe-nomena. Our specific application is myGrid (http://www.mygrid.org.uk) that is developing opensource, service-based middleware upon which bioin-formatics applications can be built. myGrid is specif-ically targeted at developing open source high-levelservice Grid middleware for bioinformatics.

1 Introduction

Service discovery, the process of locating services,devices and resources, is an essential requirementfor any distributed, open, dynamic environment. Al-though traditional service discovery methods may beeffective when a priori knowledge of the services oragreements about implicitly shared ontologies can beassumed, they fail to scale to large, dynamic, open,environments, where a high degree of autonomy is re-quired. Semantic web service discovery overcomes

this limitation by providing an ontological frameworkby which services may be described and processed.Whilst this is equally applicable to Grid and e-Sciencedomains, these domains impose additional require-ments on the service discovery process, beyond sim-ply locating a service based on a description of itsfunctionality. This paper examines the issues, andproposes a hybrid solution to the task of semanticweb service discovery within the context of a Bioin-formatics Grid domain. This domain uses compu-tational and mathematical techniques to store, man-age, and analyse data from molecular biology in or-der to answer questions about biological phenomena.Our specific application is myGrid (http://www.mygrid.org.uk).

Molecular biology is about collecting, comparingand analysing information from experimental datasets. Traditionally, these (typically small) data setsare manually obtained from specific “wet” bench ex-periments designed to test a specific hypothesis. Insilico experimentation has allowed molecular biolo-gists to obtain relatively large datasets, by conductingexperiments purely through computer based analysisof existing experimental data and associated knowl-edge to test a hypothesis, derive a summary, searchfor patterns or to demonstrate a known fact. Thus,experiments can be performed on a complete genomerather than an individual gene; to model the behaviourof a cell’s complement of genes, rather than one gene;and to compare between species rather than within

Page 2: Semantic and Personalised Service Discoverystevensr/papers/all-hands-discovery.pdf · Molecular biology is about collecting, comparing and analysing information from experimental

one particular species. This form of e-Science in-volves marshalling disparate, autonomous, and het-erogeneous resources to act in concert to achieve aparticular analytical goal.

Bioinformatics resources, such as experimentaldata, services, descriptions of experimental method-ology, are knowledge-rich and require a great deal ofsemantic description for pragmatic use, even withinsemi-automated processes. They should supportthird-party annotations, which may have limited vis-ibility or scope. For example, a scientist may needto record additional comments with these resourceswhilst performing an experiment, such as the applica-bility of a service for a given context, and share thesecomments only with immediate colleagues. Severalsuch additions may be generated by different third-parties.

myGrid is specifically targeted at developing anopen source, high-level service, Grid middleware forthis kind of biology. myGrid middleware is a frame-work using an open, service-based architecture, pro-totyped on Web Services with a migration path to theOpen Grid Services Architecture (OGSA) [3]. Thekey aim is to support the construction, managementand sharing of data-intensive in silico experiments inbiology. In order to achieve this the myGrid middle-ware explicitly captures the experimental method asa workflow. The use of data/computational servicesand the derivation of experimental data is tied to thecorresponding workflows by explicit provenance in-formation. Figure 1 shows the lifecycle of in silicoexperiments, along with the core activities of myGrid.Resource discovery pervades the life cycle. Beforedeveloping an experimental method in the form ofworkflow the user should be supported in re-using andadapting previous work in the community rather thanhaving to start from scratch.

All these activities can involve discovery – for ex-ample, “who has performed an experiment x, when,where and why?”, a question involving details ofprovenance, location, experimental method, etc. Dataand computational services need to be discovered sothat they perform individual tasks in the workflow.In fact there is nothing to stop these tasks being per-formed by more detailed workflows, rather than a sin-gle service.

1.1 Service Semantics

Semantic description of implicit community knowl-edge offers a mechanism to cope with the heterogene-ity of resources by providing a rich descriptive frame-work and common vocabulary to integrate and searchover apparently disparate data, services and work-flows. Several discovery services have been deployed

Executing

Workflow enactment

Distributed Query processing

Job execution

Provenance generation

Single sign-on authorisation

Event notification

Resource & service discovery

Repository creation

Workflow creation

Database integration

Discoverying and Reusing

Workflow discovery & refinement

Resource & service discovery

Repository creation

Provenance

Managing

Information repository

Metadata management

Provenance management

Workflow evolution

Event notification

Publishing

Service registration

Workflow deposition

Metadata Annotation

Third party registration

Personalisation

Personalised registries

Personalised workflows

Info repository views

Personalised annotations & metadata

Security

Forming

Figure 1: The cycle of myGrid in silico experiments.

that utilise description logic reasoning to match a re-quest against different advertised service profiles sys-tems [6, 2]. This provides flexibility within the match-ing algorithm, allowing the search to be broadened toservices that consume more general inputs or producemore specific outputs. Within myGrid we have alsobased semantic service descriptions on the DAML-Sprofile schema with specific extensions for bioinfor-matics [7]. However, we have decided not to force ser-vice publishers and third parties to describe businessdetails, workflow or binding using the schema pro-vided by the DAML-S upper level ontology, Instead,industry standards and associated tools can be used toauthor and discover such information. In myGrid theseinclude the UDDI model for specifying business de-tails, Web Services Flow Language (WSFL) for work-flow, and WSDL for binding information. This lowersthe entry cost for publishing or annotating a service.The DAML-S based approach is only used for seman-tic discovery where domain ontologies (such as bioin-formatics ontologies) and associated reasoning are es-sential.

In Section 2 we analyse the requirements of the in-silico bioinformatics domain and present our archi-tecture to meet those requirements in Section 3. Ex-actly how the components of the architecture interactto solve the service discovery problems is discussedin Section 4. We conclude in Section 5.

2 Requirements for Publishingand Discovering Services

Service discovery is a process in which a user or otheragent gives a query to the system and is presented witha list of available services that match that query. Thequery will state what the user wishes to achieve orwhat data they wish to process or service he or shewishes to discover more about.

The nature of the bioinformatics community (asdescribed above) presents myGrid with several inter-esting challenges: Global distribution and high frag-

Page 3: Semantic and Personalised Service Discoverystevensr/papers/all-hands-discovery.pdf · Molecular biology is about collecting, comparing and analysing information from experimental

mentation of community (except for a few centralisedrepositories); autonomy of community groups (over500 resources are available at the time of writing); au-tonomy of applications, services and formats that leadto massive heterogeneity.

The different community groups produce a rangeof diverse data types such as proteomes, gene expres-sions, protein structures, and pathways. The data cov-ers different scales and different experimental proce-dures that may be challenging to inter-relate. The dif-ferent databases and tools have different formats, ac-cess interfaces, schemas, and coverage, and are hostedon cheap commodity technology rather than in a fewcentralised and unified super-repositories. They com-monly have different, often home-grown, versioning,authorisation, provenance, and capability policies.

Within bioinformatics we cannot assume that wehave control over the format data presented by theservices. Many service providers will therefore beunwilling to represent their data according to a “stan-dard” representation, preferring to use either their ownformats, or one of the existing, hard won, bioinformat-ics standards. Additionally the complexity of biolog-ical data means that we may wish to describe a pieceof data in several different ways, e.g. Two servicesmight both return a DNA sequence, but one might bea complete genome, the other might return only sin-gle genes, information which is not easy to explic-itly encode in a WSDL interface. It is for this rea-son that, within myGrid, we have investigated semanticweb technologies.

We start, in the section below, by presenting exam-ples of the types of query that may be presented byusers in our domain.

2.1 Sample Queries

In order to design the discovery architecture formyGrid we have collected an example set of questionsand categorised them depending on the nature of theinformation that must be searched.

The first category consists of queries which involvesearching on the properties of a service or workflowresource as described by the publisher in terms of con-crete instance data, such as finding a resource basedon its ownership, location, or accessibility. Examplesinclude:

• What resources does a specific organisation pro-vide?

• Who authored this resource?

This requires the author of services to describethese properties using a consistent schema. For ex-ample, businesses and services can be described in

UDDI using a standard data model. Such a descrip-tion must be available to the discovery service at thetime of registration of the service or publication of aworkflow. A discovery service must then be able toprocess queries over these descriptions. In this casethe type of descriptive information is common to anydomain to which the service is targeted. For exam-ple, organisation, authorship, location, address, etc.are features of any domain within e-Science or busi-ness.

The second category consists of queries which in-volve searching on concrete instance based proper-ties provided by third parties (users, organizationaladministrators, domain experts, independent validat-ing institutions, etc.) either as opinion, observable be-haviour or previous usage.

• What services offering x currently give the bestquality of service?

• Which service would the local bioinformatics ex-pert suggest we use?

Figure 2 shows an example of third party descrip-tion of a resource conforming again to the DAML-Sprofile schema.

<profile:qualityRating>

<profile:QualityRating rdf:ID=“NCBI-BLASTn-Rating">

<profile:ratingName>Recommendation</profile:ratingName>

<profile:rating rdf:resource="http://www.mygrid.org.uk/quality_concepts.daml#recommended"/>

</profile:QualityRating>

</profile:qualityRating>

Figure 2: RDF based description of author and pub-lishing organisation adhering to the DAML-S serviceprofile

The need for third party description immediatelyintroduces the requirement for control of who is per-mitted to describe a resource and proper attributionof a description to an author. It would be desirableto allow local (organizational and personal) annota-tion of resources registered in global registries. An-other consequence of third party annotation are viewsbased upon those third party annotations. Individu-als, groups, communities and institutions may differin their opinions of a service.

The final category consists of queries which involvesearching over properties expressed using conceptsfrom a domain specific ontology.

1. Finding a service that will fulfil some task e.g.aligning of biological sequences.

• What services perform a specific kind oftask, for example, what services can I useto perform a biological sequence similaritysearch?

Page 4: Semantic and Personalised Service Discoverystevensr/papers/all-hands-discovery.pdf · Molecular biology is about collecting, comparing and analysing information from experimental

2. Finding a service that will accept or producesome kind of data.

• What services produce this kind of data, forexample, from where can I find sequencedata for a protein?

• What services consume this kind of data,for example, if I have protein sequencedata, what can I do with it?

An example of a commonly used domain servicein bioinformatics is BLAST– “the Basic Local Align-ment Search Tool” [1]. It is an application that en-compasses a number of services used to compare anewly discovered DNA or protein sequence with thelarge public databases of known sequences. It cantherefore accept as input a variety of sequence datawhether protein or DNA, perform a search over a va-riety of databases and produce a variety of result for-mats. Figure 3 shows a conceptual description of theBLAST service BLASTn in DAML+OIL. At its coreit accepts nucleotide sequence data and compares thisagainst nucleotide databases. It is a common situa-tion for the user to actually have a more specific typeof data such as an Expressed Sequence Tag (EST),which is a fragment of DNA known to be derived froma gene. To successfully answer the query “what ser-vice will accept an expressed sequence tag?”, it is nec-essary for the discovery service to have informationabout the domain describing the semantic relation-ships between the bioinformatics datatypes. In myGridthis domain information is stored as a suite of domainontologies [7]. It should also be clear that users maywish to search for resources, other than services, withthese same semantic relationships. So as well query-ing for “all services taking DNA sequences”, we maywish to ask for “all local files containing a DNA se-quence”.

class-def defined BLAST-n_service

subclass-of service

has_Class performs_task (aligning has_Class has_feature local has_Class has_feature pairwise)

has_Class produces_result (report has_Class is_report_of sequence_alignment)

has_Class uses_resource (database has_Class contains

(data has_Class encodes

(sequence has_Class is_sequence_of nucleic_acid_molecule)))

has_Class requires_input (data has_Class encodes

(sequence has_Class is_sequence_of nucleic_acid_molecule))

has_Class is_function_of (BLAST_application)

Figure 3: DAML+OIL description of the functionalityof BLASTn

This categorisation of queries will not be obviousto the user and indeed a single user query may in-corporate all the aspects we have described simulta-neously. For example ‘Which services recommendedby my organisation can I use to process my expressedsequence tag?’ Therefore, although it may be essen-tial for the architecture to separate out ontology based

queries from queries of third party descriptions fromqueries on original published information, it is alsoessential to shield the user from such a distinction.

2.2 Requirements Summary

We would argue that the following requirements, overand above the generic requirements of web services,are necessary to support service discovery in an e-Science context:

1. Descriptions must be attached to different re-sources (services and workflows) published indifferent components (service registries, localfile stores, or databases);

2. Publication of descriptions must be supportedboth for the author of the service and third par-ties;

3. Different classes of user will wish to examinedifferent aspects of the available metadata, bothfrom the service publisher;

4. There is a need for control over who make addand alter third party annotations;

5. We must support two types of discovery: the firstusing cross-domain knowledge; the second re-quiring access to common domain ontologies;

6. A single, unified interface for all these kinds ofdiscovery should be made available to the user.

3 Architecture

User

Semantic

Find Service

Personalised

View

Service

Registry

Service

Registry

Service

Registry

Service

Registry

Gather

service

descriptions

Query for

service instances

and metadata

Discovery by

describing

services required

Discovery by standard

registry protocols and

syntactic matching of personal

metadata attached to services

Discovery

Client

Figure 4: Architecture of discovery services in myGrid

In this section, we discuss the myGrid architectureused to support the types of service discovery dis-cussed in the previous section; Figure 4 shows therelevant components. We assume that there exist a

Page 5: Semantic and Personalised Service Discoverystevensr/papers/all-hands-discovery.pdf · Molecular biology is about collecting, comparing and analysing information from experimental

multitude of service registries on the Grid which canbe used to publish details on how to access services,possibly with additional information to aid discovery.

In order to allow service discovery using third partymetadata, we need a place to store that metadata.Metadata may be personal and private to an individ-ual or organisation and so should not be published inpublic registries, even if that was technically possi-ble. Third-party metadata intended to inform servicediscovery is one way in which to filter the services re-turned to a user on providing a query. A personalisedview is a service that provides a place to add third-party metadata and thereby filter the service detailsreturned by a query. Information from registries iscollected into personalised views that provide a subsetof service advertisements that can be annotated withmetadata by an individual or organisation and thenused for discovery.

Semantic find services use the information (and inparticular the metadata) stored in views to extract rel-evant semantic descriptions of services allowing se-mantic discovery using domain knowledge. A dis-covery client can be used by a user to hide the dis-tinctions between the syntactic matching performedby the view and the semantic reasoning done by a findservice.

3.1 Service Registries

Services can currently be advertised using a varietyof standards, e.g. LDAP, Jini. Within myGrid, wehave mostly been concerned with Web Services, forwhich the primary publishing “standard” is UDDI.UDDI repositories can be deployed on the Internetfor general use, or privately within an organisationas repositories of that organisations’ own services. AUDDI repository contains a set of adverts for services,each of which is usually registered by the provider ofthe service. Service descriptions follow a strict datamodel including information such as the organisationowning the service; details on how to contact the ser-vice; references to technical information regarding theinterface of the service; simple classification of theservice within some standard taxonomy etc.

However, this simple model is inadequate for meet-ing the demands of myGrid as set out in Section 2, asthere is no semantic reasoning, no third-party meta-data and only simple classification.

Registries are necessary for allowing existing ser-vice discovery to take place. Using these registries wecan solve the problem of users being able to locate ser-vices that might match their needs by browsing reg-istries for organisations providing such services. Stan-dard registries provide the functionality for cross do-main queries discussed in Section 2.

3.2 Views

A view is a service that allows discovery of servicesover a set of service descriptions stored in directo-ries on the grid. The discovery process can be per-sonalised by attaching third-party metadata to servicedescriptions. An (experienced) user can set up a viewthat pulls entries from a set of sources (registries). Foreach source, the user specifies a query to provide theinitial data extracted from that source. Third partiescan manually edit the view by editing the metadataattached to entries or deleting entries.

A view may be created and owned either by a sin-gle person or a organisation/group. For example, abiology lab could have a view that contains metadatauseful to members of that lab and has one (or more)designated curator(s) authorised to change the view’sentries and sources. A PhD student who joins a labwill be given access to the lab view of usable services.In their training period, the student will only be givenread access to these views. At a later stage, the PhDstudent can have a view created for them by the viewcurator, with the lab view as its sole source, to whichthey can add metadata but make no other modifica-tions. Later on, the view authorisation policy can bechanged to allow them more control, such as modify-ing metadata and adding sources. Eventually, the PhDstudent can graduate to become the curator of the labview.

The internal architectural details of views and howthey can be used to store semantic information is de-scribed in [5].

One of the sample queries in the “third party” cate-gory in Section 2 is:

• Which service would the local bioinformatics ex-pert, suggest we use?

A simple example of solving this problem is to havea view local to the organisation, and a piece of meta-data attached to some service descriptions in the view.The metadata could have the name ‘isRecommended’and either ‘true’ or ‘false’ as a value. The local bioin-formatics expert can attach this metadata to the ser-vices described in the view that they favour. Others inthe organisation can then present a query that syntac-tically matches only those services with metadata ofname ‘isRecommended’ and value ‘true’. This pro-vides a locally administered filtering of service dis-covery and also allows annotation of service descrip-tions.

3.3 Semantic Find Service

The semantic find service provides discovery over do-main specific descriptions by reference to domain on-tologies. The find service makes use of several ad-

Page 6: Semantic and Personalised Service Discoverystevensr/papers/all-hands-discovery.pdf · Molecular biology is about collecting, comparing and analysing information from experimental

ditional components as shown in Figure 5. The de-scription database holds semantic descriptions gath-ered from resources published in registries and views.The ontology server provides access to the domain on-tologies and manages interaction with the descriptionlogic reasoner FaCT [4]. The find service itself is re-sponsible for:

• gathering semantic descriptions from the viewand maintaining a reference back to the entry inthe view, so that details for communicating withthe services can later be retrieved;

• using the ontology service and associated rea-soner to index items in the descriptions databaseto ensure efficient retrieval of entries at time ofdiscovery;

• using the pre-built index or if necessary the on-tology service and associated reasoner to processa discovery query

Find service

Ontology

service

Description logic

reasoner

Description database

Populating, indexing and

querying descriptions

Determining semantic

relationships between

concepts used in

descriptions

Calculating subsumption

relationships between

concepts using formal

property based definitions

Figure 5: Internal architecture of the semantic findservice

If we take the example of the BLASTn service pre-sented in the requirements section we can demonstratehow the semantic find service can support a seman-tic query over such a resource description. The userpresents a discovery query in terms of a DAML+OILdescription of the kind of service they require. In theexample case it could be a service which accepts Ex-pressed Sequence Tags. The find service uses the on-tology server to determine which services accept Ex-pressed Sequence Tags or a more general semanticdata type. The find service allows users to resolvequeries of the “domain specific” category in Section 2.

The separation of the semantic service discoverfrom registration stems from several key require-ments. Firstly it enables the UDDI registration pro-cess, and semantic service advertisement to be pro-viding by different people, i.e third party metadata.Secondly it allows substantial reuse of the semanticfind service for discovery of entities other than ser-vices, such as workflows, or static data.

Finally it enables other service discovery tech-niques to be added. So, for example, imagine wewished to add a service which allowed discovery ofbioinformatics services based upon some complex

logic operating over the recommendations by thirdparty bioinformaticians and the user’s trust in thoserecommendations. So, the scalable myGrid architec-ture allows the addition of discovery mechanisms overa wide variety of metadata, as well as semantic adver-tisements.

3.4 The Discovery client

The discovery client guides the user in constructinga query that will adhere to the information model ofservice descriptions in myGrid and the ontology usedto describe the domain specific semantic descriptionof a services functionality. The user is presented witha form based interface which transparently integratessemantic and non semantic items of a query. The dis-covery client then separates the user request into theparts relevant for submission to either the semanticfind service or view. It displays the intersection of thetwo queries to the user.

The discovery client removes the need for a userto have pre-existing knowledge of the data model ordomain ontologies used to describe services. It alsoshields the user from having to know where to sendspecific components of their query and pooling theresults. By providing this abstraction, queries of allcategories in Section 2 are resolvable by the user.

3.5 Architecture and RequirementsSummary

In summary the architecture meets the requirementsgiven in Section 2.2, in the following ways.

1. Decoupling of service registration, and descrip-tion, enables discovery over many entities (Re-quirement 1).

2. Providing a view over registries enables thirdparty metadata, (Requirement 2), for discoveryover subsets of total metadata (Requirement 3),and for controlling who can alter such metadata(Requirement 4).

3. The discovery client enables discovery of severalkinds (Requirement 5), but with a single unifiedinterface (Requirement 6).

4 Publishing and Discovery

Using the architecture presented in the preceding sec-tion, service providers can publish descriptions oftheir services and others can annotate those descrip-tions. This information is then accessible by users andcan be searched over by presenting queries to the findservices, views or registries.

Page 7: Semantic and Personalised Service Discoverystevensr/papers/all-hands-discovery.pdf · Molecular biology is about collecting, comparing and analysing information from experimental

Users of our architecture can attach, retrieve andreason over any published metadata such as services’ownership, location, recommendations, function, in-puts or outputs. Public metadata will be stored inthe registries, while private metadata is stored in theviews owned by an organisation or individual biolo-gist.

4.1 Publishing Service Descriptions

Figure 6: Sequence diagram of publishing service

UDDI and other registries have standard interfacesfor publishing service descriptions following theirown data models. Views allow users to attach meta-data to any part of the service descriptions gatheredfrom registry sources. Semantic data following thevocabulary and schema of a given ontology is gath-ered from views, and potentially other sources, andoptimised for reasoning over.

Figure 6 shows the process that takes place whena service is published in the myGrid architecture. Aservice provider publishes their service in a registryon the Grid. The data is later pulled into views setup to monitor the registry, and a notification of thenew service is sent to find services that have regis-tered an interest. A find service can then query a viewfor the metadata attached to the service which pro-vides information for semantic reasoning. The meta-data is associated with service keys (indexes) that canlater be used to retrieve communication informationfor clients to access the services.

4.2 Service Discovery

In Figure 7, we show the process of service discoverysupported by our architecture. The user will providea query to the system using the discovery client. Thisclient divides up the query into the part requiring se-mantic reasoning handled by a find service, and thepart using the data stored in a view. The find servicehas processed metadata containing semantic informa-tion extracted from the view into a form suitable for

Figure 7: Sequence diagram of service discovery

reasoning over. The find service resolves the query re-sults into a set of keys for extracting contact informa-tion (endpoints) of services from the view. The set ofservice instance information matching the query is re-turned to the discovery client and the user is providedwith the intersection of these results and the ones re-turned by the direct query to the view.

The user may for example, wish to discover a ser-vice that accepts a gene sequence as input. A servicedescription may not specify that it has an input exactlyas a gene sequence, but may use a more specific con-cept for which semantic reasoning would be requiredto identify the data as suitable for providing as input.The metadata describing the service as taking a typeof gene sequence as input would be contained in theview and extracted by the find service and analysedbefore discovery takes place. Other data and meta-data stored in the views could be used directly to sat-isfy user preferences, such as recommendation of aservice by a colleague or to limit the hosting organi-sation of the service. In the former case, the metadatawould be personal to an organisation’s view. In thecase of the hosting organisation, this data would havebeen extracted from a registry on the Grid.

5 Conclusions

In this paper we set out our approach to solving someproblems of service discovery in bioinformatics, byproducing a flexible and scalable approach that: en-ables semantic descriptions of different types of en-tities, not just services; allows descriptions to be au-thored and stored in different places, not just a serviceregistry; permits different abstractions of services, notjust instances; and enables descriptions to be searchedin different ways, not just by reasoning and classifica-tion.

By providing a flexible method of metadata stor-age in views, a variety of semantic descriptions canbe attached to service advertisements as well as to the

Page 8: Semantic and Personalised Service Discoverystevensr/papers/all-hands-discovery.pdf · Molecular biology is about collecting, comparing and analysing information from experimental

input and output parameters of those services. Thissubstantially extends the ability of existing registriesas well as allowing annotation of personal metadataby the user. Find services provide a discovery mecha-nism over the metadata in views and descriptions ofother entities, such as the data produced by an ex-periment, stored in other repositories. Find services,using ontologies for vocabularies and schemas, allowabstraction over services and other concepts, and socan provide a very rich querying and discovery mech-anism.

Acknowledgements

This work is supported by the myGrid e-Science pilotproject grant (EPSRC GR/R67743) and the GONGproject grant (DARPA DAML subcontract PY-1149from Stanford University).

References

[1] S.F. Altschul, W. Gish, M. Miller, E.W. Myers,and D.J. Lipman. Basic local alignment searchtool. Journal of Molecular Biology, 215:403–410,1990.

[2] Ion Constantinescu and Boi Faltings. Efficientmatchmaking and directory services. TechnicalReport 200277, Ecole Polytechnique Federale DeLausanne, 2002.

[3] Ian Foster, Carl Kesselman, Jeffrey Nick, andSteven Tuecke. The physiology of the grid: Anopen grid services architecture for distributed sys-tems integration. globus, 2002.

[4] I. Horrocks. FaCT and iFaCT. In P. Lambrix,A. Borgida, M. Lenzerini, R. Moller, and P. Patel-Schneider, editors, Proceedings of the Interna-tional Workshop on Description Logics (DL’99),pages 133–135, 1999.

[5] Simon Miles, Juri Papay, Vijay Dialani, MichaelLuck, Keith Decker, Terry Payne, and LucMoreau. Personalised grid service discovery. InStephen A. Jarvis, editor, Performance Engineer-ing. 19th Annual UK Performance EngineeringWorkshop (UKPEW 2003), pages 131–140, Uni-versity of Warwick, UK, July 2003.

[6] Massimo Paolucci, Takahiro Kawamura, TerryPayne, and Katia Sycara. Semantic matching ofweb services capabilities. In The First Interna-tional Semantic Web Conference (ISWC), 2002.

[7] C. Wroe, R. Stevens, C. Goble, A. Roberts, andM. Greenwood. A suite of DAML+OIL ontolo-gies to describe bioinformatics web services anddata. International Journal of Cooperative Infor-mation Systems, 12(2):197–224, 2003.