GGF Summer School, 24th July 2004, Italy
Part 2: Architecture overview
Professor Carole Goble, University of Manchester
http://www.mygrid.org.uk




In a nutshell
• Bioinformatics toolkit
• Open (Web) Services
  – myGrid components and external domain services
  – Publication, discovery, interoperation, composition, decommissioning of myGrid services
  – No control or influence over domain service providers
• Metadata driven
  – LSIDs, common information model, ontologies, Semantic Web technologies
• Open extensible architecture
  – Assemble your own components
  – Designed to work together
  – Loosely coupled

Figure: myGrid components: Freefluo WfEE; Taverna WfDE; View; UDDI registry; Event Notification; mIR; Pedro; Feta semantic discovery; Information Model; Soaplab and Gowlab; Gateway & CHEF Portal; LSID; Haystack provenance browser.


Key Characteristics
• Data intensive, upstream analysis
• Pipelines - experiments as workflows (chiefly)
• Ad hoc exploratory investigative workflows for individuals from no particular a priori community
• Openness – the services are not ours
• Low activation energy, incremental take-on
• Foundations for sharing knowledge and sharing experimental objects
• Multiple stakeholders
• Collection of components for assembly


Openness

• Openness
  – open source
  – open world of services
  – open extensible technology
  – open to wider eScience context
  – open to user feedback
  – open to third-party metadata


Platform
• Standards based
• (Web) Service Oriented Architecture
  – Publication, discovery, interoperation, composition, decommissioning of myGrid services
  – Web services communication fabric
  – XML document types
  – LSIDs for identifying resources
• Implemented in Java using Axis and Tomcat
  – WS-I -> OGSA / WSRF
• Metadata driven
  – RDF-coded metadata
  – OWL-coded ontologies
  – Common information model
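LSIDs name resources with a URN of the form urn:lsid:authority:namespace:object, optionally followed by a revision. The Python sketch below splits such an identifier into its parts. It assumes that syntax as given in the OMG LSID specification. The `LSID` type and `parse_lsid` function are hypothetical helpers, and the code does not resolve identifiers against an LSID Authority.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LSID:
    """Components of a Life Science Identifier (hypothetical helper type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    def __str__(self) -> str:
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.revision is not None:
            parts.append(self.revision)
        return ":".join(parts)


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.strip().split(":")
    # The 'urn:lsid' prefix is compared case-insensitively.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)
```

For example, `parse_lsid("urn:lsid:example.org:workflows:wf42:1")` yields authority `example.org`, namespace `workflows`, object `wf42` and revision `1`. The domain `example.org` is a placeholder.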


Stakeholders

Figure: myGrid users: biologists; IS specialists; infrequent; problem-specific bioinformaticians; tool builders; service provider; systems administrators; bioinformatics tool builders; annotators.

• Middleware for
  – Tool developers
  – Bioinformaticians
  – Service providers
• Biologists are indirectly supported by the portals and apps these develop.


Collections of Tasks

Figure: tasks (finding, service description, discovery, workflow building, enactment, provenance, storage, data management, querying, domain tasks) and the actors involved (service providers, bioinformaticians, scientists, annotation providers).


Experimental entities


Investigation = set of experiments + metadata

• Experimental design components

• Experimental instances that are records of enacted experiments

• Experimental glue that groups and links design and instance components

• Life Science IDs, URIs, RDF
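An investigation groups design components, enacted instances and the glue that links them, all named by LSIDs and described in RDF. The sketch below models that grouping as plain Python data. It flattens the investigation into (subject, predicate, object) tuples rather than using an RDF library. The `Investigation` class and the `ex:` predicates are a hypothetical vocabulary, not the myGrid information model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Investigation:
    """Hypothetical model: an investigation groups experiments plus metadata."""
    lsid: str
    designs: List[str] = field(default_factory=list)    # LSIDs of workflow designs
    instances: List[str] = field(default_factory=list)  # LSIDs of enacted runs
    glue: List[Triple] = field(default_factory=list)    # links between the two

    def record_run(self, design: str, run: str) -> None:
        """Register an enacted run and link it back to the design it instantiates."""
        if design not in self.designs:
            self.designs.append(design)
        self.instances.append(run)
        self.glue.append((run, "ex:instanceOf", design))

    def to_triples(self) -> List[Triple]:
        """Flatten the investigation into RDF-style (subject, predicate, object) triples."""
        triples = [(self.lsid, "ex:hasDesign", d) for d in self.designs]
        triples += [(self.lsid, "ex:hasInstance", r) for r in self.instances]
        return triples + self.glue
```

The example below records two runs of one design. It yields one design triple, two instance triples and two glue triples.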


myGrid Service Stack

Figure: the myGrid service stack. Applications, Core services and External services layers sit over a Web Service (Grid Service) communication fabric, with side labels for Bioinformaticians, Tool Providers and Service Providers. Components shown: Taverna Workbench, Haystack, Web Portal, LSID Launch pad, Views, e-Science Mediator, FreeFluo Workflow Enactment Engine, Feta Service & WF Discovery, UDDI Registries, Information Repository, Metadata Store, Ontology Mgt, Ontologies, Provenance Mgt, Event Notification Service, OGSA-DQP Distributed Query Processor, AMBIT Text Extraction Service, LSID Authority, Native Web Services, SoapLab, GowLab, Legacy apps.


Service stack

Figure: Apps, Core services and External services layered over the Web Service (Grid Service) communication fabric.
• Apps: Taverna workbench, Web Portal, LSID Launch Pad, Haystack
• Core services: e-Science Mediator; e-Science process patterns; service & workflow discovery; metadata management; data management; e-Science event bus; workflow enactment
• External services and resources: AMBIT Text Extraction Service, Native Web Services, SoapLab, GowLab, legacy apps, websites


20,000 feet

Figure: high-level view of the main components: Taverna Workbench; Haystack or Portal; Provenance and Data browser; View Service; Freefluo Workflow Engine; Web services, local tools, user interaction etc.; Semantic Discovery & Registration; UDDI; LSID Authority; Event Notification Service; mIR metadata Store Service; mIR data.


e-Science Mediator

1. Application-oriented: directly supports the e-Scientist by:
• providing pre-configured e-Science process templates (i.e. system-level workflows)
• helping to capture and maintain context information (via the information model) that is relevant to interpreting and sharing the results of e-Science experiments
• facilitating personalisation and collaboration

2. Middleware-oriented: contributes to the synergy between myGrid services by:
• acting as a sink for e-Science events initiated by myGrid components
• interpreting the intercepted events and triggering the interactions with other related components that the semantics of those events entail
• compensating for possible impedance mismatches with other services, both in data types and in interaction protocols


Supporting the e-scientist

• Recurring use-cases can be captured

• Then corresponding process templates can be authored

• The e-Science mediator makes these processes available to the user

Figure: the "Find Workflow" use-case and the corresponding "Find Workflow" process.
• Use-case: find an interesting workflow for an experiment; examine and modify it if necessary; store it to a personal repository for later re-use
• Process: create the experiment context for this user; launch the semantic search facility; launch the workflow editor for the selected workflow; enable the mIR browser for storage with context
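The pattern above has three parts: capture a use-case, author a process template, and let the mediator run it. The Python sketch below models that pattern. Everything in it is hypothetical: `ProcessMediator` and `ProcessTemplate` are illustrative names, each step is a plain function over a per-user context dictionary, and real steps would call the search, editor and repository services.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Step = Callable[[dict], None]


@dataclass
class ProcessTemplate:
    """A named, pre-configured e-Science process: an ordered list of steps."""
    name: str
    steps: List[Step]


@dataclass
class ProcessMediator:
    """Hypothetical mediator that publishes templates and runs them per user."""
    templates: Dict[str, ProcessTemplate] = field(default_factory=dict)

    def publish(self, template: ProcessTemplate) -> None:
        self.templates[template.name] = template

    def run(self, name: str, user: str) -> dict:
        # Create the experiment context for this user, then run each step in order.
        context = {"user": user, "log": []}
        for step in self.templates[name].steps:
            step(context)
            context["log"].append(step.__name__)
        return context


# Hypothetical steps for the "find workflow" use-case.
def launch_semantic_search(ctx: dict) -> None:
    ctx["workflow"] = "urn:lsid:example.org:workflow:wf42"  # placeholder search hit


def launch_workflow_editor(ctx: dict) -> None:
    ctx["examined"] = True  # the user examines and modifies the workflow here


def store_with_context(ctx: dict) -> None:
    ctx["stored"] = (ctx["user"], ctx["workflow"])  # personal repository entry


find_workflow = ProcessTemplate(
    "find-workflow",
    [launch_semantic_search, launch_workflow_editor, store_with_context],
)
```

After `publish(find_workflow)`, calling `run("find-workflow", "alice")` returns a context whose log lists the three steps in template order.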


• e-Science process templates maintained by the mediator can drive GUI generation and interaction with the user

Figure: the E-Science Mediator driving the GUI.


Mediating between services

Example: mediation during a workflow execution

Figure: messages exchanged between the WF Enactor, the E-Science Mediator, the MIR and the Notification Service:
1: execution started
2: establish experiment/user context
[*] 3: intermediate process completed
[*] 4: link process trace to context
[*] 5: store intermediate process trace
6: workflow completed
7: get WF results
8: store WF results
9: notify WF completion to subscribers
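The numbered sequence above can be sketched as an event handler in Python. The handler turns enactor events into repository and notification calls. The sketch makes several assumptions: the event names, the in-memory `InMemoryMIR` and `Notifier` stand-ins, and the `fetch_results` callback are all hypothetical, and the assignment of each message to a sender is assumed rather than taken from the slide.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryMIR:
    """Stand-in for the myGrid Information Repository (hypothetical interface)."""

    def __init__(self) -> None:
        self.contexts: Dict[str, str] = {}                     # run id -> user
        self.traces: Dict[str, List[str]] = defaultdict(list)  # run id -> traces
        self.results: Dict[str, object] = {}                   # run id -> results


class Notifier:
    """Stand-in for the notification service: records what it would deliver."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def publish(self, topic: str, message: str) -> None:
        self.sent.append(f"{topic}: {message}")


class WorkflowMediator:
    """Turns enactor events into repository and notification calls."""

    def __init__(self, mir: InMemoryMIR, notifier: Notifier,
                 fetch_results: Callable[[str], object]) -> None:
        self.mir = mir
        self.notifier = notifier
        self.fetch_results = fetch_results  # asks the enactor for a run's output

    def on_event(self, kind: str, run_id: str, **data: str) -> None:
        if kind == "execution_started":          # 1 -> 2: establish context
            self.mir.contexts[run_id] = data["user"]
        elif kind == "process_completed":        # 3 -> 4, 5: may repeat per process
            if run_id not in self.mir.contexts:
                raise KeyError(f"no context for run {run_id!r}")
            # Traces are keyed by run id, which links them to the stored context.
            self.mir.traces[run_id].append(data["trace"])
        elif kind == "workflow_completed":       # 6 -> 7, 8, 9
            self.mir.results[run_id] = self.fetch_results(run_id)
            self.notifier.publish("workflow.completed", run_id)
        else:
            raise ValueError(f"unknown event {kind!r}")
```

Keeping the mediator as the only component that talks to both the repository and the notifier mirrors the slide's point: the enactor does not need to know about either.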


Simplified Architecture

Figure: on the client side, the GUI (e-Science workbench) with client-side e-Science process logic and the E-Science Mediator client-stubs; on the Grid, the E-Science Mediator Service with server-side e-Science process logic, alongside the MIR, Service Registry, WF Enactor and Notification Service. Context is preserved via the myGrid Information Model.


Event notification Service
• Publish/subscribe model
  – Topic based (cf. JMS topics, CORBA channels)
  – Hierarchic topics
  – Persistent event storage
  – Subscription leases
  – Federation for scalability & reliability
  – Event filtering

http://cvs.mygrid.org.uk/notification-stable/downloads
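The sketch below is an in-process Python broker illustrating four of the listed features: hierarchic topics, subscription leases, event filtering and a stored event log. It is not the myGrid notification service's API. Several details are assumptions: "/" as the topic separator, an injected clock, and a plain list standing in for persistent storage. Federation is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Event = Tuple[str, dict]  # (topic, payload)


@dataclass
class Subscription:
    topic: str                                       # "workflow" also covers "workflow/completed"
    callback: Callable[[str, dict], None]
    expires_at: float                                # lease end, in the broker's clock units
    accept: Optional[Callable[[dict], bool]] = None  # optional event filter


def topic_matches(subscribed: str, published: str) -> bool:
    """Hierarchic match: a subscription to 'a/b' covers 'a/b' and 'a/b/...'."""
    return published == subscribed or published.startswith(subscribed + "/")


class Broker:
    def __init__(self, clock: Callable[[], float]) -> None:
        self.clock = clock
        self.subscriptions: List[Subscription] = []
        self.log: List[Event] = []  # stands in for persistent event storage

    def subscribe(self, topic: str, callback: Callable[[str, dict], None],
                  lease: float,
                  accept: Optional[Callable[[dict], bool]] = None) -> Subscription:
        sub = Subscription(topic, callback, self.clock() + lease, accept)
        self.subscriptions.append(sub)
        return sub

    def renew(self, sub: Subscription, lease: float) -> None:
        """Extend a lease; subscribers that stop renewing are eventually dropped."""
        sub.expires_at = self.clock() + lease

    def publish(self, topic: str, payload: dict) -> int:
        """Store the event, drop expired leases, deliver to matching subscribers."""
        self.log.append((topic, payload))
        now = self.clock()
        self.subscriptions = [s for s in self.subscriptions if s.expires_at > now]
        delivered = 0
        for s in self.subscriptions:
            if topic_matches(s.topic, topic) and (s.accept is None or s.accept(payload)):
                s.callback(topic, payload)
                delivered += 1
        return delivered
```

Leases make the broker robust to subscribers that disappear without unsubscribing. Once a lease lapses, the subscription is discarded on the next publish.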


Portal toolkit for bioinformaticians
• Target application
  – Williams-Beuren Syndrome
  – Fixed set of workflows
• Extra myGrid portlets
  – Configurable
  – Workflow enactment
  – Workflow scheduling
  – Completion notification
  – Results browsing
• Based on CHEF & Jetspeed-1
  – Portlets for team collaboration

Figure: a portlet container hosting several portlets, each with its own interface.


Text Services

Figure: text-mining workflow. The user client sends an XScufl workflow definition + parameters to the workflow server for enactment. The initial workflow extracts PubMed Ids from a Swissprot/Blast record. The workflow then gets the Medline abstracts, gets related abstracts and clusters the abstracts, returning clustered PubMed Ids + titles to the client. On the Medline server (Sheffield), Medline is pre-processed offline to extract biomedical terms and is indexed. Data passed between the steps: PubMed Ids, Medline abstracts, term-annotated Medline abstracts.
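The text-mining steps above can be sketched in Python as an in-memory pipeline: extract PubMed Ids, fetch term-annotated abstracts, expand to related abstracts, and cluster. This is a toy stand-in, not the AMBIT services. The `MEDLINE` dictionary replaces the pre-processed Medline index and its entries are made up. "Related" means sharing at least one extracted term, and single-link clustering over shared terms is substituted for whatever clustering the real service used.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical, made-up stand-in for the pre-processed, term-annotated Medline index.
MEDLINE: Dict[str, dict] = {
    "111": {"title": "Elastin deletion in WBS", "terms": {"ELN", "WBS"}},
    "222": {"title": "WBS critical region genes", "terms": {"WBS", "GTF2I"}},
    "333": {"title": "GTF2I transcription factor", "terms": {"GTF2I"}},
    "444": {"title": "Unrelated kinase study", "terms": {"KINASE"}},
}


def extract_pubmed_ids(record: str) -> List[str]:
    """Pull PubMed identifiers out of Swiss-Prot style 'RX' reference lines."""
    return re.findall(r"PubMed=(\d+)", record)


def related_ids(seed_ids: List[str]) -> Set[str]:
    """Expand the seeds with every abstract that shares at least one term."""
    seed_terms = set().union(*(MEDLINE[i]["terms"] for i in seed_ids if i in MEDLINE))
    return {pid for pid, a in MEDLINE.items() if a["terms"] & seed_terms}


def cluster(ids: Set[str]) -> List[List[str]]:
    """Single-link clustering: abstracts that share a term end up together."""
    parent = {i: i for i in ids}

    def find(i: str) -> str:
        while parent[i] != i:
            i = parent[i]
        return i

    ordered = sorted(ids)
    for a in ordered:
        for b in ordered:
            if a < b and MEDLINE[a]["terms"] & MEDLINE[b]["terms"]:
                parent[find(b)] = find(a)
    groups: Dict[str, List[str]] = {}
    for i in ordered:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def run_workflow(record: str) -> List[List[Tuple[str, str]]]:
    """Return clustered (PubMed Id, title) pairs for the references in a record."""
    ids = extract_pubmed_ids(record)
    return [[(i, MEDLINE[i]["title"]) for i in group] for group in cluster(related_ids(ids))]
```

Consider a record citing PubMed Ids 111 and 444. Abstract 222 is pulled in because it shares the term WBS with 111, so the result has two clusters: 111 with 222, and 444 on its own.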


History

Pre-prototype
• Experimental
• Web-based
• Requirements gathering

Prototype 1
• Architectural workout: all services represented
• NetBeans workbench, API-based integration
• Info Repository oriented, XML-based process provenance
• Workflow enactment engine
• Demo at ISMB 2003

Prototype 2
• Second generation services
• Reworked information model
• Open information management, Life Science Identifiers, RDF-based provenance
• Taverna workbench, Web-based portal
• Full paper and demo at ISMB 2004, GSK deployment
• Real biology


Two+ Paths

Core functionality
• Services – Soaplab and Gowlab
• Workflow enactment engine – Freefluo
• Workflow workbench – Taverna
• Data integration – OGSA-DQP
• Information model & management
• Mediator

Innovative work
• Service and workflow registration
• Semantic discovery
• Provenance management
• Text mining

In between
• Event notification


myGrid People

Core
• Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.

Users
• Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK
• Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
• Steve Kemp, Liverpool, UK

Postgraduates
• Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire

Industrial
• Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)
• Robin McEntire (GSK)

Collaborators
• Keith Decker


Collaboration

http://www.accessgrid.org


Publications
• R. Stevens, H.J. Tipney, C. Wroe, T. Oinn, M. Senger, P. Lord, C.A. Goble, A. Brass and M. Tassabehji. Exploring Williams-Beuren Syndrome Using myGrid. To appear in Proceedings of the 12th International Conference on Intelligent Systems for Molecular Biology, 31st Jul-4th Aug 2004, Glasgow, UK.
• C.A. Goble, S. Pettifer, R. Stevens and C. Greenhalgh. Knowledge Integration: In silico Experiments in Bioinformatics. In The Grid: Blueprint for a New Computing Infrastructure, Second Edition, eds. Ian Foster and Carl Kesselman, Morgan Kaufmann, November 2003.
• R. Stevens, A. Robinson and C.A. Goble. myGrid: Personalised Bioinformatics on the Information Grid. In Proceedings of the 11th International Conference on Intelligent Systems for Molecular Biology, 29th June-3rd July 2003, Brisbane, Australia, published in Bioinformatics Vol. 19 Suppl. 1, 2003, pp. 302-304.


http://www.mygrid.org.uk