TRANSCRIPT
GGF Summer School 24th July 2004, Italy
Part 2: Architecture overview
Professor Carole Goble
University of Manchester
http://www.mygrid.org.uk
In a nutshell
• Bioinformatics toolkit
• Open (Web) Services
– myGrid components and external domain services
– Publication, discovery, interoperation, composition, decommissioning of myGrid services
– No control or influence over domain service providers
• Metadata driven
– LSIDs, common information model, ontologies, Semantic Web technologies
• Open extensible architecture
– Assemble your own components
– Designed to work together
– Loosely coupled
[Figure: myGrid components. Labels: Freefluo WfEE; Taverna WfDE; View; UDDI registry; Event Notification; mIR; Pedro; Semantic Discovery; Feta; Info. Model; Soaplab; Gowlab; Gateway & CHEF Portal; LSID; Haystack Provenance Browser]
Key Characteristics
• Data intensive, upstream analysis
• Pipelines: experiments as workflows (chiefly)
• Ad hoc, exploratory, investigative workflows for individuals from no particular a priori community
• Openness: the services are not ours
• Low activation energy, incremental take-on
• Foundations for sharing knowledge and sharing experimental objects
• Multiple stakeholders
• Collection of components for assembly
Openness
• Openness
– open source
– open world of services
– open extensible technology
– open to wider eScience context
– open to user feedback
– open to third party metadata
Platform
• Standards based
• (Web) Service Oriented Architecture
– Publication, discovery, interoperation, composition, decommissioning of myGrid services
– Web services communication fabric
– XML document types
– LSIDs for identifying resources
• Implemented in Java using Axis and Tomcat
– WS-I -> OGSA / WSRF
• Metadata driven
– RDF-coded metadata
– OWL-coded ontologies
– Common information model
Stakeholders
[Figure: myGrid users. Labels: biologists; IS specialists; infrequent; problem specific bioinformaticians; tool builders; service provider; systems administrators; bioinformatics tool builders; annotators]
• Middleware for
– Tool developers
– Bioinformaticians
– Service providers
• Biologists are indirectly supported by the portals and apps these develop.
Collections of Tasks
[Figure: collections of tasks and the stakeholders involved. Tasks: Finding; Service Description; Discovery; Enactment; Workflow Building; Provenance; Storage; Data Management; Querying; Domain Tasks. Stakeholders: Service Providers; Bioinformaticians; Scientists; Annotation providers]
Experimental entities
Investigation = set of experiments + metadata
• Experimental design components
• Experimental instances that are records of enacted experiments
• Experimental glue that groups and links design and instance components
• Life Science IDs, URIs, RDF
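As an illustration of how the entities on this slide might hang together, here is a minimal sketch in Python. It assumes the standard LSID URN layout (urn:lsid:authority:namespace:object, with an optional revision); the class names LSID, Experiment and Investigation and the example identifiers are hypothetical and are not taken from the myGrid information model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LSID:
    """Parsed Life Science Identifier: urn:lsid:authority:namespace:object[:revision]."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    @classmethod
    def parse(cls, text: str) -> "LSID":
        parts = text.split(":")
        # The "urn:lsid" prefix is matched case-insensitively; the rest is kept as given.
        if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
            raise ValueError(f"not an LSID: {text!r}")
        if any(part == "" for part in parts[2:5]):
            raise ValueError(f"empty LSID component in {text!r}")
        revision = parts[5] if len(parts) == 6 else None
        return cls(parts[2], parts[3], parts[4], revision)

    def __str__(self) -> str:
        base = f"urn:lsid:{self.authority}:{self.namespace}:{self.object_id}"
        return base if self.revision is None else f"{base}:{self.revision}"


@dataclass
class Experiment:
    """Hypothetical glue object: links one design component to its enacted instances."""
    design: LSID
    instances: List[LSID] = field(default_factory=list)


@dataclass
class Investigation:
    """Investigation = set of experiments + metadata (illustrative shape only)."""
    experiments: List[Experiment] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)


# Hypothetical identifiers under the reserved example.org authority.
workflow = LSID.parse("urn:lsid:example.org:workflow:wf42:1")
run = LSID.parse("urn:lsid:example.org:run:r7")
investigation = Investigation(
    experiments=[Experiment(design=workflow, instances=[run])],
    metadata={"owner": "alice"},
)
```

Identifying both designs and enacted runs by LSID keeps the glue object small: it only holds references, and the data behind each identifier can live in whichever store the authority resolves to.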
myGrid Service Stack
[Figure: layered service stack over a Web Service (Grid Service) communication fabric. Layers: Applications; Core services; External services. Side labels: Bioinformaticians; Tool Providers; Service Providers. Components shown: Taverna Workbench; Haystack; Web Portal; LSID Launch pad; e-Science Mediator; Feta Service & WF Discovery; Provenance Mgt; Event Notification Service; Information Repository; Metadata Store; Ontology Mgt; Views; FreeFluo Workflow Enactment Engine; OGSA-DQP Distributed Query Processor; AMBIT Text Extraction Service; Native Web Services; SoapLab; GowLab; Legacy apps; UDDI Registries; Ontologies; LSID Authority]
Service stack
[Figure: simplified service stack over a Web Service (Grid Service) communication fabric. Layers: Apps; Core services; External services. Components shown: Taverna workbench; Web Portal; LSID Launch Pad; Haystack; e-Science Mediator; e-Science process patterns; Service & workflow discovery; Metadata management; Data management; e-Science event bus; Workflow enactment; AMBIT Text Extraction Service; Native Web Services; SoapLab; GowLab; Legacy apps; Websites]
20,000 feet
[Figure: high-level component view. Labels: Taverna Workbench; Freefluo Workflow Engine; LSID Authority; UDDI; View Service; Semantic Discovery & Registration; Event Notification Service; mIR metadata; mIR data; Store Service; Provenance and Data browser; Haystack or Portal; Web services, local tools; User interaction etc.]
e-Science Mediator
1. Application-oriented: directly supports the e-Scientist by:
• Providing pre-configured e-Science process templates (i.e. system-level workflows)
• Helping to capture and maintain context information (via the information model) that is relevant to the interpretation and sharing of the results of e-Science experiments
• Facilitating personalisation and collaboration
2. Middleware-oriented: contributes to the synergy between myGrid services by:
• Acting as a sink for e-Science events initiated by myGrid components
• Interpreting the intercepted events and triggering interactions with other related components entailed by the semantics of those events
• Compensating for possible impedance mismatches with other services, both in terms of data types and interaction protocols
Supporting the e-scientist
• Recurring use-cases can be captured
• Then corresponding process templates can be authored
• e-science mediator makes processes available to the user
[Figure: a "Find Workflow" use-case alongside the corresponding "Find Workflow" process. Use-case steps: find an interesting workflow for experiment; examine and modify if necessary; store to personal repository for later re-use. Process steps: create exp. context for this user; launch semantic search facility; launch workflow editor for selected WF; enable MIR browser for storage with context]
• E-Science process templates maintained by the mediator can drive GUI generation and interaction with the user
[Figure: E-Science Mediator and GUI]
Mediating between services
Example: mediation during a workflow execution
[Figure: numbered interactions between the WF Enactor, E-Science Mediator, MIR and Notification Service]
1: Execution started
2: Establish experiment/user context
[*] 3: Intermediate process completed
[*] 4: Link process trace to context
[*] 5: Store intermediate process trace
6: Workflow completed
7: Get WF results
8: Store WF results
9: Notify WF completion to subscribers
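The numbered interactions on this slide can be read as an event-driven mediator. The sketch below is one possible reading in Python, not the myGrid implementation: it assumes that the enactor raises the events (steps 1, 3, 6), that the mediator reacts by calling the repository (steps 2, 4, 5, 7, 8) and that the final step goes through a notification callback (step 9). InMemoryRepository, Mediator and every method name are hypothetical.

```python
from typing import Any, Callable, Dict, List


class InMemoryRepository:
    """Hypothetical stand-in for the MIR: keeps traces and results per context."""

    def __init__(self) -> None:
        self.contexts: Dict[str, Dict[str, List[Any]]] = {}

    def establish_context(self, user: str, experiment: str) -> str:
        context_id = f"{user}/{experiment}"
        self.contexts.setdefault(context_id, {"traces": [], "results": []})
        return context_id

    def store_trace(self, context_id: str, trace: Any) -> None:
        self.contexts[context_id]["traces"].append(trace)

    def store_results(self, context_id: str, results: Any) -> None:
        self.contexts[context_id]["results"].append(results)


class Mediator:
    """Sink for enactor events; turns each event into repository and notification calls."""

    def __init__(self, repository: InMemoryRepository,
                 notify: Callable[[str, str], None]) -> None:
        self.repository = repository
        self.notify = notify
        self.context_by_run: Dict[str, str] = {}

    def on_execution_started(self, run_id: str, user: str, experiment: str) -> None:
        # Steps 1-2: remember which experiment/user context this run belongs to.
        self.context_by_run[run_id] = self.repository.establish_context(user, experiment)

    def on_process_completed(self, run_id: str, trace: Any) -> None:
        # Steps 3-5 (repeated per intermediate process): store the trace under the context.
        self.repository.store_trace(self.context_by_run[run_id], trace)

    def on_workflow_completed(self, run_id: str, fetch_results: Callable[[], Any]) -> None:
        # Steps 6-8: fetch the results and store them under the same context.
        context_id = self.context_by_run.pop(run_id)
        self.repository.store_results(context_id, fetch_results())
        # Step 9: tell subscribers that the workflow has finished.
        self.notify("workflow.completed", run_id)


# Usage with a list standing in for the notification service.
sent: List[tuple] = []
repo = InMemoryRepository()
mediator = Mediator(repo, lambda topic, run_id: sent.append((topic, run_id)))
mediator.on_execution_started("run-1", user="alice", experiment="wbs")
mediator.on_process_completed("run-1", {"step": "blast"})
mediator.on_workflow_completed("run-1", lambda: {"hits": 3})
```

Keeping the run-to-context mapping inside the mediator means the enactor never needs to know about users or experiments, which is one way to realise the loose coupling the earlier slides describe.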
Simplified Architecture
[Figure: client side and the Grid. Client side: GUI (e-science workbench); E-Science Mediator client-stubs; client-side e-science process logic. The Grid: E-Science Mediator Service; server-side e-science process logic; MIR; Service Registry; WF Enactor; Notification Service. Annotation: context preserved via myGrid Information Model]
Event notification Service
• Publish/subscribe model
– Topic based (cf. JMS topics, CORBA channels)
– Hierarchic topics
– Persistent event storage
– Subscription leases
– Federation for scalability & reliability
– Event filtering
http://cvs.mygrid.org.uk/notification-stable/downloads
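To make the listed features concrete, here is a minimal in-process sketch in Python of topic-based publish/subscribe with hierarchic topics, subscription leases and event filtering. It is not the myGrid notification service or its API: NotificationService, Subscription and the "/"-separated topic syntax are assumptions, an in-memory list stands in for persistent event storage, and federation is left out.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Event = Dict[str, Any]


def covers(subscribed: str, published: str) -> bool:
    """True if `published` is `subscribed` itself or one of its descendants."""
    return published == subscribed or published.startswith(subscribed + "/")


@dataclass
class Subscription:
    topic: str
    callback: Callable[[str, Event], None]
    expires_at: float                      # end of the lease, in clock units
    event_filter: Optional[Callable[[Event], bool]] = None


class NotificationService:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._subscriptions: List[Subscription] = []
        self.log: List[Tuple[str, Event]] = []   # stand-in for persistent event storage

    def subscribe(self, topic: str, callback: Callable[[str, Event], None],
                  lease_seconds: float,
                  event_filter: Optional[Callable[[Event], bool]] = None) -> Subscription:
        sub = Subscription(topic, callback, self._clock() + lease_seconds, event_filter)
        self._subscriptions.append(sub)
        return sub

    def renew(self, sub: Subscription, lease_seconds: float) -> None:
        # Extends a lease that has not yet been dropped by publish().
        sub.expires_at = self._clock() + lease_seconds

    def publish(self, topic: str, event: Event) -> int:
        """Record the event, drop expired leases, deliver to matches; returns the delivery count."""
        self.log.append((topic, event))
        now = self._clock()
        self._subscriptions = [s for s in self._subscriptions if s.expires_at > now]
        delivered = 0
        for sub in self._subscriptions:
            if not covers(sub.topic, topic):
                continue
            if sub.event_filter is not None and not sub.event_filter(event):
                continue
            sub.callback(topic, event)
            delivered += 1
        return delivered
```

A subscriber to "workflow" in this sketch also receives events published on "workflow/run1", while a filter such as `lambda e: e["status"] == "completed"` narrows delivery further. Leases mean that a subscriber which disappears without unsubscribing is eventually forgotten rather than accumulating in the service.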
Portal toolkit for bioinformaticians
• Target application
– Williams-Beuren Syndrome
– Fixed set of workflows
• Extra myGrid portlets
– Configurable
– Workflow enactment
– Workflow scheduling
– Completion notification
– Results browsing
• Based on CHEF & Jetspeed-1
– Portlets for team collaboration
[Figure: Portlet Container holding three Portlets, each with an Interface]
Text Services
[Figure: text services workflow. Components: User Client; Workflow Server (Workflow Enactment); Medline Server (Sheffield). Workflow steps, labelled "Initial Workflow": Extract PubMed Id; Get Medline Abstract; Cluster Abstracts; Get Related Abstracts. Data exchanged: Swissprot/Blast record; XScufl workflow definition + parameters; PubMed Ids; Medline abstracts; term-annotated Medline abstracts; clustered PubMed Ids + titles]
Medline: pre-processed offline to extract biomedical terms + indexed
History
• Pre-Prototype
– Experimental
– Web-based
– Requirements gathering
• Prototype 1
– Architectural workout
– All services represented
– NetBeans workbench
– API-based integration
– Info Repository oriented
– XML-based process provenance
– Workflow enactment engine
• Prototype 2
– Second generation services
– Reworked information model
– Open information management
– Life Science Identifiers
– RDF based provenance
– Taverna workbench
– Web-based portal
• Demo at ISMB 2003
• Full paper and demo at ISMB 2004
• GSK deployment
• Real biology
Two+ Paths
Core functionality
• Services – Soaplab and Gowlab
• Workflow enactment engine – Freefluo
• Workflow workbench – Taverna
• Data integration – OGSA-DQP
• Information model & management
• Mediator
Innovative work
• Service and workflow registration
• Semantic discovery
• Provenance management
• Text mining
In between
• Event notification
myGrid People
Core
• Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
Users
• Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK
• Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
• Steve Kemp, Liverpool, UK
Postgraduates
• Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
Industrial
• Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)
• Robin McEntire (GSK)
Collaborators
• Keith Decker
Collaboration
http://www.accessgrid.org
Publications
• R. Stevens, H.J. Tipney, C. Wroe, T. Oinn, M. Senger, P. Lord, C.A. Goble, A. Brass and M. Tassabehji. Exploring Williams-Beuren Syndrome Using myGrid. To appear in Proceedings of the 12th International Conference on Intelligent Systems in Molecular Biology, 31st Jul-4th Aug 2004, Glasgow, UK.
• C.A. Goble, S. Pettifer, R. Stevens and C. Greenhalgh. Knowledge Integration: In silico Experiments in Bioinformatics. In The Grid: Blueprint for a New Computing Infrastructure, Second Edition, eds. Ian Foster and Carl Kesselman, Morgan Kaufmann, November 2003.
• R. Stevens, A. Robinson and C.A. Goble. myGrid: Personalised Bioinformatics on the Information Grid. In Proceedings of the 11th International Conference on Intelligent Systems in Molecular Biology, 29th June–3rd July 2003, Brisbane, Australia. Published in Bioinformatics Vol. 19 Suppl. 1, 2003, pp. 302-304.
http://www.mygrid.org.uk