life science grid middleware in a more dynamic environment

GADA Workshop, 1-2 November 2005. Life Science Grid Middleware in a More Dynamic Environment. Milena Radenkovic & Bartosz Wietrzyk, The University of Nottingham, UK. http://www.mygrid.org.uk

Page 1: Life Science Grid Middleware in a More Dynamic Environment

GADA Workshop 1-2 November 2005

Life Science Grid Middleware in a More Dynamic Environment

Milena Radenkovic & Bartosz Wietrzyk

The University of Nottingham, UK

http://www.mygrid.org.uk

Page 2: Life Science Grid Middleware in a More Dynamic Environment

Talk Plan

1. From Grid middleware to WSRF and WSN

2. myGrid overview

3. Integrating myGrid with WSRF/WSN

4. Future: self-organizing Grids

Page 3: Life Science Grid Middleware in a More Dynamic Environment

Web Services

• Web services - the application-centric Web
– Standards for message exchanges and interfaces
– XML based
– Programming language and platform independent
• Convergence of Grid and Web Services
• Web Services and the State
• Failure of the Open Grid Services Infrastructure (OGSI)
– No modularity
– Limited compatibility with existing Web Services
– Too object oriented

Page 4: Life Science Grid Middleware in a More Dynamic Environment

WSRF and WSN

• Web Services Resource Framework (WSRF)
– Generic and open framework for modelling and accessing stateful resources using Web Services
– Standardizes the design patterns and message exchanges for expressing state
– "Instruction set for the Grid" [Priol, 2005]
• Web Services Notification (WSN)
– WSRF-based publish/subscribe notification

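[Figure: specification stack. WSDL, SOAP and WS-Addressing form the base. WSRF comprises WS-ResourceProperties, WS-BaseFaults, WS-RenewableReferences, WS-ResourceLifetime and WS-ServiceGroups. WS-Notification comprises WS-BaseNotification, WS-Topics and WS-BrokeredNotification. Within each family the diagram marks specifications as obligatory or optional.]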

Page 5: Life Science Grid Middleware in a More Dynamic Environment

Resource modelling in WSRF

• Stateful resource + stateless Web Service = WS-Resource
• WS address + resource identifier = WS-Resource qualified endpoint reference
• Dynamic creation/destruction of resources
• The resource state is defined by the resource properties document
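The four points above can be sketched in a few lines of Python. This is an illustration only, not the WSRF wire protocol: the class and method names (EndpointReference, ResourceService, create, destroy) are hypothetical, and an in-memory dictionary stands in for whatever store holds the resource properties documents.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointReference:
    """Hypothetical stand-in for a WS-Resource qualified endpoint reference."""
    address: str      # address of the stateless Web Service
    resource_id: str  # identifier of the stateful resource behind it


class ResourceService:
    """Service front end: every call names the resource it acts on."""

    def __init__(self, address):
        self.address = address
        # resource_id -> resource properties document (a plain dict here)
        self._documents = {}

    def create(self, **properties):
        """Dynamically create a resource and return its endpoint reference."""
        resource_id = uuid.uuid4().hex
        self._documents[resource_id] = dict(properties)
        return EndpointReference(self.address, resource_id)

    def get_property(self, epr, name):
        return self._documents[epr.resource_id][name]

    def set_property(self, epr, name, value):
        self._documents[epr.resource_id][name] = value

    def destroy(self, epr):
        """Dynamically destroy the resource named by the endpoint reference."""
        del self._documents[epr.resource_id]
```

A client only ever holds the endpoint reference; the state lives in the properties document and disappears when the resource is destroyed.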

Page 6: Life Science Grid Middleware in a More Dynamic Environment

Talk Plan

1. From Grid middleware to WSRF and WSN

2. myGrid overview

3. Integrating myGrid with WSRF/WSN

4. Future: self-organizing Grids

Page 7: Life Science Grid Middleware in a More Dynamic Environment

myGrid

• One of the leading EPSRC eScience pilot projects
• Open Source Semantic Grid middleware for Bioinformatics
• High-level services for data and application integration
– resource discovery
– distributed query processing
– workflow enactment
• Additional services supporting scientific method
– provenance management
– change notification
– personalization

Page 8: Life Science Grid Middleware in a More Dynamic Environment

myGrid architecture

[Figure: myGrid architecture. Labels recoverable from the diagram (grouping is approximate):
External services: legacy applications, Web sites, Web services, OGSA-DAI databases; Soaplab, Gowlab, OGSA-DAI DQP service, AMBIT text extraction service
Service and workflow discovery: Feta semantic discovery, Pedro semantic publication
Metadata management: myGrid information model, mIR metadata store, myGrid ontology, provenance capture
Workflow management: Freefluo workflow engine
Data management: mIR (myGrid Information Repository)
E-Science coordination: notification service, e-Science mediator, LSID support, e-Science events
Core services
Other labels: Taverna e-Science workbench, Web portals, e-Science process patterns
Third-party tools: LSID Launchpad, Haystack, Utopia
Web service communication fabric]

Page 9: Life Science Grid Middleware in a More Dynamic Environment

In silico experiments in myGrid

• Scufl: Simple Conceptual Unified Flow Language
• Taverna: writing and running workflows, and examining results
• SOAPLAB: makes applications available
• Freefluo: workflow engine to run workflows

[Figure: Freefluo invoking a SOAPLAB Web Service (wrapping any application), a Web Service (e.g. DDBJ BLAST) and the SeqHound service.]

Page 10: Life Science Grid Middleware in a More Dynamic Environment

[Figure: labels on the slide: Soaplab Service, WSDL Web Service, BioMOBY Service, Local Java Service.]

Page 11: Life Science Grid Middleware in a More Dynamic Environment

Talk Plan

1. From Grid middleware to WSRF and WSN

2. myGrid overview

3. Integrating myGrid with WSRF/WSN

4. Future: self-organizing Grids

Page 12: Life Science Grid Middleware in a More Dynamic Environment

myGrid’s stateful components

• myGrid Information Repository (MIR)
– Data entities
• Workflow Enactment
– Enactment services
– Workflow enactments
• myGrid Notification Service

Page 13: Life Science Grid Middleware in a More Dynamic Environment

myGrid Information Repository (MIR) – before WSRF

• The MIR data model comprises entity types associated with XML schemas
• Entities are:
– described by attributes
– stored in a relational database
– accessed through the Web Service interface
– identified by Life Science Identifiers (LSIDs)

Page 14: Life Science Grid Middleware in a More Dynamic Environment

Our model

MIR concept          WSRF concept
MIR entity           WS-Resource
Entity type          WS-Resource type
Entity attribute     WS-Resource property
LSID                 WS-Resource qualified endpoint reference
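As a rough illustration of this mapping, the sketch below converts one MIR-style entity into a WS-Resource-like description. The names MirEntity and to_ws_resource, the dictionary layout, and the choice to carry the LSID as the resource identifier inside the endpoint reference are all assumptions made for the example; the slides do not say how the LSID is encoded.

```python
from dataclasses import dataclass, field


@dataclass
class MirEntity:
    """Hypothetical in-memory view of a MIR entity."""
    lsid: str                  # Life Science Identifier of the entity
    entity_type: str           # MIR entity type
    attributes: dict = field(default_factory=dict)


def to_ws_resource(entity, service_address):
    """Apply the four-row mapping to a single entity."""
    return {
        # LSID -> WS-Resource qualified endpoint reference
        # (assumption: the LSID is reused as the resource identifier)
        "endpoint_reference": {
            "address": service_address,
            "resource_id": entity.lsid,
        },
        # Entity type -> WS-Resource type
        "resource_type": entity.entity_type,
        # Entity attributes -> WS-Resource properties
        "resource_properties": dict(entity.attributes),
    }


# Made-up example values, not taken from myGrid.
example = to_ws_resource(
    MirEntity("urn:lsid:example.org:mir:42", "Sequence", {"length": 1280}),
    "http://example.org/mir",
)
```

Copying the attributes into a new dictionary keeps the resource properties independent of later changes to the source entity.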

Page 15: Life Science Grid Middleware in a More Dynamic Environment

Our new data architecture

[Figure: new data architecture. Clients contact an RFAD service on a public resource factory and discovery server for resource creation and discovery; the WS-Resources themselves reside on data resource hosting servers, which clients then access.]

Page 16: Life Science Grid Middleware in a More Dynamic Environment

myGrid’s stateful components

• myGrid Information Repository (MIR)
– Data entities
• Workflow Enactment
– Enactment services
– Workflow enactments
• myGrid Notification Service

Page 17: Life Science Grid Middleware in a More Dynamic Environment

Our new enactment architecture

[Figure: new enactment architecture. Clients request enactment creation through an enactment creation and discovery server. Each enactment server hosts an EnactmentFactory that creates Enactment resources; an EnactmentGroup resource groups the Enactment resources.]
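The factory and group roles in this architecture can be sketched as follows. This is a simplified, single-process illustration; the class names echo the diagram labels, but the methods, the status values and the identifier scheme are hypothetical and are not taken from myGrid or Apache WSRF.

```python
import itertools


class EnactmentResource:
    """State of one workflow enactment."""

    def __init__(self, enactment_id, workflow):
        self.enactment_id = enactment_id
        self.workflow = workflow
        self.status = "created"  # hypothetical status value


class EnactmentGroup:
    """Registry of enactments on one server, used for discovery."""

    def __init__(self):
        self._members = {}

    def add(self, enactment):
        self._members[enactment.enactment_id] = enactment

    def remove(self, enactment_id):
        del self._members[enactment_id]

    def find(self, status=None):
        """Return all enactments, optionally filtered by status."""
        return [e for e in self._members.values()
                if status is None or e.status == status]


class EnactmentFactory:
    """Creates enactment resources and registers them with the group."""

    def __init__(self, group):
        self._group = group
        self._ids = itertools.count(1)

    def create(self, workflow):
        enactment = EnactmentResource(f"enactment-{next(self._ids)}", workflow)
        self._group.add(enactment)
        return enactment
```

Separating creation (the factory) from lookup (the group) lets a client that did not create an enactment still discover it later.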

Page 18: Life Science Grid Middleware in a More Dynamic Environment

myGrid’s stateful components

• myGrid Information Repository (MIR)
– Data entities
• Workflow Enactment
– Enactment services
– Workflow enactments
• myGrid Notification Service

Page 19: Life Science Grid Middleware in a More Dynamic Environment

myGrid Notification Service

• Every WS-Resource can be a notification producer and manage its own subscriptions
• Notification Brokers are optional: they are not necessary for simple deployments

[Figure: five notification producers, each with its own topic set (Topic Set 1 to Topic Set 5), and several notification consumers. Notification Brokers sit between them, each exposing the aggregated Topic Sets 1 to 5; the diagram labels the two broker roles "Topic aggregation" and "Distributed message delivery".]

• Notification Brokers can:
– Aggregate topics from different notification producers to support their discovery
– Distribute the task of message delivery to increase its speed and decrease network congestion
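A minimal in-process sketch of these two deployment styles follows: consumers subscribing directly to a producer, and an optional broker that aggregates topics and relays messages. It only illustrates the pattern; the class and method names are hypothetical and do not reproduce the WS-Notification message exchanges or the Apache Pubscribe API.

```python
from collections import defaultdict


class NotificationProducer:
    """A resource that publishes messages on its own set of topics."""

    def __init__(self, topics):
        self.topics = set(topics)
        self._subscribers = defaultdict(list)  # topic -> consumer callables

    def subscribe(self, topic, consumer):
        if topic not in self.topics:
            raise KeyError(topic)
        self._subscribers[topic].append(consumer)

    def notify(self, topic, message):
        for consumer in self._subscribers[topic]:
            consumer(topic, message)


class NotificationBroker:
    """Optional intermediary: aggregates topics and relays messages."""

    def __init__(self):
        self._topics = set()
        self._subscribers = defaultdict(list)

    def register(self, producer):
        """Subscribe the broker to every topic of the given producer."""
        for topic in producer.topics:
            self._topics.add(topic)
            producer.subscribe(topic, self._relay)

    def topics(self):
        """Aggregated topic list, so consumers can discover topics in one place."""
        return sorted(self._topics)

    def subscribe(self, topic, consumer):
        self._subscribers[topic].append(consumer)

    def _relay(self, topic, message):
        for consumer in self._subscribers[topic]:
            consumer(topic, message)
```

In a simple deployment a consumer calls subscribe on the producer itself; with a broker, the producer delivers each message once to the broker, which fans it out to its own subscribers.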

Page 20: Life Science Grid Middleware in a More Dynamic Environment

Why Apache WSRF/Pubscribe?

• Increased compatibility with the implemented myGrid components (Java API)
• Dynamic creation of WS-Resources
• Call-backs for modification of WS-Resources
• High portability (compatible with any Java servlet container)
• Free and Open Source

Page 21: Life Science Grid Middleware in a More Dynamic Environment

Advantages of the integration

• More flexible, distributed and scalable architecture
• More scalable, distributed and lightweight notification infrastructure
• One coherent interface to all components
• Decreased design efforts in the future
• Compatibility with any servlet container
• Easier integration with third party software and UK’s National Grid Service

Page 22: Life Science Grid Middleware in a More Dynamic Environment

Talk Plan

1. From Grid middleware to WSRF and WSN

2. myGrid overview

3. Integrating myGrid with WSRF/WSN

4. Future: self-organizing Grids

Page 23: Life Science Grid Middleware in a More Dynamic Environment

Future: self-organizing Grids

• Current limitations of myGrid:
– Naming scheme depends on the DNS servers
– State is only available when the hosting machine is online
– Deployment and maintenance require high administration effort
• Our current work:
– Using Distributed Hash Tables (DHTs) to provide self-organization
– Using self-organized, distributed caching of the state to increase its availability
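To make the DHT idea concrete, here is a small sketch of consistent hashing with successor replication, a building block used by several DHT designs. The slides do not say which DHT or caching scheme the authors use, so the HashRing class, the replica count and the node names are assumptions chosen for illustration.

```python
import bisect
import hashlib


def _key(value):
    """Map a string to a position on the hash ring."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Assigns each resource name to `replicas` successive nodes on a ring."""

    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        self._ring = sorted((_key(node), node) for node in nodes)

    def nodes_for(self, name):
        """Return the nodes that should hold (a cached copy of) the state of `name`."""
        if not self._ring:
            return []
        # First node clockwise from the name's hash, then its successors.
        start = bisect.bisect_left(self._ring, (_key(name), ""))
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]

    def remove(self, node):
        """Drop a node that went offline; its successors take over its names."""
        self._ring = [entry for entry in self._ring if entry[1] != node]
```

Because lookups depend only on hashes of names and node identifiers, no central naming server is consulted, and when the first holder of a resource leaves, the next node on the ring, which already holds a copy, becomes responsible for it.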

Page 24: Life Science Grid Middleware in a More Dynamic Environment

Conclusions

• Our work is generic and applicable to other existing higher-level middleware projects
• WSRF/WSN standards are well suited to complex higher-level middleware
• However, migration may require a significant coding effort

Page 25: Life Science Grid Middleware in a More Dynamic Environment

EPSRC funded UK eScience Program Pilot Project

Some slides taken from Carole Goble

Page 26: Life Science Grid Middleware in a More Dynamic Environment

Core
• Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizaukaus, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Jan Humble, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Ian Roberts, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson, Jimi Worthington and Chris Wroe.

Users
• Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK
• Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
• Steve Kemp, Liverpool, UK

Postgraduates
• Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire, Bartosz Wietrzyk

Industrial
• Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)
• Robin McEntire (GSK)

Collaborators
• Keith Decker

Page 27: Life Science Grid Middleware in a More Dynamic Environment

References

• Publications on:
– Home page: www.mrl.nott.ac.uk/~bzw/
– myGrid site: www.mygrid.org.uk