myGrid
Personalised extensible environments for data-intensive in silico experiments in biology
http://www.mygrid.org.uk
Professor Carole Goble, University of Manchester, [email protected]
myGrid
EPSRC funded pilot project
Generic middleware within application setting
36 months
http://www.mygrid.org.uk
IBM
In silico experimentation
Discovery, interoperation, fusion, sharing
Process is as important as outcome
Science is dynamic – change happens
Scientific discovery is personal & global
Ad-hoc solutions, people-powered
myGrid resources
Question: Nucleotide binding protein in mouse
Answer: P12345 in Swiss-Prot is an ATPase
Terri Attwood is an expert on this
Jackson labs have a database but you need to register
A paper has just been published in Proteins by the Stanford lab on this.
Grid viewpoints
interrogation
workflows
results
Access Grid
New Biology
Technology Grid
private
public
What is it?
Where is it?
How to get it?
When did it happen?
Who knows it?
Why does it?
What are you doing?
Governance & Control
myGrid e-Science objectives
Active support of scientific practice in biology
Straightforward discovery, interoperation, sharing
  information AND processes AND best practice
Improving quality of both experiments and data
  provenance through information <-> process linkage
  propagating change
Individual creativity & collaborative working
  personalisation
Cottage Industry to an Industrial Scale
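The objective "provenance through information <-> process linkage, propagating change" can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `ProvenanceLog` class, its methods and the item names are hypothetical and are not part of any myGrid API. The idea shown is only that each data item records the process and inputs that produced it, so that a change to one input can be traced to everything derived from it.

```python
from collections import defaultdict, deque


class ProvenanceLog:
    """Hypothetical record linking each data item to the process that made it."""

    def __init__(self):
        # output item -> (process name, tuple of input items)
        self.derived_from = {}
        # input item -> set of items derived directly from it
        self.used_by = defaultdict(set)

    def record(self, output, process, inputs):
        """Note that `process` produced `output` from `inputs`."""
        self.derived_from[output] = (process, tuple(inputs))
        for item in inputs:
            self.used_by[item].add(output)

    def affected_by(self, changed):
        """Return every item derived, directly or indirectly, from `changed`."""
        stale, queue = set(), deque([changed])
        while queue:
            current = queue.popleft()
            for out in self.used_by.get(current, ()):
                if out not in stale:
                    stale.add(out)
                    queue.append(out)
        return stale


# Illustrative item and process names only.
log = ProvenanceLog()
log.record("blast_hits", "sequence_search", ["query_sequence", "protein_db"])
log.record("annotation", "annotate", ["blast_hits"])
```

Calling `log.affected_by("protein_db")` then lists the results that would need to be revisited if the database changed, which is one way to read "propagating change".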
myGrid operational environment
(De facto) Standards: OMG LSR, I3C, MGED, Gene Ontology
Open Source: Open-Bio Foundation, Bio*
Semantic Web: RDF, RDFS, DAML+OIL
Bioinformatics integration platforms: DAS, OpenBSA, ISYS, OpenMMS, Kleisli, Ensembl, AppLab, SRS, BioNavigator, DiscoveryLink, GX, OPM, TAMBIS
Distributed Computing Environments: CORBA, RMI, Jini, JXTA, DCOM
Web Services: XML, SOAP, WSDL, UDDI
GRID: Globus/SRB/Condor
Consortium Expertise
View propagation, reasoning, workflow …
Approach
Personalisation
Toolkits
Metadata
Interoperation layer
Data mgt, Process mgt, Context mgt
Communication fabric
Applications
myGrid Stack
Communication fabric
Interoperation layer
Personalisation
Toolkits
Applications
Metadata
Data mgt, Process mgt, Context mgt
1. Resource management
2. Middleware technologies incl. Globus
3. Incorporating existing resources
1. Integration & distributed queries
2. View management
3. Personal repositories
1. Process description & storage
2. Process enactment
3. Process personalisation
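The three process-management items above (description & storage, enactment, personalisation) can be illustrated with a small sketch. It assumes, purely for illustration, that a process is described as plain data, enacted in dependency order, and personalised by overriding step parameters; the step names, the `DESCRIPTION` layout and the `enact` function are hypothetical and do not reflect how myGrid itself is built. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter


# Stand-in step implementations; a real step would call a remote service.
def fetch(_inputs, params):
    return f"sequence:{params['accession']}"


def search(inputs, params):
    return f"hits({inputs['fetch']}, db={params['database']})"


def report(inputs, _params):
    return f"report[{inputs['search']}]"


# Process description: plain data that could be stored and shared.
DESCRIPTION = {
    "fetch": {"run": fetch, "needs": [], "params": {"accession": "P12345"}},
    "search": {"run": search, "needs": ["fetch"], "params": {"database": "Swiss-Prot"}},
    "report": {"run": report, "needs": ["search"], "params": {}},
}


def enact(description, overrides=None):
    """Run every step after the steps it depends on.

    `overrides` maps a step name to parameter values that replace the
    defaults, which is the (assumed) personalisation hook.
    """
    overrides = overrides or {}
    graph = {name: step["needs"] for name, step in description.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        step = description[name]
        params = {**step["params"], **overrides.get(name, {})}
        inputs = {dep: results[dep] for dep in step["needs"]}
        results[name] = step["run"](inputs, params)
    return results


default_run = enact(DESCRIPTION)
personal_run = enact(DESCRIPTION, {"search": {"database": "PRINTS"}})
```

Keeping the description separate from the enactment function is the design choice being illustrated: the same stored process can be re-run unchanged or with one scientist's preferred settings.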
1. Security & Confidentiality & Trust
2. Provenance & Attribution
3. Versioning
1. Ontology languages & services
2. Resource service descriptions
3. Annotation with metadata
1. Agent based communication abstraction
2. Software engineering paradigm for extensible distributed services
3. Foundation for architectural evolution
1. Personal data repositories
2. Personal processes
3. Models of sharing
1. User interfaces & visualisation
2. Collaboration environments
3. Environment development
4. User-centred application development
1. Specialist process: information extraction
myGrid outcomes
1. e-Scientists
  Environment built on toolkits for service access, personalisation & community
  Gene function expression analysis using S. cerevisiae
  Annotation workbench for the PRINTS pattern database
2. Developers
  myGrid-in-a-Box developers kit
  Re-purposing DAS, AppLab and OpenBSA …
  Integrating ISYS & GlaxoSmithKline platforms
myGrid generic technologies
1. Database access from the Grid
2. Process enactment on the Grid
3. Personalisation services
4. Metadata services
5. Laying the foundations for Agent Services
Ontologies, Protocols & APIs
Grid + Services + Semantic Web
[Figure: value chain]
Layers: Scientific Problems; Processes; Knowledge; Information; Jobs and Data; Data; Raw Resources
Levels: Knowledge / capability; Semantics / process; Data / applications
Value chain: Interoperability, higher level ontologies, reasoning, discovery; Reasoning services, Discovery services
Fulfillment Grid
"Reproduced by permission of the IT Innovation Centre, University of Southampton." http://www.it-innovation.soton.ac.uk
myGrid phased development
Versions of myGrid
Varying degrees of functionality
Pre-prototype
Architecture
Simple services
Early toolkit trials
Developers toolkit
Application trials
Release
Extended services
6 months
12 months
24 months
33 months
Professor Carole Goble, University of Manchester, UK
Presented at the BiGUM1: Biological Grid Users Meeting 1
NeSC, Glasgow, Scotland, October 30th 2001
http://www.nesc.ac.uk/esi/progs/bigum1.html