Copyright Discovery Net, Imperial College 2001-2004
SARS Analysis on the Grid
Discovery Net in Bioinformatics
Overview
• Introduction to Discovery Net
• SARS project
• Demo
• Conclusion
Structure of Discovery Net
• Workflow Authoring: composing services
• Workflow Warehousing
• Workflow Deployment: Grid service and portal
• Workflow Management: collaborative knowledge management
• Workflow Execution: a compositional Grid
• Resource Mapping
• Service Abstraction
• Component Design/Integration
• Resources shown in the diagram: Condor-G, Native MPI, OGSA service, Web service, Unicore, Oracle 10g, Web wrapper, Sun Grid Engine
Component model
• Components
– Nodes
– Basic units of composition
– Contain compositional, integrity and execution logic
• Component frameworks
– Groups of related nodes (sequence alignment)
– Common object model (inputs/outputs are typed)
• Component architectures
– Grouping of related frameworks (bioinformatics)
Three levels of a component
• Connectivity:
– What are my inputs?
– What are my outputs?
• Metadata:
– What are my logical constraints?
– How do I verify myself?
– What will I produce?
• Execution:
– What do I actually do?
• Diagram labels, inputs: input types, input metadata, input data
• Diagram labels, outputs: output types, result metadata, result
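The three levels can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the `Node` class, its field names and the toy `gc_content` node are hypothetical and are not taken from the Discovery Net SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Node:
    """Hypothetical three-level component (not the Discovery Net SDK)."""
    name: str
    input_types: Dict[str, type]      # connectivity: what are my inputs?
    output_types: Dict[str, type]     # connectivity: what are my outputs?
    verify: Callable[[dict], bool]    # metadata: logical constraints, self-verification
    predict: Callable[[dict], dict]   # metadata: what will I produce?
    run: Callable[[dict], dict]       # execution: what do I actually do?

    def execute(self, input_metadata: dict, input_data: dict) -> dict:
        # Connectivity check: every declared input must be present with the right type.
        for port, expected in self.input_types.items():
            if not isinstance(input_data.get(port), expected):
                raise TypeError(f"{self.name}: input '{port}' must be {expected.__name__}")
        # Metadata check comes before any real work is done.
        if not self.verify(input_metadata):
            raise ValueError(f"{self.name}: input metadata violates constraints")
        return self.run(input_data)


def gc_fraction(data: dict) -> dict:
    """Toy execution logic: fraction of G and C bases in a sequence."""
    seq = data["sequence"].upper()
    return {"gc": (seq.count("G") + seq.count("C")) / len(seq)}


gc_node = Node(
    name="gc_content",
    input_types={"sequence": str},
    output_types={"gc": float},
    verify=lambda meta: meta.get("alphabet") == "DNA",
    predict=lambda meta: {"gc": {"type": "float", "range": (0.0, 1.0)}},
    run=gc_fraction,
)

result = gc_node.execute({"alphabet": "DNA"}, {"sequence": "GGCCAT"})
```

Because `verify` and `predict` only look at metadata, a workflow editor could reject a bad connection or preview the output shape without running the node.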
Construction of a component
• Through Software Development Kit, for new algorithms
• Using template nodes for web services, command-line tools
• With specialised IDEs to produce customised components
• Idea is to remove the complexity of component construction as far as possible from the user
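A template node for a command-line tool might look roughly like the sketch below. The helper `make_command_line_node` and its arguments are hypothetical names, not part of Discovery Net, and the Python interpreter stands in for a real tool so that the example runs anywhere.

```python
import subprocess
import sys
from typing import Callable, List


def make_command_line_node(program: List[str], arg_template: List[str]) -> Callable[..., str]:
    """Build a node that fills `{name}` placeholders in arg_template and runs the tool."""
    def node(**params: str) -> str:
        argv = program + [part.format(**params) for part in arg_template]
        # check=True raises CalledProcessError if the tool exits with a non-zero status.
        completed = subprocess.run(argv, capture_output=True, text=True, check=True)
        return completed.stdout
    return node


# Stand-in "tool": a one-line Python script that reverses its first argument.
reverse = make_command_line_node(
    program=[sys.executable, "-c", "import sys; print(sys.argv[1][::-1])"],
    arg_template=["{sequence}"],
)

output = reverse(sequence="ACGT").strip()
```

The user of such a template supplies only the command and the placeholder names; the process handling stays hidden inside the node.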
Workflow Warehousing and Provenance
• Workflows/Services record their history:
• Discovery Net records the full authoring information
• Users may annotate workflows
• All information stored in DPML
• Shared IP for a Virtual Organisation
• Users can browse for services based on properties
• Users can browse for existing workflows and workflow templates
• Users can see full project history for each service
Publishing of workflows
• Parameterisation of a workflow
• Defining the black box that is offered to the end-user
• Once deployed, workflow is accessible as:
– Web service
– Grid service
– Command line tool
– Web page
• Workflows combined in personalised portals
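Parameterisation and the "black box" idea can be sketched as follows. The `publish` helper and the toy `count_motif` workflow are assumptions made for illustration; the slides do not describe how Discovery Net implements deployment.

```python
from typing import Any, Callable, Dict


def publish(workflow: Callable[..., Any],
            exposed: Dict[str, Any],
            fixed: Dict[str, Any]) -> Callable[..., Any]:
    """Wrap a workflow so end users can set only the exposed parameters.

    `exposed` maps parameter names to defaults; `fixed` holds values the
    analysis designer has locked down.
    """
    def black_box(**user_params: Any) -> Any:
        unknown = set(user_params) - set(exposed)
        if unknown:
            raise TypeError(f"parameters not published: {sorted(unknown)}")
        return workflow(**{**fixed, **exposed, **user_params})
    return black_box


def count_motif(sequence: str, motif: str, case_sensitive: bool) -> int:
    """Toy workflow: count non-overlapping occurrences of a motif."""
    if not case_sensitive:
        sequence, motif = sequence.upper(), motif.upper()
    return sequence.count(motif)


published = publish(
    count_motif,
    exposed={"sequence": "", "motif": "ATG"},
    fixed={"case_sensitive": False},
)

hits = published(sequence="atgcatg")
```

The same wrapped callable could then sit behind a web service, a command line tool or a web page; only the transport differs.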
Discovery Net users
• Component developers
– IT-literate to an extent
• Analysis designers
– Domain experts with understanding of the research problem
• End users
– Scientists with no interest in IT and coding/assembling their software
• Line does get blurry!
Discovery Net Application Examples
• Environmental Modelling
– High throughput dispersed air sensing technology
• Life Sciences
– High throughput genomics and proteomics
• Real time geo-hazard modelling
– Earthquake modelling through satellite imagery
• GM Crop trial studies
– Simulating the effects of GM crops on the surrounding ecosystem
SARS Basic Facts
• Appeared first in January 2003, Guangdong province, China
• SARS Coronavirus (SARS-CoV) identified as the cause
• China started a major research initiative to investigate the biology of the virus and predict its behaviour
SARS project
• Collaboration between Discovery Net and SCBIT (Shanghai Center for Bioinformation Technology)
• Annotation of SARS genomes obtained from different patient samples
• Analysis of mutation patterns of SARS virus
• Discovery Net providing the IT platform to organize the analysis
Work done
• Data
– Research performed on 33 samples of the SARS virus, sequenced from Chinese patients
– Combined with publicly available data from NCBI
• Goal
– Deeper understanding of the mutation patterns of the SARS virus
• Analysis
– Examining the variability of the virus on both genomic and proteomic level
– Providing full insight into the significance of changes in the nucleic structure of the virus
Genomic analysis
• Alignment: data intensive, performed on the Grid
• Retrieval of publicly available knowledge
• Examining the variations in different strains
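Once strains are aligned, examining variation amounts to looking for alignment columns where the strains disagree. The sketch below is a minimal illustration on made-up toy sequences; the function name, strain labels and data are hypothetical, and the alignment step itself (the part run on the Grid) is assumed to have happened already.

```python
from collections import Counter
from typing import Dict, List, Tuple


def variable_sites(aligned: Dict[str, str]) -> List[Tuple[int, Dict[str, int]]]:
    """Return (0-based column, base counts) for each column where strains differ.

    Gap characters such as '-' are counted like any other symbol.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for column, bases in enumerate(zip(*aligned.values())):
        counts = Counter(bases)
        if len(counts) > 1:
            sites.append((column, dict(counts)))
    return sites


# Toy aligned fragments, invented for the example (not real SARS data).
toy_alignment = {
    "strain_A": "ATGGCA",
    "strain_B": "ATGACA",
    "strain_C": "ATGGCT",
}

sites = variable_sites(toy_alignment)
```

Here `sites` lists columns 3 and 5, the two positions where at least one toy strain carries a different base.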
Phylogenetic view
• SARS genome taken from Hong Kong patients
• SARS genome taken from Beijing patients
• SARS genome taken from Singapore patients
Proteomic analysis
• Isolating interesting genomic regions
• Identifying relevant protein sequences
• Observing the variations in the resulting protein
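The three steps above can be sketched in a few lines: slice a genomic region, translate it with the standard genetic code, and compare the resulting proteins. The function names and the toy sequences are assumptions for illustration; the tools actually used in the project are not named on this slide.

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 0; unknown codons become 'X', trailing bases are dropped."""
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[i:i + 3].upper(), "X") for i in range(0, usable, 3)
    )


def protein_changes(reference: str, variant: str,
                    start: int, end: int) -> List[Tuple[int, str, str]]:
    """Compare the proteins encoded by region [start, end) of two aligned genomes.

    Returns (1-based residue position, reference residue, variant residue).
    """
    ref_protein = translate(reference[start:end])
    var_protein = translate(variant[start:end])
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(ref_protein, var_protein), start=1)
        if a != b
    ]


# Toy sequences: one base change that alters the protein, one that does not.
changes = protein_changes("ATGGCATTT", "ATGGTATTT", 0, 9)
silent = protein_changes("ATGGCATTT", "ATGGCGTTT", 0, 9)
```

The second comparison comes back empty because GCA and GCG both encode alanine, which is why variation is examined at both the genomic and the proteomic level.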
Proteomic annotation
• Parallel annotation with multiple sequence analysis tools
• Framework first used at Supercomputing 2002
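Parallel annotation, where the same sequence is sent to several tools at once and the results are collected under one record, could be sketched like this. The three "tools" are trivial stand-ins written for the example and `annotate_in_parallel` is a hypothetical name; the real framework and its tools are not specified on the slide.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def sequence_length(seq: str) -> int:
    return len(seq)


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_start_codon(seq: str) -> bool:
    return seq.startswith("ATG")


def annotate_in_parallel(sequence: str,
                         tools: Dict[str, Callable[[str], object]]) -> Dict[str, object]:
    """Run every tool on the same sequence concurrently and gather results by tool name."""
    with ThreadPoolExecutor(max_workers=max(1, len(tools))) as pool:
        futures = {name: pool.submit(tool, sequence) for name, tool in tools.items()}
        # result() blocks until that tool finishes and re-raises any exception it threw.
        return {name: future.result() for name, future in futures.items()}


annotations = annotate_in_parallel(
    "ATGGCC",
    {"length": sequence_length, "gc": gc_fraction, "start_codon": has_start_codon},
)
```

Threads suit tools that mostly wait on remote services or external processes; CPU-bound Python tools would need a process pool or a Grid scheduler instead.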
Annotation editor
SCBIT Analysis Portal
Next step
• Portal technology used to build thematic portals concentrating on particular research areas
• Goal: to construct a number of public portals for the needs of the UK eScience community and make them accessible to all
Discovery Net Advantages…
• Rapid component integration through SDK or generic connectors:
– Grid services
– Web services
– Command-line tools etc.
• Intuitive research assembly and management
– Graphical workflow assembly
• Provenance of analysis
– Within the server warehouse
• Personalised end-user environments
– Discovery Portal
… applied to SCBIT research
• Integrated
– Existing tools (EMBOSS, alignment apps)
– In-house data stores (with SARS sequence data)
– Original algorithms for mining variation info
• Workflows assembled by the whole research group
• Research history tracked through the project change information
• SCBIT Portal creating a common platform for multidisciplinary users
Summary
• IT platform supporting urgent discovery research
• Access to data within a scalable knowledge creation infrastructure
• Exploitation and annotation of biological information using multiple sources, data types and locations
• Integration of external applications within a unified environment
• Sharing of methods, results and data views across the Virtual Organisation
Credits and further info
• Discovery Net team, especially Moustafa Ghanem, Jameel Syed and Stuart Hassard
• http://www.discovery-on-the.net
• Exhibiting at EPSRC and LESC stands
• Demo today at 13:15 – 14:45 at EPSRC stand