The Generation Challenge Programme (GCP)
Platform for Crop Research
Richard Bruskiewich
and the rest of …
…The GCP SP4 team and Contributors
IRRI-CIMMYT Crop Research Informatics Laboratory:
Graham McLaren
Thomas Metz
Martin Senger
Ramil Mauleon
Mylah Anacleto
Michael Jonathan Mendoza
Victor Jun Ulat
Arllet Portugal
Ryan Alamban
Lord Hendrix Barboza
Jeffrey Detras
Kevin Manansala
Jeffrey Morales
Barry Peralta
Rowena Valerio
Nelzo Ereful
CIP:
Reinhard Simon
Edwin Rojas
ICRISAT:
Jayashree Balaji
ICARDA:
Akinnola Akintunde
NCGR:
Andrew Farmer
Gary Schiltz
SCRI:
Jennifer Lee
David Marshall
Cornell University:
Terry Casstevens
Pankaj Jaiswal
Dave Matthews
ACGT:
Ayton Meintjes
Jane Morris
CIRAD:
Manuel Ruiz
Alexis Dereeper
Matthieu Conte
Brigitte Courtois
Bioversity:
Mathieu Rouard
Tom Hazekamp
Milko Skofic
Raj Sood
NIAS:
Masaru Takeya
Koji Doi
Kouji Satoh
Shoshi Kikuchi
EMBRAPA:
Marcos Costa
Natalia Martins
Georgios Pappas
Guy Davenport
Trushar Shah
Kyle Braak
Sebastian Ritter
Yi Zhang
Sergio Gregorio
Joseph Hermocilla
Michael Echavez
Roque Almodiel
Samart Wanchana
Supat Thongjuea
Theo van Hintum (WUR), GCP Subprogramme 4 Leader
University of British Columbia:
Mark Wilkinson
GSC Bioinformatics Graduate Program, BC Cancer Agency:
Benjamin Good
James Wagner
Overview
Generation Challenge Programme crop informatics research and development
GCP platform architecture: Domain model & ontology
Application development framework
Challenge Programme
“I challenge the next generation to use new scientific tools and techniques to address the problems that plague the world’s poor”
Dr. Norman Borlaug
http://www.generationcp.org
An international research programme established in 2003, projected to last 10 years, and hosted by the CGIAR with global partners from ARI and NARES
Research Themes Directed to Crop Improvement:
Genomics and comparative biology across species
Characterization of genetic diversity for allele mining
Gene transfer technologies
Five research subprogrammes, one of which is crop information systems development.
What is it?
Challenge Programme
Cornell University, USA
Wageningen University, Netherlands
John Innes Centre, UK
NIAS, Japan
Agropolis, France
CIP, Peru
CIAT, Colombia
CIMMYT, Mexico
Bioversity, Italy
WARDA, Côte d’Ivoire
IRRI, Philippines
ICRISAT, India
ICARDA, Syrian Arab Rep.
IITA, Nigeria
EMBRAPA, Brazil
BioTec, Thailand
ACGT, South Africa
ICAR, India
CAAS, China
[Figure: GCP research diagram. Stage labels: Genetic Resources; Process; Product. Subprogramme labels: SP1: Allelic Mining; SP2: Functional Assignment; SP3: Trait Synthesis. Box labels: Genebank; Germplasm genotyping & phenotyping; NILs, RILs, mapping populations; Mutants; Genomic annotation, forward and reverse genetics, gene arrays/gels; Candidate genes; Beneficial alleles linked to traits; Marker-aided selection / transformation; Advanced breeding lines as vehicles; Value-added varieties.]
GCP Research: from Genotype to Phenotype
Integration across Diverse Crop Data
Germplasm: • Inventory • Identification (passport) • Genealogy
Genotype: • Genetic Maps • Physical Maps • DNA Sequence • Functional Annotation • Molecular Variation (Natural or Induced)
Environment: • Location (GIS) • Climate • Day Length • Ecosystem • Agronomy • Stresses
Molecular Expression: • Transcriptome • Proteome • Metabolome • Physiology
Phenotype: • Anatomical • Developmental • Field Performance • Stress Response
Relationship labels in the diagram: has (twice), determines (twice), affects
Crop Information Systems: the Next
Large, globally distributed consortium
Diverse research requiring a diversity of tools
Large data sets with diverse data types
Many legacy informatics systems and tools
Global data integration required…
Key Issue: Interoperability
Some Basic GCP Research Objectives
Compile a list of germplasm meeting specific passport data criteria
Compile a list of genetic markers of interest from genetic and QTL maps
Retrieve genotypes of specified markers, for specified germplasm
Align gene expression data against QTL positional evidence to identify candidate gene loci for specified traits
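The first three objectives above amount to filtered lookups and joins over germplasm, marker and genotype records. The following is a minimal Python sketch of that idea only. The accessions, marker names, field names and function names are all invented for illustration and are not taken from any GCP database or from the GCP platform API.

```python
from dataclasses import dataclass

# All records below are invented for illustration only.
@dataclass(frozen=True)
class Germplasm:
    accession: str
    country: str   # a passport attribute
    species: str

@dataclass(frozen=True)
class Marker:
    name: str
    chromosome: int
    position_cm: float

GERMPLASM = [
    Germplasm("ACC-001", "PH", "Oryza sativa"),
    Germplasm("ACC-002", "IN", "Oryza sativa"),
    Germplasm("ACC-003", "PH", "Zea mays"),
]
MARKERS = [
    Marker("M1", 1, 12.5),
    Marker("M2", 1, 48.0),
    Marker("M3", 3, 20.1),
]
# Genotype calls keyed by (accession, marker name).
GENOTYPES = {
    ("ACC-001", "M1"): "A/A",
    ("ACC-001", "M2"): "A/B",
    ("ACC-002", "M1"): "B/B",
}

def germplasm_by_passport(country, species):
    """Objective 1: list germplasm meeting passport criteria."""
    return [g for g in GERMPLASM
            if g.country == country and g.species == species]

def markers_in_interval(chromosome, start_cm, end_cm):
    """Objective 2: list markers of interest inside a map interval."""
    return [m for m in MARKERS
            if m.chromosome == chromosome
            and start_cm <= m.position_cm <= end_cm]

def genotypes_for(germplasm, markers):
    """Objective 3: genotypes of the given markers for the given germplasm."""
    return {(g.accession, m.name): GENOTYPES[(g.accession, m.name)]
            for g in germplasm for m in markers
            if (g.accession, m.name) in GENOTYPES}

selected = germplasm_by_passport("PH", "Oryza sativa")
interval = markers_in_interval(1, 0.0, 50.0)
calls = genotypes_for(selected, interval)
```

In a distributed setting each of these three functions would typically be answered by a different data source, which is where the interoperability issue raised earlier comes in.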
A Generalized GCP Crop Research Integration Work Flow
[Figure: work flow diagram. Labels recovered from the diagram are listed below.]
Middleware:
• Generation Challenge Programme Domain Model & Middleware
Tools:
• Comparative Map & Trait Viewer (NCGR/ISYS)
• Germplasm Passport/Phenotype/Genotype Query builder
• Comparative (Functional) Genomics Tools
• DIVA-GIS
Data sources:
• Genetic Map Data Source(s)
• Germplasm Data Source(s)
• Genomics Data Source(s)
• GIS Data Source(s)
Work flow steps:
• Get/analyse a genetic map
• Find germplasm genotyped with mapped markers
• Get genotype & phenotype of germplasm
• Get candidate genes in map interval
• Get functional information about genes
• Plot germplasm, genotype and phenotype on geographical maps
• Analyse source environment of germplasm
• Select “interesting” candidate genes; get alleles
• Select adapted germplasm with favorable phenotype & alleles for further evaluation
An environment that provides improved access to data and analysis tools
applications
integrated databases and tools
GCP Information Platform: User Perspective
GCP Information Platform – Developers’ Perspective
application layer
middleware
internet
Tapir MOBY, etc.
Data Registry
local database layer
Generation CP Platform
http://pantheon.generationcp.org
GCP Platform - General Architecture
“Model Driven Architecture” based on “platform independent” GCP scientific domain models, parameterized with controlled vocabulary (“ontology”)
GCP domain models mapped onto platform specific implementations.
Reference (Java) GCP platform application programming interface (API)
Semantics of the GCP Model Driven Architecture
GCP is trying to model the meaning (“semantics”) of the crop research world.
Semantics is found in the domain model at three distinct but interconnected levels:
• System architectural level: general scientific semantics in terms of high-level object concepts (“object types”) and their global inter-relationships.
• Entity level: attributes and behaviors internal to high-level object types.
• Attribute level: attribute values of objects that range over data types: simple (e.g. identifiers, numbers), complex (other classes of entities) or ontology (such as Gene Ontology (GO) terms, for a gene product).
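The three levels can be illustrated with a small Python sketch. This is not the GCP domain model itself (whose reference API is Java): the class names, field names and the GO accession shown are placeholders assumed for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class OntologyTerm:
    """Attribute level: a value drawn from a controlled vocabulary."""
    ontology: str    # e.g. "GO"
    accession: str   # placeholder accession, not a real term
    label: str

@dataclass
class GeneProduct:
    """Entity level: attributes internal to one high-level object type."""
    identifier: str                                   # simple data type
    functions: List[OntologyTerm] = field(default_factory=list)  # ontology data type

@dataclass
class Gene:
    """System architectural level: Gene is related to GeneProduct."""
    identifier: str        # simple data type
    product: GeneProduct   # complex data type (another class of entity)

placeholder_term = OntologyTerm("GO", "GO:0000000", "placeholder function")
product = GeneProduct("PROD-1", [placeholder_term])
gene = Gene("GENE-1", product)

def ontology_terms(g):
    """Collect the (ontology, accession) pairs annotating a gene's product."""
    return [(t.ontology, t.accession) for t in g.product.functions]
```

The relationship between `Gene` and `GeneProduct` sits at the architectural level, the fields of each class at the entity level, and the `OntologyTerm` values at the attribute level.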
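Layers of Semantics
[Figure: three numbered layers (1, 2, 3), captioned “Object Model of the Scientific Domain…” and “…Parameterized with Ontology”. Node labels: Germplasm, Phenotype, Observable, Attribute, Value, Plant Ontology. Relationship labels: “has a”, “has an”, “with a”, “ranges over”.]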
GCP Domain Model Specification
High-level object types are specified with Unified Modeling Language (UML) and associated text narratives.
Major object classes are represented in the object model. More specialized object types are specified by subclassing major object types using ontology.
The reference model is coded in the Eclipse Modeling Framework (EMF), managed under source code versioning, and automatically compiled into other representations.
http://pantheon.generationcp.org/demeter
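Specializing a major object type through ontology, rather than through a new class for every subtype, can be sketched as follows. The term names and the small `is_a` hierarchy are invented for this example and do not reproduce any GCP or public ontology.

```python
from dataclasses import dataclass

# Toy is_a hierarchy (child term -> parent term); hypothetical terms.
IS_A = {
    "gene": "sequence_feature",
    "marker": "sequence_feature",
    "snp_marker": "marker",
    "ssr_marker": "marker",
}

def is_subtype(term, ancestor):
    """True if `term` equals `ancestor` or descends from it via is_a links."""
    while term is not None:
        if term == ancestor:
            return True
        term = IS_A.get(term)
    return False

@dataclass
class Feature:
    """One major object class; its specialised type is an ontology term."""
    identifier: str
    type: str

features = [
    Feature("F1", "gene"),
    Feature("F2", "snp_marker"),
    Feature("F3", "ssr_marker"),
]

# Select every feature whose type is "marker" or a more specific marker term.
marker_ids = [f.identifier for f in features if is_subtype(f.type, "marker")]
```

With this design a new specialized type needs only a new ontology term, not a change to the object model.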
Scope of GCP Domain Model & Ontology
Core models: generic concepts – identification, entities, features, organization, data management
• Models heavily parameterized by ontology (e.g. entity and feature “type” attributes)
Scientific models: extend the core model into specific scientific scopes relevant to GCP:
• Germplasm data (including genetic resources passport)
• Genomics including genotypes, maps, sequences and functional annotation
• Phenotype data
• Environmental data (including geographical location)
GCP Ontology
Every attribute in the GCP domain model with data type SimpleOntologyTerm or subclass thereof, is an integration point for an external ontology.
External public ontology (e.g. GO, PO, SO) reused when available, and new ontology developed within GCP to fill gaps.
Ontology consolidated into GCP database based on GMOD Chado CV tables, indexed within platform using a GCP formatted identifier (that retains the source’s identifier).
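The slides do not give the layout of the GCP formatted identifier, so the sketch below assumes a purely hypothetical form, `GCP:<source>:<accession>`, to show how an internal identifier can retain the source's identifier and be converted back to it. The function names and the accession used are likewise placeholders.

```python
def to_gcp_id(source_identifier: str) -> str:
    """Wrap a source identifier such as 'GO:0000000' in the assumed GCP form."""
    source, _, accession = source_identifier.partition(":")
    if not source or not accession:
        raise ValueError(f"expected '<source>:<accession>', got {source_identifier!r}")
    return f"GCP:{source}:{accession}"

def to_source_id(gcp_id: str) -> str:
    """Recover the original source identifier from the assumed GCP form."""
    parts = gcp_id.split(":", 2)
    if len(parts) != 3 or parts[0] != "GCP":
        raise ValueError(f"not a GCP identifier: {gcp_id!r}")
    return f"{parts[1]}:{parts[2]}"

# A tiny index keyed by the GCP identifier (placeholder term and label).
index = {to_gcp_id("GO:0000000"): "placeholder term label"}
round_trip = to_source_id(next(iter(index)))
```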
GCP Domain Model Mappings onto Platform Specific Implementations
GCP Domain Model (UML/EMF) and GCP Ontology Database, mapped onto:
• GCP Platform Java Middleware & Applications
• OWL/RDF Ontology: VPIN/SSWAP.info
• SOAP Web Services (BioMOBY, SoapLab, GDPC)
• XML Schemata: GCP Data Templates, BioCASE/Tapir
http://pantheon.generationcp.org/demeter
Reference GCP Platform API
PantheonBase: a relatively simple core Java Application Programming Interface (API) for software integration:
• DataSource: query data resources, using simple, ontology-driven SearchFilter specifications
• DataTransformer: computational input/output
• DataConsumer: communicate data to viewers
http://pantheon.generationcp.org
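The division of labor between the three interfaces can be sketched as below. The real PantheonBase API is Java and its signatures are not shown in the slides, so this Python analogue is an assumption throughout: only the names DataSource, DataTransformer, DataConsumer, SearchFilter and find() come from the slides, while `transform`, `consume`, the filter fields and the concrete classes are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class SearchFilter:
    entity_type: str                              # term naming what to look for
    criteria: dict = field(default_factory=dict)  # attribute -> required value

class DataSource(ABC):
    @abstractmethod
    def find(self, search_filter):
        """Return the records matching the filter."""

class DataTransformer(ABC):
    @abstractmethod
    def transform(self, records):
        """Compute an output from input records (hypothetical method name)."""

class DataConsumer(ABC):
    @abstractmethod
    def consume(self, records):
        """Pass data on to a viewer (hypothetical method name)."""

class InMemoryGermplasmSource(DataSource):
    def __init__(self, records):
        self._records = records

    def find(self, search_filter):
        if search_filter.entity_type != "germplasm":
            return []
        return [r for r in self._records
                if all(r.get(k) == v for k, v in search_filter.criteria.items())]

class AccessionExtractor(DataTransformer):
    def transform(self, records):
        return sorted(r["accession"] for r in records)

class ListCollector(DataConsumer):
    def __init__(self):
        self.received = []

    def consume(self, records):
        self.received.extend(records)

source = InMemoryGermplasmSource([
    {"accession": "ACC-003", "country": "PH"},
    {"accession": "ACC-002", "country": "IN"},
    {"accession": "ACC-001", "country": "PH"},
])
collector = ListCollector()
hits = source.find(SearchFilter("germplasm", {"country": "PH"}))
collector.consume(AccessionExtractor().transform(hits))
```

Because each stage depends only on the interface of its neighbor, a different data source or viewer can be swapped in without touching the other two stages.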
GCP DataSource Interface
DataSource Interface
GCP Data Source Implementations
• Direct integration of relational databases (Spring HttpInvoker, Hibernate, JPA): developed for ICIS, GMOD Chado (beta)
• Protocols:
  • Generalized Java client to connect to BioMoby web services; Java support for GCP-compliant BioMoby web service provider development (beta)
  • Support for BioCase/Tapir data source integration (prototyped)
  • GCP-compliant GDPC data source (prototyped)
  • SSWAP/VPIN wrapper (under discussion)
• Some other direct custom data source wrappers
Some GCP BioMOBY docs…
http://cropwiki.irri.org/gcp/index.php/MOBY_Rice_Network
http://pantheon.generationcp.org/moby
http://moby.generationcp.org
GCP BioMoby Support – a Synopsis
1. MoSES + Dashboard developed (M. Senger).
2. GCP model specific BioMoby datatypes specified.
3. Java libraries partly developed for interconversion of GCP BioMoby data types to/from GCP domain model Java objects (Barboza).
4. GCP DataSource Java implementation developed for the client side of BioMoby that maps GCP DataSource find() use cases onto BioMoby web services using XML configuration files (no coding).
5. Java design pattern for modular implementation of BioMoby web services that get their data from any GCP-compliant DataSource that supports a given find() use case.
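Points 4 and 5 describe configuration-driven dispatch: a find() use case is looked up in an XML file and routed to a named web service, with no per-service code. The sketch below shows that pattern in Python under stated assumptions. The XML element and attribute names, the use case names, the service names and the `fake_invoke` stand-in are all invented here; the actual GCP configuration format and the BioMoby client calls are not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: one <usecase> element per supported find().
CONFIG = """
<datasource>
  <usecase find="germplasm-by-passport" service="getGermplasmByPassport"/>
  <usecase find="genotype-by-marker" service="getGenotypeByMarker"/>
</datasource>
"""

def load_mapping(xml_text):
    """Parse the configuration into {use case name: service name}."""
    root = ET.fromstring(xml_text.strip())
    return {uc.get("find"): uc.get("service") for uc in root.findall("usecase")}

class ConfiguredDataSource:
    """Routes find() use cases to services named in the configuration."""

    def __init__(self, mapping, invoke):
        self._mapping = mapping
        self._invoke = invoke  # callable(service_name, criteria) -> list

    def supports(self, use_case):
        return use_case in self._mapping

    def find(self, use_case, criteria):
        if use_case not in self._mapping:
            raise LookupError(f"unsupported find() use case: {use_case}")
        return self._invoke(self._mapping[use_case], criteria)

def fake_invoke(service_name, criteria):
    """Stand-in for a remote web service call; performs no network access."""
    return [{"service": service_name, "criteria": criteria}]

data_source = ConfiguredDataSource(load_mapping(CONFIG), fake_invoke)
result = data_source.find("genotype-by-marker", {"marker": "M1"})
```

Supporting a further use case then means adding one line to the configuration rather than writing new client code.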
GCP BioMoby “Sandwich”
(Partial) Inventory of 3rd Party Data Resources targeted for wrapping as GCP Data Sources
Data Type | Description
Microarray Data | MAXD database with microarray datasets from diverse GCP commissioned or competitive projects.
Genetic and QTL Mapping Data | QTL data available in ICIS, TropGenes. Genomic Diversity and Phenotype Connector (GDPC) connecting to Gramene, Panzea, GrainGenes et al.
Genomic Sequence Data and Annotation | NIAS KOME full length cDNA and RAP genome databases (?), connected to GCP web services by NIAS. OryzaSNP and GCP comparative genomic databases. Public sequence databases (via BioJava?)
Functional Genomics | OryGenesDb mutant data (CIRAD); IR64 rice mutant database (IRRI); Tos17 database (NIAS).
Germplasm Sample Characterization Data | Germplasm, passport, genotype and associated field data available in ICIS databases; TropGenes, MGIS, ICRIS.
GCP Platform Implementations
Standalone workbench (“GenoMedium”): Eclipse Rich Client Platform (RCP)
Web-based workbench (“Koios”): AJAX, PHP, Java (server side), Java Web Start
NCGR Integrated SYStem (ISYS)
Direct tool integration (e.g. GCP MaxdLoad)
http://moby.generationcp.org
GCP Web-Based Search Engine
http://koios.generationcp.org
[Screenshot callouts:]
• GCP semantics defined query
• Summary of query hits
• List of items matched
• View details at 3rd party web site or in locally invoked 3rd party data viewer
(Partial) Inventory of 3rd Party Analysis/Viewer Software being targeted for GCP Integration
Tool | Purpose
SoapLab2 | Remote computational services access
Taverna | Bioinformatics work flow management
Apollo | Genome sequence browser
Cytoscape | Visualization of networks
ATV | Phylogenetic tree visualization
JalView | Comparative sequence alignments
TMEV | Microarray data analysis
EASE, Mapman | Gene functional annotation
CMTV | Comparative mapping and QTL
MAXDLoad & MAXDView | Microarray data management
GDPC tools (Browser, Tassel) | Genomic diversity analysis
GCP “Pantheon” Project in CropForge
http://cropforge.org/projects/pantheon/
Closing Perspective
The GCP is a global consortium of 22++ crop research partners who need to share diverse large data sets and tools, in a globally distributed manner.
Given the scope and duration of the GCP, developers within the consortium embraced the task of developing public global informatics standards for interoperability and integration.
The effort is an open source, global community building exercise.
We welcome the participation of any and all interested scientists and developers who might wish to use and/or contribute to the further evolution and application of these standards.