An open platform approach for management and analysis of next-generation genotyping data by breeders and geneticists
Ramil P. Mauleon Scientist – Bioinformatics Specialist
TT Chang Genetic Resources Center
International Rice Research Institute
NGGIBCI-2014, ICRISAT, India
Presented on behalf of my co-authors from IRRI
Laboratory, software team
• Venice Margaret Juanillas
• Christine Jade Dilla-Ermita
Lead Scientists
• Michael Thomson – Genotyping Service Laboratory
• Nickolai Alexandrov – International Rice Informatics Consortium
• Hei Leung – Program 1 Leader
• Eero Nissila – Program 2 Leader
Outline
• Introduction to IRRI / Global Rice Science Partnership research agenda and bioinformatics needs
• Open Bioinformatics Platforms adopted to support molecular rice breeding at IRRI
INTERNATIONAL RICE RESEARCH INSTITUTE Los Baños, Philippines
Mission:
Reduce poverty and hunger,
Improve the health of rice farmers and consumers,
Ensure environmental sustainability
All done through research, partnerships
Home of the Rice Green Revolution
Established 1960
www.irri.org
Aims to help rice farmers improve the yield and quality of their rice by developing:
• New rice varieties
• Rice crop management techniques
A single strategic work plan for global rice research…
Global Rice Science Partnership: GRiSP
o Core: 3 international research centers
o Numerous research partners
o NEED TO SHARE RESEARCH SOLUTIONS
IRRI Many more…
NGS is at the heart of several GRiSP research activities…
• Characterizing genetic diversity and creating novel gene pools
>>Whole genome sequencing of genebank stocks
>>Specialized populations for genetic studies
• Genes and allelic diversity conferring stress tolerance and enhanced nutrition
>>Candidate gene discovery
• Accelerating the development, delivery, and adoption of improved rice varieties
>>high-throughput marker applications
Challenges for IRRI scientists/breeders
• Not familiar with SNP-based genotyping
o How do I score the alleles? (no gel image!!!)
o Data does not fit my spreadsheet (run out of columns, rows)…
o Cannot even view the data file using “ordinary” apps
o Computer runs out of memory when I load the dataset…
o Trusted analysis software crashes inexplicably…
• We need to
o Enable geneticists, molecular biologists, and breeders to do bioinformatics
o Share solutions openly across GRiSP partners and with the rice research community as a whole
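The spreadsheet and memory limits listed above usually come from loading a whole genotype matrix at once. A minimal sketch of the alternative, reading one SNP row at a time, is shown here. The file layout (tab-delimited, SNPs as rows, samples as columns), the missing-call codes, and the toy data are all assumptions for illustration, not the GSL format.

```python
import csv
import io

# Assumed codes for a missing genotype call; adjust to the real export.
MISSING = {"", "NA", "N", "--"}

def call_rates(handle):
    """Yield (snp_id, call_rate) one SNP at a time.

    Only the current row is held in memory, so the number of SNP rows
    is limited by the file size on disk, not by RAM.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)              # first row: label + sample names
    n_samples = len(header) - 1
    for row in reader:
        snp_id, calls = row[0], row[1:]
        called = sum(1 for c in calls if c not in MISSING)
        yield snp_id, called / n_samples

# Toy input (hypothetical SNP ids and calls), standing in for a large file.
demo = io.StringIO(
    "snp\tIR64\tNipponbare\tAzucena\n"
    "snp001\tAA\tAA\tGG\n"
    "snp002\tCC\tNA\tNA\n"
)
rates = dict(call_rates(demo))
```

With a real file, `open(path, newline="")` would replace the `io.StringIO` object; the generator itself does not change.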
Bioinformatics needs
1. Manipulate, pre-process, and analyze HT data
o Ideally from genotyping machine to analysis-ready data to analysis results
o GALAXY Bioinformatics (Blankenberg et al 2010, http://galaxyproject.org/)
2. Systematic storage of HT genotyping datasets
o Database
o Retrieval of data for analysis
o Generation Challenge Program BMS-GDMS
Galaxy has features that fit our needs
“Open, web-based platform for accessible, reproducible, and transparent computational biomedical research”
• Accessible: Users w/o programming experience can easily specify parameters and run tools and workflows.
• Reproducible: Galaxy captures info so that any user can repeat and understand a complete computational analysis.
• Transparent: Users share and publish analyses via the web and create interactive, web-based documents that describe a complete analysis.
Standard Galaxy release
Illumina BeadXpress Genotyping
Fluidigm EP1 Genotyping
GenomeStudio with Alchemy Plugin
IRRI GSL-Galaxy
Infinium Custom 6k chip
Integration of Galaxy to Genotyping Service Lab workflow
• SNP calling (Alchemy, TASSEL-GBS)
• Data prep/manipulation
• Genetic / association analysis
• Bioinformatic analysis
Illumina reduced genome resequencing (GBS)
IRRI GALAXY (current)
• Deployed in the cloud (Amazon Web Services Large instance in Asia-Pacific region)
• Streamlined to contain rice-specific tools and genotyping data
• NO NGS assembly tools included
Rice genome browser installed as data source for curated SNP, genome information
Comprehensive information on SNPs used in GSL
Shared data libraries available within group or to the public
Data manipulation tools in GSL Galaxy
• Format conversion for the most commonly used genotype visualization, genetic analysis, and diversity study software
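The slides do not describe how the GSL Galaxy converters work. As a generic illustration of one common conversion step, the sketch below transposes a SNP-by-sample table into the sample-by-marker layout that many genetic analysis programs expect. The tab-delimited layout and the toy data are assumptions.

```python
import csv
import io

def transpose_matrix(src, dst):
    """Rewrite a tab-delimited matrix with rows and columns swapped.

    The whole table is read into memory, so this suits modest files only.
    zip() stops at the shortest row, so ragged input is silently truncated.
    """
    rows = list(csv.reader(src, delimiter="\t"))
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for column in zip(*rows):
        writer.writerow(column)

# Toy input: SNPs as rows, samples as columns (hypothetical calls).
src = io.StringIO(
    "snp\tIR64\tAzucena\n"
    "snp001\tAA\tGG\n"
    "snp002\tCC\tCT\n"
)
dst = io.StringIO()
transpose_matrix(src, dst)
converted = dst.getvalue()
```

A real converter would also rewrite allele codes and headers to match the target program, which is specific to each tool and not covered here.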
Workflows for rice data analysis already defined
SNP calling to analysis-ready dataset for the Illumina BeadXpress and Infinium platforms; being tested on the Fluidigm system…
Implementing Buckler GBS bioinformatics pipeline…
GBS SNP discovery analysis steps installed …
• To follow …
Workflow for GBS pipeline implemented, customizable steps …
SNP Data filtering workflow …
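The filtering criteria used in this workflow are not given in the slides. A minimal sketch of two filters commonly applied to SNP data, call rate and minor allele frequency (MAF), might look like this. The thresholds, the two-letter genotype coding, the missing-call codes, and the toy data are all assumptions.

```python
from collections import Counter

# Assumed codes for a missing genotype call.
MISSING = {"", "NA", "NN", "--"}

def snp_stats(calls):
    """Return (call_rate, maf) for one SNP given calls like 'AA' or 'AG'."""
    called = [c for c in calls if c not in MISSING]
    call_rate = len(called) / len(calls) if calls else 0.0
    alleles = Counter(a for c in called for a in c)   # count single alleles
    total = sum(alleles.values())
    if total == 0:
        return call_rate, 0.0
    counts = sorted(alleles.values(), reverse=True)
    # Frequency of the second most common allele; 0.0 if monomorphic.
    maf = counts[1] / total if len(counts) > 1 else 0.0
    return call_rate, maf

def filter_snps(matrix, min_call_rate=0.8, min_maf=0.05):
    """Keep SNPs that pass both (assumed) thresholds."""
    kept = {}
    for snp_id, calls in matrix.items():
        call_rate, maf = snp_stats(calls)
        if call_rate >= min_call_rate and maf >= min_maf:
            kept[snp_id] = calls
    return kept

# Toy data with hypothetical SNP ids.
demo = {
    "snp001": ["AA", "AG", "GG", "AA"],   # fully called, polymorphic
    "snp002": ["CC", "CC", "CC", "CC"],   # monomorphic, fails MAF
    "snp003": ["TT", "NA", "NA", "GG"],   # half missing, fails call rate
}
kept = filter_snps(demo)
```

Only `snp001` passes both filters in this toy set; the other two each fail one threshold.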
Defined workflows can be shared with other users for reproducibility of analysis
Useful SNP analysis tools already in place
TASSEL is being incorporated into IRRI Galaxy
TASSEL (Buckler Laboratory, Cornell University): a stand-alone software package to evaluate trait associations, evolutionary patterns, and linkage disequilibrium.
TASSEL – IRRI Galaxy first iteration …
TASSEL pipeline mode for GWAS, population analysis
Software Tools for SNP analysis
• SNP calling: Alchemy (http://alchemy.sourceforge.net/), TASSEL-GBS (http://sourceforge.net/projects/tassel/)
• SNP data exploration, visualization: Flapjack (http://bioinf.scri.ac.uk/flapjack/), TASSEL
• Genetic linkage mapping: Mapmanager QTX, R/QTL
• QTL analysis: R/QTL, QGene, MPMap (for multi-parent inter-crosses)
• GWA analysis: TASSEL, SNP/WGA
• Population structure / diversity analysis : Powermarker, Structure
IRRI Galaxy Toolshed (“APPS STORE”) is under development
IRRI GALAXY Roadmap
Analysis-ready data for external software
• GSL-specific data analysis software (GSL-generated data + analysis) for clients
• Community user analysis
Scientific community users
Published analysis workflows/methods: GRiSP output
Genotyping data management
IRRI GSL manages data of customers …
• Customer declares data as private – retained in the customer's GSL Galaxy account
• Customer declares data as public – loaded into Genotyping Data Management System; shared with research community
> 2000 varieties with 384-SNP genotyping data loaded
Genotype data matrix …
Short-term roadmap for GDMS & IRRI Galaxy
Goal: Integrate GDMS seamlessly as a data source in IRRI Galaxy
Conclusion
> Web-based open software platforms are used to build:
• Data analysis workbench using Galaxy framework, currently tightly integrated with IRRI Genotyping Service Laboratory
• Medium throughput genotyping database using GDMS
> Tools geared to handle NGS-based / high-throughput genotyping datasets
> User interface designed for use by geneticists, molecular biologists, and plant breeders
Thank you!!