An open platform approach for management and analysis of next-generation genotyping data by breeders and geneticists

Ramil P. Mauleon, Scientist – Bioinformatics Specialist

TT Chang Genetic Resources Center, International Rice Research Institute

NGGIBCI-2014, ICRISAT, India

Presented on behalf of my co-authors from IRRI

Laboratory, software team

• Venice Margaret Juanillas

• Christine Jade Dilla-Ermita

Lead Scientists

• Michael Thomson – Genotyping Service Laboratory

• Nickolai Alexandrov – International Rice Informatics Consortium

• Hei Leung – Program 1 Leader

• Eero Nissila – Program 2 Leader

Outline

• Introduction to IRRI / Global Rice Science Partnership research agenda and bioinformatics needs

• Open bioinformatics platforms adopted to support molecular rice breeding at IRRI

INTERNATIONAL RICE RESEARCH INSTITUTE Los Baños, Philippines

Mission:

Reduce poverty and hunger,

Improve the health of rice farmers and consumers,

Ensure environmental sustainability

All done through research and partnerships

Home of the Rice Green Revolution

Established 1960 www.irri.org

Aims to help rice farmers improve the yield and quality of their rice by developing:

• New rice varieties

• Rice crop management techniques

A single strategic work plan for global rice research…

Global Rice Science Partnership (GRiSP)

o Core: 3 international research centers

o Numerous research partners

o NEED TO SHARE RESEARCH SOLUTIONS

IRRI Many more…

NGS is at the heart of several GRiSP research activities…

• Characterizing genetic diversity and creating novel gene pools

>>Whole genome sequencing of genebank stocks

>>Specialized populations for genetic studies

• Genes and allelic diversity conferring stress tolerance and enhanced nutrition

>>Candidate gene discovery

• Accelerating the development, delivery, and adoption of improved rice varieties

>>high-throughput marker applications

Challenges for IRRI scientists/breeders

• Not familiar with SNP-based genotyping

o How do I score the alleles? (no gel image!!!)

o Data does not fit my spreadsheet (run out of columns, rows)…

o Cannot even view the data file using “ordinary” apps

o Computer runs out of memory when I load the dataset…

o Trusted analysis software crashes inexplicably…

• We need to

o Enable geneticists, molecular biologists, and breeders to do bioinformatics

o Share solutions openly across GRiSP partners, with rice research community as a whole
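
One common way around the spreadsheet and memory limits listed above is to stream the genotype file one marker at a time instead of loading it whole. The following is a minimal sketch only, not the method used at IRRI. The tab-delimited marker-by-sample layout, the `NA` missing code, the function name `summarize_genotypes`, and the toy rows are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def summarize_genotypes(handle, missing="NA"):
    """Stream a tab-delimited marker-by-sample file and summarize each marker.

    Assumed (hypothetical) layout: a header row holding 'marker' plus the
    sample names, then one row per SNP marker with one call per sample.
    Only a single row is held in memory at any time.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    n_samples = len(header) - 1
    for row in reader:
        marker, calls = row[0], row[1:]
        counts = Counter(calls)
        n_missing = counts.pop(missing, 0)
        # Yield: marker name, called samples, missing samples, genotype counts
        yield marker, n_samples - n_missing, n_missing, dict(counts)


# Toy rows invented for this example; they are not real genotyping results.
demo = io.StringIO(
    "marker\tsample_1\tsample_2\tsample_3\n"
    "snp_001\tAA\tGG\tAA\n"
    "snp_002\tCC\tNA\tCT\n"
)
for marker, n_called, n_missing, counts in summarize_genotypes(demo):
    print(marker, n_called, n_missing, counts)
```

Because the function is a generator, the same loop works unchanged on an open file handle for a dataset with far more rows than a spreadsheet can display.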

Bioinformatics needs

1. Manipulate, pre-process, and analyze HT data

o Ideally from genotyping machine to analysis-ready data to analysis results

o GALAXY Bioinformatics (Blankenberg et al 2010, http://galaxyproject.org/)

2. Systematic storage of HT genotyping datasets

o Database

o Retrieval of data for analysis

o Generation Challenge Program BMS-GDMS

Galaxy has features that fit our needs

“Open, web-based platform for accessible, reproducible, and transparent computational biomedical research”

• Accessible: Users w/o programming experience can easily specify parameters and run tools and workflows.

• Reproducible: Galaxy captures info so that any user can repeat and understand a complete computational analysis.

• Transparent: Users share and publish analyses via the web and create interactive, web-based documents that describe a complete analysis.

Standard Galaxy release

Integration of Galaxy into the Genotyping Service Lab workflow

Genotyping platforms:

• Illumina BeadXpress Genotyping

• Fluidigm EP1 Genotyping

• Infinium Custom 6k chip

• Illumina reduced genome resequencing (GBS)

GenomeStudio with Alchemy Plugin

IRRI GSL-Galaxy:

• SNP calling (Alchemy, TASSEL-GBS)

• Data prep/manipulation

• Genetic / association analysis

• Bioinformatic analysis

IRRI GALAXY (current)

• Deployed in the cloud (Amazon Web Services Large instance in the Asia-Pacific region)

• Streamlined to contain rice-specific tools and genotyping data

• NO NGS assembly tools included

Rice genome browser installed as data source for curated SNP, genome information

Comprehensive information on SNPs used in GSL

Shared data libraries available within group or to the public

Data manipulation tools in GSL Galaxy

•Format conversion for most commonly used genotype visualization, genetic analysis, diversity study software
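
As an illustration of what such a conversion step involves, the sketch below turns a marker-by-sample table into a sample-by-marker table, since downstream programs differ in which orientation they expect. It is an assumption-laden example and not the GSL Galaxy tool itself: the function name `transpose_genotype_table` and the tab-delimited layout are hypothetical, and the extra header lines that specific programs require are not reproduced here.

```python
import csv
import io


def transpose_genotype_table(src, dst, delimiter="\t"):
    """Rewrite a marker-by-sample table as a sample-by-marker table.

    The whole table is read into memory, so this suits small and
    medium-sized marker panels rather than very large datasets.
    """
    rows = list(csv.reader(src, delimiter=delimiter))
    if not rows:
        return
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("ragged table: every row needs the same number of columns")
    # After transposing, each data row is a sample, so relabel the corner cell.
    rows[0][0] = "sample"
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    for column in zip(*rows):
        writer.writerow(column)


# Toy input invented for this example.
source = io.StringIO(
    "marker\tsample_1\tsample_2\n"
    "snp_001\tAA\tGG\n"
    "snp_002\tCC\tCT\n"
)
target = io.StringIO()
transpose_genotype_table(source, target)
print(target.getvalue())
```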

Workflows for rice data analysis already defined

SNP calling to analysis-ready dataset for the Illumina BeadXpress and Infinium platforms; being tested on the Fluidigm system…

Implementing Buckler GBS bioinformatics pipeline…

GBS SNP discovery analysis steps installed …

• To follow …

Workflow for GBS pipeline implemented, customizable steps …

SNP Data filtering workflow …
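
The slide does not list the filtering steps, so the sketch below shows two filters that are commonly applied to SNP data: a minimum call rate and a minimum minor allele frequency (MAF). It is an assumed example, not the IRRI workflow. The function names, the `NA` missing code, the two-letter diploid call format, and the thresholds 0.8 and 0.05 are all illustrative choices.

```python
from collections import Counter


def call_rate_and_maf(calls, missing="NA"):
    """Return (call rate, minor allele frequency) for one marker.

    Calls are assumed to be two-letter diploid genotypes such as 'AA' or 'AG'.
    MAF is taken as the frequency of the second most common allele and is
    reported as 0.0 for monomorphic or fully missing markers.
    """
    called = [c for c in calls if c != missing]
    call_rate = len(called) / len(calls) if calls else 0.0
    alleles = Counter(allele for call in called for allele in call)
    total = sum(alleles.values())
    if total == 0 or len(alleles) < 2:
        return call_rate, 0.0
    maf = sorted(alleles.values())[-2] / total
    return call_rate, maf


def filter_markers(genotypes, min_call_rate=0.8, min_maf=0.05):
    """Keep the markers that pass both thresholds (thresholds are illustrative)."""
    kept = []
    for marker, calls in genotypes.items():
        call_rate, maf = call_rate_and_maf(calls)
        if call_rate >= min_call_rate and maf >= min_maf:
            kept.append(marker)
    return kept


# Toy data invented for this example.
genotypes = {
    "snp_001": ["AA", "AA", "GG", "AG"],  # fully called and polymorphic
    "snp_002": ["CC", "NA", "NA", "NA"],  # low call rate
    "snp_003": ["TT", "TT", "TT", "TT"],  # monomorphic
}
print(filter_markers(genotypes))  # prints ['snp_001']
```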

Defined workflows can be shared with other users for reproducibility of analysis

Useful SNP analysis tools already in place

TASSEL is being incorporated into IRRI Galaxy

TASSEL (Buckler Laboratory, Cornell University): a stand-alone software package to evaluate trait associations, evolutionary patterns, and linkage disequilibrium.

TASSEL – IRRI Galaxy first iteration …

TASSEL pipeline mode for GWAS, population analysis

Software Tools for SNP analysis

• SNP calling: Alchemy (http://alchemy.sourceforge.net/), TASSEL-GBS (http://sourceforge.net/projects/tassel/)

• SNP data exploration, visualization: Flapjack (http://bioinf.scri.ac.uk/flapjack/), TASSEL

• Genetic linkage mapping: Mapmanager QTX, R/QTL

• QTL analysis: R/QTL, QGene, MPMap (for multi-parent inter-crosses)

• GWA analysis: TASSEL, SNP/WGA

• Population structure / diversity analysis : Powermarker, Structure

IRRI Galaxy Toolshed (“APPS STORE”) is under development

IRRI GALAXY Roadmap

Analysis-ready data for external software

• GSL-specific data analysis software (GSL-generated data + analysis) for clients

• Community user analysis

Scientific community users

Published analysis workflows/methods: GRiSP output

Genotyping data management

IRRI GSL manages data of customers …

• Customer declares data as private – retained in the customer's GSL Galaxy account

• Customer declares data as public – loaded into Genotyping Data Management System; shared with research community

More than 2000 varieties with 384-SNP genotyping data loaded

Genotype data matrix …

Short-term roadmap for GDMS & IRRI Galaxy

Goal: Integrate GDMS seamlessly as a data source in IRRI Galaxy

Conclusion

> Web-based open software platforms are used to build:

• Data analysis workbench using the Galaxy framework, currently tightly integrated with the IRRI Genotyping Service Laboratory

• Medium-throughput genotyping database using GDMS

> Tools geared to handle NGS-based / high-throughput genotyping datasets

> User interface designed for use by geneticists, molecular biologists, and plant breeders

Thank you!!