tin-lap lee: next-gen sequencing analysis by gigagalaxy

Next-Gen Sequencing Analysis by GigaGalaxy. Tin-Lap Lee, School of Biomedical Sciences, CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong

Upload: gigascience-bgi-hong-kong

Post on 28-Jan-2015


DESCRIPTION

Tin-Lap Lee's presentation at Bio-IT World Asia on "Next-Gen Sequencing Analysis by GigaGalaxy", 30th May 2013

TRANSCRIPT

Page 1: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

Next-Gen Sequencing Analysis by GigaGalaxy

Tin-Lap Lee, School of Biomedical Sciences

CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong

Page 2: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

CUHK-BGI Innovation Institute of Trans-omics (CBIIT)

• Jointly established between The Chinese University of Hong Kong (CUHK) and BGI in July 2011.

• “We aim to provide a platform conducive to training of multi-disciplinary talents conversant with the knowledge and application of genomics, proteomics, genetics, computational biology and bioinformatics, by capitalizing on both institutions’ expertise and strengths in genomic science.”

Page 3: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

Galaxy

http://galaxyproject.org/

Page 4: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

www.gigasciencejournal.com

Journal, data-platform and database for large-scale data

Editor-in-Chief: Laurie Goodman

Executive Editor: Scott Edmunds

Commissioning Editor: Nicole Nogoy

Lead Curator: Chris Hunter

Data Platform: Peter Li

in conjunction with

Page 5: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

GigaDB

Page 6: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

GigaGalaxy: collaboration between GigaScience and CBIIT

A publicly accessible Galaxy server

Share some of the workload of the main Galaxy server

Host data and workflows published in GigaScience, particularly involving NGS data analysis

SOAP package: advantages from GigaGalaxy

Application Instance: SOAPdenovo2 tool

Page 7: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

http://www.cuhk.edu.hk/cbiit/galaxy.html

Galaxy/CUHK-BGI

Page 8: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

Import data from GigaDB to GigaGalaxy

Page 9: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

GigaSolution: deconstructing the paper

www.gigadb.org

www.gigasciencejournal.com

galaxy.cbiit.cuhk.edu.hk

Combines and integrates:

Open-access journal

Data Publishing Platform

Data Analysis Platform

Page 10: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

Example

Data + Methods = Analysis

Data (doi:10.5524/100038): Wang J et al. (2012): Updated genome assembly of YH: the first diploid genome sequence of a Han Chinese individual (version 2, 07/2012). GigaScience Database. http://dx.doi.org/10.5524/100038

Methods (doi:10.5524/100044): Luo R et al. (2012): Software and supporting material for “SOAPdenovo2: An empirically improved memory-efficient short read de novo assembly”. GigaScience Database. http://dx.doi.org/10.5524/100044

Analysis (doi:10.1186/2047-217X-1-18): Luo R et al. (2012): SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience, 1:18 (28th December 2012). http://dx.doi.org/10.1186/2047-217X-1-18

Page 11: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy
Page 12: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

CBIIT GigaGalaxy Structure

Tool Development

Publishing

Biomedical and bioinformatics research

Page 13: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

What is SOAP?

• SOAP: a tool package from BGI that provides a full solution for NGS data analysis.

http://soap.genomics.org.cn/

Page 14: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

SOAPdenovo2 tools

An assembly tool for short reads generated by NGS technology

Four modules:

• Pregraph: constructs the de Bruijn graph

• Contig: identifies contigs from overlapping sequence reads

• Map: maps reads onto contigs

• Scaff: generates the final assembly results

Generates 1. contig and 2. scaffold files
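
The Pregraph step is described as constructing a de Bruijn graph from the reads. As a rough illustration of that idea only, the sketch below builds a toy de Bruijn graph in Python: every k-mer in a read becomes an edge from its first k-1 bases to its last k-1 bases. The function name `build_de_bruijn`, the example reads and the choice of k are assumptions made for illustration; this is not SOAPdenovo2's implementation, which uses far more memory-efficient data structures.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Toy de Bruijn graph: (k-1)-mer prefix -> list of (k-1)-mer suffixes.

    Hypothetical helper for illustration; edges are kept with multiplicity,
    so a k-mer seen twice contributes two identical edges.
    """
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Two made-up overlapping reads.
reads = ["ACGTC", "CGTCA"]
graph = build_de_bruijn(reads, k=3)

for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

Walking unambiguous paths through such a graph is, loosely, how overlapping reads get merged into contigs.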

Page 15: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

SOAPdenovo2 in GigaGalaxy

Page 16: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

Integrate BGI SOAP tools into GigaGalaxy

Page 17: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

Assembly Supporting Tools

• SOAPfilter: removes reads with artifacts

• Kmerfreq HA: a k-mer frequency counter

• Corrector HA: corrects sequencing errors in short reads

• Gapcloser: closes gaps in scaffolds
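
To make the idea of a k-mer frequency counter concrete, here is a minimal Python sketch that counts every k-mer across a set of reads. The function name `count_kmers` and the example reads are hypothetical, and the sketch makes no claim about how Kmerfreq HA is implemented internally. In general, k-mers that occur very rarely are often treated as candidate sequencing errors, which is why a frequency table is a natural input to an error-correction step.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count occurrences of each k-mer across all reads (illustrative only)."""
    counts = Counter()
    for read in reads:
        # Reads shorter than k contribute nothing (empty range).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Made-up reads for demonstration.
counts = count_kmers(["ACGTACGT", "CGTA"], k=4)

for kmer, n in sorted(counts.items()):
    print(kmer, n)
```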

Page 18: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

Put them together

Sequencing Data → SOAPfilter → KmerFreq HA → Corrector HA → SOAPdenovo2 → GAGE evaluation

Page 19: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

SOAPdenovo2 Workflow

Page 20: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

S. aureus Dataset

Page 21: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

GAGE
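
Assembly evaluations of this kind typically report contiguity statistics such as N50: the length of the contig at which the running total, taken from the longest contig down, first reaches half of the total assembled length. The Python sketch below computes a plain N50 from a list of contig lengths. The function name `n50` and the lengths are made up for illustration, and the sketch does not reproduce GAGE's own scripts or its reference-based corrections.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    # Accumulate from the longest contig down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Hypothetical contig lengths, in bases.
contig_lengths = [100, 80, 60, 40, 20]
print(n50(contig_lengths))
```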

Page 22: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

Visualization Tool: CONTIGuator2

Page 23: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

CONTIGuator2 output

Page 24: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

Visualization

NC_010079.pdf

gi_161510924_ref_NC_010063.1_.pdf

Page 25: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

Help Center: Shared Data

• Several datasets are available from the shared data menu for test-running the tools.

• Data Libraries

• Published Workflows

• Published Pages

Page 26: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

What is in the shared data menu?

Page 27: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

SOAPdenovo2 tutorial

Page 28: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy
Page 29: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

How is GigaScience supporting data reproducibility?

Data sets: linked to DOI

Analyses: linked to DOI

Open-Paper

Open-Review

DOI:10.1186/2047-217X-1-18

~10000 accesses

Open-Code

8 reviewers tested data in ftp server & named reports published

DOI:10.5524/100044

Open-Pipelines

Open-Workflows

DOI:10.5524/100038

Open-Data

78GB CC0 data

Code on SourceForge under GPLv3: http://soapdenovo2.sourceforge.net/

~5000 downloads

Enabled the code to be picked apart by bloggers in a wiki: http://homolog.us/wiki/index.php?title=SOAPdenovo2

Page 30: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

SOAPdenovo2 workflows implemented in galaxy.cbiit.cuhk.edu.hk

Implemented the entire workflow on the GigaGalaxy server, including:

• 3 pre-processing steps

• 4 SOAPdenovo modules

• 1 post-processing step

• Evaluation and visualization tools

Will be available to >25K Galaxy users in the Galaxy Toolshed

Page 31: Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

Acknowledgements

• CUHK
  • Huayuan Gao

• BGI-HK and GigaScience
  • Peter Li
  • Scott Edmunds

• Galaxy team members