Genentech ICGC 2015
Status and Update of the International Cancer Genome Consortium (ICGC)
June 1st 2015
B.F. Francis Ouellette [email protected]
• Senior Scientist & Associate Director, Informatics and Biocomputing, Ontario Institute for Cancer Research, Toronto, ON
• Associate Professor, Department of Cell and Systems Biology, University of Toronto, Toronto, ON.
ONTARIO INSTITUTE FOR CANCER RESEARCH
You are free to:
Copy, share, adapt, or re-mix;
Photograph, film, or broadcast;
Blog, live-blog, or post video of;
This presentation. Provided that:
You attribute the work to its author and respect the rights and licenses associated with its components.
Slide Concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only ccZero.
Social Media Icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at: http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites
Disclaimer
I am on the SAB of many NIH-funded projects (SGD, Galaxy, GenomeSpace, and HMP2), as well as on the Science and Industry Advisory Committee of Genome Canada.
I do not (and will not) profit in any way, shape or form, from any of the brands, products or companies I may mention.
International Cancer Genome Consortium
http://www.csb.utoronto.ca/
http://bioinformatics.ca/
http://bioinformatics.ca/workshops/2014
Cancer: A Disease of the Genome
Challenge in Treating Cancer:
Every tumor is different
Every cancer patient is different
Large-Scale Studies of Cancer Genomes
Johns Hopkins: > 18,000 genes analyzed for mutations; 11 breast and 11 colon tumors. L.D. Wood et al., Science, Oct. 2007
Wellcome Trust Sanger Institute: 518 genes analyzed for mutations; 210 tumors of various types. C. Greenman et al., Nature, Mar. 2007
TCGA (NIH): multiple technologies; brain (glioblastoma multiforme), lung (squamous carcinoma), and ovarian (serous cystadenocarcinoma). F.S. Collins & A.D. Barker, Sci. Am., Mar. 2007
Lessons learned
Heterogeneity within and across tumor types
High rate of abnormalities (driver vs passenger)
Sample quality matters
Consent and controlled data access is complicated
International Cancer Genome Consortium
Collect ~500 tumour/normal pairs from each of 50 different major cancer types;
Comprehensive genome analysis of each T/N pair:
• Genome
• Transcriptome
• Methylome
• Clinical data
Make the data available to the research community & public.
Identify genome changes
…GATTATTCCAGGTAT… …GATTATTGCAGGTAT… …GATTATTGCAGGTAT…
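A minimal sketch of what "identify genome changes" means for the three example reads above. It assumes (this is not stated on the slide) that the first sequence is the reference and the other two come from the sample. The function name `find_substitutions` is illustrative only; real somatic mutation calling works on aligned reads with quality and depth information, not on equal-length strings.

```python
def find_substitutions(reference, sample):
    """Return (position, ref_base, sample_base) for each mismatch.

    Positions are 1-based. Only single-base substitutions between two
    equal-length, already aligned sequences are handled.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]


# Assumption: the first read is the reference, the other two are the sample.
reference = "GATTATTCCAGGTAT"
samples = ["GATTATTGCAGGTAT", "GATTATTGCAGGTAT"]

for read in samples:
    print(find_substitutions(reference, read))
```

Both sample reads differ from the first read at a single position, a C to G change at base 8.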
Rationale for the ICGC
The scope is huge, such that no country can do it all.
Coordinated cancer genome initiatives will reduce duplication of effort for common and easy-to-acquire tumor samples and ensure complete studies for many less frequent forms of cancer.
Standardization and uniform quality measures across studies will enable the merging of datasets, increasing power to detect additional targets.
The spectrum of many cancers varies across the world because of environmental, genetic and other causes.
The ICGC will accelerate the dissemination of genomic and analytical methods across participating sites and the user community.
ICGC Goals, Structure, Policies & Guidelines
http://goo.gl/sPGLQN
Primary Goal: coordinate efforts to reach goals (50 tumour types)
http://docs.icgc.org/dcc-data-element-specifications
Primary Goal: be comprehensive
http://goo.gl/BE7KH1
Analysis Data Types
Germline variants (SNPs)
Simple Somatic Mutations (SSM)
Copy Number Alterations (CNA)
Structural Variants (SV)
Gene Expression (micro-arrays and RNASeq)
miRNA Expression (RNASeq)
Epigenomics (Arrays and Methylation)
Splicing Variation (RNASeq)
Protein Expression (Arrays)
Primary Goal: generate highest quality
http://goo.gl/FXCvi9
Primary Goal: available to all
ICGC Controlled Access Datasets
• Detailed Phenotype and Outcome data: Region of residence, Risk factors, Examination, Surgery, Radiation, Sample, Slide, Specific histological features, Analyte, Aliquot, Donor notes
• Gene Expression (probe-level data)
• Raw genotype calls
• Gene-sample identifier links
• Genome sequence files
ICGC Open Access (OA) Datasets
• Cancer Pathology: Histologic type or subtype, Histologic nuclear grade
• Patient/Person: Gender, Age range, Vital status, Survival time, Relapse type, Status at follow-up
• Gene Expression (normalized)
• DNA methylation
• Computed Copy Number and Loss of Heterozygosity
• Newly discovered somatic variants
http://goo.gl/w4mrV
Secondary Goal: coordinate work to benefit productivity
http://goo.gl/K5mHC3
https://icgc.org/icgc/committees-and-working-groups
Secondary Goal: disseminate knowledge
http://goo.gl/ObcZXy
ICGC Goals, Structure, Policies & Guidelines
http://goo.gl/sPGLQN
Policy
ICGC membership implies compliance with Core Bioethical Elements for samples used in ICGC Cancer Projects:
http://goo.gl/TFrCmK
http://goo.gl/nYx6YG
POLICY: The members of the International Cancer Genome Consortium (ICGC) are committed to the principle of rapid data release to the scientific community.
http://goo.gl/TFrCmK
Publication Policy
The individual research groups in the ICGC are free to publish the results of their own efforts in independent publications at any time (subject, of course, to any policies of any collaborations in which they may be participating).
Moratorium: http://www.icgc.org/icgc/goals-structure-policies-guidelines/e3-publication-policy
Where do you find that information?
We actually make it hard to find, but we are working on that! (This is an example of where ICGC would like to do what TCGA does!)
http://cancergenome.nih.gov/publications/publicationguidelines
Policy on Intellectual Property
All ICGC members agree not to make claims to possible IP derived from primary data (including somatic mutations) and to not pursue IP protections that would prevent or block access to or use of any element of ICGC data or conclusions drawn directly from those data.
http://goo.gl/TCMXCl
International Cancer Genome Consortium: February 2015
85 Projects
18 Jurisdictions
42 Cancer types
Over 12,000 Cancer Genomes
DCC Activities
DCC activities are split between two groups:
• Software Development
  • DCC portal
  • Submission tool
• Biocuration (which also includes Content Management)
  • Data level management
  • Submitter “handling”
  • Coordination with secretariat
  • User support
http://dcc.icgc.org/team
Data → Validation (dictionary) → Validation (across fields) → indexing → Happy Users
http://goo.gl/1EcyR
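The validation flow on this slide (dictionary checks first, then checks across fields) might be sketched as follows. The field names, allowed values and the cross-field rule are hypothetical and chosen only for illustration; the actual rules are defined in the DCC data element specifications.

```python
# Hypothetical dictionary: field name -> allowed values (None = any value).
DICTIONARY = {
    "donor_id": None,
    "vital_status": {"alive", "deceased"},
    "age_at_diagnosis": None,
    "age_at_last_followup": None,
}


def validate_dictionary(record):
    """Check that every dictionary field is present and uses an allowed value."""
    errors = []
    for field, allowed in DICTIONARY.items():
        if field not in record or record[field] in ("", None):
            errors.append(f"missing field: {field}")
        elif allowed is not None and record[field] not in allowed:
            errors.append(f"invalid value for {field}: {record[field]!r}")
    return errors


def validate_across_fields(record):
    """Hypothetical cross-field rule: follow-up cannot precede diagnosis."""
    errors = []
    if record["age_at_last_followup"] < record["age_at_diagnosis"]:
        errors.append("age_at_last_followup is lower than age_at_diagnosis")
    return errors


def validate(record):
    """Run the dictionary checks, then the cross-field checks if those pass."""
    errors = validate_dictionary(record)
    if not errors:
        errors = validate_across_fields(record)
    return errors


record = {
    "donor_id": "D1",
    "vital_status": "alive",
    "age_at_diagnosis": 50,
    "age_at_last_followup": 55,
}
print(validate(record))
```

Running the cross-field checks only after the dictionary checks pass keeps the second stage simple, because it can assume every field exists.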
http://docs.icgc.org/methods
http://docs.icgc.org/dcc-data-element-specifications
ICGC Biocuration
Helping submitters get their data to ICGC
Progress reporting (data audit)
Quality checks (coverage, correctness, etc.)
Helping users get to the data
Validate and check (and recheck) metadata on public repositories
Test and integrate with other public repositories via standard data formats and ontologies
Documentation, documentation, and more documentation
Training
ICGC datasets to date: https://dcc.icgc.org/projects/history
http://goo.gl/CekF6y
Missing Clinical Data?
DACO
Data Portal
Info/help
Login
http://dcc.icgc.org/
55 projects
Access to all data files (and more with DACO access)
Faceted searches
https://dcc.icgc.org/projects
https://dcc.icgc.org/search
https://dcc.icgc.org/repository
ICGC DCC community http://goo.gl/wfxRqJ
https://goo.gl/M1vch1
ICGC BAM/FASTQ
TCGA BAM/FASTQ
ICGC Open Data (includes TCGA Open Data)
ICGC
TCGA
Differences between ICGC & TCGA
• Different tumour types
• Different geographic rules
• Many countries vs one jurisdiction
• Different definitions of what is controlled
• Different data access rules
ICGC Controlled Access Datasets
• Detailed Phenotype and Outcome data
• Gene Expression (probe-level data)
• Raw genotype calls
• Gene-sample identifier links
• Genome sequence files
• Germ line variants
ICGC Open Access Datasets
• Cancer Pathology: Histologic type or subtype, Histologic nuclear grade
• Patient/Person: Gender, Age range, Vital status, Survival time, Relapse type, Status at follow-up
• Gene Expression (normalized)
• DNA methylation
• Computed Copy Number and Loss of Heterozygosity
• Somatic variants from Exome or WGS
http://goo.gl/w4mrV
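One way to summarise the two ICGC lists above is a lookup from data type to access tier. This is only a sketch: the sets below copy the category names from the slide, and the function name `access_tier` is an assumption, not part of any ICGC tool.

```python
# Category names copied from the slide, lower-cased for matching.
CONTROLLED = {
    "detailed phenotype and outcome data",
    "gene expression (probe-level data)",
    "raw genotype calls",
    "gene-sample identifier links",
    "genome sequence files",
    "germ line variants",
}
OPEN = {
    "cancer pathology",
    "patient/person",
    "gene expression (normalized)",
    "dna methylation",
    "computed copy number and loss of heterozygosity",
    "somatic variants from exome or wgs",
}


def access_tier(data_type):
    """Return "controlled" or "open" for a category listed on the slide."""
    key = data_type.strip().lower()
    if key in CONTROLLED:
        return "controlled"
    if key in OPEN:
        return "open"
    raise KeyError(f"category not listed on the slide: {data_type!r}")


print(access_tier("Germ line variants"))
print(access_tier("DNA methylation"))
```

Raising an error for unlisted categories, instead of defaulting to "open", is a deliberately cautious choice for this sketch.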
TCGA Controlled Access Datasets
• Primary sequence data (BAM and FASTQ files)
• SNP6 array level 1 and level 2 data
• Exon array level 1 and level 2 data
• Somatic variants from whole genome sequencing
• Certain information in MAFs
• A full list of controlled-access data types can be found at: http://goo.gl/K1h7zu
TCGA Open Access Datasets
• De-identified clinical and demographic data
• Gene expression data
• Copy number alterations in regions of the genome
• Epigenetic data
• Summaries of data compiled across individuals
• Anonymized single amplicon DNA sequence data
• Somatic variants from scrubbed exome sequencing
http://goo.gl/A1rMRB
TCGA/ICGC users agreed:
… to keep all computer systems on which controlled access data reside, or which provide access to such data, up to date with respect to software and security patches.
… to protect Controlled Access Data against disclosure to unauthorized individuals.
… to monitor and control which individuals have access to Controlled Access Data.
… to destroy all copies of controlled access data after controlled access privileges expire.
… to only use secure transfer protocols, e.g. HTTPS and SFTP.
… to encrypt Controlled Access data in transfers and storage
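As a small illustration of the "secure transfer protocols" commitment above, a download script could refuse any URL whose scheme is not on an allow-list. The sketch below uses only the two protocols named on the slide (the slide says "e.g.", so others may be acceptable); the function name `uses_secure_transfer` and the example URLs are hypothetical.

```python
from urllib.parse import urlparse

# The two schemes named on the slide; other secure protocols may also qualify.
SECURE_SCHEMES = {"https", "sftp"}


def uses_secure_transfer(url):
    """Return True if the URL scheme is one of the allowed secure schemes."""
    return urlparse(url).scheme.lower() in SECURE_SCHEMES


for url in (
    "https://dcc.icgc.org/repository",
    "sftp://example.org/data/sample.bam",  # hypothetical host and path
    "http://dcc.icgc.org/",
):
    print(url, uses_secure_transfer(url))
```

This checks the transfer protocol only; encryption of stored data has to be handled separately.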
What does it mean for this file?
simple_somatic_mutation.aggregated.vcf.gz
https://dcc.icgc.org/repository/release_18/Summary
Identify yourself
Fill out a detailed form which includes:
• Contact and Project Information
• Information Technology details and procedures for keeping data secure
• Data Access Agreement
All of these documents are put into a PDF file that you print and get your institution to sign off on your behalf.
https://icgc.org/daco/approved-projects
173 groups 977 people
[Figure: data access diagram. Labels: DACO, ICGC, dbGaP, cgHUB, EGA, TCGA, ERA, BAM, WGS, Open, EGA id & password]
Making sense of it all
1 project == 1 pipeline
55 projects == 55 pipelines
55 projects == 1 pipeline
PanCancer Analysis of Whole Genomes (PCAWG)
2,400 T/N pairs with clinical data analyzed over 6 academic clouds
16 working groups, > 1000 scientists
1 alignment pipeline (10 months)
Data freeze 2 months ago
3 somatic mutation pipelines (2 more months?)
2 RNA-Seq pipelines (done)
Start writing papers in January 2016
From PCAWG we will have:
1st PANCANCER analysis on > 2,400 cancer tumours from a WGS perspective
RNA, SSM, CNV, Methylation analysis
Published (executable) pipelines
Docker https://github.com/docker/docker
Galaxy galaxyproject.org
Seqware http://seqware.github.io/
Method papers
Multiple cloud access to data
Multiple portal access to data
Other projects in planning
ICGC to finish in Spring of 2018
Planning for ICGC2
ICGC 1: 25,000 tumours (DNA, RNA, Epigenome, Clinical data)
ICGC2: (planning) 250,000 Tumours (DNA, RNA, Epigenome, Clinical trial) (1/2 million genomes)
ICGC1 was the picture, ICGC2 will be the movie (before and after treatment).
Trailers to come out in December, before Christmas
Submission system with one place for data and metadata
Tools/links directory portal
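The sample numbers quoted in this talk fit together with simple arithmetic. The sketch below assumes, as a reading of the slide and not a stated fact, that "1/2 million genomes" counts one tumour and one matched normal genome per ICGC2 tumour.

```python
# ICGC1 target: ~500 tumour/normal pairs for each of 50 cancer types.
TUMOUR_TYPES = 50
PAIRS_PER_TYPE = 500  # approximate target ("~500")
icgc1_tumours = TUMOUR_TYPES * PAIRS_PER_TYPE
print(icgc1_tumours)  # 25000

# ICGC2 (planning): 250,000 tumours.
# Assumption: two genomes (tumour + matched normal) per tumour.
ICGC2_TUMOURS = 250_000
GENOMES_PER_PAIR = 2
icgc2_genomes = ICGC2_TUMOURS * GENOMES_PER_PAIR
print(icgc2_genomes)  # 500000
```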
Acknowledgments
ICGC/OICR Project leaders: Tom Hudson, John McPherson, Lincoln Stein, Jared Simpson, Paul Boutros, Vincent Ferretti, Francis Ouellette, Jennifer Jennings
DCC Software Developers: Vincent Ferretti, Daniel Chang, Anthony Cros, Jerry Lam, Brian O'Connor, Bob Tiernay, Stuart Watt, Shane Wilson, Junjun Zhang
Ouellette Lab: Michelle Brazas, Emilie Chautard, Nina Palikuca, Zhibin Lu
Web Dev: Joseph Yamada, Kamen Wu, Kim Cullion, Miyuki Fukuma
ICGC DCC Biocuration: Hardeep Nahal, Marc Perry, Kevin Chen
Research IT/Systems: David Sutton, Bob Gibson, Sam Maclennan, David Magda, Rob Naccarato, Brian Ott, Gino Yearwood
EGA: Justin Paschall, Jeff Almeida-King, Ilkka Lappalainen, Jordi Rambla De Argila, Marc Sitges Puy
Genome Sequence Informatics (GSI): Lars Jorgensen, Tim Beck, Tony DeBat, Larry Heissler, Xuemei (Mei) Luo, Michael Moorhouse, Yogi Sundaravadanam, Morgan Taschuk, Michael Laszloffy, Peter Ruzanov
http://oicr.on.ca http://icgc.org
… and all the patients and their families that are putting their hopes into our work!
Informatics and Biocomputing at the OICR
http://icgc.org
http://dcc.icgc.org
http://docs.icgc.org
[email protected] @bffo