Will Spooner - Big Data in Mental Health - 23rd July 2014
DESCRIPTION
Organised by the Bioinformatics group at the BRCMH, IoP, SLaM and Maudsley Digital, this symposium showcased talks on the important roles of big data in mental health biomedical research and treatment.
TRANSCRIPT
©Eagle Genomics Ltd
Big Data in Mental Health
23 July 2014
From 100,000 genomes to genomic medicine. Opportunities and challenges from an informatics perspective.
William Spooner, CSO, Eagle Genomics Ltd
"we should remain unabashed about the ultimate impact of genomic medicine, which will be to transform the health of our children and our children's children" – Eric Lander
Image: iStockphoto all rights reserved
Lander ES (2011). "Initial impact of the sequencing of the human genome". Nature 479 (7333): 187–197.
Eagle; an Open Source Business
• Consultancy/advice
• Training
• Support
• Installation/Integration
• Customization
• Outsourced management
Diagram labels: Business, Open Community (e.g. Academia), Service Company, Service, Collaboration
About Eagle Genomics
Babraham-based consultancy
Informatics: life science R&D
Customers: US, Europe, Asia
Collaborate: EBI, JIC, U.Man.
Founded: 2008
Employees: 20
Map labels: Solexa/Illumina, Horizon, MedImmune, AstraZeneca, Addenbrooke's, Sanger/EBI, University, Babraham
The DNA Path
1 mile, 10,000 letters, 1 gene: BRCA2
BReast CAncer 2, tumor suppressor
© Keith Edkins (CC BY-SA 2.0)
The Human Genome
3,000,000,000 letters, 20,000 genes, x10 round the world
© webdesignhot.com (CC SA 3.0)
Molecular Psychiatry advance online publication 30 August 2011; doi:10.1038/mp.2011.101
Scientific impact of genomics
Image: Sartr http://sartr.deviantart.com/gallery/?offset=96#/d1u0z75 CC BY-NC-ND 3.0
Phenotype Association
NRCAM
Mental health diseases with shared genetic basis
• 100s of GWAS published for psychiatric disorders:
– Licit and illicit drug use
– Schizophrenia
– Bipolar disorder
– Depression
– Anorexia
– OCD, PTSD
– Tourette's
– Autism, ADHD, …
Missing Heritability: Linked allele v. Rare alleles v. Polygenic Inheritance v. Epistasis
Need. More. Genomes!
About the 100K Genome Project
• Announced by David Cameron in 2012, £100M funding
• Run by GeL, a private company 100% owned by DoH
• Sequence up to 100,000 patients over next 5 years
• Focus on cancer and rare inherited diseases
• “UK will be the first ever country to introduce this technology in its mainstream health system”
100,000 Genomes by 2017
GeL Data Flows
Types of annotations anticipated
• Filtered ranked lists of variants with estimates of pathogenicity
• Expected impact at level of genes, pathways
• Tools organising literature around affected genes, pathways
• Clear, simple clinical reports
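The first item, a filtered and ranked variant list, could look roughly like the sketch below. It is an illustration only: the `Variant` fields, the example identifiers and scores, and the 0.5 cutoff are all assumptions made for this example and do not come from GeL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    variant_id: str       # hypothetical identifier
    gene: str
    pathogenicity: float  # hypothetical score in [0, 1]; higher = more likely pathogenic


def filter_and_rank(variants, min_score=0.5):
    """Keep variants scoring at or above min_score, most pathogenic first."""
    kept = [v for v in variants if v.pathogenicity >= min_score]
    return sorted(kept, key=lambda v: v.pathogenicity, reverse=True)


# Made-up example data, not real variant calls.
example = [
    Variant("var1", "BRCA2", 0.92),
    Variant("var2", "NRCAM", 0.30),
    Variant("var3", "BRCA2", 0.71),
]
ranked = filter_and_rank(example)
for v in ranked:
    print(v.variant_id, v.gene, v.pathogenicity)
```

A real service would derive the score from annotation sources rather than take it as given; the point here is only the filter-then-rank shape of the output.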
10 GENOMES
10 Human Genomes: 2 TB sequence data, 5 GB annotations
1,000 GENOMES
1000 Human Genomes: 200 TB sequence data, 500 GB annotations
100,000 GENOMES
1000 Human Genomes: 200 TB sequence data, 500 GB annotations
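The 10 and 1,000 genome figures both work out to about 200 GB of sequence data and 0.5 GB of annotations per genome. The transcript text under the 100,000 genome heading repeats the 1,000 genome figures, so the sketch below extrapolates instead. It assumes storage scales linearly with the number of genomes and that 1 TB = 1,000 GB; both are assumptions of this example, not statements from the slides.

```python
# Per-genome rates derived from the "10 Human Genomes" slide.
SEQ_GB_PER_GENOME = 2_000 / 10   # 2 TB of sequence data for 10 genomes
ANNOT_GB_PER_GENOME = 5 / 10     # 5 GB of annotations for 10 genomes


def estimate_storage(n_genomes):
    """Return (sequence_tb, annotation_gb), assuming linear scaling."""
    sequence_tb = n_genomes * SEQ_GB_PER_GENOME / 1_000
    annotation_gb = n_genomes * ANNOT_GB_PER_GENOME
    return sequence_tb, annotation_gb


for n in (10, 1_000, 100_000):
    seq_tb, annot_gb = estimate_storage(n)
    print(f"{n:>7,} genomes: {seq_tb:,.0f} TB sequence, {annot_gb:,.0f} GB annotations")
```

Under these assumptions, 100,000 genomes would need on the order of 20,000 TB (20 PB) of sequence data and 50,000 GB (50 TB) of annotations.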
GeL Data Flows (diagram): Patients (consent), Sample Tracking, Biobank, Sequencing Centres, Annotation Service, Data Control Service, Data Service, Clinical Data, Clinician
Diagram: Patients (consent), Sample Tracking, Biobank, Sequencing Centres, Annotation Service, Data Control Service, Data Service, Clinical Data, Clinician, Metadata Catalog
Eagle's Annotation Service Proposal
Data flow management with Eaglecore
Eaglecore: platform for the management of GeL metadata
• Secure
• Collaborative
• Scalable
GeL-specific modules for:
• Representing clinical NGS experiments
• Automated workflows for QC/annotation
Diagram labels: Eaglecore Platform, GeL Appliance, Register datafiles, Annotate datafiles, Register annotations, Share annotations, Progression
Automated annotation workflow
Key: Start, End, Activity, Input data, Generated data, Output report. Output from one activity serves as input for a subsequent activity.
Alignment
• Alignment filter: Alignments (BAM) in, Filtered Alignments (BAM) out
• Alignment QC: Filtered Alignments (BAM) in, Alignment Report (HTML) out
Variation
• Call variants (SNV, CNV, SV): Filtered Alignments (BAM) in; Short variants (VCF), Copy number variants (VCF) and Structural variants (VCF) out
• Merge variants: Short, Copy number and Structural variants (VCF) in, Variants (VCF) out
Annotation
• Annotate variants: Variants (VCF) in, Annotated variants (TSV) out, using Gene annotation (Ensembl), Disease annotation (DDD) and Custom annotation (100K Genomes)
Clinical Reporting
• Annotated variants (TSV) in, Variant Report (PDF) out
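The workflow slide describes activities chained by their inputs and outputs, which is naturally modelled as a dependency graph. The sketch below is one way to express that and derive a valid run order with Python's standard `graphlib`. The activity names and the dependencies between them are inferred from the slide labels for illustration; they are assumptions, not Eagle's actual workflow definition.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each activity maps to the set of activities whose output it consumes.
# Names and edges are inferred from the slide labels (an assumption).
WORKFLOW = {
    "alignment_filter": set(),
    "alignment_qc": {"alignment_filter"},
    "call_snv": {"alignment_filter"},
    "call_cnv": {"alignment_filter"},
    "call_sv": {"alignment_filter"},
    "merge_variants": {"call_snv", "call_cnv", "call_sv"},
    "annotate_variants": {"merge_variants"},
    "clinical_reporting": {"annotate_variants"},
}


def execution_order(workflow):
    """Return one order in which every activity runs after its inputs exist."""
    return list(TopologicalSorter(workflow).static_order())


order = execution_order(WORKFLOW)
print(order)
```

The three variant callers have no dependencies on each other, so a scheduler is free to run them in parallel once the filtered alignments are available.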
Scalable workflows with eHive
Diagram labels: Web UI; sFTP; Workflow Server; Blackboard; Storage; Master instance; elastic instances only launched on workflow demand/load; Job fetching and status updating; Data input/output storage; Main data input/output; Exchange for user access; Infrastructure: AWS, OpenStack, …
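The diagram labels describe a blackboard that workers poll for jobs and update with status, with instances launched only when there is work. The sketch below illustrates that pattern in a single process using an in-memory SQLite table. It is not eHive's schema or API: the table layout, the function names and the `jobs_per_worker` sizing rule are all hypothetical.

```python
import sqlite3


def make_blackboard():
    """Create an in-memory job table standing in for the blackboard."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE job (id INTEGER PRIMARY KEY, task TEXT, status TEXT DEFAULT 'READY')"
    )
    return db


def add_job(db, task):
    db.execute("INSERT INTO job (task) VALUES (?)", (task,))
    db.commit()


def claim_job(db):
    """Claim the oldest READY job, or return None if there is none.

    Single-process sketch: a real multi-worker setup needs an atomic claim.
    """
    row = db.execute(
        "SELECT id, task FROM job WHERE status = 'READY' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    db.execute("UPDATE job SET status = 'RUNNING' WHERE id = ?", (row[0],))
    db.commit()
    return row


def finish_job(db, job_id):
    db.execute("UPDATE job SET status = 'DONE' WHERE id = ?", (job_id,))
    db.commit()


def workers_needed(db, jobs_per_worker=2):
    """Elastic sizing: enough workers for the READY jobs, zero when idle."""
    (ready,) = db.execute("SELECT COUNT(*) FROM job WHERE status = 'READY'").fetchone()
    return -(-ready // jobs_per_worker)  # ceiling division


db = make_blackboard()
for task in ("align", "call_variants", "annotate"):
    add_job(db, task)
print("workers to launch:", workers_needed(db))
```

Keeping all job state on the blackboard means workers can be started and stopped freely, which is what makes launching instances only on demand practical.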
Variant Annotation with VEP
Diagram: Sample VCF, Ensembl VEP, Annotated Variants
Variant Knowledge Base: Ensembl Genes, Deciphering Developmental Disorders, Thomson Reuters GVDB, 100K Genomes (three VEP Plugins)
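The slide shows a sample VCF passing through VEP, with plugins contributing annotations from additional sources. Real VEP plugins are written in Perl against the Ensembl API; the Python sketch below only illustrates the general plugin idea and does not use or mirror VEP's interface. The function names, the lookup table and the example variant are all made up for this example.

```python
def parse_vcf_line(line):
    """Parse the first five fixed VCF columns (CHROM, POS, ID, REF, ALT) of a data line."""
    chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
    return {"chrom": chrom, "pos": int(pos), "id": vid, "ref": ref, "alt": alt}


def make_lookup_plugin(name, table):
    """Build a plugin that adds table[(chrom, pos, ref, alt)] under the key `name`."""
    def plugin(variant):
        key = (variant["chrom"], variant["pos"], variant["ref"], variant["alt"])
        return {name: table.get(key)}
    return plugin


def annotate(variant, plugins):
    """Apply each plugin in turn and merge its annotations into a copy of the variant."""
    annotated = dict(variant)
    for plugin in plugins:
        annotated.update(plugin(variant))
    return annotated


# Made-up lookup data and variant, for illustration only.
disease_table = {("13", 100, "A", "G"): "example disease note"}
plugins = [
    make_lookup_plugin("disease", disease_table),
    make_lookup_plugin("custom", {}),
]
variant = parse_vcf_line("13\t100\t.\tA\tG\t50\tPASS\t.\n")
result = annotate(variant, plugins)
print(result)
```

Each source stays behind its own plugin, so adding a further knowledge base means adding one more entry to the `plugins` list rather than changing the annotation loop.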
Variant Storage with Ensembl
Diagram labels:
• Databases: Comparative Genomics, Regulatory Genomics, Variation, Assembly/Genes
• Uses: Data Integration, Data Reporting, Data Analysis, Data Querying, Data QC
• Access: API, httpd, DAS, Track Hub, VEP
Eaglecore Features
• Metadata catalogue
– Organisation of experimental information
– Links everything together in one place
• Collaborative
– Information easily and securely shared
• Secure
– Enterprise solution
– Deployed on site or in the cloud
• Scalable
– Designed to cope with the next generation of assay technologies (e.g. NGS)
– Designed to tackle data science problems in life sciences R&D
• Extensive search capabilities
– Ontology support to standardise entries for studies, phenotypes & technologies
• API interface
– Allows connection to external programs for further analysis
• User friendly
– Easy-to-use interface for quick data capture of new experiments & legacy data import
• Open standards
– Uses established open standards (ISA, OWL) & ontologies (e.g. EFO) to organise data
Segmentation of Commercial Users
Open Innovation: Enterprise, Academia, Government, Foundations
• Nurture: Build trust, shared language
• Explore: Work together to find a common purpose
• Collaborate
• Exploit: Turn ideas into tangible benefits
GeL 100K Genomes
Biomarkers in Clinical Trials
Right target
• Link between target and disease
• Availability of predictive biomarkers
Right patient
• Stratification of patient population
• Cohort selection
Right safety
• Pharmacokinetic/dynamic prediction
• Pharmacogenomic biomarkers
Right market
• Marker access, provider focus
• Personalised; companion diagnostics
Towards Genomic Medicine
Diagram labels: Genomic Association [biomarker]; Pharmacogenomics; Personalised Medicine (Right drug, Right patient, Right time); Genotypic, Transcriptomic, Epigenetic
GeL 100K Genomes, stats for mental health
• GeL does not focus on mental health
– 25,000 paired tumor/normal for cancer
– 15,000 trios for rare disease
• But, 1 in 4 with mental illness?
– Expect over 15,000 cases
– With over 50,000 controls
• Secondary use of data is likely to be valuable
– Precedent from WT-CCC
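The slide does not show how the case and control estimates were reached. One reading that reproduces them is sketched below. It assumes each tumor/normal pair comes from one patient, each trio contributes three individuals, and the 1 in 4 prevalence applies uniformly to all participants; these are assumptions of this example, not the speaker's stated method.

```python
TUMOR_NORMAL_PAIRS = 25_000   # from the slide; assumed one patient per pair
RARE_DISEASE_TRIOS = 15_000   # from the slide; assumed three individuals per trio
PREVALENCE = 1 / 4            # "1 in 4 with mental illness"

individuals = TUMOR_NORMAL_PAIRS * 1 + RARE_DISEASE_TRIOS * 3
genomes = TUMOR_NORMAL_PAIRS * 2 + RARE_DISEASE_TRIOS * 3
cases = individuals * PREVALENCE
controls = individuals - cases

print(f"individuals: {individuals:,}")
print(f"genomes sequenced: {genomes:,}")
print(f"expected cases: {cases:,.0f}")
print(f"expected controls: {controls:,.0f}")
```

On this reading the cohort holds 70,000 individuals, giving about 17,500 cases and 52,500 controls, consistent with "over 15,000" and "over 50,000".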
Yesterday…
http://www.broadinstitute.org/news/5896
Cases: 34,241 Controls: 45,604 Trios: 1,235
Why is informatics important? What will GeL contribute?
• Widely acknowledged that informatics, not sequencing, is rate limiting in genomics research
• GeL addresses informatics problems that will become routine in a few years:
– Translation of academic bioinformatics technologies into clinical settings
– Development of new approaches for disease research
• Potential to reach beyond immediate GeL focus of cancer and rare inherited disorders
• Creates new markets for genomics data; disruptive technology
– E.g. potential to attract clinical trials back to the UK
Eagle® is a registered trademark no. 010418135 of Eagle Genomics Ltd. Postal address: Eagle Genomics Ltd., Babraham Research Campus, Cambridge CB22 3AT, United Kingdom.
[email protected] +44 (0)1223 654481 www.eaglegenomics.com
facebook.com/eaglegenomics blog.eaglegenomics.com @wspoonr @eaglegen