Genomics: Experimental methods
Slides available at www.bioinformatics.be
Lab for Bioinformatics and computational genomics
10 “genome hackers” mostly engineers (statistics)
42 scientists, technicians, geneticists, clinicians
>100 people hardware engineers,
mathematicians, molecular biologists
Overview
Personalized Medicine,
Biomarkers …
… Molecular Profiling
First Generation Molecular Profiling
Next Generation Molecular Profiling
Next Generation Epigenetic Profiling
Concluding Remarks
Personalized Medicine
• The use of diagnostic tests (aka biomarkers) to identify in advance which patients are likely to respond well to a therapy
• The benefits of this approach are to
– avoid adverse drug reactions
– improve efficacy
– adjust the dose to suit the patient
– differentiate a product in a competitive market
– meet future legal or regulatory requirements
• Potential uses of biomarkers
– Risk assessment
– Initial/early detection
– Prognosis
– Prediction/therapy selection
– Response assessment
– Monitoring for recurrence
Biomarker
First used in 1971 … An objective and « predictive » measure … at the molecular level … of normal and pathogenic processes and responses to therapeutic interventions
Characteristic that is objectively measured and evaluated as an indicator of normal biologic or pathogenic processes or pharmacologic response to a drug
A biomarker is valid if:
– It can be measured in a test system with well established performance characteristics
– Evidence for its clinical significance has been established
Rationale 1: Why now? The regulatory path is becoming clearer
There is more at stake than efficient drug development. FDA « critical path initiative » Pharmacogenomics guideline
Biomarkers are the foundation of « evidence based medicine » - who should be treated, how and with what.
Without biomarkers, advances in targeted therapy will be limited and treatment will remain largely empirical. It is imperative that biomarker development be accelerated along with therapeutics.
Why now ?
First-generation and maturing second-generation molecular profiling methodologies make it possible to stratify clinical trial participants on a pharmacogenomic basis: include those most likely to benefit from the drug candidate and exclude those who likely will not.
Clinical trials should attain more specific results with smaller numbers of patients. Smaller numbers mean lower costs (by a factor of 2-10).
An additional benefit for trial participants and institutional review boards (IRBs) is that stratification, given the correct biomarker, may reduce or eliminate adverse events.
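To illustrate why stratification can shrink a trial, here is a minimal sketch using the standard normal-approximation sample size for comparing two response rates. The response rates (10%, 20%, 40%), the significance level and the power are hypothetical values chosen for illustration; they do not come from the slides.

```python
import math
from statistics import NormalDist


def patients_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for two response rates."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_treated - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical: in an unselected population the drug lifts response 10% -> 20%;
# in a biomarker-positive subgroup it lifts response 10% -> 40%.
n_unselected = patients_per_arm(0.10, 0.20)
n_stratified = patients_per_arm(0.10, 0.40)
print(n_unselected, n_stratified, round(n_unselected / n_stratified, 1))
```

Under these assumed rates the stratified trial needs several times fewer patients per arm, which is the kind of reduction the slide refers to.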
Molecular Profiling
The study of specific patterns (fingerprints) of proteins, DNA, and/or mRNA and how these patterns correlate with an individual's physical characteristics or symptoms of disease.
Generic Health advice (UNLESS)
• Exercise (Hypertrophic Cardiomyopathy)
• Drink your milk (MCM6 Lactose intolerance)
• Eat your green beans (glucose-6-phosphate dehydrogenase Deficiency)
• & your grains (HLA-DQ2 – Celiac disease)
• & your iron (HFE - Hemochromatosis)
• Get more rest (HLA-DR2 - Narcolepsy)
EGFR based therapy in mCRC
Before molecular profiling …
First Generation Molecular Profiling
• Flow cytometry correlates surface markers, cell size and other parameters
• Circulating tumor cell (CTC) assays quantify the number of tumor cells in the peripheral blood.
• Exosomes are 30-90 nm vesicles secreted by a wide range of mammalian cell types.
• Immunohistochemistry (IHC) measures protein expression, usually on the cell surface.
First Generation Molecular Profiling
• Gene sequencing for mutation detection
• Microarray for mRNA message detection
• RT-PCR for gene expression
• FISH analysis for gene copy number
• Comparative Genomic Hybridization (CGH) for gene copy number
Basics of the “old” technology
• Clone the DNA.
• Generate a ladder of labeled (colored) molecules that differ by 1 nucleotide.
• Separate mixture on some matrix.
• Detect fluorochrome by laser.
• Interpret peaks as string of DNA.
• Strings are 500 to 1,000 letters long.
• 1 machine generates 57,000 nucleotides/run.
• Assemble all strings into a genome.
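The last step, assembling strings into a genome, can be sketched with a toy greedy overlap merge. This is only an illustration of the idea, not how production assemblers work; the reads are invented and far shorter than real 500 to 1,000 letter strings.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b (0 if < min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to merge; return the remaining contigs
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Toy reads (hypothetical) covering one short stretch of DNA.
toy_reads = ["GATTTAGATC", "AGATCGCGAT", "GCGATAGAG"]
contigs = greedy_assemble(toy_reads)
print(contigs)
```

Greedy merging breaks down on repetitive sequence, which is one reason real assembly is much harder than this sketch suggests.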
Genetic Variation Among People
0.1% difference among people
GATTTAGATCGCGATAGAG
GATTTAGATCTCGATAGAG
Single nucleotide polymorphisms (SNPs)
The genome fits as an e-mail attachment
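A minimal sketch of what spotting a SNP means for the two example sequences on this slide: compare two aligned strings position by position. The genome size of roughly 3 billion base pairs used for the back-of-the-envelope count is an assumption added here, not a figure from the slides.

```python
def snp_positions(seq_a, seq_b):
    """Return (1-based position, base in a, base in b) for every mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(seq_a, seq_b))
            if a != b]


person_1 = "GATTTAGATCGCGATAGAG"
person_2 = "GATTTAGATCTCGATAGAG"
variants = snp_positions(person_1, person_2)
print(variants)

# Assumed genome size: about 3 billion base pairs; 0.1% of that differs.
ASSUMED_GENOME_BP = 3_000_000_000
expected_differences = ASSUMED_GENOME_BP // 1000
print(expected_differences)
```

Storing only the differences from a reference, rather than the full sequence, is what makes a personal genome small enough to send as an attachment.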
mRNA Expression Microarray
Basics of the “new” technology
• Get DNA.
• Attach it to something.
• Extend and amplify signal with some color scheme.
• Detect fluorochrome by microscopy.
• Interpret series of spots as short strings of DNA.
• Strings are 30-300 letters long.
• Multiple images are interpreted as 0.4 to 1.2 GB/run (1,200,000,000 letters/day).
• Map or align strings to one or many genomes.
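The final step, mapping short strings to a genome, can be sketched with a k-mer seed index followed by a mismatch check. This is a toy illustration under stated assumptions: the reference, the read and the parameters are invented, and real aligners use far more sophisticated indexes and scoring.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Seed with the read's first k letters, then verify the full read."""
    hits = []
    for pos in index.get(read[:k], []):
        candidate = reference[pos:pos + len(read)]
        if len(candidate) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(1 for a, b in zip(read, candidate) if a != b)
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


# Hypothetical toy reference and a read carrying one substitution.
toy_reference = "TTGACCGATTTAGATCGCGATAGAGCATGCA"
toy_read = "AGATCTCGATAG"
toy_index = build_index(toy_reference, k=5)
hits = map_read(toy_read, toy_reference, toy_index, k=5)
print(hits)
```

Each hit is a (0-based start position, mismatch count) pair; a read with several hits is ambiguous, which is where read length and pairing start to matter.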
Next Generation Technologies
• Roche (454)
– Emulsion PCR
– Polymerase
– Natural Nucleotides
• 100-500 Mb for 5-15k
– 1% error rate
– Homopolymers
One additional insight ...
Read Length is Not As Important For Resequencing
[Figure: % of paired k-mers with uniquely assignable location (0% to 100%) versus length of k-mer reads (8 to 20 bp), for E. coli and human. Source: Jay Shendure]
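The idea behind this plot can be sketched by counting how many k-mers occur exactly once in a sequence as k grows. This is a simplification: it uses single (unpaired) k-mers on a synthetic random sequence, which lacks the repeats of a real genome, so the numbers will not match the figure.

```python
import random
from collections import Counter


def unique_kmer_fraction(sequence, k):
    """Fraction of k-mer positions whose k-mer occurs exactly once."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(kmers)
    unique = sum(1 for kmer in kmers if counts[kmer] == 1)
    return unique / len(kmers)


# Synthetic stand-in for a genome (assumption: uniform random bases).
random.seed(0)
toy_genome = "".join(random.choice("ACGT") for _ in range(20000))

for k in (4, 6, 8, 10, 12):
    print(k, round(unique_kmer_fraction(toy_genome, k), 3))
```

Uniqueness climbs steeply over a narrow range of k and then saturates, which is why reads much longer than that range add little for resequencing a known genome.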
Short Read Technologies
• Illumina GA (HiSeq, MiSeq)
• ABI SOLiD
Other second generation technology: (ABI) SOLiD
So what ?
Second generation DNA/RNA profiling
Second Generation DNA profiling
• Enrichment Sequencing
• ChIP-Seq (Chromatin Immunoprecipitation)
• A substitute for ChIP-chip
• E.g. to find the binding sequence of proteins (TFBS)
Paired End Reads are Important!
[Figure: repetitive DNA next to unique DNA. A single read maps to multiple positions; Read 1 and Read 2 of a pair are separated by a known distance.]
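A minimal sketch of how the known distance helps: a read that falls in a repeat has several candidate positions, but only one of them sits at the expected distance from its uniquely placed mate. The reference, reads, insert size and tolerance below are hypothetical.

```python
def find_all(reference, read):
    """All 0-based start positions where read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


def place_pair(reference, read_1, read_2, insert_size, tolerance=2):
    """Keep only placements where read_2 starts about insert_size after read_1."""
    placements = []
    for p1 in find_all(reference, read_1):
        for p2 in find_all(reference, read_2):
            if abs((p2 - p1) - insert_size) <= tolerance:
                placements.append((p1, p2))
    return placements


# Toy reference: the repeat "TTATTATT" occurs twice, the flanks are unique.
repeat = "TTATTATT"
toy_reference = "GGCTAGCA" + repeat + "CGGACTGA" + repeat + "AGCCTGAC"

read_1 = repeat        # ambiguous on its own
read_2 = "AGCCTGAC"    # unique mate
print(find_all(toy_reference, read_1))
print(place_pair(toy_reference, read_1, read_2, insert_size=8))
```

Alone, read 1 has two candidate positions; with its mate and the known distance, only one placement remains.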
Second Generation DNA profiling
• Exome sequencing (also known as targeted exome capture) is an efficient strategy to selectively sequence the coding regions of the genome to identify novel genes associated with rare and common disorders.
• 160K exons
Second Generation DNA profiling
Bioinformatics tools
Contents-Schedule
Besides the 6000 protein-coding genes …
140 ribosomal RNA genes
275 transfer RNA genes
40 small nuclear RNA genes
>100 small nucleolar RNA genes
Function of RNA genes
pRNA in the φ29 rotary packaging motor (Simpson et al. Nature 408:745-750, 2000)
Cartilage-hair hypoplasia mapped to an RNA (Ridanpää et al. Cell 104:195-203, 2001)
The human Prader-Willi critical region (Cavaille et al. PNAS 97:14035-7, 2000)
Second Generation RNA profiling
RNA genes can be hard to detect
UGAGGUAGUAGGUUGUAUAGU
C. elegans let-7; 21 nt (Pasquinelli et al. Nature 408:86-89, 2000)
• Often small
• Sometimes multicopy and redundant
• Often not polyadenylated (not represented in ESTs)
• Immune to frameshift and nonsense mutations
• No open reading frame, no codon bias
• Often evolving rapidly in primary sequence
Second Generation RNA profiling
Although details of the methods vary, the concept behind RNA-seq is simple:
• isolate all mRNA
• convert to cDNA using reverse transcriptase
• sequence the cDNA
• map sequences to the genome
The more times a given sequence is detected, the more abundantly transcribed it is. If enough sequences are generated, a comprehensive and quantitative view of the entire transcriptome of an organism or tissue can be obtained.
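The counting idea can be sketched as follows: tally how many mapped reads land on each gene and scale to counts per million so that samples of different depth are comparable. The gene names and read assignments are hypothetical, and the sketch assumes the mapping step has already assigned each read to a gene (or to none).

```python
from collections import Counter


def counts_per_million(read_to_gene):
    """Counts per million mapped reads for each gene; unmapped reads are None."""
    counts = Counter(gene for gene in read_to_gene if gene is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical output of the mapping step: one gene label per read.
toy_assignments = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"] + [None] * 2
cpm = counts_per_million(toy_assignments)
for gene in sorted(cpm):
    print(gene, cpm[gene])
```

Counts per million is the simplest normalization; it ignores gene length, so it compares the same gene across samples better than different genes within one sample.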
Second Generation RNA profiling
• Comparing to microarray
– Microarray
  • Closed technology: prior knowledge required
  • Affected by pseudogenes (homologues of real genes)
  • Low sensitivity
– RNA-Seq
  • Open technology: no prior knowledge required
  • Not affected by pseudogenes because the exact sequence is measured
  • Other information can be obtained (SNPs, alternative splicing)
Second Generation RNA profiling
ncRNAs in human genome
tRNA 600
18S rRNA 200
5.8S rRNA 200
28S rRNA 200
5S rRNA 200
snoRNA 300
miRNA 250
U1 40
U2 30
U4 30
U5 30
U6 20
U4atac 5
U6atac 5
U11 5
U12 5
SRP RNA 1
RNase P RNA 1
Telomerase RNA 1
RNase MRP 1
Y RNA 5
Vault 4
7SK RNA 1
Xist 1
H19 1
BIC 1
Antisense RNAs 1000s?
Cis reg regions 100s?
Others ?
Mapping Structural Variation in Humans
- Thought to be common: 12% of the genome (Redon et al. 2006)
- Likely involved in phenotype variation and disease
- Until recently most methods for detection were low resolution (>50 kb)
CNVs
>1 kb segments
Size Distribution of CNV in a Human Genome
Next next generation sequencing
Third generation sequencing
Now sequencing
Ultra-low-cost SINGLE molecule sequencing
Pacific Biosciences: A Third Generation Sequencing Technology
Eid et al 2008
Complete genomics
Nanopore Sequencing
Second Generation Protein profiling
• Proteomics MS-MS-based exclusively in discovery mode
• Automate diagnostics assay generation (next generation proteomics)
• Aptamers as alternative to antibodies
• ImmunoPCR
MS/MS identification pipeline
pipeline overview
Bonanza
Bonanza + IggyPep
Goal: define PTMs profile prior to database search
Goal: multi-tiered database search
Goal: filter dataset prior to database search
Genome-wide methylation … by next generation sequencing
[Figure: # markers versus # samples across the Discovery, Verification and Validation phases, with MethylCap_Seq and Deep_Seq. Values shown: 3 000 000; 5; 6 000; 50; <50 (only models and fresh frozen); >50. Label: EpiHealth.]
Molecular Unification
[Figure: sequencing approaches placed on a scale of base pairs covered (10^9 down to 1; full genome bp), from RUO to Clinical. EPI: whole-genome bisulphite seq, enrichment seq (MBD, RRBS), probes (450-27K), enrichment targeted panels. GENETIC: whole-genome sequencing, enrichment seq (exome), enrichment targeted panels. Also labelled: Sequencing, DeepSeq, UltraDeep.]
Bioinformatics, a life science discipline … management of expectations
Math
Informatics
Theoretical Biology
Computational Biology
(Molecular) Biology
Computer Science
Bioinformatics
Discovery Informatics – Computational Genomics
Interface Design
AI, Image Analysis
Structure prediction (HTX)
Sequence Analysis
Expert Annotation
NP
Datamining
Translational Medicine: An inconvenient truth
• 1% of genome codes for proteins, however more than 90% is transcribed
• Less than 10% of protein experimentally measured can be “explained” from the genome
• 1 genome ? Structural variation
• > 200 Epigenomes ??
• Space/time continuum …
Epigenetic (meta)information = stem cells
Cellular programming
Cellular reprogramming
Tumor
Epigenetically altered, self-renewing cancer stem cells
Tumor Development and Growth
Gene-specific epigenetic reprogramming
Cellular reprogramming
Wobblebase Mission
provide tools to both specialists (researchers, bioinformaticians, health care providers) and individual consumers that unlock the power of genomic data to the USER
enable personalized genomics today by simplifying the way we organize, visualize and manage genomic data.
PGM: Personal Genomics Manifesto
Everybody who wants to get his genome sequenced has the human right to do so. No third party can own your genetic data; your genetic data is exclusively yours.
Nobody can be forced to get his genome analyzed or to reveal his genome to a third party.
Your genome should always be treated as confidential, private information.
People should be advised not to share their identity AND their entire genome on a public forum.
People should be advised to use secure technologies that maximally protect phenotypic and/or genotypic data.
People should be able to actively explore, manage and get updated interpretation on their genomic data.
• change the diagnostic/healthcare industry forever by setting a new standard and empowering the user
Wobblebase Mission
Choosing the Red Pill
The Technical Feasibility Argument
The Quality Argument
The Price Argument
The Logistics around the sample and how to manage the data Argument
The Ethical debate
The Privacy/Security concern
Updates are the single most important feature of Wobblebase
Notifications
#Rs1805007
Wobblebase
Social network (twitter)
Comparison
Bioinformatics analysis pipelines
Updates / Notifications
eHealth (fixed vocabulary)
biobix, wvcrieki
biobix.be, bioinformatics.be