mutation analysiscompbio.charite.de/tl_files/mutation-analysis-2012.pdf · 2015. 11. 27. ·...

Mutation Analysis

Sebastian Bauer

Institut für Medizinische Genetik, Charité Universitätsmedizin Berlin

2012/03/22


Workflow for Mutation Analysis

Raw Data Generation: Sample preparation and sequencing

Raw Data Analysis: Base calling

Whole Genome Mapping: Alignment to a reference genome

Variant Calling: Detection of genetic variation

Annotation: Linking variants to biological information


Raw Data Generation

Prepare samples

Then sequence

Output is vendor-specific raw data


Raw Data Analysis: Base Calling

Transform raw data into sequences of bases

The exact procedure depends on the sequencing platform used

Most platforms additionally report a quality score for each base, which can be transformed into a Phred score

$Q_{\text{Phred}} = -10 \log_{10} P(\text{error})$

Example: $Q_{\text{Phred}} = 20 \Leftrightarrow P(\text{error}) = 1\%$
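The Phred relation can be checked numerically. The following is a minimal Python sketch; the function names are chosen here for illustration and do not come from the slides.

```python
import math


def phred_to_error_probability(q):
    """Convert a Phred quality score into the probability of a wrong base call."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Convert an error probability (0 < p <= 1) into a Phred quality score."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30):
        print(q, phred_to_error_probability(q))
```

Each step of 10 in the Phred score corresponds to a tenfold drop in the error probability.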

One Sequence Entry (Read) in the Output FASTQ File
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
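To make the four-line structure of a FASTQ entry concrete, here is a small Python sketch that splits one record and decodes its quality line. It assumes the common Phred+33 (Sanger) ASCII encoding, which the slides do not state; other platforms or older files may use a different offset. The names `parse_fastq_record` and `EXAMPLE_RECORD` are illustrative.

```python
# The example read from the slide, one list element per FASTQ line.
EXAMPLE_RECORD = [
    "@SEQ_ID",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT",
    "+",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65",
]


def parse_fastq_record(lines, offset=33):
    """Return (read_id, sequence, phred_scores) for one four-line FASTQ record.

    offset=33 is an assumption (Phred+33 encoding).
    """
    header, sequence, separator, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lines differ in length")
    # Each quality character encodes one Phred score as ASCII code minus offset.
    scores = [ord(ch) - offset for ch in quality]
    return header[1:], sequence, scores


if __name__ == "__main__":
    read_id, sequence, scores = parse_fastq_record(EXAMPLE_RECORD)
    print(read_id, len(sequence), min(scores), max(scores))
```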


Whole Genome Mapping / Aligning

Current methods assume mapping to a reference genome

This makes it possible to find variants with known disease associations as well as new candidates

Most short-read mappers use hash tables or data structures based on the Burrows-Wheeler transform

Output consists of summary statistics and a SAM or BAM file
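As an illustration of the Burrows-Wheeler idea, the sketch below builds the transform of a tiny reference by sorting all rotations and counts exact occurrences of a read with backward search. It is a teaching sketch only: real mappers use compressed indexes and tolerate mismatches and gaps, and the function names are chosen here, not taken from any tool.

```python
from collections import Counter


def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (fine for tiny inputs only)."""
    text = text + "$"  # sentinel that sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string, pattern):
    """Count exact matches of pattern in the original text using backward search."""
    counts = Counter(bwt_string)
    # first_row[c]: number of characters in the text that sort before c
    first_row = {}
    total = 0
    for ch in sorted(counts):
        first_row[ch] = total
        total += counts[ch]
    lo, hi = 0, len(bwt_string)  # half-open interval of matching suffixes
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        lo = first_row[ch] + bwt_string.count(ch, 0, lo)
        hi = first_row[ch] + bwt_string.count(ch, 0, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("TGCATTGCGTAGGC")
    print(index, count_occurrences(index, "GC"))
```

The search processes the read from its last base to its first, narrowing an interval in the sorted suffixes at each step, so the work per read depends on the read length rather than on the genome length once the rank queries are precomputed.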


Variant Calling: Genetic Variation

Variant Calling: Identify regions that differ from the reference

Single nucleotide variants (SNVs)
TGCATTGCGTAGGC
TGCATTCCGTAGGC

Short indels (= insertion/deletion)
TGCATT---TAGGC
TGCATTCCGTAGGC

Microsatellites
TGCTCATCATCATCAGC
TGCTCATCA------GC

Minisatellites ≤ 100 bp

Copy number variations (CNVs; large deletions, duplications, inversions) ≥ 1000 bp
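The aligned sequence pairs above can be compared column by column. The following Python sketch reports mismatches as SNVs and runs of gap characters as indels; it assumes two rows of equal length with `-` as the gap symbol, and the function name is illustrative. It does not decide which row is the reference, since the slide does not label them.

```python
def describe_differences(row_a, row_b, gap="-"):
    """List (kind, start_column, length) for each difference between two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    events = []
    i = 0
    while i < len(row_a):
        a, b = row_a[i], row_b[i]
        if a == b:
            i += 1
        elif a == gap or b == gap:
            # Merge a run of consecutive gap columns into one indel event.
            j = i
            while j < len(row_a) and (row_a[j] == gap or row_b[j] == gap):
                j += 1
            events.append(("indel", i, j - i))
            i = j
        else:
            events.append(("SNV", i, 1))
            i += 1
    return events


if __name__ == "__main__":
    print(describe_differences("TGCATTGCGTAGGC", "TGCATTCCGTAGGC"))
    print(describe_differences("TGCATT---TAGGC", "TGCATTCCGTAGGC"))
    print(describe_differences("TGCTCATCATCATCAGC", "TGCTCATCA------GC"))
```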


Variant Calling: Genotype

Easy Approach

Count alleles at each column X in the pileup and use cutoff rules

1. Filter for a Phred score ($Q_{\text{Phred}}$) of at least 20

2. Call a genotype heterozygous if the non-reference allele frequency is between 20% and 80%, otherwise homozygous

Works reasonably well when coverage is > 20 (Nielsen et al. 2011)
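The two cutoff rules can be sketched in a few lines of Python. The input format (a list of `(base, phred_quality)` pairs for one pileup column) and the function name are assumptions made for this example. The thresholds are the ones from the rules above.

```python
def call_genotype_by_cutoff(pileup, ref_base, min_quality=20, low=0.2, high=0.8):
    """Call a genotype for one pileup column with fixed cutoffs.

    pileup: list of (base, phred_quality) pairs (hypothetical input format).
    Returns None if no base passes the quality filter.
    """
    # Rule 1: keep only bases with a sufficient Phred score.
    kept = [base for base, quality in pileup if quality >= min_quality]
    if not kept:
        return None
    # Rule 2: classify by the fraction of non-reference bases.
    non_ref_fraction = sum(1 for base in kept if base != ref_base) / len(kept)
    if non_ref_fraction < low:
        return "homozygous reference"
    if non_ref_fraction > high:
        return "homozygous non-reference"
    return "heterozygous"


if __name__ == "__main__":
    column = [("A", 30)] * 12 + [("G", 32)] * 10 + [("G", 5)] * 3
    print(call_genotype_by_cutoff(column, ref_base="A"))
```

With low coverage, a true heterozygous site can easily fall outside the 20% to 80% window by chance, which is why the approach is only recommended for deeper coverage.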

More elaborate approaches are based on probabilistic frameworks

$P(G \mid X) \propto P(X \mid G)\,P(G) = \prod_i P(X_i \mid G)\,P(G), \quad G \in \{A, C, T, G\}$

The likelihood $P(X \mid G)$ is computed from the quality-score-based $P(X_i \mid G)$ for each entry $i$

$P(G)$ allows data-independent prior knowledge to be specified

The posterior $0 < P(G \mid X) < 1$ assesses genotype and confidence
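A minimal Python sketch of such a posterior computation follows. The slide leaves the exact form of $P(X_i \mid G)$ open, so the sketch makes its own assumptions: genotypes are unordered pairs of bases (diploid), each read samples one of the two alleles with probability 1/2, and a base call with error probability $e$ shows the true allele with probability $1 - e$ and each other base with probability $e/3$. The function name and input format are illustrative.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def genotype_posteriors(pileup, prior=None):
    """Posterior P(G | X) over diploid genotypes for one pileup column.

    pileup: list of (base, phred_quality) pairs (hypothetical input format).
    prior:  dict genotype -> P(G); uniform if omitted (an assumption).
    """
    genotypes = list(combinations_with_replacement(BASES, 2))
    if prior is None:
        prior = {g: 1.0 / len(genotypes) for g in genotypes}
    log_posterior = {}
    for g in genotypes:
        if prior.get(g, 0.0) <= 0.0:
            continue  # a zero prior gives a zero posterior
        total = math.log(prior[g])
        for base, quality in pileup:
            # Cap e at 0.75 so that a Q0 base is simply uninformative.
            e = min(10 ** (-quality / 10), 0.75)
            likelihood = sum((1 - e) if base == allele else e / 3 for allele in g) / 2
            total += math.log(likelihood)
        log_posterior[g] = total
    # Normalise in log space to avoid underflow at high coverage.
    peak = max(log_posterior.values())
    weights = {g: math.exp(value - peak) for g, value in log_posterior.items()}
    norm = sum(weights.values())
    return {g: weights.get(g, 0.0) / norm for g in genotypes}


if __name__ == "__main__":
    column = [("A", 30)] * 10 + [("G", 30)] * 10
    posterior = genotype_posteriors(column)
    best = max(posterior, key=posterior.get)
    print(best, posterior[best])
```

Unlike the cutoff rules, the result is a probability for every genotype, so the confidence of a call can be reported along with the call itself.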


Variant Calling: Integrating Prior Knowledge

Single Sample Prior for a Given Position X

Suppose that a G/T polymorphism is reported in dbSNP. Then,

G       G       T       GT       Other combinations
P(G)    0.454   0.454   0.0909   $< 10^{-4}$

GATK multi-sample uses allele frequencies estimated from larger sample sets, combined with Hardy-Weinberg equilibrium

GATK-Beagle with linkage disequilibrium data
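How an allele frequency turns into a genotype prior under Hardy-Weinberg equilibrium can be sketched as follows. This illustrates only the textbook relation $p^2$, $2pq$, $q^2$ for a biallelic site; it is not GATK's implementation, and the function name and default allele labels are chosen for this example.

```python
def hardy_weinberg_prior(alt_allele_frequency, ref="G", alt="T"):
    """Genotype prior for a biallelic site under Hardy-Weinberg equilibrium."""
    q = alt_allele_frequency
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must be in [0, 1]")
    p = 1.0 - q
    return {
        ref + ref: p * p,      # homozygous reference
        ref + alt: 2 * p * q,  # heterozygous
        alt + alt: q * q,      # homozygous alternative
    }


if __name__ == "__main__":
    print(hardy_weinberg_prior(0.1))
```

Such a prior can take the place of $P(G)$ in the posterior formula when population allele frequencies are available.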


Variant Calling: Other Examples of Extensions

(taken from the Illumina Website)

But... all of this is of little use for rare mutations


Annotation

Between 3 and 5 million SNVs per individual

Only a few have a functional impact

Separating them is a challenge for bioinformatics

Many tools use supervised learning approaches (remember my last talk)

SNV features

cSNVs (protein-coding)
- Amino acid residue substitution prop.
- Evolutionary history of AA position
- Sequence-function relationship
- Structure-function relationship

rSNVs (regulatory)
- Transcription
- Pre-mRNA splicing
- MicroRNA binding
- Post-translational modification sites


Annotation: Protein-Sequence-Based

(Cooper et al.)


Annotation: DNA-Sequence-Based

(Cooper et al.)


Flow chart for informed use of SNV function prediction tools

Cline et al.


Final

Thanks for your attention!


References

Nielsen et al. Genotype and SNP calling from next-generation sequencing data. Nature Reviews Genetics. (2011)

Cline et al. Using bioinformatics to predict the functional impact of SNVs. Bioinformatics. (2011)

Cooper et al. Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nature Reviews Genetics. (2011)