![Page 1: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/1.jpg)
Software tools for the analysis of medically
important sequence variations
Gabor T. Marth, D.Sc.
Boston College Department of Biology
marth@bc.edu
http://bioinformatics.bc.edu/marthlab
Pfizer visit, March 7, 2006
![Page 2: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/2.jpg)
Our lab focuses on three main projects…
1. software tools for clinical case-control association studies
2. software for SNP discovery in clonal and re-sequencing data
3. connecting HapMap and pharmaco-genetic data
![Page 3: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/3.jpg)
1. We are developing computer software to aid tagSNP selection and association testing
Figure labels (software overview): GUI; user control interface; study specification; user input; input data views; LD views; gene annotations; tags; association statistics; reference samples; representative computational samples; computational sample database; tag evaluation; marker selection; association testing
Figure: plot of 5-site computationally generated LD (r²) vs. LA LD (r²), both axes 0 to 1; series by marker separation: 1-4, 5-9, 10-17, 18-26
(discussed in more detail)
![Page 4: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/4.jpg)
2. We build computer tools for SNP discovery
• inherited (germ line) polymorphisms are important as they can predispose to disease
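$$
P(\mathrm{SNP}) = \sum_{\text{all variable } S} \frac{\dfrac{P(S_1 \mid R_1)}{P_{Prior}(S_1)} \cdots \dfrac{P(S_N \mid R_N)}{P_{Prior}(S_N)} \, P_{Prior}(S_1, \ldots, S_N)}{\displaystyle\sum_{S_{i_1} \in [A,C,G,T]} \cdots \sum_{S_{i_N} \in [A,C,G,T]} \dfrac{P(S_{i_1} \mid R_1)}{P_{Prior}(S_{i_1})} \cdots \dfrac{P(S_{i_N} \mid R_N)}{P_{Prior}(S_{i_N})} \, P_{Prior}(S_{i_1}, \ldots, S_{i_N})}
$$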
• we have a 5-year NIH R01 grant to re-develop our computer package, PolyBayes©, our SNP discovery tool originally developed while the PI was at the Washington University Medical School
Marth et al. Nature Genetics 1999
• looking for SNPs and short INDELs
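The Bayesian calculation behind this kind of SNP detection can be illustrated with a small sketch. It enumerates every combination of true bases for N aligned reads, weights each by the per-read base probabilities and a prior, and reports the share of the weight carried by combinations with more than one distinct base. The function names, the uniform base composition (0.25), and the prior polymorphism rate are illustrative assumptions, not values taken from PolyBayes.

```python
from itertools import product

BASES = "ACGT"

def called(base, error):
    """Per-read base probabilities for a read calling `base` with error rate `error`."""
    return {b: (1.0 - error) if b == base else error / 3.0 for b in BASES}

def snp_probability(read_probs, prior_poly=0.003):
    """Posterior probability that an alignment column is polymorphic.

    read_probs: one dict per read, mapping each base to P(S_i | R_i).
    prior_poly: assumed prior probability that a site is polymorphic
                (an illustrative value, not from the source).
    """
    n = len(read_probs)
    n_variable = 4 ** n - 4  # number of non-monomorphic base combinations

    def prior(combo):
        if len(set(combo)) == 1:
            return (1.0 - prior_poly) / 4.0   # four monomorphic combinations
        return prior_poly / n_variable        # prior spread evenly (assumption)

    total = 0.0
    variable = 0.0
    for combo in product(BASES, repeat=n):
        weight = prior(combo)
        for base, probs in zip(combo, read_probs):
            weight *= probs[base] / 0.25      # divide by assumed uniform base prior
        total += weight
        if len(set(combo)) > 1:
            variable += weight
    return variable / total

# Two high-quality reads that disagree suggest a SNP; two that agree do not.
print(snp_probability([called("A", 3e-5), called("G", 3e-5)]))
print(snp_probability([called("A", 3e-5), called("A", 3e-5)]))
```

The enumeration grows as 4^N, so this form is only practical for a handful of reads; it is meant to show the structure of the calculation rather than an efficient implementation.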
![Page 5: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/5.jpg)
Apply our tools for genome-scale SNP mining
Sachidanandam et al. Nature 2001
Figure labels: ~ 10 million; EST; WGS; BAC; genome reference
![Page 6: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/6.jpg)
Extend our methods for SNP detection in medical re-sequencing data from traditional Sanger sequencers…
Homozygous T
Homozygous C
Heterozygous C/T
![Page 7: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/7.jpg)
… and in 454 pyrosequence data
454 sequence from the NCBI Trace Archive
• accurate base calling for de novo sequencing
• detection of heterozygotes in medical re-sequencing data
Figure from Nordfors et al. Human Mutation 19:395-401 (2002)
(discussed in more detail)
![Page 8: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/8.jpg)
Developing methods to detect somatic mutations (as distinguished from inherited polymorphisms)
© Brian Stavely, Memorial University of Newfoundland
• the detection of somatic mutations, and their distinction from inherited polymorphism, will be important to separate pre-disposing variants from mutations that occur during disease progression e.g. in cancer
(discussed in more detail)
![Page 9: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/9.jpg)
Process DNA methylation data obtained with sequencing
DNA methylation is important e.g. because hypo- and hypermethylation are consistently present in various cancers
Issa. Nature Reviews Cancer, 4, 2004: 988-993
we are developing methods to interpret DNA methylation data obtained with sequencing, in the presence of methodological artifacts such as incomplete bi-sulfite conversion of un-methylated cytosines
Lewin et al. Bioinformatics, 20:3005-3012, 2004
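As a hedged illustration of why incomplete bisulfite conversion matters, the sketch below applies a simple correction: if a fraction of unmethylated cytosines fails to convert, the observed cytosine signal overstates methylation. The function name and the model (observed = m + (1 - m)(1 - conversion rate)) are assumptions made for illustration; they are not the lab's method.

```python
def corrected_methylation(observed_c_fraction, conversion_rate):
    """Estimate the true methylation fraction at a cytosine position.

    Assumed model: observed = m + (1 - m) * (1 - conversion_rate),
    where m is the true methylated fraction and conversion_rate is the
    fraction of unmethylated cytosines that bisulfite treatment converts.
    """
    if not 0.0 < conversion_rate <= 1.0:
        raise ValueError("conversion_rate must be in (0, 1]")
    unconverted = 1.0 - conversion_rate
    estimate = (observed_c_fraction - unconverted) / conversion_rate
    # Clip to the valid range; noise can push the raw estimate outside it.
    return min(1.0, max(0.0, estimate))

# With 95% conversion, an observed C signal of 52.5% corresponds to about 50% methylation.
print(corrected_methylation(0.525, 0.95))
```

In practice the conversion rate would itself have to be estimated, for example from cytosines expected to be unmethylated; that step is outside this sketch.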
![Page 10: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/10.jpg)
… and tools to integrate genetic and epigenetic data from varied sources to find “common themes” during cancer development
chromatin structure
gene expression profiles
copy number changes
methylation profiles
chromosome rearrangements
repeat expansions
somatic mutations
![Page 11: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/11.jpg)
3. We are planning a project to connect multi-marker haplotypes to drug metabolic phenotypes
• predicting metabolic phenotypes (ADR) based on haplotype markers
• evolutionary origin of drug metabolizing enzyme polymorphisms
![Page 12: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/12.jpg)
Computer software to aid case-control association studies: tagSNP selection and association testing (details)
Figure: plot of 5-site computationally generated LD (r²) vs. LA LD (r²), both axes 0 to 1; series by marker separation: 1-4, 5-9, 10-17, 18-26
Dr. Eric Tsung
![Page 13: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/13.jpg)
Clinical case-control association studies – concepts
• association studies are designed to find disease-causing genetic variants
• genotyping cases and controls at various polymorphisms
• searching for “significant” marker allele frequency differences between cases and controls
Figure: clinical cases and clinical controls; plot of AF(controls) vs. AF(cases)
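A standard way to test a single marker for an allele frequency difference between cases and controls is a chi-square test on the 2x2 table of allele counts. The sketch below is a generic textbook version using only the standard library; the function name and the example counts are hypothetical.

```python
import math

def allele_freq_test(case_alt, case_total, ctrl_alt, ctrl_total):
    """Chi-square test (1 degree of freedom) on a 2x2 allele count table.

    Counts are chromosomes: *_alt carry the marker allele, *_total is the
    number of chromosomes genotyped in that group.
    Returns (chi_square, p_value).
    """
    a, b = case_alt, case_total - case_alt
    c, d = ctrl_alt, ctrl_total - ctrl_alt
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a margin is empty, so no test is possible
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical marker: allele frequency 0.30 in cases vs. 0.20 in controls.
chi2, p = allele_freq_test(300, 1000, 200, 1000)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```

This test treats chromosomes as independent draws and does not account for sample-to-sample variation in allele frequency.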
![Page 14: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/14.jpg)
Association study designs
• region(s) interrogated: single gene, list of candidate genes (“candidate gene study”), or entire genome (“genome scan”)
• direct or indirect: a direct study genotypes the causative variant itself; an indirect study genotypes a marker that is co-inherited with the causative variant
• single-SNP marker or multi-SNP haplotype marker
• single-stage or multi-stage
![Page 15: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/15.jpg)
Marker (tag) selection for association studies
for economy, one cannot genotype every SNP in thousands of clinical samples: marker selection is the process where a subset of all available SNPs is chosen
1. hypothesis-driven (i.e. based on gene function)
2. LD-driven – based entirely on the reduction of redundancy presented by the linkage disequilibrium (LD) between SNPs; tags represent other SNPs they are correlated with
Figure label: causative variant
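LD-driven selection can be sketched as a greedy procedure: compute pairwise r² from phased haplotypes, then repeatedly pick the SNP that captures the most still-untagged SNPs. This is a generic illustration, not the lab's algorithm; the function names and the 0.8 threshold are assumptions.

```python
def r_squared(a, b):
    """r^2 between two biallelic SNPs, given 0/1 alleles per haplotype."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = pab - pa * pb
    return d * d / denom

def greedy_tags(haplotypes, threshold=0.8):
    """Pick tag SNPs so every SNP has r^2 >= threshold with some tag.

    haplotypes: list of haplotypes, each a list of 0/1 alleles per SNP.
    Returns the list of chosen SNP indices, in selection order.
    """
    n_snps = len(haplotypes[0])
    cols = [[h[j] for h in haplotypes] for j in range(n_snps)]
    covers = {
        i: {j for j in range(n_snps)
            if i == j or r_squared(cols[i], cols[j]) >= threshold}
        for i in range(n_snps)
    }
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Choose the SNP that captures the most untagged SNPs (ties: lowest index).
        best = max(sorted(untagged), key=lambda i: len(covers[i] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags

# SNPs 0 and 1 are perfectly correlated; SNP 2 is independent of both.
haps = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
print(greedy_tags(haps))
```

Tags chosen this way reflect LD in the sample they were selected from, which may differ in another sample.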
![Page 16: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/16.jpg)
The International HapMap project
http://www.hapmap.org
The international HapMap project was designed to provide a set of physical and informational reagents for association studies by mapping out human LD structure
![Page 17: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/17.jpg)
LD varies across samples
African reference (YRI)
there are large differences in LD between different human populations…
European reference (CEU)
… and even between samples from the same population.
Other European samples
![Page 18: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/18.jpg)
Sample-to-sample LD differences make tagSNP selection problematic
groups of SNPs that are in LD in the HapMap reference samples may not be in a future set of clinical samples…
… and tags that were selected based on LD in the HapMap may no longer work (i.e. represent the SNPs they were supposed to) in the clinical samples…
… possibly resulting in missed disease associations.
![Page 19: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/19.jpg)
Natural marker allele frequency differences confound association testing
reference samples: ~ 120 chromosomes
cases: 500-2,000 chromosomes
controls: 500-2,000 chromosomes
• the HapMap reference samples are much smaller than clinical sample sizes
• it is difficult to accurately assess both the marker allele frequency (single-SNP or haplotype frequency) in the clinical samples and the naturally occurring variation of marker allele frequency differences between cases and controls
• it is therefore difficult to assess the statistical significance of candidate associations
Figure: plot of AF(controls) vs. AF(cases)
![Page 20: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/20.jpg)
We are developing technology for assessing sample-to-sample variance in silico
we estimate LD differences between HapMap and future clinical samples…
…by generating “computational” samples representing future clinical samples…
… and use computational “proxy” samples for tabulating LD and allele frequency differences.
Figure labels: reference; “cases”; “controls”; tag evaluation; tag selection; association testing
![Page 21: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/21.jpg)
Two methods of computational sample generation
Figure labels: HapMap; “HapMap”; “cases”; “controls”
Method 1. “Data-relevant Coalescent”. This algorithm uses a population genetic model to connect mutations in the HapMap reference to mutations in future clinical samples. Full model but computationally slow.
Method 2. The PAC method (product of approximate conditionals, Li & Stephens). This method constructs “new” samples as mosaics of existing haplotypes, mimicking the effects of recombination. An approximation but fast.
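The mosaic idea in Method 2 can be sketched in a few lines: a new haplotype is copied marker by marker from an existing haplotype, with an occasional switch to a different template standing in for recombination. This is a deliberate simplification of the Li & Stephens model (for instance, it does not condition on previously generated haplotypes or use genetic distances); all names and parameter values are assumptions.

```python
import random

def mosaic_haplotype(reference, switch_prob=0.05, copy_error=0.0, rng=None):
    """Build one new haplotype as a mosaic of reference haplotypes.

    reference:   list of haplotypes, each a list of 0/1 alleles per marker.
    switch_prob: chance of switching template between adjacent markers
                 (a crude stand-in for recombination).
    copy_error:  chance of flipping the copied allele (a stand-in for mutation).
    """
    rng = rng or random.Random()
    n_markers = len(reference[0])
    template = rng.randrange(len(reference))
    new = []
    for j in range(n_markers):
        if j > 0 and rng.random() < switch_prob:
            template = rng.randrange(len(reference))
        allele = reference[template][j]
        if rng.random() < copy_error:
            allele = 1 - allele
        new.append(allele)
    return new

def computational_sample(reference, size, **kwargs):
    """Generate `size` mosaic haplotypes from the reference panel."""
    return [mosaic_haplotype(reference, **kwargs) for _ in range(size)]

panel = [[0, 0, 1, 1, 0], [1, 1, 0, 0, 1], [0, 1, 0, 1, 0]]
cases = computational_sample(panel, 4, switch_prob=0.2, rng=random.Random(1))
print(cases)
```

Passing a seeded `random.Random` makes the generated samples reproducible, which helps when comparing tag sets across runs.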
![Page 22: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/22.jpg)
Computational samples
HapMap (CEU)
Computational (PAC)
Computational (Coalescent)
Extra genotypes (Estonia)
![Page 23: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/23.jpg)
MARKER EVALUATION with computational samples
test if markers selected from the HapMap continue to “tag” other SNPs in their original LD group
![Page 24: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/24.jpg)
MARKER SELECTION with computational samples
selecting tags in multiple consecutive sets of computational samples and choosing for the association study the best-performing tags
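Tag evaluation and selection across several computational samples might look like the sketch below: for each candidate tag, measure what fraction of its LD group it still captures in each sample, then keep the candidate with the best average. The function names, the r² threshold, and the averaging rule are illustrative assumptions.

```python
def r_squared(a, b):
    """r^2 between two biallelic SNPs, given 0/1 alleles per haplotype."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0
    return (pab - pa * pb) ** 2 / denom

def tagged_fraction(sample, tag, group, threshold=0.8):
    """Fraction of the other SNPs in `group` that `tag` captures in `sample`."""
    def column(j):
        return [h[j] for h in sample]
    others = [j for j in group if j != tag]
    if not others:
        return 1.0
    tag_col = column(tag)
    hits = sum(1 for j in others if r_squared(tag_col, column(j)) >= threshold)
    return hits / len(others)

def best_tag(samples, group, threshold=0.8):
    """Candidate in `group` with the highest mean tagged fraction over all samples."""
    def score(tag):
        return sum(tagged_fraction(s, tag, group, threshold) for s in samples) / len(samples)
    return max(group, key=score)

# In sample_a SNPs 0 and 1 are correlated; in sample_b SNPs 1 and 2 are.
sample_a = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
sample_b = [[0, 0, 0], [1, 0, 0], [0, 1, 1], [1, 1, 1]]
print(best_tag([sample_a, sample_b], group=[0, 1, 2]))
```

Here SNP 1 is the only candidate that keeps tagging part of the group in both samples, so it is preferred over tags that work in just one.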
![Page 25: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/25.jpg)
ASSOCIATION TESTING with computational samples
tabulating ΔAF in “cases” vs. “controls” in multiple consecutive computational pairs of samples provides the natural range of allele frequency differences to decide if a candidate association is statistically significant
Figure: three computational “cases”/“controls” sample pairs; plot of AF(controls) vs. AF(cases)
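The idea of using computational sample pairs as a null distribution can be sketched as an empirical p-value: tabulate ΔAF over many “cases”/“controls” pairs in which no true association exists, then ask how often a difference at least as large as the observed one arises. The function names and the add-one correction are assumptions made for illustration.

```python
def allele_freq(sample):
    """Frequency of the marker allele in a sample of 0/1 alleles."""
    return sum(sample) / len(sample)

def delta_af_distribution(pairs):
    """AF difference for each (cases, controls) pair of computational samples."""
    return [allele_freq(cases) - allele_freq(controls) for cases, controls in pairs]

def empirical_p_value(observed_delta, null_deltas):
    """Share of null differences at least as extreme as the observed one.

    Uses the common add-one correction so the estimate is never exactly zero.
    """
    extreme = sum(1 for d in null_deltas if abs(d) >= abs(observed_delta))
    return (extreme + 1) / (len(null_deltas) + 1)

# Three toy null pairs; a real analysis would use many more.
pairs = [
    ([1, 0, 0, 0], [0, 0, 0, 0]),
    ([1, 1, 0, 0], [1, 1, 0, 0]),
    ([0, 0, 0, 0], [1, 0, 0, 0]),
]
null = delta_af_distribution(pairs)
print(null, empirical_p_value(0.5, null))
```

The resolution of the p-value is limited by the number of computational pairs generated, so small p-values require correspondingly many pairs.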
![Page 26: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/26.jpg)
Do computational samples represent future clinical genotypes realistically?
we quantify the quality of representation by measuring the correlation of LD between corresponding pairs of markers (i.e. we ask: if two markers were in strong LD in one set of samples, are they ALSO in strong LD in the other set?)
Figure: scatter plot, both axes 0 to 1
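One simple way to express this comparison is a Pearson correlation between the r² values of the same marker pairs in two sample sets. The sketch below assumes the pairwise r² values have already been computed and listed in the same marker-pair order; the function name and the example values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# r^2 for the same marker pairs in two sample sets (made-up values).
ld_reference = [0.95, 0.10, 0.60, 0.05, 0.80]
ld_computational = [0.90, 0.15, 0.55, 0.10, 0.85]
print(round(pearson(ld_reference, ld_computational), 3))
```

A value near 1 means marker pairs in strong LD in one set are also in strong LD in the other.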
![Page 27: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/27.jpg)
LD difference -- comparison to extra experimental genotypes
0.949 +/- 0.013
0.978 +/- 0.010
0.963 +/- 0.014
• we have analyzed two extra genotype sets collected at the HapMap SNPs in three genome regions, from our clinical collaborators (Prof. Thomas Hudson, McGill; Prof. Stanley Nelson, UCLA)
![Page 28: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/28.jpg)
AF difference -- comparisons to extra experimental genotypes
Figure: plot of AF Diff, Comp Samples vs. AF Diff, Estonian Data; both axes 0 to 0.06
• according to our limited initial test, computational samples can represent future clinical samples well for estimating sample-to-sample variability
![Page 29: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/29.jpg)
A new marker selection and association testing software tool
• data visualization
• representative computational sample generation
• advanced tag selection functionality
• gene annotations overlaid on physical map of SNPs (i.e. the human genome sequence)
• advanced association testing functionality
• multi-level user customization, including user conveniences, e.g. tag prioritization based on SNP assay score
Figure labels: reference samples; representative computational samples; gene annotations; tags; LD views; association statistics
Figure: plot of 5-site computationally generated LD (r²) vs. LA LD (r²), both axes 0 to 1; series by marker separation: 1-4, 5-9, 10-17, 18-26
![Page 30: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/30.jpg)
User community
• companies designing new generations of whole-genome or specialized SNP arrays
• researchers comparing alternative platforms (e.g. the Affymetrix 500K and the Illumina 300K) to find the one most suitable for their study
• clinical researchers designing candidate gene studies
• researchers designing second-stage follow-up studies in specific genome regions after an initial genome scan (our methods can take advantage of first-stage data already available in the clinical samples)
• the association testing features should be useful for analysts regardless of study design
![Page 31: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/31.jpg)
Base calling and SNP detection in sequence traces including 454 data
Aaron Quinlan
![Page 32: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/32.jpg)
Base calling and SNP detection in sequence traces including 454 “pyrogram” data
• PolyBayes was originally written to find SNPs in clonal sequences in large SNP discovery projects
• medical re-sequencing projects require the detection of SNPs in heterozygous diploid sequence traces
Figure: DNA strand diagram with 5’ and 3’ ends labeled
![Page 33: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/33.jpg)
Heterozygote detection in sequence traces
Ind. 1
Ind. 2
Ind. 3
Ind. 4
![Page 34: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/34.jpg)
Individual traces
• we use a machine learning method (Support Vector Machine, SVM) to recognize characteristic features of homozygous vs. heterozygous positions
![Page 35: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/35.jpg)
Aggregating information from multiple traces
forward/reverse sequences from same individual
P(GT | Read) = .98
P(GT | Read) = .87
resultant genotype call
P(GT) = .993
![Page 36: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/36.jpg)
Discovery vs. genotyping
Prior(CT) = .001
discovery: “uninformed prior”
• don’t know if site is polymorphic
• have to test each site
Prior(CT) = 0.34
genotyping: “informed prior”
1. site is known to be polymorphic
2. allele frequency estimate
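The effect of the prior can be shown with Bayes' rule in odds form: the same read evidence yields a very different posterior under a discovery prior of .001 than under a genotyping prior of 0.34. The sketch is generic; the function name and the likelihood ratios are made-up illustrations, and it is not intended to reproduce the genotype probabilities quoted on the slides.

```python
def posterior(prior, likelihood_ratios):
    """Posterior probability of a heterozygous call.

    prior:             prior probability of the heterozygous genotype.
    likelihood_ratios: one ratio per read, P(read | het) / P(read | not het),
                       assumed independent across reads.
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)

evidence = [100.0]  # hypothetical evidence from a single read
print(round(posterior(0.001, evidence), 3))  # discovery prior
print(round(posterior(0.34, evidence), 3))   # genotyping prior

# Evidence from a second read (e.g. the reverse strand) multiplies the odds.
print(round(posterior(0.001, [100.0, 50.0]), 3))
```

Under the discovery prior a single supportive read is not enough for a confident call, while additional reads from the same individual raise the posterior quickly.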
![Page 37: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/37.jpg)
Our heterozygote detection works better than other methods
Performance Measured on ~1000 Alignments covering 500Kb Region of Chromosome 4
| Method | Fraction of Data Analyzed | False Discovery Rate | Fraction of Heterozygotes Found | Fraction of Homozygotes Found |
|---|---|---|---|---|
| PolyBayes+ | 85.1 | 0.0375 | 86.60% | 97.8% |
| Polyphred 5 | 86.17 | 0.0389 | 83.16% | 82.63% |
![Page 38: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/38.jpg)
Base calling for “pyrograms”
From NCBI Trace Archive
• we have access to standardized data formats
• readout in pyrosequencing is based on instantaneous detection of base incorporation… multiple bases of the same type are incorporated in the same cycle
Figure: example read TCAGGGGGGGGGGGACGACAAGGCGTGGGGA; figure values: 26 55 24 15 10 7 5 4 2 1 0 0
• the identity of consecutive bases is very reliable but the length of mono-nucleotide runs (base number) is difficult to quantify (great for re-sequencing; but problematic for de novo sequencing)
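To make the run-length problem concrete, the sketch below converts a sequence into an idealized flowgram (one signal per nucleotide flow, proportional to the number of bases incorporated) and calls bases back from noisy signals by rounding. The cyclic flow order "TACG", the function names, and the noise values are assumptions for illustration.

```python
def flowgram(sequence, flow_order="TACG"):
    """Ideal flow signals: the homopolymer length incorporated in each flow."""
    if any(base not in flow_order for base in sequence):
        raise ValueError("sequence contains a base outside the flow order")
    signals = []
    pos = 0
    flow = 0
    while pos < len(sequence):
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(sequence) and sequence[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append(run)
        flow += 1
    return signals

def call_bases(signals, flow_order="TACG"):
    """Call a sequence by rounding each flow signal to the nearest run length."""
    called = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        called.append(nucleotide * max(0, round(signal)))
    return "".join(called)

print(flowgram("TCAGGG"))
# A small error on a long run changes the called length, not the base identity.
print(call_bases([1.04, 0.1, 0.95, 0.02, 0.0, 1.1, 0.05, 2.8]))
```

Because signal noise tends to grow with run length, rounding becomes less reliable for long mono-nucleotide runs, which is the difficulty the slide describes.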
![Page 39: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/39.jpg)
SNP genotyping with pyrosequencers
Nordfors et al. Human Mutation 19:395-401 (2002)
we are in the process of identifying discriminating pyrogram features to use in our machine-learning methods to recognize polymorphic positions within traces
![Page 40: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/40.jpg)
Somatic mutation detection
Michael Stromberg
![Page 41: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/41.jpg)
Somatic mutations
© Brian Stavely, Memorial University of Newfoundland
the detection of somatic mutations, and their distinction from inherited polymorphism, is important to separate pre-disposing variants from mutations that occur during disease progression e.g. in cancer
1. detect the mutations
2. classify whether somatic or inherited
![Page 42: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/42.jpg)
Detecting somatic mutations with comparative data
• based on comparison of cancer and normal tissue from the same individual
• often cancer tissue is highly heterogeneous and the somatic mutant allele may be present at low allele frequency
![Page 43: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/43.jpg)
Detecting somatic mutations with subtraction
• if normal tissue samples are not available, we detect SNPs in cancer tissue against e.g. the human genome reference sequence
• subtract apparent mutations that are present in sequence variation databases
• search for evidence that these mutations are genetic
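The subtraction step can be sketched as a set difference: variants called in the tumor against the reference, minus those already listed in a variation database, leave the candidate somatic mutations. The tuple layout (chromosome, position, reference allele, alternate allele) and all example records are hypothetical.

```python
def subtract_known(called_variants, known_variants):
    """Return called variants that are absent from the known-variation set.

    Each variant is a (chromosome, position, ref_allele, alt_allele) tuple.
    """
    return sorted(set(called_variants) - set(known_variants))

# Hypothetical calls from cancer tissue against the reference sequence.
tumor_calls = [
    ("chr4", 1200, "C", "T"),
    ("chr4", 5310, "G", "A"),
    ("chr4", 9875, "A", "G"),
]
# Hypothetical entries from a sequence variation database.
database = [("chr4", 5310, "G", "A")]

print(subtract_known(tumor_calls, database))
```

Variants that survive subtraction are only candidates: a rare inherited polymorphism missing from the database would also pass this filter.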
![Page 44: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/44.jpg)
Detecting somatic mutations with subtraction
• we have applied our methods for somatic mutation detection in murine mitochondrial sequences
heteroplasmy homoplasmy
• we will be applying our methods to human nuclear DNA from our collaborators
![Page 45: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/45.jpg)
Using new haplotype resources to connect genotype and clinical outcome in pharmaco-genetic systems
• the HapMap was designed as a tool to detect high-frequency (common) phenotypic (e.g. disease-causing) alleles
• important drug metabolizing enzymes are relatively few in number, well studied, and at known genome locations; many associated phenotypes are well described
• many functional alleles are known, and of high frequency (common)
• multi-SNP alleles are highly predictive of metabolic phenotype
• clinical phenotype (adverse drug reaction) less predictable
• ideal candidate for applying haplotype resources
![Page 46: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/46.jpg)
Multi-marker haplotypes as accurate markers for ADRs?
functional allele (known metabolic polymorphism)
genetic marker (haplotype) in genome regions of drug metabolizing enzyme (DME) genes
molecular phenotype (drug concentration measured in blood plasma)
clinical endpoint (adverse drug reaction)
computational prediction based on haplotype structure
![Page 47: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/47.jpg)
Resources
• specifics of enzyme-drug interactions
• LD and haplotype structure in the HapMap reference samples, based on high-density SNP map
• functional alleles
• existing DME P genotyping chips
![Page 48: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/48.jpg)
Evolutionary questions
• mutation age?
• mutations single-origin or recurrent?
• geographic origin of mutations?
• analysis based on complete local variation structure and haplotype background of functional mutations
• specifics of the selection process that led to specific functional alleles?
![Page 49: Software tools for the analysis of medically important sequence variations Gabor T. Marth, D.Sc. Boston College Department of Biology marth@bc.edu](https://reader036.vdocuments.mx/reader036/viewer/2022062309/56649e915503460f94b95be6/html5/thumbnails/49.jpg)
Proposed steps of analysis
• haplotypes vs. metabolic phenotype?
• complete polymorphic structure?
• ethnicity?
• additional functional SNPs?
• haplotypes vs. functional alleles?
• haplotypes vs. ADR phenotype?
Figure labels: haplotype block?; haplotype; functional allele (genotype); metabolic phenotype; clinical phenotype (ADR)