![Page 1: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/1.jpg)
Association mapping for Mendelian and complex disorders
Apr 21, 2023 Bafna, BfB
![Page 2: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/2.jpg)
UG Bioinformatics specialization at UCSD
![Page 3: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/3.jpg)
Abstraction of a causal mutation
![Page 4: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/4.jpg)
Looking for the mutation in populations
A possible strategy is to collect cases (affected) and control individuals, and look for a mutation that consistently separates the two classes. Next, identify the gene.
![Page 5: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/5.jpg)
Looking for the causal mutation in populations
[Figure: case and control samples]
Problem 1: many unrelated common mutations, around one every 1000bp
![Page 6: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/6.jpg)
![Page 7: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/7.jpg)
Looking for the causal mutation in populations
Problem 2: We may not sample the causal mutation.
![Page 8: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/8.jpg)
How to hunt for disease genes
• We are guided by two simple facts governing these mutations:
1. Nearby mutations are correlated.
2. Distal mutations are not.
![Page 9: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/9.jpg)
This lecture
1. The bottom line: How do these facts help in finding disease genes?
2. The genetics: why should this happen?
3. The computation
4. Challenge of complex diseases
![Page 10: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/10.jpg)
The basics of association mapping
• Sample a population of individuals at variant locations across the genome. Typically, these variants are single nucleotide polymorphisms (SNPs).
• Create a new bi-allelic variant corresponding to cases and controls, and test for correlations.
• By our assumptions, only the proximal variants will be correlated.
• Investigate genes near the correlated variants.
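The steps above can be sketched in a few lines. This is only a toy illustration: the genotype matrix `snps`, the `phenotype` vector, and the helper `correlation` are made up for the example, and a real scan would use a proper significance test rather than a raw correlation.

```python
from math import sqrt

def correlation(x, y):
    """Pearson correlation between two equal-length 0/1 vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a column with no variation carries no signal
    return cov / sqrt(vx * vy)

# Hypothetical data: rows are individuals, columns are SNPs (0/1 alleles).
snps = [
    [0, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
]
# Case/control status encoded as one more bi-allelic column (1 = case).
phenotype = [0, 0, 0, 1, 1, 1]

# Correlate every SNP column with the phenotype column.
scores = [correlation([row[j] for row in snps], phenotype)
          for j in range(len(snps[0]))]
# The SNP with the largest absolute correlation points to the region to investigate.
best = max(range(len(scores)), key=lambda j: abs(scores[j]))
```

In this toy data the first SNP tracks case/control status exactly, so it is the one selected.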
![Page 11: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/11.jpg)
So, why should the proximal SNPs be correlated, and distal SNPs not?
![Page 12: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/12.jpg)
A bit of evolution
• Consider a population (of chromosomes) of fixed size evolving in time.
• Each individual arises from a single, randomly chosen parent in the previous generation.
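A minimal simulation of this model, assuming discrete generations and uniform choice of parent. The function names, the population size, and the seed are illustrative choices, not part of the lecture.

```python
import random

def next_generation(population, rng):
    """Each child copies one parent chosen uniformly at random."""
    return [rng.choice(population) for _ in population]

def simulate(pop_size, generations, seed=0):
    rng = random.Random(seed)
    # Label the founders 0..pop_size-1 to track who leaves descendants.
    population = list(range(pop_size))
    for _ in range(generations):
        population = next_generation(population, rng)
    return population

extant = simulate(pop_size=10, generations=50)
# Founder labels still present in the current population.
surviving_founders = set(extant)
```

The population size stays fixed, while the set of founders with living descendants can only shrink from one generation to the next.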
![Page 13: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/13.jpg)
Genealogy of a chromosomal population
[Figure: genealogy over time, ending in the current (extant) population]
![Page 14: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/14.jpg)
Adding mutations
Infinite sites assumption: A mutation occurs at most once at a site.
![Page 15: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/15.jpg)
SNPs
The collection of acquired mutations in the extant population describes the SNPs.
![Page 16: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/16.jpg)
Fixation and elimination
• Not all mutations survive.
• Some mutations get fixed, and are no longer polymorphic.
![Page 17: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/17.jpg)
Removing extinct genealogies
![Page 18: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/18.jpg)
Removing fixed mutations
![Page 19: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/19.jpg)
The coalescent
![Page 20: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/20.jpg)
Disease mutation
• We drop the ancestral chromosomes, and place the mutations on the internal branches.
![Page 21: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/21.jpg)
Disease mutation
• A causal mutation creates a clade of affected descendants.
![Page 22: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/22.jpg)
Disease mutation
• Note that the tree (genealogy) is hidden.
• However, the underlying tree topology introduces a correlation between each pair of SNPs.
![Page 23: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/23.jpg)
What have we learnt?
• The underlying genealogy creates a correlation between SNPs.
• By itself, this is not sufficient, because distal SNPs might also be correlated.
• Fortunately for us, the correlation between distal SNPs is quickly destroyed.
![Page 24: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/24.jpg)
Recombination
![Page 25: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/25.jpg)
Recombination
• In our idealized model, we assume that each individual chromosome chooses two parental chromosomes from the previous generation.
![Page 26: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/26.jpg)
Multiple recombinations change the local genealogy
![Page 27: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/27.jpg)
A bit of evolution
• Proximal SNPs are correlated, distal SNPs are not.
• The correlation (linkage disequilibrium) decays rapidly after 20-50 kb.
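One common way to quantify the correlation between two SNPs is the squared correlation r² of their alleles. The sketch below assumes phased 0/1 alleles per chromosome; the vectors `snp1`, `snp2`, and `snp3` are hypothetical.

```python
def r_squared(a, b):
    """Squared correlation (r^2) between two 0/1 allele vectors."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    # Frequency of chromosomes carrying allele 1 at both sites.
    pab = sum(x & y for x, y in zip(a, b)) / n
    d = pab - pa * pb  # disequilibrium coefficient D
    denom = pa * (1 - pa) * pb * (1 - pb)
    return d * d / denom if denom > 0 else 0.0

snp1 = [0, 0, 1, 1, 0, 1, 1, 0]
snp2 = [0, 0, 1, 1, 0, 1, 0, 0]  # agrees with snp1 on all but one chromosome
snp3 = [0, 1, 0, 1, 0, 1, 0, 1]  # statistically independent of snp1

ld_near = r_squared(snp1, snp2)
ld_far = r_squared(snp1, snp3)
```

Here `ld_near` is large and `ld_far` is zero, mirroring the proximal versus distal behaviour described above.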
![Page 28: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/28.jpg)
BASIC STATISTICS
![Page 29: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/29.jpg)
Testing for correlation
• In the absence of correlation
![Page 30: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/30.jpg)
Testing for correlation
• When correlated
![Page 31: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/31.jpg)
Assigning confidence
Expected:
2 2
2 2

Observed:
4 0
0 4
![Page 32: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/32.jpg)
Assigning confidence
![Page 33: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/33.jpg)
Assigning confidence
Expected:
2.5 2.5
1.5 1.5

Observed:
3 2
1 2
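The expected tables on these slides follow from the row and column totals of the observed tables. The sketch below recomputes them and scores the deviation with Pearson's statistic. The function names are illustrative, the code assumes every row and column total is positive, and the chi-square approximation is rough for counts this small.

```python
from math import erfc, sqrt

def expected_counts(observed):
    """Expected counts under independence, from the marginals."""
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    total = sum(rows)
    return [[r * c / total for c in cols] for r in rows]

def chi_square(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for orow, erow in zip(observed, expected)
               for o, e in zip(orow, erow))

def p_value_1df(stat):
    """Upper tail of the chi-square distribution with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))

strong = [[4, 0], [0, 4]]  # observed table with perfect correlation
weak = [[3, 2], [1, 2]]    # observed table close to its expectation

stat_strong, stat_weak = chi_square(strong), chi_square(weak)
p_strong, p_weak = p_value_1df(stat_strong), p_value_1df(stat_weak)
```

The perfectly correlated table gives a statistic of 8 and a small p-value, while the second table stays close to its expected counts and is not significant.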
![Page 34: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/34.jpg)
STATISTICAL TESTS OF ASSOCIATION
![Page 35: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/35.jpg)
Tests for association: Pearson
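• Case-control phenotype:
– Build a 3×2 contingency table of observed counts.
– Pearson test (2 df): $\chi^2 = \sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i}$, where $E_i$ is the count expected in cell $i$ if genotype and case/control status are independent.

| | Cases | Controls |
|---|---|---|
| MM | O1 | O2 |
| Mm | O3 | O4 |
| mm | O5 | O6 |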
![Page 36: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/36.jpg)
The χ2 test
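| | Cases | Controls |
|---|---|---|
| MM | O1 | O2 |
| Mm | O3 | O4 |
| mm | O5 | O6 |

• Under the null hypothesis, the statistic approximately follows a χ2 distribution (here with 2 degrees of freedom).
• A p-value can be computed directly from it.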
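A minimal sketch of the test on a 3×2 genotype table. The counts are invented, and the p-value uses the closed form of the χ2 upper tail for 2 degrees of freedom, exp(-x/2).

```python
from math import exp

def pearson_3x2(table):
    """Pearson statistic and p-value (2 df) for a 3x2 genotype table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp_count = rows[i] * cols[j] / n  # expected under independence
            stat += (obs - exp_count) ** 2 / exp_count
    # For 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
    return stat, exp(-stat / 2)

# Hypothetical counts; rows are MM, Mm, mm and columns are cases, controls.
counts = [[30, 10], [40, 40], [10, 30]]
stat, p_value = pearson_3x2(counts)
```

With these counts the statistic is 20, which corresponds to a p-value well below the usual thresholds.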
![Page 37: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/37.jpg)
Χ2 distribution properties
A related distribution is the F-distribution
![Page 38: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/38.jpg)
Likelihood ratio
• Another way to check the extremeness of the distribution is by computing a (log) likelihood ratio.
• We have two competing hypotheses. Let N be the total number of observations.
![Page 39: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/39.jpg)
LLR
• An LLR value close to 0 is consistent with the null hypothesis. Asymptotically, the (suitably scaled) LLR statistic also follows the chi-square distribution.
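The slide's formula is not reproduced here, so the sketch below assumes the usual formulation: the null hypothesis fixes cell probabilities by the marginals (independence), the alternative lets each cell have its own probability, and the LLR is the sum of O·ln(O/E) over cells. Twice this value is the quantity usually compared against the chi-square distribution.

```python
from math import log

def log_likelihood_ratio(table):
    """Sum of O * ln(O / E) over cells, with E from the marginals."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    llr = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:  # empty cells contribute nothing
                expected = rows[i] * cols[j] / n
                llr += obs * log(obs / expected)
    return llr

# Hypothetical 3x2 tables (rows MM, Mm, mm; columns cases, controls).
null_like = [[20, 20], [40, 40], [20, 20]]
associated = [[30, 10], [40, 40], [10, 30]]

llr_null = log_likelihood_ratio(null_like)
llr_assoc = log_likelihood_ratio(associated)
g_statistic = 2 * llr_assoc  # scaled form compared against chi-square
```

A table that matches its expected counts exactly gives an LLR of 0, while the associated table gives a clearly positive value.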
![Page 40: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/40.jpg)
Exact test
• The chi-square test does not work so well when the numbers are small.
• How can we compute an exact probability of seeing a specific distribution of values in the cells?
• Remember: we know the marginals (# cases, # controls,
![Page 41: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/41.jpg)
Fisher exact test

| | Cases | Controls |
|---|---|---|
| MM | a | b |
| Mm | c | d |
| mm | e | f |

• Num: number of ways of getting the configuration (a, b, c, d, e, f)
• Den: number of ways of ensuring that the row sums and column sums are fixed
![Page 42: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/42.jpg)
Fisher exact test
• Remember that the probability of seeing any specific values in the cells is going to be small.
• To get a p-value, we must sum over all similarly extreme values. How?
![Page 43: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/43.jpg)
Test for association: Fisher exact test
• Here P is the probability of seeing the exact count.
• The actual significance is computed by summing over all such tables that are at least this extreme.

| | Cases | Controls |
|---|---|---|
| MM | a | b |
| Mm | c | d |
| mm | e | f |
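A sketch of the exact test for a 3×2 table. It assumes the standard table probability (product of row-sum and column-sum factorials, divided by N! times the product of the cell factorials) and takes "at least this extreme" to mean "no more probable than the observed table", which is one common convention. The function names and the example counts are hypothetical.

```python
from math import factorial

def table_probability(table):
    """Probability of one table given fixed row and column sums."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    num = 1
    for m in rows + cols:
        num *= factorial(m)
    den = factorial(sum(rows))
    for row in table:
        for cell in row:
            den *= factorial(cell)
    return num / den

def fisher_exact_3x2(table):
    (a, b), (c, d), (e, f) = table
    r1, r2, r3 = a + b, c + d, e + f
    c1 = a + c + e  # number of cases
    p_obs = table_probability(table)
    p_value = 0.0
    # Enumerate every table with the same marginals.
    for x in range(min(r1, c1) + 1):
        for y in range(min(r2, c1 - x) + 1):
            z = c1 - x - y
            if z > r3:
                continue
            p = table_probability([[x, r1 - x], [y, r2 - y], [z, r3 - z]])
            if p <= p_obs * (1 + 1e-9):  # at least as extreme as observed
                p_value += p
    return p_obs, p_value

# Hypothetical counts; rows are MM, Mm, mm and columns are cases, controls.
p_obs, p_value = fisher_exact_3x2([[3, 0], [1, 1], [0, 3]])
```

The enumeration is cheap for small samples, which is exactly the regime where the chi-square approximation is unreliable.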
![Page 44: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/44.jpg)
Test for association: Fisher exact test
![Page 45: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/45.jpg)
Continuous outcomes
• Instead of discrete (case/control) data, we have real-valued phenotypes.
– Ex: diastolic blood pressure
• In this case, how do we test for association?
![Page 46: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/46.jpg)
Continuous outcome ANOVA
• Often, the phenotype is not given as case/control status but as a continuous variable.
– Ex: blood-pressure measurements
• Question: Are the mean values of the two groups (MM and mm) significantly different?
![Page 47: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/47.jpg)
Two-sided t-test
• For two categories, ANOVA is equivalent to the t-test.
• Assume that the variables from the two sets are drawn from Normal distributions.
– Different means, equal variances
• The null hypothesis is that they are both from the same distribution.
![Page 48: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/48.jpg)
t-test continued
![Page 49: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/49.jpg)
Two-sample t-test
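• As the variance is not known, we use an estimate S, defined by $S^2 = \frac{(n-1)S_1^2 + (m-1)S_2^2}{n+m-2}$, where $S_1^2$ and $S_2^2$ are the sample variances and $n$ and $m$ the sizes of the two samples.
• The T-statistic is given by $T = \frac{\bar{X}_1 - \bar{X}_2}{S\sqrt{\frac{1}{n} + \frac{1}{m}}}$
• Significant deviations from 0 are used to reject the null hypothesis.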
![Page 50: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/50.jpg)
Two-sample t-test (unequal variances)
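• If the variances cannot be assumed to be equal, we use the separate sample variances $S_1^2$ and $S_2^2$.
• The T-statistic is given by $T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{n} + \frac{S_2^2}{m}}}$, where $n$ and $m$ are the sizes of the two samples.
• Significant deviations from 0 are used to reject the null hypothesis.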
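Both versions of the statistic can be computed with the standard library. This is a sketch under the usual definitions (pooled variance for the equal-variance case, separate sample variances otherwise); the blood-pressure values are invented, and turning T into a p-value additionally requires the t-distribution, which is not shown.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample T-statistic assuming equal variances."""
    n, m = len(x), len(y)
    s2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    return (mean(x) - mean(y)) / sqrt(s2 * (1 / n + 1 / m))

def unequal_variance_t(x, y):
    """Two-sample T-statistic with separate sample variances."""
    n, m = len(x), len(y)
    return (mean(x) - mean(y)) / sqrt(variance(x) / n + variance(y) / m)

# Hypothetical diastolic blood pressure readings for two genotype groups.
bp_MM = [80, 82, 78, 84, 76]
bp_mm = [90, 88, 92, 86, 94]

t_pooled = pooled_t(bp_MM, bp_mm)
t_unequal = unequal_variance_t(bp_MM, bp_mm)
```

The two statistics coincide here because the groups have equal size and equal sample variance; in general they differ.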
![Page 51: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/51.jpg)
CONFOUNDING ASSOCIATION
![Page 52: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/52.jpg)
Confounding association
• Association tests can be confounded in many ways.
• We will explore a few of these, at a high level, and point to a few algorithmic problems.
![Page 53: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/53.jpg)
Confounding association with population substructure
If the cases and controls are from different subpopulations, then sites with differing allele frequencies will confound association.
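A small numeric illustration of this effect, using invented counts. In each subpopulation the allele is unrelated to disease status, but the subpopulations differ in both allele frequency and proportion of cases, so pooling them produces a strong spurious signal.

```python
def chi_square(observed):
    """Pearson statistic for a 2x2 table (assumes positive marginals)."""
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    total = sum(rows)
    return sum((observed[i][j] - rows[i] * cols[j] / total) ** 2
               / (rows[i] * cols[j] / total)
               for i in range(2) for j in range(2))

# Rows: allele present / absent. Columns: cases / controls.
# Subpopulation A: allele frequency 80%, mostly cases.
pop_a = [[64, 16], [16, 4]]
# Subpopulation B: allele frequency 20%, mostly controls.
pop_b = [[4, 16], [16, 64]]
# Ignoring the structure and pooling everyone together.
pooled = [[pop_a[i][j] + pop_b[i][j] for j in range(2)] for i in range(2)]

stat_a, stat_b, stat_pooled = (chi_square(t) for t in (pop_a, pop_b, pooled))
```

Within each subpopulation the statistic is 0, while the pooled table gives about 25.9, an association created purely by the substructure.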
![Page 54: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/54.jpg)
The algorithmic problem
• Given a collection of individual genotypes, separate them into sub-populations.
• Idea: take markers that are very far apart so that no LD is possible.
• LD indicates structure.
• Problem: Partition individuals into sub-populations so that all correlation across pairs of distant markers is minimized. Penalty for increasing sub-populations?
![Page 55: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/55.jpg)
Confounding associations with genotypes
A recombination event
Distinct haplotypes can create identical genotypes, confounding association.
![Page 56: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/56.jpg)
Confounding association with interactions
• Individually, the markers do not correlate.
• Together, they perfectly predict genes.
• Find interacting partners that associate with genes.
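A toy illustration of such an interaction, assuming the target is a 0/1 outcome determined by the exclusive-or of two markers. The data, the target, and the helper `correlation` are all hypothetical.

```python
from math import sqrt

def correlation(x, y):
    """Pearson correlation between two equal-length 0/1 vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

marker_a = [0, 0, 1, 1, 0, 0, 1, 1]
marker_b = [0, 1, 0, 1, 0, 1, 0, 1]
# Hypothetical target: present only when exactly one marker carries allele 1.
target = [a ^ b for a, b in zip(marker_a, marker_b)]

single_a = correlation(marker_a, target)
single_b = correlation(marker_b, target)
# Testing the pair jointly recovers the signal.
joint = correlation([a ^ b for a, b in zip(marker_a, marker_b)], target)
```

Each marker alone shows zero correlation with the target, while the pair together predicts it perfectly, so a scan that tests one marker at a time would miss both.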
![Page 57: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/57.jpg)
Confounding association with rare variants
• Not only can we have multiple interacting SNPs, but each SNP may individually occur at very low frequency (< 1%).
• Can you detect associations with rare variants?
![Page 58: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/58.jpg)
Other problems
• Can we reconstruct the phylogeny?
• Useful for computing recombination bounds.
![Page 59: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/59.jpg)
Conclusion
• As individual genomes are sequenced, the association of variations with phenotypes presents many confounding challenges.
• Some of these challenges can be modeled as algorithmic problems.
• Population genetics should be part of a bioinformatics undergraduate curriculum.
![Page 60: Association mapping for mendelian, and complex disorders January 16Bafna, BfB](https://reader036.vdocuments.mx/reader036/viewer/2022070414/5697bffb1a28abf838cc1280/html5/thumbnails/60.jpg)
Thank you
• Homework (due Monday, March 15)
– Describe an algorithm to detect associations of interacting rare variants with a complex disease phenotype, in the presence of population substructure in the case-control population.