1 population genetics basics. 2 terminology review allele locus diploid snp
TRANSCRIPT
1
Population Genetics Basics
2
Terminology review
• Allele
• Locus
• Diploid
• SNP
3
Single Nucleotide Polymorphisms
000001010111000110100101000101010010000000110001111000000101100110
Infinite Sites Assumption: Each site mutates at most once
4
What causes variation in a population?
• Mutations (may lead to SNPs)
• Recombinations
• Other genetic events (gene conversion)
• Structural polymorphisms
5
Recombination
Parent chromosomes: 00000000 and 11111111
Recombinant offspring: 00011111
6
Gene Conversion
• Gene conversion versus crossover
– Hard to distinguish in a population
7
Structural polymorphisms
• Large scale structural changes (deletions/insertions/inversions) may occur in a population.
8
Topic 1: Basic Principles
• In a ‘stable’ population, the distribution of alleles obeys certain laws
– Not really, and the deviations are interesting
• HW equilibrium
– Due to mixing in a population
• Linkage (dis)-equilibrium
– Due to recombination
9
Hardy Weinberg equilibrium
• Consider a locus with 2 alleles, A and a
• p (respectively, q) is the frequency of A (resp. a) in the population, so p + q = 1
• 3 genotypes: AA, Aa, aa
• Q: What is the frequency of each genotype?
If various assumptions are satisfied (such as random mating and no natural selection), then
• $P_{AA} = p^2$
• $P_{Aa} = 2pq$
• $P_{aa} = q^2$
10
Hardy Weinberg: why?
• Assumptions:
– Diploid
– Sexual reproduction
– Random mating
– Bi-allelic sites
– Large population size, …
• Why? Each individual draws its two chromosomes independently at random from the population. Therefore, Pr(Aa) = pq + qp = 2pq, and so on.
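The random-draw argument can be checked by enumerating the ordered pairs of alleles an individual may pick. This is only a minimal sketch for a bi-allelic locus; the function name `genotype_frequencies` is an illustrative choice, not something from the slides.

```python
from itertools import product


def genotype_frequencies(p):
    """Genotype frequencies at a bi-allelic locus under random mating.

    p is the frequency of allele A; q = 1 - p is the frequency of allele a.
    """
    allele_freq = {"A": p, "a": 1.0 - p}
    freqs = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    # Each individual draws two chromosomes independently.
    for first, second in product(allele_freq, repeat=2):
        # The ordered draws (A, a) and (a, A) give the same genotype Aa.
        genotype = "".join(sorted((first, second)))
        freqs[genotype] += allele_freq[first] * allele_freq[second]
    return freqs


if __name__ == "__main__":
    print(genotype_frequencies(0.3))
```

The heterozygote entry accumulates two terms, pq and qp, which is where the factor 2 in 2pq comes from.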
11
Hardy Weinberg: Generalizations
• Multiple alleles with frequencies $p_1, p_2, \ldots, p_H$
– By HW,
$\Pr[\text{homozygous genotype } i] = p_i^2$
$\Pr[\text{heterozygous genotype } i, j] = 2 p_i p_j$
• Multiple loci?
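The multi-allele case can be sketched in a few lines. The function name and the list-of-frequencies input are assumptions made for illustration; the only thing taken from the slide is that homozygotes have the squared allele frequency and heterozygotes have twice the product.

```python
def multiallelic_genotype_frequencies(freqs):
    """HW genotype frequencies for H alleles.

    freqs is a list of allele frequencies that sum to 1.
    Returns a dict keyed by unordered allele-index pairs (i, j) with i <= j.
    """
    result = {}
    for i, f_i in enumerate(freqs):
        for j in range(i, len(freqs)):
            f_j = freqs[j]
            # Homozygote: one way to draw (i, i); heterozygote: two orderings.
            result[(i, j)] = f_i * f_i if i == j else 2 * f_i * f_j
    return result


if __name__ == "__main__":
    genotypes = multiallelic_genotype_frequencies([0.5, 0.3, 0.2])
    print(genotypes)
    print(sum(genotypes.values()))  # genotype frequencies should sum to 1
```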
12
Hardy Weinberg: Implications
• The allele frequency does not change from generation to generation. Why?
• It is observed that 1 in 10,000 Caucasians has the disease phenylketonuria. The disease mutations are all recessive. What fraction of the population carries the mutation?
• Males are 100 times more likely to have the “red” type of color blindness than females. Why?
• Conclusion: While the HW assumptions are rarely satisfied, the principle is still important as a baseline assumption, and significant deviations are interesting.
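The two numerical questions on this slide can be worked through under the HW assumptions. The sketch below assumes a single recessive disease allele with frequency q (so affected individuals have frequency q²) and, for color blindness, an X-linked recessive allele (so males are affected with frequency q and females with q²). The function names are illustrative.

```python
import math


def carrier_fraction(affected_fraction):
    """Fraction of heterozygous carriers for a recessive disease under HW."""
    q = math.sqrt(affected_fraction)  # affected = q^2
    p = 1.0 - q
    return 2 * p * q                  # carriers = 2pq


def x_linked_allele_frequency(male_to_female_ratio):
    """Allele frequency q implied by the male/female ratio q / q^2 = 1 / q."""
    return 1.0 / male_to_female_ratio


if __name__ == "__main__":
    print(carrier_fraction(1 / 10000))     # roughly 0.02, i.e. about 1 in 50
    print(x_linked_allele_frequency(100))  # q = 0.01
```

Under these assumptions about 2% of the population carries the phenylketonuria mutation even though only 0.01% is affected.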
14
What if there were no recombinations?
• Life would be simpler
• Each individual sequence would have a single parent (even for higher ploidy)
• The relationship is expressed as a tree.
15
The Infinite Sites Assumption
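[Figure: a tree of 8-site sequences. Root: 0 0 0 0 0 0 0 0. A mutation at site 3 gives 0 0 1 0 0 0 0 0, which has two descendants: 0 0 1 0 0 0 0 1 (mutation at site 8) and 0 0 1 0 1 0 0 0 (mutation at site 5).]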
• The different sites are linked. A 1 in position 8 implies 0 in position 5, and vice versa.
• Some phenotypes could be linked to the polymorphisms
• Some of the linkage is “destroyed” by recombination
16
Infinite sites assumption and Perfect Phylogeny
• Each site is mutated at most once in the history.
• All descendants must carry the mutated value, and all others must carry the ancestral value
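[Figure: a tree with mutation i on one edge. Sequences below that edge have 1 in position i; all other sequences have 0 in position i.]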
17
Perfect Phylogeny
• Assume an evolutionary model in which no recombination takes place, only mutation.
• The evolutionary history is explained by a tree in which every mutation is on an edge of the tree. All the species in one sub-tree contain a 0, and all species in the other contain a 1. Such a tree is called a perfect phylogeny.
18
The 4-gamete condition
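• A column i partitions the set of species into two sets, $i_0$ and $i_1$ (the species with 0 and with 1 in column i, respectively)
• A column is homogeneous w.r.t. a set of species if it has the same value for all species in the set. Otherwise, it is heterogeneous.
• EX: i is heterogeneous w.r.t. {A, D, E}

| Species | i |
|---|---|
| A | 0 |
| B | 0 |
| C | 0 |
| D | 1 |
| E | 1 |
| F | 1 |

$i_0$ = {A, B, C}, $i_1$ = {D, E, F}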
19
4 Gamete Condition
• 4 Gamete Condition
– There exists a perfect phylogeny if and only if for all pairs of columns (i, j), j is homogeneous w.r.t. $i_0$ or w.r.t. $i_1$.
– Equivalently, there exists a perfect phylogeny if and only if for all pairs of columns (i, j), the following 4 rows do not all occur: (0,0), (0,1), (1,0), (1,1)
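The pairwise test stated above is easy to express in code. The following is a minimal sketch, assuming the data is a list of rows (one per species) of 0/1 values; the function names are illustrative.

```python
from itertools import combinations


def violates_four_gametes(matrix, i, j):
    """True if columns i and j together show all of (0,0), (0,1), (1,0), (1,1)."""
    observed = {(row[i], row[j]) for row in matrix}
    return len(observed) == 4


def satisfies_four_gamete_condition(matrix):
    """True if no pair of columns shows all four gametes."""
    n_cols = len(matrix[0]) if matrix else 0
    return not any(
        violates_four_gametes(matrix, i, j)
        for i, j in combinations(range(n_cols), 2)
    )


if __name__ == "__main__":
    compatible = [[1, 1, 0], [0, 0, 1], [1, 0, 0]]
    conflicting = [[0, 0], [0, 1], [1, 0], [1, 1]]
    print(satisfies_four_gamete_condition(compatible))
    print(satisfies_four_gamete_condition(conflicting))
```

Checking every pair of columns costs time proportional to the number of rows times the square of the number of columns, which is fine for small examples.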
20
4-gamete condition: proof
• Depending on which edge the mutation j occurs on, column j must be homogeneous w.r.t. either $i_0$ or $i_1$.
• (only if) Every perfect phylogeny satisfies the 4-gamete condition
• (if) If the 4-gamete condition is satisfied, does a perfect phylogeny exist?
[Figure: the edge carrying mutation i splits the tree into the subtrees $i_0$ and $i_1$.]
21
An algorithm for constructing a perfect phylogeny
• We first consider the case where 0 is the ancestral state and 1 is the mutated state. This restriction is removed later.
• In any tree, each node (except the root) has a single parent.
– It is sufficient to construct a parent for every node.
• In each step, we add a column and refine some of the nodes containing multiple children.
• Stop if all columns have been considered.
22
Inclusion Property
• For any pair of columns i, j
– i < j if and only if $i_1 \supseteq j_1$
• Note that if i < j, then the edge containing i is an ancestor of the edge containing j
[Figure: a tree in which the edge labelled i lies above the edge labelled j.]
23
Example
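| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| A | 1 | 1 | 0 | 0 | 0 |
| B | 0 | 0 | 1 | 0 | 0 |
| C | 1 | 1 | 0 | 1 | 0 |
| D | 0 | 0 | 1 | 0 | 1 |
| E | 1 | 0 | 0 | 0 | 0 |

[Figure: root r with leaves A, B, C, D, E as its children.]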
Initially, there is a single clade r, and each node has r as its parent
24
Sort columns
• Sort columns according to the inclusion property (note that the columns are already sorted here).
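• This can be achieved by considering the columns as binary representations of numbers (most significant bit in row 1) and sorting in decreasing order.

| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| A | 1 | 1 | 0 | 0 | 0 |
| B | 0 | 0 | 1 | 0 | 0 |
| C | 1 | 1 | 0 | 1 | 0 |
| D | 0 | 0 | 1 | 0 | 1 |
| E | 1 | 0 | 0 | 0 | 0 |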
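The sorting step can be sketched as follows, reading each column top to bottom as a binary number. The function name `sort_columns` and the row-list representation are assumptions for illustration.

```python
def sort_columns(matrix):
    """Return column indices (0-based) sorted by decreasing binary value.

    Each column is read top to bottom as a binary number, with row 1
    as the most significant bit.
    """
    def column_value(c):
        value = 0
        for row in matrix:
            value = value * 2 + row[c]
        return value

    return sorted(range(len(matrix[0])), key=column_value, reverse=True)


if __name__ == "__main__":
    example = [
        [1, 1, 0, 0, 0],  # A
        [0, 0, 1, 0, 0],  # B
        [1, 1, 0, 1, 0],  # C
        [0, 0, 1, 0, 1],  # D
        [1, 0, 0, 0, 0],  # E
    ]
    # Column values are 21, 20, 10, 4, 2, so the order is unchanged.
    print(sort_columns(example))
```

If the 1-set of one column contains the 1-set of another, the containing column has the larger binary value, so it is placed first.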
25
Add first column
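• In adding column i
– Check each edge and decide which side of it each species belongs on.
– Finally, add a node if you can resolve a clade

| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| A | 1 | 1 | 0 | 0 | 0 |
| B | 0 | 0 | 1 | 0 | 0 |
| C | 1 | 1 | 0 | 1 | 0 |
| D | 0 | 0 | 1 | 0 | 1 |
| E | 1 | 0 | 0 | 0 | 0 |

[Figure: the tree after adding column 1. A new node u is added below r, separating the species with a 1 in column 1 (A, C, E) from the rest (B, D).]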
26
Adding other columns
• Add other columns on edges using the ordering property
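| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| A | 1 | 1 | 0 | 0 | 0 |
| B | 0 | 0 | 1 | 0 | 0 |
| C | 1 | 1 | 0 | 1 | 0 |
| D | 0 | 0 | 1 | 0 | 1 |
| E | 1 | 0 | 0 | 0 | 0 |

[Figure: the final tree rooted at r with leaves A, B, C, D, E and edges labelled with columns 1 to 5. Edges 1 and 3 leave r; edge 2 lies below edge 1 and edge 4 below edge 2; edge 5 lies below edge 3.]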
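The column-by-column construction can be sketched as below for the rooted case (0 ancestral, 1 mutated). This is one possible way to implement the idea, not necessarily the exact procedure from the lecture: each species is walked down from the root through its mutated columns in sorted order, and the construction fails if some column would need two different parent edges. The names `build_perfect_phylogeny` and `parent` are illustrative.

```python
def build_perfect_phylogeny(matrix, names):
    """Return {node: parent} for a rooted perfect phylogeny, or None.

    "r" is the root. Internal nodes are labelled by the 1-based column
    whose mutation lies on the edge above them. Leaves are the row names.
    """
    n_cols = len(matrix[0])
    # Comparing columns as bit lists is the same as comparing them as
    # binary numbers with row 1 as the most significant bit.
    order = sorted(
        range(n_cols),
        key=lambda c: [row[c] for row in matrix],
        reverse=True,
    )
    parent = {}
    for name, row in zip(names, matrix):
        node = "r"
        for c in order:
            if row[c] == 1:
                child = c + 1
                # A mutation may sit on only one edge of the tree.
                if parent.setdefault(child, node) != node:
                    return None
                node = child
        parent[name] = node
    return parent


if __name__ == "__main__":
    example = [
        [1, 1, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [1, 1, 0, 1, 0],
        [0, 0, 1, 0, 1],
        [1, 0, 0, 0, 0],
    ]
    print(build_perfect_phylogeny(example, ["A", "B", "C", "D", "E"]))
```

Storing only a parent pointer per node follows the observation that it is sufficient to construct a parent for every node.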
27
Unrooted case
• Switch the values in each column, so that 0 is the majority element.
• Apply the algorithm for the rooted case
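The column-switching step can be sketched like this. It is a minimal illustration, assuming rows of 0/1 values; the function name is hypothetical, and ties (equal numbers of 0s and 1s) are left unchanged as an arbitrary choice.

```python
def make_zero_majority(matrix):
    """Return a copy of the matrix in which 0 is the majority value per column."""
    n_rows = len(matrix)
    flipped = [row[:] for row in matrix]
    for c in range(len(matrix[0])):
        ones = sum(row[c] for row in matrix)
        if 2 * ones > n_rows:  # 1 is the strict majority: swap 0 and 1
            for row in flipped:
                row[c] = 1 - row[c]
    return flipped


if __name__ == "__main__":
    print(make_zero_majority([[1, 0], [1, 0], [0, 1]]))
```

The result can then be passed to any routine written for the rooted case.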
28
Handling recombination
• A tree is not sufficient as a sequence may have 2 parents
• Recombination leads to loss of correlation between columns
29
Linkage (Dis)-equilibrium (LD)
• Consider sites A & B
• Case 1: No recombination
– Pr[A,B = 0,1] = 0.25
– Linkage disequilibrium
• Case 2: Extensive recombination
– Pr[A,B = 0,1] = 0.125
– Linkage equilibrium

| A | B |
|---|---|
| 0 | 1 |
| 0 | 1 |
| 0 | 0 |
| 0 | 0 |
| 1 | 0 |
| 1 | 0 |
| 1 | 0 |
| 1 | 0 |
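The two probabilities on this slide can be reproduced from the eight haplotypes in the table. The sketch below compares the observed joint frequency with the product of the marginal frequencies, which is the value expected when the sites are independent; the function name is illustrative.

```python
def joint_and_expected(haplotypes, a_value, b_value):
    """Observed Pr[A=a, B=b] and the value expected under independence."""
    n = len(haplotypes)
    p_a = sum(1 for a, _ in haplotypes if a == a_value) / n
    p_b = sum(1 for _, b in haplotypes if b == b_value) / n
    p_ab = sum(1 for a, b in haplotypes if (a, b) == (a_value, b_value)) / n
    return p_ab, p_a * p_b


if __name__ == "__main__":
    haplotypes = [(0, 1), (0, 1), (0, 0), (0, 0),
                  (1, 0), (1, 0), (1, 0), (1, 0)]
    observed, expected = joint_and_expected(haplotypes, 0, 1)
    print(observed, expected)   # 0.25 0.125
    print(observed - expected)  # a nonzero difference indicates LD
```

Here Pr[A=0] = 0.5 and Pr[B=1] = 0.25, so independence would give 0.125, while the observed joint frequency is 0.25.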
31
Recombination, and populations
• Think of a population of N individual chromosomes.
• The population remains stable from generation to generation.
• Without recombination, each individual has exactly one parent chromosome from the previous generation.
• With recombinations, each individual is derived from one or two parents.
• We will formalize this notion later in the context of coalescent theory.