Genetic diversity, population structure and environmental adaptation
2019.5.29 Sequence Statistics and Mathematical Biology (生物配列統計学)
Why important?
• Genetic diversity
  • A fundamental source of biodiversity
  • Correlates with fitness of individuals and populations
  • Determinants of the variation are poorly understood
• Population structure
  • Environmental adaptation of species
  • Evolution of species
  • Precise estimation needed
Alleles at a locus: loci and alleles
A random mating population (male, female); locus (plural: loci)
DNA markers: microsatellites (CA repeats) and single nucleotide polymorphisms (SNPs)
Microsatellite allele sizes defined by electrophoresis (capillary electrophoresis)
[Figure: allele size (bp), from large to small, for individuals 1, 2 and 3, with genotypes (CA)8/(CA)5, (CA)3/(CA)7 and (CA)6/(CA)6. A heterozygote carries two different alleles; a homozygote carries two identical alleles.]
Microsatellite genotypes, allele frequencies and genetic diversity (Pacific herring)
Individual, followed by genotypes at five loci (each six-digit genotype joins two three-digit allele sizes)
6AK1, 113151 122132 157162 113120 216236
6AK2, 117147 114132 138157 113120 214220
6AK3, 117117 122130 157162 113134 224230
6AK4, 151159 122122 151157 113113 216230
6AK5, 112117 107122 157166 113120 202220
6AK6, 112131 120136 162164 113120 196226
6AK7, 141167 120128 157168 120124 198206
6AK8, 117125 120122 157162 113113 204230
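The eight genotype rows listed above are enough to illustrate how allele frequencies and heterozygosity are obtained from such a file. The following is a minimal sketch, assuming each six-digit genotype joins two three-digit allele sizes; the function names `parse` and `locus_summary` are hypothetical, and the gene diversity is computed as one minus the sum of squared allele frequencies.

```python
from collections import Counter

# Rows copied from the slide: individual ID, then genotypes at five loci.
rows = """\
6AK1, 113151 122132 157162 113120 216236
6AK2, 117147 114132 138157 113120 214220
6AK3, 117117 122130 157162 113134 224230
6AK4, 151159 122122 151157 113113 216230
6AK5, 112117 107122 157166 113120 202220
6AK6, 112131 120136 162164 113120 196226
6AK7, 141167 120128 157168 120124 198206
6AK8, 117125 120122 157162 113113 204230
"""

def parse(text):
    """Return {individual: [(allele1, allele2), ...]} from six-digit genotypes."""
    data = {}
    for line in text.strip().splitlines():
        name, genos = line.split(",")
        data[name.strip()] = [(g[:3], g[3:]) for g in genos.split()]
    return data

def locus_summary(data, locus):
    """Allele frequencies, observed heterozygosity and 1 - sum(p^2) at one locus."""
    genotypes = [g[locus] for g in data.values()]
    counts = Counter(a for g in genotypes for a in g)
    total = sum(counts.values())
    freqs = {a: c / total for a, c in counts.items()}
    h_obs = sum(a != b for a, b in genotypes) / len(genotypes)
    h_exp = 1 - sum(p * p for p in freqs.values())
    return freqs, h_obs, h_exp

data = parse(rows)
for locus in range(5):
    freqs, h_obs, h_exp = locus_summary(data, locus)
    print(f"locus {locus + 1}: Ho={h_obs:.3f}, He={h_exp:.3f}, alleles={len(freqs)}")
```

With only eight individuals the estimates are very noisy; the sketch is meant to show the bookkeeping, not to reproduce the published values.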
[Figure: frequencies (0.0-0.4) of microsatellite alleles at Cha123 by sampling year, with panels labelled Before and After. OB (Obuchi-numa, 尾駮沼): 2005, 2006, 2007, 2014. MY (Miyako Bay, 宮古湾): 2005, 2007, 2008, 2013.]
Data: Kitada, Yoshikai, Fujita, Hamasaki, Nakamichi, Kishino (2017). Conserv. Genet. 18, 423-437.
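[Figure: observed heterozygosity (0.0-0.8) by sample (site and two-digit year): Matsushima Bay; Miyako Bay 13, 08, 07, 05 and released 05; Obuchi-numa 14, 07, 06, 05; Funka Bay; Yudo-numa 07, 03; Lake Akkeshi 13, 06, 03 and released 03; Lake Notoro; Lake Saroma; Ishikari Bay.]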
23 samples (4,617 fish) were collected from nine spawning grounds in the spawning season during 2003-2014.
Heterozygosity (one minus the expected homozygote frequency): $h = 1 - \sum_i p_i^2$
281 SNP genotypes: Atlantic herring. Limborg et al. (2012). Molec. Ecol. 21, 3686-3703.
Alleles 001 and 002; a genotype such as 001002 is a heterozygote.
Bothnian_Bay, 002002 001002 001001 001002 001002 002002 001001 002002 001002 002002002002 001002 002002 001002 002002 001002 001001 002002 002002 001002002002 001002 001002 001002 002002 001002 002002 001002 002002 002002002002 001001 001002 001001 001002 002002 001002 002002 001001 001002001002 001001 000000 001001 001001 002002 002002 001001 002002 001002001002 001002 000000 001001 002002 001002 001002 001002 001001 001002001002 002002 002002 001002 001002 001002 002002 001001 001001 001001001002 002002 001002 002002 002002 001002 001002 002002 001002 001002002002 002002 002002 001001 001002 002002 001001 001002 001001 002002002002 001002 001002 001001 002002 002002 001001 001002 002002 002002001001 002002 001001 001002 002002 002002 002002 001001 002002 001002002002 002002 002002 002002 002002 001002 001002 002002 002002 002002001002 001002 001002 002002 002002 001001 001001 002002 002002 001001002002 001001 002002 001002 001002 002002 001001 001002 002002 002002002002 002002 001002 001001 002002 002002 001002 001002 001001 002002001002 002002 001001 002002 001002 002002 002002 001002 001001 002002002002 002002 002002 002002 001002 002002 002002 001001 001001 001002001001 002002 002002 001001 001002 001002 001002 002002 002002 002002001002 002002 002002 001002 002002 002002 002002 001002 001001 002002001002 001002 001001 001002 002002 001001 001002 002002 001001 002002001002 002002 002002 001001 002002 001002 002002 001002 002002 002002002002 001001 001002 001002 001001 001001 002002 002002 002002 002002001002 001002 002002 001001 002002 002002 001002 001002 001001 001001002002 001002 001002 001002 001002 001002 002002 002002 001001 001002002002 002002 001001 002002 002002 002002 002002 001002 001002 001001002002 001001 001002 001001 001002 001002 001001 001001 002002 002002001002 002002 001001 002002 001002
Bothnian_Bay, 001002 002002 001001 002002 001001 001002 002002 002002 001002 002002002002 001002 002002 002002 002002 002002 001002 002002 001002 001001
Individual, SNP
Spring-spawning herring (n=607) were sampled in the spawning season during 1999-2009.
SNP allele frequencies and genetic diversity (Atlantic herring)
[Figure: allele frequency (0.0-0.8) and heterozygosity (0.0-0.8) by population (BB, CNS, CLS, EC, GF, GD, HB, ICE, IRS, KAT, LIM, SHL, NOR, GR, RF, RUG, SKA, WIR) at four SNPs: HSP 70, Cha_1025.1-149, Cha_15984.1-275 (Hba) and Cha_1513.1-91 (cat).]
Environmental gradients in the North Atlantic and the Baltic Sea
[Figure: mean annual SSS (5-35) against mean annual SST (5-12) for the 18 sampling sites (KAT, LIM, RF, RUG, SKA, BB, GF, GD, HB, GR, CNS, SHL, CLS, EC, IRS, WIR, ICE, NOR).]
http://ocean.dmi.dk/models/hbm.uk.php
Seto Inland Sea: mean depth 38 m, max depth 105 m
Data: Limborg et al. (2012). Molec. Ecol. 21, 3686-3703.
Atlantic herring population structure in the NE Atlantic/Baltic
[Figure panels: 265 neutral SNPs; all 281 SNPs]
Human population structure
52 populations (377 microsatellite loci, n=1,056)
Rosenberg, N. A., Pritchard, J. K., Weber, J. L., Cann, H. M., Kidd, K. K., Zhivotovsky, L. A., & Feldman, M. W. (2002). Genetic structure of human populations. Science, 298(5602), 2381-2385.
STRUCTURE: Bayesian clustering
Pritchard, Stephens, Donnelly (2000) Genetics 155, 945-959.
Data: multi-locus genotypes $X$
Assumption: HWE in putative original populations
Goal: sample from the joint posterior distribution $\Pr(P, Q, Z, \alpha, \lambda \mid X, K)$ by MCMC; the most likely $K$ is judged from $\Pr(X \mid K)$
Priors (a mixed population; individual $i$; clusters $1, \ldots, K$)
$\Pr(z_i = k) = 1/K$
$p_k \sim \mathrm{Dir}(\lambda_1, \ldots, \lambda_m)$, $\lambda_1 = \lambda_2 = \cdots = \lambda_m = 1$
$q^{(i)} = (q_1^{(i)}, \ldots, q_K^{(i)})$, $q^{(i)} \sim \mathrm{Dir}(\alpha, \ldots, \alpha)$, $\alpha \in [0, 10]$
MCMC algorithm ($s = 1, 2, \ldots$)
Step 1. Sample $P^{(s)}, Q^{(s)}$ from $\Pr(P, Q \mid X, Z^{(s-1)})$.
Step 2. Sample $Z^{(s)}$ from $\Pr(Z \mid X, P^{(s)}, Q^{(s)})$.
Step 3. Update $\alpha$ using a Metropolis-Hastings step.
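The alternation between sampling allele frequencies and sampling cluster labels can be sketched for the simplest case. This is not the STRUCTURE implementation: it assumes the no-admixture model (so there is no $Q$ and no update of α), biallelic loci, and a Beta(1, 1) prior, which is the two-allele case of Dir(1, ..., 1). The function name `gibbs_no_admixture` and the simulated data are hypothetical.

```python
import numpy as np

def gibbs_no_admixture(X, K, n_iter=200, seed=0):
    """Gibbs sampler sketch. X: (n, L) counts of allele 1 (0, 1 or 2)."""
    rng = np.random.default_rng(seed)
    n, L = X.shape
    Z = rng.integers(K, size=n)              # random initial cluster labels
    P = np.full((K, L), 0.5)
    for _ in range(n_iter):
        # Step 1: sample P | X, Z from Beta(1 + count of allele 1, 1 + count of allele 2).
        for k in range(K):
            members = X[Z == k]
            a1 = members.sum(axis=0)
            a2 = 2 * len(members) - a1
            P[k] = rng.beta(1 + a1, 1 + a2)
        P = np.clip(P, 1e-9, 1 - 1e-9)       # guard the logarithms below
        # Step 2: sample Z | X, P. The prior Pr(z_i = k) = 1/K cancels, and under HWE
        # the genotype likelihood is proportional to p^x (1 - p)^(2 - x).
        loglik = X @ np.log(P).T + (2 - X) @ np.log1p(-P).T
        prob = np.exp(loglik - loglik.max(axis=1, keepdims=True))
        prob /= prob.sum(axis=1, keepdims=True)
        Z = np.array([rng.choice(K, p=row) for row in prob])
    return Z, P

# Hypothetical test data: two strongly differentiated populations, 20 loci.
rng = np.random.default_rng(1)
true_p = np.array([[0.9] * 20, [0.1] * 20])
truth = np.repeat([0, 1], 20)
X = rng.binomial(2, true_p[truth])
Z, P = gibbs_no_admixture(X, K=2)
# Cluster labels are arbitrary, so agreement is checked up to label switching.
agreement = max(np.mean(Z == truth), np.mean(Z != truth))
print("agreement with simulated origin:", agreement)
```

Real herring data are far less differentiated than this toy example, which is one reason long burn-in and many iterations are used in practice.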
Hardy-Weinberg equilibrium (HWE)
A random mating population
Genotype frequencies (female gametes in rows, male gametes in columns):
              A1 (p)        A2 (q)
A1 (p)        A1A1 (p×p)    A1A2 (p×q)
A2 (q)        A1A2 (p×q)    A2A2 (q×q)
$p^2 + 2pq + q^2 = (p + q)^2 = 1$
Allele and genotype frequencies are expected to be stable in a HWE population without
• Natural selection
• Mutation
• Non-random mating
• Migration
• Genetic drift
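The stability claim can be checked numerically: genotype frequencies built from p and q sum to one, and the gametes they produce carry A1 at the same frequency p. A minimal sketch with hypothetical function names:

```python
def hwe_genotype_frequencies(p):
    """Expected genotype frequencies at a biallelic locus under HWE."""
    q = 1.0 - p
    return {"A1A1": p * p, "A1A2": 2 * p * q, "A2A2": q * q}

def next_generation_p(freqs):
    """Frequency of A1 among gametes produced by the given genotype frequencies."""
    return freqs["A1A1"] + 0.5 * freqs["A1A2"]

freqs = hwe_genotype_frequencies(0.3)
print(freqs)
print("sum:", sum(freqs.values()), "p in next generation:", next_generation_p(freqs))
```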
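STRUCTURE F-model: linked loci and correlated allele frequencies
Falush, Stephens, Pritchard (2003) Genetics 164, 1567-1587.
[Figure: a putative ancestral population with allele frequencies $p_A$, from which the clusters diverge with $F_{ST,1}$, $F_{ST,2}$, $F_{ST,6}$.]
$p_A \sim \mathrm{Dir}(\lambda_1, \ldots, \lambda_m)$, $\lambda_1 = \lambda_2 = \cdots = \lambda_m = 1$
$p_k \sim \mathrm{Dir}\left(\frac{1 - F_{ST,k}}{F_{ST,k}} p_{A1}, \frac{1 - F_{ST,k}}{F_{ST,k}} p_{A2}, \ldots, \frac{1 - F_{ST,k}}{F_{ST,k}} p_{Am}\right)$
$\theta = \frac{1 - F_{ST}}{F_{ST}}$, equivalently $F_{ST} = \frac{1}{\theta + 1}$
$q^{(i)} = (q_1^{(i)}, \ldots, q_K^{(i)})$, $q^{(i)} \sim \mathrm{Dir}(\alpha, \ldots, \alpha)$
Data: multi-locus genotypes
Assumption: HWE in putative original populations
Goal: sample from the joint posterior distribution $\Pr(P, Q, Z, F_{ST}, \alpha, \lambda \mid X, K)$ by MCMC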
Wright's island model and FST (fixation index)
Sewall Green Wright, 1889.12.16 - 1988.3.3
Wright, S., 1931 Evolution in mendelian populations. Genetics 16: 97-158.
Wright, S., 1951 The genetical structure of populations. Ann. Eugen. 15: 323-354.
Allele frequencies of populations at a biallelic locus (a beta distribution):
$f(p) = \frac{\Gamma(4Nm)}{\Gamma(4Nm\bar{p})\,\Gamma(4Nm(1 - \bar{p}))}\, p^{4Nm\bar{p} - 1} (1 - p)^{4Nm(1 - \bar{p}) - 1}$
Multiple alleles (a Dirichlet distribution)
$F_{ST} = \frac{1}{4Nm + 1} = \frac{1}{\theta + 1}$, $\theta = 4Nm$
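The link between the beta distribution and FST can be checked by simulation: a beta distribution with parameters $4Nm\bar{p}$ and $4Nm(1-\bar{p})$ has variance $\bar{p}(1-\bar{p})/(4Nm+1)$, so the standardized variance of subpopulation allele frequencies equals $1/(4Nm+1)$. A minimal sketch; the values of `Nm` and `p_bar` are hypothetical.

```python
import numpy as np

def fst_island(Nm):
    """F_ST = 1 / (4Nm + 1) under Wright's island model."""
    return 1.0 / (4.0 * Nm + 1.0)

def simulate_subpop_freqs(Nm, p_bar, n_pops, seed=0):
    """Draw subpopulation allele frequencies from Beta(4Nm*p_bar, 4Nm*(1 - p_bar))."""
    theta = 4.0 * Nm
    rng = np.random.default_rng(seed)
    return rng.beta(theta * p_bar, theta * (1.0 - p_bar), size=n_pops)

Nm, p_bar = 5.0, 0.3                      # hypothetical values
p = simulate_subpop_freqs(Nm, p_bar, n_pops=200_000)
fst_from_variance = p.var() / (p_bar * (1.0 - p_bar))
print("theory:", fst_island(Nm), "simulated:", fst_from_variance)
```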
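Conjugate prior for the allele frequency
Prior for allele frequencies (distribution of $p$): Dirichlet (beta)
$\pi(p \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{m} \alpha_i\right)}{\prod_{i=1}^{m} \Gamma(\alpha_i)} \prod_{i=1}^{m} p_i^{\alpha_i - 1}$
$F_{ST}^{\mathrm{global}} = \frac{1}{\sum_{i=1}^{m} \alpha_i + 1}$
Likelihood of allele counts: multinomial (binomial)
$L(p \mid x) = \frac{n!}{x_1! \cdots x_m!}\, p_1^{x_1} \cdots p_m^{x_m}$
[Figure: histogram of rbeta(5000, 1, 1); frequency against values from 0.0 to 1.0.] $p_k \sim \mathrm{beta}(1, 1)$
Priors in STRUCTURE: $p_A \sim \mathrm{Dir}(\lambda_1, \ldots, \lambda_m)$ and $p_k \sim \mathrm{Dir}(\lambda_1, \ldots, \lambda_m)$, with $\lambda_1 = \lambda_2 = \cdots = \lambda_m = 1$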
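Conjugacy means that a Dirichlet prior combined with multinomial allele counts gives a Dirichlet posterior whose parameters are simply the prior parameters plus the counts. A minimal sketch with hypothetical allele counts and a Dir(1, 1, 1) prior:

```python
import numpy as np

def posterior_dirichlet(alpha, counts):
    """Dirichlet prior + multinomial counts -> Dirichlet posterior parameters."""
    return np.asarray(alpha, dtype=float) + np.asarray(counts, dtype=float)

alpha = np.ones(3)                  # Dir(1, 1, 1) prior
counts = np.array([12, 5, 3])       # hypothetical allele counts x_1, ..., x_m
post = posterior_dirichlet(alpha, counts)
post_mean = post / post.sum()

# Monte Carlo check: draws from the posterior average to the analytic mean.
rng = np.random.default_rng(0)
draws = rng.dirichlet(post, size=10_000)
print("posterior parameters:", post)
print("analytic mean:", post_mean, "simulated mean:", draws.mean(axis=0))
```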
Atlantic herring population structure
STRUCTURE with LOCPRIOR, given information on the sampling sites (1,…,18)
Data: neutral marker set (265 SNPs), 18 sampling sites
Limborg et al. (2012). Molec. Ecol. 21, 3686-3703.
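STRUCTURE: LOCPRIOR for high gene flow species
Hubisz, Falush, Stephens, Pritchard (2009) Molecular Ecology Resources 9, 1322-1332.
The location information is given as an integer ($l = 1, 2, \ldots, S$).
$\Pr(z_i = k \mid l_i) = \gamma_{l_i k}$
$\gamma_{l \cdot} \sim \mathrm{Dir}(r\eta_1, r\eta_2, \ldots, r\eta_K)$
$r \sim \mathrm{unif}(0, r_{MAX})$, $\eta \sim \mathrm{Dir}(1, \ldots, 1)$
Informativeness of location information: a small $r$ estimate ($r < 1.0$) indicates that the location information is informative.
With admixture (individual $i$ sampled at location $l_i$; clusters $1, \ldots, k, \ldots, K$):
$q_i \sim \mathrm{Dir}(\alpha_{l1}, \ldots, \alpha_{lK})$
$\alpha_k^{(g)} \sim \mathrm{unif}(0, \alpha_{MAX} = 10)$: global value of $\alpha$ for cluster $k$
$\alpha_{lk} \sim \mathrm{gamma}(\alpha_k^{(g)} r, 1/r)$ for sampling location $l$ ($l = 1, \ldots, S$)
Goal: sample from the joint posterior distribution $\Pr(P, Q, Z, F_{ST}, r, \alpha \mid X, K, l = 1, \ldots, S)$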
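Why a small r means informative locations can be seen by drawing location-specific cluster probabilities for a small and a large r. This is a sketch under the assumption that the prior has the form $\gamma_l \sim \mathrm{Dir}(r\eta_1, \ldots, r\eta_K)$; here η is fixed at equal proportions for illustration rather than sampled, and the function name is hypothetical.

```python
import numpy as np

def sample_location_priors(r, eta, n_locations, seed=0):
    """Draw gamma_l ~ Dir(r * eta_1, ..., r * eta_K) for each sampling location."""
    rng = np.random.default_rng(seed)
    return rng.dirichlet(r * np.asarray(eta, dtype=float), size=n_locations)

eta = np.array([1 / 3, 1 / 3, 1 / 3])     # assumed overall cluster proportions
spread = {}
for r in (0.5, 50.0):
    gamma = sample_location_priors(r, eta, n_locations=2000)
    # Variance across locations: large when locations differ in cluster membership.
    spread[r] = gamma.var(axis=0).mean()
    print(f"r={r}: mean variance of gamma across locations = {spread[r]:.4f}")
```

For small r the draws differ strongly between locations, so knowing the location says a lot about cluster membership; for large r every location has nearly the same proportions η.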
MCMC: STRUCTURE_LOCPRIOR (Atlantic herring, K=3); burn-in 100,000; MCMC 500,000
[Figure: locus-FST (EBFST; global FST, 0.0-0.5) for each SNP (0-250). Labelled loci: Cha_1025.1-149, Cha_15389.3-101 (Hsp70), Cha_15984.1-275 (Hba), Cha_2884.1-367, 1513.1.91 (Cat).]
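Detecting loci influenced by selection
BAYESCAN (2008): Foll & Gaggiotti (2008) Genetics 180, 977-993.
Estimates the probability that a locus is influenced by selection
[Figure: an ancestral population with allele frequencies $\mathbf{p}_i$, from which populations $1, 2, \ldots, J$ diverge with $F_{ST,1}$, $F_{ST,2}$, ..., $F_{ST,J}$ and allele frequencies $\mathbf{p}_{ij}$.]
$\mathbf{p}_{ij} \sim \mathrm{Dir}(\theta_{ij} p_{i1}, \ldots, \theta_{ij} p_{iK})$, $\theta_{ij} = \frac{1}{F_{ST}^{ij}} - 1$ at locus $i$, population $j$ ($K$: number of alleles)
$\mathbf{p}_i \sim \mathrm{Dir}(1, \ldots, 1)$
$\log \frac{F_{ST}^{ij}}{1 - F_{ST}^{ij}} = \log \frac{1}{\theta_{ij}} = \alpha_i + \beta_j$
Likelihood of the allele counts: $L(a_{ij1}, \ldots, a_{ijK} \mid \theta_{ij}, p_{i1}, \ldots, p_{iK})$
[Equation: the posterior is built from the likelihood $L$ and the prior $\pi(\mathbf{p}_{ij} \mid \mathbf{p}_i, \theta_{ij})$.]
MCMC: the locus effect $\alpha_i$ is added to the model with probability $\min(1, A)$, or deleted with probability $\min(1, 1/A)$, where
$A = \frac{\pi(\mathbf{p}_{ij} \mid \mathbf{p}_i, \boldsymbol{\alpha}, \boldsymbol{\beta})\, \pi(\alpha_i)}{\pi(\mathbf{p}_{ij} \mid \mathbf{p}_i, \boldsymbol{\alpha} \text{ with } \alpha_i = 0, \boldsymbol{\beta})\, q(\alpha_i)}$
$P(\alpha_i \neq 0)$: estimated from the number of times $\alpha_i$ is included in the model
PO (posterior odds) $= \frac{P(\alpha_i \neq 0)}{1 - P(\alpha_i \neq 0)}$
Among 281 loci, 16 outliers were found (265 were neutral).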
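Two pieces of the slide lend themselves to a short numerical illustration: the logistic decomposition of FST into a locus effect and a population effect, and the posterior odds computed from how often the locus effect is included during MCMC. A minimal sketch with hypothetical function names and made-up inputs; it does not reproduce the BayeScan sampler itself.

```python
import math

def fst_from_effects(alpha_i, beta_j):
    """Invert log(F / (1 - F)) = alpha_i + beta_j."""
    return 1.0 / (1.0 + math.exp(-(alpha_i + beta_j)))

def theta_from_fst(fst):
    """theta = 1 / F_ST - 1."""
    return 1.0 / fst - 1.0

def posterior_odds(included):
    """included: 0/1 flags, one per MCMC iteration, marking alpha_i != 0.

    Undefined (division by zero) if alpha_i was included in every iteration.
    """
    flags = list(included)
    p = sum(flags) / len(flags)
    return p / (1.0 - p)

beta_j = math.log(0.05 / 0.95)            # population effect giving F_ST = 0.05
print("neutral locus:", fst_from_effects(0.0, beta_j))
print("locus with alpha_i = 1.5:", fst_from_effects(1.5, beta_j))
print("PO for 90 inclusions in 100 iterations:", posterior_odds([1] * 90 + [0] * 10))
```

A positive locus effect raises FST above the population baseline, which is the signature of diversifying selection in this model.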
Outlier loci of Atlantic herring populations (BAYESCAN)
[Figure: locus-FST (global FST, 0.0-0.5) for each SNP (0-250), with 5% and 1% levels marked. Labelled loci: Cha_1025.1-149, Cha_15389.3-101 (Hsp70), Cha_15984.1-275 (Hba), Cha_2884.1-367, 1513.1.91 (Cat).]
Limborg et al. (2012). Molec. Ecol. 21, 3686-3703.
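BAYENV (2010): Coop, Witonsky, Di Rienzo, Pritchard (2010) Genetics 185, 1411-1423.
Estimates the probability that a locus is influenced by environmental variables ($l$: locus)
Population allele frequency: $x_l = g(\theta_l) = 0$ if $\theta_l < 0$; $\theta_l$ if $0 \le \theta_l \le 1$; $1$ if $\theta_l > 1$
Allele frequency in the ancestral population: $\varepsilon_l \sim \mathrm{Beta}(1, 1)$
Null model (Model 0)
$P(\theta_l \mid \Omega, \varepsilon_l) \sim N(\varepsilon_l, \varepsilon_l (1 - \varepsilon_l) \Omega)$, $\Omega \sim$ inverse Wishart
Joint posterior of the parameters, with $\mathbf{n}_l, \mathbf{m}_l$ the observed counts of alleles 1 and 2:
$P(\theta_l, \Omega, \varepsilon_l \mid \mathbf{n}_l, \mathbf{m}_l) \propto P(\mathbf{n}_l, \mathbf{m}_l \mid x_l = g(\theta_l))\, P(\theta_l \mid \Omega, \varepsilon_l)\, P(\Omega)\, P(\varepsilon_l)$
$\hat{\Omega}$: a single draw of $\Omega$ after burn-in
Alternative model (Model 1), with environmental variable $Y$:
$P(\theta_l \mid \Omega, \varepsilon_l, \beta) \sim N(\varepsilon_l + \beta Y, \varepsilon_l (1 - \varepsilon_l) \Omega)$
$P(\theta_l, \Omega, \varepsilon_l, \beta \mid \mathbf{n}_l, \mathbf{m}_l) \propto P(\mathbf{n}_l, \mathbf{m}_l \mid x_l = g(\theta_l))\, P(\theta_l \mid \Omega, \varepsilon_l, \beta)\, P(\Omega)\, P(\varepsilon_l)\, P(\beta)$
$P(\beta) \sim \mathrm{Unif}(\beta_{min}, \beta_{max})$
$\mathrm{BF} = \frac{\Pr(\text{Model 1} \mid \mathbf{n}_l, \mathbf{m}_l)}{\Pr(\text{Model 0} \mid \mathbf{n}_l, \mathbf{m}_l)}$ (estimated by MCMC)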
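The generative side of the two models can be sketched in a few lines: θ is drawn from a multivariate normal centred on the ancestral frequency (shifted by βY under Model 1), and g truncates it to [0, 1]. This is an illustration only, assuming the normal forms given on the slide; the covariance matrix `omega`, the environmental values `Y` and the function names are hypothetical, and no Bayes factor is computed.

```python
import numpy as np

def g(theta):
    """Truncate theta to [0, 1] to obtain population allele frequencies."""
    return np.clip(theta, 0.0, 1.0)

def draw_population_freqs(eps, omega, beta=0.0, Y=None, seed=0):
    """Draw x = g(theta), theta ~ N(eps + beta * Y, eps * (1 - eps) * omega)."""
    rng = np.random.default_rng(seed)
    mean = np.full(omega.shape[0], eps, dtype=float)
    if Y is not None:                     # Model 1: linear effect of the environment
        mean = mean + beta * np.asarray(Y, dtype=float)
    theta = rng.multivariate_normal(mean, eps * (1.0 - eps) * omega)
    return g(theta)

# Hypothetical covariance among four populations and a standardized environment.
omega = 0.05 * np.eye(4) + 0.02 * np.ones((4, 4))
Y = np.array([-1.2, -0.3, 0.4, 1.1])
print("Model 0:", draw_population_freqs(0.3, omega))
print("Model 1:", draw_population_freqs(0.3, omega, beta=0.1, Y=Y))
```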
Detecting environment-linked loci (BAYENV)
Limborg et al. (2012). Molec. Ecol. 21, 3686-3703.
• The first two columns left of the SNP names show all detected outliers, where * and ** denote outliers with P < 0.05 or P < 0.01 in the ARLEQUIN analysis. BAYESCAN outliers were detected with false discovery rates of 5% (*) and 1% (**).
• Statistical inference of correlations between SNPs and landscape parameters is given for relationships with log10(BF) = 1.5–2.0 (*) and log10(BF) > 2.0 (**).
[Figure (two panels): locus-FST (global FST, 0.0-0.5) for each SNP (0-250). Labelled loci: Cha_1025.1-149, Cha_15389.3-101 (Hsp70), Cha_15984.1-275 (Hba), Cha_2884.1-367, 1513.1.91 (Cat).]
Salinity, SST and allele frequencies
Osmoregulation
[Figure: Cha_15360.2-279 (Hsp70) allele frequency (0.0-0.7) for the 18 populations, against mean annual sea surface salinity (PSU, 5-35) and against mean annual sea surface temperature (5-12).]
HSP70: respiration?
[Figure: Cha_15105.2-341 (Cat) allele frequency (0.65-0.95) for the 18 populations, against mean annual sea surface temperature (5-12) and against mean annual sea surface salinity (PSU, 5-35).]
Cat: oxygen transport
[Figure: Cha_15984.1-275 (Hba) allele frequency (0.0-0.6) for the 18 populations, against mean annual sea surface salinity (PSU, 5-35) and against mean annual sea surface temperature (5-12).]
Hba: SST
[Figure: Cha_1025.1-149 allele frequency (0.4-1.0) for the 18 populations, against mean annual sea surface temperature (5-12) and against mean annual sea surface salinity (PSU, 5-35).]
Cha_1025.1-149
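GST measuring population divergence (= FST)
$F_{ST} = \frac{H_T - H_S}{H_T}$ = (total diversity - within-population diversity) / total diversity = between-population diversity / total diversity
$H_S = 1 - \frac{1}{r} \sum_{i=1}^{r} \sum_{j=1}^{m} p_{ij}^2$ (within populations)
$H_T = 1 - \sum_{j=1}^{m} \left( \frac{1}{r} \sum_{i=1}^{r} p_{ij} \right)^2 = 1 - \sum_{j=1}^{m} \bar{p}_j^2$
r: number of populations; m: number of alleles
Nei, M. (1973). Analysis of gene diversity in subdivided populations. PNAS, 70, 3321-3323.
Two versions: global $F_{ST}$ and pairwise $F_{ST}$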
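The definition $(H_T - H_S)/H_T$ translates directly into code. A minimal sketch using a hypothetical two-population, two-allele example; rows are populations and columns are alleles.

```python
def gst(P):
    """Nei's G_ST from a list of per-population allele frequency lists."""
    r = len(P)                # number of populations
    m = len(P[0])             # number of alleles
    # H_S: average within-population gene diversity.
    h_s = 1.0 - sum(sum(p * p for p in pop) for pop in P) / r
    # H_T: gene diversity of the mean allele frequencies.
    p_bar = [sum(pop[j] for pop in P) / r for j in range(m)]
    h_t = 1.0 - sum(p * p for p in p_bar)
    return h_s, h_t, (h_t - h_s) / h_t

h_s, h_t, g_st = gst([[0.9, 0.1], [0.5, 0.5]])
print(f"H_S={h_s:.3f}, H_T={h_t:.3f}, G_ST={g_st:.3f}")
```

Passing all populations gives the global value; passing only two populations gives the pairwise value.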
Pairwise FST population structure (281 SNPs)
Kitada, Nakamichi, Kishino (2017) Molecular Ecology Resources 17, 1210-1222. FinePop
global FST = 0.0128 (WC); c.f. global FST = 0.0413 (WC) for Atlantic salmon
Precise estimation of pairwise FST is difficult, particularly in high gene flow species.
EBFST: Kitada, Kitakado, Kishino (2007) Genetics 177, 861-873.
Prior: $p_k \sim \mathrm{Dir}(\alpha_1, \ldots, \alpha_m)$, with $\hat{\alpha}$ from the Dirichlet-multinomial marginal likelihood (Kitada, Hayashi, Kishino (2000) Genetics 156, 2063-2079).
Posterior: $p_k^{\mathrm{posterior}} \sim \mathrm{Dir}(\hat{\alpha}_1 + n_1, \ldots, \hat{\alpha}_m + n_m)$
$F_{ST}^{\mathrm{pair}} = \frac{H_T - H_S}{H_T}$
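The empirical Bayes idea can be sketched as follows: allele frequencies of each population are drawn from the Dirichlet posterior (prior parameters plus allele counts), and pairwise FST is computed for every draw, giving a posterior distribution rather than a single point estimate. The prior parameters `alpha_hat` below are made-up stand-ins for the maximum marginal likelihood estimates, the counts are hypothetical, and this is not the FinePop implementation.

```python
import numpy as np

def pairwise_fst(p1, p2):
    """(H_T - H_S) / H_T for two populations' allele frequency vectors."""
    P = np.stack([p1, p2])
    h_s = 1.0 - np.mean(np.sum(P ** 2, axis=1))
    h_t = 1.0 - np.sum(P.mean(axis=0) ** 2)
    return (h_t - h_s) / h_t

def posterior_pairwise_fst(alpha_hat, n1, n2, n_draws=5000, seed=0):
    """Posterior draws of pairwise F_ST from Dir(alpha_hat + n) for each population."""
    rng = np.random.default_rng(seed)
    d1 = rng.dirichlet(alpha_hat + n1, size=n_draws)
    d2 = rng.dirichlet(alpha_hat + n2, size=n_draws)
    return np.array([pairwise_fst(a, b) for a, b in zip(d1, d2)])

alpha_hat = np.array([2.0, 2.0])          # assumed prior estimate (hypothetical)
n1 = np.array([40, 60])                   # hypothetical allele counts, population 1
n2 = np.array([70, 30])                   # hypothetical allele counts, population 2
fst_draws = posterior_pairwise_fst(alpha_hat, n1, n2)
print("posterior mean:", fst_draws.mean())
print("95% interval:", np.percentile(fst_draws, [2.5, 97.5]))
```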
Performance of pairwise FST estimators
Kitada, Nakamichi, Kishino (2017) Molecular Ecology Resources 17, 1210-1222. FinePop
$F_{ST} = \beta_1 D + \beta_2 T + \beta_3 S + \beta_4 DT + \beta_5 DS + \beta_6 TS + \beta_7 DTS$
(D, T and S appear to denote geographical distance, temperature and salinity.)
$\mathrm{TIC} = -2 \times (\text{maximum log likelihood}) + 2 \times \mathrm{trace}(A^{-1}B)$
[Equation: $\mathrm{trace}(A^{-1}B)$ and the matrix $A^{-1}B$, written in terms of ratios of $\sigma$ to $s$.]
$\sigma$: variance of parameters (bootstrap)
$s$: variance of parameters assuming iid
Salinity and geographical distance explained 60% of the variation in population structure.
Lamichhaney et al. (2017). doi/10.1073/pnas.1617728114.
global $F_{ST} = 0.026$ among 26 Atlantic herring populations (NW Atlantic; NE Atlantic/Baltic)
n=1,837, ~1.2 million SNPs
* autumn-spawning
Genes highlighted:
• TSHR: thyroid-stimulating hormone receptor
• SOX11, CALM1: photoperiodic regulation of reproduction in birds and mammals
• ESR2A: estrogen receptor beta 2 (female hormone)
• HERPUD2: homocysteine-responsive endoplasmic reticulum-resident ubiquitin-like domain member 2
[Figure: TSHR population structure; America, NW Atlantic and Europe, NE Atlantic/Baltic; $P = 10^{-50}$; * autumn-spawning.]