A statistical procedure to map high-order epistasis for complex traits
Xiaoming Pang*, Zhong Wang*, John S. Yap*, Jianxin Wang, Junjia Zhu, Wenhao Bo, Yafei Lv, Fang Xu, Tao Zhou, Shaofeng Peng, Dengfeng Shen and Rongling Wu
Submitted: 22nd February 2012; Received (in revised form): 27th April 2012
Abstract
Genetic interactions, or epistasis, are thought to play a pivotal role in shaping the formation, development and evolution of life. Previous work focused on lower-order interactions between pairs of genes, which is inadequate to explain a complex network of genetic interactions and pathways. We review and assess a statistical model for characterizing high-order epistasis among more than two genes or quantitative trait loci (QTLs) that control a complex trait. The model includes a series of state-of-the-art standard procedures for estimating and testing the nature and magnitude of QTL interactions. Results from simulation studies and real data analysis warrant the
Xiaoming Pang obtained his PhD in Pomology at Huazhong Agricultural University in 2002. After post-doctoral training and
research in China and Japan, he joined Beijing Forestry University as Associate Professor of Tree Breeding in 2006. His research interest
focuses on the utilization of molecular genetics and biotechnologies to study population genetic diversity and map quantitative trait loci
in fruit trees.
Zhong Wang obtained his PhD in Engineering Mechanics at Dalian University of Technology in 2000. He founded and managed a
software company in Japan from 2000 to 2008. He is a Visiting Scholar in the Center for Computational Biology at Beijing Forestry
University. He writes computer software for statistical genetic models.
John S. Yap obtained his PhD in Statistics at the University of Florida in 2007. He monitors the accuracy and reasonableness of
statistical approaches that are used in drug discovery, development and delivery. He also develops new statistical models for genetic
mapping.
Jianxin Wang obtained his PhD in Computer Science at the University of Science and Technology Beijing in 2001. He is Associate
Professor of Informatics at Beijing Forestry University and currently a visiting scholar at the Pennsylvania State University. His research
interest is in computational bioinformatics in biology.
Junjia Zhu obtained his PhD in Statistics at the Pennsylvania State University in 2008. He was Assistant Professor in the Department of
Mathematics and Computer Sciences at the Penn State Harrisburg from 2008 to 2010, and is currently Assistant Professor in the
Division of Biostatistics in the Department of Public Health Sciences at the Pennsylvania State College of Medicine, Hershey. His
research interest is in computational statistics and statistical applications, particularly in modeling human diseases.
Wenhao Bo is a PhD candidate in Forest Genetics and Tree Breeding in the Center for Computational Biology at Beijing Forestry
University. His research focuses on the genetic mapping of quantitative traits in Populus.
Yafei Lv is a Master’s student in Forest Genetics and Tree Breeding in the Center for Computational Biology at Beijing Forestry
University. His research focuses on the statistical genetics of quantitative traits in forest trees.
Fang Xu is a Master’s student in Forest Genetics and Tree Breeding in the Center for Computational Biology at Beijing Forestry
University. His research focuses on the genetic diversity and evolution of Populus using molecular markers.
Tao Zhou is a PhD candidate in Forest Genetics and Tree Breeding in the Center for Computational Biology at Beijing Forestry
University. His research focuses on the study of the genetic mechanisms for adaptive response to environmental stress in Populus.
Shaofeng Peng is a PhD student in Forest Genetics and Tree Breeding in the Center for Computational Biology at Beijing Forestry
University. Her research focuses on the use of molecular biotechnologies to study the population genetics of forest trees.
Dengfeng Shen is a PhD student in Forest Genetics and Tree Breeding in the Center for Computational Biology at Beijing Forestry
University. Her research focuses on the study of ecological genetics and evolutionary genetics in Populus.
Rongling Wu obtained his PhD in Quantitative Genetics at the University of Washington in 1995. He is Changjiang Scholars
Professor of Genetics and the Director of the Center for Computational Biology at Beijing Forestry University. His interest is to
unravel the genetic roots for the outcome of biological traits by dissecting the traits into their biochemical and developmental pathways.
Corresponding author. Rongling Wu, Changjiang Scholars Professor of Molecular Breeding, Director, Center for Computational
Biology, College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing 100083, China. Tel: +086 10 6233
6283. Fax: +086 10 6233 6164. E-mail: [email protected]
*These authors contributed equally to this work.
BRIEFINGS IN BIOINFORMATICS. VOL 14. NO 3. 302–314. doi:10.1093/bib/bbs027. Advance Access published on 20 June 2012
© The Author 2012. Published by Oxford University Press. For Permissions, please email: [email protected]
Downloaded from http://bib.oxfordjournals.org/ by guest on January 13, 2016
statistical properties of the model and its usefulness in practice. High-order epistatic mapping will provide a routine procedure for charting a detailed picture of the genetic regulation mechanisms underlying the phenotypic variation of complex traits.
Keywords: Epistasis; high-order interactions; quantitative trait loci; EM algorithm
INTRODUCTION
The past decade has been a critical period in which
some phenomena related to genetic architecture have been
recognized anew. For example, epistasis has been
thought to be an important force for evolution and
speciation [1, 2], but recent genetic studies from vast
quantities of molecular data have increasingly indi-
cated that epistasis critically affects the pathogenesis
of most inherited human diseases, such as cancer or
cardiovascular disease [3–5], the developmental process
and pattern of traits [6–8], susceptibility to HIV
[9, 10] and viral drug resistance [11]. The expression
of an interconnected network of genes is
contingent upon environmental conditions, often
with the elements and connections of the network
displaying non-linear relationships with environ-
mental factors [12]. Not only do these elements
interact with each other in a pair-wise manner,
they also form a complicated web of high-order
interactions [13]. Because the embryonic expression
pattern of a complex trait undergoes a sequence of
metabolic pathways, such an interaction web should
involve multiple interacting gene products and regu-
latory loci [8, 14–19].
Methodologically, it is a challenge to test and
quantify genetic interactions among multiple genes.
Genetic mapping with molecular linkage maps has
proven to be powerful for the genome-wide detec-
tion of specific genes or quantitative trait loci (QTLs)
for complex traits [20–23]. This approach has now
been extended to search for epistatic interactions be-
tween different genes in controlled crosses [24], nu-
clear families [25], natural populations [26] and
case-control designs [19]. Wu et al. [7] incorporated
an epistatic model to study the genetic control of
developmental trajectories for a complex trait.
Several Bayesian approaches that allow an efficient
search for pair-wise epistasis throughout the genome
have been developed [27]. These strategies
for genetic mapping can, however, be extended to incorporate genetic
interactions among more than two QTLs, making it
possible to elucidate a detailed picture of the genetic
architecture of complex traits.
In a theoretical exploration by computer simulation,
Stich et al. [16] found that genetic mapping has
adequate power for the detection of three-way interactions
while maintaining a low false-positive rate. Several
authors showed the mathematical description of
high-order epistasis from regulatory networks
[28–30]. These advances in mathematical and statis-
tical modeling of high-order epistasis provide an in-
centive to study this complex genetic phenomenon.
The purpose of this article is to describe and assess a
general procedure for a genome-wide search for
high-order epistasis involving more than two QTLs
using a genetic mapping strategy. This procedure in-
tegrates traditional quantitative genetic theory into
genetic mapping, allowing the discernment of epistasis
at different orders. The procedure is reviewed and
tested in a genetic mapping study of rice with a
doubled haploid population [31], in which significant
three-way additive × additive × additive epistasis was
identified. Computer simulation was used to investi-
gate the statistical behavior of the model and algo-
rithm for three-way epistatic mapping.
HIGH-DIMENSIONAL GENETIC MODELING
Why high-order epistasis?
Epistasis is the masking of the phenotype of one
allele by the phenotype of an allele at another
locus [32, 33]. Since a phenotypic trait involves an
intricate network of biochemical reactions affected
by multiple interacting gene products and regulatory
loci, it is likely that genes generate higher-order epi-
static interactions [6, 8, 15, 16, 30]. For example,
maize (Zea mays L.) resists the corn earworm
Helicoverpa zea (Boddie), a major insect pest of crops
in the United States and elsewhere in the Western
Hemisphere, because of the C-glycosyl flavones
maysin, apimaysin and methoxymaysin synthesized
in silks [34]. As a resistance phenotype, the biosyn-
thesis of maysin, apimaysin and methoxymaysin
undergoes a complex network of metabolic path-
ways. Figure 1 illustrates a branch of the well
characterized flavonoid pathway in which each step
and reaction are regulated by genes [15]. In order for
maysin to be synthesized, alleles at the following
genes should coordinate appropriately: p1, c2 and/
or whp1 (encoding chalcone synthases [35]), chi1 (encoding chalcone isomerase [36]), pr1 (controlling
the 3′-hydroxylation of the flavonoid B-ring to convert
monohydroxy to dihydroxy compounds [37])
and unidentified additional loci encoding flavone
synthase, C-glycosyl transferase, glucose oxidase,
rhamnosyl transferase and an enzyme such as gluta-
thione S-transferase for transport to the vacuole
[38, 39] (Figure 1). McMullen et al. [15] argued that
higher-order epistatic interactions among multiple
genes at different levels of biochemical pathways
are a determinant of final maysin synthesis. The
occurrence of high-order epistasis entails the development
of a high-dimensional model for gene
detection.
Quantitative genetic model for high-order epistasis
The formation of a final phenotype is the consequence of sequential genetic interactions involved in biochemical and metabolic networks. Quantitative genetic theory has been well established to describe pair-wise epistasis by partitioning it into different components [40, 41]. Here, we extend this theory to study high-order epistasis among three or more genes. Consider three QTLs $Q_1$, $Q_2$ and $Q_3$, each with two alleles $Q$ and $q$, which control a complex trait. Let $j_k$ denote one of the three genotypes at QTL $Q_k$ ($k = 1, 2, 3$). The genotypic value of a three-QTL genotype, denoted as $\mu_{j_1j_2j_3}$ ($j_k = 0$ for $q_kq_k$, 1 for $Q_kq_k$, 2 for $Q_kQ_k$), can be partitioned into different components, including the main effects, two-way interaction effects and three-way interaction effects [19], i.e.
Figure 1: A branch of biochemical pathways for flavone synthesis in maize. CHS, chalcone synthase; CHI, chalcone isomerase; F3H, flavanone-3-hydroxylase; DFR, dihydroflavanone reductase; F3′H, flavanone-3′-hydroxylase; FNS, flavone synthase; RT, rhamnosyl transferase. Adapted from [15].
\[
\begin{aligned}
\mu_{j_1j_2j_3} ={}& \mu + \sum_{k=1}^{3}\left(x_k a_k + z_k d_k\right) \\
&+ \sum_{1\le k<l\le 3}\left(x_kx_l\,i_{a_ka_l} + x_kz_l\,i_{a_kd_l} + z_kx_l\,i_{d_ka_l} + z_kz_l\,i_{d_kd_l}\right) \\
&+ x_1x_2x_3\,i_{a_1a_2a_3} + x_1x_2z_3\,i_{a_1a_2d_3} + x_1z_2x_3\,i_{a_1d_2a_3} + z_1x_2x_3\,i_{d_1a_2a_3} \\
&+ x_1z_2z_3\,i_{a_1d_2d_3} + z_1x_2z_3\,i_{d_1a_2d_3} + z_1z_2x_3\,i_{d_1d_2a_3} + z_1z_2z_3\,i_{d_1d_2d_3}
\end{aligned}
\tag{1}
\]

Here the genotype codes are $x_k = j_k - 1$ and $z_k = 1$ if $j_k = 1$ and $z_k = 0$ otherwise. Written out for the 27 genotypes $j_1j_2j_3 = 000, 001, \ldots, 222$, Equation (1) is the linear system $(\mu_{000}, \mu_{001}, \ldots, \mu_{222})^{\mathrm T} = \mathbf{D}\boldsymbol{\beta}$, in which $\mathbf{D}$ is a $27 \times 27$ design matrix with entries $-1$, $0$ and $1$. In (1), $\mu$ is the overall mean; $a_1$, $a_2$ and $a_3$ are the main additive genetic effects and $d_1$, $d_2$ and $d_3$ are the main dominant effects at QTLs $Q_1$, $Q_2$ and $Q_3$, respectively; $i_{a_1a_2}$, $i_{a_1d_2}$, $i_{d_1a_2}$, $i_{d_1d_2}$, $i_{a_1a_3}$, $i_{a_1d_3}$, $i_{d_1a_3}$, $i_{d_1d_3}$, $i_{a_2a_3}$, $i_{a_2d_3}$, $i_{d_2a_3}$ and $i_{d_2d_3}$ are the two-way additive × additive, additive × dominant, dominant × additive and dominant × dominant epistasis between QTLs $Q_1$ and $Q_2$, between QTLs $Q_1$ and $Q_3$, and between QTLs $Q_2$ and $Q_3$, respectively; and $i_{a_1a_2a_3}$, $i_{a_1a_2d_3}$, $i_{a_1d_2a_3}$, $i_{d_1a_2a_3}$, $i_{a_1d_2d_3}$, $i_{d_1a_2d_3}$, $i_{d_1d_2a_3}$ and $i_{d_1d_2d_3}$ are the three-way additive × additive × additive, additive × additive × dominant, additive × dominant × additive, dominant × additive × additive, additive × dominant × dominant, dominant × additive × dominant, dominant × dominant × additive and dominant × dominant × dominant epistasis among QTLs $Q_1$, $Q_2$ and $Q_3$, respectively.

The genetic effect parameters are then solved from the genotypic values:

\[
\boldsymbol{\beta} = (\mu, a_1, a_2, a_3, d_1, d_2, d_3, i_{a_1a_2}, i_{a_1d_2}, i_{d_1a_2}, i_{d_1d_2}, i_{a_1a_3}, i_{a_1d_3}, i_{d_1a_3}, i_{d_1d_3}, i_{a_2a_3}, i_{a_2d_3}, i_{d_2a_3}, i_{d_2d_3}, i_{a_1a_2a_3}, i_{a_1a_2d_3}, i_{a_1d_2a_3}, i_{d_1a_2a_3}, i_{a_1d_2d_3}, i_{d_1a_2d_3}, i_{d_1d_2a_3}, i_{d_1d_2d_3})^{\mathrm T}
= \frac{1}{2^3}\,\mathbf{M}\,(\mu_{000}, \mu_{001}, \ldots, \mu_{222})^{\mathrm T}
\tag{2}
\]

where $\mathbf{M} = 2^3\mathbf{D}^{-1}$ is a $27 \times 27$ matrix of integers. The entry of $\mathbf{M}$ in the row of a given effect and the column of genotype $j_1j_2j_3$ is the product $w^{(1)}_{j_1}w^{(2)}_{j_2}w^{(3)}_{j_3}$, where the weights $(w^{(k)}_0, w^{(k)}_1, w^{(k)}_2)$ are $(1, 0, 1)$ if QTL $Q_k$ does not enter the effect, $(-1, 0, 1)$ if it enters additively and $(-1, 2, -1)$ if it enters through dominance. For example, $a_1 = \frac{1}{2^3}\sum_{j_2, j_3 \in \{0, 2\}}\left(\mu_{2j_2j_3} - \mu_{0j_2j_3}\right)$.
Using these expressions, we can test the signifi-
cance of each genetic effect. The model can be ex-
tended to characterize high-order epistasis among
any number of QTLs.
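As a minimal sketch of how the 27 genotypic values and the 27 effect parameters map onto each other, the design matrix can be assembled as a Kronecker product of one per-locus coding matrix. The names `B`, `D`, `labels` and `effects_from_means` are hypothetical, and the column order used here (the Kronecker order) is an assumption of the sketch that need not match the ordering of the parameter vector in the text.

```python
import numpy as np

# Per-locus coding for genotypes j = 0 (qq), 1 (Qq), 2 (QQ).
# Columns: overall mean, additive code (j - 1), dominance code (1 for Qq).
B = np.array([[1, -1, 0],
              [1,  0, 1],
              [1,  1, 0]], dtype=float)

# Three-locus design matrix; rows follow genotypes 000, 001, ..., 222.
# Columns follow the Kronecker order (an assumption of this sketch).
D = np.kron(np.kron(B, B), B)

# Column labels: '-' = locus absent from the effect, 'a' = additive,
# 'd' = dominance. E.g. '---' is the mean, 'a--' is a1, 'aad' is i_{a1a2d3}.
labels = [e1 + e2 + e3 for e1 in "-ad" for e2 in "-ad" for e3 in "-ad"]

def effects_from_means(mu):
    """Solve D @ beta = mu for the 27 genetic effect parameters."""
    return np.linalg.solve(D, np.asarray(mu, dtype=float))

# Round trip on hypothetical effect values.
rng = np.random.default_rng(0)
beta_true = rng.normal(size=27)
mu = D @ beta_true
beta_hat = effects_from_means(mu)

# 2^3 times the inverse has integer entries, as in the solved form.
M = 8.0 * np.linalg.inv(D)
```

Because the matrix is a Kronecker product, the same construction extends to more loci by adding one more factor `B` per QTL, at the cost of a design matrix that grows as 3 to the power of the number of QTLs.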
Approaches for mapping high-order epistasis
Genetic mapping founded on quantitative genetic theory has been used to study genetic interactions. Consider an F2 mapping population, derived from two inbred lines, in which all $n$ progeny are genotyped for a panel of molecular markers to construct a genetic linkage map and phenotyped for a complex trait [23]. The likelihood of trait values, determined by the three QTLs $Q_1$, $Q_2$ and $Q_3$, is formulated by a mixture model composed of $3^3 = 27$ genotype components, i.e.
\[
L(\Theta) = \prod_{i=1}^{n}\sum_{j_1=0}^{2}\sum_{j_2=0}^{2}\sum_{j_3=0}^{2}\omega_{j_1j_2j_3|i}\,f_{j_1j_2j_3}(y_i)
\tag{3}
\]
where $y_i$ is the phenotypic value of the trait for progeny $i$, $\omega_{j_1j_2j_3|i}$ is the probability with which an arbitrary progeny $i$ carries QTL genotype $j_1j_2j_3$, and $f_{j_1j_2j_3}(y_i)$ is a normal density function for progeny $i$ with genotypic mean $\mu_{j_1j_2j_3}$ and variance $\sigma^2$. Since the QTL genotype of a progeny is unknown but can be inferred from its marker genotype, $\omega_{j_1j_2j_3|i}$ is actually a conditional probability of QTL genotype $j_1j_2j_3$ given the marker genotype of progeny $i$, which can be expressed in terms of the recombination fractions between the QTLs and markers [23]. In the likelihood (3), the unknown parameters, arrayed in $\Theta$, include the QTL positions, genotypic means and variance.
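To make the conditional probabilities concrete, the following sketch computes them for the simpler two-genotype case (a backcross or doubled-haploid population, coded 1 and 0) rather than the F2 considered above, where each locus has three genotypes. It assumes no crossover interference and that the three QTLs lie in separate marker intervals, so that their genotypes are conditionally independent given the markers; the function names are hypothetical.

```python
import numpy as np

def qtl_prob_given_flanking(m1, m2, r1, r2):
    """P(QTL genotype = 1 | flanking marker genotypes m1, m2).

    r1: recombination fraction between the left marker and the QTL
    r2: recombination fraction between the QTL and the right marker
    """
    # Marker-to-marker recombination fraction without interference.
    r = r1 + r2 - 2.0 * r1 * r2
    # Joint probability of QTL genotype 1 and the observed marker pair.
    joint = 0.5 * ((1 - r1) if m1 == 1 else r1) * ((1 - r2) if m2 == 1 else r2)
    # Marginal probability of the observed marker pair.
    marginal = 0.5 * ((1 - r) if m1 == m2 else r)
    return joint / marginal

def joint_omega(p1, p2, p3):
    """Probabilities of the genotypes 111, 110, 101, 100, 011, 010, 001, 000.

    p1, p2, p3 are P(genotype = 1) at the three QTLs for one progeny.
    """
    per_locus = [np.array([p, 1.0 - p]) for p in (p1, p2, p3)]
    return np.einsum("a,b,c->abc", *per_locus).ravel()

# Hypothetical example: one progeny, three intervals.
p1 = qtl_prob_given_flanking(1, 1, 0.10, 0.20)
p2 = qtl_prob_given_flanking(1, 0, 0.05, 0.15)
p3 = qtl_prob_given_flanking(0, 0, 0.10, 0.10)
omega_i = joint_omega(p1, p2, p3)
```

In a genome scan these probabilities are recomputed at every candidate set of QTL positions, since the recombination fractions change with position.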
The parameters can be estimated by maximizing the likelihood (3). This can be done by differentiating the log-likelihood with respect to each individual parameter $\theta$ ($\theta \in \Theta$), i.e.
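\[
\begin{aligned}
\frac{\partial}{\partial\theta}\log L(\Theta)
&= \frac{\partial}{\partial\theta}\sum_{i=1}^{n}\log\sum_{j_1=0}^{2}\sum_{j_2=0}^{2}\sum_{j_3=0}^{2}\omega_{j_1j_2j_3|i}\,f_{j_1j_2j_3}(y_i) \\
&= \sum_{i=1}^{n}\sum_{j_1=0}^{2}\sum_{j_2=0}^{2}\sum_{j_3=0}^{2}
\frac{\omega_{j_1j_2j_3|i}\,f_{j_1j_2j_3}(y_i)}{\sum_{j_1'=0}^{2}\sum_{j_2'=0}^{2}\sum_{j_3'=0}^{2}\omega_{j_1'j_2'j_3'|i}\,f_{j_1'j_2'j_3'}(y_i)}\,
\frac{\partial}{\partial\theta}\log f_{j_1j_2j_3}(y_i) \\
&= \sum_{i=1}^{n}\sum_{j_1=0}^{2}\sum_{j_2=0}^{2}\sum_{j_3=0}^{2}\Pi_{j_1j_2j_3|i}\,\frac{\partial}{\partial\theta}\log f_{j_1j_2j_3}(y_i)
\end{aligned}
\tag{4}
\]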
where

\[
\Pi_{j_1j_2j_3|i} = \frac{\omega_{j_1j_2j_3|i}\,f_{j_1j_2j_3}(y_i)}{\sum_{j_1'=0}^{2}\sum_{j_2'=0}^{2}\sum_{j_3'=0}^{2}\omega_{j_1'j_2'j_3'|i}\,f_{j_1'j_2'j_3'}(y_i)}
\tag{5}
\]
is interpreted as the posterior probability that progeny $i$ has QTL genotype $j_1j_2j_3$. Setting (4) equal to zero, with the posterior probabilities given by (5), we obtain the formulas to estimate the genotypic means and variance, expressed as
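\[
\hat{\mu}_{j_1j_2j_3} = \frac{\sum_{i=1}^{n}\Pi_{j_1j_2j_3|i}\,y_i}{\sum_{i=1}^{n}\Pi_{j_1j_2j_3|i}}
\tag{6}
\]

\[
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\sum_{j_1=0}^{2}\sum_{j_2=0}^{2}\sum_{j_3=0}^{2}\Pi_{j_1j_2j_3|i}\left(y_i - \hat{\mu}_{j_1j_2j_3}\right)^2
\tag{7}
\]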
The EM algorithm [20] is implemented to estimate $\mu_{j_1j_2j_3}$ and $\sigma^2$ by iterating between the E step (5) and the M step (6) and (7). The values at convergence are the maximum-likelihood estimates (MLEs) of $\mu_{j_1j_2j_3}$ and $\sigma^2$. After obtaining the estimates of the genotypic means, we can solve for the estimates of the genetic effect parameters using Equation (2). In practice, the QTL positions are estimated by treating $\omega_{j_1j_2j_3|i}$ as fixed at each candidate set of positions, scanning the entire genome and taking the positions with the largest likelihood as the estimates of the QTL positions.
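The iteration between the E step (5) and the M step (6) and (7) can be sketched as follows for one fixed set of QTL positions. This is an illustration under stated assumptions, not the authors' software: the function names, the starting values and the stopping rule are hypothetical, and the toy data use two genotype components instead of 27 to keep the example short.

```python
import numpy as np

def normal_pdf(y, mu, sigma2):
    """Normal density with mean mu and variance sigma2."""
    return np.exp(-0.5 * (y - mu) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)

def log_likelihood(y, omega, mu, sigma2):
    """Log of the mixture likelihood (3)."""
    mix = (omega * normal_pdf(y[:, None], mu[None, :], sigma2)).sum(axis=1)
    return float(np.log(mix).sum())

def em_fixed_positions(y, omega, max_iter=500, tol=1e-8):
    """EM estimates of genotypic means and residual variance.

    y     : (n,) phenotypes
    omega : (n, G) conditional genotype probabilities (rows sum to 1)
    """
    y = np.asarray(y, dtype=float)
    omega = np.asarray(omega, dtype=float)
    n = len(y)
    # Starting values (an arbitrary choice): prior-weighted means.
    mu = (omega * y[:, None]).sum(axis=0) / omega.sum(axis=0)
    sigma2 = y.var()
    trace = [log_likelihood(y, omega, mu, sigma2)]
    for _ in range(max_iter):
        # E step, Eq. (5): posterior genotype probabilities.
        joint = omega * normal_pdf(y[:, None], mu[None, :], sigma2)
        post = joint / joint.sum(axis=1, keepdims=True)
        # M step, Eqs. (6) and (7).
        mu = (post * y[:, None]).sum(axis=0) / post.sum(axis=0)
        sigma2 = (post * (y[:, None] - mu[None, :]) ** 2).sum() / n
        trace.append(log_likelihood(y, omega, mu, sigma2))
        if trace[-1] - trace[-2] < tol:
            break
    return mu, sigma2, trace

# Toy data: two genotype components with informative marker-based priors.
rng = np.random.default_rng(1)
n = 400
p1 = rng.choice([0.1, 0.9], size=n)
omega = np.column_stack([1.0 - p1, p1])
g = (rng.random(n) < p1).astype(int)
mu_true = np.array([0.0, 3.0])
y = mu_true[g] + rng.normal(0.0, 1.0, size=n)
mu_hat, sigma2_hat, trace = em_fixed_positions(y, omega)
```

In a genome scan, this routine would be called once per candidate set of positions, and the positions giving the largest final log-likelihood retained.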
Hypothesis tests
When no QTL is segregating, a single normal density describes the population, in which case no EM algorithm is needed for parameter estimation. The existence of QTLs can be tested by calculating and comparing the likelihoods under the null hypothesis H0: there is no QTL, $L(\tilde{\Theta})$, and the alternative hypothesis H1: there is at least one QTL, $L(\hat{\Theta})$. The resulting log-likelihood ratio (LR) test statistic is

\[
\mathrm{LR} = -2\left[\log L(\tilde{\Theta}) - \log L(\hat{\Theta})\right]
\]

where $\tilde{\Theta}$ and $\hat{\Theta}$ are the MLEs of the unknown parameters under H0 and H1, respectively. The significance of the result can be tested by using permutation tests [42]. By reshuffling the phenotypic data and calculating the LR genome-wide for each permutation, a critical threshold is obtained at a particular significance level.
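The permutation procedure can be sketched as below. To keep the example self-contained, the scan statistic is a stand-in: the largest single-marker LR for a difference in means between two genotype classes, not the three-QTL mixture model of this article. All function names and the simulated data are hypothetical.

```python
import numpy as np

def make_marker_scan(markers):
    """Return scan(y): the largest single-marker LR over a 0/1 marker matrix."""
    def scan(y):
        n = len(y)
        rss0 = np.sum((y - y.mean()) ** 2)          # null: one common mean
        best = 0.0
        for g in markers.T:
            # Alternative: one mean per genotype class, common variance.
            rss1 = sum(np.sum((y[g == k] - y[g == k].mean()) ** 2) for k in (0, 1))
            best = max(best, n * np.log(rss0 / rss1))
        return best
    return scan

def permutation_threshold(y, scan, n_perm=200, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the maximum LR under permutation.

    Phenotypes are shuffled against fixed marker data, which removes any
    marker-trait association while keeping the marker structure intact.
    """
    rng = np.random.default_rng(seed)
    null_max = np.array([scan(rng.permutation(y)) for _ in range(n_perm)])
    return float(np.quantile(null_max, 1.0 - alpha))

# Hypothetical data: 100 progeny, 5 markers, a true effect at marker 0.
rng = np.random.default_rng(3)
markers = rng.integers(0, 2, size=(100, 5))
y = 2.0 * markers[:, 0] + rng.normal(size=100)
scan = make_marker_scan(markers)
threshold = permutation_threshold(y, scan)
observed = scan(y)
```

Taking the maximum over all scanned positions within each permutation is what makes the resulting threshold valid for the whole scan rather than for a single position.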
A procedure can also be given to test different components of genotypic values, including the additive ($a_1$, $a_2$, $a_3$) and dominant ($d_1$, $d_2$, $d_3$) main genetic effects at individual QTLs, two-way epistatic interactions of 12 different types ($i_{a_1a_2}$, $i_{a_1d_2}$, $i_{d_1a_2}$, $i_{d_1d_2}$, $i_{a_1a_3}$, $i_{a_1d_3}$, $i_{d_1a_3}$, $i_{d_1d_3}$, $i_{a_2a_3}$, $i_{a_2d_3}$, $i_{d_2a_3}$, $i_{d_2d_3}$), and three-way epistatic interactions of eight different types ($i_{a_1a_2a_3}$, $i_{a_1a_2d_3}$, $i_{a_1d_2a_3}$, $i_{d_1a_2a_3}$, $i_{a_1d_2d_3}$, $i_{d_1a_2d_3}$, $i_{d_1d_2a_3}$, $i_{d_1d_2d_3}$). All these components are calculated from genotypic means using the system of equations (1). When we want to test whether one or more of the 26 effects equal zero, we only need to estimate the remaining effects. In such a reduced model, we use the same EM algorithm for parameter estimation as described for the full model (5)–(7), but with the constraint(s) imposed on the relationships among the genotypic values $\mu_{j_1j_2j_3}$ ($j_k = 0$ for $q_kq_k$, 1 for $Q_kq_k$, 2 for $Q_kQ_k$), derived under the condition that the effect parameters being tested are equal to zero. The Augment-M algorithm [43] is incorporated, in which the M step is derived under the reduced model using the following steps:
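(1) For progeny $i$ with observed phenotypic value $y_i$, we augment the trait value $y_{j_1j_2j_3|i}$ of this progeny that carries QTL genotype $j_1j_2j_3$, i.e.

\[
y_{j_1j_2j_3|i} = \Pi_{j_1j_2j_3|i}\,y_i,
\]

where $\Pi_{j_1j_2j_3|i}$ is the posterior probability obtained from the E step (5);

(2) Define a vector of dummy variables $X_{j_1j_2j_3|i}$ that meets

\[
E(y_{j_1j_2j_3|i}) = X_{j_1j_2j_3|i}\,\boldsymbol{\beta},
\]

where $\boldsymbol{\beta}$ is the vector of genetic effect parameters (2);

(3) By arranging the augmented data in a linear model framework, we have

\[
\mathbf{y}_A = \mathbf{X}_A\boldsymbol{\beta},
\]

where $\mathbf{y}_A = \{y_{j_1j_2j_3|i}\}$ and $\mathbf{X}_A = \{X_{j_1j_2j_3|i}\}$. For a given reduced model, represented by $\mathbf{K}^{\mathrm T}\boldsymbol{\beta} = 0$, where $\mathbf{K}$ is a vector that constrains certain effects to be equal to zero, we have

\[
\hat{\boldsymbol{\beta}}_K = \hat{\boldsymbol{\beta}} - (\mathbf{X}_A^{\mathrm T}\mathbf{X}_A)^{-1}\mathbf{K}\left[\mathbf{K}^{\mathrm T}(\mathbf{X}_A^{\mathrm T}\mathbf{X}_A)^{-1}\mathbf{K}\right]^{-1}\mathbf{K}^{\mathrm T}\hat{\boldsymbol{\beta}},
\tag{8}
\]

where $\hat{\boldsymbol{\beta}} = (\mathbf{X}_A^{\mathrm T}\mathbf{X}_A)^{-1}\mathbf{X}_A^{\mathrm T}\mathbf{y}_A$;

(4) The variance in the reduced model is estimated by

\[
\hat{\sigma}_K^2 = \frac{1}{n}\sum_{i=1}^{n}\sum_{j_1=0}^{2}\sum_{j_2=0}^{2}\sum_{j_3=0}^{2}\Pi_{j_1j_2j_3|i}\left(y_i - X_{j_1j_2j_3|i}\,\hat{\boldsymbol{\beta}}_K\right)^2,
\tag{9}
\]

where $n$ is the total number of progeny in the mapping population.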
The iteration is made between the E step (5) and
M step (8) and (9) until the stable values are ob-
tained. These stable values are the MLEs of the par-
ameters under the reduced model. In each case of
testing the significance of effect parameters, we cal-
culate the likelihoods under the null and alternative
hypotheses and therefore the LRs. The critical
thresholds for testing each effect can be obtained
from simulation approaches [23].
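The constrained estimate in Equation (8) has the form of restricted least squares. The sketch below applies that formula to a generic design matrix and response; it does not build the augmented data of the Augment-M algorithm, and `restricted_ls` and the simulated inputs are hypothetical. Here `K` has one column per constraint.

```python
import numpy as np

def restricted_ls(X, y, K):
    """Least squares subject to K.T @ beta = 0, in the form of Eq. (8)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                       # unrestricted estimate
    A = XtX_inv @ K
    # Subtract the correction that forces the constrained effects to zero.
    beta_K = beta - A @ np.linalg.solve(K.T @ A, K.T @ beta)
    return beta, beta_K

# Hypothetical example: four effects, test whether the fourth is zero.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=50)
K = np.zeros((4, 1))
K[3, 0] = 1.0
beta, beta_K = restricted_ls(X, y, K)

# Refitting without the constrained column gives the same remaining effects.
beta_reduced = np.linalg.lstsq(X[:, :3], y, rcond=None)[0]
```

When the constraint simply sets individual effects to zero, the result coincides with dropping the corresponding columns and refitting, which offers a quick check on an implementation.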
To reduce the computing burden for threshold
determination, several formulae have been derived
for computing approximate critical thresholds to
control the type I error rate at a chromosome- or
genome-wide level [44, 45]. Chang et al. [46] pro-
posed a score test statistic for QTL mapping. The
score test is computationally simpler than the LR
test, since it only uses the MLEs of parameters
under the null hypothesis. More importantly, the
maximum of the squared score statistics is asymptotically
equivalent to the maximum of the LR test
statistics under the null hypothesis, thus the critical
threshold for the score test can also be used for the
LR test, which can improve the computing effi-
ciency of threshold determination.
MODEL VALIDATION
Worked example
Two rice cultivars, semi-dwarf IR64 and tall Azucena, were crossed to generate a doubled-haploid (DH) population. Using 135 DH lines from this population, a genetic linkage map was constructed, covering 12 chromosomes with 175 molecular markers [31]. The DH population was grown in a randomized complete design with two replicates at a spacing of 15 × 20 cm in a field near Hangzhou, China. Final plant heights were measured for each plant. To reduce random errors in height phenotypes, we took the mean of the two replicates for each DH line and used it for QTL mapping. Alternatively, the two replicates can be incorporated into the model in a way, as shown by Wu et al. [47], that takes into account
the spatial correlation of phenotypic values due to
microenvironmental factors.
Marker and QTL co-segregation in a DH population follows a backcross pattern. In a DH population, each locus has two homozygotes (denoted as 1 and 0, respectively), each inheriting its two alleles from a different parent. Thus, three QTLs form eight different homozygotes, denoted as $j_1j_2j_3$ ($j_1, j_2, j_3 = 1, 0$), with genotypic value $\mu_{j_1j_2j_3}$, which can be partitioned into different components as follows:
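\[
\begin{aligned}
\mu_{111} &= \mu + a_1 + a_2 + a_3 + i_{a_1a_2} + i_{a_1a_3} + i_{a_2a_3} + i_{a_1a_2a_3} \\
\mu_{110} &= \mu + a_1 + a_2 - a_3 + i_{a_1a_2} - i_{a_1a_3} - i_{a_2a_3} - i_{a_1a_2a_3} \\
\mu_{101} &= \mu + a_1 - a_2 + a_3 - i_{a_1a_2} + i_{a_1a_3} - i_{a_2a_3} - i_{a_1a_2a_3} \\
\mu_{100} &= \mu + a_1 - a_2 - a_3 - i_{a_1a_2} - i_{a_1a_3} + i_{a_2a_3} + i_{a_1a_2a_3} \\
\mu_{011} &= \mu - a_1 + a_2 + a_3 - i_{a_1a_2} - i_{a_1a_3} + i_{a_2a_3} - i_{a_1a_2a_3} \\
\mu_{010} &= \mu - a_1 + a_2 - a_3 - i_{a_1a_2} + i_{a_1a_3} - i_{a_2a_3} + i_{a_1a_2a_3} \\
\mu_{001} &= \mu - a_1 - a_2 + a_3 + i_{a_1a_2} - i_{a_1a_3} - i_{a_2a_3} + i_{a_1a_2a_3} \\
\mu_{000} &= \mu - a_1 - a_2 - a_3 + i_{a_1a_2} + i_{a_1a_3} + i_{a_2a_3} - i_{a_1a_2a_3}
\end{aligned}
\]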
where $\mu$ is the overall mean; the $a$'s are the additive effects at different QTLs, and the $i$'s are two- or three-way epistatic interactions between the QTLs. The procedure for mapping high-order epistasis was used to estimate the additive, additive × additive and additive × additive × additive genetic effects on plant height in this DH population.
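The eight equations above form an orthogonal linear system, so the effects can be recovered from the eight genotypic means by a single matrix product. A minimal sketch follows, with hypothetical names and hypothetical effect values; genotype 1 is coded +1 and genotype 0 is coded -1.

```python
import numpy as np
from itertools import product

# Genotypes in the order used in the text: 111, 110, 101, 100, 011, ..., 000.
genotypes = list(product((1, 0), repeat=3))

def dh_row(j1, j2, j3):
    """Coefficients of (mu, a1, a2, a3, i12, i13, i23, i123) for one genotype."""
    x1, x2, x3 = (2 * j - 1 for j in (j1, j2, j3))   # 1 -> +1, 0 -> -1
    return [1, x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3]

D = np.array([dh_row(*g) for g in genotypes], dtype=float)

def dh_effects(means):
    """Solve the eight equations; the columns of D are orthogonal, so D^-1 = D.T / 8."""
    return D.T @ np.asarray(means, dtype=float) / 8.0

# Round trip on hypothetical effects (mu, a1, a2, a3, i12, i13, i23, i123).
effects_true = np.array([100.0, 1.0, 1.0, 10.0, -5.0, -9.0, -5.0, 15.0])
means = D @ effects_true
effects_hat = dh_effects(means)
```

Each effect is therefore a signed average of the eight genotypic means, which is the DH counterpart of the solved form used for the F2 model.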
By simultaneously searching for three QTLs at every 4 cM throughout the entire genome, we detected three locations, between markers RG403–RG229 on chromosome 5, markers RZ337B–CDO497 on chromosome 7 and markers RG667–RG451 on chromosome 9, which jointly affect plant height. Figure 2 shows a portion of the overall LR plot against the searched positions S, with the LR peak at (64, 288, 440) that corresponds to the locations of the three QTLs on chromosomes 5, 7 and 9. Three-dimensional plots are displayed for cycles 1–4 in Figure 3, where the x- and y-coordinates are the b and c
Figure 2: A portion of the LR versus genome plot (top). For a demonstration, cycle 2 is magnified to show a clearer picture of the probable location of the QTL at a = 64, b = 288 and c = 440 (bottom). Because a three-way model deals with an exponentially increasing number of combinations, a genome-wide critical threshold may be too conservative. For this reason, we determined the critical threshold chromosome-wise from 200 permutation tests [42]. That is, the proposed three-way epistatic model was applied 200 times to the three chromosomes where the three QTLs were located. At each application, the height measurements were permuted against the corresponding markers to remove any marker-QTL association and simulate a null distribution. All 200 LR values resulting from each application of the model were ranked and the 95th percentile was 45.96, which is below the global maximum LR value of 56.17. There are eleven cycles on the top plot, each of which corresponds to a fixed position for a and all possible values for b and c. The bottom plot is a magnified view for cycle 2 where the global peak is located. It shows roughly ten cycles, each of which corresponds to a fixed value for b and all possible values for c.
search positions, respectively, and the z-coordinate is
the LR value.
Table 1 gives the MLEs of the locations and genetic effects of the three QTLs detected. The QTL on chromosome 9, linked to marker RG667, exerts a highly significant additive genetic effect ($a_3$) on plant height, but the additive effects of the two QTLs on chromosomes 5 and 7 are not significant. The genomic region of marker RG667 was shown to harbor a QTL that affects a plant height-corrected trait, the number of productive tillers in rice [48]. The alleles derived from the tall parent Azucena contribute favorably to height growth. There is a significant two-way epistatic interaction ($i_{a_1a_3}$) between the additive effects at the non-significant QTL on chromosome 5 and the significant QTL on chromosome 9, but the ‘tall’ alleles (derived from parent Azucena) at these two QTLs interact to inhibit the plant height growth of rice. Interestingly, a highly significant three-way epistatic interaction ($i_{a_1a_2a_3}$) occurs among the additive effects at the three QTLs, favorably increasing rice height growth.
Computer simulation
We simulated a backcross design with two genotypes, 1 and 0, at each locus and randomly generated nine markers equally spaced on a linkage map of 225 cM. Let three putative QTLs be located 10 cM from the third, sixth and eighth markers, respectively. Phenotypic values were then simulated by summing the genotypic value of a specific three-QTL genotype and a residual error that follows a normal distribution with mean zero and variance $\sigma^2$. The simulation studies were designed for different sample sizes ($n = 100$ and 400) and different heritabilities ($H^2 = 0.1$ and 0.4). The value of the residual variance was determined by the level of heritability.
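A simulation of this kind can be sketched as follows. Several details are assumptions of the sketch rather than statements from the text: the Haldane map function, QTLs placed 10 cM to the right of their markers, the effect values, and a residual variance computed from the sample variance of the simulated genotypic values. All names are hypothetical.

```python
import numpy as np

def haldane(d_cM):
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - np.exp(-2.0 * d_cM / 100.0))

def simulate_backcross(n, positions_cM, rng):
    """Simulate 0/1 genotypes at ordered loci for n backcross progeny."""
    pos = np.asarray(positions_cM, dtype=float)
    geno = np.empty((n, len(pos)), dtype=int)
    geno[:, 0] = rng.integers(0, 2, size=n)
    for k in range(1, len(pos)):
        # A recombination between adjacent loci flips the genotype.
        recomb = rng.random(n) < haldane(pos[k] - pos[k - 1])
        geno[:, k] = np.where(recomb, 1 - geno[:, k - 1], geno[:, k - 1])
    return geno

def simulate_phenotypes(qtl_geno, effects, h2, rng):
    """Genotypic value plus normal residual scaled to heritability h2."""
    x = 2 * qtl_geno - 1                          # 1 -> +1, 0 -> -1
    mu, a1, a2, a3, i12, i13, i23, i123 = effects
    g = (mu + a1 * x[:, 0] + a2 * x[:, 1] + a3 * x[:, 2]
         + i12 * x[:, 0] * x[:, 1] + i13 * x[:, 0] * x[:, 2]
         + i23 * x[:, 1] * x[:, 2] + i123 * x[:, 0] * x[:, 1] * x[:, 2])
    sigma2 = g.var() * (1.0 - h2) / h2            # from h2 = Vg / (Vg + sigma2)
    y = g + rng.normal(0.0, np.sqrt(sigma2), size=len(g))
    return y, g, sigma2

rng = np.random.default_rng(4)
marker_pos = np.linspace(0.0, 225.0, 9)           # nine equally spaced markers
qtl_pos = marker_pos[[2, 5, 7]] + 10.0            # right of markers 3, 6 and 8
all_pos = np.sort(np.concatenate([marker_pos, qtl_pos]))
is_qtl = np.isin(all_pos, qtl_pos)

geno = simulate_backcross(100, all_pos, rng)
markers, qtl = geno[:, ~is_qtl], geno[:, is_qtl]
effects = (10.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 1.0)   # hypothetical values
y, g, sigma2 = simulate_phenotypes(qtl, effects, h2=0.4, rng=rng)
```

Only `markers` and `y` would be passed to the mapping procedure; the QTL genotypes are kept aside to check the estimates.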
Tables 2 and 3 tabulate the estimates of QTL
locations and effect parameters from the simulated
data based on 100 simulation replicates. In general,
the locations and genotypic values of the QTLs and
the residual variance can be estimated well even with
a modest sample size (100) and heritability
(0.1) (Table 2). Increasing the sample size and heritability
can remarkably improve the estimation precision
of these parameters. The estimation precision of the
genetic effect parameters depends on the type of
the parameters (Table 3). The additive effect can
be estimated most precisely, followed by two-way
interaction effects and three-way interaction effects.
Also, the estimates of the genetic effect parameters
were found to be sensitive to sample size and/or
heritability. Yet, the increase of heritability from
0.1 to 0.4 produces a much better efficiency in im-
proving estimation precision than that of sample size
from 100 to 400. This suggests that a better manage-
ment of plants, aimed to minimize experimental
errors, will contribute more substantially to mapping
precision than a simple increase of sample size.
In general, if a trait has a high heritability, a sample size of 100 is adequate for reasonable estimation of the additive and epistatic effects. For a modest heritability (say 0.1), 400 samples are needed. By increasing the sample size to 1000, it was found that all estimates can be improved even when the trait has a low heritability (approximately 0.05). It is always important to investigate the power of the model to identify significant genetic effects given a particular sample size. We calculated the power using computer simulation. With the value of a particular genetic effect, which is used to determine the magnitude of the residual variance for a given heritability, phenotypic and marker data are simulated. The proportion of significant simulation replicates (i.e. those in which the effect is found to be significant) among the total number of simulation replicates is empirically regarded as the statistical power for identifying this genetic
Table 1: The MLEs of the QTL locations and effects for plant height growth in a DH mapping population of rice
Chr. Marker interval Map distance Main effect Two-way epistasis Three-way epistasis
5 RG403-RG229 2.1 cM a1 ¼ 1:05ia1a2 ¼ �4:76
7 RZ337B-CDO497 10.7 cM a2 ¼ 1:05 ia1a3 ¼ �8:77 ia1a2a3 ¼ 15:06ia1a2 ¼ �4:76
9 RG667-RG451 15.6 cM a3 ¼ 11:52
Map distancemeans the distance of the QTL from the leftmaker for an interval.
Mapping high-order epistasis for complex traits 309 by guest on January 13, 2016
http://bib.oxfordjournals.org/D
ownloaded from
effect. We ran an additional simulation to examine
the power of the model for detecting three-way
interactions. For a quantitative trait with a modest
heritability (say 0.1), we detected power of �0.70 for
detecting three-way epistasis if a sample size of 400
was used. Under the same heritability, the power
increases to >0.9 if the sample size increases to 800.
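The empirical power calculation described in the simulation study can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes that the three QTL genotypes are observed directly and segregate independently in a DH population (coded +1 or −1), that only the three-way effect is non-zero, and it swaps in a normal-approximation two-group z test in place of the likelihood ratio test with a permutation threshold. The function names and default settings are hypothetical, and the sketch is not meant to reproduce the power values reported above.

```python
import random
from statistics import NormalDist, mean, variance


def residual_sd(effect, h2):
    """Residual SD such that the single effect explains a fraction h2 of the variance.

    Assumes the +1/-1 contrast is balanced, so its genetic variance is effect**2.
    """
    return (effect ** 2 * (1.0 - h2) / h2) ** 0.5


def empirical_power(effect, sd, n, n_rep=200, alpha=0.05, seed=1):
    """Share of simulated replicates in which the three-way effect is significant."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n_significant = 0
    for _ in range(n_rep):
        plus, minus = [], []
        for _ in range(n):
            # Three independent QTL genotypes of a DH line, coded +1 or -1.
            x1, x2, x3 = (rng.choice((1, -1)) for _ in range(3))
            contrast = x1 * x2 * x3
            y = effect * contrast + rng.gauss(0.0, sd)
            (plus if contrast == 1 else minus).append(y)
        if len(plus) < 2 or len(minus) < 2:
            continue  # too unbalanced to test; counted as not significant
        se = (variance(plus) / len(plus) + variance(minus) / len(minus)) ** 0.5
        z = (mean(plus) - mean(minus)) / se
        if abs(z) > critical:
            n_significant += 1
    return n_significant / n_rep


if __name__ == "__main__":
    sd = residual_sd(effect=0.5, h2=0.02)
    for n in (100, 400, 800):
        print(n, empirical_power(0.5, sd, n))
```

Fixing the seed makes the estimate repeatable. In practice the number of replicates would be chosen so that the Monte Carlo error of the power estimate is acceptably small.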
DISCUSSION
Understanding how the information contained in genotypes is translated
into complex phenotypic traits remains a major challenge in biological
research. Although the process has not yet been described precisely,
existing evidence shows that it involves a multilayered hierarchy of
regulatory networks in which genes and their products from different
levels or stages interact and coordinate to form a final phenotype. It
is highly likely that interactions involving more than two genes, i.e.
so-called high-order interactions, play a central role in coordinating
these networks [29, 30]. Although the contribution of epistatic
interactions to quantitative genetic variation has been increasingly
recognized by population and evolutionary biologists [1, 2, 4] and
medical geneticists [5, 33], the impact of high-order epistasis on
phenotypic diversity has not been carefully explored. Results from a
limited number of quantitative genetic studies show that high-order
epistasis could be correlated with certain cytological phenomena [49]
and growth traits [50].
In this article, we review a mapping model for characterizing genetic
interactions of multiple orders that are responsible for complex
traits. The model is founded on the general framework of genetic
mapping with molecular maps, allowing a genome-wide search for
multilocus interactions. The model has several advantages. First, it
can test the relative importance of different types of genetic effects,
including main additive effects, low-order epistasis and high-order
epistasis, and thus provides an unprecedented opportunity to study the
detailed atlas of genetic control mechanisms for complex phenotypes.
From a biological perspective, it is possible that a single gene does
not by itself trigger a significant effect on a phenotype, but exerts a
remarkable impact on the phenotype through epistasis with other genes
involved in key pathways that form the final phenotype [29, 30, 33].
Second, we derived a closed form for estimating the genetic effects of
different types, which improves computational efficiency and
facilitates implementation in a software package. More interestingly,
closed forms for parameter estimation also exist for reduced models, in
which the test focuses on a particular subset of parameters.

Table 2: The averaged MLEs of the QTL positions and genotypic values and their standard errors (given in parentheses) under different sample sizes (n) and heritabilities (H²) based on 100 simulation replicates

Parameter        H²=0.1, n=100  H²=0.1, n=400  H²=0.4, n=100  H²=0.4, n=400  True value
QTL 1 location   43(3.73)       48(2.94)       52(2.23)       60(0.34)       60
QTL 2 location   101(4.42)      109(4.18)      117(3.25)      134(0.66)      135
QTL 3 location   157(4.05)      165(4.03)      172(3.17)      185(0.60)      185
μ111             148.88(0.17)   149.08(0.09)   149.02(0.07)   149.05(0.03)   149
μ110             148.79(0.44)   148.80(0.24)   148.43(0.16)   148.10(0.05)   148
μ101             151.32(0.67)   150.63(0.52)   150.42(0.26)   150.01(0.09)   150
μ100             151.11(0.54)   150.06(0.45)   149.49(0.16)   149.07(0.05)   149
μ011             151.62(0.54)   150.36(0.49)   151.48(0.15)   151.99(0.04)   152
μ010             147.20(0.79)   148.34(0.55)   147.77(0.31)   147.12(0.08)   147
μ001             150.88(0.47)   152.18(0.27)   152.36(0.24)   152.90(0.05)   153
μ000             151.79(0.18)   151.79(0.10)   152.02(0.08)   152.00(0.03)   152
σ²               5.19(0.05)     5.83(0.02)     2.22(0.02)     2.43(0.01)     6/2.45

The μ rows are the 3-QTL genotype means; the true value of σ² is listed as 6 for H² = 0.1 and 2.45 for H² = 0.4.
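One way such a closed form can arise is sketched below. It assumes that the two homozygous genotypes at each QTL in the DH population are coded +1 and −1 and that each genotypic mean is the sum of the overall mean, the additive effects and the products of the codes with the epistatic effects. Under that assumed coding, each effect is a signed average of the eight genotypic means. The coding and all names are illustrative assumptions rather than the authors' derivation, although the true values listed in Tables 2 and 3 are consistent with it.

```python
from itertools import product

# Genotypes in the order used in Table 2: 111, 110, 101, 100, 011, 010, 001, 000.
GENOTYPES = list(product((1, 0), repeat=3))

# Loci (0-based) whose +1/-1 codes are multiplied to form each contrast.
EFFECT_TERMS = {
    "mu": (),
    "a1": (0,), "a2": (1,), "a3": (2,),
    "i_a1a2": (0, 1), "i_a1a3": (0, 2), "i_a2a3": (1, 2),
    "i_a1a2a3": (0, 1, 2),
}


def effects_from_means(means):
    """Map eight genotypic means (ordered as GENOTYPES) to the genetic effects."""
    if len(means) != len(GENOTYPES):
        raise ValueError("expected one mean per three-QTL genotype")
    effects = {}
    for name, loci in EFFECT_TERMS.items():
        total = 0.0
        for genotype, value in zip(GENOTYPES, means):
            sign = 1
            for locus in loci:
                sign *= 1 if genotype[locus] == 1 else -1  # assumed +1/-1 coding
            total += sign * value
        effects[name] = total / len(GENOTYPES)
    return effects


if __name__ == "__main__":
    true_means = [149, 148, 150, 149, 152, 147, 153, 152]  # true values, Table 2
    for name, value in effects_from_means(true_means).items():
        print(f"{name}: {value:+.2f}")
```

Applying the same function to the averaged means for H² = 0.4 and n = 400 in Table 2 gives an overall mean of about 150.03 and additive effects close to the corresponding entries of Table 3.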
The model was validated by reanalyzing a real data set for genetic
mapping of plant height in rice [31]. On the one hand, this reanalysis
supports the usefulness of the model in practice. On the other hand,
the new model has led to new discoveries about the genetic control of
plant height growth in rice. In previous studies, several significant
QTLs were detected for height growth in this mapping population
[51, 52]. For example, Zhao et al. [52] identified QTLs on chromosomes
1, 3, 7, 9 and 11. In addition to the same QTLs identified on
chromosomes 7 and 9, the new model has detected a QTL on chromosome 5.
Although this new QTL has no significant main additive effect, it
interacts epistatically with the one on chromosome 9 to determine the
final plant height of rice. Interestingly, these two QTLs, along with
the one on chromosome 7, display a highly significant three-way
additive × additive × additive epistatic effect that increases or
decreases plant height by ~15 cm. It should be noted that Zhao et al.
[52] may have detected more QTLs than the new model because their
analysis makes use of height data at multiple time points. To test the
statistical behavior of the new model, simulation studies were
performed, suggesting that three-way epistasis can be estimated well
when an adequately large sample size is used.

Figure 3: The LR value in three dimensions. The four plots correspond to the four cycles in Figure 2. The rectangle is the 200 permutation cutoff.

Table 3: The averaged MLEs of the QTL positions and genetic effects and their standard errors (given in parentheses) under different sample sizes (n) and heritabilities (H²) based on 100 simulation replicates.

H²          n    μ             a1            a2            a3            ia1a2         ia1a3         ia2a3         ia1a2a3
0.1         100  150.20(0.15)  −0.17(0.14)   −1.08(0.16)   0.48(0.18)    −0.11(0.21)   −0.40(0.20)   0.65(0.19)    −0.68(0.21)
0.1         400  150.15(0.11)  −0.51(0.13)   −1.01(0.12)   0.41(0.16)    0.31(0.13)    −0.20(0.13)   0.17(0.13)    −0.24(0.16)
0.4         100  150.13(0.06)  −0.78(0.06)   −0.95(0.07)   0.70(0.08)    0.33(0.08)    −0.31(0.07)   0.38(0.07)    −0.46(0.07)
0.4         400  150.03(0.02)  −0.97(0.02)   −0.97(0.02)   0.96(0.02)    0.48(0.02)    −0.48(0.02)   0.50(0.02)    −0.49(0.02)
True value       150           −1            −1            1             0.50          −0.50         0.50          −0.50

a1, a2 and a3 are additive effects; the remaining columns are interaction effects.
With the model described in this article, it is now possible to
investigate whether three-way epistasis is a widespread phenomenon in
plant height by reanalyzing published mapping data or designing new
mapping experiments. In practice, a genetic effect cannot be estimated
precisely from a single study. The uncertainty about the chromosomal
locations and genetic effects of QTLs can be reduced by replicating the
same experiments, because estimates of these QTL parameters from
multiple replicates are closer to the true parameter values.
As genome-wide association studies (GWAS) have emerged as a useful tool
for plant, animal and human genetics [53–57], it is crucial to
incorporate the multilocus epistasis detection model into GWAS to chart
a network of genetic interactions throughout the genome. The current
model specification does not consider environmental factors. Given its
importance, genotype × environment interaction should be integrated
into the high-order epistasis model [12]. It is also worthwhile to
model the pleiotropic effects of high-order epistasis on different
aspects of phenotypic traits [58, 59]. The major factor limiting these
extensions is the combinatorial search over a very large number of
interactions using a much smaller number of samples. However, the
recent availability of feature selection methods [60], equipped with
efficient computing algorithms such as genetic programming [13],
provides an unprecedented opportunity to produce a useful statistical
toolbox for dissecting complex phenotypes into their genetic components
at different levels. The computer code used to detect and test
high-order epistasis is available at http://statgen.psu.edu.
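The combinatorial burden mentioned in the discussion can be made concrete with a short count of candidate interactions. The sketch below simply counts the unordered sets of loci of a given order. The figure of 1000 loci is a hypothetical value chosen for illustration, not a number taken from this study, and the function name is likewise hypothetical.

```python
from math import comb


def n_interactions(n_loci, order):
    """Number of distinct interactions of a given order among n_loci loci."""
    return comb(n_loci, order)


if __name__ == "__main__":
    n_loci = 1000  # hypothetical number of loci, for illustration only
    for order in (2, 3, 4):
        print(order, n_interactions(n_loci, order))
```

With 1000 loci there are 499500 pairs but 166167000 triplets, so the number of tests quickly exceeds any realistic sample size as the interaction order grows.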
Key Points
• Despite considerable efforts to dissect the genetic architecture of complex traits, much still remains unclear, including the distribution, mechanisms and importance of genetic interactions. High-order epistasis due to multilevel interactions of genes is thought to be the hidden genetic variation that has not been utilized in agriculture and biomedicine.
• Genetic mapping, now used as a routine approach for studying the genetic regulation of quantitative traits, has a unique power to characterize pair-wise epistatic interactions. However, there is still no in-depth exploration to estimate and test the genetic effects due to interactions among three or more genes.
• We formulate and assess a state-of-the-art statistical procedure of implementing genetic mapping to detect high-order epistasis, filling a gap that occurs in quantitative genetics, evolutionary genetics and medical genetics. We argue that high-order epistatic mapping can serve as a routine tool to comprehend the genetic architecture of complex traits.
ACKNOWLEDGEMENTS
The authors thank Prof. Jun Zhu at Zhejiang University for providing
the rice data to validate the high-order epistatic mapping model.
FUNDING
The Special Fund for Forestry Scientific Research in the Public
Interest (No. 201004017), NSF/IOS-0923975; the Changjiang Scholarship
Award; and ‘Thousand-person Plan’ Award.
References
1. Whitlock MC, Phillips PC, Moore BG, et al. Multiple fitness peaks and epistasis. Ann Rev Ecol Syst 1995;26:601–29.
2. Wolf JB, Brodie EE III, Wade MJ. Epistasis and the Evolutionary Process. Oxford: Oxford University Press, 2000.
3. Carlborg O, Haley CS. Epistasis: too often neglected in complex trait studies? Nat Rev Genet 2004;5:618–25.
4. Phillips PC. Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet 2008;9:855–67.
5. Moore J, Williams S. Epistasis and its implications for personal genetics. Am J Hum Genet 2009;85:309–20.
6. Huang LS, Sternberg PW. Genetic dissection of developmental pathways. Methods Cell Biol 1995;48:97–122.
7. Wu RL, Ma CM, Lin M, et al. A general framework for analyzing the genetic architecture of developmental characteristics. Genetics 2004;166:1541–51.
8. Imielinski M, Belta C. Deep epistasis in human metabolism. CHAOS 2010;20:026104.
9. Martin MP, Gao X, Lee JH, et al. Epistatic interaction between KIR3DS1 and HLA-B delays the progression to AIDS. Nat Genet 2003;3:429–34.
10. Gabutero E, Moore C, Mallal S, et al. Interaction between allelic variation in IL12B and CCR5 affects the development of AIDS. AIDS 2007;21:65–9.
11. Hinkley T, Martins J, Chappey C, et al. A systems analysis of mutational effects in HIV-1 protease and reverse transcriptase. Nat Genet 2011;43:487–9.
12. Lukens LN, Doebley J. Epistatic and environmental interactions for quantitative trait loci involved in maize evolution. Genet Res 1999;74:291–302.
13. Nunkesser R, Bernholt T, Schwender H, et al. Detecting high-order interactions of single nucleotide polymorphisms using genetic programming. Bioinformatics 2009;23:3280–8.
14. Ritchie MD, Hahn LW, Roodi N, et al. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet 2001;69:138–47.
15. McMullen MD, Byrne PF, Snook ME, et al. Quantitative trait loci and metabolic pathways. Proc Natl Acad Sci U S A 1998;95:1996–2000.
16. Stich B, Yu J, Melchinger AE, et al. Power to detect higher-order epistatic interactions in a metabolic pathway using a new mapping strategy. Genetics 2007;176:563–70.
17. Gutierrez J. A developmental systems perspective on epistasis: Computational exploration of mutational interactions in model developmental regulatory networks. PLoS One 2009;4(9):e6823.
18. Pettersson M, Besnier F, Siegel PB, et al. Replication and explorations of high-order epistasis using a large advanced intercross line pedigree. PLoS Genet 2011;7(7):e1002180.
19. Wang Z, Liu T, Lin Z, et al. A general model for multilocus epistatic interactions in case-control studies. PLoS One 2010;5(8):e11384.
20. Lander ES, Botstein D. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 1989;121:185–99.
21. Zeng Z-B. Precision mapping of quantitative trait loci. Genetics 1994;136:1457–68.
22. Xu S. Estimating polygenic effects using markers of the entire genome. Genetics 2003;163:789–801.
23. Wu RL, Ma CX, Casella C. Statistical Genetics of Quantitative Traits: Linkage, Maps, and QTL. New York: Springer-Verlag, 2007.
24. Kao C-H, Zeng Z-B. Modeling epistasis of quantitative trait loci using Cockerham’s model. Genetics 2002;160:1243–61.
25. Shah SH, Schmidt MA, Mei H, et al. Searching for epistatic interactions in nuclear families using conditional linkage analysis. BMC Genetics 2005;6(Suppl 1):S148.
26. Li Y, Berg A, Chang MN, et al. A statistical model for genetic mapping of viral infection by integrating epidemiological behavior. Statist Appl Genet Mol Biol 2009;8(1):Article 38.
27. Yi NJ, Shriner D, Banerjee S, et al. An efficient Bayesian model selection approach for interacting QTL models with many effects. Genetics 2007;176:1865–77.
28. Hansen TF, Wagner GP. Epistasis and the mutation load: a measurement-theoretical approach. Genetics 2001;158:477–85.
29. Beerenwinkel S, Pachter L, Sturmfels B, et al. Analysis of epistatic interactions and fitness landscapes using a new geometric approach. BMC Evol Biol 2007;7:60.
30. Imielinski M, Belta C. Exploiting the pathway structure of metabolism to reveal high-order epistasis. BMC Syst Biol 2008;2:40.
31. Huang N, Parco A, Mew T, et al. RFLP mapping of isozymes, RAPD and QTLs for grain shape, brown planthopper resistance in a doubled haploid rice population. Mol Breed 1997;3:105–13.
32. Bateson W. Mendel’s Principles of Heredity. Cambridge, UK: Cambridge University Press, 1909.
33. Steen KV. Travelling the world of gene-gene interactions. Brief Bioinform 2012;13:1–19.
34. Elliger CA, Chan BG, Waiss AC, et al. C-glycosylflavones from Zea mays that inhibit insect development. Phytochemistry 1980;19:293–7.
35. Franken P, Niesbach-Klosgen U, Weydemann U, et al. The duplicated chalcone synthase genes C2 and Whp (white pollen) of Zea mays are independently regulated; evidence for translational control of Whp expression by the anthocyanin intensifying gene in. EMBO J 1991;10:2605–12.
36. Grotewold E, Peterson T. Isolation and characterization of a maize gene encoding chalcone flavonone isomerase. Mol Gen Genet 1994;242:1–8.
37. Styles ED, Ceska O. Genetic control of 3-hydroxy- and 3-deoxy-flavonoids in Zea mays. Phytochemistry 1975;14:413–5.
38. Heller W, Forkmann G. Biosynthesis of flavonoids. In: Harborne JB, (ed). The Flavonoids: Advances in Research Since 1986. London: Chapman & Hall, 1994;499–535.
39. Marrs KA, Alfenito MR, Lloyd AM, et al. A glutathione S-transferase involved in vacuolar transfer encoded by the maize gene Bronze-2. Nature 1995;375:397–400.
40. Fisher RA. The correlations between relatives on the supposition of Mendelian inheritance. Trans R Soc Edinb 1918;52:399–433.
41. Mather K, Jinks JL. Biometrical Genetics. 3rd edn. London: Chapman & Hall, 1982.
42. Churchill GA, Doerge RW. Empirical threshold values for quantitative trait mapping. Genetics 1994;138:963–71.
43. Wang CG, Wang Z, Luo JT, et al. A model for transgenerational imprinting variation in complex traits. PLoS One 2010;5(7):e11396.
44. Dupuis J, Siegmund D. Statistical methods for mapping quantitative trait loci from a dense set of markers. Genetics 1999;151:373–86.
45. Piepho H-P. A quick method for computing approximate thresholds for quantitative trait loci detection. Genetics 2001;157:425–32.
46. Chang MR, Wu RL, Wu S, et al. Score statistics of quantitative trait locus mapping. Stat Appl Genet Mol Biol 2009;8(1):Article 16.
47. Wu JS, Zhang B, Cui YH, et al. Genetic mapping of developmental instability: Design, model and algorithm. Genetics 2007;176:1187–96.
48. Senthilvel S, Vinod KK, Senthilvel P, et al. QTL and QTL × environment effects on agronomic and nitrogen acquisition traits in rice. J Integ Plant Biol 2008;50:1108–17.
49. Bartual R, Lacasa A, Marsal JI, et al. Epistasis in the resistance of pepper to phytophthora stem blight (Phytophthora capsici L.) and its significance in the prediction of double cross performances. Euphytica 1994;72:149–52.
50. Wu RL. Detecting epistatic genetic variance with a clonally replicated design: models for low- vs. high-order nonallelic interaction. Theor Appl Genet 1996;93:102–09.
51. Yan JQ, Zhu J, He CX, et al. Molecular dissection of developmental behavior of plant height in rice (Oryza sativa L.). Genetics 1998;150:1257–65.
52. Zhao W, Zhu J, Gallo-Meagher M, et al. A unified statistical model for functional mapping of genotype by environment interactions for ontogenetic development. Genetics 2004;168:1751–62.
53. Gayan J, Gonzalez-Perez A, Bermudo F, et al. A method for detecting epistasis in genome-wide studies using case-control multi-locus association analysis. BMC Genomics 2007;9:360.
54. Huang X, Wei X, Sang T, et al. Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet 2010;42:961–7.
55. Marchini J, Donnelly P, Cardon LR. Genome-wide strategies for detecting multiple loci that influence complex diseases. Nat Genet 2005;37:413–7.
56. Wang WY, Barratt BJ, Clayton DG, et al. Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 2005;6:109–18.
57. Wan X, Yang C, Yang Q, et al. Predictive rule inference for epistatic interaction detection in genome-wide association studies. Bioinformatics 2010;26:30–7.
58. Hlavacek WS, Faeder JR. The complexity of cell signaling and the need for a new mechanics. Sci Signal 2009;2(81):pe46.
59. Weng GZ, Bhalla US, Iyengar R. Complexity in biological signaling systems. Science 1999;284:92–6.
60. Li JH, Das K, Fu G, et al. Bayesian lasso for genome-wide association studies. Bioinformatics 2011;27:516–23.