
A statistical procedure to map high-order epistasis for complex traits

Xiaoming Pang*, Zhong Wang*, John S. Yap*, Jianxin Wang, Junjia Zhu, Wenhao Bo, Yafei Lv, Fang Xu, Tao Zhou, Shaofeng Peng, Dengfeng Shen and Rongling Wu

Submitted: 22nd February 2012; Received (in revised form): 27th April 2012

Abstract

Genetic interactions, or epistasis, are thought to play a pivotal role in shaping the formation, development and evolution of life. Previous work focused on lower-order interactions between a pair of genes, which is inadequate to explain a complex network of genetic interactions and pathways. We review and assess a statistical model for characterizing high-order epistasis among more than two genes or quantitative trait loci (QTLs) that control a complex trait. The model includes a series of state-of-the-art standard procedures for estimating and testing the nature and magnitude of QTL interactions. Results from simulation studies and real data analysis warrant the statistical properties of the model and its usefulness in practice. High-order epistatic mapping will provide a routine procedure for charting a detailed picture of the genetic regulation mechanisms underlying the phenotypic variation of complex traits.

Keywords: Epistasis; high-order interactions; quantitative trait loci; EM algorithm

Xiaoming Pang obtained his PhD in Pomology at Huazhong Agricultural University in 2002. After post-doctoral training and research in China and Japan, he joined Beijing Forestry University as Associate Professor of Tree Breeding in 2006. His research interest focuses on the utilization of molecular genetics and biotechnologies to study population genetic diversity and map quantitative trait loci in fruit trees.

Zhong Wang obtained his PhD in Engineering Mechanics at Dalian University of Technology in 2000. He founded and managed a software company in Japan from 2000 to 2008. He is Visiting Scholar in the Center for Computational Biology at Beijing Forestry University. He writes computer software for statistical genetic models.

John S. Yap obtained his PhD in Statistics at the University of Florida in 2007. He monitors the accuracy and reasonableness of statistical approaches that are used in drug discovery, development and delivery. He also develops new statistical models for genetic mapping.

Jianxin Wang obtained his PhD in Computer Science at the University of Science and Technology Beijing in 2001. He is Associate Professor of Informatics at Beijing Forestry University and currently a visiting scholar at the Pennsylvania State University. His research interest is in computational bioinformatics in biology.

Junjia Zhu obtained his PhD in Statistics at the Pennsylvania State University in 2008. He was Assistant Professor in the Department of Mathematics and Computer Sciences at Penn State Harrisburg from 2008 to 2010, and is currently Assistant Professor in the Division of Biostatistics in the Department of Public Health Sciences at the Pennsylvania State College of Medicine, Hershey. His research interest is in computational statistics and statistical applications, particularly in modeling human diseases.

Wenhao Bo is a PhD candidate in Forest Genetics and Tree Breeding in the Center for Computational Biology at Beijing Forestry University. His research focuses on the genetic mapping of quantitative traits in Populus.

Yafei Lv is a Master's student in Forest Genetics and Tree Breeding in the Center for Computational Biology at Beijing Forestry University. His research focuses on the statistical genetics of quantitative traits in forest trees.

Fang Xu is a Master's student in Forest Genetics and Tree Breeding in the Center for Computational Biology at Beijing Forestry University. His research focuses on the genetic diversity and evolution of Populus using molecular markers.

Tao Zhou is a PhD candidate in Forest Genetics and Tree Breeding in the Center for Computational Biology at Beijing Forestry University. His research focuses on the study of the genetic mechanisms for adaptive response to environmental stress in Populus.

Shaofeng Peng is a PhD student in Forest Genetics and Tree Breeding in the Center for Computational Biology at Beijing Forestry University. Her research focuses on the use of molecular biotechnologies to study the population genetics of forest trees.

Dengfeng Shen is a PhD student in Forest Genetics and Tree Breeding in the Center for Computational Biology at Beijing Forestry University. Her research focuses on the study of ecological genetics and evolutionary genetics in Populus.

Rongling Wu obtained his PhD in Quantitative Genetics at the University of Washington in 1995. He is Changjiang Scholars Professor of Genetics and the Director of the Center for Computational Biology at Beijing Forestry University. His interest is to unravel the genetic roots for the outcome of biological traits by dissecting the traits into their biochemical and developmental pathways.

Corresponding author. Rongling Wu, Changjiang Scholars Professor of Molecular Breeding, Director, Center for Computational Biology, College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing 100083, China. Tel: +086 10 6233 6283. Fax: +086 10 6233 6164. E-mail: [email protected]

*These authors contributed equally to this work.

BRIEFINGS IN BIOINFORMATICS. VOL 14. NO 3. 302–314. doi:10.1093/bib/bbs027. Advance Access published on 20 June 2012.

© The Author 2012. Published by Oxford University Press. For Permissions, please email: [email protected]

INTRODUCTION

The past decade has been a critical period in which some phenomena related to genetic architecture have been recognized anew. For example, epistasis has long been thought to be an important force for evolution and speciation [1, 2], but recent genetic studies from vast quantities of molecular data have increasingly indicated that epistasis also critically affects the pathogenesis of most inherited human diseases, such as cancer or cardiovascular disease [3–5], the developmental process and pattern of traits [6–8], susceptibility to HIV [9, 10] and virus drug resistance [11]. The expression of an interconnected network of genes is contingent upon environmental conditions, often with the elements and connections of the network displaying non-linear relationships with environmental factors [12]. Not only do these elements interact with each other in a pair-wise manner, they also form a complicated web of high-order interactions [13]. Because the embryonic expression pattern of a complex trait undergoes a sequence of metabolic pathways, such an interaction web should involve multiple interacting gene products and regulatory loci [8, 14–19].

Methodologically, it is a challenge to test and quantify genetic interactions among multiple genes. Genetic mapping with molecular linkage maps has proven to be powerful for the genome-wide detection of specific genes or quantitative trait loci (QTLs) for complex traits [20–23]. This approach has now been extended to search for epistatic interactions between different genes in controlled crosses [24], nuclear families [25], natural populations [26] and case-control designs [19]. Wu et al. [7] incorporated an epistatic model to study the genetic control of developmental trajectories for a complex trait. Several Bayesian approaches that allow an efficient search for pair-wise epistasis throughout the genome have been developed [27]. These strategies for genetic mapping can, however, be extended to accommodate genetic interactions among more than two QTLs, making it possible to elucidate a detailed picture of the genetic architecture of complex traits.

In a theoretical exploration by computer simulation, Stich et al. [16] found that genetic mapping has adequate power for the detection of three-way interactions while maintaining a low false-positive rate. Several authors have given mathematical descriptions of high-order epistasis derived from regulatory networks [28–30]. These advances in the mathematical and statistical modeling of high-order epistasis provide an incentive to study this complex genetic phenomenon. The purpose of this article is to describe and assess a general procedure for a genome-wide search for high-order epistasis involving more than two QTLs using a genetic mapping strategy. This procedure integrates traditional quantitative genetic theory into genetic mapping, allowing the discernment of epistasis at different orders. The procedure is reviewed and tested in a genetic mapping study of rice with a doubled haploid population [31], in which significant three-way additive × additive × additive epistasis was identified. Computer simulation was used to investigate the statistical behavior of the model and algorithm for three-way epistatic mapping.

HIGH-DIMENSIONAL GENETIC MODELING

Why high-order epistasis?

Epistasis is the masking of the phenotype of one allele by the phenotype of an allele at another locus [32, 33]. Since a phenotypic trait involves an intricate network of biochemical reactions affected by multiple interacting gene products and regulatory loci, it is likely that genes generate higher-order epistatic interactions [6, 8, 15, 16, 30]. For example, maize (Zea mays L.) resists the corn earworm Helicoverpa zea (Boddie), a major insect pest of crops in the United States and elsewhere in the Western Hemisphere, because of the C-glycosyl flavones maysin, apimaysin and methoxymaysin synthesized in silks [34]. As a resistance phenotype, the biosynthesis of maysin, apimaysin and methoxymaysin undergoes a complex network of metabolic pathways. Figure 1 illustrates a branch of the well characterized flavonoid pathway in which each step and reaction are regulated by genes [15]. For maysin to be synthesized, alleles at the following genes must be appropriately coordinated: p1, c2 and/or whp1 (encoding chalcone synthases [35]), chi1 (encoding chalcone isomerase [36]), pr1 (controlling the 3′-hydroxylation of the flavonoid B-ring to convert monohydroxy to dihydroxy compounds [37]) and unidentified additional loci encoding flavone synthase, C-glycosyl transferase, glucose oxidase, rhamnosyl transferase and an enzyme such as glutathione S-transferase for transport to the vacuole [38, 39] (Figure 1). McMullen et al. [15] argued that higher-order epistatic interactions among multiple genes at different levels of biochemical pathways are a determinant of final maysin synthesis. The occurrence of high-order epistasis calls for the development of a high-dimensional model for gene detection.

Figure 1: A branch of biochemical pathways for flavone synthesis in maize. CHS, chalcone synthase; CHI, chalcone isomerase; F3H, flavanone-3-hydroxylase; DFR, dihydroflavanone reductase; F3′H, flavanone-3′-hydroxylase; FNS, flavone synthase; RT, rhamnosyl transferase. Adapted from [15].

Quantitative genetic model for high-order epistasis

The formation of a final phenotype is the consequence of sequential genetic interactions involved in biochemical and metabolic networks. Quantitative genetic theory is well established for describing pair-wise epistasis by partitioning it into different components [40, 41]. Here, we extend this theory to study high-order epistasis among three or more genes. Consider three QTLs $Q_1$, $Q_2$, $Q_3$, each with two alleles $Q$ and $q$, which control a complex trait. Let $j_k$ denote one of three genotypes at QTL $Q_k$ ($k = 1, 2, 3$). The genotypic value of a 3-QTL genotype, denoted as $\mu_{j_1j_2j_3}$ ($j_k = 0$ for $q_kq_k$, 1 for $Q_kq_k$, 2 for $Q_kQ_k$), can be partitioned into different components including the main effects, two-way interaction effects and three-way interaction effects [19], i.e.

$$\boldsymbol{\mu} = \mathbf{X}\mathbf{b} \tag{1}$$

Here $\boldsymbol{\mu} = (\mu_{000}, \mu_{001}, \mu_{002}, \mu_{010}, \ldots, \mu_{221}, \mu_{222})^{T}$ stacks the 27 genotypic values, $\mathbf{b}$ is the vector of the 27 genetic parameters and $\mathbf{X}$ is a $27 \times 27$ design matrix with entries $-1$, $0$ and $1$. Writing $x_k = j_k - 1$ and $z_k = 1$ if $j_k = 1$ (and $z_k = 0$ otherwise), the row of $\mathbf{X}$ for genotype $j_1j_2j_3$ contains 1 for the overall mean, $x_k$ for $a_k$, $z_k$ for $d_k$, and the product of the corresponding $x$'s and $z$'s for each interaction, so that each row of (1) reads

$$
\begin{aligned}
\mu_{j_1j_2j_3} ={}& \mu + \sum_{k=1}^{3}\left(x_k a_k + z_k d_k\right)
 + \sum_{k<l}\left(x_k x_l\, i_{a_ka_l} + x_k z_l\, i_{a_kd_l} + z_k x_l\, i_{d_ka_l} + z_k z_l\, i_{d_kd_l}\right) \\
 &+ x_1x_2x_3\, i_{a_1a_2a_3} + x_1x_2z_3\, i_{a_1a_2d_3} + x_1z_2x_3\, i_{a_1d_2a_3} + z_1x_2x_3\, i_{d_1a_2a_3} \\
 &+ x_1z_2z_3\, i_{a_1d_2d_3} + z_1x_2z_3\, i_{d_1a_2d_3} + z_1z_2x_3\, i_{d_1d_2a_3} + z_1z_2z_3\, i_{d_1d_2d_3}
\end{aligned}
$$

where $\mu$ is the overall mean; $a_1$, $a_2$ and $a_3$ are the main additive genetic effects, and $d_1$, $d_2$ and $d_3$ are the main dominant effects at QTLs $Q_1$, $Q_2$, $Q_3$, respectively; $i_{a_1a_2}$, $i_{a_1d_2}$, $i_{d_1a_2}$, $i_{d_1d_2}$, $i_{a_1a_3}$, $i_{a_1d_3}$, $i_{d_1a_3}$, $i_{d_1d_3}$, $i_{a_2a_3}$, $i_{a_2d_3}$, $i_{d_2a_3}$, $i_{d_2d_3}$ are the two-way additive × additive, additive × dominant, dominant × additive and dominant × dominant epistasis between QTLs $Q_1$ and $Q_2$, between QTLs $Q_1$ and $Q_3$, and between QTLs $Q_2$ and $Q_3$, respectively; $i_{a_1a_2a_3}$, $i_{a_1a_2d_3}$, $i_{a_1d_2a_3}$, $i_{d_1a_2a_3}$, $i_{a_1d_2d_3}$, $i_{d_1a_2d_3}$, $i_{d_1d_2a_3}$, $i_{d_1d_2d_3}$ are the three-way additive × additive × additive, additive × additive × dominant, additive × dominant × additive, dominant × additive × additive, additive × dominant × dominant, dominant × additive × dominant, dominant × dominant × additive and dominant × dominant × dominant epistasis among QTLs $Q_1$, $Q_2$, $Q_3$, respectively.

The genetic effect parameters are then solved from the genotypic values:

$$
\mathbf{b} = \mathbf{X}^{-1}\boldsymbol{\mu} = \frac{1}{2^{3}}\,\mathbf{C}\,\boldsymbol{\mu} \tag{2}
$$

with

$$
\mathbf{b}^{T} = (\mu, a_1, a_2, a_3, d_1, d_2, d_3,
i_{a_1a_2}, i_{a_1d_2}, i_{d_1a_2}, i_{d_1d_2},
i_{a_1a_3}, i_{a_1d_3}, i_{d_1a_3}, i_{d_1d_3},
i_{a_2a_3}, i_{a_2d_3}, i_{d_2a_3}, i_{d_2d_3},
i_{a_1a_2a_3}, i_{a_1a_2d_3}, i_{a_1d_2a_3}, i_{d_1a_2a_3},
i_{a_1d_2d_3}, i_{d_1a_2d_3}, i_{d_1d_2a_3}, i_{d_1d_2d_3})
$$

$\mathbf{C}$ is a $27 \times 27$ integer matrix of contrast coefficients. In the row of $\mathbf{C}$ for a given effect, the coefficient of $\mu_{j_1j_2j_3}$ is the product $c_{j_1}c_{j_2}c_{j_3}$ of per-locus coefficients, where, for $j_k = 0, 1, 2$, $(c_0, c_1, c_2) = (1, 0, 1)$ if locus $k$ does not enter the effect, $(-1, 0, 1)$ if it enters additively and $(-1, 2, -1)$ if it enters through dominance. For example,

$$
\begin{aligned}
\mu &= \tfrac{1}{8}\left(\mu_{000} + \mu_{002} + \mu_{020} + \mu_{022} + \mu_{200} + \mu_{202} + \mu_{220} + \mu_{222}\right),\\
a_1 &= \tfrac{1}{8}\left[(\mu_{200} + \mu_{202} + \mu_{220} + \mu_{222}) - (\mu_{000} + \mu_{002} + \mu_{020} + \mu_{022})\right],\\
d_1 &= \tfrac{1}{8}\left[2(\mu_{100} + \mu_{102} + \mu_{120} + \mu_{122}) - (\mu_{000} + \mu_{002} + \mu_{020} + \mu_{022}) - (\mu_{200} + \mu_{202} + \mu_{220} + \mu_{222})\right].
\end{aligned}
$$


Using these expressions, we can test the significance of each genetic effect. The model can be extended to characterize high-order epistasis among any number of QTLs.
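The relationship between genotypic values and genetic effects can be checked numerically. The following is a minimal sketch, not the authors' software: it assumes the per-locus coding implied by the model (additive regressor $j_k - 1$, dominance regressor equal to 1 only for the heterozygote), and the names `code`, `CONTRAST`, `genotypic_value` and `solve_effects` are hypothetical. It builds the 27 genotypic values from a chosen set of effects and recovers the effects as contrasts divided by $2^3$.

```python
from itertools import product

def code(j, kind):
    """Regressor contributed by one locus with genotype j in {0, 1, 2}.

    kind: 'n' = locus not involved, 'a' = additive, 'd' = dominance.
    (Assumed coding: additive = j - 1, dominance = 1 for the heterozygote.)
    """
    if kind == "n":
        return 1
    if kind == "a":
        return j - 1
    return 1 if j == 1 else 0

# Per-locus contrast coefficients for genotypes 0, 1, 2 (before the 1/2 factor).
CONTRAST = {"n": (1, 0, 1), "a": (-1, 0, 1), "d": (-1, 2, -1)}

GENOTYPES = list(product(range(3), repeat=3))  # (j1, j2, j3): 27 genotypes
EFFECTS = list(product("nad", repeat=3))       # ('n','n','n') is the overall mean

def genotypic_value(effects, g):
    """Genotypic value: sum over effects of effect size times product of regressors."""
    return sum(
        value * code(g[0], e[0]) * code(g[1], e[1]) * code(g[2], e[2])
        for e, value in effects.items()
    )

def solve_effects(mu):
    """Recover all 27 effects from the 27 genotypic values by contrasts / 2^3."""
    solved = {}
    for e in EFFECTS:
        total = 0
        for g in GENOTYPES:
            coef = CONTRAST[e[0]][g[0]] * CONTRAST[e[1]][g[1]] * CONTRAST[e[2]][g[2]]
            total += coef * mu[g]
        solved[e] = total / 8
    return solved

# Example: overall mean 10, a1 = 2, a three-way additive interaction of 3,
# and a dominant-by-additive interaction between loci 1 and 2 of -1.
effects = {e: 0 for e in EFFECTS}
effects[("n", "n", "n")] = 10
effects[("a", "n", "n")] = 2
effects[("a", "a", "a")] = 3
effects[("d", "a", "n")] = -1
mu = {g: genotypic_value(effects, g) for g in GENOTYPES}
recovered = solve_effects(mu)
```

Because the three-locus design is a product of identical one-locus codings, the contrasts factor locus by locus, which is why no explicit matrix inversion is needed in this sketch.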

Approaches for mapping high-order epistasis

Genetic mapping founded on quantitative genetic theory has been used to study genetic interactions. Consider an F2 mapping population, derived from two inbred lines, in which all $n$ progeny are genotyped for a panel of molecular markers to construct a genetic linkage map and phenotyped for a complex trait [23]. The likelihood of trait values, determined by the three QTLs $Q_1$, $Q_2$, $Q_3$, is formulated by a mixture model composed of $3^3 = 27$ genotype components, i.e.

$$
L(\mathbf{y}) = \prod_{i=1}^{n} \sum_{j_1=0}^{2}\sum_{j_2=0}^{2}\sum_{j_3=0}^{2} \omega_{j_1j_2j_3|i}\, f_{j_1j_2j_3}(y_i) \tag{3}
$$

where $y_i$ is the phenotypic value of the trait for progeny $i$, $\omega_{j_1j_2j_3|i}$ is the probability that an arbitrary progeny $i$ has QTL genotype $j_1j_2j_3$, and $f_{j_1j_2j_3}(y_i)$ is a normal density function for progeny $i$ with genotypic mean $\mu_{j_1j_2j_3}$ and variance $\sigma^2$. Since the QTL genotype of a progeny is unknown but can be inferred from its marker genotype, $\omega_{j_1j_2j_3|i}$ is actually a conditional probability of QTL genotype $j_1j_2j_3$ given the marker genotype of progeny $i$, which can be expressed in terms of the recombination fractions between the QTLs and markers [23]. In the likelihood (3), we have the unknown parameters, arrayed in $\Theta$, including QTL positions, genotypic means and variance.

The parameters can be estimated by maximizing the likelihood (3). This can be done by differentiating the log-likelihood with respect to individual parameters $\theta$ ($\theta \in \Theta$), i.e.

$$
\begin{aligned}
\frac{\partial}{\partial\theta}\log L(\mathbf{y})
&= \frac{\partial}{\partial\theta}\sum_{i=1}^{n}\log \sum_{j_1=0}^{2}\sum_{j_2=0}^{2}\sum_{j_3=0}^{2} \omega_{j_1j_2j_3|i}\, f_{j_1j_2j_3}(y_i) \\
&= \sum_{i=1}^{n}\sum_{j_1=0}^{2}\sum_{j_2=0}^{2}\sum_{j_3=0}^{2}
\frac{\omega_{j_1j_2j_3|i}\, f_{j_1j_2j_3}(y_i)}{\sum_{j_1'=0}^{2}\sum_{j_2'=0}^{2}\sum_{j_3'=0}^{2} \omega_{j_1'j_2'j_3'|i}\, f_{j_1'j_2'j_3'}(y_i)}\,
\frac{\partial}{\partial\theta}\log f_{j_1j_2j_3}(y_i) \\
&= \sum_{i=1}^{n}\sum_{j_1=0}^{2}\sum_{j_2=0}^{2}\sum_{j_3=0}^{2} \Pi_{j_1j_2j_3|i}\,\frac{\partial}{\partial\theta}\log f_{j_1j_2j_3}(y_i)
\end{aligned} \tag{4}
$$

where

$$
\Pi_{j_1j_2j_3|i} = \frac{\omega_{j_1j_2j_3|i}\, f_{j_1j_2j_3}(y_i)}{\sum_{j_1'=0}^{2}\sum_{j_2'=0}^{2}\sum_{j_3'=0}^{2} \omega_{j_1'j_2'j_3'|i}\, f_{j_1'j_2'j_3'}(y_i)} \tag{5}
$$

is interpreted as the posterior probability that progeny $i$ has QTL genotype $j_1j_2j_3$. Substituting (5) into (4) and setting the derivative to zero, we obtain the formulas to estimate genotypic means and variance, expressed as

$$
\hat{\mu}_{j_1j_2j_3} = \frac{\sum_{i=1}^{n} \Pi_{j_1j_2j_3|i}\, y_i}{\sum_{i=1}^{n} \Pi_{j_1j_2j_3|i}} \tag{6}
$$

$$
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\sum_{j_1=0}^{2}\sum_{j_2=0}^{2}\sum_{j_3=0}^{2} \Pi_{j_1j_2j_3|i}\,(y_i - \hat{\mu}_{j_1j_2j_3})^2 \tag{7}
$$

The EM algorithm [20] is implemented to estimate $\mu_{j_1j_2j_3}$ and $\sigma^2$ by iterating between the E step (5) and the M steps (6) and (7). The values at convergence are the maximum-likelihood estimates (MLEs) of $\mu_{j_1j_2j_3}$ and $\sigma^2$. After obtaining the estimates of the genotypic means, we can solve for the estimates of the genetic effect parameters using Equation (2). In practice, the QTL positions are estimated by a grid search: $\omega_{j_1j_2j_3|i}$ is treated as fixed at each candidate set of positions, the entire genome is scanned, and the positions that give the largest likelihood are taken as the best estimates of the QTL positions.
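The iteration between the E step (5) and the M steps (6) and (7) can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the authors' program: the conditional probabilities `omega[i][c]` are taken as given (in practice they come from the marker genotypes and recombination fractions), the genotype components are indexed by a single integer `c` instead of the triple $j_1j_2j_3$, and the function names, starting values and stopping rule are hypothetical choices.

```python
import math

def normal_pdf(y, mean, var):
    """Density of a normal distribution N(mean, var) evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_fit(y, omega, max_iter=500, tol=1e-10):
    """Fit genotypic means and a common variance for a mixture with fixed priors.

    y     : list of n phenotypic values
    omega : n x K list of prior genotype probabilities (each row sums to 1)
    Returns (means, var, loglik), where loglik is the log-likelihood evaluated
    at the parameter values used in the last E step.
    """
    n, k = len(y), len(omega[0])
    grand = sum(y) / n
    var = sum((v - grand) ** 2 for v in y) / n
    # Starting values: prior-weighted means (an arbitrary but convenient choice).
    means = []
    for c in range(k):
        w = sum(omega[i][c] for i in range(n))
        means.append(sum(omega[i][c] * y[i] for i in range(n)) / w if w > 0 else grand)

    prev = -math.inf
    loglik = prev
    for _ in range(max_iter):
        # E step: posterior probability of each genotype for each progeny.
        post, loglik = [], 0.0
        for i in range(n):
            joint = [omega[i][c] * normal_pdf(y[i], means[c], var) for c in range(k)]
            total = sum(joint)
            loglik += math.log(total)
            post.append([v / total for v in joint])
        # M step: posterior-weighted means, then the pooled variance.
        for c in range(k):
            w = sum(post[i][c] for i in range(n))
            if w > 0:
                means[c] = sum(post[i][c] * y[i] for i in range(n)) / w
        var = sum(post[i][c] * (y[i] - means[c]) ** 2
                  for i in range(n) for c in range(k)) / n
        var = max(var, 1e-12)  # guard against a degenerate zero variance
        if loglik - prev < tol:
            break
        prev = loglik
    return means, var, loglik

# Toy data: two genotype classes with fairly informative prior probabilities.
y_toy = [0.0, 0.2, -0.1, 5.0, 5.1, 4.9]
omega_toy = [[0.9, 0.1]] * 3 + [[0.1, 0.9]] * 3
means_toy, var_toy, loglik_toy = em_fit(y_toy, omega_toy)
```

For a genome scan, a function like this would be called once per candidate set of QTL positions, each with its own `omega`, and the positions with the largest returned log-likelihood would be retained.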

Hypothesis tests

When no QTL is segregating, a single normal density describes the population, in which case no EM algorithm is needed for parameter estimation. The existence of QTLs can be tested by calculating and comparing the likelihoods under the null hypothesis H0: there is no QTL, $L(\tilde{\Theta})$, and the alternative hypothesis H1: there is at least one QTL, $L(\hat{\Theta})$. The resulting log-likelihood ratio (LR) test statistic is

$$
\mathrm{LR} = -2\left[\log L(\tilde{\Theta}) - \log L(\hat{\Theta})\right]
$$

where $\tilde{\Theta}$ and $\hat{\Theta}$ are the MLEs of the unknown parameters under H0 and H1, respectively. The significance of the result can be tested by using permutation tests [42]. By reshuffling the phenotypic data and calculating the LR genome-wide for each permutation, a critical threshold is obtained at a particular significance level.
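The permutation procedure can be outlined in a few lines. The sketch below is generic and makes assumptions that are not part of the original text: `max_lr` stands for any user-supplied function that takes a (permuted) list of phenotypes and returns the maximum LR over the scanned positions, and `permutation_threshold` is a hypothetical helper name. Marker data are assumed to be held fixed inside `max_lr`, so that shuffling the phenotypes breaks any marker-trait association.

```python
import math
import random

def permutation_threshold(y, max_lr, n_perm=200, alpha=0.05, seed=0):
    """Empirical (1 - alpha) critical value of the maximum LR under the null.

    y      : list of phenotypic values (left unmodified)
    max_lr : callable mapping a list of phenotypes to the maximum LR of a scan
    """
    rng = random.Random(seed)
    null_stats = []
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)          # break the phenotype-marker association
        null_stats.append(max_lr(shuffled))
    null_stats.sort()
    # Index of the smallest value with at least (1 - alpha) of the
    # permuted statistics at or below it.
    idx = min(n_perm - 1, max(0, math.ceil((1.0 - alpha) * n_perm) - 1))
    return null_stats[idx]

# Usage with a placeholder statistic (a real analysis would run the QTL scan):
phenotypes = [float(v) for v in range(20)]
threshold = permutation_threshold(phenotypes, lambda ys: abs(ys[0] - ys[-1]), n_perm=100)
```

An observed maximum LR larger than the returned value would be declared significant at level `alpha`; the number of permutations controls how precisely the tail of the null distribution is estimated.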


A procedure can also be given to test different components of genotypic values, including the additive ($a_1$, $a_2$, $a_3$) and dominant main genetic effects ($d_1$, $d_2$, $d_3$) at individual QTLs, two-way epistatic interactions of 12 different types ($i_{a_1a_2}$, $i_{a_1d_2}$, $i_{d_1a_2}$, $i_{d_1d_2}$, $i_{a_1a_3}$, $i_{a_1d_3}$, $i_{d_1a_3}$, $i_{d_1d_3}$, $i_{a_2a_3}$, $i_{a_2d_3}$, $i_{d_2a_3}$, $i_{d_2d_3}$), and three-way epistatic interactions of eight different types ($i_{a_1a_2a_3}$, $i_{a_1a_2d_3}$, $i_{a_1d_2a_3}$, $i_{d_1a_2a_3}$, $i_{a_1d_2d_3}$, $i_{d_1a_2d_3}$, $i_{d_1d_2a_3}$, $i_{d_1d_2d_3}$). All these components are calculated from genotypic means by solving the system of equations (1). When we want to test whether one or more of the 26 effects equals zero, we only need to estimate the remaining effects. In such a reduced model, we use the same EM algorithm for parameter estimation as described for the full model (5)–(7), but with the constraint(s) imposed on the relationships among genotypic values $\mu_{j_1j_2j_3}$ ($j_k = 0$ for $q_kq_k$, 1 for $Q_kq_k$, 2 for $Q_kQ_k$), derived under the condition that the effect parameters being tested are equal to zero. The Augment-M algorithm [43] is incorporated, in which the M step is derived under the reduced model using the following steps:

(1) For progeny $i$ with observed phenotypic value $y_i$, we augment the trait value $y_{j_1j_2j_3|i}$ of this progeny that carries QTL genotype $j_1j_2j_3$, i.e.

$$y_{j_1j_2j_3|i} = \Pi_{j_1j_2j_3|i}\, y_i,$$

where $\Pi_{j_1j_2j_3|i}$ is the posterior probability obtained from the E step (5);

(2) Define a vector of dummy variables $X_{j_1j_2j_3|i}$ that meets

$$E(y_{j_1j_2j_3|i}) = X_{j_1j_2j_3|i}\,\mathbf{b},$$

where $\mathbf{b}$ is the vector of genetic effect parameters (2);

(3) By arranging the augmented data in a linear model framework, we have

$$\mathbf{y}_A = \mathbf{X}_A \mathbf{b},$$

where $\mathbf{y}_A = \{y_{j_1j_2j_3|i}\}$ and $\mathbf{X}_A = \{X_{j_1j_2j_3|i}\}$. For a given reduced model, represented by $\mathbf{K}^T\mathbf{b} = 0$, where $\mathbf{K}$ is a vector that constrains certain effects to be equal to zero, we have

$$
\hat{\mathbf{b}}_K = \hat{\mathbf{b}} - (\mathbf{X}_A^T\mathbf{X}_A)^{-1}\mathbf{K}\left[\mathbf{K}^T(\mathbf{X}_A^T\mathbf{X}_A)^{-1}\mathbf{K}\right]^{-1}\mathbf{K}^T\hat{\mathbf{b}}, \tag{8}
$$

where $\hat{\mathbf{b}} = (\mathbf{X}_A^T\mathbf{X}_A)^{-1}\mathbf{X}_A^T\mathbf{y}_A$;

(4) The variance in the reduced model is estimated by

$$
\hat{\sigma}_K^2 = \frac{1}{n}\sum_{i=1}^{n}\sum_{j_1=0}^{2}\sum_{j_2=0}^{2}\sum_{j_3=0}^{2} \Pi_{j_1j_2j_3|i}\,(y_i - X_{j_1j_2j_3|i}\hat{\mathbf{b}}_K)^2, \tag{9}
$$

where $n$ is the total number of progeny in the mapping population.

The iteration is made between the E step (5) and the M steps (8) and (9) until stable values are obtained. These stable values are the MLEs of the parameters under the reduced model. In each case of testing the significance of effect parameters, we calculate the likelihoods under the null and alternative hypotheses and therefore the LRs. The critical thresholds for testing each effect can be obtained from simulation approaches [23].
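The constrained estimator in (8) is the standard restricted least-squares solution, and its algebra can be illustrated on a small example. The sketch below is only an illustration of that formula, not of the full Augment-M algorithm: it ignores the posterior weighting of the augmented data, uses exact fractions with a hand-written Gauss-Jordan inverse so that it needs no external libraries, and all function names are hypothetical.

```python
from fractions import Fraction

def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse(a):
    """Gauss-Jordan inverse of a small non-singular square matrix (exact fractions).

    Raises StopIteration if no non-zero pivot can be found (singular matrix).
    """
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)  # pivot row
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0:
                factor = m[r][c]
                m[r] = [v - factor * w for v, w in zip(m[r], m[c])]
    return [row[n:] for row in m]

def restricted_ls(X, y, K):
    """Least-squares estimate of b subject to K^T b = 0.

    X : n x p design matrix, y : list of n responses, K : p x q constraint matrix.
    """
    Xt = transpose(X)
    xtx_inv = inverse(matmul(Xt, X))
    b_hat = matmul(xtx_inv, matmul(Xt, [[v] for v in y]))   # unconstrained estimate
    A = matmul(xtx_inv, K)                                  # (X^T X)^{-1} K
    middle = inverse(matmul(transpose(K), A))               # [K^T (X^T X)^{-1} K]^{-1}
    correction = matmul(A, matmul(middle, matmul(transpose(K), b_hat)))
    return [bh[0] - c[0] for bh, c in zip(b_hat, correction)]

# Two loci coded +1/-1 with columns (mean, a1, a2, i_a1a2); true effects 10, 2, 1, 3.
X_demo = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 1, -1], [1, -1, -1, 1]]
y_demo = [16, 8, 6, 10]
K_demo = [[0], [0], [0], [1]]          # constrain the interaction to zero
b_reduced = restricted_ls(X_demo, y_demo, K_demo)
```

With this orthogonal design the constraint simply removes the interaction and leaves the other estimates unchanged; with correlated regressors the remaining estimates shift as well, which is the reason for using the general formula.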

To reduce the computing burden for threshold determination, several formulae have been derived for computing approximate critical thresholds to control the type I error rate at a chromosome- or genome-wide level [44, 45]. Chang et al. [46] proposed a score test statistic for QTL mapping. The score test is computationally simpler than the LR test, since it only uses the MLEs of parameters under the null hypothesis. More importantly, the maximum of the squared score statistics is asymptotically equivalent to the maximum of the LR test statistics under the null hypothesis. The critical threshold for the score test can therefore also be used for the LR test, which improves the computing efficiency of threshold determination.

MODEL VALIDATION

Worked example

Two rice cultivars, semi-dwarf IR64 and tall Azucena, were crossed to generate a doubled-haploid (DH) population. Using 135 DH lines from this population, a genetic linkage map was constructed, covering 12 chromosomes with 175 molecular markers [31]. The DH population was grown in a randomized complete design with two replicates at a spacing of 15 × 20 cm in a field near Hangzhou, China. Final plant heights were measured for each plant. To reduce random errors in height phenotypes, we took the mean of the two replicates for each DH line and used it for QTL mapping. Alternatively, the two replicates can be incorporated into the model directly, as shown by Wu et al. [47], in a way that takes into account the spatial correlation of phenotypic values due to microenvironmental factors.

Marker and QTL co-segregation in a DH population follows a backcross pattern. In a DH population, each locus has two homozygotes (denoted as 1 and 0, respectively), each inheriting two alleles from a different parent. Thus, three QTLs form eight different homozygotes, denoted as j1j2j3 (j1, j2, j3 = 1, 0), with genotypic value μ_{j1j2j3}, which can be partitioned into different components as follows:

$$
\begin{aligned}
\mu_{111} &= \mu + a_1 + a_2 + a_3 + i_{a_1a_2} + i_{a_1a_3} + i_{a_2a_3} + i_{a_1a_2a_3} \\
\mu_{110} &= \mu + a_1 + a_2 - a_3 + i_{a_1a_2} - i_{a_1a_3} - i_{a_2a_3} - i_{a_1a_2a_3} \\
\mu_{101} &= \mu + a_1 - a_2 + a_3 - i_{a_1a_2} + i_{a_1a_3} - i_{a_2a_3} - i_{a_1a_2a_3} \\
\mu_{100} &= \mu + a_1 - a_2 - a_3 - i_{a_1a_2} - i_{a_1a_3} + i_{a_2a_3} + i_{a_1a_2a_3} \\
\mu_{011} &= \mu - a_1 + a_2 + a_3 - i_{a_1a_2} - i_{a_1a_3} + i_{a_2a_3} - i_{a_1a_2a_3} \\
\mu_{010} &= \mu - a_1 + a_2 - a_3 - i_{a_1a_2} + i_{a_1a_3} - i_{a_2a_3} + i_{a_1a_2a_3} \\
\mu_{001} &= \mu - a_1 - a_2 + a_3 + i_{a_1a_2} - i_{a_1a_3} - i_{a_2a_3} + i_{a_1a_2a_3} \\
\mu_{000} &= \mu - a_1 - a_2 - a_3 + i_{a_1a_2} + i_{a_1a_3} + i_{a_2a_3} - i_{a_1a_2a_3}
\end{aligned}
$$

where μ is the overall mean, the a's are the additive effects at the different QTLs, and the i's are the two- or three-way epistatic interactions between the QTLs. The procedure for mapping high-order epistasis was used to estimate the additive, additive × additive and additive × additive × additive genetic effects on plant height in this DH population.
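As a minimal sketch of this partition (an illustration, not the authors' software), the eight equations can be written as a matrix product in which each locus is coded +1 for genotype 1 and -1 for genotype 0, and each interaction column is the product of the corresponding locus codes. The function name `design_matrix` and the variable names are hypothetical; the effect values are the true values listed for the simulation study in Table 3.

```python
import itertools
import numpy as np

def design_matrix():
    """Rows: genotypes 111, 110, ..., 000.
    Columns: mean, a1, a2, a3, i_a1a2, i_a1a3, i_a2a3, i_a1a2a3."""
    genotypes = list(itertools.product([1, 0], repeat=3))
    rows = []
    for j1, j2, j3 in genotypes:
        # Code genotype 1 as +1 and genotype 0 as -1 at each locus.
        x1, x2, x3 = (2 * j - 1 for j in (j1, j2, j3))
        rows.append([1, x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3])
    return genotypes, np.array(rows, dtype=float)

genotypes, D = design_matrix()

# True effects from Table 3: mean, a1, a2, a3, i_a1a2, i_a1a3, i_a2a3, i_a1a2a3.
effects = np.array([150, -1, -1, 1, 0.5, -0.5, 0.5, -0.5])

# Genotypic values in the order 111, 110, 101, 100, 011, 010, 001, 000.
means = D @ effects

# The columns of D are orthogonal with squared norm 8, so the effects can be
# recovered from the eight genotypic values as D^T means / 8.
recovered = D.T @ means / 8
```

Under these assumptions, `means` reproduces the true genotypic values listed in Table 2 (149, 148, 150, 149, 152, 147, 153, 152), and `recovered` returns the effects that were put in, which shows that the two parameterizations carry the same information.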

By simultaneously searching for three QTLs at every 4 cM throughout the entire genome, we detected three locations that jointly affect plant height: between markers RG403–RG229 on chromosome 5, between markers RZ337B–CDO497 on chromosome 7, and between markers RG667–RG451 on chromosome 9. Figure 2 shows a portion of the overall LR plot against the searched positions S, with the LR peak at (64, 288, 440), which corresponds to the locations of the three QTLs on chromosomes 5, 7 and 9. Three-dimensional plots are displayed for cycles 1–4 in Figure 3, where the x- and y-coordinates are the b and c search positions, respectively, and the z-coordinate is the LR value.

Figure 2: A portion of the LR versus genome plot (top). For demonstration, cycle 2 is magnified to show a clearer picture of the probable location of the QTLs at a = 64, b = 288 and c = 440 (bottom). Because a three-way model deals with an exponentially increasing number of combinations, a genome-wide critical threshold may be too conservative. For this reason, we determined the critical threshold chromosome-wise from 200 permutation tests [42]. That is, the proposed three-way epistatic model was applied 200 times to the three chromosomes where the three QTLs were located. At each application, the height measurements were permuted against the corresponding markers to remove any marker–QTL association and simulate a null distribution. The 200 LR values, one resulting from each application of the model, were ranked, and the 95th percentile was 45.96, which is below the global maximum LR value of 56.17. There are eleven cycles on the top plot, each of which corresponds to a fixed position for a and all possible values for b and c. The bottom plot is a magnified view of cycle 2, where the global peak is located. It shows roughly ten cycles, each of which corresponds to a fixed value for b and all possible values for c.
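The permutation procedure described in the caption of Figure 2 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the scan statistic `max_statistic` is a hypothetical stand-in (the largest n·r² between the phenotype and any single marker) for the three-QTL LR scan, and the marker and phenotype data are simulated placeholders with the sample size of the DH population (135).

```python
import numpy as np

def max_statistic(y, markers):
    """Hypothetical stand-in for the genome scan: largest n * r^2 over markers."""
    yc = y - y.mean()
    mc = markers - markers.mean(axis=0)
    r = (mc.T @ yc) / np.sqrt((mc ** 2).sum(axis=0) * (yc ** 2).sum())
    return len(y) * np.max(r ** 2)

def permutation_threshold(y, markers, n_perm=200, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the maximum statistic under the null."""
    rng = np.random.default_rng(seed)
    null_max = np.empty(n_perm)
    for b in range(n_perm):
        # Shuffling phenotypes against markers removes any marker-trait association.
        y_perm = rng.permutation(y)
        null_max[b] = max_statistic(y_perm, markers)
    return np.quantile(null_max, 1 - alpha)

# Placeholder data: 135 lines, 30 markers with 0/1 genotypes, no real QTL.
rng = np.random.default_rng(1)
markers = rng.integers(0, 2, size=(135, 30)).astype(float)
y = rng.normal(size=135)
threshold = permutation_threshold(y, markers)
```

Replacing `max_statistic` with the maximum LR from the three-way scan over the chromosomes of interest would give a chromosome-wise threshold of the kind reported above; with only 200 permutations the 95th percentile is itself subject to sampling error.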

Table 1 gives the MLEs of the locations and genetic effects of the three QTLs detected. The QTL on chromosome 9, linked to marker RG667, exerts a highly significant additive genetic effect (a3) on plant height, but the additive effects of the two QTLs on chromosomes 5 and 7 are not significant. The genomic region of marker RG667 was shown to harbor a QTL that affects a plant height-corrected trait, number of productive tillers in rice [48]. The alleles derived from the tall parent Azucena contribute favorably to height growth. There is a significant two-way epistatic interaction (i_{a1a3}) between the additive effects at the QTL on chromosome 5, whose main effect is not significant, and the QTL on chromosome 9, whose main effect is significant; the 'tall' alleles (derived from parent Azucena) at these two QTLs interact to inhibit plant height growth. Interestingly, a highly significant three-way epistatic interaction (i_{a1a2a3}) occurs among the additive effects at the three QTLs and acts to increase plant height.

Computer simulation

We simulated a backcross design with two genotypes, 1 and 0, at each locus and randomly generated nine markers equally spaced in a linkage map of 225 cM. Three putative QTLs were located 10 cM from the third, sixth and eighth markers, respectively. Phenotypic values were then simulated by summing the genotypic value of a specific three-QTL genotype and a residual error that follows a normal distribution with mean zero and variance σ². The simulation studies were designed for different sample sizes (n = 100 and 400) and different heritabilities (H² = 0.1 and 0.4). The value of the residual variance was determined by the level of heritability.
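A simulation of this kind might look like the sketch below. Several choices are assumptions made for illustration and are not stated in the text: the Haldane map function is used to convert map distance into recombination fraction, the genetic variance V_g is taken as the sample variance of the simulated genotypic values, and the residual variance is set from the usual definition H² = V_g / (V_g + σ²). The QTL positions (60, 135 and 185 cM) are the true values listed in Table 2, and the effect values are those of Table 3; all function names are hypothetical.

```python
import numpy as np

def haldane(d_cm):
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - np.exp(-2.0 * d_cm / 100.0))

def simulate_backcross(positions_cm, n, rng):
    """Simulate 0/1 genotypes at ordered loci for n backcross individuals."""
    positions_cm = np.asarray(positions_cm, dtype=float)
    geno = np.empty((n, len(positions_cm)), dtype=int)
    geno[:, 0] = rng.integers(0, 2, size=n)
    for k in range(1, len(positions_cm)):
        r = haldane(positions_cm[k] - positions_cm[k - 1])
        recomb = rng.random(n) < r  # True where a crossover flips the genotype
        geno[:, k] = np.where(recomb, 1 - geno[:, k - 1], geno[:, k - 1])
    return geno

def simulate_phenotypes(qtl_geno, effects, h2, rng):
    """Genotypic value plus normal error with variance set by heritability h2."""
    x = 2 * qtl_geno - 1  # code genotype 1 as +1 and genotype 0 as -1
    x1, x2, x3 = x[:, 0], x[:, 1], x[:, 2]
    design = np.column_stack(
        [np.ones(len(x)), x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3]
    )
    g = design @ effects
    sigma2 = g.var() * (1 - h2) / h2  # from H^2 = V_g / (V_g + sigma^2)
    y = g + rng.normal(0.0, np.sqrt(sigma2), size=len(g))
    return y, sigma2

rng = np.random.default_rng(2012)
marker_pos = np.linspace(0, 225, 9)          # nine equally spaced markers
qtl_pos = np.array([60.0, 135.0, 185.0])     # true QTL positions from Table 2
all_pos = np.concatenate([marker_pos, qtl_pos])
order = np.argsort(all_pos)
geno = simulate_backcross(all_pos[order], n=400, rng=rng)
geno = geno[:, np.argsort(order)]            # restore the markers-then-QTLs order
markers, qtl = geno[:, :9], geno[:, 9:]

effects = np.array([150, -1, -1, 1, 0.5, -0.5, 0.5, -0.5])  # Table 3 true values
y, sigma2 = simulate_phenotypes(qtl, effects, h2=0.4, rng=rng)
```

If the eight genotypes were equally frequent, the genetic variance implied by these effects would be 4, giving σ² = 36 for H² = 0.1 and σ² = 6 for H² = 0.4, that is, residual standard deviations of 6 and about 2.45. This may be what the "6/2.45" entry in the true-value row of Table 2 refers to.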

Tables 2 and 3 tabulate the estimates of QTL locations and effect parameters from the simulated data based on 100 simulation replicates. In general, the locations and genotypic values of the QTLs and the residual variance can be estimated reasonably well even with a modest sample size (100) and heritability (0.1) (Table 2). Increasing the sample size and heritability remarkably improves the estimation precision of these parameters. The estimation precision of the genetic effect parameters depends on the type of parameter (Table 3). The additive effects can be estimated most precisely, followed by the two-way interaction effects and then the three-way interaction effect. Also, the estimates of the genetic effect parameters were found to be sensitive to sample size and/or heritability. However, increasing heritability from 0.1 to 0.4 improves estimation precision much more than increasing sample size from 100 to 400. This suggests that better management of plants, aimed at minimizing experimental errors, will contribute more substantially to mapping precision than a simple increase in sample size.

Table 1: The MLEs of the QTL locations and effects for plant height growth in a DH mapping population of rice

| Chr. | Marker interval | Map distance | Main effect | Two-way epistasis | Three-way epistasis |
|---|---|---|---|---|---|
| 5 | RG403–RG229 | 2.1 cM | a1 = 1.05 | i_{a1a2} = −4.76 | |
| 7 | RZ337B–CDO497 | 10.7 cM | a2 = 1.05 | i_{a1a3} = −8.77 | i_{a1a2a3} = 15.06 |
| 9 | RG667–RG451 | 15.6 cM | a3 = 11.52 | i_{a1a2} = −4.76 | |

Map distance means the distance of the QTL from the left marker of an interval.

In general, if a trait has a high heritability, a sample size of 100 is adequate for reasonable estimation of the additive and epistatic effects. For a modest heritability (say 0.1), 400 samples are needed. By increasing the sample size to 1000, we found that all estimates can be improved even when the trait has a low heritability (~0.05). It is always important to investigate the power of the model to identify significant genetic effects given a particular sample size. We calculated the power using computer simulation. Given the value of a particular genetic effect, which is used to determine the magnitude of the residual variance for a given heritability, phenotypic and marker data are simulated. The proportion of significant simulation replicates (i.e. those in which the effect is found to be significant) among the total number of simulation replicates is taken as the empirical statistical power for identifying this genetic effect. We ran an additional simulation to examine the power of the model for detecting three-way interactions. For a quantitative trait with a modest heritability (say 0.1), the power for detecting three-way epistasis was ~0.70 with a sample size of 400. Under the same heritability, the power increases to >0.9 if the sample size increases to 800.
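The empirical power calculation described above can be sketched as follows. This is a simplified illustration, not the procedure behind the reported figures: it assumes that the three QTL genotypes are observed directly and are unlinked, fits the eight-parameter model by least squares, and declares the three-way effect significant when its estimate divided by its standard error exceeds a normal critical value. The function names, the effect values (taken from Table 3) and the test itself are assumptions, so the resulting numbers should not be expected to match the power values quoted in the text.

```python
import numpy as np

def three_way_design(geno):
    """Design matrix with columns: mean, a1, a2, a3, i12, i13, i23, i123."""
    x = 2 * geno - 1  # code genotype 1 as +1 and genotype 0 as -1
    x1, x2, x3 = x.T
    return np.column_stack(
        [np.ones(len(x)), x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3]
    )

def empirical_power(effects, h2, n, n_rep=200, z_crit=1.96, seed=0):
    """Fraction of replicates in which the three-way effect is declared significant."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_rep):
        geno = rng.integers(0, 2, size=(n, 3))  # unlinked, observed QTL genotypes
        X = three_way_design(geno)
        g = X @ effects
        sigma2 = g.var() * (1 - h2) / h2  # residual variance implied by h2
        y = g + rng.normal(0.0, np.sqrt(sigma2), size=n)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        s2 = resid @ resid / (n - X.shape[1])
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[7, 7])
        hits += int(abs(beta[7] / se) > z_crit)  # two-sided normal approximation
    return hits / n_rep

effects = np.array([150, -1, -1, 1, 0.5, -0.5, 0.5, -0.5])
power_100 = empirical_power(effects, h2=0.1, n=100)
power_400 = empirical_power(effects, h2=0.1, n=400)
```

The same loop structure applies when the significance decision comes from the LR test of the mapping model against a permutation threshold; only the line that decides whether a replicate counts as a hit would change.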

DISCUSSION

Understanding how the information contained in genotypes is translated into complex phenotypic traits represents a major challenge in biological research. Although this process has not yet been precisely described, existing evidence shows that it involves a multilayered hierarchy of regulatory networks in which genes and products from different levels or stages interact and coordinate to form a final phenotype. It is highly likely that interactions involving more than two genes, i.e. so-called high-order interactions, play a central role in coordinating the networks [29, 30]. Although the contribution of epistatic interactions to quantitative genetic variation has been increasingly recognized by population and evolutionary biologists [1, 2, 4] and medical geneticists [5, 33], the impact of high-order epistasis on phenotypic diversity has not been carefully explored. Results from a limited number of quantitative genetic studies show that high-order epistasis could be correlated with certain cytological phenomena [49] and growth traits [50].

In this article, we review a mapping model for characterizing genetic interactions of multiple orders that are responsible for complex traits. The model is founded on the general framework of genetic mapping with molecular maps, allowing a genome-wide search for multilocus interactions. The model has several advantages. First, it can test the relative importance of different types of genetic effects, including main additive effects, low-order epistasis and high-order epistasis, and thus provides an unprecedented opportunity to study the detailed atlas of genetic control mechanisms for complex phenotypes. From a biological perspective, it is possible that a single gene does not exert a significant effect on a phenotype by itself, but has a remarkable impact on the phenotype through epistasis with other genes involved in key pathways that form the final phenotype [29, 30, 33]. Second, we derived a closed form for estimating the genetic effects of different types, which improves computational efficiency and facilitates implementation in a software package. More interestingly, closed forms for parameter estimation also exist for reduced models in which the test focuses on a particular subset of parameters.

Table 2: The averaged MLEs of the QTL positions and genotypic values and their standard errors (given in parentheses) under different sample sizes (n) and heritabilities (H²) based on 100 simulation replicates

| H² | n | QTL location 1 | QTL location 2 | QTL location 3 | μ_{111} | μ_{110} | μ_{101} | μ_{100} | μ_{011} | μ_{010} | μ_{001} | μ_{000} | σ² |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 100 | 43 (3.73) | 101 (4.42) | 157 (4.05) | 148.88 (0.17) | 148.79 (0.44) | 151.32 (0.67) | 151.11 (0.54) | 151.62 (0.54) | 147.20 (0.79) | 150.88 (0.47) | 151.79 (0.18) | 5.19 (0.05) |
| 0.1 | 400 | 48 (2.94) | 109 (4.18) | 165 (4.03) | 149.08 (0.09) | 148.80 (0.24) | 150.63 (0.52) | 150.06 (0.45) | 150.36 (0.49) | 148.34 (0.55) | 152.18 (0.27) | 151.79 (0.10) | 5.83 (0.02) |
| 0.4 | 100 | 52 (2.23) | 117 (3.25) | 172 (3.17) | 149.02 (0.07) | 148.43 (0.16) | 150.42 (0.26) | 149.49 (0.16) | 151.48 (0.15) | 147.77 (0.31) | 152.36 (0.24) | 152.02 (0.08) | 2.22 (0.02) |
| 0.4 | 400 | 60 (0.34) | 134 (0.66) | 185 (0.60) | 149.05 (0.03) | 148.10 (0.05) | 150.01 (0.09) | 149.07 (0.05) | 151.99 (0.04) | 147.12 (0.08) | 152.90 (0.05) | 152.00 (0.03) | 2.43 (0.01) |
| True values | | 60 | 135 | 185 | 149 | 148 | 150 | 149 | 152 | 147 | 153 | 152 | 6/2.45 |

The model was validated by reanalyzing a real data set for genetic mapping of plant height in rice [31]. On the one hand, this reanalysis supports the usefulness of the model in practice. On the other hand, the new model has led to new discoveries about the genetic control of plant height growth in rice. In previous studies, several significant QTLs were detected for height growth in this mapping population [51, 52]. For example, Zhao et al. [52] identified QTLs on chromosomes 1, 3, 7, 9 and 11. In addition to the QTLs identified on chromosomes 7 and 9, the new model has also detected a QTL on chromosome 5. Although this new QTL has no significant main additive effect, it interacts epistatically with the one on chromosome 9 to determine the final plant height of rice. Interestingly, these two QTLs, along with the one on chromosome 7, display a highly significant three-way additive × additive × additive epistatic effect that increases or decreases plant height by ~15 cm. It should be noted that the detection of more QTLs by Zhao et al. [52] than by the new model may be due to the fact that the former makes use of height data at multiple time points. To test the statistical behavior of the new model, simulation studies were performed, suggesting that three-way epistasis can be estimated well when an adequately large sample size is used.

Figure 3: The LR value in three dimensions. The four plots correspond to the four cycles in Figure 2. The rectangle is the 200 permutation cutoff.

Table 3: The averaged MLEs of the QTL positions and genetic effects and their standard errors (given in parentheses) under different sample sizes (n) and heritabilities (H²) based on 100 simulation replicates.

| H² | n | μ | a1 | a2 | a3 | i_{a1a2} | i_{a1a3} | i_{a2a3} | i_{a1a2a3} |
|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 100 | 150.20 (0.15) | −0.17 (0.14) | −1.08 (0.16) | 0.48 (0.18) | −0.11 (0.21) | −0.40 (0.20) | 0.65 (0.19) | −0.68 (0.21) |
| 0.1 | 400 | 150.15 (0.11) | −0.51 (0.13) | −1.01 (0.12) | 0.41 (0.16) | 0.31 (0.13) | −0.20 (0.13) | 0.17 (0.13) | −0.24 (0.16) |
| 0.4 | 100 | 150.13 (0.06) | −0.78 (0.06) | −0.95 (0.07) | 0.70 (0.08) | 0.33 (0.08) | −0.31 (0.07) | 0.38 (0.07) | −0.46 (0.07) |
| 0.4 | 400 | 150.03 (0.02) | −0.97 (0.02) | −0.97 (0.02) | 0.96 (0.02) | 0.48 (0.02) | −0.48 (0.02) | 0.50 (0.02) | −0.49 (0.02) |
| True value | | 150 | −1 | −1 | 1 | 0.50 | −0.50 | 0.50 | −0.50 |

In Table 3, a1, a2 and a3 are the additive effects and the remaining four columns are the interaction effects.

With the model described in this article, it is now possible to investigate whether three-way epistasis is a widespread phenomenon in plant height by reanalyzing published mapping data or designing new mapping experiments.

In practice, it is impossible to precisely estimate a genetic effect from a single study. The uncertainty in the chromosomal locations and genetic effects of QTLs can be reduced by replicating the same experiments; estimates of these QTL parameters from multiple replicates are closer to the true parameter values.

As genome-wide association studies (GWAS) have emerged as a useful tool for plant, animal and human genetics [53–57], it is crucial to incorporate the multilocus epistasis detection model into GWAS to illustrate a network of genetic interactions throughout the genome. The current model specification does not consider environmental factors. Given its importance, genotype × environment interaction should be integrated into the high-order epistasis model [12]. Also, it is worthwhile to model the pleiotropic effects of high-order epistasis on different aspects of phenotypic traits [58, 59]. The major factor limiting these extensions is the combinatorial search over too many interactions with a much smaller number of samples. However, the recent availability of feature selection methods [60], equipped with efficient computing algorithms such as genetic programming [13], provides an unprecedented opportunity to produce a useful statistical toolbox for dissecting complex phenotypes into their genetic components at different levels. The computer code used to detect and test high-order epistasis is available at http://statgen.psu.edu.

Key Points

• Despite considerable efforts to dissect the genetic architecture of complex traits, much still remains unclear, including the distribution, mechanisms and importance of genetic interactions. High-order epistasis due to multilevel interactions of genes is thought to be the hidden genetic variation that has not been utilized in agriculture and biomedicine.

• Genetic mapping, now used as a routine approach for studying the genetic regulation of quantitative traits, has a unique power to characterize pair-wise epistatic interactions. However, there is still no in-depth exploration to estimate and test the genetic effects due to interactions among three or more genes.

• We formulate and assess a state-of-the-art statistical procedure for implementing genetic mapping to detect high-order epistasis, filling a gap that occurs in quantitative genetics, evolutionary genetics and medical genetics. We argue that high-order epistatic mapping can serve as a routine tool to comprehend the genetic architecture of complex traits.

ACKNOWLEDGEMENTS

The authors thank Prof. Jun Zhu at Zhejiang University for providing the rice data to validate the high-order epistatic mapping model.

FUNDING

The Special Fund for Forestry Scientific Research in the Public Interest (No. 201004017); NSF/IOS-0923975; the Changjiang Scholarship Award; and the 'Thousand-person Plan' Award.

References

1. Whitlock MC, Phillips PC, Moore BG, et al. Multiple fitness peaks and epistasis. Ann Rev Ecol Syst 1995;26:601–29.
2. Wolf JB, Brodie EE III, Wade MJ. Epistasis and the Evolutionary Process. Oxford: Oxford University Press, 2000.
3. Carlborg O, Haley CS. Epistasis: too often neglected in complex trait studies? Nat Rev Genet 2004;5:618–25.
4. Phillips PC. Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet 2008;9:855–67.
5. Moore J, Williams S. Epistasis and its implications for personal genetics. Am J Hum Genet 2009;85:309–20.
6. Huang LS, Sternberg PW. Genetic dissection of developmental pathways. Methods Cell Biol 1995;48:97–122.
7. Wu RL, Ma CM, Lin M, et al. A general framework for analyzing the genetic architecture of developmental characteristics. Genetics 2004;166:1541–51.
8. Imielinski M, Belta C. Deep epistasis in human metabolism. CHAOS 2010;20:026104.
9. Martin MP, Gao X, Lee JH, et al. Epistatic interaction between KIR3DS1 and HLA-B delays the progression to AIDS. Nat Genet 2003;3:429–34.
10. Gabutero E, Moore C, Mallal S, et al. Interaction between allelic variation in IL12B and CCR5 affects the development of AIDS. AIDS 2007;21:65–9.


11. Hinkley T, Martins J, Chappey C, et al. A systems analysis of mutational effects in HIV-1 protease and reverse transcriptase. Nat Genet 2011;43:487–9.
12. Lukens LN, Doebley J. Epistatic and environmental interactions for quantitative trait loci involved in maize evolution. Genet Res 1999;74:291–302.
13. Nunkesser R, Bernholt T, Schwender H, et al. Detecting high-order interactions of single nucleotide polymorphisms using genetic programming. Bioinformatics 2009;23:3280–8.
14. Ritchie MD, Hahn LW, Roodi N, et al. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet 2001;69:138–47.
15. McMullen MD, Byrne PF, Snook ME, et al. Quantitative trait loci and metabolic pathways. Proc Natl Acad Sci U S A 1998;95:1996–2000.
16. Stich B, Yu J, Melchinger AE, et al. Power to detect higher-order epistatic interactions in a metabolic pathway using a new mapping strategy. Genetics 2007;176:563–70.
17. Gutierrez J. A developmental systems perspective on epistasis: Computational exploration of mutational interactions in model developmental regulatory networks. PLoS One 2009;4(9):e6823.
18. Pettersson M, Besnier F, Siegel PB, et al. Replication and explorations of high-order epistasis using a large advanced intercross line pedigree. PLoS Genet 2011;7(7):e1002180.
19. Wang Z, Liu T, Lin Z, et al. A general model for multilocus epistatic interactions in case-control studies. PLoS One 2010;5(8):e11384.
20. Lander ES, Botstein D. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 1989;121:185–99.
21. Zeng Z-B. Precision mapping of quantitative trait loci. Genetics 1994;136:1457–68.
22. Xu S. Estimating polygenic effects using markers of the entire genome. Genetics 2003;163:789–801.
23. Wu RL, Ma CX, Casella C. Statistical Genetics of Quantitative Traits: Linkage, Maps, and QTL. New York: Springer-Verlag, 2007.
24. Kao C-H, Zeng Z-B. Modeling epistasis of quantitative trait loci using Cockerham's model. Genetics 2002;160:1243–61.
25. Shah SH, Schmidt MA, Mei H, et al. Searching for epistatic interactions in nuclear families using conditional linkage analysis. BMC Genetics 2005;6(Suppl 1):S148.
26. Li Y, Berg A, Chang MN, et al. A statistical model for genetic mapping of viral infection by integrating epidemiological behavior. Statist Appl Genet Mol Biol 2009;8(1):Article 38.
27. Yi NJ, Shriner D, Banerjee S, et al. An efficient Bayesian model selection approach for interacting QTL models with many effects. Genetics 2007;176:1865–77.
28. Hansen TF, Wagner GP. Epistasis and the mutation load: a measurement-theoretical approach. Genetics 2001;158:477–85.
29. Beerenwinkel S, Pachter L, Sturmfels B, et al. Analysis of epistatic interactions and fitness landscapes using a new geometric approach. BMC Evol Biol 2007;7:60.
30. Imielinski M, Belta C. Exploiting the pathway structure of metabolism to reveal high-order epistasis. BMC Syst Biol 2008;2:40.
31. Huang N, Parco A, Mew T, et al. RFLP mapping of isozymes, RAPD and QTLs for grain shape, brown planthopper resistance in a doubled haploid rice population. Mol Breed 1997;3:105–13.
32. Bateson W. Mendel's Principles of Heredity. Cambridge, UK: Cambridge University Press, 1909.
33. Steen KV. Travelling the world of gene-gene interactions. Brief Bioinform 2012;13:1–19.
34. Elliger CA, Chan BG, Waiss AC, et al. C-glycosylflavones from Zea mays that inhibit insect development. Phytochemistry 1980;19:293–7.
35. Franken P, Niesbach-Klosgen U, Weydemann U, et al. The duplicated chalcone synthase genes C2 and Whp (white pollen) of Zea mays are independently regulated; evidence for translational control of Whp expression by the anthocyanin intensifying gene in. EMBO J 1991;10:2605–12.
36. Grotewold E, Peterson T. Isolation and characterization of a maize gene encoding chalcone flavonone isomerase. Mol Gen Genet 1994;242:1–8.
37. Styles ED, Ceska O. Genetic control of 3-hydroxy- and 3-deoxy-flavonoids in Zea mays. Phytochemistry 1975;14:413–5.
38. Heller W, Forkmann G. Biosynthesis of flavonoids. In: Harborne JB (ed). The Flavonoids: Advances in Research Since 1986. London: Chapman & Hall, 1994;499–535.
39. Marrs KA, Alfenito MR, Lloyd AM, et al. A glutathione S-transferase involved in vacuolar transfer encoded by the maize gene Bronze-2. Nature 1995;375:397–400.
40. Fisher RA. The correlations between relatives on the supposition of Mendelian inheritance. Trans R Soc Edinb 1918;52:399–433.
41. Mather K, Jinks JL. Biometrical Genetics. 3rd edn. London: Chapman & Hall, 1982.
42. Churchill GA, Doerge RW. Empirical threshold values for quantitative trait mapping. Genetics 1994;138:963–71.
43. Wang CG, Wang Z, Luo JT, et al. A model for transgenerational imprinting variation in complex traits. PLoS One 2010;5(7):e11396.
44. Dupuis J, Siegmund D. Statistical methods for mapping quantitative trait loci from a dense set of markers. Genetics 1999;151:373–86.
45. Piepho H-P. A quick method for computing approximate thresholds for quantitative trait loci detection. Genetics 2001;157:425–32.
46. Chang MR, Wu RL, Wu S, et al. Score statistics of quantitative trait locus mapping. Stat Appl Genet Mol Biol 2009;8(1):Article 16.
47. Wu JS, Zhang B, Cui YH, et al. Genetic mapping of developmental instability: Design, model and algorithm. Genetics 2007;176:1187–96.
48. Senthilvel S, Vinod KK, Senthilvel P, et al. QTL and QTL × environment effects on agronomic and nitrogen acquisition traits in rice. J Integ Plant Biol 2008;50:1108–17.
49. Bartual R, Lacasa A, Marsal JI, et al. Epistasis in the resistance of pepper to phytophthora stem blight (Phytophthora capsici L.) and its significance in the prediction of double cross performances. Euphytica 1994;72:149–52.
50. Wu RL. Detecting epistatic genetic variance with a clonally replicated design: models for low- vs. high-order nonallelic interaction. Theor Appl Genet 1996;93:102–09.

51. Yan JQ, Zhu J, He CX, et al. Molecular dissection of developmental behavior of plant height in rice (Oryza sativa L.). Genetics 1998;150:1257–65.
52. Zhao W, Zhu J, Gallo-Meagher M, et al. A unified statistical model for functional mapping of genotype by environment interactions for ontogenetic development. Genetics 2004;168:1751–62.
53. Gayan J, Gonzalez-Perez A, Bermudo F, et al. A method for detecting epistasis in genome-wide studies using case-control multi-locus association analysis. BMC Genomics 2007;9:360.
54. Huang X, Wei X, Sang T, et al. Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet 2010;42:961–7.
55. Marchini J, Donnelly P, Cardon LR. Genome-wide strategies for detecting multiple loci that influence complex diseases. Nat Genet 2005;37:413–7.
56. Wang WY, Barratt BJ, Clayton DG, et al. Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 2005;6:109–18.
57. Wan X, Yang C, Yang Q, et al. Predictive rule inference for epistatic interaction detection in genome-wide association studies. Bioinformatics 2010;26:30–7.
58. Hlavacek WS, Faeder JR. The complexity of cell signaling and the need for a new mechanics. Sci Signal 2009;2(81):pe46.
59. Weng GZ, Bhalla US, Iyengar R. Complexity in biological signaling systems. Science 1999;284:92–6.
60. Li JH, Das K, Fu G, et al. Bayesian lasso for genome-wide association studies. Bioinformatics 2011;27:516–23.
