The Complex Inheritance of Maize Domestication Traits and
Gene Expression
By
Zachary H. Lemmon
A dissertation submitted in partial fulfillment of
the requirements for the degree of
Doctor of Philosophy
(Genetics)
at the
UNIVERSITY OF WISCONSIN – MADISON
2014
Date of final oral examination: 4/29/14
The dissertation is approved by the following members of the Final Oral Committee:
John F. Doebley, Professor, Genetics
David A. Baum, Professor, Botany and Genetics
Shawn M. Kaeppler, Professor, Agronomy
Patrick H. Masson, Professor, Genetics
Bret A. Payseur, Professor, Genetics
Acknowledgements
I want to extend my thanks to John Doebley for making this dissertation possible. John
has been a constant voice of encouragement and insight throughout my graduate career.
He has been instrumental in keeping me focused on the big question, while allowing me
the freedom to chase down side interests and projects. John has taught me the importance
of focusing my scientific inquiry on the core of a research question, which has shaped the
way I approach research. While I have carried out the experiments described in this work,
the first steps taken in these projects belong to John and I am grateful for the chance I
was given to shepherd them to completion. Every day and conversation I have had with
John as my advisor has made me into a better scientist and I am extremely thankful for
the opportunity I was given six years ago when I joined the Doebley lab.
I have been fortunate enough to also work in an outstanding lab full of supportive
individuals on both a personal and professional level. The work performed by a number
of my fellow lab members was crucial to the completion of these experiments. Without
their help, the many DNA and RNA extractions, PCR reactions, phenotype measurements,
and plantings simply would not have happened. Fellow graduate students, postdocs,
lab technicians, and undergraduate workers have all assisted in their own way. I am
also thankful that in addition to being wonderful coworkers in a professional sense, lab
members have contributed to making the lab a fun, exciting, and enjoyable place to
spend my Ph.D. career. I will never forget the power of “Tak”, being “skinny up top”,
or the “lab master”. To Tony, Laura, CJ, Ali, Bao, Tina, Lisa, Eric III, Jesse, Elizabeth,
David, Claudia, Wei, and the numerous undergrads, thank you for making this wonderful
experience possible.
In addition to my friends and colleagues at Wisconsin, I have been fortunate enough to
be involved in a larger community of maize researchers at Cornell University, University of
Missouri, North Carolina State University, and University of California - Davis. Working
with these scientists has exposed me to a variety of questions and topics in maize research
regarding phenotype, quantitative genetics, and large scale data collection and analysis
resulting in a greatly expanded experience. In particular, collaborations with Qi Sun
and Robert Bukowski at Cornell have greatly contributed to analysis in the third chapter
of this thesis. Dialogue with Jeff Ross-Ibarra and Matt Hufford at UC-Davis has also
continuously provided me with insight into the population genetics of maize domestication
and given me a valuable resource to draw on.
My Ph.D. committee has been an excellent resource during my graduate career. Bret
Payseur and Shawn Kaeppler in particular have provided valuable insight into scientific
questions and suggested analyses that have become part of this dissertation. David Baum
has always made time in his busy schedule to meet with me and keep up to date with my
progress. Finally, Patrick Masson has been a constant source of encouragement and has
assisted me in several capacities both within and outside of the Ph.D. committee.
I am also eternally grateful to my family, who have stood by my side throughout
this process. My parents, Karen and Holden, for giving me the tools and opportunity to
pursue my goals. My sisters, Addie and Kelsey, for always being there and my wonderful
nieces, Laney and Havi, for always making me smile. My amazing friend, Alex, who has
been a constant source of support in my life and is one of the family now. Finally, my
wife Megan, you have kept me grounded throughout these six years in Madison in both
the good and bad times. You are my rock and this would not have been possible without
you.
Contents
Acknowledgments
Table of Contents
List of Figures
List of Tables
Abstract
Preface
1 Genetic dissection of a genomic region with pleiotropic effects on domestication traits in maize reveals multiple linked QTL
1.1 Abstract
1.2 Introduction
1.3 Materials and Methods
1.3.1 Plant Material, Genotypes, and Phenotypes
1.3.2 Mixed Models and Heritability
1.3.3 QTL Mapping
1.3.4 Simulation Experiment
1.4 Results
1.4.1 QTL mapping
1.4.2 Simulation Experiment
1.5 Discussion
2 Fine mapping of chromosome five domestication genes in maize
2.1 Abstract
2.2 Introduction
2.3 Materials and Methods
2.3.1 Plant material
2.3.2 Field Trials and Phenotypes
2.3.3 Genotyping with PCR and next generation sequencing
2.3.4 Statistical analysis and segregation of phenotypes
2.4 Results
2.4.1 RCNIL generation and phenotype least squared means
2.4.2 PCR and GBS genotyping
2.4.3 QTL fail to segregate as Mendelian traits
2.4.4 Multiple factors contribute to culm diameter and kernel row number
2.5 Discussion
2.5.1 The complex genetic architecture of culm and kernel row number
2.5.2 Future work on chromosome five QTL
3 The role of cis regulatory evolution in maize domestication
3.1 Abstract
3.2 Introduction
3.3 Materials and Methods
3.3.1 Plant material, RNA preparation, and sequencing
3.3.2 Bioinformatics
3.3.3 Maize:teosinte gene expression ratios
3.3.4 Testing for cis and trans effects
3.3.5 Candidate genes
3.3.6 Proportion of cis variation in maize and teosinte
3.3.7 Additive and dominant gene expression
3.3.8 CCT gene enrichment in various functional categories
3.4 Results
3.4.1 RNAseq provides expression data for more than 17,000 genes per tissue
3.4.2 Prolific regulatory variation characterized by relatively few consistent cis differences
3.4.3 Possible directional bias in cis evolution
3.4.4 Gene expression variation is greater in teosinte
3.4.5 Selection candidate genes are enriched for CCT genes
3.4.6 Microarray and RNAseq data partially correspond
3.4.7 CCT genes are unrelated to differentially methylated regions
3.4.8 Dominant and additive gene expression inheritance
3.4.9 Candidate genes enriched in various functional categories
3.5 Discussion
3.5.1 Regulatory change between and within maize and teosinte
3.5.2 What is the frequency of cis and trans regulatory change?
3.5.3 Tissue specific expression of CCT candidates
3.5.4 Bias toward increased maize expression?
3.5.5 Selection-candidates enriched for cis regulatory change
3.5.6 Leaf tissue candidates are enriched for photosynthesis and chloroplast GO terms
3.5.7 Do crop domestication genes show cis differences?
3.5.8 A catalog of genes with cis regulatory variation
Appendices
A Supplemental Content: Genetic dissection of a genomic region with pleiotropic effects on domestication traits in maize reveals multiple linked QTL
A.1 Figures
A.2 Tables
B Supplemental Content: Fine mapping of chromosome five domestication genes in maize
B.1 Tables
C Supplemental Content: The role of cis regulatory evolution in maize domestication
C.1 Figures
C.2 Tables
D Characterization of domestication traits for selection candidate gene Zea agamous2
D.1 Forward
D.2 Introduction
D.3 Methods
D.3.1 RCNILs
D.3.2 Transgenic RNAi lines
D.4 Results
D.4.1 RCNILs
D.4.2 Transgenic RNAi lines
D.5 Discussion
References
List of Figures
1.1 Cumulative plot of QTL detected in the mapping experiment.
1.2 The number of detected QTL and mean detected QTL effect size versus number of simulated causative loci.
1.3 The proportion of detected QTL with zero, one, or more than one simulated causative genes in the 1.5 LOD support interval.
2.1 Histograms of least squared means for the culm diameter and kernel row number phenotypes.
2.2 GBS genotypes for kernel row number RCNILs.
2.3 RCNILs sorted by phenotype from least to greatest.
2.4 Density plots of the culm diameter and kernel row number phenotypes grouped by founding HIF.
2.5 QTL LOD profiles for fine mapping of culm diameter and kernel row number traits.
3.1 Overlap of genes assessed in the three tissues overall and in the CCT-AB gene list.
3.2 Parent versus hybrid ear tissue allele specific expression ratios.
3.3 Proportion of expression divergence due to cis regulatory difference.
3.4 Cis versus estimated trans regulatory effect for CCT-ABC genes in the ear, leaf, and stem.
3.5 The proportion of average maize to teosinte R² from linear models explaining F1 hybrid expression by maize and teosinte parent.
3.6 Density plots of ln(XPCLR) score of conserved versus CCT-AB candidate genes.
3.7 Proportion of cis only and trans only genes identified as having dominant or additive inheritance.
A.1 Histograms of the least squared means for phenotyped traits from the QTL mapping population.
A.2 Example histograms of simulated traits for several different conditions in terms of number of causative loci, effect size, and heritability.
A.3 Proportion of detected QTL with zero, one, or multiple causative genes in the 1.5 LOD support interval.
C.1 Parent versus hybrid leaf tissue allele specific expression ratios.
C.2 Parent versus hybrid stem tissue allele specific expression ratios.
C.3 Dominance by additivity ratio grouped by regulatory category.
D.1 Single kernel weight estimates for zag2 RCNILs.
List of Tables
1.1 NIRIL phenotyped traits, descriptions, approximate distribution, between year Pearson correlation coefficients, and Pearson p-values.
1.2 Final models selected for the thirteen NIRIL phenotypes.
1.3 Detected QTL for the T5S mapping population with position, heritability, and LOD score statistics.
2.1 Final linear mixed models used to produce least squared means for fine mapping RCNILs.
2.2 Detected QTL and HIF effects including LOD, percent variation explained, and additive effect.
3.1 Regulatory category as defined by significant (Sig.) or not significant (Not Sig.) binomial tests (BT) and Fisher’s Exact Tests (FET).
3.2 Assignable RNAseq read counts from F1 hybrids and parents.
3.3 Genes for which RNAseq data was collected and expression was assayed.
3.4 Fisher’s Exact Tests for overlap of selection and CCT candidates.
3.5 Fisher’s Exact Tests for enrichment/depletion of cis and trans only genes in selection features.
3.6 Fisher’s Exact Tests for overlap between microarray and CCT differentially expressed genes.
3.7 Regulatory category of the closest maize homolog of 6 maize and 22 non-maize domestication loci.
A.1 RFLP markers used during backcrossing of QTL mapping population.
A.2 Genetic markers used to score BC6S6 mapping population.
B.1 PCR markers used for genotyping RCNILs including gene or SNP target, AGPv2 position, and primer sequence.
C.1 Biological replicates for RNAseq experiment.
C.2 Adapter name, barcode sequence, and barcode length for Illumina adapters used in RNAseq libraries.
C.3 Number of genomic paired end reads and coverage obtained for constructing pseudo-transcriptomes.
C.4 Proportion of divergence due to cis regulatory effect grouped by overall parental divergence.
C.5 The number of genes for which the maize or teosinte allele is expressed at a higher level.
C.6 Bias for the maize allele grouped by inbred line for the three tissues in the CCT-ABC gene list.
C.7 Allele specific expression variation among F1 hybrids explained by maize and teosinte parent.
C.8 Number of genes with significant cis expression variation explained by maize and/or teosinte.
C.9 Comparison of observed and expected numbers of genes classified as differentially expressed (DE) or not differentially expressed (NDE) by RNAseq and microarray assays in groups A, B, and C in the three tissue types.
C.10 Regulatory categories for genes identified as differentially expressed between maize and teosinte by microarray assays.
C.11 Fisher’s Exact Tests for the overlap between genes associated with differentially methylated regions (DMRs) and CCT-ABC genes from each of the three experimental tissues in our work.
C.12 Number of candidate genes neighboring differentially methylated regions (DMRs) between maize and teosinte and proportion in which expression data agrees with methylated status.
C.13 Dominance/additivity ratios for genome-wide gene expression.
C.14 Contingency tables for additive and dominant gene counts for A, AB, and ABC candidate lists.
C.15 Degree of overlap between our CCT (AB list) genes and genes in different transcription factor families.
C.16 Degree of overlap between CCT (AB list) differentially expressed genes and genes in the 1.5 LOD support intervals for QTL from a previous study.
C.17 Degree of overlap between our CCT (AB list) differentially expressed genes and genes in metabolic pathways defined in KEGG.
C.18 Significantly enriched and depleted GO terms from CCT and trans only gene lists.
D.1 Trait abbreviations and descriptions from the zag2 experiment.
D.2 Zag2 transgenic RNAi insertion event, background, phenotype, and t-test p-value.
Abstract
The genetic basis for morphological change in divergent species is a central question in
evolutionary biology. The domestication of maize from its wild progenitor, teosinte, is an
excellent system to address this question. We explore the large effect on domestication
phenotypes of a poorly understood region of the maize genome using a chromosome five
specific mapping population. Unlike other large effect regions of the maize genome, many
traits have multiple QTL that do not stack on a single locus, suggesting that multiple genes on
the fifth chromosome influence domestication traits. Simulation studies show that power to
detect QTL is limited for highly polygenic traits, so the detected QTL do not accurately
portray the true complexity of the underlying genetic architecture. Two QTL in different
locations were chosen for fine mapping studies to identify the underlying causative genes.
While a single gene was not identified for either QTL, both were successfully narrowed
to less than three centimorgan intervals with relatively few genes and evidence of positive
selection during maize domestication. Finally, the first genome-wide effort to characterize
cis and trans regulatory change between a domesticated crop and its wild progenitor found
extensive regulatory variation with relatively few genes having consistent cis differences,
which were determined to be under positive selection during the domestication and crop
improvement of maize. Consistent with loss of diversity during the domestication bottle-
neck, cis expression variation explained by the maize parent is reduced in comparison to
teosinte, with an even greater reduction seen in cis candidate genes. A general increase in
the expression of maize alleles was also observed, suggesting that domestication in maize may
have led to a general increase in gene expression. Collectively, these experiments shed
light on the evolution of divergent phenotypes and gene regulation in domesticated
maize and its wild progenitor.
Preface
The nature of functional changes to the genes responsible for phenotypic divergence in re-
lated species is a topic of ongoing research in evolutionary biology. Many types of genomic
features have been shown to influence the development of novel phenotypes. Studies in
closely related species have identified gene duplications [1], various types of expression
modification [2–4], and gene coding changes [5, 6] that give rise to altered phenotypes. A
major contributor to evolutionary biology research is the study of domesticated crops and
their wild ancestors, where the intense artificial selection for agronomic traits during the
domestication process serves as a proxy for natural selection mechanisms. Experiments
characterizing the functional changes responsible for novel phenotypes in the domesticated
systems of rice, tomato, wheat, and sorghum have been met with great success [7].
One of the most successfully used domestication crop models is maize, where scientists
have extensively investigated the morphological differences between maize (Zea mays ssp.
mays) and its wild progenitor, teosinte (Zea mays ssp. parviglumis). Maize is an excellent
system to pursue evolutionary questions for a number of reasons. Maize was domesticated
approximately 9,000 years ago in the Balsas River valley of Mexico [8]. Like other domes-
ticated systems, maize-teosinte F1 hybrids are fertile, which allows the use of powerful
genetic techniques to dissect the genetics of complex traits. The maize reference genome
also greatly facilitates research by empowering the use of sequence based analyses and
comparative genomics [9]. A common collection of phenotypic differences seen between
domesticated crops and their wild progenitors is also observed when comparing maize and
teosinte. This “domestication syndrome” [10, 11] consists of phenotypes that improve the
suitability of a crop for human use such as loss of shattering (natural seed dispersal),
increased apical dominance, loss of prolificacy (concentration of seed into one unit), and
gigantism of vegetative and reproductive tissues.
One method commonly used to examine genetic factors controlling morphological variation
in maize is quantitative trait locus (QTL) mapping. Studies examining the domestication
of maize have shown that the profound morphological differences between maize and its
wild progenitor teosinte can be primarily attributed to six regions of large effect on
the first five chromosomes of maize [8]. Three of these regions
have been further characterized, identifying single genes of large, pleiotropic effect. The
functional causative polymorphisms of these genes include new tissue specific expression
patterns [4], elevated expression [3], and coding sequence change [5]. In contrast to these
well characterized loci, other regions of the genome with large effect on domestication
phenotypes are poorly understood.
A prominent theory in evolutionary biology suggests the primary mechanism by which
adaptive evolution occurs is through modification of cis regulatory elements [12, 13].
Consistent with this theory, altered cis regulatory elements in domesticated crops account
for a large proportion of identified domestication genes [7]. A striking characteristic of
these genes is the variety of functional changes that result from cis regulatory change
with examples including elevated and decreased expression [3, 14], development of novel
tissue specific expression patterns [4, 15], and heterochronic shifts in expression [16]. The
demonstrated importance of gene regulatory change in the evolution of new forms has
led to several studies investigating genome-wide gene expression in domesticated crops
[17–19]. While measuring gene expression differences between a modern crop and its wild
relative is an important step in exploring regulatory variation in an evolutionary context,
it falls short of the global analyses in yeast and fruit fly [20, 21] that specifically dissect
cis and trans regulatory variation.
The work presented in this dissertation seeks to explore two facets of diversification
between maize and teosinte. First, quantitative genetic methods are used to specifically
assess the architecture of domestication QTL and causative genes on the fifth chromosome
of maize, providing insight into the genetic factors underlying this previously uncharacterized
region of large phenotypic effect in the maize genome. Second, regulatory variation
due to cis and trans regulatory change is investigated on a genome-wide
scale using deep RNA sequencing. This work is presented in three chapters.
1. The first chapter describes a chromosome five specific QTL mapping experiment. A
large BC6S6 population was developed in which other regions known to impact
domestication traits were fixed for a homozygous genotype. Thirteen phenotypes representing
differences between the progenitor and maize were measured in two summers and
QTL mapping was performed. We detected an average of approximately two QTL
per trait, with QTL mapping to multiple regions. This suggested that, unlike other
genomic regions of importance in maize domestication, the fifth chromosome houses
a complex of linked loci that all contribute to the phenotypic effect. Additional
efforts were made to examine the power and precision of our mapping population
with simulated trait datasets. Heritability of a trait was found to have the primary
influence on the maximum number of detectable QTL and we observed the Beavis
Effect on estimated QTL effect size. This work provides a focused examination of
a previously poorly understood region of the maize genome with large phenotypic
effects on domestication traits.
2. The second chapter focuses on fine mapping efforts for two QTL for culm diame-
ter and kernel row number on the fifth chromosome identified in chapter one. Our
strategy used a population of plants with homozygous recombinant chromosomes
in replicated field trials. Neither QTL was successfully mapped to a single gene;
however, the culm diameter QTL was greatly reduced in size (to ∼2.5% of the original
1.5 LOD support interval). The kernel row number QTL was analyzed with whole
genome genotyping data and a complex set of genetic factors influencing the trait
was identified. The main kernel row number QTL in terms of LOD score on chromosome
five shifted to a different region outside of the original support interval. The
culm diameter and kernel row number QTL contained 40 and 63 genes, respectively,
which were examined for attractive candidate genes. Neither QTL had a clear best
candidate, but several genes showed evidence for cis regulatory change and multiple
genes had evidence of positive selection during the domestication of maize. While
this work was unsuccessful in identifying a single causative gene, we greatly reduce
the size of the culm diameter QTL and find evidence for complex inheritance of the
kernel row number phenotype.
3. Finally, the extent of genome-wide gene regulatory change is examined using next
generation sequencing methods. Three tissues from a collection of maize-teosinte
F1 hybrids and their inbred parents were harvested and next generation Illumina
sequencing was performed to assess differential expression of alleles. Using a hier-
archical series of statistical tests, we differentiate between significant cis and trans
regulatory effects for approximately 17,000 genes in each of the three tissues studied.
We produce a list of filtered candidate genes (∼500 genes per tissue) with significant
and consistent cis effects. These genes are significantly associated with selection
features from a recent genome-wide scan for selection in maize, suggesting genes
with cis regulatory changes are frequently the target of positive selection. Addi-
tionally, the proportion of effect due to cis was observed to be positively correlated
with overall divergence. Several other characteristics of the candidate cis genes were
also analyzed including gene ontology and other functional annotations. This study
represents the first genome-wide effort in a domesticated crop and wild progenitor
to assess allele specific expression dissecting cis and trans effects using F1 hybrids.
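The hierarchical testing idea in the third chapter can be illustrated with a minimal sketch. It assumes a simplified two-test scheme that is common in allele specific expression studies and is not necessarily the exact procedure used in this dissertation: a binomial test on F1 hybrid allele counts (both alleles share one trans environment, so imbalance points to cis) and a Fisher's Exact Test comparing the parental ratio to the hybrid ratio (a difference points to trans). The function names, the example read counts, and the 0.05 threshold are hypothetical, and no multiple testing correction is applied.

```python
from math import comb


def binomial_two_sided(k, n):
    """Exact two-sided binomial p-value for k successes in n trials at p = 0.5."""
    pmf = [comb(n, i) / 2**n for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # tolerance for floating point ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))


def fisher_two_sided(a, b, c, d):
    """Exact two-sided Fisher p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    low, high = max(0, row1 - (n - col1)), min(row1, col1)
    cutoff = prob(a) * (1 + 1e-9)
    probs = [prob(x) for x in range(low, high + 1)]
    return min(1.0, sum(p for p in probs if p <= cutoff))


def classify(parent_maize, parent_teo, hybrid_maize, hybrid_teo, alpha=0.05):
    """Assign a regulatory category from read counts (assumed two-test scheme)."""
    # cis: allelic imbalance within the hybrid.
    cis = binomial_two_sided(hybrid_maize, hybrid_maize + hybrid_teo) < alpha
    # trans: parental maize:teosinte ratio differs from the hybrid allelic ratio.
    trans = fisher_two_sided(parent_maize, parent_teo, hybrid_maize, hybrid_teo) < alpha
    if cis and trans:
        return "cis + trans"
    if cis:
        return "cis only"
    if trans:
        return "trans only"
    return "conserved or ambiguous"


# Hypothetical read counts: (maize parent, teosinte parent, maize allele, teosinte allele)
print(classify(150, 50, 150, 50))   # imbalance persists in the hybrid
print(classify(150, 50, 100, 100))  # imbalance disappears in the hybrid
```

Under these assumptions, an expression difference that persists between alleles inside the hybrid is attributed to cis change, while a parental difference that vanishes in the hybrid is attributed to trans change.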
Chapter 1
Genetic dissection of a genomic
region with pleiotropic effects on
domestication traits in maize reveals
multiple linked QTL
1.1 Abstract
The domesticated crop maize and its wild progenitor, teosinte, have been used in numerous
experiments to investigate the nature of divergent morphologies. This study examines a
poorly understood region on the fifth chromosome of maize associated with a number of
traits under selection during domestication using a QTL mapping population specific to
the fifth chromosome. In contrast with other major domestication loci in maize where
large effect, highly pleiotropic, single genes are responsible for phenotypic effects, our
study found the region on chromosome five fractionates into multiple QTL, none with
singularly large effects. The smallest 1.5 LOD support interval for a QTL contained
54 genes, one of which was a MADS MIKCc transcription factor, a member of a family of proteins
implicated in many developmental programs. We also used simulated trait datasets to
investigate the power of our mapping population to identify QTL for which there is a
single underlying causal gene. This analysis showed that while QTL for traits controlled
by single genes can be accurately mapped, our population design can detect no more than
∼4.5 QTL per trait even when there are 100 causal genes. Thus when a trait is controlled
by 5 or more genes in the simulated data, the number of detected QTL can represent a
simplification of the underlying causative factors. Our results show how a QTL region
with effects on several traits may be due to multiple linked QTL of small effect as opposed
to a single gene with large and pleiotropic effects.
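The 1.5 LOD support interval mentioned above can be made concrete with a small sketch. It assumes a simplified convention: starting from the peak of a LOD profile, the interval extends outward across contiguous positions whose LOD stays within 1.5 units of the peak. The function name and the example profile are hypothetical, and QTL mapping software may apply a slightly different rule (for example, expanding to the next flanking marker).

```python
def lod_support_interval(positions, lods, drop=1.5):
    """Return (left, peak, right) positions of a LOD-drop support interval.

    Assumed convention: the interval is the contiguous run of positions
    around the peak whose LOD is at least (peak LOD - drop).
    """
    if not lods or len(positions) != len(lods):
        raise ValueError("positions and lods must be non-empty and equal in length")
    peak = max(range(len(lods)), key=lambda i: lods[i])
    threshold = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= threshold:
        right += 1
    return positions[left], positions[peak], positions[right]


# Hypothetical LOD profile scanned every 5 cM along a chromosome.
positions_cm = [0, 5, 10, 15, 20, 25, 30]
lod_scores = [0.8, 2.1, 4.0, 5.2, 4.1, 2.9, 1.0]
print(lod_support_interval(positions_cm, lod_scores))  # → (10, 15, 20)
```

The number of genes inside such an interval, 54 for the smallest interval in this study, depends on both the sharpness of the LOD peak and the local gene density.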
1.2 Introduction
In evolutionary biology, quantitative trait locus (QTL) mapping has been used with great
success to define the genetic architecture controlling morphological differences between
species. These QTL mapping experiments have identified a number of QTL with large
effects in animal [22–24] and plant systems [25–28]. Often these experiments identify QTL
clusters in a relatively small number of genomic regions, suggesting an underlying genetic
architecture of single pleiotropic genes or several closely linked genes [8, 24, 29–31]. The
phenotypic effects of QTL have been successfully mapped to single large effect pleiotropic
genes in many species [3, 5, 15, 16, 32–34]. However, these large effect genes often only
explain a portion of the divergence between species, leaving a considerable amount of
phenotypic differences unexplained. Characterization of QTL clusters not associated with
single genes will lead to a more comprehensive understanding of the genetic architecture
that contributes to divergent phenotypes.
Domesticated crop plants and maize in particular provide a well-suited system in which
to study the evolution of new morphologies for a number of reasons. First, maize (Zea
mays ssp. mays) and its wild progenitor teosinte (Z. mays ssp. parviglumis) differ for a
suite of traits commonly seen in domesticated crop pairs. Collectively, these differences
are known as the domestication syndrome and include reduced lateral branching, loss
of natural seed dispersal, and gigantism of vegetative and reproductive tissues [10, 11].
Second, intense artificial selection upon domesticated crops, including maize, for desirable
agronomic traits leaves a signature of selection (reduced nucleotide diversity) allowing for
identification of putative targets of artificial selection in selective sweeps [35]. Third, like
most domestication events, maize domestication took place in the last 10,000 years and
surviving wild progenitor populations serve as reasonable surrogates for the ancestor [36].
In addition, maize and teosinte are inter-fertile, allowing for the use of genetic techniques
and crosses to dissect the genetic architecture underlying divergent traits [37, 38]. Finally,
researchers studying maize have the advantage of a powerful tool in the reference maize
genome sequence providing the ability to anchor genetic markers to physical positions,
annotation of candidate genes, and characterization of important genomic features such
as centromeres [9]. The combination of these characteristics and available tools make
maize an effective model system in which to study the evolution of new forms.
Previous work in maize and its wild progenitor suggests the genes responsible for
phenotypic change are scattered throughout the genome but with several concentrations
of genes (QTL) controlling large portions of the phenotypic differences [8, 25]. To date,
three large effect pleiotropic genes have been mapped to these genomic regions of large
phenotypic importance. The short arm of chromosome one is home to grassy tillers1 (gt1),
which influences tillering [39] and is largely responsible for the concentration of seed into
a single large ear [4]. The gene teosinte branched1 (tb1) is found on the long arm of
chromosome one and has a large pleiotropic impact on plant and inflorescence branching
[3, 40]. Finally, the gene teosinte glume architecture1 (tga1) liberates the kernel from
its stony fruit case in teosinte [5]. In comparison to these extensively studied genes,
little is known about the genetic factors on other chromosomes responsible for phenotypic
divergence during maize domestication.
While early studies identified tb1 as the gene responsible for much of the phenotypic
effect on the long arm of chromosome one [41], a more recent study has identified at least
two additional loci upstream of tb1 with significant effects on phenotype [42]. These loci
influence the expression of tb1-like phenotypes in both additive and epistatic ways. The
nearest of these loci was only 5 centimorgans (cM) away from tb1 itself and also had an
effect specific to ear traits, leaving plant architecture traits such as tillering unaffected.
This suggests secondary factors to major effect genes are potentially quite closely linked
and could also mediate tissue specific effects. Similarly, the work identifying gt1 also found
evidence of a secondary factor located downstream of the identified causative region that
slightly increases prolificacy (the number of ears) in plants carrying the teosinte allele [4].
One of the six genomic regions of large pleiotropic effect identified in maize is on
chromosome five where the genetic architecture underlying the large phenotypic effects
is largely unknown [8]. Previous work has found a number of domestication QTL on
chromosome five for culm diameter, kernel row number, ear diameter, disarticulation,
and pedicellate spikelet length [8, 37, 38]. A more recent experiment also found QTL
for a number of these traits on chromosome five, some of which (kernel row number, ear
diameter, and disarticulation) had particularly large effects and LOD scores [25]. While
these previous mapping experiments found significant QTL for domestication traits on
chromosome five, they could not determine whether this region contained a major QTL
with pleiotropic effects on several traits or multiple linked QTL.
In this paper, we undertook a QTL mapping study to better characterize the effect of
chromosome five on domestication traits. This experiment utilized a population of nearly
isogenic recombinant inbred lines (NIRILs) that allowed for concentration of informative
crossover events in the region of interest (chromosome five) and replicated block experi-
ments to improve trait measurements. Both of these characteristics increase the mapping
power specifically on chromosome five in comparison with a standard F2 mapping pop-
ulation, improving the ability to differentiate between closely linked, moderate to small
effect, and interacting QTL. Our QTL mapping detected QTL at multiple locations on
the fifth chromosome, none of which have singularly large effect. This suggests that un-
like other regions of the maize genome with single large effect genes [3–5], chromosome
five houses several linked factors influencing phenotype. We also performed a simulation
study to gauge the power and precision of our mapping population. This analysis indi-
cates that for some traits the genetic architecture could be more complex than observed
with empirical data.
1.3 Materials and Methods
1.3.1 Plant Material, Genotypes, and Phenotypes
We conducted a QTL mapping experiment to investigate the genetic architecture of do-
mestication traits on maize chromosome five using a collection of nearly isogenic recombi-
nant inbred lines (NIRILs) in the summers of 2009 and 2010. The experimental population
was built by introgressing the majority of the short arm of chromosome five and part of
the long arm from a teosinte (Iltis and Cochrane collection 81) into the maize inbred
W22 by six generations of backcrossing. RFLP markers (Supplemental Table A.1) were
used during this process to follow the desired genomic segment and eliminate teosinte
segments at other known domestication QTL identified in a previous study [43]. The
extensive backcrossing in tandem with tracking and eliminating teosinte segments from
specific regions of the genome allowed the experiment to be focused on the segregating
teosinte introgression on chromosome five. Five BC6 individuals heterozygous for the tar-
get segment on chromosome five were selfed to produce five BC6S1 families. The families
were then selfed for five additional generations to give an experimental BC6S6 population
of 259 highly homozygous NIRILs, which carried a collection of teosinte fifth chromosome
introgressions in an isogenic W22 background.
Genomic DNA was extracted with a standard CTAB protocol from tissue collected
from an average of 15 individuals from each NIRIL in the summer of 2009. A collec-
tion of 25 insertion/deletion and microsatellite markers (Supplemental Table A.2) were
genotyped across the fifth chromosome introgression using standard PCR and gel elec-
trophoresis methods. In total, there were 443 observed recombination breakpoints among
the NIRILs or approximately 1.7 events per line. The number of recombination breakpoints
per line ranged from zero to six, with the majority of lines (51.7%) having either zero or a single
recombination event. The number of lines with each number of breakpoints is as follows:
56 (0 breakpoints), 78 (1 breakpoint), 49 (2 breakpoints), 48 (3 breakpoints), 19 (4
breakpoints), 7 (5 breakpoints), and 2 (6 breakpoints).
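
The breakpoint summary above can be checked with a few lines of arithmetic. The following Python sketch is not part of the original analysis; it only recomputes the totals from the per-class counts reported in the text, and the function name is hypothetical.

```python
# Number of NIRILs observed with 0..6 recombination breakpoints (counts from the text).
lines_per_breakpoint_count = {0: 56, 1: 78, 2: 49, 3: 48, 4: 19, 5: 7, 6: 2}


def summarize_breakpoints(counts):
    """Return (lines, total breakpoints, mean per line, share of lines with <= 1)."""
    n_lines = sum(counts.values())
    n_breakpoints = sum(k * n for k, n in counts.items())
    mean_per_line = n_breakpoints / n_lines
    share_zero_or_one = (counts[0] + counts[1]) / n_lines
    return n_lines, n_breakpoints, mean_per_line, share_zero_or_one


n_lines, n_breakpoints, mean_per_line, share_zero_or_one = summarize_breakpoints(
    lines_per_breakpoint_count
)
print(n_lines, n_breakpoints, round(mean_per_line, 2), round(100 * share_zero_or_one, 1))
# prints: 259 443 1.71 51.7
```

The counts sum to the 259 NIRILs and 443 breakpoints stated above, which gives the reported average of about 1.7 events per line.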
Phenotype data was collected for the experimental NIRILs in three replicated blocks,
two in the summer of 2009 and one in 2010, grown at the West Madison Agricultural
Research Station in Madison, Wisconsin. Blocks consisted of the 259 NIRILs planted
in randomized plots of ten or twelve plants each in 2009 and 2010, respectively. Five
plants from each plot were assessed for thirteen phenotypes (Table 1.1) representing a
number of plant and inflorescence phenotypic differences between teosinte and maize.
Plant traits included plant height, days to pollen shed, the amount of tillering, length of
the primary lateral branch, prolificacy, and culm diameter. Inflorescence traits measured
in the female inflorescence (ear) were kernels per rank, kernel row number, ear diameter,
ear length, and percent staminate spikelets. Several traits from the male inflorescence or
tassel were also measured and include the pedicellate spikelet length and tassel branch
number. Genotype and phenotype data are available from the Dryad Digital Repository:
http://dx.doi.org/10.5061/dryad.7sq67.
1.3.2 Mixed Models and Heritability
We estimated the NIRIL phenotype for all traits by fitting a linear mixed model. Fixed
effects consisted of NIRIL, NIRIL family, and position within block, while block and year
were used as random effects. A model (Equation 1.1) was fit with the MIXED procedure
in SAS [44] as an initial scope. In this model, Y_{ijklmno} is the individual trait value,
µ is the overall mean, f_j is the family effect, a_i(f_j) is the effect of line nested in family,
b_k is the random block effect, c_l(b_k) and d_m(b_k) are the horizontal and vertical positions
in the field nested in block, t_n is the year effect, e_{ijklmn} is the experimental error (between
plots), and g_{ijklmno} is the within-plot sampling error. Each model term was tested for
significance on a trait-by-trait basis with t-tests for fixed effects and likelihood ratio tests
Table 1.1: NIRIL phenotyped traits, descriptions, approximate distribution, between-year Pearson correlation coefficients, and Pearson p-values.

Trait  Description                          Distribution  Pearson  p-value
CULM   Diameter of culm                     normal        0.688    <0.0001
DTP    Days to pollen shed                  normal        0.668    <0.0001
EARD   Ear diameter                         bimodal       0.907    <0.0001
EARL   Ear length                           normal        0.409    <0.0001
KPR    Kernels per rank                     bimodal       0.698    <0.0001
KRN    Kernel row number                    bimodal       0.718    <0.0001
LBLH   Primary lateral branch length        normal        0.519    <0.0001
PLHT   Plant height                         normal        0.652    <0.0001
PROL   Prolificacy, ears on lateral branch  exponential   0.422    <0.0001
SPLH   Spikelet length                      normal        N/A      N/A
STAM   Percent staminate spikelets          exponential   0.321    <0.0001
TBN    Tassel branch number                 normal        0.691    <0.0001
TILL   Tillering index                      exponential   0.346    <0.0001

Table 1.2: Final models selected for the thirteen NIRIL phenotypes.

Trait  Model
CULM   line(family) + family + x(plot) + y(plot)
DTP    line(family) + family + x(plot) + y(plot) + x*y(plot)
EARD   line(family) + family + x(plot) + y(plot) + x*y(plot)
EARL   line(family) + family + x(plot) + y(plot) + x*y(plot)
KPR    line(family) + family + x(plot) + y(plot) + x*y(plot)
KRN    line(family) + family + x(plot)
LBLH   line(family) + family + x(plot) + y(plot) + x*y(plot)
PLHT   line(family) + family + x(plot) + y(plot)
PROL   line(family) + family + x(plot)
SPLH   line(family) + family + x
STAM   line(family) + family + x(plot) + y(plot) + x*y(plot)
TBN    line(family) + family + x(plot) + y(plot) + x*y(plot)
TILL   line(family) + family + y(plot)

with one degree of freedom for random effects. Likelihood ratio and t-tests with p-values
greater than 0.05 were deemed not significant and the corresponding terms were removed
from the model. While the initial scope of the model included a random block and year
effect, none of the random effects were found to be significant. Following definition of
appropriate models for the studied traits (Table 1.2), least squared means for each trait
were calculated and used for QTL mapping.
Y_{ijklmno} = \mu + a_i(f_j) + f_j + b_k + c_l(b_k) + d_m(b_k) + c_l(b_k) * d_m(b_k) + t_n + e_{ijklmn} + g_{ijklmno}    (1.1)
Broad-sense heritabilities on a plot means basis (H^2) were calculated for each of the
traits. The variance components needed for this calculation were found using a linear
mixed model with plot means as the dependent variable and plot and line as random
independent variables. Variance components for the line or genotypic component (σ_g^2),
the plot (σ_p^2), and the residual variance due to environment (σ_e^2) were extracted and
equation 1.2 was used to calculate H^2. The plot variance (σ_p^2) was calculated in the
model as a known source of variation in phenotype. Since this plot variance is known, it
does not contribute to unaccounted for environmental variation as seen by the residual
variance (σ_e^2) and was not used to calculate heritability.
H^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_e^2)    (1.2)
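
As a small illustration of equation 1.2, the Python sketch below computes broad-sense heritability from a genotypic and a residual variance component. It is a minimal sketch rather than the SAS workflow used in this study: the function name and the example variance values are hypothetical, and the plot variance is left out of the denominator as described above.

```python
def broad_sense_heritability(var_genotype, var_residual):
    """Equation 1.2: H^2 = var_g / (var_g + var_e); plot variance is excluded."""
    if var_genotype < 0 or var_residual < 0:
        raise ValueError("variance components must be non-negative")
    total = var_genotype + var_residual
    if total == 0:
        raise ValueError("at least one variance component must be positive")
    return var_genotype / total


# Hypothetical variance components, chosen only to illustrate the calculation.
example_h2 = broad_sense_heritability(var_genotype=9.0, var_residual=1.0)
print(example_h2)  # prints: 0.9
```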
1.3.3 QTL Mapping
We mapped QTL using a model based approach in R/qtl [45, 46] with phenotype, repre-
sented by least squared means, and 25 genetic markers for the NIRILs. The introgression
on the fifth chromosome started as a heterozygous segment in the BC6 generation and
segregates as an S6 population. Consequently, we analyzed the population as a BC0S6
in R/qtl. Genotypes were first used to produce a genetic map for the teosinte segment
introgression using the Kosambi mapping function [47], with a 0.0001 genotyping error
rate as implemented in R/qtl. Genetic marker order was initially found by BLAST to
the AGPv2 genome and confirmed using the ripple function in R/qtl with a five marker
window. Significant LOD score thresholds were determined for each trait with a 5% cutoff
based on 10,000 permutations of the data.
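
For readers unfamiliar with the Kosambi mapping function mentioned above, the following Python sketch shows the standard conversion between a recombination fraction and a map distance in centimorgans. It illustrates the formula only and is not the R/qtl map estimation procedure, which also has to account for the population type and genotyping error; the function names are hypothetical.

```python
import math


def kosambi_cm(recomb_fraction):
    """Kosambi map distance in cM for a recombination fraction r, with 0 <= r < 0.5."""
    if not 0 <= recomb_fraction < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * recomb_fraction) / (1 - 2 * recomb_fraction))


def kosambi_recomb_fraction(distance_cm):
    """Inverse Kosambi function: recombination fraction for a distance in cM."""
    if distance_cm < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * math.tanh(2 * distance_cm / 100.0)


distance = kosambi_cm(0.10)
print(round(distance, 2))  # prints: 10.14
print(round(kosambi_recomb_fraction(distance), 4))  # prints: 0.1
```

For small recombination fractions the distance in cM is close to 100 times the recombination fraction, and the two diverge as interference and multiple crossovers become relevant at larger distances.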
QTL models for each phenotype were determined by scanning for potential QTL using
the Haley-Knott regression method and testing for QTL significance one-by-one. Defini-
tion of QTL models was accomplished by first scanning for QTL with the R/qtl function
scanone to find an initial QTL position with a LOD score greater than the 5% cutoff
calculated by permutations. Next, we scanned for additional QTL using the addqtl func-
tion. If this secondary QTL scan detected a QTL that exceeded the 5% LOD score cutoff
defined by permutations, it was added to the model and QTL positions were refined using
the R/qtl function refineqtl. QTL were added to the model using this cycle of: (1) scan-
ning for additional QTL, (2) adding significant QTL to the model, and (3) refining QTL
positions until no more significant QTL could be added. Once all significant QTL were
added, pairwise interactions between QTL were tested using the addint function of R/qtl.
Significant pairwise interactions (F-test, p < 0.05) were added to the model one by one
until no more significant interactions were detected. After the model was finalized, each
QTL in the final QTL model was tested for significance with dropone ANOVA analysis.
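
The permutation-based 5% LOD cutoff that gates each step of the model-building cycle above can be sketched as follows. This Python example is an illustration under simplifying assumptions and is not the R/qtl implementation: it scores only marker positions with a single-marker regression (LOD = -(n/2) log10(1 - r^2)), uses fully homozygous genotypes coded 0/1, and runs on toy data; all names are hypothetical.

```python
import math
import random


def marker_lod(genotypes, phenotypes):
    """LOD score for regressing phenotype on one marker: -(n/2) * log10(1 - r^2)."""
    n = len(phenotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)
    s_yy = sum((y - mean_y) ** 2 for y in phenotypes)
    s_gy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    if s_gg == 0 or s_yy == 0:
        return 0.0  # a monomorphic marker or a constant trait carries no signal
    r_squared = min(s_gy ** 2 / (s_gg * s_yy), 1 - 1e-12)  # guard against log10(0)
    return -(n / 2) * math.log10(1 - r_squared)


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Return the (1 - alpha) quantile of the maximum LOD over permuted phenotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype association
        max_lods.append(max(marker_lod(m, shuffled) for m in markers))
    max_lods.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return max_lods[index]


# Toy data (hypothetical): 100 lines, 5 unlinked markers coded 0/1; marker 2 is causal.
toy_rng = random.Random(1)
markers = [[toy_rng.randint(0, 1) for _ in range(100)] for _ in range(5)]
phenotypes = [2.0 * g + toy_rng.gauss(0, 1) for g in markers[2]]
threshold = permutation_threshold(markers, phenotypes, n_perm=200)
causal_lod = marker_lod(markers[2], phenotypes)
print(f"5% threshold: {threshold:.2f}; LOD at causal marker: {causal_lod:.2f}")
```

In a forward-selection cycle like the one described above, a candidate QTL would be added only when its LOD score exceeds this threshold. Taking the maximum LOD across all markers in each permutation is what makes the cutoff hold across the whole scanned region.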
1.3.4 Simulation Experiment
In order to explore the theoretical maximum number of detectable QTL possible in this
study, we mapped QTL with simulated datasets where causative genes were randomly
chosen from the genes in the teosinte introgressed region. Simulated traits were made for
one to 15 causative genes, then 20 to 50 genes by fives, and then 75 and 100 causative
genes for a total of 24 different causative gene set sizes. The 25 genotyped markers in
our 259 NIRILs were used to assign genotype probabilities to the 2,576 total genes in
the introgressed segment of chromosome five based on the genotype of flanking markers.
These genotype probabilities were assigned based on physical proximity to the two flanking
markers assuming physical distance was proportional to genetic distance so that a gene
closely linked to a given marker had a high probability of sharing that marker genotype.
When consecutive markers had identical genotypes, this method resulted in all genes
between them matching the flanking genotypes.
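
One way to picture the assignment of genotype probabilities described above is linear interpolation between the two flanking markers. The Python sketch below is an assumption about how such a rule could look, not the script used in this study: the exact weighting is not given in the text, and the function name, argument names, and genotype labels are hypothetical.

```python
def flanking_genotype_probabilities(gene_pos, left_pos, right_pos, left_geno, right_geno):
    """Return {genotype: probability} for a gene lying between two flanking markers.

    Assumes the probability of sharing a marker's genotype falls off linearly
    with physical distance from that marker.
    """
    if left_pos >= right_pos:
        raise ValueError("left marker must lie before right marker")
    if not left_pos <= gene_pos <= right_pos:
        raise ValueError("gene must lie between the flanking markers")
    weight_right = (gene_pos - left_pos) / (right_pos - left_pos)
    probabilities = {}
    probabilities[left_geno] = probabilities.get(left_geno, 0.0) + (1.0 - weight_right)
    probabilities[right_geno] = probabilities.get(right_geno, 0.0) + weight_right
    return probabilities


# A gene one quarter of the way from a maize marker to a teosinte marker.
mixed = flanking_genotype_probabilities(25.0, 0.0, 100.0, "maize", "teosinte")
# Identical flanking genotypes: the gene matches them with probability 1.
same = flanking_genotype_probabilities(25.0, 0.0, 100.0, "maize", "maize")
print(mixed, same)
```

Under this rule a gene next to a marker almost always shares that marker's genotype, and genes between two markers with identical genotypes always match them, which agrees with the behavior described above.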
Phenotypic trait values are based on both the underlying genetic contributions of genes
and random environmental noise, which together define the heritability of a trait. The
genetic values in the simulated data were set as follows. For each simulated dataset, the
randomly chosen causative genes were assigned a genotype based on the previously derived
genotype probabilities and two effect types: equal and random gamma distributed (alpha
= 1.36 and beta = 1) [48]. The effect types for each gene were given a positive, zero,
or negative value depending on whether the assigned genotype was homozygous maize,
heterozygous, or homozygous teosinte, respectively. Thus, each simulated causative gene
had two numeric values (one for equal and one for gamma distributed effects) representing
the magnitude and direction of effect on the trait. The total genetic contribution to NIRIL
phenotype was then found by simply summing the gene values (equal and gamma effects
kept separate) for all simulated causative genes.
Environmental noise was added to the summed NIRIL genetic phenotype values by
taking random draws from a normal distribution with variance equal to the additional
variance needed to reach the desired level of heritability. Two levels of heritability were
simulated, 67% and 90%, to mimic the heritabilities of two actual traits, the moderately
heritable culm diameter and highly heritable ear diameter. Heritability of the simulated
traits was required to be within 2.5% of the desired heritability, otherwise the normal
distribution was resampled. This process resulted in each set of simulated causative genes
having four states for the NIRILs: equal effect 67% H^2, equal effect 90% H^2, gamma
effect 67% H^2, and gamma effect 90% H^2.
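
The noise step described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the script used in this study: realized heritability is taken to be var_g / (var_g + var_e), the 2.5% tolerance is read as an absolute difference of 0.025, and the function name and the toy genetic values are hypothetical.

```python
import math
import random
import statistics


def add_environmental_noise(genetic_values, target_h2, tolerance=0.025,
                            seed=None, max_tries=10_000):
    """Add normal noise so that realized H^2 is within `tolerance` of `target_h2`."""
    if not 0 < target_h2 < 1:
        raise ValueError("target heritability must be between 0 and 1")
    var_g = statistics.pvariance(genetic_values)
    if var_g == 0:
        raise ValueError("genetic values must vary among lines")
    # Solve target_h2 = var_g / (var_g + var_e) for the noise variance var_e.
    var_e = var_g * (1 - target_h2) / target_h2
    rng = random.Random(seed)
    for _ in range(max_tries):
        noise = [rng.gauss(0.0, math.sqrt(var_e)) for _ in genetic_values]
        realized_h2 = var_g / (var_g + statistics.pvariance(noise))
        if abs(realized_h2 - target_h2) <= tolerance:
            phenotypes = [g + e for g, e in zip(genetic_values, noise)]
            return phenotypes, realized_h2
        # Otherwise resample the noise, as described in the text.
    raise RuntimeError("could not reach the target heritability")


# Toy genetic values: one causative gene with equal effect in 259 homozygous lines.
toy_rng = random.Random(42)
genetic_values = [toy_rng.choice((-1.0, 1.0)) for _ in range(259)]
phenotypes, realized_h2 = add_environmental_noise(genetic_values, 0.67, seed=7)
print(len(phenotypes), round(realized_h2, 3))
```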
We simulated twenty-four causative gene set sizes with two effect types and two her-
itabilities for a total of 96 distinct simulated states. Each of these states was replicated
1,000 times resulting in 96,000 simulated sets of phenotypes for the 259 NIRILs. These
phenotype values were then used with actual NIRIL genotypes to map QTL in the R/qtl
software using the same method as described in the previous section. Pairwise QTL in-
teractions were not tested for or added in the simulated datasets because interactions
were not part of the simulated conditions. Mapping of QTL for thousands of simulated
traits could not be accomplished manually and consequently was done with a custom R
script that automated the addition of QTL and saved summary information including
QTL estimated effect size, position, LOD scores, and number of QTL.
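
The simulation design described above can be enumerated directly. The Python sketch below is not the custom R script used in this study; it only lists the design grid from the values given in the text, and the variable names are hypothetical.

```python
from itertools import product

# Causative gene set sizes: 1 to 15, then 20 to 50 by fives, then 75 and 100.
gene_set_sizes = list(range(1, 16)) + list(range(20, 51, 5)) + [75, 100]
effect_types = ("equal", "gamma")
heritabilities = (0.67, 0.90)
replicates_per_state = 1000

# Every combination of gene set size, effect type, and heritability is one state.
states = list(product(gene_set_sizes, effect_types, heritabilities))
total_datasets = len(states) * replicates_per_state

print(len(gene_set_sizes), len(states), total_datasets)
# prints: 24 96 96000
```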
1.4 Results
1.4.1 QTL mapping
Previous work has shown chromosome five to be home to several high LOD score and large
effect size QTL for a number of inflorescence and plant architecture domestication traits
[8, 25]. We undertook a high resolution mapping experiment with a population of NIRILs
with variable fifth chromosome teosinte introgressions in a W22 maize background. In
the summers of 2009 and 2010, the 259 NIRILs were grown in randomized plots arranged
in three replicated blocks. Phenotype data for thirteen traits was collected for five plants
per plot. Spikelet length was only collected for a single block in the summer of 2010. We
analyzed trait measurements from all three growing environments together in a single linear
mixed model with block and year as random effects and position, NIRIL, and family as
fixed explanatory variables. Least squared means were estimated from the mixed models
and later used for QTL mapping.
Histograms of the least squared means show several distribution types including nor-
mal, bimodal, and exponential (Supplemental Figure A.1). NIRILs genotyped as 100%
maize (29 lines) and 100% teosinte (27 lines) were used to determine whether traits
behaved as expected with the full teosinte introgression lines having more teosinte like
phenotypes. Several traits believed to not be primary targets of selection during domes-
tication such as days to pollen shed and plant height appear to have little or no overall
difference between NIRILs containing the maize and teosinte introgression, while traits
that were the primary focus of selection during domestication including kernel row num-
ber (KRN) and ear diameter (EARD) have a substantial phenotypic difference between
homozygous maize and teosinte NIRILs. For all domestication traits, we observed a dif-
ference (sometimes quite small) between the least squared means for maize and teosinte
NIRILs consistent with the expected effect of domestication. Particularly large differences
are shown for EARD and KRN traits, where the maize genotype is 17.3% and 14.8% larger
than the teosinte genotype, respectively. Also of interest is the CULM trait, where the
maize genotype was 6.5% larger than teosinte.
There was a balanced representation of maize and teosinte genotypes with a high de-
gree of homozygosity in the QTL mapping population. Overall genotypes of the NIRILs
were 48.3% maize, 48.2% teosinte, and 3.5% heterozygous. The NIRIL population in-
cluded lines with teosinte introgressions across 162.24 megabases (Mbp), from position
6,985,619 to 169,231,037 on the maize reference genome (AGPv2). This introgression
included 74.47% of the approximately 218 megabase fifth chromosome. Of the 4,503 fifth
chromosome genes on the Filtered Gene Set (version 5b), 411 genes on the tip of the small
arm and 1,516 genes on the long arm were not included in the teosinte introgressions used
in this study. The genetic map generated with the Kosambi mapping function in R/qtl
Figure 1.1: Cumulative plot of QTL detected in the mapping experiment. Molecular marker positions are shown in centimorgans at the bottom. QTL names, consisting of an abbreviated trait name, chromosome number, and QTL number, are located on the left side. The 1.5 LOD support intervals for QTL are indicated by horizontal bars and peak LOD scores by vertical lines. Hatched bars indicate interacting QTL while solid bars are non-interacting. In total, 24 QTL were identified across the fifth chromosome with a variety of confidence interval sizes, max LOD scores, and effect sizes (see Table 1.3 for QTL statistics). Five QTL clusters with contiguous regions of five or more QTL 1.5 LOD support intervals are indicated by grey shading. A grey-scale heat map depicting the number of QTL 1.5 LOD support intervals from white (0) to black (8) is located at the top.
was calculated to be 86.64 centimorgans (cM), giving an average Mbp to cM ratio of 1.873
Mbp/cM.
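
The physical and genetic summary figures in this paragraph follow from simple arithmetic, which the Python sketch below reproduces. It is a check on the reported numbers rather than part of the original analysis, and the variable names are hypothetical.

```python
# Introgression boundaries on the AGPv2 reference genome (from the text).
start_bp = 6_985_619
end_bp = 169_231_037
span_mbp = (end_bp - start_bp) / 1e6

# Filtered Gene Set (version 5b) counts for chromosome five (from the text).
genes_on_chromosome = 4503
genes_short_arm_tip_excluded = 411
genes_long_arm_excluded = 1516
genes_in_introgression = (
    genes_on_chromosome - genes_short_arm_tip_excluded - genes_long_arm_excluded
)

# Genetic map length of the introgression, in centimorgans (from the text).
map_length_cm = 86.64
mbp_per_cm = span_mbp / map_length_cm

print(f"{span_mbp:.3f} Mbp, {genes_in_introgression} genes, {mbp_per_cm:.3f} Mbp/cM")
# prints: 162.245 Mbp, 2576 genes, 1.873 Mbp/cM
```

The span of about 162.245 Mbp corresponds to the 162.24 Mbp reported above, and the 2,576 genes match the number of genes used in the simulation experiment.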
We analyzed 13 traits and identified 24 QTL (Figure 1.1, Table 1.3) with a broad range
of LOD scores ranging from 2.70 (KPR) to 47.22 (KRN). A single epistatic interaction
was detected between the two kernel row number QTL, suggesting epistasis is minimal.
QTL 1.5 LOD support intervals ranged from 2.3 cM (KRN) to 50.6 cM (KPR) with an
average value of approximately 12.5 cM. Heritability on a plot mean basis (Table 1.3) for
each trait varied, with an average H^2 of 63% and a range of 23% (PROL) to 90% (EARD).
Five QTL clusters, defined as contiguous regions with five or more QTL 1.5 LOD support
intervals, were found in the mapping region on chromosome five near 2, 51, 61, 70, and 84
cM (Figure 1.1). There is no clear single concentration of QTL, suggesting this genomic
region lacks a single gene of large, pleiotropic effect and that multiple linked factors at loci
spread across the fifth chromosome are responsible for the previously identified influence
of chromosome five on domestication traits.
1.4.2 Simulation Experiment
We performed a simulation experiment to determine the power and precision of our map-
ping population. Using causative genes projected onto actual NIRIL genotypes, a total
of 96 distinct simulated states in terms of number of genes (between one and 100), heri-
tability (67% and 90%), and effect type (equal and gamma) were replicated 1,000 times
for a grand total of 96,000 simulated NIRIL trait datasets. Histograms of simulated traits
with 90% heritability were clearly bimodal when one causative gene was simulated and
progressively moved towards a normal distribution as more and more causative genes
were simulated. In comparison, simulated traits with 67% heritability lack a clear bi-
modal distribution even when only a single causative gene was simulated and are clearly
approximately normal when 100 genes are simulated (Figure A.2).
Table 1.3: Detected QTL for the T5S mapping population with position, heritability, and LOD score statistics.

QTL         LOD    1.5 LOD SI (cM)  Peak Location (cM)  Percent Variation  H^2
culm5.1     13.50  58.9 – 69.3      65.3                21.3%              66.5%
dtp5.1      16.36  0.0 – 11.7       2.3                 20.1%              -
dtp5.2      18.76  75.7 – 80.0      77.4                23.6%              -
dtp model   28.93  -                -                   40.1%              67.3%
eard5.1     3.00   0.0 – 24.2       12.9                1.7%               -
eard5.2     17.99  50.1 – 54.4      51.9                11.7%              -
eard5.3     33.76  82.9 – 85.9      84.4                25.6%              -
eard model  65.62  -                -                   69.0%              90.0%
earl5.1     12.38  0.0 – 5.4        1.9                 19.7%              49.1%
kpr5.1      2.70   0.0 – 50.6       2.2                 3.0%               -
kpr5.2      6.80   44.9 – 64.8      63.2                7.9%               -
kpr5.3      4.11   76.0 – 86.2      80.9                4.6%               -
kpr model   27.41  -                -                   38.5%              72.7%
krn5.1      6.22   18.8 – 24.7      21.5                4.8%               -
krn5.2      47.22  82.6 – 84.9      83.8                53.4%              -
krn5.1:2    3.32   -                -                   2.5%               -
krn model   50.56  -                -                   59.2%              73.7%
lblh5.1     24.61  75.0 – 81.1      79.0                35.3%              53.5%
plht5.1     7.64   0.0 – 2.4        0.0                 11.3%              -
plht5.2     2.89   24.3 – 39.2      31.7                4.1%               -
plht model  14.06  -                -                   22.0%              63.1%
prol5.1     8.38   56.9 – 71.6      64.2                13.8%              22.9%
splh5.1     9.14   0.0 – 18.7       13.0                10.2%              -
splh5.2     7.16   65.7 – 68.4      67.7                7.9%               -
splh5.3     2.78   74.3 – 86.6      78.0                2.9%               -
splh model  30.60  -                -                   41.8%              88.3%
stam5.1     6.50   50.7 – 86.6      83.8                10.9%              25.9%
tbn5.1      8.28   0.0 – 4.0        0.3                 13.1%              -
tbn5.2      4.60   43.6 – 53.2      47.3                7.1%               -
tbn model   10.46  -                -                   16.9%              69.9%
till5.1     7.21   44.1 – 62.9      58.7                9.8%               -
till5.2     3.22   77.2 – 85.9      81.8                4.2%               -
till model  18.61  -                -                   28.1%              34.3%

Since calculating significant LOD score thresholds via permutations for all 96,000
simulated phenotype sets would have taken weeks of computation time, we calculated
LOD score cutoffs in the first 50 replicates of the 96 states. The average threshold was
lower for 90% heritability than 67% heritability with no clear difference in threshold
caused by the effect type of causative genes. Simulated phenotypes with few causative
genes had a lower threshold on average with this effect more pronounced for the gamma
distributed effect type. The range of LOD score thresholds determined was quite narrow
(2.37 to 2.59 for gamma distributed and 2.38 to 2.60 for equal effects). Consequently,
instead of running permutations for the remaining datasets we set a conservative LOD
score threshold for mapping all simulated traits. The cutoff we chose was the maximum
of the 5% cutoffs found in the first 50 replicates of each of the 96 states.
After simulated phenotypes were generated and significance thresholds were set, QTL
were mapped using the 96,000 simulated datasets with actual genotypes for the NIRILs
in this study. Increasing the number of simulated causative genes from one to 100 caused
the mean number of detected QTL to rise from one to ∼4.5 or ∼3.0 for simulated traits
with 90% or 67% heritability, respectively (Figure 1.2). Thus, heritability was an impor-
tant factor in determination of the number of detectable QTL in our experiment. The
simulated gamma effects, as opposed to equal effects, appeared to cause the maximum
number of detectable QTL to be reached at a larger number of simulated causative genes,
but there was no difference in the overall maximum number of QTL detected.
Our results show that QTL 1.5 LOD support intervals quickly become associated with
multiple genes when many causative genes are simulated (Figure 1.3). In the case of
five causative genes with equal effect and 67% heritability, the chance of a QTL containing
a single causative gene has already dropped to approximately 50% (similar patterns
are seen for gamma simulated phenotypes in Supplemental Figure A.3). This suggests
when making decisions about fine mapping of QTL, researchers would be well advised to
Figure 1.2: The number of detected QTL and mean detected QTL effect size versus number of simulated causative loci. Black lines indicate 95% confidence intervals. (A) Simulations consistently detect one QTL when a single causative gene is simulated, but when using as few as three or four causative genes, we lose the ability to distinguish between genes. With high numbers of simulated causative genes, total QTL detected reaches a ceiling of ∼4.5 QTL for simulated traits with 90% heritability and ∼3.0 for traits with 67% heritability. (B) The effects of unresolved genes are merged into the few large effect QTL that are detected, consistent with the Beavis Effect. This is seen in the negative correlation between mean estimated effect and number of causative genes.
consider factors such as trait heritability and the power of their mapping population to
identify QTL support intervals that contain single causative genes.
In our simulation experiment, increasing the number of causative genes also led to an
increase in the average estimated effect size of detected QTL (Figure 1.2). We interpreted
this as the effects of multiple underlying causative genes being combined into a single
detected QTL with a cumulative effect, consistent with the Beavis Effect where multiple
small effect loci are detected as single QTL of larger effect [49]. On average, the total
additive effect for each simulated phenotype should be the product of the total number of
simulated causative genes and the average effect size. We found this expected relationship
between number of detected QTL, average estimated additive effect of each detected QTL,
and expected total additive effect for both equal and gamma distributed effect size and
both heritabilities.
Our mapping results using empirical, measured traits found three QTL for a trait
with heritability of 90% (ear diameter) and a single QTL for a trait with 67% heritability
(culm diameter). Comparison of these results with the simulations shows that for traits
with 90% heritability, when three or more QTL are detected there are likely to be anywhere
from four to six underlying causative genes, making a 1:1 relationship between number
of QTL and causative genes uncertain (Figure 1.2). In contrast to this result, simulated
traits with heritability of 67% and a single causative gene averaged a single detected
QTL which contained the causative gene 90% to 95% of the time. These observations
have implications for future fine mapping efforts to identify the causative gene underlying
QTL.
Figure 1.3: The proportion of detected QTL with zero, one, or more than one simulated causative genes in the 1.5 LOD support interval. High numbers of causative genes lead to detected QTL that contain multiple causative genes. There is a reasonable percentage of detected QTL in the simulations that contain a single causative gene when few (less than 4) causative genes are simulated, but as the number of simulated causative genes increases we quickly lose the power to distinguish between closely linked causative genes and they become lumped into single detected QTL. Equal effect simulations shown here are very similar to those seen for the gamma distributed effects (Supplemental Figure A.3).
1.5 Discussion
Previous studies in maize have found single genes underlying genomic regions of large
effect on multiple domestication traits [3–5, 41, 50]. This is in stark contrast to our work
on chromosome five, where the previously observed large effect of chromosome five on
several domestication traits in maize [8, 25] is caused by multiple regions spread across
the chromosome. This suggests that the genetic factors controlling domestication
traits on chromosome five differ in nature from those at other large domestication loci in
maize. Whether the situation on chromosome five is unique within maize
or among crop plants remains to be seen, but the several loci identified in this study suggest
that in addition to effectively acting on highly pleiotropic, large effect single genes, the
domestication process also has the capacity to work on several linked genes of variable
effect to produce a chromosomal region of large QTL effect.
Although our results show that several regions on chromosome five contain QTL af-
fecting different traits, this chromosomal region was initially defined as several tightly
clustered QTL in F2 crosses between teosinte and a small-eared primitive Mexican lan-
drace [43]. In contrast, our NIRIL population was developed from a cross of teosinte by a
modern agronomic maize inbred (W22) and is expected to harbor domestication QTL as
well as improvement QTL selected on during the past 9,000 years since maize was domes-
ticated. Thus while results from this analysis suggest chromosome five houses a complex
made of multiple linked factors, we cannot discount the possibility that a simpler genetic
architecture would have been observed had we used a primitive maize landrace rather
than the maize W22 inbred line.
One potential use of QTL mapping results is interrogation of the genes within QTL 1.5
LOD support intervals for likely candidates. The marker density in our experiment leads
to most QTL 1.5 LOD support intervals containing hundreds of annotated genes. How-
ever, two QTL had a narrow support interval that contained a relatively small number of
genes. These two QTL were krn5.2 and eard5.3, which co-localize to the same ∼2.3 cM
region. When expanded to the nearest genetic markers, these QTL fell between umc1348
and umc1966, which spanned a 4.81 cM region that included 2.654 Mbp with 54 genes
from the maize filtered gene set (AGPv2). One interesting candidate that falls in this
range is AC212823.4_FG003, which encodes a MADS-box transcription factor previously
cataloged as MADS-transcription factor 65 (mads65) in the GRASSIUS transcription
factor database [51]. Initially identified in plants as important floral organ identity
regulators [52, 53], the MADS-box family of transcription factors has since been shown to
be involved in a wide variety of developmental programs in various organs and stages
of plant development [54]. This particular MADS-box gene has homology to the rice
gene OsMADS57, a type II MIKCc-type MADS gene. The large subclass of MIKCc-type MADS
genes is quite diverse, with members involved in floral specification, phase transition, and
root development among other developmental functions [54]. A recent study [55] also found
evidence that this gene was selected during crop improvement, and it is expressed in many
tissues according to the maize gene expression atlas [56]. All of these factors make
AC212823.4_FG003 an attractive candidate in future studies to fine map the causative
gene for kernel row number on chromosome five.
The limits of a QTL experiment in terms of power and resolution are important factors
to consider when undertaking an experiment in any mapping population. To better inform
our QTL results with empirically measured traits, we explored the computational limits of
the experimental mapping population using simulated trait datasets. In this experiment,
we never detected more than six QTL for any of the simulated conditions. The most
important characteristic of simulated traits in determining number of detected QTL was
heritability and not effect type. As expected, when the number of underlying causative
genes increased to a high level, we saw the effect of multiple causative genes being rolled
into single detected QTL. This result is consistent with the Beavis Effect [49], a phenomenon
that describes the tendency for QTL of small effect to be combined into a single QTL with
large estimated effect. If these polygenic QTL, which can have quite high LOD score and
effect size, were chosen for fine mapping we would be unlikely to find a single underlying
causative polymorphism. Consequently, when considering QTL for fine mapping purposes,
researchers should take care to choose QTL for traits with high heritability and mapping
populations with sufficient power to resolve QTL to single genes. It is important to
realize that the simulation results reflect the specific markers, genotypes, and mapping
population used in this study. While some results are likely generally applicable to other
QTL experiments, simulations using mapping population specific parameters will provide
the best insight into potential genetic architectures and information on population power
and precision.
QTL mapping has been used to great effect to characterize the genomic regions con-
trolling traits selected on during domestication in maize. These studies have shown that
while genetic factors controlling domestication traits are spread throughout the genome,
there are concentrated genomic regions where QTL for several domestication traits are
in close proximity to each other [8, 25]. In this study, we use a QTL mapping popula-
tion of NIRILs with teosinte introgressions specific to chromosome five to closely examine
previously mapped QTL for a number of domestication traits. We confirmed that QTL for
these traits exist on chromosome five; however, in our population these QTL further
fractionate into multiple QTL. This is in contrast to other genomic regions of large effect
in maize where single pleiotropic genes were identified as the causative factor underly-
ing genomic regions of large effect [3–5, 50]. The presence of multiple QTL in several
locations on chromosome five suggests the existence of a complicated, linked, multi-gene
locus controlling various aspects of domestication traits. This apparent complexity of the
chromosome five locus is consistent with results from our simulation experiment, where
we show that traits with multiple mapped QTL likely have a more complicated underlying
genetic architecture than is indicated by the initial QTL mapping results.
Chapter 2
Fine mapping of chromosome five
domestication genes in maize
2.1 Abstract
The fifth chromosome of Zea mays has previously been shown to contain a large ef-
fect QTL for several domestication traits. In this work I describe efforts to identify the
causative polymorphisms responsible for several of these QTL for the domestication traits
of culm diameter and kernel row number. These two QTL represent the first and eighth
highest LOD scores detected in the QTL mapping experiment of chapter 1. We utilized
several heterogeneous inbred families drawn from a BC2S3 mapping population that were
heterozygous in the 1.5 LOD support interval of these QTL to generate two sets of recom-
binant chromosome nearly isogenic lines, one for the culm diameter QTL and one for the
kernel row number QTL. Lines were grown in replicated, randomized blocks in four years
and phenotypes were measured. A linear mixed model was used to obtain least squared
means for each line and we looked for segregation of the phenotype based on indel and
genotyping by sequencing markers. Simple Mendelian segregation of the lines was not
observed for any of the traits of interest, suggesting a single locus does not explain the
differences in phenotype. Consequently, we used QTL mapping software to map QTL in
the segregating regions of interest on chromosome five for culm diameter and kernel row
number. These analyses showed a highly significant heterogeneous inbred family effect
as well as multiple QTL in the target region for kernel row number, suggesting the genetic
factors underlying kernel row number and culm diameter have a complex relationship with
multiple loci on several chromosomes.
2.2 Introduction
The ultimate goal of many studies investigating the evolution of novel morphology in di-
vergent lineages is identification of the causative genes responsible for phenotypic change.
Toward this end, genes causing new forms have been identified a number of times in
many species including maize, tomato, wheat, barley, and, most successfully, rice. Over
the years there have been more than 20 genes identified in rice with important effects
on agronomic and domestication phenotypes such as loss of shattering in domesticated
plants [15], increased grain yield in terms of grain number [57], grain weight [58, 59], and
plant architecture [60, 61]. In contrast, there are considerably fewer success stories in fine
mapping in other organisms. In maize, recent experiments have mapped several high LOD
score, large effect domestication QTL to single genes including teosinte branched1 (tb1 )
[3, 41], grassy tillers1 (gt1 ) [4, 39], teosinte glume architecture1 (tga1 )[5], and ZmCCT
[50, 62]. One common characteristic of these genes is they were initially characterized as
massive, high LOD, large effect size QTL.
In maize, domestication phenotypes have been shown to be largely controlled by six
regions of the genome [8]. The large concentration of domestication QTL on the fifth
chromosome has been repeatedly observed in several studies [25, 37, 43]; however, little
is known about the causative genes and underlying polymorphisms that cause this large
effect. Experiments designed to examine chromosome five in maize have several challenges
caused by characteristics of the chromosome. First, this chromosome has gametophyte fac-
tor2 (ga2 ) [63], a pollen incompatibility factor which greatly influences pollination rates of
specific genotype combinations. Second, there is an extended region of low recombination
rate around the centromere (102.3 megabase to 109.2 megabase) that complicates collec-
tion of recombinant chromosomes for mapping experiments. In spite of these challenges,
characterizing the many domestication QTL for plant architecture and inflorescence traits
on the fifth chromosome of maize is a necessary step towards fully understanding the ef-
fect domestication had on the maize genome. While many traits have QTL that map
to the fifth chromosome, QTL with exceptionally high LOD score and effect size are of
particular interest for fine mapping studies.
A high LOD score, large effect QTL for kernel row number (krn) and ear diameter
(eard), previously reported on chromosome five of maize [25, 37, 43], was shown to frac-
tionate into at least two or three QTL in Chapter 1. The largest QTL for both of these
traits in terms of LOD score and effect size (eard5.3 and krn5.2 ) were both located to-
wards the right of the mapping interval between umc1966 and umc1348. The krn5.2 QTL
had a LOD score of 45.2, explained 51.98% of phenotype variation, and was estimated
to have an additive effect of −0.73 kernel rows. The co-localizing eard5.3 QTL also had
a trait-high LOD score of 32.7, explained 25.1% of variation, and had an effect of −1.41 mm. This
region was ∼1.3 cM or 2.65 Mb and was the narrowest confidence interval found for the
mapping population used in chapter 1. The kernel row number and ear diameter traits
are highly related, both affecting ear size in the transverse plane. This fact, viewed in the
context of co-localization of eard5.3 and krn5.2, suggests a single gene influences both
traits.
In addition to the high LOD score QTL for krn and eard, the fifth chromosome of
maize was shown (Chapter 1) to have QTL for plant architecture traits including tillering,
lateral branch length, and culm diameter. The QTL for culm diameter in chapter 1 had
the eighth highest LOD score detected. In contrast with the krn5.2 and eard5.3 QTL,
mapping for culm diameter revealed a single QTL of moderate effect, culm5.1. This QTL
had a considerably larger 1.5 LOD support interval (97.3 megabases), lower LOD score
(19.8), lower variation explained (21.27%), and smaller additive effect size (−0.67 mm).
The characteristics of culm5.1 in terms of number of QTL, LOD score, and effect size
make for a different type of fine mapping candidate than krn5.2 and eard5.3.
An experiment was designed to further investigate and identify the causative poly-
morphisms behind the large effect and LOD score krn5.2 /eard5.3 QTL and the moderate
effect culm5.1 QTL. This project used a collection of recombinant chromosome nearly iso-
genic lines (RCNILs) grown in replicated randomized blocks over multiple years. These
RCNILs were derived from heterogeneous inbred families (HIFs) drawn from a BC2S3
population with a massive ear diameter QTL [25] with a maximum LOD score of 144.4.
Lines were generated, genotyped, and grown in replicated blocks in the summers of 2010,
2012, and 2013. RCNILs did not segregate cleanly in the target QTL 1.5 LOD support
intervals for the kernel row number and culm diameter phenotypes. I next used genome-
wide genotyping and QTL mapping methods to account for secondary segregating regions
in the genome. The results of this analysis suggest that not only are secondary sites segregating
with significant effects on kernel row number and culm diameter, but that multiple
factors are again segregating within the initial target QTL support interval. Overall, these
results suggest the genetic architecture controlling domestication traits is quite complex
with multiple loci contributing to kernel row number and culm diameter phenotypes across
the genome. Chromosome five in particular appears to house a collection of genes affect-
ing several domestication traits and represents at least three linked loci that may have
been selected as a unit during maize domestication.
2.3 Materials and Methods
2.3.1 Plant material
We chose to identify the causative genes underlying the large LOD score and effect size
QTL for kernel row number (krn) and culm diameter (culm) on chromosome five using re-
combinant chromosome nearly isogenic lines (RCNILs). These lines consist of individuals
carrying two copies of a recombinant chromosome with a recombination breakpoint in the
region of interest, which corresponds with the 1.5 LOD support intervals for culm5.1 and
krn5.2. Based on QTL mapping results from chapter 1, the two QTL are adjacent to each
other with the culm diameter QTL from 54,416,924 to 151,717,831 bp and the kernel row
number QTL from 166,576,639 to 169,231,037 bp. Base pair coordinates for these QTL
are based on BLAST of flanking marker primer sequences against the second version of
the maize reference genome (AGPv2) [9]. I chose to generate RCNILs from segregating
heterogeneous inbred families (HIFs) taken from a large BC2S3 mapping population. Four
founding HIFs, two per QTL, heterozygous for the genomic region of interest defined by
QTL 1.5 LOD support intervals and surrounding regions were used in production of RC-
NILs. Care was taken to use HIFs with limited heterozygosity adjacent to the primary
region of interest and elsewhere in the genome.
A large number of plants from each HIF were screened with PCR based insertion dele-
tion (indel) markers flanking the region of interest to identify plants with recombinant
chromosomes in the summers of 2009 and 2010. The initial screening of HIFs for indi-
viduals with recombinant chromosomes used three flanking markers (ZHL0029, ZHL0033,
and umc1966) located at 38,994,478 bp, 151,446,717 bp, and 169,230,959 bp, respectively.
These markers were chosen to be as close as possible to the boundaries of the QTL.
Individuals with recombinant chromosomes were self pollinated and seed was harvested
and planted in the following winter. Plants were grown over the winter
in a greenhouse, where they were genotyped again at the flanking markers
to identify plants homozygous for the initially detected recombinant chromosome. These
individuals were then self pollinated to make RCNIL seed, carrying two copies of the origi-
nal recombinant chromosome, for use in subsequent summers for randomized phenotyping
blocks and seed increasing purposes.
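The flanking-marker screen described above can be sketched in a few lines. This is a minimal illustration only: the genotype codes ("M" maize, "T" teosinte, "H" heterozygous, "N" missing) and the function name `classify_plant` are assumptions introduced here, not part of the original protocol. A plant is flagged when its calls at the flanking markers disagree, which implies at least one crossover between those markers.

```python
def classify_plant(flank_calls):
    """Classify one plant from its calls at the flanking markers.

    flank_calls: genotype calls in chromosome order, e.g. ("H", "M", "M").
    Codes are hypothetical: "M", "T", "H", and "N" for missing data.
    """
    informative = [c for c in flank_calls if c != "N"]
    if len(informative) < 2:
        # Fewer than two scored markers cannot reveal a crossover.
        return "unscored"
    # Differing calls at flanking markers imply a recombinant chromosome.
    return "recombinant" if len(set(informative)) > 1 else "non-recombinant"


print(classify_plant(("H", "H", "H")))
print(classify_plant(("H", "M", "M")))
```

Double crossovers between two flanking markers would go undetected by a check of this kind, which is one reason three markers rather than two are useful for a region this large.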
2.3.2 Field Trials and Phenotypes
The RCNILs were grown in a total of 16 replicated, randomized blocks in multiple sum-
mers between 2010 and 2013. Phenotyping experiments took place at the West Madison
Agricultural Research Station (WMARS), with RCNILs for the culm5.1 QTL grown in
2010 and 2012 and RCNILs for the krn5.2 QTL grown in 2012 and 2013. When possible, seed
for a single RCNIL was taken from up to five seed packets and mixed prior to planting in
order to minimize the effect of any single seed lot (mother plant) on phenotype. In each
summer, four blocks of RCNILs per QTL were grown in twelve plant plots. Individuals
were planted with equal spacing in 14 foot rows with 30 inches between adjacent rows
and two foot walkways separating the end and start of a new row. Up to five individuals
per plot were measured in 2010 and 2012, while in 2013 kernel row number was assessed
for all possible plants.
In addition to twelve plant plots, select lines for the culm diameter QTL were grown in
a phenotyping block of fully randomized single plant plots (SPP) in the summer of 2012.
This block of plants consisted of 60 individuals each from seventeen RCNILs and eight
control RCNILs (homozygous for the maize or teosinte chromosomal segment) grown in a
completely randomized scheme. The seventeen RCNILs were chosen due to recombination
breakpoints being close to preliminary estimates of the causative gene location based on
initial analysis of data from the summer of 2010. Individual plants were separated by
a larger than normal distance (30 inches in the X and 48 inches in the Y dimension) in
order to allow them to grow to their full phenotypic potential with minimal competition
and shading from neighboring plants.
Traits were measured by hand: culm diameter was taken in the field with
calipers at the narrowest point of the stalk, and kernel row number was counted after harvest
in the lab (2012) or in the field (2013). In the SPP we also measured culm diameter at the
largest point to calculate culm area and other basic plant architecture traits (plant height
and tiller number) for use in later analyses. In total, 3,182 individuals were assessed for
culm diameter (1,021 in 2010 and 2,161 in 2012) and kernel row number was counted
for 8,625 individuals (3,168 in 2012 and 5,457 in 2013). Ear diameter, a trait highly related to kernel
row number with a co-localizing QTL detected in chapter 1 (see Table 1.3
for details), was also measured in some environments but was not considered in later
analyses because of this close relationship.
2.3.3 Genotyping with PCR and next generation sequencing
Genomic DNA was extracted from the initial plant of each RCNIL with a standard CTAB
method and genotyping from this “founder” individual was used to represent the RCNIL
genotype in later analyses. The genotypes of RCNILs were obtained using two strategies,
a PCR based method targeting known polymorphisms and a high throughput next gen-
eration sequencing protocol. All RCNILs were genotyped using PCR of known indels and
single nucleotide polymorphisms (SNPs), while a subset were genotyped using the high
throughput genotyping by sequencing (GBS) protocol. All RCNILs developed for fine
mapping of krn5.2 were genotyped by GBS while only a subset of culm5.1 RCNILs were
genotyped by GBS. However, genotyping of culm5.1 lines was done with a more extensive
collection (18 markers) of PCR markers than krn5.2 RCNILs (5 markers).
PCR based genetic markers (Supplemental Table B.1) were used to genotype RCNILs
with standard agarose gel electrophoresis, fluorescent fragment analysis, and Sanger
sequencing detected SNPs. These three styles of marker were initially developed by identi-
fication of scorable polymorphisms that distinguished maize and teosinte control RCNILs
through Sanger sequencing of annotated genes in the maize reference genome (AGPv2).
Size polymorphism differences greater than approximately 10% of total PCR product
length were scored on 4% agarose gels and smaller size polymorphisms were redesigned
with fluorescently labeled primers and genotyped using GeneScan software (v1.70) from
Applied Biosystems. If the only scorable polymorphism was a SNP, RCNILs were geno-
typed by Sanger sequencing and hand calling of SNPs.
While great care was taken to choose founding HIFs with minimal heterozygosity, all
HIFs had secondary sites segregating elsewhere in the genome. In order to identify these
regions and account for their effect on phenotype, we performed GBS [64] on RCNIL
genomic DNA for all kernel row number and the subset of culm diameter lines grown in
the single plant plot experiment. In order to use the GBS protocol, additional molecular
work was required. DNA was treated with 1 µL of RNaseI at room temperature for 30
minutes to remove total RNA from the CTAB DNA preparation. Next, the samples were
digested using the methylation sensitive ApeKI restriction enzyme and 96-plex barcoded
sequencing adapters were ligated to individual samples. Finally, the 96 barcoded samples
were mixed and sequenced (100 bp reads) on an Illumina HiSeq machine [64]. Sequence
tags were aligned to the reference maize genome (AGPv2) and SNPs were called and im-
puted using the GBS pipeline as implemented at Cornell University. This GBS procedure
resulted in 955,650 SNPs made up of raw A, T, C, and G SNP calls for the RCNILs across
the ten maize chromosomes.
Raw SNP calls were further processed in order to call RCNIL genotypes into maize
and teosinte using a custom Perl script. The genotype calls were made using SNP calls
from the pure maize parent inbred line, W22. Only biallelic markers (43,025 total) were
kept and the non-W22 SNP in the RCNILs was assumed to be the teosinte allele. After
converting the genotypes into maize, teosinte, and heterozygous calls, SNPs separated
by less than 100 base pairs were merged into a single marker, leaving 25,736. If SNP
genotypes within a merged marker did not agree they were converted to missing data,
“N”. After marker genotypes were called and merged, a final genotype imputation step
was carried out using another custom Perl script. In an effort to have this script correct
bad and missing data, all genotype calls were subject to imputation. The criteria for
changing a call in any given RCNIL involved ten marker windows both upstream and
downstream of a given marker. If all markers in one direction or the other were 100%
consistent, then the genotype was changed if and only if seven of the ten markers on the
other side were also the same genotype.
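The merging and imputation rules above can be illustrated with a short Python sketch. The custom Perl scripts are not reproduced in the text, so everything here is an assumption made for illustration: the function names, the genotype codes ("M", "T", "H", with "N" for missing), chaining of SNPs whose consecutive gaps are under 100 bp, reading "seven of the ten" as "at least seven", leaving markers with fewer than ten neighbors on either side unchanged, and evaluating windows on the calls as they stood before imputation.

```python
MISSING = "N"


def _collapse(group_pos, group_calls):
    # Disagreeing calls inside a merged marker become missing data.
    call = group_calls[0] if len(set(group_calls)) == 1 else MISSING
    return (group_pos[0], call)


def merge_close_snps(positions, calls, max_gap=100):
    """Merge SNPs separated by less than max_gap bp into single markers.

    positions: sorted bp coordinates on one chromosome for one line.
    calls: genotype calls in the same order.
    Returns a list of (first_position, call), one entry per merged marker.
    """
    merged, group_pos, group_calls = [], [], []
    for pos, call in zip(positions, calls):
        if group_pos and pos - group_pos[-1] >= max_gap:
            merged.append(_collapse(group_pos, group_calls))
            group_pos, group_calls = [], []
        group_pos.append(pos)
        group_calls.append(call)
    if group_pos:
        merged.append(_collapse(group_pos, group_calls))
    return merged


def impute_calls(calls, window=10, min_support=7):
    """Window-based correction of one line's marker calls.

    A call is replaced by genotype G when every call in the window on one
    side is G and at least min_support calls in the other window are G.
    """
    out = list(calls)
    for i in range(len(calls)):
        left = calls[max(0, i - window):i]
        right = calls[i + 1:i + 1 + window]
        if len(left) < window or len(right) < window:
            continue  # assumption: markers near the ends are left as called
        for full, partial in ((left, right), (right, left)):
            genotype = full[0]
            if genotype == MISSING or any(c != genotype for c in full):
                continue
            if sum(c == genotype for c in partial) >= min_support:
                out[i] = genotype
                break
    return out
```

Under these assumptions an isolated miscall inside a long run of one genotype is overwritten, while a true breakpoint (ten maize calls on one side, ten teosinte calls on the other) is left untouched because the partially matching side never reaches the support threshold.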
The imputation methods for GBS data described above greatly improved genotype
continuity; however, certain regions of the genome were still questionably called. The
most inconsistently called genomic regions included extended heterozygous regions and
recombination breakpoints where genotypes switched. Following the processing steps using
custom Perl scripts, the data were manually screened to remove and correct inconsis-
tently called markers. Uninformative markers where the adjacent marker to either side
had exactly the same genotypes were also removed from the dataset. Regions of the
genome where RCNILs had maize and teosinte fixed genotypes that associated with HIF
(non-segregating regions fixed for different genotypes in the founding HIFs) were also
removed. Finally, independently segregating regions on the same chromosome were given
unique chromosome names (5a, 5b, etc.) to avoid inflation of the genetic map between
fixed ancestral recombination breakpoints. After imputation and filtering, a total of 522
genome-wide GBS markers spread across 13 segregating regions of the genome on six
chromosomes were used in the final analysis. The four other maize chromosomes were
completely fixed for a single homozygous genotype and consequently were excluded from
the analysis.
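The removal of uninformative markers can be sketched as follows. This is one possible reading of the filtering step, not the curation procedure itself, which was done manually: a marker is dropped when its genotypes across all lines are identical to those of the previously kept marker, so each run of identical markers collapses to its first member. The function name and data layout are hypothetical.

```python
def drop_redundant_markers(marker_names, genotype_columns):
    """Keep one marker from each run of adjacent, identical markers.

    marker_names: marker identifiers in map order.
    genotype_columns: one sequence of calls per marker (one call per line).
    Returns (kept_names, kept_columns).
    """
    kept_names, kept_cols = [], []
    for name, col in zip(marker_names, genotype_columns):
        col = list(col)
        if kept_cols and col == kept_cols[-1]:
            continue  # same genotypes as the neighbor already kept
        kept_names.append(name)
        kept_cols.append(col)
    return kept_names, kept_cols


names, cols = drop_redundant_markers(
    ["m1", "m2", "m3", "m4"],
    [["M", "T"], ["M", "T"], ["T", "T"], ["T", "T"]],
)
print(names)
```

A stricter variant would keep both end markers of each run so that the physical limits of every breakpoint interval are preserved; which of the two matches the original curation is not stated in the text.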
2.3.4 Statistical analysis and segregation of phenotypes
We used the statistical program SAS to fit linear mixed models with the PROC
MIXED procedure [44]. Variables used in the models included the RCNIL, HIF, block, year
grown, and position within the block. A forward model selection method was used in which
the starting model had a minimum number of variables (fixed effects for HIF and RCNIL
nested in HIF) and additional variables were added to the model one at a time until the
Akaike Information Criterion (AIC) reached its lowest point. The most complicated model
selected was for the culm diameter single plant plot experiment, where five explanatory
variables were used (Table 2.1). In these models $Y$ stands for the measured phenotype,
$\mu$ stands for the grand mean, $a_i$ the RCNIL, $f_j$ corresponds to the HIF, $b_k$ the block, $c_l$
and $d_m$ denote the horizontal and vertical position in block respectively, $t_n$ stands for the
year, $h_p$ is the tiller number phenotype used in the SPP experiment only, and finally $e$ and $g$
are error terms. While the single plant plot culm diameter model was the most complex,
the other models had only one fewer variable.
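The forward selection loop can be sketched independently of the model-fitting software. In the sketch below the fitting itself is abstracted behind `aic_of`, a hypothetical callable that returns the AIC of a model containing the given terms (in this work that fit was done with SAS PROC MIXED, which is not reproduced here); the term labels are likewise illustrative. Lower AIC indicates a better trade-off between fit and number of parameters.

```python
def forward_select(base_terms, candidate_terms, aic_of):
    """Greedy forward selection that stops when AIC no longer decreases.

    base_terms: terms always in the model (e.g. HIF and RCNIL nested in HIF).
    candidate_terms: terms that may be added one at a time.
    aic_of: hypothetical callable mapping a tuple of terms to an AIC value.
    """
    selected = list(base_terms)
    remaining = list(candidate_terms)
    best_aic = aic_of(tuple(selected))
    while remaining:
        # Score every one-term extension of the current model.
        scored = [(aic_of(tuple(selected + [t])), t) for t in remaining]
        aic, term = min(scored)
        if aic >= best_aic:
            break  # no candidate improves the model
        selected.append(term)
        remaining.remove(term)
        best_aic = aic
    return selected, best_aic


# Toy AIC table standing in for real model fits (values are invented).
toy_aic = {
    frozenset({"HIF", "RCNIL(HIF)"}): 500.0,
    frozenset({"HIF", "RCNIL(HIF)", "block"}): 480.0,
    frozenset({"HIF", "RCNIL(HIF)", "year"}): 490.0,
    frozenset({"HIF", "RCNIL(HIF)", "block", "year"}): 481.0,
}
model, aic = forward_select(
    ["HIF", "RCNIL(HIF)"], ["block", "year"], lambda t: toy_aic[frozenset(t)]
)
print(model, aic)
```

With the invented values above, "block" is added first and "year" is then rejected because it raises the AIC from 480 to 481.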
Least squared means for the RCNIL nested in HIF effect were extracted and used
as an average line phenotype value for subsequent analyses. The goal of these analyses
was to associate RCNIL phenotype and genotype. If a single locus in the segregating
region is responsible for the phenotypic effect, one should observe simple, clean Mendelian
segregation of least squared means based on genotype. Towards this end, RCNILs were
sorted by phenotypic value (as represented by least squared mean). Unfortunately, we did
not observe segregation of least squared means based on genotype, suggesting multiple
factors influence the measured traits.
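The expectation of clean Mendelian segregation can be stated as a simple test on the sorted lines. The sketch below is illustrative only: the function name and the genotype codes ("M" for maize, "T" for teosinte) are assumptions, and heterozygous or missing calls are ignored. Lines are ordered by least squared mean, and a marker is said to segregate cleanly if the genotype sequence switches class at most once along that ordering.

```python
def segregates_cleanly(ls_means, genotypes):
    """Check whether one marker splits lines sorted by phenotype into two blocks.

    ls_means: least squared mean phenotype per line.
    genotypes: call per line at one marker; only "M" and "T" are used.
    Returns True when the sorted genotype sequence changes class at most once.
    """
    pairs = sorted((m, g) for m, g in zip(ls_means, genotypes) if g in ("M", "T"))
    ordered = [g for _, g in pairs]
    switches = sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)
    return switches <= 1


print(segregates_cleanly([1.0, 1.2, 2.0, 2.3], ["T", "T", "M", "M"]))
print(segregates_cleanly([1.0, 1.2, 2.0, 2.3], ["T", "M", "T", "M"]))
```

The check does not require that teosinte lines have the lower values, and it returns True trivially when only one genotype class is present, so it is a necessary rather than sufficient condition for a single causative locus.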
We have two main hypotheses as to why RCNILs failed to segregate in a Mendelian
manner. First, because the BC2S3 founding HIFs of the RCNILs are less advanced than the
BC6S6 population from chapter 1, additional factors may be segregating
elsewhere in the genome and confounding Mendelian segregation. Second,
the primary locus of interest on chromosome five may not be a single gene, but rather multiple
linked genes that, when split up by the various recombination breakpoints in our RCNILs,
lead to complicated segregation patterns. In order to investigate both of these possibilities,
we obtained whole genome genotypes using GBS (described above) and mapped
QTL in the R/qtl software package for plants grown in twelve plant rows. The single
plant plot culm diameter experiment was not analyzed with QTL mapping methods since
it only included seventeen RCNILs and consequently lacked power for a QTL analysis.
Table 2.1: Final linear mixed models used to produce least squared means for fine mapping RCNILs.
Trait          Linear Mixed Model
culm (rows)    $Y_{ijkmo} = \mu + a_i(f_j) + f_j + b_k + d_m(b_k) + e_{ijklm} + g_{ijkmo}$
culm (SPP)     $Y_{ijlmop} = \mu + a_i(f_j) + f_j + c_l + d_m + h_p + e_{ijklmnp} + g_{ijkmnpo}$
krn (rows)     $Y_{ijkno} = \mu + a_i(f_j) + f_j + b_k(t_n) + t_n + e_{ijklmn} + g_{ijkmno}$
The benefit of using GBS and the statistical methods of QTL mapping is simultaneous
exploration of multiple factors in the target QTL region and secondary genomic
regions of significant effect outside the QTL. A potential flaw in this approach is lack of
statistical power to differentiate closely linked, moderate effect QTL in the relatively small
RCNIL fine mapping populations. The full set of RCNILs was used to map QTL for the
krn and culm diameter traits in order to maximize our potential power to differentiate be-
tween tightly linked factors. In total, 75 lines were used in mapping of the culm5.1 QTL
(67 recombinant RCNILs and 8 homozygous maize and teosinte controls). The krn5.2
QTL was mapped with 92 lines, all of which were recombinant chromosome lines. QTL
mapping was conducted using the R/qtl package [45] with genetic maps calculated using
the Kosambi mapping function with 0.001 error rate. Ten thousand permutations of the
data were used to define a significant QTL threshold. QTL were mapped using a step-wise
model based approach where QTL were added to a model one-by-one using the addqtl,
fitqtl, and refineqtl functions of R/qtl until no more significant QTL were detected. In
addition to using detected QTL in the model, the founding HIF was used as an additive
covariate to account for variation caused by fixed non-segregating regions of the genome
that differ between HIFs and were removed during manual curation of the GBS genotypes. Details of
the step-wise QTL mapping method are available in chapter 1 methods.
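The logic of a permutation-derived significance threshold can be sketched without R/qtl. The sketch below is not the R/qtl implementation: it swaps in a much simpler test statistic (the largest absolute difference between maize and teosinte class means across markers) in place of the LOD score, omits the HIF covariate, and assumes a 5% genome-wide level, which the text does not state. All function names and genotype codes are hypothetical. Phenotypes are shuffled relative to genotypes, the genome-wide maximum statistic is recorded for each shuffle, and the threshold is taken from the upper tail of that null distribution.

```python
import random


def max_marker_stat(phenotypes, marker_columns):
    """Largest |mean(M) - mean(T)| over markers (a stand-in for the max LOD)."""
    best = 0.0
    for col in marker_columns:
        maize = [p for p, g in zip(phenotypes, col) if g == "M"]
        teosinte = [p for p, g in zip(phenotypes, col) if g == "T"]
        if maize and teosinte:
            diff = abs(sum(maize) / len(maize) - sum(teosinte) / len(teosinte))
            best = max(best, diff)
    return best


def permutation_threshold(phenotypes, marker_columns, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide threshold from the null distribution of the maximum statistic."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        null_max.append(max_marker_stat(shuffled, marker_columns))
    null_max.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_max[index]


# Toy data: 20 lines, one marker perfectly associated with phenotype.
phenos = [float(i) for i in range(20)]
markers = [["T"] * 10 + ["M"] * 10]
observed = max_marker_stat(phenos, markers)
threshold = permutation_threshold(phenos, markers, n_perm=200)
print(observed > threshold)
```

Taking the maximum across all markers within each permutation is what makes the threshold genome-wide rather than per-marker; the analysis reported here used ten thousand permutations rather than the few hundred shown.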
2.4 Results
2.4.1 RCNIL generation and phenotype least squared means
I screened 4,180 total individuals from the four founding HIFs for recombinant
chromosomes in the summers of 2009 to 2011. In total 67 and 92 recombinant individuals
were identified and turned into RCNILs in the 1.5 LOD support intervals of culm5.1 and
krn5.2/eard5.3, respectively. The vast majority (3,230 of 4,180) of screened individuals
came from HIFs intended for study of the culm diameter QTL. This large number
of individuals was required due to the presence of the centromere in the middle of the
target QTL region, which greatly reduced recombination rate and limited the number of
recombinant individuals.
Figure 2.1: Histograms of least squared means for the culm diameter and kernel row number phenotypes. Distribution of least squared means is approximately normal for the culm diameter least squared means. The kernel row number counts have a noticeable left skew. Average least squared mean for homozygous teosinte and maize RCNILs (designated by solid and dashed lines, respectively) have the expected relationship with the teosinte average always being the lower phenotypic value.
Three linear mixed models were used to analyze the phenotype data for kernel row
number and culm diameter. Each model was selected using a forward selection method in
which one variable was added to the model at a time until the model fit, as measured by
AIC, did not improve. When plotted as histograms, the least squared means of the various
RCNILs followed a roughly normal distribution for culm diameter, while the kernel row
number trait had a left skew. RCNILs homozygous for the maize and teosinte segment
showed the expected relationship with maize RCNILs having larger culm diameter and
more kernel rows (Figure 2.1).
2.4.2 PCR and GBS genotyping
Initial genotyping of the RCNIL homozygous recombinant chromosome genomic DNA
was carried out through traditional methods using PCR. In total I placed 18 markers on
75 RCNILs (including four maize and four teosinte control lines) for the culm diameter
QTL and five markers on 92 RCNILs (no maize or teosinte control lines) for the kernel
row number QTL (Table B.1). Only five markers were placed on kernel row number
RCNILs for two reasons. First, the kernel row RCNILs had recombination events in a
much smaller physical distance (17.78 Mb versus 112.45 Mb for the culm diameter QTL).
Second, we expected to obtain thorough genome-wide genotyping using GBS, which had
already been initiated for the krn RCNILs.
Figure 2.2: GBS genotypes for kernel row number RCNILs. Thirteen regions across the genome are segregating in the kernel row number RCNILs. The primary QTL of interest is located in the 5b region where all RCNILs have crossover events. Secondary segregating regions in only one of the two founding HIFs are clearly visible for several genomic regions, for example chromosome 8c segregates in HIF MR0841 but not MR0818. Figure is scaled to marker, so each unit of length represents a single marker.
Of the nearly one million original SNP calls, only about 5% appeared to be segregating
in a biallelic manner. The final genotype set contained no segregating markers on
chromosomes one, two, four, and six. The structure of the founding HIFs implies each in-
dependent region of the genome that was heterozygous segregates independently of other
regions. To account for this, each segregating region was assigned its own linkage group
(5a, 5b, etc.) for QTL mapping so that non-segregating segments between heterozygous
regions would not influence the results. Overall, 522 markers in 13 linkage groups were
segregating across the six other chromosomes (Figure 2.2).
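The grouping of segregating regions into separate linkage groups can be sketched as follows. This is a minimal illustration rather than the procedure actually used; the function name, the tuple layout, and the assumption that non-segregating markers are present in the input to separate the runs are all hypothetical.

```python
from itertools import groupby
from string import ascii_lowercase


def assign_linkage_groups(markers):
    """Label each contiguous run of segregating markers on a chromosome.

    markers: list of (chromosome, position, is_segregating) tuples, sorted by
    chromosome and then position (hypothetical input layout).
    Returns a dict mapping (chromosome, position) to a label such as "5a".
    """
    labels = {}
    for chrom, chrom_markers in groupby(markers, key=lambda m: m[0]):
        run_index = 0
        for segregating, run in groupby(chrom_markers, key=lambda m: m[2]):
            if not segregating:
                continue  # non-segregating stretches only separate the groups
            label = f"{chrom}{ascii_lowercase[run_index]}"
            for _, position, _ in run:
                labels[(chrom, position)] = label
            run_index += 1
    return labels
```

Treating each run as its own linkage group keeps the non-segregating segments between heterozygous regions out of the QTL model, which is the motivation given above.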
2.4.3 QTL fail to segregate as Mendelian traits
RCNILs were sorted by phenotype least squares mean from least to greatest, and we
looked for distinct maize and teosinte RCNIL groupings. There was not a clean segregation
of RCNILs into maize and teosinte classes for a single marker, suggesting multiple factors
within the primary QTL of interest or elsewhere in the genome are influencing the traits
of interest (Figure 2.3). The culm diameter trait came closer than the kernel row number
trait to clean segregation, especially for lines planted in the single plant plot.
An additional complication for both the culm diameter and kernel row number fine
mapping was the distinct difference between the grand mean of RCNILs derived from
different founding HIFs. For kernel row number, there was an average difference of ap-
proximately 1.8 kernel rows between RCNILs from different HIFs and the average rank
of HIFs differed by over 40. For culm diameter, the two HIFs differed by an average of
approximately 0.1 cm (Figure 2.4). Founding HIF was part of the linear mixed model used
to produce least squares means, but the model evidently failed to fully correct for differences
between founding HIFs. With this in mind, HIF was used in subsequent mapping
methods to further account for differences caused by the starting HIFs.
Figure 2.3: RCNILs sorted by phenotype from least to greatest. Genotypes for RCNILs are indicated on the left by green (teosinte), yellow (maize), grey (heterozygous), or white (N) with least squares means as barplots on the right. (A) Culm diameter least squares mean from the twelve plant rows. (B) Culm area as measured in the single plant plots. (C) Kernel row number counted from twelve plant rows. A single causative gene should lead to segregation as a Mendelian locus when sorting RCNILs by phenotype. This was not seen and the genotypes appear more or less random, suggesting multiple factors influencing phenotype in the RCNILs.
Figure 2.4: Density plots of the culm diameter and kernel row number phenotypes grouped by founding HIF. Distinct differences between distributions are visible between the two founding HIFs for culm diameter (both in the (A) single plant plot and (B) twelve plant row designs) as well as for the (C) kernel row number phenotypes. The overall phenotype means for each HIF are designated by the dashed line for the red HIF and the solid line for the blue HIF.
2.4.4 Multiple factors contribute to culm diameter and kernel
row number
QTL mapping was performed using least squares means as phenotypes and merged genotypes
from GBS and PCR methods. Since a limited number (17) of culm diameter RCNILs
were genotyped with GBS, we used PCR markers only for culm diameter mapping and
consequently QTL were only mapped in the primary segregating region of interest on
chromosome five between 59.6 Mb and 144.8 Mb. In contrast, all 92 RCNILs generated
for fine mapping of the kernel row number phenotype were genotyped by GBS allowing
for full accounting of QTL in genomic locations away from the primary QTL of interest.
A single QTL was detected for the culm diameter trait, suggesting a single factor could
be responsible for culm5.1 (Figure 2.5). However, there was a very significant founding
HIF effect in the QTL mapping model (Table 2.2). This founding HIF effect (F-test, p
< 8.59e-10) suggests secondary sites in the genome are still at play and could explain the
inability to observe clean, simple segregation of RCNILs based on genotype in the QTL of
interest. While mapping of a single QTL for culm diameter is encouraging, the relatively
weak QTL LOD score (5.1) and small additive effect (-0.035) in comparison with the HIF
LOD (8.7) and effect (0.098) tell us that secondary sites are more important contributors
to culm diameter than the QTL we were seeking to fine map.
Results for kernel row number QTL mapping are not particularly comparable to the
culm diameter results due to the inclusion of full genome genotypes, which extended
the mapping to segregating sites elsewhere in the genome. Four QTL were detected
(Figure 2.5), two in the primary region of interest on chromosome five with a single QTL
each detected on chromosomes seven and ten (Table 2.2). Unsurprisingly, the founding
HIF once again had a very significant effect (F-test, p < 2e-16). The two QTL in the target
region had the highest LOD and additive effect of mapped QTL. Like culm diameter, the
HIF effect had the overall highest LOD score and effect.
Figure 2.5: QTL LOD profiles for fine mapping of culm diameter and kernel row number traits. QTL are color coded and labeled as “chromosome@position”, so the highest LOD score kernel row number QTL (5b@7.0) should be read as a QTL on chromosome 5b at position 7.0. (A) Culm diameter LOD profile for the single QTL detected in the primary mapping region. LOD score (y-axis) versus map position in centimorgans is shown. (B) Kernel row number LOD profiles for four detected QTL, two in the primary region of interest on chromosome 5b. Secondary QTL on chromosomes 7b and 10a have lower LOD score and effect size than the 5b QTL. In addition to significant QTL, a highly significant HIF effect with high LOD score (culm = 8.688, krn = 19.787) was also included in QTL models for both traits.
Table 2.2: Detected QTL and HIF effects including LOD, percent variation explained, and additive effect.
Name         LOD      Var. Explained (%)   Add. Effect
krn5b.1      10.161    7.28                -0.413
krn5b.2      13.309   10.40                -0.387
krn7b.1       4.758    2.95                 0.264
krn10a.1      5.391    3.40                -0.114
krn HIF      19.787   18.59                -1.550
krn model    44.127   89.02                 —
culm5.1       5.126   18.73                 0.0975
culm HIF      8.688   35.69                -0.0353
culm model   11.081   49.36                 —
2.5 Discussion
2.5.1 The complex genetic architecture of culm and kernel row
number
Efforts to identify causative factors underlying QTL have recently been met with great
success in maize. These successful studies have identified genes contributing to loss of
prolificacy [4], day length neutrality [50, 62], liberation of the kernel from its fruitcase [5],
and apical dominance [3]. Our study set out to contribute to this growing list of genes
by examining domestication QTL affecting important traits on the fifth chromosome.
Unfortunately, we were unable to identify a single gene contributing to the domestication
traits of culm diameter and kernel row number. Instead, we found evidence of multiple
factors on chromosome five and other chromosomes controlling kernel row number and
culm diameter suggesting the underlying genetic architecture for these traits is quite
complex.
Prior analyses identified domestication QTL for culm diameter and kernel row number
spanning the fifth chromosome from 54.4 to 151.7 Mb and 166.6 to 169.2 Mb, respectively.
Using QTL mapping with fine mapping RCNILs produced mixed results. The culm5.1
QTL was further refined to a much smaller region from 83.74 Mb to 86.26 Mb on the fifth
chromosome, a reduction in size to ∼2.5 Mb from an initial 1.5 LOD support interval of
close to 100 Mb. Unfortunately, the RCNILs used to fine map krn5.2 resulted in multiple
causative QTL on the fifth and other chromosomes while also harboring major differences
between founding HIFs. The fine mapping QTL for kernel row number closest to the
original target region was located from 160.7 Mb to 163.94 Mb on the fifth chromosome,
shifted upstream of the original interval by approximately 3 Mb. It is interesting that
the largest LOD score QTL from chapter 1 moved and fractionated into multiple factors,
while the comparatively smaller LOD score and effect size QTL culm5.1 was narrowed to
an interval ∼2.5% the size of the original interval. In terms of number of genes in the 1.5
LOD support intervals, the fine mapping region for culm diameter had a total of 40 genes
and the kernel row number QTL had 63 genes. While this fell short of the ultimate goal
of a single gene, a small enough number of genes are in the confidence intervals to begin
looking for interesting candidate genes.
The forty genes in the culm diameter QTL were characterized by looking at functional
annotation, expression results from chapter 3 of this thesis, and inclusion in selection
features from a recent genome-wide population genetics scan in maize [55]. In terms of
protein functional annotations, these genes had a variety of biological functions such as
nucleases, transmembrane proteins, metabolic enzymes, chlorophyll binding proteins, and
a number of transcription factors. Gene regulatory differences from the allele specific
RNAseq experiment gave results for 21 of the 40 genes. Seven genes were classified as
having a significant cis regulatory change. However, none of these seven genes were part of
the final filtered candidate gene list. Eight of the forty genes were also inside domestication
selection features, suggesting genes in the culm QTL were under positive selection during
maize domestication. While evidence points to differential gene expression and selection
on the genes in the culm5.1 QTL, no single gene has multiple lines of supporting evidence.
Genes in the highest LOD score kernel row number QTL were also examined for
interesting candidates. Like the culm5.1 QTL, genes in the kernel row QTL had many
different functions including ubiquitin association, ribosomal proteins, nucleases, nuclear
transporters, and several transcription factors. Of the 63 total genes, 36 were not assessed
by the chapter 3 RNAseq experiment. The majority of the assayed genes (24) were
not on filtered candidate gene lists for cis regulatory change; however, three genes were.
The most interesting candidate is an armadillo repeat containing protein with a U-box
domain. Armadillo proteins were first characterized in fruit fly and are implicated in
a number of functions including intracellular signaling and cytoskeletal regulation. The
U-box family of proteins is a class of ubiquitin-protein E3 ligases. While there is evidence
for positive selection during maize domestication in the krn QTL, none of the genes with
cis regulatory change show signs of positive selection, leaving no ideal candidate for the
kernel row number QTL defined in our fine mapping experiment.
This work provides a cautionary note for researchers looking to identify causal genes
for QTL. In this study we set out to identify the causative gene underlying two QTL, a
large effect and LOD score QTL with a narrow confidence interval and a moderate effect
and LOD score QTL with a larger support interval. Contrary to expectations, we were
actually more successful in narrowing the QTL region for the weaker effect QTL, while
the high LOD kernel row number QTL shifted positions slightly and was influenced by
multiple factors. We show that the inheritance of genetic factors influencing kernel row
number on chromosome five is quite complicated and that a previously mapped high
LOD score QTL fractionates into multiple linked factors. The lower LOD score QTL for
culm diameter actually resulted in the better fine mapping result with a greatly reduced
confidence interval.
2.5.2 Future work on chromosome five QTL
The fifth chromosome of maize has been implicated as a major contributor to maize do-
mestication in several studies [8, 25]. QTL for the kernel row number and ear diameter
traits are of particular interest due to their large effect, high LOD score, and obvious link
to desirable domestication phenotypes. The work in this thesis shows that ear diameter
and kernel row number fractionate into multiple linked QTL on the fifth chromosome.
Evidence from fine mapping and chapter 1 of this thesis put the kernel row number QTL
between 160.7 and 169.2 Mb. Unfortunately, this region contains over 100 genes and we
could not identify a single highly attractive candidate gene based on gene annotation, ex-
pression profiles, and scans for selection. Even though these efforts were not met with full
success, the importance of chromosome five on domestication traits (kernel row number
and ear diameter in particular) cannot be overstated and future studies looking at this
chromosome are inevitable.
To aid future studies on these QTL, insight can be taken from this work to maximize
the chances of success. I believe there are two primary insights that would be useful
to future researchers in this endeavor. First, the uniformity of the genetic background
on chromosome five and other chromosomes appears to be of critical importance. The
founding HIFs taken from a BC2S3 population in this experiment proved to have a com-
plicated background with multiple secondary segregating sites that caused problems when
mapping in the target QTL interval. Consequently, a more advanced population would
be desired. Second, distinct differences between founding HIFs were detected for both the
culm diameter and kernel row number QTL, suggesting comparison of RCNILs generated
from different HIFs could be misleading. Either designing the experiment to draw on a
single founding HIF to avoid this issue or accounting for HIF in analysis of the pheno-
type data will be important. In spite of accounting for founding HIF in the linear mixed
models, I still observed a large difference in kernel row number between founding HIFs
suggesting simplification of the experiment to use a single HIF may be the best design.
The use of more extensive backcrossing and generation of RCNILs from a single found-
ing HIF will allow for an overall more isogenic genomic background with minimal seg-
regation outside of the desired region. Drawing starting HIFs from the BC6S6 NIRIL
population from chapter 1 is an easy and logical way to do this. Additionally, the kernel
row number QTL is already confirmed in the population. Towards this end, we have
started the crosses necessary to produce a new population of segregating RCNILs from
several of the lines in the mapping population from chapter 1. These RCNILs will be used
in future field trials for a new and improved fine mapping attempt of the highly important
kernel row number and ear diameter phenotypes on chromosome five.
Chapter 3
The role of cis regulatory evolution
in maize domestication
3.1 Abstract
Gene expression differences in divergent lineages caused by modification of cis regulatory
elements are thought to be a critically important process in the evolution of species.
In this study, we assay genome-wide cis and trans regulatory differences between maize
and its wild progenitor, teosinte, using deep RNA sequencing in F1 hybrid and parent
inbred lines. Three tissues were sampled and approximately 70% of ∼17,000 genes showed
evidence of allele specific expression. Approximately 1,000 of these genes show consistent
cis differences among the sampled maize and teosinte lines, of which ∼70% are specific to a
single tissue. The number of genes with cis regulatory differences is greatest for ear, which
underwent a drastic transformation in form during domestication. Genes with cis effects
were also under positive selection during maize domestication and improvement more often
than expected by chance. Over all genes, maize was shown to possess less cis regulatory
variation than teosinte, a deficit that is greatest for genes with cis regulatory divergence.
We observed a directional bias where genes with cis differences favored higher expression in
maize, suggesting domestication led to a general upregulation of gene expression. Finally,
this work documents the cis and trans regulatory changes between maize and teosinte in
over 17,000 genes for three tissues.
3.2 Introduction
Changes in the cis regulatory elements (CREs) of genes with functionally conserved proteins
have been considered a key mechanism, if not the primary mechanism, by which the
diverse forms of multicellular eukaryotic organisms evolved [12, 13, 65].
Variation in CREs allows for the deployment of tissue specific patterning of gene expres-
sion, differences in developmental timing of expression, and variation in the quantitative
levels of gene expression. Furthermore, modification of CREs, as opposed to coding sequence
changes, is assumed to have less pleiotropy and consequently a lower chance of
being deleterious due to unintended consequences in secondary tissues. The importance
of CREs for the development of novel morphologies is supported by the growing catalog
of examples for which differences in CREs of specific genes between closely related species
contributed to the evolution of diversity in form and pigmentation patterning [66].
While compelling evidence for the importance of CREs in evolution has come from
mapping causative variants to CREs, additional evidence has been emerging from genomic
analyses. These analyses have shown that cis regulatory variation is abundant both within
[67–70] and between species [20, 21, 71]. Some studies have reported a bias such that genes
with cis differences between species or ecotypes often show preferential upregulation of
the alleles of one parent, possibly as a result of natural selection [21, 68, 72]. Consistent
with the proposal that cis differences are a key element of adaptive divergence, divergence
for cis regulation between yeast species is more often associated with positive selection
than trans divergence [20, 73].
Crop plants offer a powerful system for the investigation of evolutionary mechanisms
because they display considerable divergence in form from their wild progenitors, yet
exhibit complete cross-fertility with these progenitors [7, 36, 74]. QTL fine-mapping
experiments have provided multiple examples of changes in CREs that underlie trait
divergence between crops and their ancestors. These studies include examples in which
cis changes confer the upregulation of a gene during domestication [3], the downregulation
of a gene [14, 62], the loss of a tissue specific expression pattern [15], the gain of a tissue
specific expression pattern [4], and a heterochronic shift in the expression profile [16].
These diverse results suggest that changes in CREs offer a powerful means to fine-tune
gene expression to generate new plant morphologies.
Several genomic scale assays of gene expression differences between crops and their
ancestors have been performed, although the experimental designs used did not allow
the separation of cis and trans effects. These studies have shown that hundreds or even
thousands of genes have altered expression in crops as compared to their progenitors and
that genes with altered expression are more likely to show evidence for past selection than
genes with conserved expression [17–19]. The data suggest massive alterations in gene
expression profiles accompanied domestication. Work in cotton and maize shows a more
frequent upregulation of genes in the cultivated as compared to the wild parent; however,
whether this was due to cis or trans effects was not discernible [17, 18].
In this study, we used RNAseq to parse genome-wide expression differences between
maize and its progenitor, teosinte (Zea mays ssp. parviglumis), into cis and trans effects.
Three tissue types were assayed: immature ear, seedling leaf, and seedling stem. Approx-
imately 70% of the 17,000 genes assayed show evidence of regulatory divergence between
maize and teosinte. Over 1,000 genes show cis divergence that is highly consistent across
our sampled lines of maize and teosinte. For ∼70% of genes with consistent cis effects,
the cis effects are specific to just one of the three tissue types. The number of genes with
cis differences is greatest for the ear, which underwent a profound transformation in form
during domestication. Genes with cis regulatory differences between maize and teosinte
more frequently show evidence for positive selection associated with domestication than
do trans genes. Maize also possesses less cis regulatory variation than teosinte over all
genes, and this deficit in maize is greatest for genes with cis regulatory divergence from
teosinte. We observed a directional bias in that genes with cis differences more frequently
have upregulated expression of maize alleles over teosinte, although we cannot exclude
the possibility that this is an artifactual result. Finally, our data provide a catalog of cis
and trans regulatory variation for over 17,000 genes in three tissue types for maize and
teosinte.
3.3 Materials and Methods
3.3.1 Plant material, RNA preparation, and sequencing
Six maize inbred lines, nine teosinte inbred lines, and 29 of their 54 possible maize-teosinte
F1 hybrids were used in this experiment (Supplemental Table C.1). An average of 1.96
biological replicates (range 1 to 4) of each genotype were used. Plants were grown in
growth chambers with a 12 hour dark-light cycle for up to 6 weeks, after which they were
moved to a greenhouse. Fifty to 100 milligram samples of the immature ear, leaf, and
seedling stem were harvested for RNA extraction during this time. Leaf and seedling stem
(including the shoot apical meristem) tissue was collected at the v4 leaf stage. Single ears
from maize and F1 hybrid plants were collected when the ears weighed 50 to 100 milligrams
with silks just beginning to be visible. Teosinte ears were also collected when silks just
started to appear, however, due to the small size of teosinte ears 7 to 16 ears (average of
11.27) from each plant were pooled to obtain ∼50 milligrams of tissue. These three tissue
types will from here on be referred to as the ear, leaf, and stem tissues.
Total RNA was extracted from the plant tissues using a standard TRIzol protocol. To-
tal RNA was then quantified by spectrophotometer and normalized to 1 µg/µL in nuclease
free water. Starting with 5 µg total RNA, we generated polyA selected, strand specific,
barcoded RNAseq libraries with a previously published protocol using a five minute frag-
mentation time and 12 PCR amplification cycles [75]. Library adapters used barcode
sequences of four and five base pairs (Supplemental Table C.2) designed to balance per-
cent nucleotide composition within the first five base pairs of sequence reads and to have
at least two base pair differences from any other barcode. RNAseq libraries were then
pooled in groups of 14 (F1s) or 15 (parents), and the pooled libraries sequenced on one
lane (parents) or two lanes (F1s) of an Illumina HiSeq2000 sequencer at the University of
Wisconsin Biotech Center.
3.3.2 Bioinformatics
A pipeline was developed to quantify gene expression in F1 hybrid and parental inbred
lines using the RNAseq reads. The pipeline, based on work by Wang et al. [76], has two
main steps: (1) construction of a pseudo-transcriptome for each parent line from the B73
reference genome and polymorphisms derived from non-B73 genomic paired-end reads
and (2) alignment of RNAseq reads to the pseudo-transcriptomes followed by evaluation
of read depth at segregating sites.
Pseudo-transcriptomes were constructed using the B73 reference genome (version
AGPv2) and transcriptome (version ZmB73 5a WGS) plus an average of 403.1 million
(17.5X coverage) paired-end genomic sequencing reads from each of the other 14 inbred
lines (Supplemental Table C.3). For each of the 14 non-B73 inbreds, paired-end genomic
sequencing reads were aligned to the reference genome with the BWA aligner (version
0.5.9) [77]. Only uniquely mapping reads with up to two mismatches were used to limit
false polymorphism detection due to paralogous read alignment. Segregating sites from
single nucleotide polymorphisms (SNPs) and small insertion or deletion (indel) polymor-
phisms were called using the GATK package (version 1.0.5588) [78, 79] and filtered to
include only polymorphisms that were homozygous in the inbred with read depth of at
least 4X. A strand bias filter was also applied to ensure that the polymorphism was de-
tected on both the plus and minus strand. Polymorphisms surviving these filters were
then inserted into the reference B73 transcriptome to make a pseudo-transcriptome for
each parent.
For each of the 29 maize-teosinte pairs, a robust set of segregating sites was determined
by comparing the pseudo-transcriptomes of the two parents and taking the sites where: the
two parental alleles differed, coverage in genomic read alignment was at least four for both
parents within the read length (88 bp) of the site, and no heterozygous polymorphisms
were detected in genomic read alignments of the two parents within the read length of
the site.
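The three site-selection criteria above can be sketched in Python. This is an illustration under stated assumptions, not the pipeline used in this work: the `Site` record and its field names are hypothetical, and the checks "within the read length of the site" are assumed to have been summarized beforehand as a minimum coverage value and a heterozygosity flag.

```python
from dataclasses import dataclass

MIN_COVERAGE = 4  # minimum genomic read coverage required in both parents


@dataclass
class Site:
    """Hypothetical per-site summary for one maize-teosinte pair."""
    position: int
    maize_allele: str
    teosinte_allele: str
    maize_min_cov: int     # lowest genomic coverage within one read length (88 bp)
    teosinte_min_cov: int  # same, for the teosinte parent
    het_nearby: bool       # heterozygous call in either parent within one read length


def is_robust(site):
    """Return True if the site passes all three criteria described in the text."""
    return (
        site.maize_allele != site.teosinte_allele
        and site.maize_min_cov >= MIN_COVERAGE
        and site.teosinte_min_cov >= MIN_COVERAGE
        and not site.het_nearby
    )


def robust_sites(sites):
    """Keep only the segregating sites usable for allele specific read counting."""
    return [s for s in sites if is_robust(s)]
```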
RNAseq reads from each F1 hybrid and each corresponding pair of inbred parents
were then aligned to the combined pseudo-transcriptomes of the two parents (in the case
of the B73 parent, the B73 reference transcriptome was used) using the Bowtie aligner
(version 0.12.7) [80]. Allele specific expression was assessed by counting depths of reads
originating from each parent at segregating sites (determined as described above). Since
only perfect alignments were allowed, assignment of reads to parents was straightforward
(a read from a given parent could only align to this parent’s allele at a segregating site).
3.3.3 Maize:teosinte gene expression ratios
We calculated F1 hybrid and parent maize:teosinte expression ratios for each gene for
each of the 29 individual F1 hybrid comparisons. The F1 expression ratio for individual
F1s (e.g. B73 x TIL01) was calculated as the number of maize reads to the number of
teosinte reads summed over all segregating sites in the gene. The parent expression ratio
for individual F1 comparisons was calculated as the number of reads for the maize parent
(e.g. B73) to the number of reads for the teosinte parent (e.g. TIL01) summed over all
segregating sites in the gene after correcting for any difference in the total number of
reads between the two parent lines. The result of these calculations was a set of 29 matched
F1 and parent ratios of read counts for each gene. For example, for the B73 x TIL01
comparison at a single gene, the F1 and parent maize:teosinte ratios could be 52:56 and
34:30, respectively.
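A minimal sketch of the per-gene read count summation follows. The function names are hypothetical, and the library-size correction shown (scaling both parents to the smaller library) is an assumption, since the exact correction is not specified here. Genes with zero reads for one allele are not handled.

```python
import math


def f1_counts(site_counts):
    """Sum (maize, teosinte) allele read counts over all segregating sites in a gene."""
    maize = sum(m for m, _ in site_counts)
    teosinte = sum(t for _, t in site_counts)
    return maize, teosinte


def parent_counts(maize_sites, teosinte_sites, maize_total, teosinte_total):
    """Sum parent reads over sites, scaling both libraries to the smaller total.

    maize_total and teosinte_total are the total read counts of the two parent
    libraries (the scaling rule is an assumption made for illustration).
    """
    target = min(maize_total, teosinte_total)
    maize = sum(maize_sites) * target / maize_total
    teosinte = sum(teosinte_sites) * target / teosinte_total
    return maize, teosinte


def log2_ratio(maize, teosinte):
    """log2(maize:teosinte); positive values mean higher maize expression."""
    return math.log2(maize / teosinte)
```

With two segregating sites carrying (20, 25) and (32, 31) maize and teosinte reads, `f1_counts` returns the 52:56 ratio used in the example above.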
We also calculated F1 hybrid and parent maize:teosinte expression ratios for each
gene summed over all F1 hybrid comparisons by pooling the read depth values for the 29
F1 hybrids and their parents, respectively. To calculate the overall F1 expression ratio,
the maize and teosinte read counts from the F1 hybrids were simply summed over all
segregating sites in a gene and across all hybrids. The calculation of the overall parent
expression ratio required weighting. The weighting was necessary to avoid counting the
parent reads multiple times for each of the F1 hybrids in which it was a parent and to
compensate for the fact that different parents had variable total numbers of reads. Only
genes with a read depth of at least 100 in both the F1 and its parent were included. The
result of these calculations was an overall F1 and parent ratio of read counts for each gene.
For example, for a gene, the overall F1 and parent maize:teosinte ratios could be 804:796
and 123:130, respectively.
3.3.4 Testing for cis and trans effects
The combination of F1 hybrid and parent inbred expression data allows us to estimate
both the cis and trans effects on gene expression. For the F1 hybrids, the maize and
teosinte alleles at each gene are in a common trans cellular environment, and thus any
deviation of the maize:teosinte F1 expression ratio from 1:1 represents purely cis effects.
By contrast, the maize:teosinte parent expression ratio is a combination of the cis and
trans effects and any deviation of this ratio from 1:1 reflects the combined cis plus trans
effects. Therefore, the trans effects can be estimated by subtracting the F1 hybrid ratio
(cis) from the parent ratio (cis plus trans).
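The logic of this subtraction can be illustrated with a short sketch. Working on the log2 scale is an assumption made here so that the ratios subtract cleanly; the function name is hypothetical.

```python
import math


def cis_trans(f1_maize, f1_teosinte, parent_maize, parent_teosinte):
    """Return (cis, trans) as log2 effects from F1 and parent read counts."""
    # In the F1, both alleles share one trans environment, so the ratio is cis only.
    cis = math.log2(f1_maize / f1_teosinte)
    # The parent ratio reflects cis plus trans effects.
    total = math.log2(parent_maize / parent_teosinte)
    trans = total - cis
    return cis, trans
```

For example, a gene with a 2:1 ratio in the F1 and a 4:1 ratio between the parents would have a cis effect of 1 and a trans effect of 1 on this scale.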
Table 3.1: Regulatory category as defined by significant (Sig.) or not significant (Not Sig.) binomial tests (BT) and Fisher’s Exact Tests (FET).
Category       Parent BT   Hybrid BT   FET        Favored allele?
Cis            Sig.        Sig.        Not Sig.   —
Trans          Sig.        Not Sig.    Sig.       —
Cis + Trans    Sig.        Sig.        Sig.       Same
Cis x Trans    Sig.        Sig.        Sig.       Opposite
Compensatory   Not Sig.    Sig.        Sig.       —
Conserved      Not Sig.    Not Sig.    —          —
Ambiguous      All other patterns of significant or not significant
Maize and teosinte gene expression, as measured by read depth counts at genes, was
used for statistical testing of cis and trans effects. Significant cis and trans effects were
determined using binomial and Fisher’s Exact Tests as described in McManus et al. [21].
In brief, two binomial tests were used to identify genes with maize:teosinte expression
ratios significantly different from 1:1 in the F1 hybrid and parent comparisons. Genes
with an expression ratio significantly different from 1:1 for the F1 hybrid and/or parent
comparison were then subjected to a Fisher’s Exact Test to determine if the parent and F1
hybrid maize:teosinte expression ratios were different from one another. A false discovery rate (FDR) of
0.5% using Storey’s q-value [81] was used to compensate for the large number of statistical
tests being performed. The combination of the two binomial tests and Fisher’s Exact Test
allowed us to classify each gene into one of seven different regulatory categories (Table 3.1)
as described in McManus et al. [21].
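The decision rules in Table 3.1 can be written as a small classifier. This sketch assumes the three tests have already been run and thresholded (for example, q-value below 0.005), and that signed cis and trans effects are available to decide whether they favor the same allele; the function and argument names are hypothetical.

```python
def classify(parent_sig, hybrid_sig, fet_sig, cis=0.0, trans=0.0):
    """Assign a regulatory category following the patterns in Table 3.1.

    parent_sig, hybrid_sig: binomial tests of the parent and F1 ratios against 1:1.
    fet_sig: Fisher's Exact Test comparing the parent and F1 ratios.
    cis, trans: signed log ratio effects, used only to decide the favored allele.
    """
    if not parent_sig and not hybrid_sig:
        return "Conserved"  # Fisher's Exact Test is not run for these genes
    if parent_sig and hybrid_sig and not fet_sig:
        return "Cis"
    if parent_sig and not hybrid_sig and fet_sig:
        return "Trans"
    if parent_sig and hybrid_sig and fet_sig:
        same_allele = (cis > 0) == (trans > 0)
        return "Cis + Trans" if same_allele else "Cis x Trans"
    if not parent_sig and hybrid_sig and fet_sig:
        return "Compensatory"
    return "Ambiguous"
```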
3.3.5 Candidate genes
Genes whose expression level was the direct target of selection during maize domestication
are expected to show a maize:teosinte cis expression ratio that is significantly different
from 1:1. These genes can fall into either the cis only (C) or cis plus trans (CT) groups in
Table 3.1 as determined by the binomial and Fisher’s Exact Tests. We call this combined
group CCT genes and they are the differential expression candidates that are the focus
of many of our analyses.
The list of CCT genes from the overall test was large (5,609 ear; 5,392 leaf; 5,426 stem;
see results). The large number of CCT genes reflects the considerable statistical power
to detect slight overall expression biases given that some genes had thousands of reads
aligning to segregating sites. We observed significant maize:teosinte expression biases
as small as 1.0:1.02 in the overall tests. Such small differences seem unlikely to have
biological importance and genes showing these small differences are weak candidates for
genes with cis expression variation that is causal in maize domestication and improvement.
Therefore, we applied filters to identify candidates with the strongest and most consistent
regulatory differences.
To narrow down the CCT gene list to candidate genes that show the strongest evidence
for differential cis regulation between maize and teosinte, we applied two filters. (1) Genes
with the strongest evidence should not only fall in the CCT group for the overall test using
the pooled data from all 29 F1 hybrid comparisons, but the best supported genes for cis
differences will be the ones for which we have data from a large proportion of our sampled
maize and teosinte parents. Thus, we filtered the initial list of CCT genes for those with
data from at least fifteen F1 hybrids that include at least three different maize inbreds
and five different teosinte inbreds. (2) For genes with cis differences that contributed to
maize domestication/improvement, they should not only appear in the CCT list from the
overall test, but the direction of the expression bias should be highly consistent among
each of the individual F1 hybrids. To classify CCT genes for consistency of directionality
of expression bias among the F1s, we partitioned the genes into groups with 100%, 90%
and 80% of F1s showing the same directionality. In calculating these percentages, we used
read depth for each F1 at the gene to weight the contribution of the F1s to the overall
percentage. We refer to the CCT genes with 100%, 90% and 80% consistent directionality
among the F1s as the A-list, B-list and C-list, respectively. For comparative purposes, we
made similar A, B and C lists of genes for the cis only or trans only classes.
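The two filters can be sketched as follows. This is an illustration only: the dictionary keys are hypothetical, consistency is measured against the depth-weighted majority direction, and each gene is assigned to the most stringent list it satisfies, all of which are assumptions about details not spelled out above.

```python
def passes_coverage_filter(f1s):
    """Filter 1: data from >= 15 F1s spanning >= 3 maize and >= 5 teosinte inbreds."""
    maize = {f["maize"] for f in f1s}
    teosinte = {f["teosinte"] for f in f1s}
    return len(f1s) >= 15 and len(maize) >= 3 and len(teosinte) >= 5


def weighted_consistency(f1s):
    """Depth-weighted fraction of F1s whose bias matches the majority direction.

    F1s with a log2 ratio of exactly zero are counted with the teosinte-biased group.
    """
    total = sum(f["depth"] for f in f1s)
    maize_biased = sum(f["depth"] for f in f1s if f["log2_ratio"] > 0)
    return max(maize_biased, total - maize_biased) / total


def assign_list(f1s):
    """Return 'A', 'B', 'C', or None for one CCT gene (filter 2)."""
    if not passes_coverage_filter(f1s):
        return None
    consistency = weighted_consistency(f1s)
    if consistency >= 1.0:
        return "A"
    if consistency >= 0.9:
        return "B"
    if consistency >= 0.8:
        return "C"
    return None
```

Weighting by read depth means that a single low-depth F1 with the opposite direction moves a gene from the A-list to the B-list only if it carries a meaningful share of the reads.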
3.3.6 Proportion of cis variation in maize and teosinte
The existence of multiple cis regulatory regimes within maize and teosinte populations
is expected to manifest as variation in the expression ratios among F1 hybrids. We asked
whether cis expression variation among F1 hybrid ratios was more heavily influenced by
maize or teosinte inbred parent. Since three teosinte inbreds (TIL05, TIL10, and TIL15)
were involved in only a single F1 each, the three F1s involving these inbreds were removed
from the data in order to balance the number of maize and teosinte inbred parents in the
dataset for this analysis. Genes were tested for variation among the F1 expression ratios
(cis variation) using a linear model. The log2(maize:teosinte) F1 expression ratio as the
dependent variable was fit to the maize (j=1 to 6) and teosinte (k=1 to 6) parents as the
independent variables. All models were fit on a gene-by-gene basis. Significant maize and
teosinte parent terms were identified with an F-test (p < 0.05) using the drop1 function
in R. The data for each F1 was weighted by its total depth at the gene to account for
different read-depths in the F1 hybrids.
3.3.7 Additive and dominant gene expression
One theory in domesticated systems states that genes responsible for rapid morpholog-
ical evolution are primarily loss of function (LOF) alleles [82]. In this scenario, a non-
domesticated allele would be dominant to the LOF domesticated allele. While there is
some support for this theory in rice diversification and improvement [83], recent QTL and
domestication gene cloning experiments present a more diverse collection of functional
gene changes [84]. In domesticated systems, the mode of inheritance for gene expression
in terms of additivity and dominance has yet to be explored.
Our dataset consisting of parent inbred and hybrid expression profiles gives the op-
portunity to address the LOF hypothesis in terms of gene expression on a genome-wide
scale. We calculated the additive effect, dominant effect, and dominant/additive (D/A)
ratio for each gene and maize-teosinte F1 hybrid comparison. The overall maize-teosinte
average D/A ratio was then calculated after exclusion of outlier F1 D/A ratios using the
Dixon method [85]. Genes were next classified as having overdominant (|D/A| > 1.25),
dominant (0.75 < |D/A| < 1.25), semi-dominant (0.25 < |D/A| < 0.75), or additive
(|D/A| < 0.25) gene action depending on D/A ratio. Following calculation of overall D/A
ratios and assignment of gene action, we looked for patterns in D/A ratios and gene action
that support the LOF hypothesis [82]. Specifically, we looked for evidence of extensive
dominance of the teosinte (non-domesticated) allele for genes with trans only regulatory
change.
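As an illustration of the D/A calculation and the gene action classes described above, the following Python sketch uses the conventional quantitative genetic definitions. These definitions are assumptions, not taken from the text: the additive effect is half the maize minus teosinte parental difference, and the dominance effect is the deviation of the F1 from the midparent value. Under this convention a negative D/A indicates an F1 closer to the teosinte parent. Values falling exactly on a class boundary are assigned to the higher class here, which is also an assumption.

```python
def dominance_ratio(maize, teosinte, f1):
    """Return D/A for one gene in one maize-teosinte comparison.

    maize, teosinte, f1: expression levels (hypothetical normalized values).
    Returns None when the parents do not differ, since D/A is then undefined.
    """
    additive = (maize - teosinte) / 2.0
    midparent = (maize + teosinte) / 2.0
    dominance = f1 - midparent
    if additive == 0:
        return None
    return dominance / additive


def gene_action(da):
    """Classify gene action from a D/A ratio using the thresholds in the text."""
    magnitude = abs(da)
    if magnitude < 0.25:
        return "additive"
    if magnitude < 0.75:
        return "semi-dominant"
    if magnitude < 1.25:
        return "dominant"
    return "overdominant"


# Hypothetical gene: F1 expression equals the teosinte parent.
da = dominance_ratio(maize=10.0, teosinte=6.0, f1=6.0)
print(da, gene_action(da))
```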
3.3.8 CCT gene enrichment in various functional categories
We assessed whether CCT genes are over- or under-represented in several categories as compared
to all genes or genes with conserved expression levels between maize and teosinte.
The categories we tested include transcription factors, several metabolic pathways, gene
ontology (GO) categories, selection candidates, and domestication QTL. A list of maize
transcription factors and their associated families was downloaded from the plant transcription
factor database [86]. Metabolic enzyme cDNA sequences for starch and lipid
metabolism pathways in maize were downloaded from the Kyoto Encyclopedia of Genes
and Genomes (KEGG) [87, 88] and matched with genes from the maize filtered gene set
(version 5b) by BLAST. Matches (single gene hit with percent identity greater than 95%)
were found for 370 out of 379 genes and used to test for enrichment of CCT genes in the
various metabolic pathways. Genes under positive selection during maize domestication
and improvement were taken from a recent genomic scan for selection [55]. We obtained a
list of QTL associated with maize domestication and improvement traits from Table A.1
in work by Shannon [25].
In general, we tested for enrichment or depletion of CCT genes in various categories
using Fisher’s Exact Tests on 2x2 contingency tables that parse genes by CCT and cate-
gory status. Statistical testing was first done for CCT-AB candidate genes and extended
to CCT-A and CCT-ABC lists if an interesting result presented itself. Additionally, there
were a few differences in this general approach depending on what category was being
analyzed. For QTL, we looked for enrichment of CCT genes among the genes within the
1.5 LOD support intervals for each trait separately and only included QTL whose 1.5
LOD support intervals were narrow enough to encompass 20 or fewer genes. For genes
under positive selection during domestication and improvement, we performed an addi-
tional three tissue union comparison where genes on any of the three tissue CCT lists
were considered a CCT candidate gene.
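The enrichment tests described above can be sketched in Python with the standard library alone. This is a minimal illustration and not the code used for the analyses. The two-sided p-value sums the probabilities of all tables that are no more likely than the observed one, which is one common definition of a two-sided Fisher's Exact Test. The function name and the gene counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact Test on the 2x2 table [[a, b], [c, d]].

    Rows: CCT / not CCT. Columns: in category / not in category.
    Returns (expected count for cell a, two-sided p-value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denominator = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x CCT genes falling in the category.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so that ties with the observed table are counted.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    expected = row1 * col1 / n
    return expected, min(1.0, p_value)


# Hypothetical counts: 40 of 540 CCT genes and 60 of 1,460 other genes in a category.
expected, p = fisher_exact_two_sided(40, 500, 60, 1400)
print(f"expected = {expected:.1f}, p = {p:.3g}")
```

Comparing the observed count with the expected count indicates whether a significant result reflects enrichment or depletion.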
One expectation for genes under selection for CREs is the signature of selection at
the CRE itself, upstream of the gene in question. Since there is no hard rule as to
how far upstream cis enhancer and repressor elements can function, we addressed this
expectation by looking at selection pressure at the transcriptional start site of genes. The
raw selection score, represented by cross population composite likelihood ratio (XPCLR)
[89], from Hufford et al. [55] served as a test statistic for this analysis. A three tissue
union comparison was made between all genes on CCT-AB lists and all genes identified
as conserved in the initial assay. Significant differences between the XPCLR score at the
transcriptional start site were tested by Kolmogorov-Smirnov and simple t-tests to look
for change in the overall distribution and mean of conserved versus CCT genes.
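The two comparisons named above can be illustrated with a short Python sketch. It is not the implementation used in this work. It computes the two-sample Kolmogorov-Smirnov statistic with an asymptotic p-value from the Kolmogorov series, and a Welch t statistic for the difference in means. The choice of the Welch form and of the asymptotic approximation are assumptions, and the scores in the example are invented.

```python
import math
import statistics


def ks_statistic(x, y):
    """Maximum distance between the two empirical distribution functions."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        value = min(xs[i], ys[j])
        while i < n and xs[i] == value:  # step past ties in x
            i += 1
        while j < m and ys[j] == value:  # step past ties in y
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_asymptotic_pvalue(d, n, m, terms=100):
    """Large-sample approximation to the two-sided KS p-value."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.3:  # the series converges slowly here and the p-value is ~1
        return 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * series))


def welch_t(x, y):
    """Welch t statistic for a difference in means (unequal variances)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


# Invented ln(XPCLR) scores for conserved and CCT genes.
conserved = [0.2, 0.5, 0.9, 1.1, 1.4, 1.6]
cct = [0.8, 1.3, 1.7, 2.0, 2.4, 2.9]
d = ks_statistic(conserved, cct)
print(d, ks_asymptotic_pvalue(d, len(conserved), len(cct)), welch_t(conserved, cct))
```

The Kolmogorov-Smirnov test responds to any difference in distribution shape, while the t statistic responds only to a shift in the mean, which is why the two are reported together.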
Finally, we used the goseq package [90] in R [91] to test for GO term enrichment and
depletion in our CCT gene lists, using median gene length to adjust the reference in the
goseq analysis. The base background GO term reference consisted of genes for which
allele specific expression was assessed in 15 crosses, three unique maize, and five unique
teosinte inbred lines with a cumulative depth of 100 at segregating sites in F1 and parent
comparisons. GO terms occurring at least five times in the background reference were
tested for enrichment and depletion in the CCT-A, CCT-AB, and CCT-ABC gene lists
with p-values corrected for multiple testing using the Benjamini-Hochberg method [92].
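For reference, the Benjamini-Hochberg adjustment used for the GO term p-values can be written in a few lines of Python. This is a generic sketch of the standard step-up procedure and not the goseq implementation. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p-values for four GO terms.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```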
3.4 Results
3.4.1 RNAseq provides expression data for more than 17,000
genes per tissue
RNAseq data for seedling leaf, seedling stem (including the shoot apical meristem), and
immature ear from six maize inbreds, nine teosinte inbreds, and 29 of their 54 possible
F1 hybrids were used to examine variation in gene expression on a genome-wide scale. In
total, 259 RNAseq libraries were constructed from an average of 1.96 biological replicates
for each parent inbred and F1.
Overall, 996 million, 1.13 billion, and 1.21 billion F1 hybrid and 286 million, 283
million, and 276 million parent RNAseq reads were collected for ear, leaf, and stem tissue
types, respectively (Table 3.2). These reads were aligned with custom-made parent specific
pseudo-transcriptomes containing an average of 54,000 segregating sites (SNPs or small
indels) in each of the 29 maize-teosinte contrasts. Out of the reads from the F1 hybrids,
556 million, 670 million, and 716 million reads mapped to pseudo-transcriptomes in ear,
leaf, and stem tissue, respectively. For parent inbred line reads, 171 million, 170 million,
and 163 million mapped to the pseudo-transcriptomes (Table 3.2). Thus, approximately
the same percentage of reads (58.1% and 59.6%) mapped to pseudo-transcriptomes in
both the F1 hybrids and parent datasets with about 7.15% of the total reads mapping to
segregating sites in the individual F1 hybrids and their parents.
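The pooled percentages quoted above follow directly from the read counts in Table 3.2. The short Python check below reproduces them. The variable and function names are introduced here for illustration only.

```python
# Read counts from Table 3.2, in the order ear, leaf, stem.
f1_total = [996_210_711, 1_133_517_167, 1_211_779_746]
f1_aligned = [556_387_109, 670_175_942, 716_223_906]
f1_segregating = [74_556_872, 72_995_272, 91_355_219]
parent_total = [286_233_926, 282_553_096, 276_295_164]
parent_aligned = [171_185_368, 169_564_817, 162_866_225]


def pooled_percent(part, whole):
    """Percentage of reads in `part` after pooling the three tissues."""
    return 100.0 * sum(part) / sum(whole)


f1_aligned_pct = pooled_percent(f1_aligned, f1_total)
parent_aligned_pct = pooled_percent(parent_aligned, parent_total)
f1_segregating_pct = pooled_percent(f1_segregating, f1_total)

print(f"F1 aligned: {f1_aligned_pct:.1f}%")                    # 58.1%
print(f"Parent aligned: {parent_aligned_pct:.1f}%")            # 59.6%
print(f"F1 at segregating sites: {f1_segregating_pct:.2f}%")   # 7.15%
```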
The RNAseq reads from the pooled data for all 29 F1 hybrids and 15 parents that
aligned to segregating sites in the transcriptomes represent 23,045, 23,434, and 23,792
genes for ear, leaf and stem tissues, respectively (Table 3.3). The union of these three
groups is 24,983 genes, which is 63% of the 39,423 genes from the maize filtered gene set
(version 5b). We applied a filter to this list, requiring a read-depth of 100 in both the
parent inbreds and F1 hybrids. This filter reduced the lists to 15,939, 15,925, and 16,018
Figure 3.1: Overlap of genes assessed in the three tissues overall and in the CCT-AB gene list. Each compartment of the Venn diagram contains the tissue combination on top, number of genes overall in the middle, and number of genes from the CCT-AB gene list on bottom. CCT-AB overlap numbers marked by an “*” indicate significantly more overlap than expected by chance (permutation tests, p < 1e-5). In the overall analysis the vast majority of genes (82%) were assayed in all three tissues. While this percent is much smaller for the CCT-AB candidate gene list (∼7%), this is still more of an overlap than expected by chance. The much higher degree of overlap of CCT-AB genes than expected suggests some CREs act in multiple tissues. Additionally, there are also many single tissue CCT-AB genes, which points towards the many cis elements that appear to function in tissue specific patterns.
Table 3.2: Assignable RNAseq Read Counts from F1 hybrids and parents.
Tissue   F1 Hybrid Count   Parent Count    F1 Hybrid Percent   Parent Percent
Total Reads
Ear        996,210,711     286,233,926     -                   -
Leaf     1,133,517,167     282,553,096     -                   -
Stem     1,211,779,746     276,295,164     -                   -
Aligned Reads
Ear        556,387,109     171,185,368     55.85%              59.81%
Leaf       670,175,942     169,564,817     59.12%              60.01%
Stem       716,223,906     162,866,225     59.11%              58.95%
Segregating Site Reads
Ear         74,556,872      85,296,872a    7.48%               29.80%a
Leaf        72,995,272      78,878,805a    6.44%               27.92%a
Stem        91,355,219      78,583,423a    7.54%               28.44%a
a A higher number and percentage of reads map to segregating sites in parents due to each set of parent reads being used in multiple comparisons. In contrast, each of the F1 comparisons can only map to segregating sites between two pseudo-transcriptomes.
Table 3.3: Genes for which RNAseq data was collected and expression was assayed.¹
                                                 Ear      Leaf     Stem     Union
Genes with mapped RNAseq reads                   32,858   32,645   33,316   34,636
Genes with RNAseq reads and segregating sites    22,072   22,393   22,901   24,052
Overall Genes (filtered 100 depth)               15,939   15,925   16,018   17,575
Total CCT genes                                   5,618    5,402    5,435   10,101
Filtered CCT Genes (15 F1 + 3M + 5T)              4,770    4,490    4,601    8,398
ABC-List CCT                                      1,545    1,288    1,371    3,018
C-List CCT                                          990      843      940    2,314
B-List CCT                                          512      424      404    1,036
A-List CCT                                           43       21       27       69
¹ Only genes from the maize filtered gene set (version 5b) were considered.
genes in ear, leaf, and stem tissues, respectively. The union of these three groups is 17,575
genes or about 45% of the filtered gene set. There is a large degree of overlap among the
genes expressed in the three tissues. From the total list of 17,575 genes, 14,420 (82%) were
seen in all three tissues. Of the remaining genes, 1,467 are in some combination of two
tissues and 1,688 are in only a single tissue (Figure 3.1). All except 16 of these single or
two tissue genes were detected at a read depth below 100 in additional tissues. However,
for the 1,688 genes expressed in only single tissues at 100 read-depth, an average of 67.4%
of their reads come from the tissue with the most reads. For genes detected in all three
tissues at 100 read-depth, this value is only 46.9%. Thus, while very few of the 1,688
genes are absolutely tissue specific, this group of 1,688 genes shows greater differences in
expression among tissues than the 14,420 genes detected in all three tissues.
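The tissue specificity measure used above, the share of a gene's reads that come from its most highly represented tissue, can be sketched as follows. This is a simplified illustration. The function names and read counts are hypothetical, and the depth check collapses the separate parent and F1 depth requirements into a single count per tissue.

```python
def top_tissue_share(reads_by_tissue):
    """Fraction of a gene's reads contributed by the tissue with the most reads."""
    total = sum(reads_by_tissue.values())
    if total == 0:
        raise ValueError("gene has no reads in any tissue")
    return max(reads_by_tissue.values()) / total


def tissues_passing_depth(reads_by_tissue, min_depth=100):
    """Tissues in which the gene reaches the read-depth threshold."""
    return sorted(t for t, reads in reads_by_tissue.items() if reads >= min_depth)


# Hypothetical gene detected in all tissues but passing the filter only in ear.
gene = {"ear": 300, "leaf": 60, "stem": 40}
print(tissues_passing_depth(gene), top_tissue_share(gene))
```

A gene with equal expression in all three tissues would have a share of one third, so the 46.9% and 67.4% averages reported above both reflect some unevenness among tissues, with the single tissue genes being the more uneven group.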
3.4.2 Prolific regulatory variation characterized by relatively
few consistent cis differences
We measured log2 of the ratio of maize to teosinte read counts in F1 hybrids (cis regu-
latory effect) and the parent log2 ratio (combined cis and trans regulatory effect). The
trans effect was estimated as the difference between the F1 and parent log2 ratios. Bi-
nomial and Fisher’s Exact Tests were used on read counts to determine whether these
ratios deviated from 1:1 and to assign genes to one of seven regulatory categories (Ta-
ble 3.1). In an overall maize versus teosinte comparison, about 69% of genes (69.27% ear,
74.27% leaf, and 63.82% stem genes) from the three tissues were classified as having some
combination of significant cis and/or trans regulatory effect (Figure 3.2). The remaining
genes were classified as having conserved (18.6%, 15.3%, and 20.7%) expression in maize
and teosinte or ambiguous (12.1%, 10.4%, and 15.5%) expression patterns. All three
tissues had similar proportions of genes falling into the different regulatory categories in
the overall maize-teosinte comparison (Ear: Figure 3.2, Leaf: Supplemental Figure C.1,
Stem: Supplemental Figure C.2).
We asked what proportion of regulatory divergence between maize and teosinte was
due to cis effects by calculating the ratio |cis|/(|cis| + |trans|) [21]. Across all genes, cis
effects account for 45%, 42% and 47% of regulatory divergence for ear, leaf and stem
tissue, respectively (Supplemental Table C.4). We further examined the relative contribution
of cis and trans to large expression differences by binning genes based on
overall expression difference between maize and teosinte (log2 parent ratio). This analysis
shows the magnitude of cis regulatory change is positively correlated with total divergence
in expression (Figure 3.3). At high degrees of expression divergence between maize and
teosinte (log2 change of 5 or more), over 75% of the divergence is due to cis. Thus, large
expression differences appear to be caused primarily through difference in cis regulation
as opposed to trans.
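The quantities in this section can be made concrete with a small Python sketch. The sign convention for the trans effect (parent log2 ratio minus F1 log2 ratio) is an assumption, as are the function names. The read counts are invented and are assumed to be already normalized for library size.

```python
import math


def regulatory_effects(f1_maize, f1_teosinte, parent_maize, parent_teosinte):
    """Return (cis, trans, total) log2 effects from allele and parent read counts."""
    cis = math.log2(f1_maize / f1_teosinte)            # allelic ratio within the F1
    total = math.log2(parent_maize / parent_teosinte)  # combined cis and trans
    trans = total - cis                                # assumed sign convention
    return cis, trans, total


def proportion_cis(cis, trans):
    """Share of regulatory divergence due to cis: |cis| / (|cis| + |trans|)."""
    denominator = abs(cis) + abs(trans)
    if denominator == 0:
        return None  # no divergence, so the proportion is undefined
    return abs(cis) / denominator


# Invented gene: 2-fold allelic bias in the F1, 8-fold difference between parents.
cis, trans, total = regulatory_effects(200, 100, 800, 100)
print(cis, trans, total, proportion_cis(cis, trans))
```

Because absolute values are used, a gene whose cis and trans effects act in opposite directions can have a small total divergence while both components are large.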
A primary goal in this study was to identify genes with cis regulatory differences
between maize and teosinte. Such genes are candidates for being direct targets of selection
during maize domestication or improvement for altered gene expression. Genes selected
for regulatory differences would be in either the cis only or cis plus trans regulatory
categories. We designate this combined group CCT genes. We identified 5,618 ear, 5,402
leaf and 5,435 stem CCT genes in the overall analysis (Table 3.3). To narrow the list
of CCT genes to those with a broad degree of support, the list was filtered to include
only those assayed in at least 15 maize-teosinte F1s involving at least three maize and five
teosinte inbred lines. This filtering resulted in reduced lists of 4,770 ear, 4,490 leaf, and
4,601 stem CCT genes. The union of these three sets includes 8,398 genes.
Next, we asked if the 8,398 genes on the filtered CCT list from the overall analysis
have a consistent directionality in favor of the maize or teosinte allele in the individual F1
hybrids. The goal was to exclude CCT genes for which the significant overall cis effect was
Figure 3.2: Parent versus hybrid ear tissue allele specific expression ratios. The parent (x-axis) versus F1 hybrid (y-axis) allele specific expression ratios are plotted against each other. Regulatory category, determined from the combination of significant statistical tests as described in the Methods, is designated by color. The proportion and count of genes falling into the various regulatory categories are also shown in the barplot in the lower right-hand corner.
Figure 3.3: Proportion of expression divergence due to cis regulatory difference. The amount of total differential expression between the maize and teosinte parents due to the directly measured cis effect (F1 hybrid expression ratio) is shown with error bars depicting one standard error. Total divergence (parent expression ratio) was binned from 0-1, 1-2, 2-3, 3-4, 4-5, and 5+. Divergence due to cis effects increases with total divergence, suggesting large expression differences tend to be caused by cis rather than trans regulatory differences.
caused by a large expression bias in a minority or even one of the F1 crosses. We defined
three levels of consistency: groups A, B and C for which 100%, 90% and 80% of F1s
showed the same directionality, respectively. Groups A, B, and C genes combined across
tissues contained 69, 1,036, and 2,314 genes respectively (Table 3.3). Thus, relatively
few of the 8,398 filtered CCT genes show a significant overall cis effect that is highly
consistent among 15 or more F1 hybrids.
3.4.3 Possible directional bias in cis evolution
Visual examination of Figure 3.2 shows a greater density of cis genes (black dots) with
positive log2 hybrid expression ratios than with negative ratios, suggesting cis evolution
during domestication more often favored alleles with increased expression in maize relative
to teosinte. Consistent with this visual observation, the number of CCT (ABC list)
genes with a positive (maize biased) vs. negative (teosinte biased) log2 hybrid expression
ratio are 947:598, 814:474 and 826:545 for ear, leaf and stem, respectively (Supplemental
Table C.5). All of these ratios are significantly different from a 50:50 unbiased expectation
(binomial test, p< 0.001). Additionally, a plot of the distribution of log2 hybrid expression
ratio for CCT genes shows a much greater density of genes with positive values (Figure 3.4)
for all three tissue types.
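The binomial tests reported above can be reproduced approximately with an exact two-sided test against a 50:50 expectation. The Python sketch below uses the maize-biased and teosinte-biased counts given in the text. The function name is introduced here, and doubling the smaller tail is valid only because the null proportion is 0.5, under which the distribution is symmetric.

```python
from math import comb


def binomial_two_sided_half(k, n):
    """Exact two-sided binomial p-value for k successes in n trials, null p = 0.5."""
    tail = min(k, n - k)
    # Number of outcomes at least as extreme in the smaller tail.
    lower = sum(comb(n, i) for i in range(tail + 1))
    # Integer arithmetic is kept until the final division to avoid overflow.
    return min(1.0, 2 * lower / 2 ** n)


# Maize-biased : teosinte-biased CCT (ABC list) gene counts from the text.
counts = {"ear": (947, 598), "leaf": (814, 474), "stem": (826, 545)}
for tissue, (maize_biased, teosinte_biased) in counts.items():
    p = binomial_two_sided_half(maize_biased, maize_biased + teosinte_biased)
    print(tissue, p < 0.001)
```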
The apparent bias in directionality of cis evolution could be the result of error in our
bioinformatics pipeline. One potential error is preferential alignment of maize RNAseq
reads due to overall greater sequence divergence of teosinte lines from the reference tran-
scriptome (B73) in comparison to non-reference maize inbred lines. If such systematic
error exists, the observed bias in directionality of cis evolution would be expected to be
greatest for F1s involving the reference B73 (zero alignment bias of maize reads and high
bias for teosinte) and less extreme for crosses between teosinte and non-reference maize
lines (moderate bias for non-reference maize and high bias for teosinte).
Figure 3.4: Cis versus estimated trans regulatory effect for CCT-ABC genes in the ear, leaf, and stem. CCT genes have a directional bias with more genes overall favoring the maize allele than teosinte. Genes with consistent cis regulatory differences tend to favor the domesticated maize allele. This phenomenon exists in all three tissues. While we cannot discount reference bias as the cause, this trend suggests there may be an overall directional bias for cis regulatory evolution in maize domestication.
To test this expectation, we calculated the number of CCT (ABC list) genes with
positive (maize biased) vs. negative (teosinte biased) log2 hybrid expression ratios sep-
arately for F1s involving B73 and non-B73 maize parents. For ear tissue, there are 569
teosinte-biased and 975 maize-biased genes for B73 F1s and 606 teosinte-biased and 939
maize-biased genes for non-B73 F1s. A Fisher’s Exact Test fails to reject the null hy-
pothesis that these two ratios are equivalent (p = 0.18). There was also no evidence for
non-equivalent ratios with the other two tissue types (Supplemental Table C.6). Thus,
we see no evidence for significantly greater bias for maize alleles in crosses involving B73
versus the non-reference maize parents, supporting the argument that alignment bias in-
troduced by use of pseudo-transcriptomes does not explain the excess of CCT genes with
the maize allele expressed higher than the teosinte allele.
3.4.4 Gene expression variation is greater in teosinte
Both the domestication/improvement bottleneck and selection during domestication are
expected to reduce variation in maize as compared to teosinte. We asked if these reduc-
tions in variation are apparent in our gene expression data. To quantify whether variation
in maize or in teosinte was the source of the variation in our expression ratios among F1 hy-
brids, we fit a linear model on a gene-by-gene basis where maize and teosinte inbred parent
were used as explanatory factors for the expression ratio. Among ∼13,000 genes included
in this analysis, the maize parent explains only 85% as much variation as the teosinte
parent (Supplemental Table C.7). This represents the general reduction in diversity of
maize as compared to teosinte, presumably a result of the domestication/improvement
bottleneck.
While the bottleneck should cause a reduction in expression variation in maize for all
genes, genes that were targets of selection for regulatory differences should have an even
greater reduction in expression variation. Consistent with this expectation, we observed
Figure 3.5: The proportion of average maize to teosinte R² from linear models explaining F1 hybrid expression by maize and teosinte parent. Error bars represent ± 1 standard error. In all three tissues, the proportion of maize to teosinte R² decreases in candidate CCT gene lists with the most ideal candidates (CCT-A) having the most extreme reduction.
a greater reduction in variation in maize as compared to teosinte for CCT genes than the
full set of ∼13,000 genes (Figure 3.5, Supplemental Table C.7). This greater reduction
likely reflects the combined effects of the bottleneck plus selection during domestication.
For the full ABC groups of CCT genes, maize contributes 79% of teosinte variation, for
the AB group about 74%, and for the A group about 52% of teosinte variation. Thus,
among our strongest candidates (A group) for genes with cis regulatory difference between
maize and teosinte, the data indicate that maize explains only about half as much of the
cis regulatory variation as teosinte.
The reduction in gene expression variation in maize vs. teosinte is also seen in the
number of individual genes with significant effects due to the maize and/or teosinte parent
(Supplemental Table C.8). In terms of numbers of genes, there were 2.0 to 2.5 fold more
genes for which only the teosinte parent effect was significant than genes for which only
the maize parent effect was significant among AB list genes, and 5-fold more among the
A list CCT genes.
3.4.5 Selection candidate genes are enriched for CCT genes
We compared our list of CCT genes to putative targets of selection during maize domes-
tication and improvement [55]. There is significant enrichment for CCT genes among
selection candidate genes for all three tissues (Table 3.4). The strength of the evidence
for selection is strongest for the union of CCT genes from all three tissues. For example,
there are 134 CCT-AB genes among the selected genes, while 86.7 would be expected by
chance. Also, there were 10 CCT (A-list) genes from stem tissue among selected genes,
although only 2.16 are expected by chance, a nearly 5-fold enrichment.
XPCLR scores (cross population composite likelihood ratios) [89] quantify the de-
gree of support for positive selection on a genomic region. We drew on a recent study
[55] looking at XPCLR score in 10 kilobase windows in maize on a genome-wide scale.
Figure 3.6: Density plots of ln(XPCLR) score of conserved versus CCT-AB candidate genes. CCT genes have a significantly higher signature of selection in the 10kb window holding the transcriptional start site. The natural log transformed XPCLR scores for CCT-AB genes are consistently and statistically higher than genes that were identified as conserved in the initial analysis. The distributions of conserved and CCT-AB genes are significantly different by both the shape sensitive Kolmogorov-Smirnov test (p = 1.0587e-11) and simple difference of the means t-test (p = 2.2119e-10).
Table 3.4: Fisher’s Exact Tests for the overlap between genes in domestication and improvement selection candidate genes and CCT genes from each of the three experimental tissues.
CCT Group   Overlap    Ear        Leaf        Stem        Union
A           Expected   3.42       1.41        2.16        5.6
            Observed   11         5           10          20
            p-value    3.52e-04   9.73e-03    1.89e-05    2.49e-07
AB          Expected   44.71      35.29       34.78       86.7
            Observed   70         57          60          134
            p-value    9.12e-05   1.79e-04    1.74e-05    1.13e-07
ABC         Expected   125.48     105.68      109.89      248.92
            Observed   174        135         139         317
            p-value    2.11e-06   1.289e-03   1.626e-03   3.54e-07
Comparison of the distributions of ln(XPCLR) scores at the transcriptional start site for
CCT-AB genes and genes with conserved expression between maize and teosinte shows
that CCT genes have a higher mean XPCLR than conserved genes (Figure 3.6). These
two distributions are significantly different in terms of shape (Kolmogorov-Smirnov test,
p = 1.06e-11) and overall mean (t-test, p = 2.21e-10).
A goal of this study was to explore the relative importance of cis versus trans regula-
tory divergence during maize domestication. To address this question, we looked at the
evidence for selection on genes with cis only effects in comparison to genes that had trans
only effects. Genes in the cis and trans only regulatory categories were filtered to only
include those that had consistent effects in the F1 hybrid contrasts. Consistent effect was
defined as 100%, 90%, and 80% of hybrid contrasts favoring the same directionality of
effect. Due to this definition genes in the cis only group were merely the cis only subset
of CCT genes. For the trans only group in this analysis, the trans effect was estimated
from parent and hybrid expression ratios and a weighted percent of hybrid contrasts fa-
voring maize or teosinte alleles was calculated. Fisher’s Exact Tests on 2x2 contingency
tables tabulating cis and trans genes with selection feature genes from Hufford et al.
[55] show cis only genes are significantly enriched (p-value < 0.05) for selection in 7 of
9 comparisons, while trans only genes are never enriched and are actually significantly
underrepresented among selected genes in two cases (Table 3.5).
3.4.6 Microarray and RNAseq data partially correspond
We assessed the degree of correspondence between our CCT genes and 612 differentially
expressed genes identified by a recent microarray study in maize [18]. We constructed
2x2 contingency tables for differentially expressed (DE) and non-differentially expressed
(NDE) genes from the two studies. A Fisher’s Exact Test shows a highly significant degree
of correspondence between the two studies for all three tissue types (Table 3.6). Using our
Table 3.5: Fisher’s Exact Tests for enrichment/depletion of cis and trans only genes in selection features.
Tissue   Regulatory Category   Group      Observed   Expected   p-value
Ear      Cis only              A List     5          1.998      0.043
Leaf     Cis only              A List     3          0.751      0.032
Stem     Cis only              A List     3          1.316      0.138
Ear      Trans only            A List     4          5.327      0.818
Leaf     Trans only            A List     3          2.346      0.506
Stem     Trans only            A List     1          0.282      0.256
Ear      Cis only              AB List    36         24.449     0.018
Leaf     Cis only              AB List    24         13.516     0.006
Stem     Cis only              AB List    32         19.647     0.006
Ear      Trans only            AB List    28         41.954     0.020
Leaf     Trans only            AB List    34         38.388     0.490
Stem     Trans only            AB List    16         12.032     0.222
Ear      Cis only              ABC List   95         70.113     0.002
Leaf     Cis only              ABC List   54         45.427     0.175
Stem     Cis only              ABC List   84         65.615     0.016
Ear      Trans only            ABC List   78         97.036     0.033
Leaf     Trans only            ABC List   91         101.461    0.273
Stem     Trans only            ABC List   42         43.148     0.935
CCT-AB list, ∼25 genes are identified as DE in both studies while about 7 are expected
by chance. However, the absolute level of correspondence between the two studies is
rather low. For example, of the 328 leaf genes identified as DE by RNAseq, only 25 (7%)
were also identified by the microarray study (Supplemental Table C.9). Thus, while the
overlap between our two studies is statistically significant, the two methodologies resulted
in largely different lists of DE genes.
The largely different lists of DE genes identified by microarray and RNAseq analysis
could be due in part to the fact that the microarray analysis includes genes with trans and
cis x trans differences. To assess the proportion of the 612 genes that have trans versus
cis effects, we examined the regulatory categories of the ∼250 differentially expressed
genes (241, 261, 259; ear, leaf, and stem) for which there is both microarray and RNAseq
data (Supplemental Table C.10). About 20% of these genes are classified as trans only
or cis x trans by RNAseq, while 55% are classified as either cis only or cis + trans. The
remainder (25%) are classified as conserved, ambiguous or compensatory. These results
suggest that the very different lists of DE genes from the two technologies are to a large degree
due to differences in tissue, germplasm, environment, sampling error, or technical error,
and that inclusion/exclusion of trans and cis x trans genes by the two studies does not
explain all of the difference.
3.4.7 CCT genes are unrelated to differentially methylated re-
gions
In a recent study, Eichten et al. [93] identified differentially methylated regions (DMRs)
in maize and teosinte. We compiled a list of the nearest genes both upstream and down-
stream of each DMR which gave a list of 332 genes. Of these genes, we have RNAseq data
from 115, 116, and 121 for the ear, leaf, and stem tissues, respectively. Of these genes, 19,
14, and 17 genes were on the CCT-ABC gene lists (Supplemental Table C.11). We asked if
Table 3.6: Fisher’s Exact Tests for the overlap between differentially expressed genes from the microarray study and CCT genes from each of the three experimental tissues in our work.
CCT Group   Overlap    Ear        Leaf       Stem       Union
A           Expected   0.556      0.274      0.359      1.040
            Observed   4          3          2          8
            p-value    2.14e-03   2.28e-03   4.92e-02   7.83e-06
AB          Expected   7.501      6.409      6.248      15.778
            Observed   23         25         25         48
            p-value    1.56e-06   4.84e-09   2.91e-09   1.61e-12
ABC         Expected   21.774     19.363     20.579     46.069
            Observed   52         48         46         90
            p-value    9.58e-10   1.69e-09   1.05e-07   6.34e-12
CCT-ABC list genes are over-represented among the DMR associated genes as compared
to random expectation and found that they are not (Fisher’s Exact Test, p = 0.1092, p
= 0.4309, p = 0.1755; ear, leaf, and stem). Finally, the relationship between methyla-
tion status of the DMR does not correspond with the differential expression of maize vs.
teosinte alleles at CCT-ABC list genes. Rather than observing that the more methylated
allele was expressed at a lower level, the data show that ∼50% of the time, the methylated
allele is expressed higher and ∼50% expressed lower (Supplemental Table C.12).
3.4.8 Dominant and additive gene expression inheritance
The dominance/additivity (D/A) ratio was calculated for genes that were assessed in
at least 15 crosses with three unique maize and five unique teosinte inbred lines. The
overall average of gene D/A ratios was close to zero in all three tissues (Supplemental
Table C.13), suggesting there is not an extreme overall trend for dominance of non-
domesticated teosinte alleles over domesticated maize alleles. Tissues with active devel-
opmental programs, immature ear and seedling stem, are quite close to a 1:1 ratio of genes
with a positive D/A ratio to genes with negative D/A ratio (1.084 and 0.982 for ear and
stem, respectively). In contrast the leaf tissue has substantially more genes with a nega-
tive D/A value (1.287 ratio of positive to negative D/A ratios), indicating a higher rate of
domesticated maize allele dominance in the leaf tissue. Of the three experimental tissues,
two (Ear and Leaf) have an overall mean significantly different from zero (z-test, p <
0.05) and significantly more negative D/A ratios (binomial test, p < 0.05) than positive,
suggesting teosinte allele dominance (Supplemental Table C.13).
The average D/A ratios of the seven regulatory categories and three CCT gene lists
are also fairly close to an overall mean of zero. Even for the smallest CCT-A lists (21
to 43 genes), the average D/A ratio was always less than the fully dominant value of one. Density plots
for D/A ratio grouped by the seven regulatory categories do not show an obvious shift
in distribution (Supplemental Figure C.3). Thus, there is evidence for a weak overall
tendency for dominance of non-domesticated expression levels in the ear and leaf tissues
with no evidence for this teosinte dominance being linked to a specific regulatory category
or candidate CCT gene list.
We compared the proportions of genes showing dominant versus additive gene action
in the cis only and trans only regulatory classes. Our trans only genes will show dominant
gene action when there are haplo-sufficient loss-of-function (LOF) alleles at their trans
regulators. In contrast, the effects of cis regulatory elements are expected to be purely
additive in the absence of transvection or a similar mechanism [94]. When one of our cis only
genes is classified as having dominant gene action, this may also indicate an error in classification
caused by trans effects on its expression that were below the level of statistical
detection. Consistent with the expectation that dominance is more likely for trans only
genes, the proportion of genes classified as dominant is higher for trans only genes in all
three tissue types (Figure 3.7, Supplemental Table C.14).
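The comparison of dominant versus additive gene action between the cis only and trans only classes amounts to a test on a 2x2 table. The following is a minimal sketch of a two-sided Fisher's exact test written with the standard library only. The counts in the example are hypothetical placeholders, not values from Supplemental Table C.14, and the function name is an assumption made for illustration.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = prob(a) * (1 + 1e-9)        # tolerance for floating point ties
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed))

# Hypothetical counts: rows are cis only and trans only genes,
# columns are dominant and additive gene action.
p_value = fisher_exact_two_sided(30, 170, 60, 140)
```

The p-value sums the probabilities of all tables that are no more likely than the observed one, which is the usual definition of the two-sided test.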
It has been proposed that the allelic variants responsible for evolution during domes-
tication are primarily recessive LOF alleles [82]. Under this model, a non-domesticated
allele would be dominant to the recessive LOF domesticated allele. Among our cis only
genes with dominant gene action, dominance of the maize versus teosinte allele does not
differ from the 50:50 expectation (Figure 3.7, Supplemental Table C.14). Among our
trans only genes with dominant gene action, the maize allele is dominant to the teosinte
allele more often than expected by chance. These results are counter to the proposal that
domestication favored recessive LOF alleles.
Figure 3.7: The proportion of genes showing dominant (red) versus additive (blue) gene action for cis only and trans only AB lists. For all tissues, trans only genes have a higher rate of dominance; however, this difference is only significant for the ear and leaf tissues (Fisher’s exact test, p < 0.005 indicated by “*”). The proportion of genes in the trans only lists that are dominant for the teosinte allele (green) and the maize allele (yellow) is shown in the barplot to the right of each pie graph. There is significant deviation from the neutral expectation (1:1) for the ear and leaf tissue (binomial test, p < 0.005 indicated by “*”).
3.4.9 Candidate genes enriched in various functional categories
We examined our list of CCT genes for enrichment of several functional classes of maize
genes including transcription factors, genes in known metabolic pathways, genes underlying
QTL, and gene ontology (GO) groups. First, a list of maize transcription factors and
their corresponding families was compiled from the transcription factor database [86].
Although CCT genes (AB-list) were found to be slightly enriched for several transcription
factor families (ARF, MADS-MIKC, and LBD) by Fisher’s Exact Tests, these results do
not stand up to Bonferroni multiple test correction (Supplemental Table C.15). We
conclude that there is no compelling evidence that CCT genes are enriched for transcription
factors.
Our list of CCT (AB list) genes was also compared with results from a recent QTL
mapping experiment for a number of domestication and improvement traits [25]. We
compared observed vs. expected overlap between CCT genes from the three tissues to
the genes located within 1.5 LOD QTL support intervals for 16 traits. Testing was done
on a trait by trait basis and restricted to 1.5 LOD QTL intervals containing 20 or fewer
genes. After correction for multiple testing (Bonferroni), no significant enrichment for
CCT-AB genes in domestication QTL was observed (Supplemental Table C.16). The
greatest enrichment was seen with the trait ear diameter for which there were four CCT
genes assayed in ear tissue within the QTL interval when only 1.22 were expected by
chance (Fisher’s Exact Test, p = 0.03).
A test for enrichment of CCT and trans only genes in 15 different metabolic pathways
defined in the Kyoto Encyclopedia of Genes and Genomes (KEGG) was done using
Fisher’s Exact Test on 2x2 contingency tables. There was no compelling evidence for
enrichment or depletion of either group of genes in any of the 15 pathways tested
(Supplemental Table C.17). The smallest p-value identified was for the cutin, suberine, and wax
biosynthesis pathway in leaf tissue for trans only genes (p = 0.012); however, this result does
not remain significant after Bonferroni multiple test correction.
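The effect of the Bonferroni correction on the smallest KEGG p-value can be checked with a short sketch. It assumes, for illustration only, a family of 15 tests (one per pathway); the number of tests actually corrected for in Supplemental Table C.17 may be larger, which would only make the adjusted value less significant. The 14 filler p-values are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Smallest reported p-value (0.012) plus 14 hypothetical p-values of 0.5.
raw = [0.012] + [0.5] * 14
adjusted = bonferroni(raw)
significant = [p < 0.05 for p in adjusted]   # adjusted[0] is about 0.18
```

Under this assumption the adjusted p-value for the cutin, suberine, and wax pathway is roughly 0.18, well above a 0.05 threshold.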
We tested for GO term enrichment and depletion in the CCT and trans only gene
lists. These analyses found significant GO term associations in the leaf CCT-ABC gene
list for five different categories including enrichment for chloroplast, plastid, thylakoid,
and chloroplast thylakoid membrane, and depletion for DNA binding (Supplemental Ta-
ble C.18). For trans only genes, significant enrichment for a number of GO terms in the
ear tissue was detected for transcription factor and photosynthesis related terms with
additional enrichment for ribosomal GO terms found in the leaf tissue (Supplemental
Table C.18).
3.5 Discussion
3.5.1 Regulatory change between and within maize and teosinte
Of the ∼17,000 genes assayed, 70% have significant cis and/or trans regulatory differences,
suggesting considerable regulatory change has occurred during maize domestication and
subsequent crop improvement. A similar proportion of genes was found to have cis
and/or trans differences in recent studies of two species of Drosophila [21] and of
yeast [73]. This high amount of variation between maize and teosinte is not surprising
given the incredible diversity of maize. Simple presence and absence of gene expression
within maize itself is quite variable as shown in a recent study where 27.9% of genes were
only expressed in a subset of maize inbred lines [95]. Additionally, this study found over
a thousand novel genes not present in the reference B73 genome, suggesting considerable
presence/absence variation (PAV) also exists within maize. This finding is consistent
with another study where PAV and copy number variation (CNV) were assessed, finding
hundreds of CNVs and thousands of PAVs that included at least 180 single copy genes
[96]. These CNVs and PAVs are accompanied by millions of additional SNPs both within
and between genes [97]. In light of the known diversity within maize, it is not particularly
surprising to see evidence for prolific cis and trans regulatory variation in gene expression
between maize and teosinte.
Gene expression differences between populations account for only some of the variation
seen in the dataset. There is also a large amount of variation within the maize and
teosinte populations. Only considering cis differences through F1 hybrids, upwards of 60%
of genes have evidence for multiple maize or teosinte expression levels and consequently
multiple alleles within population. Furthermore, our study shows a drop in expression
variation in maize consistent with the reduction in overall diversity caused by the domesti-
cation/improvement bottleneck with an even greater reduction in expression variation for
genes thought to be under additional artificial selection (CCT candidate genes) [55, 98].
The high level of expression variation still present in teosinte represents an unexplored
source of diversity in maize, which may be useful for future crop improvement and plant
breeding efforts.
This study sheds light on the large amount of expression variation within and between
maize and teosinte. However, only a small fraction of this diversity results in consistent
expression differences that distinguish maize and teosinte inbred lines. The relatively
small number of genes in this study showing consistent expression differences between
maize and teosinte (∼1000 of 17,000, ∼6%) is similar to the fraction of genes seen in
another recent study by Swanson-Wagner et al. [18]. Thus, this study reveals an immense
amount of regulatory diversity within and between maize and teosinte, while also showing
only a small fraction of this diversity appears to be fixed for discrete expression patterns
that distinguish maize and teosinte populations.
3.5.2 What is the frequency of cis and trans regulatory change?
Our study shows cis and trans regulatory differences occur at a similar frequency. How-
ever, this is only part of the story, since we also show that cis effects are arguably more
important for the generation of large divergence in expression between maize and teosinte
(Figure 3.3). Our observation of cis effects accounting for the majority of large expression
differences was also seen in a recent Drosophila study by McManus et al. [21]. The
frequencies of cis and trans regulatory differences in our sampling of maize and teosinte are
fairly similar in the three experimental tissues and consistent with work in Drosophila;
however, cis regulatory effects account for a significant portion of large expression
divergence.
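The share of expression divergence attributable to cis effects can be made concrete with a small sketch. It assumes a common convention from allele specific expression studies that this section does not spell out: the cis effect is the log2 allele ratio in the F1 hybrid, the trans effect is the parental log2 ratio minus the F1 log2 ratio, and the cis proportion is |cis| / (|cis| + |trans|). The function name and example values are hypothetical.

```python
def proportion_cis(parent_log2_ratio, hybrid_log2_ratio):
    """Fraction of total regulatory divergence attributed to cis effects.

    Assumed convention: cis = F1 allele ratio, trans = parent ratio - F1 ratio.
    Returns None when there is no divergence to partition.
    """
    cis = hybrid_log2_ratio
    trans = parent_log2_ratio - hybrid_log2_ratio
    total = abs(cis) + abs(trans)
    if total == 0:
        return None
    return abs(cis) / total

# Hypothetical genes given as (parent log2(M:T), F1 log2(M:T)).
examples = {
    "all cis": proportion_cis(2.0, 2.0),       # hybrid ratio equals parent ratio
    "all trans": proportion_cis(2.0, 0.0),     # alleles equal in the hybrid
    "half and half": proportion_cis(2.0, 1.0),
}
```

Because absolute values are used, opposing cis and trans effects both count toward the total rather than cancelling.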
In a recent study, Swanson-Wagner et al. [18] used microarrays to assess expression in
a number of maize and teosinte inbred lines, many in common with our RNAseq based
study. They found relatively few genes (612 of ∼18,000) with differential
expression between maize and teosinte. Of the genes assayed in both our RNAseq study
and the Swanson-Wagner microarray experiment, all seven regulatory categories were
found, with approximately 25% classified as cis only, 10% as trans only, and 25% as cis
plus trans. While only ∼50% of the microarray differentially expressed genes were classified
as cis only or cis plus trans in our study (potential CCT candidate genes), the overall
low correlation between our RNAseq and the Swanson-Wagner microarray experiment
makes direct comparison difficult. Comparisons made between two parental samples will
identify genes with cumulative cis plus trans regulatory differences. Consistent with this
expectation, cis only, cis plus trans, and trans only were the three most frequent regulatory
categories assigned to differentially expressed microarray genes.
A prominent hypothesis in evolutionary biology is that mutation in CREs of func-
tionally conserved proteins is the primary mechanism by which morphological evolution
occurs [12]. In this hypothesis, mutation of the CREs of highly pleiotropic “master reg-
ulator” genes, and the resulting downstream effects, contribute substantially to overall
morphological change, which if true predicts large scale rearrangement of gene expression
networks based on trans effects. While it is true trans effects occur at a high frequency
in this study, these effects are accompanied by an equal number of larger cis regula-
tory driven expression differences. Thus, we believe the changes to gene regulation during
maize domestication are best interpreted as frequent “shaving” of expression by cis regula-
tory change to fine-tune various pathway elements in addition to the broader adjustments
to whole pathways through trans regulatory differences.
3.5.3 Tissue specific expression of CCT candidates
We compared the expression of genes identified as candidates between tissues. There
was significantly more overlap between the candidate genes from the three experimental
tissues than expected by random chance (Permutation tests, p < 1e-5, Figure 3.1). This
suggests a high degree of shared cis regulatory effects between tissues. The functioning
of CREs in multiple tissues is also supported by the high observed correlation between
the direction and magnitude of cis effect in different tissues (Adj. R2 ≈ 80%, Pearson
correlation ≈ 90%). These results suggest many CREs function in multiple tissues to
drive expression of a gene.
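A permutation test for the overlap between candidate lists from two tissues could be sketched as follows. The exact resampling scheme used in this study is not described in this section, so the sketch assumes the simplest version: one list is held fixed and random gene sets of the same size as the other are drawn from all assayed genes. Gene identifiers, list sizes, and the function name are hypothetical.

```python
import random

def overlap_permutation_test(universe, list_a, list_b, n_perm=10000, seed=0):
    """One-sided permutation p-value for the overlap between two gene lists.

    list_a is held fixed; random sets the size of list_b are drawn from the
    universe of assayed genes and their overlap with list_a is recorded.
    """
    rng = random.Random(seed)
    set_a = set(list_a)
    set_b = set(list_b)
    observed = len(set_a & set_b)
    pool = list(universe)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(pool, len(set_b))
        if len(set_a.intersection(sample)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical example: 1000 assayed genes, two lists of 100 sharing 50 genes.
universe = range(1000)
ear_list = range(0, 100)
leaf_list = range(50, 150)
observed, p_value = overlap_permutation_test(universe, ear_list, leaf_list, n_perm=2000)
```

With this estimator the smallest attainable p-value is 1 / (n_perm + 1), so supporting a bound such as p < 1e-5 requires more than 100,000 permutations.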
While there is evidence for significant overlap of CCT genes between tissues, a very
high proportion of total CCT genes (∼70%) are only found in a single tissue. The lowest
overlap between tissues for the CCT-AB list (52 genes) was between the ear and leaf
tissue, arguably the two most developmentally different tissues studied. This trend is
seen in candidate genes as well as when considering all assayed genes. There have been
relatively few genome-wide studies using F1 hybrids to dissect cis and trans effects and
even fewer that consider multiple tissues [69, 72], but our results are consistent with these
previous studies, where ∼70% of the genes detected were found in a single tissue. Overall,
many CCT genes are shared between tissues, but the majority of genes are tissue specific,
suggesting modification of both globally active and tissue specific CREs occurred during
maize domestication.
Even though gene expression is highly correlated between tissues, there is evidence for
approximately 20% more functional, consistent cis regulatory changes in the ear, as seen in
the larger number of CCT genes in the ear tissue (555) than in leaf (445) and stem (431).
A similar imbalance in the number of differentially expressed genes among tissues was
observed in a recent study in Arabidopsis [72], where the three studied tissues
had an approximately 80% difference in number of differentially expressed genes. The
maize and teosinte ear have massive morphological differences in terms of size, placement
of spikelets, glumes, and the absence of a fruit case. These morphological differences may be
due in part to these frequent tissue specific cis regulatory differences. This observation
is again at odds with the view of large morphological change in evolution/domestication
caused by mutation of CREs for a few “master regulator” genes [12]. Instead, these data
again highlight the many single gene expression changes produced by “shaving” of allele
specific expression through modification of multiple tissue specific CREs.
3.5.4 Bias toward increased maize expression?
In the F1 hybrid analysis ∼55% of genes have higher expression of the maize allele than the
teosinte allele. High expression of the maize allele also occurs in the comparison between
parent inbred lines, except for leaf, where there is the same number of genes favoring
maize and teosinte alleles. This same trend of up regulated maize expression extends to
the CCT gene lists, where ∼60% of genes favor the maize allele. Our observation of high
expression for one of the parents (maize) is also consistent with several previous studies in
multiple organisms including maize [18], cotton [17], Arabidopsis [72], Cirsium [68], and
fruit fly [21]. Our experimental method using parent derived pseudo-transcriptomes and
perfect alignment to segregating sites should ameliorate the issue of alignment bias, but
we cannot be sure that we have fully eliminated it. While potential alignment bias prevents
firm conclusions, genes consistent across all maize and teosinte inbreds are less likely to
be artifacts, suggesting the overall bias for maize alleles seen in candidate genes is a real
phenomenon.
3.5.5 Selection-candidates enriched for cis regulatory change
Changes in gene expression, specifically through altered CREs, are not uncommon in the
history of domesticated crops. These changes have led to increased fruit size in tomato
[16], maize apical dominance [3, 40, 99], loss of prolificacy in maize [4], and changes
in rice yield and flowering time [57, 58]. These examples represent cases where large,
sometimes pleiotropic genetic changes are caused by single genes. There is no disputing
the important role of these types of genetic changes in creating some of the world’s most
productive crops. However, this study sheds light on the hundreds of other genes with
differential expression patterns, caused by CREs, between maize and teosinte.
These hundreds of genes with regulatory differences between maize and teosinte are
enriched in selection features [55] and have stronger selection upstream and at the gene
in comparison to conserved genes. Positive selection for regulatory effects is restricted to
genes specifically with CRE differences, since genes with trans only regulatory change are
never enriched for selected genes. Genes with consistent CREs differentiating
populations are not all likely to play large, equal, or even critical roles in the domestication of
maize. However, corroborating evidence such as selection scans can provide the information
needed to elucidate truly important players in the domestication process, even if discovering the
function for all of these genes in domestication is likely an impossible task.
One example of how data from other sources, such as selection scans, can help shed
light on candidates is the importance of cis effect magnitude. A number of genes in
this study show large shifts in expression between maize and teosinte (log2(M:T) > 10);
however, the magnitude of cis effect has no correlation with strength of selection,
suggesting magnitude of effect is not particularly important. In retrospect, this is not surprising
considering subtle changes in gene expression are known to cause drastic phenotypic differ-
ences. New tissue specific shifts in gt1 expression largely led to elimination of secondary
ears in maize [4] and a relatively moderate 2-fold change in expression of tb1 leads to
greatly increased apical dominance [3, 40]. In light of this result, selection on CREs dur-
ing maize domestication may be best characterized as subtle fine-tuning of expression
patterns to generate phenotypic change.
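The two scales used in this discussion, fold change and log2 ratio, are related by a simple conversion, sketched below. The function names and the example read counts are illustrative assumptions. The point of the sketch is that a 2-fold change corresponds to a log2 ratio of 1, whereas a log2 ratio of 10 corresponds to a 1024-fold difference.

```python
import math

def log2_ratio(maize, teosinte):
    """log2(M:T) expression ratio from two positive expression values."""
    return math.log2(maize / teosinte)

def fold_change(log2_value):
    """Fold change corresponding to a log2 ratio."""
    return 2.0 ** log2_value

two_fold = log2_ratio(200.0, 100.0)   # hypothetical expression values
large_shift = fold_change(10.0)       # fold change implied by log2(M:T) = 10
```

On this scale the moderate tb1 difference and the largest cis effects in the dataset differ by roughly three orders of magnitude in fold change.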
3.5.6 Leaf tissue candidates are enriched for photosynthesis and
chloroplast GO terms
A number of gene ontology terms implicated in photosynthesis and carbon fixation were
found to be enriched in the leaf CCT-ABC list. Mapping these genes back to photosynthesis
and carbon fixation pathways shows two components in the photosystem I receptor
as well as part of the ATP synthase (delta subunit). Additionally, a number of enzymes
involved in carbon fixation were found to be up or down regulated in maize through cis
regulatory means. Most of these enzymes were involved in reactions converting malate to
other substrates in carbon fixation.
Cytosolic and mitochondrial forms of malate dehydrogenase (mdh) were two of the
identified differentially expressed genes. Mdh2, a mitochondrial form, is higher in teosinte,
whereas mdh4, cytosolic, is expressed at a higher level in maize. These expression dif-
ferences suggest there were changes made to malate-oxaloacetate flux between the mi-
tochondria and cytoplasm during maize domestication. Movement of oxaloacetate (OA)
has important implications in energy metabolism and photorespiration [100, 101]. The
changes in expression suggest there may be lower conversion between OA and malate
within the mitochondrial matrix, leading to reduced malate in the mitochondria and
reduced transport of OA into mitochondria. In theory this would leave more OA in the
cytoplasm where it would be available for conversion to malate and transport to bundle
sheath cells for photosynthesis. This could lead to improved rates of photosynthesis in
maize. However, these results should be treated with caution, since the malate dehydrogenase
enzymes identified are on a secondary candidate gene list and are not considered
to be our best candidates.
3.5.7 Do crop domestication genes show cis differences?
Domestication is characterized by a number of common phenotypes including gigantism,
loss of prolificacy, loss of shattering, changes to pollination mechanisms, apical dominance,
and branching that are collectively considered the domestication syndrome [10, 11]. While
domestication syndrome is characterized by several common phenotypes, the genetic mod-
ifications that lead to these traits may or may not be due to changes in homologous genes.
Genes such as waxy [102–104], tb1 [3, 105], and ghd7 [50, 57] were selected on in
multiple crop species; however, there are many more unique genes
controlling domestication traits [106–109]. To get a sense of the regulatory status of several
crop domestication genes in maize, we generated a list of 28 domestication genes (6
maize and 22 non-maize) and identified the closest homologous gene in maize by protein
to protein BLAST (Table 3.7). Of these 28 genes, only sugary1 from maize, an isoamylase
starch debranching enzyme, in the ear was on the CCT-B gene list. Furthermore, only two
of the remaining genes were on the C list. The inability to identify cis regulatory changes
for maize homologs of non-maize domestication genes suggests cis regulatory change in
a domestication context may tend to operate on unique genes in different organisms as
opposed to a single gene with conserved functions in multiple species.
Table 3.7: Regulatory category of the closest maize homolog of 6 maize and 22 non-maize domestication loci.
                                              Ear               Leaf              Stem
Organism   Locus Name  Functional Change      Reg. Cat.    CCT  Reg. Cat.    CCT  Reg. Cat.    CCT
Maize      tga1        Coding                 trans only   -    -            -    -            -
Maize      ZmYAB2.1    Expression             cis + trans  -    -            -    -            -
Maize      Sh2         Expression             conserved    D    trans only   D    comp.        D
Maize      Su1         Coding                 cis only     B    comp.        D    -            -
Maize      gt1         Expression             trans only   D    -            -    cis x trans  D
Maize      tb1         Expression             cis + trans  D    -            -    ambiguous    D
Amaranths  waxy        Coding                 cis + trans  D    cis x trans  D    conserved    D
Barley     Nud         Deletion               -            -    trans only   D    trans only   D
Brassica   qFT10-4     Expression             trans only   D    -            -    -            -
Brassica   BoCAL       Coding                 cis only     C    cis x trans  -    trans only   -
Pea        PsELF3      Coding                 cis x trans  D    trans only   D    cis + trans  D
Rice       DTH2        Unclear                trans only   D    conserved    D    conserved    D
Rice       GS6         Coding                 ambiguous    -    conserved    -    -            -
Rice       GS5         Expression             conserved    D    ambiguous    D    conserved    D
Rice       qSH1        Expression             cis only     D    conserved    D    trans only   D
Rice       shat1       Coding                 comp.        D    ambiguous    D    trans only   D
Rice       Bh4         Coding                 -            -    conserved    -    -            -
Rice       TAC1        Expression             cis x trans  D    cis only     C    cis only     D
Rice       GW2         Coding                 comp.        D    cis only     D    cis only     D
Rice       Ehd1        Coding                 cis x trans  D    cis + trans  D    trans only   D
Rice       BADH2       Coding                 cis x trans  D    trans only   D    cis only     D
Rice       OsSPL16     Expression             cis only     -    -            -    -            -
Rice       qPE9-1      Loss of Function       comp.        D    trans only   D    -            -
Sorghum    Sh1         Expression             cis + trans  -    -            -    -            -
Sorghum    Tannin1     Coding                 conserved    -    conserved    -    trans only   -
Tomato     FAS         Expression             cis + trans  -    -            -    -            -
Wheat      Q           Coding and expression  cis x trans  D    trans only   D    conserved    D
Wheat      Vrn1        Expression             cis only     -    -            -    -            -
3.5.8 A catalog of genes with cis regulatory variation
A product of this study, similar to selection scans, is a list of candidates for future
investigation. The complete set of 25,000 genes (with information on RNAseq read counts,
parent and F1 expression ratios, regulatory classification, and other summary
information) will be a valuable tool to investigators for screening for new genes of interest and
answering preliminary questions about the expression of specific genes.
For example, one attractive CCT candidate gene is barren stalk1 (ba1), a known
maize single gene mutant that causes a defect in branch formation in both the whole plant
and tassel [110]. The wild type function of ba1 is inferred to be in branch initiation. In
our study, ba1 was one of our strongest candidates with all assayed crosses showing higher
expression of the maize allele in the ear. The overall shift in expression was substantial
(∼4-fold) and this shift is caused by cis regulatory differences alone. ba1 was also found
to be under selection during maize domestication in two independent studies [55, 110].
These combined observations suggest that there was selection for a CRE that drives the
upregulation of ba1 in the ear, perhaps resulting in a greater number of rows (branches)
of kernels in the maize ear as compared to the teosinte ear. Compelling evidence for this
hypothesis could be obtained by fine-mapping and identifying the hypothesized CRE and
demonstrating with expression assays that the maize and teosinte alleles of the CRE have
the predicted effects on gene expression during ear development and on phenotype (kernel
row number) in the adult ear. ba1 illustrates the power of genomic scans to identify
strong candidates for future study that can inform us about the fine details of evolution
under domestication.
Appendices
Appendix A
Supplemental Content: Genetic dissection of a genomic region with pleiotropic effects on domestication traits in maize reveals multiple linked QTL
A.1 Figures
Figure A.1: Histograms of the least squares means for phenotyped traits from the QTL mapping population. Several of these distributions are approximately normal, but other traits take on an exponential distribution. The average least squares mean for NIRILs with 100% maize and teosinte genotypes is indicated with an arrow and “M” for maize and “T” for teosinte.
Figure A.2: Example histograms of simulated traits for several different conditions in terms of number of causative loci, effect size, and heritability. Histograms from traits with equal effects - 67% H2, equal effects - 90% H2, gamma distributed effect - 67% H2, and gamma distributed effect - 90% H2 are shown in different columns from left to right. Histograms from simulated traits with one, five, ten, twenty, fifty, seventy-five, and one hundred causative loci are shown from top to bottom. The average simulated phenotype values for NIRILs that are 100% maize and teosinte are indicated with arrows labeled by “M” for maize and “T” for teosinte.
Figure A.3: Proportion of detected QTL with zero, one, or multiple causative genes in the 1.5 LOD support interval. As seen in the equal effect size simulations, a high number of gamma distributed causative genes leads to detected QTL with multiple causative factors. There is a reasonable percentage of detected QTL in the simulations containing a single causative gene when few (fewer than 4) causative genes are simulated, but as the number of simulated causative genes increases we quickly lose the power to distinguish between closely linked causative genes and they become lumped into single detected QTL.
A.2 Tables
Table A.1: RFLP Markers used during backcrossing of QTL mapping population.
Marker     Chromosome  Marker     Chromosome
bnl5.62    1           php20725   4
umc157     1           umc19      4
umc37b     1           umc127a    4
npi255     1           bnl10.17b  4
BZ2        1           umc15      4
bnl8.10    1           bnl8.23    4
npi615     1           bnl8.33    5
umc107     1           bnl6.25    5
npi225     1           umc90      5
bnl8.45    2           umc27      5
umc53      2           umc166     5
npi320     2           bnl7.71    5
npi421     2           npi412     5
umc6       2           umc54      5
umc34      2           umc127b    5
umc134     2           umc104a    5
umc131     2           bnl6.29    6
umc2b      2           umc65      6
umc5a      2           umc21      6
php20005   2           umc46      6
umc122     2           umc132     6
umc49a     2           umc62      6
umc36      2           npi114     8
umc32      3           bnl9.11    8
umc121     3           umc117     8
php20042   3           umc7       8
umc42b     3           npi253     9
umc161     3           umc113     9
umc18      3           umc81      9
TE1        3           umc95      9
bnl5.37    3           bnl3.04    10
bnl8.01    3           umc130     10
umc60      3           umc49b     10
bnl12.97   3           umc117b    10
php10080   3           bnl7.49    10
npi425     3
umc2a      3
Table A.2: Genetic markers used to score BC6S6 mapping population.
Marker         Genetic Position  AGPv2 Position
umc2036        0.00              6,985,618
bnlg565        6.54              8,492,871
bnlg105        20.90             13,812,586
phi008         21.54             14,072,755
umc2293        25.26             15,110,054
umc2060        27.79             16,462,750
bnlg1046       31.75             18,701,374
umc2035        42.17             23,891,611
umc1705        45.36             28,196,243
umc1056        48.10             32,036,007
umc2294        48.43             33,783,084
umc1935        53.24             51,438,549
umc1850        54.79             54,416,924
mmp58          61.98             74,916,830
GRMZM2G116761  63.55             82,236,166
umc2298        65.07             84,800,717
umc1110        65.39             84,825,409
umc1224        66.70             92,368,617
umc1283        67.52             111,997,867
bnlg1287       67.69             121,584,002
dupssr10       68.70             142,483,421
bnlg2323       74.26             151,717,831
ZHL0301        77.01             159,447,730
umc1348        81.83             166,576,639
umc1966        86.64             169,231,037
Appendix B
Supplemental Content: Fine mapping of chromosome five domestication genes in maize
B.1 Tables
Table B.1: PCR markers used for genotyping RCNILs including gene or SNP target, AGPv2 position, and primer sequence.
Gene or SNP Name  AGPv2 Position  Primers
GRMZM2G003313     38,994,478      CCACAGAATCTCTCCACCAGACTTTTGCTTCTCACCCCAGA
GRMZM2G048045     62,595,351      GCCTACGAGCTGCAACAGGGCCCTCCGTTCTACACACAG
GRMZM2G116761     82,236,265      TCGCATCTGGAAAGAGCTTC TGAATTGCAAAAGAGGAAACA
PZE-105075181     82,970,868      GGCCCGGGCTAGAGAACCGAGTGCGGAGCTTGGGACCGAC
GRMZM2G158520     82,952,563      TCGGGCACGAAAGGTGTCGCCACTCTCTCCCGCTCCCGCT
GRMZM2G387127     83,436,098      CGCAAGCCGATCTTTTACTCGCAGTTGAACTCGAAGTGGA
GRMZM2G387127     83,436,808      CGCAAGCCGATCTTTTACTCGCAGTTGAACTCGAAGTGGA
GRMZM2G026117     84,249,368      CTCAGGCCAAGGTCTCACTCAGAGTGTGCGGCTTTCAGTT
umc1110           84,825,350      TTACACCAAGGTCCGAAACAAGATTCTTGGAAGGCAAGACTCTACCTG
PZE-105076775     85,553,605      CAAACCTCCCAAGAGAATGCTTGATGCAGATTCGCTGAAC
GRMZM2G017882     85,864,165      GTCCGCCTCGGCGACCTAGACCAGAGGGGACCTGTGGGGG
AC207043.3 FG002  86,014,290      CCACACTCATTTGACCAACGTGACGCGTGTTCTAGCTTGT
AC207043.3 FG002  86,014,338      CCACACTCATTTGACCAACGTGACGCGTGTTCTAGCTTGT
PZE-105077135     86,221,700      AAAGACGCAGCAGGAGAGAGTGCTACGTTACAGGCTGTCG
GRMZM2G102758     86,783,453      AGCAGGGTCAAGGACTACCATCCTGCAGCTCCTCTTCTTC
GRMZM2G063106     87,114,719      TGCATTTCTCTGACCTCCTTGTCCGACTTGAGGATCCTGTT
umc1283           111,997,810     CTGCTCCCTTATGATGTGATGATGTGCACTGAGGTGTAGGTAGAGCAA
GRMZM2G012923     151,446,717     AGCAAAGCATGGGCTAGTGTGCCATGCTGCTTATGGATCT
GRMZM2G027886     159,447,674     AACAGCTTTGCTTCCCTGAACCCAGAGGATCCAGAGTCAG
umc1348           166,576,570     CTCACTGACACTTGAACACACACGTTACTGGTCTCCTGATCCTTAGCG
umc1221           168,671,954     GCAACAGCAACTGGCAACAG AAACAGGCACAAAGCATGGATAG
umc1966           169,230,959     GTTTTCGACGAGGGGACTACATTTCACGGTTGAGAACTTCGCTTGTAG
Appendix C
Supplemental Content: The role of cis regulatory evolution in maize domestication
C.1 Figures
Figure C.1: Parent versus hybrid leaf tissue allele specific expression ratios. The parent (x-axis) and F1 hybrid (y-axis) allele specific expression ratios are plotted against each other. Regulatory category, determined from the combination of significant statistical tests as described in the methods, is designated by color. Proportion and count of genes falling into the various regulatory categories are also shown in the lower right hand corner barplot.
Figure C.2: Parent versus hybrid stem tissue allele specific expression ratios. The parent (x-axis) and F1 hybrid (y-axis) allele specific expression ratios are plotted against each other. Regulatory category, determined from the combination of significant statistical tests as described in the methods, is designated by color. Proportion and count of genes falling into the various regulatory categories are also shown in the lower right hand corner barplot.
Figure C.3: Dominance by additivity ratio grouped by regulatory category. Density plots of gene dominance by additivity (D/A) ratios for the three tissues grouped by regulatory category. There is no obvious shift in the distribution for any of the tissues or regulatory categories, indicating the gene regulatory category does not significantly impact overall additivity or dominance.
C.2 Tables
Table C.1: Biological replicates of F1 hybrid and parent inbred lines for RNAseq expression study with hybrid replicates internal and parent around the perimeter.
B73 CML103 Ki3 Mo17 Oh43 W22 Inbred
TIL01 2/2/2 0/2/2 2/2/2 2/2/2 2/2/2
TIL03 2/1/1 2/2/2 1/2/2 1/2/2 2/2/2 2/1/1
TIL05 2/2/2 2/2/2
TIL09 2/2/1 2/2/2 3/2/2 2/2/2 2/2/2
TIL10 2/2/2 2/2/2
TIL11 2/2/2 2/2/2 2/2/2 2/2/2 2/2/2 2/2/2
TIL14 4/2/2 2/2/2 2/1/2 2/2/2 1/2/2 2/2/2
TIL15 2/2/2 2/2/2
TIL25 4/3/2 3/2/2 2/2/2 2/2/2
Inbred 2/2/2 2/2/2 2/2/2 2/2/2 2/2/2 2/2/2
Table C.2: Adapter name, barcode sequence, and barcode length for Illumina adapters used in RNAseq libraries.
Adapter #  Adapter Name  Barcode Sequence  Barcode Length
1          PE YC3        GCATGT            5 nt
2          PE YC4        TGTGCT            5 nt
3          PE YC5        AGTCAT            5 nt
4          PE YC6        GTAAGT            5 nt
5          PE YC7        TCCTCT            5 nt
6          PE YC8        CAGGTT            5 nt
7          PE JM 1       TCCAT             4 nt
8          PE JM 2       TAGCT             4 nt
9          PE JM 3       GTTCT             4 nt
10         PE JM 4       CGATT             4 nt
11         PE TB 1       ATCGT             4 nt
12         PE TB 2       GCTAT             4 nt
13         PE TB 3       TGGAT             4 nt
14         PE TB 4       ATGCT             4 nt
15         PE ZL1        CACTAT            5 nt
Table C.3: Number of genomic paired end reads and coverage obtained for constructing pseudo-transcriptomes.
Inbred Line  # Reads   Genome Coverage
CML103       4.46E+08  21.24
Ki3          4.38E+08  19.85
Mo17         2.57E+08  11.37
Oh43         5.59E+08  20.56
TI01         3.44E+08  14.5
TI03         3.16E+08  13.15
TI05         4.76E+08  17.8
TI09         3.42E+08  15.21
TI10         5.29E+08  24.29
TI11         3.41E+08  15.97
TI14         3.22E+08  13.82
TI15         5.39E+08  24.22
TI25         4.27E+08  19.93
W22          3.07E+08  13.19
Average      4.03E+08  17.50714
Table C.4: Proportion of divergence due to cis regulatory effect grouped by overall parental divergence.
Gene Group¹  N  Tissue  % cis ± SE
All genes 15939 Ear 0.4519 ± 0.0021
0 to 1 14140 Ear 0.4583 ± 0.0022
1 to 2 1312 Ear 0.3918 ± 0.0081
2 to 3 268 Ear 0.3524 ± 0.0188
3 to 4 95 Ear 0.337 ± 0.0298
4 to 5 45 Ear 0.4713 ± 0.0495
5+ 79 Ear 0.7777 ± 0.0273
All genes 15925 Leaf 0.4164 ± 0.0021
0 to 1 13784 Leaf 0.4262 ± 0.0022
1 to 2 1739 Leaf 0.3309 ± 0.0065
2 to 3 277 Leaf 0.3752 ± 0.0173
3 to 4 52 Leaf 0.4458 ± 0.0437
4 to 5 21 Leaf 0.6534 ± 0.0566
5+ 52 Leaf 0.7707 ± 0.0298
All genes 16018 Stem 0.4704 ± 0.0021
0 to 1 14746 Stem 0.4715 ± 0.0022
1 to 2 1000 Stem 0.4284 ± 0.0096
2 to 3 149 Stem 0.4629 ± 0.0233
3 to 4 40 Stem 0.5051 ± 0.0539
4 to 5 23 Stem 0.6365 ± 0.059
5+ 60 Stem 0.8081 ± 0.0248
¹ Group (except for “All genes”) indicates grouping of genes by the absolute value of the parent log2(Maize:Teosinte) ratio.
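The grouping described in the footnote to Table C.4 can be sketched in a few lines of Python. This is only an illustration: the function name `divergence_group`, the half-open bin edges (a ratio of exactly 1 falls in "1 to 2"), and the use of strictly positive expression values are assumptions, not details taken from the dissertation.

```python
import math


def divergence_group(maize_expr: float, teosinte_expr: float) -> str:
    """Return a Table C.4-style group label for one gene.

    Assumes both parental expression values are strictly positive and
    that bins are half-open, [k, k + 1), which the source does not state.
    """
    abs_log2 = abs(math.log2(maize_expr / teosinte_expr))
    if abs_log2 >= 5:
        return "5+"
    lower = int(abs_log2)  # floor, because abs_log2 is non-negative
    return f"{lower} to {lower + 1}"


if __name__ == "__main__":
    # Hypothetical parent expression values, for illustration only.
    for maize, teosinte in [(1.0, 1.0), (3.0, 1.0), (1.0, 64.0)]:
        print(maize, teosinte, divergence_group(maize, teosinte))
```

Taking the absolute value means an eight-fold excess of maize expression and an eight-fold excess of teosinte expression land in the same group.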
Table C.5: The number of genes for which the maize or teosinte allele is expressed at a higher level.
CCT Group  Tissue  Maize  Teosinte
A Ear 34 9
A Leaf 16 5
A Stem 19 8
B Ear 319 193
B Leaf 265 159
B Stem 249 155
C Ear 594 396
C Leaf 533 310
C Stem 558 382
ABC Ear 947 598
ABC Leaf 814 474
ABC Stem 826 545
Table C.6: Bias for the maize allele grouped by inbred line for the three tissues in the CCT-ABC gene list.
Maize Inbred  CCT Group  Tissue  Teosinte Bias  No Bias  Maize Bias  Maize:Teosinte Ratio
B73 ABC Earᵃ 569 1 975 1.7135
CML103 ABC Ear 661 6 839 1.2693
Ki3 ABC Ear 602 5 915 1.5199
Mo17 ABC Ear 605 12 845 1.3967
Oh43 ABC Ear 594 1 949 1.5976
W22 ABC Ear 640 4 889 1.3891
non-B73 ABC Ear 606 0 939 1.5495
B73 ABC Leafᵇ 465 0 823 1.7699
CML103 ABC Leaf 556 6 688 1.2374
Ki3 ABC Leaf 506 3 760 1.5020
Mo17 ABC Leaf 478 4 765 1.6004
Oh43 ABC Leaf 477 0 807 1.6918
W22 ABC Leaf 502 1 775 1.5438
non-B73 ABC Leaf 494 0 794 1.6073
B73 ABC Stemᶜ 524 0 847 1.6164
CML103 ABC Stem 582 7 739 1.2698
Ki3 ABC Stem 555 2 793 1.4288
Mo17 ABC Stem 520 4 806 1.5500
Oh43 ABC Stem 512 1 857 1.6738
W22 ABC Stem 546 1 814 1.4908
non-B73 ABC Stem 545 0 826 1.5156
ᵃ Fisher’s Exact Test for B73 versus cumulative non-B73 ratio, p = 0.1821.
ᵇ Fisher’s Exact Test for B73 versus cumulative non-B73 ratio, p = 0.2539.
ᶜ Fisher’s Exact Test for B73 versus cumulative non-B73 ratio, p = 0.4326.
Table C.7: Allele specific expression variation among F1 hybrids explained by maize and teosinte parent.
Tissue  Category  R2 maize  R2 teosinte  Maize/Teosinte  Gene Count
Ear All genes 32.48% 38.21% 85.01% 13194
Leaf All genes 31.76% 37.18% 85.43% 13121
Stem All genes 32.04% 38.56% 83.09% 13305
Ear ABC 32.25% 41.37% 77.96% 1545
Leaf ABC 31.94% 39.79% 80.27% 1288
Stem ABC 32.20% 41.26% 78.05% 1371
Ear AB 30.76% 42.95% 71.64% 555
Leaf AB 30.61% 41.69% 73.42% 445
Stem AB 32.28% 42.22% 76.45% 431
Ear A 26.58% 48.86% 54.41% 43
Leaf A 20.11% 47.63% 42.22% 21
Stem A 28.86% 48.26% 59.80% 27
Table C.8: Number of genes for which the maize and/or teosinte parent contributed to the variance among the F1 hybrid gene expression ratios (heterogeneous) and genes for which there was no variance in expression attributable to the maize or teosinte parent (homogeneous). CCT genes in groups A, B, and C in the three tissue types are shown.
Tissue  Category  Heterogeneous (Maize)  Heterogeneous (Teosinte)  Heterogeneous (Maize+Teosinte)  Homogeneous  Total
Ear All genes 1880 2959 2504 5851 13194
Leaf All genes 1810 3005 2327 5979 13121
Stem All genes 1924 3215 2645 5521 13305
Ear ABC 195 417 350 583 1545
Leaf ABC 165 322 285 516 1288
Stem ABC 193 374 321 483 1371
Ear AB 67 157 120 211 555
Leaf AB 54 117 104 170 445
Stem AB 57 128 105 141 431
Ear A 3 17 5 18 43
Leaf A 1 6 3 11 21
Stem A 2 8 7 10 27
Table C.9: Comparison of observed and expected numbers of genes classified as differentially expressed (DE) or not differentially expressed (NDE) by RNAseq and MicroArray assays in groups A, B, and C in the three tissue types.
CCT Group  Tissue  RNAseq  Observed MicroArray-NDE  Observed MicroArray-DE  Expected MicroArray-NDE  Expected MicroArray-DE
A Ear RNAseq-NDE 9587 184 9583.56 187.44
A Ear RNAseq-DE 25 4 28.44 0.56
A Leaf RNAseq-NDE 9774 192 9771.27 194.73
A Leaf RNAseq-DE 11 3 13.73 0.27
A Stem RNAseq-NDE 9804 198 9802.36 199.64
A Stem RNAseq-DE 16 2 17.64 0.36
A Union RNAseq-NDE 10097 203 10090.04 209.96
A Union RNAseq-DE 43 8 49.96 1.04
AB Ear RNAseq-NDE 9244 165 9228.50 180.50
AB Ear RNAseq-DE 368 23 383.50 7.50
AB Leaf RNAseq-NDE 9482 170 9463.41 188.59
AB Leaf RNAseq-DE 303 25 321.59 6.41
AB Stem RNAseq-NDE 9532 175 9513.25 193.75
AB Stem RNAseq-DE 288 25 306.75 6.25
AB Union RNAseq-NDE 9414 163 9381.78 195.22
AB Union RNAseq-DE 726 48 758.22 15.78
ABC Ear RNAseq-NDE 8529 136 8498.77 166.23
ABC Ear RNAseq-DE 1083 52 1113.23 21.77
ABC Leaf RNAseq-NDE 8842 147 8813.36 175.64
ABC Leaf RNAseq-DE 943 48 971.64 19.36
ABC Stem RNAseq-NDE 8835 154 8809.58 179.42
ABC Stem RNAseq-DE 985 46 1010.42 20.58
ABC Union RNAseq-NDE 7970 121 7926.07 164.93
ABC Union RNAseq-DE 2170 90 2213.93 46.07
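The expected counts in Table C.9 are consistent with the usual independence expectation for a contingency table: row total times column total divided by the grand total. The sketch below illustrates that arithmetic on the group A ear counts. The function name `expected_counts` is hypothetical, and the assumption that the dissertation computed its expected values this way is an inference from the numbers, not a statement from the source.

```python
def expected_counts(observed):
    """Expected cell counts under independence of rows and columns.

    observed: list of rows, e.g. [[RNAseq-NDE & array-NDE, RNAseq-NDE & array-DE],
                                  [RNAseq-DE  & array-NDE, RNAseq-DE  & array-DE]].
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


if __name__ == "__main__":
    # Observed counts for group A, ear tissue, as listed in Table C.9.
    observed_a_ear = [[9587, 184], [25, 4]]
    for row in expected_counts(observed_a_ear):
        print([round(x, 2) for x in row])
    # prints [9583.56, 187.44] then [28.44, 0.56]
```

An observed count of 4 genes called DE by both assays against an expectation of 0.56 is the kind of excess the comparison is meant to expose.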
Table C.10: Regulatory categories for genes identified as differentially expressed between maize and teosinte by microarray assays.
Ear Leaf Stem
Ambiguous 5.81% 7.66% 9.65%
Cis + Trans 25.73% 29.12% 22.39%
Cis only 26.14% 28.74% 30.89%
Cis x Trans 6.64% 6.13% 8.49%
Compensatory 7.05% 8.05% 6.56%
Conserved 13.28% 8.05% 12.74%
Trans only 15.35% 12.26% 9.27%
Total Genes 241 261 259
Table C.11: Fisher’s Exact Tests for the overlap between genes associated with differentially methylated regions (DMRs) and CCT-ABC genes from each of the three experimental tissues in our work.
Overlap Ear Leaf Stem Union
Expected 13.466 11.387 12.468 27.493
Observed 19 14 17 34
p-value 0.1092 0.4309 0.1755 0.1605
Table C.12: Number of candidate genes neighboring differentially methylated regions (DMRs) between maize and teosinte and proportion in which expression data agrees with methylated status.
Ear Leaf Stem
Total 19 14 17
A 1 0 0
B 3 3 3
C 15 11 14
Total-agree 57.90% 50.00% 58.80%
A-agree 100% NA NA
B-agree 100% 33.30% 33.30%
C-agree 46.70% 54.50% 64.30%
Table C.13: Characteristics of dominance/additivity ratios from a genome-wide analysis including basic statistics such as max, min, mean, and median as well as average D/A ratio for seven regulatory categories and the CCT candidate lists.
Ear Leaf Stem
Min -10.4557 -273.675 -27.8545
Max 10.56194 70.80451 78.71309
Median 0.032991 0.160156 -0.01118
Mean 0.035682 0.211276 -0.01638
Positive D/A 6863 7385 6593
Negative D/A 6331 5736 6712
Pos:Neg Ratio 1.084031 1.287483 0.982271
N 13194 13121 13305
Z-test p-value 2.442e-05 1.486e-13 0.354
Binomial p-value 3.775e-06 4.741e-47 0.306
Ambiguous -0.00408 0.020225 -0.00841
Cis + Trans -0.00204 0.455915 0.05871
Cis only -0.02053 0.044602 0.063987
Cis x Trans 0.14616 0.32702 -0.16874
Compensatory 0.052921 -0.08854 0.002721
Conserved 0.049997 0.009092 -0.05574
Trans only 0.08708 0.382572 -0.10058
CCT-A 0.03508 0.329661 0.026347
CCT-AB -0.0169 0.094785 0.129459
CCT-ABC -0.04257 0.208951 0.077445
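The binomial p-values in Table C.13 ask whether positive and negative D/A ratios are equally frequent. A minimal sketch of such a test, a two-sided exact binomial (sign) test with success probability 0.5, is given below. The function name `sign_test_p` and the doubling of the smaller tail are assumptions; the source does not say which software or tail convention was used, so the result is expected to land near, not exactly on, the tabulated values.

```python
import math


def sign_test_p(n_pos: int, n_neg: int) -> float:
    """Two-sided exact binomial test of n_pos vs n_neg under p = 0.5.

    Sums the binomial probabilities of the smaller tail in log space
    (via lgamma) so that large counts do not overflow, then doubles it.
    """
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    log_two_pow_n = n * math.log(2.0)
    tail = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1) - log_two_pow_n)
        tail += math.exp(log_pmf)  # underflows harmlessly to 0.0 far in the tail
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Positive and negative D/A counts per tissue, from Table C.13.
    counts = {"Ear": (6863, 6331), "Leaf": (7385, 5736), "Stem": (6593, 6712)}
    for tissue, (pos, neg) in counts.items():
        print(tissue, sign_test_p(pos, neg))
```

For a null probability of 0.5 the binomial distribution is symmetric, so doubling one tail gives the same answer as summing both tails.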
Table C.14: Additive and dominant gene counts for the A, AB, and ABC cis only and trans only candidate lists. Dominance cells contain the numbers of genes for which the maize and teosinte alleles were dominant (maize:teosinte). Fisher’s exact tests (FET) interrogate whether the degree of dominance/additivity differs between the cis and trans classes. The binomial test (BT) asks whether the numbers of maize-dominant and teosinte-dominant alleles are equal.
List  Class  Ear Add  Ear Dom  Leaf Add  Leaf Dom  Stem Add  Stem Dom
A Cis only 11 1:0 5 1:0 3 2:1
A Trans only 13 19:2* 5 4:3 2 0:2
A FET p<0.005 (Ear) p>0.05 (Leaf) p>0.05 (Stem)
AB Cis only 95 22:18 53 18:17 52 19:20
AB Trans only 112 89:35* 72 81:29* 23 10:13
AB FET p<0.005 (Ear) p<0.005 (Leaf) p>0.05 (Stem)
ABC Cis only 266 62:65 136 50:56 178 68:71
ABC Trans only 203 112:68* 121 107:65* 67 35:42
ABC FET p<0.005 (Ear) p<0.005 (Leaf) p<0.05 (Stem)
* Binomial test p-value < 0.005.
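To make the FET comparison in Table C.14 concrete, the sketch below runs a two-sided Fisher's exact test on a 2×2 table whose rows are cis only and trans only and whose columns are additive and dominant gene counts. Pooling the maize-dominant and teosinte-dominant genes into a single "dominant" column, the function name `fisher_exact_two_sided`, and the "sum all tables no more likely than the observed one" convention are assumptions; the source does not spell out how the tables were built.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The p-value is the total hypergeometric probability of every table
    with the same margins that is no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # List A, ear: cis only 11 additive vs 1 dominant (1:0),
    # trans only 13 additive vs 21 dominant (19:2).
    print(fisher_exact_two_sided(11, 1, 13, 21))
    # List A, leaf: cis only 5 vs 1, trans only 5 vs 7.
    print(fisher_exact_two_sided(5, 1, 5, 7))
```

Under these assumptions the list A ear table falls below 0.005 and the list A leaf table stays above 0.05, in line with the thresholds reported in the table.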
Table C.15: Degree of overlap between our CCT (AB list) genes and genes in different transcription factor families.
Family  Tissue  Assayed Genes  Observed Overlap  Expected Overlap  FET p-value
AP2 Ear 6 0 0.25 1
ARF Ear 27 4 1.14 0.03
ARR-B Ear 8 0 0.34 1
B3 Ear 18 1 0.76 0.54
BBR-BPC Ear 4 0 0.17 1
BES1 Ear 3 0 0.13 1
bHLH Ear 42 1 1.77 0.84
bZIP Ear 51 0 2.15 1
C2H2 Ear 28 2 1.18 0.33
C3H Ear 42 1 1.77 0.84
CAMTA Ear 8 0 0.34 1
CO-like Ear 3 0 0.13 1
CPP Ear 7 1 0.29 0.26
DBB Ear 4 0 0.17 1
Dof Ear 7 0 0.29 1
E2F/DP Ear 10 0 0.42 1
EIL Ear 4 0 0.17 1
ERF Ear 17 0 0.72 1
FAR1 Ear 15 2 0.63 0.13
G2-like Ear 11 0 0.46 1
GATA Ear 10 0 0.42 1
GeBP Ear 14 0 0.59 1
GRAS Ear 21 1 0.88 0.59
GRF Ear 8 0 0.34 1
HB-other Ear 14 0 0.59 1
HB-PHD Ear 2 0 0.08 1
HD-ZIP Ear 19 1 0.8 0.56
HSF Ear 12 1 0.5 0.4
LBD Ear 3 0 0.13 1
LFY Ear 0 0 0 NA
LSD Ear 3 0 0.13 1
M-type Ear 6 1 0.25 0.23
MIKC Ear 23 2 0.97 0.25
MYB Ear 23 2 0.97 0.25
MYB related Ear 42 4 1.77 0.1
NAC Ear 25 0 1.05 1
NF-X1 Ear 2 0 0.08 1
NF-YA Ear 10 0 0.42 1
NF-YB Ear 7 0 0.29 1
NF-YC Ear 7 0 0.29 1
Nin-like Ear 11 1 0.46 0.38
RAV Ear 0 0 0 NA
S1Fa-like Ear 0 0 0 NA
SBP Ear 12 0 0.5 1
SRS Ear 2 0 0.08 1
STAT Ear 1 0 0.04 1
TALE Ear 12 0 0.5 1
TCP Ear 9 0 0.38 1
Trihelix Ear 22 0 0.93 1
VOZ Ear 2 0 0.08 1
Whirly Ear 2 0 0.08 1
WOX Ear 0 0 0 NA
WRKY Ear 20 0 0.84 1
YABBY Ear 4 0 0.17 1
ZF-HD Ear 1 0 0.04 1
ALL Ear 649 24 27.3 0.77
AP2 Leaf 8 0 0.27 1
ARF Leaf 27 0 0.92 1
ARR-B Leaf 8 0 0.27 1
B3 Leaf 16 1 0.54 0.42
BBR-BPC Leaf 4 0 0.14 1
BES1 Leaf 3 0 0.1 1
bHLH Leaf 41 0 1.39 1
bZIP Leaf 42 0 1.42 1
C2H2 Leaf 29 2 0.98 0.26
C3H Leaf 41 0 1.39 1
CAMTA Leaf 8 0 0.27 1
CO-like Leaf 5 0 0.17 1
CPP Leaf 7 0 0.24 1
DBB Leaf 6 0 0.2 1
Dof Leaf 8 0 0.27 1
E2F/DP Leaf 10 0 0.34 1
EIL Leaf 4 0 0.14 1
ERF Leaf 15 0 0.51 1
FAR1 Leaf 14 0 0.47 1
G2-like Leaf 16 0 0.54 1
GATA Leaf 14 0 0.47 1
GeBP Leaf 14 0 0.47 1
GRAS Leaf 19 1 0.64 0.48
GRF Leaf 6 0 0.2 1
HB-other Leaf 14 1 0.47 0.38
HB-PHD Leaf 2 0 0.07 1
HD-ZIP Leaf 16 1 0.54 0.42
HSF Leaf 10 0 0.34 1
LBD Leaf 1 1 0.03 0.03
LFY Leaf 0 0 0 NA
LSD Leaf 3 0 0.1 1
M-type Leaf 3 0 0.1 1
MIKC Leaf 9 2 0.31 0.04
MYB Leaf 31 2 1.05 0.28
MYB related Leaf 44 1 1.49 0.78
NAC Leaf 28 2 0.95 0.25
NF-X1 Leaf 2 0 0.07 1
NF-YA Leaf 9 0 0.31 1
NF-YB Leaf 5 0 0.17 1
NF-YC Leaf 8 0 0.27 1
Nin-like Leaf 10 0 0.34 1
RAV Leaf 0 0 0 NA
S1Fa-like Leaf 0 0 0 NA
SBP Leaf 11 0 0.37 1
SRS Leaf 0 0 0 NA
STAT Leaf 1 0 0.03 1
TALE Leaf 12 0 0.41 1
TCP Leaf 8 0 0.27 1
Trihelix Leaf 22 1 0.75 0.53
VOZ Leaf 2 0 0.07 1
Whirly Leaf 2 0 0.07 1
WOX Leaf 0 0 0 NA
WRKY Leaf 16 0 0.54 1
YABBY Leaf 4 0 0.14 1
ZF-HD Leaf 1 0 0.03 1
ALL Leaf 623 15 21.13 0.94
AP2 Stem 8 0 0.26 1
ARF Stem 27 3 0.87 0.06
ARR-B Stem 8 0 0.26 1
B3 Stem 14 0 0.45 1
BBR-BPC Stem 4 0 0.13 1
BES1 Stem 3 0 0.1 1
bHLH Stem 50 2 1.62 0.49
bZIP Stem 47 1 1.52 0.79
C2H2 Stem 28 2 0.91 0.23
C3H Stem 41 1 1.33 0.74
CAMTA Stem 8 0 0.26 1
CO-like Stem 4 0 0.13 1
CPP Stem 7 0 0.23 1
DBB Stem 6 0 0.19 1
Dof Stem 8 0 0.26 1
E2F/DP Stem 10 0 0.32 1
EIL Stem 4 1 0.13 0.12
ERF Stem 16 0 0.52 1
FAR1 Stem 15 0 0.49 1
G2-like Stem 14 0 0.45 1
GATA Stem 12 0 0.39 1
GeBP Stem 13 0 0.42 1
GRAS Stem 20 0 0.65 1
GRF Stem 7 0 0.23 1
HB-other Stem 15 2 0.49 0.08
HB-PHD Stem 2 0 0.06 1
HD-ZIP Stem 17 1 0.55 0.43
HSF Stem 14 0 0.45 1
LBD Stem 2 0 0.06 1
LFY Stem 0 0 0 NA
LSD Stem 3 0 0.1 1
M-type Stem 4 1 0.13 0.12
MIKC Stem 10 2 0.32 0.04
MYB Stem 23 2 0.75 0.17
MYB related Stem 42 1 1.36 0.75
NAC Stem 29 0 0.94 1
NF-X1 Stem 2 0 0.06 1
NF-YA Stem 10 1 0.32 0.28
NF-YB Stem 6 0 0.19 1
NF-YC Stem 7 0 0.23 1
Nin-like Stem 11 0 0.36 1
RAV Stem 0 0 0 NA
S1Fa-like Stem 0 0 0 NA
SBP Stem 11 0 0.36 1
SRS Stem 2 0 0.06 1
STAT Stem 1 0 0.03 1
TALE Stem 13 0 0.42 1
TCP Stem 6 1 0.19 0.18
Trihelix Stem 23 0 0.75 1
VOZ Stem 2 0 0.06 1
Whirly Stem 2 0 0.06 1
WOX Stem 0 0 0 NA
WRKY Stem 19 0 0.62 1
YABBY Stem 4 0 0.13 1
ZF-HD Stem 0 0 0 NA
ALL Stem 640 20 20.73 0.6
AP2 Union 10 0 0.76 1
ARF Union 27 6 2.06 0.01
ARR-B Union 8 0 0.61 1
B3 Union 18 2 1.38 0.41
BBR-BPC Union 4 0 0.31 1
BES1 Union 3 0 0.23 1
bHLH Union 53 3 4.05 0.78
bZIP Union 52 1 3.97 0.98
C2H2 Union 31 4 2.37 0.21
C3H Union 42 2 3.21 0.84
CAMTA Union 8 0 0.61 1
CO-like Union 5 0 0.38 1
CPP Union 7 1 0.54 0.43
DBB Union 6 0 0.46 1
Dof Union 9 0 0.69 1
E2F/DP Union 10 0 0.76 1
EIL Union 4 1 0.31 0.27
ERF Union 18 0 1.38 1
FAR1 Union 15 2 1.15 0.32
G2-like Union 18 0 1.38 1
GATA Union 15 0 1.15 1
GeBP Union 15 0 1.15 1
GRAS Union 23 2 1.76 0.53
GRF Union 8 0 0.61 1
HB-other Union 15 2 1.15 0.32
HB-PHD Union 2 0 0.15 1
HD-ZIP Union 20 3 1.53 0.19
HSF Union 14 1 1.07 0.67
LBD Union 3 1 0.23 0.21
LFY Union 0 0 0 NA
LSD Union 3 0 0.23 1
M-type Union 7 2 0.54 0.09
MIKC Union 25 5 1.91 0.04
MYB Union 32 3 2.45 0.45
MYB related Union 48 4 3.67 0.51
NAC Union 35 2 2.68 0.76
NF-X1 Union 2 0 0.15 1
NF-YA Union 10 1 0.76 0.55
NF-YB Union 7 0 0.54 1
NF-YC Union 8 0 0.61 1
Nin-like Union 11 1 0.84 0.58
RAV Union 0 0 0 NA
S1Fa-like Union 0 0 0 NA
SBP Union 12 0 0.92 1
SRS Union 2 0 0.15 1
STAT Union 1 0 0.08 1
TALE Union 14 0 1.07 1
TCP Union 9 1 0.69 0.51
Trihelix Union 24 1 1.83 0.85
VOZ Union 2 0 0.15 1
Whirly Union 2 0 0.15 1
WOX Union 0 0 0 NA
WRKY Union 23 0 1.76 1
YABBY Union 4 0 0.31 1
ZF-HD Union 1 0 0.08 1
ALL Union 724 49 55.34 0.84
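The expected overlaps in Table C.15 are consistent with scaling each family's assayed gene count by the fraction of assayed genes on the candidate list, and the p-values behave like an upper-tail hypergeometric (one-sided Fisher's exact) test, since every family with zero observed overlap has p = 1. The sketch below illustrates both calculations for the ear ARF row. It assumes a candidate list of 555 genes out of 13,194 assayed ear genes (the Ear AB row of Table C.8) and a one-sided test; the function names are hypothetical and neither assumption is confirmed by the source.

```python
from math import comb


def expected_overlap(family_size: int, list_size: int, background_size: int) -> float:
    """Expected number of family genes on the candidate list by chance."""
    return family_size * list_size / background_size


def enrichment_p(observed: int, family_size: int, list_size: int,
                 background_size: int) -> float:
    """Upper-tail hypergeometric probability P(overlap >= observed)."""
    denom = comb(background_size, family_size)
    upper = min(family_size, list_size)
    numer = sum(
        comb(list_size, k) * comb(background_size - list_size, family_size - k)
        for k in range(observed, upper + 1)
    )
    return numer / denom  # exact integers; divided once at the end


if __name__ == "__main__":
    # Assumed background and list sizes (Ear, AB list, from Table C.8).
    background, cct_ab = 13194, 555
    # ARF family in ear: 27 assayed genes, 4 observed on the list.
    print(round(expected_overlap(27, cct_ab, background), 2))
    print(enrichment_p(4, 27, cct_ab, background))
```

Keeping the numerator and denominator as exact integers until the final division avoids rounding error when the binomial coefficients become very large.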
Table C.16: Degree of overlap between CCT (AB list) differentially expressed genes and genes in the 1.5 support intervals for QTL from a previous study.
Trait  Tissue  Assayed Genes  Observed Overlap  Expected Overlap  FET p-value
BARE Ear 0 0 0 1
DIAM Ear 29 4 1.22 0.03
DIS Ear 4 1 0.17 0.16
DTP Ear 10 1 0.42 0.35
GLCO Ear 3 0 0.13 1
GLU Ear 0 0 0 1
KRN Ear 15 2 0.63 0.13
KW Ear 17 1 0.72 0.52
LEN Ear 4 1 0.17 0.16
PROL Ear 5 0 0.21 1
STAM Ear 10 1 0.42 0.35
BARE Leaf 0 0 0 1
DIAM Leaf 28 0 0.95 1
DIS Leaf 4 1 0.14 0.13
DTP Leaf 9 0 0.31 1
GLCO Leaf 3 0 0.1 1
GLU Leaf 0 0 0 1
KRN Leaf 13 0 0.44 1
KW Leaf 17 3 0.58 0.02
LEN Leaf 4 0 0.14 1
PROL Leaf 5 0 0.17 1
STAM Leaf 9 1 0.31 0.27
BARE Stem 0 0 0 1
DIAM Stem 28 1 0.91 0.6
DIS Stem 4 0 0.13 1
DTP Stem 10 0 0.32 1
GLCO Stem 3 0 0.1 1
GLU Stem 0 0 0 1
KRN Stem 14 1 0.45 0.37
KW Stem 18 3 0.58 0.02
LEN Stem 4 0 0.13 1
PROL Stem 5 0 0.16 1
STAM Stem 10 0 0.32 1
Table C.17: Degree of overlap between our CCT (AB list) differentially expressed genes and genes in metabolic pathways defined in KEGG.
Pathway  Group¹  Pathway Genes  Assayed Genes  Overlap (obs)  Overlap (exp)  FET p-value
Alpha-linoleic Acid Metabolism Ear-CCT-A 26 14 0 0.046 1
Arachidonic Acid Metabolism Ear-CCT-A 10 7 0 0.023 1
Biosynthesis of Unsaturated Fatty Acids Ear-CCT-A 33 16 0 0.052 1
Cutin, Suberine, and Wax Biosynthesis Ear-CCT-A 10 5 0 0.016 1
Ether Lipid Metabolism Ear-CCT-A 11 7 0 0.023 1
Fatty Acid Biosynthesis Ear-CCT-A 32 19 0 0.062 1
Fatty Acid Degradation Ear-CCT-A 34 27 0 0.088 1
Fatty Acid Elongation Ear-CCT-A 16 8 0 0.026 1
Glycerolipid Metabolism Ear-CCT-A 46 31 0 0.101 1
Glycerophospholipid Metabolism Ear-CCT-A 64 45 0 0.147 1
Linoleic Acid Metabolism Ear-CCT-A 12 5 0 0.016 1
Sphingolipid Metabolism Ear-CCT-A 21 13 0 0.042 1
Starch and sucrose metabolism Ear-CCT-A 98 59 0 0.192 1
Steroid Biosynthesis Ear-CCT-A 25 15 0 0.049 1
Synthesis/Degradation of Ketone Bodies Ear-CCT-A 8 8 0 0.026 1
ALL Ear-CCT-A 353 223 0 0.727 1
¹ Tissue, candidate, and level of list.
Alpha-linoleic Acid Metabolism Ear-CCT-AB 26 14 1 0.589 0.452
Arachidonic Acid Metabolism Ear-CCT-AB 10 7 0 0.294 1
Biosynthesis of Unsaturated Fatty Acids Ear-CCT-AB 33 16 0 0.673 1
Cutin, Suberine, and Wax Biosynthesis Ear-CCT-AB 10 5 0 0.21 1
Ether Lipid Metabolism Ear-CCT-AB 11 7 0 0.294 1
Fatty Acid Biosynthesis Ear-CCT-AB 32 19 2 0.799 0.189
Fatty Acid Degradation Ear-CCT-AB 34 27 1 1.136 0.687
Fatty Acid Elongation Ear-CCT-AB 16 8 0 0.337 1
Glycerolipid Metabolism Ear-CCT-AB 46 31 0 1.304 1
Glycerophospholipid Metabolism Ear-CCT-AB 64 45 0 1.893 1
Linoleic Acid Metabolism Ear-CCT-AB 12 5 1 0.21 0.193
Sphingolipid Metabolism Ear-CCT-AB 21 13 0 0.547 1
Starch and sucrose metabolism Ear-CCT-AB 98 59 3 2.482 0.454
Steroid Biosynthesis Ear-CCT-AB 25 15 1 0.631 0.475
Synthesis/Degradation of Ketone Bodies Ear-CCT-AB 8 8 1 0.337 0.291
ALL Ear-CCT-AB 353 223 8 9.38 0.726
Alpha-linoleic Acid Metabolism Ear-CCT-ABC 26 14 5 1.639 0.018
Arachidonic Acid Metabolism Ear-CCT-ABC 10 7 1 0.82 0.582
Biosynthesis of Unsaturated Fatty Acids Ear-CCT-ABC 33 16 2 1.874 0.575
Cutin, Suberine, and Wax Biosynthesis Ear-CCT-ABC 10 5 0 0.585 1
Ether Lipid Metabolism Ear-CCT-ABC 11 7 1 0.82 0.582
Fatty Acid Biosynthesis Ear-CCT-ABC 32 19 3 2.225 0.388
Fatty Acid Degradation Ear-CCT-ABC 34 27 5 3.162 0.203
Fatty Acid Elongation Ear-CCT-ABC 16 8 0 0.937 1
Glycerolipid Metabolism Ear-CCT-ABC 46 31 3 3.63 0.721
Glycerophospholipid Metabolism Ear-CCT-ABC 64 45 5 5.269 0.619
Linoleic Acid Metabolism Ear-CCT-ABC 12 5 1 0.585 0.464
Sphingolipid Metabolism Ear-CCT-ABC 21 13 0 1.522 1
Starch and sucrose metabolism Ear-CCT-ABC 98 59 7 6.909 0.545
Steroid Biosynthesis Ear-CCT-ABC 25 15 2 1.756 0.539
Synthesis/Degradation of Ketone Bodies Ear-CCT-ABC 8 8 1 0.937 0.631
ALL Ear-CCT-ABC 353 223 28 26.113 0.376
Alpha-linoleic Acid Metabolism Ear-trans-A 26 14 1 0.062 0.06
Arachidonic Acid Metabolism Ear-trans-A 10 7 0 0.031 1
Biosynthesis of Unsaturated Fatty Acids Ear-trans-A 33 16 0 0.07 1
Cutin, Suberine, and Wax Biosynthesis Ear-trans-A 10 5 0 0.022 1
Ether Lipid Metabolism Ear-trans-A 11 7 1 0.031 0.03
Fatty Acid Biosynthesis Ear-trans-A 32 19 0 0.084 1
Fatty Acid Degradation Ear-trans-A 34 27 0 0.119 1
Fatty Acid Elongation Ear-trans-A 16 8 0 0.035 1
Glycerolipid Metabolism Ear-trans-A 46 31 0 0.136 1
Glycerophospholipid Metabolism Ear-trans-A 64 45 2 0.198 0.017
Linoleic Acid Metabolism Ear-trans-A 12 5 1 0.022 0.022
Sphingolipid Metabolism Ear-trans-A 21 13 0 0.057 1
Starch and sucrose metabolism Ear-trans-A 98 59 2 0.259 0.028
Steroid Biosynthesis Ear-trans-A 25 15 0 0.066 1
Synthesis/Degradation of Ketone Bodies Ear-trans-A 8 8 0 0.035 1
ALL Ear-trans-A 353 223 5 0.98 0.003
Alpha-linoleic Acid Metabolism Ear-trans-AB 26 14 2 0.506 0.089
Arachidonic Acid Metabolism Ear-trans-AB 10 7 0 0.253 1
Biosynthesis of Unsaturated Fatty Acids Ear-trans-AB 33 16 1 0.578 0.445
Cutin, Suberine, and Wax Biosynthesis Ear-trans-AB 10 5 2 0.181 0.012
Ether Lipid Metabolism Ear-trans-AB 11 7 1 0.253 0.227
Fatty Acid Biosynthesis Ear-trans-AB 32 19 1 0.687 0.503
Fatty Acid Degradation Ear-trans-AB 34 27 2 0.976 0.255
Fatty Acid Elongation Ear-trans-AB 16 8 0 0.289 1
Glycerolipid Metabolism Ear-trans-AB 46 31 4 1.121 0.025
Glycerophospholipid Metabolism Ear-trans-AB 64 45 5 1.627 0.023
Linoleic Acid Metabolism Ear-trans-AB 12 5 1 0.181 0.168
Sphingolipid Metabolism Ear-trans-AB 21 13 0 0.47 1
Starch and sucrose metabolism Ear-trans-AB 98 59 3 2.133 0.36
Steroid Biosynthesis Ear-trans-AB 25 15 0 0.542 1
Synthesis/Degradation of Ketone Bodies Ear-trans-AB 8 8 0 0.289 1
ALL Ear-trans-AB 353 223 15 8.062 0.016
Alpha-linoleic Acid Metabolism Ear-trans-ABC 26 14 2 1.213 0.345
Arachidonic Acid Metabolism Ear-trans-ABC 10 7 0 0.606 1
Biosynthesis of Unsaturated Fatty Acids Ear-trans-ABC 33 16 1 1.386 0.766
Cutin, Suberine, and Wax Biosynthesis Ear-trans-ABC 10 5 2 0.433 0.063
Ether Lipid Metabolism Ear-trans-ABC 11 7 2 0.606 0.118
Fatty Acid Biosynthesis Ear-trans-ABC 32 19 1 1.646 0.821
Fatty Acid Degradation Ear-trans-ABC 34 27 3 2.339 0.418
Fatty Acid Elongation Ear-trans-ABC 16 8 1 0.693 0.516
Glycerolipid Metabolism Ear-trans-ABC 46 31 6 2.686 0.047
Glycerophospholipid Metabolism Ear-trans-ABC 64 45 7 3.898 0.09
Linoleic Acid Metabolism Ear-trans-ABC 12 5 1 0.433 0.364
Sphingolipid Metabolism Ear-trans-ABC 21 13 0 1.126 1
Starch and sucrose metabolism Ear-trans-ABC 98 59 5 5.111 0.588
Steroid Biosynthesis Ear-trans-ABC 25 15 2 1.299 0.378
Synthesis/Degradation of Ketone Bodies Ear-trans-ABC 8 8 0 0.693 1
ALL Ear-trans-ABC 353 223 23 19.319 0.218
Alpha-linoleic Acid Metabolism Leaf-CCT-A 26 13 0 0.021 1
Arachidonic Acid Metabolism Leaf-CCT-A 10 7 0 0.011 1
Biosynthesis of Unsaturated Fatty Acids Leaf-CCT-A 33 19 0 0.03 1
Cutin, Suberine, and Wax Biosynthesis Leaf-CCT-A 10 6 0 0.01 1
Ether Lipid Metabolism Leaf-CCT-A 11 7 0 0.011 1
Fatty Acid Biosynthesis Leaf-CCT-A 32 19 0 0.03 1
Fatty Acid Degradation Leaf-CCT-A 34 30 0 0.048 1
Fatty Acid Elongation Leaf-CCT-A 16 9 0 0.014 1
Glycerolipid Metabolism Leaf-CCT-A 46 34 0 0.054 1
Glycerophospholipid Metabolism Leaf-CCT-A 64 47 0 0.075 1
Linoleic Acid Metabolism Leaf-CCT-A 12 5 0 0.008 1
Sphingolipid Metabolism Leaf-CCT-A 21 14 0 0.022 1
Starch and sucrose metabolism Leaf-CCT-A 98 62 0 0.099 1
Steroid Biosynthesis Leaf-CCT-A 25 15 0 0.024 1
Synthesis/Degradation of Ketone Bodies Leaf-CCT-A 8 8 0 0.013 1
ALL Leaf-CCT-A 353 236 0 0.378 1
Alpha-linoleic Acid Metabolism Leaf-CCT-AB 26 13 0 0.441 1
Arachidonic Acid Metabolism Leaf-CCT-AB 10 7 0 0.237 1
Biosynthesis of Unsaturated Fatty Acids Leaf-CCT-AB 33 19 2 0.644 0.134
Cutin, Suberine, and Wax Biosynthesis Leaf-CCT-AB 10 6 2 0.203 0.016
Ether Lipid Metabolism Leaf-CCT-AB 11 7 1 0.237 0.215
Fatty Acid Biosynthesis Leaf-CCT-AB 32 19 0 0.644 1
Fatty Acid Degradation Leaf-CCT-AB 34 30 2 1.017 0.271
Fatty Acid Elongation Leaf-CCT-AB 16 9 0 0.305 1
Glycerolipid Metabolism Leaf-CCT-AB 46 34 1 1.153 0.691
Glycerophospholipid Metabolism Leaf-CCT-AB 64 47 2 1.594 0.477
Linoleic Acid Metabolism Leaf-CCT-AB 12 5 0 0.17 1
Sphingolipid Metabolism Leaf-CCT-AB 21 14 0 0.475 1
Starch and sucrose metabolism Leaf-CCT-AB 98 62 1 2.103 0.883
Steroid Biosynthesis Leaf-CCT-AB 25 15 0 0.509 1
Synthesis/Degradation of Ketone Bodies Leaf-CCT-AB 8 8 1 0.271 0.241
ALL Leaf-CCT-AB 353 236 9 8.004 0.408
Alpha-linoleic Acid Metabolism Leaf-CCT-ABC 26 13 1 1.276 0.739
Arachidonic Acid Metabolism Leaf-CCT-ABC 10 7 1 0.687 0.515
Biosynthesis of Unsaturated Fatty Acids Leaf-CCT-ABC 33 19 3 1.865 0.285
Cutin, Suberine, and Wax Biosynthesis Leaf-CCT-ABC 10 6 2 0.589 0.111
Ether Lipid Metabolism Leaf-CCT-ABC 11 7 1 0.687 0.515
Fatty Acid Biosynthesis Leaf-CCT-ABC 32 19 2 1.865 0.569
Fatty Acid Degradation Leaf-CCT-ABC 34 30 7 2.945 0.023
Fatty Acid Elongation Leaf-CCT-ABC 16 9 0 0.883 1
Glycerolipid Metabolism Leaf-CCT-ABC 46 34 4 3.338 0.432
Glycerophospholipid Metabolism Leaf-CCT-ABC 64 47 4 4.614 0.691
Linoleic Acid Metabolism Leaf-CCT-ABC 12 5 0 0.491 1
Sphingolipid Metabolism Leaf-CCT-ABC 21 14 0 1.374 1
Starch and sucrose metabolism Leaf-CCT-ABC 98 62 6 6.086 0.577
Steroid Biosynthesis Leaf-CCT-ABC 25 15 1 1.472 0.788
Synthesis/Degradation of Ketone Bodies Leaf-CCT-ABC 8 8 1 0.785 0.563
ALL Leaf-CCT-ABC 353 236 26 23.167 0.296
Alpha-linoleic Acid Metabolism Leaf-trans-A 26 13 0 0.026 1
Arachidonic Acid Metabolism Leaf-trans-A 10 7 0 0.014 1
Biosynthesis of Unsaturated Fatty Acids Leaf-trans-A 33 19 0 0.038 1
Cutin, Suberine, and Wax Biosynthesis Leaf-trans-A 10 6 0 0.012 1
Ether Lipid Metabolism Leaf-trans-A 11 7 0 0.014 1
Fatty Acid Biosynthesis Leaf-trans-A 32 19 0 0.038 1
Fatty Acid Degradation Leaf-trans-A 34 30 0 0.059 1
Fatty Acid Elongation Leaf-trans-A 16 9 0 0.018 1
Glycerolipid Metabolism Leaf-trans-A 46 34 0 0.067 1
Glycerophospholipid Metabolism Leaf-trans-A 64 47 0 0.093 1
Linoleic Acid Metabolism Leaf-trans-A 12 5 0 0.01 1
Sphingolipid Metabolism Leaf-trans-A 21 14 0 0.028 1
Starch and sucrose metabolism Leaf-trans-A 98 62 0 0.123 1
Steroid Biosynthesis Leaf-trans-A 25 15 0 0.03 1
Synthesis/Degradation of Ketone Bodies Leaf-trans-A 8 8 0 0.016 1
ALL Leaf-trans-A 353 236 0 0.468 1
Alpha-linoleic Acid Metabolism Leaf-trans-AB 26 13 1 0.447 0.365
Arachidonic Acid Metabolism Leaf-trans-AB 10 7 0 0.241 1
Biosynthesis of Unsaturated Fatty Acids Leaf-trans-AB 33 19 0 0.653 1
Cutin, Suberine, and Wax Biosynthesis Leaf-trans-AB 10 6 0 0.206 1
Ether Lipid Metabolism Leaf-trans-AB 11 7 0 0.241 1
Fatty Acid Biosynthesis Leaf-trans-AB 32 19 1 0.653 0.486
Fatty Acid Degradation Leaf-trans-AB 34 30 0 1.031 1
Fatty Acid Elongation Leaf-trans-AB 16 9 0 0.309 1
Glycerolipid Metabolism Leaf-trans-AB 46 34 0 1.169 1
Glycerophospholipid Metabolism Leaf-trans-AB 64 47 0 1.616 1
Linoleic Acid Metabolism Leaf-trans-AB 12 5 1 0.172 0.16
Sphingolipid Metabolism Leaf-trans-AB 21 14 1 0.481 0.387
Starch and sucrose metabolism Leaf-trans-AB 98 62 2 2.131 0.634
Steroid Biosynthesis Leaf-trans-AB 25 15 1 0.516 0.408
Synthesis and Degradation of Ketone Bodies Leaf-trans-AB 8 8 0 0.275 1
ALL Leaf-trans-AB 353 236 6 8.112 0.826
Alpha-linoleic Acid Metabolism Leaf-trans-ABC 26 13 2 1.212 0.345
Arachidonic Acid Metabolism Leaf-trans-ABC 10 7 0 0.652 1
Biosynthesis of Unsaturated Fatty Acids Leaf-trans-ABC 33 19 1 1.771 0.844
Cutin, Suberine, and Wax Biosynthesis Leaf-trans-ABC 10 6 0 0.559 1
Ether Lipid Metabolism Leaf-trans-ABC 11 7 0 0.652 1
Fatty Acid Biosynthesis Leaf-trans-ABC 32 19 2 1.771 0.54
Fatty Acid Degradation Leaf-trans-ABC 34 30 3 2.796 0.539
Fatty Acid Elongation Leaf-trans-ABC 16 9 0 0.839 1
Glycerolipid Metabolism Leaf-trans-ABC 46 34 2 3.169 0.839
Glycerophospholipid Metabolism Leaf-trans-ABC 64 47 3 4.381 0.827
Linoleic Acid Metabolism Leaf-trans-ABC 12 5 1 0.466 0.387
Sphingolipid Metabolism Leaf-trans-ABC 21 14 1 1.305 0.746
Starch and sucrose metabolism Leaf-trans-ABC 98 62 3 5.779 0.937
Steroid Biosynthesis Leaf-trans-ABC 25 15 3 1.398 0.158
Synthesis/Degradation of Ketone Bodies Leaf-trans-ABC 8 8 1 0.746 0.543
ALL Leaf-trans-ABC 353 236 17 21.997 0.897
Alpha-linoleic Acid Metabolism Stem-CCT-A 26 15 0 0.03 1
Arachidonic Acid Metabolism Stem-CCT-A 10 7 0 0.014 1
Biosynthesis of Unsaturated Fatty Acids Stem-CCT-A 33 17 0 0.034 1
Cutin, Suberine, and Wax Biosynthesis Stem-CCT-A 10 6 0 0.012 1
Ether Lipid Metabolism Stem-CCT-A 11 7 0 0.014 1
Fatty Acid Biosynthesis Stem-CCT-A 32 19 0 0.039 1
Fatty Acid Degradation Stem-CCT-A 34 30 0 0.061 1
Fatty Acid Elongation Stem-CCT-A 16 8 0 0.016 1
Glycerolipid Metabolism Stem-CCT-A 46 32 0 0.065 1
Glycerophospholipid Metabolism Stem-CCT-A 64 47 0 0.095 1
Linoleic Acid Metabolism Stem-CCT-A 12 6 0 0.012 1
Sphingolipid Metabolism Stem-CCT-A 21 14 0 0.028 1
Starch and sucrose metabolism Stem-CCT-A 98 61 1 0.124 0.117
Steroid Biosynthesis Stem-CCT-A 25 16 0 0.032 1
Synthesis/Degradation of Ketone Bodies Stem-CCT-A 8 8 0 0.016 1
ALL Stem-CCT-A 353 235 1 0.477 0.382
Alpha-linoleic Acid Metabolism Stem-CCT-AB 26 15 1 0.486 0.39
Arachidonic Acid Metabolism Stem-CCT-AB 10 7 0 0.227 1
Biosynthesis of Unsaturated Fatty Acids Stem-CCT-AB 33 17 1 0.551 0.429
Cutin, Suberine, and Wax Biosynthesis Stem-CCT-AB 10 6 0 0.194 1
Ether Lipid Metabolism Stem-CCT-AB 11 7 1 0.227 0.206
Fatty Acid Biosynthesis Stem-CCT-AB 32 19 1 0.615 0.465
Fatty Acid Degradation Stem-CCT-AB 34 30 1 0.972 0.628
Fatty Acid Elongation Stem-CCT-AB 16 8 0 0.259 1
Glycerolipid Metabolism Stem-CCT-AB 46 32 0 1.037 1
Glycerophospholipid Metabolism Stem-CCT-AB 64 47 2 1.523 0.453
Linoleic Acid Metabolism Stem-CCT-AB 12 6 0 0.194 1
Sphingolipid Metabolism Stem-CCT-AB 21 14 0 0.454 1
Starch and sucrose metabolism Stem-CCT-AB 98 61 1 1.976 0.866
Steroid Biosynthesis Stem-CCT-AB 25 16 1 0.518 0.41
Synthesis/Degradation of Ketone Bodies Stem-CCT-AB 8 8 1 0.259 0.232
ALL Stem-CCT-AB 353 235 8 7.613 0.494
Alpha-linoleic Acid Metabolism Stem-CCT-ABC 26 15 3 1.546 0.196
Arachidonic Acid Metabolism Stem-CCT-ABC 10 7 1 0.721 0.533
Biosynthesis of Unsaturated Fatty Acids Stem-CCT-ABC 33 17 2 1.752 0.535
Cutin, Suberine, and Wax Biosynthesis Stem-CCT-ABC 10 6 0 0.618 1
Ether Lipid Metabolism Stem-CCT-ABC 11 7 2 0.721 0.157
Fatty Acid Biosynthesis Stem-CCT-ABC 32 19 1 1.958 0.874
Fatty Acid Degradation Stem-CCT-ABC 34 30 4 3.091 0.374
Fatty Acid Elongation Stem-CCT-ABC 16 8 1 0.824 0.581
Glycerolipid Metabolism Stem-CCT-ABC 46 32 1 3.297 0.969
Glycerophospholipid Metabolism Stem-CCT-ABC 64 47 9 4.843 0.048
Linoleic Acid Metabolism Stem-CCT-ABC 12 6 0 0.618 1
Sphingolipid Metabolism Stem-CCT-ABC 21 14 0 1.443 1
Starch and sucrose metabolism Stem-CCT-ABC 98 61 9 6.286 0.172
Steroid Biosynthesis Stem-CCT-ABC 25 16 1 1.649 0.825
Synthesis/Degradation of Ketone Bodies Stem-CCT-ABC 8 8 1 0.824 0.581
ALL Stem-CCT-ABC 353 235 29 24.215 0.176
Alpha-linoleic Acid Metabolism Stem-trans-A 26 15 0 0.006 1
Arachidonic Acid Metabolism Stem-trans-A 10 7 0 0.003 1
Biosynthesis of Unsaturated Fatty Acids Stem-trans-A 33 17 0 0.006 1
Cutin, Suberine, and Wax Biosynthesis Stem-trans-A 10 6 0 0.002 1
Ether Lipid Metabolism Stem-trans-A 11 7 0 0.003 1
Fatty Acid Biosynthesis Stem-trans-A 32 19 0 0.007 1
Fatty Acid Degradation Stem-trans-A 34 30 0 0.011 1
Fatty Acid Elongation Stem-trans-A 16 8 0 0.003 1
Glycerolipid Metabolism Stem-trans-A 46 32 0 0.012 1
Glycerophospholipid Metabolism Stem-trans-A 64 47 0 0.018 1
Linoleic Acid Metabolism Stem-trans-A 12 6 0 0.002 1
Sphingolipid Metabolism Stem-trans-A 21 14 0 0.005 1
Starch and sucrose metabolism Stem-trans-A 98 61 0 0.023 1
Steroid Biosynthesis Stem-trans-A 25 16 0 0.006 1
Synthesis/Degradation of Ketone Bodies Stem-trans-A 8 8 0 0.003 1
ALL Stem-trans-A 353 235 0 0.088 1
Alpha-linoleic Acid Metabolism Stem-trans-AB 26 15 0 0.168 1
Arachidonic Acid Metabolism Stem-trans-AB 10 7 0 0.078 1
Biosynthesis of Unsaturated Fatty Acids Stem-trans-AB 33 17 0 0.19 1
Cutin, Suberine, and Wax Biosynthesis Stem-trans-AB 10 6 0 0.067 1
Ether Lipid Metabolism Stem-trans-AB 11 7 0 0.078 1
Fatty Acid Biosynthesis Stem-trans-AB 32 19 0 0.213 1
Fatty Acid Degradation Stem-trans-AB 34 30 0 0.336 1
Fatty Acid Elongation Stem-trans-AB 16 8 1 0.09 0.086
Glycerolipid Metabolism Stem-trans-AB 46 32 0 0.358 1
Glycerophospholipid Metabolism Stem-trans-AB 64 47 0 0.526 1
Linoleic Acid Metabolism Stem-trans-AB 12 6 0 0.067 1
Sphingolipid Metabolism Stem-trans-AB 21 14 0 0.157 1
Starch and sucrose metabolism Stem-trans-AB 98 61 1 0.683 0.498
Steroid Biosynthesis Stem-trans-AB 25 16 0 0.179 1
Synthesis/Degradation of Ketone Bodies Stem-trans-AB 8 8 0 0.09 1
ALL Stem-trans-AB 353 235 2 2.632 0.743
Alpha-linoleic AcidMetabolism
Stem-trans-ABC
26 15 0 0.601 1
Arachidonic AcidMetabolism
Stem-trans-ABC
10 7 0 0.28 1
Biosynthesis ofUnsaturated Fatty
Acids
Stem-trans-ABC
33 17 0 0.681 1
Cutin, Suberine,and Wax
Biosynthesis
Stem-trans-ABC
10 6 0 0.24 1
Ether LipidMetabolism
Stem-trans-ABC
11 7 0 0.28 1
Fatty AcidBiosynthesis
Stem-trans-ABC
32 19 0 0.761 1
Fatty AcidDegradation
Stem-trans-ABC
34 30 1 1.202 0.707
Table C.17: 1 Tissue, candidate, and level of list.
154
Pathway Group1 PathwayGenes
AssayedGenes
Overlap(obs)
Overlap(exp)
FETp-value
Fatty AcidElongation
Stem-trans-ABC
16 8 1 0.32 0.279
GlycerolipidMetabolism
Stem-trans-ABC
46 32 1 1.282 0.73
Glycerophospho-lipid
Metabolism
Stem-trans-ABC
64 47 1 1.883 0.854
Linoleic AcidMetabolism
Stem-trans-ABC
12 6 0 0.24 1
SphingolipidMetabolism
Stem-trans-ABC
21 14 0 0.561 1
Starch and sucrosemetabolism
Stem-trans-ABC
98 61 4 2.444 0.228
SteroidBiosynthesis
Stem-trans-ABC
25 16 1 0.641 0.48
Synthe-sis/Degradation of
Ketone Bodies
Stem-trans-ABC
8 8 1 0.32 0.279
ALLStem-trans-ABC
353 235 8 9.414 0.73
Table C.17: 1 Tissue, candidate, and level of list.
155
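The relationship between the observed overlap, expected overlap, and FET p-value columns of Table C.17 can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the code used to produce the table: the function names are hypothetical, the universe of 10,000 assayed genes and the list of 400 candidate genes are made-up values chosen only so that about 4% of assayed genes are candidates, and the upper-tail (enrichment) form of Fisher's exact test is assumed rather than confirmed by the text.

```python
from math import comb


def expected_overlap(pathway_assayed: int, n_candidates: int, n_assayed: int) -> float:
    """Expected number of candidate genes in a pathway if candidates were drawn at random."""
    return pathway_assayed * n_candidates / n_assayed


def fisher_upper_tail(overlap: int, pathway_assayed: int,
                      n_candidates: int, n_assayed: int) -> float:
    """One-sided hypergeometric probability P(X >= overlap).

    X is the number of pathway genes among n_candidates genes drawn without
    replacement from n_assayed genes, pathway_assayed of which are in the pathway.
    """
    max_overlap = min(pathway_assayed, n_candidates)
    tail = sum(
        comb(pathway_assayed, k) * comb(n_assayed - pathway_assayed, n_candidates - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / comb(n_assayed, n_candidates)


if __name__ == "__main__":
    # Hypothetical universe: 10,000 assayed genes, 400 of them candidates.
    pathway_assayed, n_candidates, n_assayed = 8, 400, 10_000
    print(expected_overlap(pathway_assayed, n_candidates, n_assayed))
    print(fisher_upper_tail(1, pathway_assayed, n_candidates, n_assayed))
```

With an observed overlap of zero the upper tail covers every possible outcome, so the probability is 1, which is consistent with the many rows of the table that report an overlap of 0 and a p-value of 1.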
Table C.18: Significantly enriched and depleted GO terms from CCT and trans only gene lists, including tissue, group, accession, description, counts, rate of occurrence, and FDR-corrected p-values.
Group¹          GO Description                                               Cand. genes in acc.  Genes in acc.  Prop. cand. genes  Prop. assayed genes  FDR
Leaf-CCT-ABC    chloroplast                                                  135                  937            0.144              0.071                0.002
Leaf-CCT-ABC    plastid                                                      146                  1062           0.137              0.081                0.007
Leaf-CCT-ABC    thylakoid                                                    35                   171            0.205              0.013                0.012
Leaf-CCT-ABC    chloroplast thylakoid membrane                               26                   115            0.226              0.009                0.017
Leaf-CCT-ABC    DNA binding²                                                 43                   771            0.056              0.059                0.016
Ear-trans-A     chlorophyll biosynthetic process                             3                    13             0.231              0.001                0.027
Ear-trans-AB    nucleic acid binding transcription factor activity           26                   228            0.114              0.017                0
Ear-trans-AB    sequence-specific DNA binding transcription factor activity  26                   228            0.114              0.017                0
Ear-trans-AB    regulation of transcription, DNA-dependent                   40                   474            0.084              0.036                0
Ear-trans-AB    sequence-specific DNA binding                                20                   160            0.125              0.012                0
Ear-trans-AB    chlorophyll biosynthetic process                             6                    13             0.462              0.001                0.001
Ear-trans-AB    DNA binding                                                  49                   798            0.061              0.06                 0.013
Ear-trans-AB    biological process                                           273                  6578           0.042              0.499                0.022
Ear-trans-ABC   nucleic acid binding transcription factor activity           45                   228            0.197              0.017                0
Ear-trans-ABC   sequence-specific DNA binding transcription factor activity  45                   228            0.197              0.017                0
Ear-trans-ABC   regulation of transcription, DNA-dependent                   69                   474            0.146              0.036                0.003
Ear-trans-ABC   chlorophyll biosynthetic process                             7                    13             0.538              0.001                0.012
Ear-trans-ABC   sequence-specific DNA binding                                29                   160            0.181              0.012                0.02
Leaf-trans-AB   ribosome                                                     30                   294            0.102              0.022                0
Leaf-trans-AB   cell division                                                11                   77             0.143              0.006                0.03
Leaf-trans-AB   microtubule                                                  9                    60             0.15               0.005                0.048
Leaf-trans-AB   structural constituent of ribosome                           20                   224            0.089              0.017                0.048
Leaf-trans-ABC  cell division                                                21                   77             0.273              0.006                0.005
Table C.18: ¹ Tissue, candidate, and level of list. ² Under-represented GO term.
Appendix D
Characterization of domestication traits for selection candidate gene Zea agamous2
D.1 Foreword
This appendix details unpublished work on the characterization of a selected maize gene
known as Zea agamous2 (zag2), a homolog of the Arabidopsis thaliana gene Agamous.
I carried out the work, with other members of the Doebley Lab contributing to the
genotyping and phenotyping efforts.
D.2 Introduction
Multiple studies have sought to identify the signature of selection, both artificial and
natural, in evolving species [20, 35, 111, 112]. In maize, recent studies have looked for
the signature of selection both on a gene-by-gene basis [113, 114] and in genome-wide
scans [55]. Knowing that a gene was under selection during domestication says little about
its phenotypic impact, because population genetic analyses do not associate genes with
phenotypes. While some indication of phenotypic effect can be drawn from analysis of
selected genes with protein domain annotation and gene ontology tools, concrete
association of a gene with a phenotypic effect using empirical data is still desired.
One gene identified as a target of artificial selection in a recent study [114] is a known
homolog of Agamous from Arabidopsis thaliana. This Agamous homolog (Zea agamous2
or zag2) is located on the third chromosome at ∼137.2 megabases (AGPv2). The translated
protein of zag2 is 258 amino acids long and, downstream of the highly conserved
MADS-box domain, shares approximately 45% identity and 60% similarity with the
Arabidopsis Agamous protein [115]. In Arabidopsis thaliana, Agamous expression is
associated with the carpel, or flowering section, of the plant, and in maize zag2 appears
to be expressed exclusively in the carpels of developing ears [116]. The expression of zag2
mRNA in developing ears suggests a likely effect on domestication phenotypes in the
female inflorescence.
Our study of zag2 involved two techniques. First, we generated a set of recombinant
chromosome near isogenic lines (RCNILs) with recombination breakpoints between
zag2 and the next gene either upstream or downstream. RCNILs were genotyped using three
markers (upstream, at the gene, and downstream) to identify the location of each
recombination breakpoint with respect to zag2. Lines were then planted and phenotyped in
multiple environments for a large number of traits that focused on the ear but also
included a number of other plant and tassel traits. Second, a transgenic RNAi construct
carrying a portion of the zag2 gene was transformed into maize and backcrossed with
two maize inbred lines. We assessed percent fill as a proxy for sterility and tested
for presence of the construct using resistance to the BASTA herbicide. Neither of these
experiments produced evidence of a concrete link between a domestication phenotype and
zag2.
D.3 Methods
D.3.1 RCNILs
In the winter and summer of 2009, we screened 1,710 individuals drawn from a
heterogeneous inbred family that was heterozygous at zag2. The markers used were umc1102
and PZD00100. From this screen, thirteen individuals with recombination breakpoints
between the upstream and downstream genes were identified. Recombinant individuals
were selfed, and their progeny were genotyped with the same markers in the winter 2010
season. Homozygous individuals were identified and selfed again to produce the founding
members of the RCNILs. RCNIL seed was then used in subsequent summers for seed increase
and replicated field block trials.
Genomic DNA was also extracted from founding RCNIL individuals and used to genotype
the zag2 coding sequence. This was done with PZD00013.3 (a TaqMan SNP
marker) and ZHL0285-ZHL0286 (an indel marker). We classified RCNILs by location of
breakpoint (upstream or downstream of zag2) and genotype (maize or teosinte) at zag2. This
resulted in four recombinant NIL classes and two control NIL classes.
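The classification into four recombinant and two control NIL classes can be sketched in Python as follows. This is an illustration only: the function name, the single-letter genotype codes ("M" for homozygous maize, "T" for homozygous teosinte), and the class labels are hypothetical, and the handling of a line that differs from zag2 on both sides is an assumption, since the text does not mention such lines.

```python
def classify_nil(upstream: str, gene: str, downstream: str) -> str:
    """Assign a NIL class from homozygous genotypes at three markers.

    Each argument is "M" (maize) or "T" (teosinte) for the marker upstream of
    zag2, at zag2, and downstream of zag2. Labels are illustrative only.
    """
    for allele in (upstream, gene, downstream):
        if allele not in ("M", "T"):
            raise ValueError(f"unexpected genotype code: {allele!r}")

    parent = "maize" if gene == "M" else "teosinte"

    # No breakpoint in the region: a control NIL.
    if upstream == gene == downstream:
        return f"control-{parent}"
    # Breakpoint between the upstream marker and zag2.
    if upstream != gene and gene == downstream:
        return f"recombinant-upstream-{parent}"
    # Breakpoint between zag2 and the downstream marker.
    if upstream == gene and gene != downstream:
        return f"recombinant-downstream-{parent}"
    # Both flanking markers differ from zag2 (assumed not to occur in the RCNILs).
    return "double-recombinant"


if __name__ == "__main__":
    for genotype in [("M", "M", "M"), ("T", "M", "M"), ("T", "T", "M"), ("T", "T", "T")]:
        print(genotype, classify_nil(*genotype))
```

Two breakpoint positions crossed with two genotypes at zag2 give the four recombinant classes, and the two unrecombined genotypes give the two control classes.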
Phenotyping blocks consisted of RCNILs and several control NILs that were homozygous
maize or teosinte for the entire zag2 region. Lines were planted in randomized
twelve-plant plots in four blocks in each of the summers of 2010 and 2011 at the West
Madison Agricultural Research Station (WMARS). Thirteen plant architecture traits and seven
ear traits were measured (Table D.1) for up to five plants per phenotyping block.
Phenotype measurements were fit to a basic linear mixed model (Equation D.1) in R
[91] using the lme4 package. This basic model included only a fixed effect for the
RCNIL line ($a_i$) and a random effect for the block ($b_j$). Positional covariates were
omitted because the blocks were small and variation due to X and Y position seemed
unlikely to be significant.
$$y_{ijk} = \mu + a_i + b_j + e_{ijk} \tag{D.1}$$
Here $y_{ijk}$ is the phenotype of plant $k$ of line $i$ in block $j$, $\mu$ is the overall
mean, and $e_{ijk}$ is the residual error. After this model was fit, fixed effects estimates
and standard errors were extracted and we looked for association of the phenotypes
(represented by fixed effects estimates) with NIL class.
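The structure of the model in Equation D.1 can be illustrated by simulation. The Python sketch below is not the lme4 fit described above and does not reproduce any of the data: it simulates phenotypes from a line effect, a block effect, and residual noise, and then uses plain per-line means as a stand-in estimator. In a balanced design every line mean contains the same average block effect, so differences between line means estimate differences between line effects. All names, effect sizes, and design dimensions are hypothetical.

```python
import random
from statistics import mean


def simulate(line_effects, n_blocks=4, plants_per_plot=5, mu=100.0,
             block_sd=2.0, resid_sd=1.0, seed=1):
    """Simulate y = mu + a_i + b_j + e for every line in every block."""
    rng = random.Random(seed)
    block_effects = [rng.gauss(0.0, block_sd) for _ in range(n_blocks)]
    rows = []
    for line, a_i in line_effects.items():
        for block, b_j in enumerate(block_effects):
            for _ in range(plants_per_plot):
                y = mu + a_i + b_j + rng.gauss(0.0, resid_sd)
                rows.append((line, block, y))
    return rows


def line_means(rows):
    """Per-line means; a simple estimator that is adequate only for balanced data."""
    by_line = {}
    for line, _block, y in rows:
        by_line.setdefault(line, []).append(y)
    return {line: mean(values) for line, values in by_line.items()}


if __name__ == "__main__":
    # Hypothetical line effects relative to a maize control NIL.
    effects = {"maize_control": 0.0, "rcnil_a": -1.0, "teosinte_control": -3.0}
    estimates = line_means(simulate(effects))
    # Sort lines from least to greatest estimate.
    for line in sorted(estimates, key=estimates.get):
        print(line, round(estimates[line], 2))
```

A mixed-model fit additionally provides standard errors that account for the block variance, and it remains valid when plots are missing or unbalanced, which the simple means above do not.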
D.3.2 Transgenic RNAi lines
A zag2 RNA interference (RNAi) construct was developed and introduced into maize.
Thirteen insertion events of the RNAi construct were recovered and crossed to the maize
inbreds B73 and A682. The resulting progeny were planted in the summer of 2009,
and ears were harvested for observation of phenotypes. We scored the percent fill of ears
to assess the sterility of individuals with and without the RNAi construct insertion
events. The construct carried a BASTA herbicide resistance gene, which allowed presence
or absence of the construct to be scored by BASTA herbicide treatment.
In total, 275 individuals, both BASTA resistant and susceptible (construct present
or absent), were harvested and scored for the sterility phenotype. Percent fill was
estimated in a randomized, blind manner to avoid bias caused by knowledge of an
individual's construct genotype. Phenotypes were analyzed using simple t-test
comparisons in R [91].
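For readers unfamiliar with the comparison, the following Python sketch computes a two-sample t statistic and its degrees of freedom in the Welch (unequal variance) form. R's `t.test` uses the Welch form by default, but whether that default was used here is not stated, so the choice is an assumption. The function name and the percent fill values are hypothetical and are not data from this study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return the Welch t statistic and Welch-Satterthwaite degrees of freedom.

    Both samples need at least two observations and must not both have
    zero variance, otherwise the statistic is undefined.
    """
    nx, ny = len(x), len(y)
    # Squared standard error contributed by each sample mean.
    sx2, sy2 = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / sqrt(sx2 + sy2)
    df = (sx2 + sy2) ** 2 / (sx2 ** 2 / (nx - 1) + sy2 ** 2 / (ny - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical percent fill scores for one event in one background.
    susceptible = [90.0, 100.0, 95.0, 85.0]  # construct absent
    resistant = [20.0, 30.0, 25.0, 35.0]     # construct present
    t, df = welch_t(susceptible, resistant)
    print(round(t, 2), round(df, 1))
```

Converting the statistic to a p-value requires the t distribution, which the Python standard library does not provide; a statistics package would be used for that step.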
Table D.1: Trait abbreviations and descriptions from the zag2 experiment.
Trait abbreviation  Description
CULM                Culm diameter
BARE                Barren nodes
BRNO                Number of nodes with silks
LWID                Leaf width
LCS                 Length of central spike
TBN                 Tassel branch number
EAHT                Ear height
PLHT                Plant height
TILL                Tillering index
BRLH                Branch length including ear
NODE                Nodes on lateral branch
LBIL                Lateral branch internode length
PROL                Prolificacy
FILL                Percent fill
EARL                Ear length
EARD                Ear diameter
KRN                 Kernel row number
CUPR                Cupules per rank
STAM                Percent staminate spikelets
KW                  Single kernel weight
D.4 Results
D.4.1 RCNILs
The fixed effects estimates and standard errors were sorted from least to greatest,
plotted as barplots, and inspected for association with RCNIL type, defined by genotype
upstream of, at, and downstream of zag2. While a few single RCNILs differed from the
others, RCNIL types did not cluster into clearly differentiated phenotype groupings
for any of the thirteen plant and seven ear phenotypes. Generally, the phenotype
estimates for the maize and teosinte control NILs also did not cleanly separate from each
other. An example of RCNIL estimates sorted from least to greatest is shown for single
kernel weight (Figure D.1). While the maize and teosinte control NILs are not
intermingled, the four RCNIL types do not cluster by genotype. Additionally, we
see RCNILs with lower phenotype estimates than the maize control NILs, suggesting that
if zag2 influences kernel weight it does so in an unexpected underdominant manner.
D.4.2 Transgenic RNAi lines
Generally, high percent fill was seen in transgenic plants. The two maize backgrounds
(B73 and A682) were not significantly different from each other in percent fill (t-test,
p = 0.525). Data were collected in both the A682 and B73 maize inbred backgrounds for
only three RNAi transformation events. Among these three events, a consistent result
was seen for only one (event 39 had no effect in either background, Table D.2),
suggesting that the effect of an event depends on genetic background. Of the fifteen
combinations of transformation event and background, only four had significantly different
percent fill between resistant plants (construct positive) and susceptible plants (construct
negative). Three of the significant results were large shifts of more than 60 percentage
points in percent fill, while the fourth was a more moderate shift of 11 percentage points.
Figure D.1: Single kernel weight estimates for zag2 RCNILs. RCNIL class is indicated in the bar, with error bars indicating the standard error. Maize and teosinte NILs are not intermingled; however, there is also no clear separation of the RCNIL types (t1, t2, t3, t4) when lines are sorted by estimated phenotype. Furthermore, RCNILs have a lower phenotype than either of the control NILs, suggesting some sort of underdominance may be at work.
Table D.2: Zag2 transgenic RNAi insertion event, background, phenotype, and t-test p-value.
Maize Background  Event  Percent Fill (Resistant)  Percent Fill (Susceptible)  p-value
A682              17     97.8%                     98.9%                       6.63e-01
B73               23     90.0%                     91.0%                       6.49e-01
B73               24     100.0%                    97.8%                       3.47e-01
B73               33     23.0%                     97.5%                       2.94e-04
A682              35     20.0%                     94.0%                       6.01e-04
B73               35     98.0%                     98.8%                       6.87e-01
A682              39     98.0%                     90.0%                       1.15e-01
B73               39     93.3%                     95.0%                       4.89e-01
B73               43     32.2%                     95.0%                       1.17e-07
A682              45     100.0%                    92.9%                       2.53e-01
A682              46     95.6%                     94.4%                       8.29e-01
A682              47     84.4%                     83.0%                       8.56e-01
B73               47     100.0%                    88.9%                       2.75e-03
B73               49     94.4%                     86.7%                       1.68e-01
B73               50     91.1%                     90.0%                       3.47e-01
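The summary counts quoted in Section D.4.2 can be checked directly against Table D.2. The Python sketch below transcribes the table and tallies the significant comparisons, the size of each significant shift, and the events scored in both backgrounds. The variable names are hypothetical, and the 0.05 significance threshold is an assumption, since the text does not state the cutoff that was applied.

```python
# (background, event, percent fill resistant, percent fill susceptible, p-value)
TABLE_D2 = [
    ("A682", 17, 97.8, 98.9, 6.63e-01),
    ("B73", 23, 90.0, 91.0, 6.49e-01),
    ("B73", 24, 100.0, 97.8, 3.47e-01),
    ("B73", 33, 23.0, 97.5, 2.94e-04),
    ("A682", 35, 20.0, 94.0, 6.01e-04),
    ("B73", 35, 98.0, 98.8, 6.87e-01),
    ("A682", 39, 98.0, 90.0, 1.15e-01),
    ("B73", 39, 93.3, 95.0, 4.89e-01),
    ("B73", 43, 32.2, 95.0, 1.17e-07),
    ("A682", 45, 100.0, 92.9, 2.53e-01),
    ("A682", 46, 95.6, 94.4, 8.29e-01),
    ("A682", 47, 84.4, 83.0, 8.56e-01),
    ("B73", 47, 100.0, 88.9, 2.75e-03),
    ("B73", 49, 94.4, 86.7, 1.68e-01),
    ("B73", 50, 91.1, 90.0, 3.47e-01),
]

ALPHA = 0.05  # assumed significance threshold; not stated in the text

# Significant comparisons with the shift (resistant minus susceptible) in percentage points.
significant = [
    (background, event, round(resistant - susceptible, 1))
    for background, event, resistant, susceptible, p_value in TABLE_D2
    if p_value < ALPHA
]

# Events that were scored in both maize backgrounds.
backgrounds_by_event = {}
for background, event, *_ in TABLE_D2:
    backgrounds_by_event.setdefault(event, set()).add(background)
shared_events = sorted(e for e, b in backgrounds_by_event.items() if len(b) == 2)

if __name__ == "__main__":
    print(len(TABLE_D2), "event and background combinations")
    print("significant:", significant)
    print("events in both backgrounds:", shared_events)
```

Under the assumed threshold, the tally returns four significant combinations out of fifteen, three of them with a drop of more than 60 percentage points in resistant plants, and three events scored in both backgrounds.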
D.5 Discussion
The results obtained from measurement of phenotypes in RCNILs do not show a clear
phenotypic effect of the zag2 gene. Given their standard errors, the estimates for the
maize and teosinte control lines were never significantly different from each other.
Furthermore, the remaining four genotype classes, distinguished by genotype upstream of,
at, and downstream of zag2, failed to cluster into distinct groups based on phenotype.
Overall, there is little if any evidence that zag2 affects any of the 20 measured
phenotypes.
The reduction in expression of zag2 via transgenic RNAi constructs likewise failed
to present compelling evidence for a phenotypic effect on percent fill of the ear. Overall,
relatively few zag2 RNAi transformation events resulted in increased sterility (measured
by percent fill). The effect of any given event on sterility seems to be highly dependent
on genetic background, since fewer than half of the events assessed in multiple maize
backgrounds gave the same result. Most significant results consisted of a drastic increase
in sterility, suggesting a major genetic dysfunction. We conclude that the zag2 RNAi
constructs gave largely non-significant results, punctuated by several cases of
severe genetic dysfunction. Furthermore, the inconsistent effects of specific transformation
events in different maize backgrounds seem unlikely to be related to zag2.
We failed to identify a phenotypic effect for zag2 in spite of evidence from the literature
that zag2 is expressed in the ear [116] and codes for a homolog of a known floral
development gene in Arabidopsis [115]. It may be that zag2 controls a phenotype that was
under selection during maize domestication but that we did not measure. Work by Schmidt
et al. [116] shows that zag2 and another Agamous homolog (zag1) are expressed in endosperm
post-pollination, suggesting a potential role in kernel quality and composition. While we
did measure kernel weight, there are many factors that contribute to kernel quality and
desirability that we did not assess, including the ratio of hard to soft endosperm and
protein, oil, and starch content.
A potential complicating factor in our analysis of zag2 is the existence of three
additional Agamous homologs in maize [115]. These homologs also share a high degree of
identity with the Arabidopsis Agamous and, consequently, a high degree of identity and
similarity with each other. Of particularly high protein identity with zag2 is Zea mays
Mads1 (zmm1), which is over 95% identical. The high degree of identity between the
maize Agamous homologs is concerning in conjunction with expression in the same
tissues [116], as it suggests functional conservation as well as sequence conservation. For
example, if the zmm1 gene can substitute functionally for the zag2 gene in the developing
ear, then an experiment looking for an ear phenotypic response (such as the RCNIL
experiment) would need to account for the genotype at both zag2 and zmm1.
The failure to associate a domestication phenotype with zag2 demonstrates the
difficulty of using a population genetics approach to identify interesting candidate genes.
From the perspective of population genetics, zag2 appears to have been under selection
during maize domestication and has homology with a known floral development
gene in Arabidopsis. A phenotypic effect on a domestication ear phenotype seems quite
likely; however, we did not see any noticeable effects in the female inflorescence in these
experiments. Similar difficulties in associating phenotype with selection candidate genes
have been encountered for two other genes in our lab. The Prolamin-box Binding Factor1
gene was extensively phenotyped for plant architecture and ear traits (unpublished data)
before a slight difference in kernel size and density was finally identified [14].
Additionally, the Zea agamous-like1 gene appears to have a significant effect on days to
anthesis, or flowering time, in maize (unpublished data); however, flowering time is not a
standard domestication trait. This study sheds light on the difficulty of associating
phenotype with a selection candidate gene and provides a word of caution for future
studies seeking to do so.
References
[1] Gaines T, Zhang W, Wang D, Bukun B, Chisholm ST, et al. (2010) Gene amplification confers glyphosate resistance in Amaranthus palmeri. Proceedings of the National Academy of Sciences 107: 1029–34.
[2] Gompel N, Prud’homme B, Wittkopp PJ, Kassner V, Carroll SB (2005) Chance caught on the wing: cis-regulatory evolution and the origin of pigment patterns in Drosophila. Nature 433: 481–7.
[3] Studer A, Zhao Q, Ross-Ibarra J, Doebley J (2011) Identification of a functional transposon insertion in the maize domestication gene tb1. Nature Genetics 43: 1160–3.
[4] Wills DM, Whipple CJ, Takuno S, Kursel LE, Shannon LM, et al. (2013) From Many, One: Genetic Control of Prolificacy during Maize Domestication. PLoS Genetics 9: e1003604.
[5] Wang H, Nussbaum-Wagler T, Li B, Zhao Q, Vigouroux Y, et al. (2005) The origin of the naked grains of maize. Nature 436: 714–9.
[6] Sun L, Li X, Fu Y, Zhu Z, Tan L, et al. (2013) GS6, a member of the GRAS gene family, negatively regulates grain size in rice. Journal of Integrative Plant Biology: 1–37.
[7] Olsen KM, Wendel JF (2013) A bountiful harvest: genomic insights into crop domestication phenotypes. Annual Review of Plant Biology 64: 47–70.
[8] Doebley J (2004) The genetics of maize evolution. Annual Review of Genetics 38: 37–59.
[9] Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, et al. (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–5.
[10] Allaby RG, Fuller DQ, Brown TA (2008) The genetic expectations of a protracted model for the origins of domesticated crops. Proceedings of the National Academy of Sciences 105: 13982–6.
[11] Pickersgill B (2007) Domestication of plants in the Americas: insights from Mendelian and molecular genetics. Annals of Botany 100: 925–40.
[12] Carroll SB (2008) Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell 134: 25–36.
[13] Wittkopp PJ, Kalay G (2012) Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nature Reviews Genetics 13: 59–69.
[14] Lang Z, Wills D, Lemmon Z, Shannon L, Bukowski R, et al. (2014) Defining the role of prolamin-box binding factor1 gene during maize domestication. The Journal of Heredity: In Press.
[15] Konishi S, Izawa T, Lin SY, Ebana K, Fukuta Y, et al. (2006) An SNP caused loss of seed shattering during rice domestication. Science 312: 1392–6.
[16] Frary A, Nesbitt TC, Grandillo S, Knaap E, Cong B, et al. (2000) fw2.2: a quantitative trait locus key to the evolution of tomato fruit size. Science 289: 85–8.
[17] Rapp RA, Haigler CH, Flagel L, Hovav RH, Udall JA, et al. (2010) Gene expression in developing fibres of Upland cotton (Gossypium hirsutum L.) was massively altered by domestication. BMC Biology 8: 139.
[18] Swanson-Wagner R, Briskine R, Schaefer R, Hufford MB, Ross-Ibarra J, et al. (2012) Reshaping of the maize transcriptome by domestication. Proceedings of the National Academy of Sciences 109: 11878–83.
[19] Koenig D, Jimenez-Gomez JM, Kimura S, Fulop D, Chitwood DH, et al. (2013) Comparative transcriptomics reveals patterns of selection in domesticated and wild tomato. Proceedings of the National Academy of Sciences 110: E2655–62.
[20] Emerson JJ, Hsieh LC, Sung HM, Wang TY, Huang CJ, et al. (2010) Natural selection on cis and trans regulation in yeasts. Genome Research 20: 826–36.
[21] McManus CJ, Coolon JD, Duff MO, Eipper-Mains J, Graveley BR, et al. (2010) Regulatory divergence in Drosophila revealed by mRNA-seq. Genome Research 20: 816–25.
[22] White MA, Stubbings M, Dumont BL, Payseur BA (2012) Genetics and evolution of hybrid male sterility in house mice. Genetics 191: 917–34.
[23] Alem S, Streiff R, Courtois B, Zenboudji S, Limousin D, et al. (2013) Genetic architecture of sensory exploitation: QTL mapping of female and male receiver traits in an acoustic moth. Journal of Evolutionary Biology 26: 2581–96.
[24] Miller CT, Glazer AM, Summers BR, Blackman BK, Norman AR, et al. (2014) Modular Skeletal Evolution in Sticklebacks Is Controlled by Additive and Clustered Quantitative Trait Loci. Genetics: In Press.
[25] Shannon LM (2012) The Genetic Architecture of Maize Domestication and Range Expansion. Ph.D. dissertation, University of Wisconsin - Madison.
[26] Wills DM, Burke JM (2007) Quantitative trait locus analysis of the early domestication of sunflower. Genetics 176: 2589–99.
[27] Paterson AH, Damon S, Hewitt JD, Zamir D, Rabinowitch HD, et al. (1991) Mendelian factors underlying quantitative traits in tomato: comparison across species, generations, and environments. Genetics 127: 181–97.
[28] Xiong LZ, Liu KD, Dai XK, Xu CG, Zhang Q (1999) Identification of genetic factors controlling domestication-related traits of rice using an F2 population of a cross between Oryza sativa and O. rufipogon. Theoretical and Applied Genetics 98: 243–251.
[29] Peng J, Ronin Y, Fahima T, Roder MS, Li Y, et al. (2003) Domestication quantitative trait loci in Triticum dicoccoides, the progenitor of wheat. Proceedings of the National Academy of Sciences 100: 2489–94.
[30] Cai W, Morishima H (2002) QTL clusters reflect character associations in wild and cultivated rice. Theoretical and Applied Genetics 104: 1217–1228.
[31] Gyenis L, Yun SJ, Smith KP, Steffenson BJ, Bossolini E, et al. (2007) Genetic architecture of quantitative trait loci associated with morphological and agronomic trait differences in a wild by cultivated barley cross. Genome 50: 714–23.
[32] Simons KJ, Fellers JP, Trick HN, Zhang Z, Tai YS, et al. (2006) Molecular characterization of the major wheat domestication gene Q. Genetics 172: 547–55.
[33] Li C, Zhou A, Sang T (2006) Rice domestication by reducing shattering. Science 311: 1936–9.
[34] Cong B, Barrero LS, Tanksley SD (2008) Regulatory change in YABBY-like transcription factor led to evolution of extreme fruit size during tomato domestication. Nature Genetics 40: 800–4.
[35] Wright SI, Bi IV, Schroeder SG, Yamasaki M, Doebley JF, et al. (2005) The effects of artificial selection on the maize genome. Science 308: 1310–4.
[36] Doebley JF, Gaut BS, Smith BD (2006) The molecular genetics of crop domestication. Cell 127: 1309–21.
[37] Briggs WH, McMullen MD, Doebley JF, Gaut BS (2007) Linkage mapping of domestication loci in a large maize teosinte backcross resource. Genetics 177: 1915–28.
[38] Doebley J, Stec A (1991) Genetic analysis of the morphological differences between maize and teosinte. Genetics 129: 285–95.
[39] Whipple CJ, Kebrom TH, Weber AL, Yang F, Hall D, et al. (2011) Grassy Tillers1 Promotes Apical Dominance in Maize and Responds To Shade Signals in the Grasses. Proceedings of the National Academy of Sciences 108: E506–12.
[40] Doebley J, Stec A, Hubbard L (1997) The evolution of apical dominance in maize. Nature 386: 485–8.
[41] Clark RM, Nussbaum-Wagler T, Quijada P, Doebley J (2006) A distant upstream enhancer at the maize domestication gene tb1 has pleiotropic effects on plant and inflorescent architecture. Nature Genetics 38: 594–7.
[42] Studer AJ, Doebley JF (2011) Do large effect QTL fractionate? A case study at the maize domestication QTL teosinte branched1. Genetics 188: 673–81.
[43] Doebley J, Stec A (1993) Inheritance of the morphological differences between maize and teosinte: comparison of results for two F2 populations. Genetics 134: 559–70.
[44] Littell R, Milliken G, Stroup W, Wolfinger R (1996) SAS system for mixed models. SAS Institute, Cary, NC., 2nd edition.
[45] Broman KW, Wu H, Sen S, Churchill G (2003) R/qtl: QTL mapping in experimental crosses. Bioinformatics 19: 889–890.
[46] Broman KW, Sen S (2009) A Guide to QTL Mapping with R/qtl. Statistics for Biology and Health. New York, NY: Springer New York. doi:10.1007/978-0-387-92125-9. URL http://www.springerlink.com/index/10.1007/978-0-387-92125-9.
[47] Kosambi DD (1944) The Estimation of Map Distances from Recombination Values. Annals of Eugenics 12: 172–175.
[48] Orr HA (1998) The Population Genetics of Adaptation: The Distribution of Factors Fixed during Adaptive Evolution. Evolution 52: 935.
[49] Beavis WD (1998) QTL Analyses: Power, Precision, and Accuracy. In: Paterson AH, editor, Molecular Dissection of Complex Traits, New York, NY: CRC Press, chapter 10. 1st edition, pp. 145–162.
[50] Hung HY, Shannon LM, Tian F, Bradbury PJ, Chen C, et al. (2012) ZmCCT and the genetic basis of day-length adaptation underlying the postdomestication spread of maize. Proceedings of the National Academy of Sciences 109: E1913–21.
[51] Yilmaz A, Nishiyama MY, Fuentes BG, Souza GM, Janies D, et al. (2009) GRASSIUS: a platform for comparative regulatory genomics across the grasses. Plant Physiology 149: 171–80.
[52] Yanofsky MF, Ma H, Bowman JL, Drews GN, Feldmann KA, et al. (1990) The protein encoded by the Arabidopsis homeotic gene agamous resembles transcription factors. Nature 346: 35–9.
[53] Schwarz-Sommer Z, Huijser P, Nacken W, Saedler H, Sommer H (1990) Genetic Control of Flower Development by Homeotic Genes in Antirrhinum majus. Science 250: 931–6.
[54] Smaczniak C, Immink RGH, Angenent GC, Kaufmann K (2012) Developmental and evolutionary diversity of plant MADS-domain factors: insights from recent studies. Development 139: 3081–98.
[55] Hufford MB, Xu X, van Heerwaarden J, Pyhajarvi T, Chia JM, et al. (2012) Comparative population genomics of maize domestication and improvement. Nature Genetics 44: 808–11.
[56] Sekhon RS, Lin H, Childs KL, Hansey CN, Robin Buell C, et al. (2011) Genome-wide atlas of transcription through maize development. The Plant Journal: 1–11.
[57] Xue W, Xing Y, Weng X, Zhao Y, Tang W, et al. (2008) Natural variation in Ghd7 is an important regulator of heading date and yield potential in rice. Nature Genetics 40: 761–7.
[58] Li Y, Fan C, Xing Y, Jiang Y, Luo L, et al. (2011) Natural variation in GS5 plays an important role in regulating grain size and yield in rice. Nature Genetics 43: 1266–9.
[59] Fan C, Xing Y, Mao H, Lu T, Han B, et al. (2006) GS3, a major QTL for grain length and weight and minor QTL for grain width and thickness in rice, encodes a putative transmembrane protein. Theoretical and Applied Genetics 112: 1164–71.
[60] Yu B, Lin Z, Li H, Li X, Li J, et al. (2007) TAC1, a major quantitative trait locus controlling tiller angle in rice. The Plant Journal 52: 891–8.
[61] Jin J, Huang W, Gao JP, Yang J, Shi M, et al. (2008) Genetic control of rice plant architecture under domestication. Nature Genetics 40: 1365–9.
[62] Yang Q, Li Z, Li W, Ku L, Wang C, et al. (2013) CACTA-like transposable element in ZmCCT attenuated photoperiod sensitivity and accelerated the postdomestication spread of maize. Proceedings of the National Academy of Sciences 110: 16969–74.
[63] Kermicle JL (2006) A selfish gene governing pollen-pistil compatibility confers reproductive isolation between maize relatives. Genetics 172: 499–506.
[64] Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, et al. (2011) A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6: e19379.
[65] Carroll SB (2005) Evolution at two levels: on genes and form. PLoS Biology 3: e245.
[66] Stern DL, Orgogozo V (2008) The loci of evolution: how predictable is genetic evolution? Evolution 62: 2155–77.
[67] Springer NM, Stupar RM (2007) Allele-specific expression patterns reveal biases and embryo-specific parent-of-origin effects in hybrid maize. The Plant Cell 19: 2391–402.
[68] Bell GDM, Kane NC, Rieseberg LH, Adams KL (2013) RNA-seq analysis of allele-specific expression, hybrid effects, and regulatory divergence in hybrids compared with their parents from natural populations. Genome Biology and Evolution 5: 1309–23.
[69] Song G, Guo Z, Liu Z, Cheng Q, Qu X, et al. (2013) Global RNA sequencing reveals that genotype-dependent allele-specific expression contributes to differential expression in rice F1 hybrids. BMC Plant Biology 13: 221.
[70] Zhang X, Borevitz JO (2009) Global analysis of allele-specific expression in Arabidopsis thaliana. Genetics 182: 943–54.
[71] Tirosh I, Reikhav S, Levy AA, Barkai N (2009) A yeast hybrid provides insight into the evolution of gene expression regulation. Science 324: 659–62.
[72] He F, Zhang X, Hu J, Turck F, Dong X, et al. (2012) Genome-wide Analysis of Cis-regulatory Divergence between Species in the Arabidopsis Genus. Molecular Biology and Evolution 29: 3385–3395.
[73] Schaefke B, Emerson JJ, Wang TY, Lu MYJ, Hsieh LC, et al. (2013) Inheritance of gene expression level and selective constraints on trans- and cis-regulatory changes in yeast. Molecular Biology and Evolution 30: 2121–33.
[74] Purugganan MD, Fuller DQ (2009) The nature of selection during plant domestication. Nature 457: 843–8.
[75] Zhong S, Joung JG, Zheng Y, Chen YR, Liu B, et al. (2011) High-throughput illumina strand-specific RNA sequencing library preparation. Cold Spring Harbor Protocols 2011: 940–9.
[76] Wang X, Soloway PD, Clark AG (2011) A survey for novel imprinted genes in the mouse placenta by mRNA-seq. Genetics 189: 109–22.
[77] Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–60.
[78] DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, et al. (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics 43: 491–8.
[79] McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, et al. (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20: 1297–303.
[80] Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10: R25.
[81] Storey JD (2002) A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64: 479–498.
[82] Lester RN (1989) Evolution under domestication involving disturbance of genic balance. Euphytica 44: 125–132.
[83] Gross BL, Olsen KM (2010) Genetic perspectives on crop domestication. Trends in Plant Science 15: 529–537.
[84] Burger JC, Chapman MA, Burke JM (2008) Molecular insights into the evolution of crop plants. American Journal of Botany 95: 113–122.
[85] Dean RB, Dixon WJ (1951) Simplified Statistics for Small Numbers of Observations. Analytical Chemistry 23: 636–638.
[86] Jin J, Zhang H, Kong L, Gao G, Luo J (2014) PlantTFDB 3.0: a portal for the functional and evolutionary study of plant transcription factors. Nucleic Acids Research 42: D1182–7.
[87] Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M (2012) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Research 40: D109–14.
[88] Kanehisa M (2000) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 28: 27–30.
[89] Chen H, Patterson N, Reich D (2010) Population differentiation as a test for selective sweeps. Genome Research 20: 393–402.
[90] Young MD, Wakefield MJ, Smyth GK, Oshlack A (2010) Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biology 11: R14.
[91] R Development Core Team (2013). R: A language and environment for statistical computing. URL http://www.r-project.org/.
[92] Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological) 57: 289–300.
[93] Eichten SR, Briskine R, Song J, Li Q, Swanson-Wagner R, et al. (2013) Epigenetic and genetic influences on DNA methylation variation in maize populations. The Plant Cell 25: 2783–97.
[94] Duncan IW (2002) Transvection effects in Drosophila. Annual Review of Genetics 36: 521–56.
[95] Hansey CN, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, et al. (2012) Maize (Zea mays L.) genome diversity as revealed by RNA-sequencing. PLoS One 7: e33071.
[96] Springer NM, Ying K, Fu Y, Ji T, Yeh CT, et al. (2009) Maize inbreds exhibit high levels of copy number variation (CNV) and presence/absence variation (PAV) in genome content. PLoS Genetics 5: e1000734.
[97] Chia JM, Song C, Bradbury PJ, Costich D, de Leon N, et al. (2012) Maize HapMap2 identifies extant variation from a genome in flux. Nature Genetics 44: 803–7.
[98] Tenaillon MI, U’Ren J, Tenaillon O, Gaut BS (2004) Selection versus demography: a multilocus investigation of the domestication process in maize. Molecular Biology and Evolution 21: 1214–25.
[99] Clark RM, Linton E, Messing J, Doebley JF (2004) Pattern of diversity in the genomic region near the maize domestication gene tb1. Proceedings of the National Academy of Sciences 101: 700–7.
[100] Hanning I, Baumgarten K, Schott K, Heldt H (1999) Oxaloacetate transport into plant mitochondria. Plant Physiology 119: 1025–32.
[101] Zoglowek C, Kromer S, Heldt HW (1988) Oxaloacetate and malate transport by plant mitochondria. Plant Physiology 87: 109–15.
[102] Hunt HV, Denyer K, Packman LC, Jones MK, Howe CJ (2010) Molecular basis of the waxy endosperm starch phenotype in broomcorn millet (Panicum miliaceum L.). Molecular Biology and Evolution 27: 1478–94.
[103] Fan L, Bao J, Wang Y, Yao J, Gui Y, et al. (2009) Post-domestication selection in the maize starch pathway. PLoS One 4: e7612.
[104] Park YJ, Nemoto K, Nishikawa T, Matsushima K, Minami M, et al. (2009) Waxy strains of three amaranth grains raised by different mutations in the coding region. Molecular Breeding 25: 623–635.
[105] Dussert Y, Remigereau MS, Fontaine MC, Snirc A, Lakis G, et al. (2013) Polymorphism pattern at a miniature inverted-repeat transposable element locus downstream of the domestication gene Teosinte-branched1 in wild and domesticated pearl millet. Molecular Ecology 22: 327–40.
[106] Sugimoto K, Takeuchi Y, Ebana K, Miyao A, Hirochika H, et al. (2010) Molecular cloning of Sdr4, a regulator involved in seed dormancy and domestication of rice. Proceedings of the National Academy of Sciences 107: 5792–7.
[107] Weller JL, Liew LC, Hecht VFG, Rajandran V, Laurie RE, et al. (2012) A conserved molecular basis for photoperiod adaptation in two temperate legumes. Proceedings of the National Academy of Sciences 109: 21158–63.
[108] Zhu BF, Si L, Wang Z, Zhou Y, Zhu J, et al. (2011) Genetic control of a transition from black to straw-white seed hull in rice domestication. Plant Physiology 155: 1301–11.
[109] Liu J, Van Eck J, Cong B, Tanksley SD (2002) A new class of regulatory genes underlying the cause of pear-shaped tomato fruit. Proceedings of the National Academy of Sciences 99: 13302–6.
[110] Gallavotti A, Zhao Q, Kyozuka J, Meeley RB, Ritter MK, et al. (2004) The role of barren stalk1 in the architecture of maize. Nature 432: 630–5.
[111] Carling MD, Brumfield RT (2009) Speciation in Passerina buntings: introgression patterns of sex-linked loci identify a candidate gene region for reproductive isolation. Molecular Ecology 18: 834–47.
[112] Pool JE, Corbett-Detig RB, Sugino RP, Stevens KA, Cardeno CM, et al. (2012) Population Genomics of sub-saharan Drosophila melanogaster: African diversity and non-African admixture. PLoS Genetics 8: e1003080.
[113] Zhao Q, Thuillet AC, Uhlmann NK, Weber AL, Rafalski JA, et al. (2008) The role of regulatory genes during maize domestication: evidence from nucleotide polymorphism and gene expression. Genetics 178: 2133–43.
[114] Zhao Q, Weber AL, McMullen MD, Guill K, Doebley J (2011) MADS-box genes of maize: frequent targets of selection during domestication. Genetics Research 93: 65–75.
[115] Theissen G, Strater T, Fischer A, Saedler H (1995) Structural characterization, chromosomal localization and phylogenetic evaluation of two pairs of AGAMOUS-like MADS-box genes from maize. Gene 156: 155–66.
[116] Schmidt RJ, Veit B, Mandel MA, Mena M, Hake S, et al. (1993) Identification and molecular characterization of ZAG1, the maize homolog of the Arabidopsis floral homeotic gene AGAMOUS. The Plant Cell 5: 729–37.