mesoamerican origin of the common bean …mesoamerican origin of the common bean (phaseolus vulgaris...

9
Mesoamerican origin of the common bean (Phaseolus vulgaris L.) is revealed by sequence data Elena Bitocchi a , Laura Nanni a , Elisa Bellucci a , Monica Rossi a , Alessandro Giardini a , Pierluigi Spagnoletti Zeuli b , Giuseppina Logozzo b , Jens Stougaard c , Phillip McClean d , Giovanna Attene e , and Roberto Papa a,f,1 a Dipartimento di Scienze Agrarie, Alimentari ed Ambientali, Università Politecnica delle Marche, 60131 Ancona, Italy; b Dipartimento di Biologia Difesa e Biotecnologie Agro-Forestali, Università degli Studi della Basilicata, 85100 Potenza, Italy; c Centre for Carbohydrate Recognition and Signalling, Department of Molecular Biology, Aarhus University, DK-8000 Aarhus C, Denmark; d Department of Plant Sciences, North Dakota State University, Fargo, ND 58105; e Dipartimento di Scienze Agronomiche e Genetica Vegetale Agraria, Università degli Studi di Sassari, 07100 Sassari, Italy; and f Cereal Research Centre, Agricultural Research Council (CRA-CER), S.S. 16, Km 675, 71122 Foggia, Italy Edited* by Jeffrey L. Bennetzen, University of Georgia, Athens, GA, and approved December 16, 2011 (received for review June 3, 2011) Knowledge about the origins and evolution of crop species rep- resents an important prerequisite for efcient conservation and use of existing plant materials. This study was designed to solve the ongoing debate on the origins of the common bean by investigating the nucleotide diversity at ve gene loci of a large sample that represents the entire geographical distribution of the wild forms of this species. Our data clearly indicate a Mesoamerican origin of the common bean. They also strongly support the occurrence of a bottleneck during the formation of the Andean gene pool that predates the domestication, which was suggested by recent studies based on multilocus molecular markers. Furthermore, a remarkable result was the genetic structure that was seen for the Mesoamerican accessions, with the identication of four different genetic groups that have different relationships with the sets of wild accessions from the Andes and northern PeruEcuador. This nding implies that both of the gene pools from South America originated through different migration events from the Mesoamerican populations that were characteristic of central Mexico. crop evolution | Phaseoleae | population structure | mutation rate T he common bean (Phaseolus vulgaris L.) is the main grain legume for direct human consumption, and it represents a rich source of protein, vitamins, minerals, and ber, especially for the poorer populations of Africa and Latin America (1). Investigations into the origins and evolution of this species would be expected to highlight the structure and organization of its genetic diversity and the role of the evolutionary forces that have been shaping this diversity. Such knowledge is a crucial pre- requisite for efcient conservation and use of the existing germplasm for the development of new improved plant varieties. The current distribution of the wild common bean encompasses a large geographical area: from northern Mexico to northwestern Argentina (2). In general, two major ecogeographical gene pools are recognized: Mesoamerica and the Andes. These two gene pools are characterized by partial reproductive isolation (3, 4), and they are seen in both wild and domesticated materials. They have been recognized in several studies based on morphology (57), agronomic traits (7), seed proteins (8), allozymes (9), and different types of molecular markers (1015), which have given the overall indication of the occurrence of at least two in- dependent domestication events in the two different hemispheres. The existence of these two geographically distinct and isolated evolutionary lineages that predate the domestication of the common bean represents a unique scenario among crops. This scenario differs from the occurrence over a smaller geographical range of multiple domestications that have been proposed for other species, such as barley (16) and wheat (17), and indeed, within the Mesoamerican gene pool of the common bean itself (18, but see 13, 19). In these cases, the lack of isolation between the populations prevented independent evolution of different lineages in both the wild and domesticated forms. Rice is the only crop that may show a scenario that is to some extent similar to the scenario of P. vulgaris (20, 21), although a recent study has sug- gested a single domestication event at the origin of the indica and japonica subgroups (22). While only these two major gene pools are recognized in the domesticated population, the geographical structure of the wild form of the common bean is more complex, with an additional third gene pool that is localized between Peru and Ecuador (23) and characterized by a specic storage seed protein, phaseolin type I (24). Moreover, wild populations from Colombia are often seen as intermediates, and a marked geo- graphical structure is observed in wild beans from Mesoamerica (12). The population from northern Peru and Ecuador is usually considered the ancestral population from which P. vulgaris origi- nated (the northern PeruEcuador hypothesis) (11, 24, 25). In- deed, the work by Kami et al. (24) analyzed a portion of the gene that codes for the storage seed protein phaseolin, including pha- seolin type (type I) from northern PeruEcuador accessions that was not present in the other gene pools, which indicates that type I phaseolin is ancestral to the other phaseolin sequences of P. vul- garis (24). Thus, the work by Kami et al. (24) suggested that, starting from the core area of the western slopes of the Andes in northern Peru and Ecuador, the wild bean was dispersed north (Colombia, Central America, and Mexico) and south (southern Peru, Bolivia, and Argentina), which resulted in the Mesoamer- ican and Andean gene pools, respectively. The Mesoamerican origin of the common bean is an alterna- tive and older hypothesis. This hypothesis is supported by the observations that the closest relatives of wild P. vulgaris in the Phaseolus genus are distributed throughout Mesoamerica (2629). Additionally, the higher diversity found in the Mesoamer- ican compared with the Andean gene pool (phaseolin types, allozyme alleles, and molecular markers) (810, 30, 31) supports a Mesoamerican origin. Recently, the work by Rossi et al. (13) compared amplied fragment length polymorphism (AFLP) data based on an anal- ysis of wild and domesticated P. vulgaris accessions with the data from the work by Kwak and Gepts (14) based on simple se- quence repeats (SSRs) obtained on a similar sample. This analysis suggested that, before domestication, there was a severe bottleneck in the Andean populations, and they reproposed the Mesoamerican origin of the common bean. Author contributions: E. Bitocchi and R.P. designed research; E. Bitocchi, L.N., E. Bellucci, M.R., A.G., P.S.Z., G.L., and J.S., P.M., G.A., and R.P. performed research; E. Bitocchi, L.N., E. Bellucci, G.A., and R.P. analyzed data; and E. Bitocchi and R.P. wrote the paper. The authors declare no conict of interest. *This Direct Submission article had a prearranged editor. Data deposition: The sequences reported in this paper have been deposited in the Gen- Bank database (accession nos. JN796475-JN796922). 1 To whom correspondence should be addressed. E-mail: [email protected]. See Author Summary on page 5148 (volume 109, number 14). This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1108973109/-/DCSupplemental. E788E796 | PNAS | Published online March 5, 2012 www.pnas.org/cgi/doi/10.1073/pnas.1108973109 Downloaded by guest on February 28, 2020

Upload: others

Post on 23-Feb-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Mesoamerican origin of the common bean …Mesoamerican origin of the common bean (Phaseolus vulgaris L.) is revealed by sequence data Elena Bitocchia, Laura Nannia, Elisa Belluccia,

Mesoamerican origin of the common bean (Phaseolusvulgaris L.) is revealed by sequence dataElena Bitocchia, Laura Nannia, Elisa Belluccia, Monica Rossia, Alessandro Giardinia, Pierluigi Spagnoletti Zeulib,Giuseppina Logozzob, Jens Stougaardc, Phillip McCleand, Giovanna Attenee, and Roberto Papaa,f,1

aDipartimento di Scienze Agrarie, Alimentari ed Ambientali, Università Politecnica delle Marche, 60131 Ancona, Italy; bDipartimento di Biologia Difesae Biotecnologie Agro-Forestali, Università degli Studi della Basilicata, 85100 Potenza, Italy; cCentre for Carbohydrate Recognition and Signalling, Departmentof Molecular Biology, Aarhus University, DK-8000 Aarhus C, Denmark; dDepartment of Plant Sciences, North Dakota State University, Fargo, ND 58105;eDipartimento di Scienze Agronomiche e Genetica Vegetale Agraria, Università degli Studi di Sassari, 07100 Sassari, Italy; and fCereal Research Centre,Agricultural Research Council (CRA-CER), S.S. 16, Km 675, 71122 Foggia, Italy

Edited* by Jeffrey L. Bennetzen, University of Georgia, Athens, GA, and approved December 16, 2011 (received for review June 3, 2011)

Knowledge about the origins and evolution of crop species rep-resents an important prerequisite for efficient conservation and useof existing plant materials. This study was designed to solve theongoing debate on the origins of the commonbean by investigatingthe nucleotide diversity at five gene loci of a large sample thatrepresents the entire geographical distribution of the wild forms ofthis species. Our data clearly indicate a Mesoamerican origin of thecommon bean. They also strongly support the occurrence of abottleneck during the formation of the Andean gene pool thatpredates the domestication, which was suggested by recent studiesbased on multilocus molecular markers. Furthermore, a remarkableresult was the genetic structure thatwas seen for theMesoamericanaccessions, with the identification of four different genetic groupsthat have different relationships with the sets of wild accessionsfrom the Andes and northern Peru–Ecuador. This finding impliesthat both of the gene pools from South America originated throughdifferent migration events from theMesoamerican populations thatwere characteristic of central Mexico.

crop evolution | Phaseoleae | population structure | mutation rate

The common bean (Phaseolus vulgaris L.) is the main grainlegume for direct human consumption, and it represents

a rich source of protein, vitamins, minerals, and fiber, especiallyfor the poorer populations of Africa and Latin America (1).Investigations into the origins and evolution of this species wouldbe expected to highlight the structure and organization of itsgenetic diversity and the role of the evolutionary forces that havebeen shaping this diversity. Such knowledge is a crucial pre-requisite for efficient conservation and use of the existinggermplasm for the development of new improved plant varieties.The current distribution of the wild common bean encompasses

a large geographical area: from northern Mexico to northwesternArgentina (2). In general, two major ecogeographical gene poolsare recognized: Mesoamerica and the Andes. These two genepools are characterized by partial reproductive isolation (3, 4),and they are seen in both wild and domesticated materials. Theyhave been recognized in several studies based on morphology (5–7), agronomic traits (7), seed proteins (8), allozymes (9), anddifferent types of molecular markers (10–15), which have giventhe overall indication of the occurrence of at least two in-dependent domestication events in the two different hemispheres.The existence of these two geographically distinct and isolatedevolutionary lineages that predate the domestication of thecommon bean represents a unique scenario among crops. Thisscenario differs from the occurrence over a smaller geographicalrange of multiple domestications that have been proposed forother species, such as barley (16) and wheat (17), and indeed,within the Mesoamerican gene pool of the common bean itself(18, but see 13, 19). In these cases, the lack of isolation betweenthe populations prevented independent evolution of differentlineages in both the wild and domesticated forms. Rice is the onlycrop that may show a scenario that is to some extent similar to the

scenario of P. vulgaris (20, 21), although a recent study has sug-gested a single domestication event at the origin of the indica andjaponica subgroups (22). While only these two major gene poolsare recognized in the domesticated population, the geographicalstructure of the wild form of the common bean is more complex,with an additional third gene pool that is localized between Peruand Ecuador (23) and characterized by a specific storage seedprotein, phaseolin type I (24). Moreover, wild populations fromColombia are often seen as intermediates, and a marked geo-graphical structure is observed in wild beans from Mesoamerica(12). The population from northern Peru and Ecuador is usuallyconsidered the ancestral population from which P. vulgaris origi-nated (the northern Peru–Ecuador hypothesis) (11, 24, 25). In-deed, the work by Kami et al. (24) analyzed a portion of the genethat codes for the storage seed protein phaseolin, including pha-seolin type (type I) from northern Peru–Ecuador accessions thatwas not present in the other gene pools, which indicates that type Iphaseolin is ancestral to the other phaseolin sequences of P. vul-garis (24). Thus, the work by Kami et al. (24) suggested that,starting from the core area of the western slopes of the Andes innorthern Peru and Ecuador, the wild bean was dispersed north(Colombia, Central America, and Mexico) and south (southernPeru, Bolivia, and Argentina), which resulted in the Mesoamer-ican and Andean gene pools, respectively.The Mesoamerican origin of the common bean is an alterna-

tive and older hypothesis. This hypothesis is supported by theobservations that the closest relatives of wild P. vulgaris in thePhaseolus genus are distributed throughout Mesoamerica (26–29). Additionally, the higher diversity found in the Mesoamer-ican compared with the Andean gene pool (phaseolin types,allozyme alleles, and molecular markers) (8–10, 30, 31) supportsa Mesoamerican origin.Recently, the work by Rossi et al. (13) compared amplified

fragment length polymorphism (AFLP) data based on an anal-ysis of wild and domesticated P. vulgaris accessions with the datafrom the work by Kwak and Gepts (14) based on simple se-quence repeats (SSRs) obtained on a similar sample. Thisanalysis suggested that, before domestication, there was a severebottleneck in the Andean populations, and they reproposed theMesoamerican origin of the common bean.

Author contributions: E. Bitocchi and R.P. designed research; E. Bitocchi, L.N., E. Bellucci,M.R., A.G., P.S.Z., G.L., and J.S., P.M., G.A., and R.P. performed research; E. Bitocchi, L.N.,E. Bellucci, G.A., and R.P. analyzed data; and E. Bitocchi and R.P. wrote the paper.

The authors declare no conflict of interest.

*This Direct Submission article had a prearranged editor.

Data deposition: The sequences reported in this paper have been deposited in the Gen-Bank database (accession nos. JN796475-JN796922).1To whom correspondence should be addressed. E-mail: [email protected].

See Author Summary on page 5148 (volume 109, number 14).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1108973109/-/DCSupplemental.

E788–E796 | PNAS | Published online March 5, 2012 www.pnas.org/cgi/doi/10.1073/pnas.1108973109

Dow

nloa

ded

by g

uest

on

Feb

ruar

y 28

, 202

0

Page 2: Mesoamerican origin of the common bean …Mesoamerican origin of the common bean (Phaseolus vulgaris L.) is revealed by sequence data Elena Bitocchia, Laura Nannia, Elisa Belluccia,

The main aim of the present study was to investigate the evolu-tionary history of P. vulgaris to solve the issue regarding the originof this species. We have investigated the nucleotide diversity forfive different genes from a wide sample of wild P. vulgaris thatis representative of its geographical distribution. Our findingsidentify the origin of P. vulgaris as Mesoamerican, and they alsoreveal the level and structure of the genetic diversity that char-acterizes the wild accessions of this species. Our data are relevantfor crop improvement and in particular, maximization of theefficient use of wild genetic diversity for breeding programs.

ResultsNucleotide Variation in the Wild Common Bean. We sequenced fiveloci of a large collection that included wild common bean acces-sions from Mesoamerican (n = 49) and Andean (n = 47) genepools and genotypes from northern Peru–Ecuador (n=6) that arecharacterized by the ancestral type I phaseolin (Table S1). Thesegene pools represent a cross-section of the entire geographicaldistribution of the wild form of P. vulgaris. The sequenced regionfor each gene is located within the transcriptional unit andencompasses between 500 and 900 bp, which includes both intronsand exons. Altogether, we sequenced ∼3.4 kb per accession.We investigated the structure of the sequenced gene fragments

in P. vulgaris, starting with BLAST analysis against the referencesequences for the identification of legume anchor (Leg) markers(Fig. S1 and Table S2) (32). For the Leg marker Leg044, weidentified an exon of 102 bp within the fragment, and the deduced

34-aa sequence has 85% identity with histidinol dehydrogenase ofArabidopsis thaliana. This enzyme catalyses the last two steps inthe L-histidine biosynthesis pathway, which is conserved in bac-teria, archaea, fungi, and plants. For Leg100, a coding region of45 bp was identified at the 3′ extremity of the sequence, and itsdeduced 15-aa sequence showed 100% identity with biotin syn-thase of A. thaliana, an enzyme that is involved in the biotinbiosynthetic process. Three exons were found in Leg133 of 75, 72,and 90 bp. The corresponding 79-aa sequence showed 84%identity with dolichyl-diphosphooligosaccharide-protein glyco-transferase of A. thaliana, an enzyme that is involved in aspara-gine-linked protein glycosylation. Two exons (88 and 56 bp) wereidentified in Leg223, with an identity of 71% of the translatedprotein fragment with the eukaryotic translation initiation factorSUI1 family protein of A. thaliana. This protein seems to have animportant role in accurate initiator codon recognition duringtranslation initiation. The structure of PvSHP1 was characterizedin the work by Nanni et al. (19) (Fig. S1 and Table S2), and thefragment sequenced in this study included three coding regions of12, 42, and 42 bp. This sequence was identified as homologous toArabidopsis SHP1 (SHATTERPROOF 1), a gene that is involvedin the control of fruit shattering.We considered the variation in the coding regions of the loci

studied in all of the P. vulgaris accessions. We found that thecoding regions of three loci (Leg044, Leg100, and Leg223) didnot show nucleotide substitutions. For Leg133, there was onesynonymous substitution in three Mexican accessions and all of

Table 1. Population genetics statistics for the five gene fragments and the concatenatesequences in the different gene pools of P. vulgaris

Population N V Pi S H Hd π × 10−3 θW × 10−3 D

ConcatenateAll 84 137 123 14 56 0.96 9.9 8.3 —

MW 37 119 98 21 34* 0.99* 10.6* 8.7* —

AW 43 32 22 10 18 0.86 1.0 2.3 —

PhI 4 18 0 18 4 1.00 2.7 3.0 —

Leg044All 96 23 23 0 13 0.83 6.1 5.4 0.35 nsMW 44 20 10 10 9* 0.82* 4.3* 5.6* −0.73 nsAW 46 9 3 6 5 0.47 0.9 2.5 −1.77 nsPhI 6 14 0 14 2 0.33 5.7 7.4 −1.47 ns

Leg100All 99 45 43 2 13 0.70 22.6 15.4 1.47 nsMW 47 40 39 1 10* 0.83* 22.9* 16.0* 1.46 nsAW 46 1 0 1 2 0.04 0.1 0.4 −1.11 nsPhI 6 2 2 0 2 0.53 1.9 1.6 1.03 ns

Leg133All 101 15 12 3 9 0.63 5.1 5.0 0.06 nsMW 48 13 12 1 6* 0.75* 6.2* 5.1* 0.70 nsAW 47 3 1 2 4 0.20 0.4 1.2 −1.34 nsPhI 6 0 0 0 1 0.00 0.0 0.0 —

Leg223All 94 7 7 0 8 0.78 3.2 2.9 0.23 nsMW 43 6 6 0 7* 0.79* 2.9* 3.0* −0.11 nsAW 47 4 3 1 4 0.45 1.5 1.9 −0.58 nsPhI 4 0 0 0 1 0.00 0.0 0.0 —

PvSHP1All 98 49 44 5 26 0.84 14.0 11.2 0.81 nsMW 47 42 36 6 20* 0.92* 16.0* 11.2* 1.47 nsAW 45 15 15 0 4 0.32 1.7 4.0 −1.79 nsPhI 6 2 0 2 3 0.60 0.8 1.0 −1.13 ns

All, all genotypes of P. vulgaris; AW, Andean wild; D, the D parameter by Tajima (66) for testing neutrality;H, number of haplotypes; Hd, haplotype diversity; MW, Mesoamerican wild; N, sample size; ns, not significant;PhI, phaseolin I type; Pi, parsimony informative sites; S, singleton variable sites; V, variable sites; π × 10−3 andθW × 10−3, two measures of nucleotide diversity from Tajima (64) and Watterson (65) (θ estimator), respectively.*The MW diversity parameters that are higher than those parameters of the AW.

Bitocchi et al. PNAS | Published online March 5, 2012 | E789

AGRICU

LTURA

LSC

IENCE

SPN

ASPL

US

Dow

nloa

ded

by g

uest

on

Feb

ruar

y 28

, 202

0

Page 3: Mesoamerican origin of the common bean …Mesoamerican origin of the common bean (Phaseolus vulgaris L.) is revealed by sequence data Elena Bitocchia, Laura Nannia, Elisa Belluccia,

the accessions characterized by type I phaseolin. There was alsoone nonsynonymous substitution that involved only one Andeanaccession from Argentina [a 1-aa replacement: Asp (D) for Ala(A)]. Finally, for PvSHP1, a nonsynonymous substitution wasfound in two Mesoamerican accessions from Mexico [a 1-aareplacement: Gln (Q) for His (H)].The nucleotide diversity for each of these five loci and the

concatenate sequence was analyzed across all of the P. vulgarisaccessions, taking into account the major population subdivisionsthat corresponded to the three wild gene pools: the Meso-american, Andean, and phaseolin type I (northern Peru–Ecua-dor) groups (Table 1). For the concatenate sequence, whichincluded 84 accessions of P. vulgaris for which all of the five locisequences were obtained, a total of 137 variable sites (V) wereobserved. The number of haplotypes was 56, none of which wasshared among the three gene pools. The highest number ofhaplotypes was 34 for the Mesoamerican gene pool, with 18haplotypes in the Andean gene pool and 4 haplotypes in thenorthern Peru–Ecuador accessions. Among all of the groups, thehighest diversity was seen in the Mesoamerican wild population(π = 10.6 × 10−3), which was 10-fold higher than the diversity ofthe Andean accessions (π = 1.0 × 10−3). Considering each genefragment separately, the diversity was always higher in Meso-america than in the Andes (for all of the statistics considered:number of haplotypes, haplotype diversity, π, and θW) (Table 1).According to the binomial distribution, this pattern has a prob-ability of occurring by chance of P= 0.03. Moreover, consideringthe estimates from the five loci and using the Wilcoxon–Kruskal–Wallis nonparametric test, the Mesoamerican wild bean showeda significant higher diversity compared with the Andean pop-

ulations (from P ≤ 0.009 for haplotype, haplotype diversity, and πto P ≤ 0.016 for θW). This finding was also shown by the loss ofdiversity (Lπ) estimates between the Mesoamerican and Andeanpopulations, with a reduction in the diversity for the lattercompared with the former that ranged from 0.49 (Leg223) to1.00 (Leg100) and was 0.90 for the concatenate sequence (Table2). Finally, even if Tajima’s D was never significant, in theAndean population, it was negative and always much lower thanTajima’s D from Mesoamerica (Table 1).

Population Structure. The population structure analysis de-termined six subpopulations that best define the population (Fig.1, B1 to B6). All of the Andean accessions were clearly assignedto cluster B6, with high percentages of membership (qB6 ≥ 0.76).The same was seen for the type I phaseolin of the northern Peru–Ecuador accessions, which were assigned to cluster B5 (qB5 ≥0.91). The Mesoamerican accessions showed a different sce-nario; indeed, they were subdivided into four different clusters(B1–B4), and they showed higher levels of admixture. We con-sidered a threshold of q ≥ 0.70 to assign the individuals to thesefour clusters. Cluster B1 included 17 Mesoamerican accessions(qB1 ≥ 0.85), 6 of which were from Mexico, 6 from Guatemala, 4from Colombia, and 1 from El Salvador. The other three clusterswere composed of only Mexican accessions: cluster B2 includedseven accessions (qB2 ≥ 0.93), cluster B3 included eight accessions(qB3 ≥ 0.76), and cluster B4 included four accessions (qB4 ≥ 0.99).One Mexican accession (PI325677) was not assigned to anyspecific Mesoamerican cluster because of its high level of ad-mixture (qB2 = 0.28, qB3 = 0.45, qB6 = 0.22, and qB4 = 0.05).To better understand the geographical distributions of the

Mesoamerican accessions that belong to genetic groups B1–B4,we carried out spatial interpolation of the membership coef-ficients (Fig. 2). The accessions of cluster B1 were distributed allalong the Mesoamerican gene pool from the north of Mexicodown to Colombia (Fig. 2A), with a major presence toward thePacific Ocean. The accessions of cluster B2 were spread fromcentral (particularly on the Caribbean side) down to southernMexico (Fig. 2B). However, there was a lack of representation ofthese two clusters in a wide area of central Mexico. Clusters B3and B4 were represented essentially in Mexico and in particular,above the Transverse Volcanic Axis, with cluster B3 widelyspread from northern to central Mexico (Fig. 2C) and cluster B4more restricted to a small area in central Mexico (Fig. 2D).The relationships among the clusters identified were revealed

by a neighbor-joining (NJ) tree (Fig. 3). There was no clear

Table 2. Loss of nucleotide diversity in the Andean wild (AW)population vs. Mesoamerican wild (MW) population calculatedas Lπ = 1 − (πAW/πMW), where πMW and πAW are the nucleotidediversities in MW and AW populations, respectively (39)

Locus Lπ

Leg044 0.78Leg100 1.00Leg133 0.93Leg223 0.49PvSHP1 0.89Concatenate 0.90

Fig. 1. Percentages of membership (q) for each of the clusters identified (B1–B6; color-coded as indicated). Each accession is represented by a vertical linedivided into colored segments, the lengths of which indicate the proportions of the genome that are attributed to the specific clusters. The accessions areordered according to latitude from northern Mexico to northern Argentina. The country of origin is indicated by the horizontal line. AW, Andean wild; ar,Argentina; bl, Bolivia; C_mx, central Mexico; col, Colombia; ec, Ecuador; es, El Salvador; gt, Guatemala; MW, Mesoamerican wild; N_mx, north Mexico; N_pr,northern Peru; PhI, type I phaseolin (northern Peru–Ecuador); S_mx, south Mexico; S_pr, southern Peru.

E790 | www.pnas.org/cgi/doi/10.1073/pnas.1108973109 Bitocchi et al.

Dow

nloa

ded

by g

uest

on

Feb

ruar

y 28

, 202

0

Page 4: Mesoamerican origin of the common bean …Mesoamerican origin of the common bean (Phaseolus vulgaris L.) is revealed by sequence data Elena Bitocchia, Laura Nannia, Elisa Belluccia,

distinction between the three gene pools (Mesoamerican,Andean, and northern Peru–Ecuador). Interestingly, the Andeancluster (B6) was more closely related to the Mesoamerican groupB3, and the northern Peru–Ecuador cluster (B5) was moreclosely related to the Mesoamerican group B4; however, theMesoamerican groups B1 and B2 were in an external position.A similar population structure was revealed by the NJ tree

obtained considering the single accessions (Fig. 4) according tothe Bayesian model-based approach [Bayesian Analysis of Pop-ulation Structure (BAPS)]. Also in this case, there was no cleardistinction between the three gene pools. Indeed, the Meso-american accessions were found to be distributed in all of thetree branches: the Andean accessions (Fig. 4, AW) clusteredwith the Mesoamerican group B3 (bootstrap value = 97%) andthe northern Peru–Ecuador accessions (Fig. 4, PhI) were morerelated to the other Mesoamerican groups (bootstrap value =98%) and particularly, the B4 accessions. The Mesoamericangroups B1 and B2 were included in a clade that was statisticallywell-supported (bootstrap value = 79%).

Haplotype Networks. The haplotype networks for each of the fiveloci are shown in Fig. 5. The number of haplotypes found foreach gene ranged from 8 (Leg223) to 26 (PvSHP1). The Leg044,Leg133, and Leg223 haplotypes were interrelated through a fewmutational steps, whereas a higher number of evolutionary stepswas seen for the Leg100 and PvSHP1 haplotypes. However,a consistent observation arose from this analysis: the Meso-american gene pool had the greatest number of haplotypes forall of the loci, from 6 (Leg133) to 20 (PvSHP1), and the Andeangene pool showed a star-like structure with 1 major haplotypewith a high frequency (higher than 0.72) and a few (from 1 to 3)additional minor haplotypes. Furthermore, there was no cleardistinction between the Mesoamerican and Andean group of

haplotypes, where the former seemed to be distributed all alongthe tree and the latter showed haplotypes shared always with theMesoamerican accessions; the only exception here was PvSHP1,where there were no common haplotypes between these genepools. However, a clear relationship was evident between a groupof Mesoamerican haplotypes (from central Mexico) and the three

Fig. 2. Spatial interpolation of membership coefficients (q) for the four Mesoamerican clusters identified by the Bayesian clustering analysis. (A) B1_blue. (B)B2_light-blue. (C) B3_green. (D) B4_orange. Latitude and longitude are expressed in the Universal Transverse Mercator system.

Fig. 3. Unrooted NJ tree showing the phylogenetic relationships of geneticclusters identified by cluster analysis.

Bitocchi et al. PNAS | Published online March 5, 2012 | E791

AGRICU

LTURA

LSC

IENCE

SPN

ASPL

US

Dow

nloa

ded

by g

uest

on

Feb

ruar

y 28

, 202

0

Page 5: Mesoamerican origin of the common bean …Mesoamerican origin of the common bean (Phaseolus vulgaris L.) is revealed by sequence data Elena Bitocchia, Laura Nannia, Elisa Belluccia,

major Andean haplotypes, which were separated by six muta-tional steps (Fig. 5, PvSHP1). A confirmation of the clear re-lationship between the Andean and Mesoamerican accessions ofgroup B3 that was highlighted by the NJ trees (Figs. 3 and 4) wasprovided by these haplotype networks, even if, at this level,a similar situation can be seen for B1. Finally, for all of the genes,the type I phaseolin accessions showed haplotypes that werecloser to the Mesoamerican accessions and often separated fromthe majority of the Andean accessions.

DiscussionIn this study, we analyzed the nucleotide diversity for five genefragments in a large sample of wild P. vulgaris to address theorigins of this species. Indeed, the main aim was to throw lightonto the unique scenario among crop plants that characterizesthe common bean: the existence of two major geographicallydistinct evolutionary lineages that predate domestication (Meso-american and Andean). Our results indicate a clear pattern as-sociated with a Mesoamerican origin of this species from whichdifferent migration events extended the distribution of P. vulgarisinto South America.To date, the most credited hypothesis relating to the origins of

the common bean has indicated that, from a core area on thewestern slopes of the Andes in northern Peru and Ecuador, thewild beans were dispersed north (to Colombia, Central America,andMexico) and south (to southern Peru, Bolivia, and Argentina),which resulted in the Mesoamerican and Andean gene pools, re-spectively. This hypothesis has relied on the identification of aphaseolin (the major seed storage protein) as the ancestral pha-seolin (type I), and it was based on the assumption that the speciesphylogeny is identical to the phylogeny of the gene. However, forseveral reasons (33), this strict relationship has not always held,and as our results suggest, the current distribution of phaseolinmight not reflect its ancient distribution. Alternatively, type Iphaseolin might be extinct in Mesoamerica, or it might still bepresent but just not included in the samples analyzed.

Predomestication Bottleneck in the Andes.Recently, on the basis ofa comparison of the levels observed for AFLP (13) and SSR (14)diversity in the wild populations of P. vulgaris from the Andesand Mesoamerica, the work by Rossi et al. (13) suggested thata bottleneck had occurred in the Andes before domestication.Indeed, for molecular markers characterized by a higher muta-tion rate (SSRs compared with AFLPs), there were no (or onlyvery small) differences in the diversities observed in the twomain geographical areas, whereas there was much lower di-versity in the Andes for AFLPs compared with the Mesoamer-ican wild P. vulgaris. Indirect estimates of the AFLP mutationrate have shown values that vary from 10−6 to 10−5 (34–36),whereas the SSR mutation rate is higher; it ranges from 10−3 to10−4 using both indirect (34, 37, 38) and direct (39, 40) esti-mates. Thus, following the model proposed in the work by Neiet al. (41) that described the effects of a bottleneck on the ge-netic diversity of a population at a neutral locus, the work byRossi et al. (13) suggested the occurrence of a bottleneck in theAndes before domestication. This bottleneck was then re-covered by markers with a high mutation rate compared withmarkers showing lower rates of mutation. According to thishypothesis for nucleotide diversity, an even lower diversityshould be expected in the Andes compared with Mesoamerica.Indeed, as indicated in the work by Lynch and Conery (42) forFabaceae, the nucleotide mutation rate is ∼6.1 × 10−9, muchlower than both the AFLP and SSR rates. Our sequence datashow a very strong difference in the genetic diversity betweenthe wild Mesoamerican and Andean accessions (Lπ = 90%).These reductions are about 2- and 13-fold higher than thereductions computed in a comparable sample of P. vulgarisgenotypes using AFLP (45%) (13) and SSR data (7%) (14),respectively. Thus, our data strongly support the bottleneckhypothesis of Rossi et al. (13), and they are based on the clearrelationship between the mutation rate and the time of diversityrecovery from the occurrence of a bottleneck: the higher themutation rate, the faster the recovery of diversity. To the best ofour knowledge, this example is one of the few (40) of the critical

Fig. 4. Unrooted NJ bootstrap tree inferred from the concatenate sequence data. Each set of accessions (as indicated) is represented by a colored circle, andeach color indicates the membership to the BAPS groups. Small gray and violet circles represent the nodes for which bootstrap values are higher that 50% and80%, respectively (the 80% threshold highlights the relationships with very strong support). AW, Andean wild; MW, Mesoamerican wild; PhI, type I phaseolin(northern Peru–Ecuador).

E792 | www.pnas.org/cgi/doi/10.1073/pnas.1108973109 Bitocchi et al.

Dow

nloa

ded

by g

uest

on

Feb

ruar

y 28

, 202

0

Page 6: Mesoamerican origin of the common bean …Mesoamerican origin of the common bean (Phaseolus vulgaris L.) is revealed by sequence data Elena Bitocchia, Laura Nannia, Elisa Belluccia,

role of marker mutations in describing the diversity of plantpopulations, thus underlining the need for the careful consid-eration of mutation rates in diversity studies.

Population Structure in the Mesoamerican Gene Pool. The occur-rence of three gene pools of the wild common bean (Meso-america, Andes, and northern Peru–Ecuador) has been shownextensively (9, 12–14, 23, 24, 31, 43). In particular, the subdivisionsof the two major ecogeographical gene pools (Mesoamerica andAndes) have been shown by several studies that have used dif-ferent types of markers. Even if high population structure hasbeen seen in the Mesoamerican wild gene pool (12), it has usuallybeen considered as a single gene pool. However, in the presentstudy, while the northern Peru–Ecuador and Andean gene poolsare, indeed, characterized by homogeneous assignments intospecific genetic groups (B5 and B6, respectively), the Meso-american accessions are clearly split into four distinct geneticclusters, B1–B4, that are clearly separated also with the NJ treebased on the single individual genotypes (Fig. 4). Moreover, a veryimportant result is seen in the lack of a clear distinction betweenthe Mesoamerican and Andean wild gene pools (Figs. 3–5),whereas the Mesoamerican clusters show different degrees ofrelatedness with the other gene pools. In particular, as showed bythe NJ trees (Figs. 3 and 4), the Andean wild accessions (B6) weremore related to the Mesoamerican B3 as were the northern Peru–Ecuador accessions (B5) to the Mesoamerican B4; indeed, themajor Mesoamerican clusters (B1 and B2) were less related to theAndean and northern Peru–Ecuador clusters. It is, thus, importantto consider the geographical distributions of these groups inMesoamerica. The B1 group was present essentially across all ofthe geographical area from the north of Mexico down to Colom-bia, whereas the B2 group was spread from central to southernMexico. According to our data, the B1 and B2 clusters were almostabsent in a wide area of central Mexico, where they weresubstituted by the two Mesoamerican groups, as the B3 and B4

clusters, which were more related to the South American pop-ulations B5 and B6. This finding is clearly not compatible with thehypothesis of a South American origin, where the phaseolin Igenotypes would be expected to be intermediate between theMesoamerican and Andean clusters.The Mesoamerican origin of the wild common bean is sup-

ported by the large diversity observed, and it is also confirmed bythe single locus haplotype networks, where the northern Peru–Ecuador haplotypes were closer to the Mesoamerican groups andoften separated from the majority of the Andean accessions thatwere usually represented as a single, highly frequent haplotype.In summary, this analysis of the population structure supports

the Mesoamerican origin hypothesis. At the same time, it revealsthe very complex geographical structure of the genetic diversityin Mesoamerica, with central Mexico and the Transverse Vol-canic Axis, which originated ∼5 Mya in the Late Miocene, as thecradle of diversity of P. vulgaris. As the magnitude of thisstructure has not been clearly identified using multilocus mark-ers, on the basis of its old origins, this finding would suggest thatits signature has been partially hidden because of the combinedeffects of different mutation rates and recombination.

Origin of the Common Bean.Our study presents clear evidence of aMesoamerican origin of P. vulgaris, which was most likely lo-cated in Mexico, both from the analysis of the populationstructure and phylogeny and the confirmation of the occurrenceof a bottleneck before domestication in the Andes, which wasproposed in the work by Rossi et al. (13). The Mesoamericanorigin is consistent with the known distribution of most of theclose relatives of P. vulgaris, the much higher diversity of Meso-american wild compared with the diversity from South America,the occurrence of a severe bottleneck in the Andes before do-mestication, and finally, the occurrence in Mesoamerica of wildbeans that are closely related to those beans found in SouthAmerica, both from the Andean and northern Peru–Ecuador

Fig. 5. Haplotype networks of the five nuclear loci. Each circle represents a single different haplotype, and the circle sizes are proportional to the number ofindividuals that carry the same haplotype. Black circles indicate missing intermediate haplotypes. The lengths of the lines of the haplotype networks areproportional to the number of mutational steps, which are indicated when more than one. Colors show the genotypes belonging to the different BAPSclusters (as indicated). AW, Andean wild; MW, Mesoamerican wild; PhI, type I phaseolin (northern Peru–Ecuador).

Bitocchi et al. PNAS | Published online March 5, 2012 | E793

AGRICU

LTURA

LSC

IENCE

SPN

ASPL

US

Dow

nloa

ded

by g

uest

on

Feb

ruar

y 28

, 202

0

Page 7: Mesoamerican origin of the common bean …Mesoamerican origin of the common bean (Phaseolus vulgaris L.) is revealed by sequence data Elena Bitocchia, Laura Nannia, Elisa Belluccia,

gene pool. Thus, we suggest that P. vulgaris from northern Peru–Ecuador is a relict population that only represents a fraction ofthe genetic diversity of the ancestral population and that thispopulation migrated from Central Mexico in ancient times. Theresults that suggest type I phaseolin as ancestral seem quiterobust; thus, the absence of this type of seed protein in theMesoamerican gene pool could be explained by two alternativehypotheses: the type I phaseolin became extinct in Mesoamerica,or it might still be present but just not included in the samplesanalyzed in the literature.Our data also present a scenario for the evolutionary history of

the wild common bean, with the magnitude of the populationsubdivisions in Mesoamerica not having been clearly recognizedbefore this study.

ConclusionsEvolutionary studies of crop species are crucial for severalapplications; indeed, the knowledge relating to the level andstructure of genetic diversity of crop plants and their wild rela-tives is the starting point of any breeding program (44–52). Ourstudy indicates that, to explore new genetic diversity that is notincorporated into the current domesticated germplasm andconsider this high genetic diversity in the Mesoamerican acces-sions that is not present in the Andean gene pool, it is the wildMesoamerican germplasm that should be used in breeding pro-grams, because it has potential for the release of new cultivars.Moreover, it is crucial to consider the high population structurethat characterizes the Mesoamerican wild germplasm to samplethe largest amount of diversity for introgression into commercialvarieties. This finding is very important if we consider that themajority of the improved varieties of the common bean are ofAndean origin at present. Furthermore, exploration of new ge-netic diversity is also fundamental for the meeting of futuredemands for cultivars that can adapt to climate change, whilealso maintaining, or improving their yields.

Materials and MethodsPlant Materials. A panel of 102 wild accessions of the common bean was se-lected to represent the geographical distribution of wild P. vulgaris fromnorthern Mexico to northwestern Argentina. The accessions are representa-tive of the different gene pools of the species: 49 Mesoamerican accessions(Mexico, Central America, and Colombia), 47 Andean accessions (SouthAmerica), and 6 wild accessions from northern Peru and Ecuador that arecharacterized by the ancestral type I phaseolin (23, 24). The accessions char-acterized by the type I phaseolin are from few small populations found inrestricted geographic areas on the western slope of the Andes. We used sixof these accessions to represent the diversity of these populations, includingonly well-described and characterized accessions. A complete list of theaccessions studied is available in Table S1. The seeds were provided by theUnited States Department of AgricultureWestern Regional Plant IntroductionStation and the International Centre of Tropical Agriculture in Colombia.

PCR and Sequencing. Genomic DNA was extracted from each accession fromyoung leaves of a single, greenhouse-grown plant using the miniprep ex-traction method (53). A total of five ∼500- to 900-bp gene regions across thecommon bean genome were sequenced (Table S2). Four of these fragmentgenes (Leg044, Leg100, Leg133, and Leg223) were chosen from a set of theseLeg markers developed in the work by Hougaard et al. (32). Single-copyorthologous genes between legume species were identified, and primerswere designed in conserved exon regions for the amplification of exon andintron sequences (32, 54–56). The fifth gene fragment, PvSHP1, is homolo-gous to the SHATTERPROOF (SHP1) gene involved in the control of fruitshattering in A. thaliana. This finding was developed in the work by Nanniet al. (19) that analyzed PvSHP1 nucleotide variation in a limited sample ofwild P. vulgaris (29 and 16 Mesoamerican and Andean wild genotypes, re-spectively). The available PvSHP1 sequence data of 40 wild genotypes fromthe study by Nanni et al. (19) were included in the present analyses (Table S1).

It is important to note that mapping experiments show that gene du-plication and highly repeated gene sequences in P. vulgaris are generally low,with most loci occurring as single copies (57–59).

DNA fragments were amplified using 25 ng DNA and the followingreagent concentrations: 0.25 μM each forward and reverse primer, 200 μMeach dNTP, 2.5 mMMgCl2, 1× Taq polymerase buffer, 1 unit AmpliTaq DNApolymerase, and sterile double-distilled H2O to a final volume of 100 μL.Amplifications were conducted with a 9700 Thermal Cycler (Perkin-ElmerApplied Biosystems) with an initial denaturation of 1 min at 95 °C that wasfollowed by 30 cycles of 1 min at 95 °C, 1 min at X °C, and 2 min at 72 °Cplus 10 min of final extension at 72 °C. The X °C refers to the annealingtemperature, which is specified for each primer pair in Table S2. The PCRproducts were purified using GFX PCR DNA and Gel Band Purification Kits(GE Healthcare) according to the manufacturer’s instructions. For theLeg100, Leg133, and Leg223 loci, the samples were sequenced on bothstrands using forward and reverse primers on the cycle sequencing re-action with BigDye Terminator Cycle Sequencing Ready Reaction Kits(Applied Biosystems). The products were resolved on an ABI Prism 3100-Avant Automated Sequencer (Applied Biosystems). The sequence datawere analyzed using Pregap4 and Gap4 of the Staden Software Package(http://staden.sourceforge.net/). The Pregap4 modules were used to pre-pare the sequence data for assembly (quality analysis). Gap4 was used forthe final sequence assembly of the Pregap4 output files (normal shotgunassembly). The single-strand sequencing reaction for Leg044 (reverseprimer) and PvSHP1 (forward primer) was performed by Macrogen. Thefragments were resequenced if there was any ambiguity as to which allelewas present. The sequences are accessible at GenBank (GeneBank acces-sion nos. JN796475–JN796922).

Diversity Analyses. For the Leg markers, the identification of exons andintronswas carried out by BLAST analysis (60) against the reference sequenceof A. thaliana (32) (Table S2). For PvSHP1, its structure was previously iden-tified in the work by Nanni et al. (19). Sequence alignment and editing werecarried out using MUSCLE v3.7 (61) and BioEdit v.7.0.9.0 (62). Insertion/deletions (indels) were not included in the analyses. Molecular populationgenetic analyses were conducted using DnaSP 5.10.01 (63). As given in Table1, estimations for the five gene fragments and the concatenate sequences inthe different gene pools of P. vulgarisweremade for the number of variablesites, singleton variable sites, parsimony informative sites, haplotypes,haplotype diversity, nucleotide diversity (π in ref. 64 and θW in ref. 65), and Dby Tajima (66). To measure the loss of nucleotide diversity in the Andeanwild vs. Mesoamerican wild populations as proposed in the work by Vigo-roux et al. (39), we used the statistic Lπ = 1 − (πAW/πMW), where πMW and πAWare the nucleotide diversities in Mesoamerican and Andean wild pop-ulations, respectively; the Lπ parameter ranges from zero to one, which in-dicate a total loss of diversity and no loss of diversity, respectively. Haplotypetrees for each gene fragment were constructed using the median-joiningnetwork algorithm implemented in the program NETWORK 4.5.1.6 (67).Considering the concatenate sequence, an unrooted phylogenetic tree wasconstructed based on Kimura two-parameter distances, and the relativesupport for each node was tested by the bootstrap method using 1,000replicates in MEGA 4 (68). The sites with gaps were also excluded fromthese analyses.

Population Structure Analyses. A Bayesian model-based approach was usedto infer the hidden genetic population structure of our sample and thus,assign the genotypes into genetically structured groups/populations. Thisapproach was implemented in the software BAPS 5.3 (69–72). This versionof the software incorporates the possibility to account for the dependencecaused by linkage between the sites within aligned sequences, which isdifferent from the most widely used STRUCTURE software (73) that usesa model that is not designed to deal with background linkage disequilib-rium between very tightly linked markers (74). A total of 84 accessions,those accessions showing high-quality sequences for all of the five loci, wasused in this analysis. We carried out a genetic mixture analysis to de-termine the most probable number of populations (K ) given the data (71,72). Under its default settings, BAPS includes K as a parameter to be es-timated, and the best partition of the data into K clusters is identified asthe one with the highest marginal log likelihood. The clustering withlinked loci analysis was chosen to account for the linkage present betweensites within aligned sequences; at the same time, the five loci were as-sumed as independent. Ten iterations of K (from 1 to 20) were conductedto determine the optimal number of genetically homogeneous groups.The admixture analysis was then applied to estimate individual admixtureproportions with regards to the most likely number of K clusters identified(70, 72). Admixture inference was based on 100 realizations from theposterior of the allele frequencies. We repeated the admixture five timesto confirm the consistency of the results. Spatial interpolation of mem-

E794 | www.pnas.org/cgi/doi/10.1073/pnas.1108973109 Bitocchi et al.

Dow

nloa

ded

by g

uest

on

Feb

ruar

y 28

, 202

0

Page 8: Mesoamerican origin of the common bean …Mesoamerican origin of the common bean (Phaseolus vulgaris L.) is revealed by sequence data Elena Bitocchia, Laura Nannia, Elisa Belluccia,

bership coefficients was performed according to the kriging method,which was implemented in the R packages spatial (http://www.r-project.org/). To explore the relationships among the identified clusters, anunrooted phylogenetic tree was constructed using the NJ algorithm inMEGA 4 (68).

ACKNOWLEDGMENTS. We thank G. Bertorelle for critical reading of thismanuscript and his valuable advice. This study was supported by ItalianGovernment (MIUR) Grant n. 20083PFSXA_001, Project Progetti di Ricerca diInteresse Nazionale (PRIN) 2008, and the Università Politecnica delleMarche (2006–2010).

1. Brougthon WJ, et al. (2003) Beans (Phaseolus spp.): model food legumes. Plant Soil252:55–128.

2. Toro O, Tohme J, Debouck DG (1990) Wild Bean (Phaseolus vulgaris L): Descriptionand Distribution (Centro Internacional de Agricultura Tropical, Cali, Colombia).

3. Gepts P, Bliss FA (1985) F1 hybrid weakness in the common bean: Differentialgeographic origin suggests two gene pools in cultivated bean germplasm. J Hered 76:447–450.

4. Koinange EMK, Gepts P (1992) Hybrid weakness in wild Phaseolus vulgaris L. J Hered83:135–139.

5. Delgado-Salinas A, Bonet A, Gepts P (1988) The wild relative of Phaseolus vulgaris inMiddle America. Genetic Resources of Phaseolus Beans, ed Gepts P (Kluwer, Boston),pp 163–184.

6. Gepts P, Debouck DG (1991) Origin, domestication, and evolution of the commonbean, Phaseolus vulgaris. Common Beans: Research for Crop Improvement, edsVoysest O, Van Schoonhoven A (CAB International, Wallingford, Oxon, UnitedKingdom), pp 7–53.

7. Singh SP, Gutiérrez JA, Molina A, Urrea C, Gepts P (1991) Genetic diversity incultivated common bean: II. Marker-based analysis of morphological and agronomictraits. Crop Sci 31:23–29.

8. Gepts P, Osborn TC, Rashka K, Bliss FA (1986) Phaseolin-protein variability in wildforms and landraces of the common bean (Phaseolus vulgaris): Evidence for multiplecenters of domestication. Econ Bot 40:451–468.

9. Koenig R, Gepts P (1989) Allozyme diversity in wild Phaseolus vulgaris: Furtherevidence for two major centers of genetic diversity. Theor Appl Genet 78:809–817.

10. Becerra-Velásquez VL, Gepts P (1994) RFLP diversity in common bean (Phaseolusvulgaris L.). Genome 37:256–263.

11. Freyre R, Ríos R, Guzmán L, Debouck D, Gepts P (1996) Ecogeographic distribution ofPhaseolus spp. (Fabaceae) in Bolivia. Econ Bot 50:195–215.

12. Papa R, Gepts P (2003) Asymmetry of gene flow and differential geographicalstructure of molecular diversity in wild and domesticated common bean (Phaseolusvulgaris L.) from Mesoamerica. Theor Appl Genet 106:239–250.

13. Rossi M, et al. (2009) Linkage disequilibrium and population structure in wild anddomesticated populations of Phaseolus vulgaris L. Evol Appl 2:504–522.

14. Kwak M, Gepts P (2009) Structure of genetic diversity in the two major gene pools ofcommon bean (Phaseolus vulgaris L., Fabaceae). Theor Appl Genet 118:979–992.

15. Angioi SA, et al. (2009) Development and use of chloroplast microsatellites inPhaseolus spp. and other legumes. Plant Biol (Stuttg) 11:598–612.

16. Morrell PL, Clegg MT (2007) Genetic evidence for a second domestication of barley(Hordeum vulgare) east of the Fertile Crescent. Proc Natl Acad Sci USA 104:3289–3294.

17. Özkan H, Willcox G, Graner A, Salamini F, Kilian B (2010) Geographic distribution anddomestication of wild emmer wheat (Triticum dicoccoides). Genet Resour Crop Evol58:11–53.

18. Chacón S MI, Pickersgill B, Debouck DG (2005) Domestication patterns in commonbean (Phaseolus vulgaris L.) and the origin of the Mesoamerican and Andeancultivated races. Theor Appl Genet 110:432–444.

19. Nanni L, et al. (2011) Nucleotide diversity of a genomic sequence similar toSHATTERPROOF (PvSHP1) in domesticated and wild common bean (Phaseolus vulgarisL.). Theor Appl Genet 123:1341–1357.

20. Vitte C, Ishii T, Lamy F, Brar D, Panaud O (2004) Genomic paleontology providesevidence for two distinct origins of Asian rice (Oryza sativa L.). Mol Genet Genomics272:504–511.

21. Londo JP, Chiang YC, Hung KH, Chiang TY, Schaal BA (2006) Phylogeography of Asianwild rice, Oryza rufipogon, reveals multiple independent domestications of cultivatedrice, Oryza sativa. Proc Natl Acad Sci USA 103:9578–9583.

22. Molina J, et al. (2011) Molecular evidence for a single evolutionary origin ofdomesticated rice. Proc Natl Acad Sci USA 108:8351–8356.

23. Debouck DG, Toro O, Paredes OM, Johnson WC, Gepts P (1993) Genetic diversity andecological distribution of Phaseolus vulgaris in northwestern South America. Econ Bot47:408–423.

24. Kami J, Velásquez VB, Debouck DG, Gepts P (1995) Identification of presumedancestral DNA sequences of phaseolin in Phaseolus vulgaris. Proc Natl Acad Sci USA92:1101–1104.

25. Gepts P, Papa R, Coulibaly S, González Mejía A, Pasquet R (1999) Wild legumediversity and domestication—insights from molecular methods. Wild Legumes:Proceedings of the Seventh MAFF International Workshop on Genetic Resources, edVaughan D (National Institute of Agrobiological Resources, Tsukuba, Japan), pp19–31.

26. Schmit V, Jardin P, Baudoin JP, Debouck DG (1993) Use of chloroplast DNApolymorphisms for the phylogenetic study of seven Phaseolus taxa including P.vulgaris and P. coccineus. Theor Appl Genet 87:506–516.

27. Freytag GF, Debouck DG (2002) Taxonomy, Distribution, and Ecology of the GenusPhaseolus (Leguminosae-Papilionoideae) in North America, Mexico and CentralAmerica (Botanical Research Institute Of Texas, Ft. Worth, TX).

28. Delgado-Salinas A, Turley T, Richman A, Lavin M (1999) Phylogenetic analysis of thecultivated and wild species of Phaseolus (Fabaceae). Syst Bot 24:438–460.

29. Delgado-Salinas A, Bibler R, Lavin M (2006) Phylogeny of the genus Phaseolus(Leguminosae): A recent diversification in an ancient landscape. Syst Bot 31:779–791.

30. Koenig R, Singh SP, Gepts P (1990) Novel phaseolin types in wild and cultivatedcommon bean (Phaseolus vulgaris, Fabaceae). Econ Bot 44:50–60.

31. Chacón MI, Pickersgill B, Debouck DG, Salvador Arias J (2007) Phylogeographicanalysis of the chloroplast DNA variation in wild common bean (Phaseolus vulgaris L.)in the Americas. Plant Syst Evol 266:175–195.

32. Hougaard BK, et al. (2008) Legume anchor markers link syntenic regions betweenPhaseolus vulgaris, Lotus japonicus, Medicago truncatula and Arachis. Genetics 179:2299–2312.

33. Doyle JJ (1992) Gene trees and species trees: Molecular systematics as one charactertaxonomy. Syst Bot 17:144–163.

34. Mariette S, et al. (2001) Genetic diversity within and among Pinus pinasterpopulations: Comparison between AFLP and microsatellite markers. Heredity (Edinb)86:469–479.

35. Gaudeul M, Till-Bottraud I, Barjon F, Manel S (2004) Genetic diversity anddifferentiation in Eryngium alpinum L. (Apiaceae): Comparison of AFLP andmicrosatellite markers. Heredity (Edinb) 92:508–518.

36. Kropf M, Comes HP, Kadereit JW (2009) An AFLP clock for the absolute dating ofshallow-time evolutionary history based on the intraspecific divergence ofsouthwestern European alpine plant species. Mol Ecol 18:697–708.

37. Estoup A, Angers B (1998) Microsatellites and minisatellites for molecular ecology:Theoretical and experimental considerations. Advances in Molecular Ecology, edCarvalho G (IOS Press, Amsterdam), pp 55–86.

38. Garoia F, Guarniero I, Grifoni D, Marzola S, Tinti F (2007) Comparative analysis ofAFLPs and SSRs efficiency in resolving population genetic structure of MediterraneanSolea vulgaris. Mol Ecol 16:1377–1387.

39. Vigouroux Y, et al. (2002) Identifying genes of agronomic importance in maize byscreening microsatellites for evidence of selection during domestication. Proc NatlAcad Sci USA 99:9650–9655.

40. Thuillet AC, Bataillon T, Poirier S, Santoni S, David JL (2005) Estimation of long-termeffective population sizes through the history of durum wheat using microsatellitedata. Genetics 169:1589–1599.

41. Nei M, Maruyama T, Chakraborty R (1975) The bottleneck effect and geneticvariability in populations. Evolution 29:1–10.

42. Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicategenes. Science 290:1151–1155.

43. Tohme J, Gonzales DO, Beebe S, Duque MC (1996) AFLP analysis of gene pools ofa wild bean core collection. Crop Sci 36:1375–1384.

44. Hammer K (1984) The domestication syndrome (Das Domestikationssyndrom).Kulturpflanze 32:11–34.

45. Doebley J, Stec A, Wendel J, Edwards M (1990) Genetic and morphological analysis ofa maize-teosinte F2 population: Implications for the origin of maize. Proc Natl AcadSci USA 87:9888–9892.

46. Harlan JR (1992) Crops and Man (ASA, Madison, WI).47. Grandillo S, Tanksley SD (1996) QTL analysis of horticultural traits differentiating the

cultivated tomato from the closely related species Lycopersicon pimpinellifolium.Theor Appl Genet 92:935–951.

48. Koinange EMK, Singh SP, Gepts P (1996) Genetic control of the domesticationsyndrome in common bean. Crop Sci 36:1037–1045.

49. Xiong LZ, Liu KD, Dai XK, Xu CG, Zhang Q (1999) Identification of genetic factorscontrolling domestication-related traits of rice using an F2 population of a crossbetween Oryza sativa and O. rufipogon. Theor Appl Genet 98:243–251.

50. Poncet V, et al. (1998) Genetic analysis of the domestication syndrome in pearl millet(Pennisetum glaucum L, Poaceae): Inheritance of the major characters. Heredity 81:648–658.

51. Poncet V, et al. (2000) Genetic control of domestication traits in pearl millet(Pennisetum glaucum L, Poaceae). Theor Appl Genet 100:147–159.

52. Singh SP (2001) Broadening the genetic base of common bean cultivars: A review.Crop Sci 41:1659–1675.

53. Doyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small quantities of freshleaf tissue. Phytochem Bull 19:11–15.

54. Fredslund J, Schauser L, Madsen LH, Sandal N, Stougaard J (2005) PriFi: Usinga multiple alignment of related sequences to find primers for amplification ofhomologs. Nucleic Acids Res 33:W516–W520.

55. Fredslund J, et al. (2006) A general pipeline for the development of anchor markersfor comparative genomics in plants. BMC Genomics 7:207.

56. Fredslund J, et al. (2006) GeMprospector—online design of cross-species geneticmarker candidates in legumes and grasses. Nucleic Acids Res 34:W670–W675.

57. Vallejos CE, Sakiyama NS, Chase CD (1992) A molecular marker-based linkage map ofPhaseolus vulgaris L. Genetics 131:733–740.

58. Freyre R, et al. (1998) Towards an integrated linkage map of common bean. 4.Development of a core map and alignment of RFLP maps. Theor Appl Genet 97:847–856.

59. McClean PE, Lee RK, Otto C, Gepts P, Bassett MJ (2002) Molecular and phenotypicmapping of genes controlling seed coat pattern and color in common bean(Phaseolus vulgaris L.). J Hered 93:148–152.

Bitocchi et al. PNAS | Published online March 5, 2012 | E795

AGRICU

LTURA

LSC

IENCE

SPN

ASPL

US

Dow

nloa

ded

by g

uest

on

Feb

ruar

y 28

, 202

0

Page 9: Mesoamerican origin of the common bean …Mesoamerican origin of the common bean (Phaseolus vulgaris L.) is revealed by sequence data Elena Bitocchia, Laura Nannia, Elisa Belluccia,

60. Altschul SF, et al. (1997) Gapped BLAST and PSI-BLAST: A new generation of proteindatabase search programs. Nucleic Acids Res 25:3389–3402.

61. Edgar RC (2004) MUSCLE: Multiple sequence alignment with high accuracy and highthroughput. Nucleic Acids Res 32:1792–1797.

62. Hall TA (1999) BioEdit: A user-friendly biological sequence alignment editor andanalysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41:95–98.

63. Librado P, Rozas J (2009) DnaSP v5: A software for comprehensive analysis of DNApolymorphism data. Bioinformatics 25:1451–1452.

64. Tajima F (1983) Evolutionary relationship of DNA sequences in finite populations.Genetics 105:437–460.

65. Watterson GA (1975) On the number of segregating sites in genetical models withoutrecombination. Theor Popul Biol 7:256–276.

66. Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNApolymorphism. Genetics 123:585–595.

67. Bandelt HJ, Forster P, Röhl A (1999) Median-joining networks for inferringintraspecific phylogenies. Mol Biol Evol 16:37–48.

68. Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4: Molecular Evolutionary Genetics

Analysis (MEGA) software version 4.0. Mol Biol Evol 24:1596–1599.69. Corander J, Waldmann P, Sillanpää MJ (2003) Bayesian analysis of genetic

differentiation between populations. Genetics 163:367–374.70. Corander J, Marttinen P (2006) Bayesian identification of admixture events using

multilocus molecular markers. Mol Ecol 15:2833–2843.71. Corander J, Tang J (2007) Bayesian analysis of population structure based on linked

molecular information. Math Biosci 205:19–31.72. Corander J, Marttinen P, Sirén J, Tang J (2008) Enhanced Bayesian modelling in BAPS

software for learning genetic structures of populations. BMC Bioinformatics 9:539.73. Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using

multilocus genotype data. Genetics 155:945–959.74. Falush D, Stephens M, Pritchard JK (2003) Inference of population structure using

multilocus genotype data: Linked loci and correlated allele frequencies. Genetics 164:

1567–1587.

E796 | www.pnas.org/cgi/doi/10.1073/pnas.1108973109 Bitocchi et al.

Dow

nloa

ded

by g

uest

on

Feb

ruar

y 28

, 202

0