establishing a soybean germplasm core collection ·...

13
Field Crops Research 119 (2010) 277–289 Contents lists available at ScienceDirect Field Crops Research journal homepage: www.elsevier.com/locate/fcr Establishing a soybean germplasm core collection Marcelo F. Oliveira a,, Randall L. Nelson b , Isaias O. Geraldi c , Cosme D. Cruz d , José Francisco F. de Toledo a a Embrapa Soybean, P.O. Box 231, Londrina, PR 86001-970, Brazil b USDA-Agricultural Research Service, Soybean/Maize Germplasm, Pathology, and Genetics Research Unit, Dept. of Crop Sciences, 1101 W. Peabody Dr., Univ. of Illinois, Urbana, IL 61801, United States c University of São Paulo-Faculdade de Agronomia-ESALQ, Dept. of Genetics, P.O. Box 83, Piracicaba, SP 13400-970, Brazil d Federal University of Vic ¸ osa, Faculdade de Agronomia, Avenida Peter Henry Rolfs, s/n Campus Universitário, Vic ¸ osa, MG 36570-000, Brazil article info Article history: Received 9 March 2010 Received in revised form 23 July 2010 Accepted 25 July 2010 Keywords: Glycine max Genetic diversity Germplasm bank Sampling strategies abstract Core collections are of strategic importance as they allow the use of a small part of a germplasm collection that is representative of the total collection. The objective of this study was to develop a soybean core collection of the USDA Soybean Germplasm Collection by comparing the results of random, proportional, logarithmic, multivariate proportional and multivariate logarithmic sampling strategies. All but the ran- dom sampling strategy used stratification of the entire collection based on passport data and maturity group classification. The multivariate proportional and multivariate logarithmic strategies made further use of qualitative and quantitative trait data to select diverse accessions within each stratum. The 18 quantitative trait data distribution parameters were calculated for each core and for the entire collection for pairwise comparison to validate the sampling strategies. All strategies were adequate for assembling a core collection. The random core collection best represented the entire collection in statistical terms. Proportional and logarithmic strategies did not maximize statistical representation but were better in selecting maximum variability. Multivariate proportional and multivariate logarithmic strategies pro- duced the best core collections as measured by maximum variability conservation. The soybean core collection was established using the multivariate proportional selection strategy. © 2010 Elsevier B.V. All rights reserved. 1. Introduction Access to genetic variability is critical for plant breeding. Germplasm conservation centers were created to preserve the available genetic variability before it is lost due to the widespread use of modern, improved cultivars (Brown, 1989b). In the 1980s, the International Board for Plant Genetic Resources (IBPGR) pro- vided substantial financial support for germplasm preservation that resulted in an increase in the number of collections. The emphasis placed on conservation led to the establishment and preservation of large collections, which were not totally or even partially evaluated or characterized. Although large germplasm collections are desirable from the perspective of genetic variabil- ity preservation (Frankel and Bennett, 1970), their usefulness and accessibility can be inversely related to their size (Frankel and Soulé, 1981). The increase in the number of accessions that are not adequately evaluated can diminish the effectiveness of collections (Holden, 1984; Marshall, 1989). Corresponding author. Tel.: +55 43 3371 6263; fax: +55 43 33716001. E-mail address: [email protected] (M.F. Oliveira). China has the largest Glycine collection, with approximately 26,000 accessions of Glycine max and 6200 accessions of Glycine soja, located in the Institute of Crop Germplasm Resources of the Chinese Academy of Agricultural Science in Beijing (Chang et al., 1999; Carter et al., 2004) and has developed a core collection of this accessions (Zhang et al., 2003). A core collection of the perennial Glycine species has also been established (Brown et al., 1987). The soybean germplasm collection of the United States Depart- ment of Agriculture (USDA) is the second largest in the world with 16,999 accessions of introduced G. max, 1116 accessions of G. soja and 919 accessions of perennial Glycine species. Detailed origin data are available for most entries in the USDA Soybean Germplasm Collection as are data for many descriptive, agronomic, and seed composition traits. For most germplasm collections, there is a gap between the germplasm availability and its use (Peeters and Galwey, 1988) due to the collection size and financial limi- tations. In general, germplasm conservation programs have been more successful in ensuring long-term preservation then in facili- tating germplasm use. The establishment of core collections, as proposed by Frankel and Brown (1984), is an effective strategy to optimize human, mate- rial, and financial resources by providing greater efficiency in the use of germplasm collections (Spagnoletti-Zeuli and Qualset, 1993; 0378-4290/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.fcr.2010.07.021

Upload: lykien

Post on 27-Jul-2018

213 views

Category:

Documents


0 download

TRANSCRIPT

E

MJa

b

6c

d

a

ARRA

KGGGS

1

GautvteppciaSa(

0d

Field Crops Research 119 (2010) 277–289

Contents lists available at ScienceDirect

Field Crops Research

journa l homepage: www.e lsev ier .com/ locate / fc r

stablishing a soybean germplasm core collection

arcelo F. Oliveiraa,∗, Randall L. Nelsonb, Isaias O. Geraldi c, Cosme D. Cruzd,osé Francisco F. de Toledoa

Embrapa Soybean, P.O. Box 231, Londrina, PR 86001-970, BrazilUSDA-Agricultural Research Service, Soybean/Maize Germplasm, Pathology, and Genetics Research Unit, Dept. of Crop Sciences, 1101 W. Peabody Dr., Univ. of Illinois, Urbana, IL1801, United StatesUniversity of São Paulo-Faculdade de Agronomia-ESALQ, Dept. of Genetics, P.O. Box 83, Piracicaba, SP 13400-970, BrazilFederal University of Vicosa, Faculdade de Agronomia, Avenida Peter Henry Rolfs, s/n Campus Universitário, Vicosa, MG 36570-000, Brazil

r t i c l e i n f o

rticle history:eceived 9 March 2010eceived in revised form 23 July 2010ccepted 25 July 2010

eywords:lycine maxenetic diversity

a b s t r a c t

Core collections are of strategic importance as they allow the use of a small part of a germplasm collectionthat is representative of the total collection. The objective of this study was to develop a soybean corecollection of the USDA Soybean Germplasm Collection by comparing the results of random, proportional,logarithmic, multivariate proportional and multivariate logarithmic sampling strategies. All but the ran-dom sampling strategy used stratification of the entire collection based on passport data and maturitygroup classification. The multivariate proportional and multivariate logarithmic strategies made furtheruse of qualitative and quantitative trait data to select diverse accessions within each stratum. The 18

ermplasm bankampling strategies

quantitative trait data distribution parameters were calculated for each core and for the entire collectionfor pairwise comparison to validate the sampling strategies. All strategies were adequate for assemblinga core collection. The random core collection best represented the entire collection in statistical terms.Proportional and logarithmic strategies did not maximize statistical representation but were better inselecting maximum variability. Multivariate proportional and multivariate logarithmic strategies pro-duced the best core collections as measured by maximum variability conservation. The soybean core

d usin

collection was establishe

. Introduction

Access to genetic variability is critical for plant breeding.ermplasm conservation centers were created to preserve thevailable genetic variability before it is lost due to the widespreadse of modern, improved cultivars (Brown, 1989b). In the 1980s,he International Board for Plant Genetic Resources (IBPGR) pro-ided substantial financial support for germplasm preservationhat resulted in an increase in the number of collections. Themphasis placed on conservation led to the establishment andreservation of large collections, which were not totally or evenartially evaluated or characterized. Although large germplasmollections are desirable from the perspective of genetic variabil-ty preservation (Frankel and Bennett, 1970), their usefulness andccessibility can be inversely related to their size (Frankel and

oulé, 1981). The increase in the number of accessions that are notdequately evaluated can diminish the effectiveness of collectionsHolden, 1984; Marshall, 1989).

∗ Corresponding author. Tel.: +55 43 3371 6263; fax: +55 43 33716001.E-mail address: [email protected] (M.F. Oliveira).

378-4290/$ – see front matter © 2010 Elsevier B.V. All rights reserved.oi:10.1016/j.fcr.2010.07.021

g the multivariate proportional selection strategy.© 2010 Elsevier B.V. All rights reserved.

China has the largest Glycine collection, with approximately26,000 accessions of Glycine max and 6200 accessions of Glycinesoja, located in the Institute of Crop Germplasm Resources of theChinese Academy of Agricultural Science in Beijing (Chang et al.,1999; Carter et al., 2004) and has developed a core collection of thisaccessions (Zhang et al., 2003). A core collection of the perennialGlycine species has also been established (Brown et al., 1987).

The soybean germplasm collection of the United States Depart-ment of Agriculture (USDA) is the second largest in the worldwith 16,999 accessions of introduced G. max, 1116 accessions ofG. soja and 919 accessions of perennial Glycine species. Detailedorigin data are available for most entries in the USDA SoybeanGermplasm Collection as are data for many descriptive, agronomic,and seed composition traits. For most germplasm collections, thereis a gap between the germplasm availability and its use (Peetersand Galwey, 1988) due to the collection size and financial limi-tations. In general, germplasm conservation programs have beenmore successful in ensuring long-term preservation then in facili-

tating germplasm use.

The establishment of core collections, as proposed by Frankeland Brown (1984), is an effective strategy to optimize human, mate-rial, and financial resources by providing greater efficiency in theuse of germplasm collections (Spagnoletti-Zeuli and Qualset, 1993;

2 ps Re

vam(tc(roba1maopTqviumrof

2

2

b1aatwpoutItsaacfeftspgoprfiorso11t

78 M.F. Oliveira et al. / Field Cro

an Hintum et al., 2000). A core collection is formed by selectingsmall percentage of the original collection that will representost of the total genetic variation with a minimum of redundancy

Brown, 1995). Those accessions not retained in the core collec-ion are still maintained to preserve rare and potentially importantombinations of alleles. Core collections are not necessarily staticBrown et al., 1987) and accessions may be added over time to rep-esent new geographical areas or taxons, or to replace accessionsf questionable value. Ideally, changes in the core collection shoulde infrequent as one of the goals of a core collection is to constructn extensive data base on these reference accessions (Brown et al.,987). The general steps to establish a core collection are: (a) deter-ine the size of the core; (b) divide the collection in distinct groups;

nd (c) select entries in each group to form the core. The complexityf establishing a core is a function of the available data and sam-ling procedure used (Brown, 1989a,b; Brown and Spillane, 1999).he established core collection must be validated to ensure its ade-uacy and usefulness by assessing whether the characteristics andariability of the entire collection have been maintained. A compar-son of the entire and core collection properties is accomplishedsing mean, variance, frequency, and distribution data of severalorphological traits or molecular markers. The objective of this

esearch was to develop a core collection of the G. max accessionsf the USDA Soybean Germplasm Collection based on the resultsrom five selection strategies.

. Materials and methods

.1. Data tabulation

The entire introduced G. max sub-collection of the USDA Soy-ean Germplasm Collection consists of 16,999 accessions, but only5,558, the entire collection in this work, have been evaluated forny quantitative traits. A majority of those accessions not evalu-ted (960) are in maturity groups IX and X, which are not adaptedo any location in the continental US Only the evaluated accessionsere subjected to the five sampling strategies. In addition to pass-ort information, evaluation data gathered from 1963 to 2005 inne of four locations where specific accessions are adapted weresed. The 000, 00, 0 and I maturity group accessions were charac-erized and assessed in the State of de Minnesota; the I, II, III andV maturity group accessions were characterized and assessed inhe State of Illinois; the V, VI, VII and VIII maturity group acces-ions were characterized and assessed in the State of Mississipi;nd the IX and X maturity group accessions were characterized andssessed in the territory of Porto Rico. The evaluation data wereollected over two years of testing, assessed in the field plots withour 3.6 m long rows, spaced at 0.76 m and classified in two cat-gories, qualitative and quantitative (one replication in each yearor qualitative traits an two replications in each year for quantita-ive traits). The qualitative data included flower color, pod color,eed coat color, hilum color, pubescence color, pubescence form,ubescence density, seed coat luster, plant growth habit, maturityroup, and origin (Hill et al., 2005). The quantitative data, averagedver the two years, included flowering date recorded as days afterlanting when 50% of the plants have one flower; maturity dateecorded as days after planting when 95% of the pods have reachednal color; plant height recorded at maturity; lodging score ratedn a scale from 1 (erect) to 5 (prostrate); stem termination scoreecorded on a scale from 1 to 5 with 1 representing very abrupt

tem termination and 5 being very viney; shattering score recordedn the date of maturity and 2 weeks after maturity on a scale of–5 with 1 equal no shattering, 2 equal 1–10% open pods, 3 equal0–25% open pods, 4 equal 25–50% open pods and 5 equal greaterhan 50% open pods; seed quality score on a scale of 1–5 considering

search 119 (2010) 277–289

wrinkling, defective seed coat, greenish or diseased seeds; mottlingscore on a scale of 1–5 with 1 equal no mottling, 2 equal 1–10% ofseed coat mottled, 3 equal 10–25% of seed coat mottled, 4 equal25–50% of seed coat mottled, and 5 equal greater than 50% of seedcoat mottled; 100 seed weight; seed yield; protein concentration;oil concentration; percentage of palmitic, stearic, oleic, linoleic andlinolenic acid in the oil (Hill et al., 2005). With such a massive eval-uation effort spanning more than forty years, it is not surprisingthat a full complement of data is not available for every accession.Some data points were lost simply because of poor stands or otherfactors that prevented specific data from being collected. Data onsome traits were not collected in some of the first evaluations (e.g.,palmitic, stearic, and oleic acid concentration). Some data were notcollected in some experiments because it would not be meaningful(e.g., shattering scores for accessions that mature very late for theevaluation location) and it is not possible to collect some data (e.g.,mottling scores on seeds with brown or black seed coats).

2.2. Choice of sample size and representativeness

The portion of the core collection derived from the evaluatedaccessions was defined as containing 1600 accessions. This sizerepresents a sample of approximately 10% of the 15,558 evaluatedaccessions and complies with Brown’s (1989b) report on variabilitypreservation in core collections based on neutral allele theory. Twomain sampling strategies were initially used to establish the corecollections: (a) random selection of accessions; and (b) stratifiedsampling selection (Brown, 1989b; Spagnoletti-Zeuli and Qualset,1993; Hu et al., 2000). In the first procedure, the 15,558 accessionswere kept in a single group from which 1600 accessions were ran-domly sampled. In the stratified sampling strategy, the accessionswere classified in groups based on geographic origin and subgroupsbased on maturity groups to form non-overlapping strata.

2.3. Accessions clustering or stratification

There were four large groups based on geographical origin: (a)China (primary diversity center); (b) Korea and Japan (secondarydiversity center); (c) other Asian countries including Russia (ter-tiary diversity center); (d) other regions (the Americas, Africa,Europe and accessions from unknown regions). These groupingswere based on several studies that examined the relationshipbetween geographic origin and genetic relatedness among soybeanaccessions (Hymowitz and Kaizuma, 1981; Perry and McIntosh,1991; Griffin and Palmer, 1995; Li and Nelson, 2001). Griffin andPalmer (1995), using variation in isozymes, reported that acces-sions from Japan and Korea were more closely related to each otherthan to accessions from China and that accessions from southeastand south central Asia formed distinct groups. Analyzing morpho-logical data, Perry and McIntosh (1991) found differences betweennorth and central China, and accessions from other Asian countrieswere distinct from China, Japan, and Korea. Based on random ampli-fied polymorphic DNA (RAPD) data, Li and Nelson (2001) foundthat the genetic diversity within China was related to provincialorigin but this was not true for Japan and South Korea. Diversitywithin China was also much greater than within either Japan orSouth Korea. Subdividing by each Chinese province would have cre-ated too many strata so China was divided into four subgroups: thenortheastern region of China; the Huanghe-Huaihe-Haihe region ofcentral China; southern China, and accessions of unknown provin-cial origin (Table 1). Li and Nelson (2001) also showed that Japan

and South Korea were similar in genetic diversity but distinct fromChina so those countries were place together in a separate group.There are not adequate data available to examine genetic diver-sity patterns among and within other Asian countries but theywere designated as a single tertiary source based their proximity to

M.F. Oliveira et al. / Field Crops Re

Table 1Number of evaluated soybean accessions in each origin category in the USDA Soy-bean Germplasm Collection.

Origin Number of accessions

1 China 60641.1 Northeast region 17621.2 Huanghe-Huaihe-Haihe region 14711.3 South region 14501.4 No provincial identification 13812 Japan and Korea 6479

CanttsdsabItmAeStapsgiqcoe

rccwGamfgKe

TNb

3 Russia and other Asian countries 15194 Other countries 1496

Total accessions 15,558

hina and the limited data that are available that indicate they aren important source of genetic diversity. Because germplasm fromon-Asian sources are relatively recent introductions from Asia,hose accessions could have been excluded from consideration forhe core collection, but Brown-Guedira et al. (2000) showed thatome accessions obtained from Europe or Africa were geneticallyistinct indicating germplasm from these countries may repre-ent sources of genetic diversity that are not represented by Asianccessions. Further stratification to form subgroups was performedased on 12 maturity groups (000, 00, 0, I, II, III, IV, V, VI, VII, VIII and

X) which classify soybean accessions based on photoperiod andemperature response (Carlson and Lersten, 2004). This subdivides

ajor geographical groups into zones of more specific adaptation.total of 81 strata were established and accession selection within

ach stratum was used to assemble each core collection (Table 3).ince maturity groups IX and X cannot be evaluated in continen-al US, there is almost no quantitative data on these accessionnd they were nearly totally excluded from the statistically sam-ling procedures (Table 2). Because these accessions are adapted toub-tropical and tropical conditions and are likely to have uniqueenetic diversity, a sample of 10% (96) of these accessions wasncluded in the core collection based on multivariate analysis of theualitative data that are available for these accessions. The wholeore collection, therefore, consisted of 1696 accessions, but becausenly qualitative data are available for these 96 accessions they werexcluded from the validation procedures.

Based on available data about genetic diversity, two additionalestrictions were added to the sampling procedure: a larger per-entage of accessions from China were included than from otherountries and cultivars derived from scientific planting breedingere excluded. The number of accessions in the USDA Soybeanermplasm Collection from Japan and Korea is nearly the sames the number from China (Table 1); however, China represents a

uch larger and more environmentally diverse area. The results

rom Li and Nelson (2001) also indicate that there is much moreenetic diversity within Chinese accessions than in those fromorea and Japan. Since the soybean was domesticated in China (Gait al., 2000; Zhou et al., 1998) and has been cultivated in Korea

able 2umber of evaluated soybean accessions in each maturity group in the USDA Soy-ean Germplasm Collection.

Maturitygroup

Number ofaccessions

Maturitygroup

Number ofaccessions

000 129 V 2,33300 466 VI 1,3160 1050 VII 846I 1622 VIII 709II 1732 IX 39III 1634IV 3682 Total

Acces-sions

15,558

search 119 (2010) 277–289 279

and Japan for over 2000 years (Kihara, 1969), genetic diversity inaccessions from these east Asia countries will be greater than inaccessions from other parts of the world. Based on this information,approximately 15% of the accessions from the strata representingChina were selected and approximately 7.5% of the accessions in theother strata were chosen. These changes within strata were imple-mented so as to not change the total number of accessions in thecore collection. Additionally, we eliminated, as best we could usingavailable pedigree information, all cultivars derived from scientificplant breeding from being added to the core collection.

2.4. Number of entries (accessions) per group and accessionchoice

A total of five selection strategies were used: (a) random sam-pling from the entire collection as a single stratum and accessionswere randomly selected; (b) proportional sampling selected ran-dom accessions proportional to the stratum size; (c) logarithmicsampling selected random accessions proportional to the log ofthe stratum size; (d) multivariate proportional sampling (Crossaet al., 1995) selected accessions proportionally to the stratum sizecoupled with within stratum selection based on the inverse multi-variate Tocher procedure (Rao, 1952; Cruz, 2006; Vasconcelos et al.,2007); (e) Multivariate logarithmic sampling selected accessionsproportionally to the log of the stratum size coupled with withinstratum based on the inverse Tocher procedure.

In the strategies where the multivariate analysis was used tochoose the accessions, the following analyses were performed ineach stratum. For qualitative traits, the phenotypic distance wascalculated as a simple matching coefficient. The qualitative datawere transformed into binary data, coded in zeros and ones, pro-ducing a data file with 15,558 lines and 131 columns. The 131columns were the result of the sum of the number of classes ofeach qualitative characteristic. From these data a simple match-ing index was calculated with the Genes computer package (Cruz,2006), using the following formula:

dii′ =

√√√√1t

(t∑i

bi + ci

ni

)

where dii′ = distance associated with the simple matching coeffi-cient between the individuals i and i′; t = number of traits analyzed(t = 11); ni = number of data points per trait i (ni varies from trait totrait); bi and ci = unmatched data, 1–0 and 0–1 types, respectively,for trait i.

The qualitative distance matrix was generated for each stratumusing these distances. For quantitative traits, the mean standard-ized Euclidian distance was used to calculate the phenotypicdistance. As the traits used in this study were quantified in dif-ferent units (weight, length, percentages, etc.) and the scale effectsthe result obtained, the standardized values were used for the dis-tance calculation. A quantitative distance matrix was generatedfor each stratum using these distances. The total distance matrixwas obtained as the sum of the simple matching and the meanstandardized Euclidian distance matrices. The distance values foreach matrix were standardized prior to calculating the total dis-tance matrix. This was necessary because the amplitude of thedistance values varied from zero to one in the distance matrixobtained from the simple matching coefficient and from zero to+∞ in the distance matrix obtained from the mean standardized

Euclidian distance. The total distance matrix was generated for eachstratum. The inverse multivariate Tocher procedure uses the totaldistance matrix for clustering the most distant accessions. Accord-ing to the proposed method, the most distant pair of individualsis detected to form the initial group. To include another accession

2 ps Re

isins

2

tforcteaaishtttsef1q1tFdracbtivTtwctsbtZacioraPctato1D

R

80 M.F. Oliveira et al. / Field Cro

n this group the distances between the group and the other acces-ions are calculated and the most distant accession is then includedn the initial group. The procedure is repeated until the requiredumber of accessions is included in each group, which forms theampled stratum.

.5. Validation of the core collection

The five cores developed were compared to the entire collec-ion, using the quantitative data for 18 traits for alterations in therequency distribution of the accessions, for means and variancesf the measured traits, for among traits correlations, and for theange retention index. This validation process used the statisti-al parameters to assess how accurately each core represented theotal variability in the entire collection. The distributions betweenach core and entire collection for each of the 11 qualitative traitsnd 18 quantitative traits were analyzed using the �2 test (Snedecornd Cochran, 1980) to verify whether the frequency of the traitsn the accessions in each stratum of the core collection remainedimilar to the frequency in the entire collection and to verify theomogeneity in the frequency distribution of the traits betweenhe entire collection and each core collection. The classes of eachrait were determined by dividing the range in the entire collec-ion by 20, which was empirically considered adequate for thetudied traits (Sturges, 1926). These ranges were also used withinach core collection. If the class in the core collections containedewer than five accessions, the Yates correction was used (Yates,934; Sokal and Rohlf, 1981). Homogeneity of variances of the 18uantitative traits was compared by F-tests (Snedecor and Cochran,980) to determine any differences in variance between each ofhe cores and the entire collection. The one-tail table was used for> 1 as reported by Snedecor and Cochran (1980). For this proce-ure, consider two populations (X and Y), where s2

x and s2y are the

espective variances for the same trait. The greater variance wasssigned s2

x and the smaller variance was assigned s2y , so that the

alculated F value obtained was always greater than 1. The proba-ility expressed by the F value was multiplied by two and comparedo the one-tail table value. This procedure was necessary becauset is a one-tail test determined by choosing the numerator as theariance of greater value (Snedecor and Cochran, 1980; Cruz, 1997).he 18 quantitative trait means were compared by the Student t-ests (Snedecor and Cochran, 1980). The phenotypic correlationsere independently calculated between each core and the entire

ollection for the 18 quantitative traits to investigate whetherhese associations, which may be under genetic control, were con-erved in the core collections. Pearson’s correlation coefficientsetween quantitative traits were calculated. Equality of correla-ion coefficients in the two collections was tested with Fisher’s

transformation (Snedecor and Cochran, 1980) and the associ-ted test of significance. Because 18 traits were evaluated, 153omparisons were possible [n = 18 traits, n(n − 1)/2 = 153 compar-sons]. Since these comparisons represent sequential comparisonsn the same data set, it was advisable to apply Bonferroni’s cor-ection to the significance level of the statistical tests (Snedecornd Cochran, 1980). Therefore, a significance probability level of≤ 0.0003 (P ≤ 0.05/153) was required for an individual correlationoefficient to be considered significant. To determine the propor-ion of the entire collection range retained by each core collection,range retention index was calculated for each of the 18 quantita-

ive traits. The range retention index is the proportion of the rangef the entire collection maintained in a core collection (Diwan et al.,

995). The mean range retention index was calculated according toiwan et al. (1995), using the following equation:

R =∑t

i=1AiCN/AiCB

t,

search 119 (2010) 277–289

where RR = mean range retention index; AiCN = amplitude of traiti in the core collection; AiCB = amplitude of trait i in the entire col-lection; t = number of traits.

A core collection was considered to be representative of theentire collection and, therefore acceptable, if 30% or fewer (Diwanet al., 1995; Skinner et al., 1999; Hu et al., 2000) of the trait meansand ranges were significantly different (P ≤ 0.05) from those ofthe entire collection. A core collection was taken as representa-tive of the entire collection as a resource for plant breeders whenmeans, ranges and phenotypic correlations between the trait weremaintained and when variances and kurtosis were unequal, thatis the variance of core > variance of entire collection and the kur-tosis of the core < kurtosis of entire collection (Holbrook et al.,1993; Radovic and Jelovac, 1994; Charmet and Balfourier, 1995;Skinner et al., 1999; van Hintum et al., 2000). Therefore, the coredistribution properties would also differ. The assumption, whichwas verified by computer simulation, was that to maintain thegenetic variability of a larger population in smaller populations itwas necessary to increase the proportion of extreme phenotypes(Spagnoletti-Zeuli and Qualset, 1993; van Hintum et al., 2000).

The sample size of 1600 accessions (10.28%) was large enoughto ensure that the all core collections maintained most of the statis-tical properties of the entire collection and all sampling strategiesused to assemble core collections met the requirements presentedin the literature to preserve the entire collection accession variabil-ity.

3. Results

3.1. Size of the core collection and strata

The random core collection was developed using random sam-pling to select 1600 (10.3%) accessions from the 15,558 evaluatedaccessions in the collection. The proportional, multivariate propor-tional, logarithmic and multivariate logarithmic core collectionswere each assembled by selecting 1600 accessions from within the81 strata in which these accessions had been partitioned. Acces-sion number per stratum ranged from 1 to 2407 (Table 3 ). Theproportional and logarithmic sampling strategies were applied todetermine the number of accessions to sample by either randomor multivariate (trait diversity) driven selection within each stra-tum. In the proportional strategy cases, there were eight strata withtwo or fewer accessions and all accessions from those strata wereincluded in the core (Table 3). There were also five strata with 3–10accessions and more than 10% of the accessions were selected. Inthe logarithmic strategy cases, all accessions in strata with 11 orfewer accessions were included in the cores (Table 3). These alter-ations in the sampling strategies were done to ensure that the stratawith very few accessions were adequately represented in the finalcore.

3.2. Validation of the representativeness of the strategies to formthe core collection/Comparison between the entire collection andthe core collection

3.2.1. Comparison of frequency distributionValidation of the cores using the distribution parameters of the

11 qualitative traits and 18 quantitative traits showed that the ran-dom selection strategy provided the core with the best statisticalrepresentation of each of the 81 strata based on origin and matu-

rity group. The random strategy also most accurately representedthe distribution of values of the 11 qualitative traits and 18 quanti-tative traits, with no trait distribution being significantly differentfrom the corresponding trait of the entire collection (Table 4). Inthe proportional strategy, the distribution of only plant growth

M.F. Oliveira et al. / Field Crops Research 119 (2010) 277–289 281

Table 3Stratum definition, number of evaluated accessions in each stratum in the USDA Soybean Germplasm Collection, and number of accessions in each stratum chosen in eachof the selection strategies.

Stratum Stratum code Evaluated collection Random Proportional Logarithmic

Origina Maturity group Number of accession % Number of accession % Number of accession % Number of accession %

1 2 I 180 1.16 17 1.06 18 1.13 25 1.562 2 II 338 2.17 39 2.44 34 2.13 28 1.753 2 III 736 4.73 77 4.81 74 4.63 32 2.004 2 IV 2407 15.47 245 15.31 241 15.06 37 2.315 2 V 1577 10.14 165 10.31 158 9.88 35 2.196 2 VI 737 4.74 71 4.44 74 4.63 32 2.007 2 VII 227 1.46 26 1.63 23 1.44 26 1.638 2 VIII 152 0.98 14 0.88 16 1.00 24 1.509 2 IX 3 0.02 1 0.06 2 0.13 3 0.1910 2 0 75 0.48 12 0.75 8 0.50 21 1.3111 2 00 23 0.15 0 0.00 3 0.19 15 0.9412 2 000 24 0.15 2 0.13 3 0.19 15 0.9413 3 I 226 1.45 22 1.38 23 1.44 26 1.6314 3 II 172 1.11 17 1.06 18 1.13 25 1.5615 3 III 61 0.39 8 0.50 6 0.38 20 1.2516 3 IV 93 0.60 6 0.38 10 0.63 22 1.3817 3 V 148 0.95 16 1.00 15 0.94 24 1.5018 3 VI 110 0.71 7 0.44 11 0.69 23 1.4419 3 VII 164 1.05 21 1.31 17 1.06 25 1.5620 3 VIII 313 2.01 34 2.13 32 2.00 28 1.7521 3 IX 4 0.03 2 0.13 2 0.13 4 0.2522 3 0 130 0.84 11 0.69 13 0.81 23 1.4423 3 00 82 0.53 7 0.44 9 0.56 21 1.3124 3 000 16 0.10 2 0.13 2 0.13 12 0.7525 4 I 234 1.50 23 1.44 24 1.50 26 1.6326 4 II 79 0.51 9 0.56 8 0.50 21 1.3127 4 III 68 0.44 5 0.31 7 0.44 20 1.2528 4 IV 54 0.35 8 0.50 6 0.38 19 1.1929 4 V 62 0.40 7 0.44 7 0.44 20 1.2530 4 VI 57 0.37 6 0.38 6 0.38 19 1.1931 4 VII 63 0.40 3 0.19 7 0.44 20 1.2532 4 VIII 67 0.43 6 0.38 7 0.44 20 1.2533 4 IX 2 0.01 1 0.06 2 0.13 2 0.1334 4 0 459 2.95 52 3.25 46 2.88 29 1.8135 4 00 281 1.81 31 1.94 28 1.75 27 1.6936 4 000 70 0.45 6 0.38 7 0.44 20 1.2537 1.1 I 439 2.82 44 2.75 44 2.75 29 1.8138 1.1 II 618 3.97 72 4.50 62 3.88 31 1.9439 1.1 III 327 2.10 35 2.19 33 2.06 28 1.7540 1.1 IV 126 0.81 6 0.38 13 0.81 23 1.4441 1.1 V 21 0.13 4 0.25 3 0.19 15 0.9442 1.1 VI 11 0.07 3 0.19 2 0.13 11 0.6943 1.1 VII 1 0.01 0 0.00 1 0.06 1 0.0644 1.1 0 166 1.07 16 1.00 17 1.06 25 1.5645 1.1 00 46 0.30 6 0.38 5 0.31 18 1.1346 1.1 000 7 0.04 0 0.00 4 0.25 7 0.4447 1.2 I 40 0.26 2 0.13 4 0.25 18 1.1348 1.2 II 109 0.70 17 1.06 11 0.69 23 1.4449 1.2 III 220 1.41 23 1.44 22 1.38 26 1.6350 1.2 IV 560 3.60 52 3.25 56 3.50 30 1.8851 1.2 V 233 1.50 27 1.69 24 1.50 26 1.6352 1.2 VI 139 0.89 14 0.88 14 0.88 24 1.5053 1.2 VII 129 0.83 14 0.88 13 0.81 23 1.4454 1.2 VIII 25 0.16 3 0.19 3 0.19 15 0.9455 1.2 IX 2 0.01 0 0.00 2 0.13 2 0.1356 1.2 0 12 0.08 2 0.13 2 0.13 11 0.6957 1.2 00 1 0.01 0 0.00 1 0.06 1 0.0658 1.2 000 1 0.01 0 0.00 1 0.06 1 0.0659 1.3 I 26 0.17 1 0.06 3 0.19 16 1.0060 1.3 II 72 0.46 6 0.38 8 0.50 21 1.3161 1.3 III 103 0.66 10 0.63 11 0.69 22 1.3862 1.3 IV 327 2.10 30 1.88 33 2.06 28 1.7563 1.3 V 261 1.68 25 1.56 26 1.63 27 1.6964 1.3 VI 235 1.51 26 1.63 24 1.50 26 1.6365 1.3 VII 251 1.61 25 1.56 25 1.56 27 1.6966 1.3 VIII 142 0.91 10 0.63 15 0.94 24 1.5067 1.3 IX 28 0.18 3 0.19 3 0.19 16 1.0068 1.3 0 2 0.01 1 0.06 2 0.13 2 0.1369 1.3 00 2 0.01 0 0.00 2 0.13 2 0.1370 1.3 000 1 0.01 0 0.00 1 0.06 1 0.0671 1.4 I 477 3.07 53 3.31 48 3.00 30 1.88

282 M.F. Oliveira et al. / Field Crops Research 119 (2010) 277–289

Table 3 (Continued )

Stratum Stratum code Evaluated collection Random Proportional Logarithmic

Origina Maturity group Number of accession % Number of accession % Number of accession % Number of accession %

72 1.4 II 344 2.21 34 2.13 35 2.19 28 1.7573 1.4 III 119 0.76 12 0.75 12 0.75 23 1.4474 1.4 IV 115 0.74 9 0.56 12 0.75 23 1.4475 1.4 V 31 0.20 2 0.13 3 0.19 17 1.0676 1.4 VI 27 0.17 5 0.31 3 0.19 16 1.0077 1.4 VII 11 0.07 1 0.06 2 0.13 11 0.6978 1.4 VIII 10 0.06 2 0.13 2 0.13 10 0.6379 1.4 0 206 1.32 20 1.25 21 1.31 26 1.6380 1.4 00 31 0.20 6 0.38 3 0.19 17 1.0681 1.4 000 10 0.06 0 0.00 2 0.13 10 0.63Total 15558 1600 1600 1600�2b 36.94ns 68.88ns 1969.77c

ihe-HK

hb(mmoc(scafrt

TCmt

n

a Number is the origin code 1.1 – China – Northeast; 1.2 – China – Huanghe-Huaorea; 3 – Russia and other Asian countries; 4 – other countries.b ns—not significant.c Significant at 1% probability by the “�2” test.

abit was significantly different from the corresponding distri-ution in the entire collection. On the other hand, all but onematurity group) distributions were significantly different in the

ultivariate proportional strategy and all multivariate logarith-ic trait distributions were significantly different at the 1% level

f probability from the respective trait distributions of the entireollection. In the logarithmic strategy, the distribution of 10 traitsflower color, pod color, seed coat color, shattering score-early,eed mottling score, palmitic acid concentration, stearic acid con-

entration, oleic acid concentration, linoleic acid concentration,nd linolenic acid concentration) was not significantly differentrom the corresponding distribution in the entire collection. Theseesults were expected since we modified both the proportional andhe logarithmic strategies to over sample the smallest strata, the

able 4hi-square (�2) test results comparing the distributions within the five core collectionsultivariate logarithmic selection strategies to the distributions in the entire evaluated

raits.

Traits Random Proportional Mul

Flower color 5.7ns 0.7ns

Pod color 1.0ns 1.0ns 2Seed coat color 8.0ns 19.0ns 2Hilum color 9.5ns 18.2ns

Pubescence color 0.6ns 7.6ns 1Pubescence form 4.1ns 3.9ns 1Pubescence density 0.8ns 3.2ns 1Seed coat luster 4.6ns 9.4ns 2Plant growth habit 4.2ns 6.3*

Maturity group 6.9ns 16.2ns

Origin 24.6ns 22.7ns 1Flowering date 13.4ns 8.4ns

Maturity date 15.9ns 28.0ns

Plant height 18.7ns 18.7ns 3Lodging score 8.2ns 11.3ns 4Stem termination score 10.8ns 10.5ns 4Shattering score, early 11.5ns 12.0ns 3Shattering score, late 18.0ns 11.0ns 1Seed quality score 12.1ns 17.8ns 2Seed mottling score 3.3ns 8.5ns 1100 seed weight 17.1ns 6.7ns 5Seed yield 15.6ns 13.8ns 3Protein concentration 16.5ns 9.6ns 9Oil concentration 20.2ns 12.2ns 1,4Palmitic acid concentration 17.5ns 12.6ns 2Stearic acid concentration 15.2ns 11.7ns 1Oleic acid concentration 14.8ns 15.0ns 8Linoleic acid concentration 11.8ns 27.5ns 1,7Linolenic acid concentration 13.7ns 10.0ns 7

s = no-significant test at P ≥ 0.05 probability levels by �2 test.* Significant at the P ≤ 0.05 by �2 test.

** Significant at the P ≤ 0.01 by �2 test.

aihe; 1.3 – China – South; 1.4 – China – no provincial identification; 2 – Japan and

logarithmic strategy is already designed to sample more equallyfrom each strata regardless of the size (Brown, 1989b) and themultivariate strategies favored differences between accessions tosample within stratum for assembling the cores. The multivari-ate proportional and multivariate logarithmic strategies increasedthe frequency of the less represented classes, mainly the extremesof the distributions of each characteristic (some examples can beseen in Figs. 1–4). These results can be confirmed by the higher�2 values for the multivariate proportional and multivariate loga-

rithmic strategies shown in Table 4, supporting the idea that thesestrategies increased the frequency of the less represented classesfor most of the characteristics. A decrease in the trait distribu-tions kurtosis in the cores comparatively to those observed in therespective entire collections (data not shown) indicated that the

obtained by the random, proportional, logarithmic, multivariate proportional andUSDA Soybean Germplasm Collection for 11 qualitative traits and 18 quantitative

tivariate proportional Logarithmic Multivariate logarithmic

44.5** 7.8ns 33.2**

04.6** 3.9ns 136.9**

10.2** 25.3ns 137.8**

88.8** 46.0** 99.6**

52.3** 15.0** 122.8**

33.5** 20.7** 55.4**

50.3** 94.4** 101.0**

82.8** 38.5** 269.8**

60.8** 93.5** 127.6**

16.2ns 592.1** 592.1**

47.9** 579.4** 791.7**

70.4** 114.7** 134.7**

81.4** 213.4** 392.4**

23.4** 167.0** 381.4**

34.6** 78.2** 459.0**

00.0** 128.7** 475.5**

65.7** 27.3ns 328.5**

54.0** 47.7** 152.5**

30.5** 35.2** 242.2**

35.8** 21.1ns 87.2**

04.8** 62.6** 676.6**

20.1** 69.6** 422.2**

96.6** 41.3** 300.0**

43.3** 43.1** 501.7**

18.3** 24.8ns 186.9**

01.6** 21.4ns 75.2**

24.2** 23.8ns 757.7**

57.4** 29.4ns 315.7**

00.3** 22.7ns 572.5**

ps Research 119 (2010) 277–289 283

ls

3

trHwtasawotpt(smleae

3

cTtcsticpsells

4

tqst(fr(A1vsaeAeat

m

ssio

ns,

mea

ns

and

vari

ance

sof

the

18qu

anti

tati

vetr

aits

inth

efi

veco

reco

llec

tion

san

dth

eev

alu

ated

USD

ASo

ybea

nG

erm

pla

smC

olle

ctio

n.

Enti

reco

llec

tion

Ran

dom

Prop

orti

onal

Mu

ltiv

aria

tep

rop

orti

onal

Loga

rith

mic

Mu

ltiv

aria

telo

gari

thm

ic

Acc

essi

ons

Mea

nV

aria

nce

Acc

essi

ons

Mea

nV

aria

nce

Acc

essi

onM

ean

Var

ian

ceA

cces

sion

sM

ean

Var

ian

ceA

cces

sion

sM

ean

Var

ian

ceA

cces

sion

sM

ean

Var

ian

ce

e15

558

102.

624

7.4

1600

102.

5n

s24

9.1n

s16

0010

2.8n

s24

7.7n

s16

0010

2.7n

s27

2.6**

1600

103.

6*27

2.3**

1600

103.

1ns

298.

3**

1555

617

5.4

282.

915

9917

5.2

ns

282.

9ns

1600

175.

4ns

303.

0ns

1599

175.

1ns

335.

4**16

0017

4.6n

s39

8.7**

1600

173.

7**43

0.3**

1555

885

.583

8.3

1600

84.7

ns

814.

0ns

1600

86.4

ns

872.

1ns

1600

90.9

**14

16.3

**16

0093

.0**

1123

.5**

1600

92.5

**15

62.8

**

1455

52.

90.

914

992.

9n

s1.

0ns

1519

2.9n

s1.

0ns

1543

3.1**

1.4**

1542

3.0**

1.0**

1563

3.1**

1.4**

tion

scor

e13

573

2.3

1.4

1384

2.3

ns

1.4n

s14

082.

4ns

1.3n

s14

052.

7**1.

9**13

952.

7**1.

5*14

362.

8**1.

9**

re,e

arly

1429

41.

50.

514

521.

5n

s0.

5ns

1459

1.5n

s0.

5ns

1486

1.7**

1.1**

1378

1.5n

s0.

6**14

071.

7**1.

1**

re,l

ate

1107

22.

21.

311

242.

2n

s1.

2ns

1124

2.2n

s1.

3ns

1090

2.5**

1.7**

1084

2.2n

s1.

3ns

1091

2.5**

1.7**

core

1451

32.

50.

314

912.

5n

s0.

3ns

1501

2.5n

s0.

3ns

1480

2.6**

0.5**

1534

2.5n

s0.

3ns

1540

2.5*

0.4**

scor

e11

800

2.0

0.9

1196

2.0

ns

0.9n

s12

492.

0ns

0.9n

s10

382.

2**1.

3**12

552.

1ns

0.9n

s11

072.

1*1.

2**

ht

1555

415

.628

.516

0015

.5n

s29

.1n

s15

9915

.5n

s28

.3n

s15

9913

.9**

39.4

**16

0014

.7**

24.8

**15

9813

.9**

36.5

**

1553

01.

960.

5015

971.

95n

s0.

50n

s15

941.

94n

s0.

51n

s15

921.

7**0.

6**15

981.

86**

0.6**

1594

1.7**

0.6**

ntr

atio

n15

523

44.3

7.7

1596

44.1

ns

8.1n

s15

9244

.3n

s7.

6ns

1592

44.3

ns

12.8

**15

9244

.4*

9.0**

1591

44.5

**12

.5**

ion

1552

317

.84.

015

9617

.9n

s4.

3ns

1592

17.8

ns

4.1n

s15

9217

.3**

7.1**

1592

17.7

**4.

7**15

9117

.3**

7.1**

con

cen

trat

ion

1455

111

.91.

014

9212

.0n

s1.

0ns

1499

12.0

ns

1.0n

s14

6212

.1**

1.7**

1499

11.9

ns

1.1**

1479

12.0

**1.

6**

nce

ntr

atio

n14

551

3.4

0.4

1492

3.4

ns

0.4n

s14

993.

4ns

0.4n

s14

623.

4**0.

6**14

993.

4*0.

4n

s14

793.

4**0.

5**

cen

trat

ion

1455

122

.413

.614

9222

.4n

s13

.1n

s14

9922

.6n

s13

.8n

s14

6222

.5n

s32

.4**

1499

22.7

**14

.8*

1479

22.7

ns

31.7

**

once

ntr

atio

n15

541

53.6

13.2

1598

53.6

ns

12.5

ns

1595

53.5

ns

12.6

ns

1592

52.9

**23

.8**

1595

53.3

**13

.3n

s15

9352

.9**

23.6

**

con

cen

trat

ion

1554

18.

02.

215

988.

0n

s2.

2ns

1595

8.0n

s2.

2ns

1592

8.3**

4.6**

1595

8.0n

s2.

4*15

938.

3**4.

6**

nt

test

atP

≥0.

05p

roba

bili

tyle

vels

byt-

test

betw

een

sele

cted

sam

ple

and

the

enti

reco

llec

tion

.t

the

P≤

0.05

byt-

test

betw

een

sele

cted

sam

ple

and

the

enti

reco

llec

tion

.t

the

P≤

0.01

byt-

test

betw

een

sele

cted

sam

ple

and

the

enti

reco

llec

tion

.

M.F. Oliveira et al. / Field Cro

ess frequent categories were favored by the logarithmic samplingtrategies.

.2.2. Comparison of means and variancesThe 18 trait means and variances of the cores derived by

he random and proportional strategies did not differ from theespective means and variances of the entire collection (Table 5).owever, the logarithmic core collection trait means and variancesere predominantly dissimilar from those of the entire collec-

ion with only five and seven traits showing variance homogeneitynd non-significant mean differences, respectively (Table 5). Dis-imilar trait means and variances between the core collectionsnd the corresponding entire collection were also predominanthenever the selection of accessions within strata was diversity

riented based on multivariate analyses of the 18 quantitativeraits. The means of 14 and 16 traits differed for the multivariateroportional and multivariate logarithmic core collections, respec-ively, from their corresponding means in the entire collectionTable 5), but in most cases the actual differences could be con-idered small for practical purposes. The trait variances in theultivariate proportional and multivariate logarithmic core col-

ections were invariably larger than the respective values of thentire collections, which indicated that these strategies were prob-bly effective in preserving the original genetic variability of thentire collection.

.2.3. Comparison of phenotypic correlationCorrelations among all 18 quantitative traits in each of the five

ores and in the entire collections were obtained (data not shown).he degree of similarity between the correlation coefficients inhe core collections and their corresponding values in the entireollection was very high. The random and proportional samplingtrategies generated core collection with all correlations betweenraits not significantly different (P ≤ 0.05) from those correspond-ng values in the entire collection. Similarly 78%, 78% and 73% of theorrelations between traits in the logarithmic, multivariate pro-ortional, and multivariate logarithmic core collections were notignificantly different (P ≤ 0.05) from the corresponding ones in thentire collection. These results indicated that the phenotypic corre-ations between traits observed in the entire collection were mostikely preserved in the core collections regardless of the samplingtrategy.

. Discussion

Although random and proportional sampling were very effec-ive in maintaining data distributions, means and variances ofuantitative traits; random, proportional and logarithmic samplingtrategies retained the smallest percentages of the original dis-ribution ranges with averages of 84%, 89% and 89%, respectivelyTable 6). The smallest retention percentage values were observedor stearic acid concentration in the core collection obtained by theandom strategy (49%) and for flowering date in random strategy65%), proportional strategy (66%) and logarithmic strategy (69%).ll strategies included 100% of the ranges for the traits scored–5, which would be expected. Core collections obtained by multi-ariate proportional and multivariate logarithmic strategies wereuperior in maintaining the largest range of variation. The largestverage retention index was for multivariate logarithmic strat-gy (99%) with individual trait indexes ranging from 93% to 100%.ccording to Frankel and Brown (1984), the sampling strategy is

fficient when the constructed core collection retains, in average,t least 80% of the original trait range. All five sampling strategiesested exceeded that threshold.

Core collection construction is a sampling strategy that aims ataximizing retention of alleles available in the entire collection Ta

ble

5N

um

ber

ofac

ce

Trai

ts

Flow

erin

gd

atM

atu

rity

dat

ePl

ant

hei

ght

Lod

gin

gsc

ore

Stem

term

ina

Shat

teri

ng

sco

Shat

teri

ng

sco

Seed

qual

ity

sSe

edm

ottl

ing

100

seed

wei

gSe

edyi

eld

Prot

ein

con

ceO

ilco

nce

ntr

atPa

lmit

icac

idSt

eari

cac

idco

Ole

icac

idco

nLi

nol

eic

acid

cLi

nol

enic

acid

ns

=n

o-si

gnifi

ca*

Sign

ifica

nt

a**

Sign

ifica

nt

a

284 M.F. Oliveira et al. / Field Crops Research 119 (2010) 277–289

Fig. 1. Frequency distribution for the days to flowering characteristic for the entire collection and for the five core collections obtained using the random (R), proportional(P), multivariate proportional (MVP), logarithmic (L) and multivariate logarithmic (MVL) sampling strategies.

M.F. Oliveira et al. / Field Crops Research 119 (2010) 277–289 285

Fig. 2. Frequency distribution for the plant height characteristic for the entire collection and for the five core collections obtained using the random (R), proportional (P),multivariate proportional (MVP), logarithmic (L) and multivariate logarithmic (MVL) sampling strategies.

286 M.F. Oliveira et al. / Field Crops Research 119 (2010) 277–289

F ctionm L) sam

wkFt

ig. 3. Frequency distribution for the seed yield characteristic for the entire colleultivariate proportional (MVP), logarithmic (L) and multivariate logarithmic (MV

ith a minimum of redundancy. It does not propose, however, toeep all alleles of the entire in the core collection (Allard, 1992;rankel and Brown, 1984; Frankel et al., 1995). In practice, it meanso reduce the frequency of accessions from large strata and increase

and for the five core collections obtained using the random (R), proportional (P),pling strategies.

the frequency of accessions from smaller strata to maintain thelargest variability possible in a minimal number of accessions toincrease the efficiency of breeders and other scientists in identi-fying useful genetic diversity. All selected cores were considered

M.F. Oliveira et al. / Field Crops Research 119 (2010) 277–289 287

F llectiom ) sam

ancMo

ig. 4. Frequency distribution for the oil percentage characteristic for the entire coultivariate proportional (MVP), logarithmic (L) and multivariate logarithmic (MVL

cceptable because more than 70% of the means and ranges didot differ significantly from the corresponding values in the entireollection (Diwan et al., 1995; Skinner et al., 1999; Hu et al., 2000;alosetti and Abadie, 2001), but not all cores may be considered

ptimal.

n and for the five core collections obtained using the random (R), proportional (P),pling strategies.

As already mentioned, the purely statistical concept of pop-ulation sampling may not apply to describe the core collectionquality, since it may simply imply similarity of population distri-bution, but not of total variability. Random sampling generated acore collection which best represented the entire collection when

288 M.F. Oliveira et al. / Field Crops ReTa

ble

6R

ange

,ran

gere

ten

tion

ind

ex(R

R)

and

aver

age

ran

gere

ten

tion

ind

ex(R

R)

ofth

eba

sean

dfi

veco

reco

llec

tion

sas

sem

bled

byra

nd

om,p

rop

orti

onal

,log

arit

hm

ic,m

ult

ivar

iate

pro

por

tion

al,a

nd

mu

ltiv

aria

telo

gari

thm

icsa

mp

lin

gst

rate

gies

for

the

USD

ASo

ybea

nG

erm

pla

smC

olle

ctio

n.

Trai

tsEn

tire

coll

ecti

onR

and

omPr

opor

tion

alM

ult

ivar

iate

pro

por

tion

alLo

gari

thm

icM

ult

ivar

iate

loga

rith

mic

Ran

geR

ange

RR

(%)

Ran

geR

R(%

)R

ange

RR

(%)

Ran

geR

R(%

)R

ange

RR

(%)

Flow

erin

gd

ate

126

8265

.183

65.9

125

99.2

8769

.012

599

.2M

atu

rity

dat

e11

192

82.9

9282

.911

110

0.0

110

99.1

111

100.

0Pl

ant

hei

ght

286

210.

573

.620

7.5

72.6

286

100.

028

610

0.0

286

100.

0Lo

dgi

ng

scor

e4

410

0.0

410

0.0

410

0.0

410

0.0

410

0.0

Stem

term

inat

ion

scor

e4

410

0.0

410

0.0

410

0.0

410

0.0

410

0.0

Shat

teri

ng

scor

e,ea

rly

44

100.

04

100.

04

100.

04

100.

04

100.

0Sh

atte

rin

gsc

ore,

late

44

100.

04

100.

04

100.

04

100.

04

100.

0Se

edqu

alit

ysc

ore

44

100.

04

100.

04

100.

04

100.

04

100.

0Se

edm

ottl

ing

scor

e4

410

0.0

410

0.0

410

0.0

410

0.0

410

0.0

100

seed

wei

ght

37.6

35.2

93.7

35.0

93.1

36.2

96.3

31.5

83.8

35.9

95.5

Seed

yiel

d4.

23.

991

.93.

994

.03.

789

.34

95.5

3.9

93.1

Prot

ein

con

cen

trat

ion

26.2

18.8

71.8

20.0

76.4

24.6

94.0

24.3

92.8

26.2

100.

0O

ilco

nce

ntr

atio

n17

.315

.489

.015

.086

.715

.891

.313

.377

.117

.310

0.0

Palm

itic

acid

con

cen

trat

ion

10.4

8.1

78.0

8.7

83.7

10.4

100.

08.

178

.010

.096

.4St

eari

cac

idco

nce

ntr

atio

n9.

54.

749

.39.

196

.09.

510

0.0

6.8

71.7

9.5

100.

0O

leic

acid

con

cen

trat

ion

39.2

27.4

69.9

29.3

74.7

39.2

100.

031

.680

.639

.210

0.0

Lin

olei

cac

idco

nce

ntr

atio

n34

.727

.278

.434

.710

0.0

32.9

94.9

25.6

73.8

34.1

98.3

Lin

olen

icac

idco

nce

ntr

atio

n14

.710

.168

.712

.182

.214

.710

0.0

11.5

78.2

14.5

98.6

RR

84.0

89.4

98.1

88.9

99.0

search 119 (2010) 277–289

only statistical parameters (means, variances, symmetry and kurto-sis) were considered. Accessions with small representations in theentire collection, however, were likely excluded in the samplingprocedure, resulting in narrower trait ranges and loss of geneticdiversity comparatively to the entire collection. Accession redun-dancy in the most populated strata may have also occurred.

Brown (1989a), Diwan et al. (1995), Erskine and Muehlbauer(1991), and Schoen and Brown (1995) concluded that effectivestratified sampling of the entire collection will most likely improvethe core collection traits, diminish accession redundancy and retainlarger proportions of the original variability. In our study, themultivariate proportional and multivariate logarithmic samplingprocedures were superior to the others because they selected themost divergent accessions within each entire collection stratum.The variance increase was most likely the result of the preferen-tial selection of the least frequent accessions, which also helpedto diminish the distributions kurtosis in the cores compared tothose of the entire collection (graphs not shown). The multivariateselection strategy should be favored as a within stratum sam-pling strategy independent of whether the strata are large or small,whenever quantitative data are available. It is most effective whenthe variability changes with the strata (Brown, 1989b). Accord-ing to Brown (1989a), logarithmic selection would be preferredwhen the between and within strata quantitative variation areunknown.

Based on the validation data, the multivariate proportional sam-pling strategy was selected as the best procedure for producing asoybean core collection. This is consistent with the recommenda-tion of Brown (1989a) when the entire collection can be effectivelydivided into strata. The list of soybean accessions included in thecore collection is available from the corresponding author and canbe accessed through the Germplasm Resources Information Net-work website at http://www.ars-grin.gov/npgs/.

5. Conclusion

Four core collections assembled based on stratified samplingstrategies conserved more accession variability comparatively tothe random sampling derived core, and the sampling strategiesusing multivariate analysis (inverted Tocher) technique maximizedthe accession variability in the cores comparatively to the otherstrategies.

Acknowledgments

The authors thank The United States Department of Agriculture(USDA) and staff for accepting me as a visiting researcher duringthe doctorate course. Special thanks to Dr. Randall L. Nelson for hiscollaboration, support and availability at every moment.

References

Allard, R.W., 1992. Predictive methods for germplasm identification. In: Stalker, H.T.,Murphy, J.P. (Eds.), Plant Breeding in the 1990s. Oxon, Wallingford, pp. 119–146.

Brown, A.H.D., 1989a. Core collection: a practical approach to genetic resourcesmanagement. Genome 31, 818–824.

Brown, A.H.D., 1989b. The case for core collections. In: Brown, A.H.D., Frankel, O.H.,Marshall, D.R., Williams, J.T. (Eds.), The Use of Plant Genetic Resources. Univer-sity Press Cambridge, Cambridge, pp. 136–156.

Brown, A.H.D., 1995. The core collection at the crossroads. In: Hodgkin, T., Brown,A.H.D., van Hintum, T.J.L., Morales, E.A.V. (Eds.), Core Collections of Plant GeneticResources. International Plant Genetic Resources Institute (IPGRI). John Wiley &Sons, New York, pp. 3–20.

Brown, A.H.D., Grace, J.P., Speer, S.S., 1987. Designation of a core collection of peren-nial Glycine. Soybean Genet. Newsl. 14, 59–70.

Brown, A.H.D., Spillane, C., 1999. Implementing core collections—principles, pro-cedures, progress, problems and promise. In: Johnson, R.C., Hodgkin, T. (Eds.),Core Collections for Today and Tomorrow. International Plant Genetic ResourcesInstitute, Rome, pp. 1–9.

ps Re

B

C

C

C

C

C

C

C

D

E

F

F

F

F

G

G

H

v

H

H

M.F. Oliveira et al. / Field Cro

rown-Guedira, G.L., Thompson, J.A., Nelson, R.L., Warburton, M.L., 2000. Evalua-tion of genetic diversity of soybean introductions and North American ancestorsusing RAPD and SSR markers. Crop Sci. 40, 815–823.

arlson, J.B., Lersten, N.R., 2004. Reproductive morphology. In: Boerma, H.R., Specht,J.E. (Eds.), Soybeans: Improvement, Production and Uses, vol. 16, 3rd ed. Amer-ican Society of Agronomy, Madison, pp. 59–95.

arter, T.E., Nelson, R.L., Sneller, C.H., Cui, Z., 2004. Genetic diversity in soybean. In:Boerma, H.R., Specht, J.E. (Eds.), Soybeans: Improvement, Production and Uses,vol. 16, 3rd ed. American Society of Agronomy, Madison, pp. 303–416.

hang, R.Z., Qiu, J., Sun, J., Chen, Y., Li, X., Xu, Z., 1999. Collection and conservationof soybean germplasm in China. In: Proc. World Soybean Research ConferenceVI, Chicago, IL, 4–7 August 1999. National Soybean Research Lab, Urbana, pp.172–176.

harmet, G., Balfourier, F., 1995. The use of geostatistics for sampling a core collec-tion of perennial ryegrass populations. Genet. Resour. Crop Evol. 42, 303–309.

rossa, J., Basford, K., Taba, S., DeLacy, I., Silva, E., 1995. Three-mode analyses of maizeusing morphological and agronomic attributes measured in multilocational tri-als. Crop Sci. 35, 1483–1491.

ruz, C.D., 1997. Programa Genes: Aplicativo Computacional em Genética e Estatís-tica. Editora UFV, Vicosa.

ruz, C.D., 2006. Programa Genes: Análise Multivariada e Simulacão. Editora UFV,Vicosa.

iwan, N., McIntosh, M.S., Bauchan, G.R., 1995. Methods of developing a core col-lection of annual Medicago species. Theor. Appl. Genet. 90, 755–761.

rskine, W., Muehlbauer, F.J., 1991. Allozyme and morphological variability, out-crossing rate and core collection formation in lentil germplasm. Theor. Appl.Genet. 83, 119–125.

rankel, O.H., Bennett, E., 1970. Genetic Resources in Plants: Their Exploration andConservation. Blackwell Scientific Publications, Oxford.

rankel, O.H., Brown, A.H.D., 1984. Plant genetic resources today: a critical appraisal.In: Holden, J.H.W., Williams, J.T. (Eds.), Crop Genetic Resources: Conservationand Evaluation. Allen and Unwin, Winchester, pp. 249–257.

rankel, O.H., Brown, A.H.D., Burdon, J.J., 1995. The Conservation of Plant Biodiver-sity. Cambrigde University Press, Cambrigde.

rankel, O.H., Soulé, M.E., 1981. Conservation and Evolution. Cambridge UniversityPress, Cambridge.

ai, J.Y., Xu, D.H., Gao, Z., Shimamoto, Y., Abe, J., Fukushi, H., Kitajima, S., 2000. Studieson the evolutionary relationship among eco-types of G. max and G. soja in China.Acta Agron. Sin. 26, 513–520.

riffin, J.D., Palmer, R.G., 1995. Variability of thirteen isozyme loci in the USDAsoybean germplasm collections. Crop Sci. 35, 897–904.

ill, J.H., Peregrine, E.K., Sprau, G.L., Cremeens, C.R., Nelson, R.L., Kenty, M.M., Kilen,T.C., Thomas, D.A., 2005. Evaluation of the USDA Soybean Germplasm Collec-tion: Maturity Groups 000 to IV (PI 507670 to PI 574486). US Department ofAgriculture Technical Bulletin No. 1914.

an Hintum, T.J.L., Brown, A.H.D., Spillane, C., Hodgkin, T., 2000. Core Collection ofPlant Genetic Resources. International Plant Genetic Resources Institute (IPGRI),

Rome.

olden, J.H.W., 1984. The second ten years. In: Holden, J.H.W., Williams, J.T. (Eds.),Crop Genetic Resources: Conservation and Evaluation. Allen and Unwin, Winch-ester, pp. 277–285.

olbrook, C.C., Anderson, W.F., Pittman, R.N., 1993. Selection of a core collectionfrom the US germplasm collection of peanut. Crop Sci. 33, 859–861.

search 119 (2010) 277–289 289

Hu, J., Zhu, J., Xu, H.M., 2000. Methods of constructing core collections by stepwiseclustering with three sampling strategies based on the genotypic values of crops.Theor. Appl. Genet. 101, 264–268.

Hymowitz, T., Kaizuma, N., 1981. Soybean seed protein electrophoresis profiles from15 Asian countries or regions: hypotheses on paths of dissemination of soybeansfrom China. Econ. Bot. 35, 10–23.

Kihara, H., 1969. History of biology and other sciences in Japan in retrospect. In:Oshima, C. (Ed.), Proc. XII Int. Congr. Genetics, Tokyo, 19–28 August 1968, vol. 3.Science Council of Japan, Tokyo, pp. 49–70.

Li, Z., Nelson, R.L., 2001. Genetic diversity among soybean accessions from threecountries measured by RAPDs. Crop Sci. 41, 1337–1347.

Malosetti, M., Abadie, T., 2001. Sampling strategy to develop a core collection ofUruguayan maize landraces based on morphological traits. Genet. Resour. CropEvol. 48, 381–390.

Marshall, D.R., 1989. Limitations to the use of germplasm collections. In: Brown,A.H.D., Frankel, O.H., Marshall, D.R., Williams, J.T. (Eds.), The Use of Plant GeneticResources. University Press Cambridge, Cambridge, pp. 105–120.

Peeters, J.P., Galwey, N.W., 1988. Germplasm collections and breeding needs inEurope. Econ. Bot. 42, 503–521.

Perry, M.C., McIntosh, M.S., 1991. Geographical patterns of variation in USDA soy-bean germplasm collection. I. Morphological traits. Crop Sci. 31, 1350–1355.

Radovic, G., Jelovac, D., 1994. The possible approach in maize core collection devel-opment, in. Genetic Resources section meeting of Eucarpia. In: Evaluation andExploitation of Genetic Resources: Pre-breeding. Proceedings—Versailles. Insti-tut National de la Recherche Agronomique, France, pp. 109–115.

Rao, R.C., 1952. Advanced Statistical Methods in Biometric Research. John Wiley &Sons, New York.

Schoen, D.J., Brown, A.H.D., 1995. Maximising genetic diversity in core collections ofwild relatives of crop species. In: Hodgkin, T., Brown, A.H.D., van Hintum, T.J.L.,Morales, E.A.V. (Eds.), Core Collections of Plant Genetic Resources. InternationalPlant Genetic Resources Institute (IPGRI). John Wiley & Sons, New York, pp.55–76.

Skinner, D.Z., Bauchan, G., Auricht, G., Hughes, S., 1999. Developing a core collectionfrom a large annual Medicago germplasm collection. In: Johnson, R.C., Hodgkin,T. (Eds.), Core Collections for Today and Tomorrow. International Plant GeneticResources Institute, Rome, pp. 61–67.

Snedecor, G.W., Cochran, W.G., 1980. Statistical Methods, 7th ed. Iowa State Univer-sity Press, Ames.

Sokal, R.R., Rohlf, F.J., 1981. Biometry: The Principles and Practice of Statistics inBiological Research. WH Freeman, Oxford.

Spagnoletti-Zeuli, P.L., Qualset, C.O., 1993. Evaluation of five strategies for obtaininga core subset from a large genetic resource collection of durum wheat. Theor.Appl. Genet. 87, 295–304.

Sturges, H.A., 1926. The choice of a class interval. J. Am. Stat. Assoc. 21, 65–66.Vasconcelos, E.S., Cruz, C.D., Bhering, L.L., Ferreira, A., 2007. Estratégias de

amostragem e estabelecimento de colecões nucleares. Pesq. Agropec. Bras. 42,507–514.

Yates, F., 1934. Contigency table involving small numbers and the �2 test. J. Roy.Stat. Soc. Suppl. 1, 217–235.

Zhang, B., Qiu, L., Chang, R., 2003. Advance on genetic diversity and core collectionestablishment for soybean. Cult. Crops 3, 46–48.

Zhou, X.A., Peng, Y.H., Wang, G.X., Chang, R.Z., 1998. The study of sort searches ofcultivated cultivars in China. China Seed Ind. 1, 1–4 (in Chinese).