Training, Validation, and Target Populations

Mark Thallman, Kristina Weber, Larry Kuehn, Warren Snelling, John Keele, Gary Bennett, and John Pollak

Uploaded by: claire-gray

Posted on 23-Dec-2015




50,000 Markers on a Chip (50K Chip)

Illumina Infinium Bovine BeadChip

~50,000 SNP markers across the bovine genome

- High resolution (1 SNP per 60,000 base pairs)

- Multiple breeds used for SNP discovery

BARC (ARS), USMARC, University of Missouri, University of Alberta


Populations Involved in Genomic Predictions of Economically Important Traits

Discovery (Training)

Validation

Target (Application)

r0, r1: Degree of genetic relationship between populations (ideally similar)
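The "degree of genetic relationship" between populations can be quantified directly from SNP genotypes. The sketch below shows one common choice, not necessarily the measure used by the authors: it builds a genomic relationship matrix with VanRaden's first method, G = ZZ'/(2 sum p(1 - p)), where Z holds 0/1/2 allele counts centred by 2p, and then averages the block of G that links two populations. The genotypes are simulated (unlinked SNPs, known base allele frequencies), and the function names are hypothetical.

```python
import numpy as np

def vanraden_g(genotypes, p=None):
    """Genomic relationship matrix (VanRaden method 1) from 0/1/2 allele counts.

    genotypes: (n_animals, n_snps) array. p: base allele frequencies; if None,
    they are estimated from the genotypes supplied.
    """
    x = np.asarray(genotypes, dtype=float)
    if p is None:
        p = x.mean(axis=0) / 2.0
    p = np.asarray(p, dtype=float)
    keep = (p > 0.0) & (p < 1.0)                 # skip monomorphic SNPs
    z = x[:, keep] - 2.0 * p[keep]               # centre by twice the allele frequency
    scale = 2.0 * np.sum(p[keep] * (1.0 - p[keep]))
    return z @ z.T / scale

def mean_cross_relationship(train, target, p=None):
    """Average genomic relationship between each training and each target animal."""
    g = vanraden_g(np.vstack([train, target]), p)
    n_train = len(train)
    return float(g[:n_train, n_train:].mean())

# Simulated example (hypothetical data): unlinked SNPs, known base frequencies.
rng = np.random.default_rng(1)
n_snp = 3000
p_base = rng.uniform(0.1, 0.9, n_snp)
train = rng.binomial(2, p_base, size=(10, n_snp))
unrelated = rng.binomial(2, p_base, size=(20, n_snp))

def gamete(parent):
    # Each SNP transmits the counted allele with probability count / 2.
    return rng.binomial(1, parent / 2.0)

# Offspring of two distinct training animals each.
offspring = np.array([
    gamete(train[i]) + gamete(train[j])
    for i, j in (rng.choice(10, size=2, replace=False) for _ in range(20))
])

r_related = mean_cross_relationship(train, offspring, p_base)
r_unrelated = mean_cross_relationship(train, unrelated, p_base)
print(f"offspring of training animals: {r_related:.3f}")   # expected near 0.1
print(f"unrelated animals:             {r_unrelated:.3f}")  # expected near 0
```

In this setup each offspring has two parents among ten training animals, each at a relationship of 0.5, so the cross-population average is about 0.1; an unrelated sample averages about zero.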


3 Fundamental Types of Discovery Populations

• Purebreds of a Single Breed

• Purebreds of Multiple Breeds

• Crossbreds


2 Fundamental Types of Discovery Data

• AI Sires with High Accuracy EPDs

• Individuals with Own Phenotypes


Two Resource Populations at USMARC

USMARC Cycle VII

USMARC Ongoing GPE

2,000 Bull Project

AI Sires: AN, HH, AR, SM, CH, LM, GV

Base Cows: AN, HH, MARC III

F1 Cows, F1 Steers

F1² Cows, F1² Steers

F1 Bulls: AN & HH only


2,000 Bull Project

• Collaborative effort
  - Researchers
  - Breed associations

• Breed associations provided semen for DNA on influential sires

• USMARC ran the 50K SNP chip on those 2,000 sires

• USMARC provides extensively phenotyped animals for use as a training data set


2,000 Bull Project: Number of Sires Sampled

• Angus: 402
• Hereford: 317
• Simmental: 253
• Red Angus: 173
• Gelbvieh: 136
• Limousin: 131
• Charolais: 125
• Shorthorn: 86
• Brangus: 68
• Beefmaster: 64
• Maine-Anjou: 59
• Brahman: 53
• Chiangus: 47
• Santa Gertrudis: 43
• Salers: 42
• Braunvieh: 27

Total: 2,026
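The counts can be checked against the slide total with a few lines of arithmetic, and the per-breed shares show how strongly the sample is weighted toward the largest breeds. Pairing each count with the breed listed in the same position is an assumption based on the slide layout.

```python
# Sire counts from the slide; pairing counts with breeds in listing order
# is an assumption based on the slide layout.
sires_sampled = {
    "Angus": 402, "Hereford": 317, "Simmental": 253, "Red Angus": 173,
    "Gelbvieh": 136, "Limousin": 131, "Charolais": 125, "Shorthorn": 86,
    "Brangus": 68, "Beefmaster": 64, "Maine-Anjou": 59, "Brahman": 53,
    "Chiangus": 47, "Santa Gertrudis": 43, "Salers": 42, "Braunvieh": 27,
}

total = sum(sires_sampled.values())
print(f"Total sires: {total:,}")  # Total sires: 2,026

# Share of the sample contributed by each breed.
for breed, count in sires_sampled.items():
    print(f"{breed:<16}{count:>5}{100 * count / total:>7.1f}%")
```

The eight smallest counts together (403 sires) are about the same size as the Angus sample alone (402), which illustrates how thin the training data are for the less numerous breeds.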


Deregression of EPDs

• An approach used to scale EPDs so that high- and low-accuracy animals can be included in the same analysis
  - Genetic variances are the same regardless of accuracy
  - Residual variances are heterogeneous, depending on the accuracy of the animal

• Allows use of EPDs as phenotypes in genomic analyses

• AI Sires with High Accuracy EPDs
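A minimal sketch of the idea, assuming the simplest form of deregression: each EPD is divided by its reliability r^2. That puts the genetic part of every animal on a common scale and leaves a residual variance of (1 - r^2)/r^2 times the genetic variance, so low-accuracy animals carry larger residuals. Procedures used in practice also remove parent-average information first, which this sketch does not. Beef EPDs are usually published with BIF accuracy, 1 - sqrt(1 - r^2), rather than r^2, so a conversion helper is included; the function names and the simulated data are hypothetical.

```python
import numpy as np

def bif_accuracy_to_reliability(acc):
    """Convert BIF accuracy, defined as 1 - sqrt(1 - r^2), back to reliability r^2."""
    acc = np.asarray(acc, dtype=float)
    return 1.0 - (1.0 - acc) ** 2

def deregress(epd, reliability):
    """Simplest deregression: divide each EPD by its reliability.

    Returns (deregressed values, residual-variance factor), where the factor
    (1 - r^2) / r^2 is the residual variance in units of the genetic variance.
    Parent-average information is NOT removed here.
    """
    rel = np.asarray(reliability, dtype=float)
    return np.asarray(epd, dtype=float) / rel, (1.0 - rel) / rel

# Hypothetical check: simulate EPDs at a high and a low reliability.
rng = np.random.default_rng(7)
n = 20_000
results = {}
for rel in (0.9, 0.3):
    t = rng.normal(0.0, 1.0, n)                                # true values, variance 1
    info = t + rng.normal(0.0, np.sqrt((1.0 - rel) / rel), n)  # data summary with reliability rel
    epd = rel * info                                           # BLUP-style shrunken EPD
    drp, factor = deregress(epd, rel)
    slope = np.cov(drp, t)[0, 1] / np.var(t, ddof=1)           # genetic part: slope near 1
    resid_var = np.var(drp - t)                                # grows as reliability falls
    results[rel] = (slope, resid_var, float(factor))
    print(f"r^2={rel}: slope={slope:.2f}, residual var={resid_var:.2f}, "
          f"expected={float(factor):.2f}")
```

In the simulation the slope of deregressed values on true values stays close to 1 at both reliabilities, while the residual variance is about 0.11 at r^2 = 0.9 and about 2.3 at r^2 = 0.3, which is the pattern the bullets describe. A genomic analysis would then typically weight each record inversely to its residual factor.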


Goals of the 2,000 Bull Project

• Demonstrate feasibility and understand challenges of applying whole-genome selection in beef cattle.

• Provide prediction equations for general use

• Provide genomic predictions for the bulls in the project


Training Data: GPE Cycle VII Population

AI Sires: AN, HH, AR, SM, CH, LM, GV

Base Cows: AN, HH, MARC III

F1 Cows, F1 Steers

F1² Cows, F1² Steers

F1 Bulls: AN & HH only


Training Data: GPE Ongoing Continuous Sampling

AI Sires: AN, HH, SM, CH, AR, LM, GV, SH, BN, BM, MA, BR, CI, SG, SA, BV

Dams: AN, HH, CH, SM, MARC III, Cycle VII F1

F1 & BC Heifers, F1 & BC Steers

F1 Bulls: AN, HH, CH, SM


Cross-validation

USMARC Cycle VII

USMARC Ongoing GPE

2,000 Bull Project

AI Sires: AN, HH, AR, SM, CH, LM, GV

Base Cows: AN, HH, MARC III

F1 Cows, F1 Steers

F1² Cows, F1² Steers

F1 Bulls: AN & HH only

Validation Training

Training Validation
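The two-way design, in which each population serves once as training and once as validation, can be sketched on simulated data. This is a hedged illustration: it uses ridge regression on SNPs (a SNP-BLUP-style model) as a stand-in for whatever model the authors fitted, two simulated populations labelled A and B in place of the USMARC resources, and unlinked SNPs. Because the data are simulated, predictions can be correlated with true genetic values; with real data that correlation has to be estimated, for example as a genetic correlation. All names and parameter values are hypothetical.

```python
import numpy as np

def fit_snp_effects(x, y, lam):
    """Ridge (SNP-BLUP-style) solution of (Xc'Xc + lam*I) b = Xc'(y - mean(y))."""
    xc = x - x.mean(axis=0)                          # centre genotypes within training set
    lhs = xc.T @ xc + lam * np.eye(x.shape[1])
    return np.linalg.solve(lhs, xc.T @ (y - y.mean()))

def predict(x_train, b, x_new):
    """Molecular breeding values for new animals, centred on training means."""
    return (x_new - x_train.mean(axis=0)) @ b

# Hypothetical simulation: two populations from the same base frequencies,
# unlinked SNPs, 50 causal loci among 500 SNPs, heritability 0.5.
rng = np.random.default_rng(3)
n_snp, n_qtl, h2 = 500, 50, 0.5
p = rng.uniform(0.1, 0.9, n_snp)
effects = np.zeros(n_snp)
effects[rng.choice(n_snp, n_qtl, replace=False)] = rng.normal(0.0, 1.0, n_qtl)
var_g = 2.0 * np.sum(p * (1.0 - p) * effects ** 2)   # expected genetic variance
lam = (1.0 - h2) / h2 * 2.0 * np.sum(p * (1.0 - p))  # sigma_e^2 / sigma_b^2

def simulate(n):
    x = rng.binomial(2, p, size=(n, n_snp)).astype(float)
    g = x @ effects                                   # true genetic values
    y = g + rng.normal(0.0, np.sqrt(var_g * (1.0 - h2) / h2), n)
    return x, g, y

pops = {"A": simulate(600), "B": simulate(600)}

# Two folds: train on one population, validate on the other, then swap roles.
cv = {}
for train, valid in (("A", "B"), ("B", "A")):
    x_t, _, y_t = pops[train]
    x_v, g_v, _ = pops[valid]
    mbv = predict(x_t, fit_snp_effects(x_t, y_t, lam), x_v)
    cv[(train, valid)] = np.corrcoef(mbv, g_v)[0, 1]

for (train, valid), r in cv.items():
    print(f"train {train} -> validate {valid}: r = {r:.2f}, r^2 = {r * r:.2f}")
```

Squaring each correlation gives a rough analogue of the proportion of genetic variance explained, the second summary reported for cross-validation.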


Genetic Correlations in Cross-validation

*2,000 bull predictions excluded the sires of the MARC validation populations


Proportion of Genetic Variance Explained in Cross-validation

*2,000 bull predictions excluded the sires of the MARC validation populations


Proportion of Variation in Weight Traits Projected from Training on 2,000 Bulls

*Full = Predictions of sires including the 2,000 bulls

*Reduced = Predictions excluding the 2,000 bulls


What Can We Do to Improve Prediction Accuracy?

• Add more phenotypes (animals)

• Increase marker density

• Incorporate individual animal DNA sequence on influential sires

• Do a better job of using the information we have (better statistical analyses)


We need to find ways to use information from all available discovery populations, regardless of the target breed(s).
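Why more phenotypes help, and why pooling populations is not free, can be illustrated with the deterministic approximation of Daetwyler and colleagues, r = sqrt(N h^2 / (N h^2 + M_e)), where N is the number of training records, h^2 their heritability (roughly, their reliability when deregressed EPDs are used), and M_e the number of independent chromosome segments. The presentation does not use this formula; it is a swapped-in textbook approximation, and the inputs below are made up for illustration.

```python
import math

def expected_accuracy(n_records: int, h2: float, m_e: float) -> float:
    """Deterministic approximation r = sqrt(N h^2 / (N h^2 + M_e)).

    n_records: training animals with phenotypes (or deregressed EPDs)
    h2: heritability (or reliability) of those training records
    m_e: number of independent chromosome segments in the population
    """
    nh2 = n_records * h2
    return math.sqrt(nh2 / (nh2 + m_e))

# Hypothetical inputs chosen only for illustration.
for n in (1_000, 2_000, 4_000, 8_000):
    print(n, round(expected_accuracy(n, h2=0.3, m_e=1_000), 2))
```

Under these made-up inputs, doubling from 2,000 to 4,000 records raises the expected accuracy from about 0.61 to about 0.74, with diminishing returns beyond that. A larger M_e, as might apply when breeds are pooled, lowers accuracy at the same N, so the gain from adding populations depends on how much they share.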


770K Chip

• Illumina BovineHD

• 770,000 Single Nucleotide Polymorphisms (SNP)

• Much higher SNP density than the 50K chip

• Should allow predictions to be less breed-specific

• We are in the process of having this new chip run on > 300 sires that have substantial progeny at USMARC
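The density comparison can be checked with simple arithmetic: average spacing is genome length divided by the number of SNPs. This sketch assumes a genome length of 3.0 × 10^9 bp, the value that reproduces the 50K chip's "1 SNP per 60,000 base pairs", and even spacing, which real chips only approximate; `mean_spacing` is a hypothetical helper name.

```python
GENOME_BP = 3.0e9  # assumed genome length (bp); matches the 60,000 bp spacing quoted for the 50K chip

def mean_spacing(n_snp: int, genome_bp: float = GENOME_BP) -> float:
    """Average distance in base pairs between SNPs, assuming even spacing."""
    return genome_bp / n_snp

for chip, n_snp in (("50K", 50_000), ("770K (BovineHD)", 770_000)):
    print(f"{chip}: about one SNP per {mean_spacing(n_snp):,.0f} bp")

print(f"Density ratio: {770_000 / 50_000:.1f}x")
```

Under these assumptions the 770K chip averages roughly one SNP every 3,900 bp, about 15 times denser than the 50K chip.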


Populations Involved in Whole Genome Selection

Training (Discovery)

Validation

Application

r0

r1

This relationship contributes to discovery bias through Linkage

This relationship affects the accuracy of prediction, but the effects erode over time

Are these economically relevant traits the same?
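Why relationship-driven effects "erode over time" can be sketched with two textbook quantities that the presentation does not state explicitly: the additive relationship to a given training ancestor halves with each generation, and marker-QTL disequilibrium shrinks by a factor (1 - c) per generation of random mating, where c is the recombination fraction. Both assume no inbreeding, no selection, and a large population; the function names are hypothetical.

```python
def relationship_to_ancestor(generations: int) -> float:
    """Additive relationship to a direct ancestor `generations` back (no inbreeding)."""
    return 0.5 ** generations

def linkage_retained(c: float, generations: int) -> float:
    """Fraction of initial marker-QTL disequilibrium left after t generations: (1 - c)^t.

    c is the recombination fraction between the marker and the QTL.
    """
    return (1.0 - c) ** generations

for t in (1, 2, 5, 10):
    print(
        f"t={t:>2}: relationship={relationship_to_ancestor(t):.4f}, "
        f"LD kept at c=0.01: {linkage_retained(0.01, t):.3f}, "
        f"at c=0.10: {linkage_retained(0.10, t):.3f}"
    )
```

Relationship to a particular training ancestor fades within a few generations, while association with a tightly linked marker (small c) decays much more slowly.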


Prospects for Moving Beyond Validation Populations

• Conceptually, it should be possible to combine discovery and validation into a single step with a single population.

• There is still considerable work to be done to make this practical.

• This concept assumes that the accuracy of the genomic part of each individual’s genetic evaluation should vary depending on its genetic relationship to the discovery population.


Prospects for Moving Beyond Validation Populations

• This concept requires that raw genotypes be available on the training/validation population as well as the target population.

• For traits that are routinely recorded in the target population, phenotypes should be continuously integrated into the training/validation population.


Conclusions

• Within-breed predictions based on the 50K chip work well.

• Training on multiple purebred populations is more effective than training on only a single, small purebred population.

• With increasing marker density, crossbred populations will likely become increasingly important components of training.