introduction to animal breeding with examples of (non-)gaussian traits

Post on 08-Apr-2015

DESCRIPTION

Talk at INLA group meeting (http://www.r-inla.org) at NTNU, Department of Mathematical Sciences (http://www.ntnu.no/imf), Trondheim, Norway

TRANSCRIPT

Introduction to Animal Breeding with Examples of (Non-)Gaussian Traits

Gregor Gorjanc

University of Ljubljana, Biotechnical Faculty, Department of Animal Science, Slovenia

INLA for Animal Breeders "Project"

Trondheim, Norway

30th August 2010

Thank you for the invitation to NTNU!!!

My department ...

Table of Contents

1. Animal breeding crash course

2. Categorical trait example

3. Survival analysis example

1. Animal Breeding Crash Course

Introduction

- Animal breeding = mixture(animal science, genetics, statistics, ...)
- Many species (cattle, chicken, pig, sheep, goat, horse, dog, salmon, shrimp, honeybee, ...)
- Many (complex) traits:
  - production (milk, meat, eggs, ...)
  - reproduction (no. of offspring, insemination success, ...)
  - conformation (body height, width, ...)
  - health & longevity
  - ...
- Genetic evaluation - to enhance selective breeding

Selective Breeding

- Measure phenotype in candidates and select those with the most favourable values (= "mass" selection)
- Selected candidates will breed the next (better) generation
- ..., but phenotype is not transmitted to the next generation

Decomposition of Phenotypic Value

[Diagram: Genotype, Environment, Phenotype]

$$P = G + E + G \times E$$

- Genetic evaluation = inference of genotypic value given the data and postulated model (= "BLUP" selection)

Postulated Model and Data

- Postulated model

$$P = G + E + G \times E = A + D + I + \ldots$$

  - A - additive (breeding) value
  - D - dominance
  - I - epistasis
- Data
  - phenotypes on various relatives (pedigree)
    - own performance test
    - progeny test
    - (half-)sib test
    - ...
  - recently also genotype marker data

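Evaluation via Pedigree-Based Mixed Models

- Not so standard example - "maternal animal model"

$$y \mid b, c, a_d, a_m, R \sim N(Xb + Z_c c + Z_{a_d} a_d + Z_{a_m} a_m, R)$$

$$R = I \sigma^2_e$$

$$b \sim \text{const.}$$

$$c \mid C \sim N(0, C), \quad C = I \sigma^2_c$$

$$a = (a_d^T, a_m^T)^T \mid G \sim N(0, G)$$

$$G = G_0 \otimes A, \quad G_0 = \begin{pmatrix} \sigma^2_{a_d} & \sigma_{a_d,a_m} \\ \text{sym.} & \sigma^2_{a_m} \end{pmatrix}$$

data: $y$ (phenotypes), $X$, $Z_*$ ("covariates"), $A$ (pedigree)

parameters: $b$, $c$, $a$ (means); $\sigma^2_c$, $\sigma^2_{a_d}$, $\sigma_{a_d,a_m}$, $\sigma^2_{a_m}$, $\sigma^2_e$ (variances)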

Inference (for Gaussian models)

- "Standard"
  - means - solve Mixed Model (Normal) Equations (MME*), Henderson (1949+)
  - SE of means (needed for accuracies) - inversion of LHS or some approximation
  - variances - maximize Restricted Likelihood (REML), Patterson & Thompson (1971)
- "Powerful/Popular/Fancy/..." - McMC

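- *MME, with $Z_a = (Z_{a_d}, Z_{a_m})$ and, since $G = G_0 \otimes A$, $G^{-1} = G_0^{-1} \otimes A^{-1}$

$$\mathrm{LHS} = \begin{pmatrix} X^T R^{-1} X & X^T R^{-1} Z_c & X^T R^{-1} Z_a \\ & Z_c^T R^{-1} Z_c + C^{-1} & Z_c^T R^{-1} Z_a \\ \text{sym.} & & Z_a^T R^{-1} Z_a + G_0^{-1} \otimes A^{-1} \end{pmatrix}$$

$$\mathrm{RHS} = \left( (X^T R^{-1} y)^T, (Z_c^T R^{-1} y)^T, (Z_a^T R^{-1} y)^T \right)^T$$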
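To make the mixed model equations concrete, here is a minimal sketch in plain Python. It is deliberately reduced and is not the model from the slides: it assumes only an overall mean b and one random effect c with C = I σ²_c and R = I σ²_e, with no genetic effects and no pedigree. The function names, the toy records and the variance values are hypothetical.

```python
def solve(lhs, rhs):
    """Solve lhs * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(lhs, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mme_solutions(y, groups, n_groups, var_e, var_c):
    """Set up and solve the MME for y = 1*b + Z_c*c + e (toy model, assumption).

    Unknowns are ordered as (b, c_0, ..., c_{n_groups-1}).
    """
    n = 1 + n_groups
    lhs = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for yi, g in zip(y, groups):
        cols = (0, 1 + g)  # nonzero columns of [X Z_c] for this record
        for j in cols:
            rhs[j] += yi / var_e              # [X Z_c]^T R^-1 y
            for k in cols:
                lhs[j][k] += 1.0 / var_e      # [X Z_c]^T R^-1 [X Z_c]
    for g in range(n_groups):
        lhs[1 + g][1 + g] += 1.0 / var_c      # add C^-1 to the random-effect block
    return solve(lhs, rhs)


# Hypothetical records: two groups with two phenotypes each.
solution = mme_solutions([10.0, 12.0, 14.0, 16.0], [0, 0, 1, 1], 2, var_e=1.0, var_c=1.0)
print(solution)
```

With these toy numbers the raw group deviations from the mean are -2 and +2, while the solutions for c are shrunk towards zero by the added C⁻¹ term. A pedigree model would add G₀⁻¹ ⊗ A⁻¹ to the genetic block in the same way.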

Graphical Model View of Pedigree Model

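$$A^{-1} = (T^{-1})^T W^{-1} T^{-1} = (I - \tfrac{1}{2} P)^T W^{-1} (I - \tfrac{1}{2} P)$$

$$W_{i,i} = 1 - \tfrac{1}{4}\left(1 + F_{f(i)}\right) - \tfrac{1}{4}\left(1 + F_{m(i)}\right)$$

[Graph: nodes $\sigma^2_a$, $a_{f(i)}$, $a_{m(i)}$, $a_i$ and $W_{i,i}$; edge labels 1/2 and 1/2; plate over $i = 1 : n_I$]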
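The factorization of A⁻¹ above can be sketched directly in plain Python. This is an illustration under a loud assumption: inbreeding is ignored (all F = 0), so W_{i,i} is 1 with no known parents, 3/4 with one and 1/2 with two. The function name and the pedigree encoding (a list of parent index pairs, parents listed before offspring, None for unknown) are hypothetical.

```python
def a_inverse(pedigree):
    """Build A^-1 = (I - P/2)^T W^-1 (I - P/2) row by row.

    pedigree[i] = (father, mother) as 0-based indices or None if unknown.
    Assumes no inbreeding (F = 0), which is an assumption of this sketch.
    """
    n = len(pedigree)
    a_inv = [[0.0] * n for _ in range(n)]
    for i, parents in enumerate(pedigree):
        known = [p for p in parents if p is not None]
        w = 1.0 - 0.25 * len(known)        # W_ii with F = 0
        row = {i: 1.0}                     # row i of (I - P/2)
        for p in known:
            row[p] = row.get(p, 0.0) - 0.5
        for j, tj in row.items():          # add outer product of the row, scaled by 1/W_ii
            for k, tk in row.items():
                a_inv[j][k] += tj * tk / w
    return a_inv


# Two unrelated founders (0 and 1) and their offspring (2).
for line in a_inverse([(None, None), (None, None), (0, 1)]):
    print(line)
# [1.5, 0.5, -1.0]
# [0.5, 1.5, -1.0]
# [-1.0, -1.0, 2.0]
```

Each animal touches at most a 3 x 3 block, which is why A⁻¹ is sparse and cheap to set up even for large pedigrees.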

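Genetic Groups

- Different means in founders (usually due to different origin) = sort of hierarchical centering for pedigree model

...

$$a \mid G \sim N(Z_a Q a_0, G)$$

$$a_0 \sim \text{const.}$$

...

after some "massage"

$$\mathrm{LHS} = \begin{pmatrix} \ldots & \ldots & \ldots & 0 \\ & \ldots & \ldots & 0 \\ & & Z_a^T R^{-1} Z_a + G_0^{-1} \otimes A^{-1}_{i,i} & G_0^{-1} \otimes A^{-1}_{i,g} \\ \text{sym.} & & & G_0^{-1} \otimes A^{-1}_{g,g} \end{pmatrix}$$

$i$ - individuals, $g$ - genetic groups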

Genetic Groups - Graphical Model View

- Unknown (phantom) parents are represented with (few!) genetic group(s) - "graphical parent(s)"
- Algorithm to set up $A^{-1}$ directly available!!!
- Hierarchical prior can be put on genetic groups for stability/shrinkage

[Graph: nodes $\sigma^2_a$, $a_{f(i)}$, $a_{m(i)}$, $a_i$, $W_{i,i}$ and $a_{0g(i)}$; edge labels 1/2 and 1/2; plate over $i = 1 : n_I$]

Multi-trait = multi-variate

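$$y = (y_1^T, y_2^T)^T, \quad X = \ldots$$

$$y \mid \ldots \sim N(Xb + Z_c c + Z_{a_d} a_d + Z_{a_m} a_m, R)$$

$$R = R_0 \otimes I, \quad R_0 = \begin{pmatrix} \sigma^2_{e_1} & \sigma_{e_1,e_2} \\ \text{sym.} & \sigma^2_{e_2} \end{pmatrix}$$

$$c \mid C \sim N(0, C)$$

$$C = C_0 \otimes I, \quad C_0 = \begin{pmatrix} \sigma^2_{c_1} & \sigma_{c_1,c_2} \\ \text{sym.} & \sigma^2_{c_2} \end{pmatrix}$$

$$a_d, a_m \mid G \sim N(0, G)$$

$$G = G_0 \otimes A, \quad G_0 = \begin{pmatrix} \sigma^2_{a_{d1}} & \sigma_{a_{d1},a_{d2}} & \sigma_{a_{d1},a_{m1}} & \sigma_{a_{d1},a_{m2}} \\ & \sigma^2_{a_{d2}} & \sigma_{a_{d2},a_{m1}} & \sigma_{a_{d2},a_{m2}} \\ & & \sigma^2_{a_{m1}} & \sigma_{a_{m1},a_{m2}} \\ \text{sym.} & & & \sigma^2_{a_{m2}} \end{pmatrix}$$

- there are now 16 variance components!!!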
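The count of 16 follows from the number of free elements in each symmetric covariance matrix: a matrix of dimension n has n(n + 1)/2 of them. A small sketch of that arithmetic, with hypothetical names for the blocks:

```python
def n_covariance_parameters(dim):
    """Number of free (co)variances in a symmetric dim x dim matrix."""
    return dim * (dim + 1) // 2


# Dimensions of the three covariance matrices in the two-trait maternal model.
blocks = {"R0 (residual)": 2, "C0 (effect c)": 2, "G0 (direct + maternal genetic)": 4}
counts = {name: n_covariance_parameters(dim) for name, dim in blocks.items()}
total = sum(counts.values())
print(counts, total)  # 3 + 3 + 10 = 16
```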

Non-Gaussian Traits

- Categorical (health status, calving ease score, ...)
  - threshold model = (ordered) probit model, cumulative link model, ...
  - multinomial categories mostly treated separately as binary traits
- Counts (no. of offspring, ...)
  - Poisson, but rarely used - replacements: threshold and/or Gaussian model
- Time (longevity)
  - survival (Weibull & Cox) models
- Mixtures
  - Gaussian components
  - zero-inflated (no. of black spots in sheep skin -> wool, cure model - bivariate threshold model)

2. Categorical Trait Example (Calving Ease Score)

Calving Ease Score

- Of great economic importance!!!
- We cannot measure calving difficulty -> subjective score
  - 1 = no problem
  - 2 = easy
  - 3 = difficult
  - 4 = mechanical help or caesarean
- Reasons for difficult calving?
  - sex (male calves bigger)
  - number of calves - data usually omitted
  - parity (more problems with the 1st calving)
  - age (especially in the 1st parity; younger cows have more problems)
  - season?
  - environment (= herd, herd-year)
  - ...

Calving Ease Score II

- Reasons for difficult calving - genetics?
  - morphology of calf
    - "direct" genetic effect or "sire/bull" effect
    - genes expressed in calf
    - "origin" of genes - father and mother of a calf
  - morphology of cow's pelvic area
    - "maternal" genetic effect
    - genes expressed in cow
    - "origin" of genes - father and mother of a cow
- Negative genetic correlation
  - larger animals (↑direct effect -> bad) have larger pelvic area (↓maternal effect -> good)
- Parity specific genetic effects - 1st vs. 2nd+

Threshold Model (Wright, ..., Gianola & Foulley, Sorensen, ...)

$$l \mid b, c, a_d, a_m, R \sim N(Xb + Z_c c + Z_{a_d} a_d + Z_{a_m} a_m, R)$$

$$\Pr(y_i = k \mid \mu_i, t) = \Pr(t_{k-1} < l_i < t_k \mid \mu_i, t) = \Phi\left(\frac{t_k - \mu_i}{\sigma}\right) - \Phi\left(\frac{t_{k-1} - \mu_i}{\sigma}\right)$$

...

- Model $\sigma$ as well to improve model fit? $\log(\sigma) = \ldots$
- Methods: approx. EM-REML, Laplace approx., McMC
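The category probabilities of the threshold model can be evaluated with the standard normal distribution function alone. The sketch below is illustrative: the function names and the threshold values are hypothetical, and the outer thresholds are taken as minus and plus infinity so that the probabilities sum to one.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probabilities(mu, thresholds, sigma=1.0):
    """Pr(y = k) = Phi((t_k - mu) / sigma) - Phi((t_{k-1} - mu) / sigma).

    thresholds: increasing inner thresholds t_1 < ... < t_{K-1};
    t_0 = -inf and t_K = +inf are added here.
    """
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    cdf = [normal_cdf((t - mu) / sigma) for t in cuts]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


# Four ordered scores need three inner thresholds (values are made up).
thresholds = [0.0, 1.5, 3.0]
for mu in (0.0, 0.5):
    print(mu, [round(p, 3) for p in category_probabilities(mu, thresholds)])
```

Raising the liability mean mu moves probability mass from the lowest score towards the higher scores, which is how effects on the liability scale translate into observed score frequencies.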

Approximative (Gaussian) Model - Example (joint work with Marija Špehar - Croatia)

- Dataset: ~150k phenotypes, ~200k animals, 10 dataset samples
- Homogenization of variance by region and period of recording - scale problems?
- Bi-variate (1st & 2nd+ parity) maternal animal model with heterogeneous (by sex within parity class) residual variance
- 18 variance components - with VCE-6 program
  - herd-year interaction (3) -> better with autoregressive prior? $\sigma^2_{h_1}, \sigma^2_{h_{2+}}, \sigma_{h_1,h_{2+}}$
  - permanent effect of a cow (repeated records) (1): $\sigma^2_{c_{2+}}$
  - direct & maternal genetic effect (10): $\sigma^2_{a_{d1}}, \sigma^2_{a_{d2+}}, \sigma_{a_{d1},a_{d2+}}, \ldots, \sigma^2_{a_{m2+}}$
  - residual (4): $\sigma^2_{e_{m1}}, \sigma^2_{e_{f1}}, \sigma^2_{e_{m2+}}, \sigma^2_{e_{f2+}}$

Approximative (Gaussian) Model - Example

- Residual variances: $\sigma^2_{e_{m1}} = 0.295$, $\sigma^2_{e_{f1}} = 0.204$, $\sigma^2_{e_{m2+}} = 0.228$, $\sigma^2_{e_{f2+}} = 0.162$
- Ratios and correlations (1st vs. 2nd+)

         Herd-year  Direct  Maternal  Perm.
  1st    27.545     4.548   3.548     /
  2nd+   24.445     9.948   4.248     5.1
  Corr.  20.845     0.548   0.743     /

- Genetic correlation between direct and maternal effect

                  Direct, 1st  Direct, 2nd+
  Maternal, 1st   -0.490       -0.433
  Maternal, 2nd+  -0.377       -0.730

A Look at my Data - Structure

- Dimensions
  - #records (= #calves) ~150k
  - #cows ~74k
  - #bulls ~1k
  - #pedigree records (all generations + pruning)
    - animal pedigree ~230k (basic set are calves + ancestors)
    - sire-dam pedigree ~115k (basic set are mothers and fathers of calves + ancestors)
    - two more options: sire-maternal grandsire pedigree, sire pedigree
- Distribution of scores
  - no problem 50.3%
  - problem 49.7%
    - easy 43.5%
    - difficult 6.1%
    - mechanical help or caesarean 0.1%

A Look at my Data - Sex & Parity

- Sex
  - females 52%
  - females 47%
- Parity
  - 1st 59%
  - 2nd 46%
  - 3rd 45%
  - 4th 45%
  - 5th 45%

A Look at my Data - Age within Parity

[Figure: average score and #Records against age at calving; legend: Score 1st (male), Score 2nd..., #Records]

A Look at my Data - Age within Parity & Sex

[Figure: average score and #Records against age at calving; legend: Score 1st (male), Score 1st (female), Score 2nd..., #Records]

A Look at my Data - Season

[Figure: average score and #Records against season; legend: Score, #Records]

Analysis of my Data in R - Available Tools

- Bernoulli/binomial model
  - glm() - package stats
  - glmer() - package lme4
    - Laplace and adaptive Gauss-Hermite approximation (for more effects)
  - inla()
- threshold model
  - polr() - package MASS
  - clm() - package ordinal
    - location (additive) and scale (multiplicative) model
  - clmm() - package ordinal
    - location (additive) and scale (multiplicative) model
    - Laplace and adaptive Gauss-Hermite approximation (for one effect)

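3. Survival Analysis Example (Longevity = Length of Productive Life)

Model and Data

- Weibull model

$$y \mid b^*, \mathbf{h}, a, \rho \sim \text{Weibull}(Xb^* + Z_h \mathbf{h} + Z_a a, \rho)$$

$$h(y \mid b^*, \mathbf{h}, a, \rho) = \rho y^{\rho - 1} \exp(Xb^* + Z_h \mathbf{h} + Z_a a)$$

$$b^* = (\rho \ln \lambda, b^T)^T$$

$$b^* \sim \text{const.}$$

$$\mathbf{h} \mid \gamma \sim \text{Log-Gamma}(\gamma, \gamma)$$

$$a \mid G \sim N(0, G), \quad G = A \sigma^2_a$$

- Data
  - ~110k cows from ~4k herds, ~40% censoring
  - sire-maternal grandsire pedigree with ~3k bulls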
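The Weibull hazard above is simple to evaluate for a single animal. The sketch below collapses the linear predictor Xb* + Z_h h + Z_a a into one number, eta, and uses the survival function implied by that hazard, S(y) = exp(-y^ρ exp(eta)). The function names and the parameter values are hypothetical and not estimates from the talk.

```python
import math


def weibull_hazard(t, rho, eta):
    """Hazard h(t) = rho * t^(rho - 1) * exp(eta); eta is the linear predictor."""
    return rho * t ** (rho - 1.0) * math.exp(eta)


def weibull_survival(t, rho, eta):
    """Survival S(t) = exp(-t^rho * exp(eta)), the integral form of the hazard above."""
    return math.exp(-(t ** rho) * math.exp(eta))


# Made-up values: shape rho, two linear predictors differing by 0.5.
rho, eta_low, eta_high = 2.0, -15.0, -14.5
for days in (500.0, 1000.0, 2000.0):
    ratio = weibull_hazard(days, rho, eta_high) / weibull_hazard(days, rho, eta_low)
    print(days, weibull_survival(days, rho, eta_low), ratio)
```

The hazard ratio between two animals equals exp(eta_high - eta_low) at every time point, which is the relative risk interpretation of the effects in this proportional hazards model.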

Implementation

I Survival Kit program

I Log-Gamma prior “integrated out”

I Laplace approximation for Normal prior

Time Independent Effect - Age at 1st Calving

[Figure: number of records (all records, uncensored records) and relative risk (with baseline) against age at first calving (month), 19 to 49]

Time Dependent Effect - Parity*Stage

[Figure: number of records (all records, uncensored records) and hazard function against length of productive life (day), 0 to 2000]

Thank you!

Postulated Model and Data II

- Breeding value for individual = f(parent average, phenotype deviation, progeny contribution)

[Graph: nodes b1, b2; breeding values a1 to a10; phenotypes y21, y22, y3, y4, y5, y6, y10]
