Genomic Selection
Alessandro Bagnato
Markers
• Phenotypic markers – Coat color – Blood type – Polledness
• Genetic markers / molecular markers – DNA
• Can be visualised thanks to technology • Close relatedness to available technology
Marker and selection
• Microsatellites Markers – Marker Assisted Selection (MAS)
• New Technologies: – Sequencers – Genotyping techniques
• Cow Genome Sequenced – SNP Markers – Genomic Selection
• Cattle (Dairy) – Outbred populations
• Pig, Poultry – Inbred populations
Progeny testing and MAS
• 1967: Use of markers for genetic improvement (MAS) suggested
• 1980: Human Genome Project
• 1985: Thermocycler developed
• 1985: QTL identified in dairy cattle
• 1990: Granddaughter design proposed
• 1990s: MAS applied (Bell locus)
• 2009: Genomic selection in use
Molecular markers
• Molecular markers – Causal mutations
• Can be studied directly through the effect their mutations have on the phenotypic expression of the trait
– Example: the caseins in cattle (αs1, αs2, β and κ)
– "Anonymous" markers of unknown function
• Non-causal genomic mutations with Mendelian behaviour – Microsatellites, SNP, CNV
Genetic markers
– Microsatellites (Variable Number of Tandem Repeats - VNTR)
• Short nucleotide repeats along the DNA
• Repeats of AC, AT, CAC and GATA are frequent
• ACACACACACACACAC
• ATATATATATATAT
• CACCACCACCACCAC
• Thousands of microsatellites in the genome
• Mendelian loci in every respect
Genetic markers
DNA microsatellites
---> ------CACACACACACACACACACACACACACA------
     ------GTGTGTGTGTGTGTGTGTGTGTGTGTGT------ <---
[CA]15
---> ------CACACACACACACACACACACACACACACACA------
     ------GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT------ <---
[CA]17
(Figure credit: R. Achmann)
Electropherograms
[Figure: electropherograms showing the size standard and the primer peak, comparing a homozygous individual with a heterozygous individual.]
Dinucleotide microsatellite (RM 372 on BTA8)
  1 gatcagcctc ggcaattatt atcatctttc
 31 attaatgtca caatttagtt tcatgggttc
 61 aacccaacat ccacttgttt aaacacacac
 91 acacacacac acacacaggt cactcctcag
121 tttcttcttc tgtatttctt ttcttatttt
151 ccaagtcctg ggcttggaaa tctaagtgta
181 ccttaaggat c
Black, unique sequence (flanks); Green, primers; Red, microsatellite tract (here, n=12 repeats).
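The repeat count quoted for RM 372 can be checked directly from the printed sequence. The sketch below is illustrative only: the function name `longest_repeat_run` is hypothetical, and the sequence is the one from the slide with its position numbers stripped.

```python
import re

# RM 372 sequence as printed on the slide (position numbers removed).
RM372 = (
    "gatcagcctcggcaattattatcatctttc"
    "attaatgtcacaatttagtttcatgggttc"
    "aacccaacatccacttgtttaaacacacac"
    "acacacacacacacacaggtcactcctcag"
    "tttcttcttctgtatttcttttcttatttt"
    "ccaagtcctgggcttggaaatctaagtgta"
    "ccttaaggatc"
)


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of consecutive copies of `motif` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    # Each match is an uninterrupted run of the motif; convert its length to a copy count.
    runs = [len(match.group(0)) // len(motif) for match in pattern.finditer(sequence)]
    return max(runs, default=0)


n_repeats = longest_repeat_run(RM372, "ac")
print(f"Longest (AC)n tract: n = {n_repeats}")
```

Two alleles of the same microsatellite differ only in this count, which is what the electropherogram peaks resolve by fragment length.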
Marker maps
• Marker map BTA6
• Linkage maps – Based on recombination rate
• Physical maps – Based on DNA sequence
The causative DNA change may be
• A small change in DNA sequence
GGATGTGGTCG
GGATGTGCTCG
– A Single Nucleotide Polymorphism (SNP)
GGATGT-GGTCG
GGATGTAGGTCG
– An insertion or deletion (indel)
• Larger scale structural changes
Structural Variation
Copy number polymorphisms/variation (CNP/CNV)
Copy Number Variants (CNV)
• DNA segment of 1 kb or larger, present at variable copy number in comparison with a reference genome
• Range from kb to Mb
• Deletions, insertions, duplications and complex multi-site variants
• Common CNV (minor allele frequency > 5%) termed copy number polymorphism (CNP)
• Functional significance yet to be fully ascertained
• Evidence for specific effects on gene expression/dosage, diseases and complex traits
Distribution of 1447 Human CNVs
From 150 apparently healthy individuals in the human HapMap project
(Redon et al., 2006)
Italian Brown Swiss - Consensus CNVR map on UMD3.1; Y-axis: Bos taurus autosomes; complex CNVRs contain gains & losses
Dolezal et al. 2012, Edinburgh ICQG
Step toward GS
• Understand linkage between loci – Mendel laws – Recombination – Recombination rate / mapping functions
• What a QTL is – Identification of QTLs
• Experimental designs to map a QTL – Population structure dependent
• Cattle = outbred populations
• Poultry / swine = inbred populations
Mendel
Source: internet
1 - Law of dominance
[Diagram: F0 parents BB × bb; gametes B and b; F1 offspring all Bb.]
2 - Law of segregation
[Diagram: F1 parents Bb × Bb; each produces gametes B and b; F2 offspring BB, Bb, Bb, bb.]
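Law of independent assortment
[Diagram: F0 parents Yellow Smooth (GG-LL) × Green Wrinkled (vv-rr); gametes G - L and v - r; F1 offspring Gv-Lr, Yellow Smooth.]
Law of independent assortment
F1 × F1 (Gv-Lr, Yellow Smooth) gives F2:

Gametes   G - L   G - r   v - L   v - r
G - L     GGLL    GGLr    GvLL    GvLr
G - r     GGrL    GGrr    GvrL    Gvrr
v - L     vGLL    vGLr    vvLL    vvLr
v - r     vGrL    vGrr    vvrL    vvrr

F2 phenotypes: Yellow Smooth 9/16, Yellow Wrinkled 3/16, Green Smooth 3/16, Green Wrinkled 1/16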
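The 9:3:3:1 ratio can be reproduced by enumerating the 16 cells of the F2 square. This is a minimal sketch that assumes, as the F1 phenotype on the slide implies, that G (yellow) is dominant over v (green) and L (smooth) over r (wrinkled); the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# The four gamete types produced by a Gv-Lr parent, as (colour allele, shape allele).
GAMETES = [("G", "L"), ("G", "r"), ("v", "L"), ("v", "r")]


def phenotype(gamete_a, gamete_b):
    """Phenotype of the offspring formed by two gametes, with G and L dominant."""
    colour = "Yellow" if "G" in (gamete_a[0], gamete_b[0]) else "Green"
    shape = "Smooth" if "L" in (gamete_a[1], gamete_b[1]) else "Wrinkled"
    return f"{colour} {shape}"


def f2_phenotype_counts():
    """Count phenotypes over the 4 x 4 grid of equally likely gamete pairs."""
    return Counter(phenotype(a, b) for a, b in product(GAMETES, repeat=2))


counts = f2_phenotype_counts()
for name, n in counts.most_common():
    print(f"{name}: {n}/16")
```

The enumeration relies on the four gamete types being equally frequent, which holds only when the two loci are unlinked.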
Linkage
• 2 loci on 2 different chromosomes segregate independently of each other. Their chance of being inherited together (co-inherited) is 0.5. These loci are unlinked.
• 2 loci are said to be linked if they are located on the same chromosome and segregate together. – Linkage Disequilibrium – LD
• Because of recombination, 2 loci on the same chromosome have a chance of not being inherited together.
Recombination
• During meiosis, a chromosome often breaks and rejoins with its homologous chromosome, resulting in new chromosomal combinations – crossovers.
• The greater the distance between 2 loci on a chromosome the more likely it is that there is a recombination between them.
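To make the link between recombination and co-inheritance concrete, the sketch below gives the expected gamete frequencies from a double-heterozygous parent. The haplotype labels M-Q and m-q and the function name are illustrative assumptions; r is the recombination fraction between the two loci.

```python
def gamete_frequencies(r: float) -> dict:
    """Expected gamete frequencies for a parent carrying haplotypes M-Q and m-q."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie between 0 and 0.5")
    parental = (1.0 - r) / 2.0   # each non-recombinant gamete type
    recombinant = r / 2.0        # each recombinant gamete type
    return {"MQ": parental, "mq": parental, "Mq": recombinant, "mQ": recombinant}


for r in (0.0, 0.1, 0.5):
    print(r, gamete_frequencies(r))
```

With r = 0.5 all four gametes are equally frequent, which is the unlinked case; as r falls towards 0 the parental combinations are almost always inherited together.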
Recombinants
Mapping functions
• The mapping function gives the relationship between the distance of 2 chromosomal locations on a genetic map and their recombination frequency.
• The distance between 2 loci is determined by their recombination fraction.
• The mapping unit is the Morgan (M).
• 1 Morgan is the distance over which, on average, 1 crossover (recombination) occurs per meiosis. – 1 cM = 1 recombination per 100 meioses
Mapping function
• Morgan: $d_M = r$
• Haldane mapping function: $d_M = -\tfrac{1}{2}\ln(1-2r)$, $r = \tfrac{1}{2}\left[1-\exp(-2d_M)\right]$
• Kosambi mapping function: $d_M = \tfrac{1}{4}\ln\left[\frac{1+2r}{1-2r}\right]$, $r = \frac{1-\exp(-4d_M)}{2\left[1+\exp(-4d_M)\right]}$
Haldane Function
What is a QTL
• An individual's phenotype can be described accurately by a quantitative number – e.g. kg of milk produced
• Continuous distribution with associated variability
• h2 = proportion of the variability due to genetic differences between individuals
What is a QTL
• Genetic theory for outbreeding populations – A moderate number of genes for which alleles can be found that affect the expression of the trait differently
• Single gene contributing to the variation – Quantitative Trait Gene (QTG)
• Combined effect of all the genes – Determines the variability of the trait
What is a QTL
• BUT… QTGs can be distinguished by their location on the chromosomes (locus)
• For this reason the elements responsible for the variability of the trait are called – Quantitative Trait Loci (QTL)
• Chromosomal regions that contain QTL are defined as QTLR
QTL – why map it??
• To provide knowledge of individual gene actions and interactions.
• To build a more realistic model of phenotypic variation, response to selection and evolutionary processes.
• To improve breeding value estimation and selection response / reduce cost of breeding programmes through marker assisted selection.
Steps for MAS
• Identify an association between a marker and the trait of interest (QTL)
– Choose the markers and the experimental design (daughter / granddaughter design)
– Carry out sampling
– DNA extraction
– Amplification (PCR) / genotyping / sequencing
– Statistical analysis
– Use of the results for MAS / genomic selection
MAS - the marker
• The marker "flags" the presence on the genome of a gene of interest in livestock production
Q M
q m
Q = QTL - Gene coding for the trait of interest
M = Marker - "Flags" the presence of the QTL
MAS - the marker
• The marker "flags" the presence on the genome of a gene of interest in livestock production
PAT. M1 M2 M3 M4 Q1 M5 M6 M7 Q2 M8
MAT. m1 m2 m3 m4 q1 m5 m6 m7 q2 m8
Q = QTL - Gene coding for the trait of interest
M = Marker - "Flags" the presence of the QTL
Different Q/M het/hom
PAT. M1 M2 M3 M4 Q1 M5 M6 M7 Q2 M8
MAT. m1 m2 m3 m4 Q1 m5 m6 m7 q2 m8
PAT. M1 M2 M3 M4 Q1 M5 M6 M7 Q2 M8
MAT. m1 m2 M3 M4 q1 M5 M6 M7 q2 M8
PAT. M1 M2 M3 M4 Q1 M5 M6 M7 Q2 M8
MAT. m1 M2 M3 m4 Q1 m5 m6 m7 q2 M8
Mapping QTL in different populations
• Inbred line crosses (e.g. mice, plants) – to detect QTL that differ between the lines
– Backcross (BC)
– Cross in second generation (F2)
– F3-F.. (AIL = advanced intercross lines, RIL = recombinant inbred lines)
Mapping QTL in different populations
• Outbred populations
b) within structured outbred populations
• search for QTL within populations
• e.g. daughter design & granddaughter design
• e.g. dairy cattle, commercial pigs
• e.g. trees (maternal half-sib families)
c) within unstructured outbred populations
• (e.g. humans) complex pedigrees – many generations, different relationships, inbreeding loops
Mapping QTL in cattle
• Dairy cattle – Outbreeding populations
• Genome scan – The search for QTL across the whole genome
• Experimental designs – Daughter Design – Granddaughter Design – Selective DNA Pooling
• High resolution mapping – A finer search within QTLR
Interval mapping
[Diagram: meiosis in a parent carrying a QTL (Q/q) between two flanking markers (M/m), with example parental and recombinant gametes.]
Haplotype
• Haplotype ((linkage) phase): A combination of alleles (for different loci) which are located (closely) together on the same chromosome and which tend to be inherited together.
..A b C D ...... r Q S..
.. A b C D ...... R q s..
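The Daughter Design
• Bulls must be heterozygous at the markers
• Large half-sib family groups
[Diagram: sire with haplotypes Q M / q m; 1000 / 4000 daughters.]
Daughter design
[Diagram: a sire with haplotypes M-Q and m-q is used for inseminations; daughter groups averaging 3.0% and 3.2% protein are distinguished by the sire haplotype they inherited (m-q or M-Q); for further daughters and for the sire's son the marker (M/m?) and QTL (q/Q?) alleles inherited are unknown.]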
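Existing studies
[Diagram (granddaughter design): a grandsire with haplotypes C Q D / c q d; progeny-tested (PT) sires inheriting marker haplotypes CD, Cd, cD or cd; candidate PT grandsons ranked from best to worst.]
• Use of interval mapping
• Many markers to assure heterozygosity
• Use of flanking markers to trace the transmission of the QTL
[Diagram: markers A B C D E F along the chromosome, with the QTL located among them.]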
Interval mapping: Example
[Figure: test statistic (y-axis, 0 to 60) against location on chromosome (x-axis, 0 to 100 cM), with curves for 11, 6 and 3 markers.]
Interval mapping
Individual sire F values: Chr1, PROTEIN YIELD
[Figure: F value (y-axis, 0 to 12) against position on chromosome (x-axis, 1 to 111 cM) for Sires 1 to 6.]
Selective DNA pooling
• Identify the individuals in the tails
• Use the correct indicator – DYD or corrected EBV
[Diagram: sire Q M / q m; 1000 / 4000 daughters; high 10% in 2 sub-pools; low 10% in 2 sub-pools.]
The methodology
• Selective DNA pooling
• Estimation of allele frequency – A panel of 150/200 microsatellite markers – shadow band correction
[Diagram: sire Q M / q m; 1000 / 4000 daughters; high 10% and low 10% tails, each in 2 sub-pools. In one tail f(M) = .40 and f(m) = .10; in the other f(M) = .10 and f(m) = .40.]
Identification of QTL
From Tal-Stein et al. 2010 - J. Dairy Sci. 93:4913-4927
Milk Somatic Cell Count – Holstein Friesian Selective DNA pooling
Progeny Test and MAS
[Diagram: two parents, each marked + 1000; IP = (1000 + 1000)/2 = 1000; the offspring receive different combinations of + and - alleles.]
From Traditional to Genomic Selection
• Variation in phenotypes (e.g. milk production) is determined by:
– Differences in management (environment)
– Differences in genetics: each individual is genetically different from others
• Genomic selection principle:
– Relate phenotypic differences to genetic differences
– Use this information (knowledge of genetic differences) to select individuals
Measurement (e.g. meat quality) = Genetics + Environment
Genetics = combined effect of all influential genes (G1 + G2 + G3 + …)
Useful to know the genes (alleles)
e.g. Calpain and Calpastatin gene tests detect variation in some of the genes involved in (the genetic component of) beef tenderness
Identification of QTL
MSCC - From Tal-Stein et al. 2010 - J. Dairy Sci. 93:4913-4927
What is really controlling the trait?
Technology and Information
• 2003: human genome delivered – After many years (US$ 100's M)
• 2006: cattle genome delivered – Two years (US$ 50M)
• 2008/2010 reality
– 777,000+ SNP panel for cattle, ~US$ 0.0003/SNP
– 54,000+ SNP panels for pigs, sheep and chickens, ~US$ 0.003/SNP
• 2011 reality – 1 cow sequenced for less than 10,000 US$
• 2012…: whole genome re-sequencing genotyping – 1 cow sequenced for less than 1,000 US$
• 2015…: ????
New technologies => new possibilities
• Genomic technology
– SNP genotyping platforms
– Second / third generation sequencers
– Production of genomic information at a very low cost per individual
• Phenotype recording
– Automation in phenotypic recording
– E.g. NIR technology for milk quality traits
• Trait ontology
• Reproductive technologies
– Embryo Transfer
– Semen Sexing
• Association of genomic information to phenotypic information – Genomic selection
Copyright 2001 by the Genetics Society of America
Prediction of Total Genetic Value Using Genome-Wide Dense Marker Maps
T. H. E. Meuwissen,* B. J. Hayes† and M. E. Goddard†,‡
*Research Institute of Animal Science and Health, 8200 AB Lelystad, The Netherlands, †Victorian Institute of Animal Science, Attwood 3049, Victoria, Australia and ‡Institute of Land and Food Resources, University of Melbourne, Parkville 3052, Victoria, Australia
Manuscript received August 17, 2000
Accepted for publication January 17, 2001
ABSTRACT
Recent advances in molecular genetic techniques will make dense marker maps available and genotyping many individuals for these markers feasible. Here we attempted to estimate the effects of ~50,000 marker haplotypes simultaneously from a limited number of phenotypic records. A genome of 1000 cM was simulated with a marker spacing of 1 cM. The markers surrounding every 1-cM region were combined into marker haplotypes. Due to finite population size (Ne = 100), the marker haplotypes were in linkage disequilibrium with the QTL located between the markers. Using least squares, all haplotype effects could not be estimated simultaneously. When only the biggest effects were included, they were overestimated and the accuracy of predicting genetic values of the offspring of the recorded animals was only 0.32. Best linear unbiased prediction of haplotype effects assumed equal variances associated to each 1-cM chromosomal segment, which yielded an accuracy of 0.73, although this assumption was far from true. Bayesian methods that assumed a prior distribution of the variance associated with each chromosome segment increased this accuracy to 0.85, even when the prior was not correct. It was concluded that selection on genetic values predicted from markers could substantially increase the rate of genetic gain in animals and plants, especially if combined with reproductive techniques to shorten the generation interval.
SELECTION for economically important quantitative traits in animals and plants is traditionally based on phenotypic records of the individual and its relatives. Estimated breeding values, based on this phenotypic data, are commonly calculated by best linear unbiased prediction (BLUP; Henderson 1984). One justification for molecular genetics research on livestock and crop species is the expectation that information at the DNA level will lead to faster genetic gain than that achieved based on phenotypic data only. The availability of a sparse map of genetic markers has resulted in the detection of some quantitative trait loci (QTL; Georges et al. 1995). The inclusion of marker information into BLUP breeding values was demonstrated by Fernando and Grossman (1989) and is predicted to yield 8–38% extra genetic gain (Meuwissen and Goddard 1996). However, the usefulness of information from a sparse marker map in outbreeding species is limited because the linkage phase between a marker and QTL must be established for every family in which the marker is to be used for selection.
The total number of single nucleotide polymorphisms (SNP) is estimated at many millions (Halushka et al. 1999), and the advent of DNA chip technology may make genotyping of many animals for many of these markers feasible (and perhaps even cost effective). However, the precision of mapping QTL by traditional linkage analysis is little improved by the use of a very dense marker map (Darvasi et al. 1993). Therefore, a different approach is needed to efficiently use all this marker information.
With a dense marker map some markers will be very close to the QTL and probably in linkage disequilibrium with it (e.g., Hastbacka et al. 1992). Therefore, some marker alleles will be correlated with positive effects on the quantitative trait across all families and can be used for selection without the need to establish linkage phase in each family. Close markers can be combined into a haplotype. Chromosome segments that contain the same rare marker haplotypes are likely to be identical by descent (IBD) and hence carry the same QTL allele. Our approach is to estimate the effect on the quantitative trait of small chromosome segments defined by the haplotypes of marker alleles that they carry.
Quantitative traits are usually affected by many genes and consequently the benefit from marker-assisted selection is limited by the proportion of the genetic variance explained by the QTL. It would be desirable to utilize all QTL affecting the trait in marker-assisted selection. However, a dense marker map defines a very large number of chromosome segments and so there will be many effects to be estimated, probably more than there are phenotypic data points from which to estimate them.
The problem is essentially the same if we assume that …
Corresponding author: Theo Meuwissen, Department of Animal Breeding and Genetics, DLO-Institute for Animal Science and Health, Box 65, 8200 AB Lelystad, The Netherlands.
Genetics 157: 1819–1829 (April 2001)
The Principle of ‘Genomic Selection’
Training: Measured traits + Thousands of markers → Prediction equations
Application: Thousands of markers + Prediction equations → GEBV
Why is EBV (DYD) the phenotype?
P = µ + A + D + I + PE + TE
G = A + D + I; E = PE + TE
P = Phenotype
µ = Factor common to the individuals (e.g. management)
A = additive genetic effects
D = dominance genetic effects
I = interaction genetic effects
PE = permanent environmental effects
TE = temporary environmental effects
Phenotype = Management + Chance + Genetics
Phen / Gen Variability
Individual 1
PAT. M1 M2 M3 M4 Q1 M5 M6 M7 Q2 M8 … q/Qn … m/Mn
MAT. m1 m2 m3 m4 Q1 m5 m6 m7 Q2 m8 … q/Qn … m/Mn
EBV => Milk kg = 2340; Protein % = +0.12; SCC = 95
Individual 2
PAT. M1 M2 M3 m4 q1 M5 M6 M7 Q2 M8 … q/Qn … m/Mn
MAT. m1 M2 M3 m4 q1 m5 m6 m7 q2 m8 … q/Qn … m/Mn
EBV => Milk kg = 1870; Protein % = +0.23; SCC = 103
Individual n
PAT. m1 M2 M3 m4 q1 M5 M6 M7 q2 M8 … q/Qn … m/Mn
MAT. m1 m2 M3 M4 Q1 m5 M6 M7 q2 m8 … q/Qn … m/Mn
EBV => Milk kg = 2140; Protein % = -0.03; SCC = 101
…
E.g. Marker 10,720

Genotype            N. individuals   Phenotype
M10,720M10,720      38               2019
M10,720m10,720      545              1780
m10,720m10,720      319              1689

[Figure: phenotype means and breeding values (BV) (y-axis, 1600 to 2100) against the number of M alleles in the genotype (0 = mm, 1 = Mm, 2 = MM); mean alpha = 141.7; the dominance deviation (Dom Dev) and BV are marked.]
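The slide does not say how its alpha was obtained. As a hedged illustration, the sketch below applies the textbook average effect of allele substitution, alpha = a + d(q - p), to the genotype counts and means for marker 10,720, assuming the third genotype row is the mm homozygote (as the plot labels suggest). The function name and returned keys are hypothetical.

```python
def allele_substitution_effect(n_MM, n_Mm, n_mm, mean_MM, mean_Mm, mean_mm):
    """Average effect of substituting M for m, from genotype counts and means."""
    n = n_MM + n_Mm + n_mm
    p = (2 * n_MM + n_Mm) / (2 * n)      # frequency of allele M
    q = 1.0 - p                          # frequency of allele m
    midpoint = (mean_MM + mean_mm) / 2   # mean of the two homozygotes
    a = (mean_MM - mean_mm) / 2          # additive effect
    d = mean_Mm - midpoint               # dominance deviation
    alpha = a + d * (q - p)              # average effect of allele substitution
    return {"p": p, "q": q, "a": a, "d": d, "alpha": alpha}


result = allele_substitution_effect(38, 545, 319, 2019, 1780, 1689)
print({key: round(value, 3) for key, value in result.items()})
```

With these inputs a = 165 and d = -74, and alpha comes out a little under 142, close to the 141.7 shown on the slide; the small gap presumably reflects rounding or a slightly different estimation method.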
0.0 0.0 +0.5 0.0 0.0 -0.07 0.0 +0.15
+0.2 0.0 0.0 0.0 0.0 -0.3 0.0 0.0
Assume all errors cancel each other out
Protein Content (%) = + 0.12 Longevity (years) = + 0.34
0.0 +0.2 0.0 0.0 0.0 0.0 +0.03 0.0
-0.1 0.0 +0.35 0.0 0.0 -0.05 -0.01 0.0
And so on … = ????? 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Most markers will have no effect
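Direct Genomic EBV
• Training (single marker analysis): $y = \mu 1_n + X_j g_j + e$
• Test population: $DGEBV_i = \sum_{j} x_{ij} g_j$, summed over all markers
• Where
– $x_{ij}$ is the marker genotype of individual i at marker j ($X_j$ is the column of genotypes at marker j in the training population)
– $g_j$ is the estimated effect of marker j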
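The two steps can be sketched on toy data: estimate each marker effect by regressing phenotype on allele count in the training set, then sum genotype times effect for a candidate. All data and function names below are hypothetical, and fitting markers one at a time by ordinary least squares is only one of the estimation options the slides list (BLUP and Bayesian methods fit all markers jointly).

```python
def mean(values):
    return sum(values) / len(values)


def single_marker_effect(genotypes, phenotypes):
    """Least-squares slope of phenotype on allele count (0, 1 or 2) for one marker."""
    x_bar, y_bar = mean(genotypes), mean(phenotypes)
    sxx = sum((x - x_bar) ** 2 for x in genotypes)
    if sxx == 0:
        return 0.0  # monomorphic marker: no effect can be estimated
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(genotypes, phenotypes))
    return sxy / sxx


def train(marker_matrix, phenotypes):
    """Training step: one estimated effect g_j per marker column."""
    n_markers = len(marker_matrix[0])
    return [
        single_marker_effect([row[j] for row in marker_matrix], phenotypes)
        for j in range(n_markers)
    ]


def dgebv(genotype_row, effects):
    """Application step: DGEBV_i = sum over markers of x_ij * g_j."""
    return sum(x * g for x, g in zip(genotype_row, effects))


# Hypothetical training set: 4 animals, 2 markers (second marker is monomorphic).
X_train = [[0, 1], [1, 1], [2, 1], [1, 1]]
y_train = [10.0, 12.0, 14.0, 12.0]

effects = train(X_train, y_train)
print(effects)                 # estimated marker effects
print(dgebv([2, 1], effects))  # DGEBV of an unphenotyped candidate
```

The candidate needs only a genotype, not a phenotype, which is what allows selection of young animals before any records exist.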
Accuracy of GEBV
• Factors influencing the accuracy of genomic selection, i.e. r(DGEBV,TBV)
– Linkage disequilibrium between QTL and markers
• Density of markers => whole genome
– Method of estimation
• Single marker / Haplotypes / IBD
• BLUP / Bayes etc.
– Number of records used to estimate the prediction equations, i.e. n. of individuals in the training population
Accuracy of genomic selection
• Factors affecting accuracy of genomic selection r(GEBV,TBV)
– Linkage disequilibrium between QTL and markers = density of markers
• Haplotypes or single markers must be in sufficient LD with the QTL such that the haplotype or single markers will predict the effects of the QTL across the population.
• Calus et al. (2007) used simulation to assess the effect of LD between QTL and markers on accuracy of genomic selection
Accuracy of genomic selection
• Effect of LD on accuracy of selection
[Figure: accuracy of GEBV (y-axis, 0.5 to 1) against average r2 between adjacent marker loci (x-axis, 0.075 to 0.215); series HAP_IBS.]
Accuracy of genomic selection
• Factors affecting accuracy of genomic selection r(GEBV,TBV)
– Linkage disequilibrium between QTL and markers = density of markers
– In dairy cattle populations, an average r2 of 0.2 between adjacent markers is only achieved when markers are spaced every 100 kb.
[Figure: average r2 value (y-axis, 0 to 0.8) against distance (x-axis, 0 to 1500 kb) for Australian Holstein, Norwegian Red, Australian Angus, New Zealand Jersey and Dutch Holsteins.]
– Bovine genome is approximately 3,000,000 kb
– Implies that in the order of 30,000 markers are required for genomic selection to achieve accuracies of 0.8!
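The marker count follows from dividing the genome length by the marker spacing needed to reach the target LD. Written out as a worked equation, using only the figures quoted on the slide and assuming evenly spaced markers:

```latex
\[
  \text{number of markers} \approx
  \frac{\text{genome length}}{\text{marker spacing}}
  = \frac{3\,000\,000\ \text{kb}}{100\ \text{kb}}
  = 30\,000
\]
```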
Accuracy of genomic selection
• Comparing the accuracy of genomic selection with
– IBD approach
– haplotypes
– single markers
– Calus et al. (2007) used simulated data
'QTL Mapping, MAS, and Genomic Selection'
Course Notes
Taught by Dr. Ben Hayes, Animal Genetics and Genomics group of the Department of Primary Industries Research Victoria (Attwood - Melbourne, Australia)
March 10-14, 2008
The Animal Breeding and Genomics Centre (Animal Sciences Group - Wageningen University and Research Centre), Lelystad, The Netherlands
Courtesy of Ben Hayes, 2008
Accuracy of genomic selection
[Figure: accuracy of GEBV (y-axis, 0.5 to 1) against average r2 between adjacent marker loci (x-axis, 0.075 to 0.215); series SNP1, HAP_IBS and HAP_IBD.]
Accuracy of genomic selection
• Number of records used to estimate chromosome segment effects
– Chromosome segment effects gi estimated in a reference population
– How big does this reference population need to be?
– Meuwissen et al. (2001) evaluated accuracy using LS, BLUP and BayesB with 500, 1000 or 2200 records in the reference population
Accuracy of genomic selection
• Number of records used to estimate chromosome segment effects (h2 = 0.5)

                                          No. of phenotypic records
Method                                    500      1000     2200
Least squares                             0.124    0.204    0.318
Best linear unbiased prediction (BLUP)    0.579    0.659    0.732
BayesB                                    0.708    0.787    0.848
Accuracy of genomic selection
• Number of records used to estimate chromosome segment effects
[Figure: number of phenotypic records necessary to achieve a given accuracy (y-axis, 0 to 21,000) against heritability (x-axis, 0 to 1), with curves for accuracy of GEBV 0.7 and 0.5.]
Genomic selection
• An IBD approach
• Factors affecting accuracy of genomic selection
• How often to re-estimate the chromosome segment effects?
• Non-additive effects
• Genomic selection with low marker density
• Genomic selection across breeds
• Cost effective genomic selection
• Optimal breeding program design with genomic selection
Technology and information
The SNP Genotyping technology
Why Genomic Selection?
• Where trait measurements are:
– Expensive
– Difficult
– Only measured in one sex
– Only possible late in life
– Only possible on relatives of the animals available for selection
• Low heritability traits – i.e. low signal:noise ratio
• Where inheritance is complex
• Where incidence is sporadic
• For traits such as:
– Product composition
– Fertility
– Longevity
– Meat quality
– Disease resistance
Why Genomic Selection?
• Higher reliability of proofs (EBV)
– Possibility to use a group of unproven bulls
– Better choice of bull dams
• Shorter generation interval – Use of young bulls
• Lower progeny testing costs
– Smaller groups of young bulls sent to progeny testing
– Reduction in generation interval
Key Questions about GS
• Training
– How many animals?
– How many markers?
– How many generations?
• Application
– How accurate?
– How stable over time?
– How robust across populations and environments?
• E.g. Genotype x Environment interaction
From Muir et al. 2010 Interbull Bulletin 41
How can we improve genomic selection???
Italian Friesian breed: selection scheme
1 million registered cows; 500,000 unregistered
Top 2%: bull dams
Top 1% Italian bulls and top 1% international bulls
500 (T), 2000 (G) young bulls
ANAFI Genetic Centre
400 (T), 100 (G) young bulls
Progeny test (T): 23% of the registered population
Top 5% (T+G)
Some markers in livestock production
• Caseins – For about 15 years the casein genes have been isolated and fully sequenced; new variants discovered
• β-lactoglobulin – At least 6 genetic variants
• Effect on the casein content of milk and on cheese-making
• BLAD (Bovine Leukocyte Adhesion Deficiency) – Mutation in > 2% of Holstein animals
Some markers in livestock production
• DUMPS (Deficiency of Uridine Monophosphate Synthase) – Only in the Holstein
• Healthy carriers produce more milk
• In homozygotes, increased mortality in the first two months of gestation
• WEAVER (progressive degenerative myeloencephalopathy) – Only in the Brown (Bruna) breed
• Healthy carriers produce more milk (700 kg per year)
• Gene not yet isolated, but a marker has been identified
Some markers in livestock production
• PSS-MH (Porcine Stress Syndrome - Malignant Hyperthermia)
– Compromises the use of the meat by the processing industry
– Gene identified a few years ago
• MH (muscular hypertrophy)
• Booroola
– Australian Merino breed
– FF homozygotes have twice the number of offspring born
Diagnosis with markers
• E.g. “Complex Vertebral Malformation”
• Recessive disorder identified recently
• Carlin-M Ivanhoe Bell was a carrier and the sire of numerous breeding bulls
• Damage to selection