QTL Mapping

Violeta I. Bartolome

Senior Associate Scientist-Biometrics

Crop Research Informatics Laboratory

International Rice Research Institute

Quantitative Traits

• Vary continuously (e.g. yield, quality, stress tolerance)

• Usually governed by a number of genes

• Loci involved in the inheritance of quantitative traits are called QTL (quantitative trait loci)

QTL Mapping

The objective is to identify QTLs that affect the quantitative trait of interest.

Mapping Populations

Data Needed for QTL Mapping

• A trait value (phenotype) for each member of the mapping population.

• An allele score (genotype) for each member at a set of marker loci distributed throughout the genome.

Methods to Detect QTLs

• Single marker analysis

• Interval mapping

• Composite interval mapping

• Multiple QTL mapping

Single Marker Analysis (SMA)

Model for SMA

$$y = \mu + MG_i + e$$

where y = the phenotype

µ = the overall mean

MG = marker genotype

e = the residual error

Single Marker Analysis

• A significant difference between the phenotypic means of the marker-genotype groups indicates that the marker locus used to partition the mapping population is linked to a QTL controlling the trait.

• A QTL and a tightly linked marker are usually inherited together, so the mean of the group carrying the marker allele will differ significantly from the mean of the group without it.



• Advantages

o Simple

o Easily incorporates covariates

o Does not require a complete genetic map

• Disadvantages

o Must exclude individuals with missing genotype data

o Less precise about the location of the QTL.

o The farther a QTL is from a marker, the less likely it is to be detected, and its effect may be underestimated.

o Only considers one QTL at a time.
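The points above can be illustrated with a minimal sketch in base R. The data are simulated, and the sample size, recombination fraction and QTL effect are illustrative assumptions rather than values from these slides. The sketch partitions a backcross by marker genotype, tests the difference between group means, and shows that the marker contrast estimates roughly (1 - 2r) times the QTL effect, which is why a loosely linked QTL appears weaker than it is.

```r
# Single marker analysis on a simulated backcross (illustrative values only)
set.seed(1)
n      <- 200                                   # number of individuals (assumed)
r      <- 0.10                                  # marker-QTL recombination fraction (assumed)
marker <- rbinom(n, 1, 0.5)                     # observed marker genotype: 0 or 1
recomb <- rbinom(n, 1, r)                       # 1 where a crossover separates marker and QTL
qtl    <- ifelse(recomb == 1, 1 - marker, marker)  # unobserved QTL genotype
y      <- 10 + 2 * qtl + rnorm(n)               # phenotype: QTL effect 2, residual SD 1

# Model y = mu + MG_i + e: one-way ANOVA on the marker genotype
fit <- lm(y ~ factor(marker))
print(anova(fit))

# With two marker classes the ANOVA F-test equals an equal-variance t-test
tt <- t.test(y ~ factor(marker), var.equal = TRUE)
print(tt$p.value)

# Estimated marker effect; its expectation is (1 - 2r) * 2 = 1.6, not 2
print(unname(coef(fit)[2]))
```

Individuals with a missing marker genotype would have to be dropped before fitting this model, which is the first disadvantage listed above.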

R/qtl

Data Entry, Data Quality Check, and Single Marker Analysis

Data analyzable by R/qtl

• F2

• Backcross

• RILs

– class(mydata)[1] <- "riself" # if by selfing

– class(mydata)[1] <- "risib" # if by sibling mating

Data not analyzable by R/qtl

• Outcross data

• Half-sib families

• Advanced intercross lines

Input files

• Text file (comma delimited)

• Mapmaker format

• QTL Cartographer format

Sample Data

• csv format

• Mapmaker format – genotype data

• Mapmaker format – phenotype data

• QTL Cartographer format – Rcross data

• QTL Cartographer format – Rmap data

Reading cross data

• csv file

read.cross("csv", file="csvfile.csv", genotypes=c("A","H","B","D","C"))

• Mapmaker file

read.cross("csvs", genfile="mapmaker_gen.csv", phefile="mapmaker_phe.csv")

• QTL Cartographer

read.cross("qtlcart", file="qtlcart.cro", mapfile="qtlcart.map")

Reading csv data

Data Quality Check

Drop markers deviating from the hypothesized ratio

using the following statement
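The statement shown on the original slide is not reproduced here. As a hedged sketch of one way to do this in R/qtl, geno.table() reports genotype counts and a chi-square P-value for each marker against the expected segregation ratio, and drop.markers() removes the markers named. The bundled listeria F2 data set and the Bonferroni cutoff are assumptions chosen for illustration, not the author's choices.

```r
library(qtl)                          # assumes the R/qtl package is installed
data(listeria)                        # F2 intercross bundled with R/qtl

# Genotype counts and test of the expected segregation ratio, one row per marker
gt <- geno.table(listeria)
print(head(gt))

# Bonferroni-adjusted cutoff (an assumption; any sensible cutoff could be used)
alpha  <- 0.05 / totmar(listeria)
todrop <- rownames(gt)[which(gt$P.value < alpha)]
print(todrop)

# Drop the distorted markers, if any were flagged
listeria_clean <- if (length(todrop) > 0) drop.markers(listeria, todrop) else listeria
print(totmar(listeria_clean))
```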

plot.missing()

• Plot a grid showing which genotypes are missing

Note: Genotypes with missing data are denoted by black pixels.

plot.map()

• Plot genetic map of marker locations for all chromosomes

plot.pheno()

• Plots a histogram or barplot of the data for a phenotype from an experimental cross

Note: pheno.col indicates the column number of the data to be plotted.

plot()

• Plots all graphs together

est.rf()

• Estimate the sex-averaged recombination fraction between all pairs of genetic markers

• For a backcross, one can simply count recombination events. For an intercross or 4-way cross, recombination fractions must be estimated.

plot.rf()

• Plot a grid showing the recombination fractions for all pairs of markers, and/or the LOD scores for tests of linkage between pairs of markers

• If both are plotted, the recombination fractions are in the upper left triangle while the LOD scores are in the lower right triangle. Red corresponds to a large LOD or a small recombination fraction, while blue is the reverse. Missing values appear in light gray

Plot both rf and lod

Plot rf and lod for Chr 1 only

Plot lod only for Chr 2 and 3
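The quality-check functions above can be tried on a data set bundled with R/qtl; the sketch below uses the hyper backcross as a stand-in for your own cross object. Recent R/qtl releases spell the plotting functions plotMissing(), plotMap(), plotPheno() and plotRF(), and those names are used here; which spelling applies depends on the installed version, so treat the names as an assumption to check against your installation.

```r
library(qtl)                               # assumes the R/qtl package is installed
data(hyper)                                # backcross bundled with R/qtl (stand-in data)
pdf(NULL)                                  # discard graphics; remove this line to see the plots

summary(hyper)                             # cross type, individuals, markers, % genotyped
plotMissing(hyper)                         # grid of missing genotypes
plotMap(hyper)                             # genetic map of marker locations
plotPheno(hyper, pheno.col = 1)            # histogram of the first phenotype
plot(hyper)                                # the three plots above on one page

hyper <- est.rf(hyper)                     # pairwise recombination fractions and LOD scores
plotRF(hyper)                              # both rf and LOD, all chromosomes
plotRF(hyper, chr = 1)                     # rf and LOD for chromosome 1 only
plotRF(hyper, chr = c(2, 3), what = "lod") # LOD only, chromosomes 2 and 3
invisible(dev.off())
```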

scanone()

• Genome scan with a single-QTL model, with possible allowance for covariates, using any of several possible models for the phenotype and any of several possible numerical methods

scanone(cross, chr, pheno.col=1, model=c("normal","binary","2part","np"), method=c("em","imp","hk","ehk","mr","mr-imp","mr-argmax"), addcovar=NULL, n.perm, ...)

cross – object to be analyzed

chr – optional vector indicating the chromosomes for which LOD scores should be calculated

pheno.col – column number of the phenotype data

addcovar – additive covariates, allowed only for the normal and binary models

n.perm – the number of permutation replicates; if specified, a permutation test is performed instead of a scan of the observed data

model=

• normal – the standard model for QTL mapping. The residual phenotypic variation is assumed to follow a normal distribution

• binary – for a binary phenotype, which must have values 0 and 1. Available for the em and mr methods only

• 2part – when there is a spike in the phenotype distribution

• np (non-parametric) – an extension of the Kruskal-Wallis test is used

method=

• mr – single marker regression

o mr – deletes individuals with missing genotype

o mr-imp – fills in missing data using single imputation

o mr-argmax – fills in missing data using the Viterbi algorithm

• em – maximum likelihood using the Expectation-maximization (EM) algorithm

• hk – Haley-Knott regression

• imp – multiple imputation (Sen and Churchill, 2001). Uses Monte Carlo algorithm instead of EM.

• ehk – extended Haley-Knott method (Feenstra et al., 2006). An improvement of the hk especially when epistasis exists between QTLs

Single marker ANOVA

• Threshold=3

• Using permutation test
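The two ways of declaring significance named above can be sketched as follows. The hyper data set bundled with R/qtl stands in for the workshop data, and 100 permutations is an assumption made to keep the run short; far more replicates (commonly 1,000 or more) are needed for a stable threshold.

```r
library(qtl)                                   # assumes the R/qtl package is installed
data(hyper)                                    # backcross bundled with R/qtl (stand-in data)
set.seed(2024)                                 # permutations are random; fix the seed

# Single marker ANOVA: method "mr" regresses the phenotype on each marker in turn
out.mr <- scanone(hyper, pheno.col = 1, method = "mr")

# (1) Fixed threshold: largest peak per chromosome with LOD of at least 3
print(summary(out.mr, threshold = 3))

# (2) Permutation test: genome-wide LOD threshold estimated from shuffled phenotypes
operm <- scanone(hyper, pheno.col = 1, method = "mr", n.perm = 100, verbose = FALSE)
print(summary(operm, alpha = 0.05))            # estimated 5% threshold
print(summary(out.mr, perms = operm, alpha = 0.05, pvalues = TRUE))
```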

Estimating heritability for each marker

Interval Mapping (IM)

• Used for estimating the position of a QTL between two markers

• Statistically more powerful than single marker analysis

Methods used in IM

• Maximum Likelihood (standard interval mapping)

• Haley-Knott Regression

• Extended Haley-Knott Regression

Note:

• All methods estimate three parameters: mean, genetic effects and residual variance.

• All methods compute the conditional probabilities for each QTL genotype at a position between markers.

Probabilities of a putative QTL for a backcross

$$P(Q \mid M_1 M_2) = \frac{(1 - r_1)(1 - r_2)}{1 - r_{12}}$$

$$P(Q \mid M_1 m_2) = \frac{(1 - r_1)\, r_2}{r_{12}}$$

$$P(Q \mid m_1 M_2) = \frac{r_1 (1 - r_2)}{r_{12}}$$

$$P(Q \mid m_1 m_2) = \frac{r_1 r_2}{1 - r_{12}}$$
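As a small check of these four probabilities, the base R sketch below evaluates them for one pair of recombination fractions. It assumes the usual reading of the symbols: r1 is the recombination fraction between M1 and the putative QTL, r2 between the QTL and M2, and r12 between the two markers. It also assumes no crossover interference, so that r12 = r1 + r2 - 2·r1·r2. The function name and the input values are hypothetical.

```r
# Conditional probability of QTL allele Q given the flanking marker genotypes
# in a backcross. Assumes no interference: r12 = r1 + r2 - 2 * r1 * r2.
qtl_prob_bc <- function(r1, r2) {
  r12 <- r1 + r2 - 2 * r1 * r2
  c(M1M2 = (1 - r1) * (1 - r2) / (1 - r12),   # non-recombinant flanking markers
    M1m2 = (1 - r1) * r2 / r12,               # recombinant: QTL more likely follows the nearer marker
    m1M2 = r1 * (1 - r2) / r12,
    m1m2 = r1 * r2 / (1 - r12))               # Q present only after a double crossover
}

p <- qtl_prob_bc(r1 = 0.05, r2 = 0.10)        # illustrative values
print(round(p, 4))
```

Under the no-interference assumption the probabilities for complementary marker classes sum to one (M1M2 with m1m2, and M1m2 with m1M2), which is a quick way to validate an implementation.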

LOD Scores

• Logarithm of the odds – used to identify the most likely position for a QTL in relation to the linkage map

• Test of Significance

o LOD > 3 is the conventional significance threshold: odds of 1,000 to 1 against the loci being unlinked

o Permutation test

Odds

$$\text{Odds} = \frac{\text{prob. of success}}{\text{prob. of failure}} = \frac{p}{1 - p}$$

Odds = 1 ⇒ equal chance of success and failure

Odds < 1 ⇒ lower chance of success

Odds > 1 ⇒ higher chance of success

Maximum Likelihood

• The likelihood is evaluated for a given set of parameters (QTL position and QTL effect), given the observed data on phenotypes and marker genotypes

• The parameter estimates are the values at which the likelihood is highest

• The expectation-maximization (EM) method is used in the estimation procedure

• A test statistic for this method is:

$$LR = -2 \ln \frac{\text{Max\_Likelihood(reduced model)}}{\text{Max\_Likelihood(full model)}}$$

The reduced model refers to the null hypothesis of no QTL effect.

• The LOD score for a QTL at position c is:

$$LOD(c) = \frac{LR(c)}{2 \ln 10} = \frac{LR(c)}{4.61}$$
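To make the relation between LR and LOD concrete, here is a minimal base R sketch at a single fully observed locus, where the full and reduced models are ordinary linear models and no EM step is needed. The simulated data and effect size are illustrative assumptions.

```r
# Likelihood ratio and LOD score at one fully genotyped locus (simulated data)
set.seed(3)
n <- 150
g <- rbinom(n, 1, 0.5)                    # genotype at the tested position
y <- 5 + 1 * g + rnorm(n)                 # phenotype with an assumed effect of 1

full    <- lm(y ~ g)                      # model with a QTL effect
reduced <- lm(y ~ 1)                      # null model: no QTL effect

# LR = -2 ln( L(reduced) / L(full) ) = -2 * (logLik(reduced) - logLik(full))
LR  <- -2 * (as.numeric(logLik(reduced)) - as.numeric(logLik(full)))
LOD <- LR / (2 * log(10))                 # 2 ln 10 is about 4.61
print(c(LR = LR, LOD = LOD))
```

For normal errors the same LOD can be written as (n/2)·log10(SSE_reduced / SSE_full), which is the form used in regression-based interval mapping.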

Haley-Knott (HK) Regression

• For two markers, the model is:

$$y = \mu + \alpha x + e$$

where y is the observed phenotype

x is P(Q | mg1, mg2, r1, r12), the conditional probability of the QTL genotype given the flanking marker genotypes

α is the QTL effect

HK Regression

• For each QTL position, the residual sum of squares (SSE) is determined.

• The estimate of the QTL position is where the SSE is the minimum.

• Estimates an approximate likelihood ratio:

$$LR = n \ln \frac{SSE_{reduced}}{SSE_{full}}$$
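The HK steps above can be sketched in base R for a single marker interval. Everything in the simulation is an illustrative assumption: the 20 cM interval, the QTL at 8 cM, the effect size, and the use of the Haldane map function (which implies no interference). The scan regresses the phenotype on the conditional QTL probability at each position and keeps the position with the smallest SSE.

```r
# Haley-Knott regression scan of one marker interval in a simulated backcross
set.seed(4)
n   <- 300
d12 <- 20                                      # cM between markers M1 and M2 (assumed)
dq  <- 8                                       # true QTL position, cM from M1 (assumed)
haldane <- function(d) 0.5 * (1 - exp(-2 * d / 100))   # cM -> recombination fraction

# Genotypes along M1 - Q - M2, coded 1/0, with independent crossovers
m1 <- rbinom(n, 1, 0.5)
q  <- ifelse(rbinom(n, 1, haldane(dq)) == 1, 1 - m1, m1)
m2 <- ifelse(rbinom(n, 1, haldane(d12 - dq)) == 1, 1 - q, q)
y  <- 10 + 1.5 * q + rnorm(n)                  # phenotype; q itself is never used below

# x = P(Q | flanking markers) for a putative QTL at `pos` cM from M1
qtl_prob <- function(m1, m2, pos) {
  r1 <- haldane(pos); r2 <- haldane(d12 - pos); r12 <- haldane(d12)
  ifelse(m1 == 1 & m2 == 1, (1 - r1) * (1 - r2) / (1 - r12),
  ifelse(m1 == 1 & m2 == 0, (1 - r1) * r2 / r12,
  ifelse(m1 == 0 & m2 == 1, r1 * (1 - r2) / r12,
                            r1 * r2 / (1 - r12))))
}

sse_reduced <- sum((y - mean(y))^2)            # SSE of the no-QTL model
positions   <- seq(0, d12, by = 1)
lodscan <- t(sapply(positions, function(pos) {
  sse_full <- deviance(lm(y ~ qtl_prob(m1, m2, pos)))   # SSE of y = mu + alpha * x + e
  LR <- n * log(sse_reduced / sse_full)                 # approximate likelihood ratio
  c(pos = pos, SSE = sse_full, LOD = LR / (2 * log(10)))
}))

best <- lodscan[which.min(lodscan[, "SSE"]), ]  # position with minimum SSE
print(best)
```

Because the LOD is a decreasing function of SSE, the minimum-SSE position is also the maximum-LOD position.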

Extended HK Regression

• An improvement of the HK regression

• The correct variance for each genotype is used instead of the constant variance used in the HK regression

Which IM method to use

• ML provides better estimates, but the analysis is complex and computationally expensive

• HK regression is computationally faster, but its estimate of the residual variance is biased and the power of QTL detection may be affected (Kao et al 1999)

• Extended HK regression is not as fast as HK, but it provides improved approximations and is still faster than ML

• Results are hardly different in practical mapping

Multiple Imputation Method

• Another method available for IM

• Fills in all missing genotype data, then uses single marker ANOVA to identify significant QTLs

• More robust than ML but has little advantage over the extended HK for single QTL mapping

• Intensive in both computation time and memory use

Interval Mapping

• Advantages

o Takes proper account of missing data

o Allows examination of positions between markers

o Gives improved estimates of QTL effects

• Disadvantages

o Increased computation time

o Requires specialized software

o Difficult to generalize

o Only considers one QTL at a time

IM sample output

Red – EM

Blue – EHK

R/qtl

Interval Mapping

EM, HK, and EHK

Interval mapping

• Maximum likelihood

A permutation test can also be used to get the threshold value for LOD scores.

calc.genoprob()

• Calculate QTL probabilities conditional on the available marker data.

• Needed in most mapping functions

o step – indicates the step size in cM at which the probabilities are to be calculated

o error.prob – assumed genotyping error rate

Note: a genotyping error occurs when the observed genotype of an individual does not correspond to the true genotype.

Interval mapping

• Extended Haley-Knott Regression

A permutation test can also be used to get the threshold value for LOD scores.

Combining IM results

Plot of combined results

red – em

blue - ehk
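The interval mapping workflow described in the preceding slides can be sketched as below. The hyper data set bundled with R/qtl is a stand-in, and the step size, error rate and number of permutations are illustrative assumptions (100 permutations keeps the run short but is too few for a reliable threshold).

```r
library(qtl)                                    # assumes the R/qtl package is installed
data(hyper)                                     # backcross bundled with R/qtl (stand-in data)
set.seed(5)

# Conditional QTL genotype probabilities on a 2.5 cM grid
hyper <- calc.genoprob(hyper, step = 2.5, error.prob = 0.001)

# Interval mapping by maximum likelihood (EM) and by extended Haley-Knott
out.em  <- scanone(hyper, pheno.col = 1, method = "em")
out.ehk <- scanone(hyper, pheno.col = 1, method = "ehk")

# Permutation threshold for the EM scan
operm.em <- scanone(hyper, pheno.col = 1, method = "em", n.perm = 100, verbose = FALSE)
print(summary(out.em, perms = operm.em, alpha = 0.05))

# Combine the two scans into one table with one LOD column per method
out.both <- c(out.em, out.ehk)
print(summary(out.both, threshold = 3))

# Overlay the LOD curves: red = EM, blue = EHK
pdf(NULL)                                       # discard graphics; remove to see the plot
plot(out.em, out.ehk, col = c("red", "blue"))
invisible(dev.off())
```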

Composite Interval Mapping

• Performs interval mapping using a subset of marker loci as covariates

• Markers serve as proxies for other QTLs to account for linked QTLs and reduce residual variation

• Gives greater power in identifying key QTL.

• More statistically complicated and requires more computational power.

Steps in CIM

• Selects a set of markers to serve as covariates.

• Performs interval mapping with these markers as covariates.

• Excludes marker covariates that lie within a fixed distance (the window) of the test position.

• Calculates a LOD score comparing the model with the putative QTL in the presence of covariates to the model with just the covariates.

Sample CIM output

Blue – EM

Red – CIM

Problem with CIM

• The estimated position of the first QTL can be influenced by the second QTL and vice versa, especially for linked QTLs.

• The choice of covariates is critical: if too many or too few markers are chosen there will be a loss of power to detect QTL.

R/qtl

Composite Interval Mapping

cim()

• cim(cross, pheno.col=1, n.marcovar=3, method=c("em", "imp", "hk", "ehk"), imp.method=c("imp", "argmax"), error.prob=0.0001, n.perm, window=10)

o n.marcovar – number of marker covariates to use

o imp.method – method used to impute any missing marker genotype data

o window – marker covariates within this distance of the test position will be omitted

• add.cim.covar – add dots at the locations of the selected marker covariates, for a plot of composite interval mapping results

Composite interval mapping

CIM – Using permutation test

blue – em

red – cim
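A minimal sketch of a CIM run compared with ordinary interval mapping follows. It uses the hyper data set bundled with R/qtl as a stand-in; the step size, the three marker covariates, the 10 cM window and the chromosomes plotted are illustrative assumptions. Because missing genotypes are imputed at random before covariates are selected, the seed is fixed.

```r
library(qtl)                                    # assumes the R/qtl package is installed
data(hyper)                                     # backcross bundled with R/qtl (stand-in data)
set.seed(6)                                     # cim() imputes missing genotypes at random

hyper   <- calc.genoprob(hyper, step = 2.5)
out.em  <- scanone(hyper, pheno.col = 1, method = "em")     # ordinary interval mapping
out.cim <- cim(hyper, pheno.col = 1, n.marcovar = 3,
               window = 10, method = "em")                  # composite interval mapping

# Overlay the two scans: blue = EM, red = CIM; dots mark the marker covariates
pdf(NULL)                                       # discard graphics; remove to see the plot
plot(out.em, out.cim, chr = c(1, 4, 6, 15), col = c("blue", "red"))
add.cim.covar(out.cim, chr = c(1, 4, 6, 15))
invisible(dev.off())

print(summary(out.cim, threshold = 3))

# A permutation threshold can be requested through n.perm (slow, so not run here):
# operm.cim <- cim(hyper, n.marcovar = 3, window = 10, method = "em", n.perm = 100)
```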

Multiple QTL Mapping

• Extension of interval mapping to multiple QTLs

• Infer the location of QTLs to positions between markers

• Investigate interactions between QTLs (epistasis)

• More powerful and precise in detecting QTL (Kao et al 1999)

Sample Multiple QTL Mapping output

Other Methods used in Interval Mapping

• Bayesian Method – uses probability theories in parameter estimations based on prior knowledge about the data (R/qtlbim)

• Mixed model regression – available in R/ASReml

R/qtl

Multiple QTL Mapping


• sim.geno() is used to impute genotypes with missing data to minimize loss of information

• makeqtl() is used to create a qtl object. It pulls out the imputed genotypes at the selected positions

• n.gen is the number of possible genotypes at each QTL (for example, 2 in a backcross)

Displays the QTL on the genetic map

Not significant and may be dropped from the model

Multiple QTL Mapping

refineqtl() - Iteratively scan the positions for QTL in the

context of a multiple QTL model, to try to identify the

positions with maximum likelihood, for a fixed QTL model.
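The multiple-QTL steps named in this section (sim.geno(), makeqtl(), fitting the model, refineqtl()) can be strung together as in the sketch below. The hyper data set is a stand-in, the chromosomes and starting positions are illustrative guesses, and the interaction term in the formula is an assumption included only to show how epistasis is specified. fitqtl() is the R/qtl function that fits a defined multiple-QTL model; it is not named on the slides.

```r
library(qtl)                                    # assumes the R/qtl package is installed
data(hyper)                                     # backcross bundled with R/qtl (stand-in data)
set.seed(7)                                     # imputation is random; fix the seed

# Impute missing genotypes on a 2.5 cM grid
hyper <- sim.geno(hyper, step = 2.5, n.draws = 16, error.prob = 0.001)

# QTL object at illustrative starting positions (snapped to the nearest grid point)
qtl <- makeqtl(hyper, chr = c(1, 4, 6, 15), pos = c(50, 30, 60, 20), what = "draws")
print(qtl)

pdf(NULL)                                       # discard graphics; remove to see the plot
plot(qtl)                                       # displays the QTL on the genetic map
invisible(dev.off())

# Fit the multiple-QTL model; Q3:Q4 is an assumed interaction between the 3rd and 4th QTL
my.formula <- y ~ Q1 + Q2 + Q3 + Q4 + Q3:Q4
fit <- fitqtl(hyper, pheno.col = 1, qtl = qtl, formula = my.formula, method = "imp")
print(summary(fit))                             # drop-one table flags terms that could be removed

# Iteratively move each QTL to improve the likelihood, with the model held fixed
rqtl <- refineqtl(hyper, pheno.col = 1, qtl = qtl, formula = my.formula,
                  method = "imp", verbose = FALSE)
print(rqtl)
```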

