grouping loci


Upload: gabby

Post on 24-Feb-2016


Page 1: Grouping loci

Grouping loci

Criteria

• Maximum two-point recombination fraction
– Example: r_ij ≤ 0.40
• Minimum LOD score, Z_ij
– For n loci, there are n(n-1)/2 possible pairwise combinations that will be tested
– Expect some probability of false positives
• Significant probability value, p_ij
– Example: p_ij ≤ 0.00001
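The grouping criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `group_loci`, the LOD threshold of 3.0, and the pairwise values for the four loci are made up for the example (the slide gives only the r_ij ≤ 0.40 threshold), and single-linkage grouping with a union-find structure is one possible way to form groups, not necessarily what any particular mapping program does.

```python
from itertools import combinations


def group_loci(loci, r, z, max_r=0.40, min_lod=3.0):
    """Join loci i and j when r_ij <= max_r and Z_ij >= min_lod (single linkage).

    min_lod=3.0 is an assumed threshold for illustration only.
    Returns the linkage groups and the number of pairwise tests performed.
    """
    parent = {locus: locus for locus in loci}

    def find(x):
        # Follow parent links to the representative of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    n_tests = 0
    for a, b in combinations(loci, 2):  # n(n-1)/2 pairs
        n_tests += 1
        pair = frozenset((a, b))
        if r[pair] <= max_r and z[pair] >= min_lod:
            parent[find(a)] = find(b)

    groups = {}
    for locus in loci:
        groups.setdefault(find(locus), []).append(locus)
    return sorted(groups.values()), n_tests


# Hypothetical pairwise estimates for four loci (made-up numbers).
loci = ["A", "B", "C", "D"]
r = {frozenset("AB"): 0.22, frozenset("AC"): 0.10, frozenset("BC"): 0.30,
     frozenset("AD"): 0.48, frozenset("BD"): 0.50, frozenset("CD"): 0.47}
z = {frozenset("AB"): 7.2, frozenset("AC"): 16.0, frozenset("BC"): 3.6,
     frozenset("AD"): 0.1, frozenset("BD"): 0.0, frozenset("CD"): 0.2}

groups, n_tests = group_loci(loci, r, z)
print(groups, n_tests)
```

With n = 4 loci the loop performs 4(4-1)/2 = 6 tests, which is why the number of chance false positives grows quickly as loci are added.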

Page 2: Grouping loci

Locus ordering

• Ideally, we would estimate the likelihoods for all possible orders and take the one that is most probable by comparing log likelihoods
• That is computationally inefficient when there are more than ~10 loci
• Several methods have been proposed for producing a preliminary order

Page 3: Grouping loci

Locus ordering

Number of orders among k loci:

$$n_k = \frac{k!}{2} = \frac{k(k-1)(k-2)\cdots(3)(2)(1)}{2}$$

Number of triplets among k loci:

$$n_{\text{triplets}} = \frac{k(k-1)(k-2)}{6}$$

No. of loci (k)   Possible orders   No. of triplets
2                 1                 0
3                 3                 1
5                 60                10
10                1,814,400         120
20                1.22 × 10^18      1,140
40                4.08 × 10^47      9,880
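The two counting formulas on this slide (k!/2 orders and k(k-1)(k-2)/6 triplets) are easy to check numerically. The following is a small sketch; the function names `n_orders` and `n_triplets` are chosen here for illustration.

```python
import math


def n_orders(k):
    """Unique locus orders among k loci: k!/2 (an order and its mirror count once)."""
    return math.factorial(k) // 2


def n_triplets(k):
    """Number of locus triplets among k loci: k(k-1)(k-2)/6."""
    return math.comb(k, 3)


for k in (2, 3, 5, 10, 20, 40):
    print(k, n_orders(k), n_triplets(k))
```

The number of orders grows factorially while the number of triplets grows only cubically, which is the motivation for building preliminary orders from three-point analyses.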

Page 4: Grouping loci

Three-point Analysis

Number of unique orders among k loci:

$$n_k = \frac{k!}{2}$$

For three loci (k = 3):

$$n_3 = \frac{3!}{2} = \frac{(3)(2)(1)}{2} = 3$$

Order   Mirror order
ABC     CBA
ACB     BCA
BAC     CAB

Page 5: Grouping loci

Three-point analysis

Page 6: Grouping loci
Page 7: Grouping loci

Non-Additivity of recombination frequencies

[Figure: loci A, B and C on a chromosome, with recombination frequencies rAB (interval A – B), rBC (interval B – C) and rAC (interval A – C)]

The recombination frequency over the interval A – C (rAC) is less than the sum of rAB and rBC: rAC < rAB + rBC. This is because (rare) double recombination events (a recombination in both A – B and B – C) do not contribute to recombination between A and C.

Page 8: Grouping loci

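Non-Additivity of recombination frequencies

[Figure: four diagrams of loci A B C, one for each combination of recombination or no recombination in intervals A – B and B – C]

P00 = (1 - rAB)(1 - rBC)

P10 = rAB(1 - rBC)

P01 = (1 - rAB)rBC

P11 = rAB rBC

rAC = rAB(1 - rBC) + (1 - rAB)rBC

rAC = rAB + rBC - 2 rAB rBC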

Page 9: Grouping loci

Interference

• Interference means that recombination events in adjacent intervals interfere. The occurrence of an event in a given interval may reduce or enhance the occurrence of an event in its neighbourhood.

• Positive interference refers to the ‘suppression’ of recombination events in the neighbourhood of a given one.

• Negative interference refers to the opposite: enhancement of clusters of recombination events.

• Positive interference results in fewer double recombinants (over adjacent intervals) than expected on the basis of independence of recombination events.

rAC = rAB + rBC - 2C rAB rBC
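As a quick numerical illustration of the formula above, the sketch below evaluates rAC for a pair of adjacent intervals with and without interference. The function name `r_ac` and the input values 0.36 and 0.24 are assumptions chosen for the example; C is the coefficient of coincidence, and C = 1 corresponds to independent recombination events.

```python
def r_ac(r_ab, r_bc, c=1.0):
    """Recombination frequency across A - C given the two adjacent intervals.

    c is the coefficient of coincidence (c = 1: no interference).
    """
    return r_ab + r_bc - 2.0 * c * r_ab * r_bc


r_ab, r_bc = 0.36, 0.24  # illustrative values

independent = r_ac(r_ab, r_bc)            # c = 1, no interference
complete = r_ac(r_ab, r_bc, c=0.0)        # no double recombinants at all

# With c = 1 the result equals P10 + P01 from the independence model.
p10 = r_ab * (1 - r_bc)
p01 = (1 - r_ab) * r_bc

print(independent, complete, p10 + p01)
```

Only under complete interference (C = 0) are recombination frequencies additive; for any C > 0 the value of rAC falls below rAB + rBC.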

Page 10: Grouping loci

Interference

C = coefficient of coincidence

$$C = \frac{\text{Observed number of double crossovers}}{\text{Expected number of double crossovers}}$$

Expected number of double crossovers = rAB rBC N

Interference I = 1 - C

[Figure: homologous chromosomes carrying alleles A B C and a b c]

Page 11: Grouping loci

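DH population N = 100, locus order ABC

Observed Count: 22 24 16 14 8 10 2 4

$$\hat{r}_{AB} = \frac{16 + 14 + 2 + 4}{100} = 0.36$$

$$\hat{r}_{BC} = \frac{8 + 10 + 2 + 4}{100} = 0.24$$

$$C = \frac{\text{Observed number of DCs}}{\text{Expected number of DCs}} = \frac{2 + 4}{0.36 \times 0.24 \times 100} = \frac{6}{8.64} = 0.69$$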
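The worked example on this slide can be reproduced with a short script. One assumption is made explicit here: the eight observed counts are taken to be ordered as two non-recombinant classes, two single-crossover classes for A – B, two single-crossover classes for B – C, and two double-crossover classes. That ordering is inferred from the sums used in the slide's formulas, not stated in the transcript.

```python
# Observed counts, DH population, locus order ABC.
# Assumed class order (inferred from the slide's sums):
# NR, NR, SC(A-B), SC(A-B), SC(B-C), SC(B-C), DC, DC
counts = [22, 24, 16, 14, 8, 10, 2, 4]
n = sum(counts)

sc_ab = sum(counts[2:4])   # single crossovers in interval A - B
sc_bc = sum(counts[4:6])   # single crossovers in interval B - C
dc = sum(counts[6:8])      # double crossovers

r_ab = (sc_ab + dc) / n
r_bc = (sc_bc + dc) / n

expected_dc = r_ab * r_bc * n
c = dc / expected_dc       # coefficient of coincidence
interference = 1 - c

print(r_ab, r_bc, round(c, 2), round(interference, 2))
```

Because C is below 1 here, the data show positive interference: fewer double crossovers were observed than independence would predict.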

Page 12: Grouping loci

Interference

• No interference
– C = 1 and Interference = 1 - C = 0

• Complete interference
– C = 0 and Interference = 1 - C = 1

• Negative interference
– C > 1 and Interference = 1 - C < 0

• Positive interference
– C < 1 and Interference = 1 - C > 0

Page 13: Grouping loci

Three locus analysis, DH population

For the ABC locus order

                                      Expected frequency
Genotypes   Observed count   Without interference   With interference            Class
ABC/ABC     f1               0.5(1 - r1)(1 - r2)    0.5(1 - r1 - r2 + C r1 r2)   NR
ABc/ABc     f2               0.5(1 - r1) r2         0.5(r2 - C r1 r2)            SC2
AbC/AbC     f3               0.5 r1 r2              0.5 C r1 r2                  DC12
Abc/Abc     f4               0.5(1 - r2) r1         0.5(r1 - C r1 r2)            SC1
aBC/aBC     f5               0.5(1 - r2) r1         0.5(r1 - C r1 r2)            SC1
aBc/aBc     f6               0.5 r1 r2              0.5 C r1 r2                  DC12
abC/abC     f7               0.5(1 - r1) r2         0.5(r2 - C r1 r2)            SC2
abc/abc     f8               0.5(1 - r1)(1 - r2)    0.5(1 - r1 - r2 + C r1 r2)   NR

NR = non-recombinant; SC1, SC2 = single crossover in interval 1 (A – B) or interval 2 (B – C); DC12 = double crossover in intervals 1 and 2.
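The expected-frequency table for the ABC order can be written as a small function, which also makes it easy to confirm that the eight frequencies sum to 1 and that the "with interference" column reduces to the "without interference" column when C = 1. The function name `expected_frequencies` and the input values are illustrative assumptions.

```python
def expected_frequencies(r1, r2, c=1.0):
    """Expected DH genotype frequencies for locus order ABC.

    r1, r2: recombination fractions in intervals A - B and B - C.
    c: coefficient of coincidence (c = 1 means no interference).
    """
    dc = c * r1 * r2                 # double-crossover frequency
    nr = 1 - r1 - r2 + dc            # non-recombinant frequency
    sc1 = r1 - dc                    # single crossover, interval 1
    sc2 = r2 - dc                    # single crossover, interval 2
    return {
        "ABC": 0.5 * nr,  "abc": 0.5 * nr,
        "ABc": 0.5 * sc2, "abC": 0.5 * sc2,
        "AbC": 0.5 * dc,  "aBc": 0.5 * dc,
        "Abc": 0.5 * sc1, "aBC": 0.5 * sc1,
    }


freq_no_interference = expected_frequencies(0.22, 0.30)          # c = 1
freq_with_interference = expected_frequencies(0.22, 0.30, c=0.5)

print(sum(freq_no_interference.values()), sum(freq_with_interference.values()))
```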

Page 14: Grouping loci

MLE of two-locus recombination fractions

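Regardless of locus order the MLEs of r are

$$\hat{r}_{AB} = \frac{f_3 + f_4 + f_5 + f_6}{N} = \frac{11 + 0 + 1 + 10}{100} = 0.22$$

$$\hat{r}_{AC} = \frac{f_2 + f_4 + f_5 + f_7}{N} = \frac{5 + 0 + 1 + 4}{100} = 0.10$$

$$\hat{r}_{BC} = \frac{f_2 + f_3 + f_6 + f_7}{N} = \frac{5 + 11 + 10 + 4}{100} = 0.30$$

For the ABC locus order

$$r_{AB} = r_1, \qquad r_{BC} = r_2, \qquad r_{AC} = r_1 + r_2 - 2Cr_1r_2$$

Genotypes   Observed count   Expected frequency
ABC/ABC     f1 = 34          0.5(1 - r1 - r2 + C r1 r2)
ABc/ABc     f2 = 5           0.5(r2 - C r1 r2)
AbC/AbC     f3 = 11          0.5 C r1 r2
Abc/Abc     f4 = 0           0.5(r1 - C r1 r2)
aBC/aBC     f5 = 1           0.5(r1 - C r1 r2)
aBc/aBc     f6 = 10          0.5 C r1 r2
abC/abC     f7 = 4           0.5(r2 - C r1 r2)
abc/abc     f8 = 35          0.5(1 - r1 - r2 + C r1 r2)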
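The pairwise MLEs on this slide can be computed directly from the eight genotype counts by counting, for each pair of loci, the individuals in which the two loci carry alleles from different parents. The sketch below assumes the parental haplotypes are ABC and abc (consistent with those being the most frequent classes); the function name `recombination_fraction` is illustrative.

```python
# Observed DH genotype counts from the slide (one haplotype shown per genotype).
counts = {"ABC": 34, "ABc": 5, "AbC": 11, "Abc": 0,
          "aBC": 1, "aBc": 10, "abC": 4, "abc": 35}


def recombination_fraction(counts, i, j):
    """MLE of r between the loci at positions i and j of each haplotype string.

    Assumes parental haplotypes ABC and abc, so a haplotype is recombinant
    for a pair of loci when the two letters differ in case.
    """
    n = sum(counts.values())
    recombinants = sum(f for hap, f in counts.items()
                       if hap[i].isupper() != hap[j].isupper())
    return recombinants / n


r_ab = recombination_fraction(counts, 0, 1)
r_ac = recombination_fraction(counts, 0, 2)
r_bc = recombination_fraction(counts, 1, 2)
print(r_ab, r_ac, r_bc)
```

These estimates do not depend on the assumed locus order, which is why they can be used as inputs to the ordering criteria on the later slides.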

Page 15: Grouping loci

Ordering Loci by Minimizing Double Crossovers

Genotypes   Observed count
ABC/ABC     f1 = 34
ABc/ABc     f2 = 5
AbC/AbC     f3 = 11
Abc/Abc     f4 = 0
aBC/aBC     f5 = 1
aBc/aBc     f6 = 10
abC/abC     f7 = 4
abc/abc     f8 = 35

Genotypes   Observed count
ABC + abc   f1 + f8 = 34 + 35 = 69
ABc + abC   f2 + f7 = 5 + 4 = 9
AbC + aBc   f3 + f6 = 11 + 10 = 21
Abc + aBC   f4 + f5 = 0 + 1 = 1

Rarest genotypes are double recombinants

[Figure: parental chromosomes B A C and b a c with a crossover (X) in each interval, giving the double recombinants B a C and b A c]

The order of loci is BAC
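The reasoning on this slide (pool complementary genotypes, find the rarest class, and take the locus that is "switched" in that class as the middle locus) can be sketched as follows. The function name `middle_locus` is an assumption made for this illustration, and the approach presumes the rarest class really is the double-recombinant class.

```python
counts = {"ABC": 34, "ABc": 5, "AbC": 11, "Abc": 0,
          "aBC": 1, "aBc": 10, "abC": 4, "abc": 35}


def middle_locus(counts):
    """Return (middle locus, pooled counts) using the rarest pooled class."""
    pooled = {}
    for hap, f in counts.items():
        # Complementary haplotypes (e.g. Abc and aBC) form one pooled class.
        key = min(hap, hap.swapcase())
        pooled[key] = pooled.get(key, 0) + f

    rarest = min(pooled, key=pooled.get)
    cases = [ch.isupper() for ch in rarest]
    for idx, ch in enumerate(rarest):
        # The locus whose parental origin differs from the other two
        # sits between them in the double-recombinant class.
        if cases.count(cases[idx]) == 1:
            return ch.upper(), pooled
    return None, pooled  # rarest class was non-recombinant: no call


middle, pooled = middle_locus(counts)
print(middle, pooled)
```

Here the rarest class is Abc + aBC, in which locus A differs from B and C, so A is placed in the middle, giving the order B A C.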

Page 16: Grouping loci

Ordering Loci by using recombination fractions

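MLEs of r are

$$\hat{r}_{AB} = \frac{11 + 10 + 1 + 0}{100} = 0.22, \qquad \hat{r}_{AC} = \frac{5 + 4 + 1 + 0}{100} = 0.10, \qquad \hat{r}_{BC} = \frac{11 + 10 + 5 + 4}{100} = 0.30$$

Largest r is rBC = 0.3, so B and C are the outer loci: B C

Smallest r is rAC = 0.1, so A lies closest to C: A C

Order: B A C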

Page 17: Grouping loci

Minimum Sum of Adjacent Recombination Frequencies (SARF) (Falk 1989)

$$\mathrm{SARF} = \sum_{i=1}^{l-1} \hat{r}_{a_i a_j}$$

r = recombination frequency between adjacent loci a_i and a_j for a given order: 1, 2, 3, …, l - 1, l

$$\hat{r}_{AB} = \frac{11 + 10 + 1 + 0}{100} = 0.22, \qquad \hat{r}_{AC} = \frac{5 + 4 + 1 + 0}{100} = 0.10, \qquad \hat{r}_{BC} = \frac{11 + 10 + 5 + 4}{100} = 0.30$$

Order   SARF
ABC     0.22 + 0.30 = 0.52
BAC     0.22 + 0.10 = 0.32
ACB     0.10 + 0.30 = 0.40

The B-A-C order gives MIN[SARF] and the minimum distance (MD) map

Simulations have shown that SARF is a reliable method to obtain marker orders for large datasets
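For three loci the SARF criterion can be evaluated exhaustively. The sketch below enumerates each order once (skipping mirror images) and picks the minimum; the helper names `sarf` and `unique_orders` are illustrative, and exhaustive enumeration is only practical for small numbers of loci.

```python
from itertools import permutations

# Pairwise MLEs from the slide, keyed by unordered locus pair.
r_hat = {frozenset("AB"): 0.22, frozenset("AC"): 0.10, frozenset("BC"): 0.30}


def sarf(order, r):
    """Sum of adjacent recombination frequencies for one locus order."""
    return sum(r[frozenset(pair)] for pair in zip(order, order[1:]))


def unique_orders(loci):
    """All orders, keeping only one of each order / mirror-order pair."""
    return [p for p in permutations(loci) if p <= p[::-1]]


scores = {"".join(order): sarf(order, r_hat) for order in unique_orders("ABC")}
best = min(scores, key=scores.get)
print(scores, best)
```

Replacing the sum with a product gives the PARF criterion, and replacing recombination frequencies with LOD scores (and `min` with `max`) gives SALOD.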

Page 18: Grouping loci

Minimum Product of Adjacent Recombination Frequencies (PARF) (Wilson 1988)

$$\mathrm{PARF} = \prod_{i=1}^{l-1} \hat{r}_{a_i a_j}$$

r = recombination frequency between adjacent loci a_i and a_j for a given order: 1, 2, 3, …, l - 1, l

$$\hat{r}_{AB} = 0.22, \qquad \hat{r}_{AC} = 0.10, \qquad \hat{r}_{BC} = 0.30$$

Order   PARF
ABC     0.22 × 0.30 = 0.066
BAC     0.22 × 0.10 = 0.022
ACB     0.10 × 0.30 = 0.030

The B-A-C order gives MIN[PARF] and the minimum distance (MD) map

SARF and PARF are equivalent methods to obtain marker orders for large datasets

Page 19: Grouping loci

Maximum Sum of Adjacent LOD Scores (SALOD)

$$\mathrm{SALOD} = \sum_{i=1}^{l-1} Z_{a_i a_j}$$

Z = LOD score for recombination frequency between adjacent loci a_i and a_j for a given order: 1, 2, 3, …, l - 1, l

$$\hat{r}_{AB} = 0.22,\; Z_{AB} = 3.135; \qquad \hat{r}_{AC} = 0.10,\; Z_{AC} = 6.942; \qquad \hat{r}_{BC} = 0.30,\; Z_{BC} = 1.551$$

Order   SALOD
ABC     3.135 + 1.551 = 4.686
BAC     3.135 + 6.942 = 10.077
ACB     6.942 + 1.551 = 8.493

The B-A-C order gives MAX[SALOD]

SALOD is sensitive to locus informativeness

Page 20: Grouping loci

Minimum Count of Crossover Events (COUNT) (Van Os et al. 2005)

$$\mathrm{COUNT} = \sum_{i=1}^{l-1} X_{a_i a_j}$$

X = simple count of recombination events between adjacent loci a_i and a_j for a given sequence: 1, 2, 3, …, l - 1, l

$$\hat{r}_{AB} = \frac{11 + 10 + 1 + 0}{100} = 0.22, \qquad \hat{r}_{AC} = \frac{5 + 4 + 1 + 0}{100} = 0.10, \qquad \hat{r}_{BC} = \frac{11 + 10 + 5 + 4}{100} = 0.30$$

Order   COUNT
ABC     22 + 30 = 52
BAC     22 + 10 = 32
ACB     10 + 30 = 40

The B-A-C order gives MIN[COUNT]

COUNT is equivalent to SARF and PARF with perfect data. COUNT is superior to SARF with incomplete data
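The COUNT criterion can be computed straight from the genotype counts, without first converting to recombination fractions. The following sketch assumes parental haplotypes ABC and abc and uses an illustrative function name, `crossover_count`; it is a toy version of the idea and not the RECORD program itself.

```python
counts = {"ABC": 34, "ABc": 5, "AbC": 11, "Abc": 0,
          "aBC": 1, "aBc": 10, "abC": 4, "abc": 35}

LOCI = "ABC"  # position of each locus within the haplotype strings


def crossover_count(order, counts):
    """Total number of recombination events between adjacent loci in `order`."""
    total = 0
    for left, right in zip(order, order[1:]):
        i, j = LOCI.index(left), LOCI.index(right)
        # An individual contributes one event when the two adjacent loci
        # carry alleles from different parents (letters differ in case).
        total += sum(f for hap, f in counts.items()
                     if hap[i].isupper() != hap[j].isupper())
    return total


count_by_order = {order: crossover_count(order, counts)
                  for order in ("ABC", "BAC", "ACB")}
best_order = min(count_by_order, key=count_by_order.get)
print(count_by_order, best_order)
```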

Page 21: Grouping loci

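Locus Order: Likelihood Approach

$$L(r_1, r_2, C) = \sum_{i=1}^{k} f_i \log p_i$$

$$Z(r_1, r_2, C) = \sum_{i=1}^{k} f_i \log(4p_i)$$

r1 = Recombination fraction in interval 1
r2 = Recombination fraction in interval 2
C = Coefficient of coincidence
fi = Observed count of the ith pooled phenotypic class
pi = Expected frequency of the ith pooled phenotypic class (pi = fi / n when C is estimated from the data)
i = 1, 2, …, k
k = No. of pooled phenotypic classes

Here log is the base-10 logarithm. The factor 4 in Z compares each pi with the frequency of 1/4 expected for each of the four pooled classes when the three loci assort independently.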
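The two sums defined on this slide translate directly into code. The sketch below is a minimal illustration: the function names are chosen here, base-10 logarithms are assumed, and the example uses the pooled counts 69, 9, 21 and 1 from this deck with p_i = f_i / n, which corresponds to the unconstrained model in which C is estimated from the data.

```python
import math


def log_likelihood(f, p):
    """L = sum of f_i * log10(p_i) over the pooled classes."""
    return sum(fi * math.log10(pi) for fi, pi in zip(f, p))


def lod(f, p):
    """Z = sum of f_i * log10(4 * p_i); 1/4 is the null frequency per class."""
    return sum(fi * math.log10(4 * pi) for fi, pi in zip(f, p))


f = [69, 9, 21, 1]            # pooled observed counts
n = sum(f)
p = [fi / n for fi in f]      # unconstrained model: p_i = f_i / n

L_unconstrained = log_likelihood(f, p)
Z_unconstrained = lod(f, p)
print(L_unconstrained, Z_unconstrained)
```

The two quantities differ only by the constant n · log10(4), so they always rank competing locus orders identically.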

Page 24: Grouping loci

ABC ORDER (r1 = rAB = 0.22, r2 = rBC = 0.30)

Haplotypes   Obs. No.   Freq. (C = 3.18)   Exp. freq.              Exp. freq., C = 0        Exp. freq., C = 1
ABC + abc    f1 = 69    0.69               1 - r1 - r2 + C r1 r2   1 - 0.22 - 0.30 = 0.48   1 - 0.22 - 0.30 + 0.066 = 0.546
ABc + abC    f2 = 9     0.09               r2 - C r1 r2            0.30                     0.30 - 0.066 = 0.234
AbC + aBc    f3 = 21    0.21               C r1 r2                 0.00                     0.066
Abc + aBC    f4 = 1     0.01               r1 - C r1 r2            0.22                     0.22 - 0.066 = 0.154

BAC ORDER (r1 = rBA = 0.22, r2 = rAC = 0.10)

Haplotypes   Obs. No.   Freq. (C = 0.45)   Exp. freq.              Exp. freq., C = 0        Exp. freq., C = 1
ABC + abc    f1 = 69    0.69               1 - r1 - r2 + C r1 r2   1 - 0.22 - 0.10 = 0.68   1 - 0.22 - 0.10 + 0.022 = 0.702
ABc + abC    f2 = 9     0.09               r2 - C r1 r2            0.10                     0.10 - 0.022 = 0.078
AbC + aBc    f3 = 21    0.21               r1 - C r1 r2            0.22                     0.22 - 0.022 = 0.198
Abc + aBC    f4 = 1     0.01               C r1 r2                 0.00                     0.022

ACB ORDER (r1 = rAC = 0.10, r2 = rCB = 0.30)

Haplotypes   Obs. No.   Freq. (C = 3.00)   Exp. freq.              Exp. freq., C = 0        Exp. freq., C = 1
ABC + abc    f1 = 69    0.69               1 - r1 - r2 + C r1 r2   1 - 0.10 - 0.30 = 0.60   1 - 0.10 - 0.30 + 0.03 = 0.63
ABc + abC    f2 = 9     0.09               C r1 r2                 0.00                     0.03
AbC + aBc    f3 = 21    0.21               r2 - C r1 r2            0.30                     0.30 - 0.03 = 0.27
Abc + aBC    f4 = 1     0.01               r1 - C r1 r2            0.10                     0.10 - 0.03 = 0.07
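The coefficient of coincidence quoted for each order in these tables is the observed frequency of that order's double-recombinant class divided by r1 · r2. A short sketch of that calculation follows; the dictionary `dc_class` (which pooled class is the double recombinant when a given locus is in the middle) and the function name `coincidence` are illustrative constructions.

```python
# Observed pooled haplotype frequencies and pairwise MLEs from the slides.
freq = {"ABC+abc": 0.69, "ABc+abC": 0.09, "AbC+aBc": 0.21, "Abc+aBC": 0.01}
r_hat = {frozenset("AB"): 0.22, frozenset("AC"): 0.10, frozenset("BC"): 0.30}

# Double-recombinant class for each possible middle locus: the class in which
# the middle locus differs in parental origin from both flanking loci.
dc_class = {"A": "Abc+aBC", "B": "AbC+aBc", "C": "ABc+abC"}


def coincidence(order):
    """Estimate C for a three-locus order such as 'BAC'."""
    left, mid, right = order
    r1 = r_hat[frozenset((left, mid))]
    r2 = r_hat[frozenset((mid, right))]
    return freq[dc_class[mid]] / (r1 * r2)


c_by_order = {order: round(coincidence(order), 2) for order in ("ABC", "BAC", "ACB")}
print(c_by_order)
```

Only the BAC order gives C below 1; the other two orders would require strong negative interference (C around 3) to explain the data.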

Page 25: Grouping loci

ABC ORDER

Haplotypes   Obs. No.   pi, C = 3.18   pi, C = 1
ABC + abc    f1 = 69    0.69           0.546
ABc + abC    f2 = 9     0.09           0.234
AbC + aBc    f3 = 21    0.21           0.066
Abc + aBC    f4 = 1     0.01           0.154

$$L(r_1, r_2, C) = \sum_{i=1}^{k} f_i \log p_i \qquad Z(r_1, r_2, C) = \sum_{i=1}^{k} f_i \log(4p_i)$$

$$L(0.22, 0.30, 3.18) = 69\log 0.69 + 9\log 0.09 + 21\log 0.21 + 1\log 0.01 = -36.764$$

$$L(0.22, 0.30, 1.0) = 69\log 0.546 + 9\log 0.234 + 21\log 0.066 + 1\log 0.154 = -49.413$$

$$Z(0.22, 0.30, 3.18) = 69\log[(4)(0.69)] + 9\log[(4)(0.09)] + 21\log[(4)(0.21)] + 1\log[(4)(0.01)] = 23.441$$

$$Z(0.22, 0.30, 1.0) = 69\log[(4)(0.546)] + 9\log[(4)(0.234)] + 21\log[(4)(0.066)] + 1\log[(4)(0.154)] = 10.793$$

Page 26: Grouping loci

BAC ORDER

Haplotypes   Obs. No.   pi, C = 0.45   pi, C = 1
ABC + abc    f1 = 69    0.69           0.702
ABc + abC    f2 = 9     0.09           0.078
AbC + aBc    f3 = 21    0.21           0.198
Abc + aBC    f4 = 1     0.01           0.022

$$L(r_1, r_2, C) = \sum_{i=1}^{k} f_i \log p_i \qquad Z(r_1, r_2, C) = \sum_{i=1}^{k} f_i \log(4p_i)$$

$$L(0.22, 0.10, 0.45) = 69\log 0.69 + 9\log 0.09 + 21\log 0.21 + 1\log 0.01 = -36.764$$

$$L(0.22, 0.10, 1.0) = 69\log 0.702 + 9\log 0.078 + 21\log 0.198 + 1\log 0.022 = -37.002$$

$$Z(0.22, 0.10, 0.45) = 69\log[(4)(0.69)] + 9\log[(4)(0.09)] + 21\log[(4)(0.21)] + 1\log[(4)(0.01)] = 23.441$$

$$Z(0.22, 0.10, 1.0) = 69\log[(4)(0.702)] + 9\log[(4)(0.078)] + 21\log[(4)(0.198)] + 1\log[(4)(0.022)] = 23.205$$

Page 27: Grouping loci

ACB ORDER

Haplotypes   Obs. No.   pi, C = 3.00   pi, C = 1
ABC + abc    f1 = 69    0.69           0.63
ABc + abC    f2 = 9     0.09           0.03
AbC + aBc    f3 = 21    0.21           0.27
Abc + aBC    f4 = 1     0.01           0.07

$$L(r_1, r_2, C) = \sum_{i=1}^{k} f_i \log p_i \qquad Z(r_1, r_2, C) = \sum_{i=1}^{k} f_i \log(4p_i)$$

$$L(0.10, 0.30, 3.0) = 69\log 0.69 + 9\log 0.09 + 21\log 0.21 + 1\log 0.01 = -36.764$$

$$L(0.10, 0.30, 1.0) = 69\log 0.63 + 9\log 0.03 + 21\log 0.27 + 1\log 0.07 = -40.648$$

$$Z(0.10, 0.30, 3.0) = 69\log[(4)(0.69)] + 9\log[(4)(0.09)] + 21\log[(4)(0.21)] + 1\log[(4)(0.01)] = 23.441$$

$$Z(0.10, 0.30, 1.0) = 69\log[(4)(0.63)] + 9\log[(4)(0.03)] + 21\log[(4)(0.27)] + 1\log[(4)(0.07)] = 19.558$$

Page 28: Grouping loci

Likelihood method

                Unconstrained model        Constrained model (C = 1)
Order   C       Log likelihood   LOD       Log likelihood   LOD
ABC     3.18    -36.764          23.441    -49.413          10.793
BAC     0.45    -36.764          23.441    -37.001          23.204
ACB     3.00    -36.764          23.441    -40.648          19.558

The B-A-C order gives the highest likelihood and LOD under a no-interference (C = 1) model. When C is unconstrained, the three orders fit the data equally well, so the likelihood cannot distinguish them.

Most multipoint ML mapping algorithms use no-interference models
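The constrained (C = 1) column of the table above can be reproduced by building the expected pooled frequencies for each order from the pairwise MLEs and then evaluating the LOD. This is a sketch under stated assumptions: base-10 logarithms, the illustrative helper names `expected_no_interference` and `lod_no_interference`, and a lookup table `flipped` giving the pooled class in which each locus differs from the other two.

```python
import math

f = {"ABC+abc": 69, "ABc+abC": 9, "AbC+aBc": 21, "Abc+aBC": 1}
r_hat = {frozenset("AB"): 0.22, frozenset("AC"): 0.10, frozenset("BC"): 0.30}

# Pooled class in which the given locus differs from the other two loci.
flipped = {"A": "Abc+aBC", "B": "AbC+aBc", "C": "ABc+abC"}


def expected_no_interference(order):
    """Expected pooled frequencies for a three-locus order with C = 1."""
    left, mid, right = order
    r1 = r_hat[frozenset((left, mid))]
    r2 = r_hat[frozenset((mid, right))]
    return {
        "ABC+abc": (1 - r1) * (1 - r2),   # no crossover
        flipped[left]: r1 * (1 - r2),     # crossover in interval 1 only
        flipped[right]: (1 - r1) * r2,    # crossover in interval 2 only
        flipped[mid]: r1 * r2,            # crossover in both intervals
    }


def lod_no_interference(order):
    p = expected_no_interference(order)
    return sum(f[cls] * math.log10(4 * p[cls]) for cls in f)


lods = {order: lod_no_interference(order) for order in ("ABC", "BAC", "ACB")}
best = max(lods, key=lods.get)
print(lods, best)
```

Ranking by LOD or by log likelihood gives the same answer, because the two differ only by a constant for a fixed sample size.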

Page 29: Grouping loci

Ordering Loci

• GMENDEL (Liu and Knapp 1990) minimizes SARF (Minimum Sum of Adjacent Recombination Frequencies)

• PGRI (Lu and Liu 1995) minimizes SARF (Minimum Sum of Adjacent Recombination Frequencies) or maximizes the likelihood

• RECORD (Van Os et al. 2005) minimizes COUNT (Minimum Count of Crossover Events)

Page 30: Grouping loci

Ordering Loci

• JoinMap 4 (Van Ooijen, 2005)

– Regression mapping: finds the least-squares locus order using a stepwise search

– Monte Carlo maximum likelihood (ML): very fast computation of high-density maps