

Page 1

Genes and MS in Tasmania, cont.

Lecture 5, Statistics 246, February 3, 2004

Page 2

Mapping genes contributing to complex diseases

Page 3

MS susceptibility genes are difficult to map

MS is a complex disease. Analyses with traditional methods such as single marker association studies and standard linkage approaches (affected sib-pairs, pedigrees, etc.) have failed to agree on genomic regions other than the HLA region.

There are a variety of possible reasons for this:
• Allelic and locus heterogeneity (no single gene model fits all)
• Significant environmental influences
• Imprecise phenotyping

Page 4

Linkage vs Association studies

• Linkage mapping: tests for cosegregation of a marker allele with the disease within families

• Association mapping: seeks a marker allele that is present more frequently in cases than in controls; all affected individuals are treated as distant relatives
– Case/control studies
– Transmission disequilibrium test (needs triads)

We will do a quick review of association mapping before turning to our MS study.

Page 5

Linkage disequilibrium

Suppose that we have a marker with just two alleles, M and m say, having frequencies p and 1-p, and a (not necessarily linked) disease locus with alleles D and d, having frequencies q and 1-q. A (haploid) gamete must have one of the four combinations (haplotypes) DM, Dm, dM or dm. Let the frequencies in a population of these four haplotypes be x1, x2, x3 and x4.

Under independence, we would have x1 = pq, etc. Deviation of the observed haplotype frequencies from these products is termed linkage disequilibrium (LD), or, better, gametic association.
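As a small numerical illustration of this definition, the sketch below computes the usual LD coefficient D = x1 - pq from the four haplotype frequencies. The function name `ld_coefficient` and the example frequencies are hypothetical, chosen only for illustration. The marginal frequencies follow the notation above: q = x1 + x2 (allele D) and p = x1 + x3 (allele M). When the four frequencies sum to 1, D also equals x1·x4 - x2·x3.

```python
def ld_coefficient(x1, x2, x3, x4):
    """Return (p, q, D) for haplotype frequencies of DM, Dm, dM, dm.

    p is the frequency of marker allele M, q the frequency of disease
    allele D, and D = x1 - p*q measures the deviation from independence.
    """
    total = x1 + x2 + x3 + x4
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    q = x1 + x2  # haplotypes DM and Dm carry allele D
    p = x1 + x3  # haplotypes DM and dM carry allele M
    return p, q, x1 - p * q


# Hypothetical frequencies: D is about 0.1, so DM is over-represented.
p, q, d = ld_coefficient(0.3, 0.2, 0.1, 0.4)

# Under independence (x1 = pq, etc.) the coefficient is zero.
_, _, d_indep = ld_coefficient(0.2, 0.3, 0.2, 0.3)
```

D = 0 corresponds exactly to x1 = pq, x2 = (1-p)q, x3 = p(1-q) and x4 = (1-p)(1-q).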

If inheriting the allele D at the disease locus increases the chance of getting the disease, and the disease and marker loci are in LD, then the frequencies of the marker alleles M and m will differ between diseased and non-diseased individuals. This observation is the basis of association studies.

Page 6

Case-control studies in genetic epidemiology

Case-control studies compare case and control allele frequencies at markers or candidate genes (the “exposure” variables). All the standard potential drawbacks of such studies apply, with the similarity of the two base populations being the most critical here. It is thought to be relatively easy for samples from racially mixed populations to differ in allele frequencies, and hard to deal with this in the genetic context. Key term: population structure.

If our cases are MS patients, who are our controls? It would be rare for a study to be able to afford or get ethics approval to carry out random sampling of the relevant background population. More commonly, controls are people such as blood donors, whose blood (DNA) has been collected for other purposes. How close will they be to a random sample from the case population?

In an effort to deal with this, the TDT which follows in effect uses untransmitted genotypes as controls, bypassing any population structure.
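To see how population structure can mislead a case-control comparison, here is a minimal sketch with entirely hypothetical numbers. Two subpopulations differ both in marker allele frequency and in disease prevalence, while the marker has no effect on disease within either subpopulation. The function name and all values are assumptions made for illustration.

```python
def case_control_allele_freqs(weights, marker_freqs, prevalences):
    """Marker allele frequency among cases and among controls in a
    structured population.

    weights      -- share of each subpopulation in the whole population
    marker_freqs -- frequency of marker allele M in each subpopulation
    prevalences  -- disease prevalence in each subpopulation
    The marker is assumed independent of disease within a subpopulation.
    """
    case_mass = [w * k for w, k in zip(weights, prevalences)]
    control_mass = [w * (1.0 - k) for w, k in zip(weights, prevalences)]
    case_freq = sum(m * f for m, f in zip(case_mass, marker_freqs)) / sum(case_mass)
    control_freq = sum(m * f for m, f in zip(control_mass, marker_freqs)) / sum(control_mass)
    return case_freq, control_freq


# Hypothetical 50:50 mixture: M is commoner where the disease is commoner.
case_freq, control_freq = case_control_allele_freqs(
    weights=[0.5, 0.5],
    marker_freqs=[0.2, 0.6],
    prevalences=[0.001, 0.003],
)
# case_freq is 0.5 while control_freq is close to 0.4, although the
# marker is unrelated to disease inside each subpopulation.
```

The difference disappears when the prevalences are equal or the marker frequencies are equal, which is why matching the base populations of cases and controls matters so much.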

Page 7

The transmission-disequilibrium test

In its simplest form, the TDT, as it is called, starts with two parents and an affected child, all three typed at a biallelic marker locus, so that we can determine which maternal and paternal alleles were transmitted, and which were not.

For example, if the parents were a1/a2 and a1/a1 , and the affected offspring was a1/a2 , then a2 was transmitted and a1 was not transmitted by the first parent.

From a random sample of such trios (called triads), a 2×2 table can be built up giving the number of times a1 and a2 were transmitted and were not transmitted, respectively, and a simple test can be derived. Many generalizations of this procedure now exist; see notes for Stat 260, 1998, Week 5.
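The counting step can be sketched as follows. The slides do not spell out the test; the sketch assumes the usual McNemar-type form, in which only heterozygous parents are informative and the statistic (b - c)²/(b + c) is compared with a chi-square distribution on 1 degree of freedom, where b and c count transmissions of a1 and of a2. Function names and the example trios are hypothetical.

```python
from itertools import product


def transmissions(parent1, parent2, child):
    """Return (transmitted, untransmitted) allele pairs, one for each
    heterozygous parent, for a trio typed at a biallelic marker."""
    for i, j in product((0, 1), repeat=2):
        if sorted((parent1[i], parent2[j])) == sorted(child):
            pairs = []
            for parent, k in ((parent1, i), (parent2, j)):
                if parent[0] != parent[1]:  # homozygous parents are uninformative
                    pairs.append((parent[k], parent[1 - k]))
            return pairs
    raise ValueError("child genotype is inconsistent with the parents")


def tdt_statistic(trios, allele="a1"):
    """Return (b, c, statistic): b counts transmissions of `allele` by
    heterozygous parents, c counts transmissions of the other allele."""
    b = c = 0
    for parent1, parent2, child in trios:
        for transmitted, _ in transmissions(parent1, parent2, child):
            if transmitted == allele:
                b += 1
            else:
                c += 1
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    return b, c, (b - c) ** 2 / (b + c)


# The trio from the text: the first parent transmits a2 and not a1.
example = (("a1", "a2"), ("a1", "a1"), ("a1", "a2"))
pairs = transmissions(*example)  # [("a2", "a1")]
```

When both parents and the child are all heterozygous, it is not known which parent transmitted which allele, but the counts b and c are the same under either assignment, so the statistic is unaffected.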

Page 8

What is a haplotype?

• A collection of alleles derived from the same chromosome

Figure: genotypes (chromosome phase is unknown) are turned into haplotypes (chromosome phase is known) by haplotype reconstruction.

Page 9

Haplotype mapping

If alleles at a disease locus are associated with alleles at one nearby marker locus on gametes, they are likely to be associated with alleles at other nearby marker loci, and hence with marker haplotypes.

A potentially more powerful way to locate disease genes is to search for associations between marker haplotypes and disease.

There are two possible problems here stemming from the fact that there can be a very large number of marker haplotypes: we may have to deal with very small frequencies, and we have a multiple testing problem.
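A rough sense of the scale of both problems can be had from a short calculation. The sketch below counts the possible haplotypes across a few multiallelic markers and applies a Bonferroni correction. Bonferroni is one simple, conservative choice used here for illustration and is not named in the lecture. The allele counts and function names are hypothetical.

```python
import math


def count_haplotypes(alleles_per_marker):
    """Number of distinct haplotypes possible across the given markers."""
    return math.prod(alleles_per_marker)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error
    rate at or below alpha when n_tests hypotheses are tested."""
    return alpha / n_tests


# Three hypothetical markers with 8 alleles each give 512 haplotypes.
n_haplotypes = count_haplotypes([8, 8, 8])
threshold = bonferroni_threshold(0.05, n_haplotypes)
```

With 512 possible haplotypes, most will be rare or absent in a sample of a few hundred chromosomes, and testing each one separately requires a far stricter per-test level than 0.05.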

Page 10

LD, haplotype mapping and background LD

Searching for common or rare haplotypes in cases alone is one form of association mapping. It has been successful, as very substantial LD can arise around disease loci. In general, controls are necessary, as the background LD can be large.

That is, there can be substantial LD between putative disease gene alleles and alleles of nearby markers, without there being any causal link between the gene and the disease. We call this background LD.

Background LD can be large
– when the population is young
– when the number of founders is small (bottlenecks)
– through admixture of populations

Page 11

Exercises on LD

1. Show that, under a random mating assumption, the long-term values of the frequencies x1, x2, x3 and x4 on page 5 above are pq, (1-p)q, p(1-q) and (1-p)(1-q) (Week 5, Stat 260, 1998).

2. Demonstrate that a mixture (e.g. 50:50) of two populations, each initially in linkage equilibrium (LE) at two loci, will typically not be in LE.

3. Explain why a single mutant arising by chance will initially be in strong LD with alleles at loci near the locus on which it arises.
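The first two exercises can be checked numerically before attempting a proof. The sketch below is only an illustration with hypothetical allele frequencies and function names. It uses the standard result that, under random mating, the LD coefficient D = x1·x4 - x2·x3 shrinks by a factor (1 - r) each generation, where r is the recombination fraction between the two loci.

```python
def le_haplotype_freqs(p, q):
    """Frequencies of DM, Dm, dM, dm in linkage equilibrium, with
    marker allele M at frequency p and disease allele D at frequency q."""
    return (p * q, (1 - p) * q, p * (1 - q), (1 - p) * (1 - q))


def mix(pop_a, pop_b, weight_a=0.5):
    """Haplotype frequencies in a mixture of two populations."""
    return tuple(weight_a * a + (1 - weight_a) * b for a, b in zip(pop_a, pop_b))


def ld(freqs):
    """LD coefficient D = x1*x4 - x2*x3 (frequencies must sum to 1)."""
    x1, x2, x3, x4 = freqs
    return x1 * x4 - x2 * x3


def ld_after(d0, r, generations):
    """D after the given number of generations of random mating."""
    return d0 * (1 - r) ** generations


# Exercise 2: each population is in LE, but the 50:50 mixture is not.
pop_a = le_haplotype_freqs(p=0.1, q=0.2)
pop_b = le_haplotype_freqs(p=0.7, q=0.8)
d_mixture = ld(mix(pop_a, pop_b))  # about 0.09

# Exercise 1: with unlinked loci (r = 0.5) D halves every generation.
d_later = ld_after(d_mixture, r=0.5, generations=10)
```

In this example the mixture has D = w(1-w)(pA - pB)(qA - qB) with w = 0.5, which is zero only if the two populations agree in allele frequency at one of the loci. For tightly linked loci (small r) the decay is slow, which is what makes LD useful for mapping.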

Page 12

Mapping MS genes in Tasmania

Page 13

Tasmania

Area: 67,800 km²
Population: 470,000
Capital city: Hobart (~200,000)

Page 14

Tasmanian Population Growth

1: First settled by Europeans (1803)
2: 24,000 free settlers, 19,000 convicts (1836)
3: Civil registration of births and marriages (1838)
4: End of convict transportation (1853)
5: “The Gold Rush” (1860s)

Page 15

Mapping with haplotype sharing

Timeline: 1800-1850s to 2000 (6-8 generations)

Premise: Tasmanians share large(ish) segments of haplotypes because they are distantly related. Similarly, our MS patients should share these large(ish) segments, but even more so (in size and in number) in regions around MS susceptibility genes.

Page 16

Haplotypes are “eroded” by recombination

Figure labels: ancestral chromosome; time/generations/meioses; MS; 25 cM (SD = 18)

Recombination events can help to map genes with precision, but they erode haplotypes, making them more difficult to detect.

Page 17

What might have happened in the population?

• A mutation arises in, or is introduced to, a population leading to disease (say MS) in those individuals

• The mutation arises on the background of a unique haplotype

• As this mutation spreads through the population (by chance, or inbreeding) so do remnants of this original haplotype by hitchhiking (linkage disequilibrium)

Figure labels: time; ancestral susceptibility haplotype; MS

Page 18

Design of the Tasmanian MS study

Page 19

What strategy could be used to map MS susceptibility genes in Tasmania?

• Too few affected sib pairs/multiplex families for a conventional linkage approach
• Prefer a model-free (non-parametric) approach

A haplotype-based case-control study design seemed appropriate.

Page 20

MS study in Tasmania: design

• Collect as many MS cases with ancestral links to Tasmania as possible, and a suitable (not necessarily equal) number of similar, socioeconomically and geographically matched unrelated controls

• Around each case and each control, collect a constellation of ~ 4 close relatives for (probabilistic) haplotype reconstruction

• Infer genome-wide haplotypes for all cases and controls

• Carry out a case/control study with the haplotypes, seeking regions of the genome shared more by the cases, in comparison with the controls

Page 21

Analysis options

• Transmitted case haplotypes
• Untransmitted case haplotypes
• Transmitted control haplotypes
• Untransmitted control haplotypes

Colour key (original figure): green, hope to find signal; red, hope to find nothing.

Page 22

First mathematical questions

• Resolution of genome-wide scan (length of likely shared chromosomal segments)

• Nature and number of relatives needed to permit the reconstruction of accurate haplotypes with high probability

Page 23

Average length of shared chromosomal segments

Exercise. Assume the Poisson model for crossovers along a chromosome. What are the mean and variance of the length in cM of the chromosomal segments shared by individuals with a common ancestor 7 generations back?
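One way to approach this exercise, sketched under the Poisson (no interference) model: if m meioses separate the two chromosomes being compared, crossovers fall along the chromosome at rate m per Morgan, so the distance from a fixed locus to the nearest crossover on each side is exponential with mean 1/m Morgans, and the shared segment around that locus is the sum of two such distances. The number of meioses m is left as a parameter because it depends on what is being compared (g for an ancestor and a descendant g generations apart, 2g for two descendants of that ancestor). Chromosome ends are ignored, and the function names are hypothetical.

```python
import math
import random
import statistics


def segment_moments_cM(meioses):
    """Mean and SD (in cM) of the shared segment around a fixed locus:
    the sum of two independent exponentials with rate `meioses` per Morgan."""
    mean = 100.0 * 2.0 / meioses
    sd = 100.0 * math.sqrt(2.0) / meioses
    return mean, sd


def simulate_segments_cM(meioses, n=20000, seed=1):
    """Simulate segment lengths (in cM) as left plus right distance to
    the nearest crossover on either side of the locus."""
    rng = random.Random(seed)
    return [
        100.0 * (rng.expovariate(meioses) + rng.expovariate(meioses))
        for _ in range(n)
    ]


mean_8, sd_8 = segment_moments_cM(8)      # 25 cM, SD about 17.7 cM
lengths = simulate_segments_cM(8)
simulated_mean = statistics.mean(lengths)  # close to mean_8
```

With m = 8 the formulas give a mean of 25 cM and an SD of about 17.7 cM, in line with the 25 cM (SD=18) quoted earlier for eroded haplotypes. The variance asked for in the exercise is the square of the SD, that is 2(100/m)² cM².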

Page 24

Nature and number of relatives needed to give accurate haplotypes

Exercise. Explain why it is that when we have both sets of parental genotypes, and the markers are reasonably polymorphic, we can reconstruct an individual’s haplotypes with high probability. What are the difficult cases?

If we have no parents, or just one parent, and the genotypes of grandparents, siblings or offspring are available, which are most informative for an individual’s haplotype reconstruction?

Page 25

Reconstructing haplotypes from genotypes

• Observe genotyping data for an individual

At marker 1 : (1,3)

At marker 2 : (b,d)

• Reconstruct the haplotype by inferring recombination events from genotypes of relatives

At marker 1 : Mum (1,2) Dad (3,4)

At marker 2 : Mum (a,b) Dad (c,d)

Reconstructed haplotypes (Marker 1, Marker 2): 1-b and 3-d
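The example above can be worked through in code. This is a minimal sketch of phasing a child from both parents' genotypes, marker by marker, and not the probabilistic reconstruction used in the study. The function name is hypothetical, and a marker that cannot be phased from the parents alone is returned as `None`.

```python
def phase_from_parents(child, mum, dad):
    """Split a child's genotypes into maternal and paternal haplotypes.

    Each argument is a list of (allele, allele) pairs, one per marker.
    Returns (maternal, paternal) lists; an entry is None where the
    parental origin of the child's alleles is ambiguous.
    """
    maternal, paternal = [], []
    for c, m, d in zip(child, mum, dad):
        # Try both orderings: (from mum, from dad).
        options = {(a, b) for a, b in (c, c[::-1]) if a in m and b in d}
        if not options:
            raise ValueError("child genotype is inconsistent with the parents")
        if len(options) > 1:
            maternal.append(None)  # e.g. all three share one heterozygous genotype
            paternal.append(None)
        else:
            from_mum, from_dad = next(iter(options))
            maternal.append(from_mum)
            paternal.append(from_dad)
    return maternal, paternal


# Genotypes from the example: marker 1, then marker 2.
child = [("1", "3"), ("b", "d")]
mum = [("1", "2"), ("a", "b")]
dad = [("3", "4"), ("c", "d")]
maternal, paternal = phase_from_parents(child, mum, dad)
# maternal is ["1", "b"] and paternal is ["3", "d"]
```

With polymorphic markers the parents rarely share alleles, so almost every marker phases unambiguously. The ambiguous case in this sketch is a marker at which the child and both parents carry the same heterozygous genotype.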

Page 26

Genotyping

Use STR (short tandem repeat) markers, also known as microsatellite markers

…AGCTAGCGCGC….GCGCGGCATTA…

…AGCTAGCGCGC….GCGCGGCGCATTA…

Eventual plan: 5 cM genome wide scan (~ 800 markers) with dinucleotide STRs