comparative high-resolution analysis of linkage disequilibrium and tag single nucleotide...

23
1 Comparative high-resolution analysis of linkage disequilibrium and tag single nucleotide polymorphisms between populations in the vitamin D receptor gene. Sergey Nejentsev 1 , Lisa Godfrey 1 , Hywel Snook 1 , Helen Rance 1 , Sarah Nutland 1 , Neil M. Walker 1 , Alex C. Lam 1 , Cristian Guja 2 , Constantin Ionescu-Tirgoviste 2 , Dag E. Undlien 3 , Kjersti S. Rønningen 4 , Eva Tuomilehto-Wolf 5 , Jaakko Tuomilehto 5,6 , Melanie J. Newport 7 , David G. Clayton 1 and John A. Todd 1 . 1 Juvenile Diabetes Research Foundation / Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK 2 Clinic of Diabetes, Institute of Diabetes, Nutrition and Metabolic Diseases “N. Paulescu”, Bucharest, Romania 3 Institute of Medical Genetics, Ulleval University Hospital, University of Oslo, Oslo, Norway 4 Laboratory of Molecular Epidemiology, Division of Epidemiology, Norwegian Institute of Public Health, Oslo, Norway 5 Diabetes and Genetic Epidemiology Unit, National Public Health Institute, Helsinki, Finland 6 Department of Public Health, University of Helsinki, Helsinki, Finland 7 Department of Medicine, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK Correspondence to: Sergey Nejentsev MD, PhD, JDRF/WT Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Wellcome Trust/MRC building, Addenbrooke's Hospital, Cambridge, CB2 2XY, UK. Tel: +44 (0) 1223 762106, Fax: +44 (0) 1223 762102, E-mail: [email protected] Copyright © 2004 Oxford University Press HMG Advance Access published June 2, 2004 by guest on June 2, 2013 http://hmg.oxfordjournals.org/ Downloaded from

Upload: independent

Post on 17-Nov-2023

0 views

Category:

Documents


0 download

TRANSCRIPT

1

Comparative high-resolution analysis of linkage disequilibrium and

tag single nucleotide polymorphisms between populations in the

vitamin D receptor gene.

Sergey Nejentsev1, Lisa Godfrey1, Hywel Snook1, Helen Rance1, Sarah Nutland1, Neil

M. Walker1, Alex C. Lam1, Cristian Guja2, Constantin Ionescu-Tirgoviste2, Dag E.

Undlien3, Kjersti S. Rønningen4, Eva Tuomilehto-Wolf5, Jaakko Tuomilehto5,6,

Melanie J. Newport7, David G. Clayton1 and John A. Todd1.

1Juvenile Diabetes Research Foundation / Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK 2Clinic of Diabetes, Institute of Diabetes, Nutrition and Metabolic Diseases “N. Paulescu”, Bucharest, Romania 3Institute of Medical Genetics, Ulleval University Hospital, University of Oslo, Oslo, Norway 4Laboratory of Molecular Epidemiology, Division of Epidemiology, Norwegian Institute of Public Health, Oslo, Norway 5Diabetes and Genetic Epidemiology Unit, National Public Health Institute, Helsinki, Finland 6Department of Public Health, University of Helsinki, Helsinki, Finland 7Department of Medicine, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK Correspondence to: Sergey Nejentsev MD, PhD, JDRF/WT Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Wellcome Trust/MRC building, Addenbrooke's Hospital, Cambridge, CB2 2XY, UK. Tel: +44 (0) 1223 762106, Fax: +44 (0) 1223 762102, E-mail: [email protected]

Copyright © 2004 Oxford University Press

HMG Advance Access published June 2, 2004 by guest on June 2, 2013

http://hmg.oxfordjournals.org/

Dow

nloaded from

2

Abstract

A genome-wide map of single nucleotide polymorphisms (SNPs) and a pattern of

linkage disequilibrium (LD) between their alleles is being established in three main

ethnic groups. An important question is the applicability of such maps to different

populations within a main ethnic group. Therefore, we have developed high-

resolution SNP, haplotype and LD maps of the vitamin D receptor gene region in

large samples from five populations. Comparative analysis reveals that LD patterns

are identical in all four European populations tested with two small regions of 1.3 kb

and 5.7 kb at which LD is completely disrupted resulting in three block-like regions

over which there is significant and extensive LD. In an African population the pattern

is similar, but two additional LD-breaking spots are also apparent. This LD pattern

suggests combined action of recombination hotspots and founder effects, but cannot

be explained by random recombination and genetic drift only. Direct comparison

indicates that tag SNPs selected in one European population effectively predict non-

tag SNPs in other Europeans, but not in the Gambians, for this region.

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

3

Introduction

Genetic association studies aim to identify human genome sequence variants that

cause complex diseases. The human genome contains regions in which single

nucleotide polymorphism (SNP) alleles are distributed non-randomly, i.e. are in

linkage disequilibrium (LD). These regions, known as LD or haplotype blocks, have

elevated local LD and limited haplotype diversity (1, 2). Because of the underlying

LD structure all common sequence variation in a given genome region may be

captured by a limited number of tag SNPs that are used to scan genomic regions for

disease association and reduce redundancy and costs of association studies (3, 4).

Recently, the HapMap project was launched to establish LD patterns and to catalogue

tag SNPs in the entire human genome in three main ethnic groups (5).

The origin of the LD blocks in the human genome is not completely understood.

Using direct studies of recombination in sperm Jeffreys and colleagues have shown

that boundaries of the LD blocks co-localise with recombination hotspots at least in

some areas of the genome (6, 7). Recent evidence suggests the presence of

recombination hotspots across the human genome every ~200 kb on average (8).

Recombination hotspots should determine similar LD block structure in different

human populations, unless recombination rates vary between populations. However,

computer simulations suggest that the block-like structure of the human genome may

have been generated by uniform recombination and random genetic drift (9-11). This

model implies that LD block boundaries may vary significantly between populations

that will complicate genetic association studies. Therefore, it is not clear how

informative LD block structure and tag SNPs established by the HapMap project,

would be in various human populations, particularly those not represented in the

HapMap panel (5). Studies of genomic regions in different populations featuring large

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

4

sample size and dense SNP maps may provide answers to these questions. Moreover,

comparative inter-population analysis of LD blocks will help to understand their

evolutionary origin.

Here we have developed and studied a high-resolution SNP map of the vitamin D

receptor (VDR) gene region on chromosome 12q12-q14. VDR is a protein that

mediates the effects of vitamin D3 in bone and mineral metabolism, regulation of

growth and differentiation in many target tissues and acts as a modulator of the

immune system (12). The VDR gene contains eight protein-coding exons 2 – 9, six

untranslated exons 1a – 1f, which are alternatively spliced, and several promoter

regions (13, 14). Four common VDR SNPs were intensively studied for association

with various human traits and were reported to affect risk of osteoporosis, breast and

prostate cancers, and immune-mediated disorders (15). However, the association

observed is inconsistent in different studies suggesting that the SNPs tested may

merely be markers in LD with true causal variant(s), which remain unknown.

Construction of comprehensive SNP, haplotype and LD maps of the VDR gene will

facilitate association studies and fine mapping of the causal sequence variants for a

range of diseases.

Results

In total we sequenced 94 kb in a 164 kb region on chromosome 12q12-q14 around the

VDR gene and found 245 SNPs, of which 128 were not present in the dbSNP

database build 119 (Supplementary Table 1). We then genotyped 98 SNPs, i.e. one

SNP per 1.7 kb on average, in 458 families from Great Britain. We calculated allele

frequency in 916 parents and noticed 68 common SNPs with minor allele frequency

(MAF) >10% (Figure 1a). For these SNPs we calculated pairwise |D’| (Figure 1b), r2,

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

5

and P values. A large sample of 916 subjects allowed us high confidence in estimating

LD. Three separate regions could be distinguished in the VDR gene and were

designated LD blocks A, B and C; SNPs in each of these blocks are in LD with SNPs

located inside a block, but show very little, if any, LD with SNPs located in other

blocks (Figure 1). Within blocks B and C, regions of markedly higher LD can be also

distinguished; we refer to them as B1, B2, C1, C2 and C3. However, their boundaries

cannot be established unambiguously, because considerable LD exists between SNPs

in these regions (Figure 1).

Block A localises 3’ to the VDR exon 9 and spans at least 10.5 kb. VDR exons 3 - 9

localise in the block B, which spans 40.8 kb. A 5.7 kb LD-breaking spot, located 3’ of

the VDR gene separates blocks A and B. Blocks B and C are separated by another 1.3

kb LD-breaking spot; it includes VDR exon 2 and a commonly studied FokI SNP,

which is the only SNP out of the 68 that has no detectable LD with any other SNP and

cannot be assigned to any block. All non-coding VDR exons localise in at least 92.9

kb block C. Both blocks A and C have one of their boundaries outside the region

studied.

In order to compare LD block structure in different populations we tested all 68 VDR

SNPs, which are common in Britons, in four other population samples, three

European and one African. We found 13 SNPs with MAF < 10% in, at least one,

usually Gambian, population sample. We restricted comparative analysis to 55 SNPs

common in all five populations. This provides consistency and avoids bias in the

conclusions due to the analysis of SNPs underrepresented in some populations. The

location of the two LD-breaking spots, which separate blocks A, B and C, was

identical in all five populations studied (Figure 2). However, in the Gambians we

additionally found separate blocks C1, C2 and C3, which correspond to the regions of

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

6

higher LD within block C in Europeans. Accordingly, LD-breaking spots in the

Gambians co-localise with spots of decreased LD within block C in Europeans, but

clearly more LD extends over these spots in Europeans (Figure 2). In contrast, block

structure in all four European populations is remarkably similar.

We then reconstructed haplotypes from the 55 SNPs in each population separately and

found similar haplotype diversity in Europeans, which is a fraction of haplotype

diversity in the Gambians (full list of haplotypes is available in the Supplementary

Table 2). In each of the five populations we then selected a minimal set of tag SNPs,

which have r2 ≥ 0.8 with all non-tag SNPs (tag SNPs for each population are

highlighted in the Supplementary Table 2). As predicted by a higher haplotype

diversity, we found that notably more tag SNPs are required to capture common

variation in the Gambians compared to Europeans: 42 versus 24-26, respectively.

Then we tested how efficiently tag SNPs selected in one population sample predict

non-tag SNPs in the four other populations. We found that tag SNPs selected in each

of the European population samples usually effectively predict non-tag SNPs in all

other Europeans, but not in the Gambians (Figure 3). Conversely, tag SNPs selected

in the Gambian population predict non-tag SNPs in Europeans rather well. However,

this is achieved at a cost of a larger number of tag SNPs in the Gambians.

Similarly, in the four European population samples we have analysed tag SNPs

selected from all 68 VDR SNPs that were common in Britons (MAF > 10%). All these

68 SNPs were also common in other Europeans (MAF > 8 %). We found a similar

pattern: tag SNPs selected in each of the four European population samples

effectively predict non-tag SNPs in the other samples (Supplementary Figure). Tag

SNPs selected among 68 SNPs were more efficient than tag SNPs selected among 55

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

7

SNPs, because extra tag SNPs capture additional information about some of the non-

tag SNPs.

We then investigated how informative was the analysis of the four classically typed

SNPs FokI, ApaI, TaqI and BsmI in previous studies of the VDR gene region aimed to

discover disease association. FokI is not in LD with any other common SNP and,

therefore, was not informative for capturing information from those SNPs. The three

other SNPs localised to block B and do not capture any information about SNPs in

other blocks. We then tested ApaI, TaqI and BsmI as tag SNPs for other block B SNPs

and found that out of the 29 SNPs in block B, which are common in Britons, these

three SNPs predict only 11 with r2 > 0.8 and 15 SNPs with r2 ranging between 0.05

and 0.43. Therefore, in various disease association studies that genotyped FokI, ApaI,

TaqI and BsmI only, information about a large fraction of common SNPs in the VDR

region was not captured.

Discussion

Here we constructed a dense SNP, haplotype and LD maps of the VDR gene region in

five populations. Given multiple association studies and high interest in the VDR gene

these maps provide a basis for future research of the VDR sequence polymorphism in

human diseases.

Structural analysis of LD blocks in the VDR gene region in various populations

allows insight in the mechanism of their origin and in the history of the populations

tested. Identical boundaries of blocks A and B in all five populations indicate general

mechanism that acted in both African and European populations. Potentially it may be

explained either by recombination hotspots or by random crossovers and drift, which

led to a block structure in the common ancestors that was fixed ever since. However,

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

8

additional LD appeared in the history of all European populations. Such LD may have

been created by a bottleneck or a founder effect after divergence of the ancestors of all

modern Europeans from the ancestors of the Gambians (16, 17). This LD pattern is

not ubiquitous in the region but is restricted to the LD blocks, does not extend across

major LD-breaking spots between them, and, therefore, argues in favour of localised

recombination.

The mechanism that may have led to the observed LD block structure is suggested by

the observation that recombination hotspots vary dramatically in recombination

intensity and both highly and weakly active hotspots exist (6, 8, 18). Thus, highly

active recombination hotspots probably manifest by breaking LD in all populations

(e.g. LD-breaking spots between blocks A, B and C, Figure 2). In contrast, weakly

active recombination hotspots occasionally may manifest only in ancient and diverse

African populations, such as Gambian, that have accumulated and retained enough

recombinant chromosomes (18), while in the Europeans residual LD may still be

present over a weakly active recombination hotspot (e.g. LD breaking spots within

block C, Figure 2). Such a model is consistent with a well-known shorter average LD

in African populations in comparison to European populations (19, 20). Direct

recombination analysis is required to test it in the VDR gene region. Nevertheless,

highly structured pattern of the VDR LD blocks in different populations can be

explained by combined action of recombination hotspots of varying intensity and

demographic history events, but argues against the hypothesis that LD blocks may

have been generated by randomly distributed crossovers and genetic drift only (9-11).

Weakly active recombination hotspots localised inside LD blocks, rather than random

recombination, may explain a phenomenon of decreasing LD with distance within a

block (21).

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

9

The remarkable similarity of LD patterns in all European populations tested is in line

with the model assuming that no major founder effects occurred at the origin of

specific farming communities in Europe (22). A bottleneck suggested at the founding

of the Finnish population (23) left no significant trace of additional LD in the VDR

gene. Consequently, studies of the Finnish population provide no major advantage of

stronger LD between SNPs compared to other Europeans (24).

Our direct inter-population comparison shows that tag SNPs selected in one European

population sample usually effectively predict non-tag SNPs in other Europeans, but

not in the Gambians (Figure 3), at least for this chromosome region. This observation

implies that tag SNPs selected in the CEPH families of European origin in the

HapMap project (5) may be applied in other Europeans with only moderate loss of

power. However, comparative analysis of multiple genomic regions in various

populations is required to answer this question confidently. As expected, our data

show that tag SNPs should be established separately for populations of the European

and African origin.

This study of the VDR region illustrates the relative advantages of various populations

for genetic associating mapping. Initial disease association is easier to detect in

Europeans than in Africans, because fewer tag SNPs are required to characterise

common variation (Figure 3). However, should association be detected, subsequent

fine mapping experiments will require testing of SNPs in an LD block region that

often will be smaller in Africans than in Europeans (Figure 2). Provided that Africans

have higher haplotype diversity, analysis of additional haplotypes may further help

fine mapping of a causal variant. Therefore, genetic analysis in populations of

different origin potentiates studies of complex diseases.

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

10

Materials and Methods

Subjects. We searched for SNPs by sequencing eight unrelated Britons and also

sequenced coding VDR exons 2 – 9 in 40 individuals from the same population to find

additional rarer variants. We genotyped 285 new-borns from The Gambia, 458 British

families (916 parents), 90 Norwegian families (180 parents), 63 Finnish families (126

parents), and 64 Romanian families (128 parents). The Gambian population sample

comprises 285 unrelated new-borns collected in five centres across the country. All

European subjects were Caucasian. All families have type 1 diabetes offspring. LD

and haplotypes are similar for the chromosomes transmitted and non-transmitted to

the affected offspring and, therefore, their combined analysis is shown. We obtained

permission from the relevant ethical committees and informed consent from all

participating subjects.

Sequencing. We designed primers using Primer3 (http://www.broad.mit.edu/cgi-

bin/primer/primer3_www.cgi), amplified genomic DNA using protocol described

elsewhere (25) and sequenced 500-700 base pair PCR fragments with an ABI Big Dye

Terminator v2 kit and an ABI 3700 capillary sequencer (ABI, Foster City, CA). We

sequenced 94 kb in a 164 kb region on chromosome 12q12-q14 around the VDR

structural gene, between –34,045 bp upstream of the exon 1f and 27,338 bp

downstream exon 9, allowing for the possibility of uncharacterised long range

regulatory elements. We found 245 VDR SNPs using the Staden sequence analysis

package (26). Of these 128 SNPs were not present in the dbSNP build 119. All 245

SNPs are listed in the Supplementary Table 1 with the position corresponding to

Human Genome build 34.

Genotyping. We selected a total of 98 out of 245 SNPs distributed evenly to avoid

large gaps in the SNP map, one SNP per 1.7 kb on average. The density was higher

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

11

around exons and lower in the 5’ and 3’ UTRs. Four commonly studied SNPs, FokI,

ApaI, TaqI and BsmI were also included. We genotyped 98 SNPs using Invader (Third

Wave Technologies, Madison, WI), TaqMan (Perkin Elmer Applied Biosystems,

Foster City, CA) or BeadArray (Illumina Inc, San Diego, CA) assays. We tested

genotype frequency for each SNP in each population with Arlequin version 2.000

(http://lgb.unige.ch/arlequin/) and found no deviation from the Hardy-Weinberg

equilibrium (P > 0.01).

Statistical analysis. We calculated pairwise r2 (27), |D’| (28), P values and generated

graphical images with GOLD (29) and edited images using GIMP version 1.2.3

(http://www.gimp.org). We reconstructed haplotypes using expectation-maximization

algorithm within SNPHAP version 0.2.1 (http://www-

gene.cimr.cam.ac.uk/clayton/software/) and then selected a minimal set of tag SNPs

that have r2 ≥ 0.8 with all non-tag SNPs in each of the five populations using ‘htstep’

and ‘htsearch’ programs within STATA version 8.1 (http://www.stata.com). Then we

calculated r2 between tag SNPs and non-tag SNPs, which they predict, in other four

population samples using ‘lpredict’ program written by David Clayton for STATA

version 8.1.

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

12

ACKNOWLEDGEMENTS

We thank William Wang and Jason Cooper for interesting and productive discussions.

This work was funded by the Wellcome Trust, the Juvenile Diabetes Research

Foundation International, the Academy of Finland, the Sigrid Juselius Foundation and

the Novo Nordisk Foundation. Genotypic data will be available on request to

researchers on the basis of a Data Access Agreement. Please refer to our web site

(https://www-gene.cimr.cam.ac.uk/todd/access-agreement.html) and contact Neil

Walker ([email protected]).

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

13

References

1. Daly, M.J., Rioux, J.D., Schaffner, S.F., Hudson, T.J. and Lander, E.S. (2001)

High-resolution haplotype structure in the human genome. Nat. Genet., 29,

229-232.

2. Goldstein, D.B. (2001) Islands of linkage disequilibrium. Nat. Genet., 29, 109-

111.

3. Johnson, G.C., Esposito, L., Barratt, B.J., Smith, A.N., Heward, J., Di Genova,

G., Ueda, H., Cordell, H.J., Eaves, I.A., Dudbridge, F. et al. (2001) Haplotype

tagging for the identification of common disease genes. Nat. Genet., 29, 233-

237.

4. Chapman, J.M., Cooper, J.D., Todd, J.A. and Clayton, D.G. (2003) Detecting

disease associations due to linkage disequilibrium using haplotype tags: a class

of tests and the determinants of statistical power. Hum. Hered., 56, 18-31.

5. The International HapMap Consortium (2003) The International HapMap

Project. Nature, 426, 789-796.

6. Jeffreys, A.J., Kauppi, L. and Neumann, R. (2001) Intensely punctate meiotic

recombination in the class II region of the major histocompatibility complex.

Nat. Genet., 29, 217-222.

7. May, C.A., Shone, A.C., Kalaydjieva, L., Sajantila, A. and Jeffreys, A.J.

(2002) Crossover clustering and rapid decay of linkage disequilibrium in the

Xp/Yp pseudoautosomal gene SHOX. Nat. Genet., 31, 272-275.

8. McVean, G.A., Myers, S.R., Hunt, S., Deloukas, P., Bentley, D.R. and

Donnelly, P. (2004) The fine-scale structure of recombination rate variation in

the human genome. Science, 304, 581-4.

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

14

9. Wang, N., Akey, J.M., Zhang, K., Chakraborty, R. and Jin, L. (2002)

Distribution of recombination crossovers and the origin of haplotype blocks:

the interplay of population history, recombination, and mutation. Am. J. Hum.

Genet., 71, 1227-1234.

10. Zhang, K., Akey, J.M., Wang, N., Xiong, M., Chakraborty, R. and Jin, L.

(2003) Randomly distributed crossovers may generate block-like patterns of

linkage disequilibrium: an act of genetic drift. Hum. Genet., 113, 51-59.

11. Phillips, M.S., Lawrence, R., Sachidanandam, R., Morris, A.P., Balding, D.J.,

Donaldson, M.A., Studebaker, J.F., Ankener, W.M., Alfisi, S.V., Kuo, F.S. et

al. (2003) Chromosome-wide distribution of haplotype blocks and the role of

recombination hot spots. Nat. Genet., 33, 382-387.

12. Jones, G., Strugnell, S.A. and DeLuca, H.F. (1998) Current understanding of

the molecular actions of vitamin D. Physiol. Rev., 78, 1193-1231.

13. Baker, A.R., McDonnell, D.P., Hughes, M., Crisp, T.M., Mangelsdorf, D.J.,

Haussler, M.R., Pike, J.W., Shine, J. and O'Malley, B.W. (1988) Cloning and

expression of full-length cDNA encoding human vitamin D receptor. Proc.

Natl. Acad. Sci. U S A, 85, 3294-3298.

14. Crofts, L.A., Hancock, M.S., Morrison, N.A. and Eisman, J.A. (1998)

Multiple promoters direct the tissue-specific expression of novel N-terminal

variant human vitamin D receptor gene transcripts. Proc. Natl. Acad. Sci. U S

A, 95, 10529-10534.

15. Zmuda, J.M., Cauley, J.A. and Ferrell, R.E. (2000) Molecular epidemiology of

vitamin D receptor gene variants. Epidemiol. Rev., 22, 203-217.

16. Cann, R.L., Stoneking, M. and Wilson, A.C. (1987) Mitochondrial DNA and

human evolution. Nature, 325, 31-36.

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

15

17. Stringer, C.B. and Andrews, P. (1988) Genetic and fossil evidence for the

origin of modern humans. Science, 239, 1263-1268.

18. Kauppi, L., Sajantila, A. and Jeffreys, A.J. (2003) Recombination hotspots

rather than population history dominate linkage disequilibrium in the MHC

class II region. Hum. Mol. Genet., 12, 33-40.

19. Reich, D.E., Cargill, M., Bolk, S., Ireland, J., Sabeti, P.C., Richter, D.J.,

Lavery, T., Kouyoumjian, R., Farhadian, S.F., Ward, R. et al. (2001) Linkage

disequilibrium in the human genome. Nature, 411, 199-204.

20. Gabriel, S.B., Schaffner, S.F., Nguyen, H., Moore, J.M., Roy, J., Blumenstiel,

B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M. et al. (2002) The

structure of haplotype blocks in the human genome. Science, 296, 2225-2229.

21. Shifman, S., Kuypers, J., Kokoris, M., Yakir, B. and Darvasi, A. (2003)

Linkage disequilibrium patterns of the human genome across populations.

Hum. Mol. Genet., 12, 771-776.

22. Barbujani, G. and Bertorelle, G. (2001) Genetics and the population history of

Europe. Proc. Natl. Acad. Sci. U S A, 98, 22-25.

23. Sajantila, A., Salem, A.H., Savolainen, P., Bauer, K., Gierig, C. and Paabo, S.

(1996) Paternal and maternal DNA lineages reveal a bottleneck in the

founding of the Finnish population. Proc. Natl. Acad. Sci. U S A, 93, 12035-

12039.

24. Eaves, I.A., Merriman, T.R., Barber, R.A., Nutland, S., Tuomilehto Wolf, E.,

Tuomilehto, J., Cucca, F. and Todd, J.A. (2000) The genetically isolated

populations of Finland and sardinia may not be a panacea for linkage

disequilibrium mapping of common disease genes. Nat. Genet., 25, 320-323.

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

16

25. Ueda, H., Howson, J.M., Esposito, L., Heward, J., Snook, H., Chamberlain,

G., Rainbow, D.B., Hunter, K.M., Smith, A.N., Di Genova, G. et al. (2003)

Association of the T-cell regulatory gene CTLA4 with susceptibility to

autoimmune disease. Nature, 423, 506-511.

26. Staden, R., Beal, K.F. and Bonfield, J.K. (2000) The Staden package, 1998.

Methods. Mol. Biol., 132, 115-130.

27. Pritchard, J.K. and Przeworski, M. (2001) Linkage disequilibrium in humans:

models and data. Am. J. Hum. Genet., 69, 1-14.

28. Lewontin, R.C. (1964) The Interaction of Selection and Linkage. I. General

considerations: heterotic models. Genetics, 49, 49-67.

29. Abecasis, G.R. and Cookson, W.O. (2000) GOLD--graphical overview of

linkage disequilibrium. Bioinformatics, 16, 182-183.

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

17

Figure 1. High-resolution linkage disequilibrium analysis of the VDR gene

region.

a – Exons (black boxes) and SNPs (black lines).

b – Pairwise |D’|. SNP numbers correspond to numbers in the Supplementary Table 1

SNPs with MAF > 10% are shown.

c – LD blocks (black bars), unassigned FokI SNP (red dot) located in an LD-breaking

spot between blocks and the size of the LD-breaking spots (in kb) are shown.

Figure 2. Linkage disequilibrium structure in the VDR region in five

populations.

Pair-wise |D’|, LD blocks (black bars), location of the LD-breaking spots seen in all

five populations (thick arrows) and LD-breaking spots apparent in the Gambians only

(thin arrows) are shown.

Figure 3. Inter-population comparison of the tag SNPs.

Among 55 VDR SNPs a set of tag SNPs was selected in one of the five populations (X

axis) so that tag SNPs capture all non-tag SNPs with r2 ≥ 0.8. Then r2 between tag

SNPs and non-tag SNPs (Y axis) was calculated in the four other populations (British

– red, Finnish – blue, Norwegian – green, Romanian – yellow, Gambian – black).

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

18

Supplementary Figure. Inter-population comparison of the tag SNPs selected

among 68 VDR SNPs.

Among 68 VDR SNPs a set of tag SNPs was selected in one of the five populations (X

axis) so that tag SNPs capture all non-tag SNPs with r2 ≥ 0.8. Then r2 between tag

SNPs and non-tag SNPs (Y axis) was calculated in the four other populations (British

– red, Finnish – blue, Norwegian – green, Romanian – yellow).

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

19

Figure 1.

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

20

Figure 2.

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

21

Figure 3.

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

22

Supplementary Figure.

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from

23

Abbreviations

LD, linkage disequilibrium

MAF, minor allele frequency

SNP, single nucleotide polymorphism

VDR, vitamin D receptor

by guest on June 2, 2013http://hm

g.oxfordjournals.org/D

ownloaded from