robust physical methods that enrich genomic regions identical by descent for linkage studies:...

14
BioMed Central Page 1 of 14 (page number not for citation purposes) BMC Genetics Open Access Methodology article Robust physical methods that enrich genomic regions identical by descent for linkage studies: confirmation of a locus for osteogenesis imperfecta Peter Brooks* 1 , Charles Marcaillou 1 , Maud Vanpeene 1 , Jean-Paul Saraiva 1 , Daniel Stockholm 2 , Stephan Francke 3 , Reyna Favis 3 , Nadine Cohen 3 , Francis Rousseau 1 , Frédéric Tores 1 , Pierre Lindenbaum 1 , Jörg Hager 1 and Anne Philippi 1 Address: 1 IntegraGen SA, 4 rue Pierre Fontaine, 91000 Evry, France, 2 Généthon, CNRS UMR8115, 1 rue de l'Internationale, Evry 91000, France and 3 Department of Pharmacogenomics, Johnson & Johnson PRD, PO Box 300, Raritan, New Jersey 08869, USA Email: Peter Brooks* - [email protected]; Charles Marcaillou - [email protected]; Maud Vanpeene - [email protected]; Jean-Paul Saraiva - [email protected]; Daniel Stockholm - [email protected]; Stephan Francke - [email protected]; Reyna Favis - [email protected]; Nadine Cohen - [email protected]; Francis Rousseau - [email protected]; Frédéric Tores - [email protected]; Pierre Lindenbaum - [email protected]; Jörg Hager - [email protected]; Anne Philippi - [email protected] * Corresponding author Abstract Background: The monogenic disease osteogenesis imperfecta (OI) is due to single mutations in either of the collagen genes ColA1 or ColA2, but within the same family a given mutation is accompanied by a wide range of disease severity. Although this phenotypic variability implies the existence of modifier gene variants, genome wide scanning of DNA from OI patients has not been reported. Promising genome wide marker-independent physical methods for identifying disease- related loci have lacked robustness for widespread applicability. Therefore we sought to improve these methods and demonstrate their performance to identify known and novel loci relevant to OI. Results: We have improved methods for enriching regions of identity-by-descent (IBD) shared between related, afflicted individuals. The extent of enrichment exceeds 10- to 50-fold for some loci. The efficiency of the new process is shown by confirmation of the identification of the Col1A2 locus in osteogenesis imperfecta patients from Amish families. Moreover the analysis revealed additional candidate linkage loci that may harbour modifier genes for OI; a locus on chromosome 1q includes COX-2, a gene implicated in osteogenesis. Conclusion: Technology for physical enrichment of IBD loci is now robust and applicable for finding genes for monogenic diseases and genes for complex diseases. The data support the further investigation of genetic loci other than collagen gene loci to identify genes affecting the clinical expression of osteogenesis imperfecta. The discrimination of IBD mapping will be enhanced when the IBD enrichment procedure is coupled with deep resequencing. Published: 30 March 2009 BMC Genetics 2009, 10:16 doi:10.1186/1471-2156-10-16 Received: 13 September 2007 Accepted: 30 March 2009 This article is available from: http://www.biomedcentral.com/1471-2156/10/16 © 2009 Brooks et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Upload: peter-brooks

Post on 06-Jul-2016

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Robust physical methods that enrich genomic regions identical by descent for linkage studies: confirmation of a locus for osteogenesis imperfecta

BioMed CentralBMC Genetics

ss

Open AcceMethodology articleRobust physical methods that enrich genomic regions identical by descent for linkage studies: confirmation of a locus for osteogenesis imperfectaPeter Brooks*1, Charles Marcaillou1, Maud Vanpeene1, Jean-Paul Saraiva1, Daniel Stockholm2, Stephan Francke3, Reyna Favis3, Nadine Cohen3, Francis Rousseau1, Frédéric Tores1, Pierre Lindenbaum1, Jörg Hager1 and Anne Philippi1

Address: 1IntegraGen SA, 4 rue Pierre Fontaine, 91000 Evry, France, 2Généthon, CNRS UMR8115, 1 rue de l'Internationale, Evry 91000, France and 3Department of Pharmacogenomics, Johnson & Johnson PRD, PO Box 300, Raritan, New Jersey 08869, USA

Email: Peter Brooks* - [email protected]; Charles Marcaillou - [email protected]; Maud Vanpeene - [email protected]; Jean-Paul Saraiva - [email protected]; Daniel Stockholm - [email protected]; Stephan Francke - [email protected]; Reyna Favis - [email protected]; Nadine Cohen - [email protected]; Francis Rousseau - [email protected]; Frédéric Tores - [email protected]; Pierre Lindenbaum - [email protected]; Jörg Hager - [email protected]; Anne Philippi - [email protected]

* Corresponding author

AbstractBackground: The monogenic disease osteogenesis imperfecta (OI) is due to single mutations ineither of the collagen genes ColA1 or ColA2, but within the same family a given mutation isaccompanied by a wide range of disease severity. Although this phenotypic variability implies theexistence of modifier gene variants, genome wide scanning of DNA from OI patients has not beenreported. Promising genome wide marker-independent physical methods for identifying disease-related loci have lacked robustness for widespread applicability. Therefore we sought to improvethese methods and demonstrate their performance to identify known and novel loci relevant to OI.

Results: We have improved methods for enriching regions of identity-by-descent (IBD) sharedbetween related, afflicted individuals. The extent of enrichment exceeds 10- to 50-fold for someloci. The efficiency of the new process is shown by confirmation of the identification of the Col1A2locus in osteogenesis imperfecta patients from Amish families. Moreover the analysis revealedadditional candidate linkage loci that may harbour modifier genes for OI; a locus on chromosome1q includes COX-2, a gene implicated in osteogenesis.

Conclusion: Technology for physical enrichment of IBD loci is now robust and applicable forfinding genes for monogenic diseases and genes for complex diseases. The data support the furtherinvestigation of genetic loci other than collagen gene loci to identify genes affecting the clinicalexpression of osteogenesis imperfecta. The discrimination of IBD mapping will be enhanced whenthe IBD enrichment procedure is coupled with deep resequencing.

Published: 30 March 2009

BMC Genetics 2009, 10:16 doi:10.1186/1471-2156-10-16

Received: 13 September 2007Accepted: 30 March 2009

This article is available from: http://www.biomedcentral.com/1471-2156/10/16

© 2009 Brooks et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Page 1 of 14(page number not for citation purposes)

Page 2: Robust physical methods that enrich genomic regions identical by descent for linkage studies: confirmation of a locus for osteogenesis imperfecta

BMC Genetics 2009, 10:16 http://www.biomedcentral.com/1471-2156/10/16

BackgroundMapping of regions identical-by-descent (IBD) is a power-ful method for the identification of genetic loci sharedwithin families and implicated in disease. Classically, typ-ing of individual genetic markers throughout the genomesof the afflicted individuals mapped shared haplotypes andhas been successful in finding loci linked with numerousmonogenic traits [1]. An alternative physical method,Genomic Mismatch Scanning (GMS) [2], physically com-pares genomes of two affected individuals, related by anot too distant common ancestor, and enriches for theIBD regions they share. Despite its promise, the techniquehas not been exploited due to technical complexities.

As GMS offers the potential to avoid certain ambiguitiesassociated with genotyping and may be applicable topools of DNA samples from afflicted, related individuals,we aimed to improve the technique to reduce its inherentnoise and to render it robust.

As compared with linkage discovery by genotyping, phys-ical methods based on direct comparison of genomicsequences would enable more complete access to all IBDregions of the genome. Such methods rely on the fact thatnon-IBD regions are densely polymorphic between twoindividuals. A conceptually attractive approach for such adirect comparison involves the formation of duplex heter-ohybrid DNA fragments from the DNAs of related indi-viduals sharing a trait of interest, and then challengingthese fragments with reagents actuated by mispairings inthe heterohybrids. Such reagents may bind to the mis-matched fragment or may introduce strand breaks to per-mit separation of hybrid fragments that are not perfectlycomplementary from those that are perfectly paired fromthe IBD regions. Physical comparison has been success-fully used with a variety of technologies that exploit thechemical or structural differences between perfectlymatched and mismatched hybrids. These include chemi-cal mismatch cleavage [3] and various attempts to harnessproteins that respond to mismatches such as resolvase[4,5], single-stranded DNA-specific nucleases [6,7], MutSmismatch binding [8-10] or cleavage by the mismatch-specific enzymatic activity of E. coli MutS, MutL andMutH [11]. In addition, transfection of hybrid moleculesinto bacteria enables enrichment for perfectly paired frag-ments via an in vivo process dependent on mismatchrepair activities [12,13].

The use of these mismatch recognition methods has gen-erally been limited to the analysis of a few targeted frag-ments, but adapting them to genome-wide analysis isconceivable. Global treatment of fragments from theentire genome, however, also requires elimination ofreannealed homohybrid fragments (fragments betweenstrands of DNA from the same individual formed during

hybridization, including both mismatch-bearing hybridsformed with one paternal and one maternal strand andisohybrids comprising strands from the same parent)whose presence would confound the identification of IBDDNA. Ford and colleagues[14,15], in presenting the con-cept of genome-wide enrichment of IBD fragments, intro-duced the strategy of tagging one genome with methylgroups, and then using restriction enzymes specific foreither methylated or non-methylated DNA, but inactivewith hemimethylated DNA, to remove homohybrid DNA.In addition, they suggested using various mismatch-spe-cific agents, either by immunoprecipitation or by the E.coli mismatch repair system to eliminate mispaired frag-ments [12,14]. Subsequently, Nelson et al. [2] describedGMS, a combination of the methylation-dependenthomohybrid elimination method with in vitro cleavageby the mismatch repair proteins MutS, MutL and MutH,and digestion by exonuclease III. Application of GMS topairs of related yeast strains, including mapping of theIBD-enriched regions by hybridization to arrayed clones,correctly identified meiotic recombination crossoverpoints [2].

Use of GMS with mammalian genomes was demonstratedby enrichment of microsatellite alleles shared betweenrelated individuals [16,17] and confirmation of loci con-taining previously documented disease-related genes withmapping of IBD regions by hybridization to DNA arraystargeted to the chromosome of the known locus [18] or bymicrosatellite allele recovery [19]. Despite the recognitionof the potential of GMS [20,21], the approach has notbeen widely exploited, due to a lack of availability of themismatch repair proteins, various technical challengesinherent in a multi-step procedure and the lack of appro-priate means to map the IBD-enriched DNA. A fundamen-tal problem has been an apparent lack of appreciation ofthe need for a highly efficient process to ensure elimina-tion of non-identical hybrid fragments. Such residual frag-ments will hybridize to the DNA features on microarraysand so increase non-specific noise. Modifications to GMSand IBD mapping have been reported but the results fromgenome wide studies showed a substantial lack of con-cordance to confirm CEPH family meiotic crossovers [22].As linkage studies require processing of many samples, anadditional limitation has been the use of reagent volumesand methods unsuitable to high throughput microtiterplate-based procedures.

We present here an improved protocol for physical enrich-ment of IBD regions. The monogenic disease osteogenesisimperfecta (OI; Brittle Bone Disease) provides a contextfor demonstrating that the protocol could correctly iden-tify a well-established monogenic locus and an opportu-nity to discover novel loci relevant to OI etiology. OIincludes a heterogeneous group of autosomal dominant

Page 2 of 14(page number not for citation purposes)

Page 3: Robust physical methods that enrich genomic regions identical by descent for linkage studies: confirmation of a locus for osteogenesis imperfecta

BMC Genetics 2009, 10:16 http://www.biomedcentral.com/1471-2156/10/16

inherited disorders characterized by bone fragility andother generalized connective tissue abnormalities [23]. Asanalysis of locus-specific restriction fragment length poly-morphisms had shown that nearly all cases of OI segre-gated with the two collagen genes [24-28], genome-widelinkage analysis was not perceived as necessary. However,given the wide variety of clinical expression of OI, it ispossible that variants of modifier genes may also cosegre-gate with OI [29]. Therefore a genome wide analysismight detect loci harbouring such genes.

We have implemented an IBD enrichment protocol thatovercomes many of the difficulties of the original GMSprotocol. We report that application of the improved pro-tocol to DNA from OI patients has correctly identified theshared IBD locus that includes COL1A2 bearing the dis-ease-causing mutation and additional loci that may be rel-evant to OI etiology. The protocol is now suitable forfinding linkage loci in applications ranging from mono-genic disease to complex multigenic disease.

ResultsOverview of IBD enrichment processThe procedure for IBD enrichment can be applied to anyrelated pair of individuals and is outlined in Figure 1.Affected relatives may have the same phenotype becausethey share a particular set of predisposing alleles inheritedfrom common ancestral chromosomal regions. The pro-cedure enriches for loci that, due to patterns of parentalmeiotic recombination events, are identical between thesiblings, and thus include the predisposing alleles. Thefinal IBD-enriched product is generically amplified andthe IBD regions are then mapped by various methods,such as by hybridization to DNA microarrays or highthroughput sequencing.

Genomic DNA (gDNA) from each individual is cleavedwith restriction enzymes that yield fragments of sufficientlength to ensure a high probability that they will includecommon polymorphisms. The two sets of fragments aretagged in different ways such that homohybrid moleculeswith identical tags are targeted for subsequent elimina-tion, but heterohybrids, with differentially tagged strands,are conserved. After tagging, the DNAs are mixed, dena-tured and reannealed to form hybrid fragments of differ-ent types as shown in Figure 1. Strands from oneindividual may reanneal with strands from the same indi-vidual to generate self-self hybrids (as isohybrids or asmixed parental hybrids), and with strands from the otherindividual to produce heterohybrids, either with strandsof different parental origin (M/P heterohybrids) or withboth strands derived from the same parent (M/M or P/Pheterohybrids). Any hybrid with strands from differentancestral chromosomes, hence bearing mispairs at poly-morphic sites, is a target for subsequent enzymatic attack

by MutS, MutL and MutH (LSHase) and exo III. The DNAthat survives the procedures that eliminate self-selfhybrids and mismatch specific enzymatic digestion isenriched for the perfectly paired IBD DNA.

Necessity for high efficiency of removal of non-specific DNAAs shown in Table 1, for any relative pair whose ancestralchromosomes are not related, at least 75% of the DNAmass is targeted for removal. With sibling pairs, for a givenlocus, depending on the sharing status, the process aimsto eliminate 3/4, 7/8 or all of the DNA fragments from thelocus. This represents a major challenge because system-atic inefficient removal of unwanted DNA reduces the

Physical IBD enrichment processFigure 1Physical IBD enrichment process. Genomic DNAs (gDNA) are isolated from two related, afflicted individuals and digested with a restriction enzyme that leaves Exonucle-ase III-resistant ends and generates fragments of about 4 Kb. DNA fragments are modified (diamonds or ovals) to permit discrimination between the individuals, for example by pres-ence or absence of methylation at GATC sequences. The fragments are mixed, denatured and renatured under condi-tions to favour unique copy reannealing. Non-annealed strands and hybrids of strands from the same individual are eliminated. Reannealed DNA fragments with one strand from each individual are called heterohybrids, which may be per-fectly paired due to inheritance of both strands from the same ancestral sequence or mismatched due to variation between different ancestral sequences. Mismatched hetero-hybrid fragments are removed by LSHase, a nucleolytic cock-tail of MutL, MutS and MutH, and subsequent digestion by Exonuclease III. The resulting IBD-enriched DNA is generi-cally amplified, labelled and mapped by two-colour hybridiza-tion to genomic topographic arrays, using the reannealed DNA as reference. The process is repeated for other afflicted pairs in the same family and in additional families. Variations include use of oligonucleotides as discrimination tags, reducing the number of steps by combining similar intermediate purification procedures, and eventually mapping IBD regions by high throughput redundant sequencing, with or without amplification.

Genomic DNADigest with 6-cutter rest. enz.

Family AIndividual 2

gDNAAdd individual discrimination tags

Denature & anneal unique copy sequences

Remove ssDNA

Reference DNA

1.5 μg 1.5 μg

Amplify LabelMap IBD DNA by microarray hybridization

Remove self-self hybrids

Remove mismatched DNA

Eliminated fragments

Family AIndividual 1

gDNA

X

XNon-annealed DNA

Cy-5

Cy-3

XX

XIBD-enriched DNA

Page 3 of 14(page number not for citation purposes)

Page 4: Robust physical methods that enrich genomic regions identical by descent for linkage studies: confirmation of a locus for osteogenesis imperfecta

BMC Genetics 2009, 10:16 http://www.biomedcentral.com/1471-2156/10/16

sensitivity of detection of IBD regions. This is because forany of the procedures used for IBD mapping that measurethe mass of DNA at specific sets of sequence positions,such as those represented on BAC clone microarrays, apositive signal at a given position indicating IBD enrich-ment for some pairs must be compared to the signal forpairs where there is no sharing at the position. Previousreports have not addressed a quantitative definition ofthis problem.

Therefore we determined the level of efficiency required toadequately detect IBD enrichment. We express the sensi-tivity of detecting IBD sharing as a Discrimination Factor(DF), dependent on the overall efficiency of the removalof homohybrid and mismatched fragments. For anexpected IBD mass fraction, (I), and a residual survivingfraction of the DNA that was targeted for removal, r, withr = 1 - overall efficiency,

The behaviour of this function (Figure 2) for the differentclasses of DNA fragments (Table 1) shows that the overallprocess efficiency for loci with monoparental IBD sharingmust be at least 90% to achieve a minimum of about two-fold discrimination, a reasonable goal for microarrayanalysis or for real time quantitative PCR. This level ofoverall efficiency mandates much higher efficiencies forthe individual sequential steps of the process. Therefore,we designed various assays, each specific for a differentstep, to test and select optimum reaction parameters.Some of the same assays are also routinely performedboth as periodic quality controls to monitor reagents andenzyme specific activities and as internal and parallel con-trols during batch processing of multiple DNA samplepairs.

Robust enrichment of IBD DNATo produce adequate yields of reannealed fragments, weensured that all restriction digested gDNAs had a consist-ent range of fragment sizes, indicating intact gDNA. Weoptimized several components of the process, notably thechoice and concentration of reagents for reannealing in aformamide emulsion [14,30,31] and conditioning of theresin that eliminates fragments resulting from nucleolyticdigestion. Quality control during the process also includes

DF I r (1 I)r

= + ⋅ − (1)

Table 1: Partition of hybrid fragments after random reannealing of sibling pair DNAs.

Heterohybrid Homohybrid

Maternal or Paternal Maternal/PaternalIBD Non-Identical

All loci 1/8 1/8 1/4 1/2Biparental IBD locus 1/4 0 1/4 1/2

Monoparental IBD locus 1/8 1/8 1/4 1/2Non-identical locus 0 1/4 1/4 1/2

Variation in the distribution of reannealed DNA mass depending on the class of locus sharing. The mass distribution for a monoparental IBD locus is the same as that expected for annealing of grandparent and grandchild DNAs.

Discrimination between IBD loci and loci with no sharing as a function of overall process efficiencyFigure 2Discrimination between IBD loci and loci with no sharing as a function of overall process efficiency. The Discrimination Factor (DF) is the ratio of a value measured when the signal is derived from pairs when the fragments are enriched for IBD sharing to the value for pairs with no shar-ing. The overall process efficiency indicates the extent that the process has eliminated homohybrids and non-identical heterohybrids. Plots of DF as a function of the efficiency (see Equation 1) are shown for biparental sharing (IBD-2, gray) and monoparental sharing (IBD-1, black). The ratio of the DFs for IBD-2 versus IBD-1 is also shown (dashed).

0

1

2

3

4

5

6

7

8

9

10

70 75 80 85 90 95 100

Overall process efficiency (%)

Dis

cri

min

atio

n F

ac

tor

IBD-2 / IBD-1

IBD-1 / non-identical

IBD-2 / non-identical

Page 4 of 14(page number not for citation purposes)

Page 5: Robust physical methods that enrich genomic regions identical by descent for linkage studies: confirmation of a locus for osteogenesis imperfecta

BMC Genetics 2009, 10:16 http://www.biomedcentral.com/1471-2156/10/16

monitoring of the DNA concentrations after digestion bythe fragmenting restriction endonuclease, after reanneal-ing, after the final enrichment process and after genericamplification to detect any atypical losses or excess yields.Several steps require DNA purification and/or concentra-tion and with a view to automation and robust process-ing, we perform the process entirely in microtiter platesusing ultrafiltration instead of alcohol precipitation. Thenumber of steps has been significantly reduced.

Using mismatched DNA substrates and perfectly matchedcontrols, we found protein concentrations and reactionconditions that would eliminate all measurable mis-matched DNA. We designed model mismatched DNAsubstrates with a GATC density typical for the humangenome (about 1 per 0.5 kb), but with only one mismatchper 3 kb, to ensure that the enzymatic activities thatremove non-identical DNA were efficient even forgenomic DNA fragments containing a lower than typicaldensity of SNPs. As shown in Figure 3, elimination of mis-matched substrates, linear or circular, is dependent onboth the nuclease activity of LSHase and subsequent treat-ment with exonuclease and a resin that removes thedigested DNA. We titrated the LSHase activity and the exo-nuclease/resin procedure such that a significant fraction ofthe control perfectly paired DNA was also degraded. Wealso designed two model substrates as internal qualitycontrols in each sample. One substrate is a 4 kb duplexwith a single mismatch (MM), and the second is perfectlymatched (PM), but with an additional 1 kb of insertedsequence. To each hybrid DNA sample, we addedamounts of MM and PM that mimicked single copy gDNAequivalents. Subsequent evaluation by quantitative PCR

using primers specific for each substrate ensured that ineach sib pair sample all enzymatic functions were suffi-ciently active to provide adequate discrimination (datanot shown). We also included the MM and PM substratesas parallel controls at concentrations sufficient to observethe elimination of the mismatched DNA and conserva-tion of the control DNA (data not shown).

Discrimination efficiency evaluated by quantitative PCRTo determine the extent of discrimination achieved by theIBD enrichment process, we performed quantitative PCR(qPCR) on generically amplified IBD-enriched DNA fromCEPH family grandfather-grandchild and grandmother-grandchild pairs. Meiotic recombination points in CEPHfamily pedigrees have been extensively documented[32,33] and so provide a high resolution map of expectedIBD regions between related pairs of individuals. Process-ing CEPH grandparent-grandchild pairs mimics sib pairanalysis in that reannealing produces the same expectedmass distribution of hybrid fragments as for monoparen-tal IBD loci sharing between sib pairs (Table 1). As a resultof parental meiotic crossing over, the sharing status ofeach region for a grandfather-grandchild pair is always theopposite to that of the grandmother-grandchild pair. Fig-ure 4 shows that IBD loci can be distinguished from non-identical loci over a range of copy number concentrations.Differences in Ct values of up to 4 to 6 cycles between IBDDNA and non-identical depleted DNA from the two com-plementary relative pairs indicate Discrimination Factorsranging from about 16 to 64. Therefore for some frag-ments, there remains less than 1% of the DNA that is

Evaluation and quality control of LSHase activityFigure 3Evaluation and quality control of LSHase activity. Substrates of 3.1 kb that mimic reannealed hybrid restriction fragments, either perfectly complementary (IBD) or with a single base pair mismatch (N-Id), and either as covalently closed circular (ccc) or linear (lin) forms, were incubated with LSHase and/or exo III (exo) and treated with the DAP procedure (resin), as indicated. Agarose gels of reaction products were stained with ethidium bromide. Lane M – high mass ladder, Invitrogen.

Quantitative PCR detects an extensive range of enrichment of IBD DNAFigure 4Quantitative PCR detects an extensive range of enrichment of IBD DNA. Eight CA microsatellite sites were assayed by qPCR using a universal (CA)n-specific probe and IBD-enriched DNA from complementary grandparent-grandchild pairs. Loci 1–4 represent shared loci for the grandfather-grandchild pair and loci 5–8 represent shared loci for the grandmother-grandchild pair. Ct is the cycle number at which a common threshold level of amplification was achieved.

Page 5 of 14(page number not for citation purposes)

Page 6: Robust physical methods that enrich genomic regions identical by descent for linkage studies: confirmation of a locus for osteogenesis imperfecta

BMC Genetics 2009, 10:16 http://www.biomedcentral.com/1471-2156/10/16

expected to be depleted by the process (Equation (1), Fig-ure 2). Based on results from such qPCR assays, we con-clude that the IBD enrichment procedure is highlyefficient and that generic amplification of the IBD DNAadequately conserves copy number differences.

Identification of IBD regions for grandparent-grandchild pairs by mapping on BAC microarraysTo map all IBD regions, amplified reannealed hybridDNA, labelled with Cy3, and amplified IBD-enrichedDNA, labelled with Cy5, were hybridized to genome-wideBAC microarrays [34,35]. As described in Methods, afterinitial filtering, we calculated ratios of the Cy5-labeledIBD-enriched DNA signals to the Cy3-labeled reannealedDNA signals, and normalized the ratios. We standardizedthe variance between arrays using the variance of all theratios on each array. Figure 5 shows the profiles of theratios for complementary grandparent-grandchild pairs.The performance of an immobilized DNA clone is vali-dated when its ratio in an IBD-enriched region is signifi-cantly greater than its ratio when the region representsnon-identical DNA. Regions of sequential clones withratios that consistently discriminate the sharing statusbetween the two pairs correspond to regions where the

grandparent and grandchild share common microsatellitealleles due to the child's inheritance of the region from thegrandparent. Therefore, parental meiotic crossover lociidentified by microarray mapping of IBD-enriched DNAare the same as those identified by microsatellite genotyp-ing.

Such experiments with pairs of known IBD status vali-dated the enrichment process and the behaviour of immo-bilized DNA clones. However, due to uneven distributionof SNPs and different amounts of unique and efficientlyannealable sequence, each clone displays its particularcharacteristics, and hence there is a wide variability in theratio values between clones. Indeed, as shown in Figure 5,the ratios for some clones detecting enrichment are simi-lar to those of other clones detecting depletion. Therefore,as described in Methods, in subsequent experiments todiscover unknown IBD regions, we standardized the ratiosby mean centering using the variance of the ratios of eachclone obtained from experiments with 150 CEPH sibpairs.

Application of IBD enrichment and mapping to identify monogenic disease lociOsteogenesis imperfecta patients from four Old OrderAmish families, descended from a single founding couple,have a single Gly610Cys mutation in COL1A2, and yethave variable clinical expressivity of the disease. A samplefamily pedigree is shown in Figure 6. DNA was availablefor 24 patients and we analyzed 42 affected pairs as shownin Table 2.

In parallel with the affected pairs, 54 independent controlsib-pairs from the CEPH family collection were processed.No linkage is expected at any locus in this collection andas shown in Figure 7, none of the clones showed signifi-cant evidence for linkage at the nominal p-value < 2 × 10-

5. Thus the protocol is not inherently prone to generatefalse positive results.

Analysis of the OI family pairs revealed more than 30peaks of increased IBD sharing with nominal p-values < 2× 10-5 (Figure 8). However, the analysis of non-independ-ent pairs from a small number of families descended froma common founder will increase the observed IBD sharingbetween individuals. Therefore, in the context of a mono-genic disease the genome-wide threshold for significant

Confirmation of known meiotic crossoversFigure 5Confirmation of known meiotic crossovers. The IBD enrichment process was applied to CEPH family grandpar-ent-grandchild pairs. Amplification, labelling, hybridization and data analysis were performed as described in Methods, except standardization was by division of the normalized ratios by the standard deviation of the ratios of all clones on each array. For chromosome 1, the ratio values for the IBD-enriched DNA (solid lines) are compared with the regions where microsatellite genotypes are shared between a grand-parent and grandchild (dotted lines); grandfather (GF)/child, blue and grandmother (GM)/child, red. The arrow indicates a meiotic recombination crossover region.

CrossoverGF / Child GM / Child

Microsat. IBD-enriched

Table 2: Osteogenesis imperfecta relative pairs

Type of pair Number of pairs Expected shared IBD fraction

Sibling 19 0.75Avuncular 20 0.5Cousins 3 0.25

Page 6 of 14(page number not for citation purposes)

Page 7: Robust physical methods that enrich genomic regions identical by descent for linkage studies: confirmation of a locus for osteogenesis imperfecta

BMC Genetics 2009, 10:16 http://www.biomedcentral.com/1471-2156/10/16

linkage is not useful. Some peaks may be linked to specifictraits that segregated only within a single family. We there-fore considered that the loci more likely to include anycosegregating genes would be represented by peaks withclones showing p-values <10-6, for which all pair-wisetests were informative and furthermore where all clonesshowed IBD sharing. By these criteria, there were twoprominent loci on chromosomes 1 and 7 (Figure 8). Thechromosome 1 locus spanned about 6 Mb (180.3 to186.2 Mb) and although this may represent a chance shar-ing of IBD within these families associated with sometrait, it is possible that such a locus harbours a gene that isrelevant to the expression of OI. The chromosome 7 locus,(76.4 to 96.1 Mb) includes Col1A2 at 93.9 Mb. As this isthe gene bearing the OI causative mutation, we concludethat the IBD enrichment and mapping procedures havesuccessfully identified the disease locus in these families.

DiscussionHere we have described several critical modifications to aphysical positional cloning process to ensure its reliabilityand to reduce noise in the final mapping analysis. Wehave recognized a major challenge in applying this meth-odology, namely that the procedure must eliminate alarge fraction of the reannealed DNA, mainly comprisedof strands derived from different chromosomes and frompotentially confounding self-self isohybrid fragments. Wehave formally defined the dependence of the discrimina-tion of IBD DNA from non-identical DNA as a function ofthe efficiency of the process of IBD selection. Therefore,we designed the procedure to ensure highly efficient andspecific intermediate yields, and included step-specificquality control assays. Improvements included a reduc-tion in the number of steps, optimization of reannealing,and conditioning of a DNA binding resin to render thiskey reagent suitably reliable.

Osteogenesis imperfecta, an autosomal dominant diseaseFigure 6Osteogenesis imperfecta, an autosomal dominant disease. Pedigree of Amish family A (Coriell) showing individuals with osteogenesis imperfecta as filled symbols.

?

Linkage test results for CEPH control populationFigure 7Linkage test results for CEPH control population. Genome-wide IBD sharing with standardized ratios was determined as described in Methods. Based on sib pairs from 54 independent families, no clones in chromosomes 1 through 22 surpass the threshold of significance at 2 × 10-5 (red line).

–lo

gP

val

ue

10 -

8 -

6 -

4 -

2 -

0 -

22201814 1612105321 6 7 8 94

Page 7 of 14(page number not for citation purposes)

Page 8: Robust physical methods that enrich genomic regions identical by descent for linkage studies: confirmation of a locus for osteogenesis imperfecta

BMC Genetics 2009, 10:16 http://www.biomedcentral.com/1471-2156/10/16

Page 8 of 14(page number not for citation purposes)

Sharing of IBD regions among individuals from Amish osteogenesis imperfecta familiesFigure 8Sharing of IBD regions among individuals from Amish osteogenesis imperfecta families. Genome-wide IBD shar-ing with standardized ratios was determined as described in Methods.

–lo

gP

val

ue

10 -

8 -

6 -

4 -

2 -

0 -

10 -

8 -

6 -

4 -

2 -

0 -

10 -

8 -

6 -

4 -

2 -

0 -

10 -

8 -

6 -

4 -

2 -

0 -

10 -

8 -

6 -

4 -

2 -

0 -

10 -

8 -

6 -

4 -

2 -

0 -

3 4

1 2

765

98 10 11

12 151413

16 222120191817

Page 9: Robust physical methods that enrich genomic regions identical by descent for linkage studies: confirmation of a locus for osteogenesis imperfecta

BMC Genetics 2009, 10:16 http://www.biomedcentral.com/1471-2156/10/16

The essential component for IBD selection is the enzy-matic activity that recognizes and hydrolyzes mismatchedDNA fragments. We cloned the three proteins into highyield overexpression vectors, and with various methods,including surface plasmon resonance studies, character-ized their activities and optimized storage and reactionconditions [36-38]. We over-titrated the enzymatic activi-ties and other reagents that eliminate unwanted DNA toprovide sufficient discrimination for mapping of IBDregions.

Because the yield of the IBD enrichment process is low, wevalidated a generic amplification method that producedsufficient DNA for mapping. Deviations in copy numberrepresentation introduced by the amplification methodare not detrimental to the final array ratio determinationsbecause increases or decreases in copy number of differentfragments are roughly balanced throughout the long BACclone sequences and hence in contiguous clones repre-senting an IBD region. Any clone-specific or sequence-dependent deviations are expected to be of similar direc-tion and magnitude in both sample and reference DNAs.The range of discrimination factors between 1.3 and 4 thatwe observed by BAC clone hybridization might be inter-preted as representing the efficiency of the enrichmentprocess. However, microarray analysis is subject tonumerous perturbing factors [39] resulting in dynamicrange dampening. To exclude any microarray-related fac-tors, we assessed discrimination by qPCR and showed thatthe process can enrich some IBD fragments at least 10- to50-fold. Even though numerous hybrid fragments mightescape the steps designed to eliminate non-identical DNAand so contribute to noise, the improvements we haveintroduced to the overall process ensure that the prepon-derant hybridization signal from the BAC clones is due tostrongly enriched IBD fragments. We have also developeda related protocol, Genome Hybrid Identity Profiling,which incorporates these improvements and the use ofself-self hybrid discrimination via ligation of oligonucle-otide tags; this method enables initial fragmentation ofthe gDNA with any restriction enzyme that generates effi-ciently ligatable termini.

The distinction between Mendelian monogenic diseaseand complex polygenic disease has blurred in recent years[40]; modifier genes may influence the clinical pheno-types of monogenic conditions [41]. To identify candidateloci harbouring disease or modifier genes, extended pedi-grees are particularly useful, as any IBD regions conservedin most or all of the afflicted individuals are large, whereasthe probability of finding any IBD loci shared amongmany distant relatives is small [42,43]. To demonstrateadequate performance for genome-wide disease genemapping, we applied the IBD-enrichment process to Old

Order Amish OI family pairs from an extended pedigreeand successfully identified a chromosome 7 locus con-taining COL1A2, the gene bearing the mutation responsi-ble for OI in these families. Another IBD-enriched locus,mapping to chromosome 1q, includes PTGS2 (COX-2),located at 184.9 Mb. COX-2 is expressed in a regulatedmanner in osteoblasts and is a key regulator in bone for-mation, interacting with various key proteins of bonemetabolism. Variants of COX-2 might affect the clinicaloutcome of collagen mutations and so may be involved insome of the pleiotropic phenotypes of OI [23,44-46]. Thephenotype of mice homozygous for the Col1A2Gly610Cys mutation mimics the milder clinical expres-sion of the mutation in the Amish families, and the phe-notype of heterozygous mice is inconsistent withautosomal dominance [47]. As phenotypic severity of themutated mice varies substantially depending on thegenetic background [48], these mice would be appropriatefor constructing additional mutations in candidate modi-fier genes, such as COX-2.

The potential of family-base studies to identify complexdisease genes has yet to be realized; numerous reportshave identified linkage peaks of only suggestive p-valuesand loci are often not replicated [49]. Due to modest geneeffects, loci for complex diseases will be difficult to iden-tify unless at least 1000 families are genotyped [50].Therefore, much effort has been invested in the genomewide association approach using high density SNP geno-typing arrays, and many credible associations have beendescribed in case-control studies [51]. Nevertheless, incontrast to linkage analysis, failure to find significantassociation in a region, even with high density arrays, can-not exclude a disease-related gene in the region [50]. Gen-otyping several hundred thousand markers generatesthousands of significant associations [52], such that themost significant p-values most likely correspond to generegions unrelated to the disease; whereas the rare trueassociations are likely to be among the lower ranking p-values of the significant associations [53]. Hence linkagestudies that whittle the genome to a few regions, followedby association studies limited to these regions, decreasethe multiple testing burden and the likelihood of falseassociations.

Genotypes from high density SNP arrays are also usefulfor IBD detection by comparing data from two or morerelated individuals to find long runs that contain no gen-otypes inconsistent with common inheritance from thesame ancestral chromosome, and that exceed some calcu-lated or simulated cut-off length, thus confidently exclud-ing runs of random IBS [42,43,54]. This approach hasbeen applied with small pedigrees to confirm a previouslyidentified locus for prostate cancer [42], and to identify

Page 9 of 14(page number not for citation purposes)

Page 10: Robust physical methods that enrich genomic regions identical by descent for linkage studies: confirmation of a locus for osteogenesis imperfecta

BMC Genetics 2009, 10:16 http://www.biomedcentral.com/1471-2156/10/16

candidate loci for kidney cancer [54]. This novel means toexploit SNP data, together with the physical method wepresent here, provide alternative or complementary strat-egies for gene hunts in family collections and pedigrees.By addressing both the set of variants represented on SNParrays and additional variants, such as those typicallyrevealed only by fine mapping of regions of associatedhaplotypes, the physical method may detect shorter IBDregions than the genotyping approach, including regionsthat might be excluded as likely IBS, or regions of genuineIBD runs broken by erroneous genotypes.

To identify small IBD regions, the resolution of the map-ping method must approach the expected mapping reso-lution of the physical IBD enrichment procedure. Inaddition, genome-wide mapping of IBD-enriched DNAshould preserve the high level of relative enrichmentrevealed by qPCR. We expect that next-generation rese-quencing methods [55] will permit mapping IBD regionsby scoring sequence read depth. Analysis, including geno-typing of sequenced SNPs, would be restricted to uniquesequences and restriction fragments with some minimumdensity of SNPs of high minor allele frequencies, thusignoring the sequences that contribute to dampening ofdiscrimination in microarray analysis. In addition,sequence selection based on resequencing performancewith pools of IBD-enriched DNA from CEPH family pairswould generate a genome-wide set of sequences that accu-rately report IBD sharing. Considering that reannealing togenerate unique sequence hybrids reduces the genomicrepresentation at least five-fold and that the level of IBDenrichment for well-behaved sequences is 10- to 50-fold,very few sequencing runs would be sufficient to obtain adepth of coverage comparable to the depth that permittedcalling heterozygous SNPs after resequencing an entirehuman genome, requiring about 70 runs [56]. Sufficientlydeep coverage may permit distinction between monopa-rental and biparental sharing in IBD-enriched regions.

ConclusionWe have established a robust process that physicallyselects and maps genomic regions that are shared betweenfamily members. The methodological approach, origi-nally proposed by Ford and colleagues to isolate "inherit-ance units" [14,15], is now practicable for thesimultaneous processing of several hundred relative pairssuch as sibling pairs under rigorous quality control condi-tions. The improved process enabled mapping of loci fora monogenic trait, osteogenesis imperfecta. Using thephysical enrichment process, we have previously reportedthe identification of loci for autism, including a locus onchromosome 16p. With subsequent high density genotyp-ing in this locus, we found that PRKCB1, protein kinasebeta, is associated with autism [57]. Thus the procedure isnow suitable for enriching for IBD DNA in applications

ranging from monogenic to complex diseases. Coupledwith mature deep resequencing methods to map the IBD-enriched DNA, the technology will enable increased dis-crimination that will be required to analyze more dis-tantly related individuals and cost-efficient pools of IBD-enriched DNA samples.

MethodsReagents and DNAReagents and suppliers included: restriction endonucle-ases, Dam methylase and exonuclease III, NE Biolabs;TempliPhi kit, Amersham; Repli-G WGA kit, MolecularStaging/Qiagen; SYBR Green, SYBR Gold and PicoGreen,Molecular Probes/InvitroGen; GenomeHIP reagent com-ponents including HyFast gDNA reannealing reagent(HyF), DNA affinity polymer (DAP, a reconditionedderivative of benzoyl-naphthoyl-DEAE cellulose, Sigma),DAP buffer, LSHase (a formulation of E. coli MutS, MutLand MutH), enzyme buffer (EB), a formamide-based arrayhybridization buffer (AHB), coverslip removal buffer(CRB) and stringent wash buffer (SWB), IntegraGen; Mul-tiscreen filtration plates, Millipore; QIAquick glass-fibrepurification plate, Qiagen; UltraGAPS slides, Corning.

CEPH family genomic DNAs were either prepared fromimmortalized tissue culture cells with the Recoverease kit(Stratagene), a procedure that yields highly intact dialyzedDNA [58], or obtained from the Coriell repository. BACclones were obtained from InvitroGen or from the CentralNational de Séquencage, Genoscope (Evry, France). Cori-ell provides DNA from osteogenesis imperfecta families A,B, C and D, (for pedigrees and phenotypes, see http://ccr.coriell.org/nigms/phenotype/oi.html).

DNA quantificationWe measured DNA concentrations with PicoGreen in a384-well fluorescent plate reader using calf thymus DNAas standard.

Hybrid reannealed DNAWe digested genomic DNA (gDNA) from pairs of relativeswith Pst I and purified and concentrated the digests withMultiscreen Manu-30 ultrafiltration. We quantified thepurified, digested DNA and verified the expected range offragment sizes by agarose electrophoresis. We used Dammethylase to tag one of the Pst I-digested gDNAs. Wecombined 1.5 μg of each of the samples, denatured theDNA by incubation in 0.15 M NaOH at RT for 10 min,and neutralized the solution by addition of HyF bufferand phenol. We reannealed complementary strands byshaking the emulsion for 18 hours and recovered theaqueous phase after mixing with chloroform. We immo-bilized duplex fragments on glass fibre (QIAquick) in thepresence of a chaotropic salt to eliminate non-renaturedsingle stranded DNA [59]. We eluted the hybrid renatured

Page 10 of 14(page number not for citation purposes)

Page 11: Robust physical methods that enrich genomic regions identical by descent for linkage studies: confirmation of a locus for osteogenesis imperfecta

BMC Genetics 2009, 10:16 http://www.biomedcentral.com/1471-2156/10/16

fragments with TE and removed an aliquot for labellingand hybridization to microarrays.

Enrichment for IBD DNAWe incubated the reannealed hybrid DNA with LSHase inEB at 37°C for 15 min and heated for 10 min at 65°C. Weincubated the product of the LSHase reaction with Mbo Iand Dpn I for 30 min at 37°C and 10 min at 65°C. Then,we added exo III, incubated for 30 min at 37°C, addedDAP buffer and treated with DAP to isolate IBD-enrichedduplex DNA free of single-stranded digestion products.

Amplification of IBD-enriched DNATo amplify the IBD-enriched DNA, we used either Tem-pliPhi or Repli-G, as recommended by the manufacturerswith minor modifications. We denatured the reannealedhybrid DNA or the IBD-enriched DNA with NaOH andamplified for 16 hrs at 30°C. We purified the amplifiedproducts by ultrafiltration and quantified the DNA. Theextent of amplification ranged from about 250- to 2000-fold, or the equivalent of about 8 to 11 doublings.

Array designWith the aim of choosing 3000 clones with 1 Mb spacing,we used the program CloneTrek (IntegraGen). CloneTrek's input includes essential features of each clone andtwo parameters, the distance between two clones on thetiling path and the minimal distance accepted betweentwo clones. Initially, all clones in the NCBI clone registryare placed on a tiling path. Clone Trek's iterative algo-rithm: for each chromosome, while there are cloneswhose removal would create acceptable gaps, identify andremove the least favoured clone. The least favoured clonehas the lowest score with respect to various defined crite-ria, including FISH data, STS content, sequencing status,size and mapping status. The arrays had 4 replicates of2779 clones, including 2266 mapped to a uniquegenomic position with the Build 36 assembly (May,2006). Average spacing was 1.2 Mb and median spacing0.95 Mb. Duplicate sets of blocks were printed in twozones in order to maximally separate the two sets of dupli-cates printed in each block. Various controls including 15rice BACs were also printed in quadruplicate. Subsequentversions of the arrays have 5500 clones.

Array printingWe purified BAC DNA using alkaline lysis, filtration andisopropanol precipitation. We amplified BAC DNA asdescribed above for amplification of IBD-enriched DNA.Clone identity was verified by end sequencing of allclones and restriction digest fingerprinting of someclones. We checked the fidelity of amplification by verify-ing that sample BAC fingerprint patterns matched beforeand after amplification. We digested 4 to 20 μg of eachamplified BAC with Alu I and purified the digests by ultra-

filtration. The DNA was dried by vacuum centrifugationand resuspended in 12 μl of 50% DMSO. We printed theDNA on GAPS2 slides (Corning) with a BioRoboticsMicroGridII arrayer using BioRobotics quill pins; spotdiameters were about 100 microns. We irradiated arrayswith 100 mJ of 254 nm UV light and then baked them at80°C for two hours. We scanned all slides and examinedthe images of auto-fluorescence to identify any pin-spe-cific problems. We validated the printing batches of about100 slides by SYBR green staining and hybridization testson selected slides.

Mapping of IBD-enriched DNA on microarraysAmplified reannealed hybrid DNA, labelled with Cy3,and amplified IBD-enriched DNA, labelled with Cy5,were hybridized to the BAC microarrays to enable a ratio-metric analysis [34,35]. Labelling reactions of 30 μl werefor 16 to 18 hours at 37°C with Klenow (exo-) DNApolymerase (NE Biolabs) and contained 1 μg of DNA, 125μM random octamers, 120 μM dATP, dGTP, TTP, 60 μMdCTP and either 50 μM Cy5-dCTP or 50 μM Cy3-dCTP.We purified the labelled DNA by spun gel filtrationthrough Sephadex G50 (APB Biotech) in HV45 micro-plates (Millipore). The specific fluorescence (fluoro-chromes/Kbp, fl/Kb), of each probe, as determined byDNA quantification and Cy-specific fluorescence readingsin a fluorescence plate reader using Cy-dCTPs as stand-ards, ranged from 20 to 50 fl/Kb. We prepared hybridiza-tion mixes by mixing Cy5- and Cy3-labeled DNA,concentrating by vacuum centrifugation and resuspend-ing in 35 μl of AHB containing Cot1 DNA. Array slideswere blocked by incubation in 10% BSA, 0.01% SDS at37°C for 30 min. We pre-hybridized slides with 40 μlAHB containing 730 mg/ml salmon sperm DNA at RT for30 min. We removed about 30 μl of the prehybridizationmix, deposited the hybridization mixes on the arrays, cov-ered with Hybrislips (Grace) and placed the arrays in indi-vidual hybridization chambers (Corning) in a water bathat 42°C for 2 to 3 days. We removed coverslips by gentleagitation in CRB, rinsed in 2× SSC, soaked in SWB at 45°Cfor 10 min, and then briefly rinsed slides in a series ofbaths: 0.2× SSC, filtered 0.1× SSC and 70% isopropanol.We have also tested and validated various commerciallabelling kits, hybridization buffers and alternative proto-cols, including washing at higher temperatures in theabsence of formamide.

Image and microarray data analysisArrays were scanned using an Agilent scanner and fluores-cent intensities were corrected by subtraction of localbackground using GenePix® Pro 5.1. Spots with fluores-cent signals indicating partial saturation (> 50,000) or sig-nals less than 2 times the mean of the backgrounds fromall autosomal clones were excluded. A ratio value of IBD-enriched DNA versus reannealed DNA was determined

Page 11 of 14(page number not for citation purposes)

Page 12: Robust physical methods that enrich genomic regions identical by descent for linkage studies: confirmation of a locus for osteogenesis imperfecta

BMC Genetics 2009, 10:16 http://www.biomedcentral.com/1471-2156/10/16

based on the four spot replicates for each clone. Cloneswith less than three morphologically acceptable replicatesor with excessive variance of replicate ratios (Var(repli-cate) - Mean(Var(replicates)) > 2 * Var(Var(replicates))were eliminated. Median ratios of the replicates for eachclone were computed and data were normalized betweenarrays by dividing the ratio of each clone by the mean ofthe ratios of all autosomal clones. Unless otherwise noted,the ratios of each clone were standardized using 150 fullsib pair controls by subtracting the mean of the controlratios of the clone and dividing by the variance of the con-trol clone ratios. Only the clones mapped to a uniquegenomic position were used in the analysis.

IBD determination and linkage analysis

We determined a moving average (MA) ratio for eachclone. With Rk as the standardized ratio of clone k, di the

physical distance weight for neighbouring clone i (d isinversely proportional to the distance in Mb from theclone, using a weighting function based on a normal dis-tribution), and m the number of flanking clones on each

side of clone k, then , the MA ratio of clone k is

obtained:

We set m as three, corresponding to a moving window ofseven adjacent clones. As we expect that the status of anaverage of 75% of the clones is IBD, a threshold ratio Twas determined such that 75% of the MA-ratios from allclones and all sib pair controls were greater than T. We setthe IBD status to one for clones with MA-ratios greaterthan T and to zero for clones with MA ratios less than T.After these binary IBD scores were determined for eachclone for each affected relative pair, we then counted thenumber of pairs that were IBD at each clone. A region rep-resented by a clone or a series of consecutive clones islinked to the trait only if the number of affected pairs thatare IBD for the region exceeds the number of pairs that bychance could have received copies of the same ancestralregion. The null distribution of chance sharing appropri-ate for studies with different types of relative pairs wasdetermined as described [60]; alternative methods todetermine the null distribution may be appropriate forvarious experimental designs, including those with inbredpedigrees [61-64]. To limit the genome-wide probabilityof false linkage to 5%, we used the P-value of 2 × 10-5 toset the pointwise significance threshold for declaration ofsignificantly increased sharing [65].

Quantitative PCRPCR amplifications were quantified by real time qPCR,using a common probe specific for CA microsatelliterepeat sequences [66,67]. Reactions of 25 μl contained 5ng of amplified IBD-enriched DNA, AmpliTaqGoldbuffer, 0.2 mM each dATP, dCTP, dGTP, TTP, 0.4 mMdUTP, 5 mM MgCl2, 0.005 U/μl uracil-N-DNA glycosy-lase (Sigma), 0.03 U/μl AmpliTaqGold DNA polymerase(Roche), 0.2 μM each primer and the probe oligonucle-otide 5' FAM-(CA)14-TAMRA. Incubations were at 37°Cfor 10 min, 95°C for 10 min, followed by 35 cycles at95°C for 15 sec and 60°C for 1 min.

AvailabilityThe microarray gpr files and three annotation files can beobtained from http://BMC_Genetics2008.integragen.org.

The file "Osteogenesis_samples.txt" lists the Coriell osteo-genesis imperfecta samples.

The file "Osteogenesis_relative_pairs.txt" lists the experi-mental pairs of relatives and the identity codes for themicroarray gpr files.

The file "BAC_clones.txt" lists the clone names and theirchromosomal positions. If it was not possible to confi-dently position a clone, it is annotated as "Null".

Authors' contributionsPB wrote the manuscript and, with expert participation byCM, directed the development of the enzymology andtechnology for IBD enrichment, DNA amplification andmicroarray printing and hybridization. MV prepared DNAand performed experiments and hybridizations. JPS man-ufactured the arrays, prepared DNA and performed exper-iments and hybridizations. FR directed theimplementation of the protocol in a production setting.DS guided the qPCR experiments. SR, RF and NC partici-pated in the design of the OI study. PL designed and wrotethe CloneTrek program and managed data bases. JH con-ceived of the study and participated in drafting the manu-script. AP participated in drafting the manuscript anddesigned and performed the statistical analysis with assist-ance by FT. All authors read and approved the final man-uscript.

AcknowledgementsWe thank Jan Mous and Elke Roschmann for critical reading and comments; Lon Aggerbeck and David Rickman, CNRS, Gif-sur-Yvette for initial DNA array printing; Stephanie Maillard and Virginie Decaulne for expert technical assistance, Rachel Rousseau for performing qPCR experiments, Jolanta Luberda for assistance in array printing, Safa Saker, Isabelle Lambert, Isa-belle Bezier and Thierry Larmonier, Genethon, Association Française con-tre les Myopathies, for cell culture; Abdel Benajou for help with array data analysis; Bruno Copin for help with figure preparation; Gabor Gyapay, Jean Weissenbach and colleagues, Genoscope, for contributions to DNA prep-

R k

R

d Ri m

m

di m

mk

i k i

i

=⋅ +

=−∑

=−∑

(2)

Page 12 of 14(page number not for citation purposes)

Page 13: Robust physical methods that enrich genomic regions identical by descent for linkage studies: confirmation of a locus for osteogenesis imperfecta

BMC Genetics 2009, 10:16 http://www.biomedcentral.com/1471-2156/10/16

arations and sequencing; Mark Lathrop and colleagues, Centre National de Genotypage for access to their facilities; and the reviewers for constructive criticism of the manuscript.

References1. Botstein D, Risch N: Discovering genotypes underlying human

phenotypes: past successes for mendelian disease, futureapproaches for complex disease. Nat Genet 2003,33(Suppl):228-237.

2. Nelson SF, McCusker JH, Sander MA, Kee Y, Modrich P, Brown PO:Genomic mismatch scanning: a new approach to geneticlinkage mapping. Nat Genet 1993, 4(1):11-18.

3. Cotton RG, Scriver CR: Proof of "disease causing" mutation.Hum Mutat 1998, 12(1):1-3.

4. Mashal RD, Koontz J, Sklar J: Detection of mutations by cleavageof DNA heteroduplexes with bacteriophage resolvases. NatGenet 1995, 9(2):177-183.

5. Youil R, Kemper BW, Cotton RG: Screening for mutations byenzyme mismatch cleavage with T4 endonuclease VII. ProcNatl Acad Sci USA 1995, 92(1):87-91.

6. Till BJ, Burtner C, Comai L, Henikoff S: Mismatch cleavage by sin-gle-strand specific nucleases. Nucleic Acids Res 2004,32(8):2632-2641.

7. Oleykowski CA, Bronson Mullins CR, Godwin AK, Yeung AT: Muta-tion detection using a novel plant endonuclease. Nucleic AcidsRes 1998, 26(20):4597-4602.

8. Lishanski A, Ostrander EA, Rine J: Mutation detection by mis-match binding protein, MutS, in amplified DNA: applicationto the cystic fibrosis gene. Proc Natl Acad Sci USA 1994,91(7):2674-2678.

9. Geschwind DH, Rhee R, Nelson SF: A biotinylated MutS fusionprotein and its use in a rapid mutation screening technique.Genet Anal 1996, 13(4):105-111.

10. Wagner R, Debbie P, Radman M: Mutation detection usingimmobilized mismatch binding protein (MutS). Nucleic AcidsRes 1995, 23(19):3944-3948.

11. Smith J, Modrich P: Mutation detection with MutH, MutL, andMutS mismatch repair proteins. Proc Natl Acad Sci USA 1996,93(9):4374-4379.

12. Doutriaux MP, Wagner R, Radman M: Mismatch-stimulated kill-ing. Proc Natl Acad Sci USA 1986, 83(8):2576-2578.

13. Faham M, Cox DR: A novel in vivo method to detect DNAsequence variation. Genome Res 1995, 5(5):474-482.

14. Casna NJ, Novack DF, Hsu MT, Ford JP: Genomic analysis II: iso-lation of high molecular weight heteroduplex DNA followingdifferential methylase protection and Formamide-PERThybridization. Nucleic Acids Res 1986, 14(18):7285-7303.

15. Sanda AI, Ford JP: Genomic analysis I: inheritance units andgenetic selection in the rapid discovery of locus linked DNAmarkers. Nucleic Acids Res 1986, 14(18):7265-7283.

16. McAllister L, Penland L, Brown PO: Enrichment for loci identical-by-descent between pairs of mouse or human genomes bygenomic mismatch scanning. Genomics 1998, 47(1):7-11.

17. Cheung VG, Gregg JP, Gogolin-Ewens KJ, Bandong J, Stanley CA,Baker L, Higgins MJ, Nowak NJ, Shows TB, Ewens WJ, et al.: Linkage-disequilibrium mapping without genotyping. Nat Genet 1998,18(3):225-230.

18. Cheung VG, Nelson SF: Genomic mismatch scanning identifieshuman genomic DNA shared identical by descent. Genomics1998, 47(1):1-6.

19. Mirzayans F, Mears AJ, Guo SW, Pearce WG, Walter MA: Identifi-cation of the human chromosomal region containing the iri-dogoniodysgenesis anomaly locus by genomic-mismatchscanning. Am J Hum Genet 1997, 61(1):111-119.

20. Lander ES: Finding similarities and differences amonggenomes. Nat Genet 1993, 4(1):5-6.

21. Kruglyak L, McAllister L: Who needs genetic markers? Nat Genet1998, 18(3):200-202.

22. Smirnov D, Bruzel A, Morley M, Cheung VG: Direct IBD mapping:identical-by-descent mapping without genotyping. Genomics2004, 83(2):335-345.

23. Byers PH, Steiner RD: Osteogenesis imperfecta. Annu Rev Med1992, 43:269-282.

24. Sykes B, Ogilvie D, Wordsworth P, Wallis G, Mathew C, Beighton P,Nicholls A, Pope F, Thompson E, Tsipouras P, et al.: Consistent link-

age of dominantly inherited osteogenesis imperfecta to thetype I collagen loci: COL1A1 and COL1A2. Am J Hum Genet1990, 46(2):293-307.

25. Byers PH, Shapiro JR, Rowe DW, David KE, Holbrook KA: Abnor-mal alpha 2-chain in type I collagen from a patient with aform of osteogenesis imperfecta. J Clin Invest 1983,71(3):689-697.

26. Falk C, Schwartz R, Ramirez F, Tsipouras P: Use of molecular hap-lotypes specific for the human pro alpha 2(I) collagen gene inlinkage analysis of the mild autosomal dominant forms ofosteogenesis imperfecta. Am J Hum Genet 1986, 38(3):269-279.

27. Nicholls A, Pope F, Craig D: An abnormal collagen alpha chaincontaining cysteine in autosomal dominant osteogenesisimperfecta. Br Med J (Clin Res Ed). 1984, 288(6411):112-113.

28. Sykes B, Ogilvie D, Wordsworth P, Anderson , Jones N: Osteogen-esis imperfecta is linked to both type I collagen structuralgenes. Lancet 1986, 2(8498):69-72.

29. McBride D, Streeten E, Mitchell B, Shuldiner A: Variable expressiv-ity of a COL1A2 gly-610-cys mutation in a large Amish ped-igree. Am J Hum Genet 2002, 71(Supplement):1047.

30. Wieder R, Wetmur JG: Factors affecting the kinetics of DNAreassociation in phenol-water emulsion at high DNA con-centrations. Biopolymers 1982, 21(3):665-677.

31. Kohne DE, Levison SA, Byers MJ: Room temperature method forincreasing the rate of DNA reassociation by many thousand-fold: the phenol emulsion reassociation technique. Biochemis-try 1977, 16(24):5329-5341.

32. Broman KW, Murray JC, Sheffield VC, White RL, Weber JL: Com-prehensive human genetic maps: individual and sex-specificvariation in recombination. Am J Hum Genet 1998,63(3):861-869.

33. Dib C, Faure S, Fizames C, Samson D, Drouot N, Vignal A, MillasseauP, Marc S, Hazan J, Seboun E, et al.: A comprehensive genetic mapof the human genome based on 5,264 microsatellites. Nature1996, 380(6570):152-154.

34. Solinas-Toldo S, Lampel S, Stilgenbauer S, Nickolenko J, Benner A,Dohner H, Cremer T, Lichter P: Matrix-based comparativegenomic hybridization: biochips to screen for genomicimbalances. Genes Chromosomes Cancer 1997, 20(4):399-407.

35. Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, Collins C,Kuo WL, Chen C, Zhai Y, et al.: High resolution analysis of DNAcopy number variation using comparative genomic hybridi-zation to microarrays. Nat Genet 1998, 20(2):207-211.

36. Acharya S, Foster PL, Brooks P, Fishel R: The coordinated func-tions of the E. coli MutS and MutL proteins in mismatchrepair. Mol Cell 2003, 12(1):233-246.

37. Brooks P: MutS-DNA interactions and DNase protectionanalysis with surface plasmon resonance. In Methods in Molecu-lar Biology: DNA Repair Protocols: Procaryotic Systems Volume 152. Editedby: Vaughan P. Totowa, NJ: Humana; 2000:119-132.

38. Galio L, Bouquet C, Brooks P: ATP hydrolysis-dependent forma-tion of a dynamic ternary nucleoprotein complex with MutSand MutL. Nucleic Acids Res 1999, 27(11):2325-2331.

39. Botwell D, Sambrook J, eds: DNA Microarrays: A MolecularCloning Manual. Cold Spring Harbor Laboratory Press, ColdSpring Harbor, NY; 2003.

40. Brinkman RR, Dube MP, Rouleau GA, Orr AC, Samuels ME: Humanmonogenic disorders – a source of novel drug targets. Nat RevGenet 2006, 7(4):249-260.

41. Antonarakis SE, Beckmann JS: Mendelian disorders deservemore attention. Nat Rev Genet 2006, 7(4):277-282.

42. Thomas A, Camp NJ, Farnham JM, Allen-Brady K, Cannon-AlbrightLA: Shared genomic segment analysis. Mapping disease pre-disposition genes in extended pedigrees using SNP genotypeassays. Ann Hum Genet 2008, 72(Pt 2):279-287.

43. Kong A, Masson G, Frigge ML, Gylfason A, Zusmanovich P, Thorleif-sson G, Olason PI, Ingason A, Steinberg S, Rafnar T, et al.: Detectionof sharing by descent, long-range phasing and haplotypeimputation. Nat Genet 2008, 40(9):1068-1075.

44. Marini JC, Forlino A, Cabral WA, Barnes AM, San Antonio JD, Mil-grom S, Hyland JC, Korkko J, Prockop DJ, De Paepe A, et al.: Con-sortium for osteogenesis imperfecta mutations in the helicaldomain of type I collagen: regions rich in lethal mutationsalign with collagen binding sites for integrins and proteogly-cans. Hum Mutat 2007, 28(3):209-221.

Page 13 of 14(page number not for citation purposes)

Page 14: Robust physical methods that enrich genomic regions identical by descent for linkage studies: confirmation of a locus for osteogenesis imperfecta

BMC Genetics 2009, 10:16 http://www.biomedcentral.com/1471-2156/10/16

Publish with BioMed Central and every scientist can read your work free of charge

"BioMed Central will be the most significant development for disseminating the results of biomedical research in our lifetime."

Sir Paul Nurse, Cancer Research UK

Your research papers will be:

available free of charge to the entire biomedical community

peer reviewed and published immediately upon acceptance

cited in PubMed and archived on PubMed Central

yours — you keep the copyright

Submit your manuscript here:http://www.biomedcentral.com/info/publishing_adv.asp

BioMedcentral

45. Li L, Pettit AR, Gregory LS, Forwood MR: Regulation of bone biol-ogy by prostaglandin endoperoxide H synthases (PGHS): arose by any other name. Cytokine Growth Factor Rev 2006,17(3):203-216.

46. Lau KH, Kapur S, Kesavan C, Baylink DJ: Up-regulation of theWnt, estrogen receptor, insulin-like growth factor-I, andbone morphogenetic protein pathways in C57BL/6J osteob-lasts as opposed to C3H/HeJ osteoblasts in part contributesto the differential anabolic response to fluid shear. J Biol Chem2006, 281(14):9576-9588.

47. McBride DJ, Carleton S, Phillips C, Kouznetsova N, Leikin S, ShapiroJ, Mitchell B, Shuldiner A, Streeten E: COL1A2 G610C mice: aknock-in mouse model based on a large human OI kindredwith phenotype variation. 9th International Meeting on OsteogenesisImperfecta: 2005; Annapolis, Maryland: NICHD 2005.

48. Carleton SM, McBride DJ, Carson WL, Huntington CE, Twenter KL,Rolwes KM, Winkelmann CT, Morris JS, Taylor JF, Phillips CL: Roleof genetic background in determining phenotypic severitythroughout postnatal development and at peak bone mass inCol1a2 deficient mice (oim). Bone 2008, 42(4):681-694.

49. Altshuler D, Daly M, Kruglyak L: Guilt by association. Nat Genet2000, 26(2):135-137.

50. Risch NJ: Searching for genetic determinants in the new mil-lennium. Nature 2000, 405(6788):847-856.

51. Donnelly P: Progress and challenges in genome-wide associa-tion studies in humans. Nature 2008, 456(7223):728-731.

52. Curtis D, Vine AE, Knight J: A pragmatic suggestion for dealingwith results for candidate genes obtained from genome wideassociation studies. BMC Genet 2007, 8:20.

53. Zaykin DV, Zhivotovsky LA: Ranks of genuine associations inwhole-genome scans. Genetics 2005, 171(2):813-823.

54. Leibon G, Rockmore DN, Pollak MR: A SNP streak model for theidentification of genetic regions identical-by-descent. StatAppl Genet Mol Biol 2008, 7(1):Article16.

55. Shendure J, Ji H: Next-generation DNA sequencing. Nat Biotech-nol 2008, 26(10):1135-1145.

56. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J,Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, et al.: Accuratewhole human genome sequencing using reversible termina-tor chemistry. Nature 2008, 456(7218):53-59.

57. Philippi A, Roschmann E, Tores F, Lindenbaum P, Benajou A, Ger-main-Leclerc L, Marcaillou C, Fontaine K, Vanpeene M, Roy S, et al.:Haplotypes in the gene encoding protein kinase c-beta(PRKCB1) on chromosome 16 are associated with autism.Mol Psychiatry 2005, 10(10):950-960.

58. Cairns J: The bacterial chromosome and its manner of repli-cation as seen by autoradiography. J Mol Biol 1963,6(3):208-213.

59. Beld M, Sol C, Goudsmit J, Boom R: Fractionation of nucleic acidsinto single-stranded and double-stranded forms. Nucleic AcidsRes 1996, 24(13):2618-2619.

60. Smalley SL, Woodward JA, Palmer CG: A general statisticalmodel for detecting complex-trait loci by using affected rel-ative pairs in a genome search. Am J Hum Genet 1996,58(4):844-860.

61. Fisher RA: A fuller theory of "junctions" in inbreeding. Heredity1954, 8(2):187-197.

62. Cannings C: The identity by descent process along the chro-mosome. Hum Hered 2003, 56(1–3):126-130.

63. Thomas A, Skolnick MH, Lewis CM: Genomic mismatch scanningin pedigrees. IMA J Math Appl Med Biol 1994, 11(1):1-16.

64. Donnelly KP: The probability that related individuals sharesome section of genome identical by descent. Theor Popul Biol1983, 23(1):34-63.

65. Lander E, Kruglyak L: Genetic dissection of complex traits:guidelines for interpreting and reporting linkage results. NatGenet 1995, 11(3):241-247.

66. Jouquand S, Andre C, Cheron A, Hitte C, Chuat JC, Galibert F: Usingthe fluorogenic 5' nuclease assay for high-throughput detec-tion of (CA)n repeats in radiation hybrid mapping. Biotech-niques 2000, 28(4):754-758.

67. Ginzinger DG, Godfrey TE, Nigro J, Moore DH 2nd, Suzuki S, Pallavi-cini MG, Gray JW, Jensen RH: Measurement of DNA copynumber at microsatellite loci using quantitative PCR analy-sis. Cancer Res 2000, 60(19):5405-5409.

Page 14 of 14(page number not for citation purposes)