Gene mapping and the human genome mapping project

P.F.R. Little

Department of Biochemistry, Imperial College of Science, Technology and Medicine, London, UK

Current Opinion in Cell Biology 1990, 2:478-484

Introduction

Human gene mapping has progressed in the last 20 years from a minor discipline of human genetics, itself a minor discipline of medicine, to the focus of a collaborative effort that in time will grow into the largest single collaborative project ever carried out in the biological sciences. This review is about that project: the human genome mapping project (HGMP).

The HGMP has as its ultimate goal the establishment of the complete DNA sequence of the human genome. There is widespread recognition that this cannot be achieved by the direct determination of DNA sequence in isolation, without introducing other frames of reference critical to both data acquisition and interpretation. It is also important to include in the sequence database the large volume of existing data on genes and DNA fragments, and their locations and associated pathologies. Central to the HGMP is a hierarchical accumulation of information about the genome, of which DNA sequence represents the most detailed level.

It is obvious that humans can never, for good ethical reasons, be experimentally manipulated: their genetic analysis must, of necessity, be through surrogates. The mouse has become the species of choice for this and the parallel analysis of its genome is integral to the HGMP. The genome of the nematode worm, Caenorhabditis elegans, is the focus of the largest current programme of DNA analysis and has become a test bed for several highly innovative techniques (Coulson et al, Proc Natl Acad Sci USA 1986, 83:7821-7825). This unparalleled technical position, combined with the undoubted biological interest in the nematode, has required its inclusion in the HGMP. Genetic analysis and manipulation of Saccharomyces cerevisiae makes it the most experimentally tractable eukaryotic organism and it is the target of a systematic mapping analysis at the cloned DNA level (Olson et al, Proc Natl Acad Sci USA 1986, 83:7826-7830). Escherichia coli occupies a similar position in the analysis of prokaryotes and is already mapped at the cloned DNA level (Kohara et al, Cell 1987, 50:495-508). For these reasons, both organisms are also included in the project.

It should be clear by now that the ‘human’ of HGMP is misleading; for human we should understand human, mouse, worm, yeast and bacteria. There is also a substantial interest in the fruit fly, Drosophila melanogaster, which has been a source of several gene families that appear to be central to genetic determination of development in vertebrates.

There are two major misconceptions about the HGMP. The first is that the project proposes to establish the whole DNA sequence by processes not dissimilar to the ‘shotgun’ sequencing of small genes. The second is that the project has as its sole goal the elucidation of all 3 × 10^9 base pairs of the human genome.

It seems likely that shotgun sequencing of the whole of the human genome is technically impossible. The DNAs of vertebrate genomes contain complex reiterated sequences, some of which would make the computational determination of the complete DNA sequence a hopeless problem if attempted in a 600 bp frame. The solution to this problem is to adopt a hierarchical approach, breaking down the genome into progressively smaller fragments and so avoiding the full complexity of the genome in any single analysis. This is the major structure of the HGMP, which is explicitly defined to encompass four levels of ‘mapping’ activity at increasing resolution (see Fig. 1). The construction of a restriction fragment length polymorphism (RFLP) map is the lowest level of resolution (5-10 Mb), a restriction site map is the next level (1-5 Mb), overlapping recombinant DNA clones the next (20-50 kb) and, finally, the DNA sequence is the ultimate resolution.

Each of these levels, together with their immediate and long-term usefulness and the technical problems associated with them, is discussed in turn.

Abbreviations: cDNA, complementary DNA; HGMP, human genome mapping project; PCR, polymerase chain reaction; RFLP, restriction fragment length polymorphism; STS, sequence tagging site; VNTR, variable number tandem repeat; YAC, yeast artificial chromosome.

© Current Biology Ltd ISSN 0955-0674

[Figure 1: schematic not reproduced. Labels recoverable from the original: Chromosome; RFLP; a level marked 40-300 kb; DNA sequence.]

Fig. 1. The levels of analysis of the human genome mapping project. Sizes next to each figure indicate the approximate physical resolution of each approach.

The RFLP map

A very large body of data on the RFLP map of individual chromosomes has been published [1]. Two problems have to be overcome. Firstly, the distribution of markers is neither sufficiently dense (about 20 cM) nor sufficiently uniform to provide useful coverage of the complete genome. The immediate goal is to increase the resolution toward 5 cM, approximating to perhaps one marker per 10 Mb. Secondly, the informativeness of the polymorphic markers is variable, and the generation of markers with multi-allelic, highly polymorphic characteristics is seen as being of major technical importance. Highly polymorphic minisatellite repeats (Jeffreys et al, Nature 1985, 316:76) or variable number tandem repeats (VNTRs; Nakamura et al, Science 1987, 235:273-289) and microsatellite sequences (Miesfeld et al, Nucleic Acids Res 1981, 9:5931-5974) are very promising candidates for these markers. The distribution of VNTR sequences in the genome has a marked bias toward telomeric location in humans (Royle et al, Genomics 1988, 3:352-360) but microsatellites appear to be widely distributed. Microsatellites are simple di- or mononucleotide short tandem repeats. Variation in repeat number is a common characteristic of these repeats, presumably generated by replication slippage. Polymorphism is detected by use of flanking, unique sequence probes in polymerase chain reactions (PCRs) [2] and the number of alleles and frequency of polymorphism is ideal for use in RFLP maps. There are about 50 000 microsatellites in human DNA and their distribution appears not to be clustered. It is likely that this class of sequence will be central to the successful construction of useful RFLP maps, as similar repeats are found in the genome of the mouse and many other species.
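The figures quoted for microsatellites can be put in perspective with a short calculation, and the idea of an allele defined by its repeat number can be illustrated with a simple pattern search. The following is a minimal sketch in Python, not a description of any laboratory protocol: the function names, the (CA)n repeat unit and the five-unit minimum are illustrative assumptions, and the only numbers taken from the text are the genome size and the microsatellite count.

```python
import re

GENOME_BP = 3_000_000_000   # haploid genome size quoted in the text (3 x 10^9 bp)
MICROSATELLITES = 50_000    # approximate microsatellite count quoted in the text


def mean_spacing_kb(genome_bp: int, n_markers: int) -> float:
    """Average distance between markers if they were spread evenly."""
    return genome_bp / n_markers / 1_000


def count_repeat_units(sequence: str, unit: str = "CA", min_units: int = 5) -> list:
    """Return the repeat-unit count of every tandem run of `unit`
    that is at least `min_units` long (hypothetical threshold)."""
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_units},}}")
    return [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(mean_spacing_kb(GENOME_BP, MICROSATELLITES), "kb between microsatellites on average")
    # Two made-up alleles that differ only in repeat number.
    allele_1 = "GGT" + "CA" * 12 + "TTGAG"
    allele_2 = "GGT" + "CA" * 15 + "TTGAG"
    print(count_repeat_units(allele_1), count_repeat_units(allele_2))
```

On these assumptions the average spacing is far finer than the 5 cM target discussed above, which is consistent with the view that only a subset of microsatellites would need to be developed as markers.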

The current belief is that the RFLP map of the human genome will have two major applications: firstly, it will allow the positioning of markers flanking known disease-causing loci and, secondly, it will provide a broad mapping framework to facilitate the construction of the next ‘lower’ level of the HGMP, the restriction map.

The restriction map

Central to this area of work is the use of restriction enzymes that cleave at rare sites within human or other complex DNA and the analysis of DNA by Southern blotting of fragments resolved by electrophoresis through non-uniform voltage fields. Bucan et al [3] provide a good example of local mapping around the human Huntington’s disease gene. The large number (20) of probes used to cover 5 Mb of DNA should be noted.

Three further problems make such experiments difficult. Firstly, rare cutting sites are clustered within the mammalian genome: Brown and Bird (Nature 1986, 322:477-481) have pointed out that a substantial proportion of cleavage sites are located within CpG islands. A consequence of this is that large fragments are often bounded by several small fragments. Since many approaches rely upon probes that are collected at random from a chromosome or region, there remains the problem of bridging the gaps between large fragments. Secondly, many of the enzymes used are sensitive to the eukaryotic methylation of CpG dinucleotides. Whilst this can be advantageous in the generation of partial digests, the patterns are frequently unpredictable: that is, different cell lines from the same tissue will have different patterns of methylation. This makes the comparison between normal and mutant maps difficult, and also poses problems for comparison of maps generated in different laboratories. Finally, the construction of restriction maps requires high accuracy in size determination (so that, for example, it can be shown rigorously whether or not two fragments are identical). The lack of markers whose exact size is known in the 1-5 Mb range means that some computational methods for calculating restriction maps become unworkable, simply because of the uncertainty of sizing methods.

None of these problems is sufficiently serious that maps cannot be constructed; indeed, the construction of a NotI map of E. coli has been reported (Smith et al, Science 1987, 236:1448-1453). It is interesting to contrast this map with that of Kohara et al (Cell 1987, 50:495-508), published simultaneously. They used a clone analysis strategy to generate a map of sites of eight common, frequent-cutting enzymes, providing a graphic demonstration of the inherent power of such an approach.
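What makes an enzyme a ‘rare cutter’ can be illustrated with a back-of-envelope estimate. The sketch below assumes a deliberately naive model that is not taken from the text: all four bases are equally frequent and independent, so a recognition site of n base pairs is expected once every 4^n bp. Real mammalian DNA departs strongly from this (CpG depletion and the clustering in CpG islands noted above), so the numbers are a baseline only; the function names and the example genome size are hypothetical.

```python
def expected_fragment_bp(site_length: int) -> int:
    """Expected spacing between sites for an n-bp recognition sequence,
    assuming equal, independent base frequencies (a naive assumption)."""
    return 4 ** site_length


def expected_site_count(genome_bp: int, site_length: int) -> float:
    """Expected number of sites in a genome under the same naive model."""
    return genome_bp / expected_fragment_bp(site_length)


if __name__ == "__main__":
    for n in (4, 6, 8):
        print(f"{n}-bp site: one cut every {expected_fragment_bp(n):,} bp on average")
    # Hypothetical 5 Mb region, the size of the region mapped around one disease gene.
    print(expected_site_count(5_000_000, 8))
```

The gap between this baseline and the megabase fragments obtained in practice is one way of seeing how strongly site distribution in mammalian DNA is biased.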

Mapping projects have been targeted so far at specific human chromosomes: limiting project size reduces the problems of gap closure, because the smaller the project, the fewer the gaps, and the fewer the fragments, the easier the computation of a restriction map.

There are two main potential uses of large restriction maps: as a tool for establishing the physical linkage of RFLP or other markers, thus defining the amount of DNA that must be searched to locate a specific gene, and as a tool for orientating clone arrays. The latter application has not yet been implemented and it is possible that hybridization to yeast artificial chromosome (YAC) vectors (Burke et al, Science 1987, 236:806) will be more useful.

Cloned DNA maps: ‘contig’ mapping

A major feature of the HGMP is the construction of indexed sets of overlapping DNA clones that correspond to the genome [or complementary DNAs (cDNAs)] of the target organism (in this context, ‘indexed’ means that the position of each clone is recorded, so that the sequence, restriction sites or any other property of the clone can be held against it in a database).

DNAs can be isolated in either prokaryotic E. coli hosts and vectors or in eukaryotic YAC vectors. In practice, YACs are still at an early stage of use and the majority of cloned DNA analyses are carried out with conventional E. coli vector systems. This may change over the next few years and YACs are discussed in more detail below.

Overlapping clone sets can in principle be constructed either by analysis of restriction fragment digests of cloned DNAs or by using the actual sequence of DNA in a clone to relate it to other clones. These methods are discussed in turn.

Fragment analysis is of two kinds: firstly, by identification of restriction site maps of randomly selected clones by computer-aided analysis of fragments sized on agarose gels (Olson et al, Proc Natl Acad Sci USA 1986, 83:7826-7830) or by indirect end labelling of partial digests (Kohara et al, Cell 1987, 50:495-508); or secondly, by identification of a more limited set of ‘fingerprint’ sites (Coulson et al, Proc Natl Acad Sci USA 1986, 83:7821-7825). Clones that are derived from the same DNA sequence, or that overlap in the DNA that they contain, will have a statistically verifiable similarity of restriction map or fingerprint. Fig. 2 shows both methods schematically. Overlapping clones are defined as belonging to the same set of contiguous DNAs, or ‘contigs’, and the project proceeds by the identification initially of increasing numbers of contigs and later of decreasing numbers as contigs join together.

All three projects described above have used prokaryotic vector/host cloning systems, with λ phage having been used for E. coli (Kohara et al, 1987) and Saccharomyces cerevisiae (Olson et al, 1986), and cosmids for C. elegans (Coulson et al, 1986).

[Figure 2: schematic not reproduced. Panel (a): random clones, restriction map, align. Panel (b): random clones, fingerprint, assemble; three example clones numbered 1, 2 and 3.]

Fig. 2. The two major techniques of clone overlap analysis using restriction sites. (a) The restriction mapping method. (b) The fingerprint method. A subset of fragments from each clone is analysed by gel electrophoresis. Computerized comparison of random clones identifies similar patterns in different clones. Overlaps are assigned by identification of fragments that are, in this example, present only in clone 1, shared between 1 and 2, shared among 1, 2 and 3, shared between 2 and 3, or present only in 3 (each class is marked by its own symbol in the original figure). This defines the relationship of the three clones shown.
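The logic of fingerprint comparison and contig building described above can be sketched in a few lines of Python. This is an illustrative toy and not the algorithm used by any of the cited projects: fingerprints are modelled as sets of exactly matching band sizes (real gel data need a sizing tolerance and a statistical test), and the function names, the clone names and the 0.5 threshold are all assumptions made for the example.

```python
from itertools import combinations


def shared_fraction(a: set, b: set) -> float:
    """Fraction of the smaller fingerprint's bands that also occur in the other clone."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def build_contigs(fingerprints: dict, threshold: float = 0.5) -> list:
    """Group clones into contigs: clones linked by any chain of
    pairwise overlaps end up in the same contig (union-find)."""
    parent = {name: name for name in fingerprints}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        if shared_fraction(fingerprints[a], fingerprints[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in fingerprints:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Made-up band sizes (bp) for four hypothetical clones.
    clones = {
        "clone1": {310, 450, 620, 800},
        "clone2": {620, 800, 910, 1200},
        "clone3": {910, 1200, 1500, 1750},
        "clone4": {205, 333, 777, 2100},
    }
    print(build_contigs(clones))
```

In this example clones 1 and 3 share no bands but are placed in one contig because each overlaps clone 2, which mirrors the way contigs grow and then merge as more random clones are analysed.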

Analysis of clones picked at random cannot be carried out to completion (i.e. the whole genome in contigs with no gaps) on purely statistical grounds. Additionally, there are sequences in the genome of target organisms that appear to be unclonable or clonable only with difficulty in the E. coli host (discussed in Gibson et al, Gene 1987, 53:275-281). This results in the problem of gap closure. Coulson et al [4] have approached this by using YAC vectors, which have two features that are important in this context: they contain large DNA inserts of the order of 200-300 kb and they seem insensitive to the features that render DNA unclonable in E. coli. This enabled Coulson et al [4] to use hybridization of whole nematode cosmid recombinants to representative YAC libraries to identify cosmid contigs that, while not overlapping themselves, must derive from a region of DNA cloned in a single YAC.
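The statistical argument can be made concrete with the standard random-sampling model of library coverage, which is brought in here as an assumption and is not derived in the text. If N clones of insert length L are drawn at random from a genome of length G, the fraction of the genome left uncovered is approximately exp(-NL/G), which approaches zero but never reaches it. The genome and insert sizes below are hypothetical round numbers.

```python
import math


def coverage(n_clones: int, insert_bp: int, genome_bp: int) -> float:
    """Redundancy of the library: total cloned DNA divided by genome size."""
    return n_clones * insert_bp / genome_bp


def fraction_uncovered(n_clones: int, insert_bp: int, genome_bp: int) -> float:
    """Expected fraction of the genome present in no clone (Poisson approximation)."""
    return math.exp(-coverage(n_clones, insert_bp, genome_bp))


def clones_needed(p_covered: float, insert_bp: int, genome_bp: int) -> int:
    """Clones required for a given point to be cloned with probability p_covered."""
    return math.ceil(math.log(1 - p_covered) / math.log(1 - insert_bp / genome_bp))


if __name__ == "__main__":
    genome, insert = 100_000_000, 40_000  # hypothetical 100 Mb genome, 40 kb cosmid inserts
    for n in (2_500, 5_000, 12_500, 25_000):
        print(n, round(coverage(n, insert, genome), 1), fraction_uncovered(n, insert, genome))
    print(clones_needed(0.99, insert, genome))
```

Each doubling of the number of clones analysed buys a smaller absolute gain in coverage, which is why directed gap closure of the kind described above becomes necessary.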

The body of data on the nematode genome currently represents the most complete genomic analysis of a complex organism. The E. coli map is virtually complete and the yeast map is still under construction. Cosmid fingerprinting projects are also targeted at the genome of D. melanogaster and the plant Arabidopsis thaliana.

Two different methods use sequence similarity to construct cosmid overlaps. Evans and Lewis [5] have used ‘multiplex’ hybridization to construct a cosmid map of regions of the long arm of human chromosome 11. The method uses radiolabelled RNA probes complementary to the ends of inserted DNA and generated from T7 or T3 promoters located at the ends of the vector DNA. Pools of transcripts are hybridized to arrayed DNA samples of target clones, also isolated from mouse/rodent hybrid cells. By using probes constructed from pools of clones constructed in known combination, for example by pooling a vertical row A in one experiment and a horizontal row B in a second, it is possible to relate hybridization of particular clones in the array. The only clone common to a vertical and horizontal row is the clone at the intersection of the rows A and B; if clones not in row A or B hybridize with both probes made from A and probes made from B, then these clones must overlap with the clone at the intersection of A and B. This allows construction of contigs. The method is clearly sensitive to the presence of DNA repeats, although the most abundant repeats are stripped from the probes by pre-hybridization. It would seem difficult to apply this strategy to very large genomic regions.
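The row-and-column deduction described above can be written out as a small set operation. The sketch below is only an illustration of that reasoning under idealized conditions (no repeat-driven false positives, and each doubly positive clone assumed to overlap the intersection clone rather than two separate clones); the grid layout, clone names and function name are hypothetical.

```python
def infer_overlaps(grid, row, col, hits_row_pool, hits_col_pool):
    """Return the clone at the intersection of one row pool and one column pool,
    plus the clones outside both pools that hybridize to probes from both."""
    intersection = grid[row][col]
    in_row = set(grid[row])
    in_col = {line[col] for line in grid}
    candidates = (set(hits_row_pool) & set(hits_col_pool)) - in_row - in_col
    return intersection, sorted(candidates)


if __name__ == "__main__":
    # A made-up 3 x 3 array of clones.
    grid = [
        ["a1", "a2", "a3"],
        ["b1", "b2", "b3"],
        ["c1", "c2", "c3"],
    ]
    # Clones lighting up with probes pooled from row 0 and from column 1.
    hits_row = {"a1", "a2", "a3", "b1", "c3"}
    hits_col = {"a2", "b2", "c2", "c3"}
    print(infer_overlaps(grid, 0, 1, hits_row, hits_col))
```

Here clone c3 is positive with both pools and lies in neither, so on the reasoning above it is inferred to overlap a2, the clone at the intersection.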

An alternative, very promising approach is being pioneered by Lehrach and his colleagues [6], who are using sequence-specific hybridization of short oligonucleotide probes to cosmid DNAs immobilized on filters in an indexed array. If hybridization conditions and probe sequences can be identified such that with any given probe some set of clones hybridize, then by reiterative hybridization of probes, a statement of a given clone’s pattern of hybridization can be drawn up. With a few probes, clones will have similar patterns based upon random sequence sampling; however, the probability of two clones having, for example, 40 probe sequences in common, is intuitively slight. It would instead be reasonable to assume that the clones contain overlapping DNA sequences from the same region of the genome. This allows the generation of contigs. The process certainly requires computerized analysis, as do all the approaches detailed in this section. The major technical difficulty is in the identification of hybridization conditions that are specific for perfect hybrid formation and of sequences of the correct abundance in the genome to generate hybridizing colonies at the appropriate frequency.
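The intuition that many shared probes cannot arise by chance can be checked with a simple binomial model. The sketch below rests on assumptions that are not in the text: every probe hybridizes to any unrelated clone independently with the same probability, so the number of probes positive on both of two unrelated clones follows a binomial distribution with success probability equal to that value squared. The probe count and hit probability used in the example are hypothetical.

```python
from math import comb


def shared_probes(pattern_a, pattern_b) -> int:
    """Number of probes that hybridize to both clones."""
    return len(set(pattern_a) & set(pattern_b))


def prob_shared_at_least(k: int, n_probes: int, p_hit: float) -> float:
    """Chance that two unrelated clones share k or more positive probes,
    assuming independent probes that each hit a clone with probability p_hit."""
    q = p_hit ** 2  # both clones positive for the same probe
    return sum(
        comb(n_probes, i) * q ** i * (1 - q) ** (n_probes - i)
        for i in range(k, n_probes + 1)
    )


if __name__ == "__main__":
    # Hypothetical experiment: 200 probes, each positive on 20% of clones.
    for k in (5, 10, 20, 40):
        print(k, prob_shared_at_least(k, 200, 0.2))
    print(shared_probes({"p1", "p7", "p9"}, {"p7", "p9", "p30"}))
```

Under these assumptions a handful of shared probes is unremarkable, whereas a large shared set is vanishingly unlikely for unrelated clones, in line with the argument made above.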

YACs occupy an interesting position in the hierarchy of the genome mapping project. One widely held view sees them as replacements for E. coli cloning systems and there is no doubt that at one level YACs could replace cosmid cloning vectors. The identification of overlapping YAC clones by modification of fingerprinting techniques or by examination of PCR reactions primed from Alu or other repeat elements [7,8] has been widely discussed. The collection and analysis of data from agarose gel patterns is technically difficult and will require robust analytical methods; some discussions of this problem have ignored these difficulties. However, it seems likely that suitable computerized methods can be implemented. The second view of YACs is that they are primarily a mapping tool, not a cloning system. This is based upon the view that until a simple and reliable method for isolating usable quantities of any recombinant YAC in pure form is developed, each YAC will be contaminated by 10^7 bp of yeast DNA. The user is then faced with subcloning the YAC, which, while less challenging than cloning the whole genome, is nevertheless time consuming. It is important that clones provide a resource that is of immediate use to a researcher. Ideally, the DNAs should be easily isolated and of a size convenient for immediate routine manipulation or sequencing. This implies that YAC clones are less useful than cosmid, phage or plasmid clones.

Evidence is accumulating that YACs may clone regions of DNA that are unclonable in cosmid or phage vectors, but DNA isolated in that way will still be unclonable when it comes to the sorts of manipulation that still require analysis by growth in E. coli. This will remain a source of difficulty in large cloning programs of any type. The new vector systems with a capacity of 100 kb based upon the P1 phage [9] may be a very useful compromise. As yet they have not been widely used. It is my personal opinion that there is no one perfect cloning system for ‘contig’ mapping at present and that the exploration of the properties of all vector systems in relation to large human cloning projects is necessary.

Despite earlier optimism, there are no proposals to analyse the entire human genome by clone analysis. Currently, several cosmid contig mapping projects are underway but all are targeted either at parts of the genome or at single chromosomes separated by flow sorting. My own laboratory is engaged in mapping the short arm of human chromosome 11, using methods detailed in [10] and target clones picked from a mouse-human hybrid cell line that contains only the appropriate region of 11p. The group at Lawrence Livermore Laboratories (California, USA) is constructing a map of chromosome 19, analysing cosmid libraries constructed from DNA isolated from flow-sorted material and using an automated fluorescence label, fingerprint strategy [11]. There is a systematic program to isolate single human chromosome libraries in the Lawrence Livermore and the Los Alamos (New Mexico, USA) laboratories and these should provide a major resource [12].

It is clear that the construction of maps of complex DNAs at the level of cloned sequences is technically difficult but attainable as either a large, organizational level project, or as a more limited single laboratory endeavour. The application of robotic sample handling is one area that is under intensive exploration: there are several machines that are commercially available and handle arrays of clones in 96-well microtiter dishes. More sophisticated systems are under development for other features of the project.

The DNA sequence

The technical approaches to sequencing are outside the scope of this review but will undoubtedly rely heavily on automatic sequencers and robotic sample handling [13]. The major technical problem that is being resolved relates to the choice of radioactive or fluorescent labelling technology and to the accuracy that is required of the project and that is delivered by the various automated sequencing devices available.

It is important to emphasize that the clone arrays, discussed earlier, are the obvious targets for large-scale sequencing projects as they are small enough to overcome the technical problems of both middle and highly repetitive elements. The arrays also have biological information inherent in their structure: the chromosomal region or genes contained in the array are by definition known, and this allows a considered choice of where and what will be sequenced first.

It has also been suggested that the establishment of the sequence of cDNA copies of all messenger RNA molecules would be a major contribution to the project. It seems likely that both approaches will be initiated simultaneously.

The storage of information

The data handling and storage problems of the HGMP are complex but not unworkable. The hierarchical nature of the project lends itself to data storage in conventional relational database formats. These can be best understood as a series of ordinary databases, each database containing information relating to one or other of the levels of the HGMP ‘map’. Each entry in the database would contain a ‘pointer’ that indicates that information on this clone, region, RFLP or whatever, exists in one or more of the other databases. This information would then be displayed on screen. The tendency towards window usage in modern computing makes the idea even more attractive. Such a scheme would require a limited number of databases: cytological and pathological information, RFLP map, restriction-site map, clone array map, gene map and finally DNA sequence. The major problems are standardization of database systems, generation of transparent (to the user) programs that allow movement between the constituent databases, and the location and relationships of the centres that collate the incoming information. General access to databases will require some large host computer systems.
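The ‘pointer’ idea described above maps naturally onto foreign keys in a relational database. The following is a minimal sketch using Python's built-in sqlite3 module; it is one possible arrangement and not a proposed HGMP schema. Every table name, column name and data value is a hypothetical placeholder, and only four of the levels listed above are modelled.

```python
import sqlite3

# Each level of the map is a table; REFERENCES columns act as the 'pointers'.
SCHEMA = """
CREATE TABLE restriction_fragment (
    id INTEGER PRIMARY KEY, chromosome TEXT, enzyme TEXT, size_kb INTEGER);
CREATE TABLE rflp_marker (
    id INTEGER PRIMARY KEY, name TEXT,
    fragment_id INTEGER REFERENCES restriction_fragment(id));
CREATE TABLE clone (
    id INTEGER PRIMARY KEY, name TEXT, vector TEXT,
    fragment_id INTEGER REFERENCES restriction_fragment(id));
CREATE TABLE dna_sequence (
    id INTEGER PRIMARY KEY, bases TEXT,
    clone_id INTEGER REFERENCES clone(id));
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding a few made-up records."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO restriction_fragment VALUES (1, '11', 'NotI', 1200)")
    db.execute("INSERT INTO rflp_marker VALUES (1, 'marker_X', 1)")
    db.executemany(
        "INSERT INTO clone VALUES (?, ?, ?, ?)",
        [(1, "cos_001", "cosmid", 1), (2, "cos_002", "cosmid", 1)],
    )
    db.execute("INSERT INTO dna_sequence VALUES (1, 'GATTACA', 1)")
    return db


def clones_near_marker(db: sqlite3.Connection, marker_name: str) -> list:
    """Follow the pointers from an RFLP marker down to clones and any sequence."""
    rows = db.execute(
        """
        SELECT clone.name, dna_sequence.bases
        FROM rflp_marker
        JOIN clone ON clone.fragment_id = rflp_marker.fragment_id
        LEFT JOIN dna_sequence ON dna_sequence.clone_id = clone.id
        WHERE rflp_marker.name = ?
        ORDER BY clone.name
        """,
        (marker_name,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    print(clones_near_marker(build_demo_db(), "marker_X"))
```

A single query moves from the coarsest level in the sketch (a marker) to the finest (a sequence), with an empty sequence field where a clone has not yet been sequenced, which is the kind of transparent movement between levels called for above.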

The storage of clones

Considerable controversy is building up about how and if clones should be stored as reference libraries. Some libraries will inevitably be stored as indexed sets during the course of construction of clone arrays. However, Olson et al [14] have suggested that, in the long term, the storage of libraries and access by the scientific community is neither technically possible (libraries are of finite lifetime in storage) nor financially justifiable (screening, analysis and distribution is too expensive). This led to the idea of sequence tagging sites (STSs). If DNA sequences of about 500 bp could be determined at a regular spacing (e.g. every 50 000 bp) in the genome, then PCR primers could be used to generate probes that any laboratory could use to screen any appropriate library (pathological, normal or cell specific) to isolate the appropriate clone. The clones themselves would not have to be stored in any form other than a computer database of STS sequences at defined relative positions. The idea is attractive but in the initial phases of the project, before the DNA sequence makes probing unnecessary, it is not obvious how a given DNA fragment, such as a flanking RFLP or random cDNA clone, can be used to enter the STS database. Unless the RFLP or cDNA is itself an STS sequence, the researcher is condemned, on average, to produce 25 000 bp of sequence to finally identify the STS that will allow access to the clone array. In practice, STSs will have to exist alongside physically stored arrays for some time to come.
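The 25 000 bp figure follows from simple arithmetic if a fragment is assumed to land at a uniformly random point between two STSs and sequencing proceeds in one direction until the next STS is reached. That reading is an interpretation and is not spelled out in the text; the sketch below checks it analytically and by simulation. The function names, trial count and random seed are arbitrary choices.

```python
import random

GENOME_BP = 3_000_000_000   # genome size quoted earlier in the review
STS_SPACING_BP = 50_000     # example spacing quoted in the text


def sts_count(genome_bp: int, spacing_bp: int) -> int:
    """Number of STSs needed to tile the genome at the given spacing."""
    return genome_bp // spacing_bp


def mean_walk_one_direction(spacing_bp: int) -> float:
    """Expected distance from a random point to the next STS in one direction."""
    return spacing_bp / 2


def simulate_walk(spacing_bp: int, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    return sum(rng.uniform(0, spacing_bp) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(sts_count(GENOME_BP, STS_SPACING_BP), "STSs at 50 kb spacing")
    print(mean_walk_one_direction(STS_SPACING_BP), "bp expected walk")
    print(round(simulate_walk(STS_SPACING_BP)), "bp by simulation")
```

The same arithmetic shows the scale of the database itself: tens of thousands of STS entries at the spacing used as an example above.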

The uses of the component parts

It is central to the structure of the HGMP that information will emerge before the project is completed: that is, some or all of the component parts are themselves useful. RFLP linkage analysis is the simplest method for identifying the position of, and ultimately isolating, genes that cause genetic diseases in humans (reviewed by Smith and Goodfellow, Curr Opin Cell Biol 1989, 1:460-465). The cloning of the cystic fibrosis ‘gene’ [15] is perhaps the best example of the general approach, and the genes that cause the most common human genetic diseases have all been cloned. The frequency of many of the 2500 genetic diseases of humans is unknown. It has been calculated that, under optimal conditions and with a 20 cM map, 21 informative individuals are needed to identify linkage (Botstein et al, Am J Human Genet 1980, 32:314-331). More sensitive analyses can reduce this figure by perhaps 10-20% (Lander and Botstein, Proc Natl Acad Sci USA 1986, 83:7353-7357). Even so, it will probably not be possible to link very rare genetic diseases to a more detailed map. The further refinement of the genetic map with more informative polymorphisms would seem desirable (e.g. for sibling analysis) but it is not obvious at what resolution an RFLP map ceases to be useful. The other role of providing order to contigs could in principle be supplanted by the use of in situ hybridization with whole cosmid or YAC clones, which is proving extremely powerful [16].

The restriction map will define the size of a region that is flanked by RFLP markers and, hence, define a region in which to search for genes of biological importance in any study. The cloned DNA arrays then provide the major resource for gene isolation by giving access to cloned DNA surrounding any particular point in the genome. In the early stages of the project, identification of these clones will be by direct hybridization or PCR analysis of pools of clones. Later, as the project develops and arrays become larger and more complete, access to the clones could be by sequence, restriction site or chromosomal location. The nematode database is already used in this fashion [17]. Once sufficient DNA sequence has accumulated, some of the functions of the lower resolution levels will disappear and direct analysis of long DNA sequences will be possible.

The uses of the complete DNA sequence of any organism are too complex a subject to cover here in depth. The obvious use is to allow the analysis of genes without the necessity of cloning (described above) but, in the long term, extensive or complete DNA sequences will allow the cataloguing and classification of all genes in an organism. This is not a trivial technical proposition: the identification of exon blocks in the complete DNA sequence of, for example, the 2 × 10^6 bp dystrophin gene is probably beyond our current technical capabilities. The most important point is that the sequence of multiple organisms will in the end be available, allowing direct ‘transport’ of genetic information and concepts between different species. By an understanding of syntenic and other kinds of evolutionary relationship, we should be able to accumulate very significant bodies of data.

Conclusion

There is no doubt that all parts of the HGMP are technically demanding and expensive but certainly its goals are attainable. It raises some complex and serious ethical problems: the sequence of the human genome will become the reference against which variation can be measured and this information is certainly capable of being abused. These issues are beginning to be addressed and should be part of a significant public debate.

There is an old idea from political philosophy that quantitative changes accumulate and ultimately result in a qualitative change. The HGMP represents just such a situation and its effects upon our understanding of biological processes will be profound.

Annotated references and recommended reading

• Of interest
•• Of outstanding interest

1. • WHITE RL, LALOUEL J-M, NAKAMURA Y, DONIS-KELLER H, GREEN P, BOWDEN D, MATHEW C, EASTON D, ROBSON E, MORTON NE, GUSELLA J, HAINES JL, RETIEF AE, KIDD K, MURRAY JC, LATHROP M, CANN HM: The CEPH consortium primary linkage map of human chromosome 10. Genomics 1990, 6:393-412.

A good example of the large collaborative nature of the RFLP project. The map of chromosome 10 is at 7-11 cM resolution and involves 37 probes.

2. • WEBER JL, MAY PE: Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am J Hum Genet 1989, 44:388-396.

Describes the use of simple (dC-dA)n and (dG-dT)n repeats as polymorphic markers detectable by flanking PCR probes. This, and similar papers, are the basis of the current interest in these sequences for expansion of the RFLP map of humans and mice.

3. • BUCAN M, ZIMMER M, WHALEY WL, POUSTKA A, YOUNGMAN S, ALLITTO BA, ORMONDROYD E, SMITH B, POHL TM, MACDONALD M, BATES GP, RICHARDS J, VOLINIA S, GILLIAM TC, SEDLACEK Z, COLLINS FS, WASMUTH JJ, SHAW DJ, GUSELLA JF, FRISCHAUF A-M, LEHRACH H: Physical map of 4p16.3, the area expected to contain the Huntington disease mutation. Genomics 1990, 6:1-15.

The restriction map of this region could only be constructed by using jumping and linking clones in addition to random probes; this is a graphic example of the complexity of the apparently simple map construction.

4. •• COULSON A, WATERSTON R, KIFF J, SULSTON J: Genome linking with yeast artificial chromosomes. Nature 1988, 335:184-186.

Describes the use of YACs to link cosmid contigs of nematode DNA by hybridization of end clones of contigs to an indexed array of yeast cells harbouring YACs containing nematode DNA. This was the first systematic use of YACs as a mapping/cloning tool.

5. •• EVANS GA, LEWIS KA: Physical mapping of complex genomes by cosmid multiplex analysis. Proc Natl Acad Sci USA 1989, 86:5030-5034.

Describes the methods for linking small sets of cosmid clones together by end probe hybridization. A region of the long arm of human chromosome 11 was used. The paper is an important contribution to the general clone mapping techniques but it is not clear how large a project could be undertaken with such methods.

6. •• CRAIG A, NIZETIC D, HOHEISEL J, LENNON G, LEHRACH H: The construction of ordered clone libraries by hybridisation fingerprints. Cytogenet Cell Genet 1989, 51:1-4.

An abstract that sets out some of the properties of this very important whole-genome analysis method.

7. • NELSON DL, LEDBETTER SA, CORBO L, VICTORIA MF, RAMIREZ-SOLIS R, WEBSTER TD, LEDBETTER DH, CASKEY CT: Alu polymerase chain reaction: a method for rapid isolation of human-specific sequences from complex DNA sources. Proc Natl Acad Sci USA 1989, 86:6686-6690.

Details using Alu sequences as primers for PCR. Species specificity is dictated by choice of sequence and fragments correspond to the DNA between inverted Alu sequences. Only a single primer is required.

8. • LEDBETTER SA, NELSON DL, WARREN ST, LEDBETTER DH: Rapid isolation of DNA probes within specific chromosome regions by interspersed repetitive sequence polymerase chain reaction. Genomics 1990, 6:475-481.

Describes the use of PCR primed off interspersed repetitive sequences to generate probes from human DNAs in a complex background. This paper with [7] describes one of the technical methods for developing ‘PCR fingerprints’.

9. •• STERNBERG N: Bacteriophage P1 cloning system for the isolation, amplification and recovery of DNA fragments as large as 100 kilobase pairs. Proc Natl Acad Sci USA 1990, 87:103-107.

Describes the construction and use of the first new cloning system for prokaryotes for 10 years or so. This is potentially very important in expanding the cloning technology of large genome projects, as it may reduce the amount of work in clone array projects by a factor of 2.5.

10. • HARRISON-LAVOIE K, JOHN R, PORTEOUS DJ, LITTLE PFR: A cosmid clone map of a small region of human chromosome 11. Genomics 1989, 5:501-509.

A report on some of the technical approaches and problems identified in attempts to construct cosmid contigs by conventional fingerprint analysis. The target mouse/human hybrid cell line contains a small region of 11p15.

11. • CARRANO AV, LAMERDIN J, ASHWORTH LK, WATKINS B, SLEZAK T, RAFF M, DE JONG P, KEITH D, MCBRIDE L, MEISTER S, KRONICK M: A high resolution, fluorescence based, semi-automated method for DNA fingerprinting. Genomics 1989, 4:129-136.

The authors report on the application of automated sequencers and fluorescent DNA detection to construct fingerprint-based overlap of cosmid contigs. This is the start of a highly automated system for mapping under development in this laboratory.

12. • FUSCOE JC, MCNINCH JS, COLLINS CC, VAN DILLA MA: Human chromosome-specific DNA libraries: construction and purity analysis. Cytogenet Cell Genet 1989, 45:739-752.

Describes the current state of the project to construct libraries from all human chromosomes individually. Both phage and cosmid libraries are under construction.

13. • WILSON RK, CHEN C, AVDALOVIC N, BURNS J, HOOD L: Development of an automated procedure for fluorescent DNA sequencing. Genomics 1990, 6:626-634.

Describes the use of a widely available laboratory robot for setting up fluorescent DNA sequencing reactions. Similar protocols are available for various enzymatic and radioactive methods.

14. •• OLSON M, HOOD L, CANTOR C, BOTSTEIN D: A common language for physical mapping of the human genome. Science 1989, 245:1434-1435.

A letter that sets out the design requirements for the STS proposal. A clearly argued and deceptively attractive proposal!

15. •• ROMMENS JM, IANNUZZI MC, KEREM B, DRUMM ML, MELMER G, DEAN M, ROZMAHEL R, COLE JL, KENNEDY D, HIDAKA N, ZSIGA M, BUCHWALD M, RIORDAN JR, TSUI LC, COLLINS FS: Identification of the cystic fibrosis gene: chromosome walking and jumping. Science 1989, 245:1059-1065.

The isolation of the cystic fibrosis gene employed almost all the techniques of gene mapping in use for these sorts of project. None are original to this paper but it represents the best example yet of the application of technology to the isolation of disease-causing genes in humans. A masterly project.

16. • LICHTER P, TANG CJC, CALL K, HERMANSON G, EVANS GA, HOUSMAN D, WARD DC: High resolution mapping of human chromosome 11 by in situ hybridisation with cosmid clones. Science 1990, 247:64-69.

A very good example of a level of mapping that is becoming more widespread. YACs can be used in this way; to order YACs along a chromosome is technically possible, although no papers have yet been published.

17. •• BÜRGLIN TR, FINNEY M, COULSON A, RUVKUN G: Caenorhabditis elegans has scores of homeobox containing genes. Nature 1989, 341:239-243.

An example of the use of the nematode database, in this case providing genomic clones homologous to the ubiquitous hox sequence motif. These clones could immediately be linked to the genetic map via known contig locations. This is a perfect example of the application of an advanced genetic and physical database.