feature a vision for the future of genomics...

13
feature Francis S. Collins, Eric D. Green, Alan E. Guttmacher and Mark S. Guyer on behalf of the US National Human Genome Research Institute* The completion of a high-quality, comprehensive sequence of the human genome, in this fiftieth anniversary year of the discovery of the double-helical structure of DNA, is a landmark event. The genomic era is now a reality. In contemplating a vision for the future of genomics research,it is appropri- ate to consider the remarkable path that has brought us here. The rollfold (Figure 1) shows a timeline of land- mark accomplishments in genetics and genomics, beginning with Gregor Mendel’s discovery of the laws of heredity 1 and their rediscovery in the early days of the twentieth century.Recognition of DNA as the hereditary material 2 , determination of its structure 3 , elucidation of the genetic code 4 , development of recombinant DNA tech- nologies 5,6 , and establishment of increasingly automatable methods for DNA sequen- cing 7–10 set the stage for the Human Genome Project (HGP) to begin in 1990 (see also www.nature.com/nature/DNA50). Thanks to the vision of the original planners, and the creativity and determination of a legion of talented scientists who decided to make this project their overarching focus, all of the initial objectives of the HGP have now been achieved at least two years ahead of expectation, and a revolution in biological research has begun. The project’s new research strategies and experimental technologies have generated a steady stream of ever-larger and more com- plex genomic data sets that have poured into public databases and have transformed the study of virtually all life processes. The genomic approach of technology develop- ment and large-scale generation of commu- nity resource data sets has introduced an important new dimension into biological and biomedical research. Interwoven advances in genetics, comparative genomics, high- throughput biochemistry and bioinformatics NATURE | VOL 422 | 24 APRIL 2003 | www.nature.com/nature 835 are providing biologists with a markedly improved repertoire of research tools that will allow the functioning of organ- isms in health and disease to be analysed and comprehended at an unprecedented level of molecular detail. Genome sequences, the bounded sets of information that guide bio- logical development and function, lie at the heart of this revolution. In short, genomics has become a central and cohesive discipline of biomedical research. The practical consequences of the emer- gence of this new field are widely apparent. Identification of the genes responsible for human mendelian diseases, once a herculean task requiring large research teams, many years of hard work, and an uncertain out- come, can now be routinely accomplished in a few weeks by a single graduate student with access to DNA samples and associated phenotypes, an Internet connection to the public genome databases, a thermal cycler and a DNA-sequencing machine. With the recent publication of a draft sequence of the mouse genome 11 , identification of the mutations underlying a vast number of interesting mouse phenotypes has simi- larly been greatly simplified. Comparison of the human and mouse sequences shows that the proportion of the mammalian genome under evolu- tionary selection is more than twice that previously assumed. Our ability to explore genome function is increasing in specificity as each subsequent genome is sequenced. Microarray technologies have catapulted many laboratories from studying the expres- sion of one or two genes in a month to studying the expression of tens of thousands of genes in a single after- noon 12 . Clinical opportunities for gene-based pre-symptomatic prediction of illness and adverse drug response are emerging at a rapid pace, and the therapeutic promise of genomics has ushered in an exciting phase of expansion and exploration in the commercial sector 13 . The investment of the HGP in studying the ethical, legal and social implications of these scientific advances has created a talented cohort of scholars in ethics, law, social science, clinical research, theology and public policy, and has already resulted in substantial increases in public awareness and the introduction of significant (but still incomplete) protections against misuses such as genetic discrimination (see www.genome.gov/PolicyEthics). These accomplishments fulfil the expan- sive vision articulated in the 1988 report of the National Research Council, Mapping and Sequencing the Human Genome 14 . The suc- cessful completion of the HGP this year thus represents an opportunity to look forward and offer a blueprint for the future of genomics research over the next several years. The vision presented here addresses a different world from that reflected in earlier plans published in 1990, 1993 and 1998 (refs 15–17). Those documents addressed the goals of the 1988 report, defining detailed paths towards the development of genome- *Endorsed by the National Advisory Council for Human Genome Research, whose members are Vickie Yates Brown, David R. Burgess, Wylie Burke, Ronald W. Davis, William M. Gelbart, Eric T. Juengst, Bronya J. Keats, Raju Kucherlapati, Richard P. Lifton, Kim J. Nickerson, Maynard V. Olson, Janet D. Rowley, Robert Tepper, Robert H.Waterston and Tadataka Yamada. A vision for the future of genomics research A blueprint for the genomic era. DARRYL LEJA A. BARRINGTON BROWN/SPL © 2003 Nature Publishing Group

Upload: others

Post on 14-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: feature A vision for the future of genomics researchgenetics.wustl.edu/bio5488/files/2019/01/IntroductionReading_2_coll… · human mendelian diseases,once a herculean task requiring

feature

Francis S. Collins, Eric D. Green,Alan E. Guttmacher and Mark S.Guyer on behalf of the US NationalHuman Genome Research Institute*

The completion of a high-quality,comprehensive sequence of thehuman genome, in this fiftiethanniversary year of the discovery of thedouble-helical structure of DNA, is alandmark event. The genomic era isnow a reality.

In contemplating a vision for thefuture of genomics research,it is appropri-ate to consider the remarkable path thathas brought us here. The rollfold (Figure 1) shows a timeline of land-mark accomplishments in geneticsand genomics, beginning with GregorMendel’s discovery of the laws of heredity1

and their rediscovery in the early days of thetwentieth century.Recognition of DNA as thehereditary material2, determination of itsstructure3, elucidation of the genetic code4,development of recombinant DNA tech-nologies5,6, and establishment of increasinglyautomatable methods for DNA sequen-cing7–10 set the stage for the Human GenomeProject (HGP) to begin in 1990 (see alsowww.nature.com/nature/DNA50). Thanksto the vision of the original planners, andthe creativity and determination of a legionof talented scientists who decided to makethis project their overarching focus, all ofthe initial objectives of the HGP have nowbeen achieved at least two years ahead ofexpectation, and a revolution in biologicalresearch has begun.

The project’s new research strategies andexperimental technologies have generated asteady stream of ever-larger and more com-plex genomic data sets that have poured intopublic databases and have transformed thestudy of virtually all life processes. Thegenomic approach of technology develop-ment and large-scale generation of commu-nity resource data sets has introduced animportant new dimension into biological andbiomedical research. Interwoven advances in genetics, comparative genomics, high-throughput biochemistry and bioinformatics

NATURE | VOL 422 | 24 APRIL 2003 | www.nature.com/nature 835

are providing biologists with amarkedly improved repertoire of researchtools that will allow the functioning of organ-isms in health and disease to be analysed andcomprehended at an unprecedented level ofmolecular detail. Genome sequences, thebounded sets of information that guide bio-logical development and function, lie at theheart of this revolution. In short, genomicshas become a central and cohesive disciplineof biomedical research.

The practical consequences of the emer-gence of this new field are widely apparent.Identification of the genes responsible forhuman mendelian diseases,once a herculeantask requiring large research teams, manyyears of hard work, and an uncertain out-come, can now be routinely accomplished

in a few weeks by a single graduate studentwith access to DNA samples and associatedphenotypes, an Internet connection to thepublic genome databases, a thermal cyclerand a DNA-sequencing machine. With the

recent publication of a draft sequence ofthe mouse genome11, identification ofthe mutations underlying a vast number of interesting mouse phenotypes has simi-larly been greatly simplified. Comparison

of the human and mouse sequencesshows that the proportion of themammalian genome under evolu-

tionary selection is more than twice thatpreviously assumed.

Our ability to explore genome function isincreasing in specificity as each subsequent

genome is sequenced. Microarraytechnologies have catapulted manylaboratories from studying the expres-sion of one or two genes in a month to studying the expression of tens of

thousands of genes in a single after-noon12. Clinical opportunities for gene-based pre-symptomaticprediction of illness and adversedrug response are emerging at arapid pace, and the therapeuticpromise of genomics has usheredin an exciting phase of expansion

and exploration in the commercialsector13.The investment of the HGP in

studying the ethical, legal and socialimplications of these scientific advances

has created a talented cohort of scholars inethics, law, social science, clinical research,theology and public policy, and has alreadyresulted in substantial increases in publicawareness and the introduction of significant(but still incomplete) protections against misuses such as genetic discrimination (seewww.genome.gov/PolicyEthics).

These accomplishments fulfil the expan-sive vision articulated in the 1988 report ofthe National Research Council, Mapping andSequencing the Human Genome14. The suc-cessful completion of the HGP this year thusrepresents an opportunity to look forwardand offer a blueprint for the future ofgenomics research over the next several years.

The vision presented here addresses a different world from that reflected in earlierplans published in 1990, 1993 and 1998 (refs15–17). Those documents addressed thegoals of the 1988 report, defining detailedpaths towards the development of genome-

*Endorsed by the National Advisory Council for Human Genome

Research, whose members are Vickie Yates Brown, David R. Burgess,

Wylie Burke, Ronald W. Davis, William M. Gelbart, Eric T. Juengst,

Bronya J. Keats, Raju Kucherlapati, Richard P. Lifton, Kim J.

Nickerson, Maynard V. Olson, Janet D. Rowley, Robert Tepper,

Robert H. Waterston and Tadataka Yamada.

A vision for the future of genomics researchA blueprint for the genomic era.

DA

RR

YL

LEJA

A.B

AR

RIN

GT

ON

BR

OW

N/S

PL

© 2003 Nature Publishing Group

Page 2: feature A vision for the future of genomics researchgenetics.wustl.edu/bio5488/files/2019/01/IntroductionReading_2_coll… · human mendelian diseases,once a herculean task requiring

analysis technologies, the physical andgenetic mapping of genomes, and thesequencing of model organism genomesand, ultimately, the human genome. Now,with the effective completion of these goals,we offer a broader and still more ambitiousvision, appropriate for the true dawning ofthe genomic era. The challenge is to capital-ize on the immense potential of the HGP toimprove human health and well-being.

The articulation of a new vision is anopportunity to explore transformative newapproaches to achieve health benefits.Although genome-based analysis methodsare rapidly permeating biomedical research,the challenge of establishing robust paths

from genomic information to improvedhuman health remains immense. Currentefforts to meet this challenge are largelyorganized around the study of specific dis-eases, as exemplified by the missions of thedisease-oriented institutes at the US Nation-al Institutes of Health (NIH, www.nih.gov)and numerous national and internationalgovernmental and charitable organizationsthat support medical research. The NationalHuman Genome Research Institute(NHGRI), in budget terms a rather small(less than 2%) component of the NIH, willwork closely with all these organizations inexploring and supporting these biomedicalresearch capabilities. In addition, we envi-

sion a more direct role for both the extra-mural and intramural programmes of theNHGRI in bringing a genomic approach tothe translation of genomic sequence infor-mation into health benefits.

The NHGRI brings two unique assets tothis challenge.First, it has close ties to a scien-tific community whose direct role over thepast 13 years in bringing about the genomicrevolution provides great familiarity with itspotential to transform biomedical research.Second,the NHGRI’s long-standing mission,to investigate the broadest possible implica-tions of genomics,allows unique flexibility toexplore the whole spectrum of human healthand disease from the fresh perspective ofgenome science. By engaging the energeticand interdisciplinary genomics-researchcommunity more directly in health-relatedresearch and by exploiting the NHGRI’s abili-ty to pursue opportunities across all areas ofhuman biology, the institute seeks to partici-pate directly in translating the promises ofthe HGP into improved human health.

To fully achieve this goal, the NHGRImust also continue in its vigorous support ofanother of its vital missions — the couplingof its scientific research programme withresearch into the social consequences ofincreased availability of new genetic tech-nologies and information. Translating thesuccess of the HGP into medical advancesintensifies the need for proactive efforts toensure that benefits are maximized andharms minimized in the many dimensions of human experience.

A reader’s guideThe vision for genomics research detailedhere is the outcome of almost two years ofintense discussions with hundreds of scien-tists and members of the public, in more thana dozen workshops and numerous individ-ual consultations (see www.genome.gov/About/Planning). The vision is formulatedinto three major themes — genomics to biol-ogy, genomics to health, and genomics tosociety — and six crosscutting elements.

We envisage the themes as three floors of a building, firmly resting on the founda-tion of the HGP (Figure 2). For each theme,we present a series of grand challenges, in thespirit of the proposals put forward for math-ematics by David Hilbert at the turn of thetwentieth century18. These grand challengesare intended to be bold, ambitious researchtargets for the scientific community. Somecan be planned on specific timescales, othersare less amenable to that level of precision.We list the grand challenges in an order thatmakes logical sense, not representing priori-ty. The challenges are broad in sweep, notparochial — some can be led by the NHGRIalone, whereas others will be best pursued in partnership with other organizations.Below, we clarify areas in which the NHGRIintends to play a leading role.

feature

836 NATURE | VOL 422 | 24 APRIL 2003 | www.nature.com/nature

One of the key and distinctiveobjectives of the Human GenomeProject (HGP) has been the generationof large, publicly available,comprehensive sets of reagents and

data (scientific resources or ‘infrastructure’) that,along with other new, powerful technologies,comprise a toolkit for genomics-based research.Genomic maps and sequences are the mostobvious examples. Others include databases ofsequence variation, clone libraries and collectionsof anonymous cell lines. The continued generationof such resources is critical, in particular: ◆ Genome sequences of key mammals,vertebrates, chordates, and invertebrates◆ Comprehensive reference sets of codingsequences from key species in various formats,for example, full-length cDNA sequences andcorresponding clones, oligonucleotide primers,and microarrays

◆ Comprehensive collections of knockouts andknock-downs of all genes in selected animals toaccelerate the development of models of disease◆ Comprehensive reference sets of proteins fromkey species in various formats, for example inexpression vectors, with affinity tags and spottedonto protein chips◆ Comprehensive sets of protein affinity reagents◆ Databases that integrate sequences withcurated information and other large data sets, aswell as tools for effective mining of the data◆ Cohort populations for studies designed toidentify genetic contributors to health and toassess the effect of individual gene variants ondisease risk, including a ‘healthy’ cohort◆ Large libraries of small molecules, togetherwith robotic methods to screen them and access to medicinal chemistry for follow-up, to provide investigators easy and affordableaccess to these tools

BOX 1 Resources

Fig 2 The future of genomics rests on the foundation of the Human Genome Project.

© 2003 Nature Publishing Group

Page 3: feature A vision for the future of genomics researchgenetics.wustl.edu/bio5488/files/2019/01/IntroductionReading_2_coll… · human mendelian diseases,once a herculean task requiring

probably contains the bulk of the regulatoryinformation controlling the expression ofthe approximately 30,000 protein-codinggenes, and myriad other functional ele-ments, such as non-protein-coding genesand the sequence determinants of chromo-some dynamics.Even less is known about thefunction of the roughly half of the genomethat consists of highly repetitive sequences orof the remaining non-coding,non-repetitiveDNA.

The next phase of genomics is to cata-logue, characterize and comprehend theentire set of functional elements encoded inthe human and other genomes. Compilingthis genome ‘parts list’ will be an immensechallenge. Well-known classes of functional

elements, such as protein-coding sequences,still cannot be accurately predicted fromsequence information alone. Other types ofknown functional sequences, such as geneticregulatory elements, are even less wellunderstood; undoubtedly new types remainto be defined, so we must be ready to investi-gate novel, perhaps unexpected, ways inwhich DNA sequence can confer function.Similarly, a better understanding of epi-genetic changes (for example, methylationand chromatin remodelling) is needed tocomprehend the full repertoire of ways inwhich DNA can encode information.

Comparison of genome sequences fromevolutionarily diverse species has emerged as a powerful tool for identifying functionallyimportant genomic elements. Initial analysesof available vertebrate genome sequences7,11,19

have revealed many previously undiscoveredprotein-coding sequences. Mammal-to-mammal sequence comparisons have revealedlarge numbers of homologies in non-codingregions11, few of which can be defined in functional terms. Further comparisons ofsequences derived from multiple species,espe-cially those occupying distinct evolutionarypositions, will lead to significant refinementsin our understanding of the functional impor-tance of conserved sequences20. Thus, the generation of additional genome sequencesfrom several well-chosen species is crucial tothe functional characterization of the humangenome (Box 1). The generation of such largesequence data sets will benefit from furtheradvances in sequencing technology that yieldsignificant cost reductions (Box 2). The studyof sequence variation within species will alsobe important in defining the functional natureof some sequences (see Grand Challenge I-3).

The six critically important crosscuttingelements are relevant to all three thematicareas. They are: resources (Box 1); technolo-gy development (Box 2); computationalbiology (Box 3); training (Box 4); ethical,legal and social implications (ELSI, Box 5);and education (Box 6). We also stress thecritical importance of early, unfetteredaccess to genomic data in achieving maxi-mum public benefit. Finally, we propose aseries of ‘quantum leaps’, achievements thatwould lead to substantial advances ingenomics research and its applications tomedicine. Some of these may seem overlybold,but no laws of physics need to be violat-ed to achieve them. Such leaps would haveprofound implications, just as the dreams ofthe mid-1980s about the complete sequenceof the human genome have been realized inthe accomplishments now being celebrated.

I Genomics to biologyElucidating the structure and function of genomes The broadly available genome sequences ofhuman and a select set of additional organ-isms represent foundational information for biology and biomedicine. Embeddedwithin this as-yet poorly understood codeare the genetic instructions for the entirerepertoire of cellular components, knowl-edge of which is needed to unravel the complexities of biological systems. Elucidat-ing the structure of genomes and identifyingthe function of the myriad encoded elementswill allow connections to be made betweengenomics and biology and will, in turn,accelerate the exploration of all realms of thebiological sciences.

For this, new conceptual and technologi-cal approaches will be needed to:◆ Develop a comprehensive and com-

prehensible catalogue of all of thecomponents encoded in the humangenome.

◆ Determine how the genome-encodedcomponents function in an integratedmanner to perform cellular andorganismal functions.

◆ Understand how genomes change andtake on new functional roles.

Grand Challenge I-1 Comprehensivelyidentify the structural and functionalcomponents encoded in the humangenomeAlthough DNA is relatively simple and wellunderstood chemically,the human genome’sstructure is extraordinarily complex and itsfunction is poorly understood.Only 1–2% ofits bases encode proteins7, and the full com-plement of protein-coding sequences stillremains to be established. A roughly equiva-lent amount of the non-coding portion ofthe genome is under active selection11, sug-gesting that it is also functionally important,yet vanishingly little is known about it. It

feature

NATURE | VOL 422 | 24 APRIL 2003 | www.nature.com/nature 837

The Human Genome Project wasaided by several ‘breakthrough’technological developments, includingSanger DNA sequencing and itsautomation, DNA-based genetic

markers, large-insert cloning systems and thepolymerase chain reaction. During the project,these methods were scaled up and made moreefficient by ‘evolutionary’ advances, such asautomation and miniaturization. Newtechnologies, including capillary-basedsequencing and methods for genotyping single-nucleotide polymorphisms, have recently beenintroduced, leading to further improvements incapacity for genomic analyses. Even newerapproaches, such as nanotechnology andmicrofluidics, are being developed, and hold greatpromise, but further advances are still needed.Some examples are:◆ Sequencing and genotyping technologies toreduce costs further and increase access to awider range of investigators◆ Identification and validation of functional

elements that do not encode protein◆ In vivo, real-time monitoring of gene expressionand the localization, specificity, modification andactivity/kinetics of gene products in all relevantcell types◆ Modulation of expression of all gene productsusing, for example, large-scale mutagenesis,small-molecule inhibitors and knock-downapproaches (such as RNA-mediated inhibition)◆ Monitoring of the absolute abundance of any protein (including membrane proteins,proteins at low abundance and all modifiedforms) in any cell◆ Improved imaging methods that allow non-invasive molecular phenotyping◆ Correlating genetic variation to human healthand disease using haplotype information orcomprehensive variation information◆ Laboratory-based phenotyping, including theuse of protein affinity reagents, proteomicapproaches and analysis of gene expression◆ Linking molecular profiles to biology,particularly pathway biology to disease

BOX 2 Technology development

© 2003 Nature Publishing Group

Page 4: feature A vision for the future of genomics researchgenetics.wustl.edu/bio5488/files/2019/01/IntroductionReading_2_coll… · human mendelian diseases,once a herculean task requiring

Effective identification and analysis offunctional genomic elements will requireincreasingly powerful computational capa-bilities, including new approaches for tack-ling ever-growing and increasingly complexdata sets and a suitably robust computation-al infrastructure for housing, accessing andanalysing those data sets (Box 3). In parallel,investigators will need to become increasing-ly adept in dealing with this treasure trove ofnew information (Box 4). As a better under-standing of genome function is gained,refined computational tools for de novoprediction of the identity and behaviour offunctional elements should emerge21.

Complementing the computationaldetection of functional elements will be the generation of additional experimentaldata by high-throughput methodologies.One example is the production of full-length complementary DNA (cDNA)sequences (see, for example, mgc.nci.nih.govand www.fruitfly.org/EST/full.shtml). Majorchallenges inherent in programmes to dis-cover genes are the experimental identifica-tion and validation of alternate splice formsand messenger RNAs expressed in a highlyrestricted fashion. Even more challenging isthe experimental validation of functional ele-ments that do not encode protein (for exam-ple, regulatory regions and non-coding RNAsequences). High-throughput approaches to identify them (Box 2) will be needed togenerate the experimental data that will benecessary to develop, confirm and enhancecomputational methods for detecting func-tional elements in genomes.

Because current technologies cannot yet identify all functional elements, there is a need for a phased approach in which new methodologies are developed, tested on a pilot scale and finally applied to the

entire human genome. Along these lines,the NHGRI recently launched the Encyclo-pedia of DNA Elements (ENCODE) Project (www.genome.gov/Pages/Research/ENCODE) to identify all the functional elements in the human genome. In a pilotproject, systematic strategies for identifyingall functionally important genomic ele-ments will be developed and tested using aselected 1% of the human genome. Parallelprojects involving well-studied modelorganisms,for example,yeast,nematode andfruitfly, are ongoing. The lessons learned willserve as the basis for implementing a broaderprogramme for the entire human genome.

Grand Challenge I-2 Elucidate theorganization of genetic networks andprotein pathways and establish how they

contribute to cellular and organismalphenotypesGenes and gene products do not functionindependently, but participate in complex,interconnected pathways, networks andmolecular systems that, taken together, giverise to the workings of cells, tissues, organsand organisms. Defining these systems anddetermining their properties and inter-actions is crucial to understanding how biological systems function. Yet these systems are far more complex than any problem that molecular biology, genetics orgenomics has yet approached.On the basis ofprevious experience, one effective path willbegin with the study of relatively simplemodel organisms22, such as bacteria andyeast, and then extend the early findings tomore complex organisms, such as mouseand human. Alternatively, focusing on a fewwell-characterized systems in mammals willbe a useful test of the approach (see, forexample,www.signaling-gateway.org).

Understanding biological pathways, net-works and molecular systems will requireinformation from several levels.At the geneticlevel, the architecture of regulatory inter-actions will need to be identified in differentcell types, requiring, among other things,methods for simultaneously monitoring theexpression of all genes in a cell12. At the gene-product level, similar techniques that allow in vivo, real-time measurement of proteinexpression, localization, modification andactivity/kinetics will be needed (Box 2). It will be important to develop, refine and scaleup techniques that modulate gene expression,such as conventional gene-knockout meth-ods23, newer knock-down approaches24 andsmall-molecule inhibitors25 to establish thetemporal and cellular expression pattern ofindividual proteins and to determine thefunctions of those proteins. This is a key firststep towards assigning all genes and theirproducts to functional pathways.

The ability to monitor all proteins in a cellsimultaneously would profoundly improveour ability to understand protein pathwaysand systems biology. A critical step towardsgaining a complete understanding of sys-tems biology will be to take an accurate census of the proteins present in particularcell types under different physiological con-ditions. This is becoming possible in somemodel systems, such as microorganisms26.It will be a major challenge to catalogue proteins present in low abundance or inmembranes. Determining the absoluteabundance of each protein, including allmodified forms, will be an important nextstep. A complete interaction map of the pro-teins in a cell,and their cellular locations,willserve as an atlas for the biological and med-ical explorations of cellular metabolism27

(see www.nrcam.uchc.edu, for example).These and other related areas constitute thedeveloping field of proteomics.

feature

838 NATURE | VOL 422 | 24 APRIL 2003 | www.nature.com/nature

Computational methods have becomeintrinsic to modern biological research,and their importance can only increaseas large-scale methods for datageneration become more prominent, as

the amount and complexity of the data increase,and as the questions being addressed becomemore sophisticated. All future biomedical researchwill integrate computational and experimentalcomponents. New computational capabilities willenable the generation of hypotheses and stimulatethe development of experimental approaches totest them. The resulting experimental data will, inturn, be used to generate more refined models thatwill improve overall understanding and increaseopportunities for application to disease. The areasof computational biology critical to the future ofgenomics research include:◆ New approaches to solving problems, such asthe identification of different features in a DNAsequence, the analysis of gene expression and

regulation, the elucidation of protein structure andprotein–protein interactions, the determination ofthe relationship between genotype andphenotype, and the identification of the patternsof genetic variation in populations and theprocesses that produced those patterns◆ Reusable software modules to facilitateinteroperability◆ Methods to elucidate the effects ofenvironmental (non-genetic) factors and ofgene–environment interactions on health anddisease◆ New ontologies to describe different data types◆ Improved database technologies to facilitatethe integration and visualization of different datatypes, for example, information about pathways,protein structure, gene variation, chemicalinhibition and clinical information/phenotypes◆ Improved knowledge management systemsand the standardization of data sets to allow thecoalescence of knowledge across disciplines

BOX 3 Computational biology

© 2003 Nature Publishing Group

Page 5: feature A vision for the future of genomics researchgenetics.wustl.edu/bio5488/files/2019/01/IntroductionReading_2_coll… · human mendelian diseases,once a herculean task requiring

Establishing a true understanding of howorganized molecular pathways and networksgive rise to normal and pathological cellularand organismal phenotypes will requiremore than large,experimentally derived datasets. Once again, computational investiga-tion will be essential (Box 3), and there willbe a greatly increased need for the collection,storage and display of the data in robustdatabases. By modelling specific pathwaysand networks, predicting how they affectphenotype, testing hypotheses derived fromthese models and refining the models basedon new experimental data, it should be possible to understand more completely thedifference between a ‘bag of molecules’and afunctioning biological system.

Grand Challenge I-3 Develop a detailedunderstanding of the heritable variation inthe human genomeGenetics seeks to correlate variation in DNAsequence with phenotypic differences(traits). The greatest advances in humangenetics have been made for traits associatedwith variation in a single gene. But mostphenotypes, including common diseasesand variable responses to pharmacologicalagents, have a more complex origin, involv-ing the interplay between multiple geneticfactors (genes and their products) and non-genetic factors (environmental influences).Unravelling such complexity will requireboth a complete description of the geneticvariation in the human genome and thedevelopment of analytical tools for usingthat information to understand the geneticbasis of disease.

Establishing a catalogue of all commonvariants in the human population, includingsingle-nucleotide polymorphisms (SNPs),small deletions and insertions, and otherstructural differences, began in earnest several years ago. Many SNPs have been identified28, and most are publicly available(www.ncbi.nlm.nih.gov/SNP). A public collaboration, the International HapMapProject (www.genome.gov/Pages/Research/HapMap), was formed in 2002 to character-ize the patterns of linkage disequilibriumand haplotypes across the human genomeand to identify subsets of SNPs that capturemost of the information about these patternsof genetic variation to enable large-scalegenetic association studies.To reach fruition,such studies need more robust experimental(Box 2) and computational (Box 3) methodsthat use this new knowledge of human haplotype structure29.

A comprehensive understanding ofgeneticvariation, both in humans and in modelorganisms,would facilitate studies to establishrelationships between genotype and biologi-cal function. The study of particular variantsand how they affect the functioning of specificproteins and protein pathways will yieldimportant new insights about physiological

processes in normal and disease states. Anenhanced ability to incorporate informationabout genetic variation into human geneticstudies would usher in a new era for investigat-ing the genetic bases of human disease anddrug response (see Grand Challenge II-1).

Grand Challenge I-4 Understandevolutionary variation across species andthe mechanisms underlying itThe genome is a dynamic structure, continu-ally subjected to modification by the forces of evolution. The genomic variation seen in humans represents only a small glimpsethrough the larger window of evolution,where hundreds of millions of years of trial-and-error efforts have created today’s bio-

sphere of animal,plant and microbial species.A complete elucidation of genome functionrequires a parallel understanding of thesequence differences across species and thefundamental processes that have sculptedtheir genomes into the modern-day forms.

The study of inter-species sequence com-parisons is important for identifying func-tional elements in the genome (see GrandChallenge I-1). Beyond this illuminatingrole, determining the sequence differencesbetween species will provide insight into the distinct anatomical, physiological anddevelopmental features of different organ-isms, will help to define the genetic basis forspeciation and will facilitate the characteri-zation of mutational processes. This lastpoint deserves particular attention, becausemutation both drives long-term evolution-ary change and is the underlying cause ofinherited disease. The recent finding thatmutation rates vary widely across the mam-malian genome11 raises numerous questionsabout the molecular basis for these evolu-tionary changes.At present,our understand-ing of DNA mutation and repair, includingthe important role of environmental factors,is limited.

Genomics will provide the ability to sub-stantively advance insight into evolutionaryvariation, which will, in turn, yield newinsights into the dynamic nature of genomesin a broader evolutionary framework.

Grand Challenge I-5 Develop policyoptions that facilitate the widespread useof genome information in both researchand clinical settingsRealization of the opportunities provided bygenomics depends on effective access to the

feature

NATURE | VOL 422 | 24 APRIL 2003 | www.nature.com/nature 839

Meeting the scientific, medical andsocial/ethical challenges now facinggenomics will require scientists,clinicians and scholars with the skillsto understand biological systems and

to use that information effectively for the benefitof humankind. Adequate training capacity will berequired to address the following needs:◆ Computational skills As biomedical researchis becoming increasingly data intensive,computational capability is increasingly becominga critical skill. ◆ Interdisciplinary skills Although a good starthas been made, expanded interactions will berequired between the sciences (biology, computerscience, physics, mathematics, statistics,chemistry and engineering), between the basicand the clinical sciences, and between the lifesciences, the social sciences and the humanities.Such interactions will be needed at the individuallevel (scientists, clinicians and scholars will needto be able to bring relevant issues, concerns andcapabilities from different disciplines to bear on

their specific research efforts), at a collaborativelevel (researchers will need to be able toparticipate effectively in interdisciplinary researchcollaborations that bring biology together withmany other disciplines) and at the disciplinarylevel (new disciplines will need to emerge at theinterfaces between the traditional disciplines).◆ Different perspectives Individuals fromminority or disadvantaged populations aresignificantly under-represented as bothresearchers and participants in genomicsresearch. This regrettable circumstance deprivesthe field of the best and brightest from allbackgrounds, narrows the field of questionsasked, can lessen sensitivity to cultural concernsin implementing research protocols, andcompromises the overall effectiveness of theresearch. Genomics can learn from successfulefforts in training individuals from under-represented populations in other areas of scienceand health (see, for example,www.genome.gov/Pages/Grants/Policies/ActionPlanGuide).

BOX 4 Training

© 2003 Nature Publishing Group

Page 6: feature A vision for the future of genomics researchgenetics.wustl.edu/bio5488/files/2019/01/IntroductionReading_2_coll… · human mendelian diseases,once a herculean task requiring

information (such as data about genes, genevariants, haplotypes, protein structures,small molecules and computational models)by a wide range of potential users, includingresearchers, commercial enterprises, health-care providers, patients and the public.Researchers themselves need maximumaccess to the data as soon as possible (see‘Data release’,below).Use of the informationfor the development of therapeutic andother products necessarily entails considera-tion of the complex issues of intellectualproperty (for example, patenting and licens-ing) and commercialization.The intellectualproperty practices, laws and regulations thataffect genomics must adhere to the principleof maximizing public benefit, but must alsobe consistent with more general and longer-established intellectual property principles.Further, because genome research is global,international treaties, laws, regulations,practices, belief systems and cultures alsocome into play.

Without commercialization, most diag-nostic and therapeutic advances will notreach the clinical setting, where they can benefit patients. Thus, we need to developpolicy options for data access and for patent-ing, licensing and other intellectual propertyissues to facilitate the dissemination ofgenomics data.

II Genomics to healthTranslating genome-based knowledgeinto health benefitsThe sequencing of the human genome,along with other recent and expectedachievements in genomics, provides anunparalleled opportunity to advance ourunderstanding of the role of genetic factorsin human health and disease, to allow more precise definition of the non-geneticfactors involved, and to apply this insightrapidly to the prevention, diagnosis and

treatment of disease. The report by the USNational Research Council that originallyenvisioned the HGP was explicit in its expec-tation that the human genome sequencewould lead to improvements in humanhealth, and subsequent five-year plans reaffirmed this view15–17. But how this willhappen has been less clearly articulated.With the completion of the original goals of the HGP, the time is right to develop and apply large-scale genomic strategies to empower improvements in human health, while anticipating and avoidingpotential harm.

Such strategies should enable the researchcommunity to achieve the following:◆ Identify genes and pathways with a

role in health and disease, and deter-mine how they interact with environ-mental factors.

◆ Develop, evaluate and apply genome-

based diagnostic methods for the pre-diction of susceptibility to disease, theprediction of drug response, the earlydetection of illness and the accuratemolecular classification of disease.

◆ Develop and deploy methods thatcatalyse the translation of genomicinformation into therapeutic advances.

Grand Challenge II-1 Develop robuststrategies for identifying the geneticcontributions to disease and drug responseFor common diseases, the interplay of multi-ple genes and multiple non-genetic factors,not a single allele, usually dictates diseasesusceptibility and response to treatments.Deciphering the role of genes in humanhealth and disease is a formidable problemfor many reasons, including impediments to defining biologically valid phenotypes,challenges in identifying and quantifyingenvironmental exposures, technologicalobstacles to generating sufficient and usefulgenotypic information, and the difficultiesof studying humans.Yet this problem can besolved. Vigorous development of cross-cutting genomic tools to catalyse advances in understanding the genetics of commondisease and in pharmacogenomics is needed.Prominent among these will be a detailedhaplotype map of the human genome (see Grand Challenge I-3) that can be usedfor whole-genome association studies ofall diseases in all populations, as well as further advances in sequencing and genotyping technology to make such studiesfeasible (see ‘Quantum leaps’,below).

More efficient strategies for detectingrare alleles involved in common disease are also needed, as the hypothesis that allelesthat increase risk for common diseases arethemselves common30 will probably not beuniversally true. Computational and experi-mental methods to detect gene–gene and gene–environment interactions, as wellas methods allowing interfacing of a varietyof relevant databases, are also required (Box 3). By obtaining unbiased assessmentsof the relative disease risk that particulargene variants contribute,a large longitudinalpopulation-based cohort study, with collec-tion of extensive clinical information andongoing follow-up, would be profoundlyvaluable to the study of all common diseases(Box 1). Already, such projects as the UK Biobank (www.ukbiobank.ac.uk),the Marshfield Clinic’s Personalized Medi-cine Research Project (www.mfldclin.edu/pmrp) and the Estonian Genome Project(www.geenivaramu.ee) seek to provide suchresources. But if the multiple populationgroups in the United States and elsewhere inthe world are to benefit fully and fairly fromsuch research (see Grand Challenge II-6), alarge population-based cohort study thatincludes full representation of minoritypopulations is also needed.

feature

840 NATURE | VOL 422 | 24 APRIL 2003 | www.nature.com/nature

Today’s genomics research andapplications rest on more than adecade of valuable investigation intotheir ethical, legal and socialimplications. As the application of

genomics to health increases along with its social impact, it becomes ever more important to expand on this work. There is an increasingneed for focused ELSI research that directlyinforms policies and practices. One can envisage a flowering of ‘translational ELSIresearch’ that builds on the knowledge gained from prior and forthcoming ‘basic ELSI research’, which would provide knowledge for direct use by researchers,clinicians, policy-makers and the public.Examples include:◆ The development of models of genomicsresearch that use attention to these ELSI issues

for enhancing the research, rather than viewingsuch issues as impediments◆ The continued development of appropriate and effective genomics research methods andpolicies that promote the highest levels of science and of protecting human subjects◆ The establishment of crosscutting tools,analogous to the publicly accessible genomicmaps and sequence databases that haveaccelerated other genomics research (examplesof such tools might include searchable databasesof genomic legislation and policies from aroundthe world, or studies of ELSI aspects ofintroducing clinical genetic tests)◆ The evaluation of new genetic and genomictests and technologies, and effective oversight of their implementation, to ensure that only those with confirmed clinical validity are used forpatient care

BOX 5 Ethical, legal and social implications (ELSI)

© 2003 Nature Publishing Group

Page 7: feature A vision for the future of genomics researchgenetics.wustl.edu/bio5488/files/2019/01/IntroductionReading_2_coll… · human mendelian diseases,once a herculean task requiring

Grand Challenge II-2 Develop strategiesto identify gene variants that contribute togood health and resistance to diseaseMost human genetic research has tradition-ally focused on identifying genes that pre-dispose to illness. A relatively unexplored,but important, area of research focuses onthe role of genetic factors in maintaininggood health. Genomics will facilitate furtherunderstanding of this aspect of human biol-ogy and allow the identification of gene variants that are important for the mainten-ance of health, particularly in the presence ofknown environmental risk factors. One use-ful research resource would be a ‘healthycohort’, a large epidemiologically robustgroup of individuals (Box 1) with unusuallygood health, who could be compared withcohorts of individuals with diseases and whocould also be intensively studied to revealalleles protective for conditions such as dia-betes, cancer, heart disease and Alzheimer’sdisease. Another promising approach wouldbe rigorous examination of genetic variantsin individuals at high risk for specific dis-eases who do not develop them, such assedentary, obese smokers without heart dis-ease or individuals with HNPCC mutationswho do not develop colon cancer.

Grand Challenge II-3 Develop genome-based approaches to prediction of diseasesusceptibility and drug response, earlydetection of illness, and moleculartaxonomy of disease statesThe discovery of variants that affect risk fordisease could potentially be used in individ-ualized preventive medicine — includingdiet, exercise, lifestyle and pharmaceuticalintervention — to maximize the likelihoodof staying well. For example, the discovery ofvariants that correlate with successful out-comes of drug therapy, or with unfortunateside effects, could potentially be rapidlytranslated into clinical practice. Turning thisvision into reality will require the following:(1) unbiased determination of the risk associated with a particular gene variant,often overestimated in initial studies31;(2) technological advances to reduce the costof genotyping (Box 2; see ‘Quantum leaps’,below); (3) research on whether this kind ofpersonalized genomic information willactually alter health behaviours (see GrandChallenge II-5); (4) oversight of the imple-mentation of genetic tests to ensure that onlythose with demonstrated clinical validity areapplied outside of the research setting (Box5); and (5) education of healthcare profes-sionals and the public to be well-informedparticipants in this new form of preventivemedicine (Box 6).

The time is right for a focused effort tounderstand, and potentially to reclassify, allhuman illnesses on the basis of detailed mol-ecular characterization. Systematic analysesof somatic mutations, epigenetic modifica-

tions, gene expression, protein expressionand protein modification should allow thedefinition of a new molecular taxonomy ofillness, which would replace our present,largely empirical, classification schemes andadvance both disease prevention and treat-ment. The reclassification of neuromusculardiseases32 and certain types of cancer33 pro-vides striking initial examples, but manymore such applications are possible.

Such a molecular taxonomy would be thebasis for the development of better methodsfor the early detection of disease,which oftenallows more effective and less costly treat-ments. Genomics and other large-scaleapproaches to biology offer the potential for

developing new tools to detect many diseasesearlier than is currently feasible. Such ‘sentinel’ methods might include analysis ofgene expression in circulating leukocytes,proteomic analysis of body fluids, andadvanced molecular analysis of tissue biop-sies. An example would be the analysis ofgene expression in peripheral blood leuko-cytes to predict drug response. A focusedeffort to use a genomic approach to charac-terize serum proteins exhaustively in healthand disease might also be highly rewarding.

Grand Challenge II-4 Use newunderstanding of genes and pathways todevelop powerful new therapeuticapproaches to diseasePharmaceuticals on the market target fewerthan 500 human gene products34. Eventhough not all of the 30,000 or so humanprotein-coding genes7 will have productstargetable for drug development, this sug-gests that there is an enormous untappedpool of human gene-based targets for thera-peutic intervention. In addition, the newunderstanding of biological pathways pro-vided by genomics (see Grand Challenge I-2)should contribute even more fundamentallyto therapeutic design.

The information needed to determinethe therapeutic potential of a gene generallyoverlaps heavily with the information thatreveals its function. The success of imatinibmesylate (Gleevec), an inhibitor of the BCR-ABL tyrosine kinase, in treating chronicmyelogenous leukaemia relied on a detailedmolecular understanding of the disease’sgenetic cause35. This example offers promisethat therapies based on genomic informa-

feature

NATURE | VOL 422 | 24 APRIL 2003 | www.nature.com/nature 841

Marked health improvements fromintegrating genomics into individual andpublic health care depend on theeffective education of healthprofessionals and the public about the

interplay of genetic and environmental factors inhealth and disease. Health professionals must beknowledgeable about genomics to use theoutcomes of genomics research effectively. Thepublic must be knowledgeable to make informeddecisions about participation in genomics researchand to incorporate the findings of such researchinto their own health care. Both groups must beknowledgeable to engage profitably in discussionand decision-making about the societalimplications of genomics.

Promising models for genomics and genetics education exist (see, for example,www.nchpeg.org), but they must be expanded andnew models developed. We have entered a unique‘educable era’ regarding genomics; healthprofessionals and the public are increasinglyinterested in learning about genomics, but itswidespread application to health is still several

years away. For genomics-based health care to bemaximally effective once it is widely feasible, andfor members of society to make the best decisionsabout the uses of genomics, we must takeadvantage now of this unique opportunity toincrease understanding. Some examples are:◆ Health professionals vary, both individually andby discipline, in the amount and type of genomicseducation that they require. So multiple models ofeffective genomics-related education are needed.◆ Print, web and video educational products thatthe public can consume when actively seekinggenomic information should be created and madeeasily available.◆ The media are crucial sources of informationabout genomics and its societal implications.Initiatives to provide the media with greaterunderstanding of genomics are needed.◆ High-school students will be both the users ofgenomic information and the genomics researchersof the future. Especially as they educate all sectorsof society, high-school educators need informationand materials about genomics and its implicationsfor society, to use in their classrooms.

BOX 6 Education

© 2003 Nature Publishing Group

Page 8: feature A vision for the future of genomics researchgenetics.wustl.edu/bio5488/files/2019/01/IntroductionReading_2_coll… · human mendelian diseases,once a herculean task requiring

tion will be particularly effective. GrandChallenge I-1 describes the ‘functionation’ofthe genome, which will increasingly be thecritical first step in the development of newtherapeutics. But stimulating basic scientiststo approach biomedical problems with agenomic attitude is not enough. A therapeu-tic mindset, lacking in much of academicbiomedical research and training, must beexplicitly encouraged, and tools developedand provided for its implementation.

A particularly promising example of thegene-based approach to therapeutics is theapplication of ‘chemical genomics’25. Thisstrategy uses libraries of small molecules(natural compounds, aptamers or the products of combinatorial chemistry) andhigh-throughput screening to advanceunderstanding of biological pathways and to identify compounds that act as positive or negative regulators of individual gene products, pathways or cellular phenotypes.Although the pharmaceutical industryapplies this approach widely as the first stepin drug development, few academic investi-gators have access to this methodology or arefamiliar with its use.

Providing such access more broadly,through one or more centralized facilities,could lead to the discovery of a host of usefulprobes for biological pathways that wouldserve as new reagents for basic researchand/or starting points for the developmentof new therapeutic agents (the ‘hits’ fromsuch library screens will generally requiremedicinal chemistry modifications to yieldtherapeutically usable compounds).

Also needed are new, more powerfultechnologies for generating deep molecularlibraries, especially ones tagged to allow theready determination of precise moleculartargets. A centralized database of screeningresults should lead to further important biological insights. Generating molecularprobes for exploring the basic biology ofhealth and disease in academic laboratorieswould not supplant the major role of bio-pharmaceutical companies in drug develop-ment, but could contribute to the start of thedrug development pipeline. The private sector would doubtless find many of thesemolecular probes of interest for furtherexploration through optimization by medic-inal chemistry, target validation, lead com-pound identification, toxicological studiesand,ultimately,clinical trials.

Academic pursuit of this first step in drugdevelopment could be particularly valuablefor the many rare mendelian diseases, inwhich often the gene defect is known but thesmall market size limits the private sector’smotivation to shoulder the expense ofeffective pharmaceutical development. Suchtranslational research in academic laborato-ries, combined with incentives such as the US Orphan Drug Act, could profoundlyincrease the availability of effective treat-

ments for rare genetic diseases in the nextdecade. Further, the development of thera-peutic approaches to single-gene disordersmight provide valuable insights into apply-ing genomics to reveal the biology of morecommon disorders and developing moreeffective treatments for them (in the waythat, for example, the search for compoundsthat target the presenilins has led to generaltherapeutic strategies for late-onsetAlzheimer’s disease36).

Grand Challenge II-5 Investigate howgenetic risk information is conveyed inclinical settings, how that informationinfluences health strategies andbehaviours, and how these affect healthoutcomes and costsUnderstanding how genetic factors affecthealth is often cited as a major goal ofgenomics, on the assumption that applyingsuch understanding in the clinical setting willimprove health. But this assumption actuallyrests on relatively few examples and data, andmore research is needed to provide sufficientguidance about how to use genomic informa-tion optimally for improving individual orpublic health.

Theoretically, the steps by which geneticrisk information would lead to improvedhealth are: (1) an individual obtainsgenome-based information about his/herown health risks; (2) the individual uses thisinformation to develop an individualized

prevention or treatment plan; (3) the indi-vidual implements that plan; (4) this leads toimproved health; and (5) healthcare costs arereduced. Scrutiny of these assumptions isneeded, both to test them and to determinehow each step could best be accomplished indifferent clinical settings.

Research is also required that criticallyevaluates new genetic tests and interventionsin terms of parameters such as benefits,access and cost. Such research should beinterdisciplinary and use the tools andexpertise of many fields, includinggenomics, health education, health behav-iour research, health outcomes research,healthcare delivery analysis, and healthcareeconomics.Some of these fields have histori-cally paid little attention to genomics, buthigh-quality research of this sort could pro-vide important guidance in clinical decision-making — as the work of several disciplineshas already been helpful in caring for peoplewith an increased risk of colon cancer as aresult of mutations in FAP or HNPCC 37.

Grand Challenge II-6 Develop genome-based tools that improve the health of allDisparities in health status constitute a signif-icant global issue, but can genome-basedapproaches to health and disease help toreduce this problem? Social and other envi-ronmental factors are major contributors tohealth disparities; indeed, some would ques-tion whether heritable factors have any significant role. But population differences inallele frequencies for some disease-associatedvariants could be a contributing factor to cer-tain disparities in health status,so incorporat-ing this information into preventive and/orpublic-health strategies would be beneficial.Research is needed to understand the rela-tionship between genomics and health dis-parities by rigorously evaluating the diversecontributions of socioeconomic status,culture, discrimination, health behaviours,diet,environmental exposures and genetics.

It is also important to explore applicationsof genomics in the improvement of health inthe developing world (www3.who.int/whosis/genomics/genomics_report.cfm), where bothhuman and non-human genomics will playsignificant roles.If we take malaria as an exam-ple, a better understanding of human geneticfactors that influence susceptibility andresponse to the disease, and to the drugs usedto treat it, could have a significant globalimpact.So too could a better understanding ofthe malarial parasite itself and of its mosquitovector, which the recently reported genomesequences38,39 should provide. It will be neces-sary to determine the appropriate roles ofgovernmental and non-governmental organi-zations, academic institutions, industry andindividuals to ensure that genomics producesclinical benefits for resource-poor nations,and is used to produce robust local researchexpertise.

feature

842 NATURE | VOL 422 | 24 APRIL 2003 | www.nature.com/nature

The non-codingpart of the human

genome is functionallyimportant, yet little isknown about it.

© 2003 Nature Publishing Group

Page 9: feature A vision for the future of genomics researchgenetics.wustl.edu/bio5488/files/2019/01/IntroductionReading_2_coll… · human mendelian diseases,once a herculean task requiring

To ensure that genomics research benefitsall, it will be critical to examine howgenomics-based health care is accessed andused. What are the barriers to equitableaccess, and how can they be removed? This isrelevant not only in resource-poor nations,but also in wealthier countries where seg-ments of society, such as indigenous popula-tions, the uninsured, or rural and inner citycommunities,have traditionally not receivedadequate health care.

III Genomics to societyPromoting the use of genomics tomaximize benefits and minimize harmsGenomics has been at the forefront of givingserious attention, through scholarly researchand policy discussions, to the impact of sci-ence and technology on society. Althoughthe major benefits to be realized fromgenomics are in the area of health, asdescribed above, genomics can also con-tribute to other aspects of society. Just as theHGP and related developments havespawned new areas of research in basic biol-ogy and in health, they have also createdopportunities for research on social issues,even to the extent of understanding morefully how we define ourselves and each other.

In the next few years, society must not onlycontinue to grapple with numerous questionsraised by genomics, but must also formulateand implement policies to address many ofthem. Unless research provides reliable dataand rigorous approaches on which to basesuch decisions, those policies will be ill-informed and could potentially compromiseus all. To be successful, this research mustencompass both ‘basic’ investigations thatdevelop conceptual tools and shared vocabu-laries, and more ‘applied’, ‘translational’projects that use these tools to explore anddefine appropriate public-policy options thatincorporate diverse points of view.

As it has in the past, such research willcontinue to have important ramificationsfor all three major themes of the vision pre-sented here. We now address research thatfocuses on society itself, more than on biolo-gy or health. Such efforts should enable theresearch community to:◆ Analyse the impact of genomics on

concepts of race, ethnicity, kinship,individual and group identity, health,disease and ‘normality’ for traits andbehaviours.

◆ Define policy options, and their poten-tial consequences, for the use of gen-omic information and for the ethicalboundaries around genomics research.

Grand Challenge III-1 Develop policyoptions for the uses of genomics inmedical and non-medical settingsSurveys have repeatedly shown that the public is highly interested in the concept that personal genetic information might

guide them to better health, but is deeply concerned about potential misuses of thatinformation (see www.publicagenda.org/issues/pcc_detail.cfm?issue_type=medical_research&list=7). Topping the list of con-cerns is the potential for discrimination inhealth insurance and employment.A signifi-cant amount of research on this issue hasbeen done40, policy options have been published41–43, and many US states have now passed anti-discrimination legislation (see www.genome.gov/Pages/PolicyEthics/Leg/StateIns and www.genome.gov/Pages/PolicyEthics/Leg/StateEmploy). The USEqual Employment Opportunity Commis-sion has ruled that the Americans with Dis-abilities Act should apply to discriminationbased on predictive genetic information44,but the legal status of that construct remainsin some doubt. Although an executive orderprotects US government employees againstgenetic discrimination,this does not apply toother workers. Thus, many observers haveconcluded that effective federal legislation isneeded, and the US Congress is currentlyconsidering such a law.

Making certain that genetic tests offered tothe public have established clinical validityand usefulness must be a priority for futureresearch and policy making. In the UnitedStates,the Secretary’s Advisory Committee onGenetic Testing extensively reviewed this areaand concluded that further oversight is need-ed,asking the Food and Drug Administration

to review new predictive genetic tests prior to marketing (www4.od.nih.gov/oba/sacgt/reports/oversight_report.pdf). That recom-mendation has not yet been acted on; mean-while, numerous websites offering unvalidat-ed genetic tests directly to the public, oftencombined with the sale of ‘nutraceuticals’andother products of highly questionable value,are proliferating.

Many issues currently swirl around theproper conduct of genetic research involvinghuman subjects, and further work is needed to achieve a satisfactory balance between theprotection of research participants fromharm and the ability to conduct clinicalresearch that benefits society as a whole.Mucheffort has gone into developing appropriateguidelines for the use of stored tissue speci-mens (www.georgetown.edu/research/nrcbl/nbac/hbm.pdf), for community consultationwhen conducting genetic research with iden-tifiable populations (www.nigms.nih.gov/news/reports/community_consultation.html#exec), and for the consent of non-examinedfamily members when conducting pedigreeresearch (www.nih.gov/sigs/bioethics/nih_third_party_rec.html), but confusion stillremains for many investigators and institu-tional review boards.

The use of genomic information is notlimited to the arenas of biology and of health,and further research and development of pol-icy options is also needed for the many otherapplications of such information. The arrayof additional users is likely to include the life,disability and long-term care insuranceindustries, the legal system, the military, edu-cational institutions and adoption agencies.Although some of the research informing themedical uses of genomics will be useful inbroader settings, dedicated research outsidethe healthcare sphere is needed to explore thepublic values that apply to uses of genomicsother than for health care and their relation-ship to specific contextual applications. Forexample, should genetic information on pre-disposition to hyperactivity be available inthe future to school officials? Or shouldgenetic information about behavioural traitsbe admissible in criminal or civil proceed-ings? Genomics also provides greater oppor-tunity to understand ancestral origins ofpopulations and individuals, which raisesissues such as whether genetic informationshould be used for defining membership in aminority group.

Because uses of genomics outside thehealthcare setting will involve a significantlybroader community of stakeholders, bothresearch and policy development in this areamust involve individuals and organizationsbesides those involved in the medical applica-tions of genomics. But many of the same perspectives essential to research and policydevelopment for the medical uses of genomicsare also essential. Both the potential users ofnon-medical applications of genomics and the

feature

NATURE | VOL 422 | 24 APRIL 2003 | www.nature.com/nature 843

It should be possible to understand the

difference between a‘bag of molecules’ anda biological system.

© 2003 Nature Publishing Group

Page 10: feature A vision for the future of genomics researchgenetics.wustl.edu/bio5488/files/2019/01/IntroductionReading_2_coll… · human mendelian diseases,once a herculean task requiring

public need education to understand betterthe nature and limits of genomic information(Box 6) and to grasp the ethical,legal and socialimplications of its uses outside health care(Box 5).

Grand Challenge III-2 Understand therelationships between genomics, race andethnicity, and the consequences ofuncovering these relationshipsRace is a largely non-biological concept con-founded by misunderstanding and a longhistory of prejudice. The relationship ofgenomics to the concepts of race and ethni-city has to be considered within complex historical and social contexts.

Most variation in the genome is sharedbetween all populations, but certain allelesare more frequent in some populations thanin others, largely as a result of history andgeography.Use of genetic data to define racialgroups, or of racial categories to classify bio-logical traits, is prone to misinterpretation.To minimize such misinterpretation, the bio-logical and sociocultural factors that inter-relate genetics with constructs of race andethnicity need to be better understood andcommunicated within the next few years.

This will require research on how differ-ent individuals and cultures conceive of race,ethnicity, group identity and self-identity,and what role they believe genes or other biological factors have. It will also require acritical examination of how the scientificcommunity understands and uses these con-cepts in designing research and presentingfindings, and of how the media report these.Also necessary is widespread educationabout the biological meaning and limita-tions of research findings in this area (Box 6),and the formulation and adoption ofpublic-policy options that protect againstgenomics-based discrimination or maltreat-ment (see Grand Challenge III-1).

Grand Challenge III-3 Understand theconsequences of uncovering the genomiccontributions to human traits andbehavioursGenes influence not only health and disease,but also human traits and behaviours. Sci-ence is only beginning to unravel the compli-cated pathways that underlie such attributesas handedness, cognition, diurnal rhythmsand various behavioural characteristics. Toooften, research in behavioural genetics, suchas that regarding sexual orientation or intel-ligence, has been poorly designed and itsfindings have been communicated in a waythat oversimplifies and overstates the role ofgenetic factors.This has caused serious prob-lems for those who have been stigmatized bythe suggestion that alleles associated withwhat some people perceive as ‘negative’phys-iological or behavioural traits are more frequent in certain populations. Given thishistory and the real potential for recurrence,

it is particularly important to gather suffi-cient scientifically valid information aboutgenetic and environmental factors to pro-vide a sound understanding of the contribu-tions and interactions between genes andenvironment in these complex phenotypes.

It is also important that there be robustresearch to investigate the implications, forboth individuals and society, of uncoveringany genomic contributions that there may be to traits and behaviours. The field ofgenomics has a responsibility to consider thesocial implications of research into thegenetic contributions to traits and behav-iours, perhaps an even greater responsibilitythan in other areas where there is less of a his-tory of misunderstanding and stigmatiza-tion.Decisions about research in this area areoften best made with input from a diversegroup of individuals and organizations.

Grand Challenge III-4 Assess how todefine the ethical boundaries for uses ofgenomicsGenetics and genomics can contribute under-standing to many areas of biology, health andlife. Some of these human applications arecontroversial,with some members of the pub-lic questioning the propriety of their scientificexploration. Although freedom of scientificinquiry has been a cardinal feature of humanprogress, it is not unbounded. It is importantfor society to define the appropriate and inap-propriate uses of genomics. Conversations

between diverse parties based on an accurateand detailed understanding of the relevant sci-ence and ethical, legal and social factors willpromote the formulation and implementa-tion of effective policies.For instance,in repro-ductive genetic testing, it is crucial to includeperspectives from the disability community.Research should explore how different indi-viduals, cultures and religious traditions viewthe ethical boundaries for the uses of genomics— for instance,which sets of values determineattitudes towards the appropriateness ofapplying genomics to such areas as reproduc-tive genetic testing,‘genetic enhancement’andgermline gene transfer.

Implementation: the NHGRI’s roleThe vision for the future of genomics presented here is broad and deep, and its real-ization will require the efforts of many. Con-tinuation of the extensive collaborationbetween scientists and between fundingsources that characterized the HGP will beessential. Although the NHGRI intends toparticipate in all the research areas discussedhere, it will need to focus its efforts to use itsfinite resources as effectively as possible.Thus,it will take a major role in some areas, activelycollaborate in others,and have only a support-ing role in yet others. The NHGRI’s prioritiesand areas of emphasis will also evolve as mile-stones are met and new opportunities arise.

The approach that has characterizedgenomics and led to the success of the HGP— an initial focus on technology develop-ment and feasibility studies, followed bypilot efforts to learn how to apply new strat-egies and technologies efficiently on a largerscale, and then implementation of full-scaleproduction efforts — will continue to be atthe heart of the NHGRI’s priority-settingprocess. The following are areas of highinterest,not listed in priority order.

Large-scale production of genomic datasets The NHGRI will continue to supportgenomic sequencing, focusing on thegenomes of mammals, vertebrates, chord-ates and invertebrates; other funders willsupport the determination of additionalgenome sequences from microbes andplants.With current technology, the NHGRIcould support the determination of as much as 45–60 gigabases of genomic DNAsequence, or the equivalent of 15–20 humangenomes, over the next five years. But as thecost of sequencing continues to decrease, thecost/benefit ratio of sequence generationwill improve, so that the actual amount ofsequencing done will be greatly affected bythe development of improved sequencingtechnology.

The decisions about which genomes tosequence next will be based on the results ofcomparative analyses that reveal the ability ofgenomic sequences from unexplored phylo-genetic positions to inform the interpreta-

feature

844 NATURE | VOL 422 | 24 APRIL 2003 | www.nature.com/nature

The time is right todevelop and apply

large-scale genomicstrategies to improvehuman health.

© 2003 Nature Publishing Group

Page 11: feature A vision for the future of genomics researchgenetics.wustl.edu/bio5488/files/2019/01/IntroductionReading_2_coll… · human mendelian diseases,once a herculean task requiring

tion of the human sequence and to provideother insights. Finally, the degree to whichany new genomic sequence is completed —finished, taken to an advanced draft stage orlightly sampled — will be determined by the use for which the sequence is generated.And,of course,the NHGRI’s sequencing pro-gramme will maintain close contact with,and take account of the plans and output of, other sequencing programmes, as hashappened throughout the HGP.

A second data set ready for production-level effort is the human haplotype map(HapMap). This project, a collaborationbetween the NHGRI, many other NIH insti-tutes, and four international partners, isscheduled for completion within three years.The outcome of the International HapMapProject will significantly shape the futuredirection of the NHGRI’s research efforts inthe area of genetic variation.

Pilot-scale efforts The NHGRI has initiatedthe ENCODE Project to begin the develop-ment of the human genome ‘parts list’. Thefirst phase will address the application andimprovement of existing technologies forthe large-scale identification of codingsequences, transcription units and otherfunctional elements for which technology iscurrently available. When the results of theENCODE Project show evidence of efficacyand affordability at the pilot scale,considera-tion will be given to implementing theappropriate technologies across the entirehuman genome.

Technology development Many areas ofcritical importance to the realization of thegenomics-based vision for biomedicalresearch require new technological andmethodological developments before pilotsand then large-scale approaches can beattempted. Recognizing that technologydevelopment is an expensive and high-riskundertaking, the NHGRI is nevertheless committed to supporting and fostering tech-nology development in many of these crucialareas, including the following.

DNA sequencing. There is still greatopportunity to reduce the cost and increasethe throughput of DNA sequencing, and tomake rapid, cheap sequencing availablemore broadly. Radical reduction of sequen-cing costs would lead to very differentapproaches to biomedical research.

Genetic variation. Improved genotypingmethods and better mathematical methodsare necessary to make effective use of infor-mation about the structure of variation inthe human genome for identifying the genetic contributions to human diseases andother complex traits.

The genome ‘parts list’. Beyond codingsequences and transcriptional units, newcomputational and experimental approachesare needed to allow the comprehensive deter-

mination of all sequence-encoded functionalelements in genomes.

Proteomics. In the short term,the NHGRIexpects to focus on the development ofappropriate, scalable technologies for thecomprehensive analysis of proteins and protein machines in human health and inboth rare and complex diseases.

Pathways and networks. As a complementto the development of the genome ‘parts list’and increasingly effective approaches to pro-teome analysis,the NHGRI will encourage thedevelopment of new technologies that gener-ate a synthetic view of genetic regulatory networks and interacting protein pathways.

Genetic contributions to health, diseaseand drug response. The NHGRI will place ahigh priority on creating and applying newcrosscutting genomics tools, technologiesand strategies needed to identify the geneticbases of medically relevant phenotypes.Research on the genetic contributions to rareand common diseases, and to drug response,will typically involve biological systems anddiseases of primary interest to other NIHinstitutes and other funding organizations.Accordingly, the NHGRI expects that itsinvolvement in this area of research willoften be implemented through partnershipsand collaborations. The NHGRI is particu-larly interested in stimulating researchapproaches to the identification of gene variants that confer disease resistance andother manifestations of ‘good health’.

Molecular probes, including small mol-ecules and RNA-mediated interference, forexploring basic biology and disease. Explo-ration of the feasibility of expanding chemical genomics in the academic and public sectors,particularly with regard to theestablishment of one or more centralizedfacilities, will be pursued by the NHGRI inpartnership with others.

Databases Another type of communityresource for the biological and biomedicalresearch communities is represented by data-bases (Box 3). But their support represents apotentially significant problem. Fundingagencies, reflecting the interest of theresearch community, tend to prefer to usetheir research funds to support the genera-tion of new data, and the ongoing need forcontinued and increasing support for thedata archives and robust access to them isoften given less attention. Both the scientificcommunity and the funding agencies mustrecognize that investment in the creation and maintenance of effective databases is as important a component of research funding as data generation. The NHGRI hasbeen a major source of support for severalmajor genetics/genomics-oriented databas-es, including the Mouse Genome Database(www.informatics.jax.org/mgihome/MGD/aboutMGD.shtml), the SaccharomycesGenome Database (genome-www.stanford.edu/Saccharomyces), FlyBase (flybase.bio.indiana.edu), WormBase (www.wormbase.org) and Online Mendelian Inheritance inMan (www.ncbi.nlm.nih.gov/omim). TheNHGRI will continue to be a leader in exploring effective solutions to the issues ofintegrating, displaying and providing accessto genomic information.

Ethical, legal and social research TheNHGRI’s ELSI research activities willincreasingly focus on fundamental, widelyrelevant, societal issues. The community ofscholars and researchers working in thesesocial fields, as well as the scope of issuesbeing explored, need to be expanded. TheELSI research community must includeindividuals from minority and other com-munities that may be disproportionatelyaffected by the use or misuse of genetic infor-mation. New mechanisms for promotingdialogue and collaboration between the ELSIresearchers and genomic and clinicalresearchers need to be developed; suchexamples might include structural rewardsfor interdisciplinary research, intensivesummer courses or mini-fellowships forcross-training, and the creation of centres ofexcellence in ELSI studies to allow sustainedinterdisciplinary collaboration.

Longitudinal population cohort(s) Thispromising research resource will be sobroadly applicable, and will require such

feature

NATURE | VOL 422 | 24 APRIL 2003 | www.nature.com/nature 845

Society mustformulate policies

to address many of thequestions raised bygenomics.

© 2003 Nature Publishing Group

Page 12: feature A vision for the future of genomics researchgenetics.wustl.edu/bio5488/files/2019/01/IntroductionReading_2_coll… · human mendelian diseases,once a herculean task requiring

extensive funding that, although the NHGRImight have a supporting role in design andoversight, success will demand the involve-ment and support of many other fundingsources.

Non-genetic factors in health and disease Aconsequence of an improved definition ofthe genetic factors underlying human healthand disease will be an improvement in therecognition and definition of the environ-mental and other non-genetic contributionsto those traits. This is another area in whichthe NHGRI will be involved through thedevelopment of new strategies and by form-ing partnerships.

Use of genomic information to improvehealth care The NHGRI will catalyse collaboration between the diverse scholarlydisciplines whose joint efforts will be neces-sary for research on the best ways for patientsand healthcare providers to make effectiveuse of personalized genetic information inthe improvement of health. The NHGRI will also strive to ensure that research in this area is informed by, and extends knowledge of, the societal implications of genomics.

Improving the health of all people It will beimportant for the NHGRI to supportresearch that explores how to ensure thatgenomic information is used, to the extentthat such information is relevant, to reduceglobal health disparities. That will include avigorous effort to increase the representationof minorities in the ranks of genomicsresearchers.But the full solution of the healthdisparities problem can only come aboutthrough a committed and sustained effort bygovernments,medical systems and society.

Policy development The NHGRI will con-tinue to help facilitate public-policy devel-opment in the area of genetic/genomic science. Effective policy development willrequire attention to those issues for which itcould have the greatest impact on the policyagenda and could help to facilitate genomicscience. The NHGRI will also focus on issuesthat would assist the public in benefitingfrom genomics, such as privacy of geneticinformation, access to genetics services,direct-to-consumer/providers marketing,patenting and licensing of genetic infor-mation, appropriate treatment of humanparticipants in research, and standards,usefulness and quality in genetic testing.

Data releaseAn important lesson of the HGP has been the benefit of immediately releasing datafrom large-scale sequencing projects, asembodied in the Bermuda principles(www.gene.ucl.ac.uk/hugo/bermuda.htm).Some other large-scale data production

projects have followed suit (such as those for full-length cDNAs and single-nucleotidepolymorphisms), to the benefit of the scien-tific community. Scientific progress andpublic benefit will be maximized by early,open and continuing access to large data setsand by ensuring that excellent scientists areattracted to the task of producing moreresources of this sort. For this system to con-tinue to work, the producers of community-resource data sets have an obligation to makethe results of their efforts rapidly availablefor free and unrestricted use by the scientificcommunity, and resource users have anobligation to recognize and respect theimportant contribution made by the scien-tists who contribute their time and efforts toresource production.

Although these principles have been gener-ally realized in the case of genomic DNAsequencing,they have not been for many othertypes of community-resource projects (struc-tural biology coordinates or gene expressiondata, for example). The development of effec-tive systems for achieving the rapid release ofdata without restrictions and for providingcontinued widespread access to materials andresearch tools should be an integral compo-nent of the planning and development of newcommunity resources. The scientific commu-nity should also develop incentives to supportthe voluntary release of such data before publi-cation by individual investigators, by appro-priately rewarding and protecting the interests

of scientists who wish to share their data withthe community in such a generous manner.

Quantum leaps It is interesting to speculate about potentialrevolutionary technical developments thatmight enhance research and clinical applica-tions in a fashion that would rewrite entireapproaches to biomedicine. The advent ofthe polymerase chain reaction, large-insertcloning systems and methods for low-cost,high-throughput DNA sequencing areexamples of such advances that have alreadyoccurred.

During the course of the NHGRI’s plan-ning discussions, other ideas were raisedabout analogous ‘technological leaps’ thatseem so far off as to be almost fictional butwhich, if they could be achieved,would revo-lutionize biomedical research and clinicalpractice.

The following is not intended to be anexhaustive list, but to provoke creativedreaming:◆ the ability to determine a genotype at

very low cost, allowing an associationstudy in which 2,000 individualscould be screened with about 400,000genetic markers for $10,000 or less;

◆ the ability to sequence DNA at coststhat are lower by four to five orders of magnitude than the current cost,allowing a human genome to besequenced for $1,000 or less;

◆ the ability to synthesize long DNAmolecules at high accuracy for $0.01per base, allowing the synthesis ofgene-sized pieces of DNA of anysequence for between $10 and$10,000;

◆ the ability to determine the methyla-tion status of all the DNA in a singlecell; and

◆ the ability to monitor the state of allproteins in a single cell in a singleexperiment.

ConclusionsPreparing a vision for the future of genomicsresearch has been both daunting and exhila-rating. The willingness of hundreds ofexperts to volunteer their boldest and bestideas, to step outside their areas of self-inter-est and to engage in intense debates aboutopportunities and priorities, has added arichness and audacity to the outcome thatwas not fully anticipated when the planningprocess began. To the extent that this articlecaptures the sense of excitement of the newdiscipline of genomics, it is to their credit. Acomplete list of the participants in this plan-ning process can be found at www.genome.gov/About/Vision/Acknowledgements.

A final word is appropriate about thebreadth of the vision articulated here. Achoice had to be made between portraying a broad view of the future of genomics

feature

846 NATURE | VOL 422 | 24 APRIL 2003 | www.nature.com/nature

Scientific progresswill be maximized

by early, open andcontinuing access tolarge data sets.

© 2003 Nature Publishing Group

Page 13: feature A vision for the future of genomics researchgenetics.wustl.edu/bio5488/files/2019/01/IntroductionReading_2_coll… · human mendelian diseases,once a herculean task requiring

research and focusing more narrowly on thespecific role of the NHGRI.Recognizing thatresearchers and the public are more interest-ed in the promise of the field than about thefunding source responsible, we have focusedhere on the broad landscape of scientificopportunity. We have, however, identifiedthe areas that are particularly appropriate forleadership by the NHGRI throughout thisarticle. These are generally research areasthat are not specific to a particular disease ororgan system, but have broader biomedicaland/or social implications. Yet even in thoseinstances, the word ‘partnership’ appearsnumerous times intentionally. We expect tohave partnerships not only with other publicfunding sources, such as the other 26 NIHinstitutes and centres, but also with manyother governmental agencies, private foun-dations and private-sector organizations.Indeed, public–private partnerships, such asthe SNP Consortium,the Mouse SequencingConsortium and the International HapMapProject, provide powerful new models forthe generation of public data sets withimmediate and far-reaching value. Thus,many of the most exciting opportunities ingenomics research cross traditional bound-aries of specific disease definitions, classi-cally defined scientific disciplines, fundingsources and public versus private enterprise.The new era will flourish best in an environ-ment where such traditional boundariesbecome ever more porous.

Although the opportunities describedhere are thought to be highly achievable, theformal initiation of specific programmeswill require more detailed analysis. The rela-tive priorities of each component must beaddressed in the light of limited resources to support research. The NHGRI plans torelease a revised programme announcementand other grant solicitations later this year,providing more specific guidance to extra-mural researchers about plans for the imple-mentation of this vision. Furthermore, ingenomics research,we have learned to expectthe unexpected. From past experience, itwould be surprising (and rather disappoint-ing) if biological,medical and social contextsdid not change in unpredictable ways. Thatreality requires that this vision be revisitedon a regular basis.

In conclusion, the successful completionthis month of all of the original goals of theHGP emboldens the launch of a new phasefor genomics research, to explore theremarkable landscape of opportunity thatnow opens up before us. Like Shakespeare,we are inclined to say, “what’s past is pro-logue” (The Tempest, Act II, Scene 1). If we,like bold architects, can design and build thisunprecedented and noble structure, restingon the firm bedrock foundation of the HGP (Figure 2), then the true promise of genomics research for benefitinghumankind can be realized.

“Make no little plans; they have no magicto stir men’s blood and probably will themselves not be realized. Make big plans;aim high in hope and work, rememberingthat a noble, logical diagram once recordedwill not die, but long after we are gone will bea living thing, asserting itself with ever-growing insistency” (attributed to Daniel Burnham,architect). ■

Francis S. Collins, Eric D. Green, Alan E.Guttmacher and Mark S. Guyer are at theNational Human Genome Research Institute,National Institutes of Health, Bethesda, Maryland20892, USA 1. Mendel, G. Versuche über Pflanzen-Hybriden. Verhandlungen

des naturforschenden Vereines, Abhandlungen, Brünn 4, 3–47

(1866).

2. Avery, O. T., MacLeod, C. M. & McCarty, M. Studies of the

chemical nature of the substance inducing transformation of

pneumococcal types. Induction of transformation by a

desoxyribonucleic acid fraction isolated from Pneumococcus

Type III. J. Exp. Med. 79, 137–158 (1944).

3. Watson, J. D. & Crick, F. H. C. Molecular structure of nucleic

acids: A structure for deoxyribose nucleic acid. Nature 171, 737

(1953).

4. Nirenberg, M. W. The genetic code: II. Sci. Am. 208, 80–94 (1963).

5. Jackson, D. A., Symons, R. H. & Berg, P. Biochemical method for

inserting new genetic information into DNA of Simian Virus 40:

circular SV40 DNA molecules containing lambda phage genes

and the galactose operon of Escherichia coli. Proc. Natl Acad. Sci.

USA 69, 2904–2909 (1972).

6. Cohen, S. N., Chang, A. C., Boyer, H. W. & Helling, R. B.

Construction of biologically functional bacterial plasmids in

vitro. Proc. Natl Acad. Sci. USA 70, 3240–3244 (1973).

7. The International Human Genome Sequencing Consortium.

Initial sequencing and analysis of the human genome. Nature

409, 860–921 (2001).

8. Sanger, F. & Coulson, A. R. A rapid method for determining

sequences in DNA by primed synthesis with DNA polymerase.

J. Mol. Biol. 94, 441–448 (1975).

9. Maxam, A. M. & Gilbert, W. A new method for sequencing DNA.

Proc. Natl Acad. Sci. USA 74, 560–564 (1977).

10.Smith, L. M. et al. Fluorescence detection in automated DNA-

sequence analysis. Nature 321, 674–679 (1986).

11.The Mouse Genome Sequencing Consortium. Initial sequencing

and comparative analysis of the mouse genome. Nature 420,

520–562 (2002).

12.The Chipping Forecast II. Nature Genet. 32, 461–552 (2002).

13.Guttmacher, A. E. & Collins, F. S. Genomic medicine — A

primer. N. Engl. J. Med. 347, 1512–1520 (2002).

14.National Research Council. Mapping and Sequencing the Human

Genome (National Academy Press, Washington DC, 1988).

15.US Department of Health and Human Services, US DOE.

Understanding Our Genetic Inheritance. The US Human Genome

Project: The First Five Years. NIH Publication No. 90-1590

(National Institutes of Health, Bethesda, MD, 1990).

16.Collins, F. & Galas, D. A new five-year plan for the US Human

Genome Project. Science 262, 43–46 (1993).

17.Collins, F. S. et al. New goals for the US Human Genome Project:

1998–2003. Science 282, 682–689 (1998).

18.Hilbert, D. Mathematical problems. Bull. Am. Math. Soc. 8,

437–479 (1902).

19.Aparicio, S. et al. Whole-genome shotgun assembly and analysis

of the genome of Fugu rubripes. Science 297, 1301–1310 (2002).

20.Sidow, A. Sequence first. Ask questions later. Cell 111, 13–16

(2002).

21.Zhang, M. Q. Computational prediction of eukaryotic protein-

coding genes. Nature Rev. Genet. 3, 698–709 (2002).

22.Banerjee, N. & Zhang, M. X. Functional genomics as applied to

mapping transcription regulatory networks. Curr. Opin.

Microbiol. 5, 313–317 (2002).

23.Van der Weyden, L., Adams, D. J. & Bradley, A. Tools for targeted

manipulation of the mouse genome. Physiol. Genomics 11,

133–164 (2002).

24.Hannon, G. J. RNA interference. Nature 418, 244–251 (2002).

25.Stockwell, B. R. Chemical genetics: Ligand-based discovery of

gene function. Nature Rev. Genet. 1, 116–125 (2000).

26.Gavin, A. C. et al. Functional organization of the yeast proteome

by systematic analysis of protein complexes. Nature 415,

141–147 (2002).

27.Tyson, J. J., Chen, K. & Novak, B. Network dynamics and cell

physiology. Nature Rev. Mol. Cell Biol. 2, 908–916 (2001).

28.Sachidanandam, R. et al. A map of human genome sequence

variation containing 1.42 million single nucleotide

polymorphisms. Nature 409, 928–933 (2001).

29.Gabriel, S. B. et al. The structure of haplotype blocks in the

human genome. Science 296, 2225–2229 (2002).

30.Reich, D. E. & Lander, E. S. On the allelic spectrum of human

disease. Trends Genet. 17, 502–510 (2001).

31.Hirschhorn, J. N., Lohmueller, K., Byrne, E. & Hirschhorn, K. A

comprehensive review of genetic association studies. Genet.

Med. 4, 45–61 (2002).

32.Wagner, K. R. Genetic diseases of muscle. Neurol. Clin. 20,

645–678 (2002).

33.Golub, T. R. Genomic approaches to the pathogenesis of

hematologic malignancy. Curr. Opin. Hematol. 8, 252–261

(2001).

34.Drews, J. & Ryser, S. The role of innovation in drug development.

Nature Biotechnol. 15, 1318–1319 (1997).

35.Druker, B. J. Imatinib alone and in combination for chronic

myeloid leukemia. Semin. Hematol. 40, 50–8 (2003).

36.Selkoe, D. J. Alzheimer’s disease: genes, proteins, and therapy.

Physiol. Rev. 81, 741–66 (2001).

37.Lynch, H. T. & de la Chapelle, A. Genomic medicine: hereditary

colorectal cancer. N. Engl. J. Med. 348, 919–932 (2003).

38.Gardner, M. J. et al. Genome sequence of the human malaria

parasite Plasmodium falciparum. Nature 419, 498–511 (2002).

39.Holt, R. A. et al. The genome sequence of the malaria mosquito

Anopheles gambiae. Science 298, 129–149 (2002).

40.Anderlik, M. R. & Rothstein, M. A. Privacy and confidentiality of

genetic information: What rules for the new science? Annu. Rev.

Genom. Hum. Genet. 2, 401–433 (2001).

41.Hudson, K. L., Rothenberg, K. H., Andrews, L. B., Kahn, M. J. E.

& Collins, F. S. Genetic discrimination and health-insurance —

An urgent need for reform. Science 270, 391–393 (1995).

42.Rothenberg, K. et al. Genetic information and the workplace:

Legislative approaches and policy challenges. Science 275,

1755–1757 (1997).

43.Fuller, B. P. et al. Policy forum: Ethics — privacy in genetics

research. Science 285, 1359–1361 (1999).

44.Miller, P. S. Is there a pink slip in my genes? J. Health Care Law

Policy 3, 225–265 (2000).

AcknowledgementsThe formulation of this vision could not have happened without the

thoughtful and dedicated contributions of a large number of

people. The authors were greatly assisted by Kathy Hudson, Elke

Jordan, Susan Vasquez, Kris Wetterstrand, Darryl Leja and Robert

Nussbaum. A subcommittee of the National Advisory Council for

Human Genome Research, including Wylie Burke, William Gelbart,

Eric Juengst, Maynard Olson, Robert Tepper and David Valle,

provided a critical sounding board for draft versions of this

document. We also thank Aravinda Chakravarti, Ellen Wright

Clayton, Raynard Kington, Eric Lander, Richard Lifton and Sharon

Terry for serving as working-group chairs at the meeting in

November 2002 that refined this document. Finally, we thank the

hundreds of individuals who participated as workshop planners

and/or participants during this 18-month process.

feature

NATURE | VOL 422 | 24 APRIL 2003 | www.nature.com/nature 847© 2003 Nature Publishing Group