

NEWS & VIEWS

NATURE | Vol 437 | 1 September 2005 | pp. 50–51

The chimpanzee and us

Wen-Hsiung Li and Matthew A. Saunders

Publication of the draft DNA sequence of the chimpanzee genome is an especially notable event: the data provide a treasury of information for understanding human biology and evolution.

What genetic changes make us so different from the chimpanzee, our closest relative? Scientists have been trying to answer this challenging question for decades, and publication of the draft of the chimpanzee genome (page 69 of this issue) [1] is a significant step forward. The species studied is the common chimpanzee, Pan troglodytes; its only ‘sister’ species is the pygmy chimpanzee or bonobo, Pan paniscus (Fig. 1).

The draft tells us that the DNA sequence of our genome and that of the chimpanzee differ by only a few per cent. This still amounts to tens of millions of differences because each genome contains some 3 billion nucleotides. One way to determine what the important differences are is to identify evolutionary changes that are specific to us, Homo sapiens. Another is to look for signatures of positive natural selection in the sequences of the two genomes. Both of these approaches, and other comparative analyses, are described in the draft-genome paper [1] and the companion papers (pages 88–104) [2–4].

The assembly of a complete genome requires multiple rounds of sequencing. The chimpanzee genome draft represents a sequencing coverage of about 3.5 times, lower than that in the initial publication of other genomes, such as those of human, mouse and rat. Nonetheless, the draft is extremely useful for showing general differences between the chimpanzee and human genomes. The new data show that they differ by only 1.23% in terms of nucleotide substitutions. This is identical to a previous estimate from a mere 53 regions, each of about 500 base pairs, randomly chosen from the genome [5].
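As a rough check on these figures, the 1.23% substitution divergence can be multiplied by the genome size of about 3 billion nucleotides. The short Python sketch below does only that arithmetic; the function name is illustrative, and the calculation assumes, for simplicity, that the whole genome is comparable between the two species.

```python
# Back-of-the-envelope estimate of single-nucleotide differences.
# Both constants are the rounded figures quoted in the text; the assumption
# that the entire genome is aligned and comparable is a simplification.
GENOME_SIZE = 3_000_000_000        # "some 3 billion nucleotides"
SUBSTITUTION_DIVERGENCE = 0.0123   # 1.23% nucleotide substitutions


def expected_differences(genome_size: int, divergence: float) -> int:
    """Return the expected number of differing sites, rounded to an integer."""
    return round(genome_size * divergence)


substitutions = expected_differences(GENOME_SIZE, SUBSTITUTION_DIVERGENCE)
print(f"{substitutions:,} expected substitution differences")
```

The result, roughly 37 million, is of the same order as the "about 35 million nucleotide differences" quoted later in the article; the gap reflects the rounded inputs used here.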

The sequence divergence varies among genomic regions, presumably because of regional variations in mutation rate, selective constraints and the rate of sequence exchange (recombination) between chromosome pairs during cell division. The highest divergence is found for the Y chromosome and the lowest for the X chromosome. This is expected, because the Y chromosome is present only in males, which have a higher germ-line mutation rate than females, whereas the X chromosome is carried in both sexes and spends about two-thirds of its time in females.
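The ordering of Y, autosomal and X divergence can be illustrated with a standard textbook model of male-driven evolution, which is not taken from the article itself. The sketch below assumes that mutation rate depends only on the sex of the carrier, and weights each chromosome type by the fraction of time it spends in males. The parameter `alpha` (the male-to-female mutation-rate ratio) and the value used in the example are assumptions chosen for illustration.

```python
from typing import Dict


def relative_mutation_rates(alpha: float) -> Dict[str, float]:
    """Relative mutation rates per chromosome type, in units of the female rate.

    alpha is the assumed male-to-female germ-line mutation-rate ratio.
    Y chromosomes are always in males, autosomes spend half their time in
    males, and X chromosomes spend one-third of their time in males.
    """
    female_rate = 1.0
    male_rate = alpha
    return {
        "Y": male_rate,
        "autosome": (male_rate + female_rate) / 2,
        "X": (male_rate + 2 * female_rate) / 3,
    }


# Hypothetical example value; the article does not give a figure for alpha.
rates = relative_mutation_rates(alpha=4.0)
for chromosome, rate in rates.items():
    print(f"{chromosome}: {rate:.2f}")
```

Whenever `alpha` is greater than 1, the model gives Y > autosome > X, matching the pattern described above. It ignores selection and regional variation in mutation rate, so it is only a first approximation.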

Natural selection is commonly thought to operate mainly at the protein level. For this reason, nucleotide changes in protein-coding regions are usually classified into two groups: ‘synonymous changes’ (which do not cause any change in amino acids) and ‘non-synonymous changes’ (which do cause amino-acid changes). If a coding region is subject to strong selective constraints, then the non-synonymous substitution rate (KA) will be considerably lower than the synonymous substitution rate (KS); that is, the KA/KS ratio will be less than 1. On the other hand, if a gene is subject to very weak selective constraints or continued positive selection, KA/KS may be close to 1 or even higher.
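The interpretation of KA/KS described above can be expressed as a small sketch. The function names and the `tolerance` band around 1 are illustrative assumptions, not part of any published method; real analyses use statistical tests rather than a fixed cut-off.

```python
def ka_ks_ratio(ka: float, ks: float) -> float:
    """Ratio of non-synonymous (KA) to synonymous (KS) substitution rates."""
    if ks <= 0:
        raise ValueError("KS must be positive to form a ratio")
    return ka / ks


def interpret_ka_ks(ratio: float, tolerance: float = 0.1) -> str:
    """Qualitative reading of a KA/KS ratio.

    The tolerance band around 1 is an arbitrary choice for illustration.
    """
    if ratio < 1 - tolerance:
        return "selective constraint"
    if ratio > 1 + tolerance:
        return "possible positive selection"
    return "weak constraint or near-neutral"


# 0.23 is the genome-wide human-chimpanzee average reported in the article;
# the other two values are hypothetical.
for value in (0.23, 1.0, 1.8):
    print(value, "->", interpret_ka_ks(value))
```

A ratio near 1 is ambiguous on its own: it can arise from relaxed constraint or from a mixture of constrained and positively selected sites within the same gene.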

Comparison [1] of 13,454 human–chimpanzee gene pairs gives an average KA/KS of 0.23, much lower than previously estimated from more limited data sets of human–chimpanzee (0.63) [6] and human–baboon (0.34) [7] comparisons. This ratio is nearly twice that estimated from the mouse–rat comparison (0.13): this is probably due to less effective purifying selection, a process that eliminates deleterious mutations, in species with relatively small population sizes such as primates. Importantly, the new estimate is similar to the KA/KS from data on variation among humans (~0.20–0.23), suggesting that the proportion of advantageous mutations along the human lineage is lower than previously estimated [7,8]. A total of 585 genes (more than expected at random) do, however, display a higher KA than the substitution rate in non-coding sequence (KI). The highest KA/KI examples include the genes that encode glycophorin C, granulysin, protamine and semenogelin, proteins that are involved in immunity or reproduction.

Duplications, insertions and deletions

Although single-nucleotide substitutions are commonly considered when quantifying sequence divergence, insertions/deletions (indels) and recent duplications of DNA segments account for a markedly larger proportion of the difference between the human and chimpanzee genomes (3% and 2.7%, respectively). More than a third of the indels are due to repeated sequences, and about a quarter to transposable elements. These are DNA sequences that can move to different genomic regions, two of the major classes being Alu elements (short transposable sequences about 300 base pairs long) and L1 elements (long transposable sequences).

There are approximately 7,000 lineage-specific Alu insertions in the human genome but only about 2,300 in the chimpanzee genome, indicating that these elements have been less active in the chimpanzee. L1 elements, however, have been equally active in the two genomes, contrary to the previous estimate of two- to three-fold higher activity in the chimpanzee [9]. The functional importance, if any, of these differences remains unknown. Recent segmental duplications (longer than 20 kilobases and with greater than 94% sequence identity) are common in both genomes [2]. But although about 33% of human duplicated segments are human-specific, only about 17% of chimpanzee duplicated segments are chimpanzee-specific. Interestingly, about half of the genes in the human-specific duplicated regions exhibit significant differences in gene expression relative to the chimpanzee, and are most often upregulated.

Figure 1 | Evolutionary relationships among the higher primates. Divergence of the chimpanzee and human lineages occurred about 6 million years ago; the times of lineage divergence are not to scale. Credit: K. Langergraber.

Human genetic variation

The chimpanzee genome places the wealth of data on existing genetic variation in humans into evolutionary context. It now becomes possible to determine the ancestral states of that variation, and, with the aid of gene-frequency data in human populations, we may uncover ‘footprints’ of positive selection that occurred recently (less than 250,000 years ago, say) in humans. Under selective neutrality, new variants should rarely be found at high frequency, and between-species divergence should be correlated with the level of within-species genetic variation. The current analyses identify only six genomic regions that display significantly less variation than expected from the divergence between the Homo and Pan lineages, which split about 6 million years ago; each of these regions suggests the recent action of positive selection in humans. The power of such a method will increase substantially with the completion of genome drafts of a more distantly related primate such as an Old World monkey or the orang-utan, both of which are in progress [10] (Fig. 1; see also page 17).
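The idea of using the chimpanzee sequence to determine ancestral states can be sketched in a few lines. The example below is a simplified illustration rather than the consortium's method. It assumes simple parsimony: the chimpanzee base at a site is taken as ancestral, with no recurrent mutation and no change on the chimpanzee lineage. The function names, the 0.8 frequency threshold and the allele frequencies are hypothetical.

```python
from typing import Dict, Optional


def derived_allele_frequencies(
    human_freqs: Dict[str, float], chimp_allele: str
) -> Optional[Dict[str, float]]:
    """Polarize a human polymorphism using the chimpanzee base as outgroup.

    Returns the frequencies of the derived (non-ancestral) alleles, or None
    when the chimpanzee base matches no human allele and the site cannot be
    polarized under the parsimony assumption.
    """
    if chimp_allele not in human_freqs:
        return None
    return {a: f for a, f in human_freqs.items() if a != chimp_allele}


def has_high_frequency_derived_allele(
    human_freqs: Dict[str, float], chimp_allele: str, threshold: float = 0.8
) -> bool:
    """Flag sites where a derived allele is common (threshold is arbitrary)."""
    derived = derived_allele_frequencies(human_freqs, chimp_allele)
    if derived is None:
        return False
    return any(freq >= threshold for freq in derived.values())


# Hypothetical site: humans carry A (10%) and G (90%); chimpanzee carries A.
site = {"A": 0.1, "G": 0.9}
print(derived_allele_frequencies(site, "A"))
print(has_high_frequency_derived_allele(site, "A"))
```

A single high-frequency derived allele is not evidence of selection by itself; the analyses described in the text look for regional patterns across many sites and compare them with between-species divergence.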

What makes us human?

The question of what genetic changes make us human is far more complex. Although the two genomes are very similar, there are about 35 million nucleotide differences, 5 million indels and many chromosomal rearrangements to take into account. Most of these changes will have no significant biological effect, so identification of the genomic differences underlying such characteristics of ‘humanness’ as large cranial capacity, bipedalism and advanced brain development remains a daunting task. Given the short time since the human–chimpanzee split, it is likely that a few mutations of large effect are responsible for part of the current physical (phenotypic) differences that separate humans from chimpanzees and other great apes.

There are three prevailing hypotheses to account for the evolution of ‘humanness traits’: protein evolution, the ‘less-is-more’ hypothesis [11], and changes in the regions of the genome that regulate gene activity [12] (Fig. 2). Preliminary analyses of the human and chimpanzee genomes provide some clues about the relative contributions of these effects.

First, consider protein evolution. Are amino-acid changes that have contributed to ‘humanness’ to be found in rapidly evolving proteins? Most of those genes that do show a KA/KS of more than 1 are not involved in processes related to supposed humanness traits. In fact, genes related to brain function and neuronal activity show lower-than-average KA/KS values. The genes that display high KA/KS are mostly related to host–pathogen interaction, immunity and reproduction. This pattern is also found in rats, mice and other mammals. This suggests that protein evolution may not be a major contributor to the evolution of traits unique to humans. But before dismissing this possibility, we must bear in mind that the KA/KS test is biased towards genes that experience repeated amino-acid replacements. Genes involved in immunity and reproduction are particularly affected by these processes. But a gene that experiences a ‘selective sweep’ as a result of only a few changes, because those changes are strongly advantageous, would not leave a significant signal on KA/KS. For example, two amino-acid changes alone in the highly conserved FOXP2 protein, a gene-transcription factor, might have contributed to the human capacity for speech [13]. Finally, the role of indels and gene duplications in human–chimpanzee protein evolution remains largely unexplored.

Second, the ‘less-is-more’ hypothesis posits that loss-of-function changes relative to the ‘prototypical ape’ traits are characteristic of certain humanness traits: for example, lack of body hair, preservation of some juvenile traits into adulthood and expansion of the cranium. Such loss-of-function changes could be caused by non-synonymous substitutions, indels, loss of coding regions and deletion of entire genes. The comparisons to the chimpanzee have unveiled 53 human genes with disruptive indels in the coding regions, and genes in this category may be associated with intriguing phenotypes [14–16]. Indels could plausibly be major contributors to human–chimpanzee phenotypic differences, especially given that these mutations can also influence the two other proposed mechanisms for the evolution of humanness (Fig. 2).

Third, there is the long-standing hypothesis that the phenotypic differences between humans and chimpanzees primarily arise from changes in gene-regulatory regions. The current analyses [1] do not address this issue in detail, because it is still notoriously difficult to identify such regions. Most of our current knowledge about regulatory regions comes from identifying similarities between distantly related species. The matter could be addressed further in a comparative genomic framework by identifying conserved regulatory regions among relatively closely related species [17], including Old World monkeys, in conjunction with a comparison to the chimpanzee sequence and with microarray expression studies that can provide functional validation. The hypothesis invoking evolution in gene-regulating regions is currently the hardest to test. Yet it may be the most promising, given what we know of human biology relative to that of apes.

The draft of the chimpanzee genome is an exciting addition to the list of sequenced vertebrate genomes. Next to the human genome itself, it is the most useful for understanding human biology and evolution. But the data still leave many questions unanswered about what genetic modifications underlie the major features distinguishing Homo sapiens from the great apes. The next stages of this grand project will involve finer-scale investigation of individual regions and genes to reveal the details of the general patterns now uncovered at the genomic level.

Wen-Hsiung Li and Matthew A. Saunders are in the Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA.
e-mail: [email protected]

1. The Chimpanzee Sequencing and Analysis Consortium. Nature 437, 69–87 (2005).
2. Cheng, Z. et al. Nature 437, 88–93 (2005).
3. Linardopoulou, E. V. et al. Nature 437, 94–100 (2005).
4. Hughes, J. F. et al. Nature 437, 101–104 (2005).
5. Chen, F. C. & Li, W.-H. Am. J. Hum. Genet. 68, 444–456 (2001).
6. Eyre-Walker, A. & Keightley, P. D. Nature 397, 344–347 (1999).
7. Fay, J. C., Wyckoff, G. J. & Wu, C. I. Genetics 158, 1227–1234 (2001).
8. Clark, A. G. et al. Science 302, 1960–1963 (2003).
9. Mathews, L. M., Chi, S. Y., Greenberg, N., Ovchinnikov, I. & Swergold, G. D. Am. J. Hum. Genet. 72, 739–748 (2003).
10. http://www.genome.gov/10002154
11. Olson, M. V. & Varki, A. Nature Rev. Genet. 4, 20–28 (2003).
12. King, M. C. & Wilson, A. C. Science 188, 107–116 (1975).
13. Enard, W. et al. Nature 418, 869–872 (2002).
14. Stedman, H. H. et al. Nature 428, 415–418 (2004).
15. Hahn, Y. & Lee, B. Bioinformatics 21, I186–I194 (2005).
16. International Human Genome Sequence Consortium. Nature 431, 931–945 (2004).
17. Boffelli, D. et al. Science 299, 1391–1394 (2003).

Figure 2 | Hypotheses to explain the genetic underpinnings of human-specific traits. Each of the three hypotheses (protein evolution, ‘less-is-more’, and gene-regulatory evolution) is depicted by a circle, with note of the mechanisms or processes that could underlie the evolutionary change. A missense mutation causes an amino-acid change; a nonsense mutation causes a sense codon to change into a stop codon, resulting in premature termination of translation. Indels are insertions/deletions of DNA segments; exons are coding sequences; promoter regions regulate gene activity in various ways.


© 2005 Nature Publishing Group