  • SZN

    BIOLOGICAL SCIENCES: Challenges for the 21st Century

    Proceedings of the International Symposium organized by the Stazione Zoologica “Anton Dohrn” (SZN) and the International Union of Biological Sciences (IUBS), on the occasion of the 27th IUBS General Assembly, held on 9-11 November, 2000, in Naples, Italy

    Edited by Giorgio Bernardi, Jean-Claude Mounolou and Talal Younès

    Special Issue Biology International

    International Union of Biological Sciences June, 2001

  • Biology International N° 41 (June, 2001) 101

    Contents
    Foreword, by Marvalee Wake (p. 3)
    Preface, by Giorgio Bernardi, Jean-Claude Mounolou and Talal Younès (p. 5)
    Introductory Remarks, by François Gros (p. 7)
    Chromatin Boundaries: Increasing Levels of Complexity in the Regulation of Gene Expression, by Gary Felsenfeld (p. 13)
    Functions of Eukaryotic DNA Methylation, by Jean-Luc Rossignol (p. 19)
    Some Aspects of Cellular Behaviour that depend on Genetic Properties, by Renato Dulbecco (p. 24)
    Simulation of Biological Processes in the Genomic Context, by François Képès (p. 29)
    Expression Profiles of Genes in the Brain, by R. Matoba, K. Kato, C. Kurooka, C. Maruyama, C. Lim, A. Fukakusa, S. Saito and K. Matsubara (p. 39)
    From Molecule to Perception: Five Hundred Million Years of Olfaction, by John G. Hildebrand (p. 41)
    Consciousness: One of the Last Unsolved Great Biological Problems, by Gunther S. Stent (p. 53)
    Comments, by John E. Burriss (p. 58) and by Yves LeGal (p. 59)
    Genomes, Morphologies and Communities: a Hierarchical Perspective on Evolutionary Patterns, by David B. Wake (p. 61)
    Biotechnology and Plankton: Ecological Aspects of Marine Biotechnology, by Adrianna Ianora (p. 65)
    Integrative Biology: Its Promise and Its Perils, by Marvalee H. Wake (p. 71)
    Theoretical Evolution: the Future of Selection, by H.F. Hoenigsberg (p. 75)
    Challenges in Macromolecular Crystallography, by David Davies and Gerson H. Cohen (p. 81)
    Regulomics after Genomics: A Challenge for the 21st Century, by Emile Zuckerkandl (p. 87)
    Population Genomics: Gene Expression in a Vineyard Yeast from Tuscany, by Jeffrey P. Townsend, Duccio Cavalieri and Daniel L. Hartl (p. 91)
    Isochores and the Evolutionary Genomics of Vertebrates, by Giorgio Bernardi (p. 99)


    Foreword

    The advent of the new millennium has prompted many societies and organizations to discuss the accomplishments of the past century, and even the past millennium, and to forecast what the great challenges will be, and their potential solutions, in the period on which we now embark. The scientific symposium for the IUBS General Assembly in November, 2000, undertook that task, focusing on challenges to biology and biologists for the twenty-first century. An eclectic, diverse, and international group of distinguished speakers considered the challenges to biology as a discipline from the perspectives of their broad research orientations; nearly all considered the challenges for the twenty-first century to be of great import for the future of human society. Recurrent themes included 1) the impact of biotechnology on science and on policy; 2) the concern that the problems to be dealt with are complex, and require broadly-based, forward-thinking, and integrative approaches that bring together the expertise of several different subdisciplines, not only within biology but also in the physical and social sciences and the humanities, depending on the questions being asked; 3) the implications of new techniques as applied to old and to new questions; 4) the fact that constantly emerging information about what genes can tell us is dramatically changing both our approaches to our science and the kinds of answers, and especially applications, that we can expect; and 5) the idea that two emphases, a new reductionism and especially a new emphasis on hierarchical approaches to complexity, are emerging. 
Participants identified several major foci that have assumed new relevance now that approaches to the analysis of complexity include hierarchical analyses spanning levels from the molecule to the whole organism to the ecosystem, more extensive use of quantitative and statistical measures, and a broader application of the new tools (ranging from new instrumentation to new techniques) to many kinds of biological questions. Foci included the following: the organization of the brain and the peripheral nervous system for major general functions, such as olfaction with all its implications, and particularly the nature of consciousness; the new insights into the establishment, maintenance, and loss of biodiversity yielded by studies that examine the molecular through population bases of organization; the relationship of the burgeoning genetic data to the causal identification and treatment of disease in ALL organisms; the scientific and societal implications of cloning, gene-therapy approaches, and genetic modification in general; the need for more comparative studies as new understanding of the complexity of gene regulation extends to understanding of the diversity of life; and the ways in which understanding the genetic and cellular structures (e.g., genes) and functions (e.g., regulation) that many organisms have in common will also yield new insights into the origin of organismal variation and the evolution of new functions, new forms, and new taxa.


    IUBS promises thoughtful attention to these discussions; the reader will find many areas revealed in these papers in which IUBS can and must promote conceptual and programmatic approaches that should benefit science and society through international collaboration on major new issues in biology. IUBS is poised to accept that task, and welcomes the participation of scientists throughout the world as we embark on addressing the challenges of the 21st century, and, we hope, the delineation of their solutions.

    Marvalee H. Wake, President, IUBS


    Preface

    Starting with the rediscovery of Mendel's laws and the discovery by Boveri and Sutton of the connection between Mendel's "factors" and chromosomes, the 20th Century steadily built up a new biological knowledge encompassing all levels of organisation, from molecules and cells to organisms and ecosystems. This led to the achievements of the last fifty years, notably understanding the structure of DNA, deciphering the genetic code, elucidating the regulation of gene expression, and the impressive breakthroughs of molecular genetics, developmental biology and molecular evolution. Never in the history of science has our knowledge of the diversity and complexity of life on Earth, and of what we are, progressed so rapidly. This has enabled developments in economy and society that were not even anticipated until recently. In accordance with its mission to address the challenges confronting biologists in their permanent quest for new knowledge, the International Union of Biological Sciences (IUBS) dedicated its General Assemblies in the last decade of the 20th Century to these emerging issues. Following the conferences on “Biodiversity, Science and Development, Towards a New Partnership” and “Frontiers in Biology: the Challenges of Biodiversity, Biotechnology and Sustainable Development,” organised respectively in conjunction with its General Assemblies in 1994, in Paris, and 1997, in Taipei, the conference “Biological Sciences: Challenges for the 21st Century” was held on the occasion of the 27th IUBS General Assembly, which took place in Naples, Italy, on 8-11 November, 2000, marking its 80th Jubilee. Organised by the Stazione Zoologica “Anton Dohrn” in Naples (SZN) and IUBS, this conference resumed a collaboration initiated fifty years ago between the Stazione and the Union. 
The conference's theme, “Biological Sciences: Challenges for the 21st Century,” raised a question of its own: how do IUBS ideas find their niche in a meeting where scientists are not engaged in the usual cosy debates of specialized workshops? Considering the past achievements of biology and its future, the conference focused, in particular, on integrating biological disciplines that are still widely separated, ranging from ecology to neurobiology, into a global view of biology. In setting up the program, the organisers of the Naples meeting were guided by the most recent progress in biological knowledge and molecular techniques, as well as by the questions societies are raising about the benefits and immediate application of such advances to economic activities, social change and everyday life. In three days the program could not be exhaustive, and some areas of biology deserve more deliberate and developed consideration, especially ecology, environmental sciences and ecological engineering. Attention was focused on how molecular, genetic and cellular approaches are transforming our understanding of biological processes and how they bring new building blocks to ongoing IUBS endeavours, namely Integrative Biology, Biodiversity and Biocomplexity. We all know that we, and the life around us, are the products of the lengthy process of evolution. After years of research and debate, its driving forces have been recognised as mutation, selection and drift. Understanding how these forces act, pursued for decades by biologists through an analytical approach, step by step, protein by protein, gene by gene and nucleotide by nucleotide, is being revolutionised by automated DNA sequencing and by the handling of millions of new data with the concepts and tools of informatics and information sciences. These opportunities, which appeared only in the last decade, open new perspectives and frontiers for young scientists. Some long-standing questions are being revived, such as chromosomal architecture and consistency: are these the mere result of a molecular history driven by already known evolutionary forces? Or, alternatively, are there other integrative forces and synergies that shape chromosomes at levels and paces we could not examine before? How does research on these subjects influence our understanding of the role of cryptic genes? How do the results bear on the technical development of cloning, of cell and organ transplantation, and of gene transfer? Are advances in the field acceptable to societies? Such were the topics and questions of the Naples meeting. This special issue of Biology International provides a set of extended summaries illustrating these issues and questions. Biologists are now in a position to turn some of these questions into research projects and to contribute to the social debates. A recognisable feature of this new science is the use of integrative intellectual approaches and a persistent effort to harness time and space scales. IUBS is convinced that Integrative Biology is the way forward for bringing specific, disciplinary contributions to bear on the understanding of living processes that societies ask for.

    Giorgio Bernardi, President, Stazione Zoologica “Anton Dohrn”, Naples; Chairman, Italian Committee IUBS
    Jean-Claude Mounolou, Past-President, IUBS
    Talal Younès, Executive Director, IUBS


    Biological Sciences Challenges for the 21st Century

    Opening Remarks

    By François GROS Académie des Sciences, 23 Quai de Conti, 75006 Paris, France

    I wish to thank Giorgio Bernardi for having so kindly invited me to take part in what will undoubtedly be a fascinating colloquium devoted to Biological Sciences at the turn of the century. I am also very pleased to be back, after many years, at the Zoological Station in Naples: first, because this gives me the great pleasure of meeting so many friends, including Giorgio Bernardi himself, but also because it reminds me, with emotion, of the first time I met Professor Alberto Monroy and of our first discussions about messenger RNA in amphibians. As national representative of ICSU, I also welcome the opportunity to greet my colleagues and friends from IUBS, since the colloquium and the General Assembly will run side by side. When I began my activity as a biologist, in 1947, nucleic acid metabolism held only peripheral interest. People at that time were mostly concerned with the glycolytic pathway, the Krebs cycle, the mechanisms of enzyme action, hydrogen transport and the role of vitamins and lipids in animal nutrition! Genes, of course, were known, but in spite of the early publications on the “transforming principle” by Avery at the Rockefeller Institute, they still looked like a “black box” as far as their physicochemical nature, their mode of replication and even their precise function in the cell economy were concerned. The biology of the gene was far less advanced than the physiology of the entire organism or the biology of the cell. The “big turn”, as everyone knows, was the discovery of the double helix (table 1), which constituted the first stage of the molecular biology “saga,” with its main consequences: the genetic code, the mechanisms of RNA and protein synthesis, the discovery of messenger RNA, the central dogma hypothesis, etc. 
The first table emphasizes some of the very early achievements illustrating major landmarks in biochemistry and molecular biology preceding the onset of DNA technology: one such milestone was the demonstration that nucleic acid-like polymers could be made in vitro, followed by the discovery of DNA-dependent RNA polymerase. The year 1961 marked the first identification of mRNA in phages and bacteria, lending support to the central dogma hypothesis. Then came the discovery of reverse transcriptase and of restriction enzymes, the biological tools that prepared the advent of genetic engineering. While these first achievements were based upon prokaryotic models, the switch of molecular genetics to eukaryotic systems was made possible by the onset of recombinant DNA technology and by the accompanying development of cDNA cloning (table 2). The first success in cDNA cloning was due to work by Mach, Kourilsky and Rougeon (in Geneva and at the Pasteur Institute). This encouraged others to use this approach to isolate and purify many eukaryotic genes that play an important role in the economy of eukaryotic life. Yet, 20 years ago, no one could predict the explosive impact of molecular and developmental genetics, nor the fact that gene-based technologies would become so instrumental in all areas of the life sciences, including immunology, neurobiology, general physiology, pathophysiology, development, and even the study of biodiversity and the origin of life (table 3).

    Table 1. The "Prehistoric Period": From Polynucleotide Synthesis to the Onset of Genetic Engineering
    1953: The double helix (Watson, Crick & Wilkins)
    1954-55: PNPase; DNA polymerase (Ochoa, Grunberg-Manago, Kornberg)
    1958-60: Early studies and purification of DNA-dependent RNA polymerase (Weiss, Hurwitz, Stevens)
    1961: The messenger hypothesis (Jacob & Monod)
        Identification of messenger RNA: preliminary work (Volkin & Astrachan, 1958); identification in phage and bacteria (Jacob, Brenner, Meselson, Gros et al., Hall & Spiegelman)
    1968: Chemical synthesis of polydeoxyribonucleotides of defined sequences (Khorana)
    1968-70: Discovery of restriction enzymes (Arber, Nathans, Smith); discovery of reverse transcriptase (Temin & Baltimore); total synthesis of the gene for an alanine tRNA from yeast (Agarwal et al.)
    1970-73: Synthesis of DNAs complementary to messenger RNAs (slime mold; rabbit reticulocyte 10S RNA; calf lens crystallin mRNA...)
        1972: I. Verma, G. F. Temple, H. Fang & D. Baltimore
        1972: H. Aviv, S. Packman, D. Swan, J. Ross & P. Leder
        1973: A.J.M. Berns & H. Bloemendal; S.J. Kaufman
        1974: I. Verma, R.A. Firtel, H. Lodish & D. Baltimore
    1973: Genetic engineering and first cloning techniques (Berg; Cohen, Chang, Helling & Boyer)
    1975-78: Nucleic acid sequencing, the prelude to genomics (Sanger, Maxam, Gilbert)


    Table 2. cDNA Cloning
    1975-1982: Cloning of various eukaryotic cDNAs (work from the Pasteur Institute group and B. Mach's group), Nucl. Acids Res. 2, 2365 (1975)
    1976: Synthesis of full-length double-stranded DNA from rabbit 9S globin mRNA
    1977: Cloning and amplification of rabbit α and β globin gene sequences; cloning of the mouse kappa light chain Ig gene
    1979: Molecular cloning and nucleotide sequence of human growth hormone cDNA
    1980: Cloning of the “mu” chain of mouse IgM
    1982: Cloning of rabbit γ heavy chain cDNA; cloning of the AChR α subunit cDNA

    Table 3. The Genetic Paradigm and its Impact on Life Sciences
    • The possibility to purify eukaryotic genes by cDNA cloning and to monitor their products
    • The knowledge of the main genetic regulatory circuits (cis-regulatory sequences, trans-activating factors, phosphorylation cascades, oncogenes, developmental genes, apoptotic genes...)
    • The in vivo approach to gene function (transgenesis, knock-out mice...)
    • Comparative sequence studies afforded by genomics, renewing approaches to biodiversity, evolution and the origin of life
    • The (re)discovery of the "RNA world" (ribozymes)
    • The renewal of physiopathology and pharmacology (monogenic diseases, gene-therapy approaches, pharmacogenomics, gene expression profiling using microarrays)


    Some of the main reasons for this permeation of the life sciences by molecular genetics relate to methodological breakthroughs, such as gene cloning, transgenesis, comparative sequence studies involving gene banks, reverse genetics, pharmacogenomics and the use of microarrays. Others are of a more fundamental nature, such as the deciphering of positive control mechanisms and the discovery of developmental genes, transduction cascades and apoptotic mechanisms. But at this stage of my remarks, I would not like to leave the audience with the feeling that molecular and genetic approaches are “self-sufficient” for a thorough and global understanding of life. This way of thinking would give rise to a somewhat naive and oversimplified vision of the present status of the life sciences. Understanding the structure of molecules, gene action and regulatory circuits is fine, but many more efforts are needed if we want to understand functions and really penetrate the complex hierarchy in the organisation and physiology of living beings, or the way they interact with the environment. On the one hand, it seems to me that some aspects of biology will probably reflect a more reductionistic attitude as a result of the most recent achievements in physics and informatics; on the other, the biology of the 21st century will also pursue a complementary, if not opposite, direction: not only integrating molecular data into complex physiological systems, but also tackling the complex systems themselves directly. Table 4 emphasizes some of the present challenges that “structural biology” has to face, namely to better understand the physico-chemical phenomena of life in both time and space. 
For example, recent data on the structure of the 30S ribosomal subunit, derived through a combination of synchrotron radiation and computerized models, have completely renewed our concepts of how these organelles function in protein synthesis, by showing that ribosomal RNA, rather than ribosomal proteins, plays the major role during the main translational steps. Great progress has also been made in the microscopic examination of the cell, thanks to the new performance of multiphoton microscopy, which provides access to protein traffic within the cell.

    Table 4. A New Reductionistic Approach Using High-Resolution Techniques Combined with Model-Driven Acquisition
    ♦ Applications to the analysis of cellular superstructures (chromosomes, membranes, ribosomes, cytoskeleton): ultra-rapid transconformations; use of synchrotron radiation for crystallographic studies; improved visualization of cellular components using new microscopes (biphotonic, tunnel effect...)
    ♦ Refine the clues for studying the 3-D structures of proteins


    Much more work will be needed to understand genome organisation within such superstructures as chromatin or the nucleus, and we still know relatively little about the mechanisms of three-dimensional protein folding or the rules that govern it. Too few protein structures have been described so far, and there is no clear-cut relationship between a protein's primary sequence and its 3-D structure. Turning now to higher levels in the organisation of living beings, it is clear that we will face even more difficult challenges (table 5). Not only will we have to store and handle a fantastic amount of gene sequence data and compute gene expression profiles in order to arrive at integrated physiological functions (and hence to reach the complexity of what Zuckerkandl defines as “Regulomics”), but it seems obvious to many that there is an urgent need for a complete revival of taxonomy and for the analysis of functional biodiversity. Finally, the neurosciences constitute a world in themselves, one that not only embraces molecular genetics but bears upon many complex phenomena related to communication between neurons and to the state of consciousness, touching the various aspects of the cognitive sciences, including psychology. According to Gunther Stent, “…Consciousness remains one of the last unsolved great biological problems.”

    Table 5. From Molecules to the Biosphere...
    ♦ An unprecedented effort will be needed (in bioinformatics...) to integrate data from gene sequence analyses, as well as transcriptome and proteome studies, into a physiologically coherent picture, to learn more about functions
    ♦ "Regulomics" after "Genomics"
    ♦ Bioinformatics should be in synergy with systematics to better tackle biodiversity (the taxonomic and functional diversity of living organisms)
    ♦ But this should not obviate the need for more traditional methods (field exploration; management of large collections and specimens)
    ♦ The goal is to attempt a catalogue of life on earth...


    In conclusion, as is so clearly stated by IUBS, and more specifically by Marvalee Wake in the paper she will present in a forthcoming session: “Many of the questions now being addressed by biologists require both reductionistic and incorporative elements, but in a framework that allows the resolution of the sub-elements of the question to contribute to an answer to a larger problem...” Even if the term “integrative biology” carries different connotations, and, as Dr. Wake will explain, means more than the mere aggregation of workers with different expertise to consider complex problems, I am convinced that most people in this room share the view that the 21st century will be the century of “integrative biologists,” that is to say, of “…biologists trained in such a way as to span different levels of biological organisations, of the complex hierarchy of the living world, and perhaps extend to non-biological realms.”


    Chromatin Boundaries: Increasing Levels of Complexity in the Regulation of Gene Expression

    By Gary Felsenfeld

    Laboratory of Molecular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892-0540, U.S.A.

    Introduction

    Our understanding of how gene expression is regulated began with microbial systems and with very simple regulatory elements like the one controlling the lac operon. As in other areas of biology, the continued study of such systems, especially in eukaryotes, has revealed increasingly complex mechanisms. For many years we have investigated the regulation of the chicken β-globin locus, particularly the interplay between the classical regulatory elements controlling activation of the individual genes and the further control conferred by the chromatin structure in which these genes are packaged. Early models of regulation at the level of individual globin genes seemed to indicate that relatively few transcription factors were involved. For example, the β/ε enhancer located 3' of the adult β-globin gene contains two binding sites for the transcription factor GATA-1 and a single NF-E2 site, which appear to account for most of the activation properties of this enhancer in the erythroid cells we have studied (Emerson et al., 1985; Reitman and Felsenfeld, 1988). The situation at the βA-globin promoter, however, appears to be a great deal more complicated: it involves multiple regulatory factors and, of course, recruits the multiple protein components associated with the basal transcription complex (Emerson et al., 1989). During the past few years it has become increasingly clear that chromatin structure also contributes critically to the regulation of gene expression in eukaryotes. This should not be viewed as a separate layer of regulation imposed on the simpler enhancer-promoter mechanism, but rather as part of a single system in which all elements interact with one another by exploiting the unique chemistry of the individual components. As we have begun to comprehend the complexity of this regulatory system, we have also begun to realize that we are certainly not at the end of the story. 
We presently understand only the simpler aspects of chromatin structure: the organization of histones and DNA within the nucleosome is well understood (though its chemistry is only now being explored) (Luger et al., 1997). The arrangement of nucleosomes within the next higher-order structure, the 30 nm solenoidal fiber, is, however, still a matter of controversy, and we know very little about the higher orders of organization of chromatin within the nucleus. Studies in yeast and in Drosophila have, however, begun to dissect the structure of heterochromatic domains, and particularly the protein components that are necessary to stabilize them. These proteins presumably serve to render the folded polynucleosome complex even more inaccessible. In contrast, the structure of transcriptionally active chromatin is much more open. In the case of genes transcribed by RNA polymerase II, such as β-globin, individual nucleosomes are found over the body of the gene, and some further compaction is even observed (Fisher and Felsenfeld, 1986; Kimura et al., 1983), but such genes are marked by heightened sensitivity to nucleases (notably DNase I) and by elevated levels of histone acetylation over the entire domain. Thus the nucleus contains both regions of active chromatin and regions in which the chromatin is condensed, and there must be places in the genome where the two meet. What keeps the two regions distinct? It is now clear that in a number of cases special sequence elements called insulators serve to define the boundaries between active and inactive domains, or between two domains with distinct regulatory elements and programs of expression. The first such elements were identified in Drosophila. An insulator is defined by one or both of two properties: it can block the action of an enhancer when it lies between the enhancer and the promoter, and it can protect a stably integrated reporter gene from position effects in transgenic animals or in cell lines (see Bell and Felsenfeld, 1999, for review).
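The first of these two defining properties is purely positional, so it can be stated as a small rule. The following is a minimal sketch, not a model used in the work described here: it assumes a linear map on which the enhancer, the promoter and any insulators are reduced to hypothetical integer coordinates, and the function name `enhancer_reaches_promoter` is our own illustration. It ignores chromatin folding and the second property (protection from position effects) entirely.

```python
def enhancer_reaches_promoter(enhancer: int, promoter: int, insulators: list[int]) -> bool:
    """Toy positional rule (illustrative assumption): an insulator blocks
    enhancer-promoter communication only if it lies strictly between them
    on a linear map. Coordinates are hypothetical, in arbitrary units."""
    lo, hi = sorted((enhancer, promoter))
    # Any insulator inside the enhancer-promoter interval interrupts communication;
    # insulators outside the interval have no effect in this toy rule.
    return not any(lo < pos < hi for pos in insulators)


if __name__ == "__main__":
    print(enhancer_reaches_promoter(0, 10, []))    # True: no insulator present
    print(enhancer_reaches_promoter(0, 10, [5]))   # False: insulator between them
    print(enhancer_reaches_promoter(0, 10, [15]))  # True: insulator lies outside
```

Sorting the two coordinates first means the rule does not depend on whether the enhancer sits upstream or downstream of the promoter, which matches the way the definition is phrased.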

    Results

    We have identified a sequence element with the properties of an insulator at the 5' end of the chicken β-globin locus (Chung et al., 1993). It is marked by a DNase-hypersensitive site present in every tissue. The boundary of the active chromatin domain of the globin locus has been shown to lie just upstream of this site, which is consistent with a possible role in vivo as a boundary element (Hebbes et al., 1994). We first devised an assay to test the enhancer-blocking capabilities of a 1.2 kb sequence containing the hypersensitive site (Chung et al., 1993). As shown in Figure 1, this sequence is capable of interfering with enhancer-promoter interaction. We next dissected the element into smaller regions in order to identify both the sequence critical for insulation and the protein or proteins associated with it (Chung et al., 1997). We ultimately found a 50 bp fragment with strong enhancer-blocking properties that bound a single protein, which we identified as a known transcription factor, CTCF (Bell et al., 1999). CTCF is a protein with 11 zinc fingers that recognizes a variety of DNA binding sites by employing varying subsets of its fingers (Filippova et al., 1998). If CTCF binding sites are indeed associated with domain boundaries, one might expect the arrangement found at the 5' end of the β-globin locus to be present at its 3' end as well. We therefore examined the DNA sequence and chromatin structure at the 3' end of the open globin chromatin domain and identified a DNase hypersensitive site just upstream of the boundary between this domain and a downstream region of condensed chromatin containing a gene for an odorant receptor (Saitoh et al., 2000). Within the hypersensitive site we further identified a DNA sequence that functions as a positional enhancer blocker in our assay and is able to bind CTCF. 
The existence of this second site at the other end of the domain makes it unlikely that the placement of CTCF sites near boundaries is an accident, and supports the idea that CTCF binding at these sites plays some role in the establishment or maintenance of these boundaries. Upstream of the 5' CTCF site lies another gene, encoding a folate receptor, which is expressed early in erythroid development (Prioleau et al., 1999) and has its own program of expression, distinct from that of the β-globin genes. We have suggested that the CTCF sites at either end of the locus serve to prevent cross-talk between the regulatory elements of the β-globin genes and those of the folate receptor gene at the 5' end or the odorant receptor gene at the 3' end.

    Figure 1. Results of an assay for the enhancer-blocking activity of a 1.2 kb insulator element (I) from the 5' end of the chicken β-globin locus. A reporter carrying a gene for neomycin resistance, driven by an erythroid-specific promoter and enhancer, is stably transformed into a human erythroleukemia line, K562, and the number of resistant colonies is subsequently counted. In the control, DNA sequences from bacteriophage λ are inserted to keep spacings similar. When one or two copies of the insulator are inserted, the number of colonies is greatly reduced, because the insulator prevents access of the enhancer to the promoter. (See Chung et al., 1993.)

    [Figure 1 chart: "Effect of 1.2 kb 5' β-globin element on enhancer action"; relative colony number for γ-NEO reporter constructs carrying the enhancer (ENH) together with λ spacer DNA or with one or two copies of the insulator (I).]
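The Figure 1 readout is a colony count expressed relative to a control. A minimal sketch of that normalization follows. It assumes that "relative colony number" means each construct's count divided by the count for the λ-spacer control; the figure's axis label does not state the normalization explicitly, and the function name, construct names and counts below are hypothetical placeholders, not data from Chung et al. (1993).

```python
def relative_colony_numbers(counts: dict[str, int], control: str) -> dict[str, float]:
    """Express each construct's resistant-colony count as a fraction of the
    control construct's count (assumed normalization, for illustration)."""
    base = counts[control]
    if base <= 0:
        raise ValueError("the control construct must have a positive colony count")
    return {name: n / base for name, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts for illustration only; these are NOT experimental results.
    counts = {"lambda_spacer": 200, "one_insulator": 100, "two_insulators": 50}
    for name, rel in relative_colony_numbers(counts, "lambda_spacer").items():
        print(f"{name}: {rel:.2f}")
```

Dividing by the λ-spacer control, whose inserted DNA keeps spacings similar, is meant to separate the effect of the insulator sequence from that of spacing alone.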

    We have searched for CTCF sites within other gene loci. Other workers have reported the existence of enhancer-blocking elements within the human T cell receptor α/δ locus (Zhong and Krangel, 1997) and within the Xenopus ribosomal repeat organizer (Robinett et al., 1997). We have shown that both of these contain CTCF binding sites (Bell et al., 1999). Recently we have extended our work to a study of the imprinting phenomenon at the Igf2/H19 locus found in mouse, rat and human (Bell and Felsenfeld, 2000; see also Hark et al., 2000). It has been known for some time that on the maternally transmitted allele H19 is expressed but Igf2 is not, while the opposite is true on the paternally transmitted allele, where H19 is silent and Igf2 is expressed. Furthermore, a region lying between the two genes, called the imprinted control region (ICR), is methylated on the paternal but not the maternal allele. It had been suggested that the ICR might contain an insulator capable of blocking activation of the Igf2 gene by a downstream enhancer (see Figure 2). We tested a DNA fragment containing most of the ICR and found it capable of blocking enhancer activity; further examination revealed four CTCF sites in the mouse ICR and seven in the human ICR. Most importantly, methylation of these CTCF sites abolished CTCF binding (Bell and Felsenfeld, 2000). Parallel experiments with quite similar results have been carried out by Tilghman and her colleagues (Hark et al., 2000). These results explain how imprinting regulates Igf2 expression. They also show that CTCF sites do play an important role in establishing boundaries in vivo, and that this boundary function can be modulated by DNA methylation.

    Figure 2. Mechanism of insulator action at the Igf2/H19 locus. The imprinted control region (ICR) contains four CTCF binding sites that, on the maternally transmitted allele, effectively block the action of the downstream enhancer (E) on the Igf2 promoter. On the paternal allele the CTCF sites are methylated, abolishing CTCF binding; the insulator is thereby inactivated and the enhancer is able to activate Igf2 expression (Bell and Felsenfeld, 2000; Hark et al., 2000).

    [Figure 2 panels: top, maternal allele, "ENHANCER BLOCKED": CTCF bound at the ICR, enhancer (E) downstream of H19; Igf2 off (- - -), H19 on (+++). Bottom, paternal allele, "ENHANCER NOT BLOCKED": ICR methylated (CH3); Igf2 on (+++), H19 off (- - -).]

    Increasing numbers and varieties of insulator elements are now being discovered. In addition to blocking enhancer activity, many of these can protect reporter genes from position effects when they flank the reporter in constructs stably introduced into cell lines or transgenic animals. We have shown that the 1.2 kb β-globin boundary element has this property, but that other sequences within the element, and not the CTCF site, are responsible for it.
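    The insulator logic of Figure 2 can be restated as a toy Boolean model. This is a sketch under stated assumptions, not a quantitative model from the work cited: it assumes that CTCF binds the ICR if and only if the ICR is unmethylated, that bound CTCF fully blocks the enhancer from reaching Igf2, and, to reproduce the expression pattern described in the text, that H19 is expressed only on the allele whose ICR is unmethylated. The function name `allele_state` is hypothetical.

```python
def allele_state(icr_methylated: bool) -> dict:
    """Predict Igf2/H19 expression on one allele from ICR methylation (toy model)."""
    ctcf_bound = not icr_methylated  # assumption: methylation abolishes CTCF binding
    enhancer_blocked = ctcf_bound    # assumption: bound CTCF acts as an enhancer-blocking insulator
    igf2_on = not enhancer_blocked   # Igf2 is activated only if the enhancer can reach it
    h19_on = not icr_methylated      # assumption: ICR methylation also silences H19
    return {"CTCF bound": ctcf_bound, "Igf2": igf2_on, "H19": h19_on}


for allele, methylated in (("maternal", False), ("paternal", True)):
    print(allele, allele_state(methylated))
```

    Running it reproduces the pattern described above: on the maternal allele CTCF is bound, Igf2 is off and H19 is on; on the paternal allele the reverse holds.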


    Conclusion

    There is increasing evidence that the activity of insulators involves their ability to alter the organization of chromatin within the nucleus. This organization certainly involves interactions between chromatin and internal structural elements of the nucleus, for example the nuclear lamina and the nuclear matrix (Gerasimova et al., 2000). These are likely to be not merely structural conveniences but essential participants in the regulatory mechanisms. The history of the last two decades and our most recent discoveries thus suggest that we will need to analyze increasingly complex systems, and new tools, now being developed, will be essential for that purpose. In particular, it will be necessary to use sophisticated microscopic techniques in conjunction with specific probes for proteins and nucleic acid sequences to locate the participants in these reactions relative to each other and to the architectural features of the nucleus. Cell biology will then have returned to its origins in microscopy, but informed by the molecular biology of the last century.

    References

    Bell, A.C. and Felsenfeld, G. (1999). Stopped at the border: boundaries and insulators. Curr. Opin. Genet. Dev. 9, 191-198.

    Bell, A.C. and Felsenfeld, G. (2000). Methylation of a CTCF-dependent boundary controls imprinted expression of the Igf2 gene [see comments]. Nature 405, 482-485.

    Bell, A.C., West, A.G., and Felsenfeld, G. (1999). The protein CTCF is required for the enhancer blocking activity of vertebrate insulators. Cell 98, 387-396.

    Chung, J.H., Bell, A.C., and Felsenfeld, G. (1997). Characterization of the chicken beta-globin insulator. Proc. Natl. Acad. Sci. U. S. A. 94, 575-580.

    Chung, J.H., Whiteley, M., and Felsenfeld, G. (1993). A 5' element of the chicken beta-globin domain serves as an insulator in human erythroid cells and protects against position effect in Drosophila. Cell 74, 505-514.

    Emerson, B.M., Lewis, C.D., and Felsenfeld, G. (1985). Interaction of specific nuclear factors with the nuclease-hypersensitive region of the chicken adult beta-globin gene: nature of the binding domain. Cell 41, 21-30.

    Emerson, B.M., Nickol, J.M., and Fong, T.C. (1989). Erythroid-specific activation and derepression of the chick beta-globin promoter in vitro. Cell 57, 1189-1200.

    Filippova, G.N., Lindblom, A., Meincke, L.J., Klenova, E.M., Neiman, P.E., Collins, S.J., Doggett, N.A., and Lobanenkov, V.V. (1998). A widely expressed transcription factor with multiple DNA sequence specificity, CTCF, is localized at chromosome segment 16q22.1 within one of the smallest regions of overlap for common deletions in breast and prostate cancers. Genes Chromosomes Cancer 22, 26-36.

    Fisher, E.A. and Felsenfeld, G. (1986). Comparison of the folding of beta-globin and ovalbumin gene containing chromatin isolated from chicken oviduct and erythrocytes. Biochemistry 25, 8010-8016.

    Gerasimova, T.I., Byrd, K., and Corces, V.G. (2000). A chromatin insulator determines the nuclear localization of DNA. Mol. Cell 6, 1025-1035.

    Hark, A.T., Schoenherr, C.J., Katz, D.J., Ingram, R.S., Levorse, J.M., and Tilghman, S.M. (2000). CTCF mediates methylation-sensitive enhancer-blocking activity at the H19/Igf2 locus [see comments]. Nature 405, 486-489.

    Hebbes, T.R., Clayton, A.L., Thorne, A.W., and Crane-Robinson, C. (1994). Core histone hyperacetylation co-maps with generalized DNase I sensitivity in the chicken beta-globin chromosomal domain. EMBO J. 13, 1823-1830.

    Kimura, T., Mills, F.C., Allan, J., and Gould, H. (1983). Selective unfolding of erythroid chromatin in the region of the active beta-globin gene. Nature 306, 709-712.

    Luger, K., Mader, A.W., Richmond, R.K., Sargent, D.F., and Richmond, T.J. (1997). Crystal structure of the nucleosome core particle at 2.8 Å resolution [see comments]. Nature 389, 251-260.

    Prioleau, M.N., Nony, P., Simpson, M., and Felsenfeld, G. (1999). An insulator element and condensed chromatin region separate the chicken beta-globin locus from an independently regulated erythroid-specific folate receptor gene. EMBO J. 18, 4035-4048.

    Reitman, M. and Felsenfeld, G. (1988). Mutational analysis of the chicken beta-globin enhancer reveals two positive-acting domains. Proc. Natl. Acad. Sci. U. S. A. 85, 6267-6271.

    Robinett, C.C., O'Connor, A., and Dunaway, M. (1997). The repeat organizer, a specialized insulator element within the intergenic spacer of the Xenopus rRNA genes. Mol. Cell Biol. 17, 2866-2875.

    Saitoh, N., Bell, A.C., Recillas-Targa, F., West, A.G., Simpson, M., Pikaart, M., and Felsenfeld, G. (2000). Structural and functional conservation at the boundaries of the chicken beta-globin domain. EMBO J. 19, 2315-2322.

    Zhong, X.P. and Krangel, M.S. (1997). An enhancer-blocking element between alpha and delta gene segments within the human T cell receptor alpha/delta locus. Proc. Natl. Acad. Sci. U. S. A. 94, 5219-5224.


    Functions of Eukaryotic DNA Methylation

    By Jean-Luc Rossignol

    Institut Jacques Monod, UMR 7592 (CNRS/Univ. Paris 6/Univ. Paris 7), Paris, France

    Introduction

    Cytosine methylation is a chemical modification of DNA that can be faithfully inherited. Unlike base substitution mutations, this modification does not change the pairing properties of the modified base. C-methylation is observed in all three domains of life: Archaea, Bacteria and Eukaryotes. It is catalyzed by C-DNA methyltransferases (MTases), which all share a catalytic domain with nine well-conserved motifs arranged in the same order along their primary structure. This indicates that C-methylation is an ancestral mechanism that predated the diversification of the three domains of life. However, methylation is not observed in all eukaryotes. Whereas it has been found in all plants and vertebrates studied, it has not been detected in several unrelated animal species. In fungi, it is present in some species (Ascobolus immersus, Neurospora crassa) but has not been detected in others (Saccharomyces cerevisiae, Schizosaccharomyces pombe). The failure to find any gene related to MTases in the completely sequenced genomes of C. elegans and S. cerevisiae strongly suggests that methylation is indeed absent from these organisms. How can this apparent paradox be resolved? On the one hand, methylation is an ancestral process that has been conserved in many animal, plant and fungal species despite its mutational load (deamination of methylated Cs favours C to T transitions, and hypermethylation of tumour suppressor gene promoters favours progression to cancer). On the other hand, methylation is absent from various fungal and animal species. This clearly means that methylation does not serve any essential function common to all eukaryotes. We have proposed, with Vincent Colot (1), that methylation has been maintained during evolution because, by providing a heritable mark on DNA, it acts as an evolutionary device that has been used to set up a variety of functions.
In support of this hypothesis, methylation displays distinct patterns of distribution among eukaryotic genomes, several types of MTases have been identified that can be classified into distinct subfamilies, and methylation serves a variety of functions, which can either coexist within the same type of organism or differ from one type to another.

    1) Diversity of methylation patterns

    The distribution of methylation varies widely among eukaryotic genomes (see 1 for a review). In vertebrates, methylation is genome-wide, involving all genomic regions with the exception of 1-2 kb DNA segments called "CpG islands." CpG islands are mainly located upstream of genes and are found in about two thirds of them. They display a high density of CpG doublets compared with the rest of the genome. These CpGs usually stay unmethylated, with a few exceptions, such as those on the inactive X chromosome of female mammals. In contrast, in plants and fungi, methylation is fractional, being located mainly in genomic regions corresponding to DNA repeats created by transposons and retrotransposons. However, this is not the only pattern of fractional methylation: in the urochordate Ciona intestinalis, methylation was found within the transcribed parts of genes but was absent from DNA repeats (2). Another factor contributes to the variety of methylation patterns: the nature and density of methylation substrates (see 1 for a review). In adult mammalian tissues, methylation involves almost exclusively Cs belonging to symmetrical CpG doublets. Furthermore, outside CpG islands, CpGs are about three times less frequent than expected from the frequencies of C and G. In contrast, in fungi CpGs are as frequent as expected, and methylation also involves Cs that do not belong to CpGs. Therefore, in methylated genomic regions, the density of methylation is much higher in fungi than in vertebrates.
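    The comparison between observed and expected CpG frequency can be made concrete with the commonly used observed/expected ratio, (number of CpG × sequence length) / (number of C × number of G). On this scale, "as frequent as expected" corresponds to a ratio near 1 and "three times lower than expected" to a ratio near 1/3. The minimal sketch below assumes a plain A/C/G/T string; the function name is hypothetical, and real analyses usually compute the ratio in sliding windows.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0  # ratio undefined without both C and G; report 0 by convention here
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count
    cpg = seq.count("CG")
    return cpg * len(seq) / (c * g)


print(cpg_obs_exp("CCGG"))      # 1.0: one CpG, exactly as many as expected
print(cpg_obs_exp("CACAGTGT"))  # 0.0: C and G present, but never as a CpG doublet
```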

    2) Multiplicity of eukaryotic methyltransferases

    By comparing the nine most conserved motifs of the catalytic domain of MTases, it is possible to construct a phylogenetic tree (Figure). There is as much diversity among eukaryotic MTases as between eukaryotic and prokaryotic DNA MTases. Eukaryotic MTases can be tentatively classified into five subfamilies, two of which include members from all three kingdoms: animals, plants and fungi. These observations suggest that the diversification of MTases predated the appearance of these eukaryotic kingdoms.

    Figure: Phylogenetic tree of eukaryotic C-DNA methyltransferases
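    The text does not say how the tree in the Figure was built. As a hedged illustration of distance-based tree building from aligned motifs, the sketch below computes p-distances (the fraction of differing aligned positions) and repeatedly joins the closest clusters with UPGMA, a simple average-linkage method; this may well differ from the method actually used. The four sequences are invented, pre-aligned toy fragments, not real MTase motifs, and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict) -> str:
    """Return a UPGMA tree topology in Newick-like form (no branch lengths)."""
    size = {name: 1 for name in seqs}  # cluster label -> number of sequences in it
    dist = {(a, b): p_distance(seqs[a], seqs[b])
            for a, b in combinations(sorted(seqs), 2)}  # keys are sorted label pairs

    def d(x, y):
        return dist[(x, y) if x < y else (y, x)]

    while len(size) > 1:
        a, b = min(dist, key=dist.get)  # closest pair of clusters
        merged = f"({a},{b})"
        for k in size:
            if k in (a, b):
                continue
            # size-weighted average of the distances from a and b to k
            new_d = (size[a] * d(a, k) + size[b] * d(b, k)) / (size[a] + size[b])
            dist[(merged, k) if merged < k else (k, merged)] = new_d
        dist = {pair: v for pair, v in dist.items() if a not in pair and b not in pair}
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


toy = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAT", "C": "TTTTAAAAAA", "D": "TTTTAAAAAT"}
print(upgma(toy))
```

    For this toy input the function returns ((A,B),(C,D)): A pairs with B and C with D, each pair differing at a single position.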


    3) Various functions of methylation

    The present experimental evidence indicates that methylation can serve at least four distinct functions: preventing gene expression, favouring gene expression, marking imprinted genes and repressing recombination.

    3.1. Repressing gene expression

    In mammalian cells, a strong correlation is found between repression of the transcription of genes located downstream of CpG islands and methylation of these CpG islands. In vitro methylation of genes prevents their expression after transfection. There is now good evidence that methylation of promoters containing a high density of CpGs also has a causal role in preventing transcription initiation in vivo (see 1 for a review, and 4). Methylation is thought to allow the establishment and maintenance of inactive chromatin states through the recruitment of histone deacetylases, either via methyl-binding proteins (1) or via MTases that are targeted to the replication fork during late S phase (see 5 for a review). In fungi, methylation also inhibits transcription, but it does so by acting on transcript elongation, which is blocked when it reaches methylated regions; little or no effect is observed on transcription initiation. Conversely, in mammals, heavy methylation of the transcribed sequence does not appear to affect transcript elongation. Methylation therefore represses transcription in two completely different ways: by blocking initiation in vertebrates and by preventing transcript elongation in fungi.

    3.2. Marking imprinted genes

    Another function of methylation is to provide a heritable signal that has no direct effect on transcription. This is illustrated by parental imprinting in mammals. Imprinted genes display monoallelic expression restricted to the paternal allele (for some imprinted genes) or to the maternal allele (for others). The imprint is established in the germline and has been shown in several cases to be associated with uniparental methylation of a small DNA region lying close to the imprinted gene, called the "imprinting box." Several observations indicate that methylation acts as a signal that can be dissociated from its effect on gene expression. First, monoallelic expression is set up only in the post-implantation embryo, after numerous cell divisions, which implies that the signal was inherited without being executed. Second, in mutant cells that do not express the maintenance MTase dnmt1, methylation of the imprinting box is lost and monoallelic expression is never set up. Introducing a functional dnmt1 gene by transfection does not restore monoallelic expression, indicating that the lack of monoallelic expression is due to the loss of a transmissible signal rather than being a direct consequence of the absence of methylation (see 1 for a review).

    3.3. Allowing gene expression

    The allele that remains silent once monoallelic expression is set up can be the one whose imprinting box was methylated. This is simply explained if, after implantation, methylation spreads to the promoter region of the imprinted gene, thereby preventing its expression. However, the opposite situation, in which the expressed allele is the one whose imprinting box was methylated, is also observed. In this case both alleles remain silent during early development, and loss of the imprinting signal in MTase-deficient cells leads to failure of expression of the imprinted gene from both alleles. This means that here the methylation of the imprinting box is required to trigger gene expression. An example is Igf2 in the mouse. H19 and Igf2 are two linked imprinted genes: H19 is expressed only from the maternal chromosome, whereas Igf2 is expressed only from the paternal one. The imprinting box is methylated on the paternal chromosome, and loss of the imprinting methylation signal results in the absence of Igf2 expression. In other words, expression of Igf2 requires methylation of the imprinting box. The teams of G. Felsenfeld and S. Tilghman (reviewed in 6) have shown that the imprinting box corresponds to an insulator element. Insulation is mediated by the protein CTCF, which binds this region. On the unmethylated maternal chromosome, an enhancer element located close to H19 activates this gene, while the insulator, which lies between Igf2 and H19, prevents the enhancer from activating the more distant Igf2 gene.
In contrast, on the paternal chromosome, methylation of the imprinting box prevents the binding of CTCF, so that this region can no longer act as an insulator. Consequently, the enhancer can activate the distant Igf2 gene. In addition, methylation of the imprinting box, which extends to the promoter of H19, prevents expression of this gene.

    3.4. Repressing homologous recombination

    It has already been shown that methylation of immunoglobulin genes prevents their rearrangement by the specialized V(D)J recombination machinery, probably by inducing a closed chromatin state that prevents access of the recombinase (7). Methylation might also prevent homologous recombination. Indeed, in plants and fungi, methylation mainly affects DNA repeats, which often lie at distinct positions on the same or on different chromosomes. Homologous recombination between these repeats threatens genome integrity by producing various types of chromosome rearrangements. Methylation might therefore contribute to genome stability by preventing homologous recombination. To test this hypothesis in our laboratory, Laurent Maloisel made a construct in the fungus Ascobolus in which a meiotic recombination hot spot flanked by two genetic markers was integrated into a chromosome. The frequency of crossing-over was measured in strains in which the hot spot was either unmethylated, methylated in both parental DNA molecules, or methylated in only one of them. Recombination dropped about 300-fold in strains where both parental molecules had been methylated, and nearly 50-fold in strains harboring one methylated and one unmethylated parental molecule (8). Studies in yeast have shown that meiotic recombination is initiated by a double-strand break that occurs in either parental molecule with equal probability. If methylation acted only by preventing this initial break in the methylated molecule, breaks in the unmethylated molecule would still occur, and recombination in the hemimethylated strains should fall only by half, that is, about two-fold. The observed 50-fold decrease therefore indicates that methylation acts at steps of the recombination process other than just the initial cut.
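    The argument about the initiating break is simple arithmetic, sketched below under the model stated in the text: the double-strand break falls on either parental molecule with probability 1/2, and methylation is assumed to do nothing except prevent a break on the methylated molecule. The observed values are the approximate figures reported above (8); the names `predicted_fold_reduction` and `OBSERVED_FOLD` are hypothetical.

```python
# Approximate fold reductions reported in the text (8), keyed by number of methylated parents
OBSERVED_FOLD = {2: 300, 1: 50}


def predicted_fold_reduction(methylated_parents: int) -> float:
    """Fold reduction expected if methylation ONLY blocks the break on the methylated molecule."""
    if methylated_parents not in (0, 1, 2):
        raise ValueError("methylated_parents must be 0, 1 or 2")
    surviving = (2 - methylated_parents) / 2  # fraction of initiating breaks still possible
    return float("inf") if surviving == 0 else 1 / surviving


for m, observed in sorted(OBSERVED_FOLD.items()):
    print(f"{m} methylated parent(s): predicted {predicted_fold_reduction(m)}x, observed ~{observed}x")
```

    The hemimethylated case is the diagnostic one: the break-only model predicts a two-fold drop, about 25 times less than the roughly 50-fold drop observed. For fully methylated crosses the model predicts complete loss of recombination, so the 300-fold figure does not discriminate between the models.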

    Conclusion

    Besides not being universal among eukaryotic organisms, methylation displays a large diversity of genomic distribution patterns, is catalyzed by a variety of MTases, and performs distinct functions. These functions can coexist within the same organism (repressing or allowing transcription and acting as a signal for parental imprinting in mammals; repressing both transcription and recombination in fungi). They can also differ from one type of organism to another (for example, preventing transcription initiation in vertebrates and arresting transcript elongation in fungi). This variety in methylation conditions, distribution and function is expected under our hypothesis that methylation has been used during evolution as a molecular device to set up new functions. It is likely that new functions of eukaryotic methylation will be deciphered in the near future.

    References

    1- Colot, V., Rossignol, J.L. (1999) BioEssays 21: 402.

    2- Simmen, M.W., Leitgeb, S., Charlton, J., Jones, S.J.M., Harris, B.R., Clark, V.H., Bird, A. (1999) Science 283: 1164.

    3- Bestor, T.H. (2000) Hum Mol Genet 9: 2395; and Finnegan, E.J., Kovac, K.A. (2000) in Plant Gene Silencing, M.A. Matzke and A.J.M. Matzke (Eds), Kluwer Academic Publishers.

    4- Siegfried, Z., Eden, S., Mendelsohn, M., Feng, X., Tsuberi, B.Z., Cedar, H. (1999) Nat Genet 22: 203.

    5- Allshire, R., Bickmore, W. (2000) Cell 102: 705.

    6- Reik, W., Murrell, A. (2000) Nature 405: 408.

    7- Hsieh, C.L., Lieber, M.R. (1992) EMBO J 11: 315.

    8- Maloisel, L., Rossignol, J.L. (1998) Genes Dev 12: 1381.


    Some Aspects of Cellular Behaviour that depend on Genetic Properties

    By Renato Dulbecco

    Istituto di Tecnologie Biomediche Avanzate (ITBA), Consiglio Nazionale delle Ricerche, Via Ampere 56, 20131 Milano, Italy

    In this section of the meeting, several aspects of cellular behaviour, both physiological and pathological, that depend on genetic properties will be considered. In this introduction, therefore, I will examine some aspects of genetics that are relevant to these problems. We are approaching the time when we will have a complete description of the human and other genomes, and this information can now be used to understand how living things operate. Knowing the sequences of many genes is only the beginning; we must now understand how these genes work and what their functions are in the organism: for instance, the substrate specificity of their proteins, their intracellular localisation, the biochemical or informational pathways they affect, and which organs depend on which genes for their function. To reach these goals, we must change our approach to the study of genetics. In addition to studying the role of single genes, as has been done so far, we have to move to a whole-genome approach, such as determining the expression profiles of sets of genes, the interactions of proteins among themselves and with their substrates, and the widespread effects of mutations and of environmental changes. This requires advanced technology and a sophisticated system of knowledge representation that includes not only the data but also their relationships. The investigation of the functions of the genome can be aimed at two levels: the transcriptional level, i.e., the formation of messenger RNAs, or the protein level, since proteins are the functional products of most genes. I will begin with the transcriptional level, namely the determination of when and to what extent a gene is expressed; these determinations are central to understanding its activity and its functional role.
To determine the transcriptional state of all the genes active in a cell, that is, the transcriptome, two methods are available: SAGE (Serial Analysis of Gene Expression) and DNA microarrays. In the SAGE method, short segments (tags) of the various mRNAs extracted from cells are joined together into long chains, which are then sequenced; the number of tags corresponding to each gene is a measure of its activity. The microarray method employs chips containing up to 40 000 reference genes. mRNAs extracted from cells and labelled with fluorescent markers are hybridized in parallel to the chip, so that, by measuring the fluorescence, it is possible to determine the number of molecules of each mRNA bound to each gene. Both SAGE and microarrays have advantages and disadvantages; I will concentrate on microarrays, which are of more general use.

    The global study of the transcriptome made possible by microarrays is aimed at understanding the function of the genome, because the results obtained so far clearly show that the transcriptome is highly dynamic: rapid and extensive changes affecting many genes occur in response to perturbations, during physiological cellular events and in pathological processes. It is therefore necessary to concentrate on the behaviour of a large number of genes in a large number of cells. This approach has been applied mainly to two situations: following the changes of expression of many genes in a cell population undergoing a change, as in studies of the cell cycle in yeast or of ageing in the mouse brain; and comparing cancer cells among themselves or with normal cells. In the first case mRNA samples are obtained at many time points; in the second they are taken from many cancers of the same type. In both cases the results are analysed mathematically to obtain a comprehensive representation. A widely used representation is a matrix in which the horizontal rows correspond to the reference genes, one gene per row, and the vertical columns correspond to the mRNA samples, one sample (time point or cancer) per column. Each intersection of a row and a column measures the expression of the gene in that row in the cells of the sample in that column. Given that in any situation many genes change simultaneously, the aim then becomes to identify the groups of genes that change activity together during a given cellular change.
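    The gene-by-sample matrix and the search for groups of co-varying genes can be sketched as follows. This is a minimal illustration, not the method of any study mentioned here: it assumes Pearson correlation between gene rows as the measure of similar behaviour and joins genes whose profiles correlate at or above an arbitrary threshold of 0.9 (single linkage). Published analyses often use hierarchical clustering or related algorithms instead. The matrix values and gene names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-variation signal
    return cov / (sx * sy)


def cluster_genes(matrix, threshold=0.9):
    """Group genes (rows) whose profiles across samples (columns) correlate
    at or above `threshold`, using single linkage via union-find."""
    genes = list(matrix)
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps the trees shallow
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(matrix[a], matrix[b]) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(groups.values())


# Hypothetical matrix: one row per reference gene, one column per sample (e.g. time point)
toy = {
    "geneA": [1, 2, 4, 8],
    "geneB": [2, 4, 8, 16],
    "geneC": [8, 4, 2, 1],
    "geneD": [16, 8, 4, 2],
    "geneE": [1, 5, 1, 5],
}
print(cluster_genes(toy))
```

    Here geneA and geneB rise together, geneC and geneD fall together, and geneE oscillates, so the sketch returns three groups. Reducing many rows to a few such groups is the dimensionality reduction the text refers to.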
Such groups are called clusters. In the matrix, clusters are identified as groups of genes (rows of the matrix) whose expression across the samples (columns) shows the most statistically similar behaviour. Identifying clusters reduces the dimensionality of the system, because it treats the behaviour of large numbers of genes together. Examining many cells at the same time has important advantages over studying single cells: in a single cell the expression of many genes is influenced by local factors, such as the stage of the cell cycle, interaction with other cells, the availability of oxygen and nutrients, stochastic events, etc., whereas in a mixed sample these variations are averaged out. A problem in studying clusters, and in the use of microarrays in general, is the choice of genes deposited on the chip, which varies with the problem under study. In investigating changes occurring in yeast under various conditions (e.g., during the cell cycle), it is possible to explore the expression of almost all genes of this organism simultaneously, since they are known; with human and other mammalian genes this is not possible, because not all genes are known. In fact, work with higher organisms usually uses fewer than 10 000 reference genes, which means that the behaviour of many genes of possibly considerable importance for the problem under study is not examined. This limitation must be kept in mind when considering the results. In some cases the difficulty has been overcome by using a collection of genes more directly involved in the problem under study: for instance, to study the behaviour of genes in lymphomas, about 18 000 genes involved in either normal or abnormal lymphocyte development were used in what is called a lymphochip. But even the lymphochip approach rests on an arbitrary assumption: that all the genes relevant to the development of cancer are included among those selected. This problem does not exist with the SAGE method, because there is no a priori gene selection. In the future, when all genes of a given species have been identified, this problem with microarrays will disappear, since it will be possible to put them all on a chip.

    An additional problem arises in the study of cancers using the cluster approach, because the mRNAs used derive from a collection of cells of various types, including both cancerous and normal cells (from connective tissue, blood vessels and infiltrating cells), and it is not known from which cells they derive. Moreover, the cancer cells themselves are a mixture of various types, adding to the uncertainty. However, identifying separate clusters related to the various cell types often reduces or eliminates this difficulty. I shall now go through a few examples of results obtained by studying gene expression in cancer cells using microarrays. From a practical standpoint, their main contribution is the separation of a cancer into subtypes, improving diagnosis and prognosis and sometimes providing therapeutic indications. Thus it has been shown that diffuse large B cell lymphoma can be subdivided into two groups, whose cells correspond to two different stages of B cell differentiation: germinal centre B cells and activated B cells. The difference has predictive value, because tumours corresponding to the activated state have a worse prognosis; the indications for treatment also appear to differ.
Considerable differences have been observed between gene expression in acute lymphoblastic leukemia and acute myeloblastic leukemia, with the identification of 50 predictor genes that should help in diagnosis. These genes control a variety of biological properties and may therefore also give insight into the pathogenesis of the two diseases; however, none of the gene activities is correlated with the response to chemotherapy. In breast cancer the results essentially confirm what was already known: there are two broad groups, oestrogen receptor positive and negative cancers, and within each group there are subgroups. The ErbB2-positive subgroup among the oestrogen receptor negative cancers is particularly distinct. In cutaneous melanomas, 19 tightly clustered genes were found to be associated with the disease; however, their activity is not associated with clinical or in vitro variables. In agreement with the limited invasiveness of the cancers examined, genes involved in cell spreading, migration and the formation of focal adhesions were down-regulated. In contrast, these genes are expressed in uveal melanomas, which are highly invasive and do not display the clusters shown by cutaneous melanomas, suggesting a different origin. A general result emerging from these studies is that the subdivision of genes into separate clusters is fundamentally determined by the differentiation stage represented in the cancer cells: in lymphoma, whether they correspond to germinal centre cells or activated B cells; in breast cancer, whether they are basal or luminal cells. With some exceptions, the results do not lead to the identification of the genes responsible for the clinical properties of the various cancers, such as malignancy and response to therapy.

    There may be various reasons for this result. One may be technical: in all these studies the number of genes explored is small, perhaps 20-25% of all human genes, so genes of major importance for cancer may not have been included. Future work will have to use the maximum possible number of genes, and also a much larger number of samples for each tumour. Another possible reason is the clustering approach itself, which detects parallel changes in many genes rather than alterations of single genes that may have important consequences. Concentrating on the behaviour of selected genes on the microarrays, rather than on clusters, may eliminate this problem. Finally, there may be a biological reason, namely the extreme complexity of the genome, both structural and functional. Structurally, most genes exist in families whose members share considerable similarities and often produce proteins with related functions; even genes that are largely independent may share particular domains, with related functional similarities. Functionally, the genes of a genome are connected in a network, as is clearly shown by the vast number of genes that change activity under the influence of physiological events, such as cell division, or in response to changes in the environment. As a result, modifications of the activities of many different genes can lead to similar results; that is, there is a great deal of redundancy. A change critical for the development of a cancer could therefore be produced by different gene alterations, and also by cooperation among changes in various genes. If this is the case, the limited number of genes deposited on the chips may not be a serious limitation, because under conditions of great redundancy even a limited sample may include representatives of genes with all possible functions. This question will be answered when it becomes possible to explore the activities of all genes.
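    The redundancy argument can be given a rough quantitative form. Under the strong simplifying assumption that the genes on a chip are a random sample of the genome (real chips, such as the lymphochip, are not), the probability that a function carried by k interchangeable genes is represented by at least one of them follows the hypergeometric distribution. The totals below are hypothetical, chosen only so that about 23% of genes are on the chip, within the 20-25% range quoted; they are not a genome count, and the function name is hypothetical.

```python
def prob_function_represented(total_genes: int, genes_on_chip: int, redundancy: int) -> float:
    """P(at least one of `redundancy` interchangeable genes is on a chip holding
    `genes_on_chip` genes drawn at random from `total_genes`)."""
    if not 0 <= genes_on_chip <= total_genes:
        raise ValueError("genes_on_chip must lie between 0 and total_genes")
    if not 0 < redundancy <= total_genes:
        raise ValueError("redundancy must lie between 1 and total_genes")
    # P(none on chip) = C(N-k, n) / C(N, n) = prod over i < k of (N-n-i) / (N-i)
    p_none = 1.0
    for i in range(redundancy):
        p_none *= (total_genes - genes_on_chip - i) / (total_genes - i)
    return 1.0 - p_none


# Hypothetical totals: about 23% of genes on the chip
TOTAL, ON_CHIP = 30_000, 7_000
for k in (1, 2, 5, 10):
    p = prob_function_represented(TOTAL, ON_CHIP, k)
    print(f"function carried by {k} gene(s): P(represented) = {p:.2f}")
```

    With these illustrative numbers the probability is about 0.23 for a function carried by a single gene but exceeds 0.9 once about ten interchangeable genes share it, which is the intuition behind the statement that a limited sample may still cover all functions.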
The other approach in the postgenomic era is proteomics, based on two-dimensional electrophoresis and mass spectrometry. Proteomics is complementary to genomics and transcriptomics because it focuses on the products of genes. In fact, the study of proteins may be even more significant than the study of mRNAs because, for a given gene, the mRNA level may be a misleading indicator of the protein level, for various reasons. Also, microarray studies usually do not distinguish between the various splice forms, whereas distinct splice forms always yield distinct proteins. Moreover, the study of proteins reveals post-translational modifications, some of which strongly affect function. Proteomics also has its own problems, different from those encountered in transcriptomics, such as the difficulty of studying hydrophobic proteins and proteins that are very large or very small; in fact, the current technology works best for abundant proteins. These difficulties may be overcome by protein chips, presently under development, which would allow the production of protein microarrays. As with mRNAs, proteins are now studied as elements of a network formed by the interactions among various proteins. Interactions are studied in part by isolating complexes, or by the two-hybrid screen. This method has now been extended to a genome-wide assay that seems very useful: it has already led to the discovery of more than 1000 interactions among yeast proteins. This work can also be helped by the study of mRNA clusters, because clustered genes tend to be functionally related, and by the recognition that two genes that are separate in some organisms but fused in others are also linked in function (the so-called Rosetta Stone approach).

In conclusion, to investigate the functionality of the genome fully, both the transcriptome and the proteome must be studied in detail. It is clear that many problems, both technical and conceptual, will have to be solved to achieve this result.
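The Rosetta Stone idea mentioned above can be stated as a simple rule: if two protein families are fused into one protein in some organism but are encoded separately in another, predict that the separate proteins are functionally linked. The sketch below applies that rule to a hypothetical input format in which each protein has already been reduced to the tuple of domain families it contains; the function name and data layout are illustrative assumptions, and real analyses detect fusions by sequence similarity rather than from pre-assigned labels.

```python
from itertools import combinations

# Hypothetical input format: organism -> list of proteins, each protein given
# as the tuple of domain families it contains (assigned beforehand).
Genomes = dict[str, list[tuple[str, ...]]]


def rosetta_links(genomes: Genomes) -> set[tuple[str, str]]:
    """Predict a functional link between two families when they are fused in
    one protein in some organism but carried by separate proteins in another."""
    fused: set[tuple[str, str]] = set()
    separate: set[tuple[str, str]] = set()
    for proteins in genomes.values():
        # Family pairs that occur together inside a single protein.
        together = {pair for p in proteins
                    for pair in combinations(sorted(set(p)), 2)}
        fused |= together
        # Family pairs present in this organism but never in the same protein.
        present = set().union(*(set(p) for p in proteins))
        separate |= {pair for pair in combinations(sorted(present), 2)
                     if pair not in together}
    return fused & separate


if __name__ == "__main__":
    genomes = {
        "organism_1": [("A", "B"), ("C",)],      # A and B fused
        "organism_2": [("A",), ("B",), ("C",)],  # A and B separate
    }
    print(rosetta_links(genomes))
```

Here only the pair (A, B) is reported: it is fused in one organism and split in the other, whereas C is never fused with anything.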

    Simulation of Biological Processes in the Genomic Context

    By François Képès

    Atelier de Génomique Cognitive, CNRS ESA 8071 / genopole® 523 Terrasses de l'Agora, 91000 Evry, France.

What are the salient features of the new scientific context within which model making and simulation will evolve from now on? Beyond its etymological link to the word "genome," Genomics could ambitiously be defined as a new field of scientific investigation at the crossroads of biology and other sciences. Its tools are those of high-throughput molecular biology as well as bioinformatics. One central problem in Genomics is to forge new tools and improve our capacity to anticipate or predict a cellular or organismal phenotype from the data generated by high-throughput biology: the sequence of genomic DNA (genome), mRNA concentrations (transcriptome), and protein concentration, activity, localization and interactions (proteome). So far, the typical approach towards this goal has been to establish statistical correlations between a given molecular polymorphism and an individual feature (Fig. 1). However, these correlations have no validity outside the feature under scrutiny, and they do not entail any causal link. In contrast, simulation demands that causal links of general validity be established. The famous example of sickle cell anemia illustrates this point (Fig. 2): the phenotype at the organism level has been reduced to a genotypic cause in several steps. In the opposite direction to this triumph of reductionism, and on largely unknown ground, the goal could become to re-establish a causal tree starting from the molecular level, where the data are abundant. In the long run, the present users of statistical correlations (biotechnologists, clinicians…) would benefit more than anyone else from this causal and more generic approach, which cuts down the tremendous costs of benchwork. Genomics is characterized by the massive accumulation of molecular data, which in principle allows predictions of a more quantitative nature than before.
Model making and simulation are the preferred tools for testing quantitative predictions that involve a great number of objects and their interactions. Since we ultimately challenge our understanding of biological phenomena by testing predictions, simulation is going to play a major and increasing role in the progress of the biological sciences. Furthermore, simulation becomes irreplaceable when it is difficult or impossible to experiment on live material for economic, technical or ethical reasons.

Genomics and simulation

Before discussing the limitations of the simulation approach, let us illustrate it with a hypothetical example that takes us from live cells all the way to simulation (Fig. 3). Differential gene expression has been measured on a bio-chip in tumor versus normal cells. It is sometimes possible to infer the underlying genetic network from such experimental data. A portion of this network can then be used as a model for running a simulation. This simulation may, for example, predict that the network can exist in either of two states, depending on the level of mitogenic stimulation acting on the main regulator “REG.” The network state in a normal cell corresponds to its survival, while that in a tumor cell leads to “CDK”-mediated cell division. In fact, no current model is complete enough to allow such a simulation, except perhaps for the developmental switch of bacteriophage lambda [1]. However, once some difficulties have been overcome, most notably concerning how the actions of several regulators on the same gene combine, simulation will probably make an effective breakthrough in this arena.

Figure 1. Statistical correlation between an individual variation at the molecular level and an individual phenotype. Bottom: as an example, a single nucleotide polymorphism ("SNP") in the genome has been correlated with an individual's propensity to diabetes. This correlation does not entail any causal link. A causality tree may then be built, either by a costly and lengthy clinical trial or bench work, or, under some conditions, by simulation.

[Figure 1 diagram: polymorphism at the molecular level versus susceptibility to a disease or to a medicine; sequence ATGCCA: low propensity; sequence ATGCGA: high propensity.]
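The inference step mentioned above, going from expression data to a genetic network, can be sketched for binary, time-ordered data. The function below is a simplified sketch in the spirit of REVEAL, the algorithm cited in the legend of Fig. 3: for a target gene it searches for the smallest set of genes whose states at time t determine the target's state at t+1 in every observed transition. REVEAL itself scores candidate input sets with mutual information; checking directly for conflicting transitions, as done here, is a simplification that amounts to requiring zero residual uncertainty on the observed data. The function name and the toy series (generated by hand from the hypothetical rules CDK(t+1) = REG(t) and APO(t+1) = not REG(t)) are illustrative assumptions.

```python
from itertools import combinations
from typing import Optional

Row = tuple[int, ...]  # one time point: 0/1 state of each gene


def infer_inputs(series: list[Row], genes: tuple[str, ...], target: str,
                 max_k: int = 2) -> tuple[Optional[tuple[str, ...]], dict[Row, int]]:
    """Find the smallest set of genes whose states at time t determine the
    state of `target` at time t+1 over all observed transitions.
    Returns (inputs, truth_table), or (None, {}) if no set of size <= max_k works."""
    col = {g: i for i, g in enumerate(genes)}
    transitions = list(zip(series[:-1], series[1:]))
    for k in range(1, max_k + 1):
        for inputs in combinations(genes, k):
            table: dict[Row, int] = {}
            consistent = True
            for now, nxt in transitions:
                key = tuple(now[col[g]] for g in inputs)
                out = nxt[col[target]]
                # A conflicting entry means these inputs do not determine the target.
                if table.setdefault(key, out) != out:
                    consistent = False
                    break
            if consistent:
                return inputs, table
    return None, {}


if __name__ == "__main__":
    genes = ("REG", "CDK", "APO")
    series = [(1, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0), (0, 1, 0), (0, 0, 1)]
    for target in genes:
        print(target, infer_inputs(series, genes, target))
```

On this toy series, REG is recovered as the single input of both CDK and APO, while no input set explains REG itself, which here plays the role of an external (mitogenic) signal.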

Figure 2. Causal analysis. In this instance, the phenotype includes an anemia (organism level) caused by the brittleness of the red blood cells (cell level), owing to the abnormal formation of fibers by hemoglobin (protein level), a consequence of a nucleotide substitution in the gene that encodes hemoglobin (genotype level). The genomicist receives abundant information mainly at the molecular level, and must therefore learn how to link up the causal chain in the opposite direction, on largely unknown ground.

[Figure 2 diagram, from PHENOTYPE to GENOTYPE: patient has anemia; red blood cells are fragile; hemoglobin forms filaments; genomic mutation. Arrows labeled REDUCE (phenotype to genotype) and EMERGE (genotype to phenotype).]

Prediction and explanation

As the above example implies, simulation can be useful to guide a costly bench or clinical experiment, to attribute a probable function to a gene (annotation), and, more generally, to generate a prediction that can be tested on live material.

Figure 3. From biological sampling to simulation. Messenger RNAs are extracted from normal or tumor cells. The ratio of their concentrations is measured on a bio-chip. The mRNAs transcribed from the "REG" and "CDK" genes are more abundant in the tumor cells than in normal cells, giving a red color (not visible in this black-and-white reproduction) at the corresponding spots. The opposite holds for the "APO" gene, giving a green spot. The other spots are yellow, indicating that the concentrations are similar in both samples for the corresponding genes. These partial results are compatible with the idea that "REGulator" encodes a protein that activates the expression of the "Cell Division Kinase" gene and inhibits that of the "APOptosis" gene. With the help of algorithms such as REVEAL [2], the results obtained with the whole bio-chip may make it possible to infer a portion of a genetic network such as the one shown on top, with REG' and REG'' encoding intermediate regulators (----->, activation; -----|, inhibition). In this model, activation of REG by a mitogenic agent yields a hyperactivation of CDK, directly and via REG'. This activation results in cell division and tumor proliferation (bottom). In the absence of a mitogenic agent, division and apoptosis remain balanced, and the cell survives without dividing (top). This equilibrium can be studied through simulation, by assigning measured or calculated coefficients "c" to the arrows that link genes (top). Similarly, in the presence of a drug that inactivates REG', the simulation may predict in which direction the network will re-equilibrate. In this simple example, it is easy to see that REG' inactivation disfavors division and relieves the inhibition of apoptosis (bottom). The prediction is that the drug will thus kill the tumor cell. Note that the simulation could equally well have been based on a network model provided by a classical approach or by theoretical considerations, rather than inferred from transcriptomic data as here.

[Figure 3 panels: Cells (normal, tumor), Bio-chip, Network, Simulation]
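The legend of Fig. 3 suggests assigning coefficients "c" to the arrows and asking in which direction the network re-equilibrates when a drug inactivates REG'. The sketch below does this for a hypothetical feed-forward reading of the legend: REG activates CDK directly and through REG', and REG' inhibits a basal APO level. REG'' is left out, and all coefficient values are illustrative assumptions, not measurements. Rather than deciding whether apoptosis outweighs division, which depends on the coefficients, it only reports the direction of change of each node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coefficients:
    """Hypothetical weights "c" on the arrows of a Fig. 3-like network."""
    reg_to_cdk: float = 1.0    # REG --> CDK (direct activation)
    reg_to_regp: float = 1.0   # REG --> REG'
    regp_to_cdk: float = 1.0   # REG' --> CDK
    regp_to_apo: float = 1.0   # REG' --| APO (inhibition)
    apo_basal: float = 1.0     # APO level when not inhibited


def steady_state(reg: float, regp_inhibited: bool, c: Coefficients) -> dict[str, float]:
    """Levels reached by this feed-forward network for a given REG activity.
    With no feedback loops, they follow directly from the inputs."""
    regp = 0.0 if regp_inhibited else c.reg_to_regp * reg  # drug removes REG'
    cdk = c.reg_to_cdk * reg + c.regp_to_cdk * regp
    apo = max(0.0, c.apo_basal - c.regp_to_apo * regp)
    return {"REG": reg, "REG'": regp, "CDK": cdk, "APO": apo}


def trends(before: dict[str, float], after: dict[str, float]) -> dict[str, str]:
    """Qualitative direction of change of each node between two conditions."""
    def sign(x: float, y: float) -> str:
        return "up" if y > x else "down" if y < x else "unchanged"
    return {name: sign(before[name], after[name]) for name in before}


if __name__ == "__main__":
    c = Coefficients()
    untreated = steady_state(reg=1.0, regp_inhibited=False, c=c)  # tumor cell
    treated = steady_state(reg=1.0, regp_inhibited=True, c=c)     # plus drug on REG'
    print(trends(untreated, treated))
```

With these assumptions CDK goes down and APO goes up in the treated tumor cell, which is the qualitative trend described in the legend.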

For instance, how would the genetic network re-equilibrate once its node REG' is inactivated by an available drug (Fig. 3, lower right)? Running a simulation might indicate that, in a tumor cell treated with a drug that inactivates REG' (but not REG''), division would be less favored than in an untreated tumor cell. Furthermore, apoptosis (via APO), no longer inhibited via REG' (or via REG''), would be triggered if REG is abundant, as is the case in tumor cells. This prediction can then be challenged at the bench. The developmental cycle of bacteriophage T7 has in fact been studied in this spirit of alternating simulation and benchwork [3]. The rates of protein synthesis in T7 mutants whose genes had been rearranged along the genome were evaluated by simulation and, for a subset, by benchwork. There was partial agreement between the results obtained by the two approaches. This study indicated that the wild-type gene order is quasi-optimal for T7 growth in a changing environment. Besides its predictive capacities, simulation may have the explanatory power to falsify or validate the coherence of the model it implements. For instance, the growth of cell chains of the cyanobacterium Anabaena (Fig. 4) has been simulated using Lindenmayer systems [4]. The first modelling attempt included only an inhibitor of differentiation into heterocysts. It accounted well for the average spacing of ten undifferentiated cells between two heterocysts, but it was hypersensitive to the actual parameter values. By adding to the model the notion of an activator coupled to the inhibitor through a specific regulatory circuit (Fig. 4), the model became more robust (i.e., it no longer depended critically on the parameter values). Independently, molecular biology methods later showed this model to be essentially right [5, 6]. In essence, simulation served as an investigative tool that demonstrated the explanatory incoherence of the first model.
The expert consequently proposed the minimal organization that would re-establish the explanatory coherence of the model, as verified by running a simulation. The molecular structure that implemented the proposed organization was later discovered at the bench.
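A Lindenmayer system rewrites every cell of a filament in parallel at each step. As a minimal illustration, and not the heterocyst model of Fig. 4 (which adds the activator and the inhibitor), the sketch below implements the classic L-system for cell division in Anabaena catenula as presented by Prusinkiewicz and Lindenmayer: letters a and b denote cell states, and subscripts l and r denote cell polarity. The function names are illustrative.

```python
# Production rules of a D0L-system for Anabaena catenula filaments.
# Letters a/b denote cell states; suffixes l/r denote cell polarity.
RULES: dict[str, list[str]] = {
    "ar": ["al", "br"],
    "al": ["bl", "ar"],
    "br": ["ar"],
    "bl": ["al"],
}


def step(filament: list[str]) -> list[str]:
    """Rewrite every cell of the filament in parallel (one derivation step)."""
    return [child for cell in filament for child in RULES[cell]]


def grow(axiom: list[str], generations: int) -> list[list[str]]:
    """Return the filament at generations 0..n."""
    history = [list(axiom)]
    for _ in range(generations):
        history.append(step(history[-1]))
    return history


if __name__ == "__main__":
    for gen, filament in enumerate(grow(["ar"], 4)):
        print(gen, " ".join(filament), f"({len(filament)} cells)")
```

Starting from a single cell, the filament lengths over the first generations are 1, 2, 3, 5 and 8 cells; adding heterocyst differentiation would require extra rules controlled by the activator and inhibitor levels.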

State of the art

In the genomic context, simulation attempts are very recent and can be subdivided into computational and intracellular approaches. The latter category includes the implantation in bacteria of small artificial genetic networks comprising one [7], two [8], or three [9] episomal genes. In all cases, the reporter of the network state is a fluorescent protein. These attempts have made it possible to test functionally: homeostasis of gene expression by a self-inhibitory gene; a toggle switch made from two mutually inhibitory genes; and an oscillator comprising three genes that inhibit each other in a circular permutation. In the former category, computational simulation, the state of the art is represented by a small number of highly funded, highly publicized projects in the U.S.A. and Japan. Although they all revolve around metabolism, each has its own specific features and personality. Among them, BioDrive [10] can, for instance in the field of embryogenesis, handle signal transduction from an extracellular morphogen to intranuclear transcriptional effects via cytoplasmic protein phosphorylations. E-Cell [11] starts from a set of 127 genes encoding the minimal housekeeping metabolic functions, and implements the notions of energetic cost and protein degradation. V-Cell [12] provides a computational framework that takes cell geometry into account and includes the notion of transmembrane flux, using it, for example, to simulate a calcium wave in a neurone. However, all these computational simulators are deeply rooted in a similar philosophy. Their starting point is a list of a few molecular components and their initial concentrations. Given a set of reactions between these components, at each time step a few differential equations are integrated and the list of concentrations is updated.

Figure 4. Model that explains the relative constancy of the distance between two successive heterocysts in a chain of Anabaena cells. This microorganism forms chains of cells (bright, wider), some of which differentiate into heterocysts (dark, narrower) about every ten cells. The simulation indicates that the action of a pair of genes can account for the relative constancy of the distance separating two successive heterocysts, with no critical dependence on the parameter values. It is both necessary and sufficient to assume that the activator induces the production of the inhibitor and of itself, while the inhibitor represses the production of the activator. Simulation has made it possible to demonstrate the explanatory coherence of the model.

[Figure 4 diagram: Anabaena cell chain; regulatory circuit between Activator and Inhibitor]
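The simulator philosophy described above (a list of components with initial concentrations, rate equations integrated at each time step, concentrations updated) can be sketched in a few lines with forward Euler integration. It is applied here to the two-gene toggle switch mentioned in the state of the art, using a common dimensionless form of the toggle-switch equations; the parameter values (alpha = 10, exponent 2) are illustrative assumptions, and production simulators generally use more accurate, adaptive integrators.

```python
from typing import Callable

State = dict[str, float]            # component name -> concentration
Rates = Callable[[State], State]    # concentrations -> rates of change


def simulate(initial: State, rates: Rates, dt: float = 0.01, steps: int = 5000) -> State:
    """Forward-Euler integration: at each time step, evaluate the rate of
    change of every component and update the list of concentrations."""
    state = dict(initial)
    for _ in range(steps):
        derivative = rates(state)
        state = {name: value + dt * derivative[name] for name, value in state.items()}
    return state


def toggle_switch(alpha: float = 10.0, n: float = 2.0) -> Rates:
    """Two mutually repressing genes u and v (dimensionless form):
    du/dt = alpha / (1 + v^n) - u,  dv/dt = alpha / (1 + u^n) - v."""
    def rates(s: State) -> State:
        u, v = s["u"], s["v"]
        return {"u": alpha / (1.0 + v ** n) - u,
                "v": alpha / (1.0 + u ** n) - v}
    return rates


if __name__ == "__main__":
    switch = toggle_switch()
    # Same equations, two different initial concentrations.
    print(simulate({"u": 5.0, "v": 0.1}, switch))   # settles with u high, v low
    print(simulate({"u": 0.1, "v": 5.0}, switch))   # settles with u low, v high
```

The same equations settle into one of two stable states depending only on the initial concentrations, which is the bistability that the implanted two-gene networks were built to test.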

To illustrate one of the criticisms that can be levelled at most existing simulators, let us consider the simple case of a cellular metabolon, i.e. a complex of several enzymes that each catalyze one of the successive reactions of the same metabolic chain (Fig. 5). Each enzyme of this complex has been purified separately by biochemists over the past decades, and its parameters have been measured in the test tube. These parameters are now fed into the simulators (Fig. 5a). However, even the aqueous cellular compartments are highly organized. As has been demonstrated in a large and growing number of cases, a metabolon funnels the successive products of the reaction through a channel ("solid-state metabolism"; Fig. 5b) [13], which allows reactions fast enough to sustain the observed growth rates. Thus, the local substrate concentration is extremely high, and the diffusion time is negligible. Taking these facts into account would allow a less erroneous simulation relying on fewer parameters. As this example shows, the existing simulators have not asked the basic question of how cells function before actually simulating a cellular process. For this purpose, they use parameters obtained outside the organizing context of the cell. To fit the observations made on live cells, they must necessarily introduce ad hoc parameters. Hence they lose any predictive value and retain little explanatory value.
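The conventional parameterization criticized above (Fig. 5a) can be sketched as follows: each enzyme of the chain A to D is described by a Km and a Vmax measured on the purified enzyme, rates follow the Michaelis-Menten law, and all intermediates are assumed to be well mixed. The parameter values below are hypothetical, diffusion (D) is not modeled, and the channeled alternative of Fig. 5b is not implemented; the point is only to show which in vitro numbers such a simulator consumes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enzyme:
    name: str
    vmax: float  # maximal rate, measured on the purified enzyme
    km: float    # Michaelis constant (affinity), measured on the purified enzyme

    def rate(self, substrate: float) -> float:
        """Michaelis-Menten rate law."""
        return self.vmax * substrate / (self.km + substrate)


def run_chain(a0: float, enzymes: list[Enzyme], dt: float = 0.01,
              steps: int = 2000) -> list[float]:
    """Forward-Euler integration of A -> B -> C -> D in a well-mixed volume.
    conc[i] is the substrate of enzymes[i] and conc[i + 1] its product."""
    conc = [a0] + [0.0] * len(enzymes)
    for _ in range(steps):
        # All fluxes are computed from the concentrations at the start of the step.
        fluxes = [enz.rate(conc[i]) for i, enz in enumerate(enzymes)]
        for i, flux in enumerate(fluxes):
            conc[i] -= dt * flux
            conc[i + 1] += dt * flux
    return conc


if __name__ == "__main__":
    chain = [Enzyme("A>B", vmax=1.0, km=0.5),
             Enzyme("B>C", vmax=0.8, km=1.0),
             Enzyme("C>D", vmax=1.2, km=0.3)]
    a, b, c, d = run_chain(10.0, chain)
    print(f"A={a:.2f} B={b:.2f} C={c:.2f} D={d:.2f} total={a + b + c + d:.2f}")
```

Every flux removed from one intermediate is added to the next, so the total amount of material is conserved; a channeling model would instead work with flux coefficients such as the c values of Fig. 5b.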

Figure 5. Metabolic chain and channeling metabolon. In this short metabolic chain (a), three successive chemical reactions transform molecule A into molecule D. These three reactions are catalyzed by three enzymes. The first enzyme, named "A>B", accelerates the transformation of substrate A into product B. Product B serves as a substrate for enzyme "B>C", which accelerates its transformation into product C, and so on; hence the notion of a “chain.” This chain is part of the organism's overall metabolism; it is a metabolic chain. The parameters of each enzyme have been measured separately following purification (affinity Km, maximal rate Vmax, and diffusion D). Proper execution of this set of chemical reactions could rely either on the chemical specificity of the interaction between an enzyme and its substrate (order based on thermodynamics, a), or on spatial isolation that prevents unwanted interactions (order based on localization, b). It appears that both mechanisms are at play simultaneously, in varying proportions. In the absence of a membrane boundary, how can a metabolic chain be spatially isolated in an aqueous compartment? It suffices that the enzymes of this chain tend to associate, either among themselves (whether or not this depends on the presence of their substrate) or with fibers (a certain class of the cytoskeleton). Numerous cases of multi-enzyme complexes that channel their substrates/products have already been described, including the complex of the glycolytic enzymes, a major, central metabolic pathway. For a channeling metabolon, the most relevant parameters may be the coefficients (c) that relate to the fluxes traversing it.

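[Figure 5 diagram. Panel a: A, B, C, D linked by enzymes A>B, B>C and C>D, each characterized by Km, Vmax and D. Panel b: the same chain as a channeling metabolon, with flux coefficients C1, C2 and C3.]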

Outlook

One way to overcome these problems would be to rely preferentially on data generated from live cells rather than in vitro. Along these lines, the potential of some in vivo approaches has so far been underestimated. This is typically the case for measurements of metabolic fluxes by isotopic labeling and NMR; for the interactome, i.e. the map of pairwise protein interactions obtained by two- or three-hybrid methods; and perhaps also for morphodynamic observations of macromolecules under the microscope. Each of these approaches has severe inherent limitations that should not be ignored. However, they provide information obtained on live material that pertains to molecular organization and sometimes to its dynamics. One of the major challenges ahead of us will be to make the best of the massive amount of molecular data generated by the tools of Genomics. It is clear that the abundance of such
data has increased tremendously in the recent past. Yet on closer inspection, with the occasional exception of genome sequencing, the completeness of such data is a “vendor's argument,” and their quality is generally poor, although slow improvements can be foreseen. To take just one simple example, the resolution of transcriptomic data is so low that one cannot tell apart values such as 1.0 and 1.9, or 1.9 and 3.8. In contrast, one can tell 1.0 from 3.8. Thus, the curve in Fig. 6 should honestly be described as “This concentration increases over two hours,” which is a qualitative statement. For a long time to come, the lack of completeness and quality of the data will be a serious obstacle on the way to the quantitative predictions called for in the introduction. Besides, it would be wasteful to convert the few available quantitative data into qualitative results for the sake of format homogeneity. It will therefore be crucial to succeed in combining quantitative and qualitative results in the same simulation.

Figure 6. Low resolution of the numerical data generated by biochips. As a general rule, the resolution of these data is close to a factor of 2, i.e. it cannot be claimed here that the first value differs from the second, or the second from the third. However, the first value surely differs from the third. This is just one example of the general problem that high-throughput biology produces non-exhaustive, non-quantitative data.

[Figure 6 plot: mRNA concentration (1.0, 1.9, 3.8) versus time (0, 1, 2)]

In terms of prediction, the difficulty is to achieve sufficient accuracy for the prediction to be useful or testable. The number of variables in a biological system of interest is such that even small errors in their values may prevent the generation of a useful prediction. If, however, the purely quantitative approach turns out to be disappointing, other approaches (e.g. qualitative physics, Lindenmayer systems, logical analysis of feedback circuits [14]), so far too neglected, may sometimes allow useful predictions to be made from qualitative or semi-quantitative results. Importantly, it often suffices to know the direction in which the final parameter evolves in order to determine the outcome. Coming back to the example of the drug effect on the network (Fig. 3, lower right), if apoptosis outweighs division in the tumor cell, the cancer will regress, which is the single most important fact. Moreover, this qualitative approach is common among biologists, as evidenced by the widespread use of “models,” little symbolic drawings, at the end of numerous primary publications. In other words, qualitative reasoning is fundamental to the elaboration of knowledge in biology.
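The factor-of-2 resolution discussed above can be turned into an explicit rule for deciding which measured differences are meaningful, and hence what qualitative statement a curve supports. The sketch assumes that "a resolution close to a factor of 2" means two positive values are distinguishable only when their ratio is strictly greater than 2; the function names and the endpoint-based trend rule are illustrative choices.

```python
import math


def distinguishable(x: float, y: float, fold: float = 2.0) -> bool:
    """Two positive measurements are treated as different only if their
    ratio exceeds the assumed fold resolution of the chip."""
    return abs(math.log2(x / y)) > math.log2(fold)


def qualitative_trend(series: list[float], fold: float = 2.0) -> str:
    """Reduce a time series to a qualitative statement about its endpoints."""
    first, last = series[0], series[-1]
    if not distinguishable(first, last, fold):
        return "no resolvable change"
    return "increases" if last > first else "decreases"


if __name__ == "__main__":
    values = [1.0, 1.9, 3.8]  # the three values of Fig. 6
    for a, b in [(1.0, 1.9), (1.9, 3.8), (1.0, 3.8)]:
        print(a, b, distinguishable(a, b))
    print(qualitative_trend(values))
```

Applied to the three values of Fig. 6, only the pair 1.0 and 3.8 is resolved, so the curve reduces to the qualitative statement that the concentration increases, exactly as argued in the text.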

A task for the near future could be to build a common set of models. Ideally, this set should span the usual model types used throughout biology and the various levels of organization. Given the molecular roots of Genomics, it would be desirable, although this seems counter-intuitive at first sight, for the model at any organizational level to consider molecules directly (e.g. hormone/receptor at the organismal level, or membrane protein coat at the organellar level). Such a common set of models would ease the comparison of the various simulators created world-wide. In the same community spirit, and to cut down on costly benchwork, it would probably be a good idea to organize and improve the synergy between conventional and computational experimentation. The Anabaena case (Fig. 4) argues for such a synergy. The existing simulators distinguish between sub