h gn f mbtr lprhansen.bvs.ilsl.br/textoc/revistas/intjlepr/1994/pdf/v62...btn r nht nt th trtr, fntn...

4
Volume 62, Number I Printed in the U.S.A. INTERNATIONAL JOURNAL OF LEPROSY The Genome of Mycobacterium leprae* The behavior of all life forms is deter- mined by a series of complex messages found in coded form in their genes, and the term "genome" is commonly used to refer to the complete set of generic material found with- in an organism. In bacteria, the genome gen- erally consists of one, sometimes two chro- mosomes and these are usually circular, although linear chromosomes have been de- scribed in a few special cases. Extra-chro- mosomal elements, such as episomes, plas- mids or phages, often provide an additional source of genetic information, and are valid constituents of the genome. The discovery of the universal genetic code in the 1960s, and the development of DNA cloning and sequencing techniques in the 1970s have "revolutionized" biological research by al- lowing genes to be isolated and their func- tions to be deduced from the coded message they contain. The genome of Mycobacterium leprae can be considered as a book containing all the information necessary for the growth, sur- vival, reproduction and especially the pathogenicity of the microbe. Thanks to the power of current molecular biological tech- niques, it has been possible to divide that book into chapters and to start to decipher the text contained therein through the iden- tification and characterization of the vari- ous genes. In many cases, one can predict the function of the gene products, the pro- teins, as homologous counterparts with similar sequences described and studied in other organisms. These predictions can sub- sequently be tested by means of further ge- netic, biochemical and immunological'tech- niques and, in the case of Al. leprae, this may be facilitated by the use of surrogate genetics in which the genes are introduced into suitable heterologous hosts such as M. smegmatis or M. bovis BCG. An integrated strategy was devised to tackle the problem of mapping and sequenc- ing the genome of Al. leprae, and this in- volved analysis of chromosomal DNA by * This State-of-the-Art Lecture in Microbiology was presented on 30 August 1993 at the XIV International Leprosy Congress in Orlando, Florida, U.S.A. pulsed field gel electrophoresis (PFGE) of large restriction fragments and the construc- tion of ordered libraries in cosmid vectors. Owing to the difficulties of cultivating Al. leprae, the former approach was unsuccess- ful and, in consequence, the cosmid ap- proach was employed. Cosmids are small vectors, with a large capacity for foreign DNA (up to 45 kilobases), which facilitate the introduction of recombinant DNA mol- ecules into Escherichia coli as they exploit the packaging and infection systems of phage lambda. Conventional cosmids can be used only with E. coli but are ideal substrates for DNA sequencing projects since they are large but not unmanageable. Shuttle cosmids can be used with both E. coli and mycobacteria and, thus, are valuable tools for studying biological functions since they enable Al. leprae genes to be faithfully expressed in cultivable mycobacterial hosts. There are at least five good reasons to justify the use of ordered libraries: 1) M. leprae is an obligately intracellular pathogen that cannot be cultivated in vitro so cloning its DNA, in the form of large pieces, gen- erates a readily renewable source. 2) Or- dered libraries are valuable tools that con- siderably facilitate genetic research and, on many occasions, it has been found that genes for related functions are linked. 3) Com- parison of the maps obtained from the or- dered clones of the chromosomes of related organisms, e.g., Al. tuberculosis and Al. lep- rae, could lead to the identification of loci that have been modified or truncated which may be associated with pathogenesis or slow growth. 4) Ordered shuttle libraries will al- low us to do surrogate genetics and to iden- tify the genes for virulence factors, protec- tive antigens, etc. 5) They also represent an excellent starting point for a genome se- quencing project. A fingerprinting technique has been used to order the cosmids, and this involved gen- erating a set of radio-labeled restriction fragments, characteristic of each clone, that were resolved by gel electrophoresis to gen- erate typical patterns or "fingerprints." These were subjected to computer-assisted matching techniques to detect regions com- mon to several clones and then assembled 122

Upload: letuyen

Post on 23-Apr-2018

217 views

Category:

Documents


2 download

TRANSCRIPT

Volume 62, Number IPrinted in the U.S.A.

INTERNATIONAL JOURNAL OF LEPROSY

The Genome of Mycobacterium leprae*

The behavior of all life forms is deter-mined by a series of complex messages foundin coded form in their genes, and the term"genome" is commonly used to refer to thecomplete set of generic material found with-in an organism. In bacteria, the genome gen-erally consists of one, sometimes two chro-mosomes and these are usually circular,although linear chromosomes have been de-scribed in a few special cases. Extra-chro-mosomal elements, such as episomes, plas-mids or phages, often provide an additionalsource of genetic information, and are validconstituents of the genome. The discoveryof the universal genetic code in the 1960s,and the development of DNA cloning andsequencing techniques in the 1970s have"revolutionized" biological research by al-lowing genes to be isolated and their func-tions to be deduced from the coded messagethey contain.

The genome of Mycobacterium leprae canbe considered as a book containing all theinformation necessary for the growth, sur-vival, reproduction and especially thepathogenicity of the microbe. Thanks to thepower of current molecular biological tech-niques, it has been possible to divide thatbook into chapters and to start to decipherthe text contained therein through the iden-tification and characterization of the vari-ous genes. In many cases, one can predictthe function of the gene products, the pro-teins, as homologous counterparts withsimilar sequences described and studied inother organisms. These predictions can sub-sequently be tested by means of further ge-netic, biochemical and immunological'tech-niques and, in the case of Al. leprae, thismay be facilitated by the use of surrogategenetics in which the genes are introducedinto suitable heterologous hosts such as M.smegmatis or M. bovis BCG.

An integrated strategy was devised totackle the problem of mapping and sequenc-ing the genome of Al. leprae, and this in-volved analysis of chromosomal DNA by

* This State-of-the-Art Lecture in Microbiology waspresented on 30 August 1993 at the XIV InternationalLeprosy Congress in Orlando, Florida, U.S.A.

pulsed field gel electrophoresis (PFGE) oflarge restriction fragments and the construc-tion of ordered libraries in cosmid vectors.Owing to the difficulties of cultivating Al.leprae, the former approach was unsuccess-ful and, in consequence, the cosmid ap-proach was employed. Cosmids are smallvectors, with a large capacity for foreignDNA (up to 45 kilobases), which facilitatethe introduction of recombinant DNA mol-ecules into Escherichia coli as they exploitthe packaging and infection systems of phagelambda. Conventional cosmids can be usedonly with E. coli but are ideal substrates forDNA sequencing projects since they are largebut not unmanageable. Shuttle cosmids canbe used with both E. coli and mycobacteriaand, thus, are valuable tools for studyingbiological functions since they enable Al.leprae genes to be faithfully expressed incultivable mycobacterial hosts.

There are at least five good reasons tojustify the use of ordered libraries: 1) M.leprae is an obligately intracellular pathogenthat cannot be cultivated in vitro so cloningits DNA, in the form of large pieces, gen-erates a readily renewable source. 2) Or-dered libraries are valuable tools that con-siderably facilitate genetic research and, onmany occasions, it has been found that genesfor related functions are linked. 3) Com-parison of the maps obtained from the or-dered clones of the chromosomes of relatedorganisms, e.g., Al. tuberculosis and Al. lep-rae, could lead to the identification of locithat have been modified or truncated whichmay be associated with pathogenesis or slowgrowth. 4) Ordered shuttle libraries will al-low us to do surrogate genetics and to iden-tify the genes for virulence factors, protec-tive antigens, etc. 5) They also represent anexcellent starting point for a genome se-quencing project.

A fingerprinting technique has been usedto order the cosmids, and this involved gen-erating a set of radio-labeled restrictionfragments, characteristic of each clone, thatwere resolved by gel electrophoresis to gen-erate typical patterns or "fingerprints."These were subjected to computer-assistedmatching techniques to detect regions com-mon to several clones and then assembled

122

62, 1^ Editorial^ 123

into "contigs" or blocks of contiguous DNAsequences. At present the M. leprae chro-mosome can be represented as four contigsand by summing their sizes one can estimatethe chromosome to have a size of about 2.8megabases. If M. leprae were a typical bac-teria it would have a circular chromosome.Since we have four contigs, this suggests thatwe have four gaps in the collection, prob-ably as a result of unclonable sequences.Nevertheless, this is a remarkable achieve-ment since very few ordered libraries of bac-terial chromosomes exist and those that doexist were obtained from organisms that areeasy to cultivate.

All of the cloned AI. leprae genes havebeen positioned on the contig map, by meansof hybridization, together with a set of genesof a well-conserved sequence from otherbacteria. There are over 80 loci on the pres-ent gene map, including the Al. leprae -spe-cific repetitive sequence RLEP.

To obtain more insight into the structure,function and organization of the M. lepraechromosome, a genome sequencing projectwas initiated with the aim of systematicallysequencing large stretches of chromosomalDNA. If we return to the earlier analog inwhich the chromosome was likened to abook, then each cosmid that is sequencedcan be considered as a chapter and the in-formation that it contains will help us tounravel the mysteries of the life of M. leprae.The message obtained from the first chap-ter, deduced from the sequence of cosmidB1790, can be summarized. Two criteriawere used to identify the genes. Firstly, theiropen reading frames, defined by the dis-tance between two stop codons, should beat least 80 codons long. Secondly, the co-dons employed should conform to the pat-tern found in known Al. leprae genes. Inthis way 12 candidate genes were found, andthe functions of 11 of them were establishedby comparison of their sequences with thoseof all of the genes and proteins present inthe biological databases.

Four of these genes encode proteins thatare, or could be, potential drug targets: rpoBencodes the subunit of RNA polymerasewhich is the target for rifampin, the back-bone of the multidrug therapy currently usedto control leprosy. The sequence informa-tion obtained has been very useful for de-veloping a simple PCR-based diagnostic test

for monitoring rifampin resistance (this willbe described elsewhere during the Congressby Nadine Honore). The rpsL gene encodesribosomal protein S12, which is known tobe the target for streptomycin in other bac-teria, including M. tuberculosis, while efgand tuf code for two elongation factors, Gand Tu, required for protein synthesis. Fu-sidic-acid resistance in bacteria is associatedwith mutations in efg and tetracycline re-sistance can result from mutations in atf.Since AI. leprae is susceptible to both fusidicacid and minocycline, knowledge of the se-quence of the target genes will enable di-agnostic tests for resistance to be developed,should these drugs find widespread clinicalapplication in leprosy control programs.

One of the surprising features whichemerged on reading the first chapter was thescarcity of the genes because only 40% ofthe potential coding capacity was used. Infast-growing bacteria, such as E. coli, morethan 85% of the sequence is coding and thereis a clear succession of genes. This does notappear to be the case in Al. leprae and, toconfirm this, it was decided to sequencemany more cosmids. This was done in con-junction with Doug Smith at CollaborativeResearch in Boston, Massachusetts, U.S.A.,who was sponsored by the NIH Human Ge-nome program to test the feasibility of a newDNA sequencing technique. A further ninecosmids were sequenced completely and an-alyzed for genes and their functions pre-dicted. Again, it was clear that genes wererelatively scarce and that noncoding regionswere extensive. Based on these findings onecan predict that Al. leprae contains onlyabout 1000 protein-coding genes comparedto the estimated 4000 present in E. coli. Itis conceivable that this gene sparsity, andthe consequent reduction in metabolic po-tential, account for the very slow growth ofAl. leprae.

However, as with all rules, there is a strik-ing exception because one of the cosmidswas found to carry a very large amount ofcoding information. On comparison of thesequences, it was apparent that the proteinsencoded were involved in the synthesis ofpolyketides. These are interesting organiccompounds produced by a variety of bac-teria and fungi, and they include many pig-ments such as actinorhodin, antibiotics suchas erythromycins, and toxins, like members

I 24^ International Journal of Leprosy^ 1994

of the aflatoxin family. In mycobacteria,examples of polyketides could include cell-wall components, such as phenolic glycolip-id or mycoserosic acid. Since these com-pounds are specific for mycobacteria, theyrepresent attractive targets for chemother-apy and immunotherapy.

Mechanistically, polyketide biosynthesisis very similar to that of fatty acids as nas-cent chains are extended by the addition oftwo-carbon units in a step-wise manner cat-alyzed by a series of enzymatic reactions. Inthe first step, acetyl-CoA is bound to fatty-acid synthase (FAS), a large multifunctionalenzyme, via a thiol group in the acyl carrierprotein (ACP) domain, then transferred toa second thiol group by an acyl transferaseactivity (AT). Acyl transferase is again in-volved in the binding of malonyl-CoA tothe freed thiol group, and this is condensedwith the bound acetyl-CoA in a step cata-lyzed by ketoacyl synthase (KS), and sub-sequently subjected to a series of reactionsinvolving ketoreductase (KR), dehydratase(DH), and enoylreductase (ER). In the ter-minal step, the product is released from fat-ty-acid synthase by a thioesterase, and theenzyme is free to catalyze further fatty-acidsynthesis. Likewise, the reaction product canbe used as a substrate by other FASs.

On inspection of the genes present on cos-mid L518, it was clear that several of themencode huge proteins bearing strong resem-blances to fatty-acid synthases. Three ofthem have ACP and AT motifs, 4 of themcarry KS domains, 2 have KR motifs, and1 an ER domain. The close proximity of thegenes and the remarkable functional simi-larity of their products suggest that theyprobably interact and effect consecutivesteps in the synthesis of a novel polyketidein M. leprae. The gene arrangement is typ-ical of a polycistronic operon, and theputative transcript would be over 30,000nucleotides long. At its distal end, this tran-script carries the information required toproduce two membrane-bound proteins anda third polypeptide capable of binding ATP.Since these proteins arc all highly homol-ogous to proteins involved in the resistanceto, and the secretion of, a polyketide anti-biotic in Streptomyces, it seems highly likelythat they will be involved in the export ofthe putative M. leprae polyketide. Clearly,extensive biochemical studies are warrant-ed.

To date, ten of the chapters in the leprosybook have been read and a further ten arebeing "browsed" and will be finished beforethe end of 1993. This means that we willsoon have at our disposal over a third ofthe information required by Al. leprae toinfect, survive, and multiply in man. Todate, 140 genes have been sequenced andputative functions attributed to 94 of themon the basis of their homologies with knowngene products. Already we have learned agreat deal about the biology of M. lepraefrom its genes, and we have been able toidentify a variety of housekeeping func-tions, as well as pathways required for thebiosynthesis of heme, purines, amino acidsand peptidoglycan.

At this stage it should be apparent thatthe genome project is generating enormousamounts of information about the genetics,biochemistry and immunology ofM. leprae.In order to make this readily accessible tothe research community, we have devel-oped a customized database, MycDB, usingthe object-orientated software ACeDB toensure that efficient storage, annotation anddissemination of the data can occur. MycDBruns on a variety of computers and work-stations that use the UNIX operating sys-tem and a WINDOWS system. Informationis accessed by selecting (clicking) with themouse button on different objects. Any onepiece of information can lead to further rel-evant information since they are all inter-connected.

On starting the program the first windowdisplays the various "classes" stored in thedatabase and any one of these, for instancethe chromosome class, can be selected byclicking. Having selected the mycobacterialchromosome of interest, one can then selecta particular gene, for instance the groELgene of M. leprae, and access the informa-tion linked to it. As more windows areopened different classes of interrelated in-formation can be obtained. Thus, one cango from the gene to its product, the 65-kDaprotein antigen, and learn what monoclonalantibodies are available that recognize it andfrom whom they can be obtained. Likewiseone can interrogate MycDB with respect tothe location of the groEL gene and, subse-quently, obtain its sequence and other rel-evant information. A literature service isalso available which contains the abstractsof selected papers and in most cases pro-

62, 1^ Editorial^ 125

vides information about the author, his ad-dress, and telephone and fax numbers.MycDB has been developed with the sup-port of the World Health Organization andthe Association Francaise Raoul Follereau,and is supplied free of charge on request orcan be downloaded via the INTERNETcomputer network using anonymous FTP.

To conclude, I hope that I have convincedyou that the M. leprae genome project isadvancing at great pace, that it is uncoveringmany new leads for research in chemother-apy and rational drug design, that it is open-

ing up new avenues for biochemical andimmunological investigation and, above all,is making a significant contribution to thefight against leprosy.

Many thanks to my coworkers and to youfor your attention.

—Stewart T. Cole, Ph.D.Unite de Genetique Moleculaire BacterienneInstitut Pasteur28 Rue du Dr. Roux75724 Paris 15, France