Conference Report

The ESF programme on Integrated Approaches for Functional Genomics workshop on ‘Proteomics: Focus on protein interactions’
11–13 May 2001, Villa Mondragone, Frascati, Rome, Italy

Joan Marsh, Publishing Editor
John Wiley & Sons, Ltd., Baffins Lane, Chichester, West Sussex PO19 1UD, UK

This inaugural meeting of the European Science Foundation’s programme, ‘Integrated Approaches for Functional Genomics’, was opened by Gianni Cesareni (University of Rome Tor Vergata), one of the organisers. He compared proteomics to a jigsaw, but more difficult because it involves four dimensions and some of the pieces are still missing. Ideally, we would like to know the affinity of each protein for each of its binding partners. This will require much more work and there are several complementary approaches.

(i) Most current knowledge of protein-protein interactions resides in the published literature but it is not all readily accessible. Various groups are working on ways to extract this information and compile databases of protein-protein interactions (see Brannetti et al., p. 314 and Blaschke et al., p. 310).

(ii) Protein complexes can be isolated from cell extracts and the components identified.

(iii) There are experimental means to detect possible interactions among proteins, e.g. yeast two-hybrid screening, but these do not necessarily reflect what happens inside the cell.

(iv) The specificity of individual protein-protein interactions can be characterised.

(v) Protein interactions may be predicted computationally, based on structural information.

There are two views of the type of protein interactions that occur within a cell, which have different implications for cluster modelling. One sees a fixed complement of given complexes, with most proteins existing as multimers: this will give discrete unique clusters. The other sees many monomers in constant exchange among different multimeric complexes, which produces extended clusters.
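One way to picture the difference between the two views is to treat interactions as an undirected graph and read clusters off as connected components. The following is only an illustrative sketch: the protein names A to F and the interaction lists are invented, and connected components are one simple stand-in for whatever cluster model a given group actually uses.

```python
from collections import defaultdict

def clusters(interactions):
    """Return the connected components of an undirected interaction graph."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(sorted(component))
    return components

# View 1 (hypothetical data): fixed complexes give discrete clusters.
fixed = [("A", "B"), ("B", "C"), ("D", "E"), ("E", "F")]
# View 2 (hypothetical data): monomer C also exchanges into the second
# complex, which merges everything into one extended cluster.
exchanging = fixed + [("C", "D")]

discrete_clusters = clusters(fixed)
extended_clusters = clusters(exchanging)
```

A single exchanging monomer is enough to fuse two otherwise separate clusters, which is why the two views lead to such different models.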

Genetic methods

The first formal presentation was given by Pierre Legrain (Hybrigenics, Paris), who discussed ways of comparing datasets from large-scale experiments searching for protein interactions (see Legrain, p. 310). Three important issues in this area are the incompleteness of these datasets, the abundance of false positive results, and the large size of the datasets, which result in the need for specific bioinformatic tools. He explained that the primary users of protein interaction data are biologists, who need exploratory tools, such as PIMRider™, developed by Hybrigenics. Other users of the data are bioinformaticians, who need the data to be deposited in a convenient, extractable format, allowing further analysis, such as clustering of interactions.
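As a rough illustration of what comparing two interaction datasets can involve, the sketch below counts shared and dataset-specific pairs and computes a Jaccard overlap. It is not the method presented at the workshop; the protein identifiers and both screens are made up, and an interaction A-B is assumed to be the same as B-A.

```python
def normalise(pairs):
    """Treat A-B and B-A as the same interaction."""
    return {frozenset(pair) for pair in pairs}

def compare(dataset_a, dataset_b):
    """Summarise the overlap between two sets of binary interactions."""
    a, b = normalise(dataset_a), normalise(dataset_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }

# Hypothetical results from two independent screens.
screen_1 = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
screen_2 = [("P2", "P1"), ("P3", "P6"), ("P4", "P5")]
summary = compare(screen_1, screen_2)
```

A low overlap does not by itself say which dataset is wrong: it may reflect incompleteness, false positives, or both.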

He also discussed the issue of regulated access: Hybrigenics provides PIMRider™ to academic laboratories for free, but because they are a commercial company they have to pay for, or may even be denied access to, many sources of data.

Dr Legrain’s group uses the yeast two-hybrid system to detect protein interactions, whereas Brian Kay (University of Wisconsin) uses phage display of libraries of combinatorial peptides (see Kay, p. 304). He uses this approach to identify peptides that bind to a chosen target protein and then uses the consensus among the selected peptides to predict which protein might bind to the chosen protein in the cell.
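The consensus step can be sketched as follows, assuming the selected peptides are already aligned and of equal length. The peptides, the protein sequences, the 75% conservation threshold and the function names are all invented for illustration; real analyses typically need proper alignment and scoring.

```python
import re
from collections import Counter

def consensus(peptides, min_fraction=0.75):
    """Per-column consensus of pre-aligned peptides; 'x' marks variable positions."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be pre-aligned to equal length")
    motif = []
    for column in zip(*peptides):
        residue, count = Counter(column).most_common(1)[0]
        motif.append(residue if count / len(peptides) >= min_fraction else "x")
    return "".join(motif)

def find_candidates(motif, proteins):
    """Return names of proteins whose sequence contains the motif."""
    pattern = re.compile(motif.replace("x", "."))
    return sorted(name for name, seq in proteins.items() if pattern.search(seq))

# Hypothetical peptides selected against one target.
selected = ["RPLPPLP", "RALPSLP", "RPLPALP", "RSLPELP"]
motif = consensus(selected)

# Hypothetical protein sequences to scan for the motif.
proteins = {
    "protA": "MKTRELPQLPGS",
    "protB": "MSSGGAVKDE",
    "protC": "AARPLPPLPKK",
}
candidates = find_candidates(motif, proteins)
```

A hit only nominates a candidate partner; whether the motif is exposed and bound in the folded protein has to be tested separately.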

As pointed out by Mike Taussig, phage display of peptides has the disadvantage of detecting only those interactions that involve short peptides and are independent of three-dimensional structure. Many of the interacting peptides are found to be on loops in protein structures, close to proline-rich regions that disrupt higher-order structure.

Comparative and Functional Genomics
Comp Funct Genom 2001; 2: 319–326. DOI: 10.1002/cfg.109
Copyright © 2001 John Wiley & Sons, Ltd.

The final genetic approach to detecting protein interactions was described by Andreas Pluckthun (University of Zurich), who exploits a Darwinian method of repeated diversification and selection for evolution of antibodies. This can be carried out in vitro for single selections using ribosome display. In this approach, the mRNA is completely translated but the nascent protein remains trapped in the ribosome complex together with the mRNA; the protein emerges through the ribosomal tunnel, whereupon the relevant domain can fold and bind to a target. The complex is stabilised by addition of Mg++. Antibodies are the principal examples to date of proteins selected by ribosome display. For those that bind a given target, the mRNA is isolated and converted back into DNA by reverse transcription PCR under conditions that allow errors to be introduced (use of non-proof-reading Taq polymerase); selection is then repeated, looking for proteins with improved binding affinity. This PCR-based system, which avoids bacterial transformation, can generate huge libraries very quickly; each cycle takes one day and libraries of 10¹¹–10¹² proteins are produced. Greater numbers are possible, but Dr Pluckthun said that it is preferable to perform further rounds of variation and selection than to increase the size of the library by 100-fold. The system is compatible with stringent selection conditions; many people consider ribosomes to be fragile components but the complexes are reasonably robust. This method has been used in conjunction with the synthetic Human Combinatorial Antibody Library (HuCAL) [1] to select antibodies against insulin and a DNA quadruplex structure. The combination of a synthetic library and secondary affinity maturation mimics the versatility of the immune response in vitro.

Selection conditions in ribosome display can be manipulated to favour certain properties amongst the newly evolved proteins. For example, for improved affinity, long off-rate selections (up to 250 hours) are used with biotinylated ligands in competition with an excess of unlabelled antigen; since the on-rate of most antibodies falls within a narrow range, off-rate selection alone produces antibodies with increased affinity.
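The reasoning behind off-rate selection follows from the standard relation K_D = k_off / k_on: if k_on is roughly constant, a slower off-rate means a proportionally tighter binder. The sketch below uses assumed rate constants (they are not from the talk) and assumes simple first-order dissociation with no rebinding, which is the situation the excess of unlabelled antigen is meant to approximate.

```python
import math

def dissociation_constant(k_on, k_off):
    """K_D = k_off / k_on, in M, for k_on in M^-1 s^-1 and k_off in s^-1."""
    return k_off / k_on

def fraction_still_bound(k_off, hours):
    """Fraction of complexes surviving first-order dissociation without rebinding."""
    return math.exp(-k_off * hours * 3600)

K_ON = 1e5  # assumed on-rate (M^-1 s^-1), taken to be the same for both binders
binders = {"parent": 1e-3, "matured": 1e-6}  # assumed off-rates (s^-1)

kd = {name: dissociation_constant(K_ON, k_off) for name, k_off in binders.items()}
survivors = {
    name: fraction_still_bound(k_off, hours=250) for name, k_off in binders.items()
}
```

With these illustrative numbers, essentially none of the parent complexes survive a 250 hour selection while a substantial fraction of the slow-dissociating binder remains bound, so the surviving pool is enriched for high affinity.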

Parallel screening of several antibodies and antigens simultaneously (‘library versus library’) can be achieved by moving the process inside the cell, and using growth selection to screen for interactions in a ‘Protein Fragment Complementation Assay’ (PCA). The idea is to cut an essential enzyme in half and allow the two parts to interact in the cell; reconstitution of the enzyme, in the correct geometry for activity, is mediated by a secondary interaction between an antigen and antibody. Thus, one half of the DNA encoding the enzyme, in this case dihydrofolate reductase (DHFR), is fused to the DNA for a single chain antibody, while the other half is fused to the DNA for a potential ligand. Binding of the antibody to ligand will reunite the halves, restoring enzyme activity and allowing the cells to grow on minimal medium. This system was shown to work in Escherichia coli and to possess the requisite specificity: when a mixture of antigen and antibody DNAs was used, such that 21 possible double transformants could be made, the only cells that grew were the three expressing cognate antigen-antibody pairs. This assay has the advantage of being fast, with simple handling; no expression or purification of antibody and antigen is necessary and only the DNA is needed. It also has a very high signal:noise ratio, with a 10⁷-fold difference between cognate and noncognate pairs. Its limitations are that both the antigen and the antibody must be stable in the cytoplasm (Dr Pluckthun’s group has constructed such an antibody library by removing disulphide bonds) and that the length of the linker needs to be optimised, which has also been done. The PCA suffers from the same restriction in library size as phage display, since it involves bacterial transformation, and the proteins must fold correctly in E. coli, which does not always happen. However, it will work with individual domains and is particularly sensitive, because only very small amounts of DHFR are required to restore growth. This assay does not select for high affinity binding, for which ribosome display is better, but does allow for the simultaneous isolation of several antigen-antibody interacting pairs.
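The logic of the ‘library versus library’ specificity test can be sketched as an enumeration of all double transformants with growth allowed only for cognate pairs. The report gives only the totals (21 combinations, three survivors); the split into three antibodies and seven antigens below is an assumption chosen to reproduce those totals, and all names are hypothetical.

```python
from itertools import product

# Assumed library composition: 3 x 7 = 21 possible double transformants.
antibodies = ["Ab1", "Ab2", "Ab3"]
antigens = ["Ag1", "Ag2", "Ag3", "Ag4", "Ag5", "Ag6", "Ag7"]
cognate = {("Ab1", "Ag1"), ("Ab2", "Ag2"), ("Ab3", "Ag3")}

def grows_on_minimal_medium(antibody, antigen):
    """The DHFR halves are reunited, and growth restored, only by a cognate pair."""
    return (antibody, antigen) in cognate

transformants = list(product(antibodies, antigens))
survivors = [pair for pair in transformants if grows_on_minimal_medium(*pair)]
```

In the real assay the growth readout replaces the lookup in `cognate`, so the interacting pairs are recovered by sequencing the survivors.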

Peptide and protein chips

This session began with Jens Schneider-Mergener (Jerini AG, Berlin) describing the SPOT technology for highly parallel synthesis of peptides, in array format, on flat surfaces (see Schneider-Mergener, p. 307). These arrays can then be used for mapping protein-protein interactions, amongst other uses.

One application of this technique relies on selection, in this case using chemical mutagenesis to drive the evolution of new sequences.

The danger, pointed out by Gianni Cesareni, is that by testing all possible binding interactions amongst domains, one will find a ligand for any given target, but if the natural ligand involves more than one binding motif, one may never find a true high affinity ligand. Brian Kay suggested synthesising a complete yeast or bacterial genome for use as a protein set: this has been considered but not yet attempted.

Dolores Cahill (Max-Planck-Institute of Molecular Genetics) has exploited the robotics developed by Hans Lehrach’s group, originally for generation of high density DNA arrays, and has adapted these to high-throughput protein analysis. She described the strategy used to generate high-density arrays of thousands of proteins from a non-redundant cDNA library, where each protein is represented only once. This involves combining three well-known technologies, i.e. cDNA library construction, recombinant protein expression and high density colony arraying. The proteins, which are His-tagged, are generally expressed from E. coli, but Saccharomyces, Pichia pastoris and in vitro expression are now also being explored. Robots pick bacterial colonies, array them into microplates and rearray the library according to selected coordinates onto PVDF membranes, where expression is induced.

Dr Cahill described work on a cDNA library prepared from human fetal brain mRNA (chosen because of the high number of genes expressed in this tissue compared with other tissues). After screening for in-frame protein expression using antibody to the His tag, and characterising clones using the hybridisation technique of oligonucleotide fingerprinting, a non-redundant UNIgene/UNIprotein set of about 15,000 clones was generated. Dr Cahill found an estimated 13,000 singly represented cDNA sequences, of which 60% represent full-length clones. Up to 4800 proteins could be arrayed on a standard microscope slide. Some of the drawbacks that Dr Cahill pointed out are that E. coli expression misses membrane proteins and proteins which are toxic to the bacteria; in addition, there are no glycosylations or other post-translational modifications, and the proteins are denatured due to the conditions used to lyse the E. coli.

Applications of protein microarrays that Dr Cahill described include screening with antibodies, to identify the antigen ligand, and screening of patient sera. In one example, the protein array was screened with sera from autoimmune patients and positives were picked out, though the signals were sometimes weak and better results were obtained on the redundant set. When a positive is found, the protein can easily be identified, by sequencing the cDNA insert from the bacterial clone. The protein array is also being used for parallel selection of antibodies from phage display libraries, from which antibody arrays are being developed on glass or PVDF membranes. The aim for antibody arrays is to produce profiles of the proteins expressed in a given tissue. However, with an estimated 30,000–40,000 genes in the human genome, Dr Cahill believes that alternative splicing will produce about 10 times as many proteins: what fraction of these will be expressed in any given cell type, or detectable by the array, is not yet known.

Ian Humphery-Smith (University of Utrecht) is trying to produce antibody arrays that can be used to monitor the protein complement of all human tissues in an integrated approach to health and disease. The objective is to make a reporter matrix capable of following the protein output of every human gene, which will complement DNA array data on gene expression. The three reasons for going to an array-based approach to proteomics are parallelisation, miniaturisation and automation. Antibody arrays will allow screening of the entire human proteome, but the antibodies will first have to be produced and then screened against protein arrays. The traditional approach for assessing antibody cross-reactivity involves hybridisation against a panel of over 200 tissue sections, but most cell proteins are in low abundance, while most of the abundant proteins are common to all cells. To establish their specificity, antibodies must therefore be screened extensively on protein (antigen) arrays. The dilemma is to make antibody arrays relevant in time and cost of production. To try and obtain antibodies against every human protein, Dr Humphery-Smith is immunising mice in parallel with conserved exons, tagged with an immunogenic enhancer, which directs the antigen to antigen-presenting cells. In contrast to Andreas Pluckthun, he believes that in vitro technology is not sufficient to replicate the full repertoire of the immune system. Because for every human gene there will be on average 10 products, the proteome is at least 400,000 proteins strong, and to produce antibodies against that number of antigens by any technology will be a daunting task; employing the conserved exon immunisation strategy enables one to follow gene product families. He believes that to get relevant antigenicity emulations in the form of protein arrays with which to screen the antibodies will take 2–3 years.

The real challenge is to produce antibody arrays cheaply, which means integrating robotics at each step in order to process thousands of antibodies against thousands of protein antigens. To do this, over the last two years Humphery-Smith’s group has been developing prototype robotics designed to process more than 10 million ELISA equivalents every 2–3 hours. Optimal screening will require 100,000 different recombinant proteins on a chip, which will not be ready for 2–3 years, but the robot will be tested at 50,000 ELISA equivalents per day over the next six months. It is designed to screen 1500 different antibodies against 100,000 antigens in 100 million ELISA equivalents per day. These numbers are needed in order to get to grips with building affinity ligands against the elements on the human proteome. The objective in fact is to have five of these robots operating half a billion ELISA equivalents per day within the next three years in Utrecht.
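The throughput figures quoted above can be cross-checked with some simple arithmetic. The sketch below uses only numbers from the report, plus one assumption that is not stated there: that a robot runs around the clock.

```python
# Figures quoted in the report
ELISA_PER_RUN = 10_000_000      # ELISA equivalents per robot run
RUN_HOURS_SLOW, RUN_HOURS_FAST = 3, 2
TARGET_PER_DAY = 100_000_000    # design target for one robot
ANTIBODIES, ANTIGENS = 1_500, 100_000
ROBOTS = 5

# Assumption: the robot runs around the clock (24 hours per day).
per_day_low = ELISA_PER_RUN * 24 // RUN_HOURS_SLOW
per_day_high = ELISA_PER_RUN * 24 // RUN_HOURS_FAST

# One antibody-antigen test is counted as one ELISA equivalent.
full_screen = ANTIBODIES * ANTIGENS
days_for_full_screen = full_screen / TARGET_PER_DAY
fleet_per_day = ROBOTS * TARGET_PER_DAY
```

Under that assumption, 10 million equivalents every 2–3 hours brackets the 100 million per day design figure, five robots give the quoted half a billion per day, and a full screen of 1500 antibodies against 100,000 antigens comes to 150 million tests, i.e. about a day and a half at the design rate.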

The improvements in protein arrays that are needed are in the surface chemistry: arrays need a surface that will bind the antigen and nothing else, to avoid background and eliminate the need for blocking and washing. The antigen needs to be held in conditions that maintain its conformation: plenty of hydroxyl groups are used to mimic an aqueous environment. These conditions also need to be adjustable so that the binding of a given compound, be it a protein, nucleic acid or chemical ligand, to the protein array can be titrated. A problem with yeast two-hybrid assays is that many proteins will interact to a certain extent with some affinity. The protein-protein interaction studies now being performed will create vast amounts of data and Dr Humphery-Smith stressed the need for further investment in data management, processing and interrogation techniques.

Since the proteins of perhaps 10% of the genome cannot be expressed or even cloned, there is a need for a method to ‘close’ the genome to provide complete coverage. For this Humphery-Smith is identifying ‘signature peptides’, elements within proteins that have a functional or structural role in protein interactions; antibodies will be raised against these sequences as recombinant peptides and then screened for conformational cross-reactivity against protein arrays. Finally, he called for support for the newly-formed Human Proteome Organisation (http://www.hupo.org/).

Mass spectrometry

The limitations of genome sequence for understanding biological function were beautifully illustrated by Peter Roepstorff (University of Southern Denmark, Odense), who asked us to consider a caterpillar and a butterfly: very different organism phenotypes but with the same genome sequence. Hence the need to study proteins; one of the most successful ways of doing this is by using mass spectrometry, which is central to proteomics and to the qualitative and quantitative large-scale analysis of complex protein mixtures. A second major challenge is to identify not just the core proteins, but also the post-translational modifications (PTMs), such as glycosylation, phosphorylation and acylation. As a result of these and other PTMs, current estimates are that there will be about 10 times as many molecular protein species as there are genes in a given organism.

Other issues to be addressed in proteomics include:

(i) Either eliminating the need for protein separation or finding an alternative method to 2D gels, which remain as much an art as a science. Multidimensional capillary separation is one possibility, but this has only a fraction of the separating power of current gels.

(ii) The poor quality of current genome annotation, which may be circumvented by searching crude genomic sequence using mass spectrometry data.

(iii) Quantitation: should this be performed at the separation stage, comparing spot intensities in gels, or during mass spectrometry?

(iv) Should protein interaction studies use gel-based or affinity-based techniques?

Dr Roepstorff termed detection of PTMs ‘modification-specific proteomics’ and introduced a new ‘omics’: modificomics. He described detection of sites of phosphorylation using mass spectrometry on proteins isolated with anti-phosphotyrosine antibodies [3]. With MALDI mass mapping, alkaline phosphatase treatment directly on the target can be used to compare spectra to see where phosphate groups have been lost. He has successfully developed a means of detecting protein nitration, a modification which occurs in different infectious and stress conditions in the cell. In an exploratory model of in vitro nitration of BSA, Western blotting with an antibody against nitrotyrosine was used to locate the nitrated protein. Nitrated peptides could be identified after in-gel digestion and the nitrotyrosines identified by electrospray ionisation mass spectrometry. This revealed that two peptides were nitrated, while precursor ion scanning for the immonium ion for nitrotyrosine revealed two additional partially nitrated peptides [4]. He has also addressed the complexity of glycosylation, finding that γ-interferon possesses 13 putative glycan structures on Asn25 and 18 on Asn97 [6]. Such complicated glycosylation heterogeneity is probably important and a way nature uses to fine-tune interactions. Finally, in the area of protein interactions, Dr Roepstorff has used DNA attached to magnetic beads as a substrate to isolate DNA-binding proteins that were then characterised using MALDI mass spectrometry [2]. His philosophy is that the study of protein interactions should make use of all available tools in the hope that the results will all point in the same direction.
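The phosphatase comparison can be illustrated with a short peak-matching sketch. Removing one phosphate group lowers a peptide's mass by about 79.966 Da (the monoisotopic mass of HPO3), so a peak that disappears and reappears at a multiple of that shift suggests a phosphopeptide. The peak lists, tolerance and function name below are invented for illustration, and singly charged ions are assumed; this is not Dr Roepstorff's software.

```python
PHOSPHATE_SHIFT = 79.966  # monoisotopic mass of HPO3 in Da

def find_phosphopeptides(before, after, tolerance=0.02, max_sites=3):
    """Pair peaks that shift down by n x 79.966 Da after phosphatase treatment."""
    hits = []
    for mass in before:
        for n in range(1, max_sites + 1):
            expected = mass - n * PHOSPHATE_SHIFT
            if any(abs(peak - expected) <= tolerance for peak in after):
                hits.append((mass, n))
    return hits

# Hypothetical peak lists (singly charged ions assumed).
untreated = [1045.52, 1326.64, 1580.71]
treated = [1045.52, 1246.67, 1420.78]
hits = find_phosphopeptides(untreated, treated)
```

Each hit pairs the original peak with the number of phosphate groups apparently removed; locating the modified residue still requires fragmentation data.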

In response to questions, Dr Roepstorff explained that mass spectrometry can generate enough peptide sequence data to identify proteins whose gene sequence is known, while for proteins whose sequence is not in the databases, enough information can be obtained by MS/MS for cloning. For de novo sequencing, software programs are available to help with the interpretation of mass spectrometry data, although none provide automated sequencing from the mass spectra.

Benedetta Mattei (University of Rome, La Sapienza) presented studies combining surface plasmon resonance with mass spectrometry for the analysis of protein interactions, which she performed while working with Peter Roepstorff (see Mattei et al., next issue). In this approach, surface plasmon resonance is used to capture peptides interacting with a chosen protein, which are then identified by mass spectrometry. This can be used to map the domain of a protein that is forming the interaction surface.

A major development was described by Carol Robinson (University of Oxford), who has successfully applied quadrupole time-of-flight (Q-TOF) mass spectrometry to large complexes held together solely by non-covalent interactions. This is not a high-throughput approach but the elegance of the experiments impressed everyone present. She uses nanoflow electrospray with low volumes in very small capillaries and allows evaporation to proceed slowly. A gentle pressure gradient guides the intact complex across the chamber. Slightly increasing the amount of energy in the system causes the complex to break up into its more stable constituents, providing valuable information about the probable routes of assembly. Dr Robinson studied a recently described molecular chaperone, MtGimC, which was shown to consist of two α and four β subunits. Further examination showed that the assembly process was highly cooperative, with no intermediates being detected.

Dr Robinson has also investigated protein-RNA interactions, which are stronger than protein-protein interactions in the gas phase and thus rather more difficult to study. TRAP is an RNA-binding protein that was known to bind tryptophan. She showed that in the absence of tryptophan, 11 TRAP monomers associate autonomously: addition of tryptophan led to cooperative binding. She also confirmed the existence of a 22-mer, which had been postulated but never proven.

Another experiment looked at the composition of the E. coli ribosome, a huge complex of three RNA molecules and 55 different proteins. This can be maintained intact in the mass spectrometer, but lowering the Mg++ concentration causes it to dissociate into the 50S and 30S subunits. Further controlled dissociation of the 50S particle led to the identification of pentamers and hexamers, as well as monomers, elucidating the structure of the ribosome [5]. This paves the way for dynamic studies of the ribosome’s response to therapeutic agents. Finally, Dr Robinson mentioned studies of the yeast spliceosome. This RNA-protein complex has never been crystallised. By dissecting the subunit composition, she has obtained useful information on the stability of the various components, which should facilitate crystallisation attempts.

Giulio Superti-Furga (CellZome GmbH, Heidelberg) has focused on protein complexes in a different way, exploiting the fact that tyrosine phosphorylation is almost invariably associated with protein-protein interactions. Endogenous genes are tagged in situ then translated. Phosphatase activity is blocked using vanadate, then the proteins are immunoprecipitated with an anti-phosphotyrosine antibody. The proteins are purified using tandem affinity purification, which is reliable and robust and can be adapted to high-throughput studies: the system is now handling 4000 MALDI samples per week. The aim is to identify all the protein complexes in Saccharomyces cerevisiae. The data are automatically compared with binary interactions listed in the Yeast Protein Database; novel components are then annotated manually. This is an ambitious project that had just come on-stream at the time of the workshop but which should soon be providing useful information. There are limitations, as raised in the discussion: for example, this method will not detect membrane proteins, nor does it say anything about the stoichiometry or the specificity of the interactions. Pierre Legrain pointed out that researchers see protein interactions in different ways: geneticists are interested in pathways, linking proteins in a linear chain of function, whereas biochemists are interested in complexes that associate to perform a given task. He also raised the issue of subcellular localisation: many proteins are active only within a given cellular compartment and any interactions detected outside of that compartment probably bear little relevance to their function. Dr Superti-Furga acknowledged these caveats but emphasised that collection of the data is just the beginning; interpretation of the data will be the real challenge.
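The automatic comparison step might look something like the sketch below, which splits the members of a purified complex into those supported by at least one known binary interaction and those left over for manual annotation. This is a guess at the general idea, not CellZome's pipeline; the protein names and the list standing in for database entries are invented.

```python
from itertools import combinations

def annotate(complex_members, known_pairs):
    """Split a purified complex into members with and without a known binary link."""
    known = {frozenset(pair) for pair in known_pairs}
    supported = set()
    for a, b in combinations(sorted(complex_members), 2):
        if frozenset((a, b)) in known:
            supported.update((a, b))
    novel = set(complex_members) - supported
    return sorted(supported), sorted(novel)

# Hypothetical stand-in for binary interactions listed in a database.
known_pairs = [("Bait1", "PreyA"), ("PreyA", "PreyB")]
# Hypothetical proteins identified in one purified complex.
purified = ["Bait1", "PreyA", "PreyB", "PreyC"]

supported, novel = annotate(purified, known_pairs)
```

As the discussion noted, membership of a complex says nothing about stoichiometry or about which members contact each other directly, so the `novel` list is a starting point for annotation, not a list of new binary interactions.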

Bioinformatics

The first session was concerned with the annotation of genomic and protein databases. Michael Sternberg (Imperial College, London) is exploiting the rapidly increasing repository of information on protein structure to understand more about structure-function relationships. Integrase and ribonuclease H show very little sequence homology but have conserved structural homology, particularly around the active site. Chymotrypsin and subtilisin perform the same function but use different protein folds to do so. On the other hand, lysozyme and α-lactalbumin share almost identical structures but one is enzymatically active and the other is not.

Automated methods for protein structure prediction are gradually improving, and homology modelling has revealed that most amino acid changes occur in loops and exposed areas of proteins. When two proteins show good structural homology, an automated prediction is nearly as accurate as a manual one. However, when the homology falls below 40%, manual input becomes essential.

In Dr Sternberg’s automated analysis, the program 3D-JIGSAW (http://www.bmm.icnet.uk/servers/3djigsaw/) compares the sequence of a query protein to the templates in their library. If there is no direct structural ‘hit’, then a fold recognition algorithm, 3D-PSSM (http://www.bmm.icnet.uk/~3dpssm/), is applied to model the fold using known structural information; this can detect more remote homologies that PSI-BLAST misses. Dr Sternberg then takes information on biological function derived from text searches of the PubMed archive and uses this to manually rank the target proteins proposed by 3D-PSSM. Last year, this technique proved the most efficient in a CASP4 open competition for determining structures that excluded novel folds. It could be used to facilitate future drug development, specifically in the design of antagonists or inhibitors to fit into active site pockets.

Thure Etzold (European Bioinformatics Institute) chose to focus on the problems raised by the explosion in the amount of protein interaction data and the need for an integrated solution. There are many collections of protein interaction data being assembled and many use their own set of standards, which in turn are based on different assumptions. The NCBI developed standards for its own use that have since been adopted more widely; ACeDB was developed to store genome data for the worm Caenorhabditis elegans and was subsequently expanded for more general use; Array Express is not a standard, but is a technology that has found broad application.

Dr Etzold classified people who build solutions as generalists or pragmatists: the former focus on flexible and extensible solutions that are often slow and do not match the detailed requirements of a particular task; the latter create a system that does exactly what is required initially but cannot be readily adapted and is often short lived. Lion Biosciences’ solution was to create a network of databases, the SRS library network, which has explicit external references, such as a link to a specific database, and implicit cross-references, in the form of common fields and standard terms within those fields, for example, correct taxonomical terms for organisms and Gene Ontology nomenclature for genes and their functions. The problem with this approach is that there are many sources of error: errors in the databases, changes to content or format, and differences in concept. The envisaged solution is a database that has frequent, automatic and extensive consistency checks. Such an integrated database would need to enable scientists to create data viewers, see analysis flows, and add or change the integrated databanks and the links between them. All of this would need to be integrated with analysis tools. SRS has a descriptive language for such extensions, but users may perceive this as additional programming, so it requires a graphical interface.
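The kind of automatic consistency check described above can be illustrated with a minimal sketch. It is not SRS code: the in-memory databank layout, the `xrefs` field and both function names are assumptions made purely for illustration. The first function tests explicit cross-references (does the linked entry exist?), the second tests implicit ones (does a shared field use an agreed term?).

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: databank name -> {entry id -> record}.
# A record may list explicit cross-references under "xrefs"
# as (databank name, entry id) pairs.
Databanks = Dict[str, Dict[str, dict]]


def find_broken_links(databanks: Databanks) -> List[Tuple[str, str, str, str]]:
    """Return (source bank, source id, target bank, target id) for every
    explicit cross-reference whose target entry does not exist."""
    broken = []
    for bank_name, entries in databanks.items():
        for entry_id, record in entries.items():
            for target_bank, target_id in record.get("xrefs", []):
                if target_id not in databanks.get(target_bank, {}):
                    broken.append((bank_name, entry_id, target_bank, target_id))
    return broken


def find_unknown_terms(
    databanks: Databanks, field: str, vocabulary: Set[str]
) -> List[Tuple[str, str, str]]:
    """Return (bank, entry id, value) wherever a shared field holds a term
    that is not in the agreed vocabulary."""
    return [
        (bank_name, entry_id, record[field])
        for bank_name, entries in databanks.items()
        for entry_id, record in entries.items()
        if field in record and record[field] not in vocabulary
    ]
```

Run frequently over all linked databanks, checks of this sort would flag two of the error sources mentioned above: changes to content (dangling links) and differences in concept (non-standard terms).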

Peter Roepstorff asked whether, as an extreme example, a PhD student should be encouraged to develop a new database or persuaded to use the existing technology to avoid further confusion. Thure Etzold replied that new technology is always needed as the field matures and creativity should be encouraged. As standard tools are more widely adopted, it becomes easier for new features to be integrated into existing structures. Pierre Legrain called for a more systematic gene nomenclature: the current system has evolved over 100 years with no clear pattern so that many gene names do not refer to the primary function of the gene product. He said that if biologists do not tackle this problem now, it will be even worse later. Thure Etzold replied that the Gene Ontology scheme is making progress in this area.

The second half of the Bioinformatics session addressed the issue of trying to extract information on protein interactions from the published literature. Manuela Helmer Citterich (University of Rome, Tor Vergata) described iSPOT, a web tool for inferring protein-protein interactions (see Brannetti et al., p. 314), and MINT, a database dedicated to protein interactions (see Wixon, p. 338). iSPOT does not need the 3D structure of the domains under study and can make a prediction as long as their sequences can be confidently aligned to domains of known structure from the same families. The reliability of the prediction depends on the level of sequence identity between the query domains and the domains whose experimentally determined binding data have been used to train the software.

Rita Casadio (University of Bologna) presented a method for the prediction of protein-protein interaction sites, based on neural networks. The aim is to identify those regions on a protein surface that interact with other proteins. She used information from several databases amounting to several thousand known interactions among over 4000 proteins to look at shape, chemical complementarity, and the distribution of charged and polar residues but found no characteristics that typified protein-protein interacting surfaces. Such surfaces showed the same residue composition as other regions of the proteins.

Dr Casadio examined bacterial luciferase, which comprises a heterodimer that has been crystallised. She conceptually dissected the crystal and studied the interacting surfaces. This process was repeated for many known examples and the information was used to ‘train’ a neural network to extract general rules concerning protein interactions. These were stored in an algorithm that could, when given information about a new structure, predict which parts of the protein were most likely to form the interface. For a mouse antibody fragment, the algorithm correctly detected 73% of interacting residues, although it did miss some and gave a couple of false positive results. This method could be useful in complementing results from other studies in proteomics. However, one drawback in using data from individual protein structures is that there may be conformational changes during complex formation.
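As a rough illustration of how interface residues might be labelled in a crystallised dimer and how a figure such as ‘73% of interacting residues detected’ can be computed, consider the sketch below. It does not reproduce Dr Casadio’s method: the distance-cutoff definition of an interface residue, the 5 Å default and the data layout are all assumptions chosen for the example.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]
# Hypothetical layout: residue id -> coordinates of that residue's atoms.
Chain = Dict[int, List[Coord]]


def interface_residues(chain_a: Chain, chain_b: Chain, cutoff: float = 5.0) -> Set[int]:
    """Residues of chain_a with at least one atom within `cutoff`
    (assumed to be in angstroms) of any atom in chain_b."""
    atoms_b = [atom for atoms in chain_b.values() for atom in atoms]
    return {
        res_id
        for res_id, atoms in chain_a.items()
        if any(math.dist(a, b) <= cutoff for a in atoms for b in atoms_b)
    }


def fraction_detected(predicted: Set[int], actual: Set[int]) -> float:
    """Fraction of true interface residues that were predicted."""
    if not actual:
        raise ValueError("no true interface residues to compare against")
    return len(predicted & actual) / len(actual)


def false_positives(predicted: Set[int], actual: Set[int]) -> Set[int]:
    """Residues predicted to be in the interface that are not."""
    return predicted - actual
```

Labels produced by a function like `interface_residues` would serve as training targets; `fraction_detected` and `false_positives` then summarise how a predictor performs on a structure it has not seen.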

Jong Park (MRC-Dunn Institute, Cambridge) believes that the direct interaction space for proteins is too big (with many millions of possible interactions) for it to be possible to globally map them. He is taking a different approach, constructing a general protein interaction map using structural information from the Protein Data Bank (PDB). By applying a strict criterion of structural interaction, he is able to construct a broad and general, but definite, map. He defines a domain arbitrarily, merging structural and sequence alignment data. He then incorporates phylogenetic information on protein family interactions, and data on fold detection and classification. These datasets are then used to predict interactions among homologues, but the data have to be validated. The outcome is PSIMAP (http://www.biointeraction.net/), which contains information on all known domains. Dr Park proposes that intra- and intermolecular interactions are fundamentally different. He states that protein folds have diverse and distinct repertoires of interactions, which should ultimately make their classification simpler, but there is still a long way to go.
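The idea of building a family-level interaction map from structures and then transferring interactions to homologues can be sketched as follows. This is not the PSIMAP implementation: the criterion used here (a minimum number of observed contacts between two domains), the default threshold and all names are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical observation: (domain family A, domain family B, number of
# inter-domain contacts seen in one solved structure).
Observation = Tuple[str, str, int]


def build_interaction_map(
    observations: Iterable[Observation], min_contacts: int = 5
) -> Dict[str, Set[str]]:
    """Keep only family pairs that meet the (assumed) strict contact
    criterion and record them as an undirected map."""
    imap: Dict[str, Set[str]] = defaultdict(set)
    for fam_a, fam_b, n_contacts in observations:
        if n_contacts >= min_contacts:
            imap[fam_a].add(fam_b)
            imap[fam_b].add(fam_a)
    return dict(imap)


def may_interact(
    imap: Dict[str, Set[str]],
    families_of: Dict[str, List[str]],
    protein_p: str,
    protein_q: str,
) -> bool:
    """Predict an interaction between two proteins if any of their domain
    families are linked in the map (transfer by homology)."""
    return any(
        fam_q in imap.get(fam_p, set())
        for fam_p in families_of[protein_p]
        for fam_q in families_of[protein_q]
    )
```

A prediction of this kind is only a hypothesis about homologues and, as noted above, would still have to be validated.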

In response to questions, Dr Park conceded that this method cannot deal with protein interactions that are mediated by unstructured regions and that he has not yet examined why interactions between certain pairs of folds are favoured. Andreas Pluckthun pointed out that the immunoglobulin domain, while stable, is used differently in various receptors: cellular receptors present the edge of the domain for binding, whereas antibodies use a more central part of the domain. Dr Park replied that immunoglobulins were a special case, but he felt that his method could map the majority of interactions that occur in a general way.

The meeting was closed by Alfonso Valencia (Protein Design Group, Cantoblanco, Madrid) who described his work on extracting information concerning protein interactions from the published literature (see Blaschke et al., p. 310) and a method for prediction of protein interactions. The prediction method looks for correlated mutations between possible interaction partners and also for complementarity of the phylogenetic trees of potentially interacting proteins, which could indicate co-evolution of the proteins.
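One simple way to quantify how similar two phylogenetic trees are is to correlate the pairwise evolutionary distances of the two protein families across the same set of species. The sketch below takes that route; it is an assumed stand-in rather than Dr Valencia’s actual procedure, and the function names and the use of a Pearson coefficient are choices made for this example. Both matrices must list the species in the same order.

```python
import math
from itertools import combinations
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def upper_triangle(matrix: Matrix) -> List[float]:
    """Flatten the pairwise distances above the diagonal."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant distances")
    return cov / math.sqrt(var_x * var_y)


def tree_similarity(dist_a: Matrix, dist_b: Matrix) -> float:
    """Correlate two families' species-by-species distance matrices;
    values near 1 are consistent with co-evolution."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))
```

A high score for a pair of families suggests, but does not prove, that the proteins have evolved together, so such a score would normally be one line of evidence among several.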

References

1. Knappik A, Ge L, Honegger A, et al. Fully synthetic human combinatorial antibody libraries (HuCAL) based on modular consensus frameworks and CDRs randomized with trinucleotides. J Mol Biol 2000; 296(1): 57–86.

2. Nordhoff E, Krogsdam AM, Jorgensen HF, et al. Rapid identification of DNA-binding proteins by mass spectrometry. Nat Biotechnol 1999; 17(9): 884–888.

3. Pandey A, Podtelejnikov AV, Blagoev B, Bustelo XR, Mann M, Lodish HF. Analysis of receptor signalling pathways by mass spectrometry: Identification of Vav-2 as a substrate of the epidermal and platelet-derived growth factor receptor. Proc Natl Acad Sci USA 2000; 97: 179–184.

4. Petersson AS, Steen H, Kalume DE, et al. Investigation of tyrosine nitration in proteins by mass spectrometry. J Mass Spectrom 2001; 36(6): 616–625.

5. Rostom AA, Fucini P, Benjamin DR, et al. Detection and selective dissociation of intact ribosomes in a mass spectrometer. Proc Natl Acad Sci USA 2000; 97(10): 5185–5190.

6. Sareneva T, Mortz E, Tolo H, Roepstorff P, Julkunen I. Biosynthesis and N-glycosylation of human interferon-gamma. Asn25 and Asn97 differ markedly in how efficiently they are glycosylated and in their oligosaccharide composition. Eur J Biochem 1996; 242: 191–200.

This Conference Report aims to present a commentary on the topical issues in genomics studies presented at the conference. The article represents a personal critical analysis of the reports made at the meeting and aims at providing implications for future genomics studies. This Conference Report was written by Joan Marsh - Publishing Editor, John Wiley and Sons, Ltd, with assistance from Michael Taussig - Chairman of the ESF programme on Integrated Approaches for Functional Genomics.


Copyright © 2001 John Wiley & Sons, Ltd. Comp Funct Genom 2001; 2: 319–326.