FEMS Microbiology Reviews 29 (2005) 263–279

Enzyme genomics: Application of general enzymatic screens to discover new enzymes

Ekaterina Kuznetsova a,b, Michael Proudfoot a, Stephen A. Sanders a, Jeffrey Reinking b, Alexei Savchenko a,b, Cheryl H. Arrowsmith b,c,

Aled M. Edwards a,b,c, Alexander F. Yakunin a,*

a Banting and Best Department of Medical Research, 112 College St., University of Toronto, Toronto, Ont. M5G 1L6, Canada

b Department of Medical Biophysics, University of Toronto, Ontario Center for Structural Proteomics, Ontario Cancer Institute, 200 Elizabeth St, Max Bell Research Centre 5R407, Toronto, Ont. M5G 2C4, Canada

c Structural Genomics Consortium, 112 College St., University of Toronto, Toronto, Ont. M5G 1L6, Canada

Received 3 November 2004; received in revised form 3 December 2004; accepted 8 December 2004

First published online 28 January 2005

Abstract

In all sequenced genomes, a large fraction of predicted genes encodes proteins of unknown biochemical function and up to 15%

of the genes with “known” function are mis-annotated. Several global approaches are routinely employed to predict function,

including sophisticated sequence analysis, gene expression, protein interaction, and protein structure. In the first coupling of genom-

ics and enzymology, Phizicky and colleagues undertook a screen for specific enzymes using large pools of partially purified proteins

and specific enzymatic assays. Here we present an overview of the further developments of this approach, which involve the use of

general enzymatic assays to screen individually purified proteins for enzymatic activity. The assays have relaxed substrate specificity

and are designed to identify the subclass or sub-subclasses of enzymes (phosphatase, phosphodiesterase/nuclease, protease, esterase,

dehydrogenase, and oxidase) to which the unknown protein belongs. Further biochemical characterization of proteins can be facil-

itated by the application of secondary screens with natural substrates (substrate profiling). We demonstrate here the feasibility and

merits of this approach for hydrolases and oxidoreductases, two very broad and important classes of enzymes. Application of gen-

eral enzymatic screens and substrate profiling can greatly speed up the identification of biochemical function of unknown proteins

and the experimental verification of functional predictions produced by other functional genomics approaches.

© 2005 Federation of European Microbiological Societies. Published by Elsevier B.V. All rights reserved.

Keywords: Functional proteomics; Biochemical proteomics; Enzymology; Enzymatic assays

Contents

1. Introduction

2. High throughput strategies for protein expression and purification

3. Development of broad specificity enzyme assays

3.1. Experimental strategy

3.2. Phosphatases

3.3. Phosphodiesterases

3.4. Esterases

3.5. Proteases

3.6. Dehydrogenases

3.7. Oxidases

4. Substrate profiling: screening proteins with natural substrates

5. Application of enzymatic screens for functional annotation

5.1. Annotation of “hypothetical” proteins

5.2. Identification of missing enzymes: E. coli nucleotidases

5.3. Mis-annotated proteins

5.4. Testing structure-based hypotheses

5.5. New activities for known enzymes

5.6. Confirmation of sequence-based gene annotations

6. Concluding remarks and prospects

Acknowledgements

References

0168-6445/$22.00. doi:10.1016/j.femsre.2004.12.006

Edited by Michael Y. Galperin.

* Corresponding author. Tel.: +1 416 946 0075; fax: +1 416 978 8528.

E-mail address: [email protected] (A.F. Yakunin).

1. Introduction

Understanding protein function has always been a

major goal in biology. This problem has now been mag-

nified by global genome sequencing efforts, which have

generated large numbers of new genes whose biological

or biochemical functions remain unknown. The complete

sequences of 224 genomes are currently available

in the public databases and there are 975 genome

sequencing projects underway (GenBank,

http://www.ncbi.nlm.nih.gov/Genbank/index.html; PEDANT,

http://pedant.gsf.de/; Genomes OnLine,

http://www.genomesonline.org/). In newly sequenced genomes,

genes are annotated on the basis of sequence similarity [1]

to other proteins that have already been characterized.

This bioinformatic technique, although the most

successful and least expensive, fails to assign function

to 40–60% of the new sequences [2]; in any prokaryotic

genome, >30% of genes remain annotated as “function

unknown”. In eukaryotes, “hypothetical” proteins represent

an even higher percentage of the genome, e.g.,

>60% of the genes in Plasmodium falciparum [3]. In

addition, large numbers of genes may have non-specific

annotations (like putative hydrolase or esterase). Even

in Escherichia coli, the best characterized model organ-

ism, we do not know the function of 20% of the genes

[4]. It is clear that our global understanding of biological

processes will remain murky until we determine the

functions for the genes that encode proteins with unknown

biochemical or physiological function.

In addition to sequence similarity-based methods to

annotate new genomic sequences, there are several func-

tional genomics approaches to infer gene function. A

group of recently developed approaches in comparative

genomics (known as genome context analysis) is focused

on the identification of associations between genes and

proteins in different genomes that may point to func-

tional interactions and suggest function for unknown

proteins [5–7]. Genome context analysis integrates vari-

ous types of genomic evidence, such as phyletic profiles

of protein families, domain fusions, gene neighbour-

hoods, expression patterns, metabolic reconstructions,

and shared regulatory sites. Although these methods

usually provide rather general predictions, they represent

an important development in genomics and are

gaining significance with the rapid growth of the number

of sequenced genomes [8]. Many biological processes

involve protein–protein or protein–nucleic acid interac-

tions, and comprehensively identifying them is impor-

tant to defining their cellular roles. These interactions

can be analyzed using various approaches, including

two-hybrid systems, TAP (tandem affinity purification)

tagging experiments, and protein microarrays [9–17].

DNA microarrays have been widely used to simulta-

neously determine the expression levels of thousands

of genes and to link proteins of unknown function to

known pathways [18,19]. The phenotypes of specific

gene disruptions under various growth conditions can

yield important clues on the biological roles for open

reading frames (ORFs) of unknown function, especially

for genes that are essential for growth under particular

conditions [20–24]. Analysis of the sub-cellular protein

localization may also provide a hint as to the function

of the protein [25,26]. Another crucial aspect of func-

tional annotation of unknown proteins is their three-

dimensional structures. Structural proteomics emerged

from the simultaneous developments of rapid and parallel

methodologies in gene cloning, protein purification

and three-dimensional structure determination [27–30],

and recent results [31–34] have demonstrated the feasi-

bility and importance of this approach for functional

annotation.

In most cases, these genomics approaches produce

hypotheses or general annotations concerning biochem-

ical or cellular function, which then require experimental

verification. Since a gene function is often manifested by

the direct activity of its translated protein, the analysis

of protein biochemical function is likely to provide a

superior approach for elucidating gene function [16].

There are parallel genome-scale efforts to determine

function directly, such as the use of large-scale screens

for specific biochemical activities. These methods have

been applied to a set of purified proteins from the yeast

Saccharomyces cerevisiae and successfully identified

several new enzymes in tRNA metabolism in the yeast

[35–38]. Genome-scale biochemical studies using puri-

fied proteins and specific assays are designed to cast a

wide net while acknowledging a certain risk of false positive

and false negative information [35,39]. Here, we

review a complementary approach that is based on the

use of general enzymatic assays with a more limited

set of highly purified proteins. We demonstrate the fea-

sibility and merits of this approach for hydrolases and

oxidoreductases, two very broad and important classes

of enzymes. We also present various examples of

application of general enzymatic assays to quickly test

hypotheses generated by structural studies or other

functional genomics approaches. The present approach

may also prove to be adept at identifying moonlighting

enzymes, enzymes with more than one enzymatic activ-

ity, which are gaining increasing attention [40,41].

2. High throughput strategies for protein expression and purification

A major task in the analysis of any biochemical

activity is the purification and identification of the

protein responsible for that activity. Each protein

has unique properties, which can be exploited for its

individual purification, but this makes it impossible to design

a general purification strategy applicable to all

proteins. Therefore, for enzymatic screening, a general

purification protocol is required to allow routine and

possibly automated purification of native proteins in

microgram quantities at a rate of hundreds of samples

per day. The most suitable method for standardization

and high-throughput technology is recombinant

expression and affinity purification based on the fusion

of a tag, usually a peptide or small protein, to the target

protein. Because polypeptide-purification tags can

be genetically attached to any protein, they are

suitable for high-throughput (HT) operations. A num-

ber of different purification tags have been described,

each with different features affecting the stability, sol-

ubility, and expression level of recombinant proteins

[42].

In a pioneering study that laid the basis of the biochemical

genomics approach, Phizicky and co-workers used a

recombination-cloning technology to fuse 75% of yeast

genes to the glutathione S-transferase (GST) tag for

over-expression and affinity purification. About 4500

yeast proteins were affinity purified in 64 pools (96 pro-

teins in each) and assayed for a specific biochemical

activity [36,37,39]. Active pools were deconvoluted to

identify the source strains by preparation and analysis

of sub-pools of the proteins. One important advantage

of the pooling strategy is that many proteins can be rap-

idly analysed. However, this approach does not allow

the assessment of the level of gene expression and qual-

ity of protein purification. This problem is compounded

by low expression of many proteins making them under-

represented (or even absent) in pools of purified

proteins.

In our protocol for enzymatic screening of proteins,

we decided to use individually purified proteins. While

impractical until recently, the ability to express and

purify large numbers of individual proteins is becom-

ing more widespread [43], and the purification of

highly expressed recombinant proteins has been semi-

automated [44,45]. Rapid cloning, expression, and

purification of large numbers of recombinant proteins

in parallel have been developed to produce proteins

for structural proteomics efforts and protein micro-

array applications [29,43,45–47]. Most systems use E.

coli as an expression host due to the convenience

and economy of working with bacterial cultures. How-

ever, many prokaryotic proteins (30–50%) cannot be

expressed in soluble form in this organism [29]. At

present, the other choices for expressing proteins, such

as yeast cells, insect or human cells, or cell-free systems,

have disadvantages [29]. Therefore, new systems

and strategies are needed to produce soluble proteins

for functional proteomics. Several protein tags were

shown to improve the solubility of recombinant pro-

teins over their 6His-tagged counterparts [43,48,49].

However, 6His-tags remain a popular choice due to

their small size and resultant lack of effect on the

physical and biological properties of the expressed

protein.

We explored the use of small-scale semi-automated

methods to provide purified proteins for assays. In or-

der to ensure purity, all proteins that we tested for

activity were purified individually using the N-terminal

hexahistidine tag. Since this tag rarely affects catalytic

activity of fused proteins, the expressed proteins were

analysed for enzymatic activity without the removal of

the tag. The automated protocol using an 8-tip liquid

handling robot included cell lysis, filtration, incubation

with Ni-beads, wash steps and elution. The throughput

is impressive; in three hours (time is further reduced

when using detergent-based lysis), 96 proteins

were purified in 100–150 µl aliquots. Most clones produced

10–100 µg of purified protein, which is sufficient

for at least 10 enzymatic assays (1 µg/assay). Gel

electrophoresis of the output of the automated


purification showed that this purification scheme is

able to produce highly purified proteins. Similar re-

sults have been reported recently using custom robot-

ics [44]. However, we noted that the automated

protein purification was able to produce proteins of

sufficient purity for enzyme screens only for those proteins

that express well in bacteria (>2 mg/l). To

achieve sufficient purity for the more poorly expressed

proteins, it was necessary to implement more extensive

manual purification protocols.
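The yield and throughput figures quoted above can be turned into a quick planning calculation. The following is a minimal sketch, not part of the published protocol: the function names are hypothetical, and the length of the working day is an assumption introduced only for illustration.

```python
import math


def assays_supported(yield_ug: float, ug_per_assay: float = 1.0) -> int:
    """Whole number of assays that one purified protein sample can support."""
    return math.floor(yield_ug / ug_per_assay)


def proteins_per_day(batch_size: int = 96,
                     hours_per_batch: float = 3.0,
                     working_hours: float = 9.0) -> int:
    """Proteins purified per day if complete batches are run back to back.

    batch_size and hours_per_batch follow the text (96 proteins in three
    hours); working_hours is an assumed value, not taken from the paper.
    """
    full_batches = math.floor(working_hours / hours_per_batch)
    return full_batches * batch_size


if __name__ == "__main__":
    # Yield range reported in the text: 10-100 micrograms per clone.
    for yield_ug in (10.0, 100.0):
        print(yield_ug, "ug ->", assays_supported(yield_ug), "assays")
    print("proteins per assumed 9 h day:", proteins_per_day())
```

At the low end of the reported yield range, a sample covers the ten assays mentioned in the text; higher yields leave material for secondary screens.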

3. Development of broad specificity enzyme assays

3.1. Experimental strategy

There are several thousand different enzyme-cataly-

sed reactions, for which there are hundreds of specific

enzymatic assays [50]. It is therefore impractical to de-

velop and apply them all to hundreds or thousands of

unknown proteins. As a practical solution, we developed

a restricted set of enzymatic assays that have relaxed

substrate specificity and that are inexpensive, rapid,

and simple. These assays were intended not to identify

specific substrates, but rather to identify only the sub-

class or sub-subclasses (phosphatase, dehydrogenase,

protease) to which the new enzyme belongs, and thus

serve as the basis for more specific studies. Proteins with

identified catalytic activity against general substrates

were then passed on to more specific studies, including secondary

screens, in order to further characterize their

function.

Oxidoreductases and hydrolases comprise 40–60% of

known enzymes in various genomes (PEDANT data-

base). Because many oxidoreductases and hydrolases

can be monitored with colorimetric assays, we selected

these enzyme classes as a proof of concept for the approach.

Specifically, we developed spectrophotometric

screens for phosphatases, phosphodiesterases/nucleases,

proteases, esterases, dehydrogenases and oxidases, to

be carried out in 200 µl volumes in 96-well plate format.

As a general strategy, for each catalytic function

we compared many literature-based assays to identify

a set of assay conditions in which the maximum number

of enzymes was active. Using these conditions, several

control enzymes were screened and then the assay

parameters were varied to generate a set of conditions

supporting maximal catalytic activity of these proteins.

Commonly varied parameters included the nature of

the substrate, the pH, and the metal requirement. Spec-

trophotometric assays for hydrolases and oxidoreduc-

tases are quite sensitive and can often detect 50 pmol

to 1 nmol of product [51,52]. They are well characterized

and commonly used, and therefore there was a

large body of literature to draw upon for assay

development.
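The optimisation step described above (vary the substrate, the pH, and the metal, then keep the conditions under which the control enzymes are most active) amounts to enumerating a small grid of conditions. The sketch below illustrates that idea only: the substrate, pH, and metal values are placeholders rather than the values the authors tested, and `condition_grid` and `fits_on_plate` are hypothetical helper names.

```python
from itertools import product


def condition_grid(substrates, ph_values, metals):
    """Enumerate every substrate/pH/metal combination as a list of dicts."""
    return [
        {"substrate": s, "pH": ph, "metal": m}
        for s, ph, m in product(substrates, ph_values, metals)
    ]


def fits_on_plate(n_conditions: int, n_enzymes: int, wells: int = 96) -> bool:
    """True if one well per enzyme per condition fits on a single plate."""
    return n_conditions * n_enzymes <= wells


# Placeholder values chosen for illustration only.
grid = condition_grid(
    substrates=["pNPP", "bis-pNPP"],
    ph_values=[6.5, 7.5, 8.5],
    metals=["none", "Mg2+", "Mn2+", "Mg2+ + Mn2+"],
)

if __name__ == "__main__":
    print(len(grid), "conditions")
    print("4 control enzymes fit on one plate:", fits_on_plate(len(grid), 4))
```

Counting wells this way shows how quickly a full grid fills a 96-well plate, which is one reason to settle on a single permissive condition per enzyme class.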

3.2. Phosphatases

Phosphatases or phosphomonoesterases (EC 3.1.3)

hydrolyze phosphomonoester bonds in a wide range of

natural substrates including small molecules (nucleotides,

sugars, and sugar alcohols) and proteins. There

are different groups of phosphatases: alkaline, neutral,

acid, and protein phosphatases. It is well known that

most phosphatases show significant activity toward the

small, artificial chromogenic substrate, p-nitrophenyl

phosphate (pNPP). Even at high temperatures (70 °C),

pNPP has a very low rate of non-enzymatic hydrolysis

in aqueous solutions, which makes it possible to perform

long incubations (at least 2–4 hours) to detect activity in

enzymes with low turnover numbers. Most phosphatases

show high or significant affinity for pNPP (Km

0.005–5 mM) (BRENDA database, http://www.brenda.uni-koeln.de/),

and for those with relatively low

affinities such as some protein phosphatases (Km 50–

130 mM), the rate of hydrolysis is still sufficiently high

(Vmax 230–698 µmol min−1 mg−1 protein) to allow

detection. Both alkaline (pH optimum 8–10) and acid

(pH optimum 4–6) phosphatases had significant activity

toward this substrate at neutral pH (20–80% of activity

at optimal pH). Most phosphatases have a metal

requirement for activity, using either Mg2+ or Mn2+.

Accordingly, we implemented a general assay for phos-

phatases that included 4 mM pNPP as the substrate and

was performed in a reaction mixture containing 50 mM

HEPES–K buffer (pH 7.5), 5 mM Mg2+, 0.5 mM Mn2+

(Table 1). Under these conditions, it was possible to de-

tect 0.5 ng of alkaline phosphatase (calf intestinal phos-

phatase; data not shown).
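The argument that 4 mM pNPP still gives a measurable rate with low-affinity enzymes can be checked with the Michaelis–Menten equation, v = Vmax·[S]/(Km + [S]). The sketch below assumes simple Michaelis–Menten kinetics, which the text does not state explicitly. Pairing the highest Km with the lowest Vmax is an illustrative worst case, not a measured enzyme.

```python
def fraction_of_vmax(km_mM: float, s_mM: float) -> float:
    """Fractional saturation [S] / (Km + [S]) under Michaelis-Menten kinetics."""
    return s_mM / (km_mM + s_mM)


def mm_rate(vmax: float, km_mM: float, s_mM: float) -> float:
    """Reaction rate in the units of vmax (here umol per min per mg protein)."""
    return vmax * fraction_of_vmax(km_mM, s_mM)


PNPP_MM = 4.0  # pNPP concentration in the general phosphatase assay (mM)

if __name__ == "__main__":
    # Typical phosphatases: Km 0.005-5 mM for pNPP.
    for km in (0.005, 5.0):
        print(f"Km {km} mM: {fraction_of_vmax(km, PNPP_MM):.1%} of Vmax")
    # Low-affinity protein phosphatases: Km 50-130 mM, Vmax 230-698.
    print(f"worst case: {mm_rate(230.0, 130.0, PNPP_MM):.2f} umol/min/mg")
    print(f"best case:  {mm_rate(698.0, 50.0, PNPP_MM):.2f} umol/min/mg")
```

Under these assumptions even the worst-case pairing retains a rate of several µmol min−1 mg−1, consistent with the statement that such enzymes remain detectable.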

The general assays were able to reveal the chemistry

of protein function. In an effort to provide additional

clues to the biological function, each of the phospha-

tases that displayed activity toward pNPP was further

characterized using a panel of physiological substrates

(phosphorylated sugars, nucleotides) and the malachite

green reagent [53]. In this way, the enzymes could be

sub-grouped on the basis of their preferred substrates,

and specific hypotheses for function could be

generated.

3.3. Phosphodiesterases

Phosphodiesterases (EC 3.1.4) hydrolyze phosphodi-

ester bonds in various natural substrates (cAMP, cGMP,

single or double stranded DNA, RNA, and phospholip-

ids). There are two artificial chromogenic substrates for

these enzymes, bis-p-nitrophenyl phosphate (bis-pNPP)

and p-nitrophenyl 5′-thymidine monophosphate (pNP-TMP).

These small chromogenic substrates present

convenient alternatives to complicated and laborious

protocols that employ natural substrates. Both bis-pNPP

and pNP-TMP have been applied successfully to

Table 1

General enzymatic assays for hydrolases and oxidoreductases

Activity (assay)              Substrate         Metals        pH    Detection (nm)
Phosphatase                   pNPP              Mg2+ + Mn2+   7.5   410
Phosphodiesterase/nuclease    Bis-pNPP          Mg2+ + Mn2+   8.5   410
Esterase                      pNP-palmitate     None          8.0   410
Esterase                      Palmitoyl-CoA     None          7.5   412
Protease                      Casein            Ca2+ + Zn2+   7.5   595
Protease                      BAPNA, Leu-pNA    Ca2+ + Zn2+   7.5   405
Amino acid dehydrogenase      20 amino acids    Mg2+ + Mn2+   8.5   340
Alcohol dehydrogenase         5 alcohols        Mg2+ + Mn2+   8.5   340
Organic acid dehydrogenase    8 organic acids   Mg2+ + Mn2+   8.5   340
Aldehyde dehydrogenase        5 aldehydes       Mg2+ + Mn2+   8.5   340
Carbohydrate dehydrogenase    7 carbohydrates   Mg2+ + Mn2+   8.5   340
Amino acid oxidase            20 amino acids    None          8.0   460
Alcohol oxidase               5 alcohols        None          8.0   460
Organic acid oxidase          8 organic acids   None          8.0   460
Aldehyde oxidase              5 aldehydes       None          8.0   460
Carbohydrate oxidase          7 carbohydrates   None          8.0   460


characterize various phosphodiesterases and nucleases

from both prokaryotes and eukaryotes [54–57]. Phos-

phodiesterases show high to significant affinity for both

substrates (Km = 0.25–14.4 mM for bis-pNPP and 0.06–

6 mM for pNP-TMP) within a broad pH range 7–10

(BRENDA database). Bis-pNPP is reported to be a better

substrate than pNP-TMP [58]. All known phosphodiesterases

require a divalent metal ion for catalytic

activity (Mg2+, Mn2+, Zn2+, Ni2+, Co2+ or Ca2+); most

have significant activity in the presence of Mg2+ or

Mn2+. Our optimized general phosphodiesterase/nucle-

ase assay includes 0.83 mM bis-pNPP in a reaction mix-

ture containing 50 mM Tricine buffer (pH 8.5), 5 mM

Mg2+, 0.5 mM Mn2+ (Table 1).

3.4. Esterases

Esterases (carboxylesterases, lipases, thioesterases,

and phospholipases) are hydrolases (EC 3.1.1) that show

broad substrate specificity toward oxo- or thio-esters of

various fatty acids. Some esterases show higher activity

toward long acyl chain substrates (C12–C18); others

prefer short chain substrates (C2–C6). Both carboxylesterases

and lipases demonstrate high activity over a

broad pH range (6–10) toward p-nitrophenyl esters

of various fatty acids (Vmax 1.92–1543 µmol min−1 mg−1

protein; Km 0.0004–2.86 mM) (BRENDA database).

Even thioesterases show some activity toward these

chromogenic substrates [59,60]. It might be predicted

that short acyl chain substrates (pNP-acetate, pNP-pro-

pionate) will be preferred over those with long acyl

chains for a generic assay because the smaller substrates

presumably would have better accessibility to active

sites. However, small substrates are not stable in aque-

ous solutions, which makes it impossible to conduct

long (1–3 h) incubations. Therefore, we used 1 mM

pNP-palmitate (C16), a long chain substrate, for esterase

assays (carboxylesterases, lipases, and thioesterases)

in a reaction mixture containing 50 mM Tris–HCl (pH

8.0), 0.4% of Triton X-100, and 0.1% of Gum Arabic

(Table 1). Interestingly, most esterases that preferred

short chain substrates showed significant activity toward

pNP-palmitate. For example, the E. coli BioH protein, a

carboxylesterase with a preference for short chain substrates

[61], was initially identified based on its ability

to hydrolyze pNP-palmitate.

One disadvantage in the use of pNP-esters of fatty

acids as substrates for screening is their sensitivity to

imidazole, which makes them difficult (due to high

background) to use with protein samples directly after

elution from Ni2+ affinity columns. Further analysis of

available information revealed that many carboxylesterases

show some catalytic activity toward the thioesterase

substrate palmitoyl-CoA [60–62] (BRENDA

database), for which there is a simple chromogenic as-

say based on the reduction of dithio-bis-nitrobenzoic

acid (DTNB, Ellman reagent) by the newly formed

SH-group of free CoA [63]. We explored the use of

this substrate under our experimental conditions and

found that this assay is not sensitive to imidazole,

and that many purified carboxylesterases show significant

activity toward palmitoyl-CoA. Therefore, the

DTNB-based thioesterase assay with palmitoyl-CoA

as a substrate was selected as our second general assay

for esterases.

3.5. Proteases

Proteases (EC 3.4) comprise a large and complex

group of enzymes that hydrolyse peptide bonds at vari-

ous positions (endopeptidases, aminopeptidases, carb-

oxypeptidases). There are five classes of proteases


based on the moiety that plays the primary role

in catalysis (serine, threonine, cysteine, aspartate or

metallo-proteases). Many different natural and artificial

chromogenic substrates for proteases have been devel-

oped [64]. Historically, casein has been widely used as a

natural protease substrate because many proteases cleave

this protein, and casein appears to be a better substrate

than haemoglobin, albumin, or collagen. Because it

appeared difficult to develop a single assay suitable for

all proteases, we generated assays for the three classes

of proteases (endopeptidases, aminopeptidases, and

carboxypeptidases).

The spectrophotometric assay for endopeptidases

employed a casein-based assay coupled to Coomassie

blue binding [65]. This assay relies on the ability of the

dye Coomassie blue to bind proteins but not small proteolytic

products (small peptides and amino acids). This

assay can be used to measure the activity of various sub-

classes of endopeptidases (serine, cysteine, aspartate,

and metallo) that have different affinities (Km 0.008–

1.05 mM) and activities (Vmax 0.3–279 µmol min−1

mg−1 protein) toward this substrate (BRENDA database).

Most proteases have a broad pH optimum (5.5–

10.5) and do not need metal ions for activity. However,

since metallopeptidases require Ca2+ or Zn2+ (0.5–

1 mM) for activity, we included both metals (0.5 mM

each) into the reaction mixture (Table 1). Protease activ-

ity was measured using casein as the substrate in a reac-

tion mixture containing 50 mM HEPES–K buffer (pH

7.5), 50 µg of casein per well, 0.5 mM Ca2+, 0.5 mM

Zn2+, and 1 mM DTT. After 2–4 h of incubation with

enzyme, the Bradford reagent (Bio-Rad) was added

and the decrease in absorbance at 595 nm was deter-

mined. The detection limit for different proteases varies

from 2 ng for subtilisin (data not shown) to 5 µg for calpain

[65]. More sensitive fluorogenic casein substrates

for the assay of protease activity have been designed

[66], and these are promising assays for subsequent

studies.

Many artificial chromogenic substrates are presently

available that can be used to identify both endo- and

exopeptidases [64]. For example, serine-, cysteine-,

and some carboxy- and metallopeptidases are active

toward benzoyl-Arg-p-nitroanilide (BAPNA; Km =

0.01–1.12 mM; Vmax = 7.4–115.6 µmol min−1 mg−1 protein)

(BRENDA database). Aminopeptidases and some

carboxypeptidases can be assayed using Leu-p-nitroanilide

(Leu-pNA; Km = 0.17–0.86 mM; Vmax = 1–

190 µmol min−1 mg−1 protein) (BRENDA database).

Therefore, a mixture of BAPNA (0.2 mM) and Leu-

pNA (0.2 mM) was used for chromogenic protease

screens. The reactions were carried out in 50 mM

HEPES–K buffer (pH 7.5), 0.5 mM Ca2+, 0.5 mM

Zn2+, and 1 mM DTT. Although not as general as the

casein-based method, this chromogenic assay is more

sensitive and can detect as little as 5–20 pmol of trypsin [51].

3.6. Dehydrogenases

Dehydrogenases (EC 1.1.1; EC 1.2.1; EC 1.3.1; and

EC 1.4.1) oxidize various organic substrates using

NAD or NADP (or both) as electron acceptors. Some

dehydrogenases are activated by Mg2+ or Mn2+ [67–69]. Most dehydrogenases have alkaline pH optima

(8.5–11) for substrate oxidation (cofactor reduction).

To screen for dehydrogenases, we designed several as-

says using pools of different electron donors. These were

mixtures of the 20 amino acids (0.25 mM of each in the

mixture), 5 different alcohols (methanol, ethanol, 1-hex-

anol, decanol, and benzyl alcohol; 0.3 mM of each in

reaction mixture), 8 different organic acids (acetate,

fumarate, malate, lactate, isocitrate, succinate, oxaloacetate,

and α-ketoglutarate; 0.3 mM of each in reaction

mixture), 5 different aldehydes (hexanal, decanal, glutar-

aldehyde, benzaldehyde, and 2-naphthaldehyde; 0.1 mM

of each in the reaction mixture), and 7 different carbohydrates

(D-glucose, D-galactose, D-mannitol, D-fructose,

D-arabinose, D-sorbitol, and D-arabitol; 0.5 mM of each

in the reaction mixture). The substrate concentrations

were selected to fall within the range of characterized

Km values (0.01–150 mM) (BRENDA database). A mixture

of NAD and NADP (0.5 mM each) was used as the elec-

tron acceptor (Table 1), and the activity was detected as

an increase in absorbance at 340 nm. The reaction mix-

ture contained 50 mM Tricine buffer (pH 8.5), 0.5 mM

NAD, 0.5 mM NADP, 1 mM Mg2+, 0.1 mM Mn2+,

and the substrate solution (described above). Most

dehydrogenases have a specific activity (Vmax) higher

than 10 µmol NAD(P) reduced min−1 mg−1 protein.

The assay should therefore detect most dehydrogenases

even if the enzyme is not saturated with substrate (detection

limit ~20 nmol of NAD(P)H produced).
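The detection argument above can be made concrete with the Beer–Lambert law, A = ε·c·l, and a simple rate calculation. The sketch below is an estimate under stated assumptions, not the authors' calculation. It uses the standard molar absorption coefficient of NAD(P)H at 340 nm (6220 M−1 cm−1), a 200 µl reaction volume as in the screens, and an assumed optical path length of 0.56 cm, which depends on the plate and is not given in the text. The function names are hypothetical.

```python
NADH_EPSILON_340 = 6220.0  # M^-1 cm^-1, standard value for NAD(P)H at 340 nm


def absorbance_from_nmol(product_nmol: float,
                         volume_ul: float = 200.0,
                         path_cm: float = 0.56,
                         epsilon: float = NADH_EPSILON_340) -> float:
    """Beer-Lambert absorbance for a given amount of NAD(P)H in one well.

    path_cm is an assumed path length for 200 ul in a 96-well plate.
    """
    conc_molar = (product_nmol * 1e-9) / (volume_ul * 1e-6)
    return epsilon * conc_molar * path_cm


def minutes_to_detect(specific_activity_umol_min_mg: float,
                      enzyme_ug: float = 1.0,
                      limit_nmol: float = 20.0) -> float:
    """Time for the enzyme to produce limit_nmol of product at a constant rate."""
    enzyme_mg = enzyme_ug / 1000.0
    rate_nmol_per_min = specific_activity_umol_min_mg * 1000.0 * enzyme_mg
    return limit_nmol / rate_nmol_per_min


if __name__ == "__main__":
    print(f"A340 at 20 nmol: {absorbance_from_nmol(20.0):.3f}")
    print(f"minutes at 10 umol/min/mg: {minutes_to_detect(10.0):.1f}")
    print(f"minutes at 1% of that rate: {minutes_to_detect(0.1):.0f}")
```

With 1 µg of enzyme per assay, an enzyme working at only 1% of the typical specific activity would still cross a 20 nmol threshold within an incubation of a few hours, which supports the claim that unsaturated enzymes remain detectable.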

3.7. Oxidases

Oxidases (EC 1.1.3; EC 1.2.3; EC 1.3.3; and EC

1.4.3) use O2 as the electron acceptor and produce

hydrogen peroxide. Like dehydrogenases, they can

oxidize various organic substrates. For the general

oxidase assay, we therefore used the same substrate

pools as for dehydrogenases (20 amino acids, or 5

alcohols, or 8 organic acids, or 5 aldehydes, or 7 carbohydrates).

Oxidases have a broad pH profile (4.3–10.5) and usually do not need metal ions for activity.

Besides O2, these enzymes can also use other electron

acceptors (like 2,6-dichlorophenol indophenol, ferricyanide,

methylene blue, and tetrazolium salts), although

O2 supports higher activity. Therefore, the general oxi-

dase assay was performed using the five different sub-

strate pools at pH 8.0 and O2 as the electron acceptor

in a reaction mixture containing 50 mM HEPES–K buffer (pH 8.0), substrate, 0.1 mM o-dianisidine, and 2 µg of peroxidase per well (Table 1). The production of hydrogen peroxide was monitored in a coupled assay by the increase in absorbance at 460 nm using the chromogenic reaction of peroxidase and o-dianisidine [70].

4. Substrate profiling: screening proteins with natural

substrates

After detection of catalytic activity of unknown pro-

teins in general screens, it was important to develop

methods to identify natural substrates and cofactors.

To speed up the biochemical characterization of new

enzymes, we designed a set of secondary screens with natural substrates. The availability of purified proteins

and a set of rapid assays with different substrates en-

abled the rapid biochemical description of the enzymes

on the basis of substrate specificity. This process, which

we term ‘‘substrate profiling’’, may facilitate new

groupings of enzymes based on the chemical transfor-

mations that they catalyze and the small molecules

with which they interact rather than on sequence or structural properties. This approach has already been

successfully applied to identify phosphohydrolase activ-

ity of Nudix hydrolases and to quickly characterize

their substrate specificity [71,72]. Substrate profiling

may be an important aspect of biochemical proteomics,

particularly since many sequence- and structurally related proteins can perform considerably different chemistry [73,74].

Phosphorylated compounds comprise the largest

group of intracellular metabolites, and the phosphate

group is by far the most common constituent, found

in over one-third of all metabolites [75]. Over 70 various

phosphorylated compounds (nucleotides, carbohy-

drates, amino acids, and organic acids) are commercially

available (from Sigma) and were used as individual

substrates (one compound/well) or as a mix of several related compounds (nucleoside 5′-mono-, di-, or triphosphates, and nucleoside 3′-monophosphates), making a set of 46 substrates for the secondary phosphatase screen. The screen is based on the detection of released Pi with the highly sensitive Malachite Green reagent [53]. Screening was performed in 96-well microplates using 160 µl reaction mixtures containing 50 mM HEPES–K (pH 7.5), 0.1 mM substrate, 5 mM MgCl2, 0.5 mM MnCl2, 0.5 mM NiCl2, and 1–2 µg of protein. After 30–60 min incubation (at 37 °C or at 70 °C for thermophilic proteins), the reaction was terminated by the addition of 40 µl of Malachite Green reagent [53] and after 5 min the production of Pi was measured at 630 nm. Since most phosphohydrolases require a divalent metal cation for activity, we designed an additional screen with various metals (Mg2+, Mn2+, Ca2+, Co2+, Cu2+, Zn2+, and Ni2+) to quickly characterize the metal specificity of unknown proteins. These screens can also

be performed in 96-well microplates with general

(pNPP) or natural substrates.

For phosphodiesterases, there are three main groups

of natural substrates: cyclic nucleotides, nucleic acids

(single and double stranded), and phospholipids.

Phosphohydrolase activities against these substrates were assayed using published protocols [58,76] and commercially available biochemicals: 2′,3′-cAMP, 2′,3′-cCMP, 2′,3′-cGMP, 3′,5′-cAMP, 3′,5′-cCMP, 3′,5′-cGMP, 3′,5′-cIMP, 3′,5′-cTMP, 3′,5′-cUMP, double-stranded DNA (λ phage), single-stranded DNA

(M13 phage), tRNA (E. coli, yeast), rRNA (E. coli),

and phosphatidylcholine (all from Sigma).

Carboxylesterases and lipases are characterized by their ability to hydrolyze a broad range of substrates, with a preference for short-, medium-, or long-chain substrates [77]. Their substrate preference can be

conveniently determined using commercially available

(Sigma) p-nitrophenyl esters of fatty acids with different

chain length (pNP-acetate, pNP-propionate, pNP-butyrate, pNP-caproate, pNP-caprate, pNP-laurate, pNP-palmitate, and pNP-stearate) [78].

For the characterization of new dehydrogenases and

oxidases, their activity can be assayed with individual

substrates from the positive substrate pool (such as

20 L-amino acids) using the same reaction conditions

as for general assays.

5. Application of enzymatic screens for functional annotation

Over 600 different proteins purified individually from

E. coli, Pseudomonas aeruginosa, Thermotoga maritima,

Thermoplasma acidophilum, Methanobacterium thermo-

autotrophicum, Archaeoglobus fulgidus, and Methano-

coccus jannaschii using manual or semi-automatic

protocols were screened with all catalytic assays. Most of these proteins were selected for structural studies in

the Ontario Proteomics Centre at the University of

Toronto. They are annotated as hypothetical proteins

or as putative enzymes and have no close homologues

(less than 30% identity) with solved three-dimensional

structures. At this stage, no particular strategy has been

applied to select proteins for enzymatic screening, and

we tested all available proteins that showed a reasonable level of expression. Fig. 1 shows a representative initial

screen (performed in 96-well format) that identified a

potential new phosphatase in T. maritima (TM1254).

Any protein that showed catalytic activity in the initial screens was purified on a larger scale and further

characterized. It was important to confirm activity in

large scale because the assays, particularly the dehydrogenase assays, exhibited some false positives. Subsequent analysis revealed that false positives inevitably arose from poorly expressed proteins, which had higher levels of contamination from the E. coli lysate.

Fig. 1. Screening of purified proteins for phosphatase activity. (A) General phosphatase screen with pNPP as substrate. Different proteins (47) from T. maritima, T. acidophilum, M. thermoautotrophicum, and A. fulgidus were purified under native conditions and 10 µg (in duplicate) were loaded into microplate wells containing 200 µl of phosphatase reaction mixture [102]. A positive control was set up using 2 µg of calf intestinal phosphatase (CIP) from Sigma, and 10 µl of elution buffer were used for the negative control (well H11). The picture was taken after 1 h incubation at 65 °C. Positive reactions (indicated by the box) were obtained with the T. maritima protein TM1254. (B) Screening of TM1254 for phosphatase activity with natural substrates. Seventy phosphorylated compounds were added (one compound/well or as a mix of several related compounds) to a 96-well microplate containing the phosphatase reaction mixture [102] without enzyme (rows A, C, E, and G) or with 2 µg of TM1254 (rows B, D, F, and H) and incubated for 1 h at 65 °C. The reactions were stopped by the addition of Malachite Green reagent, which in the presence of free phosphate produced a strong green colour. Positive results were obtained with five substrates: (a) fructose 6-phosphate (C4, no TM1254; D4, +TM1254), (b) mannose 6-phosphate (C8, no TM1254; D8, +TM1254), (c) erythrose 4-phosphate (C11, no TM1254; D11, +TM1254), (d) 2-deoxyglucose 6-phosphate (E11, no TM1254; F11, +TM1254), and (e) pyridoxal phosphate (G7, no TM1254; H7, +TM1254).
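The paired-row layout described for Fig. 1B (rows A, C, E, and G without enzyme; rows B, D, F, and H with enzyme) lends itself to a simple background-subtracted hit call. The following is a minimal sketch of that idea, not the authors' analysis procedure: the function name, the absorbance readings, and the 0.2 threshold are all hypothetical.

```python
# Control row (no enzyme) paired with the enzyme row directly below it,
# following the plate layout described for Fig. 1B.
ROW_PAIRS = (("A", "B"), ("C", "D"), ("E", "F"), ("G", "H"))


def call_hits(a630: dict, threshold: float = 0.2) -> dict:
    """Return enzyme wells whose signal exceeds the paired control well.

    a630 maps well names (e.g. "D4") to Malachite Green absorbance.
    threshold is a hypothetical cut-off on the background-subtracted signal.
    """
    hits = {}
    for control_row, enzyme_row in ROW_PAIRS:
        for col in range(1, 13):  # 12 columns on a 96-well plate
            control_well = f"{control_row}{col}"
            enzyme_well = f"{enzyme_row}{col}"
            if control_well in a630 and enzyme_well in a630:
                net = a630[enzyme_well] - a630[control_well]
                if net >= threshold:
                    hits[enzyme_well] = net
    return hits


# Hypothetical readings: column 4 behaves like a positive substrate,
# column 5 like a negative one.
example_readings = {"C4": 0.05, "D4": 0.95, "C5": 0.06, "D5": 0.08}
example_hits = call_hits(example_readings)
```

Subtracting the no-enzyme well removes signal from free phosphate already present in the substrate stock, which matters because some phosphorylated compounds hydrolyse spontaneously at 65 °C.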

Our screening and characterization revealed 36 new

enzymes (Table 2). Half of them (17 proteins) are phos-

phatases indicating that the phosphatase assay with

pNPP as a substrate is quite reliable and generic. We de-

tected phosphatase activity in 15 out of 21 proteins anno-

tated as haloacid dehalogenase (HAD)-like hydrolases.

Ten years ago, Koonin and Tatusov defined this large superfamily of hydrolases with haloacid dehalogenase, phosphonatase, phosphatase, and β-phosphoglucomutase activities [79]. Phosphatase activity was also demonstrated in other HAD-like hydrolases from E. coli: YrbI (3-deoxy-D-manno-octulosonate 8-phosphate phosphatase), SerB (phosphoserine phosphatase), and OtsB (trehalose 6-phosphate phosphatase) [80–82]. Taken together, these experimental results show that most predicted HAD-like hydrolases will have phosphatase activity. Also, most proteins (13 out of 16) containing esterase/thioesterase sequence motifs showed esterase activity in our screens, indicating that computational prediction of this activity is also fairly accurate (these results are also discussed in Section 5.6).

We most probably have not detected all potential enzymes, and there are several reasons that might account for this. First, the substrates that we selected, though generic in design, may not adequately cover ‘‘substrate space’’, or perhaps the reaction conditions were sub-optimal. Second, it is possible that some of the enzymes were produced in an inactive form or were inactivated during the purification procedures. For example, it has been noted that some iron-containing proteins lost iron during purification, and some putative iron-sulphur proteins contained high levels of zinc and only a low percentage of iron. When recombinant proteins are highly expressed in E. coli, insertion of zinc into iron-binding sites due to the low bioavailability of iron is well documented [83–85]. Finally, not all proteins were present at the same level; it is possible that many were too dilute to reveal activity under the assay conditions.

The screens were applied to different categories of

proteins. In the first instance, the assays were used to

discover activity for proteins that had previously been un-annotated (hypothetical proteins) because they

lacked sequence similarity to any protein of known

function. Second, the assays were applied in a focused

search for a particular group of enzymes (nucleotidases

in this case). Third, the assays were able to identify en-

zymes that might have been mis-annotated in the gen-

ome database. Fourth, the assays were used to rapidly

test structure-based hypotheses for function. In these cases, the three-dimensional structure suggested a set

of possible activities, and the enzyme assays were used

to identify the correct one. Finally, the assays were used

to test known enzymes for new activities and to verify

sequence-based annotations.

Table 2
Uncharacterized proteins for which we have identified activity

| No. | Protein (Swiss-Prot ID) | Swiss-Prot prediction (or conserved sequence motif) | Activity (substrates) | Reference |
|---|---|---|---|---|
| 1 | EC^a YfcE (P76495) | Calcineurin-like phosphoesterase | Phosphodiesterase (bis-pNPP, pNP-TMP, pNPPC) | This work |
| 2 | EC SurE (P36664) | Acid phosphatase SurE | Nucleotidase (pNPP, 3′-AMP, dGMP, GMP, 3′-CMP) | [89] |
| 3 | EC YihX (P32145) | HAD^b-like hydrolase | Phosphatase (pNPP, β-glucose 1P) | This work |
| 4 | EC AstD (P76217) | Aldehyde dehydrogenase family | Aldehyde dehydrogenase (decanal, succinic semialdehyde, NAD) | This work |
| 5 | EC BioH (P13001) | Serine esterase (Pfam: α/β hydrolase) | Carboxylesterase (pNP-palmitate, pNP-acetate) | [61] |
| 6 | EC CCA (P06961) | HD domain (phosphohydrolase) | Phosphatase, nucleotidase, phosphodiesterase (pNPP, NADP, ADP, 2′-AMP, 2′,3′-cAMP, 2′,3′-cGMP) | [102] |
| 7 | EC YfbT (P77625) | HAD-like hydrolase | Phosphatase (pNPP, glucose 6P) | This work |
| 8 | EC YniC (P77247) | HAD-like hydrolase | Phosphatase (pNPP, 2-deoxyglucose 6P, mannose 6P) | This work |
| 9 | EC YqaB (P77475) | HAD-like hydrolase | Phosphatase (pNPP, fructose 1P, 6-phosphogluconate) | This work |
| 10 | EC YbhA (P21829) | HAD-like hydrolase | Phosphatase (pNPP, pyridoxal phosphate, erythrose 4P) | This work |
| 11 | EC YbjI (P75809) | HAD-like hydrolase | Phosphatase (pNPP, FMN, β-glucose 1P) | This work |
| 12 | EC YidA (P09997) | HAD-like hydrolase | Phosphatase (pNPP, erythrose 4P, mannose 1P) | This work |
| 13 | EC YbiV (P75792) | HAD-like hydrolase | Phosphatase (pNPP, fructose 1P, ribose 5P) | This work |
| 14 | EC YieH (P31467) | HAD-like hydrolase | Phosphatase (pNPP, phosphoenolpyruvate, AMP) | This work |
| 15 | EC YjjG (P33999) | HAD-like hydrolase | Nucleotidase (pNPP, UMP, dTMP, dUMP) | [89] |
| 16 | EC YbdB (P15050) | Thioesterase | Esterase (palmitoyl-CoA, pNP-butyrate) | This work |
| 17 | EC YdiI (P77781) | Thioesterase | Esterase (palmitoyl-CoA, pNP-butyrate) | This work |
| 18 | EC YjfP (P39298) | Esterase/lipase/thioesterase | Esterase (palmitoyl-CoA, pNP-butyrate) | This work |
| 19 | EC YbfF (P75736) | Serine esterase (Pfam: alpha/beta hydrolase) | Esterase (palmitoyl-CoA, malonyl-CoA, pNP-butyrate) | This work |
| 20 | EC YciA (P04379) | Thioesterase | Esterase (palmitoyl-CoA, malonyl-CoA) | This work |
| 21 | EC YpfH (P76561) | Serine esterase (Pfam: α/β hydrolase) | Esterase (palmitoyl-CoA, pNP-butyrate) | This work |
| 22 | EC YbgC (P08999) | Thioesterase | Esterase/thioesterase (malonyl-CoA) | This work |
| 23 | EC YbhC (P46130) | Pectinesterase | Esterase/thioesterase (palmitoyl-CoA) | This work |
| 24 | EC YeiG (P33018) | Putative esterase | Esterase (palmitoyl-CoA, pNP-butyrate) | This work |
| 25 | EC YfbB (P37355) | Serine esterase (Pfam: alpha/beta hydrolase) | Esterase (palmitoyl-CoA) | This work |
| 26 | EC YqiA (P36653) | Serine esterase | Esterase (palmitoyl-CoA, pNP-butyrate) | This work |
| 27 | EC YafA (P04335) | Serine esterase | Esterase (pNP-butyrate) | This work |
| 28 | EC YjfR (P39300) | Protein kinase-like (or Zn-dependent hydrolase) | Phosphodiesterase (bis-pNPP) | This work |
| 29 | EC YaeI (P37049) | Calcineurin-like phosphoesterase | Phosphodiesterase (bis-pNPP) | This work |
| 30 | EC YfbR (P76491) | HD domain (phosphohydrolase) | Nucleotidase (pNPP, 5′-dAMP, 5′-dCMP, 5′-dUMP) | [89] |
| 31 | TA0175 (Q9HLQ2) | HAD-like hydrolase | Phosphatase (pNPP, phosphoglycolate) | [87] |
| 32 | TA0845 (Q9HJW8) | HAD-like hydrolase | Phosphatase (pNPP) | This work |
| 33 | TM1254 (Q9X0Y1) | HAD-like hydrolase | Phosphatase (pNPP, erythrose 4P, fructose 6P) | This work |
| 34 | TM1643 (Q9X1X6) | Domain of unknown function DUF108 | Aspartate dehydrogenase (L-aspartate) | [98] |
| 35 | MJ0936 (Q58346) | Calcineurin-like phosphoesterase | Phosphodiesterase (bis-pNPP, pNP-TMP, pNPPC) | [76] |
| 36 | PA0065 (Q9I767) | HAD-like hydrolase | Phosphatase (pNPP, 5′-UMP, 5′-IMP) | This work |

^a Organism designation: EC, E. coli; TA, T. acidophilum; TM, T. maritima; MJ, M. jannaschii; PA, P. aeruginosa.
^b HAD, haloacid dehalogenase.

5.1. Annotation of ‘‘hypothetical’’ proteins

Pseudomonas aeruginosa PA0065 protein (Q9I767) was annotated as a ‘‘hypothetical protein’’, though predicted to belong to the haloacid dehalogenase

(HAD)-like hydrolase superfamily [79], a large group of hydrolytic enzymes with over 2,800 entries in the EMBL-EBI database. These proteins have low overall sequence similarity (<29% identity), but higher similarity surrounding four short catalytic motifs. The vast majority of HAD-like hydrolases have unknown function, while members with known function catalyze one of the following five activities: dehalogenase, phosphonohydrolase, phosphatase, phosphoglucomutase, or ATPase. Purified PA0065 was active in our phosphatase screens with pNPP (Table 1). The enzyme had an optimum pH of 7.4 and required divalent metal ions for activity (Mn2+ > Mg2+). The substrate specificity of PA0065 toward natural phosphatase substrates was then explored. In this screen, PA0065 produced a strong positive signal with the mixture of 5′-nucleoside monophosphates. Further analysis demonstrated that this protein displayed high activity toward 5′-UMP and 5′-IMP, significant activity against 5′-XMP and 5′-TMP, and low activity against 5′-CMP (Fig. 2E). The highest activity was displayed toward 5′-UMP (Vmax = 3.14 ± 0.05 µmol min⁻¹ mg⁻¹ protein). With this substrate, PA0065 preferred Mn2+ as the metal and showed classical Michaelis–Menten kinetics with Km = 0.39 ± 0.02 mM. Although PA0065 had 24% sequence identity to the recently characterized E. coli phosphoglycolate phosphatase (Gph, P32662) [86], the Pseudomonas protein could not hydrolyse phosphoglycolate. Our results indicate that PA0065 is a 5′-nucleotidase, and its strict substrate specificity suggests that this enzyme plays a unique function in the intracellular nucleotide metabolism in Pseudomonas.
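The kinetic constants reported for PA0065 with 5′-UMP can be plugged into the Michaelis-Menten equation to estimate the rate expected at any substrate concentration. The sketch below is illustrative only; the function and constant names are ours, and the 0.1 mM example concentration is the substrate concentration used in the secondary phosphatase screen.

```python
def michaelis_menten(s_mm: float, vmax: float, km_mm: float) -> float:
    """Initial rate v = Vmax * [S] / (Km + [S]); [S] and Km in mM."""
    return vmax * s_mm / (km_mm + s_mm)


# Constants reported for PA0065 with 5'-UMP.
PA0065_VMAX = 3.14  # µmol min^-1 mg^-1 protein
PA0065_KM = 0.39    # mM

# Expected specific activity at the 0.1 mM substrate concentration
# used in the secondary phosphatase screen.
rate_at_screen = michaelis_menten(0.1, PA0065_VMAX, PA0065_KM)

# Fraction of Vmax reached under those conditions.
fraction_of_vmax = rate_at_screen / PA0065_VMAX
```

At 0.1 mM 5′-UMP the enzyme would run at about one fifth of Vmax (roughly 0.64 µmol min⁻¹ mg⁻¹), which illustrates why a screen at sub-saturating substrate can still give a clear signal.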

Like Pseudomonas PA0065, the other ‘‘hypothetical’’

protein, T. acidophilum TA0175 (Q9HLQ2), was also

predicted to be a HAD-like hydrolase. In sequence databases, there are more than 50 similar proteins with sequence identities ranging from 75% to 22%, none of which had a known biological function. This protein was found active in general phosphatase screens with

pNPP. Further analysis demonstrated that TA0175 is

an Mg2+-dependent phosphoglycolate phosphatase that

also has significant pyrophosphatase activity [87]. How-

ever, two HAD-like hydrolases with experimentally ver-

ified phosphoglycolate phosphatase activity, TA0175

from T. acidophilum and Gph from E. coli, have only

20% sequence identity to each other. These results clearly demonstrate that for HAD-like hydrolases the

substrate specificity cannot be identified on the basis

of sequence similarity and must be determined

experimentally.

Both E. coli YfcE (P76495) and M. jannaschii

MJ0936 (Q58346) were annotated as ‘‘hypothetical’’

proteins and contain a ‘‘calcineurin-like’’ phosphoesterase motif, suggesting that they may have phosphomonoesterase, phosphodiesterase, or phosphotriesterase activity (PEDANT database). Both

proteins produced positive signals in our phosphodies-

terase screens with bis-pNPP as substrate. YfcE was

active only in the presence of Mn2+, whereas

MJ0936 showed highest activity with Ni2+ [76].

Besides bis-pNPP, both proteins also hydrolysed

(but with lower activity) two other chromogenic substrates for phosphodiesterases, pNP-TMP and

p-nitrophenylphosphorylcholine.

5.2. Identification of missing enzymes: E. coli

nucleotidases

Nucleotidases (EC 3.1.3.5 and EC 3.1.3.6) are phosphatases that specifically dephosphorylate nucleoside monophosphates to nucleosides and inorganic phosphate. Seven mammalian 5′-nucleotidases with different amino acid sequences and substrate specificities have been identified and characterized [88]. In contrast to the well-characterized mammalian nucleotidases, the field of prokaryotic nucleotidases remains unexplored and no intracellular nucleotidases have been reported in E. coli. To find proteins with nucleotidase activity in E. coli, purified unknown proteins were screened for the presence of phosphatase activity using the general phosphatase substrate pNPP. Proteins exhibiting catalytic activity were then assayed for nucleotidase activity against various nucleotides. These screens identified the presence of nucleotidase activity in three uncharacterised E. coli proteins: SurE, YfbR, and YjjG [89]. These proteins show no significant sequence similarity to each other (15.5–18.3% identity) and belong to different phosphohydrolase families: SurE-like, HD domain (YfbR), and haloacid dehalogenase (HAD)-like hydrolases (YjjG). The nucleotidase activity of these proteins had a neutral pH optimum (pH 7.0–8.0) and was strictly dependent on the presence of divalent metal cations (YfbR: Co2+ > Mn2+ > Cu2+; YjjG: Mg2+ > Mn2+ > Co2+). Further biochemical characterization of YfbR revealed that it was strictly specific to deoxyribonucleoside 5′-monophosphates, whereas YjjG showed narrow specificity to 5′-dTMP, 5′-dUMP, and 5′-UMP (Fig. 2A and B). The observed substrate affinities (Km) of YjjG and YfbR (0.01–0.8 mM) are within the range reported for known nucleotidases [88] (BRENDA database). These two proteins also exhibited different sensitivities to inhibition by various nucleoside di- and triphosphates. YjjG was insensitive and YfbR was equally sensitive to both di- and triphosphates. The differences in their sensitivities to nucleotides and their varied substrate specificities suggest that these enzymes have unique functions in the nucleotide metabolism in E. coli.

5.3. Mis-annotated proteins

Thermotoga maritima TM1254 (Q9X0Y1) is annotated as a ‘‘putative β-phosphoglucomutase’’ despite sharing only 27.2% sequence identity with the recently characterized β-phosphoglucomutase PgmB from Lactococcus lactis [90]. In our assays, purified TM1254 did not have phosphoglucomutase activity (data not shown). Rather, this protein demonstrated high phosphatase activity in screens with pNPP (Fig. 1A), indicating that the sequence-based annotation of TM1254 as a phosphoglucomutase is incorrect. Further analysis showed that TM1254 is a phosphatase that requires a divalent metal cation for catalysis (Co2+ > Mg2+ > Mn2+ > Ni2+). Secondary screens with natural phosphatase substrates (substrate profiling) identified high phosphatase activity toward erythrose 4-phosphate, fructose 6-phosphate, 2-deoxyglucose 6-phosphate, and mannose 6-phosphate (Figs. 1B and 2D). With erythrose 4-phosphate, purified TM1254 showed sigmoidal saturation kinetics with a Hill coefficient nH = 1.31 ± 0.45, indicating positive cooperativity in erythrose 4-phosphate binding. The protein had high affinity for this substrate (apparent Km = 152.6 ± 55.0 µM and Vmax = 2.63 ± 0.56 µmol min⁻¹ mg⁻¹ protein) and high catalytic efficiency (kcat/Km = 8.13 × 10³ M⁻¹ s⁻¹). To our knowledge, the intracellular concentration of erythrose 4-phosphate has not been reported for any organism. However, the affinity of TM1254 for this substrate is among the highest in the range of Km values (0.15–20 mM) calculated for other enzymes metabolising erythrose 4-phosphate (e.g., erythrose 4-phosphate dehydrogenase or 3-deoxy-D-arabino-heptulosonate-7-phosphate synthase) [91–93]. With fructose 6-phosphate, TM1254 also exhibited sigmoidal saturation kinetics and high affinity (Km = 0.2 ± 0.02 mM). Our results indicate that TM1254 is a broad substrate range phosphatase with a potential role in the intracellular metabolism of many phosphorylated carbohydrates.

Fig. 2. Substrate profiles of new phosphohydrolases identified by general enzymatic screens. (A) E. coli YfbR; (B) E. coli YjjG; (C) E. coli SurE; (D) T. maritima TM1254; (E) P. aeruginosa PA0065; and (F) E. coli CCA. 100% activities were (in µmol min⁻¹ mg⁻¹ protein): YfbR, 0.71; YjjG, 73.9; SurE, 20.1; TM1254, 2.63; PA0065, 3.14; and CCA, 17.9.
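The sigmoidal kinetics reported for TM1254 with erythrose 4-phosphate can be written as a Hill equation, and the turnover number follows from the reported catalytic efficiency and apparent Km. This is a minimal sketch under one stated assumption: the apparent Km is treated as the half-saturation constant of the Hill equation. Function names are illustrative.

```python
def hill_rate(s: float, vmax: float, k_half: float, n_h: float) -> float:
    """Hill equation: v = Vmax * S^n / (K^n + S^n); s and k_half share units."""
    return vmax * s ** n_h / (k_half ** n_h + s ** n_h)


def kcat_from_efficiency(efficiency_per_m_s: float, km_molar: float) -> float:
    """kcat (s^-1) recovered from kcat/Km (M^-1 s^-1) and Km (M)."""
    return efficiency_per_m_s * km_molar


# Values reported for TM1254 with erythrose 4-phosphate.
VMAX = 2.63      # µmol min^-1 mg^-1 protein
KM_UM = 152.6    # µM, apparent Km (assumed here to be the half-saturation constant)
N_H = 1.31       # Hill coefficient
EFFICIENCY = 8.13e3  # M^-1 s^-1

rate_at_k_half = hill_rate(KM_UM, VMAX, KM_UM, N_H)   # half of Vmax by definition
kcat = kcat_from_efficiency(EFFICIENCY, KM_UM * 1e-6)  # about 1.24 s^-1
```

With a Hill coefficient above 1, the rate falls off faster than hyperbolic kinetics below the half-saturation point, so the comparison with simple Michaelis-Menten enzymes is most meaningful near or above the apparent Km.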

Escherichia coli SurE (P36664) was annotated as an acid phosphatase on the basis that its Yarrowia lipolytica homologue (PHO2; 21.5% sequence identity to the E. coli SurE) complemented mutations in two of the major acid phosphatases of S. cerevisiae [94]. The purified E. coli SurE was active in the phosphatase screen with pNPP and had a neutral pH optimum (7.0) for activity, with Mn2+ being the best metal cofactor [89]. This protein has a broad substrate specificity and can dephosphorylate various ribo- and deoxyribonucleoside 5′-monophosphates and ribonucleoside 3′-monophosphates, with highest affinity for 3′-AMP (Fig. 2C). Our biochemical studies of the E. coli SurE [89] and the previous data on two SurE proteins from the thermophilic bacterium T. maritima [95,96] and from the archaebacterium Pyrobaculum aerophilum [97] clearly demonstrated that the annotation of SurE proteins as acid phosphatases is not accurate. Acid phosphatases (EC 3.1.3.2) comprise a large group of non-specific phosphohydrolases capable of hydrolyzing a broad range of phosphorylated sugars, amino acids, and nucleoside mono-, di-, and triphosphates (BRENDA database). In contrast to these non-specific acid phosphatases, SurE proteins from E. coli, T. maritima, and P. aerophilum showed strict specificity to nucleoside 5′(3′)-monophosphates and, therefore, should be annotated as 5′(3′)-nucleotidases.

5.4. Testing structure-based hypotheses

Our generic enzymatic screens have also been used to

verify hypotheses generated by structural studies of pro-

teins with unknown biochemical function. Structural

studies are commonly used to suggest the EC (Enzyme Commission) class (hydrolase, oxidoreductase) to which

a particular protein belongs. For example, the function

of T. maritima TM1643 (and its more than 15 homo-

logues in bacteria, archaea and eukaryotes) could not

be deduced from its sequence, because it does not share

any recognizable similarity to other proteins of known

function or structure. Analysis of the crystal structure

of TM1643 revealed the presence of a bound NAD, suggesting that this protein was a dehydrogenase [98]. To

test this hypothesis and to identify substrate(s) for this

protein, we applied our general dehydrogenase assays

with three mixtures of substrates as electron donors (Ta-

ble 1). Assays with amino acid pools detected significant

dehydrogenase activity, and, when tested with individual

amino acids, TM1643 was shown to be strictly specific

toward L-Asp. Therefore, TM1643 and its homologues were revealed to be aspartate dehydrogenases, an enzymatic activity that had not been previously reported

[98]. In T. maritima, the TM1643 gene is part of the

Nad operon involved in de novo biosynthesis of NAD

from aspartate, and the BLAST analysis did not reveal

the presence of an L-Asp oxidase homologue in this

organism. Therefore, it has been suggested that

TM1643 catalyzes the first reaction of NAD biosynthesis, providing the iminoaspartate required for this

pathway [98].

The assays were also used to test a structure-based

hypothesis for E. coli BioH (P13001), which is involved

in biotin biosynthesis and whose biochemical function

was unknown [99]. The crystal structure of this protein

was determined and its automated analysis identified a

catalytic triad (Ser82, His235 and Asp207) with a similar configuration to the catalytic triad of hydrolases [61].

Enzymatic screening of purified BioH with a panel of

hydrolase assays (esterase, lipase, thioesterase, phospha-

tase, and protease) revealed a carboxylesterase activity

with preference for short acyl chain substrates. These

two examples show that combined use of structural

analysis and experimental screen for detecting enzyme

activity can efficiently provide biochemical confirmation of structure-based hypotheses.

5.5. New activities for known enzymes

The E. coli CCA protein, tRNA nucleotidyltransferase (P06961), is a well-characterized enzyme that carries out synthesis of the CCA terminus of tRNAs [100,101] and is involved in the repair of tRNA. In general enzymatic screens, CCA protein hydrolyzed pNPP in the presence of Mg2+ and Mn2+; this activity has not been reported in previous studies. Further experiments demonstrated that the E. coli CCA showed highest phosphatase activity in the presence of Ni2+ and hydrolyzed pyrophosphate, canonical 5′-nucleoside tri- and diphosphates, NADP, and 2′-AMP with the production of Pi [102]. Assays with phosphodiesterase substrates revealed a surprising metal-independent phosphodiesterase activity toward 2′,3′-cAMP, 2′,3′-cGMP, and 2′,3′-cCMP (Fig. 2F). Without metal or in the presence of Mg2+, this protein hydrolyzed 2′,3′-cyclic substrates with the formation of 2′-nucleotides, whereas in the presence of Ni2+, it also produced some 3′-nucleotides [102]. The E. coli CCA comprises two domains: an N-terminal domain containing the nucleotidyltransferase activity and an uncharacterised C-terminal HD domain. The HD motif defines a superfamily of metal-dependent phosphohydrolases that includes a variety of uncharacterised proteins and domains associated with nucleotidyltransferases and helicases from bacteria, archaea, and eukaryotes [103]. Mutations at the conserved His-255 and Asp-256 residues of the C-terminal HD domain of the E. coli CCA protein inactivated both phosphodiesterase and phosphatase activities, indicating that these activities are associated with the HD domain [102]. Low concentrations of the E. coli tRNA (10 nM) had a strong inhibitory effect on both phosphatase and phosphodiesterase activities. The competitive character of inhibition by tRNA suggests that it might be a natural substrate for these activities. On the basis of this biochemical information, the following model for the role of the HD domain in the E. coli CCA protein was proposed [102]. This model is based on the assumption that the degradation of tRNA by intracellular RNases produces tRNA molecules with a 2′,3′-cyclic phosphate at the 3′-end [104–106]. In the repair process, the phosphodiesterase activity of the HD domain might hydrolyze the cyclic phosphodiester to a 2′-monophosphate, which would then be predicted to be removed by the HD domain 2′-nucleotidase activity. These activities would eventually produce the unphosphorylated 3′-end of tRNA suitable for the nucleotidyltransferase reaction. Thus, the E. coli CCA protein is a multifunctional enzyme with 2′,3′-cyclic phosphodiesterase, 2′-nucleotidase, phosphatase, and nucleotidyltransferase activities, and we suggest that these activities act in concert to repair the 3′-end of tRNA. These results also show that general enzymatic assays can identify new catalytic activities in already characterized proteins.

5.6. Confirmation of sequence-based gene annotations

As many as 15% of genome annotations are incorrect, providing a clear rationale to produce experimental evidence for those proteins that have been annotated on the basis of distant sequence relationships. General enzymatic assays were used to verify sequence-based predictions for 14 E. coli proteins that were annotated as putative esterases or as hypothetical proteins containing the esterase-like functional sites (esterase/lipase/thioesterase; InterPro database; http://www.ebi.ac.uk/interpro/). These proteins were over-expressed, purified, and screened for esterase activity using palmitoyl-CoA as a substrate. Eleven proteins showed high or significant catalytic activity toward this substrate (Table 2), representing a discovery rate of 78%. Among the 14 proteins tested, nine possessed a serine hydrolase-like active site (IPR000379). Of these nine, six demonstrated high esterase activity and three were negative. These results indicate either that this InterPro group contains other, non-esterase activities or that the three proteins were somehow inactive in the experimental conditions tested. All the proteins from the other InterPro groups that were tested (YbdB, YdiI: IPR006683, thioesterase; YciA: IPR002590, acyl-CoA thioesterase; YbgC: IPR000365, 4-hydroxybenzoyl-CoA thioesterase; and YbhC: IPR000070, pectinesterase) were active toward palmitoyl-CoA, confirming the sequence-based predictions for these proteins. Our results clearly demonstrated that sequence-based annotations can be quickly tested using general enzymatic assays, and that the palmitoyl-CoA assay may be a powerful screen to discover esterases.
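The counts in this paragraph can be tallied by InterPro group to check the discovery rate. The sketch below reads the figures as 14 tested proteins: nine in the serine hydrolase-like group (six active, three negative) and five in the other groups (all active). The dictionary layout and names are illustrative.

```python
# (tested, active) per InterPro group, as read from the paragraph above.
esterase_screen = {
    "IPR000379 serine hydrolase-like": (9, 6),
    "IPR006683 thioesterase (YbdB, YdiI)": (2, 2),
    "IPR002590 acyl-CoA thioesterase (YciA)": (1, 1),
    "IPR000365 4-hydroxybenzoyl-CoA thioesterase (YbgC)": (1, 1),
    "IPR000070 pectinesterase (YbhC)": (1, 1),
}

total_tested = sum(tested for tested, _ in esterase_screen.values())
total_active = sum(active for _, active in esterase_screen.values())
discovery_rate_percent = 100.0 * total_active / total_tested
```

The tally gives 11 active proteins out of 14, or about 78.6%, consistent with the discovery rate quoted in the text.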

Escherichia coli YjjN is annotated as a hypothetical Zn-type alcohol dehydrogenase-like protein (InterPro domain IPR002328), and YjgI as a hypothetical oxidoreductase from the short-chain dehydrogenase superfamily (IPR002198). These proteins, as well as three E. coli proteins annotated as dehydrogenases (AdhP, AstD, and GpsA), were purified and screened for dehydrogenase activity using the substrate pools presented in this review. While our screens confirmed the presence of dehydrogenase activity in the three annotated proteins, the two hypothetical dehydrogenases, YjjN and YjgI, did not show activity, suggesting that these proteins might not be dehydrogenases or that they are very specific for their natural substrates. The E. coli AstD (P76217) was annotated as succinylglutamic semialdehyde dehydrogenase and this activity has been found in crude cell extracts [107], but AstD has never been purified and tested for this activity. Our assays on purified AstD revealed low activity with benzaldehyde and NAD as substrates, but high activity with decanal as the electron donor. Michaelis–Menten kinetics were obtained with Vmax = 5.0 ± 0.5 μmol/min per mg protein and Km = 91 ± 20 μM for decanal. The protein exhibited high affinity for NAD (Km = 160 ± 11 μM at pH 8.5), and no activity was detected with NADP as the electron acceptor. Divalent metal cations (5 mM Mg2+, 0.5 mM Mn2+, 0.5 mM Zn2+, and 0.5 mM Ni2+) showed no observable effect on enzyme activity. Succinylglutamic semialdehyde, the proposed natural electron donor for AstD [107], is not commercially available, but we observed significant activity of this protein with succinic semialdehyde as a substrate (apparent Vmax = 0.69 ± 0.02 μmol/min per mg protein, apparent Km = 1.72 ± 0.22 mM at pH 8.5 and 0.5 mM NAD). Our results demonstrated that E. coli AstD is an aldehyde dehydrogenase with broad substrate specificity.
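The kinetic constants reported above can be compared with a short calculation. The following is a minimal sketch, assuming simple Michaelis–Menten behaviour for both substrates; the names KineticFit, rate, and specificity are illustrative and do not come from the original study, and Vmax/Km is used here only as a rough proxy for substrate preference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KineticFit:
    """Michaelis-Menten parameters for one substrate (values from the text)."""
    substrate: str
    vmax: float   # µmol/min per mg protein
    km_mm: float  # mM


def rate(fit: KineticFit, s_mm: float) -> float:
    """Initial rate v = Vmax * [S] / (Km + [S]), with [S] in mM."""
    return fit.vmax * s_mm / (fit.km_mm + s_mm)


def specificity(fit: KineticFit) -> float:
    """Vmax/Km, a rough proxy for substrate preference (an assumed metric)."""
    return fit.vmax / fit.km_mm


# Reported fits for purified AstD; 91 µM is written as 0.091 mM.
DECANAL = KineticFit("decanal", vmax=5.0, km_mm=0.091)
SSA = KineticFit("succinic semialdehyde", vmax=0.69, km_mm=1.72)

if __name__ == "__main__":
    for fit in (DECANAL, SSA):
        half = rate(fit, fit.km_mm)  # by definition, Vmax/2 at [S] = Km
        print(f"{fit.substrate}: v at [S]=Km is {half:.3f}, "
              f"Vmax/Km = {specificity(fit):.2f}")
    print(f"Vmax/Km ratio: {specificity(DECANAL) / specificity(SSA):.0f}")
```

With these numbers, Vmax/Km for decanal is roughly 137 times that for succinic semialdehyde. Because the succinic semialdehyde values are apparent constants measured at 0.5 mM NAD, the ratio should be read as an order-of-magnitude indication only.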

6. Concluding remarks and prospects

In any genome, including E. coli and human, at least 75% of known enzymes belong to three EC classes, the oxidoreductases, the hydrolases, and the transferases (PEDANT database). It is therefore highly likely that many unknown proteins will also have these activities. In this article, we presented several general enzymatic assays that can be used for the detection of the catalytic activities of hydrolases and oxidoreductases (phosphatases, phosphodiesterases, nucleases, esterases, proteases, dehydrogenases, and oxidases). With the advent of automated approaches for protein purification and several large-scale proteomics efforts that generate purified proteins and protein complexes, this approach represents a powerful and potentially widely applicable method for high-throughput, parallel screening of large sets of proteins (genome-wide protein libraries) on the basis of catalytic activity. Considerable challenges still exist in terms of protein overexpression, solubility, purification, and development of new general assays, and they should be addressed in the future. The substrate pools should also be refined continuously in order to increase the efficiency of the enzyme activity screen. Nevertheless, this initial study yielded new catalytic annotations for 36 proteins out of a few hundred proteins screened, in addition to confirming the activities proposed for three proteins (E. coli BioH, TM1643, TA0175) on the basis of structural or sequence motifs.

Global genome sequencing efforts continue to produce increasing quantities of hypothetical genes. Of approximately one million protein sequences in sequence databases, nearly 40% cannot be assigned any putative function; likewise, the Protein Data Bank (PDB) contains approximately 500 three-dimensional structures for proteins annotated as “hypothetical” (roughly 1 per 50 entries) [108]. Moreover, there are 1,437 enzyme activities for which Enzyme Commission numbers have been assigned but for which no sequence can be found in sequence databases (orphan activities) [109]. Since one enzymatic activity can be catalyzed by more than one family of proteins, there might be four or more times that number of distinct gene families encoding enzymes with those activities [109]. Altogether, these problems hinder progress in many areas of research, ranging from genome annotation to metabolic engineering. Accordingly, these issues were addressed in two recent calls for an effort by the scientific community to experimentally determine functions for “hypothetical” proteins and sequences for “orphan” enzymatic activities [109,110], and two “priority” lists of conserved hypothetical proteins have already been proposed for experimental study [111]. Characterization of these proteins will provide important enzymatic information pertaining to a vast range of organisms. Application of general enzymatic screens and substrate profiling will improve the efficiency of functional annotation of “hypothetical” proteins and lead to the discovery of new enzymes.

Acknowledgements

We thank all members of the Ontario Centre for Structural Proteomics for help in conducting experiments, as well as our collaborators Sung-Hou Kim, Rosalind Kim, Liang Tong, Dinesh Christendat, Andrzej Joachimiak, and Wayne Anderson. Yurij Korniyenko and Greg Brown are thanked for their help with protein purification. The financial support from Genome Canada, the Ontario Research and Development Challenge Fund, and the Protein Structure Initiative of the National Institutes of Health (GM62414 and GM62413) is greatly appreciated. C.H.A. is a Scientist of the Canadian Institutes of Health Research (CIHR). A.M.E. is the Banbury Chair of Medical Research at the University of Toronto.

References

[1] Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang,

Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and

PSI-BLAST: a new generation of protein database search

programs. Nucleic Acids Res. 25, 3389–3402.

[2] Green, M.L. and Karp, P.D. (2004) A Bayesian method for

identifying missing enzymes in predicted metabolic pathway

databases. BMC Bioinformatics 5, 76.

[3] Gardner, M.J., Shallom, S.J., Carlton, J.M., Salzberg, S.L.,

Nene, V., Shoaibi, A., Ciecko, A., Lynn, J., Rizzo, M., Weaver,

B., Jarrahi, B., Brenner, M., Parvizi, B., Tallon, L., Moazzez, A.,

Granger, D., Fujii, C., Hansen, C., Pederson, J., Feldblyum, T.,

Peterson, J., Suh, B., Angiuoli, S., Pertea, M., Allen, J., Selengut,

J., White, O., Cummings, L.M., Smith, H.O., Adams, M.D.,

Venter, J.C., Carucci, D.J., Hoffman, S.L. and Fraser, C.M.

(2002) Sequence of Plasmodium falciparum chromosomes 2, 10,

11, and 14. Nature 419, 531–534.

[4] Liang, P., Labedan, B. and Riley, M. (2002) Physiological

genomics of Escherichia coli protein families. Physiol. Genomics

9, 15–26.

[5] Galperin, M.Y. and Koonin, E.V. (2000) Who’s your neighbor? New computational approaches for functional genomics. Nat.

Biotechnol. 18, 609–613.

[6] Huynen, M., Snel, B., Lathe, W. and Bork, P. (2000) Exploi-

tation of gene context. Curr. Opin. Struct. Biol. 10, 366–370.

[7] Osterman, A. and Overbeek, R. (2003) Missing genes in

metabolic pathways: a comparative genomics approach. Curr.

Opin. Chem. Biol. 7, 238–251.

[8] Koonin, E.V. and Galperin, M.Y. (2003) Sequence – Evolution –

Function: Computational Approaches in Comparative Genom-

ics. Kluwer Academic Publishers, Boston, 461 pp.

[9] Phizicky, E.M. and Fields, S. (1995) Protein–protein interac-

tions: methods for detection and analysis. Microbiol. Rev. 59,

94–123.

[10] Ito, T., Tashiro, K., Muta, S., Ozawa, R., Chiba, T., Nishizawa,

M., Yamamoto, K., Kuhara, S. and Sakaki, Y. (2000) Toward a

protein–protein interaction map of the budding yeast: a

comprehensive system to examine two-hybrid interactions in

all possible combinations between the yeast proteins. Proc. Natl.

Acad. Sci. USA 97, 1143–1147.

[11] Uetz, P., Giot, L., Cagney, G., Mansfield, T.A., Judson, R.S.,

Knight, J.R., Lockshon, D., Narayan, V., Srinivasan, M.,

Pochart, P., Qureshi-Emili, A., Li, Y., Godwin, B., Conover,

D., Kalbfleisch, T., Vijayadamodar, G., Yang, M., Johnston,


M., Fields, S. and Rothberg, J.M. (2000) A comprehensive

analysis of protein–protein interactions in Saccharomyces cere-

visiae. Nature 403, 623–627.

[12] Karimova, G., Pidoux, J., Ullmann, A. and Ladant, D. (1998) A

bacterial two-hybrid system based on a reconstituted signal

transduction pathway. Proc. Natl. Acad. Sci. USA 95, 5752–

5756.

[13] Dove, S.L., Joung, J.K. and Hochschild, A. (1997) Activation of

prokaryotic transcription through arbitrary protein–protein

contacts. Nature 386, 627–630.

[14] Puig, O., Caspary, F., Rigaut, G., Rutz, B., Bouveret, E.,

Bragado-Nilsson, E., Wilm, M. and Seraphin, B. (2001) The

tandem affinity purification (TAP) method: a general procedure

of protein complex purification. Methods 24, 218–229.

[15] Jansen, R., Yu, H., Greenbaum, D., Kluger, Y., Krogan, N.J.,

Chung, S., Emili, A., Snyder, M., Greenblatt, J.F. and Gerstein,

M. (2003) A Bayesian networks approach for predicting protein–

protein interactions from genomic data. Science 302, 449–453.

[16] Zhu, H. and Snyder, M. (2001) Protein arrays and microarrays.

Curr. Opin. Chem. Biol. 5, 40–45.

[17] Predki, P.F. (2004) Functional protein microarrays: ripe for

discovery. Curr. Opin. Chem. Biol. 8, 8–13.

[18] Brown, P.O. and Botstein, D. (1999) Exploring the new world of

the genome with DNA microarrays. Nat. Genet. 21, 33–37.

[19] DeRisi, J.L., Iyer, V.R. and Brown, P.O. (1997) Exploring the

metabolic and genetic control of gene expression on a genomic

scale. Science 278, 680–686.

[20] Arigoni, F., Talabot, F., Peitsch, M., Edgerton, M.D., Meldrum,

E., Allet, E., Fish, R., Jamotte, T., Curchod, M.L. and Loferer,

H. (1998) A genome-based approach for the identification of

essential bacterial genes. Nat. Biotechnol. 16, 851–856.

[21] Akerley, B.J., Rubin, E.J., Camilli, A., Lampe, D.J., Robertson,

H.M. and Mekalanos, J.J. (1998) Systematic identification of

essential genes by in vitro mariner mutagenesis. Proc. Natl.

Acad. Sci. USA 95, 8927–8932.

[22] Reich, K.A., Chovan, L. and Hessler, P. (1999) Genome

scanning in Haemophilus influenzae for identification of essential

genes. J. Bacteriol. 181, 4961–4968.

[23] Winzeler, E.A., Shoemaker, D.D., Astromoff, A., Liang, H.,

Anderson, K., Andre, B., Bangham, R., Benito, R., Boeke, J.D.,

Bussey, H., Chu, A.M., Connelly, C., Davis, K., Dietrich, F.,

Dow, S.W., El Bakkoury, M., Foury, F., Friend, S.H., Gentalen,

E., Giaever, G., Hegemann, J.H., Jones, T., Laub, M., Liao, H.

and Davis, R.W. (1999) Functional characterization of the S.

cerevisiae genome by gene deletion and parallel analysis. Science

285, 901–906.

[24] Ross-Macdonald, P., Coelho, P.S., Roemer, T., Agarwal, S.,

Kumar, A., Jansen, R., Cheung, K.H., Sheehan, A., Symoniatis,

D., Umansky, L., Heidtman, M., Nelson, F.K., Iwasaki, H.,

Hager, K., Gerstein, M., Miller, P., Roeder, G.S. and Snyder, M.

(1999) Large-scale analysis of the yeast genome by transposon

tagging and gene disruption. Nature 402, 413–418.

[25] Dreger, M. (2003) Proteome analysis at the level of subcellular

structures. Eur. J. Biochem. 270, 589–599.

[26] Cai, Y.D. and Chou, K.C. (2004) Predicting 22 protein local-

izations in budding yeast. Biochem. Biophys. Res. Commun.

323, 425–428.

[27] Zarembinski, T.I., Hung, L.W., Mueller-Dieckmann, H.J., Kim,

K.K., Yokota, H., Kim, R. and Kim, S.H. (1998) Structure-

based assignment of the biochemical function of a hypothetical

protein: a test case of structural genomics. Proc. Natl. Acad. Sci.

USA 95, 15189–15193.

[28] Christendat, D., Yee, A., Dharamsi, A., Kluger, Y., Savchenko,

A., Cort, J.R., Booth, V., Mackereth, C.D., Saridakis, V., Ekiel,

I., Kozlov, G., Maxwell, K.L., Wu, N., McIntosh, L.P.,

Gehring, K., Kennedy, M.A., Davidson, A.R., Pai, E.F.,

Gerstein, M., Edwards, A.M. and Arrowsmith, C.H. (2000)

Structural proteomics of an archaeon. Nat. Struct. Biol. 7, 903–

909.

[29] Edwards, A.M., Arrowsmith, C.H., Christendat, D., Dharamsi,

A., Friesen, J.D., Greenblatt, J.F. and Vedadi, M. (2000) Protein

production: feeding the crystallographers and NMR spectros-

copists. Nat. Struct. Biol. 7 (Suppl), 970–972.

[30] Norin, M. and Sundstrom, M. (2002) Structural proteomics:

developments in structure-to-function predictions. Trends Bio-

technol. 20, 79–84.

[31] Zhang, C. and Kim, S.H. (2003) Overview of structural

genomics: from structure to function. Curr. Opin. Chem. Biol.

7, 28–32.

[32] Laskowski, R.A., Watson, J.D. and Thornton, J.M. (2003)

From protein structure to biochemical function? J. Struct. Funct.

Genomics 4, 167–177.

[33] Kim, Y., Dementieva, I., Zhou, M., Wu, R., Lezondra, L.,

Quartey, P., Joachimiak, G., Korolev, O., Li, H. and Joachi-

miak, A. (2004) Automation of protein purification for struc-

tural genomics. J. Struct. Funct. Genomics 5, 111–118.

[34] Yakunin, A.F., Yee, A.A., Savchenko, A., Edwards, A.M. and

Arrowsmith, C.H. (2004) Structural proteomics: a tool for

genome annotation. Curr. Opin. Chem. Biol. 8, 42–48.

[35] Grayhack, E.J. and Phizicky, E.M. (2001) Genomic analysis of

biochemical function. Curr. Opin. Chem. Biol. 5, 34–39.

[36] Martzen, M.R., McCraith, S.M., Spinelli, S.L., Torres, F.M.,

Fields, S., Grayhack, E.J. and Phizicky, E.M. (1999) A

biochemical genomics approach for identifying genes by the

activity of their products. Science 286, 1153–1155.

[37] Phizicky, E., Bastiaens, P.I., Zhu, H., Snyder, M. and Fields, S.

(2003) Protein analysis on a proteomic scale. Nature 422, 208–

215.

[38] Xing, F., Hiley, S.L., Hughes, T.R. and Phizicky, E.M. (2004)

The specificities of four yeast dihydrouridine synthases for

cytoplasmic tRNAs. J. Biol. Chem. 279, 17850–17860.

[39] Phizicky, E.M., Martzen, M.R., McCraith, S.M., Spinelli, S.L.,

Xing, F., Shull, N.P., Van Slyke, C., Montagne, R.K., Torres,

F.M., Fields, S. and Grayhack, E.J. (2002) Biochemical genom-

ics approach to map activities to genes. Methods Enzymol. 350,

546–559.

[40] Copley, S.D. (2003) Enzymes with extra talents: moonlighting

functions and catalytic promiscuity. Curr. Opin. Chem. Biol. 7,

265–272.

[41] Gomez, A., Domedel, N., Cedano, J., Pinol, J. and Querol, E.

(2003) Do current sequence analysis algorithms disclose multi-

functional (moonlighting) proteins? Bioinformatics 19, 895–896.

[42] Nilsson, J., Stahl, S., Lundeberg, J., Uhlen, M. and Nygren, P.A.

(1997) Affinity fusion strategies for detection, purification, and

immobilization of recombinant proteins. Protein Expr. Purif. 11,

1–16.

[43] Braun, P., Hu, Y., Shen, B., Halleck, A., Koundinya, M.,

Harlow, E. and LaBaer, J. (2002) Proteome-scale purification of

human proteins from bacteria. Proc. Natl. Acad. Sci. USA 99,

2654–2659.

[44] Lesley, S.A. (2001) High-throughput proteomics: protein expres-

sion and purification in the postgenomic world. Protein Expr.

Purif. 22, 159–164.

[45] Albala, J.S., Franke, K., McConnell, I.R., Pak, K.L., Folta,

P.A., Rubinfeld, B., Davies, A.H., Lennon, G.G. and Clark, R.

(2000) From genes to proteins: high-throughput expression and

purification of the human proteome. J. Cell. Biochem. 80, 187–

191.

[46] Gilbert, M. and Albala, J.S. (2002) Accelerating code to

function: sizing up the protein production line. Curr. Opin.

Chem. Biol. 6, 102–105.

[47] Huang, R.Y., Boulton, S.J., Vidal, M., Almo, S.C., Bresnick,

A.R. and Chance, M.R. (2003) High-throughput expression,

purification, and characterization of recombinant Caenorhabditis


elegans proteins. Biochem. Biophys. Res. Commun. 307, 928–

934.

[48] Wang, C., Castro, A.F., Wilkes, D.M. and Altenberg, G.A.

(1999) Expression and purification of the first nucleotide-binding

domain and linker region of human multidrug resistance gene

product: comparison of fusions to glutathione S-transferase,

thioredoxin and maltose-binding protein. Biochem. J. 338 (Pt. 1),

77–81.

[49] Hammarstrom, M., Hellgren, N., van Den Berg, S., Berglund,

H. and Hard, T. (2002) Rapid screening for improved solubility

of small human proteins produced as fusion proteins in

Escherichia coli. Protein Sci. 11, 313–321.

[50] Eisenthal, R. and Danson, M.J. (2002) Enzyme Assays: a

Practical Approach. Oxford University Press, Oxford, New

York, 282 pp.

[51] Coleman, P.L., Lathham Jr., H.G. and Shaw, E.N. (1976) Some

sensitive methods for the assay of trypsinlike enzymes. Methods

Enzymol. 45, 12–26.

[52] Reynolds, L.J., Washburn, W.N., Deems, R.A. and Dennis,

E.A. (1991) Assay strategies and methods for phospholipases.

Methods Enzymol. 197, 3–23.

[53] Baykov, A.A., Evtushenko, O.A. and Avaeva, S.M. (1988) A

malachite green procedure for orthophosphate determination

and its use in alkaline phosphatase-based enzyme immunoassay.

Anal. Biochem. 171, 266–270.

[54] Drummond, G.T. and Yamamoto, M. (1970) Nucleoside cyclic

phosphate diesterases In: The Enzymes (Boyer, P.D. and Krebs,

E.G., Eds.), pp. 355–371. Academic Press, New York.

[55] Seki, T. and Fukuda, S. (1981) Purification and some properties

of 2′,3′-cyclic phosphodiesterase from the cell-free extracts of

Bacillus subtilis var. amyloliquefacus. J. Gen. Appl. Microbiol.

27, 487–498.

[56] Hasunuma, K. (1983) Repressible extracellular phosphodiesterases showing cyclic 2′,3′- and cyclic 3′,5′-nucleotide phosphodiesterase activities in Neurospora crassa. J. Bacteriol. 156, 291–300.

[57] Ito, K., Yamamoto, T. and Minamiura, N. (1987) Phosphodi-

esterase I in human urine: purification and characterization of

the enzyme. J. Biochem. (Tokyo) 102, 359–367.

[58] Vogel, A., Schilling, O., Niecke, M., Bettmer, J. and Meyer-

Klaucke, W. (2002) ElaC encodes a novel binuclear zinc

phosphodiesterase. J. Biol. Chem. 277, 29078–29085.

[59] Ohkawa, I., Shiga, S. and Kageyama, M. (1979) An esterase on

the outer membrane of Pseudomonas aeruginosa for the

hydrolysis of long chain acyl esters. J. Biochem. (Tokyo) 86,

643–656.

[60] Mukherjee, J.J., Jay, F.T. and Choy, P.C. (1993) Purification,

characterization and modulation of a microsomal carboxylest-

erase in rat liver for the hydrolysis of acyl-CoA. Biochem. J. 295

(Pt 1), 81–86.

[61] Sanishvili, R., Yakunin, A.F., Laskowski, R.A., Skarina, T.,

Evdokimova, E., Doherty-Kirby, A., Lajoie, G.A., Thornton,

J.M., Arrowsmith, C.H., Savchenko, A., Joachimiak, A. and

Edwards, A.M. (2003) Integrating structure, bioinformatics, and

enzymology to discover function: BioH, a new carboxylesterase

from Escherichia coli. J. Biol. Chem. 278, 26039–26045.

[62] Mori, M., Hosokawa, M., Ogasawara, Y., Tsukada, E. and

Chiba, K. (1999) cDNA cloning, characterization and stable

expression of novel human brain carboxylesterase. FEBS Lett.

458, 17–22.

[63] Berge, R.K. and Farstad, M. (1981) Long-chain fatty acyl-CoA

hydrolase from rat liver mitochondria. Methods Enzymol. 71

(Pt. C), 234–242.

[64] Sarath, G., Zeece, M.G. and Penheiter, A.R. (2001) Protease

assay methods In: Proteolytic Enzymes: a Practical Approach

(Beynon, R.J. and Bond, J.S., Eds.), pp. 45–76. Oxford

University Press, Oxford, New York.

[65] Buroker-Kilgore, M. and Wang, K.K. (1993) A Coomassie

brilliant blue G-250-based colorimetric assay for measuring

activity of calpain and other proteases. Anal. Biochem. 208, 387–

392.

[66] Jones, L.J., Upson, R.H., Haugland, R.P., Panchuk-Voloshina,

N. and Zhou, M. (1997) Quenched BODIPY dye-labeled casein

substrates for the assay of protease activity by direct fluorescence

measurement. Anal. Biochem. 251, 144–152.

[67] Garland, W.J. and Dennis, D.T. (1977) Steady-state kinetics of

glutamate dehydrogenase from Pisum sativum L. mitochondria.

Arch. Biochem. Biophys. 182, 614–625.

[68] Marshall, J.H., Kong, Y.C., Sloan, J. and May, J.W. (1989)

Purification and properties of glycerol:NADP+ 2-oxidoreduc-

tase from Schizosaccharomyces pombe. J. Gen. Microbiol. 135

(Pt 3), 697–701.

[69] Yabe, M., Shirata, K., Kawashima, J., Shinoyama, H., Ando, A.

and Fujii, T. (1992) Purification and properties of an alcohol

dehydrogenase enzyme from methanol-using yeast, Candida sp.

N-16. Biosci. Biotechnol. Biochem. 56, 338–339.

[70] Kelley, R.L. and Reddy, C.A. (1988) Glucose oxidase of

Phanerochaete chrysosporium. Methods Enzymol. 161, 307–316.

[71] Xu, W., Shen, J., Dunn, C.A., Desai, S. and Bessman, M.J.

(2001) The Nudix hydrolases of Deinococcus radiodurans. Mol.

Microbiol. 39, 286–290.

[72] Xu, W., Dunn, C.A., Jones, C.R., D’Souza, G. and Bessman,

M.J. (2004) The 26 Nudix hydrolases of Bacillus cereus, a close

relative of Bacillus anthracis. J. Biol. Chem. 279, 24861–24865.

[73] Gerlt, J.A. and Babbitt, P.C. (2000) Can sequence determine function? Genome Biol. 1, REVIEWS0005.

[74] Nagano, N., Orengo, C.A. and Thornton, J.M. (2002) One fold

with many functions: the evolutionary relationships between

TIM barrel families based on their sequences, structures and

functions. J. Mol. Biol. 321, 741–765.

[75] Nobeli, I., Ponstingl, H., Krissinel, E.B. and Thornton, J.M.

(2003) A structure-based anatomy of the E. coli metabolome. J.

Mol. Biol. 334, 697–719.

[76] Chen, S., Yakunin, A.F., Kuznetsova, E., Busso, D., Pufan, R.,

Proudfoot, M., Kim, R. and Kim, S.H. (2004) Structural and

functional characterization of a novel phosphodiesterase from

Methanococcus jannaschii. J. Biol. Chem. 279, 31854–31862.

[77] Arpigny, J.L. and Jaeger, K.E. (1999) Bacterial lipolytic

enzymes: classification and properties. Biochem. J. 343 (Pt 1),

177–183.

[78] Vorderwulbecke, T., Kieslich, K. and Erdmann, H. (1992)

Comparison of lipases by different assays. Enzyme Microb.

Technol. 14, 631–639.

[79] Koonin, E.V. and Tatusov, R.L. (1994) Computer analysis of

bacterial haloacid dehalogenases defines a large superfamily of

hydrolases with diverse specificity. Application of an iterative

approach to database search. J. Mol. Biol. 244, 125–132.

[80] Wu, J., Howe, D.L. and Woodard, R.W. (2003) Thermotoga

maritima 3-deoxy-D-arabino-heptulosonate-7-phosphate (DAHP)

synthase: the ancestral eubacterial DAHP synthase? J. Biol.

Chem. 278, 27525–27531.

[81] Pizer, L.I. (1963) The pathway and control of serine biosynthesis

in Escherichia coli. J. Biol. Chem. 238, 3934–3944.

[82] Seo, H.S., Koo, Y.J., Lim, J.Y., Song, J.T., Kim, C.H., Kim,

J.K., Lee, J.S. and Choi, Y.D. (2000) Characterization of a

bifunctional enzyme fusion of trehalose-6-phosphate synthetase

and trehalose-6-phosphate phosphatase of Escherichia coli.

Appl. Environ. Microbiol. 66, 2484–2490.

[83] Eidsness, M.K., Richie, K.A., Burden, A.E., Kurtz Jr., D.M.

and Scott, R.A. (1997) Dissecting contributions to the thermo-

stability of Pyrococcus furiosus rubredoxin: beta-sheet chimeras.

Biochemistry 36, 10406–10413.

[84] Lee, H.J., Lian, L.Y. and Scrutton, N.S. (1997) Recombinant

two-iron rubredoxin of Pseudomonas oleovorans: overexpression,


purification and characterization by optical, CD and 113Cd

NMR spectroscopies. Biochem. J. 328 (Pt 1), 131–136.

[85] Richie, K.A., Teng, Q., Elkin, C.J. and Kurtz Jr., D.M. (1996)

2D 1H and 3D 1H–15N NMR of zinc-rubredoxins: contribu-

tions of the beta-sheet to thermostability. Protein Sci. 5, 883–

894.

[86] Pellicer, T.M., Felisa Nunez, M., Aguilar, J., Badia, J. and

Baldoma, L. (2003) Role of 2-phosphoglycolate phosphatase of

Escherichia coli in metabolism of the 2-phosphoglycolate formed

in DNA repair. J. Bacteriol. 185, 5815–5821.

[87] Kim, Y., Yakunin, A.F., Kuznetsova, E., Xu, X., Pennycooke,

M., Gu, J., Cheung, F., Proudfoot, M., Arrowsmith, C.H.,

Joachimiak, A., Edwards, A.M. and Christendat, D. (2004)

Structure- and function-based characterization of a new phos-

phoglycolate phosphatase from Thermoplasma acidophilum. J.

Biol. Chem. 279, 517–526.

[88] Bianchi, V. and Spychala, J. (2003) Mammalian 5′-nucleotidases. J. Biol. Chem. 278, 46195–46198.

[89] Proudfoot, M., Kuznetsova, E., Brown, G., Rao, N.N., Kitag-

awa, M., Mori, H., Savchenko, A. and Yakunin, A.F. (2004)

General enzymatic screens identify three new nucleotidases in E.

coli: biochemical characterization of SurE, YfbR, and YjjG. J.

Biol. Chem..

[90] Lahiri, S.D., Zhang, G., Dunaway-Mariano, D. and Allen, K.N.

(2002) Caught in the act: the structure of phosphorylated beta-

phosphoglucomutase from Lactococcus lactis. Biochemistry 41,

8351–8359.

[91] Stephens, C.M. and Bauerle, R. (1991) Analysis of the metal

requirement of 3-deoxy-D-arabino-heptulosonate-7-phosphate

synthase from Escherichia coli. J. Biol. Chem. 266, 20810–20817.

[92] Zhao, G., Pease, A.J., Bharani, N. and Winkler, M.E. (1995)

Biochemical characterization of gapB-encoded erythrose 4-

phosphate dehydrogenase of Escherichia coli K-12 and its

possible role in pyridoxal 5′-phosphate biosynthesis. J. Bacteriol. 177, 2804–2812.

[93] Parker, E.J., Bulloch, E.M., Jameson, G.B. and Abell, C. (2001) Substrate deactivation of phenylalanine-sensitive 3-deoxy-D-arabino-heptulosonate-7-phosphate synthase by erythrose 4-phosphate. Biochemistry 40, 14821–14828.

[94] Treton, B.Y., Le Dall, M.T. and Gaillardin, C.M. (1992)

Complementation of Saccharomyces cerevisiae acid phosphatase

mutation by a genomic sequence from the yeast Yarrowia

lipolytica identifies a new phosphatase. Curr. Genet. 22, 345–355.

[95] Zhang, R.G., Skarina, T., Katz, J.E., Beasley, S., Khachatryan,

A., Vyas, S., Arrowsmith, C.H., Clarke, S., Edwards, A.,

Joachimiak, A. and Savchenko, A. (2001) Structure of Thermo-

toga maritima stationary phase survival protein SurE: a novel

acid phosphatase. Structure (Cambridge) 9, 1095–1106.

[96] Lee, J.Y., Kwak, J.E., Moon, J., Eom, S.H., Liong, E.C.,

Pedelacq, J.D., Berendzen, J. and Suh, S.W. (2001) Crystal

structure and functional analysis of the SurE protein identify a

novel phosphatase family. Nat. Struct. Biol. 8, 789–794.

[97] Mura, C., Katz, J.E., Clarke, S.G. and Eisenberg, D.

(2003) Structure and function of an archaeal homolog

of survival protein E (SurEalpha): an acid phosphatase

with purine nucleotide specificity. J. Mol. Biol. 326,

1559–1575.

[98] Yang, Z., Savchenko, A., Yakunin, A., Zhang, R., Edwards, A.,

Arrowsmith, C. and Tong, L. (2003) Aspartate dehydrogenase, a

novel enzyme identified from structural and functional studies of

TM1643. J. Biol. Chem. 278, 8804–8808.

[99] Lemoine, Y., Wach, A. and Jeltsch, J.M. (1996) To be free or

not: the fate of pimelate in Bacillus sphaericus and in Escherichia

coli. Mol. Microbiol. 19, 645–647.

[100] Best, A.N. and Novelli, G.D. (1971) Studies with tRNA

adenylyl(cytidylyl)transferase from Escherichia coli B. I. Purifi-

cation and kinetic properties. Arch. Biochem. Biophys. 142, 527–

538.

[101] Cudny, H. and Deutscher, M.P. (1986) High-level overexpres-

sion, rapid purification, and properties of Escherichia coli tRNA

nucleotidyltransferase. J. Biol. Chem. 261, 6450–6453.

[102] Yakunin, A.F., Proudfoot, M., Kuznetsova, E., Savchenko, A.,

Brown, G., Arrowsmith, C.H. and Edwards, A.M. (2004) The

HD domain of the Escherichia coli tRNA nucleotidyltransferase

has 2′,3′-cyclic phosphodiesterase, 2′-nucleotidase, and phosphatase activities. J. Biol. Chem. 279, 36819–36827.

[103] Aravind, L. and Koonin, E.V. (1998) The HD domain defines a

new superfamily of metal-dependent phosphohydrolases. Trends

Biochem. Sci. 23, 469–472.

[104] Thompson, J.E., Venegas, F.D. and Raines, R.T. (1994)

Energetics of catalysis by ribonucleases: fate of the 2′,3′-cyclic phosphodiester intermediate. Biochemistry 33, 7408–7414.

[105] Okorokov, A.L., Panov, K.I., Offen, W.A., Mukhortov, V.G.,

Antson, A.A., Karpeisky, M., Wilkinson, A.J. and Dodson,

G.G. (1997) RNA cleavage without hydrolysis. Splitting the

catalytic activities of binase with Asn101 and Thr101 mutations.

Protein Eng. 10, 273–278.

[106] Gonzalez, T.N., Sidrauski, C., Dorfler, S. and Walter, P. (1999)

Mechanism of non-spliceosomal mRNA splicing in the unfolded

protein response pathway. EMBO J. 18, 3119–3132.

[107] Schneider, B.L., Kiupakis, A.K. and Reitzer, L.J. (1998)

Arginine catabolism and the arginine succinyltransferase path-

way in Escherichia coli. J. Bacteriol. 180, 4278–4286.

[108] Pazos, F. and Sternberg, M.J. (2004) Automated prediction of

protein function and detection of functional sites from structure.

Proc. Natl. Acad. Sci. USA 101, 14754–14759.

[109] Karp, P.D. (2004) Call for an enzyme genomics initiative.

Genome Biol. 5, 401.

[110] Roberts, R.J. (2004) Identifying protein function-a call for

community action. PLoS Biol. 2, E42.

[111] Galperin, M.Y. and Koonin, E.V. (2004) ‘Conserved hypothetical’ proteins: prioritization of targets for experimental study.

Nucleic Acids Res. 32, 5452–5463.
