

Complex Graph Matrix Representations and Characterizations of Proteomic Maps and Chemically Induced Changes to Proteomes

Krishnan Balasubramanian,*,†,‡ Kanan Khokhani,‡ and Subhash C. Basak§

Chemistry and Material Science Directorate, Lawrence Livermore National Laboratory, University of California, Livermore, California 94550, Glenn T. Seaborg Center, Lawrence Berkeley Laboratory, University of California, Berkeley, California 94720, Department of Mathematics and Computer Science, California State University, East Bay, Hayward, California 94542, and Natural Resources Research Institute, University of Minnesota at Duluth, 5013 Miller Trunk Highway, Duluth, Minnesota 55811

Received December 8, 2005

We present a complex graph matrix representation to characterize proteomics maps obtained from 2D-gel electrophoresis. In this method, each bubble in a 2D-gel proteomics map is represented by a complex number whose components are charge and mass. Then, a graph with complex weights is constructed by connecting the vertices in the relative order of abundance. This yields adjacency matrices and distance matrices of the proteomics graph with complex weights. We have computed the spectra, eigenvectors, and other properties of complex graphs and the Euclidian/graph distance obtained from the complex graphs. The leading eigenvalues and eigenvectors and, likewise, the smallest eigenvalues and eigenvectors, and the entire graph spectral patterns of the complex matrices derived from them yield novel weighted biodescriptors that characterize proteomics maps with information on the charges and masses of proteins. We have also applied these eigenvector and eigenvalue maps to contrast normal cells and cells exposed to four peroxisome proliferators, namely, clofibrate, diethylhexyl phthalate (DEHP), perfluorodecanoic acid (PFDA), and perfluorooctanoic acid (PFOA). Our complex eigenspectra show that the proteomic response induced by DEHP differs from the corresponding responses of the other three chemicals, consistent with their chemical structures and properties.

Keywords: 2D-gel pattern • proteome characterization • graph theory of proteome • chemically induced response • complex matrices

1. Introduction

The evaluation of drugs and toxicants for their effects on the cellular proteome is central to many fields such as molecular pharmacology, drug discovery, and hazard assessment. Therefore, significant efforts have been devoted to the development of mathematical and computational techniques for characterizing proteomes, DNA, and their responses to chemicals.1-28 Proteomic maps contain information on the variations of the relative abundance, induction, and repression of thousands of proteins present in a cell and can serve as powerful tools to measure biochemical changes induced upon the cell by toxicants, drugs, and so on. In a typical experimental setup, cellular material (as homogeneous as possible, i.e., selecting cells from the same organs of experimental animals) is subjected to a combined electrophoretic and chromatographic analysis which results in a two-dimensional proteomics gel (2D-gel) in which thousands of proteins are separated.6,7 The experimental data consist of a list of the locations of the proteins (as x and y coordinates) and their abundance. The abundances are given by densities of the experimental spots in a gel, as has been described in the literature.9 When an animal is exposed to chemicals, the patterns of protein expression in affected cells can change appreciably. The changes may be due to the effects of exposure to chemicals or abnormalities and departure of the cell from the normal state caused by alterations in cellular transcriptional and translational processes, as well as through post-translational modifications of individual proteins.8 To compare proteomics maps, one needs a numerical quantification of their protein patterns that leads to a condensed representation of the available data, offering a characterization based on a relatively small and manageable number of descriptors. Specific biodescriptors advanced thus far include (a) invariants of graphs or matrices associated with proteomics maps;2 (b) information-theoretic biodescriptors;11 (c) spectrum-like descriptors of proteomics patterns;12 and (d) critical protein biomarkers derived using statistical methods.29

* To whom correspondence should be addressed. E-mail, [email protected]; phone, 925-422-4984.

† Chemistry and Material Science Directorate, Lawrence Livermore National Laboratory, University of California and Glenn T. Seaborg Center, Lawrence Berkeley Laboratory, University of California.

‡ California State University.

§ University of Minnesota at Duluth.

10.1021/pr050445s CCC: $33.50 2006 American Chemical Society. Journal of Proteome Research 2006, 5, 1133-1142. Published on Web 03/28/2006

Graph theory has been successfully applied to a number of problems in genomics11-16 and proteomics.1-3 For example, Randic and co-workers2 have considered powers of matrices derived from associated graphs, called the D/D matrix approach, which is based on the graph distances and Euclidian distances between vertices which represent the proteins of the proteomics maps. The vertices are connected in the relative order of abundance to generate a graph, which then yields the various matrices. While the approach is quite interesting and the first of its kind, there is room for further development as noted by Randic et al.2 For example, the D/D matrix approach does not weight the vertices with the masses and charges of proteins, and thus, some intrinsic information pertinent to proteins may not be fully considered in this algorithm.

In the present work, we have considered a new approach for the quantification of not only proteomic maps of the cell but also the chemical changes induced upon the cell by various peroxisome proliferators, namely, clofibrate, diethylhexyl phthalate (DEHP), perfluorodecanoic acid (PFDA), and perfluorooctanoic acid (PFOA). A typical proteomics map is shown in Figure 1, where we have represented each protein component as a bubble. The x and y axes represent the charge and mass of proteins, respectively. The data are from rat liver cells by Witzmann18 and co-workers of Indiana University and Purdue University. Our present approach is to consider the mass and charge on each protein directly in addition to the relative abundances of the proteins. We have accomplished this by cross-fertilization of graph theory and complex algebra, weighting each vertex of the proteomics map by a complex number that uses the mass and charge of the protein as components. Consequently, the complex-weighted graph thus constructed accounts for the actual mass and charge of each protein and the relative abundances. In addition, chemically induced changes to the cell are easily represented by the complex-weighted graph procedure. With the complex graph, we obtain the eigenvalues or spectra, eigenvectors, and so on, which are plotted on a two-dimensional grid to characterize the proteomics map. We have shown that the spectral map differs substantially for the peroxisome proliferators that we have considered here, namely, clofibrate, DEHP, PFDA, and PFOA, whose chemical structures are shown in Figure 2.

2. Computational Methods and Proteomics Algorithms Based on Complex Matrices

Table 1 shows a typical 2D-gel pattern of proteins obtained from a cell and chemical changes induced to the cell by peroxisome proliferators, namely, clofibrate, DEHP, PFDA, and PFOA. In Table 1, we have listed principal protein components with charge and mass values of proteins from rat liver cells. The control represents the relative abundance of the proteins in the natural cell, while PFOA, PFDA, clofibrate, and DEHP data represent the chemical changes induced by these peroxisome proliferators, respectively. A bubble map thus generated on the (x,y) grid for the natural cell is shown in Figure 1.

Figure 1. “Bubble” diagram illustrating the location and abundance of individual proteins for the rat liver control in the experimental 2D gel.

Journal of Proteome Research • Vol. 5, No. 5, 2006

Since we are considering a new mathematical approach that combines the principles of graph theory with complex variables, we first introduce the basic concepts of graphs as pertinent to proteomics. A graph is simply a collection of vertices connected by edges. One can envisage the various proteins in the bubble graph in Figure 1 as the vertices of a graph. The question that naturally arises is then how one could introduce edges or bonds between the vertices. In accordance with Randic et al.,2 the edges can be introduced by connecting the vertices in the order of relative abundance. Such a graph is shown in Figure 3.
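The edge construction just described can be illustrated with a short Python sketch. It is not the authors' code (their program is written in R); the helper name `chain_edges` is hypothetical, and the five control abundances are taken from Table 1 only as sample input.

```python
# Hypothetical helper: the name and the choice of sample spots are illustrative.
def chain_edges(abundance):
    """Order proteins by decreasing abundance and link consecutive ones."""
    ordered = sorted(abundance, key=abundance.get, reverse=True)
    edges = list(zip(ordered, ordered[1:]))
    return ordered, edges

# Control abundances of five spots from Table 1 (spot no. -> abundance),
# deliberately listed out of order.
control = {2: 82492, 15: 72173, 9: 90004, 44: 80015, 36: 84842}

ordered, edges = chain_edges(control)
print(ordered)  # [9, 36, 2, 44, 15]
print(edges)    # [(9, 36), (36, 2), (2, 44), (44, 15)]
```

Each protein is joined only to its predecessor and successor in the abundance ranking, which is why the resulting graph is a single chain.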

Once we have a graph for the proteomics map, as seen from Figure 3, we can use graph theoretical concepts and algorithms to characterize the proteomics maps. Moreover, as we show here, one can invoke complex algebra and arithmetic to characterize the proteomics maps and chemically induced changes to the liver. We thus introduce basic definitions and preliminaries for the algorithms considered here.

As seen from Table 1, we are considering data for N = 20 proteins which are principal components in terms of relative abundance of the rat liver cell under consideration. The data contain charge, mass, and natural abundance for the N proteins. First, we have normalized the data, which are provided in decreasing order of abundance. By normalization, it is meant that the highest charge and the highest mass are each set to unity, and all other proteins' charges and masses are scaled relative to the maximal value of 1.0. We have developed a computer code in the language R that reads these normalized data from DATA.txt, which is a tab-delimited file. The format of the data contained in DATA.txt is shown in Table 2. The first column is the SID of the protein. The second, third, and fourth columns are charge, mass, and abundance, respectively.
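A minimal Python sketch of this normalization follows. It assumes that every column is divided by its own maximum; the text states this for charge and mass, and the abundance column of Table 2 appears to be scaled the same way. The function name is hypothetical and the three rows are sample values from Table 1.

```python
def normalize_columns(rows):
    """Divide every column by its own maximum so the largest entry is 1.0."""
    maxima = [max(col) for col in zip(*rows)]
    return [[value / m for value, m in zip(row, maxima)] for row in rows]

# (x, y, control abundance) for spots 9, 36 and 2 of Table 1.
raw = [
    (1474.0, 665.1, 90004.0),
    (2068.4, 823.1, 84842.0),
    (642.2, 669.8, 82492.0),
]

scaled = normalize_columns(raw)
for row in scaled:
    print(["%.3f" % v for v in row])
```

After scaling, all entries lie in the interval (0, 1], which also keeps later matrix powers within a safe numerical range.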

Another input to the program that we have developed is read in from GRAPH.txt. This is a tab-delimited file containing the neighborhood information of the N proteins, as shown in Table 3. The first column shows the vertex number. The second column shows the number of neighbors having labels less than that of the vertex. The third column shows the label of the vertex that is adjacent to the vertex in column one. Table 3 shows the representation for N = 20 proteins.

Consider these proteins to be a graph of N proteins in a chain. We can represent this graph as an adjacency matrix, say Adj, which will have a 1 as the Adj[i,i+1] and Adj[i+1,i] elements and 0 as all other elements.
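As an illustration, the chain adjacency matrix can be built from records laid out like Table 3. This Python sketch assumes at most one lower-numbered neighbor per record, as in Table 3; the function name is hypothetical and is not the authors' read.graph().

```python
def adjacency_from_records(records, n):
    """Build a symmetric 0/1 adjacency matrix from (vertex, count, neighbor) rows.

    Vertices are 1-based as in Table 3; a count of 0 means that no
    lower-numbered neighbor is listed for that vertex.
    """
    adj = [[0] * n for _ in range(n)]
    for vertex, count, neighbor in records:
        if count > 0:
            adj[vertex - 1][neighbor - 1] = 1
            adj[neighbor - 1][vertex - 1] = 1
    return adj

N = 20
# Table 3: vertex 1 has no earlier neighbor; vertex v > 1 is joined to v - 1.
records = [(1, 0, 0)] + [(v, 1, v - 1) for v in range(2, N + 1)]
adj = adjacency_from_records(records, N)
degrees = [sum(row) for row in adj]
print(degrees)  # end vertices have degree 1, interior vertices degree 2
```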

$$\mathrm{Adj}[i,i+1] = \mathrm{Adj}[i+1,i] = 1, \qquad \mathrm{Adj}[i,j] = 0 \ \text{otherwise}$$

Once we have defined a graph-theoretical representation of the proteome, the question is how to seek an invariant that truly characterizes the underlying pattern in the proteomics map without losing the vital physical characteristics of the proteins such as charge and mass. A structural invariant is a mathematical function or a quantity that does not depend on the labeling of the structure, its orientation, or the labels of vertices, and to the best possible extent, it characterizes the object, such as DNA sequences or proteomics maps, uniquely. While uniqueness may not always be accomplished, invariance to labels and representations can be achieved. These invariants would serve as descriptors and also as pattern recognition tools for the proteomics maps. While generation of such invariants may lead to characterization of the pattern, there could also be some loss of information, since the original pattern may have more information than can be characterized by a small number of functions. We endeavor to formulate more than one such structural invariant so that as much of the information contained in the proteomics maps can be characterized as possible. Thus, we propose here a number of structural descriptors, which have not only vectorial and complex features but also scalar features, thus capturing multidimensional features of the proteome.

Figure 2. Chemical structures of the four peroxisome proliferators tested on rat liver cells. Gray spheres are halogens, white spheres are hydrogens, cyan spheres are carbons, and red spheres are oxygens.

Table 1. The X and Y Coordinates and the Abundance for the Control Rat Liver Cells and When the Animal Is Exposed to the Four Chemicals Shown in Figure 2

no.  x       y       control  PFOA    PFDA    clofibrate  DEHP
22   1183.9  959.6   136653   113859  150253  163645      8111
52   2182.2  928.8   127195   99160   73071   76642       112.096
20   1527.9  825.5   114929   192437  221567  166080      180590
62   1346    1352.5  112251   58669   38915   73159       77075
48   1406.3  1118.1  98224    91147   82963   84196       92942
9    1474    665.1   90004    129340  112361  112655      119402
36   2068.4  823.1   84842    73814   45482   71911       97444
2    642.2   669.8   82492    73974   74466   84703       88545
44   2032.7  902.8   80015    77314   80072   76027       100836
15   1053.6  864.3   72173    77982   60376   46808       78121
5    1214.3  620     64684    63511   38075   58364       75760
45   2094.5  680.5   58977    142865  46225   48625       146609
1    1021.7  390.2   58001    56547   53473   60224       71654
56   2070.4  929.6   55402    46146   33152   59438       69031
35   1375.7  992.3   49027    42506   52137   46058       69214
19   1623.4  640.8   48976    81452   133705  64580       65976
47   1842.5  885.9   48145    40390   24149   47585       52350
14   1189.5  614.7   42773    49044   77144   46005       59322
26   1465.5  821.1   40923    60359   94014   79981       3838
41   1323.4  993     36433    30640   31611   33764       44692
39   1278.8  981.6   35896    31707   21801   29026       32956
24   1433.5  662.3   31194    42226   41489   42432       69142
29   1170    862.2   30510    29742   30786   26460       37812
33   1139.2  958.4   29296    30067   39531   39204       27565
18   1167.3  611.7   26155    25182   41604   21039       23170
12   1202.3  495.5   25389    22811   17341   20416       30852
40   1030.2  863.2   24006    28597   36744   50236       46151
30   1122.7  863     22344    31904   18559   17418       19410
57   1894.5  903.1   20142    14044   13687   16071       17075

The characteristic polynomial is obtained using the algorithm POLY described in the papers by Balasubramanian.24-26 The original POLY was in Fortran; an R version of POLY, char.poly(), was coded. The function char.poly() expects the lower triangle of the adjacency matrix. The lower triangle of an N × N matrix is stored as a one-dimensional array (a vector in R terminology) with N(N + 1)/2 elements. Since the adjacency matrix is symmetric, only the lower triangle is needed. A function read.graph() reads information from GRAPH.txt and builds the lower triangle of the adjacency matrix as a one-dimensional vector. The eigenvalues are obtained using the eigen() function from the R libraries. However, the input parameter x to eigen() is the entire adjacency matrix. A function full.adj.matrix() was coded to convert the lower triangle of the adjacency matrix to the full adjacency matrix. Another parameter to eigen() is symmetric; this parameter is set to TRUE if the input matrix is symmetric and FALSE otherwise. In the code, this parameter is set by default to FALSE since the code may have to handle nonsymmetric matrices too.
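The packed lower-triangle storage and its expansion to a full matrix can be sketched as follows. The row-by-row packing order is an assumption, since the text does not state which order char.poly() expects, and both function names are hypothetical Python stand-ins rather than the R functions named above.

```python
def lower_triangle(matrix):
    """Pack the rows of the lower triangle (diagonal included) into one list."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i + 1)]

def full_matrix(packed, n):
    """Rebuild the full symmetric matrix from the packed lower triangle."""
    full = [[0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            full[i][j] = full[j][i] = packed[k]
            k += 1
    return full

n = 20
chain = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
packed = lower_triangle(chain)
print(len(packed))  # 210 = 20 * 21 / 2
```

For N = 20 the packed vector holds 210 entries instead of 400, and the round trip through `full_matrix` recovers the original symmetric matrix.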

A new matrix, say Adj1, is obtained by changing the diagonalelements of Adj to (charge) + i(mass).
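A small Python sketch of this complex-weighted matrix is given below, using four consecutive rows of Table 2 as sample input. The function name is hypothetical. The sketch also checks a property that matters when choosing an eigen-solver: the matrix is symmetric but, because its diagonal is complex, not Hermitian.

```python
def complex_adjacency(charge, mass):
    """Chain adjacency matrix with charge + i*mass on the diagonal."""
    n = len(charge)
    adj1 = [[0j] * n for _ in range(n)]
    for i in range(n):
        adj1[i][i] = complex(charge[i], mass[i])
        if i + 1 < n:
            adj1[i][i + 1] = adj1[i + 1][i] = 1 + 0j
    return adj1

# Normalized charge and mass of SIDs 22, 52, 134 and 20 from Table 2.
charge = [0.40, 0.73, 0.90, 0.51]
mass = [0.42, 0.41, 0.52, 0.36]
adj1 = complex_adjacency(charge, mass)

n = len(adj1)
symmetric = all(adj1[i][j] == adj1[j][i] for i in range(n) for j in range(n))
hermitian = all(adj1[i][j] == adj1[j][i].conjugate()
                for i in range(n) for j in range(n))
trace = sum(adj1[i][i] for i in range(n))
print(symmetric, hermitian)  # True False
print(trace)                 # sum of charges + i * sum of masses
```

Since the matrix is not Hermitian, its eigenvalues are in general complex and a general-purpose solver is required, which is consistent with leaving the symmetric parameter of eigen() at FALSE.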

$$\mathrm{Adj1}[i,i+1] = \mathrm{Adj1}[i+1,i] = 1, \qquad \mathrm{Adj1}[i,i] = \mathrm{charge}_i + \sqrt{-1}\,\mathrm{mass}_i, \qquad \mathrm{Adj1}[i,j] = 0 \ \text{elsewhere}$$

The read.graph() function with input parameter comp = TRUE (comp for complex) is used to build Adj1. The characteristic polynomial of a graph is defined as the secular determinant polynomial of the adjacency matrix. Direct computation of the determinant is an n!-order problem and thus becomes intractable for large graphs. The technique used here is instead based on computing powers of the adjacency matrix of the graph and finding the traces of the resulting matrices. One of the authors24-26 has developed a powerful code and algorithm for the characteristic polynomials of graphs. Since the adjacency matrices of the graphs generated from proteomics maps as defined above are complex, we have to generalize these algorithms and techniques for graphs with complex weights. We have done this in the current work by expressing all complex arithmetic operations in terms of real functions. The characteristic polynomial is thus obtained using the char.poly() function mentioned above. Since R can recognize complex and real numbers, the same version of char.poly() works for both real and complex matrices. The eigenvalues are obtained using the eigen() function from the R libraries. Yet another parameter to eigen() is a logical parameter called “only.values”; this parameter should be TRUE if only eigenvalues are needed. Since we are also seeking eigenvectors, by default, this parameter is set to a logical value of FALSE. The spectral decomposition of the input matrix x is returned as a list, for example, e1. There are two components of the list e1; e1$values represents the eigenvalues, and e1$vectors is a complex matrix of order N × N, whose columns represent the eigenvectors. A function get.min.max() was coded to obtain the indices of the minimum and maximum eigenvalues from e1$values. These indices are then used to select the eigenvectors for the smallest and the largest eigenvalues. The eigenvector of the largest eigenvalue is called the principal eigenvector, which provides information on the participation of various components in a complex plane. We have plotted the principal eigenvector to provide insight into the abundance and chemically induced changes as a function of the chemical. All of this information is obtained from e1$vectors. Eigenvalues obtained from Adj1 are complex in nature, and their real and imaginary parts are plotted along the X and Y axes. The eigenvectors corresponding to the largest and smallest eigenvalues are also plotted in a complex grid.

Figure 3. Zigzag graph obtained from the proteomic map in Figure 1 by considering 20 principal components of proteins with large abundance and connecting the vertices in the order of relative abundance.

Table 2. Normalized Relative Abundance of Various Proteins with Their Masses and Charges

SID  charge  mass  abundance
187  0.71    1.00  1
77   0.94    0.40  0.995
22   0.40    0.42  0.947
52   0.73    0.41  0.881
134  0.90    0.52  0.821
20   0.51    0.36  0.796
62   0.45    0.59  0.778
67   0.96    0.34  0.754
48   0.47    0.49  0.680
96   0.82    0.18  0.648
9    0.50    0.29  0.623
75   1.00    0.34  0.601
36   0.70    0.36  0.588
2    0.22    0.29  0.571
250  0.96    0.72  0.568
44   0.68    0.40  0.554
84   0.93    0.34  0.553
80   0.78    0.43  0.504
15   0.35    0.38  0.500
176  0.85    0.60  0.481

Table 3. Neighborhood Information of the Proteomics Pattern

vertex  no. of neighbors  neighbor
1       0                 0
2       1                 1
3       1                 2
4       1                 3
5       1                 4
6       1                 5
7       1                 6
8       1                 7
9       1                 8
10      1                 9
11      1                 10
12      1                 11
13      1                 12
14      1                 13
15      1                 14
16      1                 15
17      1                 16
18      1                 17
19      1                 18
20      1                 19
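The idea of obtaining the characteristic polynomial from matrix products and traces, rather than from a determinant expansion, can be sketched with the Faddeev-LeVerrier recursion. This is a generic textbook recursion shown under the assumption that it is close in spirit to POLY; it is not the authors' char.poly(), and this version is restricted to integer matrices so that the arithmetic stays exact. A complex-weighted matrix would need ordinary division in place of integer division.

```python
def char_poly(a):
    """Coefficients of det(xI - A), highest power first (integer matrices).

    Faddeev-LeVerrier recursion: M_k = A M_{k-1} + c I and
    c_next = -trace(A M_k) / k, so only products and traces are needed.
    """
    n = len(a)
    coeffs = [1]
    m = [[0] * n for _ in range(n)]  # M_0 = 0
    for k in range(1, n + 1):
        am = [[sum(a[i][t] * m[t][j] for t in range(n)) for j in range(n)]
              for i in range(n)]
        m = [[am[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        trace = sum(a[i][t] * m[t][i] for i in range(n) for t in range(n))
        coeffs.append(-trace // k)  # exact: the quotient is always an integer
    return coeffs

n = 20
chain = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
print(char_poly(chain))
```

For the unweighted 20-vertex chain, the coefficients printed by this sketch agree with the real characteristic polynomial listed in Table 4.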

The D/D matrix approach of Randic et al. consists of two parts, one called graph distance and the other called Euclidian distance. These two distances measure the shortest topological (connectivity) and geometrical distances, respectively. We have defined the diagonal elements of E as relative abundance[i] and the off-diagonal elements E[i,j] as the Euclidian distance between proteins i and j.

$$E[i,j] = \begin{cases} \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} & \text{if } i \neq j \\ \mathrm{abundance}[i] & \text{if } i = j \end{cases}$$

An important normalized graph signature is called the D/D matrix, which has off-diagonal elements set to the ratio of Euclidian and topological distances on the weighted graph, where graph distances are used in combination with Euclidian distances as shown by Randic et al.19,20 There are physical interpretations also for the mathematical invariants. For example, it has been suggested that the principal eigenvalue of D/D matrices measures the degree of foldness of structures.19,20 Several powers of the above D/D matrix generate higher-order invariants, and their leading eigenvalues ($\lambda_1^{k}$) were considered earlier as descriptors. In this work, we have presented a complementary approach that involves complex matrices, their eigenvalues, and eigenvectors.

As indicated in the Introduction, we have considered here proteomics maps obtained for protein patterns derived from normal liver cells and liver cells taken from rats that were exposed to four different peroxisome proliferators, namely, perfluorooctanoic acid (PFOA), perfluorodecanoic acid (PFDA), clofibrate, and diethylhexyl phthalate (DEHP). All experimental data that we consider here were obtained by Witzmann and co-workers in the Molecular Anatomy Laboratory of the Department of Biology, Indiana University and Purdue University, Columbus, IN.18 Each of these chemicals induces changes to the proteome, which should then be reflected in the complex algebraic and graph-theoretical generators that we have obtained. Likewise, the sequence of the leading eigenvalues of ${}^{k}D/{}^{k}D$ matrices can also provide insight into the proteome and the action of various chemicals. They can be viewed as “biodescriptors” that characterize the state of cellular proteomes, and in general as descriptors that characterize biological systems under various external or internal perturbations. The experimental techniques for the extraction of the 2D-gel patterns have been described adequately in a previous paper.2 Here, we briefly summarize that the 2D-gel maps are obtained from male Fischer-344 rats (225-250 g) from Charles River Breeding Labs. PFDA and PFOA were dissolved in propylene glycol and water, 1:1 by volume, and concentration-adjusted so that the dose volume did not exceed 0.5 mL. Rats were injected intraperitoneally with the above solutions with exposures of 2 mg (n = 5), 20 mg (n = 5), and 50 mg PFDA/kg body weight (n = 9), by single injection, animals sacrificed on day 8 of exposure; 50 mg PFDA/kg body weight (n = 5), by single injection, animals sacrificed 30 days after exposure; and 150 mg PFOA/kg body weight (n = 8), by single injection, animals sacrificed on day 3 of exposure. Clofibrate (ethyl-p-chlorophenoxyisobutyrate) was administered as neat oil, 250 mg clofibrate/kg body weight, single intraperitoneal injection on each of 3 successive days, animals sacrificed on day 5 of exposure (n = 10). DEHP was administered as neat oil, via oral gavage, 1200 mg/kg, animals sacrificed on day 5 of exposure (n = 3). Matched control rats were vehicle-injected and pair-fed (PFC; n = 10), while one group (Ad Lib; n = 6) served as free-eating controls. The 2D electrophoretic technique was employed to get a proteomics map, since the technique has the ability to resolve thousands of cellular proteins based first on their content of acidic and basic amino acids (isoelectric focusing) and second on molecular weight (SDS electrophoresis). The effects of various chemicals were also reflected in the proteomic patterns. In the present study, we are working with these 2D-gel data to mathematically characterize the proteomics patterns and the effects of various chemicals on the proteome.

The 2D-gel data contain measurements for charge and mass, which we represent in a complex plane, where the real part is the charge and the imaginary part is the mass. The experimental data in absolute terms contain the x and y coordinates in the range (0 < x < 3000 and 0 < y < 2500), while the abundance is measured in units yielding entries several orders of magnitude larger. For the control data, which measure the abundance of proteins without any chemical exposure, the abundance is in the range (0, 137 000), but in the presence of toxic substances, it can increase even above 200 000 (for protein no. 20 of F344 liver PFDA). We have thus renormalized the experimental data by setting the largest positive value to unity and scaling all numbers relative to that. This also keeps the matrices, eigenvectors, and eigenvalues within numerical bounds and avoids numerical overflows. The overflows can become problematic particularly for higher powers of the D/D matrix.
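The Euclidian matrix E and a D/D-type matrix from this section can be sketched in Python as below. Two points are assumptions rather than statements from the text: the topological distance between chain vertices i and j is taken as the edge count |i - j|, and the D/D diagonal is set to zero. A weighted variant would instead sum edge lengths along the chain. The function names and the three sample points are hypothetical.

```python
import math

def euclid_matrix(points, abundance):
    """E[i][j] = Euclidean distance for i != j, abundance[i] on the diagonal."""
    n = len(points)
    return [[abundance[i] if i == j
             else math.hypot(points[i][0] - points[j][0],
                             points[i][1] - points[j][1])
             for j in range(n)] for i in range(n)]

def dd_matrix(points):
    """Euclidean distance divided by the chain edge count |i - j| (assumed)."""
    n = len(points)
    return [[0.0 if i == j
             else math.hypot(points[i][0] - points[j][0],
                             points[i][1] - points[j][1]) / abs(i - j)
             for j in range(n)] for i in range(n)]

# Three hypothetical spots forming a 3-4-5 triangle, listed in chain order.
points = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
abundance = [1.0, 0.9, 0.8]

E = euclid_matrix(points, abundance)
DD = dd_matrix(points)
print(E[0][1], E[1][2], E[0][2])  # 5.0 4.0 3.0
print(DD[0][2])                   # 1.5 (distance 3.0 over two chain edges)
```

Entries for adjacent chain vertices equal the plain Euclidean distance, while entries for vertices far apart along the chain but close in the plane become small, which is the sense in which such ratios reflect folding.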

3. Results and Discussion

The characteristic polynomial and eigenvalues from the matrix Adj, which is the adjacency matrix of the ordinary unweighted proteomics map, are shown in Table 4. The characteristic polynomial and eigenvectors corresponding to the smallest and largest eigenvalues from the complex matrix, which has diagonal elements weighted with complex weights corresponding to charge and mass, are shown in Table 5. As expected, in contrast to the results in Table 4, which do not consider the charge and mass of each protein, the results in Table 5 all have complex eigenvalues, wherein the real component can be thought of as the charge component, while the imaginary component corresponds to the mass. Clearly, the results in Tables 4 and 5 are substantially different, indicating how the charge and mass of the proteomics map play a critical role in determining the eigenvalues and eigenvectors, which are mathematical descriptors of the proteome. As can be seen from Table 5, the eigenvalues exhibit a larger spread along the charge variable, with this component varying from 2.3705 (largest) to 0.0019 (smallest in magnitude); the spread along the imaginary component, which corresponds to the mass, is between 0.6 and 0.8. This shows that there is a much smaller variation in the spectra along the mass axis and a larger spread along the charge axis. The eigenspectra shown in Table 5 correspond only to the control, that is, to the relative abundances in the absence of any external chemicals. Thus, the results in Table 5 can be viewed as a descriptor of the proteome itself. Figure 4 shows a graphical representation of the complex spectral map of the proteomics pattern. As seen from Figure 4, which gives more insight, the spread along the y-axis (mass) is much smaller than the spread along the x-axis (charge). There are n mutually orthogonal eigenvectors, one for each of the eigenvalues, where n is the number of proteins. For the sample data set, since we have considered the 20 most abundant proteins, we have 20 eigenvectors. Among these, the eigenvector corresponding to the largest eigenvalue in norm, called the principal component vector, is an important structural descriptor. This vector is plotted in Figure 5 along the charge and mass axes, respectively. We expect this pattern of principal eigenvectors to be a unique descriptor for a given proteome and hence a very useful descriptor of the proteome.
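Selecting the smallest and largest eigenvalues of a complex spectrum requires an ordering rule, which the text does not spell out for get.min.max(). The Python sketch below assumes ranking by modulus, in line with the phrase "largest eigenvalue in norm", and applies it to the eigenvalues listed in Table 5. The function here is a hypothetical stand-in, not the authors' R code.

```python
# Complex eigenvalues as listed in Table 5 (control data).
eigenvalues = [
    2.3705 + 0.6697j, 2.3344 + 0.7247j, 2.1922 + 0.6997j, 2.0526 + 0.6762j,
    1.9004 + 0.6774j, 1.7166 + 0.7635j, -1.5898 + 0.69j, -1.4383 + 0.683j,
    1.4541 + 0.6431j, -1.3874 + 0.7217j, -1.2018 + 0.6627j, 1.1567 + 0.684j,
    0.9577 + 0.7404j, -1.0129 + 0.614j, -0.7772 + 0.7042j, 0.6562 + 0.6222j,
    -0.5551 + 0.7053j, 0.3033 + 0.7229j, -0.0019 + 0.7539j, -0.2701 + 0.7014j,
]

def get_min_max(values):
    """Indices of the eigenvalues with the smallest and largest modulus."""
    order = sorted(range(len(values)), key=lambda k: abs(values[k]))
    return order[0], order[-1]

i_min, i_max = get_min_max(eigenvalues)
print(eigenvalues[i_min], eigenvalues[i_max])
```

Under the modulus rule the largest entry is the first one listed and the smallest is the last one listed; a rule based on the real part alone would select a different smallest eigenvalue, so the choice of ordering should be stated whenever these descriptors are compared.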

As discussed in Section 2, we have also considered theEuclidian distance matrix of the proteins.

Table 6 shows the characteristic polynomials, principaleigenvalues, and eigenvectors, as well as smallest eigenvalueand eigenvector of the Euclidian matrix. Note that the Euclidainmatrix measures the shortest geometrical distance between theproteins on the (x,y)-grid. The Euclidian spectra are not toointeresting by themselves, as the principal eigenvalue stands

Table 4. Results from Adjacency Matrix

real characteristic polynomial

1, 0, -19, 0, 153, 0, -680, 0, 1820, 0, -3003, 0, 3003, 0, -1716, 0, 495, 0, -55, 0, 1

real eigenvalues

-1.9777, 1.9777, -1.9111, 1.9111, 1.8019, -1.8019, -1.6525, 1.6525, 1.4661, -1.4661, -1.247, 1.247, -1, 1, 0.7307, -0.7307, 0.445, -0.445, -0.1495, 0.1495

Table 5. Results from Adjacency Matrix with Diagonal Elements as Charge + i(Mass)

complex characteristic polynomial

1 + 0i, -8.86 - 13.86i, -72.7393 + 116.5663i, 777.6914 + 149.3978i, -464.6874 - 3456.4613i, -11252.2716 + 5065.7978i, 22906.4781 + 27992.7936i, 53698.4071 - 71512.0605i, -172125.0388 - 77250.3828i, -73236.1264 + 333913.1848i, 533380.8604 + 13071.4797i, -109820.7759 - 708606.7969i, -785103.1309 + 260111.145i, 371819.4414 + 723219.8399i, 549065.606 - 391400.2502i, -317515.0114 - 338270.7795i, -165004.3099 + 199079.6725i, 94325.2473 + 61297.0751i, 16261.2879 - 32060.8262i, -7015.8374 - 2733.3932i, -217.5977 + 747.4312i

complex eigenvalues

2.3705 + 0.6697i, 2.3344 + 0.7247i, 2.1922 + 0.6997i, 2.0526 + 0.6762i, 1.9004 + 0.6774i, 1.7166 + 0.7635i, -1.5898 + 0.69i, -1.4383 + 0.683i, 1.4541 + 0.6431i, -1.3874 + 0.7217i, -1.2018 + 0.6627i, 1.1567 + 0.684i, 0.9577 + 0.7404i, -1.0129 + 0.614i, -0.7772 + 0.7042i, 0.6562 + 0.6222i, -0.5551 + 0.7053i, 0.3033 + 0.7229i, -0.0019 + 0.7539i, -0.2701 + 0.7014i

Figure 4. Plot of complex eigenvalues. The real and imaginary parts are plotted on the X- and Y-axis, respectively. Square, smallest principal value; triangle, largest principal value.

research articles Balasubramanian et al.

1138 Journal of Proteome Research • Vol. 5, No. 5, 2006

out as a large number and the remaining values are small. This is quite typical of purely distance-based measures, as shown by one of the authors27,28 in the context of distance spectra, Euclidian distances28 and distance polynomials.27 It can be easily shown2 that for an unweighted graph the principal eigenvalue, λ1, of the D/D matrix asymptotically reaches the value 2 cos[π/(n + 2)], the leading eigenvalue of the adjacency matrix of a chain of length n.

The most interesting trends are obtained by plotting the eigenvalues and the principal eigenvectors on the same charge-mass grid of the proteomics map. We shall see that such plots characterize not only the proteomics map but also the chemically induced changes to the proteome by the various peroxisome proliferators that we have considered here, namely, clofibrate, DEHP, PFDA, and PFOA. We discuss these eigenvalue and eigenvector maps and show that they are novel descriptors of the proteome and its responses to chemicals or toxicants.

Figure 6 shows the complex spectral map and complex principal eigenvector corresponding to the data obtained from rat liver cells exposed to clofibrate. The real and imaginary parts are plotted on the x- and y-axis, and they represent the charge and mass components in the original matrix, respectively. The corresponding plots for DEHP are in Figure 7, for PFDA in Figure 8, and for PFOA in Figure 9. A uniform feature of all complex eigenspectral maps is that the spread is larger along the charge axis than along the mass axis. Note that for each of the plots, the original matrices measure the perturbation caused by the chemicals, as the diagonal elements are the data obtained after exposure to the chemical minus the corresponding data of the control. Thus, the spectral maps are true reflections of the effects of the chemicals on the rat cell.

The most interesting information is obtained by considering the distance and vectorial positions of the smallest and largest eigenvalues of the eigenspectra of the four chemicals that we have considered here. As can be seen from Figures 6-9, DEHP stands out in exhibiting the largest spread or Euclidian distance between its smallest and principal eigenvalues (the distance between the triangle and square in each figure). As can be seen from these figures, the vectorial relative positions of the smallest and largest eigenvalues on the complex grid also differ for DEHP compared to the other three chemicals. Whereas DEHP shows substantial vertical displacement between the smallest and largest eigenvalues, this is not the case for the other three chemicals (see Figures 6-9). The information-theoretic analyses by Basak et al.11 of the proteomics patterns of PFOA, PFDA, clofibrate, and DEHP using 10, 200, 500, and 1054 spots show DEHP to be substantially different from the other three peroxisome proliferators.
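The distance and relative orientation between the smallest and largest eigenvalues can be computed directly from a list of complex eigenvalues. The sketch below does so for the control eigenvalues listed in Table 5, since the eigenvalues for the four chemicals appear only graphically in Figures 6-9. Selecting the smallest and largest eigenvalue by complex norm is an assumption made here for illustration, as are the helper names; NumPy is assumed.

```python
import numpy as np

# Complex eigenvalues of the control matrix, copied from Table 5.
table5_eigenvalues = np.array([
    2.3705 + 0.6697j, 2.3344 + 0.7247j, 2.1922 + 0.6997j, 2.0526 + 0.6762j,
    1.9004 + 0.6774j, 1.7166 + 0.7635j, -1.5898 + 0.69j, -1.4383 + 0.683j,
    1.4541 + 0.6431j, -1.3874 + 0.7217j, -1.2018 + 0.6627j, 1.1567 + 0.684j,
    0.9577 + 0.7404j, -1.0129 + 0.614j, -0.7772 + 0.7042j, 0.6562 + 0.6222j,
    -0.5551 + 0.7053j, 0.3033 + 0.7229j, -0.0019 + 0.7539j, -0.2701 + 0.7014j,
])

def extreme_eigenvalues(eigenvalues):
    """Return (smallest, largest) eigenvalue, ranked by complex norm (assumption)."""
    norms = np.abs(eigenvalues)
    return eigenvalues[np.argmin(norms)], eigenvalues[np.argmax(norms)]

def spread_and_angle(smallest, largest):
    """Euclidian distance between two eigenvalues and the angle (degrees)
    of the displacement from the smallest to the largest."""
    delta = largest - smallest
    return abs(delta), float(np.degrees(np.angle(delta)))

smallest, largest = extreme_eigenvalues(table5_eigenvalues)
distance, angle_deg = spread_and_angle(smallest, largest)
print(smallest, largest, distance, angle_deg)
```

An angle near 0 degrees means the two eigenvalues differ almost entirely along the real (charge) axis; a substantial vertical displacement, as described for DEHP, would show up as an angle well away from 0.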

It is interesting to note that when one wants to use a large number of spots, for example, >1000 spots, to characterize a small number of maps, such as those of the four peroxisome proliferators, the number of independent variables (spots) is overwhelming. To solve this, Basak and co-workers30 have attempted various approaches to develop a small number of compact descriptors. The leading eigenvalues of the D/D matrix formulated by Randic et al.2 and the spectrum-like descriptors developed by Vracko et al.3 are examples of compact descriptors which condense the information present in the map into a few numerical parameters. As more data on the effects of more numerous chemicals on cellular/biological systems become available, the utility of such descriptors can be tested.

The complex norm of the principal eigenvalues corresponding to the four chemicals would measure the deviation from the unperturbed proteomics map. Since we have subtracted the

Table 6. Characteristic Polynomials and Eigenvalues of the Euclidian Matrix

characteristic polynomial from Euclidian matrix

1, -13.843, 57.43, -85.756, -74.542, 530.51, -1018.199, 1061.374, -545.815, -114.473, 464.774, -437.719, 251.493, -98.453, 26.24, -4.31, 0.233, 0.06, -0.086, -0.453, -9.521

eigenvalues from Euclidian matrix

8.106, -2.203, 0.886, -0.833, 0.824, 0.781, 0.689, 0.686, 0.65, 0.644, 0.533, 0.51, 0.489, 0.45, 0.435, 0.407, 0.356, 0.3, 0.285, -0.151

eigenvector of smallest principal value

0.015, 0.24, -0.275, 0.062, 0.218, -0.195, -0.212, 0.267, -0.23, 0.117, -0.212, 0.296, 0.027, -0.396, 0.224, 0.005, 0.268, 0.133, -0.357, 0.191

eigenvector of largest principal value

0.361, 0.212, 0.226, 0.171, 0.202, 0.195, 0.226, 0.218, 0.2, 0.232, 0.207, 0.229, 0.169, 0.305, 0.26, 0.165, 0.203, 0.166, 0.233, 0.197

Figure 5. Plot of the eigenvector corresponding to the largest principal eigenvalue. The real and imaginary parts are plotted on the X- and Y-axis, respectively.


control from the diagonal elements of the perturbation matrix, if there were to be no perturbation, we would have a zero eigenvalue, and thus, the deviation from the zero value measures the perturbation by the chemical to the proteome. That is to say, the chemical whose principal eigenvalue has the greatest norm causes the largest perturbation, while the one whose principal eigenvalue has the smallest norm causes the least perturbation. One may recall that the norm of a complex variable is the square root of the sum of the squares of its real and imaginary parts. On the basis of this, we find that the four chemicals considered here have the norms 2.50, 2.98, 3.03, and 3.17 for PFOA, clofibrate, PFDA, and DEHP, respectively. This suggests that PFOA exerts the least perturbation on the proteome among the four chemicals or is the least toxic among them. This conclusion is consistent with the one arrived at by Randic et al. with their D/D matrix method.2
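The norm-based ranking described above amounts to the following short Python sketch. The four norms are the values reported in the text; the function and variable names are hypothetical, and the last two lines simply apply the norm formula to the control's principal eigenvalue from Table 5 as an example.

```python
import math

def complex_norm(z):
    """Square root of the sum of squares of the real and imaginary parts."""
    return math.sqrt(z.real ** 2 + z.imag ** 2)

# Norms of the principal eigenvalues as reported in the text.
reported_norms = {"PFOA": 2.50, "clofibrate": 2.98, "PFDA": 3.03, "DEHP": 3.17}

# Rank the chemicals from least to greatest perturbation.
ranking = sorted(reported_norms, key=reported_norms.get)
print(ranking)

# Example of the formula: principal eigenvalue of the control (Table 5).
control_principal = complex(2.3705, 0.6697)
print(complex_norm(control_principal))
```

The function `complex_norm` is equivalent to Python's built-in `abs` applied to a complex number; it is written out only to mirror the definition given in the text.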

However, we find that clofibrate, PFDA, and PFOA all have similar perturbations, but DEHP stands out as being the most toxic and most contrasting in the complex eigenspectra. Randic et al.2 have obtained the result that clofibrate is the most toxic on the basis of the D/D matrix approach. It is interesting, however, that in both cases PFOA stands out as being the least toxic, and in our approach, the contrast among clofibrate, PFDA, and PFOA is smaller, whereas DEHP stands out.

4. Conclusions

We have developed graph-theoretical complex matrix representations of the 2D density of gel spots obtained from the cell proteome via 2D electrophoresis/chromatography. In this method, a graph is obtained by connecting the gel spots of the proteomics map in the order of their relative abundance, and the diagonal elements of its adjacency matrix are weighted by a complex variable. The complex weight assigned to each vertex corresponds to its charge for the real part and its mass for the imaginary part. In this manner, both the charge and mass information of the proteins comprising the proteomics map have been considered. We have shown that the eigenspectra of the complex matrix and its principal eigenvector yield important insight into the proteome. The principal eigenvalue and the principal eigenvector seem to provide novel complex descriptors of the proteome.

Figure 6. Complex spectral map and complex principal eigenvector of clofibrate on the rat liver cell. The real and imaginary parts are plotted on the X- and Y-axis, respectively. Square, smallest principal value; triangle, largest principal value.

Figure 7. Complex spectral map and complex principal eigenvector of DEHP on the rat liver cell. The real and imaginary parts are plotted on the X- and Y-axis, respectively. Square, smallest principal value; triangle, largest principal value.

The perturbations caused to the cell by four peroxisome proliferators, namely, clofibrate, DEHP, PFDA, and PFOA, have been modeled by complex variable proteomics graphs. The complex eigenspectral maps and the map of the principal eigenvector were shown to characterize the perturbations caused by these chemicals. We have used the norm of the principal eigenvalue as a descriptor of the extent of toxicity, which seems to be in accord with experiment. On the basis of the norm of the principal eigenvalue, it was shown that PFOA causes the least toxicity or perturbation to the cell, while PFDA, DEHP, and clofibrate cause comparable perturbations, with DEHP causing the greatest perturbation among these. On the basis of the proteomics maps, it was shown that DEHP stands out as having a different eigenspectral map compared to those of the other three chemicals, namely, PFOA, clofibrate, and PFDA, which are mutually similar. The largest and smallest eigenvalues of DEHP show not only the greatest distance but also substantial angular variation compared to the other three chemicals. Both eigenvalues exhibit very little variation along the imaginary component for PFOA, clofibrate, and PFDA, whereas there is a large displacement along the vertical direction or imaginary axis in the case of DEHP. This vectorial feature and variation can only be characterized by a complex representation such as the one we have considered here.

Figure 8. Complex spectral map and complex principal eigenvector of PFDA on the rat liver cell. The real and imaginary parts are plotted on the X- and Y-axis, respectively. Square, smallest principal value; triangle, largest principal value.

Figure 9. Complex spectral map and complex principal eigenvector of PFOA on the rat liver cell. The real and imaginary parts are plotted on the X- and Y-axis, respectively. Square, smallest principal value; triangle, largest principal value.


While these approaches seem to provide principal eigenvalues and eigenvectors for the proteomics maps and for the perturbations induced by chemicals upon the cell, there is considerable room to generalize these methods. For example, at present, our approach considers only the mass and charge of each gel spot, but there is more information on each spot, such as the amino acid sequence and properties of the protein in each spot. Mathematical characterization of such latent information is far more complex than what we have considered here. Such studies could be topics of future investigations.

Acknowledgment. The research at California State University East Bay was supported by the National Science Foundation under Grant No. CHE-0236434. The work at LLNL was performed in part under the auspices of the U.S. Department of Energy by the University of California, LLNL under contract number W-7405-Eng-48. The work at NRRI was supported by Grant F49620-01-1-0098 from the United States Air Force Office of Scientific Research. The authors extend their thanks to Brian Gute, Natural Resources Research Institute of UMD, Duluth, for insightful comments.

References

(1) Randic, M.; Lers, N.; Plavsic, D.; Basak, S. C. J. Proteome Res. 2004, 3, 778-785.

(2) Randic, M.; Witzmann, F.; Vracko, M.; Basak, S. C. Med. Chem. Res. 2001, 10, 456-479.

(3) Vracko, M.; Basak, S. C. Chemometr. Intell. Lab. Syst. 2004, 70, 33-38.

(4) Blackstock, W. P.; Weir, M. P. Trends Biotechnol. 1999, 17, 121-127.

(5) Cutler, P.; Birrell, H.; Haran, M.; Man, W.; Neville, B.; Rosier, S.; Skehel, M.; White, I. Biochem. Soc. Trans. 1999, 27, 555-559.

(6) O'Farrell, P. Z.; Goodman, H. M.; O'Farrell, P. H. Cell 1977, 12, 1133-1141.

(7) Klose, J.; Kobalz, U. Electrophoresis 1995, 16, 1034-1059.

(8) Anderson, N. L.; Taylor, J.; Hofmann, J. P., et al. Toxicol. Pathol. 1996, 24, 72-76.

(9) Appel, R. D.; Hochstrasser, D. F. Methods Mol. Biol. 1999, 112, 363-381.

(10) Guo, X.; Randic, M.; Basak, S. C. Chem. Phys. Lett. 2001, 350, 106-112.

(11) Basak, S. C.; Gute, B. D.; Witzmann, F. WSEAS Trans. Inf. Sci. Appl. 2005, 2, 996-1001.

(12) Vracko, M.; Basak, S. C. Chemometr. Intell. Lab. Syst. 2004, 70, 33-38.

(13) Randic, M.; Novic, M.; Vracko, M. J. Chem. Inf. Model. 2005, 45, 1205-1213.

(14) Randic, M.; Zupan, J.; Balaban, A. T. Chem. Phys. Lett. 2004, 397, 247-252.

(15) Randic, M.; Vracko, M.; Nandy, A.; Basak, S. C. J. Chem. Inf. Comput. Sci. 2000, 40, 1235-1244.

(16) Randic, M.; Razinger, M. On characterization of 3D molecular structure. In From Chemical Topology to Three-Dimensional Geometry; Balaban, A. T., Ed.; Plenum Press: New York, 1977; pp 159-236.

(17) Bytautas, L.; Klein, D. J.; Randic, M.; Pisanski, T. Foldedness in linear polymers: A difference between graphical and Euclidean distances. In Discrete Mathematical Chemistry; Hansen, P., Fowler, P. W., Zheng, M., Eds.; DIMACS Series in Discrete Mathematics and Theoretical Computer Science; American Mathematical Society: Providence, RI, 2000; Vol. 51, pp 39-61.

(18) Witzmann, F. Molecular Anatomy Laboratory, Department of Biology, Indiana University and Purdue University, Columbus, IN 47203.

(19) Randic, M.; Krilov, G. Int. J. Quantum Chem. 1999, 75, 1017-1026.

(20) Randic, M. J. Chem. Inf. Comput. Sci. 1995, 35, 373-382.

(21) Anderson, N. L. Two-Dimensional Electrophoresis: Operation of the ISO-DALT System; Large Scale Biology Press: Washington, DC, 1991.

(22) Neuhoff, V.; Arold, N.; Taube, D.; Ehrhardt, W. Electrophoresis 1988, 9, 255-262.

(23) Anderson, N. L.; Giere, F. A.; Nance, S. L.; Gemmell, M. A.; Tollaksen, S. L.; Anderson, N. G. Fundam. Appl. Toxicol. 1987, 8, 39-50.

(24) Balasubramanian, K. Theor. Chim. Acta 1984, 65, 49-58.

(25) Balasubramanian, K. J. Comput. Chem. 1984, 5, 387-394.

(26) Balasubramanian, K. J. Comput. Chem. 1988, 9, 204-211.

(27) Balasubramanian, K. J. Comput. Chem. 1990, 11, 828-836.

(28) Balasubramanian, K. Chem. Phys. Lett. 1995, 232, 415-423.

(29) Hawkins, D. M.; Basak, S. C.; Kraker, J.; Geiss, K. T.; Witzmann, F. A. J. Chem. Inf. Model. 2006, 46, 9-16.

(30) Bajzer, Z.; Randic, M.; Plavsic, D.; Basak, S. C. J. Mol. Graphics Modell. 2003, 22, 1-9.

PR050445S
