

Remove duplicate probe IDs

Select primary GO annotation; determine full GO ancestry for each probe ID

Enumerate n, f, g, c for each ontology

Calculate solution of hypergeometric equation for each ontology

$$P = 1 - \sum_{i=0}^{n-1} \frac{\binom{f}{i}\binom{g-f}{c-i}}{\binom{g}{c}}$$

Rank order ontologies based on p values
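The hypergeometric step can be sketched in a few lines of Python. This is a minimal illustration, not the CLASSIFI implementation: the function name `classifi_p_value` is hypothetical, and exact rational arithmetic is an assumption made here to avoid round-off when subtracting from 1. The symbols g, f, c and n follow the definitions in the Figure 1 legend, and P is one minus the probability of seeing fewer than n probes with the ontology in a cluster of c probes.

```python
from fractions import Fraction
from math import comb


def classifi_p_value(g: int, f: int, c: int, n: int) -> float:
    """Upper-tail hypergeometric probability.

    g: probes in the entire data set
    f: probes with the ontology in the entire data set
    c: probes in the gene cluster
    n: probes with the ontology in the gene cluster
    Returns P = 1 - sum_{i=0}^{n-1} C(f, i) * C(g - f, c - i) / C(g, c).
    """
    if not (0 <= f <= g and 0 <= n <= c <= g):
        raise ValueError("require 0 <= f <= g and 0 <= n <= c <= g")
    total = comb(g, c)
    # Number of ways to draw fewer than n annotated probes into the cluster.
    below = sum(comb(f, i) * comb(g - f, c - i) for i in range(n))
    return float(1 - Fraction(below, total))


# Cluster 7 (GO:0005773, vacuole) from the Figure 2 table: g=2490, f=12, c=11, n=4.
p_vacuole = classifi_p_value(g=2490, f=12, c=11, n=4)
print(f"{p_vacuole:.2e}")  # compare with the prob column of the table
```

Ranking the ontologies within a cluster then amounts to sorting them by this value in ascending order.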

Panel labels: Included on microarray; Not included on microarray. Ligands: anti-IgM, CD40L, LPS.

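The last column lists the nonzero +1/-1 calls in CD40L, LPS, anti-IgM column order; empty (no change) cells did not survive extraction, so for clusters with fewer than three values the column positions are not recoverable.

| Cluster ID | GO id | g | f | c | n | expt | prob | GO type | GO name | CD40L / LPS / anti-IgM calls |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GO:0005634 | 2490 | 254 | 380 | 64 | 38.76 | 8.71E-06 | CC | nucleus | 1, 1, 1 |
| 2 | GO:0009058 | 2490 | 141 | 180 | 25 | 10.19 | 1.36E-05 | BP | biosynthesis | 1, 1 |
| 3 | GO:0008372 | 2490 | 1409 | 1 | 1 | 0.566 | 5.66E-01 | CC | CC unknown | 1, -1, -1 |
| 4 | GO:0046072 | 2490 | 2 | 160 | 2 | 0.129 | 4.10E-03 | BP | dTDP metabolism | 1, 1 |
| 5 | GO:0009605 | 2490 | 30 | 3 | 2 | 0.036 | 4.18E-04 | BP | response to external stimulus | 1, -1 |
| 6 | GO:0016655 | 2490 | 14 | 331 | 11 | 1.861 | 4.94E-08 | MF | oxidoreductase activity, acting on NADH or NADPH, quinone or similar compound as acceptor | 1 |
| 7 | GO:0005773 | 2490 | 12 | 11 | 4 | 0.053 | 1.00E-07 | CC | vacuole | -1, -1, 1 |
| 8 | GO:0003779 | 2490 | 15 | 277 | 9 | 1.669 | 6.30E-06 | MF | actin binding | -1, -1, -1 |
| 9 | GO:0016758 | 2490 | 6 | 245 | 4 | 0.59 | 1.17E-03 | MF | transferase activity, transferring hexosyl groups | -1, -1 |
| 10 | GO:0006417 | 2490 | 2 | 4 | 1 | 0.003 | 3.21E-03 | BP | regulation of protein biosynthesis | -1, 1 |
| 11 | GO:0008372 | 2490 | 1409 | 41 | 32 | 23.2 | 3.30E-03 | CC | CC unknown | -1, -1 |
| 12 | GO:0008047 | 2490 | 10 | 160 | 3 | 0.643 | 2.23E-02 | MF | enzyme activator activity | -1 |
| 13 | GO:0006397 | 2490 | 19 | 56 | 4 | 0.427 | 6.92E-04 | BP | mRNA processing | 1, 1 |
| 14 | GO:0005576 | 2490 | 156 | 183 | 33 | 11.47 | 7.28E-09 | CC | extracellular | 1 |
| 15 | GO:0046916 | 2490 | 1 | 4 | 1 | 0.002 | 1.61E-03 | BP | transition metal ion homeostasis | -1, 1 |
| 16 | GO:0003931 | 2490 | 3 | 38 | 2 | 0.046 | 6.74E-04 | MF | Rho small monomeric GTPase activity | -1, -1 |
| 17 | GO:0004032 | 2490 | 3 | 188 | 3 | 0.227 | 4.24E-04 | MF | aldehyde reductase activity | -1 |
| 18 | GO:0015672 | 2490 | 10 | 191 | 7 | 0.767 | 1.38E-06 | BP | monovalent inorganic cation transport | 1 |
| 19 | GO:0016892 | 2490 | 3 | 50 | 2 | 0.06 | 1.17E-03 | MF | endoribonuclease activity, producing other than 5'-phosphomonoesters | -1 |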

| Probe ID | LocusLink ID | Gene Name | Function |
|---|---|---|---|
| C530010I21 | 114143 | Atp6v0b ATPase, H+ transporting, V0 subunit B | vesicle acidification |
| 2310069H14 | 66290 | Atp6g1 ATPase, H+ transporting, V1 subunit G isoform 1 | vesicle acidification |
| 1700025B18 | 66335 | Atp6v1c1 vacuolar ATP synthase subunit C | vesicle acidification |
| 5730403E06 | 70495 | Atp6ip2 similar to ATPase, H+ transporting, lysosomal interacting protein 1) | vesicle acidification |
| 1810018O03 | | vacuolar ATP synthase subunit C homolog | vesicle acidification |
| 1700063K16 | 76610 | similar to vacuolar ATP synthase subunit F | vesicle acidification |
| 1500039N14 | 108124 | Napa (SNARE) N-ethylmaleimide sensitive fusion protein attachment protein alpha | vesicle trafficking |
| 3010014K12 | 108124 | Napa (SNARE) N-ethylmaleimide sensitive fusion protein attachment protein alpha | vesicle trafficking |
| 0910001N05 | 69178 | Snx5 sorting nexin 5 | vesicle trafficking |
| 2010015D08 | 56433 | Vps29 vacuolar protein sorting 29 (S. pombe) | vesicle trafficking |
| 2310021D14 | 26373 | Clcn7 chloride channel 7 | vesicle pH |
| 5430413F24 | 56382 | Rab9 RAB9, member RAS oncogene family | vesicle trafficking |
| H3020C05 | 19330 | Rab18 RAB18, member RAS oncogene family | vesicle trafficking |
| 3732413A17 | 19334 | Rab22 RAB22, member RAS oncogene family | vesicle trafficking |
| 5430417M23 | 56208 | Becn1 beclin 1 (coiled-coil, myosin-like BCL2-interacting protein) | apoptosis |
| 2410021B16 | 30954 | Siva-pending Cd27 binding protein (Hindu God of destruction) | apoptosis |
| 618272 | 12363 | Casp4 caspase 4, apoptosis-related cysteine protease | apoptosis |
| 4933428M04 | 66593 | Diablo similar to SMAC precursor (caspase activator) | apoptosis |
| 5730436C18 | 50912 | Pmscl2 polymyositis/scleroderma autoantigen 2 | autoimmunity |
| 1500016H19 | 56390 | Sssca1 Sjogren's syndrome/scleroderma autoantigen 1 homolog | autoimmunity |

Molecular components of B cell antigen receptor-mediated endocytosis revealed by CLASSIFI: a tool for functional classification of microarray gene clusters

Jamie A. Lee†, Robert Sinkovits§††, Dennis Mock§††, Eva Rab†, Jennifer Cai†, Peng Yang†, Brian Saunders§††, Robert C. Hsueh‡††, Sangdun Choi||††, Tamara I. A. Roach*††, Shankar Subramaniam§¶††, and Richard H. Scheuermann†§††

†Department of Pathology, Laboratory of Molecular Pathology and ‡Department of Pharmacology, University of Texas Southwestern Medical Center, Dallas, Texas 75390; §San Diego Supercomputer Center and ¶Department of Bioengineering, University of California, San Diego, California 92122; ||Division of Biology, California Institute of Technology, Pasadena, CA; *San Francisco Veterans Administration Medical Center, San Francisco, CA; ††Alliance for Cellular Signaling

Abstract

Antigen recognition by B lymphocytes leads to a complex series of phenotypic changes that help orchestrate the immune response to infection. Gene expression microarrays provide excellent tools to identify the genes that control these phenotypic changes. One approach to the analysis of microarray data is to group genes together into gene clusters based on their similarity in expression patterns in comparison with the experimental variables. To understand the biological significance of these gene clusters, we developed CLASSIFI (CLuster ASSIgnment For biological Inference), a bioinformatics tool that classifies gene clusters based on the probability of co-clustering of probes with similar gene ontology annotation. We applied CLASSIFI to an Alliance for Cellular Signaling (AfCS) data set that examines the in vitro responses of B cells to stimulation with three ligands that cause a strong proliferative response: CD40L, LPS, and anti-IgM. CLASSIFI analysis revealed an overrepresentation of gene ontologies related to intracellular transport, including genes involved in endocytosis and vesicle acidification such as ATPase H+ pump subunits and SNARE-related genes, in the cluster of genes that are upregulated only in response to anti-IgM. Based on this gene expression data analysis, we hypothesized that anti-IgM, unlike CD40L and LPS, specifically stimulates a complex biological process that includes endocytosis, endosome acidification, vesicle fusion and vesicle transport. The predicted effect of these ligands on receptor endocytosis has been verified experimentally.

Figure 1. Experimental methodology and analysis of microarray data. B cells were negatively selected from mouse spleens using anti-CD43-coated magnetic beads and cultured for 4 h in the presence or absence of anti-IgM, LPS or anti-CD40. Fluorescently labeled cRNA prepared from these cells was mixed with a reference cRNA (from total spleen) and hybridized to a custom spotted cDNA microarray. A. Fluorescence values were filtered to remove features too close to background and normalized to the spleen reference. Significance Analysis of Microarrays (SAM) was used to identify genes whose expression was significantly different between untreated and treated conditions. Genes that were significantly upregulated or downregulated were assigned values of "+1" or "-1", respectively. The genes were clustered together based on the categorical expression patterns, and analyzed using CLASSIFI. B. The steps involved in CLASSIFI analysis of clustered microarray data are detailed. g = number of probes in the entire data set, c = number of probes in a specific gene cluster, f = number of probes with a given ontology in the entire data set, n = number of probes with a given ontology in the specific gene cluster.
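Grouping genes by their categorical expression patterns can be illustrated with a short Python sketch. It assumes that probes with an identical tuple of +1/0/-1 calls form one cluster and that probes unchanged under all three ligands are left out; both are assumptions made for illustration, and the probe names and the helper `cluster_by_pattern` are hypothetical.

```python
from collections import defaultdict

# Order of the calls in each pattern tuple.
LIGANDS = ("CD40L", "LPS", "anti-IgM")


def cluster_by_pattern(calls):
    """Group probe IDs by their tuple of +1 / 0 / -1 calls.

    Probes whose calls are all 0 (no significant change under any ligand)
    are skipped; this is an assumption of the sketch.
    """
    clusters = defaultdict(list)
    for probe_id, pattern in calls.items():
        if any(pattern):
            clusters[tuple(pattern)].append(probe_id)
    return dict(clusters)


# Hypothetical calls, one value per ligand in LIGANDS order.
calls = {
    "probe_a": (0, 0, 1),    # up with anti-IgM only
    "probe_b": (0, 0, 1),
    "probe_c": (1, 1, 1),    # up with all three ligands
    "probe_d": (0, 0, 0),    # unchanged, dropped
    "probe_e": (-1, -1, 0),  # down with CD40L and LPS
}

clusters = cluster_by_pattern(calls)
for pattern, probes in clusters.items():
    print(dict(zip(LIGANDS, pattern)), probes)
```

With three ligands and three possible calls there are at most 26 non-empty patterns; each cluster's size gives c, and the total number of probes analyzed gives g.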

Figure 2. Clustering and CLASSIFI results for data from 3 ligands. Clustering of categorical data from B cells stimulated with CD40L, LPS, and anti-IgM results in 19 gene clusters. Red = upregulated. Green = downregulated. Black = no change. Following CLASSIFI analysis, the gene ontology with the lowest p value in each gene cluster is listed. GO id = a unique Gene Ontology identifier that corresponds to a defined molecular function (MF), biological process (BP), or cellular component (CC). Expt = the expected number of occurrences of a given GO id in a given cluster of size c based on a random distribution (f × c / g). Prob = the probability that the GO id co-cluster pattern has occurred by chance.
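The expected count can be checked directly against the table. Under a random distribution, a cluster of c probes drawn from g probes, f of which carry the ontology, is expected to contain f × c / g annotated probes. The sketch below recomputes this for three rows copied from the table; the function names are hypothetical, and the fold enrichment n / expt is an extra illustration rather than a column of the original table.

```python
def expected_count(g: int, f: int, c: int) -> float:
    """Expected number of annotated probes in a cluster of size c."""
    return f * c / g


def fold_enrichment(g: int, f: int, c: int, n: int) -> float:
    """Observed count divided by expected count (illustrative only)."""
    return n / expected_count(g, f, c)


# (cluster ID, GO id, g, f, c, n) copied from the Figure 2 table.
rows = [
    (1, "GO:0005634", 2490, 254, 380, 64),
    (14, "GO:0005576", 2490, 156, 183, 33),
    (18, "GO:0015672", 2490, 10, 191, 7),
]

for cluster_id, go_id, g, f, c, n in rows:
    expt = expected_count(g, f, c)
    print(cluster_id, go_id, f"expt={expt:.3f}", f"fold={fold_enrichment(g, f, c, n):.1f}")
```

For Gene Cluster 18, 7 observed probes against an expectation of under one is what drives the low p value for monovalent inorganic cation transport.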

Figure 3. Intracellular transport-related genes in Gene Cluster 18. Selection of genes found in Gene Cluster 18 with functions related to endosome acidification, transport and fusion.

Figure 4. Coordinate expression of endocytosis genes. Since the microarray methodology is inherently "noisy", it is important to verify the expression pattern of potentially interesting genes by a parallel methodology. The ligand-specific expression pattern of four genes found in Gene Cluster 18 was verified by real-time PCR analysis (left side). Based on the CLASSIFI analysis, we hypothesized that anti-IgM might also induce the upregulation of other genes involved in endosome processing. Indeed, anti-IgM was found to increase the mRNA levels of four other components of the ATPase H+ pump, while CD40L and LPS did not (right side).

Figure 5. Internalization through the B cell antigen receptor (BCR). WEHI-231 cells were treated with a non-stimulating anti-IgM mAb conjugated to FITC. Polyclonal anti-IgM was then added to stimulate the cells. Following a 1-hour stimulation, cells were harvested and washed with acid to remove surface-bound antibody. Cells that were washed with acid (dotted lines) or not (solid lines) were compared to unstimulated cells (black lines). Anti-IgM, but not CD40L or LPS, stimulates internalization of the BCR, as predicted from the gene expression data.

Conclusions

• We have applied CLASSIFI, a tool for functional classification of microarray gene clusters, to a microarray data set comparing B cell responses to CD40L, LPS, and anti-IgM.
• CLASSIFI analysis reveals significant co-clustering of gene ontologies related to intracellular transport in Gene Cluster 18, which contains genes that are upregulated specifically in response to anti-IgM.
• Several genes within Gene Cluster 18 are related to various aspects of endosome internalization, acidification and trafficking, including ATPase H+ pump subunits and SNARE-related genes, leading to the hypothesis that activation of B cells through the antigen receptor induces endocytosis, antigen processing and presentation.
• ATPase H+ pump subunit genes that were not included on the microarray were also found to be upregulated in a ligand-specific manner, indicating that genes involved in the same biological process are coordinately expressed.
• Anti-IgM stimulates ligand-specific receptor internalization, indicating that CLASSIFI analysis is useful in predicting biological responses to ligand stimulation from the gene expression data analysis.

Figure 1A workflow: Raw Data → Basic filtering → Normalization → Statistical filtering → Correlation clustering → CLASSIFI