
Received 3 February 2009; Accepted 6 July 2009

Central European Journal of Biology

Unit of Research in Molecular Biology (URBM), University of Namur (F.U.N.D.P.)

B-5000 Namur, Belgium

PHOENIX, a web interface for (re)analysis of microarray data

Fabrice Berger&, Benoît De Hertogh&, Michaël Pierre, Eric Bareke, Anthoula Gaigneaux, Eric Depiereux*

Abstract: Microarrays are tools to study the expression profile of an entire genome. Technology, statistical tools and biological knowledge in general have evolved over the past ten years, and it is now possible to improve the analysis of previous datasets. We have developed a web interface called PHOENIX that automates the analysis of microarray data, from preprocessing to the evaluation of significance, through manual or automated parameterization. At each analytical step, several methods are available for the (re)analysis of data. PHOENIX evaluates a consensus score from several methods and thus determines the performance level of the best methods (even if the best-performing method is not known). With an estimate of the true gene list, PHOENIX can evaluate the performance of methods or compare the results with other experiments. Each method used for differential expression analysis and performance evaluation has been implemented in the PEGASE back-end package, along with additional tools to further improve PHOENIX. Future developments will involve the addition of steps (CDF selection, geneset analysis, meta-analysis), methods (PLIER, ANOVA, Limma), benchmarks (spike-in and simulated datasets), and illustration of the results (automatically generated report).

© Versita Warsaw and Springer-Verlag Berlin Heidelberg.

Keywords: Affymetrix • Benchmark • Consensus • Differential expression • Microarray • Regularized t-test • SAM • Statistical analysis • Web interface • Window t-test

* E-mail: [email protected]

& These authors contributed equally to this work

Research Article

1. Introduction

1.1 Objectives

The main objective of this work was to develop an online interface for the analysis of microarray data. This interface was designed to be user-friendly for non-specialists as well as easy to adjust (parameterize) for specialists. Another of our goals was to offer a wide range of robust, high-performing analysis methods that are easy to use and parameterize. Furthermore, we aimed to create an interface that would provide access to a repository of public datasets and analysis results. Other features were included to facilitate the use of high-performance tools and the comparison of results from a panel of methods. Lastly, we added a global analytical step using a robust consensus method. The first public version of the interface, called PHOENIX, includes tools and procedures for preprocessing, statistical analysis and evaluation. PHOENIX is available at http://urbm-cluster.urbm.fundp.ac.be/phoenix and PEGASE (the R back-end package) can be downloaded from the web interface.

1.2 State of the art

In this post-genomics era, new technologies based on genome-scale analyses generate extremely large datasets from which biostatisticians must extract specific information. Microarray experiments have made it possible to study gene expression profiles, but significance analyses are typically limited by the small number of experimental runs. This restriction, combined with the large number of genes studied simultaneously, leads to many false positives and false negatives. Differential expression studies must therefore separate biological information from the noise inherent to the technology itself and to specific experimental designs. In this difficult context, biologists may find it hard to choose the best analytical strategy, especially since a complete analysis requires at least three steps (preprocessing, statistical analysis and post-analysis) and a wide range of tools is available for each of them. Moreover, developing a web interface that automates the analytical procedure follows naturally from the current trend towards re-analysis and meta-analysis of old data, as new tools and techniques for statistical analysis become available and the body of biological knowledge keeps growing.

Cent. Eur. J. Biol. • 4(4) • 2009 • 603–618 • DOI: 10.2478/s11535-009-0055-8

1.3 Background

1.3.1 Online databases and analytical tools

Biostatisticians and biologists alike would benefit from a tool that facilitates the re-analysis of old datasets, homogeneous analyses (same preprocessing, same treatment) of several datasets, and meta-analysis of old and/or new data. Our work began with a survey of the available online databases and analysis tools; this collection is available on the PHOENIX website. The survey identified (i) a large set of databases (often specialized) that can be used to view previous results, (ii) a large set of websites offering free or paid access to tools for storing data analyses, (iii) a smaller set of raw data repositories and (iv) a very small set of sites offering free online data analysis. It also revealed the lack of an interface combining data storage, data analysis (with storage of the results) and performance evaluation of the included tools.

Table 1 compares PHOENIX with the main free online tools for the analysis of one-color microarray data. Downloadable tools were excluded from this table for two reasons. First, they often require technical skills that a web interface makes unnecessary. Second, the purpose of this paper is to propose the PHOENIX interface (relying on the back-end PEGASE R package) as an alternative to the use of existing packages. Cyber-T [1], Expression Profiler [2], GEMS [3] and NIA [4] are essentially web tools dedicated to one analytical method. Cyber-T uses simple or regularized t-tests. Expression Profiler compares and displays the relationships between different clustering results. The NIA Array Analysis tool uses the false discovery rate to test for statistical significance and principal component analysis to analyze gene expression patterns. Each offers additional features, but none compares with GEPAS [5], MIDAW [6] or PHOENIX, which allow the use of several low- and high-level analytical tools. GEPAS and MIDAW also offer further analyses and post-analyses (clustering, viewing, supervised classification, and data mining and analysis for GEPAS; boxplots, density estimation, cluster analysis, principal component and partial least-squares analysis for MIDAW). Although PHOENIX does not include these features, it is the only tool that allows both consensus analyses and benchmarking. To our knowledge, M@CBETH is the only other online benchmarking tool, but it is dedicated solely to this task.

Table 1. Comparison of the main free online tools for the analysis of one-color microarray data.

               Low-level analyses              High-level analyses             Others
Tool           One-color  Two-color  Methods   One-color  Two-color  Methods   Other/post  Consensus  Benchmark
Cyber-T        yes        no         one       yes        no         Cyber-T   yes         no         no
Exp. profiler  yes        no         many      yes        no         t-test    yes         no         no
GEMS           yes        no         one       yes        no         GEMS      yes         no         no
GEPAS          yes        yes        many      yes        yes        many      yes         no         no
M@CBETH        -          -          -         -          -          -         no          no         yes
MIDAW          yes        yes        many      yes        yes        PAM       yes         no         no
NIA            yes        yes        one       yes        yes        ANOVA     yes         no         no
PHOENIX        Affymetrix no         many      yes        no         many      no          yes        yes

1.3.2 Preprocessing

Microarray experiments are carried out to identify differentially expressed genes. However, many other factors may cause variation in microarrays (probe labeling efficiency, RNA concentration, hybridization efficiency, etc.). The goal of preprocessing, also called low-level analysis, is to eliminate such undesirable variation through procedures such as background correction, normalization and, on occasion, perfect match/mismatch (PM/MM) correction. After preprocessing, the data can be analyzed statistically, at least at the probe level; for an analysis at the probeset level, the probe-level information must also be summarized. Numerous preprocessing tools are available for this purpose. The preprocessing methods offered in our interface were selected for their widespread use and the good results they obtained in several benchmarks [7-10]. RMA (Robust Multichip Average) can be broken down into three steps: background adjustment (in which the observed signal is modeled as the sum of a normally distributed background and an exponentially distributed signal), quantile normalization and median-polish summarization of probe-level Affymetrix microarray data. This method fits a robust linear model at the probe level but requires large amounts of RAM [11]. GCRMA (GC Robust Multi-array Average) is a background correction that uses the GC content of probes as supplementary information; the Bioconductor [12] gcrma package performs GCRMA background adjustment, quantile normalization and median-polish summarization of probe-level Affymetrix microarray data [13]. MAS 5.0 software is supplied with the Affymetrix system. After quantification of a chip, this algorithm computes a signal for each probeset and normalizes the data across all arrays [14]. Expresso is a function of the Bioconductor affy package that combines the different steps of these methods (and of a fourth method, dCHIP) [15,16]. Moreover, with Expresso the analysis can be stopped before the summarization step in order to perform statistical analyses at the probe level.
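The two RMA steps that follow background adjustment can be sketched in a few lines. The Python below is a minimal, stdlib-only illustration, not the Bioconductor code: quantile_normalize() and median_polish() are hypothetical names, ties in quantile normalization are broken by position rather than averaged, and the input to median_polish() is assumed to be the already background-corrected, normalized and log2-transformed probe intensities of a single probeset (probes as rows, arrays as columns).

```python
from statistics import mean, median


def quantile_normalize(arrays):
    """Quantile-normalize equally long intensity vectors (one list per array).

    Each value is replaced by the mean, across arrays, of the values that share
    its rank, so every array ends up with the same empirical distribution.
    Ties are broken by position (a simplification of the real method).
    """
    n = len(arrays[0])
    sorted_cols = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value over all arrays.
    reference = [mean(col[k] for col in sorted_cols) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


def median_polish(matrix, max_iter=10, tol=1e-6):
    """Tukey median polish of a probes x arrays matrix for one probeset.

    Alternately sweeps out row (probe) and column (array) medians; the
    per-array summary is the overall effect plus the array effect.
    """
    resid = [[float(v) for v in row] for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        delta = 0.0
        # Sweep 1: remove each probe's (row) median.
        for r in range(n_rows):
            m = median(resid[r])
            resid[r] = [v - m for v in resid[r]]
            row_eff[r] += m
            delta += abs(m)
        m = median(col_eff)
        col_eff = [e - m for e in col_eff]
        overall += m
        # Sweep 2: remove each array's (column) median.
        for c in range(n_cols):
            m = median([resid[r][c] for r in range(n_rows)])
            for r in range(n_rows):
                resid[r][c] -= m
            col_eff[c] += m
            delta += abs(m)
        m = median(row_eff)
        row_eff = [e - m for e in row_eff]
        overall += m
        if delta < tol:  # nothing left to sweep out: converged
            break
    # One expression summary per array.
    return [overall + e for e in col_eff]
```

On a perfectly additive matrix such as [[5, 7], [6, 8], [7, 9]] (three probes, two arrays), median_polish() returns [6.0, 8.0]: the probe effects are absorbed into the row effects and only the array-level signal remains.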

1.3.3 Processing (differential expression analysis)

Many significance tests have been described for microarray datasets, and various approaches are currently in use. For example, one can use variants of traditional tests (based on a normal or an observed distribution), parametric and non-parametric approaches, or corrections using data from multiple genes that share a similar expression level. The Student t-test and the Welch correction are widely used for microarray data analysis [17,18], either directly or as a reference for performance comparisons, and these traditional methods have given rise to many others. Because robust estimators are less sensitive to outliers, the traditional procedures can be adapted to use the median and the median absolute deviation instead of the mean and the standard deviation.
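The classic statistics mentioned above differ only in how location, spread and pooling are estimated. The Python below is a minimal sketch for a single probeset, not PEGASE's code: the function names are hypothetical, and the 1.4826 scaling of the MAD is our assumption (it is the default constant of R's mad()) so that the robust spread is comparable to a standard deviation.

```python
from math import sqrt
from statistics import mean, median, stdev


def mad(values):
    """Median absolute deviation scaled by 1.4826 (R's default constant),
    so that it estimates the SD for normally distributed data."""
    m = median(values)
    return 1.4826 * median([abs(v - m) for v in values])


def t_statistic(x, y, location=mean, scale=stdev, welch=False):
    """Two-sample t-statistic for one probeset (x, y: replicate values).

    location/scale default to mean/SD (Student); pass median/mad for the
    robust variant. welch=True uses unpooled variances (Welch correction).
    """
    nx, ny = len(x), len(y)
    diff = location(x) - location(y)
    vx, vy = scale(x) ** 2, scale(y) ** 2
    if welch:
        return diff / sqrt(vx / nx + vy / ny)
    pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    return diff / sqrt(pooled * (1 / nx + 1 / ny))


def welch_df(x, y):
    """Welch-Satterthwaite degrees of freedom used with the Welch test."""
    ax, ay = stdev(x) ** 2 / len(x), stdev(y) ** 2 / len(y)
    return (ax + ay) ** 2 / (ax ** 2 / (len(x) - 1) + ay ** 2 / (len(y) - 1))
```

For example, t_statistic([1, 2, 3], [4, 5, 6]) and its Welch version coincide here because both groups have the same size and variance; they diverge as soon as the two variances differ.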

Baldi and Long described the Regularized t-test, which is based on a Bayesian approach applied to the Student t-test. In this approach, the variance is estimated as a weighted average of the individual variance and a background variance. The background variance, computed for the genes within a window around the mean expression level (101 probesets by default), thus acts as a correction term for each individual variance estimate [19]. We previously described the Window t-test (and the Window Welch test), in which variance estimates are computed only within a sliding window. Unlike the Regularized t-test, this method is not parameterized by a balance between individual and background variances; instead, we adjust the window size, giving greater importance to the background variance for large windows and to the individual variance for small windows [20]. Local Pooled Error is another method based on a sliding window of adaptable size [21]. Tusher's strategy, used in SAM, combines various approaches: the gene-specific standard deviation is corrected by a fudge factor derived from the distribution of individual standard deviations. This factor is computed from a given percentile of that distribution, and the same value is added to each gene-specific standard deviation. The resulting t-like statistic is compared with its expected value computed from permutations between arrays [22]. The Wilcoxon-Mann-Whitney rank-sum test and the Rank Products test are two non-parametric methods; both are based on scores (ranks) instead of expression values [23,24]. In the Wilcoxon-Mann-Whitney rank-sum test, scores are attributed in a gene-specific manner, and significance is computed from a null distribution defined by permutations between arrays; this method is the non-parametric equivalent of the Student t-test [23]. In the Rank Products approach, scores are attributed in an array-specific manner, the statistic is the product of those scores, and the null distribution is defined by permutations within each array [24]. The algorithms used in the methods discussed above were described by their authors, whose papers provide further details on the procedures.
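The variance estimates that distinguish the window-based tests can be sketched as follows. This Python is an illustration under stated assumptions, not PEGASE's R code: probesets are sorted by mean expression and each background variance is the plain average of the individual variances of the `window` nearest probesets, the window being shifted rather than truncated at the ends of the range (our assumption); window_t() pools the two per-condition background variances as in the Student t-test; and regularized_variance() weights the background with a simple pseudo-count nu0, a hypothetical parametrization whose default of 10 is arbitrary (see [19] for the exact Bayesian form).

```python
from math import sqrt
from statistics import mean, variance


def window_background(means, variances, window=101):
    """Average of the individual variances over a sliding window of `window`
    probesets with the closest mean expression (probesets sorted by mean).
    Near the ends of the expression range the window is shifted inward."""
    n = len(means)
    order = sorted(range(n), key=lambda i: means[i])
    half = window // 2
    background = [0.0] * n
    for pos, i in enumerate(order):
        lo = max(0, min(pos - half, n - window))
        hi = min(n, lo + window)
        background[i] = mean(variances[order[k]] for k in range(lo, hi))
    return background


def window_t(x_rows, y_rows, window=101):
    """Window t-statistics: each condition's variance is replaced by its
    sliding-window background, then pooled as in the Student t-test.
    x_rows, y_rows: one list of replicate values per probeset."""
    nx, ny = len(x_rows[0]), len(y_rows[0])
    mx = [mean(r) for r in x_rows]
    my = [mean(r) for r in y_rows]
    vx = window_background(mx, [variance(r) for r in x_rows], window)
    vy = window_background(my, [variance(r) for r in y_rows], window)
    stats = []
    for a, b, va, vb in zip(mx, my, vx, vy):
        pooled = ((nx - 1) * va + (ny - 1) * vb) / (nx + ny - 2)
        stats.append((a - b) / sqrt(pooled * (1 / nx + 1 / ny)))
    return stats


def regularized_variance(rows, window=101, nu0=10):
    """Weighted average of individual and window background variances;
    nu0 is a hypothetical pseudo-count giving the background its weight."""
    n = len(rows[0])
    means = [mean(r) for r in rows]
    indiv = [variance(r) for r in rows]
    bg = window_background(means, indiv, window)
    return [((n - 1) * s2 + nu0 * b) / ((n - 1) + nu0) for s2, b in zip(indiv, bg)]
```

The two limits show the trade-off described above: with window=1 the Window t-test falls back to the Student t-test, and with nu0=0 the regularized variance is the individual variance, while a very large window or nu0 pushes every estimate towards the common background.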

2. Experimental Procedures

2.1 Online databases

Because of the large number of microarray databases (more than 50) encountered during this work, we could not automate data retrieval from each of them; we therefore listed them in a separate section of the interface. Special attention was paid to databases containing Affymetrix datasets. Nevertheless, we did automate access to data from two of the largest repositories aimed at a wide audience, ArrayExpress and Gene Expression Omnibus. ArrayExpress, managed by the EBI, is a public repository for transcriptomics data that contains 6,169 experiments and 176,858 hybridizations [25]. Gene Expression Omnibus (GEO), managed by the NCBI, can be described as a gene expression/molecular abundance repository and an organized online resource to browse, query and retrieve gene expression data; GEO currently contains 278,761 samples [26]. Data from other sources must first be downloaded to a local computer and then uploaded into the interface.

2.2 Methods for preprocessing and differential expression analysis

In the current version of PHOENIX, preprocessing is only possible for Affymetrix data; data from other sources can be added at the differential expression analysis step. The methods currently available in the interface for microarray data preprocessing are RMA [11], GCRMA [13], MAS 5.0 [14] and dCHIP [16]. All preprocessing steps are performed with the Expresso function of the Bioconductor affy package [15]. All differential expression analysis methods used in the PHOENIX interface were implemented from scratch in the R language. This collection of scripts was automated and gathered in an R package called PEGASE (Performance Evaluation and Global Analysis of Significant Expression), which PHOENIX uses as its back-end. The methods currently available for differential expression analysis are the classic Student t-test [17], the Welch correction for heteroscedasticity [18] and their variants using robust estimators (median and median absolute deviation), as well as empirical Bayes methods such as the Regularized t-test [19] and SAM [22]. We also implemented the non-parametric Wilcoxon-Mann-Whitney rank-sum test [23] and the Rank Products method [24]. The methods based on an empirical relationship between variability and expression level are the Regularized t-test [19], the Local Pooled Error test (LPE-test) [21], and the Window t-test and Window Welch test [20].

2.3 PEGASE

We developed a library of functions coding for a set of algorithms in the public domain, listed in Section 2.2. Our motivation was to provide a back-end for the PHOENIX web interface and to allow users to run PEGASE on a local computer. PEGASE was developed in the R language. Our main objectives were to: (i) achieve optimal automation of the analysis, (ii) allow each parameter to be tuned (freedom of use), (iii) combine specific modules in new ways (creation of new methods), (iv) include extra information/results at each step of the analysis before completing the analytical plan and (v) offer various ways to illustrate the results for easier interpretation. These objectives, combined with our aim of a modular implementation, led to functions that allow users to compute estimators from the data, compute statistics from those estimators, compute the null distribution needed to assess significance (or use a classic one), and automate the analysis by computing common steps only once. Combining these functions in this way requires less computing power than running each methodology separately. Figure 1 illustrates how the modular functions were designed to orchestrate common and specific steps. For example, a common function computes a t-statistic from various estimators (mean or median; SD estimated with a classic estimator, within a window, or as a balance of both approaches in the Regularized t-test).

Figure 1. Schematic representation of the pegase.run() procedure. This function orchestrates the analysis of differential expression using various methods. The figure illustrates the modular approach used to optimize the global analysis by combining common and specific steps (evaluation of standard deviation, evaluation of t-statistic, Welch correction procedure, …).

The package is designed in two parts: “back-end” computing functions that generate the data, and functions that illustrate the results (tables, plots). Figure 2 represents the main steps of microarray differential expression analysis covered by PEGASE; one function is defined for each step, and each step was designed as a separate feature of the package. The organization of the functions allows users either to define parameters manually or to choose easy-to-use automated procedures. These steps are: (a) dataset preparation (removal of each probeset containing exclusively NA values under one of the conditions compared), (b) configuration of the analysis (selection of methods and analytical steps, and parameterization of the selected procedures), (c) differential expression analysis, (d) global analysis based on a “consensus” score attributed to each probeset from the p-values obtained with each methodology and (e) performance evaluation of the implemented methods when a list of truly differentially expressed genes (or a list of probesets from a previous analysis of a different dataset) is provided. PEGASE includes two functions, pegase.export and pegase.import, that store the results in, and load them from, separate CSV files. More information on PEGASE variables, input and output is provided as supplementary material. PEGASE also offers a set of functions that generate graphical outputs illustrating the data structure, estimated parameters, results and performance evaluation. These functions use the values in the PEGASE variable list and create PNG files from the computed coordinates.
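The idea of computing common steps only once can be illustrated with a small cache. This Python sketch is hypothetical and does not reproduce pegase.run() or the PEGASE variable list: four t-like methods are described as combinations of a location estimator, a scale estimator and a Welch flag, and each (estimator, condition) pair is evaluated at most once per probeset, however many methods reuse it.

```python
from math import sqrt
from statistics import mean, median, stdev


def mad(values):
    """Median absolute deviation scaled by 1.4826 (R's default constant)."""
    m = median(values)
    return 1.4826 * median([abs(v - m) for v in values])


ESTIMATORS = {"mean": mean, "median": median, "sd": stdev, "mad": mad}

# Each method = (location estimator, scale estimator, Welch correction?).
METHODS = {
    "student": ("mean", "sd", False),
    "welch": ("mean", "sd", True),
    "robust_student": ("median", "mad", False),
    "robust_welch": ("median", "mad", True),
}


def run_methods(x, y, methods=METHODS):
    """Compute several t-like statistics for one probeset, evaluating each
    (estimator, condition) pair only once and sharing it across methods.
    Returns the statistics and the log of estimators actually computed."""
    samples = {"x": x, "y": y}
    cache, computed = {}, []

    def get(name, cond):
        key = (name, cond)
        if key not in cache:
            cache[key] = ESTIMATORS[name](samples[cond])
            computed.append(key)
        return cache[key]

    nx, ny = len(x), len(y)
    results = {}
    for method, (loc, scale, welch) in methods.items():
        diff = get(loc, "x") - get(loc, "y")
        vx, vy = get(scale, "x") ** 2, get(scale, "y") ** 2
        if welch:
            se = sqrt(vx / nx + vy / ny)
        else:
            pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
            se = sqrt(pooled * (1 / nx + 1 / ny))
        results[method] = diff / se
    return results, computed
```

With the four methods above, only eight estimators are computed (four estimators for each of the two conditions) instead of sixteen, which is the kind of saving the modular design of Figure 1 aims at on a genome-wide scale.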

Figure 2. Overview of the PEGASE R package. The functions listed here orchestrate more specific functions, giving users the opportunity to analyze the data step by step, or to run part of the analysis procedure and then perform a partial analysis with various options for the next steps, instead of running a full analysis again when the common steps are identical. The global pegase() function can be used to select and parameterize each step (manually or automatically).

2.4 Consensus method

To obtain a more robust estimate of the probeset top list, the software allows the user to compute, across all selected methods, the probability that each probeset is not differentially expressed (null hypothesis). Treating the methods as independent, this probability is the product of the probabilities of accepting the null hypothesis with each selected method. The consensus p-value can thus be computed from the product of the p-values obtained with each method (Eq. 1):

$$p_i = \prod_{j} p_{i,j} \qquad \text{(Eq. 1)}$$

where i is the probeset index and j the method identifier. However, a consensus p-value obtained in this way depends on the number of methods. To allow cross-comparison between consensus p-values, the product of probabilities is first rewritten as the sum of the logarithms of the individual p-values (Eq. 2):

$$\log p_i = \sum_{j} \log p_{i,j} \qquad \text{(Eq. 2)}$$

The procedure is then made independent of the number of methods by taking the mean, instead of the sum, of the logarithms of the method-specific p-values (Eq. 3):

$$\log \mathrm{Cons.Score}_i = \frac{1}{n} \sum_{j=1}^{n} \log p_{i,j} \qquad \text{(Eq. 3)}$$

where n is the number of methods used to compute the consensus score (Cons.Score).

A second concern with the use of multiple p-value lists relates to their distributions. Each method provides a list of p-values, but these p-values are not computed in the same way, so each list has its own distribution and range, even when the results of the methods are close. P-values must therefore be corrected so that the multiple distributions cover the same range of values. An additional FDR correction step could address this concern; as far as we know, however, this approach does not completely solve the problem, and various studies aim to improve FDR correction. Instead, we propose to attribute to each probeset a rank based on its position in the sorted sequence of p-values of each method. When our consensus approach is applied to these ranks (divided by the total number of probesets so that they lie in the 0-1 range), every list is given a common range, which removes the issue of multiple ranges. With ranks, the values obtained no longer represent a probability but a score between 0 and 1 that characterizes the position of the probeset in the list. The sorted sequence of consensus scores thus represents the expected ordering of probesets from likely differentially expressed (close to 0) to not differentially expressed (close to 1).

The last concern that we wanted to address with this consensus approach was the ability to tune the weight of each method in the final score. For example, the Regularized t-test and Window t-test are preferred over the Student t-test when only a few samples are available, and Welch procedures are more likely to give good results when the individual variances differ between the two samples tested. We therefore adapted the consensus to compute a score that accounts for the methods expected to perform best given intrinsic properties of the dataset (Eq. 4):

$$\log \mathrm{Cons.Score}_i = \frac{\sum_{j} W_j \log \mathrm{Score}_{i,j}}{\sum_{j} W_j} \qquad \text{(Eq. 4)}$$

where Score is either the p-value or the ratio rank/number of probesets, and W_j is the method-specific weight defined by the user.
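Equations 3 and 4 amount to a (weighted) geometric mean of the per-method scores. The Python sketch below assumes one list of p-values or rank scores per method, aligned on the same probesets; rank_scores() and consensus() are hypothetical names, ties are broken by input order rather than averaged, and zero scores must be avoided because their logarithm is undefined (rank scores are never zero, since the smallest is 1/N).

```python
from math import exp, log


def rank_scores(pvalues):
    """Replace p-values by rank / number of probesets, in (0, 1].
    Ties are broken by input order in this sketch."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = rank / n
    return scores


def consensus(score_lists, weights=None):
    """Weighted geometric mean across methods (Eq. 4; Eq. 3 when all W_j = 1).

    score_lists: one list per method, each holding a p-value or rank score
    per probeset. All scores must be > 0 because their logarithm is taken.
    """
    if weights is None:
        weights = [1.0] * len(score_lists)
    total = sum(weights)
    n_probesets = len(score_lists[0])
    return [
        exp(sum(w * log(s[i]) for w, s in zip(weights, score_lists)) / total)
        for i in range(n_probesets)
    ]
```

Feeding the rank_scores() of each method's p-value list into consensus() gives the rank-based score described above, and setting a weight W_j to 0 simply excludes method j from the consensus.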

2.5 Design of the interface

The PHOENIX web interface was built with PHP, JavaScript and HTML. PHP is a general-purpose scripting language that is especially well suited to web development and can be embedded into HTML (http://www.php.net/). HTML is the publishing language of the World Wide Web (http://www.w3.org/html/wg/). JavaScript is a scripting language that allows dynamic interaction between the user and the interface (http://developers.sun.com/scripting/javascript/). Apache is the open-source web server that we used to host the PHOENIX web interface (http://www.apache.org/). The essential principles underlying the PHOENIX interface have always been simplicity of use for non-specialists, a high level of performance, and the ability for advanced users to define parameters manually.

2.6 Datasets

Two datasets were used to evaluate the performance of the analysis package. The first was the Golden Spike Experiment dataset described by Choe et al., in which 1,331 probesets were spiked among the 14,010 probesets of the DrosGenome1 GeneChip [27]. The second dataset, described by Bosco et al. and available in ArrayExpress under accession number E-MEXP-445, was obtained on the Affymetrix HG-U133A chip. This experiment studied the effect of hypoxia on human monocytes: three microchips were used under hypoxic and three under normoxic conditions [28].

2.7 Technical issues associated with PHOENIX / PEGASE

The current versions of PHOENIX and PEGASE have both advantages and technical limitations. Firstly, PHOENIX currently supports only Affymetrix CEL files for the preprocessing step. For user convenience, the import form allows retrieval of data from the external ArrayExpress and GEO databases. Users can also upload a dataset from a local computer, which eases the analysis of datasets obtained from other public repositories. At the differential expression step, PHOENIX and PEGASE support any CSV file that stores expression values (with rows corresponding to probesets and columns to arrays). Note that the differential expression analysis methods were implemented as unpaired univariate procedures; PHOENIX and PEGASE are therefore suited to the experimental design of one-color technology. However, the consensus approach and the evaluation step can be applied to results obtained with any methodology. PHOENIX requires a web browser with JavaScript enabled, whereas PEGASE requires an installation of the R software to be used locally. PHOENIX and PEGASE are OS-independent. The PHOENIX web interface is hosted on a 64-bit, 80-core cluster running the CentOS GNU/Linux operating system, which performs all analyses. The technical advantages of PHOENIX rest mainly on this cluster configuration and its computing power. Analyses are reproducible because the data, results, parameters and scripts are stored together. Automated job and user management provides security. PHOENIX does not require a permanent connection, as jobs are queued on the cluster, and PEGASE can perform local analyses (using R). The main advantage of the web interface is its ease of use (no need to learn the R language). As a dataset repository, PHOENIX can analyze various datasets within a common framework. Useful illustration tools are included in PEGASE. PHOENIX and PEGASE are flexible, offering basic as well as advanced functions.

3. Results

3.1 Import of data

The first part of the web interface is dedicated to preprocessing tools. To begin an analysis at the preprocessing level, users must choose a dataset: they can select a dataset previously uploaded to the PHOENIX repository, upload a dataset manually from personal files, or import one from ArrayExpress or Gene Expression Omnibus. Datasets from any other repository can be uploaded in two steps, by first downloading the dataset to a local computer and then uploading it into our repository. An uploaded dataset is stored in a private repository that is only accessible to the user who uploaded it. Both PHOENIX and PEGASE can accommodate various datasets. The preprocessing step in PHOENIX was included as an aid for preparing data within a common framework across several datasets (currently limited to Affymetrix CEL files). The differential expression analysis step can accommodate expression values from any manufacturer, summarized at the probeset level, provided that they comply with the CSV file format in which rows represent probesets and columns represent arrays. Because unpaired univariate procedures are used, PHOENIX and PEGASE should only be used to analyze one-color microarray datasets.
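Because the differential expression step accepts any CSV file in which rows are probesets and columns are arrays, a reader for that layout is short. The Python sketch below assumes details the text does not specify: a header row whose first cell is ignored and whose other cells are array labels, a probeset identifier in the first column, and "NA" or an empty cell for missing values (converted to None). read_expression_csv() is a hypothetical name.

```python
import csv
import io


def read_expression_csv(handle, na_tokens=("NA", "")):
    """Read a probesets x arrays expression table from a text handle.

    Returns (array_labels, {probeset_id: [float or None, ...]}).
    """
    reader = csv.reader(handle)
    header = next(reader)
    arrays = header[1:]
    data = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        probeset, *values = row
        if len(values) != len(arrays):
            raise ValueError(
                f"{probeset}: expected {len(arrays)} values, got {len(values)}"
            )
        data[probeset] = [None if v.strip() in na_tokens else float(v) for v in values]
    return arrays, data


if __name__ == "__main__":
    demo = io.StringIO("probeset,A1,A2,B1\nps1,1.5,NA,2\nps2,3,4,5\n")
    labels, values = read_expression_csv(demo)
    print(labels, values)
```

For a real file, open it with open(path, newline="") as the csv module recommends and pass the handle to read_expression_csv().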

3.2 Data preprocessing

The tools described above (RMA, GCRMA, dCHIP, MAS5 and the Bioconductor Expresso function) can be used for data preprocessing, and the parameters defining each preprocessing step can be adjusted. In that case, the selected steps are performed by Expresso (parameterized by the R script generated by PHOENIX); consequently, all combinations of steps performed by Expresso, including dCHIP, are available in the PHOENIX interface. Each method generates files with expression values at the probeset level, available in R format or as CSV files, ready for differential expression analysis. The last method (Expresso) can also generate files with expression values estimated at the probeset level at each stage of the process (background correction, normalization, PM/MM correction).
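PHOENIX parameterizes Expresso through an R script generated from the form. As a hypothetical illustration of that idea (not PHOENIX's actual generator), the Python function below writes such a script. ReadAffy(), expresso() and exprs() and their argument names come from the Bioconductor affy and Biobase packages; the default combination (rma, quantiles, pmonly, medianpolish) is the one usually cited as mimicking RMA; the method-name sets are our reading of what affy accepts and should be checked against bgcorrect.methods(), normalize.methods(), pmcorrect.methods() and express.summary.stat.methods() in the installed version.

```python
import json

# Method names we believe affy accepts (verify with the affy helper functions).
BG_METHODS = {"bg.correct", "mas", "none", "rma"}
NORM_METHODS = {"constant", "contrasts", "invariantset", "loess", "qspline",
                "quantiles", "quantiles.robust"}
PM_METHODS = {"mas", "pmonly", "subtractmm"}
SUMMARY_METHODS = {"avgdiff", "liwong", "mas", "medianpolish", "playerout"}


def expresso_script(cel_dir, out_csv, bg="rma", norm="quantiles",
                    pm="pmonly", summary="medianpolish"):
    """Return an R script that preprocesses the CEL files in `cel_dir` with
    affy's expresso() and writes probeset-level values to `out_csv`."""
    checks = ((bg, BG_METHODS, "background correction"),
              (norm, NORM_METHODS, "normalization"),
              (pm, PM_METHODS, "PM/MM correction"),
              (summary, SUMMARY_METHODS, "summarization"))
    for value, allowed, step in checks:
        if value not in allowed:
            raise ValueError(f"unsupported {step} method: {value!r}")
    q = json.dumps  # JSON escaping yields valid R string literals for typical paths
    return "\n".join([
        "library(affy)",
        f"abatch <- ReadAffy(celfile.path = {q(cel_dir)})",
        "eset <- expresso(abatch,",
        f"                bgcorrect.method = {q(bg)},",
        f"                normalize.method = {q(norm)},",
        f"                pmcorrect.method = {q(pm)},",
        f"                summary.method = {q(summary)})",
        f"write.csv(exprs(eset), file = {q(out_csv)})",
    ])
```

Validating the method names before the script is queued means that a typo in the form fails immediately instead of after the job reaches the cluster.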

3.3 Differential expression analysis

The second part of the web interface offers differential expression analysis. The user first selects a dataset from the list of stored preprocessed data. A form then lets the user choose the preprocessing strategy and define the hypothesis to test from the list of sample labels. At each step of the form, the user has access to the full list of previously defined parameter sets and can also define and store new ones. Configuring the analysis first involves selecting the steps to perform. For each probeset, the procedure checks the availability of expression values, and NA values may be removed: if at least one condition has fewer than two available values (the others being NA), the probeset is removed from the dataset. The user then selects the statistical methods to be used for differential expression analysis. Methods that require parameterization can be configured manually through the “+” command for advanced parameters; by default, advanced parameters are either predefined or computed automatically. For each method, p-values are estimated for both one-sided (unidirectional) and two-sided (bidirectional) tests. Users can also choose to compute consensus scores and compare method performances; these functionalities are discussed in later sections of this article. PHOENIX uses the information entered in the form to write an R script, which is then added to the job queue. All results are stored in CSV format and gathered in a downloadable directory.
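The filtering rule above (a probeset is removed when any condition has fewer than two non-missing values) can be sketched in a few lines. This Python sketch is not the PEGASE implementation; the dictionary input layout, the `groups` index lists, the `min_values` parameter and the function name are hypothetical.

```python
import math


def filter_probesets(data, groups, min_values=2):
    """Drop probesets that have fewer than `min_values` non-missing values
    in at least one condition.

    data:   {probeset: [values]} with NaN marking missing (NA) values
    groups: one list of column indices per condition (assumed input layout)
    Returns {probeset: [values of condition 1, values of condition 2, ...]}
    for the probesets that are kept, with the NaNs removed.
    """
    kept = {}
    for probeset, values in data.items():
        per_condition = [
            [values[i] for i in idx if not math.isnan(values[i])] for idx in groups
        ]
        if all(len(v) >= min_values for v in per_condition):
            kept[probeset] = per_condition
    return kept


nan = math.nan
data = {
    "ps_1": [7.1, 7.3, nan, 8.9, 9.2, 9.0],  # 2 + 3 values: kept
    "ps_2": [5.2, nan, nan, 6.8, 6.5, 7.1],  # only 1 value in condition 1: removed
    "ps_3": [4.0, 4.1, 4.3, nan, nan, nan],  # no value in condition 2: removed
}
groups = [[0, 1, 2], [3, 4, 5]]  # 3 control arrays vs 3 treated arrays
kept = filter_probesets(data, groups)
print(sorted(kept))
```

Two values per condition is the smallest sample for which a within-condition variance can be estimated, which is why the threshold matters for t-test-like methods.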

3.4 Benchmarks

At the bottom of the differential expression analysis tab, an option asks the server to evaluate the performance of the selected methods. This option is available when the user provides a CSV file containing a logical value for each probeset (0 = not differentially expressed, 1 = differentially expressed). Using this file, PHOENIX computes coordinates for four types of figures of merit. For ROC curves, the X and Y axes are 1-specificity and sensitivity, respectively. These performance curves are widely used by the scientific community, although they are not very discriminating when comparing methods [29]. In the modified ROC curves, the coordinates are the false discovery rate and sensitivity. These curves are equivalent to precision/recall curves (PRC), since FDR = 1-precision and sensitivity = recall; they show how true discoveries accumulate relative to the error rate in the top list. FDR curves plot the false discovery rate against the p-value threshold, comparing the expected error rate with the actual error rate. Lastly, a plot of sensitivity vs. the number of selected probesets shows how true discoveries accumulate as the top list grows. Molecular biologists can use the evaluation procedure to compare experimental results with recently validated results from the scientific literature or with results from other experiments. This second experiment, used to define the truth, can either examine the same conditions or, in a meta-analysis scheme, different experimental settings that involve a common set of genes. Bioinformaticians can use these figures of merit to compare the performance of the selected methods on reference datasets (spike-in experiments, simulated data, or extensively studied datasets). As an example, Figure 3 shows the plots obtained when analyzing Choe’s Golden Spike Experiment.

Figure 3. Performance evaluation of PEGASE. Example of plots generated on Choe’s spike-in dataset, using a known list of spiked RNA. A: Traditional ROC curve. B: Modified ROC curve. C: Sensitivity vs. number of selected genes. D: False discovery rate vs. alpha threshold.
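The four figures of merit can all be derived from one pass over the probesets sorted by p-value. The Python sketch below is an illustration under stated assumptions, not PEGASE's code: it takes the 0/1 truth vector described above as a plain list, the function and key names are hypothetical, and the FDR curve uses each probeset's own p-value as the threshold alpha.

```python
def figures_of_merit(pvalues, truth):
    """Coordinates for the four figures of merit, one point per top-list size.

    pvalues: one p-value per probeset
    truth:   one 0/1 flag per probeset (1 = differentially expressed)
    Probesets are ranked by increasing p-value; ties are broken by input
    order, which is a simplifying assumption.
    """
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("truth must contain both 0 and 1 flags")
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    tp = fp = 0
    points = []
    for k, i in enumerate(order, start=1):
        if truth[i]:
            tp += 1
        else:
            fp += 1
        points.append({
            "n_selected": k,                      # size of the top list
            "alpha": pvalues[i],                  # p-value threshold reaching it
            "sensitivity": tp / n_pos,            # = recall
            "one_minus_specificity": fp / n_neg,  # ROC x-axis
            "fdr": fp / k,                        # = 1 - precision
        })
    return points


# ROC: (one_minus_specificity, sensitivity); modified ROC: (fdr, sensitivity);
# sensitivity vs. n_selected; FDR curve: (alpha, fdr).
pvalues = [0.001, 0.20, 0.004, 0.03, 0.60, 0.01]  # toy values
truth = [1, 0, 1, 0, 0, 1]
for pt in figures_of_merit(pvalues, truth):
    print(pt["n_selected"], pt["alpha"], round(pt["sensitivity"], 2), round(pt["fdr"], 2))
```

Because every curve is read from the same sorted list, the four plots stay mutually consistent for a given method.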

In the future, scripts will be added to allow benchmarks on submitted external results (lists of p-values obtained with the user’s own methodology) and on multiple datasets (simulations, and Latin Square datasets from Affymetrix).

3.5 Consensus

In bioinformatics studies, it is common practice to compute the consensus of multiple results as a new, more robust result. For example, multiple sequence alignment algorithms may be used to analyze homologies between DNA or protein sequences and to compute a consensus sequence. In the field of microarray differential expression, the result obtained is a vector of p-values, one per probeset, each measuring how compatible the observed data are with the hypothesis that the expression values of that probeset in the two conditions come from the same population. A p-value close to 0 therefore provides strong evidence that the probeset belongs to distinct populations in the two conditions. The PHOENIX interface and the PEGASE R package both offer the option to compute p-values using various methodologies. Probesets belonging to very different populations are expected to be detected by all methods, whereas those belonging to the same population, provided it is well estimated, are not expected to produce small p-values. However, when the differences between populations are more modest, some methods detect a probeset as differentially expressed while others do not. We therefore included a simple consensus approach in our automated analysis procedure. This approach relies on a weighted logarithmic average of the method-specific scores (p-values or ranks). Using the ranks of the genes ensures that each individual list of results spans the same range. Methods can be weighted to favor appropriate methods, and thus provide a more accurate gene list, when there are clues about which method(s) perform best. More details on the algorithm are given in the Procedures section of this article. Figure 4 illustrates the performance of the consensus approach on Choe’s Golden Spike Experiment. Comparing Figure 3 and Figure 4 shows that the four alternative ways to compute the consensus score perform close to the best methods, with a more stable beginning of the curve (even when some individual methods give very poor results). This suggests that probesets falsely detected by individual methods tend not to appear in the consensus top list. This improvement comes at a price: some probesets found by individual methods are not detected by the consensus approach, and its performance does not reach that of the best method (although it stays close to it). Consequently, the consensus approach shows a more stable evolution of the false discovery rate in the top list while maintaining performance close to that of the best methods; as illustrated in Figure 4, this was not the case for the individual methods. Additionally, the results show that knowing the best-performing method is not required to compute the consensus, since performances are very close whether the consensus is computed with or without weights.

Figure 4. Illustration of consensus score performances on Choe’s spike-in dataset, using a known list of spiked RNA. A: Traditional ROC curve. B: Modified ROC curve. C: Sensitivity vs. number of selected genes. D: False discovery rate vs. alpha threshold.
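The exact consensus formula is given in the Procedures section, which is not reproduced here. As a hedged sketch, assuming that the “weighted logarithmic average” is the weighted mean of the log-scores (equivalently, a weighted geometric mean), the computation could look like this in Python. The function names, the average-rank rule for ties and the `floor` parameter are assumptions.

```python
import math


def ranks(scores):
    """Ascending ranks (1 = smallest score); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for pos in range(start, end + 1):
            result[order[pos]] = shared
        start = end + 1
    return result


def consensus(score_lists, weights=None, use_ranks=False, floor=1e-300):
    """Weighted logarithmic average of per-method scores, per probeset.

    score_lists: one list of p-values per method, same probeset order
    weights:     optional per-method weights (equal weights by default)
    use_ranks:   convert each method's p-values to ranks first
    Returns exp(sum_m w_m * log(s_m) / sum_m w_m), a weighted geometric mean;
    smaller values mean stronger agreement that a probeset is significant.
    `floor` avoids log(0) for p-values reported as exactly 0.
    """
    if weights is None:
        weights = [1.0] * len(score_lists)
    if use_ranks:
        score_lists = [ranks(s) for s in score_lists]
    total = sum(weights)
    return [
        math.exp(sum(w * math.log(max(s[i], floor))
                     for w, s in zip(weights, score_lists)) / total)
        for i in range(len(score_lists[0]))
    ]


method_a = [0.001, 0.04, 0.50]  # toy p-values for three probesets
method_b = [0.010, 0.20, 0.90]
print(consensus([method_a, method_b]))
print(consensus([method_a, method_b], weights=[3, 1]))
print(consensus([method_a, method_b], use_ranks=True))
```

Averaging on the log scale keeps one method's extremely small p-value from dominating the way it would in an arithmetic mean of p-values, while the rank option removes differences in how each method's p-values are scaled.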

3.6 PEGASE as an R package for Performance Evaluation and Global Analysis of Significant Expression

As described in the Procedures section of this article, all methods used in the PHOENIX web interface are available in a back-end package called PEGASE; the preprocessing step is not included in PEGASE. One goal was to let users run PEGASE on a local computer. The package was designed to be easy to use for molecular biologists unfamiliar with statistical analysis and bioinformatics tools, while also allowing biostatisticians to fine-tune each step, adjust parameters, and track data and intermediate results. Users can add their own lists of p-values from external or experimental procedures to include their method in the plots. Using PEGASE as an R package offers several additional advantages. Like PHOENIX, PEGASE can run automated procedures, with or without manual parameterization, through a single function call. It can also be used step by step: dataset preparation (removal of NA probesets), followed by configuration, differential expression analysis, consensus evaluation and performance comparison. The modular design of PEGASE provides flexibility, as each step can be parameterized separately and a method can be modified simply by adjusting a given modular function. A new significance analysis method is thus built as a succession of common steps and modified or added steps. For example, the previously published Window t-test and Window Welch t-test were designed using an earlier version of the package [20]. The ultimate goal in automating the analytical procedure is an automatically generated report, which will be a focus of future development; some of the components required for this have already been integrated into the back-end R package.
The analytical procedure, intermediate values and final results can be tracked through illustrative functions and tables (for example, the relationship between mean and variance, or the determination of the fudge factor for the Tusher d statistic). Each of the extra utilities in PEGASE compared with PHOENIX was included either to facilitate the summarization of results in the R environment or to provide back-end support for future improvements of the web interface. We refer the reader to the Procedures section of this article for more details on the internal structure of the package. A description of the PEGASE variables and input/output formats is provided as supplementary data to help users of PEGASE in the R environment. PEGASE can be downloaded from the PHOENIX web interface.
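To illustrate how a new significance method can arise from common steps plus one modified step, the Python sketch below keeps the final statistic fixed and swaps only the variance-estimation step. It is an illustration, not PEGASE's code: all function names are hypothetical, the window variant is a deliberately simplified version of the idea of borrowing variance from probesets with similar expression (not the published Window t-test [20]), and the fudge factor `s0` is simply passed in rather than determined as SAM does.

```python
import math
import statistics


def pooled_var(x, y):
    """Pooled within-condition variance (Student t-test assumption)."""
    nx, ny = len(x), len(y)
    return ((nx - 1) * statistics.variance(x)
            + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)


def student_variances(table):
    """Variance step, option 1: each probeset uses its own pooled variance."""
    return {p: pooled_var(x, y) for p, (x, y) in table.items()}


def window_variances(table, half_width=2):
    """Variance step, option 2 (hypothetical, simplified 'window' idea):
    average the pooled variances of the probesets closest in mean expression."""
    raw = student_variances(table)
    ids = sorted(table, key=lambda p: statistics.mean(table[p][0] + table[p][1]))
    return {
        p: statistics.mean(raw[q] for q in ids[max(0, pos - half_width):pos + half_width + 1])
        for pos, p in enumerate(ids)
    }


def d_statistic(table, variances, s0=0.0):
    """Shared final step: d = (mean_x - mean_y) / (se + s0).
    s0 = 0 gives a t-like statistic; s0 > 0 acts as a fudge factor in the
    style of Tusher's d (SAM chooses s0 with its own procedure)."""
    out = {}
    for p, (x, y) in table.items():
        se = math.sqrt(variances[p] * (1 / len(x) + 1 / len(y)))
        out[p] = (statistics.mean(x) - statistics.mean(y)) / (se + s0)
    return out


# Toy data: {probeset: (control values, treated values)}
table = {
    "ps_1": ([7.1, 7.3, 7.0], [8.9, 9.2, 9.0]),
    "ps_2": ([5.2, 5.0, 5.4], [5.3, 5.1, 5.6]),
    "ps_3": ([10.0, 10.4, 9.8], [10.1, 10.6, 9.9]),
}
student = d_statistic(table, student_variances(table))
windowed = d_statistic(table, window_variances(table, half_width=1))
fudged = d_statistic(table, student_variances(table), s0=0.1)
print(student, windowed, fudged, sep="\n")
```

Only one function differs between the three variants, which is the kind of reuse the modular design is meant to enable.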


3.7 Example of analysis of a biological dataset

PEGASE and the web interface were used to compare the expression profiles of monocytes under normoxia and hypoxia. The dataset used for this study is publicly available in the ArrayExpress repository under the identifier E-MEXP-445 and was described previously by Bosco et al. [28]. The authors listed 74 genes known to be involved in hypoxia as well as other validated genes. The results described by the authors were used to illustrate the performance of the methods implemented in PEGASE and available on the PHOENIX web interface. The expression of the 90 genes (188 probesets) known to be involved in the hypoxia response was analyzed using various methods to determine which approach performed best. In addition to identifying many of the 90 known genes, the best-performing methods also revealed other candidate genes that had not been detected previously. This new analysis of the dataset with several methods thus provided a more in-depth view of the mechanisms involved. We studied how the number of known probesets/genes detected evolves with the total number of probesets/genes selected by each method, using four cut-offs (top lists of 50 to 200 probesets, in steps of 50 probesets). Preprocessing was performed using GCRMA and the standard Affymetrix HG-U133A chip definition file.

Table 2. Performance summary of the individual methods, the consensus evaluation and the global analysis on the E-MEXP-445 dataset (top 100 probesets per method). The left and right blocks of the table show probesets and genes, respectively. For each method or set of methods, probesets/genes are classified into three categories: “Known” refers to genes listed in the original publication of the dataset; “Other” refers to genes validated recently in other papers; the remaining genes are labeled “Unknown”. An additional category, “Cand.” (candidates), contains the genes identified using our proposed analytical design. The bottom part of the table summarizes the same information when probesets/genes are selected by k of the 9 individual methods (“k methods”) or by k of the 4 consensus variants (“k consensus”).

                 Probesets                              Genes
Method           Known  Other  Unknown  Total  Cand.    Known  Other  Unknown  Total  Cand.
Total               77     33      240    350     47       51     24      219    294     31
Student             35      9       56    100     27       21      8       48     77     19
Window Student      45     14       41    100     25       33     10       32     75     16
Welch               31      9       60    100     22       18      8       55     81     17
Window Welch        46     15       39    100     23       33     12       33     78     17
Reg. t-test         52     17       31    100     22       34     11       21     66     13
SAM                 36      9       55    100     28       22      8       47     77     20
Robust Student       7      7       86    100     11        5      7       85     97     10
Robust Welch         8      3       89    100      9        7      3       88     98      9
LPE                 41     16       43    100     23       30     12       31     73     14
Cons. P.            49     15       36    100     28       30     13       24     67     16
Cons. PW.           50     19       31    100     28       32     14       21     67     18
Cons. R.            52     12       36    100     27       33     10       27     70     18
Cons. RW.           50     14       36    100     29       32     10       25     67     18
9 methods            2      0        1      3      1        2      0        2      4      2
8 methods            1      1        1      3      1        1      1        1      3      1
7 methods            9      0        8     17      8        6      0        5     11      5
6 methods           13      3        3     19      3       10      3        0     13      0
5 methods            4      1        2      7      2        1      3        3      7      3
4 methods           15      6       11     32      6        7      2       10     19      5
3 methods            7      9       35     51     15        6      6       30     42     10
2 methods            8      4       71     83      9        6      3       66     75      4
1 method            17      9      108    134      2       12      6      102    120      1
0 methods            1      0        0      1      0        0      0        0      0      0
4 consensus         45      9       23     77     21       28      9       16     53     14
3 consensus          2      1        8     11      7        2      0        5      7      4
2 consensus          6      9        7     22      3        3      5        4     12      1
1 consensus          3      3        9     15      1        3      1        8     12      0
0 consensus         21     11      193    225     15       15      9      186    210     12

Table 3. Putative genes involved in hypoxia. The table lists genes that may contribute to the cellular response to hypoxia, based on the ranks attributed to them by several methods, but that have not previously been described in the hypoxia literature. Values are the ranks of each probeset for each method (“-”: not among the top 100 probesets for that method); rows without a gene symbol are additional probesets for the gene above. Column key: St, Student t-test; WSt, Window t-test; We, Welch t-test; WWe, Window Welch t-test; RegT, Regularized t-test; SAM, SAM test; RSt, Robust Student t-test; RWe, Robust Welch t-test; LPE, LPE test; CP, Consensus (p-value); CPW, Weighted Consensus (p-value); CR, Consensus (rank); CRW, Weighted Consensus (rank). The table illustrates the value of analyzing results with multiple methods, which sometimes highlight different genes.

Gene        St  WSt   We  WWe RegT  SAM  RSt  RWe  LPE   CP  CPW   CR  CRW  Probeset
DFNA5        -   67    -   66   79    -   70    -    -   79   80   74   71  203695_s_at
EGLN3        -    -    -    -    -    -    -    -   84    -    -    -    -  219232_s_at
GAS7         -   46    -    -    -    -    -    -   56   73   71   91   84  202191_s_at
             -   44    -    -    -    -    -    -   46   76   67    -   97  202192_s_at
             -    -    -    -    -    -   22   77   29   53   65    -    -  207704_s_at
             -   64    -    -    -    -   71    -    6   19   17   89   83  210872_x_at
             -   94    -    -    -    -    -    -   35   77   73    -    -  211067_s_at
HSPA6        -    -    -    -   77    -   78    -    -    -    -    -    -  213418_at
IGHG1        -    -    -    -    -    -    -    -   47    -    -    -    -  213674_x_at
            68    -   91    -    -   75    -    -    -    -    -    -    -  217039_x_at
KLHL18      57   15    -    6    -   60    -    -    -    -   81   70   55  212882_at
LASP1       21    -   24    -    -   25    -    -    -    -    -    -    -  200618_at
LGALS8      18    -   68   63   72   17    5   12    -   21   42   19   33  208934_s_at
            60    -   29    -   96   54    -    -    -   90    -   67   78  208936_x_at
            19    -   35    -   90   18    -    -    -   85    -   63   79  210732_s_at
MERTK       39    4    -    3    8   35    6   23   15    4    5    2    2  206028_s_at
            44    5   41    4    5   42    -    -   19   12    8    7    3  211913_s_at
METAP1      16    -    6    -    -   20    -    -    -    -    -    -    -  212673_at
NID1        86    -   63    -    -   81    -    -    -    -    -    -    -  202007_at
             -    -    -    -    -    -   10   18    -    -    -    -    -  202008_s_at
NR1H3        -   49    -   68    -    -    3    3    9    6   11   35   48  203920_at
OLFML2B      -   11    -   23   35   99    -    -   26   20   14   26   24  213125_at
PANX1        -    -    -   77    -    -   84   38    -   82   89   88   96  204715_at
PARVB        -   59    -   37   89    -    -    -   45   99   72    -   88  37966_at
PFTK1        -   32    -   45    -    -    -    -    -    -   94   78   70  211502_s_at
PGM1        15    9   71   17   38   14    -    -   13   10    9   12   10  201968_s_at
PPIF        92    -   42    -    -   98    -    -    -    -    -    -    -  201489_at
PRDX4        -   85    -   59   87    -    -    -    -    -  100   81   77  201923_at
RNASET2     35   56   94   54   13   32    -    -    -   58   51   39   36  217983_s_at
            36   22    -   32   14   31    -    -   59   45   38   36   29  217984_at
RXRA        55    -   28    -    -   58    -    -    -    -    -    -    -  202449_s_at
SAE2         -    -    -    -    -    -   43   52    -   92    -   90    -  201177_s_at
SDCBP       47    -    -    -    -   61    -    -    -    -    -    -    -  200958_s_at
SLCO2B1     30    2   15    1   65   29    -    -   92   27   21   13    7  203473_at
            90    -   52    -    -   89    -    -    -    -    -    -    -  211557_x_at
STK38       65    -   90    -    -   74    -    -    -    -    -    -    -  202951_at
TMEM158      -   66    -   56    -    -    -    -   10   29   26   80   74  213338_at
TMEM43      71    -   50    -    -   76    -    -    -    -    -    -    -  217795_s_at
TNS1        42    7    -   25   67   40    -    -   16   15   12   20   17  218864_at
            64   76   81   62   26   57    -    -   72   44   47   38   41  221246_x_at
            13   93   14    -   15   11    -   55   63   22   35   17   26  221747_at
            43   20   30   12    3   41    -    -   67   28   23   18   12  221748_s_at
VDAC1       84    -   75    -    -   79    -    -    -    -    -    -    -  212038_s_at
VGLL4       10   23   46   69   23    9    2   21   75    8   15    4   11  212399_s_at
             -    -    -   73   78    -    -    -    -    -    -    -   87  214004_s_at
ZNF395       -   17    -   58   74    -    -    -   27   63   45   69   58  221123_x_at
            25    3   73   26   25   23    -    -    1    1    1    5    5  218149_s_at

Analysis of differential expression was performed using PEGASE (from the web interface). Because of the small number of replicates (3 vs. 3), the Wilcoxon-Mann-Whitney rank-sum and Rank Products tests were not used. The gene lists obtained from the top 100 probesets of each method were characterized and are summarized in Table 2. For each method, the numbers of true positives, true negatives, false positives and false negatives were computed for both probesets (based on the 188 known probesets) and genes (based on the 90 known genes). At least three conclusions can be drawn about the individual methods. First, the robust variants of the Student t-test, with or without Welch correction, which use the median and MAD to estimate the mean and standard deviation, gave the least accurate results in light of current knowledge of the response to hypoxia. Second, window-based methods (Window t-test, Window Welch t-test, Regularized t-test, LPE test), which use an empirical relationship between variability and expression level, performed best. Third, the consensus methods gave results close to those of the best-performing methods, even though some individual methods gave poor results (robust t-test with and without Welch correction). Furthermore, the four variants of the consensus approach gave similar results regardless of the weights of the individual methods, illustrating that the consensus score can retrieve good results when the user does not know which method is best suited to the data. Table 2 also summarizes the main features of the combined top lists. Four categories of genes are considered: known genes reported before publication of this dataset, genes detected recently in related papers, unknown genes, and candidate genes inferred from our analysis (the last category is defined below). Table 2 shows that the most represented category of genes is detected by only one method and is not known to be involved in the hypoxia response.
At least four methods are required to define a category containing a majority of known genes. Diversity among the individual top lists is thus an important aspect of this type of analytical framework, since only a few genes are detected simultaneously by several methods. The observed differences seem to concern mainly genes that are not hypoxia-responsive, and such genes include a greater proportion of false positives. Of the 75 positive genes identified in the analysis, 24 have been validated in recent publications. Our analytical design thus makes it possible to reduce the number of false negatives and to detect more putative genes. The least successful methods (Robust Student/Welch) produce gene lists in which almost every gene is represented by a single probeset. In contrast, the best methods detect fewer genes but show a stronger correlation (several probesets of the same gene detected together). An important observation concerns the robustness of the results at the gene level. In each category, the number of known genes is smaller than the number of known probesets. This suggests a strong correlation within the dataset, leading to the detection of many probesets related to the same gene. Furthermore, the number of genes detected in common with other studies reveals a correlation between separate studies: among the 75 known genes detected (51+24), one third has been validated in recent studies. Moreover, the correlation with other studies is stronger at the gene level than at the probeset level (common genes are identified from separate probesets). Consensus evaluation reveals that most of the genes detected by all four consensus variants are hypoxia-related, whereas most of the genes never detected by any consensus variant belong to the “unknown” category. Genes in this category give less robust results (186 genes/193 probesets) than the other categories. False positives of individual methods are mainly classified as true negatives by the consensus methods. A careful comparison of probeset and gene categories therefore provides clues about the quality of the results of the various methods. However, this is a side effect of the structure of the microarray, which depends on the chip definition file: such observations derive from the redundancy of genes in a standard CDF file. Not all genes are represented by several probesets, and the interpretation of the results would be different if hypoxia-related genes were represented only once in the standard CDF. Moreover, the use of alternative chip definition files focusing on the gene level would make this comparison impossible.
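The bottom part of Table 2 cross-tabulates how many methods selected each probeset against its annotation category, and the per-method counts of true and false positives and negatives follow from simple set operations. A minimal Python sketch of both computations, assuming each method's top list is available as a list of probeset IDs; the function names, category labels and toy data are hypothetical, and the cross-tabulation covers only the union of the supplied top lists.

```python
from collections import Counter


def confusion_counts(top_list, known, universe):
    """TP/FP/FN/TN of one method's top list against a set of known probesets.
    Assumes top_list and known are both subsets of universe."""
    selected = set(top_list)
    tp = len(selected & known)
    fp = len(selected - known)
    fn = len(known - selected)
    tn = len(universe) - tp - fp - fn
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def overlap_table(top_lists, category):
    """Cross-tabulate 'selected by k methods' against annotation categories.

    top_lists: {method name: list of selected probesets}
    category:  {probeset: 'Known' | 'Other' | 'Unknown'}; unlisted -> 'Unknown'
    Returns {k: Counter({category: count})} over the union of the top lists.
    """
    hits = Counter(p for lst in top_lists.values() for p in set(lst))
    table = {}
    for probeset, k in hits.items():
        table.setdefault(k, Counter())[category.get(probeset, "Unknown")] += 1
    return table


# Toy example with made-up method names and probesets.
top_lists = {"A": ["p1", "p2", "p3"], "B": ["p1", "p4"], "C": ["p1", "p2"]}
category = {"p1": "Known", "p2": "Other", "p4": "Known"}
universe = {f"p{i}" for i in range(1, 7)}
print(overlap_table(top_lists, category))
print(confusion_counts(top_lists["A"], {"p1", "p4"}, universe))
```

The same functions apply at the gene level once each probeset ID has been mapped to its gene symbol and duplicates removed.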
Based on the observed features of the combined individual top lists, we selected as candidates several genes detected simultaneously by several methods and/or several probesets. Because of the strong correlation between consensus variants and the performance of this approach, genes present in the consensus top lists were considered candidates. The characterization of the candidate genes is shown in Table 2, and Table 3 lists these genes with their corresponding individual scores. This global analysis and the consensus approach showed that many genes were detected by several different probesets. The results summarized in Table 3 suggest that the expression of many genes should be further characterized in a hypoxic environment: DFNA5, EGLN3, GAS7, HSPA6, IGHG1, KLHL18, LASP1, LGALS8, MERTK, METAP1, NID1, NR1H3, OLFML2B, PANX1, PARVB, PFTK1, PGM1, PPIF, PRDX4, RNASET2, RXRA, SAE2, SDCBP, SLCO2B1, STK38, TMEM158, TMEM43, TNS1, VDAC1, VGLL4 and ZNF395. Supplementary Table 1 provides the full list of genes and their corresponding scores. It was created from the union of the top 100 probesets of each method and gives the rank of each probeset for each method that selected it. The table is sorted by the gene symbol associated with each probeset. Previously validated genes are shown in bold, and genes validated elsewhere are highlighted in grey (bibliographic references for these genes are provided in the supplementary data). Supplementary Table 2 extends the global analysis approach detailed here and compares the individual methods and the consensus approaches for selections of the top 50, 100, 150 and 200 probesets.

4. Discussion

The primary aim of this work was to create PHOENIX, a user-friendly set of online tools for microarray data analysis that allows several methods to be used at each step of the analysis strategy. The second goal was to provide an R package to perform the analysis on a local computer without a remote connection. This package is named PEGASE (Performance Evaluation and Global Analysis of Significant Expression). PHOENIX and PEGASE were designed both for molecular biologists and for experienced bioinformaticians able to fine-tune critical parameters. The methods interfaced in PHOENIX or implemented in PEGASE were selected based on their performance and recognition in the literature. Together, PEGASE and PHOENIX can be used to better identify genes that are differentially expressed between two conditions. After data selection (from ArrayExpress or GEO, or uploaded manually into a private repository), a set of automated procedures runs each analysis step, summarizes the results and illustrates the intermediate results. Old datasets can be re-analyzed with new analytical tools and the latest Affymetrix CDFs (chip definition files). As a supplementary step in the differential expression analysis of microarray data, we included a global analysis approach based on summarizing the results of various methods. The underlying consensus method is based either on the p-values or on the ranks obtained from the selected methods, with method-specific weights if desired. The main advantage of this consensus approach is that it provides users with a global analytical tool that yields good results from a set of methods (a better-quality gene list), even without clear knowledge of which methods perform best on their data. When the truth is known (extensively studied datasets, spike-in experiments), performance comparisons can also be computed and illustrated using four types of figures of merit. PEGASE, which includes more features than PHOENIX, was designed to provide modularity, automated analysis and graphical illustration, and to allow bioinformaticians to add their own analytical results and to combine modules into new procedures. To validate PHOENIX and PEGASE, we described an example analysis of the E-MEXP-445 dataset (comparing monocyte expression profiles under normoxia and hypoxia). We compared the results with publicly validated results and showed that the global analysis approach, based on several methods and a consensus approach, yields accurate results. Classifying the top lists into three categories (Known, Other publications, Unknown) reveals a positive correlation of the results, both within an experiment (multiple probesets belonging to the same gene) and between experiments (genes identified in other publications). The analysis also identified a set of previously undetected genes that are candidates for further study. Future PHOENIX developments will offer new features. For example, a new query interface for the internal PHOENIX repository will allow users to easily find appropriate public datasets based on various criteria (pathology, chip model) and objectives (re-analysis, meta-analysis). We will later focus on the possibility of selecting alternative CDFs, and additional methods will be included for preprocessing and differential expression analysis. For work at the probe level, we will include the option to use two-way ANOVA (ANOVA-2) models (Anthoula Gaigneaux, personal communication). A new step will be added to analyze data at the gene-set level, which should make it possible to identify otherwise undetected genes with higher power.
The existing benchmark tool will be extended with scripts for performance evaluation on spike-in datasets (Affymetrix Latin Square HG-U95 and HG-U133 datasets, Golden Spike Experiment) [27,30]. Lastly, we will focus on the illustration and formatting of results, with the ultimate goal of providing automatically generated reports. Some of these procedures (creation of tables and plots) have already been implemented in the PEGASE package. The PHOENIX interface will be translated into various languages, and tutorials will be made available for both newcomers and advanced users.

Acknowledgements

We would like to thank Thierry Coche and Jean-Louis Ruelle from GSK Biologicals (Rixensart, Belgium), Bertrand De Meulder (FUNDP, Belgium) and Isabelle Motte (FUNDP, Belgium) for helpful discussions and technical help. Financial support for this work was provided by GSK Biologicals (Rixensart, Belgium) and the DGTRE (General Directorate for Technology and Research of the Walloon Region, Belgium).

References

[1] Baldi P., Long A.D., A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes, Bioinformatics, 2001, 17, 509-519

[2] Kapushesky M., Kemmeren P., Culhane A.C., Durinck S., Ihmels J., Körner C., et al., Expression profiler: next generation - an online platform for analysis of microarray data, Nucleic Acids Res., 2004, 32, W465-W470

[3] Wu C.J., Kasif S., GEMS: a web server for biclustering analysis of expression data, Nucleic Acids Res., 2005, 33, W596-W599

[4] Montaner D., Tárraga J., Huerta-Cepas J., Burguet J., Vaquerizas J.M., Conde L., et al., Next station in microarray data analysis: Gepas, Nucleic Acids Res., 2006, W486-W491

[5] Romualdi C., Vitulo N., Favero M.D., Lanfranchi G., Midaw: a web tool for statistical analysis of microarray data, Nucleic Acids Res., 2005, 33, W644-W649

[6] Pochet N.L., Janssens F.A., De Smet F., Marchal K., Suykens J.A., De Moor B.L., M@cbeth: a microarray classification benchmarking tool, Bioinformatics, 2005, 21, 3185-3186

[7] Cope L.M., Irizarry R.A., Jaffee H.A., Wu Z., Speed T.P., A benchmark for Affymetrix GeneChip expression measures, Bioinformatics, 2004, 20, 323-331

[8] Ploner A., Miller L.D., Hall P., Bergh J., Pawitan Y., Correlation test to assess low-level processing of high-density oligonucleotide microarray data, BMC Bioinformatics, 2005, 6, 80

[9] Harr B., Schlötterer C., Comparison of algorithms for the analysis of Affymetrix microarray data as evaluated by co-expression of genes in known operons, Nucleic Acids Res., 2006, 34, e8

[10] Pepper S.D., Saunders E.K., Edwards L.E., Wilson C.L., Miller C.J., The utility of MAS5 expression summary and detection call algorithms, BMC Bioinformatics, 2007, 8, 273

[11] Irizarry R.A., Bolstad B.M., Collin F., Cope L.M., Hobbs B., Speed T.P., Summaries of Affymetrix GeneChip probe level data, Nucleic Acids Res., 2003, 31, e15

[12] Gentleman R.C., Carey V.J., Bates D.M., Bolstad B., Dettling M., Dudoit S., et al., Bioconductor: open software development for computational biology and bioinformatics, Genome Biol., 2004, 5, R80

[13] Wu Z., Irizarry R.A., Gentleman R., Murillo F.M., Spencer F., A Model-Based Background Adjustment for Oligonucleotide Expression Arrays, J. Am. Stat. Assoc., 2004, 99, 909-917

[14] Hubbell E., Liu W.M., Mei R., Robust estimators for expression analysis, Bioinformatics, 2002, 18, 1585-1592

[15] Li C., Wong W.H., Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application, Genome Biol., 2001, 2, RESEARCH0032

[16] Li C., Wong W.H., Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection, Proc. Natl. Acad. Sci. U.S.A., 2001, 98, 31-36

[17] Student, The Probable Error of a Mean, Biometrika, 1908, 1-25

[18] Welch B.L., The significance of the difference between two means when the population variances are unequal, Biometrika, 1938, 29, 350-362

[19] Baldi P., Long A.D., A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes, Bioinformatics, 2001, 17, 509-519

[20] Berger F., De Hertogh B., Pierre M., Gaigneaux A., Depiereux E., The “Window t-test”: a simple and powerful approach to detect differentially expressed genes in microarray datasets, Cent. Eur. J. Biol., 2008, 3, 327-344

[21] Jain N., Thatte J., Braciale T., Ley K., O’Connell M., Lee J.K., Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays, Bioinformatics, 2003, 19, 1945-1951

[22] Tusher V.G., Tibshirani R., Chu G., Significance analysis of microarrays applied to the ionizing radiation response, Proc. Natl. Acad. Sci. U.S.A., 2001, 98, 5116-5121 (Erratum in: Proc. Natl. Acad. Sci. U.S.A., 98, 10515)

[23] Mann H.B., Whitney D.R., On a test of whether one of two random variables is stochastically larger than the other, Ann. Math. Stat., 1947, 18, 50-60

[24] Breitling R., Armengaud P., Amtmann A., Herzyk P., Rank Products: A simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments, FEBS Lett., 2004, 573, 83-92

[25] Parkinson H., Kapushesky M., Shojatalab M., Abeygunawardena N., Coulson R., Farne A., et al., ArrayExpress - a public database of microarray experiments and gene expression profiles, Nucleic Acids Res., 2007, 35, D747-D750

[26] Edgar R., Domrachev M., Lash A.E., Gene Expression Omnibus: NCBI gene expression and hybridization array data repository, Nucleic Acids Res., 2002, 30, 207-210

[27] Choe S.E., Boutros M., Michelson A.M., Church G.M., Halfon M.S., Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control data set, Genome Biol., 2005, 6, R16

[28] Bosco M.C., Puppo M., Santangelo C., Anfosso L., Pfeffer U., Fardin P., et al., Hypoxia Modifies the Transcriptome of Primary Human Monocytes: Modulation of Novel Immune-Related Genes and Identification Of CC-Chemokine Ligand 20 as a New Hypoxia-Inducible Gene, J. Immunol., 2006, 177, 1941-1955

[29] Gaigneaux A., De Hertogh B., Berger F., Pierre M., Bareke E., Depiereux E., Discussion about ROC curves and others figures used to compare microarray statistical analyses, Proceedings of Benelux Bioinformatics Conference 2008 (BBC 2008), Maastricht, Netherlands

[30] Affymetrix, http://www.affymetrix.com/support/technical/sample_data/datasets.affx
