Beginning Microarray Data Analysis: A Biologist's Guide to Analysis of DNA Microarray Data



Book Reviews 1649

A Biologist’s Guide to Analysis of DNA Microarray Data
by Steen Knudsen
John Wiley & Sons (2002) 144 pages. ISBN 0471224901
US$44.95

DNA microarrays have revolutionized biology. Instead of studying one gene or one protein at a time, scientists are now studying many simultaneously. This global approach has created many new opportunities to study human disease. For example, a number of microarray studies have demonstrated the existence of different clinical subtypes of cancer with different prognoses from those identified by other methods (Alizadeh et al., 2000; Bittner et al., 2000). Many biologists have jumped on this bandwagon and started performing their own microarray experiments. However, data analysis can often be confusing, because, on the one hand, this field is evolving quickly, and, on the other, the modern data mining techniques may appear to be daunting and intractable. Although there are several microarray books on the market, and a few are dedicated to data analysis (Leung, 2002), there is no single book tailored to biologists.

A Biologist’s Guide to Analysis of DNA Microarray Data is a good starting point for biologists new to data analysis. Written by Steen Knudsen, the book is composed of 14 chapters. The book starts with an introductory chapter explaining the main principles and usage of DNA microarrays. This is followed by a chapter presenting an overview of data analysis, in which all the methods are summarized in a simple flowchart. This useful chart clearly shows the basic workflow of microarray data analysis and includes the experimental setups.

Most contemporary data analysis methods are discussed in chapters 3-8, in which the underlying principles are illustrated with vivid and simple examples. In chapter 3, Knudsen describes the basic data analysis methods, including scaling the measurements in the sample and the control, calculating the change in expression level of a gene and determining the significance of the expression level of a gene by using Student’s t-tests, ANOVA or non-parametric tests. Potential problems, such as outliers and multiple testing, are discussed. Chapters 4-8 introduce various data processing and mining methods: principal component analysis for dimensionality reduction; cluster analysis, including hierarchical clustering, K-means clustering and self-organizing maps; various distance measures and their effects on data clustering; normalization methods to correct systematic biases; mining functions of orphan proteins and regulatory relationships between genes; reverse engineering of regulatory networks by time-series and steady-state approaches; constructing molecular classifiers, including nearest neighbor, neural networks and support vector machines. The constraints of these data analysis methods are emphasized and discussed in detail. This is very helpful for those without a strong background in statistics, as the limitations of statistical analysis methods are often overlooked.
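The basic steps attributed to chapter 3 (scaling, change in expression level, a significance test and a multiple-testing correction) can be sketched roughly as follows. This is only an illustration in Python, not code from the book, whose own examples use Awk and R and are not reproduced in this review. The function names, the intensity values and the choice of an exact permutation test (one kind of non-parametric test) with a Bonferroni correction are all assumptions made for the sketch.

```python
from itertools import combinations
from math import log2
from statistics import mean


def scale_to_mean(values, target=100.0):
    """Global scaling: rescale one array so its mean intensity equals `target`."""
    factor = target / mean(values)
    return [v * factor for v in values]


def log2_fold_change(sample, control):
    """Log2 ratio of the mean sample intensity to the mean control intensity."""
    return log2(mean(sample) / mean(control))


def permutation_p_value(sample, control):
    """Exact two-sided permutation test on the difference of group means."""
    pooled = list(sample) + list(control)
    n_sample, n_total = len(sample), len(sample) + len(control)
    total = sum(pooled)
    observed = abs(mean(sample) - mean(control))
    hits = 0
    count = 0
    # Enumerate every way of relabelling n_sample of the pooled values as "sample".
    for indices in combinations(range(n_total), n_sample):
        group_sum = sum(pooled[i] for i in indices)
        diff = abs(group_sum / n_sample - (total - group_sum) / (n_total - n_sample))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
        count += 1
    return hits / count


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical raw intensities from one array, rescaled to a mean of 100.
    print(scale_to_mean([50.0, 150.0, 400.0]))

    # Hypothetical, already scaled replicate intensities: (sample, control) per gene.
    genes = {
        "geneA": ([820.0, 790.0, 805.0, 840.0], [200.0, 215.0, 190.0, 205.0]),
        "geneB": ([101.0, 95.0, 110.0, 99.0], [100.0, 104.0, 97.0, 102.0]),
        "geneC": ([40.0, 35.0, 42.0, 38.0], [150.0, 160.0, 145.0, 155.0]),
    }
    raw = [permutation_p_value(s, c) for s, c in genes.values()]
    for (name, (s, c)), p in zip(genes.items(), bonferroni(raw)):
        print(f"{name}: log2 fold change {log2_fold_change(s, c):+.2f}, adjusted p {p:.3f}")
```

With only four replicates per group the smallest p-value an exact permutation test can return is 2/70, so after correction across thousands of genes nothing would remain significant. That is one concrete instance of the limitations the review says are often overlooked.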

In chapter 9, Knudsen discusses the various considerations that need to be taken into account when selecting the appropriate probes for arrays. In chapter 10, the limitations of expression analysis are outlined. In particular, microarray expression studies (transcriptomics) focus primarily on gene expression and neglect many other aspects of cellular dynamics, such as alternative splicing, protein translation, post-translational modifications and degradation. Users need to be very cautious before drawing bold conclusions on the basis of their expression data. The genotyping array, a close relative of the expression array, is briefly discussed in chapter 11. The discussion is largely concentrated on the author’s interest in neural network sequence prediction.

Cell biologists often want to know which software is best for microarray data analysis. Chapters 12 and 13 provide a quick overview of the issues related to the choice of software. Commercial software often gives a false sense of security: it has inherent limitations, such as making implicit analysis assumptions for you. Therefore, Knudsen advocates the use of open source/free software for data analysis. There are a few important take-home messages regarding software: standardizing the data format will greatly assist data sharing and comparability; learning a scripting language like Awk or Perl will allow you to manipulate your data with ease; and learning an open source statistical language, such as R, will allow you to run different analyses. In addition, with R there are numerous extension modules (libraries) that are written specifically for microarray data analysis, and almost all are free. A great feature of this book is that it shows a number of simple Awk scripts and R commands for various statistical analyses. The reader can therefore follow this code step by step to experience command-line-driven programs first hand.
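To give a flavour of the kind of data manipulation such scripts perform, here is a minimal sketch that reads a tab-delimited table and keeps the genes whose expression changes at least two-fold. It is written in Python rather than Awk or Perl, it is not taken from the book, and the column names, values and threshold are hypothetical.

```python
import csv
import io
from math import log2

# Hypothetical tab-delimited export: gene ID, control intensity, sample intensity.
RAW = (
    "gene\tcontrol\tsample\n"
    "g1\t100\t400\n"
    "g2\t200\t210\n"
    "g3\t300\t75\n"
    "g4\t0\t50\n"
)


def filter_by_fold_change(text, min_abs_log2=1.0):
    """Return (gene, log2 ratio) pairs whose absolute log2 ratio meets the cutoff."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    kept = []
    for row in reader:
        control = float(row["control"])
        sample = float(row["sample"])
        if control <= 0 or sample <= 0:
            continue  # skip values that cannot be log-transformed
        ratio = log2(sample / control)
        if abs(ratio) >= min_abs_log2:
            kept.append((row["gene"], ratio))
    return kept


if __name__ == "__main__":
    for gene, ratio in filter_by_fold_change(RAW):
        print(f"{gene}\t{ratio:+.2f}")
```

Working on a plain tab-delimited table is a deliberate choice here: it matches the reviewers' point that a standard data format makes sharing and comparison easier.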

There are some drawbacks to this book. Firstly, the background to these various statistical analyses is only briefly discussed; therefore, it requires some statistical training to appreciate many of the chapters. This conflicts with the book’s objective of guiding biologists without special training through the analysis step. However, this is an unavoidable trade-off to make the book easier to read. Secondly, the book does not emphasize enough the experimental design, which could significantly affect the data analysis in the later stages of the experiment. Without thorough planning and an understanding of the analysis methods, microarray analysis risks being a ‘fishing expedition’. But with a careful and critical approach experiments can be quite the opposite. Finally, some of the chapters in this book are just too short to be justified as such. For example, chapters 9-11 are only between three and six pages long. More discussion of the issues raised would be welcome, even in an introductory text.

Nonetheless this book is a good starting point for cell biologists who are interested in analysis of DNA microarrays. It provides a background to microarray data analysis and a quick overview of the current trends. A Biologist’s Guide to Analysis of DNA Microarray Data does a marvelous job of introducing biologists into the realm of genomic data analysis.

References
Alizadeh, A. A., Eisen, M. B., Davis, R. E., Ma, C., Lossos, I. S., Rosenwald, A., Boldrick, J. C., Sabet, H., Tran, T. and Yu, X. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403, 503-511.
Bittner, M., Meltzer, P., Chen, Y., Jiang, Y., Seftor, E., Hendrix, M., Radmacher, M., Simon, R., Yakhini, Z. and Ben-Dor, A. (2000). Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 406, 536-540.
Leung, Y. F. (2002). Microarray data analysis for dummies... and experts too? Trends Biochem. Sci. 27, 433-434.

Bao Jian Fan 1 and Yuk Fai Leung 2

1 Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong
2 Bauer Center for Genomics Research, Harvard University, 7 Divinity Avenue, Cambridge, MA 02138, USA
Journal of Cell Science 116, 1649-1651 © 2003 The Company of Biologists Ltd
doi:10.1242/jcs.00436

1650 Journal of Cell Science 116 (9)

All about Drosophila eye development (almost)

Drosophila Eye Development
edited by Kevin Moses
Springer-Verlag (2002) 282 pages. ISBN 3-540-42590-X
£97.50/$149

Knowledge of Drosophila eye development has grown almost exponentially over the past few decades. Not only are the mechanisms that account for the formation of this multifaceted structure intrinsically interesting in their own right, but they have also contributed enormously to our understanding of general developmental paradigms and molecular pathways. The explosion in research into the Drosophila eye was sparked principally by the groups of Seymour Benzer and Gerry Rubin, and it is primarily their offspring who have contributed the chapters to the recent volume Drosophila Eye Development (Vol. 37 in the series Results and Problems in Cell Differentiation), edited by Kevin Moses.

Each chapter in the book is a stand-alone review making it possible to read up on individual topics. These range from the earliest establishment of the eye-field to retinal connections, colour vision and Drosophila as a model for disease, and together give a fairly comprehensive overview of the current areas of interest in this field. Reading the book from cover to cover is slightly problematic. There was quite a lot of repetition in the early chapters, which describe early patterning genes; TGF-β and Hedgehog signalling featured in several places too. These chapters created a modicum of confusion since the conclusions seemed to differ slightly. Beyond these early chapters we were hooked by a wide range of contributions that each touch on a different problem. Particularly refreshing were those topics that we had not come across in other reviews, such as the evolution of colour vision and applications to human disease modelling. The latter chapter starts with a good, succinct overview of some of the major contributions made through studies of the eye and would be a useful chapter to give to upper-level undergraduates to illustrate the diversity of applications of this model.

The book, therefore, provides a valuable introduction to an important paradigm in developmental biology. As a whole it might not be of immediate relevance to cell biologists, although chapters on regulation of growth and proliferation, protein stability, and programmed cell death would be of interest to cell and developmental biologists alike. In addition there is one gem by Don Ready about the emergence of form in the eye, which describes the progression in cell shapes and cell contacts that occur as the eye develops. This short essay highlights some of the amazing changes in cell morphology and considers how these could contribute forces that shape the geometrical regularity of the Drosophila compound eye.

There is a danger that the book will become dated as the field progresses, and so those chapters with a well-rounded historical perspective are likely to be the ones that better stand the test of time. Some of the chapters tend to focus on the most recent findings, whereas others, even amongst the better known topics, manage to achieve a balance between the two. Occasionally, we found ourselves wanting more debate on the differing current models and controversies, but the extensive reference lists should allow readers to explore these for themselves. The quality of figures used to illustrate each chapter also varies considerably. Those comparing vertebrate and Drosophila eye development are exceptionally good, particularly the colour diagrams. The chapter on cell death is also nicely illustrated, but in some others the figures are thin on the ground, which makes sections a bit dry or hard to follow.

With contributions from many key researchers in the field, this book provides an excellent reference text for those already working with the Drosophila eye and, for those about to, it conveys a fascination for the eye and the intricacy of its development. The only major shortcoming is its cover price (almost £100), which means that a book that many might like to have on their personal bookshelves will instead be confined largely to library shelves.

Sarah Bray and Ruth Johnson

Department of Anatomy, University of Cambridge, UK
Journal of Cell Science 116, 1649-1651 © 2003 The Company of Biologists Ltd
doi:10.1242/jcs.00435
