A GENTLE INTRODUCTION TO SUPPORT VECTOR MACHINES IN BIOMEDICINE - Volume 1: Theory and Methods © World Scientific Publishing Co. Pte. Ltd. http://www.worldscibooks.com/lifesci/7922.html
December 21, 2010 9:41 9 x 7.5 b1022-ch01 A Gentle Introduction to Support Vector Machines in Biomedicine Volume 1: Theory and Methods
CHAPTER 1
Introduction
Classes of Data-Analytic Problems Considered in This Book
This book focuses on several classes of data-analytic problems that are described below.
Problem class I (classification): Build computational classification models (or “classifiers”) that assign objects (e.g., patients/samples) into two or more classes. A classifier can be used for differential diagnosis, outcome prediction, and other classification tasks. Figure 1.1 illustrates an example classifier, which is a decision support system to diagnose primary and metastatic cancers from gene expression profiles of patients with lung cancer.¹
Figure 1.1. Patient with lung cancer -> Biopsy -> Gene expression profile -> Classifier model -> “Primary Lung Cancer” or “Metastatic Lung Cancer”
¹ The use of SVMs has grown with the genomic era, and in this book we use many examples drawn from microarray gene expression analysis. Genes are the basic elements of the genome, stored in the cell nuclei as DNA (deoxyribonucleic acid), which encodes the program of all living organisms. Genes are selectively activated during development and, after maturity of an organism, perform a variety of vital functions. The advent of microarrays has revolutionized genomics and opened the doors to new ways of studying gene regulation and diagnosing complex diseases, including cancers. Microarrays now record the expression of tens of thousands of genes simultaneously. A patient may thus be associated with a vector of thousands of gene expression coefficients representing his/her gene activation status, which is characteristic of his/her health condition. This numeric representation allows the computational methods described in this book to automatically perform diagnosis, prognosis, and other functions.
Problem: Which of the model(s) in Figure 1.2 (a), (b), or (c) is/are not classifiers?
Figure 1.2. Three models:
a) Article -> Model -> “Relevant to clinical trials” or “Irrelevant to clinical trials”
b) Patient -> Blood sample -> Mass spectrometry proteomics profile -> Model -> “Will respond to treatment” or “Will not respond to treatment”
c) Patient -> Biopsy -> Gene expression profile -> Model -> “Will survive for 3.5 years”
The output label “Cannot make decision” also appears in the figure.
Answer: The model in Figure 1.2(c) is not a classifier because it does not assign objects into discrete classes.
Problem class II (regression): Build computational regression models to predict values of some continuous response variable. Regression models can be used to predict patient survival, length of stay in the hospital, laboratory test values, etc. Figure 1.3 illustrates an example regression model, which is a decision support system to predict optimal dosage of a drug of choice to be administered to the patient. This dosage is determined by the values of patient biomarkers, and clinical and demographics data.
Figure 1.3. Patient -> Biomarkers, clinical and demographics data (shown as a row of numeric values) -> Regression model -> “Optimal dosage is 5 IU/Kg/week”
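To make the idea of predicting a continuous response concrete, the following minimal sketch fits a straight line by ordinary least squares to a few made-up (predictor, response) pairs. It is not the SVM-based regression discussed in this book; the toy numbers, the single-predictor setup, and the names `fit_line` and `predict` are illustrative assumptions only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict a continuous response for a new predictor value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: body weight (kg) and weekly dose (IU).
weights_kg = [50.0, 60.0, 70.0, 80.0]
weekly_dose = [250.0, 300.0, 350.0, 400.0]

model = fit_line(weights_kg, weekly_dose)
print(predict(model, 65.0))  # a continuous value, not a class label
```

The key contrast with a classifier is the return type: `predict` yields a number on a continuous scale rather than one of a fixed set of classes.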
Problem: What type of model is shown in Figure 1.2(c)?
Answer: This is a regression model; it predicts a continuous value (time) of patient survival.
Problem class III (feature/variable selection): Out of all measured variables in the dataset, select the smallest subset of variables that is necessary for accurate prediction (classification or regression) of some response variable of interest (e.g., phenotypic response variable). Figure 1.4 gives a graphical illustration of the problem to find the most compact set of breast cancer biomarkers from microarray gene expression data for 20,000 genes. The figure shows a heat-map (matrix) of gene expression coefficients rendered with the following color coding. Green values represent under-expressed genes and red values over-expressed genes. Each column of the matrix represents a gene and each line a patient sample. The problem is to select the smallest subset of genes, which crisply separates normal from cancer patients (this is called “gene selection”). Notice that none of the genes outside the yellow box can accurately classify breast cancer. However, any single gene in the yellow box classifies breast cancer with 100% accuracy and thus solves the current problem.
Figure 1.4. Heat-map of 20,000 genes measured by a microarray (columns) for breast cancer tissues and normal tissues (lines).
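The statement that a single gene can separate the two groups without error can be checked mechanically. The sketch below scans each column of a small, made-up expression matrix and reports the columns for which one threshold splits the classes perfectly. The matrix, the labels, and the function names are hypothetical, and this brute-force scan is only an illustration of the problem, not a feature selection method described in this book.

```python
def separates_perfectly(values, labels):
    """True if a single threshold on this feature splits the two classes without error."""
    cancer = [v for v, y in zip(values, labels) if y == 1]
    normal = [v for v, y in zip(values, labels) if y == 0]
    # The classes are separable on this feature when their value ranges do not overlap.
    return max(cancer) < min(normal) or max(normal) < min(cancer)


def perfect_single_features(rows, labels):
    """Indices of features (columns) that classify every sample correctly on their own."""
    n_features = len(rows[0])
    return [j for j in range(n_features)
            if separates_perfectly([row[j] for row in rows], labels)]


# Hypothetical toy matrix: each line is a sample, each column is a gene.
rows = [
    [0.2, 1.5, 0.9],
    [0.4, 0.1, 1.1],
    [0.3, 1.2, 2.8],
    [0.5, 0.3, 3.1],
]
labels = [0, 0, 1, 1]  # 0 = normal tissue, 1 = cancer tissue

print(perfect_single_features(rows, labels))
```

In this toy matrix only the last column has non-overlapping value ranges for the two classes, so it plays the role of a gene inside the yellow box.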
Problem: Consider a dataset with 4 variables: systolic blood pressure, favorite newspaper, first letter of the last name, and age of the pet. What will be the smallest subset of variables that is necessary for accurate prediction of hypertension?
Answer: Systolic blood pressure.
Problem class IV (novelty detection): Build a computational model to identify novel or outlier objects (e.g., patients/samples). For example, such a model can be used to discover deviations in sample handling protocol when doing quality control of assays. Figure 1.5 shows a more humorous illustration of novelty detection. The problem here is to build a decision support system to identify aliens.
Figure 1.5
Problem class V (clustering): Group objects (e.g., patients/samples) into several clusters based on their similarity. Figure 1.6 gives a graphical illustration of the clustering of brain tumor patients into 4 clusters based on their gene expression profiles. The figure shows a heat-map (matrix) of gene expression coefficients rendered with the following color coding. Green values represent under-expressed genes and red values over-expressed genes. Each column of the matrix represents a gene and each line a patient sample. All patients have the same pathological type of the disease, and clustering defines four new disease subtypes (corresponding to clusters shown). These subtypes may have different characteristics in terms of patient survival and time to recurrence after treatment.
Figure 1.6. Heat-map of gene expression with patients grouped into Cluster #1, Cluster #2, Cluster #3, and Cluster #4.
Problem: The dataset contains the age of 20 patients admitted to the emergency room this morning. The patients’ ages were recorded as “child”, “adult”, and “senior”.
a) How should one cluster this data to identify patients who should be seen by the geriatrician?
b) How should one cluster this data to identify patients who should be seen by the pediatrician?
Answer: There is no universally optimal way to cluster this data. It depends on the intended application.
a) To identify patients who should be seen by the geriatrician, one can group children & adults in one cluster and keep seniors in the other one.
b) To identify patients who should be seen by the pediatrician, one can group adults & seniors in one cluster and keep children in the other one.
Basic Principles of Classification
Let us consider basic principles of classification. Imagine a situation where one would like to classify objects as boats and houses from the picture shown in Figure 1.7.
Figure 1.7
One simple way to do it is to say that all objects before the coast line are boats and all objects after the coast line are houses. In this case, the coast line serves as a decision surface that separates two classes. Consider Figure 1.8 where the decision surface is shown in yellow, boats are shown inside red circles and houses are shown inside green squares.
Obviously such classification is not ideal and can lead to misclassifications. For example, if there are boats that are located on the shore, they will be misclassified as houses (see Figure 1.9).
Figure 1.8
Figure 1.9. Annotation in the figure: “These boats will be misclassified as houses”
Problem: What will be the classification of this house that is located before the coast line?
Figure 1.10
Answer: It will be classified as a boat because it is before the coast line.
The methods that build classification models (i.e., “classification algorithms”) operate very similarly to the example above. First, all objects (boats and houses) are represented geometrically; for example, see Figure 1.11. In this example, the horizontal axis is longitude and the vertical one is latitude. The details of this representation are provided in Chapter 2.
Then the algorithm seeks to find a decision surface that separates classes of objects (Figure 1.12). The example decision surface is shown in yellow.
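Figure 1.11. Boats and houses plotted by longitude (horizontal axis) and latitude (vertical axis).
Figure 1.12. The same longitude/latitude plot of boats and houses with an example decision surface.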
The coast line is one possible decision surface; however, there are infinitely many other decision surfaces to separate these two classes of objects without errors. Two more decision surfaces are shown in Figure 1.13 with blue and magenta, respectively.
Once we have defined a decision surface, we can use the following rule for classification: unseen (new) objects are classified as “boats” if they fall below the decision surface and otherwise as “houses” (Figure 1.14). The decision surface together with the above classification rule define the classification model.
Figure 1.13. Longitude/latitude plot of boats and houses with additional decision surfaces.
Figure 1.14. Longitude/latitude plot of unseen objects (marked “?”) on either side of the decision surface, annotated “These objects are classified as boats” and “These objects are classified as houses”.
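The classification rule stated above (below the decision surface means “boat”, otherwise “house”) can be written in a few lines once the decision surface is given a concrete form. The sketch below assumes, purely for illustration, that the surface is the straight line latitude = slope * longitude + intercept; the coefficient values and the function name `classify` are hypothetical and do not come from the figures.

```python
def classify(longitude, latitude, slope=0.5, intercept=1.0):
    """Apply the rule: below the decision surface -> "boat", otherwise -> "house".

    The decision surface is assumed to be the line
    latitude = slope * longitude + intercept (illustrative values only).
    """
    surface_latitude = slope * longitude + intercept
    return "boat" if latitude < surface_latitude else "house"


# Three unseen objects given as (longitude, latitude) pairs.
for obj in [(2.0, 1.0), (2.0, 3.0), (4.0, 2.5)]:
    print(obj, classify(*obj))
```

The model is fully specified by the surface (here, `slope` and `intercept`) together with the rule inside `classify`; changing the surface changes the predictions without touching the rule.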
Problem: What will be the classification of the three objects shown in Figure 1.15 below?
Figure 1.15. Longitude/latitude plot showing Object #1, Object #2, and Object #3 relative to the decision surface.
Answer: Object #1 will be classified as a house (it is above the decision surface), object #2 will be classified as a boat (it is below the decision surface), and object #3 will be classified as a boat (it is below the decision surface).
Main Ideas of the Support Vector Machine (SVM) Classification Algorithm
In this book we focus on a family of classification algorithms (i.e., they solve problems of class I) known as Support Vector Machines (SVMs). Extensions of the SVM algorithm can be applied to solve problems of classes II-V.
Consider a dataset shown in Figure 1.16. This graph is a representation (scatter plot) of the health status of subjects in a two-dimensional space. Each coordinate represents a gene expression level. The symbols (stars and circles) represent the two categories of subjects. We want to build a classifier to differentiate cancer patients from normal subjects based on two genes X and Y.
Figure 1.16. Scatter plot of cancer patients and normal subjects (axes: Gene X, Gene Y).
Support vector machines seek a linear decision surface (e.g., a line in a two-dimensional space) that can separate classes of objects and has the largest distance (or largest “gap” or “margin”) between border-line objects (that are also called “support vectors”). See Figure 1.17 for an example of several linear decision surfaces that can separate classes of objects (shown as lines of different colors); an infinite number of such decision surfaces exist in this data. See Figure 1.18 for an example of a linear decision surface that can separate classes of objects and also has the largest gap between support vectors (border-line objects); only one such decision surface exists. The support vectors are three objects (one normal subject and two patients with cancer) that are shown with yellow highlighting in the figure.
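Figure 1.17. Scatter plot of cancer patients and normal subjects (axes: Gene X, Gene Y) with several linear decision surfaces that separate the classes.
Figure 1.18. The same scatter plot with the linear decision surface that has the largest gap; the support vectors are highlighted.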
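To make the notion of the “gap” concrete, the sketch below computes, for a few hand-picked candidate lines, the smallest distance from any training point to the line, and keeps the candidate with the largest such distance. This only compares a fixed list of candidates; an actual SVM finds the optimal line by solving an optimization problem, which is covered later in the book. The points, labels, and candidate lines are made up for illustration.

```python
import math


def margin(w, b, points, labels):
    """Smallest signed distance from any point to the line w.x + b = 0.

    The value is positive only if every point lies on the side of the line
    that matches its label (+1 or -1).
    """
    norm = math.hypot(w[0], w[1])
    return min(y * (w[0] * x1 + w[1] * x2 + b) / norm
               for (x1, x2), y in zip(points, labels))


# Hypothetical data: (Gene X, Gene Y) levels; -1 = normal, +1 = cancer.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, +1, +1]

# Candidate separating lines, each given as (w, b) for w.x + b = 0.
candidates = {
    "A": ((1.0, 0.0), -3.0),   # vertical line x = 3
    "B": ((0.0, 1.0), -2.5),   # horizontal line y = 2.5
    "C": ((1.0, 1.0), -5.5),   # x + y = 5.5
    "D": ((2.0, 3.0), -13.5),  # perpendicular bisector of (2, 1) and (4, 4)
}

best = max(candidates, key=lambda name: margin(*candidates[name], points, labels))
w, b = candidates[best]
best_margin = margin(w, b, points, labels)

# Border-line objects: the points that sit exactly at the margin distance.
support = [p for p, y in zip(points, labels)
           if abs(y * (w[0] * p[0] + w[1] * p[1] + b) / math.hypot(*w) - best_margin) < 1e-9]

print(best, round(best_margin, 4), support)
```

All four candidates separate the two classes, but they differ in margin. Candidate D wins with a margin of half the distance between the closest pair of opposite-class points, (2, 1) and (4, 4), and no separating line can do better than that; those two points are the border-line objects for this toy dataset.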
Problem: Which of the decision surfaces in Figure 1.19 a) or b) are linear and have maximum gap between border-line objects?
Figure 1.19. Two panels, a) and b), each a scatter plot of cancer patients and normal subjects (axes: Gene X, Gene Y) with a decision surface.
Answer: None. Decision surface in a) does not have maximum gap, and the one in b) is non-linear.
Now consider another dataset shown in Figure 1.20. Obviously, there is no linear decision surface that separates normal subjects from cancer patients on the basis of genes X and Y.
In such cases SVMs “map” the data into a higher dimensional space (often much higher) known as the “feature space”, where a separating linear decision surface exists and can be determined. The feature space results from a clever mathematical construction known as the “kernel trick”, see Figure 1.21.
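Figure 1.20. Scatter plot of cancer and normal subjects (axes: Gene X, Gene Y).
Figure 1.21. A kernel maps the cancer and normal subjects from the Gene X / Gene Y space into a feature space, where a linear decision surface separates them.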
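The mapping idea can be tried out on a tiny example. The sketch below uses one possible choice, the degree-2 polynomial kernel k(x, z) = (x · z)², together with its explicit feature map phi(x) = (x1², √2·x1·x2, x2²). It checks numerically that the kernel equals the inner product in the feature space, and that a ring-shaped toy dataset which no line can separate in the plane is separated by a linear function of the mapped coordinates. The data and the weights of the separating function are made up for illustration.

```python
import math


def phi(x):
    """Explicit feature map from the 2-D input space to a 3-D feature space."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed without ever building phi."""
    return dot(x, z) ** 2


# The kernel value matches the inner product of the mapped points.
x, z = (1.0, 2.0), (3.0, -1.0)
print(poly_kernel(x, z), dot(phi(x), phi(z)))

# Hypothetical ring-shaped data: one class sits inside the other.
inner = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5)]
outer = [(2.0, 0.0), (-2.0, 0.0), (0.0, 2.0), (0.0, -2.0)]


def decision(x, w=(1.0, 0.0, 1.0), b=-1.0):
    """A linear function in feature space: w . phi(x) + b = x1^2 + x2^2 - 1."""
    return dot(w, phi(x)) + b


print([decision(p) < 0 for p in inner], [decision(p) > 0 for p in outer])
```

In the original plane the inner points lie inside the convex hull of the outer points, so no straight line can separate the two groups. After the mapping, the plane w · phi(x) + b = 0 does, and back in the original coordinates it corresponds to the circle x1² + x2² = 1. The kernel function gives the same inner products as `phi` without constructing the higher dimensional vectors explicitly, which is the practical appeal of the approach.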
The above examples were concerned with two-dimensional data (described by two genes); however, SVMs are particularly helpful with data that has many thousands of dimensions (e.g., genes, proteins, SNPs, etc.) when the analyst cannot make such intuitive drawings.
History of SVMs and Their Use in the Literature
Support vector machines have a long history of development starting from the early 1950’s. In 1950 Aronszajn introduced the theory of reproducing kernels which broadly constitutes a theoretical basis of support vector machines (Aronszajn, 1950). The next milestone was the invention by Rosenblatt of a linear classifier called the Perceptron in 1957 (Rosenblatt, 1962). Then in 1963 Vapnik and Lerner introduced the Generalized Portrait algorithm, a particular case of which computes the optimal margin linear classifier, the linear version of SVMs (Vapnik and Lerner, 1963). In 1964 Aizerman, Braverman and Rozonoer introduced the geometrical interpretation of kernels as inner products in a feature space (Aizerman et al., 1964). They formally proved the duality (equivalence) of Perceptrons and Potential functions (a particular class of radial basis function), which was later referred to as the “kernel trick”. This interpretation is a key component of non-linear SVMs. In 1965 Cover discussed large margin hyperplanes and also talked about data sparseness, both of which are fundamental aspects of SVMs (Cover, 1965). Then in 1968 Smith introduced the notion of slack variables to deal with noisy and linearly non-separable data (Smith, 1968). This was a very important milestone because slack variables are instrumental in soft-margin SVMs. In 1974–1979 Vapnik and Chervonenkis laid the foundation of learning theory, which gives a strong theoretical backing of SVMs and explains their superior classification performance (Vapnik, 1979; Vapnik and Chervonenkis, 1974). In 1975 Poggio proposed the much used polynomial kernel in SVMs (Poggio, 1975). In 1990 Wahba advanced the field of kernel methods for regression and contributed the celebrated “Representer Theorem”, which states that regularized risk functionals in reproducing kernel Hilbert spaces (RKHS) admit solutions that are linear combinations of kernel functions of the training examples (Wahba, 1990). In 1990 Poggio and Girosi established connections between kernel regression methods and neural networks (Poggio and Girosi, 1990a; Poggio and Girosi, 1990b). SVMs in their modern form were first introduced in the milestone work by Boser, Guyon and Vapnik in 1992 (Boser et al., 1992). Specifically, Boser, Guyon and Vapnik extended the optimal margin linear classifier that was proposed by Vapnik in 1963 to non-linear cases. They suggested a way to create non-linear classifiers by applying the kernel trick (originally proposed by Aizerman et al. in 1964) to maximum-margin classifiers. The resulting algorithm is formally similar, except that every dot product is replaced by a non-linear kernel function; it is what is now known as the SVM.² In 1995 Cortes and Vapnik introduced the soft-margin version of SVM classifiers that use slack variables and are suitable for noisy and linearly non-separable data (Cortes and Vapnik, 1995).
² The group of Vapnik and that of Aizerman both worked in the sixties in the same institution in Moscow, but it took another 30 years to put together the two algorithms they proposed and give birth to the modern SVMs!
Figure 1.22. Use of Support Vector Machines in the Literature (number of publications per year).
Year    General sciences    Biomedicine
1998    359                 4
1999    621                 12
2000    906                 46
2001    1,430               99
2002    2,330               201
2003    3,530               351
2004    4,950               521
2005    6,660               726
2006    8,180               917
2007    8,860               1,190
Figure 1.22 shows the number of publications that reported the use of support vector machines in biomedicine and general sciences. The data was obtained from the Google Scholar system (http://scholar.google.com/) on 10/23/2009 using the query “support vector machines” OR “support vector machine” OR “support vector classifier” OR “support vector classifiers” OR “support vector classification” to retrieve relevant publications. For biomedical publications, we used the subject categories “Biology, Life Sciences, and Environmental Science” and “Medicine, Pharmacology, and Veterinary Science”. For general sciences, we used the remaining subject categories. Figure 1.23 presents similar statistics for use of linear regression (querying “linear regression”). As can be seen from Figure 1.22, the use of SVMs is significantly increasing from year to year both in biomedicine and general sciences; however, general sciences utilize support vector machines much more than biomedicine. On the other hand, an established and mathematically simpler method such as linear regression is used in more or less the same way in biomedicine and general sciences, and its use is not increasing significantly over time (Figure 1.23). From our experience of teaching and applying SVMs, the relatively small adoption rate of these methods in biomedicine can be attributed to a large degree to a lack of technical background of biomedical researchers that impedes grasping both the theory and applications of SVMs. That is why the primary purpose of this book is to introduce SVMs and their extensions in a very easy manner to allow biomedical researchers to understand and apply these important methods to real-life problems.
Figure 1.23. Use of Linear Regression in the Literature (number of publications per year, 1998-2007; series: General sciences and Biomedicine).
Data labels of the two series, in the order they appear in the figure:
9,770; 10,800; 12,000; 13,500; 14,900; 16,000; 17,700; 19,500; 20,000; 19,600
14,900; 15,500; 19,200; 18,700; 19,100; 22,200; 24,100; 20,100; 17,700; 18,300