Applied Statistics Using SPSS, STATISTICA, MATLAB and R


Page 1: Applied Statistics Using SPSS, STATISTICA, MATLAB and R978-3-540-71972-4/1 · With 195 Figures and a CD 123 Joaquim P. Marques de Sá Applied Statistics Using SPSS, STATISTICA, MATLAB

Applied Statistics Using SPSS, STATISTICA, MATLAB and R


With 195 Figures and a CD


Joaquim P. Marques de Sá

Applied Statistics Using SPSS, STATISTICA, MATLAB and R


Typesetting: by the editors
Production: Integra Software Services Pvt. Ltd., India
Cover design: WMX design, Heidelberg

Printed on acid-free paper SPIN: 11908944 42/3100/Integra 5 4 3 2 1 0

Library of Congress Control Number: 2007926024

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2007

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

ISBN 978-3-540-71971-7 Springer Berlin Heidelberg New York

Prof. Dr. Joaquim P. Marques de Sá
Universidade do Porto
Fac. Engenharia
Rua Dr. Roberto Frias s/n
4200-465 Porto
Portugal
e-mail: [email protected]


To Wiesje and Carlos.


Contents

Preface to the Second Edition

Preface to the First Edition

Symbols and Abbreviations

1 Introduction

1.1 Deterministic Data and Random Data
1.2 Population, Sample and Statistics
1.3 Random Variables
1.4 Probabilities and Distributions
1.4.1 Discrete Variables
1.4.2 Continuous Variables
1.5 Beyond a Reasonable Doubt...
1.6 Statistical Significance and Other Significances
1.7 Datasets
1.8 Software Tools
1.8.1 SPSS and STATISTICA
1.8.2 MATLAB and R

2 Presenting and Summarising the Data

2.1 Preliminaries
2.1.1 Reading in the Data
2.1.2 Operating with the Data
2.2 Presenting the Data
2.2.1 Counts and Bar Graphs
2.2.2 Frequencies and Histograms
2.2.3 Multivariate Tables, Scatter Plots and 3D Plots
2.2.4 Categorised Plots
2.3 Summarising the Data
2.3.1 Measures of Location
2.3.2 Measures of Spread
2.3.3 Measures of Shape
2.3.4 Measures of Association for Continuous Variables
2.3.5 Measures of Association for Ordinal Variables
2.3.6 Measures of Association for Nominal Variables
Exercises

3 Estimating Data Parameters

3.1 Point Estimation and Interval Estimation
3.2 Estimating a Mean
3.3 Estimating a Proportion
3.4 Estimating a Variance
3.5 Estimating a Variance Ratio
3.6 Bootstrap Estimation
Exercises

4 Parametric Tests of Hypotheses

4.1 Hypothesis Test Procedure
4.2 Test Errors and Test Power
4.3 Inference on One Population
4.3.1 Testing a Mean
4.3.2 Testing a Variance
4.4 Inference on Two Populations
4.4.1 Testing a Correlation
4.4.2 Comparing Two Variances
4.4.3 Comparing Two Means
4.5 Inference on More than Two Populations
4.5.1 Introduction to the Analysis of Variance
4.5.2 One-Way ANOVA
4.5.3 Two-Way ANOVA
Exercises

5 Non-Parametric Tests of Hypotheses

5.1 Inference on One Population
5.1.1 The Runs Test
5.1.2 The Binomial Test
5.1.3 The Chi-Square Goodness of Fit Test
5.1.4 The Kolmogorov-Smirnov Goodness of Fit Test
5.1.5 The Lilliefors Test for Normality
5.1.6 The Shapiro-Wilk Test for Normality
5.2 Contingency Tables
5.2.1 The 2×2 Contingency Table
5.2.2 The r×c Contingency Table
5.2.3 The Chi-Square Test of Independence
5.2.4 Measures of Association Revisited
5.3 Inference on Two Populations
5.3.1 Tests for Two Independent Samples
5.3.2 Tests for Two Paired Samples
5.4 Inference on More Than Two Populations
5.4.1 The Kruskal-Wallis Test for Independent Samples
5.4.2 The Friedmann Test for Paired Samples
5.4.3 The Cochran Q test
Exercises

6 Statistical Classification

6.1 Decision Regions and Functions
6.2 Linear Discriminants
6.2.1 Minimum Euclidian Distance Discriminant
6.2.2 Minimum Mahalanobis Distance Discriminant
6.3 Bayesian Classification
6.3.1 Bayes Rule for Minimum Risk
6.3.2 Normal Bayesian Classification
6.3.3 Dimensionality Ratio and Error Estimation
6.4 The ROC Curve
6.5 Feature Selection
6.6 Classifier Evaluation
6.7 Tree Classifiers
Exercises

7 Data Regression

7.1 Simple Linear Regression
7.1.1 Simple Linear Regression Model
7.1.2 Estimating the Regression Function
7.1.3 Inferences in Regression Analysis
7.1.4 ANOVA Tests
7.2 Multiple Regression
7.2.1 General Linear Regression Model
7.2.2 General Linear Regression in Matrix Terms
7.2.3 Multiple Correlation
7.2.4 Inferences on Regression Parameters
7.2.5 ANOVA and Extra Sums of Squares
7.2.6 Polynomial Regression and Other Models
7.3 Building and Evaluating the Regression Model
7.3.1 Building the Model
7.3.2 Evaluating the Model
7.3.3 Case Study
7.4 Regression Through the Origin
7.5 Ridge Regression
7.6 Logit and Probit Models
Exercises

8 Data Structure Analysis

8.1 Principal Components
8.2 Dimensional Reduction
8.3 Principal Components of Correlation Matrices
8.4 Factor Analysis
Exercises

9 Survival Analysis

9.1 Survivor Function and Hazard Function
9.2 Non-Parametric Analysis of Survival Data
9.2.1 The Life Table Analysis
9.2.2 The Kaplan-Meier Analysis
9.2.3 Statistics for Non-Parametric Analysis
9.3 Comparing Two Groups of Survival Data
9.4 Models for Survival Data
9.4.1 The Exponential Model
9.4.2 The Weibull Model
9.4.3 The Cox Regression Model
Exercises

10 Directional Data

10.1 Representing Directional Data
10.2 Descriptive Statistics
10.3 The von Mises Distributions
10.4 Assessing the Distribution of Directional Data
10.4.1 Graphical Assessment of Uniformity
10.4.2 The Rayleigh Test of Uniformity
10.4.3 The Watson Goodness of Fit Test
10.4.4 Assessing the von Misesness of Spherical Distributions
10.5 Tests on von Mises Distributions
10.5.1 One-Sample Mean Test
10.5.2 Mean Test for Two Independent Samples
10.6 Non-Parametric Tests
10.6.1 The Uniform Scores Test for Circular Data
10.6.2 The Watson Test for Spherical Data
10.6.3 Testing Two Paired Samples
Exercises


Appendix A - Short Survey on Probability Theory

A.1 Basic Notions
A.1.1 Events and Frequencies
A.1.2 Probability Axioms
A.2 Conditional Probability and Independence
A.2.1 Conditional Probability and Intersection Rule
A.2.2 Independent Events
A.3 Compound Experiments
A.4 Bayes’ Theorem
A.5 Random Variables and Distributions
A.5.1 Definition of Random Variable
A.5.2 Distribution and Density Functions
A.5.3 Transformation of a Random Variable
A.6 Expectation, Variance and Moments
A.6.1 Definitions and Properties
A.6.2 Moment-Generating Function
A.6.3 Chebyshev Theorem
A.7 The Binomial and Normal Distributions
A.7.1 The Binomial Distribution
A.7.2 The Laws of Large Numbers
A.7.3 The Normal Distribution
A.8 Multivariate Distributions
A.8.1 Definitions
A.8.2 Moments
A.8.3 Conditional Densities and Independence
A.8.4 Sums of Random Variables
A.8.5 Central Limit Theorem

Appendix B - Distributions

B.1 Discrete Distributions
B.1.1 Bernoulli Distribution
B.1.2 Uniform Distribution
B.1.3 Geometric Distribution
B.1.4 Hypergeometric Distribution
B.1.5 Binomial Distribution
B.1.6 Multinomial Distribution
B.1.7 Poisson Distribution
B.2 Continuous Distributions
B.2.1 Uniform Distribution
B.2.2 Normal Distribution
B.2.3 Exponential Distribution
B.2.4 Weibull Distribution
B.2.5 Gamma Distribution
B.2.6 Beta Distribution
B.2.7 Chi-Square Distribution
B.2.8 Student’s t Distribution
B.2.9 F Distribution
B.2.10 Von Mises Distributions

Appendix C - Point Estimation

C.1 Definitions
C.2 Estimation of Mean and Variance

Appendix D - Tables

D.1 Binomial Distribution
D.2 Normal Distribution
D.3 Student’s t Distribution
D.4 Chi-Square Distribution
D.5 Critical Values for the F Distribution

Appendix E - Datasets

E.1 Breast Tissue
E.2 Car Sale
E.3 Cells
E.4 Clays
E.5 Cork Stoppers
E.6 CTG
E.7 Culture
E.8 Fatigue
E.9 FHR
E.10 FHR-Apgar
E.11 Firms
E.12 Flow Rate
E.13 Foetal Weight
E.14 Forest Fires
E.15 Freshmen
E.16 Heart Valve
E.17 Infarct
E.18 Joints
E.19 Metal Firms
E.20 Meteo
E.21 Moulds
E.22 Neonatal
E.23 Programming
E.24 Rocks
E.25 Signal & Noise
E.26 Soil Pollution
E.27 Stars
E.28 Stock Exchange
E.29 VCG
E.30 Wave
E.31 Weather
E.32 Wines

Appendix F - Tools

F.1 MATLAB Functions
F.2 R Functions
F.3 Tools EXCEL File
F.4 SCSize Program

References

Index


Preface to the Second Edition

Four years have passed since the first edition of this book. During this time I have had the opportunity to apply it in classes, obtaining feedback from students and inspiration for improvements. I have also benefited from many comments by users of the book. For the present second edition large parts of the book have undergone major revision, although the basic concept (concise but sufficiently rigorous mathematical treatment with emphasis on computer applications to real datasets) has been retained.

The second edition improvements are as follows:

• Inclusion of R as an application tool. As a matter of fact, R is a free software product which has nowadays reached a high level of maturity and is being increasingly used by many people as a statistical analysis tool.

• Chapter 3 has an added section on bootstrap estimation methods, which have gained a large popularity in practical applications.

• A revised explanation and treatment of tree classifiers in Chapter 6 with the inclusion of the QUEST approach.

• Several improvements of Chapter 7 (regression), namely: details concerning the meaning and computation of multiple and partial correlation coefficients, with examples; a more thorough treatment and exemplification of the ridge regression topic; more attention dedicated to model evaluation.

• Inclusion in the book CD of additional MATLAB functions as well as a set of R functions.

• Extra examples and exercises have been added in several chapters.

• The bibliography has been revised and new references added. I have also tried to improve the quality and clarity of the text as well as notation.

Regarding notation, in this second edition I follow the more widespread practice of denoting random variables with italicised capital letters, instead of using a small cursive font as in the first edition. Finally, I have also paid much attention to correcting errors, misprints and obscurities of the first edition.

J.P. Marques de Sá

Porto, 2007


Preface to the First Edition

This book is intended as a reference book for students, professionals and research workers who need to apply statistical analysis to a large variety of practical problems using STATISTICA, SPSS and MATLAB. The book chapters provide a comprehensive coverage of the main statistical analysis topics (data description, statistical inference, classification and regression, factor analysis, survival data, directional statistics) that one faces in practical problems, discussing their solutions with the mentioned software packages.

The only prerequisite to use the book is an undergraduate knowledge level of mathematics. While it is expected that most readers employing the book will already have some knowledge of elementary statistics, no previous course in probability or statistics is needed in order to study and use the book. The first two chapters introduce the basic notions of probability and statistics that are needed. In addition, the first two Appendices provide a short survey on Probability Theory and Distributions for the reader needing further clarification on the theoretical foundations of the statistical methods described.

The book is partly based on tutorial notes and materials used in data analysis disciplines taught at the Faculty of Engineering, Porto University. One of these disciplines is attended by students of a Master’s Degree course on information management. The students in this course have a variety of educational backgrounds and professional interests, which generated and brought about datasets and analysis objectives which are quite challenging concerning the methods to be applied and the interpretation of the results. The datasets used in the book examples and exercises were collected from these courses as well as from research. They are included in the book CD and cover a broad spectrum of areas: engineering, medicine, biology, psychology, economy, geology, and astronomy.

Every chapter explains the relevant notions and methods concisely, and is illustrated with practical examples using real data, presented with the distinct intention of clarifying sensible practical issues. The solutions presented in the examples are obtained with one of the software packages STATISTICA, SPSS or MATLAB; therefore, the reader has the opportunity to closely follow what is being done. The book is not intended as a substitute for the STATISTICA, SPSS and MATLAB user manuals. It does, however, provide the necessary guidance for applying the methods taught without having to delve into the manuals. This includes, for each topic explained in the book, a clear indication of which STATISTICA, SPSS or MATLAB tools to be applied. These indications appear in specific “Commands” frames together with a complementary description on how to use the tools, whenever necessary. In this way, a comparative perspective of the capabilities of those software packages is also provided, which can be quite useful for practical purposes.

STATISTICA, SPSS and MATLAB do not provide specific tools for some of the statistical topics described in the book. These range from such basic issues as the choice of the optimal number of histogram bins to more advanced topics such as directional statistics. The book CD provides these tools, including a set of MATLAB functions for directional statistics.

I am grateful to the many people who helped me during the preparation of the book. Professor Luís Alexandre provided help in reviewing the book contents. Professor Willem van Meurs provided constructive comments on several topics. Professor Joaquim Góis contributed many interesting discussions and suggestions, namely on the topic of data structure analysis. Dr. Carlos Felgueiras and Paulo Sousa gave valuable assistance in several software issues and in the development of some software tools included in the book CD. My gratitude also goes to Professor Pimenta Monteiro for his support in elucidating some software tricks during the preparation of the text files. Many people contributed datasets. Their names are mentioned in Appendix E. I express my deepest thanks to all of them. Finally, I would also like to thank Alan Weed for his thorough revision of the texts and the clarification of many editing issues.

J.P. Marques de Sá
Porto, 2003


Symbols and Abbreviations

Sample Sets

A event

A set (of events)

{A1, A2,…} set constituted of events A1, A2,…

Ā complement of {A}

A ∪ B union of {A} with {B}

A ∩ B intersection of {A} with {B}

E set of all events (universe)

φ empty set

Functional Analysis

∃ there is

∀ for every

∈ belongs to

≡ equivalent to

|| || Euclidean norm (vector length)

⇒ implies

→ converges to

ℜ real number set

ℜ+ [0, +∞[

[a, b] closed interval between and including a and b

]a, b] interval between a and b, excluding a

[a, b[ interval between a and b, excluding b

∉ does not belong to


]a, b[ open interval between a and b (excluding a and b)

$\sum_{i=1}^{n}$ sum for index i = 1,…, n

$\prod_{i=1}^{n}$ product for index i = 1,…, n

$\int_a^b$ integral from a to b

k! factorial of k, k! = k(k−1)(k−2)⋯2·1

$\binom{n}{k}$ combinations of n elements taken k at a time

| x | absolute value of x

⌊x⌋ largest integer smaller than or equal to x

gX(a) function g of variable X evaluated at a

$\frac{dg}{dX}$ derivative of function g with respect to X

$\left.\frac{d^n g}{dX^n}\right|_a$ derivative of order n of g evaluated at a

ln(x) natural logarithm of x

log(x) logarithm of x in base 10

sgn(x) sign of x

mod(x,y) remainder of the integer division of x by y
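As a hedged illustration of the notation above (not taken from the book, whose examples use SPSS, STATISTICA, MATLAB and R), the following Python sketch evaluates several of these symbols with the standard `math` module. The helper `sgn` is a hypothetical name introduced here, and returning 0 for x = 0 is an assumption.

```python
import math

def sgn(x):
    """Sign of x: -1, 0 or +1 (the value 0 at x = 0 is an assumption)."""
    return (x > 0) - (x < 0)

k, n = 5, 7

factorial_k = math.factorial(k)             # k! = 5*4*3*2*1
n_choose_k = math.comb(n, k)                # combinations of n elements taken k at a time
sum_1_to_n = sum(range(1, n + 1))           # sum for index i = 1,..., n of i
prod_1_to_k = math.prod(range(1, k + 1))    # product for index i = 1,..., k of i
floor_x = math.floor(3.7)                   # largest integer smaller than or equal to 3.7
abs_x = abs(-2.5)                           # | x |
natural_log = math.log(math.e)              # ln(x)
log_base_10 = math.log10(1000.0)            # log(x), base 10
remainder = 17 % 5                          # mod(17, 5), non-negative integers
```

Note that the product over i = 1,…, k of i reproduces k!, which ties the product and factorial entries of the list together.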

Vectors and Matrices

x vector (column vector), multidimensional random vector

x' transpose vector (row vector)

[x1 x2…xn] row vector whose components are x1, x2,…,xn

xi i-th component of vector x

xk,i i-th component of vector xk

∆x vector x increment

x'y inner (dot) product of x and y

A matrix

aij i-th row, j-th column element of matrix A

A' transpose of matrix A

A−1 inverse of matrix A


|A| determinant of matrix A

tr(A) trace of A (sum of the diagonal elements)

I unit matrix

λi eigenvalue i
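The vector and matrix symbols above can be made concrete with a small sketch. This is an illustration only, written in plain Python for a 2×2 matrix; the values of `x`, `y` and `A` are made up for the example, and the closed-form inverse and eigenvalue formulas used here apply to the 2×2 case only.

```python
import math

x = [1.0, 2.0]
y = [3.0, 4.0]
A = [[4.0, 1.0],
     [2.0, 3.0]]

# x'y: inner (dot) product of x and y
inner = sum(xi * yi for xi, yi in zip(x, y))

# ||x||: Euclidean norm (vector length)
norm_x = math.sqrt(sum(xi * xi for xi in x))

# A': transpose (rows become columns)
A_t = [list(col) for col in zip(*A)]

# |A| and tr(A) for the 2x2 case
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
tr_A = A[0][0] + A[1][1]

# Inverse of a 2x2 matrix (requires |A| != 0)
A_inv = [[ A[1][1] / det_A, -A[0][1] / det_A],
         [-A[1][0] / det_A,  A[0][0] / det_A]]

# Eigenvalues: roots of lambda^2 - tr(A)*lambda + |A| = 0
# (assumes real roots, i.e. tr(A)^2 >= 4|A|, which holds for this A)
disc = math.sqrt(tr_A ** 2 - 4.0 * det_A)
eigenvalues = [(tr_A + disc) / 2.0, (tr_A - disc) / 2.0]
```

For this matrix the eigenvalues sum to tr(A) and multiply to |A|, a quick consistency check that holds for any square matrix.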

Probabilities and Distributions

X random variable (with value denoted by the same lower case letter, x)

P(A) probability of event A

P(A|B) probability of event A conditioned on B having occurred

P(x) discrete probability of random vector x

P(ωi|x) discrete conditional probability of ωi given x

f(x) probability density function f evaluated at x

f(x |ωi) conditional probability density function f evaluated at x given ωi

X ~ f X has probability density function f

X ~ F X has probability distribution function (is distributed as) F

Pe probability of misclassification (error)

Pc probability of correct classification

df degrees of freedom

xdf,α α-percentile of X distributed with df degrees of freedom

bn,p binomial probability for n trials and probability p of success

Bn,p binomial distribution for n trials and probability p of success

u uniform probability or density function

U uniform distribution

gp geometric probability (Bernoulli trial with probability p)

Gp geometric distribution (Bernoulli trial with probability p)

hN,D,n hypergeometric probability (sample of n out of N with D items)

HN,D,n hypergeometric distribution (sample of n out of N with D items)

pλ Poisson probability with event rate λ

Pλ Poisson distribution with event rate λ

nµ,σ normal density with mean µ and standard deviation σ


Nµ,σ normal distribution with mean µ and standard deviation σ

ελ exponential density with spread factor λ

Ελ exponential distribution with spread factor λ

wα,β Weibull density with parameters α, β

Wα,β Weibull distribution with parameters α, β

γa,p Gamma density with parameters a, p

Γa,p Gamma distribution with parameters a, p

βp,q Beta density with parameters p, q

Βp,q Beta distribution with parameters p, q

$\chi^2_{df}$ Chi-square density with df degrees of freedom

$\mathrm{X}^2_{df}$ Chi-square distribution with df degrees of freedom

$t_{df}$ Student’s t density with df degrees of freedom

$T_{df}$ Student’s t distribution with df degrees of freedom

$f_{df_1,df_2}$ F density with df1, df2 degrees of freedom

$F_{df_1,df_2}$ F distribution with df1, df2 degrees of freedom

Statistics

x̂ estimate of x

Ε[X] expected value (average, mean) of X

V[X] variance of X

Ε[x | y] expected value of x given y (conditional expectation)

$m_k$ central moment of order k

µ mean value

σ standard deviation

$\sigma_{XY}$ covariance of X and Y

ρ correlation coefficient

µ mean vector


Σ covariance matrix

x̄ arithmetic mean

v sample variance

s sample standard deviation

xα α-quantile of X ($F_X(x_\alpha) = \alpha$)

med(X) median of X (same as x0.5)

S sample covariance matrix

α significance level (1−α is the confidence level)

xα α-percentile of X

ε tolerance
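As a minimal sketch of the sample statistics listed above, the following Python fragment uses the standard `statistics` module on a made-up sample. It assumes that the sample variance v uses the divisor n − 1, which is what `statistics.variance` computes; the symbol list itself does not state the divisor.

```python
import statistics

# Hypothetical sample, chosen only for illustration
sample = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]

mean_x = statistics.mean(sample)       # arithmetic mean
v = statistics.variance(sample)        # sample variance, divisor n - 1 (assumption)
s = statistics.stdev(sample)           # sample standard deviation, s = sqrt(v)
median_x = statistics.median(sample)   # med(X), same as x0.5
```

For this sample the squared deviations from the mean add up to 24, so v = 24/5 and s is its square root.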

Abbreviations

FNR False Negative Ratio

FPR False Positive Ratio

iff if and only if

i.i.d. independent and identically distributed

IRQ inter-quartile range

pdf probability density function

LSE Least Square Error

ML Maximum Likelihood

MSE Mean Square Error

PDF probability distribution function

RMS Root Mean Square Error

r.v. Random variable

ROC Receiver Operating Characteristic

SSB Between-group Sum of Squares

SSE Error Sum of Squares

SSLF Lack of Fit Sum of Squares

SSPE Pure Error Sum of Squares

SSR Regression Sum of Squares


SST Total Sum of Squares

SSW Within-group Sum of Squares

TNR True Negative Ratio

TPR True Positive Ratio

VIF Variance Inflation Factor
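The four classification ratios abbreviated above (FNR, FPR, TNR, TPR) can be sketched as follows. This assumes the conventional definitions in terms of the counts of a two-class decision table, which this list does not spell out; the function name `classification_ratios` and the counts are hypothetical.

```python
def classification_ratios(tp, fn, fp, tn):
    """Return TPR, FNR, TNR and FPR from the four counts of a two-class test.

    tp, fn: truly positive cases classified as positive / negative
    fp, tn: truly negative cases classified as positive / negative
    """
    positives = tp + fn   # all truly positive cases
    negatives = tn + fp   # all truly negative cases
    return {
        "TPR": tp / positives,
        "FNR": fn / positives,
        "TNR": tn / negatives,
        "FPR": fp / negatives,
    }

ratios = classification_ratios(tp=40, fn=10, fp=5, tn=45)
```

Under these definitions TPR + FNR = 1 and TNR + FPR = 1, so two of the four ratios determine the other two.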

Tradenames

EXCEL Microsoft Corporation

MATLAB The MathWorks, Inc.

SPSS SPSS, Inc.

STATISTICA StatSoft, Inc.

WINDOWS Microsoft Corporation