Introduction to Probability and Statistics
Xi Kathy Zhou, PhD
Division of Biostatistics and Epidemiology
Department of Public Health
http://www.med.cornell.edu/public.health/biostat.htm
Feb. 2008


Page 1:

Introduction to Probability and Statistics

Xi Kathy Zhou, PhD
Division of Biostatistics and Epidemiology
Department of Public Health
http://www.med.cornell.edu/public.health/biostat.htm
Feb. 2008

Page 2:

Overview

Statistics: the mathematics of the collection, organization and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling (definition in the American Heritage Dictionary).

Why statistics: by studying the characteristics of a small collection of observations, valid inferences about the entire population can be drawn.

Probability theory is the basic tool for statistical inference

Page 3:

Outline

Basic concepts in probability
• Events and random variables
• Probability and probability distributions
• Means, variances and moments
• Joint, marginal and conditional probabilities
• Dependence and independence

Basic concepts in statistics
• Data
• Descriptive statistics
• Statistical inference – estimation
• Statistical inference – hypothesis testing

Page 4:

Probability – a measure of uncertainty

Example:

Random experiment | Possible outcomes
Toss a coin | H, T
Roll a 6-sided die | {3}, {5}, {1,2,3}

What do you think you'll get in the above experiments? How sure are you? Why?

"Each outcome is equally probable." Probability is used as a way to measure uncertainty.

Page 5:

Events

Definitions:
• Random experiment: an experiment which can result in different outcomes, and for which the outcome is unknown in advance.
• Sample space Ω: the set of all possible elementary outcomes of an experiment.
• Event: a subset of the sample space Ω.

Random experiment | Sample space | Events
Toss a coin | {H, T} | {H}, {T}
Roll a 6-sided die | {1,2,3,4,5,6} | {3}, {5}, {1,2,3}

Page 6:

Probability measure

Sigma field F: a collection of subsets of Ω that satisfies
1. If A, B є F, then A ∪ B є F and A ∩ B є F
2. If A є F, then Aᶜ є F
3. Ø є F

Probability measure P on (Ω, F): a function P: F → [0,1] satisfying the following properties (Ø denotes the empty set):
1. P(A) ≥ 0 for any A є F
2. P(Ω) = 1
3. If A, B є F and A ∩ B = Ø, then P(A ∪ B) = P(A) + P(B)

The 6-sided die example:
Sigma field: Ø, {1}, …, {6}, {1,2}, …, {1,2,3,4,5,6}
Sigma field: Ø, {1,2,3}, {4,5,6}, {1,2,3,4,5,6}

Page 7:

Probability measure – some properties

Comparing the uncertainty of events:
4. If A, B є F and A ⊆ B, then P(A) ≤ P(B)

Assessing the uncertainty associated with other events:
5. P(Aᶜ) = 1 − P(A) for A є F, where Aᶜ = Ω − A
6. P(A ∪ B) = P(A) + P(B) − P(A ∩ B) for any A, B є F
7. P(A1 ∪ A2 ∪ … ∪ Ak) = P(A1) + … + P(Ak) for pairwise disjoint A1, A2, …

Example (rolling a 6-sided die): if we know P({1}), …, P({6}), we also know the uncertainty of more complicated events, e.g. P({1,2,4}) = P({1}) + P({2}) + P({4}).

Illustration of rule 6: A ∪ B decomposes into the disjoint pieces A−B, A∩B and B−A, so P(A) + P(B) counts A∩B twice.
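These rules can be verified by enumerating a small sample space; below is a minimal R sketch (R is the software recommended in the references to this lecture), with the events A and B chosen arbitrarily for illustration:

```r
omega <- 1:6                          # sample space of one fair die roll
p     <- rep(1/6, 6)                  # uniform probability measure
P     <- function(E) sum(p[omega %in% E])
A <- c(1, 2, 4); B <- c(2, 4, 6)
P(union(A, B))                        # 4/6
P(A) + P(B) - P(intersect(A, B))      # rule 6 gives the same value, 4/6
```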

Page 8:

Probabilities of events – Examples

Experiment: randomly picking a DNA sequence of length 3.
Event A: the picked sequence is "ATG"; P(A) = (1/4)³ = 1/64.

Experiment: randomly taking a DNA sequence of length 20 from a length-100 sequence containing 20 A's.
• Event A: having 20 A's in a row
• What is the sample space, and what is the probability of event A?
Answer: P(A) = 81 / C(100, 20) — among the C(100,20) possible position sets for the 20 A's, 81 are runs of 20 consecutive positions.
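Both numbers are easy to check numerically; a minimal R sketch (assuming the answer 81/C(100,20) reconstructed above):

```r
(1/4)^3               # P("ATG") for a random length-3 sequence: 1/64
81 / choose(100, 20)  # probability that the 20 A's form one run of 20
```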

Page 9:

Conditional probability. Let Ω be an event space and let P be a probability measure on Ω. Let B є Ω be an event (on which we want to condition). The function

P(·|B): Ω → [0,1], P(A|B) = P(A ∩ B) / P(B)

defines a probability measure on Ω, the conditional probability given B. (Proof as exercise.)

Independence: Let Ω be an event space and let P be a probability measure on Ω. Two events A, B є Ω with P(A)>0, P(B)>0 are called (stochastically) independent if one of the following equivalent conditions (relationships of the two events) holds:

• P(A∩B) = P(A)·P(B)
• P(A|B) = P(A)
• P(B|A) = P(B)

EXAMPLE: Throwing a six-sided fair die, with events A = "even number", B = "< 5", C = "prime number", D = "< 4":
P(A) = ?, P(B) = ?, P(A|B) = ? Are A and B independent?
P(C) = ?, P(D) = ?, P(C|D) = ? Are C and D independent?
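The example can be worked out by enumeration; a minimal R sketch, coding each event as a subset of {1,…,6}:

```r
P <- function(E) length(E) / 6            # fair die: equally likely outcomes
A <- c(2, 4, 6)                           # even number
B <- 1:4                                  # value < 5
C <- c(2, 3, 5)                           # prime number
D <- 1:3                                  # value < 4
c(P(intersect(A, B)), P(A) * P(B))        # 1/3 and 1/3 -> A, B independent
c(P(intersect(C, D)), P(C) * P(D))        # 1/3 vs 1/4  -> C, D dependent
```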

Page 10:

Random variable

Random variable: a function X: Ω → R with the property that {ω є Ω : X(ω) ≤ x} є F for each x є R.

A more common description of the results of a random experiment.

Takes on values from a set of mutually exclusive and collectively exhaustive states, representing each state with a number.

Usually denoted by capital letters, e.g. X, Y, Z, etc.

Realizations of random variables are usually denoted in lower case, e.g., x,y,z, etc.

Can be discrete or continuous

Page 11:

Types of Random Variables

Discrete random variable: any variable whose possible values either form a finite set or can be listed in a countably infinite sequence.

Continuous random variableAny variable whose possible values consist of an entire interval on the number line.

Page 12:

Probability distributions

Definition: the probability distribution of a random variable X is the function F: R → [0,1] given by

F(x) = P(X ≤ x)

Characterizes the uncertainties of a random experiment before the experiment is conducted, i.e. we know that some results are more likely than others.

Page 13:

Discrete Random Variable – Probability distribution function (pdf)

A discrete random variable X with values x1, x2, …, xk, … has probability distribution

P(X = xi) = pi, i = 1, 2, …, k, …

where the pi form the probability mass function, satisfying

0 ≤ pi ≤ 1 and p1 + p2 + … + pk + … = 1

Range of this random variable: x1, x2, …, xk, …

Page 14:

Discrete Random Variable – Cumulative distribution function (cdf)

The cumulative distribution function F(x) of a discrete random variable X is defined by

F(x) = P(X ≤ x) = Σ_{i: xi ≤ x} pi

Properties of the cdf:
• 0 ≤ F(x) ≤ 1
• If x ≤ y then F(x) ≤ F(y)
• Discrete case: F is a step function, continuous from the right, with jump discontinuities at x1, x2, …, xk, … of heights p1, p2, …, pk, …

Page 15:

Discrete Random Variable – pdf and cdf example

Random experiment: roll 2 dice. Random variable X: the sum of both values.

[Plots: probability distribution function and cumulative distribution function of X]
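The exact pdf and cdf for this example can be computed by enumerating all 36 outcomes; a minimal R sketch:

```r
s   <- outer(1:6, 1:6, `+`)   # all 36 equally likely sums of two dice
pmf <- table(s) / 36          # P(X = k) for k = 2, ..., 12
cdf <- cumsum(pmf)            # F(k) = P(X <= k); a step function
round(rbind(pmf, cdf), 3)
```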

Page 16:

Discrete Random Variable – Probability calculation rules

Probabilities of events follow from the cdf; e.g. P(X ≤ b) = F(b), P(X > a) = 1 − F(a), and P(a < X ≤ b) = F(b) − F(a) if a < b.

Page 17:

Discrete random variables – Examples of common distributions

Discrete uniform distribution (rolling a fair 6-sided die)

Geometric distribution (repeat a Bernoulli experiment until the first success, i.e. the first occurrence of an event A)

Page 18:

Discrete random variable – Discrete uniform distribution

A discrete random variable X is called uniformly distributed on the range x1, x2, …, xk if for all i = 1, …, k:

P(X = xi) = 1/k

Example: roll a fair die (P(X = i) = 1/6 for i = 1, …, 6).

[Plot: probability distribution of a uniform distribution]

Page 19:

Discrete Random Variable – Geometric distribution (1)

Random experiment: repeat a Bernoulli experiment until the first success.
Outcomes: H, TH, TTH, …
Probability of a success: P(H) = π
Random variable X: "number of trials until the first success", with values 1, 2, …
X has a geometric distribution with parameter π.

The probability distribution function has the form

P(X = k) = (1 − π)^(k−1)·π, k = 1, 2, …

The cumulative distribution function has the form

F(k) = P(X ≤ k) = 1 − (1 − π)^k
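Note that R's built-in geometric distribution counts failures before the first success, so the slide's X (number of trials) corresponds to shifting by one; a minimal sketch for a fair coin (π = 0.5):

```r
ps <- 0.5                    # success probability pi
k  <- 1:5
dgeom(k - 1, prob = ps)      # P(X = k), with X = number of trials
(1 - ps)^(k - 1) * ps        # the pmf above, computed directly
pgeom(k - 1, prob = ps)      # F(k) = 1 - (1 - pi)^k
```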

Page 20:

Discrete random variable –Geometric distribution (2)

Page 21:

Discrete Random variable –Geometric distribution (3)

Page 22:

Discrete random variable – Mean

The mean is the value you expect to get on average in a random experiment.

Example: if you toss a coin 10 times, you expect to get 5 heads and 5 tails. You expect this because the probability of getting heads is 0.5, and 10 × 0.5 = 5.

Definition: the mean (expected value) of a discrete random variable with values x1, x2, …, xk, … and probability distribution p1, p2, …, pk, … is

E(X) = x1·p1 + x2·p2 + … = Σ_i xi·pi

Note that E(X) characterizes the random experiment.

Page 23:

Discrete random variable – Mean (Example)

Binary random variable X: assume P(X=1) = π and P(X=0) = 1−π; then

E(X) = 0·P(X=0) + 1·P(X=1) = π

Toss a fair coin: X = gain/loss of a monetary unit (X = +1 or −1). If P(X=1) = P(X=−1) = 1/2, E(X) = ?

Roll a fair die:
Once: X = the value shown, E(X) = ?
Twice: X = the sum of the values, E(X) = ?

Page 24:

Discrete random variable – Variance and Standard deviation

The variance of a discrete random variable X is

Var(X) = E[(X − E(X))²] = Σ_i (xi − E(X))²·pi

The standard deviation is σ(X) = √Var(X).
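For a fair die these definitions can be evaluated directly; a minimal R sketch:

```r
x <- 1:6; p <- rep(1/6, 6)     # values and probabilities of a fair die
EX   <- sum(x * p)             # E(X) = 3.5
VarX <- sum((x - EX)^2 * p)    # Var(X) = 35/12, about 2.917
c(EX, VarX, sqrt(VarX))        # mean, variance, standard deviation
```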

Page 25:

Discrete random variables – Independence

Definition: two discrete random variables X with range x1, x2, …, xk, … and Y with range y1, y2, …, yl, … are called independent if for all i and j

P(X = xi, Y = yj) = P(X = xi)·P(Y = yj)

More generally, n discrete random variables X1, X2, …, Xn are called independent if, for arbitrary values x1, x2, …, xn in their respective ranges,

P(X1 = x1, …, Xn = xn) = P(X1 = x1)·…·P(Xn = xn)

Page 26:

Discrete Random Variable – Properties of the mean and calculation rules (1)

Linear transformations: E(aX + b) = a·E(X) + b for real constants a, b.

Nonlinear transformations: for a real function g, E(g(X)) = Σ_i g(xi)·pi.

Example: E(X²) = Σ_i xi²·pi.

Note: in general, E(g(X)) ≠ g(E(X)).

Example: E(X²) ≠ (E(X))² in general.

Page 27:

Discrete Random Variable –Properties of the mean and calculation rules (2)

Linearity Rule:

Mean of a sum of (discrete) random variables:E(X+Y) = E(X)+E(Y)

E(a1X1+ … + anXn) = a1E(X1)+ … + anE(Xn)

Product rule independent (discrete) random variables:If X, Y are independent, then E(XY) = E(X)E(Y)

Example: roll a die twice. What is the mean of the product of the two values?
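For the die example the product rule can be verified exactly; a minimal R sketch using the joint distribution of two independent rolls:

```r
x <- 1:6; p <- rep(1/6, 6)
EX  <- sum(x * p)                        # 3.5
EXY <- sum(outer(x, x) * outer(p, p))    # E(XY) over all 36 outcome pairs
c(EXY, EX * EX)                          # both 12.25: E(XY) = E(X)E(Y)
```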

Page 28:

Discrete random variable – Properties of the variance

Linear transformations: Var(aX + b) = a²·Var(X).

For independent random variables X, Y and X1, …, Xn respectively, we can show

Var(X + Y) = Var(X) + Var(Y)

and, for any constants a1, …, an,

Var(a1X1 + … + anXn) = a1²·Var(X1) + … + an²·Var(Xn)

Page 29:

Discrete random variable – Variance (Examples)

Binary random variable: Var(X) = π(1−π).
Proof: Var(X) = E(X²) − (E(X))² = π − π² = π(1−π).

Roll a fair die once: X is the value shown.

Var(X)=?

Roll a fair die twice: X is the sum of the values

Var(X) = ?

Page 30:

Discrete Random variable – Independence (Example)

Random experiment: roll two dice; X and Y are the two values. For all 1 ≤ i, j ≤ 6:

P(X=i, Y=j) = 1/36 = 1/6 × 1/6 = P(X=i)·P(Y=j), so X and Y are independent.

Random experiment: roll a die.
Y = 1 if the value is a "prime number"
Z = 1 if the value is "smaller than four"

Are these two events independent? No: Y=1 and Z=1 means "2 or 3", so

P(Y=1, Z=1) = 2/6 ≠ 1/2 · 1/2 = P(Y=1)·P(Z=1).

Or equivalently: is P(Y=1 | Z=1) = P(Y=1)?

Page 31:

Continuous Random Variables

Page 32:

Continuous random variable – Probability distribution

Definition. If X: Ω → IR is a random variable, the function

F: IR → [0,1], F(x) = P(X ≤ x)

is called the distribution function of X.

If X is a continuous random variable with density f, the distribution function F can be expressed as

F(x) = ∫_{−∞}^{x} f(t) dt

This formula is the continuous analogue of the discrete case, in which the distribution function was defined as F(x) = Σ_{xj ≤ x} f(xj).

Page 33:

Continuous random variables – mean and variance

The statistics "mean" and "variance", which were already defined for discrete random variables, can be defined in an analogous way for continuous random variables:

E(X) = ∫_{−∞}^{∞} x·f(x) dx

Var(X) = ∫_{−∞}^{∞} (x − E(X))²·f(x) dx

Comparison:

| | X discrete, xj є {x1, x2, …} | X continuous with density f |
| Mean | E(X) = Σ_j xj·P(X = xj) | E(X) = ∫ x·f(x) dx |
| Variance | Var(X) = Σ_j (xj − E(X))²·P(X = xj) | Var(X) = ∫ (x − E(X))²·f(x) dx |
| | p.d.f. P(X = xj) | density function f(x) |
| | c.d.f. P(X ≤ xj) | distribution function F(x) |

Page 34:

Continuous random variables – Example 1

Uniform distribution. A continuous random variable X is called uniform or uniformly distributed (in the interval [a,b]) if it has a density function of the form

f(x) = 1/(b−a) for x є [a,b], and f(x) = 0 otherwise,

for some real values a < b. This is denoted by X ~ U(a,b).

[Plots: density f (height 1/(b−a) on [a,b]) and distribution function F (rising from 0 to 1 between a and b)]

Page 35:

Continuous random variables – Example 2

Exponential distribution. A continuous random variable having a density

f(x) = λ·exp(−λx) for x ≥ 0, and f(x) = 0 otherwise,

for some real parameter λ > 0 is called exponentially distributed. Denote this by X ~ Ex(λ). The corresponding distribution function is

F(x) = 1 − exp(−λx) for x ≥ 0, and F(x) = 0 otherwise.

[Plots: density and distribution function of an exponentially distributed random variable X ~ Ex(λ) for λ = 0.3, 1, 3]

Page 36:

Continuous random variables – Example 3

Normal distribution. A continuous random variable X is called normally distributed or Gaussian (with mean μ and standard deviation σ > 0), written X ~ N(μ,σ²), if it has a density function of the form

f(x) = (1 / (√(2π)·σ)) · exp( −(x−μ)² / (2σ²) )

There is no closed form for the distribution function F of such a variable; it has to be computed numerically.

[Plots: density and distribution function of some normally distributed random variables X ~ N(μ,σ²): μ=0 with σ=0.8, 1, 2, and μ=1.5, σ=0.8; μ=0, σ=1 is the standard normal distribution]
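All three example densities are built into R; a minimal sketch evaluating them (the grid of points is an arbitrary choice):

```r
x <- seq(-2, 2, by = 1)
dunif(x, min = 0, max = 1)         # uniform U(0,1): 1/(b-a) on [a,b], else 0
dexp(x, rate = 1)                  # exponential Ex(1): 0 for x < 0
dnorm(x, mean = 0, sd = 1)         # standard normal N(0,1)
integrate(dnorm, -Inf, Inf)$value  # any density integrates to 1
```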

Page 37:

Two more continuous distributions

The χ2-distribution. If X1,…,Xn are independent random variables that are N(0,1)-distributed, then the random variable

is said to be Chi-squared distributed with n degrees of freedom, for short Z ~ χ2(n).

Student t-distribution (t-distribution). If X~N(0,1) and Z~ χ2(n) are independent, then the random variable

is said to have a t-distribution with n degrees of freedom, for short T ~ t(n).

Z = X1² + X2² + … + Xn² (the chi-squared variable above)

T = X / √(Z/n) (the t-distributed variable above)

This list of continuous random variables is by no means complete. For a survey, consult the statistics literature given in the reference list to this lecture series.
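Both constructions can be checked by simulation; a minimal R sketch (sample size and degrees of freedom are arbitrary choices):

```r
set.seed(1)
n <- 5                                            # degrees of freedom
Z <- colSums(matrix(rnorm(n * 1e5), nrow = n)^2)  # sums of n squared N(0,1)'s
mean(Z)                                           # close to n, the chi-square(n) mean
tstat <- rnorm(1e5) / sqrt(Z / n)                 # X / sqrt(Z/n)
c(quantile(tstat, 0.975), qt(0.975, df = n))      # simulated vs exact t(n) quantile
```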

Page 38:

Definition. Let Ω be a probability space with probability measure P. Let X: Ω→IR and Y: Ω→IR be continuous random variables. X and Y are called independent if

P(X ≤ x, Y ≤ y) = P(X ≤ x)·P(Y ≤ y) = F_X(x)·F_Y(y)

for all x, y є IR.

Corollary. If the continuous random variables X and Y are independent, then

P(a1 ≤ X ≤ a2, b1 ≤ Y ≤ b2) = P(a1 ≤ X ≤ a2) · P(b1 ≤ Y ≤ b2)

for all real values a1 < a2, b1 < b2.

Continuous random variables –Independence

Page 39:

Continuous random variables – Joint and marginal probability distributions

Let X and Y be two random variables on the same probability space Ω. If there exists a function f: IR × IR → IR such that

P(a1 ≤ X ≤ a2, b1 ≤ Y ≤ b2) = ∫_{a1}^{a2} ∫_{b1}^{b2} f(x,y) dy dx

for all real values a1 < a2, b1 < b2, then X and Y are said to have a continuous joint (multivariate) distribution, and f is called their joint density. We will consider only this case here.

The marginal distribution of X is given by

P(a1 ≤ X ≤ a2) = ∫_{a1}^{a2} ∫_{−∞}^{∞} f(x,y) dy dx = ∫_{a1}^{a2} f_X(x) dx

where f_X(x) = ∫_{−∞}^{∞} f(x,y) dy is the density of the marginal distribution of X.

Page 40:

The conditional distribution of X, given Y=b, is given by

P(a1 ≤ X ≤ a2 | Y=b) = ∫_{a1}^{a2} f_X(x | Y=b) dx

where f_X(x | Y=b) = f(x,b) / ∫_{−∞}^{∞} f(t,b) dt is the density of the conditional distribution of X, given Y=b.

We mention an equivalent condition for independence:

The random variables X and Y are independent if

1. f(x,y)=fX(x)fY(y) for all x,y є IR

2. fX(x|Y=b)=fX(x) for all x,b є IR.

3. fY(y|X=a)=fY(y) for all a,y є IR.

Continuous Random Variable –Conditional probability distributions

Page 41:

Basic Concepts in Statistics

Page 42:

Data, sampling and statistical inference

Data: characteristics/properties of a random sample from a population. For example: y1, …, yn (n realizations of a random variable Y).

Sampling: Ways to select the subjects for which the characteristics/ properties of interest will be assessed

Examples: SRS, stratified, clustered

Statistical inference: learning from data, i.e. assuming these data are n draws from a distribution fθ, what can we learn about the population parameter?

Probability theory: reasoning from f->Y“if the experiment is like…, then f will be …, and (y1, …, yn) will be like…, or E(Y) must be…”

Statistics: reasoning from Y to f. "Since (y1, …, yn) turned out to be …, it seems that f is likely to be …, or the parameter is likely to be around …"

Page 43:

Affymetrix Gene-Id | Signal | Detection | p-value
BioB-5_at | 258 | P | 0.000754
BioB-M_at | 470 | P | 0.00007
BioB-3_at | 247 | P | 0.000052
BioC-5_at | 787 | P | 0.00011
BioC-3_at | 695 | P | 0.00007
BioDn-5_at | 939 | P | 0.000044
BioDn-3_at | 4356 | P | 0.00006
CreX-5_at | 9992 | P | 0.000044
CreX-3_at | 11389 | P | 0.000044
DapX-5_at | 5 | N | 0.354453
DapX-M_at | 14 | N | 0.239063
DapX-3_at | 1 | N | 0.949771
LysX-5_at | 4 | N | 0.470241
LysX-M_at | 3 | N | 0.672921
LysX-3_at | 2 | N | 0.631562
PheX-5_at | 14 | N | 0.897835
PheX-M_at | 65 | N | 0.32412
PheX-3_at | 9 | N | 0.58862
ThrX-5_at | 25 | N | 0.749204
ThrX-M_at | 4 | N | 0.760937
ThrX-3_at | 118 | N | 0.249204

There are different types of data:

• numerical data (discrete, continuous)• categorical data (ordered, non-ordered)• mixtures of both

If the properties consist of multiple features (like “Signal”, “Detection”, ”p-value” in the example), the data is called multivariate, otherwise it is called univariate.

Types of Data

Page 44:

Steps in statistical analysis of data

1. Describe the data (descriptive statistics)
2. Propose a reasonable probabilistic model
3. Make inferences about the parameters in the model
4. Check the model fit/assumptions
5. Report the results

Page 45:

Describing univariate categorical data

Frequency table: Simply list all object-property pairs in a table.

Count the number of objects in each category and display the result in a table. Calculate the relative size of each category and display it in the table.

Example:

Page 46:

Describing univariate numerical data

Assume we have a dataset with objects 1, …, n and their real-valued properties x1, …, xn.

Histogram: choose intervals C1=[a1,a2), C2=[a2,a3), …, Ck=[ak,ak+1), with a1 < a2 < … < ak+1 (this process is called "binning"). Let yi = Cj iff xi є Cj. Display the categorical dataset y1,…,yn as a bar plot, with the widths of the bars proportional to the lengths of the intervals.

Example dataset: the heights of a population of 10,000 people. Histograms were plotted with k equidistant bins, k = 8, 16, 32, 64.

A local maximum of the abundance distribution is called a mode, x_mode. Distributions with only one mode are called unimodal; distributions with more modes are called multimodal.
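Binning is exactly what R's hist() does; a minimal sketch with a synthetic stand-in for the height dataset (the normal parameters are invented for illustration):

```r
set.seed(1)
heights <- rnorm(10000, mean = 170, sd = 10)   # synthetic stand-in data
par(mfrow = c(2, 2))
for (k in c(8, 16, 32, 64))                    # equidistant bins, as on the slide
  hist(heights, breaks = k, main = paste(k, "bins"))
```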

Page 47:

Descriptive statistics 1

The second, and by far the most important, way is to summarize the data by appropriate statistics. A statistic is a rule that assigns a number to a dataset. This number is meant to tell us something about the underlying dataset.

Examples:

Arithmetic mean. Given x1, …, xn, calculate the arithmetic mean as

x̄ = (1/n)·Σ_{j=1}^{n} xj

The arithmetic mean is one of the many statistics that aim to describe where the "centre" of the data is. The arithmetic mean minimizes the sum of the quadratic distances to the data points, namely

x̄ = argmin_x Σ_{j=1}^{n} (xj − x)²

Page 48:

Median. Let x1, …, xn be given in ascending order. The median x_med is defined as

x_med = x_{(n+1)/2} if n is odd, and x_med = (x_{n/2} + x_{n/2+1}) / 2 if n is even.

The median is a value such that the number of data points smaller than x_med equals the number of data points greater than x_med. Like the arithmetic mean, the median is also a location measure for the "centre" of the data.

[Plot: a unimodal distribution with its mean, median and mode marked]

Descriptive statistics 2

Page 49:

Symmetry. A frequency distribution is called symmetric if it has an axis of symmetry. Skewness. A frequency distribution is called skewed to the right if the right tail of the distribution falls off more slowly than the left tail. Analogously: skewed to the left.

Posture rules (location of mean, median and mode):
left skew: x̄ < x_med < x_mode
symmetric: x̄ ≈ x_med ≈ x_mode
skewed to the right: x̄ > x_med > x_mode

[Plots: left-skewed, symmetric and right-skewed distributions with mean, median and mode marked]

Descriptive statistics 3

Page 50:

Quantiles. Let q є (0,1). A q-quantile of a frequency distribution is a value x_q such that the fraction of data lying left of x_q is at least q, and the fraction lying right of x_q is at least 1−q. If the data are ordered (x1 ≤ x2 ≤ … ≤ xn), then

x_q = x_⌈nq⌉ if nq is not an integer, and x_q є [x_{nq}, x_{nq+1}] if nq is an integer.

[Plot: a distribution with x0.05, x0.25, x0.5, x0.75, x0.95 marked]

Special quantiles are the quartiles x0.25, x0.5, x0.75 (which split the data into four classes) and the quintiles x0.2, x0.4, x0.6, x0.8. They are frequently used to give a summary of the data distribution.

Descriptive statistics 4

Page 51:

Variance, Standard deviation. The variance v = Var(x1,…,xn) = Var(x) of a dataset x = (x1,…,xn) is defined as

v = s² = (1/n)·Σ_{j=1}^{n} (xj − x̄)²

(the average squared distance from all data points to x̄). The standard deviation s = s(x) is the positive square root of the variance, s² = v. The variance and the standard deviation are measures for the dispersion of the data.

[Plots: relative frequency histograms illustrating small variance vs. high variance]

Descriptive statistics 5
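All of these summary statistics are one-liners in R; a minimal sketch on synthetic data (note that R's var() divides by n−1, not by n as in the definition above):

```r
set.seed(1)
x <- rnorm(100)
mean(x); median(x)                          # location measures
quantile(x, c(0.25, 0.5, 0.75))             # quartiles
v <- mean((x - mean(x))^2)                  # variance with the 1/n definition
c(v, var(x) * (length(x) - 1) / length(x))  # var() rescaled to match
```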

Page 52:

Density plots. If the number of data points is large, it is often convenient to approximate a histogram (of the relative frequencies) by a density curve (red line).

A density function is a non-negative real-valued integrable function f such that

∫_{−∞}^{∞} f(x) dx = 1

(this condition says that the area enclosed by the graph of f and the x-axis is 1).

Interpretation: the area of the segment enclosed by the x-axis, the graph of f, and the vertical lines x = x0 and x = x1 (the grey shaded area in the figure) equals the fraction of data points with values between x0 and x1.

Detailed description of univariate data

Page 53:

Normal distributions = Gaussian distributions. A very important family of density functions are the Gaussian densities, defined as

f(x) = (1 / (√(2π)·σ)) · exp( −(x−μ)² / (2σ²) )

with parameters μ and σ > 0.

This distribution is symmetric (around x=μ), unimodal (with mode at x=μ) and shaped like a bell. The mean of Gaussian distributed data is μ, its variance is σ².

The 68-95-99.7 rule. If a dataset has a gaussian distribution with mean μ and variance σ2, then

68% of the data lie within the interval [ μ-σ, μ+σ ]95% of the data lie within the interval [ μ-2σ, μ+2σ ]99.7% of the data lie within the interval [ μ-3σ, μ+3σ ]
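The rule follows from the normal distribution function and can be checked with pnorm(); a minimal R sketch:

```r
# P(mu - c*sigma <= X <= mu + c*sigma) for gaussian data, c = 1, 2, 3
sapply(1:3, function(c) pnorm(c) - pnorm(-c))
# 0.683 0.954 0.997 -- the 68-95-99.7 rule
```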

Continuous univariate data –An important distribution

Page 54:

Summary

Frequency tables, Bar plots, Pie charts, Histograms, Density plots are possible ways to display statistical data.

Mean, Median and Quantiles are measures of location for numerical data

The variance is a measure of variation for numerical data; it has pleasant transformation properties.

The Gaussian distribution is a very important density function.

Page 55:

Multivariate descriptive statistics

Page 56:

Multidimensional data (1)

In many applications a set of properties/features is measured.

If we want to learn facts about a single property, we use univariate statistical measures, e.g. mean, median, variance, quantiles.

If we want to learn how two or more properties depend on each other, we need multivariate statistical measures.

Examples (multidimensional data)

measure age and gender of the same person.

microarray gene expression data are multidimensional

Ways to describe these data

Page 57:

Multidimensional data (2)

For each object i, i=1,…,n, we measure simultaneously several features X, Y, Z, … → multidimensional or multivariate data.

We get the values (xi, yi, zi) of the features for object i.

In the following, we consider two features.

Questions:
X <--> Y: What does the correlation between X and Y look like? → Correlation (Association)
X --> Y: How does X affect the feature Y (response)? → Regression

Page 58:

Discrete/grouped data

If a feature has only a finite or countably infinite number of possible values, we call it discrete. E.g. the number of A's in a DNA sequence.

If a feature’s possible values range in an interval, we call it continuous.E.g. weight of a person.

To know:

How to describe the distribution of two discrete features.

How to evaluate whether the two features are correlated

This also includes continuous features grouped into categories.

Page 59:

General description: Contingency table –Absolute frequencies

A (k x m) contingency table of absolute frequencies has the form:

The contingency table describes the joint distribution of X and Y in terms of absolute frequencies

Page 60:

Contingency table –Marginal frequencies

The column and row sums of the contingency table are called the marginal frequencies of the features X and Y.

We write hi· = hi1 + … + him, i=1,…,k, and h·j = h1j + … + hkj, j=1,…,m.

The resulting sums h1.,…, hk. and h.1,…, h.m describe the univariate distributions of the features X and Y. This distribution is also called the marginal distribution.

Page 61:

Contingency table –Relative frequencies

A (k x m) contingency table of relative frequencies has the form:

The contingency table describes the joint distribution of X and Y.

The margins describe the marginal distributions of X and Y.

Page 62:

Contingency table –Conditional frequencies

By looking at the absolute or relative frequencies alone it is not immediately possible to decide whether there is a correlation between features.

Therefore: look at conditional frequencies, i.e. the distribution of a feature for a fixed value of the second feature.

Page 63:

Contingency table –Conditional frequency distribution (1)

Conditional frequency distribution of Y under the condition X=ai, also written Y|X=ai, is given by

f(bj | ai) = hij / hi·, j = 1, …, m.

Conditional frequency distribution of X under the condition Y=bj, also written X|Y=bj, is given by

f(ai | bj) = hij / h·j, i = 1, …, k.

Page 64:

Contingency table –Conditional frequency distribution (2)

Because of fij = hij/n, fi· = hi·/n and f·j = h·j/n, we also have

f(bj | ai) = hij / hi· = fij / fi· and f(ai | bj) = hij / h·j = fij / f·j

The conditional distributions are computed by dividing the joint frequencies by the appropriate marginal frequencies.

Page 65:

Contingency-table –χ2 coefficients

Starting point: how should the joint frequencies look so that we could "empirically" assume independence between X and Y (given the marginal distributions)?

Page 66:

Contingency table –Empirical independence

Idea: X and Y are "empirically" independent if and only if the conditional frequencies f(bj | ai) are equal in each sub-population X=ai, i.e. do not depend on ai.

Page 67:

Contingency table –Assessing empirical independence

Idea: compare, for each cell (i,j), the expected frequency under the assumption of independence, ẽij = hi·h·j / n, with the observed frequency hij.

→ χ² coefficient:

χ² = Σ_i Σ_j (hij − ẽij)² / ẽij

Page 68:

χ² = 0 <==> X and Y are empirically independent

χ² large <==> strong correlation

χ² small <==> weak correlation

Disadvantage: the value of χ² depends on the dimensions of the table.

Contingency table –Properties of the χ2 coefficients
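The χ² coefficient can be computed by hand or with chisq.test(); a minimal R sketch on a hypothetical 2×2 table of absolute frequencies (the counts are invented):

```r
h <- matrix(c(20, 10, 15, 30), nrow = 2)     # hypothetical contingency table
e <- outer(rowSums(h), colSums(h)) / sum(h)  # expected counts under independence
sum((h - e)^2 / e)                           # the chi-square coefficient
chisq.test(h, correct = FALSE)$statistic     # same value from R's built-in test
```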

Page 69:

Graphical representation of quantitative features

Graphical representation of the values (xi,yi), i=1,…,n from two continuous features X and Y.

The simplest representation of (x1,y1),…,(xn,yn) in a coordinate system is called a scatterplot.

Page 70:

Correlation of continuous features

Aim: find a measure r that describes the correlation of two continuous features X and Y, with

r ≈ 0: no or only weak correlation

r close to +1: strong positive correlation

r close to −1: strong negative correlation

Page 71:

Pearson’s correlation coefficient (1)

The Pearson correlation coefficient for the data (xi,yi), i=1,…,n is defined as

r = Σ_i (xi − x̄)(yi − ȳ) / √( Σ_i (xi − x̄)² · Σ_i (yi − ȳ)² )

The range of r is [−1, 1].

r > 0 positive correlation, positive linear relationship, i.e. values are around a straight line with positive slope

r < 0 negative correlation, negative linear relationship, i.e. values are around a straight line with negative slope

r = 0 no correlation, uncorrelated
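The defining formula and R's cor() agree; a minimal sketch on synthetic data with a positive linear relationship:

```r
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)   # positive linear trend plus noise
r <- sum((x - mean(x)) * (y - mean(y))) /
     sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
c(r, cor(x, y))          # hand computation matches cor()
plot(x, y)               # scatterplot
```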

Page 72:

Pearson’s correlation coefficient (2)

The correlation coefficient r measures the strength of a linear relationship

Page 73:

Pearson’s correlation coefficient (3)

Rule of thumb:

|r| close to 0: "weak correlation"

|r| intermediate: "medium correlation"

|r| close to 1: "strong correlation"

Linear transformations: for x̃i = a + b·xi and ỹi = c + d·yi, the correlation coefficient between x̃ and ỹ equals r if b·d > 0, and equals −r if b·d < 0.

Page 74:

Equivalent forms of r

Multiplying out yields:

r = ( Σ_i xi·yi − n·x̄·ȳ ) / √( (Σ_i xi² − n·x̄²) · (Σ_i yi² − n·ȳ²) )

Remember the formula for variances!

In terms of standard deviations and covariance:

r = s_xy / (s_x · s_y)

with covariance s_xy = (1/n)·Σ_i (xi − x̄)(yi − ȳ) and standard deviations s_x = s(x), s_y = s(y).

Page 75:

Statistical Inference

Estimation
• Finding approximations of the model parameters – point estimation
• Quantifying the uncertainty associated with the estimate of the population parameter – interval estimation (finding confidence intervals)

Hypothesis testing

Page 76:

Point Estimation

Finding an estimator θ̂ = θ̂(x1, …, xn) of the population parameter θ.

Desired properties of the estimator:
• unbiasedness (bias is measured as the expected difference between the estimator and the population parameter)
• efficiency (can be described by the inverse of the variance of the estimator)
• small mean square error, MSE = E[(θ̂ − θ)²]
• other: consistency, etc.

Common methods to find estimators:
• Method of moments
• Maximum likelihood estimation

Page 77:

Estimation: Method of Moments

Method of moments: match the first (E(X)), second (E(X²)), …, order moments to the parameters and solve the resulting equation system. If E(X^k) = g(θ), then θ̂ = g⁻¹(sample k-th moment).

Maximum Likelihood Estimation (MLE). Assuming the data come from a parametric family indexed by a population parameter θ, i.e. X1, …, Xn ~ i.i.d. f(x|θ), the joint density of the data is

f(X1, …, Xn | θ) = Π_i f(Xi | θ)

The probability of observing the data is the likelihood function of the parameter θ under the assumed probabilistic model, i.e.

Likelihood = f(x1, …, xn | θ) = Π_i f(xi | θ)

Page 78:

Example: Binomial data

Data: 6, 3, 5, 6, 8 — numbers of successes in 5 repeated experiments of tossing a coin 10 times.

Is this a fair coin? What is going to come up on the 11th toss?

Assume a probabilistic model: X ~ Binom(10, π). Estimating π:

MOM: because E(X) = 10π, the estimate of π is the sample mean of the proportions, π̂ = (0.6+0.3+0.5+0.6+0.8)/5 = 0.56.
MLE: L(π|data) = P(x1=6, …, x5=8 | π) = P(x1=6|π)···P(x5=8|π); find the value of π that maximizes the likelihood function.
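Both estimates can be computed in R; a minimal sketch (for binomial data the MOM and ML estimates coincide at 28/50 = 0.56):

```r
x <- c(6, 3, 5, 6, 8)                # successes in 5 series of 10 tosses
pi.mom <- mean(x / 10)               # method of moments: 0.56
loglik <- function(p) sum(dbinom(x, size = 10, prob = p, log = TRUE))
pi.mle <- optimize(loglik, c(0.01, 0.99), maximum = TRUE)$maximum
c(pi.mom, pi.mle)                    # both approximately 0.56
```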

Page 79:

Example: Normal data

x1, x2, …, xn ~ iid N(μ, σ²)

The density of a single observation is

f_N(x | μ, σ²) = (1/√(2πσ²)) · exp( −(x−μ)² / (2σ²) )

The joint pdf of the whole random sample is

f(x1, x2, …, xn | μ, σ²) = f(x1 | μ, σ²) · f(x2 | μ, σ²) · … · f(xn | μ, σ²)

The likelihood function is basically this joint pdf, viewed as a function of the parameters for the fixed sample:

l(μ, σ² | x1, x2, …, xn) = f(x1 | μ, σ²) · … · f(xn | μ, σ²)

The maximum likelihood estimates of the model parameters μ and σ² are the numbers that maximize the likelihood function:

μ̂ = (Σ_i xi) / n, σ̂² = (Σ_i (xi − μ̂)²) / n
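These closed-form MLEs are easy to check numerically; a minimal R sketch on simulated normal data (true μ = 5, σ = 2 chosen for illustration):

```r
set.seed(1)
x <- rnorm(100, mean = 5, sd = 2)
mu.hat     <- mean(x)               # MLE of mu: sum(x)/n
sigma2.hat <- mean((x - mu.hat)^2)  # MLE of sigma^2: divides by n, not n - 1
c(mu.hat, sigma2.hat)
```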

Page 80:

Hypothesis Testing

Making inference about the value of the population parameter based on the data:
• Start with hypotheses about the population parameter (null and alternative)
• Use the data to assess how likely the observed sample is under the null hypothesis

Conclusion:
• Reject H0: the data are highly unlikely to be generated from the probabilistic model defined by H0
• Fail to reject H0: the data are not highly unlikely under H0

Page 81:

X: the expression level of gene A under condition 1

Y: the expression level of gene A under condition 2

To decide: if the average expression levels are equal

Null hypothesis H0=“both expression levels are equal”.

Alternative hypothesis H1= “the expression levels are unequal”.

Specify a method for deciding between these two alternatives:

• Choose an appropriate statistic D that is able to discriminate between the two hypotheses, and
• choose a rejection region in which H0 is rejected.

The selection of the statistic defines the test.

Example: Hypothesis Testing

Page 82:

Hypothesis Testing (Example)

X = gene 1, Y = gene 2. The biologist may proceed in the following way: he has nx replicate measurements of the gene of interest in condition X, (x1,…,xnx), and ny replicate measurements in condition Y, (y1,…,yny). He might divide the average of the X measurements by the average of the Y measurements and obtain the statistic

D = ((x1 + x2 + … + xnx) / nx) / ((y1 + y2 + … + yny) / ny)

Then, the biologist might define the acceptance region as [1/2, 2], i.e. if |log2 D| > 1, he rejects the null hypothesis in favour of the alternative hypothesis (differential gene expression). If |log2 D| ≤ 1, he does not reject H0. This test is not optimal (see the Exercises), but it is still used by many researchers.

The great advantage of this approach is that the choice of the confidence interval can be done implicitly by prescribing a significance level.

Page 83:

Hypothesis Testing – Significance level (Example)

Let α є (0,1) be given. Usually α is a number close to zero. The statistic D can be interpreted as a random variable. If we assume the null hypothesis is valid, we can find a (not necessarily unique) interval J on the real line such that

P(D ∉ J | H0) = α

This means that, given the null hypothesis is valid, the probability of observing a value of D outside the interval J is α (and hence small, if α is small). The complement of J in IR is then taken as the rejection region for the test.

In the biologist's example, there are better ways to design a test for differential gene expression. Under the assumption that the expression values for X respectively Y follow a normal distribution, we can conduct a t-test:

Page 84:

Assume that X=(x1,…,xnx) resp. Y=(y1,…,yny) are two samples of independent normally distributed random variables with means μX resp. μY and standard deviations σX resp. σY. The null hypothesis can be stated as H0 = "μX = μY".

The T statistic

T = (X̄ − Ȳ) / √( SX²/nx + SY²/ny )

(the difference of the sample means X̄, Ȳ, scaled by its estimated standard deviation, with SX², SY² the sample variances) is perfectly designed to answer this question.

If the null hypothesis is true, i.e. μX − μY is near 0, then T should be close to 0 except for random outcomes that are pretty unusual.

The T statistic is a random variable with a somewhat complicated distribution: if X ~ N(μX, σX²) and Y ~ N(μY, σY²), then T has approximately a t-distribution with d degrees of freedom, where d is the closest integer to

( SX²/nx + SY²/ny )² / ( (1/(nx−1))·(SX²/nx)² + (1/(ny−1))·(SY²/ny)² )

The two sample Student t-test.
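R's t.test() implements exactly this unequal-variance (Welch) test with the Satterthwaite degrees of freedom; a minimal sketch on simulated expression values (means and sample sizes are invented):

```r
set.seed(1)
x <- rnorm(10, mean = 8, sd = 1)   # replicate measurements, condition X
y <- rnorm(12, mean = 9, sd = 1)   # replicate measurements, condition Y
t.test(x, y)                       # Welch two-sample t-test: T, df d, p-value
```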

Page 85:


The density of the T statistic tells us how far from 0 we should expect T to be most of the time, given the null hypothesis is true.

E.g. for k=8, and significance level α = 5%, we would expect only 5% of the time for T to be above t(0.975; 8)=2.306 or below t(0.025; 8) = -2.306. Thus a typical decision rule in this case would be to reject H0 in favour of H1 if |T| > t(0.975;8) = 2.306.

Density of the t-statistic for k=8. Symmetric confidence interval for α = 5%.

Student t-test

P-values. The p-value is the probability of observing values of D that are at least as extreme as the observed value d, computed under the null hypothesis: p = P(|D| ≥ |d| | H0).

Given a significance level α, we reject the null hypothesis if p < α.

Page 86:

Hypothesis Testing – Error types

If we reject the null hypothesis when it is actually true, we have made what is called a Type I error, or a false positive. (Example: falsely declaring a gene as differentially expressed.) If we accept the null hypothesis when it is actually false, we have made a Type II error, or a false negative. (Example: failing to identify a truly differentially expressed gene.)

| | H0 true | H0 not true |
| Hypothesis not rejected | True negatives | Type II error (false negatives) |
| Hypothesis rejected | Type I error (false positives) | True positives |

Page 87:

In hypothesis testing, the probability of a Type I error is controlled to be at most as high as the significance level of the test.

It is harder to control the probability of a Type II error because we usually do not have a statistic for testing the alternative hypothesis.

The smaller the truly existing difference in expression levels, the larger the probability of a Type II error.

Given a statistical testing procedure, it is impossible to keep both error types arbitrarily small by selecting a special significance level. There is a trade-off between Type I and Type II errors, as depicted in the next figure.

Hypothesis Testing – Error Types (Cont.)

Page 88:

Two types of tests

Parametric tests: A parametric distribution is assumed for the measured random variables.

E.g. the t-test assumes that the variables are normally distributed. (If this were not the case, this would lead to wrong p-values or wrong confidence intervals.)

Non-parametric tests: No parametric distribution function is assumed for the measured random variable

Non-parametric tests are used when the distribution of the measured variables is not known, or when there is no appropriate test that can deal with the distribution of the measured variables. They merely rely on the relative order of the values and on some very mild constraints concerning the shape of the probability distributions of the measured variables (e.g. unimodality, symmetry).

Often, prior to computing a test statistic, data is transformed in order to produce random variables that are easier to handle (e.g. to produce approximately normally distributed data).

We mention one parametric and one non-parametric test which are commonly used.

Page 89:

Given two samples x=(x1,…,xn) and y=(y1,…,ym) drawn independently from the random variables X and Y respectively, the test assesses whether the distributions of X and Y are identical.

For large sample sizes it is almost as sensitive as the two-sample Student t-test.

For small samples with unknown distributions this test is even more sensitive than the Student t-test.

The only requirement for the Wilcoxon test to be applicable is that the distributions are symmetric.

Wilcoxon rank sum test

Page 90:

Wilcoxon rank sum test (Cont.)

State the hypotheses:
Null hypothesis: the two variables X and Y have the same distribution.
Alternative hypothesis: the two variables X and Y do not have the same distribution.

Choose a significance level α.

Compute the test statistic: rank order all N = n+m values from both samples combined. Sum the ranks of the smaller sample and call this value w.

Calculate the p-value: look up the level of significance (p-value) in a table using w, m and n. Calculating the exact p-value is based on calculating all permutations of ranks over both samples. This is infeasible for n, m > 10; fortunately, there are approximations available (and implemented in R).

Compare the p-value with α and state the conclusion:
p-value < α: reject H0
p-value ≥ α: fail to reject H0
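The whole procedure is a single call in R; a minimal sketch on simulated samples (the shift of 1 is an arbitrary illustrative choice):

```r
set.seed(1)
x <- rnorm(8)
y <- rnorm(9, mean = 1)   # shifted distribution
wilcox.test(x, y)         # rank sum test; exact p-value for small samples
```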

Page 91:

Summary

Null hypothesis, test statistics

Significance level, rejection region, p-value

Type I and type II errors

5-Step testing procedure

Parametric tests: t -test, χ2-test, ANOVA

Non-parametric test: Wilcoxon rank sum test, Kruskal Wallis

Page 92:

Multiple hypothesis testing

Golub et al. (1999) were interested in identifying genes that are differentially expressed in patients with two types of leukemia:

- acute lymphoblastic leukemia (ALL, class 0) and - acute myeloid leukemia (AML, class 1).

Gene expression levels were measured using Affymetrix chips containing g = 6817 human genes.

n = 38 samples = 27 ALL cases + 11 AML cases.

Page 93:

Following Golub et al.

Three preprocessing steps were applied to the normalized matrix of intensity values available on the website:
(i) thresholding: floor of 100 and ceiling of 16,000;
(ii) filtering: exclusion of genes with (max/min) ≤ 5 or (max − min) ≤ 500, where max and min refer respectively to the maximum and minimum intensities for a particular gene across all mRNA samples;
(iii) base 10 logarithmic transformation.

The data were then summarized by a 3051 × 38 matrix. A two-sample t-test was computed for each of the 3051 genes.

Multiple hypothesis testing

Page 94:

[Histogram of the test statistics (teststat)]

[Histogram of the p-values, computed as 2*(1-pnorm(abs(teststat)))]

Did you expect that?

Multiple hypothesis testing

Page 95:

Multiple Comparison

p-value: the probability of finding a difference equal to or greater than the observed one just by chance under the null hypothesis.

• A measure of the false positive rate (F/m0)
• The commonly used significance level, 5% (±1.96 s.d.), is arbitrary
• In multiple comparisons, a 5% significance level for each comparison often results in a too large overall significance level
• Does not involve the alternative hypothesis

| | Called significant | Called not significant | Total |
| Null true | F | m0 − F | m0 |
| Alternative true | T | m1 − T | m1 |
| Total | S | m − S | m |

Page 96:

Multiple Comparison (Cont. 1)

Family-wise error rate (FWER): the probability of having at least one false positive in multiple comparisons; for N independent comparisons at level α, FWER = 1 − (1 − α)^N.

Many versions of controlling procedures: Bonferroni, Holm (1979), Hochberg (1988), Hommel (1988). These can be too conservative for genomic studies.

α \ N | 1 | 5 | 10 | 50 | 100 | 1000
0.01 | 0.01 (0.01) | 0.05 (0.05) | 0.10 (0.1) | 0.39 (0.5) | 0.63 (1) | 1.00 (10)
0.05 | 0.05 (0.05) | 0.23 (0.25) | 0.40 (0.5) | 0.92 (2.5) | 0.99 (5) | 1.00 (50)

Table: FWER (expected number of false positives) for different numbers of comparisons (N) at different α levels.
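The table entries follow from FWER = 1 − (1 − α)^N, with N·α expected false positives; a minimal R sketch reproducing them:

```r
fwer <- function(alpha, N) 1 - (1 - alpha)^N   # P(at least one false positive)
N <- c(1, 5, 10, 50, 100, 1000)
round(rbind(fwer(0.01, N), fwer(0.05, N)), 2)  # the FWER rows of the table
outer(c(0.01, 0.05), N)                        # expected numbers of false positives
```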

Page 97:

Multiple Comparison (Cont. 2)

False discovery rate (FDR / pFDR): the proportion of hits that are false (F/S).

Several versions of controlling procedures: Benjamini & Hochberg (1995), Benjamini & Yekutieli (2001). A significance measure based on pFDR: the q-value (Storey & Tibshirani (2003)).

• q-value: the minimum false discovery rate that can be attained when calling a feature significant
• Requires estimating the proportion of true nulls (m0/m)
• For FDRs estimated using Benjamini's and Storey's approaches, the same cut-off can result in different numbers of significant genes
• There is no single formula describing all the quantities related to the FDR and how they are related
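FDR control is available in base R via p.adjust(); a minimal sketch on simulated p-values (the mixture of null and non-null p-values is invented for illustration):

```r
set.seed(1)
p <- c(runif(900), rbeta(100, 1, 50))   # 900 null + 100 non-null p-values
p.bh <- p.adjust(p, method = "BH")      # Benjamini-Hochberg FDR adjustment
sum(p.bh < 0.05)                        # features called significant at FDR 5%
```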

Page 98:

Summary

This lecture only provides some flavor of probability, statistics and their usage.

To learn more, take a full course:
• Introduction to biostatistics for clinical investigators
• Statistical methods for observational studies

Page 99:

References and some useful info

Statistical Methods in Bioinformatics course slides developed by Dr. Christian Gieger and Dr. Achim Tresch, http://www.scaibit.de/index.php?id=92

Statistical Methods in Bioinformatics by Warren Ewens and Gregory Grant

Introduction to Statistical Thought by Michael Lavine, http://www.stat.duke.edu/~michael/book.html

The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman

Statistical software package and program language R, http://www.r-project.org