Applied Statistics 3

Upload: yahya-khurshid

Post on 04-Jun-2018


TRANSCRIPT

  • Slide 1: Review of Probability and Statistics

  • Slide 2: Random Variables

    X is a random variable if it represents a random draw from some population.

    A discrete random variable can take on only selected values (usually integers).

    A continuous random variable can take on any value in a real interval.

    Associated with each random variable is a probability distribution.

  • Slide 3: Random Variables: Examples

    The outcome of a fair coin toss: a discrete random variable with P(Head) = 0.5 and P(Tail) = 0.5.

    The height of a selected student: a continuous random variable drawn from an approximately normal distribution.

  • Slide 4: Expected Value of X, E(X)

    The expected value is really just a probability-weighted average of X.

    E(X) is the mean of the distribution of X, denoted by \mu_X. Let f(x_i) be the probability that X = x_i; then

    E(X) = \sum_{i=1}^{n} x_i f(x_i) = \mu_X
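
    As a quick illustration (not part of the original slides), here is a minimal Python sketch that computes E(X) as a probability-weighted average for a hypothetical fair six-sided die:

    # Hypothetical example: expected value of a fair six-sided die.
    values = [1, 2, 3, 4, 5, 6]        # possible outcomes x_i
    probs = [1 / 6] * 6                # f(x_i): each outcome equally likely

    # E(X) = sum over i of x_i * f(x_i)
    expected_value = sum(x * p for x, p in zip(values, probs))
    print(expected_value)              # 3.5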

  • Slide 5: Variance of X, Var(X)

    The variance of X is a measure of the dispersion or spread of the distribution.

    Var(X) is the expected value of the squared deviations from the mean, so

    Var(X) = E[(X - \mu_X)^2] = \sigma_X^2

  • Slide 6: More on Variance

    The square root of Var(X) is the standard deviation of X.

    Var(X) can alternatively be written in terms of a weighted sum of squared deviations, because

    E[(X - \mu_X)^2] = \sum_i (x_i - \mu_X)^2 f(x_i)
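
    Continuing the hypothetical die example from the sketch above, the variance as a probability-weighted sum of squared deviations:

    # Hypothetical example: variance of a fair six-sided die.
    values = [1, 2, 3, 4, 5, 6]
    probs = [1 / 6] * 6

    mu = sum(x * p for x, p in zip(values, probs))               # E(X) = 3.5
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))  # E[(X - mu)^2]
    std = var ** 0.5                                             # standard deviation

    print(mu, var, std)   # 3.5, about 2.917, about 1.708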

  • Slide 7: Covariance, Cov(X,Y)

    Covariance between X and Y is a measure of the association between the two random variables X and Y.

    If positive, then both move up or down together.

    If negative, then when X is high, Y is low, and vice versa.

    Cov(X,Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sigma_{XY}

  • Slide 8: Correlation Between X and Y

    Covariance is dependent upon the units of X and Y [Cov(aX, bY) = ab Cov(X,Y)].

    Correlation, Corr(X,Y), scales covariance by the standard deviations of X and Y so that it lies between -1 and 1:

    Corr(X,Y) = \rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \frac{Cov(X,Y)}{[Var(X)\,Var(Y)]^{1/2}}
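
    A minimal sketch (with made-up sample data, not from the slides) of computing covariance and correlation, and checking that rescaling changes the covariance by the factor ab but leaves the correlation unchanged when ab > 0:

    import numpy as np

    # Made-up paired observations, for illustration only.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    cov_xy = np.cov(x, y)[0, 1]          # sample covariance
    corr_xy = np.corrcoef(x, y)[0, 1]    # correlation, lies between -1 and 1

    a, b = 10.0, 2.0
    print(np.cov(a * x, b * y)[0, 1], a * b * cov_xy)    # equal: Cov(aX,bY) = ab Cov(X,Y)
    print(np.corrcoef(a * x, b * y)[0, 1], corr_xy)      # equal: Corr unchanged since ab > 0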

  • Slide 9: More on Correlation and Covariance

    If \rho_{X,Y} = 0 (or equivalently \sigma_{X,Y} = 0), then X and Y are linearly unrelated.

    If \rho_{X,Y} = 1, then X and Y are said to be perfectly positively correlated.

    If \rho_{X,Y} = -1, then X and Y are said to be perfectly negatively correlated.

    Corr(aX, bY) = Corr(X,Y) if ab > 0

    Corr(aX, bY) = -Corr(X,Y) if ab < 0

  • Slide 10: Properties of Expectations

    E(a) = a, Var(a) = 0

    E(\mu_X) = \mu_X, i.e. E(E(X)) = E(X)

    E(aX + b) = aE(X) + b

    E(X + Y) = E(X) + E(Y)

    E(X - Y) = E(X) - E(Y)

    E(X - \mu_X) = 0, or E(X - E(X)) = 0

    E((aX)^2) = a^2 E(X^2)

  • Slide 11: More Properties

    Var(X) = E(X^2) - \mu_X^2

    Var(aX + b) = a^2 Var(X)

    Var(X + Y) = Var(X) + Var(Y) + 2Cov(X,Y)

    Var(X - Y) = Var(X) + Var(Y) - 2Cov(X,Y)

    Cov(X,Y) = E(XY) - \mu_X \mu_Y

    If X and Y are independent, then Var(X + Y) = Var(X) + Var(Y) and E(XY) = E(X)E(Y).
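
    A minimal numerical sketch (simulated data, not from the slides) checking two of these identities, Var(X + Y) = Var(X) + Var(Y) + 2Cov(X,Y) and Cov(X,Y) = E(XY) - \mu_X \mu_Y:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=100_000)
    y = 0.5 * x + rng.normal(size=100_000)   # correlated with x by construction

    cov_xy = np.cov(x, y)[0, 1]

    # Var(X + Y) equals Var(X) + Var(Y) + 2 Cov(X, Y).
    print(np.var(x + y, ddof=1), np.var(x, ddof=1) + np.var(y, ddof=1) + 2 * cov_xy)

    # Cov(X, Y) is approximately E(XY) - E(X) E(Y).
    print(cov_xy, np.mean(x * y) - np.mean(x) * np.mean(y))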

  • Slide 12: The Normal Distribution

    A general normal distribution, with mean \mu and variance \sigma^2, is written as N(\mu, \sigma^2).

    It has the following probability density function (pdf):

    f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)

  • Slide 13: The Standard Normal

    Any random variable can be standardized by subtracting the mean, \mu, and dividing by the standard deviation, \sigma:

    Z = \frac{X - \mu_X}{\sigma_X}, so that E(Z) = 0 and Var(Z) = 1.

    Thus, the standard normal, N(0,1), has pdf

    \phi(z) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{z^2}{2}\right)
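
    A minimal Python sketch (the mean, standard deviation, and observation are hypothetical) of standardizing a value and evaluating the standard normal pdf:

    import numpy as np

    def std_normal_pdf(z):
        """pdf of N(0,1): (1 / sqrt(2*pi)) * exp(-z**2 / 2)."""
        return np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi)

    mu, sigma = 170.0, 10.0        # hypothetical mean and std dev (e.g. heights in cm)
    x = 185.0                      # one hypothetical observation

    z = (x - mu) / sigma           # standardized value
    print(z, std_normal_pdf(z))    # 1.5 and the density there, about 0.1295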

  • Slide 14: Properties of the Normal Distribution

    If X ~ N(\mu, \sigma^2), then aX + b ~ N(a\mu + b, a^2\sigma^2).

    A linear combination of independent, identically distributed (iid) normal random variables will also be normally distributed.

    If Y_1, Y_2, ..., Y_n are iid ~ N(\mu, \sigma^2), then

    \bar{Y} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)

  • Slide 15: Cumulative Distribution Function

    For a pdf f(x), the cumulative distribution function (cdf), F(x), is P(X \le x), so P(X > x) = 1 - F(x).

    For a distribution symmetric about zero, such as the standard normal, P(|Z| > a) = 2[1 - F(a)], and P(a \le Z \le b) = F(b) - F(a).
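
    A minimal sketch using SciPy's standard normal cdf to evaluate these probabilities (the cutoff 1.96 and the interval are chosen purely for illustration):

    from scipy.stats import norm

    a, b = -1.96, 1.96

    # P(Z > 1.96) = 1 - F(1.96)
    print(1 - norm.cdf(1.96))          # about 0.025

    # P(|Z| > 1.96) = 2 * [1 - F(1.96)]
    print(2 * (1 - norm.cdf(1.96)))    # about 0.05

    # P(a <= Z <= b) = F(b) - F(a)
    print(norm.cdf(b) - norm.cdf(a))   # about 0.95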

  • Slide 16: Random Samples and Sampling

    For a random variable Y, repeated draws from the same population can be labeled Y_1, Y_2, ..., Y_n.

    If every combination of n sample points has an equal chance of being selected, this is a random sample.

    A random sample is a set of independent, identically distributed (i.i.d.) random variables.

  • Slide 17: Estimators as Random Variables

    Each of our sample statistics (e.g. the sample mean, sample variance, etc.) is a random variable. Why?

    Each time we pull a random sample, we'll get different sample statistics.

    If we pull lots and lots of samples, we'll get a distribution of sample statistics.

  • Slide 18: Sampling Distributions

    The estimators computed from samples (like the sample mean, sample variance, etc.) are themselves random variables, and their distributions are termed sampling distributions.

    These include:

    Chi-square distribution

    t-distribution

    F-distribution

  • Slide 19: The Chi-Square Distribution

    Suppose that Z_i, i = 1, ..., n, are iid ~ N(0,1), and X = \sum_i Z_i^2. Then X has a chi-square distribution with n degrees of freedom (dof), that is,

    X \sim \chi^2_n

    If X \sim \chi^2_n, then E(X) = n and Var(X) = 2n.
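
    A minimal simulation sketch of this construction (the degrees of freedom and number of replications are arbitrary choices): sum squared iid standard normals and check that the mean and variance come out near n and 2n.

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 5, 200_000

    # Each row holds (Z_1, ..., Z_n); X is the sum of the squared draws.
    z = rng.normal(size=(reps, n))
    x = (z ** 2).sum(axis=1)     # X ~ chi-square with n dof

    print(x.mean(), n)           # close to n = 5
    print(x.var(), 2 * n)        # close to 2n = 10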

  • Slide 20: The t Distribution

    If a random variable, T, has a t distribution with n degrees of freedom, then it is denoted T ~ t_n.

    E(T) = 0 (for n > 1) and Var(T) = n/(n - 2) (for n > 2).

    T is a function of Z ~ N(0,1) and X ~ \chi^2_n as follows:

    T = \frac{Z}{\sqrt{X/n}}

  • Slide 21: The F Distribution

    If a random variable, F, has an F distribution with (k_1, k_2) dof, then it is denoted F ~ F_{k_1,k_2}.

    F is a function of X_1 ~ \chi^2_{k_1} and X_2 ~ \chi^2_{k_2} as follows:

    F = \frac{X_1 / k_1}{X_2 / k_2}
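
    A minimal simulation sketch of the t and F constructions on the two preceding slides, compared against SciPy's reference distributions (the degrees of freedom are arbitrary illustration choices):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    reps = 200_000
    n, k1, k2 = 10, 3, 8

    z = rng.normal(size=reps)
    x = rng.chisquare(df=n, size=reps)
    t = z / np.sqrt(x / n)                 # T = Z / sqrt(X/n), distributed as t_n

    x1 = rng.chisquare(df=k1, size=reps)
    x2 = rng.chisquare(df=k2, size=reps)
    f = (x1 / k1) / (x2 / k2)              # F = (X1/k1)/(X2/k2), distributed as F_{k1,k2}

    # Simulated Var(T) vs n/(n-2); simulated tail P(F > 2) vs the exact value.
    print(t.var(), n / (n - 2))
    print((f > 2.0).mean(), stats.f.sf(2.0, k1, k2))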

  • Slide 22: Estimators and Estimates

    Typically, we can't observe the full population, so we must make inferences based on estimates from a random sample.

    An estimator is a mathematical formula for estimating a population parameter from sample data.

    An estimate is the actual number (numerical value) the formula produces from the sample data.

  • Slide 23: Examples of Estimators

    Suppose we want to estimate the population mean.

    Suppose we use the formula for E(Y), but substitute 1/n for f(y_i) as the probability weight, since each point has an equal chance of being included in the sample (the sample being random). Then we can calculate the sample average for our sample:

    \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i

  • Slide 24: What Makes a Good Estimator?

    Unbiasedness

    Efficiency

    Mean Square Error (MSE)

    Asymptotic properties (for large samples):

    Consistency

  • Slide 25: Unbiasedness of Estimators

    We want our estimator to be right, on average.

    We say an estimator, W, of a population parameter, \theta, is unbiased if E(W) = \theta.

    For our example, that means we want

    E(\bar{Y}) = \mu_Y

  • Slide 26: Proof: The Sample Mean is Unbiased

    E(\bar{Y}) = E\!\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \frac{1}{n}\sum_{i=1}^{n} \mu_Y = \frac{1}{n}\,n\,\mu_Y = \mu_Y

  • Slide 27: Example

    Population:

    Person   Age (Years)
    A        40
    B        42
    C        44
    D        50
    E        65

    Take a sample of size four.

  • Slide 28: Population and Sample Statistics

    Population and all possible samples of size 4:

    Person     Age    ABCD   ABCE   ABDE   ACDE   BCDE
    A          40     40     40     40     40     -
    B          42     42     42     42     -      42
    C          44     44     44     -      44     44
    D          50     50     -      50     50     50
    E          65     -      65     65     65     65
    Mean       48.2   44     47.8   49.3   49.8   50.3
    Variance   102    18.7   135    129    120    108
    Std Dev    10.1   4.32   11.6   11.4   11     10.4

  • Slide 29: Unbiasedness

    Mean of the sample averages = (44 + 47.8 + 49.3 + 49.8 + 50.3)/5 = 48.2

    Mean of the sample variances = (18.7 + 135 + 129 + 120 + 108)/5 = 102

    Mean of the sample std devs = (4.32 + 11.6 + 11.4 + 11 + 10.4)/5 = 9.73

    The sample mean and sample variance reproduce the population values (48.2 and 102) on average, but the sample standard deviation does not (9.73 vs. 10.1): it is a biased estimator.
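
    A minimal sketch that reproduces the numbers on the two preceding slides by enumerating every sample of size four from the five-person population:

    from itertools import combinations
    from statistics import mean, variance, stdev

    ages = {"A": 40, "B": 42, "C": 44, "D": 50, "E": 65}

    sample_means, sample_vars, sample_sds = [], [], []
    for combo in combinations(ages, 4):          # ABCD, ABCE, ABDE, ACDE, BCDE
        values = [ages[p] for p in combo]
        sample_means.append(mean(values))
        sample_vars.append(variance(values))     # n - 1 in the denominator
        sample_sds.append(stdev(values))

    print(mean(sample_means))   # 48.2   (matches the population mean)
    print(mean(sample_vars))    # ~102   (matches the population variance)
    print(mean(sample_sds))     # ~9.73  (below 10.1: the std dev is biased)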

  • Slide 30: Efficiency of an Estimator

    We want our estimator to be closer to the truth, on average, than any other estimator.

    We say an estimator, W, is efficient if Var(W) < Var(any other unbiased estimator).

    Note, for our example,

    Var(\bar{Y}) = Var\!\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} Var(Y_i) = \frac{1}{n^2}\,n\,\sigma^2 = \frac{\sigma^2}{n}
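
    A minimal simulation sketch (population parameters and sample size are arbitrary choices) checking that the variance of the sample mean is about \sigma^2 / n:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, n, reps = 50.0, 10.0, 25, 100_000

    # Draw many samples of size n and record each sample mean.
    samples = rng.normal(mu, sigma, size=(reps, n))
    sample_means = samples.mean(axis=1)

    print(sample_means.var())    # close to sigma**2 / n
    print(sigma ** 2 / n)        # 4.0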

  • Slide 31: MSE of an Estimator

    What if we can't find an unbiased estimator?

    Define the mean square error as E[(W - \theta)^2].

    We get a trade-off between unbiasedness and efficiency, since MSE = variance + bias^2.

    For our example, it means minimizing

    E[(\bar{Y} - \mu_Y)^2] = Var(\bar{Y}) + [E(\bar{Y}) - \mu_Y]^2

  • Slide 32: Consistency of an Estimator

    Consistency is an asymptotic property, that is, it concerns what happens as the sample size goes to infinity.

    We want the distribution of W to converge to \theta, i.e. plim(W) = \theta.

    For our example, it means we want

    P(|\bar{Y} - \mu_Y| > \epsilon) \to 0 \text{ as } n \to \infty

  • Slide 33: Central Limit Theorem

    Asymptotic normality implies that P(Z < z) \to \Phi(z) as n \to \infty, i.e. the cdf of the standardized sample mean converges to the standard normal cdf, where

    Z = \frac{\bar{Y} - \mu_Y}{\sigma_Y / \sqrt{n}}
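
    A minimal simulation sketch of the CLT (the exponential population and sample size are arbitrary illustration choices): sample means from a decidedly non-normal population, once standardized, behave approximately like a standard normal.

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 50, 200_000
    mu, sigma = 1.0, 1.0             # mean and std dev of an Exponential(1) population

    samples = rng.exponential(scale=1.0, size=(reps, n))
    z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))   # standardized sample means

    # P(Z < 1.645) should be close to the standard normal value of about 0.95.
    print((z < 1.645).mean())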

  • Slide 34: Estimate of the Population Variance

    We have a good estimate of \mu_Y; we would also like a good estimate of \sigma^2_Y.

    We can use the sample variance given below. Note the division by n - 1, not n, since the mean is estimated too; if \mu is known, we can divide by n.

    S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2
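
    A minimal simulation sketch (population parameters, sample size, and replication count are arbitrary choices) comparing the n - 1 and n divisors; averaged over many samples, only the n - 1 version recovers the true variance:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, n, reps = 0.0, 2.0, 10, 200_000

    samples = rng.normal(mu, sigma, size=(reps, n))

    s2_unbiased = samples.var(axis=1, ddof=1)   # divide by n - 1 (mean is estimated)
    s2_biased = samples.var(axis=1, ddof=0)     # divide by n

    print(s2_unbiased.mean())   # close to sigma**2 = 4.0
    print(s2_biased.mean())     # close to (n - 1)/n * sigma**2 = 3.6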