IDA Central Tendency

Upload: pravingondhale

Posted on 10-Apr-2018


  • 8/8/2019 IDA Central Tendency

    1/20

    Initial Data Analysis: Central Tendency


    Outline

    What is central tendency?
    Classic measures: mean, median, mode
    What's an average?
    Properties of statistics: sufficiency, efficiency, bias, resistance
    Resistant measures


    Measures of Central Tendency

    While distributions provide an overall picture of a data set, it is sometimes desirable to represent some property of the entire data set using a single statistic. The first descriptive statistics we will discuss are those used to indicate where the center of the distribution lies.

    The center corresponds to the expected value, which need not be a value that actually appears in the data set. There are different measures of central tendency, each with its own advantages and disadvantages.


    The Mode

    The mode is simply the value of the relevant variable that occurs most often (i.e., has the highest frequency) in the sample.

    Note that if you have done a frequency histogram, you can often identify the mode simply by finding the value with the highest bar.

    However, that will not work when grouping was performed prior to plotting the histogram (although you can still use the histogram to identify the modal group, just not the modal value).

    Modes in particular are probably best applied to nominal data.
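
A minimal Python sketch of finding the mode. The `modes` helper and the `majors` data are hypothetical. The function returns every value tied for the highest frequency, because a sample can have more than one mode. It also works on nominal data, where no mean or median exists.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency (a sample may be multimodal)."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of empty data")
    top = max(counts.values())
    # Counter keeps first-seen order, so ties are listed in order of appearance.
    return [value for value, count in counts.items() if count == top]


# Hypothetical nominal data: a mode exists even though a mean or median does not.
majors = ["psych", "biology", "psych", "music", "biology", "psych"]
print(modes(majors))           # ['psych']
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (two modes)
```

Python 3.8+ also provides `statistics.multimode`, which behaves the same way.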


    Mode

    Advantages
    Very quick and easy to determine
    Is an actual value of the data
    Not affected by extreme scores

    Disadvantages
    Sometimes not very informative (e.g., cigarettes smoked in a day, where the most common answer may simply be zero)
    Can change dramatically from sample to sample
    Might be more than one (which is more representative?)


    The Median

    The median is the point corresponding to the score that lies in the middle of the distribution (i.e., there are as many data points above the median as there are below the median).

    To find the median, the data points must first be sorted into either ascending or descending numerical order. The position of the median value can then be calculated using the following formula:

    $$\text{Median location} = \frac{N + 1}{2}$$

    When N is even, this location falls halfway between two scores, and the median is taken as the average of those two scores.
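
The location formula can be turned directly into code. This is a minimal sketch, and `median_by_location` is a hypothetical helper name. It uses 1-based positions as in the formula and averages the two neighbouring scores when the location is fractional (even N).

```python
import math


def median_by_location(values):
    """Median via the location formula (N + 1) / 2, using 1-based positions in sorted data."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    location = (n + 1) / 2                   # e.g. 3.0 when N = 5, 3.5 when N = 6
    lower = data[math.floor(location) - 1]   # score at or just below the location
    upper = data[math.ceil(location) - 1]    # score at or just above the location
    return (lower + upper) / 2               # same score twice when N is odd


print(median_by_location([9, 1, 5, 3, 7]))      # 5.0 (location 3)
print(median_by_location([9, 1, 5, 3, 7, 11]))  # 6.0 (location 3.5: between 5 and 7)
```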


    Median

    Advantage:
    Resistant to outliers

    Disadvantage:
    May not be so informative: (1, 1, 2, 2, 2, 2, 5, 6, 9, 9, 10)

    Does the value of 2 really represent this sample as a whole very well?
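
To see the "resistant to outliers" advantage, this sketch uses Python's `statistics` module on the slide's sample. It then replaces the largest score with an extreme value. The replacement value of 1000 is a hypothetical contamination chosen for illustration. The median does not move, while the mean jumps.

```python
import statistics

sample = [1, 1, 2, 2, 2, 2, 5, 6, 9, 9, 10]    # the slide's example
contaminated = sample[:-1] + [1000]            # hypothetical: the 10 becomes an extreme outlier

print(statistics.median(sample), statistics.median(contaminated))  # 2 2
print(round(statistics.mean(sample), 2), round(statistics.mean(contaminated), 2))  # 4.45 94.45
```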


    The Mean

    The most commonly used measure of central tendency is called the mean (denoted $\bar{X}$ for a sample, and $\mu$ for a population).

    The mean is the same as what many of us call the average, and it is calculated in the following manner:

    $$\bar{X} = \frac{\sum X}{N}$$
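
The formula translates line for line into code. This is a minimal sketch, and the `mean` function here is a hypothetical stand-in for `statistics.mean`. It reuses the median slide's sample to show that the mean need not be a score that actually occurs in the data.

```python
def mean(values):
    """Arithmetic mean: the sum of the scores divided by N."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)


sample = [1, 1, 2, 2, 2, 2, 5, 6, 9, 9, 10]
x_bar = mean(sample)
print(round(x_bar, 2))   # 4.45
print(x_bar in sample)   # False: the mean is not one of the observed scores
```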


    Mode vs. Median vs. Mean

    When there is only one mode and the distribution is fairly symmetrical, the three measures (as well as others to be discussed) will have similar values.

    However, when the underlying distribution is not symmetrical, the three measures of central tendency can be quite different.


    Some Visual Demos

    Here is a demonstration that allows you to change a frequency histogram while simultaneously noting the effects of those changes on the mean versus the median.

    As you use the demo, you should fairly easily be able to think about how these changes are also affecting the mode.

    Note that, for a unimodal skewed distribution, the order typically goes mode, median, then mean in the direction the tail is pointing.
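
The ordering rule can be checked on a small, hypothetical right-skewed data set using Python's `statistics` module. Negating the scores mirrors the distribution so the tail points left. In both cases the order runs mode, median, then mean in the direction of the tail.

```python
import statistics

# Hypothetical right-skewed scores (long tail toward high values).
scores = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 12]
m_mode, m_median, m_mean = (statistics.mode(scores), statistics.median(scores),
                            statistics.mean(scores))
print(m_mode, m_median, round(m_mean, 2))   # 1 2 3.38 (tail to the right)

# Mirror image: the tail now points to the left, and so does the ordering.
left = [-x for x in scores]
l_mode, l_median, l_mean = (statistics.mode(left), statistics.median(left),
                            statistics.mean(left))
print(l_mode, l_median, round(l_mean, 2))   # -1 -2 -3.38
```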


    What's an average?

    We've been referring to the mean without qualification, but in fact there are many types of averages, and the mean is only one. The mean we typically use is the arithmetic mean.

    Along with the geometric mean and the harmonic mean, it is one of the Pythagorean means. For positive values, the arithmetic mean is always greater than or equal to the geometric mean, which is greater than or equal to the harmonic mean (with equality only when all values are the same).

    The geometric mean of n values is found by multiplying them all together and taking the nth root of the product. The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of all the values of the variable in question.
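
The three Pythagorean means follow directly from the verbal definitions above. This sketch assumes positive inputs and Python 3.8+ for `math.prod`, and the function names are illustrative only. Python's own `statistics.geometric_mean` and `statistics.harmonic_mean` can serve as a cross-check.

```python
import math


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # Multiply all n values and take the nth root (values must be positive).
    # The product can overflow for long lists; averaging logs avoids that.
    return math.prod(xs) ** (1 / len(xs))


def harmonic_mean(xs):
    # Reciprocal of the arithmetic mean of the reciprocals (values must be positive).
    return 1 / arithmetic_mean([1 / x for x in xs])


values = [2, 8]
print(arithmetic_mean(values), geometric_mean(values), harmonic_mean(values))  # 5.0 4.0 3.2
```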


    More means

    The geometric mean is particularly appropriate for exponential types of data, e.g., human population over a period of time.

    The harmonic mean is good for things like rates and ratios, where an arithmetic mean would actually be incorrect. It also appears whenever you see an ANOVA with unequal sample sizes: by far the most common procedure uses the harmonic mean of the sample sizes.

    As a result, an unbalanced design will have less statistical power, because the harmonic mean of the sample sizes is pulled toward the smallest sample.
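
Both uses of the harmonic mean can be illustrated with `statistics.harmonic_mean`. The round-trip speeds and the three group sizes are hypothetical numbers chosen for illustration. For equal distances, the true average speed is total distance over total time, which is the harmonic mean. For group sizes, the harmonic mean sits well below the arithmetic mean, closer to the small groups.

```python
import statistics

# Rates: drive 60 km/h out and 40 km/h back over the same distance (hypothetical trip).
speeds = [60, 40]
print(statistics.mean(speeds))                      # 50: not the true average speed
print(round(statistics.harmonic_mean(speeds), 6))   # 48.0: total distance / total time

# Unequal group sizes in a hypothetical three-group ANOVA.
group_sizes = [10, 10, 40]
print(statistics.mean(group_sizes))                     # 20
print(round(statistics.harmonic_mean(group_sizes), 2))  # 13.33: pulled toward the small groups
```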


    More means

    Weighted averages
    Sometimes we will want to weight a measure of some variable by the values of some other variable.
    E.g., if each person gets a score on several items and we want an average of the total score for each person across the items, we might weight them by 1/variance to give the more consistent scorers more importance in the calculation.

    The arithmetic mean is a weighted average in which all weights equal 1.
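
A minimal sketch of inverse-variance weighting. It follows one reading of the example above: average the people's mean item scores, weighting each person by 1 / variance of their own item scores. The item data and the `weighted_mean` helper are hypothetical. With all weights set to 1, the same function reproduces the plain arithmetic mean.

```python
import statistics


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical item scores for two people: A answers consistently, B does not.
items = {"A": [3, 3, 4, 3], "B": [1, 5, 2, 6]}
person_means = [statistics.mean(s) for s in items.values()]     # 3.25, 3.5
weights = [1 / statistics.variance(s) for s in items.values()]  # 4.0, about 0.18

print(weighted_mean(person_means, [1, 1]))             # 3.375: equal weights give the arithmetic mean
print(round(weighted_mean(person_means, weights), 3))  # 3.261: pulled toward consistent scorer A
```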


    Properties of a Statistic: Sampling Distribution

    In order to examine the properties of a statistic, we often want to take repeated samples from some population of data and calculate the relevant statistic on each sample. We can then look at the distribution of the statistic across these samples and ask a variety of questions about it.
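
The repeated-sampling procedure can be sketched with Python's `random` and `statistics` modules (Python 3.8+ for `fmean`). The exponential "population" of 10,000 scores, the sample size of 30, and the 2,000 repetitions are all assumptions for illustration. The spread of the resulting sample means is compared with the textbook value, population SD / sqrt(n).

```python
import random
import statistics


def sampling_distribution(population, statistic, n, reps, seed=0):
    """Apply `statistic` to `reps` random samples of size n drawn without replacement."""
    rng = random.Random(seed)
    return [statistic(rng.sample(population, n)) for _ in range(reps)]


# Hypothetical right-skewed population of 10,000 scores.
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1 / 10) for _ in range(10_000)]

means = sampling_distribution(population, statistics.fmean, n=30, reps=2000)
sd_theory = statistics.pstdev(population) / 30 ** 0.5
print(f"population SD:               {statistics.pstdev(population):.2f}")
print(f"SD of the 2000 sample means: {statistics.stdev(means):.2f}")
print(f"theory, SD / sqrt(n):        {sd_theory:.2f}")
```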


    Properties of a Statistic

    Sufficiency
    A sufficient statistic is one that makes use of all of the information in the sample to estimate its corresponding parameter.
    For example, this property makes the mean more attractive as a measure of central tendency compared to the mode or median.

    Unbiasedness
    A statistic is said to be an unbiased estimator if its expected value (i.e., the mean of that statistic across many repeated samples, such as the mean of a number of sample means) is equal to the population parameter it is estimating.
    Using the resampling procedure, one can show that the sample mean is an unbiased estimator of the population mean.
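
A resampling check of unbiasedness, as a minimal sketch. The skewed population, the sample size of 30, and the 5,000 repetitions are hypothetical choices. Any single sample mean can miss the population mean, but the average of many sample means should land very close to it.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical skewed population; its true mean is the parameter being estimated.
population = [rng.expovariate(1 / 10) for _ in range(10_000)]
mu = statistics.fmean(population)

# Resampling: the mean of many sample means should sit close to mu.
sample_means = [statistics.fmean(rng.sample(population, 30)) for _ in range(5000)]
print(f"population mean mu:       {mu:.3f}")
print(f"mean of the sample means: {statistics.fmean(sample_means):.3f}")
```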


    Properties of a Statistic

    Efficiency
    The efficiency of a statistic is reflected in the variance that is observed when one examines the statistic over independently chosen samples; the square root of that variance is the statistic's standard error.
    The smaller the variance, the more efficient the statistic is said to be.

    Resistance
    The resistance of an estimator refers to the degree to which that estimate is affected by extreme values, i.e., outliers. For a resistant estimator, small changes in the data result in only small changes in the estimate.

    Finite-sample breakdown point
    A measure of resistance to contamination: the smallest proportion of observations that, when altered sufficiently, can render the statistic arbitrarily large or small.
    Median: about 1/2 (roughly n/2 observations)
    Trimmed mean: roughly the proportion trimmed from each end
    Mean: 1/n (a single observation)
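
Both properties can be seen in a short simulation. The standard normal population, n = 25, and 2,000 repetitions are assumptions, and `standard_error` is a hypothetical helper. For normal data the mean's standard error should come out smaller than the median's, so the mean is more efficient there. Changing a single observation, however, is enough to drag the mean arbitrarily far (breakdown point 1/n), while the median stays put.

```python
import random
import statistics


def standard_error(statistic, draw, n, reps, rng):
    """SD of `statistic` over `reps` independent samples of size n (its standard error)."""
    estimates = [statistic([draw(rng) for _ in range(n)]) for _ in range(reps)]
    return statistics.stdev(estimates)


def draw_normal(r):
    return r.gauss(0, 1)   # assumed population: standard normal


rng = random.Random(2018)
se_mean = standard_error(statistics.fmean, draw_normal, n=25, reps=2000, rng=rng)
se_median = standard_error(statistics.median, draw_normal, n=25, reps=2000, rng=rng)
print(f"SE of mean:   {se_mean:.3f}")    # close to 1 / sqrt(25) = 0.2
print(f"SE of median: {se_median:.3f}")  # larger, so the median is less efficient here

# Breakdown: alter a single observation and compare the two estimates.
data = [1, 2, 3, 4, 5]
corrupted = data[:-1] + [1_000_000]
print(statistics.mean(data), statistics.mean(corrupted))      # 3 200002
print(statistics.median(data), statistics.median(corrupted))  # 3 3
```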


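    Resistant measures of central tendency

    Trimmed mean
    Created by trimming some percentage of the high and low ends of the data
    The median is actually a trimmed estimate (the extreme case, where everything but the middle one or two scores is trimmed)

    Winsorized mean
    Like the trimmed mean, but the extreme scores are replaced by the most extreme remaining scores instead of being discarded

    M-estimators
    Extreme values are given less weight than those closer to the center of the distribution
    May be more robust than the mean or median for certain types of unusual data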
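
The three resistant estimators above can be sketched in plain Python. The helper names and the `scores` data are hypothetical. The Huber M-estimator shown here is one common choice of M-estimator: a MAD-based scale, the conventional tuning constant c = 1.345, and iterative reweighting. It is not the only way to build one, and SciPy offers ready-made versions of trimming and winsorizing.

```python
import statistics


def trimmed_mean(values, prop):
    """Drop floor(prop * n) scores from each end, then average the rest (0 <= prop < 0.5)."""
    if not 0 <= prop < 0.5:
        raise ValueError("prop must be in [0, 0.5)")
    data = sorted(values)
    k = int(prop * len(data))
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, prop):
    """Replace the k lowest/highest scores with the nearest remaining score, then average."""
    data = sorted(values)
    n = len(data)
    k = int(prop * n)
    if k > 0:
        data = [data[k]] * k + data[k:n - k] + [data[n - k - 1]] * k
    return sum(data) / n


def huber_location(values, c=1.345, tol=1e-9, max_iter=100):
    """Huber M-estimate of location by iteratively reweighted averaging.

    Scores within c robust SDs of the current estimate get weight 1; farther
    scores get weight c * scale / |x - mu|, so extreme values count for less.
    """
    mu = statistics.median(values)
    scale = statistics.median([abs(x - mu) for x in values]) / 0.6745  # MAD in SD units
    if scale == 0:
        return mu
    for _ in range(max_iter):
        weights = [1.0 if x == mu else min(1.0, c * scale / abs(x - mu)) for x in values]
        new_mu = sum(w * x for w, x in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


scores = [2, 3, 3, 4, 4, 5, 5, 6, 7, 40]   # hypothetical data with one extreme score
print(statistics.mean(scores))              # 7.9
print(statistics.median(scores))            # 4.5
print(trimmed_mean(scores, 0.10))           # 4.625
print(trimmed_mean(scores, 0.40))           # 4.5: heavy trimming leaves the median
print(winsorized_mean(scores, 0.10))        # 4.7
print(round(huber_location(scores), 2))
```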


    Practical Example

    Administer the BDI to 10 randomly selected UNT students. Eight of the students score 25 or less; two score greater than 45.
    Scores: 8, 12, 6, 16, 10, 20, 22, 25, 47, 55

    Median = 18
    Mean = 22.1

    Which is more accurate regarding generalization to the typical UNT student? One that includes:

    Two people who perhaps reversed their ratings on the items?
    A score that was miskeyed (using the number pad, they hit a 4 instead of a 1, leading to a score of 47)?
    Two people who do not have English as their native language?
    Two people who did not answer honestly?
    Two people who are actually clinically depressed?
    One who is clinically depressed and one who just wants to be different?
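
The example's median and mean can be verified with Python's `statistics` module. The last two lines are additions for comparison, not part of the original example. They show a 10% trimmed mean (drop the single lowest and highest score) and the mean with the two scores above 45 left out, to show how much those two scores pull the mean upward.

```python
import statistics

bdi = [8, 12, 6, 16, 10, 20, 22, 25, 47, 55]   # the ten BDI scores from the example

print(statistics.median(bdi))                         # 18.0
print(statistics.mean(bdi))                           # 22.1
print(statistics.mean(sorted(bdi)[1:-1]))             # 20: 10% trimmed mean
print(statistics.mean([x for x in bdi if x <= 45]))   # 14.875: without the two scores above 45
```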


    Practical Example

    While many think of outliers as representing the complexity of human nature, the issue revolves more around inadequate data collection to detect why a score is what it is, and around problematic population description.

    E.g., my definition of a typical UNT student, if such a thing could be said to exist at all, is not one who is on suicide watch. However, the previous problem most likely represents an attempt to generalize to something that doesn't exist. Better populations to try to represent: UNT Texans, UNT psych grad students, UNT international students, UNT students who have visited C & T in the last semester (in which case those scores would probably not be outliers), etc.

    Application to current events: Do you really think there is a "middle America," a "female vote," etc., to which the presidential candidates are trying to appeal? There are demographics, very specific ones, yes, but those labels do little to capture the specifics.


    Summary

    Favoritism for the arithmetic mean is the result of familiarity only, and until you came to this course you would have been hard-pressed to explain your preference outside of arguments from authority.

    The AM is to be valued for some properties it has relative to other measures (sufficiency, efficiency, unbiasedness), and also rejected on the same basis (it has the least resistance). In many cases it is entirely inappropriate to use the AM, as it would give a distorted view of central tendency. The choice of which statistics to use to represent your data deserves as much consideration as the measures themselves.