Distributions & Descriptive Statistics, Dr William Simpson, Psychology, University of Plymouth


Post on 22-Jan-2016


  • Distributions & Descriptive statistics

    Dr William Simpson, Psychology, University of Plymouth

  • Defining and measuring variables

  • Independent & dependent variables
    Independent variable (IV): something we manipulate in an experiment
    Dependent variable (DV): something we measure
    By manipulating the IV, we expect to produce a change in the DV

  • Scales of measurement
    Variables are classified according to type of scale
    The type of analysis depends on the type of scale
    Worst to best: nominal, ordinal, interval, ratio

  • Nominal
    Nominal data: assign categorical labels to observations
    Not really measurement
    E.g. male/female; married/single/widowed/divorced
    Numbers on football jerseys

  • Ordinal
    Ordinal data: values can be ranked (ordered). Categorical but rankable
    E.g. small, medium, large; movie rating 1-5; Likert scale
    Can only be ranked. A rating scale is not like cm: the difference between one pair of adjacent ratings is not necessarily the same as the difference between another pair

  • Averaging one response of "strongly agree" (5) with two responses of "disagree" (2) would give us a mean of 3, but what is the meaning of that number?

  • Interval
    Interval data: ordinary measurement, e.g. temperature
    Unlike ordinal data, we can say the difference between 1 & 2 deg C is the same as the difference between 4 & 5 deg C

  • Ratio
    Ordinary measurements, but with an absolute, non-arbitrary zero point
    E.g. weight, length: any scale must start at zero
    Deg C: not ratio, because 0 is arbitrarily set at the freezing point of water

  • Discrete & continuous variables
    Variables measured on interval & ratio scales are further identified as either:
    discrete - integers, no intermediate values. E.g. number of Smarties in a box
    continuous - measurable to any level of accuracy. E.g. weight of the Smarties in a box

  • Frequency distributions

  • We have a pile of scores
    Not all scores are equally likely
    How were the scores distributed?

  • Subjects were timed (in sec) while completing a problem-solving task:
    7.6, 8.1, 9.2, 6.8, 5.9, 6.2, 6.1, 5.8, 7.3, 8.1, 8.8, 7.4, 7.7, 8.2

  • Stem & leaf
    Two components: the stem and the leaf
    In the problem-solving example, stem = ones, leaf = tenths
    Stems range between 5 and 9

  • 7.6, 8.1, 9.2, 6.8, 5.9, 6.2, 6.1, 5.8, 7.3, 8.1, 8.8, 7.4, 7.7, 8.2
    5|98
    6|821
    7|6347
    8|1182
    9|2
    Key: 9|2 means 9.2
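
The stem-and-leaf display for the problem-solving times can be rebuilt with a few lines of code. This is a minimal Python sketch, not part of the original slides; the function name `stem_and_leaf` is an illustrative assumption, and it only handles scores recorded to one decimal place.

```python
from collections import defaultdict


def stem_and_leaf(scores):
    """Group one-decimal scores by stem (ones digit), keeping tenths as leaves.

    Leaves are kept in the order the scores were recorded, as on the slide.
    """
    rows = defaultdict(list)
    for score in scores:
        # Work in whole tenths to avoid floating-point surprises (7.6 -> 76).
        stem, leaf = divmod(round(score * 10), 10)
        rows[stem].append(leaf)
    return {stem: "".join(str(leaf) for leaf in rows[stem]) for stem in sorted(rows)}


times = [7.6, 8.1, 9.2, 6.8, 5.9, 6.2, 6.1, 5.8, 7.3, 8.1, 8.8, 7.4, 7.7, 8.2]
display = stem_and_leaf(times)

for stem, leaves in display.items():
    print(f"{stem}|{leaves}")
```

Sorting the leaves within each stem is a common variant; the slide leaves them in order of appearance, so the sketch does the same.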

  • Heights in cm:
    154, 143, 148, 139, 143, 147, 153, 162, 136, 147, 144, 143, 139, 142, 143, 156, 151, 164, 157, 149, 146
    - Put 2 digits in stem; split stems 0-4, 5-9
    13|969
    14|334323
    14|87796
    15|431
    15|67
    16|24
    Key: 13|6 means 136

  • GSR values:
    23.25, 24.13, 24.76, 24.81, 24.98, 25.31, 25.57, 25.89, 26.28, 26.34, 27.09
    - Round the last 2 digits to a single digit (one decimal place)
    23|3
    24|188
    25|0369
    26|33
    27|1
    Key: 23|3 means 23.3

  • Histogram
    Alternative way to look at the distribution
    It is like a version of stem-and-leaf turned 90 deg

  • Example
    Time to complete task (min):
    8 2 6 12 9 14 1 7 7 9 11 8 12 10 5 7 10 9 10 11 4 8 2 11 10 11 13 13 14 11 13 10 12 13 5 16 11 17 10 6 13 11 5 9 12 14 8 2 12 4

  • Sort scores into about 10 or so bins (similar to the stems in stem-and-leaf)

  • Decide on sensible bins
    Count the number of observations in each bin (like the number of leaves on each stem in stem-and-leaf)
    This number in each bin is called the frequency

  • time     frequency
    0-1      1
    2-3      3
    4-5      5
    6-7      5
    8-9      8
    10-11    13
    12-13    10
    14-15    3
    16-17    2

  • This table is then used to make the histogram
    Histogram is a bar chart with frequency on the y axis and score on the x axis
    Sometimes done other ways, e.g. connect the dots (frequency distribution polygon)
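
The binning and counting steps can be sketched in Python. This is an illustrative sketch, not the lecturer's code: the bin width of 2 min matches the frequency table for the task-completion times, while the variable names and the text "bars" are assumptions made for the example.

```python
from collections import Counter

times = [8, 2, 6, 12, 9, 14, 1, 7, 7, 9, 11, 8, 12, 10, 5, 7, 10, 9, 10, 11,
         4, 8, 2, 11, 10, 11, 13, 13, 14, 11, 13, 10, 12, 13, 5, 16, 11, 17,
         10, 6, 13, 11, 5, 9, 12, 14, 8, 2, 12, 4]

bin_width = 2  # bins 0-1, 2-3, ..., 16-17

# Integer division assigns each score to its bin index.
counts = Counter(t // bin_width for t in times)

# One (low, high, frequency) row per bin, including any empty bins.
table = [(b * bin_width, b * bin_width + bin_width - 1, counts[b])
         for b in range(max(counts) + 1)]

# A crude sideways histogram: one '#' per observation.
for low, high, freq in table:
    print(f"{low:>2}-{high:<2} {freq:>2} {'#' * freq}")
```

The printed rows are the histogram turned on its side, which is also how it relates to the stem-and-leaf display.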


  • in Rx
  • Probability distributions
    Histogram is an estimate of the true probability distribution
    Many theoretical probability distributions exist
    They are the basis of statistical models used to make inferences about the population

  • Binomial distribution
    Binomial distribution is a discrete distribution
    The binomial distribution applies when:
    there is a series of n trials (e.g., 10 coin tosses)
    there are only 2 possible outcomes per trial
    outcomes are mutually exclusive (head or tail)
    the outcome of each trial is independent of the others

  • The binomial distribution gives the chance of getting each total number of successes after doing all the (binary) trials of the experiment
    E.g. it gives the chance of getting 1, 2, or 3 girls after giving birth to 6 children
    p = p(success) = p(girl) = 0.5 on each trial
    q = p(failure) = p(boy) = 1 - p = 0.5
    n = number of trials = 6

  • The probability distribution where n = 6 and the probability of each outcome is 0.5 on each trial looks like:
    [Figure: x axis = number of girls, y axis = probability]

  • For any probability distribution, the y-axis is given by a formula
    For the binomial, it looks like this:
    $P(k) = \binom{n}{k} p^k q^{n-k}$
    k successes in n trials; $\binom{n}{k}$ is the binomial coefficient
    You don't need to know it
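
For readers who want to see the numbers behind the girls example, here is a short Python sketch of the standard binomial probability, C(n, k) p^k q^(n-k), with n = 6 and p = 0.5 as on the slides. The function name `binomial_pmf` is an assumption made for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    q = 1 - p  # probability of failure on each trial
    return comb(n, k) * p**k * q**(n - k)


n, p = 6, 0.5  # six children, p(girl) = 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"{k} girls: {prob:.4f}")
```

The seven probabilities (for 0 to 6 girls) sum to 1, and the most likely outcome is 3 girls, with probability 20/64 = 0.3125.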

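  • Normal distribution
    Continuous probability distribution
    Every probability distribution's y-axis is given by a formula
    For the normal distribution, the y-axis (probability density) is:
    $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}$
    where $\mu$ is the mean and $\sigma$ is the standard deviation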
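
The normal density can be evaluated directly from its standard formula. The sketch below is illustrative only: the parameter names `mu` (mean) and `sigma` (standard deviation) and their default values are assumptions, not something the slides specify.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density of a normal distribution at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


# The density is highest at the mean and symmetric around it.
for x in (-2, -1, 0, 1, 2):
    print(f"x = {x:>2}: density = {normal_pdf(x):.4f}")
```

Because the variable is continuous, these values are densities rather than probabilities; probabilities come from areas under the curve.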


  • Descriptive statistics

  • We have a pile of scores
    Have made stem-and-leaf, histogram
    Want to summarise further: descriptive statistics

  • 1. Centre (location)
    What is the typical score? If you were to make a prediction for a new score, what would it be?

  • a) Mean (average)
    Mean = sum(x)/n

  • Mean as balance point
    Imagine that each observation is a toy block
    Place the blocks on a ruler; the position (1, 2, etc inches) represents the value
    The balance point is the mean

  • 1 2 2 3
    1 2 2 5
    1 2 2 9
    Mean is pulled towards the extreme observation (outlier)
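
The pull of an outlier on the mean is easy to check numerically. A minimal Python sketch, using the three sets of scores from the slide; the function name `mean` simply mirrors the formula mean = sum(x)/n.

```python
def mean(scores):
    """Arithmetic mean: sum of the scores divided by how many there are."""
    return sum(scores) / len(scores)


samples = [[1, 2, 2, 3], [1, 2, 2, 5], [1, 2, 2, 9]]
means = [mean(s) for s in samples]

for scores, m in zip(samples, means):
    print(scores, "-> mean =", m)
```

Only the largest score changes between the three sets, yet the mean moves from 2.0 to 2.5 to 3.5.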

  • b) Median
    Median is the middle score; 50th percentile
    Useful when extreme scores (outliers) lie in one tail of the distribution (skewed)

  • Calculate the median
    Sort scores
    If odd n, median is the middle value
    If even n, median is the mean of the 2 middle values
    25 13 9 18 1 -> 1 9 13 18 25; median = 13
    25 13 9 18 -> 9 13 18 25; median = (13+18)/2 = 15.5
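
The sort-then-pick-the-middle procedure translates almost line for line into code. This is a sketch in Python rather than a reference implementation; the function name `median` is an illustrative choice.

```python
def median(scores):
    """Middle value of the sorted scores (mean of the two middle values if n is even)."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the middle pair


odd_example = median([25, 13, 9, 18, 1])
even_example = median([25, 13, 9, 18])
print(odd_example, even_example)
```

Applied to 1 2 2 3, 1 2 2 5 and 1 2 2 9, the same function returns 2 each time, since the outlier never reaches the middle of the sorted list.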

  • Median and outliers
    1 2 2 3
    1 2 2 5
    1 2 2 9
    Median = 2 in all cases

  • c) Mode
    Mode is the most frequently occurring score
    Mean should really be used only for interval/ratio data. Mode is good otherwise
    E.g. mean movie rating is not really sensible. Mode is sensible
    Sometimes no unique mode exists (e.g. bimodal)

  • Bimodality can be due to a mixture of two different populations (e.g. male and female)

  • Time to complete task (min)
    Mean = 9.36, Median = 10, Mode = 11

  • mean(x), median(x), Mode
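
The three measures of centre for the task-completion times can be reproduced with Python's standard `statistics` module. This is an illustrative sketch in Python, not the R code used in the lecture; `multimode` is used so that a tie for most frequent score would show up as more than one value.

```python
from statistics import mean, median, multimode

times = [8, 2, 6, 12, 9, 14, 1, 7, 7, 9, 11, 8, 12, 10, 5, 7, 10, 9, 10, 11,
         4, 8, 2, 11, 10, 11, 13, 13, 14, 11, 13, 10, 12, 13, 5, 16, 11, 17,
         10, 6, 13, 11, 5, 9, 12, 14, 8, 2, 12, 4]

centre = {
    "mean": mean(times),
    "median": median(times),
    "modes": multimode(times),  # list of all most-frequent scores
}
print(centre)
```

For these data there is a single mode, and the three values agree with the ones on the slide.
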
  • Likert scale
    E.g. Brief Psychiatric Rating Scale (BPRS)
    Interview + observations of patient's behaviour over the preceding 2-3 days
    Each item scored 0-7

  • Suppose we have a new treatment
    Does it reduce anxiety?
    Define anxiety as score on Q2

  • We use the BPRS on lots of patients
    Compare treatment and placebo
    How? Find mean(treatment) vs mean(placebo)?

  • NO

  • The numbers 0-7 are not really numbers!
    They have only rank (order) info
    Ordinal

  • The numbers are really ordered labels: normal, a bit anxious, ..., extremely anxious

  • They lack a quantitative distance between them; calculating a mean level of anxiety for the group is not really appropriate

  • It makes sense to find the mode
    Most frequently occurring anxiety score

  • It makes sense to measure the median: the person in the middle of the group in terms of anxiety, with half the responses below and the other half above
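
As a small illustration of summarising ordinal ratings, the sketch below uses the three responses from the earlier Likert example (two "disagree" = 2 and one "strongly agree" = 5). It is a Python sketch, not part of the slides, and the variable names are assumptions.

```python
from statistics import median, mode

# Ordinal codes from the earlier example: disagree = 2, strongly agree = 5
responses = [2, 2, 5]

middle_response = median(responses)  # the response of the "person in the middle"
most_common = mode(responses)        # the most frequently chosen response
arithmetic_mean = sum(responses) / len(responses)

print("median:", middle_response)
print("mode:", most_common)
print("mean:", arithmetic_mean)
```

The median and mode both land on a response that somebody actually gave (2, "disagree"). The mean of 3 corresponds to a category nobody chose, and it treats the step from 2 to 5 as a measured distance, which ordinal labels do not support.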

  • Example
    Family-Focused Treatment Versus Individual Treatment for Bipolar Disorder: Results of a Randomized Clinical Trial
    J. Consulting & Clinical Psychology, 2003, 71, 482-492

  • "The psychiatrist made ratings of compliance on a 7-point Likert scale ranging from full compliance (1) to discontinued medication against medical advice (7)" (p. 486)

  • "On the whole, the participants were quite compliant with their medication, with at least 78% of the patients scoring within the compliant range at each assessment point" (p. 489)
    - Must have made a mistake before: 1 is bad, 7 is good compliance

  • For each 3-month follow-up period, participants were placed in one of the following clinical outcome categories: relapse, defined as a rating of 6 or 7 on the BPRS/SADS-C core symptoms of depression (depressed mood, loss of interest), mania (hostility, elevated mood, grandiosity), or psychosis (unusual thought content, suspiciousness, hallucinations, conceptual disorganization) and at least two ancillary symptoms (suicidality, guilt, sleep disturbance, appetite disturbance, lack of energy, negative evaluation, discouragement, increased energy activity), or nonrelapse, defined as a score of 5 or below on all relevant BPRS/SADS-C core symptoms during the 3-month interval


  • 2. Spread (dispersion)
    Measure of centre (e.g. mean) tells what value we expect
    Measure of spread tells how close a value will typically be to the centre

  • a) Interquartile range
    Interquartile range (IQR) is the distance between the cutoff for the bottom 25% of scores (Q1) and the cutoff for the top 25% of scores (Q3)

  • Quartiles
    Quartiles divide the data into quarters
    The median (Q2) divides the data into 2 piles (50% above, 50% below)
    Q1 is the cutoff below which fall the bottom 25% of scores
    Q3 is the cutoff below which fall the bottom 75%

  • Q1 has 25% of scores below it, Q2 has 50% (i.e. it is the median), and Q3 has 75% of scores below it (25% above)

  • Finding quartiles
    Sort the data
    Find the median = Q2 = value that splits the data into two equal piles, half below it and half above
    Q1 = median of lower half
    Q3 = median of upper half
    IQR = Q3 - Q1
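
The median-of-halves recipe can be sketched in Python using the problem-solving times from earlier in the lecture. Two assumptions are made here that the slides do not state: when n is odd, the middle score is left out of both halves, and there are at least 2 scores. Other conventions exist, so statistical software may give slightly different quartiles. The function names are illustrative.

```python
def median_of_sorted(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(scores):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: for odd n the middle score belongs to neither half.
    """
    ordered = sorted(scores)
    half = len(ordered) // 2
    lower = ordered[:half]                 # bottom half of the sorted scores
    upper = ordered[len(ordered) - half:]  # top half of the sorted scores
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)


times = [7.6, 8.1, 9.2, 6.8, 5.9, 6.2, 6.1, 5.8, 7.3, 8.1, 8.8, 7.4, 7.7, 8.2]
q1, q2, q3 = quartiles(times)
iqr = q3 - q1
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr:.1f}")
```

For these 14 times the sketch gives Q1 = 6.2, Q2 = 7.5 and Q3 = 8.1, so the IQR is about 1.9 sec.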