
Exploratory Data Analysis. Lecture overview: data analysis template; Exploratory Data Analysis (EDA): the role of EDA, doing EDA, interpreting EDA results.

Post on 12-Jan-2016






  • Exploratory Data Analysis

  • Lecture overview

    Data analysis template

    Exploratory Data Analysis (EDA)
    - The role of EDA
    - Doing EDA
    - Interpreting EDA results

  • Discover patterns in data

    - Why is it important to find patterns?
    - What counts as a pattern?
    - What techniques can we use to find patterns?
    - When can such techniques be used?
    - How should the results be interpreted?

  • Data analysis template

    Exploratory Data Analysis
    - Summary of the data
    - Accidental and unexpected patterns

    Data Screening: check for statistical hiccups

    Fit model, e.g. ANOVA, & do specific tests

    Exploratory Data Analysis & Data Screening revisited: check residuals

  • The role of EDA

    Exploratory Data Analysis

    Explore a data set
    Use methods that help you understand the data
    - to help you understand the events that generated the data
    - to help you see what happened, sometimes in spite of your expectations

  • Simple example

    Class attendance and language learning

    Bob: 10 classes; 100 words
    Carol: 15 classes; 150 words
    Dave: 12 classes; 120 words
    Ann: 17 classes; 170 words
    Steve: 13 classes; 95 words
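    The pattern in this example becomes visible once each pair of numbers is reduced to a single rate. The following is a minimal Python sketch of that step, not part of the original slides; the variable names are assumptions chosen for illustration.

```python
from statistics import mode

# Classes attended and words learned, taken from the slide above.
records = {
    "Bob": (10, 100),
    "Carol": (15, 150),
    "Dave": (12, 120),
    "Ann": (17, 170),
    "Steve": (13, 95),
}

# Reduce each student to one number: words learned per class attended.
rate = {name: words / classes for name, (classes, words) in records.items()}

# The most common rate is the pattern; anyone else is an exception to it.
typical = mode(list(rate.values()))
exceptions = [name for name, r in rate.items() if r != typical]

for name, r in rate.items():
    print(f"{name:6s} {r:5.2f} words per class")
print("Typical rate:", typical)
print("Breaks the pattern:", exceptions)
```

    Four students share the same rate, so the one student who departs from it stands out immediately, which is much harder to see in the raw pairs.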

  • Recognising patterns

    EDA supplies statistical techniques that work in combination with a very powerful pattern recognition device

    Ways to tabulate, summarise, display, reduce data

  • Data Analysis (DA)

    - DA can't be done mechanically
    - Often there has to be a "creative" element
    - Conventional DA is in a sense idealistic
    - Trade-off between "ideal" experimentation v. ecological validity
    - Sometimes questions are tentative
    - We need data analysis skills that allow data to speak to us despite our expectation

  • More interesting example

    NameVoyager


  • NameVoyager

    Variable              Method used to represent
    Time                  horizontal axis
    No. / billion babies  vertical axis
    Sex                   colour hue
    Rank in 2007          colour saturation
    Name                  label
    Detail                pop-up, click thru

  • Confirmatory vs. exploratory data analysis

    Confirmatory data analysis    Exploratory data analysis
    tests a hypothesis            finds a good description
    settles questions             raises new questions
    (Inferential statistics)      (Descriptive statistics)

  • What is data?

    A bunch of numbers (usually)
    Each number summarises some property or event of interest
    - e.g. 18: age, Beck Depression Inventory (BDI) score, income in 000s
    Data: lots of numbers, e.g. 18, 24, 43, 22, 37,

    Is there a pattern?

  • Data reduction: fewer numbers

    Summarise proportion
    - 27 / 48 children in class A are boys
    - 16 / 23 children in class B are boys
    - Re-presented: 56% of class A, 70% of class B are boys

    Summarise change
    - Before: 112, 134, 121, 97
    - After: 116, 132, 140, 108
    - Re-presented as change: 4, -2, 19, 11
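    Both reductions on this slide are simple arithmetic, and can be sketched in a few lines of Python. This is an illustrative sketch only; the variable names are assumptions, and the numbers are the ones on the slide.

```python
# Counts from the slide: (boys, all children) in each class.
classes = {"A": (27, 48), "B": (16, 23)}

# Summarise proportion: two counts per class become one percentage.
percent_boys = {
    name: 100 * boys / total for name, (boys, total) in classes.items()
}

# Paired before/after scores from the slide, in the same order.
before = [112, 134, 121, 97]
after = [116, 132, 140, 108]

# Summarise change: each before/after pair becomes one difference.
change = [a - b for b, a in zip(before, after)]

for name, pct in percent_boys.items():
    print(f"Class {name}: {pct:.1f}% boys")
print("Change:", change)
```

    In each case the re-presented data has fewer numbers than the original and makes the comparison of interest direct: class against class, or after against before.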

  • Simpler descriptions are better

    "Anything that looks below the previously described surface makes the description more effective" Tukey (1977)

  • Revealing patterns

    Raw data is hard to understand
    EDA provides ways of presenting data that make the data easier to understand

    Example of Lord Rayleigh's research on the weight of nitrogen
    - used a chemical compound to isolate a fixed amount of nitrogen
    - repeated this experiment 15 times

  • Date      Source compound  Extraction method  Weight observed
    29.11.93  NO               hot iron           2.30143
    5.12.93   NO               hot iron           2.29816
    6.12.93   NO               hot iron           2.30182
    8.12.93   NO               hot iron           2.29890
    12.12.93  Air              hot iron           2.31017
    14.12.93  Air              hot iron           2.30986
    19.12.93  Air              hot iron           2.31010
    22.12.93  Air              hot iron           2.31001
    26.12.93  N2O              hot iron           2.29889
    28.12.93  N2O              hot iron           2.29940
    9.1.94    NH4NO2           hot iron           2.29849
    13.1.94   NH4NO2           hot iron           2.29889
    27.1.94   Air              ferrous hydrate    2.31024
    30.1.94   Air              ferrous hydrate    2.31030
    1.2.94    Air              ferrous hydrate    2.31028
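    One way to look for structure in these 15 weights is to group them by source and compare the groups. The sketch below is an illustration rather than the analysis used in the lecture; splitting the data into "air" versus "any other compound" is an assumption made here for the example.

```python
from statistics import mean

# (source compound, weight observed), transcribed from the table above.
observations = [
    ("NO", 2.30143), ("NO", 2.29816), ("NO", 2.30182), ("NO", 2.29890),
    ("Air", 2.31017), ("Air", 2.30986), ("Air", 2.31010), ("Air", 2.31001),
    ("N2O", 2.29889), ("N2O", 2.29940),
    ("NH4NO2", 2.29849), ("NH4NO2", 2.29889),
    ("Air", 2.31024), ("Air", 2.31030), ("Air", 2.31028),
]

# Assumed grouping: nitrogen taken from air versus from a chemical compound.
air = [w for source, w in observations if source == "Air"]
chemical = [w for source, w in observations if source != "Air"]

for label, values in [("Air", air), ("Chemical", chemical)]:
    print(f"{label:9s} n={len(values)}  mean={mean(values):.5f}  "
          f"min={min(values):.5f}  max={max(values):.5f}")

# If the lightest air sample is heavier than the heaviest chemical
# sample, the two groups do not overlap at all.
separated = min(air) > max(chemical)
print("Groups completely separated:", separated)
```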

  • Box & whisker plot

  • dot plot

  • Two separate box & whisker plots

  • Technique

    Find a graph that shows clearly that the data can be divided into two different groups

    Appropriate representation depends on your practical goal
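    A box & whisker plot is drawn from a five-number summary, so the contrast between one pooled plot and two separate plots can be sketched numerically. The Python below is a hedged illustration using Lord Rayleigh's weights from the earlier slide; the helper `five_number_summary` and the air/other split are assumptions for this example, and quartile conventions differ between packages.

```python
from statistics import quantiles

# Lord Rayleigh's weights, split by source (assumed grouping).
air = [2.31017, 2.30986, 2.31010, 2.31001, 2.31024, 2.31030, 2.31028]
other = [2.30143, 2.29816, 2.30182, 2.29890,
         2.29889, 2.29940, 2.29849, 2.29889]


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


groups = [("All 15", air + other), ("Air", air), ("Other", other)]
for label, values in groups:
    summary = five_number_summary(values)
    print(f"{label:7s}", "  ".join(f"{v:.5f}" for v in summary))
```

    The pooled summary gives one wide box, while the two separate summaries give two narrow boxes that do not overlap, which is the division into two groups that the graph is meant to show.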

  • Precise descriptions are better

    "Most of the key questions in our world sooner or later demand answers to 'by how much?' rather than merely to 'in which direction?'" (Tukey, 1977)

    Hick's Law
    - Choice Reaction Time experiment
    - RT increases with number of possible response alternatives
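    Hick's law is commonly written as RT = a + b log2(n + 1), where n is the number of response alternatives. The slide does not state the formula, so the Python sketch below assumes that form, and the values of `a` and `b` are hypothetical, chosen only to illustrate the difference between "in which direction?" and "by how much?".

```python
import math


def hick_rt(n_alternatives, a=0.2, b=0.15):
    """Predicted choice reaction time in seconds under Hick's law.

    a (intercept) and b (seconds per bit) are hypothetical values used
    only for illustration; they are not estimates from any data set.
    """
    return a + b * math.log2(n_alternatives + 1)


previous = None
for n in (1, 2, 4, 8):
    rt = hick_rt(n)
    step = "" if previous is None else f"  (+{rt - previous:.3f} s)"
    print(f"{n} alternatives: predicted RT = {rt:.3f} s{step}")
    previous = rt
```

    A directional description only says that RT goes up with more alternatives. The logarithmic form adds a precise claim: each doubling of n + 1 adds the same amount of time, so the increase per extra alternative shrinks as n grows.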

  • Hick's law


  • Interpreting EDA

    Multiplicity

  • Interpreting EDA

    Summarise the results
    Discover unanticipated results
    - new line of research, new experiment
    - qualify conclusion from the present study
    Generate hypotheses
    Check assumptions
    - qualify conclusion from the present study
    - address anomalies

    NOT (or, rarely) a definitive conclusion

  • Practical week 7

    Using EDA for data screening in simple & multiple regression

    Visualisation
    - NameVoyager
    - Bullying data

    Register for bullying data before the practical!

    e.g. table of raw data: a representation that the eye finds congenial is of more practical use

