# Exploratory Data Analysis

Post on 12-Jan-2016

Exploratory Data Analysis

Lecture overview
- Data analysis template
- Exploratory Data Analysis (EDA)
  - The role of EDA
  - Doing EDA
  - Interpreting EDA results

Discover patterns in data
- Why is it important to find patterns?
- What counts as a pattern?
- What techniques can we use to find patterns?
- When can such techniques be used?
- How should the results be interpreted?

Data analysis template
1. Exploratory Data Analysis
   - Summary of the data
   - Accidental and unexpected patterns
2. Data screening: check for statistical hiccups
3. Fit a model (e.g. ANOVA) and do specific tests
4. Exploratory Data Analysis and data screening revisited: check residuals

The role of EDA

Exploratory Data Analysis
- Explore a data set
- Use methods that help you understand the data
  - to help you understand the events that generated the data
  - to help you see what happened, sometimes in spite of your expectations

Simple example: class attendance and language learning
- Bob: 10 classes; 100 words
- Carol: 15 classes; 150 words
- Dave: 12 classes; 120 words
- Ann: 17 classes; 170 words
- Steve: 13 classes; 95 words
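One way to let the pattern in the attendance example show itself is to reduce each student's two numbers to a single rate. The sketch below is only an illustration: the variable names, the use of the median as the "typical" rate, and the cut-off of 1 word per class are assumptions, not part of the lecture.

```python
from statistics import median

# (classes attended, words learned) for each student, copied from the slide.
records = {
    "Bob": (10, 100),
    "Carol": (15, 150),
    "Dave": (12, 120),
    "Ann": (17, 170),
    "Steve": (13, 95),
}

# Reduce each pair of numbers to one: words learned per class attended.
rate = {name: words / classes for name, (classes, words) in records.items()}

# Assumption of this sketch: the median rate stands for the "typical" student,
# and anyone more than 1 word per class away from it is worth a second look.
typical = median(rate.values())
unusual = [name for name, r in rate.items() if abs(r - typical) > 1]

for name, r in rate.items():
    print(f"{name}: {r:.1f} words per class")
print("Typical rate:", typical)
print("Worth a second look:", unusual)
```

Four students learn exactly 10 words per class; Steve does not, which is the kind of exception EDA is meant to surface.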

Recognising patterns
- EDA supplies statistical techniques that work in combination with a very powerful pattern recognition device
- Ways to tabulate, summarise, display, reduce data

Data Analysis (DA)
- DA can't be done mechanically
- Often there has to be a "creative" element
- Conventional DA is in a sense idealistic
- Trade-off between "ideal" experimentation v. ecological validity
- Sometimes questions are tentative
- We need data analysis skills that allow data to speak to us despite our expectations

More interesting example: NameVoyager

NameMapper

NameVoyager

| Variable | Method used to represent |
| --- | --- |
| Time | horizontal axis |
| No. / billion babies | vertical axis |
| Sex | colour hue |
| Rank in 2007 | colour saturation |
| Name | label |
| Detail | pop-up, click thru |

Confirmatory vs. exploratory data analysis

| Confirmatory data analysis | Exploratory data analysis |
| --- | --- |
| tests a hypothesis | finds a good description |
| settles questions | raises new questions |
| (Inferential statistics) | (Descriptive statistics) |

What is data?
- A bunch of numbers (usually)
- Each number summarises some property or event of interest
  - e.g. 18: Age, Beck Depression Inventory (BDI) score, Income in 000s
- Data: lots of numbers, e.g. 18, 24, 43, 22, 37, ...

Is there a pattern?

Data reduction: fewer numbers

Summarise proportion
- 27 / 48 children in class A are boys
- 16 / 23 children in class B are boys
- Re-presented: 56% of class A, 70% of class B are boys
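The proportion summary can be reproduced in a few lines. This is a minimal sketch; the function name `percent` and the dictionary layout are illustrative choices. Note that 16 / 23 is about 69.6%, which rounds to 70%.

```python
def percent(part, whole):
    """Return part / whole as a percentage, rounded to the nearest whole number."""
    return round(100 * part / whole)

# (number of boys, class size) for each class, from the slide.
classes = {"A": (27, 48), "B": (16, 23)}

# Four raw counts are reduced to two directly comparable percentages.
boys_percent = {name: percent(boys, size) for name, (boys, size) in classes.items()}

for name, pct in boys_percent.items():
    print(f"Class {name}: {pct}% boys")
```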

Summarise change
- Before: 112, 134, 121, 97
- After: 116, 132, 140, 108
- Re-presented as change: 4, -2, 19, 11
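The change scores are simply each "after" value minus the matching "before" value. A short sketch, with variable names chosen for illustration:

```python
before = [112, 134, 121, 97]
after = [116, 132, 140, 108]

# Pair each before score with its after score and subtract,
# turning eight numbers into four.
change = [a - b for b, a in zip(before, after)]

print(change)  # [4, -2, 19, 11]
```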

Simpler descriptions are better

"Anything that looks below the previously described surface makes the description more effective." (Tukey, 1977)

Revealing patterns
- Raw data is hard to understand
- EDA provides ways of presenting data that make the data easier to understand

Example of Lord Rayleigh's research on the weight of nitrogen
- used a chemical compound to isolate a fixed amount of nitrogen
- repeated this experiment 15 times

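| Date | Source compound | Extraction method | Weight observed |
| --- | --- | --- | --- |
| 29.11.93 | NO | hot iron | 2.30143 |
| 5.12.93 | NO | hot iron | 2.29816 |
| 6.12.93 | NO | hot iron | 2.30182 |
| 8.12.93 | NO | hot iron | 2.29890 |
| 12.12.93 | Air | hot iron | 2.31017 |
| 14.12.93 | Air | hot iron | 2.30986 |
| 19.12.93 | Air | hot iron | 2.31010 |
| 22.12.93 | Air | hot iron | 2.31001 |
| 26.12.93 | N2O | hot iron | 2.29889 |
| 28.12.93 | N2O | hot iron | 2.29940 |
| 9.1.94 | NH4NO2 | hot iron | 2.29849 |
| 13.1.94 | NH4NO2 | hot iron | 2.29889 |
| 27.1.94 | Air | ferrous hydrate | 2.31024 |
| 30.1.94 | Air | ferrous hydrate | 2.31030 |
| 1.2.94 | Air | ferrous hydrate | 2.31028 |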
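The pattern in Rayleigh's table is hard to see in the raw rows but shows up once the weights are grouped by source. The sketch below is one possible way to do that; splitting the rows into "air" versus "any chemical compound" is an assumption made for illustration, based on the source-compound column.

```python
from statistics import mean

# (source compound, weight observed) for the 15 experiments in the table.
observations = [
    ("NO", 2.30143), ("NO", 2.29816), ("NO", 2.30182), ("NO", 2.29890),
    ("Air", 2.31017), ("Air", 2.30986), ("Air", 2.31010), ("Air", 2.31001),
    ("N2O", 2.29889), ("N2O", 2.29940),
    ("NH4NO2", 2.29849), ("NH4NO2", 2.29889),
    ("Air", 2.31024), ("Air", 2.31030), ("Air", 2.31028),
]

# Assumed grouping: nitrogen taken from air vs. from a chemical compound.
air = [w for source, w in observations if source == "Air"]
chemical = [w for source, w in observations if source != "Air"]

for label, weights in (("Air", air), ("Chemical compounds", chemical)):
    print(f"{label}: n={len(weights)}, min={min(weights):.5f}, "
          f"max={max(weights):.5f}, mean={mean(weights):.5f}")

# True when every air weight exceeds every chemical-compound weight.
groups_separate = min(air) > max(chemical)
print("Groups do not overlap:", groups_separate)
```

Every weight obtained from air is heavier than every weight obtained from a compound, so the 15 numbers fall into two non-overlapping groups.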

Box & whisker plot

dot plot

Two separate box & whisker plots
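The numbers behind a box and whisker plot are the five-number summary: minimum, lower quartile, median, upper quartile, maximum. The sketch below computes it for Rayleigh's weights, pooled and then split into two groups. The air versus chemical-compound split and the "inclusive" quartile method are assumptions of this example; other quartile conventions (such as Tukey's hinges) can give slightly different values.

```python
from statistics import quantiles

# Rayleigh's weights, split by whether the nitrogen came from air.
air = [2.31017, 2.30986, 2.31010, 2.31001, 2.31024, 2.31030, 2.31028]
chemical = [2.30143, 2.29816, 2.30182, 2.29890,
            2.29889, 2.29940, 2.29849, 2.29889]

def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

# One box for all 15 weights hides the structure ...
pooled = five_number_summary(air + chemical)

# ... while one box per group shows two tight, separate clusters.
by_group = {
    "Air": five_number_summary(air),
    "Chemical compounds": five_number_summary(chemical),
}

print("Pooled:", [round(v, 5) for v in pooled])
for label, summary in by_group.items():
    print(f"{label}:", [round(v, 5) for v in summary])
```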

Technique
- Find a graph that shows clearly that the data can be divided into two different groups

Appropriate representation depends on your practical goal

Precise descriptions are better

"Most of the key questions in our world sooner or later demand answers to 'by how much?' rather than merely to 'in which direction?'" (Tukey, 1977)

Hick's Law
- Choice Reaction Time experiment
- RT increases with number of possible response alternatives
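Hick's law is a good example of answering "by how much?" rather than only "in which direction?". It is commonly written as RT = a + b * log2(n + 1), where n is the number of response alternatives. The sketch below uses that form; the values of `a` and `b` are made up purely for illustration and are not taken from the lecture or from any experiment.

```python
import math

def hick_rt(n_alternatives, a=0.2, b=0.15):
    """Predicted choice reaction time in seconds under Hick's law.

    a (intercept) and b (seconds per bit) are hypothetical values.
    """
    return a + b * math.log2(n_alternatives + 1)

# With n = 1, 3, 7 the term log2(n + 1) is 1, 2, 3 bits,
# so each step adds the same amount (b) to the predicted RT.
for n in (1, 3, 7):
    print(f"{n} alternatives: {hick_rt(n):.2f} s")
```

The direction (RT goes up) is only part of the description; the logarithmic form says how much it goes up for each added alternative.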

Hick's law


Interpreting EDA
- Multiplicity
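Assuming "multiplicity" here refers to the multiple-comparisons problem (the more patterns one looks for, the more turn up by chance), a small calculation shows why exploratory findings need cautious interpretation. The sketch assumes independent looks at the data, each with a 5% false-positive rate; both are simplifying assumptions, and the function name is illustrative.

```python
def chance_of_false_pattern(n_looks, alpha=0.05):
    """Probability that at least one of n independent looks gives a false positive."""
    return 1 - (1 - alpha) ** n_looks

for n_looks in (1, 10, 20):
    p = chance_of_false_pattern(n_looks)
    print(f"{n_looks} looks: {p:.3f}")
```

With 20 independent looks, the chance of at least one spurious "pattern" is well over one half under these assumptions.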

Interpreting EDA
- Summarise the results
- Discover unanticipated results
  - new line of research, new experiment
  - qualify conclusion from the present study
- Generate hypotheses
- Check assumptions
  - qualify conclusion from the present study
  - address anomalies

NOT (or, rarely) a definitive conclusion

Practical week 7
- Using EDA for data screening in simple & multiple regression

Visualisation
- NameVoyager
- Bullying data

Register for bullying data before the practical!

- e.g. table of raw data
- a representation that the eye finds congenial is of more practical use
