Question 4: What are data and what do they mean to a scientist?

Post on 20-Jan-2016

  • Question: What are data and what do they mean to a scientist?

  • Dinner at the Urquhart House
    Brought to you by the Briggs Multiracial Alliance
    Sunday night
    All food provided (probably Chinese)
    Contact Mimi Reddy, reddydee@msu.edu, for details

  • Data, Statistics, and Spreadsheets
    What are data?
    What are statistics?
    What are spreadsheets?
    How can you analyze data with spreadsheets?

  • Data
    Data are pieces of information
    Data can be numbers, words, or descriptions
    Data have UNITS
    The word "data" is PLURAL; "datum" is singular
    Data about Willoughby:
      Age: 5 (years)
      Height: 47 (inches)
      Weight: 66 (pounds)
      Eyes: Blue
      Favorite word: Wrestle
      Favorite letter: W

  • Types of Data
    Numbers: two types
      Real numbers: values that can have a fractional part, e.g., 28.75 lbs
      Integers: whole numbers, e.g., 18 months
    Letters: called characters in programming
      W is a character
    Words: called strings in programming
      "No thanks" is a string; strings can be individual words or phrases
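As a rough illustration of these data types, the sketch below stores a few values from the slides in Python. The variable names are made up for this example, and Python has no separate character type, so a single letter is simply a string of length one.

```python
# Values taken from the slides; the variable names are illustrative only.
age_years = 5                 # integer (whole number)
example_weight_lbs = 28.75    # real number, stored as a float
favorite_letter = "W"         # a character: in Python, a string of length 1
reply = "No thanks"           # a string can be a single word or a phrase

for value in (age_years, example_weight_lbs, favorite_letter, reply):
    print(repr(value), "is of type", type(value).__name__)
```

Other languages, such as C or Java, do distinguish a character type from a string type, which is closer to the wording on the slide.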

  • Statistics and Data
    Test Scores:
      Jeff: 88
      Mollie: 92
      Marcie: 88
      Dave: 47
      Karim: 99
      Willoughby: 42
      Benjamin: 0
    What statistics can you calculate to describe these data?
    Try to think of four things to describe the data

  • Statistics
    Statistics are derived from the data
    Statistics are descriptions of data
    Statistics are meant to simplify the data
    Statistics can be misleading

  • Typical Statistics
    Sample size: number of individuals measured = n
    Sum = S
    Average or mean = S/n
    Median: value at the 50th percentile; half of the values fall above it, half below
    Maximum, minimum, range (max - min)
    Mode: most common value
    Standard deviation (SD)
    Variance (SD²)
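A minimal sketch of these statistics, applied to the test scores listed on the "Statistics and Data" slide, might look like the following. It uses Python's standard `statistics` module; the variable names are illustrative, and the standard deviation shown is the sample version (dividing by n - 1), which is an assumption since the slide does not say which version is meant.

```python
import statistics

# Test scores from the "Statistics and Data" slide.
scores = {"Jeff": 88, "Mollie": 92, "Marcie": 88, "Dave": 47,
          "Karim": 99, "Willoughby": 42, "Benjamin": 0}
values = list(scores.values())

n = len(values)                          # sample size
total = sum(values)                      # sum, S
mean = total / n                         # S / n
median = statistics.median(values)       # middle value of the sorted scores
mode = statistics.mode(values)           # most common value
maximum, minimum = max(values), min(values)
value_range = maximum - minimum
sd = statistics.stdev(values)            # sample standard deviation (n - 1)
variance = statistics.variance(values)   # sample variance, equal to sd squared

print(f"n={n} sum={total} mean={mean:.2f} median={median} mode={mode}")
print(f"max={maximum} min={minimum} range={value_range}")
print(f"sd={sd:.2f} variance={variance:.2f}")
```

Here the mean (about 65) sits well below the median (88) because a single score of 0 pulls it down, a small example of how one statistic on its own can mislead.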

  • Analyze these data...
    Mean, max, min, range, median, mode
    1833447493829455
    Sample size (n)
    Sum (S)
    Mean = average = S/n, denoted x̄
    Median = halfway value
    Mode = most common value

  • Spreadsheets
    Spreadsheets are tables
    Spreadsheets allow calculations and manipulations of data
    Calculations: mean, standard deviation
    Manipulations: sort,

                  Costa Rica    Nicaragua
    Rainforest       625,000    3,712,000
    Dry Forest        50,000      300,000
    Total            675,000    4,012,000
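To make "calculations and manipulations" concrete, here is a small sketch that treats the forest table as a nested dictionary and reproduces its Total row. The data structure and names are assumptions made for this example, and the slide does not state the units of the figures.

```python
# Forest figures from the slide's table (the slide gives no units).
table = {
    "Rainforest": {"Costa Rica": 625_000, "Nicaragua": 3_712_000},
    "Dry Forest": {"Costa Rica": 50_000, "Nicaragua": 300_000},
}
countries = ["Costa Rica", "Nicaragua"]

# Calculation: column totals, like a sum at the bottom of each column.
totals = {c: sum(row[c] for row in table.values()) for c in countries}

# Manipulation: sort the rows by their Costa Rica value, smallest first.
sorted_rows = sorted(table, key=lambda name: table[name]["Costa Rica"])

print(f"{'':<12}" + "".join(f"{c:>12}" for c in countries))
for name in sorted_rows:
    print(f"{name:<12}" + "".join(f"{table[name][c]:>12,}" for c in countries))
print(f"{'Total':<12}" + "".join(f"{totals[c]:>12,}" for c in countries))
```

The computed totals match the Total row on the slide, which is a quick check that the flattened table was read correctly.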

  • Make a data table:
    Fly 1, length 13.4 mm, velocity 27 kph, age 21 days
    Fly 2, length 9.4 mm, velocity 0 kph, age 220 days
    Fly 3, length 9.3 mm, velocity 44 kph, age 1 day
    Fly 4, length 13.4 mm, velocity 17 kph, age 32 days
    Fly 5, length 17.4 mm, velocity 33 kph, age 11 days
    How many columns? How many rows? Do the numbers go down or across?
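One possible answer to this exercise is sketched below. It assumes the common layout in which each fly is a row and each measured variable is a column, so that numbers of the same kind run down the table; the column names are invented for the example.

```python
# Each row is one fly (one observation); each column is one variable.
columns = ["fly", "length_mm", "velocity_kph", "age_days"]
rows = [
    (1, 13.4, 27, 21),
    (2, 9.4, 0, 220),
    (3, 9.3, 44, 1),
    (4, 13.4, 17, 32),
    (5, 17.4, 33, 11),
]

# Print the header, then one line per fly, in aligned columns.
print("".join(f"{name:>14}" for name in columns))
for row in rows:
    print("".join(f"{value:>14}" for value in row))

print(len(rows), "rows and", len(columns), "columns")
```

Putting the units in the column names (mm, kph, days) keeps them attached to the data without repeating them in every cell.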

  • Data Table

  • Microsoft Excel
    Typical spreadsheet program
    Lotus 1-2-3 was an early commercial spreadsheet (VisiCalc came first)
    Has similar controls to MS Word
    Now allows graphing (charts), but in very restricted formats; it is hard to get exactly what you want
    Excel tables and graphs can be copied into MS Word

  • Friday's Assignment
    We will work with Microsoft Excel to analyze some data
    Groups of two will submit one finished spreadsheet for the assignment

  • Graphs
    Many different types of graphs:
      Points
      Lines
      Bars
      Pies

  • Point Graphs
    Called "X-Y Scatter" in MS Excel
    Plot points based on X and Y values
    Can fit a REGRESSION LINE to the data
    Regression line: the line that best fits the data
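The idea of a regression line can be sketched in a few lines of code. The example below assumes that "best fit" means ordinary least squares, which the slide does not state explicitly, and uses the year and starting-salary values that appear on the bar graph slide of this deck. The function name `fit_line` is made up for the illustration.

```python
# Year and starting-salary values from the bar graph slide in this deck.
years = [1988, 1989, 1990, 1991, 1992, 1993, 1994]
salaries = [36000, 38000, 39500, 41000, 43000, 45000, 47000]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(years, salaries)
print(f"salary is roughly {slope:.2f} * year + {intercept:.2f}")
```

The slope is the average change in Y per unit of X, here roughly 1,800 dollars of starting salary per year.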

  • X-Y Scatter

  • Bar Graphs
    Categorize data into counts or percents
    Categories can be descriptive (Windows 98, Windows 2000, ...)
    Categories can also be numeric, e.g., height: 60-63, 63-66, etc., or just 61, 62, 63
    Count up the number of people in each group
    Histograms are a particular type of bar graph
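The counting step behind a bar graph or histogram can be sketched as follows. The heights are hypothetical values invented for this example, and the rule that a boundary value such as 63 belongs to the upper bin (63-66) is an assumption, since the slide's ranges overlap at their edges.

```python
from collections import Counter

# Hypothetical heights in inches, invented for this example.
heights = [61, 62, 62, 64, 65, 65, 65, 67, 68, 70]
bin_width = 3
first_edge = 60

def bin_label(height):
    """Return a label such as '60-63' for the bin that contains height."""
    low = first_edge + ((height - first_edge) // bin_width) * bin_width
    return f"{low}-{low + bin_width}"

# Count up the number of people in each group.
counts = Counter(bin_label(h) for h in heights)

# Draw a crude text bar graph, one '#' per person.
for label in sorted(counts):
    print(f"{label:>6} | {'#' * counts[label]}")
```

Dividing each count by `len(heights)` would give proportions instead of counts, the other option the slides mention for the Y axis.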

  • Bar Graph
    [Chart: bar graph of Starting Salary by Year]

    Year    Starting Salary
    1988    $36,000
    1989    $38,000
    1990    $39,500
    1991    $41,000
    1992    $43,000
    1993    $45,000
    1994    $47,000

  • Histogram
    X axis is categories
    Y axis is the number or proportion of observations in each category

  • Histogram Bar Graph
    Number of Crashes

  • Regular Bar Graph vs. Histogram Bar Graph
    [Chart: bar graph of Starting Salary by Year, 1988-1994, rising from $36,000 to $47,000]

  • Distributions
    A distribution is a special type of histogram with a continuous numeric scale at the bottom
    The normal distribution is a key concept in statistics
    A skewed distribution is one that is unbalanced

  • Sample distribution histograms
    Danyoungyoo, Katanchalee, and Srichawla, www.s-t.au.ac.th/handout/st2204/week5-Univariate-Des.ppt
    Robert D. Duval, PS 400 Lecture, www.polsci.wvu.edu/duval/ps400/Notes/400Notes.ppt

  • The NORMAL Distribution
    A NORMAL DISTRIBUTION is the theoretical distribution of values given natural variation around a MEAN
    It is a balanced, humped distribution
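The "balanced, humped" shape can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the mean of 100 and standard deviation of 15 are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical mean and standard deviation, chosen only for illustration.
dist = NormalDist(mu=100, sigma=15)

# Balanced: half of the values fall below the mean.
below_mean = dist.cdf(100)

# Humped: most values lie close to the mean.
within_1_sd = dist.cdf(115) - dist.cdf(85)
within_2_sd = dist.cdf(130) - dist.cdf(70)

print(f"below the mean:        {below_mean:.3f}")
print(f"within 1 SD of mean:   {within_1_sd:.3f}")
print(f"within 2 SDs of mean:  {within_2_sd:.3f}")
```

About 68% of values fall within one standard deviation of the mean and about 95% within two, whatever mean and standard deviation are chosen.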

  • Distributions
    Skew is an imbalance in the distribution
    Danyoungyoo, Katanchalee, and Srichawla, www.s-t.au.ac.th/handout/st2204/week5-Univariate-Des.ppt
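Skew can be seen in numbers as well as in pictures. The following sketch reuses the test scores from the earlier "Statistics and Data" slide and computes a moment-based skewness coefficient. That particular coefficient is one common choice and is an assumption here, since the slides do not define a formula for skew.

```python
import statistics

# Test scores from the earlier "Statistics and Data" slide.
scores = [88, 92, 88, 47, 99, 42, 0]

mean = statistics.mean(scores)
median = statistics.median(scores)

# Moment-based skewness: third central moment divided by the
# second central moment raised to the power 1.5.
n = len(scores)
m2 = sum((x - mean) ** 2 for x in scores) / n
m3 = sum((x - mean) ** 3 for x in scores) / n
skewness = m3 / m2 ** 1.5

print(f"mean={mean:.2f} median={median} skewness={skewness:.2f}")
```

A negative value indicates a longer tail on the low side, which matches the mean falling below the median for these scores. A balanced distribution would give a value near zero.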

  • Hypothesis Testing
    Statistical tests are how scientists decide whether data SUPPORT their hypothesis (they do NOT PROVE it)
    Four major statistical tests: t-test, χ² (chi-square) test, regression, ANOVA

  • Hypothesis
    Processor speed has an effect on the performance of a computer.
    Null Hypothesis
    H0: Processor speed has NO EFFECT on the performance of a computer.

  • Statistical Tests and Probability
    Statistical tests give a value
    That value can be related to a probability (P)
    P is the probability of getting data at least as extreme as yours if the NULL hypothesis were true
    If P < 0.05 (1/20), you reject the NULL hypothesis
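One way to see where such a probability comes from is a permutation test, sketched below. This technique is not mentioned in the slides and is swapped in here only because it needs no statistical tables. The benchmark scores for slow and fast processors are hypothetical values invented for the example.

```python
from itertools import combinations

# Hypothetical benchmark scores, invented for illustration.
slow = [10, 12, 14, 16]
fast = [20, 22, 24, 26]

def mean(values):
    return sum(values) / len(values)

observed = abs(mean(fast) - mean(slow))
pooled = slow + fast

# If the null hypothesis were true, the group labels would be arbitrary,
# so try every way of labelling four of the eight scores as "fast".
extreme = 0
total = 0
for chosen in combinations(range(len(pooled)), len(fast)):
    group_a = [pooled[i] for i in chosen]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    total += 1
    # Count relabellings whose difference is at least as large as observed.
    if abs(mean(group_a) - mean(group_b)) >= observed - 1e-9:
        extreme += 1

p_value = extreme / total
print(f"{extreme} of {total} relabellings are as extreme: P = {p_value:.4f}")
```

Only 2 of the 70 possible relabellings give a difference as large as the observed one, so P is below 0.05 and the null hypothesis of no effect would be rejected for these made-up data.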

  • T-Test
    Compares the difference between two means
    Formula: T = (x̄1 - x̄2) / SEM
    SEM is the Standard Error of the Mean: SD/√N, where SD is the sample standard deviation and N is the sample size
    T value: the difference between the means relative to the amount of spread in your data

  • T-Values
    If T > 2.5 or 3.0, the difference is usually significant (this depends on your sample sizes)
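A T value can be computed by hand in a few lines. The sketch below assumes the unpooled (Welch-style) form, in which the denominator combines the standard errors of both group means. The slides name only "SEM", so this choice is an assumption. The function name `t_statistic` and the benchmark scores are hypothetical.

```python
import math
import statistics

def t_statistic(sample_1, sample_2):
    """Difference between two sample means divided by its standard error."""
    mean_1 = statistics.mean(sample_1)
    mean_2 = statistics.mean(sample_2)
    # Squared standard error of each mean: sample variance / sample size.
    se_sq_1 = statistics.variance(sample_1) / len(sample_1)
    se_sq_2 = statistics.variance(sample_2) / len(sample_2)
    standard_error = math.sqrt(se_sq_1 + se_sq_2)
    return (mean_1 - mean_2) / standard_error

# Hypothetical benchmark scores, invented for illustration.
fast = [20, 22, 24]
slow = [10, 12, 14]

t = t_statistic(fast, slow)
print(f"T = {t:.2f}")
```

For these made-up data the means differ by 10 while the spread within each group is small, so T comes out at about 6.1, well above the 2.5 to 3.0 rule of thumb. With samples this small, the exact cutoff depends on the sample sizes, as the slide notes.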