# Question 4: What are data and what do they mean to a scientist?

Posted on 20-Jan-2016

Question: What are data and what do they mean to a scientist?

Dinner at the Urquhart House: Brought to you by the Briggs Multiracial Alliance. Sunday night. All food provided (probably Chinese). Contact Mimi Reddy, reddydee@msu.edu, for details.

Data, Statistics, and Spreadsheets: What are data? What are statistics? What are spreadsheets? How can you analyze data with spreadsheets?

Data: Data are pieces of information. Data can be numbers, words, or descriptions. Data have UNITS. The word data is PLURAL; datum is singular. Data about Willoughby: Age: 5 (years); Height: 47 (inches); Weight: 66 (pounds); Eyes: Blue; Favorite word: Wrestle; Favorite letter: W.
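The following minimal Python sketch illustrates the point that data have units. Each of Willoughby's measurements is stored together with its unit, and descriptive data get no unit. The dictionary layout and the field names are illustrative choices and do not come from the slides.

```python
# Willoughby's data from the slide; each value is paired with its unit.
# Descriptive data (words, letters) have no unit, marked with None.
willoughby = {
    "age": (5, "years"),
    "height": (47, "inches"),
    "weight": (66, "pounds"),
    "eyes": ("blue", None),
    "favorite word": ("wrestle", None),
    "favorite letter": ("W", None),
}

for field, (value, unit) in willoughby.items():
    # Print the unit only when the datum has one
    print(f"{field}: {value} {unit}" if unit else f"{field}: {value}")
```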

Types of Data: Numbers come in two types: real numbers (numbers with a fractional part, such as 28.75 lbs) and integers (whole numbers, such as 18 months). Letters are called characters in programming: W is a character. Words are called strings in programming: "No thanks" is a string; strings can be individual words or phrases.
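Here is a minimal Python sketch of how these types look in one programming language. Choosing Python is an assumption, since the slides name no language. Python also has no separate character type, so a one-character string stands in for a character.

```python
# Python counterparts of the data types on the slide.
weight_lbs = 28.75    # real number -> float
age_months = 18       # integer (whole number) -> int
letter = "W"          # character: Python has no separate char type,
                      # so a one-character str is used instead
reply = "No thanks"   # string -> str (a word or a phrase)

for value in (weight_lbs, age_months, letter, reply):
    print(repr(value), type(value).__name__)
```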

Statistics and Data: Test scores: Jeff: 88; Mollie: 92; Marcie: 88; Dave: 47; Karim: 99; Willoughby: 42; Benjamin: 0. What statistics can you calculate to describe these data?

Try to think of four things to describe the data. STOP.

Statistics: Statistics are derived from the data. Statistics are descriptions of data. Statistics are meant to simplify the data. Statistics can be misleading.

Typical Statistics: Sample size: the number of individuals measured, n. Sum = S. Average or mean = S/n. Median: the value at the 50th percentile; half of the values fall above it and half below. Maximum, minimum, and range (max - min). Mode: the most common value. Standard deviation (SD). Variance (SD²).
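The following minimal Python sketch (not part of the original slides) computes the typical statistics listed above for the test scores on the "Statistics and Data" slide. It assumes the sample standard deviation and variance, which divide by n - 1; this is one of two common conventions. `describe` is a hypothetical helper name.

```python
import statistics

# Test scores from the "Statistics and Data" slide
scores = {"Jeff": 88, "Mollie": 92, "Marcie": 88, "Dave": 47,
          "Karim": 99, "Willoughby": 42, "Benjamin": 0}

def describe(values):
    """Return the slide's 'typical statistics' for a list of numbers."""
    values = list(values)
    n = len(values)                               # sample size
    s = sum(values)                               # sum, S
    return {
        "n": n,
        "sum": s,
        "mean": s / n,                            # mean = S / n
        "median": statistics.median(values),      # 50th percentile
        "mode": statistics.mode(values),          # most common value
        "max": max(values),
        "min": min(values),
        "range": max(values) - min(values),       # max - min
        "sd": statistics.stdev(values),           # sample SD (divides by n - 1)
        "variance": statistics.variance(values),  # sample variance = SD**2
    }

stats = describe(scores.values())
for name, value in stats.items():
    print(f"{name:>8}: {value:.2f}" if isinstance(value, float) else f"{name:>8}: {value}")
```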

Analyze these data: mean, max, min, range, median, mode. 1833447493829455. Sample size (n).

Sum = S

Mean = average = S/n, denoted $\bar{x}$; median = the halfway value

Mode = the most common value
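The definitions on these slides can also be written as formulas. The sketch below uses standard notation and assumes the sample standard deviation, which divides by n - 1 (as Excel's STDEV function does); the population version divides by n instead.

$$
S = \sum_{i=1}^{n} x_i, \qquad \bar{x} = \frac{S}{n}
$$

$$
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad s = \sqrt{s^2}
$$

For the seven test scores on the "Statistics and Data" slide, $S = 456$ and $n = 7$, so $\bar{x} = 456/7 \approx 65.14$, $s^2 \approx 1336.8$, and $s \approx 36.56$.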

Spreadsheets: Spreadsheets are tables.

Spreadsheets allow calculations and manipulations of data. Calculations: mean, standard deviation. Manipulations: sort, …

| | Costa Rica | Nicaragua |
|---|---|---|
| Rainforest | 625,000 | 3,712,000 |
| Dry Forest | 50,000 | 300,000 |
| Total | 675,000 | 4,012,000 |
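The Total row is just the sum of each column, which is the kind of calculation a spreadsheet does with its SUM function. The minimal Python sketch below does the same calculation. The transcript does not give the units, so none are assumed here.

```python
# Forest areas from the table (units are not given in the transcript)
forest = {
    "Costa Rica": {"Rainforest": 625_000, "Dry Forest": 50_000},
    "Nicaragua": {"Rainforest": 3_712_000, "Dry Forest": 300_000},
}

# Same idea as a spreadsheet SUM down each country's column
totals = {country: sum(areas.values()) for country, areas in forest.items()}
print(totals)
```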

Make a data table: Fly 1: length 13.4 mm, velocity 27 kph, age 21 days. Fly 2: length 9.4 mm, velocity 0 kph, age 220 days. Fly 3: length 9.3 mm, velocity 44 kph, age 1 day. Fly 4: length 13.4 mm, velocity 17 kph, age 32 days. Fly 5: length 17.4 mm, velocity 33 kph, age 11 days.

How many columns? How many rows? Do the numbers go down or across?

Data Table
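One conventional layout puts each fly (observation) in its own row and gives each measured variable its own column. That gives 5 rows and 4 columns, counting the fly number. The minimal Python sketch below lays the data out that way and writes it as CSV text, a plain format that spreadsheet programs such as Excel can open. The column names are illustrative.

```python
import csv
import io

# One row per fly, one column per variable
columns = ["fly", "length_mm", "velocity_kph", "age_days"]
rows = [
    [1, 13.4, 27, 21],
    [2, 9.4, 0, 220],
    [3, 9.3, 44, 1],
    [4, 13.4, 17, 32],
    [5, 17.4, 33, 11],
]

# Write the table as CSV text (header line first, then one line per fly)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(columns)
writer.writerows(rows)
print(buffer.getvalue())
print(f"{len(columns)} columns, {len(rows)} rows")
```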

Microsoft Excel: A typical spreadsheet program. Lotus 1-2-3 was an early and dominant commercial spreadsheet (VisiCalc, released in 1979, was the first). Excel has controls similar to MS Word. It now allows graphing (charts), but only in very restricted formats, and it can be hard to get exactly what you want. Excel tables and graphs can be copied into MS Word.

Friday's Assignment: We will work with Microsoft Excel to analyze some data. Groups of two will submit one finished spreadsheet for the assignment.

Graphs: There are many different types of graphs: points, lines, bars, and pies.

Point Graphs: Called X-Y Scatter in MS Excel. Each point is plotted from its X and Y values. You can fit a REGRESSION LINE to the data: the line that best fits the data.

X-Y Scatter
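Fitting a regression line usually means ordinary least squares: choosing the slope and intercept that minimize the sum of squared vertical distances from the points to the line. The minimal Python sketch below applies it to the starting-salary-by-year data from the bar-graph slide; `fit_line` is a hypothetical helper. In Excel, the SLOPE and INTERCEPT worksheet functions, or a linear trendline on an X-Y Scatter chart, give the same least-squares line.

```python
# Starting salary by year (from the worksheet behind the bar-graph slide)
years = [1988, 1989, 1990, 1991, 1992, 1993, 1994]
salaries = [36000, 38000, 39500, 41000, 43000, 45000, 47000]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through the means
    return slope, intercept

slope, intercept = fit_line(years, salaries)
print(f"salary = {intercept:.0f} + {slope:.2f} * year")
```

The slope works out to 50,500/28, or about $1,804 per year.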

Bar Graphs: Sort data into categories and show counts or percents. Categories can be descriptive (Windows 98, Windows 2000, …) or numeric (height: 60-63, 63-66, etc., or just 61, 62, 63). Count up the number of people in each group. Histograms are a particular type of bar graph.

Bar Graph (chart: Starting Salary by year)

| Year | Starting Salary |
|---|---|
| 1988 | $36,000 |
| 1989 | $38,000 |
| 1990 | $39,500 |
| 1991 | $41,000 |
| 1992 | $43,000 |
| 1993 | $45,000 |
| 1994 | $47,000 |

Histogram: The X axis shows the categories. The Y axis shows the number or proportion of observations in each category.
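The minimal Python sketch below builds a histogram's bars by hand: it sorts each observation into a category, then counts, or divides by n to get a proportion. It reuses the test scores from the "Statistics and Data" slide with 10-point bins, which is an arbitrary choice. A score of exactly 100 would land in a bin of its own (100) that this loop does not print.

```python
from collections import Counter

# Test scores from the "Statistics and Data" slide
scores = [88, 92, 88, 47, 99, 42, 0]

# X axis: 10-point categories (0-9, 10-19, ..., 90-99); each score goes
# into the bin that starts at the score rounded down to a multiple of 10
counts = Counter((s // 10) * 10 for s in scores)
n = len(scores)

for start in range(0, 100, 10):
    count = counts.get(start, 0)   # Y axis as a count ...
    share = count / n              # ... or as a proportion
    print(f"{start:>2}-{start + 9:<2} {'#' * count:<3} {share:.2f}")
```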

Histogram Bar Graph: Number of Crashes

Regular Bar Graph vs. Histogram Bar Graph

Distributions: A distribution is a special type of histogram with a continuous numeric scale along the bottom. The normal distribution is a key concept in statistics. A skewed distribution is one that is unbalanced.

Sample distribution histograms (Danyoungyoo, Katanchalee, and Srichawla, www.s-t.au.ac.th/handout/st2204/week5-Univariate-Des.ppt; Robert D. Duval, PS 400 Lecture, www.polsci.wvu.edu/duval/ps400/Notes/400Notes.ppt)

The NORMAL Distribution: A NORMAL DISTRIBUTION is the theoretical distribution of values given natural variation around a MEAN. It is a balanced, hump-shaped (bell-shaped) distribution.
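Python's standard-library `statistics.NormalDist` (Python 3.8+) shows both properties of the normal distribution named above: it is balanced around the mean, and most values fall near it. Here is a minimal sketch using the standard normal distribution (mean 0, SD 1):

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
z = NormalDist(mu=0, sigma=1)

# Balanced: the curve is equally high at the same distance on either side
print(z.pdf(-1), z.pdf(1))

# Share of values within 1, 2 and 3 standard deviations of the mean
for k in (1, 2, 3):
    share = z.cdf(k) - z.cdf(-k)
    print(f"within +/-{k} SD: {share:.3f}")
```

Roughly 68%, 95%, and 99.7% of values lie within 1, 2, and 3 standard deviations of the mean.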

Distributions: Skew is an imbalance in the distribution. (Danyoungyoo, Katanchalee, and Srichawla, www.s-t.au.ac.th/handout/st2204/week5-Univariate-Des.ppt)
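One common way to put a number on skew is the moment coefficient of skewness, g₁ = m₃ / m₂^(3/2), where m₂ and m₃ are the second and third moments about the mean. The sketch below computes it with population moments; other skewness formulas add small-sample corrections. It uses the test scores from the "Statistics and Data" slide, and `skewness` is a hypothetical helper.

```python
import statistics

# Test scores from the "Statistics and Data" slide
scores = [88, 92, 88, 47, 99, 42, 0]

def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # spread
    m3 = sum((x - mean) ** 3 for x in values) / n   # lopsidedness
    return m3 / m2 ** 1.5

print("mean:", round(sum(scores) / len(scores), 2))
print("median:", statistics.median(scores))
print("skewness:", round(skewness(scores), 2))
```

The single score of 0 pulls the mean (about 65.1) well below the median (88), and the coefficient comes out negative: the scores are skewed to the left. A perfectly balanced set such as 1, 2, 3 gives 0.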

Hypothesis Testing: Statistical tests are how scientists decide whether data support their hypothesis (they do NOT PROVE it). Four major statistical tests: the t-test, the χ² (chi-square) test, regression, and ANOVA.

Hypothesis: Processor speed has an effect on the performance of the computer.

Null hypothesis, H₀: Processor speed has NO EFFECT on the performance of a computer.

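Statistical Tests and Probability: A statistical test produces a value (the test statistic). That value can be related to a probability, P. P is the probability of getting data at least as extreme as yours if the NULL hypothesis were true; it is not the probability that the null hypothesis is correct. If P < 0.05 (1 in 20), you reject the NULL hypothesis and conclude that the data support your hypothesis.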
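The path from test statistic to probability to decision can be illustrated with an exact permutation test. This technique is swapped in here because it needs no lookup tables; it is not one of the four tests listed on the Hypothesis Testing slide. Below is a minimal sketch using only Python's standard library. `permutation_p_value` is a hypothetical helper, and the numbers are toy values rather than real measurements.

```python
from itertools import combinations

def permutation_p_value(a, b):
    """Exact two-sided permutation test for a difference in means.

    Under the null hypothesis the group labels do not matter, so every way
    of splitting the pooled values into groups of the original sizes is
    equally likely. P is the share of splits whose mean difference is at
    least as extreme as the observed one.
    """
    pooled = list(a) + list(b)
    n_a = len(a)
    observed = abs(sum(a) / n_a - sum(b) / len(b))

    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
        total += 1
        if diff >= observed - 1e-12:   # small tolerance for float rounding
            extreme += 1
    return extreme / total

p = permutation_p_value([1, 2, 3], [10, 11, 12])
print(f"p = {p:.2f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```

With only three values per group there are 20 possible splits, so the smallest achievable P is 2/20 = 0.10. Even completely separated groups cannot reach P < 0.05, which shows how much sample size matters. Enumerating every split also grows very quickly, so this exact version only suits small groups.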

T-Test: Compares the difference between two means.

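Formula: $T = (\bar{x}_1 - \bar{x}_2)/\mathrm{SE}$, where SE is the standard error of the difference between the two means. For a single group, the standard error of the mean is SEM = SD/√N. T value: the difference between the means compared with the amount of spread in your data.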
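The minimal Python sketch below computes the t value described above. It assumes the Welch form of the standard error of the difference: each group's SEM = SD/√N, combined as √(SEM₁² + SEM₂²). The slide does not say which variant it intends, and pooled-variance t-tests also exist. `t_statistic` is a hypothetical helper, and the two groups are toy numbers.

```python
import math
import statistics

def t_statistic(a, b):
    """Welch's t: difference in means divided by the SE of that difference."""
    se_a = statistics.stdev(a) / math.sqrt(len(a))   # SEM of group a = SD / sqrt(N)
    se_b = statistics.stdev(b) / math.sqrt(len(b))   # SEM of group b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)       # SE of the difference
    return (statistics.mean(a) - statistics.mean(b)) / se_diff

# Toy numbers for illustration only
print(t_statistic([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))
```

For the toy groups, the means differ by 2 and each SEM² is 0.5, so the standard error of the difference is 1 and T is -2. The sign is negative only because the first group has the smaller mean.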

T-Values: If T is larger than about 2.5 or 3.0 (in absolute value), the difference is usually significant; the exact cutoff depends on your sample sizes.