# Question 4: What are data and what do they mean to a scientist?

• Question: What are data and what do they mean to a scientist?

• Data, Statistics, and Spreadsheets
  - What are data?
  - What are statistics?
  - What are spreadsheets?
  - How can you analyze data with spreadsheets?

• Data
  - Data are pieces of information
  - Data can be numbers, words, or descriptions
  - Data have UNITS
  - The word data is PLURAL; datum is singular
  - Data about Willoughby:
    - Age: 5 (years)
    - Height: 47 (inches)
    - Weight: 66 (pounds)
    - Eyes: Blue
    - Favorite word: Wrestle
    - Favorite letter: W

• Types of Data
  - Numbers come in two types:
    - Real numbers, such as rational numbers with a decimal part: 28.75 lbs
    - Integers (whole numbers): 18 months
  - Letters are called characters in programming: W is a character
  - Words are called strings in programming: "No thanks" is a string; a string can be a single word or a phrase
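To make these types concrete, here is a minimal sketch in Python (a language chosen only for illustration; the slides do not name one). It stores Willoughby's data with the units written into the key names, which is an assumed convention, and prints which built-in type each value gets. Python has no separate character type, so the letter W is simply a string of length 1.

```python
# Willoughby's data; units are written into the key names (an illustrative convention)
willoughby = {
    "age_years": 5,               # integer
    "height_inches": 47,          # integer
    "weight_pounds": 66,          # integer
    "eye_color": "Blue",          # string
    "favorite_word": "Wrestle",   # string
    "favorite_letter": "W",       # character: in Python, a string of length 1
}

# The examples from the "Types of Data" slide
examples = {
    "28.75 lbs": 28.75,           # real number -> float
    "18 months": 18,              # integer -> int
    "W": "W",                     # character -> str of length 1
    "No thanks": "No thanks",     # string -> str
}

# Show each value together with the type Python assigns to it
for name, value in {**willoughby, **examples}.items():
    print(f"{name}: {value!r} is a {type(value).__name__}")
```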

• Statistics and Data
  - Test scores: Jeff: 88, Mollie: 92, Marcie: 88, Dave: 47, Karim: 99, Willoughby: 42, Benjamin: 0
  - What statistics can you calculate to describe these data?

Try to think of four things that describe the data. (Stop here.)

• Statistics
  - Statistics are derived from the data
  - Statistics are descriptions of data
  - Statistics are meant to simplify the data
  - Statistics can be misleading

• Typical Statistics
  - Sample size: the number of individuals measured, $n$
  - Sum: $S = \sum x_i$
  - Average or mean: $\bar{x} = S/n$
  - Median: the value at the 50th percentile; half of the values fall above it and half below
  - Maximum, minimum, and range (max - min)
  - Mode: the most common value
  - Standard deviation (SD)
  - Variance ($SD^2$)
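As a sketch of how these statistics come out for the test scores on the Statistics and Data slide, the following Python uses only the standard `statistics` module. One assumption worth flagging: `statistics.stdev` and `statistics.variance` compute the sample versions (dividing by n - 1); `statistics.pstdev` and `statistics.pvariance` divide by n instead.

```python
import statistics

# Test scores from the "Statistics and Data" slide
scores = {"Jeff": 88, "Mollie": 92, "Marcie": 88, "Dave": 47,
          "Karim": 99, "Willoughby": 42, "Benjamin": 0}
values = list(scores.values())

n = len(values)                          # sample size
total = sum(values)                      # sum, S
mean = total / n                         # mean = S / n
median = statistics.median(values)       # middle value after sorting
mode = statistics.mode(values)           # most common value
maximum, minimum = max(values), min(values)
value_range = maximum - minimum          # range = max - min
sd = statistics.stdev(values)            # sample standard deviation (divides by n - 1)
variance = statistics.variance(values)   # sample variance = sd ** 2

print(f"n={n}, S={total}, mean={mean:.2f}, median={median}, mode={mode}")
print(f"max={maximum}, min={minimum}, range={value_range}")
print(f"SD={sd:.2f}, variance={variance:.2f}")
```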

• Analyze these data: mean, max, min, range, median, mode
  1833447493829455
  - Sample size ($n$)
  - Sum ($S$)
  - Mean = average = $S/n$, denoted $\bar{x}$
  - Median = the halfway value
  - Mode = the most common value

• Spreadsheets allow calculations and manipulations of data
  - Calculations: mean, standard deviation
  - Manipulations: sort, …

| | Costa Rica | Nicaragua |
|---|---:|---:|
| Rainforest | 625,000 | 3,712,000 |
| Dry Forest | 50,000 | 300,000 |
| Total | 675,000 | 4,012,000 |
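The Total row is a column sum, the kind of calculation a spreadsheet does with a formula like `=SUM(B2:B3)` (the cell addresses are hypothetical and depend on where the table sits). A minimal Python sketch of the same check follows; the source gives no units, so the numbers are left unitless.

```python
# Rows and columns of the Costa Rica / Nicaragua table (units are not given in the source)
table = {
    "Rainforest": {"Costa Rica": 625_000, "Nicaragua": 3_712_000},
    "Dry Forest": {"Costa Rica": 50_000, "Nicaragua": 300_000},
}
countries = ["Costa Rica", "Nicaragua"]

# Column totals, like a spreadsheet SUM down each column
totals = {c: sum(row[c] for row in table.values()) for c in countries}

# Print the table with the computed Total row, numbers right-aligned with thousands separators
print(f"{'':<12}" + "".join(f"{c:>12}" for c in countries))
for forest, row in table.items():
    print(f"{forest:<12}" + "".join(f"{row[c]:>12,}" for c in countries))
print(f"{'Total':<12}" + "".join(f"{totals[c]:>12,}" for c in countries))
```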

• Make a data table:
  - Fly 1: length 13.4 mm, velocity 27 kph, age 21 days
  - Fly 2: length 9.4 mm, velocity 0 kph, age 220 days
  - Fly 3: length 9.3 mm, velocity 44 kph, age 1 day
  - Fly 4: length 13.4 mm, velocity 17 kph, age 32 days
  - Fly 5: length 17.4 mm, velocity 33 kph, age 11 days

How many columns? How many rows? Do the numbers go down or across?

• Data Table
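One common way to lay out the fly measurements, assumed here rather than prescribed by the slide, is one row per fly and one column per variable, with units in the column headers so they are not repeated in every cell. A short Python sketch that builds and prints that layout:

```python
# One row per fly (individual), one column per variable; units go in the column headers
columns = ["Fly", "Length (mm)", "Velocity (kph)", "Age (days)"]
rows = [
    [1, 13.4, 27, 21],
    [2, 9.4, 0, 220],
    [3, 9.3, 44, 1],
    [4, 13.4, 17, 32],
    [5, 17.4, 33, 11],
]

print(" | ".join(columns))
for row in rows:
    # pad each value to its column header's width so the columns line up
    print(" | ".join(str(v).rjust(len(h)) for v, h in zip(row, columns)))

print(f"{len(columns)} columns, {len(rows)} rows")
```

With this layout, each variable's numbers run down a column and each fly's numbers run across a row.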

• Microsoft Excel
  - A typical spreadsheet program
  - Lotus 1-2-3 was an early, dominant commercial spreadsheet (VisiCalc came before it)
  - Has controls similar to MS Word
  - Now allows graphing (charts), but in very restricted formats; it can be hard to get exactly what you want
  - Excel tables and graphs can be copied into MS Word

• Friday's Assignment
  - We will work with Microsoft Excel to analyze some data
  - Groups of two will submit one finished spreadsheet for the assignment

• Graphs
  - Many different types of graphs: points, lines, bars, pies

• Point Graphs
  - Called X-Y Scatter in MS Excel
  - Plot points based on their X and Y values
  - Can fit a REGRESSION LINE to the data: the line that best fits the data

• X-Y Scatter
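The "line that best fits the data" is usually the ordinary least-squares line, which minimizes the sum of squared vertical distances from the points to the line. A hand-written Python sketch follows; `fit_line` is a hypothetical helper name and the data are made up for illustration. On Python 3.10 or later, `statistics.linear_regression(xs, ys)` returns the same slope and intercept.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # covariance-style and variance-style sums around the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar   # the line passes through (x_bar, y_bar)
    return slope, intercept

# Hypothetical data, made up for illustration
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.3f} x + {intercept:.3f}")
```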

• Bar Graphs
  - Categorize data into counts or percents
  - Categories can be descriptive (Windows 98, Windows 2000, …)
  - Categories can also be numeric, e.g. height: 60-63, 63-66, etc., or just 61, 62, 63
  - Count up the number of people in each group
  - Histograms are a particular type of bar graph

• Bar Graph
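Counting items into categories is the step behind every bar graph. The Python sketch below, with made-up responses and heights, counts descriptive categories with `collections.Counter` and sorts numeric values into 3-inch bins for a histogram. Because bins written as 60-63 and 63-66 share the edge 63, it assumes each bin includes its lower edge and excludes its upper edge.

```python
from collections import Counter

# Descriptive categories: count how many people gave each answer (hypothetical responses)
os_answers = ["Windows 98", "Windows 2000", "Windows 98", "Windows 2000", "Windows 98"]
os_counts = Counter(os_answers)

# Numeric categories (bins) for a histogram; hypothetical heights in inches
heights = [60.5, 61, 62, 63, 63.5, 64, 65, 66, 67.2]
bin_width = 3
start = 60

bin_counts = Counter()
for h in heights:
    # each bin includes its lower edge and excludes its upper edge, e.g. [60, 63)
    low = start + bin_width * int((h - start) // bin_width)
    bin_counts[(low, low + bin_width)] += 1

# Text "bar graphs": one # per person in the category
for label, count in os_counts.items():
    print(f"{label:<13}{'#' * count}")
for (low, high), count in sorted(bin_counts.items()):
    print(f"{low}-{high:<10}{'#' * count}")
```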

• Distributions
  - A special type of histogram with a continuous numeric scale along the bottom
  - The normal distribution is a key concept in statistics
  - A skewed distribution is one that is unbalanced

• Sample distribution histograms
  - Sources: Danyoungyoo, Katanchalee, and Srichawla, www.s-t.au.ac.th/handout/st2204/week5-Univariate-Des.ppt; Robert D. Duval, PS 400 Lecture, www.polsci.wvu.edu/duval/ps400/Notes/400Notes.ppt

• The NORMAL Distribution
  - A NORMAL DISTRIBUTION is the theoretical distribution of values given natural variation around a MEAN
  - It is a balanced (symmetric), humped (bell-shaped) distribution
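Python's standard library includes `statistics.NormalDist` (Python 3.8 and later), which is enough to check the two properties named above. The mean of 100 and standard deviation of 15 are arbitrary values chosen for illustration. The last lines show the familiar rule that about 68% of values fall within one standard deviation of the mean and about 95% within two.

```python
from statistics import NormalDist

# A hypothetical normal distribution: mean 100, standard deviation 15
dist = NormalDist(mu=100, sigma=15)

# Balanced: the curve has the same height at equal distances above and below the mean
print(dist.pdf(85), dist.pdf(115))

# Humped: the curve is highest at the mean and falls off on both sides
print(dist.pdf(100) > dist.pdf(85) > dist.pdf(70))

# Fraction of values expected within 1 and 2 standard deviations of the mean
within_1sd = dist.cdf(115) - dist.cdf(85)
within_2sd = dist.cdf(130) - dist.cdf(70)
print(f"within 1 SD: {within_1sd:.3f}, within 2 SD: {within_2sd:.3f}")
```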

• Distributions
  - Skew is an imbalance in the distribution
  - Source: Danyoungyoo, Katanchalee, and Srichawla, www.s-t.au.ac.th/handout/st2204/week5-Univariate-Des.ppt
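One way to put a number on skew is the moment coefficient of skewness: the average cubed deviation from the mean divided by the standard deviation cubed. It is negative when the long tail is on the low side and positive when it is on the high side. A quicker check is to compare the mean and median: for the test scores on the Statistics and Data slide, the mean (about 65) sits well below the median (88) because of the very low scores. In the Python sketch below, `skewness` is a hypothetical helper; statistical packages often apply a small-sample correction, so their values can differ slightly.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness: mean cubed deviation / SD cubed.
    Uses the population SD (divides by n), with no small-sample correction."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # average squared deviation
    m3 = sum((v - mean) ** 3 for v in values) / n   # average cubed deviation
    if m2 == 0:
        raise ValueError("all values are equal; skewness is undefined")
    return m3 / m2 ** 1.5

# Test scores from the "Statistics and Data" slide
scores = [88, 92, 88, 47, 99, 42, 0]
print(f"mean={statistics.mean(scores):.2f}, median={statistics.median(scores)}")
print(f"skewness={skewness(scores):.2f}")  # negative: the long tail is on the low side
```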

• Hypothesis Testing
  - Statistical tests are how scientists decide whether data support their hypothesis (they do NOT PROVE it)
  - Four major statistical tests: t-test, $\chi^2$ (chi-square) test, regression, ANOVA

• Hypothesis
  - Processor speed has an effect on the performance of the computer.
  - Null hypothesis, $H_0$: Processor speed has NO EFFECT on the performance of a computer.

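• Statistical Tests and Probability
  - A statistical test gives a value (a test statistic)
  - That value can be related to a probability, P
  - P is the probability of getting data at least as extreme as yours if the NULL hypothesis were true (it is not the probability that the NULL hypothesis is correct)
  - If P < 0.05 (1/20), you reject the NULL hypothesis and call the result statistically significant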
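To see how a test value is related to a probability, the sketch below converts a test statistic into a two-sided P value, assuming the statistic follows a standard normal distribution when the NULL hypothesis is true. That is a large-sample approximation; with small samples a t statistic should be compared with the t distribution instead. The z values are arbitrary examples, and `two_sided_p_value` is a hypothetical helper name.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability of a statistic at least as far from 0 as z (either direction),
    assuming the NULL hypothesis is true and the statistic is standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

ALPHA = 0.05  # the 1-in-20 cutoff

for z in (1.0, 1.96, 2.5):
    p = two_sided_p_value(z)
    decision = "reject H0" if p < ALPHA else "do not reject H0"
    print(f"z = {z:.2f}: P = {p:.4f} -> {decision}")
```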

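• T-Test
  - Compares the difference between two means
  - Formula: $T = (\bar{x}_1 - \bar{x}_2)/SE$, where $SE$ is the standard error of the difference between the means, built from each group's SEM
  - SEM is the standard error of the mean: $SEM = SD/\sqrt{N-1}$, with SD computed using $N$ in the denominator (equivalently $s/\sqrt{N}$, where $s$ is the sample SD computed with $N-1$)
  - T value: the difference between the means relative to the amount of spread in your data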
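A sketch of the two-sample T calculation in Python. It uses Welch's version, which divides the difference between the means by the square root of the sum of the two squared standard errors; that is one common choice (the pooled Student's t-test is another), and the slide does not say which it means. The benchmark scores are hypothetical, as is the `t_statistic` helper. A negative T only means the first group's mean is smaller; for significance, what matters is the size of T.

```python
import math
import statistics

def t_statistic(group1, group2):
    """Welch's t: difference between means divided by the standard error of that difference."""
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    # standard error of each mean: sample SD / sqrt(n), the same as population SD / sqrt(n - 1)
    sem1 = statistics.stdev(group1) / math.sqrt(len(group1))
    sem2 = statistics.stdev(group2) / math.sqrt(len(group2))
    se_diff = math.sqrt(sem1 ** 2 + sem2 ** 2)
    return (mean1 - mean2) / se_diff

# Hypothetical benchmark scores for computers with slower and faster processors
slow = [1, 2, 3, 4, 5]
fast = [3, 4, 5, 6, 7]
print(f"T = {t_statistic(slow, fast):.2f}")
```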

• T-Values
  - If $|T| > 2.5$ or $3.0$, the difference is usually significant (the exact cutoff depends on your sample sizes)

