

1

Aaron Stevens
30 September 2010

CS101 Lecture 10: Excel Data Analysis

"There are three kinds of lies: lies, damned lies, and statistics."

- Mark Twain paraphrasing Benjamin Disraeli

2

What You’ll Learn Today

– How do we describe data?
– How do we find relationships within data?
– How do we analyze data in Excel?
– To what extent do two datasets vary together?
– Can we describe relationships between data as equations?


3

What is Data Analysis?

Data analysis is the process used to get from raw data to results that can be used to make decisions.

Results of data analysis can be used for:
– Detecting trends
– Making predictions

4

Example Data

We have some data describing how well movies did at the box office and in video sales (sales in $ millions):


5

Descriptive Statistics

Descriptive statistics answer basic questions about the central tendency and dispersion of data observations.
– Range of values
– Middle value
– Frequency distribution

6

Descriptive Statistics

Calculating descriptive statistics using Excel:

Menu: Tools -> Data Analysis


7

Descriptive Statistics

– Mean, Standard Error, Median, Mode, Standard Deviation, Range, Min, Max, Sum, Count.
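
The statistics listed above can also be computed outside Excel. The following is a minimal Python sketch, not the lecture's method: the `sales` values are made up for illustration (they are not the movie data), `describe` is a hypothetical helper name, and the sample standard deviation (dividing by n - 1) is assumed.

```python
import math
import statistics

# Made-up illustrative values (NOT the lecture's movie data).
sales = [12.0, 15.0, 15.0, 20.0, 28.0, 30.0]


def describe(values):
    """Return the summary statistics named on the slide for a list of numbers."""
    n = len(values)
    # Assumption: sample standard deviation (divides by n - 1).
    std_dev = statistics.stdev(values)
    return {
        "mean": statistics.mean(values),
        "standard_error": std_dev / math.sqrt(n),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "standard_deviation": std_dev,
        "range": max(values) - min(values),
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "count": n,
    }


summary = describe(sales)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The standard error here is the standard deviation divided by the square root of the count, so it shrinks as more observations are added.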

8

Histogram

A histogram describes the frequency distribution of the data observations as grouped into "buckets."
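
To make the idea of buckets concrete, here is a small Python sketch that counts how many observations fall into each bucket. The data values, the bucket width of 10, and the half-open bucket boundaries are all assumptions of this example. Excel's histogram tool may define its bins differently.

```python
# Made-up illustrative values (NOT the lecture's data).
sales = [5, 12, 18, 22, 25, 31, 38, 47]


def histogram(values, bucket_width):
    """Count how many values fall into each bucket [start, start + width)."""
    counts = {}
    for v in values:
        # Round down to the start of the bucket that contains v.
        start = (v // bucket_width) * bucket_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


buckets = histogram(sales, 10)
for start, count in buckets.items():
    # One asterisk per observation gives a quick text bar chart.
    print(f"{start:>3}-{start + 10 - 1:<3} {'*' * count}")
```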


9

Relationships Between Series

A dot plot (scatter plot) graphically shows the relationship between pairs of observations in two data series.

10

Relationships Between Series

This plot shows an apparent relationship between box office revenue and video sales revenue.


11

Correlation

Correlation is the extent to which variables in two different data series tend to move together (or apart) over time.

12

Inverse Correlation

An inverse correlation exists when two data series move in opposite directions.


13

Correlation

Example of a weak correlation

14

Describing Correlation

A correlation coefficient describes the strength of the correlation between two series. Values range from -1.0 to 1.0, inclusive.

Positive correlation: large values of one set are associated with large values of the other and vice versa.
Negative correlation: small values of one set are associated with large values of the other and vice versa.
Zero correlation: the values in the two sets are not correlated linearly.
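
As a rough illustration of how such a coefficient can be computed, the sketch below implements the Pearson correlation coefficient in plain Python. The function name `correlation` and the three short series are made up for this example. It assumes neither series is constant, since a constant series would cause a division by zero.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How the two series vary together, and how each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)


# Made-up series: one rises with xs, the other falls as xs rises.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

print(correlation(xs, rising))   # perfectly positive correlation
print(correlation(xs, falling))  # perfectly negative (inverse) correlation
```

In Excel, the `CORREL` worksheet function computes a correlation coefficient for two ranges.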


15

What exactly is the relationship?

Correlation measures whether a linear relationship exists between two series of data.

Linear Regression attempts to find the relationship between the two series and expresses this relation with a linear equation.

Linear equation in the form: y = mx + b

16

Linear Regression

To run a linear regression:

Select a dependent variable (y) and an independent variable (x).
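
For readers curious what a regression tool does with the chosen x and y, the following is a minimal sketch of an ordinary least-squares fit for the line y = mx + b. It is an illustration under stated assumptions, not a description of Excel's internals: `linear_regression` is a hypothetical helper, and the two series are made up so that the exact answer is easy to check.

```python
def linear_regression(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx            # slope
    b = mean_y - m * mean_x  # intercept: the line passes through the means
    return m, b


# Made-up data lying exactly on y = 2x + 5 (NOT the lecture's movie data).
independent_x = [10, 20, 30, 40]
dependent_y = [25, 45, 65, 85]

m, b = linear_regression(independent_x, dependent_y)
print(f"y = {m}x + {b}")
```

Swapping the roles of x and y generally gives a different line, which is why the choice of dependent and independent variable matters.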


17

Linear Regression Analysis

What does this output tell us?
It describes the relationship in terms of an equation:

Video sales = -140 + 4.33 (Box office sales)
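
Once the equation is known, it can be used to make predictions. The sketch below plugs a few box office figures into the equation from the slide. The coefficients -140 and 4.33 come from the slide, while the function name and the sample inputs are made up for illustration. All amounts are in $ millions.

```python
# Coefficients taken from the slide's regression output.
INTERCEPT = -140.0
SLOPE = 4.33


def predict_video_sales(box_office_sales):
    """Predicted video sales ($ millions) for given box office sales ($ millions)."""
    return INTERCEPT + SLOPE * box_office_sales


# Hypothetical box office figures, chosen only to show the arithmetic.
for box_office in (50, 100, 150):
    print(box_office, round(predict_video_sales(box_office), 2))
```

Because the intercept is negative, the line predicts negative video sales for box office sales below roughly 32 ($ millions), so the equation is best trusted only within the range of the data it was fitted to.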



19

Linear Regression Plot

20

How good is the fit?

The R-Square statistic describes how much of the variation in the Y variable is explained by variation in the X variable.
– R-Square = 1 is perfect.
– R-Square > 0.5 is often considered good (a rule of thumb).
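
One common way to compute R-Square is one minus the ratio of unexplained variation (squared residuals) to total variation in Y. The sketch below follows that definition. The function name, the data, and the line y = 1.9x + 0 (the least-squares line for this made-up data) are assumptions of the example.

```python
def r_squared(xs, ys, m, b):
    """Share of the variation in ys explained by the line y = m*x + b."""
    mean_y = sum(ys) / len(ys)
    # Total variation of y around its mean.
    total = sum((y - mean_y) ** 2 for y in ys)
    # Variation left over after the line's predictions are subtracted.
    residual = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    return 1 - residual / total


# Made-up data (NOT the lecture's) and its least-squares line y = 1.9x + 0.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
fit_quality = r_squared(xs, ys, m=1.9, b=0.0)
print(round(fit_quality, 3))
```

When every point lies exactly on the line, the residual variation is zero and the function returns 1.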


21

How good is the fit?

The P-value statistic describes the likelihood that randomness explains the values of the equation's coefficients.
– P-value > 0.05 or 0.10 suggests the coefficient may be due to randomness.
– P-value near 0.0 suggests non-randomness.
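
To build intuition for what a P-value measures, the sketch below estimates one with a permutation test: it shuffles the Y values many times and counts how often a shuffled dataset produces a slope at least as steep as the observed one. This is a swapped-in technique chosen for simplicity, not the calculation Excel's regression output performs. The function names, the data, the trial count, and the fixed random seed are all assumptions of the example.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often shuffled data gives a slope as steep as the real one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    at_least_as_steep = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real link between x and y
        if abs(slope(xs, shuffled)) >= observed - 1e-12:
            at_least_as_steep += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (at_least_as_steep + 1) / (trials + 1)


# Made-up data: one series follows xs exactly, the other has zero slope.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
strong = [3, 5, 7, 9, 11, 13, 15, 17]
unrelated = [5, 1, 4, 2, 6, 3, 5, 2]

p_strong = permutation_p_value(xs, strong)
p_unrelated = permutation_p_value(xs, unrelated)
print(p_strong, p_unrelated)
```

A small estimate for `strong` means shuffled data almost never reproduces such a steep slope, while a large estimate for `unrelated` means random arrangements do so routinely.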

22

What You Learned Today

– Data Analysis
– Descriptive Statistics
– Correlation
– Linear Regression


23

Student To Dos

– HW04 (Excel Data Analysis) due WED 10/6
– Quiz 2 is on TUE 10/5

• Covers lectures 7, 8, 9, 10 (Excel)