Chapter 2 Organizing/Displaying Data 2.1 Bar, Circle and Time-Series Graphs

Upload: helena-baldwin

Post on 26-Dec-2015


Page 1:

Chapter 2 Organizing/Displaying Data

2.1 Bar, Circle and Time-Series Graphs

Page 2:

Exploratory Data Analysis

Exploratory data analysis (EDA) is a method of studying data that uses displays such as stem/leaf plots and histograms. It allows for exploration, pattern finding, and observation of extreme values.

EDA is used when you have general data but are not sure where it might lead or you have few prior assumptions. This is opposed to an experiment where specific data is collected (perhaps with controls) and the observer has particular questions in mind.

Page 3:

Bar Graphs

Segmented bar chart

Page 4:

Bar Graphs

Vertical or horizontal

Quantitative or qualitative data

Bars of uniform width and uniform spacing between

Lengths represent values of variables, frequency of occurrence or % of occurrence

Labeled, titled, scales

Sometimes scales on sides are general but you will also see a label on top of a bar to give more specific information

You can change the scale by putting in a “break” on the vertical axis.

Page 5:

Area Principle

The area occupied by a part of a graph should correspond to the magnitude of value it represents. Otherwise, the picture can be misleading even though it is labeled correctly.
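A quick sketch of why this matters, using made-up numbers (the values 100 and 200 are hypothetical and not read from any chart): when a value doubles, a bar that only grows in length doubles in area, but a picture scaled in both height and width quadruples in area.

```python
# Hypothetical values: the quantity being graphed doubles from 100 to 200.
old_value, new_value = 100, 200

# A bar grows in length only, so its area stays proportional to the value.
bar_width = 1.0
bar_area_ratio = (new_value * bar_width) / (old_value * bar_width)

# A picture scaled in BOTH height and width squares the scale factor.
scale = new_value / old_value
picture_area_ratio = scale * scale

print(f"value ratio:        {new_value / old_value:.1f}")
print(f"bar area ratio:     {bar_area_ratio:.1f}")
print(f"picture area ratio: {picture_area_ratio:.1f}")
```

The picture looks four times as large even though the value only doubled, which is the kind of misleading impression the area principle warns about.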

Page 6:

Figure: Average amount spent on holiday gifts per child (years 1970, 1980, 1990, 2000; scale marked at 50, 100, 150, 200)

Page 7:

Pareto charts

Page 8:

Circle Graphs (i.e. Pie charts)

Page 9:

Circle Graphs (i.e. Pie Charts)

Each wedge displays a proportional part of the total population (that is, the percentage that gives a particular answer or shares a characteristic).

OK for qualitative data.
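As a rough sketch of how wedge sizes follow from counts, the snippet below turns category counts into percentages and wedge angles. The survey counts and the helper name `wedges` are hypothetical, chosen only for illustration.

```python
# Hypothetical survey counts: how students get to school.
counts = {"Bus": 30, "Car": 50, "Walk": 20}

def wedges(counts):
    """Return {category: (percent, angle_in_degrees)} for a circle graph."""
    total = sum(counts.values())
    return {k: (100 * v / total, 360 * v / total) for k, v in counts.items()}

result = wedges(counts)
for name, (pct, angle) in result.items():
    print(f"{name:<5} {pct:5.1f}%  {angle:6.1f} degrees")
```

Each wedge's angle is its share of the total times 360 degrees, so the angles always add up to a full circle.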

Page 10:

Time Series Graph (Time Plot)

Page 11:

Time Series Graph (Time Plot)

Data plotted in sequential order

Data sequencing is at regular intervals

Time series data must be collected for the same variable for the same subject at regular intervals over a period of time. NASDAQ, NYSE, rainfall, etc. are examples of time series data.
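One way to check the "regular intervals" requirement is sketched below. The weekly rainfall readings and the helper name `is_regular` are hypothetical; the check simply confirms the dates are in order and equally spaced.

```python
from datetime import date

# Hypothetical weekly rainfall readings (mm) for one location.
readings = [
    (date(2024, 1, 1), 12.0),
    (date(2024, 1, 8), 7.5),
    (date(2024, 1, 15), 0.0),
    (date(2024, 1, 22), 3.2),
]

def is_regular(series):
    """True if consecutive observations are in order and equally spaced."""
    days = [d for d, _ in series]
    gaps = [later - earlier for earlier, later in zip(days, days[1:])]
    return all(g == gaps[0] and g.days > 0 for g in gaps)

print(is_regular(readings))
```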

Page 12:

Be Careful!!

Make sure your graph is actually saying something…

Some examples of poor graphs


Page 14:

2.2 More Data Display

Histograms, Frequency Tables and Contingency Tables

Page 15:

Displaying data by counts

Sometimes there is a lot of data. One way to evaluate data is to list the counts, or how many times a particular answer is given. Imagine that the senior and junior classes are asked to choose their favorite car color out of three choices. One way to show that data is a contingency table.

Page 16:

Contingency Table

Also called a Two Way Table

A contingency table is a way to display and analyze the relationship between 2 (or more) sets of categorical data.

          Red    Blue    Silver    TOTAL
Female     41      62        33      136
Male       59      22        50      131
TOTAL     100      84        83      267

The TOTAL row and the TOTAL column hold the marginal totals. The bottom-right entry (267) is the grand total.

If the data is given as percentages, it may be called a two-way relative frequency table.
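The marginal totals, grand total and percentages can all be computed from the cell counts. The sketch below uses the six cell counts from the car-color table; the variable names are only illustrative.

```python
# Cell counts from the car-color table (rows: Female, Male).
colors = ["Red", "Blue", "Silver"]
table = {
    "Female": [41, 62, 33],
    "Male": [59, 22, 50],
}

# Marginal totals: sum across each row and down each column.
row_totals = {group: sum(cells) for group, cells in table.items()}
column_totals = [sum(col) for col in zip(*table.values())]
grand_total = sum(row_totals.values())

# Relative frequencies: each cell as a percentage of the grand total.
percentages = {
    group: [round(100 * n / grand_total, 1) for n in cells]
    for group, cells in table.items()
}

print(row_totals)
print(dict(zip(colors, column_totals)))
print(grand_total)
print(percentages)
```

A useful check is that the row totals and the column totals both add up to the same grand total.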

Page 18:

Histogram

Bars touch

Width of bars represents a quantitative value (the class)

Height indicates frequency (how many individuals give a response in each particular class)

Some books call the bars bins

Great way to evaluate large quantities of data

Page 19:

Types of Histograms

Symmetric Histogram

Bimodal Histogram

Uniform/Rectangular Histogram

Skewed Left/Right Histograms

Page 20:

Frequency Table

A frequency table is used to organize data for drawing the histogram.

Class: category or interval

Class Width: width of a particular interval

Class Frequency: # of tally marks for a particular class

Lower/Upper Class Limit (LCL/UCL): lowest/highest data value that can fit in a particular class

Lower Class Limit + Class Width = smallest value (lower class limit) for the next class

Class Boundaries (for whole-number data): Upper Class Boundary = UCL + ½; Lower Class Boundary = LCL - ½

Class Midpoint = (LCL + UCL) / 2
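The definitions above can be sketched in a few lines. This assumes whole-number data (so boundaries sit half a unit outside the limits); the starting limit, class width and the helper name `build_classes` are hypothetical.

```python
def build_classes(first_lower_limit, class_width, n_classes):
    """Return limits, boundaries and midpoint for each class.

    Assumes whole-number data, so each boundary is half a unit
    outside the corresponding class limit.
    """
    classes = []
    for i in range(n_classes):
        lcl = first_lower_limit + i * class_width  # LCL of next class = LCL + width
        ucl = lcl + class_width - 1
        classes.append({
            "limits": (lcl, ucl),
            "boundaries": (lcl - 0.5, ucl + 0.5),
            "midpoint": (lcl + ucl) / 2,
        })
    return classes

classes = build_classes(first_lower_limit=1, class_width=8, n_classes=3)
for c in classes:
    print(c)
```

Note that the upper boundary of one class equals the lower boundary of the next, which is why histogram bars touch.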

Page 21:

How to make a Histogram

1. Make a frequency table
a) Determine # of classes and class width
b) Find LCL and UCL
c) Tally data and find the class frequency (CF)
d) Find midpoints
e) Find boundaries

2. Put class boundaries on the horizontal axis, frequencies on the vertical axis

3. Draw a bar with width extending between class boundaries, whose height = that particular class frequency

Class Width = (Largest Data Value - Smallest Data Value) / Desired # of Classes
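The steps above can be sketched as follows. The pulse values are hypothetical, and the code assumes whole-number data and bumps the class width up to the next whole number above the quotient so the largest value still fits in the last class (an assumption of this sketch, not something stated on the slide).

```python
def frequency_table(data, n_classes):
    """Return (class_width, rows); each row is (LCL, UCL, class frequency)."""
    low, high = min(data), max(data)
    # Assumption: whole-number data, width taken as the next whole number
    # above (high - low) / n_classes so the largest value fits.
    width = (high - low) // n_classes + 1
    freqs = [0] * n_classes
    for x in data:
        freqs[(x - low) // width] += 1  # tally x into the class that contains it
    rows = [(low + i * width, low + (i + 1) * width - 1, freqs[i])
            for i in range(n_classes)]
    return width, rows

# Hypothetical pulse rates (beats per minute).
pulses = [62, 65, 68, 70, 71, 72, 72, 74, 75, 78, 80, 81, 84, 88, 91]
width, rows = frequency_table(pulses, n_classes=5)

for lcl, ucl, freq in rows:
    # Class boundaries label each row; the stars stand in for the bar height.
    print(f"{lcl - 0.5:>5} to {ucl + 0.5:<5} {'*' * freq}")
```

Turned on its side, the rows of stars give the same shape a drawn histogram would have.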

Page 22:

TI 83

Enter the data by hand into L1: press “STAT”, “EDIT” and start typing into List 1.

Hit “2nd” “StatPlot”; turn “ON” Plot1 and select the histogram picture. The TI-83 should automatically select the correct list. If it doesn’t, change it by pressing “2nd” and then the list name you want (the list names are above the number keys).

Hit “GRAPH” and you will see a histogram.

Go to “WINDOW” and change the x scale (Xscl) to the class width; that forces it to match your choice of classes.

“TRACE” then allows you to see class information.

Page 23:

Excel

Enter all the data by hand.

Select “Tools”, “Data Analysis”, “Histogram” (you might need to install the Analysis ToolPak add-in first; do it).

Input range is your range of data values.

Bin range is the list that you create somewhere else in your table that lists the maximum value for each class. This will force it to make the # of classes you want.

Then click OK. It will put the output on another worksheet in your file.

Page 24:

Ogive

An ogive is a graph of points, usually connected by line segments, that shows the accumulation (cumulative frequency) at each level.
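A minimal sketch of the accumulation: the cumulative frequency at each level is the running total of the class frequencies up to that point. The boundaries and frequencies below are hypothetical.

```python
from itertools import accumulate

# Hypothetical upper class boundaries and class frequencies.
upper_boundaries = [67.5, 73.5, 79.5, 85.5, 91.5]
frequencies = [2, 5, 3, 3, 2]

# Running total of the frequencies gives the height of each ogive point.
cumulative = list(accumulate(frequencies))
ogive_points = list(zip(upper_boundaries, cumulative))

for boundary, total in ogive_points:
    print(f"{boundary:>5}: {total}")
```

The last point always equals the total number of data values, so an ogive never goes down from left to right.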


Page 26:

2.3 Stem/Leaf and Dotplots

Page 27:

What is a stem/leaf display?

A stem/leaf display is another way to show data in a histogram-like way without losing the actual individual data values. Data such as 25, 26, 30, 31, 32, 33, 34, 35, 35, 40, 41, 44 would display like this:

Stem │ Leaf

2 │ 5 6

3 │ 0 1 2 3 4 5 5

4 │ 0 1 4

It is arranged so that the stem is the left digit(s) and the leaves are the right digit(s).

Page 28:

Why a stem/leaf display

Turn that data table sideways and it looks like a histogram – the class with more entries (higher frequency) extends further right.

Let's turn our day 1 pulse exercise into a stem/leaf display.

Page 29:

How to make a stem/leaf display

1. Choose which digits will be the stem and which will be the leaf.

2. Align stems from smallest to largest *

3. Place the leaves on line with the corresponding stem

4. Label to indicate representation, i.e. 6│1 = 61 beats per minute.
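The steps above can be sketched as follows for two-digit whole numbers, using the tens digit as the stem and the ones digit as the leaf. The helper name `stem_and_leaf` is hypothetical; the data is the example list 25, 26, 30, ... 44 from the earlier slide.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative whole numbers by stem (tens) and leaf (ones digit)."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    # Keep empty stems too, so gaps in the data stay visible.
    return {s: groups.get(s, []) for s in range(min(groups), max(groups) + 1)}

data = [25, 26, 30, 31, 32, 33, 34, 35, 35, 40, 41, 44]
display = stem_and_leaf(data)

for stem, leaves in display.items():
    print(f"{stem} │ {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 2 │ 5 = 25")
```

Sorting the values first means the leaves on each line come out in increasing order.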

Page 30:

Dot Plot

A dot plot is a little more basic. One axis is the individuals (or perhaps a basic count value), the other is the quantitative data values, and a dot represents each data value.

These are the Kentucky Derby winning times from 1875 through 2004.

Any idea why there are two clusters? (Hint: something happened in 1896, and it has nothing to do with steroids)
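A rough text version of a dot plot can be sketched as below: one row per value on the number line and one dot per data value. The data and the helper name `dot_plot` are hypothetical, and the sketch assumes whole-number values.

```python
from collections import Counter

def dot_plot(values):
    """Return text rows: one row per whole-number value, one dot per occurrence."""
    counts = Counter(values)
    # Walk the whole range so values with no data show up as gaps.
    return [f"{v:>3} | " + "." * counts[v]
            for v in range(min(values), max(values) + 1)]

# Hypothetical whole-number data.
lines = dot_plot([2, 3, 3, 4, 4, 4, 5, 7])
print("\n".join(lines))
```

Clusters and gaps, like the two groups in the Kentucky Derby times, show up as runs of long rows separated by empty ones.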

Page 31:

Blogs

I found this guy’s blog that was really interesting. He was musing over his iPod playlist and wondered how many times some of his songs had played.

Page 32:

“I exported the Library and wrote some python scripts to extract data …It turns out I have 208 unplayed songs in my library, and additionally lots of low single digit playcount songs. Here’s an (ugly excel generated) histogram”

Page 33:

While I was delving around, I figured I would see if there’s any correlation between the length of time a song has been in my library, and the number of times it’s been played. The dot plot turned out interesting.

Page 34:

Resources

http://www.monkeyatlarge.com/archives/2006/07/

Bock, V., Velleman, P., De Veaux, R., Stats: Modeling the World, 2nd Edition, Boston, Pearson Addison Wesley, p. 49