Graphical and Tabular Descriptive Techniques Statistics for Management and Economics Chapter 2 Updated: 03/30/22

Upload: aubrey-long

Post on 14-Dec-2015


Graphical and Tabular Descriptive Techniques

Statistics for Management and Economics

Chapter 2

Updated: 04/18/23

Objectives

• Types of Data and Information

• Graphical and Tabular Techniques for Nominal Data

• Graphical Techniques for Interval Data

• Describing the Relationship Between Two Variables

• Describing Time-Series Data

Types of data and information

A variable is a characteristic of a population or sample of interest.

Data, or data points, are the actual values of variables. There are three general types: interval, nominal, and ordinal.

We also commonly designate them as quantitative or categorical.

Types of data and information: Interval Data

Real numbers, for example: heights, weights, prices, etc.

Also referred to as quantitative or numerical.

Arithmetic operations can be performed on interval data, so it is meaningful to talk about 2*Height, or Price + $1, and so on.

Types of data and information: Nominal Data

Nominal data are also called qualitative or categorical.

The values of nominal data are categories, for example: responses to questions about marital status, coded as Single = 1, Married = 2, Divorced = 3, Widowed = 4

Because the numbers are arbitrary, arithmetic operations don’t make any sense (e.g. does Widowed ÷ 2 = Married?!)

Types of data and information: Ordinal Data

Ordinal data appear to be categorical in nature, but their values have an order, or ranking. For example, a college course rating system: poor = 1, fair = 2, good = 3, very good = 4, excellent = 5.

Order is maintained no matter what numeric values are assigned to each category.

While it’s still not meaningful to do arithmetic on this data (e.g. does 2*fair = very good?!), we can say things like:

excellent > poor or fair < very good
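A minimal Python sketch of this point, assuming the rating codes from the slide plus a second, made-up coding (`ALTERNATE_CODES`) chosen only to illustrate that any increasing set of numbers works. Comparisons and sorting give the same answer under both codings, while an arithmetic statement such as 2*fair = very good holds under one coding and fails under the other.

```python
# Coding of the course ratings as given on the slide.
RATING_CODES = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}

# A made-up alternative coding (an assumption for illustration):
# any strictly increasing numbers preserve the same order.
ALTERNATE_CODES = {"poor": 10, "fair": 20, "good": 35, "very good": 70, "excellent": 100}


def ranks_higher(a, b, codes=RATING_CODES):
    """Return True if rating a ranks above rating b under the given coding."""
    return codes[a] > codes[b]


# Made-up responses, for illustration only.
responses = ["good", "poor", "excellent", "fair", "good"]

# Sorting depends only on the order, so both codings agree.
sorted_default = sorted(responses, key=RATING_CODES.get)
sorted_alternate = sorted(responses, key=ALTERNATE_CODES.get)

# Arithmetic depends on the arbitrary numbers, so the codings disagree.
doubling_matches_default = 2 * RATING_CODES["fair"] == RATING_CODES["very good"]
doubling_matches_alternate = 2 * ALTERNATE_CODES["fair"] == ALTERNATE_CODES["very good"]
```

Because the result of the arithmetic changes with the coding, it says nothing about the ratings themselves, whereas the ordering does.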

Types of data and information

Cross-sectional data is data that is collected at a certain point in time

• Marketing survey

• Starting salaries of MBA graduates

Longitudinal data is collected over a period of time. Sometimes also referred to as time-series data.

• Weekly starting prices of gold

• Daily bid price on posted eBay item

Types of data and information

Prospective data is collected from the current point into the future.

Retrospective or historical data is collected on events that have happened in the past.

• Compare total return on past years for a particular IRA

Graphical and Tabular Techniques for Nominal Data

To summarize nominal data, counts (frequencies) are used to describe the number of observations in each category.

These counts are reported in a table called a frequency distribution.

If the percents (or proportions) associated with each count are also reported, this is a relative frequency distribution (i.e., the amount in each group relative to the total).
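The two tables described above can be sketched in a few lines of Python. The responses below are made up for illustration, and `frequency_distribution` is a hypothetical helper name, not something from the course materials.

```python
from collections import Counter

# Made-up marital-status responses, for illustration only.
responses = [
    "Single", "Married", "Married", "Divorced",
    "Single", "Married", "Widowed", "Married",
]


def frequency_distribution(values):
    """Map each category to (count, proportion of the total)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: (n, n / total) for category, n in counts.items()}


table = frequency_distribution(responses)

# Print the frequency and relative frequency side by side.
for category, (count, proportion) in table.items():
    print(f"{category:<10}{count:>5}{proportion:>8.1%}")
```

The counts alone form the frequency distribution; the proportions, which sum to 1, form the relative frequency distribution.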

Graphical and Tabular Techniques for Nominal Data

Bar Chart

• Often used to display frequencies

• The bars represent each category

• Height of the bar represents the frequency

• The base of the bar is arbitrary

Pie Chart

• Shows relative frequencies

• A circle, divided into a number of slices, each of which represents a category

• The size of the “slice” is proportional to the size of the percentage for the corresponding category

Graphical and Tabular Techniques for Interval Data

Histogram, stem-and-leaf plot, and the ogive are used when the data are interval (i.e. numeric, non-categorical).

The most important of these graphical methods is the histogram.

The histogram is not only a powerful graphical technique used to summarize interval data, but it is also used to help explain probabilities.

Histogram

A bar-like graph where the bases are intervals and heights are frequencies

Steps to building a histogram:

1. Collect data

2. Create a frequency distribution

3. Draw the histogram

Interpretation of the histogram: Symmetry, Skewness, Modality, Bell Shape

Frequency Distribution for Interval Data

Interval data has to be broken down into a series of intervals, or classes

Table 2.6 shows the approximate number of classes in a frequency distribution based on the number of observations (the more observations, the more classes)

Sturges’ formula: Number of class intervals = 1 + 3.3 log(n), where n is the number of observations and log is the base-10 logarithm

Determine interval widths: (largest obs – smallest obs) / number of classes

Summarize data into classes
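As a rough Python sketch of these steps: the function names are hypothetical, rounding the Sturges result up to a whole number is an assumption (the slide does not say how to round), and the data are made up. In practice the class width is usually also rounded to a convenient number, which this sketch does not do.

```python
import math


def sturges_classes(n):
    """Number of classes from Sturges' formula, 1 + 3.3 log10(n).

    Rounding up is an assumption; the formula itself gives a non-integer.
    """
    return math.ceil(1 + 3.3 * math.log10(n))


def class_frequencies(data, k):
    """Split data into k equal-width classes and count each class.

    Assumes the largest and smallest observations differ.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        # The largest observation falls on the upper edge, so it is
        # placed in the last class.
        idx = min(int((x - lo) / width), k - 1)
        counts[idx] += 1
    return width, counts


# Made-up data: the integers 0..99.
data = list(range(100))
k = sturges_classes(len(data))
width, counts = class_frequencies(data, k)
print(k, width, counts)
```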

Interpretation: Symmetry

A histogram is said to be symmetric if, when we draw a vertical line down the center of the histogram, the two sides are identical in shape and size:

[Figure: three example histograms, each with Frequency on the vertical axis and Variable on the horizontal axis]

Interpretation: Skewness

A skewed histogram is one with a long tail extending to either the right or the left:

[Figure: two histograms, Frequency versus Variable, labeled Positively Skewed and Negatively Skewed]

Interpretation: Modality

A unimodal histogram is one with a single peak, while a bimodal histogram is one with two peaks:

[Figure: two histograms, Frequency versus Variable, labeled Unimodal and Bimodal]

A modal class is the class with the largest number of observations

Interpretation: Bell Shape

A special type of symmetric unimodal histogram is one that is bell shaped:

[Figure: a bell-shaped histogram, Frequency versus Variable]

Many statistical techniques require that the population be bell shaped.

Drawing the histogram helps verify the shape of the population in question.

Let’s See…

Stem-and-leaf plot

A graphical technique often used in preliminary analyses.

The actual value of each observation is used to build the plot (as opposed to summaries, as in the histogram).

Each observation is split into a stem and a leaf and plotted.

Creating a Stem-and-Leaf Plot

Observation value: 42.19

There are several ways to split the stem and leaf:

• We could split it at the decimal point: stem 42, leaf 19

• Or split it at the “tens” position (first round to 42): stem 4, leaf 2

Creating a Stem-and-Leaf Plot

Continue this process for all the observations. Then, line up “stems” in increasing numerical order, with “leaves” to the right (also in increasing numerical order)

Stem  Leaf
 0    0000000000111112222223333345555556666666778888999999
 1    000001111233333334455555667889999
 2    0000111112344666778999
 3    001335589
 4    124445589
 5    33566
 6    3458
 7    022224556789
 8    334457889999
 9    00112222233344555999
10    001344446699
11    124557889

Thus, we still have access to our original data point’s value!

The length of each line represents the frequency of the class defined by the stem.
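The construction can be sketched in Python as follows, using the "tens" split from the earlier slide (round to a whole number, then stem = tens, leaf = units). `stem_and_leaf` is a hypothetical helper, the observations are made up apart from 42.19, and the sketch assumes non-negative values. Stems with no observations are simply omitted here.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group rounded observations by stem (tens) with sorted leaves (units).

    Assumes non-negative values. Note that Python's round() sends exact
    halves to the nearest even integer.
    """
    rows = defaultdict(list)
    # Sorting first means stems and leaves both come out in increasing order.
    for v in sorted(int(round(x)) for x in values):
        rows[v // 10].append(v % 10)
    return dict(rows)


# Made-up observations; 42.19 is the value used on the slide.
observations = [42.19, 45.0, 38.7, 7.2]
plot = stem_and_leaf(observations)

for stem, leaves in plot.items():
    print(f"{stem:>3} | {''.join(str(leaf) for leaf in leaves)}")
```

The number of leaves printed on each line is the frequency of the class defined by that stem.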

Ogives

• Pronounced “Oh-jive”

• Graph of a cumulative frequency distribution

Steps to an Ogive:

1. Create relative frequency distribution

2. Calculate cumulative relative frequencies

3. Graph cumulative relative frequencies
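The first two steps can be sketched in Python as below. The class counts are made up, and `cumulative_relative_frequencies` is a hypothetical helper name. The resulting values are what would be plotted against the upper limit of each class to draw the ogive.

```python
from itertools import accumulate


def cumulative_relative_frequencies(class_counts):
    """Turn class counts into cumulative relative frequencies."""
    total = sum(class_counts)
    # Step 1: relative frequency of each class.
    relative = [count / total for count in class_counts]
    # Step 2: running total of the relative frequencies.
    return list(accumulate(relative))


# Made-up class counts, for illustration only.
class_counts = [4, 6, 10, 5]
cumulative = cumulative_relative_frequencies(class_counts)
print([f"{value:.0%}" for value in cumulative])
```

The last value should equal 1 (100%) up to floating-point rounding, since every observation falls at or below the top class.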

Interpretation: Ogive

[Figure: Ogive: Starting Accounting Incomes. Vertical axis: Cumulative Percent, 0.00% to 100.00% in steps of 10%. Horizontal axis: Incomes (dollars), with ticks at $42,500, $45,900, $50,100, $54,300, $58,500, $62,700, $66,900, $71,100, $75,300]

Q: 75% of first-year accountants earn at or below what salary?

A: About $58,600

Q: An accounting firm is offering Suzy $55,000 for her first year. Where does she fall among her peers?

A: Around the 50th percentile.

Graphical and Tabular Techniques for Two Variables

Sometimes it is of interest to summarize more than one variable in your dataset or to compare similar variables from two groups.

Different combinations of variables: Nominal/Ordinal, Nominal/Nominal, Ordinal/Ordinal, Time Series Data

Contingency Tables, Scatter Diagrams, Bar Graphs, Line Plot, side-by-side box plot, back-to-back stem plot

Tabular Techniques For Two Nominal Variables

A contingency table (also called a cross-classification table or cross-tabulation table) is used to describe the relationship between two nominal variables.

A contingency table lists the frequency of each combination of the values of the two variables. Relative frequencies are also reported in the table.

The data can then be summarized graphically with a bar chart.

Creating a Contingency Table

Rows and columns of the table represent the categories of each variable.

Each combination of the levels of each variable is summarized in the cells of the table.

This reader’s response is captured as part of the total number on the contingency table…

Example 2.8

Interpretation of a Contingency Table

Percentages or proportions are calculated to allow comparisons across cells

Identify patterns that appear among the data
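Outside Excel, the same table can be sketched in Python. The paired observations below are made up and are not the data from Example 2.8; `contingency_table` and `row_percentages` are hypothetical helper names. Row percentages are one of several ways to convert counts for comparison (column or overall percentages are equally possible).

```python
from collections import Counter


def contingency_table(pairs):
    """Count each (row category, column category) combination."""
    counts = Counter(pairs)
    rows = sorted({row for row, _ in pairs})
    cols = sorted({col for _, col in pairs})
    # Counter returns 0 for combinations that never occur.
    table = [[counts[(row, col)] for col in cols] for row in rows]
    return rows, cols, table


def row_percentages(table):
    """Convert each row of counts to proportions of that row's total."""
    return [[cell / sum(row) for cell in row] for row in table]


# Made-up observations of two nominal variables, for illustration only.
observations = [
    ("A", "Yes"), ("A", "No"), ("A", "Yes"),
    ("B", "No"), ("B", "No"), ("B", "Yes"),
]

rows, cols, table = contingency_table(observations)
percentages = row_percentages(table)

for row_label, row in zip(rows, percentages):
    print(row_label, [f"{p:.0%}" for p in row])
```

Comparing the rows of proportions is what reveals whether the two variables appear related: if every row shows similar proportions, there is little sign of a relationship.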

Using the Pivot Table in Excel

In order to create a contingency table in Excel to summarize or graph data, you can use the Pivot Table.

Let’s see…

Graphical Comparison Between Two Interval Variables

Sometimes we are interested in how two interval variables are related.

To explore this relationship, we employ a scatter diagram or scatterplot, which plots two variables against one another.

The independent (predictor, explanatory) variable is labeled X and is usually placed on the horizontal axis, while the other, dependent (outcome, response) variable, Y, is mapped to the vertical axis.

Let’s see…

Interpretation: Scatterplot

[Figure: three scatterplots labeled Positive Linear Relationship, Negative Linear Relationship, and Weak or Non-Linear Relationship]

Strength: The extent to which the data points fit the pattern in the plot

Pattern: Do the data fall in a linear pattern? The pattern tells us if there is a relationship or not

Direction: The direction in which the data fall tells important information about the relationship

Relationship Between One Nominal and One Interval Variable

A bar chart is an effective way to summarize the data, with one set of bars for each level of the nominal variable

Back-to-back stem plot

Side-by-side box plot

Time Series Plot

Observations measured at successive points in time are called time-series data.

Time-series data are graphed on a line chart, which plots the value of the variable on the vertical axis against the time periods on the horizontal axis.

Can be used to compare multiple groups over time

Let’s see…

Time Series Plot

• Total amounts of U.S. income tax for the years 1987 to 2002

• Identify patterns in the chart in order to identify what is happening over time.

• From ’87 to ’92, the tax was fairly flat. Starting in ’93, there was a rapid increase in taxes until 2001. Finally, there was a downturn in 2002.

• We could plot several lines here as well to compare groups.