1 introduction to statistics_handouts
TRANSCRIPT
-
8/16/2019 1 Introduction to Statistics_handouts
1/43
7/20/20
Introduction to Statistics
By
Dr. Vishal Singh Patyal
Learning Objectives
In this chapter, you will learn:
• What statistics is
• Why statistics is studied
• The basic vocabulary used in statistics
• How statistics is used in business
• The sources of data and the types of data used in business
• Types of variables
• Levels of measurement
• Tabular and graphical presentation of data
What is Statistics?
The science of collecting, describing, and interpreting data.
“Statistics is a way to get information from data”
Data → Statistics → Information
Data: facts, especially numerical facts, collected together for reference or information.
Information: knowledge communicated concerning some particular fact.
Statistics is a tool for creating new understanding from a set of numbers.
Why Study Statistics?
Decision makers use statistics to:
• Present and describe business data and information properly
• Draw conclusions about large populations, using information collected from samples
• Make reliable forecasts about a business activity
• Improve business processes
Types of Statistics
Statistics: the branch of mathematics that transforms data into useful information for decision makers.
Descriptive Statistics
Collecting, summarizing, and describing data
Inferential Statistics
Drawing conclusions and/or making decisions concerning a population based only on sample data
Descriptive Statistics
• Collect data, e.g. a survey
• Present data, e.g. tables and graphs
• Characterize data, e.g. the sample mean $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Collect → Organize → Summarize → Display → Analyze
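As a minimal Python sketch of the sample-mean formula, the function below sums the observations and divides by n. The name `sample_mean` is illustrative, and the data are the student marks used as an example of data under Basic Vocabulary.

```python
from statistics import mean

def sample_mean(values):
    """Return X-bar: the sum of the observations X_i divided by n."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)

marks = [67, 74, 71, 83, 93, 55, 48]  # student marks from the vocabulary example
x_bar = sample_mean(marks)
print(round(x_bar, 2))                 # 491 / 7, printed as 70.14
print(abs(x_bar - mean(marks)) < 1e-9) # the standard library agrees
```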
Inferential Statistics
Estimation, e.g. estimate the population mean weight using the sample mean weight
Hypothesis testing, e.g. test the claim that the population mean weight is 120 pounds
Drawing conclusions and/or making decisions concerning a population based on sample results.
Inferential statistics is used to:
• Predict and forecast values of population parameters
• Test hypotheses about values of population parameters
• Make decisions
Example
A recent study examined the QUANT and VERBAL CAT scores of students across the country. Which of the following statements are descriptive in nature, and which are inferential? (D = descriptive, I = inferential)
• The mean QUANT CAT score was 492. (D)
• The mean VERBAL CAT score was 475. (D)
• Students in the Northeast scored higher in QUANT but lower in VERBAL. (I)
• 80% of all students taking the exam were headed for IIMs. (I)
• 32% of the students scored above 610 on the VERBAL CAT. (D)
• The QUANT CAT scores are higher than they were 10 years ago. (I)
Basic Vocabulary of Statistics
Population
A population consists of all the items or individuals about which you want to draw a conclusion. A population is the group of all items of interest to a statistics practitioner; it is frequently very large and sometimes infinite.
E.g. the entire Indian population of 1.252 billion, as covered by census data.
Sample
A subset of the population. A sample is a set of data drawn from the population; it can be very large, but it is smaller than the population.
E.g. a sample of 765 voters exit-polled on election day.
Basic Vocabulary of Statistics
[Diagram: the sample is a subset of the population; a parameter describes the population, a statistic describes the sample]
Measures used to describe the population are called parameters.
Measures computed from sample data are called statistics.
Basic Vocabulary of Statistics
Variable
A variable is some characteristic of a population or sample, e.g. student grades. Variables are typically denoted with capital letters: A, B, C…
The values of the variable are the range of possible values for a variable, e.g. student marks (0 to 100).
Data
Data are the observed values of a variable, i.e. the different values associated with a variable, e.g. student marks: {67, 74, 71, 83, 93, 55, 48}.
Example
The NICMAR Institute dean is interested in learning about the average age of faculty. Identify the basic terms in this situation.
• The population is the age of all faculty members at the Institute.
• A sample is any subset of that population. For example, we might select 10 faculty members and determine their ages.
• The variable is the “age” of each faculty member.
• The data would be the set of values in the sample.
• The parameter of interest is the “average” age of all faculty at the Institute.
• The statistic is the “average” age of the faculty in the sample.
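A minimal Python sketch of the parameter/statistic distinction in this example. The faculty ages are hypothetical, generated at random purely for illustration (they are not NICMAR data); the population size of 60 and the age range 28 to 65 are likewise assumptions.

```python
import random
from statistics import mean

# Hypothetical population: ages of all faculty at an institute (illustrative only).
random.seed(42)
population_ages = [random.randint(28, 65) for _ in range(60)]

# Parameter: a measure describing the whole population.
mu = mean(population_ages)

# Sample: 10 faculty members selected at random, without replacement.
sample_ages = random.sample(population_ages, k=10)

# Statistic: the same measure computed from the sample only.
x_bar = mean(sample_ages)

print(f"population mean age (parameter): {mu:.1f}")
print(f"sample mean age (statistic):     {x_bar:.1f}")
```

The statistic is generally close to, but not equal to, the parameter; inferential statistics studies how far apart the two can be expected to lie.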
Why Collect Data?
• A marketing research analyst needs to assess the effectiveness of a new television advertisement.
• A pharmaceutical manufacturer needs to determine whether a new drug is more effective than those currently in use.
• An operations manager wants to monitor a manufacturing process to find out whether the quality of the product being manufactured conforms to company standards.
• An auditor wants to review the financial transactions of a company to determine whether the company is in compliance with generally accepted accounting principles.
Sources of Data
Primary sources: the data collector is the one using the data for analysis.
• Data from a political survey
• Data collected from an experiment
• Observed data
Secondary sources: the person performing the data analysis is not the data collector.
• Analyzing census data
• Examining data from print journals or data published on the internet
Types of Variables
Data are either categorical or numerical:
• Categorical (defined categories), e.g. marital status, political party, eye color
• Numerical, which is either:
  • Discrete (counted items), e.g. number of children, defects per hour
  • Continuous (measured characteristics), e.g. weight, voltage
Types of Variables
Categorical (qualitative) variables have values that can only be placed into categories, such as “yes” and “no.” A categorical variable categorizes or describes an element of a population.
Note: arithmetic operations, such as addition and averaging, are not meaningful for data resulting from a qualitative variable.
Numerical (quantitative) variables have values that represent quantities. A numerical variable quantifies an element of a population.
Note: arithmetic operations, such as addition and averaging, are meaningful for data resulting from a quantitative variable.
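A small Python sketch contrasting the two kinds of variables, using hypothetical data: counting categories and finding the most frequent one is meaningful for categorical data, averaging is meaningful for numerical data, and averaging arbitrary numeric codes assigned to categories produces a number that describes no category.

```python
from collections import Counter
from statistics import mean

# Hypothetical data, for illustration only.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]  # categorical
children = [0, 2, 1, 3, 2, 2]                                      # numerical (discrete)

# Categorical data: counting and finding the most frequent category are meaningful.
color_counts = Counter(eye_colors)
print(color_counts.most_common(1))   # [('brown', 3)]

# Numerical data: averaging is meaningful (10 children / 6 households).
print(mean(children))

# Averaging arbitrary codes for categories gives a number with no meaning as a color.
codes = {"brown": 1, "blue": 2, "green": 3}
print(mean(codes[c] for c in eye_colors))
```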
Example
Identify each of the following examples as attribute (qualitative) or numerical (quantitative) variables.
• The amount of CNG pumped by the next 10 customers at the local HP pump. (Numerical)
• The amount of radon in the basement of each of 25 homes in a new development. (Numerical)
• The color of the baseball cap worn by each of 20 students. (Attribute)
• The length of time to complete a mathematics homework assignment. (Numerical)
• The state in which each truck is registered when stopped and inspected at a weigh station. (Attribute)
Question?
Identify each of the following as examples of qualitative or quantitative variables:
• The temperature in Barrow, Alaska at 12:00 pm on any given day.
• The make of automobile driven by each faculty member.
• Whether or not a 6 volt lantern battery is defective.
• The weight of a lead pencil.
• The length of time billed for a long distance telephone call.
• The brand of cereal children eat for breakfast.
• The type of book taken out of the library by an adult.
Level of Measurement
Nominal
Ordinal
Interval
Ratio
(Mnemonic: NOIR)
Nominal scale
A nominal scale classifies data into distinct categories in which no ranking is implied.
Categorical Variable: Categories
Personal Computer Ownership: Yes / No
Type of Stocks Owned: Growth, Value, Other
Internet Provider: Microsoft Network / AOL
Ordinal scale
An ordinal scale classifies data into distinct categories in which ranking is implied.
Categorical Variable: Ordered Categories
Student class designation: Freshman, Junior, Senior
Product satisfaction: Satisfied, Neutral, Unsatisfied
Faculty rank: Professor, Associate Professor, Assistant Professor, Instructor
Standard & Poor’s bond ratings: AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Student grades: A, B, C, D, F
Interval scale
An interval scale is an ordered scale in which the difference between measurements is a meaningful quantity but the measurements do not have a true zero point.
Example:
• The difference between 1 and 2 years of age is the same amount as the difference between 21 and 22 years of age, or 50 and 51, or 65 and 66.
• The difference between a height of 60 inches and a height of 55 inches is the same amount of difference as between a height of 72 inches and a height of 67 inches.
(Age and height also have a true zero, so strictly speaking they are ratio-level variables; these examples illustrate only the equal-interval property. A classic interval-scale variable is temperature in degrees Celsius: 0 °C is not an absence of temperature, so 20 °C is not “twice as warm” as 10 °C.)
Note: For interval-level variables, it is mathematically legitimate to add, subtract, and average values, as well as to count the values and sort or rank them; ratios between values, however, are not meaningful.
Ratio scale
A ratio scale is an ordered scale in which the difference between the measurements is a meaningful quantity.
Ratio-level variables have the additional property of a true zero value, so ratios between values are meaningful; practically speaking, however, ratio-level data are treated the same as interval-level data.
Example: number of clients in the past six months. It is meaningful to say that “...we had twice as many clients in this period as we did in the previous six months.”
[Figure: Levels of Measurement, Example]
The Hierarchy of Levels
Nominal: attributes are only named; weakest
Ordinal: attributes can be ordered
Interval: distance is meaningful
Ratio: absolute zero
[Figure: Level of Measurement: Decision Tree]
[Figure: Level of Measurement: Characteristics]
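The decision-tree figure is not reproduced in this handout. As a hedged sketch that follows the hierarchy of levels (not necessarily the exact tree on the slide), the hypothetical function below asks three yes/no questions in order: can the values be ordered, are the distances between values meaningful, and is there an absolute zero? The temperature-in-Celsius example is an addition, not from the slides.

```python
def level_of_measurement(ordered: bool, equal_intervals: bool, true_zero: bool) -> str:
    """Classify a variable along the NOIR hierarchy (simplified, hypothetical tree)."""
    if not ordered:
        return "nominal"   # attributes are only named
    if not equal_intervals:
        return "ordinal"   # attributes can be ordered, but distances are not meaningful
    if not true_zero:
        return "interval"  # distances are meaningful, but there is no absolute zero
    return "ratio"         # absolute zero, so ratios between values are meaningful

print(level_of_measurement(False, False, False))  # eye color
print(level_of_measurement(True, False, False))   # product satisfaction
print(level_of_measurement(True, True, False))    # temperature in degrees Celsius
print(level_of_measurement(True, True, True))     # number of clients
```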
[Figure: Level of Measurement: Statistical Tests]
Example
Identify each of the following as examples of (1) nominal, (2) ordinal, (3) discrete, or (4) continuous variables:
• The length of time until a pain reliever begins to work.
• The number of chocolate chips in a cookie.
• The number of colors used in a statistics textbook.
• The brand of refrigerator in a home.
• The overall satisfaction rating of a new car.
• The number of files on a computer’s hard disk.
• The pH level of the water in a swimming pool.
• The number of staples in a stapler.
Class Exercise
Q1: Determine whether each variable is categorical or numerical. If numerical, determine whether the variable is discrete or continuous. Determine the level of measurement.
• Amount of money spent on clothing in the past month
• Favorite department store
• Most likely time period during which shopping for clothing takes place
• Number of pairs of shoes owned
Q2: A manufacturer of dog food was planning to survey households in India to determine the purchasing habits of dog owners. Among the variables to be collected are:
• The primary place of purchase of dog food
• Whether dry or moist food is purchased
• Number of dogs living in the household
• Whether the dog is pedigreed
Class Exercise
Q3: Suppose the following information was collected from Mr X on his application for a home loan at the HDFC Bank loan department:
a. Monthly payment: Rs 25100
b. Annual family income:
c. Marital status: Married
d. Number of jobs changed in the past 10 years: 2
Classify each of the responses by type of data and level of measurement.
Organizing and Visualizing Categorical and Numerical Data
Categorical Data Are Organized By Utilizing Tables
Categorical data → tallying data:
• One categorical variable → summary table
• Two categorical variables → contingency table
Organizing Categorical Data: Summary Table
A summary table indicates the frequency, amount, or percentage of items in a set of categories so that you can see differences between categories.
How do you spend the holidays?   Percent
At home with family              45%
Travel to visit family           38%
Vacation                          5%
Catching up on work               5%
Other                             7%
Contingency Table
• Used to study patterns that may exist between the responses of two or more categorical variables
• Cross tabulates, or tallies jointly, the responses of the categorical variables
• For two variables, the tallies for one variable are located in the rows and the tallies for the second variable are located in the columns
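A minimal Python sketch of cross-tabulation: each record carries the responses to two categorical variables, and the pairs are tallied jointly with `collections.Counter`, one variable in the rows and the other in the columns. The six invoice records are hypothetical.

```python
from collections import Counter

# Hypothetical invoice records: (size category, error status). Illustrative only.
invoices = [
    ("Small", "No Errors"), ("Small", "Errors"), ("Medium", "Errors"),
    ("Large", "No Errors"), ("Medium", "No Errors"), ("Small", "No Errors"),
]

# Tally the two categorical variables jointly: one count per (row, column) pair.
cells = Counter(invoices)

rows = ["Small", "Medium", "Large"]  # first variable in the rows
cols = ["No Errors", "Errors"]       # second variable in the columns

print(f"{'':8}" + "".join(f"{c:>11}" for c in cols) + f"{'Total':>8}")
for r in rows:
    counts = [cells[(r, c)] for c in cols]  # Counter returns 0 for unseen pairs
    print(f"{r:8}" + "".join(f"{n:>11}" for n in counts) + f"{sum(counts):>8}")
```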
Contingency Table - Example
A random sample of 400 invoices is drawn. Each invoice is categorized as a small, medium, or large amount. Each invoice is also examined to identify whether there are any errors. These data are then organized in the following contingency table.
Contingency Table Showing Frequency of Invoices Categorized By Size and the Presence of Errors
                 No Errors   Errors   Total
Small Amount        170        20      190
Medium Amount       100        40      140
Large Amount         65         5       70
Total               335        65      400
Contingency Table Based on % of Overall Total
                 No Errors   Errors    Total
Small Amount       42.50%     5.00%    47.50%
Medium Amount      25.00%    10.00%    35.00%
Large Amount       16.25%     1.25%    17.50%
Total              83.75%    16.25%   100.0%
42.50% = 170 / 400
25.00% = 100 / 400
16.25% = 65 / 400
83.75% of sampled invoices have no errors and 47.50% of sampled invoices are for small amounts.
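The overall-total percentages can be reproduced with a short Python sketch: every cell count is divided by the grand total of 400. The dictionary layout of the table is just one convenient representation.

```python
# Invoice counts: rows = invoice size, columns = (No Errors, Errors).
table = {
    "Small": (170, 20),
    "Medium": (100, 40),
    "Large": (65, 5),
}
grand_total = sum(sum(cells) for cells in table.values())  # 400 invoices

# Each cell divided by the overall total.
overall_pct = {
    size: [100 * n / grand_total for n in cells]
    for size, cells in table.items()
}

for size, pcts in overall_pct.items():
    print(f"{size:6}", "  ".join(f"{p:6.2f}%" for p in pcts), f"{sum(pcts):6.2f}%")
```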
Contingency Table Based on % of Row Totals
                 No Errors   Errors    Total
Small Amount       89.47%    10.53%   100.0%
Medium Amount      71.43%    28.57%   100.0%
Large Amount       92.86%     7.14%   100.0%
Total              83.75%    16.25%   100.0%
89.47% = 170 / 190
71.43% = 100 / 140
92.86% = 65 / 70
Medium invoices have a larger chance (28.57%) of having errors than small (10.53%) or large (7.14%) invoices.
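The row percentages follow the same pattern in Python, except that each cell is divided by its own row total, so each row sums to 100%. The counts are those of the invoice example; the data layout is an illustrative choice.

```python
table = {"Small": (170, 20), "Medium": (100, 40), "Large": (65, 5)}

# Each cell divided by its own row total, so every row sums to 100%.
row_pct = {
    size: [100 * n / sum(cells) for n in cells]
    for size, cells in table.items()
}

for size, (no_err, err) in row_pct.items():
    print(f"{size:6} no errors {no_err:6.2f}%  errors {err:6.2f}%")
```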
Contingency Table Based on % of Column Totals
                 No Errors   Errors    Total
Small Amount       50.75%    30.77%    47.50%
Medium Amount      29.85%    61.54%    35.00%
Large Amount       19.40%     7.69%    17.50%
Total             100.0%    100.0%    100.0%
50.75% = 170 / 335
30.77% = 20 / 65
61.54% of the invoices with errors are of medium size.
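For column percentages, a Python sketch first computes the column totals (335 invoices without errors, 65 with errors) and then divides each cell by its column total, so each column sums to 100%.

```python
table = {"Small": (170, 20), "Medium": (100, 40), "Large": (65, 5)}

# Column totals: invoices without errors and invoices with errors.
col_totals = [sum(cells[j] for cells in table.values()) for j in range(2)]

# Each cell divided by its column total.
col_pct = {
    size: [100 * n / col_totals[j] for j, n in enumerate(cells)]
    for size, cells in table.items()
}

for size, (no_err, err) in col_pct.items():
    print(f"{size:6} no errors {no_err:6.2f}%  errors {err:6.2f}%")
```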
Tables Used For Organizing Numerical Data
Numerical data → ordered array, frequency distributions, cumulative distributions
Organizing Numerical Data: Ordered Array
An ordered array is a sequence of data, in rank order, from the smallest value to the largest value.
Age of Surveyed College Students
Day Students: 16 17 17 18 18 18 19 19 20 20 21 22 22 25 27 32 38 42
Night Students: 18 18 19 19 20 21 23 28 32 33 41 45
Organizing Numerical Data: Frequency Distribution
The frequency distribution is a summary table in which the data are arranged into numerically ordered class groupings. You must give attention to selecting the appropriate number of class groupings for the table, determining a suitable width of a class grouping, and establishing the boundaries of each class grouping to avoid overlapping.
The number of classes depends on the number of values in the data. With a larger number of values, there are typically more classes. In general, a frequency distribution should have at least 5 but no more than 15 classes. To determine the width of a class interval, divide the range (highest value - lowest value) of the data by the number of class groupings desired.
Organizing Numerical Data: Frequency Distribution Example
Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature:
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
STEPS
1. Sort the raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
2. Find the range: 58 - 12 = 46
3. Select the number of classes: 5 (usually between 5 and 15)
4. Compute the class interval (width): 10 (46/5 = 9.2, rounded up)
5. Determine the class boundaries (limits):
   Class 1: 10 to less than 20
   Class 2: 20 to less than 30
   Class 3: 30 to less than 40
   Class 4: 40 to less than 50
   Class 5: 50 to less than 60
6. Compute the class midpoints: 15, 25, 35, 45, 55
7. Count the observations and assign them to classes
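The seven steps can be sketched in Python for the 20 daily high temperatures. Two choices mirror the slides rather than a general rule: the width is rounded up to a whole number, and the first class starts at 10, a convenient value just below the minimum of 12.

```python
import math

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

# Step 1: sort the raw data into an ordered array.
ordered = sorted(temps)

# Step 2: range = highest value - lowest value.
data_range = ordered[-1] - ordered[0]          # 58 - 12

# Steps 3-4: choose 5 classes; width = range / classes, rounded up to a whole number.
num_classes = 5
width = math.ceil(data_range / num_classes)    # ceil(9.2)

# Step 5: class boundaries, starting at 10 as on the slides.
start = 10
bounds = [(start + k * width, start + (k + 1) * width) for k in range(num_classes)]

# Step 6: class midpoints.
midpoints = [(lo + hi) / 2 for lo, hi in bounds]

# Step 7: count observations in each class "lo but less than hi".
freqs = [sum(lo <= x < hi for x in ordered) for lo, hi in bounds]

for (lo, hi), m, f in zip(bounds, midpoints, freqs):
    print(f"{lo} but less than {hi}  midpoint {m:g}  frequency {f}")
```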
Organizing Numerical Data: Frequency Distribution Example
Class                 Midpoint   Frequency
10 but less than 20      15          3
20 but less than 30      25          6
30 but less than 40      35          5
40 but less than 50      45          4
50 but less than 60      55          2
Total                               20
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Organizing Numerical Data: Relative & Percent Frequency Distribution
Class                 Frequency   Relative Frequency   Percentage
10 but less than 20       3              .15              15
20 but less than 30       6              .30              30
30 but less than 40       5              .25              25
40 but less than 50       4              .20              20
50 but less than 60       2              .10              10
Total                    20             1.00             100
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
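A short Python sketch of the relative and percentage columns: relative frequency is the class frequency divided by the total number of observations, and percentage is 100 times that.

```python
bounds = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [3, 6, 5, 4, 2]   # class frequencies of the 20 daily highs
n = sum(freqs)            # 20 observations

# Relative frequency = class frequency / total number of observations.
relative = [f / n for f in freqs]
# Percentage = 100 * relative frequency (computed as 100 * f / n to keep exact values).
percent = [100 * f / n for f in freqs]

for (lo, hi), f, r, p in zip(bounds, freqs, relative, percent):
    print(f"{lo} but less than {hi}: {f:2}  {r:.2f}  {p:g}%")
print(f"Total: {n}  {sum(relative):.2f}  {sum(percent):g}%")
```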
Organizing Numerical Data: Cumulative Frequency Distribution
Class                 Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20       3          15%                3                     15%
20 but less than 30       6          30%                9                     45%
30 but less than 40       5          25%               14                     70%
40 but less than 50       4          20%               18                     90%
50 but less than 60       2          10%               20                    100%
Total                    20         100%
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
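Cumulative frequencies are running totals of the class frequencies; a Python sketch using `itertools.accumulate` reproduces the cumulative columns of the table.

```python
from itertools import accumulate

bounds = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [3, 6, 5, 4, 2]
n = sum(freqs)

# Cumulative frequency: running total of the class frequencies.
cum_freq = list(accumulate(freqs))            # [3, 9, 14, 18, 20]
# Cumulative percentage: running total as a share of all observations.
cum_pct = [100 * c / n for c in cum_freq]     # [15.0, 45.0, 70.0, 90.0, 100.0]

for (lo, hi), f, cf, cp in zip(bounds, freqs, cum_freq, cum_pct):
    print(f"{lo} but less than {hi}: frequency {f}, cumulative {cf}, cumulative {cp:g}%")
```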
Why Use a Frequency Distribution?
• It condenses the raw data into a more useful form.
• It allows for a quick visual interpretation of the data.
• It enables the determination of the major characteristics of the data set, including where the data are concentrated or clustered.
Frequency Distributions: Some Tips
• Different class boundaries may provide different pictures for the same data (especially for smaller data sets).
• Shifts in data concentration may show up when different class boundaries are chosen.
• As the size of the data set increases, the impact of alterations in the selection of class boundaries is greatly reduced.
• When comparing two or more groups with different sample sizes, you must use either a relative frequency or a percentage distribution.
Visualizing Categorical Data Through Graphical Displays
Categorical data → visualizing data:
• Summary table for one variable → bar chart, pie chart, Pareto chart
• Contingency table for two variables → side-by-side bar chart
Organizing Categorical Data: Bar Chart
In a bar chart, a bar shows each category, the length of which represents the amount, frequency, or percentage of values falling into a category.
[Figure: bar chart “How Do You Spend the Holidays?”; horizontal axis 0% to 50%; At home with family 45%, Travel to visit family 38%, Vacation 5%, Catching up on work 5%, Other 7%]
Organizing Categorical Data: Pie Chart
The pie chart is a circle broken up into slices that represent categories. The size of each slice of the pie varies according to the percentage in each category.
[Figure: pie chart “How Do You Spend the Holidays?”; At home with family 45%, Travel to visit family 38%, Vacation 5%, Catching up on work 5%, Other 7%]
Organizing Categorical Data: Pareto Diagram
• Used to portray categorical data
• A bar chart, where categories are shown in descending order of frequency
• A cumulative polygon is shown in the same graph
• Used to separate the “vital few” from the “trivial many”
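A Python sketch of the ordering behind a Pareto diagram, applied to the holiday summary table from earlier in this chapter: the categories are sorted in descending order of frequency, and the running total gives the points of the cumulative polygon. Ties (Vacation and Catching up on work, both 5%) keep their original order because Python's sort is stable.

```python
from itertools import accumulate

# Summary table "How do you spend the holidays?" (percent of respondents).
holidays = {
    "At home with family": 45,
    "Travel to visit family": 38,
    "Vacation": 5,
    "Catching up on work": 5,
    "Other": 7,
}

# Pareto ordering: categories in descending order of frequency.
ordered = sorted(holidays.items(), key=lambda item: item[1], reverse=True)
# Running cumulative percentages for the polygon drawn over the bars.
cumulative = list(accumulate(pct for _, pct in ordered))

for (category, pct), cum in zip(ordered, cumulative):
    print(f"{category:24} {pct:3}%  cumulative {cum:3}%")
```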
Organizing Categorical Data: Pareto Diagram
[Figure: Pareto diagram “Current Investment Portfolio”; bar graph shows % invested in each category (Stocks, Bonds, Savings, CD; left axis 0% to 45%); line graph shows cumulative % invested (right axis 0% to 100%)]
Visualizing Categorical Data: Side By Side Bar Charts
The side-by-side bar chart represents the data from a contingency table.
[Figure: side-by-side bar chart “Invoice Size Split Out By Errors & No Errors”; bars for Large, Medium, and Small invoices within the No Errors and Errors groups; horizontal axis 0% to 70%, based on the column percentages of the invoice contingency table]
Invoices with errors are much more likely to be of medium size (61.54% vs 30.77% and 7.69%).
Visualizing Numerical Data By Using Graphical Displays
Numerical data:
• Ordered array → stem-and-leaf display
• Frequency distributions and cumulative distributions → histogram, polygon, ogive
Organizing Numerical Data: Stem and Leaf Display
A stem-and-leaf display organizes data into groups (called stems) so that the values within each group (the leaves) branch out to the right on each row.
Age of Surveyed College Students
Day Students: 16 17 17 18 18 18 19 19 20 20 21 22 22 25 27 32 38 42
Night Students: 18 18 19 19 20 21 23 28 32 33 41 45
Age of College Students
Day Students
Stem  Leaf
1     67788899
2     0012257
3     28
4     2
Night Students
Stem  Leaf
1     8899
2     0138
3     23
4     15
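A minimal Python sketch of the stem-and-leaf construction, assuming two-digit whole-number data: the tens digit is the stem and the units digit is the leaf. The function name `stem_and_leaf` is illustrative; stems with no values are simply omitted rather than shown as empty rows.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: leaves}, with stem = tens digit and leaf = units digit."""
    display = defaultdict(str)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 27 -> stem 2, leaf 7
        display[stem] += str(leaf)
    return dict(display)

day = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22, 22, 25, 27, 32, 38, 42]
night = [18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 45]

for label, ages in (("Day students", day), ("Night students", night)):
    print(label)
    for stem, leaves in stem_and_leaf(ages).items():
        print(f"  {stem} | {leaves}")
```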
Visualizing Numerical Data: The Histogram
• A graph of the data in a frequency distribution is called a histogram.
• In a histogram there are no gaps between adjacent bars.
• The class boundaries (or class midpoints) are shown on the horizontal axis.
• The vertical axis is either frequency, relative frequency, or percentage.
• Bars of the appropriate heights are used to represent the number of observations within each class.
[Figure: histogram “Daily High Temperature”; horizontal axis class midpoints 5, 15, 25, 35, 45, 55, More; vertical axis frequency 0 to 10; plotted from the frequency distribution of the 20 daily highs (frequencies 3, 6, 5, 4, 2 for the classes from 10 to 60)]
Visualizing Numerical Data: The Polygon
• A percentage polygon is formed by having the midpoint of each class represent the data in that class and then connecting the sequence of midpoints at their respective class percentages.
• The cumulative percentage polygon, or ogive, displays the variable of interest along the X axis and the cumulative percentages along the Y axis.
• Useful when there are two or more groups to compare
Visualizing Numerical Data: The Frequency Polygon
[Figure: frequency polygon “Daily High Temperature”; horizontal axis class midpoints 5, 15, 25, 35, 45, 55, More; vertical axis frequency 0 to 10; plotted from the same frequency distribution (frequencies 3, 6, 5, 4, 2)]
(In a percentage polygon the vertical axis would be defined to show the percentage of observations per class.)
Organizing Numerical Data: The Cumulative Percentage Polygon
[Figure: ogive “Daily High Temperature”; horizontal axis class lower boundary, 10 to 60; vertical axis cumulative percentage, 0 to 100]
Class Lower Boundary   % Less Than Lower Boundary
10                       0
20                      15
30                      45
40                      70
50                      90
60                     100
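The points of the ogive can be computed in Python from the class frequencies of the daily-high-temperature example: 0% of the observations lie below the first lower boundary, and each later boundary carries the cumulative percentage of the classes before it. Treating 60 as the closing boundary of the last class is an assumption consistent with the "50 but less than 60" class.

```python
from itertools import accumulate

lower_bounds = [10, 20, 30, 40, 50, 60]  # class lower boundaries (60 closes the last class)
freqs = [3, 6, 5, 4, 2]                  # frequencies of the five classes
n = sum(freqs)

# % of observations less than each boundary: 0% below the first one,
# then the running cumulative percentage of the classes already passed.
pct_less_than = [0.0] + [100 * c / n for c in accumulate(freqs)]

for x, y in zip(lower_bounds, pct_less_than):
    print(f"{x}: {y:g}%")  # the ogive joins these (boundary, cumulative %) points
```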
Scatter Plot Example
Volume per Day   Cost per Day
23               125
26               140
29               146
33               160
38               167
42               170
50               188
55               195
60               200
[Figure: scatter plot “Cost per Day vs. Production Volume”; horizontal axis Volume per Day (20 to 70); vertical axis Cost per Day (0 to 250)]
Time Series
• A time series plot is used to study patterns in the values of a numeric variable over time.
• In a time series plot, the numeric variable is measured on the vertical axis and the time period is measured on the horizontal axis.
Attendance (in millions) at USA amusement/theme parks from 2000-2005
Year   Year Number   Attendance
2000   0             317
2001   1             319
2002   2             324
2003   3             322
2004   4             328
2005   5             335
Time Series Example
[Figure: time series plot “Attendance (in millions) at US Theme Parks”; horizontal axis Year (since 2000), 0 to 6; vertical axis Attendance, 316 to 336]
Principles of Excellent Graphs
• The graph should not distort the data.
• The graph should not contain unnecessary adornments (sometimes referred to as chart junk).
• The scale on the vertical axis should begin at zero.
• All axes should be properly labeled.
• The graph should contain a title.
• The simplest possible graph should be used for a given set of data.
Graphical Errors: Chart Junk
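Minimum wage: 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80
Bad presentation: [Figure: chart of the minimum wage with decorative images (chart junk)]
Good presentation: [Figure: bar chart “Minimum Wage”; horizontal axis 1960, 1970, 1980, 1990; vertical axis $ 0 to 4]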
Graphical Errors: No Relative Basis
Bad presentation: [Figure: bar chart “A’s received by students” by class (FR, SO, JR, SR); vertical axis frequency, 0 to 300]
Good presentation: [Figure: bar chart “A’s received by students” by class (FR, SO, JR, SR); vertical axis percentage, 0% to 30%]
FR = Freshmen, SO = Sophomore, JR = Junior, SR = Senior
Graphical Errors: Compressing the Vertical Axis
Good presentation: [Figure: bar chart “Quarterly Sales”, Q1 to Q4; vertical axis $ 0 to 50]
Bad presentation: [Figure: bar chart “Quarterly Sales”, Q1 to Q4; vertical axis compressed to $ 0 to 200]
Class Exercise 1
The owner of a restaurant wanted to study the demand for dessert. He decided that, in addition to studying whether dessert was ordered, he would also study the gender of the individual. Data were collected from 600 customers and organized in the following contingency table.
                       Gender
Dessert Ordered    Male   Female   Total
Yes                  40       96     136
No                  240      224     464
Total               280      320     600
a. Construct contingency tables based on row, column, and total percentages.
b. Which type of percentage (row, column, or total) do you think is most informative for each gender?
c. What conclusions concerning the pattern of dessert ordering can the restaurant owner reach?
Class Exercise 2
The following table represents estimated green power sales by renewable energy source, 2008.
Source                      Percentage
Geothermal                   2.8
Hydro                       11.3
Landfill mass and biomass   28.1
Solar                        0.2
Unreported                   2.5
Wind                        55.1
Source: National Renewable Energy Laboratory, 2008
a. Construct a bar chart, a pie chart, and a Pareto chart.
b. What conclusion can you reach about the sources of green power?
Class Exercise 3
Calculate the following:
a. Divide the data into classes
b. Absolute frequency
c. Relative frequency
d. Percentages
e. Cumulative frequency
f. Cumulative percentage
g. Midpoints
h. Draw a histogram and a relative frequency polygon
THANKS