(5) graphical presentation 1
TRANSCRIPT
-
7/30/2019 (5) Graphical Presentation 1
Applied Statistics and Computing Lab
GRAPHICAL PRESENTATIONS 1 (REPRESENTATION OF CATEGORICAL DATA)
Applied Statistics and Computing Lab
Indian School of Business
-
Learning Goals
Why do we use graphs?
Basic Graphs for Categorical Data
-
Why use graphs?
A market survey firm is conducting a survey on the popularity of different makes of car in the USA. For this, it visits various car showrooms and lists the makes of car in each showroom.
Suppose it obtains the following data on the makes of 28 cars in one such showroom: Buick, Cadillac, Buick, Chevrolet, Buick, Buick, Buick, Pontiac, Cadillac, Chevrolet, SAAB, SAAB, SAAB, Cadillac, Chevrolet, Pontiac, Buick, Buick, Buick, SAAB, Cadillac, SAAB, Pontiac, SAAB, Chevrolet, Buick, SAAB, Cadillac.
Relevant questions:
Which is the most common car make in the showroom?
Which is the least common car make in the showroom?
How do we answer these questions?
The natural first step is to count the observations for each car make and note them down. Once the counts are written in tabular form, they make up a frequency table.
A frequency table for categorical data is a table that displays the possible categories along with their associated frequencies or relative frequencies.
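The counting step above can be sketched in R, the language used in the R-Codes slides of this lab. This is a minimal sketch assuming base R only; the variable names `cars`, `freq`, `rel_freq`, `most_common` and `least_common` are illustrative choices, not taken from the original slides. `table()` tallies each category and `prop.table()` turns the counts into relative frequencies.

```r
# Makes of the 28 cars observed in one showroom (data from the example above)
cars <- c("Buick", "Cadillac", "Buick", "Chevrolet", "Buick", "Buick", "Buick",
          "Pontiac", "Cadillac", "Chevrolet", "SAAB", "SAAB", "SAAB", "Cadillac",
          "Chevrolet", "Pontiac", "Buick", "Buick", "Buick", "SAAB", "Cadillac",
          "SAAB", "Pontiac", "SAAB", "Chevrolet", "Buick", "SAAB", "Cadillac")

# Frequency table: one count per category (categories are sorted alphabetically)
freq <- table(cars)
print(freq)

# Relative frequencies: each count divided by the total number of observations
rel_freq <- prop.table(freq)
print(round(rel_freq, 3))

# Most and least common makes, read off the frequency table
most_common <- names(which.max(freq))
least_common <- names(which.min(freq))
print(c(most = most_common, least = least_common))
```

The counts produced this way should agree with Table 1 below.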
-
Table 1: Frequency Table
Car make     Frequency
Buick        9
Cadillac     5
Chevrolet    4
Pontiac      3
SAAB         7
At a glance, the answers are there!
Buick is the most common make in that showroom and Pontiac is the least common.
A frequency table is the simplest representation of data in tabular form.
The same data can also be represented pictorially in a number of ways!
-
Different types of graphs for representing categorical data
Ways of presenting categorical data:
Frequency table
Bar chart (including the multiple bar chart and the divided, or segmented, bar chart)
Pie chart
-
Why different graphs? Illustration through examples
Consider the following frequency table of the population of Andhra Pradesh in 2011:
Table 2: Population of Andhra Pradesh in 2011, by category
Category        Population in 2011
Rural Male      28219760
Rural Female    28092028
Urban Male      14290121
Urban Female    14063624
-
Bar Chart
A bar chart is a graph of the frequency distribution of categorical data.
Each category in the frequency distribution is represented by a bar, or rectangle.
All bars have the same width, so the height (and hence the area) of each bar is proportional to the corresponding frequency.
A bar chart may be vertical or horizontal; the one shown here is vertical.
Inferences
For policy reasons, one is interested in the composition of the population of Andhra Pradesh.
From a bar chart, we can read off the most and least frequently occurring categories: rural male has the highest frequency, urban female the lowest.
However, one may also be interested in the relative share of each category, rather than the absolute figure.
Visually, from the bar chart, rural male and rural female seem almost equal, and so do urban male and urban female.
To analyze their relative shares, we look at a pie chart.
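As a sketch of the vertical/horizontal distinction, assuming base R graphics: `barplot()` draws vertical bars by default and horizontal bars with `horiz = TRUE`. The named vector `pop2011`, the variables `mid_v`, `mid_h`, `largest`, `smallest`, and the margin settings are illustrative assumptions; the colours match those used in the R-Codes slide.

```r
# 2011 population of Andhra Pradesh by category (Table 2)
pop2011 <- c(RuralMale = 28219760, RuralFemale = 28092028,
             UrbanMale = 14290121, UrbanFemale = 14063624)
colors <- c("red", "bisque", "darkslategray", "violet")

# Vertical bars (the default); barplot() invisibly returns the bar midpoints
mid_v <- barplot(pop2011 / 1e6, col = colors,
                 main = "AP population in 2011 (in millions)")

# Horizontal bars: horiz = TRUE; las = 1 keeps the category labels horizontal,
# and a wider left margin leaves room for those labels
op <- par(mar = c(4, 8, 4, 1))
mid_h <- barplot(pop2011 / 1e6, col = colors, horiz = TRUE, las = 1,
                 main = "AP population in 2011 (in millions)")
par(op)  # restore the previous margins

# Most and least frequent categories
largest <- names(which.max(pop2011))
smallest <- names(which.min(pop2011))
print(c(largest = largest, smallest = smallest))
```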
-
Pie Chart
A circle is used to represent the whole data set.
Slices of the pie represent the possible categories.
The area of the slice for a particular category is proportional to the corresponding relative frequency.
When to use: categorical data with a relatively small number of categories.
With many categories, merge a few of them into one.
Most useful for illustrating the proportion of the whole data set that falls in each category.
Criticism: not an effective visual display unless the percentages are shown.
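Since a pie chart is hard to read without percentages, one option is to compute each category's share and print it in the slice labels. A minimal sketch, assuming base R's `pie()`; `pop2011`, `share` and `slice_labels` are hypothetical names introduced here, not the (truncated) pie-chart code from the R-Codes slide.

```r
# 2011 population of Andhra Pradesh by category (Table 2)
pop2011 <- c(RuralMale = 28219760, RuralFemale = 28092028,
             UrbanMale = 14290121, UrbanFemale = 14063624)

# Relative frequency of each category, in percent
share <- 100 * pop2011 / sum(pop2011)

# Put the percentage into each slice label, e.g. "RuralMale: 33.3%"
slice_labels <- sprintf("%s: %.1f%%", names(pop2011), share)

pie(pop2011, labels = slice_labels,
    col = c("red", "bisque", "darkslategray", "violet"),
    main = "AP population by category, 2011")
```

With the percentages printed, the near-equal shares of rural male and rural female (and of urban male and urban female) no longer have to be judged from slice areas alone.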
-
Multiple Bar Graph
Category        Population in 2011   Population in 2001
Rural Male      28219760             27937204
Rural Female    28092028             27463863
Urban Male      14290121             10590209
Urban Female    14063624             10218731
Suppose the government of AP is interested in population control.
It wants to know how the population in each category changed over the decade from 2001 to 2011, so that it knows which section to target.
We have a bar plot and a pie chart of the population in 2011.
Draw the same for 2001. Can we compare the two years graphically and readily answer the question? (Check!)
A pie chart or a bar chart allows comparison between categories within a year, but not across years.
To facilitate such a comparison, a multiple bar graph is used.
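The decade change the government is interested in can also be computed directly from the table. A sketch, assuming base R; the matrix `pop` and the vectors `change` and `pct_change` are illustrative names, not part of the original code.

```r
# Population by category (rows) and census year (columns), from the table above
pop <- cbind("2011" = c(28219760, 28092028, 14290121, 14063624),
             "2001" = c(27937204, 27463863, 10590209, 10218731))
rownames(pop) <- c("RuralMale", "RuralFemale", "UrbanMale", "UrbanFemale")

# Absolute change over the decade, per category
change <- pop[, "2011"] - pop[, "2001"]

# Percentage change relative to the 2001 figure
pct_change <- 100 * change / pop[, "2001"]

print(cbind(change, pct_change = round(pct_change, 1)))
```

With these figures the urban categories grow by about 35% (male) and 38% (female), against about 1% and 2% for the rural ones, which is the pattern a multiple bar graph is meant to reveal.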
-
Multiple Bar Graph
Inferences:
The most marked increase in population is in the urban female category, followed closely by urban male.
There is a very slight increase in the rural male and rural female categories.
Possible cause: rural-to-urban migration (needs investigation).
Since every category increased from 2001 to 2011, the total population also increased over the decade.
However, the graph does not show exactly how much the total population increased from 2001 to 2011.
It also gives no information about the relative contribution of each category to the total.
For this, use a segmented, or stacked, bar plot.
What is a multiple bar graph?
A variant of the bar diagram.
Used to compare two or more series of data on the same variable, or to show different components of an item.
Several sets of bars are drawn so that bars for related items (here, the 2001 and 2011 figures for each category) are placed together, and a uniform gap is maintained between any two sets of bars.
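In R, a multiple bar graph can also be drawn straight from a named matrix, without retyping the numbers into a new matrix. A sketch under these assumptions: base R `barplot()` draws one group per column of a matrix, and `beside = TRUE` places the rows of each group side by side. Transposing with `t()` makes the years the rows, so each category gets one 2001 bar and one 2011 bar. The names `pop` and `mid` are illustrative.

```r
# Population by category (rows) and census year (columns)
pop <- cbind("2001" = c(27937204, 27463863, 10590209, 10218731),
             "2011" = c(28219760, 28092028, 14290121, 14063624))
rownames(pop) <- c("RuralMale", "RuralFemale", "UrbanMale", "UrbanFemale")

# t(pop): rows are years, columns are categories, so each category is one
# group holding a 2001 bar and a 2011 bar; legend.text = TRUE labels the
# legend with the row names ("2001", "2011")
mid <- barplot(t(pop) / 1e6, beside = TRUE, col = c("red", "bisque"),
               legend.text = TRUE, ylim = c(0, 35),
               main = "AP population by category, 2001 vs 2011",
               xlab = "Categories", ylab = "population, in millions")
```

With `beside = TRUE`, `barplot()` returns a matrix of bar midpoints with one row per year and one column per category.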
-
Segmented Bar Graph: Year-wise
A segmented bar graph uses a rectangular bar, rather than a circle, to represent the entire data set.
The bar is then divided into segments, with different segments representing different categories.
The size of the segment for a particular category is proportional to the relative frequency of that category.
Inferences:
Gives an idea of the total increase in population from 2001 to 2011.
Also shows the relative share of each category in each year.
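A sketch of the year-wise segmented bar graph in base R, plus a 100% variant that shows each category's relative share directly. Assumptions: `pop` and `shares` are illustrative names, `pop` has categories as rows and years as columns, and `prop.table(pop, margin = 2)` divides each column by its own total.

```r
# Population by category (rows) and census year (columns)
pop <- cbind("2001" = c(27937204, 27463863, 10590209, 10218731),
             "2011" = c(28219760, 28092028, 14290121, 14063624))
rownames(pop) <- c("RuralMale", "RuralFemale", "UrbanMale", "UrbanFemale")
colors <- c("red", "bisque", "darkslategray", "violet")

# One stacked bar per year (column); segment heights are the category counts,
# so the bar height is the total population for that year
barplot(pop / 1e6, col = colors, legend.text = TRUE, xlim = c(0, 4),
        main = "AP population by category, year-wise",
        xlab = "Year", ylab = "population, in millions")

# Relative shares: each column divided by its total, so each bar sums to 100%
shares <- prop.table(pop, margin = 2)
barplot(100 * shares, col = colors, legend.text = TRUE, xlim = c(0, 4),
        main = "Share of each category, year-wise",
        xlab = "Year", ylab = "share of total population (%)")
print(round(100 * shares, 1))
```

The first chart answers "how much did the total grow?", the second "how did each category's share change?"; the extra `xlim` only leaves room on the right for the legend.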
-
Segmented Bar Graph: Category-wise
Suppose we want to compare the categories (rural male vs. rural female, urban male vs. urban female) in both years.
Simple trick: switch the rows and columns of your data, so that you get a segmented bar graph by category.
Inferences:
Rural male is the largest category, followed by rural female.
From 2001 to 2011, the urban male and urban female populations increased relative to the rural male and rural female populations.
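The row/column switch is a matrix transpose. As a minimal sketch, assuming base R: `t()` turns the category-by-year matrix into a year-by-category one, so `barplot()` stacks the 2001 and 2011 figures within one bar per category. `pop` and `pop_by_category` are illustrative names.

```r
# Population by category (rows) and census year (columns)
pop <- cbind("2001" = c(27937204, 27463863, 10590209, 10218731),
             "2011" = c(28219760, 28092028, 14290121, 14063624))
rownames(pop) <- c("RuralMale", "RuralFemale", "UrbanMale", "UrbanFemale")

# Transposing swaps rows and columns: years become rows, categories columns
pop_by_category <- t(pop)

# One stacked bar per category; its two segments are the 2001 and 2011 counts
barplot(pop_by_category / 1e6, col = c("red", "bisque"), legend.text = TRUE,
        ylim = c(0, 70), main = "AP population, category-wise",
        xlab = "Categories", ylab = "population, in millions")
```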
-
Pictorial Presentation or Table?
The objective of a frequency table is to provide insights about the data that cannot be quickly obtained by looking only at the original data, as in the cars example.
There, a pictorial presentation of the data brings no extra gain in information.
But a pictorial presentation is sometimes necessary to depict features of the data that cannot be read readily from a table (refer to Table 2 and the year-wise segmented bar graph).
Recall some of the questions we have asked:
What is the relative share of urban male, urban female, rural male and rural female in the population of Andhra Pradesh in 2011?
By how much did the total population increase from 2001 to 2011?
What were the combined urban male, urban female, rural male and rural female populations for the years 2001 and 2011?
You cannot readily answer such questions by looking at just the frequency table, which only gives the frequency of each category. However, you will be able to answer questions like:
Which is the most frequently occurring category in 2001 and in 2011?
Was there an increase in the frequency of urban male, urban female, rural male and rural female from 2001 to 2011?
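The totals questions above are quick to answer numerically once the table is in R. A sketch assuming base R, with illustrative names `pop`, `totals`, `increase`, `pct_increase` and `share_2011`: `colSums()` adds up the categories within each year.

```r
# Population by category (rows) and census year (columns)
pop <- cbind("2001" = c(27937204, 27463863, 10590209, 10218731),
             "2011" = c(28219760, 28092028, 14290121, 14063624))
rownames(pop) <- c("RuralMale", "RuralFemale", "UrbanMale", "UrbanFemale")

# Combined population of all four categories, per year
totals <- colSums(pop)

# Increase in the total over the decade, absolute and in percent
increase <- totals[["2011"]] - totals[["2001"]]
pct_increase <- 100 * increase / totals[["2001"]]

# Relative share of each category in 2011, in percent
share_2011 <- 100 * pop[, "2011"] / totals[["2011"]]

print(totals)
print(c(increase = increase, pct_increase = round(pct_increase, 1)))
print(round(share_2011, 1))
```

With the Andhra Pradesh figures, the totals come to 76,210,007 (2001) and 84,665,533 (2011), an increase of 8,455,526, or about 11.1%.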
-
R-Codes
# Creating data: rows are population categories, columns are census years (2011, then 2001)
APPopulation = cbind(c(28219760,28092028,14290121,14063624),
c(27937204,27463863,10590209,10218731))
rownames(APPopulation) = c("RuralMale","RuralFemale","UrbanMale","UrbanFemale")
colnames(APPopulation) = c("2011","2001")
#barplot
colors=c("red", "bisque", "darkslategray", "violet")
barplot(APPopulation[,"2011"]/1000000,col=colors)
title(main="Barplot of AP Population in 2011 (in millions)")
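# Multiple bar graph
# Row 1 holds the 2001 figures, row 2 the 2011 figures; the columns are
# UrbanFemale, UrbanMale, RuralFemale, RuralMale (the reverse of rownames(APPopulation))
A = matrix(
c(10218731,10590209,27463863,27937204,14063624,14290121,28092028,28219760), # the data elements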
nrow=2, # number of rows
ncol=4, # number of columns
byrow = TRUE) # fill matrix by rows
colors=c("red", "bisque")
barplot(A/1000000, names.arg=rev(rownames(APPopulation)), legend.text=c(2001,2011),
        beside=TRUE, main="Distribution of population by category",
        xlab="Categories", ylab="population, in millions", ylim=c(0,80), col=colors)
-
R-Codes (Continued)
# Segmented bar graph (year-wise)
# One colour per category (four segments per bar)
colors=c("red", "bisque", "darkslategray", "violet")
barplot(APPopulation/1000000, main="Distribution of population by category, year-wise",
        xlab="Year", ylab="population, in millions", col=colors,
        legend.text = rownames(APPopulation))
# Segmented bar graph (categorywise)
colors=c("red", "bisque")
A = matrix(
c(10218731,10590209,27463863,27937204,14063624,14290121,28092028,28219760), # the data elements
nrow=2,
ncol=4,
byrow = TRUE)
barplot(A/1000000, names.arg=rev(rownames(APPopulation)), legend.text=c(2001,2011),
        beside=FALSE, main="Distribution of population by category",
        xlab="Categories", ylab="population, in millions", ylim=c(0,90), col=colors)
# Pie Chart
colors=c("red", "bisque", "darkslategray", "violet")
slices
-
Thank you