(5) graphical presentation 1

Uploaded by asclabisb; posted on 14-Apr-2018


  • 7/30/2019 (5) Graphical Presentation 1


    Applied Statistics and Computing Lab

    GRAPHICAL PRESENTATIONS 1 (REPRESENTATION OF CATEGORICAL DATA)

    Applied Statistics and Computing Lab

    Indian School of Business



    Learning Goals

    Why do we use graphs?

    Basic Graphs for Categorical Data


    Why use graphs? A market survey firm is conducting a survey on the popularity of different makes of car in the USA. For this, it visits various car showrooms and lists the car makes found in each showroom.

    Suppose it obtains the following data on the makes of 28 cars in one such showroom: Buick, Cadillac, Buick, Chevrolet, Buick, Buick, Buick, Pontiac, Cadillac, Chevrolet, SAAB, SAAB, SAAB, Cadillac, Chevrolet, Pontiac, Buick, Buick, Buick, SAAB, Cadillac, SAAB, Pontiac, SAAB, Chevrolet, Buick, SAAB, Cadillac.

    Relevant questions: Which is the most common car make in the showroom?

    Which is the least common car make in the showroom?

    How do we answer these questions?

    Probably the first thing you would do is count the observations for each car make and note them down. Once the counts are noted down in tabular form, they make up a frequency table.

    A frequency table for categorical data is a table that displays the possible categories along with the associated frequencies or relative frequencies.
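This is a minimal R sketch, an illustration and not code from the original slides. It builds the frequency table for the showroom data: table() counts each category, and prop.table() converts the counts into relative frequencies. The names cars, freq and rel_freq are illustrative.

```r
# Makes of the 28 cars recorded in one showroom (data from the slide above)
cars <- c("Buick", "Cadillac", "Buick", "Chevrolet", "Buick", "Buick", "Buick",
          "Pontiac", "Cadillac", "Chevrolet", "SAAB", "SAAB", "SAAB", "Cadillac",
          "Chevrolet", "Pontiac", "Buick", "Buick", "Buick", "SAAB", "Cadillac",
          "SAAB", "Pontiac", "SAAB", "Chevrolet", "Buick", "SAAB", "Cadillac")

# Frequency table: number of cars of each make
freq <- table(cars)

# Relative frequency table: share of each make among all 28 cars
rel_freq <- prop.table(freq)

print(freq)
print(round(rel_freq, 3))

# Most and least common makes
print(names(which.max(freq)))
print(names(which.min(freq)))
```

The counts it prints should agree with Table 1, which follows.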


    Table 1: Frequency Table

    Make          Frequency
    Buick         9
    Cadillac      5
    Chevrolet     4
    Pontiac       3
    SAAB          7

    At a glance, the answers are there!

    Buick is the most common make in that showroom and Pontiac is the least common. A frequency table is the simplest representation of data in tabular form.

    This same data can be represented pictorially in a number of ways!


    Different types of graphs for representing categorical data

    Graphs for presenting categorical data:

    Frequency Table

    Bar Chart (including the Multiple Bar and the Divided or Segmented Bar)

    Pie Chart


    Why different graphs? Illustration through examples

    Consider the following frequency table of the population of Andhra Pradesh in 2011:

    Table 2: Population of Andhra Pradesh by category, 2011

    Category        Population in 2011
    Rural Male      28219760
    Rural Female    28092028
    Urban Male      14290121
    Urban Female    14063624


    Bar Chart

    Inferences:

    For policy reasons, one is interested in the composition of the population of Andhra Pradesh.

    From a bar chart, we can read off the most and least frequently occurring categories: rural male has the highest count, urban female the lowest.

    However, one may also be interested in the relative share of each category rather than the absolute figures.

    Visually, from the bar chart, rural male and rural female seem almost equal, and so do urban male and urban female.

    To analyze their relative shares, we look at a pie chart.

    A bar chart is a graph of the frequency distribution of categorical data.

    Each category in the frequency distribution is represented by a bar or rectangle.

    The bars have equal widths, so the height (and hence the area) of each bar is proportional to the corresponding frequency.

    A bar chart may be vertical or horizontal.

    The bar chart of the 2011 data shown on this slide is vertical.
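For illustration, here is a short R sketch that is not from the original deck. It draws the car-make frequencies from Table 1 first as a vertical and then as a horizontal bar chart with base R's barplot(). The colour and the object names are arbitrary choices.

```r
# Frequencies from Table 1 (makes of 28 cars in one showroom)
car_freq <- c(Buick = 9, Cadillac = 5, Chevrolet = 4, Pontiac = 3, SAAB = 7)

# Vertical bar chart: one bar per make, bar height = frequency.
# barplot() invisibly returns the x-positions of the bar midpoints.
mids <- barplot(car_freq, col = "steelblue",
                main = "Car makes in one showroom", ylab = "Frequency")

# The same data as a horizontal bar chart; las = 1 keeps the make labels horizontal
barplot(car_freq, horiz = TRUE, las = 1, col = "steelblue",
        main = "Car makes in one showroom", xlab = "Frequency")
```

Because barplot() gives every bar the same width by default, comparing bar lengths is the same as comparing frequencies.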


    Pie Chart

    A circle is used to represent the whole data set.

    Slices of the pie represent the possible categories.

    The area of the slice for a particular category is proportional to the corresponding relative frequency.

    When to use: categorical data with a relatively small number of categories.

    In case of many categories, merge a few categories into one.

    Most useful for illustrating the proportions of the whole data set that fall in the various categories.

    Criticism: not an effective visual display if one doesn't specify the percentages.
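This minimal R sketch is not part of the original slides. It answers the criticism above by labelling each slice of a pie chart with its percentage share, using the 2011 Andhra Pradesh figures from Table 2. The label format and object names are arbitrary choices.

```r
# 2011 population of Andhra Pradesh by category (Table 2)
pop2011 <- c(RuralMale = 28219760, RuralFemale = 28092028,
             UrbanMale = 14290121, UrbanFemale = 14063624)

# Relative frequency of each category, in percent
share <- 100 * pop2011 / sum(pop2011)

# Slice labels such as "RuralMale (33.3%)"
labels <- paste0(names(share), " (", round(share, 1), "%)")

# pie() makes slice areas proportional to the values, i.e. to the relative frequencies
pie(pop2011, labels = labels,
    col = c("red", "bisque", "darkslategray", "violet"),
    main = "AP population by category, 2011")

print(round(share, 1))
```

Rural male and rural female each come out at about a third of the 2011 population (33.3% and 33.2%). Urban male and urban female come out at about a sixth each (16.9% and 16.6%).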


    Multiple Bar Graph

    Category        Population in 2011    Population in 2001
    Rural Male      28219760              27937204
    Rural Female    28092028              27463863
    Urban Male      14290121              10590209
    Urban Female    14063624              10218731

    Suppose the government of AP is interested in population control. It wants to know how the population in each category changed over the decade from 2001 to 2011, so that it knows which section to target.

    We have a bar plot and a pie chart of the population in 2011. Draw the same for 2001. Can we compare the two years graphically and readily answer the questions? (Check!)

    A pie chart and a bar chart allow comparison between categories within a year, but not across years.

    To facilitate such a comparison, a multiple bar graph is used.


    Multiple Bar Graph

    Inferences:

    The most marked increase in population is in the urban female category, followed closely by urban male.

    There is a very slight increase in the rural male and rural female categories.

    A possible cause is rural-to-urban migration; this needs investigation.

    Since every category increased from 2001 to 2011, the total population also increased from 2001 to 2011.

    However, the graph does not show exactly how much the total population increased from 2001 to 2011.

    It also gives no information about the relative contribution of each category to the total.

    For this, use a segmented or stacked bar plot.
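The inferences above can be checked numerically. This minimal R sketch is not from the original slides; its object names are illustrative. It computes the absolute and percentage change in each category between 2001 and 2011, plus the change in the total.

```r
# Population by category (table from the slides)
pop2001 <- c(RuralMale = 27937204, RuralFemale = 27463863,
             UrbanMale = 10590209, UrbanFemale = 10218731)
pop2011 <- c(RuralMale = 28219760, RuralFemale = 28092028,
             UrbanMale = 14290121, UrbanFemale = 14063624)

abs_change <- pop2011 - pop2001             # absolute change per category
pct_change <- 100 * abs_change / pop2001    # change relative to the 2001 level
total_change <- sum(pop2011) - sum(pop2001) # change in the total population

print(abs_change)
print(round(pct_change, 1))
print(total_change)
```

Running it shows urban female growing by about 37.6% and urban male by about 34.9%. Rural female grows by about 2.3% and rural male by about 1.0%, and the total increase is 8455526. This matches the reading of the multiple bar graph.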

    What is a multiple bar graph?

    A variant of the bar diagram.

    Used to compare two or more series of data on the same variable, or to show different components of an item.

    Several sets of bars are drawn so that the bars for a particular period (here 2001 and 2011) or related phenomenon are put together, and a uniform gap is maintained between any two sets of bars.
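This short R sketch groups a multiple bar graph by period, as described above. It is an alternative layout and not the deck's own code, which groups by category using a hand-built matrix. barplot() with beside = TRUE draws one group per matrix column, so making the years the columns puts the four category bars for each year together. The object names and the ylim value are illustrative choices.

```r
# Table from the slides: rows = categories, columns = census years
APPopulation <- cbind("2001" = c(27937204, 27463863, 10590209, 10218731),
                      "2011" = c(28219760, 28092028, 14290121, 14063624))
rownames(APPopulation) <- c("RuralMale", "RuralFemale", "UrbanMale", "UrbanFemale")

# beside = TRUE draws the bars of each column (year) side by side,
# with a uniform gap between the two groups of bars
mids <- barplot(APPopulation / 1e6, beside = TRUE,
                col = c("red", "bisque", "darkslategray", "violet"),
                legend.text = rownames(APPopulation),
                ylim = c(0, 40),
                main = "AP population by category, 2001 and 2011",
                xlab = "Year", ylab = "Population, in millions")
```

With beside = TRUE, barplot() returns a matrix of bar midpoints with the same shape as the data (4 categories by 2 years).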


    Segmented Bar Graph: Year-wise

    Inferences:

    Gives an idea of the total increase in population from 2001 to 2011.

    Also shows the relative share of each category in each year.

    A segmented bar graph uses a rectangular bar rather than a circle to represent the entire data set.

    The bar is then divided into segments, with different segments representing different categories.

    The size of the segment for a particular category is proportional to the relative frequency of that category.
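This minimal R sketch scales each year's bar to 100%, so that segment heights are exactly the relative frequencies. It is a variant, not the deck's own code, which stacks absolute counts. prop.table() with margin = 2 divides each column by its column total. The object names and the xlim padding for the legend are arbitrary choices.

```r
# Table from the slides: rows = categories, columns = census years
APPopulation <- cbind("2001" = c(27937204, 27463863, 10590209, 10218731),
                      "2011" = c(28219760, 28092028, 14290121, 14063624))
rownames(APPopulation) <- c("RuralMale", "RuralFemale", "UrbanMale", "UrbanFemale")

# Relative frequency of each category within each year (each column sums to 1)
shares <- prop.table(APPopulation, margin = 2)
print(round(100 * shares, 1))

# One segmented bar per year; each segment's height is that category's share.
# xlim leaves room on the right for the legend.
barplot(100 * shares, col = c("red", "bisque", "darkslategray", "violet"),
        legend.text = rownames(shares), xlim = c(0, 4),
        main = "Composition of AP population by year",
        xlab = "Year", ylab = "Share of population (%)")
```

In this version the combined urban share rises from about 27.3% in 2001 to about 33.5% in 2011.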


    Segmented Bar Graph: Category-wise

    Rural male is the most dominant category, followed by rural female.

    From 2001 to 2011, the urban male and urban female populations increased relative to the rural male and rural female populations.

    Suppose we want to compare the categories (rural male vs. rural female, urban male vs. urban female) in both years.

    Simple trick: switch the rows and columns in your data, so that you get a category-wise segmented bar graph.
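The row/column switch can be sketched in R with t(). It transposes the matrix so that each column, and therefore each stacked bar, is one category split into a 2001 segment and a 2011 segment. This is a sketch, not the deck's own code, which builds the transposed matrix by hand. The object names and the ylim value are illustrative.

```r
# Table from the slides: rows = categories, columns = census years
APPopulation <- cbind("2001" = c(27937204, 27463863, 10590209, 10218731),
                      "2011" = c(28219760, 28092028, 14290121, 14063624))
rownames(APPopulation) <- c("RuralMale", "RuralFemale", "UrbanMale", "UrbanFemale")

# Switch rows and columns: rows = years, columns = categories (in millions)
by_category <- t(APPopulation) / 1e6

# One stacked bar per category, divided into year segments
barplot(by_category, beside = FALSE, col = c("red", "bisque"),
        legend.text = rownames(by_category), ylim = c(0, 70),
        main = "AP population by category (category-wise)",
        xlab = "Category", ylab = "Population, in millions")
```

Setting beside = TRUE on the same transposed matrix would instead give a multiple bar graph grouped by category.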


    Pictorial Presentation or Table?

    The objective of a frequency table is to provide insights about the data that cannot be quickly obtained by looking only at the original data, as in the cars example.

    There, a pictorial presentation of the data adds no extra information.

    But a pictorial presentation is sometimes necessary to depict features of the data that are not readily readable from a table: refer to Table 2 and slide 11.

    Recall some of the questions we have asked:

    What is the relative share of urban male, urban female, rural male and rural female in the population of Andhra Pradesh in 2011?

    By how much did the total population increase from 2001 to 2011?

    What were the combined urban male, urban female, rural male and rural female populations for the years 2001 and 2011?

    You cannot readily answer such questions by looking at just the frequency table, which only gives information about the frequency of each category. However, you will be able to answer questions like:

    Which is the most frequently occurring category in 2001 and in 2011?

    Was there an increase in the frequency of urban male, urban female, rural male and rural female from 2001 to 2011?


    R-Codes

    # Creating data: rows = categories, columns = census years
    APPopulation = cbind(c(28219760,28092028,14290121,14063624),
                         c(27937204,27463863,10590209,10218731))
    rownames(APPopulation) = c("RuralMale","RuralFemale","UrbanMale","UrbanFemale")
    colnames(APPopulation) = c("2011","2001")

    # Bar plot of the 2011 population (in millions)
    colors = c("red", "bisque", "darkslategray", "violet")
    barplot(APPopulation[,"2011"]/1000000, col=colors)
    title(main="Barplot of AP Population in 2011 (in millions)")

    # Multiple Bar Graph: row 1 = 2001, row 2 = 2011; columns run
    # UrbanFemale, UrbanMale, RuralFemale, RuralMale (hence rev() below)
    A = matrix(
      c(10218731,10590209,27463863,27937204,14063624,14290121,28092028,28219760), # the data elements
      nrow=2,       # number of rows
      ncol=4,       # number of columns
      byrow = TRUE) # fill matrix by rows
    colors = c("red", "bisque")
    barplot(A/1000000, names.arg=rev(rownames(APPopulation)), legend.text=c(2001,2011),
            beside=TRUE, main="Distribution of population by category",
            xlab="Categories", ylab="population, in millions", ylim=c(0,80), col=colors)


    R-Codes (Continued)

    # Segmented bar graph (year-wise): one stacked bar per year
    colors = c("red", "bisque", "darkslategray", "violet")
    barplot(APPopulation/1000000, main="Distribution of population by category yearwise",
            xlab="Year", ylab="population, in millions", col=colors,
            legend.text = rownames(APPopulation))

    # Segmented bar graph (category-wise): same matrix A as above,
    # stacked (beside=FALSE) instead of side by side
    colors = c("red", "bisque")
    A = matrix(
      c(10218731,10590209,27463863,27937204,14063624,14290121,28092028,28219760), # the data elements
      nrow=2,
      ncol=4,
      byrow = TRUE)
    barplot(A/1000000, names.arg=rev(rownames(APPopulation)), legend.text=c(2001,2011),
            beside=FALSE, main="Distribution of population by category",
            xlab="Categories", ylab="population, in millions", ylim=c(0,90), col=colors)

    # Pie Chart
    colors = c("red", "bisque", "darkslategray", "violet")
    slices


    Thank you
