1 introduction to statistics_handouts

Upload: easwar-kumar

Post on 06-Jul-2018


  • 8/16/2019 1 Introduction to Statistics_handouts

    1/43

    7/20/20

    Introduction to Statistics

    By Dr. Vishal Singh Patyal

    Learning Objectives

    In this chapter, you will learn:

    • What is Statistics?
    • Why study Statistics?
    • Basic vocabulary used in Statistics
    • How statistics is used in business
    • Sources of data and the types of data used in business
    • Types of variables
    • Levels of measurement
    • Tabular and graphical presentation of data


    What is Statistics?

    The science of collecting, describing, and interpreting data.

    “Statistics is a way to get information from data”

    Data → Statistics → Information

    Data: Facts, especially numerical facts, collected together for reference or information.

    Information: Knowledge communicated concerning some particular fact.

    Statistics is a tool for creating new understanding from a set of numbers.

    Why Study Statistics?

    Decision Makers Use Statistics To:

    • Present and describe business data and information properly
    • Draw conclusions about large populations, using information collected from samples
    • Make reliable forecasts about a business activity
    • Improve business processes

    Types of Statistics

    Statistics: The branch of mathematics that transforms data into useful information for decision makers.

    Descriptive Statistics: Collecting, summarizing, and describing data

    Inferential Statistics: Drawing conclusions and/or making decisions concerning a population based only on sample data

    Descriptive Statistics

    • Collect data, e.g. a survey
    • Present data, e.g. tables and graphs
    • Characterize data, e.g. the sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

    Collect → Organize → Summarize → Display → Analyze
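The sample-mean formula can be checked with a few lines of Python. This is a minimal sketch: `sample_mean` is a hypothetical helper name, and the input reuses the example student marks {67, 74, 71, 83, 93, 55, 48} from the vocabulary slides. Python's own `statistics.mean` gives the same result.

```python
def sample_mean(values):
    """Return the sample mean: the sum of the n observations X_i divided by n."""
    if not values:
        raise ValueError("the sample must contain at least one observation")
    return sum(values) / len(values)


marks = [67, 74, 71, 83, 93, 55, 48]  # example student marks (n = 7, sum = 491)
print(round(sample_mean(marks), 2))   # 491 / 7, printed as 70.14
```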

    Inferential Statistics

    • Estimation, e.g. estimate the population mean weight using the sample mean weight
    • Hypothesis testing, e.g. test the claim that the population mean weight is 120 pounds

    Drawing conclusions and/or making decisions concerning a population based on sample results:
    • Predict and forecast values of population parameters
    • Test hypotheses about values of population parameters
    • Make decisions

    Example

    A recent study examined the QUANT and VERBAL CAT scores of students across the country. Which of the following statements are descriptive in nature and which are inferential? (D = descriptive, I = inferential)

    • The mean QUANT CAT score was 492. (D)
    • The mean VERBAL CAT score was 475. (D)
    • Students in the Northeast scored higher in QUANT but lower in VERBAL. (I)
    • 80% of all students taking the exam were headed for IIMs. (I)
    • 32% of the students scored above 610 on the VERBAL CAT. (D)
    • The QUANT CAT scores are higher than they were 10 years ago. (I)

    Basic Vocabulary of Statistics

    Population: A population consists of all the items or individuals about which you want to draw a conclusion. A population is the group of all items of interest to a statistics practitioner; it is frequently very large, sometimes infinite.
    E.g. the entire Indian population of 1.252 billion, i.e. census data.

    Sample: A subset of the population. A sample is a set of data drawn from the population; it is potentially very large, but smaller than the population.
    E.g. a sample of 765 voters exit polled on election day.

    Basic Vocabulary of Statistics

    Population → Parameter: measures used to describe the population are called parameters.
    Sample (a subset of the population) → Statistic: measures computed from sample data are called statistics.

    Basic Vocabulary of Statistics

    Variable: A variable is some characteristic of a population or sample, e.g. student grades. It is typically denoted with a capital letter: A, B, C…
    The values of the variable are the range of possible values for a variable, e.g. student marks (0 to 100).

    Data: Data are the observed values of a variable, i.e. the different values associated with a variable, e.g. student marks: {67, 74, 71, 83, 93, 55, 48}.

    Example: The NICMAR Institute dean is interested in learning about the average age of the faculty. Identify the basic terms in this situation.

    • The population is the age of all faculty members at the Institute.
    • A sample is any subset of that population. For example, we might select 10 faculty members and determine their age.
    • The variable is the “age” of each faculty member.
    • The data would be the set of values in the sample.
    • The parameter of interest is the “average” age of all faculty at the Institute.
    • The statistic is the “average” age of the faculty in the sample.
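A small simulation can make the parameter/statistic distinction concrete. The sketch below assumes a hypothetical faculty of 20 members with made-up ages (not NICMAR data): the mean of all 20 ages is the parameter, and the mean of a random sample of 10 is a statistic that estimates it. The random seed is fixed only so that the run is repeatable.

```python
import random
from statistics import mean

# Hypothetical ages of all 20 faculty members (the population); not real data.
population_ages = [34, 41, 29, 52, 47, 38, 60, 45, 36, 50,
                   43, 31, 55, 39, 48, 42, 58, 33, 46, 37]

parameter = mean(population_ages)              # population mean age: a parameter

rng = random.Random(42)                        # fixed seed so the sketch is repeatable
sample_ages = rng.sample(population_ages, 10)  # simple random sample of 10 members
statistic = mean(sample_ages)                  # sample mean age: a statistic

print(f"parameter (population mean age): {parameter:.2f}")
print(f"statistic (sample mean age):     {statistic:.2f}")
```

Changing the seed draws a different sample, so the statistic changes from run to run while the parameter stays at 43.2.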

    Why Collect Data?

    • A marketing research analyst needs to assess the effectiveness of a new television advertisement.
    • A pharmaceutical manufacturer needs to determine whether a new drug is more effective than those currently in use.
    • An operations manager wants to monitor a manufacturing process to find out whether the quality of the product being manufactured conforms to company standards.
    • An auditor wants to review the financial transactions of a company in order to determine whether the company is in compliance with generally accepted accounting principles.

    Sources of Data

    Primary sources: The data collector is the one using the data for analysis.
    • Data from a political survey
    • Data collected from an experiment
    • Observed data

    Secondary sources: The person performing data analysis is not the data collector.
    • Analyzing census data
    • Examining data from print journals or data published on the internet

    Types of Variables

    Data
    • Categorical (defined categories). Examples: marital status, political party, eye color
    • Numerical, discrete (counted items). Examples: number of children, defects per hour
    • Numerical, continuous (measured characteristics). Examples: weight, voltage

    Types of Variables

    Categorical: Qualitative variables have values that can only be placed into categories, such as “yes” and “no.” A categorical variable categorizes or describes an element of a population.
    Note: Arithmetic operations, such as addition and averaging, are not meaningful for data resulting from a qualitative variable.

    Numerical: Quantitative variables have values that represent quantities. A numerical variable quantifies an element of a population.
    Note: Arithmetic operations, such as addition and averaging, are meaningful for data resulting from a quantitative variable.
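The note on arithmetic can be illustrated with Python's `statistics` module. In this sketch the data are hypothetical: counting categories with `mode` is meaningful for a categorical variable, averaging with `mean` is meaningful for a numerical one, and `statistics.mean` in fact raises `TypeError` when given category labels.

```python
from statistics import mean, mode

# Hypothetical data, for illustration only.
eye_colour = ["brown", "blue", "brown", "green", "brown"]  # categorical variable
children = [0, 2, 1, 3, 2]                                 # numerical (discrete) variable

print(mode(eye_colour))  # most frequent category: brown
print(mean(children))    # average number of children: 1.6

try:
    mean(eye_colour)     # averaging labels is not meaningful
except TypeError:
    print("mean() rejects categorical data: averaging categories is not meaningful")
```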

    Example: Identify each of the following examples as attribute (qualitative) or numerical (quantitative) variables.

    • The amount of CNG pumped by the next 10 customers at the local HP pump. (Numerical)
    • The amount of radon in the basement of each of 25 homes in a new development. (Numerical)
    • The color of the baseball cap worn by each of 20 students. (Attribute)
    • The length of time to complete a mathematics homework assignment. (Numerical)
    • The state in which each truck is registered when stopped and inspected at a weigh station. (Attribute)

    Question?

    Identify each of the following as examples of qualitative or quantitative variables:

    • The temperature in Barrow, Alaska at 12:00 pm on any given day.
    • The make of automobile driven by each faculty member.
    • Whether or not a 6 volt lantern battery is defective.
    • The weight of a lead pencil.
    • The length of time billed for a long distance telephone call.
    • The brand of cereal children eat for breakfast.
    • The type of book taken out of the library by an adult.

    Level of Measurement

    The four levels of measurement, from weakest to strongest (mnemonic: NOIR):
    • Nominal
    • Ordinal
    • Interval
    • Ratio

    Nominal scale

    A nominal scale classifies data into distinct categories in which no ranking is implied.

    Categorical variable            Categories
    Personal computer ownership     Yes / No
    Type of stocks owned            Growth, Value, Other
    Internet provider               Microsoft Network / AOL

    Ordinal scale

    An ordinal scale classifies data into distinct categories in which ranking is implied.

    Categorical variable             Ordered categories
    Student class designation        Freshman, Junior, Senior
    Product satisfaction             Satisfied, Neutral, Unsatisfied
    Faculty rank                     Professor, Associate Professor, Assistant Professor, Instructor
    Standard & Poor’s bond ratings   AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
    Student grades                   A, B, C, D, F

    Interval scale

    An interval scale is an ordered scale in which the difference between measurements is a meaningful quantity but the measurements do not have a true zero point.

    Example: the difference between 1 and 2 years of age is the same amount as the difference between 21 and 22 years of age, or 50 and 51, or 65 and 66. The difference between a height of 60 inches and a height of 55 inches is the same amount of difference as between a height of 72 inches and a height of 67 inches. (Age and height also have a true zero, so strictly they are ratio-level; temperature in °C or °F is a standard example of a variable that is interval-level only.)

    NOTE: For interval level variables, it is mathematically legitimate to count the values, sort or rank them, and add and subtract them (so averages are meaningful), but ratios between values are not meaningful.

    Ratio scale

    A ratio scale is an ordered scale in which the difference between the measurements is a meaningful quantity. Ratio level variables have the additional property of a true zero value, so ratios between values are meaningful; practically speaking, however, ratio level data is treated the same as interval level data.

    Example: number of clients in the past six months. It is meaningful to say that “...we had twice as many clients in this period as we did in the previous six months.”
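To see why ratios need a true zero, compare an interval-level variable with a ratio-level one. This sketch assumes temperature in degrees Celsius and Fahrenheit as the interval example (it is not one of the slide examples) and uses hypothetical client counts for the ratio scale.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


a, b = 10.0, 20.0  # two hypothetical temperatures in degrees Celsius

# Differences keep their meaning: a 10 degree C gap is always an 18 degree F gap.
print(b - a, celsius_to_fahrenheit(b) - celsius_to_fahrenheit(a))  # 10.0 18.0

# Ratios do not: 20 C / 10 C = 2.0, but the same two temperatures in F give 68 / 50,
# so "twice as hot" depends on where each scale puts its arbitrary zero.
print(b / a, celsius_to_fahrenheit(b) / celsius_to_fahrenheit(a))  # 2.0 1.36

# A count of clients has a true zero, so "twice as many clients" is meaningful.
clients_previous, clients_current = 40, 80  # hypothetical counts
print(clients_current / clients_previous)   # 2.0
```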

    [Figure: Levels of Measurement, Example]

    The Hierarchy of Levels

    • Nominal: attributes are only named; weakest
    • Ordinal: attributes can be ordered
    • Interval: distance is meaningful
    • Ratio: absolute zero
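The hierarchy can be read as a sequence of yes/no questions, each one moving a variable up a level. The function below is a hypothetical helper written for illustration, not a standard library routine; the example variables in the comments are assumptions.

```python
def level_of_measurement(ordered: bool, equal_intervals: bool, true_zero: bool) -> str:
    """Classify a variable by walking up the NOIR hierarchy (hypothetical helper)."""
    if not ordered:
        return "nominal"   # attributes are only named
    if not equal_intervals:
        return "ordinal"   # attributes can be ordered
    if not true_zero:
        return "interval"  # distance is meaningful
    return "ratio"         # distance is meaningful and there is an absolute zero


print(level_of_measurement(False, False, False))  # eye color: nominal
print(level_of_measurement(True, False, False))   # S&P bond rating: ordinal
print(level_of_measurement(True, True, False))    # temperature in degrees C: interval
print(level_of_measurement(True, True, True))     # number of clients: ratio
```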

    [Figure: Level of Measurement, Decision Tree]

    [Figure: Level of Measurement, Characteristics]

    [Figure: Level of Measurement, Statistical Tests]

    Example

    Identify each of the following as examples of (1) nominal, (2) ordinal, (3) discrete, or (4) continuous variables:

    • The length of time until a pain reliever begins to work.
    • The number of chocolate chips in a cookie.
    • The number of colors used in a statistics textbook.
    • The brand of refrigerator in a home.
    • The overall satisfaction rating of a new car.
    • The number of files on a computer’s hard disk.
    • The pH level of the water in a swimming pool.
    • The number of staples in a stapler.

    Class Exercise

    Q1: Determine whether each variable is categorical or numerical. If numerical, determine whether the variable is discrete or continuous. Determine the level of measurement.

    • Amount of money spent on clothing in the past month
    • Favorite department store
    • Most likely time period during which shopping for clothing takes place
    • Number of pairs of shoes owned

    Q2: A manufacturer of dog food was planning to survey households in India to determine the purchasing habits of dog owners. Among the variables to be collected are:

    • The primary place of purchase of dog food
    • Whether dry or moist food is purchased
    • Number of dogs living in the household
    • Whether the dog is pedigreed

    Q3: Suppose the following information was collected from Mr X on his application for a home loan at the HDFC Bank loan department:

    a. Monthly payment: Rs 25,100
    b. Annual family income:
    c. Marital status: Married
    d. Number of job changes in the past 10 years: 2

    Classify each response by type of data and level of measurement.

    Organizing and Visualizing Categorical and Numerical Data

    Categorical Data Are Organized by Utilizing Tables

    Categorical data → tallying data
    • One categorical variable: summary table
    • Two categorical variables: contingency table

    Organizing Categorical Data: Summary Table

    A summary table indicates the frequency, amount, or percentage of items in a set of categories so that you can see differences between categories.

    How do you spend the holidays?   Percent
    At home with family              45%
    Travel to visit family           38%
    Vacation                          5%
    Catching up on work               5%
    Other                             7%
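A summary table is a tally of each category converted to percentages. This sketch rebuilds the holiday table with `collections.Counter`; the 100 raw responses are hypothetical, constructed so that the counts match the published percentages.

```python
from collections import Counter

# Hypothetical raw answers from 100 respondents, built to match the table's percentages.
responses = (["At home with family"] * 45 + ["Travel to visit family"] * 38
             + ["Vacation"] * 5 + ["Catching up on work"] * 5 + ["Other"] * 7)

counts = Counter(responses)  # tally of each category
n = sum(counts.values())     # total number of responses

for category, count in counts.items():  # categories in order of first appearance
    print(f"{category:<24}{count / n:>6.0%}")
```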

    Contingency Table

    • Used to study patterns that may exist between the responses of two or more categorical variables
    • Cross tabulates, or tallies jointly, the responses of the categorical variables
    • For two variables, the tallies for one variable are located in the rows and the tallies for the second variable are located in the columns

    Contingency Table - Example

    A random sample of 400 invoices is drawn. Each invoice is categorized as a small, medium, or large amount. Each invoice is also examined to identify whether there are any errors. These data are then organized in the contingency table below.

    Contingency Table Showing Frequency of Invoices Categorized by Size and the Presence of Errors

                     No Errors   Errors   Total
    Small Amount         170        20      190
    Medium Amount        100        40      140
    Large Amount          65         5       70
    Total                335        65      400

    Contingency Table Based on % of Overall Total

                     No Errors   Errors    Total
    Small Amount       42.50%     5.00%    47.50%
    Medium Amount      25.00%    10.00%    35.00%
    Large Amount       16.25%     1.25%    17.50%
    Total              83.75%    16.25%   100.0%

    42.50% = 170 / 400, 25.00% = 100 / 400, 16.25% = 65 / 400

    83.75% of sampled invoices have no errors and 47.50% of sampled invoices are for small amounts.

    Contingency Table Based on % of Row Totals

                     No Errors   Errors    Total
    Small Amount       89.47%    10.53%   100.0%
    Medium Amount      71.43%    28.57%   100.0%
    Large Amount       92.86%     7.14%   100.0%
    Total              83.75%    16.25%   100.0%

    89.47% = 170 / 190, 71.43% = 100 / 140, 92.86% = 65 / 70

    Medium invoices have a larger chance (28.57%) of having errors than small (10.53%) or large (7.14%) invoices.

    Contingency Table Based on Percentage of Column Total

                     No Errors   Errors    Total
    Small Amount       50.75%    30.77%    47.50%
    Medium Amount      29.85%    61.54%    35.00%
    Large Amount       19.40%     7.69%    17.50%
    Total             100.0%    100.0%    100.0%

    50.75% = 170 / 335, 30.77% = 20 / 65

    There is a 61.54% chance that invoices with errors are of medium size.
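All three percentage tables come from the same invoice counts, divided each time by a different total: the grand total, the row total, or the column total. The sketch below recomputes them from the contingency table; `pct` is a hypothetical helper that rounds to two decimals as the slides do.

```python
# Counts from the invoice contingency table: (no errors, errors) for each invoice size.
table = {
    "Small": (170, 20),
    "Medium": (100, 40),
    "Large": (65, 5),
}

grand_total = sum(sum(row) for row in table.values())    # 400 invoices
col_totals = [sum(col) for col in zip(*table.values())]  # [335, 65]


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


for size, (no_err, err) in table.items():
    row_total = no_err + err
    print(f"{size:<7}"
          f" total%: {pct(no_err, grand_total):6.2f} {pct(err, grand_total):6.2f}"
          f" | row%: {pct(no_err, row_total):6.2f} {pct(err, row_total):6.2f}"
          f" | column%: {pct(no_err, col_totals[0]):6.2f} {pct(err, col_totals[1]):6.2f}")
```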

    Tables Used for Organizing Numerical Data

    Numerical data
    • Ordered array
    • Frequency distributions
    • Cumulative distributions

    Organizing Numerical Data: Ordered Array

    An ordered array is a sequence of data, in rank order, from the smallest value to the largest value.

    Age of Surveyed College Students
    Day students:   16 17 17 18 18 18 19 19 20 20 21 22 22 25 27 32 38 42
    Night students: 18 18 19 19 20 21 23 28 32 33 41 45

    Organizing Numerical Data: Frequency Distribution

    The frequency distribution is a summary table in which the data are arranged into numerically ordered class groupings. You must give attention to selecting the appropriate number of class groupings for the table, determining a suitable width of a class grouping, and establishing the boundaries of each class grouping to avoid overlapping.

    The number of classes depends on the number of values in the data: with a larger number of values, there are typically more classes. In general, a frequency distribution should have at least 5 but no more than 15 classes. To determine the width of a class interval, divide the range (highest value - lowest value) of the data by the number of class groupings desired.

    Organizing Numerical Data: Frequency Distribution Example

    Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature:

    24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27

    Steps
    1. Sort the raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
    2. Find the range: 58 - 12 = 46
    3. Select the number of classes: 5 (usually between 5 and 15)
    4. Compute the class interval (width): 10 (46/5 = 9.2, then round up)
    5. Determine the class boundaries (limits):
       Class 1: 10 to less than 20
       Class 2: 20 to less than 30
       Class 3: 30 to less than 40
       Class 4: 40 to less than 50
       Class 5: 50 to less than 60
    6. Compute the class midpoints: 15, 25, 35, 45, 55
    7. Count the observations and assign them to classes
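The seven steps translate directly into code. This is a minimal sketch using the 20 temperatures from the example; as in the steps, the first lower class limit (10) is chosen by hand, and each class includes its lower limit but excludes its upper limit.

```python
import math

# Daily high temperatures for the 20 sampled winter days.
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41,
         43, 44, 27, 53, 27]

ordered = sorted(temps)                      # step 1: ordered array
data_range = ordered[-1] - ordered[0]        # step 2: 58 - 12 = 46
num_classes = 5                              # step 3: chosen (5 to 15 is typical)
width = math.ceil(data_range / num_classes)  # step 4: 46 / 5 = 9.2, rounded up to 10
first_lower = 10                             # step 5: convenient start at or below the minimum (chosen by hand)

classes = []
for k in range(num_classes):
    lower = first_lower + k * width
    upper = lower + width
    midpoint = (lower + upper) / 2                    # step 6: class midpoint
    count = sum(lower <= t < upper for t in ordered)  # step 7: "lower to less than upper"
    classes.append((lower, upper, midpoint, count))

for lower, upper, midpoint, count in classes:
    print(f"{lower} but less than {upper}: midpoint {midpoint:g}, frequency {count}")
```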

    Organizing Numerical Data: Frequency Distribution Example

    Class                  Midpoint   Frequency
    10 but less than 20       15           3
    20 but less than 30       25           6
    30 but less than 40       35           5
    40 but less than 50       45           4
    50 but less than 60       55           2
    Total                                 20

    Data in ordered array:
    12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

    Organizing Numerical Data: Relative & Percent Frequency Distribution

    Class                  Frequency   Relative Frequency   Percentage
    10 but less than 20        3              .15               15
    20 but less than 30        6              .30               30
    30 but less than 40        5              .25               25
    40 but less than 50        4              .20               20
    50 but less than 60        2              .10               10
    Total                     20             1.00              100

    Organizing Numerical Data: Cumulative Frequency Distribution

    Class                  Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
    10 but less than 20        3           15%               3                    15%
    20 but less than 30        6           30%               9                    45%
    30 but less than 40        5           25%              14                    70%
    40 but less than 50        4           20%              18                    90%
    50 but less than 60        2           10%              20                   100%
    Total                     20          100%
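The relative, percentage, and cumulative columns all follow from the class frequencies. A minimal sketch, using the frequencies 3, 6, 5, 4, 2 from the temperature example and `itertools.accumulate` for the running totals:

```python
from itertools import accumulate

labels = ["10 but less than 20", "20 but less than 30", "30 but less than 40",
          "40 but less than 50", "50 but less than 60"]
freq = [3, 6, 5, 4, 2]  # class frequencies from the temperature example
n = sum(freq)           # 20 observations

relative = [f / n for f in freq]             # 0.15, 0.30, 0.25, 0.20, 0.10
percent = [100 * f / n for f in freq]        # 15, 30, 25, 20, 10
cum_freq = list(accumulate(freq))            # 3, 9, 14, 18, 20
cum_pct = [100 * c / n for c in cum_freq]    # 15, 45, 70, 90, 100

for row in zip(labels, freq, relative, percent, cum_freq, cum_pct):
    print("{:<20} {:>2} {:.2f} {:>5.1f}% {:>3} {:>5.1f}%".format(*row))
```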

    Why Use a Frequency Distribution?

    • It condenses the raw data into a more useful form
    • It allows for a quick visual interpretation of the data
    • It enables the determination of the major characteristics of the data set, including where the data are concentrated / clustered

    Frequency Distributions: Some Tips

    • Different class boundaries may provide different pictures for the same data (especially for smaller data sets).
    • Shifts in data concentration may show up when different class boundaries are chosen.
    • As the size of the data set increases, the impact of alterations in the selection of class boundaries is greatly reduced.
    • When comparing two or more groups with different sample sizes, you must use either a relative frequency or a percentage distribution.

    Visualizing Categorical Data Through Graphical Displays

    Categorical data
    • Summary table for one variable: bar chart, pie chart, Pareto chart
    • Contingency table for two variables: side by side bar chart

    Organizing Categorical Data: Bar Chart

    In a bar chart, a bar shows each category, the length of which represents the amount, frequency, or percentage of values falling into a category.

    [Figure: horizontal bar chart "How Do You Spend the Holidays?" (horizontal axis 0% to 50%): At home with family 45%, Travel to visit family 38%, Vacation 5%, Catching up on work 5%, Other 7%]
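Without a plotting library, a rough text version of the bar chart shows the same idea: bar length proportional to the percentage. This sketch is illustrative only; `text_bar_chart` is a hypothetical helper, and each `#` stands for two percentage points, rounded down.

```python
holidays = {
    "At home with family": 45,
    "Travel to visit family": 38,
    "Vacation": 5,
    "Catching up on work": 5,
    "Other": 7,
}


def text_bar_chart(data, scale=2):
    """Return one line per category; each '#' stands for `scale` percentage points (rounded down)."""
    width = max(len(label) for label in data)
    return [f"{label:<{width}} | {'#' * (value // scale)} {value}%"
            for label, value in data.items()]


print("\n".join(text_bar_chart(holidays)))
```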

    Organizing Categorical Data: Pie Chart

    The pie chart is a circle broken up into slices that represent categories. The size of each slice of the pie varies according to the percentage in each category.

    [Figure: pie chart "How Do You Spend the Holidays?": At home with family 45%, Travel to visit family 38%, Vacation 5%, Catching up on work 5%, Other 7%]
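Each slice's angle is its share of the 360 degrees in a circle, so every percentage point corresponds to 3.6 degrees. A minimal sketch for the holiday data (the helper name `slice_angles` is hypothetical):

```python
holidays = {
    "At home with family": 45,
    "Travel to visit family": 38,
    "Vacation": 5,
    "Catching up on work": 5,
    "Other": 7,
}


def slice_angles(percentages):
    """Map each category's percentage to its pie-slice angle in degrees (100% = 360 degrees)."""
    return {label: pct * 360 / 100 for label, pct in percentages.items()}


for label, angle in slice_angles(holidays).items():
    print(f"{label:<24}{holidays[label]:>3}%  {angle:6.1f} degrees")
```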

    Organizing Categorical Data: Pareto Diagram

    • Used to portray categorical data
    • A bar chart, where categories are shown in descending order of frequency
    • A cumulative polygon is shown in the same graph
    • Used to separate the “vital few” from the “trivial many”
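Building a Pareto diagram needs two computations: sorting the categories by descending frequency and accumulating their percentages for the cumulative line. Because the numbers behind the investment-portfolio chart were not preserved, this sketch uses the holiday summary table instead.

```python
from itertools import accumulate

holidays = {
    "At home with family": 45,
    "Travel to visit family": 38,
    "Vacation": 5,
    "Catching up on work": 5,
    "Other": 7,
}

# Bars in descending order of frequency (ties keep their original order: sorted() is stable).
ordered = sorted(holidays.items(), key=lambda item: item[1], reverse=True)
# Running total of the percentages, plotted as the cumulative line on the same graph.
cumulative = list(accumulate(pct for _, pct in ordered))

for (label, pct), cum in zip(ordered, cumulative):
    print(f"{label:<24}{pct:>4}%{cum:>6}%")
```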

    Organizing Categorical Data: Pareto Diagram

    [Figure: Pareto diagram "Current Investment Portfolio": bars show the % invested in each category (Stocks, Bonds, Savings, CD; left axis 0% to 45%) and a line graph shows the cumulative % invested (right axis 0% to 100%)]

    Visualizing Categorical Data: Side by Side Bar Charts

    The side by side bar chart represents the data from a contingency table.

    [Figure: side by side bar chart "Invoice Size Split Out by Errors & No Errors": for each invoice size (Large, Medium, Small), bars for No Errors and Errors, using the percentages of column totals; horizontal axis 0% to 70%]

    Invoices with errors are much more likely to be of medium size (61.54% vs. 30.77% for small and 7.69% for large invoices).

    Visualizing Numerical Data by Using Graphical Displays

    Numerical data
    • Ordered array: stem-and-leaf display
    • Frequency distributions and cumulative distributions: histogram, polygon, ogive

    Organizing Numerical Data: Stem and Leaf Display

    A stem-and-leaf display organizes data into groups (called stems) so that the values within each group (the leaves) branch out to the right on each row.

    Age of College Students

    Day Students          Night Students
    Stem | Leaf           Stem | Leaf
       1 | 67788899          1 | 8899
       2 | 0012257           2 | 0138
       3 | 28                3 | 23
       4 | 2                 4 | 15

    Age of surveyed college students:
    Day students:   16 17 17 18 18 18 19 19 20 20 21 22 22 25 27 32 38 42
    Night students: 18 18 19 19 20 21 23 28 32 33 41 45
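A stem-and-leaf display splits each value into a stem (here, the tens digit) and a leaf (the units digit). This sketch assumes non-negative integer values and omits stems that have no leaves; `stem_and_leaf` is a hypothetical helper. It reproduces the day-student and night-student displays.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: leaves} with stem = tens digit and leaves = sorted units digits."""
    groups = defaultdict(list)
    for v in sorted(values):          # sorting first yields the ordered array
        stem, leaf = divmod(v, 10)
        groups[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in sorted(groups.items())}


day = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22, 22, 25, 27, 32, 38, 42]
night = [18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 45]

for label, ages in (("Day students", day), ("Night students", night)):
    print(label)
    for stem, leaves in stem_and_leaf(ages).items():
        print(f"  {stem} | {leaves}")
```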


    Visualizing Numerical Data: The Histogram

    • A graph of the data in a frequency distribution is called a histogram.
    • In a histogram there are no gaps between adjacent bars.
    • The class boundaries (or class midpoints) are shown on the horizontal axis.
    • The vertical axis is either frequency, relative frequency, or percentage.
    • Bars of the appropriate heights are used to represent the number of observations within each class.

    [Figure: histogram "Daily High Temperature": frequency (0 to 10) on the vertical axis and class midpoints 15, 25, 35, 45, 55 on the horizontal axis, drawn from the relative and percent frequency distribution of the temperature data]

    Visualizing Numerical Data: The Polygon

    • A percentage polygon is formed by having the midpoint of each class represent the data in that class and then connecting the sequence of midpoints at their respective class percentages.
    • The cumulative percentage polygon, or ogive, displays the variable of interest along the X axis, and the cumulative percentages along the Y axis.
    • Useful when there are two or more groups to compare

    Visualizing Numerical Data: The Frequency Polygon

    [Figure: frequency polygon "Daily High Temperature": frequency (0 to 10) on the vertical axis and class midpoints 15 to 55 on the horizontal axis, drawn from the relative and percent frequency distribution of the temperature data]

    (In a percentage polygon, the vertical axis would be defined to show the percentage of observations per class.)

    Organizing Numerical Data: The Cumulative Percentage Polygon

    [Figure: ogive "Daily High Temperature": cumulative percentage (0 to 100) on the vertical axis and class lower boundaries 10 to 60 on the horizontal axis]

    Class Lower Boundary   % Less Than Lower Boundary
    10                        0
    20                       15
    30                       45
    40                       70
    50                       90
    60                      100
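The points plotted in the percentage polygon and the ogive can be computed from the class limits and frequencies of the temperature example. A minimal sketch: the polygon uses (class midpoint, class percentage), and the ogive uses (lower boundary, % of observations below it), starting at 0% below the first boundary.

```python
from itertools import accumulate

lower_limits = [10, 20, 30, 40, 50]  # class lower limits from the temperature example
width = 10                           # class width
freq = [3, 6, 5, 4, 2]               # class frequencies
n = sum(freq)                        # 20 observations

# Percentage polygon: each class percentage plotted at the class midpoint.
polygon = [(lo + width / 2, 100 * f / n) for lo, f in zip(lower_limits, freq)]

# Ogive: cumulative % of observations below each lower boundary (0% below the first one).
boundaries = lower_limits + [lower_limits[-1] + width]  # 10, 20, ..., 60
cum_pct = [0.0] + [100 * c / n for c in accumulate(freq)]
ogive = list(zip(boundaries, cum_pct))

print(polygon)
print(ogive)
```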

    Scatter Plot Example

    Volume per Day   Cost per Day
          23              125
          26              140
          29              146
          33              160
          38              167
          42              170
          50              188
          55              195
          60              200

    [Figure: scatter plot "Cost per Day vs. Production Volume": volume per day (20 to 70) on the horizontal axis and cost per day (0 to 250) on the vertical axis]

    Time Series

    • A time series plot is used to study patterns in the values of a numeric variable over time.
    • In a time series plot, the numeric variable is measured on the vertical axis and the time period is measured on the horizontal axis.

    Attendance (in millions) at USA amusement/theme parks from 2000-2005

    Year   Year Number   Attendance
    2000        0            317
    2001        1            319
    2002        2            324
    2003        3            322
    2004        4            328
    2005        5            335

    Time Series Example

    [Figure: time series plot "Attendance (in millions) at US Theme Parks": attendance (316 to 336) on the vertical axis and year since 2000 (0 to 6) on the horizontal axis]

    Principles of Excellent Graphs

    • The graph should not distort the data.
    • The graph should not contain unnecessary adornments (sometimes referred to as chart junk).
    • The scale on the vertical axis should begin at zero.
    • All axes should be properly labeled.
    • The graph should contain a title.
    • The simplest possible graph should be used for a given set of data.

    Graphical Errors: Chart Junk

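    Minimum Wage: 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80

    • Bad presentation: the minimum wage values shown with chart junk (unnecessary adornments).
    • Good presentation: a simple bar chart of the minimum wage by year (1960, 1970, 1980, 1990), with a dollar scale from 0 to 4.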

    Graphical Errors: No Relative Basis

    A’s received by students, by class (FR = Freshmen, SO = Sophomore, JR = Junior, SR = Senior)

    • Bad presentation: frequencies (0 to 300) for FR, SO, JR, SR, with no relative basis for comparing the groups.
    • Good presentation: percentages (0% to 30%) for FR, SO, JR, SR.

    Graphical Errors: Compressing the Vertical Axis

    Quarterly Sales (Q1 to Q4)

    • Good presentation: vertical axis from $0 to $50.
    • Bad presentation: vertical axis from $0 to $200, which compresses the bars and hides the differences between quarters.

    Class Exercise 1

    The owner of a restaurant wanted to study the demand for dessert. He decided that, in addition to studying whether dessert was ordered, he would also study the gender of the individual. Data were collected from 600 customers and organized in the following contingency table.

    a. Construct contingency tables based on row, column, and total percentages.
    b. Which type of percentage (row, column, or total) do you think is most informative for each gender?
    c. What conclusions concerning the pattern of dessert ordering can the restaurant owner reach?

                          Gender
    Dessert Ordered    Male   Female   Total
    Yes                  40       96     136
    No                  240      224     464
    Total               280      320     600

    Class Exercise 2

    The following table represents estimated green power sales by renewable energy source, 2008.

    Source                       Percentage
    Geothermal                       2.8
    Hydro                           11.3
    Landfill gas and biomass        28.1
    Solar                            0.2
    Unreported                       2.5
    Wind                            55.1

    a. Construct a bar chart, a pie chart, and a Pareto chart.
    b. What conclusion can you reach about the sources of green power?

    Source: National Renewable Energy Laboratory, 2008

    Class Exercise 3

    Calculate the following:

    a. Divide the data into classes
    b. Absolute frequency
    c. Relative frequency
    d. Percentages
    e. Cumulative frequency
    f. Cumulative percentage
    g. Midpoints
    h. Draw a histogram and a relative frequency polygon

    THANKS