161.120 Introductory Statistics Week 4 Lecture slides

Post on 09-Feb-2016

TRANSCRIPT

  • 161.120 Introductory Statistics: Week 4 Lecture slides

    Exploring Time Series (CAST chapter 4)

    Relationships between Categorical Variables (Text section 6.1, CAST chapter 5)

    Data Presentation (Study Guide: extra notes, section 13)

  • Time Series: What you need to be able to do

    - Plot time series and use least squares to make forecasts
    - Identify and describe in words (in terms that the data collector might understand) the trend and seasonal components in a time series plot

  • What is a Time Series?

    A series of data values recorded (generally at equal time intervals) sequentially in time.

    - Average age at death of people each month in a large city over a period of 5 years
    - Area of rice grown in East Asia each year for the past 15 years
    - Weight of every 100th kiwifruit packed during an 8-hour shift
    - Number of hospital admissions each day over a period of 5 months

  • What is a Time Series?

    There is often a time-related pattern to the variability.

    - A trend towards higher or lower values over time
    - A pattern that repeats regularly

    Ignoring the time ordering and examining the data with dot plots or similar univariate techniques may result in useful information being missed

    Particularly important in business and commerce

  • The importance of plotting

    Can be difficult to get useful information from time series if they are presented in tabular form. Information in a time series is most easily understood from a graphical display.

    A time series plot is a type of dot plot in which the values are displayed as crosses against a vertical axis. The horizontal axis spreads out the crosses in time order. (It can also be thought of as a scatterplot in which the 'explanatory' variable is time.) The successive crosses are often joined by lines.

  • Trend

    Time series data often change systematically over time; this change is called the trend. The trend is the long-term upward or downward movement in the values. For example, time series plots of commodity prices often have an upward trend over a period of years.

    - The trend can be masked by random fluctuations
    - Trend is very important for forecasting future values

  • Smoothing Methods

    Reduce the fluctuations and show the trend more clearly. These methods replace each value in the series with a function of it and the adjacent values.

    Moving averages (also called running means)

    Each value is replaced by the mean of it and the two adjacent values (3-point moving average)

  • Greater smoothing is obtained by using means of more adjacent values.

    Effective at highlighting the trend in the centre of a time series, but cannot be used at the ends since the moving average requires values both before and after each value being smoothed.
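
The 3-point moving average described above can be sketched in a few lines of Python. This is only an illustration: the function name `moving_average_3` and the data values are made up for the example and do not come from the slides. The two end positions are left as `None`, because a moving average needs a value on each side.

```python
def moving_average_3(values):
    """Replace each value by the mean of it and its two neighbours.

    The first and last positions stay None: they have no value on
    one side, so the 3-point average cannot be computed there.
    """
    smoothed = [None] * len(values)
    for i in range(1, len(values) - 1):
        smoothed[i] = (values[i - 1] + values[i] + values[i + 1]) / 3
    return smoothed


# Hypothetical series, invented purely for illustration.
series = [12, 15, 11, 18, 20, 17, 23]
smoothed = moving_average_3(series)

for original, smooth in zip(series, smoothed):
    print(original, smooth)
```

Using means of more adjacent values (for example five instead of three) would smooth more strongly, at the cost of losing more values at each end.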

  • Forecasting: Least squares

    Linear model

    Residuals

    Recode year

  • Quadratic model

    Patterns in residuals

    Forecasting

    Once the equation of a trend line (using least squares) is obtained, insert future time values into the equation to get a forecast.

    Beware forecasting many time periods into the future

    The shape of the actual trend line might be different from your model
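
As a rough sketch of the forecasting steps above (recode the year, fit a least squares line, inspect the residuals, then insert a future time value), the Python below fits a linear trend by hand. The years and values are invented for illustration, and `fit_line` is a hypothetical helper name, not something from the course materials.

```python
def fit_line(t, y):
    """Least squares intercept and slope for y = intercept + slope * t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


# Hypothetical yearly data, invented for this example.
years = [2010, 2011, 2012, 2013, 2014]
values = [52.0, 55.0, 59.0, 60.0, 64.0]

# Recode year as 1, 2, 3, ... so the intercept is easier to interpret.
t = [year - years[0] + 1 for year in years]

intercept, slope = fit_line(t, values)

# Residual = observed value minus fitted value; look for patterns here.
residuals = [yi - (intercept + slope * ti) for ti, yi in zip(t, values)]

# Forecast one period ahead: 2015 is recoded as t = 6.
forecast_2015 = intercept + slope * 6

print(intercept, slope)
print(residuals)
print(forecast_2015)
```

A curved pattern in the residuals would suggest that a straight line is the wrong shape and that a quadratic model may fit better. The further ahead the inserted time value, the less the forecast should be trusted.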

  • Cycles

    Not all increases and decreases can be explained by a smooth trend line. Many time series change in cycles.

    Cyclical Patterns

    - Cycles do not repeat regularly
    - Example: sunspot activity has a cycle of approximately 11 years, but not all cycles are of the same length

    Seasonal Patterns

    - Not usually referred to as cyclical
    - Distinguished by a period that repeats exactly
    - Regular cycles that are strongly related to the calendar
    - Monthly or quarterly data often have a pattern of peaks and troughs that repeats in a similar way each year
    - Important that the most recent value is not interpreted in relation to the immediately preceding value

  • Relationships between Categorical Variables: What we might ask

    Explain why relative frequencies allow better comparison between groups.

    Use stacked and grouped bars in a bar chart to better compare groups.

    Identify whether a table of data is a contingency table.

    Find marginal and conditional proportions from a contingency table to answer questions stated in words.

  • Contingency Tables

    Single rectangular array combining frequency table for each variable

    Example: A study exploring the relationship between hypertension (high blood pressure) and amount of smoking of a sample of 200 people.

    Degree of hypertension   Frequency
    Severe                          44
    Mild                            69
    None                            87
    Total                          200

    Amount of smoking        Frequency
    None                            70
    Moderate                        54
    Heavy                           76
    Total                          200

                             Amount of smoking
    Degree of hypertension         None  Moderate     Heavy     Total
    Severe                           10        14        20        44
    Mild                             20        18        31        69
    None                             40        22        25        87
    Total                            70        54        76       200
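
To illustrate how the contingency table combines the two separate frequency tables, the short Python sketch below stores the hypertension and smoking counts from the example and recovers each frequency table as a set of row or column totals. The variable names are chosen for this example only.

```python
# Counts from the hypertension / smoking example:
# outer key = degree of hypertension, inner key = amount of smoking.
counts = {
    "Severe": {"None": 10, "Moderate": 14, "Heavy": 20},
    "Mild":   {"None": 20, "Moderate": 18, "Heavy": 31},
    "None":   {"None": 40, "Moderate": 22, "Heavy": 25},
}

# Row totals give the frequency table for degree of hypertension.
hypertension_totals = {level: sum(row.values()) for level, row in counts.items()}

# Column totals give the frequency table for amount of smoking.
smoking_levels = ["None", "Moderate", "Heavy"]
smoking_totals = {
    smoke: sum(counts[level][smoke] for level in counts)
    for smoke in smoking_levels
}

grand_total = sum(hypertension_totals.values())

print(hypertension_totals)
print(smoking_totals)
print(grand_total)
```

The row and column totals match the two single-variable frequency tables, and both sets of totals add up to the sample size of 200.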

  • Fully describes categorical data (2 or more groups)

    Poor way to compare distributions if there are different total numbers in the groups

    Can be more informative to use proportions within the groups (each frequency in table is divided by the total for that group)

                              Places
    Means of Transport          Christchurch   Palmerston North    Total
    Private/Company vehicle           111687              21825   133512
    Public transport                    5406                351     5757
    Bicycle                             8667               2013    10680
    Walked / Jogged                     6624               2406     9030
    Other                               9195               2106    11301
    Total                             141579              28701   170280

                              Places
    Means of Transport          Christchurch   Palmerston North
    Private/Company vehicle             0.79               0.76
    Public transport                    0.04               0.01
    Bicycle                             0.06               0.07
    Walked / Jogged                     0.05               0.08
    Other                               0.06               0.07
    Total                                  1                  1
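
The proportions within each group can be reproduced by dividing every frequency by its group total, as described above. The Python below is a small sketch using the transport counts from the table; the dictionary layout and variable names are choices made for this example.

```python
# Frequencies from the transport table, keyed by place then by means of transport.
frequencies = {
    "Christchurch": {
        "Private/Company vehicle": 111687,
        "Public transport": 5406,
        "Bicycle": 8667,
        "Walked / Jogged": 6624,
        "Other": 9195,
    },
    "Palmerston North": {
        "Private/Company vehicle": 21825,
        "Public transport": 351,
        "Bicycle": 2013,
        "Walked / Jogged": 2406,
        "Other": 2106,
    },
}

proportions = {}
for place, table in frequencies.items():
    group_total = sum(table.values())  # total for this place
    # Divide each frequency by the total for its own group.
    proportions[place] = {mode: count / group_total for mode, count in table.items()}

for place, table in proportions.items():
    for mode, proportion in table.items():
        print(place, mode, round(proportion, 2))
```

Because Christchurch has about five times as many respondents as Palmerston North, the raw frequencies are hard to compare, whereas the proportions in each group add to 1 and can be compared directly.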

  • Example 6.1 Smoking and Divorce Risk

    Data on smoking habits and divorce history for the 1669 respondents who had ever been married.

    Among smokers, 49% have been divorced, 51% have not. Among nonsmokers, only 32% have been divorced, 68% have not. The difference between row percents indicates a relationship.

  • Same shape whether based on frequency or relative frequency

  • When the groups correspond to different rows, the most important comparisons are down columns.

              Ever Divorced?
    Smoke?         Yes      No   Total
    Yes           0.49    0.51       1
    No            0.32    0.68       1

  • The corresponding bars for the smoking groups are widely spread, making comparison harder. Can cluster bars by smoking group.

  • Example 6.2 Tattoos and Ear Pierces

    Responses from n = 565 men to two questions:
    1. Do you have a tattoo?
    2. How many total ear pierces do you have?

  • Stacked Bar charts are often the best way to graphically compare groups

  • Types of bivariate relationship

    Experimental data

    - Categorical data sometimes collected separately from different groups
    - Categorical measurement treated as response
    - Grouping treated as explanatory variable

    Stimulus-response data

    - Stimulus may affect the response
    - Also can have two categorical measurements made from one individual
    - One can affect the other but not the reverse

    Association

    - Not all relationships are causal, so sometimes the variables cannot be classified into explanatory and response variables

  • What type of bivariate relationship?

  • Joint proportions: What proportion of the skiers were given the placebo and didn't catch a cold?

    Marginal proportions: What proportion of skiers didn't catch a cold?

    Conditional proportions: What proportion of skiers caught a cold given that they had the placebo?

                         Cold  No Cold    Total
    Ascorbic acid          17      122      139
    Placebo                31      109      140
    Total                  48      231      279
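
The three questions above differ only in which total is used as the denominator. The Python sketch below works them out from the skiers table; the variable names are chosen for this example.

```python
# Counts from the skiers table: treatment -> outcome -> count.
counts = {
    "Ascorbic acid": {"Cold": 17, "No Cold": 122},
    "Placebo": {"Cold": 31, "No Cold": 109},
}

grand_total = sum(sum(row.values()) for row in counts.values())  # all skiers
placebo_total = sum(counts["Placebo"].values())                  # skiers given placebo
no_cold_total = sum(row["No Cold"] for row in counts.values())   # skiers with no cold

# Joint: one cell divided by the grand total.
joint_placebo_no_cold = counts["Placebo"]["No Cold"] / grand_total

# Marginal: a row or column total divided by the grand total.
marginal_no_cold = no_cold_total / grand_total

# Conditional: one cell divided by the total of the group we condition on.
conditional_cold_given_placebo = counts["Placebo"]["Cold"] / placebo_total

print(round(joint_placebo_no_cold, 3))
print(round(marginal_no_cold, 3))
print(round(conditional_cold_given_placebo, 3))
```

The joint proportion is 109/279, the marginal proportion is 231/279, and the conditional proportion is 31/140, which come to roughly 0.39, 0.83 and 0.22.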