stat481/581: introductiontotime seriesanalysis - math.unm.edu

91
STAT481/581: Introduction to Time Series Analysis Ch2. Time series graphics OTexts.org/fpp3/

Upload: others

Post on 04-Jan-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

STAT481/581:Introduction to TimeSeries Analysis

Ch2. Time series graphics

OTexts.org/fpp3/

Page 2: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Outline

1 Time series in R

2 Time plots

3 Seasonal plots

4 Seasonal or cyclic?

5 Lag plots and autocorrelation

6 White noise

2

Page 3: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Outline

1 Time series in R

2 Time plots

3 Seasonal plots

4 Seasonal or cyclic?

5 Lag plots and autocorrelation

6 White noise

3

Page 4: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Class packages

# Data manipulation and plotting functionslibrary(tidyverse)# Forecasting functionslibrary(fable)# Time series manipulationlibrary(tsibble)# Time series graphics and statisticslibrary(feasts)# Functions to work with date-timeslibrary(lubridate)# Tidy time series datalibrary(tsibbledata)

4

Page 5: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

tsibbledata datasets

1 anasett: Passenger numbers on Anasett airline flights2 aus_livestock: Meat production in Australia for human

consumption from Q3 1965 to Q4 2018.3 aus_production: Quarterly estimates of manufacturing

production of selected commodities in Australia.4 aus_retail: Australian retail trade turnover (total value

of retail traded).5 gafa_stock: GAFA stock prices.6 global_economy: Global economic indicators.

5

Page 6: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

tsibbledata datasets

7 hh_budget: Household budget characteristics.8 nyc_bikes: A sample from NYC Citi Bike usage of 10

bikes throughout 2018.9 olympic_running: Fastest running times for Olympic

races.10 PBS: Monthly Medicare Australia prescription data.11 pelt: Pelt trading records.12 vic_elec: Half-hourly electricity demand for Victoria,

Australia

6

Page 7: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

tsibble objects

A tsibble allows storage and manipulation of time series in R.A tsibble is a data- and model-oriented object.

It contains:

Measured variable(s): numbers of interestKey variable(s): identifiers for each seriesAn index: time information about the observationA tsibble is sorted by its key first and then indexKey variable(s) together with the index uniquely identifieseach record

7

Page 8: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

tsibble objects example

#Creating a tsibble objectlibrary(tsibble)e1 <- tsibble(year = 2012:2016, x = c(1,2,1,2,2),y = c(123,39,78,52,110), index = year, key = x)

e1

## # A tsibble: 5 x 3 [1Y]## # Key: x [2]## year x y## <int> <dbl> <dbl>## 1 2012 1 123## 2 2014 1 78## 3 2013 2 39## 4 2015 2 52## 5 2016 2 110

8

Page 9: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

tsibble objects example

# yearquarter returns a numeric value that can be addede2<- tsibble(

qtr = rep(yearquarter("2010 Q1") + 0:9, 3),group = rep(c("x", "y", "z"), each = 10),value = rnorm(30),key = group, index = qtr)

e2

## # A tsibble: 30 x 3 [1Q]## # Key: group [3]## qtr group value## <qtr> <chr> <dbl>## 1 2010 Q1 x 0.449## 2 2010 Q2 x 0.702## 3 2010 Q3 x -0.732## 4 2010 Q4 x 0.284## 5 2011 Q1 x -1.39## 6 2011 Q2 x 0.895## 7 2011 Q3 x -1.47## 8 2011 Q4 x 1.21## 9 2012 Q1 x -1.28## 10 2012 Q2 x -0.621## # ... with 20 more rows

9

Page 10: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

The tsibble index

Common time index variables can be created withthese functions:

Frequency Function

Annual start year:end yearQuarterly yearquarter()Monthly yearmonth()Weekly yearweek()Daily as_Date(), ymd()Sub-daily as_datetime()

10

Page 11: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Example for time index variables

2015:2020

## [1] 2015 2016 2017 2018 2019 2020

yearquarter("2010 Q1")+0:3

## [1] "2010 Q1" "2010 Q2" "2010 Q3" "2010 Q4"

yearmonth("2010 1")+0:3

## [1] "2010 Jan" "2010 Feb" "2010 Mar" "2010 Apr"

11

Page 12: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Example for time index variables

yearweek("2010 1")+0:3

## [1] "2009 W53" "2010 W01" "2010 W02" "2010 W03"

as.Date("2020-01-22") + 0:3

## [1] "2020-01-22" "2020-01-23" "2020-01-24"## [4] "2020-01-25"

12

Page 13: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Example for time index variables

ymd("2020-01-22")+0:3

## [1] "2020-01-22" "2020-01-23" "2020-01-24"## [4] "2020-01-25"

as_datetime("2020-01-22 00:50:50")+0:3

## [1] "2020-01-22 00:50:50 UTC"## [2] "2020-01-22 00:50:51 UTC"## [3] "2020-01-22 00:50:52 UTC"## [4] "2020-01-22 00:50:53 UTC"

13

Page 14: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Coerce a dataset to be an tsibbledata ob-ject

olympic_running %>% as_tsibble(key = c(Length, Sex), index = Year)

## # A tsibble: 312 x 4 [4Y]## # Key: Length, Sex [14]## Year Length Sex Time## <dbl> <fct> <chr> <dbl>## 1 1896 100m men 12## 2 1900 100m men 11## 3 1904 100m men 11## 4 1908 100m men 10.8## 5 1912 100m men 10.8## 6 1916 100m men NA## 7 1920 100m men 10.8## 8 1924 100m men 10.6## 9 1928 100m men 10.8## 10 1932 100m men 10.3## # ... with 302 more rows

14

Page 15: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

The key to many time series

tsibbledata::olympic_running %>%group_by_key() %>%slice(1) %>%head(6) %>%knitr::kable(booktabs=TRUE)

15

Page 16: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

The key to many time series

Year Length Sex Time

1896 100m men 12.01928 100m women 12.21900 200m men 22.21948 200m women 24.41896 400m men 54.21964 400m women 52.0

16

Page 17: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Australian GDP

#filter is to select a subset in rowsaus_economy <- global_economy %>%

filter(Code == "AUS")

## # A tsibble: 58 x 9 [1Y]## # Key: Country [1]## Country Code Year GDP Growth CPI## <fct> <fct> <dbl> <dbl> <dbl> <dbl>## 1 Austra~ AUS 1960 1.86e10 NA 7.96## 2 Austra~ AUS 1961 1.96e10 2.49 8.14## 3 Austra~ AUS 1962 1.99e10 1.30 8.12## 4 Austra~ AUS 1963 2.15e10 6.21 8.17## 5 Austra~ AUS 1964 2.38e10 6.98 8.40## 6 Austra~ AUS 1965 2.59e10 5.98 8.69## 7 Austra~ AUS 1966 2.73e10 2.38 8.98## 8 Austra~ AUS 1967 3.04e10 6.30 9.29## 9 Austra~ AUS 1968 3.27e10 5.10 9.52## 10 Austra~ AUS 1969 3.66e10 7.04 9.83## # ... with 48 more rows, and 3 more variables:## # Imports <dbl>, Exports <dbl>,## # Population <dbl>

17

Page 18: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Outline

1 Time series in R

2 Time plots

3 Seasonal plots

4 Seasonal or cyclic?

5 Lag plots and autocorrelation

6 White noise

18

Page 19: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Australian GDP

aus_economy %>% autoplot(GDP)

0.0e+00

5.0e+11

1.0e+12

1.5e+12

1960 1980 2000Year [1Y]

GD

P

19

Page 20: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Time plots

ansett %>%filter(Airports=="MEL-SYD", Class=="Economy") %>%autoplot(Passengers)

0

10000

20000

30000

1988 1990 1992Week [1W]

Pas

seng

ers

20

Page 21: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Time plots

# Taking a subset of the time series according to timea10.subset <- PBS %>% filter(ATC2 == "A10")%>%

filter_index("2008 Jan")a10.subset

## # A tsibble: 4 x 9 [1M]## # Key: Concession, Type, ATC1, ATC2 [4]## Month Concession Type ATC1 ATC1_desc## <mth> <chr> <chr> <chr> <chr>## 1 2008 Jan Concessio~ Co-p~ A Alimenta~## 2 2008 Jan Concessio~ Safe~ A Alimenta~## 3 2008 Jan General Co-p~ A Alimenta~## 4 2008 Jan General Safe~ A Alimenta~## # ... with 4 more variables: ATC2 <chr>,## # ATC2_desc <chr>, Scripts <dbl>, Cost <dbl>

21

Page 22: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Time plots

a10 <- PBS %>%filter(ATC2 == "A10") %>%summarise(Cost = sum(Cost)/1e6)

22

Page 23: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Time plots

a10 %>% autoplot(Cost) +ylab("$ million") + xlab("Year") +ggtitle("Antidiabetic drug sales")

10

20

30

1995 2000 2005Year

$ m

illio

n

Antidiabetic drug sales

23

Page 24: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Your turn

Create plots of the following time series: Bricksfrom aus_production, Lynx from pelt,Google from gafa_stockUse help() to find out about the data in eachseries.For the last plot, modify the axis labels and title.

24

Page 25: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Are time plots the best?

maxtemp %>%autoplot(Temperature) +xlab("Week") + ylab("Max temperature")

10

20

30

40

2012 2013 2014 2015Week

Max

tem

pera

ture

25

Page 26: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Are time plots the best?

maxtemp %>%ggplot(aes(x = Day, y = Temperature)) + geom_point() +xlab("Week") + ylab("Max temperature")

10

20

30

40

2012 2013 2014 2015Week

Max

tem

pera

ture

26

Page 27: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Are time plots the best?

2012 2013 2014 2015Day

10

20

30

40

Temperature

27

Page 28: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Are time plots the best?

28

Page 29: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Outline

1 Time series in R

2 Time plots

3 Seasonal plots

4 Seasonal or cyclic?

5 Lag plots and autocorrelation

6 White noise

29

Page 30: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Seasonal plots

a10 %>% gg_season(Cost, labels = "both") +ylab("$ million") + ggtitle("Seasonal plot: antidiabetic drug sales")

1991 19911992 1992199319931994 19941995 1995

1996 199619971997

1998199819991999

2000 2000

20012001

20022002

2003 20032004

20042005 2005

2006 2006

2007

2007

2008

2008

10

20

30

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov DecMonth

$ m

illio

n

Seasonal plot: antidiabetic drug sales

30

Page 31: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Seasonal plots

Data plotted against the individual “seasons” inwhich the data were observed. (In this case a“season” is a month.)Something like a time plot except that the datafrom each season are overlapped.Enables the underlying seasonal pattern to beseen more clearly, and also allows anysubstantial departures from the seasonal patternto be easily identified.In R: gg_season()

31

Page 32: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Seasonal subseries plots

a10 %>%gg_subseries(Cost) + ylab("$ million") +ggtitle("Subseries plot: antidiabetic drug sales")

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

1995

2000

2005

1995

2000

2005

1995

2000

2005

1995

2000

2005

1995

2000

2005

1995

2000

2005

1995

2000

2005

1995

2000

2005

1995

2000

2005

1995

2000

2005

1995

2000

2005

1995

2000

2005

10

20

30

Month

$ m

illio

n

Subseries plot: antidiabetic drug sales

32

Page 33: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Seasonal subseries plots

Data for each season collected together in timeplot as separate time series.Enables the underlying seasonal pattern to beseen clearly, and changes in seasonality overtime to be visualized.In R: gg_subseries()

33

Page 34: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Quarterly Australian Beer Production

beer <- aus_production %>%select(Quarter, Beer) %>%filter(year(Quarter) >= 1992)

beer %>% autoplot(Beer)

400

450

500

1995 2000 2005 2010Quarter [1Q]

Bee

r

34

Page 35: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Quarterly Australian Beer Production

beer %>% gg_season(Beer, labels="right")

1992

1993

19941995

1996

199719981999

20002001

2002

2003

2004

20052006

2007

20082009

2010

400

450

500

Jan Apr Jul OctQuarter

Bee

r

35

Page 36: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Quarterly Australian Beer Production

beer %>% gg_subseries(Beer)

Q1 Q2 Q3 Q4

1995

2000

2005

2010

1995

2000

2005

2010

1995

2000

2005

2010

1995

2000

2005

2010

400

450

500

Quarter

Bee

r

36

Page 37: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Your turn

Look at the quarterly tourism data for the SnowyMountains

snowy <- filter(tourism,Region == "Snowy Mountains",Purpose == "Holiday")

Use autoplot(), gg_season() andgg_subseries() to explore the data.What do you learn?

37

Page 38: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Outline

1 Time series in R

2 Time plots

3 Seasonal plots

4 Seasonal or cyclic?

5 Lag plots and autocorrelation

6 White noise

38

Page 39: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Time series patterns

Trend pattern exists when there is a long-termincrease or decrease in the data.

Seasonal pattern exists when a series is influencedby seasonal factors (e.g., the quarter ofthe year, the month, or day of the week).

Cyclic pattern exists when data exhibit rises andfalls that are not of fixed period.

39

Page 40: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Time series components

Differences between seasonal and cyclicpatterns:

seasonal pattern constant length; cyclic patternvariable lengthaverage length of cycle longer than length ofseasonal patternmagnitude of cycle more variable thanmagnitude of seasonal pattern

40

Page 41: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Time series patterns

as_tsibble(fma::elec) %>%filter(index >= 1980) %>% # or filter_index("1980 Jan" ~.)autoplot(value) + xlab("Year") + ylab("GWh") +ggtitle("Australian electricity production")

7500

10000

12500

15000

1975 1980 1985 1990 1995Year

GW

h

Australian electricity production

41

Page 42: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Time series patterns

as_tsibble(fma::elec) %>%filter(index >= 1980) %>% gg_subseries(value)+xlab("Year") + ylab("GWh") +ggtitle("Australian electricity production")

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

7500

10000

12500

15000

Year

GW

h

Australian electricity production

42

Page 43: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Time series patterns

as_tsibble(fma::elec) %>%filter(index >= 1980) %>% gg_season(value)+xlab("Year") + ylab("GWh") +ggtitle("Australian electricity production")

7500

10000

12500

15000

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov DecYear

GW

h

1979

1984

1989

1994

Australian electricity production

43

Page 44: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Time series patterns

aus_production %>%autoplot(Bricks) +ggtitle("Australian clay brick production") +xlab("Year") + ylab("million units")

200

300

400

500

600

1960 1980 2000Year

mill

ion

units

Australian clay brick production

44

Page 45: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Time series patterns

aus_production %>%gg_subseries(Bricks) +ggtitle("Australian clay brick production") +xlab("Year") + ylab("million units")

Q1 Q2 Q3 Q4

1960

1980

2000

1960

1980

2000

1960

1980

2000

1960

1980

2000

200

300

400

500

600

Year

mill

ion

units

Australian clay brick production

45

Page 46: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Time series patterns

aus_production %>%gg_season(Bricks) +ggtitle("Australian clay brick production") +xlab("Year") + ylab("million units")

200

300

400

500

600

Jan Apr Jul OctYear

mill

ion

units

1965

1975

1985

1995

2005

Australian clay brick production

46

Page 47: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Time series patterns

as_tsibble(fma::hsales) %>%autoplot(value) +ggtitle("Sales of new one-family houses, USA") +xlab("Year") + ylab("Total sales")

40

60

80

1975 1980 1985 1990 1995Year

Tota

l sal

es

Sales of new one−family houses, USA

47

Page 48: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Time series patterns

as_tsibble(fma::hsales) %>%gg_subseries(value) +ggtitle("Sales of new one-family houses, USA") +xlab("Year") + ylab("Total sales")

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

1975

1980

1985

1990

1995

40

60

80

Year

Tota

l sal

es

Sales of new one−family houses, USA

48

Page 49: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Time series patterns

as_tsibble(fma::hsales) %>%gg_season(value) +ggtitle("Sales of new one-family houses, USA") +xlab("Year") + ylab("Total sales")

40

60

80

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov DecYear

Tota

l sal

es

1977

1982

1987

1992

Sales of new one−family houses, USA

49

Page 50: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Time series patterns

as_tsibble(fma::ustreas) %>%autoplot(value) +ggtitle("US Treasury Bill Contracts") +xlab("Day") + ylab("price")

86

88

90

0 25 50 75 100Day

pric

e

US Treasury Bill Contracts

50

Page 51: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Time series patterns

pelt %>%autoplot(Lynx) +ggtitle("Annual Canadian Lynx Trappings") +xlab("Year") + ylab("Number trapped")

0

20000

40000

60000

80000

1860 1880 1900 1920Year

Num

ber

trap

ped

Annual Canadian Lynx Trappings

51

Page 52: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Seasonal or cyclic?

Differences between seasonal and cyclic patterns:

seasonal pattern constant length; cyclic patternvariable lengthaverage length of cycle longer than length ofseasonal patternmagnitude of cycle more variable thanmagnitude of seasonal pattern

The timing of peaks and troughs is predictable withseasonal data, but unpredictable in the long termwith cyclic data.

52

Page 53: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Seasonal or cyclic?

Differences between seasonal and cyclic patterns:

seasonal pattern constant length; cyclic patternvariable lengthaverage length of cycle longer than length ofseasonal patternmagnitude of cycle more variable thanmagnitude of seasonal pattern

The timing of peaks and troughs is predictable withseasonal data, but unpredictable in the long termwith cyclic data.

52

Page 54: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Outline

1 Time series in R

2 Time plots

3 Seasonal plots

4 Seasonal or cyclic?

5 Lag plots and autocorrelation

6 White noise

53

Page 55: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Example: Beer production

new_production <- aus_production %>%filter(year(Quarter) >= 1992) # or filter_index("1992 Q1"~.)

new_production

## # A tsibble: 74 x 7 [1Q]## Quarter Beer Tobacco Bricks Cement## <qtr> <dbl> <dbl> <dbl> <dbl>## 1 1992 Q1 443 5777 383 1289## 2 1992 Q2 410 5853 404 1501## 3 1992 Q3 420 6416 446 1539## 4 1992 Q4 532 5825 420 1568## 5 1993 Q1 433 5724 394 1450## 6 1993 Q2 421 6036 462 1668## 7 1993 Q3 410 6570 475 1648## 8 1993 Q4 512 5675 443 1863## 9 1994 Q1 449 5311 421 1468## 10 1994 Q2 381 5717 475 1755## # ... with 64 more rows, and 2 more variables:## # Electricity <dbl>, Gas <dbl>

54

Page 56: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Lagged scatterplots

Each graph shows yt plotted against yt−k fordifferent values of k .Vertical axis: lagged observationHorizontal axis: current observationColors: points are colored by the current quarter

55

Page 57: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Example: Beer production

new_production %>% gg_lag(Beer, geom='point')

lag 7 lag 8 lag 9

lag 4 lag 5 lag 6

lag 1 lag 2 lag 3

400 450 500 400 450 500 400 450 500

400

450

500

400

450

500

400

450

500

Beer

lag(

Bee

r, n)

season

Q1

Q2

Q3

Q4

56

Page 58: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Lagged scatterplots

The autocorrelations are the correlationsassociated with these scatterplots.

57

Page 59: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Autocorrelation

Covariance and correlation: measure extent oflinear relationship between two variables (y andX ).

Autocovariance and autocorrelation: measurelinear relationship between lagged values of a timeseries y .We measure the relationship between:

yt and yt−1

yt and yt−2

yt and yt−3

etc.

58

Page 60: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Autocorrelation

Covariance and correlation: measure extent oflinear relationship between two variables (y andX ).Autocovariance and autocorrelation: measurelinear relationship between lagged values of a timeseries y .

We measure the relationship between:

yt and yt−1

yt and yt−2

yt and yt−3

etc.

58

Page 61: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Autocorrelation

Covariance and correlation: measure extent oflinear relationship between two variables (y andX ).Autocovariance and autocorrelation: measurelinear relationship between lagged values of a timeseries y .We measure the relationship between:

yt and yt−1

yt and yt−2

yt and yt−3

etc.58

Page 62: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Autocorrelation

We denote the sample autocovariance at lag k by ck

and the sample autocorrelation at lag k by rk . Thendefine

ck = 1T

T∑t=k+1

(yt − y)(yt−k − y)

and rk = ck/c0

r1 indicates how successive values of y relate to each otherr2 indicates how y values two periods apart relate to eachotherrk is almost the same as the sample correlation betweenyt and yt−k .

59

Page 63: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Autocorrelation

We denote the sample autocovariance at lag k by ck

and the sample autocorrelation at lag k by rk . Thendefine

ck = 1T

T∑t=k+1

(yt − y)(yt−k − y)

and rk = ck/c0

r1 indicates how successive values of y relate to each otherr2 indicates how y values two periods apart relate to eachotherrk is almost the same as the sample correlation betweenyt and yt−k .

59

Page 64: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Autocorrelation

Results for first 9 lags for beer data:new_production %>% ACF(Beer, lag_max = 9)

## # A tsibble: 9 x 2 [1Q]## lag acf## <lag> <dbl>## 1 1Q -0.102## 2 2Q -0.657## 3 3Q -0.0603## 4 4Q 0.869## 5 5Q -0.0892## 6 6Q -0.635## 7 7Q -0.0542## 8 8Q 0.832## 9 9Q -0.108

60

Page 65: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Autocorrelation

Results for first 9 lags for beer data:new_production %>% ACF(Beer, lag_max = 9) %>% autoplot()

−0.5

0.0

0.5

2 4 6 8lag [1Q]

acf

61

Page 66: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Autocorrelation

r4 higher than for the other lags. This is due tothe seasonal pattern in the data: the peakstend to be 4 quarters apart and the troughstend to be 2 quarters apart.r2 is more negative than for the other lagsbecause troughs tend to be 2 quarters behindpeaks.Together, the autocorrelations at lags 1, 2, . . . ,make up the autocorrelation or ACF.The plot is known as a correlogram

62

Page 67: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

ACF

new_production %>% ACF(Beer) %>% autoplot()

−0.5

0.0

0.5

2 4 6 8 10 12 14 16 18lag [1Q]

acf

63

Page 68: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Trend and seasonality in ACF plots

When data have a trend, the autocorrelationsfor small lags tend to be large and positive.When data are seasonal, the autocorrelationswill be larger at the seasonal lags (i.e., atmultiples of the seasonal frequency)When data are trended and seasonal, you see acombination of these effects.

64

Page 69: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Aus monthly electricity production

elec2 <- as_tsibble(fma::elec) %>%filter(year(index) >= 1980)

elec2 %>% autoplot(value)

8000

10000

12000

14000

1980 1985 1990 1995index [1M]

valu

e

65

Page 70: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Aus monthly electricity production

elec2 %>% ACF(value, lag_max=48) %>%autoplot()

0.00

0.25

0.50

0.75

6 12 18 24 30 36 42 48lag [1M]

acf

66

Page 71: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Aus monthly electricity production

Time plot shows clear trend and seasonality.

The same features are reflected in the ACF.

The slowly decaying ACF indicates trend.The ACF peaks at lags 12, 24, 36, . . . , indicateseasonality of length 12.

67

Page 72: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Google stock price

google_2015 <- gafa_stock %>%filter(Symbol == "GOOG", year(Date) == 2015) %>%select(Date, Close)

google_2015

## # A tsibble: 252 x 2 [!]## Date Close## <date> <dbl>## 1 2015-01-02 522.## 2 2015-01-05 511.## 3 2015-01-06 499.## 4 2015-01-07 498.## 5 2015-01-08 500.## 6 2015-01-09 493.## 7 2015-01-12 490.## 8 2015-01-13 493.## 9 2015-01-14 498.## 10 2015-01-15 499.## # ... with 242 more rows

68

Page 73: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Google stock price

google_2015 %>% autoplot(Close)

500

600

700

Jan 2015 Apr 2015 Jul 2015 Oct 2015 Jan 2016Date [!]

Clo

se

69

Page 74: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Google stock price

google_2015 %>%

ACF(Close, lag_max=100)# Error: Can't handle tsibble of irregular interval.

google_2015

## # A tsibble: 252 x 2 [!]## Date Close## <date> <dbl>## 1 2015-01-02 522.## 2 2015-01-05 511.## 3 2015-01-06 499.## 4 2015-01-07 498.## 5 2015-01-08 500.## 6 2015-01-09 493.## 7 2015-01-12 490.## 8 2015-01-13 493.## 9 2015-01-14 498.## 10 2015-01-15 499.## # ... with 242 more rows

70

Page 75: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Google stock price

google_2015 %>%

ACF(Close, lag_max=100)# Error: Can't handle tsibble of irregular interval.

google_2015

## # A tsibble: 252 x 2 [!]## Date Close## <date> <dbl>## 1 2015-01-02 522.## 2 2015-01-05 511.## 3 2015-01-06 499.## 4 2015-01-07 498.## 5 2015-01-08 500.## 6 2015-01-09 493.## 7 2015-01-12 490.## 8 2015-01-13 493.## 9 2015-01-14 498.## 10 2015-01-15 499.## # ... with 242 more rows

70

Page 76: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Google stock price

#mutate is to create a new variablegoogle_2015 <- google_2015 %>%

mutate(trading_day = row_number()) %>%update_tsibble(index=trading_day, regular=TRUE)

google_2015

## # A tsibble: 252 x 3 [1]## Date Close trading_day## <date> <dbl> <int>## 1 2015-01-02 522. 1## 2 2015-01-05 511. 2## 3 2015-01-06 499. 3## 4 2015-01-07 498. 4## 5 2015-01-08 500. 5## 6 2015-01-09 493. 6## 7 2015-01-12 490. 7## 8 2015-01-13 493. 8## 9 2015-01-14 498. 9## 10 2015-01-15 499. 10## # ... with 242 more rows

71

Page 77: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Google stock price

google_2015 %>%ACF(Close, lag_max=100) %>% autoplot()

0.00

0.25

0.50

0.75

1.00

25 50 75 100lag [1]

acf

72

Page 78: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Your turn

We have introduced the following functions:

gg_lagACF

Explore the following time series using thesefunctions. Can you spot any seasonality, cyclicity andtrend? What do you learn about the series?

Bricks from aus_productionLynx from peltVictorian Electricity Demand from aus_elec

73

Page 79: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Which is which?

40

60

80

0 20 40 60

chir

ps p

er m

inut

e

1. Daily temperature of cow

7000

8000

9000

10000

11000

1974 1976 1978

thou

sand

s

2. Monthly accidental deaths

200

400

600

1950 1955 1960

thou

sand

s

3. Monthly air passengers

30000

60000

90000

1860 1880 1900

thou

sand

s

4. Annual mink trappings

0.0

0.5

1.0

6 12 18

acf

A

0.0

0.5

1.0

5 10 15

acf

B

0.0

0.5

1.0

5 10 15

acf

C

0.0

0.5

1.0

6 12 18

acf

D

74

Page 80: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Outline

1 Time series in R

2 Time plots

3 Seasonal plots

4 Seasonal or cyclic?

5 Lag plots and autocorrelation

6 White noise

75

Page 81: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Example: White noise

wn <- tsibble(t = seq_len(36), y = rnorm(36),index = t)

wn %>% autoplot(y)

−2

−1

0

1

2

0 10 20 30t [1]

y

76

Page 82: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Example: White noise

wn %>% ACF(y, lag_max = 10) %>%

as_tibble() %>%

tidyr::spread(lag, acf) %>%

rename_all(function(x){paste("$r_{",x,"}$",sep="")}) %>%

knitr::kable(booktabs=TRUE,

escape=FALSE, align="c", digits=3,

format.args=list(nsmall=3))

r1 r2 r3 r4 r5 r6 r7 r8 r9 r10

0.177 -0.071 -0.250 -0.020 -0.370 0.007 0.022 0.142 0.015 0.036

77

Page 83: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Example: White noise

−0.4

−0.2

0.0

0.2

5 10 15lag [1]

acf

78

Page 84: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Sampling distribution of autocorrelations

Sampling distribution of rk for white noise data isasymptotically N(0,1/T ).

95% of all rk for white noise must lie within±1.96/

√T .

If this is not the case, the series is probably notWN.Common to plot lines at ±1.96/

√T when

plotting ACF. These are the critical values.

79

Page 85: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Sampling distribution of autocorrelations

Sampling distribution of rk for white noise data isasymptotically N(0,1/T ).

95% of all rk for white noise must lie within±1.96/

√T .

If this is not the case, the series is probably notWN.Common to plot lines at ±1.96/

√T when

plotting ACF. These are the critical values.

79

Page 86: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Example: Pigs slaughtered

pigs <- aus_livestock %>%filter(State == "Victoria", Animal == "Pigs",

year(Month) >= 2014)pigs %>% autoplot(Count/1e3) +

xlab("Year") + ylab("Thousands") +ggtitle("Number of pigs slaughtered in Victoria")

80

90

100

110

2014 2016 2018Year

Tho

usan

ds

Number of pigs slaughtered in Victoria

80

Page 87: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Example: Pigs slaughtered

pigs %>% ACF(Count) %>% autoplot()

−0.2

0.0

0.2

6 12lag [1M]

acf

81

Page 88: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Example: Pigs slaughtered

Monthly total number of pigs slaughtered in the stateof Victoria, Australia, from January 2014 throughDecember 2018 (Source: Australian Bureau ofStatistics.)

Difficult to detect pattern in time plot.ACF shows significant autocorrelation for lag 2and 12.Indicate some slight seasonality.

These show the series is not a white noise series.

82

Page 89: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Example: Pigs slaughtered

Monthly total number of pigs slaughtered in the stateof Victoria, Australia, from January 2014 throughDecember 2018 (Source: Australian Bureau ofStatistics.)

Difficult to detect pattern in time plot.ACF shows significant autocorrelation for lag 2and 12.Indicate some slight seasonality.

These show the series is not a white noise series.

82

Page 90: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Example: Pigs slaughtered

Monthly total number of pigs slaughtered in the stateof Victoria, Australia, from January 2014 throughDecember 2018 (Source: Australian Bureau ofStatistics.)

Difficult to detect pattern in time plot.ACF shows significant autocorrelation for lag 2and 12.Indicate some slight seasonality.

These show the series is not a white noise series.82

Page 91: STAT481/581: IntroductiontoTime SeriesAnalysis - math.unm.edu

Your turn

You can compute the daily changes in the Googlestock price in 2018 using

dgoog <- gafa_stock %>%

filter(Symbol == "GOOG", year(Date) >= 2018) %>%

mutate(trading_day = row_number()) %>%

update_tsibble(index=trading_day, regular=TRUE) %>%

mutate(diff = difference(Close))

Does diff look like white noise?

83