Chapter 2 - Basic Tools for Forecasting


7/23/2019 Chapter 2 - Basic Tools for Forecasting

    2. Basic Tools for Forecasting

Introduction
2.1 Types of data
2.2 Time series plots
2.3 Scatter plots
2.4 Summarizing the data
2.4.1 Notation
2.4.2 Measures of average
2.4.3 Measures of variation
2.4.4 Assessing variability
2.4.5 An example: hot growth companies
2.5 Correlation
2.6 Transformations
2.6.1 Differences and growth rates
2.6.2 The log transform
2.7 How to measure forecasting accuracy?
2.7.1 Measures of forecast accuracy
2.7.2 Measures of absolute error
2.8 Prediction intervals
2.9 Basic Principles
Summary
References
Exercises
Mini-case 2.1: Are the outcomes of NFL games predictable?
Mini-case 2.2: Whither Wal-Mart?
Mini-case 2.3: Economic recessions


    DDD: Draw the doggone diagram!

    (In memory of the late David Hildebrand, who stated the matter rather more forcibly!)

    Introduction

    In most of the chapters in this book we assume that we have available some kind of

    database from which to build numerical forecasts. The data may be incomplete or subject

    to error, they may not relate directly to the key variables of interest, and they may not be

    available in timely fashion. Nevertheless, they are all we have and we must learn to

    understand and respect them, if not to love them; indeed, such is the basis of any good

    relationship!

    At a conceptual level, we need to understand how the data are compiled and how they

    relate to the forecasting issues we seek to address. We must then examine the data to

    understand the structure and main features, and to summarize the information available.

    In section 2.1, we examine the types of data that arise in practice, and then examine

    graphical summaries in sections 2.2 and 2.3. Section 2.4 describes the basic numerical

    summaries that are useful and we then move on to measures of association in section 2.5.

    Sometimes the original form of the data is not appropriate and some kind of

    transformation or modification is needed; this topic is the focus of Section 2.6. Methods

    for the generation of forecasts are the focus of later chapters, but in this chapter we will

    consider the evaluation of outputs from the forecasting process. In section 2.7 we

    examine measures of forecasting accuracy and the evaluation of forecasting performance,


then turn to prediction intervals in section 2.8. The chapter ends with a summary and

    discussion of some underlying principles.

    2.1 Types of data

    A database may be thought of as a table with multiple dimensions, as the following

    examples illustrate:

    A survey of prospective voters in an upcoming election; the variables measured

    might include voting intentions, party affiliation, age, gender and address

    A portfolio of stocks listed in the London Stock Exchange; for each company we

    would record contact information, market capitalization, closing stock prices and

    dividend payments over suitable periods, and news announcements

    The economy of the United States; factors of interest would certainly include

gross domestic product (GDP), consumer expenditures, capital investment, imports and exports.

    The reader will certainly be able to add to these lists.

    The survey of voters refers to cross-sectional data, in that the purpose is to collect

    information as close as possible to the same time for all those interviewed. Of course, in

    practice the survey will cover several days but what matters is that the inherent variation

    in the data is across respondents. For practical purposes, we view the data as being

    collected in the same (short) time period. Of course, voters may change their minds at a

    later stage and such shifts of opinion are a major source of discrepancies between opinion

    polls and election outcomes.


    The daily closing prices for a particular stock or fund over some time period represent

    time series data. We are interested in the movement of the price over time. The same

    applies if we track the movements over time of macroeconomic variables such as GDP; it

    is the development over time that is important.

    From these examples, we see that a database may be cross-sectional or time-dependent or

    both (consider tracking voting intentions over time or looking at consumer expenditures

    each quarter for different regions of a country). Although forecasting practice often

    involves multiple series (such as the sales of different product lines), the methods we

    examine have the common theme of using data from the past and present to predict future

    outcomes. Thus, our primary focus in the first part of the book will be upon the use of

    time series data. However, as methods of data capture have become more sophisticated

    (e.g. scanners in supermarkets) it has become possible to develop databases that relate to

    individuals such as consumers and their spending habits. Forecasting may then involve

    the use of cross-sectional data to predict individual preferences or to evaluate a new

    customer based upon individuals with similar demographic characteristics.

    By way of example, consider the data shown in Tables 2.1, 2.2 and 2.3. Table 2.1 shows

    the weekly sales of a consumer product in a certain market area, produced by a major

    U.S. manufacturer. This data set will be examined in greater detail in Chapter 4. The

    data are genuine but we have labeled the product WFJ Sales to preserve confidentiality.

Cross-sectional data refer to measurements on multiple units, recorded in a single time period.

A time series is a set of measurements recorded on a single unit over multiple time periods.


    Table 2.2 shows the annual numbers of domestic passengers at Washington Dulles

    International Airport for the years 1963-2007. Clearly, both data sets are time series but

    the sales figures are fairly stable (at least after the first 12 weeks or so) whereas the

    passenger series shows a strong upward movement. Table 2.3, appearing later in the

    chapter, involves cross-sectional data showing the financial characteristics of a sample of

    companies.

    2.1.1 Use of large databases

    A manager responsible for a large number of product lines may well claim that the

forecasting can all be done by computer and there is no need to waste time on model-building or detailed examination of individual series. This assertion is half-right. The

    computer can indeed remove most of the drudgery from the forecasting exercise; see for

    example the forecasting methods described in Chapters 4 and 13. However, a computer

    is like a sheep-dog. Properly trained, it can deliver a sound flock of forecasts; poorly

    trained, it can create mayhem. Even if the forecasting task in question involves

    thousands of series to be forecast, there is no substitute for understanding the general

    structure of the data so that we can identify appropriate forecasting methods. The

    manager can then focus on those products that are providing unusual results.

    In order to develop an effective forecasting process, therefore, we need to understand the

    kind of data we are handling. That does not mean examining every series in detail, or

    even at all, but rather by looking at a sample of series to establish a framework for

    effective forecasting. Thus, it is important to understand when and how to use

    forecasting methods, how to interpret the results and how to recognize their limitations

    and the potential for improvement.


    Table 2.1: Value (in $) of weekly sales of product WFJ Sales

    [Week 1 is first week of January; WFJ Sales.xlsx]

Week  Sales    Week  Sales    Week  Sales
  1   23056     22   33631     43   32187
  2   24817     23   32900     44   30322
  3   24300     24   34426     45   34588
  4   23242     25   33777     46   38879
  5   22862     26   34849     47   37166
  6   22863     27   30986     48   37111
  7   23391     28   33321     49   39021
  8   22469     29   34003     50   40737
  9   22241     30   35417     51   42358
 10   24367     31   33822     52   51914
 11   29457     32   32723     53   35404
 12   31294     33   34925     54   30555
 13   38713     34   33460     55   30421
 14   35749     35   30999     56   30972
 15   39768     36   31286     57   32336
 16   32419     37   35030     58   28194
 17   37503     38   34260     59   29203
 18   31474     39   35001     60   28155
 19   35625     40   36040     61   28404
 20   33159     41   36056     62   34128
 21   34306     42   31397


    Table 2.2: Washington Dulles International Airport, Domestic Passengers 1963-2007

[Numbers of passengers in '000s; Source: U.S. Department of Transportation, Bureau of Transportation Statistics; Dulles.xlsx]

    Year Passengers Year Passengers Year Passengers

    1963 641 1978 2518 1993 8501

    1964 728 1979 2858 1994 8947

    1965 920 1980 2086 1995 9653

    1966 1079 1981 1889 1996 10095

    1967 1427 1982 2248 1997 10697

    1968 1602 1983 2651 1998 12445

    1969 1928 1984 3136 1999 16055

    1970 1869 1985 4538 2000 15873

    1971 1881 1986 8394 2001 14021

    1972 1992 1987 9980 2002 13146

    1973 2083 1988 8650 2003 12928

    1974 2004 1989 9224 2004 18213

    1975 2000 1990 9043 2005 22129

    1976 2251 1991 9406 2006 17787

    1977 2267 1992 9408 2007 18792


    2.2 Time series plots

    Our aim in the next several sections is not to provide detailed technical presentations on

    the construction of the various plots; rather we indicate their application in the current

    context. Guidelines for producing these plots and other analyses are provided on the

book's website for Excel, EViews, Forecast Pro, Minitab, SAS and SPSS. Also, the

    reader is encouraged to make use of the tutorials provided by these programs, as well as

    their Help commands. The plots in this chapter are generated using Minitab, unless

    stated otherwise.

    The time series plot for WFJ Sales is shown in Figure 2.1. As its name suggests, a time

    series plot shows the variable of interest on the vertical axis and time on the horizontal

    axis. Several features are immediately apparent. Sales are low for the first twelve weeks

    and then remain stable until week 46 when there is an increase over the Thanksgiving to

    Christmas period and then a peak in the last week of the year. Sales in the following year

are lower than for the final weeks of the previous year, but higher than for the

    corresponding period a year before. We would not wish to make too much of data for one

    product over little more than a year, but inspection of the plot has revealed a number of

    interesting patterns that we would want to check out for similar products. If these

    patterns were found to persist across a number of product lines, we would need to take

    these patterns into account in production planning. For example, the company might


    initiate extra shifts or overtime to cover peak periods and plan to replenish inventories

    during slack periods.

    The second time plot presents the data on airline passengers given in Table 2.2. Figure

    2.2 shows steady growth from 1962-79 (the airport opened in 1962), then a pause

    followed by rapid growth in the late eighties. After a further pause in the early nineties,

    there was a long period of growth, with peaks in 1999 and 2005 followed by short-term

    declines. A detailed explanation of these changes lies outside the present discussion;

    more detailed explanations would require us to examine airport expansion plans, overall

    levels of passenger demand, the traffic at other airports in the area, and so on. The key

    point is that the time series plot can tell us a lot about the phenomenon under study and

    will often suggest suitable approaches to forecasting.
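The features read off a time series plot can also be checked numerically. A minimal pure-Python sketch using the Table 2.1 values; the split into "early" and "holiday" weeks is our own illustrative grouping, not one made in the text:

```python
# Weekly WFJ Sales from Table 2.1: weeks 1-12 (low early period)
# and weeks 46-53 (Thanksgiving-to-Christmas run-up and peak).
early = [23056, 24817, 24300, 23242, 22862, 22863,
         23391, 22469, 22241, 24367, 29457, 31294]   # weeks 1-12
holiday = [38879, 37166, 37111, 39021, 40737, 42358,
           51914, 35404]                             # weeks 46-53

early_mean = sum(early) / len(early)        # average over the stable early weeks
holiday_mean = sum(holiday) / len(holiday)  # average over the seasonal peak
peak_week_sales = max(holiday)              # the last-week-of-year spike

print(round(early_mean), round(holiday_mean), peak_week_sales)  # 24530 40324 51914
```

The holiday average runs well above the early-week average, and the week-52 value ($51,914) stands out, which is exactly what the plot in Figure 2.1 shows at a glance.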

    Figure 2.1: Plot of weekly WFJ Sales [WFJ Sales.xlsx]

[Time series plot of WFJ Sales against week index (1-62); sales axis runs from 20,000 to 55,000.]


    Figure 2.2: Plot of Domestic Passengers at Dulles, 1963-2007 [Dulles.xlsx]

[Time series plot of passengers ('000s; 0-25,000) against year, 1963-2007.]

    2.2.1 Seasonal plots

    Figure 2.1 had some elements of a seasonal pattern (the end-of-year peak) but only just

    over one year of data from which to identify seasonal behavior. Clearly, Figure 2.2 has

    no seasonal pattern since the figures represent complete years. However, seasonal

    variations are often very important for planning purposes and it is desirable to have a

    graphical procedure that allows us to explore whether seasonal patterns exist. For

    monthly data, for example, we may plot the dependent variable against the months, and

    generate a separate, but overlaid, plot for a succession of years. In Figure 2.3A we

    provide such a plot for airline revenue passenger miles (RPM). RPM measures the total

    number of revenue generating miles flown by passengers of U.S. airlines, measured in

    billions of miles. To avoid cluttering the diagram, we use only five years of data for


    1995-99; a multi-colored diagram that could be created on-line is more informative and

    can readily accommodate more years without confusion.

Figure 2.3: Seasonal plots for airline Revenue Passenger Miles for 1995-99. [Revenue miles.xlsx]

A. Plot by month, with years overlaid
[Plot of RevPassMiles (30-46) against month (Jan-Nov), with a separate overlaid line for each of the years 1995-1999.]

B. Time series plot, with each year identified as a sub-group
[Time series plot of RevPassMiles (30-46) against observation index (1-60), with each year's twelve months identified as a sub-group.]

    In Figure 2.3A, the line for each year lies above those for earlier years with only rare

    exceptions, indicating the steady growth in airline traffic over this period. Figure 2.3B


    also shows trend and the seasonal peaks and allows easy comparison of successive

    seasonal cycles. There is a major seasonal peak in the summer and a lesser peak in

    March/April depending on the timing of Easter. These plots of the data provide

    considerable insight into the variations in demand for air travel. Draw the Doggone

    Diagram (DDD) is indeed wise counsel.
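The reshaping behind a seasonal plot such as Figure 2.3A is simply a grouping of the monthly series by year; each year's twelve values then become one overlaid line. A plain-Python sketch of that step, using made-up monthly values in place of the actual RPM figures (which are in Revenue miles.xlsx):

```python
# (year, month, value) triples; the values are hypothetical stand-ins
# for the monthly RPM figures.
monthly = [(1995, m, 30.0 + 0.5 * m) for m in range(1, 13)] + \
          [(1996, m, 32.0 + 0.5 * m) for m in range(1, 13)]

# Group into one 12-element list per year -- the rows of a seasonal plot.
by_year = {}
for year, month, value in monthly:
    by_year.setdefault(year, [None] * 12)[month - 1] = value

# Each by_year[year] can now be drawn against Jan..Dec as one overlaid line
# in whatever plotting package is at hand.
for year in sorted(by_year):
    print(year, by_year[year][0], by_year[year][11])
```

If one year's line sits above another's for every month, as in Figure 2.3A, the comparison of the grouped lists makes the year-on-year growth explicit.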

    2.3 Scatter plots

The time series plots, as displayed so far, show the evolution of a single series over time.

    As we saw with the seasonal plot, it is possible to show multiple series on the same chart,

    although some care is required to make the axes sufficiently similar. An alternative is to

    plot the variable of interest against potential explanatory variable(s), to see how far

    knowledge of the explanatory variable might improve the forecasts of the variable of

    interest. Such scatter plots are valuable for both cross-sectional and time series data.

In Figure 2.4 we show a cross-sectional scatter plot for data taken from Business Week

    that refer to 100 Hot Growth Companies. The companies were selected based upon

    their performance over the last three years in terms of sales growth, earnings growth and

    the return on invested capital, all taken as three-year averages. The data are given in

    Table 2.3. In the diagram, we plot the price-earnings ratio (P-E Ratio) against the return

    on capital (ROC), in order to determine whether variations in ROC are a major

determinant of the company's P-E Ratio. Although there is clearly some relationship

    between the two variables, it is also clear that the stock prices reflect a much more


complex evaluation of a company's performance than simply looking at the recent return

    on capital.

In its report, Business Week ranked the 100 companies and we may check how far these

    two factors went into the ranking by looking at scatter plots against the rank. These plots

    are shown in Figure 2.5. The plots show a strong relationship between ROC and Rank,

    but a much weaker one between the P-E Ratio and Rank. This finding is hardly

surprising! The BW ranking gave 50% weight to a company's ROC ranking, along with

    25% each for sales growth and profits growth.

    Table 2.3: Listing of Rank, P-E Ratio against Return on Capital for 100 Hot

    Growth Companies for the year?

    [Source: Business Week, June 7, 2004, pages 104-109; Growth companies.xlsx]

Rank  Return on Capital  P-E Ratio    Rank  Return on Capital  P-E Ratio    Rank  Return on Capital  P-E Ratio

1 51.5 37 35 19.6 17 68 15.5 14

    2 68.5 53 36 16.7 11 69 12.6 17

    3 35.3 71 37 13.1 41 70 7.6 24

    4 28.7 21 38 17.8 43 71 7.5 43

    5 28.2 26 39 17.7 28 72 13.4 28

    6 29.8 14 40 14.5 52 73 7.4 14

    7 24.4 36 41 15.2 22 74 15.6 26

    8 27.4 38 42 8.9 * 75 11.5 25

    9 32.3 24 43 14.7 31 76 10.0 31

    10 35.1 37 44 12.0 19 77 8.6 15

    11 32.5 26 45 10.9 19 78 14.0 31

    12 15.9 36 46 14.8 22 79 12.9 20

    13 18.5 47 47 15.9 11 80 10.4 16

    14 15.0 22 48 12.9 38 81 12.2 19

    15 22.3 13 49 21.8 21 82 9.3 32

    16 17.2 26 50 12.1 30 83 12.6 21

    17 21.2 24 51 8.5 25 84 12.5 26

    18 13.5 19 52 18.6 25 85 13.0 39


    19 30.3 48 53 12.1 22 86 15.8 32

    20 21.2 27 54 15.1 21 87 14.4 28

    21 14.1 50 55 17.9 15 88 10.6 16

    22 26.7 55 56 9.0 25 89 12.2 31

    23 16.1 23 57 11.0 29 90 11.4 13

    24 20.3 33 58 18.1 24 91 11.7 6

    25 21.2 15 59 6.3 30 92 12.3 37

    26 18.3 14 60 15.1 23 93 12.3 29

    27 30.6 22 61 15.9 22 94 14.5 20

    28 16.6 34 62 15.8 19 95 13.0 28

    29 15.7 7 63 18.9 23 96 14.0 23

    30 13.4 42 64 12.7 18 97 10.6 30

    31 28.2 18 65 15.1 28 98 11.0 21

    32 11.2 45 66 8.7 29 99 10.4 31

    33 17.3 26 67 15.1 25 100 11.1 29

    34 19.9 27

    * P-E Ratio not recorded

Figure 2.4: Plot of P-E Ratio against Return on Capital for 100 Hot Growth Companies [Growth companies.xlsx]

[Scatterplot of P-E Ratio (0-70) against Return on Capital (0-70).]

    Figure 2.5A: Plot of Return on Capital against Rank


[Scatterplot of Return on Capital (0-70) against Rank (0-100).]

    Figure 2.5B: Plot of P-E Ratio against Rank.

[Scatterplot of P-E Ratio (0-70) against Rank (0-100).]

We will often have multiple variables of interest and wish to look for relationships

    among them. Rather than generate a series of scatter plots as above, we may combine

    them into a matrix plot, which is just a two-way array of plots of each variable against

    each of the others. The matrix plot for these three variables is shown in Figure 2.6. The

    three plots in the upper right are those shown in Figures 2.4 and 2.5. The plots in the

    bottom left part of the diagram are these same three plots, but with the X and Y axes

    reversed. The matrix plot provides a condensed summary of the relationships among

    multiple variables and is a useful screening device for relevant variables in the early

    stages of a forecasting exercise.
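The layout logic of a matrix plot is easy to state in code: one panel for every ordered pair of variables, with the lower-left panels repeating the upper-right ones with the axes reversed. A small sketch of the panel enumeration only (the actual drawing in Figure 2.6 was done in Minitab):

```python
from itertools import permutations

variables = ["P-E Ratio", "Return on Capital", "Rank"]

# One off-diagonal panel per ordered pair: panel (y, x) plots y against x,
# and panel (x, y) is the same pair with the axes reversed.
panels = list(permutations(variables, 2))

print(len(panels))  # 3 variables give 3 * 2 = 6 off-diagonal panels
```

For k variables there are k(k-1) off-diagonal panels, which is why matrix plots are usually reserved for a modest number of candidate variables in the screening stage.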


Figure 2.6: Matrix plot for P-E Ratio, ROC and Rank for 100 Hot Growth Companies [Growth companies.xlsx]

[3 x 3 matrix of pairwise scatterplots of P-E Ratio, Return on Capital and Rank.]

    For these data, the relationship between the P-E Ratio and ROC appears to be weaker

    than we might expect. The reason for this lies in the set of data used. We are looking at

    the records of 100 companies selected for their recent strong performance, so that they all

have strong financial foundations. A random sample of all companies would show

    greater variations in performance but a stronger overall relationship. A fundamental but


    often overlooked feature of any statistical study, and especially any forecasting study, is

    that the sample data must be relevant for the task in hand. The data set in Table 2.3 is

    useful if we are trying to evaluate a possible investment in a growth company, but

    much less so if we are trying to understand overall market valuations.

    2.4 Summarizing the data

    Graphical summaries provide invaluable insights. Time plots and scatter plots should

    always be used in the early stages of a forecasting study to aid understanding.

    Furthermore, as we shall see in later chapters, such diagrams also play an invaluable role

    in providing diagnostics for further model development. Even when we have a large

    number of items to forecast, plots for a sample from the whole set of series will provide

    useful guidance and insights.

    At the same time, we must recognize that while graphical methods provide qualitative

    insights, we often need some kind of numerical summary such as the average level of

    sales over time or the variability in P-E ratios across companies. These measures are also

    valuable for diagnostic purposes, when we seek to summarize forecasting errors, as in

    section 2.7.

    2.4.1 Notation

    At this stage we need to elaborate upon some notational conventions, since we will use

    this framework throughout the remainder of the book.

1. Random variables and observations. When we speak of an observation, it is

    something we have already recorded, a specific number or category. By contrast,

    when we talk about future observations, uncertainty exists. For example, if we talk


about tomorrow's closing price of the Dow Jones Index, a range of possibilities

    exists, which can be described by a probability distribution. Such a variable, with

    both a set of possible values and an associated probability distribution, is known as a

    random variable. Texts with a more theoretical orientation often use upper-case

    letters to denote random variables and lower-case letters for observations that have

    already been recorded. More applied books often make no distinction, but rely upon

    the context making the difference clear. We will follow the second course of action

    and generally use the same notation for both existing observations and random

    variables.

2. Variables and parameters. As just noted, variables are entities that we can observe, such as sales or incomes. By contrast, parameters contribute to the description of an

    underlying process (e.g. a population mean) and are typically not observable. We

    distinguish these concepts by using the usual (Roman) alphabet for variables (sample

    values), but Greek letters for parameters (population values). Thus, the variable we

wish to forecast will always be denoted by Y and, where appropriate, the sample mean and standard deviation by Ȳ and S. The corresponding population mean and standard deviation will be denoted by μ and σ respectively.

    2.4.2 Measures of average

By far the most important measure of average is the arithmetic mean, often known simply as the mean or the average.

Given a set of n values Y1, Y2, ..., Yn, the arithmetic mean is given by:

    Ȳ = (Y1 + Y2 + ... + Yn)/n = (1/n) Σ Yi.    (2.1)

When the range of summation is clear from the context, such as the index going from 1 to n in the above formula, we will often write the summation sign without including the limits.

An alternate measure of average is given by the median, defined as follows. Given a set of n values Y1, Y2, ..., Yn, we place these values in ascending order, written as Y(1) ≤ Y(2) ≤ ... ≤ Y(n). The median is the middle observation. If n is odd, n can be written n = 2m+1 and the median is Y(m+1). If n is even, n = 2m and the median is ½[Y(m) + Y(m+1)].

Example 2.1: Calculation of the mean and median

Suppose the sales of a popular book over a seven-week period are:

Week          1   2   3   4   5   6   7
Sales (000s) 15  10  12  16   9   8  14

The mean is Ȳ = (15 + 10 + 12 + 16 + 9 + 8 + 14)/7 = 84/7 = 12. The order statistics (as we often refer to the values placed in increasing order) are: 8, 9, 10, 12, 14, 15, 16.


    Hence the median is the fourth value in the sequence, which also happens to be 12. If

    data for week 8 now becomes available (sales = 16) the mean becomes 12.5 and the

median is ½[12 + 14] = 13. However, suppose that sales for week 8 had been 116,

    because of a sudden surge in popularity. The mean becomes 25, yet the median remains

    at 13. In general the mean is sensitive to extreme observations but the median is not.

    Which value represents the true average? The question cannot be answered as framed.

    The median provides a better view of weekly sales over the first 8 weeks, but the

    publisher and the author are more interested in the numbers actually sold. The forecaster

    has the unenviable task of trying to decide whether future sales will continue at the giddy

    level of 100 thousand plus, or whether they will revert to the earlier more modest level.

    The wise forecaster would enquire into the reasons for the sudden jump, such as a rare

    large order or a major publicity event.
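The calculations in Example 2.1, and the differing sensitivity of the two averages to the week-8 surge, can be reproduced with Python's standard statistics module:

```python
from statistics import mean, median

sales = [15, 10, 12, 16, 9, 8, 14]    # weeks 1-7, in thousands

print(mean(sales), median(sales))     # 12 12

sales.append(16)                      # week 8 comes in at 16
print(mean(sales), median(sales))     # 12.5 13.0

sales[-1] = 116                       # ...or at 116 after the sales surge
print(mean(sales), median(sales))     # 25 13.0 -- the median barely moves
```

The mean chases the extreme value while the median stays put, which is the point made above.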

    2.4.3 Measures of variation

    A safe investment is one whose value does not fluctuate much over time. Similarly,

    inventory planning is much more straightforward if sales are virtually the same each

    period. Implicit in both these statements is the idea that we use some measure of

    variability to evaluate risk, whether of losing money or running out of stock. There are

    three measures of variability that are in common use: the range, the mean absolute

    deviation, and the standard deviation. The standard deviation is derived from the

    variance, which we also define here.


The deviations are defined as the differences between each observation and the mean. By construction, the mean of the deviations is zero, so to compute a measure of variability we use either the absolute values or the squared values. If we use the squares, our units of measurement become squared also. For example, revenues (in $) become ($²), so we reverse the operation after computing the average by taking the square root to ensure the measure remains in $. These various measures are defined as follows, in terms of the deviations di = Yi − Ȳ.

The range denotes the difference between the largest and smallest values in the sample:

    Range = Y(n) − Y(1)

The Mean Absolute Deviation is the average of the deviations about the mean, ignoring the sign:

    MAD = Σ|di| / n    (2.2)

The Variance is an average of the squared deviations about the mean:

    S² = Σ di² / (n − 1)    (2.3)

The Standard Deviation is the square root of the variance:

    S = √S² = √[Σ di² / (n − 1)]    (2.4)


    Example 2.2: Calculation of measures of variation

    Consider the values for the seven weeks of book sales, given in Example 2.1. From the

    order statistics, we immediately see that the range is:

    Range = 16 8 = 8.

    However, if week 8 is entered with sales = 116, the range shoots up to 116 8 = 108.

    This simple example illustrates both the strength and weakness of the range: it is very

    easy to compute, but it is severely affected by extreme values. Its vulnerability to

    extreme values makes it unsuitable for most purposes in forecasting. The deviations for

    the seven weeks are:

From the table, we have MAD = 18/7 = 2.57, S² = 58/6 = 9.67 and S = 3.11.
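These summary measures are simple to verify in code; the following is a minimal Python sketch (standard library only) for the seven weeks of sales:

```python
import math

sales = [15, 10, 12, 16, 9, 8, 14]          # weekly book sales (000s), Example 2.1
n = len(sales)
mean = sum(sales) / n                        # 84/7 = 12

deviations = [y - mean for y in sales]       # sum to zero by construction
mad = sum(abs(d) for d in deviations) / n    # mean absolute deviation, equation (2.2)
variance = sum(d ** 2 for d in deviations) / (n - 1)  # equation (2.3), n-1 degrees of freedom
std_dev = math.sqrt(variance)                # equation (2.4)

print(round(mad, 2), round(variance, 2), round(std_dev, 2))  # 2.57 9.67 3.11
```

Note the (n − 1) divisor in the variance, matching the degrees-of-freedom discussion below; Python's `statistics.stdev` uses the same convention.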

    Why do we use (n-1) rather than n in the denominator of the variance?

    Since we are using the deviations, if we had only one observation, its deviation would

    necessarily be zero. That is, we have no information about the variability in the data.

    Likewise, in our sample of seven, if you tell me six of the deviations, I can work out the

    value of the seventh observation from the fact that they must sum to zero. In effect, by

    subtracting the mean from each observation we have lost an observation. In statistical

Week          1    2    3    4    5    6    7   Sums
Sales (000s)  15   10   12   16   9    8    14    84
Deviation     +3   −2    0   +4   −3   −4   +2     0
|d|            3    2    0    4    3    4    2    18
d²             9    4    0   16    9   16    4    58


    parlance, this is known as losing a degree of freedom and we say that the variance is

    computed using (n-1) degrees of freedom, which we abbreviate to (n-1) DF. In later

    chapters, we sometimes lose several DF, and the definitions of variability will change

    accordingly. This adjustment has the benefit of making the sample variance an unbiased

estimator for the population variance. Why don't we use (n − 1) in the MAD? Standard practice is to use n, but there is no other good reason!

    Is S always bigger than MAD?

S gives greater weight to the more extreme observations by squaring them, and it may be shown that S > MAD whenever MAD is greater than zero. A rough relationship between the two is S ≈ 1.25 MAD.

    2.4.4 Assessing variability

    The statement that our book sales have a standard deviation of 3.11 (thousand,

    remember) conveys little about the inherent variability in the data from week to week,

    unless we live and breathe details about the sales of that particular book, like any

    penniless author. To produce a more standard frame of reference, we use standardized

scores. Given a sample mean Ȳ and sample standard deviation S, we define the standardized scores for the observations, also known as Z-scores, as:

    Z_i = (Y_i − Ȳ) / S.

Following our simple example, we obtain:

Week          1      2      3     4      5      6      7
Sales (000s)  15     10     12    16     9      8      14
Deviation    +3     −2      0    +4     −3     −4     +2
Z-score      0.96  −0.64    0    1.29  −0.96  −1.29   0.64

The Z-scores still do not provide much information until we provide a frame of reference. In this book, we usually use Z-scores to examine forecast errors and proceed in three steps:

1. Check that the observed distribution of the errors is approximately normal (for details, see Appendix X).

2. If the assumption is satisfied, relate the Z-score to the normal tables (provided in Appendix X):
   The probability that |Z| > 1 is about 0.32
   The probability that |Z| > 2 is about 0.046
   The probability that |Z| > 3 is about 0.0027

3. Create a time series plot of the residuals (and/or Z-scores) when appropriate to determine which observations appear to be extreme.

At this stage we do not pursue the systematic use of Z-scores except to recognize that whenever you see a Z-score greater than 3 in absolute value, the observation is very atypical, since the probability of such an occurrence is less than 3 in 1,000. Often, such large values will signify that something unusual has happened, and we refer to such observations as outliers. In cross-sectional studies it is sometimes admissible simply to delete such observations (e.g. a report of a 280-year-old man is undoubtedly a recording error). In time series forecasting, we wish to retain the complete sequence of values and must investigate more closely, often finding special circumstances (e.g. a strike, bad
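The Z-scores in the table, and the normal tail probabilities used in step 2, can be checked with a few lines of code; this is an illustrative Python sketch using only the standard library:

```python
import math
from statistics import NormalDist

sales = [15, 10, 12, 16, 9, 8, 14]           # weekly book sales from Example 2.1
n = len(sales)
mean = sum(sales) / n
s = math.sqrt(sum((y - mean) ** 2 for y in sales) / (n - 1))  # sample standard deviation

z_scores = [round((y - mean) / s, 2) for y in sales]
print(z_scores)  # [0.96, -0.64, 0.0, 1.29, -0.96, -1.29, 0.64]

# Two-sided tail probabilities P(|Z| > k) under the standard normal
for k in (1, 2, 3):
    print(k, round(2 * (1 - NormalDist().cdf(k)), 4))
```

The printed tail probabilities agree with the values quoted in step 2 (about 0.32, 0.046 and 0.0027).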


    weather, a special sales promotion) for which we had not allowed. Outliers indicate the

    need for further exploration, not routine rejection. We defer the detailed treatment of

    outliers to Chapter X.

    2.4.5 An example: hot growth companies

    The default summary outputs for Minitab and Excel for the data in Table 2.3 on hot

    growth companies are shown in Figure 2.7. The output from other programs may have a

    somewhat different format, but the summary measures included are similar and most

    programs allow a variety of options. Excel typically produces too many decimal places;

    for ease of comparison, our output has been edited to produce a reasonable number of

    decimal places. Note that the count of observations is one fewer for the P-E Ratio, as we

    had a missing value.

Both sets of summary statistics we show include a number of measures we do not need until later. However, Minitab introduces Q1 (quartile 1, the value with 25% of

    observations below Q1 and 75% above) and Q3 (quartile 3, with 75% below and 25%

    above). These, together with the median (Q2) are often useful for summarizing variables.

Figure 2.7: Descriptive Statistics for Hot Growth Companies [Growth companies.xlsx]

(a) Minitab

Variable           N    N*  Mean    SE Mean  StDev   Variance  Minimum  Q1
Return on Capital  100  0   17.028  0.900    8.998   80.961    6.300    12.100
P-E Ratio          99   1   27.06   1.12     11.11   123.36    6.00     20.00

Variable           Median  Q3      Maximum  Range
Return on Capital  14.900  18.575  68.500   62.200
P-E Ratio          25.00   31.00   71.00    65.00

    (b) Excel

    Return on Capital P-E Ratio


    Mean 17.028 Mean 27.06

    Standard Error 0.900 Standard Error 1.12

    Median 14.900 Median 25.00

    Mode 15.100 Mode 26.00

    Standard Deviation 8.998 Standard Deviation 11.11

    Sample Variance 80.961 Sample Variance 123.36

    Kurtosis 11.676 Kurtosis 1.95

    Skewness 2.814 Skewness 1.07

    Range 62.200 Range 65.00

    Minimum 6.300 Minimum 6.00

    Maximum 68.500 Maximum 71.00

    Sum 1702.800 Sum 2679.00

    Count 100 Count 99

    Given the mean and standard deviation, we proceed to compute the Z-scores, shown in

    Table 2.4. We list the top seven companies, as the interesting features are associated with

    the top few on the list; those ranked 94-100 are provided for comparison. In Table 2.4,

    we have shaded those Z-scores that are greater than 3 in absolute value; none of the

remaining companies had any Z-scores outside ±3. From the table, it is clear that the

    first two companies on the list have an ROC much greater than the rest, whereas the third

one has a P-E Ratio that is much larger. Turning to numbers 94-100, they have negative

    Z-scores for ROC and typically small scores for their P-E Ratios. That is, they do not

    look so good, although that is only with reference to the illustrious company they are

    keeping. Across all public companies, these 100 would show impressive figures that

    yielded positive Z-scores.

    Table 2.4: Z-scores for hot growth companies

Rank   Return on Capital   P-E Ratio   Z-ROC   Z-PE

    1 51.5 37 3.83 0.89

    2 68.5 53 5.72 2.33

    3 35.3 71 2.03 3.95

4 28.7 21 1.30 -0.55
5 28.2 26 1.24 -0.10

    6 29.8 14 1.42 -1.18

    7 24.4 36 0.82 0.80

    94 14.5 20 -0.28 -0.64

    95 13.0 28 -0.45 0.08

    96 14.0 23 -0.34 -0.37

    97 10.6 30 -0.71 0.26

    98 11.0 21 -0.67 -0.55

    99 10.4 31 -0.74 0.35

    100 11.1 29 -0.66 0.17
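The shading rule used in Table 2.4 is easy to automate. The sketch below applies Z = (Y − Ȳ)/S to the top three ROC values, using the summary statistics reported in Figure 2.7 (mean 17.028, standard deviation 8.998), and flags the |Z| > 3 cases:

```python
# Summary statistics for Return on Capital, taken from Figure 2.7
mean_roc, sd_roc = 17.028, 8.998

# ROC values for the top three companies in Table 2.4
top_roc = [51.5, 68.5, 35.3]

z = [round((y - mean_roc) / sd_roc, 2) for y in top_roc]
print(z)  # [3.83, 5.72, 2.03]

# Flag observations whose Z-score exceeds 3 in absolute value
outliers = [y for y, zi in zip(top_roc, z) if abs(zi) > 3]
print(outliers)  # [51.5, 68.5]
```

The flagged values match the shaded entries in Table 2.4: the first two companies are extreme on ROC, while the third is not.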

    2.5 Correlation

    In the previous section we produced numerical summaries to complement the graphical

    analysis of section 2.2. We now develop a statistic that performs a similar function for

    the scatter plots of section 2.3, known as the correlation. Before developing the

    coefficient, we examine Figure 2.8; in each case the horizontal axis may be interpreted as

time. The six plots suggest the following:

    Y1 increases with time, and is perfectly related to time;

    Y2 decreases with time, and is perfectly related to time;

    Y3 tends to increase with time but is not perfectly related to time;

    Y4 tends to decrease with time but the relationship is weaker than for Y3;

    Y5 shows virtually no relationship with time;


    Y6 is perfectly related to time, but the relationship is not linear.

    Our measure should reflect these differences, but not be affected by changes in the origin

    or changes of scale; the origins and scales of the variables are deliberately omitted from

    the diagrams as they do not affect the degree of association between the two variables.

    The most commonly used measure that satisfies these criteria is the (Pearson) Product

    Moment Correlation Coefficient, which we simply refer to as the correlation. We use the

letter r to denote the sample coefficient and the Greek letter ρ (rho) to denote the

    corresponding population quantity. Our definition refers to the sample quantity; the

    population definition follows on replacing the sample components by their expected

    values, but we shall not need that expression explicitly.

Figure 2.8: Plots of hypothetical data against time. [Six panels showing series Y1 through Y6.]


The sample correlation between X and Y is defined as:

    r = Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ) / √[ Σ (X_i − X̄)² · Σ (Y_i − Ȳ)² ]   (2.5)

When we divide the numerator and each sum in the denominator by (n − 1), the two terms inside the square root sign become the sample variances of X and Y respectively. That is, taking square roots, they represent the two standard deviations, S_X and S_Y. The numerator divided by (n − 1) is known as the sample covariance between X and Y, denoted by S_XY. That is, the correlation may be written as:

    r = S_XY / (S_X S_Y)   (2.6)

It may be shown that, for Y1 in Figure 2.8, r = 1, the maximum value possible. Similarly, Y2 has r = −1, the minimum possible. The other correlations are, for Y3, Y4, Y5 and Y6: 0.93, −0.66, −0.09 and 0 respectively. In general, we see that the absolute value of r declines as the relationship gets weaker. At first sight the result for Y6 appears odd. There is a clear relationship with X, but the correlation is zero. The reason for this is that r measures linear association, but the relationship with X is quadratic rather than


    linear. A good example would be the relationship between total revenue and price:

    charge too much or too little and total revenue is low.

    Example 2.3: Calculation of the correlation

    Using the data from Example 2.1, the detailed calculations for the correlation between

    sales and time are shown in the table. A spreadsheet could readily be set up in this

    format for direct calculations, but all standard software packages have a correlation

    function.
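As a check on the arithmetic, the correlation in this example can also be computed directly from definition (2.5); a minimal Python sketch:

```python
import math

weeks = [1, 2, 3, 4, 5, 6, 7]                 # X: week number
sales = [15, 10, 12, 16, 9, 8, 14]            # Y: book sales (000s), Example 2.1

n = len(weeks)
xbar = sum(weeks) / n                         # 4
ybar = sum(sales) / n                         # 12

sxy = sum((x - xbar) * (y - ybar) for x, y in zip(weeks, sales))  # -10
sxx = sum((x - xbar) ** 2 for x in weeks)     # 28
syy = sum((y - ybar) ** 2 for y in sales)     # 58

r = sxy / math.sqrt(sxx * syy)                # equation (2.5)
print(round(r, 3))  # -0.248
```

Dividing each sum by (n − 1) before combining them, as in equation (2.6), yields the same value, since the factors cancel.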

Week, X            1    2    3    4    5    6    7   Sums   Mean
Sales (000s), Y    15   10   12   16   9    8    14    84     12
X − X̄             −3   −2   −1    0   1    2    3      0   (X̄ = 4)
Y − Ȳ             +3   −2    0   +4  −3   −4   +2      0   (Ȳ = 12)
(X − X̄)²           9    4    1    0   1    4    9     28    4
(Y − Ȳ)²           9    4    0   16   9   16    4     58    8.3
(X − X̄)(Y − Ȳ)   −9    4    0    0  −3   −8    6    −10   −1.4

Thus, S_XY = −10/6, S_X² = 28/6 and S_Y² = 58/6, so that r = −0.248.

The example shows a weak negative correlation for sales with time; that is, sales may be declining slightly over time.

Example 2.4: Correlation for hot growth companies

For the data given in Table 2.3, the correlations among Rank, ROC and P-E Ratio are:

Variables             Correlation
Rank and ROC          −0.647
Rank and P-E Ratio    −0.267
ROC and P-E Ratio      0.306

As expected, there is a strong negative correlation between Rank and ROC, since 50 percent of the weight for the ranking is based upon ROC (high ROC relates to a small number for Rank). The correlation of P-E Ratio and Rank is also negative, but weaker (no direct weighting). Finally, we see a modest positive correlation between the ROC and the P-E Ratio. We may compare these numbers with the plots in Figure 2.5 to gain some insight into their interpretation.

2.6 Transformations

We now examine the annual figures for the number of passengers on domestic flights out of Dulles airport. The descriptive statistics are as follows:

Descriptive Statistics: Passengers (from Dulles.xlsx)

Variable    N   N*  Mean  SE Mean  StDev  Minimum  Q1    Median  Q3     Maximum
Passengers  45  0   7111  892      5987   641      1996  4538    10396  22129

Dulles, we have a problem! What does the average of 7,111 mean? Such levels were typical of the mid-eighties, but the average in a strongly trending series like this one has no meaning. Certainly, it would make no sense to use either the mean or the median to forecast the next year's traffic.

How should we deal with a series that reveals a strong trend? Everyday conversation provides a clue. We talk of the return on an investment, an increase of a certain number in sales, or the percentage change in GDP. This approach is partly a matter of


    convenience; some ideas are more readily communicated using (percentage) changes

    rather than raw figures. Thus we may regard 3 percent growth in GDP in the US or

Europe as reasonable, 1 percent as anemic and 10 percent as unsustainable (except in China and India). The same information conveyed in currency terms, measured in trillions

    of US$, would be hard to comprehend.

    From the forecasting perspective there are two further reasons for considering such

    alternatives:

1. The forecast relates directly back to the previously observed value, so that such forecasts are unlikely to be wildly off-target.

2. Averages measured in terms of changes or percentage changes in the time series are often more stable and more meaningful than averages computed from the original series.

    We now explore these options in greater detail.

    2.6.1 Differences and growth rates

    The change in the absolute level of the series from one period to the next is known as the

(first) difference¹ of the series, and it is written as:

    DY_t = Y_t − Y_{t−1}.   (2.7)

At time t, the previous value Y_{t−1} is already known. If the forecast for the difference is written as D̂_t, the forecast for Y_t, written Ŷ_t, becomes:

    F_t = Ŷ_t = Y_{t−1} + D̂_t.   (2.8)

We use ^, the hat symbol, to denote a forecast.

¹ Many texts use the Greek capital letter ∆ (delta) and others use ∇ (inverted delta), but the use of D seems a better mnemonic device for difference.


The growth rate² of the series is the percentage change from one period to the next:

    GY_t = 100 (Y_t − Y_{t−1}) / Y_{t−1}.   (2.9)

Expression (2.9) also defines the one-period return on an investment, given the opening price of Y_{t−1}. Once the growth rate has been predicted, denoted by Ĝ_t, the forecast for the next time period is:

    F_t = Ŷ_t = Y_{t−1} [1 + Ĝ_t / 100].   (2.10)
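To make the definitions concrete, the sketch below computes DY and GY for the seven weeks of book sales from Example 2.1, and then forms the forecasts (2.8) and (2.10). Using the average difference and average growth rate as the predicted values is purely illustrative, not a recommended forecasting method:

```python
y = [15, 10, 12, 16, 9, 8, 14]   # book sales from Example 2.1 (000s)

# First differences, DY_t = Y_t - Y_{t-1}, equation (2.7)
dy = [y[t] - y[t - 1] for t in range(1, len(y))]
print(dy)  # [-5, 2, 4, -7, -1, 6]

# Growth rates, GY_t = 100 (Y_t - Y_{t-1}) / Y_{t-1}, equation (2.9)
gy = [100 * (y[t] - y[t - 1]) / y[t - 1] for t in range(1, len(y))]

# Illustrative forecasts for week 8, taking the mean DY and mean GY as predictors
d_hat = sum(dy) / len(dy)
g_hat = sum(gy) / len(gy)
f_diff = y[-1] + d_hat                  # equation (2.8)
f_growth = y[-1] * (1 + g_hat / 100)    # equation (2.10)
print(round(f_diff, 2), round(f_growth, 2))
```

Note that the two approaches give different forecasts even from the same history, because averaging absolute changes and averaging percentage changes weight the observations differently.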

    The time plots for DY and GY for the Dulles passengers series are shown in Figures

    2.9A and B. Both series show a fairly stable level over time, so that the mean becomes a

    useful summary again, although GY is trending slightly downwards, indicating a slowing

    in percentage growth.

    Another feature of Figure 2.9A is that the variability in DY is much greater at the end of

    the series than it is at the beginning. By contrast, the GY series has more consistent

    fluctuations. We could claim that GY has a stable variance over time, a claim that it

    would be hard to make for DY. Which should we use? In part, the choice will depend

    upon the purpose behind the forecasting exercise, but an often reasonable guideline is a

common-sense one: "Do you naturally think of changes in the time series in absolute terms or relative (i.e. percentage) terms?" If the answer is "absolute", use DY; if it is "relative", use GY. In the present case, both transformed series show some unusual values

    and further investigation would be warranted.

    The summary statistics for DY and GY are:

² The use of G to describe growth rate is non-standard; we use it for the same reason as above: it is a convenient mnemonic device.


Descriptive Statistics: Difference, Growth rate (from Dulles.xlsx)

Variable     N   N*  Mean  SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
Difference   44  1   413   232      1536   −4342    −74    221     552    5285
Growth rate  44  1   9.41  2.83     18.78  −26.99   −1.53  5.93    18.21  84.95

These figures also reflect the considerable fluctuations that appear in each series.

    Figure 2.9A: Time plot for the first differences of the Dulles passengers series.[Dulles.xlsx]


    Figure 2.9B: Time plot for the growth rates for the Dulles passengers series[Dulles.xlsx]



    2.6.2 The log transform

In George Orwell's classic novel 1984 there is a scene where the chocolate ration is reduced by 50 percent and then increased by 50 percent. The main character, Winston, complains that he does not have as much chocolate as before, but he is sharply rebuked for his remarks. However, Winston is right, since

    (1 − 50/100)(1 + 50/100) = 0.75,

so that Winston has 25 percent less chocolate than before. To avoid this asymmetry, we may use the logarithmic, or just log, transform, usually with the natural logarithm defined on the base e = 2.71828…. The log transform may be written as L_t = ln(Y_t), and the (first) difference in logarithms becomes:

    DL_t = ln(Y_t) − ln(Y_{t−1}).   (2.11)


    The primary purpose of the log transform is to convert exponential (or proportional)

growth into linear growth. The transform often has the secondary benefit of stabilizing

    the variance, as did the use of growth rates. Indeed, the log and growth rate transforms

    tend to produce very similar results, as can be seen by comparing the plot of the log

    differences for the Dulles passengers series in Figure 2.10 with Figure 2.9B.

    Figure 2.10: Time plot for the first difference of logarithms for the Dullespassengers series (DL_pass) [Dulles.xlsx]


If we generate a forecast of the log-difference, D̂L_t say, the forecast for the original series, given the previous value Y_{t−1}, becomes:

    Ŷ_t = Y_{t−1} exp(D̂L_t).   (2.12)


    Example 2.5: Calculation of forecast using log-differences

The actual number of Dulles passengers for 2007 was 18,792 (in thousands). To make a forecast for 2008, we might use the last value of the log-difference as the forecast D̂L_t, which is 0.05496. Then equation (2.12) yields:

    Ŷ_t = 18792 × exp(0.05496) = 18792 × 1.0565 = 19,854.
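Equation (2.12) and this example are easy to verify in code; a minimal Python sketch:

```python
import math

y_2007 = 18792          # Dulles passengers in 2007 (thousands)
dl_hat = 0.05496        # forecast log-difference (last observed value)

# Equation (2.12): invert the log-difference with the exponential function
forecast_2008 = y_2007 * math.exp(dl_hat)
print(round(forecast_2008))  # 19854
```

For small values of DL, exp(DL) ≈ 1 + DL, which is why log-differences and growth rates give such similar answers.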

    2.7 How to measure forecasting accuracy?

    A key question in any forecasting endeavor is how to measure performance. Such

    measures are of particular value when we come to select a forecasting procedure, since

    we may compare alternatives and choose the method with the best track record. Then,

    once the method is being used on a regular basis, we need similar measures to tell us

    whether the forecasts are maintaining their historical level of accuracy. If a particular set

    of forecasts is not performing adequately, managerial intervention will be needed to get

    things back on track by putting improvements in place such as more timely data, better

statistical methods and software (see Chapters 13 and 14).

    The generation of forecasts and the selection of a preferred method will occupy a major

    portion of the book. Therefore, in order to discuss issues of accuracy without the need to

    develop forecasting methods explicitly at this stage, we consider an example taken from

    meteorology. Weather forecasts that appear in the media are not directed at a particular

    audience and there is no reason to suppose that forecasts of temperature would have any

    inherent bias. However, we would expect that such forecasts (and this is typically true of

    all forecasts) would become less accurate as the forecast horizon increases, in this case,

    the number of days ahead.


    We consider a set of local forecasts for daily high temperatures, extracted from the

Washington Post for the period December 17, 2003 to January 5, 2004. The forecasts are generated by AccuWeather, a weather forecasting organization. The forecasts appear for

    1 to 5 days ahead, so the initial data could be summarized as shown in Table 2.5 (first

    few days only). However, this form of presentation is not useful for the evaluation of the

    forecasts since, for example, the 4-days ahead forecast made on December 17 refers to

    conditions to be observed on December 21. To match forecasts to actual outcomes we

    must slide the columns down, as shown in Table 2.6. We may now compare forecasts in

    the same row.

Table 2.5: Temperature forecasts for Washington DC, December 17-22, 2003. [Figures represent daily highs at Reagan National Airport. Source: Washington Post.

    DC weather.xlsx]

    Date Forecasts, days ahead Actual

    1 2 3 4 5 Temp

    17-Dec-03 42 40 42 44 48 50

    18-Dec-03 36 38 42 50 54 38

    19-Dec-03 38 40 52 54 52 37

    20-Dec-03 44 52 56 54 48 40

    21-Dec-03 52 58 56 48 48 44

    22-Dec-03 58 56 44 48 50 57

Table 2.6: Temperature forecasts for Washington DC, December 17, 2003 to January 5, 2004 [Figures represent daily highs at Reagan National Airport; DC weather.xlsx]

Date        Forecasts, days ahead               Actual
            1     2     3     4     5           Temp

    17-Dec-03 50

    18-Dec-03 42 38

    19-Dec-03 36 40 37

    20-Dec-03 38 38 42 40

    21-Dec-03 44 40 42 44 44

    22-Dec-03 52 52 52 50 48 57

    23-Dec-03 58 58 56 54 54 62

    24-Dec-03 54 56 56 54 52 56

    25-Dec-03 45 44 44 48 48 43

    26-Dec-03 47 46 49 48 48 44

    27-Dec-03 54 52 54 54 50 54

28-Dec-03 54 55 54 50 54 52
29-Dec-03 60 59 58 54 54 60

    30-Dec-03 55 53 49 49 51 54

    31-Dec-03 53 50 55 53 48 49

1-Jan-04 52 52 56 57 50 55
2-Jan-04 54 54 45 52 48 50
3-Jan-04 69 64 64 62 60 68
4-Jan-04 64 64 62 60 56 72
5-Jan-04 62 58 54 54 51 49

A general format following the structure of Table 2.6 is shown in Table 2.7, where Y_t denotes the actual value in period t and F_{t|t−h} denotes the forecast made in period t−h for period t, the h-step-ahead forecast. Period t−h is called the forecast origin. Often, we are interested in one-step-ahead forecasts, and we then simplify the notation³ to F_t instead of F_{t|t−1}. Thus, F_15 refers to the one-step-ahead forecast made for period 15 at time 14, F_{15|13} to the two-step-ahead forecast made for period 15 at time 13, and so on. These values will eventually be compared to the observed value in period 15, Y_15.

³ The notation for forecasts is not standard. Some texts use F_{t+h} to denote the forecast made h steps ahead for Y_{t+h}. While this notation is simpler than ours, and appears to work well when expressed algebraically as here, the notation F_{13+2} [since it is not equal to F_15!] is potentially confusing, and F_{15|13} is clearer.


Table 2.7: Structure of forecasts for 1, 2, 3, … periods ahead

Period   Days ahead that forecasts were made                        Actual
         1                        2              3
t−1      F_{t−1} or F_{t−1|t−2}   F_{t−1|t−3}    F_{t−1|t−4}  ...   Y_{t−1}
t        F_t or F_{t|t−1}         F_{t|t−2}      F_{t|t−3}          Y_t
t+1      F_{t+1} or F_{t+1|t}     F_{t+1|t−1}    F_{t+1|t−2}        Y_{t+1}
t+2      F_{t+2} or F_{t+2|t+1}   F_{t+2|t}      F_{t+2|t−1}        Y_{t+2}

    2.7.1 Measures of forecast accuracy

Now that we have a set of forecasts and actual values with which to compare them, how should the comparisons be made? A natural approach would be to look at the differences between the observed values and the forecasts, and to use their average as a performance measure. Suppose that we start from forecast origin t, so that the forecasts are made successively (one step ahead) at times t+1, t+2, …, t+m; there being m such forecasts in all. The one-step-ahead forecast error at time t+i may be denoted by:

    e_{t+i} = Y_{t+i} − F_{t+i}

A possible indicator is the mean of the errors.

The forecast origin is the time period from which the forecasts are made.

The Mean Error (ME) is given by:

    ME = Σ_{i=1}^{m} (Y_{t+i} − F_{t+i}) / m = Σ_{i=1}^{m} e_{t+i} / m   (2.13)

The Mean Error is a useful way of detecting bias in a forecast; that is, ME will be large and positive (negative) when the actual value is consistently greater (less) than the


forecast. When the variable of interest is strictly positive, as with the number of employees or sales revenues, a percentage measure is often more useful.

The Mean Percentage Error (MPE) is:

    MPE = (100/m) Σ_{i=1}^{m} (Y_{t+i} − F_{t+i}) / Y_{t+i} = (100/m) Σ_{i=1}^{m} e_{t+i} / Y_{t+i}   (2.14)

Note that the ME is a useful measure for the temperature data, but the MPE is not, since the temperature can fall below zero. More importantly, temperature does not have a natural origin, so the MPE would give different (and equally meaningless) results depending on whether we used the Fahrenheit or Celsius scale.

Example 2.6: Calculation of ME and MPE (Electric errors.xlsx)

The calculations of ME and MPE are illustrated in Table 2.8. The data in this table represent the monthly electricity consumption (in KWH, kilowatt hours) in a Washington DC household for 2003; the column of forecasts represents the consumption in the corresponding month in 2002. Consumption is low in the winter and high in the summer because the home uses gas heating and electric air conditioning.

As noted, the ME and MPE are useful measures of bias; from Table 2.8 we see that the household generally reduced its consumption over the year, so the forecasts tended to be too high. In passing, we note that the year-on-year change is given by comparing the totals for 2002 and 2003, 13,190 and 11,270 KWH respectively, which results in a 14.6%

    drop. The 18.7 percent average given by the MPE reflects month-by-month forecasting

    performance, not the change in the totals.

    A limitation of these measures is that they do not reflect variability. Positive and

    negative errors could virtually cancel each other out, yet substantial forecasting errors

    could remain. To see this effect, suppose we used the average monthly figure for 2002 to

    predict the months of 2003. The average is 1099 KWH, a figure that seriously

    underestimates summer consumption and overestimates the rest of the year. Yet, the ME

    would be unchanged. The MPE expands to -37.2 since the errors are larger in the months

    with low consumption; however, this apparent gain is largely illusory. For example, a

    forecast value of 800 KWH per month is clearly not very useful, yet it reduces the MPE

    to -0.1, as shown in Table 2.9!

From this discussion it is evident that we also need measures that take account of the

    magnitude of an error regardless of sign.

    2.7.2 Measures of absolute error

    The simplest way to gauge the variability in forecasting performance is to examine the

    absolute errors, defined as the value of the error ignoring its sign and expressed as:

    |e_i| = |Y_i − F_i|.   (2.15)

    Thus, if we generate a forecast of F = 100, the absolute error is 20 whenever the actual

    value turns out to be either 80 or 120. As before, we may consider various averages,

    based upon the absolute errors. Those in common use are:


Mean Absolute Error:

    MAE = Σ_{i=1}^{m} |Y_{t+i} − F_{t+i}| / m = Σ_{i=1}^{m} |e_{t+i}| / m   (2.16)

Mean Absolute Percentage Error:

    MAPE = (100/m) Σ_{i=1}^{m} |Y_{t+i} − F_{t+i}| / Y_{t+i} = (100/m) Σ_{i=1}^{m} |e_{t+i}| / Y_{t+i}   (2.17)

Mean Square Error:

    MSE = Σ_{i=1}^{m} (Y_{t+i} − F_{t+i})² / m = Σ_{i=1}^{m} e_{t+i}² / m   (2.18)

Root Mean Square Error:

    RMSE = √MSE.   (2.19)

Mean Absolute Scaled Error:

    MASE = Σ_{i=1}^{m} |Y_{t+i} − F_{t+i}| / Σ_{i=1}^{m} |Y_{t+i} − Y_{t+i−1}|   (2.20)

MASE is a new measure introduced by Hyndman and Koehler (2005). The MASE is the ratio of the MAE for the current set of forecasts relative to the MAE for forecasts made using the random walk; the random walk forecast specifies the most recent observation as


    the forecast for the next period. When the MASE is greater than one, we may conclude

    that the random walk forecasts are superior. When MASE is less than one, the method

    under consideration is superior to the random walk.

The following comments are in order:

1. MAPE should only be used when Y > 0; MASE is not so restricted.

2. MAPE is the most commonly used error measure in practice. It is sensitive to values of Y close to zero, in which case the median version, MdAPE, can be used in its place.

3. The RMSE is used since the MSE involves squared errors, so that if the original series is in dollars, MSE is measured in terms of (dollars)². Taking the square root to obtain the RMSE restores the original units.

4. The RMSE gives greater weight to large (absolute) errors. It is therefore sensitive to extreme errors. It may be shown that RMSE ≥ MAE for any set of m forecasts.

5. The measure using absolute values always equals or exceeds the absolute value of the measure based on the errors, so that MAE ≥ |ME| and MAPE ≥ |MPE|. If the values are close in magnitude, that suggests a systematic bias in the forecasts.

6. Both MAPE and MASE are scale-free and so can be used to make comparisons across multiple series. The other measures are scale-dependent and cannot be used to make such comparisons without an additional scaling.

    Example 2.7: Calculation of absolute error measures

The absolute error measures are computed in Table 2.8 for the electricity forecasts. The individual terms are shown in the various columns, and MAE, MAPE and MSE are then evaluated as the column averages. RMSE follows directly from equation (2.19); that is,


by taking the square root of the MSE. The lower part of the table yields the MAE for the random walk forecasts, so that the MASE is given by the ratio of the forecast MAE to the MAE of the random walk forecasts. That is, relative to the random walk method, the forecasts based upon the same month in the previous year provide a 38 percent [= 100*(263.6 - 163.3)/263.6] reduction in the mean absolute error.

Table 2.8: Analysis of forecasting accuracy for electricity consumption in a Washington DC household [The monthly forecasts for 2003 are the corresponding actual values for 2002] (Electric Errors.xlsx)

    (a) Error analysis for actual forecasts

Month    Actual   Forecast    Errors   Absolute   Percentage   Absolute     Squared
                  [= 2002                errors       errors   percentage    errors
                  values]                                          errors

Jan-03      790       820       -30         30         -3.8         3.8         900
Feb-03      810       790        20         20          2.5         2.5         400
Mar-03      680       720       -40         40         -5.9         5.9        1600
Apr-03      500       640      -140        140        -28.0        28.0       19600
May-03      520       780      -260        260        -50.0        50.0       67600
Jun-03      810       980      -170        170        -21.0        21.0       28900
Jul-03     1120      1550      -430        430        -38.4        38.4      184900
Aug-03     1840      1850       -10         10         -0.5         0.5         100
Sep-03     1600      1880      -280        280        -17.5        17.5       78400
Oct-03     1250      1600      -350        350        -28.0        28.0      122500
Nov-03      740       890      -150        150        -20.3        20.3       22500
Dec-03      610       690       -80         80        -13.1        13.1        6400

ME = -160.0      MPE = -18.7
MAE = 163.3      MAPE = 19.1
MSE = 44483.3    RMSE = 210.9

(b) Error analysis for random walk

Month    Actual   Random walk   Errors   Absolute
                     forecast              errors

Jan-03      790       690
Feb-03      810       790          20         20
Mar-03      680       810        -130        130
Apr-03      500       680        -180        180
May-03      520       500          20         20
Jun-03      810       520         290        290
Jul-03     1120       810         310        310
Aug-03     1840      1120         720        720
Sep-03     1600      1840        -240        240
Oct-03     1250      1600        -350        350
Nov-03      740      1250        -510        510
Dec-03      610       740        -130        130

MAE = 263.6
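The summary figures in Table 2.8 can be checked with a short script (a sketch in Python rather than the book's Excel workbook; the monthly values are copied from the table):

```python
import math

actual   = [790, 810, 680, 500, 520, 810, 1120, 1840, 1600, 1250, 740, 610]   # 2003
forecast = [820, 790, 720, 640, 780, 980, 1550, 1850, 1880, 1600, 890, 690]   # = 2002 values

errors = [y - f for y, f in zip(actual, forecast)]
mae  = sum(abs(e) for e in errors) / 12
mape = 100 * sum(abs(e) / y for e, y in zip(errors, actual)) / 12
mse  = sum(e * e for e in errors) / 12
rmse = math.sqrt(mse)

# The random walk forecasts each month by the previous month's actual value,
# giving the eleven errors in part (b) of the table.
rw_mae = sum(abs(y1 - y0) for y0, y1 in zip(actual, actual[1:])) / 11
mase = mae / rw_mae   # 163.3 / 263.6, about 0.62
```

The results agree with the table: MAE = 163.3, MAPE = 19.1, MSE = 44483.3, RMSE = 210.9 and MASE = 0.62.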

Table 2.9 provides a comparison of four sets of forecasts for electricity consumption:

Last year's value, as given in Table 2.8

Yearly average (= 1099)

All months set at 800

Random walk forecast (as above)

Table 2.9: Comparison of forecasts for electricity data (Electric Errors.xlsx)

                       Error Measure
Forecast              ME     MPE    MAE   MAPE   RMSE   MASE
Last year's values  -160   -18.7    164   19.1    211   0.62
Monthly average     -160   -37.2    396   51.5    440   1.67
All F = 800          139     0.1    299   28.8    433   1.64
Random Walk

From the table, we see that the forecasts based upon last year's figures are clearly more accurate. Indeed, some local utilities use these forecasts to estimate customers' bills


    when no meter recordings are available. The ME and MPE values indicate that the

    forecasts were somewhat biased. Did the household deliberately try to conserve energy?

    Perhaps, but another factor was certainly that 2003 had a cooler summer; not something

    that could be reliably forecast at the beginning of January.

    Example 2.8: Comparison of weather forecasting errors

We may now use these measures to assess the performance of the various forecasts presented in Table 2.6. The results are given in Table 2.10; the MPE and

    MAPE are not reported since they are not sensible measures in this case. As is to be

    expected, the MAE and RMSE generally increase as the forecast horizon is extended;

    forecasts should improve as we get nearer to the event. The MASE increases because we

    used the first order lag to define the random walk. If we had used the same order of lag

    as the original forecasts, the MASE would be more similar to the lag one value.

    Table 2.10: Summary of forecast errors for weather data [DC_weather.xlsx]

             Steps Ahead
Measure      1      2      3      4      5
ME       -0.47   0.61   1.00   1.63   3.53
MAE       3.11   3.17   3.59   4.38   5.27
RMSE      4.35   3.92   4.41   5.49   6.43
MASE      0.40   0.44   0.55   0.68   0.87

    Theoretically, the RMSE for forecasts should increase as the lead time increases.

    However, we note that this expectation may be violated because of small numbers of


observations being used to compute the summary measures, as illustrated in the table for one and two steps ahead.

    2.8 Prediction intervals

Thus far, our discussion has centered upon point forecasts; that is, future observations for

    which we report a single forecast value. For many purposes, managers seem to feel

    comfortable with a single figure. However, such confidence in a single number is often

    misplaced.

    Consider, for example, the generation of weekly sales forecasts. Our method might

    generate a forecast for next week of 600 units. The manager who plans for the sale of

    exactly 600 units and never considers the possibility of selling more or fewer units is a

    fool, and will probably become an ex-manager fairly quickly! Why? Because the

    demand for most products is inherently variable! Some weeks will see sales below the

    forecast level and some will see more. When sales fall short of the point forecast, the

    business will incur holding costs for unsold inventory or may have to destroy perishable

stock. When sales exceed the point forecasts, not only will the business lose sales,

    but disappointed customers may go elsewhere in the future. The best choice of inventory

    level will depend upon the relative costs of lost sales and excess inventory, which are

described by the statistical distribution of possible sales, known as the predictive

    distribution. The selection of the best inventory level in this case is known as the

    newsvendor problem, since it was originally formulated in the context of selling

    newspapers. Our purpose here is not to dwell upon the details, which may be found in


    most management science texts, such as Winston and Albright (2001), but rather to

    emphasize the fundamental role that the predictive distribution plays in such cases.

    If we know the (relative) magnitudes of the costs of lost sales and of excess inventory we

    may define an overall cost function and then select the level of inventory to minimize

    cost; see Exercise 2.13 for further details.
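As an illustration of this idea (the critical-fractile solution sketched below is the standard result from the management science literature, not derived in this text; the demand figures and costs are hypothetical), assuming normally distributed demand:

```python
from statistics import NormalDist

def newsvendor_quantity(mean, sd, cost_under, cost_over):
    """Order quantity minimizing expected cost for normally distributed demand:
    stock up to the critical fractile cost_under / (cost_under + cost_over)."""
    fractile = cost_under / (cost_under + cost_over)
    return mean + NormalDist().inv_cdf(fractile) * sd

# A lost-sale cost of 4 per unit against a holding cost of 1 per unit
# gives a fractile of 0.8, so we stock above the mean demand.
q = newsvendor_quantity(600, 25, cost_under=4, cost_over=1)   # about 621 units
```

Note that equal costs give a fractile of 0.5, and the best stock level is simply the mean demand.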

    Sometimes, these costs are difficult to assess, and the manager will prefer to guarantee a

    certain level of service. For example, suppose that we wish to meet demand 95 percent

of the time. We then need to add a safety stock to the point forecast to ensure that the

    probability of a stock-out is no more than 5 percent. Typically, we assume that the

    predictive distribution for demand follows the normal law (although such an assumption

    is at best an approximation and needs to be checked). If we assume that the standard

deviation (SD) of the distribution is known, we may use the upper 95 percent point of the standard normal distribution4 (this value is 1.645; see Appendix Table A1), so the appropriate stock level is:

Mean + 1.645*(SD). (2.21)

The mean in this case is the point forecast. Thus, if the point forecast is 600 with an associated SD of 25, the manager would stock 600 + 1.645*25, or 641 units, to achieve the desired level of customer service.

4 The normal distribution is by far the most widely used in the construction of prediction intervals. This usage makes it critical to check that the forecast errors are approximately normally distributed. See Appendix X.
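The stock-level calculation in (2.21) can be sketched using Python's standard library, which supplies the normal quantile (the point forecast of 600 and SD of 25 are the illustrative values from the text):

```python
from statistics import NormalDist

def stock_level(point_forecast, sd, service_level=0.95):
    """Stock level meeting demand with the given probability, assuming
    normally distributed demand, as in expression (2.21)."""
    z = NormalDist().inv_cdf(service_level)   # 1.645 for a 95 percent service level
    return point_forecast + z * sd

level = stock_level(600, 25)   # about 641 units, as in the text
```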


    Expression (2.21) is an example of a one-sided prediction interval: the probability is 0.95

    that demand would be equal to or less than 641, assuming that our forecasting method is

    appropriate for the sales of that particular product. Typically the SD is unknown and must

    be estimated from the sample that was used to generate the point forecast. That is, we use

    the RMSE to estimate the SD. Further, in most forecasting applications, it is more

    common to employ two-sided prediction intervals. Putting these ingredients together, we

define the two-sided 100(1 - α) percent prediction interval as:

Forecast ± z_{α/2}*(RMSE). (2.22)

Here z_{α/2} denotes the upper 100(1 - α/2) percentage point of the normal distribution. We

    should recognize at this point that although we are using the sample value of RMSE to

    estimate the SD, we are not making any allowance for this fact. In Chapter 6, we define

    prediction intervals more precisely. For present purposes, expression (2.22) will suffice.

    The general purpose of such intervals is to provide an indication of the reliability of the

    point forecasts. The limits derived from (2.22) are sometimes expressed as optimistic and

    pessimistic forecasts; such nomenclature is useful as a way of presenting the concept to

    others, but a precise formulation of the limits as in (2.22) should be used rather than a

    vague assessment of extreme outcomes.
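A sketch of expression (2.22) in Python follows (the forecast and RMSE values in the usage line are illustrative):

```python
from statistics import NormalDist

def prediction_interval(forecast, rmse, alpha=0.05):
    """Two-sided 100(1 - alpha) percent prediction interval, expression (2.22),
    using the RMSE as an estimate of the standard deviation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 when alpha = 0.05
    return forecast - z * rmse, forecast + z * rmse

lo, hi = prediction_interval(600, 25)   # roughly (551, 649)
```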

    Example 2.9: Evaluation of prediction intervals

    The RMSE of the one-step-ahead forecasts for the weather data is given in Table 2.10 as

    4.35. We may use this sample value as an estimate of the RMSE for the process. Thus

    the 95 percent one-step-ahead prediction intervals would be:

Point forecast ± 1.96*4.35 = F ± 8.53.


    Comparison of forecast and actual values in Table 2.6 reveals that one value out of 19

    lies outside these limits (that of Jan 5). We also may observe that the extreme changes in

    the weather at the end of the sample period led to greater inaccuracies in forecasting, and

    a considerable increase in the RMSE. Finally, note that prediction intervals may be used

    for retrospective analysis as here, but their primary purpose is to provide assessments of

    uncertainty for future events.

    A detailed discussion of prediction intervals must await the formal development of

    forecasting models. The reader who wishes to preview these discussions should consult

    sections 6.xx.

    An alternative approach to using theoretical formulae when calculating prediction

    intervals is to use the observed errors to show the range of variation expected in the

forecasts. For the extended weather data set we can calculate the one-step-ahead errors made using the random walk forecasts. These form a histogram, as shown in Figure ?. From this we can see that 90% of the forecast errors fall within the interval ?. We can also fit a theoretical probability density to the observed errors. We use a normal distribution here, but other distributions are possible, since in many applications more extreme errors are observed than those suggested by a normal distribution. Fitting a distribution gives us more precise estimates of the prediction intervals; these are called empirical prediction intervals. To be useful, these empirical prediction intervals need to be based on a large sample of errors.
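A minimal sketch of the empirical approach, using the eleven random walk errors from Table 2.8(b) purely for illustration (as just noted, a much larger sample of errors would be needed in practice):

```python
from statistics import quantiles

# One-step-ahead random walk errors from Table 2.8(b)
errors = [20, -130, -180, 20, 290, 310, 720, -240, -350, -510, -130]

# The 5th and 95th percentiles bracket an empirical 90 percent interval;
# an empirical prediction interval is then (forecast + lower, forecast + upper).
cuts = quantiles(errors, n=20, method='inclusive')
lower, upper = cuts[0], cuts[18]
```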


    2.9 Basic Principles

This book, like most other texts on business forecasting, tends to devote most of its space

    to discussions of forecasting methods and underlying statistical models. However, if the

    groundwork is not properly laid, the best methods in the world cannot save the forecaster

    from the effects of poor data selection and inadequate preparation. In this and later

    chapters, we select principles from Armstrong (2001) and other sources that are

    particularly relevant to the material just covered. We number these principles in order

    within each chapter to facilitate cross-referencing but also give the original number from

    Armstrong (2001) in square brackets where appropriate. The principles have been

    reworded to meet present needs and are not direct quotations. However, the cross

    reference is cited wherever a given principle adheres to the spirit of the original

    statement.

Principle 2.1 [2.2] Ensure that the data match the forecasting situation.

    Once the underlying purpose of the forecasting exercise has been specified, the ideal data

    set can be identified. However, there are many reasons why the ideal may not be

available. For example, macroeconomic data are published with a lag that may be of several months' duration and, even then, may be published only as a preliminary estimate.

    The forecaster needs to examine the available data with respect to the end use to which

    the forecasts will be put, and make sure that a match exists.

    Principle 2.2 [5.1] Clean the data.

    Data may be wrongly recorded, omitted or affected by changing definitions. Adjustments

    should be made where necessary, but a record of such changes should be kept and made


    available to users of the forecasts. Data cleaning can be very time-consuming, although

    the plots and numerical summaries described in this chapter will go a long way towards

identifying data errors. Failure to clean data can lead to the familiar situation of "garbage in, garbage out."

    Principle 2.3 [5.2] Use transformations as required by expectations.

    We considered differences, growth rates and log-transforms in section 2.6. The

    forecaster needs to consider whether the original measurements provide the most

    appropriate framework for generating forecasts or whether some form of transformation

is desirable. The basic pattern of no growth (use original data), linear growth (use differences) or relative growth (use growth rates or log-differences) will often provide

    adequate guidance. This issue will be revisited periodically in later chapters.
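The three options can be sketched for a short hypothetical series growing at a steady 10 percent per period:

```python
import math

y = [100.0, 110.0, 121.0]

diffs     = [y1 - y0 for y0, y1 in zip(y, y[1:])]                      # first differences
growth    = [100 * (y1 - y0) / y0 for y0, y1 in zip(y, y[1:])]         # growth rates
log_diffs = [math.log(y1) - math.log(y0) for y0, y1 in zip(y, y[1:])]  # log-differences

# Under steady relative growth the growth rates (and log-differences) are
# constant, while the first differences keep increasing.
```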

    Principle 2.4 [5.8] Use graphical displays for data.

    As we have seen in sections 2.2 and 2.3, plotting the data can provide a variety of insights

    and may also suggest suitable transformations or adjustments. Graphical analysis should

    always be the first step when developing forecasting procedures, even if applied to only a

    small sample from a larger set of series.

    Principle 2.5 [5.4] Adjust for unsystematic past events (e.g. outliers).

    Data may be affected by the weather, political upheavals, supply shortages, or other

    events. Such factors need to be taken into account when clear reasons can be identified

    for the unusual observations. The forecaster should resist the temptation to give the data

    a face-lift by over-adjusting for every minor event.
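One simple numerical screen for unusual observations (an illustration of ours, not a prescription from the text) flags values far from the median, using the median absolute deviation as a robust yardstick; the threshold of three MADs is an arbitrary choice:

```python
from statistics import median

def flag_outliers(data, k=3.0):
    """Return the values lying more than k median absolute deviations
    from the median of the data."""
    med = median(data)
    mad = median(abs(x - med) for x in data)
    return [x for x in data if abs(x - med) > k * mad]
```

For example, flag_outliers([10, 11, 9, 10, 12, 95]) returns [95]; whether 95 is an error or a genuine event still requires the judgment discussed above.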

    Principle 2.6 [5.5] Adjust for systematic events (e.g. seasonal effects).


    Systematic events such as weekends, public holidays and seasonal patterns can affect the

    observed process and must be taken into account. We will discuss these adjustments in

    chapters 9 and X.

    Principle 2.7 [13.20, modified] Use error measures that adjust for scale in the data when

    comparing across series.

When you compare forecasts for a single series, scale-dependent measures such as MAE or RMSE are useful. However, when you compare across different series, you should use scale-free measures such as MAPE (if appropriate) or MASE.

    Principle 2.8 [13.25] Use multiple measures of performance based upon the errors.

    If forecast users are able to compare performance using different measures, they will be

    better able to assess performance relative to their particular needs. Multiple measures

    allow users to focus on those attributes of a forecasting procedure that they deem most

    relevant and also to check on the robustness of their conclusions. For example, one user

    may value simplicity and be willing to accept somewhat reduced accuracy in order to

keep it simple. Another may wish to avoid large errors, in which case the RMSE becomes most

    relevant, since it depends upon squared errors. A third may avoid RMSE purely because

    it gives such weight to large errors.

    Summary

In this chapter we have described the basic tools of data analysis as they relate to time series. In particular, we examined:

    Scatter plots and time series plots for preliminary analysis of the data (sections 2.2

    and 2.3)


    Basic summary statistics for individual variables (section 2.4)

    Correlation as a measure of association for cross-sectional data (section 2.5)

    Transformations of the data (section 2.6)

    Measures of forecasting accuracy (section 2.7)

    Prediction intervals as a measure of the uncertainty related to point forecasts

    (section 2.8).

    Finally, in section 2.9 we briefly examined some of the underlying principles that should

    be kept in mind when starting out on a forecasting exercise.

    References

Anderson, D.R., Sweeney, D.J. and Williams, T.A. (2005). Statistics for Business and Economics, 9th edition. Mason, OH: South-Western.

Armstrong, J.S. (2001). Principles of Forecasting: A Handbook for Researchers and Practitioners. Boston, MA: Kluwer.

Hyndman, R.J. and Koehler, A.B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22, 679-688.

Winston, W. and Albright, S.C. (2001). Practical Management Science, 2nd edition. Pacific Grove, CA: Duxbury.

    Exercises

2.1 The average monthly temperatures for Boulder, Colorado from January 1991 to September 2008 are given in Boulder.xlsx [Source: U.S. Department of Commerce, National Oceanic and Atmospheric Administration]. Plot the time


    series and also create a seasonal plot for the first four years of the series.

    Comment upon your results.

2.2 The following table contains data on railroad passenger injuries in the U.S. (rail safety.xlsx) from 1990 to 2007. Injuries represents the number of persons injured in the given year, train-miles denotes the millions of miles travelled by trains, and the final column is the ratio describing the number of injuries per 100 million miles travelled.

    a. Create a scatterplot for injuries against train-miles

    b. Plot each of the three time series

    c. Does the level of injuries appear to be changing over time? If so, in what way?

Year   Injuries   Train-miles   Injuries per T-M

    1990 473 72 657

    1991 382 74 516

    1992 411 74 555

    1993 559 75 745

    1994 497 75 663

    1995 573 76 754

    1996 513 77 666

    1997 601 78 770

    1998 535 78 683

    1999 481 82 584

    2000 658 84 781

    2001 746 88 850

    2002 877 90 979

    2003 726 89 812

    2004 679 89 760


    2005 935 90 1,040

    2006 761 92 828

    2007 938 95 990

    Source: U.S. Department of Transportation, Federal Railroad Administration.

2.3 An investor has a portfolio consisting of holdings in nine stocks (returns.xlsx).

    The end-of-year returns over the previous year are:

    -5.0, -3.7, 0.9, 4.8, 6.2, 8.9, 11.2, 18.6, 25.4.

    a. Compute the summary statistics (mean, median, MAD and S).

    b. Just before the close of business in the last trading session of the year, the

    company that had reported the 5.0 percent drop declares bankruptcy, so

    that the return becomes -100 percent. Re-compute the results and

    comment on your findings.

    c. Are simple summary statistics relevant to this investor? How would you

    modify the calculations if at all?

2.4 For the temperature data (Boulder.xlsx) in Exercise 2.1, compute the summary

    statistics (mean, median, MAD and S) for each month. Comment upon your

    results. Does it make sense to compute summary statistics across all values,

    rather than month by month? Explain why or why not.

    2.5 Compute the summary statistics (mean, median, MAD and S) for each of the

    variables listed in Exercise 2.2 (rail safety.xlsx). Are these numbers a sensible

    summary of safety conditions?


2.6 Calculate the correlation between the monthly values of electricity consumption

    (electricity.xlsx) for 2002 (listed as forecasts in the table) and 2003 in Table 2.8.

    Interpret the result.

2.7 Compute the correlations for the hot growth companies in Table 2.3 (Growth companies.xlsx) for each pair of variables Rank, P-E Ratio and Return, using the first 50 and the second 50 separately. Compare these results with those given in Example 2.4. Explain the differences.

    2.8 The quarterly sales figures and percentage growth figures for Netflix are given in

    Netflix.xlsx and the table below. Produce time series plots for each of quarterly

    sales, absolute growth and the growth rate.

    a. Are the mean and median useful in this case? Explain why or why not.

b. Calculate the growth rate for each quarter relative to the same quarter in the previous year; that is, for 2001 Q1 we have 100(17.06 - 5.17)/5.17 = 230.

    After allowing for the start-up phase of the company, do sales show signs

    of leveling off?

Year   Quarter   Quarterly Sales   Growth-absolute   Growth-percent

    2000 1 5.17 * *

    2000 2 7.15 1.97 38.1

    2000 3 10.18 3.04 42.5


    2000 4 13.39 3.21 31.5

    2001 1 17.06 3.67 27.4

    2001 2 18.36 1.30 7.6

    2001 3 18.88 0.52 2.8

    2001 4 21.62 2.74 14.5

    2002 1 30.53 8.91 41.2

    2002 2 36.36 5.83 19.1

    2002 3 40.73 4.37 12.0

    2002 4 45.19 4.46 10.9

    2003 1 55.67 10.48 23.2

    2003 2 63.19 7.52 13.5

    2003 3 72.20 9.02 14.3

    2003 4 81.19 8.99 12.4

    2004 1 99.82 18.63 22.9

    2004 2 119.71 19.89 19.9

    2004 3 140.41 20.70 17.3

    2004 4 140.66 0.25 0.2

    2005 1 152.45 11.79 8.4

    2005 2 164.03 11.58 7.6

    2005 3 172.74 8.71 5.3

    2005 4 193.00 20.26 11.7

    2006 1 224.13 31.13 16.1

    2006 2 239.35 15.22 6.8

    2006 3 255.95 16.60 6.9

    2006 4 277.23 21.28 8.3

    2007 1 305.32 28.09 10.1

    2007 2 303.69 -1.63 -0.5

    2007 3 293.97 -9.72 -3.2

    2007 4 302.36 8.39 2.9

    Source: Netflix Annual Reports


    2.9