
VISUAL DISPLAYS WITH EXCEL

Prepared by E. Erkut for BUS 201 (August 1999) Faculty of Business, University of Alberta, Edmonton, Canada

Table of Contents

Introduction
    A depiction is valued at a hundred utterances
    Pictures Beat Numbers on Any Day
    Abuse of Graphics
    Excellence in Graphing
Excel Chart Wizard
    Column Chart
    Bar Chart
    Line Chart
    Scatter Chart
    Special Purpose Charts
        Pie Chart
        Area Chart
        Bubble Chart
        Stock Chart
        Radar Chart
        Surface Chart
Elements of a Chart
    Size
    Scale
    Color
    Dressing
    Formatting text
Bad Charts
Which Chart to Use?
    Standard Charts
    Custom Charts
Faults of the Default
Message Drives the Chart
Charts and Better Charts
Examples of Less Common Applications

Recommended readings (all by Edward Tufte):

• The Visual Display of Quantitative Information (1983)
• Envisioning Information (1990)
• Visual Explanations: Images and Quantities, Evidence and Narrative (1997)

All are published by Graphics Press, Cheshire, Connecticut. They can be purchased at full price from the publisher (US$40, $48, and $45, respectively) or from Internet bookstores such as amazon.com at fairly deep discounts.

Armann Ingolfsson’s valuable input is gratefully acknowledged.

This is a first draft; student input for improvements is welcome. This document may not be photocopied without the written consent of the author.


Visual Displays:

A depiction is valued at a hundred utterances

Or is it "A picture is worth a thousand words"? Whichever you prefer, during your BCom education and afterwards you will be dealing with complex ideas and large data sets. In many instances a pictorial representation will help considerably in conveying the information, and in some cases it will also improve your own understanding of it. The pictorial representation could be a flow chart, an organizational chart, a layout, a map, or a graph. It may be in black-and-white or in color, still or animated, printed on a piece of paper or embedded in a PowerPoint presentation. Whatever its format, the goal is to facilitate communication and understanding. Tufte (1983, p. 9) writes:

"Modern data graphics can do much more than simply substitute for small statistical tables. At their best, graphics are instruments for reasoning about quantitative information. Often the most effective way to describe, explore, and summarize a set of numbers--even a very large set--is to look at pictures of those numbers. Furthermore, of all methods for analyzing and communicating statistical information, well-designed data graphics are usually the simplest and at the same time the most powerful."

As with most everything else, there are good displays and then there are not so good displays. Our goal here is to point you in the direction of good displays. It is important to differentiate between an effective display and a beautiful display. We do not mean to define beauty here, but merely to focus on some of the characteristics of effective displays that can be summarized as "principles." We are fully aware that the creation of visual displays is an area where science and art come together. We do not intend to pass judgment on the beauty of the displays (even when we call a display "beautiful" in class), but on more objective criteria such as a display's ability to convey the information without distorting it.

Until the popularization of microcomputers, the production of even simple presentation-quality charts was a laborious and expensive task. Complicated visual displays required a considerable time investment by draftspersons (I think back then they were called draftsmen, but better safe than politically incorrect). The availability of microcomputers accelerated the production of software that could generate graphs easily. The real breakthrough for the average person came in the form of spreadsheets and their powerful charting tools. The upside for the BCom student is that one can generate a large variety of charts using Excel, and all of you will. The downside is that some of these charts will be useless and, worse yet, some may convey the wrong message. The goal of this module is to explore the upside and warn students about the downside.

Much of the discussion in this module will be on the charts that can be produced using the Excel chart tool. However, we will take a broader view of visual displays and demonstrate the power of displaying geographical data and using animation. We believe that these tools are essential for successful business presentations. What follows is merely a collection of ideas demonstrated by visual displays. It is not meant to be a complete (and well-written) book on the subject, and it is priced as such. ;-)


Pictures Beat Numbers on Any Day

Consider the following data for beer sales in millions of barrels.

       1980    1981    1982    1983    1984    1985    1986    1987    1988    1989    1990    1991
Jan  14,673  13,310  15,188  14,767  14,148  15,495  15,714  15,601  15,801  15,877  16,459  16,275
Feb  14,912  14,579  14,999  14,562  14,746  14,551  15,206  15,633  15,850  15,292  15,745  15,169
Mar  16,563  16,720  17,654  16,777  17,722  16,767  16,506  17,656  17,125  17,569  17,968  16,085
Apr  16,545  17,675  17,860  18,420  16,814  17,974  17,991  17,422  17,728  17,298  17,477  17,228
May  17,971  18,874  18,216  18,165  18,745  18,858  18,670  17,436  18,310  18,409  18,101  18,900
Jun  17,929  18,863  18,092  18,467  18,468  18,232  18,648  18,584  18,584  18,821  18,579  19,164
Jul  18,693  18,798  17,174  18,497  19,116  18,586  18,327  18,091  18,172  18,283  18,246  19,882
Aug  18,025  17,718  17,502  18,273  17,588  17,713  17,057  16,807  17,725  18,885  18,963  18,627
Sep  16,291  15,715  15,635  15,708  14,581  14,534  15,264  15,824  15,777  15,625  16,086  16,115
Oct  15,637  14,609  15,071  15,407  15,140  14,358  15,620  15,497  15,610  15,825  16,621  16,654
Nov  13,562  13,121  13,649  13,619  13,061  13,115  13,529  13,184  14,019  14,785  15,442  14,470
Dec  13,319  13,934  13,309  12,463  12,893  13,134  13,967  13,687  13,322  13,455  13,970  13,641

As far as data sets go, this one is a fairly small set; there are only 144 numbers in the table. Now look at it for a while and tell me whether the data contains seasonal effects. While you are at it, tell me whether there is a trend in the sales. What I am asking for here is not terribly difficult; you could actually start seeing some patterns after a few minutes. However, trying to see trend and seasonality in a sea of numbers is not a very pleasant task. In contrast, let's take some pictures of this data. Here is the entire time series: 144 months' worth of sales data.

[Figure: the full monthly time series; horizontal axis: Month (1 to 144), vertical axis: Million Barrels (0 to 25,000)]

This chart is more pleasing to the eye (to my eye anyway) than the busy number matrix. The chart suggests heavy seasonality and perhaps a small upward trend. However, this is not the best picture of the data to see seasonality and trend unless you have eagle eyes. To check for seasonality, we can look at a plot of monthly sales over the years:


[Figure: monthly sales with one line per year; horizontal axis: Jan to Dec, vertical axis: Million Barrels (0 to 25,000)]

One quick look at this chart reveals that there is a pronounced and consistent seasonality effect in the data; the production in the summer months is much higher than the production in the winter months. (Note that this chart does not differentiate between years; every year is represented by a broken line of the same color. This is intentional; we do not want to display information that will distract from the main message: there is seasonality in the data.) To see the trend, our best bet may be to plot the annual total sales over the 12 years.

[Figure: annual total sales, 1980 to 1991; vertical axis: Million Barrels (186,000 to 206,000)]

This chart provides strong evidence that there is a small positive trend in the sales (note the scale). (Note that the line depicting sales is thicker than the gridlines, drawing our attention to it.) Producing these "pictures" of the data informed us that there was strong seasonality, and possibly a slight positive trend, in the data before we started with the analysis of the data.
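The two summaries behind the last two charts (a monthly profile and annual totals) can also be computed directly from the table. The Python sketch below is not part of the handout, which works in Excel; the variable names are illustrative, and the numbers are copied from the table above.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
YEARS = list(range(1980, 1992))

# One row per month, one value per year (1980..1991), from the beer sales table.
SALES = {
    "Jan": [14673, 13310, 15188, 14767, 14148, 15495, 15714, 15601, 15801, 15877, 16459, 16275],
    "Feb": [14912, 14579, 14999, 14562, 14746, 14551, 15206, 15633, 15850, 15292, 15745, 15169],
    "Mar": [16563, 16720, 17654, 16777, 17722, 16767, 16506, 17656, 17125, 17569, 17968, 16085],
    "Apr": [16545, 17675, 17860, 18420, 16814, 17974, 17991, 17422, 17728, 17298, 17477, 17228],
    "May": [17971, 18874, 18216, 18165, 18745, 18858, 18670, 17436, 18310, 18409, 18101, 18900],
    "Jun": [17929, 18863, 18092, 18467, 18468, 18232, 18648, 18584, 18584, 18821, 18579, 19164],
    "Jul": [18693, 18798, 17174, 18497, 19116, 18586, 18327, 18091, 18172, 18283, 18246, 19882],
    "Aug": [18025, 17718, 17502, 18273, 17588, 17713, 17057, 16807, 17725, 18885, 18963, 18627],
    "Sep": [16291, 15715, 15635, 15708, 14581, 14534, 15264, 15824, 15777, 15625, 16086, 16115],
    "Oct": [15637, 14609, 15071, 15407, 15140, 14358, 15620, 15497, 15610, 15825, 16621, 16654],
    "Nov": [13562, 13121, 13649, 13619, 13061, 13115, 13529, 13184, 14019, 14785, 15442, 14470],
    "Dec": [13319, 13934, 13309, 12463, 12893, 13134, 13967, 13687, 13322, 13455, 13970, 13641],
}

# Seasonality: average each month across the 12 years.
monthly_mean = {m: sum(SALES[m]) / len(SALES[m]) for m in MONTHS}
peak = max(monthly_mean, key=monthly_mean.get)
trough = min(monthly_mean, key=monthly_mean.get)

# Trend: add up the 12 months of each year.
annual_total = [sum(SALES[m][i] for m in MONTHS) for i in range(len(YEARS))]

print("highest average month:", peak)
print("lowest average month:", trough)
for year, total in zip(YEARS, annual_total):
    print(year, f"{total:,}")
```

With the table's figures, the annual total goes from 194,120 in 1980 to 202,210 in 1991, a rise of about 4% over the period, which is why the trend is only visible once the vertical scale is narrowed.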


We gave an example of the use of graphs to summarize and display numerical data. The introduction would be incomplete if we did not also give an example of the abuse of graphs.

Abuse of Graphics:

Consider the following chart that displays the net income of a company over five years. This example is based on a graph published in the Annual Report of Day Mines, Inc. (Tufte, 1983).

[Figure: 3-D bar chart of net income by year, 1970 to 1974]

Looks impressive, doesn't it? Makes you want to buy some of their stock. Five solid years of net income! A look at the data reveals the following table:

Year    Net Income
1970    $  (11,014)
1971    $  397,747
1972    $  521,943
1973    $1,647,001
1974    $1,435,102

By using a vertical scale that starts at a large negative number, the chart masks the net loss in 1970. The use of 3-D bars, coupled with the multiple colors and the presentation angle, diverts attention from the drop in the net income during the last year. Here is a more accurate graph:

[Figure: Net Income by year, 1970 to 1974; vertical axis from $(250,000) to $1,750,000]
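To see why the starting point of the vertical scale matters so much here, one can compute the height each bar would have under two baselines. This is a minimal Python sketch, not something from the handout; the income figures come from the table above, but the baseline of -4,000,000 is an assumed value chosen for illustration (the baseline of the original Day Mines chart is not given).

```python
# Net income by year, from the table above.
net_income = {1970: -11_014, 1971: 397_747, 1972: 521_943,
              1973: 1_647_001, 1974: 1_435_102}


def bar_heights(values, baseline):
    """Drawn height of each bar when the vertical axis starts at `baseline`."""
    return {year: value - baseline for year, value in values.items()}


# ASSUMPTION: -4,000,000 is an illustrative baseline, not the one in the report.
distorted = bar_heights(net_income, baseline=-4_000_000)
honest = bar_heights(net_income, baseline=0)

for year in net_income:
    print(year, f"{distorted[year]:>10,}", f"{honest[year]:>10,}")

# How much shorter the 1974 bar is than the 1973 bar under each baseline.
drop_distorted = 1 - distorted[1974] / distorted[1973]
drop_honest = 1 - honest[1974] / honest[1973]
print(f"1973 to 1974 drop: {drop_distorted:.1%} vs {drop_honest:.1%}")
```

With a baseline far below zero, the 1970 loss is drawn as a tall positive bar and the 1973 to 1974 decline shrinks to a few percent of the bar height; with a zero baseline the loss points downward and the decline keeps its true proportion.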


Here is a graph a conservative politician might use to argue for reduced funding for universities.

[Figure: "Univ. Budget" chart, 1985 to 1999; vertical axis in $M, running from 450 to 590]

What is wrong with this graph? It has been doctored to give the impression that university spending is increasing by leaps and bounds. How?

• It has been shrunk horizontally to give the impression of a steep increase.
• The range of the Y-scale is very narrow, resulting in a sharp climb.
• The Y-scale has been set so that the graph seems to be "breaking its frame" in 1999.

Here is another look at the same data:

[Figure: "Univ. Budget" chart, 1985 to 1999; vertical axis in $M, running from 0 to 1000]


Is this an "honest" graph? Not really. It has been stretched horizontally and shrunk vertically. As well, the Y-scale runs from 0 to $1B. These two measures mask the increase in the budget to a great extent. Finally, the grid lines have been removed. Without reference points on the plane, the eye is less likely to detect an increase.

How should the university counter the politician then? By using relevant facts instead of fabricating graphs.

• The politician is charting the university budget. Government grants pay only a portion of the budget (for example, about 55% for the UA). This portion changes over time.

• The politician's figures may not be inflation-adjusted. Assuming 3% inflation per year, the graph would look quite different.

• If the university has been expanding (i.e., increasing the number of its students), then a relevant statistic is the dollars-per-student. Depending on the numbers, the university might be able to argue that the per-student spending is going down (implying higher efficiency or lower quality of education; take your pick :-)).

[Figure: chart of three series, 1985 to 1999; vertical axis in $M, from 0 to 700; series: Budget, Government spending, Government spending (deflated)]

This graph assumes a decreasing government share of the university budget (going from 80% to 60%) and an annual inflation rate of 3%. Note that the budget is de-emphasized (shown using a dotted line and printed in light gray), and the deflated government spending is emphasized (shown using a thick dark line). This selection is consistent with the university's view of the importance of these data series in the context of this debate.

We suggest that you stay away from graph abuse (as well as data abuse); this is not only deceptive, but also unethical. However, even if you have the best of intentions, you need to develop some skills to create the best possible visual display. The rest of this module is intended to give you some guidance in generating good charts.
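The arithmetic behind the adjusted university budget graph can be sketched as follows. This Python example is not from the handout: the budget series is hypothetical (a straight line from $460M to $586M, chosen only to fall within the range of the politician's chart), while the 80% to 60% government share and the 3% inflation rate are the assumptions stated in the text.

```python
years = list(range(1985, 2000))          # 1985..1999
n = len(years)

# ASSUMPTION: hypothetical budget in $M, rising linearly from 460 to 586.
budget = [460 + 9 * i for i in range(n)]

# Government share of the budget falls linearly from 80% to 60%.
share = [0.80 - 0.20 * i / (n - 1) for i in range(n)]

INFLATION = 0.03                          # 3% per year, as assumed in the text

# Government spending in current dollars, then deflated to 1985 dollars.
grant = [b * s for b, s in zip(budget, share)]
grant_real = [g / (1 + INFLATION) ** i for i, g in enumerate(grant)]

for year, b, g, r in zip(years, budget, grant, grant_real):
    print(f"{year}  budget {b:6.1f}  government {g:6.1f}  deflated {r:6.1f}")
```

Under these assumed numbers the budget grows every year, yet government spending in 1985 dollars falls over the period, which is the point the university would want its chart to make.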


Excellence in Graphing:

Here is an excerpt from Tufte (1983, p. 13) about the definition of a good graphical display.

Graphical displays should:
• show the data
• induce the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production, or something else
• avoid distorting what the data have to say
• present many numbers in a small space
• make large data sets coherent
• encourage the eye to compare different pieces of data
• reveal the data at several levels of detail, from a broad overview to the fine structure
• serve a reasonably clear purpose: description, exploration, tabulation, or decoration
• be closely integrated with the statistical and verbal descriptions of a data set

According to Tufte, graphical excellence is "that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space." We may not be able to reach excellence with every graph we produce. However, this definition of excellence gives us a target.

Excel Chart Wizard

We now describe how best to utilize the Excel Chart Wizard. The Chart Wizard contains 14 standard types and 20 custom types of charts. This wide selection causes some confusion for beginners. For most applications, the standard types are adequate. In fact, most of the charts you will produce will be one of the following three: column chart, line chart, and scatter chart. The column chart and the line chart are useful for graphing one-dimensional data (such as a histogram of student grades or past sales data). In contrast, the scatter chart plots two-dimensional data (GPA vs. salary, advertising vs. sales).

We will give examples of charts produced by Excel with some guidelines as to which chart to select for what type of data. So as not to detract from the main task at hand, we will use the default options of the Chart Wizard. Later, we will work on improving the charts. We note that the defaults are not effective choices; if you are looking for good examples to emulate, you should look at the later graphs where we override the default options.

Column chart:

An excellent use of column charts is the display of histograms. Suppose we wish to study the reaction times of world-class sprinters. (The reaction time is the time that passes between the blast from the starter's pistol and the departure of the runner from the starting blocks.) We have the data on reaction times of all 106 athletes from the first round heats of the 100 m competition at the 1996 Atlanta Olympics. We produce two column charts using this data.

[Figure: column chart of reaction time by athlete; horizontal axis: athlete index (1 to 106), vertical axis: reaction time in seconds (0.000 to 0.250)]

[Figure: histogram of reaction times; horizontal axis: reaction time in seconds (0.10 to 0.24), vertical axis: number of athletes (0 to 25)]

The first chart is not very meaningful. In this chart, the vertical axis corresponds to reaction time in seconds, and the horizontal axis corresponds to the athlete's index. This chart is nothing but a very plain picture of the data. The second chart (a histogram) is a more useful picture; it contains a summary of the data. The X-axis corresponds to reaction times (in seconds) and the Y-axis corresponds to the number of athletes whose reaction times fell in a given range. We can clearly see the minimum, the maximum, and the mode. We also get a fairly good idea about the distribution. (In case you are curious, Donovan Bailey's reaction time was 0.172, dead average. He won his heat.)

The two charts we display above are "clustered" column charts in Excelese. There are two other options: "stacked" and "100% stacked" column charts. For the sake of keeping the discussion focused, we will not delve into these for now.
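A histogram like the second chart is built by counting how many observations fall into each range before anything is plotted. The sketch below shows that counting step in Python rather than Excel; the twelve reaction times are a made-up sample (the handout's 106 values are not reproduced here), and the 0.01-second bin width is an assumption based on the axis labels of the histogram.

```python
from collections import Counter

# ASSUMPTION: a small made-up sample, not the actual 1996 Olympic data.
reaction_times = [0.134, 0.151, 0.158, 0.163, 0.165, 0.168,
                  0.172, 0.172, 0.176, 0.181, 0.189, 0.204]


def bin_lower_edge(t, width_ms=10):
    """Lower edge (in seconds) of the bin that contains time t.

    Works in whole milliseconds so that floating-point noise cannot
    push a value into the neighbouring bin.
    """
    ms = round(t * 1000)
    return (ms // width_ms) * width_ms / 1000


counts = Counter(bin_lower_edge(t) for t in reaction_times)

# A text-only "column chart": one row per bin, one '#' per athlete.
for edge in sorted(counts):
    print(f"{edge:.2f}  {'#' * counts[edge]}")
```

The resulting counts are exactly what one would hand to a column chart: bin labels on the X-axis and frequencies on the Y-axis.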


Excel also allows you to produce 3-D versions of the column charts. For example, the 3-D version of the above histogram looks like the following.

[Figure: 3-D version of the reaction-time histogram; horizontal axis: 0.10 to 0.24, vertical axis: 0 to 25]

And, it is possible to increase the chart depth (to enhance the 3-D effect) as well as change the viewing angle (rotate the chart):

[Figures: two further 3-D versions of the same histogram, with increased chart depth and a rotated viewing angle]


Before we get carried too far away here, we should ask a simple question: Why do all this? What do we gain by the 3-D, the different shapes and the rotations? You may find my answer rather uninspiring: Nothing! I have a preference for clean and simple graphs. The objective is to communicate the quantitative information in the most clear and precise way. To me, the gimmicks add very little, and I am concerned that they may get in the way. If you like the bells and whistles of the Chart Wizard, play with it until you find the graph that pleases you most. However, keep in mind that the flashiest graph may not be the best conveyor of the information. Applying the "keep things simple" principle, we would recommend not using the Cylinder, Cone, and Pyramid charts, the last three on the list of standard types in the Chart Wizard.

Bar Chart:

A bar chart is nothing but a column chart turned on its side. In instances where the x-axis does not correspond to numbers (but names, for example), a bar chart may be more suitable. For example, the bar chart below displays the average January temperatures of a dozen US cities (in degrees Fahrenheit).

[Figure: bar chart of average January temperature (axis from 0 to 70) for Burlington, VT; Portland, ME; Albany, NY; Boston, MA; Philadelphia, PA; Baltimore, MD; New York, NY; Washington, DC; Norfolk, VA; Charlotte, NC; Charleston, SC; Miami, FL]

If you like flashy charts, then you should try the "Outdoor bars" under Custom Charts. This is just a more colorful version of the bar chart above which uses green and brown (hence "outdoor"). It may make for a nice PowerPoint chart, but it would be awful for printing on a black-and-white printer. Here is a two-city version of the above chart in the outdoor format.

[Figure: "Outdoor bars" chart for Burlington, VT and Portland, ME]


Line Chart:

This chart is very useful in displaying time series. For example, here are the monthly averages of the Dow Jones index for the 1981-1990 period. The graph shows the positive trend as well as the meltdown of October 1987 and the correction of August/September of 1990.

[Figure: line chart of the Dow Jones Index; horizontal axis: Month (1 to 109), vertical axis: 0.00 to 3500.00]

As an example of displaying multiple time series in the same line chart, we use the monthly number of visitors to two locations in Kenai Fjords National Park in Alaska during the 1990-1993 period. This chart shows that the great majority of visitors arrive in the summer, that the Exit Glacier attracts more visitors than the visitor centre, that the number of visitors is increasing, and that visits to the two sites follow the same pattern.

[Figure: line chart of monthly visitors with two series, Exit Glacier and Visitor Centre; horizontal axis: months (January, March, May, July, September, November, repeating), vertical axis: 0 to 25,000]


It is possible to chart two series where the numbers are of different orders of magnitude. For example, the chart below shows the total population (based on the right axis), the number of unemployed and the size of the armed forces (left axis) in the US during the 1947-1962 period. This chart is produced using the custom chart "Lines on 2 Axes."

[Figure: "Lines on 2 Axes" chart, 1947 to 1962; left axis (0 to 6,000): Unemployed and Armed Forces; right axis (0 to 140,000): Population]

Clearly the population increases steadily. The number of unemployed is higher than the number in the armed forces, except for the 1951-53 period where an increase in the size of the armed forces may explain the drop in the numbers of the unemployed.

Scatter Chart (or XY Chart):

This is a very useful chart for demonstrating the relation (or the lack of it) between two series of numbers. Suppose we wish to study the relation between the mean annual temperature of a region and the mortality index due to breast cancer. If we plot the data using a line chart, we do not generate a very useful summary.

[Figure: line chart with two series, Temp and Mort; horizontal axis: observation index (1 to 16), vertical axis: 0.0 to 120.0]


In contrast, the scatter chart conveys potentially useful information. There seems to be a direct relationship between temperature and breast cancer mortality.

[Figure: scatter chart; horizontal axis: Temperature (0.0 to 60.0), vertical axis: Mortality (0.0 to 120.0)]

As another example, we plot the total assets of the largest banks in the US against their net incomes in the early 1970s. Again we observe a direct relationship between assets and income. We notice that three of the banks are considerably larger than the others. We also note the superstar bank that produces a net income much exceeding its assets ($132M in assets, $222M in income).

[Figure: scatter chart; horizontal axis: Assets (0 to 600), vertical axis: Income (0 to 400)]

This completes our discussion of the most useful (or frequently used) chart types. Before closing this section, we will briefly mention some other chart types that have limited usefulness.
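Returning briefly to the scatter charts: the "direct relationship" that the eye picks up can be backed by a single number, the Pearson correlation coefficient. The handout itself does not compute it, so the Python sketch below is only a companion to the picture, and the five data points are invented for illustration (they are not the temperature and mortality data behind the chart).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# ASSUMPTION: invented values, used only to exercise the function.
temperature = [30, 35, 40, 45, 50]
mortality = [60, 70, 75, 90, 100]

r = pearson_r(temperature, mortality)
print(f"r = {r:.3f}")
```

A value near +1 corresponds to points that rise together along a line, a value near 0 to a shapeless cloud. The number does not replace the chart: an outlier such as the "superstar bank" is obvious in a scatter chart and invisible in a coefficient.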


Special Purpose Charts:

Pie Chart:

This chart is useful to display the breakdown of a whole into its parts. Below we display the monthly breakdown of visitors to Kenai Fjords National Park in 1993 using a black-and-white pie chart (under custom charts). There are several other pie chart options.

[Figure: black-and-white pie chart with slices MAY 8%, JUN 25%, JUL 29%, AUG 22%, SEP 10%, OTHER 6%]

We prefer a column chart to a pie chart for two reasons:
1) If the numerical values are not displayed on the pie chart, then it may be difficult for the viewer to differentiate between slices of similar sizes. This problem does not exist with a column chart because our eyes are better at estimating length than angle.
2) Different slices of a pie chart must be of different colors or different shades. Either way, a pie chart printed on a B&W printer does not look as clean as a column chart.
However, if you like your pie, you are welcome to it.

Area chart:

An area chart is simply a fill-coloured version of a line chart. It can be useful in displaying multiple time series that add up to a whole. For example, if we wish to display the different sources of energy over time, the following stacked area chart can be useful.

[Figure: stacked area chart, 1930 to 1990; vertical axis from 0 to 450; series: Renewables, Hydro, Nuclear, Gas, Oil, Coal]

Note that this chart nicely displays the increase in energy consumption, as well as the composition of the energy. It shows the decline in the use of coal, a steady use of oil and hydro, an increase in the use of gas, a sudden increase followed by stagnation and a decrease in the use of nuclear power, and the recent surge of renewable sources of energy.
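In a stacked area chart each series is drawn on top of the running total of the series beneath it, so the top edge of the last band is the overall total. A minimal Python sketch of that stacking step follows; the three series and their values are invented for illustration and are not the energy data behind the chart above.

```python
# ASSUMPTION: invented values for three years, in arbitrary units.
years = [1970, 1980, 1990]
series = {
    "Coal": [60, 55, 50],
    "Oil": [120, 125, 120],
    "Gas": [70, 90, 110],
}


def stack(series_by_name):
    """Upper boundary of each band, in stacking order (bottom first)."""
    length = len(next(iter(series_by_name.values())))
    running = [0] * length
    tops = {}
    for name, values in series_by_name.items():
        running = [r + v for r, v in zip(running, values)]
        tops[name] = running
    return tops


tops = stack(series)
for name, boundary in tops.items():
    print(f"{name:5s} top edge: {boundary}")
```

Only the bottom band and the overall total sit on a flat reference line. A middle band rides on whatever is below it, so its own growth or decline is harder to read; this is worth keeping in mind when deciding the stacking order.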


Bubble Chart:

The bubble chart is useful when plotting three-dimensional data. The first two dimensions are plotted as a scatter diagram. The third dimension is conveyed with the size of the "bubble" that represents the data point. This chart is particularly useful for displaying spatial data where the first two columns correspond to latitude and longitude, and the third column corresponds to a measurement at that location (such as population or sales). The chart below shows the average January temperature in 56 US cities. The temperature increases (bubbles become larger) as we go south. It is possible to add (stylized) borders to this chart by using the drawing tool.

[Figure: bubble chart titled "January Temperature" for 56 US cities, positioned by longitude and latitude]

Stock chart:

As the name implies, this chart is useful when plotting information about stocks. You can display the highest, the lowest, and the closing price of a stock for a series of days. Opening price and/or trading volume can be added as well. While all of this information can be displayed using column or line charts, the stock chart allows the presentation of the same information with less ink, and can be a useful tool.

Radar Chart:

The radar chart can display multidimensional data. However, I have not been able to find a case where the use of the radar chart made a big difference. A display of 5 alternatives with 4 attributes quickly turns into a big mess, and I would much rather use a line chart where each line represents one alternative.

Surface Chart:

This chart provides another way to plot three-dimensional data, where the third dimension is shown as height above the x-y plane. Hence, it is useful for the display of a two-dimensional function: z = f(x,y). While it can be useful, one has to adjust the viewing angle carefully for a "good view." In my opinion, its most powerful option is the display of the contours of the function in 2D using different colors. This allows, for example, the display of iso-cost lines for facility location problems.


Elements of a Chart:

Size:

In the introduction, we presented two versions of the same "university budget" chart, and suggested that both were cooked by stretching and shrinking the axes. So what is the proper shape of a chart? The answer depends on what we are charting. However, in most instances the width-to-height ratio should be between 1.2 and 2.2. (Excel's default ratio is about 1.8.) Tufte (1983) gives several reasons for favoring charts that are wider than they are tall (accessibility to the eye, ease of labeling, emphasis on causal influence), and recommends a ratio of 1.5 unless the nature of the data suggests a different shape.

Scale:

We demonstrated abuses of graphics through the manipulation of the Y-scale. So what should the Y-scale be? Here is a chart from a July 9, 1999 release of StatsCan on unemployment (from http://www.statcan.ca/english/Subjects/Labour/LFS/lfs-en.htm).

A casual look at this chart may lead one to think that unemployment does not exist in Canada any longer. Yes, in July 1999 the unemployment rate was at a 9-year low, but it was still quite a ways from being zero. If the scale starts at zero, then the line will appear rather flat. That is not the desired impression either, since a drop of the unemployment rate from 9.5% to 7.5% is significant. So where should the Y-axis start? I would use some value around 4%, since this is practically the lowest possible unemployment in most countries (some people will choose to remain unemployed). Another alternative is to find the lowest unemployment rate among OECD countries and use that as the starting point.

In general, unless there is a good reason to deviate from zero (such as the example above), I would suggest using zero as the starting point. What about the ending point? The same reasoning applies. Unless there is a good reason, I would suggest the use of a number that is slightly higher than the largest number in the series. If there are several comparable charts next to each other, they should have the same scale.
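The size and scale guidelines above are simple enough to write down as two helper functions. This is a sketch in Python, not an Excel feature: the function names, the `step` parameter, and the sample unemployment rates other than 9.5 and 7.5 are assumptions made for illustration.

```python
import math


def chart_size(height, ratio=1.5):
    """Width and height for a chart with the given width-to-height ratio.

    1.5 is the ratio recommended in the text; most charts should stay
    between 1.2 and 2.2.
    """
    return height * ratio, height


def y_axis_bounds(values, start=0.0, step=1.0):
    """Y-axis range following the rule of thumb in the text.

    Start at zero unless there is a good reason to deviate, and end at
    the first multiple of `step` above the largest value.
    """
    top = (math.floor(max(values) / step) + 1) * step
    return start, top


# A chart 4 units tall should be about 6 units wide.
print(chart_size(4.0))

# ASSUMPTION: illustrative unemployment rates between 9.5% and 7.5%.
rates = [9.5, 8.9, 8.3, 7.5]
print(y_axis_bounds(rates))              # default: axis starts at zero
print(y_axis_bounds(rates, start=4.0))   # the "around 4%" floor argued above
```

For several comparable charts placed side by side, the same bounds would be computed once from all the series together and then applied to every chart.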


Color: Spending time on selecting colors makes sense only if you will be able to display the color chart in its full glory. If you will include the chart in a B&W printed report, then you have to make sure that different colors in a chart can be differentiated when printed in black-and-white. A blue line and a red line will look like two dark gray lines on a B&W printer. For graphs to be printed on B&W printers, I would recommend switching to different shades of gray. This way, you will get what you see.

Dressing: The Excel Chart Wizard allows you to dress up a chart in infinitely different ways. Consider a simple column chart. These are the different things you can do after creating a default chart quickly:

Format the gridlines (major and minor, horizontal and vertical):
• Change the style (8 options)
• Change the color (56 options)
• Change the weight (4 options)
• Remove

Format the axis:
• The axis line: The same four options as for the gridlines
• The numbers:
    • Format
    • Fonts
    • Alignments

Format the plot area:
• The border: The same four options as for the gridlines
    • Shadows (Y/N)
• The area:
    • Remove (no color)
    • Change the color
    • Use fill effects
        • Gradient
            • One color (40 choices, can choose the darkness level of colors)
            • Two colors (40 choices for each color)
            • Preset designs (24 possibilities)
            • Shading styles (6 options)
        • Texture (24 options, such as marble, denim, wood)
        • Pattern (48 options, such as dotted, striped, plaid, shingle)
        • Picture (your favorite picture)

Format the chart area: The same options as for formatting the plot area PLUS one more option: the border can have round corners (Y/N)

Format the columns:
• The same infinite number of selections as with the plot area above
• Adjust the gap width
• Use different colors for columns

As you see, there are literally millions of ways of formatting a column chart since you can pick and choose the options you want from this long menu. We will now give an example of what formatting can do to a plain column chart. Consider the column chart displaying the histogram of the reaction times for the 100 m sprinters. We have dressed it up as follows:

• Larger and bolder fonts for the numbers along the axes
• 45-degree slanting of the numbers along the x-axis
• Dotted grid lines
• Heavier border for the plot area
• One-color gradient (gray) with horizontal shading for the plot area
• Heavier column borders with shadow
• "Pink tissue paper" texture for the columns
• Reduced the column gap from 150 (default) to 100 (i.e. made the columns wider)

We are not suggesting that these are good choices to follow; we are merely demonstrating some of the things you can do to dress up a chart. Personally, we have a preference for simplicity. However, for PowerPoint presentations, a little dressing can go a long way towards making the graphs look “professional.”

One has to be careful when dressing up a chart for two reasons: 1) Given the vast number of options, one could waste an inordinate amount of time on this task, 2) if dressing up is not done carefully the results can be quite obnoxious. We will spare you the details on how we managed to produce the "chartjunk" on the next page. Please do not look at it for more than 5 seconds at a time since it may result in temporary loss of sanity as well as double-vision. As an alternative, we also produced a PJ (Plain-Jane) version of this chart on the next page. The PJ version is a simple gray chart on a white background with no gridlines. The space between the columns has been eliminated completely. Following Tufte’s principle of simplicity (maximize the information-to-ink ratio), we would recommend using Plain-Jane charts as much as possible.

[Figure: dressed-up column chart (histogram) of the sprinters' reaction times. X-axis: Reaction time (sec.), 0.10 to 0.24; Y-axis: Number of runners, 0 to 25.]

[Figure: "ChartJunk" version of the reaction-time histogram. X-axis: 0.10 to 0.24; Y-axis: 0 to 25.]

[Figure: "PlainJane" version of the reaction-time histogram. X-axis: Reaction time in seconds, 0.10 to 0.24; Y-axis: Number of runners, 0 to 25.]

Formatting text: A chart can contain text for the following purposes:

• labels along the axes
• titles along the axes
• legend
• chart title
• labels or values on the chart
• comments

Recommendations:

• We recommend the use of a clean font that is easy to read (such as Arial or Times-Roman, as opposed to Monotype Corsiva or Impact).

• On charts that appear on the same page as text, the font size in the chart should be as large as that of the text, or possibly slightly larger. The font size on charts prepared for projection (PowerPoint or Excel) will depend on the size of the image on the screen. However, we recommend at least a 16-point font (possibly bold). (Note that Excel will size the fonts up and down as you change the size of the chart.)

• It is preferable to keep titles along the axes in one line and horizontal. It is very important to have unambiguous titles so the reader does not have to guess, even if that means a long chart title.

• The default for the legend is on the right-center. For some charts it may be possible (or desirable) to move the legend inside the plot area to a place where it does not interfere with the data. This will increase the size of the plot area and may make it easier for the reader to refer to the legend. Note that for some charts a legend is not necessary (see the above population chart).

• The best location for the chart title is its default position: top-center. This allows for a title that consists of several words without wrapping.

• It is possible to put values or labels on the chart (see the population chart). This may be useful in some instances, especially if the exact values are important. However, it is not advisable to have both grid lines and values on a chart. This would result in a chart that is too busy. If you wish to put values on the chart, consider removing grid lines. Note that another way to display the exact values of the series is by placing a data table underneath the chart.

• If the plot area allows for it, one might want to put in a brief comment summarizing the insight from the chart in a textbox. For example, a textbox insert for the population chart might contain the following text "annual increase = 1.2%."

• While there are many ways of formatting the text in a chart, the overarching concern should be to keep the chart free of clutter as much as possible.

Here is something we do not do in this document for convenience: Add titles to the charts. You should have a self-contained title at the bottom of each figure. Example:

Figure 1: This map shows Alberta’s provincial boundaries, the primary highway network, and the locations of TransAlta’s customers (black circles). The size of the circle is proportional to the number of customers at a demand point. Each circle is an aggregation point representing anywhere from 1 to 17,000 customers.

Bad Charts

Before moving on to the selection of the most suitable chart type, we would like to give several warnings about bad charts through examples.

Example 1:

Year    Units sold
1997    10000
1998    12000

What is the best chart for this data? None. You do not need a chart to display such a small data set. You can either use a table, or mention the data in a sentence "The number of units sold increased from 10,000 in 1997 to 12,000 in 1998."

Example 2:

Year       1987   1988   1989   1990   1991   1992   1993   1994   1995   1996   1997   1998
Sales      8700   8900   9300   9700   9500   9800  11200  11000  11100  11200  11600  12000

A careless use of a line chart may generate the following:

What went wrong? We charted the Year in addition to the Sales. Simple reminder: Do not include the x-axis values (or labels) when highlighting an area for charting. After you start the Chart Wizard, click on "Series," and enter the x-axis values into "Category (X) axis labels." Example 3:

Year       1987   1988   1989   1990   1991   1992   1993   1994   1995   1996   1997   1998
Sales A    8700   8900   9300   9700   9500   9800  11200  11000  11100  11200  11600  12000
Sales B    6300   6200   6400   6300   6100   6500   6100   6200   6300   6400   6200   6100

[Figure: line chart from Example 2 with the Year row charted as a series. Y-axis: 0 to 14000; X-axis: 1 to 12.]

[Figure: line chart from Example 3. Y-axis: 0 to 14000; X-axis categories: Sales A, Sales B.]

This chart neither summarizes the data nor adds to our knowledge. How did we get this chart from that data? By plotting the data in columns instead of in rows. In this chart, every one of the twelve years is represented by a line that goes from the Sales of Product A to the Sales of Product B for that year. Not very useful indeed.
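The rows-versus-columns mix-up amounts to transposing the table. The short Python sketch below is our own illustration (not an Excel feature); it uses the Example 3 data to show how the same numbers become either two twelve-point series or twelve two-point series.

```python
# Example 3 data: one row per series, one column per year (1987-1998).
sales_a = [8700, 8900, 9300, 9700, 9500, 9800,
           11200, 11000, 11100, 11200, 11600, 12000]
sales_b = [6300, 6200, 6400, 6300, 6100, 6500,
           6100, 6200, 6300, 6400, 6200, 6100]

# Series in rows: two lines, each with twelve points (what we wanted).
series_in_rows = [sales_a, sales_b]

# Series in columns: the table is transposed, so every year becomes
# its own two-point line running from Sales A to Sales B.
series_in_columns = [list(pair) for pair in zip(sales_a, sales_b)]

print(len(series_in_rows), "series of", len(series_in_rows[0]), "points")
print(len(series_in_columns), "series of", len(series_in_columns[0]), "points")
```

If a chart shows far more series than you expected, with only a couple of points each, the row/column setting is the first thing to check.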

Example 4: Consider the following breast cancer data that was used earlier to introduce scatter charts:

Temp (F)   44.2  46.3  48.5  45.1  42.3  42.1  51.3  50.0  43.5  49.9  31.8  47.3  40.2  34.0  49.2  47.8
Mortality  81.7  78.9  87.0  89.2  65.1  84.6 102.5 100.4  72.2 104.5  67.3  88.6  68.1  52.5  95.9  95.0

How did we create this gibberish from the data? We used the proper chart (namely a scatter chart), but not the proper type of the scatter chart. The above is a scatter chart where the data points are connected by lines (the fifth option under scatter chart). We needed a simple scatter chart with no lines instead. What is the lesson here? When engaging the Chart Wizard, do not disengage brain. If the chart does not make sense, you may have made a simple mistake in charting the data. Go back and redo.

[Figure: scatter chart of the data above with the data points connected by lines. X-axis: 25.0 to 55.0; Y-axis: 40.0 to 110.0.]

Example 5: Consider the following scatter chart which shows the snowfall (in centimeters) versus the percent unemployment during the last 10 years.

The chart suggests almost a perfect correlation between unemployment and snowfall in a given year. So what are we to conclude? The more people are unemployed, the more it snows. The snowmaker checks the employment statistics to decide how much of the white stuff to bestow upon us? Or is it the other way? The more it snows, the more people are out of work. Do people look at the snowfall and decide to quit work if it snows a lot? None of this sounds plausible.

What is going on? We used a scatter chart, which is usually used to display a relation, often a causal one. In this case, we cannot think of a reasonable relation. The chart merely adds to the confusion by reinforcing the misconception that there is a relationship. The seeming relationship between the two variables is called a “spurious relationship.” The two variables are unrelated, and the correlation between them (which appears to be very strong) is merely spurious.

There are several scenarios in which two variables can be related: the relationship can be causal (one variable affecting the other, such as sales vs. revenues), both variables could depend on a third variable (the number of customers in a store affecting both the number of checkouts open and the queue lengths), or the relation may be by chance. The scatter diagram is most useful for displaying causal relations. In the case of Z affecting X and Y, it is best to chart Z vs. X and Z vs. Y, instead of X vs. Y. Finally, a chance relationship should probably not be charted at all.

This is an example of garbage in, garbage out. In this case, the chart displays a picture of a relationship that does not really exist. It lends credibility to the false belief that there is a relation between these two variables, and it should not be used at all. Lesson: Think before you chart.
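A correlation coefficient is easy to compute, which is part of the danger: the arithmetic is the same whether the relation is causal, driven by a third variable, or pure chance. The sketch below is a plain-Python illustration (the function name `pearson_r` is ours). The snowfall data behind the chart is not reproduced in this document, so the example uses the temperature and mortality figures from Example 4 instead.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Data from Example 4.
temp = [44.2, 46.3, 48.5, 45.1, 42.3, 42.1, 51.3, 50.0,
        43.5, 49.9, 31.8, 47.3, 40.2, 34.0, 49.2, 47.8]
mortality = [81.7, 78.9, 87.0, 89.2, 65.1, 84.6, 102.5, 100.4,
             72.2, 104.5, 67.3, 88.6, 68.1, 52.5, 95.9, 95.0]

print(round(pearson_r(temp, mortality), 2))
```

The number that comes out says how closely the points follow a straight line and nothing more; whether the relation deserves a chart is still a judgment the analyst has to make.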

[Figure: scatter chart. X-axis: Unemployment (%), 3 to 11; Y-axis: Snowfall (inches), 20 to 110.]

Which chart to use?

Given the large selection of charts, a user may wonder which chart to use for a given data set. We will attempt to provide some guidelines here. In most instances, you will be displaying univariate data (such as time series), or bivariate data (two variables). For univariate data, a column chart or line chart is usually suitable. (Note that Tufte’s principle of simplicity implies a preference for line charts over column charts.) For bivariate data, a scatter chart is suitable. Hence, when faced with a charting task, ask yourself whether a column/line chart or a scatter chart will do the trick. Consider other alternatives only if you conclude that these simple charts are not adequate.

Below we give a summary table of standard chart types. The usefulness mark is our subjective assessment of the chart, and should not be used as a rule. The most useful charts (those you will use most often) get 10 or 9. The charts we find of little use get 2 or 3. The charts that are useful for a specific purpose and will be used only occasionally receive scores of 5 or 6.

We have excluded the 3D versions of the charts from our table. As a general principle, we recommend using 3D with great caution, if at all. In our opinion, 3D merely adds style to a chart. If done carefully (and in moderation) it could enhance a chart. However, if overdone, it can easily detract from the main message.

Standard Types:

Chart type  Subclass                    Usefulness  Comment
Column      Clustered Column            9           Useful for histograms
            Stacked Column              7           Displays elements of a whole -- absolute
            100% Stacked Column         7           Displays elements of a whole -- relative
Bar         Clustered Bar               9           Horizontal column chart
            Stacked Bar                 5
            100% Stacked Bar            5
Line        Line                        10          Useful for time series
            Stacked Line                4
            100% Stacked Line           4
Pie         Pie                         3
            Exploded Pie                2
            Pie of Pie                  1
Scatter     Scatter                     10          (X,Y) chart for two-dimensional data
            Connected Scatter           7
Area        Area                        3           The first area hides the rest
            Stacked Area                7
            100% Stacked Area           7
Doughnut    Doughnut                    2           Multiple series version of a pie chart
            Exploded Doughnut           2
Radar       Radar                       3
            Radar with Markers          3
            Filled Radar                1
Surface     Contour                     5           Useful for charting f(x,y)
            Wireframe Contour           5
Bubble                                  6           Useful for 3-dimensional data
Stock       High-Low-Close              5           Useful for plotting three related series
            Open-High-Low-Close         4
            Volume-High-Low-Close       3
            Volume-Open-High-Low-Close  2

Custom Types: Most of the options provided under "Custom Types" add little power to the Chart Wizard. In many instances, a custom chart is nothing but a formatted version of a standard chart (either in black-and-white for ease of printing on a B&W printer, or in fancy colors, fonts, and 3D for PowerPoint presentations). All custom type charts are formatted to have a polished look.

In the next table, we summarize the custom charts. In this table, we assign each chart a value, which should be interpreted as the added value the chart provides. If a certain custom chart is merely a polished version of a standard type, we give it a low value. However, this does not mean such charts are useless; they could be useful for quick generation of slick charts for someone in a hurry. In the last column, we indicate whether the chart adds functionality to the Chart Wizard. If a chart cannot be easily constructed from a standard chart, it gets a star.

Chart                Value  Comment                                                     New
Area Blocks          2      Colorful 3D area chart (w. proper angle)
B&W Area             1      B&W version of an area chart (Note: this is stacked!)
B&W Column           1      B&W 3D column chart with data table
B&W Line             1      B&W version of an area chart (it says line, but..)
B&W Pie              2      The name says it all (turns color with more than 10 pts.)
Blue Pie             4      A fancy exploded pie for PowerPoint
Colored Lines        2      Colored lines on a black background (for PPT)
Column-Area          3      One series in area, the other in columns
Columns w. depth     2      3D column chart
Cones                1      Useful to display the population of coneheads               * but..
Floating bars        9      Gantt charts                                                *
Line-Column          3      One series as a line, the other as a column
Line-Col. on 2 axes  7      Enables the charting of two series of different magnitudes  *
Line-Line on 2 axes  7      Line-line version of the above chart                        *
Logarithmic          3      Useful to display exponential growth, but can be easily     (*)
                            generated by taking the logarithm of the series
Outdoor bars         2      A fancy bar chart for PPT
Pie Explosion        5      A fancy 3D exploded pie for PPT
Smooth Lines         4      A line chart with rounded edges, a form of curve fitting    (*)
Stack of Colors      2      A 100% stacked column chart formatted for PPT
Tubes                1      A stacked bar chart with ugly formatting

In summary, we believe that there are only two charts among the custom charts that are particularly valuable: the floating bars, and the line-column or line-line charts on two axes.
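The remark that a logarithmic chart can be generated by taking the logarithm of the series can be made concrete with a small sketch. The Python below is an illustration only: the series is hypothetical (a made-up 10% growth per period), and in a worksheet the same transformation would be applied to the data column before charting.

```python
import math


def log10_series(values):
    """Return the base-10 logarithm of each (strictly positive) value."""
    if any(v <= 0 for v in values):
        raise ValueError("logarithms need strictly positive values")
    return [math.log10(v) for v in values]


# Hypothetical series growing 10% per period (illustrative numbers only).
series = [100 * 1.1 ** k for k in range(6)]
logs = log10_series(series)

# On the log scale, equal growth rates become equal steps, so
# exponential growth plots as a straight line on an ordinary line chart.
steps = [b - a for a, b in zip(logs, logs[1:])]
print([round(s, 4) for s in steps])
```

The price of the transformation is that the axis then shows logarithms rather than the original units, so the axis title has to say so.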

Faults of the Default: Given the large variety of options available it is not an easy task to set defaults that will work well on all data. We think Excel's defaults are particularly poor and some formatting is needed to make the chart presentable. The following data is taken from StatCan. The "Married" row includes persons legally married, legally married and separated, and persons living in common-law unions. (Ref: http://www.statcan.ca/english/Pgdb/People/Families/famil01.htm)

                            1994        1995        1996        1997        1998
Population  Total     29,035,981  29,353,854  29,671,892  30,003,955  30,300,422
            Male      14,383,261  14,537,509  14,691,777  14,853,426  14,997,596
            Female    14,652,720  14,816,345  14,980,115  15,150,529  15,302,826

Single      Total     12,624,813  12,574,350  12,536,661  12,724,026  12,882,536
            Male       6,738,910   6,719,031   6,704,605   6,802,694   6,886,141
            Female     5,885,903   5,855,319   5,832,056   5,921,332   5,996,395

Married     Total     13,685,640  14,069,207  14,444,072  14,494,578  14,546,493
            Male       6,839,845   7,028,219   7,212,581   7,233,410   7,255,128
            Female     6,845,795   7,040,988   7,231,491   7,261,168   7,291,365

Widowed     Total      1,447,509   1,452,894   1,456,644   1,476,818   1,495,695
            Male         251,337     251,023     250,575     258,223     265,541
            Female     1,196,172   1,201,871   1,206,069   1,218,595   1,230,154

Divorced    Total      1,278,019   1,257,403   1,234,515   1,308,533   1,375,698
            Male         553,169     539,236     524,016     559,099     590,786
            Female       724,850     718,167     710,499     749,434     784,912

Suppose we wish to see a chart showing the total number of single Canadians versus the total number of married Canadians over the 1994-1998 period. For a quick chart, we might do the following:

• Highlight the Single-Total row
• Press Ctrl and highlight the Married-Total row
• Click on the Chart Wizard
• Click Finish

Series 1 = Single, Series 2 = Married

What does the chart tell you?

• The number of married people is about twice as high as the number of singles.

[Figure: default column chart. Y-axis: 11,500,000 to 15,000,000; X-axis: 1 to 5; legend: Series1, Series2.]

Checking the data suggests that this conclusion is incorrect. In fact, the numbers are quite comparable. Why did this happen? It happened since the Chart Wizard chose to start the Y-axis at 11,500,000. We did not ask for this; Excel took control. In fact, there is no way we can prevent this from happening even if we do not hurry and click "Finish." The scale of the chart can only be changed after the chart is completed. We right-click on any one of the numbers on the Y-axis, and click on "Format Axis." Then we set the minimum equal to zero to produce a more realistic picture of the data. We also add the x-axis labels and the series names for a more complete chart. (Details: Right-click inside the chart, select "Source Data". Select the "Series" tab. Click on "Series 1" and type the series name "Singles." Click on "Series 2" and type the series name "Married." Then go to Category (x) axis labels, and enter the cell references of the year labels.)
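The distortion can be quantified. A column is drawn from the axis minimum up to its value, so the visible height is the value minus the axis minimum. The following Python sketch (the function name `apparent_ratio` is ours, purely for illustration) compares the 1994 totals from the table above under the default axis and under a zero-based axis.

```python
def apparent_ratio(a, b, axis_min=0.0):
    """Ratio of the drawn column heights for values a and b.

    A column is drawn from axis_min up to its value, so its visible
    height is (value - axis_min), not the value itself.
    """
    if axis_min >= min(a, b):
        raise ValueError("axis_min must be below both values")
    return (a - axis_min) / (b - axis_min)


# 1994 totals from the table above.
married, single = 13_685_640, 12_624_813

print(round(apparent_ratio(married, single, axis_min=11_500_000), 2))  # 1.94
print(round(apparent_ratio(married, single), 2))                       # 1.08
```

With the axis starting at 11,500,000 the married column looks almost twice as tall; with the axis starting at zero the drawn ratio matches the true ratio of about 1.08.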

This is a more accurate representation of the data. Note that the truncation of the Y-axis can result in charts that are misleading or plain silly. For example, suppose we wish to chart the numbers of single males and single females. The default chart is the following:

[Figure: column chart of singles and married with the Y-axis starting at zero. Y-axis: 0 to 16,000,000; X-axis: 1994 to 1998; legend: Single, Married.]

[Figure: default column chart of single males and single females. Y-axis: 5,200,000 to 7,000,000; X-axis: 1994 to 1998; legend: Male, Female.]

This chart suggests that the number of single males is twice as high as the number of single females. Bad news for single males looking for partners.. Note that the y-axis is truncated at 5,200,000. While the number of single males is higher than the number of single females, the difference is not nearly as high as the chart implies. Suppose we chart the number of married males and females in 1998 using the defaults. (Yes, you are not supposed to chart a table of two numbers. We are merely trying to make a point here about the defaults):

Based on the length of the columns, this chart suggests that the average male is married to about 2.5 females. Not in Canada. If we reproduce the chart using zero as the Y-axis minimum, we realize that the difference between the two column lengths is negligible. Quite apart from proper charting technique, the question remains: why are there more married females than married males in Canada? The difference is only 0.5%, and can perhaps be attributed to the accuracy of data collection. Or perhaps it reflects a difference between the states of mind of married men and women. ;-)

Having made the point about the Y-axis scale (and perhaps having belabored it a bit), let us go back to the second chart of this example, where we had fixed the scale and added some labels. Is this a presentable chart? We argue that it is not. Here is a list of reasons:

• The plot area is gray. Presumably, this will make the plot area stand out on a white page (or screen), and focus the attention on it. However, we find this somewhat distracting, and we would prefer a blank plot area. The default gray is a 25% gray. A 10% gray may be acceptable, but is not available as an option.

• The gridlines are solid black lines. We find this somewhat of a distraction too. We prefer lighter lines that are not solid. Some argue that there is no need for grid lines, while others find them somewhat useful as benchmarks.

• The colors of the columns are not well selected. The default colors for the first two series are as follows: Series 1: Periwinkle (a dirty blue), Series 2: Plum (purple). Considering that one of the most frequently used column charts will be the one displaying two series, the selection of periwinkle and plum on a gray background is not very good. On a B&W printer, you get two gray bars against a gray background. Granted, the grays are of different shades, but we think the contrast between these three colors should be maximized. We would have selected a white background, a black Series 1, and a gray Series 2. The default colors do not work well for color prints or PowerPoint presentations either. Red and blue on white work much better than periwinkle and plum on gray. As a third color, we would suggest green. Judging from the packaging of overhead pens and ball point pens, we are certain that we are not the first to have identified red-blue-green (along with the default black) as a good color set. For whatever reason, Microsoft chose to be different.

[Figure: default column chart of married males and females in 1998. Y-axis: 7,230,000 to 7,300,000; X-axis: 1998; legend: Male, Female.]
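The B&W contrast concern in the list above can be checked with a rough calculation before printing. The Python sketch below converts a color to an approximate gray level. Two things in it are assumptions of this example rather than facts from Excel: the RGB values given for periwinkle, plum, and 25% gray, and the use of the Rec. 601 luma weights as a stand-in for whatever conversion a particular printer driver applies.

```python
def gray_level(rgb):
    """Approximate gray level (0 = black, 255 = white) of an RGB color,
    using the Rec. 601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


# Assumed RGB values; check them against your own palette.
colors = {
    "periwinkle (Series 1)": (153, 153, 255),
    "plum (Series 2)": (153, 51, 102),
    "25% gray (plot area)": (192, 192, 192),
    "black": (0, 0, 0),
    "white": (255, 255, 255),
}

for name, rgb in colors.items():
    print(f"{name:24s} gray level {gray_level(rgb):5.0f}")
```

The further apart the gray levels of the series and the background, the better the chart survives a B&W printer; black, a mid gray, and white are about as far apart as three shades can be.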

To produce the following chart, we deviated from the defaults in the following ways:

• Change the plot area color from gray to white.
• Change the color of Series 1 from periwinkle to black.
• Change the color of Series 2 from plum to gray (40%).
• Change the plot area border color from gray to black (could also be removed).
• Make the plot area border thicker.
• Change the gridline color from black to light gray (25%).
• Change the gridline style from solid to dotted.
• Make the chart area border thicker, use shadow and rounded corners (possibly distracting).
• Change the gap width for the columns from 150 to 200 (resulting in thinner columns).

It is worth mentioning that such formatting takes very little time once you become familiar with the Chart Wizard.

Please note that some of the changes we made can be classified as cosmetic while others make the chart easier to read. We do not suggest that our settings are universally "better" than the defaults on all accounts. If you like the periwinkle and plum on gray with solid black gridlines and thin borders, then use them. However, keep our objections to the defaults in mind when making formatting choices. The goal is to create a chart that is clean, accurate, accessible, and attractive. The chart should focus the attention on the data and the messages in it.

Here is one more reason for spending some time on formatting. Every Excel user knows what a default chart looks like, and many use the defaults. If you use something other than the defaults, the reader (or the listener) knows that you have made an extra effort to convey your information.

[Figure: reformatted B&W column chart of singles and married. Y-axis: 0 to 16,000,000; X-axis: 1994 to 1998; legend: Single, Married.]

The chart above has been created with the B&W printer in mind. Color gives you many more options, and it is a good idea to use color when creating charts for presentations. However, please keep in mind that color combinations that look good on your ski jacket may not look good on the screen. Also note that the colors you see on your monitor may not be the same colors as your computer projector will produce on the screen. Make sure you test the colors on the projector before finalizing your presentation.

Here is an interesting way to deviate from the defaults. In addition to the use of different colors, gradients, patterns, and textures for the data series, or the plot area (or the chart area), the Chart Wizard allows for the use of pictures (gif or jpg files). For example, to chart Canada's population, we used the maple leaf to decorate the columns in the following column chart. Each maple leaf corresponds to 5 million people. We also added the total population figures at the top of each column.

Please note that while pictures make the chart appear slicker and more professional, they might distract from the main message. Hence, we would recommend against frequent use of pictures. However, on occasion, they may allow you to turn a standard chart into a more appealing one. For example, if a company has a square-shaped logo with a dark background, it may be a good idea to use the company logo to display a time series of annual sales on a column or line chart.

[Figure: column chart titled "Population" with maple-leaf columns. X-axis: 1980 to 2000; Y-axis: Million, 0.0 to 35.0; values on the columns: 24.6, 26.2, 27.8, 29.4, 30.9.]

Message Drives the Chart: We will use several examples to demonstrate how one can use the best chart for the data on hand. Example 1:

Civilian Labor Force (1,000)

Year  Males   Females
1960  46,388  23,240
1970  51,228  31,543
1975  56,299  37,475
1980  61,453  45,487
1983  63,047  48,503
1984  63,835  49,709
1985  64,411  51,050
1986  65,422  52,413
1987  66,207  53,658
1988  66,927  54,742
1989  67,840  56,030
1990  68,234  56,554
1991  68,411  56,893
1992  69,184  57,798
1993  69,633  58,407

Suppose we would like to display this labor force data from the US Bureau of Labor Statistics using a column chart (a line chart would work just as well, perhaps better). There are three charts on the next page. Which one to use? If we wish to compare the size of the male labor force against the size of the female labor force as a function of time, then we should choose A, a simple column chart. If we are more interested in displaying the growth of the total labor force, we should use B, a stacked column chart. If we are interested in the breakdown of the labor force between the two genders, and in the change of this breakdown over time, then we should use C, a 100% stacked column chart. (Note that this chart clearly shows that the female portion of the labor force went from about one-third to almost one half in 30 years.) Why a column chart and not some other chart?

• Bar chart: We are used to displaying time on the x-axis. A bar chart would be awkward here.

• Pie chart: A pie chart displays useless information in this case. If we chart the data in columns, it will only plot the first column (Males) and display the breakdown of the total to different years--fairly useless. If we chart the data by rows, then it will only show the gender breakdown for the first year--again not very useful.

• Area chart: A simple area chart would work well here since one of the columns dominates the other (for each year). Stacked and 100% stacked area charts would work as well.

• Scatter chart: It is possible to use a scatter chart here. However, it is not advisable to use a scatter chart unless we suspect a causal relationship between the two variables.
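The three column-chart variants discussed above (A, B, and C) plot different quantities derived from the same two columns. The Python sketch below is our own illustration using the first and last rows of the labor force table; it computes the total that chart B stacks up and the female share that chart C displays.

```python
# First and last rows of the labor force table (thousands).
labor_force = {
    1960: (46_388, 23_240),   # (males, females)
    1993: (69_633, 58_407),
}


def female_share(males, females):
    """Female fraction of the total, as plotted in a 100% stacked chart."""
    return females / (males + females)


for year, (males, females) in labor_force.items():
    total = males + females                # full column height in chart B
    share = female_share(males, females)   # female segment in chart C
    print(year, total, f"{share:.1%}")
# 1960 69628 33.4%
# 1993 128040 45.6%
```

The shares of 33.4% and 45.6% are the "about one-third" and "almost one half" that chart C makes visible at a glance.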

[Figure A: clustered column chart. Y-axis: 0 to 80,000; X-axis: years 60 to 93; legend: Males, Females.]

[Figure B: stacked column chart. Y-axis: 0 to 140,000; X-axis: years 60 to 93; legend: Females, Males.]

[Figure C: 100% stacked column chart. Y-axis: 0% to 100%; X-axis: years 60 to 93; legend: Females, Males.]

We can produce two more charts of this data that may make some sense: doughnut chart and radar chart. We display these charts below. Doughnut chart:

The male labor force is represented by the inner layer of the doughnut and the female labor force is represented by the outer layer. The first year is at the top. When you look carefully, you can see that the sizes of the slices grow as we go clockwise. (Of course it is much easier to see this in a column chart.) Unfortunately, this chart only displays the distribution of the sum of the labor force into years, concealing the size of the female labor force in relation to the male. To see that, we need a different donut (chart the data in rows).

In this donut, the years are arranged from inside out. As we see, females make up more of the labor force as years go by. This is a more cumbersome version of the 100% stacked column chart. We would not use these dough-not charts since we think that the column charts work much better.

[Figure: doughnut charts of the labor force data; legend: years 60 to 93, Males, Females.]

Radar chart:

[Figure: radar chart with one spoke per year (60 to 93); legend: Males, Females.]

Each gender is shown with a circular plot on this chart. The center of the chart represents the origin (zero), and the size of the labor force increases as we move away from the center. The chart shows that both circular plots move away from the origin as we go from 1960 to 1993. It also shows that the male-female gap is closing. We find this chart somewhat cute, but we would still prefer the good old column chart.

Before we complete this example, we should note that every chart we have produced so far is faulty. Why? Go back and look at the data. The gap between the data points is not uniform! It goes from 10 years (between the first two data points), to 5, to 3, and then to 1. However, the column charts show the same gap between every pair of adjacent data points. This gives the observer the impression that there was a rapid increase in the labor force at the beginning. How do we fix this? By introducing dummy rows to the table. See the next chart, which is also souped up a bit via formatting.

[Figure: column chart with dummy rows for the missing years; horizontal axis: Year (60 to 93); vertical axis: Civilian Labor Force (1,000), from 0 to 80,000; legend: Males, Females.]
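The dummy-row fix can also be sketched outside Excel. The short Python example below is only an illustration under stated assumptions: the function name `pad_with_dummy_rows` and the placeholder cell values are hypothetical, and the labor-force figures themselves are not reproduced. It takes the years shown on the chart axis and adds an empty row for every missing year, so that a category axis would space the data points in proportion to time.

```python
# Years shown on the chart axis (two-digit, as in the document).
YEARS = [60, 70, 75, 80, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93]


def pad_with_dummy_rows(rows):
    """Return one (year, value) pair per year from the first to the last year.

    `rows` maps year -> cell value. Years absent from `rows` get None,
    which stands in for an empty dummy row in the worksheet.
    """
    first, last = min(rows), max(rows)
    return [(year, rows.get(year)) for year in range(first, last + 1)]


# Placeholder cell values (hypothetical); real rows would hold the data.
table = {year: f"row for {year}" for year in YEARS}
padded = pad_with_dummy_rows(table)

# Gaps between consecutive data points before padding.
gaps = [later - earlier for earlier, later in zip(YEARS, YEARS[1:])]

print(gaps)
print(len(YEARS), "rows before padding,", len(padded), "rows after")
```

Plotting the padded rows as categories leaves blank slots for the years without data, which is what removes the false impression of a rapid early increase.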

Example 2: Suppose the table below shows the sales of a given product in four sales regions: North, South, East, and West. We will use a line chart since with four time series the column chart would become too busy.

Sales ('000)  1991  1992  1993  1994  1995  1996  1997  1998
North          225   230   233   242   251   254   258   262
South          350   348   351   335   310   295   300   295
East           145   210   180   145   190   150   205   170
West            70   100   120   155   165   190   220   235

[Figure: line chart of sales by region, 1991 to 1998; vertical axis from 0 to 400; legend: North, South, East, West.]

Note that this chart summarizes the data quite nicely. The sales in the

•= North are climbing slowly but steadily, having passed $250,000
•= South are steady with the exception of a $50,000 drop during the mid-90s
•= East are quite erratic with ups and downs in the $150,000 - $200,000 range
•= West are increasing steadily and strongly, coming close to $250,000
•= South is the leader in sales among regions
•= West has moved from the bottom to the third position, close to the second region

Another useful chart is a stacked area chart, showing the total sales and its distribution among the regions:

[Figure: stacked area chart of sales by region, 1991 to 1998; vertical axis from 0 to 1200; legend: West, East, South, North.]

This chart shows that the total sales have gone from about $800,000 in 1991 to almost $1M in 1998. Most of the observations we made above can be made on this graph as well, but not as easily. The rapid increase of sales in the West is clearly visible, as are the ups and downs of the East. However, the $50,000 drop in the South and its leadership are not as clearly visible as in the line chart. If we produce a 100% stacked area (or line, or column) chart, we can see the percentage contributions of each region over time clearly. Instead of a 100% stacked area chart, below we show the "Stack of Colors" custom chart, which serves the same purpose. In fact the stack of colors chart is more accurate, since the area chart gives the misperception that the data is continuous while the stack of colors chart makes it clear that there is only one data point per year.

You can experiment with other chart types using the same data.

[Figure: "Stack of Colors" custom chart of regional sales shares, 1991 to 1998; vertical axis from 0% to 100%; legend: North, South, East, West.]
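The totals behind a stacked chart and the percentages behind a 100% stacked chart are simple to compute by hand. The Python sketch below is an illustration only (the variable names are ours, not part of the original handout); it uses the regional sales table from this example to derive the yearly totals and each region's percentage share.

```python
# Sales in $1,000 by region, 1991 to 1998, copied from the table in this example.
SALES = {
    "North": [225, 230, 233, 242, 251, 254, 258, 262],
    "South": [350, 348, 351, 335, 310, 295, 300, 295],
    "East": [145, 210, 180, 145, 190, 150, 205, 170],
    "West": [70, 100, 120, 155, 165, 190, 220, 235],
}
YEARS = list(range(1991, 1999))

# Height of the stacked chart in each year: the sum over the four regions.
totals = [sum(values[i] for values in SALES.values()) for i in range(len(YEARS))]

# Heights in the 100% stacked chart: each region's share of that year's total.
shares = {
    region: [100 * value / total for value, total in zip(values, totals)]
    for region, values in SALES.items()
}

for year, total in zip(YEARS, totals):
    print(year, total)
print("West share, first and last year:",
      round(shares["West"][0], 1), round(shares["West"][-1], 1))
```

The first and last totals correspond to the "about $800,000" and "almost $1M" readings taken from the stacked chart.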

Charts and Better Charts:

Example 1: Tradeoff between 2 variables (bivariate charts)

Suppose you are in the market for a rental apartment. After studying the newspaper and making some phone calls, you have narrowed the choice down to 5 apartments that are very similar in all respects except two: rent and distance to the university. The table below summarizes the relevant data.

Apt.  Distance (m.)  Rent ($/mo.)
A     3800           400
B     2200           500
C     1100           600
D     2600           520
E     3100           460

Here is a column chart, and a scatter chart.

[Figure: column chart of Distance (m.) and Rent ($/mo.) for apartments A to E; vertical axis from 0 to 4000.]

[Figure: scatter chart with Distance (m.) on the horizontal axis (900 to 4900) and Rent ($/mo.) on the vertical axis (300 to 650).]

The scatter chart is considerably more useful for at least two reasons: 1) the axes can be scaled independently, 2) the distance-rent tradeoff is more obvious. Furthermore, the scatter chart makes it clear quickly that one of the alternatives, namely D, is “dominated.” Note that B is closer and costs less. Depending on your preferences, you can pick A, B, C, or E, but you should never pick D. This is very difficult to see in the column graph, but rather easy to spot in the scatter chart. A chart that reveals more is a better chart. As a general principle, two-dimensional data should be charted using a scatter graph.

Example 2: Univariate series with labels (example: annual production of companies)

[Figure: two charts titled "1989 Beer Production (in Million barrels)" for Anh.-Busch, Miller, Stroh's, Coors, and Heileman: one with a vertical axis from 0 to 90 and one with a horizontal axis from 0 to 100.]

This is not a time series; the data corresponds to different companies. The line chart is more suitable for plotting time series. The first impression one may get from the line chart is that the beer production is going down. In this case, a bar chart may be more appropriate. If one wishes to display the percent market share, a pie chart can be used.

Example 3: Tradeoffs between many (>2) objectives

Suppose you wish to locate a manufacturing facility in a region. You are considering a total of five objectives, and a preliminary study identified five candidate sites for the facility. You have evaluated each site under each objective, and given it a subjective score from 1 (worst) to 10 (best). The table below contains the scores.

                 A   B   C   D   E
Fixed cost       7   6   9   1   3
Operating cost   4   6  10   3   7
Highway access   5   8   2   9   7
Unemployment     8   3   5   4   1
Population       2   6   1  10   7

For example, according to this table, site C has very low fixed and operating costs, but it also has poor highway access, and it is in a remote area with low population (customer base). On the other hand, site D is in a major city, but has very high fixed costs. Consider a 3D column chart and a line chart of this data.

[Figure: 3D column chart of the scores by site (A to E) and objective.]

[Figure: line chart of the scores (0 to 10) across the five objectives, one line per site (A to E).]
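The site scores can also be compared numerically. The Python sketch below is a minimal illustration, assuming a plain weighted sum of the five scores; the function names are hypothetical and not part of the original handout. It evaluates the three weightings considered in this example (equal weights, and a fixed-cost weight of 0.1 or 0.3 with the remainder split equally among the other four objectives) and checks whether any site is dominated, that is, no better than some other site on every objective.

```python
OBJECTIVES = ["Fixed cost", "Operating cost", "Highway access",
              "Unemployment", "Population"]

# Scores per site, in the order of OBJECTIVES (1 = worst, 10 = best).
SCORES = {
    "A": [7, 4, 5, 8, 2],
    "B": [6, 6, 8, 3, 6],
    "C": [9, 10, 2, 5, 1],
    "D": [1, 3, 9, 4, 10],
    "E": [3, 7, 7, 1, 7],
}


def weighted_score(scores, weights):
    """Weighted sum of one site's scores."""
    return sum(s * w for s, w in zip(scores, weights))


def best_site(weights):
    """Site with the highest weighted score under the given weights."""
    return max(SCORES, key=lambda site: weighted_score(SCORES[site], weights))


def weights_with_fixed_cost(fixed_weight):
    """Give `fixed_weight` to fixed cost and split the rest equally."""
    others = len(OBJECTIVES) - 1
    return [fixed_weight] + [(1 - fixed_weight) / others] * others


def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


dominated = [site for site in SCORES
             if any(dominates(SCORES[other], SCORES[site])
                    for other in SCORES if other != site)]

for w in (0.2, 0.1, 0.3):  # 0.2 on fixed cost is the equal-weights case
    print("fixed-cost weight", w, "-> best site:",
          best_site(weights_with_fixed_cost(w)))
print("dominated sites:", dominated)
```

On a line chart, a dominated site would show up as a line that never rises above some other line.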

The 3D column chart may appear impressive at first. However, it offers no help in revealing the data (since some values are hidden). In contrast, the simple line chart displays the strengths and weaknesses of each site nicely. This chart also makes it clear that there is no clear-cut winner among the candidate sites. One could use a weighted score to determine the best site. If one uses equal weights, then site B is the best. If one reduces the weight of the fixed cost to 0.1 and distributes the rest of the weight equally among the four remaining objectives, site D comes out ahead. If, on the other hand, one increases the weight of the fixed cost to 0.3, then site C is the chosen one. Finally, if one gives a sufficiently high (low) weight to unemployment, then site A (E) is chosen. Hence, depending on the preferences of the decision-maker, any one of these sites could be selected.

How could we tell if one of the sites could never be selected? If one of the lines is consistently below another one, then the alternative corresponding to the lower line should never be selected. Note that this is the multidimensional version of the dominance rule we have seen in Example 1.

Regarding the use of column charts with multiple series, we recommend against using them with more than 3 series, and we recommend the use of the 3D column chart with great caution, if at all.

Example 4: Two-dimensional data

Suppose you are interested in the relation between the size and the price of a bungalow near the university. You start with a small sample consisting of five recent sales from McKernan (actual data). To get a feel for the data, you wish to chart it. You already know that for bivariate data such as this, the choice is the scatter chart.

Address       Size (sqft)  Price ($)
7314 111 St   797          125500
1216 75 Av    915          132000
0929 75 Av    969          144500
1258 73 Av    1112         138000
1242 75 Av    1308         151900

[Figure: two scatter charts of Price ($) against Size (sq-ft); horizontal axis from 700 to 1400; in the chart on the left the data points are connected by lines.]

Both of these are scatter graphs. However, in the one on the left, the data points are connected via broken lines. This is not a good idea. This chart gives the reader the impression that the bungalow prices go up steeply between 900 and 1000 square feet, and then they go down between 1000 and 1100 square feet. This is probably not the case. At any rate, we have no data for the intervals between the data points, and we should not "connect the dots." It is a better idea to use a plain scatter chart. If we suspect a linear relation between size and price, we can indicate that on the chart by adding a linear trendline.
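A linear trendline is an ordinary least-squares line through the points. The Python sketch below is a hedged illustration of that calculation for the five bungalow sales; the function name `linear_trend` is ours, and the result is not claimed to match any particular chart in the handout.

```python
# (size in sq-ft, price in $) for the five sales in the table of Example 4.
BUNGALOWS = [
    (797, 125500),
    (915, 132000),
    (969, 144500),
    (1112, 138000),
    (1308, 151900),
]


def linear_trend(points):
    """Least-squares slope and intercept of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = linear_trend(BUNGALOWS)
print(f"price = {intercept:.0f} + {slope:.2f} * size")
```

The slope is the estimated price change per additional square foot. With only five points it should be read as a rough indication, not a reliable estimate.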

Examples of Less Common Applications:

Example 1: Error bounds

[Figure: line chart of Sales (1,000 units), from 200 to 500, by Quarter (1 to 12), with error bars on the forecast line.]

In this chart, the gray line represents actual sales for Quarters 1 through 8 and the black line represents a model that best fits the past data. Note that the black line has been extended into Quarters 9 through 12. These are the forecasts for the next year. (We are not concerned with how these forecasts are generated--you will learn about sales forecasting in other courses.) Note that we have added "Error bars" to the black line. These bars are set to plus/minus 10% of the forecasted value. Note that for the first 8 periods, the actual sales have always been within 10% of the forecast. This gives us some confidence (but no guarantees) that the quarterly sales during the next year will be within 10% (error bars) of the forecasts. Of course these error intervals can be computed separately in Excel and put into the graph (see chart below). However, the "Error Bars" feature (right-click on the forecast series, click on Format Data Series, click on Y-error bars) displays the range automatically.

[Figure: chart of Sales by Quarter (1 to 12); vertical axis from 200 to 500.]
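Computing the error intervals separately amounts to two extra columns, forecast times 0.9 and forecast times 1.1. The Python sketch below illustrates this with hypothetical numbers: the `FORECAST` and `ACTUAL` values are made up for the example and are not the data behind the charts above.

```python
# Hypothetical data: 12 quarterly forecasts and 8 quarters of actual sales.
FORECAST = [300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410]
ACTUAL = [285, 325, 310, 345, 330, 365, 350, 385]


def error_bounds(forecast, fraction=0.10):
    """Lower and upper bound at plus/minus `fraction` of each forecast."""
    return [(f * (1 - fraction), f * (1 + fraction)) for f in forecast]


def within_bounds(actual, bounds):
    """For each quarter with actual sales, is the value inside its bounds?"""
    return [low <= a <= high for a, (low, high) in zip(actual, bounds)]


bounds = error_bounds(FORECAST)
print("all past quarters within 10%:", all(within_bounds(ACTUAL, bounds)))
for quarter in range(9, 13):
    low, high = bounds[quarter - 1]
    print(f"Quarter {quarter}: {low:.0f} to {high:.0f}")
```

The two bound columns can be charted as extra series, which is the manual alternative to the built-in error bars.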

Example 2: Student Evaluations

The attached table contains summarized results of the students' responses to the "Overall, this course was, ..." question in our introductory business courses over a two-year period (actual data). Students provide a numerical answer on a (1,5)-scale to this question, where 1 = poor, 2 = fair, 3 = acceptable, 4 = very good, 5 = excellent. For example, for Course E, 4.8% of the students thought it was poor, 13.2% of them thought it was fair, and so on.

Course      1      2      3      4      5
A        0.7%   5.4%   7.6%  44.3%  41.4%
B        0.2%   3.8%  17.8%  52.8%  25.3%
C        2.9%   9.7%  29.9%  43.8%  13.1%
D        3.9%  14.6%  26.3%  47.0%   7.2%
E        4.8%  13.2%  36.5%  36.6%   8.8%
F        5.1%  14.6%  34.0%  39.0%   6.7%
G        5.9%  13.0%  39.7%  36.3%   4.9%
H        8.1%  14.9%  37.9%  30.5%   8.8%
I        6.9%  17.0%  37.6%  31.4%   7.0%
J       17.7%  17.8%  28.1%  29.4%   7.2%
K       26.0%  17.0%  31.0%  19.0%   7.0%
L       37.0%  28.0%  24.0%   9.0%   2.0%

How do we summarize this information in a chart? An obvious alternative is to plot the entire table. However, there are 12 courses, and a column or line chart that contains all of them would be a real mess. Another alternative is to generate 12 separate histograms (column charts). However, this would make comparisons difficult. It seems that it will be impossible to produce a good chart of the raw data without some kind of manipulation.

One possible manipulation is the aggregation of all scores for a course into a single number. This can be accomplished by taking a weighted average of the scores. For example, for Course E, the average is calculated by 1(0.048)+2(0.132)+3(0.365)+4(0.366)+5(0.088) = 3.31. We can compute all of the weighted averages and generate a column chart.

[Figure: column chart of the weighted-average Evaluation (0.00 to 5.00) by Course (A to L).]

Unfortunately, this chart masks the differences between the courses. It is apparent that A and B are perceived to be better courses than K and L. However, the differentiation between most of the others is difficult. Apparently, the manipulation was a little too drastic; after all we summarized five numbers in one weighted average.

Another option is to summarize the 5 scores for each course in two numbers, where the 1 and 2 scores are combined into a "disapproval" score, and the 4 and 5 scores are combined into an "approval" score. This would allow us to use a scatter chart to display the scores.
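Both manipulations are easy to reproduce. The Python sketch below is an illustration only (the function names are ours); it uses the percentages from the table to compute the weighted average for each course, following the Course E calculation, and the disapproval and approval scores used for the scatter chart.

```python
# Percentage of students giving each score (1 to 5), from the table above.
PERCENTAGES = {
    "A": [0.7, 5.4, 7.6, 44.3, 41.4],
    "B": [0.2, 3.8, 17.8, 52.8, 25.3],
    "C": [2.9, 9.7, 29.9, 43.8, 13.1],
    "D": [3.9, 14.6, 26.3, 47.0, 7.2],
    "E": [4.8, 13.2, 36.5, 36.6, 8.8],
    "F": [5.1, 14.6, 34.0, 39.0, 6.7],
    "G": [5.9, 13.0, 39.7, 36.3, 4.9],
    "H": [8.1, 14.9, 37.9, 30.5, 8.8],
    "I": [6.9, 17.0, 37.6, 31.4, 7.0],
    "J": [17.7, 17.8, 28.1, 29.4, 7.2],
    "K": [26.0, 17.0, 31.0, 19.0, 7.0],
    "L": [37.0, 28.0, 24.0, 9.0, 2.0],
}


def weighted_average(pcts):
    """Sum of score times the fraction of students giving that score."""
    return sum(score * pct / 100 for score, pct in enumerate(pcts, start=1))


def disapproval_approval(pcts):
    """Combine scores 1 and 2 into disapproval, 4 and 5 into approval."""
    return pcts[0] + pcts[1], pcts[3] + pcts[4]


averages = {course: weighted_average(p) for course, p in PERCENTAGES.items()}
points = {course: disapproval_approval(p) for course, p in PERCENTAGES.items()}

for course in PERCENTAGES:
    dis, app = points[course]
    print(course, round(averages[course], 2), round(dis, 1), round(app, 1))
```

Each (disapproval, approval) pair is one point on the scatter chart; whatever is left of 100% after adding the two is the share of students who gave a 3.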

[Figure: scatter chart of Approval (vertical axis, 0.0% to 100.0%) against Disapproval (horizontal axis, 0.0% to 100.0%), one point per course, with a diagonal line.]

Note that this scatter chart displays the differences between the student ratings of the courses considerably better than the column chart. Also note that the information that seemed to be lost in our aggregation of the data (namely the percentage of students who gave the course a 3) is actually contained in the chart. Consider the diagonal line. The data points would be on this line if no student gave the courses an evaluation of 3. Hence, the further the points are away from this line, the more students gave the course a 3.

Example 3: Population Pyramid

[Figure: population pyramid built from stacked bar charts; age axis from 0 to 90; legend: male under 20, male between, male over 60, female under 20, female between, female over 60.]

This is an excellent example of a creative use of the bar chart. This chart contains the age distribution of the Austrian population (1991 census). What you see here are actually six different bar charts (each color is a separate chart), all stacked. The Y-axis of this chart goes right through the middle. The numbers representing the male populations at each age have been multiplied by -1 to "reverse" the corresponding bar charts (so they do not cover the female bar charts). The outcome is a very descriptive visual display that summarizes the composition of the Austrian population in terms of age. In class, we will see more of this example.
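The sign trick behind the pyramid can be sketched in a few lines of Python. The counts below are hypothetical placeholders, not the Austrian census figures, and the function name `pyramid_series` is ours; the only point is that negating the male series makes its bars extend to the left of the axis.

```python
# Hypothetical counts (in thousands) for three illustrative ages.
AGES = [10, 40, 70]
MALES = [50, 60, 30]
FEMALES = [48, 59, 42]


def pyramid_series(males, females):
    """Negate the male counts so their bars point left; keep female counts."""
    return [-m for m in males], list(females)


left, right = pyramid_series(MALES, FEMALES)
for age, m, f in zip(AGES, left, right):
    print(age, m, f)
```

Charting `left` and `right` as two bar series against age puts the vertical axis through the middle of the chart.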

Example 4: Mapping

The scatter chart can be used to display geographical information. The chart below is taken from a student project completed in our BCom program in 1998. The task was to find locations for cement warehouses in Saskatchewan for a local cement company. The goal was to minimize transport costs between the warehouses and their service regions. The student group considered different numbers of warehouses. This chart, which displays southern Saskatchewan, summarizes the scenario with 4 warehouses. The optimal warehouse locations from West to East are Unity, Martensville, Lumsden, and Midale. These locations are shown with red circles. The service area of each warehouse is displayed as well. For example, all population centres shown with a triangle are in the service area of Midale, all diamonds are served by Martensville, and so on.

How is this chart produced? There are five different series in this chart. Every population centre is represented by two numbers: longitude and latitude (which serve as the x and y coordinates in the scatter chart). The population centres in the service region of each facility constitute a series. The fifth series consists of the 4 warehouse locations. Finally, the borders are simple lines drawn using the Drawing Toolbar.


Example 5: We finish this document with a chart that does not come from Chart Wizard, but from the icon immediately to its right: Map. Suppose you wish to chart the following data (p. 118, October 3-9, 1998 issue of Economist) about France, Germany, Italy, and Spain. GDP increase over last year: 3, 1.7, 1.1, 3.9. Industrial production increase (over last year): 5.3, 2.5, 1.3, 8.4. Retail sales increase: 2.9, 1.2, 1.7, 1. Unemployment rate: 11.8, 10.9, 12.4, 18.6.

[Figure: Excel map titled "Europe Countries"; per-country charts of GDP increase, Industrial production increase, and Retail sales increase; countries shaded by Unemployment rate in four ranges: 10.9 to 11.8, 11.8 to 12.4, 12.4 to 18.6, and 18.6 to 18.61 (one country each).]

As you see, Excel allows you to produce a large variety of visual aids. Hopefully this document will help improve your skills in generating visual displays. Keep in mind that producing a good chart requires a little effort, but resist the temptation of overdoing it. The goal, after all, is to convey information visually, as clearly as possible.

HAPPY CHARTING