
Cartograms | The Lie Factor | Analysis | Summary

How Honest Is That Map?

Critically Evaluating Map Integrity

William Huber

Quantitative Decisions

Slides and notes copyright © 2009, Quantitative Decisions. Reproduction and distribution for educational purposes is expressly allowed.

Questions for viewers:

Cartography concerns itself with constructing “good” maps. Many things go into making a map good. Here, we concentrate on an aspect of maps that attempt (or purport) to convey quantitative information: that is, numbers. How can we determine when a quantitative map presents an unbiased picture of the data?

------------------------------------------------------------------

These slides explain part of an Internet conversation about cartograms that occurred in late October 2008 near the end of a US presidential election campaign pitting Barack Obama, Democrat, against John McCain, Republican. At the time, opinion polls had begun to shift in favor of Obama, but in the popular vote he was only marginally ahead. The US presidential election is won by a majority of electoral votes acquired by winning individual states (with two minor exceptions, it is winner take all in each state). The number of electoral votes in each state equals two plus an amount proportional to the population, ranging from 1 through 53 additional votes. Smaller (low population) states tended to favor McCain.

The notes accompanying each slide amplify on the slide’s point and, sometimes, provide technical details that are not intended as part of the talk.

DRAFT 2/24/09


Plan

►We will examine some maps: cartograms,

► … using a tool that is useful for critical evaluation of map integrity, Edward Tufte’s Lie Factor.

►Close analysis of the Lie Factor reveals interesting features of these maps.

►Cartographic errors uncovered by this analysis create subtle but important differences in perception.

►Quantitative tools for active interpretation enable us to make better use of maps and overcome limitations and distortions introduced by map makers

► … but, along with the tools, we need access to the underlying data.

Highlighting on the top strip will track our progress through this presentation.



The US Presidential Race, October 2008

http://www.geog.ucsb.edu/~indy/gis/eVote.html

Cartograms have suddenly become a popular way to map quantitative data.


Does McCain have a chance?

(I think this map says so: the blue states don’t seem overwhelming.)

Does anything strike you as wrong about this map? (Besides the odd shapes assumed by the states!)

------------------------------------------------------------------

“Interest in cartograms in at least both Britain and America tends to surge around the time of national elections (particularly when urban based parties win and also just after population censuses have been taken).” [Dorling, Daniel: Area Cartograms: Their Use and Creation. Geo Books, Norwich, England, 1996; p. 37. http://www.qmrg.org.uk/wp-content/uploads/2008/11/59-area-cartograms.pdf ]



A Cartogram Uses Area to Represent Values

Area represents the quantity being mapped: votes.

Massachusetts: 12 electoral votes.

3,283 pixels (including half the border pixels).

3283/12 = 274 pixels per vote.

This is a kind of “mapping intensity”: how much of the perceptual unit (area) is associated with a unit of the mapped quantity (votes).

Questions for viewers (answers below):

When maps use other graphical elements to convey data, what would be the analogs of mapping intensity? E.g.,

Choropleth maps use color (hue, lightness, possibly saturation) and, arguably, area;

Dot maps use symbol area and color;

Line maps use symbol width and color;

What kinds of variables are suitable for a mapping intensity calculation (nominal, ordinal, interval, ratio)?

------------------------------------------------------------------

“Pixels per vote?!” Bear with me! There is no “real world” here: areas on the map are depicting an abstract political quantity, numbers of electoral votes.

The image I worked with was a screen shot. To maximize precision, I made the map as large as possible on a high resolution monitor (about two million pixels). There was a lot of wasted space, in part because the original showed a highly distorted Alaska near its correct geographic position. The resulting map had about 5,000 pixels per state on average, which is enough for two to three significant figures of precision in the calculations.

The idea of mapping intensity applies to more than area: almost all quantitative symbols used in mapping—hue, lightness, line thickness, point size, etc.—can be measured. The corresponding mapping intensity is the “amount” of each symbol intended to represent a unit of the underlying quantity. Computing this presupposes that the map is linear in the sense that a given difference in symbol value is supposed to correspond to a given difference in data value, independent of what the value may be.

Note that the mapping intensity is an objective measure: we could also define a “perceptual intensity” by relating the symbol value to a perceptual level. Such an exercise would have different aims, though.
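The pixels-per-vote arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `mapping_intensity` is hypothetical, and the inputs are the Massachusetts figures quoted on the slide.

```python
def mapping_intensity(symbol_amount, data_value):
    """Amount of the perceptual unit (here, pixels of area) per unit of data (here, votes)."""
    if data_value <= 0:
        # A ratio only makes sense for positive, ratio-scale data values.
        raise ValueError("mapping intensity needs a positive data value")
    return symbol_amount / data_value

# Massachusetts on the cartogram: 3,283 pixels for 12 electoral votes.
ma_intensity = mapping_intensity(3283, 12)
print(f"MA: {ma_intensity:.0f} pixels per vote")
```

The same function would apply to any measurable symbol (line width, point size, lightness), provided the map is meant to be linear in the sense described above.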

Technical note

It takes a surprising amount of processing to estimate areas in such an image accurately. Here’s what I had to do:

•Separate the image into [R], [G], [B] bands.

•Discretize the bands into 8 colors per band, because the colors varied slightly within each state.

•R1 = ( [R] / 32 ).Int

•B1 = ( [B] / 32 ).Int

•G1 = ( [G] / 32 ).Int

•Recombine the discrete bands into an image.

•RGB = ( [R1] * 64 + [G1] * 8 + [B1] )

•RegionGroup this coarse image: this created zones for the interior of each state, plus a gazillion tiny zones corresponding to the varying colors within the state boundaries.

•Zones = ( [RGB]).RegionGroup(true, false, nil)

•Remove small zones.

•( [Zones . Count]< 99 ).SetNull([Zones])

•Expand the remaining zones by one pixel to fill in the boundaries:

•(Expression too big.)

•Manually supply attributes for the zones. (There’s no way to obtain state names from the image!)

•Join electoral vote data to the zones (which automatically have cell counts—areas—associated with them).

•Post-process the data to combine, e.g., northern and southern Michigan, mainland NY with Long Island, etc. (Did you notice the missing portion along the northern coast of North Carolina? That was lost in the small-zone removal.)

Visual examination of the results at each stage prevented gross errors from occurring.
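For readers without a GIS, the first few steps (discretize, recombine, group regions, remove small zones) can be imitated in plain Python. This is a minimal sketch on a toy image, not the processing actually used: the function names are hypothetical, the packing weights 64, 8, 1 are an assumption chosen so that distinct color triples get distinct codes, 4-connectivity is an assumption, and the boundary-expansion and attribute-joining steps are omitted.

```python
from collections import deque

def quantize(pixel, step=32):
    """Cut each 0-255 band to 8 levels, then pack the three levels into one code."""
    r, g, b = (band // step for band in pixel)
    return r * 64 + g * 8 + b

def region_group(codes):
    """Label 4-connected regions of equal code; return (labels, cell count per label)."""
    rows, cols = len(codes), len(codes[0])
    labels = [[0] * cols for _ in range(rows)]
    sizes = {}
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0]:
                continue  # already assigned to a zone
            label = len(sizes) + 1
            labels[r0][c0] = label
            queue, count = deque([(r0, c0)]), 0
            while queue:  # breadth-first flood fill from the seed cell
                r, c = queue.popleft()
                count += 1
                for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= rr < rows and 0 <= cc < cols
                            and not labels[rr][cc]
                            and codes[rr][cc] == codes[r0][c0]):
                        labels[rr][cc] = label
                        queue.append((rr, cc))
            sizes[label] = count
    return labels, sizes

def drop_small_zones(sizes, min_cells):
    """Keep only zones with at least min_cells cells (the notes above used 99)."""
    return {label: n for label, n in sizes.items() if n >= min_cells}

# Toy 3 x 4 "screen shot": a reddish state with slight color noise,
# a bluish state, and one stray gray pixel. All values are invented.
RED, NOISY_RED = (200, 30, 40), (205, 28, 45)
BLUE, GRAY = (20, 40, 210), (100, 100, 100)
image = [
    [RED, RED, NOISY_RED, BLUE],
    [RED, RED, RED, BLUE],
    [RED, GRAY, RED, BLUE],
]
codes = [[quantize(p) for p in row] for row in image]
labels, sizes = region_group(codes)
kept = drop_small_zones(sizes, min_cells=2)
print(sizes)  # cell counts (areas) for every zone
print(kept)   # zones that survive the small-zone filter
```

The discretization is what lets the slightly noisy red pixel fall into the same zone as its neighbors; the zone cell counts are the areas that feed the mapping intensity calculation.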

------------------------------------------------------------------

As far as cartograms are concerned, because areas are non-negative and are supposed to be in direct proportion to the underlying attribute, you must be able to interpret that attribute on a ratio scale. For maps that use other representational methods, such as a two-color graduated scale, it can make sense to measure distortion for interval variables. For nominal and categorical variables having arbitrary numerical codes (as with many common land use-land cover maps), computing a mapping intensity would be nonsensical.



Graphical Integrity Exists When a Map Accurately Represents Quantities

Montana: 3 electoral votes.

1832/3 = 611 pixels per vote.

The symbol for Montana is more intense than the symbol for Massachusetts: it uses more area to represent a vote than the Massachusetts symbol does. It reflects an inconsistency within this cartogram.

This illustrates Tufte’s first principle of graphical integrity: a map is a faithful representation of the data when its mapping intensity is constant.


Montana uses 611 pixels per vote. Massachusetts uses 274. Therefore, one state is too big and the other is too small (relative to each other). Which is which?

Could we instead use votes per pixel to measure how this map uses area to convey numerical information? If we did, how would that change our interpretation?

------------------------------------------------------------------

Like any such rule or principle, this one has limits and is not always applicable. A cartogram, though, seems to be almost an ideal application: the intent behind its construction is to convey a quantitative value by means of actual area on the map. A nonconstant mapping intensity implies the map distorts the underlying values it purports to display. If that distortion is well understood (for example, if the map is intended to represent a logarithm or other re-expression of the underlying values), then we should measure mapping intensity by comparing area to the re-expressed data and Tufte’s principle still applies, mutatis mutandis.

------------------------------------------------------------------

Montana is too large relative to Massachusetts, because it uses over twice as many pixels per electoral vote as Massachusetts does. (This simple interpretation is why I chose mapping intensity, rather than its reciprocal, mapping density, to compute Lie Factors.)

Yes, we could use votes per pixel (mapping density) to compute how the map associates area with votes. It would not change our interpretation at all. It would change the numerical values. In this case, we would be dealing with numbers between 1/180 = about 0.0055 and 1/611 = about 0.0016. Data are easier to understand and explore when they are expressed in convenient units, which usually means with numbers around the 1 – 1000 range if possible. If you really want to use density, you might prefer measuring it in votes per thousand pixels: this would give values between 1.6 and 5.5 for this map.



Computing Internal Distortion

Montana: 3 electoral votes.

1832/3 = 611 pixels per vote.

Massachusetts: 12 electoral votes.

3283/12 = 274 pixels per vote.

RELATIVE DISTORTION for MT and MA:

$$\frac{611\ \text{p/v}}{274\ \text{p/v}} = 2.23$$

Compared to Massachusetts, the area of Montana on this map is 2.23 times greater than it should be.


Why don’t we just subtract 274 from 611?

How do we determine which number is the numerator and which one the denominator?

Would our measurement of relative distortion change if we used votes per pixel (a mapping density) instead of pixels per vote (the mapping intensity)?

------------------------------------------------------------------

Relative distortion is an intermediate step in computing the Lie Factor, which is coming next. It is relative distortion that signaled something was the matter with the original map: in looking it over and comparing the sizes of some states I knew (such as PA and NY, which border each other for easy comparison), it seemed to me that their relative areas on the cartogram did not correctly reflect the electoral votes they owned. The relative distortion calculation formalizes this intuitive visual comparison.

------------------------------------------------------------------

Subtracting could be done, but the results depend on our units of measurement: votes per pixel, pixels per vote, votes per thousand pixels, or whatever. The ratio is unitless and will be the same no matter how you choose to measure mapping intensity. (If you use votes per pixel, just divide in the other order: for example, 2.23 = 611/274 as shown here; with votes per pixel you would compute (1/274) / (1/611) = 2.23.)

It is convenient to do the division in whatever order gives a value of 1 or greater: we’re looking for relative distortion, after all, not some absolute measure.

If we used mapping density, we would get exactly the same relative distortion.
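A short Python check of these answers, as a sketch only (the helper name `relative_distortion` is hypothetical; the pixel and vote counts are the ones quoted on the slides). It computes the ratio from intensities and from densities, then shows how a difference changes with the unit.

```python
def relative_distortion(a, b):
    """Ratio of two mapping measures, taken in the order that gives a value >= 1."""
    return max(a, b) / min(a, b)

mt = 1832 / 3    # Montana, pixels per vote
ma = 3283 / 12   # Massachusetts, pixels per vote

by_intensity = relative_distortion(mt, ma)
by_density = relative_distortion(1 / mt, 1 / ma)   # votes per pixel

print(f"{by_intensity:.2f} {by_density:.2f}")      # the two ratios agree

# Differences, by contrast, depend on the unit chosen:
print(f"{mt - ma:.0f} pixels per vote")
print(f"{1000 / ma - 1000 / mt:.2f} votes per thousand pixels")
```

Taking `max` over `min` is one way to implement "divide in whatever order gives a value of 1 or greater" without having to decide in advance which state is the numerator.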



In This Map, Mapping Intensity Varies

[Figure: two histograms of mapping intensity (pixels per electoral vote), in bins of 50 from 150 - 200 to above 600. Left panel, “Mapping Intensity by State”: Count (N=48). Right panel, “Mapping Intensity by Electoral Vote”: Votes (N=528). Colors show the “average” tendency of each group. Bar labels: RI; MA, NJ; MT; WV; DE; ND, NV, VT, NM.]


What do the colors on the bars tell us? Why is that relevant or useful information to include?

Why are the two histograms slightly different? What does the one on the right (intensity by electoral vote) tell us that was hidden in the left hand one (intensity by state)?

------------------------------------------------------------------

Both histograms show there are several states mapped with unusually low or high intensities. These states do not account for many of the electoral votes, however. Among those contributing many electoral votes, there is still a range of mapping intensities. The evidence for this includes that the three largest bars in the right hand histogram, which account for most electoral votes, cover a range from 400 to 550 pixels per vote.

------------------------------------------------------------------

The colors indicate how the states making up each bar tend to lean. It helps us see the relationship between lower intensity and “blue” states.

The histogram on the right accounts for the electoral votes in each state. It shows that the most extremely distorted states actually have relatively few electoral votes. Thus, the effect of varying mapping intensity on the map and our interpretation of it will not be as profound as suggested by the histogram on the left (of counts).
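The difference between the two histograms is just a difference in weights: one adds 1 per state to each bin, the other adds the state's electoral votes. A minimal sketch follows; the five (pixels, votes) pairs are HYPOTHETICAL, invented only to show the mechanics, and are not measurements from the cartogram.

```python
from collections import Counter

# HYPOTHETICAL mapping units: name -> (pixels, electoral votes).
units = {
    "A": (1800, 3),
    "B": (2000, 4),
    "C": (9000, 20),
    "D": (13500, 30),
    "E": (900, 5),
}

def bin_label(intensity, width=50):
    """Name of the histogram bin (e.g. '450-500') containing an intensity."""
    low = int(intensity // width) * width
    return f"{low}-{low + width}"

by_state = Counter()   # each unit counts once
by_vote = Counter()    # each unit counts once per electoral vote
for pixels, votes in units.values():
    label = bin_label(pixels / votes)
    by_state[label] += 1
    by_vote[label] += votes

for label in sorted(by_state, key=lambda s: int(s.split("-")[0])):
    print(f"{label:>8}: {by_state[label]} units, {by_vote[label]} votes")
```

In this made-up example the extreme bins hold three of the five units but only 12 of the 62 votes, which is the kind of pattern the vote-weighted histogram is meant to reveal.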



The Lie Factor

►The range of mapping intensities is 180 to 611 pixels per vote.

►The Lie Factor is the ratio 611:180 = 3.4. It means that some map symbols have 3.4 times the weight of other map symbols. It measures bias.

►The histogram shows this does not result from just one or two errors: mapping intensities are distorted throughout this map.


[Figure: histogram of state counts (Count, N=48) by mapping intensity, in bins of 50 pixels per vote from 0 - 50 to above 600; bars labeled MT and RI.]

How does this histogram differ from the preceding two? How does that change or influence our perception of the Lie Factor?

Why does this number deserve to be called a Lie factor? Is that a fair representation of what it measures?

Why can’t Lie Factors be less than 1.0?

------------------------------------------------------------------

A well-drawn histogram offers us a gestalt view of the data distribution, providing much more in a glance than any set of summary statistics. This one shows the data spread broadly between 180 and 611 pixels per vote, indicating a Lie Factor of 3.4. We should be alert to outliers: we don’t necessarily want to condemn a map because it (inadvertently) distorts a small number of symbols. (In fact, strictly speaking the Lie Factor of this cartogram is infinite, because it neglects to show the District of Columbia, which has three electoral votes.) This particular histogram demonstrates the Lie Factor calculation for the electoral cartogram is robust: even were the most extreme states removed, the mapping intensities would still range by a factor of two or so. (Notice how the histogram on the preceding slide was redrawn to include zero on the intensity scale: this provides a visual reference for comparing relative mapping intensities. It is important to use a linear scale on the horizontal axis so that the histogram itself may have its own graphical integrity!)

------------------------------------------------------------------

This histogram includes the zero of mapping intensity. This enables us to envision the Lie Factor: 3.4 means the bar for Montana is about 3.4 times further to the right of zero than is the bar for Rhode Island. This was not possible in the preceding histograms.

Maps and graphics used for propaganda and distorting the truth (lying) often do so by changing the relative sizes or intensities of their graphical symbols. In this sense, the term is fair. It’s a little unfair to presume, though, that any map with an appreciable Lie Factor intentionally lies. The examples discussed in this presentation were constructed by people attempting to convey accurate pictures of the state of the election; the Lie Factors were likely inadvertent.

We always divide the largest mapping intensity by the smallest: thus, all Lie Factors are 1.0 or larger. A Lie Factor of 1.0 means no distortion: the best possible graphical integrity in Tufte’s view.
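As a sketch of this definition (largest intensity over smallest), assuming a hypothetical helper named `lie_factor`: the three sample intensities are the ones quoted in these slides (180, 274, and 611 pixels per vote), and the second call imitates a unit that has votes but no area at all, as with the omitted District of Columbia.

```python
import math

def lie_factor(intensities):
    """Largest mapping intensity divided by the smallest, so the result is >= 1."""
    values = list(intensities)
    if not values:
        raise ValueError("need at least one mapping intensity")
    if min(values) < 0:
        raise ValueError("mapping intensities cannot be negative")
    if min(values) == 0:
        return math.inf  # some unit has data but no area on the map
    return max(values) / min(values)

print(round(lie_factor([180, 274, 611]), 1))  # 3.4
print(lie_factor([0, 180, 611]))              # inf
```

Because only the two extremes enter the ratio, it is worth looking at the whole histogram as well, to see whether the extremes are outliers or typical of the map.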



Lie Factor: 3.4

“Lie Factors greater than 1.05 or less than 0.95 indicate substantial distortion…”

[Tufte, Edward, The Visual Display of Quantitative Information, p. 57].


Having been alerted to the large Lie Factor in this map, can you now point to evidence of distortion or bias? If not, or if this is difficult, what additional information would you need in order to identify this bias?

------------------------------------------------------------------

Tufte seems a bit extreme here, but a Lie Factor of 3.4 would be a “substantial” distortion by almost anybody’s measure, I think.

“Lie Factor” is Tufte’s term and, yes, it has bad connotations. It’s memorable though. I don’t like to sound so negative and certainly don’t want to offend anyone in the course of evaluating this map: it’s pretty, it’s interesting, it is available, and—when this conversation occurred—it was timely. I am grateful for the thought and discussion it stimulated. In that respect, it has probably been more successful than most maps!

Reflect for a second, if you will, on how one could determine the Lie Factor without (a) having access to a GIS to do the calculations, (b) having the knowledge and persistence to do the processing needed, and (c) also having a table of the underlying data (electoral votes per state). It’s impossible! Usually, when it comes to graphical integrity—that is, truth in presenting the facts—we are at the mercy of the mapmaker.

------------------------------------------------------------------

To me, the states of Pennsylvania and New York have comparable areas, but I know one has 21 votes and the other has 31: that’s a relative variation of 31:21 = about 1.5 in mapping intensity. You might be able to evaluate other examples of such distortion for states you are familiar with. The additional information needed is, of course, knowledge of the numbers of electoral votes in each state. (See the map in the “How It Turned Out” slide below for those data.) You also need to be able to estimate areas reasonably accurately. That’s not always easy to do, which is one of the failings of cartograms: people cannot accurately compare areas, especially when they are of complicated regions separated on the map.



The Distortion Affects the Message

Red: State is too large.

Blue: State is too small.

Red: State is Republican.

Blue: State is Democratic.

Compared to Republican states, Democratic states tend to be too small.


Interpret these maps. For instance, on a bias-free cartogram, should California be made smaller or larger? What about Pennsylvania? How does this pair of maps indicate a bias towards making the red (and pink) states too large? How does that affect our perception of the state of the election?

------------------------------------------------------------------

Why should we care about the Lie Factor? If the distortion affects the states randomly, then it is unlikely to affect the visual impression of red versus blue versus yellow that much. However, that’s not the case, as the top map shows. It is a choropleth map of the distortion relative to the median (on a log scale), using the cartogram itself as the base. It’s better to compare these maps with a scatterplot (which I did) or regression of distortion on state category (which I did), but by glancing back and forth you can see the red states (bottom) are all gray or red states (top), indicating the Democratic (blue; bottom) states tend to be drawn smaller than they should be. Thus, this map distorts the facts: it makes it appear McCain was much closer to Obama in the electoral vote than he actually was.
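The quantity mapped in the top map, distortion relative to the median on a log scale, could be computed along these lines. This is a sketch under assumptions: base-2 logarithms are an arbitrary choice (the base is not stated above), the helper name is hypothetical, and only the three intensities quoted in these slides are used (taking Rhode Island as the low end of the range), so the median here is not the median of all 48 states.

```python
import math
from statistics import median

def log_distortion(intensities):
    """log2 of each intensity divided by the median intensity."""
    mid = median(intensities.values())
    return {name: math.log2(value / mid) for name, value in intensities.items()}

# Pixels per vote quoted in the slides; a three-state sample only.
sample = {"RI": 180, "MA": 274, "MT": 611}
distortion = log_distortion(sample)

for name, d in sorted(distortion.items(), key=lambda item: item[1]):
    verdict = "too small" if d < 0 else "too large" if d > 0 else "at the median"
    print(f"{name}: {d:+.2f} ({verdict})")
```

On a log scale, being drawn twice too large (+1) and half the proper size (-1) are equally far from zero, which suits a two-color red and blue scheme centered on gray.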

I believe this distortion was inadvertent, but it’s worth noting that “Claiming space on maps is as much a political process as a technical one.” [Dorling, ibid., p. 53.] It looks like as far as cartography goes, ignorance or error can make one just as political as overt control over the map schema does ;-).

------------------------------------------------------------------

California is light blue, indicating it is drawn too small: it should be made larger. Pennsylvania is light red: it should be made smaller. The “blue states” in the cartogram tend to be drawn in blue in the upper map of distortion, indicating Obama-leaning states are often drawn too small compared to the other states. This makes it look like the race is tighter than it actually was.



Analysis

►What caused the Lie Factor to be so large? Speculations:

► Invisible portions of states, such as the Great Lakes and offshore coastal areas, may have been counted by the software: we need to distinguish between political areas and cartographic areas.

► The algorithm to create any cartogram is iterative: this one might not have been run to convergence.

► The algorithm could be flawed.

►Perhaps our calculation of areas in the graphic is incorrect?

► GIS image processing tools helped obtain accurate estimates of areas on the map.


What other mechanisms can you think of that might bias a cartogram?

How could one make mistakes in calculating the Lie Factor of a cartogram? How large might those errors be? Could any have occurred here? What could we do to avoid or detect such mistakes?

------------------------------------------------------------------

This slide touches on just a few of the possibilities considered. For example, a close look at the cartogram shows many of the states were separated from each other, leaving unattributed slivers between them. How we measure those will change the estimated Lie Factor somewhat, but not enough to explain it away. In general, I made choices that conservatively underestimated the Lie Factor in this map.

An experiment I conducted with an undergraduate GIS class a few years ago indicates that it’s difficult to make large random errors in estimating areas: they tend to cancel out (even when you’re a freshman digitizing a boundary for the first time). We have to watch out for systematic errors, though. Even an innocent decision like whether to include the state boundaries in their areas or not will have a differential effect, because the smaller states will be affected more than the larger states.



Lie Factor: 3.8

http://www.odtmaps.com/detail.asp_Q_product_id_E_pres-map-2008


What are the advantages and disadvantages of constructing a cartogram with this cellular method (compared to the preceding cartogram)?

What is the problem with this cartogram? Why is its Lie Factor so large? What kinds of states are most affected by the internal distortion?

------------------------------------------------------------------

This map was offered by a participant in the Internet conversation as another example of an election cartogram. Its contrast with the previous one is illuminating.

I added the insets on the right to show interesting detail: the top one displays 16 electoral votes, with lots of gray background cells left over, while the bottom one displays 15 electoral votes that almost fill a rectangular view of the same size.

The coarse grid of this map, together with the posted data, enable us to evaluate relative intensities. That is very helpful for critical evaluation. In fact, I didn’t need a GIS for this: I just counted squares in each state and made a table. (It’s a lot of fun to use a low-tech solution once in a while.) A quick division with a spreadsheet computed the mapping intensities. For example, ND or WY in the upper inset use 2/3 square per vote, while GA uses 38/15 squares per vote. The relative mapping intensity equals 38/15 : 2/3 = 19/5 = 3.8. On this basis alone, the Lie Factor is at least 3.8.
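The square-counting arithmetic can be checked exactly with Python's `fractions` module. This is just a restatement of the numbers in the paragraph above; the variable names are my own.

```python
from fractions import Fraction

nd_or_wy = Fraction(2, 3)    # 2 grid squares for 3 electoral votes
ga = Fraction(38, 15)        # 38 grid squares for 15 electoral votes

relative = ga / nd_or_wy     # relative mapping intensity, kept as an exact fraction
print(relative, float(relative))  # 19/5 3.8
```

Exact fractions avoid any rounding question when the counts are small whole numbers, as they are on a cellular cartogram.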

------------------------------------------------------------------

You can construct cellular cartograms without software. (Dorling, op. cit., describes how this is done.) You have less flexibility over the positions and shapes of the mapping regions and the map can look blocky and unnatural.

The basic problem with this cartogram is it is based on a strategic error: the areas are shown in proportion to population rather than electoral votes. Electoral votes themselves are not directly proportional to population, however: each state gets a number that is roughly proportional to population, plus two more. The low-population states are most severely affected by this. Because McCain was favored in many such states, this map tends to make things look too bleak for his campaign.



Analysis of the ODT Map

► It is a population cartogram, not an electoral vote cartogram.

► California is given a relative weight of about 53 rather than 55.

► The smallest states are given relative weights of about 1 instead of 3.

► This builds in a Lie Factor of (53:1) / (55:3) = 2.9.

►Additional problems:

► Electoral votes have to be whole numbers.

► Each state is given a whole number of grid cells.

► Collectively, these discretization errors inflate the Lie Factor by about 30%.
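The figures on this slide fit together as follows. This is a sketch, not a record of the original calculation: the split into a "built-in" part and a "discretization" part is multiplicative, and the log-scale share at the end is an assumption about how a factor can be apportioned (it is one reading of the statement in the notes that the design error accounts for 80% of the Lie Factor).

```python
import math

built_in = (53 / 1) / (55 / 3)   # population weights versus electoral votes
observed = 3.8                   # Lie Factor counted from the grid squares
residual = observed / built_in   # extra factor left over for discretization

# ASSUMPTION: apportion the factor on a log scale, where factors add.
log_share = math.log(built_in) / math.log(observed)

print(f"built-in Lie Factor: {built_in:.1f}")
print(f"discretization inflates it by about {100 * (residual - 1):.0f}%")
print(f"share due to the design, on a log scale: about {100 * log_share:.0f}%")
```

Working with logarithms is natural here because Lie Factors multiply: the built-in factor times the discretization factor gives the observed one.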


When, if ever, is it appropriate to use one variable to determine the areas of the mapping units and another variable to determine their graphical symbols (color, pattern, and outline)? In cases where it is appropriate, how would the map (and its interpretation) change were you to switch the roles of the two variables?

------------------------------------------------------------------

Using a grid of cells makes the map construction easier and limits certain kinds of distortion. The number of cells is small enough that such cartograms can be made manually (without computer help) or even mechanically.

As far as population goes, this map looks pretty honest: within the limits of its discrete representation of area, it’s accurate. The problem is that its design is fundamentally mistaken. It calls itself a “presidential election map.” What is relevant to the election is the electoral vote, not the popular vote. By using population rather than electoral votes to determine relative area, it uses an irrelevant attribute rather than the right one. This accounts for 80% of the Lie Factor. The labels compound this error: a few provide population, but only for the larger states; all states are labeled with the number of electoral votes. Thus, even though the legend is clear about how the map is constructed, most viewers are going to interpret it in terms of electoral votes, not population. (In this map, it’s almost impossible to get an accurate grasp of how the electoral vote totals are shaping up.)

Again, the distortion affects smaller-population states differentially, but in a direction opposite that of the first cartogram: small states are shown much smaller than they should be. If we grant half of the “toss up” states to each side, it looks like McCain has only about 1/3 of the total. In fact, that’s about what McCain wound up getting (173 out of 538), but only because he lost every one of the “toss up” states on this map!

------------------------------------------------------------------

A good use of a bivariate cartogram is when you want to map a proportion or rate. The mapping unit areas can depict the denominator (often total population) while the colors can show the rates. This can produce excellent ways to show, say, world literacy rates, state poverty rates, county party affiliations, and so on. Were we to switch these variables, we would get a strange, difficult to interpret map: now the mapping unit areas would tell us rates while colors would tell us population sizes. Besides being unnatural, this would be less effective: rates often do not vary much (e.g., world adult literacy rates vary from about 12% to almost 100%, a range of eight to one, while country populations vary from perhaps a million to over a billion, a range of about a thousand to one). Areas can show more range of variation than colors or patterns can.

A full answer to this question would take a lot of space indeed: it is worth exploring in a class discussion or through experimentation with the maps you make. It is always a question worth asking when you design your mapping schemata, no matter what kind of map you choose.



Lie Factor: 1.2

Much better!

http://www3.amherst.edu/~aanderson/presidential_election_2008-polls10_23.png


When is it worthwhile to show so much detail in the outlines of a cartogram’s mapping units? (Contrast this with the gross simplification achieved by the preceding ODT cartogram.)

How do you feel about a Lie Factor of 1.2? Is that small enough to trust this map?

------------------------------------------------------------------

This is Andy Anderson’s first attempt to reproduce the original cartogram. It’s a draft of the better one that’s on his Web site now. It shows some signs of haste (just like all my work at the time), such as the incomplete legend and incorrect labels for Minnesota and Michigan, which I like, because they encourage us to think about this map critically (which I believe was his intention) rather than to regard it as a polished cartographic production. I’m showing it, though, because this is what I had to work with. As far as graphical integrity goes, it’s far better than either of its predecessors. The striking thing is that although Andy used the cartogram software in an effort to produce an accurate map, it still has a significant Lie Factor. (The value of 1.2 is approximately the same size as the relative popular vote in the election that transpired two weeks later.)

(This software was created by an ESRI employee using a recently popularized method based on a “diffusion” model. It’s well crafted and has no indications that such distortions might be routinely possible. Not only are we at the mercy of the map maker, but it seems we also rely completely on the skills of the software programmer and the quality of the algorithms he or she implements.)

------------------------------------------------------------------

Detail in the mapping units (states in this case) provides some cartographic algorithms more scope to vary the sizes and shapes of the units. However, this much is not in the least necessary and can even slow down many algorithms. Because a cartogram is schematic, anyway—it does not purport to show coastal outlines accurately, for instance—it is probably better to simplify the outlines as much as possible before making the cartogram.

I’m comfortable with a Lie Factor of 1.2 when areas are used as the mapping element. When lengths are used (as in a bar chart), a Lie Factor of 1.2 can be quite noticeable. Thus, there is no universal rule. It seems to me that Lie Factors much larger than 1.2—say, around 1.5 or greater—usually have a noticeable effect in any kind of quantitative graphic, but I don’t think anyone has formally studied this.



How It Turned Out

http://upload.wikimedia.org/wikipedia/commons/2/24/ElectoralCollege2008.svg


How would you assess the Lie Factor in this new map? Does posting the numbers of electoral votes automatically make this map unbiased? Because it does not purport to be a cartogram, is it fair to use the areal mapping intensity to compute the Lie Factor (as we have been doing all along for the cartograms)? If not, what can you use?

------------------------------------------------------------------

After spending our time today looking closely at cartograms, this standard projection (from Wikipedia) comes as a bit of a shock. Although McCain won less than one third of the electoral vote, his “red states” still comprise over half the area on the map! The story is not shown by map symbols at all: it is buried in the numerical labels, whose total determined the outcome. Clearly the cartogram method is useful for conveying quantitative information: all the more reason to want it to be accurate.
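The “less than one third” figure is easy to check with a line of arithmetic. The sketch below uses the final 2008 tallies (365 electoral votes for Obama, 173 for McCain), which do not appear in these notes and are supplied here from outside; the variable names are only illustrative.

```python
# Final 2008 electoral vote tallies (supplied from outside these notes).
obama_ev = 365
mccain_ev = 173

total_ev = obama_ev + mccain_ev        # all electoral votes cast
mccain_share = mccain_ev / total_ev    # McCain's fraction of the total

print(f"Total electoral votes: {total_ev}")
print(f"McCain's share: {mccain_share:.1%}")
```

The share of map area held by the red states would have to be measured from the map itself (for example, in a GIS), so no figure for it is assumed here.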

------------------------------------------------------------------

This map does not purport to be a cartogram and therefore computing a Lie Factor by comparing map areas to electoral votes, as we did earlier, would not be appropriate. This map uses graphical elements to depict only one thing: the state’s winner. The actual data are shown by the posted numbers. This can be a good way to convey a small dataset, but it is of little use to help us see geographic patterns in the data.



Summary

► Cartograms usefully exploit distorted areas to display quantitative data, provided the distortion accurately reflects those data.

► Estimating the Lie Factor is an effective method of critical map analysis. A map with graphical integrity has a Lie Factor close to unity.

► A GIS, by means of its tools to measure map features, such as area, color, light intensity, thickness, length, etc., gives us a way to measure and map Lie Factors.

► Tools of Exploratory Data Analysis, like stem-and-leaf plots, help evaluate what the GIS measures.

► Nevertheless, it is usually impossible to detect cartograms (and other quantitative maps) that lack graphical integrity: we rely on the map maker, the GIS software, and its algorithms to ensure correctness.


What are the main ideas you come away from this presentation with? Are there any points with which you disagree?

Do you feel comfortable with computing Lie Factors in maps that are important to you? Would you compute (or at least estimate) Lie Factors in maps you create?

At what point(s) in GIS, geographic, or cartographic pedagogy, should we introduce ideas like mapping intensity and the Lie Factor?

------------------------------------------------------------------

I welcome your thoughts. What have I missed or exaggerated? Have I been unfair? Or have I not gone far enough in this analysis? Is there indeed a problem in identifying and diagnosing lack of graphical integrity of maps in particular? What other tools and critical approaches can be brought to bear on this? At what point in our teaching of cartography, geography, and GIS should we introduce such ideas and analyses? (I advocate doing it before we even begin that teaching!)

------------------------------------------------------------------

I hope you find the Lie Factor to be a comfortable and useful tool that you routinely apply to every map (and statistical graphic) that you produce and look at. You don’t have to do a full statistical analysis, but spot checks (like the PA vs. NY comparison I did) are usually quick and easy, provided you know the values of the underlying data.
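A spot check of this kind can be sketched in a few lines. The sketch assumes that the Lie Factor for a pair of mapping units is taken as the ratio of their mapping intensities (map area per vote), arranged so that it is at least 1; a definition built on Tufte’s “size of effect” would give different numbers. The electoral vote counts (PA 21, NY 31) are the 2008 values, but the two areas are hypothetical placeholders, not measurements from any map discussed here, and the function names are illustrative.

```python
def mapping_intensity(map_area, votes):
    """Map area devoted to each vote (area units per vote)."""
    return map_area / votes


def lie_factor(area_a, votes_a, area_b, votes_b):
    """Ratio of two mapping intensities, arranged to be >= 1.

    Assumption: this pairwise ratio stands in for the Lie Factor;
    1.0 means both units get the same area per vote.
    """
    intensity_a = mapping_intensity(area_a, votes_a)
    intensity_b = mapping_intensity(area_b, votes_b)
    return max(intensity_a, intensity_b) / min(intensity_a, intensity_b)


# 2008 electoral votes for the two states.
pa_votes, ny_votes = 21, 31

# HYPOTHETICAL areas, standing in for values measured on a map
# (for example, with a GIS area tool). Units cancel in the ratio.
pa_area, ny_area = 2.1, 2.6

lf = lie_factor(pa_area, pa_votes, ny_area, ny_votes)
print(f"Lie Factor (PA vs. NY): {lf:.2f}")
```

Because only a ratio is involved, the areas can be in any units (square inches on paper, pixels, or map units), as long as both are measured the same way.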

A case can be made to introduce tools for critical map evaluation before teaching about GIS, geography, or cartography. Such teaching will rely heavily on reading maps, interpreting them, and eventually making them. For many people, especially in liberal arts colleges, one main point of teaching GIS is to provide methods to critically evaluate data and data presentations. Providing specific tools, like the Lie Factor, is easy to do and might be a pedagogically valuable use of class time and energy.

--Bill Huber

Quantitative Decisions, February 26, 2009.
