
This article was downloaded by: [University of California, San Francisco] On: 21 December 2014, At: 08:24
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Journal of Spatial Science
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tjss20

Using diffusion-based cartograms for visual representation and exploratory analysis of plausible study hypotheses: the small and big belly effect
Tonny J. Oyana a, Remigius I. Rushomesa a & Lalit Mohan Bhatt a
a 1000 Faner Drive, MC 4514, Department of Geography & Environmental Resources, Southern Illinois University, Carbondale, IL, 62901, USA
Published online: 18 May 2011.

To cite this article: Tonny J. Oyana, Remigius I. Rushomesa & Lalit Mohan Bhatt (2011) Using diffusion-based cartograms for visual representation and exploratory analysis of plausible study hypotheses: the small and big belly effect, Journal of Spatial Science, 56:1, 103-120, DOI: 10.1080/14498596.2010.521976

To link to this article: http://dx.doi.org/10.1080/14498596.2010.521976


Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions


Using diffusion-based cartograms for visual representation and exploratory analysis of plausible study hypotheses: the small and big belly effect

Tonny J. Oyana*, Remigius I. Rushomesa and Lalit Mohan Bhatt

1000 Faner Drive, MC 4514, Department of Geography & Environmental Resources, Southern Illinois University, Carbondale, IL 62901, USA

*Corresponding author. Email: [email protected]

Journal of Spatial Science, Vol. 56, No. 1, June 2011, 103–120
ISSN 1449-8596 print/ISSN 1836-5655 online
© 2011 Surveying and Spatial Sciences Institute and Mapping Sciences Institute, Australia
DOI: 10.1080/14498596.2010.521976
http://www.informaworld.com

Diffusion-based cartograms offer a new and more innovative way to visually explore and represent spatial data than standard maps do. In this study, we explore two basic issues regarding the visual effectiveness and expressiveness of diffusion-based cartograms: first, we evaluate different applications to support our study hypotheses; and second, we evaluate the visual effectiveness of the resulting maps using five criteria: attribute values, colour, labels, map readability, and clusters. In addition, diffusion-based cartograms are compared with standard choropleth maps to illustrate the additional exploratory power these cartograms offer. Recent studies have shown that diffusion-based cartograms can help uncover underlying and hidden structures, thus providing a more elaborate visual representation of attribute values than conventional mapping does. Interestingly, because diffusion-based cartograms can display small areas and geographic locations with high attribute values, they also allow comprehensive exploration and identification of specific geographies that are normally difficult to observe on conventional maps. The method was applied to four datasets: the US incidence of child lead poisoning, Tanzania’s infant mortality, Uganda’s 2006 presidential election and South Korea’s 2005 population. These applications yielded a number of new insights into the datasets, illustrating ‘the small and big belly effect’ and confirming the benefits of using diffusion-based cartograms.

Keywords: cartography; area cartograms; visualization; data representation; density-equalizing maps; data exploration

1. Introduction

The pioneering work of Gastner and Newman (2004) introduced a new algorithm for creating cartograms, drawn from the basic principles of diffusion in particle physics (Figure 1). This seminal work has revived the well-known tradition of applying cartograms to visualise mapped data. The most common use of cartograms has been the exploration, display, and analysis of geographical distributions (Tobler 2004). A diffusion-based algorithm is relatively efficient and effective for visual data exploration and hypothesis generation. Application domains such as mining large-scale databases, which require data visualisation and parameter initialisation, could benefit from this algorithm. Recent reviews of this remarkable algorithm suggest growing interest in its use and wide application (Colizza et al. 2005; Monmonier 2006, 2007; Wieland et al. 2007). For instance, in his detailed review of the cartographic enterprise, Monmonier (2007) referred to Gastner and Newman’s density-equalising algorithm (Gastner and Newman 2004) as a ‘suite of cleverly revealing cartograms’. Dorling (2005) and other reports on this algorithm published in the last five years (e.g. Wieland et al. 2007) have not only reported new insights and an improved interpretation of the data but have also revealed its promising overall potential. There are other valuable application examples within the health domain (Colizza et al. 2005; Wieland et al. 2007). This paper builds on current efforts to expand the literature on cartograms and shows how they can be used for visual exploration of spatial data.

The basic structure of a diffusion-based algorithm comprises three main steps: declaration and initialisation of variables, computation, and display of outputs. The algorithm rests on the fundamental assumption that, in a true cartogram, the population attribute should be distributed uniformly, especially when regions within the cartogram are scaled proportionally according to their attribute; that is, a population density function can be used to equalise density accurately and maintain readability in the resulting cartogram. The concept of diffusion works well for creating a cartogram because, given a specific population density, it transfers population from high-density areas to low-density ones until the density is equal everywhere. The primary idea of the diffusion-based cartogram is inspired by the linear diffusion process of elementary physics, which is described in any standard physics textbook.
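The equalising behaviour described above can be illustrated with a minimal one-dimensional sketch. It is our own illustration, not the authors’ Java implementation, and it assumes an explicit finite-difference scheme with closed (no-flux) boundaries; the function name `diffuse_1d` and its `rate` and `steps` parameters are hypothetical. Material moves from each cell towards lower-density neighbours until the density is the same everywhere, while the total is conserved.

```python
def diffuse_1d(density, rate=0.25, steps=2000):
    """Explicit finite-difference diffusion on a closed 1D strip (illustrative sketch).

    rate plays the role of D*dt/dx**2 and must be <= 0.5 for this explicit
    scheme to stay stable. No-flux boundaries (edge cells mirror themselves)
    keep the total amount of material constant.
    """
    rho = [float(v) for v in density]
    n = len(rho)
    for _ in range(steps):
        new = rho[:]
        for i in range(n):
            left = rho[i - 1] if i > 0 else rho[i]
            right = rho[i + 1] if i < n - 1 else rho[i]
            # Net inflow is proportional to the discrete Laplacian.
            new[i] = rho[i] + rate * (left - 2.0 * rho[i] + right)
        rho = new
    return rho


if __name__ == "__main__":
    start = [10, 0, 0, 0, 0, 0, 0, 0, 0, 2]  # hypothetical, strongly uneven density
    end = diffuse_1d(start)
    print([round(v, 3) for v in end])  # every cell approaches the mean, 1.2
    print(round(sum(end), 6))          # total is conserved: 12.0
```

The same idea in two dimensions, applied to a gridded attribute density, is what drives the cartogram: regions of high density spread out and low-density regions shrink.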

Figure 1. The pseudocode for a density-equalizing map algorithm (Gastner and Newman 2004). The algorithm produces area cartograms, which can render mapped data more effectively than conventional maps.

In a diffusion-based cartogram, the population density can be represented mathematically by a density function ρ(r), where ρ is equal to the given attribute density of the region of interest and r denotes geographic position. As time approaches infinity (t → ∞), the attribute density becomes uniform and the diffusion process comes to rest; the computation that produces a density-equalising cartogram runs until this equilibrium is reached. The total displacement of each point from start to finish determines the map projection needed to produce a perfectly density-equalising cartogram. The diffusion equation is as follows:

$$\nabla^2 \rho - \frac{\partial \rho}{\partial t} = 0 \qquad (1)$$

We can then calculate the velocity field in terms of the attribute density using Eq. (2).

$$\mathbf{v}(\mathbf{r}, t) = -\frac{\nabla \rho}{\rho} \qquad (2)$$

In Eq. (2), v(r, t) represents the velocity at position r and time t. To create a cartogram, we start by solving Eq. (1). The density function ρ(r, t) starts from the initial condition in which ρ is equal to the given attribute density of the region of interest, and the corresponding velocity field is derived using Eq. (2). The position r(t) of any point on the map at time t is obtained by integrating the velocity field, as in Eq. (3); its cumulative displacement as t → ∞ gives the point’s location on the finished cartogram.

$$\mathbf{r}(t) = \mathbf{r}(0) + \int_0^t \mathbf{v}(\mathbf{r}, t')\,dt' \qquad (3)$$
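Equations (1) to (3) can be turned into a compact numerical sketch. The version below is our own illustration, not Gastner and Newman’s code or the authors’ Java program, and it rests on simplifying assumptions: the density lives on a regular grid with periodic boundaries so that Eq. (1) can be solved with NumPy’s FFT (the published algorithm handles boundaries differently), gradients use central differences, velocities are interpolated bilinearly, and Eq. (3) is integrated with forward-Euler steps up to a finite time `t_max` rather than to infinity. All function names and parameters are hypothetical.

```python
import numpy as np


def density_at(rho_hat, kx, ky, t):
    """Solve Eq. (1) spectrally: each Fourier mode decays as exp(-|k|^2 t)."""
    return np.real(np.fft.ifft2(rho_hat * np.exp(-(kx ** 2 + ky ** 2) * t)))


def velocity_field(rho):
    """Eq. (2): v = -grad(rho) / rho, with central differences on a periodic grid."""
    grad_x = (np.roll(rho, -1, axis=1) - np.roll(rho, 1, axis=1)) / 2.0
    grad_y = (np.roll(rho, -1, axis=0) - np.roll(rho, 1, axis=0)) / 2.0
    return -grad_x / rho, -grad_y / rho


def bilinear(field, x, y):
    """Sample a periodic grid field (indexed [row=y, col=x]) at fractional positions."""
    ny, nx = field.shape
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    fx, fy = x - x0, y - y0
    x0, y0 = x0 % nx, y0 % ny
    x1, y1 = (x0 + 1) % nx, (y0 + 1) % ny
    return ((1 - fx) * (1 - fy) * field[y0, x0] + fx * (1 - fy) * field[y0, x1]
            + (1 - fx) * fy * field[y1, x0] + fx * fy * field[y1, x1])


def diffusion_cartogram(rho0, points, t_max=50.0, dt=0.1):
    """Move points with the flow while the density diffuses (forward-Euler form of Eq. 3)."""
    ny, nx = rho0.shape
    kx = 2 * np.pi * np.fft.fftfreq(nx)[np.newaxis, :]
    ky = 2 * np.pi * np.fft.fftfreq(ny)[:, np.newaxis]
    rho_hat = np.fft.fft2(rho0)
    xs = np.array(points[:, 0], dtype=float)
    ys = np.array(points[:, 1], dtype=float)
    for step in range(int(round(t_max / dt))):
        vx, vy = velocity_field(density_at(rho_hat, kx, ky, step * dt))
        # Sample both components at the current positions before moving.
        u, w = bilinear(vx, xs, ys), bilinear(vy, xs, ys)
        xs, ys = xs + dt * u, ys + dt * w
    return np.column_stack([xs, ys])


if __name__ == "__main__":
    rho0 = np.ones((32, 32))   # hypothetical background density
    rho0[12:20, 12:20] = 5.0   # hypothetical dense square region
    pts = np.array([[15.5, 15.5], [20.5, 15.5]])
    print(diffusion_cartogram(rho0, pts))  # centre stays put; edge point moves outward
```

In a real application the grid would be rasterised from the region polygons and every polygon vertex would be moved in the same way; the toy example only shows a dense square expanding while its centre stays fixed.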

Criteria for evaluating the appropriateness of diffusion-based cartograms

The criteria for evaluating diffusion-based cartograms centre on two key visualisation factors: expressiveness and effectiveness. The expressiveness factor determines whether a graphical technique can express the desired information, while the effectiveness factor determines whether a graphical technique exploits the capabilities of the output medium and the human visual system. The purpose of the effectiveness factor is twofold: (i) to provide inferences about the relative difficulty of the perceptual tasks associated with interpreting the graphical technique, and (ii) to identify, among the techniques that satisfy the expressiveness factor, the one that is most effective in a given situation at exploiting the capabilities of the output medium and the human visual system (Mackinlay 1986).

Following from these observations, we selected five graphical attributes and perceptual tasks to evaluate the appropriateness of diffusion-based cartograms for four experimental datasets. The datasets possess a variety of unique characteristics that enable comparisons of different applications of a diffusion-based cartogram for exploratory spatial data analysis. The five criteria employed to explore the cartograms were: (i) attribute value, (ii) colour, (iii) labels, (iv) readability of maps, and (v) cluster analysis. We investigated three specific study hypotheses:

• Hypothesis I: the use of diffusion-based cartograms can facilitate the uncovering of the underlying data structures through transformation and provide a better visual representation of the attribute value than conventional maps do.

• Hypothesis II: the use of diffusion-based cartograms allows for comprehensive exploration and identification of specific geographies that are normally difficult to observe visually on conventional maps, ‘the small and the big belly effect’.

• Hypothesis III: the use of diffusion-based cartograms together with conventional maps provides insights and multiple ways of probing the data.

The need to confirm the above study hypotheses is supported by Guagliardo and Ronzio (2005), who observed that the use of diffusion-based algorithms could dramatically change the interpretation of data. Monmonier’s (2007) review also confronted a similar question and agrees that diffusion-based algorithms offer valuable insights during data exploration and may lead to the development of superior study hypotheses. The need to investigate the scientific role of this remarkable algorithm motivated us to demonstrate it with the following applications: the United States’ incidence of child lead poisoning, Tanzania’s infant mortality, Uganda’s 2006 presidential election and South Korea’s 2005 population.

The central purpose of this study was therefore to gain insight into socio-demographic and health datasets and to pose future research questions pertaining to the significance of density-equalising cartograms. In conducting this investigation, we intend to engage the geographic information systems (GIS) and cartographic community in thinking about three research questions: (1) How do we visually explore interesting patterns in data in ways that engage our cognitive abilities and significantly improve data interpretation? (2) What exploratory methods and tools are available for data representation and transformation? (3) In light of questions 1 and 2, can density-equalising cartograms augment data exploration and interpretation?

2. Materials and methods

Data description

Four experimental datasets were used to evaluate and verify the study hypotheses. Data pertaining to the US child lead poisoning incidence were obtained from the Centers for Disease Control and Prevention; Tanzania’s infant mortality data were obtained from the Ministry of Health; the 2006 Uganda presidential election data were obtained from the Electoral Commission of Uganda; and the 2005 South Korea Population and Housing data were obtained from the Korean National Statistical Office.

Incidence data for child lead poisoning

The child lead poisoning data for eight consecutive years, from 1997 through 2004, had 10 attributes, four of which were of primary interest: the total population of children in each state, the number of children tested in each state, the number of tested children confirmed with elevated blood lead levels (BLL), and the prevalence rate of elevated BLL for each state.

Infant mortality data

Infant mortality is widely used as an indicator of the standard of living in district neighbourhoods and of the state of environmental health. It is the number of infants dying before 1 year of age divided by the number of live births in a specified year, which approximates the likelihood that an infant will die before its first birthday. The data constitute an average infant mortality for each district, as aggregated and summarised by the Muhimbili University Health Exchange Forum, Tanzania. The number of live births was calculated using the 2002 Population and Housing Census of the United Republic of Tanzania.
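The rate defined above is simple arithmetic, and the sketch below makes it explicit. The function name and the district figures are hypothetical, and expressing the rate per 1000 live births is a common public health convention assumed here rather than taken from the source data.

```python
def infant_mortality_rate(infant_deaths, live_births, per=1000):
    """Deaths before age 1 per `per` live births in the same year."""
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    return infant_deaths * per / live_births


# Hypothetical district figures, for illustration only.
districts = {"District A": (1250, 12500), "District B": (380, 9500)}
for name, (deaths, births) in districts.items():
    print(name, infant_mortality_rate(deaths, births))  # District A 100.0, District B 40.0
```

Multiplying before dividing keeps the result exact for whole-number inputs like these.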

The 2006 Uganda presidential election data

Uganda’s presidential election was held on 23 February 2006, with five participating candidates: Abed Bwanika, who stood as an independent; John Ssebaana Kizito for the Democratic Party; Kizza Besigye for the Forum for Democratic Change; Miria Obote for the Uganda People’s Congress; and Yoweri Museveni for the National Resistance Movement. Given that the votes for Abed Bwanika, John Ssebaana Kizito and Miria Obote were too few to influence the election outcome, our experiment focused predominantly on the two key candidates (Kizza Besigye and Yoweri Museveni). The election outcome was obtained for 69 districts from nearly 20,000 polling stations around the country, with 99 percent of results provisionally reported. The total number of registered voters was 10,450,788, with a record turnout of approximately 68 percent.

The 2005 South Korea Population and Housing data

The 2005 Population and Housing Census dataset was acquired from the Korean National Statistical Office, Ministry of Finance and Economy. The Korean Peninsula presents an interesting experimental dataset partly because of its multiple islands; some are very large and others are very small. The dataset further offers a unique opportunity for us to test how well the density-equalising algorithm can handle a geography that is both contiguous and non-contiguous. To gain additional insights into this algorithm, we explored two variables: the total population and the total foreign population across the Republic of Korea.

Data analysis

The analytical methods used to explore and analyse the datasets described above included GIS for mapping, kriging, and a diffusion-based algorithm for making cartograms. The datasets were normalised to minimise noise and any potential uncertainty contained in them. The three major components of data analysis were:

• spatial exploratory data analysis and visualisation

• geostatistical analysis to resolve strongly positively skewed distributions

• making cartograms using a diffusion-based algorithm.

Spatial exploratory data analysis and visualisation

The three datasets were processed and linked with boundary data. Exploratory analysis of these datasets suggested that their distributions were strongly positively skewed. In order to normalise the data and account for any potential anomalies and spatial variations, we conducted a geostatistical analysis of the spatial data using GIS. In addition, a variety of plots were used to visualise the data. This process yielded a working knowledge of the spatial properties and structure of the data.
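A quick numerical check of the positive skew noted above can accompany the plots. The sketch below computes the Fisher-Pearson moment coefficient of skewness before and after a natural-log transform, using hypothetical, strictly positive values; it is an illustration only and not the authors’ workflow, which was carried out in GIS.

```python
import math


def skewness(values):
    """Fisher-Pearson moment coefficient of skewness (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical, strongly right-skewed values (e.g. rates per areal unit).
rates = [1, 2, 2, 3, 3, 3, 4, 5, 8, 20, 45]
logged = [math.log(v) for v in rates]  # the log transform requires values > 0

print(round(skewness(rates), 2), round(skewness(logged), 2))
```

A large positive value for the raw data that shrinks markedly after the transform is the pattern that motivates the log-normal step described next.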

Geostatistical analysis

As discussed above, the principal aim of the geostatistical analysis was to resolve the strongly positively skewed distributions in three of the four experimental datasets. The geostatistical analysis, using the Geostatistical Extension in ESRI’s ArcGIS 9.3, followed four steps: (1) resolve the strongly positively skewed data through a log-normal transformation, (2) fit a semivariogram model based on insights drawn from visual exploratory analysis, (3) conduct a residual analysis, and (4) explore spatial variations using the new estimates.
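Step (2) fits a model to an empirical semivariogram. As a minimal sketch of what is being fitted, the hypothetical helper below computes the classical (Matheron) estimator, γ(h) = Σ(z_i − z_j)² / (2N(h)), over point pairs whose separation rounds to each lag h. It is for illustration only and does not reproduce the ArcGIS extension, which also fits a model curve to these values.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, lag, n_lags):
    """Classical estimator: gamma(h) = sum((z_i - z_j)**2) / (2 * N(h)).

    points: iterable of (x, y, z). Pairs are binned by round(distance / lag),
    and bins 1..n_lags are returned as {h: gamma}.
    """
    pts = list(points)
    sums, counts = defaultdict(float), defaultdict(int)
    for i in range(len(pts)):
        xi, yi, zi = pts[i]
        for j in range(i + 1, len(pts)):
            xj, yj, zj = pts[j]
            k = int(round(math.hypot(xi - xj, yi - yj) / lag))
            if 1 <= k <= n_lags:
                sums[k] += (zi - zj) ** 2
                counts[k] += 1
    return {k * lag: sums[k] / (2 * counts[k]) for k in sorted(counts)}


# Hypothetical transect with a linear trend: gamma grows with distance.
transect = [(0, 0, 0.0), (1, 0, 1.0), (2, 0, 2.0), (3, 0, 3.0)]
print(empirical_semivariogram(transect, lag=1.0, n_lags=3))  # {1.0: 0.5, 2.0: 2.0, 3.0: 4.5}
```

In practice the estimator would be applied to the log-transformed values from step (1), and a model such as a spherical or exponential curve would then be fitted to the resulting points.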

Making cartograms using a diffusion-based algorithm

Figure 1 presents the pseudocode of a diffusion-based algorithm as suggested by Gastner and Newman (2004). We should state at the outset that we created cartograms based on the percentage of child lead poisoning (the percentage of the total number of children with elevated BLL) and on infant mortality counts (the total number of infants who died before reaching the age of one), to avoid the problem of rates not being additive, as demonstrated by Gastner and Newman (2004). In mapping the data in ArcGIS, however, we decided to display and visualise the rates, following the conventional public health approach of using a denominator of total population.
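The additivity argument can be checked with a few lines of arithmetic. In a cartogram, merging two regions merges their areas, so the mapped quantity must add up in the same way. The sketch below, using hypothetical counts for two regions, shows that rates do not add while counts and shares of a total do; the helper names are our own.

```python
def rate(cases, population):
    """Cases per person in one region."""
    return cases / population


def shares(cases_by_region):
    """Each region's percentage of the total number of cases."""
    total = sum(cases_by_region.values())
    return {name: 100 * c / total for name, c in cases_by_region.items()}


# Hypothetical counts: (children with elevated BLL, children in population)
regions = {"A": (50, 1000), "B": (10, 9000)}

sum_of_rates = sum(rate(c, p) for c, p in regions.values())
pooled_rate = sum(c for c, _ in regions.values()) / sum(p for _, p in regions.values())
print(round(sum_of_rates, 4), round(pooled_rate, 4))  # 0.0511 vs 0.006: rates do not add

pct = shares({name: c for name, (c, _) in regions.items()})
print(pct, sum(pct.values()))  # shares of the total sum to 100 (up to rounding)
```

The rate of the merged region (0.006) bears no relation to the sum of the individual rates, whereas the two shares combine cleanly, which is why shares or counts, not rates, are the safe input for area scaling.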

The experimental datasets were explored using the diffusion-based algorithm, originally implemented in the Java programming language for the Windows operating system. Diffusion-based cartograms were made to represent attributes of the US incidence of child lead poisoning, Tanzania’s infant mortality, Uganda’s 2006 presidential election outcomes and South Korea’s 2005 population. The sizes of the areal units in each dataset are adjusted to reflect the size of the attribute; hence, the new cartograms provide a better representation of the datasets. The resulting cartogram files were exported to ESRI’s ArcGIS 9.3 in shapefile format. Additional visual exploration, spatial querying and map production were carried out in ArcGIS. Some of the final maps were compiled and produced using Corel Draw graphics software.
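The resizing rule in the previous paragraph has a simple target: each unit’s cartogram area should be proportional to its attribute. The sketch below computes those target areas and a relative area error that could be used to check how closely a finished cartogram meets them. The helper names and the three-region figures are hypothetical, and this is not a feature of the software used in the study.

```python
def target_areas(attribute, total_area):
    """Area each unit should occupy so that area is proportional to its attribute."""
    total = sum(attribute.values())
    return {name: total_area * value / total for name, value in attribute.items()}


def relative_area_errors(actual, target):
    """|actual - target| / target for each unit; 0 means a perfect cartogram."""
    return {name: abs(actual[name] - target[name]) / target[name] for name in target}


# Hypothetical attribute values and cartogram areas (same area units throughout).
population = {"North": 30, "Centre": 10, "South": 60}
targets = target_areas(population, total_area=1000.0)
print(targets)  # {'North': 300.0, 'Centre': 100.0, 'South': 600.0}

measured = {"North": 310.0, "Centre": 95.0, "South": 595.0}
print(relative_area_errors(measured, targets))
```

Keeping the total area fixed means the cartogram only redistributes space among units, which mirrors how the diffusion process conserves the total attribute.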

3. Results

The results of the preliminary analysis of US child lead poisoning indicated that Rhode Island, Illinois, Wisconsin and Connecticut consistently had the highest child lead poisoning incidence rates during the study period.

The results for 1997 (Figure 2) show that a high proportion of children confirmed with elevated blood lead levels (BLL) in the United States were distributed in three major clusters within the study area. The first cluster comprises the eastern states: Connecticut, Rhode Island, Massachusetts, Vermont, New Hampshire and Maine. The second cluster is in the central United States and includes Kansas, Nebraska, Iowa and Minnesota. The third major cluster with a high incidence rate is the southeastern region, comprising Florida, Alabama, Tennessee, South Carolina and Virginia.

The incidence rates for the above-mentioned states contrast clearly with the rates for the remaining states. The probable reason for this contrast is that in 1997 only a small fraction of the child population in these states was reported as tested, hence a higher rate because of percentage bias.

The incidence rates for child lead poisoning exhibit a downward trend during the study period. Figures 2 and 3 show the spatial distribution of incidence rates for 1997 and 2003. Lead poisoning consistently affected a much higher proportion of children in most states throughout the northeast and in the northern states of Illinois, Missouri, Iowa, Michigan and Wisconsin. Nonetheless, these states, like the rest, followed a declining trend during the entire study period. The probable causal factors for this decline are:

• lead poisoning prevention acts helping to improve building assessments, notifications and funding pertaining to lead poisoning;

• states focusing on primary prevention of lead poisoning, which allows the most hazardous buildings to be identified and preventive actions to take place;

• efforts at the grassroots level that deal with the primary problems of lead-contaminated houses and, on a larger scale, build a coalition to eliminate lead hazards in housing; and

• public awareness and education regarding prevention of lead exposure.

Figure 2. The spatial distribution of 1997 child lead poisoning in the USA. The upper panel is an area cartogram of the percentage of the total number of children with elevated BLL, while the lower panel is a conventional map. The ‘small and big belly effect’ is quite evident in the area cartogram. Both maps display rates; in the legend, a five-point rating scale ranges from very low to very high, representing incidence rates of child lead poisoning of 10 cases per 1000 children and 45 cases per 1000 children under the age of six years, respectively.

For Tanzania’s infant mortality, preliminary analysis shows high infant mortality in the western, central and southern portions of the country, with the western region having the highest mortality. Figure 4 shows the spatial distribution of infant mortality in Tanzania in 2002. According to this figure, the most affected districts were located in the western region. The lowest infant mortality was observed in the northeastern region and along the coast.

Figure 3. The spatial distribution of 2003 child lead poisoning in the USA. The upper panel is an area cartogram of the percentage of the total number of children with elevated BLL, while the lower panel is a conventional map. The ‘small belly effect’ is quite obvious in the area cartogram, showing little variation in the observations. Both maps display rates; in the legend, a five-point rating scale ranges from very low to very high, representing incidence rates of child lead poisoning of 5 cases per 1000 children and 30 cases per 1000 children under the age of six years, respectively.

The analysis of Uganda’s presidential election shows that the political landscape was dominated by Museveni. Figure 5 shows the spatial distribution of the outcome of the 2006 presidential election in Uganda for two candidates (Museveni and Besigye). Museveni is paired with Besigye, and the kriged data were analysed for both candidates. The provisional results were 4,078,677 votes (59.28 percent) for Museveni; 2,570,572 votes (37.36 percent) for Kizza Besigye; and 160,990 votes (2.34 percent) for the others.

Figure 4. The spatial distribution of the number of infant deaths in 2002 in Tanzania. The upper panel is a live-birth cartogram and the lower panel is a proportional symbol map. The ‘small belly effect’ is quite obvious in the area cartogram of Tanzania.


Figure 5. The spatial distribution of the 2006 presidential election outcome in Uganda for the two major candidates; the other three candidates received too few votes to affect the outcome, so they were excluded from the analysis. The upper panel is a diffusion-based cartogram and the lower panel is a conventional map. The ‘small belly effect’ is quite evident in the area cartogram.


The kriged results indicated that Museveni received 4,311,376 votes (58.76 percent), Kizza Besigye 2,865,524 votes (39.05 percent), and the other candidates 160,588 votes (2.19 percent). From these kriged data, we can infer that Museveni won the 2006 election, even though kriging raised Besigye’s share by about 1.7 percentage points and lowered Museveni’s by about 0.5 percentage points. Museveni won the election, and Besigye was the most competitive of the remaining candidates, finishing second. This is a preliminary finding of this study; more critical analyses suggest that the election outcome may have been more complicated.
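The kriged percentages and the percentage-point changes discussed above can be recomputed directly from the vote counts given in the text. In the sketch below the helper name is hypothetical, the kriged shares are taken over the three reported totals (Museveni, Besigye, others), and the provisional percentages are used as reported rather than recomputed, because the text does not give the total they were computed against.

```python
def vote_shares(votes):
    """Percentage of the combined total for each candidate, rounded to 2 d.p."""
    total = sum(votes.values())
    return {name: round(100 * v / total, 2) for name, v in votes.items()}


kriged = vote_shares({"Museveni": 4311376, "Besigye": 2865524, "Others": 160588})
provisional = {"Museveni": 59.28, "Besigye": 37.36, "Others": 2.34}  # as reported

print(kriged)  # {'Museveni': 58.76, 'Besigye': 39.05, 'Others': 2.19}
change = {name: round(kriged[name] - provisional[name], 2) for name in kriged}
print(change)  # {'Museveni': -0.52, 'Besigye': 1.69, 'Others': -0.15}
```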

Figure 6 shows the spatial distribution of the population in South Korea. From these maps, we can deduce that most of South Korea’s population resides in Seoul Metropolitan City and the Province of Gyeonggi-do. The other metropolitan areas, including Busan, Incheon, Daegu, Daejeon, Gwangju and Ulsan, have sizeable populations. This pattern is visible in the maps presented and is common knowledge; however, it is more apparent in the area cartograms than in the conventional map.

4. Discussion

Evaluation of diffusion-based cartogram appropriateness

Given that one of our key goals was to determine the appropriateness of diffusion-based cartograms using the results of the aforementioned datasets, we set out to explore the following five criteria:

Attribute value

Small areas with high attribute values, such as Rhode Island and Connecticut in the United States, and the Unguja and Pemba islands and Ileje district in Tanzania, are much easier to read on this type of cartogram than on conventional maps. For the maps of Uganda’s election, the attribute value produced no marked distortion in comparison with the conventional map; for the South Korean foreign population, it did.

Colour

We established from our experiments that in conventional maps the scale of the phenomenon was not properly visible with the use of colour; with this type of cartogram, however, we achieved strong visual appeal and spatial patterns were easily identifiable. Small geographic units, such as the islands in South Korea, the states of Rhode Island and Connecticut in the United States and the smaller islands of Unguja and Pemba in Tanzania, may reflect such a dilemma because smaller areas tend to be obscured by their larger counterparts in conventional mapping. For these datasets, however, colour enhances the quality of both map types with strong visual attractiveness, even more so for the cartograms than for the conventional maps.

Labelling

On conventional maps it is difficult to read the labels of states with very small areas, which in this investigation are the primary areas of interest. Although labelling the maps helps facilitate the interpretation of data distorted by the density-equalising algorithm, the maps may be difficult to interpret for non-professionals who are accustomed to conventional maps. In the interests of visual exploratory data analysis, however, the primary areas of interest are clearly visible and easy to label, and superior hypotheses about the data may be formulated.


Map readability

Although diffusion-based cartograms provided improved readability of the data, and the maps thus facilitated improved interpretation, the level of distortion may be problematic. In our experiments, most of the essential features of the map were visible and easily identifiable, especially for smaller geographic units such as Rhode Island in the United States and the Unguja and Pemba islands of Tanzania. Further evaluation of this criterion may be necessary because area cartograms distort map shapes that are perceptually familiar to most people; take, for example, the distorted shape of the city of Seoul, South Korea.

Figure 6. The population of South Korea. The upper left panel shows the total population using a conventional map, the upper right panel is the total population cartogram, and the lower panel is the foreign population cartogram. Population distributions show a clear ‘big belly effect’.

Analysis of clusters

A review of the conventional map of child lead poisoning in the eastern United States suggests that children exposed to high concentrations of lead were located mainly in three states: Rhode Island, Connecticut and Massachusetts. The cartograms present a different picture. They suggest that the phenomenon was spread across the states of the New England region, with other clusters evident in Illinois, Missouri, Iowa and Wisconsin. Likewise, the cartograms for Tanzania show three main infant mortality clusters. The most affected districts lie in the northwest and south, followed by the central zone, while the northeast and coastal parts are less affected. Uganda’s election data show three clusters: two for Besigye’s votes and one large contiguous region for Museveni’s. The South Korean population dataset has four distinguishable clusters. Because the spatial clusters were clearly visible, the cluster criterion markedly improved our understanding and interpretation of all four experimental datasets.

Additional insights

Child lead poisoning

Cartograms of child lead poisoning revealed that the most affected areas were the New England states of Rhode Island, Connecticut, Maine, Massachusetts, New Hampshire and Vermont. This is partly due to the many older housing units in these states that contain lead. Eradicating childhood lead poisoning will require continued vigilance in identifying the remaining lead hazards and the children at risk of lead exposure. Several measures can support this work:

- strengthening primary prevention, especially where incidence is highest
- screening more children
- reducing the size of the at-risk population
- increasing the availability of lead-safe low-income housing
- greatly raising community awareness of childhood lead poisoning

(Weitzman et al. 1993; Dietrich et al. 2004; Brown et al. 2001)

Other spatial clusters of child lead poisoning were located in Illinois, Missouri, Wisconsin and Michigan. Urban areas such as Chicago, Detroit and St. Louis appear to be particularly vulnerable to lead exposure, although further research is needed to confirm this.

Infant mortality

Infant mortality is a sensitive indicator of a country’s socioeconomic condition. Key contributing factors include low socioeconomic status, a lack of health facilities and of access to them, a lack of female education, expensive vaccines, and cultural practices that still conflict with advances in medicine. Cartograms of the infant mortality data placed the clusters with the highest mortality in the western portion of Tanzania. These clusters comprise districts mainly in the Kagera, Kigoma, Mara, Shinyanga, Mwanza, Rukwa and Tabora regions. The high mortality may be partly due to poverty, malnutrition, poor female education, and preventable diseases such as malaria and HIV/AIDS. Apart from Shinyanga, Mwanza, Rukwa and Tabora, these regions are also known to be food-insecure. Food insecurity aggravates the effect of dietary practices on the health of breast-feeding mothers (Kinabo and Msuya 2002). The western portions of Tanzania also have relatively poor infrastructure and inadequate health facilities. They are poorly connected to the rest of the country, which hinders access to health services. The region is further overburdened by an influx of refugees from neighbouring countries. Notably, the cartogram revealed Ukerewe, Mpwapwa and Kilosa as outlier districts whose infant mortality is higher than that of the surrounding districts. This may reflect demographic disparities specific to those areas (The Tanzania Population and Housing Census 2002).

Additional spatial clusters were observed among districts in the central and southwest portions of Tanzania. These clusters can be explained in part by Tanzania’s socioeconomic status (Marchant et al. 2004). The region is also burdened with high levels of poverty and severe malnutrition among mothers and infants (Kinabo and Msuya 2002; Marchant et al. 2004). High levels of poverty in the Mara region may also account for the spatial clusters.

Spatial clusters were also visible among districts along the coast and in the northeast of the country, where infant mortality is relatively low. This is partly due to better infrastructure, better access to healthcare facilities and higher socioeconomic status. It should be noted, however, that the most recent (2007) infant mortality data for Tanzania show a major decline in the number of children dying before reaching the age of 1 (Marchant et al. 2004).

Uganda’s 2006 presidential election outcomes

The 2006 presidential election data offer some new insight into Uganda’s electoral process based on spatial contiguity. Although Museveni had significant electoral success in the 2006 presidential election, his overall support was confined to certain geographic regions. The same was true of Besigye, whose success was confined mainly to two contiguous regions. The cartograms may suggest that Museveni is very popular across Uganda, with Besigye in a strong second place, but this may be only part of the story. Closer scrutiny of the attribute data revealed that Museveni’s strong performance was confined to western Uganda. Besigye’s strength lay in the areas surrounding Kampala and in the north. This finding implies that other political parties have an excellent opportunity to build their political bases, because the central, eastern and northern portions of Uganda remain politically competitive. Preliminary data also suggest that support for Besigye and Museveni will not hold, because their political bases are confined to specific regions. The results of newcomers support this reading. Abed Bwanika performed well in the eastern and northern portions of Uganda (Arua, Apac, Lira, Kampala, Nebbi, Gulu and Kumi districts). Obote Kalule Miria was strong in the Lango region (Lira and Apac districts). Ssebana Joseph Kizito was strong in parts of the Buganda region, especially the districts of Wakiso, Kampala, Masaka, Mukono and Mpigi. A future study could analyse the 1996 and 2001 election data to test these results and examine any trends.

South Korea’s 2005 population

The density-equalising maps clearly illustrate the size of South Korea’s population. They show how the diffusion-based algorithm distributes and scales the population attribute evenly through visual distortions. These distortions expose specific geographies within South Korea’s population patterns. Most striking is the map of the total foreign population residing in South Korea. A conventional map would not have scaled this low attribute value as well as the diffusion-based cartogram does.

Diffusion-based cartograms as a new method for visualising spatial data

A cartogram uses a variable other than area as its base. It can therefore reveal the spatial patterns of mapped data more effectively than a conventional map, whose structure is fixed. In a cartogram the shape and orientation of units are somewhat distorted. Readers therefore need either prior knowledge of the area or a conventional map placed beside the cartogram as a reference. Conventional maps depend on a fixed map structure, familiar cartographic representations and prior knowledge to decode. This dependence often constrains map readers, who may then be unwilling to adopt new ways of visualising mapped data. It is therefore reasonable to assume that the distortions in cartograms will not please most map readers. A new method can also be demanding to decode, partly because it is unfamiliar and its structure is distorted. To overcome this problem, we suggest adding map labels to cartograms, or any other cartographic device that makes the map information easier to interpret. For more sophisticated spatial data analysts, cartograms reveal spatial patterns that are easy to identify and may lead to better hypotheses.
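The following is a minimal one-dimensional sketch of the diffusion idea described by Gastner and Newman (2004). It is intended to make the mechanism behind these distortions concrete. The attribute is treated as a density that diffuses until it is uniform, and region boundaries are carried along by the velocity field v = -(∂ρ/∂x)/ρ. Several assumptions are ours: equal-width regions on a line, reflecting walls, a simple explicit finite-difference scheme, and the hypothetical names `diffuse_1d`, `n_cells` and `t_end`. This is not the two-dimensional implementation used to produce the maps in this study. The attribute between two moving boundaries is conserved (approximately, in this discrete version), so each region should end up with a length proportional to its value.

```python
import numpy as np


def diffuse_1d(values, length=1.0, n_cells=200, t_end=0.5):
    """Toy 1-D density-equalising map built by diffusion (illustrative only).

    `values` are positive attribute totals for k equal-width regions laid
    side by side on [0, length]. The attribute is treated as a density that
    diffuses towards uniformity; the k - 1 interior region boundaries are
    advected with velocity v = -(d rho/dx) / rho. Returns their final positions.
    """
    values = np.asarray(values, dtype=float)
    if values.ndim != 1 or len(values) < 2 or np.any(values <= 0):
        raise ValueError("need at least two positive region values")
    k = len(values)
    h = length / n_cells                               # cell width
    centres = (np.arange(n_cells) + 0.5) * h
    region = np.minimum((centres / length * k).astype(int), k - 1)
    rho = values[region] / (length / k)                # attribute per unit length
    faces = np.linspace(0.0, length, n_cells + 1)      # cell edges incl. the two walls
    points = np.arange(1, k) * length / k              # interior region boundaries
    dt = 0.4 * h * h                                   # explicit scheme is stable for dt <= h^2 / 2

    for _ in range(int(np.ceil(t_end / dt))):
        # Velocity on interior faces; zero at the walls (no-flux boundaries).
        grad = (rho[1:] - rho[:-1]) / h
        v_inner = -grad / (0.5 * (rho[1:] + rho[:-1]))
        v = np.concatenate(([0.0], v_inner, [0.0]))
        points = points + dt * np.interp(points, faces, v)
        # One explicit diffusion step; edge cells are mirrored so no attribute leaks out.
        padded = np.concatenate(([rho[0]], rho, [rho[-1]]))
        rho = rho + dt * (padded[2:] - 2.0 * rho + padded[:-2]) / (h * h)

    return points


if __name__ == "__main__":
    # Hypothetical values for four equal-width regions (not the study's data).
    print(diffuse_1d([1.0, 3.0, 1.0, 5.0]))  # approximately [0.1, 0.4, 0.5]
```

For the hypothetical values above, the boundaries should settle near 0.1, 0.4 and 0.5. The region lengths then become roughly 0.1, 0.3, 0.1 and 0.5, matching the shares 1/10, 3/10, 1/10 and 5/10 of the total. In two dimensions the same principle lets a small unit with a high value grow until it is easy to see.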

Diffusion-based cartograms and health data

These experiments allowed us to map health data, and the cartogram proved to be a valuable tool for this purpose. Applying cartograms more widely to health data would allow their benefits to be fully explored and realised. Cartograms may facilitate scientific inquiry and help delineate, spatially, the major risk factors that may be contributing to a human epidemic. They may also provide insight into why certain locations seem more vulnerable to disease than others.

Diffusion-based cartograms and election data

When diffusion-based cartograms were first applied to election data in the United States, they revealed an interesting spatial pattern that conventional maps did not show. Dorling’s work has also helped popularise cartograms (Dorling 2005). It would be interesting to examine Uganda’s presidential election data at county level to see whether that spatial resolution yields significant further insights. That dataset was not readily available to us, so we had to work at district level. Despite this limitation, diffusion-based cartograms provided a better way to visualise the election outcomes, as they did for the two health datasets. On the basis of this experience, we recommend using this type of cartogram together with other investigative tools, which may provide different but superior insights. Combining different visual tools and methods may yield a more solid understanding and improve the quality of data analyses and their outcomes.

Diffusion-based cartograms and population data

The visual clusters shown in Figure 6 offer an opportunity to study and understand the spatial patterns of population in specific geographies. The areal distribution is much clearer than on a conventional map. By examining its proportionality, we can begin to identify the explanatory variables responsible for these patterns. The density-equalising algorithm also proved versatile, handling both the numerous non-contiguous units and the contiguous units of South Korea.

5. Concluding remarks and future work

This study has further demonstrated benefits of diffusion-based cartograms that are consistent with previous studies (Gastner and Newman 2004; Tobler 2004; Colizza et al. 2005; Monmonier 2006, 2007; Wieland et al. 2007). For future work, we recommend testing diffused cartograms thoroughly with human subjects to measure map readers’ perceptual and cognitive abilities. For now, the diffused cartogram is clearly a valuable tool for visual analytics and can be used to extract insights about the thematic elements of data. Drawing on our experience in applying these cartograms, we summarise our findings as follows:

Attribute value

A diffusion-based cartogram has an advantage over a conventional map. A conventional map shows data in relation to map area only, not in relation to attribute value, and may give a false impression of the spatial patterns of the phenomena under investigation.
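The contrast between area and attribute value can be expressed as simple arithmetic. We assume the cartogram preserves the total map area, a common convention adopted here as an assumption. Under that assumption, each unit’s target area is its share of the attribute total multiplied by the total area. The function names below are hypothetical and the numbers are made up; neither comes from the study.

```python
def target_areas(areas, values):
    """Area each unit would occupy in an area-preserving, density-equalising cartogram.

    Hypothetical helper: unit i receives total_area * values[i] / sum(values).
    """
    if len(areas) != len(values):
        raise ValueError("areas and values must have the same length")
    total_value = sum(values)
    if total_value <= 0:
        raise ValueError("values must sum to a positive number")
    total_area = sum(areas)
    return [total_area * v / total_value for v in values]


def scale_factors(areas, values):
    """Cartogram area divided by original area (> 1 means the unit grows)."""
    return [t / a for t, a in zip(target_areas(areas, values), areas)]


if __name__ == "__main__":
    # Hypothetical example: a small and a large unit holding equal attribute totals.
    areas = [1.0, 9.0]
    values = [50.0, 50.0]
    print(target_areas(areas, values))   # [5.0, 5.0]
    print(scale_factors(areas, values))  # small unit grows 5x, large unit shrinks to 5/9
```

In this example the two units hold equal values. A conventional map would still draw them at a 1:9 area ratio, while the cartogram gives both the same area. The small unit therefore becomes five times larger and the large one shrinks to about 56% of its original size.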

Colour

On conventional choropleth maps, polygons representing smaller areas often seem denser than polygons representing larger areas, even when the same colour is applied. This makes colour schemes somewhat unreliable. Cartograms do not have this problem, because the size of each resulting polygon normally depends on its attribute value.

Labelling

Clusters of small polygons, and even individual small polygons, are difficult to label, especially if they represent low values. Cartograms provide some flexibility here. When small polygons depict high values they are enlarged, which makes labelling much easier than on conventional maps. Cartograms still have subtle flaws, however, where small polygons carry low attribute values.

Map readability

In diffused cartograms, most of the essential features are more clearly visible and more easily identifiable than on conventional maps.

Analysis of clusters

Diffusion-based cartograms are simpler to read and more intuitive to understand than conventional maps. They focus the reader’s eye on the area of interest and allow instant visual recognition of the different spatial clusters in the phenomena being investigated. The distorted data also illustrate the ‘small belly, big belly effect’. The ‘small belly effect’ reflects little variation in the spatial patterns and observations, while the ‘big belly effect’ reflects wide variation.
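The paper describes the ‘small belly’ and ‘big belly’ effects only qualitatively. One possible way to put a number on them is the spread of the log scale factors, log2(cartogram area / original area). This measure is our own hypothetical proxy and is not defined by the authors. When every unit keeps roughly its original size, the spread is near zero, a ‘small belly’ pattern. When some units balloon while others shrink, the spread grows, a ‘big belly’ pattern. The sketch assumes positive areas and values and an area-preserving cartogram. The name `distortion_spread` is hypothetical.

```python
import math
import statistics


def distortion_spread(areas, values):
    """Population standard deviation of log2(cartogram area / original area).

    Hypothetical summary of how unevenly a density-equalising cartogram
    rescales its units. Assumes all areas and values are positive and that
    the cartogram preserves the total map area.
    """
    if len(areas) != len(values) or not areas:
        raise ValueError("need equally long, non-empty areas and values")
    if min(areas) <= 0 or min(values) <= 0:
        raise ValueError("areas and values must be positive")
    total_area = sum(areas)
    total_value = sum(values)
    # Scale factor of each unit = its share of the attribute / its share of the area.
    logs = [
        math.log2((v / total_value) / (a / total_area))
        for a, v in zip(areas, values)
    ]
    return statistics.pstdev(logs)
```

Suppose the values are exactly proportional to area, as in `distortion_spread([1, 2, 3], [10, 20, 30])`. Then the spread is 0. Concentrating most of the attribute in one of four equal units, as in `distortion_spread([1, 1, 1, 1], [1, 1, 1, 13])`, gives a spread of roughly 1.6. The log scale treats doubling and halving symmetrically, which is why it was chosen over raw ratios.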

Main contributions and future directions

The main contribution of this study is the development of the following three study hypotheses:


Hypothesis I

Diffusion-based cartograms helped uncover the underlying data structures through transformation and represented the attribute value better than conventional maps did. The application examples illustrate this. In a cartogram the attribute value of each dataset determines how the phenomena are represented, whereas a conventional map keeps the same representation regardless of the data. For example, cartograms represent small areas with small polygons better than conventional maps do (Figures 2 to 6). Areas are also adjusted in proportion to the data, so the data are represented effectively. Diffusion-based cartograms are therefore better tools than conventional maps for visual display where small geographic areas have high values. This may be essential for knowledge discovery, because cartograms take the attribute data and transform them accurately into useful, visually understandable information.

Hypothesis II

The use of diffusion-based cartograms allows for comprehensive exploration and identification of specific geographies that are normally difficult to observe visually on conventional maps.

Cartograms helped identify spatial clusters in the datasets that depict the ‘small belly, big belly effect’. Three major clusters were identified in each of the child lead poisoning and infant mortality datasets. The South Korean dataset had four visible clusters, and Uganda’s election outcome had two obvious major clusters.

Hypothesis III

Using diffusion-based cartograms together with conventional maps provides insights and multiple ways of probing the data. On-screen probing and data brushing, or a side-by-side display of the data (Figures 2 to 6), can facilitate the discovery of spatial patterns.

Acknowledgement

Special thanks and gratitude to the Korean Brain Pool Program, Project Number 091-5-3-0493, of the Korea Federation of Science and Technology under the Ministry of Science and Technology for funding support during my sabbatical leave at the School of Civil and Environmental Engineering, Yonsei University, where this work was further developed.

References

Brown, M.J., Shenassa, E., & Tips, N. (2001) Small Area Analysis of Risk for Childhood Lead Poisoning, Washington DC: Alliance to End Childhood Lead Poisoning.

Colizza, V., Barrat, A., Barthelemy, M., & Vespignani, A. (2005) The role of the airline transportation network in the prediction and predictability of global epidemics. Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 7, pp. 2015–2020.

Dietrich, K.N., Berger, O.G., Succop, P.A., Hammond, P.B., & Bornschein, R.L. (2004) The developmental consequences of low to moderate prenatal and postnatal lead exposure: intellectual attainment in the Cincinnati Lead Study Cohort following school entry. Neurotoxicology and Teratology, vol. 15, pp. 37–44.

Dorling, D. (2005) New maps of the world, its people, and their lives. Society of Cartographers Bulletin, vol. 39, pp. 35–40.

Gastner, M.T., & Newman, M.E.J. (2004) Diffusion-based method for producing density-equalizing maps. Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 20, pp. 7499–7504.

Guagliardo, M.F., & Ronzio, C.R. (2005) Is region of a country a useful variable for child health studies? Pediatrics, vol. 116, no. 6, pp. 1542–1545.

Kinabo, J., & Msuya, J., eds. (2002) The nutrition status in Tanzania: editorial note. South African Journal of Clinical Nutrition, vol. 15, no. 3.


Mackinlay, J. (1986) Automatic design of graphical presentations. PhD dissertation, Computer Science Department, Stanford University, Stan-CS-86-1038.

Marchant, T., Schellenberg, J.A., Nathan, R., Abdulla, S., Mukasa, O., Mshinda, H., & Lengeler, C. (2004) Anaemia in pregnancy and infant mortality in Tanzania. Tropical Medicine and International Health, vol. 9, no. 2, pp. 262–266.

Monmonier, M. (2006) Cartography: uncertainty, interventions, and dynamic display. Progress in Human Geography, vol. 30, no. 3, pp. 373–381.

Monmonier, M. (2007) Cartography: the multi-disciplinary pluralism of cartographic art, geospatial technology, and empirical scholarship. Progress in Human Geography, vol. 31, no. 3, pp. 371–379.

The 2002 Tanzania Population and Housing Census, United Republic of Tanzania (URT). Available at http://www.tanzania.go.tz/census/regions.htm. Accessed 20 December 2007.

Tobler, W. (2004) Thirty five years of computer cartograms. Annals of the Association of American Geographers, vol. 94, no. 1, pp. 58–73.

Weitzman, S., Clickner, R.P., & Albright, V.A. (1993) The prevalence of lead-based paint in housing: findings from the national survey. In: Breen, J.J., & Stroup, C.R., eds. Lead Poisoning: Exposure, Abatement, Regulation, CRC Press, Boca Raton, FL, pp. 3–12.

Wieland, S.C., Brownstein, J.S., Berger, B., & Mandl, K.D. (2007) Density-equalizing Euclidean minimum spanning trees for the detection of all disease cluster shapes. Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 22, pp. 9404–9409.
