

GeoAnalytics tools applied to large geospatial datasets

Mikael Jern Tobias Åström Sara Johansson

VITA – Visualization Technology and Applications, Linkoping University, Sweden


Abstract

Geovisual analytics focuses on finding location-related patterns and relationships. Many approaches exist, but they generally do not scale well to large spatial datasets. We propose three enhancements that facilitate scalable geovisual analytics of voluminous geospatial data, based on geographic mapping coordinated and linked with parallel coordinates (PC): 1) texture-based geographic mapping that exploits GPU-based rendering performance applied to overview + detail views, 2) statistical methods embedded in PC, 3) aggregated dynamic grid maps that integrate with PC. In this context, we have extended our previously introduced ‘GeoAnalytics’ Visualization (GAV) framework and class library with a novel implementation of the standard PC using an atomic layered component architecture that allows new ideas to be implemented and assessed without having to rewrite a complete functional PC component. We demonstrate our proposed enhancements applied to a large geospatial dataset containing more than 10,000 Swedish zip (postal) code regions, described by more than three million (X,Y) boundary coordinates and including many associated demographic and statistical attributes.

1. Introduction

Visual exploration and presentation of geospatial demographic, socioeconomic and environmental data can provide analysts with a strategic and sometimes competitive advantage. Integrated information and geovisualization methods, here referred to as geovisual analytics, focus on finding location-related patterns and relationships. Geovisual analytics tools are designed to support highly interactive exploratory spatial data analysis (ESDA) of large geospatial data. They enable analysts to look at geospatial data from multiple perspectives and explore complex relationships that may exist over a wide array of spatial and multivariate scales. Geovisual analytics research focuses in particular on integrating cartographic approaches with visual representations and interactive methods from information visualization and ESDA [1,2], complementing human perceptual skills.

Larger geospatial datasets challenge existing methods that have been developed and conceptually verified on small or moderate-sized spatial data, e.g. US states or counties, or countries of the world [1,2,3,4,10,20,27]. Most conventional geovisual analytics techniques do not scale well with these large volumes of spatial data. Performance for interactive geographic mapping must be immediate and preattentive. The often integrated parallel coordinates plot (PC) is cluttered with thousands of strings that obscure effective multivariate data analysis. In this context, we have further extended our previously introduced ‘GeoAnalytics’ Visualization (GAV) framework and component class library [13,14,15], based on the principles behind the Visual Analytics [17] and Geovisual Analytics research programs. We propose the following major enhancements to explore voluminous geospatial and multivariate data, facilitated by geographic mapping and coordinated with an enhanced PC implementation:

We introduce a new and scalable PC component based on an atomic layered component architecture that improves performance and scalability, allowing new ideas to be implemented and assessed without having to rewrite a PC;

Fundamental statistics for PC axes based on a novel “percentile handle” tool that dynamically calculates percentile values, including a filter mechanism and an enhanced histogram;

We suggest an extension to our previously published paper [21] using GPU-based rendering performance based on textures, optimized for preattentive processing together with scalability in handling highly interactive GeoAnalytics of large volumes of coloured shape maps. This technique is applied to the “Overview, Zoom and filter, and Details on demand” information-seeking mantra by allocating larger display areas to densely populated smaller regions with many potentially interesting subsets;

We extend a mature geographic mapping method, the “dynamic grid map” [22], a scalable technique suitable for large volumes of georeferenced data points. It is used here to shape pixel-oriented maps for interactively controlled, aggregated and relevance-driven ESDA. Overplotting that obscures data strings in the coordinated PC is prevented, and line thickness that indicates the level of relevance guides further drill-downs into densely populated regions.

We demonstrate our applied research through a close collaboration with domain experts from our research partner Statistics Sweden (SCB) [26], which has provided a voluminous spatial dataset comprising more than 10,000 Swedish zip code regions, based on 3 million (X,Y) boundary coordinates and including many associated demographic and statistical attributes. Analysts use these attributes to make better and more secure business and policy decisions important to governments, local authorities and commerce.

Figure 1. ZipView demonstrator with 10,000 zip code regions explored with three coordinated and linked views. Coloured attribute is University Education %. Global (context) shape map (left) with embedded navigation and two zoom views. Detailed (focus) zoomed shape map of South Sweden. New layered PC component with integrated statistics tools: dynamic percentile calculations and binned histograms. Regions below the 10th percentile (321 residents) for the attribute number of people are removed. Two thick green lines show global mean and median values. The region ‘Lund’ is highlighted and a text box shows drill-down data. A colour legend is based on the 33rd percentiles with dynamic sliders. A cluster shows the relationship between purchase power and university education.

2. Related work

Large geospatial data can cause serious problems for most geovisual analytics techniques. Considerable efforts have been made to enhance existing mapping methods. Keim et al. [5] describe an interesting spatial distortion approach that maps a large number of data points to unique positions on a pixel map to avoid overplotting. Larger display areas are allocated to dense regions with many potentially interesting subsets and smaller areas to less interesting items. Cartogram-based map distortion [6] is another common approach that trades shape against area by rescaling map regions according to selected attributes. These two distortion methods were evaluated, but the distortion effect made it difficult for analysts to spot exact locations. Also, the cartogram techniques do not solve the spatial data scalability issue. Abello describes a method, “Hierarchical Graph Maps” [7], based on visual indices (i.e. maps) that guide navigation and visualization. It adheres to the mantra “overview first, zoom and filter, then details on demand” and is used experimentally in the navigation of graphs defined on massive vertex sets.

ESDA of multivariate data is another important subject of many research papers [3,4,27]. Focus+Context techniques, for example, display information of particular interest in detail while the remaining data is shown in a smaller visual representation relative to the regions of interest (figure 1). Many of these proposed methods are conceptually verified on small or moderate-sized spatial data, e.g. US states or counties [1,2,3,4,27]; with massive spatial data they have difficulty delivering immediate interactive performance and suffer from overplotting.

Since the introduction of PC by Inselberg [18], many extensions to this technique have been proposed. In this paper, we introduce a novel layered component architecture that partitions the PC into smaller and more manageable “atomic” components. We also introduce a new method, based on a dynamic percentile handle, that extends previous embedded statistical approaches [19].

There are several tools in this research domain, e.g. GeoVISTA Studio [8], an open source Java-based visual programming environment that is commonly used for developing GeoVis applications. Another general EDA supporting system is CommonGIS [10]. VIS-STAMP [4] tools leverage visual and computational methods to search for space-time and multivariate patterns. The InfoVis Toolkit is available at [11], GGobi at [9], and VTK is the most used SciVis toolkit [12]. We demonstrate in this paper our latest research enhancements to the GAV Framework and class libraries [15], which facilitate the proposed scalable methods for large geospatial data.

Figure 2. GAV component level architecture. Atomic layered PC components (lines and axes, inquiry and filter, dynamic labels, background, focus & context, histogram, percentiles etc) are the building blocks for a functional higher-level and more advanced PC component with integrated statistics analysis. The enhanced PC component is in the next stage assembled together with scatter plot and scatter matrix components by an application developer into a multiple-linked application for multivariate data visualization.

3. The GeoAnalytics Framework

The GAV framework and class library [15] is the foundation for our GeoAnalytics research agenda. GAV is designed to significantly shorten the time and effort needed to develop state-of-the-art VA and GeoAnalytics applications. Through an atomic layered component architecture containing several hundred C# classes, GAV offers a wide range of visual representations, from the simple scatter plot to volume visualization (figure 3). A component also incorporates versatile interaction methods drawn from many data visualization research areas. In this paper, we extend previous work [13,14] by introducing new means for a developer to extend and further customize some of the popular functional components by breaking them into lower-level “atomic” components (figures 2 and 4).

Figure 3. Example of GAV components

3.1. Layered components architecture

The GAV open architecture [15], based on a novel atomic layered component approach, can be used by researchers to create new or improved versions of existing components so that ideas can be tried out rapidly in a fully functional environment. Customized, tailor-made and task-oriented high-level GeoAnalytics functional components are assembled from “atomic” level components. The component way of thinking enables a shorter development time, scalability, extensibility, reusability and robustness of components. A GAV application is developed in three levels of components (figure 2), identified according to their nature in the context of object-oriented programming:

Atomic level components are those very low-level, high-performance and typically underlying data structure dependent components that constitute the core of high-profile software. Lower-level atomic components are used for developing functional (high-level) advanced components. Atomic components from different vendors and the public domain can be used in assembling functional and application components. A rich set of component resources with fine-grained control allows precise matching to user tasks.

Functional level components are the middle-tier components that are constituted by the combination of one or more atomic components. These components typically implement the general functionality of high-profile software. This is the case, for example, for a parallel coordinates component (figure 2).

The Application is the level that is constituted by the combination of one or more functional components. This level is end-user accessible and typically implements one or more of the functionalities in a multiple-linked view environment.

3.2. Atomic based structure

In the layered GAV functional component, the foundation of the component is always the same. It consists of a GAV Component object, which handles the various atomic layers (figure 14) added to the structure as well as all connections to the rest of GAV. Layers within a component commonly process the same data, so it is sometimes preferable to move that processing away from the layers and onto a shared structure. In the PC case this structure is called a PC Model; it handles the calculations from input data to lines, which are used by the line, percentile and histogram layers. This approach speeds up calculations and conserves memory, as duplicates are unnecessary. Dividing a component into layers demands special solutions from the fundamental parts of the component code, mainly the rendering and interaction, as can be seen in figure 4.
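The relationship between a component, its layers and the shared model can be sketched in a few lines of Python. This is an illustration only, not the GAV C# API: the class names `PCModel`, `Layer` and `Component` and all method names are hypothetical, and "rendering" is reduced to returning a string.

```python
class PCModel:
    """Shared structure: converts raw attribute rows to normalized polylines once."""

    def __init__(self, rows):
        self.rows = rows
        self._lines = None  # computed lazily, then reused by every layer

    def lines(self):
        if self._lines is None:
            columns = list(zip(*self.rows))
            lows = [min(c) for c in columns]
            highs = [max(c) for c in columns]
            # Each row becomes one polyline with a position in [0, 1] per axis.
            self._lines = [
                [(v - lo) / (hi - lo) if hi > lo else 0.5
                 for v, lo, hi in zip(row, lows, highs)]
                for row in self.rows
            ]
        return self._lines


class Layer:
    """An atomic layer that reads from the shared model instead of copying data."""

    def __init__(self, name, model):
        self.name = name
        self.model = model

    def render(self):
        return f"{self.name}: {len(self.model.lines())} lines"


class Component:
    """Holds the layers that together form one functional component."""

    def __init__(self):
        self.layers = []

    def add_layer(self, layer):
        self.layers.append(layer)

    def render(self):
        return [layer.render() for layer in self.layers]
```

Because every layer holds a reference to the same model object, the polylines are computed once and shared, which is the memory and speed benefit the paragraph above describes.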

3.3. Rendering and layer caching

This new architecture creates a demand for a more advanced rendering pipeline and at the same time opens up new opportunities due to the separation of layers. The rendering of a GAV Component is controlled by a view manager, which creates all necessary connections to DirectX. Controls for rendering each separate layer are set up by the GAV Component, which also declares DirectX state variables such as alpha blending, allowing transparency between layers. As for rendering order, the first layer added is rendered first, which makes it the bottom layer.

Figure 4. The standard PC above was assembled from two atomic components and the enhanced PC below from 6 atomic components including the dynamic percentile calculation layer.

The layered architecture allows for significant performance improvements for large datasets through its ability to cache layers between rendering calls. The idea is based on the separated rendering of layers, which allows a choice of which layer to render and when. The PC, for example, when viewing a large dataset, has a very high ratio between pixels rendered and actual pixels on the screen. This is very hard to avoid without compromising the data and leads to the rendering performance being limited by the graphics card’s fill rate, the speed at which pixels can be drawn. The proposed solution to still achieve interactive frame rates is to render a layer only when something has changed since the last rendering. Rendering a layer to a texture and using this texture during subsequent renders, instead of re-drawing the whole layer, greatly reduces the number of operations needed. Caching a layer like this is a slightly slower operation than just rendering the layer, so it should only be used with layers that do not change often and have a relatively long rendering time. In the PC line layer case, caching is most successfully coupled with suspending rendering during filtering. This means that the line layer will not update while the filter sliders are moving, which removes all possible lag, no matter how many lines are shown.
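The caching rule described above (re-draw a layer only when it has changed, otherwise reuse the stored result, and hold updates while a filter slider is moving) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the GAV DirectX implementation: `CachedLayer`, `invalidate`, `suspended` and `render_frame` are hypothetical names, and a plain Python object stands in for the render-target texture.

```python
class CachedLayer:
    """A layer that re-draws only when marked dirty and not suspended."""

    def __init__(self, name, draw):
        self.name = name
        self._draw = draw       # expensive draw function (hypothetical)
        self._cache = None      # stands in for the texture the layer is rendered to
        self.dirty = True       # nothing cached yet, so the first render must draw
        self.suspended = False  # True while, e.g., a filter slider is being dragged
        self.draw_calls = 0

    def invalidate(self):
        """Mark the layer as changed so the next unsuspended render re-draws it."""
        self.dirty = True

    def render(self):
        if self.dirty and not self.suspended:
            self._cache = self._draw()
            self.draw_calls += 1
            self.dirty = False
        # Otherwise the cached result is reused without re-drawing.
        return self._cache


def render_frame(layers):
    """Render layers in the order they were added; the first one is the bottom layer."""
    return [layer.render() for layer in layers]
```

In this sketch a slider drag would set `suspended = True` and call `invalidate()` on every move; the line layer keeps showing its cached image until the drag ends and `suspended` is cleared, at which point a single re-draw brings it up to date.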

3.5. GAV Summary

The components are developed in C#, based on Microsoft’s low-level DirectX graphics library, and fulfil many generic requirements for a VA application design framework, such as:

Shorten development time by utilising already developed and assessed components;

Appropriate for multiple-linked views applications;

Mechanism for integrating external components;

A 3D data model for spatial-temporal and multivariate attribute data exploration;

Texture colour map rendering and other GPU graphics optimization using DirectX;

Visual space-time and multivariate inquiry tools;

Component-embedded interactions including brush, pick, highlight, filter, dynamic sliders, view coordination, focus & context and many special interaction facilities;

Framework for the creation of both user components and improved versions of existing components so that ideas can be tried out rapidly in a fully functional useful environment;

Integrated mechanism for saving and packaging the results of a VA reasoning process.

4. Embedded Statistical Methods in PC

In our first approach to interacting with large spatial datasets, we provide familiar interactive statistical indicators. Four atomic components are presented: “focus+context”, “mean and median”, “histogram” and a novel “percentile handle”. They extend [25], combine standard PC techniques, and are potential building blocks for ESDA utilities in a tailor-made extended PC component.

4.1. Histogram and Focus+Context layers

A variant of the PC embedded histogram technique, similar to Hauser et al. [19], is here implemented as an atomic PC layer. Histograms attached to each PC axis (figure 5) show frequency information by splitting the axes into a user-defined number of equally high rectangular areas (bins). The width of a rectangle indicates the frequency of regions intersecting that bin: the more regions within an area, the wider the rectangle, while a bin with no polyline intersections is not visible.

The filter operation in figure 5 is presented using two differently coloured and sized bins. The width of the light grey bins represents the total number of regions intersecting that area, whereas the dark grey bins represent the number of non-filtered regions intersecting the area. Although a PC with range sliders is useful for filtering out data along the axes, it is limited to removing data above the maximum and below the minimum values. To further increase the possibilities of filtering in PC, an additional method has been implemented based on the histogram bins. By simple interaction, regions within a selected bin can be removed. In figure 5, polylines intersecting the 2nd, 3rd and 4th histogram bins for purchase power are filtered out. The opposite approach keeps the regions of the selected bin and instead removes the regions of all other bins along that attribute axis. These atomic components are used to discover location-related patterns in an overplotted PC display.
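The per-axis binning and the two bin-based filter modes can be sketched in Python as below. This is an illustrative sketch, not the GAV layer: the function names are hypothetical, bins are assumed to be equal-width over the axis range from minimum to maximum, and the maximum value is assigned to the last bin.

```python
def bin_index(value, lo, hi, n_bins):
    """Index of the equal-width bin on [lo, hi] that contains value."""
    if hi == lo:
        return 0
    i = int((value - lo) / (hi - lo) * n_bins)
    return min(i, n_bins - 1)  # the maximum value falls into the last bin


def histogram(values, n_bins):
    """Count how many regions fall into each bin along one PC axis."""
    lo, hi = min(values), max(values)
    counts = [0] * n_bins
    for v in values:
        counts[bin_index(v, lo, hi, n_bins)] += 1
    return counts


def filter_by_bins(values, n_bins, selected, keep_selected=False):
    """Return the indices of regions that survive a bin-based filter.

    keep_selected=False removes regions in the selected bins;
    keep_selected=True keeps only those regions (the opposite approach).
    """
    lo, hi = min(values), max(values)
    return [
        i for i, v in enumerate(values)
        if (bin_index(v, lo, hi, n_bins) in selected) == keep_selected
    ]
```

Running `histogram` once on all values and once on the values that survive the filter would give the two bar widths (light grey and dark grey) described for figure 5.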

Figure 5. Combined Histogram and Focus atomic layers attached to PC axes.

The focus+context atomic layer (figure 5 right) provides means to alter the scale (focus) through dynamically moving min and max value sliders. This action expands the focus area (between min=30 and max=50) and pushes a large number of polylines (values not in focus) to the context area. Neither the histogram nor the focus+context mechanisms are new to the PC, but their implementation as atomic layered components is novel and improves scalability. Removing uninteresting polylines from the focus area frees up space, which reduces clutter and improves detail analysis. Figure 5 right also shows that the histogram supports frequency calculation in the focus area.
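One way to picture the focus scaling is as a piecewise-linear axis mapping: the focus range is stretched over most of the axis while the values outside it are compressed into narrow context bands at the two ends. The Python sketch below makes that assumption explicit; the paper does not state how the context areas are scaled, so the `context_share` parameter (the fraction of the axis given to each context band) and the function name are hypothetical.

```python
def focus_context_position(value, lo, hi, focus_min, focus_max, context_share=0.1):
    """Map a value to an axis position in [0, 1] with an expanded focus range.

    The focus range [focus_min, focus_max] occupies 1 - 2 * context_share of
    the axis; values below and above it share the two context bands.
    """
    if not (lo <= focus_min < focus_max <= hi):
        raise ValueError("expected lo <= focus_min < focus_max <= hi")

    def interp(v, a, b, pa, pb):
        # Linear interpolation of v from [a, b] to [pa, pb].
        return pa if b == a else pa + (v - a) / (b - a) * (pb - pa)

    if value < focus_min:
        return interp(value, lo, focus_min, 0.0, context_share)
    if value > focus_max:
        return interp(value, focus_max, hi, 1.0 - context_share, 1.0)
    return interp(value, focus_min, focus_max, context_share, 1.0 - context_share)
```

With an axis from 0 to 100 and the focus set to 30..50 as in figure 5, the values 40 and 45 end up about 0.2 of the axis apart instead of 0.05 on a linear axis, which is the extra room for detail analysis that the text refers to.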

4.2. Percentile handle layer

Another statistical indicator used to facilitate the overview and understanding of data distribution, and especially outlier detection, is percentile calculation. A “percentile handle” is interactively moved along a PC axis (figure 6), dynamically calculating the percentile and the associated percentile value. Given a range limited by an upper and a lower percentile, filter operations are performed either inside or outside that area, controlled by the small triangles on the left side.
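The calculations behind such a handle can be sketched in Python as follows. The paper does not state which percentile definition GAV uses, so this sketch assumes linear interpolation between the closest ranks; all function names are hypothetical.

```python
def percentile_value(values, p):
    """Value at percentile p (0..100), interpolating between closest ranks."""
    ordered = sorted(values)
    last = len(ordered) - 1
    rank = p * last / 100
    low = int(rank)
    high = min(low + 1, last)
    return ordered[low] + (rank - low) * (ordered[high] - ordered[low])


def percentile_of(values, x):
    """Percentage of values that are less than or equal to x."""
    return 100 * sum(v <= x for v in values) / len(values)


def percentile_filter(values, lower_p, upper_p, keep_inside=True):
    """Indices of values kept when filtering on a percentile range.

    keep_inside=True removes outliers outside the range; keep_inside=False
    keeps the outliers and removes everything between the two handles.
    """
    lo = percentile_value(values, lower_p)
    hi = percentile_value(values, upper_p)
    return [i for i, v in enumerate(values) if (lo <= v <= hi) == keep_inside]
```

Dragging a handle corresponds to calling `percentile_of` with the value under the handle; the two filter modes correspond to panels B and C of figure 6.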

Figure 6. Three images (A-C) demonstrate the “percentile handle” tool. A: 5th and 95th percentile representation; B: Outliers below the 5th and above the 95th percentile values are removed for mpg (the small triangles on the handle control the filter operation); C: Outliers are maintained and the values between the two percentile handles are removed for both variables.

In figure 1, the percentile handles are positioned at the 5th and 95th percentiles. Regions with a population less than 321 residents (10th percentile) are removed in the number of people attribute.

Figure 7. Dynamic colour legend. The colour scheme is divided into four ranges, separated at the values of the 25th, the 50th and the 75th percentile.

The same percentile calculation is used in the

dynamic colour legend component. This provides a

better distribution of colours related to the data and

is based on an accurate statistical method.
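A percentile-based legend of this kind can be sketched in Python. The break points follow figure 7 (25th, 50th and 75th percentiles); the interpolation rule and the function names are assumptions made for the example.

```python
from bisect import bisect_right


def percentile_breaks(values, percentiles=(25, 50, 75)):
    """Class boundaries placed at the given percentiles of the data."""
    ordered = sorted(values)
    last = len(ordered) - 1
    breaks = []
    for p in percentiles:
        rank = p * last / 100
        low = int(rank)
        high = min(low + 1, last)
        breaks.append(ordered[low] + (rank - low) * (ordered[high] - ordered[low]))
    return breaks


def colour_class(value, breaks):
    """Index of the colour range (0 .. len(breaks)) that a value belongs to."""
    return bisect_right(breaks, value)
```

For a skewed attribute such as `[1, 2, 3, 4, 1000]`, four equal-width ranges would place the first four values in the same colour class, whereas percentile breaks spread them over several classes; this is the better distribution of colours mentioned above.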

5. Accelerating Graphics for Mapping

Atomic layers help improve performance, but there are other visualization scenarios where special solutions are needed. Interactive mapping of 10,000 Swedish zip code regions based on 3 million (X,Y) coordinates is such an application scenario.

The simplest way to draw the map regions would be to render them one by one, changing colour when an attribute value changes or a dynamic colour legend requires such an event. With this technique, interactive performance is acceptable for a small number of regions, but when handling a large number of highly detailed areas, the performance becomes unacceptable due to the large number of render operations required. A zip code map with 10,000 regions and a huge number of coordinates can generate several million triangles; add to this the number of polylines to be drawn to represent region borders and it turns out to be a tough assignment even for a modern computer. Without any special solution applied, the number of coloured map regions should be limited to a thousand regions per view.

Many regions will be smaller than or close to the size of one pixel, and the human eye cannot distinguish such small regions, so even if we could add more triangles to the screen we would not be able to see them. This works to our advantage, as we can reduce the level-of-detail (LOD) without degrading the visuals by pre-calculating maps with different levels of resolution and thus controlling the number of triangles that must be rendered.

Figure 8. Three LODs based on 10,000 regions: Sweden, municipality and central Stockholm

Reducing LOD is not enough, as the user still needs access to the fully detailed map when zooming in on interesting areas. The proposed solution extends our previously described research on texture mapping and adaptive rendering, so that only the regions the user actually sees are passed into the render loop and the remaining regions are ignored. The problem with multiple render operations still remains, but can be solved by adding all triangles to the same vertex structure and limiting rendering to a single render operation. This, however, creates another problem, as we can no longer change colours for each individual region.

The proposal is to use a pixel shader to colour each area separately. We move the colour information away from the vertex structure and onto something smaller that is faster to change. To do this, we assign an index to all vertices in a region and store it in the red and green components of the colour data in each vertex. The index is then used as texture coordinates to fetch a colour value from a texture passed to the shader. This means that changing the colour of several million vertices comes down to changing a simple 256 by 256 texture. The size comes from the available number of colour levels in each channel (256) and limits us to 256 × 256 = 65,536 data indexes, sufficient for almost any application. The possibility exists to expand this by using the blue component as well, but that should seldom be needed. This kind of shading technique can also be applied to other visualization methods.
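The index encoding can be illustrated on the CPU with a short Python sketch. It mimics the lookup the pixel shader performs but is not shader code: the function names are hypothetical, and the choice of red as the low byte and green as the high byte of the index is an assumption, since the paper only says that both components are used.

```python
def encode_region_index(index):
    """Split a region index into the (red, green) bytes stored in each vertex."""
    if not 0 <= index < 256 * 256:
        raise ValueError("index must fit in two 8-bit channels (0..65535)")
    return index % 256, index // 256  # assumed layout: red = low byte, green = high byte


def make_palette(size=256, default=(0, 0, 0)):
    """A size x size lookup texture holding one RGB colour per region index."""
    return [[default] * size for _ in range(size)]


def set_region_colour(palette, index, rgb):
    """Recolour a whole region by writing a single texel."""
    red, green = encode_region_index(index)
    palette[green][red] = rgb


def shade(vertex_rg, palette):
    """What the shader does per pixel: use the stored bytes as texture coordinates."""
    red, green = vertex_rg
    return palette[green][red]
```

The vertex data is written once; afterwards a colour change for any region, however many vertices it has, is one call to `set_region_colour`, which mirrors the point that recolouring millions of vertices comes down to updating a 256 by 256 texture.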

6. Dynamic Grid Maps

The coloured 2D grid map was introduced by the author [22,23,24] 28 years ago and is used in this third and final approach to deal with our large geospatial dataset, coordinated with the new layered PC component.

Figure 9. The spatial data set contains pairs of (X,Y) coordinates that represent a calculated centre point for the original zip code region. The attached attribute represents the number of residents in the region and is used to determine an importance (weight) value for each spatial data point. The second multivariate data is the same as above and contains the selected attributes for each zip code region.

Generally, 2D gridding is a process during which

the original georeferenced (X,Y) scatter data points are

converted into a frequency-based grid (bin) map. Every

grid cell is assigned an occupancy value based on, for

example, the calculated number of people living in that

defined grid area. The gridding transformation

preserves the spatial distribution characteristics of the

original data and replaces it with an aggregated grid

representation with no overlap.

Figure 10. Spatial (x,y) data are aggregated into a grid map and coloured according to the aggregated value.

We first convert the original 10,000 zip code regions, represented by millions of shape coordinates, into a single (x,y) centre-point representation per region (figure 9). This new spatial data is then transformed into an aggregated and more suitably sized grid map (figure 10). We exploit the useful properties of the aggregated grid representation to achieve a scalable representation of our large geospatial dataset that considerably speeds up interactive performance and also reduces overplotting in the PC (figure 12). The 10,000 original PC strings are reduced to 50-1000 strings depending on the grid size, which is dynamically controlled through a range slider with immediate response from the linked grid map and PC views.

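Based on the georeferenced point dataset and associated demographic attributes $\mathrm{Region}(x, y, a_k)$, where $a_k$ is attribute number $k$ for that region, we construct a 2D grid map:

$$\mathrm{Grid}(x_i, y_j, \mathrm{Aggr}\,a_k) = \{\, (x_i,\ i = 1, 2, \ldots, n_x);\ (y_j,\ j = 1, 2, \ldots, n_y);\ (\mathrm{Aggr}\,a_k,\ k = 1, 2, \ldots, n_k) \,\}$$

with $n_x \cdot n_y$ grid cells and $n_k$ aggregated attributes. The aggregated value of attribute $k$ for a grid cell is the population-weighted mean over the regions in that cell:

$$\mathrm{Aggr}\,a_k = \frac{\sum_{j=1}^{n} a_{kj}\,\mathrm{pop}_j}{\sum_{j=1}^{n} \mathrm{pop}_j}$$

$\mathrm{Aggr}\,a_k$: aggregated attribute value for attribute number $k$ for a defined grid cell.

$n$: the number of regions in the grid cell.

$a_{kj}$: attribute number $k$ for region $j$.

$\mathrm{pop}_j$: total population in region $j$.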

For each region within a defined grid cell, the selected attribute value is multiplied by the region’s number of people (or other weight-driving data such as an age group); these products are summed and then divided by the total population of that grid cell. Attributes are attached to a given location in space, and the calculated attribute value is an aggregated attribute of the entire space it occupies.
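The population-weighted aggregation can be sketched in Python as follows. The input layout (one tuple of centre point, attribute value and population per region), the square cells addressed by integer division and the function name are assumptions made for the example, not the GAV data model.

```python
from collections import defaultdict


def aggregate_grid(regions, cell_size):
    """Population-weighted mean of one attribute per grid cell.

    regions: iterable of (x, y, attribute, population) tuples, one per
             zip code centre point (assumed layout).
    Returns a dict mapping (column, row) cell indices to the aggregated value.
    """
    weighted_sum = defaultdict(float)
    population = defaultdict(float)
    for x, y, attribute, pop in regions:
        cell = (int(x // cell_size), int(y // cell_size))
        weighted_sum[cell] += attribute * pop   # attribute times number of people
        population[cell] += pop                 # total population of the cell
    return {
        cell: weighted_sum[cell] / population[cell]
        for cell in weighted_sum
        if population[cell] > 0
    }
```

Each entry of the returned dictionary corresponds to one string in the PC, so increasing `cell_size` directly reduces the number of strings drawn.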

An aggregated grid element represents the calculated mean (or, if requested, min/max) value of a selected attribute for the zip regions that fall inside the given grid cell. Additional conditions can be given through the attached PC. For example, in figure 11, a dynamic filter operation defines the condition that a region must have at least 1,500 citizens to be included in the aggregated mean value representing the grid cell (12 x 12 km). An original zip code region with attached attributes is represented in the PC by a string across all axes (figure 1); in this new context, a single aggregated grid cell can similarly be represented by such a string. By dynamically changing the grid size, the number of strings in the PC can be dramatically reduced, which avoids overplotting and makes interesting patterns more visible.

Figure 11. Filtered PC and aggregated grid map (12 x 12 km). Line thickness in PC represents population density; the selected attribute is purchase power. The highlighted region Billdal (outside Gothenburg) shows a strong correlation between high purchase power, low age group 65+ and high education level.

The result of a dynamic grid size test based on four selected demographic attributes (purchase power, age group 0-15 years, age group 65+ and university education) is shown in figure 12. Even when limited to only 64 grid cells, a strong correlation between these four attributes is demonstrated. After an interesting discovery, the analyst can select one or more relevant grid cells and drill down into more detailed information (figure 13).
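A drill-down of this kind needs a mapping from each grid cell back to the original regions it aggregates. The sketch below shows one way to keep such an index in Python; it is an assumption about how this could be done, not a description of the GAV or ZipGrid code, and the names `build_cell_index` and `drill_down`, the region identifiers and the coordinates are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Cell = Tuple[int, int]


def build_cell_index(centroids: Dict[str, Tuple[float, float]],
                     cell_km: float) -> Dict[Cell, List[str]]:
    """Map every grid cell to the ids of the regions whose centroid lies in it."""
    index: Dict[Cell, List[str]] = defaultdict(list)
    for region_id, (x_km, y_km) in centroids.items():
        index[(int(x_km // cell_km), int(y_km // cell_km))].append(region_id)
    return dict(index)


def drill_down(index: Dict[Cell, List[str]], selected: Iterable[Cell]) -> List[str]:
    """Return the original region ids behind the selected grid cells."""
    return [rid for cell in selected for rid in index.get(cell, [])]


# Made-up region ids and centroid coordinates in km.
centroids = {"A": (1.0, 2.0), "B": (4.5, 5.0), "C": (7.0, 1.0)}
index = build_cell_index(centroids, cell_km=6.0)
detail = drill_down(index, [(0, 0)])
print(detail)
```

The returned identifiers can then be used to populate the focused map view and the PC with the original regions, while the grid map keeps showing the aggregated context.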

Figure 12. Four levels of aggregated grid maps (64, 221, 612 and 1490 grid cells) show high purchase power in red (bottom: the original 10,000 regions). Each grid cell is represented by a string in the PC and coloured according to the aggregated map attribute value. 64 aggregated grid cells are enough to see a strong correlation between all four attributes.

Figure 13. Overview, zoom and filter, and details on demand are demonstrated. A drill-down operation is requested for an interesting grid cell in the south of Sweden. The original 290 zip code regions available in this densely populated, aggregated 6 x 6 km area are displayed in a focused map view, while the context is maintained in the grid map and a small navigation view. Overplotting is also avoided in the PC. Note that focus scaling is applied to two PC attribute axes (purchase power and university education) to reduce overplotting. Green and yellow lines represent mean values. An interesting zip code region, “Malmö”, with 262 people is highlighted.

7. Conclusions

We propose three enhanced methods that, in isolation or integrated, can contribute to geovisual analytics of voluminous geospatial and multivariate data: 1) statistical methods embedded in PC, 2) texture-based geographic mapping that exploits GPU-based rendering performance applied to overview + detail views, 3) aggregated dynamic grid maps that integrate with PC. Our main contributions are:

A novel layered component architecture that partitions the PC into smaller and more manageable “atomic” components (figure 14);

A novel dynamic percentile handle tool;

Dynamic grid maps with coordinated PC;

A “relevance”-driven aggregated gridding function that gives more emphasis to data with, for example, a higher elderly population;

Involvement of domain experts providing tasks, data and evaluation to the research project.

We have demonstrated our applied research results in collaboration with Statistics Sweden [26]. A successful, scalable geospatial data aggregation is achieved by converting the original 10,000 spatial data items to only 64 aggregated grid regions with our aggregation grid map technique. These 64 strings show a strong correlation between four attributes in the attached PC.

Figure 14. Novel layered component architecture that partitions the PC into smaller and more

manageable “atomic” components

Future work will include improvements to the demonstrators and further evaluation together with our partner. Additional atomic layered components, such as the scatter plot, will be added to GAV. Future research will also include the time dimension in our zip code data set and thus raise the currently used 500,000 attribute values, which represent one time step, to a new and challenging level. Finally, we recommend a visit to our GeoZip web site [16], where colour images, a video presentation and the grid map prototype ZipGrid can be downloaded and reviewed. The GAV framework is publicly available at [15].

Acknowledgements

This applied research study was supported and monitored by Statistics Sweden [26] (Marie Haldorson RM/RU-Ö). The research is also supported, in part, by funding from the Swedish Knowledge Foundation (KK-Stiftelsen). Many thanks to our colleagues in the National Center for Visual Analytics (NCVA) at ITN, Linkoping University. Special thanks to everyone in the GAV development team.

10. References

[1] J. Roberts, “Exploratory visualization with multiple linked views.” In A. MacEachren, M.-J. Kraak, and J. Dykes, editors, Exploring Geovisualization. Amsterdam: Elsevier, December 2004.

[2] D. Guo, M. Gahegan, A. MacEachren and B. Zhou, “Multivariate Analysis and Geovisualization with an Integrated Geographic Knowledge Discovery Approach”, Cartography and Geographic Information Science, Vol. 32, No. 2, 2005, pp. 113-132.

[3] J. Chen, A.M. MacEachren and D. Guo, “Visual

Inquiry Toolkit - An Integrated Approach for

Exploring and Interpreting Space-Time, Multivariate

Patterns”, Proceedings, AutoCarto 2006, Vancouver.

[4] D. Guo, J. Chen, A.M. MacEachren and K. Liao, “A visualization system for space-time and multivariate patterns (VIS-STAMP),” IEEE Transactions on Visualization and Computer Graphics, Vol. 12, No. 6, 2006.

[5] D. Keim, C. Panse, M. Sips and S. North, “Visual Data Mining in Large Geospatial Point Sets”, IEEE Computer Graphics and Applications, Sep/Oct 2004.

[6] D. Keim, S. North and C. Panse, “CartoDraw: A Fast Algorithm for Generating Contiguous Cartograms”, IEEE Trans. Visualization and Computer Graphics (TVCG), Vol. 10, No. 1, 2004, pp. 95-110.

[7] J. Abello, “Hierarchical graph maps”, Computers & Graphics, 28(3):345-359, 2004.

[8] M. Takatsuka and M. Gahegan, “GeoVISTA Studio: a codeless visual programming environment for geoscientific data analysis and visualization.” The Journal of Computers and Geosciences, 28(10):1131–1144, November 2002, http://www.geovistastudio.psu.edu.

[9] http://www.ggobi.org/

[10] H. Voss, N. Andrienko, and G. Andrienko, “CommonGIS - common access to geographically referenced data.” ERCIM News, (41):44–46, 2000. http://www.commongis.de

[11] J. Fekete, C. Plaisant, “InfoVis Toolkit.” In Proceedings of the 10th IEEE Symposium on Information Visualization (InfoVis ’04), pp. 167-174.

[12] W. Schroeder, K. Martin, and M. Lorensen, “The

Visualization Toolkit, 2nd ed.” Prentice Hall PTR: New

Jersey, 1998.

[13] Jern, Johansson, Johansson, Franzén, “The GAV

Toolkit for Multiple Linked Views.” Reviewed

proceedings, CMV07, Zurich, July 2007, published by

IEEE Computer Society, ISBN 0-7695-2903-8.

[14] Jern, Franzén, “Integrating GeoVis and InfoVis

components.” Reviewed proceedings, IV07, Zurich,

July 2007, published by IEEE Computer Society,

ISBN 0-7695-2900-3.

[15] GAV Framework: http://vita.itn.liu.se/GAV

[16] http://vita.itn.liu.se/gav/GeoZip

[17] Thomas, J & Cook, K. 2005. “Illuminating the Path:

The Research and Development Agenda for Visual

Analytics.”

[18] A. Inselberg. “The plane with parallel coordinates”,

The Visual Computer 1(2), pp 69-92, 1985.

[19] H. Hauser, F. Ledermann, and H. Doleisch. Angular

brushing of extended parallel coordinates. In

Proceedings of IEEE Symposium on Information

Visualization 2002, pages 127–130, Boston,

Massachusetts, Oct. 2002. IEEE Computer Society.

[20] G. Andrienko, N. Andrienko, R. Fischer, V. Mues, and

A. Schuck. “The parallel coordinate plot in action:

design and use for geographic visualization.”

International Journal of Geographical Information

Science, 20(10) :1149–1171, November 2006.

[21] Johansson, Ljung, Jern, Cooper: “Revealing structure

within clustered parallel coordinates display”, IEEE

Symposium on Information Visualization 2005,

October 23-25, Minneapolis, MN, USA.

[22] Jern & Bladh: “A Color Plotter System and its Applications in Geoscience”, Geoscience and Remote Sensing, IEEE Transactions, Volume GE-18, Number 3, pages 256-263, July 1980.

[23] Jern: “Raster Graphics Approach in Mapping”, Computers & Graphics, Pergamon Press, Volume 9, Number 4, 1985, pages 373-381.

[24] Jern: “Thematic Mapping”, Eurographics '85 Conference, Proceedings, 1985.

[25] Johansson, Jern: “GeoAnalytics Visual Inquiry and

Filtering Tools in Parallel Coordinates Plot”,

Reviewed proceedings, Seattle, Nov 2007, ACM

GIS2007

[26] Statistics Sweden (SCB) http://www.scb.se

[27] G. Andrienko and N. Andrienko. Visual exploration

of the spatial distribution of temporal behaviors. In

Proceedings of the International Conference on

Information Visualisation, pages 799–806. IEEE

Computer Society, 2005.