Article
Information Visualization 0(0) 1–25. © The Author(s) 2013. Reprints and permissions: sagepub.co.uk/journalsPermissions.nav. DOI: 10.1177/1473871613487089. ivi.sagepub.com
Storyboarding for visual analytics
Rick Walker1, Llyr ap Cenydd2, Serban Pop2, Helen C Miles2, Chris J Hughes2, William J Teahan2 and Jonathan C Roberts2
Abstract
Analysts wish to explore different hypotheses, organize their thoughts into visual narratives and present their findings. Some developers have used algorithms to ascertain key events from their data, while others have visualized different states of their exploration and utilized free-form canvases to enable the users to develop their thoughts. What is required is a visual layout strategy that summarizes specific events and allows users to lay out the story in a structured way. We propose the use of the concept of ‘storyboarding’ for visual analytics. In film production, storyboarding techniques enable film directors and those working on the film to pre-visualize the shots and evaluate potential problems. We present six principles of storyboarding for visual analytics: composition, viewpoints, transition, annotability, interactivity and separability. We use these principles to develop epSpread, which we apply to the VAST Challenge 2011 microblogging data set and to Twitter data from the 2012 Olympic Games. We present technical challenges and design decisions for developing the epSpread storyboarding visual analytics tool that demonstrate the effectiveness of our design and discuss lessons learnt with the storyboarding method.
Keywords
Analysis tool, visual analytics, coordinated views, event-based data, presentation, visual analysis, trends, text mining, tag cloud, streamgraph
Introduction
There are many situations where analysts wish to
explore or explain key events of an unfolding story.
These ‘stories’ occur either because they represent a
human-based narrative or are created by an evaluator
who is reasoning about an argument. Importantly,
it is often through the telling of a story (or at least the
creation process) that issues or logical errors with the
discourse are ironed out. The narrative inquiry helps us
make sense of the world. Bell writes, ‘Narrative inquiry
rests on the epistemological assumption that we as
human beings make sense of random experience by the
imposition of story structures’.1
Accounting for and evaluating how events evolve
through time are important aspects of both the investi-
gation and dissemination processes in visual analytics
(VA). Both the foraging and sense-making loops of
Pirolli and Card’s2 model and especially the data
frame model of Klein et al.3 collate sets of structures
together. These structures act as principal events.
Some researchers have investigated how to algorithmically ascertain key events from data, others have visualized different states of their exploration, and some have used ad hoc methods (unstructured scratch pads for arranging the user's thoughts). It is therefore surprising that few researchers have focused on visualizing these specific events in a more structured way. Ideally, what is required for VA is a visual layout mechanism that summarizes specific
1Middlesex University, London, UK
2Bangor University, Bangor, UK

Corresponding author: Rick Walker, Middlesex University, The Burroughs, Hendon, London NW4 4BT, UK. Email: [email protected]

Downloaded from ivi.sagepub.com at Bangor University on March 15, 2015
events and allows users to lay out the story in a graphical way, while also enabling interaction so that the user can explore different visual scenarios. To achieve this, we propose using the concept of ‘storyboarding’ for VA.
Storyboards are used in film production as a pre-
visualization technique, in order to help the director
plan the film and try out different scene ideas and
orderings of camera shots. They represent a sequence
of (usually) sketched drawings, each on an individual sheet (or panel), pinned on a bulletin board for
the director, screenwriters and colleagues to view and
discuss. In this article, we develop this idea for VA
investigation. We therefore draw upon the principles
rather than mimicking their form exactly. In our pro-
totype system, story panels consist of a number of dif-
ferent visualization and analysis components suitable
for the analysis of microblogging data. Multiple panels
can then be linked together on a timeline together with
annotation to tell the story of a scenario in a way that
also highlights the provenance of each section of the
analysis and allows it to be reproduced. This story-
board design metaphor is discussed further in section
‘Principles of storyboarding for VA’.
Some data sets inherently lend themselves to story-
boarding. In fact, the growth of microblogging sites
presents an enormous opportunity to analyse events as
they unfold and make use of the messages from differ-
ent perspectives. Indeed, the short latency of these
microblog ‘events’ can be used by disaster response
agencies to help inform and direct effort effectively.4
There is a wealth of information that is contained
within these blogs, and the benefits of the analysis are
huge. Researchers have access to real-time text com-
mentaries at live events, from multiple perspectives.
They give insight into the user’s sentiment, and the
blogs provide (often honest) insight into the thoughts
of the user. Although we discuss in detail some of the
uses that have been made so far of this data source
later in this article, we believe that the potential of
these services is largely untapped. We have developed
the storyboarding techniques for the visualization of
microblog data, but the techniques should be widely
applicable to other data domains.
We evolved the concept of ‘storyboarding scenarios’
while developing our solution to the VAST 2011
Mini-Challenge 1 (hereafter referred to as MC1). The
challenge provides a cleaned microblog data set of over
1 million blogs, with injected ground truth. The story-
boards were an invaluable part of our exploration
phase and helped us refine which stories to include in
the report. In fact, during the code development
phase, each of the members of the team would use the
(rapidly developing) tool to individually discover and
sequence events. These ‘stories’ were then presented at group meetings, where the panels were confirmed or improved upon by the panels of other researchers and the ensuing discussion. Because each panel
was dynamic, we could interactively adjust parameters
to explore in real time the sensitivity or boundaries of
specific events. This process engendered additional
questions from the group, which were immediately
explored during the meeting. If the tool could not be
used to answer specific questions, then new solutions
were designed. These features were prioritized, and
the development cycle continued for that week. This
SCRUM-like process directly led to new tool require-
ments, which were then developed over the subse-
quent days.5 This created an agile and query-driven
development cycle.
Through the development of epSpread for the MC1
data set, and subsequent presentation of our MC1
solution, our thoughts developed further. We started
to consider the wider implications of using storyboarding in VA. What principles of storyboarding can be used in VA? What functionality is required? What lessons have we learnt? Can the technique be applied to other data sets (beyond MC1)? And would the techniques that we have developed be suitable for real data?
We explore these issues within this article. Our
hypothesis is that visual storyboarding techniques can
be used to help explain and present events. This can
be applied in a crisis situation and allow analysts to
explore different trending themes and to understand
events as they unfold. Certainly, each user tells a story
from their point of view, but by taking an overview of
the information streams, it is possible to understand
general trends or drill down into specific views. As a
coherent set, the microblogs together form a polypho-
nic narrative of different points of view and voices.
This approach can be used to explore representational
states in multivocal discourses and can be used to
present one or several parallel stories.
The contributions of this article are (1) the devel-
opment of a set of design principles for storyboarding
for VA, (2) a prototype tool built on these principles,
and (3) two case studies, one that uses the tool with
the MC1 data set to uncover ground truth and present
it and a second study on Twitter data that was gath-
ered during the London 2012 Olympic Games.
The remainder of this article is organized as follows.
In the ‘Storyboarding’ section, we examine the use of
storyboarding as a tool for narrative and storytelling,
and we consider the specialized demands of visual ana-
lytic applications in the ‘Principles of storyboarding for
VA’ section. In the ‘Microblogging, events, text analy-
sis and visualization’ section, we consider other work
in the field of microblogging, covering both text analy-
tics and visualization. The implementation of our tool,
epSpread, is described in the ‘Tool design’ section
together with our text analysis approach. Two case
studies on using epSpread to analyse the MC1 data set
and Twitter messages about the 2012 Olympics are
shown in the ‘Case study: VAST 2011 MC1’ and
‘Case study: London Olympics’ sections. Finally, the
results are discussed, conclusions drawn and future
work detailed.
Storytelling in visualization
One of the key elements of research agendas for VA is
the requirement for the communication of results and
analytic processes. Tools should allow analytic reason-
ing, note-taking, production, presentation and disse-
mination to take place at the same time.6 They should
provide support for documenting the analysis process,
keeping provenance of findings, reporting and storytelling,7 and the resulting information should be presented in a decision- or task-oriented way.8
The difference between visualization and storytell-
ing is discussed by Gershon and Page.9 They highlight
the benefits of stories as ways of communicating infor-
mation in a short yet memorable fashion and discuss
an example of how storytelling concepts can be applied
to a hypothetical command-and-control situation.
Segel and Heer10 systematically review the design space
for narrative visualization along three dimensions:
genre, visual narrative tactics and narrative structure
tactics. They define seven genres of narrative visualiza-
tion – magazine style, annotated chart, partitioned
poster, flow chart, comic strip, slide show and film/
video/animation – and use them to characterize their
examples with provision for overlap. These genres vary
in terms of number of frames and ordering of their
visual elements. For example, a comic may have many
frames, while a magazine style will use only one.
The comic strip is particularly interesting in the
context of storytelling. While Eisner11 defines comics
simply as ‘Sequential Art’, McCloud12 expands this as
‘juxtaposed pictorial and other images in deliberate
sequence’. Both of these definitions are notable
because they imply an ordinal relationship over time,
but not a quantitative one. In fact, comics offer greater
flexibility in incorporating time compared to paintings,
photography or even film,13 since while the panel cur-
rently being read always represents the present, both
the past and future are visible in preceding and subse-
quent panels at the same time.
Storyboarding
The concept of the storyboard14 that is used in the film
industry enables the production team to organize the
action depicted in the script. Often, the storyboards
include a central rectangle where the artist includes a
sketch, a place for written description for that scene, a
title and details of the artist (Figure 1). The British
Broadcasting Corporation (BBC) in their ‘my place
my space’ competition (bbc.co.uk/myplacemyspace)
provided competition entrants with a suitable example
of a storyboard (see Figure 2). In fact, the sketch need
not be too detailed, and stick figures could be used to
describe the scenes, but the panel does need to include
enough detail to describe the scene and demonstrate
camera positions.
The challenge of providing concise visual explanations at an appropriate level is not unique to film. For example, the production of assembly
instructions for furniture15 can be viewed as a comic
strip showing how the furniture is put together: the
steps must take place in order, and the past and future
of the object are visible. In human–computer interac-
tion (HCI), storyboards are used to depict a user’s
interaction with and reaction to system elements,16
and tools exist to support storyboard develop-
ment.17,18 Storyboards have also been used to sum-
marize video – for example, Herranz et al.19 produce
comic-like summaries for videos in an automated fash-
ion by employing scalable representations, while
Goldman et al.20 define the schematic storyboard, a sin-
gle static image that is constructed from multiple video
frames and annotation.
The storyboard metaphor matches well with visual
analytic investigation and in particular fits with the
visualization of microblogging data for several reasons.
The microblogs are stories themselves and often
describe a progression of events evolving through time;
the boards can be used to display key moments in the
story that is contained in the microblog; and impor-
tantly, the storyboards engender discussion. In partic-
ular, film directors and the production team of a
movie view the boards and discuss the whole story,
and they may change the plot after seeing the progres-
sion of the boards. Similarly, the stories that are con-
tained within the microblogs can be described by a
series of ‘key moments’. The visual storyboards can
then be presented to analysts and used to discuss the
progression of the ‘story’ or crisis.
Principles of storyboarding for VA
Film storyboards are a pre-visualization technique:
they are used to set up shots and to determine lighting,
set and prop requirements. Their role is to ‘illuminate
and augment the script narrative’14 not to act as a
replacement for it. Likewise, in HCI, storyboarding
acts as a description of the use case or interaction
scenario.
So what can we learn from storyboarding, and how
can it apply to VA? Various authors talk about the ‘lan-
guage of storyboarding’.12,14 The language covers (for
instance) the type of shot and the progression from
one shot to another. Importantly, there are three dif-
ferent components to the language: (1) those that
explain what assets appear in the shot and how the
frame is composed, (2) how the shot is taken (its view-
point) and (3) how frames progress from one to another.
Correspondingly, we divide the language into three
parts. These are depicted schematically in Figure 3.
Each of these three categories corresponds directly to
equivalent functionality in our storyboarding model
for VA. However, the storyboarding techniques used
in film tend to be static (and sketchy) representations
of the film and are used as a pre-visualization technique. Consequently, we add three more principles for
storyboarding for VA: annotability, interactivity and
separability. The six principles are shown schematically
in Figure 4.
Storyboard composition
The first set of phrases in the ‘language of storyboard-
ing’ we categorize as the composition of the shot. Here,
the artist needs to decide what information is included
in the shot: which actors will appear, what will they be
wearing, what props are there and what the scene looks
like. Atasoy and Martens21 consider the composition
by People, Places and Objects. This is realized by creating
an establishing shot, which provides an overview of the
scene and sets the scene and tone of the film. For
instance, if the establishing shot shows a road along
(say) the Italian Amalfi Coast on a hot summer’s day,
then the panels may imply that the film is a car com-
mercial. The setting of the frames can be ‘established’
through various subtleties. For instance, an office scene
may show a window; if the picture through the window
shows high-rise sky-scrapers, then it would imply that
the office is high-up, important and in a big city.
This set of composition tasks applies directly to VA.
Users can compose their visualization from different
visual components. These components could be maps,
statistical views and charts, legends, titles and keys
Figure 1. A typical storyboard template, with a rectangle for the sketches, lines for the text description of the scene and space for associated details such as a title.
along with interaction widgets such as buttons and
sliders. Choosing which components and how they are
positioned in the visualization determines the visual
appearance of the tool. The first storyboard panel
could be used as an ‘establishing frame’ to demon-
strate all the possible components of the system or
could provide an overview that demonstrates the com-
plete range of the data. Indeed, this concept fits well
with Shneiderman’s mantra of overview first, zoom
and filter and then details on demand. Similar to film
storyboarding, there is certainly much subtlety that is
contained within any visualization. These subtleties
represent unwritten rules or objects that a user would
expect (for a particular visualization domain or tool
type). These subtleties can range from colour combi-
nations to the layout and positioning of objects.
Figure 2. An example of a roughly sketched storyboard that was provided to entrants of the BBC's ‘my place my space’ competition (bbc.co.uk/myplacemyspace). The information in the storyboard needs to be suitable for a director to follow and discuss and for the filming team to understand. BBC: British Broadcasting Corporation.
Figure 3. The language of storyboarding describes the composition of the storyboard; how people, place and objects appear in the viewpoint and how frames progress to other frames.
Therefore, having the right combination of visual
components is an important part of the design process
and important to storyboarding for VA.
Storyboard viewpoint
Second, there are several aspects in storyboarding that
control the viewpoint of the frame. Storyboard artists
discuss the type of shot that is used for that frame. For
instance, a wide-angle view would demonstrate an
overview of the world and may include more people,
places or objects and can be used to set up the scene.
Often a wide-angle shot is used at the start of a progres-
sion, and through a series of storyboard frames, the
observer is drawn from the wide view to a medium
view and finally into a full-shot or close-up of an object.
For instance, two people may be in conversation about
a letter, and the frames move from one person to
another as they discuss the letter and finally demon-
strate a close-up shot on the letter. The angle of the shot
is also important. Again, with a conversation between
two adults, the angle of the shot should be the same
for both adults (thus giving the impression that we are
looking through the eyes of one converser and then
the other). This represents a point-of-view (POV) shot,
whereas a child looking up to an adult should have a
low camera angle shot, which also makes the subject
appear important. Correspondingly, a high camera
angle makes the subject small and appear weak or
diminutive. Motion in the shot is often determined
through annotation. For instance, an arrow can be
used to demonstrate the path of someone who is run-
ning. Annotation can also be used to describe special
effects (such as fire or explosions), which grab the
attention of the viewer. We believe annotation is an
important aspect of storyboarding in VA and therefore
include it as a separate principle.
These storyboarding techniques offer inspiration for
VA analysis. Techniques of wide-angle, full-shot and
close-up are similar to different zoom levels. The use
of different projections enables the user to understand
the data through different viewpoints.22 A close-up
view could be interpreted as being details-on-demand,
while the different camera angles or POVs are similar
to representing the data through different multiform
views. Representing motion by static arrows (for
instance) is used in fluid-flow visualization techniques,
and it is an important technique that could be applied
to non-fluid-flow visualization systems.
Storyboards in film production are usually sketched.
There are several advantages of sketchiness: it implies
an unfinished state, encourages discussion and is often
beautiful in its simplicity. Likewise, there are several
advantages to sketchy styled renderings for VA espe-
cially because they have been evaluated to increase the
positivity of users about the visual depiction.23 One
advantage of sketched storyboards is that they focus on
important aspects. In our work, we have used algo-
rithms to summarize the data and aggregate the infor-
mation in order to locate interesting information and
features that can be visualized appropriately. We do
not believe that sketchiness is necessary in storyboard-
ing for VA, though it can be beneficial. We do believe
though that the function of ‘summarization’ is more
important for storyboarding. Following on from this
idea, it may be possible to use the Document Cards24
method with the storyboarding idea to provide sum-
mary views.
Transition between the storyboards
Finally, the artists describe how the frames progress and
transition from one to another. The placement of these
panels is important to determine how the order is
Figure 4. The six storyboarding principles for visual analytics.
organized. Often the storyboard panels are positioned
side-by-side. Segel and Heer10 explain that there are
other layout styles, such as magazine or comic.
However, progression of the panels is usually left to
right and top to bottom. Annotations and descriptions of
the scene are often added to frames to explain informa-
tion that is drawn in the storyboard. There are several
different transition styles, including cut, dissolve, fade,
pan, tilt or zoom. For instance, the shots may demon-
strate a conversation between two people: one frame
shows the picture of one person, and then, the camera
cuts to see the reaction shot of the other person. A cut
can also be used to compress the time. For instance, a
mother walking to the door may take several frames to
complete; however, it could be possible to cut to an out-
side shot that shows the door and a kid standing outside
with cookies and then cut back to the mother opening
the door. This sequence not only allows time to be com-
pressed but also allows the observer to understand the
progression. In fact, the expectations of the viewer need
to be considered, especially to determine continuity. A
ball being hit from left to right of the frame would be
assumed to be travelling left to right in subsequent
shots; likewise, if a character is looking in one direction
in one shot and the opposite direction in a subsequent
shot, then continuity will be forfeited. Naturally, there is
a line of action, in one storyboard, that enables the eye
to understand the motion.
Many of the view layouts used in VA are ad hoc
arrangements. Multiple views are often positioned by
the user or may be positioned side-by-side.22 Small
multiple views may be organized in a tabular (matrix)
format but they are often merely different projections
of the same data. For instance, one view in a matrix of
scatter plots represents a specific correlation between
two independent variables. One progression that is
suitable within storyboarding for VA is Shneiderman’s
mantra of ‘overview first, zoom and filter and details
on demand’. This could act as a useful progression,
where individual panels demonstrate specifically an
overview, a zoomed view and so on in turn. However,
such a set of panels may be difficult to understand
because the transitions would cut from semantically
different panels, and hence, continuity may be difficult
to understand from one panel to the next. In fact, con-
tinuity is a useful consideration for VA. How does the
user demonstrate the provenance of a specific view?
This is particularly challenging in VA, but storyboard-
ing techniques along with annotation could be used to
tell the provenance story.
In the microblog data sets, in particular, time is an
important variable to display. By visualizing time, the
progression of specific panels may be more easily
understood. In fact, we take this idea further in our
interpretation of transitions between storyboards.
Our storyboard design follows a hybrid design strat-
egy that is somewhat between the comic strip and the
timeline. Instead of relying on labels on a timeline,
events are denoted by the visualizations by which the
analyst identified the event. In this manner, the story-
telling advantages of the comic strip are combined
with the strict temporal ordering of the timeline. The
completed storyboard would represent not only the
sequence of events that occurred but also the evidence
that supports this line of argumentation. It could pro-
vide a useful collaborative analysis tool since story-
boards could be prepared by multiple analysts and
compared to discover alternative hypotheses or narrative sequences. We propose three additional principles
to which visual analytic storyboards must adhere –
annotability, interactivity and separability.
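This hybrid comic-strip/timeline design can be sketched as a simple data model: each panel carries the time span of the event it depicts, and the board keeps its panels in strict temporal order regardless of the order in which the analyst creates them. The class and field names below are hypothetical illustrations, not the actual epSpread implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Panel:
    """One storyboard panel: a visualization snapshot tied to a time span."""
    title: str
    start: int   # event start time (e.g. seconds into the scenario)
    end: int     # event end time
    caption: str = ""

@dataclass
class Storyboard:
    """Panels are kept in strict temporal order, like a timeline."""
    panels: list = field(default_factory=list)

    def add(self, panel: Panel) -> None:
        self.panels.append(panel)
        # Enforce the timeline ordering regardless of insertion order.
        self.panels.sort(key=lambda p: p.start)

board = Storyboard()
board.add(Panel("Outbreak peak", start=300, end=360))
board.add(Panel("First symptoms reported", start=100, end=160))
print([p.title for p in board.panels])
# → ['First symptoms reported', 'Outbreak peak']
```

Because ordering is recomputed from the panels' own timestamps, two analysts can assemble panels independently and still produce comparable, temporally consistent boards.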
Annotability
While each storyboard panel will be composed of one
or more visualizations, this alone is not sufficient to
convey meaning. Different users might see different
patterns or draw different conclusions from the same
visual representations. When constructing panels,
then, it is important to allow the analyst to annotate
visualizations. In fact, this process can take place at
two levels: within a panel, by adding one or more text
notes directly on the visualizations to indicate impor-
tant features, and at panel level, as a higher level sum-
mary of the event that the panel depicts. We term these
two different notation methodologies annotation and
captioning, respectively.
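The distinction between the two notation levels can be made concrete with a small data model. The names here are hypothetical, purely for illustration: an annotation is anchored to a position inside a visualization, while the caption is a single panel-level summary of the event:

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """A text note placed directly on a visualization at a point of interest."""
    x: float   # position within the visualization (normalized 0..1)
    y: float
    text: str

@dataclass
class Panel:
    """A storyboard panel: one caption, any number of in-panel annotations."""
    caption: str = ""
    annotations: list = field(default_factory=list)

panel = Panel(caption="Day 3: flu-like symptoms trend sharply upwards downtown")
panel.annotations.append(Annotation(0.42, 0.87, "spike in fever-related posts"))
print(len(panel.annotations))  # → 1
```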
Interactivity
While storyboards can be constructed from static
screenshots of visualizations, this positions a story-
board as the end product of an analysis. The construc-
tion of storyboard panels, their assembly into complete
storyboards and changing or adding to panels should
all be interactive processes. Panels and the visualiza-
tions they contain should be capable of interaction to
explore the data in depth. For instance, we treat each of the panels as a zoomable interface: they have all the functionality of the large interactive windows but are merely smaller.
Separability
While the storyboard represents the complete analysis
and its provenance in an interactive form, static images
from the analysis are often required in other contexts,
such as written reports. By separability, we refer to the
reproducibility of a storyboard panel from just the ele-
ments that are visible on static images. Most notably,
this creates the requirement for an explicit statement
of time period in each panel, but also that searches,
queries and filtering be represented visually. If this
property is maintained, then any panel of the story-
board can be used separately.
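Separability amounts to requiring that every panel carries its own reproducible state: the explicit time period plus any searches, queries and filters, rendered visibly on the panel itself. A minimal sketch of this idea, with hypothetical names not taken from epSpread:

```python
from dataclasses import dataclass, field

@dataclass
class PanelState:
    """Everything needed to reproduce a panel from its static image."""
    start: str
    end: str
    search_terms: list = field(default_factory=list)
    filters: dict = field(default_factory=dict)

    def legend(self) -> str:
        """Render the state as text so it can be shown on the panel itself."""
        parts = [f"{self.start} to {self.end}"]
        if self.search_terms:
            parts.append("search: " + ", ".join(self.search_terms))
        for key, value in self.filters.items():
            parts.append(f"{key}={value}")
        return " | ".join(parts)

state = PanelState("2011-05-18 06:00", "2011-05-18 12:00",
                   search_terms=["flu", "fever"],
                   filters={"region": "downtown"})
print(state.legend())
# → 2011-05-18 06:00 to 2011-05-18 12:00 | search: flu, fever | region=downtown
```

If this legend is drawn onto each panel, a static screenshot in a written report remains self-describing: a reader can reconstruct the view from the visible state alone.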
Summary
Taken in combination, a VA storyboarding system that
demonstrates these principles offers some considerable
benefits. As well as performing exploratory analysis
using the features and representations within each
panel, an analyst can construct an overarching narra-
tive connecting events by generating multiple, anno-
tated panels that are then appropriately arranged
temporally. Furthermore, by producing analyses in
this fashion, the storyboard itself tracks the prove-
nance of the hypotheses generated. With some simple
functionality for storing and retrieving previously
defined storyboards, comparison of hypotheses can be
performed. Coupled with large-screen displays, visual
analytic storyboarding applications can support colla-
borative analysis. Finally, individual panels can be
used in a meaningful fashion in written reports.
Microblogging, events, text analysis and visualization
Microblogging is a form of blogging where users
broadcast their microposts and exchange short posts
that contain a few sentences, links, small images or
links to other assets such as videos. Often, these micro-
posts are created from mobile devices; hence, they can
be tagged with geolocation (Global Positioning System
(GPS)) information. Because of their small size, they
are easy to create, and because there are many users,
the frequency of the posts is high.
Microblogging services such as Twitter, Tumblr,
Facebook and Google+ have exploded in popularity
in the first decade of the twenty-first century. While
traditional blogs are perceived as high effort, the lower
requirements for microblogging content have led to
much wider uptake of the medium. Twitter alone pub-
lishes 400 million messages per day from 140 million
active users. Facebook has recently passed the billion
user mark. Google+, despite a comparatively late
start, claims 150 million users, and in China, Sina
Weibo receives around 100 million messages per day
from more than 300 million registered users.
However, making sense of such enormous quanti-
ties of data – identifying trends, extracting useful infor-
mation and developing actionable insights – is a
challenging task. Alongside the difficulties that arise
from the quantity and scale of the data (which in fact
is a challenge for the field of information visualization
in general25) are a range of new issues: the microblog-
ging messages themselves are short, they often include
abbreviated words or slang words, may include a pic-
ture and can provide a link to content elsewhere on
the web together with a brief comment. This makes
semantic analysis for these blogs more difficult com-
pared to traditional document analysis. In addition,
the social networks that link microblogs together can
be complicated and hard to uncover, especially when
the links stretch across different services.
But what the blogs do provide is a narrative. The
user is telling a story through their microposts. The
information that they record may be short in length
but is still a journal of their experience and thoughts.
Microblogs, storytelling and unfolding events
Users utilize microblogs for different purposes: some
use it as an aide-memoire, to help the user remember
what they were doing at a particular time, while other
users keep in touch with their friends by letting them
know what they are doing or provide a brief update of
their personal lives and others use the information as a
Rich Site Summary (RSS) feed to gather information
that is relevant to their work or interests.26
Users tend to post their blogs as an event is unfold-
ing, and therefore, the microblogs can be used to
inform other users of the current trends. This immedi-
acy is supported by Oulasvirta et al.27 who evaluated a
10-month usage of the Jaiku service. They report that
83% of the microblogs cover information about the
present, while 7% of the blogs discuss the past and the
remaining 10% discuss the future. The immediacy
and dynamic nature of the posts therefore are relevant
to disaster response agencies and help inform how to
manage the crisis.4 People update their microblogs
with details of their current activity, and they broad-
cast this information to describe what they are think-
ing, reading or their current experience.26
The informal nature of the microblogs allows parti-
cipants to have opportunistic conversations that may
enable people to feel that they are more connected. In
fact, users are often very honest about their current sit-
uation, and therefore, their blogs provide useful insight
into their current thoughts. Because of the personal
nature of the blogs, the information is sometimes
biased towards the view of the user: sporting supporters,
for example, slant their posts towards the
fortunes of their team. However, due to the huge
number of posts, any bias is often balanced by
alternative viewpoints. Analysing the microblog data
can provide an understanding into trending topics or
can summarize accidents, incidents or sporting
events.28
8 Information Visualization 0(0)
at Bangor University on March 15, 2015ivi.sagepub.comDownloaded from
It is clear that the microblogs act as snapshots of
events occurring, but with the informal nature of the
posts, we should ask whether they are a reliable indica-
tor. In a recent study, Kwak et al.29 evaluate whether
Twitter as a social network develops representative
information similar to news media. Comparing trending
topics from Twitter with topics in Google Trends
and in CNN Headline News coverage, they found that all three
were similar in content but that the timings differed. For
example, topics on Twitter were discussed for a longer
period after the event in comparison with Google
Trends, and although CNN was ahead of the report-
ing half the time, some news did break on Twitter ear-
lier than on CNN. This shows that microblogging
data can play an important role in providing up-to-
date collective intelligence but needs to be analysed
appropriately and presented to the user in a manner
that emphasizes the significant features.
We therefore need appropriate ways to analyse and
visualize the significant events. From one point of
view, the information that is contained within the
posts is already filtered and selected by the user to be
interesting to them or worthy of posting. The impor-
tant aspect of the microblogs is that they are generally
posted by humans and are reactionary. They are
posted when the participant believes that something
may be ‘interesting’ or illustrative of an event. The
user obviously felt that it was worth spending time to
‘tweet’ some information about an event (however
small the microblogs are, it was significant to the user
to make a record of the occurrence). The opposite
view, however, is that the posts contain a lot of irrele-
vant information, and it can be difficult to understand
the development of the crisis from the microblogs
because of the diversity of the messages, the limited
length of the message and the sheer volume of data
that is being created overall by the substantial quantity
of bloggers.4 The content of the blogs is created from
a wide range of individual users, and therefore, the
information stream certainly contains misinformation
and rumours along with the truth, but it is possible to
estimate their reliability30 to provide insight into
unfolding events.
Event information is shared and distributed
between users, who propagate the ideas. Therefore,
trending topics appear when several users classify the
information as important. It may be serendipity that
causes a microblogger to use the same phrase or word-
ing in their microblogs, but usually, it will be because
they are observing the same event, have heard about it
from an alternative source or are reading other micro-
blog posts. Importantly, users amplify the trending
topic by specifically replying to posts, mentioning
information from other messages in their posts or
retweeting other microblogs. The events unfold in a
pattern of decentralized information diffusion.29
Rogers31 writes, ‘diffusion is the process in which an
innovation [new idea] is communicated through cer-
tain channels over time among the members of a social
system’. Retweeting, in particular, enhances the
proliferation and amplification of an idea.
These diffusion networks communicate and
enhance information but may not amplify the best or
‘correct’ information. These are ad hoc networks
because they are formed by the users and generated by
hashtags and by referencing other tweets; therefore,
the propagation of the information is unknown or
unobserved. Thus, we know the information on a par-
ticular node, but we do not always know the prove-
nance of the information or where it specifically
originated: ‘in case of information propagation, as
bloggers discover new information, they write about it
without citing the source’.32 Indeed, although
these information channels are often viewed as having
collective intelligence, even ‘collectives can be just as
stupid as any individual’.33 The event information
evolves and is shared between users.
There are two ways to determine topic trends in
microblogging data: either to analyse the text or to use
visualization and allow the user to perceive the trends
through the visual depiction. Most systems utilize both
processes but tend to put emphasis into one or the
other.
Text analysis
Analysing microblog data is difficult: the tweets are
free form, contain highly irregular syntax, non-standard
punctuation and grammar, and are often noisy.
From our experience, microblog
text has some distinct properties that make the use of
standard Natural Language Processing (NLP) solu-
tions problematic. These include the following: the
presence of multiple languages (not just English), the
presence of hashtags where additional context and
metadata have been added and the use of tweet-
specific language unique to microblogs (such as
abbreviations or slang). Standard NLP techniques such as
part-of-speech (POS) tagging, named entity tagging
and information extraction are problematic because
the richly annotated, large training corpora that are
the norm for other text domains, and that are required
to build the statistical models for robust NLP, are not
yet available for microblogs. Also, the language structure is
dynamic and changes over time.
One of the main areas of interest for this article is
text summarization. Again, however, many of the
techniques that are used to analyse blogs are not suitable
for analysing the rapid updates of microblogs.
However, various researchers have investigated event
summarization of microblogs, including Sharifi
et al.,34 Chakrabarti and Punera35 and Nichols et al.28
Glance et al.36 use interactive visualization techniques
along with mining algorithms to analyse different
online discussions that concern consumer products.
One analysis method follows a frequency-based
technique. For instance, Sharifi et al.34 generate a sen-
tence from a set of microblogs, while Nichols et al.28
generate multiple sentences for an event. Shamma
et al.37 use a frequency method based on the
term frequency–inverse document frequency (TF-IDF)
model,38 which weights the term frequency by the
inverse document frequency, computed from the number
of documents within which the term appears. Another concept is to
use the unique identifier (UID) of a message post to
analyse the frequency of the posts. There can be a
surge of microblog posts when something becomes
important. For instance, the software TwitInfo39 uti-
lizes a weighted moving average to evaluate the data
for spikes. Another technique is to use Hidden Markov
Models (HMM) to learn the vocabulary and the struc-
ture of the event;35 however, it may not capture all the
fine details contained within an event.
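The weighted moving average approach can be illustrated with a short sketch. This is not TwitInfo's actual algorithm – the smoothing factor and spike threshold below are illustrative assumptions – but it shows the general idea of flagging a time bin whose message count greatly exceeds the smoothed history:

```java
import java.util.ArrayList;
import java.util.List;

public class SpikeDetector {
    // Flags bin i as a spike when counts[i] exceeds `factor` times the
    // exponentially weighted moving average of the preceding bins.
    // `alpha` and `factor` are illustrative parameters, not TwitInfo's.
    public static List<Integer> findSpikes(int[] counts, double alpha, double factor) {
        List<Integer> spikes = new ArrayList<>();
        double ewma = counts.length > 0 ? counts[0] : 0.0;
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > factor * ewma) {
                spikes.add(i);
            }
            // Update the moving average after testing the current bin.
            ewma = alpha * counts[i] + (1 - alpha) * ewma;
        }
        return spikes;
    }

    public static void main(String[] args) {
        int[] tweetsPerMinute = {10, 12, 11, 95, 13, 10};
        System.out.println(findSpikes(tweetsPerMinute, 0.3, 3.0)); // bin 3 is a spike
    }
}
```

Because the average is updated after each bin, a sustained rise in activity is absorbed into the baseline, so only sudden surges are flagged.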
For MC1, Bertini et al.40 used the Stanford Named
Entity Recognizer (NER), Braunstein41 the Illinois
Named Entity Tagger, while with our submission,42
we used the first three days as the reference corpus
with a relative entropy–based metric. Further details
on our approach are given in the ‘epSpread viewpoints’
section.
Microblog visualization
Although microblogging is a relatively recent innova-
tion, and its visualization does not extend back very
far,43 there is already a wide body of work that covers
its usage and analysis. We first focus on the
visualization of Twitter data and then include some related
areas of blogging, Usenet groups and discussion
boards. Researchers have developed software to
visualize microblogs; to display information about the people
who are tweeting and their relationships; to show the timing and
chronology of the posts; to visualize topic trends; and
to map the location of GPS-tagged tweets.
Relationships have been depicted by various meth-
ods; Ho et al.44 use a tree with the focus person at the
root of the tree, surrounded by his or her followers in
a circle. Similar relationship diagrams have been used
in other online social networks, for example, Narayan
and Cheshire45 depict message threads by a series of
connected squares in their system tldr. The primary
interface of tldr presents an overview of all the messages
using a histogram display, where activity is
shown over time, together with a tree visualization of the
posts of a forum, through which the user can drill down into
specific threads. These message threads are visualized
by a series of adjoined blocks, and the user can expand
the messages as required. Other researchers, such
as Biuk-Aghai,46 show co-authorship networks in other
social networks. Indeed, Biuk-Aghai presents associa-
tions in Wikipedia through three-dimensional graphs,
Glance et al.36 analyse social networks to derive mar-
ket intelligence and other researchers analyse who is
talking to whom47 and derive visual signatures.48,49
Perer and Shneiderman50 present interactive graphs to
explore social networks, Heer and Boyd51 present
Vizster to visualize online social networks in large
graphs and Hansen et al.52 present a tool called
EventGraph. They depict an example that shows
Twitter data from VisWeek (with the tag #dcweek)
with the node size mapped to betweenness centrality.
It is worth mentioning that some researchers have
investigated the relationships between followers and
tweeters. For instance, Kwak et al.29 depict these
relationships by scatter plots, while Oulasvirta et al.27
depict 845 interconnected members using a fisheye
magnification technique. A beautiful visualization of
relationships is shown by Kwak et al.29 who visualize
retweeting messages in trees that are aligned as small
multiples. Finally, hierarchical trees are used by Smith
and Fiore53 to visualize conversations, and the hier-
archical relationships of Usenet groups have been
depicted by treemaps.54,55
Associated with relationships are topics. The topics
change over time, as words become more frequent
over time. There are three main styles of topic visuali-
zation: (1) a ThemeRiver56 approach, where the time-
line is modified to also include the frequency of the
posts and the trending topics are annotated onto a
timeline; (2) tag and word clouds and (3) trees. The
Communication-Garden System57 visualizes topic
threads by a flower metaphor. Although different in its
formation, their visualization design, however, is
visually similar to the ThemeRiver visualization, with
the number of threads being represented by the width.
Dörk et al.58 present a multiple-view visualization
system, where the principal view is a visualization similar
to ThemeRiver and other views are keywords and pic-
tures. Dou et al.59 display the frequency of the micro-
blogs – as bursts of information – along a timeline and
allow key topics to be highlighted. Their visualization
also contains two other associated views: a map and a
tag cloud of topics. Word clouds are used by several
developers to depict trending topics: Ramage et al.60
present a vertical timeline with a series of topics in
word clouds. Their word clouds are annotated along
the timeline in a similar way to our visualization. The
ThemeCrowds visualization61 provides multiresolution
summaries of Twitter usage through tag clouds,
while Bosch et al.62 display word clouds that are local
to a geographic position, which demonstrate localized
events in geographical space. Finally, Yin et al.63 use
tag cloud representations to enhance emergency situa-
tion awareness and to demonstrate trending topics.
Along with relationships and topics, time and chron-
ology are important factors for event analysis. Many
visualizations utilize a timeline to represent the informa-
tion. While the TweetTracker64 tool uses static visualiza-
tions, others have attempted to add interaction. Marcus
et al.65,66 present TwitInfo for monitoring Twitter data
for certain keywords and use a timeline visualization to
show a sudden increase in the frequency of a chosen
keyword as a peak annotated with the relevant
keyword. Users could then select a peak in order to
explore the event further. Information regarding geolo-
cation, related URLs and any sub-events associated with
that particular keyword were provided. The sentiment
of the keyword (‘positive’ or ‘negative’) was also derived
algorithmically and provided for each keyword. The
intention of the tool is to give the user an overview of
the event as it occurs, aggregating several pieces of infor-
mation that may be of value in understanding the cur-
rent status of the event and its background. Other
Twitter visualizations also map information on a
timeline: Itoh67 displays three-dimensional visualizations
of relationships between different Twitter users that
are mapped along a timeline. The Truthy system68 displays
a small timeline alongside a diffusion network
view. Seascape and volcano69 visualize online discus-
sions using animation and a point-based depiction.
Space and time have been visualized together. Some
designers have followed the space–time cube70 design,
such as Kim et al.71 who use the timeCube three-
dimensional representation to explore topic movements.
Finally, some researchers have developed visualization
design methods that display time, but not on a timeline.
For instance, the PieTime72 system depicts emails sent
and received in a modified star plot, and Whisper73
demonstrates a flower-inspired design to show informa-
tion diffusion in real time.
With the increase in use of mobile devices, there has
been a corresponding increase in microblog postings
with location tags. This permits the posts to be plotted
on a map. White and Roth74 use geospatial location as
a means of visually exploring information contained
within microblog posts. They state that approximately
70% of the publicly accessible posts made to the
Twitter service would be capable of containing
latitude/longitude coordinates but that, at the time of
publishing, only approximately 10% of public posts
carried this information. White and Roth created
software (TwitterHitter) that uses this information
to create two visualizations for the user to explore: a
timeline and a network graph for understanding con-
nections between individuals in a particular area. Using
a set of keywords, users may also view relevant posts
plotted as points on a map or as a heat map. They state
that this tool could be used for crime trend analysis
within a particular area of interest. MacEachren et al.75
discuss geospatial aspects of Twitter for crisis manage-
ment. Indeed, most map-based visualizations are two-
dimensional (2D) maps. For instance, Ho et al.44 use
Google Maps, while Lohmann et al.76 use a word cloud
and a map to plot the microblogs.
In fact, several researchers use map-based displays
to identify the location of outbreaks of influenza (and
potentially other illnesses), including Singh et al.,77
Cheong and Lee,78 Achrekar et al.79 and Kumar
et al.80 in their NIF-T system and Kumar et al.64 in
TweetTracker. In particular, Achrekar et al.79
conducted an experiment to investigate how effective
Twitter microblogs were at predicting the location of an
outbreak of influenza. They used keywords such as
‘flu’, ‘swine flu’ and ‘H1N1’ to search for posts that
related to flu and investigated the posts over a period
of 13 days. They also stored longitude/latitude infor-
mation, when it was available from the posts; if it was
not available, they used the location of the user in their
profile. They compared the information contained
within these posts with data released by the Center for
Disease Control and Prevention (CDC) for influenza-
like illness (ILI) cases. Using a Pearson correlation,
the results indicated a strong correlation (r = 0.9846)
between the locations of collected data and the loca-
tions reported in the ILI data. Culotta81 conducted a
similar evaluation and investigated 6.5 million posts
for keywords that mention influenza-like symptoms,
comparing the tweets to ILI data, and found strong
correlations for several keywords such as ‘cough’
(r = 0.84) and ‘flu’ (r = 0.92), but less strong for
‘fever’ (r = 0.77). This situation is understandable
because fever is often used colloquially and need not
refer to illness. These results support the idea that
microblog data can be used as a means of accurately
identifying the location of outbreaks of influenza (and
potentially other illnesses).
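The correlations quoted above are standard Pearson coefficients computed between the tweet counts and the ILI case counts. As a reminder of the computation being applied, a minimal sketch:

```java
public class Pearson {
    // Pearson correlation coefficient r between two equal-length series,
    // e.g. weekly tweet counts versus weekly ILI case counts.
    public static double correlation(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0;
        for (int i = 0; i < n; i++) { sx += x[i]; sy += y[i]; }
        double mx = sx / n, my = sy / n;
        double cov = 0, vx = 0, vy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            cov += dx * dy;   // covariance numerator
            vx += dx * dx;    // variance numerators
            vy += dy * dy;
        }
        return cov / Math.sqrt(vx * vy);
    }
}
```

A coefficient near 1 (such as the r = 0.9846 reported above) indicates that the two series rise and fall almost in lockstep.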
Tool design
As discussed before, we followed an agile program-
ming methodology focusing on the VAST 2011 MC1
data set, which led to the design and development of
our tool, epSpread. In this section, we discuss compo-
sition, viewpoint, transitioning, annotability, interac-
tivity and separability of epSpread.
epSpread composition
Each storyboard panel is composed of a number of
different components to allow examination of different
aspects of the data. The principal panels are shown as
an overview storyboard in Figure 5 and consist of a
map interface for looking at geolocated patterns
(Figure 5(a)), a querying interface to select sets of
messages for display and analysis (Figure 5(b)), a word
cloud to display the results of the textual analysis
(Figure 5(c)), a timeline (Figure 5(d)) and a streamgraph
to show tweet counts by time over multiple
topics (Figure 5(e)).
In designing epSpread, we wished to balance a
functionally rich interface with a simple design. We
also developed other views including a more detailed
streamgraph view and a message count view that
showed the quantities of microblogs per region.
However, for this article, we focus on the five principal
views. We positioned the map panel in the centre to
provide the main visualization, the streamgraph and
timeline were positioned together such that they repre-
sent the same time range, and the word cloud is linked
with the range slider on the timeline.
The map panel offers a geographical display to
provide context – it gives a setting for other information
to be overlaid in different forms. In the case of the
MC1 data, the provided map was based on satellite
imagery, with regions of the city drawn over in bright
colours. The map was thus redrawn and desaturated
to de-emphasise the map but to still provide context.
The map itself supports two types of overlay. A sim-
ple point-per-message geographic plot is useful for
considering the overall spread and examining the
contents of messages. But for large or dense message
sets, over-plotting reduces the value of this technique.
Therefore, the message set can also be shown as a heat
map overlay on the map, by performing kernel density
estimation using the M4 kernel82 on the message set
and visualizing the resultant 2D field. As well as avoid-
ing occlusion issues, this has the advantage that addi-
tional information can be incorporated into the density
calculations: for example, in the VAST 2011 MC1
data set, messages were weighted according to the pop-
ulation of the region of the city from which they were
sent, at the time they were sent.
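The heat-map computation can be sketched as weighted kernel density estimation over a grid. The quartic (biweight) kernel below is a stand-in for illustration – epSpread uses the M4 kernel82 – and the grid resolution and bandwidth are assumptions:

```java
public class HeatMap {
    // Weighted kernel density estimate of message positions over a grid.
    // points[p] = {x, y} in grid coordinates; weights[p] allows, e.g.,
    // population weighting as described for the MC1 data set.
    // The quartic kernel here is an illustrative stand-in for M4.
    public static double[][] density(double[][] points, double[] weights,
                                     int gridW, int gridH, double bandwidth) {
        double[][] field = new double[gridH][gridW];
        for (int gy = 0; gy < gridH; gy++) {
            for (int gx = 0; gx < gridW; gx++) {
                double sum = 0;
                for (int p = 0; p < points.length; p++) {
                    double dx = gx - points[p][0], dy = gy - points[p][1];
                    double d = Math.sqrt(dx * dx + dy * dy) / bandwidth;
                    if (d < 1) {
                        double u = 1 - d * d;
                        sum += weights[p] * u * u;  // unnormalized quartic kernel
                    }
                }
                field[gy][gx] = sum;
            }
        }
        return field;
    }
}
```

The resulting 2D field is then mapped to colour for the overlay; because each message contributes a smooth bump rather than a point, over-plotting is avoided.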
There is an obvious requirement for some mechan-
ism for filtering messages for further analysis. Our
query interface consists of a keyword search box that
presents results in a result-stack interface. This enables
eight searches to be included, which can be visualized
separately or in combination, or deleted from the
result-stack. The querying interface offers some addi-
tional functionality: regular expressions can be
included in the query box to filter messages that
are written in either the first person or the third person. This
is extremely useful for analysing microblog data for
disaster management as it can be used to reduce blogs
containing hearsay.
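The article does not reproduce the actual regular expressions used, but the idea can be sketched with illustrative pronoun-based patterns; the class and the patterns below are assumptions, not epSpread's implementation:

```java
import java.util.regex.Pattern;

public class PersonFilter {
    // Illustrative patterns only. A post is treated as first person if it
    // contains a first-person pronoun as a whole word; as third person if
    // it contains third-person pronouns and no first-person ones.
    private static final Pattern FIRST_PERSON =
            Pattern.compile("\\b(i|me|my|we|our)\\b", Pattern.CASE_INSENSITIVE);
    private static final Pattern THIRD_PERSON =
            Pattern.compile("\\b(he|she|they|his|her|their)\\b", Pattern.CASE_INSENSITIVE);

    public static boolean isFirstPerson(String message) {
        return FIRST_PERSON.matcher(message).find();
    }

    public static boolean isThirdPerson(String message) {
        return THIRD_PERSON.matcher(message).find()
                && !FIRST_PERSON.matcher(message).find();
    }
}
```

Filtering to first-person posts favours direct reports ("I have the chills") over hearsay ("they say people are getting sick").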
In addition, the results panel displays statistics on
the correlation between two selection sets. These
selection sets are created by drawing a lasso selection
region on the map. This lets us ask questions such as
‘how many people, who sent messages from or near
Figure 5. Some features of epSpread: (a) geographical view of two sets of messages, (b) querying interface with cross-query results as percentages, (c) word cloud for selected time period, (d) time range slider and (e) streamgraph visualizing query results over time. This figure shows the result of a cross-query between people at the convention in Downtown on 18 May and those who reported suffering from chills, fever or sweats. This query was performed entirely through selection on the available views.
the baseball stadium, later sent messages complaining
about fever?’.
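The cross-query itself reduces to a set operation. Assuming each lasso selection yields a set of user identifiers (a representation chosen here purely for illustration), the overlap statistics can be sketched as:

```java
import java.util.HashSet;
import java.util.Set;

public class CrossQuery {
    // Counts users who appear in both selection sets, e.g. those who
    // posted near the stadium and later complained about fever.
    public static int overlap(Set<String> selectionA, Set<String> selectionB) {
        Set<String> common = new HashSet<>(selectionA);
        common.retainAll(selectionB);  // set intersection
        return common.size();
    }

    // Overlap expressed as a percentage of the first selection,
    // matching the percentage display in the querying interface.
    public static double overlapPercentage(Set<String> a, Set<String> b) {
        return a.isEmpty() ? 0.0 : 100.0 * overlap(a, b) / a.size();
    }
}
```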
The distribution of search terms over time is shown
using a streamgraph. Message sets retrieved using the
querying interface are shown as they are produced.
The streamgraph scales to reflect the number of
occurrences and thus also provides a mechanism for
comparing the relative sizes of message sets.
epSpread viewpoints
We focus on two aspects of viewpoints: first, the
data viewpoint and, second, the view projection (such
as zooming).
Getting different views on the data is an important
part of VA exploration. In this case, we wished to ana-
lyse the microblog data both to understand patterns
held within it and to produce a manageable (reduced)
set of data that can be visualized effectively. Many sys-
tems that deal with unstructured text rely on struc-
tured training corpora. However, microblog data do
not work well with traditional techniques, where it is
difficult to compare the unstructured, ‘messy’ and
abbreviated form of words of the microblogs with a
traditional formal corpus. Another important
aspect is that we wish to understand which blogs
relate to first-hand experience and which represent
second- or third-hand experience.
In order to overcome the challenge of a lack of
microblog training corpora, we used an alternative
relative entropy–based approach that determined
when ngram types (e.g. words, bigrams and trigrams)
in a target window from the microblog corpus differed
significantly in probability from the norm as repre-
sented by a reference corpus.83 Several reference
corpora exist (such as the Brown Corpus), but most have
been created on clean data and assume the grammar is
good. This is not the case for microblog data that con-
tain spelling mistakes, abbreviations and so on.84
Therefore, we use the microblog data set itself as the
reference corpus. For the case of MC1, we used the
first 3 days, while for the Olympic data set, we took a
subset of the real-world blog data.
Our method uses a simple naive estimate for the
probability of each ngram based on its frequency of
use in the particular microblog window or reference
corpus. Let us define PMicro(g) as the probability of the
ngram g in the microblog window and PRef(g) as the
probability of the same ngram in the reference corpus.
With CMicro(g) and CRef(g) representing the frequencies
of the ngram, and NMicro and NRef the total numbers
of ngrams in the respective microblog window or
reference corpus, we define PMicro(g) = CMicro(g)/NMicro
and PRef(g) = CRef(g)/NRef.
Now, we can calculate a relative entropy–based distance
metric used for ranking the ‘unusualness’ of each
ngram g: HDiff(g) = |HMicro(g) − HRef(g)| =
|log2 PRef(g) − log2 PMicro(g)|.
We name this measure codelength difference. From a
compression perspective, this measure is merely the
absolute difference in compression codelengths, where
the costs of encoding the ngram are calculated using
two different naive models: one trained on the micro-
blogs window text and the other trained on the refer-
ence corpus text. The codelength is a measure of the
‘information’ (or surprise) for an ngram compared to
the other ngrams. The codelength difference will be
zero when the probabilities for the ngram in the two
different probability distributions are the same.
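A direct sketch of the codelength-difference calculation defined above:

```java
public class Codelength {
    // Codelength H(g) = -log2 P(g) for an ngram occurring `count`
    // times out of `total` ngrams in a corpus.
    public static double codelength(long count, long total) {
        return -(Math.log((double) count / total) / Math.log(2));
    }

    // Absolute difference in codelengths between the microblog window
    // and the reference corpus: the 'unusualness' HDiff(g) of ngram g.
    public static double codelengthDiff(long countMicro, long totalMicro,
                                        long countRef, long totalRef) {
        return Math.abs(codelength(countMicro, totalMicro)
                      - codelength(countRef, totalRef));
    }
}
```

An ngram that is equally probable in both corpora yields a difference of zero; the larger the value, the more unusual the ngram is in the target window.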
We now iterate through a few examples using the
word ‘fire’ and the MC1 data set. The word ‘fire’ occurs
27,305 times in the 13,560,614 words of the MC1 data
set, so the codelength (HMicro) for encoding it is
HMicro(‘fire’) = −log2 PMicro(‘fire’) =
−log2(27305/13560614) = 8.596. We can compare this
(for example) to standard American English, based on
frequencies from the balanced Brown Corpus: ‘fire’
occurs 207 times in its 1,023,856 words, so the codelength
for ‘fire’ in the Brown Corpus is HBrown(‘fire’) =
−log2 PBrown(‘fire’) = −log2(207/1023856) = 12.272.
We can use the absolute difference between the two
codelength values as a means to measure how unusual
the probability for ‘fire’ is for the MC1 text in compari-
son (in this case) to the Brown Corpus. Therefore,
HDiff(‘fire’) = |HMicro(‘fire’) − HBrown(‘fire’)| =
|8.596 − 12.272| = 3.676.
In our analysis of the MC1 data set, we first ranked all
the unigrams in both the MC1 data set and the Brown
Corpus by codelength difference, after first converting
all text to a 27-character alphabet (by case-folding and then
collapsing all non-letter sequences to a single space). We
found that the top five most ‘unusual’ unigrams from
the texts ranked by this measure are as follows (code-
length difference values are shown in brackets): wow
(9.395), cant (8.742), chills (8.620), que (8.456) and
spill (8.308). An analysis of bigrams was more reveal-
ing. The top five bigrams ranked by codelength differ-
ence were as follows: has caught (9.089), the chills
(8.989), make me (8.669), on fire (8.512) and the flu
(8.289). For trigrams, however, the picture was not as
clear. The top five trigrams are as follows: come down
with (9.358), the united states (8.685), to lose my
(8.251), i was somewhere (8.247) and of the united
(8.161).
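The 27-character normalization step (case-folding, then collapsing every run of non-letter characters to a single space) can be sketched as a single transformation; the ASCII-only alphabet here is an assumption for illustration:

```java
public class Normalize {
    // Reduces text to a 27-character alphabet: the lower-case letters
    // a-z plus space, with each run of non-letters collapsed to one
    // space and leading/trailing spaces trimmed.
    public static String fold(String text) {
        return text.toLowerCase()
                   .replaceAll("[^a-z]+", " ")
                   .trim();
    }
}
```

For example, `fold("Has CAUGHT the flu!!")` yields `"has caught the flu"`, from which unigrams, bigrams and trigrams such as ‘has caught’ can then be counted.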
This analysis importantly reveals some of the defi-
ciencies of using the Brown data set as a reference
corpus, since it is a collection of samples of American
English from the 1960s. There are further limitations of
our approach. One challenge is called the zero-
frequency problem, that is, the method can only be
used for ranking ngrams that occur in both the target
and reference data sets. For example, this is particu-
larly noticeable using the Brown Corpus because none
of the trigrams that contain the word ‘chills’ appear in
the Brown Corpus. However, notwithstanding these
limitations, the codelength metric provides a beneficial
summary mechanism for the microblogs. We can
generate storyboards with salient information, and it helps
us present key snapshots that show the
development of stories in different storyboard panels.
To determine whether the blogs refer to first-hand
(or second- or third-hand) experience, we analyse
the text using a deictic analysis to investigate the
types of words and where they are located in the
microblogs. Our solution is explained further in Pritchard
et al.84 but is based on the Stanford log-linear POS
tagger. This enables us to generate different viewpoints:
first, selecting blogs from the whole data set; second,
those written in the first person; and finally, those
written in the third person.
Finally, several different view projections could be
included, and the list of possible functions is endless.
However, in epSpread, we enable the user to see sev-
eral panels together or zoom into one particular panel.
We utilize a GridBagLayout mechanism to constrain
the panel components in the storyboard.
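GridBagLayout is part of the standard Java AWT/Swing toolkit. A minimal sketch of constraining two panel components in this way follows; the grid positions and weights are chosen purely for illustration and are not epSpread's actual values:

```java
import java.awt.GridBagConstraints;
import java.awt.GridBagLayout;
import javax.swing.JPanel;

public class PanelLayout {
    // Constrains two components within a storyboard panel using
    // GridBagLayout: a map view above a timeline strip. Illustrative
    // weights only.
    public static JPanel build(JPanel map, JPanel timeline) {
        JPanel root = new JPanel(new GridBagLayout());
        GridBagConstraints c = new GridBagConstraints();
        c.fill = GridBagConstraints.BOTH;   // components stretch both ways

        c.gridx = 0; c.gridy = 0;
        c.weightx = 1.0; c.weighty = 0.8;   // map takes most of the height
        root.add(map, c);

        c.gridy = 1;
        c.weighty = 0.2;                    // timeline strip below the map
        root.add(timeline, c);
        return root;
    }
}
```

The weights keep the components in proportion when the whole panel is resized, for example when a panel is enlarged to full-screen.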
epSpread transition – constructing storyboards
We chose to construct the panels on a timeline. The
order of the panels is therefore determined by the
parameters of a particular panel. With each panel cor-
responding to either a specific instant or a period of
time, the system is able to automatically arrange them
chronologically along a timeline to form a storyboard.
This situation was convenient for this data set because
time was an important aspect of the decision-making.
However, this would not always be the case, and there-
fore, this decision is solely a design decision for epSpread
(for the given microblog data sets) rather than being a
principle of storyboarding for VA. In this case, our belief
that this was the right decision was borne out by the suc-
cessful application of the tool to the VAST task.
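The automatic chronological arrangement can be sketched as a sort over the panels' time parameters; the Panel representation below is an assumption chosen for illustration:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class Storyboard {
    // Minimal stand-in for a storyboard panel: a caption plus the start
    // of the time period (or instant) it covers.
    public static class Panel {
        public final String caption;
        public final long startTime;
        public Panel(String caption, long startTime) {
            this.caption = caption;
            this.startTime = startTime;
        }
    }

    // Arranges panels chronologically along the timeline, as epSpread
    // does automatically whenever a panel's time range is adjusted.
    public static List<Panel> arrange(List<Panel> panels) {
        List<Panel> ordered = new ArrayList<>(panels);
        ordered.sort(Comparator.comparingLong((Panel p) -> p.startTime));
        return ordered;
    }
}
```

Because ordering is derived from each panel's time parameter, a panel edited in full-screen mode simply re-sorts back to its correct place when it returns to the storyboard.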
Individual panels on the storyboard can be enlarged
by double-clicking. This makes the chosen panel dis-
play in full-screen; during this mode, the user can adjust
any parameters. This enables the user to focus on one
panel and then return it to the storyboard view to show
the whole story. If the time period or instant is adjusted
when the panel is full-screen, then the panel will auto-
matically return to the correct location on the timeline.
The process of producing a storyboard is typically
iterative – panels are added and used to analyse partic-
ular events or trends. Perhaps initially, a single panel
will show the results of multiple events or trends, but
additional panels can be created if this improves the
clarity of explanation. Figure 6 shows an example of
constructing a storyboard that describes a sequence of
conventions that take place in the MC1 data set. First,
a single panel is used to identify, from the stream-
graph, the days where many messages contain the term
‘convention’ (Figure 6(a)). Since these messages can
be seen to be spread over a considerable time period,
additional storyboard panels are created, and the time
slider for each was adjusted to cover just a single con-
vention (Figure 6(b)). Then, each panel is examined
in more detail: the nature of the convention is deter-
mined from the textual analysis word cloud and exami-
nation of individual messages, and this is recorded as
an annotation; additional queries are used to check for
other occurrences of the convention subject in the data
set, and cross-correlation is used to check for more
than one convention on the same day (Figure 6(c)).
epSpread annotation
Panels can be annotated through the use of a simple
Post-it note metaphor – short text notes can be placed
anywhere on the panel to highlight important or inter-
esting features shown in the individual visualizations.
We not only consider annotation as a low-level task but
also support higher level summaries as captions.
Finally, each panel on the storyboard is captioned with
a brief summary of the event it describes (Figure 6(d)).
We distinguish captioning from annotation. Captions
are designed to indicate the role the panel plays in con-
structing the story of the analysis and hence allow for
a textual summary of the storyboard to be produced if
required.
epSpread interaction
Interaction is provided through various means: the
user can choose what data to display by searching the
data for different words, can select different view
options (such as the scatter plot view or the heat-map
view) and can select a subset of the displayed
microblogs from the map view.
epSpread separability
epSpread has been implemented using Java and the
Processing libraries. We have used an extensible panel
design. Each storyboard panel acts as a container for a
number of different visualization and querying tools:
this has the advantage that it allows us to wrap existing
Figure 6. Constructing a storyboard for conventions in the MC1 data set: (a) identifying all conventions using the streamgraph, (b) creating a panel for each convention, (c) exploring, querying and annotating within each panel and (d) captioning panels on the storyboard with summaries. The completed storyboard can be used as an analytic product or stored for further exploration.
libraries and code, rather than develop from scratch.
For example, to produce a streamgraph, we simply
wrap the code supplied by Byron and Wattenberg.
While in our current implementation, each story-
board contains the same visualization elements, it is
easy to see that for other data sets or tasks, different
elements or combinations of elements might be
required. Panels also need not all contain the same ele-
ments, as long as they still adhere to the design princi-
ples discussed in the ‘Principles of storyboarding for
VA’ section and annotation could be made consider-
ably more complex than the simple text notes currently
supported, with a number of comic strip metaphors
and devices that could be usefully applied. This could
also be coupled with better support for storytelling
within the tool.
Case study: VAST 2011 MC1
To demonstrate the use of storyboarding in epSpread,
an example is presented here: analysis of the data for
VAST Challenge 2011 (MC1). In this scenario, a fic-
tional city, Vastopolis, is suffering from an epidemic.
Symptoms reported are largely flu-like and include
fever, chills, sweats, nausea and diarrhoea. Two data
sets are provided. The first is a set of a million micro-
blog messages covering a period of 20 days from 30
April to 20 May 2011. Each message includes a user
ID, a date, a GPS location and a short text message.
The second contains information about the city: popu-
lation statistics, maps, weather data and so on. The
tasks set in MC1 are as follows: first, to identify the
origin of the epidemic and second, to determine its
spread and transmission, with a view to directing emer-
gency resources appropriately.
Identifying the origin through storyboard investigation
Our experience with the VAST Challenge 2011
demonstrated the benefit of collaboratively interacting
with the storyboards. During our development period,
group analysis sessions were built around the presenta-
tion of hypotheses by different group members.
Figure 7 shows epSpread on the large display, to
improve accessibility for the group discussions. Each
hypothesis was mapped to a storyboard on a larger
screen display and discussed by the group. Additional
panels and annotation were added to clarify the ideas
and sequence of events. In the time between meetings,
group members could update their own storyboards to
reflect their understanding of events. This discussion
process resulted in a very refined narrative that was
presented as our solution to the Challenge.
Figure 7. Using a large pixel display with epSpread. The resolution of the screen is 7600 × 4400, and storyboard panels are visible even when shrunk and placed on the timeline. Storyboarding with such displays may be more effective than on smaller, lower resolution screens.
While it is possible that the epidemic could cover
the entire period, simply observing the word cloud to
see important terms while dragging the time slider
across the 20-day period reveals that this is not the
case. In fact, the epidemic strikes over the last 3 days,
from 18 May onwards, and this can be clearly seen
from both the word cloud and by performing searches
for the provided symptoms. Simple keyword searches
for symptoms return a number of messages that are
merely reporting on illness of someone else. Excluding
these using the first-person filter discussed in the
‘epSpread Viewpoints’ section gives a much clearer
picture, and by adjusting the slider to cover these 3
days and adding some annotation, this knowledge can
be clearly displayed on our storyboard.
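The first-person filter referred to here can be approximated with a pronoun heuristic. The actual rules used in epSpread are not given in this section, so the following Python sketch is an illustrative guess, not the tool's implementation:

```python
import re

# Heuristic: a message is "first-person" if it contains a first-person
# pronoun as a standalone word (assumption; real rules may be richer).
FIRST_PERSON = re.compile(r"\b(i|i'm|i've|me|my)\b", re.IGNORECASE)

def first_person_only(messages):
    """Keep only messages where the author appears to report symptoms
    themselves, dropping reports about someone else's illness."""
    return [m for m in messages if FIRST_PERSON.search(m)]

msgs = ["I'm burning up with fever",
        "Mia has come down with a fever",
        "My chills are getting worse"]
print(first_person_only(msgs))  # drops the third-person report about Mia
```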
By searching for each of the symptoms given in the
task description, we can see that there are two distinct
patterns: fever, chills and sweats spread eastwards
(which matches the wind direction), while nausea,
vomiting and diarrhoea spread towards the south-west
(which matches the flow of the river). Again, these
patterns can be highlighted, annotated and displayed
as storyboard panels.
However, determining a hypothesis for the cause of
the epidemic engendered more discussion, over several
different storyboard panels. Mapping the direction of
spread backwards seems to indicate a possible com-
mon cause. Various storyboards were proposed and
investigated during our team meetings. Indeed, swap-
ping back and forth between the full-screen view and
the storyboard view was beneficial to finding the correct hypothesis. If we look at 17 May in more
detail, we can see that terms such as ‘explosion’,
‘truck’ and ‘spilling’ occur more often than expected.
If we search for each of these terms, we find mostly
second-hand references to an explosion in one district
of Vastopolis and to a truck accident on a bridge,
where a cargo is spilt, at about 11 am. With several of our team independently reaching the same conclusion, and with the storyboards investigated collaboratively, we agreed on the best hypothesis for ground zero of the epidemic.
The storyboards enabled us to discuss different sce-
narios, eliminate potentially wrong hypotheses and
drill down into the detail of the hypotheses. Indeed, it
is immediately noticeable, by looking at where the
panels showing the two sets of symptoms are posi-
tioned on the timeline, that gastrointestinal symptoms
are reported later, by a day or so. This implies either
two separate illnesses (unlikely given they spread from
the same source point) or two different means of
spread. Given the weather conditions, it seems likely
that the fever symptoms were spread by an airborne
medium. The additional information provided for the
task states that drinking water is sourced from the river
(which flows north to south) and from nearby lakes.
From several storyboards, the hypothesis then was that
the gastrointestinal symptoms are a result of contami-
nated drinking water. Figure 8 shows a storyboard for
the temporal patterns of the different symptoms.
Establishing transmission and using different viewpoints
Again, the storyboarding techniques helped to locate
potential transmission of the epidemic. Seeing differ-
ent ‘viewpoints’ in different storyboards was an impor-
tant feature. This enabled us to hypothesize about the
spread of disease. With airborne and waterborne trans-
missions seeming likely, the final step is to check for
person-to-person transmission of illness. As mentioned
in the ‘Identifying the origin through storyboard inves-
tigation’ section, many microblog messages containing
symptom keywords are third-person references – ‘Mia
has come down with a fever’. These messages are likely
to be referring to friends or family members. We utilized different panels with different viewpoints: one querying just these third-person references and another with messages reporting fever in the first person. Through these viewpoints, we can ask the question 'Does anyone who talks about a friend or family
member being ill later fall ill themselves?’ through the
interface. In fact, there is no overlap at all between the
two groups. This gives weight to our hypothesis that
the illness is not transmitted between people.
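The check behind this question — does anyone who reported a friend's illness later fall ill themselves? — reduces to an intersection over the two groups of user IDs (ignoring ordering in time, which the full analysis would also consider). A minimal Python sketch with invented user IDs:

```python
def later_fell_ill(third_person_reporters, first_person_ill):
    """Users appearing in both groups: they tweeted about someone
    else's illness and also reported symptoms themselves."""
    return set(third_person_reporters) & set(first_person_ill)

reporters = {"u17", "u42", "u99"}  # tweeted about a friend/family member
sufferers = {"u03", "u55"}         # tweeted first-person symptoms
print(later_fell_ill(reporters, sufferers))  # → set(): no overlap
```

An empty intersection, as found in the data, argues against person-to-person transmission.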
Case study: London Olympics
Following the VAST Challenge, we extended the sys-
tem to support real microblogging data from Twitter,
collected a data set of messages about the London
2012 Olympics and analysed it. In this section, we first
describe how the data were sourced and then consider
how the geolocation of tweets was handled before
showing some extensions to epSpread and presenting
some examples of the type of analysis we performed
on these data.
Twitter messages
Twitter allows access to its platform through a set of
application programming interfaces (APIs), centred
around four objects: Tweets, Users, Entities and
Places. Two separate resources from the REST 1.1
API – Search and Streaming – are of most relevance to
the data collection task. Each has constraints on its
use: the Search API is limited to tweets up to 7 days
old and returns a maximum of 1500 tweets per query (100 per page over 15 pages). Search requests
are limited to 180 queries per rate-limiting window
(typically 15 min, but varies in times of heavy traffic),
and complex queries may also be limited. Search
queries can include geographic location as a para-
meter. Twitter states explicitly in the API documenta-
tion that not all tweets will be indexed or made
available via the search interface.
The Streaming API provides only tweets as they are
posted – there is no historical search capability. Once a
connection is established, a feed of tweets is delivered
without requirement for polling or for rate limiting.
However, the Streaming API only allows access to a
sample of tweets: typically around 1% of the total
traffic. Firehose (100% of all tweets) access is
restricted to selected commercial partners (largely
search engines). Larger samples and historical tweets
are also available through commercial partners such
as Gnip and DataSift. We gathered data about the
London 2012 Summer Olympics using both APIs.
The same search terms – ‘olympic’, ‘olympics’,
‘paralympic’, ‘paralympics’, ‘para-olympic’, ‘para-
olympics’, ‘london 2012’, ‘london2012’ and
‘#games’ – were used in both cases.
Geolocation with Twitter. The most recent version of
Twitter’s location API has moved beyond merely
reporting latitude and longitude to attempting to
aggregate data into Places. A Place is a geographical
area defined by a bounding polygon with information
such as country, type of place (e.g. city) and a set of
other optional attributes, which can give associated
information such as a hierarchy of Place inclusion
(city, county, country) and the street address. Location
is disabled for users by default, and an explicit opt-in
is required before any location information can be
added to tweets.
When enabled, the default for tweets is to show only
place information – latitude and longitude provided by
the Twitter client are reverse-geocoded to a location –
while Twitter stores the exact coordinates for 6 months
to improve the accuracy of its geolocation systems. All
past location data can be deleted by the account holder
at any point. Even when opted in, users can choose
whether or not to share location information on a per-
tweet basis.
Twitter’s Search and Streaming APIs behave slightly
differently with regard to geolocation data. If a location
Figure 8. Storyboard for the spread of the epidemic. Spread shown clockwise from top left: nausea, vomiting, diarrhoea and abdominal pain. While the spread pattern is the same, the temporal pattern differs and this is shown by linking each story panel to the timeline.
is specified as a filter on the search, to return all tweets
in a specified area, then the Search API looks first at
the latitude and longitude of a tweet, if provided, to
see whether it falls within the area. If those coordinates
are not provided but the Place field is populated, then
any overlap between the bounding box of the place
and the search area will result in a match. Finally, if
neither coordinates nor place is provided, the location
given in the user’s profile is considered, and the tweet
returned if geocoding this location produces coordi-
nates within the search area. The behaviour in the case
of the Streaming API is the same in the first two cases,
but user location is not considered, and hence, tweets
without coordinates or place are never returned if the
stream is filtered on location.
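The precedence rules just described can be written down directly as a decision function. The sketch below paraphrases the documented behaviour with simplified types: the area test, the Place bounding-box overlap test and the profile geocoder are injected as callables so the example stays self-contained (all names are our own, not Twitter API identifiers):

```python
def tweet_in_area(tweet, area_contains, boxes_overlap, geocode,
                  streaming=False):
    """Decide whether a tweet matches a location filter, following the
    precedence the Search API applies: exact coordinates first, then
    the Place bounding box, then (Search API only) the profile
    location. The Streaming API skips the profile-location fallback."""
    if tweet.get("coords") is not None:
        return area_contains(*tweet["coords"])
    if tweet.get("place_box") is not None:
        return boxes_overlap(tweet["place_box"])
    if streaming:
        return False  # Streaming API never considers the profile location
    coords = geocode(tweet.get("profile_location", ""))
    return coords is not None and area_contains(*coords)

# Rough bounding box for London (assumption for illustration).
in_london = lambda lat, lon: 51.3 < lat < 51.7 and -0.5 < lon < 0.3
any_overlap = lambda box: True  # stub: treat every Place box as overlapping
lookup = {"London, UK": (51.5, -0.1)}.get

t = {"coords": None, "place_box": None, "profile_location": "London, UK"}
print(tweet_in_area(t, in_london, any_overlap, lookup))                  # True
print(tweet_in_area(t, in_london, any_overlap, lookup, streaming=True))  # False
```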
Collating locations. Tweets were collected for the
period from 14 July 2012 to 25 October 2012, to cover
both the Olympics and Paralympics and some of the
aftermath. The Streaming API produced only a tiny
number of tweets with geolocation, while using the
Search API, we were able to gather a total of
2,399,516 tweets with some form of location using a criterion of 'within 5000 miles of London'. Of these,
just 42,750 had latitude and longitude specified explicitly, an additional 2,584 had place information and
42,789 had latitude and longitude in the location field
of the user’s profile, set by the mobile Twitter client.
The remaining tweets had location information only
as free text in the user’s profile.
Twitter’s geocoding API was used to retrieve the
centre of the bounding box for the 910 distinct places
contained in those tweets with only place information,
and these were used as the coordinates. The
GeoNames geographical database was used to geo-
code the 104,970 locations retrieved from profiles. By
combining these, we were able to collate location
information for a total of 211,583 tweets.
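The collation step combines the three location sources per tweet in a fixed priority order. A minimal Python sketch of such a fallback resolver, with invented lookup tables standing in for the Twitter geocoding results and the GeoNames database:

```python
def resolve(tweet, place_centres, geonames):
    """Pick coordinates for a tweet in priority order: explicit
    lat/lon, then the centre of its Place bounding box, then a
    GeoNames lookup of the free-text profile location. Returns None
    when nothing resolves."""
    if tweet.get("coords"):
        return tweet["coords"]
    if tweet.get("place_id") in place_centres:
        return place_centres[tweet["place_id"]]
    return geonames.get(tweet.get("profile_location"))

place_centres = {"p-london": (51.51, -0.12)}      # from Twitter's geocoding API
geonames = {"Cardiff, Wales": (51.48, -3.18)}     # from the GeoNames database
tweets = [{"coords": (51.5, -0.1)},
          {"place_id": "p-london"},
          {"profile_location": "Cardiff, Wales"},
          {"profile_location": "somewhere nice"}]  # unresolvable free text
located = [t for t in tweets if resolve(t, place_centres, geonames)]
print(len(located))  # → 3
```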
Extending epSpread
The core functionality of epSpread was extended in a
number of ways to better support analysis of this new
data set. The static map was replaced with a per-panel
dynamic map using the Unfolding Maps library85 for
Java, using an appropriate low-contrast map provider,
and annotations could be made to map points rather
than simply screen positions. The word cloud now
contains two mappings: text size still maps to the
results of our text analysis codelength difference
metric, but words are now coloured by frequency on
that day. This helps to identify words that are surpris-
ing on a day simply because they are very infrequent
words in the reference corpus, for which we used
tweets sent in the first week. We also changed the
behaviour of panels to allow more control over compo-
sition: an individual view (map, streamgraph, word
cloud) can now be given prominence when the panel
is shrunk back to the storyline.
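The exact codelength difference metric is not spelled out in this section, but the underlying idea — words that take fewer bits to encode under the day's distribution than under the reference corpus are surprising — can be sketched with log-probabilities. The add-one smoothing below is our assumption, not necessarily the metric epSpread uses:

```python
import math

def codelength_difference(word, day_counts, ref_counts):
    """Bits saved encoding `word` under the day's distribution versus
    the reference corpus: large positive values mean the word is
    unexpectedly frequent today (add-one smoothing is an assumption)."""
    day_total = sum(day_counts.values())
    ref_total = sum(ref_counts.values())
    p_day = (day_counts.get(word, 0) + 1) / (day_total + 1)
    p_ref = (ref_counts.get(word, 0) + 1) / (ref_total + 1)
    return math.log2(p_day) - math.log2(p_ref)  # = codelen_ref - codelen_day

day = {"sprint": 120, "olympics": 300}  # invented counts for one day
ref = {"sprint": 2, "olympics": 310}    # invented reference-week counts
print(codelength_difference("sprint", day, ref) >
      codelength_difference("olympics", day, ref))  # True: 'sprint' surprises
```

Under the extension described above, this score would drive word size in the cloud, while the raw count for the day would drive its colour.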
Olympic overview
We used the updated epSpread system to analyse the
Twitter data to explain message traffic during the
London 2012 Summer Olympics in the context of
the events taking place during the games. We began by
constructing a panel that showed all tweets, by search-
ing the message contents for the original search terms
(since by definition, one or more of these terms is in
the text for all messages). We then adjusted the time
slider to restrict the view to only the period from 2
days before the Opening Ceremony (Friday, 27 July)
to 2 days after the Closing Ceremony (Sunday, 12
August).
We then fixed this panel on the storyboard with
focus given to the streamgraph to act as context while
conducting more detailed work. From this panel, we
could see that the broad pattern is that traffic spikes
for the opening, then declines gradually until the clos-
ing ceremony, where again it spikes briefly. This pat-
tern is as might be expected: different events are
popular in different countries, but the opening and
closing have universal appeal. However, alongside this,
clear bulges in traffic can be seen on 31 July, 4 August,
7 August and 9 August.
Detecting events
For each of these dates, a panel was constructed and
the word cloud examined to help determine the cause
of the traffic. For 31 July, the word cloud view is most
revealing – the swimmer Michael Phelps became the
most decorated Olympian ever around this time, and
this is reflected in messages on that day. For 4 August,
the terms in the cloud indicate that the home nation
had a successful day. For the 6th, the cloud indicates that the term 'sprint' is unusually popular on that day, and examining tweets shows that the 100 m final took place the
day before. Finally, on 9 August, drilling down using
the term ‘box’ from the cloud leads to the discovery
that Team GB gained its first-ever women’s boxing
gold medal.
While the process of discovery for each event is similar, the composition of panels into a storyboard differs: each full panel maintains separability, but the views displayed when the entire storyboard is shown are those that best summarize, in combination with the caption, the information on that panel. Figure 9
shows the storyboard constructed for this task. For 31
July, a word cloud is shown, while for 4 and 7 August,
a streamgraph is displayed, and a map view shows the
localized response on 9 August.
Discussion and lessons learnt
The storyboarding metaphor worked well in the two
case studies that we used. It enabled a group of researchers to work together to visually analyse the MC1 data set in particular, discuss various hypotheses and agree on views that demonstrated specific
answers. It certainly helped us as a team to create the
correct answers for the VAST Challenge 2011 data set
and organize the report material. In fact, our answer
consisted almost entirely of panels from the storyboard
visualization. Additionally, the storyboard structure
enabled us to work both individually and collabora-
tively. The use of a large display helped us to discuss
different scenarios as a group, and the use of the full-
screen mode enabled us to focus on specific details (in
the full-screen) and view the progression of different
hypotheses (when docked in the storyboard
configuration).
It was also useful to consider the storyboard con-
cept from the point of view of its use in the film indus-
try. This ideation guided our development and is
summarized in the six principles of storyboarding for
VA. One aspect that was fervently discussed as a group
was the level of abstraction or simplification of the
panels. While it is clear that film storyboards are typi-
cally sketchy representations that are hand-drawn, and
act to demonstrate uncertain or unfinished decisions,
the design goal of our tool was to learn from the idea
of storyboarding rather than mimic every detail. While
sketchy visualization renderings are possible with
Processing, we do not believe that they are useful for
every situation. In fact, our focus is on the challenge of achieving the right level of detail rather than on a sketchy appearance. We found that by reducing the number of plotted points (through filtering or cropping), the display became clearer.
It was not always useful to include every component
of the panel in the storyboard view. In the development
of epSpread for MC1, we included every component
in each of the panels. However, some components
were more useful in the full-screen view because they
Figure 9. Storyboard for Twitter traffic during the London 2012 Summer Olympic Games. The spikes resulting from the opening and closing ceremonies are clear, but the period between shows a number of bulges indicating surges in traffic from events. Some of these events have been analysed, and summaries were presented using the most appropriate view. Double-clicking any panel brings it back to full size and shows the full views together with any annotations.
provided context information and additional controls.
But when docked in the storyboard, some of the com-
ponents cluttered the visualization rather than adding
value to the display. For instance, some words within
the tag clouds were difficult to read when docked in
the storyboard. Therefore, for the Olympic Games
data set, we found that it was more beneficial to display
fewer components in each panel. In fact, it was often more useful to display just a single component when a panel was docked in the storyboard.
One aspect that also helped with the clarity of the
view was summarization. We achieved different repre-
sentations by using different viewpoints (such as swapping between first-person and third-person views of the microblog data). This helped to clarify the visual
depictions because the first-person view held less data.
It was therefore clearer to understand the trends of this
subset. Furthermore, we believe that storyboarding
would benefit from simplification techniques such as
Document Cards,24 which we leave to future work.
Another aspect of our implementation that was use-
ful was the off-screen PDF Processing renderer. This enabled us to create high-quality screenshots, saved directly from the screen and incorporated into the report. This idea could be developed
further; again, this is part of our future plans.
We also found that it was useful to annotate and
include captions on the panels. This enabled the storyboards to be named, which made the group discussions more specific. For example, rather than talking
about ‘the panel with the group of red tweets’, we could
provide the panel with an identity. However, through
our development and SCRUM meetings, there were
always several requests for additional annotation capa-
bility. These requests included functions to search anno-
tations and to have separate lists of annotations.
However, there are some drawbacks to our design
decisions. We took the decision to place the story-
boards on a timeline. Although this was a useful fea-
ture for both the MC1 and Olympic Games data sets,
because it enabled us to discuss the specific order by
which events occurred, we believe that this would not
be useful for all data sets. We certainly found it advan-
tageous to change the position of the panel merely by
changing the time parameter, and it enabled panels
with a wide time range to be easily discussed; however,
on reflection, some of the team felt that it was often
distracting and took up too much screen space.
Another feature that we developed into epSpread
was that any interaction that could be achieved in the
full-screen mode was available when docked in the
storyboard configuration. Because the time slider in a
panel determines the position on the storyboard, it is
thus possible to interactively change the order of panels by moving a tiny slider. However, in practice,
this was not used. It was difficult to interact with the
interface when the panels were in the storyboard con-
figuration, merely because the buttons and sliders
were extremely small. Potentially though, this could be
useful on an extremely large powerwall screen because
the control sizes would be larger for an individual
storyboard.
Although we found that the storyboard methods
worked well in collaborative group settings, the inter-
face was only operated by one user at any time. We
used a wireless mouse to enable control of the interface
to be exchanged between different participants, and
we amicably handed over control to another collabora-
tor. However, in practice, only a small subset of parti-
cipants operated the tool in the group meetings. We
believe that it would have been good to encourage
more group members to operate epSpread in the colla-
borative environment by making it easier to collabora-
tively interact with the tool. A touch-table interface
may be more accommodating for such group interac-
tion, and although the wireless mouse was useful, other
technologies may work well with the visual storyboard.
Conclusions and future work
In this article, we introduced a storyboard metaphor
for VA. We introduced six key principles of design –
composition, viewpoint, transition, annotation, inter-
activity and separability – and demonstrated how
storyboards can help address visual analytic challenges
through two case studies. The metaphor addresses the
dual goals of visual analysis problems: performing the
analysis and communicating the results of the analysis.
More broadly, conducting and presenting analyses
through storyboards has the potential to improve sense-
making processes. In intelligence analysis, the issue of
provenance is becoming of greater importance – both
for decision-making and for post-decision evaluation –
and by forcing visual presentation of all steps in the
analysis, storyboards map provenance as an implicit
part of the workflow. Storyboards can also provide
summaries at different levels of detail and classification:
from a completed storyboard, text-only reports could
be generated from captions or annotations; depending
on data privacy or classification issues, the storyboard
could be presented as a mixture of full panels and
caption-only summaries. We anticipate that this multi-
level presentation of analysis and analytic product could
prove useful and important in the future.
Acknowledgements
We acknowledge the reviewers for their helpful com-
ments. epSpread is publicly available from http://
github.com/RickWalker/epSpread for academic use.
Declaration of conflicting interests
The authors declare that there is no conflict of interest.
Funding
The work presented in this paper was supported by
RIVIC (the Wales Research Institute of Visual
Computing) funded by Higher Education Funding
Council for Wales (HEFCW).
References
1. Duff PA and Bell JS. Narrative research in TESOL: nar-
rative inquiry: more than just telling stories. TESOL
Quart 2002; 36(2): 207–213.
2. Pirolli P and Card S. The sensemaking process and
leverage points for analyst technology as identified
through cognitive task analysis. In: Proceedings of interna-
tional conference on intelligence analysis, McLean, VA,
USA, 3–5 May 2005, pp. 2–4.
3. Klein G, Moon B and Hoffman RR. Making sense of
sensemaking 2: a macrocognitive model. IEEE Intell Syst
2006; 21(5): 88–92.
4. Kireyev K, Palen L and Anderson K. Applications of
topics models to analysis of disaster-related Twitter data.
In: NIPS workshop on applications for topic models: text
and beyond, Whistler, Canada, 11 December 2009.
5. Takeuchi H and Nonaka I. The new new product devel-
opment game. Harvard Bus Rev 1986; 64(1): 137–146.
6. Cook KA and Thomas JJ. Illuminating the path: the
research and development agenda for visual analytics. Rich-
land, WA: Pacific Northwest National Laboratory
(PNNL), 2005.
7. Keim DA, Kohlhammer J, Ellis G, et al. Mastering the information age: solving problems with visual analytics. Eurographics Association, 2010.
8. Keim D, Andrienko G, Fekete JD, et al. Visual analytics:
definition, process, and challenges. In: Information
Visualization. Berlin, Heidelberg: Springer–Verlag, 2008,
pp. 154–175.
9. Gershon N and Page W. What storytelling can do for
information visualization. Commun ACM 2001; 44(8):
31–37, http://doi.acm.org/10.1145/381641.381653
10. Segel E and Heer J. Narrative visualization: telling stories
with data. IEEE T Vis Comput Gr 2010; 16(6): 1139–1148.
11. Eisner W. Comics and sequential art. New York: W. W.
Norton, 2008.
12. McCloud S. Understanding comics. New York: Harper
Perennial, 1994.
13. Aigner W, Miksch S, Schumann H, et al. Visualization of
time-oriented data. Berlin: Springer, 2011.
14. Hart J. The art of the storyboard: storyboarding for film, TV,
and animation. UK: Focal Press, 1999.
15. Agrawala M, Phan D, Heiser J, et al. Designing effective
step-by-step assembly instructions. ACM T Graphic
2003; 22(3): 828–837.
16. Truong KN, Hayes GR and Abowd GD. Storyboarding:
an empirical determination of best practices and effective
guidelines. In: Proceedings of the 6th conference on designing
interactive systems, University Park, PA, USA, 26–28 June,
2006, pp. 12–21. New York, NY, USA: ACM.
17. Bailey BP, Konstan JA and Carlis JV. DEMAIS: design-
ing multimedia applications with interactive storyboards.
In: Proceedings of the ninth ACM international conference
on multimedia, Ottawa, ON, Canada, 30 September–05
October 2001, pp. 241–250. New York, NY, USA:
ACM.
18. Landay JA. SILK: sketching interfaces like krazy. In:
Conference companion on human factors in computing sys-
tems: common ground, Vancouver, BC, Canada, 13–18
April 1996, pp. 398–399. New York, NY, USA: ACM.
19. Herranz L, Calic J, Martinez JM, et al. Scalable comic-
like video summaries and layout disturbance. IEEE T
Multimedia 2012; 14(4): 1290–1297.
20. Goldman DB, Curless B, Salesin D, et al. Schematic
storyboarding for video visualization and editing. In:
SIGGRAPH ‘06: ACM SIGGRAPH 2006 papers, 2006,
pp. 862–871. New York: ACM, http://doi.acm.org/
10.1145/1179352.1141967
21. Atasoy B and Martens JB. STORIFY: a tool to assist
design teams in envisioning and discussing user experience.
In: CHI EA ‘11: CHI ‘11 extended abstracts on human factors
in computing systems, 2011, pp. 2263–2268. New York:
ACM, http://doi.acm.org/10.1145/1979742.1979905
22. Roberts JC. State of the art: coordinated & multiple
views in exploratory visualization. In: CMV ’07: fifth
international conference on coordinated and multiple views in
exploratory visualization, Zurich, Switzerland, 2 July
2007, pp. 61–71. New York, NY, USA: IEEE.
23. Wood J, Isenberg P, Isenberg T, et al. Sketchy rendering
for information visualization. IEEE T Vis Comput Gr
2012; 18(12): 2749–2758.
24. Strobelt H, Oelke D, Rohrdantz C, et al. Document
cards: a top trumps visualization for documents. IEEE T
Vis Comput Gr 2009; 15(6): 1145–1152.
25. Chen C. Top 10 unsolved information visualization
problems. IEEE Comput Graph 2005; 25(4): 12–16.
26. Zhao D and Rosson M. How and why people Twitter:
the role that micro-blogging plays in informal communi-
cation at work. In: GROUP ‘09: proceedings of the ACM
2009 international conference on supporting group work,
Sanibel Island, FL, USA, 10–13 May 2009, pp. 243–
252. New York: ACM.
27. Oulasvirta A, Lehtonen E, Kurvinen E, et al. Making the
ordinary visible in microblogs. Pers Ubiquit Comput 2010;
14(3): 237–249.
28. Nichols J, Mahmud J and Drews C. Summarizing sport-
ing events using Twitter. In: IUI ‘12: proceedings of the
2012 ACM international conference on intelligent user inter-
faces, Lisbon, Portugal, 14–17 February 2012, pp. 189–
198. New York: ACM.
29. Kwak H, Lee C, Park H, et al. What is Twitter, a social
network or a news media? In: WWW ‘10: proceedings of
the 19th international conference on world wide web,
Raleigh, NC, USA, 26–30 April 2010, pp. 591–600.
New York: ACM.
30. Castillo C, Mendoza M and Poblete B. Information
credibility on Twitter. In: Proceedings of the 20th
international conference on world wide web, Hyderabad,
India, 28 March–01 April 2011, pp. 675–684. New York:
ACM.
31. Rogers EM. Diffusion of innovations. New York, NY,
USA: Simon & Schuster, 1995.
32. Gomez-Rodriguez M, Leskovec J and Krause A. Infer-
ring networks of diffusion and influence. ACM Trans
Knowl Discov Data 2012; 5(4): 21:1–21:37.
33. Lanier J. You are not a gadget: a manifesto. New York:
Alfred A. Knopf, Random House, Inc., 2010.
34. Sharifi B, Hutton MA and Kalita J. Summarizing micro-
blogs automatically. In: HLT ‘10: human language tech-
nologies: the 2010 annual conference of the North American
chapter of the association for computational linguistics, 2010,
pp. 685–688. Stroudsburg, PA: Association for Compu-
tational Linguistics, http://dl.acm.org/citation.cfm?id=
1857999.1858099
35. Chakrabarti D and Punera K. Event summarization
using tweets. In: Proceedings of the 5th international AAAI
conference on weblogs and social media (ICWSM), 2011,
http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/
paper/view/2885
36. Glance N, Hurst M, Nigam K, et al. Deriving marketing
intelligence from online discussion. In: KDD ‘05: pro-
ceedings of the eleventh ACM SIGKDD international con-
ference on knowledge discovery in data mining, Chicago,
IL, USA, 21–24, August 2005, pp. 419–428. New York:
ACM.
37. Shamma DA, Kennedy L, Churchill EF, et al. Peaks and
persistence: modeling the shape of microblog conversa-
tions. In: CSCW ‘11: proceedings of the ACM 2011 confer-
ence on computer supported cooperative work, 2011, pp.
355–358. New York: ACM, http://doi.acm.org/10.1145/
1958824.1958878
38. Salton G and McGill MJ. Introduction to modern informa-
tion retrieval. New York: McGraw-Hill Inc., 1986.
39. Marcus A, Bernstein MS, Badar O, et al. TwitInfo:
aggregating and visualizing microblogs for event explora-
tion. In: CHI ‘11: proceedings of the 2011 annual conference
on human factors in computing systems, Vancouver, BC,
Canada, 07–12, May 2011, pp. 227–236. New York:
ACM.
40. Bertini E, Buchmuller J, Fischer F, et al. Visual analytics
of terrorist activities related to epidemics. In: IEEE con-
ference on visual analytics science and technology, Provi-
dence, RI, 23–28 October 2011, pp. 329–330. New York,
NY, USA: IEEE.
41. Braunstein E, Gorg C, Liu Z, et al. Jigsaw to save Vasto-
polis. In: IEEE conference on visual analytics science and
technology, Providence, RI, 23–28 October 2011, pp. 325–
326. New York, NY, USA: IEEE.
42. ap Cenydd L, Walker R, Pop S, et al. epSpread – story-
boarding for visual analytics. In: IEEE conference on visual
analytics science and technology, Providence, RI, 23–28
October 2011, pp. 311–312. New York, NY, USA: IEEE.
43. Cheong M and Lee V. A study on detecting patterns in
Twitter intra-topic user and message clustering. In: ICPR
‘10: proceedings of the 20th international conference on
pattern recognition, Istanbul, Turkey, 23–26 August
2010, pp. 3125–3128. Washington, DC: IEEE Computer
Society.
44. Ho CT, Li CT and Lin SD. Modeling and visualizing
information propagation in a micro-blogging platform. In: 2011
international conference on advances in social networks analysis
and mining (ASONAM), Kaohsiung, Taiwan, 25–27 July
2011, pp. 328–335. New York, NY, USA: IEEE.
45. Narayan S and Cheshire C. Not too long to read: the tldr
interface for exploring and navigating large-scale discus-
sion spaces. In: 2010 43rd Hawaii international conference
on system sciences (HICSS), Koloa, Kauai, Hawaii, 5–8
January 2010, pp. 1–10. New York, NY, USA: IEEE.
46. Biuk-Aghai RP. Visualizing co-authorship networks in
online Wikipedia. In: ISCIT ‘06: international symposium
on communications and information technologies,
Bangkok, Thailand, 18–20 October 2006, pp. 737–742.
New York, NY, USA: IEEE.
47. Fisher D, Smith M and Welser HT. You are who you talk
to: detecting roles in Usenet newsgroups. In: HICSS ’06:
proceedings of the 39th annual Hawaii international confer-
ence on system sciences, 2006, Kauai, Hawaii, 4–7 January
2006, vol. 3, p. 59b. New York, NY, USA: IEEE.
48. Welser HT, Gleave E, Fisher D, et al. Visualizing the sig-
natures of social roles in online discussion groups. J Soc
Struct 2007; 8(2): 564–586.
49. Welser HT, Cosley D, Kossinets G, et al. Finding social
roles in Wikipedia. In: Proceedings of the 2011 iConference,
Seattle, WA, USA, 08–11 February 2011, pp. 122–129.
New York, NY, USA: ACM.
50. Perer A and Shneiderman B. Balancing systematic and
flexible exploration of social networks. IEEE T Vis Com-
put Gr 2006; 12(5): 693–700.
51. Heer J and Boyd D. Vizster: visualizing online social net-
works. In: INFOVIS 2005: IEEE symposium on informa-
tion visualization, Minneapolis, MN, USA, 23–25
October 2005, pp. 32–39. New York, NY, USA: IEEE.
52. Hansen D, Smith MA and Shneiderman B. Event-
Graphs: charting collections of conference connections.
In: 2011 44th Hawaii international conference on system
sciences (HICSS), Kauai, Hawaii, 4–7 January 2011, pp. 1–
10. New York, NY, USA: IEEE.
53. Smith MA and Fiore AT. Visualization components for
persistent conversations. In: CHI ‘01: Proceedings of the
SIGCHI conference on human factors in computing systems.
Seattle, WA, USA, 31 March–05 April 2001, pp. 136–
143. New York: ACM.
54. Turner TC, Smith MA, Fisher D, et al. Picturing Use-
net: mapping computer-mediated collective action.
J Comput-Mediat Comm 2005; 10(4).
55. Engdahl B, Koksal M and Marsden G. Using treemaps
to visualize threaded discussion forums on PDAs. In:
CHI EA ‘05: CHI ‘05 extended abstracts on human factors
in computing systems, Portland, OR, USA, 02–07 April
2005, pp. 1355–1358. New York: ACM.
56. Havre S, Hetzler E, Whitney P, et al. ThemeRiver: visua-
lizing thematic changes in large document collections.
IEEE T Vis Comput Gr 2002; 8(1): 9–20.
57. Zhu B and Chen H. Communication-Garden System:
visualizing a computer-mediated communication pro-
cess. Decis Support Syst 2008; 45(4): 778–794.
58. Dork M, Gruen D, Williamson C, et al. A visual back-
channel for large-scale events. IEEE T Vis Comput Gr
2010; 16(6): 1129–1138.
59. Dou W, Wang X, Skau D, et al. LeadLine: interactive
visual analysis of text data through event identification
and exploration. In: Proceedings of IEEE visual analytics
science and technology (VAST 2012), Seattle, Washington,
USA, 14–19 October 2012. New York, NY, USA: IEEE.
60. Ramage D, Dumais S and Liebling D. Characterizing
microblogs with topic models. In: Proceedings of the fourth
international AAAI conference on weblogs and social media,
Washington, D.C., 23–26 May 2010, vol. 5, pp. 130–
137. Menlo Park, California: AAAI Press.
61. Archambault D, Greene D, Cunningham P, et al. The-
meCrowds: multiresolution summaries of Twitter usage.
In: SMUC ‘11: proceedings of the 3rd international work-
shop on search and mining user-generated contents, 2011,
pp. 77–84. New York: ACM, http://doi.acm.org/
10.1145/2065023.2065041
62. Bosch H, Thom D, Worner M, et al. ScatterBlogs: geo-
spatial document analysis. In: 2011 IEEE conference on
visual analytics science and technology (VAST), Provi-
dence, RI, USA, 23–28 October 2011, pp. 309–310. New
York, NY, USA: IEEE.
63. Yin J, Lampert A, Cameron M, et al. Using social media
to enhance emergency situation awareness. IEEE Intell
Syst 2012; 27(6): 52–59.
64. Kumar S, Barbier G, Abbasi MA, et al. TweetTracker:
an analysis tool for humanitarian and disaster relief. In:
Fifth international AAAI conference on weblogs and social
media, Barcelona, Catalonia, Spain, 17–21 July 2011,
pp. 661–662. Menlo Park, California: AAAI Press.
65. Marcus A, Bernstein MS, Badar O, et al. Tweets as data:
demonstration of TweeQL and TwitInfo. In: Proceedings
of the 2011 international conference on management of data,
Athens, Greece, 12–16 June 2011, pp. 1259–1262. New
York: ACM.
66. Marcus A, Bernstein MS, Badar O, et al. Processing and
visualizing the data in tweets. SIGMOD Rec 2011; 40(4):
21–27.
67. Itoh M. 3D techniques for visualizing user activities on
microblogs. In: 2010 IET international conference on fron-
tier computing. Theory, technologies and applications, Tai-
chung, Taiwan, 4–6 August 2010, pp. 384–389. New
York, NY, USA: IEEE.
68. Ratkiewicz J, Conover M, Meiss M, et al. Truthy: map-
ping the spread of astroturf in microblog streams. In:
WWW ‘11: proceedings of the 20th international conference
companion on world wide web, Hyderabad, India, 28
March-01 April 2011, pp. 249–252. New York: ACM.
69. Lam F and Donath J. Seascape and volcano: visualizing
online discussions using timeless motion. In: CHI EA
‘05: CHI ‘05 extended abstracts on human factors in com-
puting systems, Portland, OR, USA, 02–07 April 2005,
pp. 1585–1588. New York: ACM.
70. Kraak MJ. The space-time cube revisited from a geovi-
sualization perspective. In: Proceedings of the 21st interna-
tional cartographic conference, Durban, South Africa, 10–
16 August 2003, pp. 1988–1996. Durban, South Africa:
International Cartographic Association (ICA).
71. Kim KS, Lee R and Zettsu K. mTrend: discovery of
topic movements on geo-microblogging messages. In:
GIS ‘11: proceedings of the 19th ACM SIGSPATIAL inter-
national conference on advances in geographic information
systems, Chicago, IL, USA, 1–4 November 2011, pp.
529–532. New York: ACM.
72. Zhao OJ, Ng T and Cosley D. No forests without trees:
particulars and patterns in visualizing personal commu-
nication. In: iConference ‘12: proceedings of the 2012 iCon-
ference, Toronto, Canada, 7–10 February 2012, pp. 25–
32. New York: ACM.
73. Cao N, Lin YR, Sun X, et al. Whisper: tracing the spa-
tiotemporal process of information diffusion in real time.
IEEE T Vis Comput Gr 2012; 18(12): 2649–2658.
74. White JJD and Roth RE. TwitterHitter: geovisual analy-
tics for harvesting insight from volunteered geographic
information. In: Proceedings of GIScience, Zurich,
14–17 September 2010.
75. MacEachren AM, Robinson AC, Jaiswal A, et al. Geo-
Twitter analytics: applications in crisis management. In:
25th international cartographic conference, Paris, France,
3–8 July 2011.
76. Lohmann S, Burch M, Schmauder H, et al. Visual anal-
ysis of microblog content using time-varying co-occur-
rence highlighting in tag clouds. In: AVI ‘12: proceedings
of the international working conference on advanced visual
interfaces, Capri Island, Naples, Italy, 22–25 May 2012,
pp. 753–756. New York: ACM.
77. Singh VK, Gao M and Jain R. From microblogs to social
images: event analytics for situation assessment. In: MIR
‘10: proceedings of the international conference on multime-
dia information retrieval, Philadelphia, PA, USA, 29–31
March 2010, pp. 433–436. New York: ACM.
78. Cheong M and Lee VCS. A microblogging-based
approach to terrorism informatics: exploration and
chronicling civilian sentiment and response to terrorism
events via Twitter. Inform Syst Front 2011; 13(1): 45–59.
79. Achrekar H, Gandhe A, Lazarus R, et al. Predicting flu
trends using Twitter data. In: 2011 IEEE conference on
computer communications workshops (INFOCOM
WKSHPS), Shanghai, China, 10–15 April 2011, pp.
702–707. New York, NY, USA: IEEE.
80. Kumar S, Morstatter F, Marshall G, et al. Navigating
information facets on Twitter (NIF-T). In: KDD ‘12:
proceedings of the 18th ACM SIGKDD international confer-
ence on knowledge discovery and data mining, Beijing,
China, 12–16 August 2012, pp. 1548–1551. New York:
ACM.
81. Culotta A. Towards detecting influenza epidemics by
analyzing Twitter messages. In: Proceedings of the first
workshop on social media analytics, Washington, DC,
USA, 25–28 July 2010, pp. 115–122. New York:
ACM.
82. Schoenberg IJ. Contributions to the problem of approxi-
mation of equidistant data by analytic functions. Q Appl
Math 1946; 4: 45–99, 112–141.
83. Teahan W. A compression-based method for ranking n-gram
differences between texts. Bangor: School of Computer Sci-
ence, Bangor University, 2012.
84. Pritchard IC, Walker R and Roberts JC. Visual analytics
of microblog data for pandemic and crisis analysis. In:
EuroVA 2012: international workshop on visual analytics,
Vienna, Austria, 4–5 June 2012, pp. 55–59. Germany:
The Eurographics Association.
85. Nagel T, Heidmann F, Duval E, et al. Unfolding – a
simple library for interactive maps and geovisualizations
in processing. In: GeoViz 2013, Hamburg, Germany, 6–
8 March 2013.