

Knowledge Generation Model for Visual Analytics

Dominik Sacha, Andreas Stoffel, Florian Stoffel, Bum Chul Kwon, Member, IEEE, Geoffrey Ellis and Daniel A. Keim, Member, IEEE

[Figure 1 diagram omitted; it depicts the computer side (Data, Visualization, Model) and the human side (Action, Finding, Hypothesis, Insight, Knowledge), connected through the exploration, verification, and knowledge generation loops.]

Fig. 1: Knowledge generation model for visual analytics: The model consists of computer and human parts. The left hand side illustrates a visual analytics system, whereas the right hand side illustrates the knowledge generation process of the human. The latter is a reasoning process composed of exploration, verification, and knowledge generation loops. Visual analytics pursues a tight integration of human and machine by enabling the user to interact with the system. These interactions are illustrated in Figure 2.

Abstract—Visual analytics enables us to analyze huge information spaces in order to support complex decision making and data exploration. Humans play a central role in generating knowledge from the snippets of evidence emerging from visual data analysis. Although prior research provides frameworks that generalize this process, their scope is often narrowly focused so they do not encompass different perspectives at different levels. This paper proposes a knowledge generation model for visual analytics that ties together these diverse frameworks, yet retains previously developed models (e.g., the KDD process) to describe individual segments of the overall visual analytic process. To test its utility, a real world visual analytics system is compared against the model, demonstrating that the knowledge generation process model provides a useful guideline when developing and evaluating such systems. The model is also used to compare different data analysis systems. Furthermore, the model provides a common language and description of visual analytic processes, which can be used for communication between researchers. Finally, our model points to areas of research that future researchers can embark on.

Index Terms—Visual Analytics, Knowledge Generation, Reasoning, Visualization Taxonomies and Models, Interaction

1 INTRODUCTION

Visual analytics research made great strides in the past decade with numerous studies demonstrating successes in helping domain experts explore large and complex data sets. The power of visual analytics comes from effective delegation of perceptive skills, cognitive reasoning and domain knowledge on the human side and computing and data storage capability on the machine side, and their effective coupling via visual representations. Thus far, many application papers have tested and verified different ways to instigate this human and machine collaboration to reveal hidden nuggets of information.

Recent work emphasizes that visual analytics theories must go beyond “human in the loop” to “human is the loop” thinking in order to recognize and integrate human work processes with analytics (Endert et al. [7]). To achieve this goal, system, human, cognition and reasoning based theories have to be considered. Many theoretical works have also examined the role of visual analytics tools in data analysis, decision making and problem solving. Visual analytics processes span from humans' high-level analytic work using their domain knowledge to low-level activities such as interacting with tools.

• Data Analysis and Visualization Group, University of Konstanz. E-mail: [email protected]

Manuscript received 31 Mar. 2014; accepted 1 Aug. 2014; date of publication xx xxx 2014; date of current version xx xxx 2014. For information on obtaining reprints of this article, please send e-mail to: [email protected].

Many prior works have investigated different levels of activities with regard to human cognition models (e.g., Green et al. [14], Kwon et al. [17]); interaction models/taxonomies (e.g., Brehmer and Munzner [4], Norman [25]); process models (e.g., Card et al. [5], Fayyad et al. [8]); and sensemaking models (e.g., Pirolli and Card [29]). We initially sought interaction taxonomies that describe the aforementioned models. However, we lack an overarching pipeline that connects all the dots. Previously developed models (e.g., Keim et al. [15, 16]) are system-driven and do not describe in detail the human reasoning part of the visual analytics process. In particular, we have little idea how analytical components support knowledge generation processes and how analysts' intents drive the insight collection forward. It would be valuable for future research to have an integrated framework of all processes and models relevant for knowledge generation with visual analytics.

This paper aims to take the first step toward establishing a knowledge generation model for visual analytics. First, we build our initial process model by significantly extending the previous models [15, 16] (Section 2). Comparing with previously developed models and theories, we show how our model relates to various kinds of models (Section 3). Using our model, we find areas in which existing visual analytics tools can improve (Section 4). Finally, we discuss open issues, model implications and future work (Section 5).

Our contributions can be described as follows. The model presented in this paper provides a high-level description of the human and computer processes within visual analytics systems, which facilitates an understanding of the functionality and interaction between the components. This can be used in the design of new visualisation applications or in the evaluation of existing ones in terms of how their subcomponents support humans' reasoning, decision making, and knowledge generation processes.

2 KNOWLEDGE GENERATION MODEL

Visual analytics uses data to draw conclusions about the world, a process, or an application field. It is a structured reasoning process that allows analysts to find evidence in data and gain insights into the problem domain. Our model of the knowledge generation process is based on the visual analytics process of Keim et al. [15, 16] and describes how knowledge is generated with this process. Pohl et al. [30] also discussed relevant theories that should be considered besides a computer/system based view of the analytical process. They show that visual analytics system components, analytical procedures and human perceptual and cognitive activities, and especially the interplay between these elements, lead to a complex process. The theories, namely sensemaking, Gestalt theories (describing problem solving), distributed cognition (the interplay between humans and artifacts), graph comprehension theories (deriving meaning from graphs/processing visual information) and skill-rule-knowledge models (a three-fold hierarchy of processing levels), are highlighted as important aspects of the visual analytics process. In the following sections, we define and relate common elements of the aforementioned models/fields/concepts in order to arrive at a better understanding of visual analytics in terms of computing and human processes.

Our visual analytics model (see Figure 1) is split into two parts: the computer system with data, visualization and analytic models, and the human component modeling the cognitive processes associated with an analytical session. The cloud in the model indicates that there is no clear separation between the computer and human part, as both parts are required for data analysis. Computers lack the creativity of human analysts, which allows humans to create surprising, often subtle or hidden connections between data and the problem domain. Humans, in turn, are not able to deal efficiently and effectively with large amounts of data. In visual analytics, the connection between human and computer builds on the human's interaction abilities and perception.

Knowledge generation in visual analytics comprises abductive, deductive, and inductive reasoning processes. For a detailed definition and discussion of these reasoning processes see Peirce [27] and Magnani [21]. Abductive reasoning formulates hypotheses from observations that are unexpected or cannot be explained with existing knowledge. Assuming that these hypotheses are true, expectations of effects, patterns, or relations in the analyzed data are deduced. Through interactions with visual analytics systems, analysts try to find evidence and detect patterns in data to verify or falsify the hypotheses, which is an inductive reasoning process. We decided to model human cognitive processes with loops, because analysis does not follow deterministic rules but is rather chaotic or spontaneous in nature. Analysts are often working on different hypotheses, tasks, or findings and can consequently be working on several loops in parallel.

2.1 Computer

2.1.1 Data

The starting point of all visual analytics systems is data. Data describes facts [8] in a structured, semi-structured, or unstructured manner. It must be representative and related to the analytical problem; otherwise, the analytical process is unlikely to reveal meaningful relationships in the problem domain. In the visual analytics process, the data creation, gathering and selection processes often determine the quality of the data. During an analysis, additional data can be created by automatic methods (e.g., clustering or classification) or by manual annotations. Hence, provenance information about data, containing details about creation, gathering, selection, and preprocessing, is important to estimate the trustworthiness of analysis results (see also [31]). The term metadata describes second order data or “data about data”. The usage of this term is ambiguous depending on the domain and interests of users. In addition to provenance data, metadata describing the structure of data is usually handled by visual analytics systems to access and display data appropriately. This sort of metadata is usually not subject to analysis as it describes mainly data formats, which is a necessity for any kind of data. The term metadata is also often used for data describing or summarizing other data, e.g., keywords or topics for documents or images. In the scope of data analysis this descriptive metadata is treated similarly to data and we see no benefit or reason to treat this data differently; therefore the term data also includes this sort of metadata.
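As an illustration of the provenance information discussed above, the following minimal sketch (our own hypothetical structure, not part of the original model) records the creation, gathering, selection, and preprocessing history of a data set so that the trustworthiness of later results can be assessed:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ProvenanceRecord:
    """One step in the history of a data set."""
    step: str         # e.g., "creation", "gathering", "selection", "preprocessing"
    description: str  # what was done, e.g., "dropped rows with missing prices"
    timestamp: datetime = field(default_factory=datetime.now)

# Every automatic or manual transformation appends a record,
# so the full data lineage can be reviewed later.
history = [
    ProvenanceRecord("gathering", "exported sales table from the CRM system"),
    ProvenanceRecord("preprocessing", "removed 42 rows with missing prices"),
]
```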

2.1.2 Model

Models can be as simple as descriptive statistics describing a property of a subset of the data or as complex as a data mining algorithm. The KDD process leads from data to models (see Figure 3). These models serve different purposes in the visual analytics process. Simple analysis tasks might be solved by calculating a single number that confirms or rejects a preconceived notion/expectation. For instance, a statistical test can lead to a conclusion whether or not to trust a hypothesis. Complex patterns or abstractions found by data mining methods can be used in visualizations by showing, for instance, clustering or classification results visually. In addition, the automatically created model can be analyzed or visualized to derive insights. For example, logistic regression models learn weights of features, which can be used to identify the most important features or feature combinations.
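To make the logistic regression example concrete, the following minimal sketch (assuming scikit-learn; the data set is only an illustration) fits a model and ranks features by the absolute value of their learned weights:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data)  # scale features so weights are comparable
model = LogisticRegression(max_iter=1000).fit(X, data.target)

# Inspect the model itself: large absolute weights mark influential features.
ranking = np.argsort(-np.abs(model.coef_[0]))
for i in ranking[:5]:
    print(f"{data.feature_names[i]}: weight = {model.coef_[0][i]:+.2f}")
```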

2.1.3 Visualization

A different path from data to knowledge is the information visualization pipeline using data visualizations (see Figure 3). Visualizations use data or models generated from the data and enable analysts to detect relationships in the data. In visual analytics, visualizations are often based on automatic models; for instance, clustering models are used to group data visually. Also, a model itself can be visualized; for instance, a box plot shows the data distribution of a dimension. Visualization methods for a model may vary depending on the state of the visualization. For example, in semantic zooming a visualization might use different properties of a model depending on the zoom level. Visualization is often used as the primary interface between analysts and visual analytics systems, whereas understanding the model often requires more cognitive effort.
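A minimal sketch of the coupling described above, assuming scikit-learn and matplotlib and using generated toy data: a clustering model is computed from the data and mapped into a visualization by coloring points according to their cluster labels, while the cluster centers visualize the model itself.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

plt.scatter(X[:, 0], X[:, 1], c=km.labels_)   # data grouped visually by cluster
plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1],
            marker="x", color="black")        # the model itself, visualized
plt.show()
```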

2.2 Exploration Loop

The exploration loop describes how analysts interact with a visual analytics system in order to generate new visualizations or models and analyze the data. Analysts explore the data, supported by the visual analytics system, by interacting with it and observing its feedback. Actions taken in the exploration loop depend on findings or a concrete analytical goal. In case a concrete analysis goal is missing, the exploration loop becomes a search for findings, which may lead to new analytical goals. Even when the exploration loop is controlled by an analysis goal, the resulting findings are not necessarily related to it but can lead to insights solving different tasks or opening new analytical directions.

2.2.1 Action

Different meanings of actions are present in the InfoVis community as they may be defined with different granularities or with differing relations. Pike et al. [28] illustrate that actions may concern user goals and tasks on the one hand and interactive visualizations on the other hand. According to recent interaction taxonomies (e.g., Brehmer and Munzner [4]), simple interactions (How) are related to higher level concepts (Why).

In our definition, actions refer to individual tasks that generate tangible, unique responses from the visual analytics system. For instance, a task can be to visualize a particular data property or calculate a model of a relationship in the data. Actions derived from hypotheses are usually complex actions, for instance, using a specific visualization method that has the potential to show interesting data. Actions derived from findings are normally simple actions, such as changing the visual mapping of a visualization or selecting a different parameter for model building. In visual analytics, analysts freely choose between visualization and modeling or a combination of both for their actions. Actions naturally provoke interactions of analysts with visual analytics systems.

[Figure 2 diagram omitted; it shows the action paths (preparation, model building, model usage, visual mapping, model-vis mapping, manipulation) and the observation/inspection paths between the computer (data, model, visualization) and the human (action, finding).]

Fig. 2: Detailed part of the process model including action and cognition paths. Actions can either lead directly to visual analytic components (blue arrows) or to their mappings (blue, dashed arrows). Humans can observe reactions of the system (red arrows) in order to generate findings.

We name actions dealing with data gathering or data selection preparation actions because these actions are used to prepare data for the visual analytics process (Figure 2). Actions taken to create models are summarized as model building actions, which in turn are related to the KDD process and its configuration. Application of a model is termed model usage, which refers to actions such as calculating a statistic or clustering data. Visual mapping actions are used to create data visualizations, and model-vis mapping actions are those which map models into visualizations. Manipulation of a visualization changes the viewport or highlights interesting data in the visualization without changing the mapping of data to the visualization. All these actions are observable as interactions between analysts and a visual analytics system, as sketched below.
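The following minimal sketch (our own illustration; the logging interface is hypothetical, not part of the model) shows how these action types could be encoded to record the observable interactions between an analyst and a system:

```python
from datetime import datetime
from enum import Enum

class ActionType(Enum):
    PREPARATION = "preparation"              # data gathering / selection
    MODEL_BUILDING = "model building"        # configure the KDD process
    MODEL_USAGE = "model usage"              # e.g., calculate a statistic, cluster data
    VISUAL_MAPPING = "visual mapping"        # map data into a visualization
    MODEL_VIS_MAPPING = "model-vis mapping"  # map a model into a visualization
    MANIPULATION = "manipulation"            # change viewport/highlighting only

log: list[tuple[datetime, ActionType, str]] = []

def record(action: ActionType, detail: str) -> None:
    """Append one observable interaction to the session log."""
    log.append((datetime.now(), action, detail))

record(ActionType.VISUAL_MAPPING, "map attribute 'price' to the y-axis")
record(ActionType.MANIPULATION, "zoom into the upper-right region")
```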

2.2.2 Finding

A finding is an interesting observation made by an analyst using the visual analytics system. The finding leads to further interaction with the system or to new insights. For example, findings from data inspection can be missing values or other data properties that affect the further analysis and require special data processing. In the case of visualizations or models, a finding can be a pattern, a conspicuous model result, or an unusual behavior of the system. Bertini et al. state that “a pattern is made of recurring events or objects that repeat in a predictable manner. The most basic patterns are based on repetition and periodicity.” [2, p. 13] Patterns in data can be detected with automatic analytical methods, or humans may detect patterns using their visual perception and cognition skills. Findings are in principle not limited to data, visualizations, or models but comprise anything interesting to an analyst, e.g., an unexpectedly high number or a word or statement in some text.
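As a concrete example of the first case above, a minimal sketch (assuming pandas, with made-up data) of a finding produced by data inspection, namely the discovery of missing values that would require special processing:

```python
import pandas as pd

df = pd.DataFrame({"price": [9.99, None, 4.50], "rating": [4, 5, None]})

# Inspecting the data yields a potential finding: incomplete columns.
missing = df.isna().sum()
print(missing[missing > 0])
```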

In our definition, a finding is independent from the problem domain; however, to make analytical use of a finding, the analyst has to interpret it in the context of the problem domain. Although findings are usually associated with detecting a visual pattern, the lack of a pattern, when expected by an analyst, is also considered a finding. As shown in Figure 1, findings do not necessarily lead to new insights (see Section 2.3.2) but may trigger basic actions, such as manipulating the viewport of a visualization to show a region in more detail.

Analysts come across findings by observing the feedback from the system or examining visualizations and models, which in turn can lead to further actions. The exploration loop can be characterized as a searching activity that uses the system to reveal useful findings to solve an analysis task. Analysts might frequently change their exploration strategies and switch between models and visualizations to collect different findings. The strategies and analysis directions in the exploration loop are guided by an exploration goal defined in the verification loop. The actions and findings in the exploration loop are closely related to the visual analytics system. Analysts gain a new insight when they are able to understand the findings and interpret them in the context of the problem domain.

2.3 Verification Loop

The verification loop guides the exploration loop to confirm hypotheses or form new ones. To verify a concrete hypothesis, a confirmatory analysis is conducted and the exploration loop is steered to reveal findings that verify or falsify the hypothesis. Analysts gain insights when they can interpret findings from the exploration loop in the context of the problem domain. Insights may lead to new hypotheses that require further investigation. Analysts gain additional knowledge when they assess one or more trustworthy insights.

Findings during exploration may contribute to the verification of a concrete hypothesis, but insights emerging from exploration may not be related to the examined hypothesis. It is often the case that new insights solve different analysis questions or open up new ones.

2.3.1 Hypothesis

Hypotheses play a central role in the visual analytics process. A hypothesis formulates an assumption about the problem domain that is subject to analysis. Analysts try to find evidence that supports or contradicts hypotheses in order to gain knowledge from data. In this respect the visual analytics process is guided by hypotheses. Concrete hypotheses can be tested with statistical tests or data visualizations when the data contains the necessary information (a small sketch of the statistical route follows below). Unfortunately, hypotheses are often vague, such as the assumption that there are unknown factors that have an influence on the problem domain. In such cases an exploratory analysis strategy allows analysts to come up with more concrete hypotheses that are used for further analysis.
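A minimal sketch (assuming SciPy, with made-up measurements) of testing a concrete hypothesis statistically, here "group A has a higher mean than group B":

```python
from scipy import stats

group_a = [5.1, 5.4, 5.8, 6.0, 5.6]
group_b = [4.2, 4.8, 4.5, 4.9, 4.4]

# A two-sample t-test: a small p-value is evidence in favor of the hypothesis.
t, p = stats.ttest_ind(group_a, group_b)
print(f"t = {t:.2f}, p = {p:.4f}")
```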

2.3.2 Insight

In the InfoVis community, insight has a variety of definitions. Saraiya et al. define insight as “an individual observation about the data by the participant, a unit of discovery” [33, p. 2]. North [26] takes another view and lists some important characteristics of an insight, such as being complex, deep, qualitative, unexpected and relevant, and going beyond an individual observation of the data. Yi et al. [43] go one step further and also consider the processes that involve insights. They focus on how people gain insight in information visualization and identify four types of insight-gaining processes; however, they argue that there is no common definition of insight.

[Figure 3 diagram omitted; it aligns the knowledge generation model with the KDD process (selection, preprocessing, transformation, data mining), the InfoVis pipeline (raw data, data tables, visual structures, views), interaction taxonomies, the stages of interaction (goals, execution, evaluation) and the sensemaking loop.]

Fig. 3: Relating the process model for knowledge generation in visual analytics to other models and theories. Similarity is illustrated by color and position. A detailed illustration of interactions between the human and computer is shown in Figure 2.

Chang et al. [6] suggest two different definitions of insight: the cognitive science insight as a moment of enlightenment, an “Ah Ha” moment, which can occur spontaneously, and an advance in knowledge or a piece of information. Our definition is closer to the latter, where the analyst interprets a finding, often with previous domain knowledge, to generate a unit of information. Hence, an insight can range from something quite small, such as realizing that there is a relation between several properties of the data, to something more important and potentially significant. So, insights are different from findings in the sense that insights have an interpretation in the problem domain, which we do not require for findings. For instance, a finding might support a hypothesis, which may convince the analyst and lead to the insight that the hypothesis is reliable. An analyst gains insights by collecting enough evidence to create a new hypothesis about the application or, in the case of very strong evidence, even new trusted knowledge. We do not consider an insight directly as knowledge, because weak evidence might lead to an insight that needs further verification and becomes a hypothesis. For instance, finding a cluster in a visualization during an exploratory analysis might lead to the insight that there is a cluster with different properties, but this insight should at first be considered as a new assumption or hypothesis that has to be validated.

2.4 Knowledge Generation Loop

Analysts form hypotheses from their knowledge about the problem domain and gain new knowledge by formulating and verifying hypotheses during the visual analytics process. When analysts trust the collected insights they gain new knowledge in the problem domain that may also influence the formulation of new hypotheses in the following analysis process.

2.4.1 Knowledge

Data analysis usually starts with data and one or more analysis questions. In addition, analysts bring in their knowledge about the data, the problem domain, or visual analytic tools and methodology. This prior knowledge determines the analysis strategy and procedure. During the visual analytics process analysts try to find evidence for existing assumptions or learn new knowledge about the problem domain. In general, knowledge learned in visual analytics can be defined as “justified belief” [2]. The reasoning processes in visual analytics enable analysts to gain knowledge about the problem domain from evidence found in data. The evidence has different qualities, which directly affect the trustworthiness of the concluded knowledge. The evidence collection route also impacts the trustworthiness. For example, the outcome of a statistical test of a hypothesis may be perceived as more trustworthy than a pattern found in a visualization. Depending on the collected evidence, an analyst has to decide whether enough evidence was collected to trust an insight and accept it as new knowledge, or whether it is in need of further examination, e.g., analysis with different data or discussions with domain experts. Assessing the trustworthiness of new knowledge requires a critical review of the overall analysis process starting from data gathering. As well as new knowledge about the problem domain, analysts gain knowledge about the data, e.g., patterns in the data or its quality. They also gain experience with visual analytics systems and methodology.

3 RELATION TO OTHER MODELS

This section covers the most important related models that have influenced the model presented in this paper and also aims to offer a big picture of the whole knowledge generation process in visual analytics. Figure 3 illustrates related models and our knowledge generation model for visual analytics. We have used the same color codes as the original model [15, 16] to highlight related areas in our model. Feedback loops of other models (e.g., the InfoVis pipeline) are replaced by our extensions on the human side. Each action results in a feedback loop via a finding that may lead to new exploration loops or crosses over to higher level loops. Relevant theories are grouped in three areas according to their focus on interaction, human aspects or systems.

3.1 Systems

Card et al. propose the reference model for information visualization [5] that describes visualizations as data connected to visual mappings perceived by humans. This InfoVis pipeline contains the main components of Raw Data, Data Tables, Visual Structures and Views, with transformations/mappings between these components that can be manipulated through human interactions.

Fayyad et al. [8] describe Knowledge Discovery in Databases (KDD) as follows. KDD is the process of making sense out of data using data mining techniques at the core. The process includes the mapping of low-level data into more compact, abstract or useful forms using data mining models, with the goal to discover or extract patterns that can be turned into knowledge. The KDD process consists of nine steps and represents an interactive and iterative process. Examples of interactions are selecting and filtering the relevant data, choosing appropriate data mining methods or refining parameters. The goal of these actions is to bring in the domain expertise of the human in order to find meaningful patterns. At each step the user might make and add decisions that cannot be handled automatically. The KDD process steps are included in Figure 3.


Fig. 4: Van Wijk's model including Green et al.'s changes [13, 14] and labels.

As a basis for our knowledge generation model for visual analytics, we take and extend the process model by Keim et al. [15, 16]. The main components of the model are Data, Models, Visualization and Knowledge, each representing a stage of the process. A feedback loop is also present, going from knowledge back to data. At and between each stage there are transitions. The visual analytics process consists of two parts, Visual Data Exploration going from data via visualization to knowledge and Automated Data Analysis going from data via models to knowledge, as well as their connection. Actions on the data may be performed in order to select, filter or preprocess the data. The data is then mapped to a visualization (visual mapping) and to models (data mining). Model parameters and findings are visualized, and the user may interact with the visualization or refine parameters of the model in order to steer the analysis process. Knowledge can be derived from visualizations, automatic analysis and the preceding interactions between visualizations, models and the human.

Figure 3 relates visual analytics components to the InfoVis and KDD system pipelines. The InfoVis pipeline corresponds to data, visualization, and the visual mappings, whereas the KDD process model is represented by data, model, and their mapping. Furthermore, the visual analytics process model includes human-computer interaction. Actions can be performed at each stage, namely data, visualization, model and their mappings.

The economic model of visualization by van Wijk describes contexts in which visualizations operate [40] (Figure 4). In brief, data is transformed into a visualization that can be perceived by a human through an image. Based on perception, the human generates knowledge over time, which drives interactive exploration through changes to the visualization specification. Green et al. [14] add to van Wijk's model, arguing that perception has an important role in interactive exploration and that the act of exploration and associated reasoning often leads to knowledge acquisition. These relate to the exploration and knowledge generation loops of our model.

3.2 Interaction

Norman describes a model for actions containing Seven Stages of Action [25]. At the beginning of each action, the human needs a goal to be achieved. Afterwards, the human has to perform an action to manipulate something. The last step is to check whether the goal was achieved. These two subprocesses are called Execution and Evaluation. Based on previous actions, the action cycle is traversed several times. The model also explains two major problems that occur when interacting with computer systems: the Gulf of Execution indicates that humans do not know how to perform an action, while the Gulf of Evaluation indicates that humans are not able to evaluate the result of their actions. The Goal concept of the stages of interaction model matches our Hypothesis as a starting point. The Execution path leads from Goal via an action to the World. As a result, analysts evaluate observations of the World in several steps (see Figure 3).

Several interaction taxonomies exist which focus on different aspects, fields and domains and attempt to structure different kinds of interaction at different levels of abstraction. Most of the well-known taxonomies focus on one or two fields, e.g., visualizations (Shneiderman [36]), reasoning (Gotz et al. [12]) or data processing (Bertini et al. [2]), rather than all possible interactions in visual analytics. Recent publications attempt to structure interaction using the dimensions of Why (the purpose of the task), How (the methods used to achieve this) and What (the necessary inputs/outputs) (Brehmer and Munzner [4], Schulz et al. [34]). There are also recent taxonomies that try to integrate existing taxonomies from different fields into a sound interaction taxonomy for visual analytics (e.g., von Landesberger et al. [41]). Taking the taxonomy of Brehmer and Munzner [4] as an example, there are higher level goals that an analyst might pursue (why) and lower level operations that can be performed (how) on a specific target (what). High and low level interactions are both covered by our action concept; however, they can be distinguished when considering the associated loops. High level actions are inspired by the verification loop and are based on insights and hypotheses, whereas low level actions are influenced by the exploration loop, which covers a sequence of simple actions and findings. Our process model implies that each action results in a feedback loop and is perceived and processed by the human. Our description of action types also shows the relevant interactions for visual analytics, which are both visualization and model centric.

Two mantras characterizing the analysis process for information visualization and visual analytics are also worth mentioning because they shed light on human analysis strategies. Shneiderman proposes the Visual Information Seeking Mantra that summarizes the basic principles of many visual design guidelines [36]: Overview first, zoom and filter, then details-on-demand. Keim proposes a slightly different Visual Analytics Mantra [16]: Analyse First - Show the Important - Zoom, Filter and Analyse Further - Details On Demand. Both start with an overview/aggregation approach and end in a refinement of the hypothesis and analysis.

3.3 Human Cognition, Sensemaking and Reasoning

Humans can observe visualization or model changes that can be used for the knowledge generation process. The data may also be inspected directly.

Pirolli and Card present the Sensemaking Process [29] as a description of intelligence analysis. Central terms are the shoebox, schemas (a set of patterns around the important elements in the tasks), hypotheses and a representation. Sensemaking tasks are described as processes consisting of information gathering, the representation of information as a schema, the generation of insight and finally the generation of some knowledge product. The first loop is called the foraging loop, followed by the sensemaking loop. Bottom-up or top-down processes are possible, which indicates that the model does not have a fixed entry point; the entry point depends on the type of task. The model shows the data and process flow with many feedback loops. Our model splits the sensemaking loop into three sub-loops. In visual analytics it is also possible that the system learns from the analyst's actions, allowing the system to support the user with visualization or action propositions. That is why the loop leads through the system (see Figure 3). Hypotheses are generated as an entry point for an analysis process, leading to repeated exploration cycles.

The fields of reasoning and decision making both depend on the construction of mental models or scenarios of relevant situations. According to Legrenzi et al. [19], the key components of decision making are Information Seeking, Making Hypotheses, Making Inferences, Weighing Advantages and Disadvantages and Applying Criteria to make Decisions.

The Human Cognition Model (HCM) proposed by Green et al. [14] can also be applied to visual analytics. A major problem in visual analytics is that human cognition is often assumed to be an oversimplified black box. Information discovery and knowledge building are at the core of the HCM. Information is presented by the computer that humans can perceive and directly interact with in order to focus their attention. The discovery of patterns or relations is a primary stage of knowledge that can be created within the knowledge building process. The computer works to counter the human's limited working memory as well as some cognitive biases, such as confirmation bias. Central parts of the HCM are shown in Figure 5. The HCM also includes guidelines for discovery and knowledge building.


Fig. 5: Human Cognition Model by Green et al. [13, 14].

The cognition of relevant patterns and schemas, which can be derived from insights (verification loop) and finally lead to some knowledge product (knowledge generation loop), plays a central role in the human part of our model. All elements of the HCM can also be found in the knowledge generation model. The key element Discovery describes the overall process, whereas the other sub-concepts can all be placed into our model and related to our concepts. For example, the generation and analysis of hypotheses is a central part of our model, because evidence (proof/disproof) is collected as the basis for drawing conclusions.

4 MODEL APPLICATION

In this section we illustrate that our model can be applied to systems on several levels. Firstly, the interaction possibilities can be examined according to our definition of actions in Section 2.2.1 and shown in Figure 2. Secondly, we can check if and how a system supports each of the individual loops. Thirdly, we can use our model for a comparative assessment. In the following we demonstrate a detailed model application with Jigsaw [9, 39] and a comparative high level assessment of systems from different application domains.

4.1 Actions in Jigsaw

In this section, we investigate Jigsaw in terms of supported actions. It is important to note that in our model each of these actions leads to its own feedback loop. Therefore, we pay special attention to system components that are capable of accepting human inputs beyond fully automated implementations. We also highlight some areas of the system that might accept more user input for amplifying user interactions.

Data preparation is not fully supported. While loading the data, the user has no possibility of adjusting any of the data preprocessing or transformation steps. It is also not possible to change those once the data has been parsed and loaded. Users may find missing data, but such manipulation must be done outside Jigsaw.

Model building is done automatically, but some views provide possibilities for adjusting the underlying models. For example, the Document Cluster View allows users to select documents as a cluster seed, which can be used when the document clusters are computed. After that, the user-generated cluster seeds become part of the model used by the application.

In terms of the visual mapping, Jigsaw offers some basic functionalities. In some views, users can select the background of the visualization or choose which attribute is mapped to the color of a visualization entity. Taking the color mapping as an example, there are a number of different possibilities to define such mappings, like a non-linear color scale or the selection of the mapped colors. In this application, the authors left out these manipulation possibilities and force the user to accept their pre-defined color mapping schemes.

Model usage is supported implicitly, because most of the visualizations require special models. In some views, like the Document Grid, there are different document orderings and similarity measures to choose from. This is an example of per-visualization model usage, which can be adjusted by the user.

The model-vis mapping is available in almost all views, but only at a basic level. This includes various ordering, sorting and filtering options. In general, though, this mapping is done as the developers designed it, and does not allow the user to change much.

Visualization manipulation capabilities are mostly used to highlight document instances. There are also possibilities for manipulating and changing the visual appearance of a view. For example, in the Circular Graph View, single terms can be selected, and the connections of this term to all others are displayed.

In this examination, we have shown that Jigsaw supports all actions we proposed in our model. It allows the visualization of different models based on the same data set, the visual mapping can be adjusted as designed by the authors, and the model-vis mapping can be modified to fit different analytical questions. In order to achieve a stronger coupling between the visualization and the underlying model, the interactions could be extended to modify the underlying model or algorithm parameters, for example when the user moves a document to another cluster in the Document Cluster View.

4.2 Knowledge Generation in Jigsaw

Having examined what actions are supported by Jigsaw, we now focus on an evaluation of the three loops, which are part of the reasoning process as defined in Section 2.

Undoubtedly, describing the human reasoning processes involved in using a visual analytics system is complicated. This involves a variety of aspects from perception, cognition and reasoning. The detailed description and evaluation of these processes exceed the scope of this paper and require a study with expert users. We therefore limit this examination to a description of how Jigsaw supports human reasoning.

The exploration loop is broadly supported by Jigsaw through a number of specialized visualizations for different analytical questions. For example, users can explore given data sets, to get an impression of the contained topics, by opening the Document Cluster View, which gives a labeled view of document clusters. It is also possible to modify the number of generated clusters while exploring a once-generated cluster view. The history of consecutive runs of the clustering algorithm with different parameters is stored. This allows for assessment of parameter changes with respect to topic changes. All views include a bookmark feature, which can be used to switch between different saved states of the visualization. As a result, users can capture and annotate findings that might be of further interest or of high importance for further analysis. Bookmarks can be used to store the results of different runs of the exploration loop, which in turn is done by utilizing the actions described in the previous Section 4.1.

The verification loop, tightly integrated with the exploration loop, guides users to develop findings into insights. These findings from the exploration loop can be used to verify or falsify a concrete hypothesis, which is the beginning of knowledge generation. The second investigative scenario given by Görg et al. in [9] contains some questions which can only be answered by using the verification loop. In one of the examples, Jigsaw is used to examine the sentiment of car reviews. This is done to verify the car rating, which is given as a numeric score. The sentiment, displayed in the Document Grid, is expected to agree with the score in such a way that highly rated cars have more positive reviews and those with bad ratings have more negative reviews. In this example, the positive correlation of two measures is used to verify a hypothesis, which had been inferred based on product reviews contained in the analyzed data set. Natively, Jigsaw supports neither the concept of findings nor that of hypotheses. However, the Tablet View displays bookmarked states of visualizations that can be organized, annotated and connected manually. This can be used to structure findings, derive insights, and connect insights to hypotheses, which can be added to the view and refined by the user during the analysis process. This must be done manually, because Jigsaw does not provide any automated support for this process. Figure 6c shows an example of hypothesis validation using the Tablet View.

Support for the knowledge generation loop is challenging, because knowledge generation is a process done entirely by the user, and involves concepts like trust and reasoning. In addition to these human factors, the user's domain knowledge plays an important role, which is hard to incorporate because it is difficult to externalize.


Fig. 6: Illustration of validation steps using Jigsaw. In 6a, the person Daniel Keim (left list) is connected to the concept of text, displayed on the right. To find evidence for this fact, the Document Cluster View of the publications is opened (6b). After inspection of the cluster labels, a document from the visualizing, interaction, text cluster is examined in detail. This document is evidence that the fact presented in the list view in 6a is true. An example of the Tablet View can be seen in 6c.

Depending on the nature of the problem, the required kind and necessary level of support varies. An example would be the automatic search for, and suitable presentation of, further evidence for a given hypothesis which has already been verified but is not yet trusted enough to qualify as knowledge. If the Tablet View has been used during the verification loop to externalize the analysis process, the view can be helpful in the knowledge generation loop too.

We have shown that Jigsaw supports the three feedback loops which are part of the reasoning process, although the amount of possible automated support varies.

Starting with the exploration loop, automated support is based mainly on predefined models and is therefore limited by the analytic possibilities of the system. Going a step further to the verification loop, it gets harder to provide adequate automated support. The variety of different hypothesis types is an important reason for this. It is even more difficult to extend a system to support the knowledge generation loop, due to the increasing influence of human and other external factors, which cannot easily be learned and represented by computers. This makes it difficult to incorporate them in an automated process supporting knowledge generation. Further implications of loop automation are discussed in Section 5.3.

4.3 Comparative System Assessment

In this section, we provide a high level assessment of different systems from the data mining, information visualization, visual analytics and provenance domains. The different natures of these systems illustrate the general applicability of our model to applications that deal with data, provide visualizations, and are designed to generate knowledge. The comparison is shown in Figure 7.

At first, we assess Knime [1] (Version 2.9.1), which supports the interactive creation and execution of data mining pipelines. Support for data preparation and inspection, model building, model observation, and the model-vis mapping is excellent, which is a good foundation for the exploration loop. The available data visualizations are at a basic level and allow brushing interactions only. Besides that, Knime provides no explicit support for the verification and knowledge generation loops, because there exist no tools to organize findings, derive insights, or connect hypotheses with insights. As a substitute, separate paths for the organization of findings and hypotheses can be added to an already existing pipeline. This is a possible way of recording the actions leading to a pipeline outcome, which can contribute to the knowledge generation process.

Next, we assess Tableau Desktop (Version 8.2 PE), which offers a number of visualizations that can be interactively adjusted by the user. Tableau is a representative of applications from the information visualization domain. Therefore, it has strong support for the model-vis mapping, which is also the case for data preparation and data inspection. When it comes to model building, Tableau provides basic functionality. For specific data sets, it is possible to add a trend line or compute a forecast, where the model can be adjusted within designated bounds. The various manipulation and visualization options provide strong support for the exploration loop. Support for the verification and knowledge generation loops is also available with the story module, which allows free creation of reports and references to visualizations. In addition, all three loops are supported by the annotation tool, which allows the addition of persistent annotations to data points or specific locations and areas in the visualization.

[Figure 7 matrix omitted; legend: D = data, M = model, V = visualization, EL/VL/KL = exploration/verification/knowledge generation loops; line weight indicates weak, basic, or strong support.]

Fig. 7: Comparative model application to different kinds of systems. Jigsaw represents visual analytics, Knime data mining, Tableau information visualization, and HARVEST applications from the provenance domain. Strength of support for functionalities/components of our model is indicated by the weight of the lines.

Jigsaw is an example of a visual analytics application, which we already examined in detail in Sections 4.1 and 4.2. It supports all the actions we included in our model. In addition, all three loops are supported, but there is some potential for improvement. For example, better support of the verification loop could easily be achieved by a tighter integration of the Tablet View into the system. Also, providing histories of the actions leading to a specific state of a visualization (bookmark) is a step towards better support of the verification and knowledge generation loops.

Finally, we examine HARVEST [11, 37], a system which supports provenance. While using HARVEST, all interaction is recorded and can be used in order to understand the way findings have been detected. Using this data, the system is able to support analysts during the exploration and verification loops, for example by automatically ranking manually created notes based on the user's behavior. This comes close to our definition of the knowledge generation process. The system also supports the analysis process by providing visualization recommendations or an ordering of notes, connected insights, or findings. Similar to Jigsaw's Tablet View, the note-taking interface is capable of organizing, grouping, and ordering items, which supports the knowledge generation loop.

The assessment of tools from different application domains shows that the model can be applied to applications that work with data and visualization in general. The result also clearly separates the different application domains.

5 SUMMARY AND DISCUSSION

This study identifies new perspectives on visual analytic processes beyond weaving existing frameworks into one. Our model highlights that human and machine form a loop in the knowledge generation process using visual analytics.


While existing models focus on one of these, our model integrates human thinking whilst describing visual analytics components. Most visual analytics systems do not fully cover or properly treat all aspects that our model requires, as it includes the holistic process of knowledge generation, which involves visual analytics systems and human users in the loop. Our model also specifies how human loops are intertwined with the subprocesses of visual analytics tools. We look at systems in a skeleton view using Keim's previous model and show how the interplay between each subcomponent can be influenced by human decision making and reasoning processes. This is important for researchers as it enables discussing specific functions and their impacts on reasoning processes more explicitly, because our model can describe the whole path from data to knowledge and vice versa. Besides connecting to system components, the model also defines human concepts and introduces three self-contained loops/stages of reasoning/thinking. We can also use our model to assess a visual analytics system in terms of its functions toward human analytic outputs. For instance, we can detect areas of visual analytics systems that tend to cause biases within the knowledge generation process by aligning experts' analytic processes and outcomes against our model. Then, designers could improve the corresponding visual analytics components to enhance early detection of such analytic failures: wrong hypotheses, conflicts between findings and insights, and dead ends of exploration cycles. We also find results that resonate with sensemaking, cognition and reasoning models (e.g., Pirolli and Card [29]). Our model even specifies where current visual analytics systems fall short, namely in supporting the higher level loops: the exploration, verification and knowledge generation loops. Supporting higher level loops is more complex; however, this would be a useful addition to many of the current systems [31].

5.1 Collaboration and Communication

Our model describes the knowledge generation processes for visual analytics. There are specific types of visual analytics which our model does not explicitly address. Visual analytics processes can be collaborative: multiple stakeholders analyze data together, asynchronously or synchronously, and gain insights through verbal communication. Our model currently assumes an individual's analytic process, so it is missing collaborative components. We can simplify this collaboration process as different but possibly shared knowledge generation loops of all participating humans (i.e., the white nodes in Figure 1): a number of users perform actions and interpret findings together to improve the quality of their knowledge. Distributed cognition theories also have to be considered when examining representations and interactions among humans and artifacts (Liu et al. [20]). Nobarany et al. [24] have already developed a system and performed studies focusing on distributed cognition in order to support collaborative visual analytics.

The externalization and communication of information plays an especially crucial role and needs more detailed investigation with regard to the presented concepts of findings, insights, and knowledge. The importance of externalization in visual analytic processes can be observed in the heavy reliance on note-taking throughout analysis processes [22]. In the real world, groups of users often take a variety of approaches to synthesize information with their own organizational language [32]. Communicating knowledge in visual analytics is actually the communication of evidence found in data supporting a belief. This evidence is shared as findings or insights, e.g., a commented visualization revealing an interesting relationship in the data. The communication counterpart can follow the evidence and, depending on the level of trust, gain their own insights or knowledge from it. Collaborative analysis scenarios allow communication partners to verify this evidence with the system. We should also note that collaboration between multiple analysts opens up questions about maintaining exploration awareness [38]. In the case of presentations or static documents, the missing visual analytics system is replaced by the presenter or reporter. During presentations, questions and answers can replace the knowledge generation loops. Static documents have to convey all information from hypotheses, findings, and insights in a way that readers can follow the conclusions. Interactive reports or documents (e.g., infographics) go beyond the aforementioned scenario, as they can be seen as a report combined with a limited visualization that is tailored to show the relevant findings or insights.

5.2 Visual Analytics of Streaming Data

In addition to collaborative visual analytics, we can apply our model to visual analytics of data streams (e.g., monitoring Twitter data). In this situation, the data node is dynamically changing, so the model reformulates its visual representations and analytic models according to substantial changes in the data streams. Further investigation is necessary, since a number of requirements on data management, knowledge discovery, and visualization arise from the dynamic nature of streams. Mansmann et al. [23] illustrate this and highlight that the analyst's role changes, since exploration is extended to include real-time monitoring tasks where situational awareness and complex decision making come into play.
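
As a minimal sketch of what such reformulation could look like, the following code maintains a sliding window over a stream and recomputes a trivial stand-in model (here just the window mean) only when the data has drifted substantially, which would then trigger a visualization update. The window size, drift threshold, and the choice of model are assumptions made purely for illustration.

```python
from collections import deque


class StreamMonitor:
    """Recomputes a simple model when the sliding window drifts (illustrative)."""

    def __init__(self, window_size=100, drift_threshold=0.5):
        self.window = deque(maxlen=window_size)
        self.drift_threshold = drift_threshold
        self.model_mean = None

    def update(self, value):
        self.window.append(value)
        mean = sum(self.window) / len(self.window)
        # Reformulate the model (and, in a real system, the visualization)
        # only on substantial change, not on every tick.
        if self.model_mean is None or abs(mean - self.model_mean) > self.drift_threshold:
            self.model_mean = mean
            self.refresh_visualization()

    def refresh_visualization(self):
        # Placeholder for updating visual representations.
        print(f"model updated: mean={self.model_mean:.2f}")


monitor = StreamMonitor(window_size=5, drift_threshold=0.5)
for v in [1.0, 1.1, 0.9, 5.0, 5.2, 5.1]:
    monitor.update(v)
```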

5.3 Automatic Support of Knowledge Generation

We will now consider how novice users can be given guidance by the system and how supporting the analysis can aid knowledge generation. Systems can make automatic suggestions based on their current state, e.g., choosing a suitable color mapping or selecting parameters based on the data.
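
A very small example of such state-based suggestions is picking a color mapping from the data itself. The rules below (categorical vs. diverging vs. sequential) are standard visualization heuristics; the function name and return values are ours, not taken from any particular system.

```python
def suggest_colormap(values):
    """Suggest a colormap class from the data (heuristic sketch)."""
    if all(isinstance(v, str) for v in values):
        return "categorical"   # distinct hues for unordered labels
    lo, hi = min(values), max(values)
    if lo < 0 < hi:
        return "diverging"     # emphasize deviation from zero
    return "sequential"        # ordered magnitudes


print(suggest_colormap(["a", "b", "c"]))   # categorical
print(suggest_colormap([-3, 0.5, 2]))      # diverging
print(suggest_colormap([0.1, 0.5, 0.9]))   # sequential
```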

5.3.1 Exploration Loop

The exploration loop is the basis of all knowledge generation in visual analytics. An important trigger for observing findings is how the system handles interactions. Analysts learn from the causality between interactions and reactions; hence, supporting the exploration loop requires a system to respond with an immediate, observable reaction to any interaction. Many algorithms in data analysis require complex computations and cannot produce a complete result immediately. In these cases, analysts should at least receive feedback that the algorithm is still running. Systems should provide the ability to switch between the states before and after a calculation so analysts can learn from their interactions. If the final result can be estimated, for instance from the intermediate results of an incremental algorithm, the estimate could be shown to analysts along with the ability to abort the calculation.
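
The requirement of observable feedback during long computations can be met with incremental algorithms that expose intermediate estimates. The generator below is a minimal sketch, assuming a chunked computation of a mean; a real system would stream intermediate results of an actual analysis algorithm and wire the abort to the user interface.

```python
def incremental_mean(data, chunk_size=1000):
    """Yield intermediate estimates so the UI can show progress and allow aborting."""
    total, count = 0.0, 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        total += sum(chunk)
        count += len(chunk)
        yield total / count  # current estimate of the final result


data = list(range(10_000))
computation = incremental_mean(data)
for i, estimate in enumerate(computation):
    print(f"estimate after chunk {i}: {estimate:.1f}")
    if i == 2:                 # the analyst aborts early, e.g. the estimate suffices
        computation.close()    # stops the incremental computation
        break
```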

Findings are related to unexpected results or patterns in models or visualizations. Automatically detecting unexpected results is complicated, because it requires a definition of what analysts expect, which is usually not known to the system. On the other hand, pattern mining algorithms extract patterns directly from data, and in visualizations, patterns could be detected with automatic methods as well. Even automatic methods to judge the usefulness of a visualization exist; e.g., Bertini et al. [3] give an overview of quality metrics approaches.
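
As an illustration of how a quality metric could flag potentially interesting views, the sketch below ranks scatterplot candidates (axis pairs) by absolute Pearson correlation. This is a deliberately simple stand-in for the richer metrics surveyed by Bertini et al. [3]; it is not one of their methods.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient, computed directly for self-containment."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def rank_scatterplots(columns):
    """Rank axis pairs by |correlation| as a simple view-quality metric."""
    scores = [(abs(pearson(columns[a], columns[b])), a, b)
              for a, b in combinations(columns, 2)]
    return sorted(scores, reverse=True)


columns = {
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],   # strongly related to x
    "z": [5, 1, 4, 2, 3],    # unrelated
}
for score, a, b in rank_scatterplots(columns):
    print(f"{a} vs {b}: {score:.2f}")
```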

Actions depend on findings and the goal of the analysis. Visual analytics systems could provide suggestions for further actions in the analysis process, as in behavior-driven visualization recommendation [10]. Based on findings, such suggestions could relieve analysts of some of the burden of choosing the right action from all the possibilities. Novice users would benefit from suggestions because the system would present appropriate actions, allowing users to learn the abilities of the system. For expert users, interaction with the system would become more efficient compared to navigating a large set of options. It is important to find an adequate level of suggestion based on user experience. A solution to this problem could be a learning technique, such as active learning (e.g., Settles [35]), that adapts the suggestions to users and minimizes interaction costs.
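
A minimal sketch of behavior-based action suggestion: learn from past interaction logs which action typically follows the current one, and propose the most frequent successors. This first-order (bigram) model is our own simplification for illustration; approaches such as [10] derive suggestions from much richer representations of analytic behavior.

```python
from collections import Counter, defaultdict


def learn_transitions(sessions):
    """Count which action follows which across recorded sessions."""
    successors = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            successors[current][nxt] += 1
    return successors


def suggest_next(successors, current_action, k=2):
    """Return the k most frequent follow-up actions."""
    return [action for action, _ in successors[current_action].most_common(k)]


sessions = [
    ["filter", "zoom", "annotate"],
    ["filter", "zoom", "export"],
    ["filter", "sort", "annotate"],
]
model = learn_transitions(sessions)
print(suggest_next(model, "filter"))  # ['zoom', 'sort']
print(suggest_next(model, "zoom"))    # ['annotate', 'export']
```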

5.3.2 Verification Loop

The verification loop is the central part of the knowledge generation loops. Analysts combine findings from data with their domain knowledge and gain new insights into the problem domain. The knowledge of analysts plays an important role in the verification loop; therefore, automatic support for this loop is limited to helping analysts record their analysis results.

Systems do not directly generate insights; analysts gain new insights from data when they are able to interpret findings. In this respect, systems can support the process by providing useful summaries of findings and by allowing analysts to organize findings, hypotheses, or insights. Often, insights do not depend on a single finding but are hidden in complex relationships in the data, and they are difficult to find without prior knowledge. Systems addressing these problems are described, for instance, by Shrinivasan et al. [37] and Wright et al. [42].

Findings alone are not enough to formulate hypotheses about the problem domain, as this requires insights into the domain. However, systems can automatically formulate hypotheses about the analyzed data from findings. These hypotheses can help analysts get a faster overview of unknown data sets, but their use for complex analysis tasks is limited, because only hypotheses about relations in the data can be generated automatically.
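
To illustrate what automatically generated data hypotheses might look like, the sketch below turns strongly correlated column pairs into candidate hypothesis statements. Both the threshold and the wording are assumptions; as argued above, such hypotheses concern relations in the data only, not the problem domain.

```python
from itertools import combinations
from statistics import correlation  # Pearson's r, available since Python 3.10


def generate_hypotheses(columns, threshold=0.8):
    """Emit candidate hypotheses about strongly related column pairs."""
    hypotheses = []
    for a, b in combinations(columns, 2):
        r = correlation(columns[a], columns[b])
        if abs(r) >= threshold:
            direction = "increases" if r > 0 else "decreases"
            hypotheses.append(f"As {a} increases, {b} {direction} (r={r:.2f})")
    return hypotheses


columns = {
    "advertising": [10, 20, 30, 40, 50],
    "sales":       [12, 24, 31, 43, 52],
    "complaints":  [7, 3, 9, 2, 5],
}
for h in generate_hypotheses(columns):
    print(h)  # only the advertising/sales relation passes the threshold
```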

5.3.3 Knowledge Generation Loop

Automating the steps from insights to knowledge, or from knowledge to new hypotheses, is not possible according to our definition. Analysts gain new knowledge when the evidence collected with a visual analytics system is convincing. The best way to support knowledge generation with visual analytics is therefore to provide systems with the ability to look at data from different perspectives. This gives analysts the possibility to collect versatile evidence and increases the level of trust in findings or insights.

5.4 Future Investigations on Visual Analytics Systems

Our model specifies some areas that our current research can further investigate. We find some interaction types missing in many systems and interaction models, especially in model construction and the coupling of models and visualizations. Many visual analytics tools tend to maintain preloaded models or to provide only a very limited capability to manipulate models by adjusting some parameters. We observe that data are becoming more dynamic and unstructured, and human interaction with the model part is therefore crucial for analyzing such data. Secondly, we see that many improvements could be made to further support human actions. Systems could actively learn from user behavior and adapt their models and visualizations accordingly. Visual analytics systems could proactively seek next candidate actions based upon user-generated logs. Our model points out that we need more explicit support for transferring findings, insights, and knowledge. As Endert et al. [7] state, visual analytics should recognize and integrate human working processes into the system. There are different ways in which systems could offload some burdens from human users. The key is the interaction with visual analytics, although we should be aware of interaction costs (Lam [18], van Wijk [40]). Frequent interactions triggered by the system could demand too much effort, which may discourage users' exploration.

After the analysis, the results have to be documented or communicated to others. An interesting capability of visual analytics systems could be a semi-automatic approach for generating documentation. Algorithms could detect interesting new findings and put them together in a convincing form, removing dead ends and duplicate findings. Such intelligent journals would make the creation of reports or presentations much easier. Alternatively, written reports could be enriched with findings supporting statements in the document.
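
A minimal sketch of such semi-automatic documentation: given a provenance trail of findings, drop those marked as dead ends, collapse duplicates, and emit the rest in analysis order as a draft report. The trail format and the dead-end flag are hypothetical; judging interestingness automatically would require considerably more machinery.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    description: str
    dead_end: bool = False  # hypothetical flag set during the analysis


def draft_report(trail):
    """Remove dead ends and duplicates, keep analysis order (illustrative)."""
    seen, lines = set(), []
    for f in trail:
        if f.dead_end or f.description in seen:
            continue
        seen.add(f.description)
        lines.append(f"- {f.description}")
    return "Findings:\n" + "\n".join(lines)


trail = [
    Finding("Spike in sales in March"),
    Finding("Cluster of late shipments", dead_end=True),
    Finding("Spike in sales in March"),             # duplicate is dropped
    Finding("Sales spike coincides with campaign"),
]
print(draft_report(trail))
```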

The analysis process often follows one direction, and many findings are not explored in depth. Visual analytics systems could provide functionality to backtrack to former analysis results and suggest interesting but uninvestigated findings for further analysis. This would require algorithms that judge the interestingness of findings in the context of the analysis results obtained so far. This functionality, together with a short summary of previous results, could be helpful for continuing an interrupted analysis session.

5.5 Real World Scenario

Our model reflects an ideal scenario in which an analyst uses one single visual analytics system that is capable of handling all requirements, but real world scenarios are different. On the one hand, users are often not aware of the system's capabilities and lack the required expertise to understand complex analysis methods (Gulf of Execution). On the other hand, the system's capabilities may not be sufficient to solve an analysis task. As a result, analysts may stop their analysis in order to consult domain experts, or continue their analysis with another system. In addition, our model implies that each action ends in an observable reaction of the system. Real world systems often lack this capability, which is a known problem in interaction science (Gulf of Evaluation).

The analysis of real world problems requires expertise in both the analysis and the domain. Domain experts often lack experience in understanding computer systems, visualization techniques, and analysis methods, whereas visual analytics experts lack sufficient domain knowledge. Thus, analysis requires collaboration between them. Without domain knowledge, a visual analytics analyst is able to generate findings and insights concerning the data, but not the domain. The domain expert is responsible for the formulation of problem hypotheses and the detection and interpretation of patterns. Domain experts need to be familiar with visual analytics methods and systems; on the other hand, analysts have to learn about the problem domain.

One thing to keep in mind is that our model simplifies many different processes and contains inherent fuzziness, especially on the human side. That is the reason why our model consists of various loops that represent several levels of thinking. Obviously, no visual analytics tool can differentiate exactly between reasoning processes. Furthermore, the processes can propagate influences into other processes in ways that may not clearly appear in our model. Often many conflicting hypotheses are investigated in parallel, and derived findings, insights, or knowledge may affect each other.

5.6 Teaching Visual Analytics

Our model can be useful for providing a general overview for novice students, designers, and researchers in visual analytics. Teaching visual analytics often requires teachers to convey fundamental concepts that commonly appear throughout applications, which inevitably involves domain knowledge. Our model can be used to explain the knowledge generation process without the need for such expert language. In addition, this model touches upon various components of visual analytics, such as interactions and automatic algorithms, from the perspective of visual analytics applications. This guideline could point researchers to relevant literature in case they want to find out about specific methods. Our model also highlights the importance of the interplay and collaboration between humans and machines. This rather obvious but easily forgotten notion could be highlighted, as illustrated in Figure 1.

6 CONCLUSION

In this paper we present a process model for knowledge generation in visual analytics that integrates system and human aspects. The model defines and relates the relevant concepts and describes the knowledge generation process from data to knowledge and vice versa. The model embeds these concepts into a three-loop framework and illustrates possible human-machine pairings that are fundamental in visual analytics. We illustrated how our model integrates existing models and theories that focus on specific parts of the overall context. We demonstrated the model's application using Jigsaw as an example system, as well as undertaking a comparative assessment of three other well-known applications. Finally, we discussed the model's implications, named open issues, and pointed to future directions. The right hand side (human part) of our model is not restricted to visual analytics and can also be relevant for other disciplines, as it combines computer and human based theories. The model aims to give a basis for more detailed compositions of theories. Whilst it is not our intention to cover every single detail of such processes, we do provide an overview of what commonly appears in visual analytics processes. We also acknowledge that humans' knowledge processes are neither linear nor clearly subdividable into the components of our model (e.g., insight). However, we provide an inherently limited but meaningful distinction between humans' knowledge gaining processes.

REFERENCES

[1] M. R. Berthold, N. Cebron, F. Dill, T. R. Gabriel, T. Kötter, T. Meinl, P. Ohl, K. Thiel, and B. Wiswedel. KNIME - the Konstanz Information Miner: version 2.0 and beyond. SIGKDD Explorations, 11(1):26–31, 2009.


[2] E. Bertini and D. Lalanne. Surveying the complementary role of automatic data analysis and visualization in knowledge discovery. In Proceedings of the ACM SIGKDD Workshop on Visual Analytics and Knowledge Discovery: Integrating Automated Analysis with Interactive Exploration, pages 12–20. ACM, 2009.

[3] E. Bertini, A. Tatu, and D. A. Keim. Quality metrics in high-dimensional data visualization: An overview and systematization. IEEE Transactions on Visualization and Computer Graphics, 17(12):2203–2212, 2011.

[4] M. Brehmer and T. Munzner. A multi-level typology of abstract visualization tasks. IEEE Transactions on Visualization and Computer Graphics, 19(12):2376–2385, 2013.

[5] S. K. Card, J. D. Mackinlay, and B. Shneiderman. Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann, 1999.

[6] R. Chang, C. Ziemkiewicz, T. M. Green, and W. Ribarsky. Defining insight for visual analytics. IEEE Computer Graphics and Applications, 29(2):14–17, 2009.

[7] A. Endert, M. Hossain, N. Ramakrishnan, C. North, P. Fiaux, and C. Andrews. The human is the loop: new directions for visual analytics. Journal of Intelligent Information Systems, pages 1–25, 2014.

[8] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. From data mining to knowledge discovery in databases. AI Magazine, 17(3):37, 1996.

[9] C. Görg, Z. Liu, J. Kihm, J. Choo, H. Park, and J. T. Stasko. Combining computational analyses and interactive visualization for document exploration and sensemaking in Jigsaw. IEEE Transactions on Visualization and Computer Graphics, 19(10):1646–1663, 2013.

[10] D. Gotz and Z. Wen. Behavior-driven visualization recommendation. In Proceedings of the 14th International Conference on Intelligent User Interfaces, pages 315–324, New York, NY, USA, 2009. ACM.

[11] D. Gotz, Z. Wen, J. Lu, P. Kissa, N. Cao, W. H. Qian, S. X. Liu, and M. X. Zhou. HARVEST: An intelligent visual analytic tool for the masses. In Proceedings of the First International Workshop on Intelligent Visual Interfaces for Text Analysis, IVITA '10, pages 1–4, New York, NY, USA, 2010. ACM.

[12] D. Gotz and M. X. Zhou. Characterizing users' visual analytic activity for insight provenance. Information Visualization, 8(1):42–55, 2009.

[13] T. Green, W. Ribarsky, and B. Fisher. Visual analytics for complex concepts using a human cognition model. In IEEE Symposium on Visual Analytics Science and Technology (VAST), pages 91–98, Oct. 2008.

[14] T. M. Green, W. Ribarsky, and B. Fisher. Building and applying a human cognition model for visual analytics. Information Visualization, 8(1):1–13, Jan. 2009.

[15] D. A. Keim, J. Kohlhammer, G. Ellis, and F. Mansmann. Mastering the Information Age - Solving Problems with Visual Analytics. Eurographics Association, 2010.

[16] D. A. Keim, F. Mansmann, J. Schneidewind, J. Thomas, and H. Ziegler. Visual analytics: Scope and challenges. Springer, 2008.

[17] B. C. Kwon, B. Fisher, and J. S. Yi. Visual analytic roadblocks for novice investigators. In IEEE Conference on Visual Analytics Science and Technology (VAST), pages 3–11, 2011.

[18] H. Lam. A framework of interaction costs in information visualization. IEEE Transactions on Visualization and Computer Graphics, 14(6):1149–1156, Nov. 2008.

[19] P. Legrenzi, V. Girotto, and P. N. Johnson-Laird. Focussing in reasoning and decision making. Cognition, 49(1):37–66, 1993.

[20] Z. Liu, N. J. Nersessian, and J. T. Stasko. Distributed cognition as a theoretical framework for information visualization. IEEE Transactions on Visualization and Computer Graphics, 14(6):1173–1180, 2008.

[21] L. Magnani. Abduction, Reason and Science: Processes of Discovery and Explanation. Springer, 2001.

[22] N. Mahyar, A. Sarvghad, and M. Tory. A closer look at note taking in the co-located collaborative visual analytics process. In IEEE Symposium on Visual Analytics Science and Technology (VAST), pages 171–178, Oct. 2010.

[23] F. Mansmann, F. Fischer, and D. A. Keim. Dynamic visual analytics facing the real-time challenge. In Expanding the Frontiers of Visual Analytics and Visualization, pages 69–80. Springer, 2012.

[24] S. Nobarany, M. Haraty, and B. Fisher. Facilitating the reuse process in distributed collaboration: a distributed cognition approach. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, pages 1223–1232. ACM, 2012.

[25] D. Norman. The Design of Everyday Things. Basic Books, 2002.

[26] C. North. Toward measuring visualization insight. IEEE Computer Graphics and Applications, 26(3):6–9, 2006.

[27] C. S. Peirce. Collected Papers: Elements of Logic, volume 2, pages 56–59. Belknap Press, 1965.

[28] W. A. Pike, J. Stasko, R. Chang, and T. A. O'Connell. The science of interaction. Information Visualization, 8(4):263–274, 2009.

[29] P. Pirolli and S. Card. The sensemaking process and leverage points for analyst technology as identified through cognitive task analysis. In Proceedings of the International Conference on Intelligence Analysis, volume 5, pages 2–4, 2005.

[30] M. Pohl, M. Smuc, and E. Mayr. The user puzzle: Explaining the interaction with visual analytics systems. IEEE Transactions on Visualization and Computer Graphics, 18(12):2908–2916, Dec. 2012.

[31] J. C. Roberts, D. A. Keim, T. Hanratty, R. R. Rowlingson, R. Walker, M. Hall, Z. Jacobson, V. Lavigne, C. Rooney, and M. Varga. From ill-defined problems to informed decisions. In EuroVis Workshop on Visual Analytics, 2014.

[32] A. Robinson. Collaborative synthesis of visual analytic results. In IEEE Symposium on Visual Analytics Science and Technology (VAST), pages 67–74, 2008.

[33] P. Saraiya, C. North, and K. Duca. An insight-based methodology for evaluating bioinformatics visualizations. IEEE Transactions on Visualization and Computer Graphics, 11(4):443–456, 2005.

[34] H.-J. Schulz, T. Nocke, M. Heitzler, and H. Schumann. A design space of visualization tasks. IEEE Transactions on Visualization and Computer Graphics, 19(12):2366–2375, 2013.

[35] B. Settles. Active learning literature survey. University of Wisconsin, Madison, 52:55–66, 2010.

[36] B. Shneiderman. The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings of the IEEE Symposium on Visual Languages, pages 336–343. IEEE, 1996.

[37] Y. Shrinivasan, D. Gotz, and J. Lu. Connecting the dots in visual analysis. In IEEE Symposium on Visual Analytics Science and Technology (VAST), pages 123–130, Oct. 2009.

[38] Y. Shrinivasan and J. van Wijk. Supporting exploration awareness in information visualization. IEEE Computer Graphics and Applications, 29(5):34–43, Sept. 2009.

[39] J. T. Stasko, C. Görg, Z. Liu, and K. Singhal. Jigsaw: Supporting investigative analysis through interactive visualization. In IEEE Symposium on Visual Analytics Science and Technology (VAST), pages 131–138. IEEE, 2007.

[40] J. van Wijk. The value of visualization. In IEEE Visualization (VIS '05), pages 79–86, Oct. 2005.

[41] T. von Landesberger, S. Fiebig, S. Bremm, A. Kuijper, and D. W. Fellner. Interaction taxonomy for tracking of user actions in visual analytics applications. In Handbook of Human Centric Visualization, pages 653–670. Springer, 2014.

[42] W. Wright, D. Schroh, P. Proulx, A. Skaburskis, and B. Cort. The Sandbox for analysis: Concepts and methods. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '06), pages 801–810, New York, NY, USA, 2006. ACM.

[43] J. S. Yi, Y.-a. Kang, J. T. Stasko, and J. A. Jacko. Understanding and characterizing insights: how do people gain insights using information visualization? In Proceedings of the 2008 Workshop on BEyond time and errors: novel evaLuation methods for Information Visualization, page 4. ACM, 2008.