From Data Chaos to the Visualization Cosmos

Chao Tong and Robert S. Laramee, Visual And Interactive Computing Group, Swansea University, Swansea, UK, {806708, r.s.laramee}@swansea.ac.uk

Abstract

Data visualization is a general term that describes any effort to help people enhance their understanding of data by placing it in a visual context. We present a ubiquitous pattern of knowledge evolution that the collective digital society is experiencing. It starts with a challenge or goal in the real world. When implementing a real-world solution, we often run into barriers. Creating a digital solution to an analogue problem creates massive amounts of data. Visualization is a key technology to extract meaning from large data sets.

Keywords: Data Visualization, Real world challenge, Data chaos, digital solution

1. Introduction

Data visualization is a general term that describes any effort to help people enhance their understanding of data by placing it in a visual context. Patterns, trends and correlations that might go undetected in numeric or text-based data can be exposed and recognized more easily with data visualization software [1]. Figure 1 represents a ubiquitous pattern of knowledge evolution that the collective digital society is experiencing. It consists of six basic constituents. It starts with a challenge or goal in the real world. The goal could be to build or optimize a design, like a car or computer. The start could be a challenge such as reaching a new level of understanding or observing a behavior or phenomenon rarely or never seen previously. The goal could be running a successful business and making a profit. We all have real-world goals and challenges. We all have new understanding and knowledge we would like to obtain. We all have things we would like to build, create, and optimize.

Figure 1: A ubiquitous pattern of knowledge evolution. Here we present the model of our collective thought process.

When trying to build something, we generally know that, whatever it is, it can theoretically be built in the real world. For example, cars and structures can be built out of raw materials and components with the right tools. We also know that observations can be made, in general, by being in the right place at the right time, either personally or with recording equipment. Experiments can generally be conducted with the appropriate equipment. New levels of understanding can generally be obtained if we hire enough of the right people.

However, when implementing a real-world solution, we often run into barriers. Cars and structures are extremely expensive to build and may also require a long-term investment. Observations may be very expensive, very difficult, or even impossible. Some observations interfere with the very behavior or phenomena they are trying to study. Recording equipment may be too expensive or cause logistical problems. Equipment for experiments is generally very expensive. This is especially true if the equipment is specialized or for very small or very large-scale investigations. Also, hiring people for new understanding may not be feasible due to expense. A full-time research assistant costs 100K GBP per year under current funding agency full economic costing (FEC) requirements in the UK. Real-world solutions are generally very expensive or not feasible at all. Some real-world solutions are impossible.

It is because of the high cost of real-world solutions that collectively, as a society, we turn to digital solutions to address our challenges and goals. The dotted line in Figure 1 separates the real, physical, or analogue world on the left side from the digital world on the right. We all look to the digital world for the answers to our questions. “There must be an app for that.” or “What app can be built to solve this problem?” is the collective thinking in this day and age. Society looks towards digital solutions for its real-world problems to deliver the user from the dilemma they may face. People believe that software is less expensive to build than objects in the real world. The virtual world should be more feasible than the physical or analogue world. And this is true in many scenarios.

However, creating a digital solution to an analogue problem introduces new challenges. In particular, digital solutions, including software, create massive amounts of data. The amount of data digital approaches generate is generally unbounded. Software and storage hardware are less and less expensive with time. Thus, users collect, collect, and collect even more data. This is the point at which the knowledge evolution pipeline of Figure 1 becomes interesting. Large collections of complex data are not automatically useful. Extracting meaningful information, knowledge, and ultimately wisdom from large data collections is the main challenge facing the digital world today. The collection of essentially unbounded data is what we term data chaos. Collecting and archiving data without careful planning and well-thought-out information design sooner or later results in a chaotic data environment. Those who collect data are generally not yet aware of how difficult it is to then derive useful insight and knowledge from it.

On the other hand, the knowledge that visualization is a key technology to extract meaning from large data sets is rapidly spreading. This is one solution to the data chaos. In the early years of data visualization as a field, say the first 10 years, from 1987-1997, data visualization was considered very niche. Not many people knew of its existence. It is only since around the turn of the century that word started to spread. In the 2000s the first mainstream news stories including the phrase ‘Data Visualization’ were published. Nowadays, the field has come a long way from obscurity to breaking into the mainstream. Its presence and importance as a field are starting to become understood. Word is spreading that a data visualization community exists and that this is a topic a student can study at university.

That's the basic pattern of knowledge evolution. The rest of the chapter provides concrete examples of these six stages from real-world challenges to the visualization cosmos. The focus is on the last two stages: from data chaos to the visualization cosmos.

2. The Universal Big Data Story (and Quandary)

We can find this pattern everywhere. It doesn't matter where we look. We can see it in computational fluid dynamics. Physicists and astronomers are facing these dilemmas. It's not possible to study all the stars and black holes physically. We see this pattern with marine biologists, biochemists, psychologists, sociologists, sport scientists, journalists, and those studying the humanities. We see this evolution with government councils, banks, call centers, retail websites, and transportation. The list is virtually endless. You can experience this yourself as you collect your own photos. People like to collect things. This is another contributing factor to the data chaos. A person may not even have a goal to reach or a problem they are trying to solve. They just like to collect.

3. The Visual Cortex

Data visualization uses computer graphics to generate images of complex data sets. It's different from computer graphics. “Computer graphics is a branch of computer science, yes, but its appeal reaches far beyond that relatively specialized field. In its short lifetime, computer graphics has attracted some of the most creative people in the world to its fold,” from the classic textbook “Introduction to Computer Graphics” by Foley et al., 2000. Visualization tries to generate images of reality.

Visualization exploits our powerful visual system. We have several billion neurons dedicated to our visual processing and visual cortex [2]. See Figure 2.

Figure 2: Several billion neurons are devoted to analyzing visual information.

The numbers of neurons are not very meaningful unless we put them into context. We have eight percent of the cortex dedicated to touch and three percent dedicated to hearing. Anywhere from 4 to 10 times as much of our cortex is dedicated to visual processing as to the other senses. It makes sense to exploit the visual processing power in our brains as opposed to the other senses. It's dedicated to processing color, motion, texture, and shape.

4. Visualization Goals

Data visualization has some strengths and goals itself. One of the goals of data visualization is exploring data. This may be the case when the user does not know anything about their data set. They just want to find out what it looks like and its characteristics.

Users search for trends or patterns in the data. Exploration is for the user that's not very familiar with the data set. Visualization is also good for analysis: to confirm or refute a hypothesis. An expert may have collected the data for a special purpose and would like to confirm or refute a hypothesis or answer a specific question. Visualization is also effective for presentation.

When our exploration and analysis are finished, we can present the results to a wider audience. Visualization is also good for acceleration, i.e. to speed up something such as a search process. This is often a decision-making process or knowledge discovery process. We can see things that were otherwise impossible to see.

5. Example: Visualization of Call Center Data

Let's look at the first example of this pattern of knowledge evolution. This is from a business context. One of Swansea University's industry collaborators is called QPC Ltd. They are an innovator in call center technology. Their goal is to understand call center behavior and to increase understanding of calls and all the activities that occur inside a call center. The call centers are staffed with many agents and the agents are answering hundreds of thousands of calls every day. How can we increase our understanding of all those events and what is happening inside of a call center?

We theoretically could go down the analog or physical route. We could hire more people that stand and observe what's happening in the call center and attempt to take notes to enhance understanding. Or maybe CCTV could be used to try to film everything that's going on. These analogue solutions will be very expensive and not very practical. The analog solution to hire more people for just observation is not practically feasible and will cost too much money.

So QPC Ltd chose the digital solution. They decided to implement an event database. The database logs all events in the call center: who called, when they called, how much time they spent navigating menus inside the interactive voice response (IVR) system, how long they spent in the queue before speaking to an agent, whether or not they abandoned their call, which agent they spoke to, how long they spoke to each agent, etc. That digital solution, in the form of a database, stores millions of events every day. A call center generates lots of activity. The UK employs over a million people in call centers, or about five percent of its workforce [3]. It's a large market.

Figure 3: Visualization of call center data. Image courtesy of Roberts et al. [3].

How do we take the chaos of call center data and visualize it to make sense of it? We can use a treemap as one of the ways to visualize call center events. See Figure 3. The treemap depicts a hierarchical data structure. We start with an overview of the data and then zoom in down to different levels of detail. In this case, the size of the rectangles is initially mapped to call volume. The hours run from midnight to midnight. We can see when the call center opens and when the call volume increases and reaches its maximum at around lunchtime. Then it starts to descend again.

Color is mapped to the percentage of abandoned calls by default. Call centers try to avoid abandoned calls. We can observe a big increase in abandoned calls in the evening right after dinner, around 7 pm-8 pm. The user can map the calls to different colors according to their cost. They can also map the colors to different kinds of events, for example abandoned calls or successful calls. We can also navigate the treemap. We can zoom in smoothly and see more details.

We can zoom in to a single hour, where each rectangle represents a single call. We can visualize individual calls and how long they take. There is a call that lasted two hours. The unusual calls that last a long time jump right out. Probably they spent a long time with an agent: a very dedicated agent spent a long time trying to solve a customer problem. The users can use a clock interface to smoothly zoom and navigate each hour. The software features smooth zooming and panning, and with the clock showing, the user does not get lost.

We can easily see which hours we are observing even when we zoom in. We can zoom in even further: one hour is broken up into 10-minute intervals, and then those 10-minute intervals are broken up into single-minute intervals. We also see a standard histogram on the left which represents the data and provides an overview. Each bar represents a 10-minute interval. Color is mapped to some data attribute chosen by the user, in this case the average call length, which we can see in Figure 3. We can see that the average call length suddenly increases during the evening, and that it increases throughout the day as an overall trend.

The treemap features a fine level of detail. Each rectangle can represent a single phone call and, in this case, how long each call lasted. At the top level, the rectangles are not individual calls. Each rectangle represents an hour and then each hour is broken up into 10-minute blocks. So we have six 10-minute blocks, and then each block is broken up into individual minutes. This is an exciting project because this is the first time that QPC Ltd have ever seen an overview of the call center activity in any way, shape, or form. As soon as we see the overview we can easily make observations about the call center volume and about the increasing level of abandoned calls. The average call length is also increasing as we examine the day.
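The time hierarchy described above (hours, then 10-minute blocks, then single minutes) can be sketched in a few lines of Python. This is only an illustrative sketch and not the implementation of Roberts et al. [3]: the record fields `start_min`, `length_s`, and `abandoned`, as well as all function names, are assumptions made for this example.

```python
from collections import defaultdict

def build_hierarchy(calls):
    """Group calls into hour -> 10-minute block -> minute within the block.

    `calls` is a list of dicts with hypothetical keys:
    'start_min' (minutes since midnight), 'length_s', 'abandoned' (bool).
    """
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for call in calls:
        hour = call["start_min"] // 60
        block = (call["start_min"] % 60) // 10   # 0..5: six blocks per hour
        minute = call["start_min"] % 10          # 0..9: minute within the block
        tree[hour][block][minute].append(call)
    return tree

def summarize(calls):
    """Attributes that a treemap node could map to rectangle size or color."""
    n = len(calls)
    abandoned = sum(1 for c in calls if c["abandoned"])
    return {
        "volume": n,  # candidate for rectangle size
        "pct_abandoned": 100.0 * abandoned / n if n else 0.0,  # candidate for color
        "avg_length_s": sum(c["length_s"] for c in calls) / n if n else 0.0,
    }

def hour_summary(tree, hour):
    """Aggregate every call below one hour node."""
    calls = [c for block in tree[hour].values()
             for minute in block.values() for c in minute]
    return summarize(calls)

# Made-up sample data, for illustration only.
sample_calls = [
    {"start_min": 12 * 60 + 5, "length_s": 120, "abandoned": False},
    {"start_min": 12 * 60 + 5, "length_s": 60, "abandoned": False},
    {"start_min": 12 * 60 + 17, "length_s": 300, "abandoned": True},
    {"start_min": 19 * 60 + 30, "length_s": 7200, "abandoned": False},
]
sample_tree = build_hierarchy(sample_calls)
print(hour_summary(sample_tree, 12))
```

Zooming in corresponds to descending one level of the nested dictionaries and calling `summarize` on a smaller group of calls, down to a single call at the leaves.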

Figure 4: Focus + context filtering feature of call center data. Image courtesy of Roberts et al. [3]

We can filter calls using different sliders. This is the analytical part of the process. This is an example of focus and context visualization. See Figure 4. We focus on the calls that spend a longer time in a queue.

We can focus on the inbound calls because call centers have inbound calls and outbound calls. These can be filtered by completed calls. We can combine filters in different ways.
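Combining filters in a focus and context setting can be sketched as follows. This is a minimal sketch under stated assumptions, not the software's actual code: the field names `queue_s`, `direction`, and `abandoned` and the helper functions are hypothetical. Calls that pass every active filter form the focus, and all remaining calls are kept as context rather than discarded.

```python
def apply_filters(calls, filters):
    """Split calls into focus (pass every filter) and context (the rest)."""
    focus, context = [], []
    for call in calls:
        if all(f(call) for f in filters):
            focus.append(call)
        else:
            context.append(call)
    return focus, context

# Hypothetical filters; each one plays the role of a slider or toggle.
def min_queue_time(seconds):
    return lambda call: call["queue_s"] >= seconds

def direction(kind):
    return lambda call: call["direction"] == kind

def completed(call):
    return not call["abandoned"]

# Made-up sample data, for illustration only.
sample = [
    {"queue_s": 300, "direction": "inbound", "abandoned": False},
    {"queue_s": 20, "direction": "inbound", "abandoned": False},
    {"queue_s": 400, "direction": "outbound", "abandoned": False},
    {"queue_s": 500, "direction": "inbound", "abandoned": True},
]
focus, context = apply_filters(
    sample, [min_queue_time(120), direction("inbound"), completed]
)
print(len(focus), "calls in focus,", len(context), "calls as context")
```

Keeping the context list lets a renderer draw those calls in a muted style instead of removing them, which is the usual idea behind focus and context.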

We can click on an individual call and then obtain the most detailed level of information, like how much time the caller spent in the IVR navigating menus, how much time they spent queuing, and how much time they spent talking to agents. We have two different queuing events, an agent event, a second agent event, back in the queue, back to another agent, back into the queue again. That is a complicated phone call. That is the lowest level of detail. We can also see the type of call, in this case a consult call, and the number of events: one IVR event, four queuing events, and four different agent events.

One detailed view shows each event at a unit proportion, because events sometimes disappear in a traditional view when they are too short.

6. Example: Investigating Swirl and Tumble Flow with a Comparison of Techniques

Figure 5: Investigating swirl and tumble flow with a comparison of flow visualization techniques. Image courtesy of Laramee et al. [4].

Computational fluid dynamics is the engineering discipline of trying to predict fluid flow behavior: fluid motion as it interacts with geometries like cars, ships, or airplanes [4, 5]. If we want to understand how fluid will interact with a surface, one way to do so is to build an actual surface and a flow environment and to visualize the flow with smoke, dye, or other substances. This is something fluid engineers do. This is a field of engineering. But it is very expensive. Just try a test flight and then attempt to visualize the air flow around the wings with smoke. This is a very expensive experiment. There are analog solutions. Can we come up with a digital solution that makes this investigation more feasible, to accelerate the engineering and make it less expensive? That is the inspiration behind computational fluid dynamics (CFD).

Here's an example of a combustion chamber in an automobile engine (Figure 5). The engineer's goal is to obtain a perfect mixture of fuel to air. The way the engineers propose to do that is to create this helical motion inside the combustion chamber (Figure 5 left), and for the diesel engine example the ideal pattern for the mixture of fluid flow is a tumble motion about an imaginary axis pointing out from the page (Figure 5 right).

We can build a real physical solution, but it saves time and money to go through a digital process first before we build real solutions. We don't need to build as many real solutions. The digital solution is computational fluid dynamics. In computational fluid dynamics, the number one challenge is the amount of data that simulations generate, which is at the gigabyte and terabyte scale. CFD simulations run from weeks to even months, even on high-performance computing machines. How can we use visualization to make sense of this massive amount of CFD data?

Figure 6: Pathlets visualizing tumble motion of flow. Image courtesy of Garth et al. [5]

Let's look at some data visualization solutions for CFD data, visualizing the swirl and tumble motion. See Figure 6. This is the tumble motion example. The curves are called pathlets, or short path lines in the flow direction, and the color is mapped to crank angle. We have a piston head moving up and down a thousand cycles per minute (at the bottom, not shown).

Figure 7: A hybrid visualization of particles and vortex core lines. Particles swirling around vortex cores aid the visualization of vortex core strength. Image courtesy of Garth et al. [4, 5]

We can also use vortex core lines. The green paths are vortex core lines: centers of swirling flow. See Figure 7. When combined with particles, the particles show flow behavior around the vortex cores. This is what we call feature-based flow visualization: looking for special features in the flow [4, 5]. We can visualize the flow at the surface itself using so-called critical points, for example sinks and saddle points. And then there are some curves that connect the different points. Those are a special kind of streamline called separatrices and they show the topology of the flow.
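As background on critical points, the following sketch shows the standard textbook classification of a first-order critical point in a 2D vector field, based on the Jacobian of the field at that point. It is offered as an assumption-laden illustration only and is not claimed to be the method used in [4, 5]. The function name and the tolerance `eps` are hypothetical.

```python
def classify_critical_point(jacobian, eps=1e-12):
    """Classify a 2D critical point from its 2x2 Jacobian ((a, b), (c, d)).

    The eigenvalues of the Jacobian determine the local flow pattern:
    - real eigenvalues of opposite sign      -> saddle
    - both real parts negative               -> sink (flow converges)
    - both real parts positive               -> source (flow diverges)
    - purely imaginary eigenvalues           -> center (closed orbits)
    Complex eigenvalues add rotation ("focus"); real ones do not ("node").
    """
    (a, b), (c, d) = jacobian
    trace = a + d
    det = a * d - b * c
    if abs(det) <= eps:
        return "degenerate"  # not a first-order critical point
    if det < 0:
        return "saddle"
    if abs(trace) <= eps:
        return "center"
    kind = "sink" if trace < 0 else "source"
    discriminant = trace * trace - 4.0 * det
    shape = "focus" if discriminant < 0 else "node"
    return f"{kind} ({shape})"

print(classify_critical_point(((-1.0, 0.0), (0.0, -2.0))))
print(classify_critical_point(((1.0, 0.0), (0.0, -1.0))))
```

The trace and determinant are used here instead of computing eigenvalues explicitly, since for a 2x2 matrix they carry the same sign information.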

Topology is a skeletal representation of the flow. We can see the time-dependent topology of the flow: the sinks, the sources, and the vortex core lines. The vortex core lines are tubes in the middle. We also see separatrices on the boundaries of the surface and they are animated over time. That's a slow-motion version of time. It is slowed down quite a lot. In reality this is inside of the engine and it's moving up and down hundreds of times per minute. We can also use a volume visualization of the fluid flow specifically for the vortices, which are the areas of swirling motion. Red is mapped to one direction of circular flow and blue to the other.

The idea is to visualize the swirl and tumble motion. In this case the tumble motion is about an imaginary axis that points out at the viewer, just like a tumble dryer. We can observe an axis pointing out and downwards to the left of the ideal axis. It's a very unstable axis of rotation. That is what these visualizations show: an unstable rotational axis.

The computational fluid dynamicists see this and they observe that this is not the ideal tumble motion. There's a little bit of tumble motion right around the perimeter of the geometry, but as soon as we look in the center we still see some swirling motion, and it's very far from the ideal kind of tumble motion they strive for. They have to make some modifications to the geometry to try to realize the best mixing possible. And here this is also not the ideal swirl motion. The motion is off-center. Again they have not achieved their target of the ideal motion. That's what these visualizations show. They show the difference between the actual and predicted motion.

One of the things that the engineers like to know is where precisely the flow is misbehaving. They know what they want to see and what they expect to see. They like to see visualizations that highlight unwanted behavior. That's what all users want to see. In fact, that could be in the knowledge evolution pipeline. One of the things QPC would like to see is where the abandoned calls are and when people are not behaving properly. Here the engineers can see where the flow does not behave properly. This is one of the strengths of visualization: to show when and where behavior goes wrong.

7. Example: Visualization of Sensor Data from Animal Movement

The next example is from marine biology. Marine biologists would like to understand marine wildlife and how marine wildlife behaves. One of the challenges that they face is deep-sea underwater diving. How do you study animals that dive deep underwater for hours or even days at a time? How is that possible? Theoretically the solution might be to follow the animal. That might be one kind of approach. But there are some problems with that. People cannot just dive a few kilometers underneath the water. They can try to build submarines or similar, but to try to follow a cormorant or tortoise in a submarine is not a very practical solution. It's not feasible, it's very expensive, and the analog solution is one of those cases where the observation itself influences the behavior we are trying to study.

Marine biologists look to the digital world for a solution. They use sensor devices at Swansea University called a daily diary [6]. They actually capture the animals, like a cormorant. They attach the digital sensor, or maybe more than one digital sensor, to the subject and then release it. See Figure 8.

Figure 8: Visualization of Sensor Data from Animal Movement. Image courtesy of Grundy et al. [6]

Then they recapture the sensor a few hours or a few days later. They remove it from the animal and they study the information that it collects about the local environment. It collects information on local acceleration, local temperature, pressure, ultraviolet light, and a few other properties. Another challenge currently is that GPS does not work underwater at great depths. It's not possible to just plot a path naively in a dead reckoning fashion the same way we can for land animals.

However, when the user gets this data, this is what it looks like (see Figure 8 right). This is a tiny little piece of what it looks like. They plot, for every attribute, magnitude versus time. Acceleration magnitude is on the y-axis and time is on the x-axis. They claim they can infer animal behavior based on these wave patterns. They can look at a wave pattern and say that it looks like the animal is diving or the animal is hunting.

But you can see that that's not easy. This is only a few seconds of data. If you plot a day's worth of data in this fashion, it will wrap around a building a few times. The acceleration has three components: x, y, z. Here the three components are decoupled. In reality they form a vector in 3-space.

Figure 9: Spherical visualization of sensor data coupled with standard visualization (bottom). Image courtesy of Grundy et al. [6]

The marine biologists asked us if we can derive visualizations that facilitate the understanding of marine wildlife behavior. We have a standard visualization coupled with a new visualization. (See Figure 9.) In the new visual design we can see the geometry of the animals and how the animal is oriented immediately. What Grundy et al. did was reintegrate the x, y, z components of the acceleration and plot them in spherical space rather than time versus amplitude space. They map the unit vectors onto a sphere and can immediately infer animal behavior. They can also map pressure to the radius. Figure 9 shows the animal swimming at the surface and then the pressure increases. Pressure mapped to radius represents diving behavior, and the diving behavior is very easy to notice. Now that the data is visualized in spherical space we can observe swimming, hunting, and searching behavior. This spherical space is interactive so that we can rotate, zoom, and pan at different angles.
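The mapping described above can be illustrated with a short sketch: the three acceleration components are recombined into one vector, normalized to unit length so that it lands on a sphere, and optionally pushed outwards according to pressure. The function name and the parameters `base_radius` and `pressure_scale` are assumptions for this example. Grundy et al. [6] may scale pressure differently.

```python
import math

def to_sphere(ax, ay, az, pressure=None, base_radius=1.0, pressure_scale=0.0):
    """Map one acceleration sample onto a sphere.

    The direction of (ax, ay, az) encodes the orientation of the animal.
    If a pressure value is given, the radius grows linearly with it
    (a hypothetical linear mapping chosen for illustration).
    Returns None for a zero vector, which has no direction.
    """
    magnitude = math.sqrt(ax * ax + ay * ay + az * az)
    if magnitude == 0.0:
        return None
    radius = base_radius
    if pressure is not None:
        radius += pressure_scale * pressure
    return (radius * ax / magnitude,
            radius * ay / magnitude,
            radius * az / magnitude)

# At the surface the point lies on the unit sphere.
print(to_sphere(0.0, 0.0, 2.0))
# Under higher pressure the same orientation moves outwards.
print(to_sphere(3.0, 0.0, 4.0, pressure=2.0, pressure_scale=0.5))
```

With this kind of mapping, a dive appears as a trail of points leaving the sphere surface, while changes in posture appear as movement across it.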

Figure 10: Spherical histogram of sensor data. Image courtesy of Grundy et al. [6]

Figure 11: Utilizing data clustering methods of sensor data. Image courtesy of Grundy et al. [6]

Figure 10 presents a spherical histogram. The vectors are binned into unit rectangles, and the more time an animal spends in a given posture at that orientation, the longer the histogram bin. We can see the postures and the states that the animals spend a long time in. Rather than focusing on all of the time, the user chooses a special region and then the region is plotted up close in the left-hand corner. The user can cluster the vectors into different groups (see Figure 11), assigning each data point to a group that represents some interesting aspect of the animal behavior. The user can adjust the probability of any data sample belonging to one of the clusters. These are clusters of animal postures calculated using K-means clustering. Grundy et al. can represent clusters as spheres and then connect the spheres, or the postures, with edges that represent transitions from one orientation to another successively. We can observe the transitions between various states and postures. We can see the most popular or dominant states. That information pops up immediately.
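The clustering and transition ideas can be sketched as follows. This is a plain Lloyd-style K-means on orientation vectors followed by a count of the transitions between consecutive cluster labels. It is a simplified sketch, not the implementation of Grundy et al. [6]: the deterministic choice of initial centroids, the use of Euclidean distance, and all function names are assumptions, and the adjustable membership probability mentioned above is not modeled.

```python
from collections import Counter

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
            for p in points]

def kmeans(points, k, iterations=50):
    """Plain K-means; initial centroids are evenly spaced samples (an assumption)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    step = len(points) // k
    centroids = [tuple(points[i * step]) for i in range(k)]
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster where it was
        if updated == centroids:
            break  # converged
        centroids = updated
    return assign(points, centroids), centroids

def count_transitions(labels):
    """Count changes from one cluster to another in time order (graph edges)."""
    return Counter((a, b) for a, b in zip(labels, labels[1:]) if a != b)

# Made-up orientation samples in time order, for illustration only.
postures = [(0, 0, 1), (0, 0.1, 1), (1, 0, 0), (1, 0.1, 0), (0, 0, 1), (1, 0, 0)]
labels, centers = kmeans(postures, k=2)
print(labels)
print(count_transitions(labels))
```

In a node-link view, each centroid would be drawn as a sphere and each entry of the transition counter as an edge, with the count available for edge thickness.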

8. Example: Visualization of Molecular Dynamics Simulation Data

The goal here is to understand biology at the molecular level. There are analog approaches and solutions to this challenge. Biologists run experiments at the molecular level and try to understand the behavior of molecules using experiments and nuclear magnetic resonance spectroscopy. These machines and experiments are very expensive.

Figure 12: Visualization of molecular dynamics simulation data. Image courtesy of Alharbi et al. [7]

The whole field of computational biology attempts to address this challenge in the digital world because it's much less expensive than the analog world. As with any simulation data, the simulation experts generate massive amounts of data. They try to use the latest high-performance computing machines.

This is the interaction of lipids and proteins. See Figure 12. That's what this simulation data shows, and Alharbi et al. [7] developed some visualization software to enhance understanding of this. The holes are proteins and the paths are lipid trajectories. The computational biologists attempt to visualize the interaction between the trajectories and the proteins.

Alharbi et al. are trying to develop visualizations to help computational biologists understand the data with a special focus, in this case, on path filtering. Given the massive number of trajectories, hundreds of thousands or millions of trajectories over multiple time steps, is it possible to select a subset of those trajectories based on interesting properties that help the biologists understand the behavior? Alharbi et al. develop tools for filtering and selection of these trajectories to try to understand behavior. One example is just changing the time step of the simulation or filtering the path by its length. They can focus on shorter paths or on longer paths. They can slide the filter over to long paths or the long trajectories.

The user can filter the paths based on other characteristics. They chose a few properties that they hope will be interesting for the computational biologists. One property is curvature. There are highly curved paths.
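Filtering trajectories by length and by curvature can be sketched as below. This is only an illustration under stated assumptions and not the MolPathFinder implementation [7]: a path is assumed to be a list of 3D points, and the summed turning angle between consecutive segments is used as a simple stand-in for curvature. The function names and threshold parameters are hypothetical.

```python
import math

def path_length(path):
    """Sum of the Euclidean lengths of all segments of a path."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))

def total_turning(path):
    """Sum of angles (radians) between consecutive segments: a curvature proxy."""
    total = 0.0
    for p, q, r in zip(path, path[1:], path[2:]):
        u = [b - a for a, b in zip(p, q)]
        v = [b - a for a, b in zip(q, r)]
        nu, nv = math.hypot(*u), math.hypot(*v)
        if nu == 0.0 or nv == 0.0:
            continue  # skip repeated points, which have no direction
        cos_angle = sum(a * b for a, b in zip(u, v)) / (nu * nv)
        total += math.acos(max(-1.0, min(1.0, cos_angle)))
    return total

def filter_paths(paths, min_length=0.0, max_length=math.inf, min_turning=0.0):
    """Keep paths whose length and turning fall inside the chosen ranges."""
    return [p for p in paths
            if min_length <= path_length(p) <= max_length
            and total_turning(p) >= min_turning]

# Made-up trajectories, for illustration only.
straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
bent = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
long_path = [(0, 0, 0), (5, 0, 0)]
print(filter_paths([straight, bent, long_path], min_length=3.0))
print(filter_paths([straight, bent, long_path], min_turning=1.0))
```

Each threshold plays the role of one slider, so sliding the length filter towards long paths corresponds to raising `min_length`.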

The atom trajectories are actually three-dimensional, but they're limited to a layer analogous to the biosphere such that the z dimension is relatively small compared to the x and y dimensions. They can visualize projected 2D space or the volumetric 3-space. The user can experiment with 2D versus 3D. The standard visualization packages for this are constrained to a two-dimensional plane and they're generally not interactive.

Conclusion

This chapter presents a ubiquitous model of knowledge evolution witnessed at a collective level by a society deeply involved with the digital world. It presents a theory supported by a number of case studies ranging from the call center industry, to automotive engineering, to computational biology. It sets the stage for data visualization as a vital technology to evolve our understanding of data and the world it describes to the next level. It will be exciting to witness how this model and pattern evolve over time.

References

[1] Data Visualization definition. Available from: https://searchbusinessanalytics.techtarget.com/definition/data-visualization [Accessed: 2018-06]

[2] C. Ware, Information visualization: perception for design. Elsevier, 2012.

[3] R. Roberts, C. Tong, R. Laramee, G. A. Smith, P. Brookes, and T. D’Cruze, “Interactive Analytical Treemaps for Visualization of Call Centre Data,” in Proceedings of the Conference on Smart Tools and Applications in Computer Graphics. Eurographics Association, 2016, pp. 109–117.

[4] R. S. Laramee, J. Schneider, and H. Hauser, “Texture-based flow visualization on isosurfaces,” in VisSym, 2004, pp. 85–90.

[5] C. Garth, R. S. Laramee, X. Tricoche, J. Schneider, and H. Hagen, “Extraction and visualization of swirl and tumble motion from engine simulation data,” in Topology-based Methods in Visualization. Springer, 2007, pp. 121–135.

[6] E. Grundy, M. W. Jones, R. S. Laramee, R. P. Wilson, and E. L. Shepard, “Visualization of sensor data from animal movement,” in Computer Graphics Forum, vol. 28, no. 3. Wiley Online Library, 2009, pp. 815–822.

[7] N. Alharbi, R. S. Laramee, and M. Chavent, “Molpathfinder: interactive multidimensional path filtering of molecular dynamics simulation data,” in The Computer Graphics and Visual Computing (CGVC) Conference, vol. 2016, 2016, pp. 9–16.