HAL Id: hal-01672282
https://hal.archives-ouvertes.fr/hal-01672282
Submitted on 23 Dec 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Visual Network Exploration for Data Journalists
Tommaso Venturini, Mathieu Jacomy, Liliana Bounegru, Jonathan Gray

To cite this version: Tommaso Venturini, Mathieu Jacomy, Liliana Bounegru, Jonathan Gray. Visual Network Exploration for Data Journalists. Scott A. Eldridge II; Bob Franklin. The Routledge Handbook of Developments in Digital Journalism Studies, Routledge, 2018, 9781138283053. hal-01672282


VISUAL NETWORK EXPLORATION FOR DATA JOURNALISTS
TOMMASO VENTURINI, MATHIEU JACOMY, LILIANA BOUNEGRU, JONATHAN GRAY

Networks are classic but under-acknowledged figures of journalistic storytelling. Who is connected to whom and by which means? Which organizations receive support from which others? What resources or information circulate through which channels and which intermediaries enable and regulate their flows? These are all customary stories and lines of inquiry in journalism and they all have to do with networks. Additionally, the recent spread of digital media has increasingly confronted journalists with information coming not only in the traditional form of statistical tables, but also of relational databases. Yet, journalists have so far made little use of the analytical resources offered by networks. To address this problem, in this chapter we examine how “visual network exploration” may be brought to bear in the context of data journalism in order to explore, narrate and make sense of large and complex relational datasets. We borrow the more familiar vocabulary of geographical maps to show how key graphical variables such as position, size and hue can be used to interpret and characterise graph structures and properties. We illustrate this technique by taking as a starting point a recent example from journalism, namely a catalogue of French information sources compiled by Le Monde’s Décodex. We establish that good visual exploration of networks is an iterative process where practices to demarcate categories and territories are entangled and mutually constitutive. To enrich investigation we suggest ways in which the insights of the visual exploration of networks can be supplemented with simple calculations and statistics of distributions of nodes and links across the network. We conclude with reflection on the knowledge-making capacities of this technique and how these compare to the insights and instruments that journalists have used in the Décodex project, suggesting that visual network exploration is a fertile area for further exploration and collaborations between data journalists and digital researchers.

INTRODUCTION

Few people know as well as journalists that the world is made of relations. Following alliances, unveiling links, unravelling threads is, and has long been, a central part of their investigations. If social scientists can speculate about longstanding structures and global arrangements, journalists have no such leisure. Their work consists in tracing the specific associations that connect individuals and institutions to uncover how lumps of money, influence and knowledge are exchanged through them and where unethical behaviour, corruption, fraud or unfair political influence may occur. The advent of digital technologies has made such work both easier and more difficult. Easier, because it has increased the traceability of economic and political associations. More difficult, because it has submerged journalists with more information than their investigative toolkit is used to handling.

When, for example, the reporters of the International Consortium of Investigative Journalists (ICIJ) received the 2.6 terabytes and 11.5 million documents composing the so-called 'Panama Papers', they obviously could not process them manually (Baruch & Vaudano, 2016). Note that this is not just a ‘big data’ problem. The trouble with the leak was not only its size, but the fact that its interest came from the links it established between specific individuals and particular tax havens. Extracting “key” figures through statistical aggregation or abstracted computational models would miss the point of many of the stories that journalists were most keen to explore. The inquiry could not simplify the dataset, but had to explore each and every one of the connections it exposed. This was done, among other ways, through a tool called Linkurious (http://linkurio.us), whose interest comes less from its computational power than from the way in which it allows its users to see and follow the connections of a network.

The Panama Papers case is interesting, but also interestingly isolated. Despite longstanding interest, the use of networks in journalism remains comparatively marginal (cf. Bounegru et al., 2016 for an overview of the emerging uses of networks in journalism). The reasons are not difficult to imagine. Graph mathematics is more demanding and less widely known than traditional statistical approaches and does not come with the same readily accessible and publicly recognised vocabulary of visual motifs. With all its computational power, graph mathematics does not fit journalistic needs because it tends to be obscure for both reporters and their readers.

In this chapter, we address this difficulty by suggesting a technique for the visual exploration of networks. As we will try to show, when performed correctly, the visual representation of networks translates some of the most important graph structures into graphical variables (thereby supporting investigative work) and allows the interpretation of networks with conventions similar to those developed for geographical maps (thereby remaining legible for a large audience). After having introduced the mathematical and historical bases of our approach, we will present our technique for the visual exploration of networks. Using as an example the network of the French information sphere, we will illustrate the recursive work of interpretation and categorisation that allows the network to be read as an organised territory. Visual network exploration, which is growing in prominence amongst digital methods researchers for social and cultural research, may be useful not only for studying media landscapes, but also for digital journalism practitioners who are interested in exploring and telling stories with networks and relational data.

UNDERSTANDING FORCE-DIRECTED LAYOUTS

Far from being merely aesthetic, the graphical representation of networks has an intrinsic hermeneutic value, which you will have experienced if you have ever used a public transportation map. Such maps are distinctively different from road maps or city maps. It is not only that transportation maps are simpler (the level of detail depending only on the resolution of the map), it is that they represent a network and not a geographical territory. An illustration of this difference can be found in the famous map of the London tube as designed by Harry Beck in 1933. Before Beck’s redesign, the diagram was a classic geographical map locating stations according to their coordinates. After the redesign, it became a network of correspondences in which stations are positioned according to their relative proximity and connectivity. The gain in legibility is evident, as the function of the transportation map is not to situate stations in urban space, but relative to each other, so as to help users to move from one to another (a type of orientation that strikingly resembles the one used by traditional sea navigators, see, for example, Turnbull, 2000, pp. 133-165).

Figure 1. London tube map (a) in 1920 before Beck's redesign and (b) in 1933 after Beck's redesign.

Another example of such a mapping approach comes from early works in Social Network Analysis (Freeman, 2000). Jacob Moreno, founder of SNA, is explicit about the importance of visualization: 'A process of charting has been devised by the sociometrists, the sociogram, which is more than merely a method of presentation. It is first of all a method of exploration' (1953, pp. 95-96). In an interview given by Moreno to the New York Times in 1933, network analysis is presented as a 'new geography'. More important than the title, however, is the figure that accompanies that interview, depicting friendships among fourth grade pupils. The sociogram presented in this figure powerfully reveals how friendship is not equally distributed in the class. One only needs to know that triangles represent boys and circles girls to see how inter-gender relationships are discouraged at that specific age (or at least the declaration of such friendships). The trick, of course, only works because the nodes are not positioned randomly in the space, but in a way that minimizes line-crossing (in Moreno’s own words 'the fewer the number of lines crossing, the better the sociogram', 1953, p. 141). It is because triangles are pushed on one side and circles on another that it is easy to spot the existence of a single inter-gender connection.

Figure 2. Sociogram representing friendship among school pupils (original title and image accompanying Moreno’s 1933 New York Times interview) (a) in the original version and (b) in the modern force-directed spatialisation.

Moreno’s rule of spatialisation is easy to follow on a graph of a few dozen nodes and edges but impracticable on larger networks. Graphs with thousands of nodes and edges are so intricate that the direct counting of line-crossings becomes prohibitively time-consuming. An indirect approach consists of drawing connected nodes closer together to minimize the length of the edges and therefore the possibility of crossings. But even in this case, since each node may be connected to several other nodes which are themselves connected to many other nodes, minimizing the length of the edges is far from a trivial exercise.
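To make the cost of direct counting concrete, here is a minimal sketch (not taken from the chapter) that counts line-crossings for a given placement of nodes. It compares every pair of edges, which is why the effort grows roughly with the square of the number of edges. The function names, node names and coordinates are all hypothetical, and edges are assumed to be drawn as straight segments.

```python
from itertools import combinations

def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)

def segments_cross(a, b, c, d):
    """True when segment a-b and segment c-d cross at a single interior point."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)

def count_crossings(positions, edges):
    """Count crossing pairs of edges; pairs that share a node are skipped."""
    crossings = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if {u1, v1} & {u2, v2}:
            continue
        if segments_cross(positions[u1], positions[v1],
                          positions[u2], positions[v2]):
            crossings += 1
    return crossings

# Hypothetical example: the same two edges drawn with two different placements.
edges = [("a", "b"), ("c", "d")]
crossed = {"a": (0, 0), "b": (1, 1), "c": (0, 1), "d": (1, 0)}
untangled = {"a": (0, 0), "b": (1, 0), "c": (0, 1), "d": (1, 1)}

print(count_crossings(crossed, edges))    # the two diagonals intersect
print(count_crossings(untangled, edges))  # the two horizontal edges do not
```

With a few thousand edges this pairwise comparison already means millions of checks per candidate placement, which illustrates why an indirect criterion is attractive.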

Thus we might explore the network using a technique called 'force-directed spatialisation'. Such spatialisation follows a physical analogy: nodes are charged with a repulsive force that drives them apart, while edges act as springs binding the nodes that they connect. Once the algorithm is launched, it changes the disposition of nodes until it reaches a balance of forces (Jacomy et al., 2014). Such equilibrium reduces line-crossings and improves the legibility of the graph. Fruchterman and Reingold (1991), who proposed the first efficient force-directed algorithm, cite line-crossing as the second of their aesthetic criteria.
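The physical analogy can be sketched in a few lines. The code below is a deliberately naive illustration and should not be read as ForceAtlas2 or as the Fruchterman and Reingold algorithm: it assumes a repulsion that decays with the square of the distance, a linear spring along each edge, and a cap on how far a node may move per iteration. All parameter names and default values are assumptions chosen for readability.

```python
import math
import random

def force_layout(nodes, edges, iterations=200, repulsion=0.1, attraction=1.0,
                 step=0.1, max_move=0.1, seed=0):
    """Place nodes (a list) in 2D by iterating repulsion between all pairs and springs on edges."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels, whether connected or not.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-6)
                push = repulsion / dist ** 2
                force[u][0] += push * dx / dist
                force[u][1] += push * dy / dist
                force[v][0] -= push * dx / dist
                force[v][1] -= push * dy / dist
        # Each edge acts as a spring pulling its two endpoints together.
        for u, v in edges:
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            force[u][0] += attraction * dx
            force[u][1] += attraction * dy
            force[v][0] -= attraction * dx
            force[v][1] -= attraction * dy
        # Move each node along its net force, capping the displacement for stability.
        for n in nodes:
            mx, my = step * force[n][0], step * force[n][1]
            length = math.hypot(mx, my)
            if length > max_move:
                mx, my = mx * max_move / length, my * max_move / length
            pos[n][0] += mx
            pos[n][1] += my
    return {n: tuple(p) for n, p in pos.items()}

# Hypothetical graph: two triangles joined by a single bridge edge.
nodes = ["a1", "a2", "a3", "b1", "b2", "b3"]
edges = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
         ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
         ("a3", "b1")]
layout = force_layout(nodes, edges)
```

On a graph like this one would expect the two triangles to settle as two distinct knots linked by the bridge, which is the visual counterpart of the clusters discussed in the text. Production tools add refinements (gravity, degree-dependent repulsion, approximations for large graphs) that are omitted here.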

Yet, scholars working with networks soon realised that avoiding line-crossing is not the most interesting effect of force-directed layouts. At equilibrium, the visual density of nodes and edges becomes an approximate but reliable proxy of the mathematical structure of the graph (for a detailed mathematical proof, see Venturini et al., forthcoming). Groups of nodes gathering in the layout tend to correspond to the clusters identified by community-detection techniques (Noack, 2009); structural holes (Burt, 1995) tend to look like sparser zones; central nodes move towards middle positions; and bridges are positioned somewhere between different regions (Jensen et al., 2015).

The trick of force-directed algorithms is all the more remarkable, given that the space of networks is relative rather than absolute (it can be rotated or mirrored without distortion of information) and that it is a consequence and not a condition of element positioning. In traditional geographical representation, the space is defined a priori by the way the horizontal and vertical axes are constructed. Points are projected on such pre-existing space according to a set of rules that assign a univocal position to a pair of coordinates. The same is true for any Cartesian diagram (scatterplots for instance), but not for networks, in which the space is defined by the position of the nodes and not the other way around.

Despite such differences (which should not be forgotten), force-directed algorithms allow reading networks as geographical maps, translating complicated mathematical concepts into the more conventional vocabulary of regions and margins, paths and landmarks, centres and peripheries (Lynch, 1960). This is a crucial advantage that explains why force-directed algorithms have become the de facto standard of network visualisation: they facilitate the exploration of networks and relations by means of more familiar and intuitive spatial metaphors, as well as through less familiar computational and statistical metrics.


THE DÉCODEX: A CONTROVERSIAL CASE STUDY

In the following pages, we will illustrate the technique of visual network exploration drawing on a concrete example. Our case study is a network of websites extracted from a listing compiled by the French newspaper Le Monde. Since 2009, a group of journalists gathered under the name of Les Décodeurs (www.lemonde.fr/les-decodeurs/article/2014/02/12/l-equipe-des-decodeurs_4365082_4355770.html) has verified the accuracy of thousands of stories circulating in the French blogosphere and in social media. In January 2017 (at the beginning of the French presidential campaign), Les Décodeurs launched an online tool called the Décodex (www.lemonde.fr/verification), allowing readers to search for the most important sources of online information relevant to French public debates (though not necessarily in French). Each source is accompanied by a short description and, more crucially, by an evaluation of its trustworthiness according to the journalists of Le Monde.

Figure 3. User interface of the Décodex tool by Le Monde

Not surprisingly, the classification provided by Les Décodeurs has stirred much debate in the French media sphere. Several of the sources categorized as imprecise or unreliable, along with other newspapers and blogs, have contested the Décodex, with critiques ranging from challenging the way in which websites are over-simplistically classified; to questioning the right of Le Monde (which is itself a rival source of information) to rate the reliability of other websites; to disputing the legitimacy and interest of such classification in general (arguing that some of the websites in the list are meant to circulate opinions rather than information). Les Décodeurs themselves admitted the difficulty of their exercise, the many ambiguities that they were obliged to decide on and the errors and inaccuracies that may have derived from them. At the same time, they defended their work by pointing at the increasing quantity of false or partisan information circulating online and by affirming their openness to discussing their classification and revising it if necessary.

The controversy around the Décodex is a good example of the difficulties connected to the detection of fake news online (Bounegru et al., 2017), but also of the more general debates surrounding all kinds of classification. Categorizing things is never a self-evident or innocent practice (Bowker & Star, 1999) and should always be carried out with the greatest caution. This is true for the initial classification of the Décodex, but it is also true for the network extracted from it. As we will see in the following pages, the visual exploration of networks involves a constant toing and froing of categorization and observation, typology and topology.

To build our example network, we have extracted, in collaboration with Les Décodeurs, all the websites contained in the Décodex and investigated the way in which they cite each other. To do so, we employed Hyphe (http://hyphe.medialab.sciences-po.fr), a web crawler developed by the médialab of Sciences Po, which facilitates the exploration of websites by following the hyperlinks present in their pages. All the websites comprising the Décodex corpus have been crawled at a depth of one click starting from the home page. We thus obtained a network with 653 nodes and 5943 edges. Whilst Les Décodeurs focus on editorial judgements about how to classify websites in the French media landscape, our network exploration examines the relations between them and other websites by means of their linking practices. While some researchers focus on how networks are held together through financial ties, organisational affiliations, business relationships and family and social relations, we consider their relations according to the hyperlink, in accordance with a longer tradition of digital methods, digital sociology and new media studies research (see, e.g., Marres & Rogers, 2005; Rogers, 2013).
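The idea of crawling "at a depth of one click" can be illustrated with a toy breadth-first traversal. This is only a sketch of the general principle and says nothing about how Hyphe is implemented or how it delimits a website: here a website is assumed to be a single hostname, pages are read from an in-memory dictionary instead of being fetched, and every URL is a hypothetical placeholder.

```python
from collections import deque
from urllib.parse import urlsplit

def crawl_site(pages, home, max_depth=1):
    """Return the other hostnames cited by a site, reading pages up to max_depth clicks from home.

    pages maps a URL to the list of URLs found on that page (a stand-in for
    downloading and parsing the HTML).
    """
    site = urlsplit(home).hostname
    seen = {home}
    queue = deque([(home, 0)])
    cited = set()
    while queue:
        url, depth = queue.popleft()
        for target in pages.get(url, []):
            host = urlsplit(target).hostname
            if host != site:
                cited.add(host)               # hyperlink towards another website
            elif depth < max_depth and target not in seen:
                seen.add(target)              # internal page, one more click away
                queue.append((target, depth + 1))
    return cited

# Hypothetical site with three pages at increasing distance from the home page.
pages = {
    "https://a.example/": ["https://a.example/news", "https://b.example/"],
    "https://a.example/news": ["https://c.example/story", "https://a.example/archive"],
    "https://a.example/archive": ["https://d.example/"],
}

print(sorted(crawl_site(pages, "https://a.example/", max_depth=1)))
print(sorted(crawl_site(pages, "https://a.example/", max_depth=2)))
```

In this toy case the link to d.example sits two clicks away from the home page, so it only appears when the depth is raised to two. The choice of depth therefore shapes which edges end up in the network.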

The treatment of social platforms (such as Facebook, Twitter, YouTube…) in our crawl requires some additional explanation. These platforms are both sources of information as a whole and containers of multiple individual sources in the form of pages or accounts. Since extracting all the hyperlinks from a site as large as Facebook would have been impossible, we only crawled the accounts that were specifically mentioned in the Décodex. We have, however, kept a record of all the links pointing toward the main social media platforms to investigate how they are cited by the other websites of our corpus.

A VISUAL EXPLORATION OF THE DÉCODEX NETWORK

The visual exploration of networks exploits three visual variables to graphically represent their features: position, size and hue (for a definition of these variables and their semiotic affordances, see Bertin, 1967). For the reasons discussed above, position is crucial in translating the mathematical characteristics of the graphs. Force-directed layouts create regions where numerous nodes are densely assembled and regions that are less crowded. These differences of density, determined by the uneven distribution of links, reveal the uneven association between the entities of the network. Everything may be connected in this world, but not everything is equally connected.

Discerning the spatial structure of networks, however, is not always straightforward. In the easiest cases, the difference in the density of association is such that clusters appear as well defined knots of nodes and edges separated by empty (or almost empty) zones. These zones are called 'structural holes' (Burt, 1995) and, when they exist, they provide crucial guidance for the interpretation of the network. Thanks to the ruptures created by structural holes, the boundaries of clusters can be easily detected, like cliffs separating a plateau from a valley. Most natural and social networks, however, do not exhibit such a clear separation and the borders of their clusters tend to be as gradual as hillside slopes. The fuzziness of clusters’ frontiers is not necessarily an obstacle to their recognition (one can point at a hill even when it is impossible to say exactly where it starts and ends), but it certainly makes their identification more difficult. This is why visual network analysis is often more like an exploratory expedition, where meanings and findings are progressively and hermeneutically generated, than the statistical confirmation of a set of pre-existing hypotheses (on the difference between exploratory and confirmatory analysis see Tukey, 1997 and Behrens and Chong-Ho, 2003).

This is certainly the case for our Décodex network, which, at first glance, does not present any manifest structural hole or any clear spatial structure. To visualise our network we used two main tools: Gephi (https://gephi.org) for filtering and spatializing the network (using in particular the force-directed algorithm ForceAtlas2) and Graph Recipes (http://tools.medialab.sciences-po.fr/graph-recipes) to tweak the visual rendering of the network. Though no structural holes are evident in the Décodex network, looking closely at the layout makes it possible to notice that the network does not spatialize as a perfect circle, but rather in an avocado-like shape with a smaller top and a larger bottom. These irregularities (as weak and subtle as they can be) often suggest the presence of polarising effects which can be interesting to investigate further.

Figure 4. The Décodex network spatialized by ForceAtlas2. The size of nodes is proportional to in-degree.

The first and most crucial way to explore our network is to look at the identity of the nodes that occupy its different regions. This may seem trivial, but it is not. It is a distinct advantage of visual exploration, compared to other forms of statistical analysis, that it does not aggregate the individual entities that compose its corpus: each and every node is visible in the layout and can be interrogated by the researcher. Even in a network as small as the one in our example, however, the quantity of nodes can make it difficult (and time consuming) to look at all of them.

This is where the second variable of our visual exploration, size, comes in handy. Since, in networks, nodes are defined first and foremost by their connections, we have ranked the nodes according to the number of edges pointing to them. In the jargon of network analysis this number is called 'in-degree' and nodes with an elevated in-degree are called 'authorities', because they are recognised and referred to by many others. In the previous figure and in all following, we have sized the nodes according to their in-degree so that a greater authority literally translates into increased visual prominence.
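As a small illustration of this ranking, the sketch below counts in-degree from a list of directed edges and maps it to a drawing size. The site names are hypothetical, and the sizing rule (a minimum size plus a fixed increment per incoming link, so that uncited nodes stay visible) is an assumption made for the example rather than the exact rule used for the figures.

```python
from collections import Counter

def in_degrees(edges):
    """Count, for every node, how many edges point to it."""
    return Counter(target for _source, target in edges)

def node_size(in_degree, smallest=1.0, per_link=0.5):
    """Map an in-degree to a drawing size that grows linearly with it."""
    return smallest + per_link * in_degree

# Hypothetical hyperlinks, written as (citing site, cited site).
edges = [
    ("blog", "daily"),
    ("satire", "daily"),
    ("gazette", "daily"),
    ("blog", "gazette"),
    ("daily", "gazette"),
]

authority = in_degrees(edges)
ranking = authority.most_common()  # most cited nodes first
sizes = {node: node_size(authority[node])
         for node in ("daily", "gazette", "blog", "satire")}
```

Nodes that are never cited, such as the blog and the satire site here, simply get an in-degree of zero and the smallest size.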

Reading the names of websites that occupy the two poles of our avocado, it seems natural to suppose that their separation derives from a linguistic fracture. The websites in the lower part are predominantly French, while those in the upper part are more international. A way to highlight this is to show the uneven distribution of TLDs (Top Level Domains) in the network.
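A rough way to tabulate such a distribution is sketched below. It assumes that the TLD can be read as the last dot-separated label of the hostname, which is a simplification, and the listed addresses are hypothetical placeholders rather than entries of the Décodex.

```python
from collections import Counter
from urllib.parse import urlsplit

def top_level_domain(url):
    """Return the last label of the hostname, e.g. 'fr' for 'www.example.fr'."""
    # urlsplit only recognises a hostname after '//', so add it when the scheme is missing.
    host = urlsplit(url if "//" in url else "//" + url).hostname or ""
    return host.rsplit(".", 1)[-1]

# Hypothetical list of sources.
sites = [
    "https://www.journal-one.fr/",
    "journal-two.fr",
    "http://news-three.com/politics",
    "blog-four.org",
    "https://regional-five.fr/edition",
]

tld_counts = Counter(top_level_domain(site) for site in sites)
```

Counting TLDs per region of the layout (for instance upper versus lower part) would then give a first, crude check of the linguistic reading, bearing in mind that a generic TLD such as .com says little about language.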

Fig. 5. Distribution of TLDs in the Décodex network.

The linguistic separation we just highlighted, however, is not particularly surprising or interesting. This kind of division is regularly observed in networks of websites and hyperlinks. Detecting it is important, but rather in a negative way: it makes us aware that in order to generate more interesting findings, we will have to look beyond it.

Further exploring the network, we may notice the role of not just languages, but also social network platforms, such as YouTube, Facebook, Twitter, Instagram and Dailymotion. With the remarkable exception of Wikipedia, all the main social media platforms are located in the middle right of the layout, somewhere in-between the English and the French websites (as one would expect given their multilinguality), but also separated from both by their distinctive nature (and possibly by the different way in which they have been treated in the crawl).

Moreover, by focussing on the lower and larger part of the network, we can recognise two different sub-poles, with national sources (such as Le Monde, Le Figaro, France Info, Libération...) occupying most of the lower region and the regional press clustering at the bottom-right of the layout.

Fig. 6. Zoom on the French regional press

The distinctive positions of the platforms and of the national and regional press are both interesting and nontrivial findings, but we can push our analysis further. The way to do so is by playing with the third visual variable exploited by the visual exploration of networks: the hue of the nodes. This is a laborious but revealing part of our visual exploration. It consists in categorizing the nodes of the network according to multiple classifications and visualizing these classes on the network as different colors or (as in this paper) as different shades of grey. It is important to notice that the operations of classifying the nodes and of reading the disposition of classes are not separated, but performed at the same time. As will become clear in the next pages, our technique does not consist simply in the projection of a set of pre-existing categories on a connectivity-based layout, but in recursively using the categories to make sense of the layout and the layout to define the categories. It is important to remember that color is a ‘non-mixable’ visual variable. A node can be red or blue, for example, but not the two at the same time. When categorizing nodes, it is therefore necessary to employ exclusive categories. A website, for example, can be classed in the category 'news' or 'satire', but not in both. In the (not uncommon) case of nodes resisting a unique classification, researchers can introduce a residual category such as 'multiple' or 'misc'.

As a first step in our combined exploration of topology and typology, we will color the nodes of the network according to the original categories of the Décodex. These categories refer to the trustworthiness of the sources as manually assessed by the journalists of Le Monde. The four categories are 'reliable', 'imprecise', 'unreliable' and 'satirical'. Precisely because these categories have been defined before and independently from the extraction of the network, their disposition does not follow the spatial articulation of the network. Rather, it is possible to find nodes of every category in almost all regions of the network. A remarkable exception is the satirical websites, which are to be found on the right side of the layout, both in its upper and lower part. Arguably, this position is not due to the hyperlinks between the satirical websites (which do not cite each other very much), but to their strong connection with social media platforms, to which all these sites extensively link.

Fig. 7. The 'satirical' websites according to the original Décodex classification (nodes have been emphasized by the black color and by doubling their radius despite their low degree)

The other classes are distributed more evenly but not randomly. The 'reliable' websites tend to occupy the center of both the international and the French poles, while the 'imprecise' and 'unreliable' ones take a more marginal position. More interestingly, looking at the lower part of the network, we observe two groups of 'imprecise' and 'unreliable' sources: while a majority of these nodes are positioned above the core of national and reliable websites (and hence in-between the French and the international websites), a significant minority is located below them.

Fig. 8. Highlight of the 'reliable' websites (left) and 'unreliable' and 'imprecise' websites (right)

To account for this separation, we introduce an additional categorisation based on the political leaning of the websites. In particular, we distinguish the websites that disseminate unreliable or imprecise information because they pursue a right-wing or extreme-right agenda (which occupy the center of the network) and the websites exhibiting a more general conspiratorial attitude (which occupy the bottom of the network).

Fig. 9. Highlight of the ‘conspiratorial’ websites (left) and 'right' and 'extreme right' websites (right)

Through our iterative exploration of typology and topology we have eventually revealed a partitioning of the network that, while invisible at first glance, allows us to interpret some of the main contours of the French media landscape. Though these territories are not separated by clear structural holes, the nodes that they contain are fairly consistent. Interestingly, our final classification produces a homogeneous partition of the layout not in spite of, but because of, its heterogeneity, which mixes linguistic categories, trustworthiness classes and political leanings. The fact that a non-homogeneous categorization turns out to offer the best characterization of the structure of our network should not come as a surprise. Networks are complex objects which articulate diverse elements through disparate logics. In this, they remind us of a passage by Jorge Luis Borges cited by Foucault as a perfect example of a heterogeneous classification that, while defying our traditional categories, is nonetheless highly effective at describing the culture in which it has been elaborated:

“[Borges] quotes a ‘certain Chinese encyclopaedia’ in which it is written that ‘animals are divided into: (a) belonging to the Emperor, (b) embalmed, (c) tame, (d) suckling pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies’. In the wonderment of this taxonomy, the thing we apprehend in one great leap, the thing that, by means of the fable, is demonstrated as the exotic charm of another system of thought, is the limitation of our own” (Foucault, 1970, p. XV).

Fig. 10. The heterogeneous territories of the Décodex network.

LINKING PATTERNS IN THE DÉCODEX NETWORK

Now that, by means of visual exploration, we have defined a heterogeneous but hermeneutically robust partitioning of our network, we can use it as a basis for a statistical analysis. While praising the advantages of visual interpretation, we are also aware that not all structural properties can be rendered visually. The direction of edges or the connection between different classes, in particular, are not easily read in network images. These questions, however, can be investigated by other means once the partitioning of the network has been defined.

Fig. 11. Distribution of the number of nodes per category

Fig. 11 shows the distribution of nodes in the regions identified in our final classification (see figure 10), to which we have added the ‘satirical’ websites (which we discussed above but did not include in figure 10 for the sake of legibility) as well as ‘other reliable’ and ‘other unreliable’. These two residual categories together comprise about one fifth of the nodes of the network. This relatively high figure is not uncommon. Given the heterogeneity of the networks they work with, social scientists and journalists should aim at classifications that are robust and insightful (capable of delineating homogeneous zones in the graph) rather than comprehensive.

Fig. 12. Connectivity between the categories of our final classification. Rows convey how many times the nodes of a given category cite the nodes of other categories. Columns convey how many times the nodes of a given category are cited by the nodes of other categories.

Our empirical categories are powerful tools to unveil different linking strategies in the network. Figure 12 above presents the links in the corpus aggregated by categories. As we can see, not all categories cite or are cited the same way. ‘French national media’ and ‘platforms’ are much cited and by various actors (their columns contain larger circles), while ‘satirical’ websites are scarcely cited (their column is almost empty). Platforms do not cite much, but this is merely a consequence of our method since (as explained above) most of them had not been crawled. ‘Right-wing’, conspiracy theorist and other ‘unreliable’ websites are on the contrary the origins of the highest number of citations and, very interestingly, they seem to favour “reliable” sources over “unreliable” ones. As expected, the reliable websites do not link back to them, and this asymmetry reveals an important hierarchy. To investigate this linking pattern, we will compare the incoming and outgoing links of some of the most interesting categories.
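The aggregation behind a table of this kind is simple enough to sketch. The code below counts links per pair of categories, with the citing category playing the role of the row and the cited category that of the column. The sites, their categories and the links are invented for the example and do not reproduce the figures of the corpus.

```python
from collections import Counter

def category_matrix(edges, category_of):
    """Count links per (citing category, cited category) pair."""
    matrix = Counter()
    for source, target in edges:
        matrix[(category_of[source], category_of[target])] += 1
    return matrix

# Hypothetical classification of five sites.
category_of = {
    "daily": "national media",
    "gazette": "national media",
    "rightblog": "right-wing",
    "parody": "satirical",
    "video": "platforms",
}

# Hypothetical hyperlinks, written as (citing site, cited site).
edges = [
    ("rightblog", "daily"),
    ("rightblog", "gazette"),
    ("rightblog", "video"),
    ("parody", "video"),
    ("daily", "gazette"),
    ("daily", "video"),
]

matrix = category_matrix(edges, category_of)
to_national = matrix[("right-wing", "national media")]
back_from_national = matrix[("national media", "right-wing")]
```

Comparing a cell with its mirror image, as in the last two lines, is the kind of check that brings out the asymmetry described in the text: in this toy data the right-wing blog cites national media twice and is never cited in return.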

Fig. 13. Hierarchical structure in the corpus, based on our final categories. Black arrows on the right side summarize the link structure between these categories.

Fig. 14. Simplified version of the statistical analysis presented in figure 13.

This kind of hierarchical structure is common on the web and has been explained as a consequence of preferential attachment (Barabási & Albert, 1999): actors tend to link to other websites that they perceive as higher in the hierarchy and avoid linking to those that they perceive as lower. This style of preferential attachment, whereby smaller actors link to establishment actors without reciprocation of the linking act, has elsewhere been called “aspirational linking” (Rogers, 2013). Links in a network do not always produce a hierarchy of categories, but this behaviour does. This linking pattern, and the way it fits our empirical categories, may suggest an alternative way to characterise the trustworthiness being investigated by Les Décodeurs: reliable sources are cited by all types of websites, while unreliable sources are only cited by few other types (if any).
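One possible way to operationalise this last remark is to list, for each category, which other categories cite it. The sketch below does so on invented data. The helper name, the sites and the categories are hypothetical, and treating the breadth of citing categories as an indicator is an illustration of the idea rather than a measure defined in the chapter.

```python
from collections import defaultdict

def citing_categories(edges, category_of):
    """For each category, list the other categories whose sites link to it."""
    cited_by = defaultdict(set)
    for source, target in edges:
        source_cat, target_cat = category_of[source], category_of[target]
        if source_cat != target_cat:  # links inside a category are ignored
            cited_by[target_cat].add(source_cat)
    return {cat: sorted(sources) for cat, sources in cited_by.items()}

# Hypothetical classification and hyperlinks, written as (citing site, cited site).
category_of = {
    "daily": "reliable",
    "gazette": "reliable",
    "rightblog": "right-wing",
    "truthers": "conspiracy",
}
edges = [
    ("rightblog", "daily"),
    ("rightblog", "gazette"),
    ("truthers", "daily"),
    ("truthers", "rightblog"),
    ("daily", "gazette"),
]

breadth = citing_categories(edges, category_of)
```

In this toy example the reliable category is cited by both other categories, the right-wing one only by the conspiracy category, and the conspiracy category by nobody, which mirrors the pyramid-like pattern described in the text.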

This observation is in many ways at odds with what is often affirmed about the “post-truth era” in which we have supposedly landed. While fake news is said to leverage the horizontality of digital media to blur the boundaries between true and false, the linking patterns of the (French) information spheres suggest a different picture. Despite their different ideological leanings, all websites agree on the overall hierarchy of reliability by citing in one direction and not in the other. The ‘right-wing’ websites, for example, try to blur the lines by citing both their peers and more reliable sources, but they also try to draw a line between themselves and the even less reliable ‘conspiracy theorist’ websites. Whatever its position in the pyramid of hyperlinking, every actor tries to improve its situation by linking upwards to authorities above and not linking to less reputable websites below, thus reinforcing the hierarchy.

CONCLUSION

This chapter discussed the visual exploration of networks with the aim of improving the understanding of one of the dominant visual-analytical forms of our digital age, the network diagram, and its potential role in relation to the study and practice of digital journalism. Drawing on graph semiotics and traditional cartography, this chapter proposed a model whereby the interpretation of network topology, with its regions, paths, cores and peripheries, is guided by three visual variables: position, size and hue. The process that we described is one that emphasizes the exploratory and iterative character of the investigation. While this may seem counter-intuitive at first, we emphasised that in order to surface the multiple logics that play out in the structure of a network graph, analysis should not limit itself to one classificatory principle. Multiple heterogeneous criteria of classification are often necessary to characterize the topology of a network map. Finally, we advocated mixing methods, complementing visual network exploration with statistical analyses in order to further characterise network properties.

Through the case study of the French media hyperlink map, we tried to show how the visual exploration of networks reveals new angles which other analyses may leave unexplored. In this case, the chapter illustrated an alternative way to assess websites’ reliability that complements the traditional fact-checking approach of qualifying content with an examination of the linking patterns between different regions of the network as reputational markers (Rogers, 2013). In this analysis we have thus combined the manual classification of reliability undertaken by Le Monde’s journalists with the standing of a source according to the hyperlinks that it receives and gives.

This approach enabled us to bring fresh findings to current debates around fake news. In spite of the proliferation of fabricated content of various shades, reputation hierarchies on the web seem to be maintained (at least to some extent), as fake and hyper-partisan sites deploy aspirational hyperlinking styles which favour, perhaps surprisingly, authoritative sources.

REFERENCES

Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509. Retrieved from http://www.sciencemag.org/cgi/content/abstract/sci;286/5439/509

Baruch, J., & Vaudano, M. (2016, April 8). « Panama papers » : un défi technique pour le journalisme de données. Le Monde. Paris. Retrieved from http://data.blog.lemonde.fr/2016/04/08/panama-papers-un-defi-technique-pour-le-journalisme-de-donnees

Behrens, J. T., & Chong-Ho, Y. (2003). Exploratory Data Analysis. In I. B. Weiner (Ed.), Handbook of Psychology (pp. 33–64). London: Wiley. http://doi.org/10.1002/0471264385.wei0202

Bounegru, L., Gray, J., Venturini, T., & Mauri, M. (2017). A Field Guide to Fake News. Retrieved from fakenews.publicdatalab.org

Bounegru, L., Venturini, T., Gray, J., & Jacomy, M. (2016). Narrating Networks: Exploring the Affordances of Networks as Storytelling Devices in Journalism. Digital Journalism,

Bowker, G. C., & Star, S. L. (1999). Sorting Things Out: Classification and Its Consequences (Inside Technology S.). Cambridge MA: MIT Press.

Burt, R. S. (1995). Structural Holes: The Social Structure of Competition. Cambridge MA: Harvard University Press. Retrieved from http://books.google.com/books?id=E6v0cVy8hVIC&pgis=1

Foucault, M. (1970). The Order of Things. New York: Pantheon Books.

Freeman, L. C. (2000). Visualizing Social Networks. Journal of Social Structure, 1(1).

Fruchterman, T. M., & Reingold, E. M. (1991). Graph drawing by force-directed placement. Software: Practice and Experience, 21(NOVEMBER), 1129–1164. Retrieved from http://onlinelibrary.wiley.com/doi/10.1002/spe.4380211102/abstract

Jacomy, M., Venturini, T., Heymann, S., & Bastian, M. (2014). ForceAtlas2, a Continuous Graph Layout Algorithm for Handy Network Visualization Designed for the Gephi Software. PloS One, 9(6), e98679. http://doi.org/10.1371/journal.pone.0098679

Jensen, P., Morini, M., Karsai, M., Venturini, T., Vespignani, A., Jacomy, M., … Fleury, E. (2015). Detecting global bridges in networks. Journal of Complex Networks, cnv022. http://doi.org/10.1093/comnet/cnv022

Lynch, K. (1960). The image of the city. Cambridge MA: MIT Press. Retrieved from http://books.google.com/books?hl=it&lr=&id=_phRPWsSpAgC&pgis=1

Marres, N., & Rogers, R. (2005). Recipe for Tracing the Fate of Issues and Their Publics on the Web. In B. Latour & P. Weibel (Eds.), Making Things Public: Atmospheres of Democracy (pp. 922–935). Cambridge, MA: MIT Press.

Moreno, J. (1953). Who Shall Survive? (Second Edition). New York: Beacon House Inc.

Noack, A. (2009). Modularity clustering is force-directed layout. Physical Review E, 79(2). http://doi.org/10.1103/PhysRevE.79.026102

Rogers, R. (2013). Digital Methods. Cambridge, MA: MIT Press.

The New York Times. (1933). Emotions Mapped by New Geography. The New York Times, 3 April.

Tukey, J. W. (1977). Exploratory Data Analysis. Reading, MA: Addison-Wesley.

Turnbull, D. (2000). Masons, Tricksters and Cartographers. London: Routledge.

Venturini, T., Jacomy, M., & Jensen, P. (n.d.). What do we See, When we Look At Networks. Towards a Positive Measure of Spatialisation Quality for Force-Driven Network Layouts. Forthcoming.