Visual Analysis and Historical Discovery


Posted on 14-Jul-2015





<ul><li><p>VISUAL ANALYSIS AND HISTORICAL DISCOVERY </p><p>Summer School on Big Data Information Visualisation </p><p>Chandan Kumar (University of Oldenburg), Julia Juergens (University of Hildesheim), Percy Perez (University of St. Andrews), Victoria Hore (University of Oxford) </p><p>BRIGHTSOLID: NEWSPAPER DATASET </p></li>
<li><p>Data description: Newspapers </p><p>Fife Herald, 1833-1878; The Dundee Courier &amp; Argus, 1890-1899 </p><p>Data set </p><p>154 GB of XML files; 16 048 issues (1 METS file per issue); 77 954 pages (1 ALTO file per page); no images </p></li>
<li><p>Data files </p><p>Title MET </p><p>- OCR errors - No meaning </p><p>ALTO </p></li>
<li><p>Methodology </p></li>
<li><p>Architectural overview </p></li>
<li><p>Data processing: 20 years of data analyzed </p><p>12 years have complete titles; 8 years do not have complete titles; 6189 files analysed; 314 meta files per year (average) </p><p>12 years =&gt; 3754 issues. Word counting; formatting files to/from XML, D3 and Jigsaw </p><p>Hadoop processing was impressive </p></li>
<li><p>Idea generation: What happened in the 19th century? </p><p>Find interesting stories </p><p>Where were events happening? Overview of mentioned locations </p><p>What were the most common topics? Overview of frequent words; categorization of words </p><p>Who was mentioned? Entity recognition of names </p></li>
<li><p>Visualization (overview) </p></li>
<li><p>Visualization (overview) </p></li>
<li><p>Visual Exploration with Jigsaw: Jigsaw already has good functions and visualizations! </p></li>
<li><p>Visualisations (Beyond Jigsaw): More numerical analysis </p><p>User-selected dimensions and exploration </p><p>Dynamic visualization </p><p>Topics, locations, entities </p><p>Pattern analysis </p></li>
<li><p>Interactive visualisation </p></li>
<li><p>Dynamic exploration </p></li>
<li><p>Insights: Industrial revolution in Dundee </p><p>Frequency analysis, cluster overview, positive sentiments </p><p>LATEST MOVEMENTS OF DUNDEE JUTE FLEET: entity relations, bigram analysis </p><p>Calcutta, Indian subcontinent? Location and commercial significance </p><p>Baxter Brothers was the world's largest linen manufacturer (1840-1890): family names and organizations </p></li>
<li><p>Conclusions: A really steep learning curve. Big data is BIG. Distributed computing is important. Data wants to tell interesting stories (we just need to interact). Visualisation is powerful. Jigsaw is awesome. A lot of useful visualisation tools are ready to be used. </p><p>Generalizations and interactions (future work) </p></li>
<li><p>THANK YOU FOR THE COOL (SCHOOL) EXPERIENCE. Big thanks to BRIGHTSOLID for providing the interesting dataset </p><p>Chandan Kumar, Julia Juergens, Percy Perez, Victoria Hore </p></li></ul>
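
The slides mention word counting and bigram analysis over one ALTO file per page, but show no code. The following is a minimal single-process sketch of that idea, not the team's Hadoop pipeline. It assumes the page text sits in the `CONTENT` attribute of ALTO `String` elements; the function names, the sample page, and the namespace version in the sample are illustrative assumptions, not taken from the dataset.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def alto_words(xml_text):
    """Return lowercased word tokens from the CONTENT attributes of ALTO String elements."""
    root = ET.fromstring(xml_text)
    words = []
    for elem in root.iter():
        # Compare local names only, so any ALTO namespace version is accepted.
        if elem.tag.rsplit("}", 1)[-1] == "String":
            content = elem.get("CONTENT", "")
            # Keep alphabetic runs only; this also drops some OCR punctuation noise.
            words.extend(re.findall(r"[a-z]+", content.lower()))
    return words


def count_words_and_bigrams(pages):
    """Count words and adjacent word pairs over an iterable of ALTO XML strings."""
    word_counts = Counter()
    bigram_counts = Counter()
    for xml_text in pages:
        words = alto_words(xml_text)
        word_counts.update(words)
        # Bigrams are consecutive tokens within one page, in document order.
        bigram_counts.update(zip(words, words[1:]))
    return word_counts, bigram_counts


# Hypothetical two-line page, made up for illustration (not from the dataset).
SAMPLE_PAGE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Dundee"/><SP/><String CONTENT="jute"/><SP/><String CONTENT="fleet"/>
    </TextLine>
    <TextLine>
      <String CONTENT="Dundee"/><SP/><String CONTENT="jute"/><SP/><String CONTENT="mills"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    words, bigrams = count_words_and_bigrams([SAMPLE_PAGE])
    print(words.most_common(3))
    print(bigrams.most_common(3))
```

On a corpus of this size (154 GB of XML) the same per-page function would more plausibly run as the map step of a distributed job, with the counters merged in the reduce step. Real pages would also need a stop-word list and some tolerance for OCR errors before the counts become informative.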