VISUAL ANALYSIS AND HISTORICAL DISCOVERY
Summer School on Big Data Information Visualisation
Chandan Kumar (University of Oldenburg), Julia Juergens (University of Hildesheim), Percy Perez (University of St. Andrews), Victoria Hore (University of Oxford)
BRIGHTSOLID: NEWSPAPER DATASET
Data description
• Newspapers
  • Fife Herald, 1833-1878
  • The Dundee Courier & Argus, 1890-1899
• Data set
  • 154 GB of XML files
  • 16,048 issues (one METS file per issue)
  • 77,954 pages (one ALTO file per page)
  • No images
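Each page is stored as ALTO XML, in which every OCR-recognised word is a `String` element whose text sits in a `CONTENT` attribute. A minimal Python sketch of pulling the words out of one page, as a first step towards word counting, might look like this. It assumes the Brightsolid files follow that ALTO layout; the namespace version, the function names and the sample page are illustrative and not taken from the dataset. Words hyphenated across lines (ALTO `HYP` elements and `SUBS_CONTENT` attributes) are not rejoined here.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip an XML namespace, e.g. '{http://...}String' -> 'String'."""
    return tag.rsplit("}", 1)[-1]


def alto_words(xml_text: str) -> List[str]:
    """Return the OCR words of one ALTO page in document order.

    Matching on the local tag name means the same code works whichever
    ALTO namespace version the file declares (or none at all).
    """
    root = ET.fromstring(xml_text)
    return [
        el.attrib["CONTENT"]
        for el in root.iter()
        if local_name(el.tag) == "String" and "CONTENT" in el.attrib
    ]


# Hypothetical, hand-made page; <SP/> marks the spaces between words.
SAMPLE_PAGE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="DUNDEE"/><SP/><String CONTENT="JUTE"/><SP/><String CONTENT="FLEET"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(alto_words(SAMPLE_PAGE))  # ['DUNDEE', 'JUTE', 'FLEET']
```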
Data processing
• 20 years of data analysed
  • 12 years have complete titles
  • 8 years have incomplete titles
  • 6,189 files analysed
  • 314 metadata files per year (average)
  • 12 years: 3,754 issues
• Word counting, formatting files to/from XML, D3 and Jigsaw
• Hadoop processing was impressive
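The slides mention word counting with Hadoop but not how the job was written. The classic MapReduce word count can be sketched in plain Python as below: a map step emits `(word, 1)` pairs, a sort stands in for Hadoop's shuffle, and a reduce step sums the counts for each word. This runs in a single process and only illustrates the pattern; it is not the team's actual job, and `mapper`, `reducer` and `word_count` are hypothetical names. With Hadoop Streaming, the mapper and reducer would instead be separate scripts that read lines from standard input and write tab-separated key/value pairs.

```python
import re
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple

TOKEN = re.compile(r"[a-z]+")


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every lower-cased alphabetic token."""
    for line in lines:
        for word in TOKEN.findall(line.lower()):
            yield word, 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reduce phase: sum the counts for each word.

    Like a Hadoop reducer, this expects input sorted by key, so that all
    pairs for one word arrive next to each other.
    """
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Run map, a local stand-in for the shuffle/sort, then reduce."""
    shuffled = sorted(mapper(lines))
    return list(reducer(shuffled))


pages = ["The jute fleet sailed", "the fleet from Calcutta"]
print(word_count(pages))
# [('calcutta', 1), ('fleet', 2), ('from', 1), ('jute', 1), ('sailed', 1), ('the', 2)]
```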
Idea generation
• What happened in the 19th century?
• Find interesting stories
• Where were events happening?
  • Overview of mentioned locations
• What were the most common topics?
  • Overview of frequent words
  • Categorisation of words
• Who was mentioned?
  • Entity recognition of names
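One simple way to get an overview of mentioned locations is gazetteer lookup: match tokens against a list of known place names and count the hits. This is a swapped-in, dictionary-based technique, not necessarily what the project used; the tiny `GAZETTEER` set and the `location_mentions` function are hypothetical, and only single-word place names are handled.

```python
import re
from collections import Counter
from typing import AbstractSet, Iterable

# Hypothetical toy gazetteer; a real one would be far larger and would
# need rules for multi-word and ambiguous place names.
GAZETTEER = frozenset({"dundee", "calcutta", "fife", "cupar", "london"})


def location_mentions(
    texts: Iterable[str], gazetteer: AbstractSet[str] = GAZETTEER
) -> Counter:
    """Count how often each single-word gazetteer place name occurs."""
    counts: Counter = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in gazetteer:
                counts[token] += 1
    return counts


sample = ["Ships from Calcutta arrived in Dundee.", "Dundee jute mills"]
print(location_mentions(sample).most_common())  # [('dundee', 2), ('calcutta', 1)]
```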
Visualisations (beyond Jigsaw)
• More numerical analysis
• User-selected dimensions and exploration
• Dynamic visualisation
  • Topics, locations, entities
• Pattern analysis
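For user-selected dimensions in a dynamic D3 view, the counts first have to be aggregated and exported in a shape a chart can bind to. The sketch below assumes one `(year, text)` pair per issue and emits a flat list of `{year, term, count}` records serialised as JSON; the record layout and the `term_timeline` name are assumptions, not the project's actual format.

```python
import json
import re
from collections import Counter
from typing import Iterable, List, Tuple


def term_timeline(docs: Iterable[Tuple[int, str]], terms: List[str]) -> List[dict]:
    """Count user-selected terms per year as flat records for a chart.

    docs:  (year, text) pairs, e.g. one per newspaper issue
    terms: the words the user picked to explore
    """
    wanted = {t.lower() for t in terms}
    counts: Counter = Counter()
    years = set()
    for year, text in docs:
        years.add(year)
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in wanted:
                counts[(year, token)] += 1
    # Emit a record for every (year, term) pair, zeros included, so a
    # line chart does not silently skip years where a term is absent.
    return [
        {"year": y, "term": t, "count": counts[(y, t)]}
        for y in sorted(years)
        for t in sorted(wanted)
    ]


docs = [(1890, "Jute prices rose"), (1891, "jute and linen"), (1891, "Linen mills")]
records = term_timeline(docs, ["jute", "linen"])
print(json.dumps(records))
```

Emitting long-format records keeps the choice of dimension on the browser side: the chart can filter or group by `term` without the data being regenerated.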
Insights
• Industrial revolution in Dundee
  • Frequency analysis, cluster overview, positive sentiments
• "LATEST MOVEMENTS OF DUNDEE JUTE FLEET"
  • Entity relations, bigram analysis
• Calcutta, Indian subcontinent?
  • Location and its commercial significance
• Baxter Brothers was the world's largest linen manufacturer (1840-1890)
  • Family names linked to an organisation
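Bigram analysis counts adjacent word pairs, which is how a recurring phrase such as "Dundee jute fleet" stands out from single-word frequencies. A minimal Python sketch follows; the second headline is invented for illustration, `bigram_counts` is a hypothetical name, and tokenisation is a simple lower-cased alphabetic split.

```python
import re
from collections import Counter
from typing import Iterable


def bigram_counts(texts: Iterable[str]) -> Counter:
    """Count adjacent word pairs within each text (pairs never span texts)."""
    counts: Counter = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        # zip(tokens, tokens[1:]) yields (w1, w2), (w2, w3), ...
        counts.update(zip(tokens, tokens[1:]))
    return counts


headlines = [
    "Latest movements of Dundee jute fleet",
    "Dundee jute fleet arrives from Calcutta",  # hypothetical example
]
print(bigram_counts(headlines).most_common(2))
# [(('dundee', 'jute'), 2), (('jute', 'fleet'), 2)]
```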
Conclusions
• A really steep learning curve
• Big data is BIG
• Distributed computing is important
• Data wants to tell interesting stories (we just need to interact with it)
• Visualisation is powerful
• Jigsaw is awesome
• Many useful visualisation tools are ready to use
• Generalisations and interactions (future work)