Visualizing textual data
CPSC 601.28
A. Butt / Feb. 26 '09
Overview
• Project implications
• Summarize "TileBars"
– Hearst / PARC (Xerox)
• Summarize "Visualizing the Non-Visual"
– Wise et al / Pacific Northwest Lab (Battelle)
• Key Issues
• Summary
• References
Project Implications
• Research area is partly based on text-based environmental reports
– textual reporting feeds into textual (quasi-judicial) regulatory framework
– rooms of binders (e.g. >20,000 pages for Mackenzie Pipeline Project)
• Vocabulary specialized / semantically complete
– "no significant adverse environmental impacts"
TileBars
• goals are to simultaneously view:
– length of a document
– relative frequency of specific words
– distribution of words with respect to each other
• benefits include:
– enhanced relevancy of search response
– patterns of frequency by document / author
– compactness of information
TileBars
• Visual representation via
– rectangular block: size equates to document length
– three bars within the block: each corresponds to a query
– in each bar tiles indicate location, saturation of tile indicates frequency
• 5 articles, 3 search queries
• 1st, 2nd, 5th appear compact / relevant
• 1st and 2nd appear to have better concurrency
• 3rd and 4th potentially less relevant, greater time investment to read
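The tile encoding described on this slide can be sketched in a few lines of Python. This is an illustration only, not Hearst's implementation: the function names `tilebar_rows` and `render`, the fixed 50-word segments (a stand-in for proper topical passages), and the text glyphs used in place of colour saturation are all assumptions made for the example.

```python
import re

def tilebar_rows(text, term_sets, segment_words=50):
    """Return one row of per-segment hit counts for each query term set."""
    words = re.findall(r"[a-z']+", text.lower())
    # Assumption: fixed-size word windows approximate the document's passages.
    segments = [words[i:i + segment_words]
                for i in range(0, len(words), segment_words)] or [[]]
    rows = []
    for terms in term_sets:
        wanted = {t.lower() for t in terms}
        rows.append([sum(1 for w in seg if w in wanted) for seg in segments])
    return rows

SHADES = " .:#"  # blank = no hits; denser glyph = more hits (stands in for saturation)

def render(rows):
    """Draw each row as a bar; the bar's length tracks document length."""
    lines = []
    for row in rows:
        tiles = "".join(SHADES[min(count, len(SHADES) - 1)] for count in row)
        lines.append("|" + tiles + "|")
    return "\n".join(lines)
```

Calling `render(tilebar_rows(document_text, [["pipeline"], ["impact"]]))` yields one bar per query; tiles that are dark in the same column across bars suggest that the queries co-occur in the same part of the document.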
Visualizing the Non-Visual
• goals are to:
– overcome time constraints in processing textual information
– overcome attention constraints; avoid becoming overwhelmed by volume of textual information
• benefits include:
– escape limitations of traditional text
– increase throughput and comprehension of information processing
– feedback on text structure to enhance visualization
Visualizing the Non-Visual
• Employ a "natural landscape" metaphor
– leverage evolutionary psychological adaptations via natural landscapes for representation
– galaxy or star-fields ("night sky")
– themescapes ("cartographic" or "landscape")
– although statistical measures are used for clustering, they are not used as directly as in TileBars
– self-organizing maps
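To make "statistical measures used for clustering" concrete, here is a minimal sketch of one common measure, cosine similarity between word-count vectors, used to place a document under its nearest theme. The slides do not say which measures Wise et al. actually use, so the choice of cosine similarity, the helper names (`term_vector`, `cosine`, `nearest_theme`), and the seed-text themes are all assumptions for illustration.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Bag-of-words vector: word -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def nearest_theme(doc, themes):
    """Return the name of the theme whose seed text is most similar to doc.

    `themes` maps a hypothetical theme name to a short seed text.
    """
    vec = term_vector(doc)
    return max(themes, key=lambda name: cosine(vec, term_vector(themes[name])))
```

In a galaxy-style display, documents assigned to the same theme would be drawn near each other; the projection to 2D positions is a separate step not shown here.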
Galaxies
• PNL software development (DOE)
• Display is a review of cancer literature
• Branched to SPIRE / In-SPIRE for government documents
Themescapes
• PNL software development (DOE)
• Branched to SPIRE / In-SPIRE for government documents (renamed "Themeview")
• Branched into NVAC (National Visualization and Analytics Center) - part of the Homeland Security infrastructure
Themescapes (2.0?)
• Branched progeny of themescapes
• Used in searching IP / Patents
• Subscription service
• Failed metaphors??
Key Issues
• Vocabulary / semantics - how do you interpret meaning from text statistics?
– earlier failures of natural language processing
– contingent semantics
• Employing metaphors (Zhang 2008)
– rely on unusual linkages (versus analogy) to highlight
– degree of "unusual-ness" is critical: too much or too little leads to confusion
Summary
www.wordle.net
References
Marti A. Hearst. TileBars: Visualization of Term Distribution Information in Full Text Information Access. CHI 1995, pp. 59-66.
James A. Wise, James J. Thomas, Kelly Pennock, David Lantrip, Marc Pottier, Anne Schur, and Vern Crow. Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents. Proc. IEEE Symp. Information Visualization (InfoVis), pp. 51-58, IEEE Computer Soc. Press, 30-31 October 1995. (in text pages 442-450)
Jin Zhang. The Implication of Metaphors in Information Retrieval. Visualization in Information Retrieval, Elsevier, 2008, pp. 215-237.