mw2011: klavans, j. +, computational linguistics in museums: applications for cultural datasets
Post on 10-May-2015
• Your spoken paper cannot be the same as your written paper
• Read more: Museums and the Web 2011 (MW2011): Presentation Guidelines | conference.archimuse.com
Judith Klavans
Robert Stein
Susan Chun
Raul Guerra
Computational Linguistics in Museums:
Applications for Cultural Datasets
COMPUTATIONAL LINGUISTICS
• Language - Words, Words, Words
• Use
• Meaning
• Syntax
• Shape of words
• Sounds
APPLICATIONS
• Speech synthesis – 1980s Talking Machines for the Blind
• Intelligent search – pre-Google
• Finding names – who, what, where
• Translation
• Speech recognition
• Answering Questions – What is Watson?
DOMAINS FOR COMPUTATIONAL LINGUISTICS
• Healthcare – interpreting patient records
• Government – helping people find information
• International Affairs – cross-language translation
• Law – analyzing Enron scandal email
• Marketing – Opinions on products
• Museums – analyzing text and tags associated with objects for better access
Computational Linguistics for Metadata Building +
INTERDISCIPLINARY RESEARCH
Computational Linguistics
in Museums
Text, Tags, Trust
• Funded in 2008 by IMLS
• With the University of Maryland and a collaborative of museum partners
• Studying the relationships between social tags, scholarly text and resources, and the application of trust networks to improve access to museum collections
MW 2011 Contributions
• Which Computational Linguistic tools can or should be applied to tags?
• How do these tools impact tag analysis?
• What results differ from the initial steve.museum results from Trant 2007?
• So what – for CL?
• So what – for Museums?
Hard Challenges
• What do these words really mean?
• How can tags be related to other tags?
  – across languages
  – across users
• How are tags over museum objects related to tags over anything else?
• How can they be used?
FINDING A NEEDLE IN THE HAYSTACK
This canvas was the first one Gauguin painted during the two months he spent in Provence.... Gauguin had rebelled against Impressionism's reliance on the visible world, and he altered nature's shapes and colors to suggest his own more subjective reaction to the landscape.
While the rural subject and acidic colors show the influence of van Gogh, this image is more indebted to Paul Cézanne. In his careful integration of the haystack and farm buildings, Gauguin has echoed Cézanne's emphasis on geometric form.
Gallery Label
TOOLS FOR TAGS
• Morphological Analysis – Conflate when possible
  – Cats, cat
  – Haystacks, haystack
  – Painting, paint ?
• What words are verbs, nouns, adjectives?
• How should multi-word tags be handled?
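The conflation step can be sketched in a few lines. This is only an illustration under stated assumptions: the plural-stripping rule is a naive stand-in for a real morphological analyzer, the function names `conflate` and `group_tags` are hypothetical, and the tag list simply reuses the examples on the slide. It deliberately leaves "Painting" and "paint" apart, since the slide marks that pair as an open question.

```python
from collections import defaultdict

def conflate(tag: str) -> str:
    """Naive plural stripping; a stand-in for a real morphological analyzer."""
    word = tag.strip().lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # e.g. "berries" -> "berry"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]                # e.g. "cats" -> "cat"
    return word

def group_tags(tags):
    """Group raw tags under their conflated form, keeping input order."""
    groups = defaultdict(list)
    for tag in tags:
        groups[conflate(tag)].append(tag)
    return dict(groups)

# Example tags taken from the slide.
tags = ["Cats", "cat", "Haystacks", "haystack", "Painting", "paint"]
groups = group_tags(tags)

for base, variants in groups.items():
    print(base, variants)
```

A rule this simple will mis-handle many words (for example singular nouns that end in "s"), which is one reason the choice of tool matters for tag analysis.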
Raw Tags or Tokens
Results: 25%, 68%, 93%
1. NN=25205
2. JJ=6319
3. NNS=4041
4. NN_NN=2257
5. JJ_NN=1792
6. VBG=1043
7. VBN=727
8. NP=708
9. OD_NN=454
10. JJ_NNS=413

Top 10 POS Patterns:
1. NN=6706
2. NN_NN=1713
3. JJ_NN=1194
4. JJ=921
5. NNS=757
6. JJ_NNS=303
7. NN_NNS=300
8. VBG=238
9. NP=209
10. VBN_NN=202
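A ranking like the one above can be produced by joining the part-of-speech labels of each tag into a pattern string and counting the patterns. The sketch below assumes the tags have already been tagged by some POS tagger; the hand-labelled sample data and the helper name `pos_pattern` are hypothetical and are not the project's data or code.

```python
from collections import Counter

# Hypothetical, hand-labelled tags: each tag is a list of (token, POS) pairs.
tagged_tags = [
    [("haystack", "NN")],
    [("farm", "NN"), ("buildings", "NNS")],
    [("green", "JJ")],
    [("green", "JJ"), ("field", "NN")],
    [("yellow", "JJ"), ("haystack", "NN")],
    [("trees", "NNS")],
    [("landscape", "NN")],
]

def pos_pattern(tagged_tag):
    """Join the POS labels of a (possibly multi-word) tag, e.g. 'JJ_NN'."""
    return "_".join(pos for _, pos in tagged_tag)

pattern_counts = Counter(pos_pattern(t) for t in tagged_tags)

# Print the patterns in the same "rank. PATTERN=count" style as the slide.
for rank, (pattern, count) in enumerate(pattern_counts.most_common(10), start=1):
    print(f"{rank}. {pattern}={count}")
```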
Hard Challenges
• What do these words really mean?
• How can tags be related to other tags?
  – across languages
  – across users
• How are tags over museum objects related to tags over anything else?
• How can they be used?
Why Part of Speech?
• Integral to most language processing pipelines
• Precursor to parsing
• However, for social tags, parsing is not a meaningful step.
Research:
• Understand the nature of this kind of descriptive tagging.
• Link part of speech information with other lexical resources for disambiguation
Gold
Orange
Necklace
Ripe
You shall know a word by the company it keeps. J.R. Firth
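One way to make "the company it keeps" concrete for tags is to count which tags co-occur on the same object. In the sketch below the per-object tag sets are invented for illustration and the function name `cooccurrence` is hypothetical. The point is only that "orange" next to "ripe" suggests a different reading than "orange" next to "gold" and "necklace".

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical tag sets, one per museum object.
object_tags = [
    {"orange", "ripe", "fruit"},
    {"orange", "gold", "necklace"},
    {"gold", "necklace", "jewelry"},
    {"orange", "ripe", "still life"},
]

def cooccurrence(tag_sets):
    """For every tag, count how often each other tag appears on the same object."""
    neighbours = defaultdict(Counter)
    for tags in tag_sets:
        for a, b in permutations(tags, 2):
            neighbours[a][b] += 1
    return neighbours

neighbours = cooccurrence(object_tags)

# The tags that keep "orange" company, most frequent first.
print(neighbours["orange"].most_common())
```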
WHAT ABOUT “NEW ENGLAND”
• Idioms / lexicalized phrases are more difficult
• Heuristic comparison to Wikipedia Titles matched 46% (30% distinct) of multiword tags
• E.g. “Grapes of Wrath”, “Irish Wolfhound”, “Franco-Prussian War”
*Klavans and Golbeck, 2010
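The slide does not spell out the heuristic, so the following is only a sketch of one plausible form of it: normalize each multiword tag and look it up in a set of known titles. The title list here is a tiny hypothetical stand-in for a real Wikipedia title list, and `normalize` and `match_multiword_tags` are names made up for this example.

```python
def normalize(text: str) -> str:
    """Lowercase, treat underscores as spaces, and collapse whitespace."""
    return " ".join(text.lower().replace("_", " ").split())

def match_multiword_tags(tags, titles):
    """Return (multiword tags, the subset that matches a known title)."""
    title_index = {normalize(t) for t in titles}
    multiword = [t for t in tags if len(normalize(t).split()) > 1]
    matched = [t for t in multiword if normalize(t) in title_index]
    return multiword, matched

# Hypothetical stand-in for a list of Wikipedia titles.
titles = ["Irish Wolfhound", "Franco-Prussian War", "New England"]

# Hypothetical tags; "haystack" is single-word and is ignored.
tags = ["irish wolfhound", "New  England", "blue sky",
        "franco-prussian war", "red barn", "haystack"]

multiword, matched = match_multiword_tags(tags, titles)
print(f"{len(matched)} of {len(multiword)} multiword tags matched a title")
```

Exact lookup after normalization is the simplest choice; it will miss variants such as a missing leading article, which a fuller heuristic would need to handle.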
WISH LIST - BETTER WAYS TO TAME THE PROLIFERATION OF RICH BUT “NOISY” CONTENT
• Clustering over tags for similarity
• Clustering over tags and terms from text
• Matching over existing terms to identify meaningful units
• Apply machine learning techniques to guess meaning
• Bigrams, Trigrams, Thesauri, Corpus Analysis
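As a rough illustration of the first wish-list item, tags can be grouped by surface similarity. The sketch below is an assumption-laden toy, not a method from the project: it compares tags by Jaccard overlap of their character bigrams and greedily assigns each tag to the first cluster whose seed tag is similar enough. The threshold of 0.5, the sample tags, and all function names are hypothetical.

```python
def char_bigrams(text: str) -> set:
    """Set of adjacent character pairs in the lowercased tag."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def cluster_tags(tags, threshold=0.5):
    """Greedy single pass: join the first cluster whose seed is similar enough."""
    clusters = []
    for tag in tags:
        grams = char_bigrams(tag)
        for cluster in clusters:
            if jaccard(grams, char_bigrams(cluster[0])) >= threshold:
                cluster.append(tag)
                break
        else:
            clusters.append([tag])  # no similar cluster found, start a new one
    return clusters

# Hypothetical tags for illustration.
tags = ["haystack", "haystacks", "hay stack", "farm", "farmhouse", "gauguin"]
clusters = cluster_tags(tags)
for cluster in clusters:
    print(cluster)
```

Surface similarity only catches spelling variants; grouping tags by meaning would need the thesauri and corpus analysis named in the last bullet.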
ACKNOWLEDGEMENTS
Steve.museum project members
T3 and steve.museum museum partners
University of Maryland, T3 group
IMA Museum ……
and other participants
THANK YOU!
Questions?