Page 1: Inducing Ontologies from Folksonomies using Natural Language Understanding

Inducing Ontologies from Folksonomies using Natural Language Understanding

Marta Tatu, Dan Moldovan

Lymba Corporation

Presenter: Chris Irwin Davis

Page 2: Inducing Ontologies from Folksonomies using Natural Language Understanding

Overview

LREC 2010 May 19th, 2010

NLP

Folksonomy

• typographical errors, spelling variations
• singular/plural forms, lower case
• space/punctuation used as delimiters
• same tag in different contexts
• tag synonymy

Ontology

• lexical normalization of tags
• semantic consistency
• tag-tag relations

• social annotations (author vs. user)
• browse/search bookmarks
• resource discovery (recommendations)
• collaborative tagging (across folksonomies)

folksonomy-based applications

reasoning applications

Page 3: Inducing Ontologies from Folksonomies using Natural Language Understanding

Semantic Approach

1. Folksonomy semantic representation

2. Tag understanding
o Lexical: language identification, tokenization and spelling corrections, capitalization restoration
o Syntactic: part-of-speech tagging, syntactic parsing
o Semantic: acronym understanding, word sense disambiguation, named entity recognition, semantic parsing

3. Deriving the ontological structure
o Semantic relations between tags

• Sources of information
o Tag text semantics
o Social bookmarking annotations
o Machine understanding of bookmark content


Page 4: Inducing Ontologies from Folksonomies using Natural Language Understanding

Representing Folksonomies

• knowledge → knowledge[NN]1
• advertisign → advertising[NN]1
• americanhistory → American[JJ]1 history[NN]2 (TOPIC)
• read-now → now[RB]3 read[VB]1 (TEMPORAL)
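The lemma[POS]sense notation on this slide can be read mechanically. The following is a minimal sketch, assuming each unit is written exactly as `lemma[POS]sense`; the names `TagToken` and `parse_tag_representation` are hypothetical and not part of the presented system.

```python
import re
from typing import List, NamedTuple


class TagToken(NamedTuple):
    """One analyzed word of a tag: lemma, part of speech, sense number."""
    lemma: str
    pos: str
    sense: int


# Matches one "lemma[POS]sense" unit, e.g. "history[NN]2".
_TOKEN = re.compile(r"(?P<lemma>[^\s\[\]]+)\[(?P<pos>[A-Z]+)\](?P<sense>\d+)")


def parse_tag_representation(text: str) -> List[TagToken]:
    """Split a representation string into its analyzed tokens."""
    return [
        TagToken(m.group("lemma"), m.group("pos"), int(m.group("sense")))
        for m in _TOKEN.finditer(text)
    ]


tokens = parse_tag_representation("American[JJ]1 history[NN]2")
```

The relation label attached to a multi-word tag (TOPIC, TEMPORAL) is not handled here; it would need to be stored alongside the token list.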

Page 5: Inducing Ontologies from Folksonomies using Natural Language Understanding

Representing Folksonomies


SYNONYMY cluster

knowledge

(axlape, www.wolframalpha.com/) (nicksoni, www.curatingthecity.org/map.jsp) (pilx, www.wolframalpha.com/) ...

knowledge|NN|1

knowledge,cognition

(bernsnarok, www.wolframalpha.com/) (_tarea_, academicearth.org/) (_tarea_, www.howstuffworks.com/) ...

(omnamoprabhu, www.goertzel.org/dynapsyc/dynacon.html) (MikeMolto, cvcl.mit.edu/) (latrippi, nymag.com/news/features/56793/) ...

cognition|NN|1

folksonomic tags

associated (user,document) pairs

WN synsetId = 20729
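One way to hold a SYNONYMY cluster in memory is sketched below: a synset id plus, for each raw folksonomic tag, the (user, document) pairs that used it. The class name `SynonymyCluster` and its methods are hypothetical, and the assignment of the sample pairs to particular tags is only illustrative, since the slide layout does not make it explicit.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SynonymyCluster:
    """Tags that share one WordNet synset, with their social annotations."""
    synset_id: int
    # Maps each raw tag to the set of (user, document) pairs that used it.
    annotations: dict = field(default_factory=lambda: defaultdict(set))

    def add(self, tag, user, document):
        self.annotations[tag].add((user, document))

    def documents(self):
        """All documents annotated with any tag of the cluster."""
        return {doc for pairs in self.annotations.values() for _, doc in pairs}


cluster = SynonymyCluster(synset_id=20729)
cluster.add("knowledge", "axlape", "www.wolframalpha.com/")
cluster.add("knowledge", "pilx", "www.wolframalpha.com/")
cluster.add("cognition", "MikeMolto", "cvcl.mit.edu/")  # illustrative assignment
```

Keeping the pairs per tag, rather than merging them, preserves the original annotations while still letting an application query the cluster as a whole.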


Page 6: Inducing Ontologies from Folksonomies using Natural Language Understanding

Representing Folksonomies

[Figure: fragment of the semantic representation as a graph.
Nodes: cognition|NN|1; knowledge|NN|1 · module|NN|1; faculty|NN|1 · organization|NN|1; organisation|NN|2 · pattern|NN|1; form|NN|3 · cognitive|JJ|1 · perception|NN|1 · design|NN|2 · calendar|NN|1 · PDA|NN|1 (Personal Digital Assistant) · organization|NN|1; governance|NN|1 · adaptive|JJ|1 design|NN|2 (PAH) · instructional|JJ|1 design|NN|2 (AGT).
Edge labels: ISA, PERTAIN, PW, SIM.]

Page 7: Inducing Ontologies from Folksonomies using Natural Language Understanding

System Architecture

[Figure: system architecture diagram. Components shown:]

• Social Annotations (user, document, tag)
• Social Tag-Tag & Tag-Doc Associations
• Document Cache
• NLP processing: Doc-2-Text; Language Identification; NLP of EN documents (Tokenization, Part-of-speech tagging, Sentence boundary detection, Named entity recognition, Syntactic parsing, Word sense disambiguation, Semantic parsing)
• Document NLP Repository
• Lexical Processing of Tags
• Syntactic Processing of Tags
• Semantic Processing of Tags
• Tag Classification Rules
• Semantic Representation of Tags
• Ontology generation (Tag-Tag relations)
• Induced Ontology
• Applications: Search, browse, visualize; Recommendations; Collaborative tagging

Page 8: Inducing Ontologies from Folksonomies using Natural Language Understanding

Tag Understanding

Sources used to understand tags

Tag text Social bookmarking data Document content

Lexical

Language identification X X X

Tokenization and Spell checking X X X

Capitalization restoration X X

Syntactic

Part-of-speech tagging X X

Syntactic parsing X

Semantic

Abbreviation and acronym expansion X X X

Word sense disambiguation (+ NER) X X X

Semantic parsing X


Page 9: Inducing Ontologies from Folksonomies using Natural Language Understanding

Acronym/Abbreviation Understanding

• Abbreviation dictionary: (abbreviation - expansion - domain of usage)

o 118,055 distinct abbreviations

o 137 domains: Law, Music, TV/Radio Stations, Countries, Airport, Domain Names, Chat, Emoticons, etc.

o 25% of the abbreviations have more than one definition

• (unambiguous) Zip codes – (76012 : Arlington, TX)

• (ambiguous) SS : 192 definitions in 66 domains

o Social Security – Business and US Government, Screen Saver – File Extensions, Stainless Steel – Housing and Products, Subtropical Storm – Meteorology, Style Sheet – Software

• Check whether the tag is in the abbreviation dictionary

• Use lexical chains to link document content to abbreviation domain

• Use co-occurring tags to identify correct expansion

• Use text alignment to find new abbreviation definitions within document content
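The slides do not say how the text alignment step works. As an illustration only, the sketch below uses a simple initial-letter heuristic (a swapped-in technique, not the authors' method): it looks for a parenthesized upper-case token and checks whether the initials of the words just before it spell the abbreviation. The function name `find_abbreviation_definitions` is hypothetical.

```python
import re


def find_abbreviation_definitions(text):
    """Return {abbreviation: expansion} for 'Long Form (LF)' patterns."""
    found = {}
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        abbr = match.group(1)
        # Words immediately before the parenthesis, one per abbreviation letter.
        preceding = re.findall(r"[A-Za-z]+", text[:match.start()])[-len(abbr):]
        initials_align = len(preceding) == len(abbr) and all(
            word[0].upper() == letter for word, letter in zip(preceding, abbr)
        )
        if initials_align:
            found[abbr] = " ".join(preceding)
    return found


sample = "Our Public Relations (PR) team published a Style Sheet (SS) guide."
definitions = find_abbreviation_definitions(sample)
```

A heuristic this simple misses expansions with stop words or multi-letter contributions per word; it is meant only to show where newly found definitions could feed back into the abbreviation dictionary.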


Page 10: Inducing Ontologies from Folksonomies using Natural Language Understanding

Acronym/Abbreviation Understanding

• “PR”: ~1,409 documents

• 87 definitions for PR

o Press Release, Public Relations, Puerto Rico, Page Rank, Public Radio, Permanent Resident/Residency, etc.

• http://prsarahevans.com/2009/06/do-you-have-a-strategy-for-online-comments

o “PR” = “public relations” (6 times in document content)

o Other tags of the bookmark: “public”, “relations”, “media”, “strategy”

• http://www.bbc.co.uk/pressoffice/pressreleases/category/new_media_index.shtml

o “PR” = “press releases” (in document content)

• http://escape.topuertorico.com

o “PR” = “Puerto Rico” (in document content)
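The “PR” examples suggest a scoring scheme along the following lines: count how often each candidate expansion appears in the document content and how many of its words also occur among the bookmark's other tags. This is a hedged sketch under that assumption; the function `choose_expansion` and the additive score are hypothetical, not the system's actual procedure.

```python
def choose_expansion(candidates, document_text, cooccurring_tags):
    """Pick the candidate expansion best supported by content and tags."""
    text = document_text.lower()
    tags = {tag.lower() for tag in cooccurring_tags}

    def score(expansion):
        phrase = expansion.lower()
        content_hits = text.count(phrase)  # occurrences in document content
        tag_hits = sum(1 for word in phrase.split() if word in tags)
        return content_hits + tag_hits

    best = max(candidates, key=score)
    # Return None when nothing in the content or tags supports any candidate.
    return best if score(best) > 0 else None


candidates = ["press release", "public relations", "Puerto Rico", "page rank"]
picked = choose_expansion(
    candidates,
    "Public relations teams need a strategy. Good public relations helps.",
    ["public", "relations", "media", "strategy"],
)
```

When neither the content nor the co-occurring tags mention any candidate, the function returns `None`, which corresponds to the error case where the tag cannot be identified within the document.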


Page 11: Inducing Ontologies from Folksonomies using Natural Language Understanding

Evaluation

• Experimental data
o ~150,000 (user, document, tag) triples from del.icio.us

• 8,460 tags; 83,827 documents; 58,198 users

• Main error source: tag cannot be identified within document
o Lack of document content (images, non-EN content, etc.)

• Errors propagate from initial processing steps to later ones
o Bad capitalization leads to bad named entity recognition


Page 12: Inducing Ontologies from Folksonomies using Natural Language Understanding

Ontological Tag-Tag Relations

• EQUALITY relations

o same lemma, part-of-speech, and sense number

o EQ(activity, activities), EQ(after-effects, AfterEffects), EQ(opinion, Opnion), etc.

• SYNONYMY clusters

o Same synset id

o SYN(OS, operating.system), SYN(LA, losangeles), SYN(nyt, nytimes)

• ISA relations between named entities and type tags

o ISA(OracleCorporation, organization), ISA(davidfosterwallace, person)

• WordNet relations between tags

o ISA(vegan, vegetarian), ANTONYMY(peace, war), PART_WHOLE(Businesses, markets), ENTAIL(proofreading, +read), SIMILARITY(important, general), DOMAIN(light, physics)
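The EQUALITY and SYNONYMY definitions above amount to grouping tags by a key. A minimal sketch, assuming tag understanding has already produced a (lemma, part-of-speech, sense number, synset id) analysis per tag; the analyses and synset ids below are made-up placeholders, and `cluster_by` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical output of tag understanding:
# raw tag -> (lemma, POS, sense number, synset id). Synset ids are placeholders.
analyzed = {
    "activity": ("activity", "NN", 1, 101),
    "activities": ("activity", "NN", 1, 101),
    "car": ("car", "NN", 1, 202),
    "automobiles": ("automobile", "NN", 1, 202),
}


def cluster_by(analyses, key):
    """Group raw tags by a key computed from their analysis."""
    groups = defaultdict(set)
    for tag, analysis in analyses.items():
        groups[key(analysis)].add(tag)
    return dict(groups)


# EQUALITY: same lemma, part of speech, and sense number.
eq_clusters = cluster_by(analyzed, key=lambda a: a[:3])
# SYNONYMY: same synset id.
syn_clusters = cluster_by(analyzed, key=lambda a: a[3])
```

Because every EQ cluster shares a synset, each SYN cluster is a union of EQ clusters, which is why the SYN grouping is coarser.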


Page 13: Inducing Ontologies from Folksonomies using Natural Language Understanding

Ontological Tag-Tag Relations

• Lexical chains of size 2 and Semantic calculus

– tag1 rel1 synset rel2 tag2

• rel1 & rel2 ⇒ rel3

• rel3(tag1, tag2) is added to the ontology

– ISA(integration, events,) from ISA(integration, group_action/NN/1) and ISA(group_action/NN/1, events,)

– PART_WHOLE(lobby, hotels) from PART_WHOLE(lobby, building/NN/1) and ISA(building/NN/1, hotels)

• ISA relations between “modifier head” and “head” tags

– ISA(book-cover, covers)

– ISA(theoryofmind, theory)

– ISA(photoshoptutorials, tutorials,)
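The lexical-chain step (tag1 rel1 synset rel2 tag2, with rel1 and rel2 combining into rel3) can be sketched as a table lookup. The composition table below contains only the two combinations that appear in the slide examples; the full set of semantic calculus rules is not given in the slides, and the names `COMPOSE` and `derive_relations` are hypothetical.

```python
# (rel1, rel2) -> rel3. Only the combinations shown in the slide examples.
COMPOSE = {
    ("ISA", "ISA"): "ISA",
    ("PART_WHOLE", "ISA"): "PART_WHOLE",
}


def derive_relations(tag_to_synset, synset_to_tag):
    """Compose chains of size 2 that pass through a shared synset.

    Both arguments are lists of (source, relation, target) triples.
    Returns a set of (rel3, tag1, tag2) triples.
    """
    derived = set()
    for tag1, rel1, synset in tag_to_synset:
        for middle, rel2, tag2 in synset_to_tag:
            rel3 = COMPOSE.get((rel1, rel2))
            if middle == synset and rel3 is not None and tag1 != tag2:
                derived.add((rel3, tag1, tag2))
    return derived


new_relations = derive_relations(
    [("integration", "ISA", "group_action/NN/1"),
     ("lobby", "PART_WHOLE", "building/NN/1")],
    [("group_action/NN/1", "ISA", "events"),
     ("building/NN/1", "ISA", "hotels")],
)
```

Pairs of relations with no entry in the table produce nothing, so the ontology only grows through combinations that have an explicit rule.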


Page 14: Inducing Ontologies from Folksonomies using Natural Language Understanding

Ontological Tag-Tag Relations

• Relations between “modifier_i head_i” tags (i = 1, 2)

– ISA(build-solar-panel, create-solar-panel)

– SIMILARITY(socialnetworks, socialweb)

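[Figure: rule diagram for relating two “modifier head” tags. Three alternative premises joined by OR, each pairing a relation between modifier1 and modifier2 with a relation between head1 and head2: (ISA & ISA), (ISA & SYN), (SYN & ISA). The diagram also shows a REL link between modifier2 and head2 (twice) and the conclusion ⇒ ISA.]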

Page 15: Inducing Ontologies from Folksonomies using Natural Language Understanding

Evaluation

• 9,820 EQ clusters for the 8,460 unique tags
o Same abbreviation expanded to different definitions
o EQ: tutorial, tutorials, tutorials,

• 8,801 SYN clusters
o Largest cluster (133 bookmarks): car, automobiles, auto, autos, cars, automobile

• 17% of tags placed into incorrect SYN cluster
o Errors caused by imperfect word sense disambiguation

• 5,439 ontological tag-tag relations
o 3,869 ISA, 601 SIMILARITY, 429 PART_WHOLE, etc.
o 1,778 relations derived using WordNet’s lexical chains and Lymba’s semantic calculus rules


Page 16: Inducing Ontologies from Folksonomies using Natural Language Understanding

Folksonomic Ontology


• Portion of ontology generated from experimental folksonomy

Page 17: Inducing Ontologies from Folksonomies using Natural Language Understanding

Folksonomic Ontology


• Portion of ontology generated from experimental folksonomy

Page 18: Inducing Ontologies from Folksonomies using Natural Language Understanding

Thank you!

For questions: email [email protected]