Semantics-Based News Recommendation. International Conference on Web Intelligence, Mining, and Semantics (WIMS 2012), June 14, 2012. Michel Capelle, Marnix Moerland, Flavius Frasincar, Frederik Hogenboom. Erasmus University Rotterdam, PO Box 1738, NL-3000 DR Rotterdam, the Netherlands.


Page 1: Semantics-Based News Recommendation

International Conference on Web Intelligence, Mining, and Semantics (WIMS 2012)

June 14, 2012

Michel Capelle, Marnix Moerland, Flavius Frasincar, Frederik Hogenboom

Erasmus University Rotterdam
PO Box 1738, NL-3000 DR Rotterdam, the Netherlands

Page 2: Introduction (1)

• Recommender systems help users plough through a massive and increasing amount of information
• Recommender systems:
  – Content-based
  – Collaborative filtering
  – Hybrid
• Content-based systems are often term-based
• Common measure: Term Frequency – Inverse Document Frequency (TF-IDF) as proposed by Salton and Buckley [1988]

Page 3: Introduction (2)

• One could take semantics into account:
  – Semantic Similarity (SS) recommenders:
    • Jiang & Conrath [1997]
    • Leacock & Chodorow [1998]
    • Lin [1998]
    • Resnik [1995]
    • Wu & Palmer [1994]
  – Concepts instead of terms → Concept Frequency – Inverse Document Frequency (CF-IDF):
    • Reduces noise caused by non-meaningful terms
    • Yields fewer terms to evaluate
    • Allows for semantic features, e.g., synonyms
    • Relies on a domain ontology
    • Published at WIMS 2011

Page 4: Introduction (3)

• One could take semantics into account:
  – Synsets instead of concepts → Synset Frequency – Inverse Document Frequency (SF-IDF):
    • Similar to CF-IDF
    • Does not rely on a domain ontology
• Implemented in Ceryx, a plug-in for Hermes [Frasincar et al., 2009], a news processing framework
• What is the performance of semantic recommenders?
  – TF-IDF vs. SF-IDF
  – TF-IDF vs. SS

Page 5: Framework: User Profile

• User profile consists of all read news items
• Implicit preference for specific topics

Page 6: Framework: Preprocessing

• Before recommendations can be made, each news item is parsed:
  – Tokenizer
  – Sentence splitter
  – Lemmatizer
  – Part-of-Speech tagger

Page 7: Framework: Synsets

• We make use of the WordNet dictionary and Word Sense Disambiguation (WSD)
• Each word has a set of senses, and each sense has a set of semantically equivalent synonyms (a synset):
  – Turkey:
    • turkey, Meleagris gallopavo (animal)
    • Turkey, Republic of Turkey (country)
    • joker, turkey (annoying person)
    • turkey, bomb, dud (failure)
  – Fly:
    • fly, aviate, pilot (operate airplane)
    • flee, fly, take flight (run away)
• Synsets are linked using semantic pointers
  – Hypernym, hyponym, …
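The WSD step can be illustrated with a simplified Lesk heuristic: pick the sense whose gloss shares the most words with the surrounding context. This is only a sketch. The sense inventory `SENSES`, its glosses, and the function name `simplified_lesk` are hypothetical stand-ins, not WordNet data and not necessarily the Lesk variant used in Ceryx.

```python
# Hypothetical mini sense inventory; the glosses are written for this
# illustration and are NOT the WordNet glosses.
SENSES = {
    "turkey": {
        "turkey (animal)": "large bird raised for food and meat",
        "Turkey (country)": "republic country in asia minor and europe capital ankara",
        "turkey (failure)": "event or film that fails badly a flop",
    }
}


def simplified_lesk(word, context):
    """Return the sense of `word` whose gloss overlaps most with `context`.

    context: list of words surrounding the ambiguous word.
    Ties are resolved in favour of the sense listed first.
    """
    context_words = {w.lower() for w in context}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context_words & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


country = simplified_lesk("turkey", "The republic joined talks with Europe on trade".split())
animal = simplified_lesk("turkey", "We roasted a large bird for dinner".split())
```

With these toy glosses the first context resolves to the country sense and the second to the animal sense. A real system would use the WordNet glosses and typically also the glosses of related synsets.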

Page 8: Framework: TF-IDF

• Term Frequency: the occurrence of a term $t_i$ in a document $d_j$, i.e.,

$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$

• Inverse Document Frequency: the occurrence of a term $t_i$ in a set of documents $D$, i.e.,

$$idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$$

• And hence

$$\text{tf-idf}_{i,j} = tf_{i,j} \times idf_i$$
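A minimal sketch of the TF-IDF weighting defined above, assuming documents are already tokenized and lemmatized and that the logarithm is the natural logarithm (the slide does not state the base). The function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: tf-idf weight} dict per document.

    documents: list of token lists.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = sum(counts.values())  # sum over k of n_{k,j}
        weights.append({
            term: (n / total) * math.log(n_docs / doc_freq[term])
            for term, n in counts.items()
        })
    return weights


# Hypothetical toy corpus, for illustration only.
docs = [
    ["microsoft", "release", "windows", "windows"],
    ["google", "release", "phone"],
    ["turkey", "fly", "south"],
]
weights = tf_idf(docs)
```

In the first document, "windows" occurs twice and appears in no other document, so it gets a higher weight than "release", which also appears in the second document.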

Page 9: Framework: SF-IDF

• Synset Frequency: the occurrence of a synset $s_i$ in a document $d_j$, i.e.,

$$sf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$

• Inverse Document Frequency: the occurrence of a synset $s_i$ in a set of documents $D$, i.e.,

$$idf_i = \log \frac{|D|}{|\{j : s_i \in d_j\}|}$$

• And hence

$$\text{sf-idf}_{i,j} = sf_{i,j} \times idf_i$$
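SF-IDF applies the same weighting to synsets instead of terms, so synonyms that resolve to the same synset are counted together. The sketch below assumes WSD has already been performed. The `WORD_TO_SYNSET` lookup and its labels are hypothetical placeholders for that step, not WordNet identifiers, and the natural logarithm is an assumption.

```python
import math
from collections import Counter

# Hypothetical result of word sense disambiguation: each word is already
# resolved to one synset label (made-up labels, not WordNet ids).
WORD_TO_SYNSET = {
    "fly": "fly.operate_airplane",
    "aviate": "fly.operate_airplane",
    "pilot": "fly.operate_airplane",
    "plane": "airplane",
    "turkey": "turkey.country",
}


def sf_idf(synset_docs):
    """Return one {synset: sf-idf weight} dict per document."""
    n_docs = len(synset_docs)
    doc_freq = Counter()
    for synsets in synset_docs:
        doc_freq.update(set(synsets))

    weights = []
    for synsets in synset_docs:
        counts = Counter(synsets)
        total = sum(counts.values())
        weights.append({
            s: (n / total) * math.log(n_docs / doc_freq[s])
            for s, n in counts.items()
        })
    return weights


word_docs = [["fly", "plane", "pilot"], ["aviate", "turkey"], ["turkey", "turkey"]]
synset_docs = [[WORD_TO_SYNSET[w] for w in doc] for doc in word_docs]
sf_weights = sf_idf(synset_docs)
```

Here "fly", "pilot", and "aviate" all map to one synset, so the first two documents share a feature that a purely term-based representation would treat as three unrelated terms.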

Page 10: Framework: SS (1)

• TF-IDF and SF-IDF use cosine similarity:
  – Two vectors:
    • User profile item scores
    • News message item scores
  – Measures the cosine of the angle between the vectors
• Semantic Similarity (SS):
  – Two vectors:
    • User profile synsets
    • News message synsets
  – Jiang & Conrath [1997], Resnik [1995], and Lin [1998]: information content of synsets
  – Leacock & Chodorow [1998] and Wu & Palmer [1994]: path length between synsets
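The cosine similarity between a user profile vector and a news item vector can be sketched as follows, assuming both are stored as sparse dictionaries mapping a term or synset to its TF-IDF or SF-IDF score. The function name and the example values are hypothetical.

```python
import math


def cosine_similarity(profile, item):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    shared = set(profile) & set(item)
    dot = sum(profile[k] * item[k] for k in shared)
    norm_profile = math.sqrt(sum(v * v for v in profile.values()))
    norm_item = math.sqrt(sum(v * v for v in item.values()))
    if norm_profile == 0 or norm_item == 0:
        return 0.0  # an empty vector has no direction; treat as dissimilar
    return dot / (norm_profile * norm_item)


# Hypothetical scores, for illustration only.
profile = {"microsoft": 0.5, "windows": 0.3, "release": 0.1}
news_item = {"microsoft": 0.4, "xbox": 0.6}
score = cosine_similarity(profile, news_item)
```

Only dimensions present in both vectors contribute to the dot product, so a news item that shares no terms or synsets with the profile scores 0.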

Page 11: Framework: SS (2)

• SS score is calculated by computing the pair-wise similarities between synsets in the unread document u and the user profile r:

$$rank(u) = \frac{\sum_{(u,r) \in W} sim(u,r)}{|W|}$$

where W is a vector with all combinations of synsets from r and u that have a common Part-of-Speech, and where sim(u,r) is any of the mentioned SS measures.
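A sketch of the rank computation above, using Wu & Palmer similarity as the example measure. The hypernym tree `HYPERNYM`, the synset labels, and the convention that the root has depth 1 are assumptions made for this illustration. A real implementation would walk the WordNet hypernym hierarchy instead.

```python
from itertools import product

# Hypothetical hypernym tree (child -> parent); labels are made up.
HYPERNYM = {
    "entity": None,
    "organization": "entity",
    "company": "organization",
    "software_company": "company",
    "hardware_company": "company",
    "animal": "entity",
    "bird": "animal",
}


def path_to_root(synset):
    """Return [synset, parent, ..., root]."""
    path = []
    while synset is not None:
        path.append(synset)
        synset = HYPERNYM[synset]
    return path


def depth(synset):
    """Depth in the tree, with the root at depth 1 (an assumed convention)."""
    return len(path_to_root(synset))


def wu_palmer(a, b):
    """Wu & Palmer: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # First ancestor of b (walking upwards) shared with a: the deepest common one.
    lcs = next(s for s in path_to_root(b) if s in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def rank(unread, profile, sim):
    """Average sim over all synset pairs (u, r) that share a part of speech.

    unread, profile: lists of (synset, part_of_speech) tuples.
    """
    pairs = [
        (u, r)
        for (u, pos_u), (r, pos_r) in product(unread, profile)
        if pos_u == pos_r
    ]
    if not pairs:
        return 0.0
    return sum(sim(u, r) for u, r in pairs) / len(pairs)


profile_synsets = [("software_company", "n"), ("company", "n")]
unread_synsets = [("hardware_company", "n"), ("bird", "n"), ("run", "v")]
ss_score = rank(unread_synsets, profile_synsets, wu_palmer)
```

The verb in the unread item forms no pair because the profile contains only nouns, so W holds the four noun-noun combinations. Any of the other SS measures could be passed as `sim` in place of `wu_palmer`.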

Page 12: Implementation: Hermes

• The Hermes framework is used to build a news personalization service for RSS
• Its implementation is the Hermes News Portal (HNP):
  – Programmed in Java
  – Uses OWL / SPARQL / Jena / GATE / WordNet

Page 13: Implementation: Ceryx

• Ceryx is a plug-in for HNP
• Uses WordNet / Stanford POS Tagger / JAWS lemmatizer / Lesk WSD
• Main focus is on recommendation support
• User profiles are constructed
• Computes TF-IDF, SF-IDF, and SS

Page 14: Evaluation (1)

• Experiment:
  – 19 participants evaluated 100 news items
  – User profile: all articles that are related to Microsoft, its products, and its competitors
  – Ceryx computes TF-IDF, SF-IDF, and SS with a cut-off of 0.5
  – Measurements:
    • Accuracy
    • Precision
    • Recall
    • Specificity
    • F1-measure
    • t-tests for determining significance
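The measurements listed above can be derived from a confusion matrix once each similarity score is turned into a recommend / do-not-recommend decision. The sketch below assumes an item is recommended when its score is at least the cut-off (whether the comparison is inclusive is an assumption) and uses hypothetical scores and user judgements.

```python
def evaluate(scores, relevant, cutoff=0.5):
    """Compute classification measures for one user.

    scores: {item: similarity score}; relevant: set of items the user
    judged interesting. Items scoring >= cutoff count as recommended.
    """
    tp = fp = tn = fn = 0
    for item, score in scores.items():
        recommended = score >= cutoff
        interesting = item in relevant
        if recommended and interesting:
            tp += 1
        elif recommended:
            fp += 1
        elif interesting:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical scores and judgements, for illustration only.
scores = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.2, "e": 0.6}
relevant = {"a", "c", "e"}
metrics = evaluate(scores, relevant)
```

In this toy example items a, b, and e are recommended, of which a and e are relevant, giving two true positives, one false positive, one false negative, and one true negative.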

Page 15: Evaluation (2)

• Results:
  – SF-IDF significantly outperforms TF-IDF
  – Almost all SS methods significantly outperform TF-IDF

Measure      TF-IDF  SF-IDF  J&C    L&C    L      R      W&P
Accuracy     78.2%   80.1%   78.3%  59.5%  38.1%  74.5%  58.5%
Precision    77.4%   77.8%   64.2%  33.7%  19.9%  56.4%  35.3%
Recall       22.0%   35.9%   29.3%  63.5%  49.7%  40.0%  73.6%
Specificity  97.2%   94.7%   94.6%  57.9%  34.0%  86.3%  52.6%
F1-measure   32.0%   46.8%   38.4%  43.2%  27.7%  42.8%  47.1%

J&C = Jiang & Conrath, L&C = Leacock & Chodorow, L = Lin, R = Resnik, W&P = Wu & Palmer

Page 16: Conclusions

• Recommendation is commonly performed using TF-IDF
• Semantics can be taken into account by using synsets:
  – SF-IDF
  – SS
• Semantics-based recommendation outperforms the classic term-based recommendation
• Future work:
  – Also employ the similarity of words missing from WordNet (e.g., named entities), for example based on the Google Distance
  – Compare CF-IDF, SF-IDF, and SS with LDA (Latent Dirichlet Allocation) and ESA (Explicit Semantic Analysis)

Page 17: Questions