Semantics-Based News Recommendation with SF-IDF+
International Conference on Web Intelligence, Mining, and Semantics (WIMS 2013)
June 13, 2013
Marnix
Michel
Frederik
Flavius
Erasmus University Rotterdam, PO Box 1738, NL-3000 DR Rotterdam, the Netherlands
Introduction (1)
• Recommender systems help users to plough through a massive and increasing amount of information
• Recommender systems:
  – Content-based
  – Collaborative filtering
  – Hybrid
• Content-based systems are often term-based
• Common measure: Term Frequency – Inverse Document Frequency (TF-IDF) as proposed by Salton and Buckley [1988]
Introduction (2)
• One could take semantics into account:
  – Semantic Similarity (SS) recommenders:
    • Jiang & Conrath [1997]
    • Leacock & Chodorow [1998]
    • Lin [1998]
    • Resnik [1995]
    • Wu & Palmer [1994]
  – Concepts instead of terms → Concept Frequency – Inverse Document Frequency (CF-IDF):
    • Reduces noise caused by non-meaningful terms
    • Yields fewer terms to evaluate
    • Allows for semantic features, e.g., synonyms
    • Relies on a domain ontology
    • Published at WIMS 2011
Introduction (3)
• One could take semantics into account:
  – Synsets instead of concepts → Synset Frequency – Inverse Document Frequency (SF-IDF):
    • Similar to CF-IDF
    • Does not rely on a domain ontology
    • Published at WIMS 2012
  – Research has shown that relationships like synonymy, hyponymy, … provide structure and contribute to improved interpretability
  – Hence, we coin SF-IDF+, which additionally accounts for semantic relationships between synsets
Introduction (4)
• We implement the recommenders in Ceryx, a plug-in for Hermes [Frasincar et al., 2009], a news processing framework
• How do semantic recommenders perform?
  – SF-IDF+ vs. SF-IDF
  – SF-IDF+ vs. TF-IDF
  – SF-IDF+ vs. SS
Framework: User Profile
• The user profile consists of all news items the user has read
• The read items implicitly express a preference for specific topics
Framework: Preprocessing
• Before recommendations can be made, each news item is parsed by a:
  – Tokenizer
  – Sentence splitter
  – Lemmatizer
  – Part-of-Speech tagger
Framework: Synsets
• We make use of the WordNet dictionary and word sense disambiguation (WSD)
• Each word has a set of senses, and each sense corresponds to a set of semantically equivalent synonyms (a synset):
  – Turkey:
    • turkey, Meleagris gallopavo (animal)
    • Turkey, Republic of Turkey (country)
    • joker, turkey (annoying person)
    • turkey, bomb, dud (failure)
  – Fly:
    • fly, aviate, pilot (operate airplane)
    • flee, fly, take flight (run away)
• Synsets are linked using semantic pointers:
  – Hypernym, hyponym, …
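Choosing among these senses is the job of the WSD step, and Ceryx uses Lesk WSD for it. The sketch below illustrates the idea with a simplified Lesk variant: it picks the sense whose gloss shares the most content words with the surrounding text. The glosses, the stop-word list, and the function names are hypothetical stand-ins, not WordNet data and not Ceryx code.

```python
import re

# Hypothetical glosses for the four "turkey" senses on this slide; a real
# system would read the glosses from WordNet instead.
TURKEY_SENSES = {
    "turkey, Meleagris gallopavo (animal)": "large bird domesticated for its meat",
    "Turkey, Republic of Turkey (country)": "republic in western Asia and southeastern Europe with capital Ankara",
    "joker, turkey (annoying person)": "a person who does something thoughtless or annoying",
    "turkey, bomb, dud (failure)": "an event that fails badly or is totally ineffectual",
}

# Small illustrative stop-word list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "and", "the", "in", "of", "for", "its", "or", "is", "with", "that", "who", "to"}


def content_words(text: str) -> set[str]:
    """Lower-case the text, split it into alphabetic tokens, and drop stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context: str, senses: dict[str, str]) -> str:
    """Return the sense whose gloss shares the most content words with the context."""
    ctx = content_words(context)
    # max() keeps the first sense in dictionary order when overlaps tie.
    return max(senses, key=lambda sense: len(ctx & content_words(senses[sense])))


print(simplified_lesk("The parliament of Turkey met in Ankara", TURKEY_SENSES))
# → Turkey, Republic of Turkey (country)
print(simplified_lesk("We roasted a turkey, a big bird, for dinner", TURKEY_SENSES))
# → turkey, Meleagris gallopavo (animal)
```

The original Lesk algorithm compares the glosses of neighbouring words with each other; comparing a gloss directly with the context, as here, is the common simplified form.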
Framework: TF-IDF
• Term Frequency: the number of occurrences n_{i,j} of a term t_i in a document d_j, normalized by the total number of terms in d_j, i.e.,
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
• Inverse Document Frequency: the logarithm of the number of documents in the set D divided by the number of documents that contain t_i, i.e.,
$$idf_i = \log \frac{|D|}{|\{d_j : t_i \in d_j\}|}$$
• And hence
$$tf\text{-}idf_{i,j} = tf_{i,j} \times idf_i$$
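A minimal Python sketch of the TF-IDF definitions on this slide, assuming each document is already tokenized and lemmatized into a list of terms. The function names and the toy corpus are illustrative, and the natural logarithm is an assumption since the slide does not fix a base.

```python
import math
from collections import Counter


def tf(doc: list[str]) -> dict[str, float]:
    """tf_{i,j} = n_{i,j} / sum_k n_{k,j}: term counts normalized by document length."""
    counts = Counter(doc)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def idf(docs: list[list[str]]) -> dict[str, float]:
    """idf_i = log(|D| / |{d_j : t_i in d_j}|) over the whole collection D."""
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(len(docs) / n) for term, n in df.items()}


def tf_idf(doc: list[str], idf_table: dict[str, float]) -> dict[str, float]:
    """tf-idf_{i,j} = tf_{i,j} * idf_i."""
    return {term: w * idf_table[term] for term, w in tf(doc).items()}


docs = [
    ["pilot", "fly", "turkey"],
    ["turkey", "election", "vote"],
    ["aviate", "airport"],
]
table = idf(docs)
print(tf_idf(docs[0], table))
```

In this toy corpus the synonyms "fly" and "aviate" are unrelated terms, so neither contributes to the other's weight; counting synsets instead of terms is meant to remove exactly this kind of mismatch.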
Framework: SF-IDF
• Synset Frequency: the number of occurrences n_{i,j} of a synset s_i in a document d_j, normalized by the total number of synsets in d_j, i.e.,
$$sf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
• Inverse Document Frequency: the logarithm of the number of documents in the set D divided by the number of documents that contain s_i, i.e.,
$$idf_i = \log \frac{|D|}{|\{d_j : s_i \in d_j\}|}$$
• And hence
$$sf\text{-}idf_{i,j} = sf_{i,j} \times idf_i$$
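Compared with TF-IDF, only the unit being counted changes. The sketch below assumes WSD has already produced (lemma, sense) pairs and maps them to synset keys through a hypothetical lookup table modelled on the Fly and Turkey examples; real WordNet synset identifiers would take the place of these made-up keys.

```python
import math
from collections import Counter

# Hypothetical synset keys modelled on the slide's examples; a real system
# would use WordNet synset identifiers chosen by the WSD step.
SYNSET_OF = {
    ("fly", "operate airplane"): "fly|aviate|pilot",
    ("aviate", "operate airplane"): "fly|aviate|pilot",
    ("fly", "run away"): "flee|fly|take flight",
    ("flee", "run away"): "flee|fly|take flight",
    ("turkey", "country"): "Turkey|Republic of Turkey",
    ("turkey", "animal"): "turkey|Meleagris gallopavo",
}


def to_synsets(tagged_doc: list[tuple[str, str]]) -> list[str]:
    """Map already-disambiguated (lemma, sense) pairs to synset keys."""
    return [SYNSET_OF[pair] for pair in tagged_doc]


def sf(synsets: list[str]) -> dict[str, float]:
    """sf_{i,j} = n_{i,j} / sum_k n_{k,j}."""
    counts = Counter(synsets)
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()}


def idf(synset_docs: list[list[str]]) -> dict[str, float]:
    """idf_i = log(|D| / |{d_j : s_i in d_j}|)."""
    df = Counter(s for doc in synset_docs for s in set(doc))
    return {s: math.log(len(synset_docs) / n) for s, n in df.items()}


def sf_idf(synsets: list[str], idf_table: dict[str, float]) -> dict[str, float]:
    """sf-idf_{i,j} = sf_{i,j} * idf_i."""
    return {s: w * idf_table[s] for s, w in sf(synsets).items()}


tagged = [
    [("fly", "operate airplane"), ("turkey", "country")],
    [("aviate", "operate airplane")],
    [("flee", "run away"), ("turkey", "animal")],
]
synset_docs = [to_synsets(doc) for doc in tagged]
table = idf(synset_docs)
print(sf_idf(synset_docs[0], table))
```

Here "fly" and "aviate" land in the same synset and share one document frequency, while the two senses of "turkey" are kept apart.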
Framework: SF-IDF+
• Synset Frequency: the occurrence of a synset s_i and its related synsets r_i in a document d_j, normalized by the total number of synsets in d_j, i.e.,
$$sf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
• Inverse Document Frequency: the occurrence of synsets s_i and r_i in the set of documents D, i.e.,
$$idf_i = \log \frac{|D|}{|\{d_j : s_i, r_i \in d_j\}|}$$
• Weighting is applied depending on the relation r, and hence
$$sf\text{-}idf_{i,j,r} = sf_{i,j} \times idf_i \times w_r$$
where w_r is the weight assigned to relation r.
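One possible reading of the SF-IDF+ formulas, sketched in Python under explicit assumptions: the relation graph RELATED and the weights WEIGHTS are hypothetical placeholders (the slide only says that weighting depends on the relation), each related synset receives one occurrence per occurrence of its source synset, and a synset reached through several relations keeps its highest weight.

```python
import math
from collections import Counter

# Hypothetical relation graph: synset -> [(relation type, related synset)].
RELATED = {
    "car": [("hypernym", "motor_vehicle"), ("hyponym", "cab")],
    "cab": [("hypernym", "car")],
}

# Hypothetical weights w_r per relation type; placeholders, not tuned values.
WEIGHTS = {"self": 1.0, "hypernym": 0.6, "hyponym": 0.4}


def expand(doc: list[str]) -> tuple[Counter, dict[str, float]]:
    """Count each synset s_i and its related synsets r_i, and record a weight for each.

    Assumptions: a related synset gets one occurrence per occurrence of its source,
    and a synset reached through several relations keeps its highest weight.
    """
    counts: Counter = Counter()
    weights: dict[str, float] = {}
    for s in doc:
        counts[s] += 1
        weights[s] = max(weights.get(s, 0.0), WEIGHTS["self"])
        for relation, r in RELATED.get(s, []):
            counts[r] += 1
            weights[r] = max(weights.get(r, 0.0), WEIGHTS[relation])
    return counts, weights


def idf(expanded_docs: list[tuple[Counter, dict[str, float]]]) -> dict[str, float]:
    """idf_i = log(|D| / |{d_j : s_i, r_i in d_j}|), counted on the expanded documents."""
    df = Counter(s for counts, _ in expanded_docs for s in counts)
    return {s: math.log(len(expanded_docs) / n) for s, n in df.items()}


def sf_idf_plus(expanded_doc: tuple[Counter, dict[str, float]],
                idf_table: dict[str, float]) -> dict[str, float]:
    """sf-idf_{i,j,r} = sf_{i,j} * idf_i * w_r."""
    counts, weights = expanded_doc
    total = sum(counts.values())
    return {s: (n / total) * idf_table[s] * weights[s] for s, n in counts.items()}


docs = [["car"], ["cab", "driver"], ["bicycle"]]
expanded = [expand(d) for d in docs]
table = idf(expanded)
print(sf_idf_plus(expanded[0], table))
```

In the toy corpus the documents about "car" and "cab" share no synsets under plain SF-IDF; after expansion both contain car and cab, so a cosine comparison between them is no longer zero.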
Framework: SS (1)
• TF-IDF and SF-IDF(+) use cosine similarity:
  – Two vectors:
    • User profile item scores
    • News message item scores
  – Measures the cosine of the angle between the vectors
• Semantic Similarity (SS):
  – Two vectors:
    • User profile synsets
    • News message synsets
  – Jiang & Conrath [1997], Resnik [1995], and Lin [1998]: information content of synsets
  – Leacock & Chodorow [1998] and Wu & Palmer [1994]: path length between synsets
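A minimal sketch of the cosine similarity step used by TF-IDF and SF-IDF(+), with sparse score vectors stored as dictionaries. How the profile vector is aggregated from the read items is not stated on the slide; summing the item vectors, as the hypothetical profile_vector helper does, is an assumption.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors stored as {feature: score}."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def profile_vector(read_items: list[dict[str, float]]) -> dict[str, float]:
    """Hypothetical profile aggregation: sum the score vectors of all read items."""
    profile: dict[str, float] = {}
    for item in read_items:
        for feature, score in item.items():
            profile[feature] = profile.get(feature, 0.0) + score
    return profile


profile = profile_vector([{"turkey": 0.4, "election": 0.3}, {"election": 0.5, "vote": 0.2}])
news = {"election": 0.6, "fly": 0.1}
print(round(cosine(profile, news), 3))
```

Because cosine similarity ignores vector length, a long profile built from many read items is not automatically more similar to every news item than a short one.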
Framework: SS (2)
• The SS score is calculated by computing the pair-wise similarities between synsets in the unread document u and the user profile r:
$$rank(u) = \frac{\sum_{(u,r) \in W} sim(u,r)}{|W|}$$
where W is a vector with all combinations of synsets from r and u that have a common Part-of-Speech, and where sim(u,r) is any of the mentioned SS measures.
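The SS ranking formula can be sketched as follows. Synsets are represented as hypothetical (label, part-of-speech) pairs, and sim is any pair-wise measure supplied by the caller; the toy lookup table only stands in for a real measure such as Wu & Palmer or Lin, which would need WordNet data.

```python
from itertools import product
from typing import Callable

# Hypothetical synset representation: (synset label, part of speech).
Synset = tuple[str, str]


def ss_rank(unread: list[Synset], profile: list[Synset],
            sim: Callable[[str, str], float]) -> float:
    """rank(u) = sum of sim(u, r) over W divided by |W|.

    W holds every (unread synset, profile synset) pair whose parts of speech match.
    """
    pairs = [(a, b) for (a, pos_a), (b, pos_b) in product(unread, profile) if pos_a == pos_b]
    if not pairs:
        return 0.0  # assumption: no comparable pairs means no evidence of similarity
    return sum(sim(a, b) for a, b in pairs) / len(pairs)


# Toy similarity values standing in for a real SS measure.
TOY_SIM = {("car", "cab"): 0.9, ("drive", "ride"): 0.5}


def toy_sim(a: str, b: str) -> float:
    """Hypothetical measure: 1.0 for identical synsets, else a table lookup (default 0)."""
    return 1.0 if a == b else TOY_SIM.get((a, b), 0.0)


unread = [("car", "n"), ("drive", "v")]
profile = [("cab", "n"), ("ride", "v"), ("fast", "a")]
print(ss_rank(unread, profile, toy_sim))
```

Only the noun pair (car, cab) and the verb pair (drive, ride) enter W here; "fast" has no partner with the same Part-of-Speech and is ignored.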
Implementation: Hermes
• The Hermes framework is used to build a news personalization service for RSS feeds
• Its implementation is the Hermes News Portal (HNP):
  – Programmed in Java
  – Uses OWL / SPARQL / Jena / GATE / WordNet
Implementation: Ceryx
• Ceryx is a plug-in for HNP
• Uses WordNet / Stanford POS Tagger / JAWS lemmatizer / Lesk WSD
• Main focus is on recommendation support
• Constructs user profiles from the read news items
• Computes TF-IDF, SF-IDF, SF-IDF+, and SS
Evaluation (1)
• Experiment:
  – We let 19 participants evaluate 100 news items
  – We use 8 different user profiles focusing on various topics
  – Ceryx computes TF-IDF, SF-IDF, SF-IDF+, and SS for various cut-off values
  – F1 scores are evaluated
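The slide does not spell out how a cut-off is applied. A common reading, assumed here, is that every news item whose similarity score reaches the cut-off is recommended, and the recommendations are compared with the participants' relevance judgments. The scores and judgments below are made up for illustration.

```python
def f1_at_cutoff(scores: dict[str, float], relevant: set[str], cutoff: float) -> float:
    """F1 when every item scoring at least `cutoff` is recommended (assumed protocol)."""
    recommended = {item for item, score in scores.items() if score >= cutoff}
    true_positives = len(recommended & relevant)
    if true_positives == 0:
        return 0.0  # precision or recall is zero, so F1 is zero
    precision = true_positives / len(recommended)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Made-up similarity scores and relevance judgments for one user profile.
scores = {"n1": 0.9, "n2": 0.6, "n3": 0.2, "n4": 0.7}
relevant = {"n1", "n3"}
for cutoff in (0.1, 0.5, 0.8):
    print(cutoff, round(f1_at_cutoff(scores, relevant, cutoff), 3))
```

Sweeping the cut-off trades precision against recall, which is why results are reported over a range of cut-off values rather than at a single threshold.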
Evaluation (2)
• Results:
[Figure: results plot; legend: TF-IDF, SF-IDF+, SS]
Conclusions
• Recommendation is commonly performed using TF-IDF
• Semantics can be taken into account by considering synsets and their relations
• In our experiments, semantics-based recommendation outperforms classic term-based recommendation
• Future work:
  – Also employ the similarity of words missing from WordNet (e.g., named entities), for instance based on the Google Distance
  – Compare SF-IDF, SF-IDF+, and SS with LDA (Latent Dirichlet Allocation) and ESA (Explicit Semantic Analysis)
Questions