Content Modelling for Recommendation (Consorcio MAVIR) - Transcript
Content Modelling: Experimental Environment (RepLab 2013)
• Detect topics in tweets
• 61 entities in 4 domains
• Annotated training set + test set to classify
– Clustering task
• Reflect the previous knowledge about the entities?
• Set the number of topics?
• Evaluation
– Entity level
– Reliability & Sensitivity
Content Modelling: FCA-Based Content Modelling & Organization
• Unsupervised approach
• Avoid clustering problems
• Adaptation to new topics, but taking into account previous knowledge
FCA Overview [Wille, 1992]
Concept  Extent                     Intent
C1       {Doc1, Doc2, Doc3, Doc4}   {∅}
C2       {Doc1, Doc2, Doc3}         {P}
C3       {Doc1, Doc4}               {J}
C4       {Doc1}                     {P, J, PJ}
C5       {Doc2}                     {P, NP}
C6       {Doc3}                     {P, AP}
C7       {∅}                        {P, NP, AP, J, PJ}
       P   NP  AP  J   PJ
Doc1   X           X   X
Doc2   X   X
Doc3   X       X
Doc4               X
Formal Context: 𝕂 := (𝑮, 𝑴, 𝑰)
• 𝑮: tweets
• 𝑴: terms (hashtags, URLs, words) in the tweets
• 𝑰: incidence relation; (𝒈, 𝒎) ∈ 𝑰 iff tweet 𝒈 contains the term 𝒎
Term1 Term2 Term3 Term4
Tweet1 X X X
Tweet2 X X
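For slide-sized contexts like these, the whole concept lattice can be enumerated by brute force, deriving the closure of every attribute subset. A minimal sketch over the toy Doc/term context from the FCA overview slide (fine for examples, far too slow for a real tweet collection):

```python
from itertools import combinations

# Incidence relation I of the toy context from the FCA overview slide
context = {
    "Doc1": {"P", "J", "PJ"},
    "Doc2": {"P", "NP"},
    "Doc3": {"P", "AP"},
    "Doc4": {"J"},
}

def extent(attrs):
    """Objects having every attribute in attrs (derivation A')."""
    return {g for g, m in context.items() if attrs <= m}

def intent(objs):
    """Attributes shared by every object in objs (derivation B')."""
    attrs = set.union(*context.values())
    for g in objs:
        attrs &= context[g]
    return attrs  # for objs == {} this is all attributes, as required

def concepts():
    """Enumerate all formal concepts by closing each attribute subset."""
    all_attrs = sorted(set.union(*context.values()))
    found = set()
    for r in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, r):
            e = extent(set(subset))
            found.add((frozenset(e), frozenset(intent(e))))
    return found

lattice = concepts()  # 7 concepts, matching C1..C7 on the slide
```

Every concept's intent is some attribute subset, so closing all subsets is guaranteed to find the full lattice; practical FCA tools use incremental algorithms instead.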
Formal Context Generation
Concept Lattice Generation
Term1 Term2 Term3 Term4 Term5 Term6 Term7 Term8
Tweet1 X
Tweet2 X X X X
Tweet3 X X X X
Tweet4 X
Tweet5 X X X X
Tweet6 X X X X
Tweet7 X X X X
Topic Selection
Stability
𝜎(𝐴, 𝐵) = |{𝐶 ⊆ 𝐴 ∶ 𝐶′ = 𝐵}| / 2^|𝐴|
where |𝐴| is the number of objects in 𝐴 and the numerator counts the subsets 𝐶 of 𝐴 whose intent (𝐶′) equals the intent of the concept, that is, 𝐶′ = 𝐵.
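The stability index follows directly from that definition by enumerating subsets of the extent. A brute-force sketch over the toy context from the FCA overview slide (exponential in |A|, so only viable for small extents):

```python
from itertools import combinations

# Toy context from the FCA overview slide
context = {
    "Doc1": {"P", "J", "PJ"},
    "Doc2": {"P", "NP"},
    "Doc3": {"P", "AP"},
    "Doc4": {"J"},
}

def intent(objs):
    """Attributes shared by every object in objs (the C' derivation)."""
    attrs = set.union(*context.values())
    for g in objs:
        attrs &= context[g]
    return attrs

def stability(extent_a, intent_b):
    """sigma(A, B) = |{C subset of A : C' = B}| / 2^|A|"""
    hits = 0
    for r in range(len(extent_a) + 1):
        for c in combinations(sorted(extent_a), r):
            if intent(set(c)) == intent_b:
                hits += 1
    return hits / 2 ** len(extent_a)

# Concept C2 = ({Doc1, Doc2, Doc3}, {P}): 4 of its 8 subsets keep intent {P}
s = stability({"Doc1", "Doc2", "Doc3"}, {"P"})  # 0.5
```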
Filtering: Managing Noisy Data
• Does noise really affect the results? Sure, but how much?
– Is it worth taking into account?
• Comparison:
– KLD-based filtering (initial, badly-performing approach)
– Gold-standard-based filtering (best-performing case)
                               Reliability        Sensitivity        F(R,S)
KLD-Based Filtering            [0.6735, 0.8331]   [0.1076, 0.1092]   [0.1548, 0.1711]
Gold-Standard-Based Filtering  [0.6184, 0.6615]   [0.1940, 0.2469]   [0.1730, 0.2336]
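The slides do not detail how the KLD filter works; one common reading is to score each term by its pointwise KL-divergence contribution between the entity's tweet stream and a background collection, then drop low-scoring terms as noise. A hedged sketch under that assumption (all term frequencies and the cutoff are illustrative):

```python
import math

def kld_term_scores(entity_tf, background_tf):
    """Score each term by p(t|entity) * log(p(t|entity) / p(t|background)).
    Higher scores = terms more characteristic of the entity stream."""
    e_total = sum(entity_tf.values())
    b_total = sum(background_tf.values())
    scores = {}
    for term, freq in entity_tf.items():
        p = freq / e_total
        # count-of-1 fallback for terms unseen in the background
        q = background_tf.get(term, 1) / b_total
        scores[term] = p * math.log(p / q)
    return scores

entity = {"matrix": 8, "imdb": 4, "the": 10}
background = {"the": 1000, "imdb": 20, "matrix": 5, "a": 975}
scores = kld_term_scores(entity, background)
# keep only the terms above a cutoff, drop the rest as noise
kept = {t for t, s in scores.items() if s > 0.1}
```

Common stopword-like terms ("the") get negative scores and are filtered out, while entity-specific terms survive.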
Code Bug Fixing
                  Reliability        Sensitivity        F(R,S)
With Code Bug     [0.6184, 0.6615]   [0.1940, 0.2469]   [0.1730, 0.2336]
Without Code Bug  [0.1678, 0.3021]   [0.3343, 0.6678]   [0.2242, 0.2882]
Attribute Selection: Pre-Selection of the Top-Representative Attributes
• Knowledge loss or noise reduction?
• 2 parameters: lower & upper threshold
LT  UT  Reliability  Sensitivity  F(R,S)
1   50  0.3021       0.3343       0.2882
1   25  0.3029       0.3324       0.2878
1   10  0.3039       0.3311       0.2877
5   50  0.1678       0.6778       0.2242
5   25  0.1680       0.6746       0.2235
5   10  0.1685       0.6715       0.2236
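A plausible reading of the LT/UT parameters (an assumption, since the slides don't define them) is to keep only terms whose document frequency, as a percentage of tweets, falls between the two thresholds: the lower threshold prunes rare noise, the upper one prunes stopword-like terms. A minimal sketch under that assumption:

```python
from collections import Counter

def select_attributes(tweets, lower_pct, upper_pct):
    """Keep only terms whose document frequency (% of tweets containing
    the term) lies in [lower_pct, upper_pct]; assumed reading of LT/UT."""
    df = Counter(t for tweet in tweets for t in set(tweet))
    n = len(tweets)
    kept = {t for t, c in df.items() if lower_pct <= 100 * c / n <= upper_pct}
    return [set(tweet) & kept for tweet in tweets]

# Toy stream: "a" appears in 75% of tweets, "d" in 25%, "b"/"c" in 50%
tweets = [{"a", "b"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
filtered = select_attributes(tweets, 30, 60)  # keeps only b and c
```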
Attribute Selection: Only the lower threshold really affects the results. Why?
Lower Threshold  Upper Threshold  # Generated Concepts  Avg. Concepts per Entity
1                50               29836                 489
1                25               31384                 514
1                10               32566                 533
5                50               1100                  18
5                25               1154                  18
5                10               1258                  20
[Figure: number of attributes (log scale) vs. frequency of occurrence, with the 1% and 5% threshold cut-offs marked]
Attribute Selection: The lower the threshold, the better the results?
LT   Reliability  Sensitivity  F(R,S)
5    0.1678       0.6778       0.2242
1    0.3021       0.3343       0.2882
0.5  0.3836       0.2412       0.2710
0.1  0.4075       0.2204       0.2671
[Figure: Reliability, Sensitivity and F-measure for lower thresholds 5, 1, 0.5 and 0.1]
Topic Selection - Stability: Select the Most-Suitable Concepts
Stability Value        Reliability  Sensitivity  F(R,S)
0.2                    0.4090       0.3163       0.3258
0.4                    0.4090       0.3163       0.3258
0.5                    0.3455       0.3228       0.3041
0.7                    0.3407       0.3236       0.3027
0.9                    0.3029       0.3324       0.2878
BEST_REPLAB_APPROACH   0.4624       0.3246       0.3252
Topic Adaptation: Same performance for seen & unseen topics?
Topics  Reliability  Sensitivity  F(R,S)
All     0.4090       0.3163       0.3258
Seen    0.5764       0.3018       0.3730
Unseen  0.4504       0.3302       0.3447
FCA for Recommendation: SotA
CFRS
• Entry levels
• Association rules
• Fuzzy FCA
CBRS
• Keyword-based
• Item modelling
Toy Examples: not really representative
They don't take advantage of the CBRS potential (features, keywords)
FCA for Recommendation: Basics
CFRS vs. CBRS = model users vs. model contents
FCA can infer concepts from contents, but…
Can concepts be inferred from user preferences?
• FCA vs. Recommendation: exploratory vs. predictive task
FCA for Recommendation: Basics
Intuition
• Content-Based
• Collaborative Filtering
– Solution: association rules?
{milk, nappies} → {beer}
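The {milk, nappies} → {beer} intuition can be made concrete with a tiny support/confidence rule miner. This is a brute-force illustration, not the algorithm from the slides, and the baskets are invented:

```python
from itertools import combinations

def association_rules(transactions, min_support=0.4, min_conf=0.7):
    """Tiny brute-force miner: antecedents up to size 2, single-item
    consequents. Returns (antecedent, consequent, confidence) triples."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = set().union(*transactions)
    rules = []
    for size in (1, 2):
        for ante in combinations(sorted(items), size):
            ante = frozenset(ante)
            for cons in items - ante:
                s = support(ante | {cons})
                if s >= min_support:
                    conf = s / support(ante)
                    if conf >= min_conf:
                        rules.append((set(ante), cons, conf))
    return rules

baskets = [
    {"milk", "nappies", "beer"},
    {"milk", "nappies", "beer"},
    {"milk", "bread"},
    {"nappies", "beer"},
    {"milk", "nappies", "beer", "bread"},
]
rules = association_rules(baskets)
# the classic rule {milk, nappies} -> beer is recovered with confidence 1.0
```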
FCA Recommendation Approaches
Collaborative Filtering: Users vs. Items
• Take similar users
• Recommend new items related to these users
• Algorithm already developed, based on the lattice structure
Content-Based: Contents vs. Features
Hybrid Filtering: Users vs. Features
Test Bed: FCA-Based CFRS
MovieLens
• State-of-the-art dataset
LDOS-CoMoDa
• Context information:
– Time, Daytype, Season, Location, Weather, Social, Mood, Physical, Decision and Interaction
• 122 users, 1233 items, 2300 ratings
MovieTweetings
• 100k tweets rating IMDb movies
“I rated The Matrix 9/10 http://www.imdb.com/title/tt0133093/ #IMDb”
Test Bed: FCA-Based CBRS
Plista: 80GB dataset
• Features: contextual and content-related
• User interactions
DBbook
• Features: LD (Linked Data) information about the books
• Interactions: users and a set of books
RERmovie
• Features: LD information about the movies
• Interactions:
– User-item ratings
– User-attribute ratings
Explanations in Recommendation
Christian Scheel, Angel Castellanos, Thebin Lee, Ernesto William De Luca. The Reason Why: A Survey of Explanations for Recommender Systems
Attribute-Based Explanations
“You like Mafia movies, so I recommend you The Godfather”
SotA
• Based on item ratings → may infer wrong preferences:
“I like The Godfather” → “I like Marlon Brando”, but I don’t like Robert Duvall
Proposal
• Based on item-attribute ratings
• Dataset: RERmovie
Dataset - RERmovie
User study: items rated attribute by attribute
Some numbers:
• 53 users
• 650 ratings of 299 different movies
• 6597 reasons for movie ratings
Proposal
Explanation Retrieval
• Attribute rating = quality measure
– Based on the scores of the users
• ∀𝒊: item model 𝒎𝒊 → 𝑨𝒊: attribute set to explain 𝒊
– 𝑨𝒊⁺: {𝒂 ∈ 𝑨𝒊 ∶ 𝒓𝒂 > 𝒕𝟏}
– 𝑨𝒊⁻: {𝒂 ∈ 𝑨𝒊 ∶ 𝒓𝒂 < 𝒕𝟐}
• Explanations
– ∀𝒊: pro (Pr) and contra (Cr) reasons
Evaluation
• Pr and Cr are compared to the user feedback
– Measures: precision, recall and F1
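The A⁺/A⁻ split and its precision/recall/F1 evaluation can be sketched as follows. The threshold values, attribute names and the user-feedback set are all illustrative assumptions, not values from the study:

```python
def explain(item_attrs, t1=3.5, t2=2.5):
    """Split an item's attribute ratings into pro (A+) and contra (A-)
    reasons; the t1/t2 thresholds here are illustrative assumptions."""
    pros = {a for a, r in item_attrs.items() if r > t1}
    cons = {a for a, r in item_attrs.items() if r < t2}
    return pros, cons

def prf1(predicted, relevant):
    """Precision / recall / F1 of predicted reasons vs. user feedback."""
    tp = len(predicted & relevant)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Hypothetical per-attribute ratings for one item
godfather = {"Marlon Brando": 4.8, "Crime": 4.5, "Long runtime": 2.0}
pros, cons = explain(godfather)
# Hypothetical user feedback on which reasons actually matter
p, r, f1 = prf1(pros, {"Marlon Brando", "Al Pacino"})
```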
Workload
Concept Modelling
• FCA shown to be a more-than-suitable approach
• Extensive data analysis
FCA 4 Recommendation
• FCA Algorithm almost developed
Explanations 4 Recommendation
• Novel approach presented
Reduction Algorithm (1)
[Slide figure garbled in extraction: step-by-step trace of the reduction over the sets G, S, AUX and P]
Reduction Algorithm (2)
[Slide figure garbled in extraction: final state of the reduction over the sets G, S, AUX and P]
RERmovie Data Analysis
Most Prominent Attributes
• Positives: genres (35%), actors (31%), directors (6%)
• Negatives: actors (26%), genres (18%)
Most ratings are positive…
[Chart: rating counts 255, 217, 103, 48, 27]