Posted on 17-Aug-2015


Transformation and aggregation preprocessing for top-k recommendation GAP rules induction

Marta Vomlelova, Michal Kopecky and Peter Vojtas

Charles University Prague

Content

• Data

• Task

• Mining – heuristics, domain specific, …

• Some results

• Mining – transferable methods, data aggregations

• Some results

• Oracle DB Data Miner

• Second-order logic GAP rules

• Conclusions

RuleML-2015 Challenge Rule-based RS for the web of data

Transformation and aggregation preprocessing for top-k recommendation GAP rules induction

Task

• Running the provided Python script on the training data: the intermediate join is big and redundant (for each UserID, MovieID pair, the values of all 5003 movie attributes are repeated)

• For each user u, find the 5 movies that best match the user profile: top5(u)

• Submit in CSV format: userId, movieId, score\n

• Observations

• The score does not affect the system response; only (unordered) sets are compared

• P, R, F@5 (precision, recall, F-measure) are computed between top5(u) and a target set of varying size (the estimated average target size is 9.4 or 8, depending on assumptions)
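Because only unordered sets are compared, the evaluation can be sketched with plain set operations. A minimal Python sketch, assuming per-user precision = |top5 ∩ target| / |top5|, recall = |top5 ∩ target| / |target|, F as their harmonic mean, and a plain average over users; the function names and the averaging scheme are assumptions, not taken from the challenge's evaluation script.

```python
from statistics import mean


def prf_at_k(recommended: set, target: set) -> tuple[float, float, float]:
    """Precision, recall and F-measure between an unordered recommendation set and a target set."""
    hits = len(recommended & target)
    p = hits / len(recommended) if recommended else 0.0
    r = hits / len(target) if target else 0.0
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f


def average_prf(top5: dict, target: dict) -> tuple[float, float, float]:
    """Average P, R, F over all users that have a target set (assumed macro-averaging)."""
    scores = [prf_at_k(set(top5.get(u, ())), set(t)) for u, t in target.items()]
    if not scores:
        return 0.0, 0.0, 0.0
    return tuple(mean(s[i] for s in scores) for i in range(3))
```

For example, with top5 = {1, 2, 3, 4, 5} and a target of size 8 containing movies 2 and 4, precision is 2/5 and recall is 2/8. The scores in the submitted CSV play no role here, matching the observation that only sets are compared.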

Mining – heuristics, domain specific, …

• 5003 DBpedia attributes: most frequent ones, clusters of properties; tried mining, no relevant results (but we became acquainted with the data)

• Per attribute: relative frequency in ratings, NLP extraction

• Attributes: MAKEUP, VISUAL, SMIX, SEDIT, SPIELBERG, NY, CALIF, NOVELS, CAMERON, LA, ARIZONA, WILLIAMS

• KSI: pure first-order logic with weighted average, F = 0.05262 (our third)

• 0-1 order agreement with ratings (good properties)

• 100*Movies.Spielberg + 50*Movies.Original + Movies.BayesAVG

• SCS_CUNI “Spielberg”: F = 0.10681 (our best)

• The script downloaded the table Xratings \ DB Ratings, which gave a surprise

• Disqualified (did not use only the training/test set): F = 0.6987

• Precision: 0.9994 * 5000 = 4997; three users have a target set of size 4
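The “Spielberg” formula ranks movies by a linear combination of three per-movie attributes. A minimal Python sketch, assuming Spielberg and Original are 0/1 flags, BayesAVG is a precomputed Bayesian average rating, and each user comes with a list of candidate movies to rank; the `Movie` type, the candidate lists and the tie-breaking by movie id are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movie:
    movie_id: int
    spielberg: int    # 1 if the movie has the SPIELBERG attribute, else 0 (assumed encoding)
    original: int     # 1 if the movie has the ORIGINAL attribute, else 0 (assumed encoding)
    bayes_avg: float  # precomputed Bayesian average rating (assumed given)


def spielberg_score(m: Movie) -> float:
    # 100*Movies.Spielberg + 50*Movies.Original + Movies.BayesAVG
    return 100 * m.spielberg + 50 * m.original + m.bayes_avg


def top_k(candidates: list[Movie], k: int = 5) -> list[int]:
    # Highest score first; ties broken by smaller movie_id (hypothetical choice)
    ranked = sorted(candidates, key=lambda m: (-spielberg_score(m), m.movie_id))
    return [m.movie_id for m in ranked[:k]]


def to_csv_lines(user_id: int, candidates: list[Movie], k: int = 5) -> list[str]:
    # One "userId,movieId,score" line per recommended movie
    by_id = {m.movie_id: m for m in candidates}
    return [f"{user_id},{mid},{spielberg_score(by_id[mid]):.4f}" for mid in top_k(candidates, k)]
```

Since the score depends only on the movie, per-user lists in this sketch differ only through the candidate sets passed in. The weights 100 and 50 make the two flags dominate, so BayesAVG effectively acts as a tie-breaker within each flag combination.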

Transferable methods, data aggregations

• GenreMatch (genres in the user's ratings versus movie genres) and a decision tree with drastic pruning

• KTIML: data mining combined with first-order logic, F = 0.10085 (our second)
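The slides do not define GenreMatch precisely. A minimal Python sketch under one plausible reading, which is an assumption: for a user and a candidate movie, GenreMatch is the fraction of the user's rated movies that share at least one genre with the candidate. The function name and data layout are hypothetical.

```python
def genre_match(rated_movie_genres: list[set[str]], movie_genres: set[str]) -> float:
    """Fraction of the user's rated movies sharing at least one genre with the candidate movie.

    Hypothetical definition of GenreMatch; the original slides do not give the formula.
    """
    if not rated_movie_genres:
        return 0.0
    shared = sum(1 for genres in rated_movie_genres if genres & movie_genres)
    return shared / len(rated_movie_genres)
```

A value like this, computed per (user, movie) pair, is the kind of aggregated feature that could be fed to a pruned decision tree.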

Rule preference   Rule
0.11              R1: GoodProperty = 1
0.25              R2: 113.5 < CNT < 400
0.29              R3: R1 and R2
0.58              R4: GoodProperty = 0 & CNT > 399
0.57              R5: GoodProperty = 1 & CNT > 399
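A minimal Python sketch of applying the rule table, assuming GoodProperty is a 0/1 flag and CNT is an integer aggregate per movie. How overlapping rules are combined is not stated on the slide; this sketch assumes the highest preference among the firing rules is used and that a movie matching no rule gets preference 0. Both choices are hypothetical.

```python
# (preference, rule name, condition on (good_property, cnt)), transcribed from the table
RULES = [
    (0.11, "R1", lambda gp, cnt: gp == 1),
    (0.25, "R2", lambda gp, cnt: 113.5 < cnt < 400),
    (0.29, "R3", lambda gp, cnt: gp == 1 and 113.5 < cnt < 400),  # R1 and R2
    (0.58, "R4", lambda gp, cnt: gp == 0 and cnt > 399),
    (0.57, "R5", lambda gp, cnt: gp == 1 and cnt > 399),
]


def preference(good_property: int, cnt: int) -> tuple[float, list[str]]:
    """Return (assumed combined preference, names of firing rules)."""
    fired = [(pref, name) for pref, name, cond in RULES if cond(good_property, cnt)]
    if not fired:
        return 0.0, []  # assumed default when no rule fires
    return max(pref for pref, _ in fired), [name for _, name in fired]
```

With this choice, a movie with GoodProperty = 1 and CNT = 200 fires R1, R2 and R3 and gets 0.29, i.e. the preference of the most specific rule.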

Oracle DB Data Miner

Second-order logic GAP rules

• DB aggregations: second-order logic

• “Simple” queries can be transformed into rules, e.g. SELECT UserID, MovieID, 5 FROM Ordered_Prediction WHERE OrdNr <= 5; …

… 100*Movies.Spielberg + 50*Movies.Original + Movies.BayesAVG

• corresponds to the GAP rule

SCS_CUNI_Movie(u,m): 100*x1 + 50*x2 + x3 ← SPIELBERG(m): x1 & ORIGINAL(m): x2 & BAYESAVG(m): x3

• Semantics so far: 2GAP, i.e. facts extended by atomic predicates corresponding to tables resulting from database aggregations, e.g. SPIELBERG(m), ORIGINAL(m), BAYESAVG(m)
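The query and the GAP rule can be exercised end to end on toy data. A minimal Python sketch using the standard-library sqlite3 module as a stand-in database (window functions need SQLite 3.25 or later), assuming hypothetical tables Movies(MovieID, Spielberg, Original, BayesAVG) and Candidates(UserID, MovieID). The view Ordered_Prediction, whose exact definition the slides do not show, is assumed here to number each user's candidates by descending rule score in OrdNr.

```python
import sqlite3


def top5_via_sql(movies, candidates):
    """movies: (MovieID, Spielberg, Original, BayesAVG) tuples; candidates: (UserID, MovieID) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, Spielberg INTEGER, "
                "Original INTEGER, BayesAVG REAL)")
    con.execute("CREATE TABLE Candidates (UserID INTEGER, MovieID INTEGER)")  # hypothetical table
    con.executemany("INSERT INTO Movies VALUES (?, ?, ?, ?)", movies)
    con.executemany("INSERT INTO Candidates VALUES (?, ?)", candidates)
    # Head annotation of the GAP rule, 100*x1 + 50*x2 + x3, computed per (user, movie);
    # OrdNr numbers each user's movies by descending score (ties: smaller MovieID first).
    con.execute("""
        CREATE VIEW Ordered_Prediction AS
        SELECT c.UserID, c.MovieID,
               100 * m.Spielberg + 50 * m.Original + m.BayesAVG AS Score,
               ROW_NUMBER() OVER (
                   PARTITION BY c.UserID
                   ORDER BY 100 * m.Spielberg + 50 * m.Original + m.BayesAVG DESC, c.MovieID
               ) AS OrdNr
        FROM Candidates c JOIN Movies m ON m.MovieID = c.MovieID
    """)
    # The slide's query: the first five movies per user, with a constant score of 5
    # (the score does not affect the evaluation).
    rows = con.execute(
        "SELECT UserID, MovieID, 5 FROM Ordered_Prediction WHERE OrdNr <= 5 "
        "ORDER BY UserID, OrdNr"
    ).fetchall()
    con.close()
    return rows
```

Keeping the ranking inside the database matches the conclusion that all processing was done in a relational DB; only the final top-5 rows leave it.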

Conclusions

• Data too big for rule induction tools: all processing was done in a relational DB

• Transformation via NLP extraction; clustering and importance of attributes

• Database aggregation: CNT, AVG, …

• “Simple” rules (in second-order logic GAP)

• Rules give explanations that are intuitive for humans

• Precision: in the ideal case, we gave 75% of users at least one correct recommendation

• Future work: distribution of learning quality across users (not only AVG)
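The 75% figure counts users whose top-5 set contains at least one target movie, and the future-work item asks for the per-user distribution rather than an average. A minimal Python sketch of both measures; the function names and the histogram form are hypothetical.

```python
from collections import Counter


def hit_rate(top5: dict, target: dict) -> float:
    """Share of users with at least one recommended movie in their target set."""
    if not target:
        return 0.0
    hits = sum(1 for u, t in target.items() if set(top5.get(u, ())) & set(t))
    return hits / len(target)


def hits_distribution(top5: dict, target: dict) -> Counter:
    """Histogram mapping number of correct recommendations to number of users."""
    return Counter(len(set(top5.get(u, ())) & set(t)) for u, t in target.items())
```

The histogram exposes what an average hides: two systems with the same mean precision can differ in how many users get no correct recommendation at all.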
