Posted on 17-Aug-2015


Page 1: Challenge@RuleML2015 Transformation and aggregation preprocessing for top-k recommendation GAP rules induction

Transformation and aggregation preprocessing for top-k recommendation GAP rules induction

Marta Vomlelova, Michal Kopecky and Peter Vojtas

Charles University Prague

Contents

• Data

• Task

• Mining – heuristics, domain specific, …

  • Some results

• Mining – transferable methods, data aggregations

  • Some results

• Oracle DB Data Miner

• Second order logic GAP rules

• Conclusions

RuleML-2015 Challenge Rule-based RS for the web of data



Task

• Running the Python script on the training data: the intermediate join is big and redundant (for each UserID, MovieID pair, the movie's 5003 attribute values are repeated)

• For each user u, find the 5 movies that best match the user's profile: top5(u)

• Submission in CSV format: userId, movieId, score\n

• Observations

  • The score does not affect the evaluation; only the (unordered) sets are compared

  • P, R, F@5 are computed between top5(u) and a target set of varying size (the estimated average target size is 9.4 or 8, depending on the assumptions)
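Because only unordered sets are compared, the per-user metrics reduce to set operations. A minimal Python sketch, assuming top5(u) and the target are given as sets of movie IDs; the helper names `prf_at_k` and `to_csv_rows` are hypothetical, the data is a toy example, and whether the challenge averaged P, R and F per user or globally is not stated here.

```python
def prf_at_k(predicted: set[int], target: set[int]) -> tuple[float, float, float]:
    """Set-based precision, recall and F1 of a top-k set against a target set.

    Scores and order are ignored: only the (unordered) sets are compared.
    """
    hits = len(predicted & target)
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(target) if target else 0.0
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f


def to_csv_rows(top5: dict[int, list[tuple[int, float]]]) -> str:
    """Format the submission lines userId,movieId,score (the score is not evaluated)."""
    return "".join(
        f"{user_id},{movie_id},{score}\n"
        for user_id, items in top5.items()
        for movie_id, score in items
    )


# 2 of the 5 recommendations are in a target set of size 8.
p, r, f = prf_at_k({1, 2, 3, 4, 5}, {2, 5, 10, 11, 12, 13, 14, 15})
print(p, r, round(f, 4))  # prints: 0.4 0.25 0.3077
```

With a target larger than 5, recall is capped at 5/|target| even for a perfect top-5, which is why the average target size matters.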


Mining – heuristics, domain specific, …

• 5003 DBpedia attributes: we tried the most frequent ones, clusters of properties and mining, with no relevant results (mainly a way to get acquainted with the data)

• Per attribute: relative frequency in ratings, NLP extraction

• Attributes: MAKEUP, VISUAL, SMIX, SEDIT, SPIELBERG, NY, CALIF, NOVELS, CAMERON, LA, ARIZONA, WILLIAMS

• KSI: pure first order logic with weighted average, F = 0.05262 (our third best)

• 0-1 order agreement with ratings (good properties)

• Score: 100*Movies.Spielberg + 50*Movies.Original + Movies.BayesAVG

• SCS_CUNI “Spielberg”: F = 0.10681 (our best)

• Surprise: the table Xratings downloaded by the script (Xratings \ DB Ratings) gave a disqualified result, since it did not use only the training/test set: F = 0.6987

• Precision: 0.9994 * 5000 = 4997, i.e. three users have a target set of size 4
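The “Spielberg” submission ranks movies with the fixed linear score above. A minimal sketch, assuming a hypothetical `Movie` record with 0/1 flags for the SPIELBERG and ORIGINAL attributes and a precomputed BayesAVG value, and assuming the candidate movies for a user are already known (how candidates were chosen is not shown in the slides). The movie data is a toy example.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Movie:
    movie_id: int
    spielberg: int    # 1 if the SPIELBERG attribute holds for the movie, else 0
    original: int     # 1 if the ORIGINAL attribute holds, else 0
    bayes_avg: float  # BayesAVG rating, assumed to be precomputed


def spielberg_score(m: Movie) -> float:
    # 100*Movies.Spielberg + 50*Movies.Original + Movies.BayesAVG
    return 100 * m.spielberg + 50 * m.original + m.bayes_avg


def top_k(candidates: list[Movie], k: int = 5) -> list[int]:
    """IDs of the k highest-scoring candidates, best first."""
    return [m.movie_id for m in heapq.nlargest(k, candidates, key=spielberg_score)]


movies = [
    Movie(1, 0, 0, 4.1), Movie(2, 1, 0, 3.5), Movie(3, 0, 1, 3.9),
    Movie(4, 0, 0, 4.5), Movie(5, 1, 1, 3.0), Movie(6, 0, 0, 2.0),
]
print(top_k(movies))  # prints: [5, 2, 3, 4, 1]
```

As long as BayesAVG stays well below 50 (e.g. on a 1 to 5 rating scale), the weights effectively sort by Spielberg first, then Original, then BayesAVG.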


Transferable methods, data aggregations

• GenreMatch (genres in the user's ratings versus the movie's genres) and drastic pruning of the decision tree

• KTIML: data mining combined with first order logic, F = 0.10085 (our second best)


Preference  Rule
0.11        R1: GoodProperty = 1
0.25        R2: 113.5 < CNT < 400
0.29        R3: R1 and R2
0.58        R4: GoodProperty = 0 & CNT > 399
0.57        R5: GoodProperty = 1 & CNT > 399
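The rule table can be read as conditions on two per-movie values: GoodProperty (0/1) and CNT (a count obtained by database aggregation). A minimal sketch, assuming both values are already available for a movie; the slides do not say how overlapping rules (e.g. R1, R2 and R3) are combined, so taking the maximum preference of the fired rules is a hypothetical choice, as are the helper names.

```python
# (preference, rule name, condition on GoodProperty and CNT), as in the table.
RULES = [
    (0.11, "R1", lambda gp, cnt: gp == 1),
    (0.25, "R2", lambda gp, cnt: 113.5 < cnt < 400),
    (0.29, "R3", lambda gp, cnt: gp == 1 and 113.5 < cnt < 400),  # R1 and R2
    (0.58, "R4", lambda gp, cnt: gp == 0 and cnt > 399),
    (0.57, "R5", lambda gp, cnt: gp == 1 and cnt > 399),
]


def fired_rules(good_property: int, cnt: float) -> list[tuple[float, str]]:
    """All rules whose condition holds for a movie, with their preferences."""
    return [(pref, name) for pref, name, cond in RULES if cond(good_property, cnt)]


def preference(good_property: int, cnt: float) -> float:
    """Hypothetical combination: highest preference among the fired rules, else 0.0."""
    return max((pref for pref, _ in fired_rules(good_property, cnt)), default=0.0)


print(fired_rules(1, 200))  # prints: [(0.11, 'R1'), (0.25, 'R2'), (0.29, 'R3')]
print(preference(0, 500))   # prints: 0.58
```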


Oracle DB Data Miner

Second order logic GAP rules

• DB aggregations: second order logic

• “Simple” queries can be transformed into rules, e.g. SELECT UserID, MovieID, 5 FROM Ordered_Prediction WHERE OrdNr <= 5; …

… 100*Movies.Spielberg + 50*Movies.Original + Movies.BayesAVG

• This corresponds to the GAP rule

  SCS_CUNI_Movie(u,m): 100*x1 + 50*x2 + x3 ← SPIELBERG(m): x1 & ORIGINAL(m): x2 & BAYESAVG(m): x3

• Semantics so far: 2GAP, i.e. facts extended by atomic predicates corresponding to tables resulting from database aggregations, e.g. SPIELBERG(m), ORIGINAL(m), BAYESAVG(m)
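In the 2GAP reading, each aggregated table becomes an atomic predicate whose annotation is the table value, and the rule head's annotation is computed from the body annotations. A minimal sketch using Python's built-in `sqlite3`, assuming hypothetical tables `Movies(MovieID, Spielberg, Original)` and `Ratings(UserID, MovieID, Rating)` with toy data; a plain `AVG` stands in for BAYESAVG, which is a simplification and not the authors' definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, Spielberg INTEGER, Original INTEGER);
CREATE TABLE Ratings (UserID INTEGER, MovieID INTEGER, Rating REAL);
INSERT INTO Movies VALUES (1, 1, 0), (2, 0, 1), (3, 0, 0);
INSERT INTO Ratings VALUES (10, 1, 4), (11, 1, 5), (10, 2, 3), (11, 3, 5), (12, 3, 4);
""")

# Each query result is an atomic predicate P(m): x, stored as {MovieID: x}.
SPIELBERG = dict(conn.execute("SELECT MovieID, Spielberg FROM Movies"))
ORIGINAL = dict(conn.execute("SELECT MovieID, Original FROM Movies"))
# Simplification: plain AVG instead of a Bayesian average.
BAYESAVG = dict(conn.execute("SELECT MovieID, AVG(Rating) FROM Ratings GROUP BY MovieID"))


def scs_cuni_movie(m: int) -> float:
    """SCS_CUNI_Movie(u,m): 100*x1 + 50*x2 + x3
    <- SPIELBERG(m): x1 & ORIGINAL(m): x2 & BAYESAVG(m): x3.
    Assumption: a movie without ratings gets x3 = 0.0."""
    x1, x2, x3 = SPIELBERG[m], ORIGINAL[m], BAYESAVG.get(m, 0.0)
    return 100 * x1 + 50 * x2 + x3


ranking = sorted(SPIELBERG, key=scs_cuni_movie, reverse=True)
top5 = ranking[:5]  # analogue of WHERE OrdNr <= 5
print([(m, scs_cuni_movie(m)) for m in top5])  # prints: [(1, 104.5), (2, 53.0), (3, 4.5)]
```

The head annotation here depends only on m, so this rule gives every user the same ranking over the same candidates.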


Conclusions

• Data too big for rule induction tools: all processing done in a relational DB

• Transformation via NLP extraction; clustering and importance of attributes

• Database aggregation – CNT, AVG, …

• “Simple” rules (in second order logic GAP)

• Rules give explanations that are intuitive for humans

• Precision: in the ideal case, we gave 75% of users at least one correct recommendation

• Future work – distribution of learning quality across users (not only AVG)
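Both the 75% figure and the future-work item look at how quality is distributed over users rather than at its average. A minimal sketch, assuming each user's top5(u) and target set are available as Python sets; `hit_distribution` is a hypothetical helper and the data is made up for illustration.

```python
from collections import Counter


def hit_distribution(predictions: dict[int, set[int]], targets: dict[int, set[int]]):
    """Per-user number of correct recommendations.

    Returns the share of users with at least one hit and a histogram
    mapping the number of hits to the number of users.
    """
    hits = {u: len(pred & targets.get(u, set())) for u, pred in predictions.items()}
    if not hits:
        return 0.0, Counter()
    share = sum(1 for h in hits.values() if h > 0) / len(hits)
    return share, Counter(hits.values())


preds = {1: {1, 2, 3, 4, 5}, 2: {1, 2, 3, 4, 5}, 3: {6, 7, 8, 9, 10}, 4: {1, 2, 3, 4, 5}}
targets = {1: {1, 2}, 2: {9}, 3: {6}, 4: {5}}
share, histogram = hit_distribution(preds, targets)
print(share, sorted(histogram.items()))  # prints: 0.75 [(0, 1), (1, 2), (2, 1)]
```

The histogram shows what an average hides: two users with one hit and one with two give the same mean as other, very different distributions.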
