Posted on 17-Aug-2015
Transformation and aggregation preprocessing for top-k recommendation GAP rules induction
Marta Vomlelova, Michal Kopecky and Peter Vojtas
Charles University Prague
Content
• Data
• Task
• Mining – heuristics, domain specific, …
  • Some results
• Mining – transferable methods, data aggregations
  • Some results
• Oracle DB Data Miner
• Second order logic GAP rules
• Conclusions
RuleML-2015 Challenge Rule-based RS for the web of data
Task
• Running the provided Python script on the training data: the intermediate join is large and redundant (for each UserID, MovieID pair, the movie's 5003 attribute values are repeated)
• For each user u, find the 5 movies that best match the user's profile: top5(u)
• Submission in CSV format: userId,movieId,score (one line per recommendation)
• Observations
  • The score does not affect the system's response; only the (unordered) sets are compared
  • P, R, F@5 are computed between top5(u) and a target set of varying size (the estimated average target size is 9.4 or 8, depending on assumptions)
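The evaluation described above compares sets, so the scores in the CSV play no role. A minimal sketch of P, R, F@5, assuming per-user precision, recall and F1 are computed between top5(u) and the target set and then macro-averaged over users (the challenge's exact averaging scheme is not stated here; the function names are hypothetical):

```python
def prf_at_k(recommended: set, target: set) -> tuple[float, float, float]:
    """Precision, recall and F1 between a recommended set and a target set.

    Only set membership matters; scores and order are ignored.
    """
    hits = len(recommended & target)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(target) if target else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def mean_prf(top5: dict, targets: dict) -> tuple[float, float, float]:
    """Macro-average of P, R, F@5 over all users with a target set (assumption)."""
    if not targets:
        return 0.0, 0.0, 0.0
    sums = [0.0, 0.0, 0.0]
    for user, target in targets.items():
        scores = prf_at_k(set(top5.get(user, ())), set(target))
        for i, value in enumerate(scores):
            sums[i] += value
    n = len(targets)
    return sums[0] / n, sums[1] / n, sums[2] / n
```

Because the target size varies (9.4 or 8 on average), even a perfect top-5 list cannot reach recall 1 for users with more than 5 target movies, which caps the attainable F@5.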
Mining – heuristics, domain specific, …
• 5003 DBpedia attributes: we tried the most frequent ones and clusters of properties; mining gave no relevant results (but helped us get acquainted with the data)
• Per attribute: relative frequency in ratings, NLP extraction
  • Attributes: MAKEUP, VISUAL, SMIX, SEDIT, SPIELBERG, NY, CALIF, NOVELS, CAMERON, LA, ARIZONA, WILLIAMS
• KSI: pure first-order logic with weighted average, F = 0.05262 (our third-best result)
• 0-1 order agreement with ratings (good properties)
  • 100*Movies.Spielberg + 50*Movies.Original + Movies.BayesAVG
  • SCS_CUNI “Spielberg”: F = 0.10681 (our best result)
• Surprise from the table Xratings downloaded by the script: Xratings \ DB Ratings
  • Disqualified (did not use only the training/test set): F = 0.6987
  • Precision: 0.9994 * 5000 = 4997; three users have a target set of size 4
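The per-attribute "relative frequency in ratings" heuristic is not spelled out on the slide. One plausible reading, sketched here as an assumption: for a binary movie attribute such as SPIELBERG, divide the share of ratings that fall on movies having the attribute by the share of catalog movies having it. A ratio above 1 suggests the attribute is over-represented among rated movies. The data layout and the function name are hypothetical.

```python
from collections import Counter


def attribute_lift(ratings, movie_attrs):
    """Hypothetical per-attribute relative frequency in ratings.

    ratings: iterable of (user_id, movie_id) pairs.
    movie_attrs: dict movie_id -> set of binary attribute names present on the movie.
    Returns dict attribute -> (share of ratings with attr) / (share of movies with attr).
    """
    n_movies = len(movie_attrs)
    # How many catalog movies carry each attribute.
    movie_count = Counter(a for attrs in movie_attrs.values() for a in attrs)
    # How many ratings fall on movies carrying each attribute.
    rating_count = Counter()
    n_ratings = 0
    for _user, movie in ratings:
        n_ratings += 1
        rating_count.update(movie_attrs.get(movie, ()))
    lift = {}
    for attr, m_cnt in movie_count.items():
        rating_share = rating_count[attr] / n_ratings if n_ratings else 0.0
        movie_share = m_cnt / n_movies
        lift[attr] = rating_share / movie_share
    return lift
```

In a relational DB the same numbers come from two GROUP BY counts, which fits the all-in-DB processing the authors describe.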
Transferable methods, data aggregations
• GenreMatch (genres in the user's ratings versus the movie's genres) and a decision tree with drastic pruning
• KTIML: data mining combined with first-order logic, F = 0.10085 (our second-best result)
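GenreMatch is named here but not defined. The sketch below assumes one simple definition: a user's genre profile is the relative frequency of each genre across the movies the user rated, and a candidate movie scores the sum of the profile weights of its genres. Both the definition and the function names are hypothetical, not the authors' implementation.

```python
from collections import Counter


def genre_profile(rated_movies, movie_genres):
    """Relative frequency of each genre over the movies a user rated."""
    counts = Counter(g for m in rated_movies for g in movie_genres.get(m, ()))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def genre_match(profile, genres):
    """Hypothetical GenreMatch: sum of the user's profile weights for the movie's genres."""
    return sum(profile.get(g, 0.0) for g in genres)
```

A score like this is one per-(user, movie) feature that a decision tree can split on, which keeps the method transferable to other domains with categorical item attributes.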
Preference  Rule
0.11        R1: GoodProperty=1
0.25        R2: 113.5<CNT<400
0.29        R3: R1 and R2
0.58        R4: GoodProperty=0 & CNT>399
0.57        R5: GoodProperty=1 & CNT>399
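The rule table can be read as a small preference function over two per-movie features: GoodProperty (a 0/1 flag) and CNT (assumed here to be the movie's rating count). How several firing rules were combined is not stated; this sketch assumes the maximum preference over all rules whose bodies hold, mirroring the least-upper-bound reading of annotated (GAP) rules, and returns 0.0 when no rule fires.

```python
# (preference, rule name, condition) triples transcribed from the table above.
RULES = [
    (0.11, "R1", lambda good, cnt: good == 1),
    (0.25, "R2", lambda good, cnt: 113.5 < cnt < 400),
    (0.29, "R3", lambda good, cnt: good == 1 and 113.5 < cnt < 400),  # R1 and R2
    (0.58, "R4", lambda good, cnt: good == 0 and cnt > 399),
    (0.57, "R5", lambda good, cnt: good == 1 and cnt > 399),
]


def preference(good_property: int, cnt: float) -> float:
    """Maximum preference over all rules that fire (assumed combination)."""
    fired = [pref for pref, _name, cond in RULES if cond(good_property, cnt)]
    return max(fired, default=0.0)
```

Under this reading, a movie with GoodProperty=1 and CNT=200 fires R1, R2 and R3 and gets 0.29, while a frequently rated movie (CNT>399) gets about 0.57 to 0.58 regardless of GoodProperty.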
Oracle DB Data Miner
Second order logic GAP rules
• DB aggregations: second-order logic
• “Simple” queries can be transformed into rules, e.g. SELECT UserID, MovieID, 5 FROM Ordered_Prediction WHERE OrdNr <= 5; …
… 100*Movies.Spielberg + 50*Movies.Original + Movies.BayesAVG
• corresponds to the GAP rule
  SCS_CUNI_Movie(u,m): 100*x1 + 50*x2 + x3 ← SPIELBERG(m): x1 & ORIGINAL(m): x2 & BAYESAVG(m): x3
• Semantics so far:
  • 2GAP: facts extended by atomic predicates corresponding to tables resulting from database aggregations, e.g. SPIELBERG(m), ORIGINAL(m), BAYESAVG(m)
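The correspondence between the query and the GAP rule can be sketched as follows: the annotation function 100*x1 + 50*x2 + x3 is applied to the values that the aggregation tables SPIELBERG(m), ORIGINAL(m) and BAYESAVG(m) assign to a movie, and the five best-scored movies per user are kept, as OrdNr <= 5 does in the query. The per-user candidate sets, the dictionary layout and the function names are assumptions; the elided part of the query (…) is not reconstructed.

```python
import heapq


def scs_cuni_movie(x1: float, x2: float, x3: float) -> float:
    """Head annotation of SCS_CUNI_Movie(u,m) from body annotations x1, x2, x3."""
    return 100 * x1 + 50 * x2 + x3


def top5(candidates: dict, spielberg: dict, original: dict, bayes_avg: dict) -> dict:
    """Keep the five highest-scored candidate movies per user.

    candidates: user -> iterable of candidate movie ids (hypothetical input).
    spielberg, original, bayes_avg: movie -> value from the aggregation tables;
    a movie missing from a table is treated as annotated 0 (assumption).
    Returns user -> list of (movie, score) pairs, best first.
    """
    result = {}
    for user, movies in candidates.items():
        scored = [
            (m, scs_cuni_movie(spielberg.get(m, 0), original.get(m, 0), bayes_avg.get(m, 0.0)))
            for m in movies
        ]
        result[user] = heapq.nlargest(5, scored, key=lambda pair: pair[1])
    return result
```

Each (movie, score) pair then maps to one userId,movieId,score line of the submission; since only the sets are compared, the score column is informational.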
Conclusions
• The data are too big for rule induction tools, so all processing was done in a relational DB
• Transformation via NLP extraction, clustering, and attribute importance
• Database aggregations: CNT, AVG, …
• “Simple” rules (in second-order logic GAP)
• Rules give explanations that are intuitive for humans
• Precision: in the ideal case, we gave 75% of users at least one correct recommendation
• Future work: distribution of learning quality across users (not only the average)
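The 75% figure can be read as a hit rate: the share of users whose top-5 list contains at least one movie from their target set. The future-work point about the distribution of quality across users, not only the average, can be approached by keeping per-user hit counts. A minimal sketch; the input layout and function name are hypothetical.

```python
from collections import Counter


def hit_stats(top5: dict, targets: dict):
    """Return (hit rate, histogram of per-user hit counts).

    hit rate: fraction of users with at least one correct recommendation.
    histogram: number of hits (0..5) -> number of users.
    """
    hits_per_user = {u: len(set(top5.get(u, ())) & set(t)) for u, t in targets.items()}
    if not hits_per_user:
        return 0.0, Counter()
    hit_rate = sum(1 for h in hits_per_user.values() if h > 0) / len(hits_per_user)
    return hit_rate, Counter(hits_per_user.values())
```

Two systems with the same average F@5 can differ sharply in this histogram, for example one spreading single hits over many users and another concentrating several hits on few users.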