challenge@ruleml2015 transformation and aggregation preprocessing for top-k recommendation gap rules...

Transformation and aggregation preprocessing for top-k recommendation

GAP rules induction

Marta Vomlelova, Michal Kopecky and Peter Vojtas

Charles University Prague

Content

• Data

• Task

• Mining – heuristics, domain specific, …

• Some results

• Mining – transferable methods, data aggregations

• Some results

• Oracle DB Data Miner

• Second order logic GAP rules

• Conclusions

RuleML-2015 Challenge Rule-based RS for the web of data


• Run the Python script on the training data – the intermediate join result is big and redundant (for each UserID, MovieID pair the 5003 movie attributes repeat)

• For each user, find the 5 movies that best match the user's profile: top5(u)

• Submit in CSV format: userId, movieId, score\n

• Observations

• Score does not affect the system response; only (unordered) sets are compared

• P, R, F@5 are computed between top5(u) and a target of varying size (estimated average target size is 9.4 or 8, depending on assumptions)
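The set-based comparison described above can be sketched as follows. This is a minimal illustration, not the challenge's evaluation code: the function name is hypothetical, and dividing precision by k (rather than by the number of submitted items) is an assumption, since the slides do not spell out the exact definition.

```python
def precision_recall_f_at_k(recommended, target, k=5):
    """Compare the first k recommended items with a target set.

    Only set membership matters; the order and the scores of the
    recommended items are ignored, as noted in the observations.
    """
    top_k = set(recommended[:k])
    target = set(target)
    hits = len(top_k & target)
    precision = hits / k  # assumption: denominator is k
    recall = hits / len(target) if target else 0.0
    if hits == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical user: 5 recommendations, target set of size 8, 2 hits.
example = precision_recall_f_at_k([1, 2, 3, 4, 5], {2, 5, 9, 10, 11, 12, 13, 14})
```

With a target of size 8 or more, recall at k=5 cannot exceed 5/8 even when all five recommendations are correct.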

Mining – heuristics, domain specific, …

• 5003 DBPedia attributes – most frequent, clusters of properties, tried mining, no relevant results (acquaintance with data)

• Per attribute:

• relative frequency in ratings, NLP extraction

MAKEUP, VISUAL, SMIX, SEDIT, SPIELBERG, NY, CALIF, NOVELS, CAMERON, LA, ARIZONA, WILLIAMS

• KSI: pure first order logic with weighted average, F = 0.05262 (our third)

• 0-1 order agreement with ratings (good properties)

• 100*Movies.Spielberg + 50*Movies.Original + Movies.BayesAVG

• SCS_CUNI “Spielberg”: F = 0.10681 (our best)

• Script-downloaded table Xratings \ DB Ratings gave a surprise

• Disqualified (did not use only the training/test set): F = 0.6987

• Precision: 0.9994 * 5000 = 4997 – three users have a target set of size 4
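The weighted formula 100*Movies.Spielberg + 50*Movies.Original + Movies.BayesAVG can be turned into a top-5 list and submission lines roughly as below. This is a sketch under assumptions: the movie ids, feature values and helper names are made up for illustration, the three features are taken as precomputed columns, and ties are broken by movie id.

```python
import csv
import io


def movie_score(spielberg, original, bayes_avg):
    # Weighted formula taken from the slides.
    return 100 * spielberg + 50 * original + bayes_avg


def top_k_rows(user_id, movies, k=5):
    """Return (userId, movieId, score) rows for the k best-scoring movies."""
    scored = [(movie_id, movie_score(*feats)) for movie_id, feats in movies.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # best score first
    return [(user_id, movie_id, score) for movie_id, score in scored[:k]]


def to_csv(rows):
    """Serialize rows as 'userId,movieId,score' lines ending in a newline."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical data: movie id -> (Spielberg, Original, BayesAVG).
movies = {
    1: (1, 0, 3.5),
    2: (0, 1, 4.0),
    3: (0, 0, 4.9),
    4: (1, 1, 2.0),
    5: (0, 0, 1.0),
    6: (0, 1, 3.0),
}
rows = top_k_rows(7, movies)
submission = to_csv(rows)
```

Because only unordered sets are compared, the score column matters for ranking inside the pipeline but not for the evaluation itself.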

Transferable methods, data aggregations

• GenreMatch (genres in users ratings versus movie genres) and decision tree drastic pruning

• KTIML: data mining combined with first order logic, F = 0.10085 (our second)

RulePreference  Rule
0.11            R1: GoodProperty=1
0.25            R2: 113.5 < CNT < 400
0.29            R3: R1 and R2
0.58            R4: GoodProperty=0 & CNT>399
0.57            R5: GoodProperty=1 & CNT>399
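The rule table can be read as a small lookup from (GoodProperty, CNT) to a preference. The sketch below encodes the five rules as listed; how overlapping rules are combined is not stated in the slides, so taking the highest preference among the rules that fire, and returning 0.0 when none fires, are both assumptions.

```python
# (name, preference, condition) triples copied from the rule table.
RULES = [
    ("R1", 0.11, lambda gp, cnt: gp == 1),
    ("R2", 0.25, lambda gp, cnt: 113.5 < cnt < 400),
    ("R3", 0.29, lambda gp, cnt: gp == 1 and 113.5 < cnt < 400),
    ("R4", 0.58, lambda gp, cnt: gp == 0 and cnt > 399),
    ("R5", 0.57, lambda gp, cnt: gp == 1 and cnt > 399),
]


def rule_preference(good_property, cnt, default=0.0):
    """Return (preference, rule name) of the strongest rule that fires.

    Assumption: when several rules match, the largest preference wins;
    when none matches, the default preference is returned with no rule.
    """
    fired = [(pref, name) for name, pref, cond in RULES if cond(good_property, cnt)]
    return max(fired) if fired else (default, None)
```

For example, a movie with GoodProperty=1 and CNT=200 matches R1, R2 and R3 and is assigned 0.29 under this reading.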

Oracle DB Data Miner

Second order logic GAP rules

• DB aggregations second order logic

• “simple” queries can be transformed to rules. E.g. SELECT UserID, MovieID, 5 FROM Ordered_Prediction WHERE OrdNr <= 5; …

… 100*Movies.Spielberg + 50*Movies.Original + Movies.BayesAVG

• corresponds to GAP rule

• SCS_CUNI_Movie(u,m): 100*x1 + 50*x2 + x3 ← SPIELBERG(m): x1 & ORIGINAL(m): x2 & BAYESAVG(m): x3

• Semantics so far:

• 2GAP – facts extended by atomic predicates corresponding to tables resulting from database aggregations, e.g. SPIELBERG(m), ORIGINAL(m), BAYESAVG(m)
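One way to picture the GAP rule is as a function over annotated facts: each aggregation table supplies an annotation per movie, and the head annotation is computed from the body annotations. The sketch below is illustrative only; the dictionary layout, the movie ids and their values, and the helper name are assumptions, not the authors' implementation.

```python
# Annotated facts: predicate name -> {movie: annotation}.
# In the slides these come from database aggregations; values here are made up.
facts = {
    "SPIELBERG": {"m1": 1.0, "m2": 0.0},
    "ORIGINAL": {"m1": 0.0, "m2": 1.0},
    "BAYESAVG": {"m1": 3.8, "m2": 4.2},
}


def apply_gap_rule(body_predicates, combine, facts, movie):
    """Derive the head annotation for one movie from the body annotations."""
    annotations = [facts[predicate][movie] for predicate in body_predicates]
    return combine(*annotations)


def scs_cuni_movie(user, movie):
    # The user variable u does not occur in the rule body, so the derived
    # annotation is the same for every user.
    return apply_gap_rule(
        ["SPIELBERG", "ORIGINAL", "BAYESAVG"],
        lambda x1, x2, x3: 100 * x1 + 50 * x2 + x3,
        facts,
        movie,
    )


head_m1 = scs_cuni_movie("u1", "m1")
head_m2 = scs_cuni_movie("u1", "m2")
```

Keeping the combination function separate from the fact tables mirrors the split between the rule and the aggregation results it is evaluated over.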

Conclusions

• Data too big for rule induction tools – all processing in a relational DB

• Transformation via NLP extraction. Clustering and importance of attributes

• Database aggregation – CNT, AVG, …

• “simple” rules (in a second order logic GAP)

• Rules give explanation intuitive for humans

• Precision – in the ideal case we gave 75% of users at least one correct recommendation

• Future work – distribution of learning quality along users (not only AVG)
