Characteristic Relational Patterns
Arne Koopman & Arno Siebes, KDD 2009
Speakers: 王思元, 黎引得
Outline
- Introduction
- Method
- Experiment Results
- Conclusion
Introduction
- Relational database models
  - Local models: frequent pattern mining
  - Global models: probabilistic relational models
- Characterizing the database
  - Combine patterns to form a global model
Relational Databases
- KDD Cup relational databases: Genes, Hepatitis, Financial
- Schema shown on the slide (table: columns)
  - ORDER: orderID, loanID, accountID, Bank-To, Amount, K-Symbol
  - LOAN: loanID, accountID, Date, Amount, Duration, Payment, Status
  - ACCOUNT: loanID, accountID, Frequency, Date
  - CLIENT: loanID, clientID, Birthyear, Sex
  - CARD: cardID, loanID, dispositionID, Type, IssueDate
  - DISPOSITION: loanID, accountID, dispositionID, clientID, Type
- Record counts shown on the slide: 1513, 682, 682, 36, 827, 827
Local Model
- Frequent pattern mining: too many patterns

ACCOUNT
accountID  Date
10         2009
11         2009

LOAN
loanID  accountID  Duration
100     10         12
101     11         12
102     11         24

Patterns (diagram nodes): ACCOUNT Date=2009; LOAN Duration=12; LOAN Duration=24
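To illustrate why frequent pattern mining yields "too many patterns", here is a minimal sketch on the toy ACCOUNT and LOAN tables from the slide. It rests on a simplifying assumption that is not from the talk: each account and its loans are flattened into one set of attribute=value items, and ordinary itemsets are enumerated instead of true relational patterns. The helper names `account_transactions` and `frequent_itemsets` are hypothetical.

```python
from itertools import combinations

# Toy tables from the slide.
ACCOUNT = [
    {"accountID": 10, "Date": 2009},
    {"accountID": 11, "Date": 2009},
]
LOAN = [
    {"loanID": 100, "accountID": 10, "Duration": 12},
    {"loanID": 101, "accountID": 11, "Duration": 12},
    {"loanID": 102, "accountID": 11, "Duration": 24},
]


def account_transactions(accounts, loans):
    """Flatten each account plus its loans into a set of items (assumption, not the paper's pattern language)."""
    transactions = []
    for acc in accounts:
        items = {f"ACCOUNT.Date={acc['Date']}"}
        for loan in loans:
            if loan["accountID"] == acc["accountID"]:
                items.add(f"LOAN.Duration={loan['Duration']}")
        transactions.append(frozenset(items))
    return transactions


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets contained in at least min_support transactions."""
    all_items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(all_items) + 1):
        for candidate in combinations(all_items, size):
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                result[candidate] = support
    return result


transactions = account_transactions(ACCOUNT, LOAN)
patterns = frequent_itemsets(transactions, min_support=1)
for itemset, support in patterns.items():
    print(support, " & ".join(itemset))
```

Even with two accounts and three loans, every one of the 7 non-empty item combinations is frequent at minimum support 1, which hints at how quickly the pattern set grows on a real database.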
Global Model
- Probabilistic Relational Models: local co-occurrence information lost

ACCOUNT
accountID  Date
10         2009
11         2009

LOAN
loanID  accountID  Duration
100     10         12
101     11         12
102     11         24

Probabilistic Relational Model (diagram): ACCOUNT Date=2009 linked to LOAN Duration=12 (0.67) and LOAN Duration=24 (0.33)
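The values 0.67 and 0.33 on the slide are consistent with simple relative frequencies over the toy tables: two of the three loans attached to a 2009 account have Duration 12 and one has Duration 24. The sketch below reproduces that arithmetic. Reading the numbers as P(LOAN.Duration | ACCOUNT.Date) is an assumption, and the function name `duration_given_account_date` is hypothetical.

```python
from collections import Counter

# Toy tables from the slide.
ACCOUNT = [
    {"accountID": 10, "Date": 2009},
    {"accountID": 11, "Date": 2009},
]
LOAN = [
    {"loanID": 100, "accountID": 10, "Duration": 12},
    {"loanID": 101, "accountID": 11, "Duration": 12},
    {"loanID": 102, "accountID": 11, "Duration": 24},
]


def duration_given_account_date(accounts, loans, date):
    """Relative frequency of each loan duration among loans whose account has the given date."""
    date_by_account = {a["accountID"]: a["Date"] for a in accounts}
    counts = Counter(
        loan["Duration"]
        for loan in loans
        if date_by_account.get(loan["accountID"]) == date
    )
    total = sum(counts.values())
    return {duration: n / total for duration, n in counts.items()}


probs = duration_given_account_date(ACCOUNT, LOAN, 2009)
for duration, p in sorted(probs.items()):
    print(f"P(LOAN.Duration={duration} | ACCOUNT.Date=2009) = {p:.2f}")
```

The conditional distribution no longer records which durations occur together under the same account (account 11 has both a 12 and a 24 loan), which is the local co-occurrence information the slide says is lost.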
Combined Model
- Relational Code Table: compact and lossless description of the complete database

ACCOUNT
accountID  Date
10         2009
11         2009

LOAN
loanID  accountID  Duration
100     10         12
101     11         12
102     11         24

Code Table (diagram nodes): ACCOUNT Date=2009; LOAN Duration=12; LOAN Duration=24
RDB-KRIMP
- RDB-KRIMP selects patterns that describe the database well
- Candidates are frequent relational patterns
- Describing patterns are placed in a code table
  - Shortness of a pattern's code is proportional to its usage: the more often a pattern is used, the shorter its code
- Selection loop (diagram): start with an empty code table; select a candidate pattern; add it to the code table; compress the database; accept or reject the pattern using MDL
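The selection loop can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it works on flat itemsets rather than relational patterns, omits pruning, and assumes the usual usage-based code length of -log2(usage / total usage). The function names and the tiny database `db` are hypothetical.

```python
from math import log2


def cover(transaction, code_table):
    """Greedily cover a transaction with non-overlapping itemsets, in code-table order."""
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    return used


def total_size(db, code_table, standard_len):
    """Two-part MDL size in bits: encoded database plus encoded code table."""
    usage = {itemset: 0 for itemset in code_table}
    for transaction in db:
        for itemset in cover(transaction, code_table):
            usage[itemset] += 1
    total_usage = sum(usage.values())
    size = 0.0
    for itemset, count in usage.items():
        if count == 0:
            continue  # unused entries receive no code and cost nothing
        code_len = -log2(count / total_usage)  # higher usage gives a shorter code
        size += count * code_len  # cost of the database under this code table
        size += code_len + sum(standard_len[i] for i in itemset)  # cost of the table entry
    return size


def krimp(db, candidates):
    """Accept a candidate only if adding it reduces the total encoded size."""
    items = sorted(set().union(*db))
    counts = {i: sum(1 for t in db if i in t) for i in items}
    n = sum(counts.values())
    standard_len = {i: -log2(counts[i] / n) for i in items}

    def support(itemset):
        return sum(1 for t in db if itemset <= t)

    def order(table):
        # Cover order: longer itemsets first, then higher support.
        return sorted(table, key=lambda s: (-len(s), -support(s), sorted(s)))

    code_table = order(frozenset([i]) for i in items)  # start with single items only
    best = total_size(db, code_table, standard_len)
    for cand in sorted(candidates, key=lambda s: (-support(s), -len(s), sorted(s))):
        trial = order(code_table + [cand])
        size = total_size(db, trial, standard_len)
        if size < best:  # MDL: accept only if the description gets shorter
            code_table, best = trial, size
    return code_table, best


# Hypothetical toy database and candidate patterns.
db = [frozenset("AB")] * 6 + [frozenset("C")] * 2 + [frozenset("BC")]
candidates = [frozenset("AB"), frozenset("BC")]
code_table, best = krimp(db, candidates)
print([sorted(s) for s in code_table], round(best, 2))
```

On this toy input the frequently used pattern {A, B} is accepted because it shortens the description, while {B, C}, which is used only once, is rejected because its code table entry costs more than it saves.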
Experiments
Results for Different Minimum Support Values
Comparison to Other Pattern Types
Conclusion
- Code tables describe the database while preserving local information
- Code tables stay compact
  - Stay compact for low minimum support values
  - Reductions of up to 4 orders of magnitude
- Richer patterns lead to better models
  - Smaller encodings
  - Better descriptions of the database