horizontal data sets: number of attributes is of the same order to several orders of magnitude...
Post on 21-Dec-2015
Horizontal data sets: the number of attributes is of the same order as, or several orders of magnitude higher than, the number of records.
Example: genetic data sets can have 10,000 attributes and 100 records.
With 10,000 attributes there are up to 100 million combinations of two attributes and up to 1 trillion three-attribute sets!
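The counts above can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are made up here, and it shows both the ordered counts (which match the "up to" figures on the slide) and the smaller unordered counts.

```python
import math

n_attributes = 10_000  # figure taken from the genetic data set example

# Upper bounds if order matters (n^2 and n^3).
ordered_pairs = n_attributes ** 2      # 100,000,000
ordered_triples = n_attributes ** 3    # 1,000,000,000,000

# Counts if order is ignored (binomial coefficients).
unordered_pairs = math.comb(n_attributes, 2)    # 49,995,000
unordered_triples = math.comb(n_attributes, 3)  # 166,616,670,000

print(f"pairs:   {unordered_pairs:,} unordered, {ordered_pairs:,} ordered")
print(f"triples: {unordered_triples:,} unordered, {ordered_triples:,} ordered")
```

Either way, the number of candidate attribute sets dwarfs the 100 records, which is why enumerating attribute combinations is impractical for horizontal data.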
Data-Driven Algorithm
Constructing the max-conf kernel for small data sets:
Input: i) a Database DB ii) a fixed consequent C
Output:
a set R of rules such that for any rule of the form X -> C there exists a rule X' -> C in R, where X' is a superset of X and X' -> C has a higher confidence than X -> C
Algorithm:
// DB(C) is the set of records that satisfy the consequent
// RS is a working set which maintains the current subset of records that satisfy the consequent
// COMMON is the set of common descriptors for the record set RS

MaxConfKernelSet(DB, C, DB(C), RS, COMMON) {
    i = size(RS) + 1;
    if (i == 1) { COMMON = Descriptors in the ith record in DB(C); }
    RS = RS \union {ith record in DB(C)};
    while (i <= size(DB(C))) do {
        Delete from COMMON the descriptors not shared by the ith record;
        Compute support of records satisfying {COMMON-C};
        Compute the confidence of COMMON-C -> C;
        if ((COMMON-C) != null) {
            if sufficient support and not duplicate
                output "COMMON-C -> C [support, conf]";
            MaxConfKernelSet(DB, C, DB(C), RS, COMMON);
            RS = RS - {ith record in DB(C)};
            i++;
            RS = RS \union {ith record in DB(C)};
        }
    }
}

Invoke: MaxConfKernelSet(DB, C, DB(C), null, null); // RS and COMMON are empty initially
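The idea behind the procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: records are modelled as frozensets of descriptor strings, the names `max_conf_kernel` and `min_support` are hypothetical, and "support" is taken to mean the number of records in DB that satisfy the antecedent. The sketch intersects the descriptors of every non-empty subset of DB(C), so it is exponential in the size of DB(C) and only suitable for small data sets.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Record = FrozenSet[str]
Rule = Tuple[int, float]  # (antecedent support, confidence)


def max_conf_kernel(db: List[Record], consequent: str,
                    min_support: int = 1) -> Dict[Record, Rule]:
    """Collect rules X -> consequent, where X is the set of descriptors
    shared by some non-empty subset of the records satisfying the consequent."""
    # DB(C): records satisfying the consequent, with the consequent removed.
    target = [r - {consequent} for r in db if consequent in r]
    rules: Dict[Record, Rule] = {}

    def extend(start: int, common: Optional[Record]) -> None:
        for i in range(start, len(target)):
            # Keep only the descriptors also present in the ith record of DB(C).
            shared = target[i] if common is None else common & target[i]
            if not shared:
                continue  # empty antecedent: adding more records cannot help
            if shared not in rules:  # skip antecedents already reported
                covered = [r for r in db if shared <= r]
                support = len(covered)  # >= 1, since target[i] contains shared
                conf = sum(consequent in r for r in covered) / support
                if support >= min_support:
                    rules[shared] = (support, conf)
            extend(i + 1, shared)  # grow the record subset RS

    extend(0, None)
    return rules


# Toy database (made up for illustration); "C" is the fixed consequent.
db = [frozenset(s) for s in (
    {"a", "b", "C"}, {"a", "b", "C"}, {"a", "c", "C"}, {"a", "b"}, {"b", "c"},
)]
for antecedent, (support, conf) in max_conf_kernel(db, "C").items():
    print(sorted(antecedent), "-> C", f"[support={support}, conf={conf:.2f}]")
```

The reason intersections suffice: for any antecedent X, the descriptors common to all records of DB(C) that contain X form a superset X' that covers the same records of DB(C) but no more records overall, so the confidence of X' -> C is at least that of X -> C.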
OLAP and Statistical databases
• Statistical databases – from the early 80s
– Multidimensional data sets concerned with summarization over the dimensions of the data sets. 2-D representations – census, socioeconomic data, etc.
• OLAP: online analytical processing – from the mid 90s