Novel algorithm for mining high utility itemsets
Shankar, S. Purusothaman, T. Jayanthi, S.
International Conference on Computing, Communication and Networking,
2008. (ICCCN 2008) 18-20 Dec. 2008 Page(s):1 - 6
Speakers: 89621003 廖執善, 69721042 鄭仁傑
Outline
Introduction
Mining high utility itemsets
Existing Umining algorithm
Proposed FUM algorithm
Experimental Results
Conclusions
Future Work
Introduction (1/4)
One of the important issues in data mining is the interestingness problem.
The fundamental idea behind mining frequent itemsets is that only itemsets with high frequency are of interest to users.
A frequent itemset only reflects the statistical correlation between items, and it does not reflect the semantic significance of the items.
Introduction (2/4)
Introduction (3/4)
Introduction (4/4)
Motivation: we use a utility-based itemset mining approach to overcome this limitation of frequency-based mining. Utility-based data mining is a new research area that considers all types of utility factors in data mining processes and aims to incorporate utility considerations into data mining tasks. High utility itemset mining is a branch of utility-based data mining that aims to find itemsets that contribute high utility.
Mining high utility itemsets (1/3)
A frequent itemset is a set of items that appears in at least a pre-specified number of transactions. Formally, let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of items and $DB = \{T_1, T_2, \ldots, T_n\}$ a set of transactions, where every transaction is also a set of items (i.e. an itemset). Given a minimum support threshold $minSup$, an itemset $S$ is frequent iff:
$$|\{T \in DB \mid S \subseteq T\}| \ge minSup$$
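The support condition can be sketched in a few lines of Python. This is only an illustration: the toy database `DB` and the function names `support` and `is_frequent` are hypothetical and do not come from the paper.

```python
def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    s = frozenset(itemset)
    return sum(1 for t in transactions if s <= t)


def is_frequent(itemset, transactions, min_sup):
    """An itemset is frequent iff its support reaches min_sup."""
    return support(itemset, transactions) >= min_sup


# Hypothetical toy database: each transaction is a set of items.
DB = [frozenset("ABC"), frozenset("AC"), frozenset("BD"), frozenset("ACD")]

print(support("AC", DB))         # 3
print(is_frequent("AC", DB, 3))  # True
print(is_frequent("BD", DB, 2))  # False
```

Note that support only counts occurrences; it ignores quantities and profits, which is the gap the utility definitions address.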
Mining high utility itemsets (2/3)
The following is the set of definitions given in [6] which we shall illustrate on a small example.
Definition 1: The external utility of an item $i_p$ is a numerical value $y_p$ defined by the user. It is transaction independent and reflects the importance (usually profit) of the item. External utilities are stored in a utility table. For example, the external utility of item B in Table 2 is 10.
Definition 2: The internal utility of an item $i_p$ is a numerical value $x_p$ which is transaction dependent. In most cases it is defined as the quantity of the item in the transaction. For example, the internal utility of item E in transaction T5 is 2 (see Table 1).
Definition 3: The utility function $f$ is a function of two variables, $f(x, y): (\mathbb{R}^+, \mathbb{R}^+) \to \mathbb{R}^+$. The most common form, also used in this paper, is the product of internal and external utility: $x_p \times y_p$.
Mining high utility itemsets (3/3)
Definition 4: The utility of item $i_p$ in transaction $T$ is the quantitative measure computed with the utility function from Definition 3, i.e. $u(i_p, T) = f(x_p, y_p),\ i_p \in T$.
Definition 5: The utility of itemset $S$ in transaction $T$ is defined as $u(S, T) = \sum_{i_p \in S} u(i_p, T),\ S \subseteq T$.
Definition 6: Itemset $S$ is of high utility iff $U(S) \ge minUtil$, where $U(S) = \sum_{T \in DB,\ S \subseteq T} u(S, T)$ is the utility of $S$ over the whole database and $minUtil$ is a user-defined utility threshold, given as a percentage of the total utility of the database.
Definition 7: High utility itemset mining is the problem of finding the set $H$ defined as $H = \{S \mid S \subseteq I,\ U(S) \ge minUtil\}$, where $I$ is the set of items (attributes).
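As a rough illustration of Definitions 3 to 7, the sketch below computes item, itemset, and database-level utilities on a tiny example. The profits in `EXTERNAL`, the quantities in `DB`, and all function names are hypothetical; the candidate enumeration is plain brute force and is not the Umining or FUM procedure.

```python
from itertools import combinations

# Hypothetical external utilities (profit per unit) and transactions
# (item -> quantity, i.e. internal utility). Not taken from the paper.
EXTERNAL = {"A": 2, "B": 10, "C": 1}
DB = [
    {"A": 1, "B": 2},
    {"B": 1, "C": 5},
    {"A": 3, "C": 2},
]


def item_utility(item, transaction):
    """Definition 4 with f(x, y) = x * y."""
    return transaction[item] * EXTERNAL[item]


def itemset_utility_in_transaction(itemset, transaction):
    """Definition 5: sum of item utilities of S inside one transaction."""
    return sum(item_utility(i, transaction) for i in itemset)


def itemset_utility(itemset):
    """Utility of S over all transactions that contain S."""
    return sum(
        itemset_utility_in_transaction(itemset, t)
        for t in DB
        if all(i in t for i in itemset)
    )


def total_utility():
    """Total utility of the database (every item of every transaction)."""
    return sum(itemset_utility_in_transaction(t, t) for t in DB)


def high_utility_itemsets(min_util_percent):
    """Definition 7 by brute force; threshold is a percentage of the total."""
    threshold = total_utility() * min_util_percent / 100
    found = {}
    for k in range(1, len(EXTERNAL) + 1):
        for combo in combinations(sorted(EXTERNAL), k):
            u = itemset_utility(combo)
            if u >= threshold:
                found["".join(combo)] = u
    return found


print(total_utility())            # 45
print(high_utility_itemsets(40))  # {'B': 30, 'AB': 22}
```

In this toy data, AB is of high utility while its subset A is not (utility 8), which shows that utility is not downward closed the way support is.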
Existing Umining algorithm (1/4)
Existing Umining algorithm (2/4)
Existing Umining algorithm (3/4)
Existing Umining algorithm (4/4)
Minutil = 196 (threshold), k = 4 (number of levels)
Level 1: I = {A, B, C, D} is obtained by the scan function and assigned to C1.
Calculate-and-store function: u(A) = 110, u(B) = 200, u(C) = 190, u(D) = 85
Discover function: H = {B}, since u(B) exceeds Minutil.
Level 2: I = {AB, AC, AD, BC, BD, CD} is produced by the generation function.
Prune function: b(AB) = 310, b(AC) = 300, b(AD) = 195, b(BC) = 390, b(BD) = 285, b(CD) = 275.
Because b(AD) < Minutil, AD is omitted. Therefore C2 = {AB, AC, BC, BD, CD}.
Calculate-and-store function: u(AB) = 105, u(AC) = 197, u(BC) = 138, u(BD) = 211, u(CD) = 193
Discover function: H = {AC, BD}, whose utilities exceed Minutil.
Level 3: I = {ABC, ABD, ACD, BCD} is produced by the generation function.
Prune function: b(ABC) = 220, b(ABD) = 225.5, b(ACD) = 262.5, b(BCD) = 271; none is omitted.
Calculate-and-store function: u(ABC) = 143, u(ABD) = 106, u(ACD) = 150, u(BCD) = 139
Discover function: H = {}
Level 4: I = {ABCD} is produced by the generation function.
Prune function: b(ABCD) = 179.3 < Minutil, so ABCD is omitted. No candidates remain.
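The pruning values in this walkthrough are consistent with a utility upper bound of the form $b(S^k) = \frac{1}{k-1} \sum u(S^{k-1})$, summed over all $(k-1)$-item subsets of $S^k$. The sketch below checks that reading against the utilities listed on the slide. The function name `upper_bound` is hypothetical, and the formula is inferred from the numbers rather than quoted from the slide. Bounds that would need u(AD), which the slide does not list, are left out.

```python
from itertools import combinations


def upper_bound(itemset, utilities):
    """Average-style bound: sum of the utilities of all (k-1)-subsets,
    divided by (k-1). Assumed form, inferred from the slide's numbers."""
    k = len(itemset)
    if k < 2:
        raise ValueError("the bound is defined for itemsets of size >= 2")
    subsets = ["".join(c) for c in combinations(itemset, k - 1)]
    return sum(utilities[s] for s in subsets) / (k - 1)


# Utilities copied from the walkthrough.
u1 = {"A": 110, "B": 200, "C": 190, "D": 85}
u2 = {"AB": 105, "AC": 197, "BC": 138, "BD": 211, "CD": 193}
u3 = {"ABC": 143, "ABD": 106, "ACD": 150, "BCD": 139}

MINUTIL = 196

print(upper_bound("AD", u1))              # 195.0 -> below 196, AD is pruned
print(upper_bound("ABC", u2))             # 220.0
print(upper_bound("BCD", u2))             # 271.0
print(round(upper_bound("ABCD", u3), 1))  # 179.3 -> below 196, ABCD is pruned
```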
Proposed FUM algorithm (1/2)
Proposed FUM algorithm (2/2)
Candidate set = {A, B, C, D, E,
AB, AC, AD, AE, BC, BD, BE,
CD, CE, DE, ACD, ACE, ADE,
BCE, BDE, CDE, ACDE}
22 candidate itemsets in total
TWU tree Mining algorithm (1/2)
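| TID | A | B | C | D | E | TWU |
|-----|---|----|----|---|---|-----|
| T1 | 0 | 0 | 16 | 0 | 1 | 21 |
| T2 | 0 | 12 | 0 | 2 | 1 | 71 |
| T3 | 2 | 0 | 1 | 0 | 1 | 12 |
| T4 | 1 | 0 | 0 | 2 | 1 | 14 |
| T5 | 0 | 0 | 4 | 0 | 2 | 14 |
| T6 | 1 | 2 | 0 | 0 | 0 | 13 |
| T7 | 0 | 20 | 0 | 2 | 1 | 111 |
| T8 | 3 | 0 | 25 | 6 | 1 | 57 |
| T9 | 1 | 2 | 0 | 0 | 0 | 13 |
| T10 | 0 | 12 | 2 | 0 | 2 | 72 |

| Item | A | B | C | D | E |
|---------|---|---|---|---|---|
| Benefit | 3 | 5 | 1 | 3 | 5 |

$\mathrm{TWU}(T_1) = 16 \times 1 + 1 \times 5 = 21$
$\mathrm{TWU}(T_2) = 12 \times 5 + 2 \times 3 + 1 \times 5 = 71$
…
Reference: A Novel Algorithm for Mining High Utility Itemsets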
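The TWU column can be recomputed from the quantities and the benefit row on this slide. The following is a minimal sketch; the names `BENEFIT`, `QUANTITIES`, and `transaction_utility` are chosen here for illustration only.

```python
ITEMS = "ABCDE"
# Benefit (external utility) per item, copied from the slide.
BENEFIT = dict(zip(ITEMS, [3, 5, 1, 3, 5]))

# Quantities per transaction in item order A, B, C, D, E, copied from the slide.
QUANTITIES = {
    "T1": [0, 0, 16, 0, 1],
    "T2": [0, 12, 0, 2, 1],
    "T3": [2, 0, 1, 0, 1],
    "T4": [1, 0, 0, 2, 1],
    "T5": [0, 0, 4, 0, 2],
    "T6": [1, 2, 0, 0, 0],
    "T7": [0, 20, 0, 2, 1],
    "T8": [3, 0, 25, 6, 1],
    "T9": [1, 2, 0, 0, 0],
    "T10": [0, 12, 2, 0, 2],
}


def transaction_utility(quantities):
    """Sum of quantity * benefit over all items of one transaction."""
    return sum(q * BENEFIT[item] for item, q in zip(ITEMS, quantities))


for tid, row in QUANTITIES.items():
    print(tid, transaction_utility(row))
# T1 21, T2 71, T3 12, T4 14, T5 14, T6 13, T7 111, T8 57, T9 13, T10 72
```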
TWU tree Mining algorithm (2/2)
If min_util = 130, the WIT-tree for TWU-Mining is built from the transaction table and benefit table of the previous slide.

[Figure: WIT-tree for TWU-Mining. Each node is written as itemset × tidset, followed by its TWU.]
Root
  A × {3, 4, 6, 8, 9}: TWU = 12 + 14 + 13 + 57 + 13 = 109 < 130 (pruned)
  B × {2, 6, 7, 9, 10}: TWU = 280
    BD × {2, 7}: TWU = 182
      BDE × {2, 7}: TWU = 182
    BE × {2, 7, 10}: TWU = 254
  C × {1, 3, 5, 8, 10}: TWU = 176
    CE: TWU = 176
  D × {2, 4, 7, 8}: TWU = 253
    DE: TWU = 253
  E: TWU = 372

HUIs = {B (240), BD (172), BE (240), BDE (182)}
The actual utilities of the remaining nodes fall below 130: C = 48, D = 36, E = 50, CE = 83, DE = 56.
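The TWU values and the final HUI set for min_util = 130 can be checked with a brute-force sketch over the example's transaction and benefit tables. This is not the WIT-tree procedure itself: it enumerates every itemset, discards those whose TWU is below the threshold, and computes the exact utility of the rest. All names (`DB`, `tidset`, `twu`, `utility`, `mine`) are chosen here for illustration.

```python
from itertools import combinations

BENEFIT = {"A": 3, "B": 5, "C": 1, "D": 3, "E": 5}
# Non-zero quantities per transaction, copied from the example tables.
DB = {
    1: {"C": 16, "E": 1},
    2: {"B": 12, "D": 2, "E": 1},
    3: {"A": 2, "C": 1, "E": 1},
    4: {"A": 1, "D": 2, "E": 1},
    5: {"C": 4, "E": 2},
    6: {"A": 1, "B": 2},
    7: {"B": 20, "D": 2, "E": 1},
    8: {"A": 3, "C": 25, "D": 6, "E": 1},
    9: {"A": 1, "B": 2},
    10: {"B": 12, "C": 2, "E": 2},
}


def tidset(itemset):
    """IDs of the transactions that contain every item of the itemset."""
    return [tid for tid, t in DB.items() if all(i in t for i in itemset)]


def transaction_utility(t):
    return sum(q * BENEFIT[i] for i, q in t.items())


def twu(itemset):
    """Sum of the utilities of all transactions containing the itemset."""
    return sum(transaction_utility(DB[tid]) for tid in tidset(itemset))


def utility(itemset):
    """Exact utility: only the itemset's own items are counted."""
    return sum(DB[tid][i] * BENEFIT[i] for tid in tidset(itemset) for i in itemset)


def mine(min_util):
    """Brute force: keep itemsets with TWU >= min_util, then check utility."""
    result = {}
    for k in range(1, len(BENEFIT) + 1):
        for combo in combinations(sorted(BENEFIT), k):
            if twu(combo) < min_util:
                continue  # TWU overestimates utility, so this is safe to skip
            u = utility(combo)
            if u >= min_util:
                result["".join(combo)] = u
    return result


print(tidset("A"), twu("A"))  # [3, 4, 6, 8, 9] 109
print(twu("B"), twu("BE"))    # 280 254
print(mine(130))              # {'B': 240, 'BD': 172, 'BE': 240, 'BDE': 182}
```

Because an itemset's utility never exceeds its TWU, skipping itemsets with TWU below the threshold cannot lose a high utility itemset.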
Experimental Results (1/4)
Experimental Results (2/4)
Experimental Results (3/4)
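| Minutil (%) | Two-phase | TWU-Mining | #HUIs |
|-------------|-----------|------------|-------|
| 5 | 51.89 | 27.59 | 4 |
| 4 | 73.73 | 39.05 | 6 |
| 3 | 117.72 | 55.67 | 7 |
| 2 | 205.09 | 95.56 | 22 |
| 1 | 569.22 | 182.67 | 161 |

Experimental results on the BMS-POS database

| Database | #Trans | #Items | Remark |
|----------|--------|--------|----------|
| BMS-POS | 515597 | 1656 | Modified |
| Retails | 88162 | 16469 | Modified |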
Experimental Results (4/4)
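| Minutil (%) | Two-phase | TWU-Mining | #HUIs |
|-------------|-----------|------------|-------|
| 1 | 7.67 | 7.46 | 20 |
| 0.8 | 11.38 | 11.31 | 29 |
| 0.6 | 24.63 | 23.23 | 45 |
| 0.4 | 60.25 | 57.69 | 64 |
| 0.2 | 210.78 | 178.19 | 239 |
| 0.1 | 546.03 | 426.27 | 800 |

Experimental results on the Retails database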
Conclusions
Utility-based itemset mining discovers the itemsets that are significant according to their utility values. Utility constraints can express more complex semantics than the support measure.
In this paper we have shown that the proposed FUM algorithm executes faster than the existing Umining algorithm (see Table III) when more itemsets are identified as high utility itemsets.
Future Work
A Fast Algorithm for Mining High Utility Itemsets
2009 IEEE International Advance Computing Conference (IACC2009) Patiala, India 6-7 March 2009
We have also suggested a novel method of generating different types of itemsets such as High Utility and High Frequency itemsets (HUHF), High Utility and Low Frequency itemsets (HULF), Low Utility and High Frequency itemsets (LUHF) and Low Utility and Low Frequency itemsets (LULF) using a combination of FUM and Fast Utility Frequent mining (FUFM) algorithms.
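The four categories can be read as a two-threshold classification, sketched below. This only illustrates the labels and says nothing about how FUM or FUFM compute them. The function name `classify` and the support threshold `min_sup = 4` are hypothetical; the utility and support values are derived from the transaction table of the TWU example.

```python
def classify(utility, support, min_util, min_sup):
    """Label an itemset as HUHF, HULF, LUHF, or LULF."""
    u = "HU" if utility >= min_util else "LU"
    f = "HF" if support >= min_sup else "LF"
    return u + f


# (utility, support) pairs derived from the TWU example tables.
examples = {"B": (240, 5), "BD": (172, 2), "E": (50, 8), "CD": (43, 1)}

for name, (u, s) in examples.items():
    print(name, classify(u, s, min_util=130, min_sup=4))
# B HUHF, BD HULF, E LUHF, CD LULF
```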
Question
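Why does Umining find fewer HUIs than FUM?
In principle, FUM's candidate set should be smaller than Umining's, so why does FUM mine more HUIs? If FUM's candidate set were larger than Umining's, then Umining would miss some itemsets, and only then would the result make sense.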
Thank you, everyone!