MINING DISCRIMINATIVE ITEMSETS IN DATA
STREAMS USING DIFFERENT WINDOW MODELS
Mr Majid Seyfi
Associate Professor Shlomo Geva, Associate Professor Yue Xu
Submitted in fulfilment of the requirements for the degree of
Master of Philosophy
Faculty of Science and Engineering
Queensland University of Technology
2018
To my supervisors, parents and friends
Mining Discriminative Itemsets in Data Streams using Different Window Models Page i
© 2018 Queensland University of Technology-QUT, Science and Engineering Faculty Page i
Keywords
Anomaly detection, Association classification, Concept drift, Data mining, Data stream
mining, Discriminative classification, Discriminative itemset, Discriminative rule,
Efficiency, Emerging patterns, Frequent pattern, Heuristic, Optimization, Pattern mining,
Personalization, Prediction mining, Prefix tree, Recommendation, Sliding window model,
Sparse itemsets, Tilted-time window model, Transaction.
Abstract
With the availability of big data in areas such as social networks, online marketing systems and
stock markets, data mining techniques have been successfully used for knowledge discovery. To
improve user satisfaction, businesses commonly create profiles for their customers and keep track
of user activities. Usually, the historical activities of each customer are stored as transactional
data streams. The knowledge embedded in these data streams is applicable to describing and
predicting trends, and can be used for different purposes such as personalization, anomaly
detection, target marketing and forecasting upcoming trends.
Frequent itemset mining, sequential rule mining and decision tree methods are popular
techniques used for description and prediction mining. However, most of the proposed methods
are designed for static datasets or a single data stream. Algorithms based on a single data
stream lose the part of the knowledge that relates to interactions between data streams. The
patterns embedded in, and differences between, data streams can be used to define effective data
mining methods.
This thesis, for the first time, develops algorithms for mining discriminative itemsets in
data streams. Discriminative itemsets are itemsets that are frequent in one data stream and
whose frequencies in that stream are much higher than in the other data streams in the application
domain. Discriminative itemsets are more distinctive and therefore more useful for data stream
comparison. This thesis develops novel discriminative itemset mining algorithms using the
tilted-time window model and the sliding window model in data streams. Determinative
heuristics are applied to the proposed algorithms for efficient mining of discriminative
itemsets in data streams with a refined approximate bound. The extracted
discriminative itemsets can be used to enhance the effectiveness of description and prediction
in data stream applications. The proposed algorithms can also be applied to targeted static
datasets. The empirical analysis shows that the proposed algorithms have good time and space
complexities with high accuracy in large data streams.
In summary, this thesis addresses some of the challenges of pattern mining across more
than one data stream for the purposes of description and prediction in data streams.
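The selection criterion behind discriminative itemsets can be illustrated with a small, naive batch sketch. This is my own illustration, not the thesis's DISTree or DISSparse algorithms; the function name and the parameters `phi` (support threshold) and `theta` (discriminative level) are hypothetical stand-ins for the notation 𝜑 and 𝜃 used in the thesis:

```python
from itertools import combinations
from collections import Counter

def discriminative_itemsets(target, other, phi=0.5, theta=2.0, max_len=3):
    """Return itemsets frequent in `target` whose relative support there
    is at least `theta` times their relative support in `other`."""
    def supports(stream):
        counts = Counter()
        for txn in stream:
            items = sorted(set(txn))
            for k in range(1, min(len(items), max_len) + 1):
                for combo in combinations(items, k):
                    counts[combo] += 1
        return counts

    n1, n2 = len(target), len(other)
    sup1, sup2 = supports(target), supports(other)
    result = {}
    for itemset, c1 in sup1.items():
        r1 = c1 / n1
        if r1 < phi:
            continue  # not frequent in the target stream
        r2 = sup2.get(itemset, 0) / n2
        # itemsets absent from the other stream are treated as
        # maximally discriminative
        if r2 == 0 or r1 / r2 >= theta:
            result[itemset] = (r1, r2)
    return result
```

A real data stream miner would maintain a prefix-tree synopsis incrementally over batches rather than enumerating every sub-itemset per batch; this sketch only makes the selection criterion concrete.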
Table of Contents
Keywords ................................................................................................................................................ ii
Abstract ................................................................................................................................................. iii
Table of Contents ................................................................................................................................... iii
List of Figures ......................................................................................................................................... vi
List of Tables ........................................................................................................................................... ix
List of Abbreviations ................................................................................................................................ x
List of Notations ..................................................................................................................................... xi
Statement of Original Authorship ........................................................................................................ xiv
Acknowledgments ................................................................................................................................. xv
1 CHAPTER 1: INTRODUCTION ........................................................................................................ 1
1.1 Background .................................................................................................................................. 1
1.2 Problem statement and objectives .............................................................................................. 3
1.2.1 Research problems ........................................................................................................... 3
1.2.2 Research objectives .......................................................................................................... 5
1.3 Research significance ................................................................................................................... 8
1.4 Research limitation ...................................................................................................................... 8
1.5 Contributions ............................................................................................................................... 9
1.6 Publications .................................................................................................................................. 9
1.7 Thesis outline ............................................................................................................................. 10
2 CHAPTER 2: LITERATURE REVIEW .............................................................................................. 12
2.1 Data stream mining .................................................................................................................... 12
2.2 Frequent itemset mining in data stream ................................................................................... 13
2.2.1 Window models and update intervals in data stream.................................................... 13
2.2.2 Frequent itemset mining algorithms in data stream ...................................................... 15
2.3 Contrast data mining ................................................................................................................. 18
2.3.1 Emerging patterns .......................................................................................................... 19
2.3.1.1 Different types of emerging patterns ............................................................................. 22
2.3.1.2 Delta-discriminative emerging patterns ......................................................................... 24
2.3.1.3 Differences between emerging patterns and discriminative itemsets ........................... 26
2.3.2 Other contrast patterns .................................................................................................. 27
2.3.3 Discriminative itemset mining in data streams .............................................................. 28
2.3.4 Discriminative item mining in data streams ................................................................... 29
2.4 Association rule mining .............................................................................................................. 32
2.4.1 Association rule mining in data streams ......................................................................... 33
2.4.2 Classification rule mining ................................................................................................ 33
2.5 Verifying the new knowledge contributed to the area .............................................................. 35
2.6 Research gaps and implications ................................................................................................. 37
3 CHAPTER 3: MINING DISCRIMINATIVE ITEMSETS IN DATA STREAMS .......................................... 39
3.1 Existing works ............................................................................................................................ 39
3.2 Research problem ...................................................................................................................... 42
3.2.1 Formal definition ............................................................................................................ 43
3.2.2 Discriminative itemset mining ........................................................................................ 44
3.3 DISTree method ......................................................................................................................... 44
3.3.1 DISTree construction and pruning .................................................................................. 46
3.3.2 DISTree algorithm ........................................................................................................... 51
3.3.3 DISTree summary ............................................................................................................ 54
3.4 DISSparse method ...................................................................................................................... 54
3.4.1 Mining discriminative itemsets using sparse prefix tree ................................................ 56
3.4.1.1 Potential discriminative itemsets generation using minimized DISTree ........................ 61
3.4.1.2 Conditional FP-Tree expansion ....................................................................................... 63
3.4.1.3 Tuning non-discriminative subsets of discriminative itemsets ....................................... 65
3.4.2 DISSparse Algorithm ....................................................................................................... 66
3.4.3 DISSparse Algorithm Complexity .................................................................................... 68
3.4.4 Modified DISSparse and modified DPMiner ................................................................... 68
3.4.5 DISSparse summary ........................................................................................................ 73
3.5 Chapter summary ...................................................................................................................... 73
4 CHAPTER 4: MINING DISCRIMINATIVE ITEMSETS IN DATA STREAMS USING THE TILTED-TIME WINDOW MODEL ........................................................................................................................ 76
4.1 Existing works ............................................................................................................................ 76
4.2 Research problem ...................................................................................................................... 78
4.2.1 Problem formal definition .............................................................................................. 80
4.2.2 Discriminative itemset mining using the tilted-time window model ............................. 83
4.3 Tilted-time window model ......................................................................................................... 84
4.3.1 Tilted-time window model updating .............................................................................. 86
4.3.2 Discriminative itemsets approximate bound .................................................................. 88
4.3.2.1 Maintaining the discriminative itemsets in the tilted-time window model ................... 89
4.3.2.2 Improving the accuracy using relaxation ratio ............................................................... 91
4.3.2.3 Tail pruning in the tilted-time window model ................................................................ 93
4.4 H-DISSparse method .................................................................................................................. 95
4.4.1 H-DISSparse Algorithm ................................................................................................... 95
4.4.2 H-DISSparse Algorithm Complexity ................................................................................ 97
4.5 Chapter summary ...................................................................................................................... 97
5 CHAPTER 5: MINING DISCRIMINATIVE ITEMSETS IN DATA STREAMS USING THE SLIDING WINDOW MODEL ...................................................................................................................... 100
5.1 Existing works .......................................................................................................................... 100
5.2 Research problem .................................................................................................................... 104
5.2.1 Problem formal definition ............................................................................................ 105
5.2.2 Discriminative itemset mining using the sliding window model .................................. 108
5.3 Offline Sliding window model .................................................................................................. 108
5.3.1 Mining discriminative itemsets in sliding window using prefix tree ............................. 109
5.3.2 Incremental offline sliding window .............................................................................. 110
5.3.2.1 Initializing the offline sliding window ........................................................................... 111
5.3.2.2 Stable and updated subsets in offline sliding window ................................................. 113
5.3.2.3 S-DISStream tuning and pruning in offline sliding window .......................................... 121
5.4 Online Sliding window model .................................................................................................. 123
5.4.1 Mining discriminative itemsets in online sliding window using queue structure ......... 123
5.4.2 Improving the accuracy using relaxation ratio ............................................................. 124
5.5 S-DISSparse method ................................................................................................................. 125
5.5.1 S-DISSparse Algorithm .................................................................................................. 126
5.5.2 S-DISSparse Algorithm Complexity ............................................................................... 128
5.6 Chapter summary .................................................................................................................... 128
6 CHAPTER 6: EVALUATION AND ANALYSIS ................................................................................ 131
6.1 Benchmarking .......................................................................................................................... 131
6.1.1 Evaluation benchmarks ................................................................................................ 131
6.1.1.1 Batch processing benchmarks ...................................................................................... 132
6.1.1.2 Data stream processing benchmarks ............................................................................ 134
6.1.2 Evaluation environment ............................................................................................... 134
6.2 Batch processing ...................................................................................................................... 135
6.2.1 Evaluation on synthetic datasets .................................................................................. 135
6.2.2 Evaluation on real datasets .......................................................................................... 142
6.2.3 Discussion on 𝜹-discriminative emerging patterns ...................................................... 145
6.2.3.1 Evaluation on modified DISSparse and modified DPM ................................................. 146
6.2.3.2 Evaluation on real datasets with modified DISSparse and modified DPM ................... 149
6.2.4 Discussion ..................................................................................................................... 149
6.3 Tilted-time window model ....................................................................................................... 150
6.3.1 Evaluation on synthetic datasets .................................................................................. 151
6.3.1.1 Approximation in discriminative Itemsets in the tilted-time window model ............... 155
6.3.1.2 Discriminative Itemsets in the tilted-time window model without tail pruning .......... 157
6.3.2 Evaluation on real datasets .......................................................................................... 158
6.3.2.1 Scalability on datasets with less concept drifts in the tilted-time window model ....... 159
6.3.2.2 Approximation in discriminative Itemsets in the tilted-time window model ............... 162
6.3.3 Discussion ..................................................................................................................... 164
6.4 Sliding window model .............................................................................................................. 165
6.4.1 Evaluation on synthetic datasets .................................................................................. 166
6.4.1.1 Offline sliding discriminative itemsets .......................................................................... 166
6.4.1.2 Online sliding discriminative itemsets .......................................................................... 169
6.4.2 Evaluation on real datasets .......................................................................................... 171
6.4.3 Discussion ..................................................................................................................... 175
6.5 Chapter summary .................................................................................................................... 176
7 CHAPTER 7: CONCLUSIONS ...................................................................................................... 178
7.1 Summary of contributions ....................................................................................................... 179
7.2 Summary of findings ................................................................................................................ 181
7.3 Connections between the three tasks ..................................................................................... 182
7.4 Limitations and the future research issues .............................................................................. 182
REFERENCES .............................................................................................................................. 184
List of Figures
Figure 1.1: Research methodology and thesis structure ........................................................................ 10
Figure 2.1: Landmark window model ................................................................................................... 13
Figure 2.2: Damped window model ...................................................................................................... 13
Figure 2.3: Sliding window model ........................................................................................................ 14
Figure 2.4: Tilted-time window model .................................................................................................. 14
Figure 2.5: Support plan for emerging patterns (Dong and Li 1999) .................................................... 21
Figure 3.1 Header-Table and FP-Tree structures for input batch of transactions ................................. 48
Figure 3.2 Conditional FP-Tree of Header-Table item 𝑎 ..................................................................... 49
Figure 3.3 Header-Table and DISTree structure without pruning (the full prefix tree size is only for
display and is not generated) ........................................................................................................ 50
Figure 3.4 Final DISTree structure and the reported discriminative itemsets ....................................... 51
Figure 3.5 Conditional FP-Tree of Header-Table item 𝑎 associated with the top ancestor on the first
level .............................................................................................................................................. 56
Figure 3.6 Minimized DISTree generated from the left-most subtree in Figure 3.5 ............................. 62
Figure 3.7 Expanded conditional FP-Tree of Header-Table item 𝑎 after processing the first subtree . 64
Figure 3.8 Minimized DISTree generated out of the potential discriminative subsets of the left-most
subtree in conditional FP-Tree for Header-Table item 𝑎 ............................................................. 64
Figure 3.9 Expanded modified conditional FP-Tree of Header-Table item a after processing the
second subtree .............................................................................................................................. 65
Figure 3.10 A sample of discriminative itemsets distribution with different discriminative levels in
market basket monitoring application .......................................................................................... 75
Figure 4.1 Tilted-time window frames .................................................................................................. 81
Figure 4.2 Logarithmic tilted-time window structure (Giannella et al. 2003) ....................................... 84
Figure 4.3 A sample H-DISStream based on Example 3.1 with the built-in tilted-time window model ..... 85
Figure 4.4 Tilted-time window model updating .................................................................................... 87
Figure 5.1 Sliding window model 𝑊 made of three partitions 𝑃 ........................................................ 106
Figure 5.2 Header-Table and S-FP-Tree structures by the first partition 𝑃1 ...................................... 112
Figure 5.3 Header-Table and S-DISStream structures by the first partition 𝑃1 .................................. 113
Figure 5.4 Header-Table and updated S-FP-Tree structures by adding second partition 𝑃2 .............. 113
Figure 5.5 Conditional FP-Tree of Header-Table item 𝑎 updated by partition 𝑃2 ............................. 117
Figure 5.6 Updated S-DISStream after processing 𝑆𝑡𝑎𝑏𝑙𝑒(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐) in conditional FP-Tree for
Header-Table item 𝑎 .................................................................................................................. 118
Figure 5.7 Expanded conditional FP-Tree of Header-Table item 𝑎 updated by partition 𝑃2 after
processing the first subtree ......................................................................................................... 120
Figure 5.8 Updated S-DISStream after processing potential discriminative subsets of the left-most
subtree in conditional FP-Tree for Header-Table item 𝑎 ........................................................... 120
Figure 5.9 Expanded conditional FP-Tree of Header-Table item a updated by partition 𝑃2 after
processing the second subtree .................................................................................................... 121
Figure 5.10 Updated S-DISStream after processing potential discriminative subsets of the left-most
subtree in conditional FP-Tree for Header-Table item 𝑎 ........................................................... 121
Figure 5.11 Final S-DISStream after offline sliding based on partition 𝑃2 ......................................... 122
Figure 5.12 Transaction-List made of partitions fit in the online sliding window frame 𝑊 ............... 123
Figure 6.1 Scalability with discriminative level 𝜃 for 𝐷1 (support threshold 𝜑 = 0.0001) ............... 136
Figure 6.2 Scalability with support threshold 𝜑 for 𝐷1(discriminative level 𝜃 = 10) ....................... 137
Figure 6.3 Number of the discriminative itemsets with different dataset length ratios (𝑛2/𝑛1) (discriminative level 𝜃 = 10 and support threshold 𝜑 = 0.0001) ............................................. 137
Figure 6.4 Scalability with different dataset length ratios (𝑛2/𝑛1) (discriminative level 𝜃 = 10 and
support threshold 𝜑 = 0.0001) .................................................................................................. 138
Figure 6.5 Scalability with discriminative level 𝜃 for 𝐷2 (support threshold 𝜑 = 0.0001) ............... 138
Figure 6.6 Scalability with discriminative level 𝜃 for 𝐷3 (support threshold 𝜑 = 0.0001) ............... 139
Figure 6.7 Number of the discriminative itemsets with different 𝜃 for 𝐷3 (support threshold 𝜑 = 0.0001) ....................................................................................................................... 139
Figure 6.8 Scalability with discriminative level 𝜃 for 𝐷3 (support threshold 𝜑 = 0.00001) ............. 140
Figure 6.9 Number of the discriminative itemsets with discriminative level 𝜃 for 𝐷3 (support
threshold 𝜑 = 0.0001) ............................................................................................................... 140
Figure 6.10 Frequent items distribution in 𝑆1 ..................................................................................... 141
Figure 6.11 Frequent items distribution in 𝑆2 ..................................................................................... 141
Figure 6.12 Frequent items distribution in discriminative itemsets in the modified 𝐷1 with
discriminative level 𝜃 = 10 and support threshold 𝜑 = 0.0001 ............................................... 141
Figure 6.13 Frequent items distribution in discriminative itemsets in the modified 𝐷1 with
discriminative level 𝜃 = 10 and support threshold 𝜑 = 0.0001 ............................................... 142
Figure 6.14 Scalability with discriminative level 𝜃 for susy dataset (support threshold 𝜑 = 0.01) ... 143
Figure 6.15 Scalability with support threshold 𝜑 for susy dataset (discriminative level 𝜃 = 2) ........ 143
Figure 6.16 Scalability with discriminative level 𝜃 for accident dataset (support threshold 𝜑 = 0.01) ..... 144
Figure 6.17 Scalability with discriminative level 𝜃 for mushroom dataset (support threshold 𝜑 = 0.001) ......................................................................................................................... 145
Figure 6.18 Scalability with 𝑚𝑖𝑛_𝑠𝑢𝑝 for 𝐷1 (δ = 2) ........................................................................ 147
Figure 6.19 Scalability with δ for 𝐷1 (𝑚𝑖𝑛_𝑠𝑢𝑝 = 50) ...................................................................... 147
Figure 6.20 Scalability with 𝑚𝑖𝑛_𝑠𝑢𝑝 for 𝐷3 (δ = 50) ...................................................................... 148
Figure 6.21 Scalability with δ for 𝐷3 (𝑚𝑖𝑛_𝑠𝑢𝑝 = 50) ...................................................................... 148
Figure 6.22 Scalability of batch processing not considering the tilted-time window model updating 153
Figure 6.23 Tilted-time window model updating time complexity ..................................................... 154
Figure 6.24 Time complexity of H-DISTree and H-DISSparse algorithms ........................................ 154
Figure 6.25 H-DISStream structure size .............................................................................................. 155
Figure 6.26 Scalability of H-DISSparse algorithm by relaxation of 𝛼 = 1, 𝛼 = 0.9 and 𝛼 = 0.75 ... 156
Figure 6.27 Number of sub-discriminative itemsets by relaxation of 𝛼 = 0.9 and 𝛼 = 0.75 ............. 156
Figure 6.28 Scalability of H-DISSparse algorithm by eliminating Corollary 4-1 and Corollary 4-3 ..... 158
Figure 6.29 Scalability of batch processing not considering the tilted-time window model updating 161
Figure 6.30 Time complexity of H-DISTree and H-DISSparse algorithms ........................................ 162
Figure 6.31 H-DISStream structure size .............................................................................................. 162
Figure 6.32 Scalability of H-DISSparse algorithm by relaxation of 𝛼 = 1 and 𝛼 = 0.9 .................... 163
Figure 6.33 Number of sub-discriminative itemsets by relaxation of 𝛼 = 0.9 ................................... 163
Figure 6.34 S-DISSparse and DISSparse time complexity for 𝐷1 (window frame 𝑊 = 25) ............. 167
Figure 6.35 S-DISSparse space complexity in offline sliding for 𝐷1 (𝑊 = 25) ................................ 167
Figure 6.36 S-DISSparse and DISSparse time and space complexity for 𝐷2 (window frame 𝑊 = 10) ..... 168
Figure 6.37 S-DISSparse time complexity in online and offline sliding for 𝐷1 (𝑊 = 25) ................ 169
Figure 6.38 S-DISSparse time complexity for online and offline sliding for 𝐷1 (𝑊 = 25) with
different relaxation of 𝛼 ............................................................................................................. 170
Figure 6.39 S-DISStream size for 𝐷1 (𝑊 = 25) by different relaxation of 𝛼 .................................... 170
Figure 6.40 Number of itemsets whose tag is changed for 𝐷1 (𝑊 = 25) by different relaxation of 𝛼 ..... 171
Figure 6.41 Number of discriminative and sub-discriminative itemsets for 𝐷1 (𝑊 = 25) by different
relaxation of 𝛼 ............................................................................................................................ 171
Figure 6.42 S-DISSparse and DISSparse time complexity for susy dataset (window frame 𝑊 = 20) ..... 172
Figure 6.43 S-DISSparse space complexity in offline sliding for susy dataset (𝑊 = 20) .................. 173
Figure 6.44 S-DISSparse time complexity in online and offline sliding for susy dataset (𝑊 = 20) .. 173
Figure 6.45 S-DISStream size for susy dataset (𝑊 = 20) by different relaxation of 𝛼 ...................... 174
Figure 6.46 Number of itemsets whose tag is changed for susy dataset (𝑊 = 20) by different
relaxation of 𝛼 ............................................................................................................ 174
Figure 6.47 Number of discriminative and sub-discriminative itemsets for susy dataset (𝑊 = 20) by
different relaxation of 𝛼 ............................................................................................................. 174
Mining Discriminative Itemsets in Data Streams using Different Window Models Page ix
© 2018 Queensland University of Technology-QUT, Science and Engineering Faculty Page ix
List of Tables
Table 2.1 Data stream frequent itemset mining algorithms ................................................................... 15
Table 3.1 An input batch in data streams .............................................................................................. 48
Table 3.2 Desc-Flist order of frequent items in target data stream 𝑆1 .................................................. 48
Table 5.1 The first input batch in data streams fits in partition 𝑃1 ..................................................... 112
Table 5.2 Desc-Flist order of frequent items in target data stream 𝑆1 in the first batch...................... 112
Table 5.3 The second input batch in data streams fits in partition 𝑃2 ................................................. 113
Table 6.1 The number of discriminative itemsets in the tilted-time window model ........................... 152
Table 6.2 The number of discriminative itemsets in the tilted-time window model ........................... 157
Table 6.3 The number of discriminative itemsets in the tilted-time window model ........................... 160
Table 6.4 The number of discriminative itemsets in the tilted-time window model ........................... 164
List of Abbreviations
DISTree – discriminative tree
DISSparse – discriminative sparse
H-DISSparse – historical discriminative sparse
S-DISSparse – sliding discriminative sparse
FP-Tree – frequent pattern tree
FP-Growth – frequent pattern growth
H-DISStream – historical discriminative stream
S-FP-Tree – sliding frequent pattern tree
S-DISStream – sliding discriminative stream
List of Notations
𝑺𝒊: Data stream i
𝑩: Batch of transactions
𝑻: Transaction made of items
∑: Alphabet set of items
𝒆: Item
𝑰: Itemset
𝒇𝒊(𝑰): Frequency of the itemset I in data stream 𝑆𝑖
𝑰(𝒂𝒏,𝒎): Itemset ending with 𝑎 with frequency of 𝑛 in 𝑆𝑖 and 𝑚 in 𝑆𝑗
𝒏𝒊: Length of data stream 𝑆𝑖
𝒓𝒊(𝑰): Frequency ratio of itemset 𝐼 in data stream 𝑆𝑖
𝜽: Discriminative level
𝝋: Support threshold
𝑹𝒊𝒋(𝑰): Frequency ratio of itemset 𝐼 in target data stream 𝑆𝑖 vs general data stream 𝑆𝑗
𝑫𝑰𝒊𝒋: Discriminative itemsets in a single batch of transactions
𝑵𝑫𝑰𝒊𝒋: Non-discriminative itemsets in a single batch of transactions
𝑫𝒆𝒔𝒄 − 𝑭𝒍𝒊𝒔𝒕: Descending order of items based on their frequencies in the first batch
𝑺𝒖𝒃𝒕𝒓𝒆𝒆𝒓𝒐𝒐𝒕: Branches under the same root in the first level of conditional FP-Tree and ending
with the processing Header-Table item
𝑯𝒆𝒂𝒅𝒆𝒓_𝑻𝒂𝒃𝒍𝒆_𝒊𝒕𝒆𝒎𝒔(𝑺𝒖𝒃𝒕𝒓𝒆𝒆𝒓𝒐𝒐𝒕): The set of Header-Table items which are linked
under their subtree root node using Header-Table links
𝒊𝒕𝒆𝒎𝒔𝒆𝒕𝒔(𝒓𝒐𝒐𝒕, 𝒂): The set of itemsets in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 ending with a header item 𝑎 ∊
𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡)
𝐌𝐚𝐱_𝐟𝐫𝐞𝐪𝒊(𝒓𝒐𝒐𝒕, 𝒂): The maximum frequency of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎) in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 in the
target data stream 𝑆𝑖
𝐌𝐚𝐱_𝐝𝐢𝐬_𝐯𝐚𝐥𝐮𝐞(𝒓𝒐𝒐𝒕, 𝒂): The maximum discriminative value of 𝐼𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎 ) in
𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡
𝑷𝒐𝒕𝒆𝒏𝒕𝒊𝒂𝒍(𝑺𝒖𝒃𝒕𝒓𝒆𝒆𝒓𝒐𝒐𝒕): Potential discriminative itemsets in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡
𝑰𝒏𝒕𝒆𝒓𝒏𝒂𝒍 𝒏𝒐𝒅𝒆𝒓𝒐𝒐𝒕: The internal nodes in the paths between 𝑟𝑜𝑜𝑡 of 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 and the
𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡)
𝒊𝒕𝒆𝒎𝒔𝒆𝒕𝒔(𝒓𝒐𝒐𝒕, 𝒊𝒏, 𝒂): The set of itemsets in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 ending with a header item
𝑎 ∊ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) with subset of internal node 𝑖𝑛
𝐌𝐚𝐱_𝐟𝐫𝐞𝐪𝒊(𝒓𝒐𝒐𝒕, 𝒊𝒏, 𝒂): The maximum frequency of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡
in the target data stream 𝑆𝑖
𝐌𝐚𝐱_𝐝𝐢𝐬_𝐯𝐚𝐥𝐮𝐞(𝒓𝒐𝒐𝒕, 𝒊𝒏, 𝒂): The maximum discriminative value of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) in
𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡
𝑷𝒐𝒕𝒆𝒏𝒕𝒊𝒂𝒍(𝒊𝒏): Potential discriminative itemsets with subset of internal node 𝑖𝑛 in
𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡
𝐌𝐚𝐱_𝐟𝐫𝐞𝐪(𝒓𝒐𝒐𝒕, 𝒂): The maximum frequency of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎) in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 in the
datasets
𝐌𝐢𝐧_𝐝𝐞𝐥𝐭𝐚_𝐯𝐚𝐥𝐮𝐞(𝒓𝒐𝒐𝒕, 𝒂): The minimum delta discriminative value of 𝐼𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎)
in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡
𝐌𝐚𝐱_𝐟𝐫𝐞𝐪(𝒓𝒐𝒐𝒕, 𝒊𝒏, 𝒂): The maximum frequency of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡
in the datasets
𝐌𝐢𝐧_𝐝𝐞𝐥𝐭𝐚_𝐯𝐚𝐥𝐮𝐞(𝒓𝒐𝒐𝒕, 𝒊𝒏, 𝒂): The minimum delta discriminative value of
𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡
𝑾𝒌: Window frame k in the tilted-time window model
𝒇𝒊𝒌(𝑰): Frequency of the itemset I in data stream 𝑆𝑖 in window frame 𝑊𝑘
𝒏𝒊𝒌: Length of data stream 𝑆𝑖 in window frame 𝑊𝑘
𝒓𝒊𝒌(𝑰): Frequency ratio of itemset 𝐼 in data stream 𝑆𝑖 in window frame 𝑊𝑘
𝑹𝒊𝒋𝒌 (𝑰): Frequency ratio of itemset 𝐼 in target data stream 𝑆𝑖 vs general data stream 𝑆𝑗 in window
frame 𝑊𝑘
𝑫𝑰𝒊𝒋𝒌 : Discriminative itemsets in tilted-time window model with 𝑘 window frames
𝒇𝒊𝟎..𝒌(𝑰): Total frequency of the itemset I in data stream 𝑆𝑖 in window frame 𝑊0 to 𝑊𝑘
𝒏𝒊𝟎..𝒌: Total length of data stream 𝑆𝑖 in window frame 𝑊0 to 𝑊𝑘
𝜶: Relaxation threshold
𝑺𝑫𝑰𝒊𝒋𝒌 : Sub-discriminative itemsets in tilted-time window model with 𝑘 window frames
𝑵𝑫𝑰𝒊𝒋: Non-discriminative itemsets in data stream 𝑆𝑖 against data stream 𝑆𝑗 in the tilted-time
window model
𝒇𝒊(𝑰) < 𝒙, 𝒚 >: Frequency of the itemset 𝐼 in the group of continuous batches 𝐵𝑥 to 𝐵𝑦 with
𝑥 ≥ 𝑦, in data stream 𝑆𝑖
𝑷: Partition
𝑾: Sliding window model
𝒇𝒊𝒘(𝑰): Frequency of the itemset I in data stream 𝑆𝑖 in sliding window frame 𝑊
𝒏𝒊𝒘: Length of data stream 𝑆𝑖 in sliding window frame 𝑊
𝒓𝒊𝒘(𝑰): Frequency ratio of itemset 𝐼 in data stream 𝑆𝑖 in sliding window frame 𝑊
𝑹𝒊𝒋𝒘(𝑰): Frequency ratio of itemset 𝐼 in target data stream 𝑆𝑖 vs general data stream 𝑆𝑗 in sliding
window frame 𝑊
𝑫𝑰𝒊𝒋𝒘: Discriminative itemsets in sliding window frame 𝑊
𝑺𝑫𝑰𝒊𝒋𝒘: Sub-discriminative itemsets in sliding window frame 𝑊
𝑵𝑫𝑰𝒊𝒋𝒘: Non-discriminative itemsets in sliding window frame 𝑊
𝑺𝒕𝒂𝒃𝒍𝒆(𝑺𝒖𝒃𝒕𝒓𝒆𝒆𝒓𝒐𝒐𝒕): Stable 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 during offline sliding
𝑺𝒕𝒂𝒃𝒍𝒆(𝒊𝒏): Stable itemsets with subset of 𝑖𝑛 in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 during offline sliding
QUT Verified Signature
August 2018
Acknowledgments
I would like to express my gratitude to all those who gave me the chance to complete
this thesis.
Firstly, I would like to thank Associate Prof Shlomo Geva, as my principal supervisor,
for his specific guidance and encouragement throughout this candidature. Many thanks to my
associate supervisor, Associate Prof Yue Xu, for her generous support and comments on my
work during this research work. I appreciate all the help and support I also received from
Associate Prof Richi Nayak, as co-author of my publications. Thank you for providing your
valuable suggestions and encouragement throughout my study.
Special thanks to the Science and Engineering Faculty, QUT, which has provided me
with a comfortable research environment and thoughtful administrative help. I would like to
express my appreciation to CRC Smart Services and again, the Science and Engineering Faculty,
QUT, for the partial financial funding for this research work.
Copyediting and proofreading services for this thesis were provided and are
acknowledged, according to the guidelines laid out in the University-endorsed national policy
guidelines for the editing of research theses.
Thank you to all my colleagues who shared the research offices with me, and the
members of my research group, for participating in helpful discussions and being important
friends along my study journey. Finally, I would like to extend my heartfelt appreciation to my
parents for their unconditional love. I thank them for their support for my study overseas, and for
their consistently positive advice.
Chapter 1: Introduction
This chapter presents a general outline of the thesis, covering the research
background in Section 1.1 and the research statement in Section 1.2. It highlights the overall
research significance in Section 1.3, the research limitations in Section 1.4 and the contributions in
Section 1.5. It also lists the publications arising from this research in Section 1.6 and the
thesis outline in Section 1.7.
1.1 BACKGROUND
The large datasets collected in different domains give knowledge discovery and data
mining tasks the opportunity to explain natural processes in clearer and more understandable
ways. Turning fast-growing data collections into accessible and actionable knowledge is one of
the main challenges that organizations and individuals face. The main goal of data mining
techniques is to find easy-to-understand, novel patterns that reveal the underlying concepts of
large datasets. These techniques, however, are not applicable in current real applications unless
they enable exploration, explanation and summarization of the datasets in concise, efficient and
understandable forms. Pattern mining is the task of discovering patterns, showing the important
content and structure of the datasets, and presenting them in understandable forms for further use.
Pattern mining is useful in many real-life applications such as personalization (Tseng and
Lin 2006; Djahantighi et al. 2010), anomaly detection (Lin et al. 2010), enhancing marketing
strategies (Hollfelder, Oria and Özsu 2000; Huang et al. 2008; Prasad and Madhavi 2012) and
forecasting upcoming trends (Mori et al. 2005). Companies can make more focused future
plans for their businesses based on descriptions of the patterns observed for each customer or
group of customers, predictions of their subsequent patterns, and information about general
trends. Using this sort of knowledge, they can increase their marketing strength and offer better
services to their customers. In many areas such as social networks, online marketing systems,
stock markets, online news agencies and search engines, users have profiles and their activities
are tracked in the form of transactional data streams. The embedded knowledge in these historical
data streams is used for understanding the current patterns discovered in users' activities as well
as for predicting their future patterns.
A data stream is defined as a sequence of transactions arriving at high speed
over time (Manku and Motwani 2002). Compared to traditional static datasets, data streams
have more complexities: they are big, they grow fast, their embedded knowledge
changes over time, and queries usually need real-time answers (Manku and Motwani 2002).
Currently, data stream mining methods such as frequent itemset mining (Giannella et al. 2003;
Hyuk and Lee 2003; Li, Lee and Shan 2004; Yu et al. 2004; Chi et al. 2004; Chang and Lee
2005; Leung and Khan 2006; Lin, Hsueh and Hwang 2008; Li and Lee 2009; Manku 2016) and
sequential rule mining techniques (Shin and Lee 2008; Çokpınar and Gündem 2012; Ahmed et
al. 2012) have been used for knowledge discovery from data streams. The extracted knowledge
from these methods is used for the purpose of data stream pattern description and data stream
pattern prediction (Hollfelder, Oria and ¨Ozsu 2000; Mori et al. 2005; Tseng and Lin 2006;
Eichinger, Nauck and Klawonn 2006; Aggarwal 2007; Huang et al. 2008; Cheng, Ke and Ng
2008; Lin et al. 2010; Prasad and Madhavi 2012; Manku 2016). However, most of the proposed
methods work on a single data stream (Aggarwal 2007; Cheng, Ke and Ng 2008; Manku 2016).
Using data mining techniques on more than one data stream helps to understand the streams
better, as data streams in the same domain usually relate to each other and to
general trends. Periodic frequent items, frequent itemsets, association rules and changes in
user activities can all be examined in comparison with the general trends.
In dynamic tracing of stock market fluctuations, one may be interested in itemsets that
occur together much more frequently in one stock market than in another. These itemsets are
useful for identifying the specific set of items that are of high interest in one market compared
to the other markets. They can also be useful for anomaly detection and personalization.
In network traffic measurement, in order to detect fraud or intrusion, the concurrent
activities of one user that are more frequent than the same group of activities in the whole
network are investigated; for example, identifying the users who have a set of activities in their
log files more frequently than the log files of the rest of the population. What is looked for is a group
of web pages visited together, by a specific user or group of users, more frequently in comparison
to another user or group, as well as the queries that are asked many more times in one
geographical area compared to another area, or the whole world. This can be used for better
optimization, localization and suggestion. To find the success factors of a project, one looks for
the itemsets that occur more frequently in the documents of the successful project
compared to the failed one. Building a personalized dynamic news delivery service involves looking at a
group of words that are more common in the news read by a specific user, compared to the
same group of words in a collection of all news, and updating the system with changes in user
preferences. An essential issue inherent in all the mentioned applications is to find itemsets that
can distinguish the target stream from all others (Lin et al. 2010). There are different data mining
techniques proposed for dataset comparison.
Emerging patterns (EPs) (Dong and Li 1999) were proposed for differentiating between
datasets by comparing their frequent itemsets. EPs are defined as itemsets whose frequencies
grow significantly from one dataset to another. EPs can highlight the emerging
trends in time-stamped datasets and also show the differences between data classes. The same
idea is useful for differentiating between data streams. In (Lin et al. 2010), three algorithms
were proposed for mining discriminative items in data streams: items are discovered
whose frequencies are much higher in the target data stream than in the general data
stream. Discriminative item mining, however, yields results of limited meaning, as single items
do not carry much knowledge with them. Compared to items, itemsets have more
value in the mining process by themselves; association rules can also be extracted from them
as another kind of valuable knowledge. This emphasizes the need to develop more effective
methods for data mining from more than one data stream.
The 𝛿-discriminative emerging patterns are defined as a special type of useful emerging
pattern (Li, Liu and Wong 2007) that occurs in only one dataset (data class) with almost no
occurrence in the other datasets (data classes). The 𝛿-discriminative emerging patterns are frequent
in the target dataset with frequency less than 𝛿 in the other datasets. Compared to the discriminative
itemsets defined in this thesis, 𝛿-discriminative emerging pattern mining misses some useful
emerging patterns. The discriminative itemsets are determined based on their relative
occurrences in the target and general datasets: if the support of a pattern is relatively different in
the target dataset and the general dataset, the pattern is considered discriminative. However, the 𝛿-
discriminative emerging patterns are determined based on their frequency (i.e., < 𝛿) in the
general dataset; for example, discriminative itemsets that are frequent in both the target dataset
and the general dataset, with relatively different frequencies in the two datasets, are not discovered
as 𝛿-discriminative emerging patterns.
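The distinction can be made concrete with a toy numeric sketch. All counts and thresholds below (the stream sizes, 𝜃, 𝜑 and 𝛿) are invented for illustration and are not taken from the thesis's experiments:

```python
# Hypothetical counts illustrating the distinction above: an itemset that is
# frequent in BOTH datasets can be discriminative by relative support while
# NOT being delta-discriminative, because its raw count in the general
# dataset exceeds delta. Every number here is made up.

n_i, n_j = 10_000, 10_000    # lengths of target stream S_i and general stream S_j
f_i, f_j = 900, 150          # frequency of the itemset I in each stream
theta, phi, delta = 3.0, 0.05, 10

r_i = f_i / n_i              # relative support in the target stream: 0.09
r_j = f_j / n_j              # relative support in the general stream: 0.015
ratio = r_i / r_j            # R_ij(I), approximately 6.0

# discriminative by the ratio-based definition used in this thesis
is_discriminative = r_i >= phi and ratio >= theta      # True
# delta-discriminative only if the general-dataset count stays below delta
is_delta_discriminative = f_j <= delta                 # False, since 150 > 10

print(is_discriminative, is_delta_discriminative)
```

The itemset is frequent in both datasets, yet its relative support differs by a factor of about six, so the ratio-based definition keeps it while 𝛿-discriminative mining discards it.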
1.2 PROBLEM STATEMENT AND OBJECTIVES
The previous section in this chapter describes the motivation of this research and also
identifies the general problems in acquiring user information. This section lists major research
questions that must be addressed in the work of this thesis.
1.2.1 Research problems
In this thesis, algorithms are developed for knowledge discovery from data streams.
These algorithms are proposed to enhance the effectiveness of description and prediction mining
in data stream environments. Working on more than one data stream poses more challenges
than single data stream mining, as the algorithms must deal with the bigger combined size of the
data streams and also with the interactions between them. We use data mining techniques to
mine the discriminative itemsets in data streams in historical and real-time window frames. The
extracted knowledge is used for describing the current trends of the data stream and is recommended
for predicting its future trends using classification techniques.
Motivated by the applications explained in the previous section, the problem of mining
“discriminative itemsets” in data streams is defined following the concepts of discriminative
items (Lin et al. 2010), frequent itemsets (Agrawal and Srikant 1994) and emerging patterns
(Dong and Li 1999). Discriminative itemsets in data streams are the frequent itemsets in one data
stream with much higher frequencies than the same itemsets in another data stream.
Discriminative itemsets are closely related to the frequent itemsets in data streams. The
developed algorithms have to work on more than one data stream and will face the challenges
related to inter-relationships between data streams. The discriminative rules can be defined from
discriminative itemsets as the sequential rules in one data stream that have higher support and
confidence compared to the same sequential rules in the other data streams. These discriminative
rules can be used for prediction mining in data streams using classification techniques. We
develop algorithms based on a batch of transactions, the historical tilted-time window model and
the sliding window model.
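As a point of reference for the definition above, discriminative itemsets in a single pair of batches can be computed by brute force. The transactions, thresholds and the guard against zero frequency in the general stream are illustrative assumptions here; the algorithms developed in this thesis (DISTree, DISSparse and their variants) exist precisely to avoid this exhaustive enumeration:

```python
# Brute-force sketch: enumerate itemsets occurring in the target batch and
# keep those whose relative support is at least theta times their relative
# support in the general batch, subject to a minimum support phi.
from itertools import combinations

def count(itemset, transactions):
    """Number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def discriminative_itemsets(S_i, S_j, theta, phi):
    n_i, n_j = len(S_i), len(S_j)
    items = sorted(set().union(*S_i))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            I = frozenset(combo)
            r_i = count(I, S_i) / n_i
            # assumed guard: treat a zero count in the general stream as 1
            r_j = max(count(I, S_j), 1) / n_j
            if r_i >= phi and r_i / r_j >= theta:
                result[I] = r_i / r_j
    return result

S_i = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]   # target batch
S_j = [{"a"}, {"c"}, {"b", "c"}, {"c", "d"}]             # general batch
found = discriminative_itemsets(S_i, S_j, theta=2.0, phi=0.5)
print(sorted("".join(sorted(I)) for I in found))  # → ['a', 'ab', 'b', 'bc']
```

The exponential enumeration makes the cost of the naive approach obvious, which motivates the pruning heuristics developed later in the thesis.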
As data arrives in data streams at very high speed, the proposed algorithms
should be designed to handle these large data streams with reasonable time and space
complexity. The pattern mining must be done quickly enough to be useful for real-time
data streams. To skip unpromising processing, the proposed algorithms are modified
based on heuristics learned from the general and specific characteristics of the data streams.
In order to show the importance of the discriminative itemsets, mine the discriminative
itemsets and enhance the performance of discriminative itemset mining tasks, this thesis
addresses the following problems:
How can the discriminative itemsets be used in the classification techniques of large
data streams?
The discriminative itemsets discovered in data streams in this thesis are intended to be
employed in classification techniques for prediction in data streams. The
discriminative itemsets discovered in the tilted-time and sliding window models can be used to
define discriminative rules, i.e., rules with higher support and confidence in the
target data stream compared to the general data stream. We raise this research question here, but it is
not a research task in this thesis.
How can the discriminative itemsets be discovered from data streams?
Many algorithms have been proposed for mining frequent itemsets from a single data
stream. Using these algorithms separately on multiple data streams and merging the results for
comparison is time-consuming and does not meet the requirements of data stream mining
tasks. New algorithms should be defined and developed for processing more than one
data stream in the same scan and using the same data structures. However, the number of
discriminative itemsets is much smaller than the number of frequent itemsets, and the
algorithms should have pruning techniques that lead to fewer itemsets being generated and tested.
This is an emerging new research problem.
How can the discriminative itemsets efficiently describe the trends in data streams in
the historical and real-time window frames?
The designed algorithms should be adapted to the fast-speed nature of the data streams.
The algorithms should use concise data structures for saving the itemsets in the recent or
historical time frames. Heuristics should be proposed to exclude the generation of non-potential
itemset combinations during the process. The aim is to employ these techniques to
achieve high scalability and high accuracy in large and fast-growing data streams. The
defined heuristics can be general, for pattern mining in transactional data streams, or
specific, based on the properties of the target data streams. This leads to the fourth research
question in this thesis:
How well can the discriminative itemset mining be tuned based on the characteristics of the
target datasets?
Data streams in different domains have different characteristics. Based on the
application domain and the input characteristics, the algorithms and data structures can be
optimised by tuning the input parameters either from the beginning or over time, through
learning from history. How to tune and modify the related parameters, such as the support threshold
and the discriminative level, in the designed algorithms is important.
1.2.2 Research objectives
This thesis attempts to discover discriminative itemsets for efficiently and effectively
describing data streams, which can then be adapted for data stream prediction. To achieve
this goal, the thesis emphasizes discriminative itemset mining, as the success of
discriminative rule mining methods largely depends on this step. The discriminative itemsets are
extracted in historical and real-time window frames. The research can be broken down
into the following tasks:
Conducting extensive research to show the importance of discriminative itemsets in
data streams in real applications: Discriminative itemsets in data streams
focus on the differences in trends between data streams. Discriminative itemsets can be
employed in defining classification techniques over more than one data stream. The
existing frequent itemset mining methods in data streams are not applicable to more than
one data stream.
This includes extensive research into the different algorithms proposed for mining frequent
itemsets in data streams, different contrast data mining methods and association classification
mining methods. The importance of the discriminative itemset and its advantages over the frequent
itemset and emerging patterns are discussed, and the application of discriminative itemsets to the
definition of discriminative rules is provided.
Developing algorithms to discover discriminative itemsets in data streams: Existing
work in data stream mining has focused on frequent itemset mining in a single
data stream. Discriminative itemsets are a different type of itemset and involve multiple
data streams. Existing frequent itemset mining methods in data streams are not applicable
for mining discriminative itemsets. The emerging pattern mining methods mainly
work on static datasets and focus on presenting the emerging patterns in
compact forms. New techniques have to be developed.
This includes developing discriminative itemset mining algorithms in data streams. The
first algorithm is developed as a simple extension of the state-of-the-art techniques designed
for frequent itemset mining in a single data stream.
Enhancing the efficiency of discriminative itemset mining based on determinative
heuristics: The discriminative itemsets can be discovered by comparing the frequent
itemsets in the data streams without considering the size and complexity of the data streams;
however, this is not suited to the fast-speed nature of data streams. New, efficient techniques
have to be developed.
This includes developing a discriminative itemset mining algorithm in data streams
based on determinative heuristics, giving concise data structures and a faster mining
process.
Developing discriminative itemset mining algorithms in data streams for historical
time frames: Discriminative itemset mining in historical time frames is important in
applications that require discovering patterns in different time frames across the
history of the data streams.
The algorithm is developed based on the tilted-time window model to show the
discriminative itemsets at different time granularities during the history of the data
streams. The input transactions are grouped into batches, and the results are updated
in an offline state by shifting and merging the window frames.
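The shift-and-merge step can be sketched schematically. The sketch below is a simplified logarithmic tilted-time buffer in the spirit of Giannella et al. (2003), with plain integers standing in for per-itemset frequency summaries; the capacity of two aggregates per frame and the merge policy are illustrative assumptions, not the exact design used by H-DISSparse:

```python
# Logarithmic tilted-time buffer sketch: frames[k] holds at most two
# aggregates at granularity 2**k batches. When a new batch arrives at the
# finest frame, an overflowing frame merges its two OLDEST aggregates and
# carries the merged aggregate into the next, coarser frame.

def insert_batch(frames, batch):
    carry, k = batch, 0
    while carry is not None:
        if k == len(frames):
            frames.append([])          # open a new, coarser frame
        frames[k].insert(0, carry)     # newest aggregate goes in front
        if len(frames[k]) > 2:
            oldest = frames[k].pop()   # the two oldest aggregates ...
            second = frames[k].pop()
            carry = oldest + second    # ... merge and shift to frame k+1
        else:
            carry = None
        k += 1
    return frames

frames = []
for b in [1, 1, 1, 1, 1]:              # five unit batches arrive over time
    insert_batch(frames, b)
print(frames)                          # → [[1], [2, 2]]
```

After five unit batches the structure holds the newest batch at the finest granularity and two coarser two-batch aggregates, so the total count of 5 is preserved while only three aggregates are stored.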
Developing discriminative itemset mining algorithms in data streams for the real-time
frame: Discriminative itemset mining in a real-time frame differs from mining in
historical time frames, as the itemsets have to be discovered and updated in a fixed-size
window frame.
The algorithm is developed based on the sliding window model to show the
discriminative itemsets in a fixed-size recent time frame of the data streams. The transactions
arrive one by one, and the results are updated based on the changes in the transactions that fit in the
sliding window frame.
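The one-by-one update can be sketched for a single tracked itemset over two interleaved streams. The window size, the stream labels "i" and "j", and the tracked itemset are all invented for illustration; S-DISSparse maintains a prefix-tree over all itemsets rather than one counter per itemset:

```python
# Naive sliding-window maintenance for one itemset: each arriving transaction
# enters the window, the oldest one leaves once the window is full, and the
# itemset's per-stream count and the stream lengths are updated incrementally
# instead of being recounted from scratch.
from collections import deque

class SlidingItemsetCount:
    def __init__(self, itemset, window_size):
        self.itemset = frozenset(itemset)
        self.window = deque()               # (stream_id, transaction) pairs
        self.size = window_size
        self.counts = {"i": 0, "j": 0}      # itemset frequency per stream
        self.lengths = {"i": 0, "j": 0}     # transactions per stream in window

    def add(self, stream_id, transaction):
        self.window.append((stream_id, frozenset(transaction)))
        self.lengths[stream_id] += 1
        if self.itemset <= transaction:
            self.counts[stream_id] += 1
        if len(self.window) > self.size:    # evict the oldest transaction
            old_id, old_t = self.window.popleft()
            self.lengths[old_id] -= 1
            if self.itemset <= old_t:
                self.counts[old_id] -= 1

    def ratio(self):
        """R_ij^w(I) over the current window, with a small guard on r_j."""
        r_i = self.counts["i"] / max(self.lengths["i"], 1)
        r_j = self.counts["j"] / max(self.lengths["j"], 1)
        return r_i / max(r_j, 1e-9)
```

Because each arrival updates the counts incrementally and evicts at most one old transaction, the per-transaction cost stays constant regardless of how long the streams run.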
Optimization of discriminative itemset mining: The discriminative itemset mining
techniques are defined in general terms; the algorithms can be optimized based on their input
parameters.
The parameters of the discriminative itemset mining algorithms can be defined generally,
or specifically tuned based on the application domain. The parameters have to be set in smart
ways to mine a useful set of discriminative patterns based on their different support and
discrimination levels.
To achieve these goals, surveys of data stream mining, the different
algorithms proposed for frequent itemset mining in data streams and their categories, and
the data mining techniques defined for more than one data stream are conducted. Emerging
pattern mining, as the main contrast data mining task, is explained extensively. The association
classification rule mining methods in data streams are also discussed briefly, by exploring the
association rule mining methods used for association classification. The problems are then
solved step by step and advanced models are presented based on the proposed algorithms. The
objective of the research in this thesis is to develop optimized methods with more concise data
structures and faster processes for finding the discriminative itemsets in data streams. The
outcomes can be used for prediction in data streams by defining discriminative rules
and adapting the algorithms to the application domain. The contributions are highly
significant and original, especially in the proposed algorithms for large and fast-speed data
streams in the tilted-time and sliding window models.
1.3 RESEARCH SIGNIFICANCE
Data streams are used in many real-life applications such as personalization (Tseng and Lin
2006; Djahantighi et al. 2010), anomaly detection (Lin et al. 2010), marketing (Hollfelder, Oria
and Özsu 2000; Huang et al. 2008; Prasad and Madhavi 2012) and forecasting upcoming trends
(Mori et al. 2005). With the huge amount of knowledge present in data streams, it is
crucial to use data mining to gain insights from them. Data streams grow fast, and
describing their current patterns as well as predicting their future patterns are big
challenges (Wang et al. 2003; Zhu, Wu and Yang 2006; Zhang et al. 2010). Compared to
static datasets, data streams are live currents of data, which are useful for online prediction.
The proposed algorithms for pattern mining can be categorized based on online or offline
data streams, different window models and approximation types (Aggarwal 2007; Cheng, Ke and
Ng 2008; Manku 2016). Most of the current pattern mining methods in data streams work
on single data streams. We use the knowledge extracted from comparing data streams to
highlight the differences between one data stream and the others. The existing
classifier methods applied for prediction mining mostly work on single data streams (Wang et
al. 2003; Zhu, Wu and Yang 2006; Zhang et al. 2010). The discriminative itemsets in data
streams can be used with these classifiers by distinguishing between the data streams. The different
behaviours and trends of the data streams are compared for better differentiation of their patterns.
This thesis works on discriminative itemsets in data streams, recommended for use in
discriminative rule mining for a classifier method based on the different patterns of data
streams.
1.4 RESEARCH LIMITATION
One of the major limitations of this research is that no similar methods exist
for working on multiple data streams that can be used as a benchmark. However, as
discriminative itemsets are closely related to frequent itemsets, we extend a modified
version of the FP-Growth method, proposed for frequent itemset mining in a single data stream,
to work on more than one data stream. The efficiency of our proposed advanced algorithms is
then evaluated against this method. The emerging pattern algorithms can be used as
another benchmark for evaluation purposes. However, they mostly work on static
datasets, and their output patterns differ from those of our proposed methods, both fundamentally
and in detail. As a research work close to the proposed methods in this thesis, we
modify DPMiner (Li, Liu and Wong 2007) to include all the 𝛿-discriminative emerging
patterns (the original method discards a large number of emerging patterns as
redundant). We also modify the definition of the discriminative itemsets, and
consequently the proposed methods, to match the 𝛿-discriminative emerging patterns.
The efficiency of the modified algorithms is then evaluated together.
1.5 CONTRIBUTIONS
This thesis has developed approaches for discovering discriminative itemsets in data
streams under different window models. In particular, the contributions of this thesis are to:

- Survey the extensive research on frequent itemset mining in data streams and contrast
data mining to show the importance of discriminative itemsets in data streams in real
applications, and propose discriminative itemsets for the application of classification
in data streams. Specifically, this covers the literature that uses frequent itemset
mining and contrast data mining for the classification of static datasets and data streams.

- Develop a simple single-pass algorithm, DISTree, for mining discriminative itemsets
in data streams. More specifically, this defines discriminative itemset mining in data
streams and provides a simple method for benchmarking against the advanced
proposed methods.

- Develop an advanced and efficient algorithm, DISSparse, for mining discriminative
itemsets in data streams. More specifically, this develops an efficient method
for large, high-speed data streams.

- Develop an algorithm for mining discriminative itemsets in data streams using
novel data structures and the tilted-time window model. In this part, the efficient H-
DISSparse algorithm is developed for historical discriminative itemset mining.

- Develop a novel algorithm for mining discriminative itemsets over data streams
using the sliding window model. The efficient S-DISSparse algorithm is developed
for mining discriminative itemsets in the sliding window model; it can handle
large datasets in an online data stream growing at high speed.

- Show strategies and principles for parameter setting based on data stream characteristics.
Using experiments on a wide range of datasets with different parameter settings,
principles are demonstrated for tuning the algorithms' parameters.
1.6 PUBLICATIONS
M. Seyfi, S. Geva, R. Nayak, Mining Discriminative Itemsets in Data Streams, Web
Information Systems Engineering – WISE 2014, 125-134.
M. Seyfi, R. Nayak, Y. Xu, S. Geva, Efficient Mining of Discriminative Itemsets, Web
Intelligence – WI 2017, ACM: 451-459.
1.7 THESIS OUTLINE
This study is designed to explore the data mining techniques for pattern mining in data
streams. The study will primarily provide extensive research on frequent itemset mining in data
stream, contrast data mining and classification techniques in static datasets and data streams. The
secondary focus is on developing methods for mining discriminative itemsets in data streams.
The third focus of this study is making the developed algorithms efficient for mining
discriminative itemsets in large and fast-growing data streams. Finally, we show strategies and
principles to tune and modify the related parameters such as support threshold and discriminative
level. This thesis is organised in seven chapters, which follow the structure shown in Figure 1.1.
Figure 1.1: Research methodology and thesis structure
In Chapter 2, a review of the relevant literature is presented. It covers the latest
work in the areas of data stream mining and frequent itemset mining in data streams under
different window models. Next, emerging pattern mining and
discriminative item mining in data streams are reviewed, followed by sequential rule mining in
data streams. Finally, the significance of the work and the new knowledge
contributed to the area are discussed, followed by the research gaps and motivations.
In Chapter 3, the problem of mining discriminative itemsets in data streams is formally
defined, and the DISTree method is proposed as an expansion of the FP-Growth method
to more than one data stream. However, this method does not scale to
large datasets, so the DISSparse method is proposed for efficient mining of discriminative
itemsets. The proposed methods address the problem in a way that can be extended to the tilted-
time and sliding window models. This chapter mainly covers the published papers 'Mining
Discriminative Itemsets in Data Streams' and 'Efficient Mining of Discriminative Itemsets'.
In Chapter 4, the H-DISSparse method is presented for mining discriminative
itemsets using the tilted-time window model on offline data streams. The algorithm
shows results with high accuracy and recall, and remains efficient over a range
of large data streams.
In Chapter 5, the S-DISSparse method is presented for mining discriminative
itemsets in the sliding window model, covering both offline and online updating states. This
method processes the input data streams quickly and efficiently. The proposed S-DISSparse
algorithm has efficient time and space complexity and high accuracy on high-speed data
streams.
In Chapter 6, the four proposed algorithms are tested on various synthetic and real
datasets with different characteristics and sizes. The results are analysed, and the efficiency and
accuracy of the methods are discussed over a wide range of input data streams. Principles are
shown for parameter setting based on data stream characteristics.
In Chapter 7, the conclusions and a summary of the key findings drawn from this
thesis are highlighted as the significant contributions. Limitations are also pointed out, and
future work is discussed.
Chapter 2: Literature review
This chapter presents a critical review of the literature essential to addressing the research
questions introduced in Chapter 1. This review presents and analyses current theories and
methodologies that have been used in the relevant research areas. In so doing, a sound argument is
developed to support the research undertaken in this thesis.
This chapter starts with the definition of data streams and the requirements for
developing data stream mining algorithms in Section 2.1. It continues with the
algorithms developed for frequent itemset mining in a single data stream in Section 2.2.
Then, contrast data mining methods dealing with pattern mining in more than one
data stream are discussed, and emerging patterns, a concept close to discriminative
itemsets, are presented in Section 2.3. Data stream description using association rules is
discussed, opening the new research area of discriminative rule mining, in Section 2.4. The
significance of the research and the new knowledge contributed to the research area are discussed
in Section 2.5. Finally, the limitations of current methods are highlighted to support the research
hypothesis in Section 2.6.
2.1 DATA STREAM MINING
A data stream is defined as a continuous sequence of transactions arriving at
high speed over time (Manku and Motwani 2002). Data streams are considered dynamic
datasets, and compared to traditional static datasets they pose more challenges for
processing. First, the volume of a data stream over its lifetime is huge and grows quickly;
it cannot be maintained and processed in main memory, or even in secondary
storage. Second, most data stream applications need quick, real-time answers.
Third, the knowledge embedded in data streams changes over time: as transactions pass,
their information is lost, and the trends shift with concept drifts. Because of
these characteristics, every algorithm designed for data stream processing should satisfy the
following recommended requirements (Garofalakis, Gehrke and Rastogi 2002;
Manku and Motwani 2002; Hyuk and Lee 2003).
Data stream mining algorithms should work with concise, compressed
data structures that fit in main memory. These data structures should monitor the data streams
in the current state and over a defined period of the history. The high-speed nature of data
streams demands that the designed algorithms be faster than the rate of the incoming data:
new transactions have to be processed as quickly as possible and the
results updated rapidly. The algorithms should follow a single-pass design,
as in fast data streams there is usually no time for a second pass, and heuristics should be
defined and used based on the data stream characteristics and the problem
definition. Queries need short response times, since answers may lose relevance
by the time the stream is updated; this emphasises the need for fast update rates.
The data structures used in the algorithms have to be defined and updated over time
as the streams pass and concept drifts occur. The
knowledge embedded in data streams changes through time, and information is lost as
transactions pass. As a consequence, data stream mining algorithms are designed with a
trade-off between their accuracy and their ability to satisfy the requirements discussed. The
discriminative itemsets are a small subset of frequent itemsets. The section below discusses
frequent itemset mining algorithms in data streams and their different categories in detail.
2.2 FREQUENT ITEMSET MINING IN DATA STREAM
In the last decade, frequent itemset mining algorithms working on a single data stream
have attracted much attention (Manku and Motwani 2002; Hyuk and Lee 2003; Wu, Zhang and
Zhang 2004; Yu et al. 2004; Chang and Lee 2005; Leung and Khan 2006; Lin, Hsueh and
Hwang 2008; Cheng, Ke and Ng 2008; Li and Lee 2009; Guo, Su and Qu 2011; Manku 2016).
These studies can be categorized according to their update intervals, window models
and types of approximation (Cheng, Ke and Ng 2008). Depending on the target application
domain, data stream mining algorithms can process the input transactions in bulk, which
is called offline data stream processing, or process each transaction as it is generated through
time, which is called online data stream processing (Manku and Motwani 2002).
2.2.1 Window models and update intervals in data stream
The frequent itemset mining algorithms in data streams are categorized based on the
landmark window model, sliding window model, damped window model and the tilted-time
window model (Aggarwal 2007; Cheng, Ke and Ng 2008) (Figure 2.1 to Figure 2.4).
Figure 2.1: Landmark window model
Figure 2.2: Damped window model
Figure 2.3: Sliding window model
Figure 2.4: Tilted-time window model
In algorithms designed for the landmark window model (Figure 2.1), the
frequent itemsets are extracted and updated from a specified starting point in the data stream
until now. The starting point can be set anywhere in the data stream and is
treated as the beginning of the window, unchanged for the entire period of data
stream mining (Manku and Motwani 2002). The damped window model (Figure 2.2) follows the
landmark model in using a specified start point; however, in this model
the recent input transactions have higher impact. Weighting functions give
higher weight to newly added transactions, and these algorithms
usually apply decaying factors to the old transactions (Chang and Lee 2003).
Compared to the landmark and damped window models with their static start points, the
sliding window model (Figure 2.3) has a dynamic start point that moves with time. This
window has a fixed size, defined either by a number of transactions or by a
time slide. The window size is set by the user based on the application domain, and the
algorithms ignore data that falls outside the window (Chi et al. 2004). In the tilted-time
window model (Figure 2.4), frequent itemsets are maintained in separate historical window
frames of different sizes. The frames belonging to older times cover bigger
time intervals, and those for newer times are set smaller. The reason for this
variation in the window sizes is that users are usually interested in recent data
mining results at fine granularities and in past results at coarse granularities (Giannella et al.
2003).
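The merging behaviour of a logarithmic tilted-time window can be sketched as follows. This is an illustrative toy (counts for a single itemset, a fixed per-level capacity), not the exact FP-Stream frame scheme:

```python
from collections import deque

class TiltedTimeWindow:
    """Sketch of a logarithmic tilted-time window for one itemset.

    Level 0 holds the most recent batch counts; each higher level holds
    counts over intervals twice as long. When a level overflows, its two
    oldest frames are merged and promoted to the next level. The last
    level is left unbounded in this toy version.
    """
    def __init__(self, levels=4, capacity=2):
        self.levels = [deque() for _ in range(levels)]
        self.capacity = capacity

    def add(self, count):
        self.levels[0].appendleft(count)  # newest batch at the front
        for i in range(len(self.levels) - 1):
            if len(self.levels[i]) > self.capacity:
                # merge the two oldest frames into one coarser frame
                old1 = self.levels[i].pop()
                old2 = self.levels[i].pop()
                self.levels[i + 1].appendleft(old1 + old2)

    def total(self):
        return sum(sum(level) for level in self.levels)
```

Note that no counts are lost by merging; only the time resolution of old frames becomes coarser, matching the intuition that old history is queried at coarse granularity.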
Depending on the application, results are updated offline, by processing
every new batch of incoming transactions, or online, by processing every newly arrived
transaction (Aggarwal 2007). Frequent itemset mining algorithms can discover exact
answers or approximate answers. The approximate algorithms are classified as
false-positive or false-negative (Yu et al. 2004; Aggarwal 2007; Cheng, Ke and
Ng 2008). In false-positive methods, depending on the level of user-defined approximation, a number
of non-frequent itemsets may appear in the final results as frequent, decreasing the
accuracy of the algorithm. In false-negative algorithms, all reported answers are frequent;
however, a number of frequent itemsets may be missed, resulting in lower recall. According to
the different surveys, most of the designed frequent itemset mining algorithms are false-positive (Yu et
al. 2004; Aggarwal 2007; Cheng, Ke and Ng 2008; Li and Lee 2009).
2.2.2 Frequent itemset mining algorithms in data stream
In this section the main frequent itemset mining algorithms are discussed briefly with
their distinct characteristics and limitations. Table 2.1 lists the most popular algorithms and
specifies their main features based on the different categories:
Table 2.1: Data stream frequent itemset mining algorithms

Algorithm                               | Window model | Update interval     | Approximation type
Lossy counting (Manku and Motwani 2002) | Landmark     | Offline data stream | False-positive
estDec (Hyuk and Lee 2003)              | Damped       | Online data stream  | False-positive
FP-Stream (Giannella et al. 2003)       | Tilted-time  | Offline data stream | False-positive
FPDM (Yu et al. 2004)                   | Landmark     | Offline data stream | False-negative
Moment (Chi et al. 2004)                | Sliding      | Online data stream  | Exact answers
DSM-FI (Li, Lee and Shan 2004)          | Landmark     | Offline data stream | False-positive
estWin (Chang and Lee 2005)             | Sliding      | Online data stream  | False-negative
DStree (Leung and Khan 2006)            | Sliding      | Offline data stream | Exact answers
VSMTP (Lin, Hsueh and Hwang 2008)       | Tilted-time  | Offline data stream | False-positive
WSW (Tsai 2009)                         | Sliding      | Online data stream  | Exact answers
Lossy counting (Manku and Motwani 2002) is a one-pass algorithm proposed for
finding the frequent itemsets in a data stream. A threshold θ is defined by the
user for the application domain, and the error rate ε is adjusted by a Chernoff bound. The answers
are guaranteed to contain every itemset with frequency at least θ, and to contain no itemset with
frequency less than (θ − ε). Itemsets with frequency between (θ − ε) and θ form the group of
false positives in the answer set. In false-positive algorithms in general, if there is time for a
second pass over the data stream, all the false-positive answers can be eliminated (Aggarwal
2007).
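The bookkeeping of Lossy counting can be sketched for the simpler case of frequent items; the itemset version processes transactions in batches, but it rests on the same counter-plus-maximum-undercount idea:

```python
import math

def lossy_count(stream, theta, eps):
    """Sketch of Lossy counting for frequent items.

    Keeps (count, delta) per item, where delta bounds the undercount from
    pruning. Guarantees: every item with true frequency >= theta * N is
    reported, and no item with frequency < (theta - eps) * N is reported.
    """
    width = math.ceil(1 / eps)        # bucket width
    counts = {}                       # item -> (count, max undercount delta)
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)
        if item in counts:
            c, d = counts[item]
            counts[item] = (c + 1, d)
        else:
            counts[item] = (1, bucket - 1)
        if n % width == 0:            # end of bucket: prune low counters
            for it in list(counts):
                c, d = counts[it]
                if c + d <= bucket:
                    del counts[it]
    # report items that may be frequent
    return {it for it, (c, d) in counts.items() if c >= (theta - eps) * n}
```

For example, `lossy_count(['a', 'b'] * 50, 0.4, 0.1)` reports both `'a'` and `'b'`, while a single stray item appended to that stream is pruned or falls below the reporting bound.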
FPDM (Yu et al. 2004) is a false-negative frequent itemset mining algorithm.
Like Lossy counting, it works on the landmark window model over a
data stream. In this method, non-frequent itemsets never appear in the answer set, but
frequent itemsets with frequency higher than the user-defined threshold θ may be missed. The
approximation in the answer set is controlled by a user-defined error rate ε ∈ (0,1) and
reliability δ ∈ (0,1): ε controls the error bound and δ controls the memory
consumption. As in any other frequent itemset mining algorithm, the exponential number of
candidate itemsets is the main problem, and candidate generation is much larger
when mining more than one data stream.
In the landmark window model, the frequent itemsets are mined over the entire data stream,
whereas in the tilted-time window structure they are maintained at different time
granularities of the stream's history (Giannella et al. 2003). In many applications, users are
more interested in time-related trends and changes in frequent itemsets than in the frequent itemsets
themselves (Giannella et al. 2003). Time-sensitive queries for frequent itemsets are answered
from this historical window structure. FP-Stream (Giannella et al. 2003) is a prefix tree
structure based on the FP-Growth method (Han, Pei and Yin 2000). It uses a
logarithmic-scale tilted-time window, as users are usually interested in recent trends over
short-term periods and historical trends over long-term periods. A logarithmic window
is kept for each prefix tree node to record the frequency of the frequent itemsets in the
recent and historical time frames. The FP-Stream method uses the same support threshold in all
the tilted-time windows. It mines the frequent itemsets in the current time frame and
then merges the results by shifting the old frequent itemsets through the tilted-time window frames.
The VSMTP algorithm (Lin, Hsueh and Hwang 2008) extended this method with changeable
support thresholds in its window model.
Moment (Chi et al. 2004) extracts the closed frequent itemsets in a sliding window
over the data stream. The algorithm assumes the size of the sliding window is set to
the number of transactions that fit in main memory. It uses the heuristic that
in most cases the frequent itemsets are the same in successive sliding windows (Cheng, Ke
and Ng 2008): the boundary between frequent and non-frequent itemsets, and also the
boundary between closed frequent itemsets and other itemsets, move very slowly. Instead
of looking for all the closed frequent itemsets in each sliding window, the algorithm mostly
tries to track these boundaries. A closed enumeration tree (CET) data structure, which
fits in main memory, holds the closed frequent itemsets and the itemsets on the
boundary between closed frequent itemsets and the rest. This structure is updated with each new
incoming transaction, so the method is categorized as online data stream processing.
DSM-FI (Li, Lee and Shan 2004) uses a forest of FP-tree-like structures and mines
approximate frequent itemsets in the landmark window model. estWin (Chang and Lee 2005)
is another algorithm based on the sliding window model. All the transactions in the
current window are stored in a CTL data structure, which fills up with each new transaction;
once full, old transactions are replaced by new incoming
ones. The significant frequent itemsets are maintained in a monitoring prefix tree and
updated as the current window changes. Another proposed tree structure is DStree (Leung and
Khan 2006), in which each node maintains a list of frequency counters, one per
batch of transactions. The algorithm is offline: with each new batch of transactions, the window
slides, the transactions in the oldest batch are deleted, and the counters in every affected node are
shifted. This algorithm is designed for exact frequency counting.
estDec (Hyuk and Lee 2003) is an online data stream mining algorithm that focuses
on recent frequent itemsets more than old ones. The weight of old transactions is
decreased by a defined decaying factor as new transactions arrive. The algorithm stores the
frequent itemsets in a lattice and updates the lattice with every new transaction. The frequency of
the itemsets that are subsets of the new transaction changes according to the damped window
model: their frequencies are first decayed and then increased by one. Next,
the subsets of the current transaction that can become frequent are added to the lattice. In WSW (Tsai
2009), users are allowed to define the number of windows, with a size and weight for each.
This can be useful when certain sections of the data stream are considered more important
than others.
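The damped-window update used by estDec-style algorithms can be sketched as a lazily applied decay. This is an illustration of the idea only; a real implementation maintains a lattice and prunes insignificant itemsets:

```python
from itertools import chain, combinations

def subsets(transaction):
    """All non-empty subsets of a transaction, as sorted tuples."""
    items = sorted(transaction)
    return chain.from_iterable(
        combinations(items, r) for r in range(1, len(items) + 1))

class DecayedCounter:
    """Sketch of damped-window counting: every stored count decays by a
    factor per arriving transaction, applied lazily when next touched.
    """
    def __init__(self, decay=0.99):
        self.decay = decay
        self.counts = {}   # itemset -> (decayed count, tid of last update)
        self.tid = 0

    def update(self, transaction):
        self.tid += 1
        for itemset in subsets(transaction):
            c, last = self.counts.get(itemset, (0.0, self.tid))
            # decay for the transactions that arrived since the last update
            c *= self.decay ** (self.tid - last)
            self.counts[itemset] = (c + 1.0, self.tid)
```

With decay 0.5, two consecutive occurrences of an item give a count of 0.5 + 1 = 1.5: the older occurrence already weighs half as much as the newer one.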
Summary: Based on the discussion in (Aggarwal 2007), the landmark
model is more fundamental than the other window models. In the sliding window model,
the most challenging situation arises when the sliding window does not fit in main memory.
In the damped model, compared to the landmark model, the frequency
of every itemset has to be adjusted with each new transaction, even for itemsets that are not
subsets of that transaction. The algorithms proposed in this thesis are categorized as follows:
an algorithm for mining discriminative itemsets using the tilted-time window model for
offline data streams, followed by an algorithm for mining discriminative itemsets using the
sliding window model in online data streams. We choose the tilted-time window model to obtain
discriminative itemsets at multiple time granularities, and the sliding window model to obtain
discriminative itemsets in the most recent time.
Expanding the current frequent itemset mining algorithms to work on more than one
data stream raises different challenges that lead to inefficiency. These methods are
designed based on either the basic Apriori idea or FP-Growth algorithms (Aggarwal 2007).
Expanding the Apriori-based idea to discriminative itemset mining
is not feasible. Apriori methods work on the principle of generating and testing
(k+1)-candidate itemsets from the frequent k-itemsets. This
process would be time- and space-consuming for discriminative itemset mining, as all of
the candidate frequent itemsets in the target data stream would have to be generated, tested
and saved in data structures, and then checked against the same
itemsets in the other data streams for their frequencies. By definition, discriminative
itemsets are a much smaller percentage of the total frequent itemsets, especially at larger
discriminative level thresholds. This makes a direct expansion of
frequent pattern mining algorithms inappropriate for discriminative itemset mining in more than one data
stream. Frequent itemset mining methods that follow FP-Growth can be expanded
more readily for discriminative itemset mining; however, generating unnecessary combinations
in the prefix tree structures and checking itemsets for frequency in more than one data stream
would incur unreasonable extra time and space usage.
The prefix tree structures used in the FP-Growth algorithms can be modified with
pruning heuristics that skip the generation of frequent itemsets that cannot lead to
discriminative itemsets. The designed algorithms discover the discriminative itemsets in
the current batch with exact accuracy. The approximations in the historical results, introduced
when merging window frames in the tilted-time window model, are set based on user-defined
thresholds, leading to higher time and space complexities, as discussed in the analysis in
Chapter 6. The approximation in the sliding window model algorithm depends on the sub-
divisions of the window size, giving higher accuracy in application domains with fewer
concept drifts over short time periods. These algorithms are categorized as approximate
methods that guarantee high accuracy of the reported discriminative itemsets; losing a
percentage of the answers while merging different time periods causes lower recall.
The discovered discriminative itemsets are proposed for discriminative rule mining
in data streams, for prediction purposes in classification techniques. Discriminative rules can
be defined from the discriminative itemsets as sequential rules in one data stream that
have higher support and confidence than the same sequential rules in the other data
streams. The discriminative itemsets are measured against support and confidence
thresholds to find the set of useful discriminative rules for prediction mining.
Discriminative itemset mining in data streams focuses on differentiating data streams
based on their trends in different window models. This falls within the wider research topic of
contrast data mining, in which differences between datasets are discovered
based on defined thresholds. The section below discusses contrast data mining, the main
research area for discriminative itemset mining, in detail.
2.3 CONTRAST DATA MINING
Contrast mining is a focused data mining research area concerned with discovering interesting
contrast patterns that state significant differences between datasets (Dong and Bailey 2012).
Discriminative itemsets are a kind of contrast pattern. Contrast patterns show non-trivial
differences between datasets. Emerging patterns (Dong and Li 1999) are among the best-
known contrast patterns, and their distinguishing power has been used for building
powerful classifiers (Dong et al. 1999). In emerging patterns, the degree of change in the supports of
itemsets is what matters; the actual supports of the itemsets are not considered. In contrast, the
discriminative itemsets proposed in this thesis focus on the difference in support rather than the
degree of change. We discover the real support of each discriminative itemset and the
relative differences of supports between datasets explicitly, which provides concrete information
to assist users in making the right decisions. Discriminative itemsets of different cardinalities
are useful for building rule-based classifiers with high accuracy. Most importantly, there are too
many emerging patterns with low supports, which may not be interesting. Discriminative
itemsets are fewer in number, as they must first be frequent in the target dataset.
2.3.1 Emerging patterns
Emerging patterns (EPs) (Dong and Li 1999) are similar in concept to discriminative
itemsets. EPs are defined as itemsets whose frequencies grow significantly from one dataset
𝐷1 to another 𝐷2. EPs can highlight emerging trends in time-stamped datasets and
also show differences between data classes. In most applications, EPs with large
supports are mainly folklore (i.e., already known), while EPs with low to medium support, such as
1%-20%, are of interest. For example, purchase patterns in a company that doubled from
last year to this year, even with low supports, can give good insights to domain
experts. In the medical field, EPs with long length and low supports may add new knowledge to
the field: treatment combinations and symptoms with even a small growth rate from patients who
were not cured to those who were cured can suggest good treatments (i.e., if there is no better
plan).
Finding emerging patterns with small supports is challenging because, first, the
useful Apriori property does not hold and, second, there are too many candidate patterns. For
these two reasons, naïve methods are expensive. However, using some nice properties of the
patterns (i.e., set intervals), an efficient mining method was proposed (Dong and Li 1999). A
collection 𝑆 of sets is interval closed if, whenever 𝑋 and 𝑍 are in 𝑆 and 𝑌 is a set with 𝑋 ⊆ 𝑌 ⊆ 𝑍,
then 𝑌 is in 𝑆. The collections of large itemsets for specific given thresholds are interval closed.
The proposed method (Dong and Li 1999), which has been followed in other studies (Zhang,
Dong and Kotagiri 2000; Fan and Ramamohanarao 2002; Bailey, Manoukian and
Ramamohanarao 2002; Alhammady and Ramamohanarao 2005; Loekito and Bailey 2006; Li,
Liu and Wong 2007; Bailey and Loekito 2010; Yu et al. 2013; Yu et al. 2015), works on the
border definition of the maximal (large) itemsets in the first and second datasets. The large
itemsets are represented by their borders: the pair of the set of minimal itemsets and the set of
maximal itemsets. The borders are much smaller than the collections they represent, and the
algorithms are quick even with a large number of emerging patterns. For example, on the
mushroom dataset from the UCI repository, the algorithm finds 228 EPs for a growth rate
threshold of 2.5 (Dong and Li 1999).
For an ordered pair of datasets 𝐷1 and 𝐷2, the growth rate of an itemset 𝑋 from 𝐷1 to 𝐷2 is
defined as in equation (2.1). Given a growth-rate threshold 𝜌, an itemset 𝑋 is a 𝜌-
emerging pattern from 𝐷1 to 𝐷2 if 𝐺𝑟𝑜𝑤𝑡ℎ𝑅𝑎𝑡𝑒(𝑋) ≥ 𝜌. The main interest in emerging pattern
mining is in the degree of change in supports, rather than the actual supports.

GrowthRate(X) = 0,                  if supp1(X) = 0 and supp2(X) = 0;
                ∞,                  if supp1(X) = 0 and supp2(X) ≠ 0;
                supp2(X)/supp1(X),  otherwise.                        (2.1)
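Equation (2.1) translates directly into code:

```python
import math

def growth_rate(supp1, supp2):
    """Growth rate of an itemset from D1 to D2, following equation (2.1)."""
    if supp1 == 0 and supp2 == 0:
        return 0.0
    if supp1 == 0:
        return math.inf        # appears only in D2
    return supp2 / supp1

def is_emerging(supp1, supp2, rho):
    """X is a rho-emerging pattern from D1 to D2 if GrowthRate(X) >= rho."""
    return growth_rate(supp1, supp2) >= rho
```

For instance, an itemset with supports 0.1 in 𝐷1 and 0.3 in 𝐷2 has growth rate 3, so it is a 2.5-emerging pattern even though both supports are low.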
The EPs are identified by extracting the maximal itemsets separately for each dataset
based on the defined frequency thresholds in the first and second datasets. The maximal
itemsets are then reported between the two borders. The borders are defined using the
lowest and the highest thresholds (Dong and Li 1999), as the Left and Right borders, denoted
<Left, Right>. Working on these two borders limits the answers to the area between <L, R>,
representing a large collection of itemsets; the idea is to decrease the number of candidate itemsets
by defining only the borders of the maximal itemsets. For every EP, the frequency growth lies
between the given borders. Emerging pattern mining methods need to check frequency or frequency
change, but do not care whether the pattern is frequent or not. The discriminative itemset
methods proposed in this thesis first find frequent itemsets and then compare the difference of
the frequencies of the same itemsets in the two datasets.
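To make this contrast concrete, a minimal sketch of such a check is shown below. The formal definition of discriminative itemsets appears in Chapter 3; the parameter names here are assumptions for illustration only:

```python
def is_discriminative(freq_target, freq_other, n_target, n_other,
                      min_support, disc_level):
    """Illustrative check only (the formal definition is in Chapter 3):
    an itemset must first be frequent in the target data stream, and then
    its support must exceed its support in the other stream by a
    discriminative level. Parameter names are hypothetical.
    """
    supp_target = freq_target / n_target
    supp_other = freq_other / n_other if n_other else 0.0
    if supp_target < min_support:      # unlike EPs, must be frequent first
        return False
    return supp_target >= disc_level * supp_other
```

For example, an itemset seen 20 times in 100 target transactions and 5 times in 100 other transactions passes with `min_support=0.1` and `disc_level=2.0`, whereas an itemset with target support 0.05 is rejected outright, however large its relative growth.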
As shown in Figure 2.5, the emerging pattern mining method defines two minimum support
thresholds 𝜃min and 𝛿min for an itemset 𝑋 in the two datasets 𝐷1 and 𝐷2, and extracts the maximal
itemset boundary between these two maximal itemsets (triangle ACE). This area is separated into
three domains: triangle ABG, rectangle BCDG and triangle GDE, respectively. The EPs
extracted from these three areas are merged together as the final result (Dong and Li 1999).
Figure 2.5: Support plan for emerging patterns (Dong and Li 1999)
The EPs in the BCDG rectangle are precisely those itemsets whose support in 𝐷2 is ≥ 𝜃𝑚𝑖𝑛
but in 𝐷1 is < 𝛿𝑚𝑖𝑛. The proposed method in (Dong and Li 1999) uses the borders of
𝐿𝑎𝑟𝑔𝑒𝛿𝑚𝑖𝑛(𝐷1) and 𝐿𝑎𝑟𝑔𝑒𝜃𝑚𝑖𝑛(𝐷2), instead of the collections 𝐿𝑎𝑟𝑔𝑒𝛿𝑚𝑖𝑛(𝐷1) and
𝐿𝑎𝑟𝑔𝑒𝜃𝑚𝑖𝑛(𝐷2) themselves, as inputs of the algorithm. The algorithm derives the emerging
patterns by manipulating only the two borders and producing the border representation of the
derived EPs. The algorithm avoids generating a high number of candidate itemsets and
avoids printing a large number of EPs.
The EPs in the GDE triangle are from candidate itemsets whose support in 𝐷1 is ≥ 𝛿𝑚𝑖𝑛 and
in 𝐷2 is ≥ 𝜃𝑚𝑖𝑛. These candidates are exactly from 𝐿𝑎𝑟𝑔𝑒𝛿𝑚𝑖𝑛(𝐷1) ⋂ 𝐿𝑎𝑟𝑔𝑒𝜃𝑚𝑖𝑛(𝐷2). The
approximate size of the intersection can be estimated by checking the border description. When this
intersection is small, the EPs can be found easily by checking the support of all the candidates in
the intersection. When this intersection is large, it is recursively divided into a
new rectangle and a new triangle, and the border algorithm is applied to the new rectangle until
approximately all the emerging patterns are found. Finding the EPs in the ABG triangle is a hard
task, as there may be many EPs in this area with small support in 𝐷1 or 𝐷2 or both. These are the
large number of itemsets with 𝑠𝑢𝑝𝑝(𝑋) < 𝛿𝑚𝑖𝑛 or 𝑠𝑢𝑝𝑝(𝑋) < 𝜃𝑚𝑖𝑛. Generally, finding all the
EPs is very challenging and algorithms mainly look for a way to show the best approximations.
The key point of using borders is to efficiently represent large collections of itemsets as
emerging patterns. Each interval-closed set of emerging patterns has a unique border <L, R>,
where L is the collection of minimal itemsets and R is the collection of maximal itemsets. The
method uses Max-Miner (Roberto J. Bayardo 1998) for discovering the maximal borders. The
differential procedure is called Border-Diff, and the main algorithm is MBD-LLborder, which uses
Border-Diff as a subroutine (Dong and Li 1999). The algorithm calls Border-Diff multiple
times to derive the EPs.
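The core task of Border-Diff, recovering the minimal itemsets covered by one maximal itemset but by none of the others, can be illustrated with a brute-force sketch (a drastic simplification of the actual procedure in Dong and Li 1999; names are ours):

```python
from itertools import combinations

def border_diff(U, Rs):
    """Minimal subsets of U not contained in any R_i, i.e. the Left border of
    the interval [{}, U] minus the union of the intervals [{}, R_i].
    Brute force over subsets of U in increasing size, for illustration only."""
    U = frozenset(U)
    Rs = [frozenset(r) for r in Rs]
    minimal = []
    for k in range(1, len(U) + 1):
        for cand in combinations(sorted(U), k):
            c = frozenset(cand)
            if any(c <= r for r in Rs):
                continue                  # still covered by some R_i
            if any(m <= c for m in minimal):
                continue                  # a smaller uncovered subset exists
            minimal.append(c)
    return minimal
```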
Considering the large number of emerging patterns and multiple applications and dataset
types, different algorithms use borders for mining different types of emerging patterns. The
different types of emerging patterns are explained in the section below.
2.3.1.1 Different types of emerging patterns
The different types of emerging patterns have been defined (Dong and Bailey 2012) for
capturing the changes or differences between datasets. Jumping emerging patterns (JEPs)
(Bailey, Manoukian and Ramamohanarao 2002) are a special type of emerging patterns: the
itemsets whose support increases from zero in one dataset to non-zero in another dataset. The
jumping emerging pattern (JEP) growth rate must be infinite (i.e., it is present in one dataset and
absent in another). The jumping emerging patterns are useful as a means of discovering inherent
distinctions that exist amongst datasets. The proposed method in (Bailey, Manoukian and
Ramamohanarao 2002) uses a prefix-tree structure as applied in (Han, Pei and Yin 2000) for
frequent itemset mining. However, there are significant differences in the kinds of tree shapes
that lead to efficient mining.
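The defining JEP condition, zero support in one dataset and non-zero in the other, can be sketched by brute-force enumeration (illustrative only; the function name and the `max_len` cap are ours, not from the cited method):

```python
from itertools import combinations

def jumping_emerging_patterns(d1, d2, max_len=2):
    """Itemsets (up to max_len items) absent in d1 but present in d2,
    so their growth rate from d1 to d2 is infinite. Brute-force sketch."""
    def itemsets(transactions):
        found = set()
        for t in transactions:
            items = sorted(set(t))
            for k in range(1, min(max_len, len(items)) + 1):
                found.update(combinations(items, k))
        return found
    return itemsets(d2) - itemsets(d1)
```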
The mining of emerging patterns is harder than that of frequent itemsets. The Apriori
property does not exist for JEPs and thus the algorithm has greater complexity. The JEPs can be
more efficiently discovered than general emerging patterns and are useful in building strong
classifiers (Li, Dong and Ramamohanarao 2001). Since emerging pattern mining deals
with several classes of data, each node in the prefix tree structure must hold the frequency of the
itemset for each class. Multiple transactions sharing the same itemset prefix are compressed and
merged into individual nodes with increased counts. The items in the prefix
tree are ordered by the inverse ratio tree ordering. The intuition for this ordering is that JEPs reside
much higher up in the tree than they would under the descending-frequency ordering of frequent
pattern trees (Han, Pei and Yin 2000). This helps to limit the depth of branch traversals needed to
mine emerging patterns.
The method uses the core Border-Diff function from (Dong and Li 1999) to return
the set of JEPs. The initially constructed prefix tree has a null root node, with each child node
being the root of a subtree named a component tree. Each component tree is traversed downward
along each of its branches. The nodes which contain a non-zero counter for the class for which the
JEPs are being discovered, and zero counters for every other class, are called base nodes. The significance of
these nodes is that the itemset spanning from the root of the branch to the base node is unique to
the class being processed. This itemset is therefore a potential JEP and hence any subset of this
itemset is also potentially a JEP. After identifying a potential JEP, it gathers up all negative
transactions that are related to it (i.e. share the root and base node). Border-diff is then invoked to
identify all actual JEPs contained within the potential JEP. After examining all branches for a
particular component tree, the branches are inserted into the remaining component trees after
removing the initial node of each. The component trees are traversed from the leftmost component
tree to find children as base nodes for potential JEPs. The method examines every potential JEP with
reference to every component tree (and thus every item in the problem), to ensure completeness.
The number of component trees is equal to the number of unique items in the datasets.
The ConsEPMiner (Zhang, Dong and Kotagiri 2000) discovers the emerging patterns
based on two types of constraints, namely external constraints and inherent constraints. The
external constraints are user-given minimums on support, growth rate and growth-rate
improvement. The support and growth rate directly prune the search space. The growth-rate
improvement is used to eliminate the uninteresting emerging patterns. A positive growth-rate
improvement ensures a concise representation of EPs which are not subsumed by previously
discovered patterns. The inherent constraints, namely same subset support, top growth rate and same
origin, are derived from dataset characteristics and properties of EPs. The inherent constraints
are also used for further pruning the search space. The search framework is made of a set
enumeration tree (i.e., SE-tree) which enumerates all the itemsets in breadth-first search. The two
types of constraints are used for limiting the search space by skipping a large number of itemsets.
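A sketch of how the external constraints might be checked for a candidate pattern (the exact form of the growth-rate-improvement test in ConsEPMiner is paraphrased here as the gain over the best subset's growth rate; all names are ours):

```python
def passes_external_constraints(supp: float, gr: float, best_subset_gr: float,
                                min_supp: float, min_gr: float,
                                min_gr_imp: float) -> bool:
    """Check a candidate pattern against user-given minimums on support,
    growth rate, and growth-rate improvement over its best subset so far."""
    return (supp >= min_supp
            and gr >= min_gr
            and gr - best_subset_gr >= min_gr_imp)
```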
Essential jumping emerging patterns (eJEPs) (Fan and Ramamohanarao 2002) are the
itemsets whose support grows from zero in one dataset to higher than a specified threshold in
another dataset, with no subset that is an essential jumping emerging pattern. eJEPs are minimal
itemsets. If the two data classes can be distinguished using fewer attributes, using more may not be
useful and may even add noise. In (Fan and Ramamohanarao 2002) the eJEPs are discovered
using a pattern tree (P-tree). A P-tree is an extended prefix-tree structure storing the quantitative
information about eJEPs. The count of each item for each dataset is registered, and items with larger
support ratios, which are closer to the root, make up the eJEPs. By depth-first search from the root,
the algorithm always finds the shortest one first. This process is completely different from FP-Growth
(Han, Pei and Yin 2000) based methods. It merges nodes during the search to ensure the complete set
of eJEPs is generated. The pattern growth is achieved by concatenating the prefix pattern with
the new one at a deeper level. As the algorithm is interested in the short eJEPs, the depth of the
search is not very deep (i.e., normally 5 to 10). More accurate classifiers are made out of the
smaller eJEPs compared to JEPs.
In (Bailey and Loekito 2010) a method is proposed for mining contrast patterns in
changing data based on the old and the current parts of a data stream. The method is focused on
jumping emerging patterns as a special type of contrast patterns. The minimal JEPs are discovered
in the data stream by adding new transactions and deleting the old transactions. This is different from
the problem of mining discriminative itemsets in data streams in this thesis, as the contrast
patterns are discovered in the old part (i.e., old class) compared to the recent part (i.e., recent
class) of a single data stream. The discriminative itemsets proposed in this thesis are discovered
in multiple data streams changing at the same time.
2.3.1.2 Delta-discriminative emerging patterns
The pattern mining based on support and confidence of itemsets may bring statistical
flaws to the results. The delta-discriminative itemsets are itemsets with ranked statistical merits
under different test statistics such as chi-square, risk ratio, odds ratio, etc. The 𝛿-discriminative
emerging patterns (Li, Liu and Wong 2007) are determined based on a threshold 𝛿. The
DPMiner algorithm (Li, Liu and Wong 2007) can efficiently mine the 𝛿-discriminative emerging
patterns. The algorithm skips the subsets of frequent itemsets if their support in the general dataset
is larger than 𝛿. However, for the discriminative itemsets proposed in this thesis, a subset of a
non-discriminative itemset can be discriminative. The algorithm also skips the redundant itemsets,
defined as the supersets of discriminative itemsets with the same or smaller infinite ratio between the
supports in the target and general datasets. An itemset with infinite ratio has high frequency in one
dataset and zero or very low frequency in the other datasets. Similar to the discriminative
itemsets, the delta-discriminative emerging patterns must be frequent in the target dataset
(positive data class), i.e., 𝑓𝑖 > minimum support, but with frequency less than delta in the
general dataset, i.e., 𝑓𝑗 < 𝛿, where 𝛿 is usually a small integer such as 1 or 2. The Discriminative
Pattern Miner (DPMiner) algorithm (Li, Liu and Wong 2007) efficiently discovers the delta-discriminative
emerging patterns having the maximum frequency in the contrasting classes (Dong
and Bailey 2012).
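The δ-discriminative condition reduces to a simple frequency test; a sketch (the function name is ours, with `f_target` and `f_general` standing for the raw counts 𝑓𝑖 and 𝑓𝑗 in the target and general datasets):

```python
def is_delta_discriminative(f_target: int, f_general: int,
                            min_support: int, delta: int) -> bool:
    """Frequent in the target (positive) class and nearly absent in the
    general class: f_i > minimum support and f_j < delta (delta usually 1 or 2)."""
    return f_target > min_support and f_general < delta
```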
The DPMiner algorithm is proposed based on the concept of equivalence classes. An
equivalence class (EC) is a set of itemsets that always occur together in the same set
of transactions. The equivalence class can be uniquely defined based on a closed pattern and a set
of generators. The closed pattern is a frequent itemset with no proper superset of the same
frequency (Zaki and Hsiao 2002). The generators are the itemsets with smaller frequency
compared to every immediate proper subset (Li et al. 2006). The generators are the minimal
itemsets of the equivalence class and the closed pattern is the maximal itemset. The key
idea in the DPMiner algorithm is to mine the concise representation of the equivalence classes. The
DPMiner employs the 𝛿 constraint to reduce the pattern search space by setting a border of non-𝛿-
discriminative emerging patterns. If an EC is 𝛿-discriminative and non-redundant, then none of
its subset ECs can be 𝛿-discriminative and none of its superset ECs can be non-redundant. The
DPMiner efficiently mines the delta-discriminative emerging patterns by skipping the subsets of
itemsets with 𝑓𝑗 > 𝛿.
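The grouping of itemsets into equivalence classes by their transaction sets, with the closed pattern as the maximal member and the generators as the minimal members, can be sketched by brute force (illustrative only; DPMiner mines the concise representation directly rather than enumerating all itemsets, and the names here are ours):

```python
from itertools import combinations

def equivalence_classes(transactions, max_len=3):
    """Group itemsets by the exact set of transactions (tidset) they occur in.
    Within each class, the closed pattern is the unique maximal itemset and
    the generators are the minimal itemsets. Brute-force sketch."""
    tidsets = {}
    for tid, t in enumerate(transactions):
        items = sorted(set(t))
        for k in range(1, min(max_len, len(items)) + 1):
            for sub in combinations(items, k):
                tidsets.setdefault(sub, set()).add(tid)
    classes = {}
    for itemset, tids in tidsets.items():
        classes.setdefault(frozenset(tids), []).append(itemset)
    result = []
    for tids, members in classes.items():
        closed = max(members, key=len)           # maximal member of the EC
        gens = [m for m in members               # members with no proper
                if not any(set(o) < set(m) for o in members)]  # subset in EC
        result.append({"closed": closed, "generators": gens,
                       "support": len(tids)})
    return result
```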
The δ-discriminative equivalence classes often have a value of infinity under relative
support (i.e., (𝑓𝑖/𝑛𝑖)/(𝑓𝑗/𝑛𝑗) = (𝑛𝑗 ∗ 𝑓𝑖)/(𝑛𝑖 ∗ 𝑓𝑗) = ∞), which is used in many statistical tests.
An equivalence class is redundant if its closed itemset is a superset of the closed itemset of
another EC, as all its itemsets are already subsumed by the subset closed itemset. This
emphasizes that only the most general δ-discriminative equivalence classes are non-redundant.
In the depth-first search the algorithm
stops the pattern mining along the branch whenever it reaches the δ-discriminative equivalence
class. This ignores the low-ranked equivalence classes as redundant. DPMiner is efficient as it
integrates the mining of closed patterns and generators into one depth-first framework.
The definition of the δ-discriminative equivalence classes is close to that of the jumping
emerging patterns (JEPs), but different in concept. A JEP is an itemset which occurs in one data
class and never occurs in the other classes. A δ-discriminative equivalence class is presented as a
group of itemsets, while a JEP is presented as a single itemset. The DPMiner deletes the redundancy
within an equivalence class, while redundancy exists in the JEPs. Also, the δ-discriminative
equivalence class must be frequent, but JEPs can be infrequent.
The CDPM method (Conditional Discriminative Patterns Mining) (He et al. 2017) is
also proposed for discovering a set of significant non-redundant discriminative patterns which
have no similar discrimination in their subsets. The discriminative itemsets whose frequency
difference in the two datasets (or two data classes) is greater than the defined significance threshold
are discovered. Out of these significant discriminative itemsets, only the itemsets whose frequency
difference in all their subsets is greater than the defined local significance threshold are identified.
The CDPM mainly focuses on the effectiveness of the patterns and not on the efficiency of the
algorithm in terms of time and space usage. It discovers the small number of patterns that have
discriminative power which cannot be obtained from their subsets (i.e., subsets of a conditional
discriminative pattern do not have the same discriminative power as the pattern does). The DPMiner (Li,
Liu and Wong 2007) and CDPM (He et al. 2017) discover the discriminative itemsets based on
their statistical measures. Discovering a small set of conditional discriminative patterns without
redundancy is preferable to generating all patterns. However, subsets of non-conditional
discriminative patterns could be discriminative, but will be missed by the CDPM algorithm. The
proposed methods in this thesis discover a complete set of discriminative itemsets based on their
explicit supports in the datasets and not on statistical measures. These are useful for the
applications using support and confidence for pattern mining. The discriminative itemsets are
used for mining discriminative rules based on the defined support threshold and confidence
threshold. The discriminative rules are then used for making rule-based classifiers.
Summary: The δ-discriminative emerging pattern mining skips the subsets of itemsets if their
support in the general dataset is larger than 𝛿. However, for the discriminative itemsets proposed
in this thesis, a subset of a non-discriminative itemset can be discriminative. The δ-
discriminative emerging pattern mining also ignores the low-ranked equivalence classes as redundant.
However, for the discriminative itemsets proposed in this thesis, explicit relative supports are
provided for every discriminative itemset. This supports users with more information for data
stream description and decision making. In the section below the detailed differences between
emerging patterns and discriminative itemsets are discussed.
2.3.1.3 Differences between emerging patterns and discriminative itemsets
The discriminative itemset mining is a novel research problem by mining a complete set
of the itemsets which are frequent in the target dataset based on the support threshold, and are
discriminative in the target dataset compared to the general dataset based on discriminative level
threshold. Despite the similarities between the definition of discriminative itemsets and Emerging
Patterns (EPs), they are different in several ways;
Firstly, in EPs the degree of change in supports of itemsets is important, and the actual
support of itemsets is not considered (Dong and Li 1999). EPs can be infrequent, which could
result in many EPs with low supports. In contrast, discriminative itemsets have to be frequent in
the target dataset based on the support threshold, and be discriminative in the target dataset
compared to the general dataset based on discriminative level.
Secondly, the proposed algorithms for emerging patterns are mostly focused on
representing these patterns in a compact way to avoid examining all the possible itemsets. The
real supports of the EPs are not explicitly presented as they are reported in a group between two
borders using the maximal itemsets. For the purpose of comparison between data streams, the
frequencies of both the highly and the lowly discriminative itemsets have to be known.
Thirdly, the EPs algorithms are mainly proposed for static datasets, except a small
number of works in stream mining (Alhammady and Ramamohanarao 2005; Bailey and Loekito
2010) based on the same idea of border definition. The method of (Alhammady and Ramamohanarao
2005) discovers the EPs in each block of transactions and then discards the block from the process.
Based on the defined minimum thresholds for each dataset, the maximal itemsets are extracted
separately and the borders are then defined. This border definition is not useful for data streams,
as the algorithms designed for data streams have to process the streams in one scan.
Fourthly, the EP methods generate all the combinations of the itemsets, which is not
suitable for discriminative itemset mining methods that aim to generate only the potential
combinations of itemsets. Also, compared with the proposed EP methods based on candidate
itemset generation, a discriminative itemset mining method is more efficient if designed based on
FP-Growth (Han, Pei and Yin 2000).
The 𝛿-discriminative emerging patterns proposed in (Li, Liu and Wong 2007) are
determined based on a threshold 𝛿. The DPMiner algorithm in (Li, Liu and Wong 2007) can
efficiently mine the 𝛿-discriminative emerging patterns by skipping the subset of itemsets if their
support in the general dataset is larger than 𝛿. The discriminative itemsets discussed in this thesis
are relatively discriminative in the target dataset compared to the general dataset. A subset of a
non-discriminative itemset can be discriminative. The delta-discriminative emerging patterns
therefore do not include some of the useful emerging patterns compared to the discriminative
itemsets defined in this thesis.
Discriminative itemsets which are frequent in both the target dataset and the general dataset,
but whose frequencies are relatively different in the two datasets, are not discovered as δ-
discriminative emerging patterns; for example, market basket itemsets that are frequent in all
suburbs but have relatively higher frequency in a target suburb with a smaller average age of
population. The DPMiner skips the redundant itemsets, i.e., supersets of itemsets with the
same infinite ratio between the supports in the target and general datasets. The discriminative
itemsets, in contrast, are discovered with explicit supports in the datasets and without redundancy; for
example, supersets of discriminative itemsets with different cardinalities are also discovered as
discriminative itemsets.
Summary: In the discriminative itemsets proposed in this thesis, the frequencies of
discriminative itemsets are derived and explicitly provided to the user together with the patterns.
The significance of discriminative itemset mining in comparison to the emerging pattern mining
is in the explicit cardinality of the itemsets in the datasets. Every discriminative itemset is
reported with its real supports in the target dataset and general dataset, respectively. Accurate
classification techniques can be defined based on the discriminative rules which are extracted
from the discriminative itemsets.
The proposed algorithms for emerging patterns are mostly focused on representing these
patterns in a compact way to avoid examining all the possible itemsets. Therefore, in the
proposed methods for discriminative itemset mining, the combination of itemset generation in an
FP-Tree should be restricted by defining useful heuristics based on both the general and the
specific features of the target data streams. The data streams should be scanned together and only
once. In contrast to the EP methods, the difference between very highly discriminative
itemsets and other discriminative itemsets is observable through the exact support values of the
itemsets, using the definitions of the support threshold and discriminative level.
2.3.2 Other contrast patterns
In other related research, the HFP-Tree method (Zhu and Wu 2007) is proposed for
discovering relational patterns across multiple databases based on the desired queries. The
relational patterns hold specific relationships that exist between the itemsets in the datasets. This
method finds the desired patterns based on the defined queries; for example, itemsets with
frequency higher than a specific threshold in datasets A and B. This is followed by the H-Stream
(Guo et al. 2011) for mining frequent patterns across multiple data streams in the tilted-time
window model. The H-Stream is an offline method based on FP-Growth (Han, Pei and Yin
2000) that does not restrict the generation of non-potential itemset combinations, which is time
and space consuming for large and fast-growing data streams. There are also research works
following the discriminative sequential changes (Patel, Hsu and Lee 2011; Gao et al. 2016) for
data stream differentiation. Contrast subspace mining has been proposed recently (Duan et al. 2014;
Duan et al. 2016) for discovering the subspaces that maximize the likelihood ratio of a query
in one class against another class; for example, patients with symptoms most similar to the cases of
disease A, and at the same time dissimilar to the cases of disease B.
In the section below, discriminative itemset mining in data streams and its challenges,
usefulness and applications are discussed.
2.3.3 Discriminative itemset mining in data streams
Discriminative pattern mining is a new research area in data stream mining (Lin et al.
2010). The discriminative itemsets are those itemsets that are frequent in the target data stream
and their frequencies in that stream are much higher than other data streams in the application
domain. Compared to the frequent itemsets and sequential patterns, the discriminative itemsets
are more distinctive and more directed for the purpose of comparison between data streams and
carry more valuable information. They can highlight the differences between patterns and trends
of different data streams more clearly and distinguish their major trends and be more useful for
the purpose of prediction mining. The discriminative itemsets can also be discovered for the
purpose of anomaly detection and personalization in the web applications. The generally frequent
itemsets are not distinctive (Lin et al. 2010) and do not suffice for their intended purposes.
In contrast to the numerous algorithms proposed on mining frequent itemsets in single
data streams, there has not been much research done for pattern mining in more than one data
stream. In (Amagata and Hara 2017) a method is proposed for mining top-k closed co-occurrence
patterns across multiple streams using the sliding window model. Pattern mining in more than
one data stream has the following challenges. Compared to the frequent itemset mining
techniques, discriminative itemset mining does not follow the Apriori property, and a subset
of a discriminative itemset can be non-discriminative. The exponential number of itemset
combinations demands high time and space complexities when the frequent itemset mining
algorithms are used on large and fast-growing data streams (Cheng, Ke and Ng 2008). The
designed methods should deal with the exponential number of itemset combinations generated in
more than one data stream, testing them against the discriminative itemset criteria while gaining
results efficiently.
Despite these challenges, discriminative itemset mining is an emerging research area.
Summary: One of the interesting real-world scenarios is the dynamic tracing of transactions
in market basket datasets. The itemsets that occur more frequently in one market compared to the
other markets are of interest. These are useful for identifying the customers or groups of customers
who have high interest in specific items compared to the rest of the population.
itemsets are useful for personalization or anomaly detection as well. Considering only the
frequent itemsets in data streams is not distinctive enough for these purposes, as they may be
generally frequent in other data streams as well (Lin et al. 2010). There are many other examples
that can show the usefulness and significance of discriminative pattern mining in data
streams. In network traffic monitoring, the discriminative patterns show the activity sets that
are more prominent for specific users in comparison to the rest of the group activities in the
whole network. This information is meaningful for anomaly detection. In another example, the
discriminative itemsets can be effectively used in search engines and news delivery services
for the purpose of personalization (Lin et al. 2010).
The essential issue in the above example applications is to find the itemsets that can
distinguish the target stream from all other streams. There is not much research done in pattern
mining in more than one data stream. A couple of methods have followed discriminative
item mining in data streams (Lin et al. 2010; Seyfi 2011), with challenges for expansion to
discriminative itemset mining. Another related research area, as explained above, is Emerging
Patterns (EPs) (Dong and Li 1999) with a close definition to the definition of discriminative
itemsets. The emerging patterns are described as itemsets whose frequencies grow significantly
higher in one dataset in comparison to another one.
2.3.4 Discriminative item mining in data streams
Discriminative items are the items that are frequent in the target data stream and
infrequent in the other data streams, which are collectively defined as the general data stream. In
(Lin et al. 2010) three different methods have been proposed for mining discriminative items in
data streams, namely the frequent-item-based method, the hash-based method and the hybrid method.
In the frequent-item-based method the Space-Saving algorithm (Metwally, Agrawal and
El Abbadi 2005) is used separately on each data stream and the results are then combined to find
the discriminative items. In this method the recall is highly dependent on the minimum support
error. However, because the frequent items are discovered separately in each data stream and the
comparison is done only after processing both data streams, many non-discriminative items may
be counted, i.e., frequent items in the target data stream 𝑆1 which are frequent in the general data
stream 𝑆2 as well. Applying the same method to discriminative itemset mining would cause the
same issue, as many of the frequent itemsets are frequent in both streams, but with a bigger
challenge because of the combinatorial itemset generation in multiple data streams.
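The counter-replacement rule at the heart of the Space-Saving algorithm (Metwally, Agrawal and El Abbadi 2005) can be sketched as follows (a minimal version; the per-counter error bookkeeping of the original is omitted):

```python
def space_saving(stream, k):
    """Space-Saving sketch: keep at most k counters. An untracked item
    evicts the current minimum counter and inherits its count plus one,
    an overestimate bounded by that minimum."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            victim = min(counters, key=counters.get)  # smallest counter
            count = counters.pop(victim)
            counters[item] = count + 1                # inherit and bump
    return counters
```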
In the hash-based method, the two data streams are processed together at the same time, so if one
item is found to be very frequent in the second data stream, its counting is stopped, saving
effort in both data streams. In this approach, items are assigned to different groups
named buckets, and the frequencies of all the items in one bucket are counted together using the
same counter. The buckets are expanded if they reach the discriminative bucket threshold, and the
frequencies of the smaller groups of items are then counted together. The issue in this method is that a
number of the discriminative items may be buried in non-discriminative buckets because of the
group frequency counting in each bucket and the effect of the other items' frequencies in the general
data stream. This results in lower recall for this method, as the discriminative items in the
potential but not expanded higher-level buckets are lost. The same issue would arise for
discriminative itemsets.
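A toy illustration of the shared-bucket idea (not the exact structure of Lin et al. 2010; the hash assignment and the expansion condition here are simplified assumptions of ours):

```python
def bucket_counts(stream_pairs, num_buckets):
    """Items from both streams share bucket counters: each (item, stream_id)
    event updates the bucket's per-stream count. Integer item ids keep the
    example deterministic across runs."""
    buckets = [[0, 0] for _ in range(num_buckets)]
    for item, stream_id in stream_pairs:
        buckets[hash(item) % num_buckets][stream_id] += 1
    return buckets

def potential_buckets(buckets, min_count, ratio):
    """Buckets whose aggregate counts meet a simplified discriminative
    condition become candidates for expansion into finer-grained buckets."""
    return [i for i, (c1, c2) in enumerate(buckets)
            if c1 >= min_count and c1 >= ratio * max(c2, 1)]
```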
The hybrid method tries to discover the concealed discriminative items in the non-discriminative
but potential buckets, improving the performance of the hash-based structure. For this
purpose, using a number of space-saving counters in the target stream, a summary of every
bucket is saved, which can be referred to as a sub-stream of the main stream. Items are assigned
to different expandable buckets and all the items in one bucket are counted together using the
same counter. Using this hybrid structure, in the current bucket, the frequent items of the target
data stream are found and by expanding that bucket, there would be a good chance to find the
discriminative items concealed in the non-discriminative but potential buckets. However, having
frequent items in the current bucket would not be a good heuristic for bucket expansion as the
frequent items may be frequent in the general data stream 𝑆2, as well (Seyfi 2011). The hybrid
method (Lin et al. 2010) is also designed for mining the discriminative items in the landmark
window model. The two data streams are processed together and the items are assigned to the
different buckets. In each bucket, frequency of items are counted by the same counter using the
space-saving algorithm (Metwally, Agrawal and El Abbadi 2005). Buckets are only expanded if
they meet the discriminative bucket conditions. Extending this method to other types of window
models will face different challenges in keeping the window model updated as items come in or
go out of the window frame.
Compared to discriminative itemset mining, these methods have simplicity due to not
dealing with combinatorial explosion. The process of discriminative itemset mining with these
methods would be time and space consuming, especially when dealing with large data streams. All
of the three mentioned methods have been designed for pre-defined thresholds and it is not
possible to change the thresholds after setting them up for the first time. These methods only
generate the discriminative items and not the itemsets.
The hybrid structure could possibly be expanded for discriminative itemset mining by
assigning several itemsets to the same bucket using an improved hash function. However, it will
face major challenges such as the very large exponential number of itemsets generated for each new
transaction, which will require a huge hybrid structure and a very complex hashing process for
assigning the best sets of itemsets to each bucket. This would not be a meaningful expansion, and
updating and restructuring the hybrid structure for the large number of itemsets generated for
window updating will be a complex process. If the hash assignment functions are not employed
properly, the number of concealed discriminative itemsets will increase, resulting in low recall.
The hybrid method also shows the approximate frequencies of the itemsets and the efficiency of
this method highly depends upon the hashing function and the group of itemsets assigned to the
Chapter 2: Literature review Page 31
© 2018 Queensland University of Technology-QUT, Science and Engineering Faculty Page 31
same bucket. Also, because the itemsets in each bucket are counted together, there would be no
separate frequency for every itemset, which is necessary for the historical tilted-time window
model: in the tilted-time window model, the frequency of each discriminative itemset must be
added to the historical windows.
The hierarchical counter approach (Seyfi 2011) is proposed for identifying the exact
frequencies of all items, including the infrequent items, in the datasets and then discovering the
discriminative items. The hierarchical counters hold real values, and each decimal position (a
different power of 10) is used for holding the frequency of one item. When the frequency of an
item passes 9, that digit is reset to zero and the corresponding counter at the next level of the
hierarchy is incremented; the same process applies to all counters in the hierarchy. In contrast
to the hybrid method (Lin et al. 2010), different thresholds can be used at different times. The
unique feature of this method is its counter structure, which appears to be more space efficient
than ordinary counters. As with the hybrid method, this method could also be extended to
discriminative itemset mining using the FP-Growth method, by appointing each group of
itemsets to a hierarchy of counters, and a better approximation in the results seems achievable
with this type of counter. However, defining the hierarchical counter structure for itemsets is
not feasible because of the explosion in the number of itemset combinations in discriminative
itemset mining. Also, appointing and tuning the hierarchical counters for the historical
tilted-time or real-time sliding window models would have high time complexity.
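The digit-packing idea behind the hierarchical counters can be sketched roughly as follows. This is a simplified reconstruction from the description above, not the author's exact data structure; the class name and the fixed number of levels are illustrative assumptions.

```python
class HierarchicalCounter:
    """Pack one decimal digit per item into a shared integer counter.

    The digit at position 10**item of levels[0] holds the units of that
    item's count; when it would pass 9 it resets to 0 and carries into
    the same position of levels[1] (the tens), and so on up the hierarchy.
    """
    def __init__(self, levels=3):
        self.levels = [0] * levels

    def increment(self, item):
        level = 0
        while level < len(self.levels):
            digit = (self.levels[level] // 10 ** item) % 10
            if digit < 9:
                self.levels[level] += 10 ** item
                return
            # digit passes 9: reset it and carry into the next level
            self.levels[level] -= 9 * 10 ** item
            level += 1

    def frequency(self, item):
        # reassemble the exact count from the per-level digits
        return sum(((lvl // 10 ** item) % 10) * 10 ** i
                   for i, lvl in enumerate(self.levels))
```

One shared integer per level thus records the counts of many items at once, which is where the claimed space saving over ordinary per-item counters comes from.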
The superiority of discriminative itemsets over discriminative items can be validated
with different real-world examples. In search engines, the discriminative items are the keywords
that are frequent in the target data stream and infrequent in the other data streams, whereas the
discriminative itemsets are the sets of keywords that appear together more frequently in the
target data stream than in the other data streams; for example, the discriminative itemset
'women's skinny jeans' compared to the discriminative item 'jeans'. Itemsets are more
interesting for the purposes of comparison, personalization and recommendation. Likewise, the
discriminative itemsets in the market basket are more useful than the discriminative items for
recommendation and personalization; for example, the discriminative itemset 'bread, beer, eggs'
compared to the discriminative item 'eggs'.
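A minimal sketch of such a discriminativeness test makes the search-engine example concrete. The ratio-based measure, the function name, the thresholds and the counts below are illustrative assumptions, not the thesis' exact definition.

```python
def discriminative_itemsets(freq_target, freq_general, n_target, n_general,
                            min_sup, theta):
    """Itemsets frequent in the target stream whose support is at least
    theta times their support in the general stream (assumed measure)."""
    result = {}
    for itemset, count in freq_target.items():
        sup_t = count / n_target
        sup_g = freq_general.get(itemset, 0) / n_general
        if sup_t >= min_sup and (sup_g == 0 or sup_t / sup_g >= theta):
            result[itemset] = (sup_t, sup_g)
    return result

# toy query-log counts over 100 transactions in each stream
target  = {("jeans",): 60, ("jeans", "skinny", "women"): 30}
general = {("jeans",): 50, ("jeans", "skinny", "women"): 2}
disc = discriminative_itemsets(target, general, 100, 100,
                               min_sup=0.2, theta=5.0)
# the keyword set is discriminative; 'jeans' alone is frequent everywhere
```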
Summary: Itemsets are preferable because individual items are less meaningful and carry
little knowledge on their own. However, itemset combination generation is a major barrier in
mining discriminative itemsets compared with mining discriminative items. Mining discriminative
itemsets in data streams appears to be far more complex than frequent itemset mining in a
single data stream. The Apriori property of frequent itemsets does not hold for discriminative
itemsets, as a subset of a discriminative itemset can be non-discriminative. Also, the algorithms
must be adapted to larger datasets and must deal with the inter-relationships between the
data streams.
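The failure of the Apriori property can be made concrete with a toy numeric example; the supports and the ratio-based discriminativeness test below are illustrative assumptions.

```python
def ratio(sup_target, sup_general):
    """Frequency ratio used as an (assumed) discriminativeness measure."""
    return float("inf") if sup_general == 0 else sup_target / sup_general

# toy supports (fraction of transactions) in the target and general streams
sup_target  = {("a",): 0.50, ("a", "b"): 0.40}
sup_general = {("a",): 0.45, ("a", "b"): 0.05}

theta = 4.0  # illustrative discrimination threshold on the ratio
is_disc = {x: ratio(sup_target[x], sup_general[x]) >= theta
           for x in sup_target}
# ('a','b') passes (ratio 8.0) while its subset ('a',) does not (ratio ~1.1),
# so Apriori-style pruning by subsets cannot be applied here.
```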
Considering the challenges in extending the current discriminative item mining
methods to discriminative itemset mining, the existing frequent itemset mining algorithms
following the Apriori or FP-Growth concepts are the better options for itemset generation in
more than one data stream. The number of itemsets generated by the FP-Growth algorithm is
lower than in Apriori-based algorithms. However, it should be noted that the concepts of closed
(Chi et al. 2006) and maximal (Farzanyar, Kangavari and Cercone 2012) itemsets, used in some
research to deal with the combinatorial itemset generation problem in data streams, are not
applicable for reducing the itemsets in discriminative itemset mining, as the Apriori property
does not hold.
In this thesis the tilted-time window model and the sliding window model are selected
for mining discriminative itemsets. Mining discriminative itemsets in data streams at multiple
time granularities and in real-time frames can be identified as one of the research gaps in the
area of pattern mining in data streams.
In many applications, rule mining is the next step after pattern mining. Discriminative
rules can be defined from discriminative itemsets as the sequential rules in one data stream that
have higher support and confidence than the same sequential rules in the other data streams.
These discriminative rules can be used for prediction mining in data streams using classification
techniques. Association rules in a single data stream and association classification techniques
are explained in the section below as concepts similar to discriminative rules and discriminative
classification.
2.4 ASSOCIATION RULE MINING
Algorithms for association rule mining consist of two steps. First, they discover
the itemsets that meet the support threshold; second, based on these discovered frequent
itemsets, they derive the association rules that meet the confidence criterion. Several
approaches have been proposed for association rule mining, such as AprioriAll (Peng and Liao
2009), GSP (Agrawal and Srikant 1994), SPADE (Zaki 2001), SPAM (Ayres et al. 2002) and
PrefixSpan (Han et al. 2001). Algorithms proposed for sequential rule mining in data streams
should follow the requirements of data stream mining algorithms, adopting an incremental,
one-pass mining approach and defining a synopsis structure. Also, in order to avoid
the concept drift problem (Wang et al. 2003), the data mining methods have to adapt
to changes in the data distribution (Jiang and Gruenwald 2006).
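The two-step procedure (support-frequent itemsets first, confident rules second) can be illustrated with a brute-force sketch; this is illustrative only, as real miners such as Apriori avoid enumerating every subset.

```python
from itertools import combinations

def association_rules(transactions, min_sup, min_conf):
    """Toy two-step miner: (1) frequent itemsets by support,
    (2) rules X -> Y (Y = itemset minus X) meeting the confidence bound."""
    n = len(transactions)
    # step 1: count every subset of every transaction (brute force)
    counts = {}
    for t in transactions:
        items = frozenset(t)
        for r in range(1, len(items) + 1):
            for s in combinations(sorted(items), r):
                key = frozenset(s)
                counts[key] = counts.get(key, 0) + 1
    frequent = {s: c / n for s, c in counts.items() if c / n >= min_sup}
    # step 2: derive rules from frequent itemsets of size >= 2;
    # every subset of a frequent itemset is itself frequent (Apriori)
    rules = []
    for s, sup in frequent.items():
        if len(s) < 2:
            continue
        for r in range(1, len(s)):
            for lhs in combinations(sorted(s), r):
                lhs_set = frozenset(lhs)
                conf = sup / frequent[lhs_set]
                if conf >= min_conf:
                    rules.append((tuple(sorted(lhs_set)),
                                  tuple(sorted(s - lhs_set)), conf))
    return rules
```

For example, over four baskets where bread and butter co-occur in half the transactions, the rule butter -> bread comes out with confidence 1.0.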
The existing algorithms for sequential rule mining in data streams are discussed briefly
in the section below.
2.4.1 Association rule mining in data streams
Several algorithms have been proposed for association rule mining in data streams. In (Shin and
Lee 2008) a method is proposed for mining association rules directly over the changing set of
currently frequent itemsets, which are generated over an online data stream using the estDec
method (Hyuk and Lee 2003). In (Çokpınar and Gündem 2012), a rule mining system is
proposed for mining positive and negative association rules in XML data streams. The rules
extracted by association rule mining techniques are generally positive rules; however, there is
some work on negative association rules (Antonie and Zaïane 2004; Wu, Zhang and Zhang
2004). Itemsets with support greater than the positive threshold are used for mining positive
association rules, and itemsets with support less than the negative threshold are used for mining
negative association rules. The PNRMXS algorithm proposed in (Çokpınar and Gündem 2012)
mines association rules in two steps, frequent itemset mining followed by sequential rule
mining, and is designed for a time-based sliding window model on a data stream. In (Ahmed et
al. 2012) a tree structure, HUS-Tree (high utility stream tree), and an algorithm called HUPMS
(high utility itemset mining over stream data) are proposed for incremental and interactive
high-utility pattern mining over a sliding window in data streams. The negative rule mining
approach introduced in (Yuan et al. 2002) reduces the search space for mining negative rules at
the cost of losing many negative relations in the data.
One of the drawbacks of current methods for sequential rule mining in data streams is
that the user can only define the parameters before execution and cannot adjust them during
the process (Jiang and Gruenwald 2006).
Similar to frequent itemset mining followed by association rule mining, discriminative
itemset mining in data streams can be followed by discriminative rule mining. These
discriminative rules can be used for defining accurate classification techniques: a
discriminative classification can be defined based on discriminative rule mining. In the
section below, association classification, as one of the classification rule mining methods, is
discussed along with its challenges, usefulness and applications.
2.4.2 Classification rule mining
Classification is one of the main data mining techniques used in applications like
personalization, anomaly detection, recommendation and prediction. Many classification
techniques have been developed for discovering a small subset of rules from a dataset that
can be used for prediction purposes in real-world applications. These methods are grouped into
different categories like decision trees (Quinlan 2014), rule learning (Clark and Niblett 1989),
naïve Bayes classification (Duda and Hart 1973) and the statistical approach (Lim, Loh and Shih
2000). Many algorithms have also been proposed for data stream classification (Alhammady and
Ramamohanarao 2005; Aggarwal 2007; Li et al. 2009; Hashemi et al. 2009; Zhao, Wang and Xu
2012; Masud et al. 2012). These methods mainly work by mining patterns from a single data
stream. As with the other data stream mining algorithms, the challenges in developing data
stream classifiers are the fast growth of data streams, unbounded memory requirements and
concept drift (Aggarwal 2007; Manku 2016). These challenges make it necessary to keep the
compact set of rules synchronized with the fast pace of the data streams.
Association classification (AC) (Ma 1998; Li, Han and Pei 2001) is proposed based on
the integration of association rule mining and classification techniques. These are closely
related concepts, with the difference that classification predicts class labels whereas
association rules show the relationships between items in transactions (Thabtah 2007). In
association rule mining the targets are not predetermined, while in classification the classes are
defined sets of targets (Ma 1998). Among the current studies, association classification is a
promising approach that can compete with decision tree, rule induction and probabilistic
classifiers (Ma 1998; Thabtah 2007). AC methods involve four steps: rule ranking, rule pruning,
rule prediction and rule evaluation (Ma 1998).
Rule mining is the most time-consuming step in traditional AC mining approaches
and is not applicable to fast-growing, large data streams. The number of rules discovered
in data streams can also be large, causing the other steps to take longer. There are
several studies on using association classification in data streams (Song and Li 2010; Su, Liu and
Song 2011; Lakshmi and Reddy 2012; Saengthongloun et al. 2013; Waiyamai et al. 2014;
Kompalli and Cherku 2015). Discriminative itemsets, defined as frequent itemsets in a data
stream whose frequency is higher than their frequency in the general data stream, can be used
for discovering discriminative rules as a new set of rules. Discriminative rules are useful for
classification purposes, being smaller in number and carrying better knowledge for
differentiating the data streams. Discriminative classification can be defined following a
concept similar to association classification, based on discriminative rule mining.
Summary: The development of an effective algorithm for mining discriminative rules
depends highly on the effectiveness of the algorithms used for mining discriminative itemsets.
However, the rule mining challenges must also be addressed in the different window models.
These challenges mostly relate to managing the number of transactions that fit in the window
sizes and to updating the results with every incoming transaction.
In this thesis, several new techniques are developed for mining discriminative itemsets
in data streams. In the section below, the new knowledge contributed to the research area
is verified.
2.5 VERIFYING THE NEW KNOWLEDGE CONTRIBUTED TO THE AREA
In this chapter we explained many works related to the problem of mining discriminative
itemsets in data streams. The problem of mining discriminative itemsets in data streams is
motivated by mining frequent itemsets in data streams (Manku and Motwani 2002), mining
discriminative items in data streams (Lin et al. 2010) and mining emerging patterns (Dong and Li
1999). Discriminative itemsets are those itemsets that are frequent in the target data stream
and whose frequencies in that stream are much higher than in the other data streams in the
application domain. We explained the differences between emerging patterns and discriminative
itemsets, and between discriminative items and discriminative itemsets, in detail in the
sections above. Compared to frequent itemsets, discriminative itemsets have sparsity
characteristics: the number of discriminative itemsets is much smaller than the number of
frequent itemsets. However, discriminative itemsets are more directed to the purpose of
comparison between data streams, and they can distinguish the datasets in different real-world
scenarios.
Discriminative itemsets are significant when there is no inherent discrimination in
the datasets (data classes). In many application domains there are inherent conceptual
differences between datasets (data classes); for example, in the mushroom dataset from the UCI
repository (Dheeru and Karra Taniskidou 2017) the two classes are edible and poisonous. In this
type of application, emerging pattern mining is useful. Most of the emerging patterns in such
applications are jumping emerging patterns, i.e., a special type of emerging pattern whose
support increases from zero in one dataset to non-zero in another dataset. We observed that in
emerging pattern mining different techniques are used to reduce the number of emerging
patterns to a subset of useful emerging patterns. This is not useful in applications that do not
have inherent discrimination between the datasets, for example, two different markets in market
basket analysis. In this type of application, discriminative itemsets are discovered with high
frequency in all datasets.
Another significant property of discriminative itemsets is that every discriminative
itemset is discovered with explicit frequencies in the datasets. This is useful for pattern mining
in applications based on support and confidence, for example, rule mining applications.
Emerging pattern mining mainly defines the desired emerging patterns based on ranked
statistical merits under different test statistics such as chi-square, risk ratio and odds ratio
(Dong and Bailey 2012). Emerging pattern mining methods use heuristics to obtain a compact
set of emerging patterns, as discovering a small set of emerging patterns without redundancy is
preferred to generating all patterns. In contrast, the methods proposed in this thesis efficiently
discover a complete set of discriminative itemsets based on their explicit supports in the
datasets rather than on statistical measures.
Also, the discriminative itemset mining algorithms proposed in this thesis deal, for the
first time, with mining discriminative itemsets in data streams. Based on the literature reviewed
in this chapter, there is no similar work on data streams. In this thesis, we contribute methods
for mining discriminative itemsets in datasets in a batch of transactions, mining discriminative
itemsets in data streams using the tilted-time window model, and mining discriminative itemsets
in data streams using the sliding window model.
In the first step, in Chapter 3, the discriminative itemsets are discovered from a batch
of transactions belonging to the datasets, for offline updating of the window models. The most
challenging issue in this step is efficiency. The proposed algorithms for mining discriminative
itemsets in a batch of transactions have to be efficient in dealing with the exponential number of
itemsets generated in more than one data stream. The Apriori property defined for frequent
itemsets does not hold in discriminative itemset mining; that is, not every subset of a
discriminative itemset is discriminative. The proposed algorithms are designed around concise
processing and data structures to save time and space. The discriminative itemsets in a batch of
transactions are discovered and used for updating the different window models in an offline
state. The algorithms must achieve full accuracy and recall in a batch of transactions, i.e., no
false positive or false negative answers. Approximation is not acceptable in batch processing, as
each offline update of the window model would compound the approximation further.
In the second step, in Chapter 4, the discriminative itemsets are discovered from a
batch of transactions belonging to the datasets, for offline updating of the tilted-time window
model. The tilted-time window model is updated in an offline state at the specific time intervals
defined for the batches of transactions. The discriminative itemsets discovered from the new
batch of transactions are saved in the current window frame of the tilted-time window model.
To improve the approximation of discriminative itemsets in the tilted-time window model, the
sub-discriminative itemsets are also defined; these are retained because they may become
discriminative in the future when window frames are merged. The discriminative and
sub-discriminative itemsets in the older window frames are obtained by shifting and merging
the itemsets. The approximation of discriminative itemsets in the tilted-time window model
worsens in the larger window frames, which are produced by merging the approximate smaller
window frames. Determinative properties of the discriminative itemsets in the tilted-time
window model are applied to decrease the number of false positive and false negative answers
in the tilted-time window model.
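The shift-and-merge mechanics of a logarithmic tilted-time window can be sketched as follows. This is a generic reconstruction with at most two summaries per granularity, tracking one itemset's counts; the thesis' exact model and its sub-discriminative bookkeeping are not reproduced.

```python
class TiltedTimeWindow:
    """Sketch of a logarithmic tilted-time window for one itemset's counts.

    slots[i] keeps at most two summaries, each covering 2**i batches;
    when a third arrives, the two oldest merge and shift into slots[i+1],
    so k levels summarise an exponentially growing history.
    """
    def __init__(self):
        self.slots = []

    def add_batch(self, freq):
        carry, level = freq, 0
        while carry is not None:
            if level == len(self.slots):
                self.slots.append([])
            self.slots[level].insert(0, carry)   # newest summary first
            if len(self.slots[level]) > 2:
                # merge the two oldest same-granularity summaries upward
                carry = self.slots[level].pop() + self.slots[level].pop()
                level += 1
            else:
                carry = None

    def total(self):
        # overall frequency across the whole recorded history
        return sum(sum(s) for s in self.slots)
```

After seven unit batches the structure holds summaries at three granularities (1, 2 and 4 batches), which is why approximation error introduced in small frames propagates into the merged, coarser frames.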
In the third step, in Chapter 5, the discriminative itemsets are discovered from a batch
of transactions belonging to the datasets, for offline updating of the sliding window model. The
sliding window model is updated in an offline state at the specific time intervals defined for the
batches of transactions. The new batch of transactions is processed, the discriminative itemsets
are saved in the sliding window frame, and the itemsets belonging to the oldest partition out of
the sliding
window frame are deleted. The proposed algorithm for the offline sliding window model must
achieve full accuracy and recall. Approximation is not accepted in the sliding window model, as
each offline update of the window model would compound the approximation further.
Heuristics are proposed for mining the exact set of discriminative itemsets in the offline sliding
window model. The sub-discriminative itemsets are also saved, as they may become
discriminative in the future as the window frame slides online. Online sliding happens between
two offline updates and is limited to the itemsets already in the window frame; the online
sliding window model mines discriminative itemsets in an online state with approximate results.
Finally, in Chapter 6, the proposed methods are evaluated on several synthetic and
real datasets with different characteristics, for both efficiency and effectiveness. The input
batches of transactions are generated synthetically in a wide variety of settings. Different
parameter settings are tested to show the effect of each parameter on algorithm efficiency and
on the discovered patterns. The proposed methods are compared with the baseline methods for
both efficiency and effectiveness. The different behaviours of the proposed algorithms with
different input data streams and different input parameters are demonstrated, and the
effectiveness of the discriminative itemsets is presented in real-world applications.
In the section below the research gaps are identified to be addressed in this thesis.
2.6 RESEARCH GAPS AND IMPLICATIONS
In spite of the abundance of research on frequent itemset mining in data streams, there
is little research on pattern mining in more than one data stream. Almost all of the
proposed algorithms are developed for a single data stream (Aggarwal 2007; Cheng, Ke and Ng
2008; Manku 2016). Also, most emerging pattern mining algorithms are proposed for static
datasets rather than data streams. Given this research gap, more attention needs to be paid to
the area of pattern mining in more than one data stream. Discriminative itemsets are defined as
the frequent itemsets in the target data stream whose frequencies are much higher than those of
the same itemsets in the other data streams. Compared to frequent itemset and sequential rule
mining in single data streams, discriminative itemsets and rules are more directed to the purpose
of comparison between data streams: the discriminative itemsets distinguish the target data
stream from the other data streams. Emerging patterns are the concept closest to discriminative
itemsets. As with emerging patterns (Dong and Li 1999), mining discriminative itemsets from
more than one data stream can reveal interesting insights. Like EPs in static datasets, the
discriminative itemsets can be used for prediction purposes by defining classifier techniques, or
for highlighting the differences between different data streams.
As discussed before, most of the existing emerging pattern mining methods have been
designed for static datasets (Dong and Li 1999; Fan and Ramamohanarao 2002; Zhang, Dong
and Kotagiri 2000; Bailey, Manoukian and Ramamohanarao 2002; Alhammady and
Ramamohanarao 2005). These methods try to define left and right borders of maximal itemsets
and use them to reach the EPs. They scan each dataset separately and several times, which is
not acceptable in data stream environments. They also suffer from a large number of candidate
itemsets, which again would be a major barrier in data streams. We need to propose algorithms
that avoid processing non-potential itemsets during discriminative itemset mining.
Classification is one of the main applications of emerging patterns. Many
classification techniques have been proposed based on emerging patterns, both for static
datasets (Li, Dong and Ramamohanarao 2000, 2001; Fan and Ramamohanarao 2002, 2003) and
for stream data (Alhammady and Ramamohanarao 2005; Yu et al. 2012). Discriminative
subsequence mining has also been used for classification purposes (Nowozin, Bakir and Tsuda
2007). However, the type and the concept of these patterns are very different from
discriminative itemsets. Discriminative itemset mining in data streams can lead on to
discriminative rule mining and discriminative classification.
A number of methods have been proposed for mining discriminative items in data
streams. Discriminative itemsets are more valuable in the data mining process, as individual
items are less meaningful and carry little knowledge on their own. Also, the discriminative
rules, which are very important for prediction mining, can be extracted from the discriminative
itemsets.
Every proposed algorithm for data stream mining must be effective and efficient in
both time and space to be useful in real-time situations with large, fast-growing data streams.
This research uses the inherent heuristics of the target datasets to optimize the data mining
process. In summary, the following research gaps will be addressed by this research:
Lack of a method for differentiating between data streams by highlighting the
discriminative itemsets.
Lack of a method for efficient discriminative itemset mining at multiple time
granularities.
Lack of a method for efficient discriminative itemset mining in a sliding real-time
window frame.
Lack of efficient discriminative itemset mining methods optimized for the
general and specific characteristics of the data streams in different window
models.
Chapter 3: Mining Discriminative Itemsets in Data Streams
In this chapter, the problem of mining discriminative itemsets in data streams is formally defined.
The comprehensive research problem is outlined and two methods are proposed. The primary
method, called DISTree, is a simple extension based on the basics of FP-Growth (Han, Pei and
Yin 2000). The advanced, efficient method, called DISSparse, is based on the sparse
characteristics of discriminative itemsets. The proposed methods are explained in detail, with a
deep discussion of their limitations in real-world applications. The discriminative itemsets are
discovered from a batch of transactions and used for updating the window models in an offline
state. The proposed methods are proved to achieve full accuracy and recall in mining
discriminative itemsets in one batch of transactions. The proposed methods are extensively
evaluated in Chapter 6 on various datasets of different sizes exhibiting diverse characteristics in
terms of transaction size, frequent-pattern size and number of unique items, using different
thresholds. Empirical analysis shows the much better time and space complexity of the
DISSparse algorithm proposed in (Seyfi et al. 2017) in comparison with the modified version of
the DISTree algorithm proposed in (Seyfi, Geva and Nayak 2014). To the best of our knowledge,
the proposed methods in this chapter are the first algorithms for mining discriminative itemsets.
The chapter starts by describing the existing works in Section 3.1. The mathematical
definition of the research problem, using several notations, is given in Section 3.2. In
Section 3.3 the DISTree method is proposed as the primary method for mining discriminative
itemsets. In Section 3.4 the DISSparse method is proposed as the advanced method for efficient
mining of discriminative itemsets. The chapter concludes with a discussion of the DISTree and
DISSparse methods and their state-of-the-art techniques in Section 3.5. This chapter covers the
papers Mining Discriminative Itemsets in Data Streams (Seyfi, Geva and Nayak 2014) and
Efficient Mining of Discriminative Itemsets (Seyfi et al. 2017).
3.1 EXISTING WORKS
FP-Growth is a well-known method for mining frequent patterns by pattern
fragment growth (Han, Pei and Yin 2000). A large database is compressed into a much smaller
data structure called the FP-Tree. The FP-Tree-based mining method avoids generating a large
number of candidate itemsets by using the pattern fragment growth method, and a
divide-and-conquer strategy decomposes frequent pattern mining into smaller tasks.
The frequent pattern tree, or FP-Tree for short, is a compact prefix-tree structure used for
holding information about the frequent patterns. It is constructed from only the frequent items
during the first scan of the datasets, which removes the need to scan the datasets repeatedly.
Transactions with identical frequent patterns can be merged, with the number of occurrences
recorded as the frequency. To make this possible, the items in every transaction are ordered in a
fixed order. If the frequent items in each transaction are ordered by descending frequency, more
prefixes can be shared between transactions: items with higher frequency have a better chance
of sharing nodes than less frequent items. To facilitate tree traversal, a header table is
constructed from the items, in which each item points to its occurrences in the tree via a linked
list.
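The construction step described above can be sketched as follows. This is a minimal single-dataset version that omits the header table and its linked lists; the names are illustrative.

```python
from collections import Counter

class Node:
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

def build_fp_tree(transactions, min_count):
    """Insert each transaction's frequent items, ordered by descending
    global frequency, into a prefix tree so common prefixes share nodes."""
    freq = Counter(i for t in transactions for i in set(t))
    # rank: most frequent first, ties broken alphabetically
    rank = {i: (-c, i) for i, c in freq.items() if c >= min_count}
    root = Node(None)
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.setdefault(item, Node(item))
            child.count += 1
            node = child
    return root
```

Because every transaction is reordered the same way, all transactions containing the most frequent item start from the same child of the root, which is the prefix sharing the text describes.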
Because the frequent items in transactions are shared, the FP-Tree is much smaller than the
original dataset, which allows the subsequent mining to work on a rather small data structure.
Pattern fragment growth based on the FP-Tree starts from frequent length-1 patterns. A
conditional FP-Tree is constructed from the set of frequent items that co-occur with a specific
item as suffix, and mining continues recursively on such trees. The patterns are discovered by
concatenating the suffixes with the new patterns generated from the conditional FP-Tree. The
divide-and-conquer, partitioning-based approach reduces the size of the generated patterns and
also simplifies the generation of long patterns by concatenating shorter ones with the suffix.
Mining discriminative patterns using the FP-Tree suffers from the problem of combinatorial
candidate generation and testing. In FP-Growth, a recursive process obtains the patterns by
pattern fragment growth. However, the pattern fragment growth method cannot be applied
directly to the problem of discriminative itemset mining, as discussed in detail in Section 3.3.
We modify the original FP-Growth method by placing multiple counters in each node of the
prefix tree, i.e., for recording the frequencies in the different datasets. We use a linear algorithm
to discover all the discriminative itemsets, as explained in the full chapter. For efficiency, we
propose two heuristics to eliminate non-potential itemsets from the itemset combination
generation. We compare the methods in terms of time and space usage in Chapter 6.
The Discriminative Pattern Miner (DPMiner) algorithm (Li, Liu and Wong 2007) is a
well-known method for efficient mining of the delta-discriminative emerging patterns
(Dong and Bailey 2012) based on the concept of equivalence classes. An equivalence class (EC)
is a set of itemsets that always occur together in the same set of transactions. The
equivalence classes are uniquely defined by their closed patterns and sets of generators. A
closed pattern is a frequent itemset with no proper superset of the same frequency (Zaki and
Hsiao 2002). Generators are itemsets whose frequency is smaller than that of every
immediate proper subset (Li et al. 2006). The generators are the minimal itemsets in the
equivalence class and the closed pattern is the maximal itemset. The key idea of the DPMiner
algorithm is to mine a concise representation of the equivalence classes.
The 𝛿-discriminative emerging patterns must be frequent in the target dataset (𝑓𝑖 > 𝑚𝑠)
and have frequency less than 𝛿 in the other datasets (𝑓𝑗 < 𝛿). The DPMiner mines the 𝛿-
discriminative emerging patterns by skipping the subsets of itemsets with frequency 𝑓𝑗 > 𝛿
during depth-first search. It also skips the redundant itemsets, i.e., supersets of itemsets with
the same infinite ratio between the supports in the target and general dataset. An itemset with
infinite ratio has high frequency in one dataset and zero or very low frequency in the other
datasets. The DPMiner employs the 𝛿 constraint to reduce the pattern search space by setting a
border of non-𝛿-discriminative emerging patterns. If an EC is 𝛿-discriminative and non-
redundant, then none of its subset ECs can be 𝛿-discriminative and none of its superset ECs can
be non-redundant. This causes a wide difference in the number of discovered itemsets, as the
redundant 𝛿-discriminative emerging patterns are excluded. An equivalence class is redundant if
the closed itemset of one EC is a subset of the closed itemset of another EC, as all its itemsets are
already subsumed by the subset closed itemset. This emphasizes that only the most general 𝛿-
discriminative equivalence classes are non-redundant, and the lower-ranked equivalence
classes are ignored as redundant.
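For illustration of these definitions (a brute-force sketch, not the DPMiner algorithm itself), the following Python fragment groups all itemsets of a toy transaction set into equivalence classes and reports each class's closed pattern and generators; the transactions and names are invented for the example.

```python
from itertools import combinations

# Toy transactions standing in for the target dataset; purely illustrative.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
items = sorted(set().union(*transactions))

def support_set(itemset):
    """Ids of the transactions that contain the itemset."""
    return frozenset(i for i, t in enumerate(transactions) if itemset <= t)

# Group itemsets that occur in exactly the same transactions:
# each group is one equivalence class (EC).
classes = {}
for r in range(1, len(items) + 1):
    for combo in combinations(items, r):
        I = frozenset(combo)
        sup = support_set(I)
        if sup:
            classes.setdefault(sup, []).append(I)

for sup, members in classes.items():
    closed = max(members, key=len)                     # the maximal itemset
    generators = [I for I in members
                  if not any(J < I for J in members)]  # the minimal itemsets
    print(sorted(closed), [sorted(g) for g in generators], len(sup))
```

On this toy data, for instance, {a}, {b} and {a, b} occur in exactly the same three transactions, so {a, b} is the closed pattern of that class and {a}, {b} are its generators.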
The discriminative itemsets defined in this thesis are measured by their relative
occurrences in the target and general dataset; the 𝛿-discriminative emerging patterns, however,
are measured by their frequency (i.e., < 𝛿) in the general dataset. A subset of a non-discriminative
itemset can be discriminative. The 𝛿-discriminative emerging pattern mining also ignores the low-
ranked equivalence classes as redundant. For the discriminative itemsets proposed in
this thesis, in contrast, explicit relative supports are provided for every discriminative itemset.
Discriminative itemset mining is a novel research problem that mines the complete set of
itemsets which are frequent in the target dataset and are relatively discriminative in the target
dataset compared to the general dataset. In this chapter we modify the original DPMiner to
include all the 𝛿-discriminative itemsets. We also modify the DISSparse method proposed in this
chapter, and its heuristics, to find all the 𝛿-discriminative itemsets. We compare these
methods in terms of time and space usage for the desired 𝛿-discriminative emerging patterns in
Chapter 6.
The main contribution of this chapter is two algorithms for mining
discriminative itemsets. The DISTree method (Seyfi, Geva and Nayak 2014) is proposed in
Section 3.3 as a simple expansion of the FP-Growth method. The efficient DISSparse
method (Seyfi et al. 2017) is proposed in Section 3.4 based on two heuristics that skip the non-
discriminative itemsets. The technical details of the proposed algorithms are provided in a way
that distinguishes them from the existing FP-Growth method (Han, Pei and Yin 2000).
3.2 RESEARCH PROBLEM
A data stream is defined as a dynamic, large, ordered sequence of transactions arriving
at high speed over time. In most data stream mining applications, the stream can be read only
once because of limited computing and storage capabilities (Manku and Motwani 2002). Stock
market monitoring, online market basket analysis, network access patterns, search engine
queries in a region, etc., can be modelled as data streams. The discriminative itemsets are
defined as the frequent itemsets in the target data stream that have frequencies much higher
than those of the same itemsets in other data streams. Without loss of generality, we refer to
the other data streams as the 'general data stream' for the sake of simplicity. The
discriminative itemsets are relatively frequent in the target data stream and relatively infrequent
in the general data stream. An essential issue in this research problem is to find the itemsets that
can distinguish the target data stream from all other data streams.
Many real-world scenarios show the significance of discriminative itemset mining in
data streams. In online monitoring of market basket transactions, the itemsets that occur more
frequently in one market compared to the other markets are of interest. These are useful for
identifying the specific sets of items which are of high interest in one market compared to the
other markets. The discovered itemsets are useful for personalization, anomaly detection or
prediction purposes. In web page personalization, the discriminative itemsets are groups of web
pages visited together by specific user groups much more frequently than by other user groups.
In search engines, the discriminative itemsets are the sequences of queries asked with higher
support in one geographical area compared to another area. In network traffic monitoring,
discriminative itemsets are the concurrent activities of one user which are more frequent than
the activities of the rest of the same group across the whole network. Discriminative itemsets
highlight the differences between data streams (Lin et al. 2010; Seyfi, Geva and Nayak 2014;
Seyfi et al. 2017). They can be used in classification methods by distinguishing the trends in the
target data stream from other data streams.
The discriminative itemsets are a sparse subset of the frequent itemsets. There are additional
challenges in mining discriminative itemsets in data streams, as the problem does not follow the
Apriori property defined for frequent itemset mining and a subset of a discriminative itemset can
be non-discriminative. Additionally, any discriminative itemset mining method has to deal with
the exponential number of itemsets generated in more than one dataset. These challenges impose
high time and space complexity. Despite these challenges, discriminative itemset mining is an
emerging research area with great potential. Frequent itemsets can be frequent in all data streams
and thus not distinctive, while discriminative itemsets carry distinctive knowledge and can be
used for comparison between data streams.
In this chapter, discriminative itemset mining is discussed based on a single
batch of transactions 𝐵 made of two data streams 𝑆𝑖 and 𝑆𝑗. Later, in Chapter 4 and
Chapter 5, we discuss the problem in detail using the tilted-time window model and the sliding
window model, respectively.
3.2.1 Formal definition
Let ∑ be the alphabet set of items. A transaction 𝑇 = {𝑒1, … 𝑒𝑖 , 𝑒𝑖+1, … , 𝑒𝑛}, 𝑒𝑖 ∈ ∑, is
defined as a set of items in ∑. The items in a transaction are in alphabetical order by default, for
ease of describing the mining algorithm. The two data streams 𝑆𝑖 and 𝑆𝑗 are defined as the target
and general data stream; each consists of a different number of transactions, i.e., 𝑛𝑖 and 𝑛𝑗 (𝑛𝑖 ≠
𝑛𝑗), respectively. An itemset 𝐼 is defined as a subset of ∑. The itemset frequency is the number of
transactions that contain the itemset. The frequency of the itemset 𝐼 in data stream 𝑆𝑖 is denoted
as 𝑓𝑖(𝐼) and the frequency ratio of itemset 𝐼 in data stream 𝑆𝑖 is defined as 𝑟𝑖(𝐼) = 𝑓𝑖(𝐼)/𝑛𝑖. In
this chapter, if the frequency ratio of itemset 𝐼 in the target data stream 𝑆𝑖 is larger than the
frequency ratio in the general data stream 𝑆𝑗, i.e., 𝑟𝑖(𝐼)/𝑟𝑗(𝐼) > 1, then the itemset 𝐼 can be
considered as a discriminative itemset. Let 𝑅𝑖𝑗(𝐼) be the ratio between 𝑟𝑖(𝐼) and 𝑟𝑗(𝐼), i.e.,
𝑅𝑖𝑗(𝐼) = 𝑟𝑖(𝐼)/𝑟𝑗(𝐼). Obviously, the higher the 𝑅𝑖𝑗(𝐼), the more discriminative the itemset 𝐼 is.
To more precisely define discriminative itemsets, we introduce a user-defined threshold
𝜃 > 1, called the discriminative level threshold, with no upper bound. An itemset 𝐼 is considered
discriminative if 𝑅𝑖𝑗(𝐼) ≥ 𝜃. This is formally defined as:

𝑅𝑖𝑗(𝐼) = 𝑟𝑖(𝐼)/𝑟𝑗(𝐼) = (𝑓𝑖(𝐼)𝑛𝑗)/(𝑓𝑗(𝐼)𝑛𝑖) ≥ 𝜃 (3.1)

The 𝑅𝑖𝑗(𝐼) could be very large even with very low 𝑓𝑖(𝐼). In order to accurately identify
discriminative itemsets which have reasonable frequency, and also to handle the case of 𝑓𝑗(𝐼) = 0,
we introduce another user-specified support threshold, 0 < 𝜑 < 1/𝜃, to eliminate itemsets that
have very low frequency. In this chapter, an itemset 𝐼 is considered as discriminative if its
frequency is at least 𝜑𝜃𝑛𝑖, i.e., 𝑓𝑖(𝐼) ≥ 𝜑𝜃𝑛𝑖, and also 𝑅𝑖𝑗(𝐼) ≥ 𝜃.
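The two conditions can be checked directly. The following is a minimal Python sketch of the test, assuming the case 𝑓𝑗(𝐼) = 0 is treated as an infinite ratio so that only the frequency condition applies; the function name and sample values are illustrative.

```python
def is_discriminative(fi, fj, ni, nj, theta, phi):
    """Test for one itemset: fi, fj are its frequencies in the target and
    general stream; ni, nj are the stream sizes."""
    if fi < phi * theta * ni:    # frequency threshold in the target stream
        return False
    if fj == 0:                  # ratio is infinite; frequency test suffices
        return True
    return (fi * nj) / (fj * ni) >= theta   # R_ij(I) >= theta

# Illustrative values with theta = 2, phi = 0.1, ni = nj = 15
print(is_discriminative(4, 2, 15, 15, theta=2, phi=0.1))    # True
print(is_discriminative(12, 13, 15, 15, theta=2, phi=0.1))  # False
```

With these parameters the frequency bound is 𝜑𝜃𝑛𝑖 = 3, so an itemset with 𝑓𝑖 = 2 is rejected outright even when 𝑓𝑗 = 0.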
Definition 3.1. Discriminative itemsets: Let 𝑆𝑖 and 𝑆𝑗 be two data streams, with
current sizes 𝑛𝑖 and 𝑛𝑗 respectively, that contain varied-length transactions of items in ∑; let
𝜃 > 1 be a user-defined discriminative level threshold and 𝜑 ∈ (0, 1/𝜃) a support threshold. The
set of discriminative itemsets in 𝑆𝑖 against 𝑆𝑗, denoted as 𝐷𝐼𝑖𝑗, is formally defined as:

𝐷𝐼𝑖𝑗 = {𝐼 ⊆ ∑ | 𝑓𝑖(𝐼) ≥ 𝜑𝜃𝑛𝑖 & 𝑅𝑖𝑗(𝐼) ≥ 𝜃} (3.2)
The itemsets that are not discriminative are defined as non-discriminative itemsets:
Definition 3.2. Non-discriminative itemsets: Let 𝑆𝑖 and 𝑆𝑗 be two data streams, with
current sizes 𝑛𝑖 and 𝑛𝑗 respectively, that contain varied-size transactions of items in ∑; let
𝜃 > 1 be a user-defined discriminative level threshold and 𝜑 ∈ (0, 1/𝜃) a support threshold. The
non-discriminative itemsets are the itemsets with frequency less than 𝜑𝜃𝑛𝑖 in the target data
stream 𝑆𝑖, or with frequency ratio in 𝑆𝑖 compared to 𝑆𝑗 less than 𝜃. A non-discriminative itemset
can be ignored if it is not a subset of a discriminative itemset. The set of non-discriminative
itemsets in 𝑆𝑖 against 𝑆𝑗, denoted as 𝑁𝐷𝐼𝑖𝑗, is formally defined as:

𝑁𝐷𝐼𝑖𝑗 = {𝐼 ⊆ ∑ | 𝑓𝑖(𝐼) < 𝜑𝜃𝑛𝑖 𝑜𝑟 𝑅𝑖𝑗(𝐼) < 𝜃} (3.3)
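Definitions 3.1 and 3.2 can be made concrete with a brute-force enumeration over two toy streams. This is only a reference sketch of the search space (exponential in the number of items) that the DISTree and DISSparse methods are designed to prune; the toy streams are invented for illustration.

```python
from itertools import combinations

def mine_discriminative(Si, Sj, theta, phi):
    """Brute-force enumeration of DI_ij per Definition 3.1.
    Si, Sj: lists of transactions (sets of items)."""
    ni, nj = len(Si), len(Sj)
    items = sorted(set().union(*Si, *Sj))
    result = {}
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            I = frozenset(combo)
            fi = sum(1 for t in Si if I <= t)   # frequency in target stream
            fj = sum(1 for t in Sj if I <= t)   # frequency in general stream
            if fi < phi * theta * ni:
                continue                         # fails the support threshold
            if fj == 0 or (fi * nj) / (fj * ni) >= theta:
                result[I] = (fi, fj)             # passes the ratio threshold
    return result

Si = [{"a", "b"}, {"a", "b"}, {"a"}, {"b", "c"}]
Sj = [{"c"}, {"b", "c"}, {"c"}, {"a", "c"}]
print(mine_discriminative(Si, Sj, theta=2, phi=0.1))
```

On these streams, {a}, {b} and {a, b} come out discriminative, while {c} is frequent only in 𝑆𝑗 and is rejected.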
3.2.2 Discriminative itemset mining
The problem of discriminative itemset mining is much more complicated than
frequent itemset mining. The most challenging issue is efficiency. Discriminative
itemset mining algorithms have to deal efficiently with the exponential number of itemsets
generated in more than one data stream. The Apriori property defined for frequent itemsets is
not valid in discriminative itemset mining; that is, not every subset of a discriminative itemset is
discriminative. The non-discriminative itemsets should be ignored during the process if they are
not subsets of discriminative itemsets. This is time and space consuming even in non-streaming
environments. The algorithms have to be designed with concise processes and data
structures to save time and space. To the best of our knowledge, and based on the literature
reviewed in Chapter 2, there is no prior research work on mining discriminative itemsets in data
streams.
Due to the complexity of the discriminative itemset mining process, and building on the
current methods for frequent itemset mining in a single data stream, the primary method is
proposed by expanding the FP-Growth method (Han, Pei and Yin 2000) of frequent itemset
mining, adaptable to different window models. An advanced, efficient method is then proposed
based on the sparse characteristics of the discriminative itemsets. In this chapter, the DISTree
and DISSparse algorithms are developed for mining discriminative itemsets in data streams. The
data structures and the detailed process of the algorithms are discussed using a running example.
The specific characteristics and the limitations of the proposed methods on large and fast-
growing data streams are presented. The DISTree and DISSparse algorithms are evaluated in
Chapter 6, on different input data streams with different characteristics, for their time
and space complexities.
3.3 DISTREE METHOD
The DISTree method (Seyfi, Geva and Nayak 2014) is proposed as a simple expansion of
the basics of the FP-Growth method (Han, Pei and Yin 2000). The DISTree method is developed for
offline mining of the discriminative itemsets in a batch of transactions in the datasets, adaptable
to the different window models. The DISTree algorithm is designed based on several data
structures, either modified from the standard FP-Growth method of frequent itemset mining or
specifically developed for discriminative itemset mining. The data structures are explained
together with their usage in the algorithm. To facilitate describing the method, several important
concepts and constructs are defined below.
FP-Tree: The prefix tree structure proposed in FP-Growth (Han, Pei and Yin 2000)
is used for holding the frequent items of the transactions, sharing branches for their most
common frequent items. The branch from the root to a node is the prefix of the itemsets ending
at the nodes below that node. Each node in the FP-Tree is associated with a counter
showing the frequency of the itemset consisting of the items on the path from the root
to this node. In the DISTree method, the FP-Tree is adapted by adding two counters
to each node, holding the frequencies of the itemsets in the target dataset and the general
dataset; for example, there are two counters associated with each node in the FP-Tree in
Figure 3.1.

In this thesis, we use the sequence from the root to a node to represent an itemset, and the
two associated values indicate the frequency of the itemset in the target dataset and the general
dataset, respectively. For example, 𝑐𝑏7,3 in Figure 3.1 indicates that the frequency of itemset 𝑐𝑏 is 7
in the target dataset and 3 in the general dataset.
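A minimal sketch of such a two-counter prefix-tree node, assuming transactions arrive with their items already ordered; the class and function names are illustrative, not the thesis implementation.

```python
class Node:
    """Prefix-tree node with two counters (target / general frequencies)."""
    def __init__(self, item):
        self.item = item
        self.fi = 0          # frequency in the target stream Si
        self.fj = 0          # frequency in the general stream Sj
        self.children = {}

def insert(root, transaction, stream):
    """Insert an ordered transaction, bumping the counters of its stream."""
    node = root
    for item in transaction:
        node = node.children.setdefault(item, Node(item))
        if stream == "i":
            node.fi += 1
        else:
            node.fj += 1
    return node

root = Node(None)
# cb occurs 7 times in the target and 3 times in the general stream,
# matching the node labelled cb7,3 in Figure 3.1.
for _ in range(7):
    insert(root, ["c", "b"], "i")
for _ in range(3):
    insert(root, ["c", "b"], "j")
node = root.children["c"].children["b"]
print(node.fi, node.fj)   # 7 3
```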
Header-Table: The Header-Table is a tabular structure listing all the items in the
defined alphabet ∑ in processing order, starting from the least frequent item. For fast
traversal of the prefix tree structures, each item is associated with two linked lists, which hold the
itemsets ending with that item in the FP-Tree and DISTree, respectively; for example, the Header-
Table has links to the FP-Tree and DISTree in Figure 3.1 and Figure 3.3, respectively.
DISTree: The DISTree is a prefix tree structure similar to the FP-Tree, with two counters in
each node. The two counters 𝑓𝑖 and 𝑓𝑗 show the frequencies, in the target data stream and the
general data stream respectively, of the itemset in the sequence ending at each particular node.
The DISTree is constructed by traversing the links in the Header-Table
structure following the FP-Growth method (Han, Pei and Yin 2000), generating all the
combinations of itemsets that are frequent in the target data stream and checking whether they
are discriminative based on Definition 3.1. The non-discriminative itemsets identified during the
process are deleted from the DISTree unless they are subsets of discriminative itemsets; for
example, 𝑐12,13 in Figure 3.3 is a non-discriminative itemset, but it is a subset of itemset 𝑐𝑏4,2,
which is a discriminative itemset. The DISTree structure stores the discriminative itemsets with
their intermediate nodes.
Window model: The DISTree method can be applied with different window models.
The tilted-time structure is a window model for showing the recent and historical
answers at different time granularities. A group of transactions arriving within a period of time is
considered a batch of transactions 𝐵. After processing each new batch of transactions, the newly
discovered itemsets are transferred to the tilted-time window structure (Giannella et al. 2003) and
the structure is updated by shifting and merging the older answers in an offline state. The
logarithmic structure of the tilted-time window model shown in Figure 2.4 is held for each
discovered discriminative itemset during the history of the data streams; but, as the number of
discriminative itemsets is much smaller than that of all itemsets present in the data streams, the
space used is not large.
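Under the simplifying assumption that each granularity level keeps at most two batch counts and any overflow is merged (summed) into the next coarser level, the shift-and-merge update can be sketched as follows; this is an illustration of the logarithmic idea, not the exact structure of Figure 2.4.

```python
def shift_tilted(windows, new_count):
    """Push the newest batch count into the tilted-time levels; each level
    keeps at most two counts, and the two oldest are merged upward."""
    carry = new_count
    for level in windows:
        level.append(carry)
        if len(level) <= 2:
            return
        carry = level.pop(0) + level.pop(0)   # merge the two oldest counts
    windows.append([carry])                   # open a new, coarser level

windows = []
for batch_count in [1, 1, 1, 1, 1]:           # five batches, one count each
    shift_tilted(windows, batch_count)
print(windows)                                # [[1], [2, 2]]
```

After five unit batches the structure holds the newest batch at fine granularity and the older history as coarser merged counts, so the number of stored entries grows only logarithmically with the stream length.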
The sliding window model is a structure for showing the answers in a fixed, dynamically
updated period of time. This can be obtained by dividing the window into several smaller
partitions. The discriminative itemsets are discovered in each partition using the DISTree
algorithm and updated in the sliding window model shown in Figure 2.3. The window slides by
processing the new batch of transactions and removing the itemsets related to the oldest batch of
transactions in the offline state.
The process of mining discriminative itemsets using the DISTree method is
explained in the section below, followed by its limitations and shortcomings for larger data
streams. The algorithms using the tilted-time window model and the sliding window model will
be introduced in Chapter 4 and Chapter 5, respectively.
3.3.1 DISTree construction and pruning
The FP-Tree is constructed first for the transactions in both data streams. For every new
transaction in the data stream 𝑆𝑖, either a new prefix or a sub-prefix (i.e., a new branch or a sub-
branch in the tree) is added to the FP-Tree and its frequency pairs are set, or the frequency
pairs of existing prefixes in the FP-Tree are updated if the itemset already appeared in past
transactions. For the running example presented in Table 3.1, the Header-Table and the FP-
Tree structures are shown in Figure 3.1. The process continues by generating the itemsets
which are frequent in the target data stream using FP-Growth (Han, Pei and Yin 2000) and
adding them to the DISTree structure. The non-discriminative itemsets are deleted to save
space if they are not subsets of any discriminative itemsets.
Following the standard FP-Growth method (Han, Pei and Yin 2000), the FP-Tree is
traversed using the Header-Table links from the least frequent item, and the conditional FP-Tree
is built from the patterns in the FP-Tree ending with that item. The conditional FP-Tree is a
miniaturized version of the original FP-Tree, constructed for each item in the Header-Table out
of the frequent items in the FP-Tree paths ending with that item, defined as the item's conditional
patterns; for example, in Figure 3.1 the conditional patterns of item 𝑎 are
𝑐𝑏𝑑𝑎3,2, 𝑐𝑏𝑎1,0, 𝑐𝑒𝑎1,0, 𝑐𝑎0,4, 𝑏𝑎1,0 and 𝑎0,2. The process continues by generating
all the combinations of each single path in a conditional FP-Tree ending with the Header-Table
item, and adding them to the DISTree structure. After processing each conditional FP-Tree, the
DISTree is checked for new discriminative itemsets, based on the defined thresholds, by
traversing the links of the Header-Table item. The non-discriminative itemsets are
deleted to save space if they are not subsets of any discriminative itemsets. After
processing all the Header-Table items, the discovered discriminative itemsets are fully set in
the DISTree structure and can be used for offline updating of the different window models. The
final state of the DISTree structure is shown in Figure 3.4.
Lemma 3-1 (Completeness of DISTree): Itemset combination generation using the
conditional FP-Tree, as defined in the FP-Growth method (Han, Pei and Yin 2000), ensures the
completeness of the discriminative itemsets in the DISTree structure by generating all itemsets
which are frequent in the target data stream 𝑆𝑖.

Proof. The DISTree method, following the basics of FP-Growth (Han, Pei and Yin 2000),
generates the full itemset combinations if they are frequent in the target data stream 𝑆𝑖. The
itemsets that are frequent in the target data stream 𝑆𝑖 are a superset of the discriminative
itemsets and are fully added to, or incrementally updated in, the DISTree structure. Based on
Definition 3.1, the itemsets of interest are frequent in the target data stream 𝑆𝑖 and their
frequency ratio with the general data stream 𝑆𝑗 is 𝜃 times or higher. On the other hand, the non-
discriminative itemsets are deleted, by traversing the recently added or updated itemsets through
the links in the Header-Table, if they are not subsets of any discriminative itemsets.
∎
Lemma 3-2 (Correctness of DISTree): All the itemsets in the DISTree structure are
checked based on Definition 3.1 and Definition 3.2 and tagged as discriminative or non-
discriminative itemsets, respectively.

Proof. The DISTree structure is made of the itemsets that are frequent in the target data stream 𝑆𝑖,
a superset of the discriminative itemsets. All the itemsets in the DISTree structure are tagged as
discriminative or non-discriminative based on Definition 3.1 and Definition 3.2 by traversing
the Header-Table links.
∎
Example 3.1. The DISTree construction is graphically demonstrated using the running
example presented in Table 3.1. Two simple data streams with the same number of transactions
in 𝑆1 and 𝑆2 (𝑛1 = 𝑛2 = 15), are presented in Table 3.1.
Table 3.1 An input batch in data streams
The DISTree method (Seyfi, Geva and Nayak 2014) was originally proposed and tested based on
a single scan, but this considerably increases the size of the data structures and the processing
time. Data stream mining algorithms (Giannella et al. 2003; Chi et al. 2006; Tanbeer et al. 2009)
use two scans to build concise data structures with faster processing time, in which items are
ordered by decreasing frequency as in (Han, Pei and Yin 2000). This ordering is determined by
the frequent items in the first batch of transactions and remains fixed for all remaining batches
in the data streams. In this chapter, we modify the original DISTree method to use two scans.
In the first scan, the frequent items in the target data stream 𝑆1 are found and sorted in
descending order of their frequencies (i.e., the Desc-Flist order in Table 3.2 is constructed out of
the frequent items in Table 3.1). The Desc-Flist is used as the default order for all the prefix tree
structures and also shows the processing order in the Header-Table (e.g., the Header-Table in
Figure 3.1 is processed from the least frequent item in the dataset, item 𝑎). The frequent items in
each input transaction in the datasets are sorted in Desc-Flist order before being added to the
FP-Tree structure (e.g., the FP-Tree in Figure 3.1 is made of transactions with their items in
Desc-Flist order).
Table 3.2 Desc-Flist order of frequent items in target data stream 𝑆1
In this example the discriminative level threshold is set to 𝜃 = 2 and the support
threshold is set to 𝜑 = 0.1. The FP-Tree and the DISTree, together with the Header-Table, are
represented in Figure 3.1 and Figure 3.3, respectively. For simplicity, we only show the Header-
Table links of item 𝑎.
Figure 3.1 Header-Table and FP-Tree structures for input batch of transactions
Based on the FP-Growth method, the conditional patterns and the conditional FP-Tree
are constructed for each item in the Header-Table, following the bottom-up order of the Desc-Flist.
The conditional patterns of a Header-Table item are the sub-patterns in the original FP-Tree
starting from the root node and ending at that item; for example, in Figure 3.1 the conditional
patterns of item 𝑎 are 𝑐𝑏𝑑𝑎3,2, 𝑐𝑏𝑎1,0, 𝑐𝑒𝑎1,0, 𝑐𝑎0,4, 𝑏𝑎1,0 and 𝑎0,2. The conditional FP-Tree is
constructed for each item in the Header-Table from its conditional patterns; for example, the
conditional FP-Tree for item 𝑎 is given in Figure 3.2. In Figure 3.2, the two conditional patterns
𝑐𝑒𝑎1,0 and 𝑐𝑎0,4 are merged together as 𝑐𝑎1,4. The item 𝑒 is not frequent in this conditional FP-
Tree and is not included.
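The conditional counts for item 𝑎 can be reproduced from its conditional patterns. The sketch below merges the pattern counts per prefix item and drops items below the 𝜑𝜃𝑛𝑖 = 3 threshold of the running example; the pattern list is hard-coded from Figure 3.1 for illustration only.

```python
# Conditional patterns of item a from Figure 3.1: (prefix, fi, fj)
cond_patterns = [
    (("c", "b", "d"), 3, 2),   # cbda3,2
    (("c", "b"), 1, 0),        # cba1,0
    (("c", "e"), 1, 0),        # cea1,0
    (("c",), 0, 4),            # ca0,4
    (("b",), 1, 0),            # ba1,0
    ((), 0, 2),                # a0,2
]

# Sum each prefix item's conditional frequencies over all patterns.
counts = {}
for prefix, fi, fj in cond_patterns:
    for item in prefix:
        a, b = counts.get(item, (0, 0))
        counts[item] = (a + fi, b + fj)

# Keep only items frequent in the target stream: phi * theta * n1 = 0.1*2*15
min_count = 0.1 * 2 * 15
frequent = {it: c for it, c in counts.items() if c[0] >= min_count}
print(counts)
print(sorted(frequent))
```

The result keeps 𝑐, 𝑏 and 𝑑 in the conditional FP-Tree of item 𝑎, while 𝑒 (conditional count 1 in the target stream) falls below the threshold, matching the tree in Figure 3.2.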
Figure 3.2 Conditional FP-Tree of Header-Table item 𝑎
The itemsets which are frequent in the target dataset are generated using FP-Growth
(Han, Pei and Yin 2000). The itemset combinations in each single path of the conditional FP-
Tree are generated, and the newly generated itemsets are added to the DISTree structure, either as
new paths or by updating the frequencies of previously added itemsets; for example, in Figure 3.3
itemset 𝑐𝑏𝑎4,2 is generated from 𝑐𝑏𝑎3,2 and 𝑐𝑏𝑎1,0 in Figure 3.2. The non-discriminative
itemsets are ignored, to save space, if they are not subsets of any discriminative itemsets. By
making the conditional FP-Tree for each item in the Header-Table and generating the itemset
combinations in each single path of the conditional FP-Trees, the DISTree structure is
constructed. The DISTree contains all frequent itemsets in the first dataset, whether they are
discriminative or not. The highlighted nodes in Figure 3.3 show the discriminative itemsets.
For simplicity, we only show the Header-Table links of item 𝑎.
Figure 3.3 Header-Table and DISTree structure without pruning (the full prefix tree size is
only for display and is not generated)
DISTree construction for fast-growing data streams would consume a large amount of memory.
For each unique itemset, a new branch has to be added, with multiple counters in
each node for the data stream frequencies, as in Figure 3.3. However, the DISTree is not constructed
in full size: the non-discriminative itemsets are pruned after processing each conditional FP-Tree,
as explained in the paragraph below (i.e., Figure 3.3 is only for displaying the complete itemset
generation in the DISTree). The final DISTree for Example 3.1 is presented in Figure 3.4. The
coloured nodes in Figure 3.3 and Figure 3.4 mark the discriminative itemsets ending at
them. As explained earlier in this chapter, the Apriori property of frequent itemsets is not valid
for discriminative itemsets, and subsets of a discriminative itemset can be non-discriminative.
Saving the full-size DISTree in main memory is not possible for large datasets, so the
DISTree has to be kept compact. For a concise DISTree structure, the non-discriminative
itemsets should be pruned from the itemset tails for each item in the Header-Table, following
the bottom-up order of the Desc-Flist. The tail pruning process is defined for each itemset
ending with 𝑎, denoted as 𝐼(𝑎), generated during the processing of the conditional FP-Tree. If,
based on Definition 3.2, 𝑓𝑖(𝐼(𝑎)) < 𝜑𝜃𝑛𝑖 or 𝑟𝑖(𝐼(𝑎))/𝑟𝑗(𝐼(𝑎)) < 𝜃, then 𝐼(𝑎) is not
discriminative. The itemset 𝐼(𝑎) is tagged as non-discriminative and removed from the DISTree
if it is a leaf node. This deletion saves considerable time and space during DISTree construction
by reducing the size of the DISTree.
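The tail-pruning rule can be sketched on a toy nested-dict tree; the counters for 𝑐(12,13) and 𝑐𝑏(4,2) echo the running example, while the leaf 𝑒(2,5) is hypothetical. This is a minimal sketch under the Example 3.1 parameters, not the thesis implementation.

```python
def prune_tails(node, is_disc):
    """Recursively delete leaf descendants that are non-discriminative.
    A non-discriminative node survives only while it still has children,
    i.e. while it remains the prefix of some discriminative itemset."""
    for item in list(node["children"]):
        child = node["children"][item]
        prune_tails(child, is_disc)
        if not child["children"] and not is_disc(child["fi"], child["fj"]):
            del node["children"][item]

def disc(fi, fj):
    # Definition 3.1 with theta = 2, phi = 0.1, n1 = n2 = 15 (so phi*theta*n1 = 3)
    return fi >= 3 and (fj == 0 or fi / fj >= 2)

tree = {"fi": 0, "fj": 0, "children": {
    "c": {"fi": 12, "fj": 13, "children": {        # non-discriminative c(12,13)
        "b": {"fi": 4, "fj": 2, "children": {}},   # discriminative cb(4,2)
        "e": {"fi": 2, "fj": 5, "children": {}},   # hypothetical non-disc leaf
    }}}}
prune_tails(tree, disc)
print(sorted(tree["children"]["c"]["children"]))   # ['b'] -- e pruned, c kept
```

Note how 𝑐(12,13) is kept as an internal node even though it is non-discriminative, because the discriminative itemset 𝑐𝑏 hangs below it, while the non-discriminative leaf is removed.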
Lemma 3-3 (Concise DISTree structure): The pruning of non-discriminative itemsets
ensures that the DISTree remains a concise data structure.

Proof. The DISTree structure is traversed through the Header-Table links in the
reverse order of the Desc-Flist. The non-discriminative itemsets are deleted from the DISTree if
they are not subsets of discriminative itemsets. The final DISTree structure only holds the
discriminative itemsets and their non-discriminative subsets.
∎
By pruning the non-discriminative itemsets remaining as leaf nodes, the final DISTree
structure for the input dataset in Table 3.1 is obtained, as shown in Figure 3.4. The DISTree in
this example contains one non-discriminative itemset remaining as an internal node, i.e., 𝑐12,13.
After finding the discriminative itemsets in the DISTree structure, the tails are pruned if they are
non-discriminative itemsets. The tail nodes are the nodes in the DISTree with no children, and
they are pruned if they are non-discriminative. A significant difference appears in the size of the
DISTree structure with and without tail pruning (cf. Figure 3.3).
Figure 3.4 Final DISTree structure and the reported discriminative itemsets
The DISTree structure is made of a small subset of the frequent itemsets in the FP-Tree.
The Apriori property of frequent itemsets is not valid for discriminative itemsets, and subsets
of a discriminative itemset can be non-discriminative (e.g., 𝑐12,13). Figure 3.4 also shows the six
discriminative itemsets inside the DISTree for the data streams in Table 3.1. Since all
combinations of itemsets frequent in the target data stream 𝑆𝑖 are generated in the DISTree
structure, and all of them are traversed and tagged as discriminative or non-discriminative, the
proposed DISTree algorithm achieves full accuracy and recall on a batch of transactions in data
streams.
3.3.2 DISTree algorithm
The DISTree algorithm starts by reading the batch of transactions and building the Desc-
Flist, based on the descending order of the item frequencies in the target data stream 𝑆𝑖. The
Desc-Flist order saves space by sharing the paths in the prefix trees, with the most
frequent items at the top. In data stream mining, this Desc-Flist is built from the first batch of
transactions and remains the same for all the upcoming batches in the data streams. Where a
single-pass algorithm is required, this step can be skipped by building the FP-Tree and DISTree
using an alphabetical order of items. The input parameters, the discriminative level 𝜃 and the
support threshold 𝜑, are defined based on the application domain, the data stream characteristics
and sizes, or by domain expert users, as discussed in Chapter 6. The presented DISTree
algorithm is applicable to one batch of transactions.
The transactions are added to the FP-Tree with prefix path sharing of the itemsets. The
DISTree structure is constructed by initializing its root as an empty tree. The DISTree is updated
by traversing the itemsets in the FP-Tree using the Header-Table links, making the
conditional FP-Tree of each Header-Table item and then generating all combinations of each
single path in the conditional FP-Tree. This follows the basics of the standard FP-Growth
method (Han, Pei and Yin 2000) for frequent itemset generation. Based on the Desc-Flist order
of the Header-Table items, starting from the least frequent item, the DISTree is updated by adding
the new itemset combinations generated from the conditional FP-Tree. The new itemsets added to
the DISTree (i.e., ending with the Header-Table item) are then checked based on Definition 3.1 and
Definition 3.2, to be saved as discriminative itemsets or be deleted from the DISTree data
structure as non-discriminative itemsets if they are leaf nodes. By the end of the itemset
combination generation for the batch of transactions, the full DISTree has been tagged based on
the discovered discriminative itemsets.
Algorithm 3.1 (DISTree: Discriminative itemset mining using itemset
generation and test)
Input: (1) The discriminative level threshold 𝜃; (2) The support threshold 𝜑; (3)
The input batch 𝐵 made of transactions with alphabetically ordered items
belonging to data streams 𝑆𝑖 and 𝑆𝑗.
Output: 𝐷𝐼𝑖𝑗, the set of discriminative itemsets in 𝑆𝑖 against 𝑆𝑗 in the batch of
transactions 𝐵.
Begin
1) Scan 𝐵 to generate Desc-Flist and order the items in transactions based on
Desc-Flist;
2) Make FP-Tree for 𝐵 based on expansion of FP-Growth;
3) 𝐷𝐼𝑖𝑗 = { };
4) For each item x in Header-Table do // bottom-up order of Desc-Flist
5) Make conditional FP-Treex based on item x;
6) For each path in conditional FP-Treex do
7) Generate all itemsets ending with Header-Table item x from the path;
8) If itemsets are not in DISTree, add new paths in DISTree, otherwise
update the frequency of these itemsets in DISTree;
9) End for;
10) Check DISTree for discriminative & non-discriminative nodes and
update 𝐷𝐼𝑖𝑗 by discovered discriminative itemsets;
11) Delete non-discriminative leaf nodes;
12) End for;
13) Report discriminative itemsets in 𝐷𝐼𝑖𝑗;
End.
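Step 7 of Algorithm 3.1 follows the standard FP-Growth combination step: every subset of the items on a single conditional FP-Tree path is combined with the processing Header-Table item. A minimal sketch of that step (the tuple representation and the function name are illustrative, not the thesis's data structures):

```python
from itertools import combinations

def path_combinations(path, header_item):
    # All itemsets ending with the processing Header-Table item that can be
    # generated from one single path of the conditional FP-Tree (step 7).
    result = []
    for r in range(len(path) + 1):
        for subset in combinations(path, r):
            result.append(tuple(subset) + (header_item,))
    return result

# e.g., the path c-b above header item a yields: a, ca, ba and cba
print(path_combinations(('c', 'b'), 'a'))
```

Each generated itemset is then looked up in the DISTree and either inserted as a new path or has its frequencies updated (steps 8-9 of Algorithm 3.1).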
In the DISTree algorithm, the parts that attract considerable complexity are the itemset
combination generation for each single path in the conditional FP-Tree, and the DISTree
construction and testing. The number of itemset combinations is exponential, especially in
large datasets. Theorem 3-1 and Theorem 3-2 prove the correctness of the DISTree
method and its minimum space usage, respectively, as below.
Theorem 3-1 (Completeness and correctness of DISTree): Based on Lemma 3-1, all
combinations of frequent itemsets in the target data stream 𝑆𝑖 are generated in the DISTree
structure. Based on Lemma 3-2, all the itemsets in the DISTree structure are traversed and tagged
as discriminative or non-discriminative itemsets. The non-discriminative itemsets that are not a
subset of any discriminative itemset are pruned from the DISTree structure. The complete set of
discriminative itemsets is held in the DISTree structure.
Theorem 3-2 (Concise DISTree structure): Based on Theorem 3-1, the complete set of
itemset combinations is generated in the DISTree structure and correctly tagged as
discriminative and non-discriminative itemsets. Based on Lemma 3-3, the non-discriminative
itemsets are deleted from the DISTree structure if they are not subsets of discriminative itemsets.
Together, these prove the minimum space usage of the DISTree structure.
The efficiency of the DISTree algorithm is discussed in detail by evaluating the
algorithm with several input data streams in Chapter 6. Empirical analysis shows the
performance of the proposed method on different datasets. The method is tested with different
discriminative level thresholds, support thresholds and ratios between the sizes of the two data
streams. However, as discussed there, several issues remain regarding the performance of the
DISTree algorithm in large and fast-growing data streams.
In this thesis, the problem of mining discriminative itemsets in data streams is defined
using two different window models. The proposed DISTree method is used in Chapter 4 for
mining discriminative itemsets in data streams using the tilted-time window model (e.g.,
Figure 2.3). The DISTree method is then used in Chapter 5 for mining discriminative itemsets in
data streams using the sliding window model (e.g., Figure 2.4). Here we briefly explain the
updating process of the two window models after processing a single batch of transactions 𝐵
using the DISTree algorithm.
For mining discriminative itemsets in data streams using the tilted-time window
model, the pre-defined window, made of a batch of input transactions, is set as the current
window frame for offline updating of the window model. The discriminative itemsets are
discovered using the DISTree method and are transferred from the DISTree structure to the first
window frame as the current window frame. The tilted-time structure is updated by shifting and
merging the older window frames into the larger window frames in a logarithmic fashion,
following the basics of the FP-Stream method (Giannella et al. 2003), as explained in Chapter 4.
The discriminative itemsets in the current window frame and the older window frames are
reported separately as the output results over the history of the data streams. The discriminative
itemset mining continues by building the DISTree structure for every new incoming batch of
transactions and offline updating the tilted-time window model.
For mining discriminative itemsets in data streams using the sliding window model,
the sliding window frame can be divided into several smaller partitions for offline sliding of the
window model. The discriminative itemsets are discovered using the DISTree method and are
transferred from the DISTree structure to the sliding window frame by merging with the older
itemsets in the sliding window frame, as explained in Chapter 5. With every new incoming batch
of transactions, the window slides; the itemsets that fall out of the window frame, belonging to
the oldest partitions, are deleted from the window model, and the results are updated in offline
sliding. However, when the sliding window frame becomes full, every new input transaction is
checked for having a subset in the set of discriminative itemsets in the sliding window frame, for
online sliding.
3.3.3 DISTree summary
In this section, the DISTree method, proposed for mining discriminative itemsets in
data streams based on an expansion of the FP-Growth method (Han, Pei and Yin 2000), was
discussed. The DISTree method generates all the itemset combinations ending with each Header-
Table item and adds them to the prefix-tree DISTree structure. The new itemsets are checked by
their frequencies and tagged as discriminative or non-discriminative itemsets. The itemset
combination generation is based on the frequent itemsets in the target data stream. The
conditional FP-Tree is built for each Header-Table item in the bottom-up order of Desc-Flist,
based on the frequent items in the FP-Tree paths ending with that item, called conditional
patterns. All the combinations of each single path in the conditional FP-Tree are generated and
added to the DISTree structure. To control the data structure size, the non-discriminative itemsets
are deleted from the DISTree structure.
The FP-Growth method (Han, Pei and Yin 2000) was originally designed for
frequent itemset mining in a single dataset. The main part of the DISTree method is the itemset
combination generation, limited to the itemsets that are frequent in the target data stream.
However, many itemset combinations are frequent in both data streams and not discriminative.
Based on the conceptual definition, and following the empirical analysis in Chapter 6, the
discriminative itemsets are a small subset of the frequent itemsets. This sparse characteristic of
discriminative itemsets should be exploited to define a novel, efficient method suitable for large,
complex and fast-growing data streams. The process must be adjusted to the fact that the Apriori
property does not hold for discriminative itemset mining, and determinative heuristics have to be
applied to efficiently limit the mining to the potential discriminative itemsets. In the next section
a novel efficient method called DISSparse (Seyfi et al. 2017) is proposed for mining the
discriminative itemsets in data streams.
The precision of the DISTree algorithm was proved by Theorem 3-1, based on its
completeness and correctness. Theorem 3-2 also proved the minimum space usage of the
DISTree structure in main memory. For small and simple datasets, the DISTree method can
be used correctly for offline updating of the different window models.
3.4 DISSPARSE METHOD
The DISSparse method is developed for efficient offline mining of discriminative
itemsets in a batch of transactions (Seyfi et al. 2017). There are two main issues with
the DISTree method, as explained in the previous section. First, it generates the itemset
combinations that are frequent in the target data stream without considering their frequency in
the general data stream. Second, the candidate itemsets remain in the DISTree structure while
they are checked for being discriminative. The DISTree can become massive for large datasets
with long discriminative itemsets.
The key novel idea of the DISSparse method is to limit the discriminative itemset mining
to the potential subtrees of the DISTree structure, instead of mining the entire DISTree structure.
To this end, two heuristics are proposed to determine the potential subtrees and the potential
internal nodes in the subtrees for mining discriminative itemsets. The subtrees which do not have
potential for containing discriminative itemsets are removed without checking. To facilitate
description of the method, some new data structures are defined below.
Conditional FP-Tree: The conditional pattern of an item is the sub-pattern base under
the condition of the Header-Table item's existence in the original FP-Tree. The conditional FP-
Tree is constructed for each item in the Header-Table from its conditional patterns. In contrast with
FP-Growth based methods, the conditional FP-Tree in the DISSparse method keeps, for the nodes
related to the processing Header-Table item, the top ancestor on the first level as the
root above the Header-Table items; for example, node 𝑐 is the top ancestor of the Header-Table
items 𝑎 in the left-most subtree in Figure 3.5, and 𝑐 appears in all Header-Table item nodes in the
left-most subtree. The nodes in the first level of the conditional FP-Tree determine different
subtrees, which are used separately to generate potential discriminative itemset combinations.
A subtree is made of the branches under the same root in the first level of the conditional FP-Tree,
ending with the processing Header-Table item, and is denoted as 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡; for example, the
conditional FP-Tree in Figure 3.5 has three subtrees under roots 𝑐, 𝑏 and 𝑎 (i.e., 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐,
𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑏 and 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑎, respectively). The Header-Table items are linked under their subtree
root node using Header-Table links.
As defined before, each node represents an itemset starting from the root and ending at
this node, and is annotated with the frequencies of the itemset in the two datasets; for example,
𝑎3,2 in Figure 3.5 indicates that the frequencies of itemset 𝑐𝑏𝑑𝑎 are 3 and 2 in streams 𝑆𝑖 and 𝑆𝑗,
respectively. For simplicity, we use the single header item 𝑎3,2 to denote the itemset 𝑐𝑏𝑑𝑎 and its
frequencies. Let 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) be the set of header items with their
frequencies; for each item 𝑎𝑛,𝑚 ∈ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡), 𝐼(𝑎𝑛,𝑚) is defined as
the itemset ending with 𝑎 whose frequencies in streams 𝑆𝑖 and 𝑆𝑗 are 𝑛 and 𝑚,
respectively; for example, in Figure 3.5, 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐) contains 𝑎3,2, 𝑎1,0
and 𝑎1,4; 𝐼(𝑎3,2) = 𝑐𝑏𝑑𝑎, 𝑓𝑖(𝐼(𝑎3,2)) = 3 and 𝑓𝑗(𝐼(𝑎3,2)) = 2. In the DISSparse method, the
conditional FP-Tree will be expanded by its subtrees during the process, as explained in
Section 3.4.1.2.
Based on Definition 3.1 and empirical studies, the discriminative itemsets are a small
subset of frequent itemsets. The DISSparse method is presented with two determinative
heuristics for limiting the mining discriminative itemsets to the potential itemsets in
Section 3.4.1. We prove the full accuracy and recall of the DISSparse method in one batch of
transactions in Section 3.4.2. The efficiency of the DISSparse method is compared with the
proposed DISTree method in Chapter 6, for a single batch of transactions. The method is
evaluated extensively using various thresholds on several large and complex fast-growing data
streams with different characteristics.
3.4.1 Mining discriminative itemsets using sparse prefix tree
Following FP-Growth, the FP-Tree is generated for the incoming batch of transactions
as in Figure 3.1. The conditional patterns and the conditional FP-Tree for each Header-Table item
are then produced following the increasing order of the items' frequency in the Header-Table,
starting from the item with the lowest frequency as in Desc-Flist (i.e., the conditional FP-Tree for
the Header-Table item 𝑎 presented in Figure 3.5). The difference between the conditional FP-Tree
in Figure 3.2 and the conditional FP-Tree in Figure 3.5 is that in Figure 3.5 we process the
subtrees one by one.
Figure 3.5 Conditional FP-Tree of Header-Table item 𝑎 associated with the top ancestor on
the first level
The conditional FP-Tree is traversed from the left-most subtree through the processing
Header-Table item links. The left-most subtree in the conditional FP-Tree and its internal nodes
are checked against two heuristics defined in this section. The heuristics determine
the potential discriminative itemsets in a combination set of itemsets. The two heuristics are
defined based on two important measures: the maximum frequency of the itemsets in a subtree
and the maximum discriminative value of the itemsets in a subtree. We first define the two
measures below, then define the two heuristics.
Let 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎) denote the set of itemsets in subtree 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 starting from
𝑟𝑜𝑜𝑡 and ending with a header item 𝑎𝑛,𝑚 ∊ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡), e.g.,
𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑐, 𝑎) = {𝐼(𝑎3,2), 𝐼(𝑎1,0), 𝐼(𝑎1,4)}. The maximum frequency of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎)
in the target dataset 𝑆𝑖, denoted as Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎 ), is defined as the sum of the frequencies
in 𝑆𝑖 of the itemsets in 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎) as below.
Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎) = ∑𝑏∈𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡,𝑎) 𝑓𝑖(𝑏)    (3-a)
For simplicity, in the equation above, 𝑓𝑖(𝑏) refers to the frequency in 𝑆𝑖 of an itemset
ending with an item 𝑏 in 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡); for example, in Figure 3.5 the
maximum frequency in 𝑆𝑖 of the itemsets in the left-most subtree 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐, which contains three
branches ending with 𝑎3,2, 𝑎1,0 and 𝑎1,4, is equal to 5, i.e., Max_freq𝑖(𝑐, 𝑎) = 5.
Let 𝒮 be the power set of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎), i.e., 𝒮 = 2^𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡,𝑎) consists of
all subsets of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎). For 𝐵 ∈ 𝒮 and 𝐵 ≠ { }, the discriminative value of the
itemsets in 𝐵 is defined below.

Dis_value(𝐵) = 𝑟𝑖(𝐵) / 𝜃, if ∑𝑏∈𝐵 𝑓𝑗(𝑏) = 0;
Dis_value(𝐵) = 𝑟𝑖(𝐵) / 𝑟𝑗(𝐵), if ∑𝑏∈𝐵 𝑓𝑗(𝑏) > 0    (3-b)

where 𝑟𝑖(𝐵) = ∑𝑏∈𝐵 𝑓𝑖(𝑏) / 𝑛𝑖 and 𝑟𝑗(𝐵) = ∑𝑏∈𝐵 𝑓𝑗(𝑏) / 𝑛𝑗. 𝑟𝑖(𝐵) and 𝑟𝑗(𝐵) are called the
relative supports of 𝐵, i.e., the sums of the relative supports of the itemsets in 𝐵 in 𝑆𝑖 and 𝑆𝑗,
respectively.
When ∑𝑏∈𝐵 𝑓𝑗(𝑏) = 0, the itemsets 𝑏 ∈ 𝐵 do not exist in dataset 𝑆𝑗. In this case,
Dis_value(𝐵) is defined as the ratio between the sum of the relative supports of the itemsets of 𝐵
in 𝑆𝑖 and the discriminative level threshold 𝜃. The idea behind this is that the itemset should be
frequent and significant in the target dataset.
When ∑𝑏∈𝐵 𝑓𝑗(𝑏) > 0, which indicates that at least one of the itemsets in 𝐵 does exist in
the general dataset, Dis_value(𝐵) is defined as the ratio between the sum of the relative supports
of the itemsets of 𝐵 in 𝑆𝑖 and the sum of the relative supports of the itemsets of 𝐵 in 𝑆𝑗. The ratio
𝑟𝑖(𝐵) / 𝑟𝑗(𝐵) is called the discriminative value of 𝐵, denoted as 𝑅𝑖𝑗(𝐵) = 𝑟𝑖(𝐵) / 𝑟𝑗(𝐵); for
example, in the left-most subtree in Figure 3.5, for 𝐵 = {𝑐𝑏𝑑𝑎, 𝑐𝑏𝑎, 𝑐𝑎}, 𝑅𝑖𝑗(𝐵) =
(5/15) / (6/15) = 5/6, with 𝑛𝑖 = 𝑛𝑗 = 15.
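Equation (3-b) translates directly into code once each itemset in 𝐵 is reduced to its frequency pair (𝑓𝑖(𝑏), 𝑓𝑗(𝑏)); this pair representation and the function name are assumptions for illustration, a sketch rather than the thesis's implementation:

```python
def dis_value(B, ni, nj, theta):
    # Discriminative value of a non-empty set B of itemsets, per equation
    # (3-b); B is a list of (fi, fj) frequency pairs, one per itemset.
    sum_fi = sum(fi for fi, _ in B)
    sum_fj = sum(fj for _, fj in B)
    ri = sum_fi / ni                 # relative support of B in Si
    if sum_fj == 0:                  # none of the itemsets occur in Sj
        return ri / theta
    rj = sum_fj / nj                 # relative support of B in Sj
    return ri / rj

# B = {cbda, cba, ca} from the left-most subtree of Figure 3.5
print(dis_value([(3, 2), (1, 0), (1, 4)], 15, 15, 2))  # 5/6 ≈ 0.833
```

The same function applied to a singleton 𝐵 gives the discriminative value of an individual itemset.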
The maximum discriminative value of all itemsets in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 ending with a header
item 𝑎, denoted as Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎), is defined below.
Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎) = max𝐵∈𝒮 {Dis_value(𝐵)}    (3-c)

where Dis_value(𝐵) is given by equation (3-b).
Obviously, Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎) is the highest discriminative value among all
subsets of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎).
In order to determine Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎), all possible itemsets in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡
will have to be generated as required in equations (3-b) and (3-c). However, the generation of all
possible itemset combinations is time-consuming. An efficient method, as described below, is
designed to find a subset 𝐵 of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎) which makes Dis_value(𝐵) maximum among
all subsets.
Let 𝐵𝑚𝑎𝑥 denote the subset 𝐵 with the maximum 𝑅𝑖𝑗(𝐵) in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡. Initially, 𝐵𝑚𝑎𝑥 is
formed by summing up the 𝑓𝑖(𝑏) frequencies of the itemsets 𝑏 with 𝑓𝑗(𝑏) = 0.
The frequencies of the itemset 𝑏 with the maximum frequency ratio are then added to 𝐵𝑚𝑎𝑥
only if they increase its discriminative value, i.e., 𝑅𝑖𝑗(𝐵𝑚𝑎𝑥); for example, in Figure 3.5 the
maximum discriminative value of the itemsets in the left-most subtree, 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐, is equal to 2, i.e.,
Max_dis_value(𝑐, 𝑎) = 2, calculated from the sum of the frequencies of the two itemsets ending
with the items 𝑎1,0 and 𝑎3,2 in 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐). Following
Definition 3.1, if 𝑓𝑗(𝐵𝑚𝑎𝑥) = 0 then 𝑅𝑖𝑗(𝐵𝑚𝑎𝑥) = 𝑓𝑖(𝐵𝑚𝑎𝑥) / (𝜃𝑛𝑖). The algorithm for calculating
Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎) and estimating Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎) is presented in
Algorithm 3.2.
Algorithm 3.2 𝐌𝐚𝐱_𝐟𝐫𝐞𝐪𝒊 and 𝐌𝐚𝐱_𝐝𝐢𝐬_𝐯𝐚𝐥𝐮𝐞
Input: (1) 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡; (2) header item
𝑎 ∊ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡).
Output: (1) Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎); (2) Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎).
Begin
1) Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎) = 0; 𝐹𝑖 = 0, 𝐹𝑗 = 0;
2) For each item 𝑏 ∊ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) do
3) Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎)+= 𝑓𝑖(𝐼(𝑏));
4) If 𝑓𝑗(𝐼(𝑏)) = 0 then 𝐹𝑖+= 𝑓𝑖(𝐼(𝑏)); Tag 𝑏 as checked;
5) End if;
6) End For;
7) While ∃𝑏 ∊ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) and 𝑏 is unchecked do
8) Find 𝐼(𝑏) with maximum 𝑅𝑖𝑗(𝐼(𝑏));
9) If (𝐹𝑖 + 𝑓𝑖(𝐼(𝑏))) / (𝐹𝑗 + 𝑓𝑗(𝐼(𝑏))) > 𝐹𝑖 / 𝐹𝑗 then 𝐹𝑖 += 𝑓𝑖(𝐼(𝑏)); 𝐹𝑗 += 𝑓𝑗(𝐼(𝑏));
10) End if;
11) Tag 𝑏 as checked;
12) End While;
13) Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎) = (𝐹𝑖 ∗ 𝑛𝑗) / (𝐹𝑗 ∗ 𝑛𝑖);
14) Return Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎) and Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎);
End.
In the above algorithm, the items in 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) are scanned
in two separate loops. In the first loop, Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎) is calculated from the
frequency in 𝑆𝑖 (i.e., 𝑓𝑖(𝐼(𝑏))) of all itemsets in the 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡. 𝐹𝑖 is also initialized as
the sum of 𝑓𝑖(𝐼(𝑏)) over the itemsets with 𝑓𝑗(𝐼(𝑏)) = 0. 𝐹𝑖 and 𝐹𝑗 are used for calculating
Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎). In the second loop, using a greedy method, 𝐹𝑖 and 𝐹𝑗 are
updated by adding the frequencies in 𝑆𝑖 and 𝑆𝑗 of the itemset 𝐼(𝑏) with the maximum 𝑅𝑖𝑗(𝐼(𝑏)) if
the frequencies of the itemset 𝐼(𝑏) increase the ratio between 𝐹𝑖 and 𝐹𝑗. The resulting ratio
between 𝐹𝑖 and 𝐹𝑗 is the greatest ratio over all itemsets in the subtree. In the second loop each
item is checked once, and Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎) is calculated based on the selected items.
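The two loops of Algorithm 3.2 can be sketched as follows, again reducing each itemset ending with the processing header item to its frequency pair (an assumed representation). The greedy second loop accepts the candidate with the highest frequency ratio only when it raises the running discriminative value, mirroring lines 7-12:

```python
def subtree_measures(header_items, ni, nj, theta):
    # header_items: list of (fi, fj) pairs, one per itemset of the subtree
    # ending with the processing Header-Table item.
    def value(Fi, Fj):
        # dis-value of the accumulated selection, per equation (3-b)
        return (Fi / ni) / theta if Fj == 0 else (Fi / ni) / (Fj / nj)

    # first loop: total frequency in Si, and the Sj-absent itemsets
    max_freq_i = sum(fi for fi, _ in header_items)
    Fi = sum(fi for fi, fj in header_items if fj == 0)
    Fj = 0
    # second loop: greedy scan by decreasing frequency ratio fi/fj
    for fi, fj in sorted((b for b in header_items if b[1] > 0),
                         key=lambda b: b[0] / b[1], reverse=True):
        if value(Fi + fi, Fj + fj) > value(Fi, Fj):
            Fi, Fj = Fi + fi, Fj + fj
    return max_freq_i, value(Fi, Fj)

# Subtree_c of Figure 3.5: itemsets a3,2, a1,0 and a1,4 with ni = nj = 15
print(subtree_measures([(3, 2), (1, 0), (1, 4)], 15, 15, 2))  # (5, 2.0)
```

On the worked example, the pairs of 𝑎1,0 and 𝑎3,2 are accumulated (giving 4/2 = 2) while 𝑎1,4 is rejected, matching Max_freq𝑖(𝑐, 𝑎) = 5 and Max_dis_value(𝑐, 𝑎) = 2.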
The first heuristic is formally defined below.
HEURISTIC 3.1. A 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 in the conditional FP-Tree in terms of a header item 𝑎 is
considered as a potential discriminative subtree denoted as 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) if it
satisfies the following conditions:
1. Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎) ≥ 𝜑𝜃𝑛𝑖
2. Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎) ≥ 𝜃
Where 𝜃 > 1 is the discriminative level threshold, 𝜑 ∈ (0, 1/𝜃) is the support threshold, 𝑛𝑖 is the
size of the target data stream 𝑆𝑖, and 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎) is the set of itemsets in subtree
𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 ending with a header item 𝑎 ∊ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡).
Lemma 3-4 (Potential discriminative subtree) HEURISTIC 3.1 ensures that none of the
non-potential discriminative subtrees contains any discriminative itemset.
Proof. The first condition in HEURISTIC 3.1 ensures that the sum of the frequencies in
the target data stream S𝑖 of the itemsets in a potential discriminative subtree reaches the
frequency threshold, which implies that the subtree could contain an itemset that is frequent in
S𝑖. The second condition in HEURISTIC 3.1 ensures that the maximum discriminative value of a
potential discriminative subtree is at least the discriminative level 𝜃, which implies that the
subtree could contain an itemset whose discriminative value reaches 𝜃.
For a subtree, if it does not satisfy either of the two conditions, the subtree is considered
a non-potential discriminative subtree. Because a non-potential discriminative subtree breaches
one or both of the conditions, it does not contain any itemset that is frequent in the target data
stream S𝑖, or does not contain any itemset whose discriminative value reaches the
discriminative level 𝜃. According to Definition 3.1, the subtree cannot contain any
discriminative itemset.
∎
In Figure 3.5, the left-most subtree related to the processing Header-Table item 𝑎 under
root node 𝑐 is potential, with Max_freq𝑖(𝑐, 𝑎) = 5 ≥ (𝜑𝜃𝑛𝑖 = 0.1 ∗ 2 ∗ 15 = 3) and
Max_dis_value(𝑐, 𝑎) = 2 ≥ (𝜃 = 2). In this chapter, for the sake of simplicity, the dataset
lengths are omitted from the ratios as 𝑛1 = 𝑛2 = 15. In the case of data streams with different
lengths (i.e., 𝑛2/𝑛1 ≠ 1), the ratios must be multiplied by the constant 𝑛2/𝑛1.
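Once the two measures of a subtree are known, Heuristic 3.1 reduces to a two-condition guard; a minimal sketch with the Figure 3.5 numbers (the function name is illustrative):

```python
def potential_subtree(max_freq_i, max_dis_value, phi, theta, ni):
    # Heuristic 3.1: a subtree is potential only if both the maximum
    # frequency and the maximum discriminative value pass their thresholds.
    return max_freq_i >= phi * theta * ni and max_dis_value >= theta

# Subtree_c in Figure 3.5 with phi = 0.1, theta = 2, ni = 15
print(potential_subtree(5, 2.0, 0.1, 2, 15))  # True
```

Heuristic 3.2 applies the same two conditions to the internal-node measures Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) and Max_dis_value(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) defined later in this section.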
Using HEURISTIC 3.1, all potential discriminative subtrees can be identified, and all
non-potential subtrees are excluded from the itemset combination generation. A
𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) could still contain non-discriminative itemsets. Non-potential subsets
may exist in a 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) as internal nodes, which lie on the paths between the 𝑟𝑜𝑜𝑡 of
𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 and the items in 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡). The internal nodes are
denoted as 𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑛𝑜𝑑𝑒𝑟𝑜𝑜𝑡 (e.g., in the left-most subtree of the conditional FP-Tree in
Figure 3.5, 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐 has two internal nodes, 𝑏 and 𝑑).
Let 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) denote a set of itemsets in subtree 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 ending with
a header item 𝑎 ∊ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) with the internal node 𝑖𝑛 in each of the
itemsets, i.e., 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) ⊆ 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎), for example, in Figure 3.5,
𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑐, 𝑏, 𝑎) = {𝐼(𝑏3,2), 𝐼(𝑏1,0)}. The maximum frequency of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) in
the target dataset 𝑆𝑖 is denoted as Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎); for example, for 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑐, 𝑏, 𝑎),
Max_freq𝑖(𝑐, 𝑏, 𝑎) = 4. In Figure 3.5, the maximum frequency in 𝑆𝑖 of the itemsets containing the
internal node item 𝑑, i.e., 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑐, 𝑑, 𝑎) in the left-most subtree, with the single occurrence
𝐼(𝑑3,2), is equal to 3, i.e., Max_freq𝑖(𝑐, 𝑑, 𝑎) = 3.
The maximum discriminative value of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) is denoted as
Max_dis_value(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎); for example, in Figure 3.5 the maximum discriminative value of the
itemsets containing the internal node item 𝑏 in the left-most subtree is equal to 2, i.e.,
Max_dis_value(𝑐, 𝑏, 𝑎) = 2, which is made from the two itemsets 𝐼(𝑏3,2) and 𝐼(𝑏1,0). The
maximum discriminative value of the itemsets containing the internal node item 𝑑 in the left-most
subtree is equal to 1.5, i.e., Max_dis_value(𝑐, 𝑑, 𝑎) = 1.5, which is made from the single itemset
𝐼(𝑑3,2).
The second heuristic is formally defined below.
HEURISTIC 3.2. An internal node 𝑖𝑛 ∈ 𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑛𝑜𝑑𝑒𝑟𝑜𝑜𝑡 in a potential subtree
𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 is considered as potential discriminative internal node denoted as 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑖𝑛) if
it satisfies the following conditions:
1. Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) ≥ 𝜑𝜃𝑛𝑖
2. Max_dis_value(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) ≥ 𝜃
Where 𝜃 > 1 is the discriminative level threshold, 𝜑 ∈ (0, 1/𝜃) is the support threshold, 𝑛𝑖 is the
size of the target data stream 𝑆𝑖, and 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) is the set of itemsets in subtree
𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 ending with a header item 𝑎 ∊ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) and
containing the internal node 𝑖𝑛.
Lemma 3-5 (Potential discriminative internal node) HEURISTIC 3.2 ensures that none
of the non-potential discriminative internal nodes occurs in any discriminative itemset.
Proof. The first condition in HEURISTIC 3.2 ensures that the sum of the frequencies in
the target dataset S𝑖 of the itemsets containing the potential discriminative internal node reaches
the frequency threshold, which implies that the subtree could contain a frequent itemset in S𝑖
that includes the internal node. The second condition in HEURISTIC 3.2 ensures that the
maximum discriminative value over these itemsets is at least the discriminative level 𝜃, which
implies that the subtree could contain an itemset including the internal node whose
discriminative value reaches 𝜃.
For an internal node, if it does not satisfy either of the two conditions, the internal node
is considered a non-potential discriminative internal node. Because a non-potential
discriminative internal node breaches one or both of the conditions, the internal node is not
contained in any itemset that is frequent in the target dataset S𝑖, or is not contained in any
itemset whose discriminative value reaches the discriminative level 𝜃. According to
Definition 3.1, the internal node cannot be contained in any discriminative itemset.
∎
In Figure 3.5, the internal node 𝑑 in the left-most subtree 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐), related
to the processing Header-Table item 𝑎 under root node 𝑐, is non-potential, with
Max_freq𝑖(𝑐, 𝑑, 𝑎) = 3 ≥ (𝜑𝜃𝑛𝑖 = 0.1 ∗ 2 ∗ 15) but Max_dis_value(𝑐, 𝑑, 𝑎) = 1.5 ≱ (𝜃 =
2). The non-potential internal nodes are excluded from itemset combination generation in
𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) (e.g., the itemsets in 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑐, 𝑑, 𝑎) are not involved in the
minimized DISTree in Figure 3.6).
3.4.1.1 Potential discriminative itemsets generation using minimized DISTree
The DISSparse method does not use the DISTree structure, which is usually very large,
to identify discriminative itemsets. Instead, it identifies potential discriminative subtrees in the
conditional FP-Tree and then discovers discriminative itemsets from those potential discriminative
subtrees, which significantly increases efficiency. The potential discriminative itemsets
identified from a 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 in the conditional FP-Tree are represented in a minimized DISTree
structure, defined below.
Minimized DISTree: The minimized DISTree is similar to the DISTree structure
defined earlier in the previous section (Seyfi, Geva and Nayak 2014). The size of a minimized
DISTree is bounded by the size of the potential subsets in one potential discriminative subtree of a
conditional FP-Tree; non-potential subsets are ignored when the itemset combinations are
generated (e.g., the minimized DISTree in Figure 3.6 is generated out of the potential
discriminative subsets of the left-most subtree in the conditional FP-Tree in Figure 3.5, without
considering 𝑑, which is a non-potential internal node). The minimized DISTree covers the
itemsets starting with the 𝑟𝑜𝑜𝑡 item of 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 as prefix and ending with items in
𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) as postfix (e.g., the minimized DISTree in Figure 3.6
covers the itemsets with prefix 𝑐 and postfix 𝑎, generated out of the potential discriminative
subset of items 𝑐, 𝑏 and 𝑎 in the left-most subtree in the conditional FP-Tree in Figure 3.5). This is
formally defined below.
Let 𝐼 be an itemset in a minimized DISTree for a given potential discriminative subtree,
i.e., 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡. For all internal subsets 𝐼′ of itemset 𝐼, i.e., 𝐼′ ⊂ 𝐼, which start immediately
after the 𝑟𝑜𝑜𝑡 item of 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 and end before the items in
𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡), the subsets 𝐼′ are included in the minimized DISTree
generated from the 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 if 𝐼 = {𝑟𝑜𝑜𝑡} ∪ 𝐼′ ∪ {𝑎}, where
𝑎 ∊ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡).
It should be noted that in an itemset, the 𝑟𝑜𝑜𝑡 item of 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 and the item in
𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) may refer to the same node (i.e., a length-1 itemset); e.g.,
in the right-most subtree under root 𝑎 in Figure 3.5 (i.e., 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑎), the node 𝑎0,2 is both the
𝑟𝑜𝑜𝑡 of 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑎 and a member of 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑎). This confirms that the
minimized DISTree covers the itemsets in a 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 that satisfy the condition above. We
explain the conditional FP-Tree expansion that obtains all possible subtrees in the next subsection.
The minimized DISTree is generated from the potential discriminative subsets of one
𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) for the processing Header-Table item; for example, Figure 3.6 shows
the minimized DISTree for Header-Table item 𝑎, generated out of the potential discriminative
subset of items 𝑐, 𝑏 and 𝑎 in the left-most subtree in the conditional FP-Tree in Figure 3.5 (i.e.,
𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐)). As explained before, the minimized DISTree of a subtree 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡
covers the itemset combinations starting with the 𝑟𝑜𝑜𝑡 item of 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 as prefix and ending
with items in 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) as postfix. The potential discriminative
itemsets with other prefix items are generated from potential discriminative subtrees with
different 𝑟𝑜𝑜𝑡 items, in separate minimized DISTrees.
Figure 3.6 Minimized DISTree generated from the left-most subtree in Figure 3.5
By generating the potential discriminative itemset combinations from all branches of a
potential discriminative subtree (e.g., the three branches of the left-most subtree
𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐) in Figure 3.5, i.e., 𝑐𝑏𝑑𝑎3,2, 𝑐𝑏𝑎1,0 and 𝑐𝑎1,4), the minimized DISTree is
traversed through the Header-Table item links for mining the discriminative itemsets based on
Definition 3.1 (e.g., the highlighted node 𝑎4,2 in Figure 3.6 refers to the discriminative itemset
𝑐𝑏𝑎4,2). The conditional FP-Tree is then expanded for processing the next subtree of the
processing Header-Table item, as in the section below.
3.4.1.2 Conditional FP-Tree expansion
In a conditional FP-Tree, except for the left-most subtree (such as the subtree with root 𝑐
in Figure 3.5), a subtree with a particular root item may not contain all the itemsets starting with
that item; for example, the subtree with root 𝑏 in Figure 3.5 does not include the itemset 𝑏𝑑𝑎,
which was included in the left-most subtree. In order to generate all possible discriminative
itemsets, before traversing the next 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 of the processing Header-Table item, the
conditional FP-Tree must be expanded by adding the sub-branches of the current 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡
without their 𝑟𝑜𝑜𝑡 item. Each sub-branch is added to the conditional FP-Tree under the 𝑟𝑜𝑜𝑡 node
of one of the remaining subtrees, summing up the frequencies of the itemsets ending with the
items in 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡); for example, the three sub-branches of the
left-most subtree 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐 in Figure 3.5, i.e., 𝑏𝑑𝑎3,2, 𝑏𝑎1,0 and 𝑎1,4, are added to the
conditional FP-Tree as in Figure 3.7, producing three branches under roots 𝑏 and 𝑎, i.e., 𝑏𝑑𝑎3,2,
𝑏𝑎2,0 and 𝑎1,6. To save space, the 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) of the processed
𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 are removed from the conditional FP-Tree. The conditional FP-Tree expansion
continues by processing each 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 until no 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 remains.
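The expansion step above can be sketched with branches held as item-tuple to frequency-pair maps; this dictionary representation is an assumption for illustration, and the initial branches of 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑏 and 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑎 (𝑏𝑎1,0 and 𝑎0,2) are read off Figure 3.5 as described in the text:

```python
def expand(processed_branches, remaining):
    # Add the sub-branches of the processed subtree (with its root item
    # dropped) under the remaining subtrees, summing the frequencies of
    # identical paths.
    for path, fi, fj in processed_branches:
        sub = path[1:]          # drop the root of the processed subtree
        if not sub:             # nothing left below the root
            continue
        acc = remaining.setdefault(sub, [0, 0])
        acc[0] += fi
        acc[1] += fj
    return remaining

# Figure 3.5 -> Figure 3.7: merge the sub-branches of Subtree_c
rest = {('b', 'a'): [1, 0], ('a',): [0, 2]}
expand([(('c', 'b', 'd', 'a'), 3, 2),
        (('c', 'b', 'a'), 1, 0),
        (('c', 'a'), 1, 4)], rest)
print(rest)  # {('b', 'a'): [2, 0], ('a',): [1, 6], ('b', 'd', 'a'): [3, 2]}
```

The result reproduces the three branches 𝑏𝑑𝑎3,2, 𝑏𝑎2,0 and 𝑎1,6 of Figure 3.7.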
Lemma 3-6 (Completeness of itemset prefixes) For the current processing Header-Table
item, the conditional FP-Tree expansion confirms that all the possible itemset prefixes of this
item can be obtained.
Proof. Each 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 is used for generating the potential discriminative itemsets
with 𝑟𝑜𝑜𝑡 item of 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 as prefix and items in 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) as
postfix. The expanded conditional FP-Tree with sub-branches of the processed 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡
ensures that the left-most subtree covers all the itemset prefixes starting with the subtree root
item. The expanded conditional FP-Tree with sub-branches of every processed 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡
obtains a complete set of itemsets in distinct subtrees starting with every possible 𝑟𝑜𝑜𝑡 item. This
ensures the completeness of the conditional FP-Tree in DISSparse method for itemset
combination generation with all prefixes for each header item.
∎
The expanded conditional FP-Tree of Header-Table item 𝑎 after processing the first
subtree (i.e., 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐) is presented in Figure 3.7; for example, the path 𝑏𝑑𝑎3,2 is added to
𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑏 by expanding the conditional FP-Tree with sub-branch 𝑏𝑑𝑎3,2 of 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐. The size
of a conditional FP-Tree may increase through expansion, especially at the beginning. However, the increase is not exponential compared to the original conditional FP-Tree size and, as per the empirical analysis, does not affect the DISSparse algorithm's performance.
Figure 3.7 Expanded conditional FP-Tree of Header-Table item 𝑎 after processing the first
subtree
In Example 3.1, the 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑏 is traversed by 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑏)
links. The 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑏 has two branches ending with the 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑏) i.e.,
𝐼(𝑎3,2) and 𝐼(𝑎2,0) as in Figure 3.7. Based on HEURISTIC 3.1, the 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑏 is potential with Max_freq𝑖(𝑏, 𝑎) = 5 ≥ (𝜑𝜃𝑛𝑖 = 0.1 ∗ 2 ∗ 15) and Max_dis_value(𝑏, 𝑎) = 2.5 ≥ (𝜃 = 2). Based on HEURISTIC 3.2, the internal node 𝑑 is non-potential with Max_freq𝑖(𝑏, 𝑑, 𝑎) = 3 ≥ (𝜑𝜃𝑛𝑖 = 0.1 ∗ 2 ∗ 15) but Max_dis_value(𝑏, 𝑑, 𝑎) = 1.5 ≱ (𝜃 = 2).
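Both heuristic tests reduce to a pair of threshold comparisons; a minimal sketch (the function name is ours) using the Example 3.1 numbers quoted above:

```python
def is_potential(max_freq_i, max_dis_value, phi, theta, n_i):
    """HEURISTIC 3.1/3.2 style test: a subtree or internal node is
    potential only if both the frequency bound and the discriminative
    bound hold."""
    return max_freq_i >= phi * theta * n_i and max_dis_value >= theta

# Example 3.1 values: phi = 0.1, theta = 2, n_i = 15
print(is_potential(5, 2.5, 0.1, 2, 15))  # Subtree_b: potential
print(is_potential(3, 1.5, 0.1, 2, 15))  # internal node d: non-potential
```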
The minimized DISTree for the Header-Table item 𝑎 based on potential discriminative
subsets in the left-most subtree in conditional FP-Tree in Figure 3.7, 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑏, is generated as
in Figure 3.8 (e.g., the highlighted node 𝑎(5,2) in Figure 3.8 refers to the discriminative itemset
𝑏𝑎5,2).
Figure 3.8 Minimized DISTree generated out of the potential discriminative subsets of the
left-most subtree in conditional FP-Tree for Header-Table item 𝑎
The conditional FP-Tree of the Header-Table item 𝑎 is then expanded by adding the two
sub-branches of 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑏, 𝑑𝑎3,2 and 𝑎2,0, as in Figure 3.9. Based on HEURISTIC 3.1, the 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑑 with one itemset ending with the items in 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑑) (i.e., 𝐼(𝑎3,2))
is non-potential with Max_freq𝑖(𝑑, 𝑎) = 3 ≥ (𝜑𝜃𝑛𝑖 = 0.1 ∗ 2 ∗ 15) but Max_dis_value(𝑑, 𝑎) = 1.5 ≱ (𝜃 = 2), so the minimized DISTree is not generated. Based on
Lemma 3-6, related to the completeness of itemset prefixes, the conditional FP-Tree must be
expanded by adding the sub-branches of the non-potential 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 as well; for example, the
conditional FP-Tree in Figure 3.9 is expanded by adding the single sub-branch of 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑑,
path 𝑎3,2. The 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑎 with one itemset ending with the items in
𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑎) (i.e., 𝐼(𝑎6,8)) is found as non-potential.
Figure 3.9 Expanded modified conditional FP-Tree of Header-Table item a after processing
the second subtree
The potential discriminative itemset combination generation for the processing Header-
Table item is finished if there is no more 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 in the conditional FP-Tree. Following the
bottom-up order of Desc-Flist, the conditional FP-Tree is then generated for the rest of Header-
Table items, respectively (i.e., item 𝑒 in Example 3.1). In the DISSparse method, by processing the last Header-Table item (i.e., item 𝑐 in Example 3.1), the full set of discriminative itemsets is reported in 𝐷𝐼𝑖𝑗. In data streams, the non-discriminative subsets of discriminative itemsets, as part of the answer set, may become involved in window model updating. The basics of adjusting the frequencies of these itemsets are explained in the section below.
3.4.1.3 Tuning non-discriminative subsets of discriminative itemsets
In contrast to the Apriori property, and distinguishing discriminative itemset mining from frequent itemset mining, non-discriminative itemsets can appear as subsets of discriminative itemsets; for example, the item 𝑐 in Example 3.1 is a subset of discriminative itemsets, but 𝑐 itself is not discriminative. After mining all discriminative itemsets in the datasets, the frequencies of the non-discriminative itemsets that appear as subsets of discriminative itemsets must be adjusted accordingly using the original FP-Tree. In data stream mining, these itemsets may become
involved in window model updating, as discussed in Chapter 4 and Chapter 5. For the sake of clarity, in this thesis the non-discriminative subsets of discriminative itemsets are simply called non-discriminative subsets. Tuning the frequencies of non-discriminative subsets is not a time-consuming process, since the discriminative itemsets are sparse and have a small number of non-discriminative subsets.
Lemma 3-7 (Exact non-discriminative subsets) Tuning the frequencies of the non-discriminative itemsets that appear as subsets of discriminative itemsets using the original FP-Tree ensures the exact frequencies of these itemsets as part of the results that may become involved in window model updating.
Proof. The original FP-Tree is a superset of the conditional FP-Trees and has a full view of all itemsets in the datasets. The exact frequencies of the non-discriminative subsets are collected accurately from their appearances in the original FP-Tree by traversing the Header-Table links.
∎
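A sketch of this tuning step, assuming the exact frequencies have already been collected from the original FP-Tree into a lookup table (the table stands in for the Header-Table link traversal, and all names and values below are illustrative):

```python
from itertools import combinations

def tune_subsets(disc_itemsets, exact_freq):
    """Collect the non-discriminative proper subsets of discriminative
    itemsets and attach their exact (f_i, f_j) frequencies, looked up
    from a table standing in for the original FP-Tree."""
    disc = {frozenset(s) for s in disc_itemsets}
    tuned = {}
    for itemset in disc:
        for r in range(1, len(itemset)):
            for sub in combinations(sorted(itemset), r):
                s = frozenset(sub)
                if s not in disc:
                    tuned[s] = exact_freq[s]
    return tuned

# 'ba' is discriminative; its single-item subsets are not (values illustrative)
freqs = {frozenset("b"): (6, 3), frozenset("a"): (6, 8)}
result = tune_subsets([{"b", "a"}], freqs)
```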
3.4.2 DISSparse Algorithm
The DISSparse algorithm starts by reading the batch of transactions and making the
Desc-Flist based on the descending order of the item frequencies in the target data stream 𝑆𝑖. The
Desc-Flist order is used for saving space by sharing the paths in the prefix tree structures,
including the FP-Tree, conditional FP-Tree and minimized DISTree, with the most frequent items at the top. In data stream mining, this Desc-Flist is made from the first batch of transactions and remains the same for all upcoming batches in the data streams. If a single-pass algorithm is required, this step can be skipped by building the prefix tree structures using an alphabetical order of items. The input parameters, discriminative level 𝜃 and support threshold 𝜑,
are defined based on the application domain, data stream characteristics and sizes or by the
domain expert users as discussed in Chapter 6.
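Building the Desc-Flist amounts to a frequency count over the first batch of the target stream; a minimal sketch, with alphabetical tie-breaking as our own assumption for determinism:

```python
from collections import Counter

def build_desc_flist(target_batch):
    """Order items by descending frequency in the target stream S_i,
    breaking ties alphabetically (an assumption of this sketch)."""
    counts = Counter(item for txn in target_batch for item in txn)
    return sorted(counts, key=lambda it: (-counts[it], it))

print(build_desc_flist([["a", "b", "c"], ["a", "b"], ["a", "d"]]))
# ['a', 'b', 'c', 'd']
```

The resulting order is then reused for every subsequent batch, so shared prefixes in the FP-Tree stay stable across the stream.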
The FP-Tree and Header-Table are built using expansion of FP-Growth (Han, Pei and
Yin 2000) for the batch of transactions 𝐵. Following the bottom-up order of Desc-Flist, the
conditional FP-Tree is built for each Header-Table item. Every 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 in the conditional
FP-Tree is assessed using HEURISTIC 3.1 and HEURISTIC 3.2 to limit the algorithm to the
potential discriminative itemsets. The minimized DISTree is generated for a
𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) by the potential discriminative itemset combinations and
discriminative itemsets are instantly reported in 𝐷𝐼𝑖𝑗. The conditional FP-Tree of the processing
Header-Table item is expanded by the sub-branches of the processed 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 and the process continues while a new 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 remains. After full discovery of the discriminative itemsets, the frequencies of the non-discriminative subsets are tuned using the original FP-Tree.
Algorithm 3.3 (DISSparse: Discriminative itemset mining using Sparse
Prefix Tree)
Input: (1) The discriminative level threshold 𝜃; (2) The support threshold 𝜑; (3)
The input batch 𝐵 made of transactions with alphabetically ordered items
belonging to data streams 𝑆𝑖 and 𝑆𝑗.
Output: 𝐷𝐼𝑖𝑗, a set of discriminative itemsets in 𝑆𝑖 against 𝑆𝑗 in a batch of
transactions 𝐵.
Begin
1) Scan 𝐵 to generate Header-Table and to order the items in Header-Table
by frequency;
2) Make FP-Tree for 𝐵;
3) 𝐷𝐼𝑖𝑗 ={ };
4) For each item x in Header-Table do // x is least-frequent
5) Make conditional FP-Treex based on item x;
6) While conditional FP-Treex has remaining 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 do // left-most order
7) Assess 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 using HEURISTIC 3.1;
8) If 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) then
9) Find 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑛𝑜𝑑𝑒𝑟𝑜𝑜𝑡) using HEURISTIC 3.2;
10) Generate minimized DISTree based on 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡)
with only potential internal nodes in 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑛𝑜𝑑𝑒𝑟𝑜𝑜𝑡);
11) Generate discriminative itemsets from the minimized DISTree;
12) Update 𝐷𝐼𝑖𝑗 by adding the discovered discriminative itemsets;
13) End if;
14) Expand conditional FP-Treex by sub-branches of 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡;
15) End while;
16) End for;
17) Report discriminative itemsets in 𝐷𝐼𝑖𝑗;
End.
Based on the three lemmas, we can prove that the DISSparse algorithm can find all
correct discriminative itemsets.
Theorem 3-3 (Completeness and correctness of DISSparse): Based on Lemma 3-6,
each conditional FP-Tree obtains a complete set of itemset prefixes for each Header-Table item.
Based on Lemma 3-4 and Lemma 3-5, the potential discriminative itemsets in each potential
𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 in conditional FP-Tree are generated in a minimized DISTree and discriminative
itemsets are discovered correctly. These prove the completeness and correctness of the
DISSparse method by discovering all the discriminative itemsets and their non-discriminative
subsets.
As discussed, in this thesis the problem of mining discriminative itemsets in data streams
is defined, using two different window models. The proposed DISSparse method is used in
Chapter 4 for mining discriminative itemsets in data streams using the tilted-time window model
(e.g., Figure 2.3). The DISSparse method is then used in Chapter 5 for mining discriminative
itemsets in data streams using the sliding window model (e.g., Figure 2.4). The DISSparse
algorithm is used for processing a single batch of transactions 𝐵 following the updating process
for the two window models, as was explained briefly for DISTree algorithm in Section 3.3.2.
3.4.3 DISSparse Algorithm Complexity
In the DISSparse algorithm, the parts attracting considerable complexity are finding the potential discriminative itemsets, generating the minimized DISTree and expanding the conditional FP-Tree.
The most time-consuming part is finding the potential subtrees. For this part, we used two separate loops. Each item is checked once and Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎) is calculated. Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎) is calculated based on the selected items using a greedy method with nested loops. The size of the outer loop is the number of items; the inner loop, "find 𝐼(𝑏) with maximum 𝑅𝑖𝑗(𝐼(𝑏))", involves combinations. However, the greedy method decreases the number of combinations 𝐼(𝑏) that must be examined. The number of potential discriminative itemsets generated in a minimized DISTree is much smaller than the exponential number of frequent itemsets in the target dataset 𝑆𝑖 generated in the DISTree method. DISSparse does not use a big intermediate data structure as DISTree does, and discriminative itemsets are instantly discovered from the minimized DISTree. The number of discovered discriminative itemsets is usually smaller than the actual size of the minimized DISTree.
The efficiency of the DISSparse algorithm is evaluated in Chapter 6 with empirical
analysis which shows the performance of the proposed method on different datasets in
comparison to the proposed DISTree method. The method is tested with different discriminative
level thresholds, support thresholds and ratios between the sizes of the two datasets. As discussed there, the DISSparse method has efficient time and space complexity on generally large datasets for mining a large number of both short and long discriminative itemsets.
The 𝛿-discriminative emerging patterns have a definition close to the discriminative itemsets defined in this chapter. However, there are several differences, as discussed earlier. We explained that the discriminative itemsets discovered by the DISSparse algorithm cannot be limited by a static bound on frequency (< 𝛿) in the general dataset. For the purpose of comparison, and as another baseline method, we modify the definition criteria, and consequently the proposed heuristics in the DISSparse algorithm, to discover all the 𝛿-discriminative emerging patterns, as in the section below. We also modify the original DPMiner method (DPM in short) to discover all the 𝛿-discriminative emerging patterns (i.e., including redundant emerging patterns). The methods are compared in Chapter 6 in terms of time and space usage.
3.4.4 Modified DISSparse and modified DPMiner
The DISSparse method is modified to discover the 𝛿-discriminative emerging patterns, following a similar definition as in the DPMiner method. The modification lies in how the subtrees in the conditional FP-Tree are assessed, pruning the non-delta-discriminative itemsets instead of the non-discriminative itemsets. The conditional FP-Tree is traversed from the left-most subtree through the Header-Table item links. The two proposed heuristics in the DISSparse
method are re-defined to find the potential subtrees and potential internal nodes for mining delta-
discriminative itemsets, respectively. The algorithm looks for delta-discriminative itemsets
instead of the discriminative itemsets defined in this chapter; the rest of the algorithm is the same as the original DISSparse algorithm. Each subtree and its internal nodes are checked based on the two new heuristics to identify the potential subtrees for mining 𝛿-discriminative itemsets. A potential subtree is defined based on the two conditions below.
The maximum frequency of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎) in the datasets 𝑆𝑖 and 𝑆𝑗, denoted as Max_freq(𝑟𝑜𝑜𝑡, 𝑎), is defined as the sum of the frequencies in 𝑆𝑖 and 𝑆𝑗 of the itemsets in 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎), as below.

Max_freq(𝑟𝑜𝑜𝑡, 𝑎) = ∑_{𝑏 ∈ 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡,𝑎)} (𝑓𝑖(𝑏) + 𝑓𝑗(𝑏))    (3-d)
Let 𝒮 be the power set of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎), 𝒮 = 2^𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡,𝑎), i.e., 𝒮 consists of all subsets of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎). For 𝐵 ∈ 𝒮 and 𝐵 ≠ { }, the delta-discriminative value of the itemsets in 𝐵 is defined below.

Delta_dis(𝐵) = ∑_{𝑏∈𝐵} 𝑓𝑖(𝑏),  if ∑_{𝑏∈𝐵} (𝑓𝑖(𝑏) + 𝑓𝑗(𝑏)) ≥ 𝑚𝑖𝑛_𝑠𝑢𝑝 and ∑_{𝑏∈𝐵} 𝑓𝑖(𝑏) ≤ ∑_{𝑏∈𝐵} 𝑓𝑗(𝑏)
Delta_dis(𝐵) = ∑_{𝑏∈𝐵} 𝑓𝑗(𝑏),  if ∑_{𝑏∈𝐵} (𝑓𝑖(𝑏) + 𝑓𝑗(𝑏)) ≥ 𝑚𝑖𝑛_𝑠𝑢𝑝 and ∑_{𝑏∈𝐵} 𝑓𝑗(𝑏) < ∑_{𝑏∈𝐵} 𝑓𝑖(𝑏)
(3-e)
Delta_dis(𝐵) is the smaller frequency of itemset 𝐵 among the two datasets. The
minimum delta-discriminative value of all itemsets in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 ending with a header item 𝑎,
denoted as Min_delta_dis(𝑟𝑜𝑜𝑡, 𝑎), is defined below.
Min_delta_dis(𝑟𝑜𝑜𝑡, 𝑎) = 𝑚𝑖𝑛_{𝐵∈𝒮, 𝐵≠{ }} {Delta_dis(𝐵)}    (3-f)
The Min_delta_dis(𝑟𝑜𝑜𝑡, 𝑎) can be defined either based on the dataset 𝑆𝑖 or dataset 𝑆𝑗
depending on the frequency of the itemsets in the datasets. Obviously, Min_delta_dis(𝑟𝑜𝑜𝑡, 𝑎)
shows the highest discrimination among all subsets of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎).
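The definitions (3-d) to (3-f) can be made concrete by direct enumeration of the power set; the brute-force sketch below is only to illustrate the definitions, since Algorithm 3.4 exists precisely to avoid this enumeration. Each leaf itemset is represented by its (𝑓𝑖, 𝑓𝑗) pair, and the values are illustrative:

```python
from itertools import combinations

def max_freq_and_min_delta_dis(leaves, min_sup):
    """Direct evaluation of Eq. (3-d)-(3-f). leaves are the (f_i, f_j)
    pairs of itemsets(root, a)."""
    mf = sum(fi + fj for fi, fj in leaves)          # Eq. (3-d)
    best = None
    for r in range(1, len(leaves) + 1):
        for B in combinations(leaves, r):           # all non-empty subsets
            sfi = sum(fi for fi, _ in B)
            sfj = sum(fj for _, fj in B)
            if sfi + sfj >= min_sup:                # frequency guard, Eq. (3-e)
                d = min(sfi, sfj)                   # the smaller frequency
                best = d if best is None else min(best, d)
    return mf, best

print(max_freq_and_min_delta_dis([(3, 2), (2, 0), (1, 4)], min_sup=5))
# (12, 1)
```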
In order to determine Min_delta_dis(𝑟𝑜𝑜𝑡, 𝑎), all possible itemsets in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 will
have to be generated, as required by the above equation. However, generating all possible itemset combinations is time-consuming. An efficient method, described below, is designed to find a subset 𝐵 of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎) that makes Delta_dis(𝐵) minimum among all subsets. Let 𝑆𝑓𝑖 and 𝑆𝑓𝑗 be the sums of the frequencies of all itemsets in 𝐵 in datasets 𝑆𝑖 and 𝑆𝑗, respectively. The sum of 𝑆𝑓𝑖 and 𝑆𝑓𝑗 is checked for being at least 𝑚𝑖𝑛_𝑠𝑢𝑝 and with 𝑆𝑓𝑗 ≤ 𝛿 or 𝑆𝑓𝑖 ≤ 𝛿. For the sake of simplicity we only show the 𝑆𝑓𝑗 ≤ 𝛿 case; the complete algorithm covers both 𝑆𝑓𝑗 ≤ 𝛿 and 𝑆𝑓𝑖 ≤ 𝛿.
Initially, 𝐹𝑖 is initialized by summing up the 𝑓𝑖(𝑏) frequencies of the itemsets 𝑏 with
𝑓𝑗(𝑏) = 0. If 𝐹𝑖 is frequent then the subtree 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 is potential.
We delete the frequencies of itemsets from the sum frequencies, one by one, until we find the potential subtree or run out of itemsets. Let the 𝐼(𝑏) in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 with maximum ((𝑆𝑓𝑖 + 𝑆𝑓𝑗) − (𝑓𝑖(𝐼(𝑏)) + 𝑓𝑗(𝐼(𝑏)))) / (𝑆𝑓𝑗 − 𝑓𝑗(𝐼(𝑏))) be defined as 𝑚𝑖𝑛_𝑙𝑒𝑎𝑓. The frequencies of the itemset 𝑚𝑖𝑛_𝑙𝑒𝑎𝑓 are deducted from 𝑆𝑓𝑖 and 𝑆𝑓𝑗 each time, until 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 becomes potential or no more itemsets 𝐼(𝑏) remain. The algorithm for calculating Max_freq(𝑟𝑜𝑜𝑡, 𝑎) and estimating Min_delta_dis(𝑟𝑜𝑜𝑡, 𝑎) is presented in Algorithm 3.4.
Algorithm 3.4 𝐌𝐚𝐱_𝐟𝐫𝐞𝐪_𝐌𝐢𝐧_𝐝𝐞𝐥𝐭𝐚_𝐝𝐢𝐬
Input: (1)𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡; (2) header item 𝑎 ∊ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡)
Output: (1) Max_freq(𝑟𝑜𝑜𝑡, 𝑎); (2) Min_delta_dis(𝑟𝑜𝑜𝑡, 𝑎).
Begin
1. 𝑆𝑓𝑖 = 0, 𝑆𝑓𝑗 = 0; 𝐹𝑖 = 0, 𝐹𝑗 = 0;
2. For each item 𝑏 ∊ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) do
3. 𝑆𝑓𝑖+= 𝑓𝑖(𝐼(𝑏)); 𝑆𝑓𝑗+= 𝑓𝑗(𝐼(𝑏));
4. If 𝑓𝑗(𝐼(𝑏)) = 0 then 𝐹𝑖+= 𝑓𝑖(𝐼(𝑏)); End if;
5. End For;
6. If 𝐹𝑖 ≥ 𝑚𝑖𝑛 _𝑠𝑢𝑝 then
7. Max_freq(𝑟𝑜𝑜𝑡, 𝑎) = 𝐹𝑖; Min_delta_dis(𝑟𝑜𝑜𝑡, 𝑎) = 0;
8. Else
9. While ∃𝑏 ∊ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) and 𝑏 is unchecked do
10. Find 𝐼(𝑏) with maximum ((𝑆𝑓𝑖 + 𝑆𝑓𝑗) − (𝑓𝑖(𝐼(𝑏)) + 𝑓𝑗(𝐼(𝑏)))) / (𝑆𝑓𝑗 − 𝑓𝑗(𝐼(𝑏))) as 𝑚𝑖𝑛_𝑙𝑒𝑎𝑓;
11. If (𝑆𝑓𝑖 + 𝑆𝑓𝑗) ≥ 𝑚𝑖𝑛 _𝑠𝑢𝑝 & 𝑆𝑓𝑗 ≤ 𝛿 then
12. Max_freq(𝑟𝑜𝑜𝑡, 𝑎) = 𝑆𝑓𝑖 + 𝑆𝑓𝑗;
13. Min_delta_dis(𝑟𝑜𝑜𝑡, 𝑎) = 𝑆𝑓𝑗.
14. Else
15. 𝑆𝑓𝑖−= 𝑓𝑖(𝑚𝑖𝑛_𝑙𝑒𝑎𝑓); 𝑆𝑓𝑗−= 𝑓𝑗(𝑚𝑖𝑛_𝑙𝑒𝑎𝑓);
16. End if;
17. Tag 𝑚𝑖𝑛_𝑙𝑒𝑎𝑓 as checked;
18. End While;
19. End if;
20. Return Max_freq(𝑟𝑜𝑜𝑡, 𝑎) and Min_delta_dis(𝑟𝑜𝑜𝑡, 𝑎);
End.
In the above algorithm, the items in 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) are scanned in two separate loops. In the first loop, Max_freq(𝑟𝑜𝑜𝑡, 𝑎) is calculated based on all the itemsets in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡; 𝐹𝑖 is also calculated based on the itemsets with 𝑓𝑗(𝐼(𝑏)) = 0. If 𝐹𝑖 ≥ 𝑚𝑖𝑛_𝑠𝑢𝑝, the subtree is potential. Otherwise, in the second loop, each item is checked once using a greedy method and Min_delta_dis(𝑟𝑜𝑜𝑡, 𝑎) is calculated based on the selected items. The itemset 𝐼(𝑏) with the smallest ratio 𝑅𝑖𝑗(𝐼(𝑏)) is found as 𝑚𝑖𝑛_𝑙𝑒𝑎𝑓. Before deducting the 𝑚𝑖𝑛_𝑙𝑒𝑎𝑓 frequencies from 𝑆𝑓𝑖 and 𝑆𝑓𝑗, it is checked whether (𝑆𝑓𝑖 + 𝑆𝑓𝑗) ≥ 𝑚𝑖𝑛_𝑠𝑢𝑝 and 𝑆𝑓𝑗 ≤ 𝛿; if so, the subtree is potential, otherwise the 𝑚𝑖𝑛_𝑙𝑒𝑎𝑓 frequencies are deducted from 𝑆𝑓𝑖 and 𝑆𝑓𝑗, respectively.
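Under the assumption that each leaf itemset is represented simply by its (𝑓𝑖, 𝑓𝑗) frequency pair, the greedy procedure of Algorithm 3.4 (the 𝑆𝑓𝑗 ≤ 𝛿 direction only, as in the text) can be sketched as follows; the function name and data layout are ours.

```python
def max_freq_min_delta_dis(leaves, min_sup, delta):
    """Greedy sketch of Algorithm 3.4 (S_fj <= delta direction only).
    leaves: (f_i, f_j) pairs of itemsets ending with header item a.
    Returns (Max_freq, Min_delta_dis) or None if non-potential."""
    sfi = sum(fi for fi, _ in leaves)
    sfj = sum(fj for _, fj in leaves)
    fi_only = sum(fi for fi, fj in leaves if fj == 0)
    if fi_only >= min_sup:                          # lines 6-7
        return fi_only, 0
    remaining = list(leaves)
    while remaining:                                # lines 9-18
        def ratio(leaf):                            # line 10 scoring
            num = (sfi + sfj) - (leaf[0] + leaf[1])
            den = sfj - leaf[1]
            return num / den if den > 0 else float("inf")
        min_leaf = max(remaining, key=ratio)
        if sfi + sfj >= min_sup and sfj <= delta:   # line 11
            return sfi + sfj, sfj                   # lines 12-13
        sfi -= min_leaf[0]                          # line 15
        sfj -= min_leaf[1]
        remaining.remove(min_leaf)                  # line 17
    return None                                     # subtree non-potential
```

For example, with leaves (3,2), (2,0) and (1,4), 𝑚𝑖𝑛_𝑠𝑢𝑝 = 5 and 𝛿 = 2, the leaf (1,4) is deducted first and the subtree becomes potential with Max_freq = 7 and Min_delta_dis = 2.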
The HEURISTIC_Delta 3.1 is defined below.
HEURISTIC_Delta 3.1. 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 in the conditional FP-Tree in terms of a header item
𝑎 is considered as a potential delta-discriminative subtree denoted as
𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) if it satisfies the following conditions:
1. Max_freq(𝑟𝑜𝑜𝑡, 𝑎) ≥ 𝑚𝑖𝑛 _𝑠𝑢𝑝
2. Min_delta_dis(𝑟𝑜𝑜𝑡, 𝑎) ≤ 𝛿
Where 𝛿 is a small integer number, 𝑚𝑖𝑛 _𝑠𝑢𝑝 is the support threshold and 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎) is
a set of itemsets in subtree 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 ending with a header item
𝑎 ∊ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡).
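The heuristic itself is just the pair of threshold tests on the quantities returned by Algorithm 3.4; a one-line sketch (function name ours):

```python
def is_potential_delta(max_freq, min_delta_dis, min_sup, delta):
    """The two conditions of HEURISTIC_Delta 3.1: the subtree must be
    frequent overall and contain a subset within the delta bound."""
    return max_freq >= min_sup and min_delta_dis <= delta

print(is_potential_delta(12, 1, min_sup=5, delta=2))  # True: potential
```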
Lemma 3-8 (Potential delta-discriminative subtree) HEURISTIC_Delta 3.1 ensures
that none of the non-potential delta-discriminative subtrees contains any delta-discriminative
itemset.
Proof. The first condition in HEURISTIC_Delta 3.1 ensures that the sum of frequencies
in datasets of itemsets in a potential delta-discriminative subtree is frequent, which implies that
there could be an itemset in datasets in the subtree which is frequent. The second condition in
HEURISTIC_Delta 3.1 ensures that the minimum delta-discriminative value of a potential
discriminative subtree is smaller than the delta-discriminative value 𝛿, which implies that there
could be an itemset in the subtree whose discriminative value is smaller than the delta-
discriminative value 𝛿.
For a subtree, if it fails one or both of the two conditions, it is considered a non-potential delta-discriminative subtree. Such a subtree does not contain any itemset that is frequent in the datasets, or does not contain any itemset whose delta-discriminative value is at most 𝛿. According to the definition, the subtree cannot contain any delta-discriminative itemset.
∎
A potential subtree could still contain non-𝛿-discriminative itemsets, and non-potential subsets may exist in a potential subtree as internal nodes. We define HEURISTIC_Delta 3.2 for the potential internal nodes in the same way as HEURISTIC_Delta 3.1, as below.
HEURISTIC_Delta 3.2. An internal node 𝑖𝑛 ∈ 𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑛𝑜𝑑𝑒𝑟𝑜𝑜𝑡 in a potential subtree
𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 is considered as potential delta-discriminative internal node denoted as
𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑖𝑛) if it satisfies the following conditions:
1. Max_freq(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) ≥ 𝑚𝑖𝑛 _𝑠𝑢𝑝
2. Min_delta_dis(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) ≤ 𝛿
Where 𝛿 is a small integer number, 𝑚𝑖𝑛 _𝑠𝑢𝑝 is the support threshold and
𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) is a set of itemsets in subtree 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 ending with a header item
𝑎 ∊ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) with the internal node 𝑖𝑛 as subset.
Lemma 3-9 (Potential delta-discriminative internal node) HEURISTIC_Delta 3.2
ensures that none of the non-potential delta-discriminative internal nodes would occur in any
delta-discriminative itemset.
Proof. The first condition in HEURISTIC_Delta 3.2 ensures that the sum of
frequencies in datasets of itemsets with subset of the potential delta-discriminative internal node
is frequent, which implies that there could be an itemset in datasets in the subtree with subset of
the potential delta-discriminative internal node which is frequent. The second condition in
HEURISTIC_Delta 3.2 confirms that the minimum delta-discriminative value of the potential
delta-discriminative internal node is smaller than the delta-discriminative value 𝛿, which implies
that there could be an itemset in the subtree with subset of the potential delta-discriminative
internal node whose delta-discriminative value is smaller than the delta-discriminative value 𝛿.
For an internal node, if it fails one or both of the two conditions, it is considered a non-potential delta-discriminative internal node. Such an internal node is not contained in any itemset that is frequent in the datasets, or is not contained in any itemset whose delta-discriminative value is at most 𝛿. According to the definition, the internal node cannot be contained in any delta-discriminative itemset.
∎
The non-potential internal nodes are excluded from itemset combination generation in 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡). The rest of the DISSparse algorithm remains the same as the original DISSparse method proposed in this chapter. Based on the new heuristics, the DISSparse algorithm discovers all the 𝛿-discriminative emerging patterns.
The DPM algorithm is also modified to cover all the delta-discriminative itemsets with any support (i.e., including redundant delta-discriminative emerging patterns). We
generate all the combinations of each delta-discriminative emerging pattern to find all the delta-
discriminative emerging patterns with explicit frequencies. The generated itemsets are checked
for the true frequencies using the hashing function proposed in DPMiner (Li, Liu and Wong
2007). The modified DISSparse and modified DPM are evaluated based on different datasets in
Chapter 6.
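Generating all the combinations of a pattern is a power-set enumeration; a minimal sketch (the check of true frequencies against the DPMiner hashing structure is not reproduced here):

```python
from itertools import combinations

def all_subpatterns(pattern):
    """Enumerate every non-empty combination of a delta-discriminative
    emerging pattern; each candidate would then have its true frequency
    verified (e.g., via the DPMiner hashing step, omitted here)."""
    items = sorted(pattern)
    return [frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

print(len(all_subpatterns({"a", "b", "c"})))  # 7 non-empty subsets
```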
3.4.5 DISSparse summary
In this section, the DISSparse method was proposed for efficient mining of discriminative itemsets in data streams, following the basics of the FP-Growth method (Han, Pei and Yin 2000) and proposing determinative heuristics. The DISSparse method generates only the potential discriminative itemsets ending with each Header-Table item to discover the discriminative itemsets. Following the FP-Growth method, and as with the DISTree method, the generation of itemset combinations for each Header-Table item is based on the conditional patterns and the conditional FP-Tree made for that item. In the DISSparse method, the conditional FP-Tree is a modified data structure, and itemset generation from the conditional FP-Tree is limited by taking advantage of the sparse characteristics of the discriminative itemsets.
DISSparse does not use a big intermediate data structure as DISTree does, and discriminative itemsets are directly updated to the window model. The process of itemset combination generation is based on the potential discriminative itemsets, eliminating the non-potential discriminative itemsets using HEURISTIC 3.1 and HEURISTIC 3.2. The frequencies of the non-discriminative itemsets that appear as subsets of discriminative itemsets are extracted using the original FP-Tree. The DISSparse method, using the proposed heuristics and potential discriminative itemset generation, ensures efficient and accurate discriminative itemset mining with minimum space usage.
Based on the empirical evaluations in Chapter 6, the DISSparse method with the proposed heuristics exhibits efficient time and space complexity, especially with smaller discriminative level thresholds. In the DISSparse method, the potential discriminative itemsets are discovered in each subtree in the conditional FP-Tree, and the minimized DISTree structure is generated from the potential discriminative itemset combinations for a potential subtree of the Header-Table item. The conditional FP-Tree is expanded during the process to cover the subtrees with all the itemset prefixes. The number of itemsets generated in the DISSparse method is much smaller than the exponential number of frequent itemsets in the target data stream 𝑆𝑖 generated in the DISTree method.
The precision of the DISSparse algorithm was proved by Theorem 3-3 based on its
correctness and completeness. The DISSparse method can be used correctly for offline updating
of the different window models.
3.5 CHAPTER SUMMARY
Considering the two proposed methods in this chapter, DISTree is applicable for small
datasets with simpler complexities. The DISTree method follows the generation of all
combinations of single paths in a conditional FP-Tree for the Header-Table items if they are
frequent in the target data stream. The itemsets generated out of each conditional FP-Tree are
tested based on the definitions of discriminative and non-discriminative itemsets. This is a time-consuming process, as the discriminative itemsets are a sparse subset of the frequent itemsets. The
DISSparse method is proposed for efficient mining of discriminative itemsets using
determinative heuristics for restricting the process to the potential discriminative itemsets. Based
on the discussed theorems, both algorithms report the discriminative itemsets with full accuracy
and recall in a single batch of transactions.
The process of mining discriminative itemsets highly depends on the distributions of
transactions in batches in data streams. Following the principles of FP-Growth-based methods,
the FP-Tree structure is used in DISTree and DISSparse algorithms for holding the input
transactions in a concise way in the main memory. In the DISTree method, the DISTree structure
also stays in the main memory. Although the DISTree size is controlled by pruning the non-
discriminative itemsets, it may still consume considerable memory space for large batches. A big DISTree results from the necessity of checking a large number of itemset combinations, as the discriminative itemsets do not follow the Apriori property. In fact, it may not be feasible to hold the DISTree structure in memory for large batches, so in the DISTree method it is necessary to choose an appropriate size for the single batch of transactions.
In the DISTree method, many of the generated itemsets are frequent in both target data
stream 𝑆𝑖 and general data stream 𝑆𝑗. In the DISSparse method using the proposed heuristics
many non-discriminative itemsets are skipped from itemset combination generation in a minimized DISTree, resulting in significant improvements in mining discriminative itemsets as
reported in Chapter 6. In the DISSparse method, the modified conditional FP-Tree structure is
generated for each processing Header-Table item. The discriminative itemsets are instantly
updated from the minimized DISTree structure, generated for a potential discriminative subtree,
to the window model without the need of a big intermediate data structure as in DISTree method.
These are small data structures as discussed theoretically and also evidenced by empirical
analysis in Chapter 6.
Both the DISTree and DISSparse methods work based on two scans. In the first scan,
the frequent items in the target data stream are found and the items in all input transactions and
the paths in prefix tree structures are ordered based on the descending order of the frequent items.
Several data stream mining algorithms (Giannella et al. 2003; Chi et al. 2006; Tanbeer et al.
2009) use two scans for making the concise data structures and faster processing time, in which
items are ordered by decreasing frequencies as in (Han, Pei and Yin 2000). In (Seyfi, Geva and
Nayak 2014) we proposed the DISTree algorithm based on a single scan, but the performance of the algorithm was highly affected by large data structures and high processing time.
We explained many different real world applications for mining discriminative itemsets.
One of the interesting scenarios is in market basket analyses by looking for itemsets being bought
more frequently in one market compared to the rest of markets. This can be used for
personalization or anomaly detection. Figure 3.10 shows a sample of discriminative itemsets with
different discriminative levels discovered in one market compared to the other markets. Based on the empirical analysis in Chapter 6 on different datasets, the average number of long discriminative itemsets decreases with higher discriminative level thresholds. The information
provided with the discriminative itemsets can be used for better differentiation of the trends in the
target market compared to the trends in rest of the markets. Based on the experiments conducted
in Chapter 6, the discriminative itemsets with very high discriminative levels appear less often in
the total distribution of discriminative itemsets.
Figure 3.10 A sample of discriminative itemsets distribution with different discriminative
levels in market basket monitoring application
The interesting characteristic of discriminative itemsets compared to the frequent
itemsets is that all of the discovered discriminative itemsets are useful and meaningful (e.g., the
frequent itemsets have redundancy, which is limited by the closed frequent itemset). Also, many
of the (closed or) frequent itemsets are frequent in both the target data stream and the general data
stream. They may not be a good source of knowledge for differentiation between data streams.
The difference between the discriminative and frequent itemsets can be highlighted by significant
differences in the number of discriminative itemsets discovered in the data sets in comparison to
the frequent itemsets as reported in Chapter 6.
In this chapter, the multiple data streams were treated as one target data stream 𝑆𝑖 and the
rest of them combined as the general data stream 𝑆𝑗. The methods remain principally the same
even if the multiple data streams need to be considered separately. We list this as future work,
requiring modification of the current implementation for applications with more than two
data streams. In the next chapter, we propose the algorithms for mining discriminative itemsets in
data streams using the tilted-time window model.
Chapter 4: Mining Discriminative Itemsets in
Data Streams using the Tilted-time Window
Model
In this chapter, the problem of mining discriminative itemsets in data streams using the tilted-
time window model is formally defined. The comprehensive research problem is outlined and
one method is proposed. The highly efficient and highly accurate method called H-DISSparse
utilizes the DISSparse algorithm (Seyfi et al. 2017) proposed in Chapter 3 together with the
tilted-time window model. The proposed method is explained in detail with its novel data
structures and offline updating of the tilted-time window model. In order to achieve the best
approximation in mining discriminative itemsets in data streams, we use the properties of the
discriminative itemsets in the tilted-time window model to propose a novel and efficient method.
The proposed method guarantees approximate support and approximate ratio bounds for
discriminative itemsets in large and fast-growing data streams, with the concise processing
required by real-world applications. In Chapter 6, the proposed method is extensively evaluated
on data streams made of multiple batches of transactions exhibiting diverse characteristics and
with different threshold settings. Empirical analysis shows high efficiency in time and space
usage, with the most refined approximate bounds on discriminative itemsets achieved by the
H-DISSparse algorithm. To the best of our knowledge, the proposed method in this chapter is
the first algorithm for mining discriminative itemsets in data streams using the tilted-time
window model.
The chapter starts by describing the existing works in Section 4.1. The mathematical
definition of the research problem, using several notations, is presented in Section 4.2. The tilted-
time window model and its updating process are discussed in Section 4.3. In Section 4.4 the H-
DISSparse method is proposed as an advanced method for highly efficient and highly accurate
mining of discriminative itemsets in data streams using the tilted-time window model. The
chapter concludes with a discussion of the H-DISSparse method and its state-of-the-art
techniques in Section 4.5.
4.1 EXISTING WORKS
FP-Stream is a well-known method proposed for mining frequent patterns from data streams
using the tilted-time window model (Giannella et al. 2003). Mining frequent patterns from data
streams poses more challenges than mining static datasets, as infrequent patterns can
become frequent later and cannot be ignored. The data structures need to be adjusted regularly
according to pattern frequencies over time. An extended framework is proposed for mining the time-
sensitive frequent patterns in data streams with an approximate support guarantee. The prefix tree
data structure proposed in (Han, Pei and Yin 2000) is an effective and concise data structure for
frequent pattern mining. FP-Stream is a model based on the FP-Tree that consists of an in-memory
pattern tree structure to capture the frequent and sub-frequent itemsets, and a tilted-time window
table for each itemset. By embedding the tilted-time window structure into each node, space
is saved.
Using an efficient algorithm (Giannella et al. 2003), the FP-Stream structure is
maintained and updated over a single data stream to summarize the frequent patterns at multiple
time granularities. Considering the large volume of a dynamic data stream, it is not possible to
maintain all historical frequent patterns in a limited-size window. The frequent patterns
are compressed and stored using a prefix tree structure under a tilted-time window framework
residing in main memory. The patterns in the data stream are updated incrementally with
incoming transactions. The frequent and sub-frequent patterns are maintained in a pattern tree
data structure with a built-in tilted-time window model for each individual pattern. The method is
able to answer time-sensitive queries under the FP-Stream framework over data streams with
an error bound guarantee.
For the completeness of frequent patterns over stream data, the FP-Stream method stores
the information related to infrequent itemsets. Given the minimum support 𝑚𝑖𝑛_𝑠𝑢𝑝 and the
relaxation error rate ɛ, an itemset 𝐼 is frequent if its support is no less than 𝑚𝑖𝑛_𝑠𝑢𝑝; it is
sub-frequent if its support is less than 𝑚𝑖𝑛_𝑠𝑢𝑝 but not less than ɛ; otherwise it is infrequent.
The sub-frequent patterns are maintained because patterns that are not frequent now may become
frequent later. The infrequent patterns are discarded, as they are large in number and losing
information about them does not affect the support of itemsets too much. Incremental updates of
the prefix tree and the embedded tilted-time window model occur when some infrequent patterns
become sub-frequent, or vice versa. The definitions of frequent, sub-frequent, and infrequent
patterns are actually relative to a period of time. For example, a pattern may be sub-frequent over
one period, but it is possible that it becomes infrequent over a longer period.
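The frequent / sub-frequent / infrequent split described above can be sketched as a small helper. This is an illustrative sketch, not FP-Stream's actual code; the function name is ours, and `min_sup` and `eps` correspond to the two thresholds from the text (with ɛ < 𝑚𝑖𝑛_𝑠𝑢𝑝):

```python
def classify_pattern(support: float, min_sup: float, eps: float) -> str:
    """Classify an itemset's support following the FP-Stream scheme
    (Giannella et al. 2003). `support` is the itemset's relative
    frequency over the period in question; assumes eps < min_sup.
    """
    if support >= min_sup:
        return "frequent"
    if support >= eps:          # below min_sup but above the relaxation threshold
        return "sub-frequent"   # kept: it may become frequent later
    return "infrequent"         # discarded: loses little support information
```

Only frequent and sub-frequent patterns are stored in the pattern tree; the infrequent ones are dropped, which is what bounds the error by ɛ.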
In the FP-Stream method (Giannella et al. 2003), the FP-Tree structure is used for saving
the information about the transactions in the current window. The discovered frequent patterns
are saved with their frequency histories in the tilted-time windows related to each pattern in the
pattern tree structure. This tree is similar to the FP-Tree, with the difference that it stores
patterns instead of transactions. Mining the frequent patterns over different time intervals in a
data stream assumes that the transactions can be scanned in a limited-size window at any
moment. The pattern fragment method is used for mining the frequent patterns in the current
window frames. However, the pattern fragment method is not applicable to the problem of discriminative
itemset mining, as it does not follow the Apriori property. We discuss this in detail in
Section 4.3. Using the FP-Stream structure it is possible to mine frequent patterns in the current
window, mine frequent patterns over different time ranges, put different weights on different
windows to mine various kinds of weighted frequent patterns, and mine the evolution of frequent
patterns based on the changes in their occurrences over a sequence of windows.
The superiority of the tilted-time window model over the landmark window model and
the damped window model is discussed as part of the research problem in the next section. The
detailed structure of the logarithmic window model and its updating process, adjusted for mining
discriminative itemsets in data streams, is presented in Section 4.3. The different types of
tilted-time window models are discussed by explaining the incremental frequency merging and
frequency shifting in the window model. We explain the two types of pruning techniques
proposed in (Giannella et al. 2003) for mining frequent itemsets in data streams using the tilted-
time window model. Then we discuss the reasons why these pruning techniques cannot
be used for mining discriminative itemsets from data streams using the tilted-time window model,
and we propose new solutions.
The main contribution of this chapter is one algorithm for mining
discriminative itemsets in data streams using the tilted-time window model. The H-DISSparse
method is an efficient method proposed in this chapter that utilizes the DISSparse method (Seyfi
et al. 2017) proposed in Chapter 3. The method is based on the properties of
the discriminative itemsets in the tilted-time window model. The technical details of the proposed
algorithm are provided in a way that distinguishes it from the DISSparse method.
4.2 RESEARCH PROBLEM
The patterns in data streams are usually time sensitive and, in many applications, users
are more interested in changes in patterns and their trends over time than in the patterns
themselves (Giannella et al. 2003). Patterns appearing in the distant past may no longer be
dominant and may have lost their attraction (e.g., patterns in news delivery services).
Particular groups of patterns appearing in one period of time should not affect the general trend of
the patterns in data streams over the history or the recent past (e.g., patterns related to specific
events). For time-related patterns and the changes in their trends during the history of data streams,
recent patterns are of interest in short time intervals and old patterns in larger time
intervals. These patterns are represented in the tilted-time window model in different time periods
for answering time-sensitive queries. The tilted windows have different sizes, and each one
covers a specific period of time. There are different types of tilted-time window models, such as
the natural tilted-time window model and the logarithmic tilted-time window model, as explained
in detail in Section 4.3.
The discriminative itemsets in the tilted-time window model are defined as the frequent
itemsets in the target data stream whose frequencies are much higher than those of the same
itemsets in the other data streams in different time periods. Without loss of generality, we call the
other data streams a 'general data stream' for the sake of simplicity. The discriminative itemsets are
relatively frequent in the target data stream and relatively infrequent in the general data stream in
each time period during the history. An essential issue in this research problem is finding the
itemsets that can distinguish the target data stream from all other data streams in each time
period.
Many real-world scenarios show the significance of mining
discriminative itemsets in data streams using the tilted-time window model. Monitoring of
market basket transactions could have started a long time ago, and treating the old and new
transactions equally may not be useful for tracing market fluctuations and guiding
the business (e.g., as in the landmark window model (Manku and Motwani 2002)). The fading of
old transactions (e.g., reducing their weight as in the damped window model (Chang and Lee
2003)) may not be enough for applications interested in finding the changes in the
discriminative itemsets and their trends over time. Discriminative itemsets in data
streams made of market basket data, represented in a tilted-time window model, are useful for
identifying the specific sets of items that are of high interest in one market compared to the other
markets in different time periods. They are applicable for showing the relative changes in the data
stream trends in different time periods and for answering time-sensitive queries (e.g., in
applications of personalization, anomaly detection and prediction).
Web page personalization can be optimized by tracking changes in user preferences in different
time periods or during specific events. User groups may show different preferences over time
by visiting groups of web pages much more frequently than other user groups in
different time periods. The sequences of queries with higher support in one geographical area
compared to another area are time related. The discriminative sequences of queries related to
specific events, and the changes in the relative trends in different geographical areas, are monitored
separately in different time periods for better recommendation. Changes in the discriminative
pattern trends during network monitoring in the last few minutes are more valuable for anomaly
detection and network interference prediction than the discriminative patterns themselves.
The data streams are processed in one scan and the tilted-time window model is updated
in an offline state, with approximate support and approximate ratio for discriminative itemsets in
different time periods. The discriminative itemsets are reported with exact support and exact ratio
within one batch of transactions. The discriminative itemsets may occur across the borders of the
tilted-time window frames. The relaxation ratio 𝛼 is defined for saving the itemsets with
approximate frequencies. The discriminative itemsets are reported with approximate support and
approximate ratio between the borders of the tilted-time window model. Multiple scans are not
acceptable in data stream mining (Aggarwal 2007; Han, Pei and Kamber 2011), and therefore the
discriminative itemsets with exact minimum support and exact frequency ratio in data streams
cannot be discovered. The approximation can become worse in the larger time periods by
merging the approximate discriminative itemsets from the smaller time periods, which causes an
increase in the number of false-positives and false-negatives.
The numbers of false-positive and false-negative discriminative itemsets must be
bounded for qualified answers. The discriminative itemsets do not follow the Apriori property
defined for the frequent itemsets, and a subset of a discriminative itemset can be non-
discriminative. The tail pruning techniques proposed in (Giannella et al. 2003) for efficient
frequent itemset mining using the tilted-time window model with a minimum support guarantee are
not applicable to discriminative itemset mining using the tilted-time window model. The
discriminative itemsets are a sparse subset of the frequent itemsets. Using the properties of the
discriminative itemsets in the tilted-time window model, three corollaries are defined. These
guarantee the most refined approximate support and approximate ratio bounds in the tilted-time
window model, with efficient time and space complexity in high-speed data streams.
In this chapter, discriminative itemset mining using the tilted-time window model is
discussed based on two data streams 𝑆𝑖 and 𝑆𝑗, modelled as multiple continuous batches of
transactions denoted as 𝐵1, … , 𝐵ℎ, 𝐵ℎ+1, … , 𝐵𝑛. Later, in Chapter 5, the problem is discussed in
detail using the sliding window model.
4.2.1 Problem formal definition
The formal definition of mining discriminative itemsets presented in Chapter 3 is
expanded for mining discriminative itemsets using the tilted-time window model as defined
below:
Let ∑ be the alphabet of items. A transaction 𝑇 = {𝑒1, … , 𝑒𝑖, 𝑒𝑖+1, … , 𝑒𝑛}, 𝑒𝑖 ∈ ∑, is
defined as a set of items in ∑. The items in the transaction are in alphabetical order by default, for
ease of describing the mining algorithm. The two data streams 𝑆𝑖 and 𝑆𝑗 are defined as the target
and general data streams; each consists of a different number of transactions, i.e., 𝑛𝑖 and 𝑛𝑗,
respectively. A group of input transactions from the two data streams 𝑆𝑖 and 𝑆𝑗 in a pre-defined
time period is set as a batch of transactions 𝐵𝑛, i.e., 𝑛 ≥ 1.
The tilted-time window model is composed of different window frames denoted as 𝑊𝑘,
𝑘 ≥ 0, as in Figure 4.1. Each window frame 𝑊𝑘 refers to a different time period containing
itemsets made of transactions from a different number of batches in the two data streams 𝑆𝑖 and 𝑆𝑗,
with lengths n_i^k and n_j^k, respectively. The current window frame is denoted as 𝑊0.
Figure 4.1 Tilted-time window frames
An itemset 𝐼 is defined as a subset of ∑. The itemset frequency is the number of
transactions that contain the itemset. The frequency of itemset 𝐼 in data stream 𝑆𝑖 in the window
frame 𝑊𝑘 is denoted as f_i^k(I), and the frequency ratio of itemset 𝐼 in data stream 𝑆𝑖 in the
window frame 𝑊𝑘 is defined as r_i^k(I) = f_i^k(I) / n_i^k.

In this chapter, if the frequency ratio of itemset 𝐼 in the target data stream 𝑆𝑖 in the
window frame 𝑊𝑘 is larger than its frequency ratio in the general data stream 𝑆𝑗, i.e.,
r_i^k(I) / r_j^k(I) > 1, then the itemset 𝐼 can be considered as a discriminative itemset in the
window frame 𝑊𝑘. Let R_ij^k(I) be the ratio between r_i^k(I) and r_j^k(I), i.e.,
R_ij^k(I) = r_i^k(I) / r_j^k(I). Obviously, the higher the R_ij^k(I), the more discriminative
the itemset 𝐼 is.
To more accurately define discriminative itemsets, we introduce a user-defined threshold
𝜃 > 1, called the discriminative level threshold, with no upper bound. An itemset 𝐼 is considered
discriminative in the window frame 𝑊𝑘 if R_ij^k(I) ≥ 𝜃. This is formally defined as:

R_ij^k(I) = r_i^k(I) / r_j^k(I) = (f_i^k(I) · n_j^k) / (f_j^k(I) · n_i^k) ≥ 𝜃    (4.1)

The R_ij^k(I) could be very large but with very low f_i^k(I). In order to accurately identify
discriminative itemsets that have a reasonable frequency in the window frame 𝑊𝑘, and also to
cover the case of f_j^k(I) = 0, we introduce another user-specified support threshold,
0 < 𝜑 < 1/𝜃, to eliminate itemsets that have very low frequency in the window frame 𝑊𝑘. In this
chapter, an itemset 𝐼 is considered discriminative if its frequency in the window frame 𝑊𝑘 is
no less than 𝜑𝜃n_i^k, i.e., f_i^k(I) ≥ 𝜑𝜃n_i^k, and also R_ij^k(I) ≥ 𝜃.
Definition 4.1. Discriminative itemsets in the tilted-time window model: Let 𝑆𝑖 and 𝑆𝑗 be
two data streams, with the current sizes n_i^k and n_j^k in a window frame 𝑊𝑘, 𝑘 ≥ 0, that
contain varied-length transactions of items in ∑; let 𝜃 > 1 be a user-defined discriminative level
threshold and 𝜑 ∈ (0, 1/𝜃) a support threshold. The set of discriminative itemsets in 𝑆𝑖 against 𝑆𝑗
in the window frame 𝑊𝑘 of the tilted-time window model, denoted as DI_ij^k, 𝑘 ≥ 0, is
formally defined as:

DI_ij^k = { I ⊆ ∑ | f_i^k(I) ≥ 𝜑𝜃n_i^k  &  R_ij^k(I) ≥ 𝜃 }    (4.2)
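The two conditions of Definition 4.1 can be checked per itemset with a small helper. This is an illustrative sketch in our own naming, not code from the thesis; it tests the support condition first so that the f_j^k(I) = 0 case (unbounded ratio) is handled by the support threshold alone, and compares the ratio in cross-multiplied form to avoid division:

```python
def is_discriminative(f_i: int, f_j: int, n_i: int, n_j: int,
                      theta: float, phi: float) -> bool:
    """Check Definition 4.1 for one itemset in a window frame W_k.

    f_i, f_j: frequencies of the itemset in S_i and S_j within W_k;
    n_i, n_j: stream lengths (transaction counts) within W_k;
    theta > 1: discriminative level threshold; 0 < phi < 1/theta: support threshold.
    """
    if f_i < phi * theta * n_i:      # support condition: f_i^k(I) >= phi*theta*n_i^k
        return False
    if f_j == 0:                     # ratio unbounded; the support condition suffices
        return True
    # ratio condition: R_ij^k(I) = (f_i * n_j) / (f_j * n_i) >= theta
    return f_i * n_j >= theta * f_j * n_i
```

With the example values used later in Figure 4.3 (𝜃 = 2, 𝜑 = 0.1, stream lengths 15 and 15), the itemset with frequencies 4 and 2 passes both conditions, while the one with frequencies 12 and 13 fails the ratio condition.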
The itemsets that are not discriminative in the current window frame 𝑊0 can become
discriminative in some larger window frames in the tilted-time window model (e.g., by merging
the multiple window frames). In order to avoid missing any potential discriminative itemsets in
larger window frames, we propose to identify sub-discriminative itemsets in the tilted-time
window model with a user-specified parameter. The sub-discriminative itemsets are
discovered by a relaxation parameter 𝛼 ∈ (0,1), with more sub-discriminative itemsets for smaller
𝛼. The 𝛼 is defined for approximate support and approximate ratio between the borders of the
tilted-time window model. An itemset 𝐼 is sub-discriminative if it is not discriminative but its
frequency in the target data stream 𝑆𝑖 is not less than 𝛼𝜑𝜃n_i and its ratio is not less than 𝛼𝜃. The
discriminative itemsets are of interest; however, the sub-discriminative itemsets are also tracked
during the process, as they may become discriminative in larger window frames. For the sake
of clarity, in this chapter any notation related to the sum over all tilted window frames is denoted
by 0..m; for example, Σ_{k=0}^{m} f_i^k is denoted as f_i^{0..m}.
Definition 4.2. Sub-discriminative itemsets in the tilted-time window model: Let 𝑆𝑖 and
𝑆𝑗 be two data streams, with the current sizes n_i^k and n_j^k in each window frame 𝑊𝑘, 𝑘 ≥ 0,
that contain varied-length transactions of items in ∑; let 𝑚 be the number of tilted window frames
for the itemset, i.e., 𝑘 ≤ 𝑚; let 𝜃 > 1 be a user-defined discriminative level threshold,
𝜑 ∈ (0, 1/𝜃) a support threshold and 𝛼 ∈ (0,1) a relaxation parameter. The set of
sub-discriminative itemsets in 𝑆𝑖 against 𝑆𝑗 in the tilted-time window model, denoted as SDI_ij,
is formally defined as:

SDI_ij = { I ⊆ ∑ | (f_i^0(I) ≥ 𝛼𝜑𝜃n_i^0 & R_ij^0(I) ≥ 𝛼𝜃) or
(f_i^{0..m}(I) ≥ 𝛼𝜑𝜃n_i^{0..m} & (f_i^{0..m}(I) / f_j^{0..m}(I)) · (n_j^{0..m} / n_i^{0..m}) ≥ 𝛼𝜃) }    (4.3)
The sub-discriminative itemsets are the potential discriminative itemsets in the current
window frame 𝑊0 or in the history of the data streams considering all window frames, i.e., the
sum of frequencies and the sum of data stream lengths over 𝑊0, …, 𝑊𝑘, 𝑊𝑘+1, …, 𝑊𝑚, denoted
as 𝑊0..m. The relaxation parameter 𝛼 ∈ (0,1) is defined for better approximate support and
approximate ratio of discriminative itemsets in the larger window frames, trading the
computational cost against fewer wrong or hidden discriminative itemsets.

The itemsets that are neither discriminative nor sub-discriminative are defined as non-
discriminative itemsets:
Definition 4.3. Non-discriminative itemsets in the tilted-time window model: Let 𝑆𝑖 and
𝑆𝑗 be two data streams, with the current sizes n_i^k and n_j^k in each window frame 𝑊𝑘, 𝑘 ≥ 0,
that contain varied-length transactions of items in ∑; let 𝑚 be the number of tilted window frames
for the itemset, i.e., 𝑘 ≤ 𝑚; let 𝜃 > 1 be a user-defined discriminative level threshold,
𝜑 ∈ (0, 1/𝜃) a support threshold and 𝛼 ∈ (0,1) a relaxation parameter. The set of non-
discriminative itemsets in 𝑆𝑖 against 𝑆𝑗 in the tilted-time window model, denoted as NDI_ij, is
formally defined as:

NDI_ij = { I ⊆ ∑ | (f_i^0(I) < 𝛼𝜑𝜃n_i^0 or R_ij^0(I) < 𝛼𝜃) &
(f_i^{0..m}(I) < 𝛼𝜑𝜃n_i^{0..m} or (f_i^{0..m}(I) / f_j^{0..m}(I)) · (n_j^{0..m} / n_i^{0..m}) < 𝛼𝜃) }    (4.4)
The non-discriminative itemsets are not frequent in the target data stream under the
relaxation 𝛼, or have a frequency ratio of 𝑆𝑖 against 𝑆𝑗 less than 𝛼𝜃, both in the current window
frame 𝑊0 and in the history of the data streams over all window frames 𝑊0..m, i.e., by the sum of
frequencies and the sum of data stream lengths over 𝑊0, …, 𝑊𝑘, 𝑊𝑘+1, …, 𝑊𝑚. The
non-discriminative itemsets are used for tail pruning in the tilted-time window model, as
discussed in Section 4.3.
4.2.2 Discriminative itemset mining using the tilted-time window model
The problem of discriminative itemset mining using the tilted-time window model is
defined for offline batch processing in data streams. The tilted-time window model is updated in
an offline state at the specific time intervals defined for the batches of transactions. Each new batch
of transactions is processed and the discriminative itemsets are saved in the current window
frame 𝑊0 of the tilted-time window model. The sub-discriminative itemsets are also saved, as
they may become discriminative in the future through merging of the window frames. After
discovering the discriminative and sub-discriminative itemsets from the current batch of
transactions in 𝑊0, the discriminative and sub-discriminative itemsets in the older window
frames 𝑊𝑘, 𝑘 > 0, are obtained by shifting and merging the itemsets.
The tilted-time window model updating by shifting and merging is discussed in detail
using running examples that show the significant challenges. The approximation of discriminative
itemsets in the tilted-time window model gets worse in the larger window frames: the
approximate discriminative itemsets in the smaller window frames are shifted and merged into the
larger window frames, which can cause a greater number of false-positives and false-negatives in
the tilted-time window model. The determinative properties of the discriminative itemsets in the
tilted-time window model are applied for the approximate bound guarantee. Considering the
complexities of discriminative itemset mining in data streams using the tilted-time window
model, the advanced, highly accurate and highly efficient H-DISSparse method is proposed by
utilizing the DISSparse method (Seyfi et al. 2017) proposed in Chapter 3 and defining new data
structures adapted for offline updating of the tilted-time window model. The H-DISSparse
algorithm is evaluated on input data streams with different characteristics in Chapter 6,
for its time and space complexity and different approximate support and approximate ratio
bounds in discriminative itemsets. The specific characteristics and the limitations of the proposed
methods in large and fast-growing data streams are discussed.
4.3 TILTED-TIME WINDOW MODEL
The tilted-time window model is a group of built-in tables within the nodes of a prefix
tree structure (i.e., H-DISStream, as presented in Figure 4.3). Each node in the prefix tree
structure has a built-in table for holding the frequencies of the discriminative itemsets in different
time periods called window frames. The nodes in the prefix tree structure, and each window
frame in the built-in tables, have two counters 𝑓𝑖 and 𝑓𝑗 for holding the frequencies of the itemset in
the target data stream 𝑆𝑖 and the general data stream 𝑆𝑗, respectively. The discriminative itemsets
are reported in the different tilted window frames at offline time intervals. The tilted-time
window model grows as new batches of transactions arrive over time. However, based
on its maintenance strategies, the window model remains a compact data structure. The most
recent results are represented in the current window frames with small time intervals, and the older
results are merged into the older window frames with larger time intervals. The tilted-time window
model can be defined based on different window frames, such as natural tilted-time window
frames or logarithmic tilted-time window frames, as described below.
The natural tilted-time window model takes the smallest time
period (e.g., 60 seconds) as the current window frame and builds the larger window frames by
merging the smaller ones (e.g., quarters, hours, days, weeks, etc.); for example, in the natural
tilted-time window model, every 15 one-minute time periods are accumulated as one quarter,
every four quarters are accumulated as one hour, and a day is built by merging 24 one-hour time
periods. For reporting the results over a period of months, at most 59 tilted window frames need
to be maintained (Giannella et al. 2003). The logarithmic tilted-time window model is a more
compact data structure and an alternative way of presenting the natural tilted-time window
model; for example, a batch of transactions in one minute is taken as the smallest time period.
The current window frame shows the discriminative itemsets in the last minute, and it is followed
by the results in the remaining slots covering the next 2 minutes, 4 minutes, 8 minutes, etc. In this
chapter, the logarithmic tilted-time window model is applied in the proposed algorithms, as in
Figure 4.2. In this structure the current window frame 𝑊0 covers the current time period 𝑡, and the
older window frame 𝑊𝑘 covers 2^k·t time periods, 𝑘 ≥ 0.
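The geometry of the logarithmic model can be made concrete with two one-line helpers. This is an illustrative sketch with our own function names; the bound ⌈log2(n) + 1⌉ on the number of frames for n batches is stated later in this section, following (Giannella et al. 2003):

```python
import math

def frame_span(k: int) -> int:
    """Number of base time periods t covered by window frame W_k (W_0 covers 1)."""
    return 2 ** k

def frames_needed(n_batches: int) -> int:
    """Upper bound on the number of logarithmic tilted-time window frames
    needed to summarize n_batches batches: ceil(log2(n) + 1)."""
    return math.ceil(math.log2(n_batches) + 1)
```

For instance, a billion batches need only 31 frames, which is why the model stays compact regardless of stream length.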
Figure 4.2 Logarithmic tilted-time window structure (Giannella et al. 2003)
The H-DISStream is a prefix tree structure with the built-in tilted-time window model as
defined below.
H-DISStream: The H-DISStream prefix tree structure holds the discovered
discriminative and sub-discriminative itemsets. Discriminative and sub-discriminative itemsets
share branches in the same H-DISStream structure for their common frequent items (e.g., as in
Figure 4.3). Each path in the H-DISStream, starting from the root of the prefix tree structure, may
represent a subset of multiple discriminative and sub-discriminative itemsets. Each node in the
H-DISStream has two counters 𝑓𝑖 and 𝑓𝑗 for holding the frequencies of an itemset in the target
data stream 𝑆𝑖 and the general data stream 𝑆𝑗, respectively, in the current window frame 𝑊0. The
Header-Table is defined for fast traversal of the prefix tree structure using links connecting the
itemsets ending with identical items, and the nodes are tagged as discriminative,
sub-discriminative or non-discriminative (i.e., subsets of discriminative or sub-discriminative
itemsets). Each H-DISStream node may have a built-in tilted-time window table if the itemset is
discriminative in a larger window frame 𝑊𝑘 (i.e., 𝑘 > 0), sub-discriminative in the history
summary of the window frames (i.e., 𝑊0..m), or appears as a non-discriminative subset of
discriminative or sub-discriminative itemsets in any window frame 𝑊𝑘 (i.e., 0 < 𝑘 ≤ 𝑚).
For the sake of clarity, in this chapter the non-discriminative subsets of discriminative or
sub-discriminative itemsets in any window frame 𝑊𝑘 are called non-discriminative subsets; for
example, in Figure 4.3, 𝑐:12,13 is a non-discriminative itemset which is a subset of other
discriminative itemsets in 𝑊0, considering the discriminative level threshold 𝜃 = 2, the support
threshold 𝜑 = 0.1 and dataset lengths n_1^0 = n_2^0 = 15 in the two data streams. The
H-DISStream and its tilted-time window model are updated at offline time intervals after
processing each new batch of transactions arriving in the pre-defined time intervals; for example,
in Figure 4.3 the discriminative itemset 𝑐𝑏𝑎:4,2 is discovered in the current window frame 𝑊0.
The itemset 𝑐𝑏𝑎 has a built-in table with four entries related to the older window frames 𝑊𝑘,
𝑘 > 0.
Figure 4.3 A sample H-DISStream based on Example 3.1 with the built-in tilted-time
window model
The construction of the H-DISStream data structure (i.e., 𝑊0) starts from the
discriminative itemsets discovered in the first batch of transactions. With every new batch of
transactions processed, the discriminative itemsets are shifted and the tilted windows are merged
together. An itemset is pruned from the H-DISStream if it is non-discriminative in the whole
tilted-time window model and its node is a leaf.
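The node layout just described can be sketched as a small data class. This is a hypothetical rendering for illustration only; the field names are ours, and the thesis' actual implementation may differ:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class HDISStreamNode:
    """Sketch of one H-DISStream prefix-tree node (names are illustrative).

    f_i / f_j hold the itemset's frequencies in S_i and S_j for the current
    window frame W_0; `tilted_windows` is the node's built-in tilted-time
    table, one (f_i, f_j) pair per older window frame W_1..W_m, allocated
    only when the node needs a history.
    """
    item: str
    f_i: int = 0
    f_j: int = 0
    tag: str = "non-discriminative"   # or "discriminative" / "sub-discriminative"
    children: Dict[str, "HDISStreamNode"] = field(default_factory=dict)
    tilted_windows: Optional[List[Tuple[int, int]]] = None
```

Keeping `tilted_windows` as `None` until it is first needed mirrors the text's point that only itemsets with a history (discriminative, sub-discriminative, or a non-discriminative subset in some 𝑊𝑘) carry a built-in table, which keeps the tree compact.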
4.3.1 Tilted-time window model updating
The data streams are defined as continuous batches, each containing a different number
of transactions depending on the speed of the data streams. This is shown as 𝐵1, …, 𝐵ℎ, 𝐵ℎ+1, …,
𝐵𝑛, with 𝐵𝑛 as the most recent batch and 𝐵1 as the oldest one. For each itemset 𝐼, the two counters
f_i(I)<x,y> and f_j(I)<x,y> denote the frequencies of the itemset 𝐼 in data streams 𝑆𝑖 and
𝑆𝑗, respectively, over the group of continuous batches 𝐵𝑥 to 𝐵𝑦 with 𝑥 ≥ 𝑦. For the sake of
clarity, the itemset 𝐼 and the data stream indicators are omitted from the notation (i.e., f_i(I)<x,y>
and f_j(I)<x,y> are denoted as f<x,y>). The frequencies of an itemset in the logarithmic tilted-
time window model are kept during the history as follows:

f<n,n>; f<n−1,n−1>; f<n−2,n−3>; f<n−4,n−7>; …    (4-a)
The ratio between the sizes of two neighbouring windows (i.e., the numbers of batches in two
neighbouring windows) is the window frame growth rate (e.g., in Figure 4.2 the window frame
growth rate is 2). It should be noted that in the logarithmic tilted-time window model
⌈log2(n) + 1⌉ frequency pairs exist, so the number is small even for a large number of
batches (e.g., 10^9 batches require 31 pairs of frequencies) (Giannella et al. 2003). For
updating the tilted-time window model by shifting and merging the older window frames,
intermediate windows denoted as [f_i(I)<x,y>] and [f_j(I)<x,y>] are used as extra
memory space, as shown below.
𝑓 < 𝑛, 𝑛 > stores the frequencies of the itemsets discovered from the current batch
of transactions in 𝑊0 (i.e., 𝑓𝑖⁰(𝐼) and 𝑓𝑗⁰(𝐼) in the H-DISStream prefix tree structure). When
a new batch of transactions is processed, 𝑓 < 𝑛, 𝑛 > is shifted and replaces 𝑓 < 𝑛 − 1, 𝑛 − 1 > in
𝑊1 (i.e., 𝑓𝑖¹(𝐼) and 𝑓𝑗¹(𝐼)), and the most recent frequencies are set to 𝑓 < 𝑛, 𝑛 >. Before
𝑓 < 𝑛 − 1, 𝑛 − 1 > is shifted to the next level, its intermediate window is checked: if the
intermediate window is empty, 𝑓 < 𝑛 − 1, 𝑛 − 1 > is shifted into it; otherwise, the frequencies in
𝑓 < 𝑛 − 1, 𝑛 − 1 > and its intermediate window are added together and shifted to the next level
𝑓 < 𝑛 − 2, 𝑛 − 3 > in 𝑊2 (i.e., 𝑓𝑖²(𝐼) and 𝑓𝑗²(𝐼)). This process continues until shifting stops.
Following (Giannella et al. 2003), 𝐵1 is the oldest batch while 𝑊0 is the most recent
window; hence 𝐵1 lies in 𝑊𝑚 (where 𝑚 is the oldest window) and 𝐵𝑛 lies in 𝑊0. If the batches
and the windows are laid out on one timeline, the batch indices decrease from the current time
towards the past, while the window indices increase from the current time towards the past.
The processing scenario for consecutive batches of transactions is presented in the
example below:
Example 4.1. The first batch of transactions 𝐵1 is processed and the discovered
discriminative itemsets are stored in H-DISStream as 𝑓 < 1,1 >, which is the most
recent window frame (𝑊0) in the tilted-time window model. When the next batch of
transactions 𝐵2 is processed, the itemsets from H-DISStream are shifted to the older window
frame (𝑊1), represented as 𝑓 < 1,1 > in the window frame table, and the newly discovered
itemsets are stored in H-DISStream as 𝑓 < 2,2 > (i.e., 𝑊0). The process continues for another
batch of transactions 𝐵3, after which the discovered itemsets in H-DISStream and its tilted-time
window model are set as 𝑓 < 3,3 >; 𝑓 < 2,2 > [𝑓 < 1,1 >]. Here [𝑓 < 1,1 >] is the
intermediate window frame; on arrival of a new batch of transactions it is merged with
𝑓 < 2,2 > and represented in the tilted-time window model as 𝑓 < 4,4 >; 𝑓 < 3,3 >;
𝑓 < 2,1 >. The full process is represented step by step for the first 10 batches of transactions in
Figure 4.4:
𝑓 < 1,1 >
𝑓 < 2,2 >; 𝑓 < 1,1 >
𝑓 < 3,3 >; 𝑓 < 2,2 > [𝑓 < 1,1 >]
𝑓 < 4,4 >; 𝑓 < 3,3 >; 𝑓 < 2,1 >
𝑓 < 5,5 >; 𝑓 < 4,4 > [𝑓 < 3,3 >]; 𝑓 < 2,1 >
𝑓 < 6,6 >; 𝑓 < 5,5 >; 𝑓 < 4,3 > [𝑓 < 2,1 >]
𝑓 < 7,7 >; 𝑓 < 6,6 > [𝑓 < 5,5 >]; 𝑓 < 4,3 > [𝑓 < 2,1 >]
𝑓 < 8,8 >; 𝑓 < 7,7 >; 𝑓 < 6,5 >; 𝑓 < 4,1 >
𝑓 < 9,9 >; 𝑓 < 8,8 > [𝑓 < 7,7 >]; 𝑓 < 6,5 >; 𝑓 < 4,1 >
𝑓 < 10,10 >; 𝑓 < 9,9 >; 𝑓 < 8,7 > [𝑓 < 6,5 >]; 𝑓 < 4,1 >
Figure 4.4 Tilted-time window model updating
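The shift-and-merge procedure traced in Figure 4.4 can be sketched in code. The following is a minimal illustration rather than the thesis implementation: it tracks a single frequency counter per window (the method keeps one such model per stream, 𝑆𝑖 and 𝑆𝑗, for every itemset), and the class and method names are hypothetical.

```python
class TiltedTimeWindows:
    """Logarithmic tilted-time window model with intermediate windows.

    windows[k] holds a tuple (x, y, freq): the frequency summed over
    batches B_y..B_x (x >= y) in window frame W_k.  buffers[k] is the
    intermediate window [f<x, y>] attached to W_k, or None when empty.
    """

    def __init__(self):
        self.windows = []
        self.buffers = []

    def add_batch(self, n, freq):
        """Insert the frequency counted in the newest batch B_n as W0."""
        if not self.windows:
            self.windows.append((n, n, freq))
            self.buffers.append(None)
            return
        incoming = self.windows[0]          # W0 always shifts out
        self.windows[0] = (n, n, freq)
        k = 1
        while incoming is not None:
            if k == len(self.windows):      # grow the model on demand
                self.windows.append(None)
                self.buffers.append(None)
            current = self.windows[k]
            self.windows[k] = incoming
            if current is None:
                incoming = None             # shifting stops here
            elif self.buffers[k] is None:
                self.buffers[k] = current   # park in the intermediate window
                incoming = None
            else:                           # merge with intermediate, shift on
                _, by, bf = self.buffers[k]
                self.buffers[k] = None
                incoming = (current[0], by, current[2] + bf)
            k += 1
```

Running this for ten unit-frequency batches reproduces the last line of Figure 4.4: 𝑓 < 10,10 >; 𝑓 < 9,9 >; 𝑓 < 8,7 > [𝑓 < 6,5 >]; 𝑓 < 4,1 >.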
The tilted-time window model preserves the previously discovered discriminative and
sub-discriminative itemsets by merging the smaller window frames into larger ones. After
merging, the itemsets inside the larger window frames 𝑊𝑘 (i.e., 𝑘 > 0) must be re-tested under
the updated frequencies and stream lengths based on Definition 4.1, Definition 4.2 and
Definition 4.3, and tagged as discriminative, sub-discriminative or non-discriminative itemsets,
respectively. A sub-discriminative itemset can become discriminative when window frames are
merged, as presented in the example below.
Example 4.2. Consider 𝜃 = 3 and support threshold 𝜑 = 0.1. The itemset 𝐼 with
frequencies 𝑓1⁰(𝐼) = 5 and 𝑓2⁰(𝐼) = 2 in window frame 𝑊0, with lengths 𝑛1⁰ = 10 and
𝑛2⁰ = 10, is non-discriminative. The itemset 𝐼 with frequencies 𝑓1¹(𝐼) = 5 and 𝑓2¹(𝐼) = 1 in
window frame 𝑊1, with lengths 𝑛1¹ = 15 and 𝑛2¹ = 15, is discriminative. By setting a
relaxation of 𝛼 = 0.8, the itemset 𝐼 is not omitted from 𝑊0 and is discovered as a discriminative
itemset in the larger window frame after shifting and merging the window frames 𝑊0 and 𝑊1.
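Example 4.2 can be checked mechanically. The function below assumes that Definitions 4.1–4.3 take the common form used for discriminative itemsets (support condition 𝑓𝑖 ≥ 𝜃𝜑𝑛𝑖 and ratio condition 𝑅 ≥ 𝜃, both relaxed by 𝛼 for sub-discriminative itemsets); the exact definitions appear earlier in the chapter, so this is an illustrative sketch rather than the authoritative test.

```python
import math

def classify(fi, fj, ni, nj, theta, phi, alpha):
    """Tag an itemset in one window frame as discriminative,
    sub-discriminative or non-discriminative (assumed form of
    Definitions 4.1-4.3)."""
    # Length-normalized frequency ratio of target stream over general stream.
    ratio = math.inf if fj == 0 else (fi / ni) / (fj / nj)
    if fi >= theta * phi * ni and ratio >= theta:
        return "discriminative"
    if fi >= alpha * theta * phi * ni and ratio >= alpha * theta:
        return "sub-discriminative"
    return "non-discriminative"
```

With 𝜃 = 3, 𝜑 = 0.1 and 𝛼 = 0.8, the itemset of Example 4.2 comes out sub-discriminative in 𝑊0 (ratio 2.5 < 3 but ≥ 2.4), discriminative in 𝑊1, and discriminative again after merging (frequencies 10 and 3 over lengths 25).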
4.3.2 Discriminative itemsets approximate bound
Pruning techniques were proposed in (Giannella et al. 2003) for the efficient
mining of frequent itemsets using the tilted-time window model under an approximate minimum
support guarantee. These techniques are not applicable to the problem of mining discriminative
itemsets using the tilted-time window model, as briefly explained below.
The first pruning technique in (Giannella et al. 2003) is tail pruning in the tilted-time
window model, which accepts an error threshold bounding the false positive frequent itemsets.
The tail sequences of the oldest tilted-time window frames of an itemset are pruned if the itemset
is not frequent in any of those window frames and not sub-frequent, by the defined error
threshold, in the history of the data stream from the current time period back to any of those
window frames. Based on the claim in (Giannella et al. 2003), the number of false positive
answers is reasonable if the error threshold is set small enough. However, a small error
threshold conflicts with efficiency, as a large number of sub-frequent itemsets with very low
support must be generated and saved in the tilted-time window model. Empirical analysis shows
that the number of frequent itemsets grows exponentially as the support decreases.
Discriminative itemsets are a subset of frequent itemsets, and in many applications
the discriminative itemsets of interest have low support and low ratio (e.g., in anomaly
detection). Moreover, in the research problem of discriminative itemset mining, the itemsets in
each window frame carry at least two frequencies, one for the target data stream and one for the
general data stream. Dropping tail sequences of the oldest tilted-time window frames would
cause discriminative itemsets to be missed when the older window frames are merged, producing
false negatives and lower recall; for example, through the loss of itemsets in window frames with
a possibly high ratio between the target data stream and the general data stream (i.e.,
𝑅𝑖𝑗ᵏ(𝐼) ≫ 1). It can also produce false positives and lower accuracy; for example, through the
loss of itemsets in window frames with a possibly low ratio between the target data stream and
the general data stream (i.e., 𝑅𝑖𝑗ᵏ(𝐼) ≪ 1).
The second pruning technique in (Giannella et al. 2003) is based on the anti-monotone
Apriori property of frequent itemsets. A superset of an itemset has a frequency equal to or less
than that of its subset, and this holds in every window frame of the tilted-time window model.
Hence, if an itemset is not frequent in the current batch, none of its supersets need be examined.
It follows that if the tail of an itemset can be dropped by the tail pruning technique explained
above, then the corresponding tails of all its supersets can be dropped as well. The Apriori
property does not hold for discriminative itemset mining: an itemset can be discriminative in
some window frames while its subsets are non-discriminative in the same window frames.
In this chapter, the properties of discriminative itemsets in the tilted-time window
model are exploited, within efficient time and space usage, to obtain the highest refined
approximate support and ratio bound by minimizing the number of false-positive and false-
negative discriminative itemsets in data streams.
The FP-Tree is defined following the basics of FP-Growth (Han, Pei and Yin 2000); it is
built from the frequent items of the transactions, which share branches for their most common
frequent items. In Chapter 3 this structure was adapted by adding two counters to each node,
holding the frequencies of itemsets in the target dataset and the general dataset. In the method
proposed in this chapter, the FP-Tree is generated for the current batch of transactions as a prefix
tree structure similar to the one built for a single batch in Chapter 3, but without pruning
infrequent items (i.e., the FP-Tree includes all items). Because the FP-Tree includes the items that
are infrequent in the target data stream 𝑆𝑖 in the current batch of transactions, it can be used to
mine the frequencies of the itemsets that are non-discriminative subsets of the discriminative
itemsets in the older window frames. Note that although the FP-Tree includes all items, each
conditional FP-Tree generated during offline batch processing follows the basics of FP-Growth
(Han, Pei and Yin 2000) by pruning items that are infrequent in the target data stream 𝑆𝑖. This
ensures that including all items in the FP-Tree does not add substantial complexity to batch
processing with the DISSparse algorithm. The discovered discriminative itemsets are saved in a
pattern tree with a built-in tilted-time window model, as explained in the section below.
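The two-counter adaptation of the FP-Tree node described above can be sketched as follows (the names are illustrative; the actual Chapter-3 structure also carries Header-Table links, which are omitted here):

```python
class FPNode:
    """FP-Tree node with two counters: the frequency of the node's prefix
    in the target stream S_i and in the general stream S_j."""
    def __init__(self, item):
        self.item = item
        self.freq_i = 0
        self.freq_j = 0
        self.children = {}

def insert(root, transaction, stream):
    """Insert one Desc-Flist-ordered transaction without pruning
    infrequent items, so the tree covers all items as this chapter
    requires; `stream` is "i" or "j"."""
    node = root
    for item in transaction:
        node = node.children.setdefault(item, FPNode(item))
        if stream == "i":
            node.freq_i += 1
        else:
            node.freq_j += 1
```

Transactions from both streams thus share prefix branches while keeping per-stream counts separate.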
4.3.2.1 Maintaining the discriminative itemsets in the tilted-time window model
In the H-DISSparse method proposed in this chapter, the itemsets that are discriminative,
or that appear as non-discriminative subsets (of discriminative itemsets) in at least one window
frame 𝑊𝑘 (i.e., 𝑘 ≥ 0), are saved in the tilted-time window model over the history of the data
streams. The exact set of discriminative itemsets in the current batch of transactions is discovered
by the DISSparse algorithm during batch processing and is held in the current window frame 𝑊0.
The exact frequencies of non-discriminative subsets in the current window frame 𝑊0 are
obtained by traversing the FP-Tree through the Header-Table links for their appearances in the
current batch of transactions. After each batch of transactions is processed, the tilted-time
window frames are updated in the offline state by shifting and merging the itemset frequencies
into the larger window frames.
Property 4.1. The exact set of discriminative itemsets in the current batch of transactions
is held in the current window frame 𝑊0.
This property states that all discriminative itemsets in the current window frame 𝑊0
are saved with their exact frequencies. The discriminative itemsets in the current window frame
𝑊0 are discovered using the DISSparse algorithm (Seyfi et al. 2017) with 100% accuracy and
recall.
Property 4.2. The exact set of non-discriminative subsets (of discriminative itemsets) in
the current batch of transactions is held in the current window frame 𝑊0.
This property states that all non-discriminative itemsets that stay as internal nodes in the
current window frame 𝑊0 (i.e., subsets of discriminative itemsets) are saved with their exact
frequencies. The exact frequencies of these non-discriminative subsets in the current window
frame 𝑊0 are obtained by traversing the FP-Tree through the Header-Table links for their
appearances in the current batch of transactions.
The first corollary is formally defined below.
Corollary 4-1. The exact set of itemsets, including the discriminative itemsets and their non-
discriminative subsets, in the current batch of transactions is held in the current window frame 𝑊0.
Here the current window frame 𝑊0 is the H-DISStream prefix tree structure, and the non-
discriminative subsets are subsets of discriminative itemsets in at least one window frame 𝑊𝑘
(i.e., 𝑘 ≥ 0) in the tilted-time window model during the history of the data streams.
Rationale 4-1. (H-DISStream holds the exact frequencies of itemsets from the time they
are maintained in the tilted-time window model) Corollary 4-1 ensures that any itemset in the H-
DISStream structure and its built-in tilted-time window model has exact frequencies in each
window frame 𝑊𝑘 (i.e., 0 ≤ 𝑘 ≤ 𝑚, where 𝑚 is the oldest window frame related to the itemset).
Proof. The precision of the DISSparse algorithm for mining discriminative itemsets in a
batch of transactions was proved in Chapter 3 based on Theorem 3-3. Property 4.1 ensures that
the exact set of discriminative itemsets is held in the current window frame 𝑊0, and Property 4.2
ensures that the exact frequencies of the non-discriminative subsets are held in the current
window frame 𝑊0. The FP-Tree holds all items, including the items that are infrequent in the
target data stream 𝑆𝑖; this implies that the frequencies of non-discriminative subsets that are
infrequent in 𝑆𝑖 in the current batch of transactions are not missed. Together, the two properties
imply that every itemset in 𝑊0 holds its exact frequencies in the current batch of transactions.
The tilted-time window update shifts and merges the itemset frequencies from the smaller
window frames, which hold the exact frequencies of the itemsets, into the larger window frames.
This ensures the exact frequencies of the itemset in every tilted window frame 𝑊0, 𝑊1,…, 𝑊𝑚
(i.e., 𝑊𝑚 is the oldest window frame related to the itemset).
∎
Based on Corollary 4-1, the exact itemset frequencies are held in each window frame
from the oldest window frame in which the itemset appears in the H-DISStream structure.
However, the itemset frequencies may have been ignored during batch processing in older
window frames (i.e., when the itemset was non-discriminative without any discriminative
superset). The recorded frequencies of an itemset are therefore less than or equal to its actual
frequencies in the history of the data streams over all input batches 𝐵1,…, 𝐵ℎ, 𝐵ℎ+1,…, 𝐵𝑛. Let
𝑊𝑥 be the oldest possible window frame in H-DISStream, and 𝑊𝑚 be the oldest window frame
related to the itemset 𝐼 in H-DISStream (i.e., 𝑚 ≤ 𝑥); then the following statements hold.
$$\sum_{k=0}^{m} f_i^k(I) \;\le\; \sum_{k=0}^{x} f_i^k(I), \qquad \sum_{k=0}^{m} f_j^k(I) \;\le\; \sum_{k=0}^{x} f_j^k(I) \tag{4-b}$$

$$\sum_{k=0}^{m} n_i^k \;\le\; n_i, \qquad \sum_{k=0}^{m} n_j^k \;\le\; n_j \tag{4-c}$$
The conditions above lead to approximate frequencies and approximate ratios for the
discriminative itemsets in the tilted-time window model. Discriminative itemsets whose
approximate frequencies are less than their exact frequencies may be missed in the tilted-time
window model. The approximate ratio can be less than or greater than the actual ratio, depending
on the ratios in the oldest window frames 𝑊𝑘, 𝑚 < 𝑘 ≤ 𝑥 (e.g., 𝑅𝑖𝑗ᵏ(𝐼) ≫ 1 or 𝑅𝑖𝑗ᵏ(𝐼) ≪ 1,
respectively). Based on the above conditions, one of the following two statements holds.
$$\forall\, l,\ 0 \le l \le m \le x:\quad \frac{\sum_{k=l}^{m} f_i^k(I)}{\sum_{k=l}^{m} f_j^k(I)} \times \frac{\sum_{k=l}^{m} n_j^k}{\sum_{k=l}^{m} n_i^k} \;\le\; \frac{\sum_{k=l}^{x} f_i^k(I)}{\sum_{k=l}^{x} f_j^k(I)} \times \frac{\sum_{k=l}^{x} n_j^k}{\sum_{k=l}^{x} n_i^k} \tag{4-d}$$

$$\forall\, l,\ 0 \le l \le m \le x:\quad \frac{\sum_{k=l}^{m} f_i^k(I)}{\sum_{k=l}^{m} f_j^k(I)} \times \frac{\sum_{k=l}^{m} n_j^k}{\sum_{k=l}^{m} n_i^k} \;>\; \frac{\sum_{k=l}^{x} f_i^k(I)}{\sum_{k=l}^{x} f_j^k(I)} \times \frac{\sum_{k=l}^{x} n_j^k}{\sum_{k=l}^{x} n_i^k} \tag{4-e}$$
To improve the accuracy, we define a relaxation ratio for discovering the discriminative
itemsets between the borders of the tilted-time window model, as explained in the section below.
4.3.2.2 Improving the accuracy using relaxation ratio
Data stream mining algorithms must essentially be designed around a single scan
(Aggarwal 2007; Han, Pei and Kamber 2011), as multiple scans of the datasets are often too
expensive. Discriminative itemsets whose approximate ratios are less than their exact ratios may
be missed in the tilted-time window model (i.e., false negatives). Non-discriminative itemsets
whose approximate ratios are greater than their exact ratios may be reported in the tilted-time
window model as discriminative itemsets (i.e., false positives). The approximation can be refined
using a relaxation of 𝛼 for sub-discriminative itemsets based on Definition 4.2. The sub-
discriminative itemsets are discovered during batch processing by modifying the DISSparse
algorithm, and are saved in the current window frame 𝑊0. The two heuristics HEURISTIC 3.1
and HEURISTIC 3.2 proposed in the DISSparse method (Seyfi et al. 2017) are modified by the
relaxation of 𝛼 so as to hold the sub-discriminative itemsets during batch processing. The sub-
discriminative itemsets are saved in the tilted-time window model as potential discriminative
itemsets, following Definition 4.2 and based on the relaxation of 𝛼.
Property 4.3. By modifying Corollary 4-1 using the relaxation of 𝛼, a set of non-
discriminative itemsets in the current batch is discovered as sub-discriminative itemsets and
held in 𝑊0.
This property states that the sub-discriminative itemsets are selected from the non-
discriminative itemsets by choosing the relaxation of 𝛼. The sub-discriminative itemsets in the
current window frame 𝑊0 and in the history of the data streams are discovered to obtain better
approximations of the itemset frequencies and itemset frequency ratios.
Property 4.4. A smaller relaxation of 𝛼 yields a better approximation of the
discriminative itemsets in the tilted-time window model.
This property states that more sub-discriminative itemsets are discovered when a smaller
relaxation of 𝛼 is chosen. This is a trade-off between better approximation of the discriminative
itemsets in the tilted-time window model and computation cost.
The second corollary is formally defined below.
Corollary 4-2. A refined approximate bound on the discriminative itemsets in the tilted-time
window model is obtained by modifying Corollary 4-1 based on the relaxation of 𝛼.
Here 𝛼 is the relaxation threshold for sub-discriminative itemsets, and Corollary 4-1 holds the
exact set of discriminative itemsets and non-discriminative subsets of the current batch of
transactions in the current window frame 𝑊0.
Rationale 4-2. (highest refined approximate bound on discriminative itemsets in the
tilted-time window model) Corollary 4-2 ensures that the approximation of the discriminative
itemsets in the tilted-time window model can be improved by holding the sub-discriminative
itemsets in the model.
Proof. The sub-discriminative itemsets have the potential to become discriminative when
merged into the larger window frames. Holding them improves the approximate bound on the
discriminative itemsets by increasing the number of window frames that hold the exact
frequencies of the itemsets in the tilted-time window model. This results in fewer false positives
and false negatives among the discriminative itemsets in the tilted-time window model.
Considering two relaxations 𝛼 and 𝛼′ with 𝛼 ≤ 𝛼′, the approximate bound for an itemset 𝐼 in the
tilted-time window model is obtained from its exact frequencies from the time it is maintained in
the window model, 𝑊𝑚 and 𝑊𝑚′ respectively, with 𝑚 ≥ 𝑚′.
∎
The size of the pattern tree (i.e., H-DISStream) and its built-in tilted-time window model
can grow large over the history of the data streams. We propose tail pruning techniques that keep
the in-memory data structures at a reasonable size, as explained in the section below.
4.3.2.3 Tail pruning in the tilted-time window model
Discriminative itemsets are a sparse subset of frequent itemsets. The H-DISStream
structure with its built-in tilted-time window frames is, in principle, much smaller than the FP-
Stream used in (Giannella et al. 2003) for frequent itemset mining in data streams using the
tilted-time window model. However, without effective pruning techniques, the H-DISStream
structure can still become unnecessarily large over the history of the data streams. With a
reasonable relaxation of 𝛼, a non-discriminative itemset in the H-DISStream structure has no
potential to become discriminative, within the approximate bound. Tail pruning is applied in
H-DISStream to save space by pruning the non-discriminative itemsets based on Definition 4.3.
Property 4.5. The set of non-discriminative itemsets is defined based on Definition 4.3
and can be tagged for deletion to save space.
This property states that the large number of non-discriminative itemsets can be tagged
for deletion, saving space in the tilted-time window model in data streams.
Property 4.6. The non-discriminative itemsets that stay as leaf nodes are deleted from
the tilted-time window model to save space.
This property states that the large number of non-discriminative itemsets is deleted when
they stay as leaf nodes. This tail pruning yields a large space saving in the tilted-time window
model in data streams.
The third corollary is formally defined below.
Corollary 4-3. An itemset in H-DISStream and its built-in tilted-time window model is
pruned if it is non-discriminative and stays as a tail itemset.
Here a tail itemset is not a subset of any discriminative or sub-discriminative itemset in any
𝑊𝑘 (i.e., 𝑘 ≥ 0), and Definition 4.3 defines the non-discriminative itemsets in data streams in
the tilted-time window model.
Rationale 4-3. (concise H-DISStream structure) Corollary 4-3 ensures that every itemset
held in the tilted-time window model is a discriminative itemset, a sub-discriminative itemset or
a non-discriminative subset in the history of the data streams.
Proof. H-DISStream is a compact prefix tree structure in which branches are shared for
the most common frequent items. The logarithmic built-in tilted-time window model is also a
very compact data structure. Property 4.5 ensures that the discriminative and sub-discriminative
itemsets are not pruned, and Property 4.6 ensures that only non-discriminative itemsets staying as
leaf nodes are pruned, so the non-discriminative subsets are retained. Together these imply that
an itemset is pruned only if it has the least potential to be discriminative in the recent trends of
the data streams. Further space is saved by Corollary 4-3 through iteratively pruning the non-
discriminative itemsets staying as leaf nodes together with any of their direct non-discriminative
subsets that thereby become leaves. In the tilted-time window model, consider itemsets 𝐼 ⊂ 𝐼′
that are both in the H-DISStream structure at the end of batch processing. Let 𝑊0, 𝑊1,…, 𝑊𝑚
and 𝑊0, 𝑊1,…, 𝑊𝑚′ be the window frames maintained in the tilted-time window model for the
itemsets 𝐼 and 𝐼′, respectively. The number of window frames related to the itemset 𝐼 is equal to
or greater than the number of window frames related to the itemset 𝐼′ (i.e., 𝑚 ≥ 𝑚′).
∎
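The iterative leaf pruning argued in the proof can be sketched as a post-order traversal of the prefix tree: pruning a node's subtree first means that a non-discriminative internal node whose children have all been removed is itself pruned in the same pass. The node layout below is hypothetical, not the H-DISStream implementation.

```python
class Node:
    """Minimal H-DISStream-style prefix tree node; `tag` summarizes the
    itemset's status across its maintained window frames."""
    def __init__(self, tag="non-discriminative"):
        # tag: "discriminative" | "sub-discriminative" | "non-discriminative"
        self.tag = tag
        self.children = {}

def tail_prune(node):
    """Post-order tail pruning: delete every non-discriminative node that
    is (or becomes) a leaf; discriminative and sub-discriminative nodes,
    and non-discriminative internal subsets, are retained."""
    for item, child in list(node.children.items()):
        tail_prune(child)       # prune the subtree first
        if not child.children and child.tag == "non-discriminative":
            del node.children[item]
```

For instance, a non-discriminative chain hanging below the deepest discriminative node disappears entirely in one call, while a non-discriminative node with a discriminative descendant survives as an internal subset.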
Periodic changes in the discriminative itemsets are caused by concept drifts in the data
streams, making the discriminative itemsets in neighbouring window frames considerably
different. The pruned non-discriminative itemsets, as explained, are essentially those with the
least potential to become discriminative. In Chapter 6, principles are proposed for setting the
relaxation of 𝛼 based on data stream characteristics.
Claim 4-1. Based on Rationale 4-1, Rationale 4-2 and Rationale 4-3, the highest
refined approximate bound on the discriminative itemsets is achieved efficiently by setting the
relaxation of 𝛼 to a reasonably small value and applying the tail pruning.
Claim 4-1 essentially says that the approximation in discriminative itemset mining in
data streams using the tilted-time window model is improved by saving a greater number of sub-
discriminative itemsets. The sub-discriminative itemsets have the potential to become
discriminative itemsets in the recent history of the data streams. We call this the highest refined
approximate bound on the discriminative itemsets in the tilted-time window model; it is obtained
through a smaller number of false positives and false negatives over the recent history of the
data streams.
In the next section, a single-pass algorithm is proposed for mining discriminative
itemsets in data streams using the tilted-time window model. The prefix tree structure H-
DISStream with its built-in tilted-time window model is used in the algorithm, following the
corollaries defined in this section for updating the tilted-time window model.
4.4 H-DISSPARSE METHOD
In this section, we describe the process of efficiently mining discriminative itemsets
using the tilted-time window model in the H-DISSparse method, with the approximate bound.
The H-DISSparse method utilizes the DISSparse algorithm (Seyfi et al. 2017) proposed
in Chapter 3 together with the tilted-time window model. DISSparse is used for batch processing
during offline updating of the tilted-time window model, with the same data structures, FP-Tree,
Header-Table, conditional FP-Tree and minimized DISTree, as defined in Chapter 3. The
discriminative (and sub-discriminative) itemsets are directly updated in the H-DISStream
structure (i.e., the current window frame 𝑊0), and the tilted-time window model is updated by
shifting and merging the itemsets into the older window frames 𝑊𝑘 (i.e., 𝑘 > 0). The
H-DISSparse method then continues by discovering discriminative and sub-discriminative
itemsets in the next batch of transactions. To the best of our knowledge, the H-DISSparse
method is the first work on efficient mining of discriminative itemsets using the tilted-time
window model with an approximate bound.
4.4.1 H-DISSparse Algorithm
The H-DISSparse algorithm incorporates the three corollaries defined in Section 4.3.2
for efficient discriminative itemset mining using the tilted-time window model with an
approximate bound guarantee. The H-DISStream structure is updated in offline time intervals
when the current batch of transactions 𝐵𝑛 is full (i.e., 𝑛 ≥ 1). The first batch of transactions 𝐵1
is treated differently: all item frequencies are calculated and the Desc-Flist is built in descending
order of the item frequencies. The Desc-Flist order saves space by sharing paths in the prefix
trees, with the most frequent items at the top. This Desc-Flist remains the same for all upcoming
batches in the data streams. The H-DISSparse algorithm is single-pass for the remaining batches
of transactions. The input parameters, the discriminative level 𝜃, the support threshold 𝜑 and the
relaxation of 𝛼, are defined based on the application domain, the data stream characteristics and
sizes, or by domain expert users, as discussed in Chapter 6. The H-DISSparse algorithm is
presented in Algorithm 4.1.
The FP-Tree is built by adding the transactions from 𝐵𝑛 (i.e., the most recent batch of
transactions) without pruning infrequent items. The tilted-time window model is updated for the
larger window frames 𝑊𝑘 (i.e., 𝑘 > 0) by shifting and merging, as explained, based on the
logarithmic tilted-time window frames in (Giannella et al. 2003). Following the DISSparse
algorithm (Seyfi et al. 2017) proposed in Chapter 3, the minimized DISTree is generated from
the itemset combinations of potential discriminative subsets in the conditional FP-Tree built for
each Header-Table item, following the proposed HEURISTIC 3.1 and HEURISTIC 3.2.
HEURISTIC 3.1 and HEURISTIC 3.2 are modified using Corollary 4-2 by setting the relaxation
of 𝛼 for the approximate bound guarantee on the discriminative itemsets in the tilted-time
window model. The itemsets in the minimized DISTree structure are checked against
Definition 4.1, Definition 4.2 and Definition 4.3 and, respectively, saved as discriminative or
sub-discriminative itemsets, or deleted as non-discriminative itemsets if they are leaf nodes.
The H-DISStream structure, as the current window frame 𝑊0, is updated instantly with
the discriminative and sub-discriminative itemsets in each minimized DISTree. The conditional
FP-Tree of the Header-Table item being processed is expanded by the sub-branches of the
processed subtree, and the process continues while there is a new subtree, as explained for the
DISSparse algorithm in Chapter 3. Once the discriminative itemsets in 𝐵𝑛 are fully discovered,
the exact frequencies of the non-discriminative subsets not yet updated in 𝑊0 are tuned using the
FP-Tree, following Corollary 4-1. After the window frame 𝑊0 has been updated, tail pruning is
applied to H-DISStream and its built-in tilted-time window model, following Corollary 4-3. The
discriminative itemsets in the target data stream 𝑆𝑖 against the general data stream 𝑆𝑗 in each
window frame 𝑊𝑘 (i.e., 𝑘 ≥ 0) are reported in offline time intervals in 𝐷𝐼𝑖𝑗ᵏ, and the
H-DISSparse algorithm continues with the new incoming batch of transactions 𝐵𝑛+1.
Algorithm 4.1 (H-DISSparse: Efficient Mining of Discriminative Itemsets in
Data Streams using the Tilted-time Window Model)
Input: (1) The discriminative level 𝜃; (2) the support threshold 𝜑; (3) the
relaxation of 𝛼; and (4) the input batches 𝐵𝑛 (i.e., 𝑛 ≥ 1) made of transactions
with alphabetically ordered items belonging to data streams 𝑆𝑖 and 𝑆𝑗.
Output: 𝐷𝐼𝑖𝑗ᵏ (i.e., 𝑘 ≥ 0), the sets of discriminative itemsets in 𝑆𝑖 against 𝑆𝑗
in the tilted-time window model (H-DISStream structure)
Begin
1) While not end of streams do
2) Read the current batch of transactions 𝐵𝑛;
3) Order the items in the transactions based on the Desc-Flist made from 𝐵1;
4) Build the FP-Tree for 𝐵𝑛 based on the expansion of FP-Growth (i.e.,
including all items);
5) Update H-DISStream as 𝑊0 using the DISSparse algorithm modified based
on Corollary 4-2 (HEURISTIC 3.1 and HEURISTIC 3.2 modified by
Corollary 4-2);
6) Update the window frames 𝑊𝑘 (i.e., 𝑘 > 0) by shifting and merging, as in
the FP-Stream algorithm (Giannella et al. 2003);
7) Tune the non-discriminative subsets in 𝑊0 using the FP-Tree (Corollary 4-1);
8) Apply tail pruning in H-DISStream (Corollary 4-3);
9) Report the discriminative itemsets 𝐷𝐼𝑖𝑗ᵏ for each window frame 𝑊𝑘;
10) End while;
End.
4.4.2 H-DISSparse Algorithm Complexity
In the H-DISSparse algorithm, the part attracting the most complexity is the DISSparse
algorithm, which generates the potential discriminative itemsets. Updating the tilted-time
window model by shifting and merging, tuning the frequencies of the non-discriminative
subsets and applying the tail pruning to the H-DISStream structure have less complexity than
in FP-Stream, owing to the sparsity of discriminative itemsets. In H-DISSparse the tilted-time
window model is updated after every pattern found in the current batch of transactions. The
efficiency of the H-DISSparse algorithm is discussed in detail in Chapter 6 by evaluating the
algorithm on the input data streams. Empirical analysis shows the performance of the proposed
method under different parameter settings (e.g., the relaxation of 𝛼). The efficiency of the
H-DISSparse algorithm is discussed on large and fast-growing data streams for mining
discriminative itemsets with the approximate bound guarantee.
The DISSparse method (Seyfi et al. 2017) proposed in Chapter 3 uses two determinative
heuristics and new concise data structures for efficient discriminative itemset mining in a batch
of transactions. Following Corollary 4-2, the relaxation of 𝛼 is incorporated into
HEURISTIC 3.1 and HEURISTIC 3.2 defined in Chapter 3 for the approximate support and
ratio bound guarantee on the discriminative itemsets in the tilted-time window model.
4.5 CHAPTER SUMMARY
The H-DISSparse method proposed in this chapter is applicable to large datasets. The DISSparse method it uses for offline batch processing mines discriminative itemsets in data streams efficiently, based on the heuristics proposed in Chapter 3. The tail pruning techniques proposed in the FP-Stream method are not applicable to the method proposed in this chapter, so new techniques are defined. The discriminative itemsets are held, with approximate support and approximate ratio, in the proposed novel prefix-tree H-DISStream structure and its built-in tilted-time window model over the history of the data streams.
In this chapter, three corollaries are defined for mining discriminative itemsets in the tilted-time window model with the highest refined approximate bound. The defined corollaries are applied to the H-DISSparse method. The process is efficient in the H-DISSparse method
Chapter 4: Mining Discriminative Itemsets in Data Streams using the Tilted-time Window Model Page 98
based on its efficient DISSparse algorithm, utilized for the batch processing. The corollaries defined in this chapter guarantee that the exact frequencies of the discriminative itemsets, sub-discriminative itemsets and non-discriminative subsets are held from the time they are maintained in the tilted-time window model. The non-discriminative itemsets, left as leaf nodes in H-DISStream, are pruned together with their built-in tilted-time window frames. The highest refined approximate bound is achieved by setting a smaller relaxation for tail pruning in the tilted-time window model. Owing to the compact logarithmic tilted-time window model and the defined tail pruning techniques, the H-DISStream data structure fits in main memory, which makes mining discriminative itemsets in data streams using the tilted-time window model realistic in fast-growing data streams.
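The compactness of the logarithmic tilted-time window model can be illustrated with a small sketch: each level keeps at most two frequency frames, and an overflow merges the two oldest frames into one frame at the next, coarser level, so t batches need only O(log t) counters per itemset. The function name `tilt_insert` and the list-of-lists layout are assumptions for illustration, not the thesis implementation.

```python
def tilt_insert(levels, freq):
    """Insert the newest batch frequency into a logarithmic tilted-time
    window: levels[i] holds at most two frames at granularity 2^i batches."""
    carry = freq
    for level in levels:
        level.insert(0, carry)
        if len(level) <= 2:
            return levels
        carry = level.pop() + level.pop()   # merge the two oldest frames
    levels.append([carry])                  # grow a new, coarser level
    return levels

levels = []
for f in [3, 1, 4, 1, 5]:                   # five batch frequencies
    tilt_insert(levels, f)
# Five batches are now summarized by only three counters across two levels.
```

No frequency mass is lost by the merges: the counters always sum to the total frequency over all batches, while the number of counters grows only logarithmically.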
The H-DISStream structure is stable over time, and discriminative itemsets that appear after processing batches with high concept drift are neutralized by merging into larger window frames. Building the H-DISStream structure can be made more efficient by periodically reordering the data structures based on the new trends in the data streams. The Desc-Flist order derived from the first batch of transactions is the default order for building all the data structures in the algorithms. The efficiency of the algorithms may suffer under this default order if the data streams exhibit high concept drift over time. Data structures such as the FP-Tree, conditional FP-Tree, minimized DISTree and H-DISStream can be updated periodically with a new ordering of frequent items for better efficiency (i.e., a Desc-Flist adjusted to the new trends in frequent items). However, the overhead of restructuring the large H-DISStream structure and its tilted-time window model must be considered.
The proposed method is extensively evaluated in Chapter 6 with datasets exhibiting distinct characteristics. Empirical analysis shows that the proposed H-DISSparse method has efficient time and space complexity with the highest refined approximate bound, based on the defined corollaries and a smaller relaxation 𝛼. Discriminative itemsets that appear with high concept drift in specific batches are neutralized quickly when merged into larger window frames. The in-memory data structures defined for the algorithms stay small, based on the defined corollaries, and remain stable during the process.
We have described many real-world applications for mining discriminative itemsets using the tilted-time window model. One interesting scenario arises in online news delivery services: looking for groups of words that are more frequent in the news read by specific users than in the collection of all news, over different time periods. This can be used for personalization based on a user's high interest during the current period, the history of the data streams or specific events, and for updating the system as user preferences change. Classification techniques can be applied based on the discriminative itemsets in the tilted-time window model. The general trends are not affected by high concept drift in specific periods of time, and different weighting functions can be adjusted to the discriminative
itemsets in each window frame. Interesting queries can be asked about the trends in different time periods. Discriminative itemsets are a sparse subset of frequent itemsets and are applicable to many real-world applications with large data streams (e.g., online market basket analysis, network access patterns).
In this chapter, discriminative itemsets are updated at offline time intervals in the tilted-time window model. In the next chapter, we propose an algorithm for mining discriminative itemsets in data streams using the sliding window model, with offline and online updating states.
Chapter 5: Mining Discriminative Itemsets in Data Streams using the Sliding Window Model Page 100
Chapter 5: Mining Discriminative Itemsets in
Data Streams using the Sliding Window
Model
In this chapter, the problem of mining discriminative itemsets in data streams using the sliding window model is formally defined. The comprehensive research problem is outlined and one method is proposed. The method, called S-DISSparse, applies the efficient DISSparse algorithm (Seyfi et al. 2017) proposed in Chapter 3 to the sliding window model. The proposed method is explained in detail with its novel data structures and the offline updating sliding window model. In this chapter, novel determinative heuristics are proposed for exact and efficient mining of discriminative itemsets using the offline sliding window model. The proposed heuristics guarantee exact discriminative itemsets in large and fast-growing data streams, with the concise processing required by real-world applications. Online sliding is also proposed for mining discriminative itemsets in data streams in real time with the highest refined approximation. The proposed method is extensively evaluated in Chapter 6 on data streams made of multiple batches of transactions exhibiting diverse characteristics and with different threshold settings. Empirical analysis shows the efficient time and space complexity gained by the S-DISSparse algorithm for mining discriminative itemsets in the offline and online sliding window models. To the best of our knowledge, the proposed method is the first algorithm for mining discriminative itemsets in data streams using the sliding window model.
The chapter starts by describing the existing works in Section 5.1. The mathematical definition of the research problem is given in Section 5.2. The sliding window model and its offline updating process are discussed in Section 5.3. The online sliding window model is discussed in Section 5.4. The S-DISSparse method for mining discriminative itemsets in data streams using the sliding window model is proposed in Section 5.5. The chapter concludes with a discussion of the S-DISSparse method and its state-of-the-art techniques in Section 5.6.
5.1 EXISTING WORKS
Moment is a well-known method for mining closed frequent patterns from data streams using the sliding window model (Chi et al. 2004). A synopsis data structure is designed to monitor the transactions in the sliding window for mining the current closed frequent itemsets. The synopsis data structure cannot monitor all the itemset combinations due to time and
memory limits. On the other hand, monitoring only the frequent itemsets is not enough, as it is then impossible to detect itemsets changing from infrequent to frequent. Moment (Chi et al. 2004) introduces a compact data structure, the closed enumeration tree (CET), for maintaining a dynamic set of itemsets over a sliding window. The itemsets on the boundary between the closed frequent itemsets and the rest of the itemsets are kept in the CET, and the boundary moves with concept drift in the data stream. An itemset's status changes (e.g., from infrequent to frequent) through the boundary. The data structure is small enough to be maintained in memory and updated in real time. The boundary is relatively stable and most itemsets do not often change status.
The CET data structure in the Moment algorithm (Chi et al. 2004) is informative enough for mining any (closed) frequent itemset over a sliding window. With a reasonably large window size and little concept drift in the data stream, most itemsets do not change their status from frequent to infrequent or vice versa. This means the itemsets are mainly updated in their frequency within the window model and do not change status through the addition or deletion of transactions. Also, changes to the entire tree structure are limited to status changes at the boundary nodes.
The Moment algorithm (Chi et al. 2004) holds four types of nodes in the CET: infrequent gateway nodes, unpromising gateway nodes, intermediate nodes, and closed nodes. These node types and the status changes between them are described briefly. Infrequent gateway nodes are infrequent itemsets with a frequent parent or a frequent sibling of the parent. Unpromising gateway nodes are frequent itemsets that have a closed superset with the same support. Intermediate nodes are frequent itemsets, not themselves unpromising gateway nodes, that have a child node with the same support. Closed nodes represent closed itemsets in the sliding window model and can be internal or leaf nodes. Node statuses (itemsets) change as transactions are added to or deleted from the window.
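The four node types can be summarized in a small decision sketch. The predicate names below are illustrative simplifications of the CET conditions described in Chi et al. (2004), not their code.

```python
def classify_cet_node(support, min_sup, has_closed_superset_same_support,
                      has_child_same_support):
    """Map the conditions above onto the four CET node types."""
    if support < min_sup:
        return "infrequent gateway"          # infrequent, kept as boundary
    if has_closed_superset_same_support:
        return "unpromising gateway"         # subsumed by a closed itemset
    if has_child_same_support:
        return "intermediate"                # frequent, not closed, not gateway
    return "closed"                          # closed itemset in the window

# A frequent itemset subsumed by a closed superset of equal support:
label = classify_cet_node(5, 3, True, False)
```

Checking the conditions in this order mirrors the precedence implied by the definitions: a node cannot be an intermediate node if it already qualifies as an unpromising gateway.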
In the Moment algorithm (Chi et al. 2004), when a transaction is added, the node status will most likely not change, and the algorithm simply updates the supports of the itemsets at minimum cost. In specific cases, changes may happen. An infrequent gateway node may become frequent; in this case all its left siblings must be checked for creating new children, and the pruned branches under the node must be re-explored. An unpromising gateway node may become promising; in this case the pruned branches under the node must be re-explored. A closed node remains closed without a change in status, and an intermediate node can become a closed node.
In the Moment algorithm (Chi et al. 2004), when old transactions are deleted, most likely, as with adding a new transaction, the node statuses will not change and only the supports need to be updated. In specific cases, changes may happen. An infrequent gateway node remains infrequent. An unpromising gateway node may change to an infrequent node. The
frequent node can become infrequent as its support decreases; in this case its entire set of descendants is pruned, and all of its left siblings are updated by removing the children obtained from joining with the node. A promising node may become unpromising; in this case a left check of its siblings is necessary. A closed node may become non-closed.
The Moment algorithm (Chi et al. 2004) is fundamentally based on the concept of closed itemsets as a concise representation of frequent itemsets. The concept of closed frequent itemsets is not applicable to discriminative itemset mining, as the latter does not follow the Apriori property.
The estWin algorithm (Chang and Lee 2005) is proposed for mining recently frequent itemsets over an online transactional data stream. Mining over an online data stream is performed with a flexible trade-off between processing time and mining accuracy. The algorithm uses a minimum support threshold and a significant support threshold for mining the frequent itemsets. An itemset is recently frequent if its current support within the sliding window is greater than or equal to the minimum support threshold. An itemset whose current support within the sliding window is greater than the significant support threshold is a significant itemset. The significant itemsets are maintained in main memory in a lexicographic tree structure. Old transactions fall out of range by decreasing the occurrence count of each itemset that appeared in them.
In the estWin algorithm (Chang and Lee 2005), the total number of itemsets monitored in main memory is minimized by two operations: delayed insertion and pruning. An itemset's insertion is delayed until it becomes significant, and itemsets are pruned once they become insignificant. Itemsets with current support much less than the minimum support are not monitored, since they cannot become frequent in the near future; these are considered insignificant itemsets. The support of an itemset that is not monitored in the current window can be estimated from the supports of its currently monitored subsets. All the transactions in the current window are maintained in a structure named CTL (Current Transaction List). estWin has two states: window initialization and window sliding. The window is being initialized while the number of transactions generated so far in the data stream is less than the sliding window size. The window slides when the CTL becomes full; in this state, the new transaction is added and the oldest transaction is extracted from the CTL.
In the estWin algorithm (Chang and Lee 2005), the currently significant itemsets are maintained in a lexicographic tree structure, as in (Wang and Han 2004), called the monitoring tree. Every node in the monitoring tree holds an item and represents the itemset on the path from the root to this item. The algorithm maintains the tuple (pcnt, acnt, err, mtid) in each entry. The pcnt is the maximum possible count of the itemset, an estimate of the number of transactions containing the itemset before it was added to the monitoring tree. The actual count acnt is the number of transactions in the sliding window that contain the itemset after it was inserted. The maximum error count err of the itemset is the error in the maximum possible count pcnt
when the pcnt is estimated. The mtid is the TID of the transaction that caused the insertion of the itemset into the monitoring tree. For an itemset to become a new significant itemset, all of its subsets must currently be significant.
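A monitoring-tree entry with the (pcnt, acnt, err, mtid) fields described above can be sketched as follows. The helper methods are illustrative assumptions; estWin's exact estimation and pruning rules are more involved than this.

```python
from dataclasses import dataclass


@dataclass
class MonitoringEntry:
    pcnt: int   # estimated count before the itemset entered the tree
    acnt: int   # actual count observed after insertion
    err: int    # maximum error of the pcnt estimate
    mtid: int   # TID of the transaction that triggered insertion

    def count_upper_bound(self):
        # Upper bound on the itemset's count in the current window:
        # the pre-insertion estimate plus the exactly observed count.
        return self.pcnt + self.acnt

    def is_significant(self, sig_threshold, window_size):
        # Keep the entry while its bound can still reach significance.
        return self.count_upper_bound() >= sig_threshold * window_size


entry = MonitoringEntry(pcnt=3, acnt=7, err=2, mtid=41)
```

Separating the estimated part (pcnt, with its error bound err) from the exactly counted part (acnt) is what lets estWin trade accuracy for memory: only acnt is guaranteed, and err bounds how wrong the total can be.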
estWin (Chang and Lee 2005) is an approximate method, and the frequencies of itemsets are estimated from the frequencies of their subsets. This Apriori property of subsets does not apply to the problem of discriminative itemset mining.
DSTree (Leung and Khan 2006) is proposed for exact mining of frequent itemsets in a data stream over a sliding window model. A novel tree structure called DSTree (Data Stream Tree) is proposed for maintaining the transaction frequencies in different batches and mining the exact frequent itemsets. As the data stream flows, the fixed-size window defined by the user slides and the DSTree is updated. The frequencies of items are continuously affected by inserting a new batch or removing the oldest batch of transactions. In the DSTree, a list of counters is maintained at each node, and the last entry of the list holds the node's frequency count in the current batch. When the next batch of transactions arrives, this list is shifted forward, so the last entry becomes the second-last entry; at the same time, the frequency count of the oldest batch is removed. These shift and update operations are performed at every window slide.
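The shift-and-drop of the per-node counter list can be sketched as follows. The function name `slide` and the fixed window of three batches are illustrative assumptions, not the DSTree implementation.

```python
def slide(counters, new_batch_count, window_batches):
    """Append the current batch's count; once the fixed-size window of
    window_batches batches is full, drop the oldest batch's count."""
    counters = counters + [new_batch_count]
    if len(counters) > window_batches:
        counters = counters[1:]        # the oldest batch leaves the window
    return counters


counters = []
for batch_count in [4, 2, 7, 5]:       # four batches, window of three
    counters = slide(counters, batch_count, window_batches=3)
```

After the fourth batch, the count 4 of the first batch has left the window, and summing the remaining entries gives the node's exact frequency over the current window, which is what makes DSTree an exact method.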
The proposed DSTree algorithm (Leung and Khan 2006) holds a pointer in each node indicating the last update. When the window slides, the transactions in the oldest batch are deleted by shifting the list of frequency counts using the pointer in each node. The DSTree is independent of the minimum support, and every transaction in the batches is monitored. Frequent itemset mining can then be applied using the FP-Growth method (Han, Pei and Yin 2000). This structure can also be used for mining the maximal or closed itemsets in the sliding window model.
The DSTree algorithm (Leung and Khan 2006) essentially holds the item frequencies over offline sliding of batches of transactions. The FP-Growth algorithm (Han, Pei and Yin 2000) is run on the DSTree structure for mining the frequent itemsets in the offline sliding window model. FP-Growth is designed around the Apriori property and divide and conquer, which do not apply to the problem of discriminative itemset mining.
The main contribution of this chapter is an algorithm for mining discriminative itemsets in data streams using the sliding window model. The S-DISSparse method is proposed for mining exact discriminative itemsets in data streams using the offline sliding window model. Discriminative itemsets with approximate frequencies are discovered using the online sliding window model. The proposed S-DISSparse method is completely novel and utilizes the DISSparse algorithm (Seyfi et al. 2017) proposed in Chapter 3. The technical details of the proposed algorithm are presented so as to distinguish it from the DISSparse method.
5.2 RESEARCH PROBLEM
The knowledge embedded in data streams changes over time. Processing the recent transactions is important in applications looking for recent patterns in the data streams (e.g., anomaly detection and decision making). Discovering the recent patterns in a finite number of transactions, or a fixed time period, and continuously monitoring the variations in these patterns gives valuable information for data stream processing based on recent trends. Quickly monitoring the gradual changes in recent patterns in fast-growing data streams, especially online, using the sliding window model is useful for answering recent time-restricted queries. The transactions used for pattern mining should be restricted to the most recent ones in a fixed-size window frame, eliminating the effect of obsolete transactions on the information. The transactions in the window frame have to be saved so their effects can be deleted from the results when they leave the window frame as it slides (Chang and Lee 2005).
The discriminative itemsets in the sliding window model are defined as the frequent itemsets in the target data stream whose frequencies are much higher than those of the same itemsets in the other data streams in a fixed recent time period. Without loss of generality, we call the other data streams a 'general data stream' for the sake of simplicity. The discriminative itemsets are relatively frequent in the target data stream and relatively infrequent in the general data stream during the fixed-size sliding window frame. An essential issue in this research problem is to find the itemsets that can distinguish the target data stream from all other data streams in a finite number of recent transactions or a fixed recent time period.
There are many real-world scenarios that show the significance of mining discriminative itemsets in data streams using the sliding window model. Monitoring stock market fluctuations over the most recent transactions in a fixed-size sliding window frame can be useful for quickly detecting changes in recent trends. Discriminative itemsets represented in the sliding window model are useful for comparing data streams by their recent trends, identifying the specific sets of items that are of high interest in one market compared to the others in a fixed recent time period. The itemsets that occur more frequently in the recent transactions of one stock market than in the others can be used for answering recent time-restricted queries. They can show the relative changes in data stream trends in the recent time period and answer real-time queries. Web page personalization can be optimized using changes in user preferences in a fixed recent time period. The sequences of queries with higher support in one geographical area, compared to another, are time related. Changes in discriminative pattern trends during network monitoring in the last few minutes are more valuable for anomaly detection and network interference prediction than the discriminative patterns themselves.
Compared with the other window models, mining discriminative itemsets in data streams using the sliding window model poses additional challenges. The sliding window model has to be updated by adding the large number of itemset combinations from recent transactions and deleting the effects of old transactions from the fixed-size window frame. The challenges are greater in the online sliding window model: with the arrival of every single transaction in the data streams, the window frame has to be updated quickly by adding the itemsets of the recent transaction and deleting the itemsets of the oldest one. Depending on the application, the sliding window frame is defined by a fixed time period or a fixed number of transactions, and the discriminative itemsets are reported in offline and online updating states. The Apriori property defined for frequent itemsets does not hold for discriminative itemsets: a subset of a discriminative itemset can be non-discriminative. The proposed method has to be highly time and memory efficient. The greatest challenge is the generation of compact in-memory data structures that process only the recent potential discriminative itemsets in the offline and online sliding states. Moreover, unsynchronized data streams with different speeds add further challenges to the window updating process.
In this chapter, discriminative itemset mining using the sliding window model is discussed based on two data streams S_i and S_j, modelled as multiple continuous batches of transactions denoted B_1, …, B_h, B_h+1, …, B_n.
5.2.1 Problem formal definition
The formal definition of mining discriminative itemsets presented in Chapter 3 is
expanded for mining discriminative itemsets using the sliding window model as defined below:
Let ∑ be the alphabet set of items. A transaction T = {e_1, …, e_i, e_{i+1}, …, e_n}, e_i ∈ ∑, is defined as a set of items in ∑. The items in a transaction are in alphabetical order by default, for ease of describing the mining algorithm. The two data streams S_i and S_j are defined as the target and general data streams; each consists of a different number of transactions, i.e., n_i and n_j, respectively. A group of input transactions from the two data streams S_i and S_j in a pre-defined time period forms a batch of transactions B_n, n ≥ 1. Let P be a partition fitting an input batch of transactions B. The sliding window frame, denoted W, is made of a fixed number of partitions P_k, k ≥ 1 and P_k ⊆ W (e.g., in Figure 5.1 the sliding window model is made of three partitions), and refers to the fixed recent time period containing itemsets made of transactions in the two data streams S_i and S_j, with lengths n_i^w and n_j^w, respectively. All partitions cover the same width of time period, but the number of transactions per partition in the sliding window frame W varies depending on the speed of the data streams. The window frame W slides in an offline state by adding the itemsets in the recent partition P_new and deleting the itemsets in the oldest partition P_old, as in Figure 5.1.
Figure 5.1 Sliding window model 𝑊 made of three partitions 𝑃
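The partition-based sliding of Figure 5.1 can be sketched with a bounded deque. The per-partition pairs of stream lengths (n_i, n_j) and the helper `slide_window` are illustrative assumptions; real partitions would hold the transactions themselves.

```python
from collections import deque

window = deque(maxlen=3)               # W holds exactly three partitions


def slide_window(window, p_new):
    """Append P_new; a full deque silently evicts P_old, which we return."""
    evicted = window[0] if len(window) == window.maxlen else None
    window.append(p_new)
    return evicted


# Each partition is summarized here as a pair (n_i, n_j) of per-stream
# transaction counts; partitions cover equal time but unequal counts.
for partition in [(10, 40), (12, 35), (9, 50), (14, 38)]:
    p_old = slide_window(window, partition)

# Current stream lengths (n_i^w, n_j^w) inside the window frame W.
lengths = tuple(sum(col) for col in zip(*window))
```

The evicted partition `p_old` matters: its itemsets must be subtracted from the window's frequency counts, which is exactly the deletion step the offline sliding window model performs.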
An itemset I is defined as a subset of ∑. The frequency of an itemset is the number of transactions that contain the itemset. The frequency of itemset I in data stream S_i in the window frame W is denoted f_i^w(I), and the frequency ratio of itemset I in data stream S_i in the window frame W is defined as r_i^w(I) = f_i^w(I) / n_i^w.
In this chapter, if the frequency ratio of itemset I in the target data stream S_i in the sliding window frame W is larger than its frequency ratio in the general data stream S_j, i.e., r_i^w(I) / r_j^w(I) > 1, then itemset I can be considered a discriminative itemset in the sliding window frame W. Let R_ij^w(I) be the ratio between r_i^w(I) and r_j^w(I), i.e., R_ij^w(I) = r_i^w(I) / r_j^w(I). Obviously, the higher R_ij^w(I) is, the more discriminative the itemset I.
To define discriminative itemsets more precisely, we introduce a user-defined threshold θ > 1, called the discriminative level threshold, with no upper bound. An itemset I is considered discriminative in the window frame W if R_ij^w(I) ≥ θ. This is formally defined as:
R_ij^w(I) = r_i^w(I) / r_j^w(I) = (f_i^w(I) · n_j^w) / (f_j^w(I) · n_i^w) ≥ θ    (5.1)
R_ij^w(I) could be very large yet with a very low f_i^w(I). To accurately identify discriminative itemsets that have reasonable frequency in the window frame W, and also to handle the case of f_j^w(I) = 0, we introduce another user-specified support threshold, 0 < φ < 1/θ, to eliminate itemsets that have very low frequency in the window frame W. In this chapter, an itemset I is considered discriminative if its frequency in the window frame W is at least φθ·n_i^w, i.e., f_i^w(I) ≥ φθ·n_i^w, and also R_ij^w(I) ≥ θ.
Definition 5.1. Discriminative itemsets in the sliding window model: Let S_i and S_j be two data streams, with current sizes n_i^w and n_j^w in the sliding window frame W, containing varied-length transactions of items in ∑; let θ > 1 be a user-defined discriminative level threshold and φ ∈ (0, 1/θ) a support threshold. The set of discriminative itemsets in S_i against S_j in the sliding window model in window frame W, denoted DI_ij^w, is formally defined as:
DI_ij^w = {I ⊆ ∑ | f_i^w(I) ≥ φθ·n_i^w & R_ij^w(I) ≥ θ}    (5.2)
The itemsets that are not discriminative in the current state of the sliding window frame W can become discriminative as the window slides, for example when the sliding window frame is updated with transactions coming into or going out of the window frame. In order not to miss any potential discriminative itemsets in the full-size sliding window frame W, we propose to identify sub-discriminative itemsets in the sliding window model with a user-specified parameter. The sub-discriminative itemsets are discovered using a relaxation α ∈ (0,1); a smaller α yields more sub-discriminative itemsets. An itemset I is sub-discriminative if it is not discriminative but its frequency in the target data stream S_i is not less than αφθ·n_i^w and its ratio is not less than αθ. The discriminative itemsets are of primary interest; however, the sub-discriminative itemsets are also tracked during the process, as they may become discriminative as the window slides.
Definition 5.2. Sub-discriminative itemsets in the sliding window model: Let S_i and S_j be two data streams, with current sizes n_i^w and n_j^w in the sliding window frame W, containing varied-length transactions of items in ∑; let θ > 1 be a user-defined discriminative level threshold, φ ∈ (0, 1/θ) a support threshold and α ∈ (0,1) a relaxation parameter. The set of sub-discriminative itemsets in S_i against S_j in the sliding window model W, denoted SDI_ij^w, is formally defined as:
SDI_ij^w = {I ⊆ ∑ | f_i^w(I) ≥ αφθ·n_i^w & R_ij^w(I) ≥ αθ}    (5.3)
The sub-discriminative itemsets are the potential discriminative itemsets in the sliding window frame W of the data streams. The relaxation α ∈ (0,1) is defined for better approximate support and approximate ratio of discriminative itemsets in the online sliding window model, with a trade-off between computational cost and the number of false or missed discriminative itemsets. The discriminative itemsets are reported with exact support and exact ratio in the offline sliding window model, and with approximate support and approximate ratio during online sliding, between two offline slides of the window model.
The itemsets that are neither discriminative nor sub-discriminative are defined as non-discriminative itemsets:
Definition 5.3. Non-discriminative itemsets in the sliding window model: Let S_i and S_j be two data streams, with current sizes n_i^w and n_j^w in the sliding window frame W, containing varied-length transactions of items in ∑; let θ > 1 be a user-defined discriminative level threshold, φ ∈ (0, 1/θ) a support threshold and α ∈ (0,1) a relaxation parameter. The set of non-discriminative itemsets in S_i against S_j in the sliding window model, denoted NDI_ij^w, is formally defined as:
NDI_ij^w = {I ⊆ ∑ | f_i^w(I) < αφθ·n_i^w or R_ij^w(I) < αθ}    (5.4)
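Definitions 5.1-5.3 can be combined into one labelling function. This sketch treats f_j^w(I) = 0 as an infinite ratio, as motivated above, and all names are illustrative.

```python
def label_itemset(f_i, f_j, n_i, n_j, theta, phi, alpha):
    """Label an itemset by Definitions 5.1-5.3, given its frequencies
    f_i, f_j and the stream lengths n_i, n_j within the window frame W."""
    r_i, r_j = f_i / n_i, f_j / n_j
    ratio = float("inf") if r_j == 0 else r_i / r_j     # R_ij^w(I)
    if f_i >= phi * theta * n_i and ratio >= theta:
        return "discriminative"                          # Definition 5.1
    if f_i >= alpha * phi * theta * n_i and ratio >= alpha * theta:
        return "sub-discriminative"                      # Definition 5.2
    return "non-discriminative"                          # Definition 5.3


# theta = 2, phi = 0.1 (< 1/theta), alpha = 0.5, n_i = 100, n_j = 200
label = label_itemset(15, 20, 100, 200, 2, 0.1, 0.5)
```

With these settings an itemset needs f_i ≥ φθn_i = 20 and R ≥ 2 to be discriminative; the example itemset, with f_i = 15 and R = 1.5, meets only the α-relaxed bounds, so it is tracked as sub-discriminative.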
The non-discriminative itemsets are not frequent in the target data stream under the relaxation α, or have a frequency ratio in S_i compared to S_j of less than αθ, in the sliding window
frame W of the data streams. The non-discriminative itemsets are used for tail pruning in the sliding window model, as discussed in Section 5.3.
5.2.2 Discriminative itemset mining using the sliding window model
The problem of discriminative itemset mining using the sliding window model is defined for offline batch processing in data streams. The sliding window model is updated in an offline state at the specific time intervals defined by the batches of transactions. The new batch of transactions is processed, the discriminative itemsets are saved in the sliding window frame W, and the itemsets belonging to the oldest partition, now outside the sliding window frame W, are deleted. The sub-discriminative itemsets are also saved, as they may become discriminative in the future through online sliding of the window frame. After discovering the new discriminative and sub-discriminative itemsets from the current batch of transactions in W, the old itemsets in the sliding window frame W are reassessed for being discriminative or sub-discriminative based on the recent data stream lengths.
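One offline update of the itemset frequencies in W, adding the counts of the new partition and removing those of the evicted one, can be sketched as follows. The flat dictionary of (f_i, f_j) pairs is an illustrative stand-in for the S-DISStream prefix tree, and `offline_slide` is a hypothetical name.

```python
def offline_slide(window_counts, new_partition, old_partition):
    """window_counts, new_partition and old_partition each map an itemset
    to its (f_i, f_j) frequency pair; the window gains the counts of P_new
    and loses those of P_old."""
    for itemset, (fi, fj) in new_partition.items():
        wi, wj = window_counts.get(itemset, (0, 0))
        window_counts[itemset] = (wi + fi, wj + fj)
    for itemset, (fi, fj) in old_partition.items():
        wi, wj = window_counts[itemset]
        wi, wj = wi - fi, wj - fj
        if wi <= 0 and wj <= 0:
            del window_counts[itemset]     # the itemset left the window
        else:
            window_counts[itemset] = (wi, wj)
    return window_counts


w = {("a",): (5, 3), ("a", "b"): (2, 1)}
offline_slide(w, {("a",): (4, 1), ("c",): (3, 0)}, {("a", "b"): (2, 1)})
```

After the updated counts are in place, every surviving itemset would be reassessed against Definitions 5.1-5.3 using the new window lengths n_i^w and n_j^w, which is the reassessment step described above.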
The offline sliding window model is discussed in detail using running examples that show the significant challenges. Two determinative heuristics are proposed for efficient and exact mining of discriminative itemsets in the offline sliding window model. Online sliding happens between two offline slides of the window model. The online sliding window model is discussed in detail for mining discriminative itemsets in an online state with an approximate bound guarantee based on the properties of discriminative itemsets in the sliding window model. Considering the complexity of discriminative itemset mining in data streams using the sliding window model, the novel, efficient S-DISSparse method is proposed, utilizing the DISSparse method (Seyfi et al. 2017) proposed in Chapter 3 and defining new data structures adapted to offline and online updating of the sliding window model. The S-DISSparse algorithm is evaluated in Chapter 6 on input data streams with different characteristics for its time and space complexity in the offline sliding window model. The algorithm is also evaluated for different approximate support and approximate ratio bounds of discriminative itemsets in the online sliding window model. The specific characteristics and limitations of the proposed method on large and fast-growing data streams are discussed.
5.3 OFFLINE SLIDING WINDOW MODEL
The sliding window model is made of itemsets from the recent transactions within the range of the window frame size, held in a prefix tree structure (i.e., S-DISStream, as presented in Figure 5.3). The frequencies of itemsets are held for the full-size sliding window frame, and the discriminative itemsets are reported during offline and online updating of the sliding window model. The offline sliding window model is updated by the new batch of transactions arriving and the oldest batch of transactions going out at offline time intervals in the window frame. The online sliding window model is updated by adding the most recent transaction and deleting the oldest transaction, respectively, in the window frame in real time. The size of the sliding window frame (i.e., 𝑊) can be defined based on a fixed time period or a fixed number of transactions. The size of a sliding window frame based on a fixed time period can vary, as the data streams have different speeds over time. The size of the sliding window frame is defined based on the desired output range in the application domain and the limit of main memory. To facilitate describing the method, several important concepts and constructs are defined in the section below.
5.3.1 Mining discriminative itemsets in sliding window using prefix tree
Two prefix tree structures are defined for holding the transactions and the itemsets in the sliding window model, respectively.
S-FP-Tree: The prefix tree structure proposed in the FP-Growth (Han, Pei and Yin
2000) is used for holding the items of the transactions, without pruning infrequent items, by
sharing the branches for their most common frequent items (i.e., S-FP-Tree includes all items).
The S-FP-Tree is adapted by adding two counters in each node for holding the frequencies of the
itemsets in the target data stream 𝑆𝑖 and the general data stream 𝑆𝑗, respectively; for example,
there are two counters associated with each node in the S-FP-Tree in Figure 5.2. The S-FP-Tree
is updated during window sliding by adding the transactions of the new partition and deleting the
transactions of the oldest partition in the sliding window frame 𝑊. New paths in the S-FP-Tree
are added for the new transactions or the frequencies of paths are updated. The frequencies of
paths in the S-FP-Tree are decreased by deleting the transactions of the oldest partition. The
nodes in the S-FP-Tree are tagged based on their recent status: they are tagged as stable if their frequencies have not changed by adding the new partition or deleting the oldest partition; otherwise, they are tagged as updated.
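The S-FP-Tree bookkeeping described above, two counters per node plus a stable/updated tag touched during window sliding, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the thesis implementation; the class and method names are my own.

```python
class SFPNode:
    """S-FP-Tree node with two counters (target stream S_i, general
    stream S_j) and a stable/updated tag used during window sliding."""
    def __init__(self, item):
        self.item = item
        self.fi = 0           # frequency in the target stream S_i
        self.fj = 0           # frequency in the general stream S_j
        self.updated = False  # False = stable, True = updated
        self.children = {}

    def insert(self, transaction, stream, delta):
        """Add (delta=+1) or delete (delta=-1) one transaction of one
        stream along a shared prefix path; touched nodes become updated."""
        node = self
        for item in transaction:
            node = node.children.setdefault(item, SFPNode(item))
            if stream == 'i':
                node.fi += delta
            else:
                node.fj += delta
            node.updated = True  # the frequency changed during sliding

    def reset_tags(self):
        """Tag every node as stable before the next window sliding."""
        self.updated = False
        for child in self.children.values():
            child.reset_tags()

root = SFPNode(None)
root.insert(['b', 'd', 'a'], 'i', +1)  # transaction of the new partition
root.insert(['b', 'd', 'a'], 'j', +1)
root.insert(['b', 'd', 'a'], 'j', -1)  # transaction of the oldest partition leaves
print(root.children['b'].fi, root.children['b'].fj)  # → 1 0
```

The key property mirrored here is that deleting the oldest partition only decreases counters along existing paths, while both kinds of change mark the touched nodes as updated.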
S-DISStream: The S-DISStream prefix tree structure holds the discovered discriminative itemsets as well as the sub-discriminative itemsets. Both discriminative and sub-discriminative itemsets share branches in the same S-DISStream structure for their most common frequent items (e.g., as in Figure 5.3). Each path in S-DISStream may represent a subset of multiple discriminative and sub-discriminative itemsets starting from the root of the prefix tree structure. Each node in S-DISStream has two counters, 𝑓𝑖 and 𝑓𝑗, holding the frequencies of the itemset in the target data stream 𝑆𝑖 and the general data stream 𝑆𝑗, respectively, in the sliding window frame 𝑊. The itemsets in S-DISStream are made of transactions from the partitions that fit in the sliding window frame 𝑊.
The Header-Table is defined for fast traversal of the prefix tree structures, using links that connect the itemsets ending with identical items. Each Header-Table item node in S-DISStream saves its top ancestor on the first level as the root; for example, node 𝑐 is the top ancestor of all different Header-Table items, including 𝑎, 𝑒, 𝑏 and 𝑐, in the left-most subtree in Figure 5.3, and 𝑐 appears in all nodes in the left-most subtree. The nodes in the first level of S-DISStream
determine different subtrees, each made of a number of itemsets under the same root in the first level of S-DISStream and ending with different Header-Table items. Following notations similar to those of the conditional FP-Tree in the DISSparse method (Seyfi et al. 2017) proposed in Chapter 3, a subtree in
S-DISStream is denoted as 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡; for example, the S-DISStream in Figure 5.3 has three
subtrees under root 𝑐, 𝑏 and 𝑒 (i.e., 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐, 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑏 and 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑒, respectively).
While processing each header item, the set of Header-Table items that are linked under their subtree root node using Header-Table links is denoted as 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡). The nodes are tagged as discriminative, sub-discriminative or non-discriminative (i.e., a subset of discriminative or sub-discriminative itemsets). The S-DISStream is updated in the sliding window model in both offline and online states. The nodes in S-DISStream are tagged based on their recent status: they are tagged as stable if their frequencies have not changed by offline window sliding (i.e., adding the new partition or deleting the oldest partition); otherwise, they are tagged as updated. The nodes in S-DISStream are tagged as online if they are updated during online sliding. The concept of stable nodes in the S-FP-Tree and in S-DISStream refers to the itemsets whose frequencies do not change during window sliding.
The offline processing is used for controlling the generation of stable and non-potential
discriminative itemset combinations. The online processing is used for more up-to-date and
accurate online answers. The discriminative itemsets are discovered in the offline state
periodically, and the mining process continues in the online state. Two new heuristics are applied in the S-DISSparse method for efficient mining of discriminative itemsets in data streams using the sliding window model.
5.3.2 Incremental offline sliding window
The sliding window model can be simply implemented by adapting the DISTree
algorithm (Seyfi, Geva and Nayak 2014) or DISSparse algorithm (Seyfi et al. 2017), proposed in
Chapter 3, in the offline updating state. The recent batch of transactions fitting in a partition of
the sliding window frame 𝑊 is processed for mining discriminative itemsets using DISTree or
DISSparse algorithms and the results are saved in the sliding window model (i.e., S-DISStream).
The sliding window model is updated with the discriminative itemsets discovered from each new batch of transactions fitting in the new partition added to the sliding window frame 𝑊, and by deleting the itemsets belonging to the oldest partition, which falls outside the full-size sliding window frame 𝑊. However, there are several challenges with this naïve approach, as described below.
First, the sliding window frame 𝑊 is made of several partitions, and discovering the discriminative itemsets in a single partition and merging them with the itemsets in the full-size sliding window frame can result in high numbers of false positives and false negatives, which significantly downgrades the output quality. For example, an itemset can be non-discriminative in a partition with a very small frequency ratio (i.e., a much larger frequency in the general data stream than in the target data stream); pruning it during batch processing lowers precision by leaving false positives in the sliding window frame 𝑊. Conversely, a non-discriminative itemset in a partition can become discriminative after merging in the sliding window model, for example, an itemset with a high frequency ratio that is infrequent in the target data stream; pruning it during batch processing lowers recall by causing false negatives in the sliding window frame 𝑊. Second, many itemsets can be discriminative in a single partition and non-discriminative in the sliding window frame 𝑊, causing an inefficient mining process. Third, two batch processes must be done during window sliding, i.e., one for adding the discriminative itemsets of the recent batch and one for deleting the itemsets of the oldest batch in the sliding window frame 𝑊. This may not be realistic if the transactions arrive continuously at high speed.
Data stream mining algorithms basically must be designed based on a single scan (Aggarwal 2007; Han, Pei and Kamber 2011). Algorithm design based on multiple scans is often too expensive, specifically in the sliding window model with its necessity for fast updating. Smaller relaxations for sub-discriminative itemsets can relatively improve the approximation, at the cost of extra complexity. An efficient method should be designed for exact mining of discriminative itemsets by processing the recently updated transactions in the S-FP-Tree and tagging the discriminative and non-discriminative itemsets in S-DISStream based on the recent data stream lengths in the sliding window frame 𝑊.
In this chapter, two new heuristics are proposed based on the status of the S-FP-Tree nodes (i.e., stable or updated during window sliding), within efficient time and space usage, for offline updating of the sliding window model in the S-DISSparse method. The offline sliding window model is described in the sub-sections below.
5.3.2.1 Initializing the offline sliding window
The S-FP-Tree is initialized with the transactions fitting in the first partition 𝑃1, and discriminative itemsets are discovered following the normal process of the DISSparse algorithm (Seyfi et al. 2017) and saved to S-DISStream. All the nodes in the S-FP-Tree and S-DISStream are tagged as stable (i.e., not updated) before adding the recent partition 𝑃𝑛𝑒𝑤 and deleting the oldest partition 𝑃𝑜𝑙𝑑, i.e., in the full-size window frame 𝑊 as in Figure 5.1. The nodes in S-FP-Tree paths are tagged as updated if their frequencies change during window sliding, for example, by adding new transactions or deleting old transactions.
The conditional FP-Tree is built for each Header-Table item based on the item's conditional patterns in the S-FP-Tree. The Header-Table items in the conditional FP-Tree hold the same status as their tags in the S-FP-Tree. The tags in 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) in the conditional FP-Tree can show whether the itemsets in a 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 are updated or stable during window sliding. Similar approaches are applied for the updated or stable itemsets with a subset of 𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑛𝑜𝑑𝑒𝑟𝑜𝑜𝑡 in a potential 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡. The two new heuristics are proposed accordingly, based on the recently updated itemsets, for mining discriminative itemsets out of the updated potential discriminative subsets in the conditional FP-Tree. The stable itemsets are checked in S-DISStream and are tagged based on the recent data stream lengths in the sliding window frame 𝑊.
Example 5.1. The S-FP-Tree and S-DISStream constructions and updating are graphically demonstrated using the running example, with two batches of transactions fitting in the first two partitions of the sliding window model, respectively. The first batch (i.e., following Example 3.1), made of data streams 𝑆1 and 𝑆2 (𝑛1 = 𝑛2 = 15), fits in 𝑃1 and is presented in Table 5.1. The second batch, made of data streams 𝑆1 and 𝑆2 (𝑛1 = 𝑛2 = 5), fits in 𝑃2 and is presented in Table 5.3.
Table 5.1 The first input batch in data streams fits in partition 𝑃1
The Desc-Flist order made from the first batch of transactions (i.e., Table 5.1) is generated as in Table 5.2. This Desc-Flist remains the same for all upcoming batches in the data streams.
Table 5.2 Desc-Flist order of frequent items in target data stream 𝑆1 in the first batch
In this example the discriminative level threshold is set to 𝜃 = 2 and the support threshold is set to 𝜑 = 0.1. The S-FP-Tree and S-DISStream structures made from the first partition 𝑃1 are represented in Figure 5.2 and Figure 5.3, respectively. The highlighted nodes in S-DISStream in Figure 5.3 refer to the discriminative itemsets.
Figure 5.2 Header-Table and S-FP-Tree structures by the first partition 𝑃1
Figure 5.3 Header-Table and S-DISStream structures by the first partition 𝑃1
After processing the most recent partition, S-DISStream holds the discriminative itemsets in the offline sliding window model. The S-FP-Tree and S-DISStream structures are then tagged based on their stable and updated subsets for efficient mining, as explained in the section below.
5.3.2.2 Stable and updated subsets in offline sliding window
Before processing the next batch of transactions fitting in 𝑃2, all the nodes in the S-FP-Tree and S-DISStream structures are tagged as stable. Figure 5.4 shows the S-FP-Tree structures after adding the second batch of transactions fitting in 𝑃2 (i.e., as in Table 5.3). The updated nodes in the S-FP-Tree are represented by thick borders; for example, the path 𝑏𝑑𝑎3,1 appears in the S-FP-Tree after adding the new batch of transactions fitting in 𝑃2.
Table 5.3 The second input batch in data streams fits in partition 𝑃2
Figure 5.4 Header-Table and updated S-FP-Tree structures by adding second partition 𝑃2
A 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) in the conditional FP-Tree in the sliding window frame 𝑊 satisfies the two conditions Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎) ≥ 𝜑𝜃𝑛𝑖𝑤 and Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎) ≥ 𝜃, where 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎) is the set of itemsets in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 ending with a header item 𝑎 ∊ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡).
Let 𝒮 be the power set of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎), i.e., 𝒮 = 2^𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡,𝑎), so 𝒮 consists of all subsets of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎). For 𝐵 ∈ 𝒮 and 𝐵 ≠ { }, the frequency of each itemset in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡, in respect of the data stream 𝑆𝑖 in the sliding window frame 𝑊, is defined below.

𝑓𝑖𝑤(𝐵) = ∑𝑏∈𝐵 𝑓𝑖𝑤(𝑏) (5-a)
For simplicity, in the equation above 𝑓𝑖𝑤(𝑏) refers to the frequency in 𝑆𝑖 of an itemset 𝑏 ∈ 𝐵, where 𝐵 belongs to the power set of 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑎). The frequency of an itemset in a subtree is stable if all 𝑏 ∈ 𝐵 are stable during the offline sliding window model. The discriminative value of each itemset in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 in the sliding window frame 𝑊 is defined below.
Dis_value(𝐵) = 𝑟𝑖𝑤(𝐵) / 𝜃, if ∑𝑏∈𝐵 𝑓𝑗𝑤(𝑏) = 0
Dis_value(𝐵) = 𝑟𝑖𝑤(𝐵) / 𝑟𝑗𝑤(𝐵), if ∑𝑏∈𝐵 𝑓𝑗𝑤(𝑏) > 0 (5-b)

where 𝑟𝑖𝑤(𝐵) = (∑𝑏∈𝐵 𝑓𝑖𝑤(𝑏)) / 𝑛𝑖𝑤 and 𝑟𝑗𝑤(𝐵) = (∑𝑏∈𝐵 𝑓𝑗𝑤(𝑏)) / 𝑛𝑗𝑤. 𝑟𝑖𝑤(𝐵) is called the relative support of 𝐵 in 𝑆𝑖; 𝑟𝑖𝑤(𝐵) and 𝑟𝑗𝑤(𝐵) are the sums of the relative supports of the itemsets in 𝐵 in 𝑆𝑖 and 𝑆𝑗, respectively.
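Equations (5-a) and (5-b) can be sketched directly. The following is a hedged Python sketch (names are illustrative, not from the thesis); frequencies are given as dictionaries keyed by the itemsets 𝑏 ∈ 𝐵.

```python
def dis_value(B, fi, fj, ni, nj, theta):
    """Discriminative value of a combination B of itemsets (Eq. 5-b):
    the ratio of the summed relative supports in S_i and S_j, with the
    theta denominator when B never occurs in the general stream."""
    sum_fi = sum(fi[b] for b in B)  # f_i^w(B), Eq. 5-a
    sum_fj = sum(fj[b] for b in B)
    ri = sum_fi / ni                # relative support r_i^w(B)
    if sum_fj == 0:
        return ri / theta
    rj = sum_fj / nj                # relative support r_j^w(B)
    return ri / rj

# Frequencies of the two stable itemsets from the running example
# (written here as illustrative string keys):
fi = {'a(3,2)': 3, 'a(1,0)': 1}
fj = {'a(3,2)': 2, 'a(1,0)': 0}
B = ['a(3,2)', 'a(1,0)']
print(dis_value(B, fi, fj, ni=20, nj=20, theta=2))  # → 2.0
```

A combination 𝐵 is then potential in the sense of (5-c) when its summed frequency reaches 𝜑𝜃𝑛𝑖𝑤 and its `dis_value` reaches 𝜃.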
A potential 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 is stable if all potential discriminative itemsets in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 are stable during the offline sliding window model, as below.

Potential itemsets = {𝐵 ∈ 𝒮 ∣ ∑𝑏∈𝐵 𝑓𝑖𝑤(𝑏) ≥ 𝜑𝜃𝑛𝑖𝑤 and Dis_value(𝐵) ≥ 𝜃} (5-c)
In order to find the updated and stable potential discriminative itemsets, all possible itemsets in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 would have to be generated. However, the generation of all possible itemset combinations is time-consuming. In this chapter, we propose a simple method for calculating Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎) and estimating Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎) in the S-DISSparse method.
The itemset is stable if it is summed up from the frequencies of stable itemsets only. Let the 𝐵 with maximum 𝑅𝑖𝑗𝑤(𝐵) in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 be defined as 𝐵𝑚𝑎𝑥. Initially, 𝐵𝑚𝑎𝑥 is initialized by summing up the 𝑓𝑖𝑤(𝑏) frequencies of the itemsets 𝑏 with 𝑓𝑗𝑤(𝑏) = 0. The frequencies of the itemset 𝑏 with the maximum frequency ratio are then added to 𝐵𝑚𝑎𝑥 only if they increase its discriminative value, i.e., 𝑅𝑖𝑗𝑤(𝐵𝑚𝑎𝑥). 𝐵𝑚𝑎𝑥 is considered updated if it is summed up with the frequencies of any updated itemset 𝑏. 𝐵𝑚𝑎𝑥 is also considered updated if the discriminative value of 𝐵𝑚𝑎𝑥 summed up with any updated 𝑏 is larger than the discriminative level 𝜃 (i.e., the overall frequencies are only tested, not summed up); for example, in Figure 5.5 the maximum discriminative value of the itemsets in the left-most subtree, 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐, is equal to 2, i.e.,
Max_dis_value(𝑐, 𝑎) = 2, which is calculated by the sum of frequencies of two stable itemsets
ending with the items in 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐), i.e., 𝑎1,0 and 𝑎3,2.
Following Definition 5.1, if 𝑓𝑗𝑤(𝑏𝑚𝑎𝑥) = 0 then 𝑅𝑖𝑗𝑤(𝑏𝑚𝑎𝑥) = 𝑓𝑖𝑤(𝑏𝑚𝑎𝑥) / (𝜃𝑛𝑖). The Max_freq𝑖(𝑐, 𝑎) = 4, which is calculated as the sum of the frequencies of the same stable itemsets, i.e., 𝐼(𝑎1,0) and 𝐼(𝑎3,2), and 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐 is defined as a stable subtree. Algorithm 5.1 is proposed for calculating Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎) and Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎) and for finding the stable 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 in the S-DISSparse method for mining potential discriminative itemsets. The statement 𝑓𝑤(𝑏𝑚𝑎𝑥) += 𝑓𝑤(𝑏) in the algorithm is, for the sake of simplicity, shorthand for the two statements 𝑓𝑖𝑤(𝑏𝑚𝑎𝑥) += 𝑓𝑖𝑤(𝑏) and 𝑓𝑗𝑤(𝑏𝑚𝑎𝑥) += 𝑓𝑗𝑤(𝑏).
Algorithm 5.1 Stable, updated Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎), Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎)
Input: (1) 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡; (2) header item 𝑎 ∊ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡).
Output: (1) stable or updated Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎); (2) stable or updated Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎).
Begin
1) Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎) = 0; 𝐹𝑖𝑤 = 0; 𝐹𝑗𝑤 = 0;
2) For each item 𝑏 ∊ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) do
3) Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎) += 𝑓𝑖𝑤(𝐼(𝑏));
4) If 𝐼(𝑏) is updated then Tag Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎) as updated; End if;
5) If 𝑓𝑗𝑤(𝐼(𝑏)) = 0 then 𝐹𝑖𝑤 += 𝑓𝑖𝑤(𝐼(𝑏)); Tag 𝑏 as checked;
6) If 𝐼(𝑏) is updated then Tag 𝐹𝑖𝑤 as updated; End if;
7) End if;
8) End for;
9) While ∃𝑏 ∊ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) and 𝑏 is unchecked do
10) Find 𝐼(𝑏) with maximum 𝑅𝑖𝑗𝑤(𝐼(𝑏));
11) If (𝐹𝑖𝑤 + 𝑓𝑖𝑤(𝐼(𝑏))) / (𝐹𝑗𝑤 + 𝑓𝑗𝑤(𝐼(𝑏))) > 𝐹𝑖𝑤 / 𝐹𝑗𝑤 Or (𝐹𝑖𝑤 + 𝑓𝑖𝑤(𝐼(𝑏))) / (𝐹𝑗𝑤 + 𝑓𝑗𝑤(𝐼(𝑏))) > 𝜃 then 𝐹𝑖𝑤 += 𝑓𝑖𝑤(𝐼(𝑏)); 𝐹𝑗𝑤 += 𝑓𝑗𝑤(𝐼(𝑏));
12) End if;
13) Tag 𝑏 as checked;
14) If 𝐼(𝑏) is updated then Tag 𝐹𝑖𝑤 as updated; End if;
15) End While;
16) Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎) = (𝐹𝑖𝑤 ∗ 𝑛𝑗𝑤) / (𝐹𝑗𝑤 ∗ 𝑛𝑖𝑤);
17) If 𝐹𝑖𝑤 is updated then Tag Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎) as updated; End if;
18) Return stable or updated Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎), Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎);
End.
In the above algorithm, the items in 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) are scanned in two separate loops. In the first loop, Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎) is calculated based on all itemsets in the 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡. If any 𝐼(𝑏) is updated then Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎) is tagged as updated. 𝐹𝑖𝑤 is also initialized as the sum of 𝑓𝑖𝑤(𝐼(𝑏)) over the itemsets with 𝑓𝑗𝑤(𝐼(𝑏)) = 0, and is tagged as updated if any such 𝐼(𝑏) is updated. 𝐹𝑖𝑤 and 𝐹𝑗𝑤 are used for calculating Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎). In the second loop, using a greedy method, 𝐹𝑖𝑤 and 𝐹𝑗𝑤 are updated by adding the frequencies of the itemset 𝐼(𝑏) with maximum 𝑅𝑖𝑗𝑤(𝐼(𝑏)) if those frequencies increase the ratio between 𝐹𝑖𝑤 and 𝐹𝑗𝑤, or if the ratio between 𝐹𝑖𝑤 and 𝐹𝑗𝑤 becomes greater than the discriminative level 𝜃. If any 𝐼(𝑏) is updated then 𝐹𝑖𝑤 is tagged as updated, causing Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎) to be tagged as updated as well. In the second loop each item is checked one time, and Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎) is calculated based on the selected items.
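The two loops can be sketched as follows. This is an illustrative Python sketch, not the thesis implementation: each header item 𝑏 is assumed to carry its itemset frequencies and an updated flag as a `(fi, fj, updated)` triple, and the zero-denominator ratio follows the 𝑓𝑗 = 0 convention of Definition 5.1 as cited above; both choices are assumptions of this sketch.

```python
def max_freq_and_dis_value(items, ni, nj, theta):
    """Greedy computation of Max_freq_i(root, a) and Max_dis_value(root, a)
    in the spirit of Algorithm 5.1. `items` holds one (fi, fj, updated)
    triple per itemset I(b) in the subtree."""
    max_freq = sum(fi for fi, _, _ in items)
    freq_updated = any(upd for _, _, upd in items)
    # First loop: seed F_i^w with the itemsets that never occur in S_j.
    Fi = sum(fi for fi, fj, _ in items if fj == 0)
    Fj = 0
    dis_updated = any(upd for _, fj, upd in items if fj == 0)

    def ratio(fi_, fj_):
        # f_j = 0 convention (Definition 5.1): R = f_i / (theta * n_i)
        return fi_ / (theta * ni) if fj_ == 0 else (fi_ / ni) / (fj_ / nj)

    # Second loop: take the unchecked I(b) in decreasing ratio order and
    # add its frequencies only if that raises the F_i/F_j ratio or pushes
    # it above the discriminative level theta.
    for fi, fj, upd in sorted((it for it in items if it[1] > 0),
                              key=lambda it: ratio(it[0], it[1]),
                              reverse=True):
        if ratio(Fi + fi, Fj + fj) > ratio(Fi, Fj) or ratio(Fi + fi, Fj + fj) > theta:
            Fi += fi
            Fj += fj
        if upd:
            dis_updated = True
    max_dis = ratio(Fi, Fj)  # equals (F_i * n_j) / (F_j * n_i) when F_j > 0
    return max_freq, freq_updated, max_dis, dis_updated

# Stable itemsets I(a1,0) and I(a3,2) from the running example:
items = [(1, 0, False), (3, 2, False)]
print(max_freq_and_dis_value(items, ni=20, nj=20, theta=2))
# → (4, False, 2.0, False): Max_freq = 4, Max_dis_value = 2, both stable
```

On the running example this reproduces Max_freq𝑖(𝑐, 𝑎) = 4 and Max_dis_value(𝑐, 𝑎) = 2 from the two stable itemsets, with both quantities tagged stable.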
All discriminative itemsets in a stable 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 are stable in the sliding window frame 𝑊 compared to the current state in S-DISStream. A potential 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 can have been non-potential before offline sliding, with different data stream lengths in the sliding window frame 𝑊, and new discriminative itemsets may be discovered in the sliding window frame 𝑊, for example, by decreasing the length of the target data stream 𝑆𝑖 or increasing the data stream length ratio 𝑛𝑗𝑤/𝑛𝑖𝑤. A 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) that satisfies the conditions Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎) ≥ 𝜑𝜃𝑛𝑖𝑤 and Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎) ≥ 𝜃 with a smaller frequency in the target data stream 𝑆𝑖, or a smaller frequency ratio in the target data stream 𝑆𝑖 vs the general data stream 𝑆𝑗, compared to the last size of the sliding window frame 𝑊 (i.e., the last offline window sliding), is not stable. For the sake of clarity, this part is not represented in Algorithm 5.1, considering that all partitions are of the same size and contain an equal number of transactions. The algorithm has to be modified by holding the lengths of the data streams at the last offline sliding, and comparing the Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎) and Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎) calculated with the recent and the old data stream lengths.
A potential 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 in the sliding window model is processed in a different way.
The first heuristic is formally defined below.
HEURISTIC 5.1. A potential 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 is stable denoted as 𝑆𝑡𝑎𝑏𝑙𝑒(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) if all
potential discriminative itemsets in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 that satisfy the following conditions are stable:
1. Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎) ≥ 𝜑𝜃𝑛𝑖𝑤
2. Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎) ≥ 𝜃
Where 𝜃 > 1 is the discriminative level threshold, 𝜑 𝜖 (0, 1 𝜃⁄ ) is the support threshold, 𝑛𝑖𝑤 is
the size of target data stream 𝑆𝑖 in the sliding window frame 𝑊 and Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑎) and
Max_dis_value(𝑟𝑜𝑜𝑡, 𝑎) are stable if any itemset in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 that satisfies the two
conditions is stable.
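The way HEURISTIC 5.1 is used can be sketched as a gate in front of combination generation. This is an illustrative Python sketch of the decision only (the function name and the three outcome labels are my own, not the thesis's); it assumes the stable/updated flags produced by Algorithm 5.1.

```python
def process_subtree(max_freq_i, freq_updated, max_dis_value, dis_updated,
                    ni, theta, phi):
    """Decide how to treat a subtree during offline sliding.
    A subtree that is not potential is skipped entirely; a potential but
    stable subtree (HEURISTIC 5.1) only has its existing itemsets
    re-tagged in S-DISStream; otherwise combinations are regenerated."""
    potential = max_freq_i >= phi * theta * ni and max_dis_value >= theta
    if not potential:
        return 'skip'
    if not (freq_updated or dis_updated):
        return 'retag-stable'        # re-tag by the recent stream lengths only
    return 'generate-combinations'   # may contain new discriminative itemsets

# Subtree_c from the running example: potential but stable.
print(process_subtree(4, False, 2.0, False, ni=20, theta=2, phi=0.1))  # → retag-stable
```

The efficiency gain comes from the middle branch: a stable potential subtree avoids the exponential combination generation and only touches the already-stored itemsets in S-DISStream.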
Lemma 5-1 (Stable subtree) HEURISTIC 5.1 ensures that any discriminative itemset in
a 𝑆𝑡𝑎𝑏𝑙𝑒(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) is stable in the sliding window model.
Proof. The two conditions in HEURISTIC 5.1 ensure that any itemset that is frequent in
the target data stream 𝑆𝑖 and has discriminative value larger than the discriminative level 𝜃 is
stable in the sliding window model. This implies that all discriminative itemsets in a
𝑆𝑡𝑎𝑏𝑙𝑒(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) exist in the sliding window model by processing the
𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) during previous offline sliding in the window model. The itemset
combinations in a 𝑆𝑡𝑎𝑏𝑙𝑒(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) are tagged as discriminative or non-discriminative in
the 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 of S-DISStream, based on the recent data stream lengths in sliding window
frame 𝑊. This implies that the discriminative itemsets are discovered based on the recent data
stream lengths that have been changed by adding the new partition and deleting the oldest
partition in the sliding window frame 𝑊.
For a subtree, if any of the two conditions is updated, the subtree is considered a potential discriminative subtree and the potential discriminative itemset combinations are generated from the subtree, as it may contain new discriminative itemsets.
∎
In Figure 5.5, the left-most subtree, related to processing Header-Table item 𝑎 under root node 𝑐 (i.e., 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐), is stable, as the only subset of itemsets in the subtree that satisfies the conditions in HEURISTIC 5.1, i.e., 𝐼(𝑎3,2) and 𝐼(𝑎1,0) with 𝑎3,2, 𝑎1,0 ∈ 𝐵 and 𝐵 ∈ 𝒮, is stable, as in the conditions below.

∑𝑏∈𝐵 𝑓𝑖𝑤(𝑏) = 3 + 1 = 4 ≥ (𝜑𝜃𝑛𝑖𝑤 = 0.1 ∗ 2 ∗ 20 = 4) (5-d)

Dis_value(𝐵) = (∑𝑏∈𝐵 𝑓𝑖𝑤(𝑏) / ∑𝑏∈𝐵 𝑓𝑗𝑤(𝑏)) ∗ (𝑛𝑗𝑤 / 𝑛𝑖𝑤) = ((3 + 1) / (2 + 0)) ∗ (20 / 20) = 4/2 = 2 ≥ (𝜃 = 2) (5-e)
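The arithmetic of conditions (5-d) and (5-e) can be checked mechanically; the following is a trivial Python sketch of the running example's numbers (the variable names are illustrative).

```python
# Stable itemsets I(a3,2) and I(a1,0): (f_i, f_j) pairs in window W
freqs = [(3, 2), (1, 0)]
ni = nj = 20
theta, phi = 2, 0.1

sum_fi = sum(fi for fi, _ in freqs)  # 3 + 1 = 4
sum_fj = sum(fj for _, fj in freqs)  # 2 + 0 = 2
print(sum_fi >= phi * theta * ni)              # condition (5-d) → True
print((sum_fi / sum_fj) * (nj / ni) >= theta)  # condition (5-e) → True
```

Both conditions hold with equality here (4 ≥ 4 and 2 ≥ 2), which is why 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐 qualifies as potential yet stable.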
In this chapter, for the sake of simplicity, the dataset lengths are omitted from the ratios, as 𝑛1 = 𝑛2. In the case of data streams with different lengths (i.e., 𝑛2/𝑛1 ≠ 1), the ratios must be multiplied by the constant 𝑛2/𝑛1. The conditional FP-Tree of Header-Table item 𝑎, made out of the S-FP-Tree updated with partition 𝑃2, is presented in Figure 5.5.
Figure 5.5 Conditional FP-Tree of Header-Table item 𝑎 updated by partition 𝑃2
The 𝑆𝑡𝑎𝑏𝑙𝑒(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) is excluded from itemset combination generation. The itemset combinations of a 𝑆𝑡𝑎𝑏𝑙𝑒(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) in the 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 of S-DISStream are traversed using Header-Table links and tagged as discriminative or non-discriminative based on the recent data stream lengths in the sliding window frame 𝑊; for example, in 𝑆𝑡𝑎𝑏𝑙𝑒(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐) the itemset 𝐼(𝑎4,2), i.e., 𝑐𝑏𝑎4,2, in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐 of S-DISStream ending with 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐), is tagged as discriminative as in Figure 5.6.
Figure 5.6 Updated S-DISStream after processing 𝑆𝑡𝑎𝑏𝑙𝑒(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐) in conditional FP-Tree
for Header-Table item 𝑎
A 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑖𝑛) in a 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) (i.e., 𝑖𝑛 ∊ 𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑛𝑜𝑑𝑒𝑟𝑜𝑜𝑡) in the conditional FP-Tree in the sliding window frame 𝑊 satisfies the two conditions Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) ≥ 𝜑𝜃𝑛𝑖𝑤 and Max_dis_value(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) ≥ 𝜃, where 𝑖𝑡𝑒𝑚𝑠𝑒𝑡𝑠(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) is the set of itemsets in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 ending with a header item 𝑎 ∊ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) with the internal node 𝑖𝑛 as a subset.
Let 𝒮 be the power set of 𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑛𝑜𝑑𝑒𝑟𝑜𝑜𝑡, i.e., 𝒮 = 2^𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑛𝑜𝑑𝑒𝑟𝑜𝑜𝑡, and let an itemset 𝐼 with a subset 𝑖𝑛 ∊ 𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑛𝑜𝑑𝑒𝑟𝑜𝑜𝑡, ending with 𝑎 ∈ 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡), be denoted as 𝐼(𝑖𝑛). The frequency of each itemset in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 with a subset of internal node 𝑖𝑛, in respect of the data stream 𝑆𝑖 in the sliding window frame 𝑊, is defined below (i.e., 𝐵 ∈ 𝒮).

𝑓𝑖𝑤(𝐵) = ∑𝑏∈𝐵 𝑓𝑖𝑤(𝑏) (5-f)
The frequency of an itemset in a subtree is stable if all 𝑏 ∈ 𝐵 are stable during the offline sliding window model. A potential internal node 𝑖𝑛 ∊ 𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑛𝑜𝑑𝑒𝑟𝑜𝑜𝑡 is stable if all potential discriminative itemsets in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 with internal node 𝑖𝑛 as a subset are stable during the offline sliding window model.
All discriminative itemsets in a 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 with a subset of a stable 𝑖𝑛 are stable in the sliding window frame 𝑊 compared to the current state in S-DISStream. A potential internal node 𝑖𝑛 ∊ 𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑛𝑜𝑑𝑒𝑟𝑜𝑜𝑡 can have been non-potential before offline sliding, with different data stream lengths in the sliding window frame 𝑊, and new discriminative itemsets may be discovered in the sliding window frame 𝑊, for example, by decreasing the length of the target data stream 𝑆𝑖 or increasing the data streams' length ratio 𝑛𝑗𝑤/𝑛𝑖𝑤. A 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑖𝑛) that satisfies the conditions Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) ≥ 𝜑𝜃𝑛𝑖𝑤 and Max_dis_value(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) ≥ 𝜃 with a smaller frequency in the target data stream 𝑆𝑖, or a smaller frequency ratio in the target data stream 𝑆𝑖 vs the general data stream 𝑆𝑗, compared to the last size of the sliding window frame 𝑊 (i.e., the last offline window sliding), is not stable.
The potential internal node 𝑖𝑛 ∊ 𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑛𝑜𝑑𝑒𝑟𝑜𝑜𝑡 in the sliding window model is
processed in a different way.
The second heuristic is formally defined below.
HEURISTIC 5.2. An internal node 𝑖𝑛 ∊ 𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑛𝑜𝑑𝑒𝑟𝑜𝑜𝑡 is stable denoted as 𝑆𝑡𝑎𝑏𝑙𝑒(𝑖𝑛) if
all potential discriminative itemsets in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 with internal node 𝑖𝑛 as subset that satisfy
the following conditions are stable.
1. Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) ≥ 𝜑𝜃𝑛𝑖𝑤
2. Max_dis_value(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) ≥ 𝜃
Where 𝜃 > 1 is the discriminative level threshold, 𝜑 𝜖 (0, 1 𝜃⁄ ) is support threshold, 𝑛𝑖𝑤 is the
size of target data stream 𝑆𝑖 in the sliding window frame 𝑊 and Max_freq𝑖(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) and
Max_dis_value(𝑟𝑜𝑜𝑡, 𝑖𝑛, 𝑎) are stable if any itemset in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 with internal node 𝑖𝑛 as
subset that satisfies the two conditions is stable.
Lemma 5-2 (Stable internal node) HEURISTIC 5.2 ensures that any discriminative
itemset in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡with subset of a 𝑆𝑡𝑎𝑏𝑙𝑒(𝑖𝑛) is stable in the sliding window model.
Proof. The two conditions in HEURISTIC 5.2 ensure that any itemset with subset of
internal node 𝑖𝑛 ∊ 𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑛𝑜𝑑𝑒𝑟𝑜𝑜𝑡 that is frequent in the target data stream 𝑆𝑖 and has
discriminative value larger than the discriminative level 𝜃 is stable in the sliding window model.
This implies that all discriminative itemsets with the subset of 𝑆𝑡𝑎𝑏𝑙𝑒(𝑖𝑛) exist in the sliding
window model by processing the 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑖𝑛) in a potential 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 during a previous
offline sliding of the window model. The itemset combinations with a 𝑆𝑡𝑎𝑏𝑙𝑒(𝑖𝑛) are tagged as
discriminative or non-discriminative in the 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 of S-DISStream, based on the recent
data stream lengths in sliding window frame 𝑊. This implies that the discriminative itemsets are
discovered based on the recent data stream lengths that have been changed by adding the new
partition and deleting the oldest partition in the sliding window frame 𝑊.
For an 𝑖𝑛 ∊ 𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑛𝑜𝑑𝑒𝑟𝑜𝑜𝑡, if any of the two conditions is updated, the internal node is considered a potential discriminative internal node and the potential discriminative itemset combinations with the internal node as a subset are generated from the subtree, as it may contain new discriminative itemsets.
∎
Every itemset with a 𝑆𝑡𝑎𝑏𝑙𝑒(𝑖𝑛) as a subset is stable in the sliding window model. For example, in Figure 5.5, in the left-most subtree related to processing Header-Table item 𝑎 under root node 𝑐 (i.e., 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐), the internal node 𝑏 is stable, as the only subset of itemsets in the subtree that satisfies the conditions in HEURISTIC 5.2, i.e., made of 𝐼(𝑏3,2) and 𝐼(𝑏1,0) with 𝑏3,2, 𝑏1,0 ∈ 𝐵 and 𝐵 ∈ 𝒮, is stable. The 𝑆𝑡𝑎𝑏𝑙𝑒(𝑖𝑛) is excluded from itemset combination generation. The itemset
combinations with a subset of 𝑆𝑡𝑎𝑏𝑙𝑒(𝑖𝑛) in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 of S-DISStream are traversed using
Header-Table links and tagged as discriminative or non-discriminative based on the recent data
stream lengths in the sliding window frame 𝑊; for example, in Figure 5.6 the itemset 𝑐𝑏𝑎4,2 in
𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐 of S-DISStream ending with 𝐻𝑒𝑎𝑑𝑒𝑟_𝑇𝑎𝑏𝑙𝑒_𝑖𝑡𝑒𝑚𝑠(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐) with subset of
internal node 𝑏, is tagged as discriminative.
Following the running Example 5.1, the conditional FP-Tree of Header-Table item 𝑎 is expanded by the sub-branches of 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑐 (i.e., 𝑏𝑑𝑎3,2, 𝑏𝑎1,0 and 𝑎1,5), as in Figure 5.7. The 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑏) and its 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑖𝑛) are not stable, and S-DISStream is updated with the new itemset combinations generated out of the potential 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑏, as in Figure 5.8. The discriminative itemsets (i.e., 𝑏𝑑𝑎4,2 and 𝑏𝑎8,3) are discovered in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑏. Potential discriminative itemsets from processing the old partitions may already exist in S-DISStream and be overwritten; for example, in Figure 5.8 the itemset 𝑏𝑎5,2, which exists in S-DISStream from processing partition 𝑃1 as in Figure 5.3, is overwritten with the new frequencies as 𝑏𝑎8,3.
Figure 5.7 Expanded conditional FP-Tree of Header-Table item 𝑎 updated by partition 𝑃2
after processing the first subtree
Figure 5.8 Updated S-DISStream after processing potential discriminative subsets of the left-
most subtree in conditional FP-Tree for Header-Table item 𝑎
The conditional FP-Tree of Header-Table item 𝑎 is then expanded by sub-branches of
𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑏 (i.e., 𝑑𝑎6,3 and 𝑎2,0) as in Figure 5.9. The 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑑) is not stable, and S-
DISStream is updated by the discriminative itemset 𝑑𝑎6,3 as in Figure 5.10.
Figure 5.9 Expanded conditional FP-Tree of Header-Table item a updated by partition 𝑃2
after processing the second subtree
Figure 5.10 Updated S-DISStream after processing potential discriminative subsets of the
left-most subtree in conditional FP-Tree for Header-Table item 𝑎
Following the bottom-up order of Desc-Flist, the conditional FP-Tree is then generated
for the rest of the Header-Table items respectively (i.e., item 𝑒 in Example 5.1 as in Table 5.2). The
new discriminative itemsets in each potential 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 are inserted into S-DISStream. The tags
of itemsets in S-DISStream that belong to the 𝑆𝑡𝑎𝑏𝑙𝑒(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) and 𝑆𝑡𝑎𝑏𝑙𝑒(𝑖𝑛) are updated
based on the recent data stream lengths in the sliding window frame 𝑊. The frequencies of
itemsets in S-DISStream that are not updated (i.e., those belonging to a non-potential 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 or
with a subset of a non-potential 𝑖𝑛 ∊ 𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑛𝑜𝑑𝑒𝑟𝑜𝑜𝑡) must be adjusted based on their
appearances in the S-FP-Tree, followed by updating the tags of itemsets, as explained in the section
below.
5.3.2.3 S-DISStream tuning and pruning in offline sliding window
After processing the last conditional FP-Tree (e.g., the conditional FP-Tree of item 𝑐 in
Example 5.1), the offline sliding continues by checking the itemsets in S-DISStream that have not
been updated. The itemsets that belong to a non-potential 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡, or with a subset of a non-
potential 𝑖𝑛 ∊ 𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑛𝑜𝑑𝑒𝑟𝑜𝑜𝑡, are not updated during batch processing. The itemsets in S-
DISStream are checked by traversing the Header-Table links; the frequencies of the
itemsets that have not been updated are tuned based on their appearances in the S-FP-Tree, and the
itemsets are tagged as discriminative or non-discriminative based on the recent data stream lengths.
In contrast to the Apriori property, and distinguishing discriminative itemset mining from
frequent itemset mining, non-discriminative itemsets can appear as subsets of discriminative
itemsets. For example, the items 𝑐 and 𝑑 in Example 5.1 are subsets of discriminative itemsets as
in Figure 5.11, but they are not discriminative. The frequencies of non-discriminative itemsets
appearing as subsets of discriminative itemsets must also be set accordingly using the S-FP-
Tree. These itemsets may become involved in the online sliding window, as explained in
Section 5.4. Tuning the frequencies of the not-updated itemsets and non-discriminative subsets is
not a time-consuming process, considering that the discriminative itemsets are sparse, with a small
number of non-discriminative subsets.
Lemma 5-3 (Exact non-discriminative subsets) Tuning the frequencies of the non-
discriminative itemsets appearing as subsets of discriminative itemsets using the S-FP-Tree ensures
the exact frequencies of these itemsets in S-DISStream, which may become involved in the online
window model updating.
Proof. The S-FP-Tree is the superset of the conditional FP-Trees and has a full view of all
itemsets in the datasets in the sliding window frame 𝑊. The exact frequencies of the non-
discriminative subsets are collected accurately from their appearances in the S-FP-Tree by
traversing the Header-Table links.
∎
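The frequency collection in the proof can be illustrated with a standard FP-tree node-link traversal. This is a minimal sketch under assumed structures (an `FPNode` class and a `header_table` dict of node-link chain heads); the thesis' S-FP-Tree keeps a pair of frequencies per node, which is simplified to a single count here.

```python
# Sketch: collect the exact frequency of an itemset by following the
# Header-Table node-link chain of its last item and checking each branch.
class FPNode:
    def __init__(self, item, freq, parent=None):
        self.item, self.freq, self.parent = item, freq, parent
        self.node_link = None   # next node carrying the same item

def exact_frequency(header_table, itemset):
    """Sum the counts of every branch that contains all items of `itemset`."""
    total = 0
    node = header_table.get(itemset[-1])
    while node is not None:
        branch = set()
        p = node.parent
        while p is not None:             # climb towards the root
            branch.add(p.item)
            p = p.parent
        if set(itemset[:-1]) <= branch:  # all prefix items lie on this branch
            total += node.freq
        node = node.node_link
    return total
```

In the S-FP-Tree the same traversal is run per stream, so both the target and general frequencies of each non-discriminative subset are recovered exactly.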
Tail pruning is applied in S-DISStream for space saving. An itemset in S-DISStream
in the sliding window model is pruned if it is non-discriminative and remains a leaf node. Tail
pruning ensures that S-DISStream maintains only the discriminative itemsets and the non-
discriminative subsets in the sliding window frame 𝑊. The final S-DISStream after offline
sliding by partition 𝑃2 as in Table 5.3 and tail pruning is presented in Figure 5.11, with the eight
discriminative itemsets as listed in the table. Tail pruning is also applied in the S-FP-Tree structure,
following the same process, for deleting the old transactions that are outside the sliding window frame
𝑊 and have zero frequencies in the data streams.
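The tail pruning rule above, removing a node once it is non-discriminative and a leaf, can be sketched as a bottom-up traversal. The `Node` class and its `discriminative` tag are illustrative assumptions standing in for the S-DISStream nodes:

```python
# Illustrative tail pruning: prune bottom-up so that a chain of
# non-discriminative nodes collapses once its discriminative leaves are gone.
class Node:
    def __init__(self, discriminative, children=None):
        self.discriminative = discriminative
        self.children = children or {}   # item -> child Node

def tail_prune(node):
    """Prune the subtree of `node`; return True if `node` itself is prunable."""
    node.children = {item: child for item, child in node.children.items()
                     if not tail_prune(child)}
    # Prunable once it is a leaf and tagged non-discriminative.
    return not node.children and not node.discriminative
```

Because children are pruned before the node itself is tested, an internal non-discriminative node is removed as soon as all of its discriminative descendants are gone, matching the leaf-only pruning rule.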
Figure 5.11 Final S-DISStream after offline sliding based on partition 𝑃2
S-DISStream is proposed for dynamically monitoring the discovered discriminative
itemsets in an offline state. S-DISStream is updated by every new incoming batch of
transactions that fits in partition 𝑃𝑛𝑒𝑤 in the sliding window frame 𝑊, and by deleting the itemsets
that belong to the oldest partition 𝑃𝑜𝑙𝑑 falling out of the full-size sliding window frame 𝑊. The S-
DISStream structure is constructed and updated in the offline sliding window model by adapting
the efficient mining of discriminative itemsets. The online sliding window model is defined for
approximate online monitoring of the discriminative itemsets in the sliding window frame 𝑊
between two offline slides, as in the section below.
5.4 ONLINE SLIDING WINDOW MODEL
In offline sliding, the discriminative itemsets are reported only after the offline
processing of the new partition is finished, which may take time, depending on the size of the datasets
and their characteristics, and may not be acceptable for some real-time applications. Online sliding
is defined to discover the discriminative itemsets and report the change in trends in an online state
when any new single transaction arrives. To facilitate describing the method, several important
concepts and constructs are defined in the section below.
5.4.1 Mining discriminative itemsets in online sliding window using queue structure
The queue structure is defined for holding the transactions in the online sliding window
model, made of partitions with different sizes.
Transaction-List: This is a queue structure for keeping track of the transactions in the
online sliding window model, as in Figure 5.12. For each transaction that fits in the online sliding
window frame 𝑊, it holds the partition number, whether the transaction belongs to the target data
stream 𝑆𝑖 or the general data stream 𝑆𝑗, and a link to the transaction's node in the S-FP-Tree. The
Transaction-List contains only the recent transactions that fit in the defined online sliding window
frame 𝑊. The rest of the transactions are deleted when the window frame 𝑊 slides, as in Figure 5.12.
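A Transaction-List along these lines can be sketched with a bounded queue. The entry fields (partition number, stream label, S-FP-Tree node reference) follow the description above; the class itself and its method names are illustrative assumptions:

```python
from collections import deque

# Minimal Transaction-List sketch: a queue of the transactions inside the
# online sliding window frame W. Each entry records the partition number,
# which stream (S_i or S_j) the transaction belongs to, and a reference to
# its node in the S-FP-Tree.
class TransactionList:
    def __init__(self, window_size):
        self.window_size = window_size   # capacity of the window frame W
        self.queue = deque()

    def add(self, partition_id, stream_label, node_ref):
        """Append the newest transaction; once W is full, return the evicted
        oldest transaction (so its online itemsets can be decremented)."""
        evicted = None
        if len(self.queue) == self.window_size:
            evicted = self.queue.popleft()
        self.queue.append((partition_id, stream_label, node_ref))
        return evicted
```

Returning the evicted entry lets the caller perform the decrement side of the online slide described in the next section.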
Figure 5.12 Transaction-List made of partitions fit in the online sliding window frame 𝑊
The online sliding window model operates between two offline slides of the window
model and is limited to the itemsets in S-DISStream (i.e., the online sliding window frame 𝑊). The
itemsets in S-DISStream are the potential itemsets to change their tags during the online
sliding window model. The frequencies and tags of the itemsets existing in S-DISStream are
updated during online sliding, and no new itemset is generated. The window frame 𝑊 slides in
the online state between offline slides, while a new batch is loaded with transactions. Every
transaction updates the S-FP-Tree and is linked by the Transaction-List for online sliding. During
online sliding, every new transaction in the recent partition (i.e., 𝑃𝑛𝑒𝑤 as in Figure 5.1) in window
frame 𝑊 is checked for having a subset in S-DISStream by traversing the Header-Table
links. Subsets of the new transaction that exist in S-DISStream, called online itemsets, are used for online
sliding by increasing the itemset frequencies and updating the tags in S-DISStream; for
example, a discriminative itemset may become non-discriminative.
With each new incoming transaction, the oldest transaction in the Transaction-List is
deleted, taking its online itemsets out of the sliding window frame 𝑊, if it belongs to the oldest
partition (i.e., 𝑃𝑜𝑙𝑑 as in Figure 5.1). The online itemsets of the old transaction (i.e., its subsets that exist in S-
DISStream) are used for online sliding by decreasing the itemset frequencies and updating the
tags in S-DISStream. The online sliding continues for every new transaction until the end of
the new partition. The itemsets in S-DISStream that are updated during online sliding are
tagged as online. The online itemsets in S-DISStream hold the exact frequencies in the sliding
window frame 𝑊; however, they must be re-tagged after offline sliding, based on the recent data
stream lengths, during S-DISStream tuning and pruning as in Section 5.3.2.3. The S-DISStream
structure is updated in the online sliding window model by proposing one new corollary, as in the
section below.
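One online slide can be sketched as follows. For illustration only, S-DISStream is modelled as a flat dictionary mapping itemsets to [target, general] frequency pairs, and subsets are enumerated by brute force; the thesis instead traverses Header-Table links over the prefix-tree structure, so no exhaustive subset enumeration is needed there.

```python
from itertools import combinations

# Sketch of online sliding: only subsets of the transaction that already
# exist in S-DISStream (the "online itemsets") are updated; no new itemset
# is ever created during online sliding.
def online_slide(s_disstream, transaction, stream_idx, delta):
    """delta = +1 for a transaction entering P_new, -1 for one leaving P_old;
    stream_idx is 0 for the target stream S_i, 1 for the general stream S_j."""
    items = sorted(transaction)
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            if subset in s_disstream:      # only existing itemsets are updated
                s_disstream[subset][stream_idx] += delta
```

The same routine covers both sides of the slide: it is called with `delta=+1` for the newest transaction and with `delta=-1` for the transaction evicted from the Transaction-List.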
5.4.2 Improving the accuracy using relaxation ratio
HEURISTIC 5.1 and HEURISTIC 5.2 are modified by the relaxation 𝛼 for
holding the sub-discriminative itemsets in the sliding window frame 𝑊. The sub-discriminative
itemsets are saved in the sliding window model as potential discriminative itemsets, following
Definition 5.2 and based on the relaxation 𝛼.
Property 5.1. Modifying HEURISTIC 5.1 and HEURISTIC 5.2 based on the relaxation
𝛼 obtains the sub-discriminative itemsets.
This property says that the sub-discriminative itemsets are discovered, by choosing the
relaxation 𝛼, from among the non-discriminative itemsets. The sub-discriminative itemsets in the sliding
window frame 𝑊 are discovered for a better approximation of the itemset frequencies and itemset
frequency ratios in the online sliding window model.
Property 5.2. Using a smaller relaxation 𝛼 yields a better approximation of the
discriminative itemsets in the online sliding window model.
This property says that more sub-discriminative itemsets are discovered by choosing
a smaller relaxation 𝛼. This is a trade-off between a better approximation of the discriminative
itemsets in the online sliding window model and the computation cost.
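The trade-off can be made concrete with a simplified classification rule. The exact discriminative criterion (frequency and ratio thresholds over the two stream lengths) is defined in the earlier chapters; the ratio test below, as well as the function and parameter names, are illustrative assumptions only.

```python
# Simplified sketch: classify an itemset from its frequencies in the target
# and general streams. An itemset whose length-normalised frequency ratio
# reaches theta is discriminative; one reaching the relaxed bound
# alpha * theta (with 0 < alpha < 1) is kept as sub-discriminative.
def classify(freq_i, freq_j, len_i, len_j, theta, alpha):
    ratio = (freq_i / len_i) / max(freq_j / len_j, 1e-12)  # guard freq_j == 0
    if ratio >= theta:
        return "discriminative"
    if ratio >= alpha * theta:
        return "sub-discriminative"   # held for the online approximation
    return "non-discriminative"
```

Lowering `alpha` admits more sub-discriminative itemsets into S-DISStream, improving the online approximation at the cost of extra space and processing, which is exactly the trade-off stated in Property 5.2.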
The corollary is formally defined below.
Corollary 5-1. A refined approximate bound on the discriminative itemsets in the online
sliding window model is obtained by modifying HEURISTIC 5.1 and HEURISTIC 5.2
based on the relaxation 𝛼, where 𝛼 is the relaxation threshold for sub-discriminative itemsets, and
HEURISTIC 5.1 and HEURISTIC 5.2 are defined for potential discriminative itemset combination
generation during the offline sliding window model.
Rationale 5-1. (Highest refined approximate bound on discriminative itemsets in the
online sliding window model) Corollary 5-1 ensures that the approximation of the discriminative
itemsets in the online sliding window model may be improved by holding the sub-discriminative
itemsets in the S-DISStream structure in the online sliding window model.
Proof. The sub-discriminative itemsets improve the approximate bound on the discriminative
itemsets by increasing the number of potential discriminative itemsets under the relaxation 𝛼. We
call this the highest refined approximate bound on the discriminative itemsets in the online sliding
window model. Corollary 5-1 is more effective when the discriminative itemsets are stable
across neighbouring partitions, with fewer concept drifts present in the datasets.
∎
5.5 S-DISSPARSE METHOD
In this section we describe the process of efficient mining of discriminative itemsets using
the sliding window model in the S-DISSparse method, by effectively dealing with the explosion in
the number of generated itemsets in the online and offline states.
The S-DISSparse method utilizes the DISSparse algorithm (Seyfi et al. 2017) proposed
in Chapter 3 with the offline sliding window model. The DISSparse algorithm is used for the
batch processing in the offline updating of the sliding window model, using the S-FP-Tree and S-
DISStream structures proposed in this chapter, and the Header-Table, conditional FP-Tree and
minimized DISTree as defined in Chapter 3. The discriminative (and sub-discriminative) itemsets
are directly updated in the S-DISStream structure (i.e., the sliding window frame 𝑊), and the sliding
window model is updated in the offline and online states. The S-DISSparse method continues by
discovering the discriminative and sub-discriminative itemsets for the next batch of transactions that fits
in the new partition. To the best of our knowledge, the S-DISSparse method is the
first work on efficient mining of discriminative itemsets using the sliding window model with an
approximate bound guarantee.
5.5.1 S-DISSparse Algorithm
The S-DISSparse algorithm is presented by incorporating the two heuristics and one
corollary proposed in this chapter, for efficient discriminative itemset mining using the
sliding window model. The S-DISStream structure is updated with the exact set of discriminative
itemsets in offline time intervals, when the current batch of transactions 𝐵𝑛 is full (i.e., n ≥ 1).
The S-DISStream structure is updated with an approximate set of discriminative itemsets in a real-
time frame while the current batch 𝐵𝑛 (i.e., n > 1) is being loaded with transactions. The first batch of
transactions 𝐵1 is treated differently, by calculating all the item frequencies and making the Desc-
Flist based on the descending order of the item frequencies. The Desc-Flist order is used for
saving space by sharing the paths in the prefix trees, with the most frequent items at the top. This
Desc-Flist remains the same for all the upcoming batches in the data streams. The S-DISSparse
algorithm is single-pass for the rest of the batches of transactions. The input parameters,
discriminative level 𝜃, support threshold 𝜑 and relaxation 𝛼, are defined based on the
application domain, the data stream characteristics and sizes, or by domain expert users, as
discussed in Chapter 6.
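Constructing the Desc-Flist from the first batch can be sketched as a frequency count followed by a descending sort; the tie-breaking rule used here is an illustrative assumption, not part of the thesis definition.

```python
from collections import Counter

# Sketch: build the Desc-Flist from the first batch B1 by counting item
# occurrences over the combined streams and sorting by descending frequency.
# The resulting fixed order is reused for all later batches, so prefix-tree
# paths share their most frequent items near the root.
def make_desc_flist(batch):
    counts = Counter()
    for transaction in batch:
        counts.update(set(transaction))  # count each item once per transaction
    return [item for item, _ in
            sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
```

Since the order is frozen after 𝐵1, later concept drift can degrade the path sharing; the chapter summary discusses periodically rebuilding the order as a possible remedy.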
The S-FP-Tree is updated by adding the transactions from the recent batch of
transactions 𝐵𝑛 (i.e., the most current batch of transactions) that fits in partition 𝑃𝑛𝑒𝑤, without pruning
infrequent items, and by making the Transaction-List. The first partition is processed using the
DISSparse algorithm (Seyfi et al. 2017) proposed in Chapter 3, and S-DISStream is generated
from the discriminative itemsets and non-discriminative subsets in the transactions fitting in partition 𝑃1.
With every new transaction in 𝑃𝑛𝑒𝑤, the Transaction-List is updated. The online sliding window is
updated by the online itemsets in S-DISStream (i.e., increasing the frequency of online itemsets in 𝑃𝑛𝑒𝑤
and decreasing the frequency of online itemsets in 𝑃𝑜𝑙𝑑). The online itemsets in S-DISStream are
tagged as discriminative or non-discriminative based on their updated frequencies and the data
stream lengths. By the end of the online sliding of 𝑃𝑛𝑒𝑤, 𝑃𝑜𝑙𝑑 is checked for online sliding of
the remaining transactions (i.e., when 𝑃𝑜𝑙𝑑 has a larger number of transactions than 𝑃𝑛𝑒𝑤).
During offline sliding, S-DISStream is updated by the discriminative itemsets in each
𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) and its 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑖𝑛) in an offline state. The tags in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 in
S-DISStream are updated by checking the itemsets in 𝑆𝑡𝑎𝑏𝑙𝑒(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) and 𝑆𝑡𝑎𝑏𝑙𝑒(𝑖𝑛) in
S-DISStream based on the recent data stream lengths. The sub-discriminative itemsets are
discovered for a better approximation bound on the discriminative itemsets in the online sliding
window model, by the relaxation 𝛼 and based on HEURISTIC 5.1 and HEURISTIC 5.2
modified using Corollary 5-1. By the end of the offline sliding of 𝑃𝑛𝑒𝑤, the exact frequencies of the
non-discriminative subsets not updated in S-DISStream are tuned based on their appearances in
the S-FP-Tree, and tail pruning is applied in the S-DISStream and S-FP-Tree structures. The online
itemsets in S-DISStream are also tagged based on the recent data stream lengths, and the process
continues with the next partition. The discriminative itemsets in the target data stream 𝑆𝑖 against the
general data stream 𝑆𝑗 are reported in offline time intervals in the sliding window frame 𝑊 in
𝐷𝐼𝑖𝑗𝑊, and the S-DISSparse algorithm continues with the new incoming batch of transactions 𝐵𝑛+1.
Algorithm 5.2 (S-DISSparse: Mining Discriminative Itemsets in Data
Streams using the Sliding Window Model)
Input: (1) The discriminative level 𝜃; (2) The support threshold 𝜑; (3) The
relaxation 𝛼; and (4) incoming batches of transactions that fit in partitions 𝑃, with
alphabetically ordered items, belonging to data streams 𝑆𝑖 and 𝑆𝑗.
Output: 𝐷𝐼𝑖𝑗𝑊, the set of discriminative itemsets in 𝑆𝑖 against 𝑆𝑗 in the sliding
window frame 𝑊 (S-DISStream structure), in online and offline states.
Begin
1) Make S-FP-Tree based on 𝐵1 that fits in 𝑃1 and update Transaction-List;
2) Process 𝑃1 using DISSparse algorithm (Seyfi et al. 2017) and make S-
DISStream;
3) While not end of streams do
4) Untag S-FP-Tree and S-DISStream;
5) While not end of partition 𝑃𝑛𝑒𝑤 do // Online sliding window
6) Update S-FP-Tree and Transaction-List by new transaction;
7) If added transaction in partition 𝑃𝑛𝑒𝑤 in 𝑊 has online itemset then
8) Update online itemsets in S-DISStream; // increase frequency
9) If deleted transaction in partition 𝑃𝑜𝑙𝑑 in 𝑊 has online itemset then
10) Update online itemsets in S-DISStream; // decrease frequency
11) End While;
12) Delete remained transactions in partition 𝑃𝑜𝑙𝑑 in online state;
13) Update S-DISStream by discriminative itemsets in every
𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) and 𝑃𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙(𝑖𝑛);
14) Update tags in 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 in S-DISStream for itemsets in every
𝑆𝑡𝑎𝑏𝑙𝑒(𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡) and 𝑆𝑡𝑎𝑏𝑙𝑒(𝑖𝑛) based on HEURISTIC 5.1 and
HEURISTIC 5.2 modified by Corollary 5-1;
15) Tune non-discriminative subsets and tag online itemsets in S-DISStream
and apply tail pruning;
16) Report discriminative itemsets 𝐷𝐼𝑖𝑗𝑊 in sliding window frame 𝑊;
17) End while;
End.
Theorem 5-1 below proves the precision of the S-DISSparse method.
Theorem 5-1 (Completeness and correctness of S-DISSparse): Based on Theorem 3-3,
the DISSparse method (Seyfi et al. 2017) discovers the exact set of discriminative itemsets in
offline sliding states. Based on Lemma 5-1 and Lemma 5-2, the updated potential discriminative
itemsets in each potential 𝑆𝑢𝑏𝑡𝑟𝑒𝑒𝑟𝑜𝑜𝑡 in the conditional FP-Tree and its 𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑛𝑜𝑑𝑒𝑟𝑜𝑜𝑡 are
generated completely in S-DISStream, and all stable discriminative itemsets are tagged in S-
DISStream correctly based on the recent data stream lengths. Based on Lemma 5-3, the
frequencies of the non-discriminative itemsets that appear as subsets of discriminative itemsets are
collected accurately. These prove the completeness and correctness of the S-DISSparse method
in discovering all the discriminative itemsets and their non-discriminative subsets in the offline
sliding window model.
5.5.2 S-DISSparse Algorithm Complexity
In the S-DISSparse algorithm, the significant part attracting considerable complexity is
the generation of the potential discriminative itemsets and the updating of the tags of stable
itemsets in the S-DISStream structure. Tuning the frequencies of the non-discriminative subsets in
S-DISStream and applying tail pruning in S-DISStream and the S-FP-Tree have less
complexity, considering the sparsity of the discriminative itemsets. The online sliding in lines 5 to
12 is based on a quick search method on the S-DISStream structure. The offline sliding is based
on the potential subtrees and the potential internal nodes (i.e., the updated itemsets). The stable
itemsets in the S-DISStream structure are also checked based on a quick search method and tagged as
discriminative or non-discriminative itemsets.
The efficiency of the S-DISSparse algorithm is discussed in detail by evaluating the
algorithm with the input data streams in Chapter 6. Empirical analysis shows the performance of
the proposed method under different parameter settings (e.g., the relaxation 𝛼). The
efficiency of the S-DISSparse algorithm is discussed on large and fast-growing data streams, for
exact mining of discriminative itemsets in the offline sliding window model, and with an approximate
bound guarantee in the online sliding window model.
5.6 CHAPTER SUMMARY
The S-DISSparse method proposed in this chapter is applicable to large datasets.
The DISSparse method utilized for offline batch processing is efficient for mining discriminative
itemsets in data streams, based on the heuristics proposed in this chapter. The sliding window
frame is divided into smaller partitions for offline sliding, and it is also updated by online
transactions for online sliding. The exact discriminative itemsets in the data streams are held in the
proposed novel S-DISStream structure in offline time intervals, with approximate discriminative
itemsets in an online real-time frame. The S-DISSparse algorithm has a complex process: every
single transaction is checked for online updating, and full partitions are processed for offline
sliding. The online sliding happens via the online itemsets of the recent and the oldest transactions in
the Transaction-List. The offline sliding happens by updating the sliding window model with the
recently updated transactions and tagging the itemsets in S-DISStream based on the recent data
stream lengths in the sliding window frame.
The offline sliding is used for the efficient mining of batch processing, and the online
updating is used for the quick reporting of the discriminative itemsets. The usability of the online
sliding highly depends on the concept drifts in the input data streams and also on their input rate.
Considering the incoming partitions in the data streams, the window frame is
updated with high performance if the adjacent partitions have the least concept drift. In data
streams with high concept drift, the sliding window frame is mainly updated through the offline
sliding state, and the online sliding state adds overhead to the mining process. The sliding
window frame size, the number of partitions and the size of each partition can be defined
based on the specific characteristics of the input datasets and on domain expert knowledge.
In this chapter, two determinative heuristics and one corollary are proposed for mining
discriminative itemsets in the exact offline sliding window model, and with the highest refined
approximate bound in the online sliding window model. The proposed heuristics applied to the S-
DISSparse method are efficient. The proposed heuristics in this chapter guarantee to hold the
exact frequencies of the discriminative itemsets, sub-discriminative itemsets and non-
discriminative subsets in the offline sliding window model. The non-discriminative itemsets
staying as leaf nodes in S-DISStream are pruned from the sliding window model. The highest
refined approximate bound is achieved by setting a smaller relaxation in the online sliding
window model. Following the defined tail pruning techniques, the S-DISStream data structure
can fit in the main memory, ascertaining that mining discriminative itemsets in data
streams using the sliding window model is realistic for fast-growing data streams.
The S-DISStream structure is stable over time, and the discriminative itemsets that
appear after processing batches with high concept drift are neutralized by merging in the full-
size sliding window frame. The process of building the S-DISStream structure can become more
efficient by periodically reordering the data structures based on the new trends in the data streams.
The Desc-Flist order made from the first batch of transactions is the default order for
making all the data structures in the algorithm. The efficiency of the algorithm may be affected
by this default order in the case of high concept drift in the data streams over time. The
data structures such as the S-FP-Tree, conditional FP-Tree, minimized DISTree and S-DISStream can
be updated periodically with a new ordering of frequent items for better efficiency (i.e., a Desc-Flist
adjusted based on the new trends in frequent items). However, the overhead of restructuring the large
S-DISStream structure in the sliding window model must be considered.
The proposed method is extensively evaluated with datasets exhibiting distinct
characteristics in Chapter 6. Based on the experimental results, the S-DISSparse algorithm
exhibits efficient time and space complexity on large, complex datasets when
we choose the partition size as a small percentage of the sliding window frame size. The S-
DISSparse algorithm reports the discriminative itemsets with full accuracy and recall in offline
sliding. The approximate results are reported in the online sliding window frame with higher
accuracy and recall when a smaller relaxation 𝛼 is chosen. The process of discriminative
itemset mining in the algorithm is highly dependent on the type of datasets and on how the itemsets
are distributed in the streams. The discriminative itemsets that appear with high concept drift
in specific batches are neutralized quickly during merging in the full-size sliding window frame.
The in-memory data structures defined for the algorithm efficiently stay small, based on the
proposed heuristics, and they stabilize during the process.
We explained many different real-world applications for mining discriminative itemsets
using the sliding window model. The sliding window model is useful for real-world applications
that have attracted high attention recently. One interesting scenario is network
monitoring for intrusion detection: looking for a set of activities happening more
frequently in one network compared to the rest of the networks can be used for personalization or
anomaly detection. The discriminative itemsets in the sliding window model can be useful for
monitoring the recent patterns in fast data streams. The sliding window model displays
the recent discriminative itemsets in the fixed-size window frame, which is updatable in the offline
and online states.
In this chapter, the exact discriminative itemsets are updated in offline time intervals in the
sliding window model, and the approximate discriminative itemsets are updated in a real-time frame in
the online sliding window model. In future work, we propose algorithms for classification based
on the discriminative itemsets in data streams using different window models. In the next
chapter, we evaluate the proposed algorithms for mining discriminative itemsets in data streams,
using different input datasets and based on different parameter settings.
Chapter 6: Evaluation and Analysis Page 131
© 2018 Queensland University of Technology-QUT, Science and Engineering Faculty Page 131
Chapter 6: Evaluation and Analysis
This chapter details all the results of the study in this thesis. The experimental results of different
datasets with different characteristics, on the four presented algorithms, are reported and analysed
in detail. This chapter contains a full discussion of the results with reference to the literature. For
each result, similarities and differences to the findings in the literature review are discussed. This
chapter also includes theory building.
This chapter details the different types of experiments for evaluating the presented
algorithms for mining discriminative itemsets in a batch of transactions, mining discriminative
itemsets using the tilted-time window model and mining discriminative itemsets using the sliding
window model. The algorithms are evaluated based on different criteria, including time and space
complexities, scalability and sensitivity analysis, by varying the input parameters. The
benchmarking and the types of datasets used for the experiments are explained in Section 6.1. The
DISTree and DISSparse algorithms, proposed for mining discriminative itemsets in one batch of
transactions, are evaluated in Section 6.2. The H-DISSparse algorithm, proposed for mining
discriminative itemsets using the tilted-time window model, is evaluated in Section 6.3. The S-
DISSparse algorithm, proposed for mining discriminative itemsets using the sliding window
model, is evaluated in Section 6.4. The chapter is finalized with the chapter summary in Section 6.5.
6.1 BENCHMARKING
Several methods have been proposed with concepts close to those of the methods in
this thesis. We list the current similar methods with a brief discussion of their definitions and
their proposed algorithms. We choose the best algorithms for benchmarking the
proposed batch processing algorithms and data stream processing algorithms, respectively.
6.1.1 Evaluation benchmarks
Mining discriminative itemsets in data streams is a new topic, and there are not many research
works in this area. In (Lin et al. 2010), three different methods have been proposed for mining
discriminative items in data streams, namely the frequent-item-based, hash-based and hybrid
methods. These three methods are considered the first research work on mining discriminative
items in data streams and are used as benchmarks for each other for evaluation purposes. In
Chapter 2 we confirmed that the methods proposed in (Lin et al. 2010; Seyfi 2011; Guo et al.
2011) cannot be used for benchmarking in discriminative itemset mining. Also, based on the
literature reviewed in Chapter 2, there is no prior research work on mining discriminative itemsets in
data streams. Out of the current methods for batch processing, we choose two of the
algorithms for benchmarking purposes, presenting sufficient arguments in the
section below.
6.1.1.1 Batch processing benchmarks
The DISTree method proposed in this thesis is the first method for mining discriminative
itemsets (Seyfi, Geva and Nayak 2014). The DISTree method is an adapted version of FP-
Growth (Han, Pei and Yin 2000) modified for more than one dataset. The standard FP-Growth
method is adapted to work on more than one dataset, as well as to prune the non-discriminative
itemsets instead of the infrequent itemsets. The DISSparse method is the advanced, efficient method
for mining discriminative itemsets (Seyfi et al. 2017). The proposed DISTree and DISSparse
algorithms in this thesis are completely novel and mine the complete set of the discriminative
itemsets, which are frequent in the target dataset based on the minimum support threshold, and
discriminative in the target dataset compared to the general dataset based on the discriminative level
threshold. Therefore, in the evaluation in Section 6.2, the DISTree algorithm is chosen as a
baseline model to compare with the proposed DISSparse algorithm.
The DISTree method is used as the first baseline for the DISSparse method; the
discriminative itemsets discovered by the two methods are the same. The accuracy of the
DISTree method is confirmed by the completeness of FP-Growth (Han, Pei and Yin 2000) and
by verifying the correctness of discriminative and non-discriminative itemsets through a full
traversal of the DISTree. Based on the empirical analysis in this chapter, DISTree takes roughly
two times longer than the basic FP-Growth. Discriminative itemsets do not follow the Apriori
property, so the divide-and-conquer strategy of FP-Growth cannot be applied directly; there are
also extra processes and data structures that work with the larger combined input of more than
one dataset. The DISSparse method is based on determinative heuristics that mine
discriminative itemsets efficiently by limiting itemset generation to the potentially
discriminative subsets. The accuracy of the DISSparse method is confirmed by the completeness
of its generation of potentially discriminative itemset combinations and the correctness of its
discriminative and non-discriminative itemsets.
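As a reference point for the definitions above, discriminative itemset mining can be sketched by exhaustive enumeration. This is an illustrative reconstruction only, not the thesis algorithms (DISTree and DISSparse avoid this enumeration with FP-Tree-based structures and heuristics); it assumes the combined minimum support 𝜑𝜃𝑛1 introduced later in this chapter, and the tiny streams are hypothetical.

```python
from itertools import combinations

def mine_discriminative(target, general, phi, theta):
    """Brute-force sketch of the discriminative-itemset definition: an
    itemset I is reported when it is frequent in the target dataset
    (f_i(I) at least the combined minimum support phi*theta*n_i) and its
    relative frequency ratio f_i(I)*n_j / (f_j(I)*n_i) exceeds theta."""
    n_i, n_j = len(target), len(general)
    items = sorted({x for t in target for x in t})
    min_sup = phi * theta * n_i
    found = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            c = set(cand)
            f_i = sum(1 for t in target if c <= t)   # support in target
            f_j = sum(1 for t in general if c <= t)  # support in general
            # cross-multiplied ratio test avoids division by zero
            if f_i >= min_sup and f_i * n_j > theta * f_j * n_i:
                found[cand] = (f_i, f_j)
    return found

# Hypothetical target stream S1 (n_i = 4) and general stream S2 (n_j = 8)
target = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]
general = [{"a"}, {"b"}, {"c"}, {"a", "c"}, {"b", "c"}, {"c"}, {"a"}, {"b"}]
print(mine_discriminative(target, general, phi=0.1, theta=2))
```

Note that the single items a, b and c are frequent in both streams and are filtered out by the ratio test, while {a, b} and {b, c} pass it; this is the behaviour the exact algorithms reproduce without enumerating all 2^|Σ| candidates.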
The concept of discriminative itemsets is very close to that of emerging patterns, and
many algorithms have been proposed for mining different types of emerging patterns. We choose one of the
state-of-the-art emerging pattern mining algorithms for benchmarking against the discriminative
itemset mining algorithm proposed in this thesis. However, both methods need some
modifications for a fair comparison.
ConsEPMiner (Zhang, Dong and Kotagiri 2000) reduces the cost of emerging
pattern mining by enforcing several constraints, including a user-defined minimum support,
growth rate and growth-rate improvement. Nevertheless, ConsEPMiner is not efficient when the
minimum support is low. Also, it is unable to handle datasets with very high dimensions, for
example market basket datasets. The epMiner (Loekito and Bailey 2006) is proposed for
mining EPs from high-dimensional datasets by employing the 𝛼 and 𝛽 constraints to reduce the
pattern search space. The epMiner algorithm mines minimal patterns occurring frequently (i.e.,
in more than 𝛼% of transactions) in the positive class and infrequently (i.e., in less than 𝛽% of
transactions) in the negative class.
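The 𝛼/𝛽 constraints can be expressed as a small predicate. The fractional form of the thresholds and the strictness of the inequalities below are assumptions for illustration; the exact conventions are in the original paper.

```python
def satisfies_alpha_beta(f_pos, n_pos, f_neg, n_neg, alpha, beta):
    """Sketch of the alpha/beta support constraints described for epMiner:
    a pattern qualifies when its relative support exceeds alpha in the
    positive class and stays below beta in the negative class.
    Thresholds are given as fractions here (e.g. alpha=0.2 for 20%)."""
    return (f_pos / n_pos) > alpha and (f_neg / n_neg) < beta

# A pattern seen in 30 of 100 positive and 2 of 200 negative transactions
print(satisfies_alpha_beta(30, 100, 2, 200, alpha=0.2, beta=0.05))
```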
The Discriminative Pattern Miner (DPMiner) algorithm (Li, Liu and Wong 2007) is able
to discover the δ-discriminative emerging patterns that have the maximum frequency in the
contrasting classes. The DPMiner algorithm is much faster than epMiner and, based on the
literature reviewed, it is the most efficient method for mining emerging patterns (Dong and Bailey
2012). DPMiner finds the δ-discriminative emerging patterns, which occur in only one of the
classes with almost no occurrence in any other class, i.e., the itemset appears fewer than 𝛿 times in the
rest of the classes, where 𝛿 is usually a small integer such as 1 or 2. DPMiner employs the 𝛼
and 𝛽 constraints as in (Loekito and Bailey 2006) to reduce the pattern search space; however, it
does not find some of the useful emerging patterns that the proposed DISTree and
DISSparse methods discover. The difference between these algorithms lies in the measures
defined for the discriminative itemsets: the discriminative itemsets proposed in this thesis are
relatively discriminative in the target dataset compared to the general dataset.
The ratio between the frequencies of a discriminative itemset in the target dataset (𝑓𝑖)
and the general dataset (𝑓𝑗) in the research problem of discriminative itemset mining is relative,
i.e., 𝑓𝑖(𝐼) ∙ 𝑛𝑗 / (𝑓𝑗(𝐼) ∙ 𝑛𝑖) > 𝜃. This measure does not follow the Apriori property, as a superset of a non-
discriminative itemset can still be discriminative. The δ-discriminative emerging patterns have a non-
relative, static frequency measure in the negative class, i.e., (< 𝛿). The problem of mining δ-
discriminative emerging patterns is based on the equivalence classes defined by closed patterns
and a set of generators. The (< 𝛿) measure of 𝛿-discriminative emerging patterns follows the
Apriori property, by finding a border with non-𝛿-discriminative patterns above the border and
redundant patterns below it.
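The failure of the Apriori property under the relative measure can be shown numerically. The counts below are hypothetical, chosen only to demonstrate that the ratio is not anti-monotone.

```python
def ratio(f_i, f_j, n_i, n_j):
    """Relative frequency ratio f_i(I)*n_j / (f_j(I)*n_i), treated as
    infinite when the itemset never occurs in the general dataset."""
    return float("inf") if f_j == 0 else (f_i * n_j) / (f_j * n_i)

theta = 2
# Hypothetical counts over a target stream of n_i = 4 transactions and a
# general stream of n_j = 4 transactions.
# {a}: 4 occurrences in the target, 3 in the general stream -> ratio 4/3
print(ratio(4, 3, 4, 4) > theta)   # {a} is not discriminative
# {a, b}: 2 occurrences in the target, none in the general stream
print(ratio(2, 0, 4, 4) > theta)   # its superset {a, b} is discriminative
# A failed ratio test on an itemset therefore says nothing about its
# supersets, so Apriori-style pruning of whole branches is unavailable.
```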
Although the DPMiner algorithm (Dong and Bailey 2012) mines discriminative
itemsets with 𝑓𝑖 + 𝑓𝑗 > 𝑚𝑖𝑛𝑖𝑚𝑢𝑚 𝑠𝑢𝑝𝑝𝑜𝑟𝑡, i.e., the itemset must be frequent in the datasets, it
has a specific requirement that 𝑓𝑗 < 𝛿 or 𝑓𝑖 < 𝛿, which is different from the measure used by
DISSparse to determine discriminative itemsets. In Chapter 3, we modified DISSparse
and its proposed heuristics to find all the δ-discriminative itemsets, i.e., including redundant
patterns. We also modified the original DPMiner to include all the δ-discriminative itemsets.
We compare these methods in terms of time and space usage for the desired δ-discriminative
emerging patterns. The proposed data stream mining algorithms in this thesis are designed on
the basis of the proposed batch processing algorithms. We define the evaluation benchmarks
for the data stream mining algorithms in the section below.
6.1.1.2 Data stream processing benchmarks
The DISTree and DISSparse algorithms are proposed for mining discriminative itemsets
from a single batch of transactions, which can be used for offline updating of the different
window models in data streams. The H-DISSparse algorithm is proposed for mining
discriminative itemsets in data streams using the tilted-time window model by utilizing the
DISSparse method. The precision of the discriminative itemset mining algorithms using the
tilted-time window model is affected by merging the results from multiple batches of
transactions. In Chapter 4, a set of corollaries was defined to improve the precision based on the
properties of discriminative itemset mining in data streams using the tilted-time window
model. In Section 6.3, the algorithm constructed based on the basic DISTree method is used as a
benchmark for the algorithm constructed based on the efficient DISSparse method.
The S-DISSparse algorithm is proposed for mining discriminative itemsets in data
streams using the sliding window model by utilizing the DISSparse method. The discriminative
itemset mining algorithm using the sliding window model is proposed with full accuracy and
recall by following the recent changes in the data structures during offline window
sliding. In Chapter 5, a set of heuristics and a corollary were defined to improve the precision of
the discriminative itemset mining algorithms in data streams during online window sliding.
In Section 6.4, the original algorithm constructed on the basis of the DISSparse method
is used as a benchmark for the efficient algorithm constructed based on the DISSparse method for
the sliding window model.
6.1.2 Evaluation environment
All the algorithms were implemented in C++ and the experiments were conducted on a
desktop computer with an Intel Core (TM) Duo E2640 2.8GHz CPU and 8GB of main memory
running 64-bit Microsoft Windows 7 Enterprise. All the synthetic datasets used in the
experiments were generated using the IBM synthetic data generator (Agrawal and Srikant 1994).
The input datasets are made of two data streams 𝑆1 and 𝑆2 generated with different sizes, ratios
and numbers of unique items. The 𝑇:𝐼:𝐷 format describes a dataset with 𝑇 as the
average transaction length, 𝐼 as the average length of the large itemsets and 𝐷 as the number of
transactions. We used the same 𝑇 for both 𝑆1 and 𝑆2 to indicate that both data streams belong to
the same domain and exhibit similar behaviour. 𝑆1 and 𝑆2 were generated with different 𝐼, as
there are more large itemsets in 𝑆2, which is made of several smaller data streams; this also
supports the generation of a larger number of discriminative itemsets. The real datasets used in the
experiments were mainly obtained from the UCI repository (Dheeru and Karra Taniskidou
2017). The details of the real datasets used in the experiments are presented in the related
sub-sections.
It should be noted that, in our experiments, runtime means the total execution time,
i.e., the period between input and output, rather than the CPU time measured in the experiments in
some of the literature. The reported runtimes and space usages include the construction time and space
of the FP-Tree structure that holds the input transactions in a concise way; this is
an overhead shared by all the algorithms. For simplicity, we define the combination
𝜑𝜃𝑛1 as the minimum support. The number of discriminative itemsets in the datasets grows exponentially
as the minimum support is lowered. The datasets contain abundant mixtures of short and
long discriminative itemsets, with a small number of very long discriminative itemsets alongside a
large number of short ones.
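The combined threshold can be made concrete with a small sketch; the numeric values below are illustrative, matching the 𝐷1-style settings used in the experiments that follow.

```python
def minimum_support(phi, theta, n1):
    """Combined minimum support phi * theta * n1 used throughout the
    experiments: the support threshold phi, the discriminative level theta
    and the target-stream size n1 jointly determine the effective cut-off
    on an itemset's absolute frequency in the target stream."""
    return phi * theta * n1

# With phi = 0.0001 and a target stream of n1 = 10,000 transactions,
# raising theta from 10 to 15 raises the absolute frequency cut-off,
# which is why phi and theta have a similar effect on complexity.
print(minimum_support(0.0001, 10, 10_000))
print(minimum_support(0.0001, 15, 10_000))
```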
6.2 BATCH PROCESSING
In this section, we evaluate the DISTree and DISSparse algorithms on a
single batch of transactions. The two algorithms are evaluated on a wide
range of datasets exhibiting various characteristics and with different parameter settings. Three
main input datasets, made of a target data stream 𝑆1 and a general data stream 𝑆2 with different sizes,
average transaction lengths, average lengths of the large itemsets and numbers of transactions,
have been generated with 1𝑘 and 10𝑘 unique items. The general data stream 𝑆2 is
typically much bigger than the target data stream 𝑆1 as it combines multiple datasets.
In this section, strategies and principles are also recommended for tuning the parameters based on
the application domain and data stream characteristics.
6.2.1 Evaluation on synthetic datasets
The scalability of the DISTree and DISSparse algorithms is presented over different ranges
of the discriminative level 𝜃, the support threshold 𝜑, the ratio between 𝑆1 and 𝑆2 (𝑛2/𝑛1) and the
number of unique items in the alphabet Σ.
Dataset 𝑫𝟏 is generated with 𝑆1 as 𝑇25:𝐼10:𝐷10𝐾 and 𝑆2 as 𝑇25:𝐼15:𝐷50𝐾, limited
to 1𝐾 unique items. The number of discriminative itemsets goes up exponentially at lower
minimum supports. The scalability of DISSparse is compared with DISTree for different
discriminative levels 𝜃 and a fixed support threshold 𝜑 = 0.01%, as presented in Figure 6.1.
Figure 6.1 Scalability with discriminative level 𝜃 for 𝐷1 (support threshold 𝜑 = 0.0001)
DISSparse scales much better than DISTree, especially at the lower minimum supports
reached by decreasing 𝜃, when a greater number of discriminative itemsets with lower minimum support
and frequency ratio are discovered. The two algorithms behave similarly at large minimum
supports, when there are fewer discriminative itemsets (i.e., for 𝜃 = 10 in Figure 6.1 the
number of discriminative itemsets is less than 200k). At large minimum supports, both DISTree
and DISSparse prune many items that are infrequent in the target data stream 𝑆1 while
building the conditional FP-Tree. Figure 6.1 also shows the exponential growth in the space usage of DISTree
at smaller minimum supports, caused by the exponential number of generated itemset
combinations.
To show the efficiency gained by the proposed heuristics, the experiments are repeated
on 𝐷1 with different discriminative levels 𝜃 and a fixed support threshold 𝜑 = 0.01%,
testing the DISSparse algorithm separately, once with HEURISTIC 3.1 eliminated and once
with HEURISTIC 3.2 eliminated. Without the proposed heuristics, the data structures and processes in the DISSparse
algorithm are not efficient and DISSparse does not scale
well compared to DISTree (i.e., the time complexity of DISSparse without the
heuristics generally scales similarly to DISTree). The space usage of DISSparse without the
heuristics is still better than that of DISTree. The main contribution to the DISSparse
efficiency comes from HEURISTIC 3.2; however, the effect of HEURISTIC 3.1 becomes
clear at the lower minimum supports.
The scalability of the DISTree and DISSparse algorithms is tested with different support
thresholds 𝜑 and a fixed discriminative level 𝜃 = 15, as presented in Figure 6.2. The time and
space complexity of DISSparse and DISTree with a changing support threshold 𝜑 and a fixed
discriminative level 𝜃 = 15 follow the same patterns as in Figure 6.1. The support threshold 𝜑 has
a similar effect to the discriminative level 𝜃 on the time and space complexity, as these two parameters
together define the minimum support of discriminative itemsets.
Figure 6.2 Scalability with support threshold 𝜑 for 𝐷1(discriminative level 𝜃 = 10)
The next experiments are run on datasets defined in the same way as 𝐷1 but with
different length ratios between the sizes of 𝑆1 and 𝑆2. The number of transactions in 𝑆2 is
changed from 10𝑘 to 100𝑘. In this setting, a greater number of itemsets fit the definition of
discriminative itemsets as the ratio 𝑛2/𝑛1 increases gradually from smaller values, as presented in
Figure 6.3. The scalability of DISSparse and DISTree with a changing ratio 𝑛2/𝑛1 is
presented in Figure 6.4. The runtime of DISSparse increases mainly as a consequence of the rise in
the number of discriminative itemsets. The linear increase in the time and space complexity of
DISTree at larger ratios is caused by the requirement of bigger data structures with longer
processing time (i.e., the dataset with 𝑛2/𝑛1 = 10 is ten times bigger than the dataset with 𝑛1 = 𝑛2, as in
Figure 6.4).
Figure 6.3 Number of the discriminative itemsets with different dataset length ratios (𝑛2/𝑛1)
(discriminative level 𝜃 = 10 and support threshold 𝜑 = 0.0001)
Figure 6.4 Scalability with different dataset length ratios (𝑛2/𝑛1) (discriminative level
𝜃 = 10 and support threshold 𝜑 = 0.0001)
Dataset 𝑫𝟐 is generated with 𝑆1 as 𝑇25:𝐼10:𝐷10𝐾 and 𝑆2 as 𝑇25:𝐼15:𝐷50𝐾, limited
to 10𝐾 unique items. The transactions in 𝐷2 are made of sparse items drawn from the 10𝐾 unique items,
and the number and average length of the discriminative itemsets decrease. Both algorithms prune
many items that are infrequent in 𝑆1 while building the conditional FP-Tree, and DISTree scales well with
a time complexity close to DISSparse but larger space usage, as in Figure 6.5.
Figure 6.5 Scalability with discriminative level 𝜃 for 𝐷2 (support threshold 𝜑 = 0.0001)
Dataset 𝑫𝟑 is generated with 𝑆1 as 𝑇25:𝐼10:𝐷100𝐾 and 𝑆2 as 𝑇25:𝐼15:𝐷500𝐾,
limited to 10𝐾 unique items. The 𝑛1 is much bigger in 𝐷3 than in the other datasets, which
caused a larger minimum support and consequently fewer discriminative itemsets (e.g.,
for minimum support 𝜑𝜃𝑛1 = 50..100, the number of discriminative itemsets for different
ratios 𝜃 varies from three million down to a hundred thousand, as in Figure 6.7). In this setting, both
algorithms again prune many items that are infrequent in 𝑆1 while building the conditional FP-Tree, as in
Figure 6.6.
Figure 6.6 Scalability with discriminative level 𝜃 for 𝐷3 (support threshold 𝜑 = 0.0001)
Figure 6.7 Number of the discriminative itemsets with different 𝜃 for 𝐷3 (support threshold
𝜑 = 0.0001)
DISSparse scales very well for large datasets with a prolific number of discriminative
itemsets at smaller minimum supports, as in Figure 6.8. The experiments are conducted on 𝐷3
using a parameter setting homogeneous with the experiments reported in Figure 6.1, i.e., the support
threshold is set to 𝜑 = 0.00001 so that the minimum support is 𝜑𝜃𝑛1 = 0.00001 ∗ 𝜃 ∗
100,000 = 𝜃. Both DISSparse and DISTree show smooth growth in their time complexity as
the discriminative level 𝜃 decreases, as in Figure 6.8. However, when the minimum support
becomes very small, DISTree becomes intolerable because of the exponential number of discriminative
itemsets (i.e., from 6𝑀 for 𝜃 = 35 to 11𝑀 for 𝜃 = 25, as in Figure 6.9).
Figure 6.8 Scalability with discriminative level 𝜃 for 𝐷3 (support threshold 𝜑 = 0.00001)
Figure 6.9 Number of the discriminative itemsets with discriminative level 𝜃 for 𝐷3 (support
threshold 𝜑 = 0.00001)
To show the relationship between the distribution of the frequent items in each dataset and
the distribution of items in the discriminative itemsets, experiments are conducted on a modified
version of 𝐷1 in which the distribution of frequent items is changed. The frequent items in 𝑆1 deliberately
start from the items with smaller identifiers, as in Figure 6.10; for example, item 0 is the most
frequent item and item 999 is the least frequent item in 𝑆1. The frequent items in 𝑆2
deliberately start from the items with larger identifiers, as in Figure 6.11; for example, item
999 is the most frequent item and item 0 is the least frequent item in 𝑆2.
Figure 6.10 Frequent items distribution in 𝑆1
Figure 6.11 Frequent items distribution in 𝑆2
The experiments are conducted on the modified 𝐷1 with the support threshold set to
𝜑 = 0.0001 and the discriminative level 𝜃 = 10. The distribution of items in the discriminative
itemsets (i.e., two hundred and ten thousand discriminative itemsets under this parameter setting) is
presented in Figure 6.12. The frequency of items in discriminative itemsets is divided by ten
for the sake of clarity. As Figure 6.12 shows, the discriminative itemsets are mainly made of items that have high
frequency in 𝑆1 and lower frequency in 𝑆2.
Figure 6.12 Frequent items distribution in discriminative itemsets in the modified 𝐷1 with
discriminative level 𝜃 = 10 and support threshold 𝜑 = 0.0001
To examine the distribution of the frequent items in consistent datasets
with similar item distributions, further experiments are conducted on 𝐷1, again with a modified
distribution of frequent items. This time, the frequent items in both 𝑆1 and 𝑆2 deliberately start
from the items with smaller identifiers, as in Figure 6.13; for example, item 0 is the most
frequent item and item 999 is the least frequent item in both 𝑆1 and 𝑆2.
Figure 6.13 Frequent items distribution in discriminative itemsets in the modified 𝐷1 with
discriminative level 𝜃 = 10 and support threshold 𝜑 = 0.0001
The experiments are conducted on the modified 𝐷1 with a similar parameter setting (i.e.,
𝜑 = 0.0001 and 𝜃 = 10). The distribution of items in the discriminative itemsets is presented
in Figure 6.13. The number of discriminative itemsets decreased by ten percent compared to the previous
experiments (i.e., one hundred and ninety thousand discriminative itemsets under this
parameter setting). The discriminative itemsets are still made of items that have high
frequency in 𝑆1, but there are fewer of them.
6.2.2 Evaluation on real datasets
To evaluate the proposed DISSparse algorithm on real applications, we ran experiments
on several real datasets: the susy and mushroom datasets from the UCI repository (Dheeru
and Karra Taniskidou 2017) and the accident dataset, all provided in (Fournier-Viger et al. 2016).
The selected datasets are dense (i.e., transactions have values for each attribute)
with fewer sparsity characteristics than the synthetic datasets. For this reason, we set
the parameters so as to show the scalability at the most informative scales.
The susy dataset contains high-level features derived by physicists to help discriminate
between two classes, defined as signal and background. It relates to particles detected in a
particle accelerator, based on Monte Carlo simulations. This dataset is made of five million
instances; the first column is the class label, followed by eighteen features. The transactions
are made of about one hundred and ninety unique items. We selected the first one hundred thousand instances
for the scale of the experiments (i.e., 2% of the dataset). As Figure 6.14 shows, DISSparse scales
better than DISTree, especially at lower minimum supports, when a greater number of
discriminative itemsets are discovered. The number of discriminative itemsets ranges from 17
thousand for 𝜃 = 1.75 down to 2 thousand for 𝜃 = 3. An interesting feature of this dataset,
compared to the synthetic market basket datasets, is that the majority of the discriminative itemsets
have high frequencies in both datasets. We observed fewer patterns with zero or small
frequency in the general dataset. This shows the significance of the proposed DISSparse
algorithm for applications with no inherent discrimination. This feature also exists in most of
the real applications we explained in Chapter 3.
Figure 6.14 Scalability with discriminative level 𝜃 for susy dataset (support threshold
𝜑 = 0.01)
We ran the experiments with different 𝜑 and a fixed 𝜃 = 2, as presented in Figure 6.15.
The 𝜑 has a similar effect to 𝜃 on the time and space complexity.
Figure 6.15 Scalability with support threshold 𝜑 for susy dataset (discriminative level 𝜃 = 2)
The accident dataset is anonymized traffic accident data obtained from the National
Institute of Statistics (NIS) for the region of Flanders (Belgium) for the period 1991-2000
(Fournier-Viger et al. 2016). This dataset is also dense, with large and varied transaction
sizes. The transactions are made of about four hundred and fifty items. We used the full dataset
(i.e., three hundred and forty thousand instances) for the experiments, taking the first column as
the class label. DISSparse scales slightly better than DISTree at lower minimum supports, as
in Figure 6.16. In this dataset we observed many discriminative itemsets with high frequency in
both datasets (i.e., the two classes of data). However, there are many patterns with zero frequency in
the general dataset compared to the patterns in the susy dataset. The interesting feature of this dataset
is that the discriminative itemsets are a mixture of itemsets with high frequencies in both datasets,
and itemsets with high frequency in one dataset and zero frequency in the other.
Figure 6.16 Scalability with discriminative level 𝜃 for accident dataset (support threshold
𝜑 = 0.01)
The mushroom dataset includes descriptions of hypothetical samples corresponding to
23 species of gilled mushrooms in the Agaricus and Lepiota families. Each species is identified as
definitely edible, definitely poisonous, or of unknown edibility and not recommended; this latter
class was combined with the poisonous one. The Guide clearly states that there is no simple rule
for determining the edibility of a mushroom. This dataset is made of 8124 instances; the first
column is the class label, followed by 22 features. The transactions are made of about one hundred and
twenty unique items. We used the full dataset for the experiments.
For the mushroom dataset, the two algorithms scale similarly in terms of time complexity,
with better space usage for the DISSparse algorithm, as in Figure 6.17. We found that for datasets
(i.e., different classes of data) with inherent discrimination, like the edible and poisonous classes in
the mushroom dataset, the algorithm finds many discriminative itemsets even at
very high minimum supports and discriminative level thresholds.
Figure 6.17 Scalability with discriminative level 𝜃 for mushroom dataset (support threshold
𝜑 = 0.001)
The DISSparse algorithm works efficiently for discriminative itemsets with low
supports and low discriminative level thresholds when there is a large number of items in the
datasets. This is very important in the applications we explained in Chapter 3. However, the
mushroom dataset has very few unique items (i.e., fewer than 120) and does not have
the sparsity characteristics (i.e., the number of discriminative itemsets is not necessarily much
less than the number of frequent itemsets). Because of this, most of the subtrees in the conditional
FP-Tree are considered potential and cannot be ignored.
6.2.3 Discussion on 𝜹-discriminative emerging patterns
Discriminative itemsets are frequent in the target dataset, with relatively different
supports in the target dataset and the general dataset. The δ-discriminative emerging patterns are
frequent in the target dataset but infrequent (i.e., frequency < 𝛿) in the other datasets.
DPMiner (Li, Liu and Wong 2007) discovers the equivalence classes (ECs) and employs the 𝛿
constraint to reduce the pattern search space by setting a border between non-𝛿-discriminative and
redundant emerging patterns. It excludes the subsets with frequencies larger than 𝛿 in the
general dataset and the supersets with lower frequency in the datasets. We observed big differences
in the number of discovered discriminative itemsets between DPMiner and DISSparse for
several different parameter settings. The wide difference in the number of discovered itemsets lies
in the redundant 𝛿-discriminative emerging patterns. A redundant itemset is a superset of a δ-
discriminative itemset with the same infinite ratio between its supports in the target and general
datasets (e.g., the itemset {𝑎, 𝑏, 𝑐} with 𝑓𝑖 = 10 and 𝑓𝑗 = 0 and the itemset {𝑎, 𝑏} with 𝑓𝑖 = 15 and
𝑓𝑗 = 0 are both considered discriminative by DISSparse, while for DPMiner {𝑎, 𝑏} is considered 𝛿-
discriminative but the itemset {𝑎, 𝑏, 𝑐} is not, because {𝑎, 𝑏, 𝑐} is a superset of the discriminative
itemset {𝑎, 𝑏} with lower frequency in both datasets 𝑆𝑖 and 𝑆𝑗, and is thus considered redundant).
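The example above can be rendered as a toy check. The dataset sizes (n_i = n_j = 1000) and thresholds are hypothetical, and DPMiner's equivalence-class-based redundancy pruning is simplified here to a plain subset test on the two itemsets; this is a sketch of the contrast between the two criteria, not either algorithm.

```python
# Frequencies from the example: itemset -> (f_i in target, f_j in general)
patterns = {("a", "b"): (15, 0), ("a", "b", "c"): (10, 0)}
n_i, n_j, theta, delta, min_sup = 1000, 1000, 2, 1, 5

def relative_ok(f_i, f_j):
    """DISSparse-style relative measure: frequent in the target stream
    and frequency ratio above theta (infinite when f_j = 0)."""
    ratio = float("inf") if f_j == 0 else (f_i * n_j) / (f_j * n_i)
    return f_i >= min_sup and ratio > theta

# The relative measure accepts both itemsets
dissparse = [p for p, (fi, fj) in patterns.items() if relative_ok(fi, fj)]

# A DPMiner-style pass keeps {a, b} but drops its lower-frequency
# superset {a, b, c} as redundant (shortest itemsets considered first)
kept = []
for iset, (fi, fj) in sorted(patterns.items(), key=lambda kv: len(kv[0])):
    redundant = any(set(k) < set(iset) for k in kept)
    if fj < delta and not redundant:
        kept.append(iset)

print(dissparse)  # both itemsets
print(kept)       # only ('a', 'b')
```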
Discriminative itemsets can be frequent in both the target dataset and the general dataset, without
any static limit on their frequency in the general dataset, but DPMiner does not generate itemsets
that are frequent in the general dataset (e.g., with 𝑚𝑖𝑛𝑖𝑚𝑢𝑚 𝑠𝑢𝑝 = 0.01, 𝛿 = 5 and
𝜃 = 2, DISSparse discovers the itemset {𝑎, 𝑏, 𝑐} with 𝑓𝑖 = 100 and 𝑓𝑗 = 50 as a
discriminative itemset, but DPMiner skips this itemset because 𝑓𝑗 = 50 > 𝛿). Discriminative
itemsets are useful when the itemsets are frequent in all datasets and when the
support and confidence of the itemsets are needed explicitly; for example, itemsets in market
baskets may appear frequently in all suburbs with relatively higher frequency in a target suburb
with a different aging population. For a fair comparison with the DISSparse algorithm, we modified
the original DPMiner algorithm to cover all the δ-discriminative emerging patterns (i.e., including
redundant emerging patterns). Also, as explained earlier, the discriminative itemsets
discovered by the DISSparse algorithm cannot be limited by the static frequency bound (< 𝛿) in
the general dataset. We therefore modified the definition criteria, and consequently the heuristics
proposed in the DISSparse algorithm in Chapter 3, to discover all the δ-discriminative emerging patterns, as in
the section below.
6.2.3.1 Evaluation on modified DISSparse and modified DPM
The scalability of the modified algorithms is tested with different 𝑚𝑖𝑛_𝑠𝑢𝑝 and a fixed
𝛿 = 2 on Dataset 𝑫𝟏, as presented in Figure 6.18. The modified DISSparse scales much better than
the modified DPM, especially at lower 𝑚𝑖𝑛_𝑠𝑢𝑝 when a greater number of δ-
discriminative itemsets are discovered. The algorithms behave similarly at large 𝑚𝑖𝑛_𝑠𝑢𝑝,
when there are fewer discriminative itemsets (i.e., for 𝑚𝑖𝑛_𝑠𝑢𝑝 = 60 and 𝑚𝑖𝑛_𝑠𝑢𝑝 =
65 the number of δ-discriminative itemsets is less than 100k). At large 𝑚𝑖𝑛_𝑠𝑢𝑝, both the
modified DPM and the modified DISSparse prune many items that are infrequent in the datasets
while building the conditional FP-Tree. Figure 6.18 shows the exponential growth in the time and
space usage of the modified DPM at smaller 𝑚𝑖𝑛_𝑠𝑢𝑝, caused by the exponential number
of generated itemset combinations.
Figure 6.18 Scalability with 𝑚𝑖𝑛_𝑠𝑢𝑝 for 𝐷1 (δ = 2)
The scalability of the modified algorithms is tested with different δ values and a fixed
𝑚𝑖𝑛_𝑠𝑢𝑝 = 50 on Dataset 𝑫𝟏, as presented in Figure 6.19. In this experiment
we observed only a very small increase in the time and space usage of both algorithms as δ
grows.
Figure 6.19 Scalability with δ for 𝐷1 (𝑚𝑖𝑛_𝑠𝑢𝑝 = 50)
For a comparison on a bigger dataset with higher sparsity characteristics, the scalability of
the modified algorithms is tested with different 𝑚𝑖𝑛_𝑠𝑢𝑝 and a fixed 𝛿 = 50 on Dataset 𝑫𝟑, as
presented in Figure 6.20. We observed similar behaviour to that on Dataset 𝑫𝟏, with much better
time and space complexity for the modified DISSparse than the modified DPM, especially at lower
𝑚𝑖𝑛_𝑠𝑢𝑝. Figure 6.20 shows the exponential growth in the time and space usage of the modified DPM at
smaller 𝑚𝑖𝑛_𝑠𝑢𝑝, caused by the exponential number of generated itemset combinations.
Figure 6.20 Scalability with 𝑚𝑖𝑛_𝑠𝑢𝑝 for 𝐷3 (δ = 50)
The scalability of the modified algorithms is tested with different δ values and a fixed
𝑚𝑖𝑛_𝑠𝑢𝑝 = 450 on Dataset 𝑫𝟑, as presented in Figure 6.21. As for
Dataset 𝑫𝟏, we observed only a very small increase in the time and space usage of both algorithms as
δ grows.
Figure 6.21 Scalability with δ for 𝐷3 (𝑚𝑖𝑛_𝑠𝑢𝑝 = 450)
We modified the DISTree algorithm to find the δ-discriminative itemsets for
comparison with the modified DISSparse and the modified DPM algorithms.
However, the time and space usage of the modified DISTree is far out of range and cannot
be shown at the scales used for reporting the modified DISSparse and the
modified DPM.
6.2.3.2 Evaluation on real datasets with modified DISSparse and modified DPM
The modified DISSparse algorithm and the modified DPM algorithm behave differently on the three real datasets used in this thesis. The mushroom dataset has inherently discriminative classes. The modified DPM is much more efficient on the mushroom dataset compared to the modified DISSparse. The definition of the 𝛿-discriminative itemsets is generally based on a small 𝛿 value (i.e., 0 or 1) (Li, Liu and Wong 2007). Mining discriminative itemsets with small 𝛿 values is useful in datasets with inherently discriminative classes, where discriminative itemsets are frequent in one class and have few occurrences in the other classes. Most of the 𝛿-discriminative itemsets in this type of dataset are jumping emerging patterns, with high frequency in one class and zero frequency in the other classes. On this dataset, the modified DPM finds the 𝛿-discriminative itemsets with small 𝛿 much faster than the modified DISSparse.
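The 𝛿-discriminative condition described above can be expressed as a simple membership test. The following sketch is an illustrative reading of the definition in (Li, Liu and Wong 2007); the function and parameter names are ours, and the minimum-support requirement in the target class is included for completeness:

```python
def is_delta_discriminative(counts, target, min_sup, delta):
    """Check whether an itemset's per-class counts satisfy the
    delta-discriminative condition: frequent in the target class and
    at most delta occurrences in every other class."""
    if counts.get(target, 0) < min_sup:
        return False
    return all(c <= delta for cls, c in counts.items() if cls != target)
```

For example, with 𝛿 = 1 and 𝑚𝑖𝑛_𝑠𝑢𝑝 = 50, an itemset counted as {'A': 120, 'B': 1} is 𝛿-discriminative for class A, while {'A': 120, 'B': 7} is not.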
The modified DISSparse is more efficient when the dataset has no inherently discriminative classes, as in the susy and accident datasets. For these datasets, a large 𝛿 must be set for mining 𝛿-discriminative itemsets, which is against the definition in the original DPM (Li, Liu and Wong 2007). The definition of discriminative itemsets proposed in this thesis is based on the relative differences of the supports across datasets (i.e., the classes of the dataset). This favours the modified DISSparse when large 𝛿 values are set for mining 𝛿-discriminative itemsets. Even with large 𝛿 values in the modified DISSparse, the original DISSparse algorithm still finds more discriminative itemsets with different relative supports. The DISSparse method proposed in this thesis targets applications similar to the susy and accident datasets, with no inherent discrimination in the dataset. Most of the discriminative itemsets in this type of dataset are frequent in all data classes, with much higher frequency in one class than in the others. Several real applications of discriminative itemset mining were explained in Chapter 1.
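The relative-difference notion used by DISSparse can be sketched as a support-ratio test between the target stream and the general stream. The exact formulation in Chapter 3 may differ; the names and the ratio form below are illustrative assumptions:

```python
def is_discriminative(f1, n1, f2, n2, phi, theta):
    """Ratio-based check used in this sketch: relative support in the
    target stream S1 must reach phi, and the ratio of relative supports
    between S1 and S2 must reach the discriminative level theta."""
    s1 = f1 / n1          # relative support in the target stream
    s2 = f2 / n2          # relative support in the general stream
    if s1 < phi:
        return False      # not frequent enough in the target stream
    return s2 == 0 or s1 / s2 >= theta
```

Under this reading, an itemset can be frequent in every class (unlike a 𝛿-discriminative itemset) and still qualify, as long as its relative support in the target stream is 𝜃 times higher than in the general stream.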
6.2.4 Discussion
The general outcome from the experiments discussed in the previous sub-sections is explained here. The DISSparse algorithm with the proposed heuristics exhibits efficient time and space complexity, specifically when discriminative itemsets with lower minimum supports and frequency ratios are of interest. Setting the support threshold 𝜑 and the discriminative level 𝜃, while considering a reasonable size for the input batch of transactions 𝑛1, is very important in real applications. These three parameters (i.e., 𝜑, 𝜃 and 𝑛1) set the minimum support for the frequency of discriminative itemsets in the target dataset. Setting these parameters is highly
related to the application and to domain experts, considering the limited computing and storage capabilities. A smaller input batch of transactions must be used for applications that need continuous updating at short time intervals, especially for discriminative itemsets with small minimum supports and ratios. The discovered patterns may expire and lose interest as time goes by and new transactions are added to the data streams. The algorithms were also tested on simpler synthetic datasets with an average transaction length 𝑇 of 10 and 15. The number of discriminative itemsets is much smaller, even with smaller minimum supports and frequency ratios, and DISTree scales well, close to DISSparse. However, the space usage of DISTree still grew exponentially for smaller minimum supports, as observed in the reported experiments above. The number of unique items in the alphabet Σ can change the distribution of sparse transactions and has to be considered when setting batch sizes for good scalability.
The algorithms designed for data stream processing should use a single scan over the input datasets, as explained in the literature review in Chapter 2. In (Seyfi, Geva and Nayak 2014), we proposed an early algorithm with a single scan for mining discriminative itemsets. In this algorithm the transactions are not sorted based on the ordering of the frequent items, and the FP-Tree is made of full-size transactions, which leads to a huge data structure. This also affects the size of the other data structures, as they do not use common prefix sharing for time and space saving. In the designed experiments, this is not comparable with the DISTree and DISSparse methods, whose data structures are constructed in descending order of item frequency. The results were unacceptable even on simple and small datasets, showing the necessity of another scan for sorting the items in descending order of their frequencies, as in (Giannella et al. 2003).
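The extra scan discussed above simply reorders each transaction by descending global item frequency before FP-Tree insertion, so that transactions with common frequent items share tree paths. A minimal sketch, with hypothetical names:

```python
from collections import Counter

def sort_transactions(transactions):
    """Second pass over a batch: count global item frequencies, then
    reorder every transaction in descending order of item frequency
    (ties broken by item id) so FP-Tree paths share common prefixes."""
    freq = Counter(item for t in transactions for item in t)
    return [sorted(t, key=lambda i: (-freq[i], i)) for t in transactions]
```

Because every transaction is ordered consistently, any two transactions that share their most frequent items map to the same initial FP-Tree path, which is what keeps the tree compact.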
The emerging pattern mining algorithms cannot be directly used as an efficiency baseline for the DISSparse algorithm, as the essence and concept of EPs differ from those of discriminative itemsets. We report the itemsets with their exact frequencies, whereas EP mining methods report the patterns between borders. JEPs are the patterns whose support is non-zero in one dataset and zero in the other; eJEPs are the minimal JEPs, so JEPs are the more general notion.
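The JEP condition quoted above is straightforward to express; a minimal illustrative check (the function names are ours, not from any EP mining library):

```python
def is_jep(support_d1, support_d2):
    """Jumping emerging pattern for D1: non-zero support in D1 and
    zero support in D2."""
    return support_d1 > 0 and support_d2 == 0

def jeps(pattern_supports):
    """Filter a {pattern: (support_d1, support_d2)} mapping down to
    the jumping emerging patterns of D1."""
    return {p for p, (s1, s2) in pattern_supports.items() if is_jep(s1, s2)}
```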
6.3 TILTED-TIME WINDOW MODEL
In this section we evaluate the H-DISSparse algorithm using data streams modelled as
multiple batches of transactions. The H-DISSparse algorithm is evaluated with different
parameter settings for the highest refined approximate bound in discriminative itemsets in the
tilted-time window model. The data streams made of different numbers of transactions were
generated using the IBM synthetic data generator (Agrawal and Srikant 1994). The main dataset
is generated with 𝑆1 as 𝑇25: 𝐼10: 𝐷320𝐾 and 𝑆2 as 𝑇25: 𝐼15: 𝐷1600𝐾 limited to 1𝐾 unique
items. The data streams are modelled as 32 continuous batches of the same size (i.e., for the sake of clarity), with 𝑇25: 𝐼10: 𝐷10𝐾 and 𝑇25: 𝐼15: 𝐷50𝐾 belonging to the target data stream 𝑆1 and the general data stream 𝑆2, respectively. The ratio between the sizes of 𝑆1 and 𝑆2 is also the same for all 32 batches (i.e., 𝑛2/𝑛1 = 5). It should be noted that in real applications the input batches could have different sizes and characteristics, as data streams have different speeds and characteristics over time. In this section, strategies and principles are recommended for tuning the parameters based on the application domains and the data stream characteristics.
6.3.1 Evaluation on synthetic datasets
The scalability of H-DISSparse is presented with offline updating of the tilted-time window model after processing each batch of transactions. In this section, during all experiments, the discriminative level 𝜃 = 10 and the support threshold 𝜑 = 0.01%, and the scalability of the algorithm is tested with different relaxations of 𝛼. It is assumed that, while the new batch is being loaded with transactions, the H-DISStream updating can be done by processing the current batch of transactions. This works well as long as the algorithm is faster than the rate of the incoming data streams. Table 6.1 shows the number of discriminative itemsets in the tilted-time window model after processing each batch, considering 𝛼 = 1 (i.e., no sub-discriminative itemsets).
Table 6.1 The number of discriminative itemsets in the tilted-time window model
The numbers of discriminative itemsets in the batches (i.e., presented in 𝑊0) are different because of the distributions of the transactions. The embedded knowledge and the trends in data streams change over time with the concept drifts. In Table 6.1 the number of discriminative itemsets in the current window frame 𝑊0 changes dramatically when processing the batches 𝐵1 and 𝐵9, respectively. This strongly affects the algorithm's scalability, as in Figure 6.22 and Figure 6.24. The discriminative itemsets in the larger tilted window frames are usually sparser. The data streams in the larger window frames 𝑊𝑘 (i.e., 𝑘 > 0) have greater lengths, and the discriminative itemsets that appeared with high concept drifts are neutralized. For the sake of clarity, the scalability of the algorithm is first represented by the time and space complexities for processing the batch of transactions (i.e., not considering tilted-time window model updating), as in
Figure 6.22. We used the DISTree algorithm (Seyfi, Geva and Nayak 2014) and DISSparse
algorithm (Seyfi et al. 2017) for processing the batch of transactions as discussed in Chapter 4.
Figure 6.22 Scalability of batch processing not considering the tilted-time window model
updating
The variations in batch processing time and space are caused by the concept drifts in the transaction distribution of the batches. The variations are smaller in the DISSparse algorithm than in the DISTree algorithm. However, the DISSparse algorithm also has high time and space complexities for processing the batches 𝐵1 and 𝐵9, which have a high number of discriminative itemsets.
The tilted-time window model updating time for the algorithm is represented in Figure 6.23. The H-DISSparse algorithm mainly uses less time for the tilted-time window model updating than for batch processing. The fluctuations are mainly due to updating the tilted-time window with the different numbers of discriminative itemsets discovered in the batches. The high growths in the time usage of the algorithm are due to the wide tail pruning in the H-DISStream structure caused by high concept drifts in the old batches; for example, after processing 𝐵3 a large number of non-discriminative itemsets is pruned during the tail pruning process. These itemsets mainly appeared in the tilted-time window model after processing 𝐵1, with its high concept drifts, as in Table 6.1.
Figure 6.23 Tilted-time window model updating time complexity
We ran the full algorithm with the DISTree method for the batch processing, called H-DISTree. Obviously, the H-DISSparse algorithm is more efficient and scalable for batches of transactions with different characteristics. The full time complexity of the H-DISTree and H-DISSparse algorithms is represented in Figure 6.24. The H-DISTree time complexity is highly affected even by small concept drifts (e.g., the batch processing for 𝐵28 is completely out of the tolerable range). For the rest of the experiments in this section, the scalability of the H-DISSparse algorithm is represented by the full time complexity of mining discriminative itemsets in the tilted-time window model.
Figure 6.24 Time complexity of H-DISTree and H-DISSparse algorithms
The H-DISStream size, as the biggest data structure in the designed algorithms, is presented in Figure 6.25. Despite the batches with high concept drifts (e.g., batches of transactions with a large number of discriminative itemsets), the H-DISStream size tends to become stable, with very small growth, as a larger number of batches in the data streams is processed. The high growth in the size of H-DISStream caused by concept drifts is quickly neutralized by processing the new batches and applying tail pruning (e.g., the growth in H-DISStream size caused by 𝐵9 is neutralized by processing the next few batches). Following the compact logarithmic tilted-time window model and applying the tail pruning as in Corollary 4-3, the H-DISStream size stays small as an in-memory data structure. The periodic drops in the size of H-DISStream are caused by merging the tilted window frames. This can be seen more clearly in Figure 6.28 after processing 𝐵8, 𝐵16, 𝐵24, 𝐵27 and 𝐵32.
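The periodic merging behaviour can be illustrated with the classic logarithmic tilted-time window update (Giannella et al. 2003), where frames behave like a binary counter: two frames of the same capacity merge into one frame of double capacity. This is a simplified sketch per itemset count; the exact frame layout used in H-DISStream may differ:

```python
def push_batch(frames, count):
    """Insert the newest batch count into a logarithmic tilted-time
    window.  frames is newest-first; each entry is (capacity, total).
    Two frames of equal capacity merge into one frame of double
    capacity, like a carry in a binary counter, so only O(log n)
    frames are kept over the lifetime of the stream."""
    frames.insert(0, (1, count))
    i = 0
    while i + 1 < len(frames) and frames[i][0] == frames[i + 1][0]:
        cap, c1 = frames.pop(i)      # newer of the two equal frames
        _, c2 = frames[i]            # older frame absorbs it
        frames[i] = (cap * 2, c1 + c2)
    return frames
```

The cascading merges explain the periodic drops in structure size: at batch counts where many carries fire at once, several fine-grained frames collapse into a single coarse frame.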
Figure 6.25 H-DISStream structure size
6.3.1.1 Approximation in discriminative itemsets in the tilted-time window model
In this section the scalability of the H-DISSparse algorithm, as a highly accurate and
highly efficient method for mining discriminative itemsets using the tilted-time window model
with the highest refined approximate bound, is evaluated with different parameter settings. The
H-DISTree algorithm is not scalable for processing large data streams with the highest
approximate bound and cannot be evaluated in this section.
Three corollaries are defined in Chapter 4 for mining discriminative itemsets using the tilted-time window model with the highest refined approximate bound. The relaxation 𝛼 in Corollary 4-2 is set for the highest refined approximate bound on discriminative itemsets in the tilted-time window model. The H-DISSparse time usage and the H-DISStream size are represented in Figure 6.26 for different settings of the relaxation 𝛼 (i.e., 𝛼 = 1, 𝛼 = 0.9 and 𝛼 = 0.75). H-DISSparse scales well with the relaxation 𝛼 = 0.9, with an improvement in approximate discriminative itemsets compared to 𝛼 = 1. The H-DISSparse scalability with a smaller relaxation 𝛼 (e.g., 𝛼 = 0.75) is more sensitive to the concept drifts in the data streams, with higher variations in time and space complexity in Figure 6.26.
Figure 6.26 Scalability of H-DISSparse algorithm by relaxation of 𝛼 = 1, 𝛼 = 0.9 and
𝛼 = 0.75
Figure 6.27 shows the number of sub-discriminative itemsets with the relaxation settings 𝛼 = 0.9 and 𝛼 = 0.75, respectively. The sub-discriminative itemsets are considered overhead for the algorithm and can increase exponentially when a very small relaxation 𝛼 is set (e.g., with 𝛼 = 0.75 the average number of sub-discriminative itemsets in the batches is greater than 1 million).
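A plausible reading of the relaxation is that itemsets whose discriminative ratio falls between 𝛼 · 𝜃 and 𝜃 are kept as sub-discriminative overhead. The exact condition in Corollary 4-2 may differ, so this classification sketch is illustrative only:

```python
def classify(ratio, theta, alpha):
    """Classify an itemset by its discriminative ratio.  With the
    relaxation alpha in (0, 1], itemsets reaching alpha * theta but not
    theta are kept as sub-discriminative overhead; alpha = 1 keeps no
    sub-discriminative itemsets."""
    if ratio >= theta:
        return "discriminative"
    if ratio >= alpha * theta:
        return "sub-discriminative"
    return "pruned"
```

This makes the trade-off visible: a smaller 𝛼 widens the sub-discriminative band, improving the approximate bound at the cost of maintaining many extra itemsets.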
Figure 6.27 Number of sub-discriminative itemsets by relaxation of 𝛼 = 0.9 and 𝛼 = 0.75
Table 6.2 shows the number of discriminative itemsets in the tilted-time window model
by processing each batch considering 𝛼 = 0.9.
Table 6.2 The number of discriminative itemsets in the tilted-time window model
6.3.1.2 Discriminative itemsets in the tilted-time window model without tail pruning
The tail pruning defined in Corollary 4-3 is applied to the H-DISStream structure for space saving, by pruning the least potentially discriminative itemsets in the tilted-time window model. Corollary 4-1 is defined for obtaining the exact frequencies of itemsets from the time they have been maintained in the tilted-time window model. The exact frequencies of the non-discriminative subsets are obtained by traversing the FP-Tree through the Header-Table links for their appearances in the current batch of transactions. Figure 6.28 shows the scalability of the original H-DISSparse algorithm eliminating Corollary 4-1 and Corollary 4-3, respectively. Eliminating Corollary 4-1 adds more time complexity to a few batches (e.g., 𝐵3, 𝐵9, 𝐵11 and
𝐵17), mainly because wrong discriminative itemsets are added to the process. By eliminating Corollary 4-1, the H-DISStream size is much larger than in the original algorithm, as H-DISStream stores many wrong discriminative itemsets. This causes more tail pruning and a higher approximation in the discriminative itemsets in the tilted-time window model. The size of H-DISStream eliminating Corollary 4-1 is hidden under the H-DISStream size eliminating Corollary 4-3 in Figure 6.28.
Figure 6.28 Scalability of H-DISSparse algorithm by eliminating Corollary 4-1 and
Corollary 4-3
The H-DISSparse algorithm shows higher time complexity when eliminating Corollary 4-3, as a consequence of bigger data structures. However, at two points (i.e., after processing 𝐵3 and 𝐵11) the time complexity decreases, caused by eliminating the wide tail pruning of the discriminative itemsets that appeared with concept drifts. The H-DISStream size becomes much bigger over time. The periodic drops in H-DISStream size after processing 𝐵8, 𝐵16, 𝐵24, 𝐵27 and 𝐵32 are caused by merging the larger window frames in the tilted-time window model.
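Corollary 4-3 operates in the spirit of FP-Stream tail pruning (Giannella et al. 2003): the oldest window frames of an itemset are dropped while their counts stay below an error-bounded threshold. The exact condition in the thesis may differ; this is a simplified single-threshold sketch with hypothetical names:

```python
def tail_prune(frame_counts, frame_sizes, eps):
    """Drop the longest run of oldest frames (kept at the end of the
    lists) whose relative frequency stays below the error bound eps.
    Returns the pruned (counts, sizes) pair for one itemset."""
    keep = len(frame_counts)
    while keep > 0 and frame_counts[keep - 1] < eps * frame_sizes[keep - 1]:
        keep -= 1
    return frame_counts[:keep], frame_sizes[:keep]
```

An itemset whose frames are all pruned disappears from the structure entirely, which is what keeps H-DISStream small in memory at the cost of approximate counts for itemsets that later regain support.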
6.3.2 Evaluation on real datasets
To evaluate the proposed H-DISSparse algorithm on real applications, we ran the experiments on real datasets. The susy dataset from the UCI repository (Dheeru and Karra Taniskidou 2017), provided in (Fournier-Viger et al. 2016), was used. The selected dataset is dense (i.e., transactions have values for each attribute), with less sparsity compared to the synthetic market basket datasets. For this reason, we set the parameters to show the scalability at the best scales. The susy dataset contains high-level features derived by physicists to help discriminate between two classes, defined as signal and background. It is related to
particles detected using a particle accelerator, based on Monte Carlo simulations. This dataset is made of five million instances; the first column is the class label, followed by eighteen features. The transactions are made of about one hundred and ninety unique items. We selected 32 batches, each made of fifty thousand instances, for the scale of the experiments. The H-DISSparse algorithm is evaluated with different parameter settings for the highest refined approximate bound on discriminative itemsets in the tilted-time window model.
6.3.2.1 Scalability on datasets with less concept drift in the tilted-time window model
The scalability of H-DISSparse is presented with offline updating of the tilted-time window model after processing each batch of transactions. In this section, during all experiments, the discriminative level 𝜃 = 2 and the support threshold 𝜑 = 1%, and the scalability of the algorithm is tested with different relaxations of 𝛼. It is assumed that, while the new batch is being loaded with transactions, the H-DISStream updating can be done by processing the current batch of transactions. This works well as long as the algorithm is faster than the rate of the incoming data streams. Table 6.3 shows the number of discriminative itemsets in the tilted-time window model after processing each batch, considering 𝛼 = 1 (i.e., no sub-discriminative itemsets).
Table 6.3 The number of discriminative itemsets in the tilted-time window model
The numbers of discriminative itemsets in the batches (i.e., presented in 𝑊0) are different because of the distributions of the transactions. The embedded knowledge and the trends in these data streams do not change much over time through concept drifts. In Table 6.3 the number of discriminative itemsets in the current window frame 𝑊0 does not change greatly. The discriminative itemsets in the larger tilted window frames are in almost the same numbers as in the smaller tilted window frames. In the susy dataset the discriminative itemsets in the adjacent tilted window frames are similar, and the same discriminative itemsets are discovered in the larger merged tilted window frames. The discriminative itemsets that appeared in adjacent smaller frames are
merged as new discriminative itemsets in the larger window frames. The data streams in the larger window frames 𝑊𝑘 (i.e., 𝑘 > 0) have greater lengths.
For the sake of clarity, the scalability of the algorithm is first represented by time and
space complexities for processing the batch of transactions (i.e., not considering tilted-time
window model updating) as in Figure 6.29. We used the DISTree algorithm (Seyfi, Geva and
Nayak 2014) and DISSparse algorithm (Seyfi et al. 2017) for processing the batch of transactions
as discussed in Chapter 4.
Figure 6.29 Scalability of batch processing not considering the tilted-time window model
updating
The small variations in batch processing time and space are caused by the differences in the number of discriminative itemsets in the batches. The algorithm mainly uses less time for the tilted-time window model updating than for the batch processing. In the tilted-time window updating there are no fluctuations, mainly because of the similar numbers of discriminative itemsets discovered in the continuous batches. There is no wide tail pruning, in contrast to the experiments with the synthetic datasets, as this real dataset does not have high concept drifts in the continuous batches.
We ran the full algorithm with the DISTree method for the batch processing, called H-DISTree. Obviously, the H-DISSparse algorithm is more efficient and scalable for batches of transactions with different characteristics. The full time complexity of the H-DISTree and H-DISSparse algorithms is represented in Figure 6.30. The H-DISTree time complexity is affected even by small concept drifts. For the rest of the experiments in this section, the scalability of the H-DISSparse algorithm is represented by the full time complexity of mining discriminative itemsets in the tilted-time window model.
Figure 6.30 Time complexity of H-DISTree and H-DISSparse algorithms
The H-DISStream size, as the biggest data structure in the designed algorithms, is presented in Figure 6.31. Following the compact logarithmic tilted-time window model and applying the tail pruning as in Corollary 4-3, the H-DISStream size stays small as an in-memory data structure. The periodic drops in the size of H-DISStream are caused by merging the tilted window frames. This can be seen clearly after processing 𝐵8, 𝐵16, 𝐵24, 𝐵27 and 𝐵32.
Figure 6.31 H-DISStream structure size
6.3.2.2 Approximation in discriminative itemsets in the tilted-time window model
In this section the scalability of the H-DISSparse algorithm, as a highly accurate and
highly efficient method for mining discriminative itemsets using the tilted-time window model
with the highest refined approximate bound, is evaluated with different parameter settings. The
H-DISTree algorithm is not scalable for processing large data streams with the highest
approximate bound and cannot be evaluated in this section.
Three corollaries are defined in Chapter 4 for mining discriminative itemsets using the tilted-time window model with the highest refined approximate bound. The relaxation 𝛼 in Corollary 4-2 is set for the highest refined approximate bound on discriminative itemsets in the tilted-time window model. The H-DISSparse time usage and the H-DISStream size are represented in Figure 6.32 for different settings of the relaxation 𝛼 (i.e., 𝛼 = 1 and 𝛼 = 0.9). The H-DISSparse time and space usage roughly doubles with the relaxation 𝛼 = 0.9, with an improvement in approximate discriminative itemsets compared to 𝛼 = 1.
Figure 6.32 Scalability of H-DISSparse algorithm by relaxation of 𝛼 = 1 and 𝛼 = 0.9
Figure 6.33 shows the number of sub-discriminative itemsets with the relaxation setting 𝛼 = 0.9. The sub-discriminative itemsets are considered overhead for the algorithm and can increase exponentially when a very small relaxation 𝛼 is set.
Figure 6.33 Number of sub-discriminative itemsets by relaxation of 𝛼 = 0.9
Table 6.4 shows the number of discriminative itemsets in the tilted-time window model
by processing each batch considering 𝛼 = 0.9.
Table 6.4 The number of discriminative itemsets in the tilted-time window model
The improvements in the algorithm's accuracy and recall can be seen by comparing Table 6.3 and Table 6.4. In the susy dataset the discriminative itemsets in the adjacent tilted window frames are similar, and by using a smaller relaxation 𝛼, more discriminative itemsets are discovered in the merged tilted window frames.
6.3.3 Discussion
The general outcome from the experiments on the H-DISSparse algorithm discussed in the previous sub-sections is explained here. The H-DISSparse algorithm with the defined corollaries exhibits efficient time and space complexity for mining discriminative itemsets using the tilted-time
window model. The highest refined approximate bound on discriminative itemsets in the tilted-time window model is obtained efficiently, based on Corollary 4-1, Corollary 4-2 and Corollary 4-3, with the smaller relaxation 𝛼. Setting the relaxation 𝛼 together with the other parameters (i.e., the support threshold 𝜑, the discriminative level 𝜃 and the input batch size 𝑛1) is very important in real applications. A proper size has to be chosen for the current window frame (i.e., 𝑊0) so that the tilted-time window model is updated at reasonable time intervals. This is highly dependent on the application and domain experts, considering the limited computing and storage capabilities and the approximate bound on the false positive discriminative itemsets.
The changes in trend caused by the concept drifts in the batches are neutralized quickly in the tilted-time window model, and the in-memory H-DISStream structure is maintained efficiently during the lifetime of the data streams. The in-memory H-DISStream without tail pruning (i.e., Corollary 4-3) is not efficient, even considering a compact logarithmic tilted-time window model. The FP-Tree structure made of the transactions in one batch, considering all items, is used efficiently in the H-DISSparse algorithm. It stays an efficient in-memory structure and does not affect the algorithm's scalability, as in the batch processing the infrequent items are pruned in the conditional FP-Tree. The H-DISTree algorithm is more sensitive to the concept drifts and is not efficient even without considering sub-discriminative itemsets (i.e., relaxation 𝛼 = 1). The main part of the time and space complexity in the algorithms is related to the batch processing, although the tail pruning of the non-discriminative itemsets that appeared in the old batches can add complexity to the algorithms as well.
6.4 SLIDING WINDOW MODEL
In this section we evaluate the S-DISSparse algorithm using data streams modelled as
multiple batches of transactions. The S-DISSparse algorithm is evaluated with different datasets
and parameter settings for mining exact discriminative itemsets in the sliding window model by
the offline updating state. The S-DISSparse algorithm is evaluated with different parameter
settings for the highest refined approximate bound in discriminative itemsets in the sliding
window model by the online updating state. The data streams made of different numbers of
transactions were generated using the IBM synthetic data generator (Agrawal and Srikant 1994).
The first dataset, called 𝐷1, is generated with 𝑆1 as 𝑇25: 𝐼10: 𝐷60𝐾 and 𝑆2 as 𝑇25: 𝐼15: 𝐷300𝐾, limited to 10𝐾 unique items. To show the unsynchronized behaviour of the data streams in the online sliding window model, we wrote a simple program to mix the two data streams together based on their size ratio. The data streams in 𝐷1 are modelled as 30 continuous batches of the same size (i.e., for the sake of clarity), with 𝑇25: 𝐼10: 𝐷2𝐾 and 𝑇25: 𝐼15: 𝐷10𝐾 belonging to the target data stream 𝑆1 and the general data stream 𝑆2, respectively. The ratio between the sizes of 𝑆1 and 𝑆2 in 𝐷1 is the same for all 30 batches (i.e., 𝑛2/𝑛1 = 5).
In the designed experiment we consider that the data streams have the same speeds in the specified time periods and that the batches contain an equal number of transactions, for the sake of simplicity. It should be noted that in real applications the input batches could have different sizes and characteristics, as the data streams have different speeds and characteristics over time. The second dataset, called 𝐷2, is generated with 𝑆1 as 𝑇25: 𝐼10: 𝐷90𝐾 and 𝑆2 as 𝑇25: 𝐼15: 𝐷450𝐾, limited to 10𝐾 unique items. The data streams in 𝐷2 are modelled as 18 continuous batches of the same size, with 𝑇25: 𝐼10: 𝐷5𝐾 and 𝑇25: 𝐼15: 𝐷25𝐾 belonging to the target data stream 𝑆1 and the general data stream 𝑆2, respectively. The ratio between the sizes of 𝑆1 and 𝑆2 in 𝐷2 is the same for all 18 batches (i.e., 𝑛2/𝑛1 = 5). In this section, strategies and principles are recommended for tuning the parameters based on the application domains and the data stream characteristics.
6.4.1 Evaluation on synthetic datasets
The scalability of the S-DISSparse algorithm is presented within the offline and online sliding window models. The time complexity of S-DISSparse is presented by transaction processing in the online state and by full-size partition processing in the offline state. In the proposed S-DISSparse algorithm, each batch of input transactions is set as a partition in the sliding window frame 𝑊. The scalability of S-DISSparse is presented within offline sliding of the window frame 𝑊 by the recent batch of transactions, and within online sliding of the window frame 𝑊 by every recent transaction. In this section, during all experiments, the discriminative level 𝜃 = 25 and the support threshold 𝜑 = 0.002%. In the experiments with 𝐷1, the 25 recent partitions are fitted in the sliding window frame 𝑊 (i.e., 𝑊 = 25). It is assumed that, while the new batch is being loaded with transactions, the S-DISStream updating can be done by processing the current batch of transactions. This works well as long as the algorithm is faster than the rate of the incoming data streams.
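The offline sliding step described above (add the recent partition, delete the oldest once the frame is full) can be sketched with a bounded deque; the class and names are illustrative, not the S-DISSparse data structures:

```python
from collections import deque

class SlidingWindow:
    """Partition-based sliding window frame W: each new batch of
    transactions becomes one partition; once W partitions are held,
    every slide adds the recent partition and deletes the oldest."""
    def __init__(self, w):
        self.partitions = deque(maxlen=w)

    def slide(self, batch):
        evicted = None
        if len(self.partitions) == self.partitions.maxlen:
            evicted = self.partitions[0]   # oldest partition leaves W
        self.partitions.append(batch)      # recent partition enters W
        return evicted
```

Returning the evicted partition mirrors the offline update: the mined structures must subtract the oldest partition's contribution while adding the recent partition's, which is why slides that both add and delete (𝑃26 to 𝑃30 below) cost more than slides that only add.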
6.4.1.1 Offline sliding discriminative itemsets
The time complexity of the S-DISSparse method in the offline updating sliding window model, by each partition in 𝐷1, is presented in Figure 6.34. The sliding window frame 𝑊 is initialized with the first 20 partitions in 𝐷1, for the sake of clarity. This is recommended, as a large number of discriminative itemsets may be discovered by processing the initial partitions with small data stream lengths in the sliding window frame 𝑊. The sliding window model is then updated in an offline state by every new partition that is fitted in the window frame 𝑊. The scalability of the S-DISSparse algorithm is compared with the DISSparse algorithm (Seyfi et al. 2017) proposed in Chapter 3 for mining discriminative itemsets in a batch of transactions. The DISSparse algorithm is used as a benchmark by processing the full-size window frame 𝑊 after each offline window model sliding.
Figure 6.34 S-DISSparse and DISSparse time complexity for 𝐷1 (window frame 𝑊 = 25)
The efficiency of S-DISSparse is generally better than that of DISSparse in the offline sliding window model. The difference is clearest during the offline slide at partition 𝑃23 (i.e., 450 seconds for the DISSparse method). During the slide at this partition, the window frame 𝑊 is updated only by adding the most recent partition. Offline sliding at partitions 𝑃26 to 𝑃30 using S-DISSparse is less efficient than at partitions 𝑃21 to 𝑃25, because during these slides the window frame 𝑊 is updated both by adding the most recent partition and by deleting the oldest one.
For some partitions, offline window sliding with S-DISSparse shows no efficiency gain over DISSparse; for example, the offline slides at partitions 𝑃26 and 𝑃30 show similar time usage for both algorithms. At these points S-DISSparse scales the same as DISSparse, since most of the transactions in the sliding window model are updated during these slides. The variation in the time complexity of the algorithms across partitions is caused by concept drifts in the transaction distributions of the different batches.
The space complexity of the S-DISSparse algorithm in the offline updating sliding window model, measured per partition, is presented in Figure 6.35. The space usage of S-DISStream and the S-FP-Tree is reported during offline sliding, as these are the largest data structures used by the S-DISSparse algorithm. The space used by batch processing itself (i.e., during each offline slide) is not monitored, as it is much smaller than the size of these structures. The S-FP-Tree and S-DISStream are held in main memory throughout the sliding window model. The time for building the S-FP-Tree is small and is included in the total processing time.
Figure 6.35 S-DISSparse space complexity in offline sliding for 𝐷1 (𝑊 = 25)
The S-DISStream structure is the largest data structure in the designed algorithm. Despite batches with high concept drift (e.g., a batch of transactions that adds a large number of discriminative itemsets to the sliding window frame 𝑊), the S-DISStream size tends to stabilize as the window slides over a larger number of batches. Owing to its compact prefix-tree structure and the applied tail pruning, S-DISStream stays small as an in-memory data structure. The number of discriminative itemsets discovered in the sliding window frame 𝑊 in these experiments is between 2.5 and 3 million (see Figure 6.41), across the different transaction distributions observed while the window slides.
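The idea of tail pruning on a prefix tree can be sketched as below. This is a hypothetical simplification: the thesis's S-DISStream applies a comparable rule to bound its in-memory size, while the exact pruning conditions are defined elsewhere in the thesis.

```python
def tail_prune(node, min_count):
    """Recursively drop prefix-tree entries whose counts fell below min_count.

    A node with surviving children is kept even when its own count is low,
    so shared internal prefixes stay intact; only low-count tails vanish.
    Each node is a dict: {"count": int, "children": {item: node}}.
    Returns True when the node should be kept by its parent.
    """
    node["children"] = {
        item: child for item, child in node["children"].items()
        if tail_prune(child, min_count)
    }
    return bool(node["children"]) or node.get("count", 0) >= min_count
```

Pruning only at the tails is what lets the structure shrink without losing the prefixes that frequent itemsets still share.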
To show the effect of a different window size 𝑊, we ran the S-DISSparse algorithm on the second dataset 𝐷2, which has larger partitions. The sliding window frame 𝑊 is set to the same total number of transactions as in the first experiment (i.e., 𝑊 = 10). Following the first experiment, and for clarity, the sliding window frame 𝑊 is initialized with the first 8 partitions of 𝐷2. The time and space complexity of the S-DISSparse method in the offline updating sliding window model, measured per partition of 𝐷2, is presented in Figure 6.36.
Figure 6.36 S-DISSparse and DISSparse time and space complexity for 𝐷2 (window frame
𝑊 = 10)
In this setting the S-DISSparse algorithm shows no efficiency gain over DISSparse in the offline sliding window model, because the larger partitions update a greater share of the transactions in the sliding window frame 𝑊. This must be considered when choosing a suitable sliding window frame size for the application domain and data stream characteristics: the sliding window frame should be set much larger than the average size of a single partition.
We have conducted experiments with several datasets and with data streams of different characteristics. The DISSparse algorithm (Seyfi et al. 2017) proposed in Chapter 3 is highly efficient and scales well when the sliding window frame 𝑊 is small relative to the average size of a single partition. S-DISSparse scales well when the sliding window frame 𝑊 is much larger than the average partition and when a large number of discriminative itemsets with small supports and ratios are of interest. S-DISSparse is evaluated for mining discriminative itemsets using the online sliding window model in the section below.
6.4.1.2 Online sliding discriminative itemsets
The scalability of the algorithm is tested with different relaxations 𝛼 in the online sliding of the window frame 𝑊. The time complexity of the S-DISSparse method under offline and online processing is presented in Figure 6.37. Online sliding does not add high time complexity to the S-DISSparse algorithm, and it even decreases the overall time complexity for some partitions; for example, the time complexity of the online sliding window model is lower than that of the offline sliding window model at partition 𝑃21. The non-discriminative subsets (of discriminative itemsets) in S-DISStream are updated during online sliding, which reduces the time S-DISSparse later spends tuning the frequencies of these subsets in S-DISStream.
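The role of the relaxation 𝛼 can be sketched as a three-way classification. This is a hypothetical helper under the assumption, consistent with the experiments, that sub-discriminative itemsets are those whose ratio clears 𝛼·𝜃 but not 𝜃; with 𝛼 = 1 the middle band is empty and no sub-discriminative itemsets are kept.

```python
def classify(support_ratio, theta=25, alpha=0.9):
    """Classify an itemset's frequency ratio against the discriminative
    level theta and its relaxation alpha (sketch; defaults follow the
    D1 experiments). Smaller alpha keeps more sub-discriminative
    itemsets in S-DISStream to improve the online approximation."""
    if support_ratio >= theta:
        return "discriminative"
    if support_ratio >= alpha * theta:
        return "sub-discriminative"
    return "non-discriminative"
```

For instance, with 𝜃 = 25 and 𝛼 = 0.9, a ratio of 23 falls in the sub-discriminative band (23 ≥ 22.5), whereas with 𝛼 = 1 it would be discarded.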
Figure 6.37 S-DISSparse time complexity in online and offline sliding for 𝐷1 (𝑊 = 25)
To improve the approximation of the discriminative itemsets in the online updating sliding window model, we set the relaxation to 𝛼 = 0.9 and 𝛼 = 0.75, respectively. The S-DISSparse time complexity during offline sliding, online sliding with 𝛼 = 1 (i.e., no sub-discriminative itemsets), and online sliding with 𝛼 = 0.9 and 𝛼 = 0.75 is shown in Figure 6.38. With 𝛼 = 0.75, the S-DISSparse time complexity is more sensitive to concept drifts in the data streams; for example, it increases with higher variation at the window slides for partitions 𝑃22 and 𝑃26.
Figure 6.38 S-DISSparse time complexity for online and offline sliding for 𝐷1 (𝑊 = 25)
with different relaxation of 𝛼
The space usage of S-DISStream during offline sliding with 𝛼 = 1, 𝛼 = 0.9 and 𝛼 = 0.75 in the S-DISSparse algorithm is shown in Figure 6.39. The size of the S-FP-Tree structure is not affected by the relaxation 𝛼, since the S-FP-Tree holds all items (including items that are infrequent in the sliding window frame 𝑊). The S-DISStream size increases with smaller relaxations 𝛼, with small variations caused by concept drifts in the data streams as the window frame 𝑊 slides. A greater number of sub-discriminative itemsets must be saved in the S-DISStream structure so that they can improve the approximation of discriminative itemsets in the online sliding window model. Note that the Transaction-List structure used in the online sliding window model is small compared to the S-FP-Tree and S-DISStream and is not presented, for the sake of clarity.
Figure 6.39 S-DISStream size for 𝐷1 (𝑊 = 25) by different relaxation of 𝛼
The online sliding window model is used for more up-to-date and accurate online answers. The number of itemsets whose tag changes during online sliding (e.g., from discriminative to non-discriminative or vice versa) is presented in Figure 6.40. This number increases with smaller relaxations 𝛼 (i.e., 𝛼 = 0.9 and 𝛼 = 0.75). However, choosing a smaller relaxation 𝛼 does not always improve the approximation of discriminative itemsets in the online sliding window model: depending on the concept drifts in the data streams, new discriminative itemsets may be discovered in the sliding window frame 𝑊 that do not yet exist in S-DISStream. A reasonably small relaxation 𝛼, chosen on the basis of the application domain and data stream characteristics, therefore gives the better approximation of discriminative itemsets in the online sliding window model.
Figure 6.40 Number of itemsets whose tag changed for 𝐷1 (𝑊 = 25) by different relaxation of 𝛼
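Counting the itemsets whose tag flips between two window states can be sketched as follows. This is a hypothetical helper for illustration; the thesis tracks these changes inside its own structures rather than in plain dictionaries.

```python
def count_tag_changes(before, after):
    """Count itemsets whose discriminative tag flipped during an online
    slide, in either direction. before/after map each itemset to True
    when it is tagged discriminative; only itemsets present in both
    states are compared (sketch, not the thesis code)."""
    return sum(1 for itemset in before.keys() & after.keys()
               if before[itemset] != after[itemset])
```

This is the quantity plotted per partition in the tag-change experiments; newly appearing or pruned itemsets would need separate handling.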
The exact number of discriminative itemsets after each offline slide of the window model is shown in Figure 6.41, together with the number of sub-discriminative itemsets for different relaxations 𝛼 (i.e., 𝛼 = 0.9 and 𝛼 = 0.75). This also shows that a smaller relaxation 𝛼 may not always improve the approximation of discriminative itemsets in the online sliding window model. As in Figure 6.41, with the smaller relaxation 𝛼 = 0.75 the number of sub-discriminative itemsets grows even larger than the number of discriminative itemsets, which adds overhead and hurts the algorithm's scalability.
Figure 6.41 Number of discriminative and sub-discriminative itemsets for 𝐷1 (𝑊 = 25) by
different relaxation of 𝛼
6.4.2 Evaluation on real datasets
To evaluate the proposed S-DISSparse algorithm on real applications, we ran experiments on real datasets. The susy dataset from the UCI repository (Dheeru and Karra Taniskidou 2017), as provided in (Fournier-Viger et al. 2016), was used. This dataset is dense (i.e., every transaction has a value for each attribute), with less sparsity than the synthetic market basket datasets; for this reason we set the parameters to show scalability at the most favourable scales. The susy dataset consists of high-level physics features designed to help discriminate between two classes, signal and background, for particles detected with a particle accelerator, based on Monte Carlo simulations. This dataset is
made of five million instances; the first column is the class label, followed by eighteen features. The transactions contain about one hundred and ninety unique items. For the scale of these experiments, we selected 25 batches of twenty thousand instances each. The S-DISSparse algorithm is evaluated with different parameter settings for mining discriminative itemsets in the offline and online sliding window models. Throughout the experiments in this section, the discriminative level is 𝜃 = 2.5 and the support threshold is 𝜑 = 1%. In the experiments with the susy dataset, the 20 most recent partitions fit in the sliding window frame 𝑊 (i.e., 𝑊 = 20).
The time complexity of the S-DISSparse method in the offline updating sliding window model, measured per partition of the susy dataset, is presented in Figure 6.42. For clarity, the sliding window frame 𝑊 is initialized with the first 15 partitions of the susy dataset. This is recommended because a large number of discriminative itemsets may be discovered when processing the initial partitions, while the data stream lengths in the sliding window frame 𝑊 are still small. The sliding window model is then updated in an offline state by every new partition that enters the window frame 𝑊. The scalability of the S-DISSparse algorithm is compared with the DISSparse algorithm (Seyfi et al. 2017) proposed in Chapter 3 for mining discriminative itemsets in a batch of transactions. DISSparse is used as a benchmark by processing the full-size window frame 𝑊 after each offline slide of the window model.
Figure 6.42 S-DISSparse and DISSparse time complexity for susy dataset (window frame
𝑊 = 20)
On the susy dataset, the efficiency of S-DISSparse is not better than that of DISSparse in the offline sliding window model. This is mainly because of the small number of unique items in the transactions (i.e., 190 unique items in the susy dataset). The sparsity of the dataset is limited, so the subsets are updated by every new batch entering, or old batch leaving, the sliding window model.
The space complexity of the S-DISSparse algorithm in the offline updating sliding window model, measured per partition, is presented in Figure 6.43. The space usage of S-DISStream and the S-FP-Tree is reported during offline sliding, as these are the largest data structures used by the S-DISSparse algorithm. The space used by batch processing itself (i.e., during each offline slide) is not monitored, as it is much smaller than the size of these structures.
The S-FP-Tree and S-DISStream structures are held in main memory throughout the sliding window model.
Figure 6.43 S-DISSparse space complexity in offline sliding for susy dataset (𝑊 = 20)
S-DISSparse is also evaluated for mining discriminative itemsets using the online sliding window model. The scalability of the algorithm is tested with different relaxations 𝛼 in the online sliding of the window frame 𝑊; online sliding does not add high time complexity to the S-DISSparse algorithm. To improve the approximation of the discriminative itemsets in the online updating sliding window model, we set the relaxation to 𝛼 = 0.9. The S-DISSparse time complexity during offline sliding, online sliding with 𝛼 = 1 (i.e., no sub-discriminative itemsets) and online sliding with 𝛼 = 0.9 is shown in Figure 6.44.
Figure 6.44 S-DISSparse time complexity in online and offline sliding for susy dataset
(𝑊 = 20)
The space usage of S-DISStream during offline sliding with 𝛼 = 1 and 𝛼 = 0.9 in the S-DISSparse algorithm is shown in Figure 6.45. The size of the S-FP-Tree structure is not affected by the relaxation 𝛼, since the S-FP-Tree holds all items (including items that are infrequent in the sliding window frame 𝑊). The S-DISStream size increases slightly with the smaller relaxation 𝛼 as the window frame 𝑊 slides. A greater number of sub-discriminative itemsets must be saved in the S-DISStream structure so that they can improve the approximation of discriminative itemsets in the online sliding window model. Note that the Transaction-List structure used in the online sliding window model is small compared to the S-FP-Tree and S-DISStream and is not presented, for the sake of clarity.
Figure 6.45 S-DISStream size for susy dataset (𝑊 = 20) by different relaxation of 𝛼
The number of itemsets whose tag changes during online sliding (e.g., from discriminative to non-discriminative or vice versa) is presented in Figure 6.46. This number increases with the smaller relaxation 𝛼 (i.e., 𝛼 = 0.9).
Figure 6.46 Number of itemsets whose tag changed for susy dataset (𝑊 = 20) by different relaxation of 𝛼
The exact number of discriminative itemsets after each offline slide of the window model is shown in Figure 6.47, together with the number of sub-discriminative itemsets for the relaxation 𝛼 = 0.9.
Figure 6.47 Number of discriminative and sub-discriminative itemsets for susy dataset
(𝑊 = 20) by different relaxation of 𝛼
In the section below we discuss the outcomes of the evaluation of the proposed S-DISSparse algorithm.
6.4.3 Discussion
The general outcomes of the experiments on the S-DISSparse algorithm discussed in the previous sub-sections are summarized here. In this section we evaluated the proposed single-pass algorithm for mining discriminative itemsets in data streams using the sliding window model. The algorithm uses three in-memory data structures, S-FP-Tree, S-DISStream and Transaction-List, for offline and online sliding respectively. The discriminative itemsets are discovered in the offline state, and the process continues in the online state for each new transaction. Online sliding by new transactions, combined with periodic offline sliding by new partitions, results in efficient memory consumption and time complexity, with good approximation for large and fast-growing data streams. The number of discriminative itemsets generated is significantly smaller than the number of frequent itemsets, which makes discriminative itemsets more useful for data stream discrimination.
By setting the relaxation 𝛼 to smaller values, the algorithm also saves the sub-discriminative itemsets, at the cost of increased time and space complexity. Sub-discriminative itemsets have the potential to become discriminative in the full-size sliding window model, so a smaller relaxation 𝛼 yields a better approximation in the online sliding window frame. The relaxation 𝛼 should be set to a value at which the increase in time and space complexity remains acceptable for the data stream application domain.
Based on the reported experiments, the S-DISStream size may grow over the history of the data streams. In the batch processing experiments of Section 6.2 we discussed in detail how to set optimal batch sizes. This is very important for algorithms based on the sliding window model, as the exact discriminative itemsets should be updated and reported within reasonable time periods. In our experiments we used a homogeneous set of batches in the data streams; in real applications, data streams have varying speeds and distributions over time, resulting in batches of different sizes and complexities. This must be considered when defining batch sizes and parameter settings. S-DISSparse is an efficient method for mining discriminative itemsets using the sliding window model in large datasets with high complexity. The discriminative itemset mining process is highly dependent on the type of dataset and on how the itemsets are distributed in the streams.
The sliding window model is updated in both offline and online states. The usability of online sliding depends strongly on the concept drifts in the input data streams and on their input rate. Considering the incoming partitions, the window frame is updated in the online state with high performance if adjacent partitions exhibit little concept drift. In data streams with high concept drift, the sliding window frame is mainly updated through the offline sliding state, and the online sliding state adds overhead to the mining process. The sliding window frame size, as well as the number and size of partitions, can be defined based on the specific characteristics of the input datasets and on domain expert knowledge.
6.5 CHAPTER SUMMARY
The experiments conducted in this chapter show that mining discriminative itemsets is practical for large and fast-growing data streams. Large numbers of discriminative itemsets with small minimum supports and frequency ratios can be discovered efficiently.
The DISTree and DISSparse algorithms were proposed for mining discriminative itemsets in a batch of transactions from data streams. These algorithms perform well when discriminative itemsets with higher minimum supports and frequency ratios are of interest. The DISSparse algorithm remains highly efficient when mining large numbers of discriminative itemsets with smaller minimum supports and frequency ratios, whereas the DISTree algorithm fails on larger, more complex datasets when an exponential number of discriminative itemsets must be discovered under the chosen parameter settings. The FP-Tree is the data structure used in both algorithms for holding the input transactions; it may become large and cause overhead when processing a large, complex batch of transactions. The time and space complexity of both algorithms varies across batches with the distribution of transactions, although DISSparse is less affected than DISTree. The efficiency of the DISSparse algorithm is closely tied to the proposed heuristics for mining discriminative itemsets from the potential discriminative subsets.
The most important factor in the proposed batch processing algorithms is the size of each batch of transactions. The proposed algorithms are used for offline updating of the different window models, and in data stream applications the update intervals for the output results must not incur high delay. Both algorithms have full accuracy and recall when mining the discriminative itemsets in a batch of transactions, whereas the offline updating of a window model can introduce approximation in the discriminative itemsets. The approximation can be improved by setting a larger batch size for the offline updating of the window models; this must be weighed against the desired outputs and the algorithms' complexity within the available memory and CPU limits.
The H-DISSparse algorithm was proposed for mining discriminative itemsets in data streams using the tilted-time window model. H-DISSparse uses the efficient DISSparse method for offline batch processing and can efficiently handle large and fast-growing data streams. The tilted-time updating window model in H-DISSparse keeps the window model small, owing to its compact logarithmic structure. The effects of high concept drift in the data streams are neutralized in the tilted-time window model by merging the smaller window frames into the larger window frames. The H-DISStream in-memory structure stays small over the history of the data streams by applying the proposed tail pruning technique. A larger number of sub-discriminative itemsets improves the approximation when mining discriminative itemsets using the tilted-time window model. However, adding more sub-discriminative itemsets to the tilted-time window model may add high overheads, both in batch processing and in tilted-time window updating.
The S-DISSparse algorithm was proposed for mining discriminative itemsets in data streams using the sliding window model. S-DISSparse uses the DISSparse method with new heuristics for the offline sliding window model. These heuristics, based on the recently updated subsets, are efficient when the sliding window frame is much larger than each partition entering or leaving it. The S-DISStream in-memory data structure stays small as the window slides. The online sliding window model reports the updated discriminative itemsets and the changes in itemset tags. A larger number of sub-discriminative itemsets improves the approximation when mining discriminative itemsets using the online window model. However, adding more sub-discriminative itemsets to the sliding window model may add high overheads to the offline sliding part, and a larger number of sub-discriminative itemsets may not improve the approximation in the online sliding window model, since recently emerged discriminative itemsets may not yet exist in the S-DISStream data structure.
An interesting characteristic of mining discriminative itemsets in data streams is that a large number of discriminative itemsets can be discovered efficiently. These itemsets can be used in different real-world applications such as classification, optimization and decision making. The biggest challenge is to find the best parameter settings given the CPU and memory limits and the desired outputs. The principles proposed in this chapter, together with the knowledge of application domain experts, can be used to set parameters for the different window models. Concept drifts in the data streams must be taken into account when setting the smaller parameters.
In this chapter, data streams were generated and modelled using a synthetic data generator, and real data streams were modelled using datasets obtained from the UCI repository. Other real data streams may pose extra challenges and impose additional limits on the experiments, depending on the application domain. In the next chapter, we conclude the thesis on mining discriminative itemsets in data streams using different window models and discuss future work.
Chapter 7: Conclusions Page 178
© 2018 Queensland University of Technology-QUT, Science and Engineering Faculty Page 178
Chapter 7: Conclusions
Pattern mining is one of the most interesting research topics in data mining. In data streams, the patterns in the target datasets are analysed over specified time periods. We extended frequent itemset mining techniques to the new area of discriminative itemset mining. Discriminative itemsets show the distinguishing features of the target data stream in comparison to the general trends in the other data streams, and their number is generally much smaller than the number of frequent itemsets. Discriminative itemsets aim to characterize the target data stream distinctly from the rest of the streams in the collection. In this thesis, we proposed four algorithms for finding discriminative itemsets in data streams based on two different window models. We defined several in-memory data structures with effective pruning processes and heuristics for saving time and space; all structures generated and used during the mining process are designed to consume minimal time and space. The proposed methods have been extensively evaluated with datasets exhibiting distinct characteristics, and the data structures generated during the process fit in main memory. The results confirm that mining discriminative itemsets is realistic in fast-growing data streams.
First, we carried out extensive research on the importance of discriminative itemsets in data streams. We surveyed the different algorithms proposed for frequent itemset mining in data streams, followed by contrast data mining methods and association classification methods. We then defined the problem of mining discriminative itemsets in data streams: itemsets that are frequent in one data stream and whose frequencies in that stream are much higher than in the other data streams. To address this, we proposed a simple algorithm for mining discriminative itemsets in a single batch of the data streams, followed by a heuristic-based algorithm for mining them efficiently. We then extended the method to mining discriminative itemsets in data streams under the tilted-time window model, in which the discriminative itemsets discovered in each batch of transactions are merged periodically. The scalability of the proposed methods was analysed with different datasets and parameter settings, showing acceptable time and space usage on fast, large data streams, especially for the heuristic-based method.
Following the proposed methods for mining discriminative itemsets in a single batch of transactions, we proposed a further algorithm for mining discriminative itemsets in data streams over the sliding window model. The proposed method was analysed using different datasets and multiple parameter settings. Discriminative itemset mining poses more challenges than frequent itemset mining, especially in the sliding window model: during window frame sliding, the algorithms must handle the combinatorial explosion of itemsets in the data streams entering and leaving the window frame. Novel in-memory data structures are defined for processing discriminative itemsets in a combined offline and online sliding window model. We used offline processing to control the generation of non-potential itemsets, and online processing for online monitoring of the data streams. The empirical analysis shows that the proposed algorithm achieves efficient time and space complexity on online data streams growing at fast speed. In the future, we plan to develop methods for discovering discriminative rules using discriminative itemsets, with the aim of proposing a classifier focused on the distinguishing features of data streams.
This chapter concludes the thesis. First, the contributions of the thesis are summarized. Then, the findings drawn from the thesis are described. Finally, the limitations of the current work and directions for future work are presented. We will pursue the research problem in future by defining discriminative rules, which will be used to propose a novel classification technique called discriminative classification.
7.1 SUMMARY OF CONTRIBUTIONS
This thesis integrates the concept of discriminative itemsets with data streams. It extends the research in contrast data mining to discriminative itemset mining. The proposed methods focus on overcoming the weaknesses of the existing state-of-the-art methods for discriminative itemset mining in three types of tasks. The thesis also shows the importance of optimized parameter settings in these three types of tasks. The research shortcomings are as follows:
• Lack of research in contrast data mining for discriminative itemset mining
• Lack of a method for discriminative itemset mining in data streams
• Lack of an efficient method for mining discriminative itemsets in the tilted-time window model
• Lack of an efficient method for mining discriminative itemsets in the sliding window model
• Lack of an efficient discriminative itemset mining method for different window models, optimized for the general and specific characteristics of the target data stream trends compared to the other data streams
The above-mentioned shortcomings are overcome by:
• Carrying out extensive research on contrast data mining for discriminative itemset mining
• Employing a simple method of discriminative itemset mining
• Employing an advanced heuristic-based method for discriminative itemset mining
• Employing an efficient method for discriminative itemset mining in data streams using the tilted-time window model
• Employing an efficient method for discriminative itemset mining in data streams using the sliding window model
• Extensive evaluation of the proposed methods using different datasets and parameter settings
The main contributions of this thesis are summarised below:
Developing extensive research to show the importance of discriminative
itemsets in data streams in the real applications.
o Extensive research in different frequent itemset mining methods in data
streams and contrast data mining methods is done. The importance of
the discriminative itemset and its superiority to the frequent itemset is
discussed, and the application of discriminative itemset for the
definition of the discriminative rule is provided.
Developing a simple method of discriminative itemset mining
o After defining the concept of discriminative itemset mining in data
streams, we proposed a simple algorithm for mining discriminative
itemsets in data streams. The proposed algorithm works well only for
small datasets or within specific parameter settings.
o The time and space complexity of the algorithm are analysed on
synthetically generated and real datasets.
Developing an efficient heuristic-based method for discriminative itemset
mining
o After defining the first method of discriminative itemset mining in a
single batch of transactions, we develop an efficient method by defining
a heuristic that is scalable to real-world large datasets.
o The algorithm shows acceptable time and space complexity on datasets
of different sizes and with different parameter settings.
Developing a method of discriminative itemset mining in data streams in the
tilted-time window model
o The efficient method defined previously is used for mining discriminative
itemsets in the tilted-time window model.
o The efficient algorithm defined for mining discriminative itemsets in data
streams using the tilted-time window model works well for both small
and large datasets.
o The tilted-time window model is used for saving the historical
discriminative itemsets.
Developing a method of discriminative itemset mining in data streams in the
sliding window model
o The efficient method defined previously is used for mining discriminative
itemsets using the sliding window model.
o The efficient algorithm defined for mining discriminative itemsets in
data streams using the sliding window model works well for both small
and large datasets.
o The sliding window model is used for saving the offline and online real-
time discriminative itemsets.
A comprehensive analysis is done for all four algorithms on datasets
generated using a synthetic data generator and on real datasets.
o Different parameter settings are tested for mining discriminative
itemsets based on different input datasets.
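To make the single-batch task in the contributions above concrete, the following is a minimal brute-force sketch, not the optimised algorithms developed in this thesis. Under the assumed definition, an itemset is reported as discriminative when it is frequent enough in the target batch and its support there exceeds a user-chosen multiple of its support in the general batch; all function names, thresholds, and sample data are illustrative assumptions.

```python
from itertools import combinations

def discriminative_itemsets(target_batch, general_batch,
                            min_sup=0.2, min_ratio=2.0, max_len=3):
    """Brute-force sketch: report itemsets frequent in the target batch whose
    relative support is at least min_ratio times their support in the general batch."""
    def support(batch, itemset):
        # Fraction of transactions in the batch containing every item of itemset.
        return sum(1 for t in batch if itemset <= t) / len(batch)

    items = sorted({i for t in target_batch for i in t})
    result = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            sup_t = support(target_batch, itemset)
            if sup_t < min_sup:
                continue
            sup_g = support(general_batch, itemset)
            # Itemsets absent from the general stream are trivially
            # discriminative (akin to "jumping" emerging patterns).
            ratio = sup_t / sup_g if sup_g > 0 else float("inf")
            if ratio >= min_ratio:
                result[itemset] = ratio
    return result

# Illustrative batches of transactions (sets of items).
target = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]
general = [{"a"}, {"b"}, {"c"}, {"a", "c"}]
found = discriminative_itemsets(target, general)
```

The exhaustive enumeration is exponential in the number of items; the prefix-tree and heuristic methods contributed by the thesis exist precisely to avoid this cost.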
7.2 SUMMARY OF FINDINGS
The main findings from this thesis are summarised as follows:
In response to the first research question, extensive research was done on
frequent itemset mining in data streams and on contrast data mining,
situating this research within the literature. The importance of
discriminative itemsets in data streams in real applications was emphasised,
and discriminative itemsets were proposed for the application of
classification in data streams.
o It covered the literature that uses frequent itemset mining and
contrast data mining for the classification of static datasets and data
streams.
In response to the second research question, the concept of discriminative
itemsets was proposed, and a simple algorithm based on an extension of
FP-Growth (Han, Pei and Yin 2000) was implemented for mining
discriminative itemsets in a single batch of transactions. This was followed
by an advanced, efficient method for mining discriminative itemsets in a
single batch of transactions.
o These itemsets are used for differentiating between the target data
stream and the general data stream.
In response to the third research question, the concepts of discriminative
itemsets in the tilted-time window model and the sliding window model
were proposed. One method was implemented by extending the proposed
single-batch method for efficient mining of discriminative itemsets in the
tilted-time window model, and another was implemented by extending the
proposed efficient single-batch method for efficient mining of
discriminative itemsets in the sliding window model.
o These algorithms are scalable to real-world data streams
modelled as continuous batches of transactions.
In response to the fourth research question, the proposed algorithms were
tested with several synthetically generated datasets and real datasets with
different features and parameter settings.
o These algorithms can be customised by different parameter settings
based on the dataset characteristics.
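The tilted-time window model referred to in the findings above can be illustrated with a small sketch. This is an assumed logarithmic scheme in the spirit of Giannella et al. (2003), not the data structure developed in this thesis: level k holds summaries that each cover 2^k batches, so recent history stays at fine granularity while older history is merged coarsely, keeping memory logarithmic in the stream length.

```python
from collections import Counter

class TiltedTimeWindow:
    """Sketch of a logarithmic tilted-time window (assumed scheme): each level
    keeps at most two summaries; overflow merges the two oldest and promotes
    the merged summary to the next, coarser level."""
    def __init__(self):
        self.levels = []  # levels[k]: list of at most 2 Counter summaries, newest first

    def add_batch(self, counts):
        carry = Counter(counts)
        k = 0
        while carry is not None:
            if len(self.levels) <= k:
                self.levels.append([])
            self.levels[k].insert(0, carry)   # newest summary at the front
            if len(self.levels[k]) <= 2:
                carry = None
            else:
                # Merge the two oldest summaries and promote them upward.
                old2, old1 = self.levels[k].pop(), self.levels[k].pop()
                carry = old1 + old2
                k += 1

    def total(self, itemset):
        # Historical count of an itemset aggregated across all granularities.
        return sum(c[itemset] for level in self.levels for c in level)

# Illustrative use: five batches, each carrying a count for one itemset.
window = TiltedTimeWindow()
for n in range(1, 6):
    window.add_batch({frozenset({"a"}): n})
```

After five batches the structure holds the newest batch at the finest level and two merged summaries above it, while `total` still recovers the exact aggregate count.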
7.3 CONNECTIONS BETWEEN THE THREE TASKS
This thesis contributes the definition of discriminative itemset mining in data streams
and four methods for mining discriminative itemsets in different window models. Discriminative
itemsets in the tilted-time window model and the sliding window model can be used for
discriminative classification.
The discriminative itemsets are also useful for description mining in data streams in
different window models. The discriminative itemsets can be used for mining discriminative
rules, and discriminative rules in combination with highly frequent itemsets can be used for
defining discriminative classification techniques. Such techniques can be used for prediction
mining in a target data stream using the distinguishing features of discriminative rules, and
they are fast and efficient enough for fast-growing data streams.
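The step from discriminative itemsets to discriminative rules described above can be sketched as follows. The check assumes the definition given later in this chapter (a rule with higher support and confidence in the target stream than in the general stream); the rule-generation strategy, names, and sample data are illustrative assumptions rather than the method developed in this thesis.

```python
from itertools import combinations

def support(batch, itemset):
    # Fraction of transactions in the batch containing every item of itemset.
    return sum(1 for t in batch if itemset <= t) / len(batch)

def discriminative_rules(itemset, target_batch, general_batch):
    """Sketch: split a discriminative itemset into antecedent -> consequent and
    keep rules whose support AND confidence are higher in the target stream."""
    rules = []
    items = sorted(itemset)
    for r in range(1, len(items)):
        for ante in combinations(items, r):
            a = frozenset(ante)
            sup_t, sup_g = support(target_batch, itemset), support(general_batch, itemset)
            sup_at, sup_ag = support(target_batch, a), support(general_batch, a)
            conf_t = sup_t / sup_at if sup_at else 0.0
            conf_g = sup_g / sup_ag if sup_ag else 0.0
            if sup_t > sup_g and conf_t > conf_g:
                rules.append((a, itemset - a, conf_t))
    return rules

# Illustrative batches: {a, b} co-occurs more often in the target stream.
target = [{"a", "b"}, {"a", "b"}, {"a"}, {"b"}]
general = [{"a"}, {"a", "b"}, {"b"}, {"c"}]
rules = discriminative_rules(frozenset({"a", "b"}), target, general)
```

Each kept rule records its antecedent, consequent, and target-stream confidence; a classifier could rank candidate rules by that confidence when labelling incoming transactions.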
7.4 LIMITATIONS AND THE FUTURE RESEARCH ISSUES
This thesis addresses discriminative itemset mining techniques in data streams. The
proposed methods are analysed on synthetic datasets generated by the IBM synthetic data
generator and on real datasets provided in the UCI repository.
One limitation of the proposed methods concerns concept drift in the transactions of the
data streams used for the experiments. Under concept drift, the algorithms may incur long
delays in their runtimes. Another limitation is the size of the main memory, which restricts
the length of the window. The tilted-time window model has to be regularly restructured and
optimised based on the recent trends so that it fits in the main memory. A further limitation
is the large number of discriminative itemsets with low supports.
The discriminative itemsets discovered in data streams in this thesis can be employed in
classification techniques for prediction mining in large data streams. The discriminative
itemsets discovered in the tilted-time and sliding window models can be used for the definition
of discriminative rules, defined as rules with higher support and confidence in the target data
stream compared to the general data stream. The developed discriminative itemset mining
algorithms are therefore recommended for discriminative rule mining in data streams. The
discriminative rules in the historical tilted-time and sliding window models can be adapted
for defining new classification techniques for prediction mining in data streams.
References
Aggarwal, Charu C. 2007. Data streams: models and algorithms. Vol. 31: Springer Science &
Business Media.
Agrawal, Rakesh and Ramakrishnan Srikant. 1994. "Fast algorithms for mining association rules in
large databases." In Proceedings of the 20th International Conference on Very Large Data
Bases VLDB, edited, 487-499
Ahmed, Chowdhury Farhan, Syed Khairuzzaman Tanbeer, Byeong-Soo Jeong and Ho-Jin Choi. 2012.
"Interactive mining of high utility patterns over data streams." Expert Systems with
Applications 39 (15): 11979-11991. doi: 10.1016/j.eswa.2012.03.062.
Alhammady, Hamad and Kotagiri Ramamohanarao. 2005. "Mining Emerging Patterns and
Classification in Data Streams." The Proceedings of IEEE/WIC/ACM International
Conference on Web Intelligence: 272-275 doi: 10.1109/WI.2005.96.
Amagata, Daichi and Takahiro Hara. 2017 "Mining Top-k Co-Occurrence Patterns across Multiple
Streams." IEEE Transactions on Knowledge and Data Engineering 29 (10): 2249 - 2262.
doi: 10.1109/TKDE.2017.2728537.
Antonie, Maria-Luiza and Osmar R. Zaïane. 2004. "Mining positive and negative association rules:
An approach for confined rules." In Proceedings of the Knowledge Discovery in Databases:
PKDD 2004, edited by J. F. Boulicaut, F. Esposito, F. Giannotti and D. Pedreschi, 27-38.
Ayres, Jay, Johannes Gehrke, Tomi Yiu and Jason Flannick. 2002. "Sequential Pattern Mining using
A Bitmap Representation." In Proceedings of the eighth ACM SIGKDD international
conference on Knowledge discovery and data mining, edited, 429-435 doi:
10.1145/775047.775109.
Bailey, James and Elsa Loekito. 2010. "Efficient incremental mining of contrast patterns in changing
data." Information processing letters 110 (3): 88-92. doi: 10.1016/j.ipl.2009.10.012.
Bailey, James, Thomas Manoukian and Kotagiri Ramamohanarao. 2002. "Fast Algorithms for Mining
Emerging Patterns." In Proceedings of the 6th European Conference on Principles of Data
Mining and Knowledge Discovery, edited, 39-50.
Chang, Joong Hyuk and Won Suk Lee. 2003. "Finding recent frequent itemsets adaptively over online
data streams." In Proceedings of the ninth ACM SIGKDD international conference on
Knowledge discovery and data mining, edited, 487-492: ACM. doi: 10.1145/956750.956807
Chang, Joong Hyuk and Won Suk Lee. 2005. "estWin: Online data stream mining of recent frequent
itemsets by sliding window method." Journal of Information Science 31 (2): 76-90. doi:
10.1177/0165551505050785.
Cheng, James, Yiping Ke and Wilfred Ng. 2008. "A survey on algorithms for mining frequent
itemsets over data streams." Knowledge and Information Systems 16 (1): 1-27. doi:
10.1007/s10115-007-0092-4.
Chi, Yun, Haixun Wang, S Yu Philip and Richard R Muntz. 2004. "Moment: Maintaining closed
frequent itemsets over a stream sliding window." In Fourth IEEE International Conference
on Data Mining ICDM '04, edited, 59-66. doi: 10.1109/ICDM.2004.10084.
Chi, Yun, Haixun Wang, S Yu Philip and Richard R Muntz. 2006. "Catch the moment: maintaining
closed frequent itemsets over a data stream sliding window." Knowledge and Information
Systems 10 (3): 265-294. doi: 10.1007/s10115-006-0003-0.
Clark, Peter and Tim Niblett. 1989. "The CN2 induction algorithm." Machine learning 3 (4): 261-283.
doi: 10.1023/a:1022641700528.
Çokpınar, Samet and Taflan İmre Gündem. 2012. "Positive and negative association rule mining on
XML data streams in database as a service concept." Expert Systems with Applications 39
(8): 7503-7511. doi: 10.1016/j.eswa.2012.01.128.
Dheeru, Dua and Efi Karra Taniskidou. 2017. "UCI Machine Learning Repository."
Djahantighi, Farhad Siasar, Mohammad-Reza Feizi-Derakhshi, Mir Mohsen Pedram and Zohreh
Alavi. 2010. "An Effective Algorithm for Mining Users Behaviour in Time-Periods."
European Journal of Scientific Research 40 (1): 81-90.
Dong, Guozhu and James Bailey. 2012. Contrast Data Mining: Concepts, Algorithms, and
Applications: CRC Press.
Dong, Guozhu and Jinyan Li. 1999. "Efficient Mining of Emerging Patterns: Discovering Trends and
Differences." In Proceedings of the fifth ACM SIGKDD international conference on
Knowledge discovery and data mining, edited, 43-52. doi: 10.1145/312129.312191.
Dong, Guozhu, Xiuzhen Zhang, Limsoon Wong and Jinyan Li. 1999. "CAEP: Classification by
Aggregating Emerging Patterns." Berlin, Heidelberg, edited, 30-42: Springer Berlin
Heidelberg.
Duan, Lei, Guanting Tang, Jian Pei, James Bailey, Guozhu Dong, Akiko Campbell and Changjie
Tang. 2014. "Mining Contrast Subspaces." In Advances in Knowledge Discovery and Data
Mining: 18th Pacific-Asia Conference, PAKDD 2014, Tainan, Taiwan, May 13-16, 2014.
Proceedings, Part I, Cham, edited by Vincent S. Tseng, Tu Bao Ho, Zhi-Hua Zhou, Arbee L.
P. Chen and Hung-Yu Kao, 249-260: Springer International Publishing. doi: 10.1007/978-3-
319-06608-0_21.
Duan, Lei, Guanting Tang, Jian Pei, James Bailey, Guozhu Dong, Vinh Nguyen, Akiko Campbell and
Changjie Tang. 2016. "Efficient discovery of contrast subspaces for object explanation and
characterization." Knowledge and Information Systems 47 (1): 99-129. doi: 10.1007/s10115-
015-0835-6.
Duda, Richard O and Peter E Hart. 1973. Pattern classification and scene analysis. Vol. 3: Wiley
New York.
Eichinger, Frank, Detlef D. Nauck and Frank Klawonn. 2006. "Sequence Mining for Customer
Behaviour Predictions in Telecommunications." In Proceedings of the Workshop on
Practical Data Mining: Applications, Experiences and Challenges, Berlin, Germany, edited.
Fan, Hongjian and Kotagiri Ramamohanarao. 2002. "An Efficient Single-Scan Algorithm for Mining
Essential Jumping Emerging Patterns for Classification." In Proceedings of the 6th Pacific-
Asia Conference on Advances in Knowledge Discovery and Data Mining, edited, 456-462
Fan, Hongjian and Kotagiri Ramamohanarao. 2003. "A bayesian approach to use emerging patterns
for classification." In Proceedings of the 14th Australasian database conference-Volume 17,
edited, 39-48: Australian Computer Society, Inc.
Farzanyar, Zahra, Mohammadreza Kangavari and Nick Cercone. 2012. "Max-FISM: Mining
(recently) maximal frequent itemsets over data streams using the sliding window model."
Computers & Mathematics with Applications 64 (6): 1706-1718. doi:
10.1016/j.camwa.2012.01.045.
Fournier-Viger, Philippe, Jerry Chun-Wei Lin, Antonio Gomariz, Ted Gueniche, Azadeh Soltani,
Zhihong Deng and Hoang Thanh Lam. 2016. "The SPMF Open-Source Data Mining Library
Version 2." In Machine Learning and Knowledge Discovery in Databases: European
Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19-23, 2016,
Proceedings, Part III, 36-40. Cham: Springer International Publishing. doi: 10.1007/978-3-
319-46131-1_8.
Gao, Chao, Lei Duan, Guozhu Dong, Haiqing Zhang, Hao Yang and Changjie Tang. 2016. "Mining
Top-k Distinguishing Sequential Patterns with Flexible Gap Constraints." In Web-Age
Information Management: 17th International Conference, WAIM 2016, Nanchang, China,
June 3-5, 2016, Proceedings, Part I, edited by Bin Cui, Nan Zhang, Jianliang Xu, Xiang Lian
and Dexi Liu, 82-94. Cham: Springer International Publishing. doi: 10.1007/978-3-319-
39937-9_7.
Giannella, Chris, Jiawei Han, Jian Pei, Xifeng Yan and Philip S. Yu. 2003. "Mining frequent patterns
in data streams at multiple time granularities." Next generation data mining 212: 191-212.
Guo, Jing, Peng Zhang, Jianlong Tan and Li Guo. 2011. "Mining frequent patterns across multiple
data streams." In Proceedings of the 20th ACM international conference on Information and
knowledge management, edited, 2325-2328: ACM. doi: 10.1145/2063576.2063957.
Guo, Lichao, Hongye Su and Yu Qu. 2011. "Approximate mining of global closed frequent itemsets
over data streams." Journal of the Franklin Institute-Engineering and Applied Mathematics
348 (6): 1052-1081. doi: 10.1016/j.jfranklin.2011.04.006.
Han, Jiawei, Jian Pei and Micheline Kamber. 2011. Data mining: concepts and techniques: Elsevier.
Han, Jiawei, Jian Pei, Behzad Mortazavi-Asl, Helen Pinto, Qiming Chen, Umeshwar Dayal and MC
Hsu. 2001. "Prefixspan: Mining sequential patterns efficiently by prefix-projected pattern
growth." In proceedings of the 17th international conference on data engineering (ICDE
2001) edited, 215-224.
Han, Jiawei, Jian Pei and Yiwen Yin. 2000. "Mining frequent patterns without candidate generation."
In ACM Sigmod Record, edited, 1-12: ACM. doi: 10.1145/335191.335372.
Hashemi, Sattar, Ying Yang, Zahra Mirzamomen and Mohammadreza Kangavari. 2009. "Adapted
one-versus-all decision trees for data stream classification." Knowledge and Data
Engineering, IEEE Transactions on 21 (5): 624-637. doi: 10.1109/TKDE.2008.181.
He, Zengyou, Feiyang Gu, Can Zhao, Xiaoqing Liu, Jun Wu and Ju Wang. 2017. "Conditional
discriminative pattern mining." Information Sciences 375 (C): 1-15. doi:
10.1016/j.ins.2016.09.047.
Hollfelder, Silvia, Vincent Oria and M. Tamer Özsu. 2000. "Mining user behaviour for resource
prediction in interactive electronic malls." In IEEE International Conference on Multimedia
and Expo ICME 2000, edited, 863-866
Huang, Lan, Chun-guang Zhou, Yu-qin Zhou and Zhe Wang. 2008. "Research on Data Mining
Algorithms for Automotive Customers' Behavior Prediction Problem." In 2008 Seventh
International Conference on Machine Learning and Applications, edited, 677-681. doi:
10.1109/ICMLA.2008.23.
Jiang, Nan and Le Gruenwald. 2006. "Research issues in data stream association rule mining." ACM
Sigmod Record 35 (1): 14-19. doi: 10.1145/1121995.1121998.
Kompalli, Prasanna Lakshmi and Ramesh Kumar Cherku. 2015. "Efficient Mining of Data Streams
Using Associative Classification Approach." International Journal of Software Engineering
and Knowledge Engineering 25 (03): 605-631. doi: 10.1142/s0218194015500059.
Lakshmi, K Prasanna and CRK Reddy. 2012. "Compact Tree for Associative Classification of Data
Stream Mining." International Journal of Computer Science Issues (IJCSI) 9 (2).
Leung, Carson Kai-Sang and Quamrul I Khan. 2006. "DSTree: A tree structure for the mining of
frequent sets from data streams." In Sixth International Conference on Data Mining
ICDM'06, edited, 928-932. doi: 10.1109/ICDM.2006.62
Li, Hua-Fu and Suh-Yin Lee. 2009. "Mining frequent itemsets over data streams using efficient
window sliding techniques." An International Journal Expert Systems with Applications 36
(2): 1466-1477 doi: 10.1016/j.eswa.2007.11.061.
Li, Hua-Fu, Suh-Yin Lee and Man-Kwan Shan. 2004. "An efficient algorithm for mining frequent
itemsets over the entire history of data streams." In Proceeding of first international
workshop on knowledge discovery in data streams, edited. doi: 10.1016/j.eswa.2007.11.061.
Li, Jinyan, Guozhu Dong and Kotagiri Ramamohanarao. 2000. "Instance-based classification by
emerging patterns." In Principles of Data Mining and Knowledge Discovery, 191-200:
Springer. doi: 10.1007/3-540-45372-5_19.
Li, Jinyan, Guozhu Dong and Kotagiri Ramamohanarao. 2001. "Making use of the most expressive
jumping emerging patterns for classification." Knowledge and Information systems 3 (2):
131-145. doi: 10.1007/3-540-45571-X_29.
Li, Jinyan, Haiquan Li, Limsoon Wong, Jian Pei and Guozhu Dong. 2006. "Minimum description
length principle: generators are preferable to closed patterns." Paper presented at the
Proceedings of the 21st national conference on Artificial intelligence - Volume 1, Boston,
Massachusetts. AAAI Press.
Li, Jinyan, Guimei Liu and Limsoon Wong. 2007. "Mining statistically important equivalence classes
and delta-discriminative emerging patterns." In Proceedings of the 13th ACM SIGKDD
international conference on Knowledge discovery and data mining, edited, 430-439: ACM.
doi: 10.1145/1281192.1281240.
Li, Wenmin, Jiawei Han and Jian Pei. 2001. "CMAR: Accurate and efficient classification based on
multiple class-association rules." In Proceedings IEEE International Conference on Data
Mining (ICDM '01), edited, 369-376: IEEE.
Li, Xiaoli, S Yu Philip, Bing Liu and See-Kiong Ng. 2009. "Positive Unlabeled Learning for Data
Stream Classification." In SDM, edited, 257-268: SIAM. doi: 10.1137/1.9781611972795.23.
Lim, Tjen-Sien, Wei-Yin Loh and Yu-Shan Shih. 2000. "A comparison of prediction accuracy,
complexity, and training time of thirty-three old and new classification algorithms." Machine
learning 40 (3): 203-228.
Lin, Ming-Yen, Sue-Chen Hsueh and Sheng-Kun Hwang. 2008. "Interactive mining of frequent
itemsets over arbitrary time intervals in a data stream." In The nineteenth conference on
Australasian database ADC '08, edited, 15-21.
Lin, Zhenhua, Bin Jiang, Jian Pei and Daxin Jiang. 2010. "Mining discriminative items in multiple
data streams." World Wide Web 13 (4): 497-522. doi: 10.1007/s11280-010-0094-0.
Loekito, Elsa and James Bailey. 2006. "Fast mining of high dimensional expressive contrast patterns
using zero-suppressed binary decision diagrams." Paper presented at the Proceedings of the
12th ACM SIGKDD international conference on Knowledge discovery and data mining,
Philadelphia, PA, USA. ACM. doi: 10.1145/1150402.1150438.
Liu, Bing, Wynne Hsu and Yiming Ma. 1998. "Integrating classification and association rule mining." In
Proceedings of the 4th, edited.
Manku, Gurmeet Singh. 2016. "Frequent Itemset Mining over Data Streams." In Data Stream
Management: Processing High-Speed Data Streams, edited by Minos Garofalakis, Johannes
Gehrke and Rajeev Rastogi, 209-219. Berlin, Heidelberg: Springer Berlin Heidelberg. doi:
10.1007/978-3-540-28608-0_10.
Manku, Gurmeet Singh and Rajeev Motwani. 2002. "Approximate Frequency Counts over Data
Streams." In Proceedings of the 28th international conference on Very Large Data Bases,
edited, 346-357: VLDB Endowment.
Masud, Mohammad M, Clay Woolam, Jing Gao, Latifur Khan, Jiawei Han, Kevin W Hamlen and
Nikunj C Oza. 2012. "Facing the reality of data stream classification: coping with scarcity of
labeled data." Knowledge and information systems 33 (1): 213-244. doi: 10.1007/s10115-
011-0447-8.
Metwally, Ahmed, Divyakant Agrawal and Amr El Abbadi. 2005. "Efficient computation of frequent
and top-k elements in data streams." In International Conference on Database Theory,
edited, 398-412: Springer. doi: 10.1007/978-3-540-30570-5_27.
Garofalakis, Minos, Johannes Gehrke and Rajeev Rastogi. 2002. "Querying and mining data streams:
you only get one look." In Tutorial notes of the 28th International Conference on Very Large
Databases, Hong Kong, China.
Mori, Taketoshi, Aritoki Takada, Hiroshi Noguchi, Tatsuya Harada and Tomomasa Sato. 2005.
"Behavior prediction based on daily-life record database in distributed sensing space." In
IEEE/RSJ International Conference on Intelligent Robots and Systems, Vols 1-4, edited,
1703-1709: IEEE. doi: 10.1109/iros.2005.1545244.
Nowozin, Sebastian, Gokhan Bakir and Koji Tsuda. 2007. "Discriminative subsequence mining for
action classification." In 11th International Conference on Computer Vision, edited, 1-8:
IEEE. doi: 10.1109/ICCV.2007.4409049
Patel, Dhaval, Wynne Hsu and Mong Li Lee. 2011. "Discriminative Mutation Chains in Virus
Sequences." In Tools with Artificial Intelligence (ICTAI), 2011 23rd IEEE International
Conference on, edited, 9-16: IEEE. doi: 10.1109/ICTAI.2011.11.
Peng, Wen-Chih and Zhung-Xun Liao. 2009. "Mining sequential patterns across multiple sequence
databases." Data & Knowledge Engineering 68 (10): 1014-1033.
Prasad, U. Devi and S. Madhavi. 2012. "Prediction of Churn Behavior of Bank Customers Using Data
Mining Tools." Business Intelligence Journal 5 (1): 96-101
Quinlan, J Ross. 2014. C4.5: programs for machine learning: Elsevier.
Bayardo, Roberto J., Jr. 1998. "Efficiently mining long patterns from databases." Paper presented at
the Proceedings of the 1998 ACM SIGMOD international conference on Management of
data, Seattle, Washington, USA. ACM. doi: 10.1145/276304.276313.
Saengthongloun, Bordin, Thanapat Kangkachit, Thanawin Rakthanmanon and Kitsana Waiyamai.
2013. "AC-Stream: Associative classification over data streams using multiple class
association rules." In Computer Science and Software Engineering (JCSSE), 2013 10th
International Joint Conference on, edited, 223-228: IEEE. doi:
10.1109/JCSSE.2013.6567349.
Seyfi, Majid. 2011. "Mining discriminative items in multiple data streams with hierarchical counters
approach." In Fourth International Workshop on Advanced Computational Intelligence
(IWACI), 2011, edited, 172-176 IEEE. doi: 10.1109/IWACI.2011.6159996.
Seyfi, Majid, Shlomo Geva and Richi Nayak. 2014. "Mining Discriminative Itemsets in Data
Streams." In International Conference on Web Information Systems Engineering, edited,
125-134: Springer. doi: 10.1007/978-3-319-11749-2_10
Seyfi, Majid, Richi Nayak, Yue Xu and Shlomo Geva. 2017. "Efficient mining of discriminative
itemsets." Paper presented at the Proceedings of the International Conference on Web
Intelligence, Leipzig, Germany. ACM. doi: 10.1145/3106426.3106429.
Shin, Se Jung and Won Suk Lee. 2008. "On-line generation association rules over data streams."
Information and Software Technology 50 (6): 569-578. doi: 10.1016/j.infsof.2007.06.005.
Song, Zhen-Hui and Yi Li. 2010. "Associative classification over Data Streams." In Information
Engineering and Computer Science (ICIECS), 2010 2nd International Conference on, edited,
1-4: IEEE.
Su, Li, Hong-yan Liu and Zhen-Hui Song. 2011. "A new classification algorithm for data stream."
International Journal of Modern Education and Computer Science (IJMECS) 3 (4): 32. doi:
10.5815/ijmecs.2011.04.05.
Tanbeer, Syed Khairuzzaman, Chowdhury Farhan Ahmed, Byeong-Soo Jeong and Young-Koo Lee.
2009. "Sliding window-based frequent pattern mining over data streams." Information
sciences 179 (22): 3843-3865. doi: 10.1016/j.ins.2009.07.012.
Thabtah, Fadi. 2007. "A review of associative classification mining." The Knowledge Engineering
Review 22 (01): 37-65. doi: 10.1017/s0269888907001026.
Tsai, Pauray SM. 2009. "Mining frequent itemsets in data streams using the weighted sliding window
model." Expert Systems with Applications 36 (9): 11617-11625. doi:
10.1016/j.eswa.2009.03.025.
Tseng, Vincent S and Kawuu W Lin. 2006. "Efficient mining and prediction of user behavior patterns
in mobile web systems." Information and Software Technology 48 (6): 357-369. doi:
10.1016/j.infsof.2005.12.014.
Waiyamai, Kitsana, Thanapat Kangkachit, Bordin Saengthongloun and Thanawin Rakthanmanon.
2014. "ACCD: Associative Classification over Concept-Drifting Data Streams." In Machine
Learning and Data Mining in Pattern Recognition, 78-90: Springer. doi: 10.1007/978-3-319-
08979-9_7
Wang, Haixun, Wei Fan, Philip S. Yu and Jiawei Han. 2003. "Mining Concept-Drifting Data Streams
using Ensemble Classifiers." In Proceedings of the ninth ACM SIGKDD international
conference on Knowledge discovery and data mining, edited, 226-235 doi:
10.1145/956750.956778.
Wang, Jianyong and Jiawei Han. 2004. "BIDE: Efficient Mining of Frequent Closed Sequences."
Paper presented at the Proceedings of the 20th International Conference on Data
Engineering. IEEE Computer Society.
Wu, Xindong, Chengqi Zhang and Shichao Zhang. 2004. "Efficient mining of both positive and
negative association rules." ACM Transactions on Information Systems (TOIS) 22 (3): 381-
405. doi: 10.1145/1010614.1010616.
Yu, Jeffery Xu, Zhihong Chong, Hongjun Lu and Aoying Zhou. 2004. "False positive or false
negative: mining frequent tenets from high speed transactional data streams." In Thirtieth
International conference on Very large data bases VLDB 04, edited, 204-215
Yu, Kui, Wei Ding, Dan A Simovici and Xindong Wu. 2012. "Mining emerging patterns by streaming
feature selection." In Proceedings of the 18th ACM SIGKDD international conference on
Knowledge discovery and data mining, edited, 60-68: ACM. doi: 10.1145/2339530.2339544.
Yu, Kui, Wei Ding, Dan A. Simovici, Hao Wang, Jian Pei and Xindong Wu. 2015. "Classification
with Streaming Features: An Emerging-Pattern Mining Approach." ACM Transactions on
Knowledge Discovery from Data (TKDD) 9 (4): 1-31. doi: 10.1145/2700409.
Yu, Kui, Wei Ding, Hao Wang and Xindong Wu. 2013. "Bridging causal relevance and pattern
discriminability: Mining emerging patterns from high-dimensional data." IEEE Transactions
on Knowledge and Data Engineering 25 (12): 2721-2739. doi: 10.1109/TKDE.2012.218.
Yuan, Xiaohui, Bill P. Buckles, Zhaoshan Yuan and Jian Zhang. 2002. "Mining negative association
rules." In Seventh International Symposium on Computers and Communications, edited, 623-
628
Zaki, Mohammed J. 2001. "SPADE: An efficient algorithm for mining frequent sequences." Machine
learning 42 (1-2): 31-60. doi: 10.1023/A:1007652502315.
Zaki, Mohammed J. and Ching-Jui Hsiao. 2002. "CHARM: An Efficient Algorithm for Closed
Itemset Mining." In Proceedings of the 2002 SIAM International Conference on Data
Mining, edited, 457-473. doi: 10.1137/1.9781611972726.27.
Zhang, Peng, Xingquan Zhu, Jianlong Tan and Li Guo. 2010. "Classifier and cluster ensembles for
mining concept drifting data streams." In IEEE 10th International Conference on Data
Mining ICDM'10, edited, 1175-1180. doi: 10.1109/ICDM.2010.125.
Zhang, Xiuzhen, Guozhu Dong and Ramamohanarao Kotagiri. 2000. "Exploring Constraints to
Efeciently Mine Emerging Patterns from Large High-dimensional Datasets." In Proceedings
of the sixth ACM SIGKDD international conference on Knowledge discovery and data
mining, edited, 310-314 doi: 10.1145/347090.347158.
Zhao, Li, Lei Wang and Qingzheng Xu. 2012. "Data stream classification with artificial endocrine
system." Applied Intelligence 37 (3): 390-404. doi: 10.1007/s10489-011-0334-8.
Zhu, Xingquan and Xindong Wu. 2007. "Discovering relational patterns across multiple databases." In
2007 IEEE 23rd International Conference on Data Engineering (ICDE 2007), edited, 726-
735: IEEE. doi: 10.1109/ICDE.2007.367918.
Zhu, Xingquan, Xindong Wu and Ying Yang. 2006. "Effective Classification of Noisy Data Streams
with Attribute-Oriented Dynamic Classifier Selection." Knowledge and Information Systems
archive 9 (3): 339-363 doi: 10.1007/s10115-005-0212-y.