
Hardware-Enhanced Association Rule Mining with Hashing and Pipelining

Ying-Hsiang Wen, Jen-Wei Huang, and Ming-Syan Chen, Fellow, IEEE

Abstract—Generally speaking, to implement Apriori-based association rule mining in hardware, one has to load candidate itemsets and a database into the hardware. Since the capacity of the hardware architecture is fixed, if the number of candidate itemsets or the number of items in the database is larger than the hardware capacity, the items are loaded into the hardware separately. The time complexity of those steps that need to load candidate itemsets or database items into the hardware is in proportion to the number of candidate itemsets multiplied by the number of items in the database. Too many candidate itemsets and a large database would create a performance bottleneck. In this paper, we propose a HAsh-based and PiPelIned (abbreviated as HAPPI) architecture for hardware-enhanced association rule mining. We apply the pipeline methodology in the HAPPI architecture to compare itemsets with the database and collect useful information for reducing the number of candidate itemsets and items in the database simultaneously. When the database is fed into the hardware, candidate itemsets are compared with the items in the database to find frequent itemsets. At the same time, trimming information is collected from each transaction. In addition, itemsets are generated from transactions and hashed into a hash table. The useful trimming information and the hash table enable us to reduce the number of items in the database and the number of candidate itemsets. Therefore, we can effectively reduce the frequency of loading the database into the hardware. As such, HAPPI solves the bottleneck problem in Apriori-based hardware schemes. We also derive some properties to investigate the performance of this hardware implementation. As shown by the experimental results, HAPPI significantly outperforms the previous hardware approach and the software algorithm in terms of execution time.

Index Terms—Hardware enhanced, association rule.


1 INTRODUCTION

Data mining technology is now used in a wide variety of fields. Applications include the analysis of customer transaction records, web site logs, credit card purchase information, and call records, to name a few. The interesting results of data mining can provide useful information, such as customer behavior, for business managers and researchers. One of the most important data mining applications is association rule mining [11], which can be described as follows: Let $I = \{i_1, i_2, \ldots, i_n\}$ denote a set of items; let $D$ denote a set of database transactions, where each transaction $T$ is a set of items such that $T \subseteq I$; and let $X$ denote a set of items, called an itemset. A transaction $T$ contains $X$ if and only if $X \subseteq T$. An association rule is an implication of the form $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$, and $X \cap Y = \emptyset$. The rule $X \Rightarrow Y$ has support $s$ percent in the transaction set $D$ if $s$ percent of the transactions in $D$ contain $X \cup Y$. The rule $X \Rightarrow Y$ holds in the transaction set $D$ with confidence $c$ percent if $c$ percent of the transactions in $D$ that contain $X$ also contain $Y$. The support of the rule $X \Rightarrow Y$ is given by

$$s\ \text{percent} = \frac{|\{T \in D \mid X \cup Y \subseteq T\}|}{|D|} \times 100\ \text{percent},$$

where $|\cdot|$ indicates the number of transactions. The confidence of the rule $X \Rightarrow Y$ is given by

$$c\ \text{percent} = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} \times 100\ \text{percent}.$$

A typical example of an association rule is that 80 percent of customers who purchase beef steak and goose liver paste would also prefer to buy bottles of red wine. Once we have found all frequent itemsets that meet the minimum support requirement, the calculation of confidence for each rule is trivial. Therefore, we only need to focus on methods of finding the frequent itemsets in the database. The Apriori [2] approach was the first to address this issue. Apriori finds frequent itemsets by scanning a database to check the frequencies of candidate itemsets, which are generated by merging frequent subitemsets. However, Apriori-based algorithms suffer from bottlenecks because they generate too many candidate itemsets. DHP [16] proposed a hash table scheme, which effectively reduces the number of candidate itemsets. In addition, several mining techniques, such as TreeProjection [1], the FP-growth algorithm [12], partitioning [18], sampling [19], and the Hidden Markov Model [5], have also received a significant amount of research attention.
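To make these definitions concrete, the following minimal Python sketch computes support and confidence for the wine example; the four-transaction toy database is our own illustration, not data from the paper.

```python
# Toy database: each transaction is a set of items.
D = [
    {"beef_steak", "liver_paste", "red_wine"},
    {"beef_steak", "liver_paste", "red_wine"},
    {"beef_steak", "liver_paste"},
    {"bread", "red_wine"},
]

def support(itemset, db):
    """Percentage of transactions in db that contain every item of itemset."""
    return sum(itemset <= t for t in db) / len(db) * 100

def confidence(x, y, db):
    """conf(X => Y) = supp(X u Y) / supp(X)."""
    return support(x | y, db) / support(x, db) * 100

X = {"beef_steak", "liver_paste"}
Y = {"red_wine"}
print(support(X | Y, D))    # 50.0
print(confidence(X, Y, D))  # 66.66... (2 of the 3 transactions with X also have Y)
```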

With the increasing amount of data, it is important to develop more efficient algorithms to extract knowledge from the data. However, the volume of data is increasing much faster than CPU execution speeds, which strongly influences the performance of software algorithms. Several works [7], [8] have proposed parallel computing schemes to execute operations simultaneously


on multiprocessors. The performance, however, cannot improve linearly as the number of parallel nodes grows. Therefore, some researchers have tried to use hardware devices to accomplish data mining tasks. In [15], Liu et al. proposed a parallel matrix hardware architecture, which can efficiently generate candidate 2-itemsets, for high-throughput data stream applications. Baker and Prasanna [3], [4] designed scalable hardware architectures for association rule mining by utilizing the systolic array proposed in [13] and [14]. The architecture utilizes parallel computing techniques to execute a large number of pattern matching operations at the same time. Other hardware architectures [6], [9], [10], [20] have been designed to speed up the K-means clustering algorithm.

Generally speaking, Apriori-based hardware schemes require loading the candidate itemsets and the database into the hardware. Since the capacity of the hardware is fixed, if the number of items in the database is larger than the hardware capacity, the data items must be loaded separately, and the process of comparing candidate itemsets with the database needs to be executed several times. Similarly, if the number of candidate itemsets is larger than the capacity of the hardware, the pattern matching procedure has to be separated into many rounds. Clearly, it is infeasible for any hardware design to load the candidate itemsets and the database into hardware multiple times. Since the time complexity of those steps that need to load candidate itemsets or database items into the hardware is in proportion to the number of candidate itemsets and the number of items in the database, this procedure is very time consuming. In addition, numerous candidate itemsets and a huge database may cause a bottleneck in the system.

In this paper, we propose a HAsh-based and PiPelIned (abbreviated as HAPPI) architecture for hardware-enhanced association rule mining. That is, we identify certain parts of the mining process that are suitable for and will benefit from hardware implementation, and perform hardware-enhanced mining. Explicitly, we incorporate the pipeline methodology into the HAPPI architecture to compare itemsets and collect useful information that enables us to reduce the number of candidate itemsets and items in the database simultaneously. As shown in Fig. 1, there are three hardware modules in our system. First, when the database is fed into the hardware, the candidate itemsets are compared with the items in the database by the systolic array. Candidate itemsets that have a higher frequency than the minimum support value are viewed as frequent itemsets. Second, we determine, at the same time, the frequency with which each item in a transaction occurs in the candidate itemsets. These frequencies are called trimming information. From this information, infrequent items in the transactions can be eliminated by the trimming filter, since they are not useful in generating frequent itemsets. Third, we generate itemsets from transactions and hash them into the hash table, which is then used to filter out unnecessary candidate itemsets. After the hardware compares candidate itemsets with the items in the database, the trimming information is collected and the hash table is built. This useful information helps us to reduce the number of items in the database and the number of candidate itemsets. Based on the trimming information, items are trimmed if their corresponding occurrence frequencies are not larger than the length of the current candidate itemsets. In addition, after the candidate itemsets are generated by merging frequent subitemsets, they are sent to the hash table filter. If the number of itemsets in the corresponding bucket of the hash table is less than the minimum support, the candidate itemsets are pruned. As such, HAPPI solves the bottleneck problem mentioned earlier through the cooperation of these three hardware modules. To achieve these goals, we devise the following five procedures in the HAPPI architecture: support counting, transaction trimming, hash table building, candidate generation, and candidate pruning. Moreover, we derive several formulas to decide the optimal design, in order to reduce the overhead induced by the pipeline scheme, and the ideal number of hardware modules to achieve the best utilization. The execution times of sequential processing and pipeline processing are also analyzed in this paper.

We conduct several experiments to evaluate the performance of the HAPPI architecture. In addition, we implement the work of Baker and Prasanna [3] and a software algorithm, DHP [16], for comparison purposes. The experimental results show that HAPPI significantly outperforms the previous approach in execution time, especially when the number of items in the database is large and the minimum support value increases. Moreover, the performance of HAPPI is better than that of the previous approach [3] when the systolic array contains different numbers of hardware cells. In fact, by using only 25 hardware cells in the systolic array, we can achieve the same performance as more than 800 hardware cells in the previous approach. The advantages of the HAPPI architecture are that it has more computing power and saves space costs for mining association rules in hardware. The scale-up experiments also show that HAPPI outperforms the previous approach for different numbers of transactions in the database. Indeed, our architecture is a good example of the methodology of performance enhancement by hardware. We implement our architecture on a commercial FPGA board, and it can easily be realized in a custom ASIC. With progress in IC process technology, the performance of HAPPI will be further improved. In view of the fast increase in the amount of data in various emerging mining applications (e.g., network application mining, data stream mining, and bioinformatics data mining), it is envisioned that hardware-enhanced mining is an important research direction to explore for future data mining tasks.

The remainder of the paper is organized as follows: We discuss related works in Section 2. The preliminaries are presented in Section 3. The HAPPI architecture is described in Section 4. Next, we show several experiments conducted on HAPPI in Section 5. Finally, we present our conclusions in Section 6.


Fig. 1. System architecture.


2 RELATED WORKS

In this section, we discuss two previous works that use a systolic array architecture to enhance the performance of data mining.

The Systolic Process Array (SPA) architecture is proposed in [10] to perform K-means clustering. SPA accelerates the processing speed by utilizing several hardware cells to calculate the distances in parallel. Each cell corresponds to a cluster and stores the centroid of the cluster in local memory. The data flows linked by each cell include the data object, the minimum distance between the object and its closest centroid, and the closest centroid of the object. The cell computes the distance between the centroid and the input data object. Based on the resulting distance, the cell updates the minimum distance and the closest centroid of the data object. Therefore, the system can obtain the closest centroid of each object from SPA. The centroids are recomputed and updated by the system, and the new centroids are sent to the cells. The system continuously updates the clustering results.

In [3], the authors implemented a systolic array with several hardware cells to speed up the Apriori algorithm. Each cell performs an ALU operation (larger than, smaller than, or equal to), which compares the incoming item with the items in the memory of the cell. This operation generates frequent itemsets by comparing candidate itemsets with the items in the database. Since all the cells can execute their own operations simultaneously, the performance of the architecture is better than that of a single processor. However, the number of cells in the systolic array is fixed. If the number of candidate itemsets is larger than the number of hardware cells, the pattern matching procedure has to be separated into many rounds, and it is infeasible to load the candidate itemsets and the database into the hardware multiple times. As reported in [3], the performance is only about four times faster than some software algorithms. Hence, there is much room to improve the execution time.

3 PRELIMINARIES

The hash table scheme proposed in DHP [16] improves the performance of Apriori-based algorithms by filtering out infrequent candidate itemsets. In addition, DHP employs an effective pruning scheme to eliminate infrequent items in transactions. We summarize these two schemes below.

In the hash table scheme, a hash function is applied to all candidate k-itemsets generated from frequent subitemsets. Each candidate k-itemset is mapped to a hash value, and itemsets with the same hash value are put into the same bucket of the hash table. If the number of itemsets in a bucket is less than the minimum support threshold, the occurrence frequencies of these candidate itemsets in the database must also be less than the minimum support threshold. As a result, these candidate itemsets cannot be frequent and are removed from the system. On the other hand, if the number of itemsets in the bucket is not less than the minimum support threshold, the itemsets are carried to the real frequency testing process, which scans the database.

The hash table for filtering candidate k-itemsets, $H_k$, is built by hashing the k-itemsets generated by each transaction. A hash table contains $n$ buckets, where $n$ is an arbitrary number. When an itemset is hashed to bucket $i$, the number of itemsets in that bucket is increased by one. The number of itemsets in each bucket represents the accumulated frequency of the itemsets whose hash values are assigned to that bucket. After the candidate k-itemsets have been generated, they are hashed and assigned to buckets of $H_k$. If the number of itemsets in a bucket is less than the minimum support, the candidate itemsets in this bucket are removed. The example in Fig. 2 demonstrates how to build $H_2$ and how to use it to filter out candidate 2-itemsets. After we scan the transaction TID = 100, <AC>, <AD>, and <CD> are hashed to the buckets. According to the hash function shown in Fig. 2, the hash values of <AC>, <AD>, and <CD> are 6, 0, and 6, respectively. As a result, the numbers of itemsets in the buckets indexed by 6, 0, and 6 are each increased by one. After all the transactions in the database have been scanned, the frequent 1-itemsets are found, i.e., $L_1 = \{A, B, C, E\}$. In addition, the numbers of itemsets in the buckets of $H_2$ are <3, 1, 2, 0, 3, 1, 3>, and the minimum support frequency is 2. Thus, the candidate 2-itemsets in buckets 1, 3, and 5 should be pruned. If we generate candidate 2-itemsets from $L_1 * L_1$ directly, the original set of candidate 2-itemsets $C_2$ is

{<AB>, <AC>, <AE>, <BC>, <BE>, <CE>}.

After filtering out unnecessary candidate itemsets by checking $H_2$, the new $C_2'$ becomes

{<AC>, <BC>, <BE>, <CE>}.

Therefore, the number of candidate itemsets can be reduced.
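The Fig. 2 example can be replayed in software. In the sketch below, the two transactions not quoted in the text (TID = 200 and TID = 400) and the hash function $h(x, y) = (10x + y) \bmod 7$ over item order numbers are assumptions taken from the classic DHP example in [16]; they reproduce the quoted hash values (6, 0, 6), the bucket counts <3, 1, 2, 0, 3, 1, 3>, and the filtered $C_2'$.

```python
from itertools import combinations

order = {c: i + 1 for i, c in enumerate("ABCDE")}  # A=1, ..., E=5 (assumed)
h = lambda x, y: (10 * order[x] + order[y]) % 7    # gives h(<AC>) = 6, h(<AD>) = 0

db = {100: "ACD", 200: "BCE", 300: "ABCE", 400: "BE"}
min_sup = 2

# Build H2 by hashing every 2-itemset generated by each transaction.
H2 = [0] * 7
for t in db.values():
    for x, y in combinations(sorted(t), 2):
        H2[h(x, y)] += 1
print(H2)  # [3, 1, 2, 0, 3, 1, 3] -> buckets 1, 3, and 5 are below min_sup

# Generate C2 from L1 x L1 and filter it through H2.
L1 = "ABCE"
C2 = list(combinations(L1, 2))
C2_filtered = [p for p in C2 if H2[h(*p)] >= min_sup]
print(C2_filtered)  # [('A','C'), ('B','C'), ('B','E'), ('C','E')]
```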

The pruning scheme, which is able to filter out infrequent items in the transactions, will be implemented in hardware. The theoretical background of the pruning scheme is based on the following two theorems, which were presented in [17]:

Theorem 1. A transaction can only be used to support the set of frequent (k+1)-itemsets if it consists of at least (k+1) candidate k-itemsets.

Theorem 2. An item in a transaction can be trimmed if it does not appear in at least k of the candidate k-itemsets contained in the transaction.


Fig. 2. The process of building $H_2$ and using $H_2$ to filter out $C_2$.


Based on Theorem 2, whether an item can be trimmed or not depends on how many candidate itemsets in the current transaction contain this item. The transaction trimming module is based on the frequencies of all candidate itemsets in an individual transaction. Therefore, we can handle every transaction independently, regardless of the other transactions in the database. A counter array a[ ] is used to record the number of times each item in a transaction occurs in the candidate k-itemsets. That is, counter a[i] represents the frequency of the ith item in the transaction. If a candidate k-itemset is a subset of the transaction, the counters of the corresponding items that appear in this candidate itemset are increased by one. After the comparison with all the candidate k-itemsets, if the value of a counter is less than k, the corresponding item in the transaction is trimmed, as shown in Fig. 3. For example, transaction TID = 100 has three items {A, C, D}. Counter a[0] represents A, a[1] represents C, and a[2] represents D. Transaction TID = 300 has four items {A, B, C, E}. Counter a[0] corresponds to A, a[1] to B, a[2] to C, and a[3] to E, respectively. After the comparison with the set of candidate 2-itemsets, the values of the counter array for TID = 100 are <1, 1, 0> and the values of the counter array for TID = 300 are <1, 2, 2, 2>. Since all values of the counter array for TID = 100 are less than 2, all the corresponding items are trimmed from the transaction TID = 100. On the other hand, because the value of a[0] for TID = 300 is less than 2, item A is trimmed from that transaction. Therefore, transaction TID = 300 becomes {B, C, E}.
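A software sketch of this counter-array trimming is given below. The candidate 2-itemsets are assumed to be the filtered $C_2'$ = {<AC>, <BC>, <BE>, <CE>} from the example above, so the individual counter values need not match Fig. 3 exactly, but the trimming decisions do: TID = 100 loses all of its items and TID = 300 loses item A.

```python
def trim_transaction(transaction, candidates, k):
    """Drop items that occur in fewer than k candidate k-itemsets (Theorem 2)."""
    a = {item: 0 for item in transaction}      # counter array a[]
    for cand in candidates:
        if set(cand) <= set(transaction):      # candidate occurs in the transaction
            for item in cand:
                a[item] += 1
    return [item for item in transaction if a[item] >= k]

C2 = ["AC", "BC", "BE", "CE"]                  # assumed candidate 2-itemsets
print(trim_transaction("ACD", C2, 2))          # [] -> the whole transaction is trimmed
print(trim_transaction("ABCE", C2, 2))         # ['B', 'C', 'E'] -> item A is trimmed
```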

4 HAPPI ARCHITECTURE

As noted earlier, Apriori-based hardware schemes have to load candidate itemsets and the database into the hardware to execute the comparison process. Too many candidate itemsets and a huge database would cause a performance bottleneck. To solve this problem, we propose the HAPPI architecture to deal with efficient hardware-enhanced association rule mining. We incorporate the pipeline methodology into the HAPPI architecture to perform pattern matching and collect useful information to reduce the number of candidate itemsets and items in the database simultaneously. In this way, HAPPI effectively solves the bottleneck problem.

In Section 4.1, we introduce our system architecture. In Section 4.2, the pipeline scheme of the HAPPI architecture is presented. The transaction trimming scheme is given in Section 4.3. Then, we describe the hardware design of the hash table filter in Section 4.4. Finally, we derive some properties for performance evaluation in Section 4.5.

4.1 System Architecture

As shown in Fig. 4, the HAPPI architecture consists of a systolic array, a trimming filter, and a hash table filter. There are several hardware cells in the systolic array, and each cell can perform the comparison operation. Based on the comparison results, the cells update the support counters of candidate itemsets and the occurrence frequencies of items in the trimming information. A trimming filter then removes infrequent items in the transactions according to the trimming information. In addition, we build a hash table by hashing the itemsets generated by each transaction. The hash table filter then prunes unsuitable candidate itemsets.

To find frequent k-itemsets and generate candidate (k+1)-itemsets efficiently, we devise five procedures in the HAPPI architecture using the three hardware modules: the systolic array, the trimming filter, and the hash table filter. The procedures are support counting, transaction trimming, hash table building, candidate generation, and candidate pruning. The work flow is shown in Fig. 5. The support counting procedure finds frequent itemsets by comparing candidate itemsets with transactions in the database. By loading candidate k-itemsets and streaming transactions into the systolic array, the frequencies with which candidate itemsets occur in the transactions can be determined.


Fig. 3. An example of transaction trimming.

Fig. 4. The HAPPI architecture: (a) systolic array, (b) trimming filter, and (c) hash table filter.


Note that if the number of candidate itemsets is larger than the number of hardware cells in the systolic array, the candidate itemsets are separated into several groups. Some of the candidate itemsets are loaded into the hardware cells and the database is fed into the systolic array. Afterward, the other candidate itemsets are loaded into the systolic array one by one. To complete the comparison with all the candidate itemsets, the database has to be examined several times. To reduce the overhead of this repeated loading, we design two additional hardware modules, namely, a trimming filter and a hash table filter. Infrequent items in the database are eliminated by the trimming filter, and the number of candidate itemsets is reduced by the hash table filter. Therefore, the time required for the support counting procedure can be effectively reduced.

After all the candidate k-itemsets have been compared with the transactions, their frequencies are sent back to the system. The frequent k-itemsets can be obtained from the candidate k-itemsets whose occurrence frequencies are larger than the minimum support. While the transactions are being compared with the candidate itemsets, the corresponding trimming information is collected: the occurrence frequency of each item that is contained in the candidate itemsets in the transactions is recorded and updated to the trimming information. After the candidate itemsets have been compared with the database, the trimming information is complete. The occurrence frequencies and the corresponding transactions are then transmitted to the trimming filter, and infrequent items are trimmed from the transactions according to the occurrence frequencies in the trimming information. Then, the hash table building procedure generates (k+1)-itemsets from the trimmed transactions. These (k+1)-itemsets are hashed into the hash table for processing. Next, the candidate generation procedure is also executed by the systolic array. The frequent k-itemsets are fed into the systolic array for comparison with other frequent k-itemsets. The candidate (k+1)-itemsets are generated by the systolic injection and stalling techniques similar to [3]. The candidate pruning procedure uses the hash table to filter out candidate (k+1)-itemsets that cannot possibly be frequent itemsets. Then, the procedure reverts to the support counting procedure: the pruned candidate (k+1)-itemsets are loaded into the systolic array for comparison with the transactions that have already been trimmed. The above five procedures are executed repeatedly until all frequent itemsets have been found.
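For reference, a compact software analogue of one round, with all five procedures inlined, might look as follows. This is a behavioral sketch under simplifying assumptions: `min_sup` is an absolute count, the hash table is a plain Python list indexed by `hash()`, and candidate generation is a simple union-merge rather than the systolic injection used in hardware.

```python
from itertools import combinations

def happi_round(cands, db, k, min_sup, n_buckets=1024):
    """One HAPPI round: support counting, transaction trimming,
    hash table building, candidate generation, and candidate pruning."""
    # 1) Support counting; trimming information is collected in the same pass.
    counts = {c: 0 for c in cands}
    trim_info = [{item: 0 for item in t} for t in db]
    for t, info in zip(db, trim_info):
        for c in cands:
            if c <= t:                           # candidate occurs in the transaction
                counts[c] += 1
                for item in c:
                    info[item] += 1
    freq = {c for c in cands if counts[c] >= min_sup}
    # 2) Transaction trimming (Theorem 2).
    db = [{i for i in t if info[i] >= k} for t, info in zip(db, trim_info)]
    # 3) Hash table building over the (k+1)-itemsets of the trimmed transactions.
    H = [0] * n_buckets
    for t in db:
        for itemset in combinations(sorted(t), k + 1):
            H[hash(itemset) % n_buckets] += 1
    # 4) Candidate generation by merging frequent k-itemsets.
    nxt = {a | b for a in freq for b in freq if len(a | b) == k + 1}
    # 5) Candidate pruning through the hash table.
    nxt = {c for c in nxt if H[hash(tuple(sorted(c))) % n_buckets] >= min_sup}
    return freq, nxt, db

db = [set("ACD"), set("BCE"), set("ABCE"), set("BE")]
c1 = {frozenset([i]) for t in db for i in t}
freq1, cand2, db = happi_round(c1, db, 1, 2)   # freq1 = {A, B, C, E}
# cand2 is typically {AC, BC, BE, CE}, bucket collisions aside.
```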

4.2 Pipeline Design

We observe that the transaction trimming and the hash table building procedures are blocked by the support counting procedure. The transaction trimming procedure has to obtain trimming information to execute the trimming process, but this information is not complete until the support counting procedure has compared all the transactions with all the candidate itemsets. In addition, the hash table building procedure has to get the trimmed transactions from the trimming filter after all the transactions have been trimmed. This problem can be resolved by applying the pipeline scheme, which utilizes the three hardware modules simultaneously in the HAPPI framework. First, we divide the database into $N_{pipe}$ parts. One part of the transactions in the database is streamed into the systolic array, and the support counting process is performed on all candidate itemsets. After comparing these transactions with all the candidate itemsets, the transactions and their trimming information are passed to the trimming filter first, and the systolic array then processes the next group of transactions. After items have been trimmed from a transaction by the trimming filter, the transaction is passed to the hash table filter, as shown in Fig. 6, and the trimming filter can deal with the next transaction. In this way, all the hardware modules can be utilized simultaneously. Although the pipelined architecture improves the system's performance, it increases the computational overhead, because the candidate itemsets must be loaded into the systolic array multiple times. The performance of the pipeline scheme and the improved design of the HAPPI architecture are discussed in Section 4.5.
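The effect of this scheme on the total execution time can be sanity-checked with a small makespan model (our own illustration; the per-part stage times are hypothetical cycle counts):

```python
def pipeline_makespan(stage_times, n_parts):
    """Makespan of n_parts database chunks flowing in order through the
    three modules; each module processes one chunk at a time."""
    free = [0] * len(stage_times)       # time at which each module becomes free
    finish = 0
    for _ in range(n_parts):
        t = 0                           # the chunk enters the pipeline
        for s, duration in enumerate(stage_times):
            t = max(t, free[s]) + duration   # wait for the module, then run
            free[s] = t
        finish = t
    return finish

# Support counting dominates here, so the result matches Case 1 of
# Section 4.5: 30 * 100 + (20 + 60) = 3080, versus 5400 sequentially.
print(pipeline_makespan((100, 20, 60), 30))   # 3080
print(sum(t * 30 for t in (100, 20, 60)))     # 5400
```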

4.3 Transaction Trimming

While the support counting procedure is being executed, the whole database is streamed into the systolic array. However, not all the transactions are useful for generating frequent itemsets. Therefore, we filter out items in the transactions according to Theorem 2 so that the database is reduced. In the HAPPI architecture, the trimming information records the frequency of each item in a transaction that appears in the candidate itemsets. The support counting and trimming information collecting operations are similar, since they both need to compare candidate itemsets with transactions. Therefore, in addition to the transactions in the database, their corresponding trimming information is also fed into the systolic array in another pipe while the support counting process is being executed. As shown in Fig. 7, a trimming vector is embedded in each hardware cell of the systolic array to record the items that match the candidate itemsets.


Fig. 5. The procedure flow of one round.

Fig. 6. A diagram of the pipeline procedures.


The ith flag in the trimming vector is set to true if the ith item in the transaction matches the candidate itemset. After comparing the candidate itemset with all the items in a transaction, if the candidate itemset is a subset of the transaction, the incoming corresponding trimming information is accumulated according to the trimming vector. Since transactions and trimming information are input in different pipes, support counters and trimming information can be updated simultaneously in a hardware cell.

In Fig. 7a, the candidate itemset <BC> is stored in the candidate memory, and a transaction {A, B, C, D, E} is about to be fed into the cell. The resultant trimming vector after comparing <BC> with all the items in the transaction is shown in Fig. 7b. Because items B and C match the candidate itemset, the trimming vector becomes <0, 1, 1, 0, 0>. Meanwhile, the corresponding trimming information is fed into the trimming register, and the trimming information is updated from <0, 1, 1, 0, 1> to <0, 2, 2, 0, 1>.
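The per-cell behavior just described can be modeled in a few lines; the function below is a software stand-in for one hardware cell handling the two pipes.

```python
def cell_pass(candidate, transaction, trim_info):
    """Compare one candidate itemset against a streamed transaction, then
    accumulate the trimming information arriving on the second pipe."""
    vec = [int(item in candidate) for item in transaction]   # trimming vector
    if sum(vec) == len(candidate):      # candidate is a subset of the transaction
        trim_info = [f + v for f, v in zip(trim_info, vec)]
    return trim_info

# The Fig. 7 example: candidate <BC> against transaction {A, B, C, D, E}.
print(cell_pass("BC", "ABCDE", [0, 1, 1, 0, 1]))   # -> [0, 2, 2, 0, 1]
```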

After passing through the systolic array, transactions and their corresponding trimming information are passed to the trimming filter. The filter trims off items whose frequencies are less than k. As the example in Fig. 8 shows, the trimming information of the transaction {A, B, C, D, E} is <2, 2, 2, 1, 2> and the current k is 2. Therefore, item D should be trimmed, and the new transaction becomes {A, B, C, E}. In this way, the size of the database can be reduced. The trimmed transactions are sent to the hash table filter module for hash table building.

4.4 Hash Table Filtering

To build a hardware hash table filter, we use a hash value generator and a hash table updating module. The former generates all the k-itemset combinations of the transactions and puts the k-itemsets into the hash function to create the corresponding hash values. As shown in Fig. 9, the hash value generator comprises a transaction memory, a state machine, an index array, and a hash function. The transaction memory stores all the items of a transaction. The state machine is the controller that generates control signals of different lengths (k = 2, 3, ...) flexibly. Then, the control signals are fed into the index array. To generate a k-itemset, the first k entries in the index array are utilized. The values in the index array are the indices of the transaction memory. The item selected by the ith entry of the index array is the ith item in a k-itemset. By changing the values in the index array, the state machine can generate different combinations of k-itemsets from the transaction.

The procedure starts by loading a transaction into the transaction memory. Then, the values in the index array are reset, and the state machine starts to generate control signals. The values in the index array are changed by the different states. Each item in the generated itemset is passed to the hash function through the multiplexer. The hash function takes some bits from the incoming k-itemsets to calculate the hash values.


Fig. 7. An example of streaming a transaction and the corresponding trimming information into the cell. (a) Stream a transaction into the cell. (b) Stream trimming information into the cell.

Fig. 8. The trimming filter.

Fig. 9. The hash value generator.


Consider the example in Fig. 9. We assume the current k is 3, so the first three entries in the index array are used. The transaction {A, C, E, F, G} is loaded into the transaction memory. The values in the index array are initialized to 0, 1, and 2, respectively, so that the first itemset generated is <ACE>. Then, the state machine changes the values in the index array. The following values in the index array will be <0, 1, 3>, <0, 1, 4>, <0, 2, 3>, <0, 2, 4>, to name a few. Therefore, the corresponding itemsets are <ACF>, <ACG>, <AEF>, <AEG>, and so on.
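A software model of this state machine (our own sketch) advances an index array through all k-subsets of the transaction memory in exactly the order of the example above:

```python
def itemset_combinations(transaction, k):
    """Yield the k-itemsets of a transaction by walking an index array in
    lexicographic order, mimicking the hardware state machine."""
    n = len(transaction)
    idx = list(range(k))                         # index array reset to 0, 1, ..., k-1
    while True:
        yield tuple(transaction[i] for i in idx)
        for pos in reversed(range(k)):           # rightmost index that can advance
            if idx[pos] < n - (k - pos):
                idx[pos] += 1
                for q in range(pos + 1, k):      # reset the indices to its right
                    idx[q] = idx[q - 1] + 1
                break
        else:
            return                               # all combinations generated

print(["".join(s) for s in itemset_combinations("ACEFG", 3)])
# ['ACE', 'ACF', 'ACG', 'AEF', 'AEG', 'AFG', 'CEF', 'CEG', 'CFG', 'EFG']
```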

The hash values generated by the hash value generator are passed to the hash table updating module. To speed up the process of hash table building, we utilize $N_{parallel}$ hash value generators so that the hash values can be generated simultaneously. In addition, the hash table is divided into several parts to increase the throughput of hash table building. Each part of the hash table contains a range of hash values, and the controller passes each incoming hash value to the buffer it belongs to. These hash values are taken as indexes of the hash table to accumulate the values in the table, as shown in Fig. 10. There are four parallel hash value generators. The size of the whole hash table is 65,536, and it is divided into four parts; thus, the range of each part is 16,384. If the incoming hash value is 5, it belongs to the first part of the hash table, and the controller passes the value to buffer 1. If there are parallel accesses to the hash table at the same time, only one access can be executed; the others are delayed and handled as soon as possible. The delayed itemsets are stored in the buffer temporarily, and whenever the access port of the hash table is free, they are put into the hash table.

After all the candidate k-itemsets have been generated, they are pruned by the hash table filter. Each candidate itemset is hashed by the hash function. By querying the number of itemsets in the bucket with the corresponding hash value, the candidate itemset is pruned if the number of itemsets in that bucket does not meet the minimum support criterion. Therefore, the number of candidate itemsets can be reduced effectively with the help of the hash table filter.

4.5 Performance Analysis

In this section, we derive some properties of our system with and without the pipeline scheme to investigate the total execution time. Suppose the number of candidate k-itemsets is $N_{cand\text{-}k}$ and the number of frequent k-itemsets is $N_{freq\text{-}k}$. There are $N_{cell}$ hardware cells in the systolic array. $|T|$ represents the average number of items in a transaction, and $|D|$ is the total number of items in the database. As shown in Fig. 5, the time needed to find frequent k-itemsets and candidate (k+1)-itemsets includes the time required for support counting, transaction trimming, hash table building, candidate generation, and candidate pruning.

First, the execution time of the support counting procedure is related to the number of times the candidate itemsets and the database are loaded into the systolic array. That is, if $N_{cand\text{-}k}$ is larger than $N_{cell}$, the candidate itemsets and the database must be input into the systolic array $\lceil N_{cand\text{-}k}/N_{cell} \rceil$ times. Each time, at most $N_{cell}$ candidate itemsets are loaded into the systolic array, so the number of items in these candidate k-itemsets is at most $k \cdot N_{cell}$. In addition, all items in the database need $|D|$ cycles to be streamed into the systolic array. Therefore, the execution cycle count of the support counting procedure is at most

$$t_{sup} = \lceil N_{cand\text{-}k}/N_{cell} \rceil \times (k \cdot N_{cell} + |D|).$$

Second, the transaction trimming procedure eliminates infrequent items and receives incoming items at the same time. A transaction item and the corresponding trimming information are fed into the trimming filter during each cycle. After the whole database has been passed through the trimming filter, the transaction trimming procedure is finished. Thus, the execution cycle count depends on the number of items in the database:

$$t_{trim} = |D|.$$

Third, the hash table building procedure consists of the hash value generation and the hash table updating processes. Because the processes can be executed simultaneously, the execution time is based on the process that generates the hash values. The execution time of hash value generation consists of the time taken by transaction loading and by hash value generation from transactions. The overall transaction loading time is $|D|$ cycles. In addition, there are $|T|$ items in a transaction on average. Thus, the number of (k+1)-itemset combinations from a transaction is $C^{|T|}_{k+1}$, and each (k+1)-itemset requires $(k+1)$ cycles to be generated. The average number of transactions in the database is $\frac{|D|}{|T|}$. Therefore, the execution time of hash value generation from the whole database is $(k+1) \times C^{|T|}_{k+1} \times \frac{|D|}{|T|}$ cycles. Because we have designed a parallel architecture, the procedure can be executed by $N_{parallel}$ hardware modules simultaneously. The execution cycle count of the hash table building procedure is

$$t_{hash} = \left( |D| + (k+1) \times C^{|T|}_{k+1} \times \frac{|D|}{|T|} \right) \times \frac{1}{N_{parallel}}.$$

The fourth procedure is candidate generation. Frequent k-itemsets are compared with other frequent k-itemsets to generate candidate (k+1)-itemsets. The execution time of candidate generation consists of the time required to load the frequent k-itemsets (at most $k \cdot N_{cell}$ items each time), the time taken to pass the frequent k-itemsets ($k \cdot N_{freq\text{-}k}$ items) through the systolic array, and the time needed to generate the candidate (k+1)-itemsets ($N_{cand\text{-}(k+1)}$). Similar to the support counting procedure, if $N_{freq\text{-}k}$ is larger than $N_{cell}$, the frequent k-itemsets have to be separated into several groups and the comparison process has to be executed several times. Thus, the execution cycle count is at most

$$t_{candidate\ generation} = \lceil N_{freq\text{-}k}/N_{cell} \rceil \times (k \cdot N_{cell} + k \cdot N_{freq\text{-}k}) + N_{cand\text{-}(k+1)}.$$

Fig. 10. The parallel hash table building module.
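As a software stand-in for this step, candidate generation can be sketched as the usual Apriori-style prefix join over sorted frequent k-itemsets; the hardware obtains the same candidates with systolic injection and stalling, and the subset-based pruning is delegated to the hash table filter.

```python
def generate_candidates(freq_k, k):
    """Join frequent k-itemsets sharing a (k-1)-prefix, as in Apriori [2].
    Itemsets are represented as sorted tuples."""
    freq = sorted(freq_k)
    cands = []
    for i, a in enumerate(freq):
        for b in freq[i + 1:]:
            if a[:k - 1] != b[:k - 1]:    # prefixes diverge: no more joins for a
                break
            cands.append(a + b[k - 1:])   # merge on the common (k-1)-prefix
    return cands

print(generate_candidates([('A','C'), ('B','C'), ('B','E'), ('C','E')], 2))
# -> [('B', 'C', 'E')]
```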

Finally, the candidate pruning procedure has to hash the candidate (k+1)-itemsets and query the hash table $H_{k+1}$. Each (k+1)-itemset requires $(k+1)$ cycles to be hashed, and the hash table can accept one input query during each cycle. Thus, the execution time needed to hash the candidate (k+1)-itemsets ($(k+1) \times N_{cand\text{-}(k+1)}$ cycles) and query the hash table ($N_{cand\text{-}(k+1)}$ cycles) is

$$t_{candidate\ pruning} = (k+1) \times N_{cand\text{-}(k+1)} + N_{cand\text{-}(k+1)}.$$

Since the size of the database that we consider is much larger than the number of candidate k-itemsets, we can neglect the execution time of candidate generation and pruning. Therefore, the time required for one round of the sequential execution, $t_{seq}$, is the sum of the time taken by the support counting, transaction trimming, and hash table building procedures, as shown in Property 1.

Property 1. $t_{seq} = t_{sup} + t_{trim} + t_{hash}$.

The pipeline scheme incorporated in the HAPPI architecture divides the database into $N_{pipe}$ parts and inputs them into the three modules. However, this scheme causes some overhead, $t_{overhead}$, because the candidate itemsets are reloaded multiple times in the support counting procedure:

$$t_{overhead} = \lceil N_{cand\text{-}k}/N_{cell} \rceil \times (k \cdot N_{cell}) \times N_{pipe}.$$

Therefore, the support counting procedure has to take $t_{overhead}$ into account, and its execution time in the pipeline scheme becomes

$$t'_{sup} = t_{sup} + t_{overhead}.$$

The execution time of the pipeline scheme, $t_{pipe}$, is analyzed according to the following two cases.

Case 1. If the execution time of the support counting procedure is longer than that of the hash table building procedure, the other procedures finish their operations before the support counting procedure does. However, the transaction trimming and hash table building procedures have to wait for the last part of the data from the support counting procedure. Therefore, the total execution time is $t'_{sup}$ plus the time required to process the last part of the database with the trimming filter and the hash table filter:

$$t_{pipe} = t'_{sup} + (t_{trim} + t_{hash}) \times \frac{1}{N_{pipe}}.$$

Case 2. If the execution time of the support counting procedure is less than that of the hash table building procedure, the other procedures are completed before the hash table building procedure. Since the hash table building procedure has to wait for the data from the support counting and transaction trimming procedures, the total execution time is $t_{hash}$ plus the time required to process the first part of the database with the systolic array and the trimming filter:

$$t_{pipe} = (t'_{sup} + t_{trim}) \times \frac{1}{N_{pipe}} + t_{hash}.$$

Summarizing the above two cases, the execution time $t_{pipe}$ can be presented as Property 2.

Property 2. $t_{pipe} = \max(t'_{sup}, t_{hash}) + \min(t'_{sup}, t_{hash}) \times \frac{1}{N_{pipe}} + t_{trim} \times \frac{1}{N_{pipe}}$.

To achieve the minimal value of $t_{pipe}$, we consider the following two cases:

1. If $t'_{sup}$ is larger than $t_{hash}$, the execution time $t_{pipe}$ is dominated by $t'_{sup}$. To decrease the value of $t'_{sup}$, we can increase the number of hardware cells in the systolic array until $t'_{sup}$ is equal to $t_{hash}$. Therefore, the optimal value of $N_{pipe}$ that reaches the minimal $t_{pipe}$ is

$$N_{pipe} = \sqrt{\frac{1}{\lceil N_{cand\text{-}k}/N_{cell} \rceil \times k \cdot N_{cell}} \times (t_{trim} + t_{hash})}.$$

2. If $t'_{sup}$ is smaller than $t_{hash}$, the execution time $t_{pipe}$ is mainly taken up by $t_{hash}$. To decrease the value of $t_{hash}$, we can increase $N_{parallel}$ until $t_{hash}$ is equal to $t'_{sup}$. As a result, the optimal value of $N_{pipe}$ that achieves the minimal $t_{pipe}$ in this case is

$$N_{pipe} = \frac{1}{\lceil N_{cand\text{-}k}/N_{cell} \rceil \times k \cdot N_{cell}} \times (t_{hash} - t_{sup}).$$

To decide the values of $N_{cell}$ and $N_{parallel}$ in the HAPPI architecture, we have to know the value of $N_{cand\text{-}k}$. However, $N_{cand\text{-}k}$ varies with different values of $k$. Therefore, we decide these values according to our experimental experience. Generally speaking, for many types of data in the real world, the number of candidate 2-itemsets is the largest. Thus, we focus on the case $k = 2$, since its execution time is the largest of all the passes. For a data set with $|T| = 10$ and $|D| = 1$ million, if the minimum support is 0.2 percent, $N_{cand\text{-}k}$ is about 3,000. Assume that there are 500 hardware cells in the systolic array. To accelerate the hash table building procedure, we can increase $N_{parallel}$; based on Property 2, the best value of $N_{parallel}$ is 4. In addition, after the transactions are trimmed by the trimming filter, we can get the current number of items in the database. Also, $N_{cand\text{-}k}$ can be obtained after the candidate k-itemsets are pruned by the hash table filter. Therefore, we can calculate the values of $t_{sup}$ and $t_{hash}$ before starting the support counting and hash table building procedures. Since $t_{overhead}$ is small compared to the size of the database under consideration, $t_{sup}$ can be viewed as $t'_{sup}$. Based on the formulas derived above, we can get the best value of $N_{pipe}$ to minimize $t_{pipe}$. When the support counting procedure is dealing with candidate 2-itemsets and the hash table building procedure is about to build $H_3$, we divide the database into 30 parts, i.e., $N_{pipe} = 30$. By applying the pipeline scheme to these three hardware modules, the hardware utilization increases and the waste due to blocking is reduced.
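The parameter choices in this paragraph can be scripted directly from the formulas above. The sketch below plugs in the quoted raw parameters ($|T| = 10$, $|D| = 10^6$ items, $N_{cand\text{-}k} \approx 3{,}000$, $N_{cell} = 500$, $N_{parallel} = 4$, $k = 2$); note that this raw plug-in yields a much larger Case 2 optimum (about 541) than the operational $N_{pipe} = 30$ quoted above, since the paper evaluates $t_{sup}$ and $t_{hash}$ on the trimmed database and the pruned candidate set obtained at run time.

```python
from math import ceil, comb, sqrt

def stage_cycles(k, n_cand, n_cell, d_items, t_avg, n_parallel):
    """Cycle estimates t_sup, t_trim, t_hash from the Section 4.5 formulas."""
    loads = ceil(n_cand / n_cell)
    t_sup = loads * (k * n_cell + d_items)
    t_trim = d_items
    t_hash = (d_items + (k + 1) * comb(t_avg, k + 1) * d_items // t_avg) // n_parallel
    return t_sup, t_trim, t_hash

def optimal_n_pipe(k, n_cand, n_cell, t_sup, t_trim, t_hash):
    reload_cost = ceil(n_cand / n_cell) * k * n_cell    # overhead per pipeline part
    if t_sup >= t_hash:                                 # Case 1
        return sqrt((t_trim + t_hash) / reload_cost)
    return (t_hash - t_sup) / reload_cost               # Case 2

t_sup, t_trim, t_hash = stage_cycles(2, 3000, 500, 10**6, 10, 4)
print(t_sup, t_trim, t_hash)                            # 6006000 1000000 9250000
print(round(optimal_n_pipe(2, 3000, 500, t_sup, t_trim, t_hash)))  # ~541
```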

5 EXPERIMENT RESULTS

In this section, we conduct several experiments on a number of synthetic data sets to evaluate the performance of the HAPPI architecture. We also implement an approach mainly based on [3], abbreviated as the Direct Comparison (DC) method, for comparison purposes. Although a hardware algorithm was proposed in [4], its performance improvement is found to be much smaller than that of DC, whereas HAPPI outperforms DC by orders of magnitude. Moreover, we implement the software algorithm DHP [16], denoted by SW_DHP, as the baseline. The software algorithm is executed on a PC with a 3-GHz Pentium 4 CPU and 1 Gbyte of RAM.

Both HAPPI and DC are implemented on the Altera Stratix 1S40 FPGA board with a 50-MHz clock rate and 10 Mbytes of SDRAM. The hardware modules are coded in Verilog. We use ModelSim to simulate the Verilog code and verify the functions of our design. In addition, we use the Altera Quartus II IDE to build the hardware modules and synthesize them into hardware circuits. Finally, the hardware circuit image is sent to the FPGA board. There is a synthesized CPU (NIOS II) on the FPGA board, on which we implement a software program to verify the results and record the hardware execution time. First, the program downloads data from the database into the memory on the FPGA; then, the data is fed into the hardware modules. The bandwidth consumed from the memory to the chip is 16 bits/cycle in our design. Since the data is transferred on the bus, the maximum bandwidth is limited by the bandwidth of the bus on the FPGA, which is generally 32 bits/cycle; it can be upgraded to 64 bits/cycle on some modern FPGA boards. After the execution of the hardware modules, we can acquire the outcomes and execution cycles. The following experimental results are based on execution cycles on the FPGA board. In addition, in our hardware design, the critical path is in the hash table building module; this module contains many logic combinations, so the synthesized hardware core is complex, which bounds the maximum clock frequency of our design to 58.6 MHz. Since this exceeds the 50-MHz clock of the Altera FPGA 1S40 board, our design meets the hardware requirement. In our future work, we plan to increase the clock frequency of the hardware architecture by optimizing this bottleneck module.

In the hardware implementation of the HAPPI architecture, $N_{parallel}$ is set to 4, the number of hardware cells in the systolic array is 500, there are 65,536 buckets in the hash table, and $N_{pipe}$ is assigned according to the methodology in Section 4.5. The method used to generate synthetic data is described in Section 5.1. The performance comparison of several schemes in the HAPPI architecture and DC is discussed in Section 5.2. Section 5.3 presents the performance analysis of different distributions of frequent itemsets. Finally, in Section 5.4, the results of some scale-up experiments are discussed.

5.1 Generation of Synthetic Data

To obtain reliable experimental results, we employ methods similar to those used in [16] to generate the synthetic data sets. These data sets are generated with the following parameters: $T$ represents the average number of items in a transaction of the database, $I$ denotes the average length of the maximal potentially frequent itemsets, $D$ is the number of transactions in the database, $L$ is the number of maximal potentially frequent itemsets, and $N$ is the number of different items in the database. Table 1 summarizes the parameters used in our experiments. In the following experimental data sets, $L$ is set to 3,000. To evaluate the performance of HAPPI, we conduct several experiments with different data sets. We use TxIyDz to represent a data set with $T = x$, $I = y$, and $D = z$. In addition, the sensitivity analysis and the scale-up experiments are also explored with different data sets. Note that the y-axis of the following figures shows execution cycles in logarithmic scale.

5.2 Performance Evaluation

Initially, we conduct experiments to evaluate the performance of several schemes in the HAPPI architecture and DC. The testing data sets are T10I4D100K with different numbers of items in the database. The minimum support is set to 0.5 percent. As shown in Fig. 11, the four different schemes are

1. the DC scheme,
2. the systolic array with a trimming filter,
3. the combined scheme made up of the systolic array, the trimming filter, and the hash table filter, and
4. the overall HAPPI architecture with the pipeline design.


TABLE 1. Summary of the Parameters Used

Fig. 11. The execution cycles of several schemes.


With the trimming filter, the execution time improves by about 10 percent to 70 percent compared to DC. As the number of different items in the database increases, the improvement due to the trimming filter increases. The reason is that, if the number of different items grows, the number of infrequent itemsets also increases; therefore, the filtering effect of the trimming filter is more remarkable. Moreover, the combined scheme, with the help of the hash table filter, is about 25-51 times faster in execution cycles, and the HAPPI architecture with the pipeline scheme is 47-122 times better than DC. Note that the main improvement in execution time comes from the hash table filter. We not only implemented an efficient hardware module for the hash table filter in the HAPPI architecture but also designed the pipeline scheme to let the hash table filter work together with the other hardware modules. The pipeline and the parallel design are two helpful properties of the hardware architecture, and we utilize these hardware design techniques to accelerate the overall system. As shown in Fig. 11, although the combined scheme already provides a substantial performance boost, the overall HAPPI architecture with the pipeline design improves the performance further. In summary, the HAPPI architecture outperforms DC, especially when the number of different items in the database is large.

5.3 Sensitivity Analysis

In the second experiment, we generate several data sets with different distributions of frequent itemsets to examine the sensitivity of the HAPPI architecture. The experimental results on several synthetic data sets with various minimum supports are shown in Figs. 12 and 13. The results show that, no matter what combination of the parameters $T$ and $I$ is used, the HAPPI architecture consistently outperforms DC and SW_DHP. Specifically, the execution time of HAPPI is less than that of DC and that of SW_DHP by several orders of magnitude, and the margin grows as the minimum support increases. As Fig. 12 shows, HAPPI has a better performance enhancement ratio over DC on the T5I2D100K data set than on the T20I8D100K data set. The reason is that the hash table filter is especially effective in eliminating infrequent candidate 2-itemsets, as reported in the experimental results of DHP [16]; thus, SW_DHP also performs better on these two data sets. In addition, the execution time $t_{pipe}$ is mainly related to the number of times, $\lceil N_{cand\text{-}k}/N_{cell} \rceil$, that the database is reloaded. Since $N_{cand\text{-}k}$ can be substantially reduced when $k$ is small, the overall performance enhancement is remarkable. Since the average size of the maximal potentially frequent itemsets of the data set T20I8D100K is 8, the performance of the HAPPI architecture is only 2.65 times faster there. However, most data in the real world contains short frequent itemsets, that is, small $T$ and $I$. It is noted that DC achieves only a small enhancement over SW_DHP on short frequent itemsets, while HAPPI performs much better. Therefore, the HAPPI architecture can perform well on real-world data sets.

Fig. 13 demonstrates that the improvement of the HAPPI architecture over DC becomes more noticeable with increasing minimum support. This is because more long itemsets are eliminated with a large minimum support; therefore, the improvement due to the hash table filter increases. In comparison with DC, the overall performance is outstanding.

5.4 Scale-up Experiment

According to the performance analysis in Section 4.5, the execution time $t_{pipe}$ is mainly related to the number of times, $\lceil N_{cand\text{-}k}/N_{cell} \rceil$, that the database is reloaded into the systolic array. Therefore, if $N_{cell}$ increases, less time is needed to stream the database and the overall execution time is shorter. Fig. 14 illustrates the scaling performance of the HAPPI architecture and DC, where the y-axis is also in logarithmic scale. The execution cycles of both HAPPI and DC decrease linearly as the number of hardware cells increases. We observe that HAPPI outperforms DC for different numbers of hardware cells in the systolic array. The most important result is that we utilize only 25 hardware cells in the systolic array but achieve the same performance as the 800 hardware cells used in DC. The benefit of the HAPPI architecture is more computing power at lower cost for data mining in hardware design.

We also conduct experiments with different numbers of transactions in the synthetic data sets to explore the scalability of the HAPPI architecture. The generated data sets are T10I4, and the minimum support is set to 0.5 percent. As shown in Fig. 15, the execution time of HAPPI increases linearly as the number of transactions in the synthetic data sets increases. HAPPI outperforms DC for different numbers of transactions in the database. Furthermore, Fig. 15 shows the good scalability of both HAPPI and DC. This feature is especially important because


Fig. 12. The execution time of data sets with different T and I.

Fig. 13. The execution cycles with various minimum supports.


the size of applications is growing much faster than CPU speeds. Thus, hardware-enhanced data mining techniques are imperative.

6 CONCLUSION

In this work, we have proposed the HAPPI architecture for hardware-enhanced association rule mining. The bottleneck of Apriori-based hardware schemes is related to the number of candidate itemsets and the size of the database. To solve this problem, we apply the pipeline methodology in the HAPPI architecture to compare itemsets with the database and to collect useful information that reduces the number of candidate itemsets and the number of items in the database simultaneously. HAPPI prunes infrequent items in the transactions and reduces the size of the database gradually by utilizing the trimming filter. In addition, HAPPI effectively eliminates infrequent candidate itemsets with the help of the hash table filter. Therefore, the bottleneck of Apriori-based hardware schemes is resolved by the HAPPI architecture. Moreover, we derive some properties to analyze the performance of HAPPI and conduct a sensitivity analysis of various parameters to provide insights into the HAPPI architecture. HAPPI outperforms the previous approach, especially as the number of different items in the database and the minimum support values increase. Also, HAPPI increases computing power and reduces the hardware cost of data mining compared to the previous approach. Furthermore, HAPPI possesses good scalability.

ACKNOWLEDGMENTS

The authors would like to thank Wen-Tsai Liao at Realtek for his helpful comments on improving this paper. The work was supported in part by the National Science Council of Taiwan under Contract NSC93-2752-E-002-006-PAE.

REFERENCES

[1] R. Agarwal, C. Aggarwal, and V. Prasad, “A Tree Projection Algorithm for Generation of Frequent Itemsets,” J. Parallel and Distributed Computing, 2000.

[2] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. 20th Int’l Conf. Very Large Databases (VLDB), 1994.

[3] Z.K. Baker and V.K. Prasanna, “Efficient Hardware Data Mining with the Apriori Algorithm on FPGAs,” Proc. 13th Ann. IEEE Symp. Field-Programmable Custom Computing Machines (FCCM), 2005.

[4] Z.K. Baker and V.K. Prasanna, “An Architecture for Efficient Hardware Data Mining Using Reconfigurable Computing Systems,” Proc. 14th Ann. IEEE Symp. Field-Programmable Custom Computing Machines (FCCM ’06), pp. 67-75, Apr. 2006.

[5] C. Besemann and A. Denton, “Integration of Profile Hidden Markov Model Output into Association Rule Mining,” Proc. 11th ACM SIGKDD Int’l Conf. Knowledge Discovery in Data Mining (KDD ’05), pp. 538-543, 2005.

[6] C.W. Chen, J. Luo, and K.J. Parker, “Image Segmentation via Adaptive K-Mean Clustering and Knowledge-Based Morphological Operations with Biomedical Applications,” IEEE Trans. Image Processing, vol. 7, no. 12, pp. 1673-1683, 1998.

[7] S.M. Chung and C. Luo, “Parallel Mining of Maximal Frequent Itemsets from Databases,” Proc. 15th IEEE Int’l Conf. Tools with Artificial Intelligence (ICTAI), 2003.

[8] S. Cong, J. Han, J. Hoeflinger, and D. Padua, “A Sampling-Based Framework for Parallel Data Mining,” Proc. 10th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP ’05), June 2005.

[9] M. Estlick, M. Leeser, J. Szymanski, and J. Theiler, “Algorithmic Transformations in the Implementation of K-Means Clustering on Reconfigurable Hardware,” Proc. Ninth Ann. IEEE Symp. Field-Programmable Custom Computing Machines (FCCM), 2001.

[10] M. Gokhale, J. Frigo, K. McCabe, J. Theiler, C. Wolinski, and D. Lavenier, “Experience with a Hybrid Processor: K-Means Clustering,” J. Supercomputing, pp. 131-148, 2003.

[11] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, 2001.

[12] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. ACM SIGMOD ’00, pp. 1-12, May 2000.

[13] H. Kung and C. Leiserson, “Systolic Arrays for VLSI,” Proc. Sparse Matrix, 1976.

[14] N. Ling and M. Bayoumi, Specification and Verification of Systolic Arrays. World Scientific Publishing, 1999.

[15] W.-C. Liu, K.-H. Liu, and M.-S. Chen, “High Performance Data Stream Processing on a Novel Hardware Enhanced Framework,” Proc. 10th Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD ’06), Apr. 2006.

[16] J.S. Park, M.-S. Chen, and P.S. Yu, “An Effective Hash Based Algorithm for Mining Association Rules,” Proc. ACM SIGMOD ’95, pp. 175-186, May 1995.

[17] J.S. Park, M.-S. Chen, and P.S. Yu, “Using a Hash-Based Method with Transaction Trimming for Mining Association Rules,” IEEE Trans. Knowledge and Data Eng., vol. 9, no. 5, pp. 813-825, Sept./Oct. 1997.

[18] A. Savasere, E. Omiecinski, and S. Navathe, “An Efficient Algorithm for Mining Association Rules in Large Databases,” Proc. 21st Int’l Conf. Very Large Databases (VLDB ’95), pp. 432-444, Sept. 1995.

[19] H. Toivonen, “Sampling Large Databases for Association Rules,” Proc. 22nd Int’l Conf. Very Large Databases (VLDB ’96), pp. 134-145, 1996.


Fig. 14. The execution cycles with different numbers of hardware units.

Fig. 15. The execution cycles with various numbers of transactions (K).


[20] C. Wolinski, M. Gokhale, and K. McCabe, “A Reconfigurable Computing Fabric,” Proc. Int’l Conf. Eng. of Reconfigurable Systems and Algorithms (ERSA), 2004.

Ying-Hsiang Wen received the BS degree in computer science from the National Chiao Tung University and the MS degree in electrical engineering from the National Taiwan University, Taipei, in 2006. His research interests include data mining, video streaming, and multimedia SoC design.

Jen-Wei Huang received the BS degree in electrical engineering from the National Taiwan University, Taipei, in 2002, where he is currently working toward the PhD degree in computer science. He is familiar with the data mining area. His research interests include data mining, mobile computing, and bioinformatics. Among these, web mining, incremental mining, mining data streams, time series issues, and sequential pattern mining are his special interests. In addition, some of his research is on mining general temporal association rules, sequential clustering, data broadcasting, progressive sequential pattern mining, and bioinformatics.

Ming-Syan Chen received the BS degree in electrical engineering from the National Taiwan University, Taipei, and the MS and PhD degrees in computer, information, and control engineering from the University of Michigan, Ann Arbor, in 1985 and 1988, respectively. He was the chairman of the Graduate Institute of Communication Engineering (GICE), National Taiwan University, from 2003 to 2006. He is currently a distinguished professor jointly appointed by the Electrical Engineering Department, the Computer Science and Information Engineering Department, and GICE, National Taiwan University. He was a research staff member at IBM T.J. Watson Research Center, New York, from 1988 to 1996. He served as an associate editor of the IEEE Transactions on Knowledge and Data Engineering from 1997 to 2001 and is currently on the editorial board of the Very Large Data Base (VLDB) Journal and Knowledge and Information Systems. His research interests include database systems, data mining, mobile computing systems, and multimedia networking. He has published more than 240 papers in his research areas. He is a recipient of the National Science Council (NSC) Distinguished Research Award, the Pan Wen Yuan Distinguished Research Award, the Teco Award, the Honorary Medal of Information, and the K.-T. Li Research Breakthrough Award for his research work, as well as the IBM Outstanding Innovation Award for his contribution to a major database product. He also received numerous awards for his teaching, inventions, and patent applications. He is a fellow of the ACM and the IEEE.

