predicting missing items in shopping carts

14
Knowledge and Data Engineering IEEE 2009 Predicting Missing Items in Shopping Carts Abstract Existing research in association with mining has focused mainly on how to expedite the search for frequently co- occurring groups of items in “shopping cart” type of transactions; less attention has been paid to methods that exploit these “frequent itemsets”for prediction purposes. This project contributes to the latter task by proposing a technique that uses partial information about the contents of a shopping cart for the prediction of what else the customer is likely to buy. Using the recently proposed data structure of item set trees (IT-trees), we obtain, in a computationally efficient manner, all rules whose antecedents contain at least one item from the incomplete shopping cart. This project combine these rules by uncertainty processing techniques, including the classical Bayesian decision theory and a new algorithm based on the Dempster-Shafer (DS) theory of evidence combination. Existing system: Existing research in association mining has focused mainly on how to expedite the search for frequently co-occurring groups of items in “shopping cart” type of transactions; less attention has been paid to methods that exploit these “frequent itemsets”for prediction purposes. Existing system Disadvantages:

Upload: vamsikaza1

Post on 02-Apr-2015

968 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Predicting Missing Items in Shopping Carts

Knowledge and Data EngineeringIEEE 2009

Predicting Missing Items in Shopping Carts

Abstract

Existing research in association with mining has focused mainly on how to expedite the

search for frequently co-occurring groups of items in “shopping cart” type of

transactions; less attention has been paid to methods that exploit these “frequent

itemsets”for prediction purposes. This project contributes to the latter task by proposing a

technique that uses partial information about the contents of a shopping cart for the

prediction of what else the customer is likely to buy. Using the recently proposed data

structure of item set trees (IT-trees), we obtain, in a computationally efficient manner, all

rules whose antecedents contain at least one item from the incomplete shopping cart. This

project combine these rules by uncertainty processing techniques, including the classical

Bayesian decision theory and a new algorithm based on the Dempster-Shafer (DS) theory

of evidence combination.

Existing system:

Existing research in association mining has focused mainly on how to expedite the search

for frequently co-occurring groups of items in “shopping cart” type of transactions; less

attention has been paid to methods that exploit these “frequent itemsets”for prediction

purposes.

Existing system Disadvantages:

They didn’t find missing items in frequently used item set.

Couldn’t find number of users per item set.

Time complexity

Lack of viewing items to the user.

Proposed system:

Finding missing items using apriority algorithms in frequently used item set.

Counting number of users per item set.

Calculating total number of visitor’s in our websites.

Advantages of Proposed system:

• Reducing time complexity. User can easily view the items set. *

• Missing items can easily find in the item set.

Page 2: Predicting Missing Items in Shopping Carts

Knowledge and Data EngineeringIEEE 2009

Introduction:

The primary task of association mining is to detect frequently co-occurring groups of

items in transactional databases. The intention is to use this knowledge for prediction

purposes: if bread, butter, and milk often appear in the same transactions, then the

presence of butter and milk in a shopping cart suggests that the customer may also buy

bread. More generally, knowing which items a shopping cart contains, we want to predict

other items that the customer is likely to add before proceeding to the checkout counter.

This paradigm can be exploited in diverse applications. For example, in the domain

discussed in each “shopping cart” contained a set of hyperlinks pointing to a Web page in

medical applications, the shopping cart may contain a patient’s symptoms, results of lab

tests, and diagnoses; in a financial domain, the cart may contain companies held in the

same portfolio; and Bollmann-Sdorra et al proposed a framework that employs frequent

itemsets in the field of information retrieval. In all these databases, prediction of

unknown items can play a very important role. For instance, a patient’s symptoms are

rarely due to a single cause; two or more diseases usually conspire to make the person

sick. Having identified one, the physician tends to focus on how to treat this single

disorder, ignoring others that can meanwhile deteriorate the patient’s condition. Such

unintentional neglect can be prevented by subjecting the patient to all possible lab tests.

However, the number of tests can undergo is limited by such practical factors as time,

costs, and the patient’s discomfort. A decision-support system advising a medical doctor

about which other diseases may accompany the ones already diagnosed can help in the

selection of the most relevant additional tests. The prediction task was mentioned as early

as in the pioneering association mining paper by Agrawal et al., but the problem is yet to

be investigated in the depth it deserves. The literature survey in indicates that most

authors have focused on methods to expedite the search for frequent itemsets, while

others have investigated such special aspects as the search for time-varying associations

or the identification of localized patterns. Still, some prediction-related work has been

done as well. An early attempt by Bayardo and Agrawal reports a method to convert

frequent itemsets to rules. Some papers then suggest that a selected item can be treated as

a binary class (absence! 0; presence! 1) whose value can be predicted by such rules. A

user asks: does the current status of the shopping cart suggest that the customer will buy

Page 3: Predicting Missing Items in Shopping Carts

Knowledge and Data EngineeringIEEE 2009

bread? If yes, how reliable is this prediction? Early attempts achieved promising results

and some authors even observed that the classification performance of association mining

systems may compare favorably with that of machine-learning techniques. In our work,

we wanted to make the next logical step by allowing any item to be treated as a class

label its value is to be predicted based on the presence or absence of other items. Put

another way, knowing a subset of the shopping cart’s contents, we want to “guess”

(predict) the rest. Suppose the shopping cart of a customer at the checkout counter

contains bread, butter, milk, cheese, and pudding. Could someone who met the same

customer when the cart contained only bread, butter, and milk, have predicted that the

person would add cheese and pudding? Implicitly or explicitly, this task stood at the

cradle of this field in the 1990s; now that many practical obstacles (e.g., computational

costs) have been reduced, we want to return to it. It is important to understand that

allowing any item to be treated as a class label presents serious challenges as compared

with the case of just a single class label. The number of different items can be very high,

perhaps hundreds, or thousand, or even more. To generate association rules for each of

them separately would give rise to great many rules with two obvious consequences: first,

the memory space occupied by these rules can be many times larger than the original

database (because of the task’s combinatorial nature); second, identifying the most

relevant rules and combining their sometimes conflicting predictions may easily incur

prohibitive computational costs. In our work, we sought to solve both of these problems

by developing a technique that answers user’s queries (for shopping cart completion) in a

way that is acceptable not only in terms of accuracy, but also in terms of time and space

complexity.

Software Requirements

Operating system :- Windows XP Professional

Front End : - VS.NET 2008

Coding Language :- Visual C# .Net

Back-End : - Sql Server 2005

Page 4: Predicting Missing Items in Shopping Carts

Knowledge and Data EngineeringIEEE 2009

Data Flow Diagrams:

Data flow diagrams:

An Overview of Proposed System:

Item Set Tree Construction:

For the prediction of all missing items in a shopping cart, our algorithm speeds up the

computation by the use of the itemset trees (IT-trees) and then uses DS theoretic notions

to combine the generated rules

Page 5: Predicting Missing Items in Shopping Carts

Knowledge and Data EngineeringIEEE 2009

Modules:

Admin

Login

Upload Shopping Cart Items to the user pages

User

Registration

Login

Access Shopping Cart Items

Frequent Item Set Generation

Prediction of Missing items

Use Case Admin:

Page 6: Predicting Missing Items in Shopping Carts

Knowledge and Data EngineeringIEEE 2009

Use Case User:

Sequence Admin:

Page 7: Predicting Missing Items in Shopping Carts

Knowledge and Data EngineeringIEEE 2009

Collaboration Admin:

Page 8: Predicting Missing Items in Shopping Carts

Knowledge and Data EngineeringIEEE 2009

Sequence User:

Collaboration User:

Page 9: Predicting Missing Items in Shopping Carts

Knowledge and Data EngineeringIEEE 2009

Inputs:

• Data Set on the Web [Details of items in shopping cart]

Data collection on client, server sides and anywhere in between

Goal determine who is purchasing what products

Tracking customer data

Web logs, E-Commerce logs, cookies, explicit login

Data then used to provide personalized content to site users to:

Assist customers in locating their target selections

“Encourage” customers to make certain selections

• Automated Recommender Systems

• Networks and Recommendations

• Web Path Analysis for Purchase Prediction

Conclusion:

The mechanism reported in this paper focuses on one of the oldest tasks in association

mining: based on incomplete information about the contents of a shopping cart, can we

predict which other items the shopping cart contains? Our literature survey indicates that,

while some of the recently published systems can be used to this end, their practical

Page 10: Predicting Missing Items in Shopping Carts

Knowledge and Data EngineeringIEEE 2009

utility is constrained, for instance, by being limited to domains with very few distinct

items. Bayesian classifier can be used too, but we are not aware of any systematic study

of how it might operate under the diverse circumstances encountered in association

mining. We refer to our technique by the acronym DS-ARM. The underlying idea is

simple: when presented with an incomplete list s of items in a shopping cart, our program

first identifies all high-support, high-confidence rules that have as antecedent a subset of

s. Then, it combines the consequents of all these (sometimes conflicting) rules and

creates a set of items most likely to complete the shopping cart. Two major problems

complicate the task: first, how to identify the relevant rules in a computationally efficient

manner; second, how to combine (and quantify) the evidence of conflicting rules. We

addressed the former issue by the recently proposed technique of IT-trees and the latter

by a few simple ideas from the DS theory. Our experimental results are promising: DS-

ARM compares favorably with the Bayesian approach and outperforms more traditional

approaches even in domains designed in a manner meant to be “tailored” to them. In

Particular, DS-ARM performs well in applications where the older approaches incur

intractable computational costs (e.g., if there are many distinct items). Besides the

encouraging results, our experiments have also identified ample room for further

improvements. As indicated in Experiments 3 and 4, the computational costs of DS-ARM

still grow very fast with the average length of the transactions and with the number of

distinct items—in real world applications, this can become a serious issue. Also our

implementation of the Bayesian classifier can perhaps be found suboptimal. Finally,

completely different approaches (beyond Bayesian classification and DS theory) should

be explored—a research strand that we strongly advocate. As a matter of fact, to attract

the scientific community’s attention to all these issues was one of our major motives in

writing this paper.