[ieee 22nd international conference on data engineering (icde'06) - atlanta, ga, usa...

Download [IEEE 22nd International Conference on Data Engineering (ICDE'06) - Atlanta, GA, USA (2006.04.3-2006.04.7)] 22nd International Conference on Data Engineering (ICDE'06) - Mining Actionable Patterns by Role Models

Post on 10-Mar-2017




0 download

Embed Size (px)


  • Mining Actionable Patterns by Role Models

    Ke WangSimon Fraser University


    Yuelong JiangSimon Fraser University


    Alexander TuzhilinNew York Universityatuzhili@stern.nyu.edu


    Data mining promises to discover valid and potentiallyuseful patterns in data. Often, discovered patterns are notuseful to the user. Actionability addresses this problem inthat a pattern is deemed actionable if the user can act uponit in her favor. We introduce the notion of action as adomain-independent way to model the domain knowledge.Given a data set about actionable features and an utilitymeasure, a pattern is actionable if it summarizes a popu-lation that can be acted upon towards a more promisingpopulation observed with a higher utility. We present sev-eral pruning strategies taking into account the actionabilityrequirement to reduce the search space, and algorithms formining all actionable patterns as well as mining the top kactionable patterns. We evaluate the usefulness of patternsand the focus of search on a real-world application domain.

    1 Introduction

    1.1 Background

    Knowledge discovery in databases is the non-trivial pro-cess of identifying valid, novel, potentially useful, and ul-timately understandable patterns in data [4]. The literaturehas mostly focussed on the valid and understandablemeasures that can be defined in objective terms, such as theconfidence/support measure [2] and the size/number of pat-terns. Due to the lack of modeling user knowledge, dis-covered patterns often are known or not useful to the user.To model the novel and useful aspects, subjective mea-sures are needed [7, 10]. [10] classified subjective measuresinto unexpectedness and actionability. A pattern is unex-pected if it is surprising to the user. A pattern is actionableif the user can act upon it to her advantage.

    Despite the efforts of initial works on actionability min-ing (more details in Related Work), it turned out to be veryhard to capture formally the elusive nature of actionabil-ity in all of its manifestations and across different types of

    applications and settings. The following factors contributeto the difficulties of this problem. First, there is the dilemmathat user knowledge should be acquired to define actionabil-ity, but the ability to provide such knowledge tends to inval-idate the need for mining actionable patterns. Second, it isvery hard to model the usefulness of a pattern. Strictlyspeaking, this information is not available until after pat-terns are deployed in the real world; on the other hand, thedata mining algorithm needs to tell if a pattern is actionablebefore it is deployed. Finally, the search space is likely tobe large and a scalable method must exploit the notion ofactionability to prune the search as early as possible.

    1.2 Our Contributions

    We propose an actionability mining approach that ad-dresses three goals. (1) Find the right balance between pro-viding enough domain knowledge and making the problemscalable, tractable and non-trivial. (2) Model actionabilityin a domain-independent manner and provide a way to es-timate the success of actionable patterns. (3) Utilize userknowledge early in the search of actionable patterns. Themain idea is as follows. Suppose that a database containshistorical observations about several features and a successmeasure called the utility. Also, suppose that we know someactions can influence features in certain (simple) ways. Ifsome features are correlated with the utility in some pop-ulation of the data, a change in those features would im-ply a change in the utility. Now if some actions can influ-ence those features, this influence will cascade to the utilitythrough the correlation. We are interested in the patternsthat summarize those actions and populations where suchcascaded influences increase the utility. An example ex-plains these points.

    Example 1.1 (Phone call charge example) Considera customer database in the long distance phone callapplication:

    C(CustID,Rate,Married, , U).U is the utility representing the profit generated on a cus-tomer. For simplicity, U has the domain {Low,High}. The

    Proceedings of the 22nd International Conference on Data Engineering (ICDE06) 8-7695-2570-9/06 $20.00 2006 IEEE

  • feature Rate represents the rate charged to a customer andhas the domain {Normal, Special}, with Normal denot-ing the normal rate and Special denoting the promotionalrate. Suppose that we observed the following two patternsin the data

    D : Rate = Normal,Married = No U = Low,D : Rate = Special,Married = No U = High.

    The customers in D generated a higher profit because theymade more calls at the lower rate. Comparing these twopatterns reveals that the telephone company can make moreprofit by taking the action of offering the special rate to thecustomers in D. The increase is expected because a higherprofit was observed on the customers (i.e., D) who sharedsimilar characteristics (i.e., unmarried) but were offered thespecial rate.

    The example conveys several points. (1) In many real lifeapplications, the user knows some simple action/feature re-lationship (e.g., offering special rate can change Rate fromNormal to Special). (2) The user is interested in find-ing the action/utility relationship, i.e., how actions wouldaffect the utility. Since several actions and features couldbe involved in a complex relationship to affect the utility,finding such relationships requires examining both the ac-tion knowledge and the data, therefore, is non-trivial. (3)A pattern is actionable if it summarizes a population thatcan be affected, by taking actions, toward another popula-tion observed with a higher utility. In a sense, the secondpopulation serves the role model of the first one.

    Can this problem be solved by a standard method, suchas a classification algorithm to learn actions that influencethe utility? Unfortunately, standard learning methods can-not easily incorporate the action knowledge. Without thisknowledge, these methods can learn the rules that summa-rize the data, but not the rules that change the state of thedata. This comment equally applies to association rule min-ing [2] and other rules learnt purely from the data. In prin-ciple the actionability problem is a learning problem, butthe presence of action knowledge and the focus on changesmake it a whole new learning problem that standard meth-ods are not designed to handle.

    The contributions of this paper are as follows. We intro-duce the notion of action as a domain-independent wayto model some important domain knowledge. An action de-scribes the condition under which it applies and the effectit has on a feature. We formulate a new data mining prob-lem where actions are the first class objects and actionablepatterns are summarizations of opportunities for actions toboost the utility. We show that mining actionable patternsis at least as hard as mining frequent itemsets of [2]. Wepresent several pruning strategies taking into account the ac-tionability requirement to reduce the search space, and algo-rithms for mining all actionable patterns as well as mining

    the top k actionable patterns. We evaluate the usefulness ofpatterns and the focus of search on a real-world applicationdomain.

    In the rest of the paper, we review related work in Section2, introduce our action model in Section 3, define the action-ability problem in Section 4, present mining algorithms inSection 5, and evaluate our approach in Section 6. Finallywe conclude the paper.

    2 Related Work

    Some initial work has been done on actionability min-ing. [7] defined actionability of a key finding (pattern)in terms of the estimated benefits in taking corrective ac-tion(s) that restores the deviation back to its norm. Thisapproach worked well for the specific healthcare applica-tion, but could not be generalized to other application do-mains. A domain independent approach was presented in[1], where a hierarchy of actions was introduced and certaintypes of patterns are assigned to the nodes in the hierarchy,and actionable patterns were discovered by executing pat-tern templates at the nodes of the hierarchy.

    Kleinberg et al. [6] presented the microeconomic viewof data mining. They regarded data mining as an opti-mization problem where the utility of decisions is theobjective function: partition the customer base C intok parts C1, , Ck to maximize the sum of the optimak1maxxDiCjci x, where ci x denotes the utility of adecision x on a customer i. In our setting, however, ci x isunknown because it corresponds to the actionability of in-creasing the utility on a customer i by taking a set of actionsx. Modeling this notion of actionability is our contribution.

    Ras et al. [8] presented an algorithm for finding actionrules. They considered stable attributes whose values can-not be changed, and flexible attributes whose values can bechanged. An action rule is a pair of regular rules thatagree on all stable attributes and differ in some flexible at-tributes and the utility. They did not model actions nor at-tributes where actions affect only certain ranges, and couldnot optimize the set of actions for a better actionability. Thesearch of action rules is done in a post-processing after find-ing all regular rules.

    The action model was first proposed in [5] with the focuson constructing a model for optimizing the actionability forfuture cases. The work in [12] presented a profit-motivatedproduct recommender approach, in the presence of an in-verse correlation between the chance of selling an item andthe profit generated from the sales. The work in [13] as-sumed that the user is able to provide the cost for chang-ing a feature value. In practice, obtaining such cost will bethe bottleneck. For example, it is difficult for the user todetermine the