Data Mining Applications, Dr. Zhao Weidong, School of Software, Fudan University [email protected]....


Post on 19-Dec-2015



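Customer Relationship Management (CRM)

Customer life cycle

[Figure: revenue, expenditure and profit plotted over the customer lifetime, across three stages: customer acquisition, customer retention, and customer analysis and win-back]

Data mining support across the customer life cycle

Customer data

Applications of data mining in CRM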

Customer identification

CRM begins with customer identification. This phase involves targeting the population that is most likely to become customers or to be most profitable to the company.

It also involves analyzing customers who are being lost to the competition and how they can be won back.

Elements for customer identification include target customer analysis and customer segmentation.

Customer attraction

Organizations can direct effort and resources into attracting the target customer segments.

Direct marketing is a promotion process which motivates customers to place orders through various channels, such as direct mail or coupons.

Target marketing

Managers and decision makers

Salesperson profiles

Services provided to customers and how they are used

Data

Customer information

Data warehouse

How can I make the most of my sales force to serve business customers?

Use different salespeople for different customer segments

Product Associations algorithm

Launch new product bundles based on customers' purchasing behavior

Predict how likely customers are to buy under a targeted promotion plan

What kinds of new services should I offer? Which customers are my targets?

Apply the Value Prediction algorithm

Customer Segmentation algorithm

Customer retention

Customer retention is the central concern of CRM. Customer satisfaction is the essential condition for retaining customers. Elements of customer retention include one-to-one marketing, loyalty programs and complaints management.

One-to-one marketing refers to personalized marketing campaigns which are supported by analyzing, detecting and predicting changes in customer behaviors.

Loyalty programs involve campaigns or supporting activities which aim at maintaining a long term relationship with customers. Churn analysis, credit scoring, service quality or satisfaction form part of loyalty programs.

Customer churn analysis

Managers and decision makers

All transaction records

All bank branches

How can I curb customer churn?

Post-mining analysis

Classification algorithm

Decision tree

Analyze each of the customers identified above

Action

Changes are needed in certain areas of the business

Rule discovered by mining: 95% of accounts with reduced transaction activity are likely to churn within the next 2 months

Data warehouse

Customer development

Elements of customer development include customer lifetime value analysis, up/cross selling and market basket analysis.

Customer lifetime value analysis is defined as the prediction of the total net income a company can expect from a customer. Up/Cross selling refers to promotion activities which aim at augmenting the number of associated or closely related services that a customer uses within a firm.

Market basket analysis aims at maximizing the customer transaction intensity and value by revealing regularities in the purchase behaviour of customers.
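
As a rough illustration of what "revealing regularities in purchase behaviour" can mean in practice, the sketch below counts how often pairs of items are bought together and derives support, confidence and lift for each pair. The baskets, the `pair_rules` helper and the `min_support` threshold are all hypothetical and chosen only for illustration; the slides do not prescribe a particular algorithm.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.4):
    """Return (a, b, support, confidence, lift) for rules a -> b over item pairs."""
    n = len(baskets)
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # ignore repeated items in a basket
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n                  # share of baskets containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]        # estimate of P(y | x)
            lift = confidence / (item_count[y] / n)   # > 1 means positive association
            rules.append((x, y, support, confidence, lift))
    return sorted(rules, key=lambda r: -r[4])


# Hypothetical baskets, invented for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "jam"],
]
rules = pair_rules(baskets)
for x, y, support, confidence, lift in rules:
    print(f"{x} -> {y}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Only pairs are considered here; a full association-rule miner would also handle larger itemsets.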

SPSS telecommunications industry analysis topics

SPSS Modeler telecommunications industry analysis models

Personalized recommendation systems

Personalized recommendation

Personalization is defined as “the ability to provide content and services tailored to individuals based on knowledge about their preferences and behavior” or “the use of technology and customer information to tailor electronic commerce interactions between a business and each individual customer”.

The purpose of Internet recommendation systems (Internet recommender systems) in electronic commerce is to reduce irrelevant content and provide users with more pertinent information or products.

A recommendation system is a computer-based system that uses profiles built from past usage behavior to provide relevant recommendations.

Information filtering and recommendation

Approaches include rule-based filtering, content-based filtering, and collaborative filtering.

Rule-based filtering uses pre-specified if-then rules to select relevant information for recommendation.

Content-based filtering uses keywords or other product-related attributes to make recommendations.

Collaborative filtering uses preferences of similar users in the same reference group as a basis for recommendation.
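
To make the content-based variant concrete, here is a minimal sketch that scores catalog items by keyword overlap with a user profile. The Jaccard measure, the `content_based` helper and the small catalog are assumptions made for illustration; the slides do not say which similarity measure is used.

```python
def jaccard(a, b):
    """Keyword overlap between two keyword collections, in [0, 1]."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def content_based(profile_keywords, catalog, top_n=2):
    """Rank catalog items by how closely their keywords match the user profile."""
    scored = [(jaccard(profile_keywords, kws), name) for name, kws in catalog.items()]
    scored = [(score, name) for score, name in scored if score > 0]
    scored.sort(key=lambda t: (-t[0], t[1]))   # best score first, ties by name
    return [name for _, name in scored[:top_n]]


# Hypothetical catalog and profile, for illustration only.
catalog = {
    "camera": ["photo", "lens", "travel"],
    "tripod": ["photo", "travel"],
    "blender": ["kitchen", "cooking"],
}
picks = content_based(["photo", "travel"], catalog)
print(picks)
```

A rule-based filter would replace the similarity score with fixed if-then conditions, while a collaborative filter would ignore keywords and look at other users' preferences instead.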

Typical personalization process

Understanding customers through profile building

Delivering personalized offerings based on the knowledge about the product and the customer

Measuring personalization impact

Inadequate information in IR

One possible solution is to expand the query by adding more semantic information that better describes the concepts. Relevance feedback and knowledge structures are used to add appropriate terms to the expanded queries.

Relevance feedback is information on the items the user selected from the output of previous queries.

Spreading Activation Model

In the Spreading Activation (SA) Model, concepts are expanded according to their semantics while the customer profile is identified and items are matched. The model has been applied to expand queries.
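
One common way to read spreading activation is as energy flowing from seed concepts to their neighbors in a weighted concept graph, fading with each hop. The sketch below follows that reading; the `decay`, `threshold` and `max_hops` parameters and the toy graph are assumptions for illustration and are not taken from the slides.

```python
def spread_activation(graph, seeds, decay=0.5, threshold=0.2, max_hops=2):
    """Expand seed concepts over a weighted graph.

    graph: {concept: {neighbor: link weight in [0, 1]}}
    seeds: {concept: initial activation}
    Returns {concept: activation} for the seeds and every concept reached.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_hops):
        next_frontier = {}
        for node, energy in frontier.items():
            for neighbor, weight in graph.get(node, {}).items():
                passed = energy * weight * decay      # energy fades at each hop
                if passed < threshold:
                    continue                          # too weak to activate
                if passed > activation.get(neighbor, 0.0):
                    activation[neighbor] = passed
                    next_frontier[neighbor] = passed
        frontier = next_frontier
    return activation


# Hypothetical concept network, for illustration only.
graph = {
    "data mining": {"clustering": 0.9, "association rules": 0.8},
    "clustering": {"k-means": 0.9},
    "association rules": {"market basket": 0.5},
}
expanded = spread_activation(graph, {"data mining": 1.0})
print(sorted(expanded, key=expanded.get, reverse=True))
```

The activated concepts can then be appended to the original query or profile as extra terms, weighted by their activation.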

A personalized knowledge recommendation system

A semantic-expansion approach builds the user profile by analyzing documents the person has previously read.

The approach integrates semantic information for spreading expansion with content-based filtering for document recommendation.

A sample semantic-expansion network

Experimental results

An empirical study using master's theses in the National Central Library in Taiwan shows that the semantic-expansion approach outperforms the traditional keyword approach in capturing user interests.

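Component library management

Adaptive component retrieval

Component retrieval is an important problem in component library research. An effective retrieval mechanism can reduce the cost of component reuse.

The people who reuse components are usually neither the component designers nor the library administrators. When searching, they do not fully understand how the library describes its components, so they find it hard to state complete and precise retrieval requirements.

The components a user finally selects reflect that user's real needs. If the system can infer, from users' retrieval behavior and their feedback on the results, patterns that link imprecise retrieval conditions to the precise conditions users actually need, it can improve retrieval precision.

Adaptive component retrieval based on association mining

Association rule mining is introduced into component retrieval. Association rules between imprecise retrieval conditions and precise retrieval results are mined from users' retrieval behavior and feedback, and the retrieval mechanism is adjusted accordingly to improve retrieval precision.

Example

{windows} => {windows, SQL Server}

{Linux} => {Linux, Mysql}

{finance} => {finance, SQL Server}

{windows, finance} => {windows, finance, SQL Server}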
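
The sketch below shows one simplified way such rules could be learned and applied: for each query seen in a retrieval log, it keeps the extra terms that appear often enough in the components users finally selected, then uses them to expand later queries. The log format, the `min_confidence` threshold and both helper names are assumptions for illustration. It matches whole queries only and is not a full association-rule miner such as Apriori.

```python
from collections import Counter, defaultdict


def mine_expansion_rules(log, min_confidence=0.6):
    """log: list of (query_terms, selected_component_terms) pairs.

    Returns {frozenset(query): set(extra terms)} for extras whose confidence,
    i.e. the share of that query's sessions selecting the term, is high enough.
    """
    query_count = Counter()
    extra_count = defaultdict(Counter)
    for query, selected in log:
        q = frozenset(query)
        query_count[q] += 1
        for term in set(selected) - q:
            extra_count[q][term] += 1

    rules = {}
    for q, extras in extra_count.items():
        strong = {t for t, c in extras.items() if c / query_count[q] >= min_confidence}
        if strong:
            rules[q] = strong
    return rules


def expand(query, rules):
    """Add the mined terms to an imprecise query; unknown queries pass through."""
    q = frozenset(query)
    return set(q) | rules.get(q, set())


# Hypothetical retrieval log, for illustration only.
log = [
    ({"windows"}, {"windows", "SQL Server"}),
    ({"windows"}, {"windows", "SQL Server"}),
    ({"windows"}, {"windows", "Oracle"}),
    ({"Linux"}, {"Linux", "Mysql"}),
    ({"Linux"}, {"Linux", "Mysql"}),
]
rules = mine_expansion_rules(log)
print(expand({"windows"}, rules))
```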

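Supply chain management

Parts supplier selection

Supplier selection determines not only product quality and cost, but also the product's selling price, maintenance expense and customer satisfaction.

Suppliers are usually selected with the goal of minimizing logistics cost subject to time constraints, without considering the correlation between part failure rates and the environments of different regions.

Parts supplier selection based on association rules

Association rule mining algorithms are applied to product repair records to find frequent failure patterns, by region, for the parts and part combinations supplied by different suppliers.

When supplier selection and distribution plans are generated, these frequent failure patterns are used to choose a suitable combination of parts suppliers, so that logistics cost and product maintenance cost are optimized jointly.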

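Human resource management

Human resources are critically important in high-tech companies. Recruitment directly affects the quality of a company's staff, yet traditional human resource management methods no longer meet the needs of high-tech companies.

Knowledge in the high-tech industry changes constantly, jobs are hard to delimit, cross-functional tasks are common, and work processes are becoming more diverse. All of these factors raise the requirements on employee quality and make it harder to judge by traditional methods whether a candidate is competent for the job.

Mining personnel selection rules with decision trees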

CHAID

Decision tree for predicting job performance
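
CHAID chooses splits with chi-square tests between each candidate attribute and the target. The sketch below shows only that core step: it computes the Pearson chi-square statistic for each attribute and picks the largest. Full CHAID also merges similar categories and compares adjusted p-values, which is omitted here. The candidate records and attribute names are hypothetical.

```python
from collections import Counter


def chi_square(rows, attribute, target):
    """Pearson chi-square statistic between a categorical attribute and the target."""
    n = len(rows)
    attr_totals = Counter(r[attribute] for r in rows)
    target_totals = Counter(r[target] for r in rows)
    observed = Counter((r[attribute], r[target]) for r in rows)
    stat = 0.0
    for a, a_total in attr_totals.items():
        for t, t_total in target_totals.items():
            expected = a_total * t_total / n     # count expected under independence
            stat += (observed[(a, t)] - expected) ** 2 / expected
    return stat


def best_split(rows, attributes, target):
    """Pick the attribute most strongly associated with the target."""
    return max(attributes, key=lambda a: chi_square(rows, a, target))


# Hypothetical candidate records, for illustration only.
rows = [
    {"degree": "MS", "experience": "yes", "performance": "high"},
    {"degree": "BS", "experience": "yes", "performance": "high"},
    {"degree": "MS", "experience": "yes", "performance": "high"},
    {"degree": "BS", "experience": "yes", "performance": "high"},
    {"degree": "MS", "experience": "no", "performance": "low"},
    {"degree": "BS", "experience": "no", "performance": "low"},
    {"degree": "MS", "experience": "no", "performance": "low"},
    {"degree": "BS", "experience": "no", "performance": "low"},
]
root = best_split(rows, ["degree", "experience"], "performance")
print(root)
```

Applying the same selection recursively to each resulting subset would grow the tree, and each root-to-leaf path reads as a selection rule.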

Improving education

Improving teaching and learning

Instructors can have trouble identifying students' real difficulties in learning.

Based on the students’ testing records, the system identifies those problems and then suggests how to design new teaching strategies.

It assists teachers in identifying students’ specific difficulties and weaknesses in learning.

It helps each student find his or her weak points in learning and offers recommendations for improvement.

ESL recommender teaching and learning

Right/wrong answer statistical table

For every student, the system creates a right/wrong answer statistical table: a wrong answer is represented by 1 and a right answer by 0.

Summary table of students’ wrong answers

The right/wrong answer statistical tables for the respective students are integrated into a summary table of students’ wrong answers. The sum values in the table are then ranked in descending order to show, from most to least severe, the weaknesses the students have collectively.
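
The two tables described above can be sketched in a few lines. The answer key, the student responses and both helper names are hypothetical. The 1 = wrong, 0 = right coding follows the convention stated in the text.

```python
def wrong_answer_table(answers, key):
    """Per-student table: 1 marks a wrong answer, 0 a right one."""
    return [0 if given == correct else 1 for given, correct in zip(answers, key)]


def summary_of_wrong_answers(tables):
    """Sum wrong answers per question, then rank questions from most to least missed.

    Returns a list of (question number, number of wrong answers).
    """
    totals = [sum(column) for column in zip(*tables.values())]
    return sorted(enumerate(totals, start=1), key=lambda qt: (-qt[1], qt[0]))


# Hypothetical answer key and responses, for illustration only.
key = ["A", "C", "B", "D"]
responses = {
    "s1": ["A", "C", "A", "D"],
    "s2": ["B", "C", "A", "D"],
    "s3": ["A", "B", "A", "D"],
}
tables = {student: wrong_answer_table(ans, key) for student, ans in responses.items()}
ranking = summary_of_wrong_answers(tables)
print(ranking)
```

Questions at the top of the ranking are the ones the class as a whole finds hardest.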

Hierarchical clustering

A hierarchical clustering algorithm is then applied to the collected data to segment the students into a certain number of clusters, or categories, each of which includes students sharing the same or similar characteristics.

All students’ right/wrong answer statistical tables

Clustering analysis

A clustering analysis is performed on the data in all students’ right/wrong answer statistical tables. The students fall into three clusters, listed here by student number: (9, 15, 6, 17, 13, 19, 14, 5); (22, 23, 4, 3, 21, 11, 24, 20, 7, 1); (12, 18, 2, 8, 25, 10, 16).
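
A minimal agglomerative (bottom-up) clustering over right/wrong vectors might look like the sketch below. The slides do not state the distance or linkage used, so Hamming distance and average linkage are assumptions here. The five student vectors are invented and are not the data behind the clusters listed above.

```python
def hamming(u, v):
    """Number of questions on which two students' right/wrong marks differ."""
    return sum(a != b for a, b in zip(u, v))


def agglomerative(vectors, k):
    """Repeatedly merge the two closest clusters (average linkage) until k remain."""
    clusters = [[name] for name in vectors]

    def distance(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(hamming(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]   # merge cluster j into cluster i
        del clusters[j]                           # j > i, so index i stays valid
    return [sorted(c) for c in clusters]


# Hypothetical right/wrong vectors (1 = wrong, 0 = right), for illustration only.
vectors = {
    1: [1, 1, 0, 0],
    2: [1, 1, 0, 1],
    3: [0, 0, 1, 1],
    4: [0, 0, 1, 0],
    5: [1, 0, 0, 0],
}
groups = agglomerative(vectors, k=2)
print(groups)
```

Students in the same group miss similar questions, so a teacher could address each group's weaknesses together.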

Search engine optimization

Clustering engines are usually not search engines by themselves.

The clustering engine uses one or more traditional search engines to gather a number of results; then, it does a form of post-processing on these results in order to cluster them into meaningful groups.

The post-processing step analyzes snippets, i.e., short document abstracts returned by the search engine, usually containing words around query term occurrences.

E-Commerce Recommender Systems

Background

E-commerce has allowed businesses to provide consumers with more choices. Increasing choice, however, has also brought about information overload.

E-commerce stores are applying mass customization principles to their presentation in on-line stores. One way to achieve mass customization in e-commerce is the use of recommender systems.

What are E-commerce Recommender Systems?

Recommender systems are used by e-commerce sites to suggest products to their customers and to provide consumers with information to help them decide which products to purchase. In a sense, recommender systems enable the creation of a new store personally designed for each consumer (one-to-one marketing).

Tool for database marketing and CRM

The Structure of Recommender Systems

A typical e-commerce recommender application includes the functional I/O and the recommendation method.

•Targeted customer inputs

•Community inputs

•Outputs

•Recommendation method

Targeted Customer Inputs

Explicit navigation inputs are made intentionally by the customer in order to inform the recommender application of his or her preferences, for example keyword searches and registration.

Implicit inputs: the specific item or items that the customer is currently viewing, or the items in the customer's shopping cart (purchase history).

Community Inputs

Community purchase history

Best-seller lists

Text comments

Output Recommendations

A set of suggestions: ordered or unordered lists

Ratings

Meta-rating: rating the comments themselves

Text comments

Item-to-item correlation

User-to-user correlation

Top-N

E-mail marketing

Delivery and Presentation

Push methods reach a customer who is not currently interacting with the system, for example by e-mailing recommendations for related products.

Pull methods notify customers that personalized information is available but display this information only when the customer explicitly requests it.

Other types of visualization.

Recommendation Methods

Statistical summaries of community opinion: within-community popularity measures and aggregate or summary ratings

Association analysis

Content-based recommendations: The user will be recommended items similar to the ones the user preferred in the past;

Collaborative recommendations: The user will be recommended items that people with similar tastes and preferences liked in the past;

Hybrid approaches: These methods combine collaborative and content-based methods. Examples?
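
The collaborative idea, "items that people with similar tastes liked", can be sketched as a user-based nearest-neighbor recommender. Cosine similarity over raw ratings and the similarity-weighted average are assumptions chosen for brevity, and the three users and their ratings are invented.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(ratings, user, top_n=2):
    """Predict ratings for unseen items as a similarity-weighted average."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue                      # only recommend items the user has not rated
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=lambda item: (-predicted[item], item))[:top_n]


# Hypothetical ratings on a 1-5 scale, for illustration only.
ratings = {
    "ann": {"book": 5, "lamp": 1},
    "bob": {"book": 5, "lamp": 1, "desk": 5, "rug": 2},
    "cat": {"book": 1, "lamp": 5, "rug": 5},
}
suggestions = recommend(ratings, "ann")
print(suggestions)
```

A hybrid approach could, for instance, blend these predicted ratings with a content-based keyword score before ranking.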

Techniques for Recommendation

Many techniques from data mining can be adapted to the scalability problem for recommender systems: nearest-neighbor, classifiers (rule induction, neural networks, and Bayesian networks), clustering, and association.

Web usage mining and more general commerce-related data mining may reveal techniques for exploiting complex behavioral data.

E-commerce Recommender Applications

E-recommender systems enhance E-commerce sales (2%-8%) in the following ways:

Converting browsers into buyers

Increasing cross-sell

Building credibility through community

Inviting customers back

Giving marketing professionals the type of feedback they need


Association analysis of customer reviews

A location-aware recommender system for mobile shopping environments

When receiving a service request, the on-line subsystem generates a list of possibly interesting web pages based on the customer’s interests profile, vendor data, and the instantaneous position of the customer provided by the location manager.

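Discussion questions

Read the references that follow, and analyze the data mining methods used in each case and the main problems they solve.

Drawing on your own practice, describe the business intelligence needs of your position (for Master of Software Engineering students).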