1
Analytical Benchmarking Meets Data Mining:
The SmartDEA Framework, SmartDEA Software,
and Case Studies for Industry
Gürdal Ertek
Invited Seminar at A*Star SIMTECH, Singapore, August 2, 2013, Friday
2
Istanbul, Turkey
Singapore
3
• Young, high-profile private university
• Outskirts of Istanbul, Turkey
• First students accepted in 1999
www.sabanciuniv.edu
4
• Established by the Sabanci Foundation
www.sabancivakfi.org
5
• Sabancı Group
www.sabanci.com
6
• Sabancı Family: Sakıp Sabancı, Güler Sabancı, 200+
7
• ~3000 undergrad & ~500 grad students
8
• Highest research income per faculty member among Turkish universities
9
• Young, high-profile private university
• Established by the Sabanci Foundation
• Sabancı Group
• Sabancı Family: Sakıp Sabancı, Güler Sabancı, 200+
• First students accepted in 1999
• ~3000 undergrad & ~500 grad students
• Highest research income per faculty member
10
Dr. Gürdal Ertek
• Assistant Professor at Sabancı University, Istanbul, Turkey, since 2002
• Ph.D. from School of Industrial and Systems Engineering @ Georgia Institute of Technology, Atlanta, GA, USA
• Research areas include
– warehousing & material handling
– data visualization & data mining
11
12
Motivation
• Analytical Benchmarking
– application of mathematics- and computation-based methods for benchmarking a group of entities
– aims at developing objective and automated methods of benchmarking
• Overwhelming majority of literature focuses on
– developing new benchmarking methodologies
• An important aspect forgotten:
– post-analysis of the benchmarking results
13
Motivation
• Data Mining
– growing field of computer science
– aims at discovering hidden patterns and coming up with actionable insights
• Overwhelming majority of literature focuses on
– developing more efficient and effective computational algorithms
• Important aspects not drawing deserved attention:
– the quest for practical actionable knowledge
– data mining can be used for post-analysis of results of other methodologies & algorithms
14
This Seminar
• Goals
– SmartDEA Solver framework for integrating analytical benchmarking with data mining
– How DEA results should be structured
– Meaningful interpretation of DEA results
• Case study applications
– Automotive
– Wind energy
– Apparel retail
15
Research Questions
• How can Data Envelopment Analysis (DEA) results be structured such that they can be analyzed using readily available data mining techniques and software tools? (SmartDEA)
• How can DEA & information visualization be used together? (Case Study 1)
• Which visualization techniques are appropriate for analyzing DEA results? (Case Study 2)
• How can DEA and data mining be integrated with the results of other data mining techniques, specifically association mining results? (Case Study 3)
16
Presentation Contents
• Background on Data Envelopment Analysis (DEA)
• SmartDEA framework
• Case Studies
– Automotive
– Wind Energy
– Apparel Retail
17
Background
18
Sample DEA Analysis
19
Data Envelopment Analysis (DEA)
Data
• Entities = DMUs (n DMUs)
• Comparison of DMUs
• Inputs and outputs (m inputs, s outputs)
Results
• Efficiency score between 0 and 1
• Reference sets
• Projections
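A minimal sketch of these three result types, assuming the simplest special case of one input and one output under constant returns to scale. In that case no linear program is needed: each DMU's efficiency is its output/input ratio divided by the best ratio in the group. The DMU names and numbers are made up for illustration and are not taken from the seminar.

```python
# Single-input, single-output DEA under constant returns to scale.
# Illustrative data only: name -> (input, output).
dmus = {
    "A": (2.0, 4.0),
    "B": (4.0, 4.0),
    "C": (5.0, 7.5),
}

def dea_single(dmus):
    """Return efficiency score, reference set and input projection per DMU."""
    ratios = {name: y / x for name, (x, y) in dmus.items()}
    best = max(ratios.values())
    # DMUs that attain the best ratio form the efficient frontier.
    frontier = sorted(name for name, r in ratios.items() if r == best)
    results = {}
    for name, (x, y) in dmus.items():
        score = ratios[name] / best          # always between 0 and 1
        results[name] = {
            "efficiency": score,
            "reference_set": frontier,       # efficient DMUs to learn from
            "projected_input": score * x,    # input level that would make it efficient
        }
    return results

results = dea_single(dmus)
for name, r in results.items():
    print(name, r["efficiency"], r["reference_set"], r["projected_input"])
```

With several inputs or outputs the same three results come out of a linear program solved once per DMU, which is what a DEA solver does.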
20
Basic DEA Models
• Maximize the ratio of weighted outputs to weighted inputs for each DMU0
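For orientation, a standard textbook statement of this ratio model (the form given in Cooper, Seiford and Tone, 2006) is sketched below. The symbols x_{ij}, y_{rj}, v_i and u_r are the usual textbook notation and are an assumption here; the original slide may have used different symbols.

```latex
\max_{u,v}\ \theta_0 = \frac{\sum_{r=1}^{s} u_r\, y_{r0}}{\sum_{i=1}^{m} v_i\, x_{i0}}
\quad \text{s.t.} \quad
\frac{\sum_{r=1}^{s} u_r\, y_{rj}}{\sum_{i=1}^{m} v_i\, x_{ij}} \le 1 \quad (j = 1,\dots,n),
\qquad u_r \ge 0,\ v_i \ge 0 .
```

Here x_{ij} is the amount of input i used by DMU j, y_{rj} is the amount of output r it produces, and the weights v_i and u_r are chosen separately for each DMU0 so that it appears in the best possible light.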
21
Basic DEA Models
• CCR-Input model
• CCR-Output model
22
Basic DEA Models
• BCC-Input model
• BCC-Output model
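Both wind-turbine and apparel-retail case studies in this seminar use the output-oriented BCC model. A standard envelopment form, following Banker, Charnes and Cooper (1984), is sketched below; the symbols \lambda_j and \eta are textbook notation assumed here, not necessarily the notation of the original slides.

```latex
\max_{\eta,\lambda}\ \eta
\quad \text{s.t.} \quad
\sum_{j=1}^{n} \lambda_j x_{ij} \le x_{i0} \ (i = 1,\dots,m), \quad
\sum_{j=1}^{n} \lambda_j y_{rj} \ge \eta\, y_{r0} \ (r = 1,\dots,s), \quad
\sum_{j=1}^{n} \lambda_j = 1, \quad \lambda_j \ge 0 .
```

The efficiency score reported for DMU0 is 1/\eta^*, and the DMUs with \lambda_j > 0 in the optimal solution form its reference set. Dropping the convexity constraint \sum_j \lambda_j = 1 gives the constant-returns-to-scale output model of the previous slide.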
23
Basic DEA Models
24
Analyzing the solutions of DEA through information visualization and data mining techniques: SmartDEA Framework
Alp Eren Akcay, Gürdal Ertek, Gulcin Buyukozkan
25
Research Questions
• How can Data Envelopment Analysis (DEA) results be structured such that they can be analyzed using readily available data mining techniques and software tools? (SmartDEA)
• How can DEA & information visualization be used together?
• Which visualization techniques are appropriate for analyzing DEA results?
• How can DEA and data mining be integrated with the results of other data mining techniques, specifically association mining results?
26
Goal
• To build a framework for making analytical benchmarking and performance evaluations
• To design and develop a convenient DEA software, SmartDEA
27
Contribution
• To develop a general framework
• To help DEA analysts to generate important and interesting insights systematically
• To integrate the results for information visualization techniques
28
Framework
• Integration of DEA results with data mining and information visualization
29
Proposed framework
1. integrates data mining and information visualization with DEA,
2. generates clean data for mining (data auditing at the DEA modeling stage),
3. allows the incorporation of “other data” into the process,
4. can accommodate multiple DEA models within the same analysis.
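One way to picture points 2 to 4 is a flat table with one row per DMU and model, which any data mining or visualization tool can read. The sketch below is only an illustration under assumed names: the field names, the "BCC-O" label and the sample values are hypothetical and do not describe SmartDEA's actual output format.

```python
import csv
import io

# Hypothetical DEA results, one record per DMU and DEA model.
dea_results = [
    {"dmu": "D1", "model": "BCC-O", "efficiency": 1.00, "reference_set": ["D1"]},
    {"dmu": "D2", "model": "BCC-O", "efficiency": 0.72, "reference_set": ["D1", "D3"]},
    {"dmu": "D3", "model": "BCC-O", "efficiency": 1.00, "reference_set": ["D3"]},
]
# Hypothetical "other data": attributes not used as DEA inputs or outputs.
other_data = {"D1": {"city": "X"}, "D2": {"city": "Y"}, "D3": {"city": "X"}}

def flatten(dea_results, other_data):
    """Merge DEA results with other data into one flat row per DMU and model."""
    rows = []
    for res in dea_results:
        row = {
            "dmu": res["dmu"],
            "model": res["model"],
            "efficiency": res["efficiency"],
            "is_efficient": res["efficiency"] >= 1.0,
            "n_references": len(res["reference_set"]),
        }
        row.update(other_data.get(res["dmu"], {}))
        rows.append(row)
    return rows

rows = flatten(dea_results, other_data)

# Write the table as CSV text so it can be loaded by a mining tool.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Keeping the model name as a column is what lets results from several DEA models sit in the same table.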
30
Notation
39
SmartDEA: the developed software
40
Modeling Process
• Implemented in C#
• Results are written in MS Excel file format
• Imported data requires a certain format
41
Modeling Process
• 1- Importing Excel File:
– Data requires a certain format
42
Modeling Process
• 2- Selecting the spreadsheet:
43
Modeling Process
• 3- Constructing the model:
44
Modeling Process
• 4-Selecting the DEA Model:
45
Modeling Process
• 5- Solving and generating the solution file:
46
Case Study 1: Integrating DEA with Information Visualization for Benchmarking Dealers in the Automotive Industry
Dr. Gürdal Ertek, Tuna Çaprak
48
Research Questions
• How can Data Envelopment Analysis (DEA) results be structured such that they can be analyzed using readily available data mining techniques and software tools?
• How can DEA & information visualization be used together? (Case Study 1)
• Which visualization techniques are appropriate for analyzing DEA results?
• How can DEA and data mining be integrated with the results of other data mining techniques, specifically association mining results?
49
A New Approach for Benchmarking and Managing TOFAŞ Dealers
Tuna Çaprak
Leaders for Industry Program ’07-’08, Sabancı University
Gürdal Ertek, Ph.D.
Faculty of Engineering and Natural Sciences, Sabancı University
50
A New Approach for Benchmarking and Managing TOFAŞ Dealers
51
Data Envelopment Analysis (DEA)
Benchmark Independent Decision Making Units (DMUs)
Express Efficiency with a Single Score Between 0 and 1
Consider Multidimensional Input / Output Relations
52
Information Visualization (InfoViz)
Reveal Hidden Structures
Derive Actionable Insights
Identify Patterns
53
Information Visualization (InfoViz)
Reveal Hidden Structures
Derive Actionable Insights
Identify Patterns
Develop Competitive Strategies
54
Model 1: Measuring “Efficiency”
55
Model 1: Measuring “Efficiency”
DMU: Dealer
INPUTS: Dealer Expenses, Spare Parts Area, No of Employees
OUTPUT: Revenue (Total)
56
Model 2: Measuring “Efficiency for TOFAŞ”
57
Model 2: Measuring “Efficiency for TOFAŞ”
DMU: Dealer
INPUTS: Dealer Expenses, Spare Parts Area, No of Employees
OUTPUT: Amount Purchased from TOFAŞ (YTL)
58
Other Data of Interest on DMUs
Share of TOFAŞ IsRentEstimated Cities
No of Services
59
ANALYSIS and DISCUSSIONS
60
Visualization of results
• Miner 3D
61
Visualization of results
• Omniscope
62
Future Work
Further Data Analysis
Technical Report and Paper
Incorporation of City Growths
63
Special Thanks to …
Prof. Muhittin Oral
Hasan Erdoğan
Sinan Südütemiz
Case Study 2: Insights into the Efficiencies of On-Shore Wind Turbines: A Data-Centric Analysis
Dr. Gürdal Ertek, Murat Mustafa Tunç, Ece Kurtaraner, Doğancan Kebude
Research Questions
• How can Data Envelopment Analysis (DEA) results be structured such that they can be analyzed using readily available data mining techniques and software tools?
• How can DEA & information visualization be used together?
• Which visualization techniques are appropriate for analyzing DEA results? (Case Study 2)
• How can DEA and data mining be integrated with the results of other data mining techniques, specifically association mining results?
65
Outline
• Wind Turbines
• Our Study
– Methodology:
• Data Envelopment Analysis (DEA)
• Visual Data Analysis
• Hypothesis Testing
– Analysis and Results
– Insights
66
Wind Turbines
• Mechatronic devices that convert wind energy into electrical energy via mechanical energy.
• Features:
– Diameter
– Aerodynamics
– Tower height
– Controlling devices
– Location (On-shore / Off-shore)
67
Importance of Wind Turbines
• Green Energy
• Worldwide installed wind power capacity (Global Wind Energy Council)
– In 1990: 2,160 MW
– In 2011: 238,351 MW
• 16% of Europe’s electricity by 2020 (The European Wind Energy Association)
68
Wind Energy in Turkey
• 40 GW wind energy potential in next 20 years
69
Image Source: www.ecoenerji.net
Wind Energy in Turkey
• MİLRES: 500 kW wind turbine to be designed and made in Turkey
– In 2013 output of 500 kW
– In 2015 output of 2 MW
– Largest-budget civilian R&D project in the history of the Turkish Republic
70
71
Our Study
• Technical data of wind turbines are collected and analysed by the following methodologies:
– Data Envelopment Analysis (DEA)
– Visual Data Analysis
– Hypothesis Testing
• Aim:
– Determining the efficient wind turbines
– Understanding how to make an inefficient turbine efficient by referencing the efficient ones
– Benchmarking commercial wind turbines visually and statistically
72
Literature
• First example of:
– Benchmarking of commercial wind turbines
– Visualisation of reference sets in DEA results as a directed graph
• Use of DEA and visualization together:
– Ertek et al. (2007) “Benchmarking the Turkish apparel retail industry”.
– Ulus et al. (2006) “Financial benchmarking of transportation companies in the New York Stock Exchange (NYSE)”.
73
Methodologies: Data Envelopment Analysis
• Efficiency comparison of Decision Making Units (DMUs) according to
– Inputs (lower is better)
– Outputs (higher is better)
• For each DMU
– Efficiency score (between 0 and 1)
– Reference sets
– Projections
74
Methodologies: Visual Data Analysis
• To distinguish different patterns in data and achieve new and useful insights. (Keim, 2002)
• Orange Canvas (software)
– Scatter plot
• Miner 3D (software)
– Surface plot
75
Database
Top 10 companies in worldwide market share
1. Vestas (Denmark)
2. Sinovel (China)
3. Goldwind (China)
4. Gamesa (Spain)
5. Enercon (Germany)
6. GE (USA)
7. Suzlon (India)
8. Guodian (China)
9. Siemens (Germany)
10. Ming Yang (China)
76
DEA Model
Model A: 74 on-shore wind turbine models
Model B: 32 on-shore wind turbine models (low-wind)
• Inputs:
– Diameter (m)
– Nominal wind speed (m/s)
• Outputs:
– Nominal Output (V)
• Other features:
– Cut-in wind speed (low/medium/high)
– Company
77
DEA Model
• “BCC Output Oriented” model
• SmartDEA Solver software
– Developed at Sabancı University
– Reads data from MS Excel and generates results
• Visual analysis with Orange Canvas and Miner3D using efficiency scores
78
Analysis and Results
79
1 - Efficiency vs Companies
2 - Efficiency vs Nominal Output
80
81
3 - Efficiency vs Cut-in Wind Speed
82
4 - Efficiency vs Diameter
83
5 - Reference Analysis
• Which efficient turbine models should inefficient ones take as references?
– X axis: Efficient turbine model that should be taken as reference
– Y axis: DMU name
– Size of circle: Weight of reference
84
85
6 - Reference sets for Model B with yEd software
86
7 - Projection Analysis
• By what percentage should the models change their inputs and outputs to become efficient?
– X-axis: Percentage change
– Y-axis: Efficiency
– Colors: Inputs and outputs
87
88
8 - Miner 3D Surface Plot Analysis
89
9 - Miner 3D Surface Plot Analysis
90
Insights
• Efficiency according to companies:
– Enercon and GE are the most efficient companies
– The efficiencies of turbines of Goldwind, Ming Yang, Mitsubishi and Siemens are under 60%
• Efficiency according to nominal output:
– Lower or higher values of nominal output do not affect efficiency
– But outputs around 1.5 MW have higher efficiencies
91
Insights
• Efficiency according to cut-in wind speed:
– Fewer models have cut-in speeds of 2 and 2.5 m/s; more models have 3, 3.5 and 4 m/s
– Models with 3 m/s and over have higher efficiency scores compared to those with 2 and 2.5 m/s
• Efficiency according to diameter:
– The model with the smallest diameter is the most efficient turbine
– Efficiency scores of models with diameters between 70 m and 85 m are higher than expected
92
Insights
• Reference analysis:
– DMUs 15, 20, 27, 61, 81 are the ones most often taken as a reference
• Projection analysis:
– Some of the models should both decrease inputs and increase outputs to become efficient
– For most of the models it’s enough to increase outputs
• Miner 3D surface plot analysis:
– Input and output parameters of the models in light-colored regions are ideal for higher efficiency
93
Hypothesis Testing
• Kruskal-Wallis test confirmed that:
– Efficiency scores and cut-in wind speeds differ significantly across companies.
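To make the test concrete, here is a minimal sketch of the Kruskal-Wallis H statistic computed from pooled ranks. The efficiency scores per company are invented for illustration, the function names are hypothetical, and the tie-correction factor is left out for brevity; this is not the analysis code used in the study.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H statistic for a dict of group name -> list of scores (no tie correction)."""
    pooled = [v for scores in groups.values() for v in scores]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for scores in groups.values():
        rank_sum = sum(ranks[start:start + len(scores)])
        total += rank_sum ** 2 / len(scores)
        start += len(scores)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Made-up efficiency scores for three hypothetical companies.
scores = {
    "A": [0.95, 1.00, 0.90],
    "B": [0.60, 0.55, 0.70],
    "C": [0.80, 0.75, 0.85],
}
h = kruskal_wallis_h(scores)
print(h)
```

For larger samples H is compared against a chi-square distribution with (number of groups - 1) degrees of freedom; with three groups the 5% critical value is about 5.99.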
94
References
• Cooper, W. W., Seiford, L. M., Tone, K. (2006), Introduction to Data Envelopment Analysis and its Uses, Springer, New York.
• Ertek, G., Can, M.A., Ulus, F. (2007), “Benchmarking the Turkish apparel retail industry through data envelopment analysis (DEA) and data visualization”. In: EUROMA 2007 14th International Annual EurOMA Conference: Managing Operations in an Expanding, Ankara, Turkey.
• Keim, D. A. (2002), “Information visualization and data mining,” IEEE Transactions on Visualization and Computer Graphics, Vol.8, No.1, pp. 1-8.
• Ulus, F., Köse, Ö., Ertek, G., Şen, S. (2006), “Financial benchmarking of transportation companies in the New York Stock Exchange (NYSE) through data envelopment analysis (DEA) and visualization”. In: 4th International Logistics and Supply Chain Congress, İzmir, Turkey.
• Weill, L. (2004), “Measuring cost efficiency in European banking: a comparison of frontier techniques,” Journal of Productivity Analysis, Vol.21, No.2, pp. 133-152.
Q&A
• Dr. Gürdal Ertek
• Murat Mustafa Tunç
• Ece Kurtaraner
• Doğancan Kebude
95
Case Study 3: Re-Mining Association Mining Results Through Visualization, Data Envelopment Analysis, and Decision Trees
Gurdal Ertek, Murat Mustafa Tunç
96
Research Questions
• How can Data Envelopment Analysis (DEA) results be structured such that they can be analyzed using readily available data mining techniques and software tools?
• How can DEA & information visualization be used together?
• Which visualization techniques are appropriate for analyzing DEA results?
• How can DEA and data mining be integrated with the results of other data mining techniques, specifically association mining results? (Case Study 3)
97
98
Book Chapter Published in
• “Computational Intelligence Applications in Industrial Engineering”
– A book edited by Prof. Cengiz Kahraman
– Published by Atlantis & Springer
Outline
• Introduction
• Literature
• Methodology
• Case Study
– Data Analysis
– Data Visualization
– Data Envelopment Analysis
– Decision Trees
– Classification
• Conclusion
99
Introduction
• How the results of association mining analysis can be further analyzed using
– Data visualization
– Data Envelopment Analysis (DEA)
– Decision Trees
• Visual re-mining of an item considering both
– Positive associations
– Negative associations
100
Association Mining
• Inputs:
– Transaction data, where each transaction contains a subset of items
• Outputs:
– List of itemsets that appear together frequently
• Primary metrics:
– Support is the percentage of transactions that the items appear in
– Confidence is the conditional probability that item B appears in a transaction given that item A already appears in it
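The two metrics can be sketched in a few lines. The transactions and item names below are made up for illustration, and the function names are hypothetical rather than taken from any particular mining library.

```python
# Each transaction is the set of items bought together (illustrative data).
transactions = [
    {"shirt", "tie"},
    {"shirt", "tie", "belt"},
    {"shirt", "belt"},
    {"tie"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Share of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(a, b, transactions):
    """Estimated probability that b is in a transaction given that a is."""
    return support_count({a, b}, transactions) / support_count({a}, transactions)

pair_support = support({"shirt", "tie"}, transactions)
rule_confidence = confidence("shirt", "tie", transactions)
print(pair_support, rule_confidence)
```

Here the pair appears in 2 of 4 transactions, and 2 of the 3 transactions with a shirt also contain a tie.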
101
Association Mining
102
• A classical application is market basket analysis
Graph Visualization
• Refers to the drawing of graphs, which involves
– Nodes
– Arcs
– Special algorithms
• The aim is to obtain actionable insights
103
Re-mining
• Mining of a new dataset constructed from the results of a data mining process
• The goal is
– to obtain new insights that couldn’t have been discovered otherwise, and
– to characterize, describe, and explain the results of the original data mining process
104
Data Envelopment Analysis
• Benchmarks a group of entities through efficiency scores
• Entities are called Decision Making Units (DMUs)
• Efficiency score increases if
– a DMU generates higher output using the same input, or
– a DMU uses less input for the same output
105
106
Graph Metrics
• Degree shows the number of connections of a node
• Betweenness centrality represents the total number of shortest paths that pass through the node
• Closeness centrality shows the distance between the node and every other node
• Eigenvector centrality shows the distance between the node and every other “special” node
107
Graph Metrics
• Page rank is the value that increases if node is closely related with “special” nodes
• Clustering coefficient represents the tendency of aggregation for several nodes
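Three of these metrics are simple enough to sketch directly. The small undirected graph below is invented for illustration and the function names are hypothetical; the case study itself computed its metrics with NodeXL, not with this code. Closeness is taken here as (n - 1) divided by the sum of shortest-path distances, one common normalization.

```python
from collections import deque

# Undirected graph as an adjacency dict (illustrative data).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

def degree(graph, node):
    """Number of connections of the node."""
    return len(graph[node])

def closeness(graph, node):
    """(n - 1) / sum of shortest-path distances; assumes a connected graph."""
    dist = {node: 0}
    queue = deque([node])
    while queue:                      # breadth-first search from the node
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(graph) - 1) / sum(dist.values())

def clustering(graph, node):
    """Share of the node's neighbour pairs that are themselves connected."""
    nbrs = sorted(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in graph[nbrs[i]]
    )
    return links / (k * (k - 1) / 2)

for node in graph:
    print(node, degree(graph, node), closeness(graph, node), clustering(graph, node))
```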
108
Decision Trees
• Main goal: To identify the nodes that differ considerably from the root node
• Each node is split (branched) according to a criterion
• Our study uses the ID3 algorithm
• Branches are created in Orange software
109
Classification
• Dataset is divided into two groups, namely learning dataset and test dataset
• Classification algorithms are called learners
– Naive Bayes
– k-Nearest Neighbor (kNN)
– C4.5
– Support Vector Machines (SVM)
– Decision Trees
• The prediction success of each learner is measured through classification accuracy (CA)
110
Methodology
1. Perform positive association mining
2. Find negatively associated item pairs from the results of step 1
3. Compute the percentage of positive associations
4. Construct two association graphs: (1) shows only positive associations, (2) shows only negative associations
5. Compute graph metrics for each node
111
Methodology
6. Construct the dataset for re-mining
7. Apply grid layout for graphs, then visually analyze them
8. Construct a DEA model to combine the insights and to find the most important items
9. Construct a classification model and decision trees
10. Apply multiple learners and evaluate classification accuracy
112
Case Study
• Based on real company data in the apparel retail industry
– Merchandise group in the men’s clothing line
– 2007 season
113
Case Study
• Company headquartered in Istanbul
– 300+ stores in Turkey
– 30+ stores in more than 10 countries
114
Case Study
• As of Nov. 2010, the U.S. retail industry exceeded $377.5 billion
115
Data Analysis
• Step 1: Positive association mining
– Min. support value: 100
– Result: 3930 frequent item pairs involving 538 items
• Step 2: Negative association mining
– Result: 2433 item pairs involving 537 items
• Step 3: Percentage of positive associations of each item
116
Data Analysis
• Step 3: Percentage of positive associations of each item
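A minimal sketch of Step 3, assuming the percentage is defined as an item's positive pairs divided by all of its positive and negative pairs. That definition, the function name and the item pairs below are assumptions made for illustration; they are not taken from the case study data.

```python
from collections import Counter

# Hypothetical results of Steps 1 and 2: item pairs with positive
# and with negative associations.
positive_pairs = [("A", "B"), ("A", "C"), ("B", "C")]
negative_pairs = [("A", "D"), ("C", "D")]

def perc_of_positive_assoc(positive_pairs, negative_pairs):
    """Percentage of each item's associations that are positive.

    Only items with at least one positive association are returned,
    mirroring a dataset whose rows are items in positive associations.
    """
    pos, neg = Counter(), Counter()
    for a, b in positive_pairs:
        pos[a] += 1
        pos[b] += 1
    for a, b in negative_pairs:
        neg[a] += 1
        neg[b] += 1
    return {
        item: 100.0 * pos[item] / (pos[item] + neg[item])
        for item in pos
    }

perc = perc_of_positive_assoc(positive_pairs, negative_pairs)
print(perc)
```

Item "B" has only positive pairs and scores 100, while "A" and "C" each have two positive pairs and one negative pair.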
117
Data Analysis
• Step 4: Positive and negative association graphs
118
Data Analysis
• Step 5: Graph metrics were computed using NodeXL add-in for MS Excel
• Step 6: Dataset formed for re-mining
– Each row is an item involved in a positive association
– Columns include
• unique item number
• support count (SupC)
• StartWeek
• EndWeek
• LifeTime
• MaxPrice
• MinPrice
• PriceDiff
• MerchSubGroup
• Category
• PercOfPositiveAssoc
• Graph Metrics
119
Data Visualization
• Step 7: Grid layout applied for visualization
• Color denotes PercOfPositiveAssoc
– Lighter items are mostly negatively associated
– Darker items are mostly positively associated
120
Data Visualization
121
Data Visualization
• Second graph:
– Node size represents end-of-season sales prices (MinPrice)
– Larger nodes denote higher MinPrice (more typically high-priced items)
– Smaller nodes denote lower MinPrice
122
Data Visualization
123
Data Visualization
• Third graph:
– Node shape represents category
• We want to answer whether items in the following groups have a particular category type
– Upper left region
– Darker nodes
– Larger nodes
124
Data Visualization
125
Data Envelopment Analysis (DEA)
• To analytically integrate the insights found in visualizations above
• Input:
– Uniform for each item
• Output:
– Support Count (SupC)
– PercOfPositiveAssoc
– MinPrice
• Output-oriented BCC model
126
Data Envelopment Analysis
Item Eff1* Eff2** Input_Auxiliary Input_LifeTime PercOfPositiveAssoc SupC MinPrice
059 Yes Yes 1 16 91.67 4157 19.99
087094106
YesYesYes
YesYesYes
111
261132
92.3175.0030.00
89474647346933
14.9041.57 9.25
169 No Yes 1 7 75.00 4464 34.90
289 No Yes 1 8 87.50 4317 23.06
412 Yes Yes 1 13 88.89 2658 34.90
438 No Yes 1 10 91.67 4999 14.90
513 No Yes 1 4 80.00 5115 13.80
OUTPUT INPUT OUTPUT INPUT
127
Conclusions
• Our methodology combines
– Association mining
– Graph theory
– Classification
– Data Envelopment Analysis
– Re-mining
• Positive associations are related to graph metric values and items’ attributes
128
References
• A. Demiriz, G. Ertek, T. Atan and U. Kula, Re-mining item associations: Methodology and a case study in apparel retailing, Decision Support Systems, 52(1), pp. 284-293 (2011).
• J.R. Quinlan, Induction of decision trees, Machine Learning, 1(1), pp. 81-106 (1986).
• Orange. http://orange.biolab.si/.
• E. Alpaydin, Introduction to Machine Learning, The MIT Press (2010).
• E.M. Bonsignore, C. Dunne, D. Rotman, M. Smith, T. Capone, D.L. Hansen and B. Shneiderman, First Steps to NetViz Nirvana: Evaluating Social Network Analysis with NodeXL, in International Symposium on Social Intelligence and Networking (2009).
• R. Agrawal, T. Imielinski and A.N. Swami, Mining association rules between sets of items in large databases, in SIGMOD Conference, P. Buneman and S. Jajodia (Eds) (1993).
129
References
• NodeXL. http://nodexl.codeplex.com/.
• A.E. Akcay, G. Ertek and G. Buyukozkan, Analyzing the solutions of DEA through information visualization and data mining techniques: SmartDEA framework, Expert Systems with Applications (2012).
• R.D. Banker, A. Charnes and W.W. Cooper, Some models for estimating technical and scale inefficiencies in data envelopment analysis, Management Science, 30(9), pp. 1078–1092 (1984).
• G. Ertek and A. Demiriz, A framework for visualizing association mining results, Lecture Notes in Computer Science (LNCS), 4263, pp. 593-602 (2006).
• G. Ertek, M. Kaya, C. Kefeli, O. Onur and K. Uzer, Scoring and Predicting Risk Preferences, in Behavior Computing: Modeling, Analysis, Mining and Decision, Cao, L., Yu, P. S. (Eds), Springer (2012).
• C. Borgelt and R. Kruse, Graphical models: methods for data analysis and mining, Wiley (2002).
• E.N. Cinicioglu, G. Ertek, D. Demirer and H.E. Yoruk, A framework for automated association mining over multiple databases, in Innovations in Intelligent Systems and Applications (INISTA), International Symposium, IEEE (2011).
130
References
• A. Savasere, E. Omiecinski and S. Navathe, Mining for strong negative associations in a large database of customer transactions, in Data Engineering, Proceedings., 14th International Conference, IEEE (1998).
• P.N. Tan, V. Kumar and H. Kuno, in Western Users of SAS Software Conference (2001).
• I. Herman, G. Melancon and M.S. Marshall, Graph visualization and navigation in information visualization: A survey, Visualization and Comp. Graphics, 6 (2000).
• M. Van Kreveld and B. Speckmann, Graph Drawing, Lecture Notes in Computer Science (LNCS), 7034 (2012).
• R. Spence, Information Visualization, ACM Press (2001).
• H. Ltifi, B. Ayed, A.M. Alimi and S. Lepreux, Survey of information visualization techniques for exploitation in KDD, in Int. Conf. Comp. Sys. and App. (2009).
• C. Chen, Information Visualization, Wiley Interdisciplinary Reviews: Computational Statistics, 2 (2010).
• W.W. Cooper, L.M. Seiford and K. Tone, Introduction to Data Envelopment Analysis and Its Uses: With DEA Solver Software and References, Springer (2006).
• S. Gattoufi, M. Oral and A. Reisman, Data envelopment analysis literature: A bibliography update (1951--2001), Journal of Socio-Econ. Planning Sci., 38, pp. 159-229 (2004).
131
Research Questions
• How can Data Envelopment Analysis (DEA) results be structured such that they can be analyzed using readily available data mining techniques and software tools? (SmartDEA)
• How can DEA & information visualization be used together? (Case Study 1, Automotive)
• Which visualization techniques are appropriate for analyzing DEA results? (Case Study 2, Wind)
• How can DEA and data mining be integrated with the results of other data mining techniques, specifically association mining results? (Case Study 3, Apparel Retail)
132
133
Questions?
134
Thank you
感谢
Terima Kasih
நன்றி
Teşekkürler :-)