Real-Time Market Basket Analysis for Retail with Hadoop
Simone Ferruzzi and Marco Mantovani, Iconsulting Spa
@IconsultingBI
ICONSULTING
ICONSULTING IS AN INDEPENDENT CONSULTING COMPANY SPECIALIZED IN DWH, BI & PM
Strong expertise in all the market-leading technologies
INNOVATIVE
SPECIALIZED
DEVELOPING SKILLS
VENDOR INDEPENDENT
WHO WE ARE
More than 300 projects; more than 100 customers
Professorships at major Italian universities and business schools
In-house Academy providing education services to professionals who need to develop their skills
Spin-off of a major Research University Consortium
25% of our time invested in R&D
Certified Partner of the main Business Intelligence software vendors
#Data Warehouse #Business Intelligence #Performance Management
PROCEDURES & OPERATING INSTRUCTIONS ACCORDING TO ISO 9001:2008
STEP-BY-STEP APPROACH
PROJECT REQUIREMENTS & CONSTRAINTS
SERVICE QUALITY
TIME & COSTS EXECUTION
MEETING DEADLINES
PROBLEM & RISK MANAGEMENT
COMMUNICATION AMONG STAKEHOLDERS
WHAT REALLY COUNTS
AGILE DESIGN THINKING METHODOLOGY
ICONSULTING Methodology
OUR CUSTOMERS
MANUFACTURING: ALFA WASSERMANN, AMPLIFON, ARISTON THERMO, CAMAR SMA, CANTIERI SANLORENZO, CASE NEW HOLLAND, FEDRIGONI, G.D, CISA (Ingersoll-Rand), DUCATI MOTOR HOLDING, ESSECO, FIAMM, FONTANOT, GRUPPO COESIA, GRUPPO FABBRI, ICF - LA FAENZA, IGUZZINI, I.M.A. INDUSTRIA MACCHINE AUTOMATICHE, INTERTABA - PHILIP MORRIS, KME, KOMATSU, LOWARA, MAGNETI MARELLI, MALAVOLTA CORPORATE, MAPEI, MARAZZI, MARPOSS, NEGRI BOSSI, OVA BARGELLINI, OTIS, PHILIP MORRIS ITALIA, PIRELLI, POZZI GINORI, ROSETTI MARINO, SACMI, SECI, SONY EUROPA, TEUCO GUZZINI, UNO A ERRE, VINAVIL
MEDIA & PUBLISHING: PANINI GROUP, SKY ITALIA, VODAFONE, ZANICHELLI EDITORE
GOVERNMENT & PUBLIC SECTOR: MINISTERO DELL’INTERNO, MINISTERO DEL LAVORO E DELLE POLITICHE SOCIALI, REGIONE EMILIA ROMAGNA, REGIONE CALABRIA, REGIONE VENETO, AGREA, ARPA, ARPAT, CESIA, COMUNE DI BOLOGNA, COMUNE DI REGGIO EMILIA, ERVET, INVITALIA, I.S.P.R.A. AMBIENTE, ISTITUTO NAZIONALE FISICA NUCLEARE, LEPIDA, PROV. AUTONOMA DI BOLZANO, PROV. AUTONOMA DI TRENTO, PROVINCIA DI RIMINI, UNIVERSITA’ DI BOLOGNA
SERVICES: DAY RISTOSERVICE, GRUPPO SOCIETA’ GAS RIMINI, MOBY, RINA, SIENAMBIENTE, SOFIS
FASHION: CALZEDONIA, DIESEL, GEOX, GUCCI, IMAX, LOTTO, MILAR
FINANCIAL SERVICES: CREDIT SUISSE, DEXIA CREDIOP, FGA CAPITAL (GRUPPO FIAT), UNIPOL BANCA
FOOD: BIRRA PERONI, ERIDANIA SADAM, GRANDI SALUMIFICI ITALIANI, MASSIMO ZANETTI BEVERAGE GROUP, MONTENEGRO, SALUMIFICIO FRATELLI BERETTA, SEGAFREDO
LARGE SCALE RETAIL: CONAD ADRIATICO, LA RINASCENTE, SMA (SIMPLY MARKET), VIP CATERING
Business Intelligence
Turning data into Information
Historicize and Organize Information
Facilitating access to information
Evolution Trends (Big Data)
+ end users + information + performance
Analytics and Business Intelligence
Mobile Technologies
Cloud Computing
Collaboration Technologies
Connect analysis to Action
Analyze data in Real Time
Self-service BI
Advanced visualization (mapping, etc.)
New data types (unstructured data / text)
Information Discovery on Big Data
New channels of access (Mobile)
Collaboration & Social
Market Basket Analysis for Retail
Client: major Italian fashion company (3,000+ points of sale worldwide)
Need: Market Basket Analysis on sold items.
• Input: single invoice lines.
• Output: association rules to verify marketing campaigns, seasonal shopping habits, shop layouts, etc.
Solution:
• Based on the Hadoop ecosystem
• Fully integrated with the Business Intelligence platform (Oracle Business Intelligence Enterprise Edition)
Market Basket Analysis key concepts
• Market Basket Analysis (MBA) is an application of data mining algorithms aimed at identifying frequent patterns and co-occurrence relationships.
• Given a set of input data, MBA returns a set of association rules of the form
A → B
meaning «If A occurs, then B is likely to occur» (in this case, «If you buy product A, you are likely to also buy product B»).
• Each rule is associated with two values that measure its degree of interest:
– Support: the percentage of cases in which A and B occur together out of all the considered cases (e.g., the number of receipts containing both A and B divided by the total number of receipts);
– Confidence: the percentage of cases in which A and B occur together out of the cases where A occurs (e.g., the number of receipts containing both A and B divided by the number of receipts containing A).
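Both measures can be computed directly from receipts. A minimal Python sketch, assuming each receipt is represented as a set of product codes; the names `support` and `confidence` and the toy receipts are illustrative, not taken from the original project:

```python
from typing import FrozenSet, Sequence

Receipt = FrozenSet[str]  # assumption: one receipt = set of product codes


def support(receipts: Sequence[Receipt], itemset: Receipt) -> float:
    """Share of receipts that contain every product in `itemset`."""
    if not receipts:
        return 0.0
    return sum(1 for r in receipts if itemset <= r) / len(receipts)


def confidence(receipts: Sequence[Receipt], a: Receipt, b: Receipt) -> float:
    """Share of receipts containing A that also contain B (rule A -> B)."""
    with_a = [r for r in receipts if a <= r]
    if not with_a:
        return 0.0
    return sum(1 for r in with_a if b <= r) / len(with_a)


# Hypothetical toy receipts, not data from the case study.
receipts = [frozenset({"A", "B"}), frozenset({"A", "C"}),
            frozenset({"A", "B", "C"}), frozenset({"B"})]
print(support(receipts, frozenset({"A", "B"})))                  # 0.5 (2 of 4 receipts)
print(confidence(receipts, frozenset({"A"}), frozenset({"B"})))  # 0.666... (2 of the 3 receipts with A)
```

Equivalently, confidence(A → B) = support(A and B together) / support(A), which is why support is usually computed first and confidence derived from it.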
Example of an association rule
• Easywear → Underwear
• Support: 9%
• Confidence: 50%
• In 9% of cases Easywear and Underwear products are sold together.
• In 50% of cases when someone purchases an Easywear item, an Underwear item is also purchased.
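Since confidence is the support of the two items together divided by the support of the antecedent, the two figures above also imply the share of receipts containing an Easywear item (approximately, as the slide's percentages are rounded):

```latex
\text{confidence}(A \rightarrow B) = \frac{\text{support}(A \rightarrow B)}{\text{support}(A)}
\quad\Rightarrow\quad
\text{support}(\text{Easywear}) = \frac{9\%}{50\%} = 18\%
```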
Case study: MBA for Retail
• Leading Italian company in the fashion industry
• Sales data from the last three years
• More than 100 million receipts
• The results obtained can be used as an indicator for:
– Defining new promotional initiatives
– Identifying optimal schemes for the layout of goods in stores
– etc.
Architecture
[Architecture diagram. Components: Receipts; MBA job; Job Management Console; Association Rules; Interactive Dashboards (number of sold items & association rules)]
MBA Algorithm Steps
• Job 1
– Input: list of single sold items (receipt lines)
– Map: receipt key, item value
– Reduce: items list aggregated for each receipt
• Job 2
– Map: combinations of items inside the same receipt
– Reduce: support of the itemsets
• Job 3
– Map: calculation of all possible association rules that meet the minimum support criterion
– Reduce: association rules that meet the minimum confidence criterion
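The three-job flow can be simulated in memory to make the data movement concrete. This is a minimal single-process Python sketch, not the project's Hadoop code: `run_job` is a hypothetical stand-in for one MapReduce job (map, group by key, reduce), itemsets are capped at a hypothetical `max_size` to bound the combinations emitted in Job 2, and Job 3 assumes the itemset supports are visible to every reducer as shared side data (on a real cluster something like Hadoop's distributed cache could play that role).

```python
from collections import defaultdict
from itertools import combinations


def run_job(records, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output


def market_basket(lines, min_support, min_confidence, max_size=3):
    """lines: iterable of (receipt_id, item) pairs, one per receipt line."""
    # Job 1 - map: emit (receipt key, item value); reduce: item set per receipt
    # (a frozenset, so an item repeated on the same receipt counts once).
    baskets = run_job(
        lines,
        mapper=lambda line: [(line[0], line[1])],
        reducer=lambda receipt_id, items: [frozenset(items)],
    )
    n = len(baskets)

    # Job 2 - map: every combination of items inside the same receipt;
    # reduce: count occurrences, convert to support, apply min_support.
    def emit_combinations(basket):
        items = sorted(basket)
        for size in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, size):
                yield frozenset(combo), 1

    def support_reducer(itemset, ones):
        s = len(ones) / n
        return [(itemset, s)] if s >= min_support else []

    supports = dict(run_job(baskets, emit_combinations, support_reducer))

    # Job 3 - map: every candidate rule A -> (itemset - A) from a frequent
    # itemset, keyed by antecedent A; reduce: keep rules meeting min_confidence.
    # supp(A) is looked up in `supports` (assumed shared side data); A is a
    # subset of a frequent itemset, so it is frequent too and always present.
    def emit_rules(entry):
        itemset, s = entry
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                yield frozenset(antecedent), (itemset, s)

    def confidence_reducer(antecedent, candidates):
        rules = []
        for itemset, s in candidates:
            conf = s / supports[antecedent]
            if conf >= min_confidence:
                rules.append((antecedent, itemset - antecedent, s, conf))
        return rules

    return run_job(supports.items(), emit_rules, confidence_reducer)
```

With the hypothetical receipt lines `[("r1", "bra"), ("r1", "slip"), ("r2", "bra"), ("r2", "slip"), ("r3", "bra"), ("r4", "socks")]`, calling `market_basket(lines, min_support=0.5, min_confidence=0.6)` keeps the itemsets {bra} (0.75), {slip} (0.5) and {bra, slip} (0.5), and returns two rules: bra → slip (confidence ≈ 0.67) and slip → bra (confidence 1.0).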
Job Management Interface
• Interface integrated with the standard BI tool
• The MBA algorithm can run on different data sets
• Each user can perform custom analyses
• Algorithm parameters (minimum support and confidence) can be set by end users
• Examples of different analyses:
– what types of products are sold together with a discounted item?
– are there different association rules between products sold in city-center stores and those in outlets?
Job Management Interface
• Analysis description
• Time filters
• Point of sale filters
• Product filters
• Attributes used for association rules
• Support & confidence parameters
• Run MBA
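The parameters collected by this form could be modeled as a single validated request object. The following Python sketch is hypothetical: the `ReceiptLine` schema, the store-type labels and the `MbaRequest` class are assumptions chosen to mirror the form fields above. `select` applies the time, point-of-sale and product filters and projects each remaining line onto the chosen attribute, producing (receipt, value) pairs ready for the MBA job.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List, NamedTuple, Tuple


class ReceiptLine(NamedTuple):
    """One sold item (hypothetical schema)."""
    receipt_id: str
    day: date
    store_type: str  # hypothetical labels, e.g. "city_center", "shopping_center", "outlet"
    category: str    # product type, e.g. "bra", "knitwear"
    item: str        # item code


@dataclass(frozen=True)
class MbaRequest:
    description: str
    date_from: date                      # time filter (inclusive bounds)
    date_to: date
    store_types: FrozenSet[str]          # point-of-sale filter
    min_support: float                   # user-defined thresholds in (0, 1]
    min_confidence: float
    excluded_categories: FrozenSet[str] = frozenset()  # product filter
    attribute: str = "category"          # attribute used for association rules

    def __post_init__(self) -> None:
        if self.date_from > self.date_to:
            raise ValueError("date_from must not be after date_to")
        if self.attribute not in ("category", "item"):
            raise ValueError(f"unsupported attribute: {self.attribute}")
        for name in ("min_support", "min_confidence"):
            value = getattr(self, name)
            if not 0.0 < value <= 1.0:
                raise ValueError(f"{name} must be in (0, 1], got {value}")

    def select(self, lines: Iterable[ReceiptLine]) -> List[Tuple[str, str]]:
        """Keep lines passing all filters, as (receipt_id, attribute value) pairs."""
        return [
            (line.receipt_id, getattr(line, self.attribute))
            for line in lines
            if self.date_from <= line.day <= self.date_to
            and line.store_type in self.store_types
            and line.category not in self.excluded_categories
        ]
```

For instance, the first configuration in the analysis examples (city-center stores, September to December 2013, knitwear excluded, 35% / 50% thresholds) would map to `MbaRequest(description="New bra campaign", date_from=date(2013, 9, 1), date_to=date(2013, 12, 31), store_types=frozenset({"city_center"}), min_support=0.35, min_confidence=0.50, excluded_categories=frozenset({"knitwear"}))`. Validating thresholds when the request is created lets the interface reject bad input before any cluster job is launched.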
Results Dashboard
[Screenshot: results dashboard showing Support and Confidence values]
Analysis Examples
• Marketing campaign for a new type of bra, from 1 September 2013 to 31 December 2013
• All Italian points of sale located in city centers
• Analysis across all item types except knitwear
• Min. support 35%, min. confidence 50%
Rule found: new bra → slip, babydoll (support: 36%, confidence: 52%)
Meaning: 36% of the considered receipts contain all of these items; when the new bra is purchased, a slip and a babydoll are also purchased 52 times out of 100.
Same configuration as before, but considering only points of sale in shopping centers
Rule found: Easywear → new bra (support: 50%, confidence: 60%)
Meaning: in shopping centers, sales of Easywear appear to drive sales of the new bra.
Conclusions and future work
Conclusions
• Business users can now investigate in depth the effectiveness of marketing and advertising campaigns and determine whether shop windows and in-store layouts achieve the desired goals.
• The Market Basket Analysis algorithm can be customized to users’ needs.
• Transparent interaction between the Hadoop cluster and the Business Intelligence platform.
Future work: from project to solution
• Complete framework to run complex Data Mining algorithms on Big Data.
• Hadoop to exploit parallel execution and the Hadoop Distributed File System (HDFS).
• Seamless integration with standard Business Intelligence tools.
• More user independence in data integration.