Page 1

Operational Research

Data Mining at British Airways

Simon Cumming (simon.n.cumming@britishairways.com), Principal Operational Research Consultant

Royal Statistical Society, Reading, Feb 2005

Page 2

Data mining at British Airways

• Introduction – British Airways & Operational Research

• History and some examples of data mining at BA

• Data mining and business complexity

• Successful data mining

Page 3

Introduction: British Airways

• UK’s largest scheduled airline
– 159 destinations in 75 countries
– 114 from Heathrow

• Flights are split into three areas:
– Domestic
– European
– Longhaul

• 4 ‘cabins’ on long haul aircraft:
– First Class
– Club World (Business Class)
– World Traveller Plus
– World Traveller (Economy Class)

Page 4

The challenges BA has faced over the last 3 years

• Middle East (war in Iraq etc.)

• World Trade Centre aftermath / terror threats, security etc.

• Low Cost carriers

• SARS

• Economic instability

• Changing relations within the travel trade

Page 5

Issues facing BA today

• Competing in an ever tougher marketplace
– Customer service and innovation.

• Improving punctuality and management of disruption.

• Ensuring continued financial performance
– Return on investment for shareholders, and ability to invest for future.

• Making the most of new technologies, e.g. web, self-service.

• Getting ready for Terminal 5 at Heathrow.

• Reducing unnecessary complexity.

• Right use of alliances, codeshares, franchises.

Page 6

Operational Research at British Airways

OR at BA has been going for over 50 years.

The airline industry has some complex and interesting OR problems, e.g.

• Revenue management (yield management) – optimising number of seats available in different selling classes (prices).

• “End-to-end” scheduling, i.e. scheduling, planning, rostering, etc.

• Engineering inventory, vehicle fleets, etc.

• “Commercial” – customer data, frequent flyer programme, transaction data, market research, consultancy

• “Operational” – check-in, queuing, seat allocation, punctuality, baggage, etc.

The academic body for airline OR is AGIFORS, the Airline Group of the International Federation of OR Societies (www.agifors.org)

Page 7

Operational Research at BA

“Effective change through analytical excellence”

Problem Structuring
• Clarification and understanding of a complex problem

Business Modelling
• Implications of future options, decisions and scenarios
• Quantitative and qualitative modelling of complex business areas or issues

Complex Data Analysis
• Delivering insight into complicated issues and questions within the business, through uncovering trends, causes and relationships, to ensure decisions are made on the basis of evidence that reflects the real world

There are also data mining people in the Sales and Marketing departments.

Page 8

Data mining – quick overview

• Linear and logistic regression.

• Decision trees (Classification & Regression Trees – Breiman et al., 1984) – recursive partitioning based on a significance measure.

• Cluster analysis: Ward, k-means, etc.

• Self-organising map (Kohonen, 1982) – can think of as a structured set of clusters.

• Neural network – works out an approximation to the function relating the inputs to the outputs.

• Association rules – based on conditional probabilities p(y|x), e.g. if I buy bread, what is the probability I buy butter?
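To make the association-rule idea concrete, here is a minimal sketch that estimates p(butter | bread) from a handful of shopping baskets. The baskets, the function name `rule_stats` and the extra support and lift figures are illustrative assumptions, not BA data or a method described in the talk.

```python
def rule_stats(baskets, x, y):
    """Return (support, confidence, lift) for the rule 'x -> y'.

    confidence is the empirical estimate of p(y | x).
    """
    n = len(baskets)
    n_x = sum(1 for b in baskets if x in b)
    n_y = sum(1 for b in baskets if y in b)
    n_xy = sum(1 for b in baskets if x in b and y in b)
    support = n_xy / n                              # share of baskets with both items
    confidence = n_xy / n_x if n_x else 0.0         # p(y | x)
    lift = confidence / (n_y / n) if n_y else 0.0   # p(y | x) relative to p(y)
    return support, confidence, lift


# Hypothetical baskets, made up for illustration only.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "jam"},
    {"milk"},
    {"butter", "milk"},
]

support, confidence, lift = rule_stats(baskets, "bread", "butter")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items occur together more often than they would if bought independently.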

Page 9

How a SOM works

Each dot represents a cluster centre, i.e. a vector of data with the same columns (dimensions) as your data set.

For each row of the data set, the algorithm finds the nearest cluster centre and moves it, and its neighbours, ‘towards’ the current data row by a small amount

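This process iterates through the data set a number of times.

Update rule: $w \leftarrow w + \alpha\,(x - w)$, where $w$ is a cluster centre, $x$ is the current data row and $\alpha$ is a small step size.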
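The procedure described on this slide can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used at BA or in Enterprise Miner: the Gaussian neighbourhood weighting, the linearly shrinking step size, the parameter names and the toy data are all choices made for the example.

```python
import math
import random


def nearest(grid, x):
    """Return the grid position whose cluster centre is closest to data row x."""
    return min(
        grid,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(grid[node], x)),
    )


def train_som(rows, n_rows, n_cols, epochs=20, alpha=0.3, radius=1.0, seed=0):
    """Train a small SOM and return {(row, col): cluster_centre}."""
    rng = random.Random(seed)
    # Start every cluster centre at a copy of a randomly chosen data row.
    grid = {
        (r, c): list(rng.choice(rows))
        for r in range(n_rows)
        for c in range(n_cols)
    }
    for epoch in range(epochs):
        # Assumed schedule: shrink the step size linearly so the map settles.
        step = alpha * (1 - epoch / epochs)
        for x in rows:
            win = nearest(grid, x)
            for node, w in grid.items():
                # Neighbourhood weight: 1 at the winner, decaying with grid distance.
                d2 = (node[0] - win[0]) ** 2 + (node[1] - win[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))
                # Move the centre 'towards' the current data row by a small amount.
                for i, xi in enumerate(x):
                    w[i] += step * h * (xi - w[i])
    return grid


# Toy data: two loose groups of two-dimensional rows (illustrative values only).
rows = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
som = train_som(rows, n_rows=2, n_cols=2)
for node in sorted(som):
    print(node, [round(v, 2) for v in som[node]])
```

Because each update is a small move towards a data row, every cluster centre stays inside the range spanned by the data.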

Page 10

Cluster number                              1      2      3      4      5      6      7      8      9
Frequency of cluster                      358    861    495   2799    341    132      0      0    113
Row (in SOM grid)                           1      1      1      2      2      2      3      3      3
Column (in SOM grid)                        1      2      3      1      2      3      1      2      3
First cabin PSJs in last year             0.1    0.2    0.7    0.0    0.0    2.3      .      .    0.1
Club World PSJs in last year              0.9    1.1    1.5    0.3    0.2    9.1      .      .    0.8
WTP PSJs in last year                     0.2    0.3    0.3    0.1    0.2    1.0      .      .    0.2
WorldTrav non-pts earning PSJs in LY      0.2    0.5    0.4    0.6    3.9    0.4      .      .    1.0
Club Europe PSJs in last year             4.4    1.3    5.5    0.3    0.2    6.6      .      .    1.6
EuroTrav non-pts earning PSJs in LY       4.0    0.8    1.1    0.8    1.1    0.8      .      .    2.7
Domestic PSJs in last year                4.3    3.0    2.5    1.2    0.7    2.1      .      .    3.3
Europe PSJs in last year                 11.1    2.8    8.8    1.6    1.7    9.6      .      .    5.3
Africa / M. East PSJs in last year        0.4    0.6    1.1    0.2    1.1    2.5      .      .    0.3
Far East PSJs in last year                0.2    0.2    0.4    0.1    0.3    1.0      .      .    0.1
North / Central America PSJs in LY        0.8    1.5    1.5    0.8    2.8    9.6      .      .    1.8
Net Revenue in last year                 3340   2442   4778    845   1163  19057      .      .   2668
Has miles to redeem into this zone          1      4      8      0      1      7      .      .      2
ONLINE=Y                                  0.0    0.0    0.0    0.0    0.0    0.0      .      .    1.0

(. = no value; clusters 7 and 8 are empty)

Example of cluster output from SOM

Page 11

Data mining commercial software example: SAS Enterprise Miner

http://www.sas.com/technologies/analytics/datamining/miner

Page 12

Data mining methodology example: SAS Institute’s “SEMMA” concept

• Sample – by creating one or more data sets

• Explore – by searching for anticipated relationships, unanticipated trends, and anomalies in order to gain understanding and ideas

• Modify – by creating, selecting, and transforming the variables to focus the model selection process

• Model – by using the analytical tools

• Assess – by evaluating the usefulness and reliability of the findings

• You may not want to include all of these steps

• It may be necessary to repeat one or more of the steps several times

• Another example of a data mining methodology is CRISP-DM (Cross-Industry Standard Process for Data Mining)

Page 13

Some examples of previous data mining work & research at BA

• 1989/90 - looking at neural nets for forecasting bookings and identifying special events.

• 1992 - Predicting “no-shows” (use of neural networks to predict, from the booking attributes, the number of people who have made a booking but do not check in for the flight)

• 1996/7 - Engine condition monitoring : feedforward neural network and self-organising maps used for ‘novelty detection’ to spot abnormal engine condition states and monitor trends (in addition to use of sophisticated conventional physical and data analysis techniques)

• 1996/7 - Neural network for estimation of work requirement for major engineering overhauls of aircraft.

• 1999 - Forecasting pilot training requirements

• Patterns in take-up of electronic ticketing and check-in.

• Effect of disruption and compensation on customer loyalty.

Page 14

More recent data mining on Marketing data

• 1999 – Decision trees used in customer value prediction (PCV).

• 1999 – Self-organising maps used in “Travel Service” CRM.

• 2000/1 – Attrition models & segmentation for Executive Club (frequent flyer) data.

• 2001 – September 11th

• 2002/3 – Analysis of on-board customer survey data (global performance monitor)

• In-flight retail. Analysis of who buys what, on-board.

• 2004 – Executive Club travel pattern segmentation

Page 15

British Airways Executive Club

• “Frequent flyer” scheme (but also includes “partner” organisations e.g. car hire, hotels, credit cards, foreign exchange etc. )

• BA Miles – can redeem these for free flights (and other things)

• Tier points – count towards promotion from Blue to Silver and Gold Tiers.

• Silver and Gold members are eligible for “benefits” such as lounge access, preferential check-in etc.

• Data kept on flights booked and travelled and miles earnt with partner companies.

Page 16

BA Data Mining Examples (1) : some Executive Club models

• UK&US attrition models (who is reducing their flying ?)

• “Behavioural” segmentation (patterns of travel, e.g. occasional longhaul premium, regular shorthaul commuter, etc. )

• “Commercial partners” usage segmentation (car hire, hotels, financial cards, etc. )

• “Segment management” (specific business propositions for top segment “frequent premium stars”)

• “New joiners” model (predict value from customer attributes and patterns)

Techniques used:

• Cluster analysis

• Self-organising maps

• Logistic regression

• Classification & Regression Trees

Software used : SAS, Enterprise Miner

Page 17

BA Examples (2): “Travel Service”

• Leisure travel scheme whereby customers gave details of favourite destinations, activities, plus time of year and budget, and BA sent details of tailored offers (now discontinued).

• Self-organising maps (SOMs) used to cluster database and select groups for matching (1998/9).

• The diagram shows 16 customer segments (the green squares within each box) viewed on 20 different variables, to show booking, travel and destination patterns. The area of the small squares shows magnitude.

Note: this chart was not generated using Enterprise Miner, though SAS was used in some of the analysis

Page 18

“Travel Service” – some customer clusters

• Sun seekers who want all components included (13.5,2.8)

• Blue tier exec club members with city breaks (1.2,4.3)

• Busy people who get away when can & are not price sensitive (2.3,8.2)

• Adventure Trail Finders (2.6,3.2)

• Longhaul package type person (0.4,2.0)

• Type of person who just ticks “all offers” box (2.3,4.8)

• Retired Southerners looking for Australia? (9.7,2.3)

• Diners & shoppers (or who like to think they do) (3.2,1.3)

• The bookers who have not provided us with all info (8.5,20.5)

Figures in brackets: (cluster as % of total, % of cluster who have made a booking)

Page 19

BA Example (3) : In-flight retail

This example shows a cluster with preferences for jewellery / watches and “experience” packages.

Page 20

BA Example (3) : In-flight retail

This example shows the use of a SOM in Enterprise Miner to identify a small cluster of customers with very high value purchase patterns

A (small) cluster of shopaholics!

Purple squares show normalised mean for this specific cluster

Blue squares show average across all clusters

Variables listed in order of difference of this cluster from overall mean
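A small sketch of the ordering used in this kind of cluster profile: each variable's cluster mean is compared with the overall mean and the variables are sorted by the size of the difference. The slide does not say how the means were normalised, so dividing by the overall standard deviation is an assumption here, and the variable names and numbers are hypothetical.

```python
def rank_variables(cluster_means, overall_means, overall_stds):
    """Order variables by |normalised difference| between cluster and overall mean."""
    diffs = {
        name: (cluster_means[name] - overall_means[name]) / overall_stds[name]
        for name in cluster_means
    }
    # Largest absolute difference first, as in the chart description.
    return sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical in-flight spend per customer, for illustration only.
cluster_means = {"jewellery": 120.0, "perfume": 30.0, "alcohol": 18.0}
overall_means = {"jewellery": 20.0, "perfume": 25.0, "alcohol": 20.0}
overall_stds = {"jewellery": 40.0, "perfume": 10.0, "alcohol": 8.0}

ranked = rank_variables(cluster_means, overall_means, overall_stds)
for name, diff in ranked:
    print(f"{name:10s} {diff:+.2f}")
```

Variables at the top of the list are the ones that most distinguish the cluster from the customer base as a whole.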

Page 21

Data mining and Complexity

Page 22

Commercial complexity and the airline business

• An airline is a very complex business.
– In this presentation, we are just considering commercial complexity, that is, in the selling process.
– Operational complexity is very important to us too, but is another subject!

• Some of this complexity is there for good reasons, e.g. good commercial sense, supply and demand economics, or for the convenience of the customer.
– However, some is ‘historic’ or dictated by third parties, or is not serving its purpose.

One area in which British Airways is interested at the moment is:

• How should we measure commercial complexity?

• How effective are the many different ‘ways’ of selling tickets?

• Does the complexity matter?

Page 23

Using data mining methods to measure complexity

How can we use data mining methods to try to measure complexity ?

• Data mining techniques are good at adjusting their parameters to represent the level of complexity in the data (number of dimensions, or interactions, or ‘different things going on’).

• Machine learning theory makes use of measures such as entropy (information), minimum description length, VC-dimension, etc.

• Take a decision tree, for example. It will continue to partition the data set recursively until it can no longer find significant splits.

• So, in the right circumstances, a decision tree can show which parts of the business are ‘simple’ and which are complex. If we set the target variable to be a measure of revenue or profitability, we can also see how the complexity relates to yield, in a crude sort of way. (Note I have taken no account of ‘cost’ here for the moment)
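The idea of reading complexity off the size of a tree can be sketched as follows. This is a toy illustration, not the Enterprise Miner algorithm: it splits on "attribute equals value" tests, uses a minimum reduction in squared error (`min_gain`) in place of a formal significance test, and runs on made-up ticket records.

```python
def sse(values):
    """Sum of squared errors of the values around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, target):
    """Return (gain, attribute, value) for the best 'attribute == value' split."""
    base = sse([r[target] for r in rows])
    best = (0.0, None, None)
    for attr in rows[0]:
        if attr == target:
            continue
        for value in sorted({r[attr] for r in rows}):
            left = [r[target] for r in rows if r[attr] == value]
            right = [r[target] for r in rows if r[attr] != value]
            if not left or not right:
                continue
            gain = base - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, attr, value)
    return best


def count_leaves(rows, target, min_gain=1.0):
    """Partition recursively until no worthwhile split remains; return leaf count."""
    gain, attr, value = best_split(rows, target)
    if attr is None or gain < min_gain:
        return 1
    left = [r for r in rows if r[attr] == value]
    right = [r for r in rows if r[attr] != value]
    return count_leaves(left, target, min_gain) + count_leaves(right, target, min_gain)


# Hypothetical ticket records: one 'simple' and one 'complex' part of the business.
simple = [{"channel": "web", "deal": "none", "revenue": 50.0} for _ in range(6)]
complex_part = [
    {"channel": "agent", "deal": "A", "revenue": 200.0},
    {"channel": "agent", "deal": "A", "revenue": 200.0},
    {"channel": "agent", "deal": "B", "revenue": 150.0},
    {"channel": "agent", "deal": "B", "revenue": 150.0},
    {"channel": "call_centre", "deal": "C", "revenue": 120.0},
    {"channel": "call_centre", "deal": "C", "revenue": 120.0},
]

simple_leaves = count_leaves(simple, "revenue")
complex_leaves = count_leaves(complex_part, "revenue")
print("simple:", simple_leaves, "complex:", complex_leaves)
```

The uniform records need no splits at all, while the records with several negotiated rates need a leaf per rate, so the leaf count acts as a crude complexity score for each part of the data.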

Page 24

Decision tree: “tree-ring” diagram representation in Enterprise Miner

[Figure: “tree ring” diagram]

The centre of the diagram represents the ‘root’ of the tree, i.e. the whole data set.

The outside of the diagram represents the lowest levels of subdivision.

The colours are used to represent the mean value of the target variable within a group (darker colour = higher value).

An alternative way of viewing different levels of structure in different parts of the tree.

Page 25

Using a decision tree to measure commercial complexity

In this example, a decision tree is used to show aspects of commercial complexity.

The input data was for a London-Edinburgh flight on a single day.

The input variables represent:

• different ticket classes,

• ‘channels’ (agents, call centres, website and so on),

• corporate deals,

• special fares,

• different currencies, etc.

Large simple areas such as this one for undiscounted club tickets represent low complexity in this sense. There may be other kinds of complexity, e.g. due to ticket or booking changes.

Highly fragmented areas such as here represent many different rates and specific circumstances.

[Figure: “tree ring” diagram]

Page 26

Data mining and complexity: Output of process

[Figure: quadrant chart; axes: Complexity, Profitability]

• Low complexity, high revenue – e.g. undealt Club class tickets

• High complexity, high revenue – e.g. corporate deals

• Low complexity, low revenue – e.g. web bookings

• High complexity, low revenue – e.g. groups

Page 27

Data mining and complexity: Caveats

1. Data representation. Need to allow enough detail not to average out the effect we are trying to measure, but need to limit it so we get a workable model.

2. Choosing a target variable. There may be elements of complexity which we are interested in, but which do not cause a change in the ‘target’ variable, and vice versa.

3. Problem with decision tree if the output is a straightforward linear function of the input (it will try to model it as step-functions).

4. This analysis does not tell us necessarily whether the complexity we are looking at is good or bad, but gives us places to start looking.

5. Much of the time, of course, we are not bothered about the number of combinations, because the different variables are decoupled.

6. There may of course be good reasons for retaining the complexity !
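Caveat 3 can be illustrated numerically. The sketch below approximates the straight line y = x on [0, 1) with a step function, as a regression tree would, and reports the mean squared error for an increasing number of leaves. Equal-width leaves and the grid of 1200 sample points are simplifying assumptions made for the example.

```python
def step_fit_error(n_points, n_leaves):
    """Mean squared error of a piecewise-constant fit to y = x on [0, 1).

    The interval is cut into n_leaves equal-width pieces and each piece
    predicts the mean of the points that fall inside it.
    """
    xs = [(i + 0.5) / n_points for i in range(n_points)]
    total = 0.0
    for leaf in range(n_leaves):
        lo, hi = leaf / n_leaves, (leaf + 1) / n_leaves
        ys = [x for x in xs if lo <= x < hi]  # the target is y = x
        mean = sum(ys) / len(ys)
        total += sum((y - mean) ** 2 for y in ys)
    return total / n_points


leaf_counts = [1, 2, 4, 8]
errors = [step_fit_error(1200, k) for k in leaf_counts]
for k, err in zip(leaf_counts, errors):
    print(f"{k} leaves: mean squared error {err:.5f}")
```

Each doubling of the number of leaves only cuts the error to about a quarter, whereas a single linear term would fit this relationship exactly. A tree that looks large for this reason is reflecting the model form rather than genuine business complexity.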

Page 28

Using a self-organising map to look at patterns in ticket sale data

[Figure: 8 SOM maps, one per variable: revenue; E-tickets; Web bookings; BA ticketed; Currency: GB £; Corporate dealt; Multi-leg flights; Fully flexible tickets]

Key: red = high value or proportion, yellow = low

Each of the 8 diagrams shows the value of a specific variable for each of the 100 (10x10) clusters.

Frequency (number of passengers in each cluster) is not shown but should be examined alongside these charts.

The input data were for a London-Edinburgh flight on a single day.

The input variables represent different ticket classes, ‘channels’ (agents, call centres, website and so on), corporate deals, special fares, different currencies, etc. A subset of 8 variables is shown here.

Page 29

Using a self-organising map to measure complexity

• Here, there is no target variable

• We are using the SOM to find structure in the data

• We could find the size of SOM needed to model the ‘envelope’ which covers the data, and use that size as a direct measure of complexity, in the same way as we could use the size of a decision tree to measure this ‘dimension’.

• We need to be careful how we represent the data, that we are not just measuring artefacts of the representation.

• In the SOM, we can also visually ‘overlay’ the patterns of different variables as a way of visualising correlations and fine structure.

• In the example shown, some findings are immediately evident, e.g.
– Most non-e-tickets on these flights were multi-leg flights (i.e. transfers) ticketed by other airlines, in foreign currencies.
– Web bookings, though accounting for a relatively large number of transactions, show up as low complexity.

Page 30

“So what?” – how is this measuring complexity?

[Figure: bar chart of number of clusters (vertical axis, 0 to 30) against number of passengers in cluster (bins: 0, 1, 2-5, 6-10, 11-20, 21-50, 51-100, 100+)]

We gave the SOM the space to form 100 clusters. It actually populated 90 of them.

Part of the objective is to find out how much of the business falls into ‘simple’ and ‘complex’ categories.

18% of the passengers fell into one cluster, that is, web bookings sold by BA in the UK, blue Executive Club tier, non-flexible ticket classes.

However, over 25 of the clusters had fewer than 5 passengers in them.
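The summary on this slide can be reproduced from a list of cluster sizes. The sketch below uses made-up sizes for a 10-cluster map rather than the real 100-cluster result. The bins follow the chart, with "100+" read as more than 100. Counting only populated clusters as "small" and the entropy-based "effective number of clusters" are assumptions added for illustration.

```python
import math

# Bins as on the chart; "100+" is taken to mean more than 100 passengers.
BINS = [
    ("0", 0, 0), ("1", 1, 1), ("2-5", 2, 5), ("6-10", 6, 10),
    ("11-20", 11, 20), ("21-50", 21, 50), ("51-100", 51, 100), ("100+", 101, None),
]


def occupancy_summary(sizes):
    """Summarise how passengers are spread across SOM clusters."""
    total = sum(sizes)
    populated = [s for s in sizes if s > 0]
    histogram = {label: 0 for label, _, _ in BINS}
    for s in sizes:
        for label, lo, hi in BINS:
            if s >= lo and (hi is None or s <= hi):
                histogram[label] += 1
                break
    # Entropy of the occupancy shares; exp(entropy) is the number of equally
    # sized clusters that would give the same spread.
    shares = [s / total for s in populated]
    entropy = -sum(p * math.log(p) for p in shares)
    return {
        "populated": len(populated),
        "largest_share": max(sizes) / total,
        "small_clusters": sum(1 for s in populated if s < 5),
        "histogram": histogram,
        "effective_clusters": math.exp(entropy),
    }


# Hypothetical passenger counts per cluster, for illustration only.
sizes = [0, 1, 1, 2, 3, 4, 8, 15, 30, 36]
summary = occupancy_summary(sizes)
for key, value in summary.items():
    print(key, value)
```

A map dominated by one large cluster with a long tail of tiny ones gives an effective number of clusters well below the number actually populated, which is one way to put a single figure on the 'simple' versus 'complex' split.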

Page 31

Successful data mining

Page 32

Some possible difficulties with Data Mining

• Expectations either too high or too low.

• Myths of data mining.

• Loose use of the term ‘data mining’

• Asking the wrong questions.

• Wrong positioning in the company.

• Does not fit ‘standard’ approach.

• Data driven and iterative, so cannot necessarily plan in advance.

• Can get swamped by results / options / model versions.

• Danger of stating the obvious or not being believed.

• Data quality, data definition and business understanding issues.

Page 33

Successful Data Mining: Spreading understanding

• It is often difficult initially to communicate the place, nature and benefits of data mining, even to experienced statisticians, operational researchers, or artificial intelligence people, but once people “get it” they are enthusiastic.

• Engineers, Revenue Management and Marketing analysts are often the closest to the ideas.

• Often difficult to convey complex results in meaningful business terms.

• There is sometimes a need to convince ‘upstream’ processes of the value of collecting, cleaning and maintaining data for data mining.

Page 34

Successful Data Mining : asking the right questions

• Much of the skill in data mining is in helping the client to articulate the question that they really want to answer and decide if it is really a data mining question.

E.g. is it a data mining question?

• How many Executive Club members travelled to New York in business class last year? No

• What should our marketing strategy be for the Far East region? No

• What factors influence a customer’s propensity to recommend BA? Yes

• To which customers should we send our next campaign? Yes

• Are there any patterns in these data? ?

Page 35

Successful Data Mining : the right mix of knowledge

• With today’s computing tools, it is easy to get ‘results’ from a data mining exercise.

• The difficult part is interpreting these, sense-checking them, and articulating a simple message from what is often a complex picture.

• Mix of technical and business knowledge essential.

• Close involvement of clients and business domain experts.

Page 36

Successful Data Mining: the right tools and infrastructure

• Algorithms:

– Robustness and clarity often most important

– ‘Build vs buy’ decisions

• What BA is looking for in a data mining tool …

– Set of algorithms with good coverage of problem types.

– Scalability

– Ease of implementation of models / generated code

– Integration with data sources: ‘openness’

– Compatibility with other software and company policy

– Justifiable value

Page 37

Any questions ?

Simon Cumming

British Airways PLC

Waterside (HDA3)

PO box 365, Harmondsworth

Middlesex UB7 0GB

Tel / fax 020 8738 8313

Email: simon.n.cumming@britishairways.com