data science for the hospitality domain - opentable

Data Science for the Hospitality Domain. Dr. Nicolas Nicolov, Sr. Director, Head of Data Science, OpenTable, Inc., 1 Montgomery Str., San Francisco, CA 94104, U.S.A.

Upload: nicolas-nicolov

Post on 19-Jan-2017


Category: Data & Analytics



TRANSCRIPT

Page 1: Data science for the hospitality domain - OpenTable

Data Science for the Hospitality Domain

Dr. Nicolas Nicolov, Sr. Director, Head of Data Science, OpenTable, Inc., 1 Montgomery Str., San Francisco, CA 94104, U.S.A.

Page 2: Data science for the hospitality domain - OpenTable


OpenTable

• Seated over 1B diners since 1999; $45B spent at partner restaurants.
• 20M diners / month.
• 42M reviews created since 2008 (650K reviews/month).
• 600 partners: Google, TripAdvisor, Bing, Yahoo, Zagat, Eater …

Part of the Priceline Group.

Page 3: Data science for the hospitality domain - OpenTable

US: 24,194 reservable / 66,109 total

Page 4: Data science for the hospitality domain - OpenTable

UK: 5,389 reservable / 6,832 total

Page 5: Data science for the hospitality domain - OpenTable

World-wide: 37,861 reservable / 87,328 total

Page 6: Data science for the hospitality domain - OpenTable

Restaurant Cuisines: Top 5 Cities & Globally

World-wide: 1. Italian, 2. Seafood, 3. American, 4. Steak, 5. Japanese

New York City: 1. Italian, 2. American, 3. Japanese, 4. Seafood, 5. French

London: 1. Italian, 2. Japanese, 3. Indian, 4. Steak, 5. Asian

San Francisco: 1. Italian, 2. Seafood, 3. American, 4. Steak, 5. Japanese

Chicago: 1. Italian, 2. American, 3. Steak, 4. Seafood, 5. Steakhouse

Washington DC: 1. American, 2. Italian, 3. Seafood, 4. Contemporary Am., 5. Steak

Page 7: Data science for the hospitality domain - OpenTable

Mobile First

More than 50% of reservations are made on mobile.

Discovery tab / Collections:
• iOS launched: June 2016.
• Android launched: Nov 7, 2016.

Page 8: Data science for the hospitality domain - OpenTable

Collections:

Page 9: Data science for the hospitality domain - OpenTable

Data Science at OpenTable

Areas:
• Autocomplete.
• Search (indexing, ranking).
• Recommendations.
• Inventory Optimization.
• Advertising / Promoted Inventory.
• Content analysis.

Projects:
• Autocomplete.
• Tagging.
• Cuisine / menu analysis.
• Search (all platforms).
• Similarity: User-user / Restaurant-restaurant.
• Recommendations (Web, Collections, Emails; Explanations).
• Inventory optimization (cover/demand prediction, simulation, tracking lift).
• Sentiment: review analysis.
• Review selection.
• SEO: Points Of Interest (POIs).
• Wait time prediction.
• Turn time prediction.

Page 10: Data science for the hospitality domain - OpenTable

Search

Page 11: Data science for the hospitality domain - OpenTable

Autocomplete: Location names / cuisines / tags

Page 12: Data science for the hospitality domain - OpenTable

[Figure: search with Retrieval, Facets, Ranking and Tags producing Ranked Search Results.]

Facets ~ Search keywords

Dishes

Page 13: Data science for the hospitality domain - OpenTable

Frequent queries: iPhone / iPad

Page 14: Data science for the hospitality domain - OpenTable


Time to book

People have to sleep at some point

20 days in advance

Page 15: Data science for the hospitality domain - OpenTable


Hierarchical Cuisines

Page 16: Data science for the hospitality domain - OpenTable


Page 17: Data science for the hospitality domain - OpenTable


Page 18: Data science for the hospitality domain - OpenTable


Machine Learning Ranking

Page 19: Data science for the hospitality domain - OpenTable

Recommendations

Page 20: Data science for the hospitality domain - OpenTable

Personalized Restaurant Ranking


Alice 91% 87% 85% 84% 79% 78% 60% 59% 58% 57% 20% 19%

. . .

Bob 95% 91% 87% 85% 80% 78% 71% 69% 61% 57% 12% 10%

. . .

Page 21: Data science for the hospitality domain - OpenTable

Topic Models: Fingerprints for restaurants, from our diners’ perspective

[Figure: example restaurant fingerprint with topic words such as Italian food, pizza, wine, waiters, expensive.]

Page 22: Data science for the hospitality domain - OpenTable


Ingredients of a Recommendation Engine

Page 23: Data science for the hospitality domain - OpenTable

Personalized subgroups (lists/rows)


Alice 91% 87% 85% 84% 79% 78% 60% 59% 58% 57% 20% 19%

. . .

Alice
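A minimal sketch of how per-diner scores like Alice's could be grouped into personalized rows. It assumes each restaurant carries a single hypothetical row label (e.g., its cuisine) and that rows are ordered by their best-scoring restaurant; the restaurant IDs, labels and scores are made up, and this is an illustration rather than OpenTable's actual logic.

```python
from collections import defaultdict


def personalized_rows(scores, tags, top_per_row=3):
    """Group one diner's restaurant scores into rows, one row per tag.

    scores: dict restaurant -> predicted affinity for this diner.
    tags:   dict restaurant -> row label (hypothetical, e.g. cuisine).
    Returns [(tag, [restaurants])], rows ordered by their best score.
    """
    rows = defaultdict(list)
    # Rank all restaurants for this diner, highest score first.
    for rest in sorted(scores, key=scores.get, reverse=True):
        rows[tags[rest]].append(rest)
    # Each row's first entry is its best; order rows by that score.
    ordered = sorted(rows.items(), key=lambda kv: scores[kv[1][0]], reverse=True)
    return [(tag, rests[:top_per_row]) for tag, rests in ordered]


alice = {"r1": 0.91, "r2": 0.87, "r3": 0.85, "r4": 0.84, "r5": 0.20}
tags = {"r1": "Italian", "r2": "Seafood", "r3": "Italian", "r4": "Steak", "r5": "Seafood"}
print(personalized_rows(alice, tags))
# [('Italian', ['r1', 'r3']), ('Seafood', ['r2', 'r5']), ('Steak', ['r4'])]
```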

Page 24: Data science for the hospitality domain - OpenTable


Personalized Emails

Page 25: Data science for the hospitality domain - OpenTable


Mobile Recommendations

Page 26: Data science for the hospitality domain - OpenTable


Inventory Optimization

Page 27: Data science for the hospitality domain - OpenTable

Busy restaurants

We can help optimize their schedule.

Page 28: Data science for the hospitality domain - OpenTable

Seat Most Diners Every Day

The average restaurant has tables sitting empty between turns that could accommodate an additional 4,580 diners per year. At $45 per guest (average cost per meal), that’s about $200k.
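The $200k figure follows from multiplying the extra diners by the average spend per guest; a quick check of the arithmetic (per restaurant, per year):

```latex
4{,}580 \ \text{diners} \times \$45 \ \text{per diner} = \$206{,}100 \approx \$200\text{k}
```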

But squeezing the most out of every seat seems impossible…

Page 29: Data science for the hospitality domain - OpenTable

This reservation prevents an earlier one

If the ‘turn time’ for a party of 2 at the restaurant is 2 hours, the ‘Bad Reso’ starting at 7:45pm prevents a reservation at 6pm, because a 6pm party would occupy the table until 8pm. If we could only have asked the user to shift their reservation by a mere 15 minutes (to start at 8pm), this would have opened an entire new turn on the table (starting at 6pm).

[Figure: table timeline with the bad reservation marked X and the turn time.]

Page 30: Data science for the hospitality domain - OpenTable


Keep the Table Busy the Whole Night

With the later reservation shifted over by only 15 min, now there is space for an earlier turn.

Possible Reservation
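The turn-time argument reduces to an interval-overlap check. A minimal sketch, assuming a single table, a fixed 2-hour turn time and an arbitrary date; `fits` and `t` are hypothetical helper names, not part of any OpenTable system.

```python
from datetime import datetime, timedelta


def fits(start, turn, booked):
    """True if a party seated at `start` for `turn` overlaps no booked slot.

    booked: list of (start, end) datetimes already on the table.
    """
    end = start + turn
    return all(end <= b_start or start >= b_end for b_start, b_end in booked)


def t(hhmm):
    """Hypothetical helper: a datetime at HH:MM on an arbitrary date."""
    return datetime.strptime("2016-11-07 " + hhmm, "%Y-%m-%d %H:%M")


turn = timedelta(hours=2)  # turn time for a party of 2, as in the example
for late_start in ("19:45", "20:00"):
    booked = [(t(late_start), t(late_start) + turn)]
    print(late_start, "-> 6pm turn possible:", fits(t("18:00"), turn, booked))
```

With the later party at 7:45pm, a 6pm party would still be seated at 7:45pm, so `fits` returns False; shifting the later party to 8pm makes it True.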

Page 31: Data science for the hospitality domain - OpenTable

System Prevents Costly Reservations

If we think a diner can book at 7:15pm, we will restrict the times that would prevent the diner from getting an early turn:

Page 32: Data science for the hospitality domain - OpenTable

No Insight into Impact of Accepting a Reso

Accepting a 7:45pm reservation will prevent an extra turn on that table.

Page 33: Data science for the hospitality domain - OpenTable

Tetris Shows which Resos Cost a Turn

Restaurant staff knows the impact of each reservation.

Page 34: Data science for the hospitality domain - OpenTable

Simulator: No Restrictions (2 turns) vs. Winning Policy (3 turns)

Page 35: Data science for the hospitality domain - OpenTable

Techniques for Cover Prediction

Page 36: Data science for the hospitality domain - OpenTable

Cover Prediction: what? why? how?

• Predict future covers of a specific restaurant.
• The predicted covers are used in calculating lift.
• Predictions: time series and ML models.

[Figure: covers vs. time. Past (real): $x_0, x_1, x_2, \dots, x_n$. Future (predictions): $F_{n+1}, F_{n+2}, \dots, F_{n+k}$.]

Page 37: Data science for the hospitality domain - OpenTable

Lift

Lift = average percentage difference between the observed and predicted covers.

[Figure: covers vs. time. Past: real covers (used for training). Future: real covers with the new system (how the new system did) vs. predictions for the old system (how the old system would have done).]
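One way to compute lift as defined above, assuming the percentage difference is taken relative to the prediction (the old system's counterfactual) and averaged over the evaluation days; the slide does not state the denominator, and the numbers are hypothetical.

```python
def lift(observed, predicted):
    """Average percentage difference between observed and predicted covers.

    Assumption: each day's difference is relative to the prediction
    (the "old system" counterfactual); predicted covers must be > 0.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equal-length, non-empty series")
    diffs = [(o - p) / p * 100.0 for o, p in zip(observed, predicted)]
    return sum(diffs) / len(diffs)


# Hypothetical: real covers with the new system vs. predictions for the old one.
print(lift([110, 120, 95], [100, 100, 100]))  # mean of +10%, +20%, -5%
```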

Page 38: Data science for the hospitality domain - OpenTable

Average

• Predictions = average of all existing covers.

[Figure: covers vs. time; real covers in the past, predictions in the future.]

$$F_{n+j} = \frac{1}{n+1} \sum_{i=0}^{n} x_i, \qquad j \in \{1, 2, \dots\}$$
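The averaging forecast is essentially a one-liner; this sketch (the function name is illustrative) returns the same value for every future day.

```python
def average_forecast(xs, k):
    """Forecast the next k covers as the mean of all observed covers x_0..x_n."""
    if not xs:
        raise ValueError("need at least one observation")
    mean = sum(xs) / len(xs)  # 1/(n+1) * sum_{i=0}^{n} x_i
    return [mean] * k


print(average_forecast([40, 52, 61, 47], k=3))  # [50.0, 50.0, 50.0]
```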

Page 39: Data science for the hospitality domain - OpenTable

Moving Average

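• Predictions: average of the previous k values.
• Sliding window: older data points are not used.

[Figure: covers vs. time; real covers in the past, predictions in the future.]

Smoothing:
$$x'_t = \frac{1}{k} \sum_{i=0}^{k-1} x_{t-i}, \qquad t \in \{k-1, k, \dots\}$$

Forecast:
$$F_{n+j} = x'_n, \qquad j \in \{1, \dots, k\}$$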
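A sketch of the moving-average smoothing and forecast above, with 0-based Python indices matching $x_0, \dots, x_n$; the function name is illustrative.

```python
def moving_average_forecast(xs, k):
    """Sliding-window smoothing x'_t = mean(x_{t-k+1..t}) and forecast F_{n+j} = x'_n.

    Returns (smoothed values for t = k-1..n, forecasts for j = 1..k).
    """
    if k < 1 or len(xs) < k:
        raise ValueError("need at least k observations")
    smoothed = [sum(xs[t - k + 1:t + 1]) / k for t in range(k - 1, len(xs))]
    # Only the last window matters for the forecast; older points drop out.
    return smoothed, [smoothed[-1]] * k


smoothed, forecast = moving_average_forecast([10, 20, 30, 40, 50], k=2)
print(smoothed, forecast)  # [15.0, 25.0, 35.0, 45.0] [45.0, 45.0]
```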

Page 40: Data science for the hospitality domain - OpenTable

Exponential Average

• Predictions = combine existing covers by giving exponentially lower weights to older covers.
• The importance given to recent vs. older covers is controlled by $\alpha$.

[Figure: covers vs. time; real covers in the past, predictions in the future.]

Initialization:
$$x'_0 = x_0$$

Smoothing:
$$x'_i = \alpha \cdot x_i + (1-\alpha) \cdot x'_{i-1}, \qquad i \in \{1, \dots, n\}$$

Forecast:
$$F_{n+j} = x'_n, \qquad j \in \{1, \dots, k\}$$

(Robert Brown, Charles Holt)
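A minimal sketch of the exponential average above (simple exponential smoothing), assuming $0 < \alpha \le 1$; the function name is illustrative.

```python
def exponential_smoothing(xs, alpha, k):
    """x'_0 = x_0, x'_i = alpha * x_i + (1 - alpha) * x'_{i-1}; F_{n+j} = x'_n."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = xs[0]
    for x in xs[1:]:
        # Recent covers get weight alpha; older history decays by (1 - alpha).
        level = alpha * x + (1 - alpha) * level
    return [level] * k


print(exponential_smoothing([10, 20, 30], alpha=0.5, k=2))  # [22.5, 22.5]
```

A larger $\alpha$ tracks recent covers more closely; a smaller one smooths more aggressively.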

Page 41: Data science for the hospitality domain - OpenTable

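Example: Exponential Average

$$x'_0 = x_0$$
$$x'_1 = \alpha \cdot x_1 + (1-\alpha) \cdot x_0$$
$$x'_2 = \alpha \cdot x_2 + (1-\alpha) \cdot \alpha \cdot x_1 + (1-\alpha)^2 \cdot x_0$$
$$x'_3 = \alpha \cdot x_3 + (1-\alpha) \cdot \alpha \cdot x_2 + (1-\alpha)^2 \cdot \alpha \cdot x_1 + (1-\alpha)^3 \cdot x_0$$
$$x'_n = \alpha \cdot x_n + (1-\alpha) \cdot \alpha \cdot x_{n-1} + (1-\alpha)^2 \cdot \alpha \cdot x_{n-2} + \dots + (1-\alpha)^{n-1} \cdot \alpha \cdot x_1 + (1-\alpha)^n \cdot x_0$$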
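The expansion above can be checked numerically: the recursive update and the closed-form weighted sum give the same value (the weights $\alpha(1-\alpha)^m$ plus $(1-\alpha)^n$ sum to 1). A small sketch with made-up covers:

```python
import math


def smoothed_recursive(xs, alpha):
    """x'_n via the recursion x'_i = alpha * x_i + (1 - alpha) * x'_{i-1}."""
    level = xs[0]
    for x in xs[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def smoothed_closed_form(xs, alpha):
    """x'_n = sum_{m=0}^{n-1} alpha (1-alpha)^m x_{n-m} + (1-alpha)^n x_0."""
    n = len(xs) - 1
    total = (1 - alpha) ** n * xs[0]
    for m in range(n):
        total += alpha * (1 - alpha) ** m * xs[n - m]
    return total


xs = [12.0, 15.0, 9.0, 20.0]
print(math.isclose(smoothed_recursive(xs, 0.3), smoothed_closed_form(xs, 0.3)))  # True
```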

Page 42: Data science for the hospitality domain - OpenTable

Holt Winters 2D

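• Take into account the previous value and the trend.
• Trend is the slope between the current and previous points.
• $\alpha$ and $\beta$ control the weight given to the current point and the trend.

[Figure: covers vs. time; real covers, level and trend in the past, predictions in the future.]

Double exponential smoothing:

Initialization:
$$l_1 = x_1; \qquad b_1 = x_1 - x_0$$

Smoothing, for $i \in \{2, \dots, n\}$:
$$l_i = \alpha \cdot x_i + (1-\alpha) \cdot (l_{i-1} + b_{i-1}) \qquad \text{(Level)}$$
$$b_i = \beta \cdot (l_i - l_{i-1}) + (1-\beta) \cdot b_{i-1} \qquad \text{(Trend)}$$
$$x'_i = l_i + b_i$$

Forecast:
$$F_{n+j} = x'_n + j \cdot b_n$$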
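A sketch of double exponential smoothing with the initialization above. One assumption to flag: it forecasts with the common textbook form $F_{n+j} = l_n + j \cdot b_n$, whereas the slide writes $F_{n+j} = x'_n + j \cdot b_n$ with $x'_n = l_n + b_n$, which shifts the horizon by one step. The function name is illustrative.

```python
def holt_linear(xs, alpha, beta, k):
    """Holt's double exponential smoothing (level + trend).

    Initialization as on the slide: l_1 = x_1, b_1 = x_1 - x_0.
    Forecast in the textbook form F_{n+j} = l_n + j * b_n (an assumption).
    """
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    level, trend = xs[1], xs[1] - xs[0]
    for x in xs[2:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + j * trend for j in range(1, k + 1)]


print(holt_linear([0, 2, 4, 6, 8], alpha=0.5, beta=0.5, k=2))  # [10.0, 12.0]
```

On a perfectly linear series the level and trend stay exact, so the forecast simply continues the line.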

Page 43: Data science for the hospitality domain - OpenTable

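Example: Holt Winters 2D

Initialization:
$$l_1 = x_1, \qquad b_1 = x_1 - x_0$$

$$l_2 = \alpha \cdot x_2 + (1-\alpha) \cdot (l_1 + b_1) = \alpha \cdot x_2 + (1-\alpha) \cdot (x_1 + (x_1 - x_0)) = \dots$$
$$b_2 = \beta \cdot (l_2 - l_1) + (1-\beta) \cdot b_1 = \dots$$
$$x'_2 = l_2 + b_2$$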

Page 44: Data science for the hospitality domain - OpenTable

Holt Winters 3D

• Predictions = take into account the previous value, the trend in covers and seasonality.
• Trend is the slope between the current and the previous point.
• Seasonality takes into account the average of every $L$-th point in the season; in our case the season is $L = 7$ points, i.e., 1 week.
• $\alpha$, $\beta$ and $\gamma$ control the weight given to the current point, the trend and the seasonality.

[Figure: covers vs. time; real covers with level, trend and seasonal components in the past, predictions in the future.]

Triple exponential smoothing (with additive seasonality):
$$l_i = \alpha \cdot (x_i - s_{i-L}) + (1-\alpha) \cdot (l_{i-1} + b_{i-1}) \qquad \text{(Level)}$$
$$b_i = \beta \cdot (l_i - l_{i-1}) + (1-\beta) \cdot b_{i-1} \qquad \text{(Trend)}$$
$$s_i = \gamma \cdot (x_i - l_i) + (1-\gamma) \cdot s_{i-L} \qquad \text{(Seasonality)}$$

Forecast (Peter Winters):
$$F_{n+j} = l_n + j \cdot b_n + s_{n-L+1+(j-1) \bmod L}$$

Page 45: Data science for the hospitality domain - OpenTable

Holt Winters 3D with Seasonality

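L = season length = 1 week.

[Figure: observed covers and the seasonal component over time; past vs. future, with a 1-week season marked.]

Initialization:
$$b_0 = \frac{1}{L}\left(\frac{x_{L+1}-x_1}{L} + \frac{x_{L+2}-x_2}{L} + \dots + \frac{x_{L+L}-x_L}{L}\right)$$
$$l_0 = x_0$$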
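A sketch of additive Holt Winters with a weekly season, using the slide's trend initialization $b_0$ (shifted to 0-based indices). Assumptions not on the slides: the seasonal offsets start as deviations from the first week's mean, and the level starts at that mean (i.e., $x_0$ with its seasonal offset removed) rather than at $x_0$, so level and offsets share a scale. It needs at least two weeks of data; the function name is illustrative.

```python
def holt_winters_additive(xs, season_len, alpha, beta, gamma, k):
    """Triple exponential smoothing with additive seasonality.

    xs: daily covers x_0..x_n; season_len: L (7 for a weekly season).
    """
    L = season_len
    n = len(xs)
    if n < 2 * L:
        raise ValueError("need at least two full seasons of data")
    # b_0: average per-step change between the first two seasons (slide formula).
    trend = sum((xs[L + i] - xs[i]) / L for i in range(L)) / L
    # Assumption: offsets relative to the first season's mean; level starts there.
    first_mean = sum(xs[:L]) / L
    seasonals = [xs[i] - first_mean for i in range(L)]
    level = first_mean
    for i in range(1, n):
        s_prev = seasonals[i % L]  # s_{i-L}: same weekday, one season earlier
        prev_level = level
        level = alpha * (xs[i] - s_prev) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[i % L] = gamma * (xs[i] - level) + (1 - gamma) * s_prev
    # F_{n+j} = l_n + j * b_n + latest seasonal offset for the matching weekday.
    return [level + j * trend + seasonals[(n - 1 + j) % L] for j in range(1, k + 1)]


week = [10, 20, 30, 40, 50, 60, 70]
print(holt_winters_additive(week * 3, 7, 0.5, 0.5, 0.5, 7))
# [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
```

On a series that repeats the same week exactly, the forecast reproduces that week.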

Page 46: Data science for the hospitality domain - OpenTable

Calculating Hyper Parameters

• Objective function to minimize: Root Mean Square Error (RMSE), which depends on $\alpha$, $\beta$ and $\gamma$.
• Nelder-Mead heuristic search method.
• A simplex is a polytope of n + 1 vertices in n dimensions.
• At each step we do one of: reflection, expansion, contraction or shrinkage. (John A. Nelder & Roger Mead)
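To make the objective concrete, a sketch of an RMSE function of a smoothing parameter. For brevity it scores one-step-ahead forecasts of simple exponential smoothing (a single $\alpha$) rather than the full Holt Winters model with $\alpha, \beta, \gamma$; the data and names are hypothetical, and the coarse grid only stands in for the Nelder-Mead search described here.

```python
import math


def ses_one_step_rmse(xs, alpha):
    """RMSE of one-step-ahead simple exponential smoothing forecasts.

    The forecast for x_i is the smoothed value x'_{i-1}, computed before x_i is seen.
    """
    level = xs[0]
    squared_errors = []
    for x in xs[1:]:
        squared_errors.append((x - level) ** 2)
        level = alpha * x + (1 - alpha) * level
    return math.sqrt(sum(squared_errors) / len(squared_errors))


covers = [40, 44, 39, 47, 52, 48, 55]  # hypothetical daily covers
# Coarse grid over alpha; Nelder-Mead would search this objective instead.
best = min((a / 10 for a in range(1, 11)), key=lambda a: ses_one_step_rmse(covers, a))
print(best, ses_one_step_rmse(covers, best))
```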

Page 47: Data science for the hospitality domain - OpenTable

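Nelder-Mead: Reflection

Objective function (e.g., RMSE): $f(\mathbf{x})$, $\mathbf{x} \in \mathbb{R}^n$

Initial test points: $\mathbf{x}_1, \dots, \mathbf{x}_{n+1} \in \mathbb{R}^n$

Sort: $f(\mathbf{x}_1) < \dots < f(\mathbf{x}_{n+1})$

Centroid: $\mathbf{x}_0 \leftarrow \frac{\mathbf{x}_1 + \dots + \mathbf{x}_n}{n}$

Reflected point: $\mathbf{x}_r \leftarrow \mathbf{x}_0 + \alpha \cdot (\mathbf{x}_0 - \mathbf{x}_{n+1}), \quad \alpha > 0$

if $f(\mathbf{x}_1) \le f(\mathbf{x}_r) < f(\mathbf{x}_{n+1})$ then $\mathbf{x}_{n+1} \leftarrow \mathbf{x}_r$

A common choice is $\alpha = 1$.

[Figure: simplex with vertices $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_{n+1}$, centroid $\mathbf{x}_0$ and the reflected point $\mathbf{x}_r$ replacing $\mathbf{x}_{n+1}$.]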

Page 48: Data science for the hospitality domain - OpenTable

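Nelder-Mead: Expansion

The reflected point is the best so far: $f(\mathbf{x}_r) < f(\mathbf{x}_1) < \dots < f(\mathbf{x}_{n+1})$

Expanded point: $\mathbf{x}_e \leftarrow \mathbf{x}_r + \gamma \cdot (\mathbf{x}_r - \mathbf{x}_0), \quad \gamma > 0$

if $f(\mathbf{x}_e) < f(\mathbf{x}_r)$ then $\mathbf{x}_{n+1} \leftarrow \mathbf{x}_e$ else $\mathbf{x}_{n+1} \leftarrow \mathbf{x}_r$

A common choice is $\gamma = 1$, i.e., $\mathbf{x}_e = \mathbf{x}_0 + 2(\mathbf{x}_r - \mathbf{x}_0)$.

[Figure: simplex with vertices $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_{n+1}$, centroid $\mathbf{x}_0$, reflected point $\mathbf{x}_r$ and expanded point $\mathbf{x}_e$.]

Overloaded notation: $\alpha, \gamma$ for Nelder-Mead are different from those in Holt-Winters!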

Page 49: Data science for the hospitality domain - OpenTable

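Nelder-Mead: Contraction

The reflected point is not good: $f(\mathbf{x}_{n+1}) < f(\mathbf{x}_r)$

Contracted point: $\mathbf{x}_c \leftarrow \mathbf{x}_0 + \rho \cdot (\mathbf{x}_{n+1} - \mathbf{x}_0), \quad 0 < \rho \le 0.5$

if $f(\mathbf{x}_c) < f(\mathbf{x}_{n+1})$ then $\mathbf{x}_{n+1} \leftarrow \mathbf{x}_c$

A common choice is $\rho = 0.5$.

[Figure: simplex with vertices $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_{n+1}$, centroid $\mathbf{x}_0$, reflected point $\mathbf{x}_r$ and contracted point $\mathbf{x}_c$.]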

Page 50: Data science for the hospitality domain - OpenTable

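Nelder-Mead: Shrink

Neither the reflected nor the contracted point is good:
$$f(\mathbf{x}_1) < \dots < f(\mathbf{x}_{n+1}) < f(\mathbf{x}_r),\ f(\mathbf{x}_c)$$

Keep the best point ($\mathbf{x}_1$) and move all the other points towards it:
$$\mathbf{x}_i \leftarrow \mathbf{x}_i + \sigma \cdot (\mathbf{x}_1 - \mathbf{x}_i), \qquad i \in \{2, 3, \dots, n+1\}, \quad 0 < \sigma < 1$$

A common choice is $\sigma = 0.5$.

[Figure: simplex before and after shrinking towards $\mathbf{x}_1$.]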
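The four steps combine into the usual Nelder-Mead loop. This self-contained sketch follows the common textbook variant, which differs slightly from the slides: reflection is accepted when it beats the second-worst vertex, and expansion is written $\mathbf{x}_e = \mathbf{x}_0 + \gamma(\mathbf{x}_r - \mathbf{x}_0)$ with $\gamma = 2$ (the slide's form with $\gamma = 1$ gives the same point). It is exercised on a toy quadratic; it does not enforce bounds, so tuning Holt Winters parameters in $[0, 1]$ would need clipping or a reparametrization (an assumption, not covered on the slides).

```python
def nelder_mead(f, x_start, step=0.1, tol=1e-8, max_iter=500,
                alpha=1.0, gamma=2.0, rho=0.5, sigma=0.5):
    """Minimise f(list[float]) -> float with the Nelder-Mead simplex method."""
    n = len(x_start)
    # Initial simplex: the start point plus one point offset along each axis.
    simplex = [list(x_start)]
    for i in range(n):
        point = list(x_start)
        point[i] += step
        simplex.append(point)
    values = [f(p) for p in simplex]

    for _ in range(max_iter):
        # Sort so that f(x_1) <= ... <= f(x_{n+1}).
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        # Centroid x_0 of every vertex except the worst.
        x0 = [sum(p[d] for p in simplex[:-1]) / n for d in range(n)]
        worst = simplex[-1]

        # Reflection.
        xr = [c + alpha * (c - w) for c, w in zip(x0, worst)]
        fr = f(xr)
        if values[0] <= fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
            continue
        # Expansion: the reflected point is the best so far, try going further.
        if fr < values[0]:
            xe = [c + gamma * (r - c) for c, r in zip(x0, xr)]
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
            continue
        # Contraction towards the worst vertex.
        xc = [c + rho * (w - c) for c, w in zip(x0, worst)]
        fc = f(xc)
        if fc < values[-1]:
            simplex[-1], values[-1] = xc, fc
            continue
        # Shrink: pull every other vertex towards the best (new distance = sigma * old).
        best = simplex[0]
        simplex = [best] + [[b + sigma * (v - b) for b, v in zip(best, p)]
                            for p in simplex[1:]]
        values = [values[0]] + [f(p) for p in simplex[1:]]

    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]


# Toy objective with its minimum at (1, -2); in the deck's setting f would be the
# RMSE of the forecasts as a function of the smoothing parameters.
x_best, f_best = nelder_mead(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [0.0, 0.0])
print(x_best, f_best)
```

In practice, SciPy's `scipy.optimize.minimize(f, x0, method="Nelder-Mead")` provides a maintained implementation.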

Page 51: Data science for the hospitality domain - OpenTable

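Baseline: Linear regression

[Figure: covers vs. time; training period followed by predictions.]

Model, with features $X$ and weights $\beta$:
$$\mathbf{y} = X\beta$$

Parameter estimation:
$$\hat{\beta} = (X^\top X)^{-1} X^\top \mathbf{y}$$

Prediction:
$$\hat{y} = \mathbf{x}^\top \hat{\beta}$$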
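A stdlib-only sketch of the normal-equation fit above. It solves $(X^\top X)\hat{\beta} = X^\top \mathbf{y}$ by Gauss-Jordan elimination rather than forming the inverse explicitly, which is numerically gentler. The features (intercept, a weekend flag, covers on the same weekday last week) and the data are hypothetical, chosen so the exact weights are known.

```python
def ols_fit(X, y):
    """Least-squares weights beta_hat solving (X'X) beta = X'y.

    X: list of feature rows (include a leading 1.0 for the intercept).
    """
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * target for row, target in zip(X, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: features are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


def predict(x, beta):
    """Prediction x' beta for a single feature row."""
    return sum(xi * bi for xi, bi in zip(x, beta))


# Hypothetical rows: [intercept, weekend flag, covers same weekday last week].
X = [[1, 0, 40], [1, 1, 60], [1, 0, 50], [1, 1, 80], [1, 0, 30]]
y = [30, 60, 35, 70, 25]  # generated as 10 + 20 * weekend + 0.5 * last_week
beta = ols_fit(X, y)
print(beta)  # approximately [10.0, 20.0, 0.5]
print(predict([1, 1, 70], beta))  # approximately 65.0
```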

Page 52: Data science for the hospitality domain - OpenTable

Improvement

• Baseline: 2-week average.
• Linear Regression.
• Linear Regression + Support Vector Machines / Gradient Boosting Trees.

Of the restaurants on the latest platform, we now have:
• 22% of restaurants with less than 5% error;
• 32% of restaurants with less than 10% error.
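Percentages like these could be computed by scoring each restaurant separately and counting those under a threshold. A sketch, assuming the per-restaurant error is the mean absolute percentage error (MAPE) over the evaluation days; the slide does not name the metric, and the data below are made up.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual covers must be > 0)."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def share_below(errors, threshold):
    """Percentage of restaurants whose error is below `threshold` percent."""
    return 100.0 * sum(e < threshold for e in errors) / len(errors)


# Hypothetical per-restaurant (actual, predicted) daily covers.
restaurants = {
    "r1": ([100, 120], [98, 123]),
    "r2": ([80, 90], [74, 84]),
    "r3": ([50, 60], [49, 61]),
    "r4": ([200, 210], [150, 260]),
}
errors = [mape(a, p) for a, p in restaurants.values()]
print(share_below(errors, 5), share_below(errors, 10))  # 50.0 75.0
```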

Page 53: Data science for the hospitality domain - OpenTable

Number of Points Predicted Well

[Figure: three panels of covers vs. time (past and future), one per group of methods: (1) Average, Moving Average; (2) Exponential Moving Average, Holt Winters 2D; (3) Holt Winters 3D, Linear Regression, SVM / GBT.]

Page 54: Data science for the hospitality domain - OpenTable

Next: Similar Restaurants


• Cluster restaurants based on price, capacity, metro and the average number of covers on a specific day.

• Add the number of reservations of similar restaurants as a feature in the previously discussed models.
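A minimal sketch of both steps, assuming numeric features (price, capacity, average covers on a given day) are standardised and clustered with plain k-means within one metro; the slide does not name the clustering algorithm, and the data and helper names are hypothetical. The last function builds the proposed feature: the mean reservations of the other restaurants in the same cluster.

```python
import math
from collections import defaultdict


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def kmeans(points, k, iters=50):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def similar_restaurant_feature(labels, reservations):
    """Mean reservations of the other restaurants in each restaurant's cluster."""
    by_cluster = defaultdict(list)
    for i, lab in enumerate(labels):
        by_cluster[lab].append(i)
    feature = []
    for i, lab in enumerate(labels):
        others = [reservations[j] for j in by_cluster[lab] if j != i]
        feature.append(sum(others) / len(others) if others else 0.0)
    return feature


# Hypothetical restaurants in one metro: [price ($), capacity (seats), avg covers].
features = [[30, 60, 120], [90, 200, 400], [35, 70, 130], [95, 180, 380]]
reservations = [50, 140, 60, 150]
labels = kmeans(standardize(features), k=2)
print(labels, similar_restaurant_feature(labels, reservations))
# [0, 1, 0, 1] [60.0, 150.0, 50.0, 140.0]
```

The resulting column could then be appended to the feature rows of the regression and tree models above.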

Page 55: Data science for the hospitality domain - OpenTable

Summary

• OpenTable reservations guarantee a spot at the restaurant.
• Data Science:
  - Search / Recommendations / Advertising.
  - Inventory optimization: time series for cover prediction; optimization.
• Simple, yet effective techniques you can apply.

Page 56: Data science for the hospitality domain - OpenTable

Acknowledgements

Bhanu Agarwal, Chris Gould, Corey Reese, Cormac Twomey, David Amusin, Eli Chait, Igor Gammer, Joseph Essas, Josh Polsky, Katrin Tomanek, Mats Einarsen, Michael Huang, Olivier Larivain, Pablo Delgado, Pavel Syrtsov, Sergei Radutnuy, Sravani Kamisetty, Steve Annessa, Utkarsh Sengar, William Wu

Page 57: Data science for the hospitality domain - OpenTable
Page 58: Data science for the hospitality domain - OpenTable
