using solr in online travel to improve  user experience - by karegowdra sudhakar and donato...

Post on 03-Jul-2015

Using Solr in Online Travel to Improve User Experience

Sudhakar Karegowdra, Esteban Donato

Travelocity, May 25th, 2011

{sudhakar.karegowdra, esteban.donato}@travelocity.com

What We Will Cover

Travelocity

Speakers Background

Merchandising & Solr
• Challenges
• Solution
• Sizing and performance data
• Take Away

Location Resolution & Solr
• Challenges
• Solution
• Sizing and performance data
• Take Away

Q&A

First online travel agency (OTA), launched in 1996

Grown to 3,000 employees and is one of the largest travel agencies worldwide

Headquartered in Dallas/Fort Worth, with satellite offices in San Francisco, New York, London, Singapore, Bangalore, and Buenos Aires, to name a few

In 2004, the Roaming Gnome became the centerpiece of marketing efforts and has since become an international pop icon

Owned by Sabre Holdings; sister companies include Travelocity Business, IgoUgo.com, lastminute.com, and Zuji, among others

Speakers Background

Esteban Donato
• Lead Architect, Travelocity.com

My experience
– 10+ years
– Solr: 2 years
– Analyzing Mahout and Carrot2 for a document clustering engine

Topic: Location Resolution

Sudhakar Karegowdra
• Principal Architect, Travelocity.com

My experience
– 13+ years
– Solr/Lucene: 3 years
– Implementing Hadoop, Pig, and Hive for the data warehouse

Topic: Merchandising

Merchandising By Sudhakar Karegowdra

The Challenge

Market Drivers

• Build landing pages with faceted navigation
• Enable content segmentation and delivery
• Support rollout of promotions
• Roll up data to a higher level
  E.g., "all 5-star hotels in California" should bring back the 5-star hotels from SFO, LAX, SAN, etc.
• Faster time to market for new ideas
• Rapidly scale to accommodate global brands with disparate data sources

The Challenge

Traditional Database approach

• Higher time to market
• Specialized skill set needed to design and optimize database structures and queries
• Aggregating data and changing structures is quite complex
• Building faceted navigation capabilities needs complex logic, leading to high maintenance cost

Solution - Overview

Data from various sources is aggregated and ingested into Solr
• One core per locale and product type

A wrapper service combines some data across product cores and manages configuration rules

Solr's built-in search and faceting power the navigation
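As a rough illustration of how Solr's built-in faceting could drive such navigation, the sketch below builds the URL for a faceted request against one core. The host, the core name (`hotels_en_US`), and the field names (`state`, `star_rating`, `city`, `amenities`) are hypothetical and do not come from the talk; `q`, `fq`, `rows`, `wt`, `facet`, `facet.field`, and `facet.mincount` are standard Solr request parameters.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical base URL; the slide only says "core per locale and product type".
SOLR_BASE = "http://localhost:8983/solr"


def build_facet_url(core, filters, facet_fields, rows=10):
    """Build a Solr select URL that filters results and requests facet counts."""
    params = [
        ("q", "*:*"),             # match everything; the filters narrow the set
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.mincount", "1"),  # hide facet values with zero hits
    ]
    # One fq per filter, in a stable (sorted) order.
    for field, value in sorted(filters.items()):
        params.append(("fq", f'{field}:"{value}"'))
    for field in facet_fields:
        params.append(("facet.field", field))
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"


url = build_facet_url(
    core="hotels_en_US",
    filters={"state": "CA", "star_rating": "5"},
    facet_fields=["city", "amenities"],
)
print(url)
# Decode the query string again to inspect the filters that were sent.
print(parse_qs(urlsplit(url).query)["fq"])
```

Each entry in the response's facet counts can then be rendered as a navigation link that adds one more `fq` to the next request.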

Solution – Architecture View

[Diagram. Components shown: Oracle, Offer Management Tool, ETL, Solr Master (Multi Core), Solr Slaves (Multi Core) with cores such as Deals, Products, ……, Services/Business Logic, and UI, Widgets, Mobile clients]

Solution - Achievements

Millions of unique long-tail landing pages
• E.g., http://www.travelocity.com/hotel-d4980-nevada-las-vegas-hotels_5-star_business-center_green

Faster search across products
• E.g., Beach Deals under $500

Segmented content delivery through tagging

Scaled well to distribute the content to different brands, partners, and advertisers

Opened up other innovative applications
• E.g., Deals on Map, Deals on Mobile, Wizards, etc.

Solution – Road Ahead

Migration to Solr 3.1

• Geospatial search
• CSV output format

Query boosting by search pattern

Near-real-time updates

Deal and user behavior mining in Hadoop (MapReduce), with Solr serving the content

Move slaves to the cloud

Sizing & Performance

Index stats
• Number of cores: 25
• Number of documents: ~1 million records
• Requests: 70 TPS
• Average response time: 0.005 seconds (5 ms)

Software versions
• Solr 1.4.0
  – filterCache size: 30000
• Tomcat 5.5.9
• JDK 1.6

Take Away

Semi-structured storage in Solr helps aggregate disparate sources easily
• Remember dynamic fields

Multiple cores to manage data for multiple locales

Solr is a great enabler of “Innovations”
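To make the dynamic-fields point concrete, here is a minimal sketch of how records from differently shaped feeds might be flattened into documents for the same core. It assumes the schema declares suffix-based dynamicField patterns such as `*_s`, `*_i`, `*_f`, and `*_b` (as Solr's example schema does); the feed names, record keys, and the `to_solr_doc` helper are hypothetical and not taken from the talk.

```python
def to_solr_doc(doc_id, source, record):
    """Flatten a source record into a Solr document using dynamic-field suffixes.

    Assumes dynamicField patterns *_s, *_i, *_f and *_b exist in the schema.
    """
    doc = {"id": doc_id, "source_s": source}
    for key, value in record.items():
        if isinstance(value, bool):    # test bool first: bool is a subclass of int
            suffix = "_b"
        elif isinstance(value, int):
            suffix = "_i"
        elif isinstance(value, float):
            suffix = "_f"
        else:
            suffix = "_s"
            value = str(value)
        doc[f"{key.lower()}{suffix}"] = value
    return doc


# Two hypothetical feeds with different shapes can share one core.
hotel = to_solr_doc("h1", "hotel_feed", {"Stars": 5, "City": "Las Vegas", "Green": True})
deal = to_solr_doc("d7", "deals_feed", {"Price": 499.0, "Theme": "Beach"})
print(hotel)
print(deal)
```

With this approach a new attribute in a feed needs no schema change, only a field name that matches an existing pattern.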

Location Resolution By Esteban Donato

The Challenge

How to develop a global location resolution service?

Flexible to change

General enough to cover everyone's needs

Multi-language

Performance and scalability

Configurable by site

Architecture of the solution

[Diagram. Components shown: Location DB, Management Tool, Batch Job, Solr Master, Solr Slave, Auto-complete, Resolution]

Remote streaming indexing, CSV format

Master/slave architecture

Multi-core: each core represents a language

SolrJ client, binary format

Solr response cache

Auto-complete

The system has to suggest options as users type their desired location
• Examples: “san” => San Francisco, “veg” => Las Vegas

Relevancy: not all locations are equally important
• “par” => “Paris, France”; “Parana, Argentina”

Users can search by various fields: location code, location name, city code, city name, state/province code, state/province name, country code, country name
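The behavior described above can be sketched in memory as a prefix match over several fields, ordered by a relevancy rank. This is only an illustration of the idea, not the Solr implementation from the talk; the `Location` type, the sample data, and the rank values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    name: str
    city_code: str
    country: str
    rank: int  # hypothetical relevancy score; higher means more important


LOCATIONS = [
    Location("Paris", "PAR", "France", 95),
    Location("Parana", "PRA", "Argentina", 40),
    Location("San Francisco", "SFO", "United States", 90),
    Location("Las Vegas", "LAS", "United States", 88),
]


def suggest(prefix, locations=LOCATIONS, limit=5):
    """Return display names with a token starting with the prefix, best rank first."""
    needle = prefix.strip().lower()
    if not needle:
        return []
    hits = []
    for loc in locations:
        # Search across several fields at once, tokenized on whitespace.
        tokens = f"{loc.name} {loc.city_code} {loc.country}".lower().split()
        if any(token.startswith(needle) for token in tokens):
            hits.append(loc)
    # Higher rank first; name breaks ties so the order is deterministic.
    hits.sort(key=lambda loc: (-loc.rank, loc.name))
    return [f"{loc.name}, {loc.country}" for loc in hits[:limit]]


print(suggest("par"))
print(suggest("veg"))
```

Matching on tokens rather than on the whole string is what lets “veg” find Las Vegas.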

Solr schema

<dynamicField name="RANK*" type="int" required="false" indexed="true" stored="true" />

<field name="GLS_FULL_SEARCH" type="glsSearchField" required="false" indexed="true" stored="false" multiValued="true" />

<fieldType name="glsSearchField" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[/\-\t ]+" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ISOLatin1AccentFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all"/>
  </analyzer>
</fieldType>
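To see what the glsSearchField analyzer chain does to a value, the following sketch approximates it in plain Python. It reuses the two regular expressions from the schema; the accent folding is an approximation based on Unicode decomposition rather than the exact ISOLatin1AccentFilter mapping, and dropping empty tokens is a simplification. The duplicate-removal filter is left out because it only affects tokens that share a position.

```python
import re
import unicodedata

SPLIT_PATTERN = re.compile(r"[/\-\t ]+")  # same pattern as the tokenizer in the schema
PUNCT_PATTERN = re.compile(r"[,.]")       # same pattern as the PatternReplaceFilterFactory


def fold_accents(text):
    """Approximate accent folding by dropping combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(value):
    """Approximate the tokens the glsSearchField analyzer would emit for one value."""
    tokens = []
    for raw in SPLIT_PATTERN.split(value):
        token = fold_accents(raw.lower().strip())
        token = PUNCT_PATTERN.sub("", token)
        if token:  # simplification: skip tokens that end up empty
            tokens.append(token)
    return tokens


print(analyze("São Paulo, Brazil"))
print(analyze("Dallas/Fort-Worth\tTX"))
```

Because the same chain runs on the user's input at query time, a query typed without accents or punctuation can still match the indexed tokens.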

Resolution

The system has to resolve the location requested by the user.

Handles aliases: Big Apple => New York

Handles ambiguities

Handles misspellings: Lomdon => London
• NGramDistance algorithm
• How to combine distance with relevancy
• Errors when suggesting the correct location for a prefix, e.g. Lond => London
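One way to think about combining distance with relevancy is a weighted blend of a string-similarity score and a location rank. The sketch below uses a Dice coefficient over character bigrams as a simple stand-in; it is not Lucene's NGramDistance, and the candidate list, rank scale, and `alpha` weight are hypothetical values chosen for illustration.

```python
def bigrams(text):
    """Return the set of character bigrams of a lowercased string."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def similarity(a, b):
    """Dice coefficient over character bigrams, in the range [0, 1]."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga or not gb:
        # Strings shorter than two characters have no bigrams.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


# Hypothetical candidates: (name, rank on a 0-100 scale).
CANDIDATES = [("London", 95), ("Lombok", 50), ("Londrina", 30)]


def resolve(query, candidates=CANDIDATES, alpha=0.7):
    """Pick the candidate with the best blend of string similarity and rank."""
    def score(item):
        name, rank = item
        return alpha * similarity(query, name) + (1 - alpha) * rank / 100
    return max(candidates, key=score)[0]


print(resolve("Lomdon"))
print(resolve("Lond"))
```

Raising `alpha` favors the closest spelling, while lowering it favors well-known locations, which is one possible lever for the prefix case mentioned above.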

Spellchecker configuration

<fieldType name="spellcheckType" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ISOLatin1AccentFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all"/>
  </analyzer>
</fieldType>

Sizing & Performance

4 cores with ~500,000 documents indexed each

Response times
• Auto-complete: 15 ms, 20 TPS
• Resolution: 10 ms, 2 TPS

Cache configuration
• queryResultCache: maxSize=1024
• documentCache: maxSize=1024
• fieldValueCache & filterCache disabled

Wrap Up

Performance is always a top priority

Develop simple but robust services

Provide a simple API

Q&A


Contact

Esteban Donato

• Esteban.donato@travelocity.com

• Twitter: @eddonato

Sudhakar Karegowdra

• Sudhakar.karegowdra@travelocity.com

• Twitter: @skaregowdra

https://www.facebook.com/travelocity

Twitter: @travelocity and @RoamingGnome
