Implementing Search with Solr at 7digital

Posted on 31-May-2015

DESCRIPTION

Presented by James Atherton, Search Team Lead, 7digital. A case study describing our journey as we implemented Lucene/Solr and the lessons we learned along the way. It covers how we implemented our instant search/search suggest, how we handle indexing 400 million tracks and metadata for over 40 countries (over 300GB of data and about 70GB of indexes), and where we hope to go in the future.

TRANSCRIPT

Implementing Search with Solr at 7digital

James Atherton Content Discovery Team Lead

@mr_road

Who is 7digital?

Online digital content provider

Covering over 47 territories

Online music store: www.7digital.com

API: api.7digital.com

We power a number of music services:

Samsung

Blackberry

Turntable.fm

Pure

Where we came from...

SQL Searches

SELECT *
FROM <table>
WHERE name LIKE '<search_term>%';

This was SLOW and BAD!!

Wrapped Solr in an API
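
For comparison with the SQL LIKE search above, here is a minimal sketch of how a wrapping API might build the equivalent prefix query against Solr's select handler. The base URL, the core name "artists" and the field name "name" are assumptions for illustration, not 7digital's actual schema.

```python
from urllib.parse import urlencode

# Characters the Lucene query parser treats as syntax (plus whitespace).
LUCENE_SPECIAL = '+-&|!(){}[]^"~*?:\\/ '


def build_prefix_query_url(base_url, core, field, term, rows=10):
    """Build a Solr select URL that prefix-matches `term` on one field.

    `core` and `field` are hypothetical names chosen by the caller.
    """
    # Escape user input so it cannot change the structure of the query.
    escaped = "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)
    params = {"q": f"{field}:{escaped}*", "rows": rows, "wt": "json"}
    return f"{base_url}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_prefix_query_url("http://localhost:8983/solr", "artists", "name", "daft"))
```

Keeping the query construction behind the API means callers never send raw Solr syntax, so the index layout can change without breaking them.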

Old Architecture

[Diagram components: API, DB]

Domain Objects

Artist Documents

Release Documents (e.g. album or single)

Track Documents

First Attempt - 2011

• Artists and Releases

• Solr 1.4

• 17 stores

• ~40GB

• Dropped DIH as it had issues

2011 Architecture

[Diagram components: HTTP, API, Search API, Solr, DB; labels: Tracks, Artists, Releases]

2012

• Added Tracks Core

• Solr 3.5

• 47 stores

• ~400GB

• More than 430 M docs

• Didn't revisit DIH

Current Architecture

[Diagram components: HTTP, API, Search API, Artist/Release Solrs (x2), Track Solrs (x4)]

Things Learnt

We should have split by <X>; for us, X is shops.
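
A minimal sketch of what splitting by shop could look like at indexing time: each document is routed to a core named after its type and shop, so one shop's index can be sized and rebuilt without touching the others. The naming scheme and the "type" and "shop_id" keys are hypothetical, not the layout described in the talk.

```python
VALID_TYPES = {"artist", "release", "track"}


def core_for_shop(doc_type, shop_id):
    """Return a hypothetical per-shop core name, e.g. 'track_shop_34'."""
    if doc_type not in VALID_TYPES:
        raise ValueError(f"unknown document type: {doc_type!r}")
    return f"{doc_type}_shop_{shop_id}"


def group_by_core(docs):
    """Group documents by the core they should be indexed into."""
    grouped = {}
    for doc in docs:
        core = core_for_shop(doc["type"], doc["shop_id"])
        grouped.setdefault(core, []).append(doc)
    return grouped


if __name__ == "__main__":
    sample = [
        {"type": "track", "shop_id": 34, "title": "One"},
        {"type": "track", "shop_id": 12, "title": "Two"},
        {"type": "artist", "shop_id": 34, "name": "Someone"},
    ]
    for core, batch in group_by_core(sample).items():
        print(core, len(batch))
```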

Beware Inflection Points

Data size: 400GB != 40GB * 10

Throughput: 600 rpm IS NOT 4 * 150 rpm

What do we want in our servers?

RAM?

Fast Disks?

CPUs?

Virtual?

Bare Metal?

Optimize really...?

Cache Warming/First search?

Testing

Test ingestion/data import, then test again

Your data is not as clean as you think

Load test early and often

We need to be better at this still

Logs

Logging is worth its weight in gold

But don't get weighed down

Monitoring

We use StatsD/Graphite and New Relic:

Visualise Indexing

Which territory's data has been indexed?
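
A minimal sketch of how an indexer might report per-territory progress to StatsD so that it can be graphed in Graphite. It uses the plain StatsD line format (name:value|type) over UDP on the default port 8125; the metric names and the host are assumptions, not the ones used at 7digital.

```python
import socket

# StatsD metric types used here: counter, timer (milliseconds), gauge.
METRIC_TYPES = ("c", "ms", "g")


def format_metric(name, value, metric_type):
    """Format one metric in the StatsD line protocol."""
    if metric_type not in METRIC_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type!r}")
    return f"{name}:{value}|{metric_type}"


def send_metric(payload, host="127.0.0.1", port=8125):
    """Send one metric line over UDP; fire-and-forget, no reply expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric names: documents indexed and batch time for one territory.
    send_metric(format_metric("search.indexing.GB.docs", 500, "c"))
    send_metric(format_metric("search.indexing.GB.batch_time", 1200, "ms"))
```

Because UDP is fire-and-forget, a slow or missing StatsD server does not hold up indexing.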

Instant Search

Magic Deploys

We recently adopted CFEngine; it is awesome!!

The Future

[Diagram components: HTTP, API, Search API, Artist Solrs, Release Solrs, Track Solrs (x4)]

Solr Cloud, in the Cloud??

Questions

?

James Atherton

@mr_road

@7digital

We are recruiting, please talk to me afterwards.

Resources

https://github.com/etsy/statsd/

https://github.com/7digital

http://d3js.org/
