Implementing Search with Solr at 7digital

James Atherton, Content Discovery Team Lead

Upload: lucenerevolution

Posted on 31-May-2015

DESCRIPTION

Presented by James Atherton, Search Team Lead, 7digital. A case study describing our journey as we implemented Lucene/Solr and the lessons we learned along the way: how we implemented our instant search/search suggest, and how we handle indexing 400 million tracks and their metadata for over 40 countries, comprising over 300GB of data and about 70GB of indexes. Finally, where we hope to go in the future.

TRANSCRIPT

Page 1: Implementing search with solr at 7digital

Implementing Search with Solr at 7digital

James Atherton Content Discovery Team Lead

Implementing Search with Solr

James Atherton

Content Discovery Lead

@mr_road

Who is 7digital?

Online digital content provider

Covering over 47 territories

Online music store: www.7digital.com

API: api.7digital.com

We power a number of music services:

Samsung

BlackBerry

Turntable.fm

Pure

Where we came from...

SQL Searches

SELECT *
FROM <table>
WHERE name LIKE '<search_term>%';

This was SLOW and BAD!!

Wrapped Solr in an API
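A minimal sketch of what wrapping Solr in an API can look like: the public API translates a search request into a Solr `/select` query, so clients never talk to Solr directly. The host, port and core name (`artists`) are assumptions for illustration, not 7digital's setup, and the user's text is passed through unescaped, so a real service would also need to handle Lucene query syntax.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; the talk does not give hosts or core names.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(core: str, text: str, rows: int = 10) -> str:
    """Translate a public search request into a Solr /select URL.

    Keeping this translation inside our own API means clients never see
    Solr, so the schema or query syntax can change behind it.
    """
    params = {
        "q": text,     # the user's search text (not escaped here)
        "rows": rows,  # page size
        "wt": "json",  # ask Solr for a JSON response
    }
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"


print(build_select_url("artists", "the beatles", rows=5))
```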

Old Architecture

[Diagram: API → DB]

Domain Objects

Artist Documents

Release Documents (e.g. album or single)

Track Documents
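The three document types can be sketched as simple records that flatten into Solr documents. Every field name below is hypothetical; the slide only names the document types, not their schema.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical fields: the talk names the document types, not their schema.


@dataclass
class ArtistDoc:
    id: str
    name: str


@dataclass
class ReleaseDoc:
    id: str
    title: str
    artist_id: str
    release_type: str  # e.g. "album" or "single"


@dataclass
class TrackDoc:
    id: str
    title: str
    release_id: str
    artist_id: str
    duration_secs: Optional[int] = None


def to_solr(doc) -> dict:
    """Flatten a document object into a field-name -> value dict,
    dropping fields that are unset (None)."""
    return {k: v for k, v in asdict(doc).items() if v is not None}
```

For example, `to_solr(ReleaseDoc("r1", "Help!", "a1", "album"))` yields a dict ready to serialise for indexing.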

First Attempt - 2011

• Artists and Releases

• Solr 1.4

• 17 stores

• ~40GB

• Dropped the DataImportHandler (DIH), as it had issues
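The slides don't say what replaced DIH. One common alternative, assumed here rather than taken from the talk, is an external indexer that posts documents to Solr's `/update` handler in fixed-size batches using the XML update format. The sketch only builds the request bodies; the batch size is an arbitrary choice, and a commit would still need to be sent after the last batch.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Iterator, List

Doc = Dict[str, str]


def batches(docs: Iterable[Doc], size: int) -> Iterator[List[Doc]]:
    """Yield successive lists of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[Doc] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, partly filled batch
        yield batch


def add_xml(batch: List[Doc]) -> str:
    """Render one batch in Solr's XML update format:
    <add><doc><field name="...">value</field>...</doc>...</add>"""
    add = ET.Element("add")
    for doc in batch:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and >
    return ET.tostring(add, encoding="unicode")


docs = [{"id": str(i), "title": f"Track {i}"} for i in range(5)]
bodies = [add_xml(b) for b in batches(docs, size=2)]
print(len(bodies))  # 3 request bodies: 2 + 2 + 1 documents
```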

2011 Architecture

[Diagram; components shown: HTTP, API, Search API, DB, Solr, Tracks, Artists, Releases]

2012

• Added Tracks Core

• Solr 3.5

• 47 stores

• ~400GB

• More than 430M documents

• Didn't revisit DIH

Current Architecture

[Diagram; components shown: HTTP, API, Search API, Artist/Release Solrs (two groups), Track Solrs (four groups)]

Things Learnt

We should have split our indexes by <X>; for us, <X> was the shop.
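If the indexes are split by shop, the search API needs a deterministic way to find the right index for each request. A minimal sketch, assuming one Solr core per shop named `tracks_<shop>` spread over a fixed list of hosts; both the naming and the host list are hypothetical. `zlib.crc32` is used instead of Python's built-in `hash()`, which is randomised per process for strings.

```python
import zlib

# Hypothetical layout: one core per shop, named tracks_<shop>, spread over
# a fixed list of Solr hosts. The talk only says to split by shop.
SOLR_HOSTS = [
    "http://solr1:8983/solr",
    "http://solr2:8983/solr",
]


def core_url(shop: str) -> str:
    """Return the URL of the core holding one shop's documents."""
    shop = shop.lower()
    # crc32 is stable across processes, unlike the built-in hash() for str.
    host = SOLR_HOSTS[zlib.crc32(shop.encode("utf-8")) % len(SOLR_HOSTS)]
    return f"{host}/tracks_{shop}"
```

A fixed modulo mapping moves shops between hosts whenever the host list changes; an explicit shop-to-host table avoids that at the cost of maintaining it.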

Beware Inflection Points

Data size: 400GB != 10 * 40GB

Throughput: 600 rpm (requests per minute) != 4 * 150 rpm
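One reason data size can hit an inflection point is memory: once the index no longer fits in the operating system's page cache, a growing share of reads go to disk. The toy model below, with assumed latencies and an assumed 64GB of RAM, shows how a 10× larger index can cost far more than 10× per read. It illustrates the shape of the curve only; it is not a measurement from the talk, and it does not model throughput.

```python
def mean_read_ms(index_gb: float, ram_gb: float,
                 ram_ms: float = 0.001, disk_ms: float = 5.0) -> float:
    """Toy model: average read latency when reads are spread uniformly over
    the index and only ram_gb of it fits in the page cache.

    The default latencies are illustrative assumptions, not measurements.
    """
    cached = min(1.0, ram_gb / index_gb)  # fraction of reads served from RAM
    return cached * ram_ms + (1.0 - cached) * disk_ms


small = mean_read_ms(40, ram_gb=64)   # fits in RAM
large = mean_read_ms(400, ram_gb=64)  # mostly on disk
print(f"{small:.3f} ms vs {large:.3f} ms, ratio {large / small:.0f}x")
```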

What do we want in our servers?

RAM?

Fast Disks?

CPUs?

Virtual?

Bare Metal?

Optimize, really...?
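In Solr, an optimize merges the index's segments (down to one by default), which rewrites the index and is expensive on an index of hundreds of gigabytes. If you do decide to run it, it can be triggered with request parameters on the `/update` handler, and `maxSegments` asks for a cheaper, partial merge. This sketch only builds the URL; the core URL is hypothetical.

```python
from urllib.parse import urlencode


def optimize_url(core_url: str, max_segments: int = 1) -> str:
    """Build an /update request that asks Solr to optimize (merge segments).

    max_segments > 1 requests a cheaper, partial merge.
    """
    params = {"optimize": "true", "maxSegments": max_segments}
    return f"{core_url}/update?{urlencode(params)}"


# Hypothetical core URL.
print(optimize_url("http://localhost:8983/solr/tracks_gb", max_segments=4))
```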

Cache Warming/First search?
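Solr can replay warm-up queries when a searcher opens, via `QuerySenderListener` entries for the `firstSearcher` and `newSearcher` events in `solrconfig.xml`. Choosing which queries to replay is up to you; one approach, sketched below as an assumption, is to take the most frequent queries from a recent query log. The normalisation (trim and lowercase) is a hypothetical choice.

```python
from collections import Counter
from typing import Iterable, List


def warmup_queries(log_queries: Iterable[str], n: int = 20) -> List[str]:
    """Return the n most frequent queries in a log, normalised by trimming
    whitespace and lowercasing, for replay against a freshly started Solr."""
    counts = Counter(q.strip().lower() for q in log_queries if q.strip())
    return [query for query, _ in counts.most_common(n)]


log = ["Adele", "adele ", "Muse", "Blur", "muse", "ADELE"]
print(warmup_queries(log, n=2))  # ['adele', 'muse']
```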

Testing

Test ingestion/data import, then test again

Your data is not as clean as you think

Load test early and often

We still need to get better at this
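Because your data is rarely as clean as you think, it can pay to check records before they reach Solr and report the rejects instead of failing a whole batch. A minimal sketch; the list of required fields is hypothetical and would depend on your schema.

```python
from typing import Dict, List, Tuple

REQUIRED_FIELDS = ("id", "title")  # hypothetical; depends on your schema


def split_valid(records: List[Dict]) -> Tuple[List[Dict], List[Tuple[Dict, str]]]:
    """Separate indexable records from rejects, keeping a reason per reject."""
    good: List[Dict] = []
    rejects: List[Tuple[Dict, str]] = []
    for rec in records:
        missing = [f for f in REQUIRED_FIELDS
                   if rec.get(f) is None or str(rec.get(f)).strip() == ""]
        if missing:
            rejects.append((rec, "missing " + ", ".join(missing)))
        else:
            good.append(rec)
    return good, rejects


good, rejects = split_valid([
    {"id": "1", "title": "Help!"},
    {"id": "2", "title": "   "},  # blank title
    {"title": "No id"},           # missing id
])
print(len(good), [reason for _, reason in rejects])
# 1 ['missing title', 'missing id']
```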

Logs

Logging is worth its weight in gold

But don't get weighed down

Monitoring

We use StatsD/Graphite and New Relic.
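StatsD accepts plain-text metrics over UDP in the form `<name>:<value>|<type>`, where the type `ms` marks a timing; its default port is 8125. A small sketch of reporting Solr query times this way, with a hypothetical metric name; Graphite would then graph the aggregated timings.

```python
import socket


def statsd_timing(name: str, ms: float) -> bytes:
    """Format a StatsD timing metric: <name>:<value>|ms"""
    return f"{name}:{ms:g}|ms".encode("ascii")


def send(packet: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric over UDP. This is fire-and-forget: nothing waits
    for a reply from the StatsD server."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
```

A caller would time the Solr request with `time.perf_counter()`, pass the elapsed milliseconds to `statsd_timing`, and `send` the result.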

Visualise Indexing

Which territory's data has been indexed?
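One way to answer "which territory's data has been indexed?" is to facet on a territory field with `rows=0`, so Solr returns only per-value document counts; those counts could then feed a D3 chart. The field name `territory` is hypothetical. Solr's JSON response lists facet values and counts as one flat, alternating array, which `parse_facets` turns into a dict.

```python
from typing import Dict
from urllib.parse import urlencode


def facet_query(field: str) -> str:
    """Query string that returns only per-value document counts for `field`."""
    return urlencode({
        "q": "*:*",           # match all documents
        "rows": 0,            # we want counts, not documents
        "facet": "true",
        "facet.field": field,
        "facet.limit": -1,    # return every value, not just the top 100
        "wt": "json",
    })


def parse_facets(response: dict, field: str) -> Dict[str, int]:
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))
```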

Instant Search
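The slides don't show how instant search was built. A common Solr approach, assumed here, is to index edge n-grams (the leading prefixes of each word, as Solr's `EdgeNGramFilterFactory` produces) so that a partially typed word matches in a single lookup. The in-memory version below only illustrates that idea; it is neither Solr's implementation nor 7digital's.

```python
from collections import defaultdict
from typing import Dict, List, Set


def edge_ngrams(word: str, min_gram: int = 1, max_gram: int = 10) -> List[str]:
    """Leading prefixes of a word, e.g. 'muse' -> ['m', 'mu', 'mus', 'muse'].
    Prefixes longer than max_gram are not produced."""
    word = word.lower()
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]


def build_prefix_index(names: List[str]) -> Dict[str, Set[str]]:
    """Map every prefix of every word in a name to the names containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name in names:
        for word in name.split():
            for gram in edge_ngrams(word):
                index[gram].add(name)
    return index


def suggest(index: Dict[str, Set[str]], typed: str, limit: int = 5) -> List[str]:
    """Names with a word that starts with what the user has typed so far."""
    return sorted(index.get(typed.strip().lower(), set()))[:limit]


idx = build_prefix_index(["Radiohead", "Rage Against the Machine", "Adele"])
print(suggest(idx, "ra"))  # ['Radiohead', 'Rage Against the Machine']
```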

Magic Deploys

We recently adopted CFEngine, and it is awesome!!

The Future

[Diagram; components shown: HTTP, API, Search API, Artist Solrs, Release Solrs, Track Solrs (four groups)]

Solr Cloud, in the Cloud??
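With SolrCloud, shards and replicas are managed as a collection through the Collections API instead of as hand-placed cores. A sketch that only builds a `CREATE` request; the collection name and the shard and replica counts are hypothetical, and depending on the Solr version you may also need to name a configset (`collection.configName`).

```python
from urllib.parse import urlencode


def create_collection_url(solr_base: str, name: str,
                          num_shards: int, replication_factor: int) -> str:
    """Build a Collections API CREATE request for a SolrCloud collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,                  # how the index is split
        "replicationFactor": replication_factor,  # copies of each shard
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"


print(create_collection_url("http://localhost:8983/solr", "tracks", 4, 2))
```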

Questions

?

James Atherton

@mr_road

@7digital

We are recruiting; please talk to me afterwards.

Resources

https://github.com/etsy/statsd/

https://github.com/7digital

http://d3js.org/
