Implementing Search with Solr at 7digital

James Atherton, Content Discovery Team Lead

@mr_road

Who is 7digital?

Online digital content provider

Covering over 47 territories

Online music store: www.7digital.com

API: api.7digital.com

We power a number of music services:

Samsung

BlackBerry

Turntable.fm

Where we came from...

SQL Searches

SELECT *
FROM <table>
WHERE name LIKE '<search_term>%';

This was SLOW and BAD!!

Wrapped Solr in an API
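
For comparison with the LIKE query above, here is a minimal sketch of the kind of prefix search an API wrapper might send to Solr's standard /select handler. The host, the core name "artists" and the field name "name" are assumptions for illustration; the slides do not show 7digital's actual schema or API.

```python
from urllib.parse import urlencode

# Characters the Lucene query parser treats as syntax (plus space)
LUCENE_SPECIAL = '+-&|!(){}[]^"~*?:\\/ '

def build_solr_select_url(base_url, core, field, term, rows=10):
    """Build a Solr /select URL for a prefix search on one field.

    base_url, core and field are hypothetical values supplied by the caller.
    """
    # Backslash-escape user input so it cannot change the query structure
    escaped = "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)
    params = {"q": f"{field}:{escaped}*", "rows": rows, "wt": "json"}
    return f"{base_url}/{core}/select?{urlencode(params)}"

url = build_solr_select_url("http://localhost:8983/solr", "artists", "name", "radio")
```

Keeping URL construction inside the wrapper means callers never see Solr query syntax, which is one reason to put an API in front of Solr.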

Old Architecture

Domain Objects

Artist Documents

Release Documents (e.g. album or single)

Track Documents

First Attempt - 2011

• Artists and Releases

• Solr 1.4

• 17 stores

• ~40GB

• Dropped DIH as it had issues

2011 Architecture

[Diagram; labels: Search API, Solr, Tracks, Artists, Releases]

• Added Tracks Core

• Solr 3.5

• 47 stores

• ~400GB

• More than 430M docs

• Didn't revisit DIH

Current Architecture

[Diagram; labels: Search API, Artist/Release Solrs, Track Solrs]

Things Learnt

We should have split by <X>; for us, that means shops.
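
One way to picture a split by shop is a routing function that sends each shop's queries to its own core. This is only a sketch under assumed names: the host list, port, and the "tracks_shop<N>" core naming are hypothetical and not taken from the talk.

```python
def core_url_for_shop(shop_id, hosts, core_prefix="tracks"):
    """Pick the Solr host and core for one shop (hypothetical layout)."""
    # Spread shops across hosts; each shop gets its own, smaller core
    host = hosts[shop_id % len(hosts)]
    return f"http://{host}:8983/solr/{core_prefix}_shop{shop_id}"

url = core_url_for_shop(34, ["solr-a", "solr-b"])
```

With this kind of layout each index stays small, and a busy shop can be moved to its own host without touching the others.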

Beware Inflection Points

Data size: 400GB != 40GB * 10

Throughput: 600 rpm IS NOT 4 * 150 rpm

What do we want in our servers?

Fast Disks?

Virtual?

Bare Metal?

Optimize really...?

Cache Warming/First search?

Testing

Test ingestion/data import, then test again

Your data is not as clean as you think

Load test early and often

We still need to get better at this
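
A load test does not need much tooling to get started. The sketch below is an assumption about how one might measure throughput (requests per minute) and latency with plain threads; `search` stands for any callable that issues one query, and none of these names come from the talk.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def timed_call(search):
    """Run one search and return how long it took, in seconds."""
    start = time.perf_counter()
    search()
    return time.perf_counter() - start

def run_load(search, total_requests, workers):
    """Call `search` total_requests times across `workers` threads."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = sorted(pool.map(timed_call, [search] * total_requests))
    elapsed = time.perf_counter() - started
    p95_index = min(len(latencies) - 1, int(len(latencies) * 0.95))
    return {
        "requests": len(latencies),
        "rpm": len(latencies) / elapsed * 60,
        "median_s": latencies[len(latencies) // 2],
        "p95_s": latencies[p95_index],
    }

# Stand-in for a real query; replace with a call to the search API
result = run_load(lambda: time.sleep(0.001), total_requests=20, workers=4)
```

Running this at several worker counts, rather than extrapolating from one, is what exposes the inflection points mentioned earlier.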

Logging is worth its weight in gold

But don't get weighed down

Monitoring

We use statsd/graphite and NewRelic:
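
As a rough illustration of how a timing reaches statsd, the sketch below formats a metric in the statsd line protocol ("name:value|ms") and sends it over UDP. The metric name is hypothetical, and 8125 is statsd's default port; the talk does not show which metrics 7digital records.

```python
import socket

def statsd_timing(name, millis):
    """Format a timer metric in the statsd line protocol."""
    return f"{name}:{int(millis)}|ms".encode("ascii")

def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; statsd does not reply."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()

# Hypothetical metric name for one search request's duration
payload = statsd_timing("search.artist.query_time", 42.7)
```

Because the send is UDP, a missing or slow statsd server does not hold up the search request being measured.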

Visualise Indexing

Which territory's data has been indexed?

Instant Search

Magic Deploys

We recently adopted CFEngine; it is awesome!!

The Future

[Diagram; labels: Search API, Artist Solrs, Release Solrs, Track Solrs]

Solr Cloud, in the Cloud??

Questions

James Atherton

@mr_road

@7digital

We are recruiting, please talk to me afterwards.

Resources

https://github.com/etsy/statsd/

https://github.com/7digital

http://d3js.org/
