
Solr on Windows: Does it work, does it scale?

Funda Real Estate (www.funda.nl) [email protected], October 20th, 2011

Contents
§  Who we are
§  Why we chose Windows, why we chose Solr
§  Our worst fears, maybe yours
§  Share our experiences
  •  What went spectacularly well
  •  What we would have loved to know before going to production
§  Our approach: migrating the data and the people


Who we are
§  Teun Duynstee, software architect
§  funda.nl is a household name in the Netherlands (real estate search)
§  We serve around 10M pageviews a day
§  Although we operate only in NL (17M inhabitants), we have 3M unique visitors each month
§  Strongly relies on faceted search
§  Used to have a homegrown solution; this year we migrated to Solr


Screenshots


Our architecture
§  So why did we use Windows in the first place?
  •  Great tool support
  •  Performance is pretty good
  •  Not that expensive
  •  Historic reasons
§  Technology stack
  •  Windows Server 2003 OS (upgrading)
  •  MS SQL Server 2005
  •  ASP.NET MVC 2
  •  + MSMQ + memcached + jQuery + S3 + ...


Old implementation


House ID | place     | date       | price
1        | Amsterdam | 2011-09-14 | 450000
2        | …         | …          | …

SELECT TOP 15 * FROM Houses where place = 'Amsterdam' order by date desc

SELECT SUM(facet1), SUM(facet2) FROM Houses WHERE place = 'Amsterdam'

House ID | place     | date       | price  | facet1 | facet2
1        | Amsterdam | 2011-09-14 | 450000 | 1      | 0
2        | …         | …          | …      | …      | …
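The old approach (one query for the result page, one aggregate query for the facet counts) can be sketched with an in-memory SQLite database. This is a hypothetical reconstruction: the sample rows are invented, "House ID" becomes an `id` column, facet1/facet2 are assumed to be 0/1 flags precomputed per house, and SQLite's LIMIT stands in for SQL Server's TOP.

```python
import sqlite3

# Hypothetical sample data; facet1/facet2 are assumed 0/1 flags that were
# precomputed per house when the row was written.
ROWS = [
    (1, "Amsterdam", "2011-09-14", 450000, 1, 0),
    (2, "Amsterdam", "2011-09-10", 275000, 0, 1),
    (3, "Amsterdam", "2011-09-12", 325000, 1, 1),
    (4, "Utrecht", "2011-09-13", 310000, 1, 0),
]

def old_style_search(conn, place):
    """Return the newest 15 house ids in `place` plus summed facet counts."""
    # SQLite uses LIMIT where SQL Server uses TOP.
    hits = conn.execute(
        "SELECT id FROM Houses WHERE place = ? ORDER BY date DESC LIMIT 15",
        (place,),
    ).fetchall()
    # One aggregate query yields every facet count for the same filter.
    facet1, facet2 = conn.execute(
        "SELECT SUM(facet1), SUM(facet2) FROM Houses WHERE place = ?",
        (place,),
    ).fetchone()
    return [row[0] for row in hits], {"facet1": facet1, "facet2": facet2}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Houses (id INTEGER PRIMARY KEY, place TEXT, date TEXT,"
    " price INTEGER, facet1 INTEGER, facet2 INTEGER)"
)
conn.executemany("INSERT INTO Houses VALUES (?, ?, ?, ?, ?, ?)", ROWS)
ids, facets = old_style_search(conn, "Amsterdam")
print(ids, facets)  # [1, 3, 2] {'facet1': 2, 'facet2': 2}
```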

Enter Solr
§  Had been looking at commercial search technology, but it was too expensive and too risky
§  Enthusiastic: a good fit for our functionality
§  But will it scale?
§  We already had a scale-out solution for search
§  Solr 1.2 relied on rsync for replication
§  As soon as 1.3 came out, we were sold


Run anywhere?
§  Not reassured by the reactions of Solr experts to running Solr on Windows
§  So what to test at scale/load?
  •  Index size is small; we're talking about scaling search volume
  •  Real user behavior is very hard to emulate using testing tools


How did we plan


[Diagram: resource needs per server role]
-  CPU for reading data
-  CPU for creating HTML
-  Threading
-  Disk IO for getting at data
-  Memory for caching results
-  CPU mostly for indexing
-  Only memory

How did we plan
§  Experience with the resource requirements of the old solution
§  By placing Solr on all servers where we would stop searching in the database
  •  The same load should be easily manageable by "superior technology"
  •  Network topology and server roles could remain the same
  •  No new servers


How we migrated
§  Testing:
  •  Stress tested Solr with representative searches and 100 concurrent threads
  •  Stress tested the parsing code: no client library, but lean custom XML parsing
§  Careful adoption
  •  We selected a part of the functionality that we could move over to Solr with low risk
  •  Always made sure that we could roll back to the previous version; database data kept intact
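A stress test like the one above can be sketched as a thread pool that replays a list of searches and records each latency. This is a minimal sketch, not the harness funda used: `fake_search` is a hypothetical stand-in for an HTTP request to Solr's /select handler, and the query strings are invented.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def stress_test(search, queries, threads=100):
    """Run every query through `search` on `threads` concurrent workers
    and return the per-query latencies in seconds."""
    def timed(query):
        start = time.perf_counter()
        search(query)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(timed, queries))

def fake_search(query):
    # Hypothetical stand-in for an HTTP call to Solr; just simulates work.
    time.sleep(0.01)
    return query

queries = [f"fq=place:Amsterdam&start={i}" for i in range(500)]
latencies = stress_test(fake_search, queries, threads=100)
print(len(latencies), round(statistics.median(latencies), 3))
```

As the slides note, replaying representative searches this way still does not reproduce real user behavior, so it tests throughput rather than the real traffic mix.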


Libraries
§  SolrSharp/SolrNet
§  Neither looked like a great fit
§  We rolled a very lightweight XML parsing client
§  And a very light fluent URL-building API
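A "very light fluent URL-building API" might look like the following. It is a hypothetical Python sketch (the class name `SolrQuery`, its method names and the base URL with Tomcat's default port 8080 are all assumptions); only the request parameters `q`, `fq`, `facet`, `facet.field`, `facet.query`, `sort` and `rows` are standard Solr parameters.

```python
from urllib.parse import urlencode

class SolrQuery:
    """Hypothetical fluent builder for Solr /select URLs.

    Each method appends a standard Solr request parameter and returns
    self, so calls can be chained.
    """

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self.params = [("q", "*:*")]  # match all; results come from filters

    def filter(self, fq):
        self.params.append(("fq", fq))
        return self

    def facet_field(self, field):
        self._enable_facets()
        self.params.append(("facet.field", field))
        return self

    def facet_query(self, query):
        self._enable_facets()
        self.params.append(("facet.query", query))
        return self

    def sort(self, spec):
        self.params.append(("sort", spec))
        return self

    def rows(self, n):
        self.params.append(("rows", str(n)))
        return self

    def _enable_facets(self):
        # Add facet=true once, however many facets are requested.
        if ("facet", "true") not in self.params:
            self.params.append(("facet", "true"))

    def url(self):
        # A list of pairs lets urlencode repeat keys such as fq and facet.field.
        return f"{self.base_url}/select?{urlencode(self.params)}"

url = (SolrQuery("http://localhost:8080/solr")
       .filter("place:Amsterdam")
       .facet_field("place")
       .facet_query("price:[0 TO 200000]")
       .sort("date desc")
       .rows(15)
       .url())
print(url)
```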


How we migrated
§  Choice: no new servers or server roles
§  Choice: not running on Linux
§  Choice: Tomcat, running as a Windows service
§  Choice: master/slave topology, no sharding
§  Choice: introducing Solr first on a non-mission-critical part
§  Choice: not using available client libraries, but rolling our own


So what happened?
§  Initially we moved only a small part of the search functionality ("recently sold") to Solr
§  Complete success
  •  Less code for comparable functionality
  •  Operations very satisfied with stability
  •  Fast, with low resource use


So what happened?
§  When we moved all of the core functionality over to Solr, all hell broke loose
§  Everything became very slow, yet CPU and memory use looked normal


So what happened?


-  Bandwidth for responses
-  The switch had some 100 Mbit ports
-  No compression enabled
-  Memcached traffic ran over the same ports, for clients and partly for servers
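A back-of-the-envelope check shows how quickly search responses alone can fill a 100 Mbit port. The response size below is a hypothetical assumption (the talk does not give one), and the sketch assumes all responses at the stated rates funnel through a single port; only the 300 and 400 searches/s figures come from the talk.

```python
def required_mbit_per_s(searches_per_s, response_bytes):
    """Bandwidth needed to ship search responses, in megabits per second."""
    return searches_per_s * response_bytes * 8 / 1_000_000

# Hypothetical average uncompressed response size; not stated in the talk.
ASSUMED_RESPONSE_BYTES = 30_000

for load in (300, 400):  # peak and peak-day search rates from the talk
    mbit = required_mbit_per_s(load, ASSUMED_RESPONSE_BYTES)
    print(f"{load} searches/s -> {mbit:.0f} Mbit/s")
# 300 searches/s -> 72 Mbit/s
# 400 searches/s -> 96 Mbit/s
```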

HTTP compression
§  The .NET HttpWebRequest class supports (de)compression, but you must turn it on:
webRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
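For comparison, Python's urllib does not decompress responses automatically either. The sketch below is a Python analog of the .NET setting, not part of funda's stack: it sends an Accept-Encoding header and undoes gzip or deflate content coding itself. The server still has to be configured to compress, so the reply may come back uncompressed.

```python
import gzip
import urllib.request
import zlib

def decode_body(body, content_encoding):
    """Undo HTTP content coding; urllib leaves this to the caller."""
    encoding = (content_encoding or "").lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            return zlib.decompress(body)  # zlib-wrapped deflate
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)  # raw deflate
    return body  # identity or unknown coding: pass through

def fetch(url):
    # Ask for compression; the server may still reply uncompressed.
    request = urllib.request.Request(
        url, headers={"Accept-Encoding": "gzip, deflate"})
    with urllib.request.urlopen(request) as response:
        return decode_body(response.read(),
                           response.headers.get("Content-Encoding"))

# Offline check with a repetitive, XML-like payload.
xml = b"<doc><str name='place'>Amsterdam</str></doc>" * 1000
packed = gzip.compress(xml)
print(len(xml), len(packed))
assert decode_body(packed, "gzip") == xml
```

Repetitive XML like Solr's responses compresses very well, which is why enabling compression relieves a saturated link.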


Numbers, schema
§  138 fields (127 indexed, 12 multivalued)
§  Mostly using StrField, BoolField, SortableIntField, SortableFloatField
§  No TrieFields (we need sortMissingLast)
§  No full text, stemming, or tokenizing (yet)


Numbers, queries
§  Our typical query has:
  •  All fields returned
  •  15 resulting docs
  •  13 field facets
  •  37 query facets
  •  Only filters (fq=), no text searches, almost no scoring, boosting, etc.
§  Main core has:
  •  1.4M docs
  •  1.7 GB index (avg. 1300 bytes per doc)
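A lean XML parsing client for responses like this only needs to pull out the documents, numFound and the two kinds of facet counts. The sketch below uses Python's ElementTree against Solr's standard XML response layout; it is an illustration, not funda's parser, and the sample response is invented. Field values are kept as strings.

```python
import xml.etree.ElementTree as ET

def parse_solr_xml(text):
    """Return (numFound, docs, facet_queries, facet_fields) from a Solr
    XML response. Field values are left as strings."""
    root = ET.fromstring(text)
    result = root.find("result[@name='response']")
    docs = []
    for doc in result.findall("doc"):
        fields = {}
        for field in doc:
            if field.tag == "arr":  # multivalued field
                fields[field.get("name")] = [v.text for v in field]
            else:
                fields[field.get("name")] = field.text
        docs.append(fields)
    facet_queries, facet_fields = {}, {}
    counts = root.find("lst[@name='facet_counts']")
    if counts is not None:
        for entry in counts.findall("lst[@name='facet_queries']/int"):
            facet_queries[entry.get("name")] = int(entry.text)
        for field in counts.findall("lst[@name='facet_fields']/lst"):
            facet_fields[field.get("name")] = {
                value.get("name"): int(value.text) for value in field}
    return int(result.get("numFound")), docs, facet_queries, facet_fields

# Invented sample in Solr's XML response format.
SAMPLE = """<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">3</int></lst>
  <result name="response" numFound="2" start="0">
    <doc><str name="id">1</str><str name="place">Amsterdam</str><int name="price">450000</int></doc>
    <doc><str name="id">2</str><arr name="features"><str>garden</str><str>garage</str></arr></doc>
  </result>
  <lst name="facet_counts">
    <lst name="facet_queries"><int name="price:[0 TO 300000]">1</int></lst>
    <lst name="facet_fields"><lst name="place"><int name="Amsterdam">2</int></lst></lst>
  </lst>
</response>"""

num_found, docs, fq_counts, field_counts = parse_solr_xml(SAMPLE)
print(num_found, docs, fq_counts, field_counts)
```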


Numbers, searching
§  At peak times, 300 Solr searches per second (400 on peak days)
§  Caches
  •  Result cache hit ratio < 1%
  •  Document cache hit ratio 31%
  •  Field value cache hit ratio 99%
  •  Filter cache hit ratio 99%


Numbers, indexing
§  Incremental updates every 10 min.
§  With only 1.3M documents, we can afford regular full indexes
  •  Nightly
  •  1.5 hr
§  Replication in master/slave mode appears very stable under load
  •  Memory use during replication is high
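An incremental update run boils down to posting the changed documents to the master's update handler and committing, after which the slaves replicate. The sketch below builds Solr's XML `<add>` message; how changed houses are selected every 10 minutes is not shown, and `post_update` plus the URL in the comment are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

def build_add_message(docs):
    """Serialize changed documents as a Solr XML <add> update message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:  # multivalued fields repeat the <field> element
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")

def post_update(update_url, body):
    # Hypothetical helper: POST one XML message to the master's /update handler.
    request = urllib.request.Request(
        update_url, data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"})
    with urllib.request.urlopen(request) as response:
        return response.status

changed = [{"id": 1, "place": "Amsterdam", "features": ["garden", "garage"]}]
message = build_add_message(changed)
print(message)
# Against a running master (hypothetical URL):
#   post_update("http://localhost:8080/solr/update", message)
#   post_update("http://localhost:8080/solr/update", "<commit/>")
```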


Numbers, resources
§  4 active Solr searchers, running Solr 1.3
  •  Dual quad-core CPU
  •  1.2 GB memory for Tomcat


Numbers, resources


Caching
§  We have a memcached distributed cache for results (cf. the Solr result cache)
§  Here we have hit ratios of ca. 25%
§  Actually, serializing, deserializing, network latency and memcached itself take longer than letting Solr handle everything
§  Network traffic of the memcached protocol is lower
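The memcached layer follows the cache-aside pattern: look the query up in the cache, fall back to Solr on a miss, then store the result. This hypothetical sketch uses a dict as the cache so it runs offline; a real memcached client would add serialization and a network round trip to every lookup, hits and misses alike, which is the overhead the slide weighs against simply asking Solr.

```python
import hashlib

class CachedSearch:
    """Cache-aside wrapper: return a cached result by query key, else ask Solr.

    `cache` stands in for a memcached client (only get/set are used) and
    `search` for the real Solr call; both are hypothetical here.
    """

    def __init__(self, cache, search):
        self.cache, self.search = cache, search
        self.hits = self.lookups = 0

    def get(self, query_url):
        # Hash the URL: memcached keys have length and character limits.
        key = hashlib.sha1(query_url.encode("utf-8")).hexdigest()
        self.lookups += 1
        result = self.cache.get(key)
        if result is not None:
            self.hits += 1
            return result
        result = self.search(query_url)
        self.cache.set(key, result)
        return result

    @property
    def hit_ratio(self):
        return self.hits / self.lookups if self.lookups else 0.0

class DictCache(dict):
    """In-process stand-in offering memcached-style get/set."""
    def set(self, key, value):
        self[key] = value

searcher = CachedSearch(DictCache(), lambda url: f"result for {url}")
for url in ["/select?fq=a", "/select?fq=b", "/select?fq=a", "/select?fq=c"]:
    searcher.get(url)
print(searcher.hit_ratio)  # 0.25
```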


Wrap Up
§  Solr works fine on Windows; we have had no stability issues in Solr whatsoever
§  Pick your client libraries carefully (there is no obvious choice) or use raw HTTP/XML
§  Think about bandwidth when planning capacity


Contact Teun Duynstee
•  [email protected]
•  www.funda.nl
•  @teuntostring

Photo credits: http://www.sxc.hu/profile/lusi, Microsoft Windows 7

