Elasticsearch meetup, 13 November 2014

Post on 02-Jul-2015


DESCRIPTION

ElasticsearchFR meetup #11 Orange French search engines

TRANSCRIPT

Orange search engine

Jean-Pierre Paris

Orange France

November 13th, 2014

Orange French search engines and Elasticsearch

agenda

part 1 Orange French search engine

part 2 why Elasticsearch?

part 3 conclusion


Orange search engine

4 million, ~1 million

8bn docs (FR)

80 people, ~1000 servers, 3 datacenters


search engine response page

§  one response page…

§  with a lot of data sources

§  and a lot of engines


vertical search engines


web search and web graph

taken from Wikipedia


volume

§  vertical search engines

–  10m documents

–  5 engines in 2014

§  web graph

–  8bn URLs

–  2bn internal vertices

–  6bn leaf vertices

–  100bn edges

13TB

10GB


part 2 why Elasticsearch?


our needs

§  vertical search engines

–  adopt one common technology

–  lower maintenance cost

–  prepare for future needs

§  web graph

–  gain insight into a large dataset

–  build analysis and visualization

–  test new technology with large volume


Elasticsearch responses

§  REST interface

§  near real time distributed indexing and distributed search

§  native full text search

–  with a lot of different queries and wildcards

§  facets… oops! aggregations!

–  values distribution on a specific criterion

§  interactive mode while exploring a dataset

–  short query response time
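A "values distribution on a specific criterion" maps naturally onto a terms aggregation. The sketch below is not from the talk: it only builds the JSON body that would be sent to a `_search` endpoint, and the field name `domain` is a hypothetical example. Aggregations arrived in Elasticsearch 1.0; on v0.90 the equivalent feature was facets.

```python
import json


def terms_aggregation_request(field, top_n=10):
    """Build a search body that returns only a terms aggregation on `field`."""
    return {
        "size": 0,  # return no hits, only the aggregation buckets
        "aggs": {
            "distribution": {  # arbitrary name chosen for this aggregation
                "terms": {"field": field, "size": top_n}
            }
        },
    }


# "domain" is a hypothetical field name, not one taken from the talk.
body = terms_aggregation_request("domain", top_n=5)
print(json.dumps(body, indent=2))
```

Setting `size` to 0 keeps the response small, which suits the interactive exploration mentioned above.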


hardware architecture

[diagram: Elasticsearch cluster of store nodes; two groups labelled x30]


indexing with ES v0.90

§  performance

–  starting at 160 doc/s (1 injector, 4 ES nodes with 2 CPUs, 4GB)

–  with bulk size 1000: 920 doc/s (1 injector, 4 ES nodes with 2 CPUs, 4GB)

–  3 injectors: 570 doc/s × 3 ≈ 1700 doc/s

–  1 injector, 30 ES nodes (8 CPUs, 16GB): 1700 doc/s

–  30 injectors, 30 ES nodes (8 CPUs, 16GB): 32,000 doc/s

–  30 injectors, 60 ES nodes (http-data) (8 CPUs, 16GB): 36,000 doc/s

–  240 injectors, 60 ES nodes (http-data) (8 CPUs, 16GB): 75,000 doc/s, then 43,000 doc/s

–  1bn docs in 5h (55,000 doc/s)
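The jump from 160 to 920 doc/s comes from grouping documents into bulk requests. The talk does not show the injector code, so the following is only a minimal sketch of how an injector might cut documents into batches of 1000 and format each batch for the bulk API (one action line followed by one source line per document, newline-terminated). The index name `webgraph`, the type name `page` and the sample documents are assumptions made for illustration.

```python
import json


def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_body(docs, index, doc_type):
    """Format documents as a newline-delimited bulk request body."""
    lines = []
    for doc in docs:
        # Action line: index this document into index/doc_type.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Hypothetical documents; "webgraph" and "page" are made-up names.
docs = [{"url": "http://example.org/%d" % i} for i in range(2500)]
batches = [bulk_body(chunk, "webgraph", "page") for chunk in chunked(docs, 1000)]
print(len(batches))  # 3 batches: 1000, 1000 and 500 documents
```

Each string in `batches` would be sent as the body of one bulk request; several injectors can run the same loop over disjoint sets of documents.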


hardware architecture

[diagram: Elasticsearch cluster with store, http and data nodes; two groups labelled x30]


number of shards

[chart: time in seconds (0 to 1200) against number of shards (0 to 30); 321 sec for 12 shards]


bulksize

[chart: time in seconds (0 to 600) against bulk size (0 to 8000); 278 sec for bulk size 1700]


searching

§  performance

–  2 req/s out of the box with a 6.5TB index

–  OS cache is mandatory

–  130 req/s in cache

–  a lot of requests needed to load the cache

§  relevance

–  good for vertical engines

–  not significant in the web graph experimentation


why Elasticsearch AND hadoop?

§  simply use the existing bridge

–  open-sourced by Elasticsearch

§  ability to choose the best technology

–  performance

–  expressive power

§  examples

–  compute and re-inject back links

–  distribute Elasticsearch injections
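Computing back links means inverting the web graph: for each target URL, collect the URLs that point to it. The talk gives no code for this job, so the sketch below is only an in-memory illustration of the grouping step that a Hadoop job would distribute across the cluster. The edge list is made-up sample data.

```python
from collections import defaultdict


def back_links(edges):
    """Invert (source, target) edges into a target -> sorted sources mapping."""
    inverted = defaultdict(set)
    for source, target in edges:
        inverted[target].add(source)  # a set ignores duplicate edges
    return {target: sorted(sources) for target, sources in inverted.items()}


# Made-up sample edges, each read as "source links to target".
edges = [("a.fr", "b.fr"), ("c.fr", "b.fr"), ("a.fr", "c.fr")]
links = back_links(edges)
print(links["b.fr"])  # ['a.fr', 'c.fr']
```

Re-injecting the result would then amount to updating each target document with its list of sources, for instance through bulk requests.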


hardware architecture

[diagram: Elasticsearch cluster (http and data nodes) and hadoop cluster (master; hdfs nodes running hadoop, hive and pig); groups labelled x30, x30 and x1]


part 3 conclusion


conclusion

§  vertical engines

–  migration decided 09/13

–  first set in production 01/14

§  web graph

–  experimentation decided 09/13

–  1bn docs indexed 12/13, significant queries 03/14

§  professional community

§  connectors to other technologies

§  flexibility

–  production and experimentation

–  high volume

thanks! more info: http://blog.lemoteur.fr or @lemoteur
