Meetup Elasticsearch, November 13, 2014



Posted on 02-Jul-2015


DESCRIPTION

ElasticsearchFR meetup #11: Orange French search engines

TRANSCRIPT

Page 1

Orange search engine

Jean-Pierre Paris

Orange France

November 13th, 2014

Page 2

Orange French search engines and Elasticsearch

agenda

part 1 Orange French search engine

part 2 why Elasticsearch?

part 3 conclusion

Page 3

Orange search engine

4 million, ~1 million

8bn docs (FR)

80 people, ~1,000 servers, 3 datacenters

Page 4

search engine response page

§  one response page…

§  with a lot of data sources

§  and a lot of engines

Page 5

vertical search engines

Page 6

web search and web graph

taken from Wikipedia

Page 7

volume

§  vertical search engines

–  10m documents
–  5 engines in 2014

§  web graph

–  8bn URLs
–  2bn internal vertices
–  6bn leaf vertices
–  100bn edges

13TB

10GB
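To make the web graph vertex counts concrete, here is a minimal sketch that splits a graph's URLs into internal and leaf vertices. It assumes (our reading, not stated on the slide) that an internal vertex is a URL with at least one outgoing link, i.e. a crawled page, and that a leaf vertex is a URL that only ever appears as a link target. The function name and the toy URLs are hypothetical.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (source URL, target URL)


def classify_vertices(edges: Iterable[Edge]) -> Tuple[Set[str], Set[str]]:
    """Split the URLs of an edge list into internal and leaf vertices.

    Assumption: internal = has at least one outgoing edge;
    leaf = only ever appears as a link target.
    """
    sources: Set[str] = set()
    targets: Set[str] = set()
    for src, dst in edges:
        sources.add(src)
        targets.add(dst)
    internal = sources
    leaves = targets - sources  # targets that never link anywhere themselves
    return internal, leaves


if __name__ == "__main__":
    edges = [
        ("a.fr/", "a.fr/contact"),
        ("a.fr/", "b.fr/"),
        ("b.fr/", "c.fr/"),
    ]
    internal, leaves = classify_vertices(edges)
    print(sorted(internal))
    print(sorted(leaves))
    print(len(internal) + len(leaves), "URLs,", len(edges), "edges")
```

The two sets partition the URLs, which is consistent with 2bn internal + 6bn leaf = 8bn URLs. At that scale this would run as a distributed job rather than with in-memory sets.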

Page 8

agenda

part 1 Orange French search engine

part 2 why Elasticsearch?

part 3 conclusion

Page 9

our needs

§  vertical search engines

–  adopt one common technology
–  lower maintenance costs
–  prepare for future needs

§  web graph

–  gain insight into a large dataset
–  build analysis and visualization
–  test a new technology at large volume

Page 10

Elasticsearch's answers

§  REST interface

§  near-real-time distributed indexing and distributed search

§  native full-text search

–  with many different query types, including wildcards

§  facets… oops! aggregations!

–  value distribution over a specific criterion

§  interactive mode while exploring a dataset

–  short query response time
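As a rough illustration of the REST interface, a full-text query with a wildcard, and an aggregation used where a facet would once have been, the sketch below builds a `_search` request with the Python standard library. The host, the index name (`webgraph`) and the field names (`title`, `domain`) are hypothetical. `query_string` and the `terms` aggregation are standard query DSL; aggregations require Elasticsearch 1.0 or later, since v0.90 only had facets.

```python
import json
import urllib.request


def build_search_body(text: str, agg_field: str, size: int = 10) -> dict:
    """Full-text query (wildcards allowed) plus a terms aggregation."""
    return {
        "size": size,
        "query": {
            # query_string accepts Lucene syntax, including * and ? wildcards
            "query_string": {"query": text, "default_field": "title"}
        },
        "aggs": {
            # value distribution of agg_field over the matching documents
            "by_" + agg_field: {"terms": {"field": agg_field, "size": 20}}
        },
    }


def build_search_request(host: str, index: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the index's _search endpoint."""
    return urllib.request.Request(
        url=f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    body = build_search_body("orange*", "domain")
    req = build_search_request("http://localhost:9200", "webgraph", body)
    print(req.get_method(), req.full_url)
    # Against a live cluster: urllib.request.urlopen(req)
```

Keeping the body as a plain dict makes it easy to tweak interactively while exploring a dataset, which is the use case the slide highlights.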

Page 11

hardware architecture

[diagram: Elasticsearch cluster of "store" nodes; group labels x30 and x30]

Page 12

indexing with ES v0.90

§  performance

–  starting at 160 doc/s (1 injector, 4 ES nodes with 2 CPUs, 4GB)
–  with bulk size 1000: 920 doc/s (1 injector, 4 ES nodes with 2 CPUs, 4GB)
–  3 injectors: 570 doc/s × 3 = 1,700 doc/s
–  1 injector, 30 ES nodes (8 CPUs, 16GB): 1,700 doc/s
–  30 injectors, 30 ES nodes (8 CPUs, 16GB): 32,000 doc/s
–  30 injectors, 60 ES nodes (http and data) (8 CPUs, 16GB): 36,000 doc/s
–  240 injectors, 60 ES nodes (http and data) (8 CPUs, 16GB): 75,000 doc/s, then 43,000 doc/s
–  1bn docs in 5h (55,000 doc/s)
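The jump from 160 to 920 doc/s on this slide comes with the switch to bulk requests of 1000 documents. A minimal sketch of how an injector might cut a document stream into such batches and serialize each one in the newline-delimited `_bulk` format follows. The index and type names are hypothetical, and the `_type` field reflects the v0.90/1.x era (later Elasticsearch versions removed mapping types).

```python
import json
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def batches(docs: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_body(chunk: List[Dict], index: str, doc_type: str) -> str:
    """Serialize one batch as a _bulk payload: an action line then a source line per doc."""
    lines = []
    for doc in chunk:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # the bulk API requires the payload to end with a newline
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = ({"url": f"http://example.fr/{i}"} for i in range(2500))
    for chunk in batches(docs, 1000):
        payload = bulk_body(chunk, "webgraph", "page")
        print(len(chunk), "docs,", payload.count("\n"), "lines")
```

Each payload would then be POSTed to the `_bulk` endpoint; running this loop in several injector processes in parallel corresponds to the multi-injector rows above. As a sanity check on the last row, 1bn docs / (5 × 3,600 s) ≈ 55,600 doc/s.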

Page 13

hardware architecture

[diagram: Elasticsearch cluster with "store"/"http" nodes and "data" nodes; group labels x30 and x30]

Page 14

number of shards

[chart: time (sec) vs. number of shards (0 to 30); annotation: 321 sec for 12 shards]
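In the Elasticsearch versions discussed here, the number of shards is fixed when an index is created and cannot be changed without reindexing, so a value such as the 12 shards annotated on the chart has to be chosen up front. A minimal sketch of the create-index request follows, with a hypothetical host and index name; the replica count defaults to 1, Elasticsearch's own default, since the slides do not state one.

```python
import json
import urllib.request


def create_index_request(host: str, index: str, shards: int, replicas: int = 1) -> urllib.request.Request:
    """Build (but do not send) a PUT request creating an index with a fixed shard count."""
    if shards < 1:
        raise ValueError("an index needs at least one shard")
    body = {
        "settings": {
            "number_of_shards": shards,      # fixed at index creation time
            "number_of_replicas": replicas,  # 1 is the Elasticsearch default
        }
    }
    return urllib.request.Request(
        url=f"{host}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = create_index_request("http://localhost:9200", "webgraph", shards=12)
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```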

Page 15

bulksize

[chart: time (sec) vs. bulk size (0 to 8000); annotation: 278 sec for bulk size 1700]

Page 16

searching

§  performance

–  2 req/s out of the box with a 6.5TB index
–  OS cache is mandatory
–  130 req/s with the index in cache
–  many requests needed to load the cache

§  relevance

–  good for vertical engines
–  not significant in the web graph experimentation
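Since throughput only reached 130 req/s once the index was in the OS cache, and many requests were needed to get it there, one option is a warm-up pass before serving traffic. The slides do not say how this was done; the sketch below simply replays a list of representative queries and reports the throughput of each pass. The `send` callable is a hypothetical stand-in for whatever HTTP client issues the `_search` requests, which also lets the loop run without a cluster.

```python
import time
from typing import Callable, Iterable, List


def warm_up(queries: Iterable[str], send: Callable[[str], object], passes: int = 3) -> List[float]:
    """Replay the same queries several times; return requests/second for each pass.

    As index files land in the OS cache, later passes are expected to get faster.
    """
    queries = list(queries)
    rates = []
    for _ in range(passes):
        start = time.perf_counter()
        for q in queries:
            send(q)  # hypothetical client call, e.g. a POST to /<index>/_search
        elapsed = time.perf_counter() - start
        rates.append(len(queries) / elapsed if elapsed > 0 else float("inf"))
    return rates


if __name__ == "__main__":
    sent = []
    rates = warm_up(["orange", "meteo*", "bourse"], sent.append, passes=2)
    print(len(sent), "requests sent,", len(rates), "passes")
```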

Page 17

why Elasticsearch AND Hadoop?

§  simply use the existing bridge

–  open-sourced by Elasticsearch

§  ability to choose the best technology

–  performance
–  expressive power

§  examples

–  compute and re-inject back links
–  distribute Elasticsearch injections
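Computing back links amounts to inverting the edge list: for each target URL, collect the URLs that link to it. The sketch below does this in plain Python, in map/shuffle/reduce style, to show the shape of the job; in the setup described here it runs on Hadoop and the result is re-injected into Elasticsearch through the bridge (elasticsearch-hadoop). The function names and the output document shape are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Edge = Tuple[str, str]  # (source URL, target URL)


def map_edges(edges: Iterable[Edge]) -> Iterator[Tuple[str, str]]:
    """Map step: emit (target, source) so links are keyed by the page they point to."""
    for src, dst in edges:
        yield dst, src


def reduce_backlinks(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Shuffle + reduce step: group sources by target, deduplicated and sorted."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for target, source in pairs:
        grouped[target].add(source)
    return {target: sorted(sources) for target, sources in grouped.items()}


def to_update_docs(backlinks: Dict[str, List[str]]) -> List[dict]:
    """Hypothetical document shape for re-injecting the result into the index."""
    return [
        {"url": url, "backlinks": sources, "backlink_count": len(sources)}
        for url, sources in sorted(backlinks.items())
    ]


if __name__ == "__main__":
    edges = [("a.fr/", "c.fr/"), ("b.fr/", "c.fr/"), ("a.fr/", "b.fr/")]
    for doc in to_update_docs(reduce_backlinks(map_edges(edges))):
        print(doc)
```

Each resulting document could then be fed to the same bulk indexing path used for the initial load.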

Page 18

hardware architecture

[diagram: Elasticsearch cluster (http and data nodes) next to a Hadoop cluster (data/HDFS nodes running Hadoop, Hive and Pig, plus one master); group labels x30, x30 and x1]

Page 19

agenda

part 1 Orange French search engine

part 2 why Elasticsearch?

part 3 conclusion

Page 20

conclusion

§  vertical engines

–  migration decided 09/13
–  first set in production 01/14

§  web graph

–  experimentation decided 09/13
–  1bn docs indexed 12/13, significant queries 03/14

§  professional community

§  connectors to other technologies

§  flexibility

–  production and experimentation
–  high volume

Page 21

thanks! more info: http://blog.lemoteur.fr

or @lemoteur