Realtime Analytics with Elasticsearch [New Media Inspiration 2013]

Posted on 27-Jan-2015

DESCRIPTION

A presentation from the New Media Inspiration 2013 conference (http://www.tuesday.cz/akce/new-media-inspiration-2013/) about using Elasticsearch's faceting features for realtime analytics of big data.

TRANSCRIPT

1. Real time analytics of big data with Elasticsearch
Karel Minařík

2. Facets, Analytics, JSON
http://www.youtube.com/watch?v=-GftBySG99Q

3. http://karmi.cz
http://elasticsearch.com
Realtime Analytics With ElasticSearch

4. Using a search engine for analytics? wat?

5. HOW DOES SEARCH WORK?
A collection of documents:
file_1.txt: The ruby is a pink to blood-red colored gemstone ...
file_2.txt: Ruby is a dynamic, reflective, general-purpose object-oriented programming language ...
file_3.txt: "Ruby" is a song by English rock band Kaiser Chiefs ...

6. HOW DOES SEARCH WORK?
How do you search documents?
File.read("file_1.txt").include?("ruby")
File.read("file_2.txt").include?("ruby")
...

7. HOW DOES SEARCH WORK?
The inverted index
TOKENS        POSTINGS
ruby          file_1.txt, file_2.txt, file_3.txt
pink          file_1.txt
gemstone      file_1.txt
dynamic       file_2.txt
reflective    file_2.txt
programming   file_2.txt
song          file_3.txt
english       file_3.txt
rock          file_3.txt
http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices

8. HOW DOES SEARCH WORK?
The inverted index (as on slide 7): search "ruby" matches file_1.txt, file_2.txt and file_3.txt.

9. HOW DOES SEARCH WORK?
The inverted index: search "song" matches file_3.txt.

10. HOW DOES SEARCH WORK?
The inverted index: search "ruby AND song" intersects the two postings lists and matches only file_3.txt.

11. HOW DOES SEARCH WORK?
The inverted index, with document counts. Statistics!
TOKENS        POSTINGS
ruby (3)      file_1.txt, file_2.txt, file_3.txt
pink (1)      file_1.txt
gemstone      file_1.txt
dynamic       file_2.txt
reflective    file_2.txt
programming   file_2.txt
song          file_3.txt
english       file_3.txt
rock          file_3.txt

12. http://elasticsearch.org

13. ElasticSearch is an open source, scalable, distributed, cloud-ready, highly-available full-text search engine and database with powerful aggregation features, communicating by JSON over RESTful HTTP, based on Apache Lucene.

14. FACETS: Faceted Navigation
(annotated: Query, Facets)
http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/

15. FACETS: Faceted Navigation with Elasticsearch
The request combines the user query ("query"), the checked checkboxes ("filter") and the facet definition ("facets"). Terms filters are not analyzed, so the checkbox value must be the indexed term "ibm"; the top-level filter narrows the hits but not the facet counts.

curl "http://localhost:9200/people/_search?pretty=true" -d '{
  "query"  : { "match" : { "name" : "John" } },
  "filter" : { "terms" : { "employer" : ["ibm"] } },
  "facets" : {
    "employer" : { "terms" : { "field" : "employer", "size" : 3 } }
  }
}'

Response:

"facets" : {
  "employer" : {
    "missing" : 0,
    "total" : 10,
    "other" : 3,
    "terms" : [
      { "term" : "ibm", "count" : 3 },
      { "term" : "twitter", "count" : 2 },
      { "term" : "apple", "count" : 2 }
    ]
  }
}

http://www.elasticsearch.org/guide/reference/api/search/facets/index.html

16. FACETS: Visualizing the Facets
(the facet response from slide 15)
DEMO: http://bl.ocks.org/4571766
d3.js ~ A Bar Chart, Part 1: http://mbostock.github.com/d3/tutorial/bar-1.html

17. FACETS: Visualizing the Facets

18. FACETS: Visualizing the Facets

19. FACETS: Visualizing the Facets
http://demo.kibana.org

20. Important Concepts
• No batch orientation
• No stats precomputation and caching
• No predefined metrics or schemas
• Combination of free text search, structured search, and facets
• Scripting for performing ad hoc analytics
• Extendable: write your own facet types

21. FACETS: Scripting
Extract and aggregate the most popular domains from article URLs:

curl -X DELETE localhost:9200/demo-articles
curl -X POST localhost:9200/demo-articles -d '{
  "mappings" : {
    "a" : { "properties" : { "url" : { "type" : "string", "index" : "not_analyzed" } } }
  }
}'
curl -X PUT localhost:9200/demo-articles/a/1 -d '{"title":"...","url":"http://some.blogger.com/2012/09/01/index.html"}'
curl -X PUT localhost:9200/demo-articles/a/2 -d '{"title":"...","url":"http://some.blogger.com/2012/09/11/index.html"}'
curl -X PUT localhost:9200/demo-articles/a/3 -d '{"title":"...","url":"http://some.blogger.com/about.html"}'
curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"https://github.com/user/A"}'
# same ID 5 as above: this PUT replaces the previous document
curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"http://github.com/user/B"}'
curl -X POST localhost:9200/demo-articles/_refresh
curl -X GET 'localhost:9200/demo-articles/_search?search_type=count&pretty' -d '{
  "facets" : {
    "popular-domains" : {
      "terms" : {
        "field"  : "url",
        "script" : "term.replace(new RegExp(\"https?://\"), \"\").split(\"/\")[0]",
        "lang"   : "javascript"
      }
    }
  }
}'

Response:

"facets" : {
  "popular-domains" : {
    // ...
    "terms" : [
      { "term" : "some.blogger.com", "count" : 3 },
      { "term" : "github.com", "count" : 1 }
    ]
  }
}

22. FACETS: Demonstrations
Demo of the same commands and response as slide 21.

23. Thanks!
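The inverted index walked through on slides 5 to 11 can be sketched in Ruby, the language of the File.read example on slide 6. This is a toy under stated assumptions, not how Lucene stores its index: a naive lowercase-and-split tokenizer stands in for a real analyzer, and tokenize, build_index and search_all are hypothetical names. It indexes the three excerpts from slide 5, answers the "ruby", "song" and "ruby AND song" searches by intersecting postings lists, and reads the counts from slide 11 off the postings lengths.

```ruby
# Toy inverted index over the three excerpts from slide 5.
# Assumption: a naive tokenizer (lowercase, split into \w+ runs) stands in
# for a real analyzer such as Lucene's.

DOCS = {
  "file_1.txt" => "The ruby is a pink to blood-red colored gemstone",
  "file_2.txt" => "Ruby is a dynamic, reflective, general-purpose object-oriented programming language",
  "file_3.txt" => '"Ruby" is a song by English rock band Kaiser Chiefs'
}.freeze

# Hypothetical helper: lowercase the text and split it into word tokens.
def tokenize(text)
  text.downcase.scan(/\w+/)
end

# Build token => sorted postings list (names of the documents containing it).
def build_index(docs)
  index = Hash.new { |hash, token| hash[token] = [] }
  docs.each do |name, text|
    tokenize(text).uniq.each { |token| index[token] << name }
  end
  index.each_value(&:sort!)
  index
end

# AND query: intersect the postings lists of all query tokens.
# fetch with a default means unknown tokens match nothing and are not added.
def search_all(index, *tokens)
  tokens.map { |token| index.fetch(token.downcase, []) }.reduce(:&)
end

INDEX = build_index(DOCS)

p search_all(INDEX, "ruby")         # → ["file_1.txt", "file_2.txt", "file_3.txt"]
p search_all(INDEX, "song")         # → ["file_3.txt"]
p search_all(INDEX, "ruby", "song") # → ["file_3.txt"]

# "Statistics!": a token's document count is the length of its postings list.
p INDEX.fetch("ruby").size # → 3
p INDEX.fetch("pink").size # → 1
```

A query only touches the postings lists of its own tokens, whereas the approach on slide 6 re-reads every file for every search.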
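The slides move from the document counts on slide 11 to facets: a terms facet is, at heart, a count of field values over the documents matching a query. The Ruby sketch below computes the fields seen in the slide 15 response for a hypothetical list of employer values. The field semantics are our reading, not taken from the Elasticsearch source: missing counts documents without a value, total counts all values, and other is total minus the counts of the returned top terms (consistent with slide 15, where 3 + 2 + 2 + 3 = 10). One value per document is assumed, ties are broken alphabetically (Elasticsearch may order ties differently), terms_facet is a hypothetical name, and Enumerable#tally needs Ruby 2.7 or later.

```ruby
require "json"

# Naive in-memory version of the numbers a terms facet reports.
# Assumptions: one value per document (nil = field missing), ties broken
# alphabetically, hypothetical sample data. Needs Ruby 2.7+ for tally.
TermsFacet = Struct.new(:missing, :total, :other, :terms)

def terms_facet(values, size)
  present = values.compact
  counts = present.tally # term => number of occurrences
  top = counts.sort_by { |term, count| [-count, term] }.first(size)
  shown = top.sum { |_term, count| count }
  TermsFacet.new(
    values.size - present.size, # missing: documents without a value
    present.size,               # total: all counted values
    present.size - shown,       # other: values outside the top terms
    top.map { |term, count| { "term" => term, "count" => count } }
  )
end

# Hypothetical "employer" values of eight matching documents.
employers = ["ibm", "ibm", "ibm", "twitter", "twitter", "apple", "oracle", nil]
facet = terms_facet(employers, 2)
puts JSON.generate(facet.to_h)
# → {"missing":1,"total":7,"other":2,"terms":[{"term":"ibm","count":3},{"term":"twitter","count":2}]}
```

Because nothing is precomputed, the same function can be rerun on whatever subset a query returns, which is the "no stats precomputation" point from slide 20.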
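The script on slide 21 rewrites each url term before it is counted. The Ruby sketch below applies the same regular expression and split to the slide's URLs and counts the results with Enumerable#tally (Ruby 2.7+); it is an in-memory imitation, not how Elasticsearch runs scripts, and the domain helper is a hypothetical name. Storing the documents in a Hash keyed by ID also shows why the response reports github.com once: both GitHub URLs are indexed under ID 5, so the second PUT replaces the first.

```ruby
# Plain-Ruby imitation of the script-based terms facet on slide 21.
# Documents live in a Hash keyed by ID (titles omitted), so assigning ID 5
# twice replaces the first URL, as the repeated PUT to /demo-articles/a/5 does.

docs = {}
docs[1] = "http://some.blogger.com/2012/09/01/index.html"
docs[2] = "http://some.blogger.com/2012/09/11/index.html"
docs[3] = "http://some.blogger.com/about.html"
docs[5] = "https://github.com/user/A"
docs[5] = "http://github.com/user/B" # same ID: overwrites the previous URL

# Same transformation as the slide's JavaScript script:
# drop the http:// or https:// prefix, then keep the text before the first "/".
def domain(url)
  url.sub(%r{https?://}, "").split("/")[0]
end

counts = docs.values.map { |url| domain(url) }.tally
popular = counts.sort_by { |term, count| [-count, term] }
p popular # → [["some.blogger.com", 3], ["github.com", 1]]
```

Storing the second GitHub URL under its own ID, for example docs[4], would raise the github.com count to 2, since https?:// strips both schemes.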