03. elasticsearch : data in, data out

ElasticSearch Data In Data Out http://elastic.openthinklabs.com/

Upload: openthink-labs

Post on 11-Apr-2017


Category:

Data & Analytics



TRANSCRIPT

Page 1: 03. ElasticSearch : Data In, Data Out

ElasticSearch

Data In, Data Out
http://elastic.openthinklabs.com/

Page 2: 03. ElasticSearch : Data In, Data Out

What Is a Document?

{
  "name": "John Smith",
  "age": 42,
  "confirmed": true,
  "join_date": "2014-06-01",
  "home": { "lat": 51.5, "lon": 0.1 },
  "accounts": [
    { "type": "facebook", "id": "johnsmith" },
    { "type": "twitter", "id": "johnsmith" }
  ]
}

Page 3: 03. ElasticSearch : Data In, Data Out

Document Metadata

● _index :: Collection of documents that should be grouped together for a common reason

● _type :: The class of object that the document represents

● _id :: The unique identifier for the document

Page 4: 03. ElasticSearch : Data In, Data Out

Indexing a Document: Using Our Own ID

PUT verb: store this document at this URL

Index request:

PUT /website/blog/123
{
  "title": "My first blog entry",
  "text": "Just trying this out...",
  "date": "2014/01/01"
}

Elasticsearch responds:

{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 1,
  "created": true
}

Page 5: 03. ElasticSearch : Data In, Data Out

Indexing a Document: Autogenerating IDs

POST verb: store this document under this URL

Index request:

POST /website/blog/
{
  "title": "My second blog entry",
  "text": "Still trying this out...",
  "date": "2014/01/01"
}

Elasticsearch responds:

{
  "_index": "website",
  "_type": "blog",
  "_id": "AVeTjE9FnhloyZ20gpEj",
  "_version": 1,
  "created": true
}
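The choice between the two verbs can be sketched as a small helper. This is only an illustration: `index_request` is a hypothetical name, not part of any Elasticsearch client, and it merely builds the verb and path shown on the slides without sending anything.

```python
from typing import Optional, Tuple


def index_request(index: str, doc_type: str,
                  doc_id: Optional[str] = None) -> Tuple[str, str]:
    """Return the (HTTP verb, path) pair for indexing one document."""
    if doc_id is not None:
        # We chose the ID ourselves: store the document AT this URL.
        return "PUT", f"/{index}/{doc_type}/{doc_id}"
    # No ID given: store the document UNDER this URL and let
    # Elasticsearch generate the ID.
    return "POST", f"/{index}/{doc_type}/"


print(index_request("website", "blog", "123"))
print(index_request("website", "blog"))
```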

Page 6: 03. ElasticSearch : Data In, Data Out

Retrieving a Document

GET /website/blog/123?pretty

{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 1,
  "found": true,
  "_source": {
    "title": "My first blog entry",
    "text": "Just trying this out...",
    "date": "2014/01/01"
  }
}

curl -i -XGET http://localhost:9200/website/blog/124?pretty

HTTP/1.1 404 Not Found
Content-Type: application/json; charset=UTF-8
Content-Length: 83

{
  "_index" : "website",
  "_type" : "blog",
  "_id" : "124",
  "found" : false
}

Page 7: 03. ElasticSearch : Data In, Data Out

Retrieving Part of a Document

GET /website/blog/123?_source=title,text

{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 1,
  "found": true,
  "_source": {
    "text": "Just trying this out...",
    "title": "My first blog entry"
  }
}

GET /website/blog/123/_source

{
  "title": "My first blog entry",
  "text": "Just trying this out...",
  "date": "2014/01/01"
}
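A rough sketch of what the `_source` parameter does, assuming only top-level field names (the real parameter also accepts wildcards and nested paths, which this ignores). Both function names are hypothetical helpers made up for illustration.

```python
def source_path(index, doc_type, doc_id, fields=None):
    """Build the GET path, adding ?_source=... when specific fields are wanted."""
    path = f"/{index}/{doc_type}/{doc_id}"
    if fields:
        path += "?_source=" + ",".join(fields)
    return path


def filter_source(source, fields):
    """Keep only the requested top-level fields of a document body."""
    return {name: value for name, value in source.items() if name in fields}


doc = {
    "title": "My first blog entry",
    "text": "Just trying this out...",
    "date": "2014/01/01",
}
print(source_path("website", "blog", "123", ["title", "text"]))
print(filter_source(doc, ["title", "text"]))
```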

Page 8: 03. ElasticSearch : Data In, Data Out

Checking Whether a Document Exists

curl -I http://localhost:9200/website/blog/123

HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0

curl -I http://localhost:9200/website/blog/124

HTTP/1.1 404 Not Found
Content-Type: text/plain; charset=UTF-8
Content-Length: 0

Page 9: 03. ElasticSearch : Data In, Data Out

Updating a Whole Document

● Documents in Elasticsearch are immutable; we cannot change them. Instead, to update an existing document, we reindex or replace it using the same index API.

PUT /website/blog/123
{
  "title": "My first blog entry",
  "text": "I am starting to get the hang of this...",
  "date": "2014/01/02"
}

{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 2,
  "created": false
}

Page 10: 03. ElasticSearch : Data In, Data Out

Creating a New Document

1. POST /website/blog/
   { ... }

2. PUT /website/blog/123?op_type=create
   { ... }

3. PUT /website/blog/123/_create
   { ... }

PUT /website/blog/123?op_type=create
{
  "title": "My first blog entry",
  "text": "Just trying this out...",
  "date": "2014/01/01"
}

{
  "error": "DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]",
  "status": 409
}
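The difference between a plain index request (replace and bump the version) and a create-only request (fail with 409 if the ID is taken) can be imitated with an in-memory toy. `ToyIndex` is a hypothetical class written for this sketch; it does not talk to Elasticsearch and ignores details such as deleted documents.

```python
class ToyIndex:
    """In-memory stand-in for one index/type; maps doc_id -> (version, source)."""

    def __init__(self):
        self._docs = {}

    def index(self, doc_id, source):
        """Store or replace the whole document and bump its version."""
        created = doc_id not in self._docs
        version = 1 if created else self._docs[doc_id][0] + 1
        self._docs[doc_id] = (version, dict(source))
        return {"_id": doc_id, "_version": version, "created": created}

    def create(self, doc_id, source):
        """Like index(), but refuse to overwrite an existing document."""
        if doc_id in self._docs:
            return {"status": 409, "error": "document already exists"}
        return self.index(doc_id, source)


blog = ToyIndex()
print(blog.index("123", {"title": "My first blog entry"}))   # version 1, created
print(blog.index("123", {"title": "My first blog entry"}))   # version 2, replaced
print(blog.create("123", {"title": "My first blog entry"}))  # conflict
```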

Page 11: 03. ElasticSearch : Data In, Data Out

Deleting a Document

DELETE /website/blog/123

{
  "found": true,
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 3
}

DELETE /website/blog/123

{
  "found": false,
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 1
}

Page 12: 03. ElasticSearch : Data In, Data Out

Dealing with Conflicts

Consequence of no concurrency control

Page 13: 03. ElasticSearch : Data In, Data Out

Optimistic Concurrency Control

PUT /website/blog/1/_create
{
  "title": "My first blog entry",
  "text": "Just trying this out..."
}

GET /website/blog/1

{
  "_index": "website",
  "_type": "blog",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "title": "My first blog entry",
    "text": "Just trying this out..."
  }
}

PUT /website/blog/1?version=1
{
  "title": "My first blog entry",
  "text": "Starting to get the hang of this..."
}

{
  "_index": "website",
  "_type": "blog",
  "_id": "1",
  "_version": 2,
  "created": false
}
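The version check behind `?version=1` can be sketched as follows. This is a simplified model, assuming a plain dict as the store; `put_if_version` and `VersionConflict` are names invented for the example.

```python
class VersionConflict(Exception):
    """Raised when the caller's version no longer matches the stored one."""


def put_if_version(store, doc_id, source, expected_version):
    """Replace a document only if it is still at expected_version."""
    current_version, _ = store[doc_id]
    if current_version != expected_version:
        # Someone else changed the document since we read it.
        raise VersionConflict(
            f"current [{current_version}], provided [{expected_version}]")
    store[doc_id] = (current_version + 1, dict(source))
    return current_version + 1


store = {"1": (1, {"text": "Just trying this out..."})}
print(put_if_version(store, "1", {"text": "Starting to get the hang of this..."}, 1))
try:
    # A second writer that also read version 1 is now rejected.
    put_if_version(store, "1", {"text": "Stale update"}, 1)
except VersionConflict as exc:
    print("conflict:", exc)
```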

Page 14: 03. ElasticSearch : Data In, Data Out

Using Versions from an External System

PUT /website/blog/2?version=5&version_type=external
{
  "title": "My first external blog entry",
  "text": "Starting to get the hang of this..."
}

{
  "_index": "website",
  "_type": "blog",
  "_id": "2",
  "_version": 5,
  "created": true
}

PUT /website/blog/2?version=10&version_type=external
{
  "title": "My first external blog entry",
  "text": "This is a piece of cake..."
}

{
  "_index": "website",
  "_type": "blog",
  "_id": "2",
  "_version": 10,
  "created": false
}

PUT /website/blog/2?version=10&version_type=external
{
  "title": "My first external blog entry",
  "text": "This is a piece of cake..."
}

{
  "error": "VersionConflictEngineException[[website][3] [blog][2]: version conflict, current [10], provided [10]]",
  "status": 409
}
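With `version_type=external` the rule changes from "versions must match" to "the provided version must be strictly greater than the stored one". A one-function sketch of that rule, where `accepts_external_version` is a hypothetical name:

```python
def accepts_external_version(current, provided):
    """Return True if a write with an external version would be accepted.

    current is None when the document does not exist yet.
    """
    return current is None or provided > current


print(accepts_external_version(None, 5))   # new document at version 5
print(accepts_external_version(5, 10))     # 10 > 5: accepted
print(accepts_external_version(10, 10))    # not greater: version conflict
```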

Page 15: 03. ElasticSearch : Data In, Data Out

Partial Updates to Documents

POST /website/blog/1/_update
{
  "doc" : {
    "tags" : [ "testing" ],
    "views": 0
  }
}

{
  "_index": "website",
  "_type": "blog",
  "_id": "1",
  "_version": 3
}

GET /website/blog/1

{
  "_index": "website",
  "_type": "blog",
  "_id": "1",
  "_version": 3,
  "found": true,
  "_source": {
    "title": "My first blog entry",
    "text": "Starting to get the hang of this...",
    "views": 0,
    "tags": [ "testing" ]
  }
}
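A partial update with `doc` merges the given fields into the existing `_source`: objects are merged, scalars and arrays are overwritten, and new fields are added. A minimal sketch of that merge, with `merge_doc` as a made-up helper name:

```python
def merge_doc(source, partial):
    """Return a new dict with partial merged into source."""
    merged = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = merge_doc(merged[key], value)
        else:
            # Scalars and arrays replace whatever was there before.
            merged[key] = value
    return merged


source = {
    "title": "My first blog entry",
    "text": "Starting to get the hang of this...",
}
print(merge_doc(source, {"tags": ["testing"], "views": 0}))
```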

Page 16: 03. ElasticSearch : Data In, Data Out

Using Scripts to Make Partial Updates

POST /website/blog/1/_update
{
  "script" : "ctx._source.views+=1"
}

{
  "_index": "website",
  "_type": "blog",
  "_id": "1",
  "_version": 4
}

POST /website/blog/1/_update
{
  "script" : "ctx._source.tags+=new_tag",
  "params" : {
    "new_tag" : "search"
  }
}

{
  "_index": "website",
  "_type": "blog",
  "_id": "1",
  "_version": 5
}

GET /website/blog/1

{
  "_index": "website",
  "_type": "blog",
  "_id": "1",
  "_version": 5,
  "found": true,
  "_source": {
    "title": "My first blog entry",
    "text": "Starting to get the hang of this...",
    "views": 1,
    "tags": [ "testing", "search" ]
  }
}

Page 17: 03. ElasticSearch : Data In, Data Out

Using Scripts to Make Partial Updates

Delete a document based on its contents by setting ctx.op to delete:

POST /website/blog/1/_update
{
  "script" : "ctx.op = ctx._source.views == count ? 'delete' : 'none'",
  "params" : {
    "count": 1
  }
}

GET /website/blog/1

{
  "_index": "website",
  "_type": "blog",
  "_id": "1",
  "found": false
}

Page 18: 03. ElasticSearch : Data In, Data Out

Updating a Document That May Not Yet Exist

POST /website/pageviews/1/_update
{
  "script" : "ctx._source.views+=1",
  "upsert": {
    "views": 1
  }
}

{
  "_index": "website",
  "_type": "pageviews",
  "_id": "1",
  "_version": 1
}

GET /website/pageviews/1

{
  "_index": "website",
  "_type": "pageviews",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "views": 1
  }
}
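The upsert behaviour can be modelled in a few lines: if the document is missing, the `upsert` body is stored as-is and the script is not run; otherwise the script is applied. In this sketch the script is an ordinary Python callable and `update_with_upsert` is a hypothetical name.

```python
def update_with_upsert(store, doc_id, script, upsert):
    """Apply script to an existing document, or insert upsert if it is missing."""
    if doc_id not in store:
        # First request: the upsert document is stored unchanged.
        store[doc_id] = dict(upsert)
    else:
        # Later requests: the script modifies the existing source in place.
        script(store[doc_id])
    return store[doc_id]


def increment_views(source):
    source["views"] += 1


pageviews = {}
print(update_with_upsert(pageviews, "1", increment_views, {"views": 1}))
print(update_with_upsert(pageviews, "1", increment_views, {"views": 1}))
```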

Page 19: 03. ElasticSearch : Data In, Data Out

Update and Conflicts

POST /website/pageviews/1/_update?retry_on_conflict=5
{
  "script" : "ctx._source.views+=1",
  "upsert": {
    "views": 0
  }
}

{
  "_index": "website",
  "_type": "pageviews",
  "_id": "1",
  "_version": 2,
  "found": true,
  "_source": {
    "views": 2
  }
}
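What `retry_on_conflict` asks for is a read, change, write loop that starts over when the version has moved in the meantime. The sketch below imitates that loop against a fake in-memory counter; `FlakyCounter` and `update_with_retries` are invented for illustration, and the "interference" simply simulates two competing writers.

```python
class FlakyCounter:
    """Fake document whose first two writes are beaten by another writer."""

    def __init__(self):
        self.version, self.source = 1, {"views": 0}
        self.interference = 2

    def read(self):
        return self.version, dict(self.source)

    def write(self, source, expected_version):
        if self.interference > 0:
            # A competing process updates the document just before us.
            self.interference -= 1
            self.version += 1
            self.source["views"] += 1
        if expected_version != self.version:
            return False  # version conflict
        self.version, self.source = self.version + 1, source
        return True


def update_with_retries(doc, change, retry_on_conflict=0):
    """Retry the read-change-write cycle up to retry_on_conflict extra times."""
    for _ in range(retry_on_conflict + 1):
        version, source = doc.read()
        new_source = change(source)
        if doc.write(new_source, version):
            return new_source
    raise RuntimeError("version conflict: retries exhausted")


counter = FlakyCounter()
result = update_with_retries(
    counter, lambda s: {"views": s["views"] + 1}, retry_on_conflict=5)
print(result, counter.version)
```

Retrying is safe here because incrementing a counter gives the same end result regardless of the order in which the increments are applied.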

Page 20: 03. ElasticSearch : Data In, Data Out

Retrieving Multiple Documents

GET /_mget
{
  "docs" : [
    { "_index" : "website", "_type" : "blog", "_id" : 2 },
    { "_index" : "website", "_type" : "pageviews", "_id" : 1, "_source": "views" }
  ]
}

{
  "docs": [
    {
      "_index": "website",
      "_type": "blog",
      "_id": "2",
      "_version": 10,
      "found": true,
      "_source": {
        "title": "My first external blog entry",
        "text": "This is a piece of cake..."
      }
    },
    {
      "_index": "website",
      "_type": "pageviews",
      "_id": "1",
      "_version": 3,
      "found": true,
      "_source": { "views": 3 }
    }
  ]
}

Page 21: 03. ElasticSearch : Data In, Data Out

Retrieving Multiple Documents

GET /website/blog/_mget
{
  "docs" : [
    { "_id" : 2 },
    { "_type" : "pageviews", "_id" : 1 }
  ]
}

{
  "docs": [
    {
      "_index": "website",
      "_type": "blog",
      "_id": "2",
      "_version": 10,
      "found": true,
      "_source": {
        "title": "My first external blog entry",
        "text": "This is a piece of cake..."
      }
    },
    {
      "_index": "website",
      "_type": "pageviews",
      "_id": "1",
      "_version": 3,
      "found": true,
      "_source": { "views": 3 }
    }
  ]
}

Page 22: 03. ElasticSearch : Data In, Data Out

Retrieving Multiple Documents

GET /website/blog/_mget
{
  "ids" : [ "2", "1" ]
}

{
  "docs": [
    {
      "_index": "website",
      "_type": "blog",
      "_id": "2",
      "_version": 10,
      "found": true,
      "_source": {
        "title": "My first external blog entry",
        "text": "This is a piece of cake..."
      }
    },
    {
      "_index": "website",
      "_type": "blog",
      "_id": "1",
      "found": false
    }
  ]
}
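Because a multi-get returns one entry per requested document, including the ones that were not found, the caller has to check `found` for each entry. A small sketch of that check, using a hypothetical helper `sources_by_id` on an already-parsed response:

```python
def sources_by_id(mget_response):
    """Map each requested _id to its _source, or None if it was not found."""
    return {
        doc["_id"]: doc["_source"] if doc.get("found") else None
        for doc in mget_response["docs"]
    }


response = {
    "docs": [
        {"_index": "website", "_type": "blog", "_id": "2", "_version": 10,
         "found": True,
         "_source": {"title": "My first external blog entry",
                     "text": "This is a piece of cake..."}},
        {"_index": "website", "_type": "blog", "_id": "1", "found": False},
    ]
}
print(sources_by_id(response))
```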

Page 23: 03. ElasticSearch : Data In, Data Out

Cheaper in Bulk

The bulk request body has the following format:

{ action: { metadata }}\n
{ request body        }\n
{ action: { metadata }}\n
{ request body        }\n
...

POST /_bulk
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "My first blog post" }
{ "index": { "_index": "website", "_type": "blog" }}
{ "title": "My second blog post" }
{ "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} }
{ "doc" : {"title" : "My updated blog post"} }

{
  "took": 4,
  "errors": false,
  "items": [
    { "delete": { "_index": "website", "_type": "blog", "_id": "123", "_version": 1, "status": 404, "found": false } },
    { "create": { "_index": "website", "_type": "blog", "_id": "123", "_version": 2, "status": 201 } },
    { "create": { "_index": "website", "_type": "blog", "_id": "AVeVu4ZmPwPQAxVyMVtH", "_version": 1, "status": 201 } },
    { "update": { "_index": "website", "_type": "blog", "_id": "123", "_version": 3, "status": 200 } }
  ]
}
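The newline-delimited format is easy to get wrong by hand, so here is a sketch of assembling it with the standard `json` module. `bulk_body` is a hypothetical helper; the only rules it encodes are those visible above: one JSON object per line, no request body line after `delete`, and a trailing newline after the last line.

```python
import json


def bulk_body(actions):
    """Build a bulk request body from (action, metadata, body) triples."""
    lines = []
    for action, metadata, body in actions:
        lines.append(json.dumps({action: metadata}))
        if action != "delete":
            # delete is the only action without a request body line.
            lines.append(json.dumps(body))
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


blog = {"_index": "website", "_type": "blog"}
body = bulk_body([
    ("delete", {**blog, "_id": "123"}, None),
    ("create", {**blog, "_id": "123"}, {"title": "My first blog post"}),
    ("index", blog, {"title": "My second blog post"}),
    ("update", {**blog, "_id": "123", "_retry_on_conflict": 3},
     {"doc": {"title": "My updated blog post"}}),
])
print(body, end="")
```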

Page 24: 03. ElasticSearch : Data In, Data Out

Cheaper in Bulk

POST /_bulk
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "Cannot create - it already exists" }
{ "index": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "But we can update it" }

{
  "took": 2,
  "errors": true,
  "items": [
    {
      "create": {
        "_index": "website",
        "_type": "blog",
        "_id": "123",
        "status": 409,
        "error": "DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]"
      }
    },
    {
      "index": {
        "_index": "website",
        "_type": "blog",
        "_id": "123",
        "_version": 4,
        "status": 200
      }
    }
  ]
}
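Since one failed action does not stop the others, the caller has to walk the `items` array whenever `errors` is true. A sketch of that check on an already-parsed response; `failed_items` is a made-up helper name and treating any status of 400 or above as a failure is an assumption of this example.

```python
def failed_items(bulk_response):
    """Return (action, _id, status) for every bulk item that failed."""
    if not bulk_response.get("errors"):
        return []
    failures = []
    for item in bulk_response["items"]:
        # Each item is a single-key object: {action: result}.
        (action, result), = item.items()
        if result.get("status", 200) >= 400:
            failures.append((action, result["_id"], result["status"]))
    return failures


response = {
    "took": 2,
    "errors": True,
    "items": [
        {"create": {"_index": "website", "_type": "blog", "_id": "123",
                    "status": 409,
                    "error": "DocumentAlreadyExistsException[...]"}},
        {"index": {"_index": "website", "_type": "blog", "_id": "123",
                   "_version": 4, "status": 200}},
    ],
}
print(failed_items(response))
```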

Page 25: 03. ElasticSearch : Data In, Data Out

Don’t Repeat Yourself

POST /website/_bulk
{ "index": { "_type": "log" }}
{ "event": "User logged in" }

{
  "took": 3,
  "errors": false,
  "items": [
    { "create": { "_index": "website", "_type": "log", "_id": "AVeVyqWVPwPQAxVyMV3_", "_version": 1, "status": 201 } }
  ]
}

Page 26: 03. ElasticSearch : Data In, Data Out

Don’t Repeat Yourself

POST /website/log/_bulk
{ "index": {}}
{ "event": "User logged in" }
{ "index": { "_type": "blog" }}
{ "title": "Overriding the default type" }

{
  "took": 2,
  "errors": false,
  "items": [
    { "create": { "_index": "website", "_type": "log", "_id": "AVeVzBQjPwPQAxVyMV4_", "_version": 1, "status": 201 } },
    { "create": { "_index": "website", "_type": "blog", "_id": "AVeVzBQjPwPQAxVyMV5A", "_version": 1, "status": 201 } }
  ]
}

Page 27: 03. ElasticSearch : Data In, Data Out

How Big Is Too Big?