03. Elasticsearch: Data In, Data Out
What Is a Document?
{
  "name": "John Smith",
  "age": 42,
  "confirmed": true,
  "join_date": "2014-06-01",
  "home": { "lat": 51.5, "lon": 0.1 },
  "accounts": [
    { "type": "facebook", "id": "johnsmith" },
    { "type": "twitter", "id": "johnsmith" }
  ]
}
Document Metadata
● _index :: Collection of documents that should be grouped together for a common reason
● _type :: The class of object that the document represents
● _id :: The unique identifier for the document
Indexing a Document: Using Our Own ID
Index request:
PUT /website/blog/123
{ "title": "My first blog entry", "text": "Just trying this out...", "date": "2014/01/01" }
Elasticsearch responds:
{ "_index": "website", "_type": "blog", "_id": "123", "_version": 1, "created": true }
PUT verb: store this document at this URL
Indexing a Document: Autogenerating IDs
Index request:
POST /website/blog/
{ "title": "My second blog entry", "text": "Still trying this out...", "date": "2014/01/01" }
Elasticsearch responds:
{ "_index": "website", "_type": "blog", "_id": "AVeTjE9FnhloyZ20gpEj", "_version": 1, "created": true }
POST verb: store this document under this URL
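The choice between the two verbs can be sketched as a small helper. This is only an illustration under the URL layout shown on these slides; `index_request` is a hypothetical name and not part of any Elasticsearch client.

```python
def index_request(index, doc_type, doc_id=None):
    """Return the (HTTP verb, path) pair for an index request.

    Hypothetical helper for illustration only.
    """
    if doc_id is None:
        # No ID of our own: POST under the type URL and let
        # Elasticsearch generate the _id.
        return "POST", f"/{index}/{doc_type}/"
    # We supply the ID: PUT the document at its exact URL.
    return "PUT", f"/{index}/{doc_type}/{doc_id}"


own_id = index_request("website", "blog", 123)
generated_id = index_request("website", "blog")
```

Here `own_id` describes the first request above and `generated_id` the second.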
Retrieving a Document
GET /website/blog/123?pretty
{ "_index": "website", "_type": "blog", "_id": "123", "_version": 1, "found": true, "_source": { "title": "My first blog entry", "text": "Just trying this out...", "date": "2014/01/01" }}
curl -i -XGET http://localhost:9200/website/blog/124?pretty
HTTP/1.1 404 Not Found
Content-Type: application/json; charset=UTF-8
Content-Length: 83
{ "_index" : "website", "_type" : "blog", "_id" : "124", "found" : false}
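A client usually only cares about `_source` and whether the document was found. The sketch below reads the two response bodies shown above (shortened); `source_or_none` is a hypothetical helper name, and no request is actually sent.

```python
import json


def source_or_none(response_body):
    """Return the document's _source, or None when "found" is false."""
    reply = json.loads(response_body)
    if not reply.get("found"):
        return None
    return reply.get("_source")


# Response bodies copied (and shortened) from the slides above.
hit = ('{"_index": "website", "_type": "blog", "_id": "123", "_version": 1, '
       '"found": true, "_source": {"title": "My first blog entry"}}')
miss = '{"_index": "website", "_type": "blog", "_id": "124", "found": false}'

hit_source = source_or_none(hit)
miss_source = source_or_none(miss)
```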
Retrieving Part of a Document
GET /website/blog/123?_source=title,text
{ "_index": "website", "_type": "blog", "_id": "123", "_version": 1, "found": true, "_source": { "text": "Just trying this out...", "title": "My first blog entry" } }
GET /website/blog/123/_source
{ "title": "My first blog entry", "text": "Just trying this out...", "date": "2014/01/01"}
Checking Whether a Document Exists
curl -I http://localhost:9200/website/blog/123
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
curl -I http://localhost:9200/website/blog/124
HTTP/1.1 404 Not Found
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
Updating a Whole Document
● Documents in Elasticsearch are immutable; we cannot change them. Instead, to update an existing document we reindex (replace) it, using the same index API.
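The replace-on-reindex behaviour can be mimicked with a toy in-memory store. This is a sketch of the idea only: `index_document` and the `store` dictionary are hypothetical and do not reflect how Elasticsearch stores data internally.

```python
def index_document(store, doc_id, body):
    """Toy stand-in for the index API: replace the whole document.

    `store` maps doc_id -> {"_version": int, "_source": dict}.
    """
    previous = store.get(doc_id)
    version = previous["_version"] + 1 if previous else 1
    # The old document is never edited in place; a new one replaces it.
    store[doc_id] = {"_version": version, "_source": dict(body)}
    return {"_id": doc_id, "_version": version, "created": previous is None}


store = {}
first = index_document(store, "123", {"text": "Just trying this out..."})
second = index_document(store, "123", {"text": "I am starting to get the hang of this..."})
```

The second call reports a higher `_version` and `created: false`, which is the same pattern as in the response to a repeated PUT.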
PUT /website/blog/123
{ "title": "My first blog entry", "text": "I am starting to get the hang of this...", "date": "2014/01/02" }
{ "_index": "website", "_type": "blog", "_id": "123", "_version": 2, "created": false }
Creating a New Document
POST /website/blog/
{ ... }
PUT /website/blog/123?op_type=create
{ ... }
PUT /website/blog/123/_create
{ ... }
PUT /website/blog/123?op_type=create
{ "title": "My first blog entry", "text": "Just trying this out...", "date": "2014/01/01" }
{ "error": "DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]", "status": 409}
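The create-only semantics (succeed if the ID is free, otherwise answer with a 409 conflict) amount to a put-if-absent operation. A minimal sketch on a plain dictionary follows; the exception class and function name are made up for illustration.

```python
class DocumentAlreadyExists(Exception):
    """Toy equivalent of the 409 Conflict response shown above."""


def create_document(store, doc_id, body):
    """Store `body` under `doc_id` only if that ID is not yet taken."""
    if doc_id in store:
        raise DocumentAlreadyExists(doc_id)
    store[doc_id] = dict(body)
    return {"_id": doc_id, "_version": 1, "created": True}


store = {}
created = create_document(store, "123", {"title": "My first blog entry"})
try:
    create_document(store, "123", {"title": "My first blog entry"})
    conflict = False
except DocumentAlreadyExists:
    conflict = True
```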
Deleting a Document
DELETE /website/blog/123
{ "found": true, "_index": "website", "_type": "blog", "_id": "123", "_version": 3 }
DELETE /website/blog/123
{ "found": false, "_index": "website", "_type": "blog", "_id": "123", "_version": 1 }
Dealing with Conflicts
Consequence of no concurrency control
Optimistic Concurrency Control
PUT /website/blog/1/_create
{ "title": "My first blog entry", "text": "Just trying this out..." }
GET /website/blog/1
{ "_index": "website", "_type": "blog", "_id": "1", "_version": 1, "found": true, "_source": { "title": "My first blog entry", "text": "Just trying this out..." } }
PUT /website/blog/1?version=1
{ "title": "My first blog entry", "text": "Starting to get the hang of this..." }
{ "_index": "website", "_type": "blog", "_id": "1", "_version": 2, "created": false }
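The version check behind `?version=1` can be sketched as a compare-and-set on a toy store. The names `put_if_version` and `VersionConflict` are hypothetical; the sketch only mirrors the rule that a write succeeds when the supplied version matches the current one.

```python
class VersionConflict(Exception):
    """Toy stand-in for a 409 version conflict."""


def put_if_version(store, doc_id, body, expected_version):
    """Replace the document only if its current version is the expected one."""
    current = store[doc_id]
    if current["_version"] != expected_version:
        # Someone else changed the document since we read it.
        raise VersionConflict(
            f"current [{current['_version']}], provided [{expected_version}]"
        )
    store[doc_id] = {"_version": expected_version + 1, "_source": dict(body)}
    return store[doc_id]["_version"]


store = {"1": {"_version": 1, "_source": {"text": "Just trying this out..."}}}
new_version = put_if_version(store, "1", {"text": "Starting to get the hang of this..."}, 1)
try:
    # A second writer still holding version 1 is rejected.
    put_if_version(store, "1", {"text": "Stale update"}, 1)
    stale_rejected = False
except VersionConflict:
    stale_rejected = True
```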
Using Versions from an External System
PUT /website/blog/2?version=5&version_type=external
{ "title": "My first external blog entry", "text": "Starting to get the hang of this..." }
{ "_index": "website", "_type": "blog", "_id": "2", "_version": 5, "created": true }
PUT /website/blog/2?version=10&version_type=external
{ "title": "My first external blog entry", "text": "This is a piece of cake..." }
{ "_index": "website", "_type": "blog", "_id": "2", "_version": 10, "created": false }
PUT /website/blog/2?version=10&version_type=external
{ "title": "My first external blog entry", "text": "This is a piece of cake..." }
{ "error": "VersionConflictEngineException[[website][3] [blog][2]: version conflict, current [10], provided [10]]", "status": 409 }
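The three requests above are consistent with one rule: with `version_type=external`, a write is accepted only when the provided version is strictly greater than the current one. A one-function sketch of that rule follows; the function name is hypothetical.

```python
def accepts_external_version(current_version, provided_version):
    """Return True if an external-versioned write would be accepted.

    `current_version` is None when the document does not exist yet.
    """
    if current_version is None:
        return True
    # Equal versions are rejected too, as in the third request above.
    return provided_version > current_version


first_write = accepts_external_version(None, 5)
second_write = accepts_external_version(5, 10)
repeated_write = accepts_external_version(10, 10)
```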
Partial Updates to Documents
POST /website/blog/1/_update
{ "doc": { "tags": [ "testing" ], "views": 0 } }
{ "_index": "website", "_type": "blog", "_id": "1", "_version": 3}
GET /website/blog/1
{ "_index": "website", "_type": "blog", "_id": "1", "_version": 3, "found": true, "_source": { "title": "My first blog entry", "text": "Starting to get the hang of this...", "views": 0, "tags": [ "testing" ] }}
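The effect of the `doc` parameter can be approximated as a merge into the existing `_source`. The sketch assumes that nested objects are merged recursively while scalars and arrays are overwritten; treat that as an assumption about the update API rather than a specification. `merge_partial` is a hypothetical name.

```python
def merge_partial(source, partial):
    """Return a copy of `source` with the fields of `partial` merged in."""
    merged = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Assumed behaviour: objects are merged field by field.
            merged[key] = merge_partial(merged[key], value)
        else:
            # New fields are added; scalars and arrays are replaced.
            merged[key] = value
    return merged


before = {
    "title": "My first blog entry",
    "text": "Starting to get the hang of this...",
}
after = merge_partial(before, {"tags": ["testing"], "views": 0})
```

`after` holds the same four fields as the `_source` returned by the GET request above, while `before` is left unchanged.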
Using Scripts to Make Partial Updates
POST /website/blog/1/_update
{ "script": "ctx._source.views+=1" }
{ "_index": "website", "_type": "blog", "_id": "1", "_version": 4}
POST /website/blog/1/_update
{ "script": "ctx._source.tags+=new_tag", "params": { "new_tag": "search" } }
{ "_index": "website", "_type": "blog", "_id": "1", "_version": 5}
GET /website/blog/1
{ "_index": "website", "_type": "blog", "_id": "1", "_version": 5, "found": true, "_source": { "title": "My first blog entry", "text": "Starting to get the hang of this...", "views": 1, "tags": [ "testing", "search" ] } }
Using Scripts to Make Partial Updates
POST /website/blog/1/_update
{ "script": "ctx.op = ctx._source.views == count ? 'delete' : 'none'", "params": { "count": 1 } }
Delete a document based on its contents by setting ctx.op to delete
GET /website/blog/1
{ "_index": "website", "_type": "blog", "_id": "1", "found": false }
Updating a Document That May Not Yet Exist
POST /website/pageviews/1/_update
{ "script": "ctx._source.views+=1", "upsert": { "views": 1 } }
{ "_index": "website", "_type": "pageviews", "_id": "1", "_version": 1}
GET /website/pageviews/1
{ "_index": "website", "_type": "pageviews", "_id": "1", "_version": 1, "found": true, "_source": { "views": 1 } }
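The upsert behaviour above (the first request stores the `upsert` document as is, later requests run the script) can be sketched on a plain dictionary. `update_with_upsert` and `increment_views` are hypothetical names; the Python function stands in for the script.

```python
def update_with_upsert(store, doc_id, script, upsert):
    """Run `script` on an existing document, or insert `upsert` if missing."""
    if doc_id not in store:
        # First request: the upsert document is stored unchanged and
        # the script is not applied to it.
        store[doc_id] = dict(upsert)
    else:
        script(store[doc_id])
    return store[doc_id]


def increment_views(source):
    """Python stand-in for the script ctx._source.views+=1."""
    source["views"] += 1


store = {}
first = dict(update_with_upsert(store, "1", increment_views, {"views": 1}))
second = dict(update_with_upsert(store, "1", increment_views, {"views": 1}))
```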
Update and Conflicts
POST /website/pageviews/1/_update?retry_on_conflict=5
{ "script": "ctx._source.views+=1", "upsert": { "views": 0 } }
{ "_index": "website", "_type": "pageviews", "_id": "1", "_version": 2, "found": true, "_source": { "views": 2 } }
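What `retry_on_conflict` asks for is a retrieve-change-reindex loop that starts over when another writer got in first. The sketch below simulates this on a toy store; the `before_write` hook is a hypothetical device used only to let a rival writer interfere at a known moment.

```python
class VersionConflict(Exception):
    """Toy stand-in for a 409 version conflict."""


def update_with_retry(store, doc_id, change, retry_on_conflict=0, before_write=None):
    """Retrieve, change, and reindex a document, retrying on conflicts."""
    for attempt in range(retry_on_conflict + 1):
        seen = store[doc_id]                        # retrieve
        new_source = change(dict(seen["_source"]))  # change a copy
        if before_write is not None:
            before_write(attempt)                   # rival writer may act here
        if store[doc_id]["_version"] != seen["_version"]:
            continue                                # conflict: start over
        store[doc_id] = {"_version": seen["_version"] + 1, "_source": new_source}
        return store[doc_id]
    raise VersionConflict(doc_id)


def add_view(source):
    source["views"] += 1
    return source


store = {"1": {"_version": 1, "_source": {"views": 1}}}


def rival(attempt):
    # During the first attempt only, another process reindexes the document.
    if attempt == 0:
        store["1"] = {"_version": 2, "_source": {"views": 2}}


result = update_with_retry(store, "1", add_view, retry_on_conflict=5, before_write=rival)
```

Retrying suits increments like this one because the order in which they are applied does not matter; for changes where order does matter, blind retries may not be appropriate.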
Retrieving Multiple Documents
GET /_mget
{ "docs": [ { "_index": "website", "_type": "blog", "_id": 2 }, { "_index": "website", "_type": "pageviews", "_id": 1, "_source": "views" } ] }
{ "docs": [ { "_index": "website", "_type": "blog", "_id": "2", "_version": 10, "found": true, "_source": { "title": "My first external blog entry", "text": "This is a piece of cake..." } }, { "_index": "website", "_type": "pageviews", "_id": "1", "_version": 3, "found": true, "_source": { "views": 3 } } ]}
Retrieving Multiple Documents
GET /website/blog/_mget
{ "docs": [ { "_id": 2 }, { "_type": "pageviews", "_id": 1 } ] }
{ "docs": [ { "_index": "website", "_type": "blog", "_id": "2", "_version": 10, "found": true, "_source": { "title": "My first external blog entry", "text": "This is a piece of cake..." } }, { "_index": "website", "_type": "pageviews", "_id": "1", "_version": 3, "found": true, "_source": { "views": 3 } } ]}
Retrieving Multiple Documents
GET /website/blog/_mget
{ "ids": [ "2", "1" ] }
{ "docs": [ { "_index": "website", "_type": "blog", "_id": "2", "_version": 10, "found": true, "_source": { "title": "My first external blog entry", "text": "This is a piece of cake..." } }, { "_index": "website", "_type": "blog", "_id": "1", "found": false } ]}
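As the last response shows, a missing document does not fail the whole multi-get; each entry in `docs` carries its own `found` flag. A sketch of how a client might separate hits from misses follows (`split_mget` is a hypothetical helper, and the response is a shortened copy of the one above).

```python
def split_mget(response):
    """Split an mget response into found sources and missing IDs."""
    found, missing = {}, []
    for doc in response["docs"]:
        if doc.get("found"):
            found[doc["_id"]] = doc["_source"]
        else:
            missing.append(doc["_id"])
    return found, missing


response = {
    "docs": [
        {"_index": "website", "_type": "blog", "_id": "2", "_version": 10,
         "found": True,
         "_source": {"title": "My first external blog entry",
                     "text": "This is a piece of cake..."}},
        {"_index": "website", "_type": "blog", "_id": "1", "found": False},
    ]
}
found, missing = split_mget(response)
```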
Cheaper in Bulk
The bulk request body has the following format:
{ action: { metadata }}\n
{ request body        }\n
{ action: { metadata }}\n
{ request body        }\n
...
POST /_bulk
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" } }
{ "create": { "_index": "website", "_type": "blog", "_id": "123" } }
{ "title": "My first blog post" }
{ "index": { "_index": "website", "_type": "blog" } }
{ "title": "My second blog post" }
{ "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict": 3 } }
{ "doc": { "title": "My updated blog post" } }
{ "took": 4, "errors": false, "items": [ { "delete": { "_index": "website", "_type": "blog", "_id": "123", "_version": 1, "status": 404, "found": false } }, { "create": { "_index": "website", "_type": "blog", "_id": "123", "_version": 2, "status": 201 } }, { "create": { "_index": "website", "_type": "blog", "_id": "AVeVu4ZmPwPQAxVyMVtH", "_version": 1, "status": 201 } }, { "update": { "_index": "website", "_type": "blog", "_id": "123", "_version": 3, "status": 200 } } ]}
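Because the bulk format is line-oriented, the request body is easiest to build programmatically. The sketch below produces the newline-delimited body with the standard `json` module; `bulk_body` and its tuple layout are assumptions made for this example.

```python
import json


def bulk_body(actions):
    """Build a bulk request body.

    `actions` is an iterable of (action, metadata, body) tuples; `body`
    is None for actions without a request body, such as delete.
    """
    lines = []
    for action, metadata, body in actions:
        lines.append(json.dumps({action: metadata}))
        if body is not None:
            lines.append(json.dumps(body))
    # Every line, including the last one, ends with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([
    ("delete", {"_index": "website", "_type": "blog", "_id": "123"}, None),
    ("create", {"_index": "website", "_type": "blog", "_id": "123"},
     {"title": "My first blog post"}),
])
```

`json.dumps` emits each object on a single line by default, which is what the format needs: a pretty-printed object would span several lines and break the one-object-per-line layout.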
Cheaper in Bulk
POST /_bulk
{ "create": { "_index": "website", "_type": "blog", "_id": "123" } }
{ "title": "Cannot create - it already exists" }
{ "index": { "_index": "website", "_type": "blog", "_id": "123" } }
{ "title": "But we can update it" }
{ "took": 2, "errors": true, "items": [ { "create": { "_index": "website", "_type": "blog", "_id": "123", "status": 409, "error": "DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]" } }, { "index": { "_index": "website", "_type": "blog", "_id": "123", "_version": 4, "status": 200 } } ]}
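Since one failed action does not stop the others, a client has to inspect the items individually whenever `errors` is true. A sketch that collects the failed items follows; `failed_items` is a hypothetical helper, and the response is the one shown above.

```python
def failed_items(bulk_response):
    """Return (position, action, _id, status) for every item with an error."""
    if not bulk_response.get("errors"):
        return []
    failures = []
    for position, item in enumerate(bulk_response["items"]):
        # Each item is a single-key object: { action: result }.
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append((position, action, result["_id"], result["status"]))
    return failures


response = {
    "took": 2,
    "errors": True,
    "items": [
        {"create": {"_index": "website", "_type": "blog", "_id": "123",
                    "status": 409,
                    "error": "DocumentAlreadyExistsException[[website][4] "
                             "[blog][123]: document already exists]"}},
        {"index": {"_index": "website", "_type": "blog", "_id": "123",
                   "_version": 4, "status": 200}},
    ],
}
failures = failed_items(response)
```

The check looks for an `error` field rather than the status code alone, because an item such as a delete of a missing document can report status 404 while `errors` stays false.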
Don’t Repeat Yourself
POST /website/_bulk
{ "index": { "_type": "log" } }
{ "event": "User logged in" }
{ "took": 3, "errors": false, "items": [ { "create": { "_index": "website", "_type": "log", "_id": "AVeVyqWVPwPQAxVyMV3_", "_version": 1, "status": 201 } } ] }
Don’t Repeat Yourself
POST /website/log/_bulk
{ "index": {} }
{ "event": "User logged in" }
{ "index": { "_type": "blog" } }
{ "title": "Overriding the default type" }
{ "took": 2, "errors": false, "items": [ { "create": { "_index": "website", "_type": "log", "_id": "AVeVzBQjPwPQAxVyMV4_", "_version": 1, "status": 201 } }, { "create": { "_index": "website", "_type": "blog", "_id": "AVeVzBQjPwPQAxVyMV5A", "_version": 1, "status": 201 } } ]}
How Big Is Too Big?
References
● Clinton Gormley & Zachary Tong, Elasticsearch: The Definitive Guide, A Distributed Real-Time Search and Analytics Engine, O’Reilly