elasticsearch terminology

ELASTICSEARCH TERMINOLOGY

by Bo Andersen - codingexplained.com

http://codingexplained.com/

NEAR REALTIME (NRT)
➤ Elasticsearch is a near realtime search engine
➤ There is only a small latency between a document being indexed and it becoming searchable
➤ The latency is usually about one second
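
The near realtime behavior can be illustrated with a toy model. This is only a sketch and not how Elasticsearch is implemented: newly indexed documents sit in a buffer and become visible to searches only after a refresh. The names ToyIndex, index, refresh and search are hypothetical and chosen for this example.

```python
class ToyIndex:
    """Toy model of near realtime search; not Elasticsearch's actual implementation."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet searchable
        self._searchable = []  # visible to searches

    def index(self, doc):
        # A newly indexed document lands in the buffer first.
        self._buffer.append(doc)

    def refresh(self):
        # Assumption: in a real cluster a refresh runs periodically;
        # here it is triggered by hand to make the delay visible.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, field, value):
        return [d for d in self._searchable if d.get(field) == value]


toy = ToyIndex()
toy.index({"name": "Alice"})
before = toy.search("name", "Alice")  # nothing found: no refresh has happened yet
toy.refresh()
after = toy.search("name", "Alice")   # the document is now searchable
```

The gap between index() and refresh() in this sketch plays the role of the roughly one second latency mentioned above.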

CLUSTER
➤ A cluster is a collection of nodes (servers)
➤ Consists of one or more nodes, depending on the scale
  ➤ Can contain as many nodes as you want
➤ Together, these nodes contain all data
➤ A cluster provides indexing and search capability across all nodes
➤ Identified by a unique name (defaults to "elasticsearch")

NODE
➤ A single server that is part of a cluster
➤ Stores searchable data
  ➤ Stores all data if there is only one node in the cluster, or part of the data if there are multiple nodes
➤ Participates in a cluster's indexing and search capabilities
➤ Identified by a name (defaults to a random Marvel character)
➤ A node joins a cluster named "elasticsearch" by default
➤ Starting a single node on a network will by default create a new single-node cluster named "elasticsearch"

INDEX
➤ A collection of documents (e.g. product, account, movie)
  ➤ Each of the above examples would be a type
➤ Corresponds to a database within a relational database system
➤ Identified by a name, which must be lowercase
  ➤ The name is used when indexing, searching, updating and deleting documents within the index
➤ You can define as many indexes as you want within a cluster

TYPE
➤ Represents a class/category of similar documents, e.g. "user"
➤ Consists of a name and a mapping
➤ Simplified, you can think of a type as a table within a relational database
➤ An index can have one or more types defined, each with its own mapping
➤ Stored within a metadata field named _type because Lucene has no concept of document types
➤ Searching for specific document types applies a filter on this field
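
The idea that a type is just a value in the _type metadata field can be sketched in plain Python. This is an illustration only: the sample documents and the helper search_by_type are made up for this example and are not part of any Elasticsearch API.

```python
# Hypothetical documents, each carrying its type in a _type field.
docs = [
    {"_type": "user", "name": "Alice"},
    {"_type": "product", "name": "Laptop"},
    {"_type": "user", "name": "Bob"},
]


def search_by_type(documents, doc_type):
    # Restricting a search to one type amounts to a filter on the _type field.
    return [d for d in documents if d["_type"] == doc_type]


users = search_by_type(docs, "user")
user_names = [d["name"] for d in users]
```

All documents live side by side in the same collection; only the filter on _type separates users from products.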

MAPPING
➤ Similar to a database schema for a table in a relational database
➤ Describes the fields that a document of a given type may have
  ➤ Includes the data type for each field, e.g. string, integer, date, ...
➤ Also includes information on how fields should be indexed and stored by Lucene
➤ Dynamic mapping means that defining a mapping explicitly is optional
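
Dynamic mapping can be pictured as guessing a field type from the first value seen for each field. The sketch below is a loose imitation under simplifying assumptions: the helper infer_mapping is hypothetical, the type names are illustrative, and the real detection rules (for example for dates and nested objects) are more detailed.

```python
def infer_mapping(doc):
    """Guess a type for each field, loosely imitating dynamic mapping."""

    def field_type(value):
        # bool must be checked before int, because bool is a subclass of int.
        if isinstance(value, bool):
            return "boolean"
        if isinstance(value, int):
            return "long"
        if isinstance(value, float):
            return "double"
        # Simplification: everything else is treated as a string in this sketch.
        return "string"

    return {"properties": {name: {"type": field_type(value)} for name, value in doc.items()}}


mapping = infer_mapping({"name": "Alice", "age": 30, "active": True})
```

Defining the mapping explicitly would mean writing such a structure by hand instead of letting it be inferred.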

DOCUMENT
➤ A basic unit of information that can be indexed
➤ Consists of fields, which are key/value pairs
  ➤ A value can be a string, date, object, etc.
➤ Corresponds to an object in an object-oriented programming language
  ➤ A document can be a single user, order, product, etc.
➤ Documents are expressed in JSON
➤ You can store as many documents within an index as you want
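
As a small illustration, a document can be modelled as a Python dictionary and expressed in JSON with the standard json module. The field names and values below are invented for this example.

```python
import json

# A hypothetical "user" document: fields are key/value pairs,
# and a value can itself be an object (here: address).
user = {
    "name": "Alice",
    "registered": "2015-06-01",
    "address": {"city": "Copenhagen", "country": "Denmark"},
}

as_json = json.dumps(user, indent=2)  # the JSON text that would be sent for indexing
restored = json.loads(as_json)        # parsing it back yields the same structure
```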

SHARDS
➤ An index can be divided into multiple pieces called shards
  ➤ Useful if an index contains more data than the hardware of a node can store (e.g. 1 TB of data on a 500 GB disk)
➤ A shard is a fully functional and independent index
➤ Can be stored on any node in a cluster
➤ The number of shards can be specified when creating an index
➤ Allows horizontal scaling by content volume (index space)
➤ Allows operations to be distributed and parallelized across shards, which increases performance
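
How documents end up spread across shards can be sketched as a hash of a routing value (typically the document ID) taken modulo the number of primary shards. The code below is a simplified illustration: pick_shard is a hypothetical helper, and CRC32 is only a stand-in for the hash function Elasticsearch actually uses.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    # Stand-in hash: CRC32 is used here for illustration only.
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % number_of_primary_shards


# Distribute a few hypothetical document IDs over 5 primary shards.
assignments = {doc_id: pick_shard(doc_id, 5) for doc_id in ["1", "2", "3", "42"]}
```

Because the result depends on the number of primary shards, the same document would map to a different shard if that number changed, which is one reason the shard count is chosen when the index is created.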

REPLICAS
➤ A replica is a copy of a shard
➤ Provides high availability in case a shard or node fails
  ➤ A replica never resides on the same node as the original shard
➤ Allows scaling search volume, because search queries can be executed on all replicas in parallel
➤ By default, Elasticsearch adds 5 primary shards and 1 replica for each index
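
The rule that a replica never shares a node with its original shard can be sketched with a simple round-robin placement. This is a toy allocator under stated assumptions, not the allocation algorithm Elasticsearch uses; allocate and the node names are hypothetical.

```python
def allocate(nodes, primaries=5, replicas=1):
    """Place each shard copy on a different node (toy round-robin allocator)."""
    if replicas >= len(nodes):
        # Simplification: this sketch refuses to run when copies of a shard
        # would have to share a node.
        raise ValueError("need more nodes than replicas per shard")
    layout = {node: [] for node in nodes}
    for shard in range(primaries):
        for copy in range(replicas + 1):
            # Offsetting by the copy number keeps copies of one shard apart.
            node = nodes[(shard + copy) % len(nodes)]
            role = "primary" if copy == 0 else "replica"
            layout[node].append((shard, role))
    return layout


layout = allocate(["node-1", "node-2", "node-3"])
total_copies = sum(len(shards) for shards in layout.values())
```

With the defaults named above (5 primary shards, 1 replica each), total_copies is 10: five primaries plus five replicas, and no node holds two copies of the same shard.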

THANK YOU FOR WATCHING!
