dl.ebooksworld.irdl.ebooksworld.ir/motoman/packt.elasticsearch.server.3rd.edition.w… · table of...

www.EBooksWorld.ir

Post on 02-Aug-2020

Page 1: dl.ebooksworld.irdl.ebooksworld.ir/motoman/Packt.Elasticsearch.Server.3rd.Edition.w… · Table of Contents Elasticsearch Server Third Edition Credits About the Authors About the

Elasticsearch Server Third Edition

Table of Contents

Elasticsearch Server Third Edition

Credits

About the Authors

About the Reviewer

www.PacktPub.com

eBooks, discount offers, and more

Why subscribe?

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Getting Started with Elasticsearch Cluster

Full text searching

The Lucene glossary and architecture

Input data analysis

Indexing and querying

Scoring and query relevance

The basics of Elasticsearch

Key concepts of Elasticsearch

Index

Document

Document type

Mapping

Key concepts of the Elasticsearch infrastructure

Nodes and clusters

Shards

Replicas

Gateway

Indexing and searching

Installing and configuring your cluster

Installing Java

Installing Elasticsearch

Running Elasticsearch

Shutting down Elasticsearch

The directory layout

Configuring Elasticsearch

The system-specific installation and configuration

Installing Elasticsearch on Linux

Installing Elasticsearch using RPM packages

Installing Elasticsearch using the DEB package

Elasticsearch configuration file localization

Configuring Elasticsearch as a system service on Linux

Elasticsearch as a system service on Windows

Manipulating data with the REST API

Understanding the REST API

Storing data in Elasticsearch

Creating a new document

Automatic identifier creation

Retrieving documents

Updating documents

Dealing with non-existing documents

Adding partial documents

Deleting documents

Versioning

Usage example

Versioning from external systems

Searching with the URI request query

Sample data

URI search

Elasticsearch query response

Query analysis

URI query string parameters

The query

The default search field

Analyzer

The default operator property

Query explanation

The fields returned

Sorting the results

The search timeout

The results window

Limiting per-shard results

Ignoring unavailable indices

The search type

Lowercasing term expansion

Wildcard and prefix analysis

Lucene query syntax

Summary

2. Indexing Your Data

Elasticsearch indexing

Shards and replicas

Write consistency

Creating indices

Altering automatic index creation

Settings for a newly created index

Index deletion

Mappings configuration

Type determining mechanism

Disabling the type determining mechanism

Tuning the type determining mechanism for numeric types

Tuning the type determining mechanism for dates

Index structure mapping

Type and types definition

Fields

Core types

Common attributes

String

Number

Boolean

Binary

Date

Multi fields

The IP address type

Token count type

Using analyzers

Out-of-the-box analyzers

Defining your own analyzers

Default analyzers

Different similarity models

Setting per-field similarity

Available similarity models

Configuring default similarity

Configuring BM25 similarity

Configuring DFR similarity

Configuring IB similarity

Batch indexing to speed up your indexing process

Preparing data for bulk indexing

Indexing the data

The _all field

The _source field

Additional internal fields

Introduction to segment merging

Segment merging

The need for segment merging

The merge policy

The merge scheduler

Throttling

Introduction to routing

Default indexing

Default searching

Routing

The routing parameters

Routing fields

Summary

3. Searching Your Data

Querying Elasticsearch

The example data

A simple query

Paging and result size

Returning the version value

Limiting the score

Choosing the fields that we want to return

Source filtering

Using the script fields

Passing parameters to the script fields

Understanding the querying process

Query logic

Search type

Search execution preference

Search shards API

Basic queries

The term query

The terms query

The match all query

The type query

The exists query

The missing query

The common terms query

The match query

The Boolean match query

The phrase match query

The match phrase prefix query

The multi match query

The query string query

Running the query string query against multiple fields

The simple query string query

The identifiers query

The prefix query

The fuzzy query

The wildcard query

The range query

Regular expression query

The more like this query

Compound queries

The bool query

The dis_max query

The boosting query

The constant_score query

The indices query

Using span queries

A span

Span term query

Span first query

Span near query

Span or query

Span not query

Span within query

Span containing query

Span multi query

Performance considerations

Choosing the right query

The use cases

Limiting results to given tags

Searching for values in a range

Boosting some of the matched documents

Ignoring lower scoring partial queries

Using Lucene query syntax in queries

Handling user queries without errors

Autocomplete using prefixes

Finding terms similar to a given one

Matching phrases

Spans, spans everywhere

Summary

4. Extending Your Querying Knowledge

Filtering your results

The context is the key

Explicit filtering with bool query

Highlighting

Getting started with highlighting

Field configuration

Under the hood

Forcing highlighter type

Configuring HTML tags

Controlling highlighted fragments

Global and local settings

Require matching

Custom highlighting query

The Postings highlighter

Validating your queries

Using the Validate API

Sorting data

Default sorting

Selecting fields used for sorting

Sorting mode

Specifying behavior for missing fields

Dynamic criteria

Calculate scoring when sorting

Query rewrite

Prefix query as an example

Getting back to Apache Lucene

Query rewrite properties

Summary

5. Extending Your Index Structure

Indexing tree-like structures

Data structure

Analysis

Indexing data that is not flat

Data

Objects

Arrays

Mappings

Final mappings

Sending the mappings to Elasticsearch

To be or not to be dynamic

Disabling object indexing

Using nested objects

Scoring and nested queries

Using the parent-child relationship

Index structure and data indexing

Child mappings

Parent mappings

The parent document

Child documents

Querying

Querying data in the child documents

Querying data in the parent documents

Performance considerations

Modifying your index structure with the update API

The mappings

Adding a new field to the existing index

Modifying fields of an existing index

Summary

6. Make Your Search Better

Introduction to Apache Lucene scoring

When a document is matched

Default scoring formula

Relevancy matters

Scripting capabilities of Elasticsearch

Objects available during script execution

Script types

In file scripts

Inline scripts

Indexed scripts

Querying with scripts

Scripting with parameters

Script languages

Using other than embedded languages

Using native code

The factory implementation

Implementing the native script

The plugin definition

Installing the plugin

Running the script

Searching content in different languages

Handling languages differently

Handling multiple languages

Detecting the language of the document

Sample document

The mappings

Querying

Queries with an identified language

Queries with an unknown language

Combining queries

Influencing scores with query boosts

The boost

Adding the boost to queries

Modifying the score

Constant score query

Boosting query

The function score query

Structure of the function query

The weight factor function

Field value factor function

The script score function

The random score function

Decay functions

When does index-time boosting make sense?

Defining boosting in the mappings

Words with the same meaning

Synonym filter

Synonyms in the mappings

Synonyms stored on the file system

Defining synonym rules

Using Apache Solr synonyms

Explicit synonyms

Equivalent synonyms

Expanding synonyms

Using WordNet synonyms

Query or index-time synonym expansion

Understanding the explain information

Understanding field analysis

Explaining the query

Summary

7. Aggregations for Data Analysis

Aggregations

General query structure

Inside the aggregations engine

Aggregation types

Metrics aggregations

Minimum, maximum, average, and sum

Missing values

Using scripts

Field value statistics and extended statistics

Value count

Field cardinality

Percentiles

Percentile ranks

Top hits aggregation

Additional parameters

Geo bounds aggregation

Scripted metrics aggregation

Buckets aggregations

Filter aggregation

Filters aggregation

Terms aggregation

Counts are approximate

Minimum document count

Range aggregation

Keyed buckets

Date range aggregation

IPv4 range aggregation

Missing aggregation

Histogram aggregation

Date histogram aggregation

Time zones

Geo distance aggregations

Geohash grid aggregation

Global aggregation

Significant terms aggregation

Choosing significant terms

Multiple value analysis

Sampler aggregation

Children aggregation

Nested aggregation

Reverse nested aggregation

Nesting aggregations and ordering buckets

Buckets ordering

Pipeline aggregations

Available types

Referencing other aggregations

Gaps in the data

Pipeline aggregation types

Min, max, sum, and average bucket aggregations

Cumulative sum aggregation

Bucket selector aggregation

Bucket script aggregation

Serial differencing aggregation

Derivative aggregation

Moving avg aggregation

Predicting future buckets

The models

Summary

8. Beyond Full-text Searching

Percolator

The index

Percolator preparation

Getting deeper

Controlling the size of returned results

Percolator and score calculation

Combining percolators with other functionalities

Getting the number of matching queries

Indexed document percolation

Elasticsearch spatial capabilities

Mapping preparation for spatial searches

Example data

Additional geo_field properties

Sample queries

Distance-based sorting

Bounding box filtering

Limiting the distance

Arbitrary geo shapes

Point

Envelope

Polygon

Multipolygon

An example usage

Storing shapes in the index

Using suggesters

Available suggester types

Including suggestions

Suggester response

Term suggester

Term suggester configuration options

Additional term suggester options

Phrase suggester

Configuration

Completion suggester

Indexing data

Querying indexed completion suggester data

Custom weights

Context suggester

Context types

Using context

Using the geo location context

The Scroll API

Problem definition

Scrolling to the rescue

Summary

9. Elasticsearch Cluster in Detail

Understanding node discovery

Discovery types

Node roles

Master node

Data node

Client node

Configuring node roles

Setting the cluster’s name

Zen discovery

Master election configuration

Configuring unicast

Fault detection ping settings

Cluster state updates control

Dealing with master unavailability

Adjusting HTTP transport settings

Disabling HTTP

HTTP port

HTTP host

The gateway and recovery modules

The gateway

Recovery control

Additional gateway recovery options

Indices recovery API

Delayed allocation

Index recovery prioritization

Templates and dynamic templates

Templates

An example of a template

Dynamic templates

The matching pattern

Field definitions

Elasticsearch plugins

The basics

Installing plugins

Removing plugins

Elasticsearch caches

Fielddata cache

Fielddata size

Circuit breakers

Fielddata and doc values

Shard request cache

Enabling and configuring the shard request cache

Per request shard request cache disabling

Shard request cache usage monitoring

Node query cache

Indexing buffers

When caches should be avoided

The update settings API

The cluster settings API

The indices settings API

Summary

10. Administrating Your Cluster

Elasticsearch time machine

Creating a snapshot repository

Creating snapshots

Additional parameters

Restoring a snapshot

Cleaning up – deleting old snapshots

Monitoring your cluster’s state and health

Cluster health API

Controlling information details

Additional parameters

Indices stats API

Docs

Store

Indexing, get, and search

Additional information

Nodes info API

Returned information

Nodes stats API

Cluster state API

Cluster stats API

Pending tasks API

Indices recovery API

Indices shard stores API

Indices segments API

Controlling the shard and replica allocation

Explicitly controlling allocation

Specifying node parameters

Configuration

Index creation

Excluding nodes from allocation

Requiring node attributes

Using the IP address for shard allocation

Disk-based shard allocation

Configuring disk based shard allocation

Disabling disk based shard allocation

The number of shards and replicas per node

Allocation throttling

Cluster-wide allocation

Allocation awareness

Forcing allocation awareness

Filtering

What do include, exclude, and require mean

Manually moving shards and replicas

Moving shards

Canceling shard allocation

Forcing shard allocation

Multiple commands per HTTP request

Allowing operations on primary shards

Handling rolling restarts

Controlling cluster rebalancing

Understanding rebalance

Cluster being ready

The cluster rebalance settings

Controlling when rebalancing will be allowed

Controlling the number of shards being moved between nodes concurrently

Controlling which shards may be rebalanced

The Cat API

The basics

Using Cat API

Common arguments

The examples

Getting information about the master node

Getting information about the nodes

Retrieving recovery information for an index

Warming up

Defining a new warming query

Retrieving the defined warming queries

Deleting a warming query

Disabling the warming up functionality

Choosing queries for warming

Index aliasing and using it to simplify your everyday work

An alias

Creating an alias

Modifying aliases

Combining commands

Retrieving aliases

Removing aliases

Filtering aliases

Aliases and routing

Zero downtime reindexing and aliases

Summary

11. Scaling by Example

Hardware

Physical servers or a cloud

CPU

RAM memory

Mass storage

The network

How many servers

Cost cutting

Preparing a single Elasticsearch node

The general preparations

Avoiding swapping

File descriptors

Virtual memory

The memory

Field data cache and breaking the circuit

Use doc values

RAM buffer for indexing

Index refresh rate

Thread pools

Horizontal expansion

Automatically creating the replicas

Redundancy and high availability

Cost and performance flexibility

Continuous upgrades

Multiple Elasticsearch instances on a single physical machine

Preventing a shard and its replicas from being on the same node

Designated node roles for larger clusters

Query aggregator nodes

Data nodes

Master eligible nodes

Preparing the cluster for high indexing and querying throughput

Indexing related advice

Index refresh rate

Thread pools tuning

Automatic store throttling

Handling time-based data

Multiple data paths

Data distribution

Bulk indexing

RAM buffer for indexing

Advice for high query rate scenarios

Shard request cache

Think about the queries

Parallelize your queries

Field data cache and breaking the circuit

Keep size and shard size under control

Monitoring

Elasticsearch HQ

Marvel

SPM for Elasticsearch

Summary

Index

Elasticsearch Server Third Edition

Elasticsearch Server Third Edition

Copyright © 2016 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: October 2013

Second edition: February 2015

Third edition: February 2016

Production reference: 1230216

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78588-881-6

www.packtpub.com

Credits

Authors

Rafał Kuć

Marek Rogoziński

Reviewer

Paige Cook

Commissioning Editor

Nadeem Bagban

Acquisition Editor

Divya Poojari

Content Development Editor

Kirti Patil

Technical Editor

Utkarsha S. Kadam

Copy Editor

Alpha Singh

Project Coordinator

Nidhi Joshi

Proofreader

Safis Editing

Indexer

Rekha Nair

Graphics

Jason Monteiro

Production Coordinator

Manu Joseph

Cover Work

Manu Joseph

About the Authors

Rafał Kuć is a software engineer, trainer, speaker and consultant. He is working as a consultant and software engineer at Sematext Group Inc. where he concentrates on open source technologies such as Apache Lucene, Solr, and Elasticsearch. He has more than 14 years of experience in various software domains, from banking software to e-commerce products. He is mainly focused on Java; however, he is open to every tool and programming language that might help him to achieve his goals easily and quickly. Rafał is also one of the founders of the solr.pl site, where he tries to share his knowledge and help people solve their Solr and Lucene problems. He is also a speaker at various conferences around the world such as Lucene Eurocon, Berlin Buzzwords, ApacheCon, Lucene/Solr Revolution, Velocity, and DevOps Days.

Rafał began his journey with Lucene in 2002; however, it wasn’t love at first sight. When he came back to Lucene in late 2003, he revised his thoughts about the framework and saw the potential in search technologies. Then Solr came and that was it. He started working with Elasticsearch in the middle of 2010. At present, Lucene, Solr, Elasticsearch, and information retrieval are his main areas of interest.

Rafał is also the author of the Solr Cookbook series, ElasticSearch Server and its second edition, and the first and second editions of Mastering ElasticSearch, all published by Packt Publishing.

Marek Rogoziński is a software architect and consultant with more than 10 years of experience. His specialization concerns solutions based on open source search engines, such as Solr and Elasticsearch, and the software stack for big data analytics including Hadoop, Hbase, and Twitter Storm.

He is also a cofounder of the solr.pl site, which publishes information and tutorials about Solr and Lucene libraries. He is the coauthor of ElasticSearch Server and its second edition, and the first and second editions of Mastering ElasticSearch, all published by Packt Publishing.

He is currently the chief technology officer and lead architect at ZenCard, a company that processes and analyzes large quantities of payment transactions in real time, allowing automatic and anonymous identification of retail customers on all retailer channels (m-commerce/e-commerce/brick & mortar) and giving retailers a customer retention and loyalty tool.

About the Reviewer

Paige Cook works as a software architect for Videa, part of the Cox Family of Companies, and lives near Atlanta, Georgia. He has twenty years of experience in software development, primarily with the Microsoft .NET Framework. His career has been largely focused on building enterprise solutions for the media and entertainment industry. He is especially interested in search technologies using the Apache Lucene search engine and has experience with both Elasticsearch and Apache Solr. Apart from his work, he enjoys DIY home projects and spending time with his wife and two daughters.

www.PacktPub.com

eBooks, discount offers, and more

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt’s online digital book library. Here, you can search, access, and read Packt’s entire library of books.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Preface

Welcome to Elasticsearch Server, Third Edition. This is the third instalment of the book dedicated to yet another major release of Elasticsearch, this time version 2.2. In the third edition, we have decided to take a route similar to the one we took when we wrote the second edition of the book. We not only updated the content to match the new version of Elasticsearch, but also restructured the book by removing and adding new sections and chapters. We read the suggestions we got from you, the readers of the book, and we carefully tried to incorporate the suggestions and comments received since the release of the first and second editions.

While reading this book, you will be taken on a journey to the wonderful world of full-text search provided by the Elasticsearch server. We will start with a general introduction to Elasticsearch, which covers how to start and run Elasticsearch, its basic concepts, and how to index and search your data in the most basic way. This book will also discuss the query language, the so-called Query DSL, that allows you to create complicated queries and filter returned results. In addition to all of this, you’ll see how you can use the aggregation framework to calculate aggregated data based on the results returned by your queries. We will implement the autocomplete functionality together and learn how to use Elasticsearch spatial capabilities and prospective search.

Finally, this book will show you Elasticsearch’s administration API capabilities with features such as shard placement control, cluster handling, and more, ending with a dedicated chapter that will discuss Elasticsearch’s preparation for small and large deployments, both ones that concentrate on indexing and ones that concentrate on querying.

What this book covers

Chapter 1, Getting Started with Elasticsearch Cluster, covers what full-text searching is, what Apache Lucene is, what text analysis is, how to run and configure Elasticsearch, and finally, how to index and search your data in the most basic way.

Chapter 2, Indexing Your Data, shows how indexing works, how to prepare index structure, what data types we are allowed to use, how to speed up indexing, what segments are, how merging works, and what routing is.

Chapter 3, Searching Your Data, introduces the full-text search capabilities of Elasticsearch by discussing how to query it, how the querying process works, and what types of basic and compound queries are available. In addition to this, we will show how to use position-aware queries in Elasticsearch.

Chapter 4, Extending Your Querying Knowledge, shows how to efficiently narrow down your search results by using filters, how highlighting works, how to sort your results, and how query rewrite works.

Chapter 5, Extending Your Index Structure, shows how to index more complex data structures. We learn how to index tree-like data types, how to index data with relationships between documents, and how to modify index structure.

Chapter 6, Make Your Search Better, covers Apache Lucene scoring and how to influence it in Elasticsearch, the scripting capabilities of Elasticsearch, and its language analysis capabilities.

Chapter 7, Aggregations for Data Analysis, introduces you to the great world of data analysis by showing you how to use the Elasticsearch aggregation framework. We will discuss all types of aggregations: metrics, buckets, and the new pipeline aggregations that have been introduced in Elasticsearch.

Chapter 8, Beyond Full-text Searching, discusses non full-text search-related functionalities such as the percolator (reversed search) and the geo-spatial capabilities of Elasticsearch. This chapter also discusses suggesters, which allow us to build a spellchecking functionality and an efficient autocomplete mechanism, and we will show how to handle deep-paging efficiently.

Chapter 9, Elasticsearch Cluster in Detail, discusses the node discovery mechanism, the recovery and gateway Elasticsearch modules, templates, caches, and the settings update API.

Chapter 10, Administrating Your Cluster, covers the Elasticsearch backup functionality, rebalancing, and shards moving. In addition to this, you will learn how to use the warm up functionality, use the Cat API, and work with aliases.

Chapter 11, Scaling by Example, is dedicated to scaling and tuning. We will start with hardware preparations and considerations and a single Elasticsearch node-related tuning. We will go through cluster setup and vertical scaling, ending the chapter with high querying and indexing use cases and cluster monitoring.

What you need for this book

This book was written using Elasticsearch server 2.2 and all the examples and functions should work with this version. In addition to this, you’ll need a command that allows you to send HTTP requests, such as curl, which is available for most operating systems. Please note that all the examples in this book use the previously mentioned curl tool. If you want to use another tool, please remember to format the request in an appropriate way that is understood by the tool of your choice.

In addition to this, some chapters may require additional software, such as Elasticsearch plugins, but when needed it has been explicitly mentioned.


Who this book is for

If you are a beginner to the world of full-text search and Elasticsearch, then this book is especially for you. You will be guided through the basics of Elasticsearch and you will learn how to use some of the advanced functionalities.

If you know Elasticsearch and you have worked with it, then you may find this book interesting as it provides a nice overview of all the functionalities with examples and descriptions. However, you may encounter sections that you already know.

If you know the Apache Solr search engine, this book can also be used to compare some functionalities of Apache Solr and Elasticsearch. This may help you decide which tool is more appropriate for your use case.

If you know all the details about Elasticsearch and you know how each of the configuration parameters works, then this is definitely not the book you are looking for.


Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: “If you use Linux or OS X, the cURL package should already be available.”

A block of code is set as follows:

{
  "mappings": {
    "post": {
      "properties": {
        "id": { "type": "long" },
        "name": { "type": "string" },
        "published": { "type": "date" },
        "contents": { "type": "string" }
      }
    }
  }
}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

{
  "mappings": {
    "post": {
      "properties": {
        "id": { "type": "long" },
        "name": { "type": "string" },
        "published": { "type": "date" },
        "contents": { "type": "string" }
      }
    }
  }
}

Any command-line input or output is written as follows:

curl -XPUT http://localhost:9200/users/?pretty -d '{
  "mappings": {
    "user": {
      "numeric_detection": true
    }
  }
}'

Note

Warnings or important notes appear in a box like this.


Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.


Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.


Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

1. Log in or register to our website using your e-mail address and password.
2. Hover the mouse pointer on the SUPPORT tab at the top.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box.
5. Select the book for which you're looking to download the code files.
6. Choose from the drop-down menu where you purchased this book from.
7. Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux


Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/ElasticsearchServerThirdEdition_ColorImages.pdf


Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, maybe a mistake in the text or the code, we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.


Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.


Questions

If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.


Chapter 1. Getting Started with Elasticsearch Cluster

Welcome to the wonderful world of Elasticsearch, a great full text search and analytics engine. It doesn't matter if you are new to Elasticsearch and full text searches in general, or if you already have some experience in this. We hope that, by reading this book, you'll be able to learn and extend your knowledge of Elasticsearch. As this book is also dedicated to beginners, we decided to start with a short introduction to full text searches in general, and after that, a brief overview of Elasticsearch.

Please remember that Elasticsearch is a rapidly changing piece of software. Not only are features added, but the Elasticsearch core functionality is also constantly evolving and changing. We try to keep up with these changes, and because of this we are giving you the third edition of the book dedicated to Elasticsearch 2.x.

The first thing we need to do with Elasticsearch is install and configure it. With many applications, you start with the installation and configuration and usually forget the importance of these steps. We will try to guide you through these steps so that it becomes easier to remember. In addition to this, we will show you the simplest way to index and retrieve data without going into too much detail. The first chapter will take you on a quick ride through Elasticsearch and the full text search world. By the end of this chapter, you will have learned the following topics:

Full text searching
The basics of Apache Lucene
Performing text analysis
The basic concepts of Elasticsearch
Installing and configuring Elasticsearch
Using the Elasticsearch REST API to manipulate data
Searching using basic URI requests


Full text searching

Back in the days when full text searching was a term known to a small percentage of engineers, most of us used SQL databases to perform search operations. Using SQL databases to search for the data stored in them was okay to some extent. Such a search wasn't fast, especially on large amounts of data. Even now, small applications are usually good with a standard LIKE %phrase% search in a SQL database. However, as we go deeper and deeper, we start to see the limits of such an approach: a lack of scalability, not enough flexibility, and a lack of language analysis. Of course, there are additional modules that extend SQL databases with full text search capabilities, but they are still limited compared to dedicated full text search libraries and search engines such as Elasticsearch. Some of those reasons led to the creation of Apache Lucene (http://lucene.apache.org/), a library written completely in Java (http://java.com/en/), which is very fast, light, and provides language analysis for a large number of languages spoken throughout the world.


The Lucene glossary and architecture

Before going into the details of the analysis process, we would like to introduce you to the glossary and overall architecture of Apache Lucene. We decided that this information is crucial for understanding how Elasticsearch works, and even though the book is not about Apache Lucene, knowing the foundation of the Elasticsearch analytics and indexing engine is vital to fully understand how this great search engine works.

The basic concepts of the mentioned library are as follows:

Document: This is the main data carrier used during indexing and searching, comprising one or more fields that contain the data we put in and get from Lucene.
Field: This is a section of the document, which is built of two parts: the name and the value.
Term: This is a unit of search representing a word from the text.
Token: This is an occurrence of a term in the text of the field. It consists of the term text, start and end offsets, and a type.

Apache Lucene writes all the information to a structure called the inverted index. It is a data structure that maps the terms in the index to the documents and not the other way around, as a relational database does in its tables. You can think of an inverted index as a data structure where data is term-oriented rather than document-oriented. Let's see how a simple inverted index will look. For example, let's assume that we have documents with only a single field called title to be indexed, and the values of that field are as follows:

Elasticsearch Server (document 1)
Mastering Elasticsearch Second Edition (document 2)
Apache Solr Cookbook Third Edition (document 3)

A very simplified visualization of the Lucene inverted index could look as follows:

Each term points to the number of documents it is present in. For example, the term edition is present twice, in the second and third documents. Such a structure allows for very efficient and fast search operations in term-based queries (but not exclusively). Because the occurrences of the term are connected to the terms themselves, Lucene can use information about the term occurrences to perform fast and precise scoring, giving each returned document a value that represents how well it matched the query.
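The term-oriented structure described above can be sketched in a few lines of Python. This is only an illustration built on the three example titles, assuming a naive analysis step (split on whitespace and lowercase); the function name build_inverted_index is made up for this sketch, and Lucene's real on-disk format is far more elaborate.

```python
from collections import defaultdict

# Titles from the example above, keyed by document number.
titles = {
    1: "Elasticsearch Server",
    2: "Mastering Elasticsearch Second Edition",
    3: "Apache Solr Cookbook Third Edition",
}

def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: whitespace splitting plus lowercasing stands in for analysis.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

inverted_index = build_inverted_index(titles)

# Print term, document count, and the documents the term points to.
for term in sorted(inverted_index):
    doc_ids = inverted_index[term]
    print(f"{term:<15} count={len(doc_ids)} docs={doc_ids}")
```

Looking up a term is then a single dictionary access, which is why term-based queries are cheap on such a structure.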

Of course, the actual index created by Lucene is much more complicated and advanced because of additional files that include information such as term vectors (per document inverted index), doc values (column oriented field information), stored fields (the original and not the analyzed value of the field), and so on. However, all you need to know for now is how the data is organized and not what exactly is stored.

Each index is divided into multiple write-once and read-many-time structures called segments. Each segment is a miniature Apache Lucene index on its own. When indexing, after a single segment is written to the disk it can't be updated, or we should rather say it can't be fully updated; documents can't be removed from it, they can only be marked as deleted in a separate file. The reason that Lucene doesn't allow segments to be updated is the nature of the inverted index. After the fields are analyzed and put into the inverted index, there is no easy way of building the original document structure. When deleting, Lucene would have to delete the information from the segment, which translates to updating all the information within the inverted index itself.

Because segments are write-once structures, Lucene is able to merge them together in a process called segment merging. During indexing, if Lucene thinks that there are too many segments falling into the same criterion, a new and bigger segment will be created, one that will have data from the other segments. During that process, Lucene will try to remove deleted data and get back the space needed to hold information about those documents. Segment merging is a demanding operation both in terms of I/O and CPU. What we have to remember for now is that searching with one large segment is faster than searching with multiple smaller ones holding the same data. That's because, in general, searching translates to just matching the query terms to the ones that are indexed. You can imagine how searching through multiple small segments and merging those results will be slower than having a single segment preparing the results.
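To make the write-once idea more concrete, here is a toy model in Python. It is a sketch under simplifying assumptions, not Lucene's implementation: the Segment class and the merge function are hypothetical names, a segment is just a dictionary of documents, and the "separate file" of deletions is a plain set.

```python
class Segment:
    """A write-once set of documents plus a separate, mutable deletion list."""

    def __init__(self, docs):
        self._docs = dict(docs)   # never modified after creation
        self.deleted = set()      # plays the role of the separate "deleted" file

    def delete(self, doc_id):
        # The stored data is untouched; the document is only marked as deleted.
        if doc_id in self._docs:
            self.deleted.add(doc_id)

    def live_docs(self):
        return {i: d for i, d in self._docs.items() if i not in self.deleted}

    def size(self):
        # Space still occupied, including documents marked as deleted.
        return len(self._docs)


def merge(segments):
    """Build one new, bigger segment that drops documents marked as deleted."""
    merged = {}
    for segment in segments:
        merged.update(segment.live_docs())
    return Segment(merged)


seg_a = Segment({1: "Elasticsearch Server", 2: "Mastering Elasticsearch"})
seg_b = Segment({3: "Apache Solr Cookbook"})
seg_a.delete(2)

merged = merge([seg_a, seg_b])
print(seg_a.size(), seg_b.size(), merged.size())
```

In this model, the deleted document keeps occupying space in its segment until a merge writes a new segment without it, which mirrors the space reclamation described above.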


Input data analysis

The transformation of a document that comes to Lucene and is processed and put into the inverted index format is called indexation. One of the things Lucene has to do during this is data analysis. You may want some of your fields to be processed by a language analyzer so that words such as car and cars are treated as the same by your index. On the other hand, you may want other fields to be divided only on the whitespace character or be only lowercased.

Analysis is done by the analyzer, which is built of a tokenizer and zero or more token filters, and it can also have zero or more character mappers.

A tokenizer in Lucene is used to split the text into tokens, which are basically the terms with additional information such as their position in the original text and their length. The result of the tokenizer's work is called a token stream, where the tokens are put one by one and are ready to be processed by the filters.

Apart from the tokenizer, the Lucene analyzer is built of zero or more token filters that are used to process tokens in the token stream. Some examples of filters are as follows:

Lowercase filter: Makes all the tokens lowercased
Synonyms filter: Changes one token to another on the basis of synonym rules
Language stemming filters: Responsible for reducing tokens (actually, the text part that they provide) into their root or base forms called the stem (https://en.wikipedia.org/wiki/Word_stem)

Filters are processed one after another, so we have almost unlimited analytical possibilities with the addition of multiple filters, one after another.

Finally, the character mappers operate on non-analyzed text; they are used before the tokenizer. Therefore, we can easily remove HTML tags from whole parts of text without worrying about tokenization.
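The order of the analysis steps (character mappers, then the tokenizer, then token filters one after another) can be illustrated with a small Python sketch. Everything here is a simplified stand-in invented for the illustration: the function names are hypothetical, the HTML stripping is a crude regular expression, and the "stemmer" only removes a trailing s, which real language stemmers do not limit themselves to.

```python
import re

def strip_html(text):
    """Character mapper: runs on the raw text, before tokenization."""
    return re.sub(r"<[^>]+>", " ", text)

def whitespace_tokenizer(text):
    """Tokenizer: returns (term, start_offset, end_offset) tuples.

    Offsets refer to the text after character mapping, a simplification.
    """
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def lowercase_filter(tokens):
    return [(term.lower(), start, end) for term, start, end in tokens]

def plural_stem_filter(tokens):
    """Toy stemmer: strips a trailing 's' from terms longer than three characters."""
    return [
        (term[:-1] if term.endswith("s") and len(term) > 3 else term, start, end)
        for term, start, end in tokens
    ]

def analyze(text, char_mappers, tokenizer, token_filters):
    for mapper in char_mappers:
        text = mapper(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:  # filters run one after another, in order
        tokens = token_filter(tokens)
    return tokens

tokens = analyze(
    "<b>Fast</b> Cars",
    char_mappers=[strip_html],
    tokenizer=whitespace_tokenizer,
    token_filters=[lowercase_filter, plural_stem_filter],
)
print([term for term, _, _ in tokens])
```

Swapping the filter list changes the resulting terms without touching the tokenizer, which is the flexibility the text refers to.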


Indexing and querying

You may wonder how all the information we've described so far affects indexing and querying when using Lucene and all the software that is built on top of it. During indexing, Lucene will use an analyzer of your choice to process the contents of your document; of course, different analyzers can be used for different fields, so the name field of your document can be analyzed differently compared to the summary field. For example, the name field may only be tokenized on whitespaces and lowercased, so that exact matches are done, while the summary field is stemmed in addition to that. We can also decide to not analyze the fields at all. We have full control over the analysis process.

During a query, your query text can be analyzed as well. However, you can also choose not to analyze your queries. This is crucial to remember because some Elasticsearch queries are analyzed and some are not. For example, prefix and term queries are not analyzed, and match queries are analyzed (we will get to that in Chapter 3, Searching Your Data). Having queries that are analyzed and not analyzed is very useful; sometimes, you may want to query a field that is not analyzed, while sometimes you may want to have a full text search analysis. For example, if we search for the LightRed term and the query is being analyzed by the standard analyzer, then the terms that would be searched are light and red. If we use a query type that is not analyzed, then we will explicitly search for the LightRed term. We may not want to analyze the content of the query if we are only interested in exact matches.

What you should remember about indexing and querying analysis is that the terms in the index should match the terms in the query. If they don't match, Lucene won't return the desired documents. For example, if you use stemming and lowercasing during indexing, you need to ensure that the terms in the query are also lowercased and stemmed, or your queries won't return any results at all. For example, let's get back to our LightRed term that we analyzed during indexing; we have it as two terms in the index: light and red. If we run a LightRed query against that data and don't analyze it, we won't get the document in the results, because the query term does not match the indexed terms. It is important to keep the token filters in the same order during indexing and query time analysis so that the terms resulting from such an analysis are the same.
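The mismatch between analyzed index terms and a non-analyzed query can be reproduced with a tiny Python model. The analyzer below is an assumed stand-in that only splits on whitespace and lowercases, so the example uses the two-word input "Light Red"; the function names are hypothetical and do not correspond to any Elasticsearch API.

```python
def simple_analyzer(text):
    """Stand-in analyzer: split on whitespace and lowercase each term."""
    return text.lower().split()

# Index time: the field value is analyzed, so two terms end up in the index.
indexed_terms = set(simple_analyzer("Light Red"))

def analyzed_query_matches(query_text):
    # Simplification: every analyzed query term must be among the indexed terms.
    return all(term in indexed_terms for term in simple_analyzer(query_text))

def non_analyzed_query_matches(query_text):
    # The query text is looked up verbatim, as one single term.
    return query_text in indexed_terms

print(analyzed_query_matches("Light Red"))      # analyzed the same way as the index
print(non_analyzed_query_matches("Light Red"))  # verbatim term is not in the index
print(non_analyzed_query_matches("light"))      # verbatim term happens to be indexed
```

The second call finds nothing because the index holds light and red, not the original text, which is exactly the situation the paragraph above warns about.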


Scoring and query relevance

There is one additional thing that we have only mentioned once till now: scoring. What is the score of a document? The score is a result of a scoring formula that describes how well the document matches the query. By default, Apache Lucene uses the TF/IDF (term frequency/inverse document frequency) scoring mechanism, which is an algorithm that calculates how relevant the document is in the context of our query. Of course, it is not the only algorithm available, and we will mention other algorithms in the Mappings configuration section of Chapter 2, Indexing Your Data.

Note

If you want to read more about the Apache Lucene TF/IDF scoring formula, please visit the Apache Lucene Javadocs for the TFIDFSimilarity class, available at http://lucene.apache.org/core/5_4_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
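As a rough intuition for TF/IDF, the following Python sketch scores the three example titles against a query using a textbook variant: term frequency multiplied by the logarithm of (number of documents / document frequency). This is deliberately not Lucene's formula, which uses differently shaped factors and adds normalization and boosts; the function name tf_idf_score and the lowercased sample data are assumptions made for the illustration.

```python
import math
from collections import Counter

docs = {
    1: "elasticsearch server",
    2: "mastering elasticsearch second edition",
    3: "apache solr cookbook third edition",
}

def tf_idf_score(query, doc_id, docs):
    """Sum tf * idf over the query terms for one document (textbook variant)."""
    term_counts = Counter(docs[doc_id].split())
    score = 0.0
    for term in query.split():
        tf = term_counts[term]  # how often the term occurs in this document
        df = sum(1 for text in docs.values() if term in text.split())
        if tf == 0 or df == 0:
            continue
        idf = math.log(len(docs) / df)  # rarer terms get a higher weight
        score += tf * idf
    return score

scores = {doc_id: tf_idf_score("solr edition", doc_id, docs) for doc_id in docs}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
print(scores)
```

The document containing the rare term solr ranks above the one that only contains the more common term edition, which is the behavior the inverse document frequency part is meant to produce.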


The basics of Elasticsearch

Elasticsearch is an open source search server project started by Shay Banon and published in February 2010. During this time, the project grew into a major player in the field of search and data analysis solutions and is widely used in many common or lesser-known search and data analysis platforms. In addition, due to its distributed nature and real-time search and analytics capabilities, many organizations use it as a document store.


Key concepts of Elasticsearch

In the next few pages, we will get you through the basic concepts of Elasticsearch. You can skip this section if you are already familiar with Elasticsearch architecture. However, if you are not familiar with Elasticsearch, we strongly advise you to read this section. We will refer to the keywords used in this section in the rest of the book, and understanding those concepts is crucial to fully utilize Elasticsearch.

Index

An index is the logical place where Elasticsearch stores the data. Each index can be spread onto multiple Elasticsearch nodes and is divided into one or more smaller pieces called shards that are physically placed on the hard drives. If you are coming from the relational database world, you can think of an index like a table. However, the index structure is prepared for fast and efficient full text searching and, in particular, does not store original values. That structure is called an inverted index (https://en.wikipedia.org/wiki/Inverted_index).

If you know MongoDB, you can think of the Elasticsearch index as a collection in MongoDB. If you are familiar with CouchDB, you can think about an index as you would about the CouchDB database. Elasticsearch can hold many indices located on one machine or spread them over multiple servers. As we have already said, every index is built of one or more shards, and each shard can have many replicas.

Document

The main entity stored in Elasticsearch is a document. A document can have multiple fields, each having its own type and treated differently. Using the analogy to relational databases, a document is a row of data in a database table. When you compare an Elasticsearch document to a MongoDB document, you will see that both can have different structures. The thing to keep in mind when it comes to Elasticsearch is that fields that are common to multiple types in the same index need to have the same type. This means that all the documents with a field called title need to have the same data type for it, for example, string.

Documents consist of fields, and each field may occur several times in a single document (such a field is called multivalued). Each field has a type (text, number, date, and so on). The field types can also be complex: a field can contain other subdocuments or arrays. The field type is important to Elasticsearch because the type determines how various operations such as analysis or sorting are performed. Fortunately, this can be determined automatically (however, we still suggest using mappings; take a look at what follows).

Unlike the relational databases, documents don't need to have a fixed structure. Every document may have a different set of fields, and in addition to this, fields don't have to be known during application development. Of course, one can force a document structure with the use of a schema. From the client's point of view, a document is a JSON object (see more about the JSON format at https://en.wikipedia.org/wiki/JSON). Each document is stored in one index and has its own unique identifier, which can be generated automatically by Elasticsearch, and a document type. The thing to remember is that the document identifier needs to be unique inside an index for a given type. This means that, in a single index, two documents can have the same identifier if they are not of the same type.
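A short Python sketch can illustrate what such a document looks like from the client's side and how identifiers are scoped. The field names, the blog index, and the article and comment types are all made up for this example, and the dictionary named store is only a stand-in for the cluster, not an Elasticsearch client.

```python
import json

# Hypothetical blog post; all field names are illustrative only.
document = {
    "title": "Getting started",
    "tags": ["elasticsearch", "search"],      # multivalued field
    "published": "2016-01-13",
    "author": {"name": "Jane", "posts": 12},  # field holding a subdocument
}

# In this model, a document is addressed by index, type, and identifier.
store = {}
store[("blog", "article", "1")] = document
store[("blog", "comment", "1")] = {"text": "Nice post"}  # same id, other type

# From the client's point of view, the document travels as a JSON object.
body = json.dumps(document)
print(body)
print(len(store))
```

Both entries survive in the store because the key includes the type, which mirrors the rule that an identifier only has to be unique within a given type of an index.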

Document type

In Elasticsearch, one index can store many objects serving different purposes. For example, a blog application can store articles and comments. The document type lets us easily differentiate between the objects in a single index. Every document can have a different structure, but in real-world deployments, dividing documents into types significantly helps in data manipulation. Of course, one needs to keep the limitations in mind. That is, different document types can't set different types for the same property. For example, a field called title must have the same type across all document types in a given index.

Mapping

In the section about the basics of full text searching (the Full text searching section), we wrote about the process of analysis: the preparation of the input text for indexing and searching done by the underlying Apache Lucene library. Every field of the document must be properly analyzed depending on its type. For example, a different analysis chain is required for the numeric fields (numbers shouldn't be sorted alphabetically) and for the text fetched from web pages (for example, the first step would require you to omit the HTML tags as they are useless information). To be able to properly analyze at indexing and querying time, Elasticsearch stores the information about the fields of the documents in so-called mappings. Every document type has its own mapping, even if we don't explicitly define it.


Key concepts of the Elasticsearch infrastructure

Now, we already know that Elasticsearch stores its data in one or more indices and every index can contain documents of various types. We also know that each document has many fields and how Elasticsearch treats these fields is defined by the mappings. But there is more. From the beginning, Elasticsearch was created as a distributed solution that can handle billions of documents and hundreds of search requests per second. This is due to several important key features and concepts that we are going to describe in more detail now.

Nodes and clusters

Elasticsearch can work as a standalone, single-search server. Nevertheless, to be able to process large sets of data and to achieve fault tolerance and high availability, Elasticsearch can be run on many cooperating servers. Collectively, these servers connected together are called a cluster and each server forming a cluster is called a node.

Shards

When we have a large number of documents, we may come to a point where a single node may not be enough, for example, because of RAM limitations, hard disk capacity, insufficient processing power, or an inability to respond to client requests fast enough. In such cases, an index (and the data in it) can be divided into smaller parts called shards (where each shard is a separate Apache Lucene index). Each shard can be placed on a different server, and thus your data can be spread among the cluster nodes. When you query an index that is built from multiple shards, Elasticsearch sends the query to each relevant shard and merges the results in such a way that your application doesn't know about the shards. In addition to this, having multiple shards can speed up indexing, because documents end up in different shards and thus the indexing operation is parallelized.

Replicas

In order to increase query throughput or achieve high availability, shard replicas can be used. A replica is just an exact copy of the shard, and each shard can have zero or more replicas. In other words, Elasticsearch can have many identical shards and one of them is automatically chosen as a place where the operations that change the index are directed. This special shard is called a primary shard, and the others are called replica shards. When the primary shard is lost (for example, a server holding the shard data is unavailable), the cluster will promote a replica to be the new primary shard.
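The promotion of a replica when the primary shard is lost can be modeled with a few lines of Python. This is a conceptual sketch only: the ShardGroup class and the node names are hypothetical, and the real election logic inside Elasticsearch is more involved than picking the first remaining copy.

```python
class ShardGroup:
    """One logical shard: a primary copy plus zero or more replica copies."""

    def __init__(self, primary_node, replica_nodes):
        self.primary = primary_node
        self.replicas = list(replica_nodes)

    def node_failed(self, node):
        if node == self.primary:
            if not self.replicas:
                raise RuntimeError("shard lost: no replica left to promote")
            # Assumption: simply promote the first remaining replica.
            self.primary = self.replicas.pop(0)
        elif node in self.replicas:
            self.replicas.remove(node)

    def copies(self):
        # Every copy can serve reads; only the primary receives index changes first.
        return [self.primary] + self.replicas


group = ShardGroup("node-1", ["node-2", "node-3"])
group.node_failed("node-1")
print(group.primary, group.replicas)
```

With zero replicas, losing the node that holds the primary means losing the shard, which is why replicas matter for high availability and not only for query throughput.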

Gateway

The cluster state is held by the gateway, which stores the cluster state and indexed data across full cluster restarts. By default, every node has this information stored locally; it is synchronized among nodes. We will discuss the gateway module in detail in The gateway and recovery modules section of Chapter 9, Elasticsearch Cluster in Detail.


Indexing and searching

You may wonder how you can tie all the indices, shards, and replicas together in a single environment. Theoretically, it would be very difficult to fetch data from the cluster when you have to know where your document is: on which server, and in which shard. Even more difficult would be searching when one query can return documents from different shards placed on different nodes in the whole cluster. In fact, this is a complicated problem; fortunately, we don't have to care about this at all, because it is handled automatically by Elasticsearch. Let's look at the following diagram:

When you send a new document to the cluster, you specify a target index and send it to any of the nodes. The node knows how many shards the target index has and is able to determine which shard should be used to store your document. We can alter this behavior; we will talk about this in the Introduction to routing section in Chapter 2, Indexing Your Data. The important information that you have to remember for now is that Elasticsearch calculates the shard in which the document should be placed using the unique identifier of the document. This is one of the reasons each document needs a unique identifier. After the indexing request is sent to a node, that node forwards the document to the target node, which hosts the relevant shard.
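The idea of deriving the target shard from the document identifier can be sketched as a hash followed by a modulo. The Python below is an assumption-laden illustration: pick_shard is a made-up name, and zlib.crc32 is only a stand-in hash chosen because it is stable across runs, not the function Elasticsearch actually uses.

```python
import zlib

def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value (by default the document identifier) to a shard number."""
    # Stand-in hash; the real hash function used by Elasticsearch differs.
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % number_of_primary_shards

number_of_shards = 5
for doc_id in ["1", "2", "user-42"]:
    print(doc_id, "->", pick_shard(doc_id, number_of_shards))
```

Any node can run the same calculation and arrive at the same shard, so no central lookup table is needed. Note that in this model the result depends on the number of shards, so changing that number would send the same identifier elsewhere.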

Now, let's look at the following diagram on searching request execution:


When you try to fetch a document by its identifier, the node you send the query to uses the same routing algorithm to determine the shard and the node holding the document and again forwards the request, fetches the result, and sends the result to you. On the other hand, the querying process is a more complicated one. The node receiving the query forwards it to all the nodes holding the shards that belong to a given index and asks for minimum information about the documents that match the query (the identifier and score are returned by default), unless routing is used, in which case the query will go directly to a single shard only. This is called the scatter phase. After receiving this information, the aggregator node (the node that receives the client request) sorts the results and sends a second request to get the documents that are needed to build the results list (all the other information apart from the document identifier and score). This is called the gather phase. After this phase is executed, the results are returned to the client.
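The two phases can be mimicked in Python with in-memory dictionaries standing in for shards. This is a sketch under stated assumptions: the scores are precomputed constants rather than the result of a real query, the function names scatter and gather simply follow the phase names used above, and the document titles are sample data.

```python
import heapq

# Each shard holds full documents plus an assumed, precomputed score for one query.
shards = [
    {"a": {"score": 0.9, "title": "Elasticsearch Server"},
     "b": {"score": 0.2, "title": "Apache Solr Cookbook"}},
    {"c": {"score": 0.7, "title": "Mastering Elasticsearch"},
     "d": {"score": 0.4, "title": "Sample Document D"}},
]

def scatter(shards, size):
    """Phase 1: every shard returns only (score, shard number, id) for its top hits."""
    hits = []
    for shard_no, shard in enumerate(shards):
        local = [(doc["score"], shard_no, doc_id) for doc_id, doc in shard.items()]
        hits.extend(heapq.nlargest(size, local))
    return hits

def gather(shards, hits, size):
    """Phase 2: sort globally, then fetch full documents for the winners only."""
    top = heapq.nlargest(size, hits)
    return [dict(shards[shard_no][doc_id], id=doc_id) for _, shard_no, doc_id in top]

results = gather(shards, scatter(shards, size=2), size=2)
print([result["id"] for result in results])
```

Only identifiers and scores cross the network in the first phase, so full documents are transferred for the final results list only.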

Now the question arises: what is the replica's role in the previously described process? While indexing, replicas are only used as an additional place to store the data. When executing a query, by default, Elasticsearch will try to balance the load among the shard and its replicas so that they are evenly stressed. Also, remember that we can change this behavior; we will discuss this in the Understanding the querying process section in Chapter 3, Searching Your Data.


Installing and configuring your cluster

Installing and running Elasticsearch, even in production environments, is very easy nowadays compared to how it was in the days of Elasticsearch 0.20.x. Going from a system that is not ready to one running Elasticsearch takes only a few steps. We will explore these steps in the following sections.


Installing Java

Elasticsearch is a Java application and to use it we need to make sure that the Java SE environment is installed properly. Elasticsearch requires Java Version 7 or later to run. You can download it from http://www.oracle.com/technetwork/java/javase/downloads/index.html. You can also use OpenJDK (http://openjdk.java.net/) if you wish. You can, of course, use Java Version 7, but it is not supported by Oracle anymore, at least without commercial support. For example, you can't expect new, patched versions of Java 7 to be released. Because of this, we strongly suggest that you install Java 8, especially given that Java 9 seems to be right around the corner, with general availability planned for September 2016.


Installing Elasticsearch

To install Elasticsearch you just need to go to https://www.elastic.co/downloads/elasticsearch, choose the latest stable version of Elasticsearch, download it, and unpack it. That's it! The installation is complete.

Note

At the time of writing, we used a snapshot of Elasticsearch 2.2. This means that we've skipped describing some properties that were marked as deprecated and are or will be removed in the future versions of Elasticsearch.

The main interface to communicate with Elasticsearch is based on the HTTP protocol and REST. This means that you can even use a web browser for some basic queries and requests, but for anything more sophisticated you'll need to use additional software, such as the cURL command. If you use Linux or OS X, the cURL package should already be available. If you use Windows, you can download the package from http://curl.haxx.se/download.html.


Running Elasticsearch

Let's run our first instance, which we just downloaded as the ZIP archive and unpacked. Go to the bin directory and run the following command, depending on the OS:

Linux or OS X: ./elasticsearch
Windows: elasticsearch.bat

Congratulations! Now you have your Elasticsearch instance up and running. During its work, the server usually uses two port numbers: the first one for communication with the REST API using the HTTP protocol, and the second one for the transport module used for communication in a cluster and between the native Java client and the cluster. The default port used for the HTTP API is 9200, so we can check that the node is ready by pointing the web browser to http://127.0.0.1:9200/. The browser should show a code snippet similar to the following:

{
  "name" : "Blob",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.2.0",
    "build_hash" : "5b1dd1cf5a1957682d84228a569e124fedf8e325",
    "build_timestamp" : "2016-01-13T18:12:26Z",
    "build_snapshot" : true,
    "lucene_version" : "5.4.0"
  },
  "tagline" : "You Know, for Search"
}

The output is structured as a JavaScript Object Notation (JSON) object. If you are not familiar with JSON, please take a minute and read the article available at https://en.wikipedia.org/wiki/JSON.
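Because the response is plain JSON, any JSON library can read it. The following is a minimal Python sketch that parses a copy of the response shown above; the describe_node helper is a name made up for this illustration, and nothing here contacts a running node.

```python
import json

# A copy of the response body shown above (no network access is needed).
ROOT_RESPONSE = """
{
  "name": "Blob",
  "cluster_name": "elasticsearch",
  "version": {
    "number": "2.2.0",
    "build_hash": "5b1dd1cf5a1957682d84228a569e124fedf8e325",
    "build_timestamp": "2016-01-13T18:12:26Z",
    "build_snapshot": true,
    "lucene_version": "5.4.0"
  },
  "tagline": "You Know, for Search"
}
"""


def describe_node(body):
    """Summarize the node name, cluster name, and versions from the root response."""
    info = json.loads(body)
    version = info["version"]
    return "{} (cluster {}) runs Elasticsearch {} on Lucene {}".format(
        info["name"], info["cluster_name"],
        version["number"], version["lucene_version"])


print(describe_node(ROOT_RESPONSE))
```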

Note

Elasticsearch is smart. If the default port is not available, the engine binds to the next free port. You can find information about this on the console during booting, as follows:

[2016-01-13 20:04:49,953][INFO][http] [Blob] publish_address {127.0.0.1:9201}, bound_addresses {[fe80::1]:9200}, {[::1]:9200}, {127.0.0.1:9201}

Note the fragment with [http]. Elasticsearch uses a few ports for various tasks. The interface that we are using is handled by the HTTP module.

Now, we will use the cURL program to communicate with Elasticsearch. For example, to check the cluster health, we will use the following command:

curl -XGET http://127.0.0.1:9200/_cluster/health?pretty

The -X parameter defines the HTTP request method. The default value is GET (so in this example, we can omit this parameter). For now, do not worry about the GET value; we will describe it in more detail later in this chapter.


As a standard, the API returns information in a JSON object in which new line characters are omitted. The pretty parameter added to our request forces Elasticsearch to add new line characters to the response, making it more readable. You can try running the preceding query with and without the ?pretty parameter to see the difference.
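The effect of pretty can be previewed offline. The sketch below uses Python's json module to render the same object in compact and in indented form; the health dictionary holds illustrative values only and is not the output of a real cluster, and render is a helper name invented here.

```python
import json

# Illustrative values only; a real health response contains more fields.
health = {"cluster_name": "elasticsearch", "status": "green", "number_of_nodes": 1}


def render(document, pretty=False):
    """Serialize a document on one line, or with line breaks and indentation."""
    if pretty:
        return json.dumps(document, indent=2)
    return json.dumps(document, separators=(",", ":"))


print(render(health))               # one line, similar to a request without ?pretty
print(render(health, pretty=True))  # one field per line, similar to ?pretty
```

Both forms carry exactly the same data; only the whitespace differs.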

Elasticsearch is useful in small and medium-sized applications, but it has been built with large clusters in mind. So, now we will set up our big two-node cluster. Unpack the Elasticsearch archive in a different directory and run the second instance. If we look at the log, we will see the following:

[2016-01-13 20:07:58,561][INFO][cluster.service] [BigMan] detected_master {Blob}{5QPh00RUQraeLHAInbR4Jw}{127.0.0.1}{127.0.0.1:9300}, added {{Blob}{5QPh00RUQraeLHAInbR4Jw}{127.0.0.1}{127.0.0.1:9300},}, reason: zen-disco-receive(from master [{Blob}{5QPh00RUQraeLHAInbR4Jw}{127.0.0.1}{127.0.0.1:9300}])

This means that our second instance (named BigMan) discovered the previously running instance (named Blob). Here, Elasticsearch automatically formed a new two-node cluster. Starting from Elasticsearch 2.0, this will only work with nodes running on the same physical machine, because Elasticsearch 2.0 no longer supports multicast. To allow a cluster to form across machines, you need to inform Elasticsearch about the nodes that should be contacted initially, using the discovery.zen.ping.unicast.hosts array in elasticsearch.yml. For example:

discovery.zen.ping.unicast.hosts: ["192.168.2.1", "192.168.2.2"]


Shutting down Elasticsearch

Even though we expect our cluster (or node) to run flawlessly for a lifetime, we may need to restart it or shut it down properly (for example, for maintenance). The following are the two ways in which we can shut down Elasticsearch:

If your node is attached to the console, just press Ctrl + C
The second option is to kill the server process by sending the TERM signal (see the kill command on Linux boxes and Task Manager on Windows)

Note

Previous versions of Elasticsearch exposed a dedicated shutdown API but, in 2.0, this option has been removed for security reasons.
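The TERM signal option can be sketched in Python as follows. This is an illustration under stated assumptions: it targets POSIX systems, stop_gracefully is a name invented here, and a sleeping child process stands in for an Elasticsearch node so that the example does not need a real installation.

```python
import signal
import subprocess
import sys


def stop_gracefully(process, timeout=10.0):
    """Send SIGTERM to the process, wait for it to exit, and return its exit code."""
    process.send_signal(signal.SIGTERM)
    return process.wait(timeout=timeout)


# Stand-in for a running node: a child process that only sleeps.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_gracefully(child)
print("exit code:", exit_code)  # on POSIX, a negative value names the signal
```

For a real node, the same signal would be sent to the process identifier of the Elasticsearch JVM, for example with the kill command.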


The directory layout

Now, let's go to the newly created directory. We should see the following directory structure:

Directory   Description
bin         The scripts needed to run Elasticsearch instances and for plugin management
config      The directory where configuration files are located
lib         The libraries used by Elasticsearch
modules     The plugins bundled with Elasticsearch

After Elasticsearch starts, it will create the following directories (if they don't exist):

Directory   Description
data        The directory used by Elasticsearch to store all the data
logs        The files with information about events and errors
plugins     The location to store the installed plugins
work        The temporary files used by Elasticsearch


Configuring Elasticsearch

One of the reasons (of course, not the only one) why Elasticsearch is gaining more and more popularity is that getting started with it is quite easy. Because of the reasonable default values and automatic settings for simple environments, we can skip the configuration and go straight to indexing and querying (or to the next chapter of the book). We can do all this without changing a single line in our configuration files. However, in order to truly understand Elasticsearch, it is worth understanding some of the available settings.

We will now explore the default directories and the layout of the files provided with the Elasticsearch tar.gz archive. The entire configuration is located in the config directory. We can see two files here: elasticsearch.yml (or elasticsearch.json, which will be used if present) and logging.yml. The first file is responsible for setting the default configuration values for the server. This is important because some of these values can be changed at runtime and can be kept as a part of the cluster state, so the values in this file may not be accurate. The two values that we cannot change at runtime are cluster.name and node.name.

The cluster.name property is responsible for holding the name of our cluster. The cluster name separates different clusters from each other. Nodes configured with the same cluster name will try to form a cluster.

The second value is the instance name (the node.name property). We can leave this parameter undefined. In this case, Elasticsearch automatically chooses a unique name for itself. Note that this name is chosen during each startup, so the name can be different on each restart. Defining the name can be helpful when referring to concrete instances by the API or when using monitoring tools to see what is happening to a node during long periods of time and between restarts. Think about giving descriptive names to your nodes.

Other parameters are commented well in the file, so we advise you to look through it; don't worry if you do not understand the explanation. We hope that everything will become clearer after reading the next few chapters.

Note

Remember that most of the parameters that have been set in the elasticsearch.yml file can be overwritten with the use of the Elasticsearch REST API. We will talk about this API in The update settings API section of Chapter 9, Elasticsearch Cluster in Detail.

The second file (logging.yml) defines how much information is written to system logs, defines the log files, and creates new files periodically. Changes in this file are usually required only when you need to adapt to monitoring or backup solutions or during system debugging; however, if you want to have more detailed logging, you need to adjust it accordingly.

Let's leave the configuration files for now and look at the base for all the applications: the operating system. Tuning your operating system is one of the key points to ensure that your Elasticsearch instance will work well. During indexing, especially when having many shards and replicas, Elasticsearch will create many files; so, the system should not limit the open file descriptors to less than 32,000. For Linux servers, this can usually be changed in /etc/security/limits.conf and the current value can be displayed using the ulimit command. If you end up reaching the limit, Elasticsearch will not be able to create new files; so merging will fail, indexing may fail, and new indices will not be created.
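The descriptor limit can also be checked from a script before a node is started. The following Python sketch uses the standard resource module, which is available on Unix-like systems only; the 32,000 threshold comes from the text above, and descriptor_limit_ok is a helper name made up for this example.

```python
import resource

RECOMMENDED_MIN_FILES = 32000  # threshold quoted in the text above


def descriptor_limit_ok(soft_limit, minimum=RECOMMENDED_MIN_FILES):
    """Return True when the soft limit is unlimited or at least `minimum`."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= minimum


# Read the limits of the current process (what `ulimit -n` reports for the soft limit).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard, "ok:", descriptor_limit_ok(soft))
```

The value that matters is the one seen by the user account that runs Elasticsearch, so the check should be run as that user.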

Note

On Microsoft Windows platforms, the default limit is more than 16 million handles per process, which should be more than enough. You can read more about file handles on the Microsoft Windows platform at https://blogs.technet.microsoft.com/markrussinovich/2009/09/29/pushing-the-limits-of-windows-handles/.

The next set of settings is connected to the Java Virtual Machine (JVM) heap memory limit for a single Elasticsearch instance. For small deployments, the default memory limit (1,024 MB) will be sufficient, but for large ones it will not be enough. If you spot entries that indicate OutOfMemoryError exceptions in a log file, set the ES_HEAP_SIZE variable to a value greater than 1,024 MB (for example, 2g). When choosing the amount of memory to be given to the JVM, remember that, in general, no more than 50 percent of your total system memory should be given. However, as with all rules, there are exceptions. We will discuss this in greater detail later, but you should always monitor your JVM heap usage and adjust it when needed.
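The 50 percent rule of thumb is simple arithmetic, sketched below in Python. The function names are invented for this illustration, and the result is only the upper bound suggested by the rule above, not a tuned value for any particular workload.

```python
DEFAULT_HEAP_MB = 1024  # default heap limit mentioned in the text


def max_recommended_heap_mb(total_ram_mb):
    """Upper bound from the rule of thumb: at most half of the system memory."""
    return total_ram_mb // 2


def es_heap_size(total_ram_mb):
    """Format the bound as a value for ES_HEAP_SIZE, for example '4096m'."""
    return "{}m".format(max_recommended_heap_mb(total_ram_mb))


for ram_mb in (4096, 8192, 16384):
    print(ram_mb, "MB RAM ->", es_heap_size(ram_mb))
```

On a machine with 8,192 MB of memory, for example, the rule gives an upper bound of 4,096 MB, which is four times the default limit.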


The system-specific installation and configuration

Although downloading an archive with Elasticsearch and unpacking it works and is convenient for testing, there are dedicated methods for Linux operating systems that give you several advantages for production deployment. In production deployments, the Elasticsearch service should start automatically at system boot; we should have dedicated start and stop scripts, unified paths, and so on. Elasticsearch supports installation packages for various Linux distributions that we can use. Let's see how this works.

Installing Elasticsearch on Linux

The other way to install Elasticsearch on a Linux operating system is to use packages such as RPM or DEB, depending on your Linux distribution and the supported package type. This way we can automatically adapt to the system directory layout; for example, configuration and logs will go into their standard places in the /etc/ or /var/log directories. But this is not the only thing. When using packages, Elasticsearch will also install startup scripts and make our life easier. What's more, we will be able to upgrade Elasticsearch easily by running a single command from the command line. Of course, the mentioned packages can be found at the same URL address as we mentioned previously when we talked about installing Elasticsearch from zip or tar.gz packages: https://www.elastic.co/downloads/elasticsearch. Elasticsearch can also be installed from remote repositories via standard distribution tools such as apt-get or yum.

Note

Before installing Elasticsearch, make sure that you have a proper version of Java Virtual Machine installed.

Installing Elasticsearch using RPM packages

When using a Linux distribution that supports RPM packages, such as Fedora Linux (https://getfedora.org/), Elasticsearch installation is very easy. After downloading the RPM package, we just need to run the following command as root:

yum install elasticsearch-2.2.0.noarch.rpm

Alternatively, you can add the remote repository and install Elasticsearch from it. The first step is to import the GPG key (this command needs to be run as root as well):

rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch

This command adds the GPG key and allows the system to verify that the fetched package really comes from Elasticsearch developers. In the second step, we need to create the repository definition in the /etc/yum.repos.d/elasticsearch.repo file. We need to add the following entries to this file:

[elasticsearch-2.2]
name=Elasticsearch repository for 2.2.x packages
baseurl=http://packages.elastic.co/elasticsearch/2.x/centos
gpgcheck=1
gpgkey=http://packages.elastic.co/GPG-KEY-elasticsearch
enabled=1

Now it's time to install the Elasticsearch server, which is as simple as running the following command (again, don't forget to run it as root):

yum install elasticsearch

Elasticsearch will be automatically downloaded, verified, and installed.

Installing Elasticsearch using the DEB package

When using a Linux distribution that supports DEB packages (such as Debian), installing Elasticsearch is again very easy. After downloading the DEB package, all you need to do is run the following command:

sudo dpkg -i elasticsearch-2.2.0.deb

It is as simple as that. Another way, which is similar to what we did with RPM packages, is to create a new packages source and install Elasticsearch from the remote repository. The first step is to add the public GPG key used for package verification. We can do that using the following command:

wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

The second step is to add the DEB package location. We need to add the following line to the /etc/apt/sources.list file:

deb http://packages.elastic.co/elasticsearch/2.2/debian stable main

This defines the source for the Elasticsearch packages. The last step is updating the list of remote packages and installing Elasticsearch using the following command:

sudo apt-get update && sudo apt-get install elasticsearch

Elasticsearch configuration file localization

When using packages to install Elasticsearch, the configuration files are in slightly different directories than the default config directory. After the installation, the configuration files should be stored in the following locations:

/etc/sysconfig/elasticsearch or /etc/default/elasticsearch: A file with the configuration of the Elasticsearch process, such as the user to run as, directories for logs and data, and memory settings
/etc/elasticsearch/: A directory for the Elasticsearch configuration files, such as the elasticsearch.yml file

Configuring Elasticsearch as a system service on Linux

If everything goes well, you can run Elasticsearch using the following command:

/bin/systemctl start elasticsearch.service

If you want Elasticsearch to start automatically every time the operating system starts, you can set up Elasticsearch as a system service by running the following command:

/bin/systemctl enable elasticsearch.service

Elasticsearch as a system service on Windows

Installing Elasticsearch as a system service on Windows is also very easy. You just need to go to your Elasticsearch installation directory, then go to the bin subdirectory, and run the following command:

service.bat install

You'll be asked for permission to do so. If you allow the script to run, Elasticsearch will be installed as a Windows service.

If you would like to see all the commands exposed by the service.bat script file, just run the following command in the same directory as earlier:

service.bat

For example, to start Elasticsearch, we will just run the following command:

service.bat start


Manipulating data with the REST API

Elasticsearch exposes a very rich REST API that can be used to search through the data, index the data, and control Elasticsearch behavior. You can imagine that using the REST API allows you to get a single document, index or update a document, get the information on Elasticsearch current state, create or delete indices, or force Elasticsearch to move around shards of your indices. Of course, these are only examples that show what you can expect from the Elasticsearch REST API. For now, we will concentrate on using the create, retrieve, update, delete (CRUD) part of the Elasticsearch API (https://en.wikipedia.org/wiki/Create,_read,_update_and_delete), which allows us to use Elasticsearch in a fashion similar to how we would use any other NoSQL (https://en.wikipedia.org/wiki/NoSQL) data store.


Understanding the REST API

If you've never used an application exposing the REST API, you may be surprised how easy it is to use such applications and remember how to use them. In REST-like architectures, every request is directed to a concrete object indicated by a path in the address. For example, let's assume that our hypothetical application exposes the /books REST end-point as a reference to the list of books. In such case, a call to /books/1 could be a reference to a concrete book with the identifier 1. You can think of it as a data-oriented model of an API. Of course, we can nest the paths; for example, a path such as /books/1/chapters could return the list of chapters of our book with identifier 1 and a path such as /books/1/chapters/6 could be a reference to the sixth chapter in that particular book.

We talked about paths, but when using the HTTP protocol (https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol), we have some additional verbs (such as POST, GET, PUT, and so on) that we can use to define system behavior in addition to paths. So if we would like to retrieve the book with identifier 1, we would use the GET request method with the /books/1 path. However, we would use the PUT request method with the same path to create a book record with the identifier of 1, the POST request method to alter the record, DELETE to remove that entry, and the HEAD request method to get basic information about the data referenced by the path.

Now, let's look at example HTTP requests that are sent to real Elasticsearch REST API endpoints, so the preceding hypothetical information will be turned into something real:

GET http://localhost:9200/: This retrieves basic information about Elasticsearch, such as the version, the name of the node that the command has been sent to, the name of the cluster that node is connected to, the Apache Lucene version, and so on.

GET http://localhost:9200/_cluster/state/nodes/: This retrieves information about all the nodes in the cluster, such as their identifiers, names, transport addresses with ports, and additional node attributes for each node.

DELETE http://localhost:9200/books/book/123: This deletes a document that is indexed in the books index, with the book type and an identifier of 123.

We now know what REST means and we can start concentrating on Elasticsearch to see how we can store, retrieve, alter, and delete the data from its indices. If you would like to read more about REST, please refer to http://en.wikipedia.org/wiki/Representational_state_transfer.
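The pairing of an HTTP verb with a path can be expressed in a few lines of Python. The sketch below only builds request objects with the standard urllib.request module and does not send them, so it runs without a node; rest_request is a helper name invented for this example, and the base address assumes the default HTTP port.

```python
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # default HTTP address used in the text


def rest_request(method, path):
    """Build (but do not send) an HTTP request for the given verb and path."""
    return Request(BASE_URL + path, method=method)


# The three example requests listed above.
examples = [
    rest_request("GET", "/"),
    rest_request("GET", "/_cluster/state/nodes/"),
    rest_request("DELETE", "/books/book/123"),
]

for request in examples:
    print(request.get_method(), request.full_url)
```

Passing one of these objects to urllib.request.urlopen would actually send it, which requires a running Elasticsearch node.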


Storing data in Elasticsearch

In Elasticsearch, every document is represented by three attributes: the index, the type, and the identifier. Each document must be indexed into a single index, needs to have its type correspond to the document structure, and is described by the identifier. These three attributes allow us to identify any document in Elasticsearch and need to be provided when the document is physically written to the underlying Apache Lucene index. With this knowledge, we are now ready to create our first Elasticsearch document.

Creating a new document

We will start learning the Elasticsearch REST API by indexing one document. Let's imagine that we are building a CMS system (http://en.wikipedia.org/wiki/Content_management_system) that will provide the functionality of a blogging platform for our internal users. We will have different types of documents in our indices, but the most important ones are the articles that will be published and are readable by users.

Because we talk to Elasticsearch using JSON notation and Elasticsearch responds to us again using JSON, our example document could look as follows:

{
  "id": "1",
  "title": "New version of Elasticsearch released!",
  "content": "Version 2.2 released today!",
  "priority": 10,
  "tags": ["announce", "elasticsearch", "release"]
}

As you can see in the preceding code snippet, the JSON document is built with a set of fields, where each field can have a different format. In our example, we have a set of text fields (id, title, and content), we have a number (the priority field), and an array of text values (the tags field). We will show documents that are more complicated in the next examples.

Note

One of the changes introduced in Elasticsearch 2.0 has been that field names can't contain the dot character. Such field names were possible in older versions of Elasticsearch, but could result in serialization errors in certain cases and thus Elasticsearch creators decided to remove that possibility.

One thing to remember is that by default Elasticsearch works as a schema-less data store. This means that it can try to guess the type of the field in a document sent to Elasticsearch. It will try to use numeric types for the values that are not enclosed in quotation marks and strings for data enclosed in quotation marks. It will try to detect dates and index them in dedicated fields, and so on. This is possible because the JSON format is semi-typed. Internally, when the first document with a new field is sent to Elasticsearch, it will be processed and mappings will be written (we will talk more about mappings in the Mappings configuration section of Chapter 2, Indexing Your Data).


Note

A schema-less approach and dynamic mappings can be problematic when documents come with a slightly different structure. For example, the first document would contain the value of the priority field without quotation marks (like the one shown in the discussed example), while the second document would have quotation marks for the value in the priority field. This will result in an error because Elasticsearch will try to put a text value in the numeric field and this is not possible in Lucene. Because of this, it is advisable to define your own mappings, which you will learn in the Mappings configuration section of Chapter 2, Indexing Your Data.
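The kind of conflict described in this note can be imitated with a toy model. The Python sketch below is a deliberately simplified illustration and not the detection logic Elasticsearch actually uses: guess_type and ToyMapping are names invented here, dates are ignored, and the first value seen for a field simply fixes its type.

```python
def guess_type(value):
    """Very rough stand-in for dynamic type detection of a JSON value."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError("unsupported value: {!r}".format(value))


class ToyMapping:
    """Remembers the type of each field from the first document that contains it."""

    def __init__(self):
        self.fields = {}

    def index(self, document):
        for name, value in document.items():
            detected = guess_type(value)
            known = self.fields.setdefault(name, detected)
            if known != detected:
                raise ValueError("field {!r} is mapped as {} but got {}".format(
                    name, known, detected))


mapping = ToyMapping()
mapping.index({"title": "First", "priority": 10})  # priority becomes numeric
try:
    mapping.index({"title": "Second", "priority": "high"})  # text in a numeric field
    conflict = None
except ValueError as error:
    conflict = str(error)

print(mapping.fields)
print(conflict)
```

Defining the mappings up front removes the dependency on whichever document happens to arrive first.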

Let's now index our document and make it available for retrieval and searching. We will index our articles to an index called blog under a type named article. We will also give our document an identifier of 1, as this is our first document. To index our example document, we will execute the following command:

curl -XPUT 'http://localhost:9200/blog/article/1' -d '{"title": "New version of Elasticsearch released!", "content": "Version 2.2 released today!", "priority": 10, "tags": ["announce", "elasticsearch", "release"]}'

Note a new option to the curl command, the -d parameter. The value of this option is the text that will be used as a request payload (a request body). This way, we can send additional information such as the document definition. Also, note that the unique identifier is placed in the URL and not in the body. If you omit this identifier (while using the HTTP PUT request), the indexing request will return the following error:

No handler found for uri [/blog/article] and method [PUT]

If everything worked correctly, Elasticsearch will return a JSON response informing us about the status of the indexing operation. This response should be similar to the following one:

{
  "_index": "blog",
  "_type": "article",
  "_id": "1",
  "_version": 1,
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}

In the preceding response, Elasticsearch included information about the status of the operation, index, type, identifier, and version. We can also see information about the shards that took part in the operation: all of them, the ones that were successful, and the ones that failed.
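A client would typically inspect these fields rather than just print them. The following Python sketch parses a copy of the response shown above; summarize_index_response is a name invented here, and treating "no failed shards and at least one successful shard" as success is an assumption made for this illustration.

```python
import json

# A copy of the indexing response shown above.
INDEX_RESPONSE = (
    '{"_index":"blog","_type":"article","_id":"1","_version":1,'
    '"_shards":{"total":2,"successful":1,"failed":0},"created":true}'
)


def summarize_index_response(body):
    """Return (document path, created flag, whether the shard counts look healthy)."""
    reply = json.loads(body)
    path = "/{}/{}/{}".format(reply["_index"], reply["_type"], reply["_id"])
    shards = reply["_shards"]
    healthy = shards["failed"] == 0 and shards["successful"] >= 1
    return path, reply["created"], healthy


print(summarize_index_response(INDEX_RESPONSE))
```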

Automatic identifier creation

In the previous example, we specified the document identifier manually when we were sending the document to Elasticsearch. However, there are use cases when we don't have an identifier for our documents, for example, when handling logs as our data. In such cases, we would like some application to create the identifier for us and Elasticsearch can be such an application. Of course, generating document identifiers doesn't make sense when your document already has them, such as data in a relational database. In such cases, you may want to update the documents later, so automatic identifier generation is not the best idea. However, when we are in need of such functionality, instead of using the HTTP PUT method we can use POST and omit the identifier in the REST API path. So if we would like Elasticsearch to generate the identifier in the previous example, we would send a command like this:

curl -XPOST 'http://localhost:9200/blog/article/' -d '{"title": "New version of Elasticsearch released!", "content": "Version 2.2 released today!", "priority": 10, "tags": ["announce", "elasticsearch", "release"]}'

We've used the HTTP POST method instead of PUT and we've omitted the identifier. The response produced by Elasticsearch in such a case would be as follows:

{
  "_index": "blog",
  "_type": "article",
  "_id": "AU1y-s6w2WzST_RhTvCJ",
  "_version": 1,
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}

As you can see, the response returned by Elasticsearch is almost the same as in the previous example, with a minor difference in the returned _id field. Now, instead of the 1 value, we have a value of AU1y-s6w2WzST_RhTvCJ, which is the identifier Elasticsearch generated for our document.
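The choice between PUT with an identifier and POST without one can be captured in a small helper. The Python sketch below only builds the request with urllib.request and does not send it; build_index_request and its parameters are names invented for this illustration, and identifiers are assumed to need no URL escaping.

```python
import json
from urllib.request import Request


def build_index_request(index, doc_type, document, doc_id=None,
                        base_url="http://localhost:9200"):
    """Build (without sending) an indexing request.

    PUT is used with an explicit identifier; POST is used when the identifier
    is omitted so that Elasticsearch generates one.
    """
    path = "/{}/{}/".format(index, doc_type)
    method = "POST"
    if doc_id is not None:
        path += str(doc_id)
        method = "PUT"
    body = json.dumps(document).encode("utf-8")
    return Request(base_url + path, data=body, method=method)


article = {"title": "New version of Elasticsearch released!", "priority": 10}
explicit = build_index_request("blog", "article", article, doc_id=1)
generated = build_index_request("blog", "article", article)

print(explicit.get_method(), explicit.full_url)
print(generated.get_method(), generated.full_url)
```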


Retrieving documents

We now have two documents indexed into our Elasticsearch instance: one using an explicit identifier and one using a generated identifier. Let's now try to retrieve one of the documents using its unique identifier. To do this, we will need information about the index the document is indexed in, what type it has, and of course what identifier it has. For example, to get the document from the blog index with the article type and the identifier of 1, we would run the following HTTP GET request:

curl -XGET 'localhost:9200/blog/article/1?pretty'

Note

The additional URI property called pretty tells Elasticsearch to include new line characters and additional white spaces in the response to make the output easier to read for users.

Elasticsearch will return a response similar to the following:

{
  "_index": "blog",
  "_type": "article",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "title": "New version of Elasticsearch released!",
    "content": "Version 2.2 released today!",
    "priority": 10,
    "tags": ["announce", "elasticsearch", "release"]
  }
}

As you can see in the preceding response, Elasticsearch returned the _source field, which is the original document sent to Elasticsearch, and a few additional fields that tell us about the document, such as the index, type, identifier, document version, and of course information as to whether the document was found or not (the found property).

If we try to retrieve a document that is not present in the index, such as the one with the 12345 identifier, we get a response like this:

{
  "_index": "blog",
  "_type": "article",
  "_id": "12345",
  "found": false
}

As you can see, this time the value of the found property was set to false and there was no _source field because the document has not been retrieved.
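Client code usually branches on the found property before touching _source. The Python sketch below works on shortened copies of the two responses shown above; extract_source is a helper name made up for this example.

```python
import json


def extract_source(body):
    """Return the stored document, or None when `found` is false."""
    reply = json.loads(body)
    if not reply.get("found", False):
        return None
    return reply["_source"]


# Shortened copies of the two responses shown above.
hit = ('{"_index":"blog","_type":"article","_id":"1","_version":1,"found":true,'
       '"_source":{"title":"New version of Elasticsearch released!","priority":10}}')
miss = '{"_index":"blog","_type":"article","_id":"12345","found":false}'

print(extract_source(hit))
print(extract_source(miss))
```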


Updating documents

Updating documents in the index is a more complicated task compared to indexing. When the document is indexed and Elasticsearch flushes the document to a disk, it creates segments, immutable structures that are written once and read many times. This is done because the inverted index created by Apache Lucene is currently impossible to update (at least most of its parts). To update a document, Elasticsearch internally first fetches the document using the GET request, modifies its _source field, removes the old document, and indexes a new document using the updated content. The content update is done using scripts in Elasticsearch (we will talk more about scripting in Elasticsearch in the Scripting capabilities of Elasticsearch section in Chapter 6, Make Your Search Better).

Note

Please note that the following document update examples require you to put the script.inline: on property into your elasticsearch.yml configuration file. This is needed because inline scripting is disabled in Elasticsearch by default for security reasons. The other way to handle updates is to store the script content in a file in the Elasticsearch configuration directory, but we will talk about that in the Scripting capabilities of Elasticsearch section in Chapter 6, Make Your Search Better.

Let's now try to update our document with identifier 1 by modifying its content field to contain the "This is the updated document" sentence. To do this, we need to run a POST HTTP request on the document path using the _update REST end-point. Our request to modify the document would look as follows:

curl -XPOST 'http://localhost:9200/blog/article/1/_update' -d '{
  "script": "ctx._source.content = new_content",
  "params": {
    "new_content": "This is the updated document"
  }
}'

As you can see, we've sent the request to the /blog/article/1/_update REST end-point. In the request body, we've provided two parameters: the update script in the script property and the parameters of the script. The script is very simple; it takes the _source field and modifies the content field by setting its value to the value of the new_content parameter. The params property contains all the script parameters.

For the preceding update command execution, Elasticsearch would return the following response:

{"_index":"blog","_type":"article","_id":"1","_version":2,"_shards":{"total":2,"successful":1,"failed":0}}

The thing to look at in the preceding response is the _version field. Right now, the version is 2, which means that the document has been updated (or re-indexed) once. Basically, each update makes Elasticsearch update the _version field.
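The fetch, modify, and re-index sequence, together with the version bump, can be imitated with an in-memory toy model. The Python sketch below is only an illustration of the steps described in this section: ToyIndex is a class invented here, a plain dictionary stands in for the index, and a Python function stands in for the update script.

```python
import copy


class ToyIndex:
    """In-memory stand-in for an index: identifier -> (version, source)."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, source):
        """Store the document, replacing any old copy and bumping the version."""
        version = self.docs[doc_id][0] + 1 if doc_id in self.docs else 1
        self.docs[doc_id] = (version, copy.deepcopy(source))
        return version

    def update(self, doc_id, script):
        """Fetch the current source, change a copy of it, and index the result."""
        _, source = self.docs[doc_id]       # 1. fetch the current _source
        changed = copy.deepcopy(source)
        script(changed)                     # 2. apply the change to the copy
        return self.index(doc_id, changed)  # 3. replace the old document


def set_content(source):
    source["content"] = "This is the updated document"


blog = ToyIndex()
blog.index("1", {"title": "New version of Elasticsearch released!",
                 "content": "Version 2.2 released today!"})
new_version = blog.update("1", set_content)
print(new_version, blog.docs["1"][1]["content"])
```

The stored document is never edited in place; a changed copy replaces it, which mirrors why every update produces a new version number.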

We could also update the document using the doc section and providing the changed field, for example:

curl -XPOST 'http://localhost:9200/blog/article/1/_update' -d '{
  "doc": {
    "content": "This is the updated document"
  }
}'

We now retrieve the document using the following command:

curl -XGET 'http://localhost:9200/blog/article/1?pretty'

And we get the following response from Elasticsearch:

{
  "_index": "blog",
  "_type": "article",
  "_id": "1",
  "_version": 2,
  "found": true,
  "_source": {
    "title": "New version of Elasticsearch released!",
    "content": "This is the updated document",
    "priority": 10,
    "tags": ["announce", "elasticsearch", "release"]
  }
}

As you can see, the document has been updated properly.

Note

The thing to remember when using the update API of Elasticsearch is that the _source field needs to be present because this is the field that Elasticsearch uses to retrieve the original document content from the index. By default, that field is enabled and Elasticsearch uses it to store the original document.

Dealingwithnon-existingdocumentsThenicethingwhenitcomestodocumentupdates,whichwewouldliketomentionasitcancomeinhandywhenusingElasticsearchUpdateAPI,isthatwecandefinewhatElasticsearchshoulddowhenthedocumentwetrytoupdateisnotpresent.

Forexample,let’stryincrementingthepriorityfieldvalueforanon-existingdocumentwithidentifier2:

curl-XPOST'http://localhost:9200/blog/article/2/_update'-d'{

"script":"ctx._source.priority+=1"

}'

TheresponsereturnedbyElasticsearchwouldlookmoreorlessasfollows:

{"error":{"root_cause":[{"type":"document_missing_exception","reason":"

[article][2]:document

missing","shard":"2","index":"blog"}],"type":"document_missing_exception","

reason":"[article][2]:document

www.EBooksWorld.ir

Page 94: dl.ebooksworld.irdl.ebooksworld.ir/motoman/Packt.Elasticsearch.Server.3rd.Edition.w… · Table of Contents Elasticsearch Server Third Edition Credits About the Authors About the

missing","shard":"2","index":"blog"},"status":404}

As you can imagine, the document has not been updated because it doesn't exist. So now, let's modify our request to include the upsert section in the request body, which will tell Elasticsearch what to do when the document is not present. The new command would look as follows:

curl -XPOST 'http://localhost:9200/blog/article/2/_update' -d '{

"script":"ctx._source.priority+=1",

"upsert":{

"title":"Empty document",

"priority":0,

"tags":["empty"]

}

}'

With the modified request, a new document would be indexed; if we retrieve it using the GET API, it will look as follows:

{

"_index":"blog",

"_type":"article",

"_id":"2",

"_version":1,

"found":true,

"_source":{

"title":"Emptydocument",

"priority":0,

"tags":["empty"]

}

}

As you can see, the fields from the upsert section of our update request were taken by Elasticsearch and used as document fields.
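The upsert behavior described above can be modeled in a few lines of Python. This is only a toy sketch under stated assumptions: the in-memory dictionary store, the function update_with_upsert, and the Python callable standing in for the script are hypothetical names made up for illustration. None of them belong to Elasticsearch or to any client library.

```python
# Toy in-memory model of an update request that carries an "upsert" section.
# Nothing here talks to Elasticsearch; "store" merely stands in for an index.

def update_with_upsert(store, doc_id, script, upsert=None):
    """Run script on an existing document, or index the upsert body if it is missing."""
    if doc_id in store:
        script(store[doc_id])        # document exists: apply the script to its source
        return "updated"
    if upsert is None:
        # Corresponds to the document_missing_exception (HTTP 404) shown earlier.
        raise KeyError("document missing: %s" % doc_id)
    store[doc_id] = dict(upsert)     # document missing: index the upsert body as-is
    return "created"


def increment_priority(source):
    # Hypothetical stand-in for the script "ctx._source.priority += 1".
    source["priority"] += 1


store = {}
upsert_body = {"title": "Empty document", "priority": 0, "tags": ["empty"]}

first = update_with_upsert(store, "2", increment_priority, upsert_body)
second = update_with_upsert(store, "2", increment_priority, upsert_body)
```

After the first call the stored document still has a priority of 0, which agrees with the response shown above: the upsert body is indexed and the script is not applied to it. Only the second call, which finds an existing document, increments the value.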

Adding partial documents

In addition to what we have already covered about the update API, Elasticsearch can also merge a partial document from the update request into an existing document, or index a new document using the contents of the request, similar to what we saw with the upsert section.

Let's imagine that we would like to update our initial document and add a new field called count to it (setting it to 1 initially). We would also like to index the document under the specified identifier if the document is not present. We can do this by running the following command:

curl -XPOST 'http://localhost:9200/blog/article/1/_update' -d '{

"doc":{

"count":1

},

"doc_as_upsert":true

}'

We specified the new field in the doc section and we said that we want the doc section to be treated as the upsert section when the document is not present (with the doc_as_upsert property set to true).

If we now retrieve that document, we see the following response:

{

"_index":"blog",

"_type":"article",

"_id":"1",

"_version":3,

"found":true,

"_source":{

"title":"NewversionofElasticsearchreleased!",

"content":"Thisistheupdateddocument",

"priority":10,

"tags":["announce","elasticsearch","release"],

"count":1

}

}
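The merge of a partial document can be sketched as a small recursive function. This is an illustrative model only: merge, partial_update, and the store dictionary are hypothetical names, and the merge rule used here (nested objects merged, all other values replaced) is an assumption about how the partial document is combined with the existing one.

```python
# Toy model of a partial-document update with doc_as_upsert.
# "store" is a plain dict standing in for an index; no Elasticsearch involved.

def merge(target, partial):
    """Merge partial into target: nested objects are merged, other values replaced."""
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge(target[key], value)
        else:
            target[key] = value
    return target


def partial_update(store, doc_id, doc, doc_as_upsert=False):
    if doc_id in store:
        merge(store[doc_id], doc)      # existing document: merge the new fields in
    elif doc_as_upsert:
        store[doc_id] = dict(doc)      # missing document: index the doc section itself
    else:
        raise KeyError("document missing: %s" % doc_id)
    return store[doc_id]


store = {"1": {"title": "New version of Elasticsearch released!", "priority": 10}}
existing = partial_update(store, "1", {"count": 1}, doc_as_upsert=True)
created = partial_update(store, "3", {"count": 1}, doc_as_upsert=True)
```

The first call adds count to the existing document and leaves its other fields alone, while the second call creates document 3 from the doc section alone.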

Note

For a full reference on document updates, please refer to the official Elasticsearch documentation on the Update API, which is available at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html.

Deleting documents

Now that we know how to index documents, update them, and retrieve them, it is time to learn how to delete them. Deleting a document from an Elasticsearch index is very similar to retrieving it, with one major difference: instead of using the HTTP GET method, we have to use the HTTP DELETE method.

For example, if we would like to delete the document indexed in the blog index under the article type and with an identifier of 1, we would run the following command:

curl -XDELETE 'localhost:9200/blog/article/1'

The response from Elasticsearch indicates that the document has been deleted and should look as follows:

{

"found":true,

"_index":"blog",

"_type":"article",

"_id":"1",

"_version":4,

"_shards":{

"total":2,

"successful":1,

"failed":0

}

}

Of course, deleting a single document is not the only option. We can also remove an entire index together with all of its documents. For example, if we would like to delete the entire blog index, we should just omit the identifier and the type, so the command would look like this:

curl -XDELETE 'localhost:9200/blog'

The preceding command would result in the deletion of the blog index.

Versioning

Finally, there is one last thing that we would like to talk about when it comes to data manipulation in Elasticsearch: versioning. As you may have already noticed, Elasticsearch increments the document version each time it updates the document. We can leverage this functionality to use optimistic locking (http://en.wikipedia.org/wiki/Optimistic_concurrency_control) and avoid conflicts and overwrites when multiple processes or threads access the same document concurrently. Imagine that your indexing application tries to update a document while a user is updating the same document manually. The question that arises is: which document should be the correct one, the one updated by the indexing application, the one updated by the user, or a merged document containing both sets of changes? What if the changes conflict? To handle such cases, we can use versioning.

Usage example

Let's index a new document to our blog index, one with an identifier of 10, and let's index its second version soon after we do that. The commands that do this look as follows:

curl -XPUT 'localhost:9200/blog/article/10' -d '{"title":"Test document"}'

curl -XPUT 'localhost:9200/blog/article/10' -d '{"title":"Updated test document"}'

Because we've indexed the document with the same identifier twice, it should now have a version of 2 (you can check it using a GET request).

Now, let's try deleting the document we've just indexed, but let's specify a version property equal to 1. By doing this, we tell Elasticsearch that we want to delete the document only if it has the provided version. Because the document has a different version now, Elasticsearch shouldn't allow the deletion with version 1. Let's check whether this is true. The command we will use to send the delete request looks as follows:

curl -XDELETE 'localhost:9200/blog/article/10?version=1'

The response generated by Elasticsearch should be similar to the following one:

{

"error":{

"root_cause":[{

"type":"version_conflict_engine_exception",

"reason":"[article][10]: version conflict, current [2], provided [1]",

"shard":1,

"index":"blog"

}],

"type":"version_conflict_engine_exception",

"reason":"[article][10]: version conflict, current [2], provided [1]",

"shard":1,

"index":"blog"

},

"status":409

}

As you can see, the delete operation was not successful because the versions didn't match. If we set the version property to 2, the delete operation would be successful:

curl -XDELETE 'localhost:9200/blog/article/10?version=2&pretty'

The response this time will look as follows:

{

"found":true,

"_index":"blog",

"_type":"article",

"_id":"10",

"_version":3,

"_shards":{

"total":2,

"successful":1,

"failed":0

}

}

This time the delete operation has been successful because the provided version matched the current version of the document.
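The version check in this example is a form of optimistic locking, and its logic can be illustrated with a small in-memory model. The sketch below is not Elasticsearch code: VersionConflict, index, delete, and the versions dictionary are hypothetical names, and the model tracks only version numbers, not document content.

```python
# Toy model of internal versioning with an optional expected version on delete.

class VersionConflict(Exception):
    """Stand-in for version_conflict_engine_exception (HTTP 409)."""


def index(versions, doc_id):
    # Every write to the same identifier bumps the version by one.
    versions[doc_id] = versions.get(doc_id, 0) + 1
    return versions[doc_id]


def delete(versions, doc_id, version=None):
    current = versions[doc_id]
    if version is not None and version != current:
        raise VersionConflict(
            "version conflict, current [%d], provided [%d]" % (current, version)
        )
    # In the responses above the delete itself also increments the version (2 -> 3).
    versions[doc_id] = current + 1
    return versions[doc_id]


versions = {}
index(versions, "10")               # version 1
index(versions, "10")               # version 2

try:
    delete(versions, "10", version=1)
    outcome = "deleted"
except VersionConflict:
    outcome = "conflict"            # same situation as the 409 response

final_version = delete(versions, "10", version=2)
```

The caller states which version it believes is current, and the operation is rejected when that belief is stale. No lock is held between reading the document and modifying it.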

Versioning from external systems

A very useful aspect of Elasticsearch versioning is that we can provide the version of the document that we would like Elasticsearch to use. This allows us to provide versions from external data systems that are our primary data stores. To do this, we need to provide an additional parameter during indexing, version_type=external, and, of course, the version itself. For example, if we would like our document to have the 12345 version, we could send a request like this:

curl -XPUT 'localhost:9200/blog/article/20?version=12345&version_type=external' -d '{"title":"Test document"}'

The response returned by Elasticsearch is as follows:

{

"_index":"blog",

"_type":"article",

"_id":"20",

"_version":12345,

"_shards":{

"total":2,

"successful":1,

"failed":0

},

"created":true

}

We just need to remember that, when using version_type=external, we need to provide the version whenever we index the document. In cases where we would like to change the document and use optimistic locking, we need to provide a version parameter higher than the version present in the document (or equal to or higher than it when the external_gte version type is used).

Searching with the URI request query

Before getting into the wonderful world of the Elasticsearch query language, we would like to introduce you to the simple but pretty flexible URI request search, which allows us to use a simple Elasticsearch query combined with the Lucene query language. Of course, we will extend our search knowledge using Elasticsearch in Chapter 3, Searching Your Data, but for now we will stick to the simplest approach.

Sample data

For the purpose of this section of the book, we will create a simple index with two document types. To do this, we will run the following six commands:

curl -XPOST 'localhost:9200/books/es/1' -d '{"title":"Elasticsearch Server","published":2013}'

curl -XPOST 'localhost:9200/books/es/2' -d '{"title":"Elasticsearch Server Second Edition","published":2014}'

curl -XPOST 'localhost:9200/books/es/3' -d '{"title":"Mastering Elasticsearch","published":2013}'

curl -XPOST 'localhost:9200/books/es/4' -d '{"title":"Mastering Elasticsearch Second Edition","published":2015}'

curl -XPOST 'localhost:9200/books/solr/1' -d '{"title":"Apache Solr 4 Cookbook","published":2012}'

curl -XPOST 'localhost:9200/books/solr/2' -d '{"title":"Solr Cookbook Third Edition","published":2015}'

Running the preceding commands will create the books index with two types: es and solr. The title and published fields will be indexed and thus searchable.

URI search

All queries in Elasticsearch are sent to the _search endpoint. You can search a single index or multiple indices, and you can restrict your search to a given document type or multiple types. For example, in order to search our books index, we will run the following command:

curl -XGET 'localhost:9200/books/_search?pretty'

The results returned by Elasticsearch will include all the documents from our books index (because no query has been specified) and should look similar to the following:

{

"took":3,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":6,

"max_score":1.0,

"hits":[{

"_index":"books",

"_type":"es",

"_id":"2",

"_score":1.0,

"_source":{

"title":"ElasticsearchServerSecondEdition",

"published":2014

}

},{

"_index":"books",

"_type":"es",

"_id":"4",

"_score":1.0,

"_source":{

"title":"MasteringElasticsearchSecondEdition",

"published":2015

}

},{

"_index":"books",

"_type":"solr",

"_id":"2",

"_score":1.0,

"_source":{

"title":"SolrCookbookThirdEdition",

"published":2015

}

},{

"_index":"books",

"_type":"es",

"_id":"1",

"_score":1.0,

"_source":{

"title":"ElasticsearchServer",

"published":2013

}

},{

"_index":"books",

"_type":"solr",

"_id":"1",

"_score":1.0,

"_source":{

"title":"ApacheSolr4Cookbook",

"published":2012

}

},{

"_index":"books",

"_type":"es",

"_id":"3",

"_score":1.0,

"_source":{

"title":"MasteringElasticsearch",

"published":2013

}

}]

}

}

As you can see, the response has a header that tells you the total time of the query and the shards used in the query process. In addition to this, we have the documents matching the query, the top 10 documents by default. Each document is described by the index, type, identifier, score, and the source of the document, which is the original document sent to Elasticsearch.

We can also run queries against many indices. For example, if we had another index called clients, we could run a single query against these two indices as follows:

curl -XGET 'localhost:9200/books,clients/_search?pretty'

We can also run queries against all the data in Elasticsearch by omitting the index names completely or by using _all in place of the index name:

curl -XGET 'localhost:9200/_search?pretty'

curl -XGET 'localhost:9200/_all/_search?pretty'

In a similar manner, we can also choose the types we want to use during searching. For example, if we want to search only in the es type in the books index, we run a command as follows:

curl -XGET 'localhost:9200/books/es/_search?pretty'

Please remember that, in order to search for a given type, we need to specify the index or multiple indices. Elasticsearch allows us to have quite rich semantics when it comes to choosing index names. If you are interested, please refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-index.html; however, there is one thing we would like to point out. When running a query against multiple indices, it may happen that some of them do not exist or are closed. In such cases, the ignore_unavailable property comes in handy. When set to true, it tells Elasticsearch to ignore unavailable or closed indices.

For example, let's try running the following query:

curl -XGET 'localhost:9200/books,non_existing/_search?pretty'

The response would be similar to the following one:

{

"error":{

"root_cause":[{

"type":"index_missing_exception",

"reason":"nosuchindex",

"index":"non_existing"

}],

"type":"index_missing_exception",

"reason":"nosuchindex",

"index":"non_existing"

},

"status":404

}

Now let's check what will happen if we add ignore_unavailable=true to our request and execute the following command:

curl -XGET 'localhost:9200/books,non_existing/_search?pretty&ignore_unavailable=true'

In this case, Elasticsearch would return the results without any error.
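The rules for addressing indices and types in the search URL can be summarized in a small helper. This is a minimal sketch that only builds strings: search_url is a hypothetical function written for this illustration, and it does not send any request.

```python
# Builds the path part of a URI search request from index and type names.

def search_url(indices=None, types=None, ignore_unavailable=False):
    parts = []
    if indices:
        parts.append(",".join(indices))   # one or more comma-separated indices
    elif types:
        parts.append("_all")              # a type can only follow an index part
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    url = "/" + "/".join(parts)
    if ignore_unavailable:
        url += "?ignore_unavailable=true"
    return url


all_data = search_url()
two_indices = search_url(["books", "clients"])
one_type = search_url(["books"], ["es"])
tolerant = search_url(["books", "non_existing"], ignore_unavailable=True)
```

Prefixing the result with the host and port, for example localhost:9200, gives URLs of the same shape as the curl commands shown earlier.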

Elasticsearch query response

Let's assume that we want to find all the documents in our books index that contain the elasticsearch term in the title field. We can do this by running the following query:

curl -XGET 'localhost:9200/books/_search?pretty&q=title:elasticsearch'

The response returned by Elasticsearch for the preceding request will be as follows:

{

"took":37,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.625,

"hits":[{

"_index":"books",

"_type":"es",

"_id":"1",

"_score":0.625,

"_source":{

"title":"ElasticsearchServer",

"published":2013

}

},{

"_index":"books",

"_type":"es",

"_id":"2",

"_score":0.5,

"_source":{

"title":"ElasticsearchServerSecondEdition",

"published":2014

}

},{

"_index":"books",

"_type":"es",

"_id":"4",

"_score":0.5,

"_source":{

"title":"MasteringElasticsearchSecondEdition",

"published":2015

}

},{

"_index":"books",

"_type":"es",

"_id":"3",

"_score":0.19178301,

"_source":{

"title":"MasteringElasticsearch",

"published":2013

}

}]

}

}

The first section of the response gives us information about how much time the request took (the took property is specified in milliseconds), whether it timed out (the timed_out property), and information about the shards that were queried during the request execution: the number of queried shards (the total property of the _shards object), the number of shards that returned the results successfully (the successful property of the _shards object), and the number of failed shards (the failed property of the _shards object). The query may also time out if it is executed for a longer period than we want. (We can specify the maximum query execution time using the timeout parameter.) A failed shard means that something went wrong with that shard or it was not available during the search execution.

Of course, the mentioned information can be useful, but usually, we are interested in the results that are returned in the hits object. We have the total number of documents matching the query (in the total property) and the maximum score calculated (in the max_score property). Finally, we have the hits array that contains the returned documents. In our case, each returned document contains its index name (the _index property), the type (the _type property), the identifier (the _id property), the score (the _score property), and the _source field (usually, this is the JSON object sent for indexing).
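To show how these properties are typically consumed, here is a short sketch that parses a trimmed copy of the response above with Python's standard json module. The summarize function and its output keys are hypothetical, and the response text is shortened to two hits to keep the example small.

```python
import json

# A trimmed copy of the search response: only two of the four hits are kept,
# so hits.total (all matching documents) is larger than the length of hits.hits.
response_text = """
{
  "took": 37,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 4,
    "max_score": 0.625,
    "hits": [
      {"_index": "books", "_type": "es", "_id": "1", "_score": 0.625,
       "_source": {"title": "Elasticsearch Server", "published": 2013}},
      {"_index": "books", "_type": "es", "_id": "2", "_score": 0.5,
       "_source": {"title": "Elasticsearch Server Second Edition", "published": 2014}}
    ]
  }
}
"""


def summarize(response):
    shards = response["_shards"]
    return {
        # Results are complete only if nothing timed out and no shard failed.
        "complete": not response["timed_out"] and shards["failed"] == 0,
        "total": response["hits"]["total"],
        "titles": [
            (hit["_id"], hit["_source"]["title"], hit["_score"])
            for hit in response["hits"]["hits"]
        ],
    }


summary = summarize(json.loads(response_text))
```

Checking timed_out and the failed shard count before trusting the hits is a reasonable habit, because a response can contain results even when some shards did not answer.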

Query analysis

You may wonder why the query we've run in the previous section worked. We indexed the Elasticsearch term and ran a query for elasticsearch, and even though they differ in capitalization, the relevant documents were found. The reason for this is the analysis. During indexing, the underlying Lucene library analyzes the documents and indexes the data according to the Elasticsearch configuration. By default, Elasticsearch will tell Lucene to index and analyze both string-based data as well as numbers. The same happens during querying because the URI request query maps to the query_string query (which will be discussed in Chapter 3, Searching Your Data), and this query is analyzed by Elasticsearch.

Let's use the indices-analyze API (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html). It allows us to see how the analysis process is done. With this, we can see what happened to one of the documents during indexing and what happened to our query phrase during querying.

In order to see what was indexed in the title field for the Elasticsearch Server phrase, we will run the following command:

curl -XGET 'localhost:9200/books/_analyze?pretty&field=title' -d 'Elasticsearch Server'

The response will be as follows:

{

"tokens":[{

"token":"elasticsearch",

"start_offset":0,

"end_offset":13,

"type":"<ALPHANUM>",

"position":0

},{

"token":"server",

"start_offset":14,

"end_offset":20,

"type":"<ALPHANUM>",

"position":1

}]

}

You can see that Elasticsearch has divided the text into two terms: the first one has a token value of elasticsearch and the second one has a token value of server.

Now let's look at how the query text was analyzed. We can do this by running the following command:

curl -XGET 'localhost:9200/books/_analyze?pretty&field=title' -d 'elasticsearch'

The response of the request will look as follows:

{

"tokens":[{

"token":"elasticsearch",

"start_offset":0,

"end_offset":13,

"type":"<ALPHANUM>",

"position":0

}]

}

We can see that the word is the same as the original one that we passed to the query. We won't get into the Lucene query details and how the query parser constructed the query, but in general the indexed term after the analysis was the same as the one in the query after the analysis; so, the document matched the query and the result was returned.
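The effect of analysis on matching can be imitated with a deliberately simplified analyzer. The sketch below is an assumption-laden toy, not the real Elasticsearch analysis chain: toy_analyze is a hypothetical function that merely splits on non-word characters and lowercases, which is enough to reproduce the tokens, offsets, and positions shown above for plain ASCII text.

```python
import re

# A toy analyzer: split the text into word tokens and lowercase each one.

def toy_analyze(text):
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


indexed = toy_analyze("Elasticsearch Server")   # what went into the title field
queried = toy_analyze("elasticsearch")          # what the query text turned into

# The query matches when every analyzed query term is among the indexed terms.
indexed_terms = {t["token"] for t in indexed}
matches = all(t["token"] in indexed_terms for t in queried)
```

Because both sides pass through the same transformation, the capitalized word in the document and the lowercase word in the query end up as the identical term elasticsearch.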

URI query string parameters

There are a few parameters that we can use to control URI query behavior, which we will discuss now. The thing to remember is that the parameters in the query string are separated from each other with the & character, as shown in the following example:

curl -XGET 'localhost:9200/books/_search?pretty&q=published:2013&df=title&explain=true&default_operator=AND'

Please remember to enclose the URL of the request in ' characters because, on Linux-based systems, the & character would otherwise be interpreted by the shell.
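When the request is assembled in code rather than typed in a shell, the query string can be built from a dictionary, which avoids manual concatenation and quoting mistakes. The following sketch uses urlencode from Python's standard library; the params dictionary is just the example above restated, with pretty written as pretty=true, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# The parameters from the example request, in the same order.
params = {
    "pretty": "true",
    "q": "published:2013",
    "df": "title",
    "explain": "true",
    "default_operator": "AND",
}

# safe=":" keeps the field:value separator readable instead of encoding it as %3A.
query_string = urlencode(params, safe=":")
url = "localhost:9200/books/_search?" + query_string
```

Characters with a special meaning in URLs, such as spaces or & inside a value, are percent-encoded by urlencode, so a value can never be mistaken for a parameter separator.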

The query

The q parameter allows us to specify the query that we want our documents to match. It allows us to specify the query using the Lucene query syntax described in the Lucene query syntax section later in this chapter. For example, a simple query would look like this: q=title:elasticsearch.

The default search field

Using the df parameter, we can specify the default search field that should be used when no field indicator is used in the q parameter. By default, the _all field will be used. (This is the field that Elasticsearch uses to copy the content of all the other fields. We will discuss this in greater depth in Chapter 2, Indexing Your Data.) An example of the df parameter value can be df=title.

Analyzer

The analyzer property allows us to define the name of the analyzer that should be used to analyze our query. By default, our query will be analyzed by the same analyzer that was used to analyze the field contents during indexing.

The default operator property

The default_operator property, which can be set to OR or AND, allows us to specify the default Boolean operator used for our query (http://en.wikipedia.org/wiki/Boolean_algebra). By default, it is set to OR, which means that a single query term match will be enough for a document to be returned. Setting this parameter to AND for a query will result in returning the documents that match all the query terms.

Query explanation

If we set the explain parameter to true, Elasticsearch will include additional explain information with each document in the result, such as the shard from which the document was fetched and the detailed information about the scoring calculation (we will talk more about it in the Understanding the explain information section in Chapter 6, Make Your Search Better). Also remember not to fetch the explain information during normal search queries because it requires additional resources and degrades query performance. For example, a query that includes explain information could look as follows:

curl -XGET 'localhost:9200/books/_search?pretty&explain=true&q=title:solr'

The results returned by Elasticsearch for the preceding query would be as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":2,

"max_score":0.70273256,

"hits":[{

"_shard":2,

"_node":"v5iRsht9SOWVzu-GY-YHlA",

"_index":"books",

"_type":"solr",

"_id":"2",

"_score":0.70273256,

"_source":{

"title":"SolrCookbookThirdEdition",

"published":2015

},

"_explanation":{

"value":0.70273256,

"description":"weight(title:solr in 0) [PerFieldSimilarity], result of:",

"details":[{

"value":0.70273256,

"description":"fieldWeightin0,productof:",

"details":[{

"value":1.0,

"description":"tf(freq=1.0),withfreqof:",

"details":[{

"value":1.0,

"description":"termFreq=1.0",

"details":[]

}]

},{

"value":1.4054651,

"description":"idf(docFreq=1,maxDocs=3)",

"details":[]

},{

"value":0.5,

"description":"fieldNorm(doc=0)",

"details":[]

}]

}]

}

},{

"_shard":3,

"_node":"v5iRsht9SOWVzu-GY-YHlA",

"_index":"books",

"_type":"solr",

"_id":"1",

"_score":0.5,

"_source":{

"title":"ApacheSolr4Cookbook",

"published":2012

},

"_explanation":{

"value":0.5,

"description":"weight(title:solr in 1) [PerFieldSimilarity], result of:",

"details":[{

"value":0.5,

"description":"fieldWeightin1,productof:",

"details":[{

"value":1.0,

"description":"tf(freq=1.0),withfreqof:",

"details":[{

"value":1.0,

"description":"termFreq=1.0",

"details":[]

}]

},{

"value":1.0,

"description":"idf(docFreq=1,maxDocs=2)",

"details":[]

},{

"value":0.5,

"description":"fieldNorm(doc=1)",

"details":[]

}]

}]

}

}]

}

}
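The numbers in the explanation can be reproduced by hand. The sketch below assumes that the index uses Lucene's classic TF-IDF similarity, which the tf, idf, and fieldNorm entries suggest but the text does not state, and that the formulas are tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), and fieldNorm = 1 / sqrt(number of terms in the field). The function classic_score is a hypothetical helper, and it ignores the lossy one-byte encoding that Lucene applies to stored norms.

```python
import math

# Recomputes fieldWeight = tf * idf * fieldNorm for a single-term query,
# under the assumed classic TF-IDF formulas described above.

def classic_score(term_freq, doc_freq, max_docs, field_length):
    tf = math.sqrt(term_freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    field_norm = 1.0 / math.sqrt(field_length)
    return tf * idf * field_norm


# "Solr Cookbook Third Edition": 4 terms, idf(docFreq=1, maxDocs=3) on its shard.
first = classic_score(term_freq=1, doc_freq=1, max_docs=3, field_length=4)

# "Apache Solr 4 Cookbook": 4 terms, idf(docFreq=1, maxDocs=2) on its shard.
second = classic_score(term_freq=1, doc_freq=1, max_docs=2, field_length=4)
```

Both documents contain the term once and have titles of the same length, so the different scores come only from idf, which is computed per shard: the two documents live on different shards with different document counts.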

The fields returned

By default, for each document returned, Elasticsearch will include the index name, the type name, the document identifier, score, and the _source field. We can modify this behavior by adding the fields parameter and specifying a comma-separated list of field names. The fields will be retrieved from the stored fields (if they exist; we will discuss them in Chapter 2, Indexing Your Data) or from the internal _source field. By default, the value of the fields parameter is _source. An example is: fields=title,priority.

We can also disable the fetching of the _source field by adding the _source parameter with its value set to false.

Sorting the results

Using the sort parameter, we can specify custom sorting. The default behavior of Elasticsearch is to sort the returned documents in descending order of the value of the _score field. If we want to sort our documents differently, we need to specify the sort parameter. For example, adding sort=published:desc will sort the documents in descending order of the published field. By adding the sort=published:asc parameter, we will tell Elasticsearch to sort the documents on the basis of the published field in ascending order.

If we specify custom sorting, Elasticsearch will omit the _score field calculation for the documents. This may not be the desired behavior in your case. If you still want to keep track of the scores for each document when using a custom sort, you should add the track_scores=true property to your query. Please note that tracking the scores when doing custom sorting will make the query a little bit slower (you may not even notice the difference) due to the processing power needed to calculate the score.

The search timeout

By default, Elasticsearch doesn't have a timeout for queries, but you may want your queries to time out after a certain amount of time (for example, 5 seconds). Elasticsearch allows you to do this by exposing the timeout parameter. When the timeout parameter is specified, the query will be executed up to the given timeout value and the results that were gathered up to that point will be returned. To specify a timeout of 5 seconds, you will have to add the timeout=5s parameter to your query.

The results window

Elasticsearch allows you to specify the results window (the range of documents in the results list that should be returned). We have two parameters that allow us to specify the results window: size and from. The size parameter defaults to 10 and defines the maximum number of results returned. The from parameter defaults to 0 and specifies from which document the results should be returned. In order to return five documents starting from the 11th one, we will add the following parameters to the query: size=5&from=10.
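The relation between a page number and the from and size parameters is simple arithmetic, sketched below. Both results_window and apply_window are hypothetical helpers written for this illustration; the second one only imitates on a Python list what the window selects from the full result list.

```python
# Translates a 1-based page number into the from/size pair of a results window.

def results_window(page, page_size=10):
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}


def apply_window(hits, window):
    # "from" is a zero-based offset, so from=10 starts at the 11th result.
    start = window["from"]
    return hits[start:start + window["size"]]


window = results_window(page=3, page_size=5)
ranked = list(range(1, 21))            # stand-in for 20 ranked results
page_items = apply_window(ranked, window)
```

The third page of five results corresponds to size=5&from=10, the same pair used in the example above.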

Limiting per-shard results

Elasticsearch allows us to specify the maximum number of documents that should be fetched from each shard using the terminate_after property. For example, if we want to get no more than 100 documents from each shard, we can add terminate_after=100 to our URI request.

Ignoring unavailable indices

When running queries against multiple indices, it is handy to tell Elasticsearch that we don't care about the indices that are not available. By default, Elasticsearch will throw an error if one of the indices is not available, but we can change this by simply adding the ignore_unavailable=true parameter to our URI request.

The search type

The URI query allows us to specify the search type using the search_type parameter, which defaults to query_then_fetch. Two values that we can use here are: dfs_query_then_fetch and query_then_fetch. The rest of the search types available in older Elasticsearch versions are now deprecated or removed. We'll learn more about search types in the Understanding the querying process section of Chapter 3, Searching Your Data.

Lowercasing term expansion

Some queries, such as the prefix query, use query expansion. We will discuss this in the Query rewrite section in Chapter 4, Extending Your Querying Knowledge. We are allowed to define whether the expanded terms should be lowercased or not using the lowercase_expanded_terms property. By default, the lowercase_expanded_terms property is set to true, which means that the expanded terms will be lowercased.

Wildcard and prefix analysis

By default, wildcard queries and prefix queries are not analyzed. If we want to change this behavior, we can set the analyze_wildcard property to true.

Note

If you want to see all the parameters exposed by Elasticsearch as the URI request parameters, please refer to the official documentation available at: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-uri-request.html.

Lucene query syntax

We thought that it would be good to know a bit more about what syntax can be used in the q parameter passed in the URI query. Some of the queries in Elasticsearch (such as the one currently being discussed) support the Lucene query parser syntax, the language that allows you to construct queries. Let's take a look at it and discuss some basic features.

A query that we pass to Lucene is divided into terms and operators by the query parser. Let's start with the terms; there are two types of them: single terms and phrases. For example, to query for the book term in the title field, we will pass the following query:

title:book

To query for the elasticsearch book phrase in the title field, we will pass the following query:

title:"elasticsearch book"

You may have noticed that the name of the field comes first, followed by the term or the phrase.

As we already said, the Lucene query syntax supports operators. For example, the + operator tells Lucene that the given part must be matched in the document, meaning that the term we are searching for must be present in the field in the document. The - operator is the opposite, which means that such a part of the query can't be present in the document. A part of the query without the + or - operator is optional: it can be matched, but it is not mandatory. So, if we want to find a document with the book term in the title field and without the cat term in the description field, we send the following query:

+title:book -description:cat
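Queries of this shape are easy to assemble programmatically from lists of required, prohibited, and optional clauses. The helper below, lucene_query, is a hypothetical function for illustration only; it joins ready-made clauses and does not escape special characters or validate field names.

```python
# Builds a Lucene query string from required (+), prohibited (-) and optional clauses.

def lucene_query(must=(), must_not=(), should=()):
    parts = ["+" + clause for clause in must]
    parts += ["-" + clause for clause in must_not]
    parts += list(should)                 # no prefix: may match, but is not required
    return " ".join(parts)


query = lucene_query(must=["title:book"], must_not=["description:cat"])
with_optional = lucene_query(must=["title:book"], should=["title:elasticsearch"])
```

The first call yields exactly the query shown above; the second one requires book in the title and treats elasticsearch as an optional clause.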

We can also group multiple terms with parentheses, as shown in the following query:

title:(crime punishment)

We can also boost parts of the query (this increases their importance for the scoring algorithm: the higher the boost, the more important the query part is) with the ^ operator and the boost value after it, as shown in the following query:

title:book^4

These are the basics of the Lucene query language and should allow you to use Elasticsearch and construct queries without any problems. However, if you are interested in the Lucene query syntax and you would like to explore that in depth, please refer to the official documentation of the query parser available at http://lucene.apache.org/core/5_4_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html.

Summary

In this chapter, we learned what full text search is and the contribution Apache Lucene makes to this. In addition to this, we are now familiar with the basic concepts of Elasticsearch and its top-level architecture. We used the Elasticsearch REST API not only to index data, but also to update, retrieve, and finally delete it. We've learned what versioning is and how we can use it for optimistic locking in Elasticsearch. Finally, we searched our data using the simple URI query.

In the next chapter, we'll focus on indexing our data. We will see how Elasticsearch indexing works and what the role of primary shards and replicas is. We'll see how Elasticsearch handles data that it doesn't know and how to create our own mappings, the JSON structure that describes the structure of our index. We'll also learn how to use batch indexing to speed up the indexing process and what additional information can be stored along with our index to help us achieve our goal. In addition, we will discuss what an index segment is, what segment merging is, and how to tune a segment. Finally, we'll see how routing works in Elasticsearch and what options we have when it comes to both indexing and querying routing.

Chapter2.IndexingYourDataInthepreviouschapter,welearnedwhatfulltextsearchisandhowApacheLucenefitsthere.WewereintroducedtothebasicconceptsofElasticsearchandwearenowfamiliarwithitstop-levelarchitecture,soweknowhowitworks.WeusedtheRESTAPItoindexdata,toupdateit,todeleteit,andofcoursetoretrieveit.WesearchedourdatawiththesimpleURIqueryandweusedversioningthatallowedustouseoptimisticlockingfunctionality.Bytheendofthischapter,youwillhavelearnedthefollowingtopics:

BasicinformationaboutElasticsearchindexingAdjustingElasticsearchschema-lessbehaviorCreatingyourownmappingsUsingoutoftheboxanalyzersConfiguringyourownanalyzersIndexdatainbatchesAddingadditionalinternalinformationtoindicesSegmentmergingRouting

Elasticsearch indexing

So far we have our Elasticsearch cluster up and running. We also know how to use the Elasticsearch REST API to index our data, we know how to retrieve it, and we also know how to remove the data that we no longer need. We’ve also learned how to search in our data by using the URI request search and the Apache Lucene query language. However, until now we’ve used Elasticsearch functionality that allows us not to care about indices, shards, and data structure. This is not something that you may be used to when you are coming from the world of SQL databases, where you need the database and the tables with all the columns created upfront. In general, you need to describe the data structure to be able to put data into the database. Elasticsearch is schema-less and by default creates indices automatically, and because of that we can just install it and index data without any preparation. However, this is usually not the best situation when it comes to production environments where you want to control the analysis of your data. Because of that, we will start by showing you how to manage your indices and then we will take you through the world of mappings in Elasticsearch.

Shards and replicas

In Chapter 1, Getting Started with Elasticsearch Cluster, we told you that indices in Elasticsearch are built from one or more shards. Each of those shards contains part of the document set and each shard is a separate Lucene index. In addition to that, each shard can have replicas, which are physical copies of the primary shard itself. When we create an index, we can tell Elasticsearch how many shards it should be built from.

Note

The default number of shards that Elasticsearch uses is 5 and each index will also contain a single replica. The default configuration can be changed by setting the index.number_of_shards and index.number_of_replicas properties in the elasticsearch.yml configuration file.

When defaults are used, we will end up with five Apache Lucene indices that our Elasticsearch index is built of and one replica for each of those. So, with five shards and one replica, we would actually get 10 shards. This is because each shard would get its own copy, so the total number of shards in the cluster would be 10.

Dividing indices in such a way allows us to spread the shards across the cluster. The nice thing about that is that all the shards will be automatically spread throughout the cluster. If we have a single node, Elasticsearch will put the five primary shards on that node and will leave the replicas unassigned, because Elasticsearch doesn’t assign shards and their replicas to the same node. The reason for that is simple: if a node crashed, we would lose both the primary source of the data and all the copies. So, if you have one Elasticsearch node, don’t worry about replicas not being assigned; it is something to be expected. Of course, when you have enough nodes for Elasticsearch to assign all the replicas (in addition to shards), it is not good to have them unassigned and you should look for the probable causes of that situation.

The thing to remember is that having shards and replicas is not free. First of all, each replica needs additional disk space, exactly the same amount of space that the original shard needs. So if we have 3 replicas for our index, we will actually need 4 times more space. If our primary shards weigh 100 GB in total, with 3 replicas we would need 400 GB: 100 GB for the primaries and 100 GB for each replica. However, this is not the only cost. Each replica is a Lucene index on its own and Elasticsearch needs some memory to handle that. The more shards in the cluster, the more memory is being used. And finally, having replicas means that we will have to do indexation on each of the replicas, in addition to the indexation on the primary shard. There is a notion of shadow replicas which can copy the whole binary index, but, in most cases, each replica will do its own indexation. The good thing about replicas is that Elasticsearch will try to spread the query and get requests evenly between the shards and their replicas, which means that we can scale our cluster horizontally by using them.
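The arithmetic from the last few paragraphs can be written down as a small sketch. The function names below are our own and are not part of any Elasticsearch API; the sketch only assumes, as the text does, that every replica is a full copy of its primary.

```python
def total_shard_copies(primary_shards, replicas):
    """Primaries plus one full copy of every primary per replica."""
    return primary_shards * (1 + replicas)


def total_disk_gb(primary_size_gb, replicas):
    """Disk space needed when each replica is as large as the primary data."""
    return primary_size_gb * (1 + replicas)


# Default index: 5 primary shards with 1 replica each.
print(total_shard_copies(5, 1))

# 100 GB of primary data kept with 3 replicas.
print(total_disk_gb(100, 3))
```

Changing the replica count in such a calculation is cheap, which mirrors the fact that replicas can be adjusted on a live cluster, while the number of primary shards is fixed at index creation.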

So to sum up the conclusions:

Having more shards in the index allows us to spread the index between more servers and parallelize the indexing operations and thus have better indexing throughput.

Depending on your deployment, having more shards may increase query throughput and lower query latency, especially in environments that don’t have a large number of queries per second.

Having more shards may be slower compared to a single shard query, because Elasticsearch needs to retrieve the data from multiple servers and combine them together in memory, before returning the final query results.

Having more replicas results in a more resilient cluster, because when the primary shard is not available, its copy will take that role. Basically, having a single replica allows us to lose one copy of a shard and still serve the whole data. Having two replicas allows us to lose two copies of the shard and still see the whole data.

The higher the replica count, the higher the query throughput the cluster will have. That’s because each replica can serve the data it has independently from all the others.

The higher number of shards (both primary and replicas) will result in more memory needed by Elasticsearch.

Of course, these are not the only relationships between the number of shards and replicas in Elasticsearch. We will talk about most of them later in the book.

So, how many shards and replicas should we have for our indices? That depends. We believe that the defaults are quite good but nothing can replace a good test. Note that the number of replicas is not very important because you can adjust it on a live cluster after index creation. You can remove and add them if you want and have the resources to run them. Unfortunately, this is not true when it comes to the number of shards. Once you have your index created, the only way to change the number of shards is to create another index and re-index your data.

Write consistency

Elasticsearch allows us to control the write consistency to prevent writes happening when they should not. By default, an Elasticsearch indexing operation is successful when the write is successful on a quorum of active shards, meaning 50% of the active shards plus one. We can control this behavior by adding action.write_consistency to our elasticsearch.yml file or by adding the consistency parameter to our index request. The mentioned properties can take the following values:

quorum: The default value, requiring 50% plus 1 active shards to be successful for the index operation to succeed

one: Requires only a single active shard to be successful for the index operation to succeed

all: Requires all the active shards to be successful for the index operation to succeed
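To make the three values concrete, here is a minimal sketch of the rule exactly as it is worded above. The function name is hypothetical, and the sketch deliberately ignores any special cases a particular Elasticsearch version may apply for very small numbers of shard copies.

```python
def required_successful_copies(consistency, active_copies):
    """Number of shard copies that must accept a write for it to succeed.

    Follows the wording of the text: quorum means 50% plus one.
    """
    if consistency == "one":
        return 1
    if consistency == "all":
        return active_copies
    if consistency == "quorum":
        return active_copies // 2 + 1
    raise ValueError("unknown consistency level: %r" % consistency)


# One primary and two replicas give three active copies.
for level in ("one", "quorum", "all"):
    print(level, required_successful_copies(level, 3))
```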

Creating indices

When we were indexing our documents in Chapter 1, Getting Started with Elasticsearch Cluster, we didn’t care about index creation at all. We assumed that Elasticsearch will do everything for us and actually it was true; we just used the following command:

curl -XPUT 'http://localhost:9200/blog/article/1' -d '{"title": "New version of Elasticsearch released!", "content": "Version 1.0 released today!", "tags": ["announce", "elasticsearch", "release"]}'

This is just fine. If such an index does not exist, Elasticsearch automatically creates the index for us. However, there are times when we want to create indices ourselves for various reasons. Maybe we would like to have control over which indices are created to avoid errors or maybe we have some nondefault settings that we would like to use when creating a particular index. The reasons may differ, but it’s good to know that we can create indices without indexing documents.

The simplest way to create an index is to run a PUT HTTP request with the name of the index we want to create. For example, to create an index called blog, we could use the following command:

curl -XPUT http://localhost:9200/blog/

We just told Elasticsearch that we want to create the index with the name blog. If everything goes right, you will see the following response from Elasticsearch:

{"acknowledged":true}

Altering automatic index creation

We already mentioned that automatic index creation is not the best idea in some cases. For example, a simple typo during index creation can lead to creating hundreds of unused indices and make cluster state information larger than it should be, putting more pressure on Elasticsearch and the underlying JVM. Because of that, we can turn off automatic index creation by adding a simple property to the elasticsearch.yml configuration file:

action.auto_create_index: false

Let’s stop for a while and discuss the action.auto_create_index property, because it allows us to do more complicated things than just allowing (setting it to true) and disabling (setting it to false) automatic index creation. The mentioned property allows us to use patterns that specify the index names which should be allowed to be automatically created and which should be disallowed. For example, let’s assume that we would like to allow automatic index creation for indices starting with logs and we would like to disallow all the others. To do something like this, we would set the action.auto_create_index property to something as follows:

action.auto_create_index: +logs*,-*

Now if we would like to create an index called logs_2015-10-01, we would succeed. To create such an index, we would use the following command:

curl -XPUT http://localhost:9200/logs_2015-10-01/log/1 -d '{"message": "Test log message"}'

Elasticsearch would respond with:

{
  "_index": "logs_2015-10-01",
  "_type": "log",
  "_id": "1",
  "_version": 1,
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}

However, suppose we now try to create the blog index using the following command:

curl -XPUT http://localhost:9200/blog/article/1 -d '{"title": "Test article title"}'

Elasticsearch would respond with an error similar to the following one:

{
  "error": {
    "root_cause": [{
      "type": "index_not_found_exception",
      "reason": "no such index",
      "resource.type": "index_expression",
      "resource.id": "blog",
      "index": "blog"
    }],
    "type": "index_not_found_exception",
    "reason": "no such index",
    "resource.type": "index_expression",
    "resource.id": "blog",
    "index": "blog"
  },
  "status": 404
}

One thing to remember is that the order of pattern definitions matters. Elasticsearch checks the patterns up to the first pattern that matches, so if we move -* as the first pattern, the +logs* pattern won’t be used at all.
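The first-match-wins rule is easy to get wrong, so a small model of it may help. This is only a sketch under our own assumptions: the function name is hypothetical, Python’s fnmatch wildcards stand in for Elasticsearch’s own pattern matching (which may differ in detail), and we assume that a name matching no pattern is not created automatically.

```python
from fnmatch import fnmatchcase


def may_auto_create(index_name, patterns):
    """Evaluate a '+pattern,-pattern' list; the first matching pattern decides."""
    for raw in patterns.split(","):
        pattern = raw.strip()
        if not pattern:
            continue
        allowed = not pattern.startswith("-")
        if pattern[0] in "+-":
            pattern = pattern[1:]
        if fnmatchcase(index_name, pattern):
            return allowed
    # Assumption: no matching pattern means no automatic creation.
    return False


print(may_auto_create("logs_2015-10-01", "+logs*,-*"))  # allowed by +logs*
print(may_auto_create("blog", "+logs*,-*"))             # rejected by -*
print(may_auto_create("logs_2015-10-01", "-*,+logs*"))  # -* matches first
```

With the patterns reversed, every name is caught by -* before +logs* is ever consulted, which is exactly the pitfall described above.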

Settings for a newly created index

Manual index creation is also necessary when we want to pass nondefault configuration options during index creation; for example, the initial number of shards and replicas. We can do that by including a JSON payload with settings as the PUT HTTP request body. For example, if we would like to tell Elasticsearch that our blog index should only have a single shard and two replicas initially, the following command could be used:

curl -XPUT http://localhost:9200/blog/ -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 2
  }
}'

The preceding command will result in the creation of the blog index with one shard and two replicas, making a total of three physical Lucene indices, called shards as we already know. Of course there are a lot more settings that we can use, but what we did is enough for now and we will learn about the rest throughout the book.

Index deletion

Of course, similar to how we handled documents, Elasticsearch allows us to delete indices as well. Deleting an index is very similar to creating it, but instead of using the PUT HTTP method, we use the DELETE one. For example, if we would like to delete our previously created blog index, we would run the following command:

curl -XDELETE http://localhost:9200/blog

The response will be the same as the one we saw earlier when we created an index and should look as follows:

{"acknowledged":true}

Now that we know what an index is, how to create it, and how to delete it, we are ready to create indices with the mappings we have defined. Even though Elasticsearch is schema-less, there are a lot of situations where we would like to manually create the schema, to avoid any problems with the index structure.

Mappings configuration

If you are used to SQL databases, you may know that before you can start inserting the data in the database, you need to create a schema, which will describe what your data looks like. Although Elasticsearch is a schema-less (we rather call it data driven schema) search engine and can figure out the data structure on the fly, we think that controlling the structure and thus defining it ourselves is a better way. The field type determining mechanism is not going to guess the future. For example, if you first send an integer value, such as 60, and then send a float value such as 70.23 for the same field, an error can happen or Elasticsearch will just cut off the decimal part of the float value (which is actually what happens). This is because Elasticsearch will first set the field type to integer and will then try to index the float value into the integer field, which cuts off the decimal part of the floating point number. In the next few pages you’ll see how to create mappings that suit your needs and match your data structure.

Note

Note that we didn’t include all the information about the available types in this chapter and some features of Elasticsearch, such as nested type, parent-child handling, storing geographical points, and search, are described in the following chapters of this book.

Type determining mechanism

Before we start describing how to create mappings manually, we want to get back to the automatic type determining algorithm used in Elasticsearch. As we already said, Elasticsearch can try guessing the schema for our documents by looking at the JSON that the document is built from. Because JSON is structured, that seems easy to do. For example, strings are surrounded by quotation marks, Booleans are defined using specific words, and numbers are just a few digits. This is a simple trick, but it usually works. For example, let’s look at the following document:

{
  "field1": 10,
  "field2": "10"
}

The preceding document has two fields. The field1 field will be given a number type (to be precise, that field will be given a long type). The second field, called field2, will be given a string type, because it is surrounded by quotation marks. Of course, for some use cases this can be the desired behavior. However, if somehow we would surround all the data using quotation marks (which is not the best idea anyway), our index structure would contain only string type fields.
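The guessing trick can be imitated in a few lines. The following is a simplified sketch, not the actual Elasticsearch algorithm: the function name is our own, and mapping floating point numbers to double is our assumption rather than something stated above.

```python
import json


def guess_type(value):
    """Guess a field type from the parsed JSON value alone."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"           # assumption, not stated in the text
    if isinstance(value, str):
        return "string"
    return "other"


document = json.loads('{"field1": 10, "field2": "10"}')
guessed = {name: guess_type(value) for name, value in document.items()}
print(guessed)
```

The quoted "10" ends up as a string field, which is exactly the kind of guess that the tuning options in the following sections are meant to correct.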

Note

Don’t worry if you are not yet familiar with the numeric types, the string types, and so on. We will describe them after we show you what you can do to tune the automatic type determining mechanism in Elasticsearch.

Disabling the type determining mechanism

The first solution is to completely disable the schema-less behavior in Elasticsearch. We can do that by adding the index.mapper.dynamic property to our index properties and setting it to false. We can do that by running the following command to create the index:

curl -XPUT 'localhost:9200/sites' -d '{
  "index.mapper.dynamic": false
}'

By doing that we told Elasticsearch that we don’t want it to guess the type of our documents in the sites index and that we will provide the mappings ourselves. If we try indexing some example document to the sites index, we will get the following error:

{
  "error": {
    "root_cause": [{
      "type": "type_missing_exception",
      "reason": "type[[doc, trying to auto create mapping, but dynamic mapping is disabled]] missing",
      "index": "sites"
    }],
    "type": "type_missing_exception",
    "reason": "type[[doc, trying to auto create mapping, but dynamic mapping is disabled]] missing",
    "index": "sites"
  },
  "status": 404
}

This is because we didn’t create any mappings; no schema for documents was created. Elasticsearch couldn’t create one for us because we didn’t allow it and the indexation command failed.

Of course this is not the only thing we can do when it comes to configuring how the type determining mechanism works. We can also tune it or disable it for a given type on the object level. We will talk about the second case in Chapter 5, Extending Your Index Structure. For now, let’s look at the possibilities of tuning the type determining mechanism in Elasticsearch.

Tuning the type determining mechanism for numeric types

One of the problems with JSON documents and type guessing is that we are not always in control of the data. The documents that we are indexing can come from multiple places and some systems in our environment may include quotation marks for all the fields in the document. This can lead to problems and bad guesses. Because of that, Elasticsearch allows us to enable more aggressive field value checking for numeric fields by setting the numeric_detection property to true in the mappings definition. For example, let’s assume that we want to create an index called users and we want it to have the user type on which we will want more aggressive numeric fields parsing. To do that, we will use the following command:

curl -XPUT http://localhost:9200/users/?pretty -d '{
  "mappings": {
    "user": {
      "numeric_detection": true
    }
  }
}'

Now let’s run the following command to index a single document to the users index:

curl -XPOST http://localhost:9200/users/user/1 -d '{"name": "User1", "age": "20"}'

Earlier, with the default settings, the age field would be set to string type. With the numeric_detection property set to true, the type of the age field will be set to long. We can check that by running the following command (it will retrieve the mappings for all the types in the users index):

curl -XGET 'localhost:9200/users/_mapping?pretty'

The preceding command should result in the following response returned by Elasticsearch:

{
  "users": {
    "mappings": {
      "user": {
        "numeric_detection": true,
        "properties": {
          "age": {
            "type": "long"
          },
          "name": {
            "type": "string"
          }
        }
      }
    }
  }
}

As we can see, the age field was really set to be of type long.
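The effect of numeric_detection on quoted values can be modelled roughly as follows. This is a sketch under our own assumptions: the function name is hypothetical, and Python’s int() and float() parsing is only an approximation of whatever number parsing Elasticsearch applies.

```python
def guess_field_type(value, numeric_detection=False):
    """Guess a field type, optionally trying to read strings as numbers."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str) and numeric_detection:
        try:
            int(value)
            return "long"
        except ValueError:
            pass
        try:
            float(value)
            return "double"
        except ValueError:
            pass
    return "string"


document = {"name": "User1", "age": "20"}
print({k: guess_field_type(v) for k, v in document.items()})
print({k: guess_field_type(v, numeric_detection=True) for k, v in document.items()})
```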

Tuning the type determining mechanism for dates

Another type of data that causes trouble is fields with dates. Dates can come in different flavors, for example, 2015-10-01 11:22:33 is a proper date and so is 2015-10-01T11:22:33+00. Because of that, Elasticsearch tries to match the fields to timestamps or strings that match some given date format. If that matching operation is successful, the field is treated as a date based one. If we know how our date fields look, we can help Elasticsearch by providing a list of recognized date formats using the dynamic_date_formats property, which allows us to specify the formats array. Let’s look at the following command for creating an index:

curl -XPUT 'http://localhost:9200/blog/' -d '{
  "mappings": {
    "article": {
      "dynamic_date_formats": ["yyyy-MM-dd hh:mm"]
    }
  }
}'

The preceding command will result in the creation of an index called blog with the single type called article. We’ve also used the dynamic_date_formats property with a single date format that will result in Elasticsearch using the date core type (refer to the Core types section in this chapter for more information about field types) for fields matching the defined format. Elasticsearch uses the joda-time library to define the date formats, so visit http://joda-time.sourceforge.net/api-release/org/joda/time/format/DateTimeFormat.html if you are interested in knowing about them.

Note

Remember that the dynamic_date_formats property accepts an array of values. That means that we can handle several date formats simultaneously.

With the preceding index, we can now try indexing a new document using the following command:

curl -XPUT localhost:9200/blog/article/1 -d '{"name": "Test", "test_field": "2015-10-01 11:22"}'

Elasticsearch will of course index that document, but let’s look at the mappings created for our index:

curl -XGET 'localhost:9200/blog/_mapping?pretty'

The response for the preceding command will be as follows:

{
  "blog": {
    "mappings": {
      "article": {
        "dynamic_date_formats": ["yyyy-MM-dd hh:mm"],
        "properties": {
          "name": {
            "type": "string"
          },
          "test_field": {
            "type": "date",
            "format": "yyyy-MM-dd hh:mm"
          }
        }
      }
    }
  }
}

As we can see, the test_field field was given a date type, so our tuning works.
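The idea of trying a list of formats until one matches can be sketched with Python’s own date parsing. Note the assumptions: the function name is hypothetical, and the strptime pattern below is only a rough stand-in for the Joda pattern used above, because the two libraries use different pattern syntaxes.

```python
from datetime import datetime


def looks_like_date(value, formats):
    """Return True if the string parses with at least one of the formats."""
    for fmt in formats:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


# Rough Python stand-in for the date format configured for the blog index.
FORMATS = ["%Y-%m-%d %H:%M"]

print(looks_like_date("2015-10-01 11:22", FORMATS))
print(looks_like_date("Test", FORMATS))
```

Because the formats are held in a list, adding a second pattern is enough to recognize another date flavor, which mirrors the array accepted by dynamic_date_formats.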

Unfortunately, the problem still exists if we want the Boolean type to be guessed. There is no option to force the guessing of Boolean types from the text. In such cases, when a change of source format is impossible, we can only define the field directly in the mappings definition.

Index structure mapping

Every dataset has its own structure: some are very simple, and some include complicated object relations, children documents, and nested properties. In each case, we need to have a schema in Elasticsearch, called mappings, that defines how the data looks. Of course, we can use the schema-less nature of Elasticsearch, but we can and we usually want to prepare the mappings upfront, so we know how the data is handled.

For the purposes of this chapter, we will use a single type in the index. Of course, Elasticsearch as a multitenant system allows us to have multiple types in a single index, but we want to simplify the example, to make it easier to understand. So, for the purpose of the next few pages, we will create an index called posts that will hold data for documents in a post type. We also assume that the index will hold the following information:

Unique identifier of the blog post

Name of the blog post

Publication date

Contents: text of the post itself

In Elasticsearch, mappings, as with almost all communication, are sent as JSON objects in the request body. So, if we want to create the simplest mappings that match our needs, they will look as follows (we stored the mappings in the posts.json file, so we can easily send it):

{
  "mappings": {
    "post": {
      "properties": {
        "id": {"type": "long"},
        "name": {"type": "string"},
        "published": {"type": "date"},
        "contents": {"type": "string"}
      }
    }
  }
}

To create our posts index with the preceding mappings file, we will just run the following command:

curl -XPOST 'http://localhost:9200/posts' -d @posts.json

Note

Note that you can store your mappings in a file with whatever name you like. The curl command will just take the contents of it.

And again, if everything goes well, we see the following response:

{"acknowledged":true}

Elasticsearch reported that our index has been created. If we look at the log of the Elasticsearch node that is the current master, we will see something as follows:

[2015-10-14 15:02:12,840][INFO ][cluster.metadata] [Shalla-Bal] [posts] creating index, cause [api], templates [], shards [5]/[1], mappings [post]

We can see that the posts index has been created, with 5 shards and 1 replica (shards [5]/[1]) and with mappings for a single post type (mappings [post]). Let’s now discuss the contents of the posts.json file and the possibilities when it comes to mappings.

Type and types definition

The mappings definition in Elasticsearch is just another JSON object, so it needs to be properly started and ended with curly brackets. All the mappings definitions are nested inside a single mappings object. In our example, we had a single post type, but we can have multiple of them. For example, if we would like to have more than a single type in our mappings, we just need to separate them with a comma character. Let’s assume that we would like to have an additional user type in our posts index. The mappings definition in such a case will look as follows (we stored it in the posts_with_user.json file):

{
  "mappings": {
    "post": {
      "properties": {
        "id": {"type": "long"},
        "name": {"type": "string"},
        "published": {"type": "date"},
        "contents": {"type": "string"}
      }
    },
    "user": {
      "properties": {
        "id": {"type": "long"},
        "name": {"type": "string"}
      }
    }
  }
}

As you can see, we can name the types with the names we want. Under each type we have the properties object in which we store the actual name of the fields and their definition.
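When the mappings are produced by a program rather than written by hand, the same nesting can be generated from a simple description of types and fields. The helper below is a sketch of our own, not an Elasticsearch client API; it only builds the JSON body that would be stored in a file such as posts_with_user.json.

```python
import json


def build_mappings(types):
    """Turn {type: {field: field_type}} into a mappings request body."""
    return {
        "mappings": {
            type_name: {
                "properties": {
                    field: {"type": field_type}
                    for field, field_type in fields.items()
                }
            }
            for type_name, fields in types.items()
        }
    }


body = build_mappings({
    "post": {"id": "long", "name": "string", "published": "date", "contents": "string"},
    "user": {"id": "long", "name": "string"},
})
print(json.dumps(body, indent=2))
```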

Fields

Each field in the mappings definition is just a name and an object describing the properties of the field. For example, we can have a field defined as the following:

"body":{"type":"string","store":"yes","index":"analyzed"}

The preceding field definition starts with a name: body. After that we have an object with three properties: the type of the field (the type property), whether the original field value should be stored (the store property), and whether the field should be indexed and how (the index property). And, of course, multiple field definitions are separated from each other using the comma character, just like other JSON objects.

Core types

Each field in Elasticsearch can be given one of the provided core types. The core types in Elasticsearch are as follows:

String

Number (integer, long, float, double)

Date

Boolean

Binary

In addition to the core types, Elasticsearch provides additional types that can handle more complicated data, such as nested documents, object, and so on. We will talk about them in Chapter 5, Extending Your Index Structure.

Common attributes

Before continuing with all the core type descriptions, we would like to discuss some common attributes that you can use to describe all the types (except for the binary one):

index_name: This attribute defines the name of the field that will be stored in the index. If this is not defined, the name will be set to the name of the object that the field is defined with. Usually, you don’t need to set this property, but it may be useful in some cases; for example, when you don’t have control over the name of the fields in the JSON documents that are sent to Elasticsearch.

index: This attribute can take the values analyzed and no and, for string-based fields, it can also be set to the additional not_analyzed value. If set to analyzed, the field will be indexed and thus searchable. If set to no, you won’t be able to search on such a field. The default value is analyzed. In case of string-based fields, there is an additional option, not_analyzed. This, when set, will mean that the field will be indexed but not analyzed. So, the field is written in the index as it was sent to Elasticsearch and only a perfect match will be counted during a search; the query will have to include exactly the same value as the value in the index. If we compare it to the SQL databases world, setting the index property of a field to not_analyzed would work just like using where field = value. Also remember that setting the index property to no will result in that field being excluded from the _all field (the include_in_all property is discussed as the last property in the list).

store: This attribute can take the values yes and no and specifies if the original value of the field should be written into the index. The default value is no, which means that Elasticsearch won’t store the original value of the field and will try to use the _source field (the JSON representing the original document that has been sent to Elasticsearch) when you want to retrieve the field value. Stored fields are not used for searching, however they can be used for highlighting if enabled (which may be more efficient than loading the _source field in case it is big).

doc_values: This attribute can take the values of true and false. When set to true, Elasticsearch will create a special on disk structure during indexation for not tokenized fields (like not analyzed string fields, number based fields, Boolean fields, and date fields). This structure is highly efficient and is used by Elasticsearch for operations that require un-inverted data, such as aggregations, sorting, or scripting. Starting with Elasticsearch 2.0 the default value of this is true for not tokenized fields. Setting this value to false will result in Elasticsearch using field data cache instead of doc values, which has higher memory demand, but may be faster in some rare situations.

boost: This attribute defines how important the field is inside the document; the higher the boost, the more important the values in the field are. The default value of this attribute is 1, which means a neutral value; anything above 1 will make the field more important, anything less than 1 will make it less important.

null_value: This attribute specifies a value that should be written into the index in case that field is not a part of an indexed document. The default behavior will just omit that field.

copy_to: This attribute specifies an array of fields to which the original value will be copied. This allows for different kinds of analysis of the same data. For example, you could imagine having two fields, one called title and one called title_sort, each having the same value but processed differently. We could use copy_to to copy the title field value to title_sort.

include_in_all: This attribute specifies if the field should be included in the _all field. The _all field is a special field used by Elasticsearch to allow easy searching in the contents of the whole indexed document. Elasticsearch creates the content of the _all field by copying all the document fields there. By default, if the _all field is used, all the fields will be included in it.
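The difference between the three values of the index attribute can be illustrated with a toy model. Everything here is an assumption made for illustration: the function names are our own, and lowercasing plus splitting on whitespace is only a stand-in for a real Elasticsearch analyzer.

```python
def indexed_terms(value, index="analyzed"):
    """Return the set of terms a field value would contribute to the index."""
    if index == "no":
        return set()                   # nothing is indexed, nothing is searchable
    if index == "not_analyzed":
        return {value}                 # the whole value is a single exact term
    return set(value.lower().split())  # toy analyzer: lowercase and split


def term_matches(term, value, index="analyzed"):
    """Check whether a single query term finds the field value."""
    return term in indexed_terms(value, index)


title = "Elasticsearch Server"
print(term_matches("elasticsearch", title, "analyzed"))
print(term_matches("elasticsearch", title, "not_analyzed"))
print(term_matches("Elasticsearch Server", title, "not_analyzed"))
print(term_matches("elasticsearch", title, "no"))
```

The not_analyzed case behaves like the where field = value comparison mentioned above: only the complete, unchanged value matches.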

String

String is the basic text type which allows us to store one or more characters inside it. A sample definition of such a field is as follows:

"body":{"type":"string","store":"yes","index":"analyzed"}

In addition to the common attributes, the following attributes can also be set for the string-based fields:

term_vector: This attribute can take the values no (the default one), yes, with_offsets, with_positions, and with_positions_offsets. It defines whether or not to calculate the Lucene term vectors for that field. If you are using highlighting (marking which terms were matched in a document during the query), you will need to calculate the term vector for the so called fast vector highlighting, a more efficient highlighting version.

analyzer: This attribute defines the name of the analyzer used for indexing and searching. It defaults to the globally-defined analyzer name.

search_analyzer: This attribute defines the name of the analyzer used for processing the part of the query string that is sent to a particular field.

norms.enabled: This attribute specifies whether the norms should be loaded for a field. By default, it is set to true for analyzed fields (which means that the norms will be loaded for such fields) and to false for non-analyzed fields. Norms are values inside of the Lucene index that are used when calculating a score for a document; they are usually not needed for not analyzed fields and are used only during query time. An example index creation command that disables norms for a single field would look as follows:

curl -XPOST 'localhost:9200/essb' -d '{
  "mappings": {
    "book": {
      "properties": {
        "name": {
          "type": "string",
          "norms": {
            "enabled": false
          }
        }
      }
    }
  }
}'

norms.loading: This attribute takes the values eager and lazy and defines how Elasticsearch will load the norms. The first value means that the norms for such fields are always loaded. The second value means that the norms will be loaded only when needed. Norms are useful for scoring, but may require a vast amount of memory for large data sets. Having norms loaded eagerly (property set to eager) means less work during query time, but will lead to more memory consumption. An example index creation command that eagerly loads norms for a single field looks as follows:

curl -XPOST 'localhost:9200/essb_eager' -d '{
  "mappings": {
    "book": {
      "properties": {
        "name": {
          "type": "string",
          "norms": {
            "loading": "eager"
          }
        }
      }
    }
  }
}'

position_offset_gap: This attribute defaults to 0 and specifies the gap in the index between instances of the given field with the same name. Setting this to a higher value may be useful if you want position-based queries (such as phrase queries) to match only inside a single instance of the field.

index_options: This attribute defines the indexing options for the postings list, the structure holding the terms (we talk more about it in the Postings format section of this chapter). The possible values are docs (only document numbers are indexed), freqs (document numbers and term frequencies are indexed), positions (document numbers, term frequencies, and their positions are indexed), and offsets (document numbers, term frequencies, their positions, and offsets are indexed). The default value for this property is positions for analyzed fields and docs for fields that are indexed but not analyzed.

ignore_above: This attribute defines the maximum size of the field in characters. A field whose size is above the specified value will be ignored by the analyzer.
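The purpose of position_offset_gap is easier to see with a small model of term positions in a field that holds several values. This is a simplified sketch with hypothetical function names and made-up sample values; the whitespace tokenizer is a stand-in for a real analyzer, and the position bookkeeping only approximates what Lucene does.

```python
def token_positions(values, gap=0):
    """Assign a position to every token, adding `gap` between field values."""
    positions = []
    next_position = 0
    for value in values:
        for token in value.lower().split():
            positions.append((token, next_position))
            next_position += 1
        next_position += gap
    return positions


def phrase_matches(positions, phrase):
    """True if the words of the phrase occupy consecutive positions."""
    words = phrase.lower().split()
    if not words:
        return False
    by_token = {}
    for token, position in positions:
        by_token.setdefault(token, set()).add(position)
    return any(
        all(start + offset in by_token.get(word, set())
            for offset, word in enumerate(words))
        for start in by_token.get(words[0], set())
    )


# Hypothetical field with two separate values.
names = ["John Abraham", "Lincoln Smith"]
print(phrase_matches(token_positions(names, gap=0), "Abraham Lincoln"))
print(phrase_matches(token_positions(names, gap=100), "Abraham Lincoln"))
```

With no gap, the phrase matches across the boundary between the two values; with a large gap, phrases match only inside a single value.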

Note

In one of the upcoming Elasticsearch versions, the string type may be deprecated and may be replaced by two new types, text and keyword, to better indicate what the string based field is representing. The text type will be used for analyzed text fields and the keyword type will be used for not analyzed text fields. If you are interested in the incoming changes, refer to the following GitHub issue: https://github.com/elastic/elasticsearch/issues/12394.

Number

This is the common name for the core types that cover all the available numeric field types. The following types are available in Elasticsearch (we specify them by using the type property):

byte: This type defines a byte value; for example, 1. It allows for values between -128 and 127 inclusive.

short: This type defines a short value; for example, 12. It allows for values between -32768 and 32767 inclusive.

integer: This type defines an integer value; for example, 134. It allows for values between -2^31 and 2^31-1 inclusive.

long: This type defines a long value; for example, 123456789. It allows for values between -2^63 and 2^63-1 inclusive.

float: This type defines a float value; for example, 12.23. For information about the possible values, refer to https://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.2.3.

double: This type defines a double value; for example, 123.45. For information about the possible values, refer to https://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.2.3.

Note

You can learn more about the mentioned Java types at http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html.
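The ranges of the integer types follow directly from the width of the corresponding signed Java type. The following Python sketch computes them from the number of bits; the signed_range helper is ours and purely illustrative, not part of Elasticsearch:

```python
# Hypothetical helper for illustration only; not an Elasticsearch API.
BITS = {"byte": 8, "short": 16, "integer": 32, "long": 64}

def signed_range(type_name):
    """Return (min, max) of a signed two's complement integer type."""
    bits = BITS[type_name]
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

for name in BITS:
    low, high = signed_range(name)
    print(f"{name}: {low} .. {high}")
```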

A sample definition of a field based on one of the numeric types is as follows:

"price": { "type": "float", "precision_step": "4" }

In addition to the common attributes, the following ones can also be set for the numeric fields:

precision_step: This attribute defines the number of terms generated for each value in the numeric field. The lower the value, the higher the number of terms generated. For fields with a higher number of terms per value, range queries will be faster at the cost of a slightly larger index. The default value is 16 for long and double, 8 for integer, short, and float, and 2147483647 for byte.

coerce: This attribute defaults to true and can take the value of true or false. It defines whether Elasticsearch should try to convert string values to numbers for a given field and whether the decimal part of a float value should be truncated for integer-based fields.

ignore_malformed: This attribute can take the value true or false (which is the default). It should be set to true in order to omit badly formatted values.
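To get a feel for how precision_step influences the index size, the following Python sketch estimates the number of terms indexed per value. It assumes the Lucene trie encoding behind this setting, in which one term is produced for every precision_step bits of the value (32 bits for integer and float, 64 bits for long and double). The function name is ours and the result should be read as an approximation:

```python
import math

def terms_per_value(value_bits, precision_step):
    """Estimate the terms indexed for one numeric value (assumed trie encoding)."""
    if precision_step >= value_bits:
        return 1  # only the full-precision term is indexed
    return math.ceil(value_bits / precision_step)

print(terms_per_value(64, 16))          # long or double with the default step
print(terms_per_value(32, 8))           # integer or float with the default step
print(terms_per_value(32, 4))           # a float field with precision_step set to 4
print(terms_per_value(32, 2147483647))  # a very large step yields a single term
```

Halving the step doubles the estimated number of terms, which is the trade-off between range query speed and index size described above.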

Boolean

The boolean core type is designed for indexing Boolean values (true or false). A sample definition of a field based on the boolean type is as follows:

"allowed": { "type": "boolean", "store": "yes" }

Binary

The binary field is a BASE64 representation of the binary data stored in the index. You can use it to store data that is normally written in binary form, such as images. Fields based on this type are by default stored and not indexed, so you can only retrieve them and not perform search operations on them. The binary type only supports the index_name, type, store, and doc_values properties. A sample field definition based on the binary type may look like the following:

"image": { "type": "binary" }

Date

The date core type is designed to be used for date indexing. A date field lets us specify the format in which Elasticsearch should expect the dates. It is worth noting that all the dates are indexed in UTC and are internally indexed as long values. In addition to that, for the date-based fields, Elasticsearch accepts long values representing UTC milliseconds since epoch regardless of the format specified for the date field.

The default date format recognized by Elasticsearch is quite universal and allows us to provide the date and optionally the time; for example, 2012-12-24T12:10:22. A sample definition of a field based on the date type is as follows:

"published": { "type": "date", "format": "YYYY-MM-dd" }

A sample document that uses the above date field with the specified format is as follows:

{
  "name": "Sample document",
  "published": "2012-12-22"
}
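Because dates are indexed internally as long values holding UTC milliseconds since the epoch, the sample value above has a numeric equivalent. The following Python sketch (standard library only; the function name is our assumption) shows the conversion for a year-month-day date. Note that Python's pattern syntax differs from the format syntax used by Elasticsearch:

```python
from datetime import datetime, timezone

def to_epoch_millis(value, pattern="%Y-%m-%d"):
    """Parse a date string as UTC and return milliseconds since the epoch."""
    parsed = datetime.strptime(value, pattern).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)

print(to_epoch_millis("2012-12-22"))  # 1356134400000
```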

In addition to the common attributes, the following ones can also be set for the fields based on the date type:

format: This attribute specifies the format of the date. The default value is dateOptionalTime. For a full list of formats, visit https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html.

precision_step: This attribute defines the number of terms generated for each value in the numeric field. Refer to the numeric core type description for more information about this parameter.

numeric_resolution: This attribute defines the unit of time that Elasticsearch will use when a numeric value is passed to the date-based field instead of a date following a format. By default, Elasticsearch uses the milliseconds value, which means that the numeric value will be treated as milliseconds since epoch. Another value is seconds.

ignore_malformed: This attribute can take the value true or false. The default value is false. It should be set to true in order to omit badly formatted values.

Multifields

There are situations where we need to have the same field analyzed differently. For example, one for sorting, one for searching, and one for analysis with aggregations, but all using the same field value, just indexed differently. We could of course use the previously described field value copying, but we can also use so-called multifields. To be able to use that feature of Elasticsearch, we need to define an additional property in our field definition called fields. The fields property is an object that can contain one or more additional fields that will be present in our index and will have the value of the field that they are assigned to. For example, if we would like to have aggregations done on the name field and in addition to that search on that field, we would define it as follows:

"name": {
  "type": "string",
  "fields": {
    "agg": { "type": "string", "index": "not_analyzed" }
  }
}

The preceding definition will create two fields: one called name and the second called name.agg. Of course, you don’t have to specify two separate fields in the data you are sending to Elasticsearch; a single one named name is enough. Elasticsearch will do the rest, which means copying the value of the field to all the fields from the preceding definition.

The IP address type

The ip field type was added to Elasticsearch to simplify the use of IPv4 addresses in a numeric form. This field type allows us to search data that is indexed as an IP address, sort on such data, and use range queries using IP values.

A sample definition of a field based on the ip type is as follows:

"address": { "type": "ip" }

In addition to the common attributes, the precision_step attribute can also be set for the ip type based fields. Refer to the numeric type description for more information about that property.

A sample document that uses the ip based field looks as follows:

{
  "name": "Tom PC",
  "address": "192.168.2.123"
}
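Storing addresses in a numeric form is what makes sorting and range queries possible. The following Python sketch uses the standard ipaddress module to show the idea; it illustrates the concept only and is not how Elasticsearch is called (both function names are ours):

```python
import ipaddress

def ip_to_number(address):
    """Convert a dotted IPv4 address to its 32-bit numeric form."""
    return int(ipaddress.IPv4Address(address))

def in_range(address, start, end):
    """Check whether an address falls between start and end (inclusive)."""
    return ip_to_number(start) <= ip_to_number(address) <= ip_to_number(end)

print(ip_to_number("192.168.2.123"))                              # 3232236155
print(in_range("192.168.2.123", "192.168.2.0", "192.168.2.255"))  # True
```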

Token count type

The token_count field type allows us to store and index information about how many tokens the given field has instead of storing and indexing the text provided to the field. It accepts the same configuration options as the number type, but in addition to that, we need to specify the analyzer which will be used to divide the field value into tokens. We do that by using the analyzer property.

A sample definition of a field based on the token_count field type looks as follows:

"title_count": { "type": "token_count", "analyzer": "standard" }

Using analyzers

The great thing about Elasticsearch is that it leverages the analysis capabilities of Apache Lucene. This means that for fields that are based on the string type, we can specify which analyzer Elasticsearch should use. As you remember from the Full text searching section of Chapter 1, Getting Started with Elasticsearch Cluster, the analyzer is a functionality that is used to analyze data or queries in the way we want. For example, when we divide words on the basis of whitespace and lowercase the characters, we don’t have to worry about the users sending words that are lowercased or uppercased. This means that Elasticsearch, elasticsearch, and ElAstIcSeaRCh will be treated as the same word. What’s more, Elasticsearch allows us to use not only the analyzers provided out of the box, but also to create our own configurations. We can also use different analyzers at the time of indexing and at the time of querying, so we can choose how we want our data to be processed at each stage of the search process. Let’s now have a look at the analyzers provided by Elasticsearch and at the Elasticsearch analysis functionality in general.

Out-of-the-box analyzers

Elasticsearch allows us to use one of the many analyzers defined by default. The following analyzers are available out of the box:

standard: This analyzer is convenient for most European languages (refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html for the full list of parameters).

simple: This analyzer splits the provided value on non-letter characters and converts them to lowercase.

whitespace: This analyzer splits the provided value on the basis of whitespace characters.

stop: This is similar to the simple analyzer, but in addition to the functionality of the simple analyzer, it filters the data on the basis of the provided set of stop words (refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stop-analyzer.html for the full list of parameters).

keyword: This is a very simple analyzer that just passes the provided value through unchanged. You’ll achieve the same by specifying a particular field as not_analyzed.

pattern: This analyzer allows flexible text separation by the use of regular expressions (refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-analyzer.html for the full list of parameters). The key point to remember when it comes to the pattern analyzer is that the provided pattern should match the separators of the words, not the words themselves.

language: This analyzer is designed to work with a specific language. The full list of languages supported by this analyzer can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html.

snowball: This is an analyzer that is similar to standard, but additionally provides the stemming algorithm (refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-snowball-analyzer.html for the full list of parameters).

Note

Stemming is the process of reducing inflected and derived words to their stem or base form. Such a process allows different forms of a word, for example cars and car, to be reduced to a common form. For the mentioned words, a stemmer (which is an implementation of the stemming algorithm) will produce a single stem, car. After indexing, the documents containing such words will be matched while using any of them. Without stemming, the documents with the word "cars" will only be matched by a query containing the same word. You can find more information about stemming on Wikipedia at https://en.wikipedia.org/wiki/Stemming.

Defining your own analyzers

In addition to the analyzers mentioned previously, Elasticsearch allows us to define new ones without the need for writing a single line of Java code. In order to do that, we need to add an additional section to our mappings file; that is, the settings section, which holds additional information used by Elasticsearch during index creation. The following code snippet shows how we can define our custom settings section:

"settings": {
  "index": {
    "analysis": {
      "analyzer": {
        "en": {
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase",
            "ourEnglishFilter"
          ]
        }
      },
      "filter": {
        "ourEnglishFilter": {
          "type": "kstem"
        }
      }
    }
  }
}

We specified that we want a new analyzer named en to be present. Each analyzer is built from a single tokenizer and multiple filters. A complete list of the default filters and tokenizers can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html. Our en analyzer includes the standard tokenizer and three filters: asciifolding and lowercase, which are the ones available by default, and a custom ourEnglishFilter, which is a filter we have defined.

To define a filter, we need to provide its name, its type (the type property), and any number of additional parameters required by that filter type. The full list of filter types available in Elasticsearch can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenfilters.html. Please be aware that we won’t be discussing each filter, as the list of filters is constantly changing. If you are interested in the full list of filters, please refer to the mentioned page in the documentation.

So, the final mappings file with our custom analyzer defined will be as follows:

{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "en": {
            "tokenizer": "standard",
            "filter": [
              "asciifolding",
              "lowercase",
              "ourEnglishFilter"
            ]
          }
        },
        "filter": {
          "ourEnglishFilter": {
            "type": "kstem"
          }
        }
      }
    }
  },
  "mappings": {
    "post": {
      "properties": {
        "id": { "type": "long" },
        "name": { "type": "string", "analyzer": "en" }
      }
    }
  }
}

If we save the preceding mappings to a file called posts_mappings.json, we can run the following command to create the posts index:

curl -XPOST 'http://localhost:9200/posts' -d @posts_mappings.json

We can see how our analyzer works by using the Analyze API (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html). For example, let’s look at the following command:

curl -XGET 'localhost:9200/posts/_analyze?pretty&field=name' -d 'robots cars'

The command asks Elasticsearch to show the result of analyzing the given phrase (robots cars) with the analyzer defined for the post type and its name field. The response that we will get from Elasticsearch is as follows:

{
  "tokens": [{
    "token": "robot",
    "start_offset": 0,
    "end_offset": 6,
    "type": "<ALPHANUM>",
    "position": 0
  }, {
    "token": "car",
    "start_offset": 7,
    "end_offset": 11,
    "type": "<ALPHANUM>",
    "position": 1
  }]
}

As you can see, the robots cars phrase was divided into two tokens. In addition to that, the robots word was changed to robot and the cars word was changed to car.
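The flow that produced this response (a tokenizer followed by a chain of filters) can be approximated in a few lines of Python. This is a toy sketch under our own assumptions: the tokenizer simply splits on non-alphanumeric characters, and strip_plural_s is a deliberately naive stand-in for the kstem filter, so it will not match the Elasticsearch output in general. All function names are ours.

```python
import re

def tokenize(text):
    """Split text into tokens, keeping character offsets and positions."""
    return [
        {"token": m.group(), "start_offset": m.start(),
         "end_offset": m.end(), "position": pos}
        for pos, m in enumerate(re.finditer(r"[A-Za-z0-9]+", text))
    ]

def lowercase(token):
    return token.lower()

def strip_plural_s(token):
    # Naive stand-in for a real stemmer such as kstem.
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def analyze(text, filters=(lowercase, strip_plural_s)):
    """Run the tokenizer, then apply every filter to each token in order."""
    tokens = tokenize(text)
    for tok in tokens:
        for token_filter in filters:
            tok["token"] = token_filter(tok["token"])
    return tokens

for tok in analyze("robots cars"):
    print(tok)
```

The offsets always refer to the original text, which is why the robot token still ends at offset 6 even though the filtered token is only five characters long.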

Default analyzers

There is one more thing to say about analyzers. Elasticsearch allows us to specify the analyzer that should be used by default if no analyzer is defined. This is done in the same way as we configured a custom analyzer in the settings section of the mappings file, but instead of specifying a custom name for the analyzer, the default keyword should be used. So to make our previously defined analyzer the default, we can change the en analyzer to the following:

{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "default": {
            "tokenizer": "standard",
            "filter": [
              "asciifolding",
              "lowercase",
              "ourEnglishFilter"
            ]
          }
        },
        "filter": {
          "ourEnglishFilter": {
            "type": "kstem"
          }
        }
      }
    }
  }
}

We can also choose a different default analyzer for searching and a different one for indexing. If we would like to do that, instead of using the default keyword for the analyzer name, we should use default_search and default_index respectively.

Different similarity models

With the release of Apache Lucene 4.0 in 2012, all the users of this great full text search library were given the opportunity to alter the default TF/IDF-based algorithm and use a different one (we’ve mentioned it in the Full text searching section of Chapter 1, Getting Started with Elasticsearch Cluster). Because of that, we are able to choose a similarity model in Elasticsearch, which basically allows us to use different scoring formulas for our documents.

Note

Note that the similarity models topic ranges from intermediate to advanced and in most cases the TF/IDF-based algorithm will be sufficient for your use case. However, we decided to have it described in the book, so you know that you have the possibility of changing the scoring algorithm behavior if needed.

Setting per-field similarity

Since Elasticsearch 0.90, we are allowed to set a different similarity for each of the fields that we have in our mappings file. For example, let’s assume that we have the following simple mappings that we use in order to index the blog posts:

{
  "mappings": {
    "post": {
      "properties": {
        "id": { "type": "long" },
        "name": { "type": "string" },
        "contents": { "type": "string" }
      }
    }
  }
}

Let’s assume that we want to use the BM25 similarity model for the name field and the contents field. In order to do that, we need to extend our field definitions and add the similarity property with the value of the chosen similarity name. Our changed mappings will look like the following:

{
  "mappings": {
    "post": {
      "properties": {
        "id": { "type": "long" },
        "name": { "type": "string", "similarity": "BM25" },
        "contents": { "type": "string", "similarity": "BM25" }
      }
    }
  }
}

And that’s all, nothing more is needed. After the above change, Apache Lucene will use the BM25 similarity to calculate the score factor for the name and the contents fields.

Available similarity models

There are at least five new similarity models available. For most of the use cases, apart from the default one, you may find the following models useful:

Okapi BM25 model: This similarity model is based on a probabilistic model that estimates the probability of finding a document for a given query. In order to use this similarity in Elasticsearch, you need to set the similarity property for a field to BM25. The Okapi BM25 similarity is said to perform best when dealing with short text documents where term repetitions are especially hurtful to the overall document score. This similarity is defined out of the box and doesn’t need additional properties to be set.

Divergence from randomness model: This similarity model is based on the probabilistic model of the same name. In order to use this similarity in Elasticsearch, you need to use the DFR name. It is said that the divergence from randomness similarity model performs well on text that is similar to natural language.

Information-based model: This is the last of the newly introduced similarity models and is very similar to the divergence from randomness model. In order to use this similarity in Elasticsearch, you need to use the IB name. Similar to the DFR similarity, it is said that the information-based model performs well on data similar to natural language text.

The two other similarity models currently available are the LM Dirichlet similarity (to use it, set the type property to LMDirichlet) and the LM Jelinek Mercer similarity (to use it, set the type property to LMJelinekMercer). You can find more about these similarity models in the Apache Lucene Javadocs, in Mastering Elasticsearch Second Edition, published by Packt Publishing, or in the official documentation of Elasticsearch available at https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-similarity.html.

Configuring default similarity

The default similarity allows us to provide an additional discount_overlaps property. It allows us to control whether the tokens on the same positions in the token stream (with a position increment of 0) are omitted during score calculation. By default, it is set to true, which means that the tokens on the same positions are omitted; if you want them to be counted, you can set that property to false. For example, the following command shows how to create an index with the discount_overlaps property changed for the default similarity:

curl -XPUT 'localhost:9200/test_similarity' -d '{
  "settings": {
    "similarity": {
      "altered_default": {
        "type": "default",
        "discount_overlaps": false
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "name": { "type": "string", "similarity": "altered_default" }
      }
    }
  }
}'

Configuring BM25 similarity

Even though we don’t need to configure the BM25 similarity, we can provide some additional options to tune its behavior. The BM25 similarity allows us to provide the discount_overlaps property similar to the default similarity and two additional properties: k1 and b. The k1 property specifies the term frequency normalization factor and the b property value determines to what degree the document length will normalize the term frequency values.
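To see what k1 and b do, consider the commonly cited textbook form of the BM25 term score. The sketch below is written from that formula, not taken from the Lucene source, so the exact numbers Elasticsearch returns may differ. The function name, the default values of 1.2 and 0.75, and the sample statistics are our assumptions:

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, doc_count, doc_freq, k1=1.2, b=0.75):
    """Score of one term in one document using the textbook BM25 formula."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)

# A higher k1 lets repeated terms keep adding to the score for longer.
print(bm25_term_score(5, 100, 100, 1000, 10, k1=1.2))
print(bm25_term_score(5, 100, 100, 1000, 10, k1=2.0))

# With b set to 0, the document length no longer influences the score.
print(bm25_term_score(5, 50, 100, 1000, 10, b=0.0))
print(bm25_term_score(5, 500, 100, 1000, 10, b=0.0))
```

In this form, each additional occurrence of a term contributes less than the previous one, which is one way to read the earlier remark about term repetitions.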

Configuring DFR similarity

In case of the DFR similarity, we can configure the basic_model property (which can take the value be, d, g, if, in, p, or ine), the after_effect property (with values of no, b, or l), and the normalization property (which can be no, h1, h2, h3, or z). If we choose a normalization value other than no, we need to set the normalization factor.

Depending on the chosen normalization value, we should use normalization.h1.c (the float value) for h1 normalization, normalization.h2.c (the float value) for h2 normalization, normalization.h3.c (the float value) for h3 normalization, and normalization.z.z (the float value) for z normalization. For example, the following is how the example similarity configuration will look (we put this into the settings section of our mappings file):

"similarity": {
  "esserverbook_dfr_similarity": {
    "type": "DFR",
    "basic_model": "g",
    "after_effect": "l",
    "normalization": "h2",
    "normalization.h2.c": "2.0"
  }
}

Configuring IB similarity

In case of the IB similarity, we can configure the distribution property (which can take the value of ll or spl) and the lambda property (which can take the value of df or ttf). In addition to that, we can choose the normalization factor, which is the same as for the DFR similarity, so we’ll omit describing it a second time. The following is how the example IB similarity configuration will look (we put this into the settings section of our mappings file):

"similarity": {
  "esserverbook_ib_similarity": {
    "type": "IB",
    "distribution": "ll",
    "lambda": "df",
    "normalization": "z",
    "normalization.z.z": "0.25"
  }
}

Batch indexing to speed up your indexing process

In Chapter 1, Getting Started with Elasticsearch Cluster, we saw how to index a particular document into Elasticsearch. It required opening an HTTP connection, sending the document, and closing the connection. Of course, we were not responsible for most of that as we used the curl command, but in the background this is what happened. However, sending the documents one by one is not efficient. Because of that, it is now time to find out how to index a large number of documents in a more convenient and efficient way than doing so one by one.

Preparing data for bulk indexing

Elasticsearch allows us to merge many requests into one package. This package can be sent as a single request. What’s more, we are not limited to having a single type of request in the so-called bulk; we can mix different types of operations together, which include:

Adding or replacing the existing documents in the index (index)

Removing documents from the index (delete)

Adding new documents into the index when there is no other definition of the document in the index (create)

Modifying the documents or creating new ones if the document doesn’t exist (update)

The format of the request was chosen for processing efficiency. Each operation starts with a line containing a JSON object that describes the operation, followed by a second line containing the document, which is a JSON object itself. We can treat the first line as a kind of information line and the second as the data line. The exception to this rule is the delete operation, which contains only the information line, because the document is not needed. Let’s look at the following example:

{ "index": { "_index": "addr", "_type": "contact", "_id": 1 }}
{ "name": "Fyodor Dostoevsky", "country": "RU" }
{ "create": { "_index": "addr", "_type": "contact", "_id": 2 }}
{ "name": "Erich Maria Remarque", "country": "DE" }
{ "create": { "_index": "addr", "_type": "contact", "_id": 2 }}
{ "name": "Joseph Heller", "country": "US" }
{ "delete": { "_index": "addr", "_type": "contact", "_id": 4 }}
{ "delete": { "_index": "addr", "_type": "contact", "_id": 1 }}

It is very important that every document or action description is placed in one line (ended by a newline character). This means that the document cannot be pretty-printed. There is a default limitation on the size of the bulk indexing file, which is set to 100 megabytes and can be changed by specifying the http.max_content_length property in the Elasticsearch configuration file. This lets us avoid issues with possible request timeouts and memory problems when dealing with requests that are too large.

Note

Note that with a single batch indexing file, we can load the data into many indices and documents in the bulk request can have different types.
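A bulk file in this format is easy to generate programmatically. The following Python sketch (the function name and the tuple layout are our assumptions) serializes a list of operations so that every information line and every document occupies exactly one line:

```python
import json

def build_bulk_body(operations):
    """Serialize (action, metadata, document) tuples into the bulk format.

    The document is None for delete operations, which have no data line.
    """
    lines = []
    for action, metadata, document in operations:
        lines.append(json.dumps({action: metadata}))
        if document is not None:
            lines.append(json.dumps(document))
    # Every line, including the last one, is ended by a newline character.
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ("index", {"_index": "addr", "_type": "contact", "_id": 1},
     {"name": "Fyodor Dostoevsky", "country": "RU"}),
    ("delete", {"_index": "addr", "_type": "contact", "_id": 4}, None),
])
print(body, end="")
```

Using json.dumps without the indent argument guarantees that no document is pretty-printed across several lines.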

Indexing the data

In order to execute the bulk request, Elasticsearch provides the _bulk endpoint. This can be used as /_bulk or with an index name as /index_name/_bulk or even with a type and index name as /index_name/type_name/_bulk. The second and third forms define the default values for the index name and the type name. We can omit these properties in the information line of our request and Elasticsearch will use the default values from the URI. It is also worth knowing that the default URI values can be overwritten by the values in the information lines.

Assuming we’ve stored our data in the documents.json file, we can run the following command to send this data to Elasticsearch:

curl -XPOST 'localhost:9200/_bulk?pretty' --data-binary @documents.json

The ?pretty parameter is of course not necessary. We’ve used this parameter only for the ease of analyzing the response of the preceding command. What is important, in this case, is using curl with the --data-binary parameter instead of using -d. This is because the standard -d parameter strips newline characters from the file it reads, and these, as we said earlier, are important for parsing the bulk request content by Elasticsearch. Now let’s look at the response returned by Elasticsearch:

{
  "took": 469,
  "errors": true,
  "items": [{
    "index": {
      "_index": "addr",
      "_type": "contact",
      "_id": "1",
      "_version": 1,
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      },
      "status": 201
    }
  }, {
    "create": {
      "_index": "addr",
      "_type": "contact",
      "_id": "2",
      "_version": 1,
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      },
      "status": 201
    }
  }, {
    "create": {
      "_index": "addr",
      "_type": "contact",
      "_id": "2",
      "status": 409,
      "error": {
        "type": "document_already_exists_exception",
        "reason": "[contact][2]: document already exists",
        "shard": "2",
        "index": "addr"
      }
    }
  }, {
    "delete": {
      "_index": "addr",
      "_type": "contact",
      "_id": "4",
      "_version": 1,
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      },
      "status": 404,
      "found": false
    }
  }, {
    "delete": {
      "_index": "addr",
      "_type": "contact",
      "_id": "1",
      "_version": 2,
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      },
      "status": 200,
      "found": true
    }
  }]
}

As we can see, every result is a part of the items array. Let’s briefly compare these results with our input data. The first two commands, named index and create, were executed without any problems. The third operation failed because we wanted to create a record with an identifier that already existed in the index. The next two operations were deletions. Both were processed without an error. Note that the first of them tried to delete a nonexistent document; as you can see, this wasn’t a problem for Elasticsearch. The thing worth noting, though, is that for the nonexistent document we saw a status of 404, which as an HTTP response code means not found (http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html). As you can see, Elasticsearch returns information about each operation, so for large bulk requests the response can be massive.
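Because a bulk response reports every operation separately, client code usually scans the items array rather than relying on the status of the request as a whole. The following is a minimal Python sketch, assuming the response has already been parsed into a dictionary shaped like the one shown earlier; the function name and the trimmed sample data are ours:

```python
def failed_items(response):
    """Return (operation, id, status) for every item that carries an error."""
    failures = []
    if not response.get("errors"):
        return failures
    for item in response["items"]:
        for operation, result in item.items():
            if "error" in result:
                failures.append((operation, result["_id"], result["status"]))
    return failures

sample = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"create": {"_id": "2", "status": 409,
                    "error": {"type": "document_already_exists_exception"}}},
        {"delete": {"_id": "4", "status": 404, "found": False}},
    ],
}
print(failed_items(sample))  # [('create', '2', 409)]
```

Note that the delete of the nonexistent document is not reported by this sketch, because its item has a 404 status but no error object.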

The _all field

The _all field is used by Elasticsearch to store data from all the other fields in a single field for ease of searching. This kind of field may be useful when we want to implement a simple search feature and we want to search all the data (or only the fields we copy to the _all field), but we don’t want to think about the field names and things like that. By default, the _all field is enabled and contains all the data from all the fields from the document. However, this field makes the index a bit bigger and that is not always needed.

For example, when you input a search phrase into a search box in the library catalog site, you expect that you can search using the author’s name, the ISBN number, and the words that the book title contains, but searching for the number of pages or the cover type usually does not make sense. We can either disable the _all field completely or exclude the copying of certain fields to it. In order not to include a certain field in the _all field, we use the include_in_all property, which was discussed earlier in this chapter. To completely turn off the _all field functionality, we modify our mappings file as follows:

{
  "book": {
    "_all": {
      "enabled": false
    },
    "properties": {
      ...
    }
  }
}

In addition to the enabled property, the _all field supports the following ones:

store

term_vector

analyzer

For information about the preceding properties, refer to the Mappings configuration section in this chapter.

The _source field

The _source field allows us to store the original JSON document that was sent to Elasticsearch during indexation. By default, the _source field is turned on as some of the Elasticsearch functionalities depend on it (for example, the partial update feature). In addition to that, the _source field can be used as the source of data for the highlighting functionality if a field is not stored. However, if we don’t need such a functionality, we can disable the _source field as it causes some storage overhead. In order to do that, we need to set the _source object’s enabled property to false, as follows:

{
  "book": {
    "_source": {
      "enabled": false
    },
    "properties": {
      ...
    }
  }
}

We can also tell Elasticsearch which fields we want to exclude from the _source field and which fields we want to include. We do that by adding the includes and excludes properties to the _source field definition. For example, if we want to exclude all the fields in the author path from the _source field, our mappings will look as follows:

{
  "book": {
    "_source": {
      "excludes": ["author.*"]
    },
    "properties": {
      ...
    }
  }
}

Additional internal fields

There are additional fields that are internally used by Elasticsearch, but which we can’t configure. Those fields are:

_id: This field is used to hold the identifier of the document inside the index and type

_uid: This field is used to hold the unique identifier of the document in the index and is built of _id and _type (this allows documents with the same identifier but different types to exist inside the same index)

_type: This field is the type name for the document

_field_names: This field is the list of fields existing in the document

IntroductiontosegmentmergingIntheFulltextsearchingsectionofChapter1,GettingStartedwithElasticsearchCluster,wementionedsegmentsandtheirimmutability.WewrotethattheLucenelibrary,andthusElasticsearch,writesdatatocertainstructuresthatarewrittenonceandneverchange.Thisallowsforsomesimplification,butalsointroducestheneedforadditionalwork.Onesuchexampleisdeletion.Becausesegment,cannotbealtered,informationaboutdeletionsmustbestoredalongsideanddynamicallyappliedduringsearch.Thisisdonebyfilteringdeleteddocumentsfromthereturnedresultset.Theotherexampleistheinabilitytomodifythedocuments(however,somemodificationsarepossible,suchasmodifyingnumericdocvalues).Ofcourse,onecansaythatElasticsearchsupportsdocumentupdates(refertotheManipulatingdatawiththeRESTAPIsectionofChapter1,GettingStartedwithElasticsearchCluster).However,underthehood,theolddocumentismarkedasdeletedandtheonewiththeupdatedcontentsisindexed.

As time passes and you continue to index or delete your data, more and more segments are created. Depending on how often you modify the index, Lucene creates segments with various numbers of documents; thus, segments have different sizes. Because of that, the search performance may be lower and your index may be larger than it should be, as it still contains the deleted documents. The equation is simple: the more segments your index has, the slower the search speed is. This is when segment merging comes into play. We don't want to describe this process in detail; in the current Elasticsearch version, this part of the engine was simplified, but it is still a rather advanced topic. We decided to mention merging because we think that it is handy to know where to look for the cause of troubles connected with too many open files, suspicious CPU usage, expanding indices, or searching and indexing speed degrading with time.


Segment merging

Segment merging is the process during which the underlying Lucene library takes several segments and creates a new segment based on the information found in them. The resulting segment has all the documents stored in the original segments except the ones that were marked for deletion. After the merge operation, the source segments are deleted from the disk. Because segment merging is rather costly in terms of CPU and I/O usage, it is crucial to appropriately control when and how often this process is invoked.
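The merge step described above can be illustrated with a small toy model. This is only a sketch: the dictionary layout (a "docs" mapping plus a "deleted" set) and the function name merge_segments are assumptions made for illustration and have nothing to do with Lucene's real on-disk format.

```python
def merge_segments(segments):
    """Merge several toy segments into one, dropping deleted documents.

    Each segment is a dict with two keys (a hypothetical model):
      "docs":    mapping of document id -> document body
      "deleted": set of document ids marked as deleted in that segment
    """
    merged_docs = {}
    for segment in segments:
        for doc_id, body in segment["docs"].items():
            # Deleted documents are physically dropped only here, at merge
            # time; until then they still occupy space in their segment.
            if doc_id not in segment["deleted"]:
                merged_docs[doc_id] = body
    # The freshly written segment carries no deletion markers.
    return {"docs": merged_docs, "deleted": set()}


seg_a = {"docs": {1: "first", 2: "second"}, "deleted": {2}}
seg_b = {"docs": {3: "third", 4: "fourth"}, "deleted": set()}
merged = merge_segments([seg_a, seg_b])
print(sorted(merged["docs"]))  # [1, 3, 4]
```

Two source segments become one, and the document marked as deleted no longer takes up any space in the result.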


The need for segment merging

You may ask yourself why you have to bother with segment merging. First of all, the more segments the index is built from, the slower the search will be and the more memory Lucene will use. The second reason is the disk space and resources, such as file descriptors, used by the index. If you delete many documents from your index then, until the merge happens, those documents are only marked as deleted and not deleted physically. So, it may happen that most of the documents that use our CPU and memory don't exist! Fortunately, Elasticsearch uses reasonable defaults for segment merging and it is very probable that no changes are necessary.


The merge policy

The merge policy defines when the merging process should be performed. Elasticsearch merges segments of approximately similar sizes, taking into account the maximum number of segments allowed per tier. The merging algorithm can find segments with the lowest cost of merge and the most impact on the resulting segment.

The basic properties of the tiered merge policy are as follows:

index.merge.policy.expunge_deletes_allowed: This property tells Elasticsearch to merge segments with a percentage of deleted documents higher than this value. It defaults to 10.

index.merge.policy.floor_segment: This property defaults to 2mb and tells Elasticsearch to treat smaller segments as ones with a size equal to the value of this property. It prevents flushing of tiny segments to avoid their high number.

index.merge.policy.max_merge_at_once: This property defines the maximum number of segments to be merged at once. It defaults to 10.

index.merge.policy.max_merge_at_once_explicit: This property defines the maximum number of segments merged at once during expunge deletes or optimize operations. It defaults to 30.

index.merge.policy.max_merged_segment: This property defines the maximum size of a segment that can be produced during normal merging. It defaults to 5gb.

index.merge.policy.segments_per_tier: This property defaults to 10 and roughly defines the allowed number of segments per tier. Smaller values mean more merging but fewer segments, which results in higher search speed but lower indexing speed and more I/O pressure. Higher values of the property will result in a higher segment count, thus slower search speed but higher indexing speed.

index.merge.policy.reclaim_deletes_weight: This property tells Elasticsearch how important it is to choose segments with many deleted documents. It defaults to 2.0.
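To make two of these properties more concrete, here is a rough sketch of the checks that expunge_deletes_allowed and floor_segment describe. The function names and the tuple layout are hypothetical, and the real tiered merge policy weighs many more factors than this.

```python
def segments_to_expunge(segments, expunge_deletes_allowed=10.0):
    """Return names of segments whose deleted-document percentage
    is higher than the expunge_deletes_allowed threshold."""
    selected = []
    for name, total_docs, deleted_docs in segments:
        deleted_pct = 100.0 * deleted_docs / total_docs
        if deleted_pct > expunge_deletes_allowed:
            selected.append(name)
    return selected


def effective_size(size_bytes, floor_segment_bytes=2 * 1024 * 1024):
    """Treat segments smaller than floor_segment as if they had that size."""
    return max(size_bytes, floor_segment_bytes)


# (segment name, total documents, deleted documents): illustrative numbers
segments = [("seg_0", 1000, 50), ("seg_1", 1000, 100), ("seg_2", 200, 60)]
print(segments_to_expunge(segments))  # ['seg_2']
print(effective_size(100 * 1024))     # 2097152
```

Only seg_2 (30 percent deleted) exceeds the default threshold of 10; seg_1 sits exactly at 10 percent and is therefore not selected.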

For example, to update the merge policy settings of an already created index, we could run a command like this:

curl -XPUT 'localhost:9200/essb/_settings' -d '{

"index.merge.policy.max_merged_segment":"10gb"

}'

To get deeper into segment merging, refer to our book Mastering Elasticsearch Second Edition, published by Packt Publishing. You can also find more information about the tiered merge policy at https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-merge.html.

Note

Up to the 2.0 version of Elasticsearch, we were able to choose between three merge policies: tiered, log_byte_size, and log_doc. The currently used merge policy is based on the tiered merge policy and we are forced to use it.


The merge scheduler

The merge scheduler tells Elasticsearch how the merge process should occur. The current implementation is based on a concurrent merge scheduler that is started in a separate thread and uses a defined number of threads to run merges in parallel. Elasticsearch allows you to set the number of threads that can be used for simultaneous merging by using the index.merge.scheduler.max_thread_count property.


Throttling

As we have already mentioned, merging may be expensive when it comes to server resources. The merge process usually works in parallel to other operations, so theoretically it shouldn't have too much influence. In practice, the number of disk input/output operations can be so large as to significantly affect the overall performance. In such cases, throttling is something that may help. In fact, this feature can be used for limiting the speed of the merge, but it may also be used for all the operations using the data store. Throttling can be set in the Elasticsearch configuration file (the elasticsearch.yml file) or dynamically by using the settings API (refer to the The update settings API section of Chapter 9, Elasticsearch Cluster, for details). There are two settings that adjust throttling: type and value.

To set the throttling type, set the indices.store.throttle.type property, which allows us to use the following values:

none: This value defines that no throttling is on

merge: This value defines that throttling affects only the merge process

all: This value defines that throttling is used for all the data store activities

The second property, indices.store.throttle.max_bytes_per_sec, describes how much the throttling limits the I/O operations. As its name suggests, it tells us how many bytes can be processed per second. For example, let's look at the following configuration:

indices.store.throttle.type: merge

indices.store.throttle.max_bytes_per_sec: 10mb

In this example, we limit the merge operations to 10 megabytes per second. By default, Elasticsearch uses the merge throttling type with the max_bytes_per_sec property set to 20mb. This means that all the merge operations are limited to 20 megabytes per second.
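A quick back-of-the-envelope calculation shows what such a limit means in practice. The sketch below assumes that size suffixes are binary units (1mb = 1024 * 1024 bytes) and that the merge is bound only by the throttle; the helper names parse_size and min_merge_seconds are hypothetical.

```python
# Assumed binary units for the size suffixes used in the settings.
UNITS = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_size(value):
    """Convert a size string such as '20mb' into a number of bytes."""
    value = value.strip().lower()
    for suffix, factor in UNITS.items():
        if value.endswith(suffix):
            return int(float(value[:-len(suffix)]) * factor)
    return int(value)  # no suffix: a plain number of bytes


def min_merge_seconds(bytes_to_write, max_bytes_per_sec):
    """Lower bound on merge duration when I/O is capped at the given rate."""
    return bytes_to_write / parse_size(max_bytes_per_sec)


one_gb = parse_size("1gb")
print(min_merge_seconds(one_gb, "20mb"))  # 51.2
print(min_merge_seconds(one_gb, "10mb"))  # 102.4
```

Under these assumptions, writing a 1 GB merged segment takes at least about 51 seconds at the default limit and about 102 seconds at the 10mb limit from the example configuration.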


Introduction to routing

By default, Elasticsearch will try to distribute your documents evenly among all the shards of the index. However, that's not always the desired situation. In order to retrieve the documents, Elasticsearch must query all the shards and merge the results. What if we could divide our data on some basis (for example, the client identifier) and use that information to put data with the same properties in the same place in the cluster? Elasticsearch allows us to do that by exposing a powerful document and query distribution control mechanism: routing. In short, it allows us to choose a shard to be used to index or search the data.


Default indexing

During indexing operations, when you send a document for indexing, Elasticsearch looks at its identifier to choose the shard in which the document should be indexed. By default, Elasticsearch calculates the hash value of the document's identifier and, on the basis of that, it puts the document in one of the available primary shards. Then, those documents are redistributed to the replicas. The following diagram shows a simple illustration of how indexing works by default:
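The hash-based shard selection described above can also be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: zlib.crc32 is used only as a stand-in hash (Elasticsearch uses its own hash function internally), and pick_shard is a hypothetical name.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value (by default, the document identifier)
    to a primary shard number."""
    # Stand-in hash for illustration only; not the function Elasticsearch uses.
    hashed = zlib.crc32(str(routing_value).encode("utf-8"))
    return hashed % number_of_primary_shards


# The same identifier always lands on the same shard.
for doc_id in ["1", "2", "3", "4"]:
    print(doc_id, "->", pick_shard(doc_id, 5))
```

Because the result depends on the number of primary shards, this kind of scheme also hints at why that number cannot simply be changed for an existing index without moving documents around.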


Default searching

Searching is a bit different from indexing, because in most situations you need to query all the shards to get the data you are interested in (we will talk about that in Chapter 3, Searching Your Data), at least in the initial scatter phase of the query. Imagine a situation when you have the following mappings describing your index:

{

"mappings":{

"post":{

"properties":{

"id":{"type":"long"},

"name":{"type":"string"},

"contents":{"type":"string"},

"userId":{"type":"long"}

}}

}}

As you can see, our index consists of four fields: the identifier (the id field), name of the document (the name field), contents of the document (the contents field), and the identifier of the user to which the documents belong (the userId field). To get all the documents for a particular user, one with userId equal to 12, you can run the following query:

curl -XGET 'http://localhost:9200/posts/_search?q=userId:12'

Depending on the search type (we will talk more about it in Chapter 3, Searching Your Data), Elasticsearch will run your query. It usually means that it will first query all the nodes for the identifiers and scores of the matching documents and then it will send an internal query again, but only to the relevant shards (the ones containing the needed documents) to get the documents needed to build the response.

A very simplified view of how the default searching works during its initial phase is shown in the following illustration:


What if we could put all the documents for a single user into a single shard and query on that shard? Wouldn't that be wise for performance? Yes, that is handy and that is what routing allows you to do.


Routing

Routing can control which shard your documents and queries will be forwarded to. By now, you will probably have guessed that we can specify the routing value both during indexing and during querying and, in fact, if you decide to specify explicit routing values, you'll probably want to do that during indexing and searching.

In our case, we will use the userId value to set routing during indexing and the same value will be used during searching. Because we will use the same routing value for all the documents for a single user, the same hash value will be calculated and thus all the documents for that particular user will be placed in the same shard. Using the same value during search will result in searching a single shard instead of the whole index.

There is one thing you should remember when using routing when searching. When searching, you should add a query part that will limit the returned documents to the ones for the given user. Routing is not enough. This is because you'll probably have more distinct routing values than the number of shards your index will be built with. For example, you can have 10 shards building your index, but at the same time have hundreds of users. It is physically impossible to dedicate a single shard to only a single user. It is usually not good from a scaling point of view as well. Because of that, a few distinct values can point to the same shard; in our case, data of a few users will be placed in the same shard. Because of that, we need a query part that will limit the data to a particular user identifier, such as a term query.
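The reason routing alone is not enough can be seen in a small simulation. The sketch below assumes 10 shards and 200 users, uses zlib.crc32 as a stand-in hash (not the function Elasticsearch uses), and models each shard as a plain Python list; all names are hypothetical.

```python
import zlib

NUMBER_OF_SHARDS = 10


def shard_for(routing_value):
    # Stand-in hash, an assumption made for illustration only.
    return zlib.crc32(str(routing_value).encode("utf-8")) % NUMBER_OF_SHARDS


# Index one post per user, using userId as the routing value.
shards = {number: [] for number in range(NUMBER_OF_SHARDS)}
for user_id in range(1, 201):
    document = {"userId": user_id, "name": "post of user %d" % user_id}
    shards[shard_for(user_id)].append(document)


def search(user_id, apply_user_filter):
    """Search only the shard chosen by routing, optionally filtering by userId."""
    candidates = shards[shard_for(user_id)]
    if apply_user_filter:
        return [doc for doc in candidates if doc["userId"] == user_id]
    return candidates


# Without the filter we usually also get posts of other users in that shard.
print(len(search(12, apply_user_filter=False)))
print(len(search(12, apply_user_filter=True)))  # 1
```

With 200 users spread over 10 shards, at least one shard must hold the data of 20 or more users, so the extra filter on userId is what actually isolates a single user's documents.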

The following diagram shows a very simple illustration of how searching works with a provided custom routing value:


As you can see, Elasticsearch will send our query to a single shard. Now let's look at how we can specify the routing values.


The routing parameters

The idea is very simple. The endpoint used for all the operations connected with fetching or storing documents in Elasticsearch allows us to use an additional parameter called routing. You can add it to your HTTP request or set it by using the client library of your choice.

So, in order to index a sample document to the previously shown index, we will use the following command:

curl -XPUT 'http://localhost:9200/posts/post/1?routing=12' -d '{

"id":"1",

"name":"Test document",

"contents":"Test document",

"userId":"12"

}'

If we now get back to our previous query fetching our user's data and we modify it to use routing, it would look as follows:

curl -XGET 'http://localhost:9200/posts/_search?routing=12&q=userId:12'

As you can see, the same routing value was used during indexing and querying. This is possible in most cases when routing is used. We know which user data we are indexing and we will probably know which user is searching for the data. In our case, our imaginary user was given the identifier of 12 and we used that value during indexing and searching.

Note that during searching you can specify multiple routing values separated by commas. For example, if we want the preceding query to be additionally routed by the value of the section parameter (if it existed) and we also want to filter by this parameter, our query will look like the following:

curl -XGET 'http://localhost:9200/posts/_search?routing=12,6654&q=userId:12+AND+section:6654'

Of course, the preceding command can match multiple shards now, as the values given to routing can point to multiple shards. Because of that, you need to provide only a single routing value during indexation (Elasticsearch needs to be pointed to a single shard or indexation will fail). You can of course query multiple shards at the same time and because of that multiple routing values can be provided during searching.
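When such URLs are built in application code rather than typed by hand, the routing values and the query need to be encoded properly. The following sketch uses only the Python standard library; the helper name routed_search_url is hypothetical, and the host and index are the ones from the example above.

```python
from urllib.parse import urlencode


def routed_search_url(base_url, routing_values, query):
    """Build a _search URL with one or more routing values and a URI query."""
    params = {
        # Multiple routing values are joined with commas.
        "routing": ",".join(str(value) for value in routing_values),
        "q": query,
    }
    # Keep ',' and ':' readable in the URL; spaces are encoded as '+'.
    return base_url + "?" + urlencode(params, safe=",:")


url = routed_search_url(
    "http://localhost:9200/posts/_search",
    [12, 6654],
    "userId:12 AND section:6654",
)
print(url)
# http://localhost:9200/posts/_search?routing=12,6654&q=userId:12+AND+section:6654
```

The printed URL matches the one used in the curl command shown earlier.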

Note

Remember that routing is not the only thing that is required to get results for a given user. That's because we usually have fewer shards than unique routing values. This means that we will have data from multiple users in a single shard. So, when using routing, you should also narrow down your results to the ones for a given user. You'll learn more about how you can do that in Chapter 3, Searching Your Data.


Routing fields

Specifying the routing value with each request is critical when using an index operation. Without it, Elasticsearch uses the default way of determining where the document should be stored: it uses the hash value of the document identifier. This may lead to a situation where one document exists in many versions on different shards. A similar situation may occur when fetching the document. When a document is stored with a given routing value, we may hit the wrong shard and the document may not be found.

In fact, Elasticsearch allows us to change the default behavior and force us to use routing when working with a given index. To do that, we need to add the following section to our type definition:

"_routing":{

"required":true

}

The preceding definition means that the routing value needs to be provided (the "required": true property); without it, an index request will fail.


Summary

In this chapter, we've learned a lot when it comes to indexation and data handling in Elasticsearch. We started with basic information about Elasticsearch and we proceeded to tuning the schema-less behavior in Elasticsearch. We learned how to configure our mappings, use out of the box language analysis capabilities of Elasticsearch, and create our own mappings. We looked at batch indexing to speed up indexation and we added additional internal information to the documents in our indices. Finally, we looked at segment merging and routing.

In the next chapter, we will fully concentrate on searching and the extensive query language of Elasticsearch. We will start with how to query Elasticsearch and how the Elasticsearch query process works. We will learn about all the basic queries and compound queries to be able to use them in our applications. Finally, we will see which query should be chosen for the given use case.


Chapter 3. Searching Your Data

In the previous chapter, we dived into Elasticsearch indexing. We learned a lot when it comes to data handling. We saw how to tune the Elasticsearch schema-less mechanism and we now know how to create our own mappings. We also saw the core types of Elasticsearch and we used analyzers, both the ones that come out of the box with Elasticsearch and the one we defined ourselves. We used bulk indexing and we added additional internal information to our indices. Finally, we learned what segment merging is, how we can fine tune it, how to use routing in Elasticsearch, and what it gives us. This chapter is fully dedicated to querying. By the end of this chapter, you will have learned the following topics:

How to query Elasticsearch

What happens internally when queries are run

What are the basic queries in Elasticsearch

What are the compound queries in Elasticsearch that allow us to group other queries

How to use position aware queries: span queries

How to choose the right query for the job


Querying Elasticsearch

So far, when we have searched our data, we used the REST API and a simple query or the GET request. Similarly, when we were changing the index, we also used the REST API and sent the JSON-structured data to Elasticsearch. Regardless of the type of operation we wanted to perform, whether it was a mapping change or document indexation, we used a JSON-structured request body to inform Elasticsearch about the operation details.

A similar situation happens when we want to send more than a simple query to Elasticsearch: we structure it using JSON objects and send it to Elasticsearch in the request body. This is called the query DSL. In a broader view, Elasticsearch supports two kinds of queries: basic ones and compound ones. Basic queries, such as the term query, are used for querying the actual data. We will cover these in the Basic queries section of this chapter. The second type of query is the compound query, such as the bool query, which can combine multiple queries. We will cover these in the Compound queries section of this chapter.
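To give a first impression of how basic and compound queries nest, here is a small sketch that assembles a request body as plain Python dictionaries and serializes it to JSON. The field names title and available come from this chapter's example data; the particular combination of clauses is only an illustration, not a query taken from the book.

```python
import json

# A basic query: match documents containing the term "crime" in the title field.
basic_query = {"term": {"title": "crime"}}

# A compound query: a bool query that combines other queries.
compound_query = {
    "bool": {
        "must": [
            basic_query,
            {"term": {"available": True}},
        ]
    }
}

# The query DSL body that would be sent to the _search endpoint.
request_body = json.dumps({"query": compound_query}, indent=2)
print(request_body)
```

Because queries are just nested JSON objects, a compound query can wrap any number of basic queries, or other compound queries, in the same way.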

However, this is not the whole picture. In addition to these two types of queries, certain queries can have filters that are used to narrow down your results with certain criteria. Filter queries don't affect scoring and are usually very efficient and easily cached.

To make it even more complicated, queries can contain other queries (don't worry; we will try to explain all this!). Furthermore, some queries can contain filters and others can contain both queries and filters. Although this is not everything, we will stick with this working explanation for now. We will go over this in greater detail in the Compound queries section in this chapter and the Filtering your results section in Chapter 4, Extending Your Querying Knowledge.


The example data

If not stated otherwise, the following mappings will be used for the rest of the chapter:

{

"book":{

"properties":{

"author":{

"type":"string"

},

"characters":{

"type":"string"

},

"copies":{

"type":"long",

"ignore_malformed":false

},

"otitle":{

"type":"string"

},

"tags":{

"type":"string",

"index":"not_analyzed"

},

"title":{

"type":"string"

},

"year":{

"type":"long",

"ignore_malformed":false,

"index":"analyzed"

},

"available":{

"type":"boolean"

}

}

}

}

The preceding mappings represent a simple library and were used to create the library index. One thing to remember is that Elasticsearch will analyze the string-based fields if we don't configure it differently.

The preceding mappings were stored in the mapping.json file and, in order to create the mentioned library index, we can use the following commands:

curl -XPOST 'localhost:9200/library'

curl -XPUT 'localhost:9200/library/book/_mapping' -d @mapping.json

We also used the following sample data as the example documents for this chapter:

{"index": {"_index": "library", "_type": "book", "_id": "1"}}
{"title": "All Quiet on the Western Front", "otitle": "Im Westen nichts Neues", "author": "Erich Maria Remarque", "year": 1929, "characters": ["Paul Bäumer", "Albert Kropp", "Haie Westhus", "Fredrich Müller", "Stanislaus Katczinsky", "Tjaden"], "tags": ["novel"], "copies": 1, "available": true, "section": 3}
{"index": {"_index": "library", "_type": "book", "_id": "2"}}
{"title": "Catch-22", "author": "Joseph Heller", "year": 1961, "characters": ["John Yossarian", "Captain Aardvark", "Chaplain Tappman", "Colonel Cathcart", "Doctor Daneeka"], "tags": ["novel"], "copies": 6, "available": false, "section": 1}
{"index": {"_index": "library", "_type": "book", "_id": "3"}}
{"title": "The Complete Sherlock Holmes", "author": "Arthur Conan Doyle", "year": 1936, "characters": ["Sherlock Holmes", "Dr. Watson", "G. Lestrade"], "tags": [], "copies": 0, "available": false, "section": 12}
{"index": {"_index": "library", "_type": "book", "_id": "4"}}
{"title": "Crime and Punishment", "otitle": "Преступлéние и наказáние", "author": "Fyodor Dostoevsky", "year": 1886, "characters": ["Raskolnikov", "Sofia Semyonovna Marmeladova"], "tags": [], "copies": 0, "available": true}

We stored our sample data in the documents.json file and we use the following command to index it:

curl -s -XPOST 'localhost:9200/_bulk' --data-binary @documents.json

This command runs bulk indexing. You can learn more about it in the Batch indexing to speed up your indexing process section in Chapter 2, Indexing Your Data.


A simple query

The simplest way to query Elasticsearch is to use the URI request query. We already discussed it in the Searching with the URI request query section of Chapter 1, Getting Started with Elasticsearch Cluster. For example, to search for the word crime in the title field, you could send a query using the following command:

curl -XGET 'localhost:9200/library/book/_search?q=title:crime&pretty'

This is a very simple, but limited, way of submitting queries to Elasticsearch. If we look from the point of view of the Elasticsearch query DSL, the preceding query is a query_string query. It searches for the documents that have the term crime in the title field and can be rewritten as follows:

{

"query":{

"query_string":{"query":"title:crime"}

}

}

Sending a query using the query DSL is a bit different, but still not rocket science. We send the GET HTTP request (POST is also accepted in case your tool or library doesn't allow sending a request body in HTTP GET requests) to the _search REST endpoint as earlier and include the query in the request body. Let's take a look at the following command:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"query_string":{"query":"title:crime"}

}

}'

As you can see, we used the request body (the -d switch) to send the whole JSON-structured query to Elasticsearch. The pretty request parameter tells Elasticsearch to structure the response in such a way that we humans can read it more easily. In response to the preceding command, we get the following output:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5,
    "hits": [{
      "_index": "library",
      "_type": "book",
      "_id": "4",
      "_score": 0.5,
      "_source": {
        "title": "Crime and Punishment",
        "otitle": "Преступлéние и наказáние",
        "author": "Fyodor Dostoevsky",
        "year": 1886,
        "characters": ["Raskolnikov", "Sofia Semyonovna Marmeladova"],
        "tags": [],
        "copies": 0,
        "available": true
      }
    }]
  }
}

Nice! We got our first search results with the query DSL.
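The same request can be prepared from application code. The sketch below uses only the Python standard library and merely builds the request object without sending it, so it runs without a cluster; the helper name build_search_request is hypothetical, and POST is chosen because, as mentioned above, it is accepted for clients that cannot send a body with GET.

```python
import json
import urllib.request


def build_search_request(url, query_body):
    """Prepare (but do not send) an HTTP request carrying a query DSL body."""
    data = json.dumps(query_body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    "http://localhost:9200/library/book/_search?pretty",
    {"query": {"query_string": {"query": "title:crime"}}},
)
print(request.get_method(), request.full_url)
# To run it against a local node: urllib.request.urlopen(request)
```

Keeping the query as a Python dictionary until the last moment makes it easy to add further top-level properties to the request body before serializing it.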


Paging and result size

Elasticsearch allows us to control how many results we want to get (at most) and from which result we want to start. The following are the two additional properties that can be set in the request body:

from: This property specifies the document that we want to have our results from. Its default value is 0, which means that we want to get our results from the first document.

size: This property specifies the maximum number of documents we want as the result of a single query (which defaults to 10). For example, if we are only interested in aggregation results and don't care about the documents returned by the query, we can set this parameter to 0.

If we want our query to get documents starting from the tenth item on the list and fetch 20 documents, we send the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"from":9,

"size":20,

"query":{

"query_string":{"query":"title:crime"}

}

}'
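Applications usually think in pages rather than offsets. The following sketch shows one way to translate a 1-based page number into the from and size properties; the helper names page_params and search_body are hypothetical.

```python
def page_params(page_number, page_size=10):
    """Translate a 1-based page number into the from/size request properties."""
    if page_number < 1 or page_size < 0:
        raise ValueError("page_number must be >= 1 and page_size must be >= 0")
    # from is zero-based: the first page starts at document 0.
    return {"from": (page_number - 1) * page_size, "size": page_size}


def search_body(query_string, page_number, page_size=10):
    """Build a request body with paging and a query_string query."""
    body = page_params(page_number, page_size)
    body["query"] = {"query_string": {"query": query_string}}
    return body


print(page_params(1))      # {'from': 0, 'size': 10}
print(page_params(3, 20))  # {'from': 40, 'size': 20}
```

Note that from is an offset counted in documents, not in pages, which is why the command above uses 9 to start from the tenth item.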

Tip

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

1. Log in or register to our website using your e-mail address and password

2. Hover the mouse pointer on the SUPPORT tab at the top

3. Click on Code Downloads & Errata

4. Enter the name of the book in the Search box

5. Select the book for which you're looking to download the code files

6. Choose from the drop-down menu where you purchased this book from

7. Click on Code Download

Once the file is downloaded, make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows

Zipeg / iZip / UnRarX for Mac

7-Zip / PeaZip for Linux


Returning the version value

In addition to all the information returned, Elasticsearch can return the version of the document (we mentioned versioning in Chapter 1, Getting Started with Elasticsearch Cluster). To do this, we need to add the version property with the value of true to the top level of our JSON object. So, the final query, which requests the version information, will look as follows:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"version":true,

"query":{

"query_string":{"query":"title:crime"}

}

}'

After running the preceding query, we get the following results:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5,
    "hits": [{
      "_index": "library",
      "_type": "book",
      "_id": "4",
      "_version": 1,
      "_score": 0.5,
      "_source": {
        "title": "Crime and Punishment",
        "otitle": "Преступлéние и наказáние",
        "author": "Fyodor Dostoevsky",
        "year": 1886,
        "characters": ["Raskolnikov", "Sofia Semyonovna Marmeladova"],
        "tags": [],
        "copies": 0,
        "available": true
      }
    }]
  }
}

As you can see, the _version section is present for the single hit we got.


Limiting the score

For nonstandard use cases, Elasticsearch provides a feature that lets us filter the results on the basis of a minimum score value that the document must have to be considered a match. In order to use this feature, we must provide the min_score value at the top level of our JSON object with the value of the minimum score. For example, if we want our query to only return documents with a score higher than 0.75, we send the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"min_score":0.75,

"query":{

"query_string":{"query":"title:crime"}

}

}'

We get the following response after running the preceding query:

{

"took":3,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":0,

"max_score":null,

"hits":[]

}

}

If you look at the previous examples, the score of our document was 0.5, which is lower than 0.75, and thus we didn't get any documents in response.

Limiting the score usually doesn't make much sense because comparing scores between the queries is quite hard. However, maybe in your case, this functionality will be needed.
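The effect of min_score can be mimicked on the client side with a simple filter over the returned hits. This is only an illustrative sketch: the function name apply_min_score is hypothetical, and it assumes that hits scoring exactly the minimum are kept.

```python
def apply_min_score(hits, min_score):
    """Keep only the hits whose _score is at least min_score."""
    # Assumption: a hit scoring exactly min_score still counts as a match.
    return [hit for hit in hits if hit["_score"] >= min_score]


# The single hit from the earlier responses, reduced to the relevant fields.
hits = [{"_id": "4", "_score": 0.5}]
print(apply_min_score(hits, 0.75))  # []
print(apply_min_score(hits, 0.25))  # [{'_id': '4', '_score': 0.5}]
```

Letting Elasticsearch apply min_score on the server is normally preferable, since low-scoring documents are then never transferred to the client.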


Choosing the fields that we want to return

With the use of the fields array in the request body, Elasticsearch allows us to define which fields to include in the response. Remember that you can only return these fields if they are marked as stored in the mappings used to create the index, or if the _source field was used (Elasticsearch uses the _source field to provide the stored values and the _source field is turned on by default).

So, for example, to return only the title and the year fields in the results (for each document), send the following query to Elasticsearch:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"fields":["title","year"],

"query":{

"query_string":{"query":"title:crime"}

}

}'

In response, we get the following output:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5,
    "hits": [{
      "_index": "library",
      "_type": "book",
      "_id": "4",
      "_score": 0.5,
      "fields": {
        "title": ["Crime and Punishment"],
        "year": [1886]
      }
    }]
  }
}

As you can see, everything worked as we wanted it to. There are four things we would like to share with you at this point, which are as follows:

If we don't define the fields array, it will use the default value and return the _source field if available.

If we use the _source field and request a field that is not stored, then that field will be extracted from the _source field (however, this requires additional processing).

If we want to return all the stored fields, we just pass an asterisk (*) as the field name.

From a performance point of view, it's better to return the _source field instead of multiple stored fields. This is because getting multiple stored fields may be slower compared to retrieving a single _source field.


Source filtering

In addition to choosing which fields are returned, Elasticsearch allows us to use so-called source filtering. This functionality allows us to control which fields are returned from the _source field. Elasticsearch exposes several ways to do this. The simplest source filtering allows us to decide whether the document source should be returned or not. Consider the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"_source":false,

"query":{

"query_string":{"query":"title:crime"}

}

}'

The result returned by Elasticsearch should be similar to the following one:

{

"took":12,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.5

}]

}

}

Note that the response is limited to base information about a document and the _source field was not included. If you use Elasticsearch as a second source of data and the content of the document is served from an SQL database or cache, the document identifier is all you need.

Thesecondwayissimilartothatdescribedintheprecedingfields,althoughwedefinewhichfieldsshouldbereturnedinthedocumentsourceitself.Let’sseethatusingthefollowingexamplequery:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"_source":["title","otitle"],

"query":{

"query_string":{"query":"title:crime"}

}

}'

We wanted to get the title and the otitle document fields in the returned _source field. Elasticsearch extracted those values from the original _source value and included the _source field only with the requested fields. The whole response returned by Elasticsearch looked as follows:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5,
    "hits": [{
      "_index": "library",
      "_type": "book",
      "_id": "4",
      "_score": 0.5,
      "_source": {
        "otitle": "Преступлéние и наказáние",
        "title": "Crime and Punishment"
      }
    }]
  }
}

We can also use an asterisk to select which fields should be returned in the _source field; for example, title* will return values for the title field and for title10 (if we have such a field in our data). If we have documents with nested parts, we can use notation with a dot; for example, title.* to select all the fields nested under the title object.

Finally, we can also specify explicitly which fields we want to include in and which to exclude from the _source field. We can include fields using the include property and we can exclude fields using the exclude property (both of them are arrays of values). For example, if we want the returned _source field to include all the fields starting with the letter t but not the title field, we will run the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
  "_source": {
    "include": ["t*"],
    "exclude": ["title"]
  },
  "query": {
    "query_string": { "query": "title:crime" }
  }
}'
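To make the include and exclude semantics concrete, the following Python sketch mimics them on a plain dictionary. It is only a simplified model, not the Elasticsearch implementation: the filter_source function and the sample document are made up for illustration, and only top-level field names are handled (no dotted paths).

```python
from fnmatch import fnmatchcase


def filter_source(source, include=None, exclude=None):
    """Keep top-level fields that match an include pattern and no exclude pattern."""
    include = include or ["*"]   # no include list means "everything"
    exclude = exclude or []
    return {
        name: value
        for name, value in source.items()
        if any(fnmatchcase(name, pattern) for pattern in include)
        and not any(fnmatchcase(name, pattern) for pattern in exclude)
    }


# Hypothetical document used only for this illustration.
doc = {
    "title": "Crime and Punishment",
    "tags": ["novel"],
    "year": 1886,
    "author": "Fyodor Dostoevsky",
}

filtered = filter_source(doc, include=["t*"], exclude=["title"])
print(filtered)
```

With this sample document, only the tags field survives: it starts with t and is not excluded, while title is removed by the exclude list.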

Using the script fields

Elasticsearch allows us to use script-evaluated values that will be returned with the result documents (we will discuss Elasticsearch scripting capabilities in greater detail in the Scripting capabilities of Elasticsearch section in Chapter 6, Make Your Search Better). To use the script fields functionality, we add the script_fields section to our JSON query object and an object with a name of our choice for each scripted value that we want to return. For example, to return a value named correctYear, which is calculated as the year field minus 1800, we run the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
  "script_fields": {
    "correctYear": {
      "script": "doc[\"year\"].value - 1800"
    }
  },
  "query": {
    "query_string": { "query": "title:crime" }
  }
}'

Note

By default, Elasticsearch doesn’t allow us to use dynamic scripting. If you tried the preceding query, you probably got an error with information stating that the scripts of type [inline] with operation [search] and language [groovy] are disabled. To make this example work, you should add the script.inline: on property to the elasticsearch.yml file. However, this exposes a security threat. Make sure to read the Scripting capabilities of Elasticsearch section in Chapter 6, Make Your Search Better, to learn about the consequences.

Using the doc notation, like we did in the preceding example, allows us to cache the results returned and speed up script execution at the cost of higher memory consumption. We are also limited to single-valued and single-term fields. If we care about memory usage, or if we are using more complicated field values, we can always use the _source field. The same query using the _source field looks as follows:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
  "script_fields": {
    "correctYear": {
      "script": "_source.year - 1800"
    }
  },
  "query": {
    "query_string": { "query": "title:crime" }
  }
}'

The following response is returned by Elasticsearch with dynamic scripting enabled:

{
  "took": 76,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5,
    "hits": [{
      "_index": "library",
      "_type": "book",
      "_id": "4",
      "_score": 0.5,
      "fields": {
        "correctYear": [86]
      }
    }]
  }
}

As you can see, we got the calculated correctYear field in response.

Passing parameters to the script fields

Let’s take a look at one more feature of the script fields: the passing of additional parameters. Instead of having the value 1800 in the equation, we can use a variable name and pass its value in the params section. If we do this, our query will look as follows:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
  "script_fields": {
    "correctYear": {
      "script": "_source.year - paramYear",
      "params": {
        "paramYear": 1800
      }
    }
  },
  "query": {
    "query_string": { "query": "title:crime" }
  }
}'

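As you can see, we added the paramYear variable as part of the scripted equation and provided its value in the params section. Because the script text itself stays the same from request to request and only the parameter value changes, Elasticsearch can execute the same script with different parameter values in a slightly more efficient way.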
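On the client side, this pattern usually means keeping the script text constant and changing only the params object. The following Python sketch builds such request bodies; the script_field_query helper is a hypothetical name introduced for this illustration, and the sketch only prints the JSON instead of sending it to a cluster.

```python
import json


def script_field_query(param_year):
    """Build the search body; only the params section differs between calls."""
    return {
        "script_fields": {
            "correctYear": {
                "script": "_source.year - paramYear",
                "params": {"paramYear": param_year},
            }
        },
        "query": {"query_string": {"query": "title:crime"}},
    }


# Two requests that share the same script text but use different parameter values.
bodies = [script_field_query(year) for year in (1800, 1900)]
print(json.dumps(bodies[0], indent=2))
```

The printed body can be passed to curl with the -d switch, exactly like the handwritten query shown earlier.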

Understanding the querying process

After reading the previous section, we now know how querying works in Elasticsearch. You know that Elasticsearch, in most cases, needs to scatter the query across multiple nodes, get the results, merge them, fetch the relevant documents from one or more shards, and return the final results to the client requesting the documents. What we didn’t talk about are two additional things that define how queries behave: search type and query execution preference. We will now concentrate on these functionalities of Elasticsearch.

Query logic

Elasticsearch is a distributed search engine and so all functionality provided must be distributed in its nature. It is exactly the same with querying. Because we would like to discuss some more advanced topics on how to control the query process, we first need to know how it works.

Let’s now get back to how querying works. We started the theory in the first chapter and we would like to get back to it. By default, if we don’t alter anything, the query process will consist of two phases: the scatter phase and the gather phase. The aggregator node (the one that receives the request) will run the scatter phase first. During that phase, the query is distributed to all the shards that our index is built from (of course, if routing is not used). For example, if it is built of 5 shards and 1 replica, then 5 physical shards will be queried (we don’t need to query a shard and its replica, as they contain the same data). Each of the queried shards will only return the document identifier and the score of the document. The node that sent the scatter query will wait for all the shards to complete their task, gather the results, and sort them appropriately (in this case, from the top scoring to the lowest scoring ones).

After that, a new request is sent to build the search results, but this time only to the shards that hold the documents needed to build the response. In most cases, Elasticsearch won’t send the request to all the shards but only to a subset of them. That’s because we usually don’t get the complete result of the query but only a portion of it. This phase is called the gather phase. After all the documents are gathered, the final response is built and returned as the query result. This is the basic and default Elasticsearch behavior, but we can change it.
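The two phases can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Elasticsearch is implemented: the shards are hypothetical in-memory dictionaries, the scores are made up, and the function names scatter and gather are chosen to mirror the phase names used above.

```python
import heapq

# Hypothetical shards: document id -> (score for the current query, stored document).
SHARDS = [
    {"1": (0.2, {"title": "A"}), "4": (0.9, {"title": "Crime and Punishment"})},
    {"2": (0.5, {"title": "B"})},
    {"3": (0.1, {"title": "C"}), "5": (0.7, {"title": "D"})},
]


def scatter(shards, size):
    """Scatter phase: each shard returns only scores and identifiers of its top hits."""
    hits = []
    for shard_no, shard in enumerate(shards):
        local = [(score, shard_no, doc_id) for doc_id, (score, _) in shard.items()]
        hits.extend(heapq.nlargest(size, local))
    return hits


def gather(shards, hits, size):
    """Gather phase: sort globally, then fetch documents only from the shards that own a top hit."""
    top = heapq.nlargest(size, hits)
    touched = sorted({shard_no for _, shard_no, _ in top})
    docs = [
        {"_id": doc_id, "_score": score, "_source": shards[shard_no][doc_id][1]}
        for score, shard_no, doc_id in top
    ]
    return docs, touched


results, touched_shards = gather(SHARDS, scatter(SHARDS, size=2), size=2)
print([doc["_id"] for doc in results], touched_shards)
```

In this toy data set, all three shards take part in the first phase, but only two of them are contacted again to fetch document contents, which is the saving described above.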

Search type

Elasticsearch allows us to choose how we want our query to be processed internally. We can do that by specifying the search type. There are different situations where different search types are appropriate: sometimes one can care only about the performance, while sometimes query relevance is the most important factor. You should remember that each shard is a small Lucene index and, in order to return more relevant results, some information, such as frequencies, needs to be transferred between the shards. To control how the queries are executed, we can pass the search_type request parameter and set it to one of the following values:

query_then_fetch: In the first step, the query is executed to get the information needed to sort and rank the documents. This step is executed against all the shards. Then only the relevant shards are queried for the actual content of the documents. This is the search type used by default if no search type is provided with the query, and it is the query type we described previously.

dfs_query_then_fetch: This is similar to query_then_fetch. However, compared to query_then_fetch, it contains an additional query phase that calculates distributed term frequencies.

There are also two deprecated search types: count and scan. The first one is deprecated starting from Elasticsearch 2.0 and the second one starting with Elasticsearch 2.1. The first search type used to provide benefits where only aggregations or the number of documents was relevant, but now it is enough to add size equal to 0 to your queries. The scan request was used for scrolling functionality.

So if we would like to use the simplest search type, we would run the following command:

curl -XGET 'localhost:9200/library/book/_search?pretty&search_type=query_then_fetch' -d '{
  "query": {
    "term": { "title": "crime" }
  }
}'
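To see why the distributed frequencies gathered by dfs_query_then_fetch can matter, consider the rough Python sketch below. It compares an inverse document frequency computed separately on each shard with one computed over all the documents. The shard contents are invented, and the idf formula is the classic Lucene-style one, used here only as an illustration of the effect and not as a statement about what a given Elasticsearch version computes.

```python
import math

# Hypothetical shards: each is a list of already analyzed documents (lists of terms).
SHARDS = [
    [["crime", "novel"], ["crime", "story"], ["crime", "drama"]],
    [["crime", "punishment"], ["war", "peace"], ["old", "man"], ["sea", "tale"]],
]


def doc_freq(docs, term):
    """Number of documents that contain the term."""
    return sum(term in doc for doc in docs)


def idf(num_docs, df):
    """Classic Lucene-style inverse document frequency (illustration only)."""
    return 1.0 + math.log(num_docs / (df + 1.0))


term = "crime"
local_idf = [idf(len(shard), doc_freq(shard, term)) for shard in SHARDS]
all_docs = [doc for shard in SHARDS for doc in shard]
global_idf = idf(len(all_docs), doc_freq(all_docs, term))
print(local_idf, global_idf)
```

Because the term is common on the first shard and rare on the second, the two shards would weight it differently when scoring on their own; using the global value makes scores from different shards comparable, at the cost of the additional phase.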

Search execution preference

In addition to the possibility of controlling how the query is executed, we can also control on which shards to execute the query. By default, Elasticsearch uses shards and replicas on any node in a round robin manner, so that each shard is queried a similar number of times. The default behavior is the proper method of shard execution preference for most use cases. But there may be times when we want to change the default behavior. For example, you may want the search to only be executed on the primary shards. To do that, we can set the preference request parameter to one of the following values:

_primary: The operation will be only executed on the primary shards, so the replicas won’t be used. This can be useful when we need to use the latest information from the index but our data is not replicated right away.

_primary_first: The operation will be executed on the primary shards if they are available. If not, it will be executed on the other shards.

_replica: The operation will be executed only on the replica shards.

_replica_first: This operation is similar to _primary_first, but uses replica shards. The operation will be executed on the replica shards if possible and on the primary shards if the replicas are not available.

_local: The operation will be executed on the shards available on the node which the request was sent from and, if such shards are not present, the request will be forwarded to the appropriate nodes.

_only_node:node_id: This operation will be executed on the node with the provided node identifier.

_only_nodes:nodes_spec: This operation will be executed on the nodes that are defined in nodes_spec. This can be an IP address, a name, a name or IP address using wildcards, and so on. For example, if nodes_spec is set to 192.168.1.*, the operation will be run on the nodes with IP addresses starting with 192.168.1.

_prefer_node:node_id: Elasticsearch will try to execute the operation on the node with the provided identifier. However, if the node is not available, it will be executed on the nodes that are available.

_shards:1,2: Elasticsearch will execute the operation on the shards with the given identifiers; in this case, on shards with identifiers 1 and 2. The _shards parameter can be combined with other preferences, but the shards identifiers need to be provided first. For example, _shards:1,2;_local.

Custom value: Any custom string value may be passed. Requests with the same values provided will be executed on the same shards.

For example, if we would like to execute a query only on the local shards, we would run the following command:

curl -XGET 'localhost:9200/library/_search?pretty&preference=_local' -d '{
  "query": {
    "term": { "title": "crime" }
  }
}'
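When a preference is assembled programmatically, the ordering rule for _shards is easy to get wrong. The following Python sketch is one possible way to keep it in a single place; build_preference and search_url are hypothetical helper names, and the separator follows the _shards:1,2;_local form shown above.

```python
from urllib.parse import quote


def build_preference(shards=None, preference=None):
    """Join an optional _shards list with one other preference, shards first."""
    parts = []
    if shards:
        parts.append("_shards:" + ",".join(str(shard) for shard in shards))
    if preference:
        parts.append(preference)
    return ";".join(parts)


def search_url(index, preference):
    """Build the search URL, percent-encoding characters that are unsafe in a query string."""
    return "localhost:9200/%s/_search?preference=%s" % (index, quote(preference, safe=":,"))


pref = build_preference(shards=[1, 2], preference="_local")
print(pref)
print(search_url("library", pref))
```

A custom string value, such as a user session identifier, can be passed through the same helper as the preference argument.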

Search shards API

When discussing the search preference, we would also like to mention the search shards API exposed by Elasticsearch. This API allows us to check which shards the query will be executed on. In order to use this API, run a request against the _search_shards REST endpoint. For example, to see how the query will be executed, we run the following command:

curl -XGET 'localhost:9200/library/_search_shards?pretty' -d '{"query":{"match_all":{}}}'

The response to the preceding command will be as follows:

{
  "nodes": {
    "my0DcA_MTImm4NE3cG3ZIg": {
      "name": "Cloud9",
      "transport_address": "127.0.0.1:9300",
      "attributes": {}
    }
  },
  "shards": [[{
    "state": "STARTED",
    "primary": true,
    "node": "my0DcA_MTImm4NE3cG3ZIg",
    "relocating_node": null,
    "shard": 0,
    "index": "library",
    "version": 4,
    "allocation_id": {
      "id": "9ayLDbL1RVSyJRYIJkuAxg"
    }
  }], [{
    "state": "STARTED",
    "primary": true,
    "node": "my0DcA_MTImm4NE3cG3ZIg",
    "relocating_node": null,
    "shard": 1,
    "index": "library",
    "version": 4,
    "allocation_id": {
      "id": "wfpvtaLER-KVyOsuD46Yqg"
    }
  }], [{
    "state": "STARTED",
    "primary": true,
    "node": "my0DcA_MTImm4NE3cG3ZIg",
    "relocating_node": null,
    "shard": 2,
    "index": "library",
    "version": 4,
    "allocation_id": {
      "id": "zrLPWhCOSTmjlb8TY5rYQA"
    }
  }], [{
    "state": "STARTED",
    "primary": true,
    "node": "my0DcA_MTImm4NE3cG3ZIg",
    "relocating_node": null,
    "shard": 3,
    "index": "library",
    "version": 4,
    "allocation_id": {
      "id": "efnvY7YcSz6X8X8USacA7g"
    }
  }], [{
    "state": "STARTED",
    "primary": true,
    "node": "my0DcA_MTImm4NE3cG3ZIg",
    "relocating_node": null,
    "shard": 4,
    "index": "library",
    "version": 4,
    "allocation_id": {
      "id": "XJHW2J63QUKdh3bK3T2nzA"
    }
  }]]
}

As you can see, in the response returned by Elasticsearch, we have the information about the shards that will be used during the query process. Of course, with the search shards API, we can use additional parameters that control the querying process. These properties are routing, preference, and local. We are already familiar with the first two. The local parameter is a Boolean (values true or false) that allows us to tell Elasticsearch to use the cluster state information stored on the local node (setting local to true) instead of the one from the master node (setting local to false). This allows us to diagnose problems with cluster state synchronization.
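A response of this shape is easy to condense into a short overview. The Python sketch below assumes the response has already been parsed into a dictionary with the nodes and shards keys shown in the preceding listing; summarize_shards is a hypothetical helper, and the sample is a shortened copy of that listing.

```python
def summarize_shards(response):
    """Return (shard number, node name, is primary) for every shard copy listed."""
    nodes = response["nodes"]
    rows = []
    for copies in response["shards"]:   # one inner list per shard
        for copy in copies:             # one entry per shard copy that may serve the query
            rows.append((copy["shard"], nodes[copy["node"]]["name"], copy["primary"]))
    return rows


# Shortened version of the response shown above.
sample = {
    "nodes": {"my0DcA_MTImm4NE3cG3ZIg": {"name": "Cloud9"}},
    "shards": [
        [{"shard": 0, "node": "my0DcA_MTImm4NE3cG3ZIg", "primary": True}],
        [{"shard": 1, "node": "my0DcA_MTImm4NE3cG3ZIg", "primary": True}],
    ],
}

for shard, node_name, primary in summarize_shards(sample):
    print(shard, node_name, "primary" if primary else "replica")
```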

Basic queries

Elasticsearch has extensive search and data analysis capabilities that are exposed in the form of different queries, filters, aggregates, and so on. In this section, we will concentrate on the basic queries provided by Elasticsearch. By basic queries we mean the ones that don’t combine other queries together but run on their own.

The term query

The term query is one of the simplest queries in Elasticsearch. It just matches the documents that have a term in a given field: the exact, not analyzed term. The simplest term query is as follows:

{
  "query": {
    "term": {
      "title": "crime"
    }
  }
}

It will match the documents that have the term crime in the title field. Remember that the term query is not analyzed, so you need to provide the exact term that will match the term in the indexed document. Note that in our input data, we have the title field with the value of Crime and Punishment (uppercased), but we are searching for crime, because the Crime term becomes crime after analysis during indexing.
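The effect can be reproduced with a very rough stand-in for an analyzer. In the Python sketch below, analyze only splits on whitespace and lowercases, which is an assumption made for the illustration and much simpler than a real Elasticsearch analyzer; term_matches compares the query term as-is, the way a term query does.

```python
def analyze(text):
    """Rough stand-in for an analyzer: split on whitespace and lowercase each token."""
    return [token.lower() for token in text.split()]


def term_matches(indexed_terms, term):
    """A term query compares the given term as-is, without analyzing it."""
    return term in indexed_terms


indexed = analyze("Crime and Punishment")
print(indexed)
print(term_matches(indexed, "crime"), term_matches(indexed, "Crime"))
```

The lowercased term is found among the indexed terms, while the capitalized form is not, even though it is the form that appears in the original document.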

In addition to the term we want to find, we can also include the boost attribute in our term query, which will affect the importance of the given term. We will talk more about boosts in the Introduction to Apache Lucene scoring section of Chapter 6, Make Your Search Better. For now, we just need to remember that it changes the importance of the given part of the query.

For example, to change our previous query and give our term query a boost of 10.0, send the following query:

{
  "query": {
    "term": {
      "title": {
        "value": "crime",
        "boost": 10.0
      }
    }
  }
}

As you can see, the query changed a bit. Instead of a simple term value, we nested a new JSON object which contains the value property and the boost property. The value of the value property should contain the term we are interested in and the boost property is the boost value we want to use.

The terms query

The terms query is an extension to the term query. It allows us to match documents that have certain terms in their contents instead of a single term. The term query allowed us to match a single, not analyzed term and the terms query allows us to match multiple of those. For example, let’s say that we want to get all the documents that have the terms novel or book in the tags field. To achieve this, we will run the following query:

{
  "query": {
    "terms": {
      "tags": ["novel", "book"]
    }
  }
}

The preceding query returns all the documents that have one or both of the searched terms in the tags field. This is a key point to remember: the terms query will find documents having any of the provided terms.

The match all query

The match all query is one of the simplest queries available in Elasticsearch. It allows us to match all of the documents in the index. If we want to get all the documents from our index, we just run the following query:

{
  "query": {
    "match_all": {}
  }
}

We can also include boost in the query, which will be given to all the documents matched by it. For example, if we want to add a boost of 2.0 to all the documents in our match all query, we will send the following query to Elasticsearch:

{
  "query": {
    "match_all": {
      "boost": 2.0
    }
  }
}

The type query

The type query is a very simple query that allows us to find all the documents with a certain type. For example, if we would like to search for all the documents with the book type in our library index, we will run the following query:

{
  "query": {
    "type": {
      "value": "book"
    }
  }
}

The exists query

The exists query allows us to find all the documents that have a value in the defined field. For example, to find the documents that have a value in the tags field, we will run the following query:

{
  "query": {
    "exists": {
      "field": "tags"
    }
  }
}

The missing query

Opposite to the exists query, the missing query returns the documents that have a null value or no value at all in a given field. For example, to find all the documents that don’t have a value in the tags field, we will run the following query:

{
  "query": {
    "missing": {
      "field": "tags"
    }
  }
}

The common terms query

The common terms query is a modern Elasticsearch solution for improving query relevance and precision with common words when we are not using stop words (http://en.wikipedia.org/wiki/Stop_words). For example, a crime and punishment query results in three term queries and each of them has a cost in terms of performance. However, the and term is a very common one and its impact on the document score will be very low. The solution is the common terms query, which divides the query terms into two groups. The first group is the one with important terms, which are the ones that have lower frequency. The second group is the one with less important terms, which are the ones with high frequency. The query for the first group is executed first and Elasticsearch calculates the score for all of the terms from that group. This way the low frequency terms, which are usually the ones that have more importance, are always taken into consideration. Then Elasticsearch executes the second query for the second group of terms, but calculates the score only for the documents matched by the first query. This way the score is only calculated for the relevant documents and thus higher performance can be achieved.

An example of the common terms query is as follows:

{
  "query": {
    "common": {
      "title": {
        "query": "crime and punishment",
        "cutoff_frequency": 0.001
      }
    }
  }
}

The query can take the following parameters:

query: The actual query contents.

cutoff_frequency: The percentage (0.001 means 0.1%) or an absolute value (when the property is set to a value equal to or larger than 1). High and low frequency groups are constructed using this value. Setting this parameter to 0.001 means that the low frequency terms group will be constructed for terms having a frequency of 0.1% and lower.

low_freq_operator: This can be set to or or and, but defaults to or. It specifies the Boolean operator used for constructing queries in the low frequency term group. If we want all the terms to be present in a document for it to be considered a match, we should set this parameter to and.

high_freq_operator: This can be set to or or and, but defaults to or. It specifies the Boolean operator used for constructing queries in the high frequency term group. If we want all the terms to be present in a document for it to be considered a match, we should set this parameter to and.

minimum_should_match: Instead of using low_freq_operator and high_freq_operator, we can use minimum_should_match. Just like with the other queries, it allows us to specify the minimum number of terms that should be found in a document for it to be considered a match. We can also specify high_freq and low_freq inside the minimum_should_match object, which allows us to define the different number of terms that need to be matched for the high and low frequency terms.

boost: The boost given to the score of the documents.

analyzer: The name of the analyzer that will be used to analyze the query text, which defaults to the default analyzer.

disable_coord: Defaults to false and allows us to enable or disable the score factor computation that is based on the fraction of all the query terms that a document contains. Set it to true for less precise scoring, but slightly faster queries.

Note

Unlike the term and terms queries, the common terms query is analyzed by Elasticsearch.
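The way cutoff_frequency divides the query terms can be sketched as follows. The Python snippet is a simplified model under stated assumptions: split_by_frequency is a hypothetical helper, the document frequencies are invented, and values below 1 are interpreted as a fraction of all documents while larger values are treated as absolute counts, as described in the parameter list above.

```python
def split_by_frequency(terms, doc_freq, num_docs, cutoff_frequency):
    """Split query terms into (low frequency, high frequency) groups."""
    # Below 1 the cutoff is a fraction of all documents, otherwise an absolute count.
    limit = cutoff_frequency * num_docs if cutoff_frequency < 1 else cutoff_frequency
    low = [term for term in terms if doc_freq.get(term, 0) <= limit]
    high = [term for term in terms if doc_freq.get(term, 0) > limit]
    return low, high


# Invented document frequencies for an index with 100,000 documents.
doc_freq = {"crime": 40, "and": 90000, "punishment": 12}
low, high = split_by_frequency(["crime", "and", "punishment"], doc_freq, 100000, 0.001)
print(low, high)
```

With these numbers, crime and punishment end up in the low frequency (important) group, while and lands in the high frequency group and is scored only for the documents matched by the first group.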

The match query

The match query takes the value given in the query parameter, analyzes it, and constructs the appropriate query out of it. When using a match query, Elasticsearch will choose the proper analyzer for the field we choose, so you can be sure that the terms passed to the match query will be processed by the same analyzer that was used during indexing. Remember that the match query (and the multi_match query) doesn’t support Lucene query syntax; however, it fits perfectly as a query handler for your search box. The simplest match query (and the default one) will look like the following:

{
  "query": {
    "match": {
      "title": "crime and punishment"
    }
  }
}

The preceding query will match all the documents that have the terms crime, and, or punishment in the title field. However, the previous query is only the simplest one; there are multiple types of match query, which we will discuss now.

The Boolean match query

The Boolean match query is a query which analyzes the provided text and makes a Boolean query out of it. This is also the default type for the match query. There are a few parameters which allow us to control the behavior of the Boolean match queries:

operator: This parameter can take the value of or or and, and controls which Boolean operator is used to connect the created Boolean clauses. The default value is or. If we want all the terms in our query to be matched, we should use the and Boolean operator.

analyzer: This specifies the name of the analyzer that will be used to analyze the query text and defaults to the default analyzer.

fuzziness: Providing the value of this parameter allows us to construct fuzzy queries. The value of this parameter can vary. For numeric fields, it should be set to a numeric value; for date-based fields, it can be set to a millisecond or time value, such as 2h; and for text fields, it can be set to 0, 1, or 2 (the edit distance in the Levenshtein algorithm, https://en.wikipedia.org/wiki/Levenshtein_distance) or AUTO (which allows Elasticsearch to control how fuzzy queries are constructed and which is the preferred value). Finally, for text fields, it can also be set to values from 0.0 to 1.0, which results in edit distance being calculated as term length minus 1.0 multiplied by the provided fuzziness value. In general, the higher the fuzziness, the more difference between terms will be allowed.

prefix_length: This allows control over the behavior of the fuzzy query. For more information on the value of this parameter, refer to the The fuzzy query section in this chapter.

max_expansions: This allows control over the behavior of the fuzzy query. For more information on the value of this parameter, refer to the The fuzzy query section in this chapter.

zero_terms_query: This allows us to specify the behavior of the query when all the terms are removed by the analyzer (for example, because of stop words). It can be set to none or all, with none as the default. When set to none, no documents will be returned when the analyzer removes all the query terms. If it is set to all, all the documents will be returned.

cutoff_frequency: This allows dividing the query into two groups: one with high frequency terms and one with low frequency terms. Refer to the description of the common terms query to see how this parameter can be used.

lenient: When set to true (by default it is false), it allows us to ignore the exceptions caused by data incompatibility, such as trying to query numeric fields using a string value.

The parameters should be wrapped in the name of the field we are running the query against. So if we want to run a sample Boolean match query against the title field, we send a query as follows:

{
  "query": {
    "match": {
      "title": {
        "query": "crime and punishment",
        "operator": "and"
      }
    }
  }
}
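For text fields, the fuzziness values 0, 1, and 2 mentioned in the parameter list refer to an edit distance. The following Python sketch implements the plain Levenshtein distance so that the numbers become tangible; it is an illustration only and makes no claim about the exact variant or optimizations used inside Lucene.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # delete char_a
                current[j - 1] + 1,                     # insert char_b
                previous[j - 1] + (char_a != char_b),   # substitute, free if equal
            ))
        previous = current
    return previous[-1]


for candidate in ("crime", "crme", "grim"):
    print(candidate, levenshtein("crime", candidate))
```

A candidate such as crme is one edit away from crime, so it falls within a fuzziness of 1, while grim needs two edits and requires a fuzziness of 2.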

The phrase match query

A phrase match query is similar to the Boolean query, but, instead of constructing the Boolean clauses from the analyzed text, it constructs the phrase query. You may wonder what a phrase is when it comes to Lucene and Elasticsearch: it is two or more terms positioned one after another in a given order. The following parameters are available:

slop: An integer value that defines how many unknown words can be put between the terms in the text query for a match to be considered a phrase. The default value of this parameter is 0, which means that no additional words are allowed.

analyzer: This specifies the name of the analyzer that will be used to analyze the query text and defaults to the default analyzer.

A sample phrase match query against the title field looks like the following code:

{
  "query": {
    "match_phrase": {
      "title": {
        "query": "crime punishment",
        "slop": 1
      }
    }
  }
}

Note that we removed the and term from our query, but because the slop is set to 1, it will still match our document because we allowed one term to be present between our terms.
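The role of slop can be illustrated with a deliberately simplified phrase matcher. The Python sketch below only handles query terms that appear in the same order as in the document and counts the extra tokens between them; the real Lucene slop computation is more general (it also allows reordering), so treat phrase_matches as a hypothetical teaching aid rather than a description of the actual algorithm.

```python
def phrase_matches(doc_tokens, query_terms, slop=0):
    """In-order phrase check: the extra tokens between query terms must not exceed slop."""
    positions = {}
    for position, token in enumerate(doc_tokens):
        positions.setdefault(token, []).append(position)

    def search(term_index, previous_position, used):
        if term_index == len(query_terms):
            return True
        for position in positions.get(query_terms[term_index], []):
            if position <= previous_position:
                continue
            gap = position - previous_position - 1   # tokens skipped between the two terms
            if used + gap <= slop and search(term_index + 1, position, used + gap):
                return True
        return False

    # Try every occurrence of the first query term as the start of the phrase.
    return any(search(1, start, 0) for start in positions.get(query_terms[0], []))


title = ["crime", "and", "punishment"]
print(phrase_matches(title, ["crime", "punishment"], slop=1))
print(phrase_matches(title, ["crime", "punishment"], slop=0))
```

With slop set to 1, the single token between crime and punishment is tolerated; with the default of 0, the same query no longer matches.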

The match phrase prefix query

The last type of the match query is the match phrase prefix query. This query is almost the same as the phrase match query, but in addition, it allows prefix matches on the last term in the query text. Also, in addition to the parameters exposed by the match phrase query, it exposes an additional one: the max_expansions parameter, which controls how many prefixes the last term will be rewritten to. Our example query changed to the match_phrase_prefix query will look as follows:

{
  "query": {
    "match_phrase_prefix": {
      "title": {
        "query": "crime punishm",
        "slop": 1,
        "max_expansions": 20
      }
    }
  }
}

Note that we didn’t provide the full crime and punishment phrase, but only crime punishm, and still the query would match our document. This is because we used the match_phrase_prefix query combined with slop set to 1.

The multi match query

The multi match query is the same as the match query, but instead of running against a single field, it can be run against multiple fields with the use of the fields parameter. Of course, all the parameters you use with the match query can be used with the multi match query. So if we would like to modify our match query to be run against the title and otitle fields, we will run the following query:

{
  "query": {
    "multi_match": {
      "query": "crime punishment",
      "fields": ["title^10", "otitle"]
    }
  }
}

As shown in the preceding example, the nice thing about the multi match query is that the fields defined in it support boosting, so we can increase or decrease the importance of matches on certain fields.

However, this is not the only difference when it comes to comparison with the match query. We can also control how the query is run internally by using the type property and setting it to one of the following values:

best_fields: This is the default behavior, which finds documents having matches in any field from the defined ones, but sets the document score to the score of the best matching field. This is the most useful type when searching for multiple words and wanting to boost documents that have those words in the same field.

most_fields: This value finds documents that match any field and sets the score of the document to the combined score from all the matched fields.

cross_fields: This value treats the query as if all the terms were in one big field, thus returning documents matching any field.

phrase: This value uses the match_phrase query on each field and sets the score of the document to the score combined from all the fields.

phrase_prefix: This value uses the match_phrase_prefix query on each field and sets the score of the document to the score combined from all the fields.

In addition to the parameters mentioned in the match query and type, the multi match query exposes some additional ones allowing more control over its behavior:

tie_breaker: This allows us to specify the balance between the minimum and the maximum scoring query items and the value can be from 0.0 to 1.0. When used, the score of the document is equal to the score of the best scoring element plus the tie_breaker multiplied by the score of all the other matching fields in the document. So, when set to 0.0, Elasticsearch will only use the score of the highest scoring matching element. You can read more about it in the The dis_max query section in this chapter.
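The tie_breaker formula described above is short enough to write down directly. In the following Python sketch, combined_score is a hypothetical helper and the per-field scores are invented; it simply adds the best field score to the tie_breaker multiplied by the sum of the remaining field scores.

```python
def combined_score(field_scores, tie_breaker=0.0):
    """Best field score plus tie_breaker times the scores of all the other matching fields."""
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie_breaker * others


# Invented per-field scores for a single document.
scores = [2.0, 1.0, 0.5]
for tie_breaker in (0.0, 0.5, 1.0):
    print(tie_breaker, combined_score(scores, tie_breaker))
```

A tie_breaker of 0.0 keeps only the best field, a value of 1.0 sums all the fields, and values in between let the weaker fields contribute proportionally.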

The query string query

In comparison to the other queries available, the query string query supports full Apache Lucene query syntax, which we discussed earlier in the Lucene query syntax section of Chapter 1, Getting Started with Elasticsearch Cluster. It uses a query parser to construct an actual query using the provided text. An example query string query will look like the following code:

{
  "query": {
    "query_string": {
      "query": "title:crime^10 +title:punishment -otitle:cat +author:(+Fyodor +dostoevsky)",
      "default_field": "title"
    }
  }
}

Because we are familiar with the basics of the Lucene query syntax, we can discuss how the preceding query works. As you can see, we wanted to get the documents that may have the term crime in the title field and such documents should be boosted with the value of 10. Next, we wanted only the documents that have the term punishment in the title field and we didn’t want documents with the term cat in the otitle field. Finally, we told Lucene that we only wanted the documents that had the fyodor and dostoevsky terms in the author field.

Similar to most of the queries in Elasticsearch, the query string query provides quite a few parameters that allow us to control the query behavior and the list of parameters for this query is rather extensive:

query: This specifies the query text.
default_field: This specifies the default field the query will be executed against. It defaults to the index.query.default_field property, which is by default set to _all.
default_operator: This specifies the default logical operator (or or and) used when no operator is specified. The default value of this parameter is or.
analyzer: This specifies the name of the analyzer used to analyze the query provided in the query parameter.
allow_leading_wildcard: This specifies if a wildcard character is allowed as the first character of a term. It defaults to true.
lowercase_expanded_terms: This specifies if the terms that are a result of query rewrite should be lowercased. It defaults to true, which means that the rewritten terms will be lowercased.
enable_position_increments: This specifies if position increments should be turned on in the result query. It defaults to true.
fuzzy_max_expansions: This specifies the maximum number of terms into which a fuzzy query will be expanded, if a fuzzy query is used. It defaults to 50.
fuzzy_prefix_length: This specifies the prefix length for the generated fuzzy queries and defaults to 0. To learn more about it, look at the fuzzy query description.
phrase_slop: This specifies the phrase slop and defaults to 0. To learn more about it, look at the phrase match query description.
boost: This specifies the boost value which will be used and defaults to 1.0.
analyze_wildcard: This specifies if the terms generated by the wildcard query should be analyzed. It defaults to false, which means that those terms won’t be analyzed.
auto_generate_phrase_queries: This specifies if the phrase queries will be automatically generated from the query. It defaults to false, which means that the phrase queries won’t be automatically generated.
minimum_should_match: This controls how many of the generated Boolean should clauses should be matched against a document for the document to be considered a hit. The value can be provided as a percentage; for example, 50%, which would mean that at least 50 percent of the given terms should match. It can also be provided as an integer value, such as 2, which means that at least 2 terms must match.
fuzziness: This controls the behavior of the generated fuzzy query. Refer to the match query description for more information.
max_determinized_states: This defaults to 10000 and sets the number of states that the automaton can have for handling regular expression queries. It is used to disallow very expensive queries using regular expressions.
locale: This sets the locale that should be used for the conversion of string values. By default, it is set to ROOT.
time_zone: This sets the time zone that should be used by range queries that are run on date based fields.
lenient: This can take the value of true or false. If set to true, format-based failures will be ignored. By default, it is set to false.
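
The two value forms of minimum_should_match described above can be illustrated with a small Python sketch. The function name required_should_matches is hypothetical and is not part of Elasticsearch; the sketch handles only a positive integer and a positive percentage, and it assumes that the number computed from a percentage is rounded down.

```python
def required_should_matches(spec, optional_clauses):
    """Return how many optional (should) clauses have to match.

    Hypothetical helper, not an Elasticsearch API. Only two forms are
    handled: a positive integer such as 2 and a percentage such as "50%".
    """
    if isinstance(spec, str) and spec.endswith("%"):
        percent = int(spec[:-1])
        # Assumption of this sketch: the computed number is rounded down.
        return optional_clauses * percent // 100
    return int(spec)


print(required_should_matches("50%", 4))  # 2
print(required_should_matches("50%", 3))  # 1
print(required_should_matches(2, 5))      # 2
```

With three generated clauses, 50% therefore requires only one of them to match in this model, which is worth keeping in mind when choosing between a percentage and an integer.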

Note that Elasticsearch can rewrite the query string query and, because of that, Elasticsearch allows us to pass additional parameters that control the rewrite method. For more details about this process, go to the Understanding the querying process section in this chapter.

Running the query string query against multiple fields

It is possible to run the query string query against multiple fields. In order to do that, one needs to provide the fields parameter in the query body, which should hold the array of the field names. There are two methods of running the query string query against multiple fields: the default method uses the Boolean query to make queries and the other method can use the dis_max query.

In order to use the dis_max query, one should add the use_dis_max property in the query body and set it to true. An example query will look like the following code:

{
  "query": {
    "query_string": {
      "query": "crime punishment",
      "fields": ["title", "otitle"],
      "use_dis_max": true
    }
  }
}

The simple query string query

The simple query string query uses one of the newest query parsers in Lucene, the SimpleQueryParser (https://lucene.apache.org/core/5_4_0/queryparser/org/apache/lucene/queryparser/simple/SimpleQueryParser.html). Similar to the query string query, it accepts Lucene query syntax as the query; however, unlike the query string query, it never throws an exception when a parsing error happens. Instead of throwing an exception, it discards the invalid parts of the query and runs the rest.

An example simple query string query will look like the following code:

{
  "query": {
    "simple_query_string": {
      "query": "crime punishment",
      "default_operator": "or"
    }
  }
}

The query supports parameters such as query, fields, default_operator, analyzer, lowercase_expanded_terms, locale, lenient, and minimum_should_match, and can also be run against multiple fields using the fields property.

The identifiers query

This is a simple query that filters the returned documents to only those with the provided identifiers. It works on the internal _uid field, so it doesn’t require the _id field to be enabled. The simplest version of such a query will look like the following:

{
  "query": {
    "ids": {
      "values": ["1", "2", "3"]
    }
  }
}

This query will only return those documents that have one of the identifiers present in the values array. We can complicate the identifiers query a bit and also limit the documents on the basis of their type. For example, if we want to only include documents of the book type, we will send the following query:

{
  "query": {
    "ids": {
      "type": "book",
      "values": ["1", "2", "3"]
    }
  }
}

As you can see, we’ve added the type property to our query and we’ve set its value to the type we are interested in.

The prefix query

This query is similar to the term query in its configuration and to the multi term query when looking into its logic. The prefix query allows us to match documents that have a value in a certain field that starts with a given prefix. For example, if we want to find all the documents that have values starting with cri in the title field, we will run the following query:

{
  "query": {
    "prefix": {
      "title": "cri"
    }
  }
}

Similar to the term query, you can also include the boost attribute in your prefix query, which will affect the importance of the given prefix. For example, if we would like to change our previous query and give it a boost of 3.0, we will send the following query:

{
  "query": {
    "prefix": {
      "title": {
        "value": "cri",
        "boost": 3.0
      }
    }
  }
}

Note

Note that the prefix query is rewritten by Elasticsearch and, because of that, Elasticsearch allows us to pass an additional parameter that controls the rewrite method. For more details about that process, refer to the Understanding the querying process section in this chapter.

The fuzzy query

The fuzzy query allows us to find documents that have values similar to the ones we’ve provided in the query. The similarity of terms is calculated on the basis of the edit distance algorithm. The edit distance is calculated between the terms we provide in the query and the terms in the searched documents. This query can be expensive when it comes to CPU resources, but can help us when we need fuzzy matching; for example, when users make spelling mistakes. In our example, let’s assume that instead of crime, our user enters the crme word into the search box and we would like to run the simplest form of fuzzy query. Such a query will look like this:

{
  "query": {
    "fuzzy": {
      "title": "crme"
    }
  }
}

The response for such a query will be as follows:

{
  "took": 81,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5,
    "hits": [{
      "_index": "library",
      "_type": "book",
      "_id": "4",
      "_score": 0.5,
      "_source": {
        "title": "Crime and Punishment",
        "otitle": "Преступлéние и наказáние",
        "author": "Fyodor Dostoevsky",
        "year": 1886,
        "characters": ["Raskolnikov", "Sofia Semyonovna Marmeladova"],
        "tags": [],
        "copies": 0,
        "available": true
      }
    }]
  }
}

Even though we made a typo, Elasticsearch managed to find the document we were interested in.
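
To see why crme is close enough to crime, the edit distance mentioned above can be sketched in a few lines of Python. This is a plain Levenshtein distance (insertions, deletions, and substitutions) written for illustration only; the function name edit_distance is hypothetical, and the implementation used by Lucene may differ, for example by also treating a swap of two adjacent characters as a single edit.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings (illustrative only)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(edit_distance("crme", "crime"))  # 1
```

A single inserted character turns crme into crime, so the two terms are one edit apart.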

We can control the fuzzy query behavior by using the following parameters:

value: This specifies the actual query.
boost: This specifies the boost value for the query. It defaults to 1.0.
fuzziness: This controls the behavior of the generated fuzzy query. Refer to the match query description for more information.
prefix_length: This is the length of the common prefix, that is, the number of initial characters that have to be the same in the compared terms. It defaults to 0.
max_expansions: This specifies the maximum number of terms the query will be expanded to. It defaults to 50.

The parameters should be wrapped in the name of the field we are running the query against. So if we would like to modify the previous query and add additional parameters, the query will look like the following code:

{
  "query": {
    "fuzzy": {
      "title": {
        "value": "crme",
        "fuzziness": 2
      }
    }
  }
}

The wildcard query

The wildcard query allows us to use the * and ? wildcards in the values we search for. Apart from that, the wildcard query is very similar to the term query when it comes to its body. To send a query that would match all the documents with a term matching cr?me (? matching any single character), we would send the following query:

{
  "query": {
    "wildcard": {
      "title": "cr?me"
    }
  }
}

It will match the documents that have a term matching cr?me in the title field. However, you can also include the boost attribute in your wildcard query, which will affect the importance of each term that matches the given value. For example, if we would like to change our previous query and give our wildcard query a boost of 20.0, we will send the following query:

{
  "query": {
    "wildcard": {
      "title": {
        "value": "cr?me",
        "boost": 20.0
      }
    }
  }
}

Note

Note that wildcard queries are not very performance oriented queries and should be avoided if possible; especially avoid leading wildcards (terms starting with wildcards). The wildcard query is rewritten by Elasticsearch and, because of that, Elasticsearch allows us to pass an additional parameter that controls the rewrite method. For more details about this process, refer to the Understanding the querying process section in this chapter. Also remember that the wildcard query is not analyzed.
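
The behavior of the two wildcards can be tried out locally with Python’s fnmatch module, which uses the same meaning for * and ?. This is only an approximation for illustration: wildcard_matches is a hypothetical helper, the matching in Elasticsearch is done by Lucene against the terms stored in the index, and fnmatch additionally understands [...] character sets.

```python
from fnmatch import fnmatchcase


def wildcard_matches(term, pattern):
    """Check a single indexed term against a pattern with * and ? (illustrative)."""
    return fnmatchcase(term, pattern)


print(wildcard_matches("crime", "cr?me"))   # True: ? stands for "i"
print(wildcard_matches("crme", "cr?me"))    # False: ? needs exactly one character
print(wildcard_matches("crimes", "cr*"))    # True: * stands for any run of characters
```

Because the pattern is compared with whole terms and is not analyzed, the case and form of the pattern have to match what was actually indexed.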

The range query

The range query allows us to find documents that have a field value within a certain range. It works for numerical fields as well as for string-based fields and date based fields (it just maps to a different Apache Lucene query). The range query should be run against a single field and the query parameters should be wrapped in the field name. The following parameters are supported:

gte: The query will match documents with the value greater than or equal to the one provided with this parameter
gt: The query will match documents with the value greater than the one provided with this parameter
lte: The query will match documents with the value lower than or equal to the one provided with this parameter
lt: The query will match documents with the value lower than the one provided with this parameter

So for example, if we want to find all the books that have the value from 1700 to 1900 in the year field, we will run the following query:

{
  "query": {
    "range": {
      "year": {
        "gte": 1700,
        "lte": 1900
      }
    }
  }
}

Regular expression query

The regular expression query allows us to use regular expressions as the query text. Remember that the performance of such queries depends on the chosen regular expression. If our regular expression matches many terms, the query will be slow. The general rule is that the more terms matched by the regular expression, the slower the query will be.

An example regular expression query looks like this:

{
  "query": {
    "regexp": {
      "title": {
        "value": "cr.m[ae]",
        "boost": 10.0
      }
    }
  }
}

The preceding query will result in Elasticsearch rewriting the query. The number of term queries in the rewritten query depends on how many terms in our index match the given regular expression. The boost parameter seen in the query specifies the boost value for the generated queries.

The full regular expression syntax accepted by Elasticsearch can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html#regexp-syntax.
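
The rewrite described above can be pictured as selecting the indexed terms that the expression matches as a whole. The following Python sketch does that with the re module; the list of terms is made up for illustration, matching_terms is a hypothetical helper, and Python’s regular expression syntax is not identical to the one accepted by Elasticsearch, although this simple pattern means the same in both.

```python
import re


def matching_terms(pattern, index_terms):
    """Return the terms that the whole pattern matches (illustrative rewrite)."""
    compiled = re.compile(pattern)
    return [term for term in index_terms if compiled.fullmatch(term)]


# Hypothetical terms of the title field; not taken from the example index.
terms = ["crime", "crima", "crimes", "cream", "punishment"]
print(matching_terms("cr.m[ae]", terms))  # ['crime', 'crima']
```

Each term in the returned list would become one generated term query, which is why an expression that matches many terms makes the query slower.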

The more like this query

One of the queries that got a major rework in Elasticsearch 2.0, the more like this query allows us to retrieve documents that are similar (or not similar) to the provided text or to the documents that were provided.

Elasticsearch supports a few parameters to define how the more like this query should work:

fields: An array of fields that the query should be run against. It defaults to the _all field.
like: This parameter comes in two flavors: it allows us to provide a text which the returned documents should be similar to, or an array of documents that the returned documents should be similar to.
unlike: This is similar to the like parameter, but it allows us to define text or documents that the returned documents should not be similar to.
min_term_freq: The minimum term frequency (for the terms in the documents) below which terms will be ignored. It defaults to 2.
max_query_terms: The maximum number of terms that will be included in any generated query. It defaults to 25. A higher value may mean higher precision, but lower performance.
stop_words: An array of words that will be ignored when comparing documents and the query. It is empty by default.
min_doc_freq: The minimum number of documents in which the term has to be present in order not to be ignored. It defaults to 5, which means that a term needs to be present in at least five documents.
max_doc_freq: The maximum number of documents in which the term may be present in order not to be ignored. By default, it is unbounded (set to 0).
min_word_len: The minimum length of a single word below which a word will be ignored. It defaults to 0.
max_word_len: The maximum length of a single word above which it will be ignored. It defaults to unbounded (which means setting the value to 0).
boost_terms: The boost value that will be used when boosting each term. It defaults to 0.
boost: The boost value that will be used when boosting the query. It defaults to 1.
include: This specifies if the input documents should be included in the results returned by the query. It defaults to false, which means that the input documents won’t be included.
minimum_should_match: This controls the number of terms that need to be matched in the resulting documents. By default, it is set to 30%.
analyzer: The name of the analyzer that will be used to analyze the text we provided.

An example of a more like this query looks like this:

{
  "query": {
    "more_like_this": {
      "fields": ["title", "otitle"],
      "like": "crime and punishment",
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}

As we said earlier, the like property can also be used to show which documents the results should be similar to. For example, the following is the query that will use the like property to point to a given document (note that the following query won’t return documents on our example data):

{
  "query": {
    "more_like_this": {
      "fields": ["title", "otitle"],
      "min_term_freq": 1,
      "min_doc_freq": 1,
      "like": [
        {
          "_index": "library",
          "_type": "book",
          "_id": "4"
        }
      ]
    }
  }
}

We can also mix the documents and text together:

{
  "query": {
    "more_like_this": {
      "fields": ["title", "otitle"],
      "min_term_freq": 1,
      "min_doc_freq": 1,
      "like": [
        {
          "_index": "library",
          "_type": "book",
          "_id": "4"
        },
        "crime and punishment"
      ]
    }
  }
}

Compound queries

In the Basic queries section of this chapter, we discussed the simplest queries exposed by Elasticsearch. We also talked about the position aware queries called span queries in the Span queries section. However, the simple ones and the span queries are not the only queries that Elasticsearch provides. The compound queries, as we call them, allow us to connect multiple queries together or alter the behavior of other queries. You may wonder if you need such functionality. Your deployment may not need it, but anything apart from a simple query will probably require compound queries. For example, combining a simple term query with a match_phrase query to get better search results may be a good candidate for compound queries usage.

The bool query

The bool query allows us to wrap a virtually unbounded number of queries and connect them with a logical value using one of the following sections:

should: The query wrapped into this section may or may not match. The number of should sections that have to match is controlled by the minimum_should_match parameter.
must: The query wrapped into this section must match in order for the document to be returned.
must_not: The query wrapped into this section must not match in order for the document to be returned.

Each of the preceding sections can be present multiple times in a single bool query. This allows us to build very complex queries that have multiple levels of nesting (you can include the bool query in another bool query). Remember that the score of the resulting document will be calculated by taking the sum of the scores of all the wrapped queries that the document matched.

In addition to the preceding sections, we can add the following parameters to the query body to control its behavior:

filter: This allows us to specify the part of the query that should be used as a filter. You can read more about filters in the Filtering your results section in Chapter 4, Extending Your Querying Knowledge.
boost: This specifies the boost used in the query, defaulting to 1.0. The higher the boost, the higher the score of the matching document.
minimum_should_match: This describes the minimum number of should clauses that have to match in order for the checked document to be counted as a match. For example, it can be an integer value such as 2 or a percentage value such as 75%. For more information, refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-minimum-should-match.html.
disable_coord: A Boolean parameter (defaults to false), which allows us to enable or disable the score factor computation that is based on the fraction of all the query terms that a document contains. We should set it to true for less precise scoring, but slightly faster queries.

Imagine that we want to find all the documents that have the term crime in the title field. In addition, the documents may or may not have a value from 1900 to 2000 in the year field, and must not have the term nothing in the otitle field. Such a query made with the bool query will look as follows:

{
  "query": {
    "bool": {
      "must": {
        "term": {
          "title": "crime"
        }
      },
      "should": {
        "range": {
          "year": {
            "from": 1900,
            "to": 2000
          }
        }
      },
      "must_not": {
        "term": {
          "otitle": "nothing"
        }
      }
    }
  }
}

Note

Note that the must, should, and must_not sections can contain a single query or an array of queries.
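
How the three sections combine can be modeled with a short Python sketch. Everything in it is a simplification made up for illustration: the helpers term, in_range, and bool_score are hypothetical, every matching clause is assumed to score 1.0, the sample documents are invented, and the coord factor and boost are left out. It only shows which documents are hits and that the scores of the matched clauses are summed.

```python
def term(field, value):
    """Clause matching documents whose field contains the given lowercase word."""
    def clause(doc):
        words = str(doc.get(field, "")).lower().split()
        return 1.0 if value in words else None
    return clause


def in_range(field, low, high):
    """Clause matching documents whose numeric field lies between low and high."""
    def clause(doc):
        return 1.0 if field in doc and low <= doc[field] <= high else None
    return clause


def bool_score(doc, must=(), should=(), must_not=(), minimum_should_match=0):
    """Return the summed score of the matched clauses, or None if doc is not a hit."""
    if any(clause(doc) is not None for clause in must_not):
        return None
    must_scores = [clause(doc) for clause in must]
    if any(score is None for score in must_scores):
        return None
    should_scores = [score for score in (clause(doc) for clause in should)
                     if score is not None]
    if len(should_scores) < minimum_should_match:
        return None
    return sum(must_scores) + sum(should_scores)


query = dict(
    must=[term("title", "crime")],
    should=[in_range("year", 1900, 2000)],
    must_not=[term("otitle", "nothing")],
)
print(bool_score({"title": "Crime and Punishment", "year": 1886}, **query))  # 1.0
print(bool_score({"title": "Crime Stories", "year": 1950}, **query))         # 2.0
print(bool_score({"title": "Crime", "otitle": "Nothing"}, **query))          # None
```

The first document is returned because the should clause is optional, the second one scores higher because the optional clause also matched, and the third one is dropped by the must_not section.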

The dis_max query

The dis_max query is very useful as it generates a union of documents returned by all the subqueries and returns it as the result. The good thing about this query is the fact that we can control how the lower scoring subqueries affect the final score of the documents. For the dis_max query, we specify the queries using the queries property (a query or an array of queries) and the tie breaker with the tie_breaker property. We can also include additional boost by specifying the boost parameter.

The final document score is calculated as the score of the maximum scoring query plus the sum of the scores returned from the rest of the queries multiplied by the value of the tie_breaker parameter. So, the tie_breaker parameter allows us to control how the lower scoring queries affect the final score. If we set the tie_breaker parameter to 1.0, we get the exact sum, while setting the tie_breaker parameter to 0.1 results in only 10 percent of the scores (of all the scores apart from the maximum scoring query) being added to the final score.

An example of the dis_max query is as follows:

{
  "query": {
    "dis_max": {
      "tie_breaker": 0.99,
      "boost": 10.0,
      "queries": [
        {
          "match": {
            "title": "crime"
          }
        },
        {
          "match": {
            "author": "fyodor"
          }
        }
      ]
    }
  }
}

As you can see, we included the tie_breaker and boost parameters. In addition to that, we specified the queries parameter that holds the array of queries that will be run and used to generate the union of documents for results.
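
The scoring rule of the dis_max query, the best subquery score plus the tie breaker times the remaining scores, can be written down as a small Python function. The name dis_max_score and the sample scores are hypothetical; the sketch leaves the boost parameter out and uses None for a subquery that did not match the document.

```python
def dis_max_score(sub_scores, tie_breaker=0.0):
    """Combine per-subquery scores of one document the dis_max way (sketch)."""
    matched = [score for score in sub_scores if score is not None]
    if not matched:
        return None  # no subquery matched, so the document is not a hit
    best = max(matched)
    others = sum(matched) - best
    return best + tie_breaker * others


print(dis_max_score([2.0, 1.0], tie_breaker=1.0))  # 3.0, the exact sum
print(dis_max_score([2.0, 1.0], tie_breaker=0.0))  # 2.0, only the best subquery counts
```

With a tie breaker of 0.99, as in the preceding query, a document matching both subqueries gets almost the full sum of their scores.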

The boosting query

The boosting query wraps around two queries and lowers the score of the documents returned by one of the queries. There are three sections of the boosting query that need to be defined: the positive section that holds the query whose document score will be left unchanged, the negative section whose resulting documents will have their score lowered, and the negative_boost section that holds the boost value that will be used to lower the second section’s query score. The advantage of the boosting query is that the documents matching the positive query stay in the results even when they also match the negative query; they are only scored lower. For comparison, if we were to use the bool query with the must_not section, such documents wouldn’t be returned at all.

Let’s assume that we want to have the results of a simple term query for the term crime in the title field and want the score of such documents to not be changed. However, for those of the documents that also have a value from 1800 to 1900 in the year field, we want the score to be lowered by applying a boost of 0.5. Such a query will look like the following:

{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "title": "crime"
        }
      },
      "negative": {
        "range": {
          "year": {
            "from": 1800,
            "to": 1900
          }
        }
      },
      "negative_boost": 0.5
    }
  }
}
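
The effect of negative_boost on a single document can be sketched as follows. The function boosting_score is hypothetical and the scores are made-up numbers; the sketch assumes that a document has to match the positive query to be a hit and that its score is multiplied by negative_boost when it also matches the negative query.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Score of one document under a boosting query (simplified sketch).

    positive_score is None when the positive query does not match.
    """
    if positive_score is None:
        return None  # assumption: only positive matches are returned
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


print(boosting_score(2.0, matches_negative=True, negative_boost=0.5))   # 1.0
print(boosting_score(2.0, matches_negative=False, negative_boost=0.5))  # 2.0
```

A book with crime in the title that was published between 1800 and 1900 is therefore still returned, only with half of the score it would otherwise get.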

The constant_score query

The constant_score query wraps another query and returns a constant score for each document returned by the wrapped query. We specify the score that should be given to the documents by using the boost property, which defaults to 1.0. It allows us to strictly control the score value assigned for a document matched by a query. For example, if we want to have a score of 2.0 for all the documents that have the term crime in the title field, we send the following query to Elasticsearch:

{
  "query": {
    "constant_score": {
      "query": {
        "term": {
          "title": "crime"
        }
      },
      "boost": 2.0
    }
  }
}

The indices query

The indices query is useful when executing a query against multiple indices. It allows us to provide an array of indices (the indices property) and two queries, one that will be executed if we query an index from the list (the query property) and a second one that will be executed on all the other indices (the no_match_query property). For example, assume we have an alias named books, holding two indices: library and users. What we want to do is use this alias. However, we want to run different queries depending on which index is used for searching. An example query following this logic will look as follows:

{
  "query": {
    "indices": {
      "indices": ["library"],
      "query": {
        "term": {
          "title": "crime"
        }
      },
      "no_match_query": {
        "term": {
          "user": "crime"
        }
      }
    }
  }
}

In the preceding query, the query described in the query property was run against the library index and the query defined in the no_match_query section was run against all the other indices present in the cluster, which for our hypothetical alias means the users index.

The no_match_query property can also have a string value instead of a query. This string value can either be all or none, and it defaults to all. If the no_match_query property is set to all, all the documents from the indices that are not listed in the indices property will be returned. Setting the no_match_query property to none will result in no documents being returned from those indices.

Using span queries

Elasticsearch leverages Lucene span queries, which allow us to make queries when some tokens or phrases are near other tokens or phrases. Basically, we can call them position aware queries. When using the standard non span queries, we are not able to make queries that are position aware; to some extent, the phrase queries allow that, but only to some extent. So, for Elasticsearch and the underlying Lucene, it doesn’t matter if the term is in the beginning of the sentence or at the end or near another term. When using span queries, it does matter.

The following span queries are exposed in Elasticsearch:

span term query
span first query
span near query
span or query
span not query
span within query
span containing query
span multi query

Before we continue with the description, let’s index a document to a completely new index that we will use to show how span queries work. To do this, we use the following command:

curl -XPUT 'localhost:9200/spans/book/1' -d '{
  "title": "Test book",
  "author": "Test author",
  "description": "The world breaks everyone, and afterward, some are strong at the broken places"
}'

A span

A span, in our context, is a starting and ending token position in a field. For example, in our case, the world breaks everyone could be a single span, and world can be a single span too. As you may know, during analysis, Lucene, in addition to the token itself, includes some additional parameters, such as the position in the token stream. Position information combined with the terms allows us to construct spans using Elasticsearch span queries (which are mapped to Lucene span queries). In the next few pages, we will learn how to construct spans using different span queries and how to control which documents are matched.

Span term query

The span_term query is a building block for the other span queries. A span_term query is similar to the already discussed term query. On its own, it works just like the mentioned term query: it matches a term. Its definition is simple and looks as follows (we omitted some parts of the query on purpose, because we will discuss them later):

{
  "query": {
    ...
    "span_term": {
      "description": {
        "value": "world",
        "boost": 5.0
      }
    }
  }
}

As you can see, it is very similar to the standard term query. The preceding query is run against the description field and we want to have the documents that have the world term returned. We also specified the boost, which is also allowed.

One thing to remember is that the span_term query, similar to the standard term query, is not analyzed.

Span first query

The span first query allows us to match documents that have matches only in the first positions of the field. In order to define a span first query, we need to nest inside of it any other span query; for example, a span term query we already know. So, let’s find the document that has the term world in the first two positions in the description field. We do that by sending the following query:

{
  "query": {
    "span_first": {
      "match": {
        "span_term": { "description": "world" }
      },
      "end": 2
    }
  }
}

In the results, we will get the document that we had indexed in the beginning of this section. In the match section of the span first query, we should include at least a single span query that should be matched at the maximum position specified by the end parameter.

So, to understand everything well, if we set the end parameter to 1, we shouldn’t get our document with the previous query. So, let’s check it by sending the following query:

{
  "query": {
    "span_first": {
      "match": {
        "span_term": { "description": "world" }
      },
      "end": 1
    }
  }
}

The response to the preceding query will be as follows:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

So it is working as expected. This is because the first term in our index will be the term the and not the term world which we searched for.
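
The role of the end parameter is easier to see with the token positions written out. The following Python sketch is a simplified model, not the Lucene implementation: tokenize is a hypothetical stand-in for the analyzer (lowercasing and splitting on non-letters), positions are counted from 0, and a single-term span at position p is assumed to end at p + 1.

```python
import re

DESCRIPTION = ("The world breaks everyone, and afterward, "
               "some are strong at the broken places")


def tokenize(text):
    """Lowercase the text and split it on non-letters (stand-in for the analyzer)."""
    return re.findall(r"[a-z]+", text.lower())


def span_first(tokens, term, end):
    """True if a span for `term` ends at or before position `end` (sketch)."""
    return any(token == term and position + 1 <= end
               for position, token in enumerate(tokens))


tokens = tokenize(DESCRIPTION)
print(tokens[:4])                      # ['the', 'world', 'breaks', 'everyone']
print(span_first(tokens, "world", 2))  # True
print(span_first(tokens, "world", 1))  # False
```

The term world sits at the second position, so it is found with end set to 2 and missed with end set to 1, which mirrors the two responses we received.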

Span near query

The span near query allows us to match documents that have other spans near each other, and we can call this query a compound query as it wraps other span queries. For example, if we want to find documents that have the term world near the term everyone, we will run the following query:

{
  "query": {
    "span_near": {
      "clauses": [
        { "span_term": { "description": "world" } },
        { "span_term": { "description": "everyone" } }
      ],
      "slop": 0,
      "in_order": true
    }
  }
}

As you can see, we specify our queries in the clauses section of the span near query. It is an array of other span queries. The slop parameter defines the allowed number of terms between the spans. The in_order parameter can be used to limit the matches only to those documents that match our queries in the same order that they were defined in. So, in our case, we will get documents that have world everyone, but not everyone world in the description field.

So let’s get back to our query; right now it would return 0 results. If you look at our example document, you will notice that between the terms world and everyone, an additional term is present and we set the slop parameter to 0 (slop was discussed during the phrase query description). If we increase it to 1, we will get our result. To test it, let’s send the following query:

{
  "query": {
    "span_near": {
      "clauses": [
        { "span_term": { "description": "world" } },
        { "span_term": { "description": "everyone" } }
      ],
      "slop": 1,
      "in_order": true
    }
  }
}

The results returned by Elasticsearch are as follows:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.10848885,
    "hits": [{
      "_index": "spans",
      "_type": "book",
      "_id": "1",
      "_score": 0.10848885,
      "_source": {
        "title": "Test book",
        "author": "Test author",
        "description": "The world breaks everyone, and afterward, some are strong at the broken places"
      }
    }]
  }
}

As we can see, the altered query successfully returned our indexed document.
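
The interplay of slop and in_order can be modeled for two single-term clauses with a few lines of Python. The function span_near below is a hypothetical simplification that works on a list of already analyzed tokens and counts the positions lying strictly between the two terms; real span queries can wrap arbitrary spans, not just single terms.

```python
def span_near(tokens, first, second, slop, in_order=True):
    """True if both terms occur with at most `slop` positions between them."""
    first_positions = [i for i, token in enumerate(tokens) if token == first]
    second_positions = [i for i, token in enumerate(tokens) if token == second]
    for a in first_positions:
        for b in second_positions:
            if a == b or (in_order and b < a):
                continue
            # Number of positions strictly between the two terms.
            if abs(b - a) - 1 <= slop:
                return True
    return False


tokens = ("the world breaks everyone and afterward some are strong "
          "at the broken places").split()
print(span_near(tokens, "world", "everyone", slop=0))  # False
print(span_near(tokens, "world", "everyone", slop=1))  # True
print(span_near(tokens, "everyone", "world", slop=1))  # False, wrong order
```

The single term breaks between world and everyone is exactly what the slop of 1 makes room for.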

Span or query

The span or query allows us to wrap other span queries and aggregate matches of all those that we’ve wrapped. Similar to the span_near query, the span_or query uses the array of clauses to specify other span queries. For example, if we want to get the documents that have the term world in the first two positions of the description field, or the ones that have the term world not further than a single position from the term everyone, we will send the following query to Elasticsearch:

{
  "query": {
    "span_or": {
      "clauses": [
        {
          "span_first": {
            "match": {
              "span_term": { "description": "world" }
            },
            "end": 2
          }
        },
        {
          "span_near": {
            "clauses": [
              { "span_term": { "description": "world" } },
              { "span_term": { "description": "everyone" } }
            ],
            "slop": 1,
            "in_order": true
          }
        }
      ]
    }
  }
}

The preceding query will return our indexed document.

SpannotqueryThespannotqueryallowsustospecifytwosectionsofqueries.Thefirstistheincludesectionwhichspecifieswhichspanqueriesshouldbematchedandthesecondsectionistheexcludeonewhichspecifiesthespanquerieswhichshouldn’tbeoverlappingthefirstones.Tokeepitsimple,ifaqueryfromtheexcludeonematchesthesamespan(orapartofit)asthequeryfromtheincludesection,suchadocumentwon’tbereturnedasamatchforsuchaspannotquery.Eachofthesesectionscancontainmultiplespanqueries.

So, to illustrate that query, let’s make a query that will return all the documents that have a span constructed from a single term, the term breaks in the description field. Let’s also exclude the documents that have a span which matches the terms world and everyone at a maximum of a single position from each other, when such a span overlaps the one defined in the first span query.

{

"query":{

"span_not":{

"include":{

"span_term":{"description":"breaks"}

},

"exclude":{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"everyone"}}

],

"slop":1

}

}

}

}

}

The following is the result:

{

"took":1,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":0,

"max_score":null,

"hits":[]

}

}

As you may have noticed, the result of the query is as we expected. Our document wasn’t found because the span query from the exclude section was overlapping the span from the include section.
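
The overlap rule can be sketched as follows. This is a simplified model under stated assumptions: spans are half-open (start, end) pairs of token positions, one token per word, and the helper names are hypothetical rather than part of any Elasticsearch API.

```python
def overlaps(a, b):
    # Half-open spans (start, end) overlap when each starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]

def span_not(include, exclude):
    # Keep include spans that do not overlap any exclude span.
    return [s for s in include if not any(overlaps(s, e) for e in exclude)]

# Positions assume one token per word: the(0) world(1) breaks(2) everyone(3) ...
include = [(2, 3)]   # span_term "breaks"
exclude = [(1, 4)]   # span_near "world" ... "everyone" with slop 1
remaining = span_not(include, exclude)
print(remaining)  # no spans remain, so the document is not a hit
```

Because the term breaks sits between world and everyone, its span falls inside the excluded one and nothing is left to match.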


Span within query

The span_within query allows us to find documents that have a span enclosed in another span. We define two sections in the span_within query: little and big. The little section defines a span query that needs to be enclosed by the span query defined using the big section.

For example, if we would like to find a document that has the term world near the term breaks, and those terms should be inside a span that is bound by the terms world and afterward not more than 10 terms from each other, the query that does that will look as follows:

{

"query":{

"span_within":{

"little":{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"breaks"}}

],

"slop":0,

"in_order":false

}

},

"big":{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"afterward"}}

],

"slop":10,

"in_order":false

}

}

}

}

}


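Span containing query

The span_containing query can be seen as the opposite of the span_within query we just discussed. It allows us to match spans that enclose other spans. Again, we use two sections with the span queries: little and big. The little section defines a span query that needs to be enclosed by the span query defined using the big section. The difference lies in what is returned: span_within returns the matching spans from the little section, while span_containing returns the matching spans from the big section.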

We can use the same example. If we would like to find a document that has the term world near the term breaks, and those terms should be inside a span that is bound by the terms world and afterward not more than 10 terms from each other, the query that does that will look as follows:

{

"query":{

"span_containing":{

"little":{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"breaks"}}

],

"slop":0,

"in_order":false

}

},

"big":{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"afterward"}}

],

"slop":10,

"in_order":false

}

}

}

}

}
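
The relationship between span_within and span_containing can be sketched with plain position pairs. This is an illustrative model only: spans are half-open (start, end) pairs, the positions assume one token per word in the example description, and the function names are hypothetical.

```python
def encloses(big, little):
    # True when the half-open span `little` lies entirely inside `big`.
    return big[0] <= little[0] and little[1] <= big[1]

def span_within(little_spans, big_spans):
    # Returns the little spans that are enclosed by some big span.
    return [l for l in little_spans if any(encloses(b, l) for b in big_spans)]

def span_containing(little_spans, big_spans):
    # Returns the big spans that enclose some little span.
    return [b for b in big_spans if any(encloses(b, l) for l in little_spans)]

# Positions assume one token per word: the(0) world(1) breaks(2) ... afterward(5)
little = [(1, 3)]  # "world" next to "breaks", slop 0
big = [(1, 6)]     # "world" ... "afterward", slop 10
print(span_within(little, big))      # spans taken from the little clause
print(span_containing(little, big))  # spans taken from the big clause
```

The same document matches both queries; what differs is which span is reported, which matters when the result is wrapped in yet another span query.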


Span multi query

The last type of span query that Elasticsearch supports is the span_multi query. It allows us to wrap any multi term query that we’ve discussed (the range query, the wildcard query, the regexp query, the fuzzy query, or the prefix query) as a span query.

For example, if we want to find documents that have a term starting with the prefix wor in the description field, we can do that by sending the following query:

{

"query":{

"span_multi":{

"match":{

"prefix":{

"description":{"value":"wor"}

}

}

}

}

}

There is one thing to remember: the multi term query that we want to use needs to be enclosed in the match section of the span_multi query.
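
Because span_multi produces an ordinary span query, it can be nested inside other span queries. As a sketch, the following Python builds a request body that restricts the prefix match to the first two positions of the field by wrapping it in span_first. The helper function names are hypothetical; only the JSON keys come from the queries shown in this chapter.

```python
import json

def span_multi_prefix(field, prefix):
    # span_multi requires the multi term query to sit in its "match" section.
    return {"span_multi": {"match": {"prefix": {field: {"value": prefix}}}}}

def span_first(span_query, end):
    # span_first accepts any span query in "match", including span_multi.
    return {"span_first": {"match": span_query, "end": end}}

body = {"query": span_first(span_multi_prefix("description", "wor"), 2)}
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the request body of a search, in the same way as the other examples.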


Performance considerations

A few words at the end of discussing span queries. Remember that they are costlier when it comes to processing power, because not only do the terms have to be matched, but positions also have to be calculated and checked. This means that Lucene, and thus Elasticsearch, will need more CPU cycles to calculate all the needed information to find matching documents. You can expect span queries to be slower than the queries that don’t take positions into account.


Choosing the right query

By now we’ve seen what queries are available in Elasticsearch, both the simple ones and the ones that can group other queries as well. Before continuing with more complicated topics, we would like to discuss which of the queries should be used for which use case. Of course, one could dedicate a whole book to showing different query use cases, so we will only show a few of them to help you see what you can expect and which query to use.


The use cases

As you already know which queries can be used to find which data, what we would like to show you are example use cases using the data we indexed in Chapter 2, Indexing Your Data. To do this, we will start with a few guidelines on how to choose the query and then we will show you example use cases and discuss why those queries could be used.

Limiting results to given tags

One of the simplest examples of querying Elasticsearch is the search for exact terms. By exact we mean a character-to-character comparison with a term that is indexed and written into the Lucene inverted index. To run such a query, we can use the term query provided by Elasticsearch, because its content is not analyzed by Elasticsearch. For example, let’s assume that we would like to search for all the books with the value novel in the tags field, which, as we know from the mappings, is not analyzed. To do that, we would run the following command:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"term":{

"tags":"novel"

}

}

}'


Searching for values in a range

One of the simplest queries that can be run is a query matching documents in a given range of values. Usually such queries are a part of a larger query or a filter. For example, a query that would return books with the number of copies from 1 to 3 inclusive would look as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"range":{

"copies":{

"gte":1,

"lte":3

}

}

}

}'

Boosting some of the matched documents

There are many common examples of using the bool query, for example, very simple ones like finding documents having a list of terms. What we would like to show you is how to use the bool query to boost some of the documents. For example, if we want to find all the documents that have one or more copies and boost the ones that were published after 1950, we will run the following query:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"bool":{

"must":[

{

"range":{

"copies":{

"gte":1

}

}

}

],

"should":[

{

"range":{

"year":{

"gt":1950

}

}

}

]

}

}

}'
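
The division of labor between must and should can be illustrated with a toy model. This is a deliberately simplified sketch: the documents are hypothetical sample data rather than the book's data set, and the scoring (one point per matching clause) is only a stand-in for Lucene's real scoring formula.

```python
# Hypothetical sample documents; not the book's actual data set.
books = [
    {"title": "A", "copies": 0, "year": 1960},
    {"title": "B", "copies": 2, "year": 1929},
    {"title": "C", "copies": 1, "year": 1961},
]

def bool_search(docs, must, should):
    hits = []
    for doc in docs:
        if not all(clause(doc) for clause in must):
            continue  # must clauses decide whether the document is returned
        # Toy scoring: one point per matching clause (Lucene's real scores differ).
        score = len(must) + sum(1 for clause in should if clause(doc))
        hits.append((doc["title"], score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

results = bool_search(
    books,
    must=[lambda d: d["copies"] >= 1],
    should=[lambda d: d["year"] > 1950],
)
print(results)
```

Document A is dropped because it fails the must clause, while B is still returned even though it misses the should clause; it simply ranks below C.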

Ignoring lower scoring partial queries

The dis_max query, as we discussed, allows us to control how influential the lower scoring partial queries are. For example, if we would only want to assign the score of the highest scoring partial query for the documents matching crime punishment in the title field or raskolnikov in the characters field, we would run the following query:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"fields":["_id","_score"],

"query":{

"dis_max":{

"tie_breaker":0.0,

"queries":[

{

"match":{

"title":"crime punishment"

}

},

{

"match":{

"characters":"raskolnikov"

}

}

]

}

}

}'

The result for the preceding query will look as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.70710677,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.70710677

}]

}

}

Now let’s see the score of the partial queries alone. To do that, we will run the partial queries using the following commands:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"fields":["_id","_score"],

"query":{

"match":{

"title":"crime punishment"

}

}

}'

The response for the preceding query is as follows:

{

"took":4,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.70710677,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.70710677

}]

}

}

The following is the next command:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"fields":["_id","_score"],

"query":{

"match":{

"characters":"raskolnikov"

}

}

}'

The response is as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.5

}]

}

}


As you can see, the score of the document returned by our dis_max query is equal to the score of the highest scoring partial query (the first partial query). That is because we set the tie_breaker property to 0.0.
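
The effect of tie_breaker can be written down as a small formula: the final score is the best partial score plus tie_breaker times the sum of the remaining partial scores. The following Python sketch applies it to the two partial scores from the responses above; the function name is hypothetical.

```python
def dis_max_score(partial_scores, tie_breaker=0.0):
    # Best partial score plus tie_breaker times the sum of the other scores.
    best = max(partial_scores)
    return best + tie_breaker * (sum(partial_scores) - best)

scores = [0.70710677, 0.5]  # partial scores taken from the responses above
print(dis_max_score(scores, 0.0))
print(dis_max_score(scores, 0.5))
```

With tie_breaker set to 0.0 only the best partial query counts; raising it lets the lower scoring partial query contribute a share of its score.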

Using Lucene query syntax in queries

Having a simple search syntax is very useful for users and we already have one: the Lucene query syntax. Using the query_string query is an example where we can leverage that by allowing the users to type in queries with additional control characters. For example, if we would like to find books having the terms crime and punishment in their title and the fyodor dostoevsky phrase in the author field, and not being published between 2000 (exclusive) and 2015 (inclusive), we would use the following command:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"query_string":{

"query":"+title:crime +title:punishment +author:\"fyodor dostoevsky\" -year:{2000 TO 2015]"

}

}

}'

As you can see, we used the Lucene query syntax to pass all the matching requirements and we let the query parser construct the appropriate query.

Handling user queries without errors

Using the query_string query is very handy, but it is not error tolerant. If our user provides incorrect Lucene syntax, the query will return an error. Because of that, Elasticsearch exposes a second query that supports analysis and a simplified version of the Lucene query syntax: the simple_query_string query. Using such a query allows us to run the user queries and not care about the parsing errors at all. For example, let’s look at the following query:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"query_string":{

"query":"+crime +punishment \"",

"default_field":"title"

}

}

}'

The response will contain:

{

"error":{

"root_cause":[{

"type":"query_parsing_exception",

"reason":"Failed to parse query [+crime +punishment \"]",

"index":"library",

"line":6,

"col":3

}],

"type":"search_phase_execution_exception",

"reason":"all shards failed",

"phase":"query",

"grouped":true,

"failed_shards":[{

"shard":0,

"index":"library",

"node":"7jznW07BRrqjG-aJ7iKeaQ",

"reason":{

"type":"query_parsing_exception",

"reason":"Failed to parse query [+crime +punishment \"]",

"index":"library",

"line":6,

"col":3,

"caused_by":{

"type":"parse_exception",

"reason":"Cannot parse '+crime +punishment \"': Lexical error at line 1, column 21. Encountered: <EOF> after : \"\"",

"caused_by":{

"type":"token_mgr_error",

"reason":"Lexical error at line 1, column 21. Encountered: <EOF> after : \"\""

}

}

}

}]

},

"status":400

}

This means that the query was not properly constructed and a parse error happened. That’s why the simple_query_string query was introduced. It uses a query parser that tries to handle user mistakes and tries to guess how the query should look. Our query using that parser will look as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"simple_query_string":{

"query":"+crime +punishment \"",

"fields":["title"]

}

}

}'

If you run the preceding query, you will see that the proper document is returned by Elasticsearch even though the query is not properly constructed.

Autocomplete using prefixes

A very common use case is to provide autocomplete functionality on the indexed data. As we know, the prefix query is not analyzed and works on the basis of the terms indexed in the field. So the actual functionality depends on which tokens are produced during indexing. For example, let’s assume that we would like to provide autocomplete functionality on any token in the title field and the user provided the wes prefix. A query that would match such a requirement looks as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"prefix":{

"title":"wes"

}

}

}'
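
Why the indexed tokens matter can be shown with a toy term dictionary. The sketch below assumes a sorted list of hypothetical lowercased tokens, as an analyzer might produce for titles, and a hypothetical helper that scans the terms sharing a prefix. It is not how Lucene stores terms internally, only a model of the behavior.

```python
from bisect import bisect_left

def prefix_terms(sorted_terms, prefix):
    # Terms sharing a prefix are contiguous in a sorted term dictionary.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches

# Hypothetical lowercased tokens, as an analyzer might produce for titles.
terms = sorted({"all", "quiet", "on", "the", "western", "front", "west"})
print(prefix_terms(terms, "wes"))  # lowercase prefix finds the indexed terms
print(prefix_terms(terms, "Wes"))  # the prefix itself is not lowercased
```

Since the prefix is compared with the indexed terms as typed, an application would usually normalize user input (for example, lowercase it) before building the prefix query.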

Finding terms similar to a given one

A very simple example is using the fuzzy query to find documents having a term similar to a given one. For example, if we want to find all the documents having a value similar to crimea, we will run the following query:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"fuzzy":{

"title":{

"value":"crimea",

"fuzziness":2,

"max_expansions":50

}

}

}

}'
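
The notion of similarity behind the fuzziness property is edit distance. The following sketch uses the classic Levenshtein distance as a simplification (Lucene's own matching is automaton based and may treat some edits differently) and checks hypothetical candidate terms against the value crimea with a fuzziness of 2.

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance, one row at a time.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def fuzzy_match(term, candidates, fuzziness):
    # Keep candidate terms within the allowed number of edits.
    return [c for c in candidates if edit_distance(term, c) <= fuzziness]

print(fuzzy_match("crimea", ["crime", "prime", "punishment"], 2))
```

The term crime is one edit away from crimea, so a title containing it falls within the allowed fuzziness.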

Matching phrases

The simplest position-aware query, the phrase query, allows us to find documents not with a single term but with terms positioned one after another, ones that form a phrase. For example, a query that would only match documents that have the westen nichts neues phrase in the otitle field would look as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"match_phrase":{

"otitle":"westen nichts neues"

}

}

}'

Spans, spans everywhere

The last use case we would like to discuss is a more complicated example of position-aware queries called span queries. Imagine that we would like to run a query to find documents that have the western front phrase not more than three positions after the term quiet, and all that just after the all term. This can be done with span queries and the following command shows how such a query will look:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"span_near":{

"clauses":[

{

"span_term":{


"title":"all"

}

},

{

"span_near":{

"clauses":[

{

"span_term":{

"title":"quiet"

}

},

{

"span_near":{

"clauses":[

{

"span_term":{

"title":"western"

}

},

{

"span_term":{

"title":"front"

}

}

],

"slop":0,

"in_order":true

}

}

],

"slop":3,

"in_order":true

}

}

],

"slop":0,

"in_order":true

}

}

}'
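
Deeply nested span queries are easier to read when assembled from small helpers. The following Python sketch builds the same request body from the inside out; the helper function names are hypothetical, and the output is only the JSON body, which still has to be sent to Elasticsearch.

```python
import json

def span_term(field, value):
    return {"span_term": {field: value}}

def span_near(clauses, slop, in_order=True):
    return {"span_near": {"clauses": clauses, "slop": slop, "in_order": in_order}}

# Innermost first: the phrase, then "quiet" within 3 positions, then "all" right before.
phrase = span_near([span_term("title", "western"), span_term("title", "front")], 0)
after_quiet = span_near([span_term("title", "quiet"), phrase], 3)
body = {"query": span_near([span_term("title", "all"), after_quiet], 0)}
print(json.dumps(body, indent=2))
```

Naming the intermediate pieces makes it clearer which slop value belongs to which level of the query.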

Note that the span queries are not analyzed. We can see that by looking at the response of the Explain API. To see that response, we should send the same request body (our query) to the /library/book/1/_explain REST endpoint. The interesting part of the output looks as follows:

"description":"weight(spanNear([title:all, spanNear([title:quiet, spanNear([title:western, title:front], 0, true)], 3, true)], 0, true) in 0) [PerFieldSimilarity], result of:",


Summary

This chapter has been all about the querying process. We started by looking at how to query Elasticsearch and what Elasticsearch does when it needs to handle the query. We also learned about the basic and compound queries, so we are now able to use both simple queries as well as the ones that group multiple small queries together. Finally, we discussed how to choose the right query for a given use case.

In the next chapter, we will extend our query knowledge. We will start with filtering our queries and move to highlighting possibilities and a way to validate our queries using the Elasticsearch API. We will discuss sorting of search results and query rewrite, which will show us what happens to some queries in Elasticsearch internals.


Chapter 4. Extending Your Querying Knowledge

In the previous chapter, we dived into Elasticsearch querying capabilities. We discussed how to query Elasticsearch in detail and we learned how Elasticsearch querying works. We now know the basic and compound queries of this great search engine and what the configuration options are for each query type. We also got to know when to use our queries and we discussed a few use cases and which queries can be used to handle them. This chapter is dedicated to extending our querying knowledge. By the end of this chapter, you will have learned the following topics:

What filtering is and how to use it

What highlighting is and how to use it

What the highlighter types are and what benefits they bring

How to validate your queries

How to sort your query results

What query rewrite is and how to control it


Filtering your results

In the previous chapter, we talked about various types of queries. The common part was that we always wanted to get the best results first. This is the main difference from the standard database approach, where every document either matches the query or not. In the database world, we do not ask how good the document is; our only interest lies in the results returned. When talking about full text search engines this is different: we are interested not only in the results, but also in their quality. The reason is obvious: we are searching in unstructured data, using text fields that use language analysis, stemming, and so on. Because of that, the initial results of our queries are, in most cases, far from optimal. This is why, when we talk about searching, we talk about precision and document recall.

On the other hand, sometimes we want to limit the whole set of documents to a chosen part. For example, in a library, we may want to search only the available books, the rest being unimportant. Sometimes the score, busily calculated for the given fields, only interferes with the overall score and has no meaning in terms of accuracy. In such cases, filters should be used to limit the results of the query, but not interfere with the calculated score.

Prior to Elasticsearch 2.0, filters were independent entities from queries. In practice, almost every query had its own counterpart in filters. There was the term query and the term filter, the bool query and the bool filter, the range query and the range filter, and so on. From the user point of view, the most important difference between the queries and the filters was scoring. The filter didn’t calculate score, which resulted in the filter being easily cached and more efficient. But this difference was very inconvenient for users. With the release of Elasticsearch 2.0 and its usage of Lucene 5.3, filter queries were deprecated along with some types of queries that allowed us to use filters. Let’s discuss how filtering works now and what we can do to achieve the same or better performance as before in Elasticsearch 2.0.


The context is the key

In Elasticsearch 2.0, queries can calculate the score or omit it by choosing a more efficient way of execution. This behavior, in many cases, is applied automatically based on the context where the query is used. This concerns the queries that include filter sections, which remove documents based on some criteria. These documents are unnecessary in the returned results and should be skipped as quickly as possible without affecting the overall score. Thanks to this, after discarding some documents we can focus only on the rest of the documents, calculating their scores, and sorting them before returning. An example of this case is the must_not clause of a Boolean query. A document that matches the must_not clause will be removed from the returned result set, so calculating the score for the documents matched by this part of the bool query would be additional, unnecessary, and inefficient work.

The best thing about all the changes is that we don’t need to care about whether we want to use filtering or not. Elasticsearch and the underlying Apache Lucene library take care of choosing the right execution method for us.


Explicit filtering with bool query

As we mentioned in the Compound queries section in Chapter 3, Searching Your Data, the bool query in Elasticsearch 2.0 allows us to add a filter explicitly by adding the filter section and including a query in that section. This is very convenient if we want to have a part of the query that needs to match, but we are not interested in the score for those documents.

Let’s look at the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"term":{

"available":true

}

}

}'

We see a simple query that should return all the books in our library available for borrowing, which means the documents with the available field set to true. Now let’s compare it with the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"bool":{

"must":{

"match_all":{}

},

"filter":{

"term":{

"available":true

}

}

}

}

}'

This query matches all the books, but it also contains the filter section, which tells Elasticsearch that we are only interested in the available books. The query will return the same results as the previous query, at least when looking only at the number of documents and which documents are returned. The difference is the score. For our example data, both the queries return two books. The results returned for the first query look as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":2,

"max_score":1.0,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":1.0,

"_source":{

"title":"Crime and Punishment",

"otitle":"Преступлéние и наказáние",

"author":"Fyodor Dostoevsky",

"year":1886,

"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],

"tags":[],

"copies":0,

"available":true

}

},{

"_index":"library",

"_type":"book",

"_id":"1",

"_score":0.30685282,

"_source":{

"title":"All Quiet on the Western Front",

"otitle":"Im Westen nichts Neues",

"author":"Erich Maria Remarque",

"year":1929,

"characters":["Paul Bäumer","Albert Kropp","Haie Westhus",

"Fredrich Müller","Stanislaus Katczinsky","Tjaden"],

"tags":["novel"],

"copies":1,

"available":true,

"section":3

}

}]

}

}

The results for the second query look as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":2,

"max_score":1.0,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":1.0,

"_source":{

"title":"Crime and Punishment",

"otitle":"Преступлéние и наказáние",

"author":"Fyodor Dostoevsky",

"year":1886,

"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],

"tags":[],

"copies":0,

"available":true

}

},{

"_index":"library",

"_type":"book",

"_id":"1",

"_score":1.0,

"_source":{

"title":"All Quiet on the Western Front",

"otitle":"Im Westen nichts Neues",

"author":"Erich Maria Remarque",

"year":1929,

"characters":["Paul Bäumer","Albert Kropp","Haie Westhus",

"Fredrich Müller","Stanislaus Katczinsky","Tjaden"],

"tags":["novel"],

"copies":1,

"available":true,

"section":3

}

}]

}

}

If you look at the score for the documents in each query, you’ll notice the difference. In the simple term query, Elasticsearch (the Lucene library, in fact) calculated a score of 1.0 for the first document and a score of 0.30685282 for the second one. This is not a perfect solution because the availability check is more or less binary and we don’t want it to interfere with the score. That’s why the second query is better in this case. With the bool query and filtering, the score for the filter element is not calculated and the score for both the documents is the same, that is 1.0.
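
The same pattern is useful when a scored full text part is combined with a yes/no condition. The following Python sketch builds such a request body; the helper name is hypothetical, and the match query on the title field is just an illustrative choice for the scored part.

```python
import json

def filtered_search(scoring_query, filter_query):
    # must contributes to the score; filter only narrows the result set.
    return {"query": {"bool": {"must": scoring_query, "filter": filter_query}}}

body = filtered_search(
    {"match": {"title": "crime"}},   # scored full text part
    {"term": {"available": True}},   # unscored yes/no condition
)
print(json.dumps(body, indent=2))
```

Keeping binary conditions such as availability in the filter section means the ranking is driven only by the part of the query that describes relevance.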


Highlighting

You have probably heard of highlighting or seen it. You may not even know that you are actually using highlighting when you are using the bigger and smaller public search engines on the World Wide Web (WWW). When we talk about highlighting in the context of full text search, we usually mean showing which words or phrases from the query were matched in the resulting documents. For example, if we use Google and search for the word lucene, we would see that word bolded in the search results:

It is even more visible on the Microsoft Bing search engine:

In this chapter, we will see how to use Elasticsearch highlighting capabilities to enhance our application with highlighted results.


Getting started with highlighting

There is no better way of showing how highlighting works than making a query and looking at the results returned by Elasticsearch. So let’s do that. We assume that we would like to highlight the terms that are matched in the title field of our documents to improve the search experience of our users. By now you know the example data from top to bottom, so let’s again reuse the same data set. We want to match the term crime in the title field and we want to get highlighting results. One of the simplest queries that can achieve this looks as follows:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"match":{

"title":"crime"

}

},

"highlight":{

"fields":{

"title":{}

}

}

}'

The response for the preceding query is as follows:

{

"took":16,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.5,

"_source":{

"title":"Crime and Punishment",

"otitle":"Преступлéние и наказáние",

"author":"Fyodor Dostoevsky",

"year":1886,

"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],

"tags":[],

"copies":0,

"available":true

},

"highlight":{

"title":["<em>Crime</em> and Punishment"]

}


}]

}

}

As you can see, apart from the standard information about the documents that matched the query, we got a new section called highlight. Elasticsearch used the <em> HTML tag as the beginning of the highlighting section and its closing counterpart to close the highlighted section. This is the default behavior of Elasticsearch, but we will learn how to change that.
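
On the application side, the highlight section has to be read per hit, since only hits with a highlighted field carry it. A minimal sketch of such post-processing follows, assuming the response has already been parsed into a Python dictionary. The function name is hypothetical and the response is a trimmed copy of the one shown above.

```python
def highlighted_fragments(response, field):
    # Collect (document id, fragments) pairs for hits that have highlights.
    result = []
    for hit in response["hits"]["hits"]:
        fragments = hit.get("highlight", {}).get(field, [])
        if fragments:
            result.append((hit["_id"], fragments))
    return result

# Trimmed copy of the response shown above.
response = {"hits": {"total": 1, "hits": [{
    "_index": "library", "_type": "book", "_id": "4", "_score": 0.5,
    "highlight": {"title": ["<em>Crime</em> and Punishment"]},
}]}}
print(highlighted_fragments(response, "title"))
```

Using get with defaults keeps the loop safe for hits that matched on a field other than the highlighted one.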


Field configuration

In order to perform highlighting, the original content of the field needs to be present. We have to set the fields we will use for highlighting. This is done by either marking a field to be stored or using the _source field with those fields included. If the field is set to be stored in the mappings, the stored version will be used, otherwise Elasticsearch will try to use the _source field and extract the field that needs to be highlighted.


Under the hood

Elasticsearch uses Apache Lucene under the hood and highlighting is one of the features of that library. Lucene provides three types of highlighting implementation: the standard one, which we just used; the second one called FastVectorHighlighter (https://lucene.apache.org/core/5_4_0/highlighter/org/apache/lucene/search/vectorhighlight/FastVectorHighlighter.html), which needs term vectors and positions to be able to work; and the third one called PostingsHighlighter (http://lucene.apache.org/core/5_4_0/highlighter/org/apache/lucene/search/postingshighlight/PostingsHighlighter.html). Elasticsearch chooses the right highlighter implementation automatically. If the field is configured with the term_vector property set to with_positions_offsets, FastVectorHighlighter will be used. If the field is configured with the index_options property set to offsets, PostingsHighlighter will be used. Otherwise, the standard highlighter will be used by Elasticsearch.
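
The selection order described above can be written as a small decision function. This is only a sketch of the rule as stated in the text, not Elasticsearch's actual code. The function name is hypothetical, the field mappings are illustrative dictionaries, and the returned strings are the values accepted by the highlighter type property (fvh, postings, and plain).

```python
def choose_highlighter(field_mapping):
    # Mirrors the selection order described in the text.
    if field_mapping.get("term_vector") == "with_positions_offsets":
        return "fvh"
    if field_mapping.get("index_options") == "offsets":
        return "postings"
    return "plain"

# Hypothetical field mappings used only for illustration.
print(choose_highlighter({"type": "string", "term_vector": "with_positions_offsets"}))
print(choose_highlighter({"type": "string", "index_options": "offsets"}))
print(choose_highlighter({"type": "string"}))
```

The term vector check comes first, so a field configured with both properties would use FastVectorHighlighter under this reading of the rule.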

Which highlighter to use depends on your data, your queries, and the needed performance. The standard highlighter is a general use case one. However, if you want to highlight fields with lots of data, FastVectorHighlighter is the recommended one. The thing to remember about it is that it requires term vectors to be present and that will make your index slightly larger. Finally, the fastest highlighter, which is also recommended for natural language highlighting, is PostingsHighlighter. However, the thing to remember is that PostingsHighlighter doesn’t support complex queries such as the match_phrase_prefix query and in such cases highlighting won’t be returned.

Forcing highlighter type

While Elasticsearch chooses the highlighter type for us, we can also enforce the highlighting type if we really want to. To do that, we need to set the type property to one of the following values:

plain: When this value is set, Elasticsearch will use the standard highlighter.

fvh: When this value is set, Elasticsearch will try using FastVectorHighlighter. It will require term vectors to be turned on for the field used for highlighting.

postings: When this value is set, Elasticsearch will try using PostingsHighlighter. It will require offsets to be turned on for the field used for highlighting.

For example, to use the standard highlighter, we will run the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"term":{

"title":"crime"

}

},

"highlight":{

"fields":{

"title":{"type":"plain"}

}

}

}'


Configuring HTML tags

The default behavior of the highlighting mechanism may not be suited for everyone; not all of us would like to have the <em> and </em> tags used for highlighting. Because of that, Elasticsearch allows us to change the default behavior and change the tags that are used for that purpose. To do that, we should set the pre_tags and post_tags properties to the code snippets we want the highlighting to start from and end at, for example, <b> and </b>. The pre_tags and post_tags properties are arrays and because of that we can provide more than a single opening and closing tag and Elasticsearch will use each of the defined tags to highlight different words. For example, if we want to use <b> as the opening tag and </b> as the closing tag, our query will look like this:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"term":{

"title":"crime"

}

},

"highlight":{

"pre_tags":["<b>"],

"post_tags":["</b>"],

"fields":{

"title":{}

}

}

}'

The result returned by Elasticsearch to the preceding query will be as follows:

{

"took":3,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.5,

"_source":{

"title":"Crime and Punishment",

"otitle":"Преступлéние и наказáние",

"author":"Fyodor Dostoevsky",

"year":1886,

"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],

"tags":[],

"copies":0,

"available":true

},

"highlight":{

"title":["<b>Crime</b> and Punishment"]

}

}]

}

}

As you can see, the term Crime in the title field was surrounded by the tags of our choice.

Controlling highlighted fragments

Elasticsearch allows us to control the number of highlighted fragments returned and their sizes by exposing two properties. The first one is number_of_fragments, which defines the number of fragments returned by Elasticsearch (defaults to 5). Setting this property to 0 causes the whole field to be returned, which can be handy for short fields but expensive for longer fields. The second property is fragment_size, which lets us specify the maximum length of the highlighted fragments in characters and defaults to 100.

An example query using these properties will look as follows:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"term":{

"title":"crime"

}

},

"highlight":{

"fields":{

"title":{"fragment_size":200,"number_of_fragments":0}

}

}

}'

Global and local settings

The highlighting properties we discussed previously can be set both on a global basis and on a per-field basis. The global ones will be used for all the fields that don’t overwrite them and should be placed on the same level as the fields section of your highlighting, for example, like this:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"term":{

"title":"crime"

}

},

"highlight":{

"pre_tags":["<b>"],

"post_tags":["</b>"],

"fields":{

"title":{}

}

}

}'

You can also set the properties for each field. For example, if we would like to keep the default behavior for all the fields except our title field, we would do the following:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"term":{

"title":"crime"

}

},

"highlight":{

"fields":{

"title":{"pre_tags":["<b>"],"post_tags":["</b>"]}

}

}

}'

As you can see, instead of placing the properties on the same level as the fields section, we placed them inside the JSON object that specifies the title field behavior. Of course, each field can be configured using different properties.
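One way to picture the relationship between global and per-field settings is as a dictionary merge in which field-level values win. The Python sketch below is only an illustration of that precedence, not Elasticsearch code; the function name effective_settings and the author field are hypothetical.

```python
def effective_settings(highlight, field):
    """Return the settings that would apply to one highlighted field.

    Assumption for illustration: every key next to "fields" is a global
    setting, and a field's own object overrides globals key by key.
    """
    merged = {k: v for k, v in highlight.items() if k != "fields"}
    merged.update(highlight.get("fields", {}).get(field, {}))
    return merged


highlight = {
    "pre_tags": ["<em>"],
    "post_tags": ["</em>"],
    "fields": {
        # title overrides the global tags
        "title": {"pre_tags": ["<b>"], "post_tags": ["</b>"]},
        # author (a made-up second field) keeps the global tags
        "author": {},
    },
}

title_settings = effective_settings(highlight, "title")
author_settings = effective_settings(highlight, "author")
print(title_settings)
print(author_settings)
```

Here the title field ends up with the <b> tags while the author field falls back to the global <em> tags.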

Require matching

Sometimes there may be a need (especially when using multiple highlighted fields) to show only the fields that matched our query. In order to have such behavior, we need to set the require_field_match property to true. Setting this property to false will cause all the terms to be highlighted even if a field didn’t match the query.

To see how that works, let’s create a new index called users and index a single document there. We will do that by sending the following command:

curl -XPUT 'http://localhost:9200/users/user/1' -d '{

"name":"Test user",

"description":"Test document"

}'

So, let’s assume we want to highlight the hits in both of the preceding fields. Our command sending the query to our new index will look like this:

curl -XGET 'localhost:9200/users/_search?pretty' -d '{

"query":{

"term":{

"name":"test"

}

},

"highlight":{

"fields":{

"name":{"pre_tags":["<b>"],"post_tags":["</b>"]},

"description":{"pre_tags":["<b>"],"post_tags":["</b>"]}

}

}

}'

The result of the preceding query will be as follows:

{

"took":3,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.19178301,

"hits":[{

"_index":"users",

"_type":"user",

"_id":"1",

"_score":0.19178301,

"_source":{

"name":"Test user",

"description":"Test document"

},

"highlight":{

"name":["<b>Test</b> user"]

}

}]

}

}

Note that we only got highlighting on the name field. This is because our query matched only that field. Let’s see what will happen if we set the require_field_match property to false and use a command similar to the following one:

curl -XGET 'localhost:9200/users/_search?pretty' -d '{

"query":{

"term":{

"name":"test"

}

},

"highlight":{

"require_field_match":false,

"fields":{

"name":{"pre_tags":["<b>"],"post_tags":["</b>"]},

"description":{"pre_tags":["<b>"],"post_tags":["</b>"]}

}

}

}'

Now let’s look at the modified query results:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.19178301,

"hits":[{

"_index":"users",

"_type":"user",

"_id":"1",

"_score":0.19178301,

"_source":{

"name":"Test user",

"description":"Test document"

},

"highlight":{

"name":["<b>Test</b> user"],

"description":["<b>Test</b> document"]

}

}]

}

}

As you can see, Elasticsearch returned highlighting in both the fields now.
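The difference between the two responses can be mimicked with a few lines of Python. This is a toy model written for this section, not the Elasticsearch implementation: the function highlight_fields is hypothetical, and it assumes a single-term query against one field and case-insensitive whole-word matching.

```python
import re


def highlight_fields(doc, query_field, term, fields,
                     require_field_match=True, pre="<b>", post="</b>"):
    """Toy model of require_field_match for a single-term query."""
    pattern = re.compile(r"\b%s\b" % re.escape(term), re.IGNORECASE)
    result = {}
    for field in fields:
        # With require_field_match on, only the queried field is considered.
        if require_field_match and field != query_field:
            continue
        text = doc.get(field, "")
        marked = pattern.sub(lambda m: pre + m.group(0) + post, text)
        if marked != text:
            result[field] = [marked]
    return result


doc = {"name": "Test user", "description": "Test document"}
strict = highlight_fields(doc, "name", "test", ["name", "description"])
loose = highlight_fields(doc, "name", "test", ["name", "description"],
                         require_field_match=False)
print(strict)
print(loose)
```

The first call highlights only the name field, and the second one highlights both fields, which mirrors the two responses shown above.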

Custom highlighting query

There are use cases where your queries are complicated and not really suitable for highlighting, but you still want to use the highlighting functionality. In such cases, Elasticsearch allows us to highlight results on the basis of a different query provided using the highlight_query property. An example of using a different highlighting query looks as follows:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"term":{

"title":"crime"

}

},

"highlight":{

"fields":{

"title":{

"highlight_query":{

"term":{

"title":"punishment"

}

}

}

}

}

}'

The preceding query will result in highlighting the term punishment in the title field, instead of the crime one.

The Postings highlighter

It is time to talk about the third available highlighter. It was added in Elasticsearch 0.90.6 and is slightly different from the previous ones. Postings Highlighter is automatically used when the field definition has index_options set to offsets. To illustrate how Postings Highlighter works, we will create a simple index with the proper configuration that allows that highlighter to work. We will do that by using the following commands:

curl -XPUT 'localhost:9200/hl_test'

curl -XPOST 'localhost:9200/hl_test/doc/_mapping' -d '{

"doc":{

"properties":{

"contents":{

"type":"string",

"fields":{

"ps":{"type":"string","index_options":"offsets"}

}

}

}

}

}'

If everything goes well, we should have a new index and the mappings. The mappings have two fields defined: one named contents and the second one named contents.ps. In this second case, we turned on the offsets by using the index_options property. This means that Elasticsearch will use the standard highlighter for the contents field and the postings highlighter for the contents.ps field.

To see the difference, we will index a single document with a fragment from Wikipedia describing the history of Birmingham. We do that by running the following command:

curl -XPUT localhost:9200/hl_test/doc/1 -d '{

"contents":"Birmingham''s early history is that of a remote and marginal area. The main centres of population, power and wealth in the pre-industrial English Midlands lay in the fertile and accessible river valleys of the Trent, the Severn and the Avon. The area of modern Birmingham lay in between, on the upland Birmingham Plateau and within the densely wooded and sparsely populated Forest of Arden."

}'

The last step is to send a query using both the highlighters. We can do it in a single request by using the following command:

curl 'localhost:9200/hl_test/_search?pretty' -d '{

"query":{

"term":{

"contents.ps":"modern"

}

},

"highlight":{

"require_field_match":false,

"fields":{

"contents":{},

"contents.ps":{}

}

}

}'

If everything goes well, you will find the following snippet in the response returned by Elasticsearch:

"highlight":{

"contents":["valleys of the Trent, the Severn and the Avon. The area of <em>modern</em> Birmingham lay in between, on the upland"],

"contents.ps":["The area of <em>modern</em> Birmingham lay in between, on the upland Birmingham Plateau and within the densely wooded and sparsely populated Forest of Arden."]

}

As you see, both the highlighters found the occurrence of the desired word. The difference is that the postings highlighter returns the smarter snippet: it checks for the sentence boundaries.

Let’s try one more query:

curl 'localhost:9200/hl_test/_search?pretty' -d '{

"query":{

"match_phrase":{

"contents.ps":"centres of"

}

},

"highlight":{

"require_field_match":false,

"fields":{

"contents":{},

"contents.ps":{}

}

}

}'

We searched for the phrase centres of. As you may expect, the results for the two highlighters will differ. For the standard highlighter, run on the contents field, we will find the following phrase in the response:

"Birminghams early history is that of a remote and marginal area. The main <em>centres</em> <em>of</em> population"

As you can clearly see, the standard highlighter divided the given phrase and highlighted individual terms. Also, not all occurrences of the terms centres and of were highlighted, but only the ones that formed the phrase.

On the other hand, the Postings Highlighter returned the following highlighted fragment:

"Birminghams early history is that <em>of</em> a remote and marginal area.","The main <em>centres</em> <em>of</em> population, power and wealth in the pre-industrial English Midlands lay in the fertile and accessible river valleys <em>of</em> the Trent, the Severn and the Avon.","The area <em>of</em> modern Birmingham lay in between, on the upland Birmingham Plateau and within the densely wooded and sparsely populated Forest <em>of</em> Arden."

This is the significant difference. The Postings Highlighter highlighted all the terms matching the query and not only those that formed the phrase, and returned whole sentences. This is a very nice feature, especially when you want to display the highlighting results for the user in the UI of your application.
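To get a feeling for what sentence-aware snippets mean, the following Python sketch splits a text on sentence boundaries and returns every sentence that contains the searched term. It is only an approximation written for this section: the real highlighter does not work this way internally, the regular expression used for sentence splitting is a simplifying assumption, and the function name sentence_snippets is hypothetical.

```python
import re


def sentence_snippets(text, term, pre="<em>", post="</em>"):
    """Return whole sentences containing term, with the term wrapped in tags."""
    # Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(r"\b%s\b" % re.escape(term), re.IGNORECASE)
    return [pattern.sub(lambda m: pre + m.group(0) + post, sentence)
            for sentence in sentences if pattern.search(sentence)]


text = ("Birminghams early history is that of a remote and marginal area. "
        "The main centres of population, power and wealth in the "
        "pre-industrial English Midlands lay in the fertile and accessible "
        "river valleys of the Trent, the Severn and the Avon. "
        "The area of modern Birmingham lay in between, on the upland "
        "Birmingham Plateau and within the densely wooded and sparsely "
        "populated Forest of Arden.")

snippets = sentence_snippets(text, "modern")
for snippet in snippets:
    print(snippet)
```

For the term modern, the sketch returns the single complete sentence about the area of modern Birmingham, similar to the contents.ps fragment shown earlier.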

Validating your queries

There are times when you are not in total control of the queries that you send to Elasticsearch. The queries can be generated from multiple criteria, making them monstrous, or, even worse, they can be generated by some kind of wizard, which makes it hard to troubleshoot and find the faulty part that makes the query fail. Because of such use cases, Elasticsearch exposes the Validate API, which helps us validate our queries and diagnose potential problems.

Using the Validate API

The usage of the Validate API is very simple. Instead of sending the query to the _search REST endpoint, we send it to the _validate/query one. And that’s it. Let’s look at the following command:

curl -XGET 'localhost:9200/library/_validate/query?pretty' --data-binary '{

"query":{

"bool":{

"must":{

"term":{

"title":"crime"

}

},

"should":{

"range:{

"year":{

"from":1900,

"to":2000

}

}

},

"must_not":{

"term":{

"otitle":"nothing"

}

}

}

}

}'

A similar query was already used in this book in Chapter 3, Searching Your Data. The preceding command will tell Elasticsearch to validate it and return the information about its validity. The response of Elasticsearch to the preceding command will be similar to the following one:

{

"valid":false,

"_shards":{

"total":1,

"successful":1,

"failed":0

}

}

Look at the valid attribute. It is set to false. Something went wrong. Let’s execute the query validation once again with the explain parameter added in the query:

curl -XGET 'localhost:9200/library/_validate/query?pretty&explain' --data-binary '{

"query":{

"bool":{

"must":{

"term":{

"title":"crime"

}

},

"should":{

"range:{

"year":{

"from":1900,

"to":2000

}

}

},

"must_not":{

"term":{

"otitle":"nothing"

}

}

}

}

}'

Now the result returned from Elasticsearch is more verbose:

{

"valid":false,

"_shards":{

"total":1,

"successful":1,

"failed":0

},

"explanations":[{

"index":"library",

"valid":false,

"error":"[library] QueryParsingException[Failed to parse]; nested: JsonParseException[Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in name\n at [Source: org.elasticsearch.transport.netty.ChannelBufferStreamInput@1110d090; line: 10, column: 18]];; com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in name\n at [Source: org.elasticsearch.transport.netty.ChannelBufferStreamInput@1110d090; line: 10, column: 18]"

}]

}

Now everything is clear. In our example, we have improperly quoted the range attribute.

Note

You may wonder why in our curl query we used the --data-binary parameter. This parameter properly preserves the newline character when sending a query to Elasticsearch. This means that the line and the column number remain intact and it’s easier to find errors. In the other cases, the -d parameter is more convenient because it’s shorter.
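Purely syntactic mistakes like the missing quote can also be caught on the client side before the request is sent. The following Python sketch uses only the standard json module; it is not a replacement for the Validate API (it knows nothing about mappings or query structure), and the helper name check_json is hypothetical. It also shows why preserved newlines matter: the reported line number is only meaningful when the body keeps its line breaks.

```python
import json

# Same mistake as in the text: the closing quote after "range" is missing.
broken_body = """{
  "query": {
    "bool": {
      "should": {
        "range: {
          "year": {"from": 1900, "to": 2000}
        }
      }
    }
  }
}"""


def check_json(body):
    """Return None when body is well-formed JSON, else the parser error."""
    try:
        json.loads(body)
        return None
    except json.JSONDecodeError as err:
        return err


error = check_json(broken_body)
if error is not None:
    # lineno and colno are attributes of json.JSONDecodeError
    print("invalid JSON near line %d, column %d: %s"
          % (error.lineno, error.colno, error.msg))
```

A body that passes this check can still be rejected by Elasticsearch, so the Validate API remains the authoritative test.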

The Validate API can also detect other errors, for example, incorrect format of a number or other mapping-related issues. Unfortunately, for our application, it is not easy to detect what the problem is because of a lack of structure in the error messages.

The Validate API supports most of the parameters that are supported by standard Elasticsearch queries, which include: explain, ignore_unavailable, allow_no_indices, expand_wildcards, operation_threading, analyzer, analyze_wildcard, default_operator, df, lenient, lowercase_expanded_terms, and rewrite.

Sorting data

So far we’ve run our queries and got the results in the order determined by the score of each document. However, that is not enough for all use cases. It is really handy to be able to sort our results on the basis of the field values. For example, when you are searching logs or time-based data in general, you probably want to have the most recent data first. In addition to that, Elasticsearch allows us to control how the documents should be sorted not only using field values, but also using more sophisticated sorting, such as sorting that uses scripts or sorting on fields that have multiple values. We will cover all that in this section.

Default sorting

Let’s look at the following query that returns all the books with at least one of the specified words:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"terms":{

"title":["crime","front","punishment"]

}

}

}'

Under the hood, we can imagine that Elasticsearch sees the preceding query as follows:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"terms":{

"title":["crime","front","punishment"]

}

},

"sort":{"_score":"desc"}

}'

Look at the sort section in the preceding query. This is the default sorting used by Elasticsearch. For better visibility, we can change the formatting slightly and show that fragment as follows:

"sort":[

{"_score":"desc"}

]

The preceding section defines how the documents should be sorted in the results list. In this case, Elasticsearch will show the documents with the highest score on top of the results list. The simplest modification is to reverse the ordering by changing the sort section to the following one:

"sort":[

{"_score":"asc"}

]

Selecting fields used for sorting

Default sorting is boring, isn’t it? So, let’s change it to sort on the basis of the values of the fields present in the documents. Let’s choose the title field, which means that the sort section of our query will look as follows:

"sort":[

{"title":"asc"}

]

Unfortunately, this doesn’t work as expected. Although Elasticsearch sorted the documents, the ordering is somewhat strange. Look closely at the response. With every document, Elasticsearch returns information about the sorting; for example, for the Crime and Punishment book, the returned document looks like the following code:

{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":null,

"_source":{

"title":"Crime and Punishment",

"otitle":"Преступлéние и наказáние",

"author":"Fyodor Dostoevsky",

"year":1886,

"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],

"tags":[],

"copies":0,

"available":true

},

"sort":["punishment"]

}

If you compare the title field and the returned sorting information, everything should be clear. Elasticsearch, during the analysis process, splits the field into several tokens. Since sorting is done using a single token, Elasticsearch chooses one of the produced tokens. It does the best that it can by sorting these tokens alphabetically and choosing the first one. This is the reason why, in the sorting value, we find only a single word instead of the whole content of the title field. If you would like to see how Elasticsearch behaves when using different fields for sorting, you can try fields such as copies:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"terms":{

"title":["crime","front","punishment"]

}

},

"sort":[

{"copies":"asc"}

]

}'

In general, it is a good idea to have a not analyzed field for sorting. We can use fields with multiple values for sorting, but, in most cases, it doesn’t make much sense and has limited usage.

As an example of using two different fields, one for sorting and another for searching, let’s change our title field. The changed title field definition will look as follows:

"title":{

"type":"string",

"fields":{

"sort":{"type":"string","index":"not_analyzed"}

}

}

After changing the title field in the mappings (we’ve used the same mappings as in Chapter 3, Searching Your Data) and re-indexing the data, we can try sorting on the title.sort field and see whether it works. To do this, we will need to send the following query:

{

"query":{

"match_all":{}

},

"sort":[

{"title.sort":"asc"}

]

}

Now, it works properly. As you can see, we used the new field, the title.sort one. We set it as not to be analyzed, so there is a single token for that field in the index of Elasticsearch.

Sorting mode

In the response from Elasticsearch, every document contains information about the value used for sorting. For example, let’s look at one of the documents returned by the query in which we used the title field for sorting:

{

"_index":"library",

"_type":"book",

"_id":"1",

"_score":null,

"_source":{

"title":"All Quiet on the Western Front",

"otitle":"Im Westen nichts Neues",

"author":"Erich Maria Remarque",

"year":1929,

"characters":["Paul Bäumer","Albert Kropp","Haie Westhus","Fredrich Müller","Stanislaus Katczinsky","Tjaden"],

"tags":["novel"],

"copies":1,

"available":true,

"section":3

},

"sort":["all"]

}

The sorting used in the query to get the preceding document was as follows:

"sort":[

{"title":"asc"}

]

However, because we are sorting on an analyzed field, which contains more than a single value, the sorting definition is in fact equivalent to the longer form, which looks as follows:

"sort":[

{"title":{"order":"asc","mode":"min"}}

]

mode defines which token should be used for comparison when sorting on a field which has more than one value. The available values we can choose from are:

min: Sorting will use the lowest value (or the first alphabetical value on the text based fields)

max: Sorting will use the highest value (or the last alphabetical value on the text based fields)

avg: Sorting will use the average value

median: Sorting will use the median value

sum: Sorting will use the sum of all the values in the field

Note

The modes such as median, avg, and sum are useful for numerical multivalued fields, but don’t make much sense when it comes to text based fields.
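The effect of each mode on a multivalued field can be sketched in plain Python. This is an illustration of the definitions above rather than Elasticsearch code; the function name sort_value is hypothetical, and the token list in the usage example is a simplified lowercase tokenization of one title, chosen for illustration.

```python
from statistics import median


def sort_value(values, mode="min"):
    """Pick the single value used for sorting a multivalued field."""
    if not values:
        return None
    if mode == "min":
        return min(values)   # lowest number or first alphabetical token
    if mode == "max":
        return max(values)   # highest number or last alphabetical token
    if mode == "sum":
        return sum(values)   # numeric fields only
    if mode == "avg":
        return sum(values) / len(values)
    if mode == "median":
        return median(values)
    raise ValueError("unknown sort mode: %s" % mode)


title_tokens = ["all", "quiet", "on", "the", "western", "front"]
numbers = [3, 1, 8]

print(sort_value(title_tokens, "min"))
print(sort_value(title_tokens, "max"))
for mode in ("min", "max", "avg", "median", "sum"):
    print(mode, sort_value(numbers, mode))
```

With the min mode, the token list yields all, which matches the sort value reported for All Quiet on the Western Front in the response shown earlier.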

Note that sort, in request and response, is given as an array. This suggests that we can use several different orderings. Elasticsearch will use the next element in the sorting definition list to determine ordering between the documents that have the same value of the previous sorting clause. So, if we have the same value in the title field, the documents will be sorted by the next field that we specify. For example, if we would like to get the documents that have the most copies and then sort by the title, we will run the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"terms":{

"title":["crime","front","punishment"]

}

},

"sort":[

{"copies":"desc"},{"title":"asc"}

]

}'

Specifying behavior for missing fields

What about when some of the documents that match the query don’t have the field we want to sort on? By default, documents without the given field are returned first in the case of ascending order and last in the case of descending order. However, sometimes this is not exactly what we want to achieve.

When we use sorting on numeric fields, we can change the default Elasticsearch behavior for documents with missing fields. For example, let’s take a look at the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"match_all":{}

},

"sort":[

{

"section":{

"order":"asc",

"missing":"_last"

}

}

]

}'

Note the extended form of the sort section of our query. We’ve added the missing parameter to it. By setting the missing parameter to _last, Elasticsearch will place the documents without the given field at the bottom of the results list. Setting the missing parameter to _first will result in Elasticsearch placing documents without the given field at the top of the results list. It is worth mentioning that besides the _last and _first values, Elasticsearch also allows us to use any number. In such a case, a document without a defined field will be treated as the document with this given value.
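The three variants of the missing parameter can be modeled with a short Python function. The sketch below is an illustration of the behavior described above, not the actual implementation; the function name sort_with_missing and the sample documents are made up for the example.

```python
def sort_with_missing(docs, field, order="asc", missing="_last"):
    """Sort docs by a numeric field, placing docs without it as requested."""
    reverse = order == "desc"
    if missing in ("_first", "_last"):
        present = sorted((d for d in docs if field in d),
                         key=lambda d: d[field], reverse=reverse)
        absent = [d for d in docs if field not in d]
        return absent + present if missing == "_first" else present + absent
    # Any number: documents without the field are treated as having that value.
    return sorted(docs, key=lambda d: d.get(field, missing), reverse=reverse)


docs = [
    {"id": 1, "section": 3},
    {"id": 2},                 # no section field
    {"id": 3, "section": 1},
]

last = [d["id"] for d in sort_with_missing(docs, "section", missing="_last")]
first = [d["id"] for d in sort_with_missing(docs, "section", missing="_first")]
as_two = [d["id"] for d in sort_with_missing(docs, "section", missing=2)]
print(last, first, as_two)
```

With missing set to 2, the document without the section field is placed between the documents with sections 1 and 3.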

Dynamic criteria

As we mentioned in the previous section, Elasticsearch allows us to sort using fields that have multiple values. We can control how the comparison is made using scripts. We do that by showing Elasticsearch how to calculate the value that should be used for sorting. Let’s assume that we want to sort by the first value indexed in the tags field. Let’s take a look at the following example query (note that running the following query requires the script.inline property set to on in the elasticsearch.yml file):

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"match_all":{}

},

"sort":{

"_script":{

"script":"doc[\"tags\"].values.size() > 0 ? doc[\"tags\"].values[0] : \"\u19999\"",

"type":"string",

"order":"asc"

}

}

}'

In the preceding example, we replaced every nonexistent value with the Unicode code of a character that should be low enough in the list. The main idea of this code is to check if our array contains at least a single element. If it does, then the first value from the array is returned. If the array is empty, we return the Unicode character that should be placed at the bottom of the results list. Besides the script parameter, this option of sorting requires us to specify the order (ascending, in our case) and type parameters that will be used for the comparison (we return string from our script).
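The logic of the sort script can be restated in Python to make the fallback easier to follow. This is a sketch of the same idea outside of Elasticsearch; the function name first_tag_or_sentinel is hypothetical, the sentinel string is copied from the script above, and the two sample documents are reduced to the fields that matter here.

```python
# Same fallback string as in the script above: it sorts after ordinary
# lowercase tag values, so documents without tags end up at the bottom.
SENTINEL = "\u19999"


def first_tag_or_sentinel(doc):
    """Return the first tag if there is one, else the sentinel value."""
    tags = doc.get("tags", [])
    return tags[0] if len(tags) > 0 else SENTINEL


books = [
    {"title": "Crime and Punishment", "tags": []},
    {"title": "All Quiet on the Western Front", "tags": ["novel"]},
]

ordered = sorted(books, key=first_tag_or_sentinel)
for book in ordered:
    print(book["title"])
```

The book with the novel tag comes first, and the book without any tags is pushed to the end of the ascending list.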

Calculate scoring when sorting

By default, Elasticsearch assumes that when you use sorting, the score is completely unimportant. Usually it is a good assumption; why do additional computations when the importance of the documents is given by the sorting formula? Sometimes, however, you want to know how good the document is in relation to the current query, even if the documents are presented in a different order. This is when the track_scores parameter should be used and set to true. An example query using it looks as follows:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"match_all":{}

},

"track_scores":true,

"sort":[

{"title":{"order":"asc"}}

]

}'

The preceding query calculates the score for every document. In fact, in our example, the score is boring and is always equal to 1.0 because of the match_all query which treats all the documents as equal.

Query rewrite

When debugging your queries, it is very valuable to know how all the queries are executed. Because of that, we decided to include the section on how query rewrite works in Elasticsearch, why it is used, and how to control it. If you have ever used queries such as the prefix query and the wildcard query, basically any query that is said to be multi term (a query that is built of multiple terms), you’ve used query rewriting even though you may not have known about it. Elasticsearch does rewrite for performance reasons. The rewrite process is about changing the original, expensive query into a set of queries that are far less expensive from an Apache Lucene point of view, thus speeding up the query execution.

Prefix query as an example

The best way to illustrate how the rewrite process is done internally is to look at an example and see which terms are used instead of the original query term. We will index three documents to our library_it index by using the following commands:

curl -XPOST 'localhost:9200/library_it/book/1' -d '{"title":"Solr 4 Cookbook"}'

curl -XPOST 'localhost:9200/library_it/book/2' -d '{"title":"Solr 3.1 Cookbook"}'

curl -XPOST 'localhost:9200/library_it/book/3' -d '{"title":"Mastering Elasticsearch"}'

What we would like is to find all the documents whose title starts with the letter s. Simple as that, we run the following query against our library_it index:

curl -XGET 'localhost:9200/library_it/_search?pretty' -d '{

"query":{

"prefix":{

"title":{

"prefix":"s",

"rewrite":"constant_score_boolean"

}

}

}

}'

We’ve used a simple prefix query; we’ve said that we would like to find all the documents with a term starting with the letter s in the title field. We’ve also used the rewrite property to specify the query rewrite method, but let’s skip it for now as we will discuss the possible values of this parameter in the later part of this section.

As the response to the previous query, we get the following:

{

"took":13,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":2,

"max_score":1.0,

"hits":[{

"_index":"library_it",

"_type":"book",

"_id":"2",

"_score":1.0,

"_source":{

"title":"Solr 3.1 Cookbook"

}

},{

"_index":"library_it",

"_type":"book",

"_id":"1",

"_score":1.0,

"_source":{

"title":"Solr 4 Cookbook"

}

}]

}

}

As you can see, in response we got the two documents that had the contents of the title field starting with the desired character. We didn’t specify the mappings explicitly, so we relied on Elasticsearch’s ability to choose the mapping type for us. As we already know, for the text field, Elasticsearch uses the default analyzer. This means that the terms in our documents will be lowercased and, because of that, we used the lowercased letter in our prefix query (remember that the prefix query is not analyzed).

Getting back to Apache Lucene

Now let’s take a step back and look at Apache Lucene again. If you recall what the Lucene inverted index is built from, you can tell that it contains a term, count, and document pointer (if you don’t recall, refer to the Full text searching section in Chapter 1, Getting Started with Elasticsearch Cluster). So, let’s see how the simplified view of the index may look for the preceding data we’ve put to the library_it index:

What you see in the column with the Term text is quite important. If you look at Elasticsearch and Apache Lucene internals, you can see that our prefix query was rewritten to the following Lucene query:

ConstantScore(title:solr)
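The expansion from the prefix s to the term solr can be reproduced with a toy inverted index. The Python sketch below is a simplification made for this section: it tokenizes by lowercasing and splitting on whitespace, which is only an approximation of the default analyzer, and the function names build_inverted_index and expand_prefix are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, title in docs.items():
        for token in title.lower().split():
            index[token].add(doc_id)
    return index


def expand_prefix(index, prefix):
    """Return every indexed term that starts with the given prefix."""
    return sorted(term for term in index if term.startswith(prefix))


docs = {
    1: "Solr 4 Cookbook",
    2: "Solr 3.1 Cookbook",
    3: "Mastering Elasticsearch",
}

index = build_inverted_index(docs)
terms = expand_prefix(index, "s")
# Union of the posting lists of all expanded terms
matching = sorted(set().union(*(index[term] for term in terms)))

for term in sorted(index):
    print(term, len(index[term]), sorted(index[term]))
print("prefix s expands to:", terms, "matching documents:", matching)
```

Only one indexed term starts with s, so the prefix query collapses to a single term query on solr, which matches documents 1 and 2.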

We can check the portions of the rewrite using the Elasticsearch API. First of all, we can use the Explain API by running the following command:

curl -XGET 'localhost:9200/library_it/book/1/_explain?pretty' -d '{

"query":{

"prefix":{

"title":{

"prefix":"s",

"rewrite":"constant_score_boolean"

}

}

}

}'

The result will be as follows:

{

"_index":"library_it",

"_type":"book",

"_id":"1",

"matched":true,

"explanation":{

"value":1.0,

"description":"sum of:",

"details":[{

"value":1.0,

"description":"ConstantScore(title:solr), product of:",

"details":[{

"value":1.0,

"description":"boost",

"details":[]

},{

"value":1.0,

"description":"queryNorm",

"details":[]

}]

},{

"value":0.0,

"description":"match on required clause, product of:",

"details":[{

"value":0.0,

"description":"#clause",

"details":[]

},{

"value":1.0,

"description":"_type:book, product of:",

"details":[{

"value":1.0,

"description":"boost",

"details":[]

},{

"value":1.0,

"description":"queryNorm",

"details":[]

}]

}]

}]

}

}

We can see that Elasticsearch used a constant score query with the term solr against the title field.

Query rewrite properties

We can control how the queries are rewritten internally. To do that, we place the rewrite parameter inside the JSON object responsible for the actual query. For example:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"prefix":{

"title":"s",

"rewrite":"constant_score_boolean"

}

}

}'

The rewrite property can take the following values:

scoring_boolean: This rewrite method translates each generated term into a Boolean should clause in the Boolean query. This rewrite method causes the score to be calculated for each document. Because of that, this method may be CPU demanding. Please also note that, for queries that have many terms, it may exceed the Boolean query limit, which is set to 1024. The default Boolean query limit can be changed by setting the index.query.bool.max_clause_count property in the elasticsearch.yml file. However, remember that the more Boolean queries produced, the lower the query performance may be.

constant_score: This rewrite method chooses constant_score_boolean or constant_score_filter depending on the query and taking performance into consideration. This is also the default behavior when the rewrite property is not set at all.

constant_score_boolean: This rewrite method is similar to the scoring_boolean rewrite method described previously, but less CPU demanding because the scoring is not computed and, instead of that, each term receives a score equal to the query boost (one by default, and which can be set using the boost property). Because this rewrite method also results in Boolean should clauses being created, similar to the scoring_boolean rewrite method, this method can also hit the maximum Boolean clauses limit.

top_terms_N: A rewrite method that translates each generated term into a Boolean should clause in a Boolean query and keeps the scores as computed by the query. However, unlike the scoring_boolean rewrite method, it only keeps the N top scoring terms to avoid hitting the maximum Boolean clauses limit and to increase the final query performance.

top_terms_blended_freqs_N: A rewrite method that translates each term into a Boolean query and treats the terms as if they had the same term frequency.

top_terms_boost_N: A rewrite method similar to the top_terms_N one, but the scores are not computed. Instead, the documents are given a score equal to the value of the boost property (one by default).

For example, if we would like our example query to use top_terms_N with N equal to 2, our query would look like this:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"prefix":{

"title":{

"prefix":"s",

"rewrite":"top_terms_2"

}

}

}

}'

If you look at the results returned by Elasticsearch, you'll notice that, unlike our initial query, the documents were given a score different than the default 1.0:

{

"took":4,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.15342641,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"3",

"_score":0.15342641,

"_source":{

"title":"The Complete Sherlock Holmes",

"author":"Arthur Conan Doyle",

"year":1936,

"characters":["Sherlock Holmes","Dr. Watson","G. Lestrade"],

"tags":[],

"copies":0,

"available":false,

"section":12

}

}]

}

}

The score is different than the default 1.0 because we've used the top_terms_N rewrite type, and this type of query rewrite keeps the score for the N top scoring terms.

Before we finish the Query rewrite section of this chapter, we should ask ourselves one last question: when to use which rewrite type? The answer to this question greatly depends on your use case, but, to summarize, if you can live with lower precision and relevancy (but higher performance), you can go for the top N rewrite method. If you need high precision and thus more relevant queries (but lower performance), choose the Boolean approach.
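The trade-off between the rewrite methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function rewrite_to_clauses is hypothetical, the per-term scores passed in are made up, and top_terms_blended_freqs_N and constant_score are deliberately left out. It only shows which terms survive the rewrite and which score each clause carries.

```python
MAX_CLAUSE_COUNT = 1024  # Boolean clause limit quoted in the text


def rewrite_to_clauses(term_scores, method, boost=1.0):
    """Turn expanded terms into (term, score) should clauses.

    term_scores maps each expanded term to a hypothetical per-term score.
    """
    if method.startswith("top_terms_boost_"):
        # Keep the N best terms, but give each one the constant boost.
        n = int(method[len("top_terms_boost_"):])
        top = sorted(term_scores, key=term_scores.get, reverse=True)[:n]
        return [(term, boost) for term in top]
    if method.startswith("top_terms_") and method[len("top_terms_"):].isdigit():
        # Keep the N best terms together with their computed scores.
        n = int(method[len("top_terms_"):])
        return sorted(term_scores.items(), key=lambda kv: kv[1], reverse=True)[:n]
    if method not in ("scoring_boolean", "constant_score_boolean"):
        raise ValueError("unsupported rewrite method: " + method)
    if len(term_scores) > MAX_CLAUSE_COUNT:
        # Both Boolean variants create one clause per term and can hit the limit.
        raise ValueError("too many Boolean clauses")
    if method == "scoring_boolean":
        return list(term_scores.items())
    return [(term, boost) for term in term_scores]


# Made-up scores for three terms that a prefix might expand to.
scores = {"search": 0.2, "server": 0.9, "solr": 0.5}
top_two = rewrite_to_clauses(scores, "top_terms_2")
```

The top N variants never raise the clause-limit error in this sketch because they cap the number of clauses before building the query, which is the reason the text gives for their better performance.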

Summary

The chapter you just finished was again focused on querying. We used filters and saw what highlighting is and how to use it. We learned what the highlighter types are and how they can help us. We validated our queries and we learned how Elasticsearch can help us when it comes to sorting our results. Finally, we discussed query rewriting, what it brings us, and how we can control it.

In the next chapter, we will get back to the topic of indexing. We will discuss indexing complex JSON objects such as tree-like structures and indexing data that is not flat. We will prepare Elasticsearch to handle relationships between documents and we will use the Elasticsearch API to update the structure of our indices.

Chapter 5. Extending Your Index Structure

We started the previous chapter by learning how to deal with revised filtering in Elasticsearch 2.x and what to expect from it now. We also explored highlighting and how it can help us in improving the users' search experience. We discovered query validation in Elasticsearch and learned the ways of data sorting in Elasticsearch. Finally, we discussed query rewriting and how that affects our queries. By the end of this chapter, you will have learned the following topics:

Indexing tree-like structures

Indexing data that is not flat

Handling document relationships by using nested object and parent-child features

Modifying index structure by using Elasticsearch API

Indexing tree-like structures

Trees are everywhere. If you develop an e-commerce shop application, your products will probably be described with the use of categories. The thing about categories is that in most cases they are hierarchical. There are top categories, such as electronics, music, books, and so on. Each of the top level categories can have numerous child categories, such as fiction and science, and those can go even deeper into science fiction, romance, and so on. If you look at the file system, the files and directories are arranged in tree-like structures as well. This book can also be represented as a tree: chapters contain topics and topics are divided into subtopics. So the data around us is arranged into tree-like structures and, as you can imagine, Elasticsearch is capable of indexing tree-like structures so that we can represent the data in an easier manner. Let's check how we can navigate through this type of data using path_analyzer.

Data structure

To begin with, let's create a simple index structure by using the following command:

curl -XPUT 'localhost:9200/path?pretty' -d '{

"settings":{

"index":{

"analysis":{

"analyzer":{

"path_analyzer":{"tokenizer":"path_hierarchy"}

}

}

}

},

"mappings":{

"category":{

"properties":{

"category":{

"type":"string",

"fields":{

"name":{"type":"string","index":"not_analyzed"},

"path":{"type":"string","analyzer":"path_analyzer",

"store":true}

}

}

}

}

}

}'

As you can see, we have a single type created: the category type. We will use it to store and index the information about the location of our document in the tree structure. The idea is simple: we can show the location of the document as a path, in exactly the same manner as the files and directories are presented on your hard disk drive. For example, in an automotive shop, we can have /cars/passenger/sport, /cars/passenger/camper, or /cars/delivery_truck/. However, to achieve that, we need to index this path in two different ways. First of all, we will use a not-analyzed field called name to store and index the path's name in its original form. We will also use a field called path, which will use the path_analyzer analyzer that we've defined to process the path so it is easier to search.

Analysis

Now, let's see what Elasticsearch will do with the category path during the analysis process. To see this, we will use the following command line, which uses the analysis API discussed in the Understanding the explain information section of Chapter 6, Make Your Search Better:

curl -XGET 'localhost:9200/path/_analyze?field=category.path&pretty' -d '/cars/passenger/sport'

The following results will be returned by Elasticsearch:

{

"tokens":[{

"token":"/cars",

"start_offset":0,

"end_offset":5,

"type":"word",

"position":0

},{

"token":"/cars/passenger",

"start_offset":0,

"end_offset":15,

"type":"word",

"position":0

},{

"token":"/cars/passenger/sport",

"start_offset":0,

"end_offset":21,

"type":"word",

"position":0

}]

}
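The token list above can be approximated with a small Python function. This is a sketch of the idea only, not the tokenizer's actual implementation: the function name path_hierarchy_tokens is an assumption, offsets and positions are ignored, and edge cases such as a trailing delimiter may behave differently in Elasticsearch.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Emit one token per level of the path, each including its ancestors."""
    leading = delimiter if path.startswith(delimiter) else ""
    # Drop empty segments produced by leading or trailing delimiters.
    parts = [part for part in path.split(delimiter) if part]
    tokens = []
    for depth in range(1, len(parts) + 1):
        tokens.append(leading + delimiter.join(parts[:depth]))
    return tokens


tokens = path_hierarchy_tokens("/cars/passenger/sport")
```

Because every ancestor path becomes its own token, a single exact term lookup on any of those tokens is enough to find the document.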

As we can see, our category path /cars/passenger/sport was processed by Elasticsearch and divided into three tokens. Thanks to this, we can simply find every document that belongs to a given category or its subcategories using the term filter. For the example to be complete, let's index a simple document by using the following command:

curl -XPUT 'localhost:9200/path/category/1' -d '{"category": "/cars/passenger/sport"}'

An example of using filters is as follows:

curl -XGET 'localhost:9200/path/_search?pretty' -d '{

"query":{

"bool":{

"filter":{

"term":{

"category.path":"/cars"

}

}

}

}

}'

Note that we also have the original value indexed in the category.name field. This is handy when we want to find documents from a particular path, ignoring the documents that are deeper in the hierarchy.
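The difference between filtering on category.path and on category.name can be sketched as follows. The helper names and the documents with identifiers 2 and 3 are hypothetical; only document 1 comes from the example above.

```python
def ancestors(path):
    """All hierarchy levels of a path, as the path field would index them."""
    parts = [part for part in path.split("/") if part]
    return {"/" + "/".join(parts[:depth]) for depth in range(1, len(parts) + 1)}


# Document 1 is from the text; documents 2 and 3 are made up for illustration.
docs = {
    1: "/cars/passenger/sport",
    2: "/cars/passenger",
    3: "/cars/delivery_truck",
}


def in_subtree(docs, category):
    """Like a term filter on category.path: the category and everything below."""
    return sorted(doc_id for doc_id, path in docs.items() if category in ancestors(path))


def exactly_at(docs, category):
    """Like a term filter on category.name: only the exact path."""
    return sorted(doc_id for doc_id, path in docs.items() if path == category)


subtree = in_subtree(docs, "/cars/passenger")
exact = exactly_at(docs, "/cars/passenger")
```

Here the subtree lookup returns documents 1 and 2, while the exact lookup returns only document 2.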

Indexing data that is not flat

Not all data is flat like the examples we have used in the book until now. Most of the data you will encounter will have some structure and nested objects inside the root JSON object. Of course, if we are building the system that Elasticsearch will be a part of and we are in control of all its pieces, we can create a structure that is convenient for Elasticsearch. But even in such cases, flat data is not always an option. Thankfully, Elasticsearch allows us to index data that is not flat, and this section will show us how to do that.

Data

Let's assume that we have the following data (we store it in the file called structured_data.json):

{

"author":{

"name":{

"firstName":"Fyodor",

"lastName":"Dostoevsky"

}

},

"isbn":"123456789",

"englishTitle":"Crime and Punishment",

"year":1886,

"characters":[

{

"name":"Raskolnikov"

},

{

"name":"Sofia"

}

],

"copies":0

}

As you can see, the data is not flat: it contains arrays and nested objects. If we want to create mappings and use the knowledge that we've got so far, we will have to flatten the data. However, as we already said, Elasticsearch allows some degree of structure and we should be able to create mappings that will work for the preceding example.

Objects

The preceding example data shows a structured JSON file. As you can see in the example, our root object has some additional, simple properties, such as englishTitle, isbn, year, and copies. These will be indexed as normal fields in the index and we already know how to deal with them (we discussed that in the Mappings configuration section of Chapter 2, Indexing Your Data). In addition to that, it has the characters array type and the author object. The author object has another object nested within it, the name object, which has two properties: firstName and lastName. So as you can see, we can have multiple nested objects inside each other.

Arrays

We have already used array type data, but we didn't talk about it. By default, all the fields in Lucene, and thus in Elasticsearch, are multivalued, which means that they can store multiple values. In order to send such fields to Elasticsearch for indexing, we use the JSON array type, which is enclosed in opening and closing square brackets []. As you can see in the preceding example, we used the array type for the characters of our book.

Mappings

Let's now look at what our mappings would look like for the book object we showed earlier. We already said that to index arrays we don't need anything special. So, in our case, to index the characters data we will need to add a field definition similar to the following one:

"characters":{

"properties":{

"name":{"type":"string"}

}

}

Nothing strange! We just nest the properties section inside the array's name (which is characters in our case) and we define the fields there. As the result of the preceding mappings, we will get the characters.name multivalued field in the index.

We do a similar thing for our author object. We give the section the same name as the one present in the data. We have the author object, but it also has the name object nested in it, so we do the same: we just nest another object inside it. So, our mappings for the author field would look as follows:

"author":{

"properties":{

"name":{

"properties":{

"firstName":{"type":"string"},

"lastName":{"type":"string"}

}

}

}

}

The firstName and lastName fields appear in the index as author.name.firstName and author.name.lastName.
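The way object and array fields end up as dotted, multivalued field names can be illustrated with a short Python sketch. The flatten function is hypothetical and only mimics the naming scheme described above; it does not analyze values or consult any mappings.

```python
def flatten(document):
    """Map each leaf value to its dotted field name; arrays become multivalued."""
    fields = {}

    def visit(value, name):
        if isinstance(value, dict):
            for key, child in value.items():
                visit(child, name + "." + key if name else key)
        elif isinstance(value, list):
            # Array elements share the field name of the array itself.
            for item in value:
                visit(item, name)
        else:
            fields.setdefault(name, []).append(value)

    visit(document, "")
    return fields


book = {
    "author": {"name": {"firstName": "Fyodor", "lastName": "Dostoevsky"}},
    "isbn": "123456789",
    "englishTitle": "Crime and Punishment",
    "year": 1886,
    "characters": [{"name": "Raskolnikov"}, {"name": "Sofia"}],
    "copies": 0,
}
fields = flatten(book)
```

In this sketch, characters.name holds two values while author.name.firstName holds one, which matches the field names discussed in the text.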

The rest of the fields are simple core types, so we'll skip discussing them as they were already discussed in the Mappings configuration section of Chapter 2, Indexing Your Data.

Final mappings

So our final mappings file, which we've called structured_mapping.json, looks like the following:

{

"book":{

"properties":{

"author":{

"type":"object",

"properties":{

"name":{

"type":"object",

"properties":{

"firstName":{"type":"string"},

"lastName":{"type":"string"}

}

}

}

},

"isbn":{"type":"string"},

"englishTitle":{"type":"string"},

"year":{"type":"integer"},

"characters":{

"properties":{

"name":{"type":"string"}

}

},

"copies":{"type":"integer"}

}

}

}

Sending the mappings to Elasticsearch

Now that we have our mappings done, we would like to test if all the work we did actually works. This time we will use a slightly different technique of creating an index and putting the mappings. First, let's create the library index with the following command (you need to delete the library index if you already have it):

curl -XPUT 'localhost:9200/library'

Now, let's send our mappings for the book type:

curl -XPUT 'localhost:9200/library/book/_mapping' -d @structured_mapping.json

Now we can index our example data:

curl -XPOST 'localhost:9200/library/book/1' -d @structured_data.json

To be or not to be dynamic

As we already know, Elasticsearch is schema-less, which means that it can index data without the need to create the mappings up front. What Elasticsearch does in the background when a new field is encountered in the data is a mapping update: it will try to guess the field type and add it to the mappings. The dynamic behavior of Elasticsearch is turned on by default, but there may be situations where you may want to turn it off for some parts of your index. In order to do that, add the dynamic property to the given object and set it to false. This should be done on the same level of nesting as the type property of the object that shouldn't be dynamic. For example, if we want our author and name objects to not be dynamic, we should modify the relevant part of the mappings file so that it looks as follows:

"author":{

"type":"object",

"dynamic":false,

"properties":{

"name":{

"type":"object",

"dynamic":false,

"properties":{

"firstName":{"type":"string","index":"analyzed"},

"lastName":{"type":"string","index":"analyzed"}

}

}

}

}

However, remember that in order to add new fields for such objects, we would have to update the mappings.

Note

You can also turn off the dynamic mappings functionality by adding the index.mapper.dynamic property to your elasticsearch.yml configuration file and setting it to false.

Disabling object indexing

There is one additional thing that we would like to mention when it comes to handling objects: we can disable indexing of a particular object by using the enabled property and setting it to false. There may be various reasons for that, such as not wanting a field to be indexed or not wanting a whole JSON object to be indexed. For example, if we want to omit an object called information from our author object, we will have the author object definition look as follows:

"author":{

"type":"object",

"properties":{

"name":{

"type":"object",

"dynamic":false,

"properties":{

"firstName":{"type":"string","index":"analyzed"},

"lastName":{"type":"string","index":"analyzed"},

"information":{"type":"object","enabled":false}

}

}

}

}

The dynamic parameter can also be set to strict. This means that new fields won't be added to the mappings when they appear, and the indexing of such a document will fail.
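The three values of the dynamic property can be compared with a small simulation. This is a sketch of the behavior described in the text, not Elasticsearch code; the function apply_dynamic and the middleName field are made up for illustration.

```python
def apply_dynamic(mapped_fields, document, dynamic=True):
    """Return the mapped field names after indexing document.

    dynamic=True     -> unknown fields are added to the mappings
    dynamic=False    -> unknown fields are ignored (mappings unchanged)
    dynamic="strict" -> unknown fields make indexing fail
    """
    mapped = set(mapped_fields)
    for field in document:
        if field in mapped:
            continue
        if dynamic == "strict":
            raise ValueError("unmapped field rejected: " + field)
        if dynamic is True:
            mapped.add(field)
    return mapped


name_mapping = {"firstName", "lastName"}
# middleName is a hypothetical field that is not present in the mappings.
incoming = {"firstName": "Fyodor", "middleName": "Mikhailovich"}

with_dynamic = apply_dynamic(name_mapping, incoming, dynamic=True)
without_dynamic = apply_dynamic(name_mapping, incoming, dynamic=False)
```

Calling apply_dynamic with dynamic="strict" for the same document raises an error, which corresponds to the failed indexing request mentioned above.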

Using nested objects

Nested objects can come in handy in certain situations. Basically, with nested objects Elasticsearch allows us to connect multiple documents together: one main document and multiple dependent ones. The main document and the nested ones are indexed together and they are placed in the same segment of the index (actually, in the same block inside the segment, near each other), which guarantees the best performance we can get for such a data structure. The same goes for changing the document; unless you are using the update API, you need to index the parent document and all the other nested ones at the same time.

Note

If you would like to read more about how nested objects work on the Apache Lucene level, there is a very good blog post written by Mike McCandless at http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html.

Now let's get on with our example use case. Imagine that we have a shop with clothes and we store the size and color of each t-shirt. Our standard, non-nested mappings will look like this (stored in cloth.json):

{

"cloth":{

"properties":{

"name":{"type":"string"},

"size":{"type":"string","index":"not_analyzed"},

"color":{"type":"string","index":"not_analyzed"}

}

}

}

To create the shop index with our cloth mapping, we run the following commands:

curl -XPOST 'localhost:9200/shop'

curl -XPUT 'localhost:9200/shop/cloth/_mapping' -d @cloth.json

Now imagine that we have a t-shirt in our shop that we only have in XXL size in red and in XL size in black. So our example document indexation command will look as follows:

curl -XPOST 'localhost:9200/shop/cloth/1' -d '{

"name":"Test shirt",

"size":["XXL","XL"],

"color":["red","black"]

}'

However, there is a problem with such a data structure. What if one of our clients searches our shop in order to find the XXL t-shirt in black? Let's check that by running the following query (we assume that we've used our mappings to create the index and we've indexed our example document):

curl -XGET 'localhost:9200/shop/cloth/_search?pretty=true' -d '{

"query":{

"bool":{

"must":[

{

"term":{"size":"XXL"}

},

{

"term":{"color":"black"}

}

]

}

}

}'

We should get no results, right? But in fact Elasticsearch returned the following document:

{

(…)

"hits":{

"total":1,

"max_score":0.4339554,

"hits":[{

"_index":"shop",

"_type":"cloth",

"_id":"1",

"_score":0.4339554,

"_source":{

"name":"Test shirt",

"size":["XXL","XL"],

"color":["red","black"]

}

}]

}

}

This is because the document was matched: we have the values we are searching for in the size field and in the color field. Of course, this is not what we would like to get.

So, let's modify our mappings to use the nested objects to separate color and size into different nested documents. The final mapping looks as follows (we store these mappings in the cloth_nested.json file):

{

"cloth":{

"properties":{

"name":{"type":"string","index":"analyzed"},

"variation":{

"type":"nested",

"properties":{

"size":{"type":"string","index":"not_analyzed"},

"color":{"type":"string","index":"not_analyzed"}

}

}

}

}

}

Now, we will create a second index called shop_nested using our modified mappings by running the following commands:

curl -XPOST 'localhost:9200/shop_nested'

curl -XPUT 'localhost:9200/shop_nested/cloth/_mapping' -d @cloth_nested.json

As you can see, we've introduced a new object inside our cloth type, variation, which is a nested one (the type property is set to nested). It basically says that we want to index nested documents. Now, let's modify our document. We will add the variation object to it, and that object will store objects with two properties: size and color. So the index command for our modified example product will look like the following:

curl -XPOST 'localhost:9200/shop_nested/cloth/1' -d '{

"name":"Test shirt",

"variation":[

{"size":"XXL","color":"red"},

{"size":"XL","color":"black"}

]

}'

We've structured the document so that each size and its matching color is a separate document. However, if you run our previous query, it won't return any documents. This is because in order to query for nested documents, we need to use a specialized query. So now our query looks as follows:

curl -XGET 'localhost:9200/shop_nested/cloth/_search?pretty=true' -d '{

"query":{

"nested":{

"path":"variation",

"query":{

"bool":{

"must":[

{"term":{"variation.size":"XXL"}},

{"term":{"variation.color":"black"}}

]

}

}

}

}

}'

And now, the preceding query will not return the indexed document, because we don't have a nested document that has the size equal to XXL and the color black.
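The difference between the two data structures can be reproduced outside Elasticsearch with a few lines of Python. This is only a sketch of the matching logic; the function names flat_match and nested_match are assumptions made for illustration.

```python
def flat_match(doc, size, color):
    """Flat multivalued fields: each condition is checked independently."""
    return size in doc["size"] and color in doc["color"]


def nested_match(doc, size, color):
    """Nested documents: both conditions must hold inside one variation."""
    return any(
        variation["size"] == size and variation["color"] == color
        for variation in doc["variation"]
    )


flat_doc = {"name": "Test shirt", "size": ["XXL", "XL"], "color": ["red", "black"]}
nested_doc = {
    "name": "Test shirt",
    "variation": [
        {"size": "XXL", "color": "red"},
        {"size": "XL", "color": "black"},
    ],
}

flat_hit = flat_match(flat_doc, "XXL", "black")        # the unwanted match
nested_hit = nested_match(nested_doc, "XXL", "black")  # correctly rejected
```

The flat structure loses the pairing between a size and its color, while the nested structure keeps each pair together, which is why only the nested version rejects the XXL and black combination.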

Let's get back to the query for a second to discuss it briefly. As you can see, we used the nested query in order to search in the nested documents. The path property specifies the name of the nested object (yes, we can have multiple of them). We just included a standard query section under the nested type. Also note that we specified the full path for the field names in the nested objects, which is handy when you have multilevel nesting, which is also possible.

Scoring and nested queries

There is one additional property when it comes to handling nested documents during query. In addition to the path property, there is the score_mode property, which allows us to define how the scoring is calculated from the nested queries. Elasticsearch allows us to set the score_mode property to one of the following values:

avg: This is the default value. Using it for the score_mode property will result in Elasticsearch taking the average value calculated from the scores of the defined nested queries. The calculated average will be included in the score of the main query.

sum: Using this value for the score_mode property will result in Elasticsearch taking a sum of the scores for each nested query and including it in the score of the main query.

min: Using this value for the score_mode property will result in Elasticsearch taking the score of the minimum scoring nested query and including it in the score of the main query.

max: Using this value for the score_mode property will result in Elasticsearch taking the score of the maximum scoring nested query and including it in the score of the main query.

none: Using this value for the score_mode property will result in no score being taken from the nested query.
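The score_mode values listed above boil down to a simple aggregation over the scores of the matching nested documents. The following sketch assumes a hypothetical combine_scores function and made-up scores; returning None for the none mode is a modeling choice of this sketch that stands for "no score taken", not a statement about the value Elasticsearch reports.

```python
def combine_scores(nested_scores, score_mode="avg"):
    """Combine the scores of matching nested documents into one contribution."""
    if not nested_scores:
        raise ValueError("at least one nested score is required")
    if score_mode == "avg":
        return sum(nested_scores) / len(nested_scores)
    if score_mode == "sum":
        return sum(nested_scores)
    if score_mode == "min":
        return min(nested_scores)
    if score_mode == "max":
        return max(nested_scores)
    if score_mode == "none":
        return None  # nothing is taken from the nested query
    raise ValueError("unknown score_mode: " + score_mode)


# Made-up scores of two matching nested documents.
example_scores = [0.5, 1.5]
average = combine_scores(example_scores)  # default mode is avg
```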

Using the parent-child relationship

In the previous section, we discussed using Elasticsearch to index the nested documents along with the parent one. However, even though the nested documents are indexed as separate documents in the index, we can't change a single nested document (unless we use the update API). Elasticsearch allows us to have a real parent-child relationship and we will look at it in the following section.

Index structure and data indexing

Let's use the same example that we used when discussing the nested documents: the hypothetical cloth store. What we would like to have is the ability to update the sizes and colors without the need to index the whole parent document after each change. We will see how to achieve that using the Elasticsearch parent-child functionality.

Child mappings

First we have to create a child index definition. To create child mappings, we need to add the _parent property with the name of the parent type, which will be cloth in our case. In the children documents, we want to have the size and the color of the cloth. So, the commands that will create the shop index and the variation type will look as follows:

curl -XPOST 'localhost:9200/shop'

curl -XPUT 'localhost:9200/shop/variation/_mapping' -d '{

"variation":{

"_parent":{"type":"cloth"},

"properties":{

"size":{"type":"string","index":"not_analyzed"},

"color":{"type":"string","index":"not_analyzed"}

}

}

}'

And that's all. You don't need to specify which field will be used to connect the child documents to the parent ones. By default, Elasticsearch will use the documents' unique identifier for that. If you remember from the previous chapters, the information about a unique identifier is present in the index by default.

Parent mappings

The only field we need to have in our parent document is name. We don't need anything more than that. So, in order to create our cloth type in the shop index, we will run the following command:

curl -XPUT 'localhost:9200/shop/cloth/_mapping' -d '{

"cloth":{

"properties":{

"name":{"type":"string"}

}

}

}'

The parent document

Now we are going to index our parent document. As we want to store the information about the size and the color in the child documents, the only thing we need to have in the parent documents is the name. Of course, there is one thing to remember: our parent documents need to be of type cloth, because of the _parent property value in the child mappings. The indexing command for our parent document is very simple and looks as follows:

curl -XPOST 'localhost:9200/shop/cloth/1' -d '{

"name":"Test shirt"

}'

If you look at the preceding command, you'll notice that our document will be given the identifier 1.

Child documents

To index the child documents, we need to provide information about the parent document with the use of the parent request parameter. The value of the parent parameter should point to the identifier of the parent document. So, to index two child documents to our parent document, we need to run the following command lines:

curl -XPOST 'localhost:9200/shop/variation/1000?parent=1' -d '{

"color":"red",

"size":"XXL"

}'

curl -XPOST 'localhost:9200/shop/variation/1001?parent=1' -d '{

"color":"black",

"size":"XL"

}'

And that's all. We've indexed two additional documents, which are of our variation type, but we've specified that our documents have a parent, the document with an identifier of 1.

Querying

We've indexed our data and now we need to use appropriate queries to match the documents with the data stored in their children. This is because, by default, Elasticsearch searches on the documents without looking at the parent-child relations. For example, the following query will match all three documents that we've indexed (two children and one parent):

curl -XGET 'localhost:9200/shop/_search?q=*&pretty'

This is not what we would like to achieve, at least in most cases. Usually, we are interested in parent documents that have children matching the query. Of course, Elasticsearch provides such functionalities with specialized types of queries.

Note

The thing to remember though is that, when running queries against parents, the children documents won't be returned, and vice versa.

Querying data in the child documents

Imagine that we want to get clothes that are of the XXL size and are red. As you recall, the size and the color of the cloth are indexed in the child documents, so we need a specialized has_child query to check which parent documents have children with the desired size and color. So an example query that matches our requirement looks as follows:

curl -XGET 'localhost:9200/shop/_search?pretty' -d '{

"query":{

"has_child":{

"type":"variation",

"query":{

"bool":{

"must":[

{"term":{"size":"XXL"}},

{"term":{"color":"red"}}

]

}

}

}

}

}'

The query is quite simple; it is of the has_child type, which tells Elasticsearch that we want to search in the child documents. In order to specify which type of children we are interested in, we specify the type property with the name of the child type. The query is provided using the query property. We've used a standard bool query, which we've already discussed. The result of the query will contain only those parent documents that have children matching our bool query. In our case, the single document returned looks as follows:

{

"took":16,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":1.0,

"hits":[{

"_index":"shop",

"_type":"cloth",

"_id":"1",

"_score":1.0,

"_source":{

"name":"Test shirt"

}

}]

}

}

The has_child query allows us to provide additional parameters to control its behavior. Every parent document found may be connected with one or more child documents. This means that every child document can influence the resulting score. By default, the query doesn't care about the children documents, how many of them matched, or what their content is; it only matters whether they match the query or not. This can be changed by using the score_mode parameter, which controls the score calculation of the has_child query. The values this parameter can take are:

none: The default one; the score generated by the relation is 1.0

min: The score is taken from the lowest scored child

max: The score is taken from the highest scored child

sum: The score is calculated as the sum of the child scores

avg: The score is taken as the average of the child scores

Let's see an example:

curl -XGET 'localhost:9200/shop/_search?pretty' -d '{

"query":{

"has_child":{

"type":"variation",

"score_mode":"sum",

"query":{

"bool":{

"must":[

{"term":{"size":"XXL"}},

{"term":{"color":"red"}}

]

}

}

}

}

}'

We used sum as score_mode, which results in children contributing to the final score of the parent document: the contribution is the sum of scores of every child document matching the query.

And finally, we can limit the number of children documents that need to be matched; we can specify both the maximum number of the children documents allowed to be matched (the max_children property) and the minimum number of children documents (the min_children property) that need to be matched. The query illustrating the usage of these parameters is as follows:

curl -XGET 'localhost:9200/shop/_search?pretty' -d '{

"query":{

"has_child":{

"type":"variation",

"min_children":1,

"max_children":3,

"query":{

"bool":{

"must":[

{"term":{"size":"XXL"}},

{"term":{"color":"red"}}

]

}

}

}

}

}'

Querying data in the parent documents

Sometimes, we are not interested in the parent documents but in the child documents. If you would like to return the child documents that match given data in the parent document, Elasticsearch has a query for us: the has_parent query. It is similar to the has_child query; however, instead of the type property, we specify the parent_type property with the value of the parent document type. For example, the following query will return both the child documents that we’ve indexed, but not the parent document:

curl -XGET 'localhost:9200/shop/_search?pretty' -d '{

"query":{

"has_parent":{

"parent_type":"cloth",

"query":{

"term":{"name":"test"}

}

}

}

}'

The response from Elasticsearch will be similar to the following one:

{

"took":3,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":2,

"max_score":1.0,

"hits":[{

"_index":"shop",

"_type":"variation",

"_id":"1000",

"_score":1.0,

"_routing":"1",

"_parent":"1",

"_source":{

"color":"red",

"size":"XXL"

}

},{

"_index":"shop",

"_type":"variation",

"_id":"1001",

"_score":1.0,

"_routing":"1",

"_parent":"1",

"_source":{

"color":"black",

"size":"XL"

}

}]

}

}

Similar to the has_child query, the has_parent query also gives us the possibility of tuning the score calculation of the query. In this case, score_mode has only two options: none, the default one, where the score calculated by the query is equal to 1.0, and score, which calculates the score of the document on the basis of the parent document contents. An example that uses score_mode in the has_parent query looks as follows:

curl -XGET 'localhost:9200/shop/_search?pretty' -d '{

"query":{

"has_parent":{

"parent_type":"cloth",

"score_mode":"score",

"query":{

"term":{"name":"test"}

}

}

}

}'

The only difference from the previous example is score_mode. If you check the results of these queries, you’ll notice only a single difference. The score of all the documents from the first example is 1.0, while the score for the results returned by the preceding query is equal to 0.8784157. In this case, all the documents found have the same score, because they have a common parent document.

Performance considerations

When using the Elasticsearch parent-child functionality, you have to be aware of the performance impact that it has. The first thing you need to remember is that the parent and the child documents need to be stored in the same shard in order for the queries to work. If you happen to have a high number of children for a single parent, you may end up with shards not having a similar number of documents. Because of that, your query performance can be lower on one of the nodes, resulting in the whole query being slower. Also, remember that parent-child queries will be slower than ones that run against documents that don’t have a relationship between them. There is a way of speeding up joins for the parent-child queries at the cost of memory by eagerly loading the so-called global ordinals; however, we will discuss that method in the Elasticsearch caches section of Chapter 9, Elasticsearch Cluster in Detail.

Finally, the first query will preload and cache the document identifiers using the doc values. This takes time. In order to improve the performance of initial queries that use the parent-child relationship, the Warmer API can be used. You can find more information about how to add warming queries to Elasticsearch in the Warming up section of Chapter 10, Administrating Your Cluster.
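The shard imbalance mentioned above can be pictured with a small Python sketch. It assumes that a document lands on the shard given by a hash of its routing value modulo the number of shards; the crc32 hash is only a stand-in for whatever hash function Elasticsearch uses internally, and the document counts and identifiers are made up.

```python
from collections import Counter
from zlib import crc32

NUM_SHARDS = 5  # the example responses report five shards in total

def shard_for(routing, num_shards=NUM_SHARDS):
    # Stand-in hash (an assumption); only the "hash modulo shards" idea matters.
    return crc32(routing.encode("utf-8")) % num_shards

# 1,000 child documents that are all routed by the identifier of parent "1".
children = Counter(shard_for("1") for _ in range(1000))

# 1,000 unrelated documents that are routed by their own identifiers.
standalone = Counter(shard_for(str(doc_id)) for doc_id in range(2000, 3000))
```

All the children end up on a single shard, while the unrelated documents are spread over several shards; a parent with very many children therefore makes one shard noticeably larger than the rest.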

Modifying your index structure with the update API

In the previous chapters, we discussed how to create index mappings and index the data. But what if you already have the mappings created and the data indexed, but you want to modify the structure of the index? Of course, one could say that we could just create a new index with new mappings, but that is not always a possibility, especially in a production environment. Modifying the structure of an existing index is possible to some extent. For example, by default, if we index a document with a new field, Elasticsearch will add that field to the index structure. Let’s now look at how to modify the index structure manually.

Note

For situations where mapping changes are needed and they are not possible because of conflicts with the current index structure, it is a very good idea to use aliases, both read and write ones. We will discuss aliasing in the Index aliasing section of Chapter 10, Administrating Your Cluster.

The mappings

Let’s assume that we have the following mappings for our users index stored in the user.json file:

{

"user":{

"properties":{

"name":{"type":"string"}

}

}

}

As you can see, it is very simple. It just has a single property that will hold the user name. Now let’s create an index called users and use the preceding mappings to create our type. To do that, we will run the following commands:

curl -XPOST 'localhost:9200/users'

curl -XPUT 'localhost:9200/users/user/_mapping' -d @user.json

If everything goes well, we will have our index (called users) and type (called user) created. So now let’s try to add a new field to the mappings.

Adding a new field to the existing index

In order to illustrate how to add a new field to our mappings, we assume that we want to add a phone number to the data stored for each user. In order to do that, we need to send an HTTP PUT command to the /index_name/type_name/_mapping REST endpoint with the proper body that will include our new field. For example, to add the mentioned phone field, we will run the following command:

curl -XPUT 'http://localhost:9200/users/user/_mapping' -d '{
 "user": {
  "properties": {
   "phone": {"type": "string", "index": "not_analyzed"}
  }
 }
}'

Similar to the previous command we ran, if everything goes well, we should have a new field added to our index structure.

Note

Of course, Elasticsearch won’t reindex our data or populate the newly added field automatically. It will just alter the mappings held by the master node and propagate the mappings to all the other nodes in the cluster, and that’s all. Data reindexation must be done by us or by the application that indexes the data in our environment. Until then, the old documents won’t have the newly added field. This is crucial to remember. If you don’t have the original documents, you can use the _source field to get the original data from Elasticsearch and index them once again.
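As a rough illustration of reindexing from the _source field, the following Python sketch turns search hits into a newline-delimited body that could be sent to the _bulk endpoint. The bulk_reindex_body helper and the sample hit are hypothetical; fetching the hits (for example, with a scan and scroll search) and sending the body are left out.

```python
import json

def bulk_reindex_body(hits, target_index=None):
    """Build a _bulk request body that indexes every hit's _source again."""
    lines = []
    for hit in hits:
        action = {"index": {
            "_index": target_index or hit["_index"],
            "_type": hit["_type"],
            "_id": hit["_id"],
        }}
        lines.append(json.dumps(action))          # action and metadata line
        lines.append(json.dumps(hit["_source"]))  # original document
    return "\n".join(lines) + "\n"  # a bulk body has to end with a newline

# A made-up hit shaped like the entries of hits.hits in a search response.
sample_hits = [
    {"_index": "users", "_type": "user", "_id": "1",
     "_source": {"name": "John Smith"}},
]
body = bulk_reindex_body(sample_hits)
```

Because the documents are indexed again, any field added to the mappings in the meantime is populated for them, provided that the source contains a value for it.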

To ensure everything is okay, we can run the GET HTTP request to the _mapping REST endpoint and Elasticsearch will return the appropriate mappings. An example command to get the mappings for our user type in the users index will look as follows:

curl -XGET 'localhost:9200/users/user/_mapping?pretty'

Modifying fields of an existing index

Our users index structure contains two fields: name and phone. Let’s imagine that we indexed some data, but after a while we decided that we want to search on the phone field and we would like to change its index property from not_analyzed to analyzed. Because we already know how to alter the index structure, we will run the following command:

curl -XPUT 'http://localhost:9200/users/user/_mapping?pretty' -d '{

"user":{

"properties":{

"phone":{"type":"string","store":"yes","index":"analyzed"}

}

}

}'

What Elasticsearch will return is a response indicating an error, which looks as follows:

{
 "error": {
  "root_cause": [{
   "type": "illegal_argument_exception",
   "reason": "Mapper for [phone] conflicts with existing mapping in other types:\n[mapper [phone] has different [index] values, mapper [phone] has different [store] values, mapper [phone] has different [omit_norms] values, cannot change from disable to enabled, mapper [phone] has different [analyzer]]"
  }],
  "type": "illegal_argument_exception",
  "reason": "Mapper for [phone] conflicts with existing mapping in other types:\n[mapper [phone] has different [index] values, mapper [phone] has different [store] values, mapper [phone] has different [omit_norms] values, cannot change from disable to enabled, mapper [phone] has different [analyzer]]"
 },
 "status": 400
}

This is because we can’t change a field that was set to be not_analyzed to one that is analyzed. And not only that: in most cases, you won’t be able to update the field’s mapping. This is a good thing, because if we were allowed to change such settings, we would confuse Elasticsearch and Lucene. Imagine that we already have many documents with the phone field set to not_analyzed and we are allowed to change the mappings to analyzed. Elasticsearch wouldn’t change the data that was already indexed, but the queries that are analyzed would be processed with a different logic, and thus you wouldn’t be able to properly find your data.
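The mismatch described above can be sketched in a few lines of Python. The analyze function is a deliberately simplified stand-in for an analyzer (lowercasing and splitting on non-alphanumeric characters), and the phone number is made up; the point is only that terms written under the old mapping no longer line up with the terms an analyzed query produces.

```python
import re

def analyze(text):
    # Simplified analyzer stand-in: lowercase, split on non-alphanumerics.
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]

def index_terms(value, analyzed):
    # A not_analyzed field keeps the whole value as one single term.
    return set(analyze(value)) if analyzed else {value}

def matches(query_text, terms):
    # The query text is analyzed, as it would be under the new mapping.
    return any(term in terms for term in analyze(query_text))

old_terms = index_terms("555-0100", analyzed=False)  # indexed as not_analyzed
new_terms = index_terms("555-0100", analyzed=True)   # indexed as analyzed

found_in_old_data = matches("555-0100", old_terms)
found_in_new_data = matches("555-0100", new_terms)
```

The value indexed as a single term is not found by the analyzed query, while the same value indexed under the analyzed mapping is, which is why such a change requires the data to be indexed again.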

However, to give you some examples of what is prohibited and what is not, we decided to mention some of the operations for both cases. For example, the following modifications can be safely made:

Adding a new type definition

Adding a new field

Adding a new analyzer

The following modifications are prohibited or will not work:

Enabling norms for a field

Changing a field to be stored or not stored

Changing the type of the field (for example, from text to numeric)

Changing a stored field to not stored and vice versa

Changing the value of the indexed property

Changing the analyzer of an already indexed document

Remember that the preceding examples of allowed and not allowed updates do not cover all the possibilities of the update API usage, and you have to try for yourself whether the update you are trying to do will work.

Summary

The chapter you just finished reading concentrated on indexing operations and handling data that is not flat or has relationships between the documents. We started with indexing tree-like structures and objects in Elasticsearch. We also used nested objects and learned when they can be used. We also used the parent-child functionality and learned how this approach differs from nested documents. Finally, we modified our index structure with an API call and learned when this is possible.

In the next chapter, we will get back to query-related topics. We will learn how Lucene scoring works, how to use scripts in Elasticsearch, and how to handle multilingual data. We will affect scoring using boosts and we will use synonyms to improve users’ search results. Finally, we will look at what we can do to see how our documents were scored.

Chapter 6. Make Your Search Better

In the previous chapter, we focused on indexing operations; we learned how to handle structured data. We started with indexing tree-like structures and JSON objects. We used nested objects and indexed documents using the parent-child functionality. Finally, at the end of the chapter, we used the Elasticsearch API to modify our index structures. By the end of this chapter, you will have learned the following topics:

Understanding how Apache Lucene scoring works

Using scripting

Handling multilingual data

Using boosting to affect document scoring

Using synonyms

Understanding how your documents were scored

Introduction to Apache Lucene scoring

When talking about queries and their relevance, we can’t omit the information about the scoring and where it comes from. But what is a score? The score is a property that describes the relevance of a document in the context of a query. In the following section, we will talk about the default Apache Lucene scoring mechanism, the TF/IDF algorithm, and how it affects the returned documents.

Note

TF/IDF is not the only available algorithm exposed by Elasticsearch. For more information about the available models, refer to the Available similarity models section in Chapter 2, Indexing Your Data. You can also refer to the books Mastering Elasticsearch and Mastering Elasticsearch Second Edition published by Packt Publishing.

When a document is matched

When a document is returned by Lucene, it means that it matched the query we sent to it. In most cases, each of the resulting documents in the response is given a score. The higher the score, the more relevant the document is from the search engine’s point of view, of course in the context of a given query. This means that the score calculated for the same document on two different queries will be different. Because of that, comparing scores between queries usually doesn’t make much sense. However, let’s get back to the scoring. To calculate the score property for a document, multiple factors are taken into account:

document boost: The boost value given for a document during indexing.

field boost: The boost value given for a field during querying and indexing.

coord: The coordination factor that is based on the number of query terms the document has. It is responsible for giving more value to the documents that contain more search terms compared to the other documents.

inverse document frequency: The term-based factor that tells the scoring formula how rare the given term is. The higher the inverse document frequency, the less common the term is.

length norm: The field-based factor for normalization based on the number of terms the given field contains. The longer the field, the smaller the boost this factor will give. It basically means that shorter documents will be favored.

term frequency: The term-based factor describing how many times the given term occurs in a document. The higher the term frequency, the higher the score of the document will be.

query norm: The query-based normalization factor that is calculated as the sum of the squared weight of each of the query terms. Query norm is used to allow score comparison between queries, which we said is not always easy or possible.

Default scoring formula

The practical formula for the TF/IDF algorithm looks as follows:
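Following the Lucene TFIDFSimilarity Javadoc, the practical scoring function can be written as below. The notation is that of the Javadoc; note that the inverse document frequency enters squared there:

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot
  \sum_{t \in q} \Bigl( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^{2} \cdot
  \mathrm{boost}(t) \cdot \mathrm{norm}(t,d) \Bigr)
```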

To adjust your query relevance, you don’t need to remember the details of the equation, but it is very important to know how it works, or at least to be aware that there is an equation you can analyze. We can see that the score for the document is a function of query q and document d. There are also two factors that are not dependent directly on query terms: coord and queryNorm. These two elements of the formula are multiplied by the sum calculated for each term in the query. The sum, on the other hand, is calculated by multiplying the term frequency for the given term, its inverse document frequency, term boost, and the norm, which is the length norm we discussed previously.

Note

Note that the preceding formula is a practical one. You can find more information about the conceptual formula in the Lucene Javadocs at http://lucene.apache.org/core/5_4_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

The good thing about the preceding rules is that you don’t need to remember all of that. What you should be aware of is what matters when it comes to the document score. Basically, there are a few rules which come from the preceding equation:

The rarer the matched term is, the higher the score the document will have

The shorter the document fields are (the fewer terms they have), the higher the score the document will have

The higher the boost for the fields is, the higher the score the document will have

As we can see, Lucene gives a higher score to the documents that have many query terms matched and have shorter fields (fewer terms indexed) that were used for matching, and it also favors rarer terms over common ones (of course, the ones that matched).
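To make these rules tangible, the following minimal Python sketch evaluates a single per-term summand using the classic Lucene definitions: tf as the square root of the term frequency, idf as 1 + ln(numDocs / (docFreq + 1)), and the length norm as 1 / sqrt(number of terms in the field). The function names and the sample numbers are made up for illustration, coord and queryNorm are left out, and Lucene stores norms with reduced precision, so the values will not match Elasticsearch output exactly.

```python
import math

def tf(freq):
    return math.sqrt(freq)

def idf(num_docs, doc_freq):
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def length_norm(num_terms):
    return 1.0 / math.sqrt(num_terms)

def term_score(freq, num_docs, doc_freq, field_length, boost=1.0):
    # One summand of the scoring sum: tf * idf^2 * boost * norm.
    return (tf(freq) * idf(num_docs, doc_freq) ** 2
            * boost * length_norm(field_length))

# Made-up index statistics: 1,000 documents in total.
rare = term_score(freq=1, num_docs=1000, doc_freq=5, field_length=10)
common = term_score(freq=1, num_docs=1000, doc_freq=500, field_length=10)
long_field = term_score(freq=1, num_docs=1000, doc_freq=5, field_length=100)
boosted = term_score(freq=1, num_docs=1000, doc_freq=5, field_length=10,
                     boost=2.0)
```

Comparing the four values reproduces the rules: the rare term scores higher than the common one, the short field scores higher than the long one, and doubling the boost doubles the contribution.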

Relevancy matters

In most cases, we want to get the best matching documents. However, the most relevant documents don’t always mean the same as the best matches. Some use cases define very strict rules on why a given document should be higher on the results list. For example, one could say that, in addition to the document being a perfect match in terms of TF/IDF similarity, we have paying customers to consider. Depending on the customer plan, we want to give more importance to such documents. In such cases, we could want the documents for the customers that pay the most to be on top of the search results. Of course, this is not relevant in TF/IDF.

The other example is yellow pages, where customers pay for more information describing the document. Such large documents may not be the most relevant ones according to TF/IDF, so you may want to adjust the scoring if you are working with such data.

These are very simple examples and Elasticsearch queries can become really complicated. We will talk about such queries in the Influencing scores with query boosts section in this chapter.

When working on search relevance, you should always remember that it is not a one-time process. Your data will change with time and your queries will need to be adjusted. In most cases, tuning the query relevancy will be constant work. You will need to react to your business rules and needs, to how the users behave, and so on. It is very important to remember that this process is not something you do once and can then forget about.

Scripting capabilities of Elasticsearch

Elasticsearch has a few functionalities where scripts can be used. You’ve already seen examples such as updating documents and searching. We will also use the scripting capabilities of Elasticsearch when we discuss aggregations. Even though scripts seem to be a rather advanced topic, we will look at the possibilities offered by Elasticsearch. That’s because scripts are priceless in certain situations.

Elasticsearch can use several languages for scripting. When not explicitly declared, it assumes that Groovy (www.groovy-lang.org/) is used. Other languages available out of the box are the Lucene expression language and Mustache (https://mustache.github.io/). Of course, we can use plugins, which will make Elasticsearch understand additional scripting languages, such as JavaScript, MVEL, and Python. The thing worth mentioning is that, independent of the scripting language that we choose, Elasticsearch exposes objects that we can use in our scripts. Let’s start by briefly looking at what type of information we are allowed to use in our scripts.

Objects available during script execution

During different operations, Elasticsearch allows us to use different objects in our scripts. To develop a script that fits our use case, we should be familiar with these objects.

For example, during a search operation, the following objects are available:

_doc (also available as doc): This is an instance of the org.elasticsearch.search.lookup.LeafDocLookup object. It gives us access to the current document found with the calculated score and field values.

_source: This is an instance of the org.elasticsearch.search.lookup.SourceLookup object. It provides access to the source of the current document and the values defined in the source.

_fields: This is an instance of the org.elasticsearch.search.lookup.LeafFieldsLookup object. It can be used to access the values of the document fields.

On the other hand, during a document update operation, the preceding variables are not accessible. Elasticsearch exposes only the ctx object with the _source property, which provides access to the document currently processed in the update request.

As we have previously seen, several methods are mentioned in the context of document fields and their values. Let’s now look at examples of how to get the value for a particular field using the previously mentioned objects available during the search operation. In the brackets after the script piece, you can see what Elasticsearch will return for one of our example documents from the library index (we will use the document with identifier 4):

_doc.title.value (and)

_source.title (crime and punishment)

_fields.title.value (null)

A bit confusing, isn’t it? During indexing, the original document is by default stored in the _source field. Of course, by default, all the fields are present in that _source field. In addition to that, the document is parsed and every field may be stored in an index if it is marked as stored (that is, if the store property is set to true; otherwise, by default, the fields are not stored). Finally, the field value may be configured as indexed. This means that the field value is analyzed and placed in the index. To sum up, one field may land in the Elasticsearch index in the following ways:

As a part of the _source document

As a stored and unparsed original value

As an indexed value that is processed by an analyzer

In scripts, we have access to all these field representations. The only exception is the update operation, which, as we’ve mentioned before, only gives us access to the document _source as part of the ctx variable. You may wonder which version you should use. Well, if you want access to the processed form, the answer will be simple: use the _doc object. What about _source and _fields? In most cases, _source is a good choice. It is usually fast and needs fewer disk operations than reading the original field values from the index. This is especially true when you need to read the values of multiple fields in your scripts; fetching a single _source field is faster than fetching multiple independent fields from the index.
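The three field representations, and the three different values returned for the title field, can be mimicked with a short Python sketch. It rests on two assumptions that are not spelled out in the text: the analyzer is reduced to lowercasing and splitting on whitespace, and the indexed terms of a field are exposed in sorted order, which would explain why the first value is "and". The index_document function is hypothetical.

```python
def analyze(text):
    # Simplified analyzer stand-in: lowercase and split on whitespace.
    return text.lower().split()

def index_document(source, stored_fields=()):
    return {
        # The original document, kept as a whole.
        "_source": dict(source),
        # Only fields explicitly marked as stored are kept separately.
        "_fields": {name: source[name] for name in stored_fields
                    if name in source},
        # Analyzed terms per field, assumed to be exposed in sorted order.
        "_doc": {name: sorted(set(analyze(value)))
                 for name, value in source.items()},
    }

doc = index_document({"title": "crime and punishment"})

doc_value = doc["_doc"]["title"][0]           # first analyzed term
source_value = doc["_source"]["title"]        # original text
stored_value = doc["_fields"].get("title")    # nothing, title is not stored
```

The three lookups return "and", the full original title, and no value at all, mirroring the results listed for the document with identifier 4.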

Script types

Elasticsearch allows us to use scripts in three different ways:

Inline scripts: The source of the script is directly defined in the query

In-file scripts: The source is defined in an external file placed in the Elasticsearch config/scripts directory

As a document in the dedicated index: The source of the script is defined as a document in a special index available by using the /_scripts API endpoint

Choosing the way to define scripts depends on several factors. If you have scripts which you will use in many different queries, the file or the dedicated index seem to be the best solutions. Scripts in files are probably less convenient, but they are preferred from the security point of view; they can’t be overwritten and injected into your query, causing a security breach.

In file scripts

This is the only way to allow dynamic scripting if we don’t want to enable query dynamic scripting in Elasticsearch. The idea is that every script used by the queries is defined in its own file placed in the config/scripts directory. We will now look at this method of using scripts. Let’s create an example file called tag_sort.groovy and place it in the config/scripts directory of our Elasticsearch instance (or instances if we run a cluster). The content of the mentioned file should look like this:

_doc.tags.values.size() > 0 ? _doc.tags.values[0] : '\u19999'

After a few seconds, Elasticsearch will automatically load the new file. You should see something like the following in the Elasticsearch logs:

[2015-08-30 13:14:33,005][INFO][script][Alex Wilder] compiling script file [/Users/negativ/Developer/ES/es-current/config/scripts/tag_sort.groovy]

Note

If you have a multi-node cluster, you have to make sure that the script is available on every node.

Now we are ready to use this script in our queries. You may remember that we used exactly the same script in the Sorting data section in Chapter 4, Extending Your Querying Knowledge. Now the modified query that uses our script stored in the file looks as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"match_all":{}

},

"sort":{

"_script":{

"script":{

"file":"tag_sort"

},

"type":"string",

"order":"asc"

}

}

}'

We will return to this, but first, let’s look at the next possible way of defining scripts: inline scripts.

Inline scripts

Inline scripts are a more convenient way of using scripts, especially for constantly changing queries and for ad-hoc queries. The main drawback of such an approach is security. If we allow users to run any kind of query, including scripts, we can expose our Elasticsearch instance to attackers. Such attacks can execute arbitrary code on the server running Elasticsearch with rights equal to the ones given to the user running Elasticsearch. In the worst case scenario, the attacker could use security holes to gain superuser rights. This is the reason why inline scripts are disabled by default. After careful consideration, you can enable them by adding:

script.inline: on

Add the preceding line to the elasticsearch.yml file.

After allowing the inline script to be executed, we can run a query that looks as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "query": {
  "match_all": {}
 },
 "sort": {
  "_script": {
   "script": {
    "inline": "_doc.tags.values.size() > 0 ? _doc.tags.values[0] : \"\u19999\""
   },
   "type": "string",
   "order": "asc"
  }
 }
}'

Indexed scripts

The last option for defining scripts is storing them in the dedicated Elasticsearch index. For the same security reasons, dynamic execution of the indexed scripts is disabled by default. To enable the indexed scripts, we have to add a configuration option similar to the one we added to be able to use the inline scripts. We need to add the following line to the elasticsearch.yml file:

script.indexed: on

After adding the preceding property to all the nodes and restarting the cluster, we will be ready to start using the indexed scripts. Elasticsearch provides an additional, dedicated endpoint for this purpose. Let’s store our script:

curl -XPOST 'localhost:9200/_scripts/groovy/tag_sort' -d '{
 "script": "_doc.tags.values.size() > 0 ? _doc.tags.values[0] : \"\u19999\""
}'

The script is ready, but let’s discuss what we just did. We sent an HTTP POST request to the special _scripts REST endpoint. We also specified the language of the script (groovy in our case) and the name of the script (tag_sort). The body of the request is the script itself.

We can now move on to the query, which looks as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"match_all":{}

},

"sort":{

"_script":{

"script":{

"id":"tag_sort"

},

"type":"string",

"order":"asc"

}

}

}'

As we see, the query is practically identical to the query used with the script defined in a file. The only difference is that we provided the identifier of the script using the id parameter instead of providing the file name.

Querying with scripts

If we look at any request made to Elasticsearch that uses scripts, we will notice some similar properties, which are as follows:

script: This property wraps the script definition.

inline: This property holds the code of the script itself.

id: This property defines the identifier of the indexed script.

file: The filename of the script without the extension.

lang: This property defines the language of the script. If it is omitted, Elasticsearch assumes groovy.

params: This object contains the parameters and their values. Every defined parameter can be used inside the script by specifying that parameter’s name. The parameters allow us to write cleaner code which will be executed in a more efficient manner. Scripts using the parameters are executed faster than code with embedded constants because of caching.
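The caching argument for params can be illustrated with a small Python emulation. It assumes that compiled scripts are cached by their source text, so a script with an embedded constant has to be compiled again for every new constant, while a parameterised script is compiled once. The run_script function and its cache are hypothetical and use Python expressions in place of Groovy.

```python
compiled_cache = {}
compile_count = 0

def run_script(source, params):
    """Evaluate an expression, compiling each distinct source only once."""
    global compile_count
    if source not in compiled_cache:
        compile_count += 1
        compiled_cache[source] = compile(source, "<script>", "eval")
    return eval(compiled_cache[source], {}, dict(params))

# Embedded constants: every new constant produces a new source to compile.
for value in ("000", "111", "222"):
    run_script(f"tags[0] if tags else '{value}'", {"tags": []})
constant_compiles = compile_count

# Parameterised script: one source, compiled once, reused with new params.
compile_count = 0
compiled_cache.clear()
for value in ("000", "111", "222"):
    run_script("tags[0] if tags else tvalue", {"tags": [], "tvalue": value})
param_compiles = compile_count
```

Three different constants cause three compilations, whereas the parameterised variant is compiled only once for the same three values.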

Scripting with parameters

As our scripts become more and more complicated, the need for creating multiple, almost identical scripts can appear. These scripts usually differ in the values used, with the logic behind them being exactly the same. In our simple example, we used a hardcoded value to mark documents with an empty tags list. Let’s change this to allow the value to be defined in the query. Let’s use the in-file script definition and create a tag_sort_with_param.groovy file with the following contents:

_doc.tags.values.size() > 0 ? _doc.tags.values[0] : tvalue

The only change we’ve made is the introduction of the parameter named tvalue, which can be set in the query in the following way:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"match_all":{}

},

"sort":{

"_script":{

"script":{

"file":"tag_sort_with_param",

"params":{

"tvalue":"000"

}

},

"type":"string",

"order":"asc"

}

}

}'

The params section defines all the script parameters. In our simple example, we’ve only used a single parameter, but of course we can have multiple parameters in a single query.
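What the tvalue parameter changes in the sort order can be tried out with a plain Python emulation of the script logic. The three documents are made up, and taking the smallest tag as the first value is an assumption about how the field values are exposed; the sketch only mirrors the expression from the tag_sort_with_param.groovy file.

```python
def tag_sort_key(doc, tvalue):
    # Mirrors the script: the first tag if there is one, otherwise tvalue.
    tags = sorted(doc.get("tags", []))
    return tags[0] if tags else tvalue

# Made-up documents standing in for the library index.
library = [
    {"title": "A", "tags": ["novel"]},
    {"title": "B", "tags": []},
    {"title": "C", "tags": ["classics", "crime"]},
]

# tvalue "000" sorts documents without tags to the front ...
untagged_first = sorted(library, key=lambda d: tag_sort_key(d, "000"))
# ... while the original "\u19999" constant pushes them to the end.
untagged_last = sorted(library, key=lambda d: tag_sort_key(d, "\u19999"))

first_order = [d["title"] for d in untagged_first]
last_order = [d["title"] for d in untagged_last]
```

With "000" the document without tags comes first, and with the original constant it comes last, without any change to the script itself.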

Script languages

As we already said, the default language for scripting is Groovy. However, we are not limited to only a single scripting language when using Elasticsearch. In fact, if you would like to, you can even use Java to write your scripts. In addition to that, the community behind Elasticsearch provides support for additional languages as plugins. So if you are willing to install plugins, you can extend the list of scripting languages that Elasticsearch supports even further. You may wonder why you would even consider using a scripting language other than the default Groovy. The first reason is your own preferences. If you are a Python enthusiast, you are probably now thinking about how to use Python for your Elasticsearch scripts. The other reason could be security. When we talked about the inline scripts, we told you that they are turned off by default. This is not exactly true for all the scripting languages available out of the box. The inline scripts are disabled by default when using Groovy, but you can use Lucene expressions and Mustache without any issues. This is because those languages are sandboxed, which means that the security-sensitive functions are turned off. And of course, the last factor when choosing a language is performance. Theoretically, the native scripts (in Java) should have better performance than others, but you should remember that the difference can be insignificant. You should always consider the cost of development and measure performance.

Using other than embedded languages

Using Groovy for scripting is a simple and sufficient solution for most use cases. However, you may have a different preference and you may like to use something different, such as JavaScript, Python, or MVEL. Before using other languages, we must install an appropriate plugin. You can read more details about plugins in the Elasticsearch plugins section of Chapter 9, Elasticsearch Cluster in Detail. For now, we’ll just run the following command from the Elasticsearch directory:

bin/plugin install lang-javascript

The preceding command will install a plugin that will allow the usage of JavaScript as the scripting language. The only change we should make in the request is to add the additional information about the language we are using for scripting and, of course, modify the script itself to correctly use the new language. Look at the following example:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "query": {
  "match_all": {}
 },
 "sort": {
  "_script": {
   "script": {
    "inline": "_doc.tags.values.length > 0 ? _doc.tags.values[0] : \"\u19999\";",
    "lang": "javascript"
   },
   "type": "string",
   "order": "asc"
  }
 }
}'

As you can see, we’ve used JavaScript for scripting instead of the default Groovy. The lang parameter informs Elasticsearch about the language being used.

Using native code

In case the scripts are too slow or you don't like scripting languages, Elasticsearch allows you to write Java classes and use them instead of scripts. There are two possible ways of adding native scripts: adding classes defining scripts to the Elasticsearch classpath, or adding a script as a functionality provided by a plugin. We will describe the second solution as it is more elegant.

The factory implementation

We need to implement at least two classes to create a new native script. The first one is a factory for our script. For now, let's focus on it. The following sample code illustrates the factory for our script:

package pl.solr.elasticsearch.examples.scripts;

import java.util.Map;
import org.elasticsearch.common.Nullable;
import org.elasticsearch.script.ExecutableScript;
import org.elasticsearch.script.NativeScriptFactory;

public class HashCodeSortNativeScriptFactory implements NativeScriptFactory {
  @Override
  public ExecutableScript newScript(@Nullable Map<String, Object> params) {
    return new HashCodeSortScript(params);
  }

  @Override
  public boolean needsScores() {
    return false;
  }
}

The essential parts are the implemented interface and its two methods. This class should implement the org.elasticsearch.script.NativeScriptFactory interface, which forces us to implement two methods. The newScript() method takes the parameters defined in the API call and returns an instance of our script. Finally, needsScores() informs Elasticsearch whether we want to use scoring and whether it should be calculated.

Implementing the native script

Now let's look at the implementation of our script. The idea is simple: our script will be used for sorting. Documents will be ordered by the hashCode() value of the chosen field. The documents without a value in the defined field will be first on the results list. We know the logic doesn't make too much sense, but it is good for presentation as it is simple. The source code for our native script looks as follows:

package pl.solr.elasticsearch.examples.scripts;

import java.util.Map;
import org.elasticsearch.script.AbstractSearchScript;

public class HashCodeSortScript extends AbstractSearchScript {
  private String field = "name";

  public HashCodeSortScript(Map<String, Object> params) {
    if (params != null && params.containsKey("field")) {
      this.field = params.get("field").toString();
    }
  }

  @Override
  public Object run() {
    Object value = source().get(field);
    if (value != null) {
      return value.hashCode();
    }
    return 0;
  }
}

First of all, our class inherits from the org.elasticsearch.script.AbstractSearchScript class and implements the run() method. This is where we get the appropriate values from the current document, process them according to our strange logic, and return the result. You may notice the source() call. It is exactly the same _source parameter that we used when dealing with non-native scripts. The doc() and fields() methods are also available and they follow the same logic we described earlier.

The thing worth looking at is how we've used the parameters. We assume that a user can pass the field parameter, telling us which document field will be used for manipulation. We also provide a default value for this parameter.
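To see what ordering this logic produces without starting a cluster, the same computation can be sketched in plain Java over in-memory maps. This is only an illustration: the HashCodeSortDemo class, its method names, and the sample documents are made up for this sketch and are not part of Elasticsearch or of the plugin.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone demo; it does not use any Elasticsearch classes.
class HashCodeSortDemo {

    // Same rule as run() in the native script: hashCode() of the field value, or 0 when it is missing.
    static int sortValue(Map<String, Object> source, String field) {
        Object value = source.get(field);
        return value != null ? value.hashCode() : 0;
    }

    // Returns a copy of the documents ordered ascending by the computed sort value.
    static List<Map<String, Object>> sortByHash(List<Map<String, Object>> docs, String field) {
        List<Map<String, Object>> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingInt((Map<String, Object> d) -> sortValue(d, field)));
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Object> withTitle = new HashMap<>();
        withTitle.put("otitle", "abc");
        Map<String, Object> withoutTitle = new HashMap<>();
        withoutTitle.put("title", "no original title");

        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(withTitle);
        docs.add(withoutTitle);

        for (Map<String, Object> doc : sortByHash(docs, "otitle")) {
            System.out.println(sortValue(doc, "otitle") + " " + doc);
        }
    }
}
```

One detail worth keeping in mind: String.hashCode() can return negative numbers, so documents without the field (sort value 0) come first only when the other values hash to non-negative numbers.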

The plugin definition

We said that we will install our script as a part of a plugin. This is why we need additional files. The first file is the plugin initialization class where we tell Elasticsearch about our new script:

package pl.solr.elasticsearch.examples.scripts;

import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.script.ScriptModule;

public class ScriptPlugin extends Plugin {
  @Override
  public String description() {
    return "The example of native sort script";
  }

  @Override
  public String name() {
    return "naive-sort-plugin";
  }

  public void onModule(final ScriptModule module) {
    module.registerScript("native_sort",
      HashCodeSortNativeScriptFactory.class);
  }
}

The implementation is easy. The description() and name() methods are only for information, so let's focus on the onModule() method. In our case, we need access to the script module, the Elasticsearch service with scripts and scripting languages. This is why we define onModule() with one ScriptModule argument. Thanks to Elasticsearch magic, we can use this module and register our script so it can be found by the engine. We have used the registerScript() method, which takes the script name and the previously defined factory class.

The second needed file is a plugin descriptor file: plugin-descriptor.properties. It defines the constants used by the Elasticsearch plugin subsystem. Without further ado, let's look at the contents of this file:

jvm=true
classname=pl.solr.elasticsearch.examples.scripts.ScriptPlugin
elasticsearch.version=2.2.0
version=0.0.1-SNAPSHOT
name=native_script
description=Example Native Scripts
java.version=1.7

The appropriate lines have the following meaning:

jvm: tells Elasticsearch that our file contains Java code
classname: describes the main class with the plugin definition
elasticsearch.version and java.version: specify the Elasticsearch version that is supported by the plugin and the Java version that is needed
name and description: an informative name and a short description of our plugin

And that's it. We have all the files needed to run our script. Please note that you can have more than a single script packed as a single plugin.

Installing the plugin

Now it's time to install our native script embedded in the plugin. After packing the compiled classes as a JAR archive, we should put it in the Elasticsearch plugins/native-script directory. The native-script part is a root directory for our plugin and you may name it as you wish. In this directory you also need the prepared plugin-descriptor.properties file. This makes our plugin visible to Elasticsearch.

Running the script

After restarting Elasticsearch (or the whole cluster if you run more than a single node), we can start sending the queries that use our native script. For example, we will send a query that uses our previously indexed data from the library index. This example query looks as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"match_all":{}

},

"sort":{

"_script":{

"script":{

"script":"native_sort",

"lang":"native",

"params":{

"field":"otitle"

}

},

"type":"string",

"order":"asc"

}

}

}'

Note the params part of the query. In this call, we want to sort on the otitle field. We provide the script name native_sort and the script language native. This is required. If everything goes well, we should see our results sorted by our custom sort logic. If we look at the response from Elasticsearch, we will see that the documents without the otitle field are at the first few positions of the results list and their sort value is 0.

Searching content in different languages

Until now, when discussing language analysis, we've talked mostly about theory. We didn't see an example regarding language analysis, handling multiple languages that our data can consist of, and so on. Now this will change, as this section is dedicated to information about how we can handle data in multiple languages.

Handling languages differently

As you already know, Elasticsearch allows us to choose different analyzers for our data. We can have our data divided on the basis of whitespaces, or have it lowercased, and so on. This can usually be done regardless of the language: the same tokenization on the basis of whitespaces will work for English, German, and Polish, although it won't work for Chinese. However, what if you want to find documents that contain words such as cat and cats by only sending the word cat to Elasticsearch? This is where language analysis comes into play with stemming algorithms for different languages, which allow the analyzed words to be reduced to their root forms. And now the worst part: we can't use one general stemming algorithm for all the languages in the world; we have to choose one appropriate for the language. The following sections in the chapter will help you with some parts of the language analysis process.

Handling multiple languages

There are a few ways of handling multiple languages in Elasticsearch and all of them have some pros and cons. We won't be discussing everything, but just for the purpose of giving you an idea, a few of those methods are as follows:

Storing documents in different languages as different types
Storing documents in different languages in separate indices
Storing language data in different fields of a single document

For the purpose of the book, we will focus on a single method, the one that allows storing documents in different languages in a single index. We will focus on a problem where we have a single type of document, but each document may come from anywhere in the world and thus can be written in multiple languages. Also, we would like to enable our users to use all the analysis capabilities, such as stemming and stop words for different languages, not only for English.

Note

Note that the stemming algorithms perform differently for different languages, both in terms of analysis performance and the resulting terms. For example, English stemmers are very good, but you can run into issues with other European languages, such as German.

Detecting the language of the document

Before we continue with showing you how to solve our problem with handling multiple languages in Elasticsearch, we would like to tell you about one additional thing, that is, language detection. There are situations where you just don't know what language your document or query is in. In such cases, language detection libraries may be a good choice, especially when using Java as your programming language of choice. Some of the libraries are as follows:

Apache Tika (http://tika.apache.org/)
Language detection (https://github.com/shuyo/language-detection)

The language detection library claims to have over 99 percent precision for 53 languages; that's a lot if you ask us.

You should remember, though, that language detection will be more precise for longer text. Because the text of queries is usually short, you can expect to have some degree of error during query language identification.

Sample document

Let's start with introducing a sample document, which is as follows:

{
  "title" : "First test document",
  "content" : "This is a test document"
}

As you can see, the document is pretty simple; it contains the following two fields:

title: This field holds the title of the document
content: This field holds the actual content of the document

This document is quite simple, but, from the search point of view, the information about the document language is missing. What we should do is enrich the document by adding the needed information. We can do that by using one of the previously mentioned libraries, which will try to detect the language.

After we have the language detected, we inform Elasticsearch which analyzer should be used and modify the document to directly show the language of each field. Each of the fields would have to be analyzed by a language analyzer dedicated to the detected language.

Note

A full list of these language analyzers can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html.

If a document is written in a language that we are not supporting, we will just fall back to some default field with the default analyzer. For example, our document, processed and prepared for indexing, could look like this:

{
  "title_english" : "First test document",
  "content_english" : "This is a test document"
}

The thing is that all this processing we've mentioned would have to be done outside of Elasticsearch or in some kind of custom plugin that would implement the mentioned logic.
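The renaming step that would live outside of Elasticsearch can be sketched as follows. This is a minimal illustration under stated assumptions: the LanguageFieldMapper class and the SUPPORTED set are hypothetical, and the detected language is passed in as a plain string, because the call into a detection library is deliberately left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical pre-indexing helper; not part of Elasticsearch or of any detection library.
class LanguageFieldMapper {

    // Assumption: the languages for which our index defines dedicated analyzers.
    static final Set<String> SUPPORTED =
            new HashSet<>(Arrays.asList("english", "russian", "german"));

    // Appends "_<language>" to every field name; unsupported or unknown languages
    // keep the original (default) field names.
    static Map<String, Object> toIndexedForm(Map<String, Object> doc, String detectedLanguage) {
        boolean supported = detectedLanguage != null && SUPPORTED.contains(detectedLanguage);
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : doc.entrySet()) {
            String name = supported ? entry.getKey() + "_" + detectedLanguage : entry.getKey();
            result.put(name, entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "First test document");
        doc.put("content", "This is a test document");

        System.out.println(toIndexedForm(doc, "english"));
        System.out.println(toIndexedForm(doc, "klingon"));
    }
}
```

With "english" as the detected language, the field names become title_english and content_english; with an unsupported language, the document is passed through unchanged so that the default fields are used.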

Note

In the previous versions of Elasticsearch, there was a possibility of choosing an analyzer based on the value of an additional field, which contained the analyzer name. This was a more convenient and elegant way but introduced some uncertainty about the field contents. You always had to deliver a proper analyzer when using the given field or strange things happened. The Elasticsearch team made the difficult decision and removed this feature.

There is also a simpler way: we can take our first document and index it in several ways independently of the input language. Let's focus on this solution.

The mappings

To handle our solution, which will process the document using several defined languages, we need new mappings. Let's look at the mappings we've created to index our documents (we've stored them in the mappings.json file):

{

"mappings":{

"doc":{

"properties":{

"title":{

"type":"string",

"index":"analyzed",

"fields":{

"english":{

"type":"string",

"index":"analyzed",

"analyzer":"english"

},

"russian":{

"type":"string",

"index":"analyzed",

"analyzer":"russian"

},

"german":{

"type":"string",

"index":"analyzed",

"analyzer":"german"

}

}

},

"content":{

"type":"string",

"index":"analyzed",

"fields":{

"english":{

"type":"string",

"index":"analyzed",

"analyzer":"english"

},

"russian":{

"type":"string",

"index":"analyzed",

"analyzer":"russian"

},

"german":{

"type":"string",

"index":"analyzed",

"analyzer":"german"

}

}

}

}

}

}

}

In the preceding mappings, we've shown the definition for the title and content fields (if you are not familiar with any aspect of the mappings definition, refer to the Mappings configuration section of Chapter 2, Indexing Your Data). We have used the multifield feature of Elasticsearch: each field can be indexed in several ways using various language analyzers (in our example, those analyzers are English, Russian, and German).

In addition, the base field uses the default analyzer, which we may use at query time when the language is unknown. So, each field will actually have four fields: the default one and three language-oriented fields.

In order to create a sample index called docs that uses our mappings, we will use the following command:

curl -XPUT 'localhost:9200/docs' -d @mappings.json

Querying

Now let's see how we can query our data to use the newly created language fields. We can divide the querying situation into two different cases. Of course, to start querying we need documents. Let's index our example document by running the following command:

curl -XPOST 'localhost:9200/docs/doc/1' -d '{"title" : "First test document","content" : "This is a test document"}'

Queries with an identified language

The first case is when we have our query language identified. Let's assume that the identified language is English. In such cases, our query is as follows:

curl 'localhost:9200/docs/_search?pretty' -d '{

"query":{

"match":{

"content.english":"documents"

}

}

}'

The thing to put emphasis on in the preceding query is the field used for querying and the query type. The field used is content.english, which also indicates which analyzer we want to use. We used that field because we had identified our language before running the query. Thanks to this, the English analyzer can find our document even if we have the singular form of the word in the document. The response returned by Elasticsearch will be as follows:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.19178301,
    "hits" : [ {
      "_index" : "docs",
      "_type" : "doc",
      "_id" : "1",
      "_score" : 0.19178301,
      "_source" : {
        "title" : "First test document",
        "content" : "This is a test document"
      }
    } ]
  }
}

The thing to note is also the query type: the match query. We used the match query because it analyzes its body with the analyzer used by the field that it is run against. We need that to properly match the data in the query and the data in the index.

Queries with an unknown language

Now let's look at the second situation: handling queries when we couldn't identify the language of the query. In such cases, we can't use the field name pointing to one of the languages, such as content.german. Instead, we use the default field, which uses the default analyzer, and we send the query to the content field. The query will look as follows:

curl 'localhost:9200/docs/_search?pretty' -d '{

"query":{

"match":{

"content":"documents"

}

}

}'

However, we didn't get any results this time because the default analyzer can't deal with a singular form of a word when we are searching with a plural form.
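The difference between the two queries can be imitated with two toy analysis chains. The following sketch is only a rough model: the ToyAnalyzers class is hypothetical, its "standard" chain merely lowercases and splits on non-alphanumeric characters, and its "English" chain strips a trailing s instead of running a real stemmer. Neither reproduces the actual Lucene analyzers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical toy analyzers; real Elasticsearch analyzers are far more sophisticated.
class ToyAnalyzers {

    // Rough stand-in for the default analyzer: lowercase and split on non-alphanumeric characters.
    static List<String> standard(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Rough stand-in for a stemming analyzer: additionally strips one trailing "s".
    static List<String> toyEnglish(String text) {
        List<String> terms = new ArrayList<>();
        for (String term : standard(text)) {
            boolean plural = term.length() > 3 && term.endsWith("s") && !term.endsWith("ss");
            terms.add(plural ? term.substring(0, term.length() - 1) : term);
        }
        return terms;
    }

    // A document matches when at least one query term is among its indexed terms.
    static boolean matches(List<String> indexedTerms, List<String> queryTerms) {
        for (String term : queryTerms) {
            if (indexedTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String content = "This is a test document";
        String query = "documents";
        System.out.println("content.english: " + matches(toyEnglish(content), toyEnglish(query)));
        System.out.println("content:         " + matches(standard(content), standard(query)));
    }
}
```

In this model, the language-aware chain reduces both document and documents to the same term, so the query matches, while the plain chain keeps them as two different terms and finds nothing.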

Combining queries

To additionally boost the documents that perfectly match with our default analyzer, we can combine the two preceding queries with the bool query. Such a combined query will look as follows:

curl -XGET 'localhost:9200/docs/_search?pretty=true' -d '{

"query":{

"bool":{

"minimum_should_match":1,

"should":[

{

"match":{

"content.english":"documents"

}

},

{

"match":{

"content":"documents"

}

}

]

}

}

}'

For the document to be returned, at least one of the defined queries must match. If they both match, the document will have a higher score value and will be placed higher in the results.

There is one additional advantage of the preceding combined query. If our language analyzer doesn't find a document (for example, when the analysis is different from the one used during indexing), the second query has a chance to find the terms that are only tokenized on whitespace characters and lowercased.

Influencing scores with query boosts

In the beginning of this chapter, we learned what scoring is and how Elasticsearch uses the scoring formula. When an application grows, the need for improving the quality of search, which we call search experience, also increases. We need to gain knowledge about what is more important to the user, and we see how the users use the search functionality. This leads to various conclusions; for example, we see that some parts of the documents are more important than others or that particular queries emphasize one field at the cost of others. We need to include such information in our data and queries so that both sides of the scoring equation are closer to our business needs. This is where boosting can be used.

The boost

Boost is an additional value used in the process of scoring. We already know it can be applied to:

Query: When used, we inform the search engine that the given query is a part of a complex query and is more significant than the other parts.
Document: When used during indexing, we tell Elasticsearch that a document is more important than the others in the index. For example, when indexing blog posts, we are probably more interested in the posts themselves than pingbacks or comments.

Values assigned by us to a query or a document are not the only factors used when we calculate the resulting score, and we know that. We will now look at a few examples of query boosting.

Adding the boost to queries

Let's imagine that our index has two documents and we've used the following commands to index them:

curl -XPOST 'localhost:9200/messages/email/1' -d '{
  "id" : 1,
  "to" : "John Smith",
  "from" : "David Jones",
  "subject" : "Top secret!"
}'

curl -XPOST 'localhost:9200/messages/email/2' -d '{
  "id" : 2,
  "to" : "David Jones",
  "from" : "John Smith",
  "subject" : "John, read this document"
}'

This data is trivial, but it should describe our problem very well. Now let's assume we have the following query:

curl -XGET 'localhost:9200/messages/_search?pretty' -d '{

"query":{

"query_string":{

"query":"john",

"use_dis_max":false

}

}

}'

In this case, Elasticsearch will create a query to the _all field and will find documents that contain the desired words. We also said that we don't want the disjunction query to be used by setting the use_dis_max parameter to false (if you don't remember what a disjunction query is, refer to the The dis_max query section in Chapter 3, Searching Your Data). As we can easily guess, both our records will be returned. The record with the identifier equal to 2 will be first because the word John occurs two times: once in the from field and once in the subject field. Let's check this out in the following result:

"hits" : {
  "total" : 2,
  "max_score" : 0.13561106,
  "hits" : [ {
    "_index" : "messages",
    "_type" : "email",
    "_id" : "2",
    "_score" : 0.13561106,
    "_source" : {
      "id" : 2,
      "to" : "David Jones",
      "from" : "John Smith",
      "subject" : "John, read this document"
    }
  }, {
    "_index" : "messages",
    "_type" : "email",
    "_id" : "1",
    "_score" : 0.11506981,
    "_source" : {
      "id" : 1,
      "to" : "John Smith",
      "from" : "David Jones",
      "subject" : "Top secret!"
    }
  } ]
}

Is everything all right? Technically, yes. But we think that the second document on the list (the one with identifier 1) should be positioned as the first one, because when searching for something, the most important factor (in many cases) is matching people rather than the subject of the message. You can disagree, but this is exactly why full-text searching relevance is a difficult topic; sometimes it is hard to tell which ordering is better for a particular case. What can we do? First, let's rewrite our query to explicitly inform Elasticsearch what fields should be used for searching:

curl -XGET 'localhost:9200/messages/_search?pretty' -d '{

"query":{

"query_string":{

"fields":["from","to","subject"],

"query":"john",

"use_dis_max":false

}

}

}'

This is not exactly the same query as the previous one. If we run it, we will get the same results (in our case). However, if you look carefully, you will notice differences in scoring. In the previous example, Elasticsearch only used one field, that is, the default _all field. The query that we are using now is using three fields for matching. This means that several factors, such as field lengths, are changed. Anyway, this is not so important in our case. Elasticsearch under the hood generates a complex query made of three queries, one for each field. Of course, the score contributed by each query depends on the number of terms found in this field and the length of this field.

Let's introduce some differences between the fields and their importance. Compare the following query to the last one:

curl -XGET 'localhost:9200/messages/_search?pretty' -d '{

"query":{

"query_string":{

"fields":["from^5","to^10","subject"],

"query":"john",

"use_dis_max":false

}

}

}'

Look at the ^5 and ^10 parts. By using that notation (the ^ character followed by a number), we can inform Elasticsearch how important a given field is. We see that the most important field is the to field (because of the highest boost value). Next we have the from field, which is less important. The subject field has the default value for boost, which is 1.0, and is the least important field when it comes to score calculation. Always remember that this value is only one of the various factors. You may be wondering why we chose 5 and not 1000 or 1.23. Well, this value depends on the effect we want to achieve, what query we have, and, most importantly, what data we have in our index. Typically, when data changes in the meaningful parts, we should probably check and tune our relevance once again.
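The effect of these boosts on our two e-mails can be approximated with a deliberately simplified model in which every field containing the term contributes exactly its boost. The FieldBoostDemo class below is hypothetical, and the model ignores everything else Lucene takes into account (term frequency, inverse document frequency, field length norms), so only the resulting order is meaningful, not the numbers.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, heavily simplified scoring model: each matching field contributes its boost.
class FieldBoostDemo {

    static double score(Map<String, String> doc, String term, Map<String, Double> boosts) {
        double total = 0.0;
        for (Map.Entry<String, Double> boost : boosts.entrySet()) {
            String value = doc.get(boost.getKey());
            // Toy matching: case-insensitive substring check instead of real analysis.
            if (value != null && value.toLowerCase(Locale.ROOT).contains(term)) {
                total += boost.getValue();
            }
        }
        return total;
    }

    static Map<String, String> email(String to, String from, String subject) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("to", to);
        doc.put("from", from);
        doc.put("subject", subject);
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> doc1 = email("John Smith", "David Jones", "Top secret!");
        Map<String, String> doc2 = email("David Jones", "John Smith", "John, read this document");

        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("from", 5.0);
        boosts.put("to", 10.0);
        boosts.put("subject", 1.0);

        System.out.println("doc 1: " + score(doc1, "john", boosts));
        System.out.println("doc 2: " + score(doc2, "john", boosts));
    }
}
```

In this model, document 1 collects 10 from the to field, while document 2 collects 5 + 1 from the from and subject fields, which is why document 1 moves to the top once the boosts are applied.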

In the end, let's look at a similar example, but using the bool query:

curl -XGET 'localhost:9200/messages/_search?pretty' -d '{

"query":{

"bool":{

"should":[

{"term":{"from":{"value":"john","boost":5}}},

{"term":{"to":{"value":"john","boost":10}}},

{"term":{"subject":{"value":"john"}}}

]

}

}

}'

The preceding query will yield the same results, which means that the first document on the results list will be the one with the identifier 1, but the scores will be slightly different. This is because the Lucene queries made from the last two examples are slightly different and thus the scores are different.

Modifying the score

The preceding example shows how to affect the result list by boosting particular query components, the fields. Another technique is to run a query and affect the score of the matched documents. In the following sections, we will summarize the possibilities offered by Elasticsearch. In the examples, we will use our library data that we have already used in the previous chapters.

Constant score query

A constant_score query allows us to take any query and explicitly set the score that will be given to each matching document by using the boost parameter.

At first, this query doesn't seem to be practical. But when we think about building complex queries, it lets us control how much the documents matching this query can affect the total score. Look at the following example:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
  "query" : {
    "constant_score" : {
      "query" : {
        "query_string" : {
          "query" : "available:false author:heller"
        }
      }
    }
  }
}'

In our data, we have two documents with the available field set to false. One of these documents has an additional value in the author field. If we used a different query, the document with an additional value in the author field (a book with identifier 2) would be given a higher score, but, thanks to the constant score query, Elasticsearch will ignore that information during scoring. Both documents will be given a score equal to 1.0.

Boosting query

The next type of query that can be used with boosting is the boosting query. The idea is to allow us to define a part of the query which will cause matched documents to have their scores lowered. The following example returns all the available books (available field set to true), but the books written by E. M. Remarque will have a negative boost of 0.1 (which means about ten times lower score):

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
  "query" : {
    "boosting" : {
      "positive" : {
        "term" : {
          "available" : true
        }
      },
      "negative" : {
        "match" : {
          "author" : "remarque"
        }
      },
      "negative_boost" : 0.1
    }
  }
}'
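The arithmetic behind the boosting query is small enough to write down. The sketch below assumes the commonly described behavior, in which a document matching the negative part has its positive score multiplied by negative_boost; the BoostingQueryDemo class and its method are hypothetical names used only for this illustration.

```java
// Hypothetical illustration of how negative_boost demotes a matching document.
class BoostingQueryDemo {

    // positiveScore: score from the "positive" query
    // matchesNegative: whether the document also matches the "negative" query
    // negativeBoost: the negative_boost value, for example 0.1
    static double score(double positiveScore, boolean matchesNegative, double negativeBoost) {
        return matchesNegative ? positiveScore * negativeBoost : positiveScore;
    }

    public static void main(String[] args) {
        double available = 1.0; // assumed score of the term query on the available field
        System.out.println("other author: " + score(available, false, 0.1));
        System.out.println("remarque:     " + score(available, true, 0.1));
    }
}
```

Documents matching the negative part are still returned; they are only pushed down the results list.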

The function score query

Till now we've seen two examples of queries that allowed us to alter the score of the returned documents. The third example we wanted to talk about, the function_score query, is way more complicated than the previously discussed queries. The function_score query is very useful when the score calculation is more complicated than giving a single boost to all the documents; boosting more recent documents is an example of a perfect use case for the function_score query.

Structure of the function query

The structure of the function query is quite simple and looks as follows:

{

"query":{

"function_score":{

"query":{...},

"functions":[

{

"filter":{...},

"FUNCTION":{...}

}

],

"boost_mode":"...",

"score_mode":"...",

"max_boost":"...",

"min_score":"...",

"boost":"..."

}

}

}

In general, the function score query can use a query, one or more functions, and additional parameters. Each function can have a filter defined to select the documents to which it will be applied. If no filter is given for a function, it will be applied to all the documents.

The logic behind the function score query is quite simple. First of all, the functions are matched against the documents and their scores are combined according to score_mode. After that, the query score for the document is combined with the score calculated by the functions on the basis of boost_mode.

Let's now discuss the parameters:

Boost mode: The boost_mode parameter allows us to define how the score computed by the functions will be combined with the score of the query. The following values are allowed:

multiply: The default behavior, which results in the query score being multiplied by the score computed from the functions
replace: The query score will be totally ignored and the document score will be equal to the score calculated by the functions
sum: The document score will be calculated as the sum of the query and the function scores
avg: The score of the document will be an average of the query score and the function score
max: The document will be given the maximum of the query score and the function score
min: The document will be given the minimum of the query score and the function score

Score mode: The score_mode parameter defines how the scores computed by the functions are combined together. The following score_mode parameter values are defined:

multiply: The default behavior, which results in the scores returned by the functions being multiplied
sum: The scores returned by the defined functions are summed
avg: The score returned by the functions is an average of all the scores of the matching functions
first: The score of the first function with a filter matching the document is returned
max: The maximum score of the functions is returned
min: The minimum score of the functions is returned

There is one thing to remember: we can limit the maximum calculated score value by using the max_boost parameter in the function score query. By default, that parameter is set to Float.MAX_VALUE, which means the maximum float value.
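The two-step combination described above (score_mode first, then max_boost, then boost_mode) can be sketched in plain Java. Treat this as a simplified model: the FunctionScoreDemo class is hypothetical, function weights are ignored in the avg mode, and returning a neutral 1.0 when no function matches is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of score_mode, max_boost, and boost_mode; not Elasticsearch code.
class FunctionScoreDemo {

    // Combines the scores of the functions that matched a document (score_mode).
    static double combineFunctions(List<Double> scores, String scoreMode) {
        if (scores.isEmpty()) {
            return 1.0; // assumption: no matching function leaves the score unchanged
        }
        double result = scores.get(0);
        switch (scoreMode) {
            case "first":
                return result;
            case "multiply":
                for (int i = 1; i < scores.size(); i++) result *= scores.get(i);
                return result;
            case "sum":
            case "avg":
                for (int i = 1; i < scores.size(); i++) result += scores.get(i);
                return scoreMode.equals("avg") ? result / scores.size() : result;
            case "max":
                for (double s : scores) result = Math.max(result, s);
                return result;
            case "min":
                for (double s : scores) result = Math.min(result, s);
                return result;
            default:
                throw new IllegalArgumentException("Unknown score_mode: " + scoreMode);
        }
    }

    // Caps the function score with max_boost and merges it with the query score (boost_mode).
    static double combineWithQuery(double queryScore, double functionScore,
                                   String boostMode, double maxBoost) {
        double capped = Math.min(functionScore, maxBoost);
        switch (boostMode) {
            case "multiply": return queryScore * capped;
            case "replace":  return capped;
            case "sum":      return queryScore + capped;
            case "avg":      return (queryScore + capped) / 2.0;
            case "max":      return Math.max(queryScore, capped);
            case "min":      return Math.min(queryScore, capped);
            default:
                throw new IllegalArgumentException("Unknown boost_mode: " + boostMode);
        }
    }

    public static void main(String[] args) {
        List<Double> functionScores = Arrays.asList(2.0, 3.0, 4.0);
        double combined = combineFunctions(functionScores, "multiply");
        System.out.println("function score: " + combined);
        System.out.println("final score:    " + combineWithQuery(0.5, combined, "multiply", 10.0));
    }
}
```

With function scores of 2, 3, and 4 multiplied together, the function score of 24 is capped at the max_boost of 10 before being multiplied by the query score.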

The boost parameter allows us to set a query-wide boost for the documents.

Of course, there is one thing we should remember: the calculated score doesn't affect which documents match the query. Because of that, the min_score property has been introduced. It allows us to define the minimum score of the documents. Documents that have a score lower than the min_score property will be excluded from the results.

What we haven't talked about yet are the functions that we can include in the functions section of our query. The currently available functions are:

weight factor
field value factor
script score
random
decay

The weight factor function

The weight factor function allows us to multiply the score of the document by a given value. The value of the weight parameter is not normalized and is taken as is. An example using the weight function, where we multiply the score of the document by 20, looks as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"function_score":{

"query":{

"term":{

"available":true

}

},

"functions":[

{"weight":20}

]

}

}

}'

Field value factor function

The field_value_factor function allows us to influence the score of the document by using a value of the field in that document. For example, to multiply the score of the document by the value of the year field, we run the following query:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"function_score":{

"query":{

"term":{

"available":true

}

},

"functions":[

{

"field_value_factor":{

"field":"year",

"missing":1

}

}

]

}

}

}'

In addition to choosing the field whose value should be used, we can also control the behavior of the field value factor function by using the following properties:

factor: The multiplication factor that will be used along with the field value. It defaults to 1.
modifier: The modifier that will be applied to the field value. It defaults to none. It can take the value of log, log1p, log2p, ln, ln1p, ln2p, square, sqrt, and reciprocal.
missing: The value that should be used when a document doesn't have any value in the field specified in the field property.
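To make the interplay of factor, modifier, and missing more tangible, the following Python sketch mimics the calculation outside of Elasticsearch. It assumes that the function value is modifier(factor * field value), that the log modifiers use base 10 and the ln modifiers the natural logarithm, and that factor and modifier are also applied to the missing value. The function name and signature are hypothetical and are not part of any Elasticsearch client.

```python
import math

# Modifier names as listed above. Assumption: log* are base 10, ln* are natural logarithms.
MODIFIERS = {
    "none": lambda x: x,
    "log": math.log10,
    "log1p": lambda x: math.log10(1 + x),
    "log2p": lambda x: math.log10(2 + x),
    "ln": math.log,
    "ln1p": lambda x: math.log(1 + x),
    "ln2p": lambda x: math.log(2 + x),
    "square": lambda x: x * x,
    "sqrt": math.sqrt,
    "reciprocal": lambda x: 1.0 / x,
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Hypothetical helper: value the function contributes for one document."""
    if value is None:
        if missing is None:
            raise ValueError("document has no value and no 'missing' was given")
        value = missing  # fall back to the configured default
    return MODIFIERS[modifier](factor * value)


# A book published in 1961, scored with the defaults (factor 1, modifier none).
print(field_value_factor(1961))
# A document without the year field, using "missing": 1 as in the query above.
print(field_value_factor(None, missing=1))
```

Large raw values such as a year dominate the query score, which is why a dampening modifier such as log1p or sqrt is often a reasonable choice.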

The script score function

The script_score function allows us to use a script to calculate the value returned by the function (which is then combined with the query score according to the boost_mode parameter). An example of script_score usage is as follows (for the following example to work, inline scripting needs to be allowed, which means adding the script.inline property and setting it to on in elasticsearch.yml):

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"function_score":{

"query":{

"term":{

"available":true

}

},

"functions":[

{

"script_score":{

"script":{

"inline":"_score*_source.copies*parameter1",

"params":{

"parameter1":12

}

}

}

}

]

}

}

}'

The random score function

By using the random_score function, we can generate a pseudorandom score by specifying a seed. In order to simulate randomness, we should specify a new seed every time. The random number will be generated by using the _uid field and the provided seed. If a seed is not provided, the current timestamp will be used. An example of using this is as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"function_score":{

"query":{

"term":{

"available":true

}

},

"functions":[

{

"random_score":{

"seed":12345

}

}

]

}

}

}'

Decay functions

In addition to the earlier mentioned scoring functions, Elasticsearch exposes additional ones, called the decay functions. The difference from the previously described functions is that the score given by those functions lowers with distance. The distance is calculated on the basis of a single valued numeric field (such as a date, a geographical point, or a standard numeric field). The simplest example that comes to mind is boosting documents on the basis of distance from a given point or boosting on the basis of document date.

For example, let's assume that we have a point field that stores the location and we want our document's score to be affected by the distance from a point where the user stands (for example, our user sends a query from a mobile device). Assuming the user is at 52,21, we could send the following query:

{

"query":{

"function_score":{

"query":{

"term":{

"available":true

}

},

"functions":[

{

"linear":{

"point":{

"origin":"52,21",

"scale":"1km",

"offset":0,

"decay":0.2

}

}

}

]

}

}

}

In the preceding example, linear is the name of the decay function. The value will decay linearly when using it. The other possible values are gauss and exp. We've chosen the linear decay function because it sets the score to 0 once the field value lies far enough from the origin, whereas gauss and exp only approach 0. This is useful when you want to lower the value of the documents that are too far away.

Note
Note that the geographical searching capabilities of Elasticsearch will be discussed in the Geo section of Chapter 8, Beyond Full-text Searching.

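Now let's discuss the rest of the query structure. The point is the name of the field we want to use for score calculation. If the document doesn't have a value in the defined field, the function returns 1 for that document.

In addition to that, we've provided additional parameters. The origin and scale are required. The origin parameter is the central point from which the calculation will be done and the scale is the rate of decay, that is, the distance from the origin at which the score drops to the value of the decay parameter. By default, the offset is set to 0. If defined, the decay function will only be computed for the documents whose distance from the origin is greater than the value of this parameter; closer documents keep the full score. The decay parameter tells Elasticsearch how much the score should be lowered at the distance given by scale and is set to 0.5 by default. In our case, we've said that, at the distance of 1 kilometer, the score should be reduced to 20% (0.2) of the score given at the origin.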
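The role of origin, scale, offset, and decay is easier to see when the three decay shapes are computed by hand. The following Python sketch follows the formulas given in the official function_score documentation; it works on plain numeric distances (for example, kilometers) rather than on geographical points, and the function names are hypothetical.

```python
import math


def _distance(value, origin, offset):
    # Distance from the origin, with everything inside the offset treated as 0.
    return max(0.0, abs(value - origin) - offset)


def linear_decay(value, origin, scale, offset=0.0, decay=0.5):
    s = scale / (1.0 - decay)
    return max((s - _distance(value, origin, offset)) / s, 0.0)


def exp_decay(value, origin, scale, offset=0.0, decay=0.5):
    lam = math.log(decay) / scale
    return math.exp(lam * _distance(value, origin, offset))


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    return math.exp(-_distance(value, origin, offset) ** 2 / (2.0 * sigma_sq))


# Settings from the example query: scale of 1 (km), offset 0, decay 0.2.
for km in (0.0, 0.5, 1.0, 1.25, 2.0):
    print(km,
          round(linear_decay(km, 0.0, 1.0, decay=0.2), 3),
          round(exp_decay(km, 0.0, 1.0, decay=0.2), 3),
          round(gauss_decay(km, 0.0, 1.0, decay=0.2), 3))
```

All three functions return 1 at the origin and the decay value at a distance equal to scale. Only the linear variant reaches exactly 0, which with these settings happens at 1.25 km.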

Note
We expect the function_score query to be modified and extended with the next versions of Elasticsearch (just as it was with Elasticsearch version 1.x). We suggest following the official documentation and the page dedicated to the function_score query at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html.

When does index-time boosting make sense?

In the previous section, we discussed boosting queries. This kind of approach to handling differences in the weight of documents is very handy, powerful, and easy to use. It is also sufficient in most situations. However, there are cases when a more convenient way of boosting documents is index-time boosting. One such use case is the situation when we know which documents are important during the indexing phase. In such a case, we can prepare the document boost and include it as part of the document. We gain a boost that is independent from a query at the cost of reindexing the documents when the boost value is changed (because we need to apply the changed boost). In addition to that, the performance gets slightly better because some parts needed in the boosting process are already calculated at index time, which can matter when your indices have a large number of documents. Information about the boost is stored as a part of the normalization factor and because of that it is important to keep the norms turned on. This means that we can't set norms.enabled to false, because then we won't be able to use index-time boosting.

Defining boosting in the mappings

It is also possible to directly define the field's boost in our mappings. This will result in Elasticsearch giving a boost for all the documents having a value in such a field. Of course, that will also happen during indexing time. The following example illustrates that:

{

"mappings":{

"book":{

"properties":{

"title":{"type":"string"},

"author":{"type":"string","boost":10.0}

}

}

}

}

Thanks to the preceding boost, all queries will favor values found in the field named author. This also applies to queries using the _all field, because Elasticsearch will apply the boost to values copied between the fields.

Words with the same meaning

You may have heard about synonyms, words that have the same or similar meaning. Sometimes you would want to have some words matched when one of those words is entered into the search box. Let's recall our sample data from Chapter 3, Searching Your Data. There was a book called Crime and Punishment. What if we want that book to be matched not only when the words crime or punishment are used, but also when using words such as criminality and abuse? At first glance, this may not sound like good behavior, but sometimes this is really needed, especially in use cases where there are multiple words meaning the same thing (like in medicine). To handle such use cases, we will use synonyms.

Synonym filter

Synonyms in Elasticsearch are handled on the analysis level, at both index and query time, by a dedicated synonym filter. To use the synonym filter, we need to define our own analyzer. For example, let's define an analyzer that will be called synonym and will use the whitespace tokenizer and a single filter called synonym. Our filter's type property needs to be set to synonym, which tells Elasticsearch that this filter is a synonym filter.

In addition to that, we want to ignore case, so that the uppercased and lowercased synonyms are treated equally (set the ignore_case property to true). To define our custom synonym analyzer that uses a synonym filter when creating a new index, we would use the following command:

curl -XPOST 'localhost:9200/test' -d '{

"index":{

"analysis":{

"analyzer":{

"synonym":{

"tokenizer":"whitespace",

"filter":[

"synonym"

]

}

},

"filter":{

"synonym":{

"type":"synonym",

"ignore_case":true,

"synonyms":[

"crime=>criminality"

]

}

}

}

}

}'

Synonyms in the mappings

In the definition you've just seen, we've specified the synonym rule in the mappings we send to Elasticsearch. To do that, we needed to add the synonyms property, which is an array of synonym rules. For example, the following part of the mappings definition defines a single synonym rule:

"synonyms":[

"crime=>criminality"

]

The preceding rule tells Elasticsearch to change the crime term to the criminality term when the crime term is encountered during analysis.

Synonyms stored on the file system

Apart from storing the synonym rules in the mappings, Elasticsearch allows us to use a file-based synonym rule set. To use a file, we need to specify the synonyms_path property instead of the synonyms one. The synonyms_path property should be set to the name of the file that holds the synonym definitions and the specified file path is relative to the Elasticsearch config directory. So, if we store our synonyms in the synonyms.txt file and we save that file in the config directory, then, in order to use it, we should set synonyms_path to the value of synonyms.txt.

For example, this is what our synonym filter would look like if we wanted to use the synonyms stored in a file:

"filter":{

"synonym":{

"type":"synonym",

"synonyms_path":"synonyms.txt"

}

}

Defining synonym rules

So far we have discussed what we have to do in order to use synonym expansions in Elasticsearch. Now let's see what formats of synonyms are allowed.

Using Apache Solr synonyms

The most common synonym structure in the Apache Lucene world is probably the one used by Apache Solr (http://lucene.apache.org/solr/), the search engine built on top of Lucene, just like Elasticsearch is. This is the default way of handling synonyms in Elasticsearch and the possibilities of defining a new synonym are discussed in the following sections.

Explicit synonyms

A simple mapping allows us to map a list of words onto other words. So, in our case, if we want the word criminality to be mapped to crime and the word abuse to be mapped to punishment, we need to define the following entries:

criminality=>crime

abuse=>punishment

Of course, a single word can be mapped into multiple ones and multiple ones can be mapped into a single one. For example:

star wars, wars => starwars

The preceding example means that star wars and wars will be changed to starwars by the synonym filter.

Equivalent synonyms

In addition to the explicit mapping, Elasticsearch allows us to use equivalent synonyms. For example, the following definition will make all the words exchangeable so that you can use any of them to match a document that has one of them in its contents:

star, wars, star wars, starwars

Expanding synonyms

A synonym filter allows us to use one additional property when it comes to Apache Solr format synonyms: the expand property. When the expand property is set to true (which is the default), all synonyms will be expanded by Elasticsearch to all equivalent forms; when it is set to false, all of them are mapped to the first one. For example, let's say we have the following filter configuration:

"filter":{

"synonym":{

"type":"synonym",

"expand":false,

"synonyms":[

"one,two,three"

]

}

}

Elasticsearch will map the preceding synonym definition to the following:

one, two, three => one

This means that the words one, two, and three will be changed to one. However, if we set the expand property to true, the same synonym definition will be interpreted in the following way:

one, two, three => one, two, three

This basically means that each of the words from the left side of the definition will be expanded to all the words on the right side.
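The way explicit rules, equivalent rules, and the expand property relate to each other can be sketched in a few lines of Python. This is only an illustration of the rule semantics described above, not the parser used by Lucene or Elasticsearch; the parse_synonym_rule function is hypothetical and treats every comma-separated entry as an opaque string.

```python
def parse_synonym_rule(rule, expand=True):
    """Map each input term of a Solr-format rule to the terms that replace it."""
    if "=>" in rule:
        # Explicit rule: everything on the left is rewritten to the right side.
        left, right = rule.split("=>", 1)
        sources = [term.strip() for term in left.split(",") if term.strip()]
        targets = [term.strip() for term in right.split(",") if term.strip()]
    else:
        # Equivalent rule: the expand flag decides between all forms and the first one.
        sources = [term.strip() for term in rule.split(",") if term.strip()]
        targets = sources if expand else sources[:1]
    return {source: list(targets) for source in sources}


print(parse_synonym_rule("criminality => crime"))
print(parse_synonym_rule("one,two,three", expand=False))
print(parse_synonym_rule("one,two,three", expand=True))
```

With expand set to false every term collapses to the first entry, which keeps the index smaller; with expand set to true every term is rewritten to all of the equivalent forms.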

Using WordNet synonyms

If we want to use WordNet-structured synonyms (to learn more about WordNet, visit http://wordnet.princeton.edu/), we need to provide an additional property for our synonym filter. The property name is format and we should set its value to wordnet in order for Elasticsearch to understand that format.

Query or index-time synonym expansion

As with all the analyzers, one can wonder when to use the synonym filter: during indexing, during querying, or maybe during indexing and querying. Of course, it depends on your needs. However, remember that using index-time synonyms requires data reindexing after each synonym change. That's because they need to be reapplied to all the documents. If we use only the query-time synonyms, we can update the synonym lists and have them applied without data reindexation.

Understanding the explain information

Compared to databases, using systems capable of performing full-text search can often be anything other than obvious. We can search in many fields simultaneously and the data in the index can vary from the data provided as the values of the document fields (because of the analysis process, synonyms, abbreviations, and others). It's even worse! By default, search engines sort data by relevance, which means that each document is given a number indicating how similar the document is to the query. The key point here is understanding the "how similar" phrase. As we discussed in the beginning of the chapter, scoring takes many factors into account: how many searched words were found in the document, how frequent the word is, how many terms are in the field, and so on. This seems complicated and finding out why a document was found and why another document is better is not easy. Fortunately, Elasticsearch provides us with tools that can answer these questions and we will look at them in this section.

Understanding field analysis

One of the common questions asked when analyzing the returned documents is why a given document was not found. In many cases, the problem lies in the mappings definition and the analysis process configuration. For debugging the analysis process, Elasticsearch provides a dedicated REST API endpoint: the _analyze one.

Using it is very simple. Let's see how it is used by running a request to Elasticsearch to give us information on how the Crime and Punishment phrase is analyzed. To do that, we will run a command using HTTP GET to the _analyze REST endpoint and we will provide the phrase as the request body. The following command does that:

curl -XGET 'localhost:9200/_analyze?pretty' -d 'Crime and Punishment'

In response, we get the following data:

{

"tokens":[{

"token":"crime",

"start_offset":0,

"end_offset":5,

"type":"<ALPHANUM>",

"position":0

},{

"token":"and",

"start_offset":6,

"end_offset":9,

"type":"<ALPHANUM>",

"position":1

},{

"token":"punishment",

"start_offset":10,

"end_offset":20,

"type":"<ALPHANUM>",

"position":2

}]

}

As we can see, Elasticsearch divided the input phrase into three tokens. During processing, the phrase was divided into tokens on the basis of whitespace characters and was lowercased. This shows us exactly what would be happening during the analysis process. We can also provide the name of the analyzer. For example, we can change the preceding command to something like this:

curl -XGET 'localhost:9200/_analyze?analyzer=standard&pretty' -d 'Crime and Punishment'

The preceding command will allow us to check how the standard analyzer analyzes the data.
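The token attributes returned by the _analyze endpoint (token, start_offset, end_offset, and position) can be reproduced for this simple phrase with a short Python sketch. It is a deliberately simplified stand-in that splits on whitespace and lowercases; the real standard analyzer applies more elaborate tokenization rules, and the analyze function below is hypothetical.

```python
import re


def analyze(text):
    """Split on whitespace, lowercase, and record offsets and positions."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\S+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),  # index of the first character
            "end_offset": match.end(),      # index just past the last character
            "position": position,
        })
    return tokens


for token in analyze("Crime and Punishment"):
    print(token)
```

Note that the offsets refer to the original text, while the token itself is what ends up in the index; this is why a query for Crime can only match if it is lowercased in the same way.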

It is worth noting that there is another form of analysis API available, one which allows us to provide tokenizers and filters. It is very handy when we want to experiment with configuration before creating the target mappings. Instead of specifying the analyzer parameter in the request, we provide the tokenizer and the filters parameters. We can provide a single tokenizer and a list of filters (separated by the comma character). For example, to illustrate how tokenization using the whitespace tokenizer works with the lowercase and kstem filters, we would run the following request:

curl -XGET 'localhost:9200/library/_analyze?tokenizer=whitespace&filters=lowercase,kstem&pretty' -d 'John Smith'

As we can see, the analysis API can be very useful for tracking down bugs in the mapping configuration. It is also priceless when we want to solve problems with queries and matching. It can show us how our analyzers work, what terms they produce, and what the attributes of those terms are. With such information, query problems will be easier to track down.

Explaining the query

In addition to looking at what happened during analysis, Elasticsearch allows us to explain how the score was calculated for a particular query and document. Let's look at the following example:

curl -XGET 'localhost:9200/library/book/1/_explain?pretty&q=quiet'

The preceding request specifies a document and a query to run. The document is specified in the URI and the query is passed using the q parameter. Using the _explain endpoint, we ask Elasticsearch for an explanation about how the document was matched by Elasticsearch (or not matched). The response returned by Elasticsearch for the preceding request looks as follows:

{

"_index":"library",

"_type":"book",

"_id":"1",

"matched":true,

"explanation":{

"value":0.057534903,

"description":"sum of:",

"details":[{

"value":0.057534903,

"description":"weight(_all:quiet in 0) [PerFieldSimilarity], result of:",

"details":[{

"value":0.057534903,

"description":"fieldWeight in 0, product of:",

"details":[{

"value":1.0,

"description":"tf(freq=1.0), with freq of:",

"details":[{

"value":1.0,

"description":"termFreq=1.0",

"details":[]

}]

},{

"value":0.30685282,

"description":"idf(docFreq=1,maxDocs=1)",

"details":[]

},{

"value":0.1875,

"description":"fieldNorm(doc=0)",

"details":[]

}]

}]

},{

"value":0.0,

"description":"match on required clause, product of:",

"details":[{

"value":0.0,

"description":"# clause",

"details":[]

},{

"value":3.2588913,

"description":"_type:book, product of:",

"details":[{

"value":1.0,

"description":"boost",

"details":[]

},{

"value":3.2588913,

"description":"queryNorm",

"details":[]

}]

}]

}]

}

}

It can look slightly complicated and, well, it is complicated. It is even worse if we realize that this is only a simple query! Elasticsearch, and more specifically the Lucene library, shows the internal information about the scoring process. We will only scratch the surface and will explain the most important things about the preceding response.

The first thing that you can notice is that, for the particular query, Elasticsearch provided the information whether the document was a match or not. If the matched property is set to true, it means that the document was a match for the provided query.

The next important thing is the explanation object. It contains three properties: the value, the description, and the details. The value is the score calculated for the given part of the query. The description is the simplified text representation of the internal score calculation, and the details object contains detailed information about the score calculation. The nice thing is that the details object will again contain the same three properties and this is how Elasticsearch provides us with information on how the score is calculated.

For example, let's analyze the following part of the response:

"value":0.057534903,

"description":"sum of:",

"details":[{

"value":0.057534903,

"description":"weight(_all:quiet in 0) [PerFieldSimilarity], result of:",

"details":[{

"value":0.057534903,

"description":"fieldWeight in 0, product of:",

"details":[{

"value":1.0,

"description":"tf(freq=1.0), with freq of:",

"details":[{

"value":1.0,

"description":"termFreq=1.0",

"details":[]

}]

},{

"value":0.30685282,

"description":"idf(docFreq=1,maxDocs=1)",

"details":[]

},{

"value":0.1875,

"description":"fieldNorm(doc=0)",

"details":[]

}]

}]

The score of the element is 0.057534903 (the value property) and it is a sum of (we see that in the description property) all the inner elements. In the description on the first level of nesting of the preceding fragment, we can see that PerFieldSimilarity has been used and that the score of that element is the result of the inner elements, that is, the second level of nesting.

On the second level of details nesting, we can see the fieldWeight element, whose score is the product of the scores of the three elements below it. These are internal statistics retrieved from the index: the term frequency, which informs us how often the term appears in the document (termFreq=1.0), the inverse document frequency, which reflects how rare the term is across the documents (idf(docFreq=1,maxDocs=1)), and the field normalization factor (fieldNorm(doc=0)).
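The numbers in the preceding fragment can be checked by hand. The following Python sketch assumes the classic Lucene TF/IDF formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes the fieldNorm value directly from the response, because it is stored in the index in a lossy, encoded form. The function names are hypothetical.

```python
import math


def tf(freq):
    # Term frequency factor: square root of the number of occurrences.
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    # Inverse document frequency: rarer terms get a higher value.
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


field_norm = 0.1875  # copied from the explain response, not recomputed

# fieldWeight is the product of the three elements below it.
field_weight = tf(1.0) * idf(doc_freq=1, max_docs=1) * field_norm

print(round(idf(1, 1), 8))
print(round(field_weight, 9))
```

Under these assumptions the idf value comes out at about 0.30685282 and the product at about 0.057534903, which agrees with the values reported by Elasticsearch.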

The Explain API supports the following parameters: analyze_wildcard, analyzer, default_operator, df, fields, lenient, lowercase_expanded_terms, parent, preference, routing, _source, _source_exclude, and _source_include. To learn more about all these parameters, refer to the official Elasticsearch documentation regarding the Explain API, which is available at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html.

Summary

The chapter we just finished was focused on querying; not on the matching part of it but mostly on scoring. We learned how Apache Lucene TF/IDF scoring works. We saw the scripting capabilities of Elasticsearch and we handled multilingual data. We used boosting to influence how the scores of the returned documents were calculated and we used synonyms. Finally, we used explain information to see how the document scores were calculated by the query.

In the next chapter, we will fully focus on Elasticsearch data analysis capabilities: the aggregations, their types, and how they can be used.

Chapter 7. Aggregations for Data Analysis

In the previous chapter, we discussed the querying side of Elasticsearch again. We learned how the Lucene TF/IDF algorithm works and how to use Elasticsearch scripting capabilities. We handled multilingual data and influenced document scores with boosts. We used synonyms to match words that have the same meaning and we used the Elasticsearch Explain API to see how document scores were calculated. By the end of this chapter, you will have learned the following topics:

What are aggregations
How the Elasticsearch aggregation engine works
How to use metrics aggregations
How to use buckets aggregations
How to use pipeline aggregations

Aggregations

Introduced in Elasticsearch 1.0, aggregations are the heart of data analytics in Elasticsearch. Highly flexible and performant, aggregations brought Elasticsearch 1.0 to a new position as a full-featured analysis engine. Extended through the life of Elasticsearch 1.x, in 2.x they are yet more powerful, less memory demanding, and faster. With this framework, you can use Elasticsearch as the analysis engine for data extraction and visualization. Let's see how that functionality works and what we can achieve by using it.

General query structure

To use aggregations, we need to add an additional section in our query. In general, our queries with aggregations look like this:

{

"query":{…},

"aggs":{

"aggregation_name":{

"aggregation_type":{

...

}

}

}

}

In the aggs property (you can use aggregations if you want; aggs is just an abbreviation), you can define any number of aggregations. Each aggregation is defined by its name and one of the types of aggregations that are provided by Elasticsearch. One thing to remember though is that the key defines the name of the aggregation (you will need it to distinguish particular aggregations in the server response). Let's take our library index and create the first query using aggregations. A command sending such a query looks like this:

curl 'localhost:9200/library/_search?search_type=query_then_fetch&size=0&pretty' -d '{

"aggs":{

"years":{

"stats":{

"field":"year"

}

},

"words":{

"terms":{

"field":"copies"

}

}

}

}'

This query defines two aggregations. The aggregation named years shows statistics for the year field. The words aggregation contains information about the terms used in a given field (here, the copies field).

Note
In our examples we assumed that we perform aggregation in addition to searching. If we don't need found documents, a better idea is to use the size parameter and set it to 0. This omits some unnecessary work and is more efficient. In such a case, the endpoint should be /library/_search?size=0. You can read more about search types in Chapter 3, Understanding the Querying Process.

Let's now look at the response returned by Elasticsearch for the preceding query:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"words":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":0,

"doc_count":2

},{

"key":1,

"doc_count":1

},{

"key":6,

"doc_count":1

}]

},

"years":{

"count":4,

"min":1886.0,

"max":1961.0,

"avg":1928.0,

"sum":7712.0

}

}

}

As you can see, both the aggregations (years and words) were returned. The first aggregation we defined in our query (years) returned general statistics for the given field gathered across all the documents that matched our query. The second of the defined aggregations (words) was a bit different. It created several sets called buckets that were calculated on the returned documents and each of the aggregated values was within one of these sets. As you can see, there are multiple aggregation types available and they return different results. We will see the differences in the later part of this section.
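To see what the two aggregation types compute, the following Python sketch reproduces the years and words results in memory. The four documents are hypothetical: their year and copies values were chosen only so that they agree with the numbers in the preceding response, and the stats and terms functions are illustrative helpers, not Elasticsearch APIs.

```python
from collections import Counter

# Hypothetical documents, consistent with the response shown above.
books = [
    {"year": 1886, "copies": 0},
    {"year": 1929, "copies": 1},
    {"year": 1936, "copies": 0},
    {"year": 1961, "copies": 6},
]


def stats(docs, field):
    """A metric over all values of a field, like the stats aggregation."""
    values = [doc[field] for doc in docs]
    total = sum(values)
    return {"count": len(values), "min": min(values), "max": max(values),
            "avg": total / len(values), "sum": total}


def terms(docs, field):
    """One bucket per distinct value, ordered by document count, then key."""
    counts = Counter(doc[field] for doc in docs)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


print(stats(books, "year"))
print(terms(books, "copies"))
```

The metric collapses all the values into a handful of numbers, while the bucket aggregation groups the documents and reports how many fell into each group.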

The great thing about the aggregation engine is that it allows you to have multiple aggregations and that aggregations can be nested. This means that you can have any number of nesting levels and any number of aggregations in general. The extended structure of the query is shown next:

{

"query":{…},

"aggs":{

"first_aggregation_name":{

"aggregation_type":{

...

},

"aggregations":{

"first_nested_aggregation":{

...

},

.

.

.

"nth_nested_aggregation":{

...

}

}

},

.

.

.

"nth_aggregation_name":{

...

}

}

}
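As a concrete instance of the preceding template, the following Python sketch builds a request body with one nested level: a terms aggregation on the copies field and, inside each of its buckets, a stats aggregation on the year field. The aggregation names per_copies and year_stats are hypothetical, and the body is only serialized here, not sent to a cluster.

```python
import json

# Request body with a bucket aggregation and a metric nested inside it.
body = {
    "size": 0,  # we only want the aggregation results, not the hits
    "aggs": {
        "per_copies": {
            "terms": {"field": "copies"},
            "aggs": {
                "year_stats": {
                    "stats": {"field": "year"}
                }
            }
        }
    }
}

# The resulting JSON can be used as the body of a _search request.
print(json.dumps(body, indent=2))
```

The nested aggregation is evaluated once per bucket of its parent, so the response would contain a separate year_stats object for every distinct copies value.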

Inside the aggregations engine

Aggregations work on the basis of results returned by the query. This is very handy as we get the information that we are interested in, both from the query as well as the data analysis perspective. So what does Elasticsearch do when we include the aggregation part of the query in the request that we send to Elasticsearch? First of all, the aggregation is executed on each relevant shard and the results are returned to the node that is responsible for running that query. That node waits for the partial results to be calculated; after it gets all the results, it merges them, producing the final results.

This approach is nothing new when it comes to distributed systems and how they work and communicate, but it can cause issues when it comes to the precision of the results. In most cases this is not a problem, but you should be aware of what to expect. Let's imagine the following example:

The preceding image shows a simplified view of three shards, each containing documents having only Elasticsearch and Solr terms in them. Now imagine that we are interested in a single term for our index. The terms aggregation, when run using size=1, would return a single term, the one that is the most frequent (of course limited to the query we've run). So our aggregator node would see partial results telling us that Elasticsearch is present in 19 documents in Shard 1 and the Solr term is present in 10 documents in both Shard 2 and Shard 3, which means that the top term is Solr. This is not true, because the Elasticsearch counts from Shard 2 and Shard 3 never reached the aggregator node. This is an extreme case, but there are use cases (such as accounting) where precision is key and you should be aware of such situations.
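The effect described above can be simulated in a few lines of Python. The counts visible to the aggregator node (19, 10, and 10) come from the text; the counts that stay hidden on each shard are hypothetical and were picked only to make the problem visible. The shard_size parameter of the sketch stands for the number of terms each shard reports.

```python
from collections import Counter

# Per-shard term counts. Each shard's second entry is a hypothetical value.
shards = [
    Counter({"elasticsearch": 19, "solr": 5}),   # Shard 1
    Counter({"solr": 10, "elasticsearch": 9}),   # Shard 2
    Counter({"solr": 10, "elasticsearch": 9}),   # Shard 3
]


def merged_top_terms(shards, shard_size, size):
    """Merge only the top shard_size terms of each shard, as the aggregator sees them."""
    merged = Counter()
    for shard in shards:
        for term, count in shard.most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


def exact_top_terms(shards, size):
    """Count every term on every shard; precise, but more data is transferred."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return total.most_common(size)


print(merged_top_terms(shards, shard_size=1, size=1))
print(exact_top_terms(shards, size=1))
```

With one term per shard the merged result names solr as the top term with 20 documents, while the exact count favors elasticsearch with 37. Asking each shard for more terms than are finally needed narrows this gap at the cost of extra work.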

Note
Compared to queries, aggregations are heavier for Elasticsearch in terms of both CPU cycles and memory consumption. We will discuss this in more detail in the Caching Aggregations section of this chapter.

Aggregation types

Elasticsearch 2.x allows us to use three types of aggregation: metrics, buckets, and pipeline. The metrics aggregations return a metric, just like the stats aggregation we used for the year field. The bucket aggregations return buckets, the key and the number of documents sharing the same values, ranges, and so on, just like the terms aggregation we used for the copies field. Finally, the pipeline aggregations introduced in Elasticsearch 2.0 aggregate the output of the other aggregations and their metrics, which allows us to do even more sophisticated data analysis. Knowing all that, let's now look at all the aggregations we can use in Elasticsearch 2.x.

Metrics aggregations

We will start with the metrics aggregations, which can aggregate values from documents into a single metric. This is always the case with metrics aggregations: you can expect them to produce a single metric on the basis of the data. Let's now take a look at the metrics aggregations available in Elasticsearch 2.x.

Minimum, maximum, average, and sum

The first group of metrics aggregations that we want to show you is the one that calculates a basic value from the given documents. These aggregations are:

min: This calculates the minimum value from the given numeric field in the returned documents
max: This calculates the maximum value from the given numeric field in the returned documents
avg: This calculates an average from the given numeric field in the returned documents
sum: This calculates the sum from the given numeric field in the returned documents

As you can see, the aggregations just mentioned are pretty self-explanatory. So, let's try to calculate the average value on our data. For example, let's assume that we want to calculate the average number of copies for our books. The query to do that will look as follows:

{

"aggs":{

"avg_copies":{

"avg":{

"field":"copies"

}

}

}

}

The results returned by Elasticsearch after running the preceding query will be as follows:

{

"took":5,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"avg_copies":{

"value":1.75

}

}

}

So, we have an average of 1.75 copies per book. It is very easy to verify: (6+0+1+0)/4 is equal to 1.75. It seems that we got it right.

Missing values

The nice thing about the previously mentioned aggregations is that we can control what value Elasticsearch should use for documents that don’t have any value in the specified field. For example, if we wanted Elasticsearch to use 0 as the value for the copies field in our previous example, we would add the missing property to our query and set it to 0. For example:

{

"aggs":{

"avg_copies":{

"avg":{

"field":"copies",

"missing":0

}

}

}

}
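The effect of the missing property can be illustrated outside Elasticsearch. The following is a minimal Python sketch, not Elasticsearch code: the avg_field helper and the sample documents are hypothetical, and the sketch only mimics the behavior described above, where documents without a value are skipped unless a substitute is given.

```python
def avg_field(docs, field, missing=None):
    """Average a numeric field; documents lacking the field are skipped
    unless a substitute value is supplied through `missing`."""
    values = []
    for doc in docs:
        if field in doc:
            values.append(doc[field])
        elif missing is not None:
            values.append(missing)
    return sum(values) / len(values) if values else None


# Hypothetical documents: the last one has no copies field at all.
docs = [{"copies": 6}, {"copies": 0}, {"copies": 1}, {}]

avg_skipping = avg_field(docs, "copies")              # averages 3 values
avg_with_zero = avg_field(docs, "copies", missing=0)  # averages 4 values
print(avg_skipping, avg_with_zero)
```

With the substitute value, the document without the field pulls the average down, because it is counted as a book with zero copies instead of being ignored.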

Using scripts

The input values can also be generated by a script. For example, if we want to find the minimum value from all the values in the year field, but we want to subtract 1000 from those values, we will send an aggregation like the following one:

{

"aggs":{

"min_year":{

"min":{

"script":"doc['year'].value-1000"

}

}

}

}

Note

Note that the preceding query requires inline scripts to be allowed. This means that the query requires the script.inline property set to on in the elasticsearch.yml file.

In this case, the value the aggregations will use will be the original year field value reduced by 1000.

We can also use the value script capabilities of Elasticsearch. For example, to achieve the same as the previous script, we can use the following query:

{

"aggs":{

"min_year":{

"min":{

"field":"year",

"script":{

"inline":"_value-factor",

"params":{

"factor":1000

}

}

}

}

}

}

If you are not familiar with Elasticsearch scripting capabilities, you can read more about it in the Scripting capabilities of Elasticsearch section of Chapter 6, Make Your Search Better.

One thing worth remembering is that using the command line may require proper escaping of the values in the doc array. For example, the command that executes the first scripted query would look as follows:

curl -XGET 'localhost:9200/library/_search?size=0&pretty' -d '{

"aggs":{

"min_year":{

"min":{

"script":"doc[\"year\"].value-1000"

}

}

}

}'

Field value statistics and extended statistics

The next aggregations we will discuss are the ones that provide us with the statistical information about the numeric field we are running the aggregation on: the stats and extended_stats aggregations.

For example, the following query provides extended statistics for the year field:

{

"aggs":{

"extended_statistics":{

"extended_stats":{

"field":"year"

}

}

}

}

The response to the preceding query will be as follows:

{

"took":1,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"extended_statistics":{

"count":4,

"min":1886.0,

"max":1961.0,

"avg":1928.0,

"sum":7712.0,

"sum_of_squares":1.4871654E7,

"variance":729.5,

"std_deviation":27.00925767213901,

"std_deviation_bounds":{

"upper":1982.018515344278,

"lower":1873.981484655722

}

}

}

}

As you can see, in the response we got information about the number of documents with a value in the year field, the minimum value, the maximum value, the average, and the sum. These are the values that we will get if we run the stats aggregation instead of extended_stats. The extended_stats aggregation provides additional information, such as the sum of squares, variance, and standard deviation. Elasticsearch provides two types of aggregations because extended_stats is slightly more expensive when it comes to processing power.
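The additional values can be derived from the basic ones, which makes the response easy to check by hand. The following Python sketch recomputes them from the count, sum, and sum of squares shown in the response. It assumes a population variance and bounds placed two standard deviations from the average, an assumption that agrees with the numbers in the response.

```python
import math

# Values taken from the extended_stats response above.
count = 4
total = 7712.0
sum_of_squares = 14871654.0

avg = total / count
# Population variance: mean of the squares minus the square of the mean.
variance = sum_of_squares / count - avg ** 2
std_deviation = math.sqrt(variance)
# Assumed: bounds lie two standard deviations away from the average.
upper = avg + 2 * std_deviation
lower = avg - 2 * std_deviation

print(avg, variance, std_deviation, upper, lower)
```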

Note

The stats and extended_stats aggregations, similar to the min, max, avg, and sum aggregations, support scripting and allow us to specify which value should be used for documents that don’t have a value in the specified field.

Value count

The value_count aggregation is a simple aggregation which allows counting values in aggregated documents. This is quite useful when used with nested aggregations. We are not focusing on that topic right now, but it is something to keep in mind. For example, to use the value_count aggregation on the copies field, we will run the following query:

{

"aggs":{

"count":{

"value_count":{

"field":"copies"

}

}

}

}

Note

The value_count aggregation allows us to use scripts, discussed earlier in this chapter when we described the min, max, avg, and sum aggregations. Please refer to the beginning of the Metrics aggregations section earlier in the current chapter for further reference.

Field cardinality

The cardinality aggregation calculates the count of distinct values in a given field. It is one of the aggregations that allow us to control how resource hungry the aggregation will be by controlling its precision. However, one thing needs to be remembered: the calculated count is an approximation, not the exact value. Elasticsearch uses the HyperLogLog++ algorithm (http://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf) to calculate the value.

This aggregation has a wide variety of use cases, such as showing the number of distinct values in a field that is responsible for holding the status code for your indexed Apache access logs. One query, and you know the approximated count of the distinct values in that field.

For example, we can request the cardinality for our title field:

{

"aggs":{

"card_title":{

"cardinality":{

"field":"title"

}

}

}

}

To control the precision of the cardinality calculation, we can specify the precision_threshold property: the higher the value, the more precise the aggregation will be and the more resources it will need. The current maximum precision_threshold value is 40000 and the default depends on the parent aggregation. An example query using the precision_threshold property looks as follows:

{

"aggs":{

"card_title":{

"cardinality":{

"field":"title",

"precision_threshold":1000

}

}

}

}

Percentiles

The percentiles aggregation is another example of an aggregation in Elasticsearch that uses an algorithmic approximation to provide us with results. It uses the T-Digest algorithm (https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf) from Ted Dunning and Otmar Ertl and allows us to calculate percentiles: metrics that show us the value below which a given percentage of the results fall. For example, the 99th percentile shows us the value that is greater than 99 percent of the other values.

Let’s go into an example and look at a query that will calculate percentiles for the year field in our data:

{

"aggs":{

"copies_percentiles":{

"percentiles":{

"field":"year"

}

}

}

}

The results returned by Elasticsearch for the preceding request will look as follows:

{

"took":26,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"copies_percentiles":{

"values":{

"1.0":1887.2899999999997,

"5.0":1892.4499999999998,

"25.0":1918.25,

"50.0":1932.5,

"75.0":1942.25,

"95.0":1957.25,

"99.0":1960.25

}

}

}

}

As you can see, the value that is higher than 99 percent of the values is 1960.25.
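On a data set this small, the returned values coincide with plain linear interpolation between the sorted values. The following Python sketch shows that calculation. It is not the T-Digest algorithm itself; the percentile helper is hypothetical, and the four year values are an assumption inferred from the statistics shown in this chapter (they reproduce the count, sum, and sum of squares from the extended_stats response).

```python
def percentile(sorted_values, percent):
    """Percentile by linear interpolation between the two closest ranks."""
    rank = (percent / 100.0) * (len(sorted_values) - 1)
    low = int(rank)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = rank - low
    return sorted_values[low] + fraction * (sorted_values[high] - sorted_values[low])


# Assumed year values, inferred from the statistics shown in this chapter.
years = [1886, 1929, 1936, 1961]

for p in (1, 5, 25, 50, 75, 95, 99):
    print(p, round(percentile(years, p), 2))
```

On larger data sets, the approximate algorithm used by Elasticsearch may return slightly different values than an exact calculation like this one.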

You may wonder why such an aggregation is important. It is very useful for performance metrics, for example, where we usually look at averages for some period of time. Imagine that the average response time of our queries for the last hour is 50 milliseconds, which is not bad. However, if the 95th percentile showed 2 seconds, that would mean that about 5 percent of the users had to wait two or more seconds for the search results, which is not that good.

By default, the percentiles aggregation calculates seven percentiles: 1, 5, 25, 50, 75, 95, and 99. We can control this by using the percents property and specify which percentiles we are interested in. For example, if we want to get only the 95th and the 99th percentile, we change our query to the following one:

{

"aggs":{

"copies_percentiles":{

"percentiles":{

"field":"year",

"percents":["95","99"]

}

}

}

}

Note

Similar to the min, max, avg, and sum aggregations, the percentiles aggregation supports scripting and allows us to specify which value should be used for documents that don’t have a value in the specified field.

We’ve mentioned earlier that the percentiles aggregation uses an algorithmic approach and is an approximation. As with all approximations, we can control the precision and memory usage of the algorithm. We do that by using the compression property, which defaults to 100. It is an internal property of Elasticsearch and its implementation details may change between versions. It is worth knowing that setting the compression value to one higher than 100 can increase the algorithm precision at the cost of memory usage.

Percentile ranks

The percentile_ranks aggregation is similar to the percentiles one that we just discussed. It allows us to show which percentile a given value has. For example, to show us which percentiles the years 1932 and 1960 fall in, we run the following query:

{

"aggs":{

"copies_percentile_ranks":{

"percentile_ranks":{

"field":"year",

"values":["1932","1960"]

}

}

}

}

The response returned by Elasticsearch will be as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"copies_percentile_ranks":{

"values":{

"1932.0":49.5,

"1960.0":61.5

}

}

}

}

Top hits aggregation

The top_hits aggregation keeps track of the most relevant document being aggregated. This doesn’t sound very appealing, but it allows us to implement one of the most desired functionalities in Elasticsearch called document grouping, field collapsing, or document folding. Such functionality is very useful in some use cases, for example, when we want to show a book catalog but only one book from each publisher. To do that without the top_hits aggregation, we would have to run multiple queries. With the top_hits aggregation, we need only a single query.

The top_hits aggregation was introduced in Elasticsearch 1.3. In fact, the mentioned document folding is more or less a side effect and only one of the possible usage examples of the top_hits aggregation.

The idea behind the top_hits aggregation is simple. Every document that is assigned to a particular bucket can be also remembered. By default, only three documents per bucket are remembered.

Note

Note that, in order to show the full potential of the top_hits aggregation, we decided to use one of the bucketing aggregations as well and nest them to show the document grouping functionality implementation. The bucketing aggregations are described in detail later in this chapter.

To show you a potential use case that leverages the top_hits aggregation, we have decided to use a simple example. We would like to get the most relevant book published every 100 years. To do that we use the following query:

{

"aggs":{

"when":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"book":{

"top_hits":{

"_source":{

"include":["title","available"]

},

"size":1

}

}

}

}

}

}

In the preceding example, we did the histogram aggregation on year ranges. A bucket was created for every one hundred years. The nested top_hits aggregation remembers a single document with the greatest score from each bucket (because of the size property being set to 1). We added the include option only for simpler results, so that we only return the title and available fields for every aggregated document. The response returned by Elasticsearch will be as follows:

{

"took":8,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"when":{

"buckets":[{

"key":1800,

"doc_count":1,

"book":{

"hits":{

"total":1,

"max_score":1.0,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":1.0,

"_source":{

"available":true,

"title":"Crime and Punishment"

}

}]

}

}

},{

"key":1900,

"doc_count":3,

"book":{

"hits":{

"total":3,

"max_score":1.0,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"2",

"_score":1.0,

"_source":{

"available":false,

"title":"Catch-22"

}

}]

}

}

}]

}

}

}

We can see that, because of the top_hits aggregation, we have the highest-scoring document (from each bucket) included in the response. In our particular case, the query was the match_all one and all the documents had the same score, so the top-scoring document for every bucket was more or less random. However, you need to remember that this is the default behavior. If we want to have custom sorting, this is not a problem for Elasticsearch. We just need to add the sort property for our top_hits aggregator. For example, we can return the first book from a given century:

{

"aggs":{

"when":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"book":{

"top_hits":{

"sort":{

"year":"asc"

},

"_source":{

"include":["title","available"]

},

"size":1

}

}

}

}

}

}

We added sorting to the top_hits aggregation, so the results are sorted on the basis of the year field. This means that the first document will be the one with the lowest value in that field and this is the document that is going to be returned for each bucket.
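The grouping logic itself can be sketched in a few lines of Python. This is only an illustration of the idea, not how Elasticsearch implements it: the top_hits_per_bucket helper and the document identifiers are hypothetical, and the year values are assumed. Each document is assigned to a histogram bucket, the bucket members are sorted, and only the first size documents are kept.

```python
from collections import defaultdict


def top_hits_per_bucket(docs, interval, sort_key, size=1):
    """Group docs into histogram buckets on 'year' and keep the first
    `size` documents of each bucket after sorting by `sort_key`."""
    buckets = defaultdict(list)
    for doc in docs:
        key = (doc["year"] // interval) * interval  # histogram bucket key
        buckets[key].append(doc)
    return {
        key: sorted(members, key=sort_key)[:size]
        for key, members in sorted(buckets.items())
    }


# Hypothetical documents; only the year values matter for the grouping.
docs = [
    {"id": "a", "year": 1886},
    {"id": "b", "year": 1929},
    {"id": "c", "year": 1936},
    {"id": "d", "year": 1961},
]

result = top_hits_per_bucket(docs, interval=100, sort_key=lambda d: d["year"])
print(result)
```

Sorting ascending by year and keeping one document per bucket returns the earliest book of each century, which mirrors the sort and size properties used in the query.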

Additional parameters

Sorting and field inclusion is not everything that we can do inside the top_hits aggregation. Because this aggregation returns documents, we can also use functionalities such as:

highlighting
explain
scripting
fielddata field (uninverted representation of the fields)
version

We just need to include an appropriate section in the top_hits aggregation body, similar to what we do when we construct a query. For example:

{

"aggs":{

"when":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"book":{

"top_hits":{

"highlight":{

"fields":{

"title":{}

}

},

"explain":true,

"version":true,

"_source":{

"include":["title","available"]

},

"fielddata_fields":["title"],

"script_fields":{

"century":{

"script":"(doc[\"year\"].value/100).intValue()"

}

},

"size":1

}

}

}

}

}

}

Note

Note that the preceding query requires the inline scripts to be allowed. This means that the query requires the script.inline property set to on in the elasticsearch.yml file.

Geo bounds aggregation

The geo_bounds aggregation is a simple aggregation that allows us to compute the bounding box that includes all the geo_point type field values from the aggregated documents.

Note

If you are interested in spatial searches, the section dedicated to it is called Geo and is included in Chapter 8, Beyond Full-text Searching.

We only need to provide the field (by using the field property; it needs to be of the geo_point type). We can also provide wrap_longitude (values true or false; it defaults to true) if the bounding box is allowed to overlap the international date line. In response, we get the latitude and longitude of the top-left and bottom-right corners of the bounding box. An example query using this aggregation looks as follows (using the hypothetical location field):

{

"aggs":{

"box":{

"geo_bounds":{

"field":"location"

}

}

}

}
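To make the result of this aggregation more tangible, the following Python sketch computes such a bounding box for a handful of points. It is a simplified illustration under stated assumptions: the geo_bounds function and the coordinates are hypothetical, and the sketch ignores the date line handling controlled by wrap_longitude.

```python
def geo_bounds(points):
    """Bounding box of (lat, lon) pairs, ignoring date-line wrapping."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {
        "top_left": {"lat": max(lats), "lon": min(lons)},
        "bottom_right": {"lat": min(lats), "lon": max(lons)},
    }


# Hypothetical locations as (latitude, longitude) pairs.
points = [(52.23, 21.01), (48.85, 2.35), (51.51, -0.13)]
box = geo_bounds(points)
print(box)
```

The top-left corner combines the highest latitude with the lowest longitude, and the bottom-right corner combines the lowest latitude with the highest longitude.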

Scripted metrics aggregation

The last metric aggregation we want to discuss is the scripted_metric aggregation, which allows us to define our own aggregation calculation using scripts. For this aggregation, we can provide the following scripts (map_script is the only required one, the rest are optional):

init_script: This script is run during initialization and allows us to set up an initial state of the calculation.
map_script: This is the only required script. It is executed once for every document and needs to store its calculation in an object called _agg.
combine_script: This script is executed once on each shard after Elasticsearch finishes document collection on that shard.
reduce_script: This script is executed once on the node that is coordinating a particular query execution. This script has access to the _aggs variable, which is an array of the values returned by combine_script.

For example, we can use the scripted_metric aggregation to calculate all the copies of all the books we have in our library by running the following request (we show the whole request to show how the names are escaped):

curl -XGET 'localhost:9200/library/_search?size=0&pretty' -d '{
  "aggs": {
    "all_copies": {
      "scripted_metric": {
        "init_script": "_agg[\"all_copies\"] = 0",
        "map_script": "_agg.all_copies += doc.copies.value",
        "combine_script": "return _agg.all_copies",
        "reduce_script": "sum = 0; for (number in _aggs) { sum += number }; return sum"
      }
    }
  }
}'

Of course, the preceding script is just a simple sum and we could use the sum aggregation, but we just wanted to show you a simple example of what you can do with the scripted_metric aggregation.

Note

Note that the preceding query requires inline scripts to be allowed. This means that the query requires the script.inline property set to on in the elasticsearch.yml file.

As you can see, the init_script part of the aggregation is used to initialize the all_copies variable. Next, we have map_script, which is executed once for every document and we just add the value of the copies field to the earlier initialized variable. The combine_script part, executed once on each shard, tells Elasticsearch to return the calculated variable. Finally, the reduce_script part, executed once for the whole query on the coordinating node, will run a for loop, which will go through all the returned values that are stored in the _aggs array and return the sum of those. The final result returned by Elasticsearch for the preceding query looks as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"all_copies":{

"value":7

}

}

}
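The four phases of the scripted metric can also be traced in plain Python. The sketch below is only a simulation of the flow described above, not Elasticsearch code: the scripted_metric function is hypothetical, and so is the way the four books are spread over the shards.

```python
def scripted_metric(shards):
    """Mimic the init/map/combine/reduce phases of a scripted metric."""
    combined = []
    for shard_docs in shards:
        agg = {"all_copies": 0}                  # init_script, once per shard
        for doc in shard_docs:
            agg["all_copies"] += doc["copies"]   # map_script, once per document
        combined.append(agg["all_copies"])       # combine_script, once per shard
    total = 0
    for number in combined:                      # reduce_script, coordinating node
        total += number
    return total


# Hypothetical distribution of the four books over three shards.
shards = [
    [{"copies": 6}],
    [{"copies": 0}, {"copies": 1}],
    [{"copies": 0}],
]
all_copies = scripted_metric(shards)
print(all_copies)
```

However the documents are distributed, the per-shard partial sums add up to the same total, which is why the reduce phase can work only with the values returned by the combine phase.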

Buckets aggregations

The second type of aggregations that we will discuss are the buckets aggregations. In comparison to metrics aggregations, a bucket aggregation returns data not as a single metric but as a list of key-value pairs called buckets. For example, the terms aggregation returns the number of documents associated with each term in a given field. The very powerful thing about buckets aggregations is that they can have sub-aggregations, which means that we can nest other aggregations inside the aggregations that return buckets (we will discuss this at the end of the buckets aggregation discussion). Let’s look at the bucket aggregations that are provided by Elasticsearch now.

Filter aggregation

The filter aggregation is a simple bucketing aggregation that allows us to filter the results to a single bucket. For example, let’s assume that we want to get a count and the average copies count of all the books that are novels, which means they have the term novel in the tags field. The query that will return such results looks as follows:

{

"aggs":{

"novels_count":{

"filter":{

"term":{

"tags":"novel"

}

},

"aggs":{

"avg_copies":{

"avg":{

"field":"copies"

}

}

}

}

}

}

As you can see, we defined the filter in the filter section of the aggregation definition and we defined a second nested aggregation. The nested aggregation is the one that will be run on the filtered documents.

The response returned by Elasticsearch looks as follows:

{

"took":13,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"novels_count":{

"doc_count":2,

"avg_copies":{

"value":3.5

}

}

}

}

In the returned bucket, we have information about the number of documents (represented by the doc_count property) and the average number of copies, which is all we wanted.

Filters aggregation

The second bucket aggregation we want to show you is the filters aggregation. While the previously discussed filter aggregation resulted in a single bucket, the filters aggregation returns multiple buckets: one for each of the defined filters. Let’s extend our previous example and assume that, in addition to the average number of copies for the novels, we also want to know the average number of copies for the books that are available. The query that will get us this information will use the filters aggregation and will look as follows:

{

"aggs":{

"count":{

"filters":{

"filters":{

"novels":{

"term":{

"tags":"novel"

}

},

"available":{

"term":{

"available":true

}

}

}

},

"aggs":{

"avg_copies":{

"avg":{

"field":"copies"

}

}

}

}

}

}

Let’s stop here and look at the definition of the aggregation. As you can see, we defined two filters using the filters section of the filters aggregation. Each filter has a name and the actual Elasticsearch filter; the first is called novels and the second is called available. Elasticsearch will use these names in the returned response. The thing to remember is that Elasticsearch will create a bucket for each defined filter and will calculate the nested aggregation that we defined, which in our case is the one that calculates the average number of copies.

Note

The filters aggregation allows us to return one more bucket in addition to the defined ones: a bucket with all the documents that didn’t match the filters. In order to calculate such a bucket, we need to add the other_bucket property to the body of the aggregation and set it to true.

The results returned by Elasticsearch are as follows:

{

"took":4,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"count":{

"buckets":{

"novels":{

"doc_count":2,

"avg_copies":{

"value":3.5

}

},

"available":{

"doc_count":2,

"avg_copies":{

"value":0.5

}

}

}

}

}

}

As you can see, we got two buckets, which is what we expected.
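The way named filters turn into buckets can be sketched in Python as follows. This is an illustration only: the filters_agg helper is hypothetical, the documents are made up so that they reproduce the counts and averages from the response above, and the _other_ name for the extra bucket is an assumption about how the bucket for unmatched documents is labeled.

```python
def filters_agg(docs, filters, other_bucket=False):
    """One bucket per named filter, each with a doc count and avg copies."""
    def bucket(members):
        copies = [d["copies"] for d in members]
        avg = sum(copies) / len(copies) if copies else None
        return {"doc_count": len(members), "avg_copies": avg}

    result = {name: bucket([d for d in docs if test(d)])
              for name, test in filters.items()}
    if other_bucket:
        unmatched = [d for d in docs
                     if not any(test(d) for test in filters.values())]
        result["_other_"] = bucket(unmatched)  # assumed bucket name
    return result


# Hypothetical documents chosen to reproduce the counts shown above.
docs = [
    {"copies": 6, "tags": ["novel"], "available": False},
    {"copies": 1, "tags": ["novel"], "available": True},
    {"copies": 0, "tags": ["classics"], "available": True},
    {"copies": 0, "tags": ["classics"], "available": False},
]
filters = {
    "novels": lambda d: "novel" in d["tags"],
    "available": lambda d: d["available"],
}
buckets = filters_agg(docs, filters, other_bucket=True)
print(buckets)
```

Note that a single document can land in more than one bucket: the second book is both a novel and available, so it is counted in both.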

Terms aggregation

One of the most commonly used bucket aggregations is the terms aggregation. It allows us to get information about the terms and the count of documents having those terms. For example, one of the simplest uses is getting the count of the books that are available and not available. We can do that by running the following query:

{

"aggs":{

"counts":{

"terms":{

"field":"available"

}

}

}

}

In the response, we will get two buckets (because the Boolean field can only have two values: true and false). Here, this will look as follows:

{

"took":7,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"counts":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":0,

"key_as_string":"false",

"doc_count":2

},{

"key":1,

"key_as_string":"true",

"doc_count":2

}]

}

}

}

By default, the data is sorted on the basis of document count, which means that the most common terms will be placed on top of the aggregation results. Of course, we can control this behavior by specifying the order property and providing the order just like we usually do when sorting by arbitrary field values. Elasticsearch allows us to sort by the document count (using the _count static value) and by the term (using the _term static value). For example, if we want to sort our preceding aggregation results by descending term, we can run the following query:

{

"aggs":{

"counts":{

"terms":{

"field":"available",

"order":{"_term":"desc"}}

}

}

}

However, that’s not all when it comes to sorting. We can also sort by the results of the nested aggregations that were included in the query.

Note

The terms aggregation, similar to the min, max, avg, and sum aggregations discussed in the metrics aggregations section of this chapter, supports scripting and allows us to specify which value should be used for documents that don’t have a value in the specified field.

Counts are approximate

The thing to remember when discussing the terms aggregation is that the counts are approximate. This is because each shard provides its own counts and returns that aggregated information to the coordinating node. The coordinating node aggregates the information it got and returns the final information to the client. Because of that, depending on the data and how it is distributed between the shards, some information about the counts may be lost and the counts will not be exact. Of course, when dealing with low cardinality fields, the approximation will be closer to exact numbers, but still this is something that should be considered when using the terms aggregation.

We can control how much information is returned from each of the shards to the coordinating node. We can do this by specifying the size and the shard_size properties. The size property specifies how many buckets will be returned at most. The higher the size property, the more accurate the calculation will be. However, that will cost us additional memory and CPU cycles, which means that the calculation will be more expensive and will put more pressure on the hardware. This is because the results returned to the coordinating node from each shard will be larger and the result merging process will be harder.

The shard_size property can be used to minimize the work that needs to be done by the coordinating node. When set, the coordinating node will fetch (from each shard) the number of buckets determined by the shard_size property. This allows us to increase the precision of the aggregation while avoiding the additional overhead on the coordinating node. Remember that the shard_size property cannot be smaller than the size property.

Finally, the size property can be set to 0, which will tell Elasticsearch not to limit the number of returned buckets. It is usually not wise to set the size property to 0 as it can result in high resource consumption. Also, avoid setting the size property to 0 for high cardinality fields as this will likely make your Elasticsearch cluster explode.
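Why the counts can be off is easiest to see on a tiny example. The following Python sketch simulates two shards that each report only their most frequent terms, followed by the merge on the coordinating node. The terms_agg function and the term distribution are hypothetical and only illustrate the principle; they do not reflect the exact internals of Elasticsearch.

```python
from collections import Counter


def terms_agg(shards, size, shard_size):
    """Merge the top `shard_size` terms of every shard, then keep `size`."""
    merged = Counter()
    for shard_terms in shards:
        counts = Counter(shard_terms)
        # Each shard reports only its own most frequent terms.
        top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:shard_size]
        for term, count in top:
            merged[term] += count
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:size]


# Hypothetical term occurrences on two shards.
shard_1 = ["a"] * 10 + ["b"] * 9 + ["c"] * 8
shard_2 = ["c"] * 10 + ["d"] * 9 + ["b"] * 1

approximate = terms_agg([shard_1, shard_2], size=2, shard_size=2)
exact = terms_agg([shard_1, shard_2], size=2, shard_size=4)
print(approximate)
print(exact)
```

With shard_size set to 2, the first shard never reports its 8 occurrences of the term c, so the merged count for c is 10 instead of the real 18. Asking each shard for more terms removes the error in this example, at the price of more data sent to the coordinating node.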

Minimum document count

Elasticsearch provides us with two additional properties, which can be useful in certain situations: min_doc_count and shard_min_doc_count. The min_doc_count property defaults to 1 and specifies how many documents must match a term to be included in the aggregation results. One thing to remember is that setting the min_doc_count property to 0 will result in returning all the terms, no matter if they have a matching document or not. This can result in a very large result set for aggregation results. For example, if we want to return terms matched by 5 or more documents, we will run the following query:

{

"aggs":{

"counts":{

"terms":{

"field":"available",

"min_doc_count":5}

}

}

}

The shard_min_doc_count property is very similar and defines how many documents must match a term to be included in the aggregation’s results, but on the shard level.

Range aggregation

The range aggregation allows us to define one or more ranges and Elasticsearch calculates buckets for them. For example, if we want to check how many books were published in a given period of time, we create the following query:

{

"aggs":{

"years":{

"range":{

"field":"year",

"ranges":[

{"to":1850},

{"from":1851,"to":1900},

{"from":1901,"to":1950},

{"from":1951,"to":2000},

{"from":2001}

]

}

}

}

}

We specify the field we want the aggregation to be calculated on and the array of ranges. Each range is defined by one or two properties: to and from, similar to the range queries which we already discussed.

The result returned by Elasticsearch for our data looks as follows:

{

"took":23,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"years":{

"buckets":[{

"key":"*-1850.0",

"to":1850.0,

"to_as_string":"1850.0",

"doc_count":0

},{

"key":"1851.0-1900.0",

"from":1851.0,

"from_as_string":"1851.0",

"to":1900.0,

"to_as_string":"1900.0",

"doc_count":1

},{

"key":"1901.0-1950.0",

"from":1901.0,

"from_as_string":"1901.0",

"to":1950.0,

"to_as_string":"1950.0",

"doc_count":2

},{

"key":"1951.0-2000.0",

"from":1951.0,

"from_as_string":"1951.0",

"to":2000.0,

"to_as_string":"2000.0",

"doc_count":1

},{

"key":"2001.0-*",

"from":2001.0,

"from_as_string":"2001.0",

"doc_count":0

}]

}

}

}

For example, between 1901 and 1950 we had two books released.
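The bucket counts can be reproduced with a short Python sketch. It is an illustration under stated assumptions: the range_agg helper is hypothetical, the year values are inferred from the statistics shown earlier in this chapter, and each range is treated as including its from value and excluding its to value.

```python
def range_agg(values, ranges):
    """Count values per range; `from` is inclusive and `to` is exclusive."""
    buckets = []
    for r in ranges:
        low = r.get("from", float("-inf"))
        high = r.get("to", float("inf"))
        count = sum(1 for v in values if low <= v < high)
        buckets.append({"from": r.get("from"), "to": r.get("to"), "doc_count": count})
    return buckets


# Assumed year values, consistent with the statistics shown earlier.
years = [1886, 1929, 1936, 1961]
ranges = [
    {"to": 1850},
    {"from": 1851, "to": 1900},
    {"from": 1901, "to": 1950},
    {"from": 1951, "to": 2000},
    {"from": 2001},
]
counts = [b["doc_count"] for b in range_agg(years, ranges)]
print(counts)
```

One consequence of the exclusive upper bound is worth noticing: with the ranges defined this way, a book published in exactly 1850, 1900, 1950, or 2000 would not fall into any bucket.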

Note

The range aggregation, similar to the min, max, avg, and sum aggregations discussed in the metrics aggregations section of this chapter, supports scripting and allows us to specify which value should be used for documents that don’t have a value in the specified field.

Keyed buckets

One thing that we should mention when it comes to the range aggregation is that we can give the defined ranges names. For example, let’s assume that we want to use the names Before 18th century for the books released before 1799, 18th century for the books released between 1800 and 1899, 19th century for the books released between 1900 and 1999, and After 19th century for the books released from 2000 onward. We can do this by adding the key property to each defined range, giving it the name, and adding the keyed property set to true. Setting the keyed property to true will associate a unique string value to each bucket and the key property defines the name for the bucket that will be used as the unique name. A query that does that will look as follows:

{

"aggs":{

"years":{

"range":{

"field":"year",

"keyed":true,

"ranges":[

{"key":"Before 18th century","to":1799},

{"key":"18th century","from":1800,"to":1899},

{"key":"19th century","from":1900,"to":1999},

{"key":"After 19th century","from":2000}

]

}

}

}

}

The response returned by Elasticsearch in such a case will look as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"years":{

"buckets":{

"Before 18th century":{

"to":1799.0,

"to_as_string":"1799.0",

"doc_count":0

},

"18th century":{

"from":1800.0,

"from_as_string":"1800.0",

"to":1899.0,

"to_as_string":"1899.0",

"doc_count":1

},

"19th century":{

"from":1900.0,

"from_as_string":"1900.0",

"to":1999.0,

"to_as_string":"1999.0",

"doc_count":3

},

"After 19th century":{

"from":2000.0,

"from_as_string":"2000.0",

"doc_count":0

}

}

}

}

}

Note

An important and quite useful point about the range aggregation is that the defined ranges need not be disjoint. In such cases, Elasticsearch will properly count the document for multiple buckets.

Date range aggregation

The date_range aggregation is similar to the previously discussed range aggregation but it is designed for fields that use date-based types. However, in the library index, the documents have years, but the field is a number, not a date. For the purpose of showing how this aggregation works, let’s imagine that we want to extend our library index to support newspapers. To do this we will create a new index (called library2) by using the following command:

curl -XPOST 'localhost:9200/_bulk' --data-binary '{"index":{"_index":"library2","_type":"book","_id":"1"}}
{"title":"Fishing news","published":"2010/12/03 10:00:00","copies":3,"available":true}
{"index":{"_index":"library2","_type":"book","_id":"2"}}
{"title":"Knitting magazine","published":"2010/11/07 11:32:00","copies":1,"available":true}
{"index":{"_index":"library2","_type":"book","_id":"3"}}
{"title":"The guardian","published":"2009/07/13 04:33:00","copies":0,"available":false}
{"index":{"_index":"library2","_type":"book","_id":"4"}}
{"title":"Hadoop World","published":"2012/01/01 04:00:00","copies":6,"available":true}
'

For the purpose of this example, we will leave the mappings definition to Elasticsearch; this is sufficient in this case. Let’s start with the first query using the date_range aggregation:

{

"aggs":{

"years":{

"date_range":{

"field":"published",

"ranges":[

{"to":"2009/12/31"},

{"from":"2010/01/01","to":"2010/12/31"},

{"from":"2011/01/01"}

]

}

}

}

}

Compared with the ordinary range aggregation, the only thing that changed is the aggregation type, which is now date_range. The dates can be passed as a string in a form recognized by Elasticsearch or as a number value (the number of milliseconds since 1970-01-01). The response returned by Elasticsearch for the preceding query looks as follows:

{

"took":5,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"years":{

"buckets":[{

"key":"*-2009/12/31 00:00:00",

"to":1.2622176E12,

"to_as_string":"2009/12/31 00:00:00",

"doc_count":1

},{

"key":"2010/01/01 00:00:00-2010/12/31 00:00:00",

"from":1.262304E12,

"from_as_string":"2010/01/01 00:00:00",

"to":1.2937536E12,

"to_as_string":"2010/12/31 00:00:00",

"doc_count":2

},{

"key":"2011/01/01 00:00:00-*",

"from":1.29384E12,

"from_as_string":"2011/01/01 00:00:00",

"doc_count":1

}]

}

}

}

As you can see, the response is no different when compared to the response returned by the range aggregation. We have two attributes for each bucket, named from and to, which represent the number of milliseconds since 1970-01-01. The properties from_as_string and to_as_string present the same information as from and to, but in a human-readable form. Of course, the keyed parameter and key in the definition of the date range work in the already described way.

Elasticsearch also allows us to define the format of presented dates using the format attribute. In our example, we presented the dates with year resolution, so the day and time parts were unnecessary. If we want to show the month names, we can send a query such as the following one:

{

"aggs":{

"years":{

"date_range":{

"field":"published",

"format":"MMMM YYYY",

"ranges":[

{"to":"December 2009"},

{"from":"January 2010","to":"December 2010"},

{"from":"January 2011"}

]

}

}

}

}

Note that the dates in the to and from parameters also need to be provided in the specified format. One of the returned ranges looks as follows:

{

"key":"January 2010-December 2010",

"from":1.262304E12,

"from_as_string":"January 2010",

"to":1.2911616E12,

"to_as_string":"December 2010",

"doc_count":1

}

Note

The available formats we can use in format are defined in the Joda Time library. The full list is available at http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html.

There is one more thing about the date_range aggregation that we want to mention. Imagine that at some point we want to build an aggregation that can change with time. For example, we may want to see how many newspapers were published in the last 3, 6, 9, and 12 months. This is possible without the need to adjust the query every time, as we can use date math expressions such as now-9M. The following example shows this:

{

"aggs":{

"years":{

"date_range":{

"field":"published",

"format":"dd-MM-YYYY",

"ranges":[

{"to":"now-9M/M"},

{"to":"now-9M"},

{"from":"now-9M/M","to":"now-6M/M"},

{"from":"now-3M/M"}

]

}

}

}

}

The key here is expressions such as now-9M. Elasticsearch does the math and generates the appropriate value. You can use y (year), M (month), w (week), d (day), h (hour), m (minute), and s (second). For example, the expression now+3d means three days from now. The /M in our example rounds the date down to the month. Thanks to such notation, we only count full months. The second advantage is that the calculated date is more cache-friendly: without the rounding, the date changes every millisecond, which makes every cache based on the range irrelevant and basically useless in most cases.
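To make the effect of the rounding more tangible, here is a small Python sketch of what an expression such as now-9M/M evaluates to. It is an illustration only, not the Elasticsearch implementation: the helper names minus_months and round_down_to_month are hypothetical, the value of now is fixed so that the result is reproducible, and clamping the day of the month is an assumption made for this sketch.

```python
import calendar
from datetime import datetime


def minus_months(moment, months):
    """Shift a datetime back by a number of calendar months (the -9M part)."""
    index = moment.year * 12 + (moment.month - 1) - months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    # Clamp the day, so that 31 May minus 3 months lands on the last day of February.
    last_day = calendar.monthrange(year, month)[1]
    return moment.replace(year=year, month=month, day=min(moment.day, last_day))


def round_down_to_month(moment):
    """Round a datetime down to the first instant of its month (the /M part)."""
    return moment.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


now = datetime(2016, 3, 15, 10, 30)      # fixed 'now' used only for this example
unrounded = minus_months(now, 9)         # corresponds to now-9M
rounded = round_down_to_month(unrounded)  # corresponds to now-9M/M

print(unrounded)
print(rounded)
```

The unrounded value moves with every call, while the rounded value stays the same for the whole month, which is what makes the rounded range boundaries reusable.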

IPv4 range aggregation

A very interesting aggregation is the ip_range one, as it works on Internet addresses. It works on fields defined with the ip type and allows defining ranges either by explicit from and to addresses or by an IP range in CIDR notation (http://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing). An example usage of the ip_range aggregation looks as follows:

{

"aggs":{

"access":{

"ip_range":{

"field":"ip",

"ranges":[

{"from":"192.168.0.1","to":"192.168.0.254"},

{"mask":"192.168.1.0/24"}

]

}

}

}

}

The response to the preceding query is as follows:

"access":{

"buckets":[

{

"from":3232235521,

"from_as_string":"192.168.0.1",

"to":3232235774,

"to_as_string":"192.168.0.254",

"doc_count":0

},

{

"key":"192.168.1.0/24",

"from":3232235776,

"from_as_string":"192.168.1.0",

"to":3232236032,

"to_as_string":"192.168.2.0",

"doc_count":4

}

]

}

Similar to the range aggregation, we define either both ends of the range or a mask. The rest is done by Elasticsearch itself.
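The numeric from and to values in the response are simply the addresses written as 32-bit integers. A short Python sketch using the standard ipaddress module reproduces them; the function name mask_to_range is hypothetical, and treating the upper bound as exclusive is an assumption based on the to value shown for the mask in the response above.

```python
import ipaddress


def mask_to_range(mask):
    """Return (from, to) integers for a CIDR mask; 'to' is the first address past the block."""
    network = ipaddress.ip_network(mask)
    start = int(network.network_address)
    end = start + network.num_addresses  # exclusive upper bound
    return start, end


start, end = mask_to_range("192.168.1.0/24")
print(start, ipaddress.ip_address(start))
print(end, ipaddress.ip_address(end))

# The explicitly defined range from the query, converted the same way.
print(int(ipaddress.ip_address("192.168.0.1")), int(ipaddress.ip_address("192.168.0.254")))
```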

Missing aggregation

The missing aggregation allows us to create a bucket and see how many documents have no value in a specified field. For example, we can check how many of our books in the library index don’t have the original title defined (the otitle field). To do this, we run the following query:

{

"aggs":{

"missing_original_title":{

"missing":{

"field":"otitle"

}

}

}

}

The response returned by Elasticsearch in this case will look as follows:

{

"took":15,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"missing_original_title":{

"doc_count":2

}

}

}

As we can see, we have two documents without the otitle field.

Histogram aggregation

The histogram aggregation is an interesting one because of its automation. This aggregation defines buckets itself. We are only responsible for defining the field and the interval, and the rest is done automatically. The simplest form of a query that uses this aggregation looks as follows:

{

"aggs":{

"years":{

"histogram":{

"field":"year",

"interval":100

}

}

}

}

The new information we need to provide is interval, which defines the length of every range that will be used to create a bucket. We set the interval to 100, which in our case will result in buckets that are 100 years wide. The response to the preceding query that was sent to our library index is as follows:

{

"took":13,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"years":{

"buckets":[{

"key":1800,

"doc_count":1

},{

"key":1900,

"doc_count":3

}]

}

}

}

Similar to the range aggregation, the histogram aggregation allows us to use the keyed property to define named buckets. The other available option is min_doc_count, which allows us to specify the minimum number of documents required to create a bucket. If we set the min_doc_count property to zero, Elasticsearch will also include buckets with the document count of zero. We can also use the missing property to specify the value Elasticsearch should use when a document doesn’t have a value in the specified field.
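The bucketing rule behind the histogram can be sketched in Python: each value is rounded down to a multiple of the interval, and that multiple becomes the bucket key. This is a simplified illustration, not the Elasticsearch implementation. The function name histogram_buckets is hypothetical, the publication years 1886, 1929, 1936, and 1961 are assumed for the four books in the library index, and the default of min_doc_count=1 simply mirrors the response shown earlier.

```python
import math
from collections import Counter


def histogram_buckets(values, interval, min_doc_count=1):
    """Group values into fixed-width buckets keyed by the lower bound of each bucket."""
    counts = Counter(math.floor(value / interval) * interval for value in values)
    if min_doc_count == 0 and counts:
        # Also report the empty buckets between the lowest and the highest key.
        key, highest = min(counts), max(counts)
        while key <= highest:
            counts.setdefault(key, 0)
            key += interval
    return [
        {"key": key, "doc_count": count}
        for key, count in sorted(counts.items())
        if count >= min_doc_count
    ]


years = [1886, 1929, 1936, 1961]  # assumed publication years
print(histogram_buckets(years, 100))
print(histogram_buckets([1700, 1961], 100, min_doc_count=0))
```

The first call gives the 1800 and 1900 buckets from the response above; the second call shows how setting min_doc_count to zero brings in the empty 1800 bucket.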

Date histogram aggregation

As the date_range aggregation is a specialized form of the range aggregation, date_histogram is an extension of the histogram aggregation that works on dates. For the purpose of this example, we will again use the data we indexed when discussing the date range aggregation. This means that we will run our queries against the index called library2. An example query using the date_histogram aggregation looks as follows:

{

"aggs":{

"years":{

"date_histogram":{

"field":"published",

"format":"yyyy-MM-dd HH:mm",

"interval":"10d",

"min_doc_count":1

}

}

}

}

The difference between the histogram and date_histogram aggregations is the interval property. The value of this property is now a string describing the time interval, which in our case is 10 days. Of course we can set it to anything we want. It uses the same unit suffixes we discussed for the date_range aggregation. It is worth mentioning that the number can be a float value. For example, 1.5m means that the length of the bucket will be one and a half minutes. The format attribute is the same as in the date_range aggregation. Thanks to it, Elasticsearch can add a human-readable date text according to the defined format. The format attribute is not required, but it is useful. In addition to that, similar to the other range aggregations, the keyed and min_doc_count attributes still work.

Time zones

Elasticsearch stores all the dates in the UTC time zone. You can define the time zone to be used by Elasticsearch by using the time_zone attribute. By setting this property, we basically tell Elasticsearch which time zone should be used to perform the calculations. There are three notations with which to set this attribute:

We can set the hours offset; for example, time_zone:5

We can use the time format; for example, time_zone:"-04:30"

We can use the name of the time zone; for example, time_zone:"Europe/Warsaw"

Note

Look at http://joda-time.sourceforge.net/timezones.html to see the available time zones.

Geo distance aggregations

The next two aggregations are connected with maps and spatial searches. We will talk about geo types and queries in the Elasticsearch spatial capabilities section of Chapter 8, Beyond Full-text Searching, so feel free to skip these two topics now and return to them later.

Look at the following query:

{

"aggs":{

"neighborhood":{

"geo_distance":{

"field":"location",

"origin":[-0.1275,51.507222],

"ranges":[

{"to":1200},

{"from":1201}

]

}

}

}

}

You can see that the query is similar to the range aggregation. The preceding aggregation will calculate the number of documents that fall into two buckets: one closer than 1200 km and the second one further than 1200 km from the geographical point defined by the origin property (in the preceding case, the origin is London). The aggregation section of the response returned by Elasticsearch looks as follows:

"neighborhood":{

"buckets":[

{

"key":"*-1200.0",

"from":0,

"to":1200,

"doc_count":1

},

{

"key":"1201.0-*",

"from":1201,

"doc_count":4

}

]

}

The keyed and the key attributes work in the geo_distance aggregation as well, so we can easily modify the response to our needs and create named buckets.

The geo_distance aggregation supports a few additional parameters that are shown in the following query:

{

"aggs":{

"neighborhood":{

"geo_distance":{

"field":"location",

"origin":{"lon":-0.1275,"lat":51.507222},

"unit":"m",

"distance_type":"plane",

"ranges":[

{"to":1200},

{"from":1201}

]

}

}

}

}

Three things are new in the preceding query. The first change is how we defined the origin point. This time we specified the location by providing the latitude and longitude explicitly.

The second change is the unit attribute. It defines the units used in the ranges array. The possible values are: km (the default, kilometers), mi (miles), in (inches), yd (yards), m (meters), cm (centimeters), and mm (millimeters).

The last attribute, distance_type, specifies how Elasticsearch calculates the distance. The possible values are (from the fastest but least accurate to the slowest but the most accurate): plane, sloppy_arc (the default), and arc.

Geohash grid aggregation

The second aggregation related to geographical analysis is based on grids and is called geohash_grid. It organizes areas into grids and assigns every location to a cell in such a grid. To do this efficiently, Elasticsearch uses Geohash (http://en.wikipedia.org/wiki/Geohash), which encodes the location into a string. The longer the string is, the more accurate the description of a particular location. For example, one letter is sufficient to declare a box roughly five thousand kilometers on each side, and five letters are enough to increase the accuracy to a box roughly five kilometers on each side. Let’s look at the following query:

{

"aggs":{

"neighborhood":{

"geohash_grid":{

"field":"location",

"precision":5

}

}

}

}

We defined the geohash_grid aggregation with buckets that are roughly five kilometers on each side (the precision attribute describes the number of letters used in the geohash string). The table with resolutions versus the length of geohash can be found at https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-geohashgrid-aggregation.html.

Of course, the more accurate we want the aggregation to be, the more resources Elasticsearch will consume, because of the number of buckets that the aggregation has to calculate. By default, Elasticsearch does not generate more than 10,000 buckets. You can change this behavior by using the size attribute, but keep in mind that the performance may suffer for very wide queries consisting of thousands of buckets.

Global aggregation

The global aggregation is an aggregation that defines a single bucket containing all the documents from a given index and type, not influenced by the query itself. The thing that differentiates the global aggregation from all the others is that the global aggregation has an empty body. For example, look at the following query:

{

"query":{

"term":{

"available":"true"

}

},

"aggs":{

"all_books":{

"global":{}

}

}

}

In our library index, the query matches only the available books, but the response to the preceding query looks as follows:

{

"took":1,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":3,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"all_books":{

"doc_count":4

}

}

}

As you can see, the global aggregation is not bound by the query. Because the result of the global aggregation is a single bucket containing all the documents (not narrowed down by the query itself), it is a perfect candidate for use as a top-level parent aggregation for nesting aggregations.

Significant terms aggregation

The significant_terms aggregation allows us to get the terms that are relevant and probably the most significant for a given query. The good thing is that it doesn’t only show the top terms from the results of the given query, but also the ones that seem to be the most important.

The use cases for this aggregation type can vary from finding the most troublesome server working in your application environment to suggesting nicknames from text. Whenever Elasticsearch sees a significant change in the popularity of a term, such a term is a candidate for being significant.

Note

Remember that the significant_terms aggregation is very expensive when it comes to resources and running against large indices. Work is being done to provide a lightweight version of that aggregation; as a result, the API for the significant_terms aggregation may change in the future.

The best way to describe the significant_terms aggregation type is to use an example. Let’s start with indexing 12 simple documents, which represent reviews of work done by interns:

curl -XPOST 'localhost:9200/interns/review/1' -d '{"intern":"Richard","grade":"bad","type":"grade"}'
curl -XPOST 'localhost:9200/interns/review/2' -d '{"intern":"Ralf","grade":"perfect","type":"grade"}'
curl -XPOST 'localhost:9200/interns/review/3' -d '{"intern":"Richard","grade":"bad","type":"grade"}'
curl -XPOST 'localhost:9200/interns/review/4' -d '{"intern":"Richard","grade":"bad","type":"review"}'
curl -XPOST 'localhost:9200/interns/review/5' -d '{"intern":"Richard","grade":"good","type":"grade"}'
curl -XPOST 'localhost:9200/interns/review/6' -d '{"intern":"Ralf","grade":"good","type":"grade"}'
curl -XPOST 'localhost:9200/interns/review/7' -d '{"intern":"Ralf","grade":"perfect","type":"review"}'
curl -XPOST 'localhost:9200/interns/review/8' -d '{"intern":"Richard","grade":"medium","type":"review"}'
curl -XPOST 'localhost:9200/interns/review/9' -d '{"intern":"Monica","grade":"medium","type":"grade"}'
curl -XPOST 'localhost:9200/interns/review/10' -d '{"intern":"Monica","grade":"medium","type":"grade"}'
curl -XPOST 'localhost:9200/interns/review/11' -d '{"intern":"Ralf","grade":"good","type":"grade"}'
curl -XPOST 'localhost:9200/interns/review/12' -d '{"intern":"Ralf","grade":"good","type":"grade"}'

Of course, to show the real power of the significant_terms aggregation, we should use a much larger dataset. However, for the purpose of this book, we will concentrate on this example, so it is easier to illustrate how this aggregation works.

Now let’s try finding the most significant grade for Richard. To do this we will use the following query:

curl -XGET 'localhost:9200/interns/_search?size=0&pretty' -d '{

"query":{

"match":{

"intern":"Richard"

}

},

"aggregations":{

"description":{

"significant_terms":{

"field":"grade"

}

}

}

}'

The result of the preceding query looks as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":5,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"description":{

"doc_count":5,

"buckets":[{

"key":"bad",

"doc_count":3,

"score":0.84,

"bg_count":3

}]

}

}

}

As you can see, for our query Elasticsearch informed us that the most significant grade for Richard is bad. Maybe it wasn’t the best internship for him; who knows.

Choosing significant terms

To calculate significant terms, Elasticsearch looks for data that reports a significant change in their popularity between two sets of data: the foreground set and the background set. The foreground set is the data returned by our query, while the background set is the data in our index (or indices, depending on how we run our queries). If a term exists in 10 documents out of one million indexed, but appears in 5 documents from the 10 returned, then such a term is definitely significant and worth concentrating on.

Let’s get back to our preceding example now to analyze it a bit. Richard got three different grades from the reviewers: bad three times, medium one time, and good one time. In general, the bad grade appeared in three documents (the bg_count property) out of the 12 documents in the index (this is our background set). This gives us 25 percent of the indexed documents. On the other hand, the bad grade appeared in three out of the five documents matching the query (this is our foreground set), which gives us 60 percent of the documents. As you can see, the change in popularity is significant for the bad grade and that’s why Elasticsearch has returned it in the significant_terms aggregation results.
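The two percentages can be combined into a single score. The following Python sketch multiplies the absolute change in popularity by the relative change, which is the idea behind the JLH heuristic. That Elasticsearch used exactly this formula here is an assumption; it is supported only by the fact that the sketch reproduces the score of 0.84 shown in the response for Richard. The function name jlh_score is hypothetical.

```python
def jlh_score(fg_count, fg_total, bg_count, bg_total):
    """Absolute change in popularity multiplied by the relative change."""
    fg_share = fg_count / fg_total  # share of the term in the foreground set
    bg_share = bg_count / bg_total  # share of the term in the background set
    if fg_share <= bg_share:
        # The term is not more popular in the foreground set, so it is not significant.
        return 0.0
    return (fg_share - bg_share) * (fg_share / bg_share)


# "bad" for Richard: 3 of his 5 documents, 3 of the 12 documents in the index.
richard_bad = jlh_score(3, 5, 3, 12)
print(round(richard_bad, 2))
```

With 60 percent in the foreground and 25 percent in the background, the absolute change is 0.35 and the relative change is 2.4, which gives 0.84.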

Multiple value analysis

The significant_terms aggregation can be nested and provide us with nice data analysis capabilities that connect multiple sets of data. For example, let’s try to find a significant grade for each of the interns that we have information about. To do this we will nest the significant_terms aggregation inside the terms aggregation. The query that does that looks as follows:

curl -XGET 'localhost:9200/interns/_search?size=0&pretty' -d '{

"aggregations":{

"grades":{

"terms":{

"field":"intern"

},

"aggregations":{

"significantGrades":{

"significant_terms":{

"field":"grade"

}

}

}

}

}

}'

The results returned by Elasticsearch for the preceding query are as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":12,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"grades":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":"ralf",

"doc_count":5,

"significantGrades":{

"doc_count":5,

"buckets":[{

"key":"good",

"doc_count":3,

"score":0.48,

"bg_count":4

}]

}

},{

"key":"richard",

"doc_count":5,

"significantGrades":{

"doc_count":5,

"buckets":[{

"key":"bad",

"doc_count":3,

"score":0.84,

"bg_count":3

}]

}

},{

"key":"monica",

"doc_count":2,

"significantGrades":{

"doc_count":2,

"buckets":[]

}

}]

}

}

}

Sampler aggregation

The sampler aggregation is one of the experimental aggregations in Elasticsearch. It allows us to limit the sub-aggregation processing to a sample of the top-scoring documents. This allows filtering and potential removal of garbage in the data. It is a very nice candidate as a top-level aggregation to limit the amount of data the significant_terms aggregation runs on. The simplest example of using this aggregation is as follows:

{

"aggs":{

"sampler_example":{

"sampler":{

"field":"tags",

"max_docs_per_value":1,

"shard_size":10

},

"aggs":{

"best_terms":{

"terms":{

"field":"title"

}

}

}

}

}

}

To see the real power of sampling, we will have to play with it on a larger dataset, but for now we will discuss the preceding example. The sampler aggregation was defined with three properties: field, max_docs_per_value, and shard_size. The first two properties allow us to control the diversity of the sampling. We tell Elasticsearch how many documents at maximum (the value of the max_docs_per_value property) can be collected on a shard with the same value in the defined field (the value of the field property).

The shard_size property tells Elasticsearch how many documents (at most) to collect from each shard.

Children aggregation

The children aggregation is a single-bucket aggregation that creates a bucket with all the children of the specified type. Let’s get back to the Using the parent-child relationship section in Chapter 5, Extending Your Index Structure, and let’s use the created shop index. To create a bucket of all children documents with the variation type in the shop index, we run the following query:

{

"aggs":{

"variation_children":{

"children":{

"type":"variation"

}

}

}

}

The response returned by Elasticsearch is as follows:

{

"took":4,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":3,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"variation_children":{

"doc_count":2

}

}

}

Note

Because the children aggregation uses parent-child functionality, it relies on the _parent field, which needs to be present.

Nested aggregation

In the Using nested objects section of Chapter 5, Extending Your Index Structure, we learned about nested documents. Let’s use that data to look into the next type of aggregation, the nested one. Let’s create the simplest working query, which looks like this (we use the shop_nested index created in the mentioned chapter):

{

"aggs":{

"variations":{

"nested":{

"path":"variation"

}

}

}

}

The preceding query is similar in structure to any other aggregation. However, instead of providing the field name on which the aggregation should be calculated, it contains a single parameter, path, which points to the nested document. In the response we get the number of nested documents:

{

"took":4,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"variations":{

"doc_count":2

}

}

}

The preceding response means that we have two nested documents in the index under the provided path, variation.

Reverse nested aggregation

The reverse_nested aggregation is a special, single-bucket aggregation that allows aggregation on parent documents from the nested documents. Similar to the global aggregation, the reverse_nested aggregation doesn’t have a body. It sounds quite complicated, but it is not. Let’s look at the following query that we run against the shop_nested index created in the Using nested objects section of Chapter 5, Extending Your Index Structure:

{

"aggs":{

"variations":{

"nested":{

"path":"variation"

},

"aggs":{

"sizes":{

"terms":{

"field":"variation.size"

},

"aggs":{

"product_name_terms":{

"reverse_nested":{},

"aggs":{

"product_name_terms_per_size":{

"terms":{

"field":"name"

}

}

}

}

}

}

}

}

}

}

We start with the top-level aggregation, which is the same nested aggregation that we used when discussing the nested aggregation. However, we include a sub-aggregation that uses reverse_nested to be able to show terms from the name field of the product for each size returned by the top-level nested aggregation. This is possible because, when the reverse_nested aggregation is used, Elasticsearch calculates the data on the basis of the parent documents instead of using the nested documents.

Note

Remember that the reverse_nested aggregation must be used inside the nested aggregation.

The response to the preceding query will look as follows:

{

"took":7,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"variations":{

"doc_count":2,

"sizes":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":"XL",

"doc_count":1,

"product_name_terms":{

"doc_count":1,

"product_name_terms_per_size":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":"shirt",

"doc_count":1

},{

"key":"test",

"doc_count":1

}]

}

}

},{

"key":"XXL",

"doc_count":1,

"product_name_terms":{

"doc_count":1,

"product_name_terms_per_size":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":"shirt",

"doc_count":1

},{

"key":"test",

"doc_count":1

}]

}

}

}]

}

}

}

}

Nesting aggregations and ordering buckets

When talking about bucket aggregations, we need to get back to the topic of nesting aggregations. This is a very powerful technique, because it allows you to further process the data for documents in the buckets. For instance, the terms aggregation will return a bucket for each term, and the stats aggregation can show us the statistics for documents in each bucket. Let’s look at the following query:

{

"aggs":{

"copies":{

"terms":{

"field":"copies"

},

"aggs":{

"years":{

"stats":{

"field":"year"

}

}

}

}

}

}

This is an example of nested aggregations. The terms aggregation will return buckets for each term from the copies field (three buckets in the case of our data), and the stats aggregation will calculate statistics for the year field for the documents falling into each bucket returned by the top aggregation. The response from Elasticsearch for the preceding query looks as follows:

{

"took":3,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"copies":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":0,

"doc_count":2,

"years":{

"count":2,

"min":1886.0,

"max":1936.0,

"avg":1911.0,

"sum":3822.0

}

},{

"key":1,

"doc_count":1,

"years":{

"count":1,

"min":1929.0,

"max":1929.0,

"avg":1929.0,

"sum":1929.0

}

},{

"key":6,

"doc_count":1,

"years":{

"count":1,

"min":1961.0,

"max":1961.0,

"avg":1961.0,

"sum":1961.0

}

}]

}

}

}

This is a powerful feature and allows us to build very complex data processing pipelines. Of course, we are not limited to a single nested aggregation; we can nest several of them and even nest an aggregation inside a nested aggregation. For example:

{

"aggs":{

"popular_tags":{

"terms":{

"field":"copies"

},

"aggs":{

"years":{

"terms":{

"field":"year"

},

"aggs":{

"available_by_year":{

"stats":{

"field":"available"

}

}

}

},

"available":{

"stats":{

"field":"available"

}

}

}

}

}

}

As you can see, the possibilities are almost unlimited, if you have enough memory and CPU power to handle very complicated aggregations.

Buckets ordering

There is one more feature related to nested aggregations: the ordering of aggregation results. Elasticsearch can use values from the nested aggregations to sort the parent buckets. For example, let’s look at the following query:

{

"aggs":{

"availability":{

"terms":{

"field":"copies",

"order":{"numbers.avg":"desc"}

},

"aggs":{

"numbers":{"stats":{}}

}

}

}

}

In the previous example, the order in the availability aggregation is based on the average value from the numbers aggregation. The notation numbers.avg is required in this case, because stats is a multi-valued aggregation that returns several metrics, and we were interested in the average. If it were the sum aggregation, the name of the aggregation would be sufficient.
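What ordering by numbers.avg means for the returned buckets can be sketched in Python. This is an illustration only: the function name order_buckets is hypothetical, and the bucket contents assume that the numbers aggregation computes statistics on the year field, reusing the averages from the nested stats response shown earlier in this section.

```python
def order_buckets(buckets, path, direction="desc"):
    """Sort parent buckets by one metric of a nested multi-valued aggregation."""
    aggregation_name, _, metric = path.partition(".")
    return sorted(
        buckets,
        key=lambda bucket: bucket[aggregation_name][metric],
        reverse=(direction == "desc"),
    )


# Buckets of the terms aggregation on copies, each with an assumed nested stats result.
buckets = [
    {"key": 0, "doc_count": 2, "numbers": {"avg": 1911.0, "sum": 3822.0}},
    {"key": 1, "doc_count": 1, "numbers": {"avg": 1929.0, "sum": 1929.0}},
    {"key": 6, "doc_count": 1, "numbers": {"avg": 1961.0, "sum": 1961.0}},
]

ordered_keys = [bucket["key"] for bucket in order_buckets(buckets, "numbers.avg")]
print(ordered_keys)
```

The part before the dot selects the nested aggregation and the part after it selects the metric, which is why a single-valued aggregation would need only its name.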

Pipeline aggregations

The last type of aggregation we will discuss is pipeline aggregations. Till now we’ve learned about metrics aggregations and bucket aggregations. The first type returned metrics, while the second type returned buckets. Both metrics and bucket aggregations worked on the basis of the returned documents. Pipeline aggregations are different. They work on the output of the other aggregations and their metrics, allowing functionalities such as moving-average calculations (https://en.wikipedia.org/wiki/Moving_average).

Note

Remember that pipeline aggregations were introduced in Elasticsearch 2.0 and are considered experimental. This means that the API can change in the future, breaking backwards-compatibility.

Available types

There are two types of pipeline aggregation. The so-called parent aggregations family works on the output of its parent aggregation. These aggregations are able to produce new buckets or new aggregations to add to existing buckets. The second type is called sibling aggregations. These aggregations work on the output of a sibling aggregation and are able to produce new aggregations on the same level as that sibling.

Referencing other aggregations

Because of their nature, pipeline aggregations need to be able to access the results of other aggregations. We can do that via the buckets_path property, which is defined using a specific format. We can use a few special characters that tell Elasticsearch exactly which aggregation and metric we are interested in. The > character separates the aggregations and the . character separates the aggregation from its metrics. For example, my_sum.sum means that we take the sum metric of an aggregation called my_sum. Another example is popular_tags>my_sum.sum, which means that we are interested in the sum metric of a subaggregation called my_sum, which is nested inside the popular_tags aggregation. In addition to this, we can use a special path called _count. This can be used to calculate the pipeline aggregations on the document count instead of on a specified metric.
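To make the path format concrete, here is a minimal Python sketch that splits a buckets_path value into its aggregation chain and metric. The parse_buckets_path function is a name made up for this illustration; it is not part of Elasticsearch, and it only handles the > and . separators described above.

```python
def parse_buckets_path(path):
    """Split a buckets_path string into (aggregation names, metric).

    Simplified illustration: ">" separates nested aggregations and "."
    separates the last aggregation from one of its metrics.
    """
    *parents, last = path.split(">")
    name, _, metric = last.partition(".")
    # metric is None when the path points at a single-value aggregation
    return parents + [name], (metric or None)


print(parse_buckets_path("my_sum.sum"))
print(parse_buckets_path("popular_tags>my_sum.sum"))
print(parse_buckets_path("copies_per_100_years"))
```

The first call yields the my_sum aggregation with the sum metric, the second yields the chain popular_tags, my_sum with the same metric, and the third has no metric part at all.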

Gaps in the data

Our data can contain gaps: situations where the data doesn’t exist. For such use cases, we have the ability to specify the gap_policy property and set it to skip or insert_zeros. The skip value tells Elasticsearch to ignore the missing data and continue from the next available value, while insert_zeros replaces the missing values with zero.
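The difference between the two policies can be sketched in a few lines of Python. This is only a toy model of the behavior described above, with None standing in for a bucket that has no data; apply_gap_policy is a hypothetical helper, not an Elasticsearch API.

```python
def apply_gap_policy(values, gap_policy="skip"):
    """Return the series a pipeline aggregation would work on.

    None marks a gap (a bucket without data).
    """
    if gap_policy == "skip":
        # Ignore the missing entries and continue with the next value.
        return [v for v in values if v is not None]
    if gap_policy == "insert_zeros":
        # Replace every missing entry with zero.
        return [0.0 if v is None else v for v in values]
    raise ValueError("unknown gap_policy: " + gap_policy)


series = [3.0, None, 4.0]
print(apply_gap_policy(series, "skip"))          # [3.0, 4.0]
print(apply_gap_policy(series, "insert_zeros"))  # [3.0, 0.0, 4.0]
```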

Pipeline aggregation types

Most of the aggregations we will show in this section are very similar to the ones we’ve already seen in the sections about metrics and buckets aggregations. Because of that, we won’t discuss them in depth. There are also new, specific pipeline aggregations that we want to talk about in a little more detail.

Min, max, sum, and average bucket aggregations

The min_bucket, max_bucket, sum_bucket, and avg_bucket aggregations are sibling aggregations, similar in what they return to the min, max, sum, and avg aggregations. However, instead of working on the data returned by the query, they work on the results of other aggregations.

To show you a simple example of how this aggregation works, let’s calculate the sum of all the buckets returned by another aggregation. The query that will do that looks as follows:

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"copies_per_100_years":{

"sum":{

"field":"copies"

}

}

}

},

"sum_copies":{

"sum_bucket":{

"buckets_path":"periods_histogram>copies_per_100_years"

}

}

}

}

As you can see, we used the histogram aggregation and we included a nested aggregation that calculates the sum of the copies field. Our sum_bucket sibling aggregation is used outside the main aggregation and refers to it using the buckets_path property. It tells Elasticsearch that we are interested in summing the values of the metric returned by the copies_per_100_years aggregation. The result returned by Elasticsearch for this query looks as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"periods_histogram":{

"buckets":[{

"key":1800,

"doc_count":1,

"copies_per_100_years":{

"value":0.0

}

},{

"key":1900,

"doc_count":3,

"copies_per_100_years":{

"value":7.0

}

}]

},

"sum_copies":{

"value":7.0

}

}

}

As you can see, Elasticsearch added another entry to the results, called sum_copies, which holds the value we were interested in.
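To see where the 7.0 comes from, we can recompute it by hand from the buckets in the response. The following Python sketch is only an illustration of what a sibling sum_bucket does conceptually; the sum_bucket function is our own helper and the response dictionary is a trimmed copy of the result shown above.

```python
# Trimmed copy of the aggregation part of the response shown above.
response = {"aggregations": {"periods_histogram": {"buckets": [
    {"key": 1800, "doc_count": 1, "copies_per_100_years": {"value": 0.0}},
    {"key": 1900, "doc_count": 3, "copies_per_100_years": {"value": 7.0}},
]}}}


def sum_bucket(response, agg_name, metric_name):
    """Add up one single-value metric across all buckets of an aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return sum(bucket[metric_name]["value"] for bucket in buckets)


print(sum_bucket(response, "periods_histogram", "copies_per_100_years"))  # 7.0
```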

Cumulative sum aggregation

The cumulative_sum aggregation is a parent pipeline aggregation that allows us to calculate the cumulative (running) sum of a metric across the buckets of a histogram or date_histogram aggregation. A simple example of the aggregation looks as follows:

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"copies_per_100_years":{

"sum":{

"field":"copies"

}

},

"cumulative_copies_sum":{

"cumulative_sum":{

"buckets_path":"copies_per_100_years"

}

}

}

}

}

}

Because this aggregation is a parent pipeline aggregation, it is defined in the subaggregations. The returned result looks as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"periods_histogram":{

"buckets":[{

"key":1800,

"doc_count":1,

"copies_per_100_years":{

"value":0.0

},

"cumulative_copies_sum":{

"value":0.0

}

},{

"key":1900,

"doc_count":3,

"copies_per_100_years":{

"value":7.0

},

"cumulative_copies_sum":{

"value":7.0

}

}]

}

}

}

The first cumulative_copies_sum is 0 because the sum calculated in the first bucket is 0. The second is the sum of all the previous buckets and the current one, which gives 7. Each following value would be the sum of all the previous buckets and the current bucket.
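The running total described above is easy to reproduce outside Elasticsearch. This short Python sketch uses itertools.accumulate from the standard library; the third value in the second call is made up to show how the series would continue.

```python
from itertools import accumulate


def cumulative_sum(values):
    """Running total of a per-bucket metric, one output value per bucket."""
    return list(accumulate(values))


print(cumulative_sum([0.0, 7.0]))       # [0.0, 7.0], as in the response above
print(cumulative_sum([0.0, 7.0, 3.0]))  # [0.0, 7.0, 10.0], with a made-up third bucket
```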

Bucket selector aggregation

The bucket_selector aggregation is another parent pipeline aggregation. It allows us to use a script to decide whether a bucket should be retained in the parent multi-bucket aggregation. For example, to keep only buckets that have more than one copy per period, we can run the following query (it needs the script.inline property to be set to on in the elasticsearch.yml file):

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"copies_per_100_years":{

"sum":{

"field":"copies"

}

},

"remove_empty_buckets":{

"bucket_selector":{

"buckets_path":{

"sum_copies":"copies_per_100_years"

},

"script":"sum_copies>1"

}

}

}

}

}

}

There are two important things here. The first is the buckets_path property, which is different to what we’ve used so far. Now it uses a key and a value. The key is used to reference the value in the script. The second important thing is the script property, which defines the script that decides whether the processed bucket should be retained. The results returned by Elasticsearch in this case are as follows:

{

"took":330,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"periods_histogram":{

"buckets":[{

"key":1900,

"doc_count":3,

"copies_per_100_years":{

"value":7.0

}

}]

}

}

}

As we can see, the bucket with the copies_per_100_years value equal to 0 has been removed.
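The selection logic can be mimicked in plain Python. In the sketch below, bucket_selector is a hypothetical helper written for this illustration, and a Python lambda plays the role of the inline script; the keys of buckets_path become the names the "script" can use, just as in the query above.

```python
def bucket_selector(buckets, buckets_path, predicate):
    """Keep only the buckets for which the predicate returns True."""
    kept = []
    for bucket in buckets:
        # Resolve each named path to the metric value inside this bucket.
        params = {name: bucket[path]["value"] for name, path in buckets_path.items()}
        if predicate(**params):
            kept.append(bucket)
    return kept


buckets = [
    {"key": 1800, "doc_count": 1, "copies_per_100_years": {"value": 0.0}},
    {"key": 1900, "doc_count": 3, "copies_per_100_years": {"value": 7.0}},
]
kept = bucket_selector(
    buckets,
    {"sum_copies": "copies_per_100_years"},
    lambda sum_copies: sum_copies > 1,
)
print([bucket["key"] for bucket in kept])  # [1900]
```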

Bucket script aggregation

The bucket_script aggregation (also a parent pipeline aggregation) allows us to define multiple bucket paths and use them inside a script. The metrics used must be numeric and the returned value also needs to be numeric. An example of using this aggregation follows (the following query needs the script.inline property to be set to on in the elasticsearch.yml file):

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"copies_per_100_years":{

"sum":{

"field":"copies"

}

},

"stats_per_100_years":{

"stats":{

"field":"copies"

}

},

"example_bucket_script":{

"bucket_script":{

"buckets_path":{

"sum_copies":"copies_per_100_years",

"count":"stats_per_100_years.count"

},

"script":"sum_copies/count*1000"

}

}

}

}

}

}

There are two things here. The first is that we’ve defined two entries in the buckets_path property. We are allowed to do that in the bucket_script aggregation. Each entry is a key and a value. The key is the name that we can use in the script, and the value is the path to the aggregation metric we are interested in. The second is the script property, which defines the script that returns the value.

The returned results for the preceding query are as follows:

{

"took":5,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"periods_histogram":{

"buckets":[{

"key":1800,

"doc_count":1,

"copies_per_100_years":{

"value":0.0

},

"stats_per_100_years":{

"count":1,

"min":0.0,

"max":0.0,

"avg":0.0,

"sum":0.0

},

"example_bucket_script":{

"value":0.0

}

},{

"key":1900,

"doc_count":3,

"copies_per_100_years":{

"value":7.0

},

"stats_per_100_years":{

"count":3,

"min":0.0,

"max":6.0,

"avg":2.3333333333333335,

"sum":7.0

},

"example_bucket_script":{

"value":2333.3333333333335

}

}]

}

}

}
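The value reported for the 1900 bucket can be checked by hand. The following Python sketch resolves the two named paths against that bucket and evaluates the same expression as the script; bucket_script here is a hypothetical helper for illustration only, and the lambda stands in for the inline script.

```python
def bucket_script(bucket, buckets_path, script):
    """Resolve the named paths inside one bucket and evaluate the script."""
    params = {}
    for name, path in buckets_path.items():
        agg, _, metric = path.partition(".")
        # Single-value aggregations expose their result under "value".
        params[name] = bucket[agg][metric or "value"]
    return script(**params)


bucket_1900 = {
    "copies_per_100_years": {"value": 7.0},
    "stats_per_100_years": {"count": 3, "min": 0.0, "max": 6.0, "sum": 7.0},
}
value = bucket_script(
    bucket_1900,
    {"sum_copies": "copies_per_100_years", "count": "stats_per_100_years.count"},
    lambda sum_copies, count: sum_copies / count * 1000,
)
print(value)  # about 2333.33, in line with example_bucket_script above
```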

Serial differencing aggregation

The serial_diff aggregation is a parent pipeline aggregation that implements a technique where the values in time series data (such as a histogram or date histogram) are subtracted from themselves at different time periods. This technique allows us to plot the changes in the data between time periods instead of plotting the whole value. For example, the population of a city grows with time. If we use the serial differencing aggregation with a period of one day, we can see the daily growth.

To calculate the serial_diff aggregation, we need the parent aggregation, which is a histogram or a date_histogram, and we need to provide it with buckets_path, which points to the metric we are interested in, and lag (a positive, non-zero integer value), which tells Elasticsearch how many buckets back to look for the value to subtract from the current one. We can omit lag, in which case Elasticsearch will set it to 1.

Let’s now look at a simple query that uses the discussed aggregation:

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"copies_per_100_years":{

"sum":{

"field":"copies"

}

},

"first_difference":{

"serial_diff":{

"buckets_path":"copies_per_100_years",

"lag":1

}

}

}

}

}

}

The response to the preceding query looks as follows:

{

"took":68,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"periods_histogram":{

"buckets":[{

"key":1800,

"doc_count":1,

"copies_per_100_years":{

"value":0.0

}

},{

"key":1900,

"doc_count":3,

"copies_per_100_years":{

"value":7.0

},

"first_difference":{

"value":7.0

}

}]

}

}

}

As you can see, starting with the second bucket we get our aggregation (we will get it with every bucket after that as well). The calculated value is 7 because the current value of copies_per_100_years is 7 and the previous one is 0. Subtracting 0 from 7 gives us 7.
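The subtraction itself is simple enough to sketch in Python. The serial_diff function below is our own illustration of the technique, not Elasticsearch code; None marks the leading buckets that have no earlier value to subtract, which corresponds to the first bucket in the response having no first_difference entry.

```python
def serial_diff(values, lag=1):
    """Subtract from each value the value found `lag` buckets earlier."""
    if lag < 1:
        raise ValueError("lag must be a positive, non-zero integer")
    return [
        None if i < lag else values[i] - values[i - lag]
        for i in range(len(values))
    ]


print(serial_diff([0.0, 7.0]))                # [None, 7.0]
print(serial_diff([1.0, 4.0, 9.0, 16.0], 2))  # [None, None, 8.0, 12.0]
```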

Derivative aggregation

The derivative aggregation is another example of a parent pipeline aggregation. As its name suggests, it calculates a derivative (https://en.wikipedia.org/wiki/Derivative) of a given metric from a histogram or date histogram. The only thing we need to provide is buckets_path, which points to the metric we are interested in. An example query using this aggregation looks as follows:

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"copies_per_100_years":{

"sum":{

"field":"copies"

}

},

"derivative_example":{

"derivative":{

"buckets_path":"copies_per_100_years"

}

}

}

}

}

}

Moving avg aggregation

The last pipeline aggregation that we want to discuss is the moving_avg one. It calculates the moving average metric (https://en.wikipedia.org/wiki/Moving_average) over the buckets of the parent aggregation (yes, this is a parent pipeline aggregation). Similar to the few previously discussed aggregations, it needs to be run on a parent histogram or date histogram aggregation.

When calculating the moving average, Elasticsearch will take the window (specified by the window property and set to 5 by default), calculate the average for buckets in the window, move the window one bucket further, and repeat. Of course we also need to provide buckets_path, which points to the metric that the moving average should be calculated for.

An example of using this aggregation looks as follows:

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":10

},

"aggs":{

"copies_per_10_years":{

"sum":{

"field":"copies"

}

},

"moving_avg_example":{

"moving_avg":{

"buckets_path":"copies_per_10_years"

}

}

}

}

}

}

We will omit the response for the preceding query as it is quite large.

Predicting future buckets

A very nice thing about the moving average aggregation is that it supports predictions; it can attempt to extrapolate the data it has and create future buckets. To force the aggregation to predict buckets, we just need to add the predict property to any moving average aggregation and set it to the number of predictions we want to get. For example, if we want to add five predictions to the preceding query, we will change it to look as follows:
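As a rough illustration of the sliding window, here is a simple moving average in Python. It is a generic sketch, not a reimplementation of moving_avg: we assume a trailing window that includes the current bucket and shrinks at the start of the series, and the exact window alignment used by Elasticsearch may differ.

```python
def simple_moving_average(values, window=5):
    """Average of the last `window` values at each position (trailing window)."""
    averages = []
    for i in range(len(values)):
        # Assumption: the window ends at the current bucket and is
        # shorter than `window` for the first few buckets.
        chunk = values[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


print(simple_moving_average([1.0, 2.0, 3.0, 4.0], window=2))  # [1.0, 1.5, 2.5, 3.5]
```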

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":10

},

"aggs":{

"copies_per_10_years":{

"sum":{

"field":"copies"

}

},

"moving_avg_example":{

"moving_avg":{

"buckets_path":"copies_per_10_years",

"predict":5

}

}

}

}

}

}

If you look at the results and compare the response returned for the previous query with the one with predictions, you will notice that the last bucket in the previous query ends on the key property equal to 1960, while the query with predictions ends on the key property equal to 2010, which is exactly what we wanted to achieve.

The models

By default, Elasticsearch uses the simplest model (simple) for calculating the moving average, but we can control that with two properties: model, which holds the name of the model, and settings, an object that we can use to provide model-specific properties.

The possible models are: simple, linear, ewma, holt, and holt_winters. Discussing each of the models in detail is beyond the scope of the book, so if you are interested in details about the different models, refer to the official Elasticsearch documentation regarding the moving averages aggregation available at https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-pipeline-movavg-aggregation.html.

An example query using a different model looks as follows:

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":10

},

"aggs":{

"copies_per_10_years":{

"sum":{

"field":"copies"

}

},

"moving_avg_example":{

"moving_avg":{

"buckets_path":"copies_per_10_years",

"model":"holt",

"settings":{

"alpha":0.6,

"beta":0.4

}

}

}

}

}

}

}
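To give a feeling for what a smoothing setting such as alpha does, here is the textbook exponentially weighted moving average in Python. This is a generic sketch under our own assumptions (the first value seeds the series); it is not the Elasticsearch implementation of the ewma or holt models, and it ignores the window and the beta trend parameter used by holt.

```python
def ewma(values, alpha=0.5):
    """Exponentially weighted moving average.

    A higher alpha gives more weight to recent values; a lower alpha
    smooths more aggressively.
    """
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(value)  # assumption: seed with the first value
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


print(ewma([2.0, 4.0, 8.0], alpha=0.5))  # [2.0, 3.0, 5.5]
```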

Summary

The chapter we just finished was all about data analysis in Elasticsearch: the aggregations engine. We learned what aggregations are and how they work. We used metrics, buckets, and the newly introduced pipeline aggregations, and learned what we can do with them.

In the next chapter, we’ll go beyond full text searching. We will use suggesters to build efficient autocomplete functionality and correct users’ spelling mistakes. We will see what percolation is and how to use it in our application. We will use the geospatial abilities of Elasticsearch and we’ll learn how to efficiently fetch large amounts of data from Elasticsearch.

Chapter 8. Beyond Full-text Searching

The previous chapter was fully dedicated to data analysis and how we can perform it with Elasticsearch. We learned how to use aggregations, what types of aggregation are available, what aggregations are available within each type, and how to use them. In this chapter, we will get back to query-related topics. By the end of this chapter, you will have learned the following topics:

What is percolator and how to use it

What are the geospatial capabilities of Elasticsearch

How to use and build functionalities using Elasticsearch suggesters

How to use the Scroll API to efficiently fetch large numbers of results

Percolator

Have you ever wondered what would happen if we reversed the traditional model of using queries to find documents in Elasticsearch? Does it make sense to have a document and search for queries matching it? It is not surprising that there is a whole range of solutions where this model is very useful. Whenever you operate on an unbounded stream of input data, where you search for the occurrences of particular events, you can use this approach. This can be used for the detection of failures in a monitoring system or for the “Tell me when a product with the defined criteria will be available in this shop” functionality. In this section, we will look at how an Elasticsearch percolator works and how we can use it to implement one of the aforementioned use cases.

The index

In all the examples used when discussing percolator functionality, we will use an index called notifier. This index is created by using the following command:

curl -XPOST 'localhost:9200/notifier' -d '{

"mappings":{

"book":{

"properties":{

"title":{

"type":"string"

},

"otitle":{

"type":"string"

},

"year":{

"type":"integer"

},

"available":{

"type":"boolean"

},

"tags":{

"type":"string",

"index":"not_analyzed"

}

}

}

}

}'

It is quite simple. It contains a single type and five fields, which we will use throughout this section.

Percolator preparation

Elasticsearch exposes a special type called .percolator that is treated differently. This means that we can store any documents and also search them like an ordinary type in any index. If you look at any Elasticsearch query, you will notice that each is a valid JSON document, which means that we can index and store it as a document as well. The thing is that the percolator allows us to invert the search logic and search for queries that match a given document. This is possible because of the two features just discussed: the special .percolator type and the fact that queries in Elasticsearch are valid JSON documents.

Let’s get back to the library example from Chapter 2, Indexing Your Data, and try to index one of the queries in the percolator. We assume that our users need to be informed when any book matching the criteria defined by the query is available.

Look at the following query1.json file that contains an example query generated by the user:

{

"query":{

"bool":{

"must":{

"term":{

"title":"crime"

}

},

"should":{

"range":{

"year":{

"gt":1900,

"lt":2000

}

}

},

"must_not":{

"term":{

"otitle":"nothing"

}

}

}

}

}

To enhance the example, we also assume that our users are allowed to define filters using our hypothetical user interface. For example, our user may be interested in the available books that were written before the year 2010. An example query that could have been constructed by such a user interface would look as follows (the query was written to the query2.json file):

{

"query":{

"bool":{

"must":{

"range":{

"year":{

"lt":2010

}

}

},

"filter":{

"term":{

"available":true

}

}

}

}

}

Now, let’s register both queries in the percolator (note that we are registering the queries and haven’t indexed any documents). In order to do this, we will run the following commands:

curl -XPUT 'localhost:9200/notifier/.percolator/1' -d @query1.json

curl -XPUT 'localhost:9200/notifier/.percolator/old_books' -d @query2.json

In the preceding examples, we used two completely different identifiers. We did that in order to show that we can use an identifier that best describes the query. It is up to us to decide under which name we would like the query to be registered.

We are now ready to use our percolator. Our application will provide documents to the percolator and check if any of the already registered queries match the document. This is exactly what a percolator allows us to do: to reverse the search logic. Instead of indexing the documents and running queries against them, we store the queries and send the documents to find the matching queries.

Let’s use an example document that will match both stored queries; it will have the required title and the release date, and will mention whether it is currently available. The command to send such a document to the percolator looks as follows:
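The reversed logic can be pictured with a toy model in Python. The ToyPercolator class below is purely illustrative and has nothing to do with how Elasticsearch implements percolation; the two predicates are simplified stand-ins for the queries from query1.json and query2.json (the should and must_not clauses of the first query are left out).

```python
class ToyPercolator:
    """Stores queries and finds the ones that match a given document."""

    def __init__(self):
        self.queries = {}

    def register(self, query_id, predicate):
        # The "query" is stored first, before any document is seen.
        self.queries[query_id] = predicate

    def percolate(self, doc):
        # The document is sent later and checked against every stored query.
        return [qid for qid, matches in self.queries.items() if matches(doc)]


percolator = ToyPercolator()
percolator.register("1", lambda d: "crime" in d.get("title", "").lower().split())
percolator.register(
    "old_books",
    lambda d: d.get("year", 0) < 2010 and d.get("available") is True,
)

doc = {"title": "Crime and Punishment", "year": 1886, "available": True}
print(percolator.percolate(doc))  # ['1', 'old_books']
```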

curl -XGET 'localhost:9200/notifier/book/_percolate?pretty' -d '{

"doc":{

"title":"Crime and Punishment",

"otitle":"Преступлéние и наказáние",

"author":"Fyodor Dostoevsky",

"year":1886,

"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],

"tags":[],

"copies":0,

"available":true

}

}'

As we expected, both queries matched and the Elasticsearch response includes the identifiers of the matching queries. Such a response looks as follows:

{

"took":36,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"total":2,

"matches":[{

"_index":"notifier",

"_id":"old_books"

},{

"_index":"notifier",

"_id":"1"

}]

}

This works like a charm. One very important thing to note is the endpoint used in this query: _percolate. Using this endpoint is required when we want to use the percolator. The index name corresponds to the index where the queries were stored, and the type is equal to the type defined in the mappings.

Note

The response format contains information about the index and the query identifier. This information is included for cases when we search against multiple indices at once. When using a single index, adding an additional query parameter, percolate_format=ids, will change the response as follows:

"matches":["old_books","1"]

Getting deeper

Because the queries registered in a percolator are in fact documents, we can use a normal query sent to Elasticsearch in order to choose which queries stored in the .percolator type should be used in the percolation process. This may sound weird, but it really gives a lot of possibilities. In our library, we can have several groups of users. Let’s assume that some of them have permissions to borrow very rare books, or that we have several branches in the city and the user can declare where he or she would like to get the book from.

Let’s see how such use cases can be implemented by using the percolator. To do this, we will need to update our mapping and include the branch information. We do that by running the following command:

curl -XPOST 'localhost:9200/notifier/.percolator/_mapping' -d '{

".percolator":{

"properties":{

"branches":{

"type":"string",

"index":"not_analyzed"

}

}

}

}'

Now, in order to register a query, we use the following command:

curl -XPUT 'localhost:9200/notifier/.percolator/3' -d '{

"query":{

"term":{

"title":"crime"

}

},

"branches":["brA","brB","brD"]

}'

In the preceding example, we registered a query that shows a user’s interest. Our hypothetical user is interested in any book with the term crime in the title field (the term query is responsible for this). He or she wants to borrow this book from one of the three listed branches. When specifying the mappings, we defined that the branches field is a non-analyzed string field. We can now include a query along with the document we sent previously. Let’s look at how to do this.

Our book system just got the book, and it is ready to report the book and check whether the book is of interest to anyone. To check this, we send the document that describes the book and add an additional query to such a request: the query that will limit the users to only the ones interested in the brB branch. Such a request looks as follows:

curl -XGET 'localhost:9200/notifier/book/_percolate?pretty' -d '{

"doc":{

"title":"Crime and Punishment",

"otitle":"Преступлéние и наказáние",

"author":"Fyodor Dostoevsky",

"year":1886,

"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],

"tags":[],

"copies":0,

"available":true

},

"size":10,

"filter":{

"term":{

"branches":"brB"

}

}

}'

If everything was executed correctly, the response returned by Elasticsearch should look as follows (we indexed our query with 3 as an identifier):

{

"took":27,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"total":1,

"matches":[{

"_index":"notifier",

"_id":"3"

}]

}
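Conceptually, the filter narrows down the set of registered queries before the document is checked against them. The Python sketch below models this with an in-memory dictionary; the percolate function and the second registration (other_user) are hypothetical and exist only to show the filter and the size limit at work.

```python
def title_has_crime(doc):
    return "crime" in doc.get("title", "").lower().split()


REGISTERED = {
    "3": {"matches": title_has_crime, "branches": ["brA", "brB", "brD"]},
    # Hypothetical second registration, added only for this illustration.
    "other_user": {"matches": title_has_crime, "branches": ["brC"]},
}


def percolate(doc, branch=None, size=10):
    """Return ids of matching queries, optionally restricted to one branch."""
    matches = []
    for query_id, entry in REGISTERED.items():
        # First choose which stored queries take part in the percolation.
        if branch is not None and branch not in entry["branches"]:
            continue
        if entry["matches"](doc):
            matches.append(query_id)
    return matches[:size]  # size limits the number of returned matches


book = {"title": "Crime and Punishment", "year": 1886, "available": True}
print(percolate(book, branch="brB"))  # ['3']
```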

Controlling the size of returned results

The size of the results matters when it comes to the percolator. The more queries a single document matches, the more results will be returned and the more memory will be needed by Elasticsearch. Because of this, there is one additional thing to note: the size parameter. It allows us to limit the number of matches returned.

Percolator and score calculation

In the previous examples, we filtered our queries using a single term query, but we didn’t think about the scoring process at all. Elasticsearch allows us to calculate the score when using the percolator. Let’s change the previously used document sent to the percolator and adjust it so that scoring is used:

curl -XGET 'localhost:9200/notifier/book/_percolate?pretty' -d '{

"doc":{

"title":"Crime and Punishment",

"otitle":"Преступлéние и наказáние",

"author":"Fyodor Dostoevsky",

"year":1886,

"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],

"tags":[],

"copies":0,

"available":true

},

"size":10,

"query":{

"term":{

"branches":"brB"

}

},

"track_scores":true,

"sort":{

"_score":"desc"

}

}'

As you can see, we used the query section and included an additional track_scores attribute set to true. This is needed because, by default, Elasticsearch won’t calculate the score, for performance reasons. If we need scores in the percolation process, we should be aware that such queries will be slightly more demanding when it comes to CPU processing power than the ones that omit calculating the score.

Note

In the preceding example, we told Elasticsearch to sort our results on the basis of the score in descending order. This is the default behavior when track_scores is turned on, so we can omit the sort declaration. At the time of writing, sorting on score in descending order is the only available option.

Combining percolators with other functionalities

If we are allowed to use queries along with the documents sent for percolation, why can we not use other Elasticsearch functionalities? Of course, this is possible. For example, the following document is sent along with an aggregation and the results will include the aggregation calculation:

curl -XGET 'localhost:9200/notifier/book/_percolate?pretty' -d '{

"doc":{

"title":"Crime and Punishment",

"available":true

},

"aggs":{

"test":{

"terms":{

"field":"branches"

}

}

}

}'

As we can see, the percolator allows us to run both queries and aggregations. Look at the following example document:

curl -XGET 'localhost:9200/notifier/book/_percolate?pretty' -d '{

"doc":{

"title":"Crime and Punishment",

"year":1886,

"available":true

},

"size":10,

"highlight":{

"fields":{

"title":{}

}

}

}'

As you can see, it contains a highlighting section. A fragment of the response returned by Elasticsearch looks as follows:

{

"_index":"notifier",

"_id":"3",

"highlight":{

"title":["<em>Crime</em> and Punishment"]

}

}

Note

Note that there are some limitations when it comes to the query types supported by the percolator functionality. In the current implementation, parent-child relations are not available in the percolator, so you can’t use queries such as has_child, top_children, and has_parent.

Getting the number of matching queries

Sometimes you don’t care about the matched queries and you only want the number of matched queries. In such cases, sending a document against the standard percolator endpoint is not efficient. Elasticsearch exposes the _percolate/count endpoint to handle such cases in an efficient way. An example of such a command follows:

curl -XGET 'localhost:9200/notifier/book/_percolate/count?pretty' -d '{

"doc":{...}

}'

Indexed document percolation

In the final, closing paragraph of the percolation section, we want to show you one more thing: the possibility of percolating a document that is already indexed. To do this, we need to use the GET operation on the document and provide information about which percolator index should be used. Let’s look at the following command:

curl -XGET 'localhost:9200/library/book/1/_percolate?percolate_index=notifier'

This command checks the document with the 1 identifier from our library index against the percolator index defined by the percolate_index parameter. Remember that, by default, Elasticsearch will use the percolator in the same index as the document; that’s why we’ve specified the percolate_index parameter.

Elasticsearch spatial capabilities

Search servers such as Elasticsearch are usually looked at from the perspective of full-text searching. Elasticsearch, because of its marketing as being part of ELK (Elasticsearch, Logstash, and Kibana), is also well known for being able to handle large amounts of time series data. However, this is only a part of the whole view. Sometimes both of the mentioned use cases are not enough. Imagine searching for local services. For the end user, the most important thing is the accuracy of the results. By accuracy, we not only mean the proper results of the full-text search, but also the results being as near as they can in terms of location. In several cases, this is the same as a text search on geographical names such as cities or streets, but in other cases we can find it very useful to be able to search on the basis of the geographical coordinates of our indexed documents. And this is also a functionality that Elasticsearch is capable of handling.

With the release of Elasticsearch 2.2, the geo_point type received a lot of changes, especially internally, where all the optimizations were done. Prior to 2.2, the geo_point type was stored in the index as two not-analyzed string values, and this changed. With the release of Elasticsearch 2.2, the geo_point type got all the great improvements from the Apache Lucene library and is now more efficient.

Mapping preparation for spatial searches

In order to discuss the spatial search functionality, let’s prepare an index with a list of cities. This will be a very simple index with one type named poi (which stands for point of interest), holding the name of the city and its coordinates. The mappings are as follows:

{

"mappings":{

"poi":{

"properties":{

"name":{"type":"string"},

"location":{"type":"geo_point"}

}

}

}

}

Assuming that we put this definition into the mapping1.json file, we can create an index by running the following command:

curl -XPUT localhost:9200/map -d @mapping1.json

The only new thing in the preceding mappings is the geo_point type, which is used for the location field. By using it, we can store the geographical position of our city and use spatial-based functionalities.

Example data

Our example documents1.json file looks as follows:

{"index":{"_index":"map","_type":"poi","_id":1}}

{"name":"New York","location":"40.664167,-73.938611"}

{"index":{"_index":"map","_type":"poi","_id":2}}

{"name":"London","location":[-0.1275,51.507222]}

{"index":{"_index":"map","_type":"poi","_id":3}}

{"name":"Moscow","location":{"lat":55.75,"lon":37.616667}}

{"index":{"_index":"map","_type":"poi","_id":4}}

{"name":"Sydney","location":"-33.859972,151.211111"}

{"index":{"_index":"map","_type":"poi","_id":5}}

{"name":"Lisbon","location":"eycs0p8ukc7v"}

In order to perform a bulk request, we added information about the index name, type, and unique identifiers of our documents, so we can now easily import this data using the following command:

curl -XPOST localhost:9200/_bulk --data-binary @documents1.json
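The bulk file shown above alternates an action line with a document line. As a rough illustration of that layout, the following Python sketch builds such a body from a list of documents. The build_bulk_body helper is a name we made up for this example; it only produces the text and does not talk to Elasticsearch.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Return a bulk body: one action line followed by one source line per document."""
    lines = []
    for doc_id, source in docs:
        # Action line: tells Elasticsearch where the following document goes.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(source))
    # The bulk body is newline-delimited and must end with a newline.
    return "\n".join(lines) + "\n"


cities = [
    (1, {"name": "New York", "location": "40.664167,-73.938611"}),
    (2, {"name": "London", "location": [-0.1275, 51.507222]}),
]
body = build_bulk_body("map", "poi", cities)
print(body, end="")
```

Writing the returned string to a file gives something that can be sent with the curl command above; every document has to stay on a single line for the bulk format to be parsed correctly.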

One thing that we should take a closer look at is the location field. We can use various notations for coordinates. We can provide the latitude and longitude values as a string, as a pair of numbers, or as an object. Note that the string and array notations use different orders for latitude and longitude: the string is written as latitude,longitude, whereas the array is written as [longitude, latitude]. The last record shows that it is also possible to give the coordinates as a Geohash value (the notation is described in detail at http://en.wikipedia.org/wiki/Geohash).
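To make the difference between the notations concrete, here is a small Python sketch that normalizes each of them to a (latitude, longitude) pair. The function names to_lat_lon and decode_geohash are our own, and the Geohash decoder is a plain textbook implementation written for this illustration, not the code Elasticsearch uses.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def decode_geohash(geohash):
    """Decode a Geohash string to the (lat, lon) centre of its cell."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    is_lon = True  # bits alternate between longitude and latitude, longitude first
    for char in geohash:
        value = _BASE32.index(char)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            rng = lon_range if is_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half of the interval
            else:
                rng[1] = mid  # keep the lower half of the interval
            is_lon = not is_lon
    return (sum(lat_range) / 2, sum(lon_range) / 2)


def to_lat_lon(location):
    """Normalize a geo_point value given in any of the four notations."""
    if isinstance(location, dict):
        return (float(location["lat"]), float(location["lon"]))
    if isinstance(location, (list, tuple)):
        lon, lat = location  # array notation: [lon, lat]
        return (float(lat), float(lon))
    if "," in location:
        lat, lon = location.split(",")  # string notation: "lat,lon"
        return (float(lat), float(lon))
    return decode_geohash(location)


for value in ("40.664167,-73.938611", [-0.1275, 51.507222],
              {"lat": 55.75, "lon": 37.616667}, "eycs0p8ukc7v"):
    print(value, "->", to_lat_lon(value))
```

Running it on the Lisbon record shows that the Geohash value decodes to a point at roughly 38.7 degrees north and 9.1 degrees west.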

Additional geo_field properties

With the release of Elasticsearch 2.2, the number of parameters that the geo_point type can accept has been reduced and is as follows:

geohash: Boolean parameter telling Elasticsearch whether the .geohash field should be created. Defaults to false unless geohash_prefix is used.

geohash_precision: Maximum size of geohash and geohash_prefix.

geohash_prefix: Boolean parameter telling Elasticsearch to index the geohash and its prefixes. Defaults to false.

ignore_malformed: Boolean parameter telling Elasticsearch to ignore a badly written geo_field point instead of rejecting the whole document. Defaults to false, which means that badly formatted geo_field data will result in an indexing error for the whole document.

lat_lon: Boolean parameter telling Elasticsearch to index the spatial data in two separate fields called .lat and .lon. Defaults to false.

precision_step: Parameter allowing control over how our numeric geographical points will be indexed.

Keep in mind that the geohash-related and lat_lon-related properties were not removed, for backward-compatibility reasons. Users can still use them. However, queries will not use them but will instead use the highly optimized data structure that is built during indexing by the geo_point type.

Sample queries

Now let’s look at several examples of using coordinates and solving common requirements in modern applications that require geographical data searching along with full-text searching.

Note

If you are interested in all the geospatial queries that are available for Elasticsearch users, refer to the official documentation available at https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-queries.html.

Distance-based sorting

Let’s start with a very common requirement: sorting the returned results by distance from a given point. In our example, we want to get all the cities and sort them by their distances from the capital of France, Paris. To do this, we send the following query to Elasticsearch:

curl -XGET 'localhost:9200/map/_search?pretty' -d '{

"query":{

"match_all":{}

},

"sort":[{

"_geo_distance":{

"location":"48.8567,2.3508",

"unit":"km"

}

}]

}'

If you remember the Sorting data section from Chapter 4, Extending Your Querying Knowledge, you’ll notice that the format is slightly different. We are using the _geo_distance key to indicate sorting by distance. We must give the base location (the location attribute, which holds the location of Paris in our case), and we need to specify the unit used in the results. The available values are km and mi, which stand for kilometers and miles, respectively. The result of such a query will be as follows:

{

"took":5,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":5,

"max_score":null,

"hits":[{

"_index":"map",

"_type":"poi",

"_id":"2",

"_score":null,

"_source":{

"name":"London",

"location":[-0.1275,51.507222]

},

"sort":[343.17487356850313]

},{

"_index":"map",

"_type":"poi",

"_id":"5",

"_score":null,

"_source":{

"name":"Lisbon",

"location":"eycs0p8ukc7v"

},

"sort":[1452.9506736367805]

},{

"_index":"map",

"_type":"poi",

"_id":"3",

"_score":null,

"_source":{

"name":"Moscow",

"location":{

"lat":55.75,

"lon":37.616667

}

},

"sort":[2483.837565935267]

},{

"_index":"map",

"_type":"poi",

"_id":"1",

"_score":null,

"_source":{

"name":"New York",

"location":"40.664167,-73.938611"

},

"sort":[5832.645958617513]

},{

"_index":"map",

"_type":"poi",

"_id":"4",

"_score":null,

"_source":{

"name":"Sydney",

"location":"-33.859972,151.211111"

},

"sort":[16978.094780773998]

}]

}

}

As with the other examples of sorting, Elasticsearch shows information about the value used for sorting. Let’s look at the first record. As we can see, the distance between Paris and London is about 343 km, and if you check a traditional map, you will see that this is true.
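If you don’t have a traditional map at hand, the figure can be cross-checked with a few lines of Python. The sketch below uses the haversine formula on a sphere with a mean Earth radius of 6371 km; both are assumptions of this example, and Elasticsearch uses its own distance calculation, so the numbers agree only approximately. Lisbon’s coordinates are taken from the GeoJSON version of the example data used for the geo_shape index.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


PARIS = (48.8567, 2.3508)
CITIES = {
    "New York": (40.664167, -73.938611),
    "London": (51.507222, -0.1275),
    "Moscow": (55.75, 37.616667),
    "Sydney": (-33.859972, 151.211111),
    "Lisbon": (38.736946, -9.142685),
}

# Sort city names by their distance from Paris, closest first.
by_distance = sorted(CITIES, key=lambda name: haversine_km(*PARIS, *CITIES[name]))
for name in by_distance:
    print(name, round(haversine_km(*PARIS, *CITIES[name]), 1), "km")
```

The order of the cities matches the order of the hits returned by Elasticsearch, and the Paris to London distance comes out within a kilometer or so of the sort value shown in the response.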

Bounding box filtering

The next example that we want to show is narrowing down the results to a selected area that is bounded by a given rectangle. This is very handy if we want to show results on the map or when we allow a user to mark the map area for searching. You already read about filters in the Filtering your results section of Chapter 4, Extending Your Querying Knowledge, but there we didn’t mention spatial filters. The following query shows how we can filter by using the bounding box:

curl -XGET 'localhost:9200/map/_search?pretty' -d '{

"query":{

"bool":{

"must":{"match_all":{}},

"filter":{

"geo_bounding_box":{

"location":{

"top_left":"52.4796,-1.903",

"bottom_right":"48.8567,2.3508"

}

}

}

}

}

}'

In the preceding example, we selected a map fragment between Birmingham and Paris by providing the top-left and bottom-right corner coordinates. These two corners are enough to specify any rectangle we want, and Elasticsearch will do the rest of the calculation for us. The following screenshot shows the specified rectangle on the map:

As we can see, the only city from our data that meets the criteria is London. So, let’s check whether Elasticsearch knows this by running the preceding query and looking at the returned results:

{

"took":38,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":1.0,

"hits":[{

"_index":"map",

"_type":"poi",

"_id":"2",

"_score":1.0,

"_source":{

"name":"London",

"location":[-0.1275,51.507222]

}

}]

}

}

As you can see, Elasticsearch again agrees with the map.
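The logic behind the bounding box filter is easy to reproduce by hand. The following Python sketch applies the same top-left and bottom-right corners to our five cities; in_bounding_box is a helper written for this example, and it deliberately ignores boxes that cross the 180th meridian.

```python
def in_bounding_box(point, top_left, bottom_right):
    """Return True if a (lat, lon) point lies inside the given rectangle."""
    lat, lon = point
    top, left = top_left
    bottom, right = bottom_right
    # Latitude must be between the bottom and top edges,
    # longitude between the left and right edges.
    return bottom <= lat <= top and left <= lon <= right


TOP_LEFT = (52.4796, -1.903)       # Birmingham
BOTTOM_RIGHT = (48.8567, 2.3508)   # Paris

CITIES = {
    "New York": (40.664167, -73.938611),
    "London": (51.507222, -0.1275),
    "Moscow": (55.75, 37.616667),
    "Sydney": (-33.859972, 151.211111),
    "Lisbon": (38.736946, -9.142685),
}

matches = [name for name, point in CITIES.items()
           if in_bounding_box(point, TOP_LEFT, BOTTOM_RIGHT)]
print(matches)
```

Only London falls between the two corners, which is the same single hit that Elasticsearch returned.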

Limiting the distance

The last example shows another common requirement: limiting the results to the places that are located no further than a defined distance from a given point. For example, if we want to limit our results to all the cities within a 500 km radius of Paris, we can use the following query:

curl -XGET 'localhost:9200/map/_search?pretty' -d '{

"query":{

"bool":{

"must":{"match_all":{}},

"filter":{

"geo_distance":{

"location":"48.8567,2.3508",

"distance":"500km"

}

}

}

}

}'

If everything goes well, Elasticsearch should only return a single record for the preceding query, and the record should be London again. However, we will leave it to you, the reader, to check.

Arbitrary geo shapes

Sometimes, using a single geographical point or a single rectangle is just not enough. In such cases something more sophisticated is needed, and Elasticsearch addresses this by giving you the possibility to define shapes. In order to show you how we can leverage custom shape-limiting in Elasticsearch, we need to modify our index or create a new one and introduce the geo_shape type. Our new mapping looks as follows (we will use this to create an index called map2):

{

"mappings":{

"poi":{

"properties":{

"name":{"type":"string","index":"not_analyzed"},

"location":{"type":"geo_shape"}

}

}

}

}

Assuming we wrote the preceding mapping definition to the mapping2.json file, we can create an index by using the following command:

curl -XPUT localhost:9200/map2 -d @mapping2.json

Note

Elasticsearch allows us to set several attributes for the geo_shape type. The most commonly used is the precision parameter. During indexing, the shapes have to be converted to a set of terms. The more accuracy required, the more terms are generated, which is directly reflected in the index size and performance. Precision can be defined in the following units: in, inch, yd, yard, mi, miles, km, kilometers, m, meters, cm, centimeters, or mm, millimeters. By default, the precision is set to 50m.

Next, let’s change our example data to match our new index structure and create the documents2.json file with the following contents:

{"index":{"_index":"map2","_type":"poi","_id":1}}

{"name":"New York","location":{"type":"point","coordinates":[-73.938611,40.664167]}}

{"index":{"_index":"map2","_type":"poi","_id":2}}

{"name":"London","location":{"type":"point","coordinates":[-0.1275,51.507222]}}

{"index":{"_index":"map2","_type":"poi","_id":3}}

{"name":"Moscow","location":{"type":"point","coordinates":[37.616667,55.75]}}

{"index":{"_index":"map2","_type":"poi","_id":4}}

{"name":"Sydney","location":{"type":"point","coordinates":[151.211111,-33.865143]}}

{"index":{"_index":"map2","_type":"poi","_id":5}}

{"name":"Lisbon","location":{"type":"point","coordinates":[-9.142685,38.736946]}}

The structure of the field of the geo_shape type is different from geo_point. Its syntax is called GeoJSON (http://en.wikipedia.org/wiki/GeoJSON). It allows us to define various geographical types. Now it’s time to index our data:

curl -XPOST localhost:9200/_bulk --data-binary @documents2.json

Let’s sum up the types that we can use during querying, at least the ones that we think are the most useful.

Point

A point is defined by an array in which the first element is the longitude and the second is the latitude. An example of such a shape is as follows:

{

"type":"point",

"coordinates":[-0.1275,51.507222]

}

Envelope

An envelope defines a box given by the coordinates of the upper-left and bottom-right corners of the box. An example of such a shape is as follows:

{

"type":"envelope",

"coordinates":[[-0.087890625,51.50874245880332],[2.4169921875,

48.80686346108517]]

}

Polygon

A polygon defines a list of points that are connected to create our polygon. The first and the last point in the array must be the same so that the shape is closed. An example of such a shape is as follows:

{

"type":"polygon",

"coordinates":[[

[-5.756836,49.991408],

[-7.250977,55.124723],

[1.845703,51.500194],

[-5.756836,49.991408]

]]

}

If you look closely at the shape definition, you will find an additional level of array nesting. Thanks to this, you can define more than a single polygon. In such a case, the first polygon defines the base shape and the rest of the polygons are the shapes that will be excluded from the base shape.
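Because an unclosed ring is an easy mistake to make when writing shapes by hand, it can be worth checking them before indexing. The following Python sketch does that for the nested coordinates array of a polygon. The validate_polygon function and the small exclusion ring are made up for this illustration; the rule that a ring needs at least four positions comes from the GeoJSON format itself.

```python
def validate_polygon(coordinates):
    """Check the rings of a GeoJSON polygon; raise ValueError on a malformed ring."""
    if not coordinates:
        raise ValueError("a polygon needs at least the base (outer) ring")
    for number, ring in enumerate(coordinates):
        # A closed ring repeats its first position, so a triangle has 4 positions.
        if len(ring) < 4:
            raise ValueError("ring %d has fewer than 4 positions" % number)
        if ring[0] != ring[-1]:
            raise ValueError("ring %d is not closed" % number)
    return True


base_ring = [
    [-5.756836, 49.991408],
    [-7.250977, 55.124723],
    [1.845703, 51.500194],
    [-5.756836, 49.991408],
]
# Hypothetical exclusion ring, invented only to show the second nesting level.
excluded_ring = [[-4.5, 52.0], [-4.0, 52.5], [-3.5, 52.0], [-4.5, 52.0]]

print(validate_polygon([base_ring]))
print(validate_polygon([base_ring, excluded_ring]))
```

The first ring in the list is the base shape and every following ring is an area cut out of it, which is exactly the additional nesting level described above.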

Multipolygon

The multipolygon shape allows us to create a shape that consists of multiple polygons. An example of such a shape is as follows:

{

"type":"multipolygon",

"coordinates":[

[[

[-5.756836,49.991408],

[-7.250977,55.124723],

[1.845703,51.500194],

[-5.756836,49.991408]

]],[[

[-0.087890625,51.50874245880332],

[2.4169921875,48.80686346108517],

[3.88916015625,51.01375465718826],

[-0.087890625,51.50874245880332]

]]]

}

The multipolygon shape contains multiple polygons and follows the same rules as the polygon type. So, we can have multiple polygons and, in addition to this, each of them can include multiple exclusion shapes.

An example usage

Now that we have our index with the geo_shape fields, we can check which cities are located in the UK. The query that will allow us to do this looks as follows:

curl -XGET 'localhost:9200/map2/_search?pretty' -d '{

"query":{

"bool":{

"must":{"match_all":{}},

"filter":{

"geo_shape":{

"location":{

"shape":{

"type":"polygon",

"coordinates":[[

[-5.756836,49.991408],[-7.250977,55.124723],

[-3.955078,59.352096],[1.845703,51.500194],

[-5.756836,49.991408]

]]

}

}

}

}

}

}

}'

The polygon type defines the boundaries of the UK (in a very, very imprecise way), and Elasticsearch’s response is as follows:

{

"took":7,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":1.0,

"hits":[{

"_index":"map2",

"_type":"poi",

"_id":"2",

"_score":1.0,

"_source":{

"name":"London",

"location":{

"type":"point",

"coordinates":[-0.1275,51.507222]

}

}

}]

}

}

As far as we know, the response is correct.
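We can double-check the response with a classic point-in-polygon test. The Python sketch below uses ray casting on plain longitude and latitude values, treating them as flat coordinates. That is a simplification chosen for this example; Elasticsearch itself converts shapes to indexed terms, so this only approximates what the geo_shape filter does. The point_in_ring name is our own.

```python
def point_in_ring(lon, lat, ring):
    """Ray casting: count how many ring edges a ray going east from the point crosses."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does this edge straddle the latitude of the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


UK_RING = [
    (-5.756836, 49.991408), (-7.250977, 55.124723),
    (-3.955078, 59.352096), (1.845703, 51.500194),
    (-5.756836, 49.991408),
]

# GeoJSON order: (lon, lat)
CITIES = {
    "New York": (-73.938611, 40.664167),
    "London": (-0.1275, 51.507222),
    "Moscow": (37.616667, 55.75),
    "Sydney": (151.211111, -33.865143),
    "Lisbon": (-9.142685, 38.736946),
}

in_uk = [name for name, (lon, lat) in CITIES.items()
         if point_in_ring(lon, lat, UK_RING)]
print(in_uk)
```

An odd number of crossings means the point is inside the ring. For our data only London ends up inside the rough UK polygon, in line with the response above.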

Storing shapes in the index

Usually, shape definitions are complex, and the defined areas don’t change too often (for example, the boundaries of the UK). In such cases, it is convenient to define the shapes in the index and use them in queries. This is possible, and we will now discuss how to do it. As usual, we will start with the appropriate mapping, which is as follows:

{

"mappings":{

"country":{

"properties":{

"name":{"type":"string","index":"not_analyzed"},

"area":{"type":"geo_shape"}

}

}

}

}

This mapping is similar to the mapping used previously. We have only changed the type and field names, and saved the result in the mapping3.json file. Let’s create a new index by running the following command:

curl -XPUT localhost:9200/countries -d @mapping3.json

The example data that we will use looks as follows (stored in the file called documents3.json):

{"index":{"_index":"countries","_type":"country","_id":1}}

{"name":"UK","area":{"type":"polygon","coordinates":[[[-5.756836,49.991408],[-7.250977,55.124723],[-3.955078,59.352096],[1.845703,51.500194],[-5.756836,49.991408]]]}}

{"index":{"_index":"countries","_type":"country","_id":2}}

{"name":"France","area":{"type":"polygon","coordinates":[[[3.1640625,42.09822241118974],[-1.7578125,43.32517767999296],[-4.21875,48.22467264956519],[2.4609375,50.90303283111257],[7.998046875,48.980216985374994],[7.470703125,44.08758502824516],[3.1640625,42.09822241118974]]]}}

{"index":{"_index":"countries","_type":"country","_id":3}}

{"name":"Spain","area":{"type":"polygon","coordinates":[[[3.33984375,42.22851735620852],[-1.845703125,43.32517767999296],[-9.404296875,43.19716728250127],[-6.6796875,41.57436130598913],[-7.3828125,36.87962060502676],[-2.109375,36.52729481454624],[3.33984375,42.22851735620852]]]}}

To index the data, we just need to run the following command:

curl -XPOST localhost:9200/_bulk --data-binary @documents3.json

As you can see in the data, each document contains a polygon type. The polygons define the area of the given countries (again, it is far from being accurate). If you remember, the first point of a shape needs to be the same as the last one so that the shape is closed. Now, let’s change our query to include the shapes from the index. Our new query looks as follows:

curl -XGET 'localhost:9200/map2/_search?pretty' -d '{

"query":{

"bool":{

"must":{"match_all":{}},

"filter":{

"geo_shape":{

"location":{

"indexed_shape":{

"index":"countries",

"type":"country",

"path":"area",

"id":"1"

}

}

}

}

}

}

}'

When comparing these two queries, we can note that the shape object changed to indexed_shape. We need to tell Elasticsearch where to look for this shape. We can do this by defining the index (the index property, which defaults to shapes), the type (the type property), and the path (the path property, which defaults to shape). The remaining item is the id property of the shape. In our case, this is 1. However, if you want to index more shapes, we advise you to index the shapes with their name as their identifier.

Using suggesters

A long time ago, starting from Elasticsearch 0.90 (which was released on April 29, 2013), we got the ability to use so-called suggesters. We can define a suggester as a functionality allowing us to correct the user’s spelling mistakes and build autocomplete functionality while keeping performance in mind. This section is dedicated to these functionalities and will help you learn about them. We will discuss each available suggester type and show the most common properties that allow us to control them. However, keep in mind that this section is not a comprehensive guide describing each and every property. A description of all the details of suggesters is a very broad topic and is out of the scope of this book. If you want to dig into their functionality, refer to the official Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html) or to the Mastering Elasticsearch Second Edition book published by Packt Publishing.

Available suggester types

These have changed since the initial introduction of the Suggest API to Elasticsearch. We are now able to use four types of suggesters:

term: A suggester returning corrections for each word passed to it. Useful for suggestions that are not phrases, such as single-term queries.

phrase: A suggester working on phrases, returning a proper phrase.

completion: A suggester designed to provide fast and efficient autocomplete results.

context: An extension to the Suggest API of Elasticsearch. It allows us to handle parts of the suggest queries in memory and is thus very effective in terms of performance.

Including suggestions

Let’s now try getting suggestions along with the query results. For example, let’s use a match_all query and try getting a suggestion for a serlock holnes phrase, which has two terms spelled incorrectly. To do this, we run the following command:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"match_all":{}

},

"suggest":{

"first_suggestion":{

"text":"serlock holnes",

"term":{

"field":"_all"

}

}

}

}'

As you can see, we’ve introduced a new section to our query: the suggest one. We’ve specified the text we want to get the correction for by using the text property. We’ve specified the suggester we want to use (the term one) and configured it by specifying, with the field property, the name of the field that should be used for building suggestions. first_suggestion is the name we give to our suggester; we need to do this because multiple suggesters can be used. This is how you send a request for suggestions in general.

If we want to get multiple suggestions for the same text, we can embed our suggestions in the suggest object and place the text property directly in the suggest object. For example, if we want to get suggestions for the serlock holnes text for the title field and for the _all field, we run the following command:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"match_all":{}

},

"suggest":{

"text":"serlock holnes",

"first_suggestion":{

"term":{

"field":"_all"

}

},

"second_suggestion":{

"term":{

"field":"title"

}

}

}

}'

Suggester response

Now let’s look at the response of the first query we sent. As you can guess, the response includes both the query results and the suggestions:

{

"took":10,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":1.0,

"hits":[...]

},

"suggest":{

"first_suggestion":[{

"text":"serlock",

"offset":0,

"length":7,

"options":[{

"text":"sherlock",

"score":0.85714287,

"freq":1

}]

},{

"text":"holnes",

"offset":8,

"length":6,

"options":[{

"text":"holmes",

"score":0.8333333,

"freq":1

}]

}]

}

}

We can see that we got both the search results and the suggestions (we’ve omitted the results to make the example more readable) in the response. The term suggester returned a list of possible suggestions for each term that was present in the text parameter. For each term, the term suggester returns an array of possible suggestions. Looking at the data returned for the serlock term, we can see the original word (the text parameter), its offset in the original text parameter (the offset parameter), and its length (the length parameter).

The options array contains suggestions for the given word and will be empty if Elasticsearch doesn’t find any suggestions. Each entry in this array is a suggestion and is described by the following properties:

text: Text of the suggestion.

score: Suggestion score; the higher the score, the better the suggestion.

freq: Frequency of the suggestion. The frequency represents how many times the word appears in the documents in the index we are running the suggestion query against.
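The offset and length values make it straightforward to apply the suggestions to the text the user typed. The following Python sketch takes the first_suggestion entries from the response shown above and replaces each misspelled term with its best-scoring option. The apply_suggestions helper is written for this example only; it is not part of any Elasticsearch client.

```python
def apply_suggestions(original, entries):
    """Replace each suggested term in the original text with its best option."""
    corrected = original
    # Go from right to left so that earlier offsets stay valid after a replacement.
    for entry in sorted(entries, key=lambda e: e["offset"], reverse=True):
        if not entry["options"]:
            continue  # nothing was suggested for this term
        best = max(entry["options"], key=lambda option: option["score"])
        start = entry["offset"]
        end = start + entry["length"]
        corrected = corrected[:start] + best["text"] + corrected[end:]
    return corrected


first_suggestion = [
    {"text": "serlock", "offset": 0, "length": 7,
     "options": [{"text": "sherlock", "score": 0.85714287, "freq": 1}]},
    {"text": "holnes", "offset": 8, "length": 6,
     "options": [{"text": "holmes", "score": 0.8333333, "freq": 1}]},
]

print(apply_suggestions("serlock holnes", first_suggestion))
```

For the example response this prints sherlock holmes. Terms with an empty options array are left untouched.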

Term suggester

The term suggester works on the basis of string edit distance. This means that the suggestion requiring the fewest characters to be changed, added, or removed to turn it into the original word is the best one. For example, let’s take the words worl and work. To change the worl term to work, we need to change the letter l to k, which means a distance of 1. The text provided to the suggester is, of course, analyzed first, and then terms are chosen to be suggested.
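To get a feeling for edit distance, here is a short Python sketch of the classic Levenshtein algorithm. It is a plain textbook version used only as an illustration; the string distance implementation that Elasticsearch uses by default may treat some edits (such as transpositions) differently. The similarity function is an observation of ours: for the serlock and holnes examples shown earlier in this section, one minus the distance divided by the shorter word’s length reproduces the returned scores, but we don’t present it as the documented formula.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # remove a character from a
                current[j - 1] + 1,      # add a character to a
                previous[j - 1] + cost,  # change a character (or keep it)
            ))
        previous = current
    return previous[-1]


def similarity(term, suggestion):
    """Assumed scoring, fitted to the sample response only."""
    return 1 - edit_distance(term, suggestion) / min(len(term), len(suggestion))


print(edit_distance("worl", "work"))
print(edit_distance("serlock", "sherlock"), similarity("serlock", "sherlock"))
print(edit_distance("holnes", "holmes"), similarity("holnes", "holmes"))
```

All three pairs are a single edit apart, which is why they fit comfortably within the limits the term suggester applies to its candidates.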

Term suggester configuration options

The common and most used term suggester options can be used for all the suggester implementations that are based on the term one. Currently, these are the phrase suggester and of course the base term one. The available options are:

text: The text we want to get the suggestions for. This parameter is required in order for the suggester to work.

field: Another required parameter that we need to provide. The field parameter allows us to set which field the suggestions should be generated for.

analyzer: The name of the analyzer which should be used to analyze the text provided in the text parameter. If not set, Elasticsearch utilizes the analyzer used for the field provided by the field parameter.

size: Defaults to 5 and specifies the maximum number of suggestions allowed to be returned for each term provided in the text parameter.

suggest_mode: Controls which suggestions will be included and for what terms the suggestions will be returned. The possible options are: missing, the default behavior, which means that the suggester will only provide suggestions for terms that are not present in the index; popular, which means that the suggestions will only be returned when they are more frequent than the provided term; and finally always, which means that suggestions will be returned every time.

sort: Allows us to specify how the suggestions are sorted in the result returned by Elasticsearch. By default, it is set to score, which tells Elasticsearch that the suggestions should be sorted by the suggestion score first, the suggestion document frequency next, and finally by the term. The second possible value is frequency, which means that the results are first sorted by the document frequency, then by the score, and finally by the term.

Additional term suggester options

In addition to the preceding common term suggest options, Elasticsearch allows us to use additional ones that only make sense for the term suggester itself. Some of these options are as follows:

lowercase_terms: When set to true, it tells Elasticsearch to lowercase all the terms that are produced from the text field after analysis.

max_edits: It defaults to 2 and specifies the maximum edit distance that the suggestion can have to be returned as a term suggestion. Elasticsearch allows us to set this value to 1 or 2.

prefix_len: By default, it is set to 1. It specifies the minimum number of leading characters that a suggestion must share with the original term. If we are struggling with suggester performance, increasing this value will improve the overall performance, because fewer suggestions will need to be processed.

min_word_len: It defaults to 4 and specifies the minimum number of characters a suggestion must have in order to be returned on the suggestions list.

shard_size: It defaults to the value specified by the size parameter and allows us to set the maximum number of suggestions that should be read from each shard. Setting this property to values higher than the size parameter can result in more accurate document frequency at the cost of degradation in suggester performance.

Note

The provided list of parameters does not contain all the options that are available for the term suggester. Refer to the official Elasticsearch documentation for reference, at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-term.html.

Phrase suggester

The term suggester provides a great way to correct user spelling mistakes on a per-term basis, but it is not great for phrases. That’s why the phrase suggester was introduced. It is built on top of the term suggester, but adds phrase calculation logic to it.

Let’s start with an example of how to use the phrase suggester. This time we will omit the query section in our query. We do that by running the following command:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"suggest":{

"text":"sherlock holnes",

"our_suggestion":{

"phrase":{"field":"_all"}

}

}

}'

As you can see in the preceding command, it is almost the same as the one we sent when using the term suggester, but instead of specifying the term suggester type we’ve specified the phrase type. The response to the preceding command is as follows:

{

"took":24,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":1.0,

"hits":[...]

},

"suggest":{

"our_suggestion":[{

"text":"sherlock holnes",

"offset":0,

"length":15,

"options":[{

"text":"sherlock holmes",

"score":0.12227806

}]

}]

}

}

As you can see, the response is very similar to the one returned by the term suggester but, instead of a single word being returned, it is already combined and returned as a phrase.

Configuration

Because the phrase suggester is based on the term suggester, it can also use some of the configuration options provided by it. Those options are: text, size, analyzer, and shard_size. In addition to the mentioned properties, the phrase suggester exposes additional options. Some of these options are:

max_errors: Specifies the maximum number (or percentage) of terms that can be erroneous in order to create a correction using it. The value of this property can be either an integer number, such as 1, or a float between 0 and 1, which will be treated as a percentage value. By default, it is set to 1, which means that at most a single term can be misspelled in a given correction.

separator: Defaults to a whitespace character and specifies the separator that will be used to divide the terms in the resulting bigram field.

Note

The provided list of parameters does not contain all the options that are available for the phrase suggester. In fact, the list is way more extensive than what we’ve provided. Refer to the official Elasticsearch documentation for reference, at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-phrase.html, or to Mastering Elasticsearch Second Edition published by Packt Publishing.

Completion suggester

The completion suggester allows us to create autocomplete functionality in a very performance-effective way, because it stores complicated structures in the index instead of calculating them during query time. We need to prepare Elasticsearch for that by using a dedicated field type called completion. Let’s assume that we want to create an autocomplete feature to allow us to show book authors. In addition to the author’s name, we want to return the identifiers of the books they wrote. We start by creating the authors index by running the following command:

curl -XPOST 'localhost:9200/authors' -d '{

"mappings":{

"author":{

"properties":{

"name":{"type":"string"},

"ac":{

"type":"completion",

"payloads":true,

"analyzer":"standard",

"search_analyzer":"standard"

}

}

}

}

}'

Our index will contain a single type called author. Each document will have two fields: the name and the ac field, which is the field we will use for autocomplete. We’ve defined the ac field using the completion type. In addition to that, we’ve used the standard analyzer for both index and query time. The last thing is the payload: the additional, optional information we will return along with the suggestion. In our case it will be an array of book identifiers.

Indexing data

To index the data, we need to provide some additional information along with what we usually provide during indexing. Let’s look at the following commands that index two documents describing the authors:

curl -XPOST 'localhost:9200/authors/author/1' -d '{

"name":"Fyodor Dostoevsky",

"ac":{

"input":["fyodor","dostoevsky"],

"output":"Fyodor Dostoevsky",

"payload":{"books":["123456","123457"]}

}

}'

curl -XPOST 'localhost:9200/authors/author/2' -d '{

"name":"Joseph Conrad",

"ac":{

"input":["joseph","conrad"],

"output":"Joseph Conrad",

"payload":{"books":["121211"]}

}

}'

Note the structure of the data for the ac field. We have provided the input, output, and payload properties. The optional payload property is used to provide the additional information that will be returned. The input property is used to provide the input information that will be used for building the completion used by the suggester. It will be used for user input matching. The optional output property is used to tell the suggester which data should be returned for the document.

We can also omit the additional parameters section and index data in the way we are used to, just like in the following example:

curl -XPOST 'localhost:9200/authors/author/1' -d '{

"name":"Fyodor Dostoevsky",

"ac":"Fyodor Dostoevsky"

}'

However, because the completion suggester uses an FST (finite state transducer) under the hood, we won’t be able to find the preceding document by starting with the second part of the ac field. That’s why we think that indexing the data in the way we showed first is more convenient: we can explicitly control what we want to match and what we want to show as an output.
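The difference between the two ways of indexing can be illustrated with a deliberately simplified model. The sketch below only checks whether any indexed input starts with the typed text; it does not reproduce analysis or the real FST, and the matches function is a hypothetical helper, not part of Elasticsearch.

```python
def matches(inputs, typed):
    """Return True if any indexed input starts with the typed text (case-insensitive)."""
    typed = typed.lower()
    return any(value.lower().startswith(typed) for value in inputs)

# Indexed the short way: the whole name is the only input.
single_input = ["Fyodor Dostoevsky"]
# Indexed with explicit inputs, as in the first indexing example.
explicit_inputs = ["fyodor", "dostoevsky"]

print(matches(single_input, "fyo"))     # True
print(matches(single_input, "dos"))     # False: "dos" is not a prefix of the whole name
print(matches(explicit_inputs, "dos"))  # True: "dostoevsky" is its own input
```

Under this model, listing each part of the name as a separate input is what makes the surname usable as a starting point for completion.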

Querying indexed completion suggester data

If we want to find documents that have authors starting with fyo, we run the following command:

curl -XGET 'localhost:9200/authors/_suggest?pretty' -d '{
  "authorsAutocomplete": {
    "text": "fyo",
    "completion": {
      "field": "ac"
    }
  }
}'

Before we look at the results, let’s discuss the query. As you can see, we’ve sent the command to the _suggest endpoint, because we don’t want to run a standard query; we are just interested in the autocomplete results. The query is quite simple. We set its name to authorsAutocomplete, we set the text we want to get the completion for (the text property), and we added the completion object with the configuration in it. The result of the preceding command looks as follows:

{
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "authorsAutocomplete": [{
    "text": "fyo",
    "offset": 0,
    "length": 3,
    "options": [{
      "text": "Fyodor Dostoevsky",
      "score": 1.0,
      "payload": {
        "books": ["123456", "123457"]
      }
    }]
  }]
}

As you can see in the response, we get the document we were looking for along with the payload information, if it is available (for the preceding response, it is: the books array we indexed is returned).

We can also use fuzzy searches, which allow us to tolerate spelling mistakes. We do that by including the additional fuzzy section in our query. For example, to enable fuzzy matching in the completion suggester and set the maximum edit distance to 2 (which means that a maximum of two errors is allowed), we send the following query:

curl -XGET 'localhost:9200/authors/_suggest?pretty' -d '{
  "authorsAutocomplete": {
    "text": "fio",
    "completion": {
      "field": "ac",
      "fuzzy": {
        "edit_distance": 2
      }
    }
  }
}'

Although we’ve made a spelling mistake, we will still get the same results as we got earlier.
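To see why fio is within the allowed distance of fyo, the edit distance can be computed by hand or with a few lines of code. The following is a minimal sketch of the classic Levenshtein distance; it is an illustration only and makes no claim about how Elasticsearch computes fuzziness internally. The edit_distance name is ours.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance computed with a rolling row."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

# One substitution (i -> y) turns "fio" into "fyo".
print(edit_distance("fio", "fyo"))  # 1
```

A distance of 1 is below the configured maximum of 2, which is why the misspelled text still produces the suggestion.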

Custom weights

By default, the term frequency is used to determine the weight of the document returned by the prefix suggester. However, this may not be the best solution. In such cases, it is useful to define the weight of the suggestion by specifying the weight property for the field defined as completion. The weight property should be set to an integer value. The higher the weight property value, the more important the suggestion. For example, if we want to specify a weight for the first document in our example, we run the following command:

curl -XPOST 'localhost:9200/authors/author/1' -d '{
  "name": "Fyodor Dostoevsky",
  "ac": {
    "input": ["fyodor", "dostoevsky"],
    "output": "Fyodor Dostoevsky",
    "payload": { "books": ["123456", "123457"] },
    "weight": 30
  }
}'

Now if we run our example query, the results will be as follows:

{
  ...
  "authorsAutocomplete": [{
    "text": "fyo",
    "offset": 0,
    "length": 3,
    "options": [{
      "text": "Fyodor Dostoevsky",
      "score": 30.0,
      "payload": {
        "books": ["123456", "123457"]
      }
    }]
  }]
}

Look how the score of the result changed. In our initial example, it was 1.0 and now it is 30.0. This is because we set the weight parameter to 30 during indexing.

Context suggester

The context suggester is an extension to the Elasticsearch Suggest API for Elasticsearch 2.1 and older versions that we just discussed. When describing the completion suggester for Elasticsearch 2.1, we mentioned that this suggester allows us to handle suggester-related searches entirely in memory. Using this suggester, we can define the so-called context for the query that will limit the suggestions to a subset of documents. Because we define the context in the mappings, it is calculated during indexing, which makes query time calculations easier and less demanding in terms of performance.

Note

Remember that this section is related to Elasticsearch 2.1. Contexts in Elasticsearch 2.2 are handled differently and were covered when discussing the completion suggester.

Context types

Elasticsearch 2.1 supports two types of context: category and geo. The category type of context allows us to assign a document to one or more categories at index time. Later, at query time, we can tell Elasticsearch which category we are interested in and Elasticsearch will limit the suggestions to those categories. The geo context allows us to limit the documents returned by the suggesters to a given location or to a certain distance from a point. The nice thing about context is that we can have multiple contexts. For example, we can have both the category context and the geo context for the same document. Let’s now see what we need to do to use context in suggestions.

Using context

Using the geo and category context is very similar; they just differ in parameters. We will show you how to use contexts in an example using the simpler category context, and later we will get back to the geo context and show you what we need to provide.

The first step when using the context suggester is creating a proper mapping. Let’s get back to our author mapping, but this time let’s assume that each author can be given one or more categories: the brand of books she or he is writing. This will be our context. The mappings using the context look as follows:

curl -XPOST 'localhost:9200/authors_context' -d '{
  "mappings": {
    "author": {
      "properties": {
        "name": { "type": "string" },
        "ac": {
          "type": "completion",
          "analyzer": "simple",
          "search_analyzer": "simple",
          "context": {
            "brand": {
              "type": "category",
              "default": ["none"]
            }
          }
        }
      }
    }
  }
}'

We’ve introduced a new section in our ac field definition: context. Each context is given a name, which is brand in our case, and inside that object we provide the configuration. We need to provide the type using the type property; we will be using the category context now. In addition to that, we’ve set the default array, which provides the value or values that should be used as the default context. If we want, we can also provide the path property, which will point Elasticsearch to a field in the documents from which the context value should be taken.

We can now index a single author by modifying the commands we used earlier, because we need to provide the context:

curl -XPOST 'localhost:9200/authors_context/author/1' -d '{
  "name": "Fyodor Dostoevsky",
  "ac": {
    "input": "Fyodor Dostoevsky",
    "context": {
      "brand": "drama"
    }
  }
}'

As you can see, the ac field definition is a bit different now; it is an object. The input property is used to provide the value for autocomplete and the context object is used to provide the values for each of the contexts defined in the mappings.

Finally, we can query the data. As you might imagine, we will again provide the context we are interested in. The query that does that looks as follows:

curl -XGET 'localhost:9200/authors_context/_suggest?pretty' -d '{
  "authorsAutocomplete": {
    "text": "fyo",
    "completion": {
      "field": "ac",
      "context": {
        "brand": "drama"
      }
    }
  }
}'

As you can see, we’ve included the context object in the query inside the completion section and we’ve set the context we are interested in using the context name. The response returned by Elasticsearch is as follows:

{
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "authorsAutocomplete": [{
    "text": "fyo",
    "offset": 0,
    "length": 3,
    "options": [{
      "text": "Fyodor Dostoevsky",
      "score": 1.0
    }]
  }]
}

However, if we change the brand context to comedy, for example, Elasticsearch will return no results, because we don’t have authors with such a context. Let’s test it by running the following query:

curl -XGET 'localhost:9200/authors_context/_suggest?pretty' -d '{
  "authorsAutocomplete": {
    "text": "fyo",
    "completion": {
      "field": "ac",
      "context": {
        "brand": "comedy"
      }
    }
  }
}'

This time Elasticsearch returns the following response:

{
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "authorsAutocomplete": [{
    "text": "fyo",
    "offset": 0,
    "length": 3,
    "options": []
  }]
}

This is because no author with the brand context and the value of comedy is present in the authors_context index.
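When such requests are generated by an application rather than typed by hand, the body can be assembled programmatically. The sketch below mirrors the structure of the two queries shown in this section; context_suggest_body is a hypothetical helper name, and the script only builds and prints the JSON, it does not contact a cluster.

```python
import json

def context_suggest_body(name, text, field, context):
    """Build a _suggest request body for a completion field filtered by context."""
    return {
        name: {
            "text": text,
            "completion": {
                "field": field,
                "context": context,
            },
        }
    }

body = context_suggest_body("authorsAutocomplete", "fyo", "ac", {"brand": "drama"})
print(json.dumps(body, indent=2))
```

Switching the context value from drama to comedy changes only the dictionary passed as the last argument, which matches how the two example queries differ.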

Using the geo location context

The geo context is similar to the category context when it comes to using it. However, instead of filtering by terms, we filter using geographical points and distances. When we use the geo context, we need to provide precision, which defines the precision of the calculated geohash. The second property that we provide is neighbors, which can be set to true or false. By default, it is set to true, which means that the neighboring geohashes will be included in the context.

In addition to that, similar to the category context, we can provide path, which specifies which field to use as the lookup for the geographical point, and the default property, specifying the default geo point for the documents.

For example, let’s assume that we want to filter on the birth place of our authors. The mappings for such a suggester will look as follows:

curl -XPOST 'localhost:9200/authors_geo_context' -d '{
  "mappings": {
    "author": {
      "properties": {
        "name": { "type": "string" },
        "ac": {
          "type": "completion",
          "analyzer": "simple",
          "search_analyzer": "simple",
          "context": {
            "birth_location": {
              "type": "geo",
              "precision": ["1000km"],
              "neighbors": true,
              "default": {
                "lat": 0.0,
                "lon": 0.0
              }
            }
          }
        }
      }
    }
  }
}'

Now we can index the documents and provide the birth location. For our example author, it will look as follows (the centre of Moscow):

curl -XPOST 'localhost:9200/authors_geo_context/author/1' -d '{
  "name": "Fyodor Dostoevsky",
  "ac": {
    "input": "Fyodor Dostoevsky",
    "context": {
      "birth_location": {
        "lat": 55.75,
        "lon": 37.61
      }
    }
  }
}'

As you can see, we’ve provided the birth_location context for our author.

Now, during query time, we need to provide the context that we are interested in, and we can (but we are not obligated to) provide the precision as a subset of the precision values provided in the mappings. We’ve defined the precision as 1000km, so let’s find all the authors starting with fyo that were born in Kazan, which is about 800 km from Moscow. We should find our example author.

The query that does that looks as follows:

curl -XGET 'localhost:9200/authors_geo_context/_suggest?pretty' -d '{
  "authorsAutocomplete": {
    "text": "fyo",
    "completion": {
      "field": "ac",
      "context": {
        "birth_location": {
          "lat": 55.45,
          "lon": 49.8
        }
      }
    }
  }
}'

The response returned by Elasticsearch looks as follows:

{
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "authorsAutocomplete": [{
    "text": "fyo",
    "offset": 0,
    "length": 3,
    "options": [{
      "text": "Fyodor Dostoevsky",
      "score": 1.0
    }]
  }]
}

However, if we run the same query but point to a location far away from Moscow (latitude 0, longitude 0, which lies in the Gulf of Guinea), we will get no results:

curl -XGET 'localhost:9200/authors_geo_context/_suggest?pretty' -d '{
  "authorsAutocomplete": {
    "text": "fyo",
    "completion": {
      "field": "ac",
      "context": {
        "birth_location": {
          "lat": 0.0,
          "lon": 0.0
        }
      }
    }
  }
}'

The following is the response from Elasticsearch in this case:

{
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "authorsAutocomplete": [{
    "text": "fyo",
    "offset": 0,
    "length": 3,
    "options": []
  }]
}
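As a sanity check on the two geo queries, we can compute how far each query point is from the indexed birth location. The sketch below uses the haversine formula on a sphere with a radius of 6371 km, which is an assumption of this example. Keep in mind that Elasticsearch matches the geo context on geohash cells of the configured precision, not on an exact distance, so this only illustrates why one point is plausibly in range and the other is not.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

indexed = (55.75, 37.61)    # birth_location stored with the document
near_query = (55.45, 49.8)  # point used in the first geo query
far_query = (0.0, 0.0)      # point used in the second geo query

print(round(haversine_km(*indexed, *near_query)))  # well under 1000 km
print(round(haversine_km(*indexed, *far_query)))   # several thousand km
```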

The Scroll API

Let’s imagine that we have an index with several million documents. We already know how to build our query and so on. However, when trying to fetch a large number of documents, you see that as you get further and further through the pages of results, the queries slow down and finally time out or result in memory issues.

The reason for this is that full-text search engines, especially those that are distributed, don’t handle paging very well. Of course, getting a few hundred pages of results is not a problem for Elasticsearch, but for going through all the indexed documents or through a large result set, a specialized API has been introduced.

Problem definition

When Elasticsearch generates a response, it must determine the order of the documents that form the result. If we are on the first page, this is not a big problem. Elasticsearch just finds the set of documents and collects the first ones; let’s say, 20 documents. But if we are on the tenth page, Elasticsearch has to take all the documents from pages one to ten and then discard the ones that are on pages one to nine. This is even more complicated if we have a distributed environment, because we don’t know from which nodes the results will come. Because of that, each node needs to build the response and keep it in memory for some time. The problem is not Elasticsearch-specific; a similar situation can be found in database systems, for example, and generally in every system that uses a so-called priority queue.
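The cost described above can be put into numbers. The following back-of-the-envelope sketch assumes that every shard has to rank enough documents to cover all pages up to the requested one, and that the node coordinating the request merges the candidates from all shards; the deep_paging_cost function is a hypothetical helper and the five shards match the _shards section of the responses shown earlier.

```python
def deep_paging_cost(page, size, shards):
    """Return (documents ranked per shard, documents merged by the coordinating node)."""
    per_shard = page * size        # everything up to and including the requested page
    merged = per_shard * shards    # candidates collected from all shards
    return per_shard, merged

print(deep_paging_cost(1, 20, 5))    # (20, 100)
print(deep_paging_cost(10, 20, 5))   # (200, 1000)
print(deep_paging_cost(500, 20, 5))  # (10000, 50000)
```

Only 20 documents are returned in each case, yet the amount of intermediate work grows linearly with the page number.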

Scrolling to the rescue

The solution is simple. Since Elasticsearch has to do some operations (determine the documents for the previous pages) for each request, we can ask Elasticsearch to store this information for subsequent queries. The drawback is that we cannot store this information forever due to limited resources. Elasticsearch assumes that we can declare how long we need this information to be available. Let’s see how it works in practice.

First of all, we query Elasticsearch as we usually do. However, in addition to all the known parameters, we add one more: the scroll parameter, which says that we want to use scrolling and how long Elasticsearch should keep the information about the results. We can do this by sending a query as follows:

curl 'localhost:9200/library/_search?pretty&scroll=5m' -d '{
  "size": 1,
  "query": {
    "match_all": {}
  }
}'

The content of this query is irrelevant. The important thing is how Elasticsearch modifies the response. Look at the following first few lines of the response returned by Elasticsearch:

{

"_scroll_id":

"cXVlcnlUaGVuRmV0Y2g7NTsxNjo1RDNrYnlfb1JTeU1sX20yS0NRSUZ3OzE3OjVEM2tieV9vUl

N5TWxfbTJLQ1FJRnc7MTg6NUQza2J5X29SU3lNbF9tMktDUUlGdzsxOTo1RDNrYnlfb1JTeU1sX

20yS0NRSUZ3OzIwOjVEM2tieV9vUlN5TWxfbTJLQ1FJRnc7MDs=",

"took":3,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

...

The new part is the _scroll_id section. This is a handle that we will use in the queries that follow. Elasticsearch has a special endpoint for this: the _search/scroll endpoint. Let’s look at the following example:

curl -XGET 'localhost:9200/_search/scroll?pretty' -d '{

"scroll":"5m",

"scroll_id":

"cXVlcnlUaGVuRmV0Y2g7NTsyNjo1RDNrYnlfb1JTeU1sX20yS0NRSUZ3OzI3OjVEM2tieV9vUl

N5TWxfbTJLQ1FJRnc7Mjg6NUQza2J5X29SU3lNbF9tMktDUUlGdzsyOTo1RDNrYnlfb1JTeU1sX

20yS0NRSUZ3OzMwOjVEM2tieV9vUlN5TWxfbTJLQ1FJRnc7MDs="

}'

Now every call to this endpoint with scroll_id returns the next page of results. Remember that this handle is only valid for the defined time of inactivity.

Of course, this solution is not ideal, and it is not very appropriate when there are many requests to random pages of various results or when the time between the requests is difficult to determine. However, you can use this successfully for use cases where you want to get larger result sets, such as transferring data between several systems.
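The request and response pattern of scrolling can be summarized as a loop: start the scroll, then keep passing the returned _scroll_id back until a page comes back empty. The sketch below shows that loop against an in-memory stand-in instead of a real cluster; scroll_all, fake_start, and fake_continue are hypothetical names, and in a real client the two callables would send the HTTP requests shown above.

```python
def scroll_all(start_scroll, continue_scroll):
    """Yield every hit by following scroll ids until a page comes back empty."""
    page = start_scroll()
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            yield hit
        # Always use the most recently returned handle for the next call.
        page = continue_scroll(page["_scroll_id"])

# In-memory stand-in for an index holding four documents, served one per page.
documents = [{"_id": str(i)} for i in range(1, 5)]

def fake_start():
    return {"_scroll_id": "1", "hits": {"hits": documents[:1]}}

def fake_continue(scroll_id):
    position = int(scroll_id)
    return {"_scroll_id": str(position + 1),
            "hits": {"hits": documents[position:position + 1]}}

ids = [hit["_id"] for hit in scroll_all(fake_start, fake_continue)]
print(ids)  # ['1', '2', '3', '4']
```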

Summary

In the chapter that we just finished, we learned about some functionalities of Elasticsearch that we probably won’t use every day, or at least not every one of us will use them. We discussed the percolator, an upside-down search functionality that allows us to index queries and find which documents match them. We learned about the spatial capabilities of Elasticsearch, and we used suggesters to correct user spelling mistakes and build a highly efficient autocomplete functionality. We also used the Scroll API to efficiently fetch a large number of results from our Elasticsearch indices.

In the next chapter, we will focus on clusters and their configuration. We will discuss the node discovery, gateway, and recovery modules: what they are responsible for and how to configure them to match our needs. We will use templates and dynamic templates, and we will see how to install plugins extending Elasticsearch’s out-of-the-box functionalities. We will learn what the Elasticsearch caches are and how to configure them efficiently to make the most out of them. Finally, we will use the update settings API to update the Elasticsearch configuration on live and running clusters.

Chapter 9. Elasticsearch Cluster in Detail

The previous chapter was fully dedicated to search functionalities that are not only about full text searching. We learned how to use the percolator, an inverse search that allows us to build alerting functionalities on top of Elasticsearch. We learned to use the spatial functionalities of Elasticsearch and we used the suggest API, which allowed us to correct users’ spelling mistakes as well as build very efficient autocomplete functionalities. But let’s now focus on running and administering Elasticsearch. By the end of this chapter, you will have learned the following topics:

How Elasticsearch finds new nodes that should join the cluster
What the gateway and recovery modules are
How templates work
How to use dynamic templates
How to use the Elasticsearch plugin mechanism
What the caches in Elasticsearch are and how to tune them
How to use the Update Settings API to update Elasticsearch settings on running clusters

Understanding node discovery

When starting your Elasticsearch node, one of the first things that happens is looking for a master node that has the same cluster name and is visible. If a master is found, the node joins the already formed cluster. If no master is found, then the node itself is selected as a master (if the configuration allows such behavior, of course). The process of forming a cluster and finding nodes is called discovery. The module responsible for discovery has two main purposes: electing a master and discovering new nodes within a cluster. In this section, we will discuss how we can configure and tune the discovery module.

Discovery types

By default, without installing additional plugins, Elasticsearch allows us to use Zen discovery, which provides us with unicast discovery. Unicast (http://en.wikipedia.org/wiki/Unicast) allows transmission of a single message over the network to a single host at once. An Elasticsearch node sends the message to the nodes defined in the configuration and waits for a response. When the node is accepted into the cluster, the recovery module kicks in and starts the recovery process if needed, or the master election process if the master is still not elected.

Note

Prior to Elasticsearch 2.0, the Zen discovery module allowed us to use multicast discovery. On a multicast capable network, Elasticsearch was able to automatically discover nodes without specifying any IP addresses of other Elasticsearch servers sharing the same cluster name. This was very error prone and not advised for production use, so it was deprecated and moved to a plugin.

The Elasticsearch architecture is designed to be peer to peer. When running operations such as indexing or searching, the master node doesn’t take part in communication and the relevant nodes communicate with each other directly.

Node roles

Elasticsearch nodes can be configured to work in one of the following roles:

Master: The node responsible for maintaining the global cluster state, changing it depending on the needs, and handling the addition and removal of nodes. There can only be a single master node active in a single cluster.
Data: The node responsible for holding the data and executing data related operations (indexing and searching) on the shards that are present locally on the node.
Client: The node responsible for handling requests. For indexing requests, the client node forwards the request to the appropriate primary shard and, for search requests, it sends the request to all the relevant shards and aggregates the results.

By default, each node can work as master, data, or client. It can be a data node and a client node at the same time, for example. On large and highly loaded clusters, it is very important to divide the roles of the nodes in the cluster and have each node perform only a single role. When dealing with such clusters, you will often see at least three master nodes, multiple data nodes, and a few client-only nodes as part of the whole cluster.

Master node

It is the most important node type from the Elasticsearch cluster’s point of view. It handles the cluster state, changes it, manages the nodes joining and leaving the cluster, checks the health of the other nodes in the cluster (by running ping requests), and manages the shard relocation operations. If the master is somehow disconnected from the cluster, the remaining nodes will select a new master from among themselves. All these processes are done automatically on the basis of the configuration values we provide. You usually want the master nodes to only communicate with the other Elasticsearch nodes, using the internal Java communication. To avoid hitting the master nodes by mistake, it is advised to turn off the HTTP module for them in the configuration.

Data node

The data node is responsible for holding the data in the indices. The data nodes are the ones that need the most disk space, because they handle the indexing requests and run searches on the data they hold locally. The data nodes, similar to the master nodes, can have the HTTP module disabled.

Client node

The client nodes are in most cases nodes that don’t have any data and are not master nodes. The client nodes are the ones that communicate with the outside world and with all the nodes in the cluster. They forward the data to the appropriate shards and aggregate the search and aggregation results from all the other nodes.

Keep in mind that client nodes can have data as well, but in such a case they will run both the indexing requests and the search requests for the local data and will aggregate the data from the other nodes, which in large clusters may be too much work for a single node.

Configuring node roles

By default, Elasticsearch allows every node to be a master node, a data node, or a client node. However, as we already mentioned, in certain situations you may want to have nodes that only hold data, client nodes that are only used to process requests, and master hosts to manage the cluster. One such situation is when massive amounts of data need to be handled, where the data nodes should be as performant as possible. To tell Elasticsearch what role a node should take, we use three Boolean properties set in the elasticsearch.yml configuration file:

node.master: When set to true, we tell Elasticsearch that the node is master eligible, which means that it can take the role of a master. However, note that the node will be automatically marked as not master eligible as soon as it is assigned a client role.
node.data: When set to true, we tell Elasticsearch that the node can be used to hold data.
node.client: When set to true, we tell Elasticsearch that the node should be used as a client.

So, to set a node to only hold data, we should add the following properties to the elasticsearch.yml configuration file:

node.master: false
node.data: true
node.client: false

To set the node to not hold data and only be a master node, we need to instruct Elasticsearch that we don’t want the node to hold data. In order to do this, we add the following properties to the elasticsearch.yml configuration file:

node.master: true
node.data: false
node.client: false
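When many nodes are provisioned by a script, the two property sets above can be kept in one place and rendered on demand. The following is a small sketch of that idea; ROLE_SETTINGS and render_yaml are hypothetical names, and the values are exactly the ones from the two configurations shown in this section.

```python
# Property sets taken from the two elasticsearch.yml examples in this section.
ROLE_SETTINGS = {
    "data":   {"node.master": False, "node.data": True,  "node.client": False},
    "master": {"node.master": True,  "node.data": False, "node.client": False},
}

def render_yaml(role):
    """Return the elasticsearch.yml lines for the given role."""
    settings = ROLE_SETTINGS[role]
    return "\n".join(f"{key}: {str(value).lower()}" for key, value in settings.items())

print(render_yaml("data"))
print(render_yaml("master"))
```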

Setting the cluster’s name

If we don’t set the cluster.name property in our elasticsearch.yml file, Elasticsearch uses the elasticsearch default value. This is not a good thing, because each new Elasticsearch node will have the same cluster name and you may want to have multiple clusters in the same network. In such a case, connecting the wrong nodes together is just a matter of time. Because of that, we suggest setting the cluster.name property to some other value of your choice. Usually, it is a good idea to adjust cluster names based on cluster responsibilities.

Zen discovery

The default discovery method used by Elasticsearch, and one that is commonly used in the Elasticsearch world, is called Zen discovery. It supports unicast discovery and allows adjusting various parts of its configuration.

Note

Note that there are additional discovery types available as plugins, such as Amazon EC2 discovery, Microsoft Azure discovery, and Google Compute Engine discovery.

Master election configuration

Imagine that you have a cluster that is built of 10 nodes. Everything is working fine until one day when your network fails and 3 of your nodes are disconnected from the cluster, but they still see each other. Because of the Zen discovery and master election process, the nodes that got disconnected elect a new master and you end up with two clusters with the same name, with two master nodes. Such a situation is called a split-brain and you must avoid it as much as possible. When split-brain happens, you end up with two (or more) clusters that won’t join each other until the network (or any other) problems are fixed. The thing to remember is that split-brain may result in unrecoverable errors, such as data conflicts in which you end up with data corruption or partial data loss. That’s why it is important to avoid such situations at all costs.

In order to prevent split-brain situations, Elasticsearch provides a discovery.zen.minimum_master_nodes property. This property defines the minimum number of master eligible nodes that should be connected to each other in order to form a cluster. So now let’s get back to our cluster; if we set the discovery.zen.minimum_master_nodes property to 50 percent of the total nodes available + 1 (which is 6 in our case), we will end up with a single cluster. Why is that? Before the network failure, we had 10 nodes, which is more than six nodes, and those nodes formed a cluster. After the disconnection of the three nodes, we would still have the first cluster up and running. However, because only three nodes got disconnected and three is less than six, these three nodes wouldn’t be allowed to elect a new master and they would wait for reconnection with the original cluster.
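The arithmetic behind this example is short enough to write down. The sketch below computes the quorum as half of the master eligible nodes plus one (using integer division) and checks which side of the network partition is allowed to elect a master; both function names are hypothetical.

```python
def minimum_master_nodes(master_eligible):
    """Quorum: more than half of the master eligible nodes."""
    return master_eligible // 2 + 1

def can_elect_master(visible_nodes, quorum):
    """A group of nodes may elect a master only if it reaches the quorum."""
    return visible_nodes >= quorum

quorum = minimum_master_nodes(10)
print(quorum)                       # 6
print(can_elect_master(7, quorum))  # True: the seven nodes that stayed connected
print(can_elect_master(3, quorum))  # False: the three disconnected nodes
```

Because two disjoint groups can never both contain more than half of the nodes, at most one side of any partition can keep or elect a master.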

Of course, this is also not a perfect scenario. It is advised to have dedicated master eligible nodes that don’t work as data or client nodes. To have a quorum in such a case, we need at least three dedicated master eligible nodes, because that will allow us to have a single master offline and still keep the quorum. This is usually enough to keep the cluster in good shape when it comes to master related features and to be split-brain proof. Of course, in such a case, the discovery.zen.minimum_master_nodes property should be set to 2 and we should have the three master nodes up and running.

Furthermore, Elasticsearch allows us to specify two additional Boolean properties: discovery.zen.master_election.filter_client and discovery.zen.master_election.filter_data. They allow us to tell Elasticsearch to ignore ping requests from the client and data nodes during master election. By default, the first mentioned property is set to true and the second is set to false. This allows Elasticsearch to focus on the master election and not be overloaded with ping requests from the nodes that are not master eligible.

In addition to the mentioned properties, Elasticsearch allows configuring timeouts related to the master election process. The discovery.zen.ping_timeout property, which defaults to 3s (three seconds), allows configuring the timeout for slow networks: the higher the value, the lower the chance of failure, but the election process can take longer. The second property is called discovery.zen.join_timeout and specifies the timeout for the join request to the master. It defaults to 20 times the discovery.zen.ping_timeout property.

Configuring unicast

Because of the way unicast works, we need to specify at least one host that the unicast message should be sent to. To do this, we should add the discovery.zen.ping.unicast.hosts property to our elasticsearch.yml configuration file. Basically, we should specify all the hosts that form the cluster in the discovery.zen.ping.unicast.hosts property (we don’t have to specify all the hosts; we just need to provide enough so that we are sure that at least one will be reachable). For example, if we want to use the hosts 192.168.2.1, 192.168.2.2, and 192.168.2.3, we should specify the preceding property in the following way:

discovery.zen.ping.unicast.hosts: 192.168.2.1:9300, 192.168.2.2:9300, 192.168.2.3:9300

One can also define a range of the ports Elasticsearch can use. For example, to say that ports from 9300 to 9399 can be used, we specify the following:

discovery.zen.ping.unicast.hosts: 192.168.2.1:[9300-9399], 192.168.2.2:[9300-9399], 192.168.2.3:[9300-9399]

Note that the hosts are separated with a comma character and we’ve specified the port on which we expect unicast messages.
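To make the notation concrete, the sketch below expands a comma-separated host list into the individual host and port pairs it denotes. This is not the parser Elasticsearch uses; expand_host is a hypothetical helper that assumes IPv4 addresses or host names without colons.

```python
import re

def expand_host(entry):
    """Expand 'host:port' or 'host:[first-last]' into a list of (host, port) pairs."""
    host, _, port_spec = entry.strip().partition(":")
    match = re.fullmatch(r"\[(\d+)-(\d+)\]", port_spec)
    if match:
        first, last = int(match.group(1)), int(match.group(2))
        return [(host, port) for port in range(first, last + 1)]
    return [(host, int(port_spec))]

setting = "192.168.2.1:[9300-9399], 192.168.2.2:9300"
targets = [pair for entry in setting.split(",") for pair in expand_host(entry)]

print(targets[0])    # ('192.168.2.1', 9300)
print(targets[-1])   # ('192.168.2.2', 9300)
print(len(targets))  # 101: one hundred ports on the first host, one on the second
```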

Fault detection ping settings

In addition to the settings discussed previously, we can also control or alter the default ping configuration. Ping is a signal sent between the nodes to check if they are running and responsive. The master node pings all the other nodes in the cluster and each of the other nodes in the cluster pings the master node. The following properties can be set:

discovery.zen.fd.ping_interval: This defaults to 1s (one second) and specifies how often the nodes ping each other
discovery.zen.fd.ping_timeout: This defaults to 30s (30 seconds) and defines how long a node will wait for the response to its ping message before considering a node as unresponsive
discovery.zen.fd.ping_retries: This defaults to 3 and specifies how many retries should be taken before considering a node as not working

If you experience some problems with your network, or you know that your nodes need more time to see the ping response, you can adjust the preceding values to the ones that are good for your deployment.
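When tuning these values, it helps to estimate how long an unresponsive node can go unnoticed. The sketch below is a rough upper bound only, under the assumption that every attempt waits for the full timeout and that the retries follow each other immediately; the function name is hypothetical and the real timing depends on the Elasticsearch version and on the kind of failure.

```python
def worst_case_detection_seconds(ping_timeout_s, ping_retries):
    """Rough upper bound: every attempt waits for the full timeout before failing."""
    return ping_timeout_s * ping_retries

# Defaults listed above: ping_timeout of 30s and ping_retries of 3.
print(worst_case_detection_seconds(30, 3))  # 90
# A tighter configuration reacts faster, but tolerates fewer slow responses.
print(worst_case_detection_seconds(10, 3))  # 30
```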

ClusterstateupdatescontrolAswehavealreadydiscussed,themasternodeistheoneresponsibleforhandlingthechangesoftheclusterstateandElasticsearchallowsustocontrolthatprocess.Formostusecases,thedefaultsettingsaremorethanenough,butyoumayrunintosituationswherechangingthesettingsisrequired.

Themasternodeprocessesasingleclusterstatecommandatatime.Firstthemasternodepropagatesthechangestoothernodesandthenitwaitsforresponse.Eachclusterstatechangeisnotconsideredfinisheduntilenoughnodesrespondtothemasterwithacknoledgment.Thenumberofnodesthatneedtorespondisspecifiedbydiscovery.zen.minimum_master_nodes,whichwearealreadyawareof.ThemaximumtimeanElasticsearchnodewaitsforthenodestorespondis30sbydefaultandisspecifiedbythediscovery.zen.commit_timeoutproperty.Ifnotenoughnodesrespondtothemaster,theclusterstatechangeisrejected.

Onceenoughnodesrespondtothemasterpublishmessage,theclusterstatechangeisacceptedonthemasterandtheclusterstateischanged.Oncethatisdone,themastersendsamessagetoallthenodessayingthatthechangecanbeapplied.Thetimeoutofthismessageisagainsetto30secondsandiscontrolledusingthediscovery.zen.publish_timeoutproperty.

Dealing with master unavailability

If a cluster has no master node, whatever the reason may be, it is not fully operational. By default, we can’t change the metadata, cluster wide commands will not be working, and so on. Elasticsearch allows us to configure the behavior of the nodes when the master node is not elected. To do that, we can use the discovery.zen.no_master_block property, which accepts the values all and write. Setting this property to all means that all the operations on the node will be rejected, that is, the search operations, the write related operations, and the cluster wide operations such as health or mappings retrieval. Setting this property to write means that only the write operations will be rejected; this is the default behavior of Elasticsearch.

Adjusting HTTP transport settings

While discussing the node discovery module and process, we mentioned the HTTP module a few times. We would like to get back to that topic now and discuss a few properties that are useful when working with Elasticsearch.

Disabling HTTP

The first thing is disabling HTTP completely. This is useful to ensure that the master and data nodes won’t accept any queries or requests in general from users. To disable the HTTP transport completely, we just need to add the http.enabled property and set it to false in our elasticsearch.yml file.

HTTP port

Elasticsearch allows us to define the port on which it will be listening to HTTP requests. This is done by using the http.port property. It defaults to 9200-9300, which means that Elasticsearch will start from port 9200 and increase the port number if the port is not available (so the next instance will use port 9201, and so on). There is also http.publish_port, which is very useful when running Elasticsearch behind a firewall and when the HTTP port is not directly accessible. It defines which port should be used by the clients connecting to Elasticsearch and defaults to the same value as the http.port property.
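The way a port range such as 9200-9300 is walked can be illustrated with a few lines of Python. This is only a model of the selection rule described above: the function name and the ports_in_use argument are made up for the example, and no sockets are opened.

```python
def pick_http_port(port_range="9200-9300", ports_in_use=()):
    """Return the first port of the range that is not taken, or None."""
    first, _, last = port_range.partition("-")
    low = int(first)
    high = int(last) if last else low  # a single port is a range of one
    taken = set(ports_in_use)
    for port in range(low, high + 1):
        if port not in taken:
            return port
    return None


# First instance on the machine gets the start of the range.
print(pick_http_port())
# A second instance started on the same machine moves to the next port.
print(pick_http_port(ports_in_use=[9200]))
```

Setting http.port to a single value instead of a range corresponds to the case where the function returns None as soon as that one port is occupied.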

HTTP host

We can also define the host to which Elasticsearch will bind. To specify it, we need to define the http.host property. The default value is the one set by the network module. If needed, we can set the publish host and the bind host separately using the http.publish_host and http.bind_host properties. You usually don’t have to specify these properties unless your nodes have nonstandard host names or multiple names and you want Elasticsearch to bind to a single one only.

You can find the full list of properties allowed for the HTTP module in the official Elasticsearch documentation available at https://www.elastic.co/guide/en/elasticsearch/reference/2.2/modules-http.html.

The gateway and recovery modules

Apart from our indices and the data indexed inside them, Elasticsearch needs to hold the metadata, such as the type mappings, the index level settings, and so on. This information needs to be persisted somewhere so it can be read during cluster recovery. Of course, it could be stored in memory, but a full cluster restart or a fatal failure would result in this information being lost, which is not something that we want. This is why Elasticsearch introduced the gateway module. You can think about it as a safe haven for your cluster data and metadata. Each time you start your cluster, all the needed data is read from the gateway and, when you make a change to your cluster, it is persisted using the gateway module.

The gateway

In order to set the type of gateway we want to use, we need to add the gateway.type property to the elasticsearch.yml configuration file and set it to the local value. Currently, Elasticsearch recommends using the local gateway type (gateway.type set to local), which is the default one and the only one available without additional plugins.

The default local gateway type stores the indices and their metadata in the local file system. Compared to the other gateways, the write operation to this gateway is not performed in an asynchronous way, so, whenever a write succeeds, you can be sure that the data was written into the gateway (so basically indexed or stored in the transaction log).

Recovery control

In addition to choosing the gateway type, Elasticsearch allows us to configure when to start the initial recovery process. The recovery is a process of initializing all the shards and replicas, reading all the data from the transaction log, and applying it on the shards. Basically, it’s a process needed to start Elasticsearch.

For example, let’s imagine that we have a cluster that consists of 10 Elasticsearch nodes. We should inform Elasticsearch about the number of nodes by setting gateway.expected_nodes to that value, so 10 in our case. We inform Elasticsearch about the number of expected nodes that are eligible to hold the data and eligible to be selected as a master. Elasticsearch will start the recovery process immediately if the number of nodes in the cluster is equal to that property.

We would also like to start the recovery after six nodes are together. To do this, we should set the gateway.recover_after_nodes property to 6. This property should be set to a value that ensures that the newest version of the cluster state snapshot will be available, which usually means that you should start recovery when most of your nodes are available.

There is also one more thing. We would like the gateway recovery process to start 5 minutes after the gateway.recover_after_nodes condition is met. To do this, we set the gateway.recover_after_time property to 5m. This property tells the gateway module how long to wait with the recovery process after the number of nodes reached the minimum specified by the gateway.recover_after_nodes property. We may want to do this because we know that our network is quite slow and we want the nodes communication to be stable. Note that Elasticsearch won’t delay the recovery if the number of master and data eligible nodes that formed the cluster is equal to the value of the gateway.expected_nodes property.

The preceding property values should be set in the elasticsearch.yml configuration file. For example, if we would like to have the previously discussed values in the mentioned file, we would end up with the following section in the file:

gateway.recover_after_nodes: 6
gateway.recover_after_time: 5m
gateway.expected_nodes: 10
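How the three gateway settings interact can be sketched as a small decision function. The Python below is a simplified model of the rules described in this section, not Elasticsearch code; the function name, the returned labels, and the use of "at least" instead of "equal to" for the expected node count are assumptions made for the example.

```python
def recovery_decision(nodes_in_cluster, seconds_since_threshold,
                      expected_nodes=10, recover_after_nodes=6,
                      recover_after_seconds=300):
    """Decide what the gateway does for the configuration shown above.

    seconds_since_threshold is the time that has passed since the
    recover_after_nodes condition was first met.
    """
    # All expected nodes are present: recovery starts without any delay.
    if nodes_in_cluster >= expected_nodes:
        return "recover now"
    # Too few nodes: the recover_after_time timer has not even started.
    if nodes_in_cluster < recover_after_nodes:
        return "wait for nodes"
    # Enough nodes, but not all of them: wait until the timer runs out.
    if seconds_since_threshold >= recover_after_seconds:
        return "recover now"
    return "wait for timer"


print(recovery_decision(10, 0))    # full cluster
print(recovery_decision(5, 1000))  # below recover_after_nodes
print(recovery_decision(7, 60))    # threshold met one minute ago
print(recovery_decision(7, 300))   # threshold met five minutes ago
```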

Additional gateway recovery options

In addition to the mentioned options, Elasticsearch allows us some additional degree of control. These additional options are:

gateway.recover_after_master_nodes: This is similar to the gateway.recover_after_nodes property, but instead of taking into consideration all the nodes, it allows us to specify how many master eligible nodes should be present in the cluster before recovery starts

gateway.recover_after_data_nodes: This is also similar to the gateway.recover_after_nodes property, but it allows specifying how many data nodes should be present in the cluster before recovery starts

gateway.expected_master_nodes: This is similar to the gateway.expected_nodes property, but instead of specifying the number of all the nodes that we expect in the cluster, it allows specifying how many master eligible nodes we expect to be present

gateway.expected_data_nodes: This is also similar to the gateway.expected_nodes property, but allows specifying how many data nodes we expect to be present

Indices recovery API

There is also one other thing when it comes to the recovery process: the indices recovery API. It allows us to see the progress of index or indices recovery. To use it, we just need to specify the indices and use the _recovery endpoint. For example, to check the recovery process of the library index, we will run the following command:

curl -XGET 'localhost:9200/library/_recovery?pretty'

The response for the preceding command can be large and depends on the number of shards in the index and of course the number of indices we want to get information for. In our case, the response looks as follows (we left only the information about a single shard to make it less extensive):

{
  "library": {
    "shards": [{
      "id": 0,
      "type": "STORE",
      "stage": "DONE",
      "primary": true,
      "start_time_in_millis": 1444030695956,
      "stop_time_in_millis": 1444030695962,
      "total_time_in_millis": 5,
      "source": {
        "id": "Brt5ejEVSVCkIfvY9iDMRQ",
        "host": "127.0.0.1",
        "transport_address": "127.0.0.1:9300",
        "ip": "127.0.0.1",
        "name": "Puff Adder"
      },
      "target": {
        "id": "Brt5ejEVSVCkIfvY9iDMRQ",
        "host": "127.0.0.1",
        "transport_address": "127.0.0.1:9300",
        "ip": "127.0.0.1",
        "name": "Puff Adder"
      },
      "index": {
        "size": {
          "total_in_bytes": 157,
          "reused_in_bytes": 157,
          "recovered_in_bytes": 0,
          "percent": "100.0%"
        },
        "files": {
          "total": 1,
          "reused": 1,
          "recovered": 0,
          "percent": "100.0%"
        },
        "total_time_in_millis": 1,
        "source_throttle_time_in_millis": 0,
        "target_throttle_time_in_millis": 0
      },
      "translog": {
        "recovered": 0,
        "total": -1,
        "percent": "-1.0%",
        "total_on_start": -1,
        "total_time_in_millis": 4
      },
      "verify_index": {
        "check_index_time_in_millis": 0,
        "total_time_in_millis": 0
      }
    },
    ...
    ]
  }
}

The response contains information about each shard. For each shard, we see the type of the operation (the type property), the stage (the stage property) describing what part of the recovery process is in progress, and whether it is a primary shard (the primary property). In addition to this, we see sections about the source shard, the target shard, the index the shard is part of, the information about the transaction log, and finally information about the index verification. All of this allows us to see the status of the recovery of our indices.
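When an index has many shards, it is often easier to reduce such a response to a few columns. The following Python sketch does that for a response that has already been parsed into a dictionary; it relies only on the fields visible in the response above. The second shard in the sample data is hypothetical and was added just to show a recovery that is still in progress.

```python
def summarize_recovery(response):
    """Return one (index, shard, type, stage, primary, files percent) row per shard."""
    rows = []
    for index_name, index_data in sorted(response.items()):
        for shard in index_data.get("shards", []):
            rows.append((
                index_name,
                shard["id"],
                shard["type"],
                shard["stage"],
                shard["primary"],
                shard["index"]["files"]["percent"],
            ))
    return rows


def unfinished(rows):
    """Keep only the shards whose recovery has not reached the DONE stage."""
    return [row for row in rows if row[3] != "DONE"]


sample = {
    "library": {
        "shards": [
            {"id": 0, "type": "STORE", "stage": "DONE", "primary": True,
             "index": {"files": {"percent": "100.0%"}}},
            # Hypothetical replica that is still copying files.
            {"id": 1, "type": "REPLICA", "stage": "INDEX", "primary": False,
             "index": {"files": {"percent": "40.0%"}}},
        ]
    }
}

rows = summarize_recovery(sample)
for row in rows:
    print(row)
print(unfinished(rows))
```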

Delayed allocation

We already discussed that by default Elasticsearch tries to balance the shards in the cluster according to the number of nodes in that cluster. Because of that, when a node drops off the cluster (or multiple nodes do) or when nodes join the cluster, Elasticsearch starts rebalancing the cluster, moving the shards and the replicas around. This is usually very expensive: new primary shards may be promoted out of the available replicas, large amounts of data may be copied between the new primary and its replicas, and so on. And this may be happening because a single node was just restarted for 30 seconds of maintenance.

To avoid such situations, Elasticsearch provides us with the possibility to control how long to wait before beginning allocation of shards that are in unassigned state. We can control the delay by using the index.unassigned.node_left.delayed_timeout property and setting it on a per index basis. For example, to configure the allocation timeout for the library index to 10 minutes, we run the following command:

curl -XPUT 'localhost:9200/library/_settings' -d '{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "10m"
  }
}'

We can also configure the allocation timeout for all the indices by running the following command:

curl -XPUT 'localhost:9200/_all/_settings' -d '{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "10m"
  }
}'

Index recovery prioritization

Elasticsearch 2.2 exposes one more feature when it comes to the indices recovery process that allows us to define which indices should be prioritized when it comes to recovery. By specifying the index.priority property in the index settings and assigning it a positive integer value, we define the order in which Elasticsearch should recover the indices; the ones with the higher index.priority property will be started first.

For example, let’s assume that we have two indices, library and map, and we want the library index to be recovered before the map index. To do this, we will run the following commands:

curl -XPUT 'localhost:9200/library/_settings' -d '{
  "settings": {
    "index.priority": 10
  }
}'

curl -XPUT 'localhost:9200/map/_settings' -d '{
  "settings": {
    "index.priority": 1
  }
}'

We assigned a higher priority to the library index and, because of that, its recovery will be started first.
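The ordering rule itself is simple enough to express in a few lines of Python. This sketch only models the effect of index.priority described above; how Elasticsearch orders indices with equal priority is deliberately left out, and the third index name is made up for the example.

```python
def recovery_order(priorities):
    """Order index names so that the highest index.priority comes first."""
    return sorted(priorities, key=lambda name: priorities[name], reverse=True)


# The two indices from the example plus a hypothetical third one.
priorities = {"map": 1, "library": 10, "logs": 5}
order = recovery_order(priorities)
print(order)
```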

Templates and dynamic templates

In the Mappings configuration section of Chapter 2, Indexing Your Data, we discussed mappings, how they are created, and how the type-determining mechanism works. Now we will get into more advanced topics. We will show you how to dynamically create mappings for new indices and how to apply some logic to the templates, so that new indices are already created with predefined mappings.

Templates

In various parts of the book, when discussing index configuration and its structure, we’ve seen that this can become complicated, especially when we have sophisticated data structures that we want to index, search, and aggregate. If you have a lot of similar indices, taking care of the mappings in each of them can be a very painful process, because each new index has to be created with appropriate mappings. Elasticsearch creators predicted this and implemented a feature called index templates. Each template defines a pattern, which is compared to a newly created index name. When both of them match, the values defined in the template are copied to the index structure definition. When multiple templates match the name of the newly created index, all of them are applied and the values from the templates that are applied later override the values defined in the previously applied templates. This is very convenient because we can define a few common settings in the general templates and change them in the more specialized ones. In addition, there is an order parameter that lets us force the desired template ordering. You can think of templates as dynamic mappings that can be applied not to the types in documents but to the indices.

An example of a template

Let’s see a real example of a template. Imagine that we want to create many indices in which we don’t want to store the source of the documents so that our indices are smaller. We also don’t need any replicas. We can create a template that matches our need by using the Elasticsearch REST API and the /_template endpoint, by sending the following command:

curl -XPUT 'http://localhost:9200/_template/main_template?pretty' -d '{
  "template": "*",
  "order": 1,
  "settings": {
    "index.number_of_replicas": 0
  },
  "mappings": {
    "_default_": {
      "_source": {
        "enabled": false
      }
    }
  }
}'

From now on, all the created indices will have no replicas and no source stored. This is because the template parameter value is set to *, which matches all the names of the indices. Note the _default_ type name in our example. This is a special type name which indicates that the current rule should be applied to every document type. The second interesting thing is the order parameter. Let’s define a second template by using the following command:

curl -XPUT 'http://localhost:9200/_template/ha_template?pretty' -d '{
  "template": "ha_*",
  "order": 10,
  "settings": {
    "index.number_of_replicas": 5
  }
}'

After running the preceding command, all the new indices will behave as earlier except the ones with names beginning with ha_. In the case of these indices, both the templates are applied. First, the template with the lower order value is used and then the next template overwrites the replicas setting. So, the indices whose names start with ha_ will have five replicas and source storing disabled.
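The way matching templates are layered on top of each other can be modelled in a few lines of Python. This is an illustrative sketch, not the Elasticsearch implementation: Python’s fnmatch module stands in for the index name pattern matching, and the function names are made up for the example. The two templates are the ones defined above.

```python
from fnmatch import fnmatchcase


def deep_merge(base, override):
    """Return a copy of base with the values from override layered on top."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def settings_for_new_index(index_name, templates):
    """Apply every matching template, from the lowest order to the highest."""
    matching = [t for t in templates.values()
                if fnmatchcase(index_name, t["template"])]
    merged = {}
    for template in sorted(matching, key=lambda t: t.get("order", 0)):
        body = {k: v for k, v in template.items() if k in ("settings", "mappings")}
        merged = deep_merge(merged, body)
    return merged


templates = {
    "main_template": {
        "template": "*", "order": 1,
        "settings": {"index.number_of_replicas": 0},
        "mappings": {"_default_": {"_source": {"enabled": False}}},
    },
    "ha_template": {
        "template": "ha_*", "order": 10,
        "settings": {"index.number_of_replicas": 5},
    },
}

print(settings_for_new_index("orders", templates))
print(settings_for_new_index("ha_orders", templates))
```

For the ha_orders index both templates match, so the replica count comes from ha_template (order 10), while the disabled _source is inherited from main_template (order 1).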

Note

Before version 2.0, Elasticsearch templates could also be stored in files. Starting with Elasticsearch 2.0, this feature is no longer available.

Dynamic templates

Sometimes we want to define a field mapping that depends on the field name and the detected type. This is where dynamic templates can help. Dynamic templates are similar to the usual mappings, but each template has its pattern defined, which is applied to a document’s field name. If a field name matches the pattern, the template is used.

Let’s have a look at the following example:

curl -XPOST 'localhost:9200/news' -d '{
  "mappings": {
    "article": {
      "dynamic_templates": [
        {
          "template_test": {
            "match": "*",
            "mapping": {
              "index": "analyzed",
              "fields": {
                "str": {
                  "type": "{dynamic_type}",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      ]
    }
  }
}'

In the preceding example, we defined the mapping for the article type. In this mapping, we have only one dynamic template named template_test. This template is applied to every field in the input document because of the single asterisk pattern in the match property. Each field will be treated as a multi-field, consisting of a field named as the original field (for example, title) and a second field with the name suffixed with str (for example, title.str). The main field keeps the type determined by Elasticsearch and is indexed as analyzed, while the str sub-field is given the same detected type (through the {dynamic_type} variable) and is indexed as not_analyzed.

The matching pattern

We have two ways of defining the matching pattern. They are as follows:

match: This template is used if the name of the field matches the pattern (this pattern type was used in our example)

unmatch: This template is used if the name of the field doesn’t match the pattern

By default, the pattern is very simple and uses glob patterns. This can be changed by setting the match_pattern property to regex. After adding this property, we can use all the magic provided by regular expressions in the match and unmatch patterns. There are variations such as path_match and path_unmatch that can be used to match the names in nested documents (by providing the path, similar to queries).

Field definitions

When writing a target field definition, the following variables can be used:

{name}: The name of the original field found in the input document

{dynamic_type}: The type determined from the original document

Note

Note that Elasticsearch checks the templates in the order of their definitions and the first matching template is applied. This means that the most generic templates (for example, with "match":"*") must be defined at the end.
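The first-match rule and the variable substitution can be illustrated with a short Python model. It is a sketch under simplifying assumptions: only the glob style match and unmatch properties are handled, Python’s fnmatch stands in for the glob matching, and the dates_as_is template is hypothetical. The template_test entry is the one from the earlier example.

```python
from fnmatch import fnmatchcase


def fill_placeholders(value, name, dynamic_type):
    """Replace {name} and {dynamic_type} in every string value of a mapping."""
    if isinstance(value, dict):
        return {key: fill_placeholders(item, name, dynamic_type)
                for key, item in value.items()}
    if isinstance(value, str):
        return value.replace("{name}", name).replace("{dynamic_type}", dynamic_type)
    return value


def pick_dynamic_template(field_name, detected_type, dynamic_templates):
    """Return (template name, resolved mapping) for the first matching template."""
    for entry in dynamic_templates:
        for template_name, template in entry.items():
            if "match" in template and not fnmatchcase(field_name, template["match"]):
                continue
            if "unmatch" in template and fnmatchcase(field_name, template["unmatch"]):
                continue
            mapping = fill_placeholders(template["mapping"], field_name, detected_type)
            return template_name, mapping
    return None, None


dynamic_templates = [
    # Hypothetical, more specific template; it must be listed first to be used.
    {"dates_as_is": {"match": "*_date", "mapping": {"type": "{dynamic_type}"}}},
    # The catch-all template from the example, listed last.
    {"template_test": {"match": "*", "mapping": {
        "index": "analyzed",
        "fields": {"str": {"type": "{dynamic_type}", "index": "not_analyzed"}},
    }}},
]

print(pick_dynamic_template("title", "string", dynamic_templates))
print(pick_dynamic_template("publish_date", "date", dynamic_templates))
```

If the two entries were swapped, template_test would match every field name first and dates_as_is would never be applied.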

Elasticsearch plugins

At various places in this book, we have used different plugins that have been able to extend the core functionality of Elasticsearch. You probably remember the additional programming languages used in scripts described in the Scripting capabilities of Elasticsearch section of Chapter 6, Make Your Search Better. In this section, we will look at how the plugins work and how to install them.

The basics

By default, Elasticsearch plugins are located in their own subdirectory in the plugins subdirectory of the search engine home directory. If you have downloaded a new plugin manually, you can just create a new directory with the plugin name and unpack that plugin archive to this directory. There is also a more convenient way to install plugins: by using the plugin script. We have used it several times in this book without talking about it, so this time let’s take the time and describe this tool.

Elasticsearch has two main types of plugins. These two types can be categorized based on the content of the plugin-descriptor.properties file: Java plugins and site plugins. Let’s start with the site plugins. They usually contain sets of HTML, CSS, and JavaScript files and add additional UI components to Elasticsearch. Elasticsearch treats the site plugins as a file set that should be served by the built-in HTTP server under the /_plugin/plugin_name/ URL (for example, /_plugin/bigdesk/). This type of plugin doesn’t change anything in core Elasticsearch functionality.

The Java plugins are the ones that add or modify the core Elasticsearch features. They usually contain the JAR files. The plugin-descriptor.properties file contains information about the main class that should be used by Elasticsearch as an entry point to configure plugins and allow them to extend the Elasticsearch functionality. The nice thing about the Java plugins is that they can contain the site part as well. The site part of the plugin needs to be placed in the _site directory if we are unpacking the plugin manually.

Installing plugins

Plugins can be downloaded from three source types. The first is the official repository located at https://download.elastic.co. All plugins from this source can be installed by referring to the plugin name. For example:

bin/plugin install lang-javascript

The preceding command results in installation of a plugin that allows us to use an additional scripting language, JavaScript. Elasticsearch automatically tries to find a plugin version that is the same as the version of Elasticsearch we are using. Sometimes, like in the following example, a plugin may ask for additional permissions during installation.

Just so we know what to expect, this is an example result of running the preceding command:

-> Installing lang-javascript…
Trying https://download.elastic.co/elasticsearch/release/org/elasticsearch/plugin/lang-javascript/2.2.0/lang-javascript-2.2.0.zip …
Downloading …........................................................................DONE
Verifying https://download.elastic.co/elasticsearch/release/org/elasticsearch/plugin/lang-javascript/2.2.0/lang-javascript-2.2.0.zip checksums if available …
Downloading .DONE
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@     WARNING: plugin requires additional permissions     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.lang.RuntimePermission createClassLoader
* org.elasticsearch.script.ClassPermission <<STANDARD>>
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.ContextFactory
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.Callable
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.NativeFunction
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.Script
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.ScriptRuntime
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.Undefined
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.optimizer.OptRuntime
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.

Continue with installation? [y/N]y
Installed lang-javascript into /Users/someplace/elasticsearch-2.2.0/plugins/lang-javascript

If the plugin is not available at the first location, it can be placed in one of the Apache Maven repositories: Maven Central (https://search.maven.org/) or Maven Sonatype (https://oss.sonatype.org/). In this case, the plugin name for installation should be equal to groupId/artifactId/version, just as for every library for Maven (http://maven.apache.org/). For example:

bin/plugin install org.elasticsearch/elasticsearch-mapper-attachments/3.0.1

The third source type is the GitHub (https://github.com/) repositories. The plugin tool assumes that the given plugin address contains the organization name followed by the plugin name and, optionally, the version number. Let’s look at the following command example:

bin/plugin install mobz/elasticsearch-head

If you write your own plugin and you have no access to the earlier-mentioned sites, there is no problem. The plugin tool accepts the URL from which the plugin should be downloaded (instead of the name of the plugin). This option allows us to set any location for the plugins, including the local file system (using the file:// prefix) or a remote file (using the http:// prefix). For example, the following command will result in the installation of a plugin archived on the local file system as the /tmp/elasticsearch-lang-javascript-3.0.0.RC1.zip file:

bin/plugin install file:///tmp/elasticsearch-lang-javascript-3.0.0.RC1.zip

Removing plugins

Removing a plugin is as simple as removing its directory. You can also do this by using the plugin tool. For example, to remove the previously installed JavaScript plugin, we run a command as follows:

bin/plugin remove lang-javascript

The output from the command just confirms that the plugin was removed:

-> Removing lang-javascript…
Removed lang-javascript

Note

You need to restart the Elasticsearch node for the plugin installation or removal to take effect.

Elasticsearch caches

Until now we haven’t mentioned Elasticsearch caches much in the book. However, like most systems, Elasticsearch uses a variety of caches to perform more complicated operations or to speed up heavy data retrieval from disk based Lucene indices. In this section, we will look at the most common caches of Elasticsearch, what they are used for, what the performance implications of using them are, and how to configure them.

Field data cache

In the beginning of the book, we discussed that Elasticsearch uses the so called inverted index data structure to quickly and efficiently search through the documents. This is very good when searching and filtering the data, but for features such as aggregations, sorting, or script usage, Elasticsearch needs an un-inverted data structure, because these functions rely on per document data information.

Because of the need for uninverted data, when Elasticsearch was first released it contained, and still contains, an in memory data structure called field data. Field data is used to store all the values of a given field in memory to provide very fast document based lookup. However, the cost of using field data is memory and increased garbage collection. Because of the memory and performance cost, starting from Elasticsearch 2.0, each indexed, not analyzed field uses doc values by default. Other fields, such as analyzed text fields, still use field data and because of that it is good to know how to handle field data.

Field data size

Elasticsearch allows us to control how much memory the field data cache uses. By default, the cache is unbounded, which is very dangerous. If you have large indices, you may run into memory issues, where the field data cache will eat most of the memory given to Elasticsearch and will result in node failure. We are allowed to configure the size of the field data cache by using the static indices.fielddata.cache.size property set to an explicit value (like 10GB) or to a percentage of the whole memory given to Elasticsearch (like 20%).

Remember that the field data cache is very expensive to build as it needs to load all the values of a given field to memory. This can take a lot of time resulting in degradation in the performance of the queries. Because of this, it is advised to have enough memory to keep the needed cache permanently in Elasticsearch memory. However, we understand that this is not always possible because of hardware costs.
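To see what the two notations of indices.fielddata.cache.size mean in bytes, the following Python sketch converts a setting value for a given heap size. It is a rough illustration only: the function name is made up, the unit table assumes 1024 based units, and the real parsing in Elasticsearch may accept more forms than shown here.

```python
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "tb": 1024 ** 4}


def cache_limit_bytes(setting, heap_bytes):
    """Translate a value such as '10GB' or '20%' into a number of bytes."""
    text = setting.strip().lower()
    if text.endswith("%"):
        # A percentage is relative to the memory given to Elasticsearch.
        return int(heap_bytes * float(text[:-1]) / 100)
    # Try the two letter suffixes before the bare 'b' suffix.
    for suffix in sorted(UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            return int(float(text[:-len(suffix)]) * UNITS[suffix])
    raise ValueError("unsupported size: %r" % setting)


heap = 16 * 1024 ** 3  # hypothetical 16 GB heap
print(cache_limit_bytes("10GB", heap))
print(cache_limit_bytes("25%", heap))
```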

Circuit breakers

The nice thing about Elasticsearch is that it allows us to achieve a similar thing in multiple ways and we have the same situation when it comes to field data and limiting the memory usage. Elasticsearch allows us to use a functionality called circuit breakers, which can estimate how much memory a request or a query will use, and if it is above a defined threshold, it won’t be executed at all, resulting in no memory usage and an exception thrown. This is very nice when we don’t want to limit the size of the field data cache but we also don’t want a single query to cause memory issues and make the cluster unstable. There are two main circuit breakers: the field data circuit breaker and the request circuit breaker.

The first circuit breaker, the field data one, estimates the amount of memory that will need to be used to load data to the field data cache for a given query. We can configure the limit by using the indices.breaker.fielddata.limit property, which is by default set to 60%, which means that a field data cache for a single query can’t use more than 60 percent of the memory given to Elasticsearch.

The second circuit breaker, the request one, estimates the memory used by per request data structures and prevents them from using more than the amount specified by the indices.breaker.request.limit property. By default, the mentioned property is set to 40%, which means that single request data structures, such as the ones used for aggregation calculation, can’t use more than 40% of the memory given to Elasticsearch.

Finally, there is one more circuit breaker that is defined by the indices.breaker.total.limit property (by default set to 70%). This circuit breaker defines the total amount of memory that can be used by both the per request data structures and field data.

Remember that the settings for circuit breakers are dynamic and can be updated using the cluster update settings API.
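The relation between the three limits can be checked with a small Python model. This is a simplified reading of the defaults quoted above (60%, 40%, and 70% of the heap), not the accounting Elasticsearch performs internally; the function name and the sample estimates are assumptions made for the example.

```python
GIB = 1024 ** 3


def tripped_breakers(heap_bytes, fielddata_bytes, request_bytes,
                     fielddata_limit_pct=60, request_limit_pct=40,
                     total_limit_pct=70):
    """Return the names of the breakers that the given estimates would trip."""
    def over(used, pct):
        # Integer arithmetic avoids rounding surprises near the limit.
        return used * 100 > pct * heap_bytes

    tripped = []
    if over(fielddata_bytes, fielddata_limit_pct):
        tripped.append("fielddata")
    if over(request_bytes, request_limit_pct):
        tripped.append("request")
    if over(fielddata_bytes + request_bytes, total_limit_pct):
        tripped.append("total")
    return tripped


heap = 10 * GIB  # hypothetical heap size
# Each estimate is below its own limit, but together they exceed 70%.
print(tripped_breakers(heap, 5 * GIB, 3 * GIB))
# Field data alone above 60% of the heap.
print(tripped_breakers(heap, 7 * GIB, 0))
```

The first call shows why the total limit exists: neither estimate is excessive on its own, yet the combination would leave too little memory for everything else.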

Field data and doc values

As we already discussed, instead of the field data cache, doc values can be used. Of course, this is only true for not analyzed fields and ones using numeric data types and not multivalued ones. This will save memory and should be faster than the field data cache during query time, at the cost of a slight indexing speed degradation (very small) and a slightly larger index. If you can use doc values, do that; it will help your Elasticsearch cluster to maintain stability and respond to queries quickly.

Shard request cache

The shard request cache is the first of the caches that operates on queries. It caches the aggregations and suggestions produced by the query but, at the time of writing this book, it was not caching query hits. When Elasticsearch executes the query, this cache can save the resource consuming aggregations for the query and speed up the subsequent queries by retrieving the aggregations or suggestions from memory.

Note

During the writing of this book, the shard request cache was only used when the size=0 parameter was set for the query. This means that only the total number of hits, aggregation results, and suggestions will be cached. Remember that when running queries with dates and using the now constant, the shard request cache won’t be used either.

The shard request cache, as its name says, caches the results of the queries on each shard, before they are returned to the node that aggregates the results. This can be very good when your aggregations are heavy, like the ones that do a lot of computation on the data returned by the query. If you run a lot of aggregations with your queries and the queries can be repeated, think about using the shard request cache as it should help you with query latency.

Enabling and configuring the shard request cache

The shard request cache is disabled by default, but can be easily enabled. To enable it, we should set the index.requests.cache.enable property to true when creating the index. For example, to enable the shard request cache for an index called new_library, we use the following command:

curl -XPUT 'localhost:9200/new_library' -d '{
  "settings": {
    "index.requests.cache.enable": true
  }
}'

One thing to remember is that the mentioned setting is not dynamically updatable. We need to include it in the index creation command or we can update it when the index is closed.

Themaximumsizeofthecacheisspecifiedusingtheindices.requests.cache.sizepropertyandissetto1%bydefault(whichmeans1%ofthetotalmemorygiventoElasticsearch).Wecanalsospecifyhowlongeachentryshouldbekeptbyusingtheindices.requests.cache.expireproperty,butitisnotsetbydefault.Also,thecacheisinvalidatedoncetheindexisrefreshed(duringindexsearcherreopening),whichmakesthesettinguselessmostofthetime.

NoteNotethatintheearlierversionsofElasticsearch,forexampleinthe1.xbranch,toenableordisablethiscache,theindex.cache.query.enablepropertywasused.Thismaybe

www.EBooksWorld.ir

Page 573: dl.ebooksworld.irdl.ebooksworld.ir/motoman/Packt.Elasticsearch.Server.3rd.Edition.w… · Table of Contents Elasticsearch Server Third Edition Credits About the Authors About the

importantwhenmigratingfromolderElasticsearchversions.

Per request shard request cache disabling

Elasticsearch allows us to control the shard request cache usage on a per request basis. If we have the mentioned cache enabled, we can still force the search engine to omit caching for particular requests. This is done by using the request_cache parameter. If set to true, the request will be cached and, if set to false, the request won’t be cached. This is especially useful when we want to cache our requests in general but omit caching for some queries that are rare and not used often. It is also wise not to cache requests that use non-deterministic scripts and time ranges.
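
To make the per request switch concrete, the following Python sketch builds a search URL that carries the request_cache parameter described above. The helper name search_url and the default host are assumptions made for illustration and are not part of any Elasticsearch client library. The size=0 parameter is included because, at the time the book was written, the cache was only used for such requests.

```python
from urllib.parse import urlencode


def search_url(index, request_cache, host="localhost:9200"):
    """Build a search URL that opts in or out of the shard request cache.

    search_url is a hypothetical helper; it only formats a string and
    does not contact Elasticsearch.
    """
    params = {
        # size=0: only hit counts, aggregations, and suggestions come back,
        # which is what the text says this cache stores.
        "size": 0,
        # The parameter is sent as the lowercase string "true" or "false".
        "request_cache": "true" if request_cache else "false",
    }
    return "http://{}/{}/_search?{}".format(host, index, urlencode(params))


# A rare, one-off query for which we want to skip caching.
print(search_url("new_library", request_cache=False))
```

The resulting URL can be passed to curl or to any HTTP client. Calling the helper with request_cache=True produces the same URL with request_cache=true.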

Shard request cache usage monitoring

If we don’t use any monitoring software that allows monitoring the caches usage, we can use the Elasticsearch API to check the metrics around the shard request cache. This can be done both at the indices level and at the nodes level.

To check the metrics for the shard request cache for all the indices, we should use the indices stats API and run the following command:

curl 'localhost:9200/_stats/request_cache?pretty'

To check the request cache metrics, but in per node view, we run the following command:

curl 'localhost:9200/_nodes/stats/indices/request_cache?pretty'

Node query cache

The node query cache is responsible for holding the results of queries for the whole node. Its size is defined using indices.queries.cache.size, defaulting to 10%, and it is shared across all the shards present on the node. We can set it either to a percentage of the heap memory given to Elasticsearch, like the default one, or to an explicit value, like 1024mb. One thing to remember about the cache is that its configuration is static; it can’t be updated dynamically and should be set in the elasticsearch.yml file. The node query cache uses the least recently used eviction policy, which means that, when full, it removes the data that was used least recently.

This cache is very useful when you run queries that are repetitive and heavy, such as the ones used to generate category pages or the main page in an e-commerce application.

Indexing buffers

The last cache we want to discuss is the indexing buffer that allows us to improve indexing throughput. The indexing buffer is divided between all the shards on the node and is used to store newly indexed documents. Once the cache fills up, Elasticsearch flushes the data from the cache to disk, creating a new Lucene segment in the index.

There are four static properties that allow us to configure the indexing buffer size. They need to be set in the elasticsearch.yml file and can’t be changed dynamically using the Settings API. These properties are:

indices.memory.index_buffer_size: This property defines the amount of memory used by a node for the indexing buffer. It accepts both a percentage value as well as an explicit value in bytes. It defaults to 10%, which means that 10% of the heap memory given to a node will be used as the indexing buffer.

indices.memory.min_index_buffer_size: This property defaults to 48mb and specifies the minimum memory that will be used by the indexing buffer. It is useful when indices.memory.index_buffer_size is defined as a percentage value, so that the indexing buffer is never smaller than the value defined by this property.

indices.memory.max_index_buffer_size: This property specifies the maximum memory that will be used by the indexing buffer. It is useful when indices.memory.index_buffer_size is defined as a percentage value, so that the indexing buffer never crosses a certain amount of memory usage.

indices.memory.min_shard_index_buffer_size: This property defaults to 4mb and sets the hard minimum limit of the indexing buffer that is given to each shard on a node. The indexing buffer for each shard will not be lower than the value set by this property.
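
The way the four properties interact can be sketched with a little arithmetic. The Python below is a simplified model, not Elasticsearch's actual allocation code: it assumes the percentage is applied to the heap, clamped by the minimum and the optional maximum, and then split evenly between the shards. The function names and the even split are assumptions made for illustration.

```python
def node_index_buffer_mb(heap_mb, percent=10, min_mb=48, max_mb=None):
    """Resolve a percentage-based index_buffer_size to megabytes.

    Defaults mirror the values quoted in the text (10% and 48mb);
    max_mb=None models an unset max_index_buffer_size.
    """
    size = heap_mb * percent / 100
    size = max(size, min_mb)          # never below min_index_buffer_size
    if max_mb is not None:
        size = min(size, max_mb)      # never above max_index_buffer_size
    return size


def shard_index_buffer_mb(node_buffer_mb, shards, min_shard_mb=4):
    """Split the node buffer evenly (an assumption) between the shards,
    honouring the per-shard hard minimum of min_shard_index_buffer_size."""
    return max(node_buffer_mb / shards, min_shard_mb)


print(node_index_buffer_mb(2000))          # 200.0 (10% of a 2000 MB heap)
print(node_index_buffer_mb(256))           # 48    (the minimum applies)
print(shard_index_buffer_mb(200.0, 100))   # 4     (the per-shard floor applies)
```

The second call shows why the minimum matters on small heaps: 10% of 256 MB would be only 25.6 MB, so the 48 MB floor takes over.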

When it comes to indexing performance, if you need higher indexing throughput, consider setting the indexing buffer size to a value higher than the default size. It will allow Elasticsearch to flush the data to disk less often and create fewer segments. This will result in fewer merges, and thus fewer I/O and CPU intensive operations. Because of that, Elasticsearch will be able to use more resources for indexing purposes.

When caches should be avoided

The usual question that may be asked by users is whether they should really cache all their requests. The answer is obvious: of course, caches are not the tool for everyone. Using caching is not free; it requires memory and additional operations to put the data into the cache or get the data out of it.

What’s more, you should remember that Elasticsearch round robins queries between primary shards and replicas, so, if you have replicas, not every request after the first one will use the cache. Imagine that you have an index which has a single primary shard and two replicas. When the first request comes, it will hit a random shard, but the next request, even with the same query, will hit another shard, not the same one (unless routing is used). You should take this into consideration when using caches, because if your queries are not repeated, you may have them running longer because of a cache being used.
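
The effect of round robin on cache hits can be illustrated with a toy simulation. The Python sketch below is not how Elasticsearch routes requests internally. It assumes a strict rotation that starts at the first shard copy and gives each copy its own independent cache, which is enough to show why the first few identical requests all miss.

```python
from itertools import cycle


def simulate(queries, shard_copies=3):
    """Count cache hits when queries rotate over shard copies.

    shard_copies=3 models one primary shard plus two replicas, as in
    the example from the text. Returns a (hits, misses) tuple.
    """
    caches = [set() for _ in range(shard_copies)]   # one cache per copy
    hits = misses = 0
    for query, cache in zip(queries, cycle(caches)):
        if query in cache:
            hits += 1
        else:
            misses += 1
            cache.add(query)       # only this copy's cache is warmed
    return hits, misses


# Six identical requests: the first three warm one copy each,
# and only the next three are served from a cache.
print(simulate(["q"] * 6))   # (3, 3)
```

With a single shard copy the same six requests would give five hits, which is why the benefit of caching depends on both the number of replicas and how often each query repeats.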

So to answer the question of whether you should use caching or not, we would advise taking your data, taking your queries, and running performance tests using tools such as JMeter (http://jmeter.apache.org). This will let you see how your cluster behaves with real data under a test load and see if the queries are actually faster with or without the caches.

The update settings API

Elasticsearch lets us tune it by specifying the various parameters in the elasticsearch.yml file. But you should treat this file as the set of default values that can be changed at runtime using the Elasticsearch REST API. We can change both the per index settings and the cluster wide settings. However, you should remember that not all properties can be dynamically changed. If you try to alter such parameters, Elasticsearch will respond with a proper error.

The cluster settings API

In order to set one of the cluster properties, we need to use the HTTP PUT method and send a proper request to the _cluster/settings URI. However, we have two options: adding the changes as transient or persistent.

The first one, transient, will set the property only until the first restart. In order to do this, we send the following command:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "PROPERTY_NAME": "PROPERTY_VALUE"
  }
}'

As you can see, in the preceding command, we used the object named transient and we added our property definition there. This means that the property will be valid only until the restart. If we want our property settings to persist between restarts, instead of using the object named transient, we use the one named persistent.
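
The only difference between the two variants is the name of the top-level object in the request body. The short Python sketch below makes that explicit. The helper cluster_settings_body is a hypothetical name introduced here, and PROPERTY_NAME and PROPERTY_VALUE are the same placeholders used in the command above.

```python
import json


def cluster_settings_body(name, value, persistent=False):
    """Return the JSON body for a PUT request to _cluster/settings.

    persistent=False wraps the property in "transient" (lost on restart);
    persistent=True wraps it in "persistent" (kept across restarts).
    """
    scope = "persistent" if persistent else "transient"
    return json.dumps({scope: {name: value}})


print(cluster_settings_body("PROPERTY_NAME", "PROPERTY_VALUE"))
print(cluster_settings_body("PROPERTY_NAME", "PROPERTY_VALUE", persistent=True))
```

Either string can be used as the value of the -d option of the curl command shown earlier.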

At any moment, you can fetch these settings using the following command:

curl -XGET localhost:9200/_cluster/settings

The indices settings API

To change the indices related settings, Elasticsearch provides the /_settings endpoint for changing the parameters for all the indices and the /index_name/_settings endpoint for modifying the settings of a single index. In contrast to the cluster wide settings, all the changes done to indices using the API are always persistent and valid after Elasticsearch restarts. To change the settings for all the indices, we send the following command:

curl -XPUT 'localhost:9200/_settings' -d '{
  "index": {
    "PROPERTY_NAME": "PROPERTY_VALUE"
  }
}'

The current settings for all the indices can be listed using the following command:

curl -XGET localhost:9200/_settings

To set a property for a single index, we run the following command:

curl -XPUT 'localhost:9200/index_name/_settings' -d '{
  "index": {
    "PROPERTY_NAME": "PROPERTY_VALUE"
  }
}'

To get the settings for the library index, we run the following command:

curl -XGET localhost:9200/library/_settings

Summary

In the chapter we just finished, we learned a few very important things about Elasticsearch. First of all, we learned how we can configure the node discovery mechanism. In addition to that, we learned to control what happens after the cluster is initially formed using the recovery and gateway modules. We used dynamic and non-dynamic templates to handle our indices more easily, and we learned what type of caches Elasticsearch has and how to control them. Finally, we used the update settings API to update the various Elasticsearch configuration variables on an already live cluster.

In the next chapter, we will focus on cluster administration. We will start with learning how to back up our data and how to monitor the key cluster metrics. We’ll see the way to control cluster rebalancing and shard allocation, and we will use a human friendly Cat API that allows us to get varied information about the cluster. Finally, we will learn about warming up our indices and aliasing.

Chapter 10. Administrating Your Cluster

In the previous chapter, we focused on Elasticsearch nodes and cluster configuration. We started by discussing the node discovery process, what it is and how to configure it. We’ve discussed gateway and recovery modules and tuned them to match our needs. We’ve used templates and dynamic templates to manage data structure easily and learned how to install plugins to extend the functionalities of Elasticsearch. Finally, we’ve learned about the caches of Elasticsearch and how to update indices and cluster settings using a dedicated API. By the end of this chapter, you will have learned the following topics:

Backing up your indices in Elasticsearch
Monitoring your clusters
Controlling shards and rebalancing replicas
Controlling shards and allocating replicas
Using CAT API to learn about cluster state
Warming up
Aliasing

Elasticsearch time machine

A good piece of software is one that can manage exceptional situations such as hardware failure or human error. Even though a cluster of a few servers is less dependent on hardware problems, bad things can still happen. For example, let’s imagine that you need to restore your indices. One possible solution is to reindex all your data from a primary data store such as a SQL database. But what will you do if it takes too long or, even worse, the only data store is Elasticsearch? Before Elasticsearch 1.0, creating backups of indices was not easy. The procedure included stopping indexing, flushing the data to disk, shutting down the cluster, and, finally, copying the data to a backup device.

Fortunately, now we can take snapshots, and this section will show you how this functionality works.

Creating a snapshot repository

A snapshot keeps all the data related to the cluster from the time the snapshot creation starts and it includes information about the cluster state and indices. Before we create snapshots, at least the first one, a snapshot repository must be created. Each repository is recognized by its name and should define the following aspects:

name: A unique name of the repository; we will need it later.
type: The type of the repository. The possible values are fs (a repository on a shared file system) and url (a read-only repository available via URL).
settings: Additional information needed depending on the repository type.

Now, let’s create a file system repository. Before this, we have to make sure that the directory for our backups fulfils two requirements. The first is related to security. Every repository has to be placed in the path defined in the Elasticsearch configuration file as path.repo. For example, our elasticsearch.yml includes a line similar to the following one:

path.repo: ["/tmp/es_backup_folder", "/tmp/backup/es"]

The second requirement says that every node in the cluster should be able to access the directory we set for the repository.
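
Before registering a repository, it can be handy to check the first requirement locally. The following Python sketch tests whether a candidate location lies inside one of the directories listed in path.repo. It is a client-side approximation written for this example, not the check that Elasticsearch performs, and it compares paths purely lexically without resolving symbolic links or ".." segments.

```python
from pathlib import PurePosixPath


def is_under_repo_path(location, repo_paths):
    """True if location equals, or is nested below, an allowed path.

    Path components are compared, so "/tmp/es_backup_folder_other" is
    not mistaken for a child of "/tmp/es_backup_folder".
    """
    target = PurePosixPath(location)
    for allowed in map(PurePosixPath, repo_paths):
        if target == allowed or allowed in target.parents:
            return True
    return False


# The two directories from the elasticsearch.yml line shown above.
PATH_REPO = ["/tmp/es_backup_folder", "/tmp/backup/es"]

print(is_under_repo_path("/tmp/es_backup_folder/cluster1", PATH_REPO))  # True
print(is_under_repo_path("/var/backups/cluster1", PATH_REPO))           # False
```

Comparing path components rather than raw string prefixes is a deliberate choice here, because a plain startswith test would wrongly accept sibling directories that merely share a prefix.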

So now, let’s create a new file system repository by running the following command:

curl -XPUT localhost:9200/_snapshot/backup -d '{
  "type": "fs",
  "settings": {
    "location": "/tmp/es_backup_folder/cluster1"
  }
}'

The preceding command creates a repository named backup, which stores the backup files in the directory given by the location attribute. Elasticsearch responds with the following information:

{"acknowledged":true}

At the same time, es_backup_folder on the local file system is created, without any content yet.

Note: You can also set a relative path with the location parameter. In this case, Elasticsearch determines the absolute path by first getting the directory defined in path.repo.

As we said, the second repository type is url. It requires a url parameter instead of the location, which points to the address where the repository resides, for example, the HTTP address. As in the previous case, the address should be defined in the repositories.url.allowed_urls parameter in the Elasticsearch configuration. The parameter allows the use of wildcards in the address.

Note: file:// addresses are checked against the paths defined in the path.repo parameter.

You can also store snapshots in Amazon S3, HDFS, or Azure using the additional plugins available. To learn about these, please visit the following pages:

https://github.com/elastic/elasticsearch-cloud-aws#s3-repository
https://github.com/elastic/elasticsearch-hadoop/tree/master/repository-hdfs
https://github.com/elastic/elasticsearch-cloud-azure#azure-repository

Now that we have our first repository, we can see its definition using the following command:

curl -XGET localhost:9200/_snapshot/backup?pretty

We can also check all the repositories by running a command like the following:

curl -XGET localhost:9200/_snapshot/_all?pretty

Or simply, we can use this:

curl -XGET localhost:9200/_snapshot/?pretty

If you want to delete a snapshot repository, the standard DELETE command helps:

curl -XDELETE localhost:9200/_snapshot/backup?pretty

Creating snapshots

By default, Elasticsearch takes all the indices and cluster settings (except the transient ones) when creating snapshots. You can create any number of snapshots and each will hold information available right from the time when the snapshot was created. The snapshots are created in a smart way; only new information is copied. This means that Elasticsearch knows which segments are already stored in the repository and doesn’t have to save them again.

To create a new snapshot, we need to choose a unique name and use the following command:

curl -XPUT 'localhost:9200/_snapshot/backup/bckp1'

The preceding command defines a new snapshot named bckp1 (you can only have one snapshot with a given name; Elasticsearch will check its uniqueness) and data is stored in the previously defined backup repository. The command returns an immediate response, which looks as follows:

{"accepted":true}

The preceding response means that the process of snapshotting has started and continues in the background. If you would like the response to be returned only when the actual snapshot is created, you can add the wait_for_completion=true parameter as shown in the following example:

curl -XPUT 'localhost:9200/_snapshot/backup/bckp2?wait_for_completion=true&pretty'

The response to the preceding command shows the status of a created snapshot:

{
  "snapshot": {
    "snapshot": "bckp2",
    "version_id": 2000099,
    "version": "2.2.0",
    "indices": ["news"],
    "state": "SUCCESS",
    "start_time": "2016-01-07T21:21:43.740Z",
    "start_time_in_millis": 1446931303740,
    "end_time": "2016-01-07T21:21:44.750Z",
    "end_time_in_millis": 1446931304750,
    "duration_in_millis": 1010,
    "failures": [],
    "shards": {
      "total": 5,
      "failed": 0,
      "successful": 5
    }
  }
}

As you can see, Elasticsearch presents information about the time taken by the snapshotting process, its status, and the indices affected.

Additional parameters

The snapshot command also accepts the following additional parameters:

indices: The names of the indices of which we want to take snapshots.

ignore_unavailable: When this is set to false (the default), Elasticsearch will return an error if any index listed using the indices parameter is missing. When set to true, Elasticsearch will just ignore the missing indices during backup.

include_global_state: When this is set to true (the default), the cluster state is also written to the snapshot (except for the transient settings).

partial: The snapshot operation success depends on the availability of all the shards. If any of the shards is not available, the snapshot operation will fail. Setting partial to true causes Elasticsearch to save only the available shards and omit the lost ones.

An example of using additional parameters can look as follows:

curl -XPUT 'localhost:9200/_snapshot/backup/bckp3?wait_for_completion=true&pretty' -d '{
  "indices": "b*",
  "include_global_state": "false"
}'

Restoring a snapshot

Now that we have our snapshots done, we will also learn how to restore data from a given snapshot. As we said earlier, a snapshot can be addressed by its name. We can list all the snapshots using the following command:

curl -XGET 'localhost:9200/_snapshot/backup/_all?pretty'

The response returned by Elasticsearch to the preceding command shows the list of all available backups. Every list item is similar to the following:

{
  "snapshot": {
    "snapshot": "bckp2",
    "version_id": 2000099,
    "version": "2.2.0",
    "indices": ["news"],
    "state": "SUCCESS",
    "start_time": "2016-01-07T21:21:43.740Z",
    "start_time_in_millis": 1446931303740,
    "end_time": "2016-01-07T21:21:44.750Z",
    "end_time_in_millis": 1446931304750,
    "duration_in_millis": 1010,
    "failures": [],
    "shards": {
      "total": 5,
      "failed": 0,
      "successful": 5
    }
  }
}

The repository we created earlier is called backup. To restore a snapshot named bckp1 from our snapshot repository, run the following command:

curl -XPOST 'localhost:9200/_snapshot/backup/bckp1/_restore'

During the execution of this command, Elasticsearch takes the indices defined in the snapshot and creates them with the data from the snapshot. However, if the index already exists and is not closed, the command will fail. In this case, you may find it convenient to only restore certain indices, for example:

curl -XPOST 'localhost:9200/_snapshot/backup/bckp1/_restore?pretty' -d '{
  "indices": "c*"
}'

The preceding command restores only the indices that begin with the letter c. The other available parameters are as follows:

ignore_unavailable: When set to false (the default behavior), this parameter causes Elasticsearch to fail the restore process if any of the expected indices is not available.

include_global_state: When set to true, this parameter causes Elasticsearch to restore the global state included in the snapshot, which is also the default behavior.

rename_pattern: This parameter allows the renaming of the index during a restore operation. Thanks to this, the restored index will have a different name. The value of this parameter is a regular expression that defines the source index name. If a pattern matches the name of the index, name substitution will occur. In the pattern, you should use groups delimited by parentheses, which can then be referenced in the rename_replacement parameter.

rename_replacement: This parameter, along with rename_pattern, defines the target index name. Using the dollar sign and a number, you can recall the appropriate group from rename_pattern.

For example, with rename_pattern=products_(.*), only the indices with names that begin with products_ will be renamed; the other indices selected for the restore keep their original names. The rest of the index name is captured by the group and used during replacement. rename_pattern=products_(.*) together with rename_replacement=items_$1 causes the products_cars index to be restored to an index called items_cars.
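
The substitution can be tried out locally before running a restore. The Python sketch below approximates it with the standard re module. Elasticsearch itself runs on Java, whose regular expressions refer to groups as $1, so the helper first translates those references into Python's syntax. The function name restored_name is an assumption made for this example, and corner cases where the two regex dialects differ are not covered.

```python
import re


def restored_name(index, rename_pattern, rename_replacement):
    """Approximate the restore-time rename with Python's re module."""
    # Translate Java-style $1 group references into Python's \g<1> form.
    replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    # A name the pattern does not match is returned unchanged.
    return re.sub(rename_pattern, replacement, index)


print(restored_name("products_cars", r"products_(.*)", "items_$1"))  # items_cars
print(restored_name("library", r"products_(.*)", "items_$1"))        # library
```

Checking a handful of real index names this way is a cheap safeguard against a pattern that matches more, or less, than intended.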

Cleaning up – deleting old snapshots

Elasticsearch leaves snapshot repository management up to you. Currently, there is no automatic clean-up process. But don’t worry; this is simple. For example, let’s remove our previously taken snapshot:

curl -XDELETE 'localhost:9200/_snapshot/backup/bckp1?pretty'

And that’s all. The command causes the snapshot named bckp1 from the backup repository to be deleted.

Monitoring your cluster’s state and health

Monitoring is essential when it comes to handling your cluster and ensuring it is in a healthy state. It allows administrators and developers to detect possible problems and prevent them before they occur or to act as soon as they start showing. In the worst case, monitoring allows us to do a postmortem analysis of what happened to the application, in this case, our Elasticsearch cluster and each of the nodes.

Elasticsearch provides very detailed information that allows us to check and monitor our nodes or the cluster as a whole. This includes statistics and information about the servers, nodes, indices, and shards. Of course, we are also able to get information about the entire cluster state. Before we get into the details about the mentioned API, please remember that the API is complex and we are only describing the basics. We will try to show you where to start so you’ll be able to know what to look for when you need very detailed information.

Cluster health API

One of the most basic APIs is the cluster health API, which allows us to get information about the entire cluster state with a single HTTP command. For example, let’s run the following command:

curl -XGET 'localhost:9200/_cluster/health?pretty'

A sample response returned by Elasticsearch for the preceding command looks as follows:

{
  "cluster_name": "elasticsearch",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 11,
  "active_shards": 11,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 11,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 50.0
}

The most important information is about the status of the cluster. In our example, we see that the cluster is in yellow status. This means that all the primary shards have been allocated properly, but the replicas were not (because of a single node in the cluster, but that doesn’t matter for now).

Of course, apart from the cluster name and status, we can see whether the request timed out, how many nodes there are, how many data nodes, primary shards, initializing shards, unassigned ones, and so on.

Let’s stop here and talk about the cluster and when the cluster, as a whole, is fully operational. The cluster is fully operational when Elasticsearch is able to allocate all the shards and replicas according to the configuration. This is when the cluster is in the green state. The yellow state means that we are ready to handle requests because the primary shards are allocated, but some (or all) replicas are not. The last state, the red one, means that at least one primary shard was not allocated and because of this, the cluster is not ready yet. That means that the queries may return errors or incomplete results.
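
The three states reduce to a simple decision rule, sketched below in Python. This is only a restatement of the description above and not Elasticsearch's own implementation. The function name and its two arguments are assumptions made for illustration, since the health response reports a single unassigned_shards count rather than separate numbers for primaries and replicas.

```python
def cluster_status(unassigned_primaries, unassigned_replicas):
    """Map unassigned shard counts to the health colour described above."""
    if unassigned_primaries > 0:
        return "red"       # at least one primary shard is not allocated
    if unassigned_replicas > 0:
        return "yellow"    # primaries are allocated, some replicas are not
    return "green"         # every primary and replica is allocated


# The single-node sample response: all primaries active, 11 replicas unassigned.
print(cluster_status(0, 11))   # yellow
```

Note that red takes precedence: a cluster with a missing primary is red no matter how many replicas are allocated.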

The preceding command can also be executed to check the health state of certain indices. For example, if we would like to check the health of the library and map indices, we would run the following command:

curl -XGET 'localhost:9200/_cluster/health/library,map/?pretty'

Controlling information details

Elasticsearch allows us to specify a special level parameter, which can take the value of cluster (default), indices, or shards. This allows us to control the details of information returned by the health API. We’ve already seen the default behavior. When setting the level parameter to indices, apart from the cluster information, we will also get per index health. Setting the mentioned parameter to shards tells Elasticsearch to return per shard information in addition to what we’ve seen in the example.

Additional parameters

In addition to the level parameter, we have a few additional parameters that can control the behavior of the health API.

The first of the mentioned parameters is timeout, which allows us to control how long, at most, the command execution will wait when one of the following parameters is used: wait_for_status, wait_for_nodes, wait_for_relocating_shards, and wait_for_active_shards. By default, it is set to 30s, which means that the health command will wait for a maximum of 30 seconds before returning the response.

The wait_for_status parameter allows us to tell Elasticsearch which health status the cluster should reach before the command returns. It can take the values of green, yellow, and red. For example, when set to green, the health API call will not return the results until the green status or the timeout is reached.

The wait_for_nodes parameter allows us to set the required number of nodes available to return the health command response (or until a defined timeout is reached). It can be set to an integer number like 3 or to a simple equation like >=3 (meaning greater than or equal to three nodes) or <=3 (meaning less than or equal to three nodes).

The wait_for_active_shards parameter means that Elasticsearch will wait for a specified number of active shards to be present before returning the response.

The last parameter is wait_for_relocating_shards, which is by default not specified. It allows us to tell Elasticsearch how many relocating shards it should wait for (or until the timeout is reached). Setting this parameter to 0 means that Elasticsearch should wait until all the shard relocations have finished.

An example usage of the health command with some of the mentioned parameters is as follows:

curl -XGET 'localhost:9200/_cluster/health?wait_for_status=green&wait_for_nodes=>=3&timeout=100s'
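
When such a URL is built in code rather than typed by hand, the > and = characters in a value like >=3 are worth percent-encoding so that no client or proxy misreads them. The Python sketch below does this with the standard library. The helper name health_url and the default host are assumptions made for this example, and encoding the value is a defensive choice rather than something the text requires.

```python
from urllib.parse import urlencode


def health_url(host="localhost:9200", **params):
    """Build a cluster health URL with percent-encoded query parameters."""
    return "http://{}/_cluster/health?{}".format(host, urlencode(params))


# The same parameters as in the curl example above.
print(health_url(wait_for_status="green", wait_for_nodes=">=3", timeout="100s"))
```

The value >=3 is emitted as %3E%3D3, which the server decodes back to the original expression.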

Indices stats API

An Elasticsearch index is the place where our data lives and it is a crucial part for most deployments. With the use of the indices stats API available using the _stats endpoint, we can get a lot of information about the indices living inside our cluster. Of course, as with most of the APIs in Elasticsearch, we can send a command to get the information about all the indices (using the pure _stats endpoint), about one particular index (for example library/_stats) or several indices at the same time (for example library,map/_stats). For example, to check the statistics for the map and library indices we’ve used in the book, we could run the following command:

curl -XGET 'localhost:9200/library,map/_stats?pretty'

The response to the preceding command has more than 700 lines, so we only describe its structure, omitting the response itself. Apart from the information about the response status and the response time, we can see three objects named primaries, total (in the _all object), and indices. The indices object contains information about the library and map indices. The primaries object contains information about the primary shards allocated to the current node, and the total object contains information about all the shards including replicas. All these objects can contain objects describing a particular statistic such as the following: docs, store, indexing, get, search, merges, refresh, flush, warmer, query_cache, fielddata, percolate, completion, segments, translog, suggest, request_cache, and recovery.

We can limit the amount of information that we get from the indices stats API by providing the type of data we are interested in using the names of the statistics mentioned previously. For example, if we want to get information about indexing and searching, we can run the following command:

curl -XGET 'localhost:9200/library,map/_stats/indexing,search?pretty'

Let’s discuss the information stored in those objects.

Docs

The docs section of the response shows information about indexed documents. For example, it could look as follows:

"docs": {
  "count": 4,
  "deleted": 0
}

The main information is the count, indicating the number of documents in the described index. When we delete documents from the index, Elasticsearch doesn’t remove these documents immediately and only marks them as deleted. Documents are physically deleted during the segment merge process. The number of documents marked as deleted is presented by the deleted attribute and should be 0 right after the merge.

Store

The next statistic, the store one, provides information regarding storage. For example, such a section could look as follows:

"store": {
  "size_in_bytes": 6003,
  "throttle_time_in_millis": 0
}

The main information is about the index (or indices) size. We can also look at throttling statistics. This information is useful when the system has problems with the I/O performance and has configured limits on an internal operation during segment merging.

Indexing, get, and search

The indexing, get, and search sections of the response provide information about data manipulation: indexing (including delete operations), real-time get, and searching. Let’s look at the following example returned by Elasticsearch:

"indexing": {
  "index_total": 0,
  "index_time_in_millis": 0,
  "index_current": 0,
  "delete_total": 0,
  "delete_time_in_millis": 0,
  "delete_current": 0,
  "noop_update_total": 0,
  "is_throttled": false,
  "throttle_time_in_millis": 0
},
"get": {
  "total": 0,
  "time_in_millis": 0,
  "exists_total": 0,
  "exists_time_in_millis": 0,
  "missing_total": 0,
  "missing_time_in_millis": 0,
  "current": 0
},
"search": {
  "open_contexts": 0,
  "query_total": 0,
  "query_time_in_millis": 0,
  "query_current": 0,
  "fetch_total": 0,
  "fetch_time_in_millis": 0,
  "fetch_current": 0,
  "scroll_total": 0,
  "scroll_time_in_millis": 0,
  "scroll_current": 0
}

As you can see, all of these statistics have similar structures. We can read the total time spent in various request types (in milliseconds) and the number of requests (which, together with the total time, allows us to calculate the average time of a single query). In the case of get requests, valuable information is how many fetches were unsuccessful (missing documents); an indexing request has information about throttling, and search includes information regarding scrolling.
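
The average mentioned above is simply the total time divided by the number of operations. The Python sketch below computes it from a section of the stats response. The function name is an assumption made for this example, and the counter values are hypothetical because the sample response in the text contains only zeros.

```python
def average_millis(stats, total_key, time_key):
    """Average time per operation, or None when nothing has run yet."""
    total = stats[total_key]
    if total == 0:
        return None            # avoid dividing by zero on an idle index
    return stats[time_key] / total


# Hypothetical counters shaped like the "search" section shown earlier.
search = {"query_total": 4, "query_time_in_millis": 10}
print(average_millis(search, "query_total", "query_time_in_millis"))  # 2.5
```

Because the counters are cumulative, the result is an average over the whole period covered by the statistics. Comparing two samples taken some time apart gives a more current figure.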

Additional information

In addition to the previously described sections, Elasticsearch provides the following information:

merges: This section contains information about Lucene segment merges
refresh: This section contains information about the refresh operation
flush: This section contains information about flushes
warmer: This section contains information about warmers and for how long they were executed
query_cache: This section contains query cache statistics
fielddata: This section contains field data cache statistics
percolate: This section contains information about the percolator usage
completion: This section contains information about the completion suggester
segments: This section contains information about Lucene segments
translog: This section contains information about the transaction log count and size
suggest: This section contains suggester-related statistics
request_cache: This section contains shard request cache statistics
recovery: This section contains shard recovery information


Nodes info API

The nodes info API provides us with information about the nodes in the cluster. To get information from this API, we need to send the request to the _nodes REST endpoint. The simplest command to retrieve node-related information from Elasticsearch would be as follows:

curl -XGET 'localhost:9200/_nodes?pretty'

This API can be used to fetch information about particular nodes or a single node using the following:

Node name: If we would like to get information about the node named Pulse, we could run a command to the following REST endpoint: _nodes/Pulse
Node identifier: If we would like to get information about the node with an identifier equal to ny4hftjNQtuKMyEvpUdQWg, we could run a command to the following REST endpoint: _nodes/ny4hftjNQtuKMyEvpUdQWg
IP address: We can use IP addresses to get information about the nodes. For example, if we would like to get information about the node with an IP address equal to 192.168.1.103, we could run a command to the following REST endpoint: _nodes/192.168.1.103
Parameters from the Elasticsearch configuration: If we would like to get information about all the nodes with the node.rack property set to 2, we could run a command to the following REST endpoint: /_nodes/rack:2

This API also allows us to get information about several nodes at once using these:

Patterns, for example: _nodes/192.168.1.* or _nodes/P*
Nodes enumeration, for example: _nodes/Pulse,Slab
Both patterns and enumerations, for example: /_nodes/P*,S*

Returned information

By default, the nodes API will return extensive information about each node along with the name, identifier, and addresses. This extensive information includes the following:

settings: The Elasticsearch configuration
os: Information about the server such as processor, RAM, and swap space
process: Process identifier and refresh interval
jvm: Information about Java Virtual Machine such as memory limits, memory pools, and garbage collectors
thread_pool: The configuration of thread pools for various operations
transport: Listening addresses for the transport protocol
http: Information about listening addresses for an HTTP-based API
plugins: Information about the plugins installed by the user
modules: Information about the built-in plugins

An example usage of this API can be illustrated by the following command:

curl 'localhost:9200/_nodes/Pulse/os,jvm,plugins'


The preceding command will return the basic information about the node named Pulse and, in addition to this, it will include the operating system information, Java virtual machine information, and plugin-related information.
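The way node selectors and information sections combine into an endpoint can be sketched as a small path builder. The function name nodes_info_path is hypothetical, and the sketch only covers the URL shapes shown in this section.

```python
def nodes_info_path(nodes=None, metrics=None):
    """Build a nodes info path such as /_nodes/Pulse/os,jvm,plugins."""
    if metrics and not nodes:
        # Assumption: this sketch only handles the forms shown in the text.
        raise ValueError("this sketch expects node selectors when metrics are given")
    parts = ["_nodes"]
    if nodes:
        parts.append(",".join(nodes))    # names, identifiers, IPs, or patterns
    if metrics:
        parts.append(",".join(metrics))  # for example os, jvm, plugins
    return "/" + "/".join(parts)

print(nodes_info_path(["Pulse"], ["os", "jvm", "plugins"]))
print(nodes_info_path(["P*", "S*"]))
```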


Nodes stats API

The nodes stats API is similar to the nodes info API described in the preceding section. The main difference is that the previous API provided information about the environment in which the node is running, while the one we are currently discussing tells us about what happened with the cluster during its work. To use the nodes stats API, you need to send a command to the /_nodes/stats REST endpoint. However, similar to the nodes info API, we can also retrieve information about specific nodes (for example: _nodes/Pulse/stats).

The simplest command to retrieve node-related information from Elasticsearch would be as follows:

curl -XGET 'localhost:9200/_nodes/stats?pretty'

By default, Elasticsearch returns all the available statistics but we can limit the ones we are interested in. The available options are as follows:

indices: Information about the indices including size, document count, indexing-related statistics, search and get time, caches, segment merges, and so on
os: Operating system related information such as free disk space, memory, swap usage, and so on
process: Memory, CPU, and file handler usage related to the Elasticsearch process
jvm: Java virtual machine memory and garbage collector statistics
transport: Information about data sent and received by the transport module
http: Information about HTTP connections
fs: Information about available disk space and I/O operations statistics
thread_pool: Information about the state of the threads assigned to various operations
breakers: Information about circuit breakers
script: Scripting engine related information

An example usage of this API can be illustrated by the following command:

curl 'localhost:9200/_nodes/Pulse/stats/os,jvm,breaker'


Cluster state API

Another API provided by Elasticsearch is the cluster state API. As its name suggests, it allows us to get information about the entire cluster (we can also limit the returned information to a local node by adding the local=true parameter to the request). The basic command used to get all the information returned by this API looks as follows:

curl -XGET 'localhost:9200/_cluster/state?pretty'

We can also limit the provided information to the given metrics in comma-separated form, specified after the _cluster/state part of the REST call. For example:

curl -XGET 'localhost:9200/_cluster/state/version,nodes?pretty'

We can also limit the information to the given metrics and indices. For example, if we would like to get the metadata for the library index, we could run the following command:

curl -XGET 'localhost:9200/_cluster/state/metadata/library?pretty'

The following metrics are allowed to be used:

version: This returns information about the cluster state version.
master_node: This returns information about the elected master node.
nodes: This returns nodes information.
routing_table: This returns routing-related information.
metadata: This returns metadata-related information. When retrieving the metadata metric, we can also include an additional parameter such as index_templates=true, which will result in including the defined index templates.
blocks: This returns the blocks part of the response.


Cluster stats API

The cluster stats API allows us to get statistics about the indices and nodes from the cluster-wide perspective. To use this API, we need to run the GET request to the /_cluster/stats REST endpoint, for example:

curl -XGET 'localhost:9200/_cluster/stats?pretty'

The response size depends on the number of shards, indices, and nodes in the cluster. It will include basic indices information such as shards, their state, recovery information, caches information, and node-related information.


Pending tasks API

The pending tasks API is one of the APIs that help us see what Elasticsearch is doing; it allows us to check which tasks are waiting to be executed. To retrieve this information, we need to send a request to the /_cluster/pending_tasks REST endpoint. In the response, we will see an array of tasks with information about them, such as task priority and time in queue.


Indices recovery API

The recovery API gives us insight about the recovery status of the shards that are building indices in our cluster (learn more about recovery in The gateway and recovery modules section of Chapter 9, Elasticsearch Cluster in Detail).

The simplest command that would return the information about the recovery of all the shards in the cluster would look as follows:

curl -XGET 'http://localhost:9200/_recovery?pretty'

We can also get information about recovery for particular indices, such as the library index for example:

curl -XGET 'http://localhost:9200/library/_recovery?pretty'

The response returned by Elasticsearch is divided by indices and shards. A response for a single shard could look as follows:

{

"id":2,

"type":"STORE",

"stage":"DONE",

"primary":true,

"start_time_in_millis":1446132761730,

"stop_time_in_millis":1446132761734,

"total_time_in_millis":4,

"source":{

"id":"DboTibRlT1KJSQYnDPxwZQ",

"host":"127.0.0.1",

"transport_address":"127.0.0.1:9300",

"ip":"127.0.0.1",

"name":"Plague"

},

"target":{

"id":"DboTibRlT1KJSQYnDPxwZQ",

"host":"127.0.0.1",

"transport_address":"127.0.0.1:9300",

"ip":"127.0.0.1",

"name":"Plague"

},

"index":{

"size":{

"total_in_bytes":156,

"reused_in_bytes":156,

"recovered_in_bytes":0,

"percent":"100.0%"

},

"files":{

"total":1,

"reused":1,

"recovered":0,

"percent":"100.0%"

},

"total_time_in_millis":0,


"source_throttle_time_in_millis":0,

"target_throttle_time_in_millis":0

},

"translog":{

"recovered":0,

"total":-1,

"percent":"-1.0%",

"total_on_start":-1,

"total_time_in_millis":3

},

"verify_index":{

"check_index_time_in_millis":0,

"total_time_in_millis":0

}

}

In the preceding response, we can see information about the shard identifier, the stage of recovery, information whether the shard is a primary or a replica, the timestamps of the start and end of recovery, and the total time the recovery process took. We can see the source node, target node, and information about the shard's physical statistics, such as size, number of files, transaction log-related statistics, and index verification time.

It is worth knowing the information about the stages of recovery and types. When it comes to the types of recovery (the type attribute in the response), we can expect the following: the STORE, SNAPSHOT, REPLICA, and RELOCATING values. When it comes to the stage of recovery (the stage attribute in the response), we can expect values such as INIT (recovery has not started), INDEX (Elasticsearch copies metadata information and data from source to destination), START (Elasticsearch is opening the shard for use), FINALIZE (final stage, which cleans up garbage), and DONE (recovery has ended).

We can limit the response returned by the indices recovery API to only the shards that are currently in active recovery by including the active_only=true parameter in the request. Finally, we can request more detailed information by adding the detailed=true parameter in the API call.
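To show how the fields described above could be consumed, here is a minimal Python sketch that condenses one shard entry into a short summary. The function name summarize_recovery is hypothetical; the sample dictionary is a trimmed copy of the response shown earlier.

```python
def summarize_recovery(shard):
    """Pick the most telling fields out of a single shard recovery entry."""
    size = shard["index"]["size"]
    return {
        "shard": shard["id"],
        "type": shard["type"],
        "stage": shard["stage"],
        "primary": shard["primary"],
        "done": shard["stage"] == "DONE",
        "reused_bytes": size["reused_in_bytes"],
        "total_bytes": size["total_in_bytes"],
        "millis": shard["total_time_in_millis"],
    }

# Trimmed copy of the single-shard response from this section.
sample = {
    "id": 2, "type": "STORE", "stage": "DONE", "primary": True,
    "total_time_in_millis": 4,
    "index": {"size": {"total_in_bytes": 156, "reused_in_bytes": 156,
                       "recovered_in_bytes": 0}},
}
summary = summarize_recovery(sample)
print(summary)
```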


Indices shard stores API

The indices shard stores API gives us information about the store for the shards of our indices. We use this API by running a simple command to the /_shard_stores REST endpoint, optionally providing comma-separated index names.

For example, to get information about all the indices, we would run the following command:

curl -XGET 'http://localhost:9200/_shard_stores?pretty'

We can also get information about particular indices, such as the library and map ones:

curl -XGET 'http://localhost:9200/library,map/_shard_stores?pretty'

The response returned by Elasticsearch contains information about the store for each shard. For example, this is what Elasticsearch returned for one of the shards of the library index:

"0":{

"stores":[{

"DboTibRlT1KJSQYnDPxwZQ":{

"name":"Plague",

"transport_address":"127.0.0.1:9300",

"attributes":{}

},

"version":6,

"allocation":"primary"

}]

}

We can see information about the node in the stores array. Each entry contains node-related information (the node where the shard is physically located), the version of the store copy, and the allocation, which can take the values of primary (for primary shards), replica (for replicas), and unused (for unassigned shards).
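Because each entry of the stores array mixes a node identifier key with the version and allocation keys, reading it takes a small amount of care. The following Python sketch, with a hypothetical describe_store helper, separates the two using the entry shown above.

```python
def describe_store(store):
    """Split one entry of the "stores" array into node and copy information."""
    meta_keys = {"version", "allocation"}
    # Every other key is assumed to be the node identifier.
    node_id = next(key for key in store if key not in meta_keys)
    return {
        "node_id": node_id,
        "node_name": store[node_id]["name"],
        "version": store["version"],
        "allocation": store["allocation"],
    }

# The single store entry from the response in this section.
entry = {
    "DboTibRlT1KJSQYnDPxwZQ": {
        "name": "Plague",
        "transport_address": "127.0.0.1:9300",
        "attributes": {},
    },
    "version": 6,
    "allocation": "primary",
}
info = describe_store(entry)
print(info)
```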


Indices segments API

The last API we want to mention is the Lucene segments API, which is available through the /_segments endpoint. We can run it for the entire cluster, for example like this:

curl -XGET 'localhost:9200/_segments?pretty'

We can also run the command for individual indices. For example, if we would like to get segment-related information for the map and library indices, we would use the following command:

curl -XGET 'localhost:9200/library,map/_segments?pretty'

This API provides information about shards, their placements, and information about segments connected with the physical index managed by the Apache Lucene library.


Controlling the shard and replica allocation

The indices that live inside your Elasticsearch cluster can be built from many shards and each shard can have many replicas. The ability to divide a single index into multiple shards gives us the possibility of dividing the data into multiple physical instances. The reasons why we want to do this may be different. We may want to parallelize indexing to get more throughput, or we may want to have smaller shards so that our queries are faster. Of course, we may also have too many documents to fit them on a single machine and we may want to divide the index into shards because of this. With replicas, we can parallelize the query load by having multiple physical copies of each shard. We can say that, using shards and replicas, we can scale out Elasticsearch. However, Elasticsearch has to figure out where in the cluster it should place shards and replicas. It needs to figure out on which server or node each shard or replica should be placed.


Explicitly controlling allocation

One of the most common use cases for explicitly controlling shard and replica allocation in Elasticsearch is time-based data, such as logs. Each log event has a timestamp associated with it; however, the amount of logs in most organizations is just enormous. The thing is that you need a lot of processing power to index them, but you don't usually search historical data. Of course, you may want to do that, but it will be done less frequently than the queries for the most recent data.

Because of this, we can divide the cluster into two so-called tiers: the cold and the hot tier. The hot tier contains more powerful nodes, ones that have very fast disks, lots of CPU processing power, and memory. These nodes will handle both a lot of indexing as well as queries for recent data. The cold tier, on the other hand, will contain nodes that have very large disks, but are not very fast. We won't be indexing into the cold tier; we will only store our historical indices here and search them from time to time. With the default Elasticsearch behavior, we can't be sure where the shards and replicas will be placed, but luckily Elasticsearch allows us to control this.

Note

The main assumption when it comes to time series data is that once they are indexed, they are not being updated. This is true for log indexing use cases and we assume we create an Elasticsearch deployment for such a use case.

The idea is to create the index that holds today's data on the hot nodes and, when we stop indexing into it (when another day starts), to update the index settings so that it is moved to the tier called cold. Let's now see how we can do this.

Specifying node parameters

So let's divide our cluster into two tiers. We say tiers, but they can be given any name you want; we just like the term "tier" and it is commonly used. We assume that we have six nodes. We want our more powerful nodes numbered 1 and 2 to be placed in the tier called hot and the nodes numbered 3, 4, 5, and 6, which are smaller in terms of CPU and memory, but very large in terms of disk space, to be placed in a tier called cold.

Configuration

To configure this, we add the following property to the elasticsearch.yml configuration file on nodes 1 and 2 (the ones that are more powerful):

node.tier: hot

Of course, we will add a similar property to the elasticsearch.yml configuration file on nodes 3, 4, 5, and 6 (the less powerful ones):

node.tier: cold

Index creation

Now let's create our daily index for today's data, one called logs_2015-12-10. As we said earlier, we want this to be placed on the nodes in the hot tier. We do this by running the following command:

curl -XPUT 'http://localhost:9200/logs_2015-12-10' -d '{

"settings":{

"index":{

"routing.allocation.include.tier":"hot"

}

}

}'

The preceding command will result in the creation of the logs_2015-12-10 index and specification of the index.routing.allocation.include.tier property to it. We set this property to the hot value, which means that we want to place the logs_2015-12-10 index on the nodes that have the node.tier property set to hot.

Now, when the day ends and we need to create a new index, we again put it on the hot nodes. We do this by running the following command:

curl -XPUT 'http://localhost:9200/logs_2015-12-11' -d '{

"settings":{

"index":{

"routing.allocation.include.tier":"hot"

}

}

}'

Finally, we need to tell Elasticsearch to move the index holding the data for the previous day to the cold tier. We do this by updating the index settings and setting the index.routing.allocation.include.tier property to cold. This is done using the following command:

curl -XPUT 'http://localhost:9200/logs_2015-12-10/_settings' -d '{

"index.routing.allocation.include.tier":"cold"

}'

After running the preceding command, Elasticsearch will start relocating the index called logs_2015-12-10 to the nodes that have the node.tier property set to cold in the elasticsearch.yml file without any manual work needed from us.
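The daily routine described above (create today's index on the hot tier, move yesterday's index to the cold tier) can be sketched as a small Python helper that only prepares the requests. The function names are hypothetical, nothing is sent to a cluster, and the logs_ prefix and tier names are the ones used in this section.

```python
import json
from datetime import date, timedelta

def daily_index_name(day, prefix="logs_"):
    """Index name for a given day, for example logs_2015-12-10."""
    return prefix + day.isoformat()

def rotation_requests(today):
    """Return (method, path, body) tuples for the daily hot/cold rotation."""
    yesterday = today - timedelta(days=1)
    create = ("PUT", "/" + daily_index_name(today),
              {"settings": {"index": {"routing.allocation.include.tier": "hot"}}})
    move = ("PUT", "/" + daily_index_name(yesterday) + "/_settings",
            {"index.routing.allocation.include.tier": "cold"})
    return [create, move]

requests_to_send = rotation_requests(date(2015, 12, 11))
for method, path, body in requests_to_send:
    print(method, path, json.dumps(body))
```

Such a helper could be run once a day by a scheduler, with the resulting requests sent using curl or any HTTP client.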

Excluding nodes from allocation

In the same manner as we specified on which nodes the index should be placed, we can also exclude nodes from index allocation. Referring to the previously shown example, if we want the index called logs_2015-12-10 to not be placed on the nodes with the node.tier property set to cold, we would run the following command:

curl -XPUT 'localhost:9200/logs_2015-12-10/_settings' -d '{

"index.routing.allocation.exclude.tier":"cold"

}'

Notice that instead of the index.routing.allocation.include.tier property, we've used the index.routing.allocation.exclude.tier property.


Requiring node attributes

In addition to inclusion and exclusion rules, we can also specify the rules that must match in order for a shard to be allocated to a given node. The difference is that when using the index.routing.allocation.include property, the index will be placed on any node that matches at least one of the provided property values. Using index.routing.allocation.require, Elasticsearch will place the index only on a node that matches all the defined values. For example, let's assume that we've set the following settings for the logs_2015-12-10 index:

curl -XPUT 'localhost:9200/logs_2015-12-10/_settings' -d '{

"index.routing.allocation.require.tier":"hot",

"index.routing.allocation.require.disk_type":"ssd"

}'

After running the preceding command, Elasticsearch would only place the shards of the logs_2015-12-10 index on a node with the node.tier property set to hot and the node.disk_type property set to ssd.

Using the IP address for shard allocation

Instead of adding a special parameter to the nodes configuration, we are allowed to use IP addresses to specify which nodes we want to include or exclude from the shards and replicas allocation. In order to do this, instead of using the tier part of the index.routing.allocation.include.tier or index.routing.allocation.exclude.tier properties, we should use _ip. For example, if we would like our logs_2015-12-10 index to be placed only on the nodes with the 10.1.2.10 and 10.1.2.11 IP addresses, we would run the following command:

curl -XPUT 'localhost:9200/logs_2015-12-10/_settings' -d '{

"index.routing.allocation.include._ip":"10.1.2.10,10.1.2.11"

}'

Note

In addition to _ip, Elasticsearch also allows us to use _name to specify allocation rules using node names and _host to specify allocation rules using host names.

Disk-based shard allocation

In addition to the already described allocation filtering methods, Elasticsearch gives us disk-based shard allocation rules. It allows us to set allocation rules based on the nodes' disk usage.

Configuring disk-based shard allocation

There are four properties that control the behavior of disk-based shard allocation. All of them can be updated dynamically or set in the elasticsearch.yml configuration file.

The first of these is cluster.info.update.interval, which is by default set to 30 seconds and defines how often Elasticsearch updates information about disk usage on nodes.


The second property is cluster.routing.allocation.disk.watermark.low, which is by default set to 0.85. This means that Elasticsearch will not allocate new shards to a node that uses more than 85% of its disk space.

The third property is cluster.routing.allocation.disk.watermark.high, which controls when Elasticsearch will start relocating shards from a given node. It defaults to 0.90 and means that Elasticsearch will start relocating shards when the disk usage on a given node is equal to or more than 90%.

Both the cluster.routing.allocation.disk.watermark.low and cluster.routing.allocation.disk.watermark.high properties can be set either to a percentage value (such as 0.60, meaning 60%) or to an absolute value (such as 600mb, meaning 600 megabytes).

Finally, the last property is cluster.routing.allocation.disk.include_relocations, which by default is set to true. It tells Elasticsearch to take into account the shards that are not yet copied to the node but are in the process of being copied there. Having this behavior turned on by default means that the disk-based allocation mechanism will be more pessimistic when it comes to available disk space (when shards are relocating), but we won't run into situations where shards can't be relocated because the assumptions about disk space were wrong.
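The effect of the two watermarks can be pictured with a tiny decision function. This is a simplified model of the rules as stated in the text, not Elasticsearch's actual implementation; the function name and the returned labels are made up, and absolute values such as 600mb are not handled.

```python
def disk_decision(used_fraction, low=0.85, high=0.90):
    """Simplified reading of the low and high disk watermarks."""
    if used_fraction >= high:
        # High watermark reached: shards start to be relocated away.
        return "relocate shards away"
    if used_fraction > low:
        # Above the low watermark: the node gets no new shards.
        return "no new shards"
    return "may receive shards"

for usage in (0.50, 0.87, 0.93):
    print(usage, disk_decision(usage))
```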

Disabling disk-based shard allocation

The disk-based shard allocation is enabled by default. We can disable it by specifying the cluster.routing.allocation.disk.threshold_enabled property and setting it to false. We can do this in the elasticsearch.yml file or dynamically using the cluster settings API:

curl -XPUT localhost:9200/_cluster/settings -d '{

"transient":{

"cluster.routing.allocation.disk.threshold_enabled":false

}

}'


The number of shards and replicas per node

In addition to specifying shards and replicas allocation, we are also allowed to specify the maximum number of shards that can be placed on a single node for a single index. For example, if we would like our logs_2015-12-10 index to have only a single shard per node, we would run the following command:

curl -XPUT 'localhost:9200/logs_2015-12-10/_settings' -d '{

"index.routing.allocation.total_shards_per_node":1

}'

This property can be placed in the elasticsearch.yml file or can be updated on live indices using the preceding command. Please remember that your cluster can stay in the red state if the limit prevents Elasticsearch from allocating all the primary shards.
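A quick capacity check can show when such a limit is too strict. The sketch below is a rough, best-case estimate only: the function name is hypothetical, and it ignores every other allocation rule (filters, disk watermarks, and the fact that a replica is never placed next to its primary).

```python
def best_case_health(primaries, replicas, nodes, total_shards_per_node):
    """Best-case cluster color for one index under a per-node shard limit."""
    capacity = nodes * total_shards_per_node
    if primaries > capacity:
        return "red"     # some primary shards cannot be placed at all
    if primaries * (1 + replicas) > capacity:
        return "yellow"  # primaries fit, but some replicas stay unassigned
    return "green"       # enough room for every copy, other rules permitting

print(best_case_health(primaries=5, replicas=1, nodes=2, total_shards_per_node=1))
print(best_case_health(primaries=2, replicas=1, nodes=4, total_shards_per_node=1))
```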


Allocation throttling

The Elasticsearch allocation mechanism can be throttled, which means that we can control how many resources Elasticsearch will use during the shard allocation and recovery process. We are given five properties to control, which are as follows:

cluster.routing.allocation.node_concurrent_recoveries: This property defines how many concurrent shard recoveries may be happening at the same time on a node. This defaults to 2 and should be increased if you would like more shards to be recovered at the same time on a single node. However, increasing this value will result in more resource consumption during recovery. Also, please remember that during the replica recovery process, data will be copied from the other nodes over the network, which can be slow.
cluster.routing.allocation.node_initial_primaries_recoveries: This property defaults to 4 and defines how many primary shards are recovered at the same time on a given node. Because primary shard recovery uses data from local disks, this process should be very fast.
cluster.routing.allocation.same_shard.host: A Boolean property that defaults to false and is applicable only when multiple Elasticsearch nodes are started on the same machine. When set to true, this will force Elasticsearch to check, based on host name and host address, whether physical copies of the same shard are present on a single physical machine, and to prevent such an allocation. The default false value means no check is done.
indices.recovery.concurrent_streams: This is the number of network streams used to copy data from other nodes that can be used concurrently on a single node. The more streams there are, the faster the data will be copied, but this will result in more resource consumption. This property defaults to 3.
indices.recovery.concurrent_small_file_streams: This is similar to the indices.recovery.concurrent_streams property, but defines how many concurrent data streams Elasticsearch will use to copy small files (ones that are under 5mb in size). This property defaults to 2.


Cluster-wide allocation

In addition to the per-index allocation settings, Elasticsearch also allows us to control shard and index allocation on a cluster-wide basis, with so-called shard allocation awareness. This is especially useful when we have nodes in different physical racks and we would like to place shards and replicas in different physical nodes.

Let's start with a simple example. We assume that we have a cluster built of four nodes. Each node is in a different physical rack. The simple graphic that illustrates this is as follows:

As you can see, our cluster is built from four nodes. Each node was bound to a specific IP address and each node was given the tag property and a group property (added to elasticsearch.yml as the node.tag and node.group properties). This cluster will serve the purpose of showing how shard allocation filtering works. The group and tag properties can be given whatever names you want; you just need to prefix your desired property name with node. For example, if you would like to use a party property name, you would just add node.party: party1 to your elasticsearch.yml.

Allocation awareness

Allocation awareness allows us to configure shards and their replicas allocation with the use of generic parameters. In order to illustrate how allocation awareness works, we will use our example cluster. For the example to work, we should add the following property to the elasticsearch.yml file:


cluster.routing.allocation.awareness.attributes: group

This will tell Elasticsearch to use the node.group property as the awareness parameter.

Note

You can specify multiple attributes when setting the cluster.routing.allocation.awareness.attributes property. For example: cluster.routing.allocation.awareness.attributes: group,node

After this, let's start the first two nodes, the ones with the node.group parameter equal to groupA, and let's create an index by running the following command:

curl -XPOST 'localhost:9200/awarness' -d '{

"settings":{

"index":{

"number_of_shards":1,"number_of_replicas":1

}

}

}'

After this command, our two-node cluster will look more or less like this:

As you can see, the index was divided between the two nodes evenly. Now let's see what happens when we launch the rest of the nodes (the ones with node.group set to groupB):


Notice the difference: the primary shards were not moved from their original allocation nodes, but the replica shards were moved to the nodes with a different node.group value. That's exactly right; when using shard allocation awareness, Elasticsearch won't allocate the primary shards and replicas of the same index to the nodes with the same value of the property used to determine the allocation awareness (which in our case is node.group).

Note

Please remember that when using allocation awareness, shards will not be allocated to the node that doesn't have the expected attributes set. So in our example, a node without the node.group property set will not be taken into consideration by the allocation mechanism.
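The rule described above can be checked with a few lines of Python. This is only a sketch of the idea for one primary and one replica: the node names n1 to n4 and the function name are hypothetical, and the real allocator balances copies across attribute values rather than running a check like this.

```python
def awareness_violations(copies, node_attrs, attribute="group"):
    """Return attribute values that hold more than one copy of the same shard."""
    seen = {}
    for node in copies:
        value = node_attrs[node][attribute]
        seen[value] = seen.get(value, 0) + 1
    return sorted(value for value, count in seen.items() if count > 1)

# Hypothetical node names; the group values follow the example cluster.
nodes = {
    "n1": {"group": "groupA"}, "n2": {"group": "groupA"},
    "n3": {"group": "groupB"}, "n4": {"group": "groupB"},
}
print(awareness_violations(["n1", "n2"], nodes))  # both copies in groupA
print(awareness_violations(["n1", "n3"], nodes))  # one copy per group
```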

Forcing allocation awareness

Forcing allocation awareness can come in handy when we know, in advance, how many values our awareness attributes can take and we don't want more replicas than needed to be allocated in our cluster, for example, so as not to overload our cluster with too many replicas. For this, we can force the allocation awareness to be active only for certain attribute values. We can specify these values using the cluster.routing.allocation.awareness.force.group.values property (the name of the awareness attribute, group in our case, is a part of the property name) and providing a list of comma-separated values to it. For example, if we would like the allocation awareness to use only the groupA and groupB values of the node.group property, we would add the following to the elasticsearch.yml file:

cluster.routing.allocation.awareness.attributes: group

cluster.routing.allocation.awareness.force.group.values: groupA,groupB

Filtering

Elasticsearch allows us to configure allocation for the entire cluster or on the index level. In the case of cluster allocation, we can use the following property prefixes:

cluster.routing.allocation.include

cluster.routing.allocation.require

cluster.routing.allocation.exclude

When it comes to index-specific allocation, we can use the following property prefixes:

index.routing.allocation.include

index.routing.allocation.require

index.routing.allocation.exclude

The previously mentioned prefixes can be used with the properties that we've defined in the elasticsearch.yml file (our tag and group properties) and with a special property called _ip that allows us to match or exclude nodes by their IP addresses, for example, like this:

cluster.routing.allocation.include._ip: 192.168.2.1

If we would like to include nodes with a group property matching the groupA value, we would set the following property:

cluster.routing.allocation.include.group: groupA

Notice that we've used the cluster.routing.allocation.include prefix and we've concatenated it with the name of the property, which is group in our case.

What do include, exclude, and require mean

If you look closely at the preceding parameters, you will notice that there are three kinds:

include: This type will result in including all the nodes with this parameter defined. If multiple include conditions are given, then all the nodes that match at least one of these conditions will be taken into consideration when allocating shards. For example, if we add two cluster.routing.allocation.include.tag parameters to our configuration, one with the value of node1 and the second with the node2 value, we would end up with indices (actually their shards) being allocated to the first and second node (counting from left to right). To sum up, the nodes that match the include allocation parameter will be taken into consideration by Elasticsearch when choosing the nodes to place shards on, but this doesn't mean that Elasticsearch will put shards on them.
require: This type of allocation filter, which was introduced in Elasticsearch 0.90, requires a node to match all the defined values in order to receive shards. For example, if we add one cluster.routing.allocation.require.tag parameter to our configuration with the value of node1 and a cluster.routing.allocation.require.group parameter with the value of groupA, we would end up with shards allocated only to the first node (the one with an IP address of 192.168.2.1).
exclude: This parameter allows us to exclude nodes with given properties from the allocation process. For example, if we set cluster.routing.allocation.exclude.group to groupA, we would end up with indices being allocated only to the nodes with IP addresses 192.168.3.1 and 192.168.3.2 (the third and fourth nodes in our example).

Note: The property value can use simple wildcard characters. For example, if we want to include all the nodes that have the group parameter value beginning with group, we could set the cluster.routing.allocation.include.group property to group*. In the example cluster case, this would result in matching nodes with the groupA and groupB group parameter values.
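The three filter kinds can be modeled in a few lines of Python. This is only an illustrative sketch of the semantics described above, not how Elasticsearch implements allocation filtering. The eligible_nodes and matches helpers are hypothetical, the attributes of the second, third, and fourth nodes are assumed for the example, and fnmatchcase accepts a richer pattern syntax than the simple * wildcard mentioned in the note.

```python
from fnmatch import fnmatchcase


def matches(node_attrs, attr, pattern):
    """True if the node has the attribute and its value matches the pattern."""
    value = node_attrs.get(attr)
    return value is not None and fnmatchcase(value, pattern)


def eligible_nodes(nodes, include=(), require=(), exclude=()):
    """Return the nodes that pass all three kinds of filters.

    Each filter is a sequence of (attribute, pattern) pairs.
    """
    result = []
    for name, attrs in nodes.items():
        if include and not any(matches(attrs, a, p) for a, p in include):
            continue  # include: at least one condition must match
        if not all(matches(attrs, a, p) for a, p in require):
            continue  # require: every condition must match
        if any(matches(attrs, a, p) for a, p in exclude):
            continue  # exclude: no condition may match
        result.append(name)
    return result


# Assumed four-node example cluster, keyed by IP address.
nodes = {
    "192.168.2.1": {"tag": "node1", "group": "groupA"},
    "192.168.2.2": {"tag": "node2", "group": "groupA"},
    "192.168.3.1": {"tag": "node3", "group": "groupB"},
    "192.168.3.2": {"tag": "node4", "group": "groupB"},
}

print(eligible_nodes(nodes, exclude=[("group", "groupA")]))
```

Passing include=[("group", "group*")] selects all four nodes, which mirrors the wildcard behavior described in the note.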


Manually moving shards and replicas

The last thing we want to discuss is the ability to manually move shards between nodes. Elasticsearch exposes the _cluster/reroute REST endpoint, which allows us to control that. The following operations are available:

Moving a shard from node to node
Canceling shard allocation
Forcing shard allocation

Now let's look closely at all of the preceding operations.

Moving shards

Let's say we have two nodes called es_node_one and es_node_two, and we have two shards of the shop index placed by Elasticsearch on the first node and we would like to move the second shard to the second node. In order to do this, we can run the following command:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [{
    "move": {
      "index": "shop",
      "shard": 1,
      "from_node": "es_node_one",
      "to_node": "es_node_two"
    }
  }]
}'

We've specified the move command, which allows us to move shards (and replicas) of the index specified by the index property. The shard property is the number of the shard we want to move. Finally, the from_node property specifies the name of the node we want to move the shard from, and the to_node property specifies the name of the node we want the shard to be placed on.

Canceling shard allocation

If we would like to cancel an ongoing allocation process, we can run the cancel command and specify the index, node, and shard we want to cancel the allocation for. For example:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [{
    "cancel": {
      "index": "shop",
      "shard": 0,
      "node": "es_node_one"
    }
  }]
}'

The preceding command would cancel the allocation of shard 0 of the shop index on the es_node_one node.

Forcing shard allocation

In addition to canceling and moving shards and replicas, we are also allowed to allocate an unallocated shard to a specific node. For example, if we have an unallocated shard numbered 0 for the users index and we would like it to be allocated to es_node_two by Elasticsearch, we would run the following command:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [{
    "allocate": {
      "index": "users",
      "shard": 0,
      "node": "es_node_two"
    }
  }]
}'

Multiple commands per HTTP request

We can, of course, include multiple commands in a single HTTP request. For example:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    {"move": {"index": "shop", "shard": 1, "from_node": "es_node_one",
              "to_node": "es_node_two"}},
    {"cancel": {"index": "shop", "shard": 0, "node": "es_node_one"}}
  ]
}'

Allowing operations on primary shards

The cancel and allocate commands accept an additional allow_primary parameter. If set to true, it tells Elasticsearch that the operation can be performed on the primary shard. Please be advised that operations with the allow_primary parameter set to true may result in data loss.
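When reroute requests are generated by a script, it can help to build the request body programmatically rather than by hand. The following is a minimal sketch that only assembles the JSON body for the _cluster/reroute endpoint; the helper names (move, cancel, allocate, reroute_body) are hypothetical and nothing is sent over the network.

```python
import json


def move(index, shard, from_node, to_node):
    """Build a move command for a single shard."""
    return {"move": {"index": index, "shard": shard,
                     "from_node": from_node, "to_node": to_node}}


def _shard_command(name, index, shard, node, allow_primary):
    """Shared shape of the cancel and allocate commands."""
    body = {"index": index, "shard": shard, "node": node}
    if allow_primary:
        # Only emitted on explicit request, because it may cause data loss.
        body["allow_primary"] = True
    return {name: body}


def cancel(index, shard, node, allow_primary=False):
    return _shard_command("cancel", index, shard, node, allow_primary)


def allocate(index, shard, node, allow_primary=False):
    return _shard_command("allocate", index, shard, node, allow_primary)


def reroute_body(*commands):
    """Serialize any number of commands into one request body."""
    return json.dumps({"commands": list(commands)})


body = reroute_body(
    move("shop", 1, "es_node_one", "es_node_two"),
    cancel("shop", 0, "es_node_one"),
)
print(body)
```

The resulting string can be passed to curl with the -d option, exactly like the hand-written bodies in this section.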


Handling rolling restarts

There is one more thing that we would like to discuss when it comes to shard and replica allocation: handling rolling restarts. When an Elasticsearch node is restarted, it may take some time for it to rejoin the cluster. During this time, the rest of the cluster may decide to do rebalancing and move shards around. When we know we are doing rolling restarts, for example, to update Elasticsearch to a new version or install a plugin, we may want to tell this to Elasticsearch. The procedure for restarting each node should be as follows:

First, before you do any maintenance, you should stop the allocation by sending the following command:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.enable": "none"
  }
}'

This will tell Elasticsearch to stop allocation. After this, we will stop the node we want to do maintenance on and start it again. After it joins the cluster, we can enable the allocation again by running the following:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}'

This will enable the allocation again. This procedure should be repeated for each node we want to perform maintenance on.
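The procedure can be written down as an ordered plan, which makes the per-node repetition explicit. The sketch below only produces the list of steps and the settings bodies to send; the function names are hypothetical, and actually restarting a node and waiting for it to rejoin is left to whatever tooling you use.

```python
def allocation_setting(value):
    """Body for PUT /_cluster/settings that switches shard allocation."""
    if value not in ("all", "none"):
        raise ValueError("this sketch only toggles between 'all' and 'none'")
    return {"transient": {"cluster.routing.allocation.enable": value}}


def rolling_restart_plan(node_names):
    """Return the ordered steps for restarting the given nodes one by one."""
    steps = []
    for node in node_names:
        steps.append(("disable allocation", allocation_setting("none")))
        steps.append(("restart node and wait until it rejoins", node))
        steps.append(("enable allocation", allocation_setting("all")))
    return steps


for description, detail in rolling_restart_plan(["es_node_one", "es_node_two"]):
    print(description, detail)
```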


Controlling cluster rebalancing

By default, Elasticsearch tries to keep the shards and their replicas evenly balanced across the cluster. Such behavior is good in most cases, but there are times when we want to control this behavior, for example, during rolling restarts. We don't want to rebalance the entire cluster when one or two nodes are restarted. In this section, we will look at how to avoid cluster rebalance and control this process' behavior in depth.

Imagine a situation where you know that your network can handle very high amounts of traffic or the opposite of this: your network is used extensively and you want to avoid too much load on it. Another example is that you may want to decrease the pressure that is put on your I/O subsystem after a full-cluster restart and you want to have fewer shards and replicas being initialized at the same time. These are only a few examples where rebalance control may be handy.


Understanding rebalance

Rebalancing is the process of moving shards between different nodes in our cluster. As we have already mentioned, it is fine in most situations, but sometimes you may want to completely avoid this. For example, if we define how our shards are placed and we want to keep it this way, we may want to avoid rebalancing. However, by default, Elasticsearch will try to rebalance the cluster whenever the cluster state changes and Elasticsearch thinks a rebalance is needed (and the delayed timeout has passed as discussed in The gateway and recovery modules section of Chapter 9, Elasticsearch Cluster in Detail).


Cluster being ready

We already know that our indices are built from shards and replicas. Primary shards, or just shards, are the ones that get the data first. The replicas are physical copies of the primaries and get the data from them. You can think of the cluster as being ready to be used when all the primary shards are assigned to their nodes in your cluster, that is, as soon as the yellow health state is achieved. At this point, Elasticsearch may still be initializing other shards (the replicas), but you can already use your cluster and be sure that you can search your entire data set and send index change commands, and that these commands will be processed properly.
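The relationship between shard states and the health color can be sketched as a small function. This is a simplified model for illustration only: each shard copy is reduced to a hypothetical (is_primary, is_active) pair, whereas a real cluster tracks more states than active and inactive.

```python
def cluster_health(shards):
    """Derive a health color from (is_primary, is_active) pairs."""
    if any(is_primary and not is_active for is_primary, is_active in shards):
        return "red"      # at least one primary is not available
    if any(not is_active for _, is_active in shards):
        return "yellow"   # all primaries are active, some replicas are not
    return "green"        # every primary and replica is active


# Two primaries are active, one replica is still initializing.
shards = [(True, True), (True, True), (False, True), (False, False)]
print(cluster_health(shards))
```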


The cluster rebalance settings

Elasticsearch lets us control the rebalance process with the use of a few properties that can be set in the elasticsearch.yml file or by using the Elasticsearch REST API (as described in The update settings API section of Chapter 9, Elasticsearch Cluster in Detail).

Controlling when rebalancing will be allowed

The cluster.routing.allocation.allow_rebalance property allows us to specify when rebalancing is allowed. This property can take the following values:

always: Rebalancing will be allowed as soon as it's needed
indices_primaries_active: Rebalancing will be allowed when all the primary shards are initialized
indices_all_active: The default one, which means that rebalancing will be allowed when all the shards and replicas are initialized

The cluster.routing.allocation.allow_rebalance property can be set in the elasticsearch.yml configuration file and updated dynamically as well.
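To make the difference between the three values concrete, here is a small model of the decision. It is a sketch of the rules listed above, not Elasticsearch code: rebalance_allowed is a hypothetical helper and each shard copy is represented as an assumed (is_primary, is_active) pair.

```python
def rebalance_allowed(mode, shards):
    """Decide whether rebalancing may start for the given allow_rebalance mode."""
    if mode == "always":
        return True
    if mode == "indices_primaries_active":
        return all(active for primary, active in shards if primary)
    if mode == "indices_all_active":
        return all(active for _, active in shards)
    raise ValueError("unknown allow_rebalance value: %r" % mode)


# All primaries are active, but one replica is still initializing.
shards = [(True, True), (True, True), (False, True), (False, False)]
for mode in ("always", "indices_primaries_active", "indices_all_active"):
    print(mode, rebalance_allowed(mode, shards))
```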

Controlling the number of shards being moved between nodes concurrently

The cluster.routing.allocation.cluster_concurrent_rebalance property allows us to specify how many shards can be moved between nodes at once in the entire cluster. If you have a cluster that is built from many nodes, you can increase this value. This value defaults to 2. You can increase the default value if you would like the rebalancing to be performed faster, but this will put more pressure on your cluster resources and will affect indexing and querying. The cluster.routing.allocation.cluster_concurrent_rebalance property can be set in the elasticsearch.yml configuration file and updated dynamically as well.

Controlling which shards may be rebalanced

The cluster.routing.rebalance.enable property allows us to specify which shards will be allowed to be rebalanced by Elasticsearch. This property can take the following values:

all: The default behavior, which tells Elasticsearch to rebalance all the shards in the cluster
primaries: This value allows the rebalancing of the primary shards only
replicas: This value allows the rebalancing of the replica shards only
none: This value disables the rebalancing of all types of shards for all indices in the cluster

The cluster.routing.rebalance.enable property can be set in the elasticsearch.yml configuration file and updated dynamically as well. Don't confuse it with the similarly named cluster.routing.allocation.enable property used in the rolling restart procedure, which controls shard allocation rather than rebalancing.


The Cat API

The Elasticsearch Admin API is quite extensive and covers almost every part of Elasticsearch architecture: from low-level information about Lucene to high-level ones about the cluster nodes and their health. All this information is available using the Elasticsearch Java API as well as the REST API. However, the returned data, even though it is a JSON document, is not very readable by a user, at least when it comes to the amount of information given.

Because of this, Elasticsearch provides us with a more human-friendly API: the Cat API. The special Cat API returns data in a simple text, tabular format and, what's more, it provides aggregated data that is usually usable without any further processing.


The basics

The base endpoint for the Cat API is quite obvious: it is /_cat. Without any parameters, it shows all the available endpoints for this API. We can check this by running the following command:

curl -XGET 'localhost:9200/_cat'

The response returned by Elasticsearch should be similar or identical (depending on your Elasticsearch version) to the following one:

=^.^=

/_cat/allocation

/_cat/shards

/_cat/shards/{index}

/_cat/master

/_cat/nodes

/_cat/indices

/_cat/indices/{index}

/_cat/segments

/_cat/segments/{index}

/_cat/count

/_cat/count/{index}

/_cat/recovery

/_cat/recovery/{index}

/_cat/health

/_cat/pending_tasks

/_cat/aliases

/_cat/aliases/{alias}

/_cat/thread_pool

/_cat/plugins

/_cat/fielddata

/_cat/fielddata/{fields}

/_cat/nodeattrs

/_cat/repositories

/_cat/snapshots/{repository}

So, looking from the top, Elasticsearch allows us to get the following information using the Cat API:

Shard allocation-related information
All shards-related information (optionally limited to a given index)
Information about the master node
Nodes information
Indices statistics (optionally limited to a given index)
Segments statistics (optionally limited to a given index)
Documents count (optionally limited to a given index)
Recovery information (optionally limited to a given index)
Cluster health
Tasks pending for execution
Index aliases and indices for a given alias
Thread pool configuration
Plugins installed on each node
Field data cache size and field data cache sizes for individual fields
Node attributes information
Defined backup repositories
Snapshots created in the backup repository


Using Cat API

Using the Cat API is as simple as sending a GET request to one of the previously mentioned REST endpoints. For example, to get information about the cluster health, we could run the following command:

curl -XGET 'localhost:9200/_cat/health'

The response returned by Elasticsearch for the preceding command should be similar to the following one, but, of course, will be dependent on your cluster:

1446292041 12:47:21 elasticsearch yellow 1 1 21 21 0 0 21 0 - 50.0%

This is clean and nice. Because it is in tabular format, it is also easy to use the response in tools such as grep, awk, or sed, a standard set of tools for every administrator. It is also more readable once you know what it is all about.

To add a header describing each column's purpose, we just need to add an additional v parameter, just like this:

curl -XGET 'localhost:9200/_cat/health?v'

Common arguments

Every Cat API endpoint has its own arguments, but there are a few common options that are shared among all of them:

v: This adds a header line to the response with the names of presented items.
h: This allows us to show only the chosen columns, for example h=status,node.total,shards,pri.
help: This lists all the possible columns that this particular endpoint is able to show. The command shows the name of the parameter, its abbreviation, and description.
bytes: This is the format for the information representing the values in bytes. As we said earlier, the Cat API is designed to be used by humans and because of this, by default, these values are represented in human-readable form, for example: 3.5kB or 40GB. The bytes option allows the setting of the same base for all the numbers, so sorting or numerical comparison will be easier. For example, bytes=b presents all values in bytes, bytes=k in kilobytes, and so on.
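The common arguments combine freely in the query string. As a small illustration, the following sketch assembles a Cat API URL from them; cat_url is a hypothetical helper, the host default is an assumption, and the function only builds the string without sending a request.

```python
def cat_url(endpoint, verbose=False, columns=None, bytes_unit=None,
            host="localhost:9200"):
    """Build a Cat API URL using the common v, h, and bytes arguments."""
    params = []
    if verbose:
        params.append("v")                          # header line
    if columns:
        params.append("h=" + ",".join(columns))     # chosen columns only
    if bytes_unit:
        params.append("bytes=" + bytes_unit)        # same unit for all sizes
    url = "http://%s/_cat/%s" % (host, endpoint)
    if params:
        url += "?" + "&".join(params)
    return url


print(cat_url("health", verbose=True, columns=["status", "node.total", "shards", "pri"]))
```

Remember to quote the resulting URL when passing it to curl, because the & character has a special meaning in most shells.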

Note: For the full list of arguments for each Cat API endpoint, please refer to the official Elasticsearch documentation available at: https://www.elastic.co/guide/en/elasticsearch/reference/2.2/cat.html.


The examples

When we wrote this book, the Cat API had twenty-two endpoints. We don't want to describe them all, because that would only repeat the information contained in the documentation. However, we didn't want to leave this section without an example regarding the usage of the Cat API. Because of this, we decided to show how easily you can get information using the Cat API compared to the standard JSON API exposed by Elasticsearch.

Getting information about the master node

The first example shows how easy it is to get information about which node in our cluster is the master node. By calling the /_cat/master REST endpoint, we can find out which node is currently elected as the master. For example, let's run the following command:

curl -XGET 'localhost:9200/_cat/master?v'

The response returned by Elasticsearch for my local two-node cluster looks as follows:

id                     host      ip        node
Cfj3tzqpSNi5SZx4g8osAg 127.0.0.1 127.0.0.1 Skin

As you can see in the response, we've got the information about which node is currently elected as the master: we can see its identifier, host, IP address, and name.

Getting information about the nodes

The /_cat/nodes REST endpoint provides information about all the nodes in the cluster. Let's see what Elasticsearch will return after running the following command:

curl -XGET 'localhost:9200/_cat/nodes?v&h=name,node.role,load,uptime'

In the preceding example, we have used the possibility of choosing what information we want to get from the approximately seventy options of this endpoint. We have chosen to get only the node name, its role (whether the node is a data or client node), the node load, and its uptime.

And the response returned by Elasticsearch looks as follows:

name node.role load uptime
Skin d         2.00 1.3h

As you can see, the /_cat/nodes REST endpoint provides all the requested information about the nodes in the cluster.

Retrieving recovery information for an index

Another nice example of using the Cat API is getting information about the recovery of a single index or all the indices. In our case, we will retrieve recovery information for a single library index by running the following command:

curl -XGET 'localhost:9200/_cat/recovery/library?v&h=index,shard,time,type,stage,files_percent'

The response for the preceding command looks as follows:

index   shard time type  stage files_percent
library 0     75   store done  100.0%
library 1     83   store done  100.0%
library 2     88   store done  100.0%
library 3     79   store done  100.0%
library 4     5    store done  100.0%
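Because the output is whitespace-separated text with a header line (when v is used), it is also simple to process in a script. The following sketch parses such a response into dictionaries; parse_cat is a hypothetical helper, and splitting on whitespace assumes that no column value is empty or contains spaces, which holds for the recovery output above but not necessarily for every endpoint.

```python
def parse_cat(text):
    """Turn a Cat API response with a header line into a list of dicts."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


sample = """\
index   shard time type  stage files_percent
library 0     75   store done  100.0%
library 1     83   store done  100.0%
library 2     88   store done  100.0%
library 3     79   store done  100.0%
library 4     5    store done  100.0%
"""

rows = parse_cat(sample)
# Find the shard whose recovery took the longest.
slowest = max(rows, key=lambda row: int(row["time"]))
print(slowest["shard"], slowest["time"])
```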


Warming up

Sometimes, there may be a need to prepare Elasticsearch to handle your queries. Maybe it's because you heavily rely on the field data cache and you want it to be loaded before your production queries arrive, or maybe you want to warm up your operating system's I/O cache so that the data indices files are read from the cache. Whatever the reason, Elasticsearch allows us to use so-called warming queries for our types and indices.


Defining a new warming query

A warming query is nothing more than the usual query stored in a special type called _warmer in Elasticsearch. Let's assume that we have the following query that we want to use for warming up:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "warming_aggs": {
      "terms": {
        "field": "tags"
      }
    }
  }
}'

To store the preceding query as a warming query for our library index, we will run the following command:

curl -XPUT 'localhost:9200/library/_warmer/tags_warming_query' -d '{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "warming_aggs": {
      "terms": {
        "field": "tags"
      }
    }
  }
}'

The preceding command will register our query as a warming query with the tags_warming_query name. You can have multiple warming queries for your index, but each of these queries needs to have a unique name.

We can not only define warming queries for the entire index, but also for a specific type in it. For example, to store our previously shown query as the warming query only for the book type in the library index, run the preceding command not to the /library/_warmer URI but to /library/book/_warmer. So, the entire command will be as follows:

curl -XPUT 'localhost:9200/library/book/_warmer/tags_warming_query' -d '{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "warming_aggs": {
      "terms": {
        "field": "tags"
      }
    }
  }
}'

After adding a warming query, before Elasticsearch allows a new segment to be searched on, it will be warmed up by running the defined warming queries on that segment. This allows Elasticsearch and the operating system to cache data and, thus, speed up searching.

Just as we read in the Full text searching section of Chapter 1, Getting Started with Elasticsearch Cluster, Lucene divides the index into parts called segments, which once written can't be changed. Every new commit operation creates a new segment (which is eventually merged if the number of segments is too high), which Lucene uses for searching.

Note: Please note that the Warmer API will be removed in the future versions of Elasticsearch.


Retrieving the defined warming queries

In order to get a specific warming query for our index, we just need to know its name. For example, if we want to get the warming query named tags_warming_query for our library index, we will run the following command:

curl -XGET 'localhost:9200/library/_warmer/tags_warming_query?pretty'

The result returned by Elasticsearch will be as follows:

{
  "library": {
    "warmers": {
      "tags_warming_query": {
        "types": ["book"],
        "source": {
          "query": {
            "match_all": {}
          },
          "aggs": {
            "warming_aggs": {
              "terms": {
                "field": "tags"
              }
            }
          }
        }
      }
    }
  }
}

We can also get all the warming queries defined for the index using the following command:

curl -XGET 'localhost:9200/library/_warmer?pretty'

And finally, we can also get all the warming queries that start with a given prefix. For example, if we want to get all the warming queries for the library index that start with the tags prefix, we will run the following command:

curl -XGET 'localhost:9200/library/_warmer/tags*?pretty'


Deleting a warming query

Deleting a warming query is very similar to getting one; we just need to use the DELETE HTTP method. To delete a specific warming query from our index, we just need to know its name. For example, if we want to delete the warming query named tags_warming_query for our library index, we will run the following command:

curl -XDELETE 'localhost:9200/library/_warmer/tags_warming_query'

We can also delete all the warming queries for the index using the following command:

curl -XDELETE 'localhost:9200/library/_warmer/_all'

And finally, we can also remove all the warming queries that start with a given prefix. For example, if we want to remove all the warming queries for the library index that start with the tags prefix, we will run the following command:

curl -XDELETE 'localhost:9200/library/_warmer/tags*'


Disabling the warming up functionality

To disable the warming queries totally but to save them in the _warmer index, you should set the index.warmer.enabled configuration property to false (setting this property to true will result in enabling the warming up functionality). This setting can be either put in the elasticsearch.yml file or just set using the REST API on a live cluster.

For example, if we want to disable the warming up functionality for the library index, we will run the following command:

curl -XPUT 'localhost:9200/library/_settings' -d '{
  "index.warmer.enabled": false
}'


Choosing queries for warming

Finally, we should ask ourselves one question: which queries should be considered as candidates for warming? Typically, you'll want to choose ones that are expensive to execute and ones that require caches to be populated. So you'll probably want to choose queries that include aggregations and sorting based on the fields in your index. This will force the operating system to load the part of the indices that hold the data related to such queries and improve the performance of consecutive queries that are run. In addition to this, parent-child queries and nested queries are also potential candidates for warming. You may also choose other queries by looking at the logs, and finding where your performance is not as great as you want it to be. Such queries may also be perfect candidates for warming up.

For example, let's say that we have the following logging configuration set in the elasticsearch.yml file:

index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 1s

And we have the following logging level set in the logging.yml configuration file:

logger:
  index.search.slowlog: TRACE, index_search_slow_log_file

Notice that the index.search.slowlog.threshold.query.trace property is set to 1s and the index.search.slowlog logging level is set to TRACE. This means that whenever a query is executed for longer than one second (on a shard, not in total), it will be logged into the slow log file (the name of which is specified by the index_search_slow_log_file configuration section of the logging.yml configuration file). For example, the following can be found in a slow log file:

[2015-11-25 19:53:00,248][TRACE][index.search.slowlog.query]
took[340000.2ms], took_millis[3400], types[], stats[],
search_type[QUERY_THEN_FETCH], total_shards[5], source[{"query":
{"match_all":{}},"aggs":{"warming_aggs":{"terms":{"field":"tags"}}}}],
extra_source[],

As you can see, in the preceding log line, we have the query time, search type, and the query source, which shows us the executed query.
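When looking for warming candidates, it can be convenient to extract these fields from the slow log automatically. The sketch below pulls the query time, search type, and query source out of a line shaped like the sample above. The regular expressions are written against that sample only and are an assumption about the format; other Elasticsearch versions or configurations may log a different layout.

```python
import json
import re

# The sample slow log entry from the text, joined into a single line.
LOG_LINE = (
    '[2015-11-25 19:53:00,248][TRACE][index.search.slowlog.query] '
    'took[340000.2ms], took_millis[3400], types[], stats[], '
    'search_type[QUERY_THEN_FETCH], total_shards[5], '
    'source[{"query":{"match_all":{}},"aggs":{"warming_aggs":'
    '{"terms":{"field":"tags"}}}}], extra_source[],'
)


def parse_slowlog(line):
    """Extract the query time, search type, and query source from one entry."""
    took = re.search(r"took_millis\[(\d+)\]", line)
    search_type = re.search(r"search_type\[(\w+)\]", line)
    # The leading space keeps this pattern from matching extra_source[...].
    source = re.search(r" source\[(.*)\], extra_source\[", line)
    return {
        "took_millis": int(took.group(1)),
        "search_type": search_type.group(1),
        "source": json.loads(source.group(1)),
    }


entry = parse_slowlog(LOG_LINE)
print(entry["took_millis"], entry["search_type"], sorted(entry["source"]))
```

Grouping the parsed entries by their source and sorting by took_millis gives a rough ranking of the queries that may deserve a warmer.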

Of course, the values can be different in your configuration, but the slow log can be a valuable source of the queries that have been running too long and may need to have some warm up defined; maybe these are parent-child queries and need some identifiers to be fetched to perform better, or maybe you are using a filter that is expensive when you execute it for the first time.

There is one thing you should remember: don't overload your Elasticsearch cluster with too many warming queries because you may end up spending too much time in warming up instead of processing your production queries.


Index aliasing and using it to simplify your everyday work

When working with multiple indices in Elasticsearch, you can sometimes lose track of them. Imagine a situation where you store logs in your indices or time-based data in general. Usually, the amount of data in such cases is quite large and, therefore, it is a good solution to have the data divided somehow. A logical division of such data is obtained by creating a single index for a single day of logs (if you are interested in an open source solution used to manage logs, look at Logstash from the Elasticsearch suite at https://www.elastic.co/products/logstash).

However, after some time, if we keep all the indices, we will start having a problem in taking care of all that. An application needs to take care of all the information, such as which index to send data to, which to query, and so on. With the help of aliases, we can change this to work with a single name just as we would use a single index, but we will work with multiple indices.


An alias

What is an index alias? It's an additional name for one or more indices that allows us to use these indices by referring to them with those additional names. A single alias can have multiple indices as well as the other way round; a single index can be a part of multiple aliases.

However, please remember that you can't use an alias that has multiple indices for indexing or for real-time GET operations. Elasticsearch will throw an exception if you do this, because it doesn't know in which index the data should be indexed or from which index the document should be fetched. We can still use an alias that links to only a single index for indexing, though.
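The restriction can be pictured as a name resolution step that must yield exactly one index for write operations. The following sketch models that rule with a plain dictionary; the resolve_write_index function, the AmbiguousAliasError exception, and the today alias are all hypothetical and serve only to illustrate the behavior described above.

```python
class AmbiguousAliasError(Exception):
    """Raised when a write targets an alias that points to several indices."""


def resolve_write_index(aliases, name):
    """Return the single index a write to the given name would go to."""
    # A name that is not a known alias is treated as a plain index name.
    indices = aliases.get(name, [name])
    if len(indices) != 1:
        raise AmbiguousAliasError(
            "alias %r points to %d indices, cannot index through it"
            % (name, len(indices)))
    return indices[0]


aliases = {
    "week12": ["day10", "day11", "day12"],  # fine for searching only
    "today": ["day12"],                     # single index, safe for indexing
}

print(resolve_write_index(aliases, "today"))
```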


Creating an alias

To create an index alias, we need to send an HTTP POST request to the _aliases REST endpoint with a defined action. For example, the following request will create a new alias called week12 that will include the indices named day10, day11, and day12 (we need to create those indices first):

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    {"add": {"index": "day10", "alias": "week12"}},
    {"add": {"index": "day11", "alias": "week12"}},
    {"add": {"index": "day12", "alias": "week12"}}
  ]
}'

If the week12 alias isn't present in our Elasticsearch cluster, the preceding command will create it. If it is present, the command will just add the specified indices to it.

Without the alias, we would run a search across the three indices as follows:

curl -XGET 'localhost:9200/day10,day11,day12/_search?q=test'

With the alias in place, we can instead run it as follows:

curl -XGET 'localhost:9200/week12/_search?q=test'

Isn't this better?

Sometimes we have a set of indices where every index serves independent information but some queries should go across all of them; for example, we have dedicated indices for countries (country_en, country_us, country_de, and so on). In this case, we would create the alias by grouping them all:

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    {"add": {"index": "country_*", "alias": "countries"}}
  ]
}'

The last command created only one alias. Elasticsearch allows you to rewrite this to something less verbose:

curl -XPUT 'localhost:9200/country_*/_alias/countries'


Modifying aliases

Of course, you can also remove indices from an alias. We can do this similarly to how we add indices to an alias, but instead of the add command, we use the remove one. For example, to remove the index named day9 from the week12 alias, we will run the following command:

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    {"remove": {"index": "day9", "alias": "week12"}}
  ]
}'


Combining commands

The add and remove commands can be sent as a single request. For example, if you would like to combine all the previously sent commands into a single request, you will have to send the following command:

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    {"add": {"index": "day10", "alias": "week12"}},
    {"add": {"index": "day11", "alias": "week12"}},
    {"add": {"index": "day12", "alias": "week12"}},
    {"remove": {"index": "day9", "alias": "week12"}}
  ]
}'
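For time-based indices, such a combined request is usually generated rather than typed. The following is a minimal sketch that builds the body for the _aliases endpoint from lists of indices to add and remove; alias_actions is a hypothetical helper and the request itself is not sent.

```python
import json


def alias_actions(alias, add=(), remove=()):
    """Build the body of a combined add/remove request for one alias."""
    actions = [{"add": {"index": index, "alias": alias}} for index in add]
    actions += [{"remove": {"index": index, "alias": alias}} for index in remove]
    return {"actions": actions}


body = alias_actions("week12", add=["day10", "day11", "day12"], remove=["day9"])
print(json.dumps(body, indent=2))
```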


Retrieving aliases

In addition to adding or removing indices to or from aliases, we and our applications that use Elasticsearch may need to retrieve all the aliases available in the cluster or all the aliases that an index is connected to. To retrieve these aliases, we send a request using the HTTP GET method. For example, the first of the following commands gets all the aliases for the day10 index and the second one gets all the available aliases:

curl -XGET 'localhost:9200/day10/_aliases'
curl -XGET 'localhost:9200/_aliases'

The response from the second command is as follows:

{
  "day12": {
    "aliases": {
      "week12": {}
    }
  },
  "library": {
    "aliases": {}
  },
  "day11": {
    "aliases": {
      "week12": {}
    }
  },
  "day9": {
    "aliases": {}
  },
  "day10": {
    "aliases": {
      "week12": {}
    }
  }
}

You can also use the _alias endpoint to get all aliases from the given index:

curl -XGET 'localhost:9200/day10/_alias/*'

To get a particular alias definition, you can use the following:

curl -XGET 'localhost:9200/day10/_alias/week12'
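The _aliases response is keyed by index, while applications often need the opposite view: which indices stand behind a given alias. The sketch below inverts a response of the shape shown earlier in this section; indices_by_alias is a hypothetical helper and the response is hard-coded instead of being fetched from a cluster.

```python
def indices_by_alias(response):
    """Invert an index -> aliases response into alias -> sorted index names."""
    result = {}
    for index, info in response.items():
        for alias in info.get("aliases", {}):
            result.setdefault(alias, []).append(index)
    return {alias: sorted(indices) for alias, indices in result.items()}


response = {
    "day12": {"aliases": {"week12": {}}},
    "library": {"aliases": {}},
    "day11": {"aliases": {"week12": {}}},
    "day9": {"aliases": {}},
    "day10": {"aliases": {"week12": {}}},
}

print(indices_by_alias(response))
```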


Removing aliases

You can also remove an alias using the _alias endpoint. For example, sending the following command will remove the client alias from the data index:

curl -XDELETE 'localhost:9200/data/_alias/client'


Filtering aliases

Aliases can be used in a way similar to how views are used in SQL databases. You can use the full Query DSL (discussed in detail in Chapter 3, Searching Your Data) and have your filter applied to all count, search, delete by query requests, and so on.

Let’s look at an example. Imagine that we want to have an alias that returns data for a certain client so we can use it in our application. Let’s say that the client identifier we are interested in is stored in the clientId field and we are interested in the 12345 client. So, let’s create the alias named client with our data index, which will apply a query for clientId automatically:

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions" : [
    {
      "add" : {
        "index" : "data",
        "alias" : "client",
        "filter" : { "term" : { "clientId" : 12345 } }
      }
    }
  ]
}'

So when using the defined alias, you will always get your requests filtered by a term query that ensures that all the returned documents have the 12345 value in the clientId field.

Aliases and routing

In the Introduction to routing section of Chapter 2, Indexing Your Data, we talked about routing. Similar to aliases that use filtering, we can add routing values to the aliases. Imagine that we are using routing on the basis of user identifier and we want to use the same routing values with our aliases. So, for the alias named client, we will use the routing values of 12345, 12346, and 12347 for querying, and only 12345 for indexing. To do this, we will create an alias using the following command:

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions" : [
    {
      "add" : {
        "index" : "data",
        "alias" : "client",
        "search_routing" : "12345,12346,12347",
        "index_routing" : "12345"
      }
    }
  ]
}'

This way, when we index our data using the client alias, the values specified by the index_routing property will be used. At the time of querying, the values specified by the search_routing property will be used.

There is one more thing. Please look at the following query sent to the previously defined alias:

curl -XGET 'localhost:9200/client/_search?q=test&routing=99999,12345'

The value used as a routing value will be 12345. This is because Elasticsearch will take the common values of the search_routing attribute and the query routing parameter, which in our case is 12345.
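The "common values" rule described above can be illustrated with a small shell sketch. It only mimics the intersection on the client side so that we can predict which routing values a request will end up with; the routing_intersection function name is our own and is not part of Elasticsearch.

```shell
# Print the routing values that appear in both comma-separated lists,
# keeping the order of the first list (the alias search_routing).
routing_intersection() {
  local alias_routing="$1" query_routing="$2"
  local result="" value
  local IFS=','
  for value in $alias_routing; do
    # Wrap the query list in commas so that only whole values match.
    case ",$query_routing," in
      *",$value,"*) result="${result:+$result,}$value" ;;
    esac
  done
  echo "$result"
}

# Alias search_routing versus the routing parameter from the query above.
routing_intersection "12345,12346,12347" "99999,12345"   # prints 12345
```

An empty result means the two lists share no value, so it is worth checking the routing values an application sends against the ones defined on the alias.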

Zero downtime reindexing and aliases

One of the greatest advantages of using aliases is the ability to re-index the data without any downtime of the system using Elasticsearch. To achieve this, you would need to interact with your indices only through aliases, both for indexing and querying. In such a case, you can just create a new index, index the data there, and switch the aliases when needed. During indexing, the aliases would still point to the old index, so the application could work as usual.
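The alias switch itself can be sent as a single _aliases request that removes the alias from the old index and adds it to the new one. The following is a minimal sketch that only builds such a request body; the alias_swap_payload function and the library, library_v1, and library_v2 names are hypothetical and used for illustration only.

```shell
# Build the body of an _aliases request that moves an alias
# from the old index to the new one in a single call.
alias_swap_payload() {
  local alias_name="$1" old_index="$2" new_index="$3"
  printf '{ "actions" : [ { "remove" : { "index" : "%s", "alias" : "%s" } }, { "add" : { "index" : "%s", "alias" : "%s" } } ] }\n' \
    "$old_index" "$alias_name" "$new_index" "$alias_name"
}

# Print the payload for moving the library alias to the re-indexed data.
alias_swap_payload library library_v1 library_v2

# Assumed usage against a local node, once the new index is fully built:
# curl -XPOST 'localhost:9200/_aliases' -d "$(alias_swap_payload library library_v1 library_v2)"
```

Because both actions travel in one request, the application using the alias should not see a moment in which the alias points to no index at all.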

Summary

In this chapter, we discussed Elasticsearch administration. We started by learning how to perform backups of our indices and how to monitor our cluster health and state using its API. We controlled cluster shard rebalancing and learned how to adjust shard allocation according to our needs. We’ve used the CAT API to get information about Elasticsearch in human-readable form and we’ve warmed up our queries to make them faster. Finally, we’ve used aliases to allow a better management of our indices and to have more flexibility.

In the next and final chapter of the book, we will focus on a hypothetical online library store to see how to make Elasticsearch work in practice. We will start with a brief introduction and hardware considerations. We will tune a single instance of Elasticsearch and properly configure our cluster by discussing each of its parts and providing a proper architecture. We will vertically expand the cluster and prepare it for both high querying and high indexing load. Finally, we will learn how to monitor such a prepared cluster.

Chapter 11. Scaling by Example

In the previous chapter, we discussed Elasticsearch administration. We started with a discussion about backups and how we can perform them by using the available API. We monitored the health and state of our clusters and nodes and we learned how to control shard rebalancing. We controlled the shard and replica allocation and used the human-friendly Cat API to get information about the cluster and nodes. We saw how to use warmers to speed up potentially heavy queries and we used index aliasing to manage our indices more easily. By the end of this chapter, you will have learned the following topics:

Hardware preparations for running Elasticsearch
Tuning a single Elasticsearch node
Preparing highly available and fault tolerant clusters
Expanding Elasticsearch vertically
Preparing Elasticsearch for high query and indexing throughput
Monitoring Elasticsearch

Hardware

One of the first decisions that we need to make when starting every serious software project is a set of choices related to hardware. And believe us, this is not only a very important choice, but also one of the most difficult ones. Often the decisions are made at early project stages, when only the basic architecture is known and we don’t have precise information regarding the queries, data load, and so on. The project architect has to balance precaution and the projected cost of the whole solution. Too many times it is an intersection of experience and clairvoyance, which can lead to either great or terrible results.

Physical servers or a cloud

Let’s start with a decision: a cloud, virtual, or physical machines. Nowadays, these are all valid options, but it was not always the case. Some time ago the only option was to buy new servers for each part of the environment or share resources with the other applications on the same machine. The second option makes perfect sense as it is more cost-effective, but it introduces risk. Problems with one application, especially when they are hardware related, will result in problems for another application. You can imagine one of your applications using most of the I/O subsystem of the physical machine and all the other applications struggling with lots of I/O waits and performance problems because of that. Virtualization promises application separation and a more convenient way of managing resources, but you are still limited by the underlying hardware. Every unexpected traffic spike could be a problem and affect service availability. Imagine that your ecommerce site suddenly gains a massive number of customers. Instead of being glad that the spike appeared and you have more potential customers, you search for a place where you can buy additional hardware that will be supplied as soon as possible.

Cloud computing, on the other hand, means a more flexible cost model. We can easily add new machines whenever we need. We can add them temporarily when we expect a greater load (for example, before Christmas for an ecommerce site) and pay only for the processing power actually used. It is just a few clicks in the admin panel. Even more, we can also set up automatic scaling, that is, new virtual machines can appear automatically when we need them. Cloud-based software can also shut them down when we do not need them anymore. The cloud has many advantages, such as lower initial cost, the ability to easily grow your business, and insensitivity to temporal fluctuations of resource requirements, but it also has several flaws. The costs of cloud servers rise faster than those of physical machines. Also, mass storage, although practically unlimited, has worse characteristics (number of operations per second) than physical servers. This is sometimes a great problem for us, especially with disk-based storage systems such as Elasticsearch.

In practice, as usual, the choice can be hard but going through a few points can help you with your decision:

Business requirements may directly point to your own servers; for example, some procedures related to financial or medical data automatically exclude cloud solutions hosted by third-party vendors
For proof of concept and low/medium load services, the cloud can be a good choice because of simplicity, scalability, and low cost
Solutions with strong requirements connected with I/O subsystems will probably work better on bare metal machines where you have greater influence over what storage type is available to you
When the traffic can greatly change within a short time, the cloud is a perfect place for you

For the purpose of further discussion, let’s assume that we want to buy our own servers. We are in the computer store now and let’s buy something!

CPU

In most cases, this is the least important part. You can choose any modern CPU model, but you should know that a higher number of cores means a higher number of concurrent queries and indexing threads. That will let you index data faster, especially with complicated analysis and lots of merges.

RAM memory

More gigabytes of RAM is always better than fewer gigabytes of RAM. Memory is necessary, especially for aggregations and sorting. It is less of a problem now, with Elasticsearch 2.0 and doc values, but complicated queries with lots of aggregations still require memory to process the data. Memory is also used for indexing buffers and can lead to indexing speed improvements, because more data can be buffered in memory and thus disks will be used less frequently. If you try to use more memory than available, the operating system will use the hard disks as temporary space (it starts swapping) and you should avoid this at all cost. Note that you should never try to force Elasticsearch to use as much memory as possible. The first reason is the Java garbage collector: less memory is more GC friendly. The second reason is that the unused memory is actually used by the operating system for buffers and disk cache. In fact, when your index can fit in this space, all data is read from these caches and not from the disks directly. This can drastically improve the performance. By default, Elasticsearch and the I/O subsystem share the same I/O cache, which gives another reason to leave even more memory for the operating system itself.

In practice, 8 GB is the lowest requirement for memory. It does not mean that Elasticsearch will never work with less memory, but for most situations and data intensive applications, it is the reasonable minimum. On the other hand, more than 64 GB is rarely needed. Instead of assigning such amounts of memory to a single Elasticsearch node, think about scaling the system horizontally.

Mass storage

We said that we are in a good situation when the whole index fits into memory. In practice this can be difficult to achieve, so good and fast disks are very important. It is even more important if one of the requirements is high indexing throughput. In such a case, you may consider fast SSD disks. Unfortunately, these disks are expensive if your data volume is big. You can improve the situation by avoiding RAID (see https://en.wikipedia.org/wiki/RAID), except RAID 0. In most cases, when you handle fault tolerance by having multiple servers, the additional level of security on the RAID level is unnecessary. The last thing is to avoid using external storage, such as network attached storage (NAS) or NFS volumes. The network latency in such cases usually kills all the advantages of these solutions.

The network

When you use an Elasticsearch cluster, each node opens several connections to other nodes for various uses. When you index, the data is forwarded to different shards and replicas. When you query for data, the node used for querying can run multiple partial queries to the other nodes and compose the reply from the data fetched from the other nodes. This is why you should make sure that your network is not the bottleneck. In practice, use one network for all the servers in the cluster and avoid solutions in which the nodes in the cluster are spread between data centers.

How many servers

The answer is always the same: it depends. It depends on many factors: the number of requests per second, the data volume, the level of the queries’ complexity, the aggregations and sorting usage, the number of new documents per unit of time, how fast new data should be available for searching (the refresh time), the average document size, and the analyzers used. In practice, the handiest answer is: test it and approximate.

The one thing that is often underestimated is data security. When you think about fault tolerance and availability, you should start from three servers. Why? We talked about the split brain situation in the Master election configuration section of Chapter 9, Elasticsearch Cluster in Detail. Starting from three servers we are able to handle a single Elasticsearch node failure without taking down the whole cluster.
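To see why three servers is the starting point, we can compute the majority (quorum) of master eligible nodes, which is the value usually chosen for the discovery.zen.minimum_master_nodes setting to avoid split brain. The following is a small sketch; the quorum and tolerated_failures function names are our own.

```shell
# Majority of master eligible nodes: more than half of them.
quorum() {
  local master_eligible_nodes="$1"
  echo $(( master_eligible_nodes / 2 + 1 ))
}

# How many master eligible nodes can fail while a majority still exists.
tolerated_failures() {
  local master_eligible_nodes="$1"
  echo $(( master_eligible_nodes - (master_eligible_nodes / 2 + 1) ))
}

quorum 2               # prints 2
tolerated_failures 2   # prints 0
quorum 3               # prints 2
tolerated_failures 3   # prints 1
```

With two servers the majority is both of them, so losing either one leaves the cluster without a quorum; with three servers a single failure can be survived.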

Cost cutting

You did some tests, considered carefully planned functionalities, estimated volumes and load, and went to the project owner with an architecture draft. “It’s too expensive”, he said and asked you to think about servers once again. What can we do?

Let’s think about server roles and try to introduce some differences between them. If one of the requirements is indexing massive amounts of data connected with time (maybe logs), a possible way is to have two groups of servers: hot nodes, where new data arrives, and cold nodes, where old data is moved. Thanks to this approach, hot nodes may have faster but smaller disks (that is, solid state drives) in contrast to the cold nodes, where fast disks are not so important but space is. You can also divide your architecture into several groups such as master servers (less powerful, with relatively small disks), data nodes (bigger disks), and query aggregator nodes (more RAM). We will talk about this in the following sections.

Preparing a single Elasticsearch node

When we talk about vertical scaling, we often mean adding more resources to the server Elasticsearch is running on. We can add memory or we can switch to a machine with a better CPU or faster disk storage. Of course, with better machines we can expect an increase in performance; depending on our deployment and its bottlenecks, it can be a small or large improvement. However, there are limitations when it comes to vertical scaling. For example, one of the limitations is the maximum amount of physical memory available for your servers or the total memory required by the JVM to operate. When having large data and complicated queries, you can very soon run into memory issues and adding new memory may not help at all. In this section, we will try to give you general advice on where to look and what to tune when it comes to a single Elasticsearch node.

The thing to remember when tuning your system is to have performance tests that can be repeated under the same circumstances. Once you make a change, you need to be able to see how it affects the overall performance. In addition to that, Elasticsearch scales well. Using that knowledge, we can run performance tests on a single machine (or a few of them) and extrapolate the results. Such observations may be a good starting point for further tuning.

Also keep in mind that this section doesn’t contain a deep dive into all performance related topics, but is dedicated to showing you the most common things.

The general preparations

Apart from all the things we will discuss in this section, there are three major, operating system related things you need to remember: the number of allowed file descriptors, the virtual memory, and avoiding swapping.

Note that the following section contains information for Linux operating systems, but you can also achieve similar options on Microsoft Windows.

Avoiding swapping

Let’s start with the third one. Elasticsearch and Java Virtual Machine based applications, in general, don’t like to be swapped. This means that these applications work best if the operating system doesn’t put the memory that they use in the swap space. The reason is simple: to access the swapped memory, the operating system has to read it from the disk, which is slow and affects the performance in a very bad way.

If we have enough memory, and we should have if we want our Elasticsearch instance to perform well, we can configure Elasticsearch to avoid swapping. To do that, we just need to modify the elasticsearch.yml file and include the following property:

bootstrap.mlockall: true

This is one of the options. The second one is to set the vm.swappiness property in the /etc/sysctl.conf file to 0 (for complete swap disabling) or 1 for swapping only in emergency (for kernel versions 3.5 and above).

The third option is to disable swapping by editing /etc/fstab and removing the lines that contain the swap word. The following is an example /etc/fstab content:

LABEL=cloudimg-rootfs / ext4 defaults,discard 0 0
/dev/xvdb swap swap defaults 0 0

To disable swapping we would just remove the second line from the above contents. We could also run the following command to disable swapping:

sudo swapoff -a

However, remember that this effect won’t persist across a reboot, so this is only a temporary solution.

Also, remember that if you don’t have enough memory to run Elasticsearch, the operating system will just kill the process when swapping is disabled.

File descriptors

Make sure the file descriptor limits are high enough for the user running Elasticsearch (when installing from official packages, that user will be called elasticsearch). If they aren’t, you may end up with problems when Elasticsearch tries to flush the data and create new segments or merge segments together, which can result in index corruption.

To adjust the number of allowed file descriptors, you will need to adjust the /etc/security/limits.conf file (at least on most common Linux systems) and adjust or add an entry related to a given user (for both soft and hard limits). For example:

elasticsearch soft nofile 65536
elasticsearch hard nofile 65536

It is advised to set the number of allowed file descriptors to at least 65536, but even more can be needed, depending on your index size.
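A quick way to verify the limit is to compare the value reported by ulimit -n with the recommended minimum. The following sketch assumes it is run as the user that starts Elasticsearch; the nofile_ok function name is our own.

```shell
# Succeed when the given open files limit is at least the required value
# (65536 by default) or is reported as unlimited.
nofile_ok() {
  local current="$1" required="${2:-65536}"
  [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]
}

# Check the limit of the current shell session.
if nofile_ok "$(ulimit -n)"; then
  echo "file descriptor limit is sufficient"
else
  echo "file descriptor limit is too low: $(ulimit -n)"
fi
```

Keep in mind that the limit seen by an interactive shell may differ from the one applied to a service started by the init system, so this is only a first check.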

On some Linux systems, you may also need to load an appropriate limits module for the preceding setting to take effect. To load that module, you need to adjust the /etc/pam.d/login file and add or uncomment the following line:

session required pam_limits.so

There is also a possibility to display the number of file descriptors available for Elasticsearch by adding the -Des.max-open-files=true parameter to Elasticsearch startup parameters. For example, like this:

bin/elasticsearch -Des.max-open-files=true

When doing that, Elasticsearch will include information about the file descriptors in the logs:

[2015-12-20 00:22:19,869][INFO ][bootstrap] max_open_files [10240]

Virtual memory

Elasticsearch 2.2 uses a hybrid directory implementation, which is a combination of mmapfs and niofs directories. Because of that, especially when your indices are large, you may need a lot of virtual memory on your system. By default, the operating system limits the amount of memory mapped files and that can cause errors when running Elasticsearch. Because of that, we recommend increasing the default values. To do that, you just need to edit the /etc/sysctl.conf file and set the vm.max_map_count property; for example, to a value equal to 262144.

You can also change the value temporarily by running the following command:

sysctl -w vm.max_map_count=262144

The memory

Before thinking about Elasticsearch configuration related things, we should remember to give enough memory to Elasticsearch. In general, we shouldn’t give more than 50-60 percent of the total available memory to the JVM process running Elasticsearch. We do that because we want to leave memory for the operating system and for the operating system I/O cache. However, we need to remember that the 50-60 percent figure is not always right. You can imagine having nodes with 256 GB of RAM and having indices of 30 GB in total on such a node. In such circumstances, even assigning more than 60 percent of physical RAM to Elasticsearch would leave plenty of RAM for the operating system. It is also a good idea to set the Xmx and Xms properties to the same values to avoid JVM heap resizing.

Another thing to remember is the so-called compressed oops (http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html#compressedOop), the ordinary object pointers. The Java virtual machine can be told to use them by adding the -XX:+UseCompressedOops switch. This allows the Java virtual machine to use less memory to address the objects on the heap. However, this is only true for heap sizes less than or equal to 31 GB. Going for a larger heap means no compressed oops and higher memory usage for addressing the objects on the heap.
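The two rules above (about half of the physical memory, but no more than 31 GB) can be combined into a simple starting point for the heap size. The following is only a sketch of that rule of thumb; the suggested_heap_mb function name is our own and the 50 percent figure is the lower bound of the range mentioned earlier.

```shell
# Suggest a heap size in megabytes: half of the physical memory,
# capped at 31 GB so that compressed oops can still be used.
suggested_heap_mb() {
  local total_ram_mb="$1"
  local half=$(( total_ram_mb / 2 ))
  local cap=$(( 31 * 1024 ))
  if [ "$half" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$half"
  fi
}

suggested_heap_mb 16384    # 16 GB of RAM, prints 8192
suggested_heap_mb 131072   # 128 GB of RAM, prints 31744

# Assumed usage with Elasticsearch 2.x, where the ES_HEAP_SIZE variable
# sets both Xms and Xmx to the same value:
# ES_HEAP_SIZE="$(suggested_heap_mb 16384)m" bin/elasticsearch
```

Treat the result as a first guess to be verified with repeatable performance tests, not as a final value.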

Field data cache and breaking the circuit

As we know, by default the field data cache in Elasticsearch is unbounded. This can be very dangerous, especially when you are using aggregations and sorting on many fields that are analyzed, because they don’t use doc values by default. If those fields are high cardinality ones, then you can run into even more trouble. By trouble we mean running out of memory.

We have two different factors we can tune to be sure that we don’t run into out of memory errors. First of all, we can limit the size of the field data cache and we should do that. The second thing is the circuit breaker, which we can easily configure to just throw exceptions instead of loading too much data. Combining these two things together will ensure that we don’t run into memory issues.
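As an illustration of the second factor, the field data circuit breaker limit can be changed through the cluster settings API. The sketch below only builds the request body; the breaker_settings_payload function name is our own, the indices.breaker.fielddata.limit setting name is taken from the Elasticsearch 2.x documentation rather than from this chapter, and the 40% value is just an example.

```shell
# Build a cluster settings body that sets the field data circuit
# breaker limit (a percentage of the heap, for example 40%).
breaker_settings_payload() {
  local limit="$1"
  printf '{ "transient" : { "indices.breaker.fielddata.limit" : "%s" } }\n' "$limit"
}

breaker_settings_payload "40%"

# Assumed usage against a local node:
# curl -XPUT 'localhost:9200/_cluster/settings' -d "$(breaker_settings_payload 40%)"
#
# The cache size itself is a node level setting, so (as an assumption based
# on the 2.x documentation) it would go to elasticsearch.yml, for example:
# indices.fielddata.cache.size: 30%
```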

However, we should also remember that Elasticsearch will evict data from the field data cache if its size is not enough to handle aggregation requests or sorting. This will affect the query performance because loading the field data information is not very efficient and is resource intensive. However, in our opinion, it is better to have our queries slower than having our cluster blown up because of out of memory errors.

The field data cache and caches in general were discussed in the Elasticsearch caches section of Chapter 9, Elasticsearch Cluster in Detail.

Use doc values

Whenever you plan to use sorting, aggregations, or scripting heavily, you should use doc values wherever you can. This will not only save you the memory needed for the field data cache; because fewer objects are produced, it will also make the Java virtual machine work better with lower garbage collection time. Doc values were discussed in the Mappings Configuration section of Chapter 2, Indexing Your Data.

RAM buffer for indexing

In the Elasticsearch caches section of Chapter 9, Elasticsearch Cluster in Detail, we also discussed the indexing buffer. There are a few things we would like to mention. First of all, the more RAM for the indexing buffer, the more documents Elasticsearch will be able to hold in memory. So the more memory we have for indexing, the less often the flush to disk will happen and fewer segments will be created. Because of that, your indexing will be faster. But of course, we don’t want Elasticsearch to occupy 100 percent of the available memory. Keep in mind that the RAM buffers are set per shard, so the amount of memory that will be used depends on the number of shards and replicas that are assigned to the given node and on the number of documents you index. You should set the upper limits so your node doesn’t blow up when it has multiple shards assigned.

Index refresh rate

Elasticsearch uses Lucene, as we know by now. The thing with Lucene is that the view of the index is not refreshed when new data is indexed or segments are created. To see the newly indexed data, we need to refresh the index. By default, Elasticsearch does that once every second and the period of refresh is controlled by using the index.refresh_interval property, specified per index. The shorter the refresh interval, the sooner the documents will be visible for search operations. However, that also means that Elasticsearch will need to put more resources into refreshing the index view, meaning that the indexing and searching operations will be slower. The longer the refresh interval, the more time you will have to wait before being able to see the data in the search results, but your indexing and querying will be faster.
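Since index.refresh_interval is specified per index, it can be changed with the index settings API. The following sketch only builds the request body; the refresh_interval_payload function name is our own and the 30s value is just an example of trading visibility delay for throughput.

```shell
# Build an index settings body that changes the refresh interval.
refresh_interval_payload() {
  local interval="$1"
  printf '{ "index" : { "refresh_interval" : "%s" } }\n' "$interval"
}

refresh_interval_payload "30s"

# Assumed usage against a local node and the library index:
# curl -XPUT 'localhost:9200/library/_settings' -d "$(refresh_interval_payload 30s)"
```

A value of -1 turns periodic refreshing off, which can be handy during a large initial import as long as the interval is restored afterwards.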

Thread pools

We haven’t talked about thread pools until now, but we would like to mention them now. Each Elasticsearch node holds several thread pools that control the execution queues for operations such as indexing or querying. Elasticsearch uses several pools to allow control over how the threads are handled and how much memory consumption is allowed for user requests.

Note

The Java virtual machine allows applications to use multiple threads, that is, to run multiple application tasks concurrently. For more information about Java threads, refer to http://docs.oracle.com/javase/7/docs/api/java/lang/Thread.html.

There are many thread pools (we can specify the type we are configuring by specifying the type property). However, for performance, the most important are:

generic: This is the thread pool for generic operations, such as node discovery. By default, the generic thread pool is of type cached.
index: This is the thread pool used for indexing and deleting operations. Its type defaults to fixed, its size to the number of available processors, and the size of the queue to 200.
search: This is the thread pool used for search and count requests. Its type defaults to fixed and its size to the number of available processors multiplied by 3 and divided by 2, with the size of the queue defaulting to 1000.
suggest: This is the thread pool used for suggest requests. Its type defaults to fixed, its size to the number of available processors, and the size of the queue to 1000.
get: This is the thread pool used for real time get requests. Its type defaults to fixed, its size to the number of available processors, and the size of the queue to 1000.
bulk: As you can guess, this is the thread pool used for bulk operations. Its type defaults to fixed, its size to the number of available processors, and the size of the queue to 50.
percolate: This is the thread pool for percolation requests. Its type defaults to fixed, its size to the number of available processors, and the size of the queue to 1000.

Note

Before Elasticsearch 2.1, we could control the type of the thread pool. Starting with Elasticsearch 2.1 we can no longer do that. For more information please refer to the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/2.1/breaking_21_removed_features.html

For example, if we want to configure the thread pool for indexing operations to have a size of 100 and a queue of 500, we will set the following in the elasticsearch.yml configuration file:

threadpool.index.size: 100
threadpool.index.queue_size: 500

Also remember that the thread pool configuration can be updated using the cluster update API. For example, like this:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient" : {
    "threadpool.index.size" : 100,
    "threadpool.index.queue_size" : 500
  }
}'

In general, you don’t need to work with the thread pools and their configuration. However, when configuring your cluster, you may want to put more emphasis on indexing or querying and, in such cases, giving more threads or larger queues to the prioritized operation may result in more resources being used for such operations.

Horizontal expansion

Elasticsearch is a highly scalable search and analytics platform. We can scale it both horizontally and vertically. We discussed how to tune a single node in the Preparing a single Elasticsearch node section earlier in this chapter and we would like to focus on horizontal scaling now: how to handle multiple nodes in the same cluster, what roles they should have, and how to tune the configuration to have a highly reliable, available, and fault tolerant cluster.

You can imagine vertical scaling like building a skyscraper: we have limited space available and we need to go as high as we can. Of course, that is expensive and requires a lot of engineering done right. On the other hand, we have horizontal scaling, which is like having many houses in a residential area. Instead of investing into hardware and having powerful machines, we choose to have multiple machines and our data split between them. Horizontal scaling gives us virtually unlimited scaling possibilities. Even with the most powerful hardware, the time comes when a single machine is not enough to handle the data, the queries, or both of them. In such cases, spreading the data among multiple servers is what saves us and allows us to have terabytes of data in multiple indices spread across the whole cluster, just like the one in the following image:

We have our four-node cluster with the library index created and built of four shards.

If we want to increase the querying capabilities of our cluster, we can just add additional nodes, for example, four of them. After adding new nodes to the cluster, we can either create new indices that will be built of more shards to spread the load more evenly or add replicas to the already existing shards. Both options are viable. The first one requires creating new indices because we don’t have the possibility of splitting shards or adding more primary shards to an existing index. We should go for having more primary shards when our hardware is not enough to handle the amount of data it holds. In such cases, we usually run into out of memory situations, long shard query execution time, swapping, or high I/O waits. The second option, that is having replicas, is the way to go when our hardware is happily handling the data we have but the traffic is so high that the nodes just can’t keep up.

The first option is simple, but let’s look at the second case: having more replicas. So with four additional nodes, our cluster would look as follows:

Now, let’s run the following command to add a single replica:

curl -XPUT 'localhost:9200/library/_settings' -d '{
  "index" : {
    "number_of_replicas" : 1
  }
}'

Our cluster view would look more or less as follows:

As you can see, each of the initial shards building the library index has a single replica stored on another node. The nice thing about shards and their replicas is that Elasticsearch is smart enough to balance the shards in a single index and put them on separate nodes. For example, you won’t ever end up in a situation where you have a shard and its replicas on the same node. Also, Elasticsearch is able to round robin the queries between the shards and their replicas, which means that all the nodes will be hit by the queries and we don’t have to care about that. Because of that, we are able to handle almost double the query load compared to our initial deployment.
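The arithmetic behind this layout is simple enough to sketch: every primary shard plus its replicas is one copy that has to live on some node. The function names below are our own and the calculation assumes that the shards are spread evenly.

```shell
# Total number of shard copies (primaries and replicas) of an index.
total_shard_copies() {
  local primaries="$1" replicas="$2"
  echo $(( primaries * (1 + replicas) ))
}

# Highest number of copies a single node holds when the copies
# are spread evenly (ceiling division).
copies_per_node() {
  local primaries="$1" replicas="$2" nodes="$3"
  local total=$(( primaries * (1 + replicas) ))
  echo $(( (total + nodes - 1) / nodes ))
}

total_shard_copies 4 1   # library index with one replica, prints 8
copies_per_node 4 1 8    # eight nodes, prints 1
copies_per_node 4 1 4    # the original four nodes, prints 2
```

With eight nodes and eight shard copies, every node holds exactly one copy of the library index, which is why the additional nodes can take over a share of the queries.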

Automatically creating the replicas

Let’s stay a bit longer around replicas. Elasticsearch allows us to automatically expand replicas when the cluster is big enough. This means that the replicas can be created automatically when new nodes are added to the cluster. You may wonder where such functionality can be useful. Imagine a situation where you have a small index that you would like to be present on every node so that your plugins don’t have to run distributed queries just to get the data from it. In addition to that, your cluster is dynamically changing, that is, you add and remove nodes from it. The simplest way to achieve such functionality is to allow Elasticsearch to automatically expand the replicas. To do that, we need to set index.auto_expand_replicas to 0-all, which means that the index can have 0 replicas or be present on all the nodes. So if our small index is called shops and we would like Elasticsearch to automatically expand its replicas to all the nodes in the cluster, we would use the following command to create the index:

curl -XPOST 'localhost:9200/shops/' -d '{
  "settings": {
    "index": {
      "auto_expand_replicas": "0-all"
    }
  }
}'

We can also update the settings of that index if it is already created by running the following command:

curl -XPUT 'localhost:9200/shops/_settings' -d '{
  "index": {
    "auto_expand_replicas": "0-all"
  }
}'
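To make the 0-all value more tangible, the following Python sketch models how the replica count could follow the number of data nodes. It is a simplified illustration and not the code Elasticsearch runs; the function name expanded_replicas and the clamping rule (never fewer than the lower bound, never more than one copy per remaining node) are our assumptions based on the description above.

```python
def expanded_replicas(setting: str, data_nodes: int) -> int:
    """Simplified model of index.auto_expand_replicas (illustrative only)."""
    low_text, high_text = setting.split("-", 1)
    low = int(low_text)
    # "all" means: as many replicas as there are other data nodes.
    high = data_nodes - 1 if high_text == "all" else int(high_text)
    # One primary plus N replicas fit on N + 1 nodes, so aim for nodes - 1.
    wanted = data_nodes - 1
    return max(low, min(wanted, high))


for nodes in (1, 2, 5):
    print(nodes, "nodes ->", expanded_replicas("0-all", nodes), "replicas")
```

With a bounded range such as 0-2, the same model stops adding replicas once two exist, no matter how many nodes join the cluster.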

Redundancy and high availability

The Elasticsearch replication mechanism not only gives us the ability to handle higher query throughput, but also gives us redundancy and high availability. Imagine an Elasticsearch cluster hosting a single index called library that is built of 2 shards and 0 replicas. Such a cluster would look as follows:

Now what happens when one of the nodes fails? The simplest answer is that we lose about 50 percent of the data and, if the failure is fatal, we lose that data forever. Even with backups, we would need to spin up another node and restore the backup, and that takes time. During that time, your application, or the parts of it that are based on Elasticsearch, can’t work correctly. If your business relies on Elasticsearch, downtime means losing money. Of course, we can use replicas to create more reliable clusters that can handle hardware and software failures. And one thing to remember is that everything will fail eventually: if the software won’t, the hardware will. For example, some time ago Google said that in each of their clusters, during the first year at least 1000 machines will fail (you can read more on that topic at http://www.cnet.com/news/google-spotlights-data-center-inner-workings/). Because of that, we need to be ready to handle such cases.

Let’s look at the same cluster but with one replica:

Now losing a single Elasticsearch node means that we still have the whole data available and we can work on restoring the full cluster structure without downtime. Of course, this is only a very small cluster built of two Elasticsearch nodes. The larger the cluster and the more replicas, the more failures you will be able to handle without worrying about data loss. Of course you will have lower performance, depending on the percentage of nodes that fail, but the data will still be there and the cluster will be operational.

That’s why, when designing your architecture and deciding on the number of nodes and indices and their architecture, you should take into consideration how many node failures you want to live with. Of course, you can’t forget about the performance part of the equation, but redundancy and high availability should be one of the factors of the scaling equation.
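The relation between the number of replicas and the number of node failures we can live with can be written down as a small Python sketch. It is only a rough model: the function name is hypothetical, and it assumes that copies of the same shard always end up on different nodes, as described earlier.

```python
def tolerated_node_failures(replicas: int, nodes: int) -> int:
    """How many nodes can be lost while every shard keeps at least one copy.

    Assumes copies of the same shard always live on different nodes.
    """
    copies_per_shard = 1 + replicas
    # Copies beyond the node count cannot be allocated, so they do not help.
    usable_copies = min(copies_per_shard, nodes)
    return usable_copies - 1


# The two node library cluster from the text, without and with one replica.
print(tolerated_node_failures(replicas=0, nodes=2))
print(tolerated_node_failures(replicas=1, nodes=2))
```

The model only speaks about data loss; how much query and indexing capacity remains after a failure still has to be measured.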

Cost and performance flexibility

The default distributed nature of Elasticsearch and its ability to scale horizontally allow us to be flexible when it comes to the performance and the costs that we have when running our environment. First of all, high end servers with high performance disks, numerous CPU cores, and a lot of RAM are still expensive. In addition to that, cloud computing is getting more and more popular and if you need a lot of flexibility and don’t want to have your own hardware, you can choose solutions such as Amazon (http://aws.amazon.com/), Rackspace (http://www.rackspace.com/), DigitalOcean (https://www.digitalocean.com/), and so on. They not only allow us to run our software on rented machines, but also allow us to scale on demand. We just need to add more machines, which is a few clicks away or can even be automated with some degree of work.

Using a hosted solution with one click machine renting allows us to have a truly horizontally scalable solution. Of course, that’s not cheap: you pay for the flexibility. But we can easily sacrifice performance if costs are the most crucial factor in our business plan. Of course, we can also go the other way. If we can afford large bare metal machines, Elasticsearch clusters can be pushed to hundreds of terabytes of data stored in the indices and still get decent performance (of course with proper hardware and when properly distributed).

Continuous upgrades

High availability, cost and performance flexibility, and virtually endless growth are not the only things worth talking about when discussing the scalability side of Elasticsearch. At some point in time, you will want to have your Elasticsearch cluster upgraded to a new version. It can be because of bug fixes, performance improvements, new features, or anything that you can think of. The thing is that when you have a single instance of each shard, without replicas, an upgrade means unavailability of Elasticsearch (or at least its parts) and that may mean downtime of the applications that use Elasticsearch. This is another reason why horizontal scaling is so important; you can perform upgrades without downtime, at least to the extent that software such as Elasticsearch supports it. For example, you can take Elasticsearch 2.0 and upgrade to Elasticsearch 2.1 with only rolling restarts (getting one node out of the cluster, upgrading it, bringing it back, and continuing with the next node until all the nodes are done), thus having all the data still available for searching, with indexing happening at the same time.

Multiple Elasticsearch instances on a single physical machine

Having a large physical machine with a lot of memory and CPU cores has advantages and some challenges. First of all, if you decide to run a single Elasticsearch node on that machine, you will sooner or later run into garbage collection issues, you will have lots of shards on a single node, which will require a high number of I/O operations for the internal Elasticsearch communication (retrieving cluster statistics), and so on. What’s more, you usually shouldn’t go above 31 GB of heap memory for a single JVM process, because above that size the JVM can’t use compressed ordinary object pointers (https://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html).

In such cases, you can either run multiple Elasticsearch instances on the same bare metal machine, run multiple virtual machines and a single Elasticsearch inside each one, or run Elasticsearch in a container, such as Docker (http://www.docker.com/). This is out of the scope of the book, but, because we are talking about scaling, we thought it may be a good thing to mention what can be done in such cases.

Note

There is also the possibility of running multiple Elasticsearch servers on a single physical machine without running multiple virtual machines. Which road to take, virtual machines or multiple instances, is really your choice. However, we like to keep things separate and because of that we usually go for dividing any large server into multiple virtual machines. When dividing one large server into multiple smaller virtual machines, remember that the I/O subsystem will be shared across those smaller virtual machines. Because of that, it may be good to wisely divide the disks between the virtual machines.

Preventing a shard and its replicas from being on the same node

There is one additional thing worth mentioning. When you have multiple physical servers divided into virtual machines, it is crucial to ensure that the shard and its replica don’t end up on the same physical machine. By default, Elasticsearch is smart enough to not put the shard and its replica on the same Elasticsearch instance, but it doesn’t know anything about bare metal machines, so we need to tell it. We can tell Elasticsearch to separate the shards and replicas by using cluster allocation awareness. In our previous case, we had three physical servers. Let’s call them server1, server2, and server3.

Now for each Elasticsearch node on a physical server, we define the node.server_name property and we set it to the identifier of the server (the name of the property can be anything we want). So for example, for all Elasticsearch nodes on the first physical server, we would set the following property in the elasticsearch.yml configuration file:

node.server_name: server1

In addition to that, each Elasticsearch node (no matter on which physical server) needs to have the following property added to the elasticsearch.yml configuration file:

cluster.routing.allocation.awareness.attributes: server_name

It tells Elasticsearch not to put the primary shard and its replicas on the nodes with the same value in the node.server_name property. This is enough for us and Elasticsearch will take care of the rest.
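If you want to verify the effect of allocation awareness on your own layout, a check like the following Python sketch can help. It does not talk to Elasticsearch; the node names, the NODE_SERVER mapping, and the allocation dictionaries are made-up inputs that you would fill from your own cluster state.

```python
from collections import defaultdict

# Hypothetical layout: node name -> value of its node.server_name attribute.
NODE_SERVER = {
    "node1": "server1", "node2": "server1",
    "node3": "server2", "node4": "server3",
}


def colocated_shards(allocation, node_server):
    """Return the ids of shards whose copies share a physical server.

    allocation maps a shard id to the list of nodes holding its copies.
    """
    offenders = []
    for shard, nodes in allocation.items():
        copies_per_server = defaultdict(int)
        for node in nodes:
            copies_per_server[node_server[node]] += 1
        if any(count > 1 for count in copies_per_server.values()):
            offenders.append(shard)
    return sorted(offenders)


# Shard 0 has both copies on server1, which awareness should prevent.
bad = {0: ["node1", "node2"], 1: ["node3", "node4"]}
good = {0: ["node1", "node3"], 1: ["node2", "node4"]}
print(colocated_shards(bad, NODE_SERVER))
print(colocated_shards(good, NODE_SERVER))
```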

Designated node roles for larger clusters

There is one more thing that we want to discuss and emphasize. When it comes to large clusters, it is important to assign roles to all the nodes in the cluster. This allows for a truly fault tolerant and highly available Elasticsearch cluster. The roles we can assign to each Elasticsearch node are as follows:

Master eligible node
Data node
Query aggregator node

By default, each Elasticsearch node is master eligible (it can serve as a master node), can hold data, and can work as a query aggregator node. You may wonder why assigning roles is needed. Let us give you a simple example: if the master node is under a lot of stress, it may not be able to handle the cluster state related commands fast enough and the cluster could become unstable. This is only a single, simple example and you can think of numerous others.

Because of that, most Elasticsearch clusters that are larger than a few nodes usually look like the one presented in the following picture:

As you can see, our hypothetical cluster contains three client (query aggregator) nodes (because we know that there will be a lot of queries), a large number of data nodes because the amount of data will be large, and at least three master eligible nodes that shouldn’t be doing anything else. Why three master nodes when Elasticsearch will only use a single one at any given time? Again, because of redundancy and to be able to prevent split brain situations by setting discovery.zen.minimum_master_nodes to 2, which would allow us to easily handle the failure of a single master eligible node in the cluster.
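The value of 2 comes from the usual majority rule for master eligible nodes: half of them, rounded down, plus one. The following Python sketch shows the arithmetic; the function names are ours and only serve the illustration.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority of the master eligible nodes: floor(n / 2) + 1."""
    return master_eligible // 2 + 1


def tolerated_master_failures(master_eligible: int) -> int:
    """Master eligible nodes that may fail while a majority still exists."""
    return master_eligible - minimum_master_nodes(master_eligible)


for count in (2, 3, 5):
    print(count, minimum_master_nodes(count), tolerated_master_failures(count))
```

The sketch also shows why two master eligible nodes are not enough: the majority is then 2, so losing either node leaves the cluster without a master.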

Let us now give you snippets of the configuration for each type of node in our cluster. We already talked about that in the Understanding node discovery section in Chapter 9, Elasticsearch Cluster in Detail, but we would like to mention that once again.

Query aggregator nodes

The query aggregator nodes configuration is quite simple. To configure those, we just need to tell Elasticsearch that we don’t want those nodes to be master eligible or to hold data. This corresponds to the following configuration snippet in the elasticsearch.yml file:

node.master: false
node.data: false

Data nodes

Data nodes are also very simple to configure. We just need to say that they should not be master eligible. However, we are not big fans of default configurations (because they tend to change) and thus our Elasticsearch data nodes configuration looks as follows:

node.master: false
node.data: true

Master eligible nodes

We’ve left the master eligible nodes to the end of the general scaling section. Of course, such Elasticsearch nodes shouldn’t be allowed to hold data, but, in addition to that, it is a good practice to disable the HTTP protocol on such nodes. This is done to avoid accidentally querying those nodes. Master eligible nodes can use fewer resources than data and query aggregator nodes and because of that we should ensure that they are only used for master related purposes. So our configuration for master eligible nodes looks more or less as follows:

node.master: true
node.data: false
http.enabled: false

Preparing the cluster for high indexing and querying throughput

Until this chapter, we mostly talked about different functionalities of Elasticsearch in terms of handling queries, indexing data, and tuning. However, running a cluster in production is not only about using this great search engine, but also about preparing the cluster to handle both the indexing and querying load. Let’s now summarize the knowledge we have and see what things we need to care about when it comes to preparing the cluster for high indexing and querying throughput.

Indexing related advice

In this section, we will look at the indexing related advice around tuning Elasticsearch. In each production environment the data is different, the indexing rate is different, and the users’ behavior is different. Take that into consideration and run performance tests on your environment. This will give you the best idea about what to expect and what works best in the case of your system.

Index refresh rate

One of the general things you should pay attention to is the index refresh rate. We know that the refresh rate specifies how fast the documents will be visible for search operations. The equation is quite simple: the faster the refresh rate, the slower the queries will be and the lower the indexing throughput. If we can allow ourselves to have a slower refresh rate, that is, a longer refresh interval such as 10s or 30s, go for it. It will put less pressure on Elasticsearch, Lucene, and hardware in general. Remember that by default the refresh interval is set to 1s, which basically means that the index searcher object is reopened every second.

To give you a bit of insight into what performance gains we are talking about, we did some performance tests including Elasticsearch and different refresh rates. With the refresh interval of 1s we were able to index about 1000 documents per second using a single Elasticsearch node. Increasing the refresh interval to 5s gave us an increase in indexing throughput of more than 25 percent and we were able to index about 1250 documents per second. Setting the refresh interval to 25s gave us about 70 percent more throughput compared to the 1s refresh interval, which was about 1700 documents per second on the same infrastructure. It is also worth remembering that increasing the interval indefinitely doesn’t make much sense, because after a certain point (depending on your data load and the amount of data you have) the increase in performance is negligible.
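The percentages quoted above follow directly from the rounded throughput numbers, and the refresh interval itself is a regular index setting. The following Python sketch reproduces the arithmetic and builds the body one could send to the _settings endpoint of an index; the helper names are ours, and the measured values are simply the rounded figures from the text.

```python
import json


def throughput_gain_percent(baseline: float, measured: float) -> float:
    """Relative gain of measured over baseline, in percent."""
    return (measured - baseline) * 100.0 / baseline


def refresh_settings(interval: str) -> dict:
    """Body for an index settings update that changes the refresh interval."""
    return {"index": {"refresh_interval": interval}}


# Rounded documents-per-second figures from the tests described in the text.
results = {"1s": 1000, "5s": 1250, "25s": 1700}
for interval, rate in results.items():
    gain = throughput_gain_percent(results["1s"], rate)
    print(interval, rate, f"{gain:.0f}%")

print(json.dumps(refresh_settings("30s")))
```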

Some performance comparisons related to indexing throughput and index refresh rate can be found in the blog post at http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/.

Thread pools tuning

By default, Elasticsearch comes with very good defaults when it comes to all thread pools configuration. You should remember that tuning the default thread pools configuration should be done only when you really see that your nodes are filling up the queues and they still have processing power left that could be designated to the processing of the waiting operations, or when you want to increase the priority of one or more operations.

For example, if you did your performance tests and you saw your Elasticsearch instances not being 100 percent saturated, but on the other hand you experienced a rejected execution error, then that is the point when you should start adjusting the thread pools. You can either increase the number of threads that are allowed to be executed at the same time or increase the queue size. Of course, you should also remember that increasing the number of concurrently running threads to very high numbers will lead to many CPU context switches (http://en.wikipedia.org/wiki/Context_switch), which will result in a performance drop.

Automatic store throttling

Before Elasticsearch 2.0, we had to care about how our segment merge process was configured and how much disk I/O merging could use in general, but that changed. Right now Elasticsearch looks at how the I/O subsystem behaves and adjusts the throttling and merging process if the merges are falling behind the indexing. So, we no longer need to manually adjust throttling for disk based operations. You can read more about the related changes on GitHub at https://github.com/elastic/elasticsearch/pull/9243.

Handling time-based data

When you have time-based data, such as logs for example, the architecture of your indices plays a very important role. Let’s assume that we have logs indexed into Elasticsearch. These usually come in large numbers, are constantly indexed, and are time related (an event that is logged happened at a certain point in time). The assumption is that you have a certain retention period for your data, that is, a time during which you would like the data to be present and searchable in Elasticsearch. After that time, you just delete the data and forget about it.

With such assumptions in mind, you could just create a single index with a lot of shards and try to index large amounts of logs there. However, that’s not the perfect solution. First of all, because of merges: the larger the index gets, the more expensive the merges are. Elasticsearch needs to merge larger and larger segments and more I/O and CPU is required to handle them. This means slowdowns. In addition to that, deletes will be expensive because you will have to delete the data either by using TTL or by using the delete by query plugin; both are expensive in terms of performance and will cause even more merging. And this is not everything: during querying you will have to run through the whole index to get even the smallest slice of the data. So, are there better index architectures for time-based data?

Yes, one of the most common and best solutions is to use time based indices. Depending on the data volume, you can have daily, weekly, monthly, or even hourly indices. The downside is the number of shards you will have when the number of indices grows, but apart from that there are only pros: you can control each index, change the number of shards if that is needed, and have faster merging because the indices will be smaller compared to only one big index. What’s more, deleting data won’t be painful at all. The idea is to delete whole indices; for example, a day’s worth of data in the case of daily indices. Queries will also benefit: you can just run the query on a single time based index to narrow down the search results. Finally, Elasticsearch, by default, will create the indices for us. For example, when using daily indices, we can have names such as logs_2016-01-01, logs_2016-01-02, and so on.

The only thing we need to care about is providing the index name on the basis of the date and creating templates to configure each newly created index, and Elasticsearch will do the rest.
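Providing the index name on the basis of the date and finding the indices that fell out of the retention period are both easy to script. The following Python sketch shows one possible way; the helper names, the logs prefix handling, and the retention of two days are assumptions made for the example, and the actual deletion of the returned indices is left to your tooling.

```python
from datetime import date, timedelta


def daily_index(day: date, prefix: str = "logs") -> str:
    """Index name for one day, for example logs_2016-01-01."""
    return f"{prefix}_{day.isoformat()}"


def expired_indices(existing, today: date, keep_days: int, prefix: str = "logs"):
    """Names from existing whose day is older than the retention window."""
    oldest_kept = today - timedelta(days=keep_days - 1)
    expired = []
    for name in existing:
        if not name.startswith(prefix + "_"):
            continue  # not one of our time based indices
        day = date.fromisoformat(name[len(prefix) + 1:])
        if day < oldest_kept:
            expired.append(name)
    return sorted(expired)


existing = ["logs_2016-01-01", "logs_2016-01-02", "logs_2016-01-03", "shops"]
print(daily_index(date(2016, 1, 3)))
print(expired_indices(existing, today=date(2016, 1, 3), keep_days=2))
```

Because a whole index is dropped at once, no delete by query and no additional merging is involved.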

Multiple data paths

With the release of Elasticsearch 2.0, we were given the ability to specify multiple path.data properties in our elasticsearch.yml pointing to different directories on different physical devices. Elasticsearch can now leverage that by putting different shards on different devices and using the multiple paths in the most efficient way. Because of that, we can parallelize writing to disks if we have more than a single disk. This is especially useful for high indexing use cases where you index a lot of data.

Data distribution

As we know, each index in the Elasticsearch world can be divided into multiple shards and each shard can have multiple replicas. In cases when you have multiple Elasticsearch nodes (and you probably will have in production), you should think about the number of shards and replicas and how that will affect your nodes. Data distribution may be crucial to even out the load on the cluster and not have some nodes doing more work than the other ones.

Let’s take the following example. Imagine we have a cluster that is built of 4 nodes and it has a single index called book built of 3 shards and one replica. Such a deployment will look as follows:

As you can see, the first two nodes have two physical shards allocated to them, while the last two nodes have only one shard allocated each. The actual data allocation is not even. When sending the queries and indexing data, we will have the first two nodes do more work than the other two; this is what we want to avoid. One option is to have the book index have two shards and one replica, so it looks as follows:

This architecture will work and it is perfectly fine. We don’t have to have primary shards on all our nodes; we can have replicas, depending on what bottleneck we expect. For querying we may want to have more replicas, for indexing more primaries.

We can also have our primary shards split evenly, like in the following image:

The thing to remember though is that in both cases we will end up with an even distribution of shards and replicas and Elasticsearch will do a similar amount of work on all the nodes. Of course, with more indices (like having daily indices) it may be trickier to get the data evenly distributed and it may not be possible to have evenly distributed shards, but we should try to get to such a point.
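Whether a given combination of shards and replicas can be spread evenly is simple arithmetic: the total number of physical shards has to be divisible by the number of nodes. The following Python sketch counts the physical shards per node for the book index examples; the function name is ours, and the model assumes a single index, no more replicas than there are other nodes, and an allocation that is as even as possible.

```python
def copies_per_node(shards: int, replicas: int, nodes: int):
    """Spread all physical shards of one index over the nodes as evenly as possible."""
    total = shards * (1 + replicas)
    base, extra = divmod(total, nodes)
    # The first `extra` nodes carry one physical shard more than the rest.
    return [base + 1 if i < extra else base for i in range(nodes)]


# 3 shards and 1 replica on 4 nodes: the uneven layout from the text.
print(copies_per_node(shards=3, replicas=1, nodes=4))
# 2 shards and 1 replica on 4 nodes: one physical shard per node.
print(copies_per_node(shards=2, replicas=1, nodes=4))
```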

One more thing to remember when it comes to data distribution and shards and replicas is that when designing your index architecture, you should remember what you want to achieve. If you are going for a very high indexing use case, you may want to spread the index into multiple shards to lower the pressure that is put on the CPU and the I/O subsystem of the server. This is also true for running expensive queries, because with more shards you can lower the load on a single server. However, with the queries there is one more thing: if your nodes can’t keep up with the load caused by queries, you can add more Elasticsearch nodes and increase the number of replicas so that the physical copies of the primary shards are placed on those nodes. That will make indexing a bit slower but will give you the capacity to handle more queries at the same time.

Bulk indexing

This is very obvious advice, but you would be surprised how many Elasticsearch users forget about indexing data in bulks instead of sending the documents one by one. So the advice here is to do bulks instead of one by one indexing whenever possible. The thing to remember though is not to overload Elasticsearch with too many bulk requests and to keep them at a reasonable size (do not push millions of documents in a single request). Remember about the bulk thread pool and its size and try to adjust your indexers not to go beyond it, or you will first start to queue these requests and, if Elasticsearch is not able to process them, you will quickly start seeing rejected execution exceptions and your data won’t be indexed.

Just as an example, we would like to show the results of tests we did some time ago for the two types of indexing: one by one and bulks. In the following image, we have the indexing throughput when indexing documents one by one:

In this next image, we do the same, but instead of indexing documents one by one, we index them in batches of 10 documents (which is still a relatively low number of documents in a bulk):

As you can see, when indexing documents one by one, we were able to index about 30 documents per second and it was stable. The situation changed with bulk indexing and batches of 10 documents; we were able to index slightly more than 200 documents per second. So the difference can be clearly seen.

Of course this is a very basic comparison of indexing speed. To show the real difference, we should use dozens of threads and push Elasticsearch to its limits. However, the preceding comparison should give you a basic view of the indexing throughput gains when using bulk indexing.
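To show what grouping documents into bulks may look like on the indexer side, here is a minimal Python sketch that turns a list of documents into newline-delimited bulk request bodies of a fixed batch size. The function name, the book type, and the sample documents are hypothetical, and sending each body to the _bulk endpoint is left out on purpose.

```python
import json


def bulk_bodies(docs, index, doc_type, batch_size=10):
    """Yield bulk request bodies holding at most batch_size documents each."""
    for start in range(0, len(docs), batch_size):
        lines = []
        for doc in docs[start:start + batch_size]:
            # Each document needs an action line followed by its source line.
            lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
            lines.append(json.dumps(doc))
        # The bulk format expects every line, including the last, to end with \n.
        yield "\n".join(lines) + "\n"


docs = [{"title": f"Book {n}"} for n in range(25)]
bodies = list(bulk_bodies(docs, index="library", doc_type="book"))
print(len(bodies), "bulk requests instead of", len(docs), "single requests")
```

A batch size of 10 matches the test above; in practice the right size depends on the document size and on the bulk thread pool, so it is worth measuring.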

RAM buffer for indexing

Remember, the more RAM available for the indexing buffer (the indices.memory.index_buffer_size property), the more documents Elasticsearch can hold in memory. However, we don’t want to have Elasticsearch occupy 100 percent of the available memory. The indexing buffer can help us with delaying the flush to disk, which will mean less I/O pressure and fewer merges. You can read more about indexing buffer configuration in Chapter 9, Elasticsearch Cluster in Detail.

Advice for high query rate scenarios

One of the great features of Elasticsearch is its ability to search and analyze the data that was indexed. However, sometimes it is necessary to adjust Elasticsearch and our queries to not only get the results of the query, but also get them fast (or in a reasonable amount of time). In this section, we will look at the possibilities of preparing Elasticsearch for high query throughput use cases, but not just that. We will also look at general performance tips when it comes to querying.

Shard request cache

The purpose of the shard request cache is to cache aggregations, suggester results, and numbers of hits (it will not cache the returned documents and thus only works with size=0). When your queries use aggregations or suggestions, it may be a good idea to enable this cache (it is disabled by default) so that Elasticsearch can re-use the data stored there. The best thing about the cache is that it promises the same near real-time search as a search that is not cached. You can read more about caches and the shard request cache in particular in Chapter 9, Elasticsearch Cluster in Detail.

Think about the queries

This is the most general advice we can actually give: you should always think about optimal query structure, filter usage, and so on. For example, let’s look at the following query:

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "mastering AND department:it AND category:book",
            "default_field": "name"
          }
        },
        {
          "term": {
            "tag": "popular"
          }
        },
        {
          "term": {
            "tag": "2014"
          }
        }
      ]
    }
  }
}

It returns the book matching a few conditions. However, there are a few things we can improve in the preceding query. For example, we can move the static things such as the tag, department, and category field related conditions to the filter section of the Boolean query, so that the next time we use some parts of the query we save CPU cycles and re-use the information stored in cache. That static filtering information is also not relevant when it comes to scoring. Because of that we can move those static elements to the filter section and omit scoring calculation for them. For example, this is what the optimized query will look like:

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "mastering",
            "default_field": "name"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "tag": "popular"
          }
        },
        {
          "term": {
            "tag": "2014"
          }
        },
        {
          "term": {
            "department": "it"
          }
        },
        {
          "term": {
            "category": "book"
          }
        }
      ]
    }
  }
}

As you can see, there are a few things that we did. We still used the bool query, but we introduced the use of the filter section. We used filtering for the static, non-analyzed fields. This allows us to easily re-use the filters in the next queries that we execute. Because of such query restructuring, we were able to simplify the main query. This is exactly what you should be doing when optimizing your queries or designing them: have optimization and performance in mind and try to keep them as optimal as they can be. This will result in faster execution of the queries, lower resource consumption, and better health of the whole Elasticsearch cluster.
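If your application builds queries in code, it helps to make this split between scored and filtered parts explicit in one place. The following Python sketch is one possible helper; the function name filtered_query and its keyword-argument convention are our invention, while the query structure it produces is the optimized query shown above.

```python
import json


def filtered_query(text, default_field, **terms):
    """Build a bool query: scored full text in must, exact terms in filter."""
    filters = []
    for field, values in terms.items():
        if not isinstance(values, (list, tuple)):
            values = [values]
        # One term filter per value; filters do not take part in scoring.
        filters.extend({"term": {field: value}} for value in values)
    return {
        "query": {
            "bool": {
                "must": [
                    {"query_string": {"query": text, "default_field": default_field}}
                ],
                "filter": filters,
            }
        }
    }


query = filtered_query("mastering", "name",
                       tag=["popular", "2014"], department="it", category="book")
print(json.dumps(query, indent=2))
```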

Parallelize your queries

One thing that is usually forgotten is the need for parallelizing queries. Imagine that you have a dozen nodes in your cluster but your index is built of a single shard. If the index is large, your queries will perform worse than you expect. Of course you can increase the number of replicas, but that won’t help. A single query will still go to a single shard in that index, because replicas are nothing more than copies of the primary shard and they contain the same data (or at least they should). This is true not only for indices having one shard; if you have more than one shard, but they are very large, you can still have performance related problems. It is said that the query is only as fast as the slowest partial query response.

Of course, the parallelization also depends on the use case. If you run a lot of queries against Elasticsearch, you may not need to parallelize the queries, especially when the shards are small enough and you don’t see problems at shard level. In general, look at your Elasticsearch nodes and see if they have unused CPU cores and, if that’s the case, you may have room for improvement and parallelization.

Field data cache and breaking the circuit

We have two different factors we can tune to be sure that we don’t run into out of memory errors. First of all, we can limit the size of the field data cache. The second thing is the circuit breaker, which we can easily configure to just throw an exception instead of loading too much data. Combining these two things will ensure that we don’t run into memory issues. Even if you are using doc values a lot, you may still run into out of memory issues, for example, with analyzed fields, which can’t use doc values and will use the field data cache. So configure the field data cache and circuit breakers correctly. You can read more about how to configure them in Chapter 9, Elasticsearch Cluster in Detail.

Keep size and shard size under control

When dealing with some of the queries that use aggregations, we have the possibility of using two properties: size and shard_size. The size parameter defines how many buckets should be returned by the final aggregation results; the node that aggregates the final results will get the top buckets from each shard that returns the result and will only return the top size of them to the client. The shard_size parameter tells Elasticsearch about the same but at the shard level. Increasing the value of the shard_size parameter will lead to more accurate aggregations (like in the case of significant terms aggregation) at the cost of network traffic and memory usage. Lowering that parameter will cause aggregation results to be less precise, but we will benefit from lower memory consumption and lower network traffic. If we see that the memory usage is too large, we can lower the size and shard_size properties for problematic queries and see if the quality of the results is still acceptable.
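The accuracy trade-off is easier to see on a toy example. The following Python sketch imitates the two-step merge described above: every shard reports only its top shard_size terms and the aggregating node returns the top size terms of the merged counts. It is a simplified model with made-up per-shard counts, not the actual Elasticsearch implementation.

```python
from collections import Counter


def top_terms(shards, size, shard_size):
    """Merge the top shard_size terms of every shard and return the top size."""
    merged = Counter()
    for counts in shards:
        # Each shard only ships its shard_size most frequent terms.
        for term, count in Counter(counts).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


# Hypothetical per-shard term counts; "b" is the most frequent term overall (14).
shards = [
    {"a": 10, "b": 7, "c": 1},
    {"c": 10, "b": 7, "a": 1},
]
print(top_terms(shards, size=1, shard_size=1))  # "b" is never reported by a shard
print(top_terms(shards, size=1, shard_size=2))  # "b" wins with its full count
```

With shard_size equal to 1, each shard ships a single bucket and the globally most frequent term is lost; with shard_size equal to 2 the result is exact, at the cost of more buckets sent over the network and held in memory.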

Monitoring

Elasticsearch monitoring APIs expose a lot of information, both about the search engine itself and about the environment, such as the operating system. We saw that in Chapter 10, Administrating Your Cluster. Because of this and the ease of retrieving this information, numerous applications were built that allow us to do monitoring and beyond. Some of these applications are simple and just read the data in real time without any persistent storage, while others allow us to read historical data about our cluster behavior. In this chapter, we will only scratch the surface of such applications, but we strongly advise you to get familiar with some of them as they can make your everyday work with Elasticsearch easier.

We chose three examples of monitoring solutions which take different approaches to integration with Elasticsearch. The first two tools are available as Elasticsearch plugins and the third takes a different approach to integration.

Elasticsearch HQ

This tool is available as an Elasticsearch plugin but can also be downloaded separately as a JavaScript application run in a browser.

Elasticsearch HQ uses JavaScript and AJAX techniques where data is fetched periodically from the cluster, prepared for visualization on the browser side, and shown to the user.

The tool allows us to track statistics on a particular node. The browser can present vital information about the cluster and particular nodes. The following screenshot shows the graphical user interface from Elasticsearch HQ:

We have the basic information about the cluster, the number of nodes, and Elasticsearch health. We can also see which node we are looking at and some statistics about the node, which include the memory usage (both heap and non-heap), the number of threads, Java virtual machine garbage collector work, and so on. The plugin also presents simplified information about schema and shards and allows execution of simple queries.

In order to install Elasticsearch HQ, one should just run the following command:

bin/plugin install royrusso/elasticsearch-HQ

After that, the GUI will be available at http://localhost:9200/_plugin/hq/.

One thing to remember is that Elasticsearch HQ doesn’t persist the fetched data anywhere, so the data is only fetched when your browser is running and has Elasticsearch HQ opened. If something has happened in the past, you won’t be able to diagnose it.

Marvel

Marvel is the tool created by the Elasticsearch team. In the current version, it is built as a plugin for a visualization platform called Kibana (https://www.elastic.co/products/kibana).

Note

Kibana is outside the scope of this book. You can find out more about Kibana on the official product page available at https://www.elastic.co/.

Marvel also visualizes basic information about clusters and nodes by drawing nice graphs that are dynamically updated over time. The main difference from ElasticsearchHQ is that the performance data is stored on the server side (in the same or an external Elasticsearch cluster), so historical data is available. The example screenshot is presented next:

The installation procedure for Marvel contains three steps. The first two install the license and marvel-agent plugins in Elasticsearch:

bin/plugin install license

bin/plugin install marvel-agent

And finally, the third step is to install the Marvel plugin in Kibana by running the following command:

bin/kibana plugin --install elasticsearch/marvel/latest
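The idea of keeping monitoring data in Elasticsearch itself, which is what makes historical data available in Marvel, can be illustrated without Marvel. The sketch below is not how Marvel is implemented; it only shows, under our own assumptions, how a list of metric snapshots could be turned into a request body for the bulk API. The index name my-monitoring, the type name snapshot, and the function name are hypothetical.

```python
import json


def build_bulk_body(snapshots, index="my-monitoring", doc_type="snapshot"):
    """Build a newline-delimited bulk API body that indexes each snapshot.

    Every document is preceded by an action line, and the whole body
    ends with a newline character.
    """
    lines = []
    for snapshot in snapshots:
        # Action line: tells Elasticsearch where to index the next document.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(snapshot, sort_keys=True))
    return "\n".join(lines) + "\n"
```

The returned string could then be sent with an HTTP POST request to the /_bulk endpoint of the cluster chosen to hold the monitoring data, which may be a different cluster from the one being monitored.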

SPM for Elasticsearch

This tool takes a different approach from the previously mentioned tools. SPM is a Software as a Service (SaaS) solution created for monitoring Elasticsearch installations of any size, and it allows monitoring of several clusters and different technologies. Though its roots are SaaS-based, it is also available on premises, which means that you can run SPM on your own machines without the need to send your metrics to the cloud.

Information is sent by simple client software installed on the Elasticsearch machine to the SPM servers. The main advantage is the possibility of storing information for a wider range of time and seeing what was happening in the past. You can create your own dashboards and correlate metrics with logs between multiple applications (SPM allows you to monitor a wide variety of applications).

The following screenshot shows the dashboard of SPM for Elasticsearch:

The overview dashboard shown in the preceding screenshot provides information about the cluster nodes, the request rate and latency, the number of documents in the indices, CPU usage, load, memory details, Java virtual machine memory, the disk space usage, and finally network traffic. You can get detailed information about each of these elements by going into the tab dedicated to it.

You can find additional information about SPM installation and available options at http://sematext.com/spm/index.html.

Summary

In this chapter, we focused on scaling and tuning Elasticsearch. We started with the hardware preparations and decisions we need to make. Next, we tuned a single Elasticsearch node as much as we could, and after that we configured the whole cluster to work as well as it could. We discussed vertical expansion possibilities and we learned how to monitor our cluster once it hits the production environment.

So now we have reached the end of the book. We hope that it was a nice reading experience and that you found the book interesting. Since the previous edition of the book, Elasticsearch has changed a lot, not only when it comes to versions, but also when it comes to functionalities. Some of the features are no longer there, some of them were moved to plugins, and of course new features were added. We really hope that you have learned something from this book and that you will now find it easier to use Elasticsearch every day, no matter whether you are a beginner in this world or a semi-experienced Elasticsearch user.

As the authors of this book, but also as Elasticsearch users ourselves, we tried to bring you, our readers, the best reading experience we could. Of course Elasticsearch is more than we described in the book, especially when it comes to monitoring and administration capabilities and API. However, the number of pages is limited, and if we were to describe everything in great detail we would have ended up with a book one thousand pages long. We need to remember that Elasticsearch is not only user friendly but also provides a large number of configuration options, querying possibilities, and so on. Due to that, we had to choose which functionalities to describe in greater detail, which had to be only mentioned, and which had to be totally skipped. As with the two previous editions of the book you are holding, we hope that we made the right choice and that you are happy about what you’ve read.

We would also like to say that it is worth remembering that Elasticsearch is constantly evolving. When writing this book, we went through a few stable versions, finally making it to the release of Elasticsearch 2.2. Even back then we knew that new features and improvements were coming, like some of the changes mentioned in the book that will be part of the next release, or at least they are planned to be. Be sure to check the official documentation of Elasticsearch periodically for the release notes for new versions of Elasticsearch, if you want to be up to date with the new features being added. We will also be writing about new features that we think are worth mentioning on www.elasticsearchserverbook.com. So if you are interested, visit the site from time to time.

Once again, thank you for the time you’ve spent with the book.

Index

A

advices, for high query rate scenarios
    about / Advice for high query rate scenarios
    shard request cache / Shard request cache
    queries / Think about the queries
    queries, parallelizing / Parallelize your queries
    fielddata cache / Fielddata cache and breaking the circuit
    circuit, breaking / Fielddata cache and breaking the circuit
    size, controlling / Keep size and shard size under control
    shard size, controlling / Keep size and shard size under control

aggregation engine
    working / Inside the aggregations engine

aggregations
    about / Aggregations
    general query structure / General query structure
    types / Aggregation types
    date_histogram / Date histogram aggregation
    geo distance aggregations / Geo distance aggregations
    geohash grid aggregation / Geohash grid aggregation
    global aggregation / Global aggregation
    significant_terms aggregation / Significant terms aggregation
    sampler aggregation / Sampler aggregation
    children aggregation / Children aggregation
    nested aggregation / Nested aggregation
    reverse_nested aggregation / Reverse nested aggregation
    nesting aggregations / Nesting aggregations and ordering buckets

aggregations, types
    metrics / Aggregation types, Metrics aggregations
    buckets / Aggregation types, Buckets aggregations
    pipeline / Aggregation types

Amazon
    URL / Cost and performance flexibility

Amazon S3
    URL / Creating a snapshot repository

Analyze API
    URL / Defining your own analyzers

analyzers
    using / Using analyzers
    out-of-the-box analyzers / Out-of-the-box analyzers
    defining / Defining your own analyzers
    default analyzers / Default analyzers

Apache Lucene / Getting back to Apache Lucene
    URL / Full text searching
    glossary / The Lucene glossary and architecture
    architecture / The Lucene glossary and architecture
    document / The Lucene glossary and architecture
    field / The Lucene glossary and architecture
    term / The Lucene glossary and architecture
    token / The Lucene glossary and architecture
    tokenizer / Input data analysis
    scoring / Introduction to Apache Lucene scoring

Apache Lucene Javadocs for the TFIDF
    URL / Scoring and query relevance

Apache Lucene scoring
    about / Introduction to Apache Lucene scoring
    document matching, factors / When a document is matched
    default scoring formula / Default scoring formula
    relevant documents / Relevancy matters

Apache Solr
    URL / Using Apache Solr synonyms

Apache Solr synonyms
    using / Using Apache Solr synonyms
    explicit synonyms / Explicit synonyms
    equivalent synonyms / Equivalent synonyms
    expand property / Expanding synonyms

Apache Tika
    URL / Detecting the language of the document

arbitrary geo shapes
    about / Arbitrary geo shapes
    point / Point
    envelope / Envelope
    polygon / Polygon
    multipolygon / Multipolygon
    example usage / An example usage
    storing, in index / Storing shapes in the index

arguments, Cat API
    URL / Common arguments

attributes, index structure mapping
    index_name / Common attributes
    index / Common attributes
    store / Common attributes
    doc_values / Common attributes
    boost / Common attributes
    null_value / Common attributes
    copy_to / Common attributes
    include_in_all / Common attributes
    precision_step / Number, Date
    coerce / Number
    ignore_malformed / Number, Date
    format / Date
    format, reference link / Date
    numeric_resolution / Date

available objects, script execution
    _doc / Objects available during script execution
    _source / Objects available during script execution
    _fields / Objects available during script execution

Azure
    URL / Creating a snapshot repository

B

basic queries
    about / Basic queries
    term query / The term query
    terms query / The terms query
    match all query / The match all query
    type query / The type query
    exists query / The exists query
    missing query / The missing query
    common terms query / The common terms query
    match query / The match query
    multi match query / The multi match query
    query string query / The query string query
    simple query string query / The simple query string query
    identifiers query / The identifiers query
    prefix query / The prefix query
    fuzzy query / The fuzzy query
    wildcard query / The wildcard query
    range query / The range query
    regular expression query / Regular expression query
    more like this query / The more like this query

batch indexing
    used, for speeding up indexing process / Batch indexing to speed up your indexing process

Boolean properties set
    node.master / Configuring node roles
    node.data / Configuring node roles
    node.client / Configuring node roles

bool query
    about / The bool query
    should section / The bool query
    must section / The bool query
    must_not section / The bool query
    filter parameter / The bool query
    boost parameter / The bool query
    minimum_should_match parameter / The bool query
    disable_coord parameter / The bool query
    used, for explicit filtering / Explicit filtering with bool query

boosting query / The boosting query

boost_mode parameter
    multiply value / Structure of the function query
    replace value / Structure of the function query
    sum value / Structure of the function query
    avg value / Structure of the function query
    max value / Structure of the function query
    min value / Structure of the function query

bucket aggregations
    ordering / Nesting aggregations and ordering buckets, Buckets ordering

buckets / General query structure

buckets aggregations
    about / Buckets aggregations
    filter aggregation / Filter aggregation
    filters aggregation / Filters aggregation
    terms aggregation / Terms aggregation
    range aggregation / Range aggregation
    date_range aggregation / Date range aggregation
    ip_range aggregation / IPv4 range aggregation
    missing aggregation / Missing aggregation
    histogram aggregation / Histogram aggregation

bulk indexing
    data, preparing / Preparing data for bulk indexing

C

caches
    about / Elasticsearch caches
    fielddata cache / Fielddata cache
    fielddata, using with doc values / Fielddata and doc values
    shard request cache / Shard request cache
    node query cache / Node query cache
    indexing buffers / Indexing buffers
    avoiding, scenarios / When caches should be avoided

Cat API
    about / The Cat API
    defining / The basics
    using / Using Cat API
    common arguments / Common arguments
    examples / The examples, Getting information about the nodes

children aggregation
    about / Children aggregation

CIDR notation
    URL / IPv4 range aggregation

Class DateTimeFormat
    URL / Tuning the type determining mechanism for dates

client node
    about / Node roles, Client node

cluster
    about / Nodes and clusters
    installing / Installing and configuring your cluster
    configuring / Installing and configuring your cluster
    directory layout / The directory layout
    system-specific installation and configuration / The system-specific installation and configuration

cluster health API
    about / Cluster health API
    information details, controlling / Controlling information details
    additional parameters / Additional parameters

cluster rebalancing
    controlling / Controlling cluster rebalancing
    defining / Understanding rebalance
    implementing / Cluster being ready
    settings / The cluster rebalance settings, Controlling the number of shards being moved between nodes concurrently

cluster settings API / The cluster settings API

cluster wide allocation
    about / Cluster-wide allocation
    allocation awareness / Allocation awareness
    allocation awareness, forcing / Forcing allocation awareness
    filtering / Filtering

CMS system
    URL / Creating a new document

common terms query / The common terms query

completion suggester
    about / Completion suggester
    in Elasticsearch 2.2 / Completion suggester

completion suggester, Elasticsearch 2.1
    data, indexing / Indexing data
    indexed data, querying / Querying indexed completion suggester data
    custom weights / Custom weights

completion suggester, Elasticsearch 2.2
    about / Completion suggester

compound queries
    about / Compound queries
    bool query / The bool query
    dis_max query / The dis_max query
    boosting query / The boosting query
    constant_score query / The constant_score query
    indices query / The indices query

compressed oops
    URL / The memory

compressed ordinary object pointers
    reference link / Multiple Elasticsearch instances on a single physical machine

configuration options, phrase suggester
    max_errors / Configuration
    separator / Configuration

configuration options, term suggester
    text / Term suggester configuration options
    field / Term suggester configuration options
    analyzer / Term suggester configuration options
    size / Term suggester configuration options
    suggest_mode / Term suggester configuration options
    sort / Term suggester configuration options

constant_score query / The constant_score query

content
    searching, in different languages / Searching content in different languages

content, searching in different languages
    about / Searching content in different languages
    languages, handling / Handling languages differently
    multiple languages, handling / Handling multiple languages
    document language, detecting / Detecting the language of the document
    sample document / Sample document
    mappings / The mappings
    data, querying / Querying
    queries, combining / Combining queries

context suggester
    about / Context suggester
    types / Context types
    using / Using context
    geo location context, using / Using the geo location context

context switches
    reference link / Thread pools tuning

core types, index structure mapping
    about / Core types
    common attributes / Common attributes
    string / String
    number / Number
    boolean / Boolean
    binary / Binary
    date / Date

counttoitfield / Adding partial documents

create, retrieve, update, delete (CRUD)
    URL / Manipulating data with the REST API

cURL command
    URL / Installing Elasticsearch

D

data
    manipulating, with REST API / Manipulating data with the REST API
    storing, in Elasticsearch / Storing data in Elasticsearch
    preparing, for bulk indexing / Preparing data for bulk indexing
    indexing / Indexing the data
    _all field / The _all field
    _source field / The _source field
    internal fields / Additional internal fields
    sorting / Sorting data
    default sorting / Default sorting
    querying, in child documents / Querying data in the child documents
    querying, in parent documents / Querying data in the parent documents

data, that is not flat
    indexing / Indexing data that is not flat
    data / Data
    objects / Objects
    arrays / Arrays
    mappings / Mappings
    dynamic behavior / To be or not to be dynamic
    object indexing, disabling / Disabling object indexing

data node
    about / Node roles

data querying, cases
    identified language, using / Queries with an identified language
    unknown language, using / Queries with an unknown language

data sets
    foreground sets / Choosing significant terms
    background sets / Choosing significant terms

data sorting
    about / Sorting data
    default sorting / Default sorting
    fields, selecting / Selecting fields used for sorting
    mode / Sorting mode
    behavior for missing fields, specifying / Specifying behavior for missing fields
    dynamic criteria / Dynamic criteria
    scoring, calculating / Calculate scoring when sorting

date_histogram aggregations
    about / Date histogram aggregation
    time zones / Time zones

DEB package
    used, for installing Elasticsearch / Installing Elasticsearch using the DEB package

default indexing / Default indexing

derivative aggregation
    URL / Derivative aggregation

designated nodes roles for larger clusters
    about / Designated node roles for larger clusters
    query aggregator nodes / Query aggregator nodes
    data nodes / Data nodes
    master eligible nodes / Master eligible nodes

DigitalOcean
    URL / Cost and performance flexibility

directory layout, cluster
    bin / The directory layout
    config / The directory layout
    lib / The directory layout
    modules / The directory layout
    data / The directory layout
    logs / The directory layout
    plugins / The directory layout
    work / The directory layout

disk-based shard allocation
    about / Disk-based shard allocation
    configuring / Configuring disk based shard allocation
    disabling / Disabling disk based shard allocation

dis_max query / The dis_max query

Docker
    reference link / Multiple Elasticsearch instances on a single physical machine

document
    about / Document
    creating / Creating a new document
    automatic identifier creation, creating / Automatic identifier creation
    retrieving / Retrieving documents
    updating / Updating documents
    non-existing documents, dealing with / Dealing with non-existing documents
    partial documents, adding / Adding partial documents
    deleting / Deleting documents

document type / Document type

double type
    URL / Number

dynamic templates
    about / Templates and dynamic templates, Dynamic templates
    matching pattern / The matching pattern
    target field definition, writing / Field definitions

E

Elasticsearch
    about / The basics of Elasticsearch
    key concepts / Key concepts of Elasticsearch
    index / Index
    document / Document
    document type / Document type
    mapping / Mapping
    indexing / Indexing and searching, Elasticsearch indexing
    searching / Indexing and searching
    URL / Installing Elasticsearch, Available similarity models
    installing / Installing Elasticsearch
    running / Running Elasticsearch
    shutting down / Shutting down Elasticsearch
    configuring / Configuring Elasticsearch
    installing, with RPM package / Installing Elasticsearch using RPM packages
    installing, with DEB package / Installing Elasticsearch using the DEB package
    configuration files, localization / Elasticsearch configuration file localization
    querying / Querying Elasticsearch, A simple query
    example data / The example data
    paging / Paging and result size
    result size, controlling / Paging and result size
    version value, returning / Returning the version value
    score, limiting / Limiting the score
    return fields, selecting / Choosing the fields that we want to return
    source filtering / Source filtering
    script fields, using / Using the script fields
    parameters, passing to script fields / Passing parameters to the script fields
    scripting capabilities / Scripting capabilities of Elasticsearch
    spatial capabilities / Elasticsearch spatial capabilities
    reference documentation, URL / Configuration
    plugins / Elasticsearch plugins
    caches / Elasticsearch caches
    hardware preparations / Hardware
    monitoring / Monitoring
    Kibana, URL / Marvel

Elasticsearch 2.1
    URL / Thread pools

Elasticsearch 2.2
    completion suggester / Completion suggester

Elasticsearch cluster
    preparing, for high indexing / Preparing the cluster for high indexing and querying throughput
    preparing, for high querying / Preparing the cluster for high indexing and querying throughput

ElasticsearchHQ tool
    using / ElasticsearchHQ

Elasticsearch indexing
    about / Elasticsearch indexing
    shards / Shards and replicas
    replicas / Shards and replicas
    indices, creating / Creating indices

Elasticsearch infrastructure
    key concepts / Key concepts of the Elasticsearch infrastructure
    node / Nodes and clusters
    cluster / Nodes and clusters
    shard / Shards
    replica / Replicas
    gateway / Gateway

Elasticsearch monitoring
    about / Monitoring
    ElasticsearchHQ tool, using / ElasticsearchHQ
    Marvel tool, using / Marvel
    SPM tool, using / SPM for Elasticsearch

Elasticsearch time machine
    about / Elasticsearch time machine
    snapshot repository, creating / Creating a snapshot repository
    snapshots, creating / Creating snapshots
    snapshot, restoring / Restoring a snapshot
    parameters / Restoring a snapshot
    old snapshots, deleting / Cleaning up – deleting old snapshots

exists query / The exists query

Explain API
    URL / Explaining the query

explain information
    about / Understanding the explain information
    field analysis / Understanding field analysis
    query, explaining / Explaining the query

F

factors, for score property calculation
    document boost / When a document is matched
    field boost / When a document is matched
    coord / When a document is matched
    inverse document frequency / When a document is matched
    length norm / When a document is matched
    term frequency / When a document is matched
    query norm / When a document is matched

Fast Vector Highlighter
    URL / Under the hood

Fedora Linux
    URL / Installing Elasticsearch using RPM packages

fielddata cache
    about / Fielddata cache
    size, controlling / Fielddata size
    circuit breakers / Circuit breakers

field definition variables, dynamic templates
    {name} / Field definitions
    {dynamic_type} / Field definitions

filtering
    about / Filtering
    include / What do include, exclude, and require mean
    require / What do include, exclude, and require mean
    exclude / What do include, exclude, and require mean

filters
    lowercase filter / Input data analysis
    synonyms filter / Input data analysis
    language stemming filters / Input data analysis

filters and tokenizers
    URL / Defining your own analyzers

filter types
    URL / Defining your own analyzers

full text searching
    about / Full text searching
    Apache Lucene, glossary / The Lucene glossary and architecture
    Apache Lucene, architecture / The Lucene glossary and architecture
    input data analysis / Input data analysis
    indexing / Indexing and querying
    querying / Indexing and querying
    scoring / Scoring and query relevance
    query relevance / Scoring and query relevance

function score query
    about / The function score query
    structure / Structure of the function query
    weight factor function / The weight factor function
    field_value_factor function / Field value factor function
    script_score function / The script score function
    random_score function / The random score function
    decay functions / Decay functions

function_score query
    URL / Decay functions

fuzzy query
    about / The fuzzy query

G

gateway / Gateway

gateway module
    about / The gateway and recovery modules, The gateway

gateway recovery options
    gateway.recover_after_master_nodes / Additional gateway recovery options
    gateway.recover_after_data_nodes / Additional gateway recovery options
    gateway.expected_master_nodes / Additional gateway recovery options
    gateway.expected_data_nodes / Additional gateway recovery options

general preparations, single Elasticsearch node
    about / The general preparations
    swapping, avoiding / Avoiding swapping
    file descriptors / File descriptors
    virtual memory / Virtual memory, The memory

Geo / Geo bounds aggregation

geo distance aggregations
    about / Geo distance aggregations

Geohash
    URL / Geohash grid aggregation

geohash grid aggregation
    about / Geohash grid aggregation
    URL / Geohash grid aggregation

Geohash value
    URL / Example data

GeoJSON
    URL / Arbitrary geo shapes

geospatial queries
    URL / Sample queries

geo_field properties
    geohash / Additional geo_field properties
    geohash_precision / Additional geo_field properties
    geohash_prefix / Additional geo_field properties
    ignore_malformed / Additional geo_field properties
    lat_lon / Additional geo_field properties
    precision_step / Additional geo_field properties

GitHub
    URL / Installing plugins
    automatic store throttling, URL / Automatic store throttling

GitHub issue, URL / String

global aggregation
    about / Global aggregation

Groovy
    URL / Scripting capabilities of Elasticsearch

H

hardware preparations, for running Elasticsearch
    about / Hardware
    physical servers / Physical servers or a cloud
    cloud / Physical servers or a cloud
    CPU / CPU
    RAM memory / RAM memory
    mass storage / Mass storage
    network / The network
    servers counting / How many servers
    cost cutting / Cost cutting

HDFS
    URL / Creating a snapshot repository

highlighted fragments
    controlling / Controlling highlighted fragments

highlighter type
    selecting / Forcing highlighter type

highlighting
    about / Highlighting
    using / Getting started with highlighting
    field configuration / Field configuration
    Apache Lucene, using / Under the hood
    highlighter type, selecting / Forcing highlighter type
    HTML tags, configuring / Configuring HTML tags
    global settings / Global and local settings
    local settings / Global and local settings
    matching need / Require matching
    custom query / Custom highlighting query
    Postings highlighter / The Postings highlighter, Validating your queries

horizontal expansion
    about / Horizontal expansion
    replicas, automatic creation / Automatically creating the replicas
    redundancy / Redundancy and high availability
    high availability / Redundancy and high availability
    reference links / Redundancy and high availability
    cost and performance flexibility / Cost and performance flexibility
    continuous upgrades / Continuous upgrades
    multiple Elasticsearch instances, on single physical machine / Multiple Elasticsearch instances on a single physical machine
    designated nodes roles for larger clusters / Designated node roles for larger clusters

how similar phrase / Understanding the explain information

HTTP module
    properties, URL / HTTP host

HTTP protocol
    URL / Understanding the REST API

HTTP transport settings, adjusting
    node / Adjusting HTTP transport settings
    HTTP, disabling / Disabling HTTP
    HTTP port / HTTP port
    HTTP host / HTTP host

HyperLogLog++ algorithm
    URL / Field cardinality

I

identifiers query
    about / The identifiers query

index
    segments / The Lucene glossary and architecture
    about / Index

index-time boosting
    using / When does index-time boosting make sense?
    defining, in mappings / Defining boosting in the mappings

index alias
    about / Index aliasing and using it to simplify your everyday work
    defining / An alias
    creating / Creating an alias
    modifying / Modifying aliases
    commands, combining / Combining commands
    retrieving / Retrieving aliases
    removing / Removing aliases
    filtering / Filtering aliases
    and routing / Aliases and routing
    and zero downtime reindexing / Zero downtime reindexing and aliases

indexation / Input data analysis

indexing process
    speeding up, batch indexing used / Batch indexing to speed up your indexing process

indexing related advices
    about / Indexing related advice
    index refresh rate / Index refresh rate
    thread pools, tuning / Thread pools tuning
    automatic store throttling / Automatic store throttling
    time-based data, handling / Handling time-based data
    multiple data paths / Multiple data paths
    data distribution / Data distribution
    bulk indexing / Bulk indexing
    RAM buffer, used for indexing / RAM buffer for indexing

index refresh rate
    reference link / Index refresh rate

index structure
    modifying, with update API / Modifying your index structure with the update API

index structure, modifying
    mappings / The mappings
    new field, adding / Adding a new field to the existing index
    existing index fields, modifying / Modifying fields of an existing index

index structure, parent-child relationship
    about / Index structure and data indexing
    child mappings / Child mappings
    parent mappings / Parent mappings
    parent document / The parent document
    children documents / Child documents

index structure mapping
    about / Index structure mapping
    types / Type and types definition
    types definition / Type and types definition
    fields / Fields
    core types / Core types
    multi fields / Multi fields
    IP address type / The IP address type
    token count type / Token count type

indices, Elasticsearch indexing
    creating / Creating indices
    automatic creation, altering / Altering automatic index creation
    newly created index, settings / Settings for a newly created index
    deleting / Index deletion

indices analyze API
    URL / Query analysis

indices query / The indices query

indices settings API / The indices settings API

indices stats API
    about / Indices stats API
    docs / Docs
    store / Store
    indexing / Indexing, get, and search
    get / Indexing, get, and search
    search / Indexing, get, and search
    defining / Additional information

internal fields
    _id / Additional internal fields
    _uid / Additional internal fields
    _type / Additional internal fields
    _field_names / Additional internal fields

inverted index
    about / The Lucene glossary and architecture
    URL / Index

J

Java
    URL / Full text searching
    installing / Installing Java

JavaScript Object Notation (JSON)
    URL / Running Elasticsearch

Java threads
    URL / Thread pools

Java types
    URL / Number

Java Version 7
    URL / Installing Java

Java Virtual Machine (JVM) / Configuring Elasticsearch

JMeter
    URL / When caches should be avoided

Joda Time library
    URL / Date range aggregation

JSON
    URL / Document

K

Kibana
    URL / Marvel

L

language analyzer
    URL / Out-of-the-box analyzers

language analyzers
    URL / Sample document

language detection
    URL / Detecting the language of the document

Levenshtein algorithm
    URL / The Boolean match query

Linux
    Elasticsearch, installing / Installing Elasticsearch on Linux
    Elasticsearch, configuring as system service / Configuring Elasticsearch as a system service on Linux

Logstash
    URL / Index aliasing and using it to simplify your everyday work

Lucene Javadocs
    URL / Default scoring formula

Lucene query syntax
    about / Lucene query syntax

M

mapping / Mapping

mappings
    configuration / Mappings configuration
    type determining mechanism / Type determining mechanism
    index structure mapping / Index structure mapping
    analyzers, using / Using analyzers
    similarity models / Different similarity models
    about / Mappings
    final mappings / Final mappings
    sending, to Elasticsearch / Sending the mappings to Elasticsearch
    new field, adding to existing index / Adding a new field to the existing index
    field of existing index, modifying / Modifying fields of an existing index

Marvel tool
    using / Marvel

master node
    about / Node roles, Master node

match all query / The match all query

matching pattern, dynamic templates
    match / The matching pattern
    unmatch / The matching pattern

match query
    about / The match query
    Boolean match query / The Boolean match query
    phrase match query / The phrase match query
    match phrase prefix query / The match phrase prefix query

Maven
    URL / Installing plugins

Maven Central
    URL / Installing plugins

Maven Sonatype
    URL / Installing plugins

merge policy
    about / The merge policy
    properties / The merge policy

mergescheduler/Themergeschedulermetricsaggregations

about/Metricsaggregationsmin/Minimum,maximum,average,andsummax/Minimum,maximum,average,andsumavg/Minimum,maximum,average,andsumsum/Minimum,maximum,average,andsummissingvalues/Missingvalues


    scripts, using / Using scripts
    field value statistics / Field value statistics and extended statistics
    extended_statistics / Field value statistics and extended statistics
    value_count aggregation / Value count
    field cardinality aggregation / Field cardinality
    percentiles aggregation / Percentiles
    percentile_ranks aggregation / Percentile ranks
    top_hits aggregation / Top hits aggregation
    top_hits aggregation, additional parameters / Additional parameters
    geo_bounds aggregation / Geo bounds aggregation
    scripted metrics aggregation / Scripted metrics aggregation

Microsoft Windows platform
    file handles, URL / Configuring Elasticsearch

minimum_should_match parameter
    URL / The bool query

missing query / The missing query

more like this query
    about / The more like this query

moving averages calculation
    URL / Pipeline aggregations

moving_avg aggregation
    URL / Moving avg aggregation
    about / Moving avg aggregation
    future buckets, predicting / Predicting future buckets
    models / The models
    models, URL / The models

multi match query / The multi match query

multiple Elasticsearch instances, on single physical machine
    about / Multiple Elasticsearch instances on a single physical machine
    shard, preventing on same node / Preventing a shard and its replicas from being on the same node
    replicas, preventing on same node / Preventing a shard and its replicas from being on the same node

multiple indices
    URL / URI search

multi term / Query rewrite

multivalued field / Document

Mustache
    URL / Scripting capabilities of Elasticsearch


N

native code, using
    factory implementation / The factory implementation
    native script implementation / Implementing the native script
    plugin definition / The plugin definition
    plugin, installing / Installing the plugin
    script, running / Running the script

nested aggregation
    about / Nested aggregation

nested objects
    using / Using nested objects
    URL / Using nested objects
    nested queries / Scoring and nested queries
    score_mode property, setting / Scoring and nested queries

nesting aggregations
    about / Nesting aggregations and ordering buckets

network attached storage (NAS) / Mass storage

node / Nodes and clusters

node discovery
    about / Understanding node discovery
    discovery types / Understanding node discovery, Discovery types
    roles / Node roles
    cluster name, setting / Setting the cluster’s name
    Zen discovery / Zen discovery
    HTTP transport settings, adjusting / Adjusting HTTP transport settings

node roles
    master node / Node roles, Master node
    data node / Node roles, Data node
    client node / Node roles, Client node
    configuring / Configuring node roles

nodes info API
    about / Nodes info API
    requisites / Nodes info API
    extensive information, returning / Returned information

NoSQL
    URL / Manipulating data with the REST API

number, index structure mapping
    byte / Number
    short / Number
    integer / Number
    long / Number
    float, URL / Number
    float / Number
    double / Number


    double, URL / Number


O

object indexing
    disabling / Disabling object indexing

official repository
    URL / Installing plugins

OpenJDK
    URL / Installing Java

optimistic locking
    URL / Versioning

options, term suggester
    lowercase_terms / Additional term suggester options
    max_edits / Additional term suggester options
    prefix_len / Additional term suggester options
    min_word_len / Additional term suggester options
    shard_size / Additional term suggester options

out-of-the-box analyzers
    standard / Out-of-the-box analyzers
    simple / Out-of-the-box analyzers
    whitespace / Out-of-the-box analyzers
    stop / Out-of-the-box analyzers
    keyword / Out-of-the-box analyzers
    pattern / Out-of-the-box analyzers
    language / Out-of-the-box analyzers
    snowball / Out-of-the-box analyzers


P

parameters, Boolean match query
    operator / The Boolean match query
    analyzer / The Boolean match query
    fuzziness / The Boolean match query
    prefix_length / The Boolean match query
    max_expansions / The Boolean match query
    zero_terms_query / The Boolean match query
    cero_terms_query / The Boolean match query
    lenient / The Boolean match query

parameters, fuzzy query
    value / The fuzzy query
    boost / The fuzzy query
    fuzziness / The fuzzy query
    prefix_length / The fuzzy query
    max_expansions / The fuzzy query

parameters, more like this query
    fields / The more like this query
    like / The more like this query
    unlike / The more like this query
    min_term_freq / The more like this query
    max_query_terms / The more like this query
    stop_words / The more like this query
    min_doc_freq / The more like this query
    min_word_len / The more like this query
    max_word_len / The more like this query
    boost_terms / The more like this query
    boost / The more like this query
    include / The more like this query
    minimum_should_match / The more like this query
    analyzer / The more like this query

parameters, query string query
    query / The query string query
    default_field / The query string query
    default_operator / The query string query
    analyzer / The query string query
    allow_leading_wildcard / The query string query
    lowercase_expand_terms / The query string query
    enable_position_increments / The query string query
    fuzzy_max_expansions / The query string query
    fuzzy_prefix_length / The query string query
    phrase_slop / The query string query
    boost / The query string query


    analyze_wildcard / The query string query
    auto_generate_phrase_queries / The query string query
    minimum_should_match / The query string query
    fuzziness / The query string query
    max_determined_states / The query string query
    locale / The query string query
    time_zone / The query string query
    lenient / The query string query

parameters, range query
    gte / The range query
    gt / The range query
    lte / The range query
    lt / The range query

parent-child relationship
    using / Using the parent-child relationship
    index structure / Index structure and data indexing
    data indexing / Index structure and data indexing
    querying / Querying
    performance considerations / Performance considerations

parent aggregations / Available types

pattern analyzer
    URL / Out-of-the-box analyzers

percolator
    about / Percolator
    index / The index
    preparing / Percolator preparation
    exploring / Getting deeper
    returned results size, controlling / Controlling the size of returned results
    using, for and score calculation / Percolator and score calculation
    combining, with other functionalities / Combining percolators with other functionalities
    matching queries count, obtaining / Getting the number of matching queries
    indexed documents percolation / Indexed document percolation

phrase match query
    slop / The phrase match query
    analyzer / The phrase match query

phrase suggester
    about / Phrase suggester
    configuration / Configuration

pipeline aggregations
    about / Pipeline aggregations
    URL / Pipeline aggregations
    parent aggregation family / Available types
    sibling aggregation family / Available types


    types / Available types, Pipeline aggregation types
    other aggregations, referencing / Referencing other aggregations
    data, gaps / Gaps in the data

pipeline aggregations, types
    sum_bucket / Min, max, sum, and average bucket aggregations
    min_bucket / Min, max, sum, and average bucket aggregations
    max_bucket / Min, max, sum, and average bucket aggregations
    avg_bucket / Min, max, sum, and average bucket aggregations
    cumulative_sum aggregation / Cumulative sum aggregation
    bucket_selector aggregation / Bucket selector aggregation
    bucket_script aggregation / Bucket script aggregation
    serial_diff aggregation / Serial differencing aggregation
    derivative aggregation / Derivative aggregation
    moving_avg aggregation / Moving avg aggregation

plugins
    about / Elasticsearch plugins
    basics / The basics
    installing / Installing plugins
    removing / Removing plugins

Postings Highlighter
    URL / Under the hood
    about / The Postings highlighter

prefix query / The prefix query

properties, fault detection ping settings
    discovery.zen.fd.ping_interval / Fault detection ping settings
    discovery.zen.fd.ping_timeout / Fault detection ping settings
    discovery.zen.fd.ping_retries / Fault detection ping settings

properties, merge policy
    index.merge.policy.expunge_deletes_allowed / The merge policy
    index.merge.policy.max_merge_at_once / The merge policy
    index.merge.policy.max_merge_at_once_explicit / The merge policy
    index.merge.policy.max_merged_segment / The merge policy
    index.merge.policy.segments_per_tier / The merge policy
    index.merge.policy.reclaim_deletes_weight / The merge policy


Q

queries
    selecting, for warming / Choosing queries for warming

query boost
    applying, to document / The boost

query boosts
    used, for influencing scores / Influencing scores with query boosts
    about / The boost
    adding, to queries / The boost, Adding the boost to queries
    score, modifying / Modifying the score

querying
    data, in child documents / Querying data in the child documents
    data, in parent documents / Querying data in the parent documents

querying process
    about / Understanding the querying process
    query logic / Query logic
    search type, specifying / Search type
    search execution preference, specifying / Search execution preference
    search shards API, specifying / Search shards API

query parser
    URL / Lucene query syntax

query rewrite
    about / Query rewrite
    prefix query, example / Prefix query as an example
    Apache Lucene, using / Getting back to Apache Lucene
    properties / Query rewrite properties

query string query
    about / The query string query
    running, against multiple fields / Running the query string query against multiple fields


R

Rackspace
    URL / Cost and performance flexibility

RAID
    URL / Mass storage

range aggregation
    about / Range aggregation
    keyed buckets / Keyed buckets

range query / The range query

recovery modules
    about / The gateway and recovery modules

recovery process
    about / Recovery control
    gateway recovery options / Additional gateway recovery options
    indices recovery API / Indices recovery API
    delayed allocation / Delayed allocation
    index recovery prioritization / Index recovery prioritization

regular expression query
    about / Regular expression query
    URL / Regular expression query

replica / Replicas

replicas, Elasticsearch indexing
    about / Shards and replicas
    write consistency, controlling / Write consistency

REST API
    used, for data manipulation / Manipulating data with the REST API
    about / Understanding the REST API
    URL / Understanding the REST API
    data, storing in Elasticsearch / Storing data in Elasticsearch
    documents, retrieving / Retrieving documents
    documents, updating / Updating documents
    documents, deleting / Deleting documents
    versioning / Versioning

results
    filtering / Filtering your results
    query context / The context is the key
    explicit filtering, bool query used / Explicit filtering with bool query

reverse_nested aggregation
    about / Reverse nested aggregation

rewrite property, values
    scoring_boolean / Query rewrite properties
    constant_score / Query rewrite properties
    constant_score_boolean / Query rewrite properties


    top_terms / Query rewrite properties
    top_terms_blendedfreqs / Query rewrite properties
    top_terms_boost_N / Query rewrite properties

right query
    selecting / Choosing the right query
    use cases / The use cases
    results, limiting to given tags / Limiting results to given tags
    values in range, searching / Searching for values in a range

routing
    about / Introduction to routing, Routing
    default indexing / Default indexing
    default searching / Default searching
    parameters / The routing parameters
    fields / Routing fields

RPM package
    used, for installing Elasticsearch / Installing Elasticsearch using RPM packages


S

sample
    distance-based sorting / Distance-based sorting
    bounding box filtering / Bounding box filtering
    distance, limiting / Limiting the distance

sample queries
    about / Sample queries

sampler aggregation
    about / Sampler aggregation

score
    about / Introduction to Apache Lucene scoring
    influencing, with query boosts / Influencing scores with query boosts
    modifying / Modifying the score

score, modifying
    about / Modifying the score
    constant_score query / Constant score query
    boosting query / Boosting query
    function score query / The function score query

score_mode parameter
    about / Structure of the function query
    multiple value / Structure of the function query
    sum value / Structure of the function query
    avg value / Structure of the function query
    first value / Structure of the function query
    max value / Structure of the function query
    min value / Structure of the function query

script fields
    selecting / Using the script fields
    parameters, passing to / Passing parameters to the script fields

scripting capabilities
    about / Scripting capabilities of Elasticsearch
    script execution, available objects / Objects available during script execution
    script, types / Script types
    querying, scripts used / Querying with scripts
    parameters, using / Scripting with parameters
    languages, Groovy / Script languages
    other than embedded languages, using / Using other than embedded languages
    native code, using / Using native code

script properties
    script / Querying with scripts
    inline / Querying with scripts
    id / Querying with scripts
    file / Querying with scripts


    lang / Querying with scripts
    params / Querying with scripts

scripts, scripted_metric aggregation
    init_script / Scripted metrics aggregation
    map_script / Scripted metrics aggregation
    combine_script / Scripted metrics aggregation
    reduce_script / Scripted metrics aggregation

script types
    about / Script types
    inline scripts / Script types, Inline scripts
    in file scripts / In file scripts
    indexed scripts / Indexed scripts

Scroll API
    about / The Scroll API
    problem definition / Problem definition
    problem definition, solution / Scrolling to the rescue

searching / Default searching

searching request execution / Indexing and searching

segment merging
    about / Introduction to segment merging, Segment merging
    need for / The need for segment merging
    merge policy / The merge policy
    merge policy, basic properties / The merge policy
    merge scheduler / The merge scheduler
    throttling / Throttling

shard allocation
    IP address, using for / Using the IP address for shard allocation
    cancelling / Canceling shard allocation
    forcing / Forcing shard allocation
    multiple commands per HTTP request / Multiple commands per HTTP request
    operations, allowing on primary shards / Allowing operations on primary shards

shard and replica allocation
    controlling / Controlling the shard and replica allocation
    controlling, explicitly / Explicitly controlling allocation
    node parameters, specifying / Specifying node parameters
    configuration / Configuration
    index, creating / Index creation
    nodes, excluding / Excluding nodes from allocation
    node attributes, requiring / Requiring node attributes
    number of shards and replicas per node / The number of shards and replicas per node
    allocation throttling / Allocation throttling
    cluster wide allocation / Cluster-wide allocation
    shards and replicas, moving manually / Manually moving shards and replicas


    rolling restarts, handling / Handling rolling restarts

shard request cache
    about / Shard request cache
    enabling / Enabling and configuring the shard request cache
    configuring / Enabling and configuring the shard request cache
    per request shard request cache, disabling / Per request shard request cache disabling
    usage monitoring / Shard request cache usage monitoring

shards / Index, Shards
    moving / Moving shards

shards, Elasticsearch indexing
    about / Shards and replicas
    write consistency, controlling / Write consistency

sibling aggregations / Available types

significant_terms aggregation
    about / Significant terms aggregation
    significant terms, selecting / Choosing significant terms
    multiple value, analyzing / Multiple value analysis

similarity models
    about / Different similarity models
    per-field similarity, setting / Setting per-field similarity
    Okapi BM25 model / Available similarity models
    randomness model, divergence / Available similarity models
    information-based model / Available similarity models
    default similarity, configuring / Configuring default similarity
    BM25 similarity, configuring / Configuring BM25 similarity
    DFR similarity, configuring / Configuring DFR similarity
    IB similarity, configuring / Configuring IB similarity

simple query string query
    about / The simple query string query
    URL / The simple query string query

single Elasticsearch node
    tuning / Preparing a single Elasticsearch node
    general preparations / The general preparations
    field data cache / Field data cache and breaking the circuit
    circuit, breaking / Field data cache and breaking the circuit
    doc values, using / Use doc values
    RAM buffer, used for indexing / RAM buffer for indexing
    index refresh rate / Index refresh rate
    thread pools / Thread pools

snapshots
    creating / Creating snapshots
    additional parameters / Additional parameters

snowball analyzer


    URL / Out-of-the-box analyzers

Software as a Service (SaaS) / SPM for Elasticsearch

source filtering / Source filtering

span / A span

span first query / Span first query

span near query / Span near query

span not query / Span not query

span or query / Span or query

span queries
    using / Using span queries
    span / A span
    span_term query / Span term query
    span first query / Span first query
    span near query / Span near query
    span or query / Span or query
    span not query / Span not query
    span_within query / Span within query
    span_containing query / Span containing query
    span_multi query / Span multi query
    performance considerations / Performance considerations

span_containing query / Span containing query

span_multi query / Span multi query

span_term query / Span term query

span_within query / Span within query

spatial capabilities
    about / Elasticsearch spatial capabilities
    mappings preparation / Mapping preparation for spatial searches
    example data / Example data
    geo_field properties / Additional geo_field properties

SPM tool
    URL / SPM for Elasticsearch

standard analyzer
    URL / Out-of-the-box analyzers

state and health, cluster
    monitoring / Monitoring your cluster’s state and health
    cluster health API / Cluster health API
    indices stats API / Indices stats API
    nodes info API / Nodes info API
    nodes stats API / Nodes stats API
    cluster state API / Cluster state API
    cluster stats API / Cluster stats API
    pending tasks API / Pending tasks API
    indices recovery API / Indices recovery API
    indices shard stores API / Indices shard stores API


    indices segments API / Indices segments API

static properties, for indexing buffer size configuration
    indices.memory.index_buffer_size / Indexing buffers
    indices.memory.min_index_buffer_size / Indexing buffers
    indices.memory.max_index_buffer_size / Indexing buffers
    indices.memory.min_shard_index_buffer_size / Indexing buffers

status code definition
    URL / Indexing the data

stemming
    URL / Out-of-the-box analyzers

stop analyzer
    URL / Out-of-the-box analyzers

stop words
    URL / The common terms query

string, index structure mapping
    term_vector / String
    analyzer / String
    search_analyzer / String
    norms.enabled / String
    norms.loading / String
    position_offset_gap / String
    index_options / String
    ignore_above / String

suggesters
    using / Using suggesters
    URL / Using suggesters, Additional term suggester options
    types / Available suggester types
    suggestions, including / Including suggestions
    response / Suggester response
    text property / Suggester response
    score property / Suggester response
    freq property / Suggester response

synonym rules
    defining / Defining synonym rules
    Apache Solr synonyms, using / Using Apache Solr synonyms
    WordNet synonyms, using / Using WordNet synonyms

synonyms
    about / Words with the same meaning
    filtering / Synonym filter
    in mappings / Synonyms in the mappings
    storing, in file system / Synonyms stored on the file system
    rules, defining / Defining synonym rules
    index-time synonyms expansion / Query or index-time synonym expansion
    query-time synonym expansion / Query or index-time synonym expansion


synonyms filter
    using / Synonym filter

system-specific installation and configuration
    about / The system-specific installation and configuration
    Elasticsearch, installing on Linux / Installing Elasticsearch on Linux
    Elasticsearch, configuring as system service on Linux / Configuring Elasticsearch as a system service on Linux
    Elasticsearch, using as system service on Windows / Elasticsearch as a system service on Windows


T

T-Digest algorithm
    URL / Percentiles

templates
    about / Templates
    example / An example of a template

term query / The term query

terms aggregation
    about / Terms aggregation
    approximate counts / Counts are approximate
    minimum document count / Minimum document count

terms query / The terms query

term suggester
    about / Term suggester
    configuration options / Term suggester configuration options
    options / Additional term suggester options

thread pools
    about / Thread pools
    generic / Thread pools
    index / Thread pools
    search / Thread pools
    suggest / Thread pools
    get / Thread pools
    bulk / Thread pools
    percolate / Thread pools

throttling, adjusting
    type setting / Throttling
    value / Throttling
    none value / Throttling
    merge value / Throttling
    all value / Throttling

time zones
    URL / Time zones

tree-like structures
    indexing / Indexing tree-like structures
    data structure / Data structure
    analysis / Analysis

type determining mechanism
    about / Type determining mechanism
    disabling / Disabling the type determining mechanism
    tuning, for numeric types / Tuning the type determining mechanism for numeric types
    tuning, for dates / Tuning the type determining mechanism for dates


type property, values
    plain / Forcing highlighter type
    fvh / Forcing highlighter type
    postings / Forcing highlighter type

type query / The type query

types, suggesters
    term / Available suggester types, Term suggester
    phrase / Available suggester types, Phrase suggester
    completion / Available suggester types, Completion suggester
    context / Available suggester types, Context suggester


U

Unicast
    URL / Discovery types

update API
    used, for modifying index structure / Modifying your index structure with the update API

Update API
    URL / Adding partial documents

update settings API
    about / The update settings API
    cluster settings API / The cluster settings API
    indices settings API / The indices settings API

URI query string parameters
    about / URI query string parameters
    query / The query
    default search field / The default search field
    analyzer property / Analyzer
    default operator / The default operator property
    explain parameter / Query explanation
    fields returned / The fields returned
    results, sorting / Sorting the results
    search timeout / The search timeout
    results window / The results window
    per shard results, limiting / Limiting per-shard results
    unavailable indices, ignoring / Ignoring unavailable indices
    search type / The search type
    lowercasing terms expansion / Lowercasing term expansion
    wildcard queries analysis / Wildcard and prefix analysis
    analyze_wildcard property / Wildcard and prefix analysis
    prefix queries analysis / Wildcard and prefix analysis

URI request query
    used, for searching / Searching with the URI request query
    sample data / Sample data
    URI search / URI search
    analyzing / Query analysis
    parameters / URI query string parameters
    Lucene query syntax / Lucene query syntax

URI search
    about / URI search
    Elasticsearch query response / Elasticsearch query response
    URL / Wildcard and prefix analysis


V

Validate API
    using / Using the Validate API

values, has_child query parameter
    none / Querying data in the child documents
    min / Querying data in the child documents
    max / Querying data in the child documents
    sum / Querying data in the child documents
    avg / Querying data in the child documents

values, in range
    searching / Searching for values in a range
    matched documents, boosting / Boosting some of the matched documents
    lower scoring partial queries, ignoring / Ignoring lower scoring partial queries
    Lucene query syntax, using in queries / Using Lucene query syntax in queries
    user queries without errors, handling / Handling user queries without errors
    prefixes, used for providing autocomplete functionality / Autocomplete using prefixes
    similar terms, finding / Finding terms similar to a given one
    spans / Spans, spans everywhere

values, score_mode property
    avg / Scoring and nested queries
    sum / Scoring and nested queries
    min / Scoring and nested queries
    max / Scoring and nested queries
    none / Scoring and nested queries

versioning
    about / Versioning
    usage example / Usage example
    from external system / Versioning from external systems

vertical scaling / Preparing a single Elasticsearch node


W

warming query
    about / Warming up
    defining / Defining a new warming query
    defined warming queries, retrieving / Retrieving the defined warming queries
    deleting / Deleting a warming query
    warming up functionality, disabling / Disabling the warming up functionality

wildcard query / The wildcard query

Windows
    Elasticsearch, configuring as system service / Elasticsearch as a system service on Windows

WordNet
    URL / Using WordNet synonyms


Z

Zen discovery
    about / Zen discovery
    master election configuration / Master election configuration
    unicast, configuring / Configuring unicast
    fault detection ping settings / Fault detection ping settings
    cluster state updates control / Cluster state updates control
    master unavailability, dealing with / Dealing with master unavailability
