An Introduction to Distributed Search with DataStax Enterprise Search
DESCRIPTION
This is an overview of distributed search with Cassandra and Solr.
TRANSCRIPT
![Page 1: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/1.jpg)
TOO BIG TO FAIL
An Introduction to Distributed Search with Cassandra and Solr
OpenSource Connections
@PatriciaGorla
![Page 2: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/2.jpg)
ABOUT ME
Systems Analyst
Programming
Information Retrieval
![Page 3: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/3.jpg)
CASSANDRA
Created at Facebook to power inbox search
Distributed data store that runs on commodity servers
Highly available
No single point of failure
![Page 4: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/4.jpg)
WHO USES CASSANDRA?
![Page 5: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/5.jpg)
SEARCH + CASSANDRA, 1
• First implementation: Solandra (originally Lucandra)
• Replaced Lucene index with Cassandra column families
![Page 6: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/6.jpg)
SEARCH + CASSANDRA, 2
• DataStax Enterprise Search
• Uses native Lucene index
• All data is retrieved from Cassandra
![Page 7: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/7.jpg)
DataStax Enterprise Search Cluster
• Distributed
• Linearly Scalable
• Highly Available
• Eventually Consistent
• Full-text search
• Aggregation
![Page 8: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/8.jpg)
SETTING UP THE SCHEMA
<fields>
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="name" type="text" indexed="true" stored="true"/>
  <field name="body" type="text" indexed="true" stored="true"/>
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="date" type="string" indexed="true" stored="true"/>
</fields>
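As a rough illustration of what the field list above declares, the following Python sketch parses the same `<fields>` fragment and checks a document against it. The helper names `field_types` and `unknown_fields` are made up for this example and are not part of Solr or DSE; the sketch only mimics the idea that a document's keys should match declared fields.

```python
import xml.etree.ElementTree as ET

# The <fields> fragment from the slide, copied verbatim.
SCHEMA_FIELDS = """
<fields>
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="name" type="text" indexed="true" stored="true"/>
  <field name="body" type="text" indexed="true" stored="true"/>
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="date" type="string" indexed="true" stored="true"/>
</fields>
"""


def field_types(schema_xml):
    """Return a {field name: field type} mapping from a <fields> fragment."""
    root = ET.fromstring(schema_xml)
    return {f.get("name"): f.get("type") for f in root.findall("field")}


def unknown_fields(document, schema_xml):
    """List keys of `document` that the schema does not declare (hypothetical helper)."""
    declared = field_types(schema_xml)
    return sorted(key for key in document if key not in declared)


print(field_types(SCHEMA_FIELDS)["body"])                          # text
print(unknown_fields({"id": "1", "author": "pg"}, SCHEMA_FIELDS))  # ['author']
```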
![Page 9: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/9.jpg)
WRITING TO CLUSTER
• Write through either Cassandra clients or the Solr API
• Write process is the same either way
• True atomic updates to Cassandra
![Page 10: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/10.jpg)
Each Cassandra node owns a range of row-key hash values.
![Page 11: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/11.jpg)
Data can be written directly to Cassandra
![Page 12: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/12.jpg)
Data is distributed according to row key hash and replication factor
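A minimal sketch of that placement rule, assuming a simplified ring: each node owns one token, a row key is hashed onto the ring, and the row is copied to the next nodes clockwise until the replication factor is met. The node names, token values, and the use of a truncated MD5 hash are assumptions for illustration; real clusters use a configured partitioner and replication strategy.

```python
import bisect
import hashlib

# Hypothetical four-node ring with evenly spaced tokens in [0, 2**32).
NODES = {"node-a": 0, "node-b": 2**30, "node-c": 2**31, "node-d": 3 * 2**30}


def token(row_key):
    """Hash a row key onto the ring (first 4 bytes of MD5, illustration only)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def replicas(row_key, node_tokens, replication_factor):
    """Return the nodes that hold `row_key`: the first node whose token is
    >= the key's token, then the following nodes clockwise around the ring."""
    ring = sorted(node_tokens.items(), key=lambda item: item[1])
    tokens = [t for _, t in ring]
    start = bisect.bisect_left(tokens, token(row_key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][0] for i in range(count)]


# With a replication factor of 2, every row lands on two adjacent nodes.
print(replicas("post-1", NODES, 2))
```

Because placement depends only on the hash and the ring layout, any node can work out where a row lives without a central lookup.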
![Page 13: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/13.jpg)
DSE first writes to Cassandra
![Page 14: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/14.jpg)
And then updates the secondary index in Solr
![Page 15: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/15.jpg)
The quorum responds with success / failure
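The write sequence on the last few slides can be sketched as a toy model: each replica stores the row first, then updates its local search index, and the write succeeds once a majority of replicas acknowledge it. `Replica` and `quorum_write` are hypothetical stand-ins, not DSE classes, and the in-memory dictionaries only imitate the Cassandra store and the Solr index. Removing stale terms on update is left out.

```python
class Replica:
    """Hypothetical stand-in for one search node."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.rows = {}   # imitates the Cassandra store: row key -> fields
        self.index = {}  # imitates the local Solr index: term -> row keys

    def write(self, row_key, fields):
        """Store the row, then index it. Returns False if the node is down."""
        if not self.up:
            return False
        self.rows[row_key] = fields                  # 1. write to Cassandra first
        for value in fields.values():                # 2. then update the index
            for term in str(value).lower().split():
                self.index.setdefault(term, set()).add(row_key)
        return True


def quorum_write(replicas, row_key, fields):
    """Report success when a majority of replicas acknowledge the write."""
    acks = sum(1 for replica in replicas if replica.write(row_key, fields))
    return acks >= len(replicas) // 2 + 1


nodes = [Replica("a"), Replica("b"), Replica("c", up=False)]
print(quorum_write(nodes, "post-1", {"title": "Too Big to Fail"}))  # True
```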
![Page 16: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/16.jpg)
Data is now distributed evenly
![Page 17: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/17.jpg)
READING FROM CLUSTER
• Read either Cassandra-side or through Solr API
• Cassandra: fast reads*
• Solr: full-text search
• Read direction affects performance
• Data is stored in Cassandra
![Page 18: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/18.jpg)
Query is sent to node
![Page 19: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/19.jpg)
Node uses gossip to find who has the information
![Page 20: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/20.jpg)
QUERYING CASSANDRA
• Can query Solr or Cassandra directly
• CQL syntax is limited, but can pass a Solr query through the solr_query parameter
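To make the `solr_query` option concrete, here is a small Python sketch that builds such a CQL statement as a string. The keyspace `blog` and table `posts` are hypothetical, and the helper `solr_query_cql` is invented for this example; in an application, the query would more likely be sent through a Cassandra driver with bound parameters.

```python
def solr_query_cql(keyspace, table, query, limit=10):
    """Build a CQL SELECT that passes a Solr query string via solr_query."""
    escaped = query.replace("'", "''")  # CQL escapes a single quote by doubling it
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE solr_query = '{escaped}' LIMIT {int(limit)};"
    )


print(solr_query_cql("blog", "posts", "title:cassandra"))
# SELECT * FROM blog.posts WHERE solr_query = 'title:cassandra' LIMIT 10;
```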
![Page 21: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/21.jpg)
Querying Cassandra directly
![Page 22: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/22.jpg)
Cassandra retrieves information from the column family
![Page 23: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/23.jpg)
Querying Solr index
![Page 24: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/24.jpg)
Row-key hashes are stored in Solr, and Cassandra is queried for stored data
![Page 25: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/25.jpg)
The Cassandra node sends the request to the node with the corresponding hash and returns the information
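The Solr-side read path described here can be sketched in two steps: the index answers "which rows match?" with row keys, and the stored fields are then fetched from the data store by key. This is a simplified, single-process illustration; `build_index`, `search`, and the sample rows are assumptions, and the real system spreads both steps across nodes.

```python
def build_index(store, fields):
    """Build a toy inverted index: lowercase term -> set of row keys."""
    index = {}
    for row_key, row in store.items():
        for field in fields:
            for term in row.get(field, "").lower().split():
                index.setdefault(term, set()).add(row_key)
    return index


def search(index, store, term):
    """Find matching row keys in the index, then fetch each row from the store."""
    row_keys = sorted(index.get(term.lower(), set()))
    return [store[key] for key in row_keys if key in store]


# Hypothetical rows standing in for a Cassandra column family.
STORE = {
    "post-1": {"title": "Distributed search", "body": "Cassandra and Solr"},
    "post-2": {"title": "Schema design", "body": "Solr field types"},
}
INDEX = build_index(STORE, ["title", "body"])

print([row["title"] for row in search(INDEX, STORE, "solr")])
# ['Distributed search', 'Schema design']
```

Keeping only keys in the index means the stored data has a single source of truth, which matches the earlier point that all data is retrieved from Cassandra.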
![Page 26: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/26.jpg)
Data is always synced
![Page 27: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/27.jpg)
Both nodes respond with information
![Page 28: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/28.jpg)
Updates can be committed and searched over in real time
![Page 29: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/29.jpg)
PRODUCTION USE
• Will want a mix of analytics and search nodes
![Page 30: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/30.jpg)
An OLTP - OLAP integrated solution
![Page 31: An Introduction to Distributed Search with Datastax Enterprise Search](https://reader034.vdocuments.mx/reader034/viewer/2022052505/554f44c7b4c90572088b558d/html5/thumbnails/31.jpg)
TRADEOFFS
• Changing the Solr schema requires a reindex (standard for Solr)
• No multi-valued fields or composite columns