Transcript
Page 1: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

Adventures in Discoverability with C* and Solr

Patricia Gorla, Systems Engineer

@patriciagorla @o19s

Page 2: Adventures in Discovery with Cassandra and Solr

About Me

• Solr

• Cassandra

• Information retrieval

Paul Hostetler - phostetler.com

Page 3: Adventures in Discovery with Cassandra and Solr

How Do I Find What I’m Looking For?

Simple | Complex
Aristotle’s birthplace? | All ancient Greek philosophers?
Coordinates of Stagira? | All cities within 100 km of Stagira?

Page 4: Adventures in Discovery with Cassandra and Solr

How Do I Find What I’m Looking For?

Simple | Complex
Aristotle’s birthplace? | All ancient Greek philosophers?
select birthPlace where name = 'Aristotle'; |
Coordinates of Stagira? | All cities within 100 km of Stagira?
select coord where name = 'Stagira'; |

Page 5: Adventures in Discovery with Cassandra and Solr

How Do I Find What I’m Looking For?

Simple | Complex
Aristotle’s birthplace? | All ancient Greek philosophers?
select birthPlace where name = 'Aristotle'; | create index on tag; select * where tag = 'Greek philosophy';
Coordinates of Stagira? | All cities within 100 km of Stagira?
select coord where name = 'Stagira'; | ???

Page 6: Adventures in Discovery with Cassandra and Solr

How Do I Find What I’m Looking For?

Simple | Complex
Aristotle’s birthplace? | All ancient Greek philosophers?
q=Aristotle&fl=birthPlace | q=Greek philosophy
Coordinates of Stagira? | All cities within 100 km of Stagira?
q=Stagira&fl=point | q=*:*&fq={!geofilt pt=40.530,23.752 sfield=point d=100}
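The Solr parameters on this slide can be assembled and percent-encoded programmatically. The following is a minimal Python sketch, not the speaker's code: the base URL, the core name, and the helper names `solr_url` and `geofilt` are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the core name is an assumption for this sketch.
SOLR_SELECT = "http://localhost:8983/solr/solr.location/select"

def solr_url(base, **params):
    """Return a select URL with every parameter percent-encoded."""
    return base + "?" + urlencode(params)

def geofilt(point, field, distance_km):
    """Build a {!geofilt} filter query; point is a (lat, lon) pair."""
    lat, lon = point
    return "{!geofilt pt=%s,%s sfield=%s d=%s}" % (lat, lon, field, distance_km)

# Simple lookup: return only the "point" field for documents matching "Stagira".
simple = solr_url(SOLR_SELECT, q="Stagira", fl="point")

# Complex lookup: match everything, then keep documents within 100 km of the point.
nearby = solr_url(SOLR_SELECT, q="*:*", fq=geofilt((40.530, 23.752), "point", 100))

print(simple)
print(nearby)
```

Encoding the parameters this way avoids the raw spaces and braces of the filter query reaching the shell or the HTTP client unescaped.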

Page 7: Adventures in Discovery with Cassandra and Solr

Approaches to Search

• Google Site Search

• MySQL ‘like’ statements

Seth Casteel - littlefriendsphoto.com

Page 8: Adventures in Discovery with Cassandra and Solr

Approaches to Search

• Full-text search

• Ranking (Scoring)

• Tokenization

• Stemming

• Faceting

Page 9: Adventures in Discovery with Cassandra and Solr

Approaches to Search

• Full-text search

• Ranking (Scoring)

• Tokenization

• Stemming

• Faceting

and many more!

Page 10: Adventures in Discovery with Cassandra and Solr

[1] Pleasure in the job puts perfection in the work.

[2] Education is the best provision for the journey to old age.

[3] If some animals are good at hunting and others are suitable for hunting, then the gods must clearly smile on hunting.

[4] It is the mark of an educated mind to be able to entertain a thought without absorbing it.

Inverted Index

Term | Freq | Documents
education | 2 | [2] [4]
hunting | 3 | [3]
perfection | 1 | [1]
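A minimal Python sketch of how such an inverted index could be built from the four quotes above. It only lowercases and splits on non-letters, so unlike the table it does not merge "education" and "educated" (that would need a stemming step); the function name `build_inverted_index` is hypothetical.

```python
import re
from collections import defaultdict

DOCS = {
    1: "Pleasure in the job puts perfection in the work.",
    2: "Education is the best provision for the journey to old age.",
    3: "If some animals are good at hunting and others are suitable for hunting, "
       "then the gods must clearly smile on hunting.",
    4: "It is the mark of an educated mind to be able to entertain a thought "
       "without absorbing it.",
}

def build_inverted_index(docs):
    """Map each lowercased term to {doc_id: occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

index = build_inverted_index(DOCS)
print(index["hunting"])     # {3: 3}
print(index["perfection"])  # {1: 1}
```

Looking up a term is then a single dictionary access instead of a scan over every document.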

Page 11: Adventures in Discovery with Cassandra and Solr

Defining Field Types

<fieldType name="text_general" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

Page 12: Adventures in Discovery with Cassandra and Solr

Index-side Analysis

Input: Hope is a waking dream.
Punctuation: Hope is a waking dream
Stop Words: Hope waking dream
Lowercase: hope waking dream
Stemming: hope wake dream
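The index-side steps can be sketched as a small pipeline in Python. This is an illustration under stated assumptions, not Solr's implementation: the stop-word list is a tiny assumed sample, and `toy_stem` is a hypothetical stand-in that handles only a trailing "ing", whereas Solr's Porter stemmer applies many more rules.

```python
import re

STOP_WORDS = {"a", "an", "is", "the"}  # assumed sample, not Solr's stopwords.txt
VOWELS = set("aeiou")

def tokenize(text):
    """Split on non-letters, which also drops punctuation."""
    return re.findall(r"[A-Za-z]+", text)

def remove_stop_words(tokens):
    """Drop stop words, ignoring case."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

def lowercase(tokens):
    return [t.lower() for t in tokens]

def toy_stem(token):
    """Strip a trailing 'ing'; restore a final 'e' after consonant-vowel-consonant."""
    if token.endswith("ing") and len(token) > 5:
        base = token[:-3]
        cvc = (base[-1] not in VOWELS and base[-1] not in "wxy"
               and base[-2] in VOWELS and base[-3] not in VOWELS)
        return base + "e" if cvc else base
    return token

def analyze(text):
    """Run the steps in slide order: punctuation, stop words, lowercase, stemming."""
    tokens = lowercase(remove_stop_words(tokenize(text)))
    return [toy_stem(t) for t in tokens]

print(analyze("Hope is a waking dream."))  # ['hope', 'wake', 'dream']
```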

Page 13: Adventures in Discovery with Cassandra and Solr

Query-side Analysis

Input: Hope is a waking dream.
Punctuation: Hope is a waking dream
Stop Words: Hope waking dream
Lowercase: hope waking dream
Stemming: hope wake dream
Synonyms: hope wake dream, plus desire awake wish
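Query-side synonym expansion can be sketched in a few lines of Python. The mapping below is an assumption inferred from the words shown on the slide; the actual contents of synonyms.txt are not given, and the helper names `expand` and `matches` are hypothetical.

```python
# Assumed synonym pairs, taken from the words shown on the slide.
SYNONYMS = {"hope": ["desire"], "wake": ["awake"], "dream": ["wish"]}

def expand(tokens, synonyms=SYNONYMS):
    """Return, per token position, the token followed by its synonyms."""
    return [[t] + synonyms.get(t, []) for t in tokens]

def matches(query_tokens, indexed_terms):
    """True when any query token, or one of its synonyms, is an indexed term."""
    return any(term in indexed_terms
               for alternatives in expand(query_tokens)
               for term in alternatives)

print(expand(["hope", "wake", "dream"]))
print(matches(["hope"], {"desire"}))  # True
```

Expanding only at query time keeps the index small, at the cost of a little extra work per query.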

Page 14: Adventures in Discovery with Cassandra and Solr

Faceting

facet_fields: {
  tags: [hunting: 1, education: 2, work: 2],
  locations: [Stagira: 5, Chalcis: 3]
}
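Conceptually, a facet is a count of matching documents per field value. A minimal Python sketch follows; the three result documents and the `facet_counts` helper are invented for illustration and are not the talk's dataset.

```python
from collections import Counter

# Hypothetical result set; values are made up for this sketch.
RESULTS = [
    {"id": "q1", "tags": ["education", "work"], "location": "Stagira"},
    {"id": "q2", "tags": ["education"], "location": "Stagira"},
    {"id": "q3", "tags": ["hunting", "work"], "location": "Chalcis"},
]

def facet_counts(docs, field):
    """Count how many documents carry each value of the given field."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field, [])
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return dict(counts)

print(facet_counts(RESULTS, "tags"))
print(facet_counts(RESULTS, "location"))
```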

Page 15: Adventures in Discovery with Cassandra and Solr

Approaches to Distribution

• Pay Google more $$

• MySQL ‘like’ shards

• Master/Slave replication

• SolrCloud

Page 16: Adventures in Discovery with Cassandra and Solr

“Distributed search is hard.”

Page 17: Adventures in Discovery with Cassandra and Solr

Solr + Cassandra: DataStax Enterprise

• Full-text search

• Tokenization

• Stemming

• Date ranges

• Aggregation

• High Availability

• Distributed Nature

Page 18: Adventures in Discovery with Cassandra and Solr

Examining DBpedia.org Datasets

Person

<http://xmlns.com/foaf/0.1/name> "Aristotle"@en .
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> Person .
<http://purl.org/dc/elements/1.1/description> "Greek philosopher"@en .
<http://dbpedia.org/ontology/birthPlace> Stagira .
<http://dbpedia.org/ontology/deathPlace> Chalcis .

Place

<http://dbpedia.org/resource/Stagira> <http://www.opengis.net/gml/_Feature> .
<http://dbpedia.org/resource/Stagira#lat> "40.5916667" .
<http://dbpedia.org/resource/Stagira#long> "23.7947222" .
<http://dbpedia.org/resource/Stagira#point> "40.591667 23.7947222"@en .

Page 19: Adventures in Discovery with Cassandra and Solr

“Love is a single soul inhabiting two bodies.”

Page 20: Adventures in Discovery with Cassandra and Solr

Querying for data

curl 'http://localhost:8983/solr/solr.person/select?q=Aristotle'

Page 21: Adventures in Discovery with Cassandra and Solr

Filtering by location

curl -G 'http://localhost:8983/solr/solr.location/select' \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'spatial=true' \
  --data-urlencode 'fq={!geofilt pt=40.53027,23.7525 sfield=point d=100}'
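To make "within d kilometres of a point" concrete, here is a small Python sketch that applies a great-circle (haversine) distance check by hand. It approximates what a distance filter means and is not Solr's implementation; the Chalcis coordinates are approximate values assumed for this example, and the helper names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

def within(center, places, d_km):
    """Names of the places whose distance from center is at most d_km."""
    return [name for name, point in places.items()
            if haversine_km(center, point) <= d_km]

CENTER = (40.53027, 23.7525)  # the pt value from the query above
PLACES = {
    "Stagira": (40.5916667, 23.7947222),  # from the DBpedia place record
    "Chalcis": (38.46, 23.60),            # approximate, assumed for this sketch
}

print(within(CENTER, PLACES, 100))  # ['Stagira']
```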

Page 22: Adventures in Discovery with Cassandra and Solr

“There is no great genius without a mixture of madness.”

Page 23: Adventures in Discovery with Cassandra and Solr

Unified Schema

Schema.xml

<fields>
  <field name="id" type="string" />
  <field name="name" type="text" />
  <dynamicField name="*Date" type="date" />
  <dynamicField name="*Place" type="text" />
  <dynamicField name="*Point" type="location" />
  <dynamicField name="*_tag" type="text" />
</fields>
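A short Python sketch of how the dynamic field patterns map incoming field names to types. It is a simplified model: explicit fields win, then the first matching pattern is used, which is enough here because the patterns do not overlap. The sample field names such as `geoPoint` and `philosophy_tag` are hypothetical.

```python
from fnmatch import fnmatchcase

# Explicit fields and dynamic field patterns, copied from the schema above.
FIELDS = {"id": "string", "name": "text"}
DYNAMIC_FIELDS = [
    ("*Date", "date"),
    ("*Place", "text"),
    ("*Point", "location"),
    ("*_tag", "text"),
]

def resolve_type(field_name):
    """Return the type for a field name, or None if nothing matches."""
    if field_name in FIELDS:
        return FIELDS[field_name]
    for pattern, type_name in DYNAMIC_FIELDS:
        if fnmatchcase(field_name, pattern):
            return type_name
    return None

for name in ["id", "birthPlace", "deathDate", "geoPoint", "philosophy_tag"]:
    print(name, "->", resolve_type(name))
```

This is what lets person and place records share one schema: a new predicate such as a second date needs no schema change as long as its name ends in "Date".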

Page 24: Adventures in Discovery with Cassandra and Solr

Upload to DSE, Create Core

curl http://localhost:8983/solr/resource/solr.location/schema.xml \
  --data-binary @solr/location_schema.xml \
  -H 'Content-type:text/xml; charset=utf-8'

curl http://localhost:8983/solr/resource/solr.location/solrconfig.xml \
  --data-binary @solr/location_solrconfig.xml \
  -H 'Content-type:text/xml; charset=utf-8'

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=solr.location'

Page 25: Adventures in Discovery with Cassandra and Solr

Location Schema

Cassandra Schema

cqlsh:solr> DESC COLUMNFAMILY location;

CREATE TABLE location (
  id text PRIMARY KEY,
  "_docBoost" text,
  "_dynFld" text,
  location text,
  name text,
  solr_query text,
  tags text
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

CREATE INDEX solr_location__docBoost_index ON location ("_docBoost");
...
CREATE INDEX solr_location_solr_query_index ON location (solr_query);

Page 26: Adventures in Discovery with Cassandra and Solr

What Changes

Solr

• No multiValued fields

• No JOIN*

Cassandra

• No composite columns

• No counter columns

Page 27: Adventures in Discovery with Cassandra and Solr

Bringing it All Together

• Fault tolerant, available search

Page 28: Adventures in Discovery with Cassandra and Solr

“Thank you.”

Page 29: Adventures in Discovery with Cassandra and Solr

THANK YOU

[email protected] @patriciagorla @o19s

All information, including slides, is on http://github.com/pgorla/million-books

