adventures in discovery with cassandra and solr

29
#CASSANDRAEU CASSANDRASUMMITEU Adventures in Discoverability with C* and Solr Patricia Gorla, Systems Engineer @patriciagorla @o19s

Upload: patricia-gorla

Post on 08-May-2015

1.598 views

Category:

Technology


2 download

DESCRIPTION

For any venture, storing your data is just the first step in making sense of it. How do you make your system discoverable? How do you tune your relevancy to accommodate real-time updates? In this session, we explore pairing Cassandra with Solr using Datastax Enterprise Search, and look at different search paradigms to help your users find patterns in your data.

TRANSCRIPT

Page 1: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

Adventures in Discoverability with C* and Solr

Patricia Gorla, Systems Engineer@patriciagorla@o19s

Page 2: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

• Solr

• Cassandra

• Information retrieval

About Me

Paul Hostetler - phostetler.com

Page 3: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

How Do I Find What I’m Looking For?

Simple Complex

Aristotle’s birthplace? All ancient Greek philosophers?

Coordinates of Stagira? All cities within 100km of Stagira

Page 4: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

How Do I Find What I’m Looking For?

Simple Complex

Aristotle’s birthplace? All ancient Greek philosophers?

Coordinates of Stagira? All cities within 100km of Stagira

select birthPlacewhere name = “Aristotle”;

select coordwhere name = “Stagira”;

Page 5: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

How Do I Find What I’m Looking For?

Simple Complex

Aristotle’s birthplace? All ancient Greek philosophers?

Coordinates of Stagira? All cities within 100km of Stagira

select birthPlacewhere name = “Aristotle”;

create index on tag;

select *where tag = “Greek philosophy”;

select coordwhere name = “Stagira”;

???

Page 6: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

How Do I Find What I’m Looking For?

Simple

Aristotle’s birthplace? All ancient Greek philosophers?

Coordinates of Stagira? All cities within 100km of Stagira

q=Aristotle&fl=birthPlace q=Greek philosophy

q=Stagira&fl=point q=*:*&fq={!geofilt pt=40.530, 23.752 sfield=point d=100}

Page 7: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

• Google Site Search

• MySQL ‘like’ statements

Approaches to Search

Seth Casteel - littlefriendsphoto.com

Page 8: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

Approaches to Search

• Full-text search

• Ranking (Scoring)

• Tokenization

• Stemming

• Faceting

Page 9: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

Approaches to Search

• Full-text search

• Ranking (Scoring)

• Tokenization

• Stemming

• Facetingand many more!

Page 10: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

[1] Pleasure in the job puts perfection in the work.

[2] Education is the best provision for the journey to old age.

[3] If some animals are good at hunting and others are suitable for hunting, then the gods must clearly smile on hunting.

[4] It is the mark of an educated mind to be able to entertain a thought without absorbing it.

Inverted Index

Term Freq Documents

education 2 [2] [4]

hunting 3 [3]

perfection 1 [1]

Page 11: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

<fieldType name="text_general" class="solr.TextField" > <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true /> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.EnglishPossessiveFilterFactory"/> <filter class="solr.PorterStemFilterFactory"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/> <filter class="solr.StopFilterFactory” ignoreCase="true"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.PorterStemFilterFactory"/> </analyzer></fieldType>

Defining Field Types

Page 12: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

Index-side Analysis

Hope is a waking dream

Hope is a waking dream.

Hope waking dream

hope wake dream

hope waking dream

Punctuation

Stop Words

Lowercase

Stemming

Page 13: Adventures in Discovery with Cassandra and Solr

Hope is a waking dream

Hope is a waking dream.

Hope waking dream

hope wake dream

hope waking dream

Punctuation

Stop Words

Lowercase

Stemming

hope wake dreamdesire awake wish

Synonyms

#CASSANDRAEU CASSANDRASUMMITEU

Query-side Analysis

Page 14: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

Facetingfacet_fields: { tags: [hunting: 1, education: 2, work: 2], locations: [Stagira: 5, Chalcis: 3]}

Page 15: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

• Pay Google more $$

• MySQL ‘like’ shards

• Master/Slave replication

• SolrCloud

Approaches to Distribution

Page 16: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

“Distributed search is hard.”

Page 17: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

• Full-text search

• Tokenization

• Stemming

• Date ranges

• Aggregation

• High Availability

• Distributed Nature

Solr + Cassandra: Datastax Enterprise

Page 18: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

Person

Examining DBpedia.org Datasets

<http://xmlns.com/foaf/0.1/name> "Aristotle"@en .<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> Person .<http://purl.org/dc/elements/1.1/description> "Greek philosopher"@en .<http://dbpedia.org/ontology/birthPlace> Stagira<http://dbpedia.org/ontology/deathPlace> Chalcis .

<http://dbpedia.org/resource/Stagira> <http://www.opengis.net/gml/_Feature> .<http://dbpedia.org/resource/Stagira#lat> "40.5916667" .<http://dbpedia.org/resource/Stagira#long> "23.7947222" .<http://dbpedia.org/resource/Stagira#point> "40.591667 23.7947222"@en .

Place

Page 19: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

“Love is a single soul inhabiting two bodies.”

Page 20: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

curl http://localhost:8983/solr/solr.person/q=Aristotle

Querying for data

Page 21: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

curl http://localhost:8983/solr/solr.location/select?q=*:* &spatial=true&fq={!geofilt pt=40.53027,23.7525 sfield=point d=100}

Filtering by location

Page 22: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

“There is no great genius without a mixture of madness.”

Page 23: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

Unified Schema<fields>

<field name="id" type="string" />

<field name="name" type="text" />

<dynamicField name="*Date" type="date" />

<dynamicField name="*Place" type="text" />

<dynamicField name="*Point" type="location" />

<dynamicField name="*_tag" type="text" />

</fields>

Schema.xml

Page 24: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

curl http://localhost:8983/solr/resource/solr.location/schema.xml \

--data-binary @solr/location_schema.xml \

-H 'Content-type:text/xml; charset=utf-8 '

curl http://localhost:8983/solr/resource/solr.location/solrconfig.xml \

--data-binary @solr/location_solrconfig.xml \

-H 'Content-type:text/xml; charset=utf-8 '

curl http://localhost:8983/solr/admin/cores?action=CREATE&name=solr.location

Upload to DSE, Create Core

Page 25: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

Location Schemacqlsh:solr> DESC COLUMNFAMILY location;

CREATE TABLE location (

id text PRIMARY KEY,

"_docBoost" text,

"_dynFld" text,

location text,

name text,

solr_query text,

tags text

) WITH COMPACT STORAGE AND

bloom_filter_fp_chance=0.010000 AND

caching='KEYS_ONLY' AND

comment='' AND

dclocal_read_repair_chance=0.000000 AND

gc_grace_seconds=864000 AND

read_repair_chance=0.100000 AND

replicate_on_write='true' AND

populate_io_cache_on_flush='false' AND

compaction={'class': 'SizeTieredCompactionStrategy'} AND

compression={'sstable_compression': 'SnappyCompressor'};

CREATE INDEX solr_location__docBoost_index ON location ("_docBoost");

...

CREATE INDEX solr_location_solr_query_index ON location (solr_query);

Cassandra Schema

Page 26: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

Solr

• No multiValued fields

• No JOIN*

What Changes

Cassandra

• No composite columns

• No counter columns

Page 27: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

• Fault tolerant, available search

Bringing it All Together

Page 28: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

“Thank you.”

Page 29: Adventures in Discovery with Cassandra and Solr

#CASSANDRAEU CASSANDRASUMMITEU

THANK YOU

[email protected]@patriciagorla@o19sAll information, including slides, are on http://github.com/pgorla/million-books