Riak Search - Berlin Buzzwords 2010
DESCRIPTION
Riak Search is a distributed data indexing and search platform built on top of Riak. The talk will introduce Riak Search, covering overall goals, architecture, and core functionality.
TRANSCRIPT
![Page 1: Riak Search - Berlin Buzzwords 2010](https://reader033.vdocuments.mx/reader033/viewer/2022051611/54b72c9a4a795903318b45e5/html5/thumbnails/1.jpg)
Berlin Buzzwords · June 2010
Basho Technologies
Rusty Klophaus - @rklophaus
Riak Search: A Full-Text Search and Indexing Engine based on Riak
Why did we build it?
What are the major goals?
How does it work?
Part One: Why did we build Riak Search?
Riak is a scalable, highly available, networked, open-source key/value store.
Writing to a Key/Value Store
CLIENT → RIAK: Key/Value
CLIENT → RIAK: Object
Querying a Key/Value Store
CLIENT → RIAK: Key
RIAK → CLIENT: Object
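A key/value store's interface is essentially "store this object under this key" and "give me the object for this key". A minimal in-memory sketch in Python, assuming a dict stands in for Riak; the `KVStore` class and its methods are hypothetical, not Riak's client API:

```python
class KVStore:
    """Toy in-memory key/value store (hypothetical stand-in for Riak)."""

    def __init__(self):
        self._data = {}

    def put(self, key, obj):
        # Write: the client sends a key and an object; the store keeps them together.
        self._data[key] = obj

    def get(self, key):
        # Query: the client sends a key and gets the object back (None if absent).
        return self._data.get(key)


store = KVStore()
store.put("products/1", {"brand": "Converse", "category": "Shoes"})
print(store.get("products/1"))  # {'brand': 'Converse', 'category': 'Shoes'}
```

Every lookup starts from a key; even the richer query styles (link walking, map/reduce) still begin with one or more known keys.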
Querying Riak via Link Walking
CLIENT → RIAK: Key + Instructions
RIAK: Walk to Related Keys
RIAK → CLIENT: Object(s)
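Link walking starts from a known key and follows links stored on each object to reach related keys. A sketch under simplifying assumptions: each hypothetical object carries a plain list of linked keys, and the walk follows every link for a fixed number of hops. Riak's links also carry a bucket and a tag that a walk can filter on, which this sketch omits.

```python
# Hypothetical objects; "links" holds the keys of related objects.
objects = {
    "person/alice": {"name": "Alice", "links": ["person/bob", "person/carol"]},
    "person/bob": {"name": "Bob", "links": ["person/dave"]},
    "person/carol": {"name": "Carol", "links": []},
    "person/dave": {"name": "Dave", "links": []},
}


def link_walk(store, start_key, hops):
    """Follow links `hops` times from start_key and return the objects reached."""
    frontier = [start_key]
    for _ in range(hops):
        next_keys = []
        for key in frontier:
            # Each object on the current frontier contributes its linked keys.
            next_keys.extend(store[key]["links"])
        frontier = next_keys
    return [store[key] for key in frontier]


print([o["name"] for o in link_walk(objects, "person/alice", 1)])  # ['Bob', 'Carol']
print([o["name"] for o in link_walk(objects, "person/alice", 2)])  # ['Dave']
```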
Querying Riak via Map/Reduce
CLIENT → RIAK: Key(s) + JS Functions
RIAK: Map and Reduce phases
RIAK → CLIENT: Computed Value(s)
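With map/reduce, the client sends keys plus functions, Riak runs the functions next to the data, and only the computed result comes back. The slides say the functions are JavaScript; this sketch uses Python functions over a hypothetical in-memory dataset to show the shape of the computation. In Riak, reduce phases operate on lists and may be re-applied to partial results, whereas this sketch simply folds once.

```python
from functools import reduce

# Hypothetical objects stored under keys.
objects = {
    "orders/1": {"item": "shoes", "qty": 2},
    "orders/2": {"item": "socks", "qty": 5},
    "orders/3": {"item": "shoes", "qty": 1},
}


def map_shoe_qty(obj):
    # Map: emit zero or more values per object (here, the quantity of shoe orders).
    return [obj["qty"]] if obj["item"] == "shoes" else []


def reduce_sum(acc, value):
    # Reduce: combine the mapped values into one computed value.
    return acc + value


def map_reduce(store, keys, map_fn, reduce_fn, initial):
    # Run the map function on every input object, then fold the results.
    mapped = [value for key in keys for value in map_fn(store[key])]
    return reduce(reduce_fn, mapped, initial)


total = map_reduce(objects, ["orders/1", "orders/2", "orders/3"], map_shoe_qty, reduce_sum, 0)
print(total)  # 3
```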
Key/Value Stores like Key-Based Queries
Query by Secondary Index
CLIENT → RIAK: where Category == "Shoes"
RIAK: "WTF!? I'm a KV store!"
Full-Text Query
CLIENT → RIAK: "Converse AND Shoes"
RIAK: "This is getting old."
These kinds of queries need an Index.
*Market Opportunity!*
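The trouble with `where Category == "Shoes"` on a pure key/value store is that nothing maps a field value back to keys, so the only option is to scan every object. An index inverts that relationship once, turning the query into a lookup. A sketch of both approaches; the field name comes from the slide, while the objects and helper names are hypothetical:

```python
from collections import defaultdict

# Hypothetical objects stored under keys.
objects = {
    "products/1": {"brand": "Converse", "category": "Shoes"},
    "products/2": {"brand": "Levi's", "category": "Jeans"},
    "products/3": {"brand": "Vans", "category": "Shoes"},
}


def scan_query(store, field, value):
    # Without an index: touch every object in the store, O(number of objects).
    return sorted(key for key, obj in store.items() if obj.get(field) == value)


def build_index(store, field):
    # A secondary index maps each field value to the set of keys holding it.
    index = defaultdict(set)
    for key, obj in store.items():
        if field in obj:
            index[obj[field]].add(key)
    return index


category_index = build_index(objects, "category")
print(scan_query(objects, "category", "Shoes"))  # ['products/1', 'products/3']
print(sorted(category_index["Shoes"]))  # ['products/1', 'products/3']
```

A full-text query such as "Converse AND Shoes" needs the same trick applied to every word of every object, which is what an inverted index provides.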
Part Two: What are the major goals of Riak Search?
An application built on Riak.
Your Application → Riak
Hrm... I need an index.
Your Application → Riak + Index, writing each Object to both
Hrm... I need an index with more features.
Your Application → Riak + ???
Lucene should do the trick...
Your Application → Riak + Lucene
...shard to add more storage capacity...
Your Application → Riak + Lucene, Lucene, Lucene (3 shards)
...replicate to add more throughput.
Your Application → Riak + 9 Lucene instances (3 shards × 3 copies)
Operations nightmare!
What do we really want?
Your Application → Riak + Riak-ified Lucene
Your Application → Riak + Riak Search
Functionality? Be like Lucene (and more).
• Lucene Syntax
• Leverages Java Lucene Analyzers
• Solr Endpoints
• Integration via Riak Post-Commit Hook (Index)
• Integration via Riak Map/Reduce (Query)
• Near-Realtime
• Schema-less
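Integrating through a post-commit hook means indexing happens as a side effect of each successful write, so a document becomes searchable right after it is stored. A minimal sketch of that idea, assuming a hypothetical store that calls registered functions after every `put`; Riak's actual hook configuration and Riak Search's Lucene-based analyzers are not shown on the slides, and the regex tokenizer is only a placeholder:

```python
import re
from collections import defaultdict


class HookedStore:
    """Toy key/value store that runs callbacks after each write (hypothetical)."""

    def __init__(self):
        self.data = {}
        self.post_commit_hooks = []

    def put(self, key, text):
        self.data[key] = text
        # After the write succeeds, notify every registered hook.
        for hook in self.post_commit_hooks:
            hook(key, text)


index = defaultdict(set)  # term -> set of keys


def index_hook(key, text):
    # Placeholder analyzer: lowercase, keep runs of letters as terms.
    for term in re.findall(r"[a-z]+", text.lower()):
        index[term].add(key)


store = HookedStore()
store.post_commit_hooks.append(index_hook)
store.put("p/1", "Converse Shoes, size 9")
store.put("p/2", "Running shoes")
print(sorted(index["shoes"]))  # ['p/1', 'p/2']
```

This sketch only adds terms; handling an overwrite or delete would also require removing the old value's terms from the index.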
Operations? Be like Riak.
• No special nodes
• Add nodes, get more compute and storage
• Automatically load balance
• Replicas for durability and performance
• Index and query in parallel
• Swappable storage backends
Part Three: How do we do it?
A Gentle Introduction to Document Indexing
The Inverted Index
Document #1: Every dog has his day.
Inverted Index:
day, 1
dog, 1
every, 1
has, 1
his, 1
The Inverted Index
Documents:
#1: Every dog has his day.
#2: The dog's bark is worse than his bite.
#3: Let the cat out of the bag.
#4: It's raining cats and dogs.
Combined Inverted Index:
and, 4
bag, 3
bark, 2
bite, 2
cat, 3
cat, 4
day, 1
dog, 1
dog, 2
dog, 4
every, 1
has, 1
...
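Building the combined inverted index is one pass over the documents: analyze each into terms, then add its ID to each term's posting list. A Python sketch using the four documents from the slide. The slide folds "dog's", "dogs" and "cats" into "dog" and "cat"; Riak Search leaves that kind of normalization to Lucene analyzers, so the `analyze` function below is a deliberately crude stand-in, not their behavior:

```python
import re
from collections import defaultdict

documents = {
    1: "Every dog has his day.",
    2: "The dog's bark is worse than his bite.",
    3: "Let the cat out of the bag.",
    4: "It's raining cats and dogs.",
}


def analyze(text):
    """Crude analyzer: lowercase, drop a trailing "'s", strip a plural "s"."""
    terms = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word.endswith("'s"):
            word = word[:-2]
        word = word.replace("'", "")
        # Naive plural stemming; the length check keeps "has", "his", "is" intact.
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        if word:
            terms.append(word)
    return terms


def build_inverted_index(docs):
    # Map each term to the sorted list of document IDs containing it (its posting list).
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


index = build_inverted_index(documents)
for term in ["and", "bag", "bark", "bite", "cat", "day", "dog", "every", "has"]:
    print(term, index[term])
```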
At Query Time...
"dog AND cat" is parsed into a query tree: AND(dog, cat)
Each term is looked up in the index: dog → 1, 2, 4; cat → 3, 4
AND (Merge Intersection) of 1, 2, 4 and 3, 4 → Result: 4
OR (Merge Union) of 1, 2, 4 and 3, 4 → Result: 1, 2, 3, 4
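Because each posting list is kept sorted by document ID, both operators are a single linear merge over the two lists. A Python sketch of the merge intersection (AND) and merge union (OR) from the slides, evaluated over a query tree; the tree is built by hand because parsing is out of scope, and the posting lists are the ones from the example:

```python
def merge_intersection(a, b):
    """AND: walk two sorted posting lists in step, keeping IDs found in both."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def merge_union(a, b):
    """OR: walk two sorted posting lists in step, emitting every ID once."""
    i = j = 0
    result = []
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i] < b[j]):
            result.append(a[i])
            i += 1
        elif i == len(a) or b[j] < a[i]:
            result.append(b[j])
            j += 1
        else:  # equal IDs: emit once and advance both lists
            result.append(a[i])
            i += 1
            j += 1
    return result


# Posting lists from the slide's example.
postings = {"dog": [1, 2, 4], "cat": [3, 4]}


def evaluate(node):
    # A node is either a term (str) or a tuple (operator, left, right).
    if isinstance(node, str):
        return postings.get(node, [])
    op, left, right = node
    merge = merge_intersection if op == "AND" else merge_union
    return merge(evaluate(left), evaluate(right))


print(evaluate(("AND", "dog", "cat")))  # [4]
print(evaluate(("OR", "dog", "cat")))  # [1, 2, 3, 4]
```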
Complex Behavior from Simple Structures
Storage Approaches...
Riak Search uses Consistent Hashing to store data on Partitions.
Introduction to Consistent Hashing and Partitions
Partitions = 10
Number of Nodes = 5
Partitions per Node = 2
Replicas (NVal) = 2
Diagram: an Object mapped onto the partitions.
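A sketch of the slide's setup: 10 partitions on a ring, claimed by 5 nodes (2 each), with each object stored on NVal = 2 partitions. It assumes keys are hashed with SHA-1 onto a 2^160 ring split into equal partitions, that nodes claim partitions round-robin, and that replicas go to the next partitions clockwise; the names are hypothetical, and Riak's own claim algorithm is more involved than round-robin.

```python
import hashlib

RING_SIZE = 2 ** 160  # size of the SHA-1 output space
NUM_PARTITIONS = 10
NODES = ["node1", "node2", "node3", "node4", "node5"]
N_VAL = 2  # replicas per object

# Round-robin claim: partition i belongs to NODES[i % 5], so each node owns 2 partitions.
owners = [NODES[i % len(NODES)] for i in range(NUM_PARTITIONS)]


def partition_for(key):
    """Hash the key onto the ring and return the index of the partition it falls in."""
    position = int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")
    return position * NUM_PARTITIONS // RING_SIZE


def preference_list(key):
    """The key's partition plus the next N_VAL - 1 partitions clockwise hold replicas."""
    first = partition_for(key)
    partitions = [(first + i) % NUM_PARTITIONS for i in range(N_VAL)]
    return [(p, owners[p]) for p in partitions]


print(preference_list("shoes/converse-all-star"))
```

With round-robin claiming, consecutive partitions belong to different nodes, so the two replicas of an object land on two different machines.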
Document Partitioning vs. Term Partitioning
...and the Resulting Tradeoffs
Document Partitioning @ Index Time
Document #1: Every dog has his day.
Document Partitioning @ Query Time
Query: "dog OR cat"
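Under document partitioning, all of a document's terms are indexed on the partition that owns that document, so each partition holds a small, complete inverted index over its own documents, and every query must be sent to every partition. A sketch with a hypothetical 4-partition layout that hashes the document ID to pick the partition and uses a naive word tokenizer:

```python
import hashlib
import re
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical


def partition_of(value):
    # Hash the document ID to choose the partition that owns the whole document.
    digest = hashlib.sha1(str(value).encode()).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


# One local inverted index per partition: term -> set of document IDs.
partitions = [defaultdict(set) for _ in range(NUM_PARTITIONS)]


def index_document(doc_id, text):
    # Every term of this document is written to the same partition.
    local = partitions[partition_of(doc_id)]
    for term in re.findall(r"[a-z]+", text.lower()):
        local[term].add(doc_id)


def query_or(*terms):
    # Scatter to every partition, then gather and merge the partial results.
    results, contacted = set(), 0
    for local in partitions:
        contacted += 1
        for term in terms:
            results |= local.get(term, set())
    return sorted(results), contacted


documents = {
    1: "Every dog has his day.",
    2: "The dog's bark is worse than his bite.",
    3: "Let the cat out of the bag.",
}
for doc_id, text in documents.items():
    index_document(doc_id, text)

print(query_or("dog", "cat"))  # ([1, 2, 3], 4)
```

Each partition does a little work per query (in parallel in a real system, sequentially here), which keeps latency low but ties up the whole cluster for every query: the lower-throughput side of the tradeoff.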
Term Partitioning @ Index Time
Document #1: Every dog has his day.
day, 1
dog, 1
every, 1
has, 1
his, 1
Each posting is stored on the partition that owns its term: day, 1 · has, 1 · every, 1 · his, 1 · dog, 1
Term Partitioning @ Query Time
Query: "dog OR cat"
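Under term partitioning, each posting goes to the partition that owns its term, so a single document's terms scatter across the ring, while a query only contacts the partitions that own its terms. A sketch with a hypothetical 4-partition layout that hashes each term to pick its partition and uses a naive word tokenizer:

```python
import hashlib
import re
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical


def partition_of(term):
    # Hash the term (not the document) to choose the partition owning its postings.
    digest = hashlib.sha1(term.encode()).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


# Each partition holds complete posting lists for the terms that hash to it.
partitions = [defaultdict(set) for _ in range(NUM_PARTITIONS)]


def index_document(doc_id, text):
    # One document's postings fan out to potentially many partitions.
    for term in set(re.findall(r"[a-z]+", text.lower())):
        partitions[partition_of(term)][term].add(doc_id)


def query_or(*terms):
    # Contact only the partitions owning the query terms, then union their lists.
    contacted = {partition_of(term) for term in terms}
    results = set()
    for term in terms:
        results |= partitions[partition_of(term)].get(term, set())
    return sorted(results), len(contacted)


documents = {
    1: "Every dog has his day.",
    2: "The dog's bark is worse than his bite.",
    3: "Let the cat out of the bag.",
}
for doc_id, text in documents.items():
    index_document(doc_id, text)

results, contacted = query_or("dog", "cat")
print(results)  # [1, 2, 3]
print(contacted)  # 1 or 2, depending on how "dog" and "cat" hash
```

The costs show up elsewhere: each write touches many partitions, whole posting lists must travel to wherever they are merged (higher latency for long lists), and a very common term such as "Obama" lands entirely on one partition, creating the hotspot listed in the tradeoffs.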
Tradeoffs...
Document Partitioning:
+ Lower Latency Queries
- Lower Throughput
- Lots of Disk Seeks
Term Partitioning:
- Higher Latency Queries
+ Higher Throughput
- Hotspots in Ring (the "Obama" problem)
Riak Search: Term Partitioning
Term partitioning is the most viable approach for our beta clients’ needs: high throughput on Really Big Datasets.
Optimizations:
• Term splitting to reduce hot spots
• Bloom filters & caching to save query-time bandwidth
• Batching to save query-time & index-time bandwidth
Support for either approach eventually.
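The slides only name Bloom filters as a bandwidth optimization. One common way a Bloom filter saves bandwidth in a term-partitioned AND query: the partition holding one term ships a compact filter of its document IDs, and the other partition sends back only postings that might match. A sketch of that generic technique (a Bloom-filter semi-join); it is an assumption that Riak Search applies Bloom filters this way, and the class below is hypothetical:

```python
import hashlib


class BloomFilter:
    """Bit-array membership test: no false negatives, some false positives."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, item):
        # Derive k bit positions from salted SHA-1 digests of the item.
        for seed in range(self.num_hashes):
            digest = hashlib.sha1(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# Partition owning "cat" builds a filter of its doc IDs and ships it (cheap to send).
cat_postings = [3, 4]
dog_postings = [1, 2, 4]
bloom = BloomFilter()
for doc_id in cat_postings:
    bloom.add(doc_id)

# Partition owning "dog" returns only postings that may also contain "cat".
candidates = [d for d in dog_postings if bloom.might_contain(d)]
# An exact intersection at the end removes any false positives.
result = sorted(set(candidates) & set(cat_postings))
print(result)  # [4]
```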
Part Four: Review
Riak Search turns this...
CLIENT → RIAK: "Converse AND Shoes"
RIAK: "WTF!? I'm a KV store!"
...into this...
CLIENT → RIAK: "Converse AND Shoes"
RIAK: "Gladly!"
RIAK → CLIENT: Keys or Objects
...while keeping operations easy.
Your Application → Riak + Riak Search
Thanks! Questions?
Search Team:
John Muellerleile - @jrecursive
Rusty Klophaus - @rklophaus
Kevin Smith - @kevsmith
Currently working with a small set of Beta users.
Open-source release planned for Q3.
www.basho.com