@indeedeng: imhotep - large scale analytics and machine learning at indeed
DESCRIPTION
Video available at: https://www.youtube.com/watch?v=z4JTjUp3NC0
To scale the building of decision trees on large amounts of Indeed job search data, we created a system called Imhotep. In addition to being a crucial tool for building these machine learning models, Imhotep has proven to be applicable to many different analytics problems. The core of Imhotep is a distributed system that manages the parallel execution of queries across a set of time-sharded inverted indices. This talk covers Imhotep’s primitive operations that allow us to build decision trees, drill into data, build graphs, and even execute SQL-like queries in IQL (Imhotep Query Language). We will also discuss what makes Imhotep fast, highly available, and fault tolerant.
TRANSCRIPT
go.indeed.com/IndeedEngTalks
Imhotep Large Scale Analytics and Machine Learning at Indeed
Jeff Plaisance, Engineering Manager
I help people get jobs.
Indeed is a Search Engine for Jobs
Indeed is a data driven organization
Data driven organizations need great tools
What does Imhotep allow you to do?
● Decision Tree Building
● Analytics
Indeed’s Analytics Philosophy
Analytics systems should be:
1. Interactive
2. Not Sampled
3. Not Approximate
Imhotep answers questions
What was the weekly average query time in the last quarter from people doing the query “software”?
Imhotep answers questions
What percent of jobsearch results pages are for page 2 and beyond?
Imhotep answers questions
What are the 5 most common queries in each country?
Total Job Searches From 2014-03-09 to 2014-03-23
?
[Figure: Query, Location, Impression]
Document
query: “indeed software engineer”
location: “austin”
impressions: 10
clicks: 2
time: 2014-03-17T12:00:00
Shard
[Figure: a shard holding documents 0 through 14]
Server
[Figure: a server holding document shards for 2014/03/02, 2014/03/09, 2014/03/11, 2014/03/12, 2014/03/22, and 2014/03/24]
Cluster
[Figure: a cluster of Server A, Server B, and Server C holding the shards 2014-03-02, 2014-03-03, and 2014-03-04; a Client opens a Session against the cluster]
Total Job Searches From 2014-03-09 to 2014-03-23
secret
Total Job Searches From 2014-03-09 to 2014-03-23 Per Day
[Chart: x-axis ticks at 2014-03-09, 2014-03-16, 2014-03-23]
Metrics
● 64 bit integers
● Exactly one value per doc
● Random access by doc id
Metrics
● Time
● Clicks
● Impressions
● Revenue
● … or anything else that is a number
Groups
● Documents are placed into numbered groups
● Every document starts in group 1
● Group 0 means “filtered out”
Groups
● Groups are stateful and scoped to a session
● Regroup operations update the group for each doc in the shard
Metric Regroup
● Iterate over doc_id->metric lookup
● Set group to (value - start) / bucket_width
● Useful for making graphs (buckets on x-axis)
[Figure: buckets 1 through 5, each one bucket width wide, laid out between start and end]
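The metric regroup described on this slide can be sketched roughly as follows. This is an illustration only, not Imhotep's implementation: the function name, the list-based doc_id->metric lookup, the handling of out-of-range values, and the `+ 1` offset (added so that the first bucket does not collide with group 0, which means "filtered out") are all assumptions.

```python
def metric_regroup(metric, groups, start, end, bucket_width):
    """Return new group numbers for every doc in one shard.

    metric[doc_id] is the integer metric value of that doc.
    groups[doc_id] is the doc's current group; 0 means "filtered out".
    """
    new_groups = []
    for doc_id, value in enumerate(metric):
        if groups[doc_id] == 0 or not (start <= value < end):
            # Assumption: filtered-out docs stay out, and docs outside
            # [start, end) are filtered out as well.
            new_groups.append(0)
        else:
            # Assumption: +1 keeps the first bucket away from group 0.
            new_groups.append((value - start) // bucket_width + 1)
    return new_groups


example = metric_regroup([0, 5, 10, 14, 25], [1, 1, 1, 0, 1],
                         start=0, end=20, bucket_width=5)
```

Here docs 0, 1, and 2 land in buckets 1, 2, and 3, doc 3 was already filtered out, and doc 4 falls outside the range.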
Get Group Stats
● For each group, sums a metric for all docs in that group
Bucket By Day
1. Regroup on time metric
2. Get Group Stats for count metric (always 1)
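The two steps above can be sketched in a few lines. This is a single-shard illustration under assumptions: timestamps are Unix seconds, groups are numbered from 1, and the names `get_group_stats` and `bucket_by_day` are made up for the example rather than taken from Imhotep.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400


def get_group_stats(groups, metric):
    """Sum `metric` over the docs in each group, skipping group 0."""
    totals = defaultdict(int)
    for group, value in zip(groups, metric):
        if group != 0:
            totals[group] += value
    return dict(totals)


def bucket_by_day(times, start, end):
    # Step 1: regroup on the time metric, one bucket per day.
    # Docs outside [start, end) go to group 0 (assumption).
    groups = [
        (t - start) // SECONDS_PER_DAY + 1 if start <= t < end else 0
        for t in times
    ]
    # Step 2: Get Group Stats on the count metric, which is 1 for every doc.
    return get_group_stats(groups, [1] * len(times))


per_day = bucket_by_day([0, 10, 86400, 86401, 200000, 999999],
                        start=0, end=3 * SECONDS_PER_DAY)
```

Because the count metric is always 1, the sum per group is simply the number of documents that fell into each day.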
Total Job Searches From 2014-03-09 to 2014-03-23 Per Day
[Chart: x-axis ticks at 2014-03-09, 2014-03-16, 2014-03-23]
Total and US Job Searches From 2014-03-09 to 2014-03-23 Per Day
[Chart: x-axis ticks at 2014-03-09, 2014-03-16, 2014-03-23]
Inverted Indexes
Inverted Index
● Like index in the back of a book
● words = terms, page numbers = doc ids
● Term list is sorted
● Doc list for each term is sorted
Standard Index
doc id  query     country  impressions  clicks
0       software  Canada   10           1
1       blank     Canada   10           0
2       sales     US       5            0
3       software  US       8            1
4       blank     US       10           1
Constructing an Inverted Index
         query                  country     impressions  clicks
doc id   blank sales software   Canada US   5  8  10     0  1
0        -     -     ✔          ✔      -    -  -  ✔      -  ✔
1        ✔     -     -          ✔      -    -  -  ✔      ✔  -
2        -     ✔     -          -      ✔    ✔  -  -      ✔  -
3        -     -     ✔          -      ✔    -  ✔  -      -  ✔
4        ✔     -     -          -      ✔    -  -  ✔      -  ✔
Constructing an Inverted Index
field        term      0  1  2  3  4
query        blank     -  ✔  -  -  ✔
             sales     -  -  ✔  -  -
             software  ✔  -  -  ✔  -
country      Canada    ✔  ✔  -  -  -
             US        -  -  ✔  ✔  ✔
impressions  5         -  -  ✔  -  -
             8         -  -  -  ✔  -
             10        ✔  ✔  -  -  ✔
clicks       0         -  ✔  ✔  -  -
             1         ✔  -  -  ✔  ✔
Inverted Index
field        term      doc list
query        blank     1, 4
             sales     2
             software  0, 3
country      Canada    0, 1
             US        2, 3, 4
impressions  5         2
             8         3
             10        0, 1, 4
clicks       0         1, 2
             1         0, 3, 4
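As a rough illustration of how the standard index turns into the inverted index shown on the slides, the sketch below rebuilds the same five documents in memory. The nested-dictionary layout and the function name are assumptions made for the example; they are not how Lucene or Flamdex store data.

```python
from collections import defaultdict

# The five example documents from the "Standard Index" slide.
DOCS = [
    {"query": "software", "country": "Canada", "impressions": 10, "clicks": 1},
    {"query": "blank", "country": "Canada", "impressions": 10, "clicks": 0},
    {"query": "sales", "country": "US", "impressions": 5, "clicks": 0},
    {"query": "software", "country": "US", "impressions": 8, "clicks": 1},
    {"query": "blank", "country": "US", "impressions": 10, "clicks": 1},
]


def build_inverted_index(docs):
    """Map field -> term -> sorted list of doc ids."""
    index = defaultdict(lambda: defaultdict(list))
    # Visiting docs in increasing doc id order keeps every doc list sorted.
    for doc_id, doc in enumerate(docs):
        for field, term in doc.items():
            index[field][term].append(doc_id)
    # Sort the term list of each field.
    return {field: dict(sorted(terms.items())) for field, terms in index.items()}


INDEX = build_inverted_index(DOCS)
```

Looking up `INDEX["country"]["US"]` then returns the doc list for that term directly, without scanning the documents.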
Inverted Indexes
Allow you to:
● Quickly find all documents containing a term
● Intersect several terms to perform boolean queries
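Because each doc list is sorted, an AND of two terms can be computed with a linear two-pointer walk. The sketch below is a generic illustration of that idea, not code from Imhotep or Lucene; the function name is made up for the example.

```python
def intersect(a, b):
    """Intersect two sorted doc id lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a is behind, advance it
        else:
            j += 1  # b is behind, advance it
    return out


# country:US AND query:software, using the doc lists from the example index
us_and_software = intersect([2, 3, 4], [0, 3])
```

With the example index, only doc 3 is both in the US and a search for "software".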
Lucene
● Open source inverted index implementation
● Reasonably fast
● Widely used, well tested
Global and US Job Searches From 2014-03-09 to 2014-03-23 Per Day
[Chart: x-axis ticks at 2014-03-09, 2014-03-16, 2014-03-23]
Searches in the US only
field        term      doc list
country      Canada    0, 1
             US        2, 3, 4
Query Regroup
● Regroup all docs which do not match a boolean query to group zero
Term Regroup
Splits docs in a group into one of two new groups based on presence/absence of a term
[Figure: group 1 splits into group 2 (country:US) and group 3 (everything else)]
Multiterm Regroup
Generalization of term regroup to N terms and N+1 new groups
[Figure: group 1 splits into group 2 (country:US), group 3 (country:CA), group 4 (country:FR), and group 5 (everything else)]
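A term regroup and its multiterm generalization can be sketched with one function, since a term regroup is the N = 1 case. This is an illustration only: the function name, the parameters, and the assumption that a doc contains at most one of the given terms are all made up for the example.

```python
def multiterm_regroup(groups, doc_lists, target_group, first_new_group):
    """Split `target_group` into len(doc_lists) + 1 new groups.

    doc_lists[i] is the sorted doc list of term i. Docs containing term i
    move to first_new_group + i; every other doc of the target group moves
    to the last new group ("everything else"). Other groups are untouched.
    """
    everything_else = first_new_group + len(doc_lists)
    new_groups = [everything_else if g == target_group else g for g in groups]
    for i, doc_list in enumerate(doc_lists):
        for doc_id in doc_list:
            if groups[doc_id] == target_group:
                new_groups[doc_id] = first_new_group + i
    return new_groups


# Term regroup on country:US for the five example docs, all starting in group 1.
term_split = multiterm_regroup([1] * 5, [[2, 3, 4]],
                               target_group=1, first_new_group=2)
```

Only the doc lists of the requested terms are walked, which is what makes the inverted index a good fit for this operation.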
Total and US Job Searches From 2014-03-09 to 2014-03-23 Per Day
[Chart: x-axis ticks at 2014-03-09, 2014-03-16, 2014-03-23]
Inverted Index Compression
Size of Organic Dataset for last 5 months
● Original: 102 TB
● Inverted: 51 TB
Inverted Index Optimizations
● Compressed data structures
  ○ Better use of RAM and processor cache
  ○ Better use of memory bandwidth
  ○ Increased CPU usage and time
● Micro optimizations matter!
Delta / Varint Encoding
● Doc id lists are sorted
● Delta between a doc id and the previous doc id is sufficient
● Deltas are usually small integers
● Use fewer bits for small integers and more bits for large integers
Delta Encoding
field term doc list
query nursing 34, 86, 247, 301, 674, 714
delta encoded: 34, 52, 161, 54, 373, 40
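The delta step on this slide is small enough to show in full. The sketch below is a generic illustration with made-up function names; it encodes the "nursing" doc list from the slide and decodes it again.

```python
def delta_encode(doc_ids):
    """Replace each sorted doc id with its distance from the previous one."""
    prev = 0
    deltas = []
    for doc_id in doc_ids:
        deltas.append(doc_id - prev)
        prev = doc_id
    return deltas


def delta_decode(deltas):
    """Rebuild the doc ids with a running sum."""
    doc_ids = []
    total = 0
    for delta in deltas:
        total += delta
        doc_ids.append(total)
    return doc_ids


nursing = [34, 86, 247, 301, 674, 714]
encoded = delta_encode(nursing)
```

The first value is stored as-is (its delta from zero), and every later value shrinks to the gap from its predecessor.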
Small Integer Compression
● Golomb/Rice
● Varint
● Bit Packing
● PForDelta
Varint Encoding
9838 as a 32 bit integer:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838 as a varint, lowest 7 bits first, with the top bit of each byte set when another byte follows:
1 1 1 0 1 1 1 0
0 1 0 0 1 1 0 0
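The byte-at-a-time scheme walked through on these slides can be sketched as below. This follows the common little-endian base-128 varint layout that the slide's bit patterns match; the function names are made up for the example.

```python
def varint_encode(n):
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(0x80 | low7)  # top bit set: more bytes follow
        else:
            out.append(low7)         # top bit clear: last byte
            return bytes(out)


def varint_decode(data):
    """Decode the first varint found in `data`."""
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return value


encoded_9838 = varint_encode(9838)
```

For 9838 this produces the two bytes 11101110 and 01001100 shown on the slide, instead of the four bytes of a fixed-width 32 bit integer.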
Inverted Index Compression
Size of Organic Dataset for last 5 months
● Original: 102 TB
● Inverted: 51 TB
● Delta / Varint: 17 TB
Flamdex
● Two files per field (terms/docs)
● Can add fields without rebuilding index
● Faster varint decoding
● No TF or positions (or wasted time decoding them)
Varints
Pros:
● Compression
● Can fit more of index in RAM
● Higher information throughput per byte read from disk
Varints
Cons:
● Decodes one byte at a time
● Lots of branch mispredictions
● Not fast to decode
Vectorized Varint Decoding
01001010 11001000 01110001 01001110
10011011 01101010 10110101 00010111
01110110 10001101 10110011 11000001
pmovmskb: Extract top bit of each byte
010010100111
Lookup in 4096 entry lookup table
Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of longest varint in bytes
● number of bytes to consume
Decoding options for:
● up to twelve 1 byte varints
● six 1-2 byte varints
● four 1-3 byte varints
● two 1-5 byte varints
Vectorized Varint Decoding
● Decode six 1-2 byte varints in parallel
● Need to pad out all 1 byte varints to 2 bytes
pshufb: Intel SSSE3 instruction to shuffle bytes
Vectorized Varint Decoding
Decode 6 varints from 9 bytes:
01001010 11001000 01110001 01001110
10011011 01101010 10110101 00010111
01110110 10001101 10110011 11000001
Pad out 1 byte ints to 2 bytes:
01001010 00000000 11001000 01110001
01001110 00000000 10011011 01101010
10110101 00010111 01110110 00000000
Reverse bytes in 2 byte varints:
00000000 01001010 01110001 11001000
00000000 01001110 01101010 10011011
00010111 10110101 00000000 01110110
Mask out leading 1’s (shown in purple on the slide):
00000000 01001010 01110001 01001000
00000000 01001110 01101010 00011011
00010111 00110101 00000000 01110110
Shift top bytes of each varint 1 bit right (mask/shift/or):
00000000 01001010 00111000 11001000
00000000 01001110 00110101 00011011
00001011 10110101 00000000 01110110
● ~10 instructions
● No branches
● Less than 2 instructions per varint
● Imhotep spends ~40% of its CPU time decoding varints
● Vectorized decoder ~3-5x faster
  ○ Decompresses at 1.5 GB per second
  ○ ~2x overall system performance
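To make the sequence of steps easier to follow, here is a scalar Python walk-through of the "six 1-2 byte varints" case. It mirrors the slide's steps one value at a time, so it shows the arithmetic but none of the speed: the real decoder uses SSE instructions (pmovmskb, pshufb) and a precomputed lookup table, whereas this sketch computes the layout in a loop. The function name is made up, and the input is assumed to start with six varints of at most 2 bytes each.

```python
def decode_six_short_varints(data):
    """Decode six 1-2 byte varints; return (values, bytes_consumed)."""
    # pmovmskb step: collect the top (continuation) bit of each byte.
    top_bits = [b >> 7 for b in data]
    # Lookup-table step, computed here instead of looked up:
    # the offset and size of each of the six varints.
    layout = []
    pos = 0
    while len(layout) < 6:
        size = 2 if top_bits[pos] else 1
        layout.append((pos, size))
        pos += size
    values = []
    for offset, size in layout:
        low = data[offset]
        high = data[offset + 1] if size == 2 else 0  # pad 1 byte ints to 2 bytes
        word = (high << 8) | low                     # reverse bytes: high byte first
        word &= 0x7F7F                               # mask out the continuation bits
        # Shift the top byte 1 bit right so its 7 bits sit directly above the low 7.
        values.append(((word & 0x7F00) >> 1) | (word & 0x007F))
    return values, pos


SLIDE_BYTES = bytes([
    0b01001010, 0b11001000, 0b01110001, 0b01001110,
    0b10011011, 0b01101010, 0b10110101, 0b00010111,
    0b01110110, 0b10001101, 0b10110011, 0b11000001,
])
values, consumed = decode_six_short_varints(SLIDE_BYTES)
```

Run on the twelve bytes from the slide, this decodes six values from the first nine bytes, which matches the slide's "Decode 6 varints from 9 bytes"; the remaining three bytes would be handled by the next iteration.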
Top 5 Locations
Term Stats
atlanta 49
austin 14
boston 25
chicago 28
dallas 13
houston 36
new york 68
san francisco 54
Term Stats Iterator
● For each term in a field, sum metrics across all docs containing that term
● How do we compute this across many machines?
[Animation: three per-shard term stats streams, each sorted by term, are merged into one sorted stream by repeatedly taking the smallest remaining term and summing its stats]
Stream 1: atlanta 16, austin 3, boston 12, dallas 5
Stream 2: atlanta 12, austin 4, chicago 19, dallas 8
Stream 3: atlanta 21, austin 7, boston 13, chicago 9
Merged: atlanta 49, austin 14, boston 25, chicago 28, dallas 13
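The merge shown in the animation is a k-way merge of sorted streams in which equal terms are summed. A compact sketch using the Python standard library follows; the function name and the (term, stat) tuple layout are assumptions for illustration, and the three streams are the ones from the slides.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_term_stats(streams):
    """Merge term-sorted (term, stat) streams, summing stats of equal terms."""
    # heapq.merge lazily yields entries from all streams in term order.
    merged = heapq.merge(*streams, key=itemgetter(0))
    # Entries for the same term are now adjacent, so groupby can sum them.
    for term, entries in groupby(merged, key=itemgetter(0)):
        yield term, sum(stat for _, stat in entries)


STREAMS = [
    [("atlanta", 16), ("austin", 3), ("boston", 12), ("dallas", 5)],
    [("atlanta", 12), ("austin", 4), ("chicago", 19), ("dallas", 8)],
    [("atlanta", 21), ("austin", 7), ("boston", 13), ("chicago", 9)],
]
merged_stats = list(merge_term_stats(STREAMS))
```

Because every input is already sorted, the merge never needs to hold more than one pending entry per stream in memory.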
[Figure: merge tree. TS 1 through TS 6 merge into Term Stats 1-6; TS 1-6, TS 7-12, and TS 13-18 merge into Term Stats 1-18]
Amdahl’s Law
● The speedup of a program using multiple processors is limited by the time needed for the sequential fraction of the program
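Amdahl's Law has a standard closed form: with a sequential fraction s and N processors, the speedup is 1 / (s + (1 - s) / N). The helper below simply evaluates that formula; the function name and the sample numbers are illustrative and do not come from the talk.

```python
def amdahl_speedup(sequential_fraction, processors):
    """Upper bound on speedup when only the parallel part scales."""
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / processors)


# With 10% sequential work, 10 processors give well under a 10x speedup.
speedup_10 = amdahl_speedup(0.1, 10)
```

As the processor count grows, the speedup approaches 1 / s, which is why shrinking the sequential part of a merge matters more than adding machines.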
Amdahl’s Law
● Sequential part of FTGS is last step in merge
● Can we distribute some part of the final merge?
Hash Partition + Interleave
● Send all stats for each unique term to the same thread based on a hash of the term
● Interleave merged terms
[Animation: each per-shard stream is split by a hash of the term, so one thread receives all stats for atlanta, boston, and dallas while another receives all stats for austin and chicago; each thread merges its own terms and the results are interleaved]
Thread 1 merged: atlanta 49, boston 25, dallas 13
Thread 2 merged: austin 14, chicago 28
Interleaved: atlanta 49, austin 14, boston 25, chicago 28, dallas 13
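The hash partition and interleave idea can be sketched as follows. This is a sequential illustration of the data flow only: the partitions are plain dictionaries rather than worker threads, `zlib.crc32` stands in for whatever hash Imhotep uses, and the function name is made up.

```python
import heapq
import zlib
from collections import defaultdict


def hash_partition_merge(streams, num_partitions):
    """Sum stats per term within hash partitions, then interleave by term."""
    partitions = [defaultdict(int) for _ in range(num_partitions)]
    for stream in streams:
        for term, stat in stream:
            # Every stat for a given term lands in the same partition, so each
            # partition could be merged by its own thread with no coordination.
            bucket = zlib.crc32(term.encode("utf-8")) % num_partitions
            partitions[bucket][term] += stat
    # Each partition's output is sorted by term; interleave them into one
    # sorted stream. No summing is needed here because terms never overlap.
    sorted_parts = [sorted(p.items()) for p in partitions]
    return list(heapq.merge(*sorted_parts))


STREAMS = [
    [("atlanta", 16), ("austin", 3), ("boston", 12), ("dallas", 5)],
    [("atlanta", 12), ("austin", 4), ("chicago", 19), ("dallas", 8)],
    [("atlanta", 21), ("austin", 7), ("boston", 13), ("chicago", 9)],
]
interleaved = hash_partition_merge(STREAMS, num_partitions=2)
```

The final interleave only compares terms and never adds stats, so the sequential part of the merge does less work than in a plain k-way merge.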
Shard Distribution
● Lots of datasets for different event types
● Each dataset is split into one shard per (hour/day)
● Each shard has 2 replicas for fault tolerance
● How do we assign shards to machines?
Shard Distribution Considerations
● Space
● Load
● Hot Spots
● Adding/Removing machines
Homogeneous vs. Heterogeneous Systems
● Must decide how or if you will handle heterogeneous hardware
● Cannot balance for both space and load on heterogeneous hardware
Homogeneous vs. Heterogeneous
[Figure: two machines, one with 1 TB of storage and one with 3 TB]
Balanced for space: 12 shards on the 3 TB machine (50% capacity used) and 4 shards on the 1 TB machine (50% capacity used); the 3 TB machine becomes a read hotspot
Balanced for load: 8 shards on the 3 TB machine (33% capacity used) and 8 shards on the 1 TB machine (100% capacity used); the 3 TB machine has wasted space
Hot Spots
When accessing any subset of a dataset, evenly spread the load across CPUs, drives, network cards
This is hard
Hot Spots
Maybe random is good enough?
On average about 10% more data read from the most loaded machine than the least
Two Choice Randomized Load Balancing
● 2 replicas of each shard to choose from
● Greedily choose the machine that currently has the least load from this client
● On average about 1% more data read from the most loaded machine than the least
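The greedy two-choice rule can be sketched in a few lines. This is an illustration under assumptions: load is counted as the number of shards this client has already assigned to a machine (the talk measures data read), ties are broken at random, and the function name and input layout are made up.

```python
import random


def assign_shards(shard_replicas, seed=0):
    """Pick one of the two replicas per shard, preferring the less loaded one.

    shard_replicas maps shard id -> (machine_a, machine_b).
    Returns (choice per shard, load per machine).
    """
    rng = random.Random(seed)
    load = {}
    choice = {}
    for shard, (a, b) in shard_replicas.items():
        load_a, load_b = load.get(a, 0), load.get(b, 0)
        if load_a == load_b:
            picked = rng.choice((a, b))  # tie: either replica is fine
        else:
            picked = a if load_a < load_b else b
        load[picked] = load.get(picked, 0) + 1
        choice[shard] = picked
    return choice, load


choice, load = assign_shards({"s1": ("m1", "m2"), "s2": ("m1", "m2")})
```

In this two-shard example the second shard always goes to whichever machine did not receive the first one, so the load ends up even.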
Rendezvous Hashing
● Assignment of a shard to machines determined only by the machines that exist in the cluster
● Hash all pairs of shard ID and machine ID and pick the largest two
Rendezvous Hashing
Shard ID: organic.2014-03-02T06:00:00
H(Shard ID + m1) = 0.592624
H(Shard ID + m2) = 0.294647
H(Shard ID + m3) = 0.736681
H(Shard ID + m4) = 0.647578
H(Shard ID + m5) = 0.835598
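A minimal sketch of rendezvous hashing with two replicas is shown below. The hash function (SHA-256 over the shard ID and machine ID joined by `|`) and the function name are assumptions for illustration, so the scores it produces will not match the values on the slide; only the "hash every pair, keep the largest two" rule is taken from the talk.

```python
import hashlib


def rendezvous_assign(shard_id, machines, replicas=2):
    """Return the `replicas` machines with the largest hash for this shard."""
    def score(machine):
        digest = hashlib.sha256(f"{shard_id}|{machine}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return sorted(machines, key=score, reverse=True)[:replicas]


MACHINES = ["m1", "m2", "m3", "m4", "m5"]
SHARD = "organic.2014-03-02T06:00:00"
owners = rendezvous_assign(SHARD, MACHINES)
```

With the hash values listed on the slide, the same rule would pick m5 (0.835598) and m3 (0.736681). Any client that knows the machine list computes the same answer, and removing a machine that does not hold the shard leaves the assignment unchanged.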
Rendezvous Hashing
[Figure: the five hash values placed on a number line from 0 to 1, in increasing order m2, m1, m4, m3, m5]
Rendezvous Hashing
● No coordination required - deterministic algorithm used to determine assignment
● No centralized storage for shard to machine assignment
Rendezvous Hashing
[Figures: a sequence of diagrams whose content is not captured in this transcript]
Expected max hash for a shard is [formula not captured in this transcript]
Probability that new machine will get shard [formula not captured in this transcript]
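The formulas on these slides did not survive in the transcript. As a hedged reconstruction of the underlying argument, assume the hash values for a shard behave like independent uniform draws on [0, 1] and that the cluster has n machines before a new one is added (both assumptions, with n introduced here for illustration). The standard results are then:

```latex
\mathbb{E}\left[\max(U_1, \dots, U_n)\right]
  = \int_0^1 x \cdot n x^{n-1} \, dx
  = \frac{n}{n+1}

\Pr[\text{new machine has the largest of } n+1 \text{ hashes}] = \frac{1}{n+1}

\Pr[\text{new machine is among the largest two of } n+1 \text{ hashes}] = \frac{2}{n+1}
```

Under these assumptions, adding one machine to an n-machine cluster moves only about a 1/(n+1) share of the shard replicas, which is the property that makes adding and removing machines cheap.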
Imhotep answers questions
What was the weekly average query time in the last quarter from people doing the query “software”?
1. Query Regroup on query:software
2. Metric Regroup on time, width 7 days
3. Get Group Stats on query time and count, divide after summing
Ramses
Imhotep answers questions
What percent of jobsearch results pages are for page 2 and beyond?
1. Get Group Stats on count
2. Query Regroup on “-page:1”
3. Get Group Stats on count
4. Divide -page:1 count by total count
Ramses
Imhotep answers questions
What are the 5 most common queries in each country?
1. Multiterm Regroup on all values of country
2. Term Group Stats Iteration on query
IQL
select count()
from jobsearch ‘2014-01-01’ ‘2014-03-26’
group by country, query[5]
● select count(): Metrics
● from jobsearch ‘2014-01-01’ ‘2014-03-26’: Dataset
● group by country: Regroup
● query[5]: Term Group Stats
Imhotep
Large Scale Analytics and Machine Learning
● Varint Decoding: High Performance Vector Instructions
● Stream Merging: Hash Partition + Interleave
● Shard Distribution: Rendezvous Hashing
We’re Open Sourcing Imhotep
How You Can Use Imhotep
Data Ingestion
● TSV Uploader
● Hadoop
Data Access
● Imhotep Primitives
● IQL
Next @IndeedEng Talk
Large Scale Interactive Analytics with Imhotep
Tom Bergman, Product Manager
Zak Cocos, Manager of Marketing Sciences
April 30, 2014
http://engineering.indeed.com/talks
Q&A
More Questions?
David James