case studies session 2
TRANSCRIPT
Blackbird: Billions of rows, a couple of milliseconds away
Ishan Chhabra, Shrijeet Paliwal, Abhijit Pol
[Figure: real-time bidding illustration with sample bid prices; bids are informed by signals such as Site/Page, Geo/Weather, Time of Day, Brand Affinity, and User Segments]
[Figure: "Simple View of Rocket Fuel Platform". Flow: 1. Page Request, 2. Ad Request, 3. Bid Request, 4. Bid & Ad, 5. Rocket Fuel Winning Ad, 6. Ad Served. Components: Browser, Publishers, Exchange Partners, Data Partners, Real-time Bidder, Model Scoring, User DataStore, User Engagements, Optimize]
So what is Blackbird?
Requests per day:
Facebook likes: 5 B
Searches on Google: 6 B
Bid requests considered by Rocketfuel: 45 B
Blink of an eye: 400 ms
SF to Tokyo network round trip: 100 ms
One beat of a hummingbird's wing: 20 ms
Look up in Blackbird: 2 ms
Powered by
HBase, we have a problem..
Object NoSQL Mapper
List<KeyValue>
High Performance Collections
With a naive mapper (the whole collection serialized into one cell):
» Data loss on concurrent modification
» Read per write
» High amount of data per write
» O(n) writes

With append-only high performance collections:
» Significantly reduced flushes, compaction, network usage, GC
» O(1) writes
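The contrast above can be sketched in a few lines. This is a toy in-memory model of one HBase row (a dict of qualifier to value), not the real client API: the naive path reads and rewrites the whole collection on every add, while the append-only path blind-writes a single new cell under a random qualifier.

```python
import uuid

# Toy stand-in for one HBase row: qualifier -> value.
row = {}

def naive_add(row, element):
    """Read-modify-write on one combined cell: a read per write, O(n)
    data rewritten each time, and lost updates under concurrent writers."""
    items = row.get("c1:combined", [])      # read per write
    row["c1:combined"] = items + [element]  # rewrite the whole collection

def append_only_add(row, element):
    """Blind write of one new cell under a random qualifier: O(1), no read,
    and concurrent writers never overwrite each other."""
    row["c1:" + uuid.uuid4().hex] = element

for e in ["a", "b", "c"]:
    append_only_add(row, e)

# The logical collection is the union of all cells in the family.
print(sorted(row.values()))  # -> ['a', 'b', 'c']
```

Each `append_only_add` becomes one small Put on the wire, which is what makes the reduced flushes, compaction, and GC pressure possible.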
Append Only, the HBase view

Logical Collection:
c1:combined: 100 entries (the combined column)
c1:rand1: 1 entry
c1:rand2: 2 entries
c1:rand3: 1 entry
Optimizing reads using normalization

Before normalization:
c1:combined: 100 entries (the combined column)
c1:rand1: 1 entry
c1:rand2: 2 entries
c1:rand3: 1 entry

After normalization:
c1:combined: 103 entries (the combined column)
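A minimal sketch of the normalization idea, with illustrative numbers (again a toy dict standing in for an HBase row, not actual Blackbird code): reads must merge the combined cell with every stray append cell, so a background pass periodically folds the append cells back into one combined cell.

```python
# Toy row state: the combined cell plus append cells written since the
# last normalization pass (illustrative counts).
row = {
    "c1:combined": list(range(100)),  # combined column: 100 entries
    "c1:rand1": [100],                # 1 appended entry
    "c1:rand2": [101, 102],           # 2 appended entries
}

def read_collection(row):
    """A read must merge the combined cell with every append cell."""
    items = []
    for value in row.values():
        items.extend(value)
    return items

def normalize(row):
    """Fold all append cells back into one combined cell so later reads
    touch a single cell again."""
    merged = read_collection(row)
    row.clear()
    row["c1:combined"] = merged

normalize(row)
print(len(row["c1:combined"]))  # -> 103
print(len(row))                 # -> 1
```

Writes stay O(1) blind appends between passes; normalization just restores the cheap single-cell read path.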
.filter(λ), .transform(⨍)
Secondary Indexes
High Throughput, Low Latency Lookups
Not so easy!
HBase is designed for high throughput writes
Key Ideas
Read as little as possible
Stay stable, uniform, data local
Don’t go to disk
Even if you have to go to disk, make it fast
Protobufs, Protobufs, everywhere
Stay stable, uniform, data local at all times
Good quality hardware
Properly designed row keys
Off peak daily major compaction
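One common way to get "properly designed row keys" that stay uniform and stable is a hash-bucket (salted) prefix; the talk does not spell out Blackbird's exact key layout, so the scheme below is an illustrative assumption:

```python
import hashlib

def salted_row_key(user_id, buckets=32):
    """Prefix the key with a stable hash bucket so monotonically increasing
    ids spread uniformly across region servers instead of hotspotting one
    region. (Illustrative scheme, not the talk's actual key layout.)"""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % buckets
    return "%02d:%s" % (bucket, user_id)

# Sequential ids land in many different buckets:
keys = [salted_row_key("user%07d" % i) for i in range(1000)]
buckets_hit = {k.split(":")[0] for k in keys}
print(len(buckets_hit))  # close to 32 distinct prefixes
```

The bucket is derived from the id itself, so point lookups remain a single deterministic Get while the write load stays evenly spread.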
Give me all your Cache!
128 GB machines with 50% block cache
High cache hit ratio (90%+) by effective utilization
It’s time to disk(o)
15K SAS drives
Local & short-circuit reads (20-30% improvement)
High throughput writes are supported too!
Small Writes
• Append Only
• Protobufs
Large Memstores
• 4 GB
• Avoids flushes, memory churn, compaction
• Maintains read performance by avoiding multiple seeks
Tuned Compaction
• Avoid Minor compactions
• Off Peak Major compaction
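The memstore and compaction bullets above map to a handful of stock HBase settings. The fragment below is an illustrative `hbase-site.xml` sketch with assumed values, not the talk's actual configuration:

```xml
<!-- Illustrative hbase-site.xml fragment (assumed values): large memstores,
     minor compactions discouraged, and periodic major compactions disabled
     so they can instead be triggered off-peak. -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>4294967296</value> <!-- 4 GB memstore before flushing -->
</property>
<property>
  <name>hbase.hstore.compaction.min</name>
  <value>10</value> <!-- require many store files before a minor compaction -->
</property>
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value> <!-- disable time-based major compaction; run it off-peak -->
</property>
```

Larger memstores plus small append-only writes mean flushes are rare and produce well-formed files, so aggressive minor compaction buys little.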
Reliability & Availability
Organize the chaos or pay the cost..
» Blind writes can grow rows & table too big
» Newbie clients 'guess' a lot
» Simple queries such as row counts are hard to answer on the fly
Be aware…
Web app
Bid serving
Ad serving
Data augmentation
Batch data pipelines
Ops housekeeping
Real-time data pipelines
Multitenant Blackbird
Multi-tenancy makes it hard to find the offender
Use ACLs & client side metrics in all access paths
Draft guidelines for new clients, help them estimate the growth
Keep track of growth, row count, row size, column size etc.
Maintaining SLA Guarantees
It’s a delicate equilibrium that is hard to maintain
Shield it with aggressive alerting, dashboards & canary monitoring
1st region server dies after several hours of a clogged RPC queue
Bad region moves to another region server & soon kills it too!
Clients can go rogue; it can get as bad as a DoS attack
Protection via dynamic blacklists & size limit filters
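A sketch of those two guards together (assumed names and thresholds, not the talk's implementation): a per-request size limit, and a dynamic blacklist that starts refusing a client once it trips the limit.

```python
# Guard against rogue clients: cap the size of any single request and
# dynamically blacklist clients that exceed it. (Illustrative sketch.)
MAX_CELLS_PER_REQUEST = 10_000
blacklist = set()

def admit(client_id, requested_cells):
    """Return True if the request may proceed."""
    if client_id in blacklist:
        return False                        # already flagged as rogue
    if requested_cells > MAX_CELLS_PER_REQUEST:
        blacklist.add(client_id)            # dynamic: refuse future requests
        return False
    return True

print(admit("web-app", 200))          # -> True
print(admit("batch-job", 5_000_000))  # -> False: trips the size limit
print(admit("batch-job", 10))         # -> False: now blacklisted
```

In a real deployment the equivalent checks would sit in the access path (e.g. as coprocessor filters), backed by the client-side metrics and ACLs mentioned earlier.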
Surviving the failures
» In absence of proxy: ‘The client is part of the cluster’ [1]
» Client must report availability error to calling application thread in short time span
» Follow circuit breaker pattern for read calls (Anecdote)
» ‘pseudo’ puts (local file) for write calls
[1] Blog post from Lars Hofhansl http://hadoop-hbase.blogspot.com/2012/09/hbase-client-timeouts.html
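The circuit-breaker pattern named above can be sketched as follows. This is a minimal version of the pattern, not the talk's actual code: after a few consecutive failures the breaker opens and read callers fail fast for a cooldown period, instead of each waiting out a full HBase client timeout.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker for read calls: after `threshold` consecutive
    failures the breaker opens, and callers fail fast for `cooldown` seconds
    so application threads hear about the outage in a short time span."""

    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None              # half-open: allow one retry
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.time()   # open the circuit
            raise
        self.failures = 0                      # success closes the circuit
        return result

# After two consecutive failures, the third call fails fast:
breaker = CircuitBreaker(threshold=2, cooldown=60.0)

def flaky_read():
    raise IOError("region server unreachable")

for _ in range(2):
    try:
        breaker.call(flaky_read)
    except IOError:
        pass

try:
    breaker.call(flaky_read)
except RuntimeError as e:
    print(e)  # -> circuit open: failing fast
```

The 'pseudo' puts for writes complement this on the write side: when the cluster is unavailable, writes are spooled to a local file and replayed later instead of blocking the caller.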
Shoutouts!
Obligatory “we are hiring” slide!
http://rocketfuel.com/careers