© 2012 Sproxil, [email protected] May 22,2012 1
Developing Real Time Analytics Applications Using HBase in the Cloud
May 22, 2012
Rick Tucker
About Sproxil
• Brand protection, specializing in anti-counterfeiting solutions
• Solution requires a scalable and high-throughput text message processing engine
• Supports a real-time analytics web interface
1. SCRATCH
2. TEXT
3. VERIFY
Why HBase?
USER SENDS TEXT MESSAGE
TEXT MESSAGE IS PROCESSED
USER RECEIVES REPLY
CALCULATE ANALYTICS
Amazon EC2 Cloud
Real-Time Analytics Engine
• MapReduce too slow to maintain data in true real time
• As data arrives, analytical data is updated through counters
Text Message Arrives → Message Analyzed → Increment Counters
• Genuine Product Authentication: +1, increment counter for genuine authentications
• Repeat Customer: +1, increment counter for repeat customers
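The counter flow above can be sketched in a few lines. This is a minimal in-memory model, not Sproxil's implementation: the dictionary stands in for an HBase table, and the row key format, column names, and message fields are all hypothetical. In HBase itself the `increment` step would presumably map to an atomic counter operation such as the Java client's `incrementColumnValue`.

```python
from collections import defaultdict

# In-memory stand-in for an HBase counter table: (row key, column) -> count.
# Row key format, column names, and message fields are hypothetical.
counters = defaultdict(int)

def increment(row, column, amount=1):
    """Add `amount` to one counter cell and return the new value."""
    counters[(row, column)] += amount
    return counters[(row, column)]

def process_message(message):
    """Update the analytics counters as soon as one analyzed message arrives."""
    day = message["received"][:10]  # e.g. "2012-05-22"
    row = "analytics " + day
    if message["genuine"]:
        increment(row, "genuine_authentications")
    if message["repeat_customer"]:
        increment(row, "repeat_customers")

process_message({"received": "2012-05-22 12:00:00", "genuine": True, "repeat_customer": True})
process_message({"received": "2012-05-22 12:01:14", "genuine": True, "repeat_customer": False})

for (row, column), value in sorted(counters.items()):
    print(row, column, value)
```

Because each message only touches the counters it affects, the totals are current as soon as the message is processed, with no batch job in between.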
Schema Design: Example 1
• Example: View log of text messages in chronological order
• Rowkey: row prefix + timestamp
Row
transaction 2012-05-22 12:00:00
transaction 2012-05-22 12:01:14
transaction 2012-05-22 12:02:03
Note: HBase sorts rowkeys lexicographically, so scans over these keys return rows in chronological order (oldest first); newest-first order would require a reversed timestamp in the key
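A small sketch of how the "row prefix + timestamp" key behaves under lexicographic sorting. A sorted Python list stands in for HBase's sorted rows. `make_reversed_rowkey` and `MAX_SECONDS` are illustrative assumptions, not part of the original schema: they show one common way to get newest-first scans, by subtracting the timestamp from a fixed ceiling.

```python
import calendar
from datetime import datetime

MAX_SECONDS = 10**10  # hypothetical ceiling, larger than any epoch second used here

def make_rowkey(prefix, ts):
    """'prefix + timestamp'; the fixed-width format makes string order match time order."""
    return f"{prefix} {ts:%Y-%m-%d %H:%M:%S}"

def make_reversed_rowkey(prefix, ts):
    """Variant for newest-first scans: later times map to smaller zero-padded numbers."""
    seconds = calendar.timegm(ts.timetuple())  # treats ts as UTC
    return f"{prefix} {MAX_SECONDS - seconds:010d}"

# Messages may arrive in any order; sorting stands in for HBase's sorted rows.
arrivals = [
    datetime(2012, 5, 22, 12, 2, 3),
    datetime(2012, 5, 22, 12, 0, 0),
    datetime(2012, 5, 22, 12, 1, 14),
]

oldest_first = sorted(make_rowkey("transaction", ts) for ts in arrivals)

# Map reversed keys back to their timestamps so the scan order is visible.
reversed_rows = {make_reversed_rowkey("transaction", ts): ts for ts in arrivals}
newest_first = [reversed_rows[key] for key in sorted(reversed_rows)]

for key in oldest_first:
    print(key)
```

The plain timestamp key only sorts correctly because every field is fixed width and zero padded; a format such as `5/22/2012 9:05` would not preserve time order.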
Schema Design: Example 2
• View log of text messages from individual users
• Rowkey: row prefix + user ID + timestamp
Row
transaction userID 1 2012-05-22 12:00:00
transaction userID 1 2012-05-22 12:01:14
transaction userID 2 2012-05-22 12:00:54
transaction userID 2 2012-05-22 12:01:22
transaction userID 2 2012-05-22 12:02:01
Note: HBase sorts rows lexicographically, so a scan returns each user's rows together, in chronological order within that user
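With the user ID placed ahead of the timestamp, one user's messages form a contiguous block of rows that a prefix scan can read directly. The sketch below models this with a sorted list and binary search; `prefix_scan` is a hypothetical helper, not an HBase API.

```python
import bisect

# Sorted rowkeys stand in for the HBase table from the example above.
rows = sorted([
    "transaction userID 2 2012-05-22 12:01:22",
    "transaction userID 1 2012-05-22 12:00:00",
    "transaction userID 2 2012-05-22 12:00:54",
    "transaction userID 1 2012-05-22 12:01:14",
    "transaction userID 2 2012-05-22 12:02:01",
])

def prefix_scan(sorted_keys, prefix):
    """Return every key that starts with `prefix`, located by binary search."""
    start = bisect.bisect_left(sorted_keys, prefix)
    # "\uffff" sorts after any character used in these keys.
    stop = bisect.bisect_left(sorted_keys, prefix + "\uffff")
    return sorted_keys[start:stop]

# The trailing space keeps "userID 2" from also matching "userID 20".
for key in prefix_scan(rows, "transaction userID 2 "):
    print(key)
```

One caveat with unpadded IDs: "userID 10" sorts before "userID 2" lexicographically. That does not affect per-user prefix scans, but zero-padding the ID would be needed if users should also appear in numeric order.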
Critical Findings
• Schema design is crucial for a successful HBase implementation
  – Pack as much info as possible into row keys
• Use caution with Filters
  – E.g. Regex filters can be costly
  – Alternatives:
    • Directly query for data you need
    • Use efficient filters when filtering large data sets
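To illustrate why a regex filter can be costly compared with querying directly by row key, the sketch below counts how many rows each approach has to examine. It is a simplified in-memory model, not a measurement of HBase: the zero-padded user IDs, the table size, and both helper functions are assumptions made for the example.

```python
import bisect
import re

# 100 hypothetical users with 60 messages each, stored as sorted rowkeys.
keys = sorted(
    f"transaction userID {user:04d} 2012-05-22 12:{minute:02d}:00"
    for user in range(1, 101)
    for minute in range(60)
)

def regex_filter_scan(sorted_keys, pattern):
    """Full scan: every row is examined and tested against the regex."""
    regex = re.compile(pattern)
    examined = 0
    matches = []
    for key in sorted_keys:
        examined += 1
        if regex.search(key):
            matches.append(key)
    return matches, examined

def direct_range_scan(sorted_keys, start, stop):
    """Range scan: seek to the start key, then read only rows in [start, stop)."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi], hi - lo

filtered, filtered_examined = regex_filter_scan(keys, r"userID 0042 ")
direct, direct_examined = direct_range_scan(
    keys, "transaction userID 0042 ", "transaction userID 0043 "
)

print("regex filter examined:", filtered_examined)
print("range scan examined:", direct_examined)
```

Both calls return the same 60 rows, but in this model the regex filter touches all 6000 rows while the range scan touches only the 60 it returns. This is the payoff of packing the query fields into the row key.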
Making Counterfeiting Unprofitable™
America | Asia | Africa
Sproxil.com
+1 617 682 9577
Thank You!
Your global brand protection specialists, spanning 3 continents and speaking 9 languages