hbasecon 2012 | developing real time analytics applications using hbase in the cloud - rick tucker,...
DESCRIPTION
As small companies adapt to handle Big Data, the cloud and HBase enable developers to leverage that data to provide revenue-generating real-time applications. When developing a real-time application for an existing system, one must balance incrementing counters in real time with MapReduce jobs over the same data set. When maintaining an analytics platform, ensuring data accuracy is essential. At Sproxil, SMS logs are ingested into HBase at a growing rate, and we report metrics such as SMS throughput, unique user growth over time, and return SMS user activity in real time. Sproxil provides a versatile analytics application that lets customers handpick statistics on demand to gain market insights, enabling them to react quickly to trends. This talk will identify the most profitable metrics and demonstrate how to calculate them using MapReduce while continually updating data as it arrives.
TRANSCRIPT
© 2012 Sproxil, [email protected] May 22,2012 1
Developing Real Time Analytics Applications Using HBase in the Cloud
May 22, 2012
Rick Tucker
About Sproxil
• Brand protection, specializing in anti-counterfeiting solutions
• Solution requires a scalable and high-throughput text message processing engine
• Supports a real-time analytics web interface
1. Scratch
2. Text
3. Verify
Why HBase?
USER SENDS TEXT MESSAGE
TEXT MESSAGE IS PROCESSED
USER RECEIVES REPLY
CALCULATE ANALYTICS
Amazon EC2 Cloud
Real-Time Analytics Engine
• MapReduce too slow to maintain data in true real time
• As data arrives, analytical data is updated through counters
Text Message Arrives → Message Analyzed → Increment Counters
• Genuine Product Authentication: Increment Counter for Genuine Authentications (+1)
• Repeat Customer: Increment Counter for Repeat Customers (+1)
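The counter flow on this slide can be sketched in a few lines. This is a minimal in-memory illustration, not Sproxil's implementation: the counter names, the `process_message` helper, and the use of a Python `Counter` and `set` are all assumptions made for the example. In HBase itself this kind of update is typically done with an atomic increment on a counter cell (the Java client exposes `incrementColumnValue` for this), so no MapReduce pass is needed to keep the totals current.

```python
from collections import Counter

# Hypothetical counter names; the slide only names the two metrics.
GENUINE = "genuine_authentications"
REPEAT = "repeat_customers"


def process_message(counters, seen_users, user_id, is_genuine):
    """Update the running counters for one incoming text message."""
    if is_genuine:
        counters[GENUINE] += 1   # +1 for a genuine product authentication
    if user_id in seen_users:
        counters[REPEAT] += 1    # +1 when this user has texted before
    seen_users.add(user_id)


counters = Counter()
seen_users = set()

# Three example messages as (user ID, authentication result) pairs.
for user_id, is_genuine in [("u1", True), ("u2", False), ("u1", True)]:
    process_message(counters, seen_users, user_id, is_genuine)
```

After the three example messages, the totals are available immediately from `counters`, which is the property the slide contrasts with batch recomputation.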
Schema Design: Example 1
• Example: View log of text messages in chronological order
• Rowkey: row prefix + timestamp
Row
transaction 2012-05-22 12:00:00
transaction 2012-05-22 12:01:14
transaction 2012-05-22 12:02:03
Note: HBase sorts row keys lexicographically, so with zero-padded timestamps like these, scans return rows in chronological order (oldest first)
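To make the effect of lexicographic ordering concrete, the sketch below builds row keys in the slide's "row prefix + timestamp" form and sorts them the way a byte-ordered store would. The second key format, which stores a reversed timestamp so that the newest row sorts first, is a common HBase pattern but is an assumption here; the slides do not show it, and `MAX_SECONDS` and both helper names are hypothetical.

```python
from datetime import datetime, timezone

PREFIX = "transaction"      # row prefix taken from the slide
MAX_SECONDS = 10**10        # hypothetical upper bound for reversed timestamps


def rowkey(ts):
    """Row key in the slide's form: prefix + human-readable timestamp."""
    return PREFIX + " " + ts.strftime("%Y-%m-%d %H:%M:%S")


def reversed_rowkey(ts):
    """Assumed variant: prefix + (MAX - epoch seconds), zero-padded."""
    return "%s %010d" % (PREFIX, MAX_SECONDS - int(ts.timestamp()))


times = [
    datetime(2012, 5, 22, 12, 1, 14, tzinfo=timezone.utc),
    datetime(2012, 5, 22, 12, 0, 0, tzinfo=timezone.utc),
    datetime(2012, 5, 22, 12, 2, 3, tzinfo=timezone.utc),
]

# Sorting by key string mimics the order a scan would return rows in.
oldest_first = sorted(times, key=rowkey)
newest_first = sorted(times, key=reversed_rowkey)
```

The plain timestamp key sorts oldest first, while the reversed key sorts newest first, so the choice of key format decides which end of the log a scan reaches first.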
Schema Design: Example 2
• View log of text messages from individual users
• Rowkey: row prefix + user ID + timestamp
Row
transaction userID 1 2012-05-22 12:00:00
transaction userID 1 2012-05-22 12:01:14
transaction userID 2 2012-05-22 12:00:54
transaction userID 2 2012-05-22 12:01:22
transaction userID 2 2012-05-22 12:02:01
Note: HBase sorts rows lexicographically, so rows are grouped by user ID and, within each user, scans return data in chronological order (oldest first)
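Because the user ID sits in front of the timestamp, all of one user's messages are contiguous in the sorted key space and can be read with a single bounded scan. The sketch below imitates that with a sorted Python list and `bisect`; the `scan` and `scan_user` helpers are hypothetical stand-ins for a scan with start and stop rows, not HBase API calls.

```python
from bisect import bisect_left

# The five row keys from the slide, kept sorted as HBase would store them.
rows = sorted([
    "transaction userID 2 2012-05-22 12:01:22",
    "transaction userID 1 2012-05-22 12:00:00",
    "transaction userID 2 2012-05-22 12:02:01",
    "transaction userID 1 2012-05-22 12:01:14",
    "transaction userID 2 2012-05-22 12:00:54",
])


def scan(sorted_rows, start, stop):
    """Return keys in [start, stop), like a scan bounded by start/stop rows."""
    return sorted_rows[bisect_left(sorted_rows, start):bisect_left(sorted_rows, stop)]


def scan_user(sorted_rows, user_id):
    """Read every row for one user by scanning that user's key prefix."""
    start = "transaction userID %s " % user_id
    # Smallest string greater than every key with this prefix:
    # bump the prefix's final character by one.
    stop = start[:-1] + chr(ord(start[-1]) + 1)
    return scan(sorted_rows, start, stop)


user2_rows = scan_user(rows, 2)
```

Including the trailing space in the prefix keeps user 2 from also matching IDs such as 20 or 21.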
Critical Findings
• Schema design is crucial for successful HBase implementation
  – Pack as much info as possible into row keys
• Use caution with Filters
  – E.g. Regex filters can be costly
  – Alternatives:
    • Directly query for data you need
    • Use efficient filters when filtering large data sets
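One way to see why a regex filter can be costly is to count how many rows each approach has to examine. The sketch below is an in-memory illustration under assumed data (five users with 60 messages each); `regex_filter_scan` and `range_scan` are hypothetical helpers that model a filtered full scan and a direct key-range query, not HBase calls.

```python
import re
from bisect import bisect_left

# Assumed sample data: 5 users x 60 messages, stored in sorted key order.
rows = sorted(
    "transaction userID %d 2012-05-22 12:%02d:00" % (user, minute)
    for user in range(1, 6)
    for minute in range(60)
)


def regex_filter_scan(sorted_rows, pattern):
    """Model a filtered full scan: every row is tested against the regex."""
    regex = re.compile(pattern)
    matches = [row for row in sorted_rows if regex.search(row)]
    return matches, len(sorted_rows)


def range_scan(sorted_rows, start, stop):
    """Model a direct query: only rows inside [start, stop) are touched."""
    lo = bisect_left(sorted_rows, start)
    hi = bisect_left(sorted_rows, stop)
    return sorted_rows[lo:hi], hi - lo


filtered, examined_by_filter = regex_filter_scan(rows, r"^transaction userID 3 ")
direct, examined_by_range = range_scan(
    rows, "transaction userID 3 ", "transaction userID 3!"
)
```

Both calls return the same rows, but the filtered scan examines the whole table while the range scan touches only the matching key range. That gap grows with table size, which is the motivation for packing the query dimensions into the row key.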
Making Counterfeiting Unprofitable™
America | Asia | Africa Sproxil.com
+1 617 682 9577
Thank You!
Your global brand protection specialists – spanning 3 continents and speaking 9 languages