hbasecon 2012 | developing real time analytics applications using hbase in the cloud - rick tucker,...

© 2012 Sproxil, Inc. [email protected] May 22,2012 Developing Real Time Analytics Applications Using HBase in the Cloud May 22, 2012 Rick Tucker [email protected] 1

Upload: cloudera-inc

Post on 27-Jun-2015

DESCRIPTION

As small companies adapt to handle Big Data, the cloud and HBase enable developers to leverage that data to provide revenue-generating real-time applications. When developing a real-time application for an existing system, one must balance incrementing counters in real time with MapReduce jobs over the same dataset. When maintaining an analytics platform, ensuring data accuracy is essential. At Sproxil, SMS logs are ingested into HBase at a growing rate, and we report metrics such as SMS throughput, unique user growth over time, and return SMS user activity in real time. Sproxil provides a versatile analytics application that lets customers handpick statistics on demand to gain market insights, enabling them to react quickly to trends. This talk will identify the most profitable metrics and demonstrate how to calculate them using MapReduce while continually updating data as it arrives.

TRANSCRIPT

Page 1: HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in the Cloud - Rick Tucker, Sproxil

Developing Real Time Analytics Applications Using HBase in the Cloud

May 22, 2012

Rick Tucker

[email protected]

Page 2:

About Sproxil

• Brand protection, specializing in anti-counterfeiting solutions

• Solution requires a scalable and high-throughput text message processing engine

• Supports a real-time analytics web interface

1. SCRATCH
2. TEXT
3. VERIFY

Page 3:

Why HBase?

USER SENDS TEXT MESSAGE

TEXT MESSAGE IS PROCESSED

USER RECEIVES REPLY

CALCULATE ANALYTICS

Amazon EC2 Cloud

Page 4:

Real-Time Analytics Engine

• MapReduce too slow to maintain data in true real time

• As data arrives, analytical data is updated through counters

Text Message Arrives -> Message Analyzed -> Increment Counters

Genuine Product Authentication: +1 Increment Counter for Genuine Authentications

Repeat Customer: +1 Increment Counter for Repeat Customers
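The counter flow on this slide can be sketched in plain Python. This is an in-memory stand-in, not HBase client code: the `CounterStore` class, the `analyze` rules, the message fields, and the counter names are all hypothetical, chosen only to mirror the diagram. In HBase itself, each `+1` would typically be an atomic increment on a counter column.

```python
from collections import defaultdict


class CounterStore:
    """In-memory stand-in for a table of named counters (hypothetical)."""

    def __init__(self):
        self._counts = defaultdict(int)

    def increment(self, name, amount=1):
        # Add `amount` to the named counter and return the new value.
        self._counts[name] += amount
        return self._counts[name]

    def get(self, name):
        # Read a counter without creating it; unknown counters read as 0.
        return self._counts.get(name, 0)


def analyze(message, seen_users):
    """Return the names of the counters this message should bump.

    The rules here are illustrative assumptions, not Sproxil's logic.
    """
    counters = []
    if message["genuine"]:
        counters.append("genuine_authentications")
    if message["user_id"] in seen_users:
        counters.append("repeat_customers")
    seen_users.add(message["user_id"])
    return counters


def process(messages, store):
    # Message arrives -> message analyzed -> counters incremented.
    seen_users = set()
    for message in messages:
        for name in analyze(message, seen_users):
            store.increment(name)


store = CounterStore()
process(
    [
        {"user_id": "u1", "genuine": True},
        {"user_id": "u2", "genuine": False},
        {"user_id": "u1", "genuine": True},  # second message from u1
    ],
    store,
)
print(store.get("genuine_authentications"), store.get("repeat_customers"))
```

Because each message only touches the counters it affects, the metrics stay current without waiting for a batch job to rescan the log.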

Page 5:

Schema Design: Example 1

• Example: View log of text messages in chronological order

• Rowkey: row prefix + timestamp

Row
transaction 2012-05-22 12:00:00
transaction 2012-05-22 12:01:14
transaction 2012-05-22 12:02:03

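Note: HBase sorts rowkeys lexicographically, so a scan over these keys returns rows in chronological order (oldest first); returning them in reverse chronological order would require a reversed timestamp in the rowkey.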
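A small sketch of how lexicographic ordering plays out for the rowkeys on this slide. Python's `sorted` on ASCII strings stands in for HBase's byte-wise rowkey ordering; no HBase API is used. The `reversed_row_key` helper and its `MAX_EPOCH` constant are assumptions added for illustration: they show the common reversed-timestamp technique for getting newest-first order, which is not shown on the slide.

```python
from datetime import datetime, timezone

PREFIX = "transaction"            # row prefix, as on the slide
TS_FORMAT = "%Y-%m-%d %H:%M:%S"   # fixed width, so string order matches time order
MAX_EPOCH = 10**10                # assumption: larger than any epoch-seconds value used


def row_key(ts):
    # Rowkey as on the slide: row prefix + timestamp.
    return f"{PREFIX} {ts.strftime(TS_FORMAT)}"


def reversed_row_key(ts):
    # Hypothetical variant: a later time gives a smaller number,
    # so the newest row sorts first. Zero-padded to a fixed width.
    return f"{PREFIX} {MAX_EPOCH - int(ts.timestamp()):010d}"


events = [
    datetime(2012, 5, 22, 12, 2, 3, tzinfo=timezone.utc),
    datetime(2012, 5, 22, 12, 0, 0, tzinfo=timezone.utc),
    datetime(2012, 5, 22, 12, 1, 14, tzinfo=timezone.utc),
]

# Sorting the slide-style keys yields oldest-first order.
forward_keys = sorted(row_key(ts) for ts in events)

# Sorting by the reversed keys yields newest-first order.
newest_first = [ts for _, ts in sorted((reversed_row_key(ts), ts) for ts in events)]

for key in forward_keys:
    print(key)
```

The fixed-width timestamp format matters here: without zero padding, string order and time order can diverge.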

Page 6:

Schema Design: Example 2

• View log of text messages from individual users

• Rowkey: row prefix + user ID + timestamp

Row
transaction userID 1 2012-05-22 12:00:00
transaction userID 1 2012-05-22 12:01:14
transaction userID 2 2012-05-22 12:00:54
transaction userID 2 2012-05-22 12:01:22
transaction userID 2 2012-05-22 12:02:01

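Note: HBase sorts rows lexicographically, so a scan over these keys returns each user's rows together, in chronological order (oldest first); returning them in reverse chronological order would require a reversed timestamp in the rowkey.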
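Putting the user ID ahead of the timestamp keeps each user's rows contiguous, so one user's log can be read as a single key range. The sketch below simulates that with a sorted Python list and `bisect`; the `prefix_scan` helper is hypothetical and only imitates what a scan bounded by a key prefix would return.

```python
from bisect import bisect_left


def row_key(user_id, timestamp):
    # Rowkey as on the slide: row prefix + user ID + timestamp.
    return f"transaction userID {user_id} {timestamp}"


def prefix_scan(sorted_keys, prefix):
    """Return all keys that start with `prefix`, reading only that range."""
    start = bisect_left(sorted_keys, prefix)  # first key >= prefix
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so nothing later can match
        result.append(key)
    return result


keys = sorted(
    [
        row_key(2, "2012-05-22 12:01:22"),
        row_key(1, "2012-05-22 12:00:00"),
        row_key(2, "2012-05-22 12:00:54"),
        row_key(1, "2012-05-22 12:01:14"),
        row_key(2, "2012-05-22 12:02:01"),
    ]
)

# The trailing space keeps "userID 2" from also matching an ID such as "20".
user2_rows = prefix_scan(keys, "transaction userID 2 ")
user1_rows = prefix_scan(keys, "transaction userID 1 ")
for key in user2_rows:
    print(key)
```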

Page 7:

Critical Findings

• Schema design is crucial for successful HBase implementation
  – Pack as much info as possible into row keys

• Use caution with Filters
  – E.g. Regex filters can be costly
  – Alternatives:
    • Directly query for data you need
    • Use efficient filters when filtering large data sets
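One way to see why a regex filter can be costly is to count how many rows each approach has to read. A filter is evaluated against rows as the scan reads them, so a regex over an unbounded scan tests every row, while a start/stop key range reads only the rows it returns. The sketch below is a simulation over a sorted list, not HBase code; both helper functions and the zero-padded user IDs in the sample data are hypothetical.

```python
import re
from bisect import bisect_left


def regex_filter_scan(sorted_keys, pattern):
    """Test every key against a regex; return matches and keys examined."""
    regex = re.compile(pattern)
    examined = 0
    matches = []
    for key in sorted_keys:
        examined += 1
        if regex.search(key):
            matches.append(key)
    return matches, examined


def range_scan(sorted_keys, start_key, stop_key):
    """Return keys in [start_key, stop_key) and how many keys were read."""
    lo = bisect_left(sorted_keys, start_key)
    hi = bisect_left(sorted_keys, stop_key)
    return sorted_keys[lo:hi], hi - lo


# Sample data: 50 users x 20 messages, with zero-padded IDs so that
# string order matches numeric order.
keys = sorted(
    f"transaction userID {user:03d} 2012-05-22 12:{minute:02d}:00"
    for user in range(1, 51)
    for minute in range(20)
)

regex_matches, regex_examined = regex_filter_scan(keys, r"userID 007 ")
range_matches, range_read = range_scan(
    keys, "transaction userID 007 ", "transaction userID 008 "
)
print(regex_examined, range_read)
```

Both calls return the same rows; the difference is how much of the table is read to find them, which is the argument for packing query criteria into the rowkey.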

Page 8:

Making Counterfeiting Unprofitable™

America | Asia | Africa Sproxil.com

[email protected]

+1 617 682 9577

Thank You!

Your global brand protection specialists, spanning 3 continents and speaking 9 languages
