HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in the Cloud - Rick Tucker,...

Post on 27-Jun-2015

DESCRIPTION

As small companies adapt to handle Big Data, the cloud and HBase enable developers to leverage that data to provide revenue-generating real-time applications. When developing a real-time application for an existing system, one must balance incrementing counters in real time with MapReduce jobs over the same dataset. When maintaining an analytics platform, ensuring data accuracy is essential. At Sproxil, SMS logs are ingested into HBase at a growing rate, and we report metrics such as SMS throughput, unique user growth over time, and return SMS user activity in real time. Sproxil provides a versatile analytics application that lets customers handpick statistics on demand to gain market insights and react quickly to trends. This talk will identify the most profitable metrics and demonstrate how to calculate them using MapReduce while continually updating data as it arrives.

TRANSCRIPT

© 2012 Sproxil, Inc. tech@sproxil.com May 22, 2012

Developing Real Time Analytics Applications Using HBase in the Cloud

May 22, 2012

Rick Tucker

tech@sproxil.com

About Sproxil

• Brand protection, specializing in anti-counterfeiting solutions

• Solution requires a scalable and high-throughput text message processing engine

• Supports a real-time analytics web interface

1. SCRATCH

2. TEXT

3. VERIFY

Why HBase?

• User sends text message

• Text message is processed

• User receives reply

• Calculate analytics

Amazon EC2 Cloud

Real-Time Analytics Engine

• MapReduce too slow to maintain data in true real time

• As data arrives, analytical data is updated through counters

Text Message Arrives → Message Analyzed → Increment Counters

• Genuine Product Authentication: +1, increment counter for genuine authentications

• Repeat Customer: +1, increment counter for repeat customers
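The counter flow on this slide can be sketched in a few lines. This is only an in-memory illustration: the class name, the counter names, and the `is_genuine` flag are hypothetical, and a real deployment would presumably issue atomic increments against HBase (for example through the Java client's `incrementColumnValue`) instead of mutating a local `Counter`.

```python
from collections import Counter


class AnalyticsCounters:
    """In-memory stand-in for a set of HBase counter cells (assumption)."""

    def __init__(self):
        self.counts = Counter()
        self.known_users = set()

    def record_message(self, user_id, is_genuine):
        """Update every affected counter as soon as one message is analyzed."""
        self.counts["messages"] += 1
        if is_genuine:
            # +1 for a genuine product authentication
            self.counts["genuine_authentications"] += 1
        if user_id in self.known_users:
            # +1 for a message from a user seen before
            self.counts["repeat_customers"] += 1
        else:
            self.known_users.add(user_id)
            self.counts["unique_users"] += 1


analytics = AnalyticsCounters()
analytics.record_message("user-1", is_genuine=True)
analytics.record_message("user-2", is_genuine=False)
analytics.record_message("user-1", is_genuine=True)
print(dict(analytics.counts))
```

Because each arriving message touches only a handful of counters, the reported metrics stay current without waiting for a batch job over the full log.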

Schema Design: Example 1

• Example: View log of text messages in chronological order

• Rowkey: row prefix + timestamp

Row
transaction 2012-05-22 12:00:00
transaction 2012-05-22 12:01:14
transaction 2012-05-22 12:02:03

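Note: HBase sorts row keys lexicographically, so a scan over the keys shown returns rows in chronological order. Returning the newest rows first requires a reversed timestamp in the key (for example, a fixed maximum value minus the epoch time).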
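A small sketch of how lexicographic key order interacts with timestamps. Python's string sort stands in for HBase's byte-wise row ordering (equivalent for ASCII keys); the `|` delimiter, the function names, and the `MAX_MILLIS` bound are assumptions made for illustration and do not come from the talk.

```python
from datetime import datetime, timezone

# Assumption: a 13-digit upper bound on epoch milliseconds, so every
# reversed value has the same width and sorts correctly as text.
MAX_MILLIS = 10**13 - 1


def chronological_key(prefix, ts):
    """Row prefix + ISO-style timestamp: sorts oldest first."""
    return f"{prefix}|{ts:%Y-%m-%d %H:%M:%S}"


def reverse_key(prefix, ts):
    """Row prefix + (MAX - epoch millis): sorts newest first."""
    millis = int(ts.replace(tzinfo=timezone.utc).timestamp() * 1000)
    return f"{prefix}|{MAX_MILLIS - millis:013d}"


stamps = [
    datetime(2012, 5, 22, 12, 1, 14),
    datetime(2012, 5, 22, 12, 0, 0),
    datetime(2012, 5, 22, 12, 2, 3),
]

# A scan walks keys in sorted order, so sorting simulates the scan.
for key in sorted(chronological_key("transaction", t) for t in stamps):
    print(key)

newest_first = sorted(stamps, key=lambda t: reverse_key("transaction", t))
print([f"{t:%H:%M:%S}" for t in newest_first])
```

The fixed-width, zero-padded number matters: without padding, values of different lengths would not sort numerically when compared as text.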

Schema Design: Example 2

• View log of text messages from individual users

• Rowkey: row prefix + user ID + timestamp

Row
transaction userID 1 2012-05-22 12:00:00
transaction userID 1 2012-05-22 12:01:14
transaction userID 2 2012-05-22 12:00:54
transaction userID 2 2012-05-22 12:01:22
transaction userID 2 2012-05-22 12:02:01

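Note: HBase sorts row keys lexicographically, so each user's rows are stored together and, with the keys shown, a scan returns them in chronological order. Newest-first order requires a reversed timestamp in the key.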
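Putting the user ID ahead of the timestamp means one user's messages occupy a contiguous key range, which a scan can read with a start and stop row. The sketch below imitates that with `bisect` over a sorted list; the `|` delimiter and the `prefix_scan` helper are hypothetical, and a real table would be queried with a scan bounded by start and stop rows.

```python
import bisect


def prefix_scan(sorted_keys, prefix):
    """Return all keys starting with `prefix`, like a start/stop-row scan."""
    start = bisect.bisect_left(sorted_keys, prefix)
    # Stop row: the prefix with its last character bumped by one, i.e. the
    # smallest string that sorts after every key carrying the prefix.
    stop_key = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    stop = bisect.bisect_left(sorted_keys, stop_key)
    return sorted_keys[start:stop]


keys = sorted([
    "transaction|userID 1|2012-05-22 12:00:00",
    "transaction|userID 1|2012-05-22 12:01:14",
    "transaction|userID 2|2012-05-22 12:00:54",
    "transaction|userID 2|2012-05-22 12:01:22",
    "transaction|userID 2|2012-05-22 12:02:01",
])

for key in prefix_scan(keys, "transaction|userID 2|"):
    print(key)
```

The trailing delimiter in the prefix keeps `userID 1` from also matching a hypothetical `userID 10`; fixed-width user IDs would be another way to get the same guarantee.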

Critical Findings

• Schema design is crucial for successful HBase implementation
  – Pack as much info as possible into row keys

• Use caution with Filters
  – E.g. Regex filters can be costly
  – Alternatives:
    • Directly query for data you need
    • Use efficient filters when filtering large data sets
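One way to see why a regex filter can be costly is to count the rows each approach has to examine. The sketch below is a rough in-memory model, not a benchmark: the key layout, the helper names, and the use of "rows examined" as a proxy for server-side work are all assumptions.

```python
import bisect
import re


def regex_filter_scan(sorted_keys, pattern):
    """Full scan: every row key is tested against the regex."""
    rx = re.compile(pattern)
    examined, hits = 0, []
    for key in sorted_keys:
        examined += 1
        if rx.search(key):
            hits.append(key)
    return hits, examined


def range_scan(sorted_keys, start_key, stop_key):
    """Direct query: only rows in [start_key, stop_key) are touched."""
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_left(sorted_keys, stop_key)
    return sorted_keys[lo:hi], hi - lo


# 50 hypothetical users with 3 messages each.
all_keys = sorted(
    f"transaction|userID {user}|2012-05-22 12:0{minute}:00"
    for user in range(1, 51)
    for minute in range(3)
)

regex_hits, regex_examined = regex_filter_scan(all_keys, r"\|userID 2\|")
range_hits, range_examined = range_scan(
    all_keys, "transaction|userID 2|", "transaction|userID 2}"
)
print(regex_examined, range_examined, regex_hits == range_hits)
```

Both calls return the same rows, but the regex version inspects the whole table while the range version touches only the rows it returns, which is the case for packing query fields into the row key.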

Making Counterfeiting Unprofitable™

America | Asia | Africa Sproxil.com

tech@sproxil.com

+1 617 682 9577

Thank You!

Your global brand protection specialists, spanning 3 continents and speaking 9 languages
