Transcript
Page 1: HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in the Cloud - Rick Tucker, Sproxil

© 2012 Sproxil, [email protected] May 22,2012 1

Developing Real Time Analytics Applications Using HBase in the Cloud

May 22, 2012

Rick Tucker

[email protected]

Page 2: HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in the Cloud - Rick Tucker, Sproxil

About Sproxil

• Brand protection, specializing in anti-counterfeiting solutions

• Solution requires a scalable and high-throughput text message processing engine

• Supports a real-time analytics web interface

1. SCRATCH

2. TEXT

3. VERIFY

Page 3: HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in the Cloud - Rick Tucker, Sproxil

Why HBase?

USER SENDS TEXT MESSAGE

TEXT MESSAGE IS PROCESSED

USER RECEIVES REPLY

CALCULATE ANALYTICS

Amazon EC2 Cloud

Page 4: HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in the Cloud - Rick Tucker, Sproxil

Real-Time Analytics Engine

• MapReduce too slow to maintain data in true real time

• As data arrives, analytical data is updated through counters

Text Message Arrives

Message Analyzed

Increment Counters

Genuine Product Authentication: +1, Increment Counter for Genuine Authentications

Repeat Customer: +1, Increment Counter for Repeat Customers
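The counter flow above can be sketched in a few lines. This is an in-memory illustration only, not Sproxil's code and not the HBase client API: the counter names and the `record_message` helper are hypothetical, and a `Counter` plus a `set` stand in for what would be atomically incremented counter cells in an HBase table.

```python
from collections import Counter

def record_message(counters, seen_users, user_id, is_genuine):
    """Update analytics counters for one analyzed text message.

    `counters` and `seen_users` are hypothetical in-memory stand-ins
    for counter cells kept in the data store.
    Returns the names of the counters that were incremented.
    """
    bumped = []
    if is_genuine:
        # +1 for a genuine product authentication
        counters["genuine_authentications"] += 1
        bumped.append("genuine_authentications")
    if user_id in seen_users:
        # +1 when the sender has texted before
        counters["repeat_customers"] += 1
        bumped.append("repeat_customers")
    seen_users.add(user_id)
    return bumped

counters, seen_users = Counter(), set()
record_message(counters, seen_users, "user-1", is_genuine=True)
record_message(counters, seen_users, "user-1", is_genuine=True)
print(dict(counters))
```

Because each message touches only the counters it affects, the analytics stay current as data arrives, with no batch job to recompute totals.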

Page 5: HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in the Cloud - Rick Tucker, Sproxil

Schema Design: Example 1

• Example: View log of text messages in chronological order

• Rowkey: row prefix + timestamp

Row
transaction 2012-05-22 12:00:00
transaction 2012-05-22 12:01:14
transaction 2012-05-22 12:02:03

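Note: HBase sorts rowkeys lexicographically, so with these timestamp keys a scan returns rows in chronological order (oldest first). To get reverse chronological order, store a reversed timestamp in the rowkey (for example, a maximum value minus the timestamp).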
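To make the ordering concrete, here is a small sketch that builds "row prefix + timestamp" keys and sorts them the way a lexicographic store would. The helper names and the `MAX_EPOCH` bound are assumptions for illustration; plain Python string sorting stands in for HBase's byte-wise rowkey ordering, and timestamps are treated as UTC.

```python
from datetime import datetime, timezone

PREFIX = "transaction"   # row prefix taken from the slide
MAX_EPOCH = 10**10 - 1   # hypothetical upper bound for reversing timestamps

def chronological_key(ts):
    """Rowkey as on the slide: prefix + human-readable timestamp."""
    return PREFIX + " " + ts.strftime("%Y-%m-%d %H:%M:%S")

def reverse_chronological_key(ts):
    """Rowkey whose lexicographic order is newest first."""
    epoch = int(ts.replace(tzinfo=timezone.utc).timestamp())
    # Zero-pad so every key has the same width and sorts numerically.
    return PREFIX + " " + format(MAX_EPOCH - epoch, "010d")

times = [
    datetime(2012, 5, 22, 12, 1, 14),
    datetime(2012, 5, 22, 12, 0, 0),
    datetime(2012, 5, 22, 12, 2, 3),
]
for key in sorted(chronological_key(t) for t in times):
    print(key)  # oldest message first
```

Sorting the plain timestamp keys yields oldest-first order, while sorting the reversed keys yields newest-first order, which is what a "latest messages" view typically needs.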

Page 6: HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in the Cloud - Rick Tucker, Sproxil

Schema Design: Example 2

• View log of text messages from individual users

• Rowkey: row prefix + user ID + timestamp

Row
transaction userID 1 2012-05-22 12:00:00
transaction userID 1 2012-05-22 12:01:14
transaction userID 2 2012-05-22 12:00:54
transaction userID 2 2012-05-22 12:01:22
transaction userID 2 2012-05-22 12:02:01

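Note: HBase sorts rows lexicographically, so a scan over one user's key prefix returns that user's messages in chronological order (oldest first). A reversed timestamp in the rowkey is needed for reverse chronological order.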
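A per-user log then becomes a scan over one contiguous key range. The sketch below simulates that with a sorted Python list and `bisect`; it is not the HBase scan API. The `user_key` and `scan_user` helpers are hypothetical, and the zero-padded user ID is an added assumption (without fixed-width IDs, user 10 would sort between users 1 and 2).

```python
from bisect import bisect_left

def user_key(user_id, timestamp):
    """Rowkey: row prefix + fixed-width user ID + timestamp."""
    return "transaction {:08d} {}".format(user_id, timestamp)

def scan_user(sorted_keys, user_id):
    """Return all keys for one user by slicing the sorted key space."""
    prefix = "transaction {:08d} ".format(user_id)
    start = bisect_left(sorted_keys, prefix)
    # Any key with this prefix sorts below prefix + a very high character.
    stop = bisect_left(sorted_keys, prefix + "\uffff")
    return sorted_keys[start:stop]

keys = sorted([
    user_key(2, "2012-05-22 12:01:22"),
    user_key(1, "2012-05-22 12:00:00"),
    user_key(10, "2012-05-22 12:00:30"),
    user_key(2, "2012-05-22 12:00:54"),
    user_key(1, "2012-05-22 12:01:14"),
    user_key(2, "2012-05-22 12:02:01"),
])
for key in scan_user(keys, 2):
    print(key)
```

Only the rows for the requested user are touched, and they come back already ordered by time because the timestamp is the last component of the key.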

Page 7: HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in the Cloud - Rick Tucker, Sproxil

Critical Findings

• Schema design is crucial for successful HBase implementation
  – Pack as much info as possible into row keys

• Use caution with Filters
  – E.g. Regex filters can be costly
  – Alternatives:
    • Directly query for data you need
    • Use efficient filters when filtering large data sets
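The cost difference between filtering and querying directly can be illustrated with a rough simulation. This is a sketch under simplifying assumptions, not a benchmark and not HBase code: a sorted list stands in for the table, the helper names are hypothetical, and "rows examined" is used as a crude proxy for work done.

```python
import re
from bisect import bisect_left

def regex_filter_scan(sorted_keys, pattern):
    """Full scan with a regex filter: every row is examined."""
    rx = re.compile(pattern)
    matches = [k for k in sorted_keys if rx.search(k)]
    return matches, len(sorted_keys)

def range_scan(sorted_keys, start_key, stop_key):
    """Direct query: only rows in [start_key, stop_key) are examined."""
    lo = bisect_left(sorted_keys, start_key)
    hi = bisect_left(sorted_keys, stop_key)
    return sorted_keys[lo:hi], hi - lo

keys = sorted(
    "transaction userID {} 2012-05-22 12:{:02d}:00".format(u, m)
    for u in (1, 2, 3)
    for m in range(60)
)
by_regex, examined_regex = regex_filter_scan(keys, r"userID 2 ")
by_range, examined_range = range_scan(
    keys, "transaction userID 2 ", "transaction userID 3 "
)
print(examined_regex, examined_range)
```

Both approaches return the same rows here, but the regex filter examines the whole table while the range scan examines only the matching slice. That gap is why packing query attributes into the rowkey tends to pay off.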

Page 8: HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in the Cloud - Rick Tucker, Sproxil

Making Counterfeiting Unprofitable™

America | Asia | Africa Sproxil.com

[email protected]

+1 617 682 9577

Thank You!

Your global brand protection specialists, spanning 3 continents and speaking 9 languages
