bizosys at fifth elephant

©2013 BIZOSYS TECHNOLOGIES PRIVATE LIMITED

Upload: abinasha-karana

Post on 12-Jul-2015


Category: Technology



TRANSCRIPT

Page 1: Bizosys at fifth elephant


Page 2: Bizosys at fifth elephant

15 Billion computations in 187 milliseconds with a Big Join in Hadoop

Page 3: Bizosys at fifth elephant

The Use-case: Assessing Market Risk of an Investment Portfolio

Business Drivers

1. Support 6 months of data as opposed to 2 days

2. Near real-time calculation with optimal infrastructure

Page 4: Bizosys at fifth elephant

The Use-case: Assessing Market Risk of an Investment Portfolio

Acc   Equity   Qty
A1    MSFT     100
A1    ORCL     500
A2    CISCO    400

X

Equity   Model1   Model2
MSFT     $78.00   $77.12
ORCL     $33.78   $31.09
CISCO    $32.12   $16.00

What is the total portfolio value for Model1?
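The question above can be worked through with a minimal sketch: join each position to its price on the equity name, multiply quantity by price, and sum. The figures are the ones on the slide; the function name `portfolio_value` and the dictionary layout are illustrative assumptions, not part of the original deck.

```python
from decimal import Decimal

# Positions from the slide: (account, equity, quantity).
positions = [
    ("A1", "MSFT", 100),
    ("A1", "ORCL", 500),
    ("A2", "CISCO", 400),
]

# Prices per equity under each model, as shown on the slide.
prices = {
    "MSFT": {"Model1": Decimal("78.00"), "Model2": Decimal("77.12")},
    "ORCL": {"Model1": Decimal("33.78"), "Model2": Decimal("31.09")},
    "CISCO": {"Model1": Decimal("32.12"), "Model2": Decimal("16.00")},
}

def portfolio_value(model):
    """Join positions to prices on equity and sum quantity * price."""
    return sum(qty * prices[equity][model] for _, equity, qty in positions)

model1_total = portfolio_value("Model1")
model2_total = portfolio_value("Model2")
print(model1_total, model2_total)  # 37538.00 29657.00
```

With only three positions and two models this is trivial; the rest of the deck is about doing the same join at a far larger scale.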

Page 5: Bizosys at fifth elephant

Problem with The Big Join:

Acc   Equity   Qty
A1    MSFT     100
A1    ORCL     500
A2    CISCO    400

X

Equity   Model1   Model2
MSFT     $78.00   $77.12
ORCL     $45.12   $49.77
CISCO    $32.12   $16.00

3M positions X 2M products * 5000 Models/Day

15 Billion Calculations
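The 15 billion figure is consistent with valuing every position once under every price model. A small arithmetic sketch, assuming exactly one quantity * price calculation per (position, model) pair, which the slide implies but does not spell out:

```python
positions = 3_000_000      # 3M positions (from the slide)
models_per_day = 5_000     # 5000 price models per day (from the slide)

# Assumption: one quantity * price valuation per (position, model) pair.
calculations = positions * models_per_day
print(f"{calculations:,}")  # 15,000,000,000
```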

Page 6: Bizosys at fifth elephant

Schema Design…

Price Model   DAY1                                                               DAY N
Model1        Product 1 - Price, Product 2 - Price, …, Product 2000000 - Price
…             …                                                                  …
Model 5000    …                                                                  …

Date          All Positions
XX-XXX-XXXX   Acc Id 1 – ProductId 1 - 23 stocks, …, Acc Id 22000 – ProductId 200000 - 111 stocks
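One way to picture the price-model table is a map from price model to day to a single packed value holding every product price. The sketch below is an in-memory stand-in, not HBase code. The encoding (8-byte big-endian doubles, with the product index giving the offset) and the names `pack_prices`, `price_at`, and `price_table` are assumptions made for illustration.

```python
import struct

def pack_prices(prices):
    """Pack a list of prices into one byte string of 8-byte big-endian doubles."""
    return struct.pack(f">{len(prices)}d", *prices)

def price_at(cell, product_index):
    """Read one product's price out of the packed cell by offset."""
    return struct.unpack_from(">d", cell, product_index * 8)[0]

# Hypothetical in-memory stand-in for the table: row key = price model,
# column = day, value = every product price packed into one cell.
price_table = {
    "Model1": {"DAY1": pack_prices([78.00, 33.78, 32.12])},
    "Model2": {"DAY1": pack_prices([77.12, 31.09, 16.00])},
}

cell = price_table["Model1"]["DAY1"]
print(len(cell), price_at(cell, 1))  # 24 33.78
```

Because the product index determines the offset, no per-product key needs to be stored alongside each price.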

Page 7: Bizosys at fifth elephant

Why is 1 price model packed in 1 HBase Cell?

[Bar chart: GBs required, Product-Price model Data (axis 0 to 600). Bars: 2M Products in 1 Cell vs. 2M Products in 2M Cells. Label: Eventual Consistency Overhead]

Get rid of “HBase Cell meta-data” payload
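A rough sketch of where the cell meta-data payload comes from: each HBase cell is stored as a KeyValue that repeats the row key, column family, qualifier, timestamp, and type next to the value. The key sizes below (8-byte row key, 1-byte family, 8-byte qualifier) and the 8-byte price are hypothetical, and the result ignores compression, so the numbers illustrate the effect and are not the figures behind the chart.

```python
PRODUCTS = 2_000_000
VALUE_BYTES = 8  # one price as an 8-byte double (assumption)

def keyvalue_bytes(row_len, family_len, qualifier_len, value_len):
    """Approximate size of one HBase KeyValue, ignoring compression and tags."""
    # key length, value length, row length, family length, timestamp, type
    fixed = 4 + 4 + 2 + 1 + 8 + 1
    return fixed + row_len + family_len + qualifier_len + value_len

# Hypothetical key sizes: 8-byte row key, 1-byte family, 8-byte qualifier.
one_cell = keyvalue_bytes(8, 1, 8, PRODUCTS * VALUE_BYTES)
many_cells = PRODUCTS * keyvalue_bytes(8, 1, 8, VALUE_BYTES)

print(one_cell, many_cells)
```

Under these assumptions the per-cell layout spends more bytes on keys than on prices, which is the overhead a single packed cell avoids.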

Page 8: Bizosys at fifth elephant

Why is the Region Server set at 16*64 MB?

1 Thread per Price Model

64 Price Models/Machine

78 64-core machines** @ 78 Region Servers

Enable Parallel Computing

**This is based on the scalability factor from performance testing (150 ms per price model with parallel computing)
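The sizing can be checked with simple arithmetic. The sketch assumes that 16 MB is the size of one packed price model (2 million prices at 8 bytes each, as stated elsewhere in the deck) and that 16*64 MB is meant to hold 64 price models, one per core. That reading is an interpretation, not something the slide states outright.

```python
import math

PRICE_MODELS = 5_000
MODEL_MB = 16            # 2M prices * 8 bytes, per the deck
MODELS_PER_MACHINE = 64  # 1 thread per price model on a 64-core machine
MACHINES = 78

region_mb = MODEL_MB * MODELS_PER_MACHINE        # the 16*64 MB from the slide
parallel_slots = MACHINES * MODELS_PER_MACHINE   # price models processed at once
full_coverage = math.ceil(PRICE_MODELS / MODELS_PER_MACHINE)

print(region_mb, parallel_slots, full_coverage)  # 1024 4992 79
```

78 machines give 4992 parallel slots, slightly under the 5000 price models. The slide attributes the machine count to a scalability factor from performance testing rather than to exact division.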

Page 9: Bizosys at fifth elephant

Why are HBase Coprocessors used?

HBase Coprocessor

Region 1, Machine 1: 1 Cell = 1st Price Model = 2 Million product prices = 8 * 2 = 16M

Region 2, Machine 1: 1 Cell = 2nd Price Model = 2 Million product prices = 8 * 2 = 16M

…

Region 78, Machine 78: 1 Cell = 5000th Price Model = 2 Million product prices = 8 * 2 = 16M

Mapper (one per region shown) -> Final output of models -> Reducer -> Value @ Risk output For 1 Day

Map-Reduce does not Jam Network.
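The idea on this slide is to run the valuation next to the data, so only one small number per price model crosses the network instead of a 16 MB cell. The following is a plain Python simulation of that pattern, not the HBase coprocessor API. The names `region_side_compute` and `client_side_reduce`, and the toy data, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Shared positions: product index -> quantity held (toy data).
positions = {0: 100, 1: 500, 2: 400}

# Hypothetical regions, each holding the price cells of a few models.
regions = [
    {"Model1": [78.00, 33.78, 32.12]},
    {"Model2": [77.12, 31.09, 16.00]},
]

def region_side_compute(region):
    """Stand-in for a coprocessor: value the portfolio next to the data
    and return one small number per model instead of the price cells."""
    return {
        model: sum(qty * prices[i] for i, qty in positions.items())
        for model, prices in region.items()
    }

def client_side_reduce(partials):
    """Stand-in for the reducer: merge the per-region results."""
    merged = {}
    for partial in partials:
        merged.update(partial)
    return merged

with ThreadPoolExecutor(max_workers=len(regions)) as pool:
    model_values = client_side_reduce(pool.map(region_side_compute, regions))

print(model_values)
```

Each region returns a handful of floats, so the traffic to the reducer grows with the number of models rather than with the number of product prices.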

Page 10: Bizosys at fifth elephant

Why is price-model-id stored as row-key?

Reading sequentially (HBase Scanner) is a lot faster than random row reads.
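HBase keeps rows sorted by row key, so a scanner over a key range reads neighbouring rows after a single seek. The sketch below imitates that with a sorted list. The zero-padded key format and the helper names `row_key` and `scan` are assumptions, since the deck does not show the actual key encoding.

```python
from bisect import bisect_left

def row_key(model_id):
    """Zero-pad the model id so byte order matches numeric order."""
    return f"{model_id:04d}".encode()

# Toy sorted store standing in for a table: rows kept in row-key order.
rows = sorted((row_key(m), f"cell-{m}") for m in range(1, 5001))
keys = [k for k, _ in rows]

def scan(start_id, stop_id):
    """Sequential read of [start_id, stop_id): one seek, then contiguous rows."""
    lo = bisect_left(keys, row_key(start_id))
    hi = bisect_left(keys, row_key(stop_id))
    return rows[lo:hi]

batch = scan(1, 65)  # the first 64 price models in one pass
print(len(batch), batch[0][0], batch[-1][0])  # 64 b'0001' b'0064'
```

Without the padding, key "10" would sort before key "2" and a range scan would no longer return consecutive model ids.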

Page 11: Bizosys at fifth elephant

The Final Building Blocks

Hadoop Distributed File System

Hadoop Map-Reduce | Hadoop HBase

HSearch Indexer | HSearch Coprocessor

MR Indexing Job with Lucene Analyzers | VAR RealTime MR Plug-In

HSearch Adapter

VAR Computation Application

Batch Mode Indexing | Real-Time computation

Page 12: Bizosys at fifth elephant

Why We Like HBase

• Scalable
• Real-Time
• Apache Licensed

Why We Built HSearch

• Search and Analysis inside Hadoop
• Real-time Map-Reduce
• Extreme Parallelization

Page 13: Bizosys at fifth elephant

• Distribute index with auto-sharding and auto-replication - Handle Big Data

• Parallelize Indexing, Searching, Grouping – in milliseconds

• Binary serde, Compress, (May encrypt) at storage and transmission - Securely

• Cache everything – Serving thousands of users

• Redundize everything – With very limited support engineers.

• Index, Search and Analyze multi-structure big data in milliseconds.

• Search/Analyze as events unfold - For any additions or changes at sources.

• Plug-in custom algos/code with runtime data grouping and computing.

WHY

HOW

Available on hadoopsearch.net (Apache Licensed)

Page 14: Bizosys at fifth elephant


For more information regarding Bizosys business, please write to [email protected]

http://www.bizosys.com