Transcript
Page 1: Hadoop-DS: A SQL over Hadoop Benchmark

© 2013 IBM Corporation

• Based on popular TPC-DS benchmark

• Mimics porting workload from RDBMS Data Warehouse to SQL over Hadoop solution


SQL Compatibility Matters:

• Big SQL is the only solution with a robust SQL engine able to execute all 99 queries, and with minimal porting effort

• Hive/Impala took weeks to port queries; only a subset works, due to SQL limitations, query timeouts & runtime failures

Porting effort: <1 hour for Big SQL vs. ~4 weeks each for Hive and Impala

% of queries working: 100% for Big SQL vs. 73% and 70% for the other two engines

Common set of 46 queries working

Independently audited

**See Speaker notes for disclaimer

Page 2: Hadoop-DS: A SQL over Hadoop Benchmark


Throughput Matters:

• Big SQL is 3.6x faster than Impala and 5.4x faster than Hive 0.13 on the 46 common queries at 10TB

• Big SQL is also able to execute all 99 queries with 6 concurrent streams at 10TB

Scaling Matters:

• Big SQL completed 4 concurrent query streams at 30TB in 1.8x the time of a single query stream
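To put the scaling figure in throughput terms, here is a minimal sketch of the arithmetic. It assumes each concurrent stream runs the same amount of work as the single-stream run, which the slide does not state explicitly; the function name `throughput_gain` is hypothetical and used only for illustration.

```python
def throughput_gain(streams: int, elapsed_ratio: float) -> float:
    """Relative throughput when `streams` concurrent streams finish in
    `elapsed_ratio` times the elapsed time of one stream.

    Assumption (not from the slide): every stream does the same amount
    of work as the single-stream baseline.
    """
    if streams < 1 or elapsed_ratio <= 0:
        raise ValueError("streams must be >= 1 and elapsed_ratio must be > 0")
    # N times the work completed in `elapsed_ratio` times the baseline time.
    return streams / elapsed_ratio


# Figures quoted on the slide: 4 streams in 1.8x the single-stream time.
gain = throughput_gain(4, 1.8)
print(f"{gain:.2f}x")  # prints "2.22x"
```

Under that assumption, four streams finishing in 1.8x the single-stream time corresponds to roughly 2.2x the work completed per unit of time.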


Independently audited results.

