Hadoop-DS: A SQL over Hadoop Benchmark

Upload: nicolas-morales

Post on 26-Jan-2015

Category: Software


DESCRIPTION

Hadoop-DS: A SQL over Hadoop Benchmark. Based on the popular TPC-DS benchmark, it mimics porting a workload from an RDBMS data warehouse to a SQL-over-Hadoop solution.

TRANSCRIPT

Page 1: Hadoop-DS: A SQL over Hadoop Benchmark

© 2013 IBM Corporation

• Based on the popular TPC-DS benchmark

• Mimics porting a workload from an RDBMS data warehouse to a SQL-over-Hadoop solution

SQL Compatibility Matters:

• Big SQL is the only solution with a robust SQL engine able to execute all 99 queries, and with minimal porting effort

• Hive/Impala took weeks to port queries: only a subset worked, due to SQL limitations, query timeouts and runtime failures

• Porting effort: <1 hour (Big SQL); ~4 weeks each (Hive/Impala)

• % working: 100% (Big SQL); 73% and 70% (Hive/Impala)

Common set of 46 queries working

Independently audited

**See Speaker notes for disclaimer

Page 2: Hadoop-DS: A SQL over Hadoop Benchmark

Throughput Matters:

• Big SQL is 3.6x faster than Impala and 5.4x faster than Hive 0.13 for the 46 common queries at 10TB.

• Big SQL was also able to execute all 99 queries with 6 concurrent streams at 10TB.
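The two speedup figures above also imply a ratio between Impala and Hive 0.13. The following is a minimal sketch of that arithmetic, assuming "Nx faster" means the slower engine took N times the elapsed time of Big SQL on the same 46 queries; the function and variable names are illustrative and not part of the benchmark.

```python
# Speedup factors quoted on the slide (46 common queries at 10TB).
BIG_SQL_VS_IMPALA = 3.6
BIG_SQL_VS_HIVE = 5.4


def relative_times(big_sql_time=1.0):
    """Elapsed time per engine, normalised to Big SQL.

    Assumption: 'Nx faster' means the other engine needs N times
    the elapsed time of Big SQL for the same work.
    """
    return {
        "Big SQL": big_sql_time,
        "Impala": big_sql_time * BIG_SQL_VS_IMPALA,
        "Hive 0.13": big_sql_time * BIG_SQL_VS_HIVE,
    }


times = relative_times()
# Ratio of Hive time to Impala time, i.e. the implied Impala speedup over Hive.
impala_vs_hive = times["Hive 0.13"] / times["Impala"]
print(f"Implied Impala speedup over Hive 0.13: {impala_vs_hive:.2f}x")
```

Under that reading, Impala comes out about 1.5x faster than Hive 0.13 on the common query set.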

Scaling Matters:

• Big SQL completed 4 concurrent query streams at 30TB in 1.8x the time of a single query stream
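To put the scaling figure in throughput terms, here is a small sketch of the arithmetic. It assumes each concurrent stream runs the same amount of work as the single-stream run; the slide does not state this, and the function name is hypothetical.

```python
def throughput_gain(streams, time_ratio):
    """Work completed per unit time, relative to a single stream.

    streams:    number of concurrent query streams
    time_ratio: elapsed time of the concurrent run divided by the
                elapsed time of the single-stream run
    Assumption: every stream does the same work as the single stream.
    """
    return streams / time_ratio


# Figures from the slide: 4 streams at 30TB in 1.8x the single-stream time.
gain = throughput_gain(streams=4, time_ratio=1.8)
print(f"Throughput gain with 4 streams: {gain:.2f}x")
```

Under this assumption, four times the work finishing in 1.8x the time corresponds to roughly 2.2x the single-stream throughput.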

Independently audited results.