Hadoop-DS: A SQL over Hadoop Benchmark
DESCRIPTION
Hadoop-DS: A SQL over Hadoop Benchmark. Based on popular TPC-DS benchmark; mimics porting workload from RDBMS Data Warehouse to SQL over Hadoop solution.
TRANSCRIPT
1 © 2013 IBM Corporation
• Based on popular TPC-DS benchmark
• Mimics porting workload from RDBMS Data Warehouse to SQL over Hadoop solution
Hadoop-DS: A SQL over Hadoop Benchmark
SQL Compatibility Matters:
• Big SQL is the only solution with a robust SQL engine able to execute all 99 queries, and with minimal porting effort
• Porting the queries to Hive and Impala took weeks, and only a subset works because of SQL limitations, query timeouts & runtime failures
Porting effort: <1 hour for Big SQL; ~4 weeks each for Hive and Impala
% working: 100% for Big SQL; 73% and 70% for the other two engines
Common set of 46 queries working
Independently audited
**See Speaker notes for disclaimer
Throughput Matters:
• Big SQL is 3.6x faster than Impala and 5.4x faster than Hive 0.13 for the 46 common queries at 10TB.
• Big SQL is also able to execute all 99 queries with 6 concurrent streams at 10TB.
Hadoop-DS: A SQL over Hadoop Benchmark
Scaling Matters:
• Big SQL completed 4 concurrent query streams at 30TB in 1.8x the time of a single query stream
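The scaling figure can be read as a throughput gain. The following is a minimal sketch of that arithmetic, assuming each of the 4 streams does the same amount of work as the single stream (the slide does not state this explicitly). The function name `concurrency_gain` is hypothetical and used only for illustration.

```python
def concurrency_gain(streams: int, relative_elapsed: float) -> float:
    """Throughput gain of running `streams` concurrently versus back to back.

    relative_elapsed is the concurrent elapsed time expressed as a multiple
    of the single-stream elapsed time (1.8 on the slide).
    Assumption: every stream carries the same work as the single stream,
    so a purely serial run would take `streams` times as long.
    """
    serial_relative_elapsed = float(streams)
    return serial_relative_elapsed / relative_elapsed


# Figures from the slide: 4 streams at 30TB in 1.8x the single-stream time.
gain = concurrency_gain(4, 1.8)
print(f"{gain:.2f}x")  # prints "2.22x"
```

Under that assumption, four times the work finished in 1.8 times the elapsed time, which is roughly 2.2x the throughput of running the same streams one after another.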
Independently audited results.