30TB Big Data Decision Support (Hadoop-DS) Benchmark

Benchmark sponsor:
Berni Schiefer
IBM
8200 Warden Avenue
Markham, Ontario, L6C 1C7

October 24, 2014

At IBM’s request, I verified the implementation and results of a 30TB Big Data Decision Support (Hadoop-DS) benchmark, most of whose features are derived from the TPC-DS Benchmark.

The Hadoop-DS benchmark was executed on the following configuration:

Test Platform: IBM x3650BD, 17-node cluster
Query Engine: IBM BigInsights Big SQL v3.0
Operating System: Red Hat Enterprise Linux 6.4

Configuration per node:
CPUs: 2 x Intel Xeon Processor E5-2680 v2 (2.8 GHz, 25MB L3)
Memory: 128GB (1867MHz DDR3)
Storage: 10 x 2TB SATA 3.5” HDD & 1 x 128GB SATA 2.5” SSD (swap)
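To get a sense of the overall size of the test system, the per-node configuration can be multiplied across the cluster. The sketch below is illustrative only and rests on two assumptions: that all 17 nodes share the per-node configuration listed above (the letter does not say whether any node served only as a master or management node), and that each E5-2680 v2 processor has 10 cores (Intel's published specification, not stated in this letter). The `NodeConfig` type and `cluster_totals` helper are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeConfig:
    """Per-node hardware, as listed in the configuration above."""
    cpu_sockets: int
    cores_per_socket: int  # assumption: 10 cores per E5-2680 v2, not stated in the letter
    memory_gb: int
    hdd_count: int
    hdd_tb: int
    ssd_gb: int  # SSD used for swap


def cluster_totals(node: NodeConfig, nodes: int) -> dict:
    """Multiply per-node resources across the cluster (assumes identical nodes)."""
    return {
        "sockets": node.cpu_sockets * nodes,
        "cores": node.cpu_sockets * node.cores_per_socket * nodes,
        "memory_gb": node.memory_gb * nodes,
        "raw_hdd_tb": node.hdd_count * node.hdd_tb * nodes,
        "swap_ssd_gb": node.ssd_gb * nodes,
    }


node = NodeConfig(cpu_sockets=2, cores_per_socket=10, memory_gb=128,
                  hdd_count=10, hdd_tb=2, ssd_gb=128)
totals = cluster_totals(node, nodes=17)
print(totals)
# {'sockets': 34, 'cores': 340, 'memory_gb': 2176, 'raw_hdd_tb': 340, 'swap_ssd_gb': 2176}
```

Under these assumptions, the cluster offers about 340 TB of raw HDD capacity for a 30TB data set; how much of that capacity is consumed by replication or temporary space is not covered by this letter.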

The results were:

Single-Stream Performance: 1,023 Hadoop-DS Qph@30TB
Multi-Stream Performance: 2,274 Hadoop-DS Qph@30TB
Multi-Stream Concurrency: 4 streams
Load Time: 37h 11m 10s
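The reported results support a few simple derived figures. The sketch below converts the load time to seconds and compares multi-stream with single-stream throughput. It is plain arithmetic on the numbers above, not an audited result, and it assumes nothing about how the Hadoop-DS Qph metric itself is computed, because the letter does not give that formula. The average load rate assumes the full 30TB was loaded over the entire reported load time. `parse_hms` is a hypothetical helper written for this example.

```python
def parse_hms(text: str) -> int:
    """Convert a duration such as '37h 11m 10s' to seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    total = 0
    for part in text.split():
        value, unit = part[:-1], part[-1]  # e.g. '37h' -> ('37', 'h')
        total += int(value) * units[unit]
    return total


# Figures as reported in the results above.
single_stream_qph = 1023
multi_stream_qph = 2274
streams = 4
scale_tb = 30

load_seconds = parse_hms("37h 11m 10s")
load_hours = load_seconds / 3600

# Derived ratios (illustrative arithmetic only, not audited metrics).
speedup = multi_stream_qph / single_stream_qph
per_stream_qph = multi_stream_qph / streams
avg_load_rate_tb_per_h = scale_tb / load_hours  # assumes all 30TB loaded in that window

print(f"load time: {load_seconds} s ({load_hours:.2f} h)")      # load time: 133870 s (37.19 h)
print(f"multi/single throughput ratio: {speedup:.2f}x")        # multi/single throughput ratio: 2.22x
print(f"throughput per stream: {per_stream_qph:.1f} Qph")      # throughput per stream: 568.5 Qph
print(f"average load rate: {avg_load_rate_tb_per_h:.2f} TB/h")  # average load rate: 0.81 TB/h
```

Read this way, four concurrent streams deliver roughly 2.2 times the single-stream query throughput, while each individual stream runs at about 568.5 Qph, well below the single-stream rate.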

While these results are for a non-TPC benchmark, they complied with the following subset of requirements from the latest version of the TPC-DS Benchmark standard:

• The database schema was defined with the proper layout and data types

• The database population was generated using the TPC-provided dsdgen

• The database was properly scaled to 30TB and populated accordingly

• The auxiliary data structure requirements were met, since no auxiliary data structures were defined

• The database load time was properly measured and reported

• The query input variables were generated by the TPC-provided dsqgen

• The execution times for queries were correctly measured and reported