30TB Big Data Decision Support (Hadoop-DS) benchmark


Upload: nicolas-morales

Post on 26-Jan-2015




Benchmark sponsor:
Berni Schiefer
IBM
8200 Warden Avenue
Markham, Ontario, L6C 1C7

October 24, 2014

At IBM’s request I verified the implementation and results of a 30TB Big Data Decision Support (Hadoop-DS) benchmark, with most features derived from the TPC-DS Benchmark.

The Hadoop-DS benchmark was executed on the following configuration:

Test Platform: IBM x3650BD - 17 Node Cluster
Query Engine: IBM BigInsights Big SQL v3.0
Operating System: Red Hat Enterprise Linux 6.4

Configuration per node:
CPUs: 2 x Intel Xeon Processor E5-2680 v2 (2.8 GHz, 25MB L3)
Memory: 128GB (1867MHz DDR3)
Storage: 10 x 2TB SATA 3.5” HDD & 1 x 128GB SATA 2.5” SSD (swap)
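For a sense of scale, the per-node figures can be multiplied out to cluster totals. The short Python sketch below does only that arithmetic. It assumes all 17 nodes share the stated configuration, which the letter does not say explicitly, and the function and constant names are illustrative.

```python
# Per-node figures as stated in the letter.
NODES = 17
MEMORY_GB_PER_NODE = 128
HDD_PER_NODE = 10
HDD_TB_EACH = 2
SCALE_TB = 30  # nominal benchmark scale factor


def cluster_totals(nodes=NODES):
    """Aggregate memory and raw HDD capacity, assuming identical nodes."""
    return {
        "memory_gb": nodes * MEMORY_GB_PER_NODE,
        "raw_hdd_tb": nodes * HDD_PER_NODE * HDD_TB_EACH,
    }


totals = cluster_totals()
# Raw disk capacity relative to the nominal 30TB data set; this ignores
# replication, file formats, and temporary space.
raw_disk_to_data_ratio = totals["raw_hdd_tb"] / SCALE_TB

print(totals)
print(f"raw HDD capacity / scale factor: {raw_disk_to_data_ratio:.1f}x")
```

These totals are upper bounds under the identical-node assumption; if some nodes served only management roles, the capacity available to the query engine would be lower.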

The results were:

Single-Stream Performance: 1,023 Hadoop-DS Qph@30TB
Multi-Stream Performance: 2,274 Hadoop-DS Qph@30TB
Multi-Stream Concurrency: 4 Streams
Load Time: 37h 11m 10s
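A few derived figures help when comparing the reported results. The Python sketch below computes them from the stated numbers only. It does not reproduce the Hadoop-DS Qph formula, which this letter does not define, and the nominal load rate simply divides the 30TB scale factor by the load time as an illustrative assumption.

```python
from datetime import timedelta

# Reported results, copied from the letter.
single_stream_qph = 1023
multi_stream_qph = 2274
streams = 4
load_time = timedelta(hours=37, minutes=11, seconds=10)
scale_tb = 30  # nominal scale factor

# Throughput gain of the 4-stream run over the single-stream run.
throughput_ratio = multi_stream_qph / single_stream_qph

# Average share of the multi-stream throughput per stream.
per_stream_qph = multi_stream_qph / streams

# Load time in seconds and hours, plus a nominal load rate.
load_seconds = int(load_time.total_seconds())
load_hours = load_seconds / 3600
nominal_load_tb_per_hour = scale_tb / load_hours

print(f"multi/single throughput ratio: {throughput_ratio:.2f}")
print(f"Qph per stream (multi-stream): {per_stream_qph:.1f}")
print(f"load time: {load_seconds} s ({load_hours:.2f} h)")
print(f"nominal load rate: {nominal_load_tb_per_hour:.2f} TB/h")
```

Under these assumptions, four concurrent streams deliver roughly 2.2 times the single-stream throughput, and the load proceeds at a nominal rate of about 0.8 TB per hour.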

While these results are for a non-TPC benchmark, they complied with the following subset of requirements from the latest version of the TPC-DS Benchmark standard:

• The database schema was defined with the proper layout and data types

• The database population was generated using the TPC-provided dsdgen

• The database was properly scaled to 30TB and populated accordingly

• The auxiliary data structure requirements were met since none were defined

• The database load time was properly measured and reported

• The query input variables were generated by the TPC-provided dsqgen

• The execution times for queries were correctly measured and reported


The following aspects of the Hadoop-DS benchmark were implemented within the spirit of the TPC-DS Benchmark:

• All 99 queries were executed using the specified and unmodified query text or by applying minor modifications to the queries

• Query answers were verified against the available validation answer sets

The following features and requirements from the latest version of the TPC-DS Benchmark standard were not adhered to:

• The defined referential integrity constraints were not enforced

• The statistics collection did not meet the required limitations

• The data persistence properties were not demonstrated

• The data maintenance functions were neither implemented nor executed

• A single throughput test was used to measure multi-user performance

• The system pricing was not provided or reviewed

• The report did not meet the defined format and content

The executive summary and the benchmark report documenting the details of this Hadoop-DS benchmark execution were verified for accuracy.

Respectfully Yours,

François Raab, President