30TB Big Data Decision Support (Hadoop-DS) benchmark

Benchmark sponsor:
Berni Schiefer
IBM
8200 Warden Avenue
Markham, Ontario, L6C 1C7

October 24, 2014

At IBM’s request I verified the implementation and results of a 30TB Big Data Decision Support (Hadoop-DS) benchmark, with most features derived from the TPC-DS Benchmark.

The Hadoop-DS benchmark was executed on the following configuration:

Test Platform: IBM x3650BD - 17 Node Cluster
Query Engine: IBM BigInsights Big SQL v3.0
Operating System: Red Hat Enterprise Linux 6.4

Configuration per node:
CPUs: 2 x Intel Xeon Processor E5-2680 v2 (2.8 GHz, 25MB L3)
Memory: 128GB (1867MHz DDR3)
Storage: 10 x 2TB SATA 3.5” HDD & 1 x 128GB SATA 2.5” SSD (swap)

The results were:

Single-Stream Performance: 1,023 Hadoop-DS Qph@30TB
Multi-Stream Performance: 2,274 Hadoop-DS Qph@30TB
Multi-Stream Concurrency: 4 Streams
Load Time: 37h 11m 10s
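For readers who want to compare these figures, the arithmetic can be sketched as follows. This is an illustrative calculation only and is not part of the verified benchmark: the function names are hypothetical, and the throughput ratio is a derived number that the report itself does not state. The inputs are copied from the results above.

```python
from datetime import timedelta

# Figures copied from the reported results.
SINGLE_STREAM_QPH = 1023   # Hadoop-DS Qph@30TB, single stream
MULTI_STREAM_QPH = 2274    # Hadoop-DS Qph@30TB, 4 concurrent streams
LOAD_TIME = timedelta(hours=37, minutes=11, seconds=10)


def load_time_seconds(load_time: timedelta) -> int:
    """Return the reported load time as a whole number of seconds."""
    return int(load_time.total_seconds())


def throughput_ratio(single_qph: float, multi_qph: float) -> float:
    """Return multi-stream throughput relative to single-stream throughput.

    Hypothetical helper: this ratio is derived here for comparison and
    is not a metric defined in the report.
    """
    return multi_qph / single_qph


if __name__ == "__main__":
    print("Load time (s):", load_time_seconds(LOAD_TIME))
    ratio = throughput_ratio(SINGLE_STREAM_QPH, MULTI_STREAM_QPH)
    print("Multi/single throughput ratio:", round(ratio, 2))
```

The sketch deliberately avoids reconstructing how Qph itself was computed, since the metric's formula is not given here.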

While these results are for a non-TPC benchmark, they complied with the following subset of requirements from the latest version of the TPC-DS Benchmark standard:

• The database schema was defined with the proper layout and data types

• The database population was generated using the TPC provided dsdgen

• The database was properly scaled to 30TB and populated accordingly

• The auxiliary data structure requirements were met since none were defined

• The database load time was properly measured and reported

• The query input variables were generated by the TPC provided dsqgen

• The execution times for queries were correctly measured and reported

The following aspects of the Hadoop-DS benchmark were implemented within the spirit of the TPC-DS Benchmark:

• All 99 queries were executed using the specified and unmodified query text or by applying minor modifications to the queries

• Query answers were verified against the available validation answer sets

The following features and requirements from the latest version of the TPC-DS Benchmark standard were not adhered to:

• The defined referential integrity constraints were not enforced

• The statistics collection did not meet the required limitations

• The data persistence properties were not demonstrated

• The data maintenance functions were neither implemented nor executed

• A single throughput test was used to measure multi-user performance

• The system pricing was not provided or reviewed

• The report did not meet the defined format and content

The executive summary and the benchmark report documenting the details of this Hadoop-DS benchmark execution were verified for accuracy.

Respectfully Yours,

François Raab, President

