30tb big data decision support (hadoop-ds) benchmark
DESCRIPTION
30TB Big Data Decision Support (Hadoop-DS) benchmark The Hadoop-DS benchmark was executed on the following configuration: Test Platform: IBM x3650BD - 17 Node Cluster Query Engine: IBM BigInsights Big SQL v3.0 Operating System: Red Hat Enterprise Linux 6.4 Configuration perTRANSCRIPT
Benchmark sponsor: Berni Schiefer IBM 8200 Warden Avenue Markham, Ontario, L6C 1C7
October 24, 2014
At IBM’s request I verified the implementation and results of a 30TB Big Data Decision Support
(Hadoop-DS) benchmark, with most features derived from the TPC-DS Benchmark.
The Hadoop-DS benchmark was executed on the following configuration:
Test Platform: IBM x3650BD - 17 Node Cluster Query Engine: IBM BigInsights Big SQL v3.0 Operating System: Red Hat Enterprise Linux 6.4
Configuration per node: CPUs 2 x Intel Xeon Processor E5-2680 v2 (2.8 GHz, 25MB L3) Memory 128GB (1867MHz DDR3) Storage 10 x 2TB SATA 3.5” HDD & 1 x 128GB SATA 2.5” SSD (swap)
The results were:
Single-Stream Performance 1,023 Hadoop-DS Qph@30TB Multi-Stream Performance 2,274 Hadoop-DS Qph@30TB Multi-Stream Concurrency 4 Streams Load Time 37h 11m 10s
While these results are for a non-TPC benchmark, they complied with the following subset of
requirements from the latest version of the TPC-DS Benchmark standard:
• The database schema was defined with the proper layout and data types
• The database population was generated using the TPC provided dsdgen
• The database was properly scaled to 30TB and populated accordingly
• The auxiliary data structure requirements were met since none were defined
• The database load time was properly measured and reported
• The query input variables were generated by the TPC provided dsqgen
• The execution times for queries were correctly measured and reported
The following aspects of the Hadoop-DS benchmark were implemented within the spirit of the
TPC-DS Benchmark:
• All 99 queries were executed using the specified and unmodified query text or by applying minor modifications to the queries
• Query answers were verified against the available validation answer sets
The following features and requirements from the latest version of the TPC-DS Benchmark
standard were not adhered to:
• The defined referential integrity constraints were not enforced
• The statistics collection did not meet the required limitations
• The data persistence properties were not demonstrated
• The data maintenance functions were neither implemented nor executed
• A single throughput test was used to measure multi-user performance
• The system pricing was not provided or reviewed
• The report did not meet the defined format and content
The executive summary and the benchmark report documenting the details of this Hadoop-DS benchmark execution were verified for accuracy.
Respectfully Yours,
François Raab, President