big data benchmarking with rdma solutions
Post on 12-May-2015
705 Views
Preview:
DESCRIPTION
TRANSCRIPT
© 2013 Mellanox Technologies 1
Big Data Benchmarking with RDMA solutions
Oracle Open World 2013
© 2013 Mellanox Technologies 2
Leading Supplier of End-to-End Interconnect Solutions
Host/Fabric SoftwareICs Switches/GatewaysAdapter Cards Cables
Comprehensive End-to-End InfiniBand and Ethernet Portfolio
Virtual Protocol Interconnect
StorageFront / Back-EndServer / Compute Switch / Gateway
56G IB & FCoIB 56G InfiniBand
10/40/56GbE & FCoE 10/40/56GbE
Fibre Channel
Virtual Protocol Interconnect
© 2013 Mellanox Technologies 3
A scalable fault-tolerant distributed system for data storage and processing
Hadoop has two main systems• Hadoop Distributed File System: self-healing high-bandwidth clustered storage. • MapReduce: distributed fault-tolerant resource management and scheduling coupled with a scalable data
programming abstraction.
Key values • Flexibility – Store any data, Run any analysis. • Scalability – Start at 1TB/3-nodes grow to petabytes/1000s of nodes. • Economics – Cost per TB at a fraction of traditional options.
Hadoop Framework
HDFS™(Hadoop Distributed File System)
Map Reduce HBase
DISK DISK DISK DISK DISK DISK
Hive Pig
Map Reduce
HDFS™(Hadoop Distributed File System)
© 2013 Mellanox Technologies 4
Three Areas for Accelerations
Data Analytics• Explore inefficiencies in existing analytics frameworks and systems• Accelerate data processing to deliver faster results
Storage• Explore ways to refine dominant file system• Take advantage for direct attached disk to accelerate data access
Distributed Storage• Leverage popular distributed storage systems with Big Data applications• Use existing systems for usage with Big Data frameworks
© 2013 Mellanox Technologies 5
~88% CPU Utilization
I/O Offload Frees Up CPU for Application Processing U
ser
Sp
ace
Sy
stem
Sp
ace
~53% CPU Utilization
~47% CPU Overhead/Idle
~12% CPU Overhead/Idle
Without RDMA With RDMA and Offload
Use
r S
pa
ceS
yst
em S
pa
ce
© 2013 Mellanox Technologies 6
Plug-in architecture • Open-source, latest GA version 3.1 (6/10/2013)• Google code repository at: https://code.google.com/p/uda-plugin/
Accelerates Map Reduce Jobs• Accelerated merge sort
Efficient Shuffle Provider • Data transfer over RDMA • Supports InfiniBand and Ethernet
Supported Hadoop Distributions• Apache 3.0 – In the main trunk!• Apache 2.0.3 – In the main trunk!• Apache Hadoop 1.0.x ; 1.1.x• Cloudera Distribution Hadoop 3 &4• Hortonworks HDP 1.x• GPHD 1.2
Supported Hardware• ConnectX®-3 VPI• SwitchX-2 based systems
Unstructured Data Accelerator - UDA
HDFS™(Hadoop Distributed File System)
Map Reduce HBase
DISK DISK DISK DISK DISK DISK
Hive Pig
Map Reduce
© 2013 Mellanox Technologies 7
Double Map Reduce Performance with UDA
*TeraSort is a popular benchmark used to measure the performance of Hadoop cluster
~50% Disk Access CPU Efficiency 2.5X
**1TB Data Set, 20x dual X5670 (Westmere) Machines, 10x HDD Base; Vanilla GPHD1.2; UDA GPHD1.2+UDA
2X Faster Job Completion! Increase the Value of Data!
54%
© 2013 Mellanox Technologies 8
HDFS is the Hadoop File System• The underlying File system for HBase and other NoSQL Data Bases
More Drives, Higher Throughput is Needed
SSDs Solutions Must use Higher Throughput• Bounded by 1GbE and 10GbE
HDFS Acceleration; Joint Project With Ohio State University
HDFS™(Hadoop Distributed File System)
Map Reduce HBase
DISK DISK DISK DISK DISK DISK
Hive Pig
© 2013 Mellanox Technologies 9
SSDs Become De-Facto standard in HDFS deployment• Read capability is a critical factor for application performance
E-DFSIO, Part of Intel’s HiBench test suite, profiles aggregated throughput on the cluster• 1GbE network impede any performance benefit from SSD deployment
Unlocking the Power SSDs In Hadoop Environment
E-DFSIO, Showing the Power of SSD @ HDFS
© 2013 Mellanox Technologies 10
OrangeFS as Hadoop Storage Solution
© 2013 Mellanox Technologies 11
Mellanox VPI Card• MCX354A-FCBT
Mellanox Edge Switches• MSX10xx; MSX60xx
Cloudera Certified – CDH3 and CDH4
© 2013 Mellanox Technologies 12
E5-26x0 (Sandy Bridge) Machines• Dual Socket• 4+ cores each socket• 32GB+ of DRAM
Disk Drives• At least 5 x 1TB, SAS, 10K RPM
Hadoop Configuration• At least one Name Node + Job Tracker• At least 4 Data Nodes
Installation:• Your selection of Hadoop Distribution or other Big Data solution (Such as Cassandra)
Networking• ConnectX-3 VPI card, FDR, 40GbE and 10GbE• SwitchX based systems: MSX6036F, MSX1036B and MSX1016 • Mellanox’s FDR, 40GbE and 10GbE Cable Solutions
http://www.mellanox.com/related-docs/whitepapers/WP_Deploying_Hadoop.pdf
Simple Building Block for Big Data Solution
© 2013 Mellanox Technologies 13
EMC 1000-Node Analytic Platform Accelerates Industry's Hadoop Development 24 PetaByte of physical storage
• Half of every written word since inception of mankind Mellanox VPI Solutions
Test Drive Your Big Data
2X Faster Hadoop Job Run-TimeHadoop
Acceleration
High Throughput, Low Latency, RDMA Critical for ROI
© 2013 Mellanox Technologies 14
Thank You
top related