sudoers: benchmarking hadoop with aloja
![Page 1: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/1.jpg)
Benchmarking Hadoop with ALOJA
Oct 6, 2015
by Nicolas Poggi @ni_po
sudoers Barcelona:
![Page 2: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/2.jpg)
About Nicolas Poggi @ni_po
Work: Education:
Community:
![Page 3: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/3.jpg)
Agenda
Intro on Hadoop
Current scenario and problems
ALOJA project
Open source tools
Benchmarking DEMO
Results
DEMO results online
Open questions and comments
![Page 4: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/4.jpg)
Intro: Hadoop design and ecosystem
![Page 5: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/5.jpg)
Hadoop design
Hadoop designed to solve complex data
Structured and non-structured
With [close to] linear scalability
Simplifying the programming model
From MPI, OpenMP, CUDA, …
Operates as a blackbox for data analysts
Image source: Hadoop, the definitive guide
![Page 6: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/6.jpg)
Hadoop parameters
100+ tunable parameters, obscure and interrelated
mapred.map/reduce.tasks.speculative.execution
io.sort.mb 100 (300)
io.sort.record.percent 5% (15%)
io.sort.spill.percent 80% (95 – 100%)
Number of Mappers and Reducers
Rule of thumb: 0.5 - 2 per CPU core
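The rule of thumb of 0.5 to 2 mappers or reducers per CPU core can be turned into a range of values to benchmark. The following is a minimal sketch only; `task_slot_range` is a hypothetical helper, not part of Hadoop or ALOJA, and rounding half a core down (with a floor of 1) is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: derive the range of task slots to test from the
# "0.5 - 2 per CPU core" rule of thumb.
# task_slot_range is a hypothetical helper, not a Hadoop or ALOJA command.
task_slot_range() {
  local cores="$1"
  local min=$(( cores / 2 ))   # 0.5 per core, rounded down (assumption)
  local max=$(( cores * 2 ))   # 2 per core
  if [ "$min" -lt 1 ]; then
    min=1                      # always test at least one slot
  fi
  echo "$min $max"
}

# Example: a 4-core VM gives the range 2 to 8
task_slot_range 4
```

On a real node the core count could be taken from `nproc` instead of a fixed number.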
![Page 7: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/7.jpg)
Hadoop stack for tuning
Image source: Intel® Distribution for Apache Hadoop
![Page 8: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/8.jpg)
Hadoop highly-scalable but…
Not a high-performance solution!
Requires design, setup, and tuning
Design: clusters, cluster topology
Setup: OS, Hadoop config
Tuning: iterative approach, time consuming
And extensive benchmarking!
![Page 9: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/9.jpg)
Hadoop ecosystem
Large and spread
Dominated by big players
Custom patches
Default values not ideal
Product claims
Cloud vs. On-premise
IaaS
PaaS
EMR, HDInsight
Needs standardization and auditing!
DATA
![Page 10: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/10.jpg)
Product claims
Needs auditing!
![Page 11: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/11.jpg)
Too many choices?
Figure labels (each axis runs from - to +): Remote volumes, Rotational HDDs, JBODs, Large VMs, Small VMs, Gb Ethernet, InfiniBand, RAID, Cost, Performance, On-Premise, Cloud, High availability, Replication
And where is my system configuration positioned on each of these axes?
![Page 12: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/12.jpg)
Project ALOJA
Open initiative to produce mechanisms for an automated characterization of the cost-effectiveness of Big Data deployments
Results from a growing need of the community to understand job execution details and create transparency
Explore different configuration deployment options and their tradeoffs
Both software and hardware
Cloud services and on-premise
Seeks to provide knowledge, tools, and an online service with which users can make better-informed decisions and reduce the TCO of their Big Data infrastructures
Guide the future development and deployment of Big Data clusters and applications
![Page 13: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/13.jpg)
Challenges, options, and implementation
![Page 14: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/14.jpg)
Challenges (circa end 2013)
Test different cluster architectures
On-premise: commodity, high-end, appliance, low-power
Cloud IaaS: 32 different VMs in Azure, similar in other providers
Cloud PaaS: HDInsight, EMR, CloudBigData
Different access levels: full admin, user-only, request-to-install, everything ready, queuing systems (SGE)
Different versions: Hadoop, JVM, Spark, Hive, etc.
Dev environments and testing: Big Data usually requires a cluster to develop and test
![Page 15: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/15.jpg)
Benchmarking vs. Production envs
Need to compare different executions
Not how the systems are doing now
This is the main diff with prod products
Data does not change (non-OLTP)
Temporary data for benchmarks vs. important data
Fast iteration vs. reliability
Iterates configurations vs. fixed config
Many fast, experimental changes
Security can be relaxed
Management for Hadoop
Vendor lock-in
Lack of systems support (Azure, on-prem, low-power)
Hadoop is our use case, not the only one
Leave no traces on the benchmarked system
![Page 16: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/16.jpg)
Available options (circa end 2013)
Deployment: jclouds, Foreman, Puppet, Ambari
Config and deploy: Ambari (Hadoop only), or use Configuration Management (CM): Puppet, Chef, Ansible…
Monitoring: Ganglia, Zabbix, Ambari, Cloudera Manager, Kibana, GraphD…
Problems
All systems designed for PROD
Not for comparison
No Azure support
Many different packages
No one-fits-all solution
Solution
Custom implementation
Based on simple components
Wrapping commands
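As an illustration of what "wrapping commands" can look like in bash, the sketch below runs any command, measures its wall-clock time, and reports its exit status. `run_logged` is a hypothetical name chosen for this example; it is not an ALOJA function, and the log format is an assumption.

```shell
#!/usr/bin/env bash
# Sketch of a command wrapper: run a command, time it, log the outcome.
# run_logged is a hypothetical helper, not taken from the ALOJA code base.
run_logged() {
  local start end status=0
  start=$(date +%s)
  "$@" || status=$?            # run the wrapped command, keep its exit code
  end=$(date +%s)
  # Log to stderr so the wrapped command's stdout stays untouched
  echo "cmd='$*' seconds=$(( end - start )) status=$status" >&2
  return "$status"
}

# Example: wrap a trivial command
run_logged echo "hello"
```

Because the wrapper returns the original exit status, it can be dropped in front of existing commands without changing how callers detect failures.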
![Page 17: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/17.jpg)
ALOJA Platform main components
1 Big Data Benchmarking
• Deploy & Provision
• Conf Management
• Parameter selection & Queuing
• Perf counters
• Low-level instrumentation
• App logs
2 Online Repository
• Explore results
• Execution details
• Cluster details
• Costs
• Data sharing
3 Web Analytics
• Data views and evaluations
• Aggregates
• Abstracted Metrics
• Job characterization
• Machine Learning
• Predictions and clustering
Technologies:
NGINX, PHP, MySQL
BASH, Unix tools, CLIs
R, SQL, JS
![Page 18: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/18.jpg)
Workflow in ALOJA
Cluster(s) definition
• VM sizes
• # nodes
• OS, disks
• Capabilities
Execution plan
• Start cluster
• Exec Benchmarks
• Gather results
• Cleanup
Import data
• Convert perf metric
• Parse logs
• Import into DB
Evaluate data
• Data views in Vagrant VM
• Or http://hadoop.bsc.es
PA and KD
• Predictive Analytics
• Knowledge Discovery
Historic Repo
(in progress)
![Page 19: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/19.jpg)
Cluster and node definitions
Clusters (Azure example)
#load AZURE defaults
source "$CONF_DIR/azure_defaults.conf"
clusterName="al-08"
numberOfNodes="8"
vmSize="Large"
#details
vmCores="4"
vmRAM="7" #in GB
#costs
clusterCostHour="1.584" #0.176 * 9
clusterType="IaaS"
clusterDescription="A3 type VMs"
Node (Web in Rackspace)
#load node defaults
source "$CONF_DIR/node_defaults.conf"
defaultProvider="rackspace"
vm_name="aloja-web"
vmSize='io1-30'
attachedVolumes="2"
diskSize="1023"
# Node roles (install functions)
extraLocalCommands="
vm_install_webserver;
vm_install_repo 'provider/rackspace';
install_ganglia_gmond;
config_ganglia_gmond 'aloja-web-rackspace' 'aloja-web';
install_percona /scratch/attached/2/mysql;"
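The cost comment in the Azure cluster definition (`0.176 * 9` for an 8-node cluster) can be reproduced with a small calculation. The sketch below assumes that the ninth VM is a master node and that 0.176 is the hourly price of one VM; neither is stated on the slide. `cluster_cost_hour` and `execution_cost` are hypothetical helpers, not ALOJA functions.

```shell
#!/usr/bin/env bash
# Sketch of the cost arithmetic behind clusterCostHour.
# Both helpers are hypothetical and only illustrate the calculation.

# cluster_cost_hour NODES PRICE_PER_VM_HOUR
# Assumption: one extra VM (a master) is billed on top of the worker nodes.
cluster_cost_hour() {
  awk -v n="$1" -v p="$2" 'BEGIN { printf "%.3f\n", (n + 1) * p }'
}

# execution_cost CLUSTER_COST_PER_HOUR EXECUTION_SECONDS
execution_cost() {
  awk -v c="$1" -v s="$2" 'BEGIN { printf "%.4f\n", c * s / 3600 }'
}

cluster_cost_hour 8 0.176    # prints 1.584, matching clusterCostHour
execution_cost 1.584 900     # cost of a 15-minute run: prints 0.3960
```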
![Page 20: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/20.jpg)
Commands and providers
Provisioning commands
Connect: node and cluster, uses SSH proxies automatically
Deploy
Start, Stop
Delete
Nodes and clusters
Providers
On-premise: custom settings for clusters, multiple disk types, different architectures
Cloud IaaS: Azure, OpenStack, Rackspace, AWS (testing)
Cloud PaaS: HDInsight, CloudBigData, EMR soon
Code at: https://github.com/Aloja/aloja/tree/master/aloja-deploy
![Page 21: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/21.jpg)
Running benchmarks in ALOJA
Example of submitting a job to run:
https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh
To queue jobs and control results: https://github.com/Aloja/aloja/blob/master/shell/exeq.sh
![Page 22: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/22.jpg)
Benchmarking results
![Page 23: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/23.jpg)
ALOJA Online Benchmark Repository
Entry point for exploring the results collected from the executions
Index of executions: quick glance of executions; searchable, sortable
Execution details: performance charts and histograms, Hadoop counters, jobs and task details
Data management of benchmark executions: data importing from different clusters, execution validation, data management and backup
Cluster definitions: cluster capabilities (resources), cluster costs
Sharing results: download executions, add external executions
Documentation and references: papers, links, and feature documentation
Available at: http://aloja.bsc.es
![Page 24: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/24.jpg)
Impact of SW configurations in Speedup (4 node clusters)
Number of mappers: 4m, 6m, 8m, 10m
Compression algorithm: No comp., ZLIB, BZIP2, snappy
Speedup (higher is better)
Results using: http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
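The slides do not define how speedup is calculated. A common definition, assumed here, is the execution time of a baseline configuration divided by the execution time of the configuration under test, so values above 1 mean the tested configuration is faster. `speedup` is a hypothetical helper and the timings in the example are made up.

```shell
#!/usr/bin/env bash
# Sketch: speedup as baseline time divided by the tested configuration's time.
# This definition is an assumption; the slides only say "higher is better".
# speedup BASELINE_SECONDS CONFIG_SECONDS
speedup() {
  awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2f\n", base / t }'
}

# Made-up example: baseline run takes 1000 s, tuned run takes 800 s
speedup 1000 800   # prints 1.25
```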
![Page 25: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/25.jpg)
Impact of HW configurations in Speedup
Disks and Network: HDD-ETH, HDD-IB, SSD-ETH, SSD-IB
Cloud remote volumes: Local only, 1 Remote, 2 Remotes, 3 Remotes, 3 Remotes /tmp local, 2 Remotes /tmp local, 1 Remote /tmp local
Speedup (higher is better)
Results using: http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
![Page 26: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/26.jpg)
Speedup: all disk configurations
SSD vs JBOD
For DFSIOE read, DFSIOE write, and Terasort
URL: http://hadoop.bsc.es/configimprovement?datefrom=&dateto=&benchs%5B%5D=dfsioe_read&benchs%5B%5D=dfsioe_write&benchs%5B%5D=terasort&id_clusters%5B%5D=21&nets%5B%5D=None&disks%5B%5D=HD2&disks%5B%5D=HD3&disks%5B%5D=HD4&disks%5B%5D=HD5&disks%5B%5D=HDD&disks%5B%5D=HS5&disks%5B%5D=RL1&disks%5B%5D=RL2&disks%5B%5D=RL3&disks%5B%5D=RL4&disks%5B%5D=RL5&disks%5B%5D=RL6&disks%5B%5D=RR1&disks%5B%5D=SS2&disks%5B%5D=SSD&mapss%5B%5D=None&comps%5B%5D=None&replications%5B%5D=None&blk_sizes%5B%5D=None&iosfs%5B%5D=None&iofilebufs%5B%5D=None&datanodess%5B%5D=None&bench_types%5B%5D=HDI&bench_types%5B%5D=HiBench&vm_sizes%5B%5D=None&vm_coress%5B%5D=None&vm_RAMs%5B%5D=None&hadoop_versions%5B%5D=None&types%5B%5D=None&filters%5B%5D=valid&filters%5B%5D=filters&allunchecked=
Configurations: 2 SSDs, 5 SATA 1 SSD /tmp, 1 SSD, 1 SATA, 2 SATA, 3 SATA, 4 SATA, 5 SATA
Annotations: Fastest config, High capacity and fast, High capacity but slow
Higher is better
![Page 27: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/27.jpg)
Speedup by disk configuration in the Cloud (higher is better)
URL: http://104.130.159.92/configimprovement?benchs%5B%5D=terasort&disks%5B%5D=HDD&disks%5B%5D=RL1&disks%5B%5D=RL2&disks%5B%5D=RL3&disks%5B%5D=RR1&disks%5B%5D=RR2&disks%5B%5D=RR3&disks%5B%5D=RR4&disks%5B%5D=RR5&disks%5B%5D=RR6&disks%5B%5D=RS1&disks%5B%5D=RS6&disks%5B%5D=SSD&bench_types%5B%5D=HiBench&filters%5B%5D=valid&filters%5B%5D=filters&allunchecked=&selected-groups=disk&datefrom=&dateto=&minexetime=150&maxexetime=1500
Configurations: 1-6 remotes, 1 and 6 remotes with /tmp on SSD, SSD only
![Page 28: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/28.jpg)
VM Size comparison (Azure)
Lower is better
![Page 29: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/29.jpg)
Preview: Cost/Performance Scalability
This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size
X axis: number of datanodes (cluster size)
Left Y axis: execution time (lower is better)
Right Y axis: execution cost
Recommended size
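One way to pick a cost-effective cluster size from such data is to compute the cost of each execution and take the cheapest. The sketch below assumes that cost is datanodes times an hourly price per node times execution time, and it ignores master nodes; the slide does not state how the recommended size is chosen. `cheapest_size` is a hypothetical helper and the input figures are made up.

```shell
#!/usr/bin/env bash
# Sketch: choose the cluster size with the lowest execution cost.
# cheapest_size is hypothetical; the cost model is an assumption.
# stdin: one line per cluster size, "DATANODES EXECUTION_SECONDS"
# $1:    price per node-hour
cheapest_size() {
  awk -v p="$1" '
    {
      cost = $1 * p * $2 / 3600          # nodes * price * hours
      if (NR == 1 || best > cost) { best = cost; size = $1 }
    }
    END { printf "%d %.3f\n", size, best }'
}

# Made-up sample data: 4, 8 and 16 datanodes
printf '%s\n' "4 3600" "8 1500" "16 1000" | cheapest_size 0.176   # prints: 8 0.587
```

In this made-up sample the 16-node cluster is the fastest, but the 8-node cluster has the lowest cost per execution.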
![Page 30: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/30.jpg)
Cost-effectiveness (On-premise vs. Cloud)
Axes: Price, Performance
InfiniBand + SSD (LOCAL)
GbE + SSD (LOCAL)
CLOUD (local disk /tmp and HDFS)
CLOUD (/tmp in local disk, HDFS in Blob storage 1-3 devices)
CLOUD (/tmp and HDFS in Blob storage 1-3 devices)
InfiniBand + SATA disks (LOCAL)
GbE + SATA disks (LOCAL)
Details at: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
![Page 31: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/31.jpg)
Open questions: is BASH good enough?
PROs
Simple and fast
Well known (basics at least)
Easy to hack
Most of the work requires running sys commands
CONs and alternatives
Custom implementation problems
Missing some systems
Too simple, missing: objects, inheritance, types, data structures, testing
Python? Perl?
Puppet? Ansible?
We’ll stick to bash for now...
What’s missing for incubating in Apache?
![Page 32: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/32.jpg)
More info:
ALOJA Benchmarking platform and online repository
http://aloja.bsc.es
Benchmarking Big Data by Nicolas Poggi http://www.slideshare.net/ni_po/benchmarking-hadoop
Big Data Benchmarking Community (BDBC) mailing list (~200 members from ~80 organizations) http://clds.sdsc.edu/bdbc/community
Workshop Big Data Benchmarking (WBDB) Next: http://clds.sdsc.edu/wbdb2015.ca
SPEC Research Big Data working group http://research.spec.org/working-groups/big-data-working-group.html
Slides and video: Michael Frank on Big Data benchmarking
http://www.tele-task.de/archive/podcast/20430/
Tilmann Rabl Big Data Benchmarking Tutorial http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl
![Page 33: sudoers: Benchmarking Hadoop with ALOJA](https://reader031.vdocuments.mx/reader031/viewer/2022030211/58a2515f1a28abe8738b5cf9/html5/thumbnails/33.jpg)
@BDOOP_BCN
More info: http://aloja.bsc.es
or join BDOOP group http://www.meetup.com/Barcelona-BigData-Perfomance-and-Operations