improving hadoop performance via linux

Improving Hadoop Cluster Performance via Linux Configuration
2014 Hadoop Summit – San Jose, California
Alex Moundalexis
alexm at clouderagovt.com
@technmsg


Tips from a Former SA


CC BY 2.0 / Richard Bumgardner

Been there, done that.


Tips from a Former SA Field Guy


CC BY 2.0 / Alex Moundalexis

Home sweet home.


Tips from a Former SA Field Guy
Easy steps to take… that most people don’t.

What This Talk Isn’t About

•  Deploying
   •  Puppet, Chef, Ansible, homegrown scripts, intern labor

•  Sizing & Tuning
   •  Depends heavily on data and workload

•  Coding
   •  Unless you count STDOUT redirection

•  Algorithms
   •  I suck at math, but we’ll try some multiplication later


“The answer to most Hadoop questions is: it depends.”

So What ARE We Talking About?

•  Seven simple things
   •  Quick
   •  Safe
   •  Viable for most environments and use cases

•  Identify the issue, then offer a solution

•  Note: commands are run as root or via sudo


Bad news, best not to…

1. Swapping

Swapping

•  A form of memory management
   •  When OS runs low on memory…
      •  write blocks to disk
      •  use now-free memory for other things
      •  read blocks back into memory from disk when needed

•  Also known as paging


Swapping

•  Problem: Disks are slow, especially to seek
   •  Hadoop is about maximizing IO
      •  spend less time acquiring data
      •  operate on data in place
      •  large streaming reads/writes from disk

•  Memory usage is limited within the JVM
   •  we should be able to manage our memory


Disable Swap in Kernel

•  Well, as much as possible.

•  Immediate:
   # echo 0 > /proc/sys/vm/swappiness

•  Persist after reboot:
   # echo "vm.swappiness = 0" >> /etc/sysctl.conf
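To confirm both the live value and the persisted one, here is a read-only sketch. The function name `check_swappiness` and its optional path arguments are hypothetical, added so it can be pointed at sample files; by default it reads `/proc/sys/vm/swappiness` and `/etc/sysctl.conf`.

```shell
#!/bin/sh
# Sketch: report the live vm.swappiness and the value persisted in a
# sysctl config file. check_swappiness and its optional path arguments
# are hypothetical.

check_swappiness() {
  proc_file="${1:-/proc/sys/vm/swappiness}"
  conf_file="${2:-/etc/sysctl.conf}"
  current=$(cat "$proc_file")
  # sysctl applies the file top to bottom, so the last matching line wins
  persisted=$(grep -E '^[[:space:]]*vm\.swappiness[[:space:]]*=' "$conf_file" 2>/dev/null |
    tail -n 1 | cut -d= -f2 | tr -d '[:space:]')
  echo "current=$current persisted=${persisted:-unset}"
}
```

Run with no arguments on a node; after both commands above, the output should read `current=0 persisted=0`.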


Swapping Peculiarities

•  Behavior varies based on Linux kernel
   •  CentOS 6.4+ / Ubuntu 10.10+
   •  For you kernel gurus, that’s Linux 2.6.32-303+

•  Prior
   •  We don’t swap, except to avoid OOM condition.

•  After
   •  We don’t swap, ever.

•  Details: http://tiny.cloudera.com/noswap


Disable this too.

2. File Access Time

File Access Time

•  Linux tracks access time
   •  writes to disk even if all you did was read

•  Problem
   •  more disk seeks
   •  HDFS is write-once, read-many
   •  NameNode tracks access information for HDFS


Don’t Track Access Time

•  Mount volumes with noatime option
   •  In /etc/fstab:
      /dev/sdc /data01 ext3 defaults,noatime 0 0

•  Note: noatime implies nodiratime as well

•  What about relatime?
   •  Faster than atime but slower than noatime

•  No reboot required
   # mount -o remount /data01
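To verify which data volumes actually picked up the option after a remount, something like the following reads the live mount table. It is a sketch: `check_noatime` and the `MOUNTS_FILE` override are hypothetical names; the only real interface it relies on is the field layout of `/proc/mounts` (device, mount point, type, options, dump, pass).

```shell
#!/bin/sh
# Sketch: report whether each mount point given as an argument is mounted
# with noatime. check_noatime and MOUNTS_FILE are hypothetical.

check_noatime() {
  mounts_file="${MOUNTS_FILE:-/proc/mounts}"
  for mp in "$@"; do
    # Field 2 is the mount point, field 4 the comma-separated options;
    # keep the last match in case something is mounted over it
    opts=$(awk -v m="$mp" '$2 == m { o = $4 } END { print o }' "$mounts_file")
    if [ -z "$opts" ]; then
      echo "$mp: not mounted"
      continue
    fi
    case ",$opts," in
      *,noatime,*) echo "$mp: noatime" ;;
      *) echo "$mp: missing noatime ($opts)" ;;
    esac
  done
}
```

For example, `check_noatime /data01 /data02` on a worker node prints one line per volume.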


Reclaim it, impress your bosses!

3. Root Reserved Space

Root Reserved Space

•  EXT3/4 reserve 5% of disk for root-owned files
   •  On an OS disk, sure
      •  System logs, kernel panics, etc



CC BY 2.0 / Alex Moundalexis

Disks used to be much smaller, right?

Do The Math

•  Conservative
   •  5% of 1 TB disk = 46 GB
   •  5 data disks per server = 230 GB
   •  5 servers per rack = 1.15 TB

•  Quasi-Aggressive
   •  5% of 4 TB disk = 186 GB
   •  12 data disks per server = 2.23 TB
   •  18 servers per rack = 40.1 TB

•  That’s a LOT of unused storage!
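The math above can be reproduced for your own hardware. A sketch, assuming vendor terabytes (10^12 bytes) per disk and reporting the per-disk reserve in binary gigabytes, which is how 5% of 1 TB becomes 46 GB; totals are then scaled to TB by 1000, as the slide does. `reserved_space` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: reproduce the root-reserved-space arithmetic.
# Arguments (hypothetical helper): disk size in TB (10^12 bytes),
# reserve percent, data disks per server, servers per rack.

reserved_space() {
  awk -v tb="$1" -v pct="$2" -v disks="$3" -v servers="$4" 'BEGIN {
    # Reserved bytes per disk in binary GB, truncated as on the slide
    per_disk = int(tb * 1e12 * pct / 100 / 2^30)
    per_server = per_disk * disks
    # Rack total scaled to TB by 1000
    printf "per_disk_gb=%d per_server_gb=%d per_rack_tb=%.2f\n",
      per_disk, per_server, per_server * servers / 1000
  }'
}
```

`reserved_space 1 5 5 5` prints `per_disk_gb=46 per_server_gb=230 per_rack_tb=1.15`, matching the conservative case. The quasi-aggressive case (`reserved_space 4 5 12 18`) comes out at 40.18 TB per rack; the slide’s 40.1 TB multiplies the already-rounded 2.23 TB.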

Root Reserved Space

•  On a Hadoop data disk, no root-owned files

•  When creating the filesystem:
   # mkfs.ext3 -m 0 /dev/sdc

•  On existing filesystems:
   # tune2fs -m 0 /dev/sdc

•  0 is safe, 1 is for the ultra-paranoid
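To see what a disk reserves before and after running `tune2fs -m 0`, `tune2fs -l` prints the block counts. The helper below is a sketch (the name `reserved_pct` is made up) that turns its `Block count` and `Reserved block count` lines into a percentage; pipe real output in, e.g. `tune2fs -l /dev/sdc | reserved_pct`.

```shell
#!/bin/sh
# Sketch: compute the root-reserved percentage from `tune2fs -l` output
# read on stdin. reserved_pct is a hypothetical name.

reserved_pct() {
  awk -F: '
    /^Block count:/          { total = $2 + 0 }     # leading spaces are ignored
    /^Reserved block count:/ { reserved = $2 + 0 }
    END {
      if (total > 0) printf "%.1f%%\n", reserved * 100 / total
      else print "unknown"
    }'
}
```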


Turn it on, already!

4. Name Service Cache Daemon

Name Service Cache Daemon

•  Daemon that caches name service requests
   •  Passwords
   •  Groups
   •  Hosts

•  Helps weather network hiccups
•  Helps more with high latency LDAP, NIS, NIS+
•  Small footprint
•  Zero configuration required



•  Hadoop nodes
   •  largely a network-based application
   •  on the network constantly
   •  issue lots of DNS lookups, especially HBase & distcp
   •  can thrash DNS servers

•  Reducing latency of service requests? Smart.
•  Reducing impact on shared infrastructure? Smart.



•  Turn it on, let it work, leave it alone:
   # chkconfig --level 345 nscd on
   # service nscd start

•  Check on it later:
   # nscd -g

•  If using Red Hat SSSD, modify the nscd config first!
   •  Don’t use nscd to cache passwd, group, or netgroup
   •  Red Hat, Using NSCD with SSSD. http://goo.gl/68HTMQ
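With SSSD in play, the Red Hat guidance above boils down to leaving only the `hosts` cache enabled in `/etc/nscd.conf` (for example `enable-cache passwd no`, `enable-cache group no`, `enable-cache netgroup no`). A sketch for checking that; the helper name `nscd_enabled_caches` and its optional path argument are hypothetical.

```shell
#!/bin/sh
# Sketch: list which nscd caches a config file enables.
# nscd_enabled_caches is a hypothetical name.

nscd_enabled_caches() {
  conf="${1:-/etc/nscd.conf}"
  # Lines look like: enable-cache  passwd  yes  (commented lines start with #)
  awk '$1 == "enable-cache" && $3 == "yes" { print $2 }' "$conf" |
    sort | tr '\n' ' ' | sed 's/ $//'
}
```

On an SSSD host, the goal is for this to print only `hosts`.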


Not a problem, until they are.

5. File Handle Limits

File Handle Limits

•  Kernel refers to files via a handle
   •  Also called descriptors

•  Linux is a multi-user system
   •  File handle limits protect the system from
      •  Poor coding
      •  Malicious users
      •  Pictures of cats on the Internet


Microsoft Office EULA. Really.

java.io.FileNotFoundException: (Too many open files)

File Handle Limits

•  Linux defaults are usually not enough
   •  Increase maximum open files (default 1024)

   # echo hdfs - nofile 32768 >> /etc/security/limits.conf
   # echo mapred - nofile 32768 >> /etc/security/limits.conf
   # echo hbase - nofile 32768 >> /etc/security/limits.conf

•  Bonus: Increase maximum processes too
   # echo hdfs - nproc 32768 >> /etc/security/limits.conf
   # echo mapred - nproc 32768 >> /etc/security/limits.conf
   # echo hbase - nproc 32768 >> /etc/security/limits.conf

•  Note: Cloudera Manager will do this for you.
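A quick audit of what `limits.conf` grants the service users. This is a sketch: `show_limits` and the `LIMITS_CONF` override are hypothetical, it only looks at lines whose first field is the user name (not `@group` or `*` entries), and when several lines match it reports the last one.

```shell
#!/bin/sh
# Sketch: show the nofile and nproc values limits.conf sets for the
# hdfs, mapred, and hbase users. show_limits and LIMITS_CONF are hypothetical.

show_limits() {
  conf="${LIMITS_CONF:-/etc/security/limits.conf}"
  for user in hdfs mapred hbase; do
    # Fields: <domain> <type> <item> <value>; type "-" sets soft and hard
    awk -v u="$user" '
      $1 == u && $3 == "nofile" { nofile = $4 }
      $1 == u && $3 == "nproc"  { nproc = $4 }
      END {
        if (nofile == "") nofile = "unset"
        if (nproc == "") nproc = "unset"
        printf "%s nofile=%s nproc=%s\n", u, nofile, nproc
      }' "$conf"
  done
}
```

limits.conf only applies to new sessions; for a daemon that is already running, `cat /proc/<pid>/limits` shows the `Max open files` and `Max processes` values actually in effect.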


Don’t be tempted to share, even on monster disks.

6. Dedicated Disk for OS and Logs

The Situation in Easy Steps

1.  Your new server has a dozen 1 TB disks
2.  Eleven disks are used to store data
3.  One disk is used for the OS
    •  20 GB for the OS
    •  980 GB sits unused
4.  Someone asks “can we store data there too?”
5.  Seems reasonable, lots of space… “OK, why not.”

Sound familiar?


I don’t understand it, there’s no consistency to these run times!

No Love for Shared Disk

•  Our quest for data gets interrupted a lot:
   •  OS operations
   •  OS logs
   •  Hadoop logging, quite chatty
   •  Hadoop execution
   •  userspace execution

•  Disk seeks are slow, remember?


Dedicated Disk for OS and Logs

•  At install time
   •  Disk 0, OS & logs
   •  Disk 1-n, Hadoop data

•  After install, a more complicated effort that requires manual HDFS block rebalancing:
   1.  Take down HDFS
       •  If you can do it in under 10 minutes, just the DataNode
   2.  Move or distribute blocks from disk0/dir to disk[1-n]/dir
   3.  Remove dir from HDFS config (dfs.data.dir)
   4.  Start HDFS
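Before and after the move, a sketch like this can flag data directories that still share a filesystem with `/`. Assumptions: the argument is the comma-separated `dfs.data.dir` value with plain paths (not `file://` URIs), and `check_data_dirs` is a hypothetical name. It compares the devices `df -P` reports, so two partitions on the same physical disk would not be caught.

```shell
#!/bin/sh
# Sketch: flag HDFS data directories on the same filesystem device as /.
# check_data_dirs and its output format are hypothetical.

check_data_dirs() {
  root_dev=$(df -P / | awk 'NR == 2 { print $1 }')
  # One directory per line from the comma-separated list
  echo "$1" | tr ',' '\n' | while read -r dir; do
    [ -z "$dir" ] && continue
    dev=$(df -P "$dir" 2>/dev/null | awk 'NR == 2 { print $1 }')
    if [ -z "$dev" ]; then
      echo "$dir MISSING"
    elif [ "$dev" = "$root_dev" ]; then
      echo "$dir SHARED"
    else
      echo "$dir OK"
    fi
  done
}
```

For example, `check_data_dirs /data01/dfs/dn,/data02/dfs/dn`; any `SHARED` line is a directory still living on the OS filesystem.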


Sane, both forward and reverse.

7. Name Resolution

Name Resolution Options

1.  Hosts file, if you must
2.  DNS, much preferred


Name Resolution with Hosts File

•  Set canonical names properly

•  Right
   10.1.1.1 r01m01.cluster.org r01m01 master1
   10.1.1.2 r01w01.cluster.org r01w01 worker1

•  Wrong
   10.1.1.1 r01m01 r01m01.cluster.org master1
   10.1.1.2 r01w01 r01w01.cluster.org worker1


Name Resolution with Hosts File

•  Set loopback address properly
   •  Ensure 127.0.0.1 resolves to localhost, NOT hostname

•  Right
   127.0.0.1 localhost

•  Wrong
   127.0.0.1 r01m01
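The two hosts-file rules above are easy to check mechanically. A sketch, with `lint_hosts` as a hypothetical name: it accepts any name starting with `localhost` on the 127.0.0.1 line, expects the first name on every other IPv4 line to contain a dot (an FQDN), and skips IPv6 lines.

```shell
#!/bin/sh
# Sketch: lint a hosts file for canonical-name order and loopback entries.
# lint_hosts and its messages are hypothetical.

lint_hosts() {
  awk '
    NF == 0 || $1 ~ /^#/ { next }                 # skip blank lines and comments
    $1 == "127.0.0.1" {
      for (i = 2; i <= NF && $i !~ /^#/; i++)
        if ($i !~ /^localhost/) print "line " NR ": 127.0.0.1 maps to " $i
      next
    }
    NF >= 2 && $1 ~ /^[0-9.]+$/ && $2 !~ /\./ {
      print "line " NR ": canonical name " $2 " is not an FQDN"
    }
  ' "${1:-/etc/hosts}"
}
```

No output means both rules pass; the “Wrong” examples above would each produce a line.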


Name Resolution with DNS

•  Forward
•  Reverse

•  Hostname should MATCH the FQDN in DNS
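A sketch of a forward-then-reverse check. It uses `getent hosts`, which goes through the system’s name service switch (hosts file, DNS, nscd) rather than querying DNS directly; `check_dns` is a hypothetical name, and with no argument it checks the output of `hostname -f`.

```shell
#!/bin/sh
# Sketch: confirm a name resolves forward to an address and back to the
# same name. check_dns and its messages are hypothetical.

check_dns() {
  name="${1:-$(hostname -f)}"
  # Forward: first address returned for the name
  addr=$(getent hosts "$name" 2>/dev/null | awk 'NR == 1 { print $1 }')
  if [ -z "$addr" ]; then
    echo "$name: forward lookup failed"
    return 1
  fi
  # Reverse: canonical name returned for that address
  rname=$(getent hosts "$addr" 2>/dev/null | awk 'NR == 1 { print $2 }')
  if [ "$rname" = "$name" ]; then
    echo "$name -> $addr -> $rname: OK"
  else
    echo "$name -> $addr -> ${rname:-(none)}: MISMATCH"
    return 1
  fi
}
```

Run it on every node; a `MISMATCH` line means the reverse record (or hosts entry) does not return the same FQDN.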


This Is What You Ought to See


Name Resolution Errata

•  Mismatches? Expect odd results.
   •  Problems starting DataNodes
   •  Non-FQDN in Web UI links
   •  Security features are extra sensitive to FQDN

•  Errors are so common that a link to the FAQ is included in the logs!
   •  http://wiki.apache.org/hadoop/UnknownHost

•  Get name resolution working BEFORE enabling nscd!


Time to take out your camera phones…

Summary


1.  set vm.swappiness to 0
2.  data disks: mount with noatime option
3.  data disks: disable root reserved space
4.  enable nscd
5.  increase file handle limits
6.  use dedicated OS/logging disk
7.  sane name resolution

http://tiny.cloudera.com/7steps


Recommended Reading

•  Hadoop Operations http://amzn.to/1hDaN9B


Preferably related to the talk…

Questions?


Thank You! Alex Moundalexis alexm at clouderagovt.com @technmsg We’re hiring, kids! Well, not kids.


Because we had enough time…

8. Bonus Round

Other Things to Check

•  Disk IO
   •  hdparm
      •  # hdparm -Tt /dev/sdc
      •  Looking for at least 70 MB/s from 7200 RPM disks
      •  Slower could indicate a failing drive, disk controller, array, etc.
   •  dd
      •  http://romanrm.ru/en/dd-benchmark
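Checking a whole shelf of disks by eye gets old. Below is a sketch that reads `hdparm -Tt` output on stdin and compares the buffered-read rate against the 70 MB/s rule of thumb above. The function name `disk_read_check` is hypothetical, and it assumes the `Timing buffered disk reads: ... = N MB/sec` line format; output in other units would need adjusting.

```shell
#!/bin/sh
# Sketch: extract the buffered-read rate from hdparm -Tt output and flag
# disks below 70 MB/s. disk_read_check is a hypothetical name.

disk_read_check() {
  awk -v floor=70 '
    /Timing buffered disk reads/ {
      rate = $(NF - 1) + 0                  # number just before "MB/sec"
      verdict = (rate >= floor) ? "OK" : "SLOW"
      printf "%.2f MB/sec %s\n", rate, verdict
    }'
}
```

For example, `for d in /dev/sd[b-m]; do echo "$d"; hdparm -Tt "$d" | disk_read_check; done`.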



•  Disable Red Hat Transparent Huge Pages (RH6+ Only)
   •  Can reduce elevated CPU usage
   •  In rc.local:

   echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
   echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled

•  Reference: Linux 6 Transparent Huge Pages and Hadoop Workloads, http://goo.gl/WSF2qC
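To confirm the setting survived a reboot, note that the kernel shows the active mode in brackets (for example `always madvise [never]`). A sketch that prints it; the helper name `thp_status` and its optional directory argument are hypothetical. It looks in the RHEL 6 `redhat_transparent_hugepage` directory from the slide and also in `transparent_hugepage`, the path mainline kernels use.

```shell
#!/bin/sh
# Sketch: print the active Transparent Huge Pages mode for enabled/defrag.
# thp_status and its optional directory argument are hypothetical.

thp_status() {
  # Default: RHEL 6 path from the slide, then the mainline kernel path
  for base in ${1:-/sys/kernel/mm/redhat_transparent_hugepage /sys/kernel/mm/transparent_hugepage}; do
    [ -d "$base" ] || continue
    for f in enabled defrag; do
      [ -r "$base/$f" ] || continue
      # Keep only the bracketed (active) choice
      mode=$(sed -n 's/.*\[\([^]]*\)].*/\1/p' "$base/$f")
      echo "$base/$f: ${mode:-unknown}"
    done
  done
}
```

After the rc.local lines take effect, both files should report `never`.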





•  Enable Jumbo Frames
   •  Only if your network infrastructure supports it!
   •  Can easily (and arguably) boost throughput by 10-20%

•  Monitor Everything
   •  How else will you know what’s happening?
   •  Nagios, Ganglia, CM, Ambari
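Before and after enabling jumbo frames, it helps to see what each interface is actually set to. A sketch, assuming 9000 bytes is the target jumbo MTU; `list_mtus` and its optional directory argument are hypothetical, and the value is read from the standard `/sys/class/net/<interface>/mtu` files.

```shell
#!/bin/sh
# Sketch: list each network interface's MTU and whether it looks like a
# jumbo frame setting. list_mtus is a hypothetical name.

list_mtus() {
  netdir="${1:-/sys/class/net}"
  for iface in "$netdir"/*; do
    [ -r "$iface/mtu" ] || continue
    mtu=$(cat "$iface/mtu")
    if [ "$mtu" -ge 9000 ]; then kind=jumbo; else kind=standard; fi
    echo "$(basename "$iface") $mtu $kind"
  done
}
```

To confirm the whole path carries jumbo frames, `ping -M do -s 8972 <peer>` sends a 9000-byte packet (8972 bytes of payload plus 28 bytes of IP and ICMP headers) with fragmentation prohibited; it fails if any hop has a smaller MTU.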
