OC Big Data Monthly Meetup #5 - Session 1 - Altiscale
DESCRIPTION
Debugging Hive with Hadoop in the Cloud --- Soam Acharya, David Chaiken, Denis Sheahan, Charles Wimmer @Altiscale

TRANSCRIPT
DEBUGGING HIVE WITH HADOOP IN THE CLOUD Soam Acharya, David Chaiken, Denis Sheahan, Charles Wimmer Altiscale, Inc. #OCBigData @ 20140917T1845-0700
WHO ARE WE?
• Altiscale: Infrastructure Nerds!
• Hadoop As A Service
• Rack and build our own Hadoop clusters
• Provide a suite of Hadoop tools
  o Hive, Pig, Oozie
  o Others as needed: R, Python, Spark, Mahout, Impala, etc.
• Monthly billing plan: compute (YARN), storage (HDFS)
• https://www.altiscale.com
• @Altiscale #HadoopSherpa
TALK ROADMAP
• Our Platform and Perspective
• Hadoop 2 Primer
• Hadoop Debugging Tools
• Accessing Logs in Hadoop 2
• Hive + Hadoop Architecture
• Hive Logs
• Hive Issues + Case Studies
  o Hive + Interactive (DRAM Centric) Processing Engines
• Conclusion: Making Hive Easier to Use
OUR DYNAMIC PLATFORM
• Hadoop 2.0.5 => Hadoop 2.2.0 => Hadoop 2.4.1
• Hive 0.10 => Hive 0.12 => Stinger (Hive 0.13 + Tez)
• Hive, Pig, and Oozie most commonly used tools
• Working with customers on: Spark, Impala, 0xdata, Flume, Camus/Kafka, …
ALTISCALE PERSPECTIVE
• What we do as a service provider…
  o Performance + Reliability: Jobs finish faster, fewer failures
  o Instant Access: Always-on access to HDFS and YARN
  o Hadoop Helpdesk: Tools + experts ensure customer success
  o Secure: Networking, SOC 2 Audit, Kerberos
  o Results: Faster Time-to-Value (TTV), Lower TCO
• Operational approach in this presentation…
  o How to use Hadoop 2 cluster tools and logs to debug and to tune Hive
  o This talk will not focus on query optimization
QUICK PRIMER – HADOOP 2

Diagram: a Hadoop 2 Cluster. A Name Node, Secondary NameNode, and Resource Manager manage a set of Hadoop Slaves, each running a Node Manager and a Data Node.
QUICK PRIMER – HADOOP 2 YARN
• Resource Manager (per cluster)
  o Manages job scheduling and execution
  o Global resource allocation
• Application Master (per job)
  o Manages task scheduling and execution
  o Local resource allocation
• Node Manager (per-machine agent)
  o Manages the lifecycle of task containers
  o Reports to RM on health and resource usage
HADOOP 1 VS HADOOP 2
• No more JobTrackers, TaskTrackers
• YARN ~ Operating System for Clusters
  o MapReduce is implemented as a YARN application
  o Bring on the applications! (Spark is just the start…)
• Should be transparent to Hive users
HADOOP 2 DEBUGGING TOOLS
• Monitoring
  o System state of cluster: CPU, Memory, Network, Disk
    - Nagios, Ganglia, Sensu
    - Collectd, statsd, Graphite
  o Hadoop level
    - HDFS usage
    - Resource usage: container memory allocated vs used, # of jobs running at the same time, long running tasks
HADOOP 2 DEBUGGING TOOLS
• Hadoop logs
  o Daemon logs: Resource Manager, NameNode, DataNode
  o Application logs: Application Master, MapReduce tasks
  o Job history file: resources allocated during job lifetime
  o Application configuration files: store all Hadoop application parameters
• Source code instrumentation
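Once YARN log aggregation is enabled, the application logs above can also be pulled from a shell instead of the web UI. A minimal sketch; the application ID is a placeholder, and the daemon log path varies by distribution:

```
# Aggregated Application Master + task container logs for one job:
yarn logs -applicationId <application_id>

# Daemon logs live on the daemon's host (path varies by install):
less /var/log/hadoop-yarn/yarn-yarn-resourcemanager-<hostname>.log
```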
ACCESSING LOGS IN HADOOP 2
• To view the logs for a job, click on the link under the ID column in Resource Manager UI.
ACCESSING LOGS IN HADOOP 2
• To view application top level logs, click on logs.
• To view individual logs for the mappers and reducers, click on History.
ACCESSING LOGS IN HADOOP 2
• Log output for the entire application.
ACCESSING LOGS IN HADOOP 2
• Click on the Map link for mapper logs and the Reduce link for reducer logs.
ACCESSING LOGS IN HADOOP 2
• Clicking on a single link under Name provides an overview for that particular map job.
ACCESSING LOGS IN HADOOP 2
• Finally, clicking on the logs link will take you to the log output for that map job.
ACCESSING LOGS IN HADOOP 2
• Fun, fun, donuts, and more fun…
HIVE + HADOOP 2 ARCHITECTURE
• Hive 0.10+
Diagram: Hive CLI, Hiveserver JDBC/ODBC, and other clients (AlaCon, KeFle, …) connect to the Hive Metastore and a Hadoop 2 Cluster.
HIVE LOGS
• Query Log location
• From /etc/hive/hive-site.xml:

    <property>
      <name>hive.querylog.location</name>
      <value>/home/hive/log/${user.name}</value>
    </property>

• Sample query log entry:

    SessionStart SESSION_ID="soam_201402032341" TIME="1391470900594"
HIVE CLIENT LOGS
• /etc/hive/hive-log4j.properties:
  o hive.log.dir=/var/log/hive/${user.name}

    2014-05-29 19:51:09,830 INFO parse.ParseDriver (ParseDriver.java:parse(179)) - Parsing command: select count(*) from dogfood_job_data
    2014-05-29 19:51:09,852 INFO parse.ParseDriver (ParseDriver.java:parse(197)) - Parse Completed
    2014-05-29 19:51:09,852 INFO ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=parse start=1401393069830 end=1401393069852 duration=22>
    2014-05-29 19:51:09,853 INFO ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=semanticAnalyze>
    2014-05-29 19:51:09,890 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:analyzeInternal(8305)) - Starting Semantic Analysis
    2014-05-29 19:51:09,892 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:analyzeInternal(8340)) - Completed phase 1 of Semantic Analysis
    2014-05-29 19:51:09,892 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:getMetaData(1060)) - Get metadata for source tables
    2014-05-29 19:51:09,906 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:getMetaData(1167)) - Get metadata for subqueries
    2014-05-29 19:51:09,909 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:getMetaData(1187)) - Get metadata for destination tables
HIVE METASTORE LOGS
• /etc/hive-metastore/hive-log4j.properties:
  o hive.log.dir=/service/log/hive-metastore/${user.name}

    2014-05-29 19:50:50,179 INFO metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 200: source:/10.252.18.94 get_table : db=default tbl=dogfood_job_data
    2014-05-29 19:50:50,180 INFO HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=chaiken ip=/10.252.18.94 cmd=source:/10.252.18.94 get_table : db=default tbl=dogfood_job_data
    2014-05-29 19:50:50,236 INFO metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 200: source:/10.252.18.94 get_table : db=default tbl=dogfood_job_data
    2014-05-29 19:50:50,236 INFO HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=chaiken ip=/10.252.18.94 cmd=source:/10.252.18.94 get_table : db=default tbl=dogfood_job_data
    2014-05-29 19:50:50,261 INFO metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 200: source:/10.252.18.94 get_table : db=default tbl=dogfood_job_data
HIVE ISSUES + CASE STUDIES
• Hive Issues
  o Hive client out of memory
  o Hive map/reduce task out of memory
  o Hive metastore out of memory
  o Hive launches too many tasks
• Case Studies:
  o Hive “stuck” job
  o Hive “missing directories”
  o Analyze Hive Query Execution
  o Hive + Interactive (DRAM Centric) Processing Engines
HIVE CLIENT OUT OF MEMORY
• Memory intensive client side hive query (map-side join)

    Number of reduce tasks not specified. Estimated from input data size: 999
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapred.reduce.tasks=<number>
    java.lang.OutOfMemoryError: Java heap space
        at java.nio.CharBuffer.wrap(CharBuffer.java:350)
        at java.nio.CharBuffer.wrap(CharBuffer.java:373)
        at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:138)
HIVE CLIENT OUT OF MEMORY
• Use HADOOP_HEAPSIZE prior to launching Hive client:

    HADOOP_HEAPSIZE=<new heapsize> hive <fileName>

• Watch out for HADOOP_CLIENT_OPTS issue in hive-env.sh!
• Important to know the amount of memory available on the machine running the client… Do not exceed it or use a disproportionate amount:

    $ free -m
                 total       used       free     shared    buffers     cached
    Mem:          1695       1388        306          0         60        424
    -/+ buffers/cache:        903        791
    Swap:          895        101        794
HIVE TASK OUT OF MEMORY
• Query spawns MapReduce jobs that run out of memory
• How to find this issue?
  o Hive diagnostic message
  o Hadoop MapReduce logs
HIVE TASK OUT OF MEMORY
• Fix is to increase task RAM allocation…

    set mapreduce.map.memory.mb=<new RAM allocation>;
    set mapreduce.reduce.memory.mb=<new RAM allocation>;

• Also watch out for…

    set mapreduce.map.java.opts=-Xmx<heap size>m;
    set mapreduce.reduce.java.opts=-Xmx<heap size>m;

• Not a magic bullet – requires manual tuning
• Increase in individual container memory size:
  o Decrease in overall containers that can be run
  o Decrease in overall parallelism
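As a sketch, the four settings above usually travel together. The 4 GB container and ~3.2 GB heap below are illustrative values only, with the JVM heap kept to roughly 80% of the container so YARN does not kill the container for exceeding its allocation:

```
set mapreduce.map.memory.mb=4096;       -- illustrative container size
set mapreduce.map.java.opts=-Xmx3277m;  -- heap ~80% of the container
set mapreduce.reduce.memory.mb=4096;
set mapreduce.reduce.java.opts=-Xmx3277m;
```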
HIVE METASTORE OUT OF MEMORY
• Out of memory issues not necessarily dumped to logs
• Metastore can become unresponsive
• Can’t submit queries
• Restart with a higher heap size: export HADOOP_HEAPSIZE in hcat_server.sh
• After notifying hive users about downtime:

    service hcat restart
HIVE LAUNCHES TOO MANY TASKS
• Typically a function of the input data set • Lots of little files
HIVE LAUNCHES TOO MANY TASKS
• Set mapred.max.split.size to an appropriate fraction of data size
• Also verify that:

    hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
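For example, a hedged sketch aiming at roughly 1 GB per mapper; the split size is illustrative, and with CombineHiveInputFormat, small files are packed together up to that size:

```
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.max.split.size=1073741824;  -- ~1 GB per split, illustrative
```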
CASE STUDY: HIVE STUCK JOB
From an Altiscale customer:
“This job [jobid] has been running now for 41 hours. Is it still progressing or has something hung up the map/reduce so it’s just spinning? Do you have any insight?”
![Page 32: OC Big Data Monthly Meetup #5 - Session 1 - Altiscale](https://reader033.vdocuments.mx/reader033/viewer/2022052622/55934b011a28ab0b568b47d6/html5/thumbnails/32.jpg)
HIVE STUCK JOB
1. Received jobId, application_1382973574141_4536, from client.
2. Logged into client cluster.
3. Pulled up Resource Manager.
4. Entered part of jobId (4536) in the search box.
5. Clicked on the link that says application_1382973574141_4536.
6. On resulting Application Overview page, clicked on link next to “Tracking URL” that said Application Master.
HIVE STUCK JOB
7. On resulting MapReduce Application page, we clicked on the Job Id (job_1382973574141_4536).
8. The resulting MapReduce Job page displayed detailed status of the mappers, including 4 failed mappers.
9. We then clicked on the 4 link on the Maps row in the Failed column.
10. Title of the next page was “FAILED Map attempts in job_1382973574141_4536.”
11. Each failed mapper generated an error message.
12. Buried in the 16th line:

    Caused by: java.io.FileNotFoundException: File does not exist: hdfs://opaque_hostname:8020/HiveTableDir/FileName.log.date.seq
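The click-path above can be short-circuited from a shell once you have the application ID. A sketch of the last step; the log text below is a canned stand-in for what `yarn logs -applicationId application_1382973574141_4536` would print:

```shell
# Canned stand-in for aggregated task logs; in a live session this text
# would come from: yarn logs -applicationId application_1382973574141_4536
logs='2013-11-01 12:00:01 INFO mapreduce.Job:  map 67% reduce 0%
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://opaque_hostname:8020/HiveTableDir/FileName.log.date.seq
2013-11-01 12:00:02 INFO mapreduce.Job: Task failed task_1382973574141_4536_m_000012'

# Surface exception root causes instead of paging 16 lines deep per attempt
printf '%s\n' "$logs" | grep 'Caused by:'
```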
HIVE STUCK JOB
• Job was stuck for a day or so, retrying a mapper that would never finish successfully.
• During the job, our customer’s colleague realized the input file was corrupted and deleted it.
• The colleague did not anticipate the effect of removing corrupted data on a running job.
• Hadoop didn’t make it easy to find out:
  o RM => search => application link => AM overview page => MR Application Page => MR Job Page => Failed jobs page => parse long logs
  o Task retry without hope of success
HIVE “MISSING DIRECTORIES”
From an Altiscale customer:
“One problem we are seeing after the [Hive Metastore] restart is that we lost quite a few directories in [HDFS]. Is there a way to recover these?”
![Page 36: OC Big Data Monthly Meetup #5 - Session 1 - Altiscale](https://reader033.vdocuments.mx/reader033/viewer/2022052622/55934b011a28ab0b568b47d6/html5/thumbnails/36.jpg)
HIVE “MISSING DIRECTORIES”
• Obtained list of “missing” directories from customer:
  o /hive/biz/prod/*
• Confirmed they were missing from HDFS
• Searched through NameNode audit log to get block IDs that belonged to missing directories:

    13/07/24 21:10:08 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hive/biz/prod/incremental/carryoverstore/postdepuis/lmt_unmapped_pggroup_schema._COPYING_. BP-798113632-10.251.255.251-1370812162472 blk_3560522076897293424_2448396{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.251.255.177:50010|RBW], ReplicaUnderConstruction[10.251.255.174:50010|RBW], ReplicaUnderConstruction[10.251.255.169:50010|RBW]]}
HIVE “MISSING DIRECTORIES”
• Used blockID to locate exact time of file deletion from NameNode logs:

    13/07/31 08:10:33 INFO hdfs.StateChange: BLOCK* addToInvalidates: blk_3560522076897293424_2448396 to 10.251.255.177:50010 10.251.255.169:50010 10.251.255.174:50010

• Used time of deletion to inspect hive logs
HIVE “MISSING DIRECTORIES”

    QueryStart QUERY_STRING="create database biz_weekly location '/hive/biz/prod'" QUERY_ID="usrprod_20130731043232_0a40fd32-8c8a-479c-ba7d-3bd8a2698f4b" TIME="1375245164667"
    :
    QueryEnd QUERY_STRING="create database biz_weekly location '/hive/biz/prod'" QUERY_ID="usrprod_20130731043232_0a40fd32-8c8a-479c-ba7d-3bd8a2698f4b" QUERY_RET_CODE="0" QUERY_NUM_TASKS="0" TIME="1375245166203"
    :
    QueryStart QUERY_STRING="drop database biz_weekly" QUERY_ID="usrprod_20130731073333_e9acf35c-4f07-4f12-bd9d-bae137ae0733" TIME="1375256014799"
    :
    QueryEnd QUERY_STRING="drop database biz_weekly" QUERY_ID="usrprod_20130731073333_e9acf35c-4f07-4f12-bd9d-bae137ae0733" QUERY_NUM_TASKS="0" TIME="1375256014838"
HIVE “MISSING DIRECTORIES”
• In effect, user “usrprod” issued:
  o At 2013-07-31 04:32:44: create database biz_weekly location '/hive/biz/prod'
  o At 2013-07-31 07:33:24: drop database biz_weekly
• This is functionally equivalent to:

    hdfs dfs -rm -r /hive/biz/prod
HIVE “MISSING DIRECTORIES”
• Customer manually placed their own data in /hive – the warehouse directory managed and controlled by Hive
• Customer used CREATE and DROP db commands in their code
  o Hive deletes database and table locations in /hive with impunity
• Why didn’t deleted data end up in .Trash?
  o Trash collection not turned on in configuration settings
  o It is now, but need a –skipTrash option (HIVE-6469)
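HDFS trash is governed by `fs.trash.interval` in core-site.xml (retention in minutes; 0 disables trash entirely). A minimal fragment; the 1440 below is an illustrative one-day retention, not the setting from this case:

```
<property>
  <name>fs.trash.interval</name>
  <!-- minutes to keep deleted files in .Trash; 1440 = 24h (illustrative) -->
  <value>1440</value>
</property>
```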
HIVE “MISSING DIRECTORIES”
• Hadoop forensics: piece together disparate sources…
  o Hadoop daemon logs (NameNode)
  o Hive query and metastore logs
  o Hadoop config files
• Need better tools to correlate the different layers of the system: hive client, hive metastore, MapReduce job, YARN, HDFS, operating system metrics, …

By the way… Operating any distributed system would be totally insane without NTP and a standard time zone (UTC).
CASE STUDY – ANALYZE QUERY
• Customer provided Hive query + data sets (100GBs to ~5 TBs)
• Needed help optimizing the query
• Didn’t rewrite query immediately
• Wanted to characterize query performance and isolate bottlenecks first
ANALYZE AND TUNE EXECUTION
• Ran original query on the datasets in our environment:
  o Two M/R Stages: Stage-1, Stage-2
• Long running reducers run out of memory
  o set mapreduce.reduce.memory.mb=5120
  o Reduces slots and extends reduce time
• Query fails to launch Stage-2 with out of memory
  o set HADOOP_HEAPSIZE=1024 on client machine
• Query has 250,000 Mappers in Stage-2, which causes failure
  o set mapred.max.split.size=5368709120 to reduce Mappers
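Collected in one place, the query-side fixes above would be applied as follows (values are the ones quoted above; HADOOP_HEAPSIZE=1024 is exported in the client shell before launching Hive):

```
set mapreduce.reduce.memory.mb=5120;   -- Stage-1 reducer out of memory
set mapred.max.split.size=5368709120;  -- cut Stage-2 down from 250,000 mappers
```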
ANALYSIS: HOW TO VISUALIZE?
• Next challenge: how to visualize job execution?
• Existing hadoop/hive logs not sufficient for this task
• Wrote internal tools to:
  o parse job history files
  o plot mapper and reducer execution
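The internal tools are not shown in the talk, but the parse-then-plot idea can be sketched in a few lines of shell. The records below are a simplified stand-in (real Hadoop 2 .jhist files are JSON event streams, not this three-column format):

```shell
# Simplified stand-in records: task id, start millis, finish millis
hist='task_m_000000 1375245164000 1375245170000
task_m_000001 1375245164500 1375245196500
task_r_000000 1375245200000 1375245380000'

# Per-task duration in seconds, longest first: the long tail is what the
# execution plots on the following slides make visible at a glance
printf '%s\n' "$hist" \
  | awk '{printf "%s %d\n", $1, ($3 - $2) / 1000}' \
  | sort -k2 -rn
```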
ANALYSIS: MAP STAGE-1
ANALYSIS: REDUCE STAGE-1
• Chart annotation: single reduce task
ANALYSIS: MAP STAGE-2
ANALYSIS: REDUCE STAGE-2
ANALYZE EXECUTION: FINDINGS
• Lone, long running reducer in first stage of query
• Analyzed input data:
  o Query split input data by userId
  o Bucketizing input data by userId
  o One very large bucket: “invalid” userId
  o Discussed “invalid” userId with customer
• An error value is a common pattern!
  o Need to differentiate between “don’t know and don’t care” and “don’t know and do care”
INTERACTIVE (DRAM CENTRIC) PROCESSING SYSTEMS
• Loading data into DRAM makes processing fast!
• Examples: Spark, Impala, 0xdata, …, [SAP HANA], …
• Streaming systems (Storm, DataTorrent) may be similar
• Need to increase YARN container memory size
HIVE + INTERACTIVE: WATCH OUT FOR CONTAINER SIZE
• Caution: larger YARN container settings for interactive jobs may not be right for batch systems like Hive
• Container size needs to combine vcores and memory:

    yarn.scheduler.maximum-allocation-vcores
    yarn.nodemanager.resource.cpu-vcores
    ...
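A sketch of the knobs involved, as they might appear in yarn-site.xml; the sizes below are illustrative for a mid-sized slave, not recommended values:

```
<!-- Per-node capacity advertised to the Resource Manager -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>40960</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>16</value>
</property>
<!-- Largest single container the scheduler will grant -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>4</value>
</property>
```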
HIVE + INTERACTIVE: WATCH OUT FOR FRAGMENTATION
• Attempting to schedule interactive systems and batch systems like Hive may result in fragmentation
• Interactive systems may require all-or-nothing scheduling • Batch jobs with little tasks may starve interactive jobs
![Page 53: OC Big Data Monthly Meetup #5 - Session 1 - Altiscale](https://reader033.vdocuments.mx/reader033/viewer/2022052622/55934b011a28ab0b568b47d6/html5/thumbnails/53.jpg)
HIVE + INTERACTIVE: WATCH OUT FOR FRAGMENTATION
Solutions for fragmentation…
• Reserve interactive nodes before starting batch jobs
• Reduce interactive container size (if the algorithm permits)
• Node labels (YARN-726) and gang scheduling (YARN-624)
CONCLUSIONS
• Hive + Hadoop debugging can get very complex
  o Sifting through many logs and screens
  o Automatic transmission versus manual transmission
• Static partitioning induced by the Java Virtual Machine has benefits but also induces challenges
• Where there are difficulties, there’s opportunity:
  o Better tooling, instrumentation, integration of logs/metrics
• YARN still evolving into an operating system
• Hadoop as a Service: aggregate and share expertise
• Need to learn from the traditional database community!
QUESTIONS? COMMENTS?