hadoopfile.pptx

HDFS (Hadoop Distributed File System)

Rob Jordan & Chris Livdahl

Outline
- HDFS Goals & Assumptions
- Master/Slave Architecture
- Communication
- Data Replication
- Robustness / Fault Tolerance
- Research and Extensions

HDFS Goals & Assumptions

HDFS Basics
- An open-source implementation modeled on the Google File System
- Assume that node failure rate is high
- Large files, some several GB large
- Write-once-read-many pattern
- Reads are performed in a large streaming fashion
- Large throughput instead of low latency
- Moving computation is easier than moving data

HDFS Files
- User data is divided into 64MB blocks and replicated across the local disks of cluster nodes to address:
  - Cluster network bottleneck
  - Cluster node crashes
- Master/Slave Architecture
  - Master (Namenode) maintains a name space and metadata
  - Slaves (Datanodes) store the data blocks; three copies of each block are kept

Master / Slave Architecture

Namenode
- Arbitrator and repository for all HDFS metadata
- Data never flows through Namenode
- Executes file system namespace operations
  - open, close, rename files and directories
- Determines mapping of blocks to Datanodes

EditLog & FsImage
- Managed by Namenode
- Stored in files on the local OS file system
- EditLog
  - Transaction log
  - Records all changes to file system metadata
- FsImage
  - Image of entire file system namespace
  - Mappings of blocks to files
  - File system properties
  - Stored in a file on the local OS file system
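
The relationship between the EditLog and the FsImage can be pictured as replaying a transaction log onto a namespace image. The sketch below is a toy model only: the `FsImage` class, the tuple-shaped log records, and the operation names (`create`, `add_block`, `rename`, `delete`) are assumptions made for illustration and do not reflect the on-disk formats HDFS actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class FsImage:
    """Toy namespace image: maps a file path to its list of block ids."""
    files: dict = field(default_factory=dict)


def apply_transaction(image, txn):
    """Apply one hypothetical EditLog record (a tuple) to the in-memory image."""
    op = txn[0]
    if op == "create":
        _, path = txn
        image.files[path] = []
    elif op == "add_block":
        _, path, block_id = txn
        image.files[path].append(block_id)
    elif op == "rename":
        _, old_path, new_path = txn
        image.files[new_path] = image.files.pop(old_path)
    elif op == "delete":
        _, path = txn
        image.files.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def replay(image, edit_log):
    """Apply every logged transaction, in order, and return the updated image."""
    for txn in edit_log:
        apply_transaction(image, txn)
    return image


# Illustrative log; the path is borrowed from the deck's diagram, block ids are made up.
edit_log = [
    ("create", "/user/css534/input"),
    ("add_block", "/user/css534/input", "blk_1"),
    ("add_block", "/user/css534/input", "blk_2"),
    ("rename", "/user/css534/input", "/user/css534/input.old"),
]
image = replay(FsImage(), edit_log)
print(image.files)
```

Once the log has been replayed, the image reflects every recorded change, so the image could be written out and the log emptied without losing metadata.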

Datanodes
- Serve read / write requests from clients
- Block creation, deletion and replication upon instruction from Namenode
- No knowledge of HDFS files
- Store HDFS data in files on the local file system
  - Determine optimal file count per directory
  - Create subdirectories automatically

Communication

Communication Protocols
- Layered on top of TCP/IP
- RPC abstraction wraps protocols
- ClientProtocol
  - Client talks to Namenode
- Datanode Protocol
  - Datanodes talk to Namenode
- Namenode never initiates any RPCs
  - It only responds to RPC requests
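
Because the Namenode only responds to RPC requests, one plausible way for its instructions to reach a Datanode is as the reply to a request the Datanode itself sends, such as a Heartbeat. The sketch below models that idea in memory, without any networking. The class, the method names, the instruction tuples, and the 30-second timeout are all assumptions for illustration, not the actual Datanode Protocol.

```python
import time


class Namenode:
    """Toy Namenode: it never calls a Datanode, it only answers requests."""

    def __init__(self, dead_after_seconds=30.0):
        self.dead_after_seconds = dead_after_seconds  # assumed timeout value
        self.last_heartbeat = {}  # datanode id -> time of the last heartbeat
        self.pending = {}         # datanode id -> instructions waiting for it

    def queue_instruction(self, datanode_id, instruction):
        """Remember an instruction until the Datanode next contacts us."""
        self.pending.setdefault(datanode_id, []).append(instruction)

    def heartbeat(self, datanode_id, now=None):
        """RPC handler: record the heartbeat, reply with queued instructions."""
        now = time.monotonic() if now is None else now
        self.last_heartbeat[datanode_id] = now
        return self.pending.pop(datanode_id, [])

    def dead_datanodes(self, now=None):
        """Datanodes whose last heartbeat is older than the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(
            node for node, seen in self.last_heartbeat.items()
            if now - seen > self.dead_after_seconds
        )


# Times are passed explicitly so the example is deterministic.
nn = Namenode(dead_after_seconds=30.0)
nn.heartbeat("dn1", now=0.0)
nn.heartbeat("dn2", now=0.0)
nn.queue_instruction("dn1", ("replicate", "blk_7", "dn2"))
reply = nn.heartbeat("dn1", now=10.0)  # the instruction rides back on the reply
print(reply)
print(nn.dead_datanodes(now=45.0))     # dn2 has been silent for 45 seconds
```

The same bookkeeping supports failure detection: a Datanode that stops sending Heartbeats simply ages out of the table.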

HDFS Client Block Diagram

HDFS: Hadoop Distributed File Systems
- Client requests metadata about a file from the Namenode
- Data is served directly from the Datanode
(Slide from Lecture 6: Task Parallelism and MapReduce, CSS534)
[Diagram: Application with HDFS Client, HDFS namenode (file namespace /user/css534/input, block 3df2), and two HDFS datanodes, each on a Linux local file system. Arrow labels: (file name, block id); (block id, block location); (block id, byte range); block data; instructions; state.]

File Read/Write in HDFS
[Diagram: client node running a client JVM with HDFS client, DistributedFileSystem, and FSDataInputStream (read) or FSDataOutputStream (write); one NameNode; several DataNodes.]

File Read
1. open
2. get block locations
3. read
4. read from the closest node
5. read from the 2nd closest node
6. close

File Write
1. create
2. create
3. write
4. get a list of 3 data nodes
5. write packet
6. ack packet
7. close
8. complete

If a data node crashes, the crashed node is removed, the current block receives a newer id so that the partial data on the crashed node can be deleted later, and the Namenode allocates another node.

Data Replication

Replica Placement
- Distinguishes HDFS from most other DFS
- When replication factor == 3
  - Put one replica on a node in the local rack
  - Put one replica on a different node in the local rack
  - Put one replica on a different node in a different rack
- Replicas do not evenly distribute across racks

Start-Up Process
- Namenode enters Safemode
  - Replication does not occur in Safemode
- Each Datanode sends Heartbeat
- Each Datanode sends Blockreport
  - Lists all HDFS data blocks
- Namenode creates Blockmap from Blockreports
- Namenode exits Safemode
- Replicate any under-replicated blocks

Datanode Blockreports
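
The start-up steps can be sketched as a small computation: merge the Blockreports into a Blockmap, then look for blocks that have fewer replicas than the replication factor. This is a minimal sketch under assumed data shapes (a dict from Datanode id to the block ids it reports); the function names and ids are hypothetical.

```python
from collections import defaultdict


def build_blockmap(blockreports):
    """Merge Blockreports into a Blockmap.

    blockreports: dict mapping a datanode id to the block ids it stores.
    Returns a dict mapping each block id to the set of datanodes holding it.
    """
    blockmap = defaultdict(set)
    for datanode_id, block_ids in blockreports.items():
        for block_id in block_ids:
            blockmap[block_id].add(datanode_id)
    return dict(blockmap)


def under_replicated(blockmap, replication_factor=3):
    """Return {block id: number of missing replicas} for short blocks."""
    return {
        block_id: replication_factor - len(nodes)
        for block_id, nodes in sorted(blockmap.items())
        if len(nodes) < replication_factor
    }


# Illustrative reports from three Datanodes.
reports = {
    "dn1": ["blk_1", "blk_2"],
    "dn2": ["blk_1", "blk_2"],
    "dn3": ["blk_1"],
}
blockmap = build_blockmap(reports)
print(under_replicated(blockmap))
```

A block that no Datanode reports never appears in this map; detecting those would require comparing against the block list the Namenode holds in its namespace image, which this sketch leaves out.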

BlockMap and Replication
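
The replica placement rule for a replication factor of 3, as listed under Replica Placement, can be sketched as a function that picks three target nodes. The sketch assumes that the first replica goes on the writer's own node, that the cluster topology is available as a dict from rack name to node names, and that the first eligible node is chosen each time; a real placement policy would also weigh load and free space. All names are hypothetical.

```python
def place_replicas(writer, topology):
    """Pick three nodes for a block written from `writer`.

    topology: dict mapping rack name -> list of node names (assumed shape).
    Follows the rule on the slide: local node, a different node on the
    local rack, then a node on a different rack.
    """
    local_rack = next(
        (rack for rack, nodes in topology.items() if writer in nodes), None
    )
    if local_rack is None:
        raise ValueError(f"{writer} is not in the topology")

    second = next((n for n in topology[local_rack] if n != writer), None)
    third = next(
        (n for rack, nodes in topology.items() if rack != local_rack
         for n in nodes),
        None,
    )
    if second is None or third is None:
        raise ValueError("not enough nodes or racks for three replicas")
    return [writer, second, third]


topology = {"rack1": ["n1", "n2", "n3"], "rack2": ["n4", "n5"]}
targets = place_replicas("n2", topology)
print(targets)
```

Two of the three replicas end up on the local rack, which is why replicas do not distribute evenly across racks under this rule.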

Checkpoint Process
- Performed by Namenode
- Two versions of FsImage
  - One stored on disk
  - One in memory
- Applies all transactions in EditLog to in-memory FsImage
- Flushes FsImage to disk
- Truncates EditLog
- Currently only occurs on start-up

Robustness / Fault Tolerance

Datanode Failure
- Datanode sends periodic Heartbeats
[Diagram: one Namenode and three Datanodes]
- Namenode marks Datanodes without recent heartbeat as dead
  - Does not forward any new I/O requests
  - Constantly tracks which blocks must be replicated with BlockMap
  - Initiates replication when necessary

Namenode Failure
- Single Point of Failure for HDFS cluster
- FsImage and EditLog are central data structures for HDFS
  - Corruption / loss of these files causes HDFS to become non-functional
  - Manual intervention is necessary
- Automatic restart and failover of Namenode not yet supported (but planned)

Limitations
- Write-once model
  - Plan to support appending-writes
- A namespace with an extremely large number of files exceeds the Namenode's capacity to maintain
- Cannot be mounted by an existing OS
  - Getting data in and out is tedious
  - Virtual File System can solve problem
- Java API
  - Thrift API is available to use other languages

Limitations (continued)
- HDFS does not implement / support
  - User quotas
  - Access permissions
  - Hard or soft links
  - Data balancing schemes
- No periodic checkpoints
- Namenode is single point of failure
  - Automatic restart and failover to another machine not yet supported

Research and Extensions

Hive
- Facebook Data Infrastructure Team
  - 700TB of data
  - Tens of thousands of tables
  - Over 200 users per month
- Open-source data warehousing solution built on top of Hadoop
- HiveQL: a SQL-like query language
  - Compiled into MapReduce jobs
  - Able to plug in MapReduce scripts into queries
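
To illustrate what pluggable MapReduce scripts might look like for a word count, here is a minimal sketch of a mapper and a reducer written as plain functions. The slides do not show the contents of scripts such as `wc_mapper.py`, so everything here is an assumption: rows are taken to be exchanged as tab-separated lines, and the reducer assumes its input arrives grouped by word.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: for each input row (one document text), emit 'word<TAB>1'."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts of each run of identical words.

    Assumes rows arrive grouped by word, so equal words are adjacent.
    """
    rows = (line.rstrip("\n").split("\t") for line in lines)
    for word, group in groupby(rows, key=lambda row: row[0]):
        total = sum(int(cnt) for _, cnt in group)
        yield f"{word}\t{total}"


# Local stand-in for the framework: sorting plays the role of grouping by word.
docs = ["hadoop stores blocks", "hadoop replicates blocks"]
mapped = sorted(map_lines(docs))
counts = list(reduce_lines(mapped))
print(counts)
```

To use these as standalone scripts, each function would be fed lines from standard input and its results printed to standard output, one row per line.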

Wordcount in Hive

    FROM (
      MAP doctext USING 'python wc_mapper.py' AS (word, cnt)
      FROM docs
      CLUSTER BY word
    ) a
    REDUCE word, cnt USING 'python wc_reduce.py';

Energy Efficiency
- Data availability is maintained even though a node may be idle computationally
- MapReduce tasks/data may underutilize the node
- Solutions
  - Hadoop could communicate with nodes to put them in low-power modes when possible
  - Hadoop could aggregate tasks and data to more fully utilize nodes, keeping other nodes powered down
- Trade-off between energy usage and performance of MapReduce: good enough performance can be achieved at significant energy savings

References
- http://hadoop.apache.org/hdfs/
- http://en.wikipedia.org/wiki/Apache_Hadoop#Hadoop_Distributed_File_System
- http://hadoop.apache.org/common/docs/r0.20.0/hdfs_design.pdf
- Bo Dong; Jie Qiu; Qinghua Zheng; Xiao Zhong; Jingwei Li; Ying Li, "A Novel Approach to Improving the Efficiency of Storing and Accessing Small Files on Hadoop: A Case Study by PowerPoint Files," Services Computing (SCC), 2010 IEEE International Conference on, pp. 65-72, 5-10 July 2010.
- Attebury, G.; Baranovski, A.; Bloom, K.; Bockelman, B.; Kcira, D.; Letts, J.; Levshina, T.; Lundestedt, C.; Martin, T.; Maier, W.; Haifeng Pi; Rana, A.; Sfiligoi, I.; Sim, A.; Thomas, M.; Wuerthwein, F., "Hadoop distributed file system for the Grid," Nuclear Science Symposium Conference Record (NSS/MIC), 2009 IEEE, pp. 1056-1061, Oct. 24 2009-Nov. 1 2009.
- Shvachko, K.; Hairong Kuang; Radia, S.; Chansler, R., "The Hadoop Distributed File System," Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on, pp. 1-10, 3-7 May 2010.
- Thusoo, A.; Sarma, J.S.; Jain, N.; Zheng Shao; Chakka, P.; Ning Zhang; Antony, S.; Hao Liu; Murthy, R., "Hive - a petabyte scale data warehouse using Hadoop," Data Engineering (ICDE), 2010 IEEE 26th International Conference on, pp. 996-1005, 1-6 March 2010.
- Jacob Leverich and Christos Kozyrakis. 2010. On the energy (in)efficiency of Hadoop clusters. SIGOPS Oper. Syst. Rev. 44, 1 (March 2010), 61-65.
