![Page 1: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/1.jpg)
Hadoop Distributed File System
By
Mr. D. B. Shanmugam
Associate Professor & HOD
Sri Balaji Chockalingam Engineering College, Arni
![Page 2: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/2.jpg)
Overview
Distributed File System
History of HDFS
What is HDFS
HDFS Architecture
File commands
Demonstration
![Page 3: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/3.jpg)
Distributed File System
Holds a large amount of data
Serves clients distributed across a network
Network File System (NFS)
o Straightforward design
o Remote access to a single machine
o Constraints
![Page 4: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/4.jpg)
History
![Page 5: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/5.jpg)
History
2002 – Apache Nutch, an open source web search engine
Scaling issues
2003 – Publication of the GFS paper, which addressed Nutch's scaling issues
2004 – Nutch Distributed File System (NDFS)
2006 – Apache Hadoop: MapReduce and HDFS
![Page 6: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/6.jpg)
HDFS
Terabytes or Petabytes of data
Larger files than NFS
Reliable
Fast, scalable access
Integrates well with MapReduce
Restricted to a class of applications
![Page 7: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/7.jpg)
HDFS versus NFS
Network File System (NFS)
o Single machine makes part of its file system available to other machines
o Sequential or random access
o PRO: Simplicity, generality, transparency
o CON: Storage capacity and throughput limited by a single server
Hadoop Distributed File System (HDFS)
o Single virtual file system spread over many machines
o Optimized for sequential reads and local accesses
o PRO: High throughput, high capacity
o "CON": Specialized for particular types of applications
![Page 8: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/8.jpg)
HDFS
![Page 9: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/9.jpg)
Basics
Distributed File System of Hadoop
Runs on commodity hardware
Streams data at high bandwidth
Challenge – tolerate node failure without data loss
Simple coherency model (write once, read many)
Computation is near the data
Portability – built using Java
![Page 10: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/10.jpg)
Basics
Interface patterned after the UNIX file system
File system metadata and application data are stored separately
Metadata is kept on a dedicated server called the Namenode
Application data is stored on Datanodes
![Page 11: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/11.jpg)
Basics
HDFS is good for
◦ Very large files
◦ Streaming data access
◦ Commodity hardware
![Page 12: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/12.jpg)
Basics
HDFS is not good for
◦ Low-latency data access
◦ Lots of small files
◦ Multiple writers, arbitrary file modifications
![Page 13: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/13.jpg)
Differences from GFS
Only a single writer per file (GFS allows concurrent appends by multiple writers)
Open source
![Page 14: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/14.jpg)
HDFS Architecture
![Page 15: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/15.jpg)
HDFS Concepts
Namespace
Blocks
Namenodes and Datanodes
Secondary Namenode
![Page 16: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/16.jpg)
HDFS Namespace
Hierarchy of files and directories
Kept in RAM on the Namenode
Files and directories are represented on the Namenode by inodes
Inode attributes: permissions, modification and access times, namespace and disk space quotas
![Page 17: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/17.jpg)
Blocks
HDFS block size is configurable; typical defaults are 64 MB (older releases) or 128 MB
Large blocks minimize the cost of seeks
A file can be larger than any single disk, because its blocks can be stored on any disks in the cluster
Simplifies the storage subsystem: the amount of metadata stored per file is reduced
Blocks fit well with replication
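The block arithmetic can be sketched in a few lines of Python. This is only an illustration, assuming a 128 MB block size (one of the sizes mentioned on this slide); the function name `split_into_blocks` is hypothetical and is not part of any Hadoop API.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size: 128 MB


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the length in bytes of each block of a file.

    Every block is full except possibly the last one, which only
    holds the remaining bytes.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    lengths = [block_size] * full_blocks
    if remainder:
        lengths.append(remainder)
    return lengths


if __name__ == "__main__":
    one_gb = 1024 ** 3
    blocks = split_into_blocks(one_gb + 1)
    print(len(blocks), "blocks; last block holds", blocks[-1], "byte(s)")
```

The last block being shorter than the rest is the reason a small file does not occupy a full block's worth of disk space.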
![Page 18: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/18.jpg)
Namenodes and Datanodes
Master-worker pattern
Single Namenode: the master server
Many Datanodes (workers), usually one per node in the cluster
![Page 19: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/19.jpg)
Namenode
Master
Manages filesystem namespace
Maintains the filesystem tree and metadata, stored persistently in two files: the namespace image and the edit log
Stores the locations of blocks, but not persistently (they are rebuilt from Datanode block reports)
Metadata: inode data and the list of blocks of each file
![Page 20: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/20.jpg)
Datanodes
Workhorses of the filesystem
Store and retrieve blocks
Send block reports to the Namenode
Do not rely on data protection mechanisms like RAID; use replication instead
![Page 21: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/21.jpg)
Datanodes
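Each block replica is stored as two files: one for the data, the other for the block's metadata, including checksums and the generation stamp
Size of the data file equals the actual length of the block (it is not padded to the nominal block size)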
![Page 22: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/22.jpg)
DataNodes
Startup handshake verifies:
o Namespace ID
o Software version
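A minimal sketch of the handshake check, assuming the rule described in published HDFS design descriptions: a Datanode whose namespace ID or software version differs from the Namenode's is refused, while a freshly initialized Datanode without a namespace ID may join and adopt the cluster's ID. The names `NodeInfo` and `handshake` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeInfo:
    namespace_id: Optional[int]  # None models a newly initialized Datanode
    software_version: str


def handshake(namenode: NodeInfo, datanode: NodeInfo) -> Optional[NodeInfo]:
    """Return the Datanode's identity after a successful handshake.

    Returns None when the Datanode must not join the cluster.
    """
    if datanode.software_version != namenode.software_version:
        return None
    if datanode.namespace_id is None:
        # New node: it receives the cluster's namespace ID.
        return NodeInfo(namenode.namespace_id, datanode.software_version)
    if datanode.namespace_id != namenode.namespace_id:
        return None
    return datanode
```

The namespace ID check keeps a Datanode that belongs to a different cluster from contributing foreign blocks.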
![Page 23: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/23.jpg)
Datanodes
After handshake:
o Registration
o Storage ID
o Block Report
o Heartbeats
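The way block reports and heartbeats feed the Namenode's view of the cluster can be modeled as follows. This is a sketch, not Hadoop code: the class `NamenodeView` is hypothetical, and the ten-minute timeout is an assumed value taken from commonly cited HDFS defaults.

```python
DEAD_AFTER = 600  # seconds without a heartbeat; assumed default of ten minutes


class NamenodeView:
    """In-memory picture the Namenode builds from Datanode messages."""

    def __init__(self):
        self.last_heartbeat = {}   # storage ID -> time of last heartbeat
        self.block_locations = {}  # block ID -> set of storage IDs

    def block_report(self, storage_id, block_ids):
        # A block report replaces everything known about this Datanode.
        for holders in self.block_locations.values():
            holders.discard(storage_id)
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(storage_id)

    def heartbeat(self, storage_id, now):
        self.last_heartbeat[storage_id] = now

    def live_nodes(self, now):
        return {node for node, seen in self.last_heartbeat.items()
                if now - seen < DEAD_AFTER}

    def live_replicas(self, block_id, now):
        # Only replicas on Datanodes that are still alive count.
        holders = self.block_locations.get(block_id, set())
        return holders & self.live_nodes(now)
```

Because block locations come only from these reports, the Namenode does not need to store them persistently.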
![Page 24: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/24.jpg)
![Page 25: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/25.jpg)
Secondary Namenode
If namenode fails, the filesystem cannot be used
Two ways to make it resilient to failure:
o Backup of the files that make up the persistent state of the filesystem metadata
o Secondary Namenode
![Page 26: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/26.jpg)
Secondary Namenode
Periodically merges the namespace image with the edit log
Runs on a separate physical machine
Has a copy of the metadata, which can be used to reconstruct the state of the Namenode
Disadvantage: its state lags that of the primary Namenode
Renamed as CheckpointNode (CN) in 0.21 release[1]
Checkpointing is periodic, not continuous
If the Namenode dies, it does not take over the responsibilities of the Namenode
![Page 27: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/27.jpg)
HDFS Client
Code library that exports the HDFS file
system interface
Allows user applications to access the file
system
![Page 28: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/28.jpg)
File I/O Operations
![Page 29: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/29.jpg)
Write Operation
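Once written, data cannot be altered; a file can only be appended to
The HDFS client that opens a file for writing is granted a lease for the file
The writer periodically renews the lease
Lease duration is bounded by a soft limit and a hard limit
Single-writer, multiple-reader model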
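The soft and hard limits can be pictured as a small state machine. The sketch below is an assumption-laden model: the limits of one minute and one hour are commonly cited HDFS defaults rather than values from this deck, and the class `Lease` and its state names are hypothetical.

```python
SOFT_LIMIT = 60    # seconds; assumed default
HARD_LIMIT = 3600  # seconds; assumed default


class Lease:
    """Write lease held by one client for one file."""

    def __init__(self, holder, now):
        self.holder = holder
        self.renewed_at = now

    def renew(self, now):
        self.renewed_at = now

    def state(self, now):
        age = now - self.renewed_at
        if age < SOFT_LIMIT:
            return "exclusive"    # the writer is certain of exclusive access
        if age < HARD_LIMIT:
            return "preemptible"  # another client may take over the lease
        return "expired"          # the file is closed on the writer's behalf
```

The lease restricts writers only; readers can still open the file while it is being written.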
![Page 30: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/30.jpg)
HDFS Write
![Page 31: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/31.jpg)
Write Operation
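Block allocation: the Namenode allocates a new block and chooses the Datanodes that host its replicas
hflush operation: makes the data written so far visible to readers before the file is closed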
Renewal of lease
Lease – soft limit, hard limit
Single-writer multiple-reader model
![Page 32: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/32.jpg)
Data pipeline during block construction
![Page 33: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/33.jpg)
Creation of new file
![Page 34: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/34.jpg)
Read Operation
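Checksums are stored alongside each block
The client verifies the checksums of the data it reads; on a mismatch it reports the corrupt replica and reads another one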
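Checksum verification on read can be illustrated with Python's standard library. This is a sketch under stated assumptions: CRC-32 from `zlib` stands in for whichever checksum a cluster uses, 512 bytes per checksum is an assumed chunk size, and both function names are hypothetical.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # assumed chunk size covered by one checksum


def compute_checksums(data, chunk=BYTES_PER_CHECKSUM):
    """Checksum each fixed-size chunk of a block, as done at write time."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def first_corrupt_chunk(data, expected, chunk=BYTES_PER_CHECKSUM):
    """Return the index of the first chunk that fails verification.

    Returns None when every chunk matches its stored checksum.
    """
    actual = compute_checksums(data, chunk)
    for index, (got, want) in enumerate(zip(actual, expected)):
        if got != want:
            return index
    if len(actual) != len(expected):
        # Data is shorter or longer than the checksum list covers.
        return min(len(actual), len(expected))
    return None
```

A reader that gets a chunk index back instead of None would discard the data and fetch the block from a different replica.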
![Page 35: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/35.jpg)
HDFS Read
![Page 36: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/36.jpg)
Replication
Blocks are stored on multiple nodes for reliability
Additionally, data transfer bandwidth is multiplied
Computation is near the data
Replication factor: the number of copies kept of each block
![Page 37: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/37.jpg)
Image and Journal
State is stored in two files:
fsimage: Snapshot of file system metadata
editlog: Changes since last snapshot
Normal Operation:
When namenode starts, it reads fsimage and then applies all the
changes from edits sequentially
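The startup sequence can be sketched as replaying a journal on top of a snapshot. This is a toy model: the namespace is a plain dictionary, and the operation names (`create`, `add_block`, `rename`, `delete`) and function names are hypothetical, not the actual edit log format.

```python
import copy


def apply_edit(namespace, edit):
    """Apply one journal record to the in-memory namespace."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = {"blocks": []}
    elif op == "add_block":
        namespace[path]["blocks"].append(edit[2])
    elif op == "rename":
        namespace[edit[2]] = namespace.pop(path)
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def load_namespace(fsimage, edits):
    """Start from the snapshot, then replay the edits in order."""
    namespace = copy.deepcopy(fsimage)
    for edit in edits:
        apply_edit(namespace, edit)
    return namespace


def checkpoint(fsimage, edits):
    """Merge the edits into a new image and start an empty edit log."""
    return load_namespace(fsimage, edits), []
```

A checkpoint shortens the next startup, because fewer edits remain to be replayed.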
![Page 38: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/38.jpg)
Snapshots
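Persistently save the current state of the file system, so that a failed upgrade can be rolled back
The Namenode instructs Datanodes during the handshake whether to create a local snapshot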
![Page 39: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/39.jpg)
Block Placement
Nodes spread across multiple racks
Nodes of a rack share a switch
Placement of replicas critical for reliability
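One way to see why placement matters is to check a proposed set of replica locations against two rules that published HDFS design descriptions give for the default policy: no Datanode holds more than one replica of a block, and no rack holds more than two. The rules are an assumption here, since this slide does not state them, and `placement_ok` is a hypothetical helper that assumes the cluster has enough racks.

```python
from collections import Counter


def placement_ok(replica_nodes, rack_of):
    """Check one block's replica placement.

    replica_nodes: list of node names holding a replica of the block.
    rack_of: mapping from node name to rack name.
    """
    # Rule 1: at most one replica of the block per Datanode.
    if len(set(replica_nodes)) != len(replica_nodes):
        return False
    # Rule 2: at most two replicas of the block per rack.
    per_rack = Counter(rack_of[node] for node in replica_nodes)
    return all(count <= 2 for count in per_rack.values())
```

With three replicas, these rules force the copies onto at least two racks, so losing one switch cannot make a block unavailable.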
![Page 40: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/40.jpg)
![Page 41: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/41.jpg)
Replication Management
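Replication factor: target number of replicas per block
Under-replication: the Namenode schedules creation of additional replicas
Over-replication: the Namenode chooses a replica to remove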
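Detecting both conditions amounts to comparing each block's replica count with the replication factor. The sketch below assumes a replication factor of three and orders the repair queue so that blocks with the fewest replicas come first; the function names are hypothetical.

```python
def classify_blocks(block_locations, replication_factor=3):
    """Split blocks into under- and over-replicated ones.

    block_locations maps a block ID to the set of nodes holding it.
    Returns two dicts mapping block ID to the number of replicas
    missing (under) or in excess (over).
    """
    under, over = {}, {}
    for block, holders in block_locations.items():
        difference = replication_factor - len(holders)
        if difference > 0:
            under[block] = difference
        elif difference < 0:
            over[block] = -difference
    return under, over


def replication_queue(block_locations, replication_factor=3):
    """Under-replicated blocks, fewest existing replicas first."""
    under, _ = classify_blocks(block_locations, replication_factor)
    return sorted(under, key=lambda block: len(block_locations[block]))
```

Putting single-replica blocks at the head of the queue repairs the data that is closest to being lost first.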
![Page 42: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/42.jpg)
Balancer
Balance disk space usage
Optimize by minimizing the inter-rack
data copying
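What "balanced" means can be made concrete by comparing each node's disk utilization with the utilization of the whole cluster. The sketch assumes a threshold of 10 percentage points, a commonly used default rather than a value from this deck, and `classify_nodes` is a hypothetical name.

```python
def classify_nodes(used, capacity, threshold=0.10):
    """Find nodes whose utilization is far from the cluster average.

    used and capacity map node names to bytes (or any common unit).
    Returns (cluster_utilization, over_utilized, under_utilized).
    """
    cluster_utilization = sum(used.values()) / sum(capacity.values())
    over, under = [], []
    for node in used:
        utilization = used[node] / capacity[node]
        if utilization > cluster_utilization + threshold:
            over.append(node)   # candidate source of block moves
        elif utilization < cluster_utilization - threshold:
            under.append(node)  # candidate destination of block moves
    return cluster_utilization, sorted(over), sorted(under)
```

A balancer would then move blocks from the first list to the second, preferring destinations in the same rack as the source to limit inter-rack copying.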
![Page 43: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/43.jpg)
Block Scanner
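Each Datanode periodically scans its block replicas and verifies their checksums
Verification succeeded: the replica is recorded as verified
Corrupt block: the Namenode is notified and a good replica is copied before the corrupt one is removed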
![Page 44: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/44.jpg)
Decommissioning
Removal of nodes without data loss
Nodes are retired on a schedule
A node is removed only after all of its blocks are replicated on other nodes
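The safety condition for retiring a node can be sketched as a check over the block map. This is a simplification under an assumed rule: a node may leave once every block it holds still has a full set of replicas on other nodes. Both function names are hypothetical.

```python
def blocks_to_copy(node, block_locations, replication_factor=3):
    """Blocks that must gain a replica elsewhere before node can leave."""
    return sorted(
        block
        for block, holders in block_locations.items()
        if node in holders and len(holders - {node}) < replication_factor
    )


def can_decommission(node, block_locations, replication_factor=3):
    """True when removing node leaves every block fully replicated."""
    return not blocks_to_copy(node, block_locations, replication_factor)
```

While the list is non-empty the node keeps serving reads, and it is taken out of service only after the list drains.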
![Page 45: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/45.jpg)
HDFS – What does it choose in CAP?
Partition Tolerance – can handle losing data nodes
Consistency
Steps towards Availability: Backup Node
![Page 46: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/46.jpg)
Backup Node
NameNode streams transaction log to BackupNode
BackupNode applies log to in-memory and disk image
Always commits to disk before reporting success to the NameNode
If it restarts, it has to catch up with the NameNode
Available in HDFS 0.21 release
Limitations:
o Maximum of one per Namenode
o Namenode does not forward block reports
o Time to restart from a 2 GB image (20 M files + 40 M blocks):
  3 – 5 minutes to read the image from disk
  30 minutes to process block reports
BackupNode will still take 30 minutes to fail over!
![Page 47: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/47.jpg)
Files in HDFS
![Page 48: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/48.jpg)
File Permissions
Three types:
◦ Read permission (r)
◦ Write permission (w)
◦ Execute Permission (x)
Each file and directory has:
◦ Owner
◦ Group
◦ Mode (the permissions for the owner, the group, and all other users)
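How a mode combines with owner and group can be sketched with Python's `stat` module. This models only the POSIX-style check that HDFS permissions are patterned after; it leaves out the superuser, and the helper names `describe` and `may` are hypothetical.

```python
import stat


def describe(mode, is_dir=False):
    """Render a mode such as 0o644 in the familiar rwx notation."""
    kind = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(kind | mode)


def may(mode, user, user_groups, owner, group, want):
    """Check whether user may perform want ('r', 'w' or 'x').

    Exactly one permission class applies: owner, else group, else other.
    """
    bit = {"r": 4, "w": 2, "x": 1}[want]
    if user == owner:
        shift = 6
    elif group in user_groups:
        shift = 3
    else:
        shift = 0
    return bool((mode >> shift) & bit)


if __name__ == "__main__":
    print(describe(0o640), describe(0o755, is_dir=True))
```

For directories the execute bit controls access to the directory's children, which is why directories usually carry it.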
![Page 49: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/49.jpg)
Command Line Interface
![Page 50: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/50.jpg)
hadoop fs -help : Shows help for the file system shell commands
hadoop fs -ls : Lists a directory
hadoop fs -mkdir : Makes a directory in HDFS
hadoop fs -copyFromLocal : Copies data to HDFS from the local filesystem
hadoop fs -copyToLocal : Copies data from HDFS to the local filesystem
hadoop fs -rm : Deletes a file in HDFS
More:
https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html
![Page 51: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/51.jpg)
Accessing HDFS directly from Java
Programs can read or write HDFS files directly
Files are represented as URIs
Access is via the FileSystem API
o To get access to the file system: FileSystem.get()
o For reading, call open() -- returns an InputStream (FSDataInputStream)
o For writing, call create() -- returns an OutputStream (FSDataOutputStream)
![Page 52: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/52.jpg)
Interfaces
Getting data in and out of HDFS through the command-line interface
is a bit cumbersome
Alternatives:
FUSE file system: Allows HDFS to be mounted under Unix
WebDAV Share: Can be mounted as filesystem on many OSes
HTTP: Read access through the namenode's embedded web server
FTP: Standard FTP interface
![Page 53: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/53.jpg)
Demonstration
![Page 54: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/54.jpg)
Questions?
![Page 55: Hadoop Distributed File System · 2019-02-07 · Distributed File System Hold a large amount of data Clients distributed across a network Network File System(NFS) o Straightforward](https://reader034.vdocuments.mx/reader034/viewer/2022050513/5f9d24ab5c536c27d32c34be/html5/thumbnails/55.jpg)
Thank you