Filesystems, RPC and HDFS
Comparison between traditional filesystems and HDFS writes
February 2012
Alexander Lorenz
Agenda
1 Linux Kernel I/O Scheduler
2 I/O Stack in Linux
3 VFS Implementation
4 NFS RFC Model
5 RPC
6 HDFS
7 Limitations / Problems (Discussion)
©2011 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction or redistribution without written permission is prohibited.
Linux Kernel I/O Scheduler
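• A disk seek is among the slowest operations in a computer: moving the head takes milliseconds, while a memory access takes nanoseconds
• The I/O scheduler orders requests so that the disk head keeps moving in a single direction (the elevator principle), minimizing seeks
• It prevents starvation: no request may wait indefinitely
• It improves overall disk throughput by
  • Reordering requests to reduce disk seek time
  • Merging requests to reduce the number of requests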
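The two throughput techniques above can be illustrated with a minimal Python sketch. It assumes a request is described only by a starting sector and a sector count, merges requests that are back-to-back on disk, and then orders them with a SCAN ("elevator") sweep: upward from the current head position, then back down. The names `Request`, `merge_adjacent`, `elevator_order` and `seek_distance` are hypothetical, and real Linux schedulers (noop, deadline, CFQ) add deadlines, per-process queues and other anti-starvation logic not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Request:
    sector: int  # first sector of the request
    count: int   # number of contiguous sectors


def merge_adjacent(requests):
    """Merge requests whose sector ranges are back-to-back on disk."""
    merged = []
    for req in sorted(requests, key=lambda r: r.sector):
        if merged and merged[-1].sector + merged[-1].count == req.sector:
            merged[-1].count += req.count  # extend the previous request
        else:
            merged.append(Request(req.sector, req.count))  # copy, keep input intact
    return merged


def elevator_order(requests, head):
    """SCAN order: serve requests above the head going up, then the rest going down."""
    reqs = merge_adjacent(requests)  # already sorted by sector
    upward = [r for r in reqs if r.sector >= head]
    downward = [r for r in reqs if r.sector < head][::-1]
    return upward + downward


def seek_distance(order, head):
    """Total head travel in sectors when serving requests in the given order."""
    total = 0
    for r in order:
        total += abs(r.sector - head)
        head = r.sector + r.count  # head ends just past the transferred range
    return total


pending = [Request(90, 8), Request(10, 8), Request(98, 8), Request(50, 8), Request(18, 8)]
head = 40
scan = elevator_order(pending, head)
print([(r.sector, r.count) for r in scan])             # [(50, 8), (90, 16), (10, 16)]
print(seek_distance(pending, head), seek_distance(scan, head))  # 314 138
```

With the head at sector 40, the five pending requests collapse into three, and the total head travel drops from 314 sectors in arrival order to 138 in elevator order.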
Kernel I/O Scheduler Framework
[Figure: the block layer enqueues requests into the I/O scheduler's internal queues, where they are merged, sorted and prioritized, then dequeued into the external queue of the device driver]
• The Linux elevator is an abstraction layer to which different I/O schedulers can attach
• The request queue provides the merging mechanisms:
  • Front or back merge of a request and a bio
  • Merging two requests
• The elevator implements the sorting policy and the merge decisions:
  • Pick a request to be merged with a bio
  • Add a new request to the request queue
  • Select the next request to be processed by the block driver
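A minimal Python sketch of this division of labor, assuming a bio and a request are both just a (start sector, sector count) range: the hypothetical `RequestQueue` performs back merges, front merges and request-to-request coalescing, while a pluggable `Elevator` object decides which request a bio merges with, where new requests are queued, and which request is dispatched next. Two toy elevators (`FifoElevator`, `SortingElevator`) show how swapping the policy changes the dispatch order without touching the merge code; none of these names correspond to actual kernel functions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Bio:
    sector: int
    count: int


@dataclass(eq=False)  # identity comparison, so list.remove() removes this exact object
class Request:
    sector: int
    count: int

    @property
    def end(self):
        return self.sector + self.count


class Elevator(ABC):
    """Policy hooks an I/O scheduler plugs into the request queue (hypothetical)."""

    def find_merge(self, requests, bio):
        # Pick a queued request that the bio touches, if any.
        for req in requests:
            if req.end == bio.sector or bio.sector + bio.count == req.sector:
                return req
        return None

    @abstractmethod
    def add_request(self, requests, req): ...

    @abstractmethod
    def next_request(self, requests): ...


class FifoElevator(Elevator):
    def add_request(self, requests, req):
        requests.append(req)  # keep arrival order

    def next_request(self, requests):
        return requests.pop(0) if requests else None


class SortingElevator(Elevator):
    def add_request(self, requests, req):
        requests.append(req)
        requests.sort(key=lambda r: r.sector)  # keep the queue in sector order

    def next_request(self, requests):
        return requests.pop(0) if requests else None


class RequestQueue:
    """Owns the merge mechanics; delegates every policy decision to the elevator."""

    def __init__(self, elevator):
        self.elevator = elevator
        self.requests = []

    def submit(self, bio):
        req = self.elevator.find_merge(self.requests, bio)
        if req is not None and req.end == bio.sector:
            req.count += bio.count  # back merge
            self._coalesce(req)
        elif req is not None and bio.sector + bio.count == req.sector:
            req.sector, req.count = bio.sector, req.count + bio.count  # front merge
            self._coalesce(req)
        else:
            self.elevator.add_request(self.requests, Request(bio.sector, bio.count))

    def _coalesce(self, req):
        # Merge two requests if the grown request now touches a neighbour.
        for other in list(self.requests):
            if other is req:
                continue
            if req.end == other.sector:
                req.count += other.count
                self.requests.remove(other)
            elif other.end == req.sector:
                req.sector, req.count = other.sector, req.count + other.count
                self.requests.remove(other)

    def dispatch(self):
        return self.elevator.next_request(self.requests)


def run(elevator):
    queue = RequestQueue(elevator)
    for bio in [Bio(100, 8), Bio(0, 8), Bio(108, 8), Bio(16, 8), Bio(8, 8)]:
        queue.submit(bio)
    order = []
    while (req := queue.dispatch()) is not None:
        order.append((req.sector, req.count))
    return order


print(run(SortingElevator()))  # [(0, 24), (100, 16)]
print(run(FifoElevator()))     # [(100, 16), (0, 24)]
```

Both policies turn the same five bios into two requests; only the dispatch order differs, which is exactly the part the elevator is responsible for.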
I/O Stack in Linux
[Figure: the application in userland issues system calls into kernel space; the filesystem handles access, locking, prefetch, flush, disk layout and metadata; data passes through the cache to the HDD driver, annotated "bulk writes"]
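The cache and bulk-write path in this stack can be sketched as a write-back cache: a write completes as soon as the cached block is updated, repeated writes to the same block are absorbed in memory, and dirty blocks are flushed to the driver in one sorted batch. This is a simplified model; `HypotheticalDisk`, `WriteBackCache` and the `flush_threshold` parameter are illustrative names, and the real Linux page cache relies on flusher threads and dirty-page limits rather than a fixed block count.

```python
class HypotheticalDisk:
    """Stand-in for the HDD driver: counts how many bulk submissions it receives."""

    def __init__(self):
        self.blocks = {}
        self.write_calls = 0

    def write_batch(self, batch):
        self.write_calls += 1  # one submission, however many blocks it carries
        for blockno, data in batch:
            self.blocks[blockno] = data


class WriteBackCache:
    def __init__(self, disk, flush_threshold=4):
        self.disk = disk
        self.cache = {}     # block number -> latest data
        self.dirty = set()  # blocks modified since the last flush
        self.flush_threshold = flush_threshold

    def write(self, blockno, data):
        self.cache[blockno] = data  # returns without touching the disk
        self.dirty.add(blockno)
        if len(self.dirty) >= self.flush_threshold:
            self.flush()

    def read(self, blockno):
        if blockno not in self.cache:  # miss: fetch from disk and keep a copy
            self.cache[blockno] = self.disk.blocks.get(blockno, b"")
        return self.cache[blockno]

    def flush(self):
        if not self.dirty:
            return
        # Bulk write: all dirty blocks in one batch, sorted by block number.
        batch = [(b, self.cache[b]) for b in sorted(self.dirty)]
        self.disk.write_batch(batch)
        self.dirty.clear()


disk = HypotheticalDisk()
cache = WriteBackCache(disk, flush_threshold=4)
for n, blockno in enumerate([7, 3, 7, 1, 5, 2]):
    cache.write(blockno, f"v{n}".encode())
calls_before_flush = disk.write_calls  # blocks 1, 3, 5, 7 went out together
cache.flush()                          # pushes the remaining dirty block 2
```

Six writes (two of them to block 7) reach the disk as two bulk submissions, and block 2 lives only in the cache until `flush()` is called.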
VFS Implementation
[Figure: the application in userland issues system calls to the VFS in kernel space, which dispatches them to concrete filesystems: ext3, ext2, NFS, CIFS]
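The dispatching role of the VFS can be sketched in Python as a common interface plus a mount table: every filesystem implements the same operations, and the VFS picks the implementation by the longest matching mount point. `FileSystem`, `MemoryFS`, `RemoteFS`, `VFS` and the `transport` callable are hypothetical; the Linux VFS does this in C with operation tables such as `struct file_operations`, not with Python classes.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Operations every filesystem driver must provide (hypothetical interface)."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class MemoryFS(FileSystem):
    """Stand-in for a local filesystem such as ext2 or ext3."""

    def __init__(self):
        self.files = {}

    def read(self, path):
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


class RemoteFS(FileSystem):
    """Stand-in for a network filesystem such as NFS or CIFS: forwards every call."""

    def __init__(self, transport):
        self.transport = transport  # hypothetical callable standing in for RPC

    def read(self, path):
        return self.transport("read", path, None)

    def write(self, path, data):
        self.transport("write", path, data)


class VFS:
    def __init__(self):
        self.mounts = {}  # mount point -> FileSystem

    def mount(self, mountpoint, fs):
        self.mounts[mountpoint.rstrip("/") or "/"] = fs

    def _resolve(self, path):
        # The longest mount point that prefixes the path wins (nested mounts).
        candidates = [m for m in self.mounts
                      if m == "/" or path == m or path.startswith(m + "/")]
        if not candidates:
            raise FileNotFoundError(path)
        best = max(candidates, key=len)
        relative = path if best == "/" else (path[len(best):] or "/")
        return self.mounts[best], relative

    def read(self, path):
        fs, relative = self._resolve(path)
        return fs.read(relative)

    def write(self, path, data):
        fs, relative = self._resolve(path)
        fs.write(relative, data)


server_files = {}  # simulated NFS server storage


def fake_rpc(op, path, data):
    if op == "write":
        server_files[path] = data
        return None
    return server_files[path]


local = MemoryFS()
vfs = VFS()
vfs.mount("/", local)
vfs.mount("/mnt/nfs", RemoteFS(fake_rpc))
vfs.write("/etc/motd", b"hello")
vfs.write("/mnt/nfs/report.txt", b"remote data")
```

The application calls `vfs.write()` the same way for both paths; only the mount table decides whether the data lands in the local `MemoryFS` or is forwarded to the simulated server.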
NFS RFC Model
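[Figure: on the client, an application with NFS access uses both the local filesystem (local HDD) and the NFS client; the NFS client talks RPC over TCP/IP or UDP to the server, where RPC hands requests to the NFS server and its local filesystem (local HDD). Client and server components run in kernel space, and files are identified by file handles.]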
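The file handle is what lets NFS (in versions 2 and 3) keep the server stateless: the client obtains a handle once through a lookup and then sends the handle, offset and byte count with every read, so the server keeps no open-file table. The Python sketch below assumes a hypothetical handle format that simply encodes the file name; real NFS servers build opaque handles from filesystem identifiers such as the inode number. Direct method calls stand in for RPC, and `StatelessFileServer` and `read_whole_file` are illustrative names.

```python
import os
import tempfile


class StatelessFileServer:
    """NFS-like server that keeps no open-file or per-client state (hypothetical)."""

    def __init__(self, export_dir):
        self.export_dir = export_dir

    def lookup(self, name: str) -> bytes:
        # LOOKUP: turn a name into an opaque file handle.
        if not os.path.isfile(os.path.join(self.export_dir, name)):
            raise FileNotFoundError(name)
        return name.encode()  # assumed handle format; real handles use fs ids/inodes

    def read(self, handle: bytes, offset: int, count: int) -> bytes:
        # READ: handle, offset and count arrive with every call (no path checks: demo only).
        with open(os.path.join(self.export_dir, handle.decode()), "rb") as f:
            f.seek(offset)
            return f.read(count)


def read_whole_file(server, name, chunk=4):
    """Client side: the client, not the server, tracks the current offset."""
    handle = server.lookup(name)
    data, offset = b"", 0
    while True:
        piece = server.read(handle, offset, chunk)  # a direct call stands in for RPC
        if not piece:
            return data
        data += piece
        offset += len(piece)


with tempfile.TemporaryDirectory() as export:
    with open(os.path.join(export, "notes.txt"), "wb") as f:
        f.write(b"hello over nfs")
    server = StatelessFileServer(export)
    fh = server.lookup("notes.txt")
    first = server.read(fh, 0, 5)
    rebooted = StatelessFileServer(export)  # simulated server restart
    rest = rebooted.read(fh, 5, 100)        # the old handle still works
    whole = read_whole_file(server, "notes.txt")
```

Because the server holds no per-client state, a fresh server instance (a simulated reboot) can continue a read with a handle issued by the previous one.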
NFS - OSI Model
RPC
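[Figure: RPC timeline between client and server. The client process starts and sends an RPC message, then waits; the server, which has been waiting, starts and executes the procedure, sends the RPC return and waits again; the client process then continues until termination.]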
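The synchronous call pattern in this timeline can be reproduced with a small Python sketch over a connected socket pair: the client sends a request message and blocks until the matching return message arrives, while the server thread waits, executes the named procedure and replies. The JSON framing, the `PROCEDURES` table and `RpcClient` are assumptions for illustration; NFS uses ONC RPC with XDR encoding, which this sketch does not implement.

```python
import json
import socket
import struct
import threading


def send_msg(sock, obj):
    payload = json.dumps(obj).encode()
    sock.sendall(struct.pack("!I", len(payload)) + payload)  # 4-byte length prefix


def recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length))


PROCEDURES = {"add": lambda a, b: a + b, "upper": lambda s: s.upper()}


def serve(sock):
    with sock:
        while True:
            try:
                call = recv_msg(sock)  # server waits for the next RPC message
            except ConnectionError:
                return  # client went away: terminate
            result = PROCEDURES[call["proc"]](*call["args"])  # execute the procedure
            send_msg(sock, {"id": call["id"], "result": result})  # RPC return


class RpcClient:
    def __init__(self, sock):
        self.sock = sock
        self.next_id = 0

    def call(self, proc, *args):
        self.next_id += 1
        send_msg(self.sock, {"id": self.next_id, "proc": proc, "args": list(args)})
        reply = recv_msg(self.sock)  # client blocks until the return arrives
        if reply["id"] != self.next_id:
            raise RuntimeError("reply does not match request")
        return reply["result"]


client_sock, server_sock = socket.socketpair()
server = threading.Thread(target=serve, args=(server_sock,), daemon=True)
server.start()
rpc = RpcClient(client_sock)
total = rpc.call("add", 2, 3)     # 5
shout = rpc.call("upper", "nfs")  # "NFS"
client_sock.close()               # server sees EOF and returns
server.join(timeout=1)
```

Each `rpc.call()` returns only after the server thread has executed the procedure, which mirrors the "client waits" phase in the figure.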
HDFS Layer
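[Figure: a local client goes through the POSIX API and the VFS to local filesystems and to the NFS driver over the network; an HDFS application uses the HDFS API to reach HDFS over the network]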
HDFS Model
[Figure: within the HDFS cluster, the client calls addBlock(src) on the Namenode, then writes the block through a pipeline of three DataNodes (DN); each DataNode reports "block received" to the Namenode.]
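The write sequence in this diagram can be sketched as an in-process simulation: the client asks the Namenode for a new block with `add_block(src)`, streams packets to the first DataNode, each DataNode stores the packet and forwards it to the next, acknowledgements travel back up the pipeline, and finally every DataNode reports the received block to the Namenode. All class and method names here are hypothetical; real HDFS uses `ClientProtocol.addBlock` over RPC, a socket-based data transfer protocol for packets, rack-aware replica placement and asynchronous acknowledgements.

```python
class NameNode:
    """Hypothetical stand-in: hands out blocks and records replica reports."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes
        self.replication = replication
        self.next_block_id = 1
        self.files = {}      # path -> list of block ids
        self.locations = {}  # block id -> names of DataNodes that reported it

    def add_block(self, src):
        block_id = self.next_block_id
        self.next_block_id += 1
        self.files.setdefault(src, []).append(block_id)
        self.locations[block_id] = set()
        # Assumption: simply take the first N DataNodes (real HDFS is rack-aware).
        return block_id, self.datanodes[: self.replication]

    def block_received(self, block_id, dn_name):
        self.locations[block_id].add(dn_name)


class DataNode:
    def __init__(self, name):
        self.name = name
        self.namenode = None
        self.storage = {}  # block id -> list of packets

    def receive(self, block_id, packet, downstream):
        self.storage.setdefault(block_id, []).append(packet)  # store locally
        acks = downstream[0].receive(block_id, packet, downstream[1:]) if downstream else []
        return [self.name] + acks  # acknowledgements travel back upstream

    def finish_block(self, block_id):
        self.namenode.block_received(block_id, self.name)  # "block received" report


def write_file(namenode, src, data, packet_size=4):
    block_id, pipeline = namenode.add_block(src)
    expected = [dn.name for dn in pipeline]
    for i in range(0, len(data), packet_size):
        acks = pipeline[0].receive(block_id, data[i:i + packet_size], pipeline[1:])
        if acks != expected:
            raise OSError(f"incomplete pipeline ack: {acks}")
    for dn in pipeline:
        dn.finish_block(block_id)
    return block_id


datanodes = [DataNode(f"dn{i}") for i in range(1, 5)]
namenode = NameNode(datanodes)
for dn in datanodes:
    dn.namenode = namenode
block = write_file(namenode, "/user/alex/log.txt", b"hello hdfs!")
```

With four DataNodes and a replication factor of three, the block lands on `dn1`, `dn2` and `dn3`, and the Namenode's block map reflects the three "block received" reports.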
HDFS Write Model
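[Figure: HDFS write path. The client (DFS, labelled DFSClient.DFSInputStream) reaches the NameNode (NN) over RPC (ClientProtocol) via an RPC proxy (IPC); the DataNodes (DN) reach the NameNode over RPC (DataNodeProtocol) via RPC proxies, with one link marked "RPC rcv only"; block data travels as an FSData stream over a plain socket to each DataNode's internal xceiver, which writes through the VFS to the local HDD.]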
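The DataNode side of the data path (socket stream, xceiver, VFS, disk) can be sketched with a thread-per-connection server: each accepted connection is handled by an xceiver-like function that reads a fixed header (block id and length), reads the payload, writes it to a local file through ordinary file I/O, syncs it, and acknowledges. The wire format (`!QI` header, `OK` reply), the `blk_<id>` file name and all function names are assumptions; Hadoop's `DataXceiver` speaks its own data transfer protocol with packets and checksums.

```python
import os
import socket
import struct
import tempfile
import threading

HEADER = struct.Struct("!QI")  # assumed header: 8-byte block id, 4-byte length


def recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def xceiver(conn, block_dir):
    """Handle one connection: receive a block and store it on the local disk."""
    with conn:
        block_id, length = HEADER.unpack(recv_exact(conn, HEADER.size))
        data = recv_exact(conn, length)
        path = os.path.join(block_dir, f"blk_{block_id}")
        with open(path, "wb") as f:  # plain file I/O: VFS -> local filesystem -> HDD
            f.write(data)
            f.flush()
            os.fsync(f.fileno())     # make sure the block reached the disk
        conn.sendall(b"OK")


def serve(listener, block_dir):
    while True:
        conn, _ = listener.accept()
        # One thread per incoming stream (daemon, so it never blocks interpreter exit).
        threading.Thread(target=xceiver, args=(conn, block_dir), daemon=True).start()


def send_block(addr, block_id, data):
    with socket.create_connection(addr) as s:
        s.sendall(HEADER.pack(block_id, len(data)) + data)
        return recv_exact(s, 2)


block_dir = tempfile.mkdtemp()
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen()
threading.Thread(target=serve, args=(listener, block_dir), daemon=True).start()
reply = send_block(listener.getsockname(), 42, b"block payload")
```

The client only sees `OK` after the xceiver has written and synced the file, so the acknowledgement implies the block is on the local disk.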
Links / Resources
Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler (Yahoo!): The Hadoop Distributed File System
http://dtrace.org/blogs/brendan/2011/05/11/
Chavalit Srisathapornphat: NFS and RPC (CISC856)
Hao-Ran Liu: Linux I/O Schedulers