Gfarm presentation and thesis topic introduction
DESCRIPTION
This slide deck outlines general information about the Gfarm file system and the basis for the presenter's thesis.
TRANSCRIPT
![Page 1: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/1.jpg)
1
GFARM V2: A grid file system that supports high-performance distributed and parallel data computing
Osamu Tatebe, Satoshi Sekiguchi, AIST, Tsukuba, Japan
Youhei Morita, KEK, Tsukuba, Japan
Noriyuki Soda, SRA, Nagoya, Japan
Satoshi Matsuoka, Titech / NII, Tokyo, Japan
Presentation: Chawanat Nakasan / M1
Laboratory for Software Design and Analysis
Nara Institute of Science and Technology
Seminar II, First Presentation
2013.12.04
![Page 2: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/2.jpg)
2
Agenda
• What is Gfarm
• Things similar to Gfarm
• Replication in Gfarm
• Networking issues in Gfarm
• Research introduction
Paper
Application
O. Tatebe, S. Sekiguchi, Y. Morita, N. Soda, and S. Matsuoka, “Gfarm v2: A Grid file system that supports high-performance distributed and parallel data computing,” in Computing in High Energy Physics and Nuclear Physics, 2004, pp. 1172–1175.
![Page 3: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/3.jpg)
3
Introduction
![Page 4: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/4.jpg)
4
What is Gfarm?
• Distributed File System
• with Parallel Processing
CPU CPU CPU CPU
META
Metaserver
Storage
Processor
Storage Nodes
![Page 5: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/5.jpg)
5
What’s different about Gfarm?
• Other clustering solutions send files to where the jobs are.
CPU CPU
META
File
File File
Doesn’t work well with BIG DATA.
Job Job
![Page 6: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/6.jpg)
6
What’s different about Gfarm?
• Instead, Gfarm sends jobs to nodes with files.
CPU CPU CPU CPU
META
File File
Job Job Job Job
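The idea of sending jobs to the nodes that already hold the files can be sketched as below. This is only an illustrative sketch, not Gfarm's actual scheduler: the function `place_jobs`, the node names, and the least-loaded tie-break are all assumptions made for the example.

```python
from collections import defaultdict


def place_jobs(file_locations, jobs):
    """Assign each job to a node that already stores its input file.

    file_locations: dict mapping file name -> list of nodes holding a replica
    jobs: list of (job_name, input_file) pairs
    Returns a dict mapping job_name -> chosen node.
    """
    load = defaultdict(int)  # number of jobs already placed on each node
    placement = {}
    for job_name, input_file in jobs:
        candidates = file_locations.get(input_file, [])
        if not candidates:
            raise ValueError(f"no node holds {input_file!r}")
        # Among the nodes that hold the file, prefer the least-loaded one
        # (hypothetical tie-break; the slides do not specify one).
        node = min(candidates, key=lambda n: load[n])
        load[node] += 1
        placement[job_name] = node
    return placement


# Hypothetical cluster: a.dat has two replicas, b.dat has one.
file_locations = {"a.dat": ["node1", "node2"], "b.dat": ["node3"]}
jobs = [("job1", "a.dat"), ("job2", "a.dat"), ("job3", "b.dat")]
placement = place_jobs(file_locations, jobs)
print(placement)
```

No file is moved in this sketch: every job lands on a node that already has its input, and two jobs on the same file spread over its two replicas.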
![Page 7: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/7.jpg)
7
Replica Management
![Page 8: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/8.jpg)
8
One Big Issue in Distributed Storage: Replication and replica management
• Same files are copied and spread across the system.
• Reasons:
  • Redundancy
  • Locality
  • In Gfarm: job location
• Problem: Consistency.
CPU CPU CPU CPU
META
File File File
![Page 9: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/9.jpg)
9
Gfarm directs file opens to the same place.
• This method is very effective for consistency control.
• But it requires more coordination between the nodes, i.e. more network load and overhead.
P1
F1
P2
F2
(1) P1 opens file replica F1
(2) P2 tries to open replica F2 (same file, different place)
(3) P2 is redirected to use F1 too, to limit the number of open copies
![Page 10: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/10.jpg)
10
Summary: What is Gfarm?
• A distributed file system …
• with a parallel processing scheduler …
• that sends jobs to files, not files to jobs, …
• and only one replica can be written at a time!
![Page 11: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/11.jpg)
11
Application: Improving Gfarm
Why do we have to improve it?
![Page 12: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/12.jpg)
12
It sounds good, until implementations get too large.
• When it becomes global-scale, we have to think differently.
• This is what appears to us:
CPU CPU CPU
META
![Page 13: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/13.jpg)
13
It sounds good, until implementations get too large.
• But this is reality:
META CPU
CPU
CPU
![Page 14: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/14.jpg)
14
So how do we simplify this problem? We put an overlay network on top.
Physical Network (Reality)
Overlay Network (what Gfarm sees)
![Page 15: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/15.jpg)
15
1. It doesn’t care about locality.
• In this case, the two red arrows are the “same length” according to this topology, because each spans just one hop.
Overlay Network
![Page 16: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/16.jpg)
16
1. It doesn’t care about locality.
• However, they are not the same length when we look at the physical diagram.
• Gfarm’s overlay network doesn’t recognize the true distances.
Physical Network
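The gap between overlay hops and physical distance can be illustrated with a small sketch. The topology, node names, and latency values below are hypothetical; the point is only that two destinations that are both "one hop" away in the overlay can have very different physical path latencies.

```python
import heapq


def shortest_latency(graph, src, dst):
    """Dijkstra over {node: {neighbor: latency_ms}}; returns total latency."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return float("inf")


# Hypothetical physical network: A and B share a site, C is far away.
physical = {
    "A": {"r1": 1.0},
    "B": {"r1": 1.0},
    "r1": {"A": 1.0, "B": 1.0, "r2": 40.0},
    "r2": {"r1": 40.0, "r3": 40.0},
    "r3": {"r2": 40.0, "C": 1.0},
    "C": {"r3": 1.0},
}
# In the overlay, both peers look like direct neighbors (one hop each).
overlay_hops = {("A", "B"): 1, ("A", "C"): 1}

lat_ab = shortest_latency(physical, "A", "B")
lat_ac = shortest_latency(physical, "A", "C")
print("A-B: overlay hops", overlay_hops[("A", "B")], "physical ms", lat_ab)
print("A-C: overlay hops", overlay_hops[("A", "C")], "physical ms", lat_ac)
```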
![Page 17: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/17.jpg)
17
2. Conventional networks don’t use every route.
• Examine this topology: there’s more than one way for the circled nodes to reach each other.
Physical Network
Best Route: Always used
Other Route(s): Rarely used
![Page 18: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/18.jpg)
18
3. If we use every route, which would we use?
CPU CPU
High latency, more bandwidth: good for data transfer
Low latency, less bandwidth: good for control messages
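Why the two kinds of traffic prefer different routes can be seen with a rough model, assuming transfer time is approximately latency plus size divided by bandwidth. The path names and figures below are made up for illustration, and the model ignores TCP behavior, queuing, and round trips.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bps):
    """Rough model: one-way latency plus time to push the bits onto the link."""
    return latency_s + (size_bytes * 8) / bandwidth_bps


# Hypothetical pair of paths between the same two nodes.
paths = {
    "fat": {"latency_s": 0.100, "bandwidth_bps": 10e9},    # high latency, more bandwidth
    "fast": {"latency_s": 0.010, "bandwidth_bps": 100e6},  # low latency, less bandwidth
}


def best_path(size_bytes):
    """Pick the path with the smallest estimated transfer time."""
    return min(paths, key=lambda p: transfer_time(size_bytes, **paths[p]))


control_choice = best_path(200)          # a small control message
data_choice = best_path(1_000_000_000)   # a 1 GB file
print("control message ->", control_choice)
print("bulk data       ->", data_choice)
```

For the small message the latency term dominates, so the low-latency path wins; for the large file the size/bandwidth term dominates, so the high-bandwidth path wins.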
![Page 19: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/19.jpg)
19
We plan to use SDN.
• SDN = Software-defined networking
• Concept: Use software to dynamically add or change network data flows.
Figure: McKeown, N., & Anderson, T. (2008). OpenFlow: enabling innovation in campus networks. Retrieved from http://dl.acm.org/citation.cfm?id=1355746
![Page 20: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/20.jpg)
20
What can SDN do?
• SDN can practically let us make a whole new protocol by programming a specific “controller” to do the job.
• With SDN, we can:
  • Change settings dynamically
  • Implement specialized Quality-of-Service (QoS)
  • Differentiate many kinds of connections
    • By application, port, users, network addresses, groups, etc.
  • Use multi-path routing efficiently
  • and much more!
![Page 21: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/21.jpg)
21
So what do we want to do?
We want to
accelerate wide-area distributed storage
by using
software-defined networking
to
optimize the overlay network.
![Page 22: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/22.jpg)
22
![Page 23: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/23.jpg)
23
INFORMATION for GENERAL PUBLIC
• This work was made by a member of the Laboratory for Software Design and Analysis, Graduate School of Information Science, Nara Institute of Science and Technology.
• This presentation is the first of two required for Master’s degree graduation and is presented to faculty and students of the Institute.
• This file has been modified for public disclosure. Actual content during presentation was different.
![Page 24: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/24.jpg)
24
BACKUP SLIDES
• Some of them may not make sense.
![Page 25: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/25.jpg)
25
Gfarm job execution relies on file presence.
![Page 26: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/26.jpg)
26
BACKUP: Gfarm’s not Hadoop
• Gfarm isn’t Hadoop: it provides job scheduling that’s not MapReduce. Of course, Gfarm works with Hadoop if you want it to.
http://www.ibm.com/developerworks/cloud/library/cl-openstack-deployhadoop/figure4.gif
Let’s just say Gfarm doesn’t do this:
![Page 27: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/27.jpg)
27
How to work with file replicas
• To open a file in READ mode: Any replica is OK.
Replica Replica Replica Replica
Process Process
Writing Reading
Process
![Page 28: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/28.jpg)
28
How to work with file replicas
• To open a file in WRITE mode (in this order):
  • If somebody is writing, use a replica already opened in WRITE mode
  • If nobody is writing, use a replica already opened in READ mode
  • If nobody is reading, use any replica
Replica Replica Replica Replica
Process Process
Writing Reading
Process
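The WRITE-mode ordering on this slide can be written down as a small selection function. This is a sketch of the rule as stated here, not Gfarm's implementation; the replica records and the `choose_replica_for_write` name are hypothetical.

```python
def choose_replica_for_write(replicas):
    """Pick a replica to open in WRITE mode.

    replicas: list of dicts with keys 'name', 'writers', 'readers'
    (counts of processes that currently have the replica open).
    """
    if not replicas:
        raise ValueError("file has no replicas")
    # 1. If somebody is writing, use the replica already opened in WRITE mode.
    for r in replicas:
        if r["writers"] > 0:
            return r["name"]
    # 2. If nobody is writing, use a replica already opened in READ mode.
    for r in replicas:
        if r["readers"] > 0:
            return r["name"]
    # 3. If nobody is reading either, any replica will do (here: the first).
    return replicas[0]["name"]


busy = [
    {"name": "r1", "writers": 0, "readers": 1},
    {"name": "r2", "writers": 1, "readers": 0},
    {"name": "r3", "writers": 0, "readers": 0},
]
read_only = [
    {"name": "r1", "writers": 0, "readers": 0},
    {"name": "r2", "writers": 0, "readers": 2},
]
idle = [{"name": "r1", "writers": 0, "readers": 0}]

print(choose_replica_for_write(busy))
print(choose_replica_for_write(read_only))
print(choose_replica_for_write(idle))
```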
![Page 29: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/29.jpg)
29
BACKUP: 2. Why don’t we use every possible route?
• So what we can do might be:
  • Transfer File A over the red path
  • Transfer File B over the orange path
• The overall bandwidth would be increased!
Physical Network
![Page 30: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/30.jpg)
30
BACKUP: 2. Why don’t we use every possible route?
• Problems of this solution:
  • TCP segmentation & reordering
  • UDP will result in A LOT of unwanted and uncorrectable reordering
• Mitigation:
  • Separate data & control
  • Just divide the link at file level, so one file on link A, another file on link B, etc.
  • We can do this because it’s a file system and may make use of many files at the same time.
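Dividing traffic at the file level might look like the sketch below: each file travels whole over a single path, so packets of one file are never split across routes and reordered. The greedy "earliest finish" rule, the path names, and the sizes are assumptions for illustration only.

```python
def assign_files_to_paths(file_sizes, path_bandwidths):
    """Greedy assignment: largest files first, each to the path that finishes it earliest.

    file_sizes: dict file name -> size in bytes
    path_bandwidths: dict path name -> bandwidth in bits per second
    Returns (assignment, busy_until), where busy_until[p] is when path p becomes idle.
    """
    busy_until = {p: 0.0 for p in path_bandwidths}
    assignment = {}
    for name, size in sorted(file_sizes.items(), key=lambda kv: -kv[1]):
        # Estimated finish time of this file on each path.
        path = min(
            busy_until,
            key=lambda p: busy_until[p] + size * 8 / path_bandwidths[p],
        )
        busy_until[path] += size * 8 / path_bandwidths[path]
        assignment[name] = path
    return assignment, busy_until


# Hypothetical example: three files, two 1 Gbit/s paths.
file_sizes = {"A": 4_000_000_000, "B": 2_000_000_000, "C": 2_000_000_000}
path_bandwidths = {"red": 1_000_000_000, "orange": 1_000_000_000}

assignment, busy_until = assign_files_to_paths(file_sizes, path_bandwidths)
single_path_seconds = sum(file_sizes.values()) * 8 / path_bandwidths["red"]
print(assignment)
print("two paths:", max(busy_until.values()), "s; one path:", single_path_seconds, "s")
```

In this example the two paths together finish in half the time of a single path, while each file's packets stay on one route.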
![Page 31: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/31.jpg)
31
BACKUP: Why don’t bandwidth and latency correlate?
• Bandwidth is limited by the link capacity and rate of transmission and receiving.
• Latency is caused by processing time.
  • Per-router processing time is increased in the WAN due to routers being overwhelmed by general public usage of the Internet
  • There can be more than 10 hops to reach a node in another country.
![Page 32: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/32.jpg)
32
Actually, why NOT SDN?
• Configuration delay: it takes some time for a new route to be installed
• Single Point of Failure (for centralized SDNs like OpenFlow)
• Cannot easily implement multiple SDN instances
  • We can however pre-slice the network and run SDN on each “subnet”, or
  • use solutions like FlowVisor (proxy OpenFlow)
• Controller bugs can break the existing network (even the simplest controllers can have bugs!)
![Page 33: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/33.jpg)
33
How can we use it with Gfarm?
• Data
  • Use multiple paths
  • Prefer the high-bandwidth path
• Control
  • QoS
  • Prefer the low-latency path
• These methods can be implemented in SDN
![Page 34: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/34.jpg)
34
How can we use it with Gfarm?
• We can use multiple paths to add up bandwidth.
• SDN can differentiate between each flow so paths can be separated.
Multi-path routing?
Physical Network
![Page 35: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/35.jpg)
35
How can we use it with Gfarm?
• Control messages prefer low latency.
• Data transfers prefer greater bandwidth.
• SDN can tell the difference between these uses and optimize accordingly.
Application-aware routing?
CPU CPU
More latency, more bandwidth
Less latency, less bandwidth
![Page 36: Gfarm presentation and thesis topic introduction](https://reader036.vdocuments.mx/reader036/viewer/2022062419/5575e1d7d8b42af74e8b469c/html5/thumbnails/36.jpg)
36
How can we use it with Gfarm?
• Critical uses such as control messages can be given priority so they can “skip the (potentially very long) queue” of data packets.
• Some SDNs like OpenFlow are beginning to support QoS.
Quality of Service?
Important
• VoIP
• Streaming data
• Control Messages
• Synchronous msgs
Can Wait
• Scheduled jobs
• Data backup
• Background tasks
• Unimportant things
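How "skipping the queue" works can be sketched with a strict-priority queue. This only illustrates the idea; it is not how any particular switch or OpenFlow queue is implemented, and the class names and two-level priority scheme are assumptions taken from the two columns above.

```python
import heapq
import itertools


class PriorityScheduler:
    """Strict-priority queue: 'important' traffic always leaves before 'can_wait'."""

    PRIORITY = {"important": 0, "can_wait": 1}  # lower number is served first

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # preserves FIFO order within a class

    def enqueue(self, label, traffic_class):
        entry = (self.PRIORITY[traffic_class], next(self._counter), label)
        heapq.heappush(self._heap, entry)

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


sched = PriorityScheduler()
sched.enqueue("data block 1", "can_wait")
sched.enqueue("data block 2", "can_wait")
sched.enqueue("control message", "important")  # arrives last

order = [sched.dequeue() for _ in range(len(sched))]
print(order)
```

The control message arrives last but is sent first, while the data blocks keep their original order among themselves.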