So, Jung-ki Distributed Computing System LAB School of Computer Science and Engineering Seoul National University Implementation of Package Management in a Cluster Environment

Post on 12-Jan-2016

So, Jung-ki

Distributed Computing System LAB

School of Computer Science and Engineering

Seoul National University

Implementation of Package Management in a Cluster Environment

So, Jung-ki (SNU DCS Lab)

Introduction Related Work Design Evaluation Conclusion

Introduction (1/2)

Supercomputer: high-performance processors / high network bandwidth

Supercomputers are expensive, but a Beowulf system is cost-effective

Motivation: focus on cluster systems

Cluster management system: manual method / add-on method / integrated method

Registry: central repository of information about all aspects of the computer

Introduction (2/2)

Challenge: the integrated method has low availability and reliability

Computation nodes can't be managed separately

When a failure occurs, the system can't be rejuvenated

Goal (using Registry): improve the availability and reliability of the integrated method

The administrator can manage a cluster system easily

Restore the cluster system from a backup snapshot

Supercomputer

Number of systems by architecture (chart data, y-axis 0 to 500)

Architecture       1993  1995  1997  1999  2001  2003  2005
Constellation         0     0    12    25   118   140    79
Cluster               0     0     1     6    32   149   304
MPP                 119   219   270   247   319   211   117
SMP                 249   241   215   222    31     0     0
SIMD                 35    11     2     0     0     0     0
Single Processor     97    29     0     0     0     0     0

Chart label: 60.8% (Cluster in 2005, 304 of 500)

Domestic Supercomputer

Quantity: 14

Cluster: 4

MPP: 4

Constellation: 6

※ SNU: 2 (51/413)

Cluster Management System

Manual approach: the system administrator brings up the entire system manually

Add-on method: bring up a frontend node, then add cluster packages

OSCAR / Warewulf / OpenMosix

Integrated method: cluster packages are installed and configured during the initial installation

Rocks / Scyld

Cluster Management System

Software Stack (diagram)

Linux Kernel

Linux Environment

HPC Device Drivers

Job Scheduling and Launching

Cluster software management

Cluster State management / Monitoring

Message passing / communication Layer

Parallel code / Grid / computer lab …

Diagram labels: OS (Linux), SGE, Application, HPC

Rocks Overview

Identity: a system to build and manage a Linux cluster

Free: open source project

Goal: make clusters easy

Philosophy: computation nodes are 100% automatically installed

Roll: set of packages

Graph / Kickstart

Runs on heterogeneous system architectures

Doesn't attempt to incrementally update software

Rocks system

Architecture (diagram): a front-end node with two interfaces (eth0 and eth1), four compute nodes each attached to the local network through eth0, and a link to the internet

What is Registry?

Central repository of information about all aspects of the computer

Hardware, OS, application, and user information

Function

Retrieve system information

Update / add / delete software

Back up & restore the system

Advantage

Makes it easier for applications to access the system

Stores large amounts of structured data (system info)

Registry Design

Relations in the schema diagram (table: fields)

Nodes: ID (primary key), Name, Membership, CPUs, Rack, Rank, Comment

Network: ID (primary key), Node, MAC, IP, Gateway, Name, Device, Module

Package: ID (primary key), Node, Name, Version, Release, Install

Aliases: ID (primary key), Node, Name

Memberships: ID (primary key), Name, Appliance, Distribution

Appliances: ID (primary key), Name, Graph, Node

Distribution: ID (primary key), Name, Release, Lang

Diagram legend: Original Relational Schema / Appended Relation; H/W information / S/W information
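As a rough illustration of how package information could sit next to the node table, the sketch below creates a nodes table and a package table in an in-memory SQLite database and lists the packages recorded for one node. Only the field names come from the slide. The lowercase table names, the column types, the use of SQLite instead of the MySQL database the slides mention later, the helper names `build_registry` and `packages_on`, and the reading of the package table's Node field as a reference to the node ID are all assumptions made for this example.

```python
import sqlite3

# Field names follow the slide; every column type is an assumption.
SCHEMA = """
CREATE TABLE nodes (
    id INTEGER PRIMARY KEY,
    name TEXT,
    membership INTEGER,
    cpus INTEGER,
    rack INTEGER,
    "rank" INTEGER,
    comment TEXT
);
CREATE TABLE package (
    id INTEGER PRIMARY KEY,
    node INTEGER,        -- assumed to hold nodes.id
    name TEXT,
    version TEXT,
    "release" TEXT,
    install TEXT         -- meaning not given on the slide; kept as text
);
"""


def build_registry() -> sqlite3.Connection:
    """Create an empty in-memory registry (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def packages_on(conn: sqlite3.Connection, node_name: str) -> list:
    """Return (name, version, release) rows recorded for one node."""
    rows = conn.execute(
        'SELECT p.name, p.version, p."release" '
        "FROM package p JOIN nodes n ON p.node = n.id "
        "WHERE n.name = ? ORDER BY p.name",
        (node_name,),
    )
    return rows.fetchall()


conn = build_registry()
conn.execute("INSERT INTO nodes (id, name) VALUES (?, ?)", (1, "compute-0-0"))
# amanda-2.4.5-2.i386 split by the rpm convention name-version-release.arch
conn.execute(
    'INSERT INTO package (node, name, version, "release") VALUES (?, ?, ?, ?)',
    (1, "amanda", "2.4.5", "2"),
)
print(packages_on(conn, "compute-0-0"))  # [('amanda', '2.4.5', '2')]
```

Keeping one package row per node and package is what lets the frontend answer "what is installed where" without contacting the compute nodes.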

Strategy of management

Rocks setup: minimum modification

Take advantage of the original Rocks system

Deploy the cluster system easily

Modify related source code: insert-ethers, kickstart.cgi, Kpp, Kgen, Rgen

Running system: apply package modifications

Package management program: add / update / delete packages

DB consistency management program

Collection Method: Rgen

Registry variables

Package variables

Appended component

Modification Method

Sequence between the frontend node and a compute node

1: check command & registry

2: transmit rpm file & command

3: perform command

4: return result (command execution)

5: update registry or handle error

Package modification (flow diagram): insert command → check cmd → retrieve registry → package modification → modify registry

[ registry-on and update cmd ] Update package

[ registry-on and delete cmd ] Delete package

[ registry-off and add cmd ] Add package

[ else ]

Packages table: package name / version / release

Instruction: add / update / delete

add -c=compute-0-0 -i=amanda-2.4.5-2.i386

add -c=all -i=all

del -c=compute-0-0 -i=amanda-2.4.5-2.i386

del -c=all -i=all

Packages table: add / delete / update

Compute Nodes
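The command check and the five-step exchange on this slide can be sketched roughly as follows. This is an illustration under stated assumptions, not the program described in the slides: the names `Command`, `parse_command`, `decide` and `handle` are hypothetical, the registry is reduced to a set of (node, package) pairs, `update` is an assumed keyword (the slide only shows `add` and `del`), the `[ else ]` branch is assumed to reject the command, and the `all` wildcard is not handled.

```python
from dataclasses import dataclass
from typing import Callable, Set, Tuple


@dataclass(frozen=True)
class Command:
    action: str   # "add", "del", or the assumed keyword "update"
    node: str     # e.g. "compute-0-0"
    package: str  # e.g. "amanda-2.4.5-2.i386"


def parse_command(line: str) -> Command:
    """Parse a line such as 'add -c=compute-0-0 -i=amanda-2.4.5-2.i386'."""
    action, *options = line.split()
    values = dict(opt.split("=", 1) for opt in options)
    return Command(action, values["-c"], values["-i"])


def decide(cmd: Command, registry: Set[Tuple[str, str]]) -> str:
    """Apply the guards from the flow diagram."""
    registered = (cmd.node, cmd.package) in registry
    if registered and cmd.action == "update":
        return "update"
    if registered and cmd.action == "del":
        return "delete"
    if not registered and cmd.action == "add":
        return "add"
    return "reject"  # assumed meaning of the [ else ] branch


def handle(line: str, registry: Set[Tuple[str, str]],
           run_on_node: Callable[[Command], bool]) -> str:
    cmd = parse_command(line)
    step = decide(cmd, registry)        # 1: check command & registry
    if step == "reject":
        return "rejected"
    succeeded = run_on_node(cmd)        # 2-4: transmit, perform, return result
    if not succeeded:
        return "error"                  # 5: handle error, registry untouched
    key = (cmd.node, cmd.package)       # 5: update registry
    if step == "delete":
        registry.discard(key)
    else:
        registry.add(key)
    return step


registry: Set[Tuple[str, str]] = set()
always_ok = lambda cmd: True  # stand-in for the real compute node
print(handle("add -c=compute-0-0 -i=amanda-2.4.5-2.i386", registry, always_ok))
```

Changing the registry only after the compute node reports success keeps a failed rpm operation from leaving a record of a package that was never installed.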

Registry consistency

Setup time

When the frontend node removes / updates a computation node

Dependency: change node table → change package table

Modify kickstart.cgi / Kgen

Apply cascading table changes

※ MySQL does not support the transaction property

Running system

Package install / delete / update

Compute node rpm information = frontend node's registry DB
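Both consistency rules can be illustrated with a small sketch. It assumes the cascade is done by the application, since the slide notes that the database offers no transactions, and it uses plain dictionaries and sets in place of the registry tables. The function names `remove_node` and `find_drift` and the data layout are hypothetical.

```python
from typing import Dict, Set, Tuple


def remove_node(nodes: Dict[int, str],
                packages: Dict[int, Tuple[int, str]],
                node_id: int) -> None:
    """Application-level cascade: drop the node row, then its package rows.

    packages maps a package row ID to (node ID, package name).
    """
    nodes.pop(node_id, None)
    orphaned = [pid for pid, (owner, _) in packages.items() if owner == node_id]
    for pid in orphaned:
        del packages[pid]


def find_drift(registry_rpms: Set[str],
               node_rpms: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Compare the registry with what a compute node reports.

    Returns (registered but not installed, installed but not registered);
    both sets are empty when the two views agree.
    """
    return registry_rpms - node_rpms, node_rpms - registry_rpms


nodes = {1: "compute-0-0", 2: "compute-0-1"}
packages = {10: (1, "amanda-2.4.5-2.i386"), 11: (2, "amanda-2.4.5-2.i386")}
remove_node(nodes, packages, 1)
print(packages)  # {11: (2, 'amanda-2.4.5-2.i386')}
print(find_drift({"amanda-2.4.5-2.i386"}, set()))
```

A non-empty result from `find_drift` marks a node whose installed rpm set has diverged from the frontend's registry.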

Experiment Setup

Public Ethernet

Frontend node: Rocks.snu.ac.kr, CPU 800 MHz, RAM 768 MB, HDD 40 GB

Compute nodes (14): Compute-0-(1~14), CPU 850 MHz, RAM 1 GB, HDD 10 GB

Experiment Data

name          volume   capacity
amanda             3   468 KB
HPC               53   117 MB
Rocks roll       479   1.5 GB

Original Rocks Evaluation

Transmit and service time per compute node (chart data, y-axis 0 to 1400 sec)

compute node   transmit (sec)   service (sec)
 1                  676             1104
 2                  703             1138
 3                  703             1140
 4                  664             1088
 5                  711             1148
 6                  668             1102
 7                  684             1120
 8                  690             1127
 9                  708             1144
10                  708             1135
11                  669             1096
12                  671              976
13                  689              993
14                  689             1004

average service time: 18 min 14 sec

average transmit time: 11 min 28 sec

Chart annotations: Network card, DHCP request
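The two averages can be checked against the per-node values with a few lines of arithmetic. The sketch assumes the first run of 14 chart labels is the transmit series and the second is the service series; the helper name `mean_as_min_sec` is hypothetical.

```python
# Per-node values as read from the chart labels, compute nodes 1 to 14.
transmit = [676, 703, 703, 664, 711, 668, 684, 690, 708, 708, 669, 671, 689, 689]
service = [1104, 1138, 1140, 1088, 1148, 1102, 1120, 1127, 1144, 1135, 1096, 976, 993, 1004]


def mean_as_min_sec(values):
    """Return the mean of values, rounded to whole seconds, as (minutes, seconds)."""
    seconds = round(sum(values) / len(values))
    return divmod(seconds, 60)


print(mean_as_min_sec(transmit))  # (11, 28)
print(mean_as_min_sec(service))   # (18, 14)
```

Under that assignment the means come to about 688 s and 1094 s, which matches the stated 11 min 28 sec and 18 min 14 sec.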

Amanda Packages Evaluation

Install and delete time per compute node (chart data, y-axis 0 to 8000 millisec)

compute node   install amanda packages (millisec)   delete amanda packages (millisec)
 1                  6413                                 5023
 2                  6450                                 5369
 3                  7931                                 6735
 4                  6393                                 5659
 5                  6636                                 5831
 6                  6598                                 5727
 7                  7735                                 5633
 8                  6608                                 5127
 9                  6197                                 5589
10                  5905                                 5342
11                  6194                                 5600
12                  6205                                 5244
13                  7283                                 5864
14                  6228                                 5282

average install time: 6.62 sec

average delete time: 5.57 sec

HPC Roll Evaluation

Install and delete time per compute node (chart data, y-axis 0 to 250 sec)

compute node   install hpc packages (sec)   delete hpc packages (sec)
 1                  212                          75
 2                  205                          84
 3                  233                          74
 4                  188                          78
 5                  205                          81
 6                  201                          83
 7                  206                          82
 8                  203                          78
 9                  211                          80
10                  196                          75
11                  195                          77
12                  197                          76
13                  206                          80
14                  195                          75

average install time: 3 min 38 sec

average delete time: 1 min 18 sec

Conclusion

The cluster system takes advantage of the Registry

Improve availability and reliability using Registry

Administrator can manage cluster systems easily

Restore cluster systems with backup snapshots

Q & A

Questions or comments?

Thank you!