The SC Grid Computing Initiative
Kirk W. Cameron, Ph.D.Assistant Professor
Department of Computer Science and Engineering
University of South Carolina
The Power Grid
Reliable
Universally accessible
Standardized
Low-cost
Billions of different devices
Resource → Transmission → Distribution → Consumption on Demand
The Computing Grid
Reliable
Universally accessible
Standardized
Low-cost
Billions of different devices
Resource → Transmission → Distribution → Consumption on Demand
What is a Grid?
• H/W + S/W infrastructure to provide access to computing resources †
– Dependable (guaranteed performance)
– Consistent (standard interfaces)
– Pervasive (available everywhere)
– Inexpensive (on-demand to decrease overhead)
†The Grid: Blueprint for a Future Computing Infrastructure, I. Foster and C. Kesselman (Eds), Morgan-Kaufmann Publishers, 1998.
Examples
• Problem: Need to project a company's computing needs
– Solution: Computing on Demand
– Example: Any company with a medium-to-large number of employees
• Problem: Computational needs exceed local abilities (cost)
– Solution: Supercomputing on Demand
– Example: Aerodynamic simulation of vehicles
• Problem: Data sets too large to be held locally
– Solution: Collaborative computing with regional centers
– Example: DNA sequencing analysis --> derive a single sequence locally, compare to a large non-local database
• Private Sector Interest
– Grid engines (S/W), hardware support, outsourcing
– IBM (PC Outsourcing), Sun (Grid One Engine)
Large-scale Example of a Grid
NSF TeraGrid
†The TeraGrid project is funded by the National Science Foundation and includes five partners: NCSA, SDSC, Argonne, CACR and PSC
Small-scale Example
[Diagram: CPUs joined by a local network, in three configurations: a Network of Workstations, a Cluster of SMPs, and a 128-node DoD Farm]
[Diagram: a 32-node Beowulf on a local network; each node's send path runs from the application buffer in sender user space through the memory hierarchy to the network buffer and CPU]
[Diagram: many CPUs grouped into local networks, joined through an interconnect cloud]
SC Grid Initiative
Electric Power Supply
• Circa 1910
– Electric power can be generated
– Devices are being designed to use electricity
– Users lack the ability to build and operate their own generators
• Electric Power Grid
– Reliable, universally-accessible, standardized, low-cost transmission and distribution technologies
– Result: new devices and new industries to manufacture them
• Circa 2002
– Billions of devices running on reliable, low-cost power
My background
• I’m a “systems” person
– Intel Corporation
• SMP memory simulation
– Los Alamos National Laboratory
• Performance Analysis Team (Scientific Computing)
• DOE ASCI Project (TERA- and PETA-scale scientific apps)
– 2nd year at USC
• Courses: Comp Arch, Parallel Comp Arch, Perform Analysis
• Research: Parallel and Distributed Computing, Computer Architecture, Performance Analysis and Prediction, Scientific Computing
• Interests: Identifying and improving performance of scientific applications through changes to algorithm and systems design (hardware, compilers, middleware (OS+))
Computational Power Supply
• Analogous to Power “Grid”
– Heterogeneity (generators/outlets vs. machines/networks)
– Consumer requirements
• Power consumption vs. computational requirements
• Service guarantees vs. QoS
• Money to be made vs. money to be made
– Economies of scale (power on demand?)
– Political influence at large scale
• Local control necessary, with interfaces to outside (standards)
Why now?
• Technological improvements (CPU, networks, memory capacity)
• Need for demand-driven access to computational power (e.g. MRI)
• Utilization of idle capacity (cycle stealing)
• Sharing of collaborative results (virtual laboratories over WANs)
• Utilize new techniques and tools
– Network-enabled solvers (Dongarra’s NetSolve at UT-Knoxville)
– Teleimmersion (collaborative use of Virtual Reality: Argonne, Berkeley, MIT)
Applications of Grid Computing
• Distributed Supercomputing
– Maximize available resources for large problems
– Large-scale scientific computing
– Challenges
• Scalability of service
• Latency tolerance
• High performance on heterogeneous systems
• On-demand Computing
– Access to non-local resources: computation, s/w, data, sensors
– Driven by cost/performance over absolute performance
– Example: MUSC MRI data analysis
Applications of Grid Computing
• High-Throughput Computing
– Scheduling large numbers of independent tasks
– Condor (Wisconsin)
– http://setiathome.ssl.berkeley.edu/
• Data Intensive Computing
– Data analysis applications
– Grid Physics Network (http://www.griphyn.org/)
• Collaborative Computing
– Virtual shared-space laboratories
– Example: Boilermaker (Argonne)
• Collaborative, interactive design of injection pollution control systems for boilers (http://www.mcs.anl.gov/metaneos)
Other Grid-related Work
• General Scientific Community (http://www.gridforum.com)
– NSF Middleware Initiative
– Globus Project
– Condor Project
– Cactus Project
– See Grid Forum for a long list…
SC Grid Initiative
• Immediate increase in local computational abilities
• Ability to observe application performance and look “under the hood”
• Interface infrastructure to link with other computational Grids
• Ability to provide computational power to others on-demand
• “Tinker-time” to establish local expertise in Grid computing
• Incentive to collaborate outside the university and obtain external funding
SC Grid Milestones
• Increase computational abilities of SC
– Establish preliminary working Grid on SC campus
– Benchmark preliminary system configurations
– Use established testbed for
• Scientific computing
• Multi-mode computing
• Middleware development
– Establish dedicated Grid resources
• $45K equipment grant from USC VP Research
• Extend SC Grid boundaries
– Beyond department resources
– Beyond campus resources (MUSC)
– Beyond state resources (IIT)
• Incorporate other technologies
– Multi-mode applications (e.g. Odo)
[Milestone status: three items done, two in progress — Grid-enabled as of 1 Nov 02!]
Outline
• Gentle intro to Grid
• SC Grid Computing Initiative
• Preliminary Results
• Synergistic Activities
USC CSCE Department Mini Grid
[Diagram: the department mini-Grid — a Beowulf cluster (master node “Daniel” plus slave nodes N01–N32) and NOWs of SUN Ultra 10 (#1–#17), SUN Blade 100 (#1–#21), and SUN Blade 150 (#1–#2) workstations, linked by Ethernet switches]
Node Configurations
Resources                 | # | HW                                          | OS               | Networking
Beowulf (for research)    | 1 | 1 master + 32 slave nodes, PIII 933MHz, 1GB memory | RedHat Linux 7.1 | 10/100M NIC
SUN Ultra 10 (for teaching)  | 17 | UltraSPARC-IIi 440MHz, 256MB memory      | Solaris 2.9      | 10/100M NIC
SUN Blade 100 (for teaching) | 21 | UltraSPARC-IIe 502MHz, 256MB memory      | Solaris 2.9      | 10/100M NIC
SUN Blade 150 (for teaching) | 2  | UltraSPARC-IIe 650MHz, 256MB memory      | Solaris 2.9      | 10/100M NIC
Benchmark and Testing Environment
• NPB (NAS Parallel Benchmarks 2.0)
– Specifies a set of programs as benchmarks
– Each benchmark has 3 problem sizes
• Class A: for a moderately powerful workstation
• Class B: for high-end workstations or small parallel systems
• Class C: for high-end supercomputing
• We tested the performance of
– EP: the kernel is "embarrassingly" parallel in that no communication is required for the generation of the random numbers itself
– SP: the kernel solves three sets of uncoupled systems of equations, first in the x, then in the y, and finally in the z direction; these systems are scalar penta-diagonal
• Run settings
– When #nodes <= 16, EP is run on NOWs
– When #nodes > 16, EP is run on NOWs and Beowulf
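The structure that makes EP "embarrassingly" parallel — fully independent tasks with a single reduction at the end — can be sketched as follows. This is a simplified Monte Carlo stand-in, not NPB's actual Gaussian-pair algorithm; `ep_task` and `run_ep` are hypothetical names used only for illustration.

```python
import random

def ep_task(seed, n):
    """One EP-style task: draw n points in the unit square and count those
    falling inside the unit circle. Tasks share nothing but their seeds."""
    rng = random.Random(seed)          # independent random stream per task
    hits = 0
    for _ in range(n):
        x = 2.0 * rng.random() - 1.0
        y = 2.0 * rng.random() - 1.0
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def run_ep(workers, n_per_worker):
    """On a real grid each task would run on its own node (e.g. via MPI);
    here the tasks are looped sequentially to keep the sketch self-contained.
    The only 'communication' is the final sum, mirroring EP's single
    end-of-run reduction."""
    counts = [ep_task(seed, n_per_worker) for seed in range(workers)]
    return sum(counts)
```

Because each task is seeded independently, adding nodes adds work capacity with no inter-node traffic, which is why EP scales almost perfectly until the hardware itself becomes the bottleneck.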
Execution time for EP (Class C)
[Chart: execution time (s) vs. number of processors (1, 2, 4, 8, 16, 32, 64). Performance doubles with the number of nodes, except at one point on the curve.]
MFLOPS for EP (Class C)
[Chart: MFLOPS vs. number of processors (1, 2, 4, 8, 16, 32, 64). MFLOPS overall shows the same trend: performance doubles with the number of nodes, again with one exception.]
Node Performance for EP (Class C)
[Chart: MFLOPS per node vs. number of processors (1, 2, 4, 8, 16, 32, 64).]
MFLOPS/node illustrates the reason for less-than-optimal application scalability: at this point we incorporate older Sun machines (Sun Blade, Sun Ultra).
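The per-node dip described above is the usual speedup/efficiency story: when slower nodes join, total MFLOPS still grows but efficiency per node drops. A minimal sketch of the bookkeeping, with illustrative timings shaped like the EP curve (near-perfect halving until the older Sun machines join; these are not the measured numbers):

```python
def scaling_report(times):
    """times maps processor count -> execution time in seconds.
    Speedup S(p) = T(1)/T(p); parallel efficiency E(p) = S(p)/p."""
    t1 = times[1]
    return {p: {"speedup": t1 / t, "efficiency": t1 / (t * p)}
            for p, t in sorted(times.items())}

# Hypothetical timings: perfect halving up to 16 processors, then a
# smaller gain at 32 when slower nodes are incorporated.
times = {1: 3200.0, 2: 1600.0, 4: 800.0, 8: 400.0, 16: 200.0, 32: 130.0}
for p, r in scaling_report(times).items():
    print(f"{p:3d} procs: speedup {r['speedup']:6.2f}, "
          f"efficiency {r['efficiency']:.2f}")
```

Plotting efficiency rather than raw time makes the heterogeneity penalty visible at a glance, which is exactly what the MFLOPS/node chart does.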
Execution Time for SP (Class B)
[Chart: execution time (s) vs. number of nodes (1, 4, 9, 16, 25, 36).]
More realistic problems will have performance bottlenecks: they need analysis to run efficiently.