
Page 1:

Grid Computing 7700, Fall 2005

Lecture 4: Scientific Computing and Hardware

Gabrielle Allen, allen@bit.csc.lsu.edu

http://www.cct.lsu.edu/~gallen

Page 2:

Basic Elements

[Diagram: two sites, each with several CPUs, disk, and an internal machine network behind a campus network (LAN), connected to each other by a wide area network]

Page 3:

Basic Elements

Distributed systems are built from:
- Computing elements (processors)
- Communication elements (networks)
- Storage elements (disk, attached or networked)

New elements:
- Visualization/interactive devices
- Experimental and operational devices

Page 4:

Distributed Resources

- Local workstations
- CCT Resources
- Campus/OCS Resources
- State/LONI Resources
- National Centers
- International Colleagues

Page 5:

Laws

Moore's Law
- The number of transistors on an integrated circuit doubles roughly every 18 months
- http://en.wikipedia.org/wiki/Moores_law

"Kryder's Law"
- Hard disk capacity grows faster than transistor counts
- http://www.sciam.com/article.cfm?chanID=sa006&colID=30&articleID=000B0C22-0805-12D8-BDFD83414B7F0000

Gilder's Law
- Total bandwidth of communication systems doubles every six months

Metcalfe's Law
- The value of a network is proportional to the square of the number of nodes

Amdahl's Law
- Law of diminishing returns: maximum speedup is restricted by the slowest (serial) parts
- http://en.wikipedia.org/wiki/Amdahls_law
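To make the Amdahl's Law bullet concrete, here is a minimal sketch (not from the slides) using the standard form of the law, speedup = 1 / ((1 - p) + p/N) for a parallel fraction p running on N processors:

    # Amdahl's Law (standard form, assumed): speedup with parallel fraction p on n processors
    def amdahl_speedup(p, n):
        return 1.0 / ((1.0 - p) + p / n)

    # Even with 95% of the work parallelised, adding processors soon stops helping:
    for n in (16, 256, 1024):
        print(n, round(amdahl_speedup(0.95, n), 1))   # ~9.1, ~18.6, ~19.6 (the limit is 20)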

Question: So what about applications?

Page 6:

Compute Elements

Moore's Law: the number of transistors on a chip (and the clock speed) increases exponentially, doubling every 18 months
- Transistors = 20 * 2^[(year - 1965) / 1.5]
- 1975: Intel 8080 has 4,500 transistors, ~100K instructions/sec
- 2003: Pentium IV has 221,000,000 transistors, ~8 billion instructions/sec

Corollary: the price of a given level of supercomputing power halves every 18 months

This price decrease means that supercomputers are now usually built from "commodity" processors
- IA32, PowerPC, "emotion engine"
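As a quick check of the trend formula quoted above (a sketch only; the constant 20 and the 1.5-year doubling period are taken from the slide, and real chips only track the curve to within a small factor):

    # Transistor count predicted by the slide's rule: 20 * 2^((year - 1965) / 1.5)
    def transistors(year):
        return 20 * 2 ** ((year - 1965) / 1.5)

    for year in (1975, 2003):
        print(year, f"{transistors(year):,.0f}")
    # Gives roughly 2,000 for 1975 and 850 million for 2003, within a small factor of the
    # actual Intel 8080 (4,500 transistors) and Pentium IV (221,000,000 transistors).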

Page 7:

Page 8:

Compute Elements

- Clock speed
- Cache hierarchy
- Floating point registers
- Main memory
- Internal bandwidths
- Etc., etc.

Powerful operating systems, compilers and applications are needed to leverage all of this.

Page 9:

Communication Elements

- Links, routers, switches, name servers, protocols
- Infrastructure evolves slowly (politics, large-scale changes, money)
- Gilder's Law: total bandwidth of communication systems doubles every six months
- Change in LAN to desktops: 100 Mbps shared -> 100 Mbps switched -> 1 Gbps -> 10 Gbps
- Clusters: GigE (TCP/IP and MPICH/LAM) is the standard; Myricom/Quadrics (with their own MPI drivers) give better performance; InfiniBand/Fibre Channel are a different architecture

Page 10:

Network Speeds

- Analog modem: 57 kbps
- GPRS: 114 kbps
- Bluetooth: 723 kbps
- T-1: 1.5 Mbps
- Ethernet 10Base-X: 10 Mbps
- 802.11b (WiFi): 11 Mbps
- T-3: 45 Mbps
- OC-1: 52 Mbps
- Fast Ethernet 100Base-X: 100 Mbps
- OC-12: 622 Mbps
- Gigabit Ethernet 1000Base-X: 1 Gbps
- OC-24: 1.2 Gbps
- OC-48: 2.5 Gbps
- OC-192: 10 Gbps
- 10 Gigabit Ethernet: 10 Gbps
- OC-3072: 160 Gbps

My Cox cable
- Upload: 35 KB/s
- Download: 250 KB/s

CCT to SuperMike
- Up/down: 5000 KB/s
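Note that the link speeds above are given in bits per second, while the last two entries are in kilobytes per second; a quick conversion (an illustrative calculation, not from the slides) makes them comparable:

    # Convert kilobytes/second to megabits/second (1 KB taken as 1000 bytes here)
    def kBps_to_Mbps(kBps):
        return kBps * 8.0 / 1000.0

    print(kBps_to_Mbps(250))    # ~2 Mbps  (cable-modem download)
    print(kBps_to_Mbps(5000))   # ~40 Mbps (CCT to SuperMike)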

Page 11:

Communication Elements

Interconnect type    Short-message latency    Peak bandwidth    Bidirectional bandwidth    Approx. cost per port
Gigabit Ethernet     100 microsec             ~65 MB/s          ~130 MB/s                  $100
Myrinet              9 microsec               280 MB/s          500 MB/s                   $1000
Quadrics             5 microsec               300 MB/s          500 MB/s                   $3000
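A simple model for the time to deliver one message is latency + size / bandwidth. The sketch below (illustrative only, and taking the table's bandwidth figures as megabytes per second) shows why latency dominates for small messages while bandwidth dominates for large ones:

    # Message delivery time = startup latency + message size / bandwidth (simple model)
    interconnects = {                      # (latency in seconds, bandwidth in bytes/second)
        "Gigabit Ethernet": (100e-6, 65e6),
        "Myrinet":          (9e-6,  280e6),
        "Quadrics":         (5e-6,  300e6),
    }

    for size in (1_000, 1_000_000):        # a 1 kB and a 1 MB message
        for name, (latency, bandwidth) in interconnects.items():
            t = latency + size / bandwidth
            print(f"{size:>9} bytes over {name:16s}: {t * 1e6:9.1f} microseconds")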

Page 12:

Storage Elements

- Magnetic tape / magnetic disk
- Magnetic disk
  - Properties: density / rotation / cost
  - 1970-1988: density improvements of 29% per year
  - 1988-now: density improvements of 60% per year
  - Standard in PCs: 500 MB (1995), 2 GB (1997), 100 GB (2002)
  - Performance is not increasing as fast
    - Peak transfer (~100 MB/s)
    - Seek times (3-5 ms) [the bottleneck]
- Grids: the cost of storage is negligible, and high-speed networks make large data libraries attractive
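To see what the two density-growth rates mean in practice (an illustrative calculation, not from the slides), the time to double capacity at an annual growth rate r is log 2 / log(1 + r):

    import math

    # Years needed to double capacity at an annual growth rate r
    def doubling_time(r):
        return math.log(2) / math.log(1 + r)

    print(round(doubling_time(0.29), 1))   # ~2.7 years at 29% per year (1970-1988)
    print(round(doubling_time(0.60), 1))   # ~1.5 years at 60% per year (1988 onward)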

Page 13:

The Future (??)

Machine    Compute       Memory    Disk       Network
2003 PC    8 G-op/s      512 MB    128 GB     1 Gb/s
2003 SC    80 T-op/s     50 TB     1280 PB    10 Tb/s
2008 PC    64 G-op/s     16 GB     2 TB       10 Gb/s
2008 SC    640 T-op/s    160 TB    20 PB      100 Tb/s
2013 PC    512 G-op/s    256 GB    32 TB      100 Gb/s
2013 SC    5 P-op/s      2.6 PB    320 PB     1 Pb/s

1 mega = 10^6, 1 giga = 10^9, 1 tera = 10^12, 1 peta = 10^15

TeraGrid: 40 TFlop/s, 6 TB memory, 1 petabyte storage, 10 Gigabit/s
Earth Simulator: 40 TFlop/s, 10 TB memory, 2.5 petabytes storage, 13 Gigabit/s
DOE BlueGene: 367 TFlop/s, 16 TB memory, 400 terabytes storage

Page 14:

Supercomputers

Definition of supercomputer
- A machine on top500.org?
  - http://www.top500.org/lists/plists.php?Y=2005&M=06
- A machine costing over $1M?
- Basically the highest-end machines

Top 3 (2005)
- DOE BlueGene/L (USA): 66K procs / 137 TF
- IBM BGW (USA): 41K procs / 91 TF
- NASA Columbia (USA): 10K procs / 52 TF

Top 3 (2003)
- Earth Simulator (Japan): 5K procs / 36 TF (6)
- ASCI Q (USA): 8K procs / 14 TF (12)
- G5 Cluster (USA): 2K procs / 12 TF (14)

Others
- 18: IBM (China)
- 147: SuperMike (LSU!!!)

www.webopedia.com

The fastest type of computer. Supercomputers are very expensive and are employed for specialized applications that require immense amounts of mathematical calculations. For example, weather forecasting requires a supercomputer. Other uses of supercomputers include animated graphics, fluid dynamic calculations, nuclear energy research, and petroleum exploration. The chief difference between a supercomputer and a mainframe is that a supercomputer channels all its power into executing a few programs as fast as possible, whereas a mainframe uses its power to execute many programs concurrently.

Page 15:

Architectural Classes

Flynn (1972): classification based on the way the system manipulates instruction and data streams:

SISD: Single Instruction, Single Data
- One instruction stream, executed serially
- Conventional workstations

SIMD: Single Instruction, Multiple Data
- Large number (many thousands) of processing units
- All execute the same instruction on different data in lockstep
- Vector processors (e.g. NEC SX-6i) acting on arrays of data

MISD: Multiple Instruction, Single Data
- No machines of this type have been built

MIMD: Multiple Instruction, Multiple Data
- Differs from a set of independent SISD machines in that the instruction and data streams are related: they are parts of the same task

Page 16:

More Classification

Shared Memory Systems
- Multiple CPUs share the same address space
- One memory, accessed equally by all processors
- The location of data is not important to the user
- Can be SIMD (single-processor vector machines) or MIMD
- OpenMP: http://www.openmp.org/index.cgi?faq

Distributed Memory Systems
- Each CPU has its own memory
- CPUs are connected by a network
- The location of data is important
- Can be SIMD (the lockstep example above) or MIMD (a large variety of network topologies)
- Distributed processing takes DM-MIMD to the extreme

Page 17:

Message Passing

Essential for DM machines, but often also used on SM machines for compatibility
- MPI: Message Passing Interface
- PVM: Parallel Virtual Machine
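A minimal sketch of the message-passing model (using the mpi4py Python bindings, which the slides do not mention; MPI or PVM codes in C/Fortran follow the same explicit send/receive pattern):

    # Two-process message-passing example: run with e.g.  mpiexec -n 2 python demo.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()                     # which process am I?

    if rank == 0:
        data = [x * x for x in range(10)]
        comm.send(data, dest=1, tag=0)         # explicit send to process 1
        print("rank 0 sent", data)
    elif rank == 1:
        data = comm.recv(source=0, tag=0)      # explicit receive from process 0
        print("rank 1 received", data)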

Page 18:

DM-MIMD

- A fast-growing class with the best performance; machine design (and upgrades) must balance computation and communication performance
- The user has to distribute data between processors
- The user has to perform data exchange between processors explicitly
- Accessing data on other processors is slow compared to SM machines
- Programming models/algorithms are important
- Programming environments can make this easier (e.g. the Cactus Framework, http://www.cactuscode.org, handles data distribution, communications, I/O, ...)
- The same programming models need to be extended to Grid computing

Page 19:

ccNUMA

- Cache-Coherent Non-Uniform Memory Access
- Systems are built from SMPs (symmetric multiprocessing nodes)
- SMPs consist of up to ~16 processors connected by a crossbar, sharing the same memory
- Each node is SM-MIMD, but with different memory access times for different processors (the memory is physically distributed)
- The nodes are then connected in a different way
- Computational scientists like these machines

Page 20:

DM-MIMD

Processor topology and interconnects are very important:
- Hypercube (with 2^d nodes the number of steps between any two nodes is at most d, as the sketch below shows; it is possible to simulate other topologies)
- Fat tree (a simple tree structure with more connections at higher levels to ease congestion)
- 2D/3D mesh (many applications map well onto this; it avoids expense)
- Crossbars (connecting up to around 64 processors; can be hierarchical)

Details should be hidden from application programmers, but they need to be aware of them for performance.
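A small illustration of the hypercube property mentioned above (not from the slides): if the nodes are numbered 0 .. 2^d - 1, two nodes are directly connected when their labels differ in exactly one bit, so the number of hops between two nodes is the number of differing bits, which is at most d.

    # Hop count between two hypercube nodes = number of bits in which their labels differ
    def hypercube_hops(a, b):
        return bin(a ^ b).count("1")

    d = 4                                    # 2**4 = 16 nodes, labelled 0..15
    print(hypercube_hops(0b0000, 0b1111))    # opposite corners: 4 hops (the maximum, = d)
    print(hypercube_hops(0b0101, 0b0100))    # neighbours differing in one bit: 1 hop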

Page 21:

Virtual Shared Memory

- Kendall Square Research tried to implement this at the hardware level
- High Performance Fortran
  - HPF specification, 1993
  - Simulates a virtual shared memory at the software level
  - Programming directives distribute data across processors
  - Looks like a shared memory machine to the user
- Some vendors have proprietary virtual shared memory programming models, provided through a global address space

Page 22:

Network Eras

- Past (1969-1988): ARPANET/NSFNET
- Current (1988-2005)
- Future (2005-)
- Historical network maps: http://www.cybergeography.org/atlas/historical.html

Page 23:

Network Infrastructure

Chapter 30 (The Grid 2)
- Network infrastructure is the foundation on which Grids are built
- It is a composition of local and wide area services, transport protocols and services, routing protocols and network services, and link protocols and physical media
- One example of a network infrastructure is the Internet (core protocols TCP/IP)

Page 24:

Protocol

An agreed-upon format for transmitting data between two devices, which determines:
- The type of error checking to be used
- Any data compression method
- How the sending device indicates it has finished sending a message
- How the receiving device indicates it has received a message

Various standard protocols exist; they differ in simplicity, reliability and performance. A computer/device must support the right ones to communicate with other computers. Protocols are implemented either in hardware or in software.

http://www.protocols.com/protocols.htm
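To make the items in that list concrete, here is a toy framing scheme (purely illustrative; it is not any real protocol mentioned on the slides): a length prefix tells the receiver when the message ends, and a CRC32 checksum provides simple error checking.

    import struct, zlib

    # Toy wire format: 4-byte payload length, 4-byte CRC32 checksum, then the payload bytes
    def frame(payload: bytes) -> bytes:
        return struct.pack("!II", len(payload), zlib.crc32(payload)) + payload

    def unframe(data: bytes) -> bytes:
        length, crc = struct.unpack("!II", data[:8])
        payload = data[8:8 + length]
        if zlib.crc32(payload) != crc:
            raise ValueError("transmission error detected")
        return payload

    print(unframe(frame(b"hello grid")))       # b'hello grid'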

Page 25:

Slow to Change

The Internet has not changed much since 1983 (when TCP/IP was deployed), which does make it stable, but we still don't really have the envisaged services:
- Multicast (one-to-many communication)
- Network reservation
- Quality of service

New protocols: peer-to-peer file sharing and instant messaging

New technology coupled to applications drives change: e-mail, web/file-sharing, video streaming

Page 26:

Past: 1969-1988

ARPANET (1969): 56-kbps lines
- An experiment to investigate resource sharing and remote access
- Added an interface message processor (IMP) at each end of the network (today's routers), providing flexibility for the lower levels and for higher-level applications
- Its success came from: freely available documentation and source code; software bundled with new machines; use for teaching; community development rather than proprietary development

NSFNET (1985): 45-Mbps lines
- Connected academic HPC centers


Page 27:

ARPANET: 1971


Page 28:

ARPANET: 1980


Page 29:

NSFNET: 1991


Page 30:

Past: 1969-1988

- Driving applications: e-mail, remote file access, remote job control (these drove the basic protocols)
- Network technology: WAN links were lines leased from telephone companies. Xerox Palo Alto Research Center (PARC) created Ethernet (3 Mbps) (alternatives: token ring (IBM), ...). Workstations appeared bundled with network protocols. PCs joined the network as interface costs dropped and processors became more powerful.

Page 31:

Past: 1969-1988

Protocols and Services
- telnet, file transfer protocol, e-mail
- Underlying transport protocol: TCP (a stream of bytes which can be opened or closed, and over which data can be sent or received)
- Machine location: Domain Name System (DNS), which replaced a centrally maintained file of host names
  - Hierarchical, distributed, redundant
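The "stream of bytes" view of TCP mentioned above can be seen in a few lines of Python (a minimal sketch using the standard socket module; example.org port 80 is just a placeholder endpoint): open a connection, send bytes, receive bytes, close.

    import socket

    # Open a TCP byte stream, write bytes into it, read bytes back, then close it
    with socket.create_connection(("example.org", 80)) as sock:
        sock.sendall(b"GET / HTTP/1.0\r\nHost: example.org\r\n\r\n")   # send
        reply = sock.recv(4096)                                        # receive
    print(reply[:60])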

Page 32:

Past: 1969-1988

System Integration
- ARPANET: assumed a central network operations center
- NSFNET: introduced a hierarchical system, with a top-level backbone network connecting to regional networks, which in turn connect to campuses
- The packet-switching strategy was important (using computing power to optimize communication)
- A single communication model was important because it allowed so many people to be connected, driving future development

Page 33:

Present: 1988-2005

- The Internet today: a complex structure of backbone networks and regional networks
- Increased role of the private sector (e.g. AT&T, BellSouth), which now basically controls our network

Page 34:

LSU Campus

Page 35:


LANet

Louisiana statewide network (Office of Telecommunications Management): serves state agencies and higher education; a 6 Mbps connection costs $2450 a month

http://www.state.la.us/otm/lanet/

Page 36:

Quest


Page 37:

Bell South


Baton Rouge: 4 DS3 to New Orleans, 1 DS3 to Houston

Page 38:

Abilene (Internet2)


http://abilene.internet2.edu/maps-lists/
Traffic: http://loadrunner.uits.iu.edu/weathermaps/abilene/

Page 39:

National Lambda Rail


http://www.nationallambdarail.org/architecture.html

Page 40:

National Lambda Rail

Page 41:

Global Terabit Research Network


Page 42:

Required Reading

Overview of Recent Supercomputers
- http://www.euroben.nl/reports/overview05a.pdf
- Concentrate on pages 1 to 32; you do not need to learn this, just get an appreciation of the concepts.