
CS 152 Computer Architecture

and Engineering

Lecture 22: Final Lecture

Krste Asanovic
Electrical Engineering and Computer Sciences

University of California, Berkeley

http://www.eecs.berkeley.edu/~krste

http://inst.cs.berkeley.edu/~cs152

5/6/2008 CS152-Spring'08

Today’s Lecture

• Review entire semester
– What you learned

• Follow-on classes

• What’s next in computer architecture?


The New CS152 Executive Summary (what was promised in lecture 1)

The processor your predecessors built in CS152

What you'll understand and experiment with in the new CS152

Plus, the technology behind chip-scale multiprocessors (CMPs)


From Babbage to IBM 650


IBM 360: Initial Implementations

                Model 30          . . .  Model 70
Storage         8K - 64 KB               256K - 512 KB
Datapath        8-bit                    64-bit
Circuit Delay   30 nsec/level            5 nsec/level
Local Store     Main Store               Transistor Registers
Control Store   Read only 1µsec          Conventional circuits

The IBM 360 instruction set architecture (ISA) completely hid the underlying technological differences between various models.

Milestone: the first true ISA designed as a portable hardware-software interface!

With minor modifications it still survives today!


Microcoded Microarchitecture

[Diagram: a µcontroller (ROM) drives a datapath connected to memory (RAM) via addr/data buses, taking zero?/busy? status and the opcode as inputs and producing control signals such as enMem and MemWrt.]

The ROM holds fixed microcode instructions; the RAM holds the user program, written in macrocode instructions (e.g., MIPS, x86, etc.).


Implementing Complex Instructions

[Datapath diagram: a 32-bit bus connects 32 GPRs + PC (reg 32) and Link (reg 31), a 32-bit ALU with input registers A and B, an immediate extender, the IR, and memory address/data registers; control signals include ExtSel, RegWrt, enReg, enMem, RegSel, OpSel, ldA, ldB, ldMA, enALU, ldIR, enImm, MemWrt, and the ALU control, with busy?/zero?/opcode status feeding back to the controller.]

rd ← M[(rs)] op (rt)           Reg-Memory-src ALU op
M[(rd)] ← (rs) op (rt)         Reg-Memory-dst ALU op
M[(rd)] ← M[(rs)] op M[(rt)]   Mem-Mem ALU op


From CISC to RISC

• Use fast RAM to build fast instruction cache of user-visible instructions, not fixed hardware microroutines
– Can change contents of fast instruction memory to fit what application needs right now

• Use simple ISA to enable hardwired pipelined implementation
– Most compiled code only used a few of the available CISC instructions
– Simpler encoding allowed pipelined implementations

• Further benefit with integration
– In early '80s, could fit 32-bit datapath + small caches on a single chip
– No chip crossings in the common case allows faster operation


Nanocoding

• MC68000 had 17-bit µcode containing either a 10-bit µjump or a 9-bit nanoinstruction pointer
– Nanoinstructions were 68 bits wide, decoded to give 196 control signals

[Diagram: the µPC (state) indexes the µcode ROM, which holds next-state µaddresses and nanoaddresses into a nanoinstruction ROM; the user PC feeds an instruction cache with hardwired decode.]

Nanocoding exploits recurring control-signal patterns in µcode, e.g.:

ALU0:  A ← Reg[rs]
...
ALUi0: A ← Reg[rs]


“Iron Law” of Processor Performance

Time/Program = Instructions/Program × Cycles/Instruction × Time/Cycle

– Instructions per program depends on source code, compiler technology, and ISA

– Cycles per instruction (CPI) depends upon the ISA and the microarchitecture

– Time per cycle depends upon the microarchitecture and the base technology

Microarchitecture          CPI   Cycle time
Microcoded                 >1    short
Single-cycle unpipelined   1     long
Pipelined                  1     short
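The Iron Law can be made concrete with a quick calculation. The numbers below are purely illustrative (not from the lecture), comparing the three microarchitecture styles in the table above on the same program:

```python
# Iron Law: Time/Program = (Instructions/Program) * (Cycles/Instruction) * (Time/Cycle)
# Hypothetical CPI and cycle-time figures for the three styles above.

def exec_time(instructions, cpi, cycle_time_ns):
    """Total execution time in seconds, per the Iron Law."""
    return instructions * cpi * cycle_time_ns * 1e-9

insts = 1_000_000  # same program/ISA for all three designs

microcoded   = exec_time(insts, cpi=5.0, cycle_time_ns=2.0)  # >1 CPI, short cycle
single_cycle = exec_time(insts, cpi=1.0, cycle_time_ns=8.0)  # 1 CPI, long cycle
pipelined    = exec_time(insts, cpi=1.0, cycle_time_ns=2.0)  # 1 CPI, short cycle

print(microcoded, single_cycle, pipelined)  # pipelining wins on both factors
```

With these made-up numbers the pipelined design is 4-5x faster, which is the whole argument for pipelining: CPI of 1 at a microcoded-style cycle time.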


5-Stage Pipelined Execution

time           t0  t1  t2  t3  t4  t5  t6  t7  . . . .
instruction1   IF1 ID1 EX1 MA1 WB1
instruction2       IF2 ID2 EX2 MA2 WB2
instruction3           IF3 ID3 EX3 MA3 WB3
instruction4               IF4 ID4 EX4 MA4 WB4
instruction5                   IF5 ID5 EX5 MA5 WB5

[Datapath diagram: I-Fetch (IF) with PC, +0x4 adder, and instruction memory; Decode/Reg. Fetch (ID) with IR and GPRs (rs1/rs2 read ports, ws/wd write port); Execute (EX) with ALU and immediate extender; Memory (MA) with data memory (addr, wdata, rdata, we); Write-Back (WB).]


Pipeline Hazards

• Pipelining instructions is complicated by HAZARDS:
– Structural hazards (two instructions want same hardware resource)
– Data hazards (earlier instruction produces value needed by later instruction)
– Control hazards (instruction changes control flow, e.g., branches or exceptions)

• Techniques to handle hazards:
– Interlock (hold newer instruction until older instructions drain out of pipeline)
– Bypass (transfer value from older instruction to newer instruction as soon as it is available somewhere in machine)
– Speculate (guess effect of earlier instruction)

• Speculation needs a predictor, a prediction check, and a recovery mechanism


Exception Handling 5-Stage Pipeline

[Diagram: the 5-stage pipeline (PC, Inst. Mem, Decode, Execute, Data Mem, Writeback) with exception sources at each stage: PC address exception at fetch, illegal opcode at decode, overflow at execute, data address exceptions at memory, plus asynchronous interrupts. Exception/PC pairs (ExcD/PCD, ExcE/PCE, ExcM/PCM) pipeline forward to the commit point, which writes Cause and EPC, selects the handler PC, and kills the F, D, and E stages and the writeback.]


Processor-DRAM Gap (latency)

[Chart: log-scale performance vs. time, 1980-2000: µProc performance ("Moore's Law") grows ~60%/year while DRAM improves ~7%/year, so the processor-memory performance gap grows ~50%/year.]

A four-issue 2GHz superscalar accessing 100ns DRAM could execute 800 instructions during the time for one memory access!


Common Predictable Patterns

Two predictable properties of memory references:

– Temporal Locality: if a location is referenced, it is likely to be referenced again in the near future.

– Spatial Locality: if a location is referenced, it is likely that locations near it will be referenced in the near future.
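As an illustration (not from the slides): traversing a 2-D array in row-major order visits consecutive addresses and so exploits spatial locality, while column-major order on the same array strides by a full row each access:

```python
# Spatial locality illustration: row-major traversal touches consecutive
# elements in memory; column-major traversal strides by N elements per access.
N = 4
matrix = [[r * N + c for c in range(N)] for r in range(N)]

row_major = [matrix[r][c] for r in range(N) for c in range(N)]  # consecutive
col_major = [matrix[r][c] for c in range(N) for r in range(N)]  # strided

print(row_major[:5])  # [0, 1, 2, 3, 4]  -- neighbors in memory
print(col_major[:5])  # [0, 4, 8, 12, 1] -- jumps of N elements
```

With a cache whose blocks hold several adjacent elements, the row-major loop hits in cache on most accesses; the column-major loop misses far more often for large N.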

Memory Reference Patterns

Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3): 168-192 (1971)

[Figure: memory address (one dot per access) vs. time; horizontal bands of dots show temporal locality, diagonal runs show spatial locality.]


Causes for Cache Misses

• Compulsory: first reference to a block (a.k.a. cold-start misses)
– misses that would occur even with an infinite cache

• Capacity: cache is too small to hold all data needed by the program
– misses that would occur even under a perfect replacement policy

• Conflict: misses that occur because of collisions due to the block-placement strategy
– misses that would not occur with full associativity
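A minimal sketch (illustrative, not from the lecture) of a direct-mapped cache showing a conflict miss: two blocks that map to the same set evict each other even though the cache has spare capacity elsewhere:

```python
# Direct-mapped cache sketch: 4 sets, 1 block per set, 16-byte blocks.
# Addresses whose block numbers differ by a multiple of NUM_SETS collide.
NUM_SETS = 4
BLOCK_BYTES = 16

cache = {}   # set index -> tag currently resident
misses = 0

def access(addr):
    global misses
    block = addr // BLOCK_BYTES
    index, tag = block % NUM_SETS, block // NUM_SETS
    if cache.get(index) != tag:
        misses += 1          # compulsory, capacity, or conflict miss
        cache[index] = tag   # fill, evicting any previous resident

for addr in [0, 64, 0, 64]:  # blocks 0 and 4 both map to set 0
    access(addr)

print(misses)  # 4: the last two are conflict misses
```

Under full associativity the same trace would miss only twice (the two compulsory misses), which is exactly the definition of a conflict miss above.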


A Typical Memory Hierarchy c.2006

[Diagram: CPU with multiported register file (part of CPU) → split L1 instruction and data primary caches (on-chip SRAM) → large unified L2 cache (on-chip SRAM) → multiple interleaved memory banks (DRAM).]


Modern Virtual Memory Systems: Illusion of a large, private, uniform store

Protection & Privacy: several users, each with their private address space and one or more shared address spaces
– page table defines the name space

Demand Paging: provides the ability to run programs larger than the primary memory
– Hides differences in machine configurations

The price is address translation on each memory reference.

[Diagram: OS and user address spaces map through a TLB (VA → PA mapping) onto primary memory, backed by a swapping store.]


Hierarchical Page Table

[Diagram: the root of the current page table (a processor register) plus virtual-address field p1 index the Level 1 page table; p2 indexes a Level 2 page table, whose PTE points to the data page. Pages may be in primary or secondary memory, and a PTE may mark a nonexistent page.]

Virtual address layout (32-bit):

 31         22 21         12 11          0
| p1 (10-bit  | p2 (10-bit  |   offset    |
|  L1 index)  |  L2 index)  |  (12 bits)  |
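A two-level walk for the 10/10/12 split above can be sketched as follows (dictionaries stand in for page-table pages; all table names and contents are made up):

```python
# Two-level page-table walk for a 32-bit VA split 10/10/12 (p1/p2/offset).
L1_TABLE = {0x001: "L2_A"}              # p1 -> level-2 table (hypothetical)
L2_TABLES = {"L2_A": {0x002: 0x7ABCD}}  # p2 -> physical page number (PPN)

def translate(va):
    p1 = (va >> 22) & 0x3FF    # bits 31..22: L1 index
    p2 = (va >> 12) & 0x3FF    # bits 21..12: L2 index
    offset = va & 0xFFF        # bits 11..0: page offset
    l2 = L1_TABLE.get(p1)
    ppn = L2_TABLES[l2].get(p2) if l2 else None
    if ppn is None:
        raise KeyError("page fault")   # nonexistent page -> trap to OS
    return (ppn << 12) | offset

pa = translate((0x001 << 22) | (0x002 << 12) | 0x345)
print(hex(pa))  # 0x7abcd345
```

The point of the hierarchy is visible in the data: only the L2 tables for regions actually in use need to exist, rather than one flat 2^20-entry table per process.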


Address Translation & Protection

• Every instruction and data access needs address translation and protection checks

A good VM design needs to be fast (~ one cycle) and space efficient → Translation Lookaside Buffer (TLB)

[Diagram: the virtual page number (VPN) of the virtual address goes through address translation to a physical page number (PPN); the offset passes through unchanged. A protection check, using kernel/user mode and read/write intent, can raise an exception.]


Address Translation in CPU Pipeline

• Software handlers need restartable exception on page fault or protection violation

• Handling a TLB miss needs a hardware or software mechanism to refill the TLB

• Need mechanisms to cope with the additional latency of a TLB:
– slow down the clock
– pipeline the TLB and cache access
– virtual address caches
– parallel TLB/cache access

[Diagram: the 5-stage pipeline with an Inst TLB before the instruction cache and a Data TLB before the data cache; either TLB can signal a TLB miss, page fault, or protection violation.]


Concurrent Access to TLB & Cache

If index L is available without consulting the TLB, cache and TLB accesses can begin simultaneously.

Tag comparison is made after both accesses are completed.

Cases: L + b = k, L + b < k, L + b > k (page offset of k bits; direct-mapped cache of 2^L blocks with 2^b-byte blocks)

[Diagram: the VPN feeds the TLB while the virtual index (L bits) plus block offset (b bits) feed the cache; the PPN from the TLB is compared against the physical tag for the hit check.]
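For concreteness (the numbers are illustrative, not from the slides): with 4 KB pages the page offset is k = 12 bits, so a cache whose index + block-offset bits fit within those 12 bits (L + b ≤ k) can be indexed purely from page-offset bits, in parallel with the TLB lookup:

```python
import math

def can_index_in_parallel(cache_bytes, block_bytes, page_bytes, ways=1):
    """True if cache index (L) + block offset (b) fit in the page offset (k)."""
    sets = cache_bytes // (block_bytes * ways)
    L = int(math.log2(sets))          # index bits
    b = int(math.log2(block_bytes))   # block-offset bits
    k = int(math.log2(page_bytes))    # page-offset bits
    return L + b <= k

print(can_index_in_parallel(4096, 64, 4096))            # True:  L+b = 12 = k
print(can_index_in_parallel(32768, 64, 4096))           # False: L+b = 15 > k
print(can_index_in_parallel(32768, 64, 4096, ways=8))   # True:  associativity shrinks L
```

This is why L1 caches are so often "page-size × associativity" in capacity: increasing associativity is a way to grow the cache without pushing index bits above the page offset.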


CS152 Administrivia

• Lab 4 competition winners!

• Quiz 6 on Thursday, May 8
– L19-21, PS 6, Lab 6

• Last 15 minutes, course survey
– HKN survey
– Informal feedback survey for those who've not done it already

• Quiz 5 results


Complex Pipeline Structure

[Diagram: IF and ID feed an issue stage that reads the GPRs/FPRs and dispatches to ALU, Mem, Fadd, Fmul, and Fdiv units, which converge at WB.]


Superscalar In-Order Pipeline

• Fetch two instructions per cycle; issue both simultaneously if one is integer/memory and the other is floating-point

• Inexpensive way of increasing throughput; examples include Alpha 21064 (1992) & MIPS R5000 series (1996)

• Same idea can be extended to wider issue by duplicating functional units (e.g. 4-issue UltraSPARC), but register file ports and bypassing costs grow quickly

[Diagram: a dual-fetch pipeline with dual decode feeding an integer/memory path (X1, X2, Data Mem, GPRs) and a floating-point path (FPRs, pipelined Fadd and Fmul over X1-X3, plus an unpipelined divider), converging at the commit point.]


Types of Data Hazards

Consider executing a sequence of instructions of the form rk ← (ri) op (rj):

Data dependence:
r3 ← (r1) op (r2)
r5 ← (r3) op (r4)    Read-after-Write (RAW) hazard

Anti-dependence:
r3 ← (r1) op (r2)
r1 ← (r4) op (r5)    Write-after-Read (WAR) hazard

Output dependence:
r3 ← (r1) op (r2)
r3 ← (r6) op (r7)    Write-after-Write (WAW) hazard
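A hazard classifier for this three-address form can be sketched as follows (the tuple encoding of instructions is made up for the example):

```python
# Classify RAW/WAR/WAW hazards between an older and a newer instruction,
# each written as (dest, src1, src2); "r3 <- (r1) op (r2)" is ("r3","r1","r2").
def hazards(older, newer):
    d1, s1a, s1b = older
    d2, s2a, s2b = newer
    found = set()
    if d1 in (s2a, s2b):
        found.add("RAW")   # newer reads what older writes
    if d2 in (s1a, s1b):
        found.add("WAR")   # newer writes what older reads
    if d1 == d2:
        found.add("WAW")   # both write the same register
    return found

print(hazards(("r3", "r1", "r2"), ("r5", "r3", "r4")))  # {'RAW'}
print(hazards(("r3", "r1", "r2"), ("r1", "r4", "r5")))  # {'WAR'}
print(hazards(("r3", "r1", "r2"), ("r3", "r6", "r7")))  # {'WAW'}
```

Only RAW is a true data dependence; WAR and WAW are name dependences, which is why register renaming (later in the lecture's out-of-order pipeline) can remove them.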


Phases of Instruction Execution

[Diagram: PC → I-cache → fetch buffer → issue buffer → functional units → result buffer → architectural state.]

Fetch: instruction bits retrieved from cache.

Decode: instructions placed in appropriate issue (aka "dispatch") stage buffer.

Execute: instructions and operands sent to execution units. When execution completes, all results and exception flags are available.

Commit: instruction irrevocably updates architectural state (aka "graduation" or "completion").


Pipeline Design with Physical Regfile

[Diagram: an in-order front end (fetch, then decode & rename, with branch prediction updating the PC) feeds a reorder buffer; out-of-order execution units (branch unit, ALU, MEM with store buffer and D$) read and write a unified physical register file; branch resolution updates the predictors and can kill wrong-path instructions throughout the machine; commit is again in order.]


Reorder Buffer Holds Active Instruction Window

ld r1, (r3)       (older instructions)
add r3, r1, r2
sub r6, r7, r9
add r3, r3, r6
ld r6, (r1)
add r6, r6, r3
st r6, (r1)
ld r6, (r1)       (newer instructions)

[Figure shows the same window at cycle t and cycle t+1; the Commit, Execute, and Fetch pointers into the window each advance as instructions flow through.]


Branch History Table

4K-entry BHT, 2 bits/entry, ~80-90% correct predictions

[Diagram: the fetch PC indexes the I-cache and, with k bits, a 2^k-entry BHT (2 bits/entry) that outputs a taken/not-taken prediction; the opcode and offset of the fetched instruction determine whether it is a branch and give the target PC.]
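The 2-bit entries are conventionally saturating counters; a sketch of that standard scheme (sizes from the slide, the rest illustrative):

```python
# 2-bit saturating-counter branch predictor: each BHT entry holds 0-3.
# 0,1 predict not-taken; 2,3 predict taken; it takes two mispredictions
# in a row to flip the prediction, which tolerates a loop's single exit.
class BHT:
    def __init__(self, k):
        self.mask = (1 << k) - 1
        self.counters = [1] * (1 << k)   # start weakly not-taken

    def predict(self, pc):
        return self.counters[pc & self.mask] >= 2   # True = predict taken

    def update(self, pc, taken):
        i = pc & self.mask
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)

bht = BHT(k=12)   # 4K-entry table, as on the slide
outcomes = [True] * 8 + [False] + [True] * 8   # loop branch with one exit
correct = 0
for t in outcomes:
    correct += (bht.predict(0x400) == t)
    bht.update(0x400, t)
print(correct, "of", len(outcomes))
```

Note the counter only drops to 2 on the single not-taken exit, so the very next loop iteration is still predicted taken; a 1-bit predictor would mispredict twice per loop execution.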


Two-Level Branch Predictor

Pentium Pro uses the result from the last two branches to select one of the four sets of BHT bits (~95% correct)

[Diagram: a 2-bit global branch-history shift register, updated by shifting in each branch's taken/not-taken result, selects among four BHTs indexed by k bits of the fetch PC; the selected entry gives the taken/not-taken prediction.]


Branch Target Buffer (BTB)

• Keep both the branch PC and target PC in the BTB
• PC+4 is fetched if match fails
• Only taken branches and jumps held in BTB
• Next PC determined before branch fetched and decoded

[Diagram: k bits of the PC index a 2^k-entry direct-mapped BTB (can also be associative) in parallel with the I-cache; on a valid entry-PC match, the predicted target PC redirects fetch.]


Combining BTB and BHT

• BTB entries are considerably more expensive than BHT, but can redirect fetches at an earlier stage in the pipeline and can accelerate indirect branches (JR)
• BHT can hold many more entries and is more accurate

A  PC Generation/Mux
P  Instruction Fetch Stage 1
F  Instruction Fetch Stage 2
B  Branch Address Calc/Begin Decode
I  Complete Decode
J  Steer Instructions to Functional units
R  Register File Read
E  Integer Execute

The BTB acts in the early fetch stages; the BHT, consulted in a later pipeline stage, corrects the BTB when it misses a predicted-taken branch. BTB/BHT are only updated after the branch resolves in the E stage.


Sequential ISA Bottleneck

[Diagram: sequential source code (e.g., a = foo(b); for (i=0, i< ...) passes through a superscalar compiler, which finds independent operations and schedules them, producing sequential machine code; the superscalar processor must then re-check instruction dependencies and schedule execution again at run time.]


VLIW: Very Long Instruction Word

• Multiple operations packed into one instruction

• Each operation slot is for a fixed function

• Constant operation latencies are specified

• Architecture requires guarantee of:
– Parallelism within an instruction => no cross-operation RAW check
– No data use before data ready => no data interlocks

[Example instruction format: Int Op 1 | Int Op 2 | Mem Op 1 | Mem Op 2 | FP Op 1 | FP Op 2 — two integer units (single-cycle latency), two load/store units (three-cycle latency), two floating-point units (four-cycle latency).]


Scheduling Loop Unrolled Code

loop: ld f1, 0(r1)
      ld f2, 8(r1)
      ld f3, 16(r1)
      ld f4, 24(r1)
      add r1, 32
      fadd f5, f0, f1
      fadd f6, f0, f2
      fadd f7, f0, f3
      fadd f8, f0, f4
      sd f5, 0(r2)
      sd f6, 8(r2)
      sd f7, 16(r2)
      sd f8, 24(r2)
      add r2, 32
      bne r1, r3, loop

[Schedule table (columns Int1, Int2, M1, M2, FP+, FPx), unrolled 4 ways: the four loads issue in the memory slots, the four fadds follow in the FP+ slot (add r1 overlapped with the last load), then the four stores, with add r2 and bne in the final slots.]


Software Pipelining

loop: ld f1, 0(r1)
      ld f2, 8(r1)
      ld f3, 16(r1)
      ld f4, 24(r1)
      add r1, 32
      fadd f5, f0, f1
      fadd f6, f0, f2
      fadd f7, f0, f3
      fadd f8, f0, f4
      sd f5, 0(r2)
      sd f6, 8(r2)
      sd f7, 16(r2)
      add r2, 32
      sd f8, -8(r2)
      bne r1, r3, loop

[Schedule (unroll 4 ways first; columns Int1, Int2, M1, M2, FP+, FPx): loads, fadds, and stores from successive iterations overlap, forming a prolog as the pipeline fills, a steady-state loop body in which one iteration's stores, the next iteration's fadds, and the following iteration's loads execute together, and an epilog as it drains.]


Vector Programming Model

[Diagram: scalar registers r0-r15 and vector registers v0-v15, each holding elements [0] through [VLRMAX-1], with a vector length register VLR. A vector arithmetic instruction such as ADDV v3, v1, v2 adds elements [0]..[VLR-1] of v1 and v2 into v3; a vector load/store such as LV v1, r1, r2 moves a vector between memory (base r1, stride r2) and a vector register.]
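The semantics of ADDV and a strided LV can be sketched as follows (register-file sizes and instruction names follow the diagram; everything else is illustrative):

```python
# Vector programming model sketch: ADDV operates on elements [0..VLR-1];
# LV loads VLR elements starting at `base`, `stride` apart in memory.
VLRMAX = 64
vreg = {f"v{i}": [0] * VLRMAX for i in range(16)}
vlr = 8   # vector length register

def LV(vd, memory, base, stride):
    for i in range(vlr):
        vreg[vd][i] = memory[base + i * stride]

def ADDV(vd, vs1, vs2):
    for i in range(vlr):
        vreg[vd][i] = vreg[vs1][i] + vreg[vs2][i]

mem = list(range(100))
LV("v1", mem, base=0, stride=2)   # even elements: 0, 2, 4, ...
LV("v2", mem, base=1, stride=2)   # odd elements:  1, 3, 5, ...
ADDV("v3", "v1", "v2")
print(vreg["v3"][:vlr])  # [1, 5, 9, 13, 17, 21, 25, 29]
```

Each Python loop here is what the hardware does with no per-element instruction fetch or dependence check, which is the source of the vector machine's efficiency.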


Vector Unit Structure

[Diagram: the vector registers are partitioned across four lanes holding elements 0, 4, 8, …; 1, 5, 9, …; 2, 6, 10, …; and 3, 7, 11, … respectively; each lane has its own functional-unit pipelines and its own port into the memory subsystem.]


Vector Instruction Parallelism

Can overlap execution of multiple vector instructions
– example machine has 32 elements per vector register and 8 lanes

[Diagram: issuing one short instruction per cycle, loads stream through the load unit while muls occupy the multiply unit and adds the add unit, all overlapped in time across the 8 lanes of each unit.]

Complete 24 operations/cycle while issuing 1 short instruction/cycle


Multithreading

How can we guarantee no dependencies between instructions in a pipeline?
– One way is to interleave execution of instructions from different program threads on the same pipeline

Interleave 4 threads, T1-T4, on a non-bypassed 5-stage pipe:

                      t0 t1 t2 t3 t4 t5 t6 t7 t8 t9
T1: LW r1, 0(r2)      F  D  X  M  W
T2: ADD r7, r1, r4       F  D  X  M  W
T3: XORI r5, r4, #12        F  D  X  M  W
T4: SW 0(r7), r5               F  D  X  M  W
T1: LW r5, 12(r1)                 F  D  X  M  W

The prior instruction in a thread always completes write-back before the next instruction in the same thread reads the register file.


Multithreaded Categories

[Figure: issue slots vs. time (processor cycles) for Superscalar, Fine-Grained, Coarse-Grained, Multiprocessing, and Simultaneous Multithreading organizations; colors distinguish threads 1-5 and idle slots.]


Power 4 vs. SMT in Power 5

[Figure: the Power 5 pipeline duplicates the Power 4's front and back ends for two threads: 2 fetch (PC), 2 initial decodes, and 2 commits (architected register sets).]


A Producer-Consumer Example

The program is written assuming instructions are executed in order.

Producer posting item x:
    Load Rtail, (tail)
    Store (Rtail), x
    Rtail = Rtail + 1
    Store (tail), Rtail

Consumer:
    Load Rhead, (head)
spin:
    Load Rtail, (tail)
    if Rhead == Rtail goto spin
    Load R, (Rhead)
    Rhead = Rhead + 1
    Store (head), Rhead
    process(R)

[Diagram: producer and consumer share a queue in memory with head and tail pointers.]


Sequential Consistency: A Memory Model

“A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in the order specified by the program.”
— Leslie Lamport

Sequential Consistency = arbitrary order-preserving interleaving of memory references of sequential programs

[Diagram: multiple processors P sharing a single memory M.]


Sequential Consistency

Sequential consistency imposes more memory ordering constraints than those imposed by uniprocessor program dependencies.

What are these in our example?

T1:                         T2:
Store (X), 1    (X = 1)     Load R1, (Y)
Store (Y), 11   (Y = 11)    Store (Y'), R1   (Y' = Y)
                            Load R2, (X)
                            Store (X'), R2   (X' = X)

(additional SC requirements)


Mutual Exclusion and Locks

Want to guarantee only one process is active in a critical section

• Blocking atomic read-modify-write instructions
e.g., Test&Set, Fetch&Add, Swap
vs
• Non-blocking atomic read-modify-write instructions
e.g., Compare&Swap, Load-reserve/Store-conditional
vs
• Protocols based on ordinary Loads and Stores
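A Test&Set spinlock can be sketched as follows (the hardware's atomic read-modify-write is emulated here with a Python lock; the whole thing is illustrative, not a real implementation):

```python
import threading

# Emulated Test&Set: atomically read the old flag value and set the flag.
_flag = 0
_atomic = threading.Lock()   # stands in for the hardware's atomicity

def test_and_set():
    global _flag
    with _atomic:
        old, _flag = _flag, 1
        return old

def release():
    global _flag
    _flag = 0   # ordinary store releases the lock

counter = 0
def critical_increment():
    global counter
    while test_and_set() == 1:
        pass                 # spin until the lock is acquired
    counter += 1             # critical section: exactly one thread at a time
    release()

threads = [threading.Thread(target=critical_increment) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 8
```

Because Test&Set is blocking (spinning threads make no progress), real systems often prefer the non-blocking Load-reserve/Store-conditional pair listed above, which retries only on actual conflict.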


Snoopy Cache Protocols

Use snoopy mechanism to keep all processors' view of memory coherent

[Diagram: processors M1, M2, M3, each with a snoopy cache, sit on a shared memory bus together with physical memory and a DMA engine connected to disks.]


MESI: An Enhanced MSI Protocol
Increased performance for private data

Each cache line has state bits in its address tag:
M: Modified Exclusive
E: Exclusive, unmodified
S: Shared
I: Invalid

[State diagram (cache state in processor P1): I → M on a write miss; I → S on a read miss when shared; I → E on a read miss when not shared; E → M on a P1 write; S → M on P1's intent to write; M → S when another processor reads (P1 writes back); M, E, or S → I on another processor's intent to write; M and E remain on P1 reads and writes, and S remains on reads by any processor.]


Basic Operation of Directory

• k processors

• With each cache block in memory: k presence bits, 1 dirty bit

• With each cache block in cache: 1 valid bit and 1 dirty (owner) bit

[Diagram: processors with caches connect over an interconnection network to memory plus a directory holding the presence bits and dirty bit for each block.]

• Read from main memory by processor i:
– If dirty bit OFF, then { read from main memory; turn p[i] ON; }
– If dirty bit ON, then { recall line from dirty processor (cache state to shared); update memory; turn dirty bit OFF; turn p[i] ON; supply recalled data to i; }

• Write to main memory by processor i:
– If dirty bit OFF, then { send invalidations to all caches that have the block; turn dirty bit ON; supply data to i; turn p[i] ON; ... }
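The read path above can be sketched in code (the data structures and the modeling of the dirty owner's copy are made up for the example):

```python
# Directory sketch: per-block k presence bits + 1 dirty bit, as on the slide.
class Directory:
    def __init__(self, k, blocks):
        self.presence = [[False] * k for _ in range(blocks)]
        self.dirty = [False] * blocks
        self.memory = [0] * blocks
        self.owner_copy = {}   # block -> the dirty owner's (newer) value

    def read(self, i, block):
        if self.dirty[block]:
            # Recall the line from the dirty processor (its cache state
            # becomes shared), update memory, and turn the dirty bit OFF.
            self.memory[block] = self.owner_copy.pop(block)
            self.dirty[block] = False
        self.presence[block][i] = True   # turn p[i] ON
        return self.memory[block]        # supply data to i

d = Directory(k=4, blocks=8)
d.memory[3] = 7
d.dirty[3] = True
d.presence[3][0] = True     # processor 0 owns a dirty copy...
d.owner_copy[3] = 42        # ...holding a newer value than memory
print(d.read(i=1, block=3)) # 42: recalled from the owner, memory updated
```

The presence bits are what make the write path scalable: invalidations go only to the caches whose bits are set, not to every processor as on a snoopy bus.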


Directory Cache Protocol (Handout 6)

• Assumptions: reliable network, FIFO message delivery between any given source-destination pair

[Diagram: six CPU+cache nodes and four directory-controller+DRAM-bank nodes connected by an interconnection network.]


Performance of Symmetric Shared-Memory Multiprocessors

Cache performance is a combination of:

1. Uniprocessor cache miss traffic

2. Traffic caused by communication
– Results in invalidations and subsequent cache misses

• Adds a 4th C: coherence miss
– Joins Compulsory, Capacity, Conflict
– (Sometimes called a Communication miss)


Intel “Nehalem” (2008)

• 2-8 cores

• SMT (2 threads/core)

• Private L2$/core

• Shared L3$

• Initially in 45nm


Related Courses

• CS61C: basic computer organization, first look at pipelines + caches (strong prerequisite for CS 152)
• CS 150: digital logic design
• CS 152: computer architecture, first look at parallel architectures
• CS 194-6: new FPGA-based architecture lab class
• CS 252: graduate computer architecture, advanced topics
• CS 258: parallel architectures, languages, systems


Advice: Get involved in research

E.g.,

• RADLab - data center

• ParLab - parallel clients

• Undergrad research experience is the most important part of an application to top grad schools.


End of CS152

• Thanks for being such patient guinea pigs!
– Hopefully your pain will help future generations of CS152 students


Acknowledgements

• These slides contain material developed and copyright by:

– Arvind (MIT)

– Krste Asanovic (MIT/UCB)

– Joel Emer (Intel/MIT)

– James Hoe (CMU)

– John Kubiatowicz (UCB)

– David Patterson (UCB)

• MIT material derived from course 6.823

• UCB material derived from course CS252