adding pep to multicore - mit computer science and...

Post on 30-Apr-2020

8 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Adding PEP to MulticoreEric Lau

Jason Miller, Inseok Choi, Omer Khan, Jim Holt, Anantha Chandrakasan, Donald Yeung, Saman Amarasinghe, Anant Agarwal

Feb 16, 2011

Outline• Motivation• PEP Concept• PEP Core Architecture• Graphite Simulation• Applications• Conclusion

3/16/2011 Angstrom Lunch Seminar 2

Motivation PEP Concept Architecture Graphite Applications Conclusion

Angstrom

3/16/2011 Angstrom Lunch Seminar 3

Decrease programming effort

Increase performance

Increase resiliency

Increase energy efficiency

Motivation PEP Concept Architecture Graphite Applications Conclusion

Positive Energy Partnerships• Non-application resources that perform tasks for a

net gain in energy.

• Hardware Partnerships• shared resources• hard-wired events

• Software Partnerships• expose to h/w ISA• algorithms• event servicing

3/16/2011 Angstrom Lunch Seminar 4

Motivation PEP Concept Architecture Graphite Applications Conclusion

Tile

Positive Energy Partnerships• PEP Cores

3/16/2011 Angstrom Lunch Seminar 5

Router

ComputeCore

Cache

PEP

Motivation PEP Concept Architecture Graphite Applications Conclusion

PEP Core Architecture

3/16/2011 Angstrom Lunch Seminar 6

Motivation PEP Concept Architecture Graphite Applications Conclusion

PEP Core Architecture• Area Considerations

• μController: 16-bit, RISC, in-order datapath, unified cache• RAW: 32-bit, RISC, 8-stage, in-order datapath, split cache• Core 2: 64-bit, x86, 14-stage, out-of-order datapath, split cache

3/16/2011 Angstrom Lunch Seminar 7

Core Node (nm) SRAM Size (mm2) Scaled Size (mm2)

μController 65 128K 1.5 0.03

RAW 180 128K 16 0.25

Intel Core 2 65 2M/4M 80 9.0

Motivation PEP Concept Architecture Graphite Applications Conclusion

PEP Core Architecture• Power Considerations

• μController: 16-bit, RISC, 1 MHz• TilePro64: 64-bit, VLIW, 700-866 MHz• Core 2: 64-bit, x86, 1-2.4GHz

3/16/2011 Angstrom Lunch Seminar 8

Core Energy/Cycle

μController 27.2pJ

TilePro64 286pJ

Intel Core 2 101 000pJ

Motivation PEP Concept Architecture Graphite Applications Conclusion

Graphite

3/16/2011 Angstrom Lunch Seminar 9

Target Architecture

Graphite

Host Threads

Application

Host Process

Host Process

Host Process

Target Core Target Core Target Core

Target Core Target Core Target Core

Target Core Target Core Target CoreApplication Threads

Motivation PEP Concept Architecture Graphite Applications Conclusion

Graphite

3/16/2011 Angstrom Lunch Seminar 10

Target Architecture

Graphite

Host Threads

Application Threads

Application

Host Process

Host Process

Host Process

C P C P C P

C PC PC P

C P C P C P

Motivation PEP Concept Architecture Graphite Applications Conclusion

Potential Applications• Case Study

– Memory Prefetching

• Other Applications– Security– Reliability– Event Probing

3/16/2011 Angstrom Lunch Seminar 11

Motivation PEP Concept Architecture Graphite Applications Conclusion

Case Study: Memory Prefetching• Pre-Execution

3/16/2011 Angstrom Lunch Seminar 12

• PEP cores are free to run with minimal energy:– Low clock frequency– Low power hardware– Tight coupling

• Helper thread extracted from main thread.– Static compiler– Dynamic extraction

Motivation PEP Concept Architecture Graphite Applications Conclusion

Case Study: Helper Threads• EM3D benchmark (Execution Time)

3/16/2011 Angstrom Lunch Seminar 13

1.081.17 1.27

1.38

2.07

2.36

0.0

0.5

1.0

1.5

2.0

2.5

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1

Nor

mal

ized

Exec

utio

n Ti

me

PEP Frequency/Main Frequency

Motivation PEP Concept Architecture Graphite Applications Conclusion

Case Study: Helper Threads• EM3D benchmark (Energy Efficiency)

3/16/2011 Angstrom Lunch Seminar 14

1.07931.1623

1.2436 1.2592

1.5797

1.1944

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

1.6

1.8

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1

Nor

mal

ized

Ener

gy E

ffici

ency

PEP Frequency/Main Frequency

Motivation PEP Concept Architecture Graphite Applications Conclusion

Potential Applications• Security

– PEP core does not run application code• Immunity to application level attacks!

– Dynamic Information Flow Tracking (DIFT)• Taint pointers using PEP core• Run DIFT instructions in PEP

3/16/2011 Angstrom Lunch Seminar 15

Motivation PEP Concept Architecture Graphite Applications Conclusion

Potential Applications• Security

– Helper thread runs ahead only on DIFT instructions– Tight coupling allows access to register values

3/16/2011 Angstrom Lunch Seminar 16

PEP PipelineMain Pipeline

main helper

taint register

Motivation PEP Concept Architecture Graphite Applications Conclusion

Potential Applications• Reliability

– With 1000 cores on a single chip, and ultra low voltage logic, transient faults are a terrible issue.

– Redundant Multithreading (RMT)• Create a leading thread and a trailing thread.• Perform exact same execution on both threads.• Commit results only if both threads yield same results.

3/16/2011 Angstrom Lunch Seminar 17

Motivation PEP Concept Architecture Graphite Applications Conclusion

Potential Applications• Reliability

– Implementation• SMT: Low hardware overhead; incurs switching overhead.• CMP: Simple to implement; repeats mis-speculations.• PEP cores gain advantages of both!

3/16/2011 Angstrom Lunch Seminar 18

trailing

leading

Memory/RF/ $

Input Replication

Output Comparison

Motivation PEP Concept Architecture Graphite Applications Conclusion

Potential Applications• Self-Aware Computing

– Requires a way to monitor itself:• SMT• Extra thread• Extra core

– PEP core possesses detached execution context.• Main core keeps running!

3/16/2011 Angstrom Lunch Seminar 19

Motivation PEP Concept Architecture Graphite Applications Conclusion

Potential Applications• Event Probing

3/16/2011 Angstrom Lunch Seminar 20

Motivation PEP Concept Architecture Graphite Applications Conclusion

Potential Applications• Event Probing

3/16/2011 Angstrom Lunch Seminar 21

Main Pipeline

Cache PEP Core

On-chip Network Switch

Low

High

PEP Network Switch

Motivation PEP Concept Architecture Graphite Applications Conclusion

Conclusion• PEP cores allow for several optimizations with a

low energy/area cost (<10%).

• Next steps:– Evaluate more applications with more benchmarks.– Implement a comprehensive scheme for all these

applications.– Determine proper communication protocol between

PEP cores and main cores.

3/16/2011 Angstrom Lunch Seminar 22

HELPER

Case Study: Memory Prefetching

3/16/2011 Angstrom Lunch Seminar 23

CarbonSpawnHelperThread(helper);...

CarbonCondBroadcast(&resume_cond);cur_node = node_vec;

for (n = 0; n < N; ++n, ++cur_node){CarbonMutexLock(&counter_lock);shared_counter++;CarbonMutexunLock(&counter_lock);

cur_value = val(cur_node);values = cur_node->values;coeffs = cur_node->coeffs;

for (i = 0; i < stop; i++) cur_value -= values[i]*coeffs[i];

val(cur_node) = cur_value;}

MAIN

void * helper(){

while(1) {

CarbonCondWait(&resume_cond);

// Synchronization Code...local_counter = shared_counter++;...

values = cur_node->values;coeffs = cur_node->coeffs;

for (i = 0; i < stop; i++) {coeff_dummy = coeffs[i];value_dummy = values[i];

}}

}

HELPER

Case Study: Memory Prefetching

3/16/2011 Angstrom Lunch Seminar 24

CarbonSpawnHelperThread(helper);...

CarbonCondBroadcast(&resume_cond);cur_node = node_vec;

for (n = 0; n < N; ++n, ++cur_node){CarbonMutexLock(&counter_lock);shared_counter++;CarbonMutexunLock(&counter_lock);

cur_value = val(cur_node);values = cur_node->values;coeffs = cur_node->coeffs;

for (i = 0; i < stop; i++) cur_value -= values[i]*coeffs[i];

val(cur_node) = cur_value;}

MAIN

while (l_counter < n_nodes){while(1) {

CarbonMutexLock(&counter_lock);c_counter = shared_counter;CarbonMutexUnlock(&counter_lock);

offset = l_counter - c_counter;

if (offset >= MIN && offset < MAX)break;

else if ( offset <= MIN_OFFSET)l_counter = c_counter + OFFSET;break;

}

while(prev_l_counter < l_counter){cur_node++;prev_l_counter++;

}}

Backup Slides

3/16/2011 Angstrom Lunch Seminar 25

Motivation PEP Concept Architecture Graphite Applications Conclusion

Case Study: Memory Prefetching• Simulation Results on Graphite

– Graphite instantiates two cores per tile.– Each core runs a separate context (stack, registers, etc.)– PEP and main cores contain private L1 and shared L2.– Synchronization through finite barrier across all threads.

• EM3D benchmark (Olden Suite)

3/16/2011 Angstrom Lunch Seminar 26

top related