Post on 22-Aug-2020

Processing in Storage Class Memory

Joel Nider, Craig Mustard, Andrada Zoltan, Alexandra Fedorova

Embedding Processors in SCM

CPU

Non-volatile RAM

Storage Latency Is Decreasing

Scaling Compute with Storage

CPU + registers

Smart Caches

PIM in RAM

SCM

Smart Disks / SSD

Storage Arrays

Volatile

Persistent

Latency

Benefits of PIM on SCM

CPU

Memory bus

DPU

SCM

DRAM

Core Density

DPU Count   SCM Capacity   Ratio (DPU : SCM)
64          4 GB           1 : 64 MB
128         8 GB           1 : 64 MB
256         16 GB          1 : 64 MB
512         32 GB          1 : 64 MB
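
The core-density figures above keep the ratio fixed at one DPU per 64 MB of SCM, so the DPU count grows linearly with capacity. A minimal sketch of that arithmetic follows; the function name `dpu_count` and the binary-unit conversion (1 GB = 1024 MB) are assumptions made for illustration, not part of the slides.

```python
# Fixed ratio taken from the slides: one DPU per 64 MB of SCM.
MB_PER_DPU = 64
# Assumption: binary units (1 GB = 1024 MB), which reproduces the slide figures.
MB_PER_GB = 1024


def dpu_count(scm_capacity_gb: int, mb_per_dpu: int = MB_PER_DPU) -> int:
    """Number of DPUs needed to keep one DPU per `mb_per_dpu` MB of SCM."""
    return scm_capacity_gb * MB_PER_GB // mb_per_dpu


if __name__ == "__main__":
    # Capacities listed on the slides.
    for capacity_gb in (4, 8, 16, 32):
        print(f"{capacity_gb:>2} GB SCM -> {dpu_count(capacity_gb)} DPUs")
```

Doubling the capacity doubles the DPU count, which is the point of the slide sequence: compute scales with storage rather than staying fixed at the CPU.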

PIM Design Points

Inter-PIM Communication

Core Density

Instruction Set

Address Translation

UPMEM Architecture and Limitations

DPU

DRAM

DDR Interface

Control

SRAM

External Bus

Interleaved Multithreading

[Figure, three animation steps: input data arriving on the memory bus (A, B, C, ...) is interleaved across the DPUs. Step 1: the stream A through V is on the bus and DPU 0, DPU 1 and DPU 2 are empty. Step 2: the first eight units A through H have been distributed, one per slot, so DPU 0 holds A, DPU 1 holds B and DPU 2 holds C. Step 3: the next eight units I through P have been distributed, so the slots hold AI, BJ, CK, DL, EM, FN, GO and HP, with DPU 0 holding AI, DPU 1 holding BJ and DPU 2 holding CK.]
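
The animation above amounts to a round-robin distribution of the input stream across eight slots. The following is a minimal sketch of that pattern, not UPMEM's actual hardware or SDK behaviour; the names `interleave` and `deinterleave`, the slot count of eight, and the use of one character per unit are assumptions chosen to match the letters on the slides.

```python
from typing import List

# Assumption: eight slots, matching the eight groups (A through H) on the slides.
NUM_SLOTS = 8


def interleave(stream: str, num_slots: int = NUM_SLOTS) -> List[str]:
    """Distribute consecutive units of `stream` round-robin across the slots."""
    slots = [""] * num_slots
    for index, unit in enumerate(stream):
        slots[index % num_slots] += unit
    return slots


def deinterleave(slots: List[str]) -> str:
    """Rebuild the original stream by reading one unit from each slot in turn."""
    longest = max((len(slot) for slot in slots), default=0)
    out = []
    for position in range(longest):
        for slot in slots:
            if position < len(slot):
                out.append(slot[position])
    return "".join(out)


if __name__ == "__main__":
    slots = interleave("ABCDEFGHIJKLMNOP")
    for dpu, contents in enumerate(slots[:3]):
        print(f"DPU {dpu}: {contents}")
```

Under this scheme no single slot holds a contiguous run of the input, so a host that wants each DPU to see contiguous data has to rearrange it before or after the transfer.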

Raw Performance: Throughput

DPU: 64 KB SRAM, 64 MB DRAM

9 ranks x 64 DPUs = 576 DPUs

576 DPUs x 64 MB = 36 GB DRAM

36 GB in 0.16 s = 252 GB/s

Top speed of DDR4-2400 channel: 19 GB/s

16 threads @ 2 KB per transfer
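
The throughput figures above can be re-derived from the per-DPU numbers. The sketch below is only a check of the arithmetic under stated assumptions: binary units for the MB-to-GB conversion, a 64-bit (8-byte) data bus for the DDR4-2400 channel, and hypothetical function names. Note that dividing the rounded inputs (36 GB, 0.16 s) gives 225 GB/s, while the slide reports 252 GB/s; the source does not explain the difference, so the reported value is kept as a separate constant rather than replaced.

```python
# Figures taken from the slide.
RANKS = 9
DPUS_PER_RANK = 64
DRAM_MB_PER_DPU = 64
COPY_SECONDS = 0.16
REPORTED_GBPS = 252  # value stated on the slide, kept as-is


def total_dpus() -> int:
    """Total number of DPUs across all ranks."""
    return RANKS * DPUS_PER_RANK


def total_dram_gb() -> float:
    """Aggregate DPU-attached DRAM in GB (assumes 1 GB = 1024 MB)."""
    return total_dpus() * DRAM_MB_PER_DPU / 1024


def throughput_gbps(gigabytes: float, seconds: float) -> float:
    """Aggregate throughput when all DPUs scan their DRAM in parallel."""
    return gigabytes / seconds


def ddr4_peak_gbps(mega_transfers_per_s: int = 2400, bus_bytes: int = 8) -> float:
    """Peak bandwidth of one DDR4 channel (assumes a 64-bit data bus)."""
    return mega_transfers_per_s * bus_bytes / 1000


if __name__ == "__main__":
    derived = throughput_gbps(total_dram_gb(), COPY_SECONDS)
    print(f"DPUs: {total_dpus()}, DRAM: {total_dram_gb():.0f} GB")
    print(f"Derived from rounded inputs: {derived:.0f} GB/s")
    print(f"Reported on the slide: {REPORTED_GBPS} GB/s")
    print(f"DDR4-2400 channel peak: {ddr4_peak_gbps():.1f} GB/s")
```

Either way, the aggregate internal bandwidth is roughly an order of magnitude above what a single DDR4-2400 channel can deliver to the CPU.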

Use Case: Compression

File        Size      DPUs
spamfile    84 MB     172
mozilla     50 MB     105
nci         30 MB     64
dickens     10 MB     35
sao         7 MB      21
xml         5 MB      15
world192    1 MB      4
plrabn12    0.5 MB    2
terror2     0.1 MB    1
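
The slides do not say how the DPU count for each file was chosen. The sketch below only derives the average amount of input handled per DPU from the table, as a way to read the numbers; the names `FILES` and `mb_per_dpu` are hypothetical.

```python
from typing import List, Tuple

# (file name, size in MB, number of DPUs), copied from the table.
FILES: List[Tuple[str, float, int]] = [
    ("spamfile", 84.0, 172),
    ("mozilla", 50.0, 105),
    ("nci", 30.0, 64),
    ("dickens", 10.0, 35),
    ("sao", 7.0, 21),
    ("xml", 5.0, 15),
    ("world192", 1.0, 4),
    ("plrabn12", 0.5, 2),
    ("terror2", 0.1, 1),
]


def mb_per_dpu(size_mb: float, dpus: int) -> float:
    """Average input size handled by each DPU, in MB."""
    return size_mb / dpus


if __name__ == "__main__":
    for name, size_mb, dpus in FILES:
        print(f"{name:<10} {size_mb:>5} MB / {dpus:>3} DPUs = "
              f"{mb_per_dpu(size_mb, dpus):.3f} MB per DPU")
```

In every row each DPU receives well under 1 MB of input, far less than the 64 MB of DRAM attached to it.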

Wishlist

Concurrent Memory Access

Data Triggered Functions

Mix of Memory Types

Tuning for Performance

Future Directions

Hyperdimensional Computing

Regular Expression Search?

Thank you for watching

Joel Nider joel@ece.ubc.ca

Craig Mustard craigm@ece.ubc.ca

Andrada Zoltan zoltandrada@gmail.com

Alexandra Fedorova sasha@ece.ubc.ca