andrada zoltan alexandra fedorova joel nider craig mustard ...€¦ · andrada zoltan alexandra...

26
Processing in Storage Class Memory Joel Nider Craig Mustard Andrada Zoltan Alexandra Fedorova

Upload: others

Post on 22-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

Processing in

Storage Class MemoryJoel Nider Craig MustardAndrada Zoltan Alexandra Fedorova

Page 2: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

Embedding Processors in SCMCPU

Non-volatile RAM

Page 3: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

Storage Latency Is Decreasing

Page 4: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

Scaling Compute with Storage

CPU + registers

Smart Caches

PIM in RAM

SCM

Smart Disks / SSD

Storage Arrays

Volatile

Persistent

Latency

Page 5: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

Scaling Compute with Storage

CPU + registers

Smart Caches

PIM in RAM

SCM

Smart Disks / SSD

Storage Arrays

Volatile

Persistent

Latency

Page 6: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

Benefits of PIM on SCM

CPUMemory bus

DPU

SCM

DRAM

Page 7: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

Benefits of PIM on SCM

CPUMemory bus

Page 8: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

Benefits of PIM on SCM

CPUMemory bus

Page 9: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

Benefits of PIM on SCM

CPU

DPU

SCM

Memory bus

Page 10: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

Benefits of PIM on SCM

CPUMemory bus

DPU Count: SCM Capacity: 64 4 GB Ratio: 1:64 MB

Core Density

Page 11: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

Benefits of PIM on SCM

CPUMemory bus

DPU Count: SCM Capacity: 128 8 GB Ratio: 1:64 MB

Page 12: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

Benefits of PIM on SCM

CPUMemory bus

DPU Count: SCM Capacity: 256 16 GB Ratio: 1:64 MB

Page 13: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

Benefits of PIM on SCM

CPUMemory bus

DPU Count: SCM Capacity: 512 32 GB Ratio: 1:64 MB

Page 14: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

Benefits of PIM on SCM

CPUMemory bus

Page 15: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

PIM Design Points

Inter-PIMCommunication

CoreDensity

InstructionSet

Address Translation

Page 16: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

UPMEM Architecture and Limitations

DPU

DRAM

Page 17: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

UPMEM Architecture and Limitations

DPU

DRAM DDR Interface

Control

SRAM

External Bus

Page 18: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

Interleaved Multithreading

Page 19: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

UPMEM Architecture and Limitations

ABCDEFGHIJKLMNOPQRSTUVMemory bus

Input data

DPU 0

DPU 1

DPU 2

Page 20: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

UPMEM Architecture and Limitations

IJKLMNOPQRSTUVWXYZabcdMemory bus

Input data

A B C D E F G H

DPU 0 A

DPU 1 B

DPU 2 C

Page 21: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

UPMEM Architecture and Limitations

QRSTUVWXYZabcdefghijklMemory bus

Input data

AI

BJ

CK

DL

EM

FN

GO

HP

DPU 0 AI

DPU 1 BJ

DPU 2 CK

Page 22: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

Raw Performance: Throughput

64KB SRAM9 ranks x 64 DPUS = 576 DPUs

576 DPUs x 64MB = 36GB DRAM

36 GB in 0.16 s = 252 GB/s

Top speed of DDR4-2400 channel: 19GB/s

16 threads @ 2KB per transfer

64 MBDRAM DPU

Page 23: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

Use Case: Compression

File Size DPUs

spamfile 84 MB 172

mozilla 50 MB 105

nci 30 MB 64

dickens 10 MB 35

sao 7 MB 21

xml 5 MB 15

world192 1 MB 4

plrabn12 0.5 MB 2

terror2 0.1 MB 1

Page 24: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

Wishlist

Concurrent Memory Access

Data Triggered Functions

Mix OfMemory Types

Tuning ForPerformance

Page 25: Andrada Zoltan Alexandra Fedorova Joel Nider Craig Mustard ...€¦ · Andrada Zoltan Alexandra Fedorova. Embedding Processors in SCM CPU Non-volatile RAM. Storage Latency Is Decreasing

Future Directions

HyperdimensionalComputing

Regular Expression

Search?