lightweight compression methods achieving 120gbps and...

40
Lightweight Compression Methods Achieving 120GBps and More Piotr Przymus Laboratoire d’Informatique Fondamentale de Marseille Aix-Marseille University, France GPU Technology Conference Silicon Valley May 2017 P. Przymus Lightweight Compression Methods Achieving 120GBps and More 1/25

Upload: others

Post on 16-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Lightweight Compression Methods Achieving120GBps and More

Piotr Przymus

Laboratoire d’Informatique Fondamentale de MarseilleAix-Marseille University, France

GPU Technology ConferenceSilicon Valley May 2017

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 1/25

Page 2: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

K. Kaczmarski and P. Przymus, Fixed Length Lightweight Compression forGPU Revised, Journal of Parallel and Distributed Computing, 2017.

A lightweight compression library for GPU.github.com/mis-wut/feathergpuMIT-licenesed.This project was partly funded by National Science Centre, decisionDEC-2012/07/D/ST6/02483.

TeamKrzysztof Kaczmarski

Warsaw University of Technology, PolandPiotr Przymus

Aix-Marseille University, FranceNicolaus Copernicus University in Toruń,Poland.

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 2/25

Page 3: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Lightweight compression on GPU – motivation

Lightweight compression algorithms favours compression and decompressionspeed over compression ratio.

Improved data transfer:Disk ↔ RAM ↔ GPU.GPU ↔ GPU:

exchange of already compressed data,compress → transfer → decompress.

Lower memory footprint:Less disk space used.Less RAM used.Less GPU memory used.

Improved internal memory access:In some cases improved internal GPU memory access.

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 3/25

Page 4: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Lightweight compression on GPU – motivation

Lightweight compression algorithms favours compression and decompressionspeed over compression ratio.

Improved data transfer:Disk ↔ RAM ↔ GPU.GPU ↔ GPU:

exchange of already compressed data,compress → transfer → decompress.

Lower memory footprint:Less disk space used.Less RAM used.Less GPU memory used.

Improved internal memory access:In some cases improved internal GPU memory access.

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 3/25

Page 5: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Lightweight compression on GPU – motivation

Lightweight compression algorithms favours compression and decompressionspeed over compression ratio.

Improved data transfer:Disk ↔ RAM ↔ GPU.GPU ↔ GPU:

exchange of already compressed data,compress → transfer → decompress.

Lower memory footprint:Less disk space used.Less RAM used.Less GPU memory used.

Improved internal memory access:In some cases improved internal GPU memory access.

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 3/25

Page 6: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Lightweight compression on GPU – motivation

Lightweight compression algorithms favours compression and decompressionspeed over compression ratio.

Improved data transfer:Disk ↔ RAM ↔ GPU.GPU ↔ GPU:

exchange of already compressed data,compress → transfer → decompress.

Lower memory footprint:Less disk space used.Less RAM used.Less GPU memory used.

Improved internal memory access:In some cases improved internal GPU memory access.

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 3/25

Page 7: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Fixed length compression

Fixed length (FL) – is a simple well known compression scheme where fixednumber of bits is suppressed. Suppressed bits should be equal to 0.

0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 00 0 0 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 1 0 0 0

Figure: Original data, only 4 bits are used in each byte.

0 0 0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 1 0 1 0 1 1 0 0 1 1 1 1 0 0 0

Figure: Compressed data (each byte encodes two words of length 4 bits.)

compression ratio (CR) = Uncompressed sizeCompressed size = 2,

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 4/25

Page 8: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Fixed length compression

Fixed length (FL) – is a simple well known compression scheme where fixednumber of bits is suppressed. Suppressed bits should be equal to 0.

0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 00 0 0 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 1 0 0 0

Figure: Original data, only 4 bits are used in each byte.

0 0 0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 1 0 1 0 1 1 0 0 1 1 1 1 0 0 0

Figure: Compressed data (each byte encodes two words of length 4 bits.)

compression ratio (CR) = Uncompressed sizeCompressed size = 2,

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 4/25

Page 9: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Fixed length compression

Fixed length (FL) – is a simple well known compression scheme where fixednumber of bits is suppressed. Suppressed bits should be equal to 0.

0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 00 0 0 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 1 0 0 0

Figure: Original data, only 4 bits are used in each byte.

0 0 0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 1 0 1 0 1 1 0 0 1 1 1 1 0 0 0

Figure: Compressed data (each byte encodes two words of length 4 bits.)

compression ratio (CR) = Uncompressed sizeCompressed size = 2,

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 4/25

Page 10: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Fixed length compression

Fixed length (FL) – is a simple well known compression scheme where fixednumber of bits is suppressed. Suppressed bits should be equal to 0.

0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 00 0 0 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 1 0 0 0

Figure: Original data, only 4 bits are used in each byte.

0 0 0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 1 0 1 0 1 1 0 0 1 1 1 1 0 0 0

Figure: Compressed data (each byte encodes two words of length 4 bits.)

compression ratio (CR) = Uncompressed sizeCompressed size = 2,

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 4/25

Page 11: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Fixed length compression

Fixed length (FL) compression:easy to implement,easy to achieve high data throughput.

Many applications:Database compression: Columns, Indexes,Timeseries compression,Graph compression,etc.

Many variants:Patched FL, Adaptive FL, DELTA-*

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 5/25

Page 12: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Fixed length compression on GPU

Performance over flexibility (Fang et al. 2010)High performance but highly simplified version of algorithm.

Words are mapped to full bytes e.g. 4 bits word will be mapped to 1 byte.Uses map primitive.Coalesced reads and writes: YES.Direct memory access: YES.

Flexibility over performance(Nvbio and Kaczmarski, Przymus 2012-2017)

No simplifications at the cost of lower performance.Supports all possible bit encodings.Uses allgather or gather primitive.Coalesced reads and writes: NO.Direct memory access: YES.

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 6/25

Page 13: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Fixed length compression on GPU

01

...31

0 1 2 3 4 5 6 7 8 9 101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..

1024 .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. 1055

Figure: Read pattern: GPU version of FL algorithm

0 1 2 3 4 5 6 7 8 9 1011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495

Figure: Write pattern: GPU version of FL algorithm

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 7/25

Page 14: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Fixed length on GPU (C+D, GTX Titan Black)

0 200 400 600 800 1000

Data Size MB

0

20

40

60

80

100

120

Ban

dwid

th G

B/s

int max int min long max long min int long

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 8/25

Page 15: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Fixed length on GPU (1 GB of data, GTX Titan Black)

Bit Encoding 8 16 24 32 40 48 56 63

50

100

150

200

250

300

Com

pr. G

B/s

50

100

150

200

250

300

Dec

ompr

. GB

/s

int long

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 9/25

Page 16: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Can we do better?

Aligned Fixed Length (AFL) algorithm.

The FL algorithm is optimized for CPU memory access scheme.We can do better with GPU friendly memory organisation scheme.

FeaturesNo simplifications, high performance on GPU.Still works quite well on CPU, but loses some cache hits benefits.Supports all possible bit encodings.Uses allgather or gather primitive.Coalesced reads and writes: YES.Direct memory access: YES.

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 10/25

Page 17: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Aligned FL on GPU0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

0 1 2 3 4 5 6 7 8 9 101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..

1024 .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. 1055

Figure: Read pattern: GPU version of Aligned FL algorithm

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

0 1 2 3 4 5 6 7 8 9 1011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495

Figure: Write pattern: GPU version of Aligned FL algorithm

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 11/25

Page 18: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Aligned FL on GPU (C+D, GTX Titan Black)

0 200 400 600 800 1000

Data Size MB

0

20

40

60

80

100

120

Ban

dwid

th G

B/s

int max int min long max long min int long

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 12/25

Page 19: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Aligned FL on GPU (1 GB of data, GTX Titan Black)

Bit Encoding 8 16 24 32 40 48 56 63

50

100

150

200

250

300

Com

pr. G

B/s

50

100

150

200

250

300

Dec

ompr

. GB

/s

int long

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 13/25

Page 20: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Direct memory access

Direct memory access (Random access):single value access with decompression on-the-fly,no need for explicite decompression step,simple integration with existing algorithms.

Example applications of direct access FL:Bioinformatics – NVBIO,Databases – Fang et al. 2010,Timeseries, Graph – Kaczmarski and Przymus 2012-2017.

Aligned FL supports direct access!

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 14/25

Page 21: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Direct access AFL on GPU (C+D, GTX Titan Black)

0 200 400 600 800 1000

Data Size MB

0

20

40

60

80

100

120

Ban

dwid

th G

B/s

int max int min long max long min int long

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 15/25

Page 22: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Initial transformations

A initial data preparation is often used in order to minimize bits usage.Frame of reference (FOR):

Subtracts a given value from all values{v0, v1, . . . , vn} → {v0 − f , v1 − f , . . . , vn − f }.Straightforward and simple integration with AFL.

DELTA:Transforms data into the differences between successive values{v0, v1, . . . , vn} → {v1 − v0, v2 − v1, . . . , vn − vn−1}.Integration with AFL algorithm is not that simple.

Coalesced memory reads and writes require different scheme of datareads.Threads operate on data subsequence {ak , ak+32, ak+64, . . .}.Solution: Interthread communication within warp using shuffleinstruction available starting from the Kepler.

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 16/25

Page 23: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

DELTA Aligned FL on GPU (C+D, GTX Titan Black)

0 200 400 600 800 1000

Data Size MB

0

20

40

60

80

100

120

Ban

dwid

th G

B/s

int max int min long max long min int long

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 17/25

Page 24: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Dealing with the data outliers

FL and AFL algorithms are prone to data outliers.Example:

Input data: {1, 2, 3, 2, 2, 3, 1, 1, 3, 2, 3, 1, 1}2 bits FL (or AFL) encoding may be used.

Input data: {1, 2, 3, 2, 2, 3, 1, 1, 64, 2, 3, 1, 1}6 bits FL (or AFL) encoding may be used.

Solution: Outlier aware algorithmsPatched FL and Patched Aligned FLAdaptive FL and Adaptive Aligned FL

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 18/25

Page 25: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Patched Aligned FL

Patched FL – Zukowski et al. 2006 for CPU.Patched FL (new memory organisation) – Yan et al. 2009 for CPU.Patched FL on GPU – Kaczmarski and Przymus 2012, 2013.

Structure:compressed data, exceptions positions, exceptions values.

Patched Aligned FL:One step compression:

threads gather outliers in local memory buffers,buffer overflow is managed per warp (voting of threads).

Two step decompression:step 1: AFL decompression,step 2: exceptions extraction.

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 19/25

Page 26: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Patched Aligned FL on GPU (C+D, GTX Titan Black)

0 200 400 600 800 1000

Data Size MB

0

20

40

60

80

100

120

Ban

dwid

th G

B/s

int optim. int pessim. long optim. long pessim. int long

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 20/25

Page 27: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Adaptive FL

First introduced by Delbru et al. 2010 for CPU:processes data in chunks,sets different bit length encoding for each chunk.

Adaptive FL does not fit GPU memory model very well.

Adaptive Aligned FL:Aligned FL + new organisation of compressed chunks.Bit length encoding is established per warp.Compressed chunks are order according to warps completion time.

Note that warps may finish in different order then data order.

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 21/25

Page 28: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Adaptive Aligned FL on GPU

202020101023333333393939 5 5 5 1818181831313224 3 3 17383838 1 1111272727132828282837373716 9 9 9 6 121212262626263414141422221515158 8 8 36363636212121 2 2 2 2 1935353535 4 2925 0 0 0 0 7 7 7 7 3030

Figure: Adaptive aligned memory organisation: main compression array.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39

86297423831247906444 3 304835566143251578 0 7159 5 228551323684941921 6 5579674026 9

Figure: Adaptive aligned memory organisation: warp offset index.

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 22/25

Page 29: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Adaptive Aligned FL on GPU (C+D, GTX Titan Black)

0 200 400 600 800 1000

Data Size MB

0

20

40

60

80

100

120

Ban

dwid

th G

B/s

int optim. int pessim. long optim. long pessim. int long

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 23/25

Page 30: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

TPC-H Data Benchmark

Org. / Size Bandwidth [GB/s]

Column used Sort. MB Alg. CR Comp. Decomp.

l discount N 239

FL 8.00 52.7137 34.0756dec. / AFL 8.00 211.8846 181.8800int(4) PAFL 8.00 200.6869 72.8897

AAFL 30.79 101.4848 125.6707

l quantity N 239

FL 2.46 45.4952 32.0122int(4) / AFL 2.46 154.2067 141.1335int(4) PAFL 2.46 153.2897 65.5559

AAFL 9.72 81.5276 105.6029

l partkey N 239

FL 1.52 37.7608 29.4157id / AFL 1.52 125.7666 120.2389int(4) PAFL 1.52 125.0484 60.6303

AAFL 6.05 72.4169 93.3693

l shipdate Y 239

FL 1.06 28.2662 26.2429date / DELTA-AFL 1.88 136.6975 127.8505int(4) DELTA-PAFL 31.91 205.3076 55.5019

DELTA-AAFL 126.31 210.5169 209.3501

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 24/25

Page 31: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

A lightweight compression library for GPU.github.com/mis-wut/feathergpuMIT-licenesed.Supported algorithms:

FL, FOR, Adaptive FL, Patched AFLAligned FL, Aligned FOR, Adaptive Aligned FL, Patched Aligned AFL

GTX Titan Black:Compression up to 250GB/sDecompression up to 250GB/s

Tesla P100 16 GB (New!)Compression up to 550GB/sDecompression up to 450GB/s

K. Kaczmarski and P. Przymus, Fixed Length Lightweight Compression forGPU Revised, Journal of Parallel and Distributed Computing, 2017.

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 25/25

Page 32: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Appendix

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 1/9

Page 33: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Aligned FL on GPU (C+D, Tesla P100 16 GB)

0 200 400 600 800 1000

Data Size MB

0

50

100

150

200

250B

andw

idth

 GB

/s

int max int min long max long min int long

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 2/9

Page 34: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Aligned FL on GPU (1 GB of data, Tesla P100 16 GB)

Bit Encoding 8 16 24 32 40 48 56 63

50100150200250300350400450500

Com

pr. G

B/s

50100150200250300350400450500

Dec

ompr

. GB

/s

int long

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 3/9

Page 35: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Direct access Aligned FL on GPU

Bit Encoding 8 16 24 32 40 48 56 63

50

100

150

200

250

300

Com

pr. G

B/s

50

100

150

200

250

300

Dec

ompr

. GB

/s

int long

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 4/9

Page 36: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

DELTA Aligned FL on GPU

Bit Encoding 8 16 24 32 40 48 56 63

50

100

150

200

250

300

Com

pr. G

B/s

50

100

150

200

250

300

Dec

ompr

. GB

/s

int long

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 5/9

Page 37: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Patched Aligned FL on GPU (compression)

0 200 400 600 800 1000

Data Size MB

0

50

100

150

200

250B

andw

idth

 GB

/s

int optim. int pessim. long optim. long pessim. int long

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 6/9

Page 38: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Patched Aligned FL on GPU (decompression)

0 200 400 600 800 1000

Data Size MB

0

50

100

150

200

250B

andw

idth

 GB

/s

int optim. int pessim. long optim. long pessim. int long

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 7/9

Page 39: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Adaptive Aligned FL on GPU (compression)

0 200 400 600 800 1000

Data Size MB

0

50

100

150

200

250B

andw

idth

 GB

/s

int optim. int pessim. long optim. long pessim. int long

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 8/9

Page 40: Lightweight Compression Methods Achieving 120GBps and …on-demand.gputechconf.com/gtc/2017/presentation/s...Aix-Marseille University, France GPU Technology Conference Silicon Valley

Adaptive Aligned FL on GPU (decompression)

0 200 400 600 800 1000

Data Size MB

0

50

100

150

200

250B

andw

idth

 GB

/s

int optim. int pessim. long optim. long pessim. int long

P. Przymus Lightweight Compression Methods Achieving 120GBps and More 9/9