© Copyright - TTNR Labs Limited 2018
JCTools: Pit Of Performance
Nitsan Wakart/@nitsanw
You Look Familiar...
Why me?
● Main JCTools contributor
● Performance Eng.
● Find me on:
– GitHub : nitsanw
– Blog : psy-lob-saw.blogspot.com
– Twitter : nitsanw
● Also: Cape Town JUG Organizer
JCTools?
● Concurrent Queues:
– SPSC/MPSC/SPMC/MPMC
– Linked/Array/LinkedArray/Compound
● And MOAR!!!
What is this about?
● Optimizations as found in the code
● Design pressures/choices
● Novel algorithm: MPSC linked queues
From the codebase...
● Layout
● Unsafe
● Concurrency specialization
● Bits and bobs
Layout
abstract class ...L1Pad<E> extends ...ColdField<E> {
    long p00,p01,p02,p03,p04,p05,p06,p07,
         p08,p09,p10,p11,p12,p13,p14;
}
abstract class ...ProducerIndexFields<E> extends ...L1Pad<E> {
    private volatile long producerIndex;
    protected long producerLimit;
}
abstract class ...L2Pad<E> extends ...ProducerIndexFields<E> {
    long p00,p01,p02,p03,p04,p05,p06,p07,
         p08,p09,p10,p11,p12,p13,p14;
}
Measure with/without:
With padding: ~950 ns/op
Sans padding: ~5000 ns/op
*QueueBurst, 100 messages benchmark
Java Object Layout
● 8b aligned
● Object header (8/12/16b)
● Subclass fields are laid out after parent fields
● Field order within a class can change
Let’s use JOL!!!
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# WARNING | Compressed references base/shifts are guessed by the experiment!
# WARNING | Therefore, computed addresses are just guesses, and ARE NOT RELIABLE.
# WARNING | Make sure to attach Serviceability Agent to get the reliable addresses.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

Instantiated the sample instance via public org.jctools.queues.FFBuffer(int)

org.jctools.queues.SpscArrayQueue object internals:
 OFFSET  SIZE  TYPE                DESCRIPTION  VALUE
      0     4                      (object header)  01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4                      (object header)  00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4                      (object header)  37 28 01 f8 (00110111 00101000 00000001 11111000) (-134141897)
     12     4                      (alignment/padding gap)
     16     8  long                ConcurrentCircularArrayQueueL0Pad.p00  0
     24     8  long                ConcurrentCircularArrayQueueL0Pad.p01  0
     32     8  long                ConcurrentCircularArrayQueueL0Pad.p02  0
     40     8  long                ConcurrentCircularArrayQueueL0Pad.p03  0
     48     8  long                ConcurrentCircularArrayQueueL0Pad.p04  0
     56     8  long                ConcurrentCircularArrayQueueL0Pad.p05  0
     64     8  long                ConcurrentCircularArrayQueueL0Pad.p06  0
     72     8  long                ConcurrentCircularArrayQueueL0Pad.p07  0
     80     8  long                ConcurrentCircularArrayQueueL0Pad.p08  0
     88     8  long                ConcurrentCircularArrayQueueL0Pad.p09  0
     96     8  long                ConcurrentCircularArrayQueueL0Pad.p10  0
    104     8  long                ConcurrentCircularArrayQueueL0Pad.p11  0
    112     8  long                ConcurrentCircularArrayQueueL0Pad.p12  0
    120     8  long                ConcurrentCircularArrayQueueL0Pad.p13  0
    128     8  long                ConcurrentCircularArrayQueueL0Pad.p14  0
    136     8  long                ConcurrentCircularArrayQueue.mask  3
    144     4  java.lang.Object[]  ConcurrentCircularArrayQueue.buffer  [null, null, null, null]
    148     4  int                 SpscArrayQueueColdField.lookAheadStep  1
    152     8  long                SpscArrayQueueL1Pad.p00  0
    160     8  long                SpscArrayQueueL1Pad.p01  0
    168     8  long                SpscArrayQueueL1Pad.p02  0
    176     8  long                SpscArrayQueueL1Pad.p03  0
    184     8  long                SpscArrayQueueL1Pad.p04  0
    192     8  long                SpscArrayQueueL1Pad.p05  0
    200     8  long                SpscArrayQueueL1Pad.p06  0
    208     8  long                SpscArrayQueueL1Pad.p07  0
    216     8  long                SpscArrayQueueL1Pad.p08  0
    224     8  long                SpscArrayQueueL1Pad.p09  0
    232     8  long                SpscArrayQueueL1Pad.p10  0
    240     8  long                SpscArrayQueueL1Pad.p11  0
    248     8  long                SpscArrayQueueL1Pad.p12  0
    256     8  long                SpscArrayQueueL1Pad.p13  0
    264     8  long                SpscArrayQueueL1Pad.p14  0
    272     8  long                SpscArrayQueueProducerIndexFields.producerIndex  0
    280     8  long                SpscArrayQueueProducerIndexFields.producerLimit  0
    288     8  long                SpscArrayQueueL2Pad.p00  0
    296     8  long                SpscArrayQueueL2Pad.p01  0
    304     8  long                SpscArrayQueueL2Pad.p02  0
    312     8  long                SpscArrayQueueL2Pad.p03  0
    320     8  long                SpscArrayQueueL2Pad.p04  0
    328     8  long                SpscArrayQueueL2Pad.p05  0
    336     8  long                SpscArrayQueueL2Pad.p06  0
    344     8  long                SpscArrayQueueL2Pad.p07  0
    352     8  long                SpscArrayQueueL2Pad.p08  0
    360     8  long                SpscArrayQueueL2Pad.p09  0
    368     8  long                SpscArrayQueueL2Pad.p10  0
    376     8  long                SpscArrayQueueL2Pad.p11  0
    384     8  long                SpscArrayQueueL2Pad.p12  0
    392     8  long                SpscArrayQueueL2Pad.p13  0
    400     8  long                SpscArrayQueueL2Pad.p14  0
    408     8  long                SpscArrayQueueConsumerIndexField.consumerIndex  0
    416     8  long                SpscArrayQueueL3Pad.p00  0
    424     8  long                SpscArrayQueueL3Pad.p01  0
    432     8  long                SpscArrayQueueL3Pad.p02  0
    440     8  long                SpscArrayQueueL3Pad.p03  0
    448     8  long                SpscArrayQueueL3Pad.p04  0
    456     8  long                SpscArrayQueueL3Pad.p05  0
    464     8  long                SpscArrayQueueL3Pad.p06  0
    472     8  long                SpscArrayQueueL3Pad.p07  0
    480     8  long                SpscArrayQueueL3Pad.p08  0
    488     8  long                SpscArrayQueueL3Pad.p09  0
    496     8  long                SpscArrayQueueL3Pad.p10  0
    504     8  long                SpscArrayQueueL3Pad.p11  0
    512     8  long                SpscArrayQueueL3Pad.p12  0
    520     8  long                SpscArrayQueueL3Pad.p13  0
    528     8  long                SpscArrayQueueL3Pad.p14  0
Instance size: 536 bytes
Let’s use JOL!!!
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Let’s use JOL!!!
org.jctools.queues.SpscArrayQueue object internals:
 OFFSET  SIZE  TYPE                DESCRIPTION
      0    12                      (object header)
     12     4                      (alignment/padding gap)
     16   120  long                ConcurrentCircularArrayQueueL0Pad.p00 [... p14]
    136     8  long                ConcurrentCircularArrayQueue.mask
    144     4  java.lang.Object[]  ConcurrentCircularArrayQueue.buffer
    148     4  int                 SpscArrayQueueColdField.lookAheadStep
    152   120  long                SpscArrayQueueL1Pad.p00 [... p14]
    272     8  long                SpscArrayQueueProducerIndexFields.producerIndex
    280     8  long                SpscArrayQueueProducerIndexFields.producerLimit
    288   120  long                SpscArrayQueueL2Pad.p00 [... p14]
    408     8  long                SpscArrayQueueConsumerIndexField.consumerIndex
    416   120  long                SpscArrayQueueL3Pad.p00 [... p14]
Instance size: 536 bytes
Why bother with layout?
● False sharing
● Minimize load misses
● Offheap?
– Atomicity
– Alignment requirements (line/page/sector)
– Different method (see Aeron)
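To make the false sharing point concrete, here is a minimal sketch of the pad/field/pad inheritance pattern applied to two counters, each written by its own thread. All class and field names are hypothetical, the 15-long pad size simply copies the slides, and the elapsed time is only returned, not asserted, since it depends on the machine. Removing the pad classes and rerunning is one rough way to see the effect; a proper comparison would need a JMH benchmark.

```java
// Hypothetical sketch: pad / hot field / pad, built through inheritance so the
// JVM cannot reorder the hot fields next to each other.
abstract class PairPad0 {
    long p00, p01, p02, p03, p04, p05, p06, p07, p08, p09, p10, p11, p12, p13, p14;
}

abstract class PairFieldA extends PairPad0 {
    volatile long a; // written only by thread A
}

abstract class PairPad1 extends PairFieldA {
    long q00, q01, q02, q03, q04, q05, q06, q07, q08, q09, q10, q11, q12, q13, q14;
}

abstract class PairFieldB extends PairPad1 {
    volatile long b; // written only by thread B
}

final class PaddedPair extends PairFieldB {
    long r00, r01, r02, r03, r04, r05, r06, r07, r08, r09, r10, r11, r12, r13, r14;

    /** Runs one single-writer thread per field and returns the elapsed nanoseconds. */
    static long run(PaddedPair pair, long iterations) {
        Thread ta = new Thread(() -> {
            for (long i = 0; i < iterations; i++) pair.a = pair.a + 1;
        });
        Thread tb = new Thread(() -> {
            for (long i = 0; i < iterations; i++) pair.b = pair.b + 1;
        });
        long start = System.nanoTime();
        ta.start();
        tb.start();
        try {
            ta.join();
            tb.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        PaddedPair pair = new PaddedPair();
        long nanos = run(pair, 10_000_000L);
        System.out.println("a=" + pair.a + " b=" + pair.b + " elapsed ns=" + nanos);
    }
}
```

Each field has a single writer, so the unsynchronized read-modify-write is safe here; it would not be with several writers per field.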
Sharing?
header header pIndex cIndex mask buffer
False Sharing
header header pIndex cIndex Mask buffer
Pad to resolve...???
header header pad12 pad13 pad14 pad15 pad16
pad17 pad20 pad21 pad22 pad23 pad24 pad25 pad26
pIndex pMask pBuffer pad10 pad11 pad12 pad13 pad14
pad15 pad16 pad17 pad20 pad21 pad22 pad23 pad24
pad25 pad26 pad27 cIndex cMask cBuffer pad10 pad11
Alternatives To Method?
● Rely on field order
● @Contended (field/class level, -XX:-Restrict)
● Go Offheap
What would be nice?
@ForceLayout(align=64) // state alignment preference
class X {
    long field1;   // still type aligned
    byte[120] pad; // allocate presized array inline
    long field2;
}
Unsafe
Looks safe enough...
abstract class ...ProducerIndexFields<E> extends SpscArrayQueueL1Pad<E> {
    final static long P_INDEX_OFFSET =
        fieldOffset(...Fields.class, "producerIndex");
    private volatile long producerIndex;
    protected long producerLimit;

    // lv = load volatile
    long lvProducerIndex() {
        return producerIndex;
    }
    // lp = load plain
    long lpProducerIndex() {
        return UNSAFE.getLong(this, P_INDEX_OFFSET);
    }
    // so = store ordered
    void soProducerIndex(long newValue) {
        UNSAFE.putOrderedLong(this, P_INDEX_OFFSET, newValue);
    }
}
Why do we need Unsafe?
● Better than a poke in the eye with a sharp stick?
● Performance?
● Compatibility????!??!?!?!?
Unsafe for Compatibility?
● Just Works!™ (JDK6,7,8,11)
● Shouldn’t we use VarHandle?
Unsafe vs. A*FU (pre-8u101)
With Unsafe : ~950 ns/op
With A*FU@7 : ~1350 ns/op
* Tested with JDK 7u80 as an example
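For readers unfamiliar with A*FU, this is roughly what the Atomic*FieldUpdater flavour of the producer index accessors could look like. It is a sketch only: the class name is hypothetical and this is not the JCTools source. Note that the updater has no plain-load equivalent of lpProducerIndex(), since its get() is a volatile read.

```java
import java.util.concurrent.atomic.AtomicLongFieldUpdater;

// Hypothetical class: the same accessors as the Unsafe version, via a field updater.
final class ProducerIndexViaUpdater {
    private static final AtomicLongFieldUpdater<ProducerIndexViaUpdater> P_INDEX =
        AtomicLongFieldUpdater.newUpdater(ProducerIndexViaUpdater.class, "producerIndex");

    // Field updaters require the target field to be volatile.
    private volatile long producerIndex;

    /** Load volatile. */
    long lvProducerIndex() {
        return producerIndex;
    }

    /** Store ordered: lazySet is the updater counterpart of putOrderedLong. */
    void soProducerIndex(long newValue) {
        P_INDEX.lazySet(this, newValue);
    }

    /** Compare-and-set, as a multi-producer queue would need. */
    boolean casProducerIndex(long expected, long newValue) {
        return P_INDEX.compareAndSet(this, expected, newValue);
    }
}
```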
Unsafe vs. A*FU (post 8u101/11)
With Unsafe : ~950 ns/op
With A*FU@8 : ~1050 ns/op
With A*FU@11 : ~1150 ns/op
See: https://shipilev.net/blog/2015/faster-atomic-fu/
Unsafe vs VarHandles (>=11)
With Unsafe : ~950 ns/op
With VarHandle : ~1150 ns/op
Unsafe for performance?
With Unsafe@7,8,11 : ~950 ns/op
With A*FU@7 : ~1300 ns/op
With A*FU@8 : ~1050 ns/op
With A*FU@11 : ~1150 ns/op
With VarHandle@11 : ~1150 ns/op
Unsafe vs Alternatives
● Pre-JDK11?
– Volatile
– AtomicBoolean/Integer/Long/*Array
– Atomic*FieldUpdater
● JDK11+?
– Should move to VarHandle?
– Multi-release jars
● Offheap? GO UNSAFE!!!
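As an illustration of the JDK11+ option, the same three accessors can be written with a VarHandle (available since JDK 9). This is a sketch under assumptions: the class name is hypothetical, and mapping the "ordered" store to setRelease is the usual translation rather than something the slides state.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical class: lv/lp/so accessors expressed with VarHandle access modes.
final class ProducerIndexViaVarHandle {
    private static final VarHandle P_INDEX;

    static {
        try {
            P_INDEX = MethodHandles.lookup()
                .findVarHandle(ProducerIndexViaVarHandle.class, "producerIndex", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // The field need not be volatile: the access mode is chosen per call.
    private long producerIndex;

    /** Load volatile. */
    long lvProducerIndex() {
        return (long) P_INDEX.getVolatile(this);
    }

    /** Load plain. */
    long lpProducerIndex() {
        return (long) P_INDEX.get(this);
    }

    /** Store ordered, written here as a release store. */
    void soProducerIndex(long newValue) {
        P_INDEX.setRelease(this, newValue);
    }
}
```

Unlike the field updater, the VarHandle does offer a plain load, so all three accessors keep their distinct memory semantics.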
Concurrency Specialization
Generic vs. Specialized
● JDK?
– MPMC
– Lock based (ArrayBlockingQ/LinkedBlockingQ)
– Lock-less (ConcurrentLinkedQ)
– Allocation?
● JCTools?
– MPMC/SPMC/MPSC/SPSC
– Array backed/Linked/both!
– Lock-less/lock-free/wait-free
– Zero allocation options
Generic vs. Specialized
● Load: plain, opaque, volatile
● Store: plain, ordered, volatile
● compareAndSwap
● getAndAdd
Single Writer
● Load: plain/volatile
● Store: ordered
Multi-Writer
● Load: plain/volatile
● Store: getAndAdd/compareAndSwap
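The single-writer and multi-writer cases above can be sketched with a plain AtomicLong standing in for a queue index. The class and method names are hypothetical. The point is only which operation each case needs: an ordered store when one thread owns the index, an atomic read-modify-write when several threads compete for it.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper showing the three index-update styles.
final class IndexUpdates {
    private IndexUpdates() {}

    /** Single writer: nobody else writes, so load + ordered store is enough. */
    static long singleWriterAdvance(AtomicLong index) {
        long next = index.get() + 1;
        index.lazySet(next); // ordered store, no full fence
        return next;
    }

    /** Multi-writer, unconditional claim: getAndAdd always succeeds. */
    static long multiWriterClaim(AtomicLong index) {
        return index.getAndIncrement(); // returns the claimed (previous) value
    }

    /** Multi-writer, conditional claim: the CAS loop lets us check a limit first. */
    static long multiWriterClaimBounded(AtomicLong index, long limit) {
        long current;
        do {
            current = index.get();
            if (current >= limit) {
                return -1; // full: nothing claimed
            }
        } while (!index.compareAndSet(current, current + 1));
        return current;
    }
}
```

The CAS loop can retry under contention, while getAndAdd cannot fail but also cannot be made conditional.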
Generic vs. Specialized
SPSC – 950 ns/op
MPSC – 6500 ns/op
MPMC – 5500 ns/op
CLQ – 13800 ns/op
ABQ – 36000 ns/op
Lock less/Lock free/Wait free
SPSC – 950 ns/op – WF
MPSC – 5000/6500 ns/op – LF/LL
MPMC – 5000/5500 ns/op – LF/LL
CLQ – 13800 ns/op – LL
ABQ – 36000 ns/op – Locks...
Allocations
CLQ – 2400b per op (100 messages)
LBQ – ~3000b
ABQ – ?
Allocations
CLQ – 2400b per op (100 messages)
LBQ – ~3000b
ABQ – ~1500b!!!
JCTools array backed - 0b
Helping the Compiler
Const vs Final
private final long size = <some power of 2>;
long offset = REF_ARRAY_BASE +
    ((consumerIndex % size) * REF_ELEMENT_SCALE);

long offset = REF_ARRAY_BASE +
    ((consumerIndex & (size-1)) * REF_ELEMENT_SCALE);
Const vs Final
long offset = REF_ARRAY_BASE +
    ((consumerIndex & (size-1)) * REF_ELEMENT_SCALE);

long offset = REF_ARRAY_BASE +
    ((consumerIndex & mask) << REF_ELEMENT_SHIFT);
Const vs Final
long offset = REF_ARRAY_BASE +
    ((consumerIndex & mask) << REF_ELEMENT_SHIFT);

long offset = calcElementOffset(consumerIndex, mask);
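The rewrite from modulo-and-multiply to mask-and-shift only holds when the size is a power of two and the index is non-negative. A small sketch makes that checkable. The constants below are assumed values chosen for illustration (a 16-byte array base and 4-byte compressed references), not values read from a real JVM, and the class name is hypothetical.

```java
// Hypothetical helper comparing the two offset computations.
final class ElementOffsets {
    static final long REF_ARRAY_BASE = 16;  // assumed array header size
    static final int REF_ELEMENT_SHIFT = 2; // assumed: 4-byte references
    static final long REF_ELEMENT_SCALE = 1L << REF_ELEMENT_SHIFT;

    private ElementOffsets() {}

    /** First form on the slides: modulo and multiply. */
    static long withModulo(long index, long size) {
        return REF_ARRAY_BASE + ((index % size) * REF_ELEMENT_SCALE);
    }

    /** Last form on the slides: mask and shift, valid for power-of-two sizes. */
    static long calcElementOffset(long index, long mask) {
        return REF_ARRAY_BASE + ((index & mask) << REF_ELEMENT_SHIFT);
    }

    public static void main(String[] args) {
        long size = 8;
        for (long index = 0; index < 32; index++) {
            if (withModulo(index, size) != calcElementOffset(index, size - 1)) {
                throw new AssertionError("mismatch at index " + index);
            }
        }
        System.out.println("modulo and mask forms agree for size " + size);
    }
}
```

Storing the mask in a field, rather than recomputing size-1, and passing it into a static helper keeps the hot path free of both the subtraction and the division.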
Loads Across a Volatile Load
public E poll() {
    long consumerIndex = lpConsumerIndex();
    E[] buffer = this.buffer;
    long offset = calcElementOffset(consumerIndex, mask);
    E e = (E) UNSAFE.getObjectVolatile(buffer, offset);
    if (null == e) return null; // EMPTY
    UNSAFE.putOrderedObject(buffer, offset, null);
    soConsumerIndex(consumerIndex + 1);
    return e;
}
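The same poll() shape can be tried without Unsafe. The sketch below is an assumption-laden stand-in, not the JCTools implementation: it uses AtomicReferenceArray and AtomicLong from the JDK, a hypothetical class name, and no padding or look-ahead. It keeps the key idea that a null slot means empty for the consumer and a non-null slot means full for the producer.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical single-producer/single-consumer queue sketch.
final class SpscSketchQueue<E> {
    private final AtomicReferenceArray<E> buffer;
    private final long mask;
    private final AtomicLong producerIndex = new AtomicLong(); // written by producer only
    private final AtomicLong consumerIndex = new AtomicLong(); // written by consumer only

    SpscSketchQueue(int capacity) {
        if (capacity < 1 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of 2");
        }
        buffer = new AtomicReferenceArray<>(capacity);
        mask = capacity - 1;
    }

    /** Producer thread only. Returns false when the queue is full. */
    boolean offer(E e) {
        if (e == null) throw new NullPointerException("null marks an empty slot");
        long pIndex = producerIndex.get();
        int slot = (int) (pIndex & mask);
        if (buffer.get(slot) != null) return false; // FULL
        buffer.lazySet(slot, e);                    // ordered store publishes the element
        producerIndex.lazySet(pIndex + 1);
        return true;
    }

    /** Consumer thread only. Returns null when the queue is empty. */
    E poll() {
        long cIndex = consumerIndex.get();
        int slot = (int) (cIndex & mask);
        E e = buffer.get(slot);                     // volatile load of the element
        if (e == null) return null;                 // EMPTY
        buffer.lazySet(slot, null);                 // free the slot for the producer
        consumerIndex.lazySet(cIndex + 1);
        return e;
    }
}
```

A quick single-threaded check: with capacity 2, two offers succeed, a third returns false, and polls then return the elements in insertion order followed by null.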
A Novel Algorithm
Typical Offerings
● Array backed
– Pre-allocated, fixed sized
● Linked
– Allocate cell per element
– No pre-allocation
– Bounded/Unbounded
Actor Use Case
● Lots of queues
● Some full & busy
● Most empty
● MPSC
Linked Array Queues?
● Middle Ground?
● Allocate a multi-element ‘cell/chunk’
● Stay in ‘cell/chunk’? → no allocation
● Resizing on the fly
Challenges?
● Lock less algo?
● Linking new array
● Sizing
Hanging on a Bit
● Steal producerIndex parity bit
– Fixup code to match
● CAS parity to block ALL other producers
– Let them spin
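One way to picture the parity trick, as a sketch with hypothetical names rather than the JCTools code: the index advances by 2 per element so the low bit is free, and a producer that needs to link a new array sets that bit with a CAS. Every other producer then sees an odd value and has to retry. How the real queue clears the bit and claims its slot may differ from endResize() below.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: even index = normal operation, odd index = resize in progress.
final class ParityBitIndex {
    private final AtomicLong producerIndex = new AtomicLong();

    /** Returns the claimed logical slot, or -1 if the caller must retry (resize or lost race). */
    long tryClaim() {
        long current = producerIndex.get();
        if ((current & 1) == 1) {
            return -1; // someone is resizing: spin and retry
        }
        return producerIndex.compareAndSet(current, current + 2) ? current >> 1 : -1;
    }

    /** Sets the parity bit; only one producer can win this CAS. */
    boolean tryBeginResize() {
        long current = producerIndex.get();
        return (current & 1) == 0 && producerIndex.compareAndSet(current, current + 1);
    }

    /** Called by the winner only: clears the parity bit and unblocks the other producers. */
    void endResize() {
        producerIndex.set(producerIndex.get() & ~1L);
    }
}
```

The "fixup code to match" on the slide corresponds to the `>> 1` here: every place that turns the index into a slot has to account for the stolen bit.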
JUMP indicator
● Notify consumer needs to jump
● Use extra cell in array as next pointer
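The JUMP idea can be shown with a deliberately single-threaded sketch. Everything here is an assumption made for illustration: the class name, the fixed chunk size, and the choice to spend the last data slot on the JUMP marker. It has no concurrency control at all, so it only demonstrates the data layout: each chunk carries one extra cell holding the next chunk, and a sentinel tells the consumer to follow it.

```java
// Hypothetical, NOT thread-safe: layout demo for chunk linking with a JUMP sentinel.
final class LinkedChunksSketch {
    private static final Object JUMP = new Object();

    private final int chunkSize;     // slots per chunk, the last one is reserved for JUMP
    private Object[] producerChunk;
    private Object[] consumerChunk;
    private int producerPos;
    private int consumerPos;

    LinkedChunksSketch(int chunkSize) {
        if (chunkSize < 2) throw new IllegalArgumentException("chunkSize must be >= 2");
        this.chunkSize = chunkSize;
        // One extra cell at index chunkSize holds the pointer to the next chunk.
        producerChunk = consumerChunk = new Object[chunkSize + 1];
    }

    void offer(Object e) {
        if (e == null) throw new NullPointerException("null marks an empty slot");
        if (producerPos == chunkSize - 1) {
            Object[] next = new Object[chunkSize + 1];
            producerChunk[chunkSize] = next;  // link the new chunk first
            producerChunk[producerPos] = JUMP; // then tell the consumer to jump
            producerChunk = next;
            producerPos = 0;
        }
        producerChunk[producerPos++] = e;
    }

    Object poll() {
        Object e = consumerChunk[consumerPos];
        if (e == JUMP) {
            consumerChunk = (Object[]) consumerChunk[chunkSize]; // old chunk becomes garbage
            consumerPos = 0;
            e = consumerChunk[0];
        }
        if (e == null) return null; // EMPTY
        consumerChunk[consumerPos++] = null;
        return e;
    }
}
```

While producer and consumer stay inside one chunk, offer and poll allocate nothing; allocation happens only when a chunk boundary is crossed.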
Performance?
● Slightly slower than array backed MPSC
● Very similar throughput
● Allocates as needed:
– Stay in chunk → no allocation
– Growable → allocate until capacity
– Chunked → bounded, fixed chunks
– Unbounded → blow yer heap, in fixed increments!
In Closing...
Thanks!