a quest for predictable latency...some queue implementations spend more time queueing to be enqueued...

112
A Quest for Predictable Latency Adventures in Java Concurrency Martin Thompson - @mjpt777

Upload: others

Post on 01-Oct-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

A Quest for

Predictable LatencyAdventures in Java Concurrency

Martin Thompson - @mjpt777

Page 2: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition
Page 3: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

If a system does not respond

in a timely manner then it is

effectively unavailable

Page 4: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

1. It’s all about the Blocking

2. What do we mean by Latency

3. Adventures with Locks and queues

4. Some Alternative FIFOs

5. Where can we go Next

Page 5: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

1. It’s all about the

Blocking

Page 6: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

What is preventing progress?

Page 7: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

A thread is blocked when it

cannot make progress

Page 8: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

There are two major causes of

blocking

Page 9: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Blocking

• Systemic Pauses

JVM Safepoints (GC, etc.)

Transparent Huge Pages (Linux)

Hardware (C-States, SMIs)

Page 10: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Blocking

• Concurrent Algorithms

Notifying Completion

Mutual Exclusion (Contention)

Synchronisation / Rendezvous

• Systemic Pauses

JVM Safepoints (GC, etc.)

Transparent Huge Pages (Linux)

Hardware (C-States, SMIs)

Page 11: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

• Call into kernel on contention

• Always blocking

• Difficult to get right

• Execute in user space

• Can be non-blocking

• Very difficult to get right

Atomic/CAS InstructionsLocks

Concurrent Algorithms

Page 12: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

2. What do we mean by

Latency

Page 13: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Queuing Theory

Page 14: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Queuing Theory

Service Time

Page 15: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Queuing Theory

Latent/Wait Time

Service Time

Page 16: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Queuing Theory

Service Time

Response

Time

Latent/Wait Time

Page 17: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Queuing Theory

Service Time

Response

Time

DequeueEnqueue

Latent/Wait Time

Page 18: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Queuing Theory

0.0

2.0

4.0

6.0

8.0

10.0

12.0

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

Re

spo

nse

Tim

e

Utilisation

Page 19: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Amdahl’s & Universal Scalability Laws

0

2

4

6

8

10

12

14

16

18

20

1 2 4 8 16 32 64 128 256 512 1024

Sp

ee

du

p

Processors

Amdahl USL

Page 20: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

3. Adventures with

Locks and Queues?

Page 21: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Some queue implementations

spend more time queueing to be

enqueued than in queue time!!!

Page 22: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

The evils of Blocking

Page 23: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Queue.put()

&&

Queue.take()

Page 24: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Condition Variables

Page 25: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Condition Variable RTT

Echo

Service

Histogram

Record

Signal Ping

Signal Pong

Page 26: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

µs

Page 27: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Max =

5525.503µs

µs

Page 28: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Bad News

That’s the best case scenario!

Page 29: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Are non-blocking APIs any better?

Page 30: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Queue.offer()

&&

Queue.poll()

Page 31: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

https://github.com/real-logic/benchmarks

Java 8u60

Ubuntu 15.04 – “performance” mode

Intel i7-3632QM – 2.2 GHz (Ivy Bridge)

Context for the Benchmarks

Page 32: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Prod # Mean 99%

Baseline (Wait-free) 1 167 189

ArrayBlockingQueue 1 645 1,210

2 1,984 11,648

3 5,257 19,680

LinkedBlockingQueue 1 461 740

2 1,197 7,320

3 2,010 15,152

ConcurrentLinkedQueue 1 281 361

2 381 559

3 444 705

Burst Length = 1: RTT (ns)

Page 33: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Prod # Mean 99%

Baseline (Wait-free) 1 721 982

ArrayBlockingQueue 1 30,346 41,728

2 35,631 65,008

3 50,271 90,240

LinkedBlockingQueue 1 31,422 41,600

2 45,096 94,208

3 89,820 180,224

ConcurrentLinkedQueue 1 12,916 15,792

2 25,132 35,136

3 39,462 56,768

Burst Length = 100: RTT (ns)

Page 34: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Prod # Mean 99%

Baseline (Wait-free) 1 721 982

ArrayBlockingQueue 1 30,346 41,728

2 35,631 65,008

3 50,271 90,240

LinkedBlockingQueue 1 31,422 41,600

2 45,096 94,208

3 89,820 180,224

ConcurrentLinkedQueue 1 12,916 15,792

2 25,132 35,136

3 39,462 56,768

Burst Length = 100: RTT (ns)

Page 35: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Backpressure?

Size methods?

Flow Rates?

Garbage?

Fan out?

Page 36: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

5. Some alternative

FIFOs

Page 37: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Inter-Thread FIFOs

Page 38: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Disruptor 1.0 – 2.0

claimSequence <<CAS>>

cursor

gating

Reference Ring Buffer

Influences: Lamport + Network Cards

0

0

0

Page 39: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Disruptor 1.0 – 2.0

claimSequence <<CAS>>

cursor

gating

Reference Ring Buffer

Influences: Lamport + Network Cards

1

0

0

Page 40: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Disruptor 1.0 – 2.0

claimSequence <<CAS>>

cursor

gating

Reference Ring Buffer

Influences: Lamport + Network Cards

1

0

0

Page 41: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Disruptor 1.0 – 2.0

claimSequence <<CAS>>

cursor

gating

Reference Ring Buffer

Influences: Lamport + Network Cards

1

1

0

Page 42: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Disruptor 1.0 – 2.0

claimSequence <<CAS>>

cursor

gating

Reference Ring Buffer

Influences: Lamport + Network Cards

1

1

0

Page 43: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Disruptor 1.0 – 2.0

claimSequence <<CAS>>

cursor

gating

Reference Ring Buffer

Influences: Lamport + Network Cards

1

1

1

Page 44: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

long expectedSequence = claimedSequence - 1;

while (cursor != expectedSequence)

{

// busy spin

}

cursor = claimedSequence;

Page 45: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Disruptor 3.0

gatingCache

cursor <<CAS>>

gating

Reference Ring Buffer

available

Influences: Lamport + Network Cards + Fast Flow

0

0

0

Page 46: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Disruptor 3.0

gatingCache

cursor <<CAS>>

gating

Reference Ring Buffer

available

Influences: Lamport + Network Cards + Fast Flow

0

1

0

Page 47: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Disruptor 3.0

gatingCache

cursor <<CAS>>

gating

Reference Ring Buffer

available

Influences: Lamport + Network Cards + Fast Flow

0

1

0

Page 48: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Disruptor 3.0

gatingCache

cursor <<CAS>>

gating

Reference Ring Buffer

available

Influences: Lamport + Network Cards + Fast Flow

0

1

0

1

Page 49: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Disruptor 3.0

gatingCache

cursor <<CAS>>

gating

Reference Ring Buffer

available

Influences: Lamport + Network Cards + Fast Flow

0

1

0

1

Page 50: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Disruptor 3.0

gatingCache

cursor <<CAS>>

gating

Reference Ring Buffer

available

Influences: Lamport + Network Cards + Fast Flow

0

1

1

1

Page 51: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Disruptor 2.0

Page 52: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Disruptor 3.0

Page 53: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

What if I need something with a

Queue interface?

Page 54: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

ManyToOneConcurrentArrayQueue

Influences: Lamport + Fast Flow

headCache

tail <<CAS>>

head

Reference Ring Buffer

0

0

0

Page 55: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

ManyToOneConcurrentArrayQueue

Influences: Lamport + Fast Flow

headCache

tail <<CAS>>

head

Reference Ring Buffer

0

1

0

Page 56: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

ManyToOneConcurrentArrayQueue

Influences: Lamport + Fast Flow

headCache

tail <<CAS>>

head

Reference Ring Buffer

0

1

0

Page 57: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

ManyToOneConcurrentArrayQueue

Influences: Lamport + Fast Flow

headCache

tail <<CAS>>

head

Reference Ring Buffer

0

1

0

Page 58: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

ManyToOneConcurrentArrayQueue

Influences: Lamport + Fast Flow

headCache

tail <<CAS>>

head

Reference Ring Buffer

0

1

1

Page 59: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Prod # Mean 99%

Baseline (Wait-free) 1 167 189

ConcurrentLinkedQueue 1 281 361

2 381 559

3 444 705

Disruptor 3.3.2 1 201 263

2 256 373

3 282 459

ManyToOneConcurrentArrayQueue 1 173 198

2 238 319

3 259 392

Burst Length = 1: RTT (ns)

Page 60: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Prod # Mean 99%

Baseline (Wait-free) 1 167 189

ConcurrentLinkedQueue 1 281 361

2 381 559

3 444 705

Disruptor 3.3.2 1 201 263

2 256 373

3 282 459

ManyToOneConcurrentArrayQueue 1 173 198

2 238 319

3 259 392

Burst Length = 1: RTT (ns)

Page 61: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Prod # Mean 99%

Baseline (Wait-free) 1 721 982

ConcurrentLinkedQueue 1 12,916 15,792

2 25,132 35,136

3 39,462 56,768

Disruptor 3.3.2 1 7,030 8,192

2 15,159 21,184

3 31,597 54,976

ManyToOneConcurrentArrayQueue 1 3,570 3,932

2 12,926 16,480

3 26,685 51,072

Burst Length = 100: RTT (ns)

Page 62: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Prod # Mean 99%

Baseline (Wait-free) 1 721 982

ConcurrentLinkedQueue 1 12,916 15,792

2 25,132 35,136

3 39,462 56,768

Disruptor 3.3.2 1 7,030 8,192

2 15,159 21,184

3 31,597 54,976

ManyToOneConcurrentArrayQueue 1 3,570 3,932

2 12,926 16,480

3 26,685 51,072

Burst Length = 100: RTT (ns)

Page 63: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

BUT!!!

Disruptor has single producer

and batch methods

Page 64: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Inter-Process FIFOs

Page 65: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

ManyToOneRingBuffer

Page 66: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Tail <<CAS>>

File

Message 2

Header

Message 3

Header

Head

Page 67: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Tail <<CAS>>

File

Message 2

Header

Message 3

Header

Head

Page 68: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Tail <<CAS>>

File

Message 2

Header

Message 3

Header

Head

Message 4

Page 69: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Tail <<CAS>>

File

Message 2

Header

Message 3

Header

Head

Message 4

Header

Page 70: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Tail <<CAS>>

File

Message 3

Header

Head

Message 4

Header

Page 71: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Tail <<CAS>>

File

Message 3

HeaderHead

Message 4

Header

Page 72: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

MPSC Ring Buffer Message Header

0 1 2 3

0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

+-+-------------------------------------------------------------+

|R| Frame Length |

+-+-------------------------------------------------------------+

|R| Message Type |

+-+-------------------------------------------------------------+

| Encoded Message ...

... |

+---------------------------------------------------------------+

Page 73: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Takes a throughput hit on zero’ing

memory but latency is good

Page 74: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Aeron IPC

Page 75: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

TailFile

Page 76: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Tail

File

Message 1

Header

Page 77: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Tail

File

Message 1

Header

Message 2

Header

Page 78: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Tail

File

Message 1

Header

Message 2

Header

Page 79: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Tail

File

Message 1

Header

Message 2

Header

Message 3

Page 80: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Tail

File

Message 1

Header

Message 2

Header

Message 3

Header

Page 81: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

One big file that

goes on forever?

Page 82: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

No!!!

Page faults, page cache churn,

VM pressure, ...

Page 83: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

ActiveDirtyClean

Tail

Message

Header

Message

Header

Message

Header

Message

Header

Message

Header

Message

Header

Message

Header

Message

Header

Page 84: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Do publishers need a CAS?

Page 85: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Tail <<XADD>>

File

Message 1

Header

Message 2

Header

Message 3

Header

Message X

Message Y

Page 86: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Tail <<XADD>>

File

Message 1

Header

Message 2

Header

Message 3

Header

Message X

Message Y

Page 87: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Tail

File

Message 1

Header

Message 2

Header

Message 3

Header

Message X

Message Y

Page 88: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Tail

File

Message 1

Header

Message 2

Header

Message 3

Header

Message X

Message Y

Header

Page 89: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Tail

File

Message 1

Header

Message 2

Header

Message 3

Header

Message X

Message Y

Header

Padding

Page 90: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

TailFile

Message 1

Header

Message 2

Header

Message 3

Header

Message Y

Header

Padding

File

Message X

Page 91: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Tail

File

Message 1

Header

Message 2

Header

Message 3

Header

Message X

Message Y

Header

Padding

File

Header

Page 92: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Have a background thread do

zero’ing and flow control

Page 93: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Prod # Mean 99%

Baseline (Wait-free) 1 167 189

ConcurrentLinkedQueue 1 281 361

2 381 559

3 444 705

ManyToOneRingBuffer 1 170 194

2 256 279

3 283 340

Aeron IPC 1 192 210

2 251 286

3 290 342

Burst Length = 1: RTT (ns)

Page 94: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Prod # Mean 99%

Baseline (Wait-free) 1 167 189

ConcurrentLinkedQueue 1 281 361

2 381 559

3 444 705

ManyToOneRingBuffer 1 170 194

2 256 279

3 283 340

Aeron IPC 1 192 210

2 251 286

3 290 342

Burst Length = 1: RTT (ns)

Page 95: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Aeron Data Message Header

0 1 2 3

0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

+-+-------------------------------------------------------------+

|R| Frame Length |

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-------------------------------+

| Version |B|E| Flags | Type |

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-------------------------------+

|R| Term Offset |

+-+-------------------------------------------------------------+

| Session ID |

+---------------------------------------------------------------+

| Stream ID |

+---------------------------------------------------------------+

| Term ID |

+---------------------------------------------------------------+

| Encoded Message ...

... |

+---------------------------------------------------------------+

Page 96: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Prod # Mean 99%

Baseline (Wait-free) 1 721 982

ConcurrentLinkedQueue 1 12,916 15,792

2 25,132 35,136

3 39,462 56,768

ManyToOneRingBuffer 1 3,773 3,992

2 11,286 13,200

3 27,255 34,432

Aeron IPC 1 4,436 4,632

2 7,623 8,224

3 10,825 13,872

Burst Length = 100: RTT (ns)

Page 97: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Prod # Mean 99%

Baseline (Wait-free) 1 721 982

ConcurrentLinkedQueue 1 12,916 15,792

2 25,132 35,136

3 39,462 56,768

ManyToOneRingBuffer 1 3,773 3,992

2 11,286 13,200

3 27,255 34,432

Aeron IPC 1 4,436 4,632

2 7,623 8,224

3 10,825 13,872

Burst Length = 100: RTT (ns)

Page 98: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Less “False Sharing”

-

Inlined data vs Reference Array

Less “Card Marking”

Page 99: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Is avoiding the spinning “CAS”

loop is a major step forward?

Page 100: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Prod # Mean 99%

Baseline (Wait-free) 1 721 982

ConcurrentLinkedQueue 1 12,916 15,792

2 25,132 35,136

3 39,462 56,768

ManyToOneRingBuffer 1 3,773 3,992

2 11,286 13,200

3 27,255 34,432

Aeron IPC 1 4,436 4,632

2 7,623 8,224

3 10,825 13,872

Burst Length = 100: RTT (ns)

Page 101: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

0

20,000

40,000

60,000

80,000

100,000

120,000

140,000

160,000

180,000

200,000

Burst Length = 100: 99% RTT (ns)

Page 102: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Logging is a

Messaging Problem

What design should we use?

Page 103: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

5. Where can we go

Next?

Page 104: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

C vs Java

Page 105: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

C vs Java

• Spin Wait Loops Thread.onSpinWait()

Page 106: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

C vs Java

• Spin Wait Loops Thread.onSpinWait()

• Data Dependent Loads

Heap Aggregates - ObjectLayout

Stack Allocation – Value Types

Page 107: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

C vs Java

• Spin Wait Loops Thread.onSpinWait()

• Data Dependent Loads

Heap Aggregates - ObjectLayout

Stack Allocation – Value Types

• Memory Copying Baseline against memcpy() for

differing alignment

Page 108: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

C vs Java

• Spin Wait Loops Thread.onSpinWait()

• Data Dependent Loads

Heap Aggregates - ObjectLayout

Stack Allocation – Value Types

• Memory Copying Baseline against memcpy() for

differing alignment

• Queue Interface

Break conflated concerns to reduce blocking actions

Page 109: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

In closing…

Page 110: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

2TB RAM and 100+ vcores!!!

Page 111: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

https://github.com/real-logic/benchmarks

https://github.com/real-logic/Agrona

https://github.com/real-logic/Aeron

Where can I find the code?

Page 112: A Quest for Predictable Latency...Some queue implementations spend more time queueing to be enqueued than in queue time!!! The evils of Blocking. Queue.put() && Queue.take() Condition

Blog: http://mechanical-sympathy.blogspot.com/

Twitter: @mjpt777

“Any intelligent fool can make things bigger, more complex, and more violent.

It takes a touch of genius, and a lot of courage, to move in the opposite direction.”

- Albert Einstein

Questions?