Microservice Protocols of Interaction
Todd L. Montgomery @toddlmontgomery
About me…
What is a Protocol?
Why should we care!?
pro·to·col noun \ˈprō-tə-ˌkȯl, -ˌkōl, -ˌkäl, -kəl\
...
3 b : a set of conventions governing the treatment and especially the formatting of data in an electronic communications system <network protocols>
...
3 a : a code prescribing strict adherence to correct etiquette and precedence (as in diplomatic exchange and in the military services) <a breach of protocol>
Protocols of Interaction
Wire Protocol, Method Calls, Shared Memory Interactions, etc.
Microservice Architectures
Forced Decoupling
via an “Asynchronous, Binary Boundary”
Forced Loose Coupling
The truth is…
Protocols can and do Couple
Protocols of Interaction are quite important!
Protocols of Interaction Matter!
The Environment
Networks, and especially the Internet, are Hostile Environments
Data can be lost,
duplicated, and re-ordered!!
TCP connections can…
be closed unexpectedly
end in an unknown state
be intercepted by idiots, er Proxies
Which means Data over TCP* might be…
Duplicated
Re-Ordered
Lost
* - When connections are re-established
Don’t assume the network is reliable
https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing
Case Studies
Case Study 1
Loose Ordering
Sync Requests & Responses
[Sequence diagram: three Request/Response pairs, each Request sent only after the previous Response arrives]
Throughput limited by Round-Trip Time (RTT)!
Async Requests & Responses
[Sequence diagram: three Requests sent back to back, followed by three Responses]
Throughput less limited by Round-Trip Time!
Async Requests & Responses
Correlation!
[Sequence diagram: Request 0, Request 1, Request 2 sent back to back; Response 0, Response 1, Response 2 returned]
Aside…
Ordering is an Illusion!!
Compiler can re-order
Runtime can re-order
CPU can re-order
Ordering has to be imposed!
Correlation!
[Sequence diagram: Requests 0, 1, 2 and Responses 0, 1, 2, labelled “Ordering”]
Correlation!
[Sequence diagram: Requests 0, 1, 2 and Responses 0, 1, 2]
(Valid) Re-Ordering (one of many)
Handling the Unexpected
Request 0
Response 1
Invalid, drop: we only know of 0. 1 is unknown!
SCTP, HTTP/2 (SPDY)
…most OSI Layer 4 protocols
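A minimal sketch of the correlation idea from this case study, assuming each request carries an integer correlation ID that the response echoes back. The `Correlator` class and its method names are hypothetical and not taken from any of the protocols named above.

```python
from typing import Dict, Optional


class Correlator:
    """Tracks outstanding requests by correlation ID (hypothetical helper)."""

    def __init__(self) -> None:
        self._next_id = 0
        self._pending: Dict[int, str] = {}

    def send(self, payload: str) -> int:
        """Register a request and return the ID to put on the wire."""
        correlation_id = self._next_id
        self._next_id += 1
        self._pending[correlation_id] = payload
        return correlation_id

    def on_response(self, correlation_id: int) -> Optional[str]:
        """Return the matching request payload, or None to signal a drop."""
        # pop() removes the entry, so an unknown ID and a duplicated
        # response are both treated as invalid and dropped.
        return self._pending.pop(correlation_id, None)
```

Responses may arrive in any order; only the ID matters. A response whose ID was never issued (the "Response 1" case above) returns `None`, and so does a second copy of a response that was already matched.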
Case Study 2
Can you hear me now? Timeouts & Retries
Request
ACK
Processing
Handling the unexpected
[Sequence diagram labels: Request, X (loss), Timeout Interval]
[Sequence diagram labels: Request, X (loss), Timeout Interval, ACK, Processing]
Retransmit at end of interval
Avoid Spurious Retransmits
[Sequence diagram labels: Original, Processing…, ACK, Retransmit, Timeout Interval]
Interval = N x “typical” RTT
Account for processing delay
[Sequence diagram labels: X (loss), Timeout Interval, “Average”]
Measure! But very “noisy”?
[Plot: RTT Measurement]
Variances in processing, transmission, etc.
TCP Retransmit Timeout (RTO)
Err = M - A
A <- A + g * Err
D <- D + h * (|Err| - D)
RTO = A + 4D
M = measurement, A = smoothed average, D = smoothed mean deviation,
g and h = gain constants (0 to 1)
Do you measure on a Retransmit? NO!
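The RTO update rules above, together with the "do not measure on a retransmit" rule, can be sketched as follows. The class name, the gain defaults (g = 1/8, h = 1/4) and the initial deviation (half of the first sample) are assumptions chosen for illustration; the slides only state that g and h lie between 0 and 1.

```python
class RtoEstimator:
    """Smoothed RTT estimator following the slide's update rules (sketch)."""

    def __init__(self, first_rtt: float, g: float = 0.125, h: float = 0.25) -> None:
        self.a = first_rtt        # A: smoothed average
        self.d = first_rtt / 2    # D: smoothed mean deviation (assumed start value)
        self.g = g                # gain for the average (assumed default)
        self.h = h                # gain for the deviation (assumed default)

    def on_measurement(self, m: float, retransmitted: bool = False) -> None:
        """Feed one RTT measurement M into the estimator."""
        if retransmitted:
            # Ambiguous sample: the ACK may belong to the original or to the
            # retransmit, so it is ignored.
            return
        err = m - self.a
        self.a += self.g * err
        self.d += self.h * (abs(err) - self.d)

    @property
    def rto(self) -> float:
        """RTO = A + 4D."""
        return self.a + 4 * self.d
```

Starting from a first sample of 100 ms, the estimator gives an RTO of 300 ms; a clean 200 ms sample then moves A to 112.5 ms and D to 62.5 ms, for an RTO of 362.5 ms.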
Does processing twice hurt?
[Sequence diagram labels: Original, X (loss), ACK, Retrans, Timeout Interval, Process Once, Process Twice]
Are Original & Retransmit treated the same?
[Sequence diagram labels: Original, X (loss), ACK, Retrans, Timeout Interval, Process Once, Process Twice]
TCP, SCTP, Aeron
…anything with reliability
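One way to make "processing twice" harmless is to treat an Original and its Retransmit as the same request. The sketch below assumes every request carries a unique ID that a retransmit reuses; `IdempotentReceiver` is a hypothetical name, and the reply cache is left unbounded for brevity.

```python
from typing import Callable, Dict


class IdempotentReceiver:
    """Processes each request ID once and replays the reply for retransmits."""

    def __init__(self, handler: Callable[[str], str]) -> None:
        self._handler = handler
        self._replies: Dict[int, str] = {}   # request ID -> cached reply

    def on_request(self, request_id: int, payload: str) -> str:
        if request_id not in self._replies:
            # First time this ID is seen: do the real work exactly once.
            self._replies[request_id] = self._handler(payload)
        # Original and retransmit both get the same reply (the "ACK").
        return self._replies[request_id]
```

A real receiver would also need to expire old entries, which ties into the lifetime management discussed in the next case study.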
Case Study 3
What I Need! When I Need It! “Lifetime” Management
“Managing” Application Working Set
or
Service Liveness
Caching Algorithms
LRU, MRU, PLRU, RR, SLRU, LFU, …
“Liveness” is essential
[Sequence diagram: Request and ACK exchanged between Service A and Service B]
Service A is Alive!
Service B is Alive!
Consequence of Processing
[Sequence diagram: Keepalive messages exchanged between Service A and Service B]
Service A is Alive!
Service B is Alive!
Absence of Processing
RIP Route Deletion
Step 0 - route info broadcast @ 30 seconds
Step 1 (3 min) - Set Distance to Infinity (16)
Step 2 (+1 min) - Delete Route
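The route-ageing steps above amount to a small state function of "time since the last update was heard". This sketch uses the timer values from the slide (3 minutes, then 1 more minute) as defaults; the function name and state strings are hypothetical.

```python
def route_state(seconds_since_update: float,
                timeout_s: float = 180.0,
                delete_after_s: float = 60.0) -> str:
    """Classify a route by how long ago its last update arrived (sketch)."""
    if seconds_since_update < timeout_s:
        return "valid"
    if seconds_since_update < timeout_s + delete_after_s:
        # Step 1: distance is set to infinity (16) but the entry is kept.
        return "unreachable"
    # Step 2: the route is deleted.
    return "deleted"
```

With updates broadcast every 30 seconds, a route only leaves the "valid" state after several consecutive updates have been missed.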
Aside… RIP… aptly named
Aeron Driver Keepalive
Time of Last Activity = Shared Variable
Doesn’t need to be a message…
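A "time of last activity" check can be sketched without any keepalive message type at all: any sign of activity updates a timestamp, and liveness is judged from its age. This is an illustration only, not Aeron's implementation; `LivenessTracker`, its parameters and the injectable clock are assumptions.

```python
import time
from typing import Callable, Dict


class LivenessTracker:
    """Judges liveness from the time of last activity (hypothetical helper)."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock                      # injectable for testing
        self._last_activity: Dict[str, float] = {}

    def on_activity(self, service: str) -> None:
        """Call on any request, ACK, keepalive or other sign of life."""
        self._last_activity[service] = self._clock()

    def is_alive(self, service: str) -> bool:
        last = self._last_activity.get(service)
        return last is not None and self._clock() - last < self._timeout_s
```

Because the tracker never looks at connection state, a service stays "alive" across a brief reconnect as long as activity resumes within the timeout.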
[Sequence diagram: Bye messages exchanged between Service A and Service B]
Service A is gone!
Service B is gone!
Optimization, but insufficient with arbitrary failures
Liveness often exists across transient connectivity
So… Don’t conflate transport state with liveness!
Like TCP connection state
Dead TCP connection !=
Dead Service
Live TCP connection !=
Live Service
BGP, OSPF
Transports…
almost every protocol
Case Study 4
Elasti-What? Self-Similar Behavior
Request X
Request X
Request X
Request X, X, X
Multiple same/similar requests at the same time
Response X, X, X
Similar Problem…
Reliable Multicast
[Diagram: a sender multicasts 1, 2, 3 to three receivers; loss (X) occurs on each path]
Non-correlated loss
NAK 2, NAK 1, NAK 3 (arriving at the sender as NAK 1, 2, 3)
Request individual lost data
Retransmit 1, 2, 3
[Diagram: a sender multicasts 1, 2, 3 to three receivers; loss (X) occurs on each path]
Temporally/Spatially Correlated Loss
NAK 2, NAK 2, NAK 2 (arriving at the sender as NAK 2, 2, 2)
Multiple requests for same data
Retransmit 2, 2, 2
Request 2
Request 2
Request 2
Request 2, 2, 2
It’s a generic problem!
Overloading Responder & Network
Request 2
Publish Requests
Don’t Immediately Request, Listen first
Timeout!
Request 2
Request 2
Suppress Request
Request 2
How long to wait & listen for?
Timeout!
Request 2
Request 2
Suppress Request
Statistics to the Rescue!
SRM Backoff
RandomBackoff = [C1, C1+C2] * 1-way delay
Random is more than good enough
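The backoff rule above can be sketched as a uniform draw from [C1, C1+C2] scaled by the one-way delay, combined with the "listen first, suppress if someone else asked" rule. The function names and the default constants C1 = C2 = 2 are assumptions for illustration.

```python
import random


def request_backoff(one_way_delay_s: float,
                    c1: float = 2.0,
                    c2: float = 2.0,
                    rng=random) -> float:
    """Pick a random wait in [C1, C1+C2] * one-way delay before requesting."""
    return rng.uniform(c1, c1 + c2) * one_way_delay_s


def should_send_request(heard_same_request_while_waiting: bool) -> bool:
    """After the backoff expires, send only if nobody else already asked."""
    return not heard_same_request_while_waiting
```

Each requester draws its own wait; the one with the shortest wait publishes its request, and the others, having heard it, suppress theirs.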
Request 2
Request 2
Request 2, 2
Must also shed duplicates on the responder
Response 2, 2
Shed second “Request 2” if too soon
X
X
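Shedding on the responder side can be sketched as a per-key "last served" timestamp: a second request for the same data is dropped if it arrives too soon after the first was answered. `DuplicateShedder` and the `min_interval_s` parameter are hypothetical, and how long "too soon" is remains a tuning choice the slides leave open.

```python
import time
from typing import Callable, Dict, Hashable


class DuplicateShedder:
    """Drops repeat requests for the same key within a short interval."""

    def __init__(self, min_interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._min_interval_s = min_interval_s
        self._clock = clock                      # injectable for testing
        self._last_served: Dict[Hashable, float] = {}

    def should_serve(self, key: Hashable) -> bool:
        now = self._clock()
        last = self._last_served.get(key)
        if last is not None and now - last < self._min_interval_s:
            return False                         # too soon: shed it
        self._last_served[key] = now
        return True
```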
SRM, PGM, Aeron
…
http://en.wikipedia.org/wiki/Scalable_Reliable_Multicast
http://www.eurecom.fr/en/publication/107/detail/optimal-multicast-feedback
Case Study 5
Hey, Slow Down! Flow (& Congestion) Control
Stop-And-Wait Flow Control
[Sequence diagram: three Data messages, each followed by an ACK; one Data/ACK exchange is labelled RTT]
Throughput = Data Length / RTT
Bandwidth x Delay
BDP = (Bytes / sec) * sec = Bytes
BDP (Buffer)
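As a worked instance of the BDP formula, the sketch below converts a link speed in bits per second to bytes per second and multiplies by the delay. The function name and the example link (1 Gbit/s, 10 ms) are assumptions chosen for illustration.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s: float,
                                  delay_s: float) -> float:
    """BDP = (bytes / sec) * sec = bytes that fit 'in the pipe'."""
    bytes_per_s = bandwidth_bits_per_s / 8
    return bytes_per_s * delay_s


# Example: a 1 Gbit/s path with 10 ms of delay holds about 1.25 MB in flight.
example_bdp = bandwidth_delay_product_bytes(1e9, 0.010)
```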
[Sequence diagram: N Data “Blobs” sent within one RTT, followed by an ACK]
Throughput = N * Data Length / RTT
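The two throughput formulas differ only in N: stop-and-wait is the N = 1 case. A small sketch, with a hypothetical function name and example numbers:

```python
def throughput_bytes_per_s(n_in_flight: int,
                           data_length_bytes: int,
                           rtt_s: float) -> float:
    """Throughput = N * Data Length / RTT (N = 1 is stop-and-wait)."""
    return n_in_flight * data_length_bytes / rtt_s


# With 1000-byte messages and a 100 ms RTT:
stop_and_wait = throughput_bytes_per_s(1, 1000, 0.1)    # about 10 KB/s
windowed = throughput_bytes_per_s(10, 1000, 0.1)        # about 100 KB/s
```

Throughput scales linearly with N only until the receiver or the network becomes the limit.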
So… How big is N?
This is surprisingly hard to answer
It depends…
Big… but
Don’t overflow receiver
Don’t overflow “network”
TCP Flow Control
Receiver advertises N
TCP Congestion Control
Sender probes for network N
TCP Sender
min(Receiver N, Network N)
Only go as fast as Network & Receiver
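The sender rule above can be sketched as taking the smaller of the two limits and subtracting what is already in flight. The function and parameter names are hypothetical; the "Receiver N" is the advertised window and the "Network N" is the sender's congestion window estimate.

```python
def sendable_bytes(receiver_window: int,
                   congestion_window: int,
                   bytes_in_flight: int) -> int:
    """How much more the sender may transmit right now (sketch)."""
    # Effective N = min(Receiver N, Network N).
    window = min(receiver_window, congestion_window)
    # Never negative: if more is in flight than the window allows, send nothing.
    return max(0, window - bytes_in_flight)
```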
Reactive Streams
Subscriber uses explicit request(N)
Publisher assumes best case
http://www.reactive-streams.org/
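The explicit request(N) idea can be illustrated with a toy publisher that only emits while the subscriber has outstanding demand. This is a Python sketch of the concept, not the Reactive Streams API; `DemandDrivenPublisher`, `offer` and the callback are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque


class DemandDrivenPublisher:
    """Emits items only against demand signalled via request(n) (sketch)."""

    def __init__(self, on_next: Callable[[str], None]) -> None:
        self._on_next = on_next
        self._queue: Deque[str] = deque()
        self._demand = 0

    def request(self, n: int) -> None:
        """Subscriber signals it can accept n more items."""
        if n <= 0:
            raise ValueError("n must be positive")
        self._demand += n
        self._drain()

    def offer(self, item: str) -> None:
        """Producer side: queue an item and emit it if demand allows."""
        self._queue.append(item)
        self._drain()

    def _drain(self) -> None:
        while self._demand > 0 and self._queue:
            self._demand -= 1
            self._on_next(self._queue.popleft())
```

Items offered without demand wait in the queue, so the subscriber, not the publisher, sets the pace.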
Takeaways!
Protocols of interaction are important & can be tremendously impactful
for better or worse…
Questions?
• IETF http://www.ietf.org/
• Aeron https://github.com/real-logic/Aeron
• Twitter @toddlmontgomery
Thank You!