programming tcp for responsiveness

Posted on 24-Jan-2017


Copyright (C) 2016 DeNA Co., Ltd. All Rights Reserved.

Programming TCP for responsiveness

DeNA Co., Ltd.
Kazuho Oku


Explains the TCP latency optimization implemented in H2O HTTP/2 server 2.1.


Background


TCP slow start

- Initial Congestion Window (IW) = 10
  - only 10 packets can be sent in the first RTT
  - used to be IW = 3
- window increase: 1.5x per RTT


[Figure: TCP slow start (IW10, MSS 1460): bytes transmitted (0 to 800,000) vs. RTT (1 to 8)]


Why 1.5x?

During slow start, a TCP increments cwnd by at most SMSS bytes for each ACK received that cumulatively acknowledges new data.(snip)The delayed ACK algorithm specified in [RFC1122] SHOULD be used by a TCP receiver. When using delayed ACKs, a TCP receiver MUST NOT excessively delay acknowledgments. Specifically, an ACK SHOULD be generated for at least every second full-sized segment, and MUST be generated within 500 ms of the arrival of the first unacknowledged packet.

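TCP Congestion Control (RFC 5681)

In other words, with delayed ACKs the receiver acknowledges every second segment, so a window of N segments returns about N/2 ACKs, and cwnd grows by about N/2 segments per RTT: a factor of 1.5.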


Flow of the ideal HTTP

- fastest within the limits of TCP/IP
- receive the request in 0 RTT, and:
  - first send CSS/JS*
  - then send the HTML
  - then send the images*

*: but only the ones not cached by the browser


[Diagram: client/server sequence; the request and the response complete within 1 RTT]


The reality in HTTP/2

- TCP establishment: +1 RTT*
- TLS handshake: +2 RTT**
- HTML fetch: +1 RTT
- JS, CSS fetch: +2 RTT***
- Total: 6 RTT

*: 0 RTT on reconnection
**: 1 RTT on reconnection
***: servers often cannot switch to sending JS/CSS instantly, because of output already buffered in the TCP send buffer


[Diagram: client/server sequence in 1-RTT steps: TCP SYN, TCP SYN ACK, four TLS handshake flights, GET /, HTML, GET css,js, css,js (continuing)]


Ongoing optimizations

- TCP Fast Open
  - initial establishment in 1 RTT
  - re-establishment in 0 RTT
- TLS 1.3
  - initial handshake completes in 1 RTT
  - resumption in 0 RTT
- what can be done in the HTTP/2 layer?


Programming TCP for responsiveness


Programming TCP for responsiveness

Answer: TCP Urgent Indications (i.e. MSG_OOB)



TCP Urgent Indications

- out-of-band messaging for TCP
  - used by telnet!
- can only send 1 octet
  - conflicting specs on how to handle multi-octet messages
- cannot be used for HTTP/2
- RFC 6093 “recommends against the use of urgent mechanism” (RFC 7414)


Typical sequence of HTTP/2


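[Diagram: the client sends GET /; the server responds with HTTP/2 200 OK and <!DOCTYPE HTML>…<SCRIPT SRC="jquery.js">…; the client then sends GET /jquery.js. Annotation on the 1-RTT interval marked *: the server needs to switch from sending HTML to sending JS at this very moment, which means the amount of data sent in * must be smaller than IW.]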


Buffering in TCP and TLS layer


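[Diagram: HTTP/2 frames → TLS records → BIO buffer → TCP send buffer (CWND, unacked data, poll threshold); data within CWND is sent immediately, the rest is not immediately sent]

// ordinary code (non-blocking): keep writing until SSL_write would block
while (SSL_get_error(ssl, SSL_write(ssl, …)) != SSL_ERROR_WANT_WRITE)
    ;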


Why do we have buffers?


- TCP send buffer:
  - reduces ping-pong between the kernel and the application
- BIO buffer:
  - holds data that could not be stored in the TCP send buffer


Improvement: poll-then-write


[Diagram: HTTP/2 frames → TLS records → TCP send buffer (CWND, unacked data, poll threshold), with no BIO buffer; data within CWND is sent immediately, the rest is not immediately sent]

// only call SSL_write when poll notifies the app
while (poll_for_write(fd) == SOCKET_IS_READY)
    SSL_write(…);


Adjust poll threshold


[Diagram: TCP send buffer with CWND, unacked data and the poll threshold]

- set the poll threshold to the end of CWND?
  - setsockopt(TCP_NOTSENT_LOWAT)
  - on Linux, the minimum is CWND + 1 octet
    - becomes unstable when set to CWND + 0


Adjust poll threshold


[Diagram: TCP send buffer with CWND, unacked data and the adjusted poll threshold]

// only call SSL_write when poll notifies the app
while (poll_for_write(fd) == SOCKET_IS_READY)
    SSL_write(…);


Further improvement: read TCP states


// calc size of data to send by calling getsockopt(TCP_INFO)
if (poll_for_write(fd) == SOCKET_IS_READY) {
    capacity = CWND - unacked + TWO_MSS - TLS_overhead;
    SSL_write(prepare_http2_frames(capacity));
}



Negative impact of additional delay

- increased delay between receiving an ACK and sending data, since:
  - traditional approach: completes within the kernel
  - this approach: the application needs to be notified to generate new data
- outcome:
  - increase of CWND becomes slower
  - leads to slower peak speed?
    - depends on how CWND at peak is calculated
  - does the kernel use TCP timestamps for the matter?


Countermeasures

- optimize for responsiveness only when necessary
  - i.e. when RTT is large and CWND is small
  - impact of the optimization is proportional to unsent_bytes / CWND
- disable the optimization if the additional delay is significant
  - when epoll returns immediately, the estimated additional delay equals the time spent by the loop


Configuration Directives

- http2-latopt-min-rtt
  - minimum TCP RTT to enable the optimization
  - default: UINT_MAX (disabled)
- http2-latopt-max-cwnd
  - maximum CWND to enable (in octets)
  - default: 65535
- http2-max-additional-delay
  - max. additional delay (as the ratio to TCP RTT)
  - latopt disabled if the delay is greater
  - default: 0.1


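Pseudo-code

size_t get_suggested_write_size() {
    struct tcp_info tcp_info;
    socklen_t tcp_info_len = sizeof(tcp_info);
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &tcp_info, &tcp_info_len) != 0)
        return UNKNOWN;
    /* Linux reports CWND in packets; max_cwnd is in octets */
    if (tcp_info.tcpi_rtt < min_rtt ||
        tcp_info.tcpi_snd_cwnd * tcp_info.tcpi_snd_mss > max_cwnd)
        return UNKNOWN;

    /* TLS record overhead: 5 (header) + 8 (explicit nonce) + 16 (GCM tag) */
    switch (SSL_CIPHER_get_id(SSL_get_current_cipher(ssl))) {
    case TLS1_CK_RSA_WITH_AES_128_GCM_SHA256:
    case …:
        tls_overhead = 5 + 8 + 16;
        break;
    default:
        return UNKNOWN;
    }

    /* packets CWND still allows, plus two extra packets */
    packets_sendable = tcp_info.tcpi_snd_cwnd > tcp_info.tcpi_unacked
                           ? tcp_info.tcpi_snd_cwnd - tcp_info.tcpi_unacked
                           : 0;
    return (packets_sendable + 2) * (tcp_info.tcpi_snd_mss - tls_overhead);
}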


Benchmark (1)


- conditions:
  - server in Ireland, client in Tokyo (RTT 250ms)
  - load a tiny JS file at the top of a large HTML document
- result: delay decreased from 511ms to 250ms
  - i.e. JS fetch latency was 2 RTT, became 1 RTT
    - similar results in other environments


Benchmark (2)

- using the same data as the previous benchmark
- server: Sakura VPS (Ishikari DC)


[Chart: downloading HTML (and JS within), RTT ~25ms: time in milliseconds (0 to 300) for HTML and JS, master vs. latopt]


Conclusion

- near-optimal result can be achieved
  - by adjusting the poll threshold and reading TCP states
  - 1-packet overhead due to a restriction in the Linux kernel
- 1-RTT improvement in H2O
  - estimated 1-RTT improvement per level of depth of the load graph


Under the hood


TCP_NOTSENT_LOWAT

- supported by Linux, OS X
- on Linux:
  - sysctl:
    - set to -1: use kernel default
    - set to 0: sshd hangs
    - set to positive int: override kernel default
  - setsockopt:
    - set to 0: use default (sysctl or kernel)
    - set to int: override default


Unit of CWND

- Linux: # of packets
  - if INITCWND is 10, you can send at most 10 packets at once, regardless of their size
- BSD (incl. OS X): octets
  - you can send CWND*MSS octets, regardless of the number of packets
    - if CWND=10 and MSS=1460, it is possible to send 14,600 packets, each containing a 1-octet payload


Determining amount of data that can be sent immediately

OS      | MSS          | CWND           | inflight      | send buffer (inflight + unsent)
--------|--------------|----------------|---------------|--------------------------------
Linux   | tcpi_snd_mss | tcpi_snd_cwnd* | tcpi_unacked* | ioctl(SIOCOUTQ)
OS X**  | tcpi_maxseg  | tcpi_snd_cwnd  | -             | tcpi_snd_sbbytes
FreeBSD | tcpi_snd_mss | tcpi_snd_cwnd  | -             | ioctl(FIONWRITE)
NetBSD  | tcpi_snd_mss | tcpi_snd_cwnd* | -             | ioctl(FIONWRITE)

*: units of the marked values are packets; unmarked values are octets
**: sometimes the values of tcpi_* are returned as zeros

- calculate either of:
  - CWND - inflight
  - max(CWND - (inflight + unsent), 0)
- units used in the calculation must be the same
  - NetBSD: fails (CWND is in packets, the send buffer in octets)
