
This paper is included in the Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI '14), April 2–4, 2014, Seattle, WA, USA.

ISBN 978-1-931971-09-6

Open access to the Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI '14) is sponsored by USENIX.

How Speedy is SPDY?

Xiao Sophia Wang, Aruna Balasubramanian, Arvind Krishnamurthy, and David Wetherall, University of Washington

https://www.usenix.org/conference/nsdi14/technical-sessions/wang


How Speedy is SPDY?

Xiao Sophia Wang, Aruna Balasubramanian, Arvind Krishnamurthy, and David Wetherall
University of Washington

Abstract

SPDY is increasingly being used as an enhancement to HTTP/1.1. To understand its impact on performance, we conduct a systematic study of Web page load time (PLT) under SPDY and compare it to HTTP. To identify the factors that affect PLT, we proceed from simple, synthetic pages to complete page loads based on the top 200 Alexa sites. We find that SPDY provides a significant improvement over HTTP when we ignore dependencies in the page load process and the effects of browser computation. Most SPDY benefits stem from the use of a single TCP connection, but the same feature is also detrimental under high packet loss. Unfortunately, the benefits can be easily overwhelmed by dependencies and computation, reducing the improvements with SPDY to 7% for our lower bandwidth and higher RTT scenarios. We also find that request prioritization is of little help, while server push has good potential; we present a push policy based on dependencies that gives comparable performance to mod_spdy while sending much less data.

1 Introduction

HTTP/1.1 has been used to deliver Web pages over multiple, persistent TCP connections for at least the past decade. Yet as the Web has evolved, HTTP has been criticized for opening too many connections in some settings and too few connections in other settings, for not providing sufficient control over the transfer of Web objects, and for not supporting various types of compression.

To make the Web faster, Google proposed and deployed a new transport for HTTP messages, called SPDY, starting in 2009. SPDY adds a framing layer for multiplexing concurrent application-level transfers over a single TCP connection, support for prioritization and unsolicited push of Web objects, and a number of other features. SPDY is fast becoming one of the most important protocols for the Web; it is already deployed by many popular websites such as Google, Facebook, and Twitter, and supported by browsers including Chrome, Firefox, and IE 11. Further, the IETF is standardizing an HTTP/2.0 proposal that is heavily based on SPDY [10].

Given the central role that SPDY is likely to play in the Web, it is important to understand how SPDY performs relative to HTTP. Unfortunately, the performance of SPDY is not well understood. There have been several studies, predominantly white papers, but the findings often conflict. Some studies show that SPDY improves performance [20, 14], while others show that it provides only a modest improvement [13, 19]. In our own study [25] of page load time (PLT) for the top 200 Web pages from Alexa [1], we found that either SPDY or HTTP could provide better performance by a significant margin, with SPDY performing only slightly better than HTTP in the median case.

As we have looked more deeply into the performance of SPDY, we have come to appreciate why it is challenging to understand. Both SPDY and HTTP performance depend on many factors external to the protocols themselves, including network parameters, TCP settings, and Web page characteristics. Any of these factors can have a large impact on performance, and to understand their interplay it is necessary to sweep a large portion of the parameter space. A second challenge is that there is much variability in page load time (PLT). The variability comes not only from random events like network loss, but also from browser computation (i.e., JavaScript evaluation and HTML parsing). A third challenge is that dependencies between network activities and browser computation can have a significant impact on PLT [25].

In this work, we present what we believe to be the most in-depth study of page load time under SPDY to date. To make it possible to reproduce experiments, we develop a tool called Epload that controls the variability by recording and replaying the process of a page load at fine granularity, complete with browser dependencies and deterministic computational delays; in addition we use a controlled network environment. The other key to our approach is to isolate the different factors that affect PLT with reproducible experiments that progress from simple but unrealistic transfers to full page loads. By looking at results across this progression, we can systematically isolate the impact of the contributing factors and identify when SPDY helps significantly and when it performs poorly compared to HTTP.

Our experiments progress as follows. We first compare SPDY and HTTP simply as a transport protocol (with no browser dependencies or computation) that transfers Web objects from both artificial and real pages (from the top 200 Alexa sites). We use a decision tree analysis to identify the situations in which SPDY outperforms HTTP and vice versa. We find that SPDY improves PLT significantly in a large number of scenarios that track the benefits of using a single TCP connection. Specifically, SPDY helps for small object sizes and under low loss rates by: batching several small objects in a TCP segment; reducing congestion-induced retransmissions;


and reducing the time when the TCP pipe is idle. Conversely, SPDY significantly hurts performance under high packet loss for large objects. This is because a set of TCP connections tends to perform better under high packet loss; it is necessary to tune TCP behavior to boost performance.

Next, we examine the complete Web page load process by incorporating dependencies and computational delays. With these factors, the benefits of SPDY are reduced, and can even be negated. This is because: i) there are fewer outstanding objects at a given time; ii) traffic is less bursty; and iii) the impact of the network is diluted by computation. Overall, we find the benefits of SPDY to be larger when there is less bandwidth and longer RTTs. For these cases SPDY reduces the PLT for 70–80% of Web pages; for shorter, faster links it has little effect. But SPDY can also increase PLT: the worst 20% of pages see an increase of at least 6% for long RTT networks.

In search of greater benefits, we explore SPDY mechanisms for prioritization and server push. Prioritization helps little because it is limited by load dependencies, but server push has the potential for significant improvements. How to obtain this benefit depends on the server push policy, which is a non-trivial issue because of caching. This leads us to develop a policy based on dependency levels that performs comparably to mod_spdy's policy [11] while pushing 80% less data.

Our contributions are as follows:

• A systematic measurement study using synthetic pages and real pages from 200 popular sites that identifies the combinations of factors for which SPDY improves (and sometimes reduces) PLT compared to HTTP.

• A page load tool, Epload, that emulates the detailed page load process of a target page, including its dependencies, while eliminating variability due to browser computation. With a controlled network environment, Epload enables reproducible but authentic page load experiments for the first time.

• A SPDY server push policy based on dependency information that provides comparable benefits to mod_spdy while sending much less data over the network.

In the rest of this paper, we first review SPDY background (§2) and then briefly describe our challenges and approach (§3). Next, we extensively study TCP's impact on SPDY (§4) and extend the study to the impact of Web pages on SPDY (§5). We discuss open issues in §6, review related work in §7, and conclude in §8.

2 Background

In this section, we review issues with HTTP performance and describe how the new SPDY protocol addresses them.

2.1 Limitations of HTTP/1.1

When HTTP/1.1, or simply HTTP, was designed in the late 1990s, Web applications were fairly simple and rudimentary. Since then, Web pages have become more complex and dynamic, making it difficult for HTTP to meet increasingly demanding user expectations. Below, we identify some of the limitations of HTTP:

i) Browsers open too many TCP connections to load a page. HTTP improves performance by using parallel TCP connections. But if the number of connections is too large, the aggregate flow may cause network congestion, high packet loss, and reduced performance [9]. Further, services often deliver Web objects from multiple domains, which results in even more TCP connections and the possibility of high packet loss.

ii) Web transfers are strictly initiated from the client. Consider the loading of embedded objects. Theoretically, the server can send embedded objects along with the parent object when it receives a request for the parent object. In HTTP, because an object can be sent only in response to a client request, the server has to wait for an explicit request, which is sent only after the client has received and processed the parent page.

iii) A TCP segment cannot carry more than one HTTP request or response. HTTP, TCP, and other headers can account for a significant portion of a packet when HTTP requests or responses are small. So if there are a large number of small embedded objects in a page, the overhead associated with these headers is substantial.

2.2 SPDY

SPDY addresses several of the issues described above. We now review the key ideas in SPDY's design and implementation and its deployment status.

Design: There are four key SPDY features.

i) Single TCP connection. SPDY opens a single TCP connection to a domain and multiplexes multiple HTTP requests and responses (a.k.a. SPDY streams) over the connection. The multiplexing here is similar to HTTP/1.1 pipelining but is finer-grained. A single connection also helps reduce SSL overhead. Besides client-side benefits, using a single connection helps reduce the number of TCP connections opened at servers.

ii) Request prioritization. Some Web objects, such as JavaScript code modules, are more important than others and thus should be loaded earlier. SPDY allows the client to specify a priority level for each object, which is then used by the server in scheduling the transfer of the object.

iii) Server push. SPDY allows the server to push embedded objects before the client requests them. This improves latency but can also increase the amount of transmitted data if the objects are already cached at the client.

iv) Header compression. SPDY supports HTTP header compression, since experiments suggest that HTTP headers for a single session contain duplicate copies of the same information (e.g., User-Agent).

Implementation: SPDY is implemented by adding a framing layer to the network stack between HTTP and the transport layer. Unlike HTTP, SPDY splits HTTP headers and data payloads into two kinds of frames. SYN_STREAM frames carry request headers and SYN_REPLY frames carry response headers. When a header exceeds the frame size, one or more HEADERS frames follow. HTTP data payloads are sliced into DATA frames. There is no standardized value for the frame size, and we find that mod_spdy caps the frame size at 4KB [11]. Because the frame size is the granularity of multiplexing, too large a frame decreases the ability to multiplex while too small a frame increases overhead. SPDY frames are encapsulated in one or more consecutive TCP segments. A TCP segment can carry multiple SPDY frames, making it possible to batch up small HTTP requests and responses.
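To make the batching idea concrete, here is a minimal Python sketch of the framing described above. It is not mod_spdy's code: frame and TCP headers are ignored, and the 1460-byte segment payload is an assumed Ethernet MSS; only the 4KB DATA-frame cap comes from the text.

# Minimal sketch of SPDY-style framing (not mod_spdy's implementation).
FRAME_CAP = 4 * 1024   # mod_spdy caps frames at 4KB
MSS = 1460             # assumed TCP payload bytes per segment

def to_data_frames(stream_id, payload):
    """Slice one HTTP payload into DATA frames of at most FRAME_CAP bytes."""
    return [(stream_id, payload[off:off + FRAME_CAP])
            for off in range(0, len(payload), FRAME_CAP)]

def pack_segments(frames):
    """Concatenate frames back to back and cut the result into MSS-sized TCP
    segments, so several small frames share a segment and a large frame spans
    several consecutive segments."""
    wire = b"".join(chunk for _, chunk in frames)
    return [wire[i:i + MSS] for i in range(0, len(wire), MSS)]

# Three responses (two small, one larger) multiplexed over one connection.
frames = []
for stream_id, size in [(1, 300), (3, 500), (5, 6000)]:
    frames += to_data_frames(stream_id, b"x" * size)
print(len(frames), "frames packed into", len(pack_segments(frames)), "segments")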

Deployment: SPDY is deployed over SSL and TCP. On the client side, SPDY is enabled in Chrome, Firefox, and IE 11. On the server side, popular websites such as Google, Facebook, and Twitter have deployed SPDY. Another popular use of SPDY is between a proxy and a client, such as the Amazon Silk browser [16] and Android Chrome Beta [2]. SPDY version 3 is the most recent specification and is widely deployed [21].

3 Pinning SPDY down

We would like to experimentally evaluate how SPDY performs relative to HTTP because SPDY is likely to play a key role in the Web. But understanding SPDY performance is hard. Below, we identify three challenges in studying the performance of SPDY and then provide an overview of our approach.

3.1 Challenges

We identify the challenges on the basis of previous studies and our own initial experimentation. As a first step, we extensively load two Web pages a thousand times each using a measurement node at the University of Washington. One page displays fifty world flags [12] and is advertised by mod_spdy [11] to demonstrate the performance benefits of SPDY; the other is the Twitter home page. The results are depicted in Figure 1.

First, we observe that SPDY helps the flag page but not the Twitter page, and it is not immediately apparent why that is the case. Further experimentation in emulated settings also revealed that both the magnitude and the direction of the performance differences vary significantly with network conditions. Taken together, this indicates that SPDY's performance depends on many factors such as Web page characteristics, network parameters, and TCP settings.

Figure 1: Distributions of PLTs of SPDY and HTTP, shown as CDFs of page load time (seconds) for (a) the flag page and (b) the Twitter home page. A thousand runs were performed for each curve, without caching.

Measurement studies will thus likely yield different, even conflicting, results if they use different experimental settings. Therefore, a comprehensive sweep of the parameter space is necessary to evaluate under what conditions SPDY helps, what kinds of Web pages benefit most from SPDY, and what parameters best support SPDY.

Second, we observed in our experiments that the measured page load times have high variances, and this often overwhelms the differences between SPDY and HTTP. For example, in Figure 1(b), the variance of the PLT for the Twitter page is 0.5 seconds but the PLT difference between HTTP and SPDY is only 0.02 seconds. We observe high variance even when we load the two pages over a fully controlled network. This indicates that the variability likely stems from browser computation (i.e., JavaScript evaluation and HTML parsing). Controlling this variability is key to reproducing experiments so as to obtain meaningful comparisons.

Third, prior work has shown that the dependencies between network operations and computation have a significant impact on PLT [25]. Interestingly, page dependencies also influence the scheduling of network traffic and affect how much SPDY helps or hurts performance (§4 and §5). Thus, on one hand, ignoring browser computation can reduce PLT variability, but on the other hand, dependencies need to be preserved in order to obtain accurate measurements under realistic offered loads.

3.2 Approach

Our approach is to separate the various factors that affect SPDY and study them in isolation. This allows us to control and identify the extent to which these factors affect SPDY.

First, we extensively sweep the parameter space of all the factors that affect SPDY, including RTT, bandwidth, loss rate, TCP initial window, number of objects on a page, and object sizes. We initially ignore page load dependencies and computation in order to simplify our analysis. This systematic study allows us to identify when SPDY helps or hurts and to characterize the importance of the contributing factors. Based on further analysis of why SPDY sometimes hurts, we propose some simple modifications to TCP.

3

Page 5: How Speedy is SPDY?

390 11th USENIX Symposium on Networked Systems Design and Implementation USENIX Association

Second, before we perform experiments with page load dependencies, we address the variability caused by computation. We develop a tool called Epload that emulates the process of a page load. Instead of performing real browser computation, Epload records the process of a sample page load, identifies when computations happen, and replays the page load by introducing the appropriate delays associated with the recorded computations. After emulating a computation activity, Epload performs real network requests for the dependent Web objects. This allows us to control the variability of computation while also modeling page load dependencies. In contrast to methodologies that statistically reduce variability by collecting a large amount of data (usually from production), our methodology mitigates the root cause of variability and thus greatly reduces the number of required experiments.

Third, we study the effects of dependencies and computation by performing page loads with Epload. We are then able to identify how much dependencies and computation affect SPDY, and to identify the relative importance of other contributing factors. To mitigate the negative impact of dependencies and computation, we explore the use of prioritization and server push, which enable the client and the server to coordinate the transfers. Here, we are able to evaluate the extent to which these mechanisms can improve performance when used appropriately.

4 TCP and SPDY

In this section, we extensively study the performance of SPDY as a transfer protocol on both synthetic and real pages, ignoring page load dependencies and computation. This allows us to measure SPDY performance without confounding factors such as browser computation and page load dependencies. Here, SPDY differs from HTTP only in its use of a single TCP connection, header compression, and a framing layer.

4.1 Experimental setup

We conduct the experiments by setting up a client and a server that can communicate over both HTTP and SPDY. Both the server and the client are connected to the campus LAN at the University of Washington. We use Dummynet [6] to vary network parameters. The experimental setup is detailed below.

Server: Our server is a 64-bit machine with a 2.4GHz 16-core CPU and 16GB of memory. It runs Ubuntu 12.04 with Linux kernel 3.7.5 using the default TCP variant, Cubic. We use a TCP initial window size of ten as the default setting, as suggested by SPDY best practices [18]. HTTP and SPDY are enabled on Apache 2.2.2 with the SPDY module, mod_spdy 0.9.3.3-386, installed. We use SPDY 3 without SSL, which allows us to decode the SPDY frames in TCP payloads. To control the exact size of Web objects, we turn off gzip encoding.

Table 1: Contributing factors to SPDY performance. We define a threshold for each factor, so that we can classify a setting as being high or low in our analysis.

  Category | Factor    | Range                        | High
  Net      | rtt       | 20ms, 100ms, 200ms           | ≥ 100ms
  Net      | bw        | 1Mbps, 10Mbps                | ≥ 10Mbps
  Net      | pkt loss  | 0, 0.005, 0.01, 0.02         | ≥ 0.01
  TCP      | iw        | 3, 10, 21, 32                | ≥ 21
  Page     | obj size  | 100B, 1K, 10K, 100K, 1M      | ≥ 1K
  Page     | # of obj  | 2, 8, 16, 32, 64, 128, 512   | ≥ 64

Client: Because we issue requests at the granularity of Web objects and not pages, we do not work with browsers, and instead develop our own SPDY client by following the SPDY/3 specification [21]. Unlike other wget-like SPDY clients such as spdylay [22] that open a TCP connection per request, our SPDY client allows us to reuse TCP connections. Similarly, we also develop an HTTP client for comparison. We set the maximum number of parallel TCP connections for HTTP to six, as used by all major browsers. As the receive window is auto-tuned, it is not a bottleneck in our experiments.

Web pages: To experiment with synthetic pages, we create objects with pre-specified sizes and numbers. To experiment with real pages, we download the home pages of the Alexa top 200 websites to our own server. To avoid the negative impact of domain sharding on SPDY [18], we serve all embedded objects from the same server, including those that are dynamically generated by JavaScript.

We ran the experiments presented in this paper from June to September 2013. We repeat each experiment five times and present the median to exclude the effects of random loss. We collect network traces at both the client and the server. We define page load time (PLT) as the elapsed time between when the first object is requested and when the last object is received. Because we do not experiment within a browser, we do not use the W3C load event [24].

4.2 Experimenting with synthetic pages

In experimenting with synthetic pages, we consider a broad range of parameter settings for the various factors that affect performance. Table 1 summarizes the parameter space used in our experiments. The RTT values include 20ms (intra-coast), 100ms (inter-coast), and 200ms (3G link or cross-continent). The bandwidths emulate a broadband link with 10Mbps [4] and a 3G link with 1Mbps [3]. We inject random packet loss rates from zero to 2%, since studies suggest that Google servers experience a loss rate between 1% and 2% [5].


Figure 2: The decision tree that tells when SPDY or HTTP helps, branching on object size, loss rate, RTT, bandwidth, TCP initial window (icwnd), and number of objects. A leaf pointing to SPDY (HTTP) means SPDY (HTTP) helps; a leaf pointing to EQUAL means SPDY and HTTP are comparable. Table 1 shows how we define a factor as being high or low.

At the server, we vary the TCP initial window size from 3 (used by earlier Linux kernel versions) to 32 (used by Google servers). We also consider a wide range of Web object sizes (100B to 1M) and object counts (2 to 512). For simplicity, we choose one value for each factor in each experiment, which means that there is no cross traffic.

When we sweep this large parameter space, we find that SPDY improves performance under certain conditions, but degrades performance under other conditions.

4.2.1 When does SPDY help or hurt

There have been many hypotheses as to whether SPDY helps or hurts, based on analytical inference about parallel versus single TCP connections. For example, one hypothesis is that SPDY hurts because a single TCP connection increases its congestion window more slowly than multiple connections; another hypothesis is that SPDY helps with stragglers because HTTP has to balance its communications across parallel TCP connections. However, it is unclear how much each hypothesis contributes to SPDY performance. Here, we sort out the most important findings, meaning that the hypotheses shown here contribute more to SPDY performance than those that are not shown.

Methodology: To understand the conditions under which SPDY helps or hurts, we build a predictive model based on decision tree analysis. In the analysis, each configuration is a combination of values for all the factors listed in Table 1. For each configuration, we add an additional variable s, which is the PLT of SPDY divided by that of HTTP.

Figure 3: Performance trends (PLT in seconds for HTTP and SPDY) for three factors: (a) packet loss rate, (b) object number, and (c) object size, with the default setting rtt=200ms, bw=10Mbps, loss=0, iw=10, obj size=10K, obj number=64.

We run the decision tree to predict the configurations under which SPDY outperforms HTTP (s < 0.9) and those under which HTTP outperforms SPDY (s > 1.1). The decision tree analysis generates the likelihood that a configuration works better under SPDY (or HTTP). If this likelihood is over 0.75, we mark the branch as SPDY (or HTTP); otherwise, we say that SPDY and HTTP perform equally.

We obtain the decision tree in Figure 2 as follows. First, we produce a decision tree based on all the factors. To populate the branches, we also generate supplemental decision trees based on subsets of factors. Each supplemental decision tree has a prediction accuracy of 84% or higher. Last, we merge the branches from the supplemental decision trees into the original decision tree.
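As a rough illustration of this methodology, the sketch below labels each swept configuration by the ratio s and fits a classifier over the Table 1 factors. The feature encoding and the use of scikit-learn are our assumptions for illustration, not the authors' tooling, and the two example configurations are hypothetical.

# Sketch of the decision-tree analysis described above (assumes scikit-learn).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["rtt_ms", "bw_mbps", "loss", "iw", "obj_size", "num_obj"]

def label(s):
    # SPDY outperforms HTTP when s < 0.9, HTTP outperforms SPDY when s > 1.1,
    # and the two are treated as comparable otherwise.
    return "SPDY" if s < 0.9 else "HTTP" if s > 1.1 else "EQUAL"

def fit_tree(rows):
    """rows: list of (config_dict, s). Returns a fitted tree whose leaves can
    be read off as in Figure 2; a leaf would be marked SPDY or HTTP only if
    its class likelihood (predict_proba) exceeds 0.75."""
    X = np.array([[cfg[f] for f in FEATURES] for cfg, _ in rows])
    y = np.array([label(s) for _, s in rows])
    return DecisionTreeClassifier().fit(X, y)

# Hypothetical usage with two measured configurations:
rows = [
    ({"rtt_ms": 200, "bw_mbps": 10, "loss": 0.0, "iw": 10,
      "obj_size": 10000, "num_obj": 64}, 0.7),
    ({"rtt_ms": 20, "bw_mbps": 10, "loss": 0.02, "iw": 10,
      "obj_size": 1000000, "num_obj": 8}, 1.4),
]
tree = fit_tree(rows)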

Results: The decision tree shows that SPDY hurts when packet loss is high. However, SPDY helps under a number of conditions, for example, when there are:

• Many small objects, or small objects under low loss.

• Many large objects under low loss.

• Few objects under good network conditions and a large TCP initial window.

The decision tree also depicts the relative importance of the contributing factors. Intuitively, factors close to the root of the decision tree affect SPDY performance more than those near the leaves. This is because the decision tree places the important factors near the root to reduce the number of branches. We find that object size and loss rate are the most important factors in predicting SPDY performance. However, RTT, bandwidth, and TCP initial window play a less important role.


Figure 4: SPDY reduces the number of retransmissions (CDFs of the number of retransmissions for HTTP and SPDY with unmodified TCP).

How much SPDY helps or hurts: We present three trending graphs in Figure 3. Figure 3(a) shows that HTTP outperforms SPDY by half when the loss rate increases to 2%, Figure 3(b) shows the trend that SPDY performs better as the number of objects increases, and Figure 3(c) shows the trend that SPDY performs worse as the object size increases. We publish the results, trends, and network traces at http://wprof.cs.washington.edu/spdy/.

4.2.2 Why does SPDY help or hurt

While the decision tree identifies the conditions under which SPDY helps or hurts, it does not explain why. To this end, we analyze the network traces we collected to explain SPDY performance. We discuss our findings below.

SPDY helps on small objects. Our traces suggest that TCP implements congestion control by counting outstanding packets, not bytes. Thus, sending a few small objects with HTTP promptly uses up the congestion window, even though the number of outstanding bytes is far below the window limit. In contrast, SPDY batches small objects and thus eliminates this problem. This explains why the flag page [12], which mod_spdy advertised, benefits from SPDY.

SPDY benefits from having a single connection. We find several reasons why SPDY benefits from a single TCP connection. First, a single connection results in fewer retransmissions. Figure 4 shows the retransmissions in SPDY and HTTP across all configurations except those with zero injected loss. SPDY helps because packet loss occurs more often when concurrent TCP connections are competing with each other. There are additional explanations for why SPDY benefits from using a single connection. In our previous study [25], our experiments showed that SPDY significantly reduced the contribution of the TCP connection setup time to the critical path of a page download. Further, our experiments in §5 will show that a single pipe reduces the amount of time the pipe is idle due to delayed client requests.

SPDY degrades under high loss due to the use of a single pipe. We discussed above that a single TCP connection helps under several conditions.

Figure 5: Characteristics of the top 200 Alexa Web pages, shown as CDFs of (a) the number of objects, (b) the number of objects smaller than 1.5KB, (c) page size (KB), and (d) mean object size (KB).

However, a single connection hurts under high packet loss because it reduces the congestion window more aggressively than HTTP, which backs off on only one of its parallel connections.

4.3 Experimenting with real pages

In this section, we study the effects of varying object sizes and numbers of objects based on the distributions observed in real Web pages. We continue to vary other factors such as network conditions and TCP settings based on the parameter space described in Table 1. Due to space limits, we only show results under a 10Mbps bandwidth.

First, we examine the page characteristics of real pages because, when related to the decision tree, they can explain why SPDY helps or hurts. Figure 5 shows the characteristics of the top 200 Alexa Web pages [1]. The median number of objects is 30 and the median page size is 750KB. We find high variability in the size of objects within a page: the standard deviation of object sizes within a page is 31KB (median across pages), even more than the average object size of 17KB (median across pages).

Figure 6 shows the PLT of SPDY divided by that of HTTP across the 200 Web pages. It suggests that SPDY helps on 70% of the pages consistently across network conditions. Interestingly, SPDY shows a 2x speedup over half of the pages, likely for the following reasons. First, SPDY almost eliminates retransmissions (as indicated in Figure 7). Compared to a similar analysis for artificial pages (see Figure 4), SPDY's retransmission rate is even lower. Second, we find in Figure 5(b) that 80% of the pages have small objects, and that half of the pages have more than ten small objects. Since SPDY helps with small objects (based on the decision tree analysis), it is not surprising that SPDY has lower PLT for this set of experiments. In addition, we hypothesize that SPDY could help with stragglers since it multiplexes all objects onto a single connection and thus reduces the dynamics of congestion windows.


Figure 6: SPDY performance across 200 pages with object sizes and numbers of objects drawn from real pages: CDFs of the PLT of SPDY divided by the PLT of HTTP under 0 and 2% loss, for (a) rtt=20ms, bw=10Mbps and (b) rtt=200ms, bw=10Mbps. SPDY helps more under a 1Mbps bandwidth.

Figure 7: SPDY helps reduce retransmissions (CDFs of the number of retransmissions for HTTP and SPDY with unmodified TCP, for real pages).

To check this hypothesis, we ran a set of experiments with the overall page size and number of objects drawn from the real pages, but with equal-sized objects embedded inside the pages. When we perform this experiment, HTTP's performance improves only marginally, indicating that there is very little straggler effect.

4.4 TCP modifications

Previously, we found that SPDY hurts mainly under high packet loss because a single TCP connection reduces the congestion window more aggressively than HTTP's parallel connections. Here, we demonstrate that the negative impact can be mitigated by simple TCP modifications.

Our modification (a.k.a. TCP+) mimics the behavior of concurrent connections using a single connection. Let the number of parallel TCP connections be n. First, we propose multiplying the initial window by n to reduce the effect of slow start. Second, we suggest scaling the receive window by n to ensure that the SPDY connection has the same amount of receive buffer as HTTP's parallel connections.

Figure 8: TCP+ helps SPDY across the 200 pages: CDFs of the PLT of SPDY divided by the PLT of HTTP under 2% loss, with TCP and with TCP+. RTT=20ms, BW=10Mbps; results on other network settings are similar.

Figure 9: With TCP+, SPDY still produces few retransmissions (CDFs of the number of retransmissions for HTTP, SPDY with TCP, and SPDY with TCP+).

Third, when packet loss occurs, the congestion window (cwnd) backs off at a rate β′ = 1 − (1 − β)/n, where β is the original backoff rate. In practice, the number of concurrent connections changes over time. Because we are unable to pass this value to the Linux kernel in real time, we assume that HTTP uses six connections and set n = 6. We use six because it has been found optimal and is used by major browsers [17].
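Under one reading of these three changes (a minimal sketch, not the authors' kernel patch; iw = 10 comes from the setup in §4.1, and treating β as the fraction of cwnd kept after a loss, roughly 0.7 for Cubic, is our interpretation), the adjusted parameters for n = 6 are:

# Sketch of the TCP+ parameters described above (our reading, not the patch).
def tcp_plus(n=6, init_window=10, recv_window=65535, beta=0.7):
    return {
        # multiply the initial window by n to reduce the effect of slow start
        "initial_window": n * init_window,
        # scale the receive window by n to match HTTP's aggregate buffer
        "receive_window": n * recv_window,
        # on loss, cwnd is multiplied by beta' = 1 - (1 - beta)/n: the single
        # SPDY connection backs off only as much as the aggregate of n HTTP
        # connections would if one of them saw the loss
        "loss_backoff": 1 - (1 - beta) / n,
    }

print(tcp_plus())
# {'initial_window': 60, 'receive_window': 393210, 'loss_backoff': 0.95}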

We perform the same set of SPDY experiments with both synthetic and real pages using TCP+. Figure 8 shows that SPDY performs better with TCP+, and the decision tree analysis for TCP+ suggests that loss rate is no longer a key factor in determining SPDY performance.

To evaluate the potential side effects of TCP+, we look at the number of retransmissions produced by TCP+. Figure 9 shows that SPDY still produces far fewer retransmissions with TCP+ than HTTP does, meaning that TCP+ does not abuse the congestion window under the conditions we experimented with. Here, we aim to demonstrate that SPDY's negative impact under high random loss can be mitigated by tuning the congestion window. Because the loss patterns in real networks are likely more complex, a solution for real networks requires further consideration and extensive evaluation, and is beyond the scope of this paper.

5 Web pages and SPDY

This section examines how SPDY performs for real Web pages. Real page loads incur dependencies and computation that may affect SPDY's performance. To incorporate dependencies and computation while controlling variability, we develop a page load emulator, Epload, that hides the complexity and variations in browser computation while performing authentic network requests (§5.1).


Figure 10: A dependency graph obtained from WProf, showing network activities (loading the HTML, CSS, JS1, an image, and JS2), computation activities (parsing and evaluation), and the dependencies between them.

We use Epload to identify the effect of page load dependencies and computation on SPDY's performance (§5.2). We further study SPDY's potential by examining prioritization and server push (§5.3).

5.1 Epload: emulating page loads

Web objects in a page are usually not loaded at the same time, because loading an object can depend on loading or evaluating other objects. Therefore, not only network conditions, but also page load dependencies and browser computation, affect page load times. To study how much SPDY helps the overall page load time, we need to evaluate SPDY's performance while preserving the dependencies and computation of real page loads.

Dependencies and computation are naturally preserved by loading pages in real browsers. However, this procedure incurs high variance in page load times, which stems from both network conditions and browser computation. We use controlled experiments to control the variability of the network, and here introduce the Epload emulator to control the variability of computation.

Design: The key idea of Epload is to decouple network operations and computation in page loads. This allows Epload to simplify computation while scheduling network requests at the appropriate points during the page load.

Epload records the process of a page load by capturing the dependency graph using our previous work, WProf [25]. WProf captures the dependency and timing information of a page load. Figure 10 shows an example of a dependency graph obtained from WProf, in which activities depend on each other. This Web page embeds a CSS file, a JavaScript file, an image, and another JavaScript file. A bar represents an activity (i.e., loading objects, evaluating CSS and JavaScript, or parsing HTML) while an arrow indicates that one activity depends on another. For example, evaluating JS1 depends on both loading JS1 and evaluating the CSS; therefore, evaluating JS1 can start only after the other two activities complete. There are other dependencies, such as layout and painting; because they do not occur deterministically or significantly, we exclude them here.

Figure 11: Page loads using Chrome vs. Epload (CDFs of page load time in seconds).

Using the recorded dependency graph, Epload replays the page load process as follows. First, Epload starts the activity that loads the root HTML. When the activity is finished, Epload checks whether it should trigger a dependent activity, based on whether all activities that the dependent activity depends on are finished. For example, in Figure 10, the dependent activity is parsing the HTML, and it should be triggered. Next, it starts the activity that parses the HTML. Instead of performing HTML parsing, it waits for the same amount of time that parsing takes (based on the recorded information) and checks dependent activities upon completion. This proceeds until all activities are finished. The actual replay process is more complex because a dependent activity can start before an activity is fully completed. For example, parsing an HTML starts after the first chunk of the HTTP response is received, and loading the CSS starts after the first chunk of HTML is fully parsed. Epload models all of these aspects of a page load.
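A minimal, sequential sketch of this replay loop follows. It is our simplification of the design: the real replayer is written in node.js, issues requests concurrently, and lets some dependents start before an activity fully completes; the Activity record format and the fetch callback are assumptions for illustration.

# Sketch of an Epload-style replay loop (not the node.js implementation).
import time

class Activity:
    def __init__(self, name, kind, duration=0.0, url=None, deps=()):
        self.name = name
        self.kind = kind            # "comp" (computation) or "net" (object load)
        self.duration = duration    # recorded computation delay, in seconds
        self.url = url              # object URL for network activities
        self.deps = set(deps)       # names of activities this one depends on

def replay(activities, fetch):
    """activities: dict name -> Activity; fetch(url) issues a real HTTP/SPDY
    request. An activity runs once every activity it depends on has finished;
    computation is emulated by sleeping for its recorded duration."""
    done = set()
    ready = [a for a in activities.values() if not a.deps]
    while ready:
        act = ready.pop(0)
        if act.kind == "comp":
            time.sleep(act.duration)    # replay the recorded computation delay
        else:
            fetch(act.url)              # real network request
        done.add(act.name)
        ready += [a for a in activities.values()
                  if a.name not in done and a not in ready and a.deps <= done]
    return done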

Implementation: The Epload recorder is implemented on top of WProf to generate a dependency graph that specifies activities and their dependencies. Epload records the computational delays while performing the page load in the browser, whereas the network delays are realized independently for each replay run. We implement the Epload replayer using node.js. The output from the Epload replayer is a series of throttled HTTP or SPDY requests that perform a page load. The Epload code is available at http://wprof.cs.washington.edu/spdy/.

Evaluation: We validate that Epload controls the variability of computation. We compare the differences between two runs across the 200 pages, loaded by Epload and by Chrome. The network is tuned to a 20ms RTT, a 10Mbps bandwidth, and zero loss. Figure 11 shows that Epload produces differences of at most 5% for over 80% of the pages, a 90% reduction compared to Chrome.

5.2 Effects of dependencies and computation

We use Epload to measure the impact of dependencies and computation. We set up experiments as follows.


Figure 12: SPDY performance using emulated page loads: CDFs of the PLT of SPDY divided by the PLT of HTTP under 0 and 2% loss with TCP and TCP+, for (a) rtt=20ms, bw=1Mbps; (b) rtt=200ms, bw=1Mbps; (c) rtt=20ms, bw=10Mbps; (d) rtt=200ms, bw=10Mbps. Compared to Figure 6, this suggests that dependencies and computation reduce the impact of SPDY and that RTT and bandwidth become more important.

The Epload recorder uses a WProf-instrumented Chrome to obtain the dependency graphs of the top 200 Alexa Web pages [1]. Epload runs on a Mac with a 2GHz dual-core CPU and 4GB of memory. We vary the other factors based on the parameter space described in Table 1. Due to space limits, we only show figures for a 10Mbps bandwidth.

Figure 12 shows the performance of SPDY versus HTTP after incorporating dependencies and computation. Compared to Figure 6, dependencies and computation largely reduce the amount that SPDY helps or hurts. We make the following observations along with supporting evidence. First, computation and dependencies increase the PLTs of both HTTP and SPDY, reducing the network load. Second, SPDY reduces the amount of time a connection is idle, lowering the possibility of slow start (see Figure 13). Third, dependencies help HTTP by making traffic less bursty, resulting in fewer retransmissions (see Figure 14). Fourth, having fewer outstanding objects diminishes SPDY's gains, because SPDY helps more when there are a large number of outstanding objects (as suggested by the decision tree in Figure 2). Here, we see that dependencies and computation reduce and can easily nullify the benefits of SPDY, implying that speeding up computation or breaking dependencies might be necessary to improve the PLT using SPDY.

Interestingly, we find that RTT and bandwidth now play a more important role in the performance of SPDY. For example, Figure 12 shows that SPDY helps up to 80% of the pages under low bandwidths, but only 55% of the pages under high bandwidths.

Figure 13: Fractions of RTTs during which a TCP connection is idle, shown as CDFs for HTTP and SPDY, measured under a 2% loss rate.

This is because RTT and bandwidth determine the amount of time page loads spend in the network relative to computation, and thus the amount of impact that computation has on SPDY. This explains why SPDY provides minimal improvements under good network conditions (see Figure 12(c)).

To identify the impact of computation, we scale the time spent in each computation activity by factors of 0, 0.5, and 2. Figure 15 shows the performance of SPDY versus HTTP, both with scaled computation and under high bandwidths, suggesting that speeding up computation increases the impact of SPDY. Surprisingly, speeding up computation to the extreme is sometimes no better than a 2x speedup. This is because computation delays the requesting of dependent objects, which allows previously requested objects to be loaded faster and therefore can lower the PLT.
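A minimal sketch of how such scaling can be applied in a replayer like the one outlined in §5.1 (the Activity format is from that earlier sketch, not Epload's actual interface) is:

# Stretch or shrink every recorded computation delay by a factor (0, 0.5, 1, 2)
# before replaying; network activities are left untouched.
def scale_computation(activities, factor):
    for act in activities.values():
        if act.kind == "comp":
            act.duration *= factor
    return activities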

5.3 Advancing SPDY

SPDY provides two mechanisms, i) prioritization and ii) server push, to mitigate the negative effects of the dependencies and computation of real page loads. However, little is known about how to best use these mechanisms. In this section, we explore advanced policies that use them to speed up page loads.


Figure 14: SPDY helps reduce retransmissions (CDFs of the number of retransmissions with and without dependencies and computation).

Figure 15: Results of varying computation when bw=10Mbps and rtt=200ms: CDFs of the PLT of SPDY divided by the PLT of HTTP with computation scaled by factors of 0, 0.5, 1, and 2.

5.3.1 Basis of advancing

To better schedule objects, both prioritization and server push provide mechanisms to specify the importance of each object. Thus, the key issue is to identify the importance of objects in an automatic manner. To highlight the benefits, we leverage the dependency information obtained from a previous load of the same page. This information gives us ground truth as to which objects are critical for reducing PLT. For example, in Figure 10, all the activities depend on loading the HTML, making the HTML the most important object; but no activity depends on loading the image, suggesting that the image is not an important object.

To quantify the importance of an object, we first look at the time required to finish the page load starting from the load of this object. We denote this as time to finish (TTF). In Figure 10, the TTF of the image is simply the time to load the image alone, while the TTF of JS2 is the time to both load and evaluate it. Because the TTF of the image is longer than the TTF of JS2, the image is more important than JS2. Unfortunately, in practice it is not clear how long it will take to load an object before we must decide whether to prioritize or push it.

Therefore, we simplify the definition of importance. First, we convert the activity-based dependency graph to an object-based graph by eliminating computation while preserving dependencies (Figure 16). Second, we calculate the longest path from each object to the leaf objects; this process is equivalent to calculating node depths in a directed acyclic graph. Figure 16 (right) shows an example of assigned depths. Note that the depth here equals TTF if we ignore computation and assume that loading each object takes the same amount of time.
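A minimal sketch of this depth computation follows. The graph encoding, in which children[u] lists the objects whose load depends on u, is our assumption, and the small example graph is hypothetical rather than the exact one in Figure 16.

# Depth of an object = longest path from it to a leaf in the object DAG.
from functools import lru_cache

def depths(children):
    @lru_cache(maxsize=None)
    def depth(obj):
        kids = children.get(obj, [])
        if not kids:
            return 1                  # leaf objects get depth 1
        return 1 + max(depth(k) for k in kids)
    return {obj: depth(obj) for obj in children}

# Hypothetical page: the root HTML embeds a CSS file, two scripts, and an
# image; J2 can only load after C and J1 (cf. Figure 16).
children = {"H": ["C", "J1", "I1"], "C": ["J2"], "J1": ["J2"],
            "J2": [], "I1": []}
print(depths(children))   # {'H': 3, 'C': 2, 'J1': 2, 'J2': 1, 'I1': 1}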

Figure 16: Converting the WProf dependency graph (with network and computation activities) to an object-based graph, and calculating a depth for each object in the object-based graph.

Figure 17: Results of prioritization (zero packet loss) when bw=10Mbps: PLT with prioritization divided by PLT without prioritization, at the 10th, 50th, and 90th percentiles, for chrome-priority and dependency-priority, with (a) rtt=20ms and (b) rtt=200ms. bw=1Mbps results are similar to (b).

We use this depth information to prioritize and push objects. This implies that the browser or the server should know this information beforehand. We provide a tool to let Web developers measure the depth information for the objects transported by their pages.

5.3.2 Prioritization

SPDY/3 allows eight priority levels for clients to use when requesting objects. The SPDY best practices website [18] recommends prioritizing HTML over CSS/JavaScript and CSS/JS over the rest (chrome-priority). Our priority levels are obtained by linearly mapping the depth information computed above (dependency-priority).
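The exact linear mapping is not specified; one plausible sketch (our assumption: deeper objects, which gate more of the page, get the more urgent of SPDY/3's eight levels, with 0 the most urgent) is:

# Map dependency depth in [1, max_depth] onto SPDY/3 priority levels 0..7.
def to_spdy_priority(depth, max_depth, levels=8):
    if max_depth <= 1:
        return 0
    frac = (depth - 1) / (max_depth - 1)          # 0 for leaves, 1 for the root
    return (levels - 1) - round(frac * (levels - 1))

# Root HTML (deepest) gets level 0; leaf images get level 7.
print([to_spdy_priority(d, 4) for d in (4, 3, 2, 1)])   # [0, 2, 5, 7]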

We compare the two prioritization policies to baseline SPDY in Figure 17. Interestingly, we find that there is almost no benefit from using chrome-priority, while dependency-priority helps marginally under a 20ms RTT. The impact of explicit prioritization is minimal because the dependency graph has already implicitly prioritized objects. Implicit prioritization results from browser policies, independent of the Web pages themselves. For example, in Figure 10, all other objects cannot be loaded before the HTML, and the image and JS2 cannot be loaded before the CSS and JS1. As dependencies limit the impact of SPDY, and prioritization cannot break dependencies, it is unlikely to improve SPDY's PLT.


Figure 18: Results of server push when bw=10Mbps: (a) CDF of the fraction of pushed bytes, and CDFs of the PLT with server push divided by the baseline SPDY PLT for (b) rtt=20ms and (c) rtt=200ms, comparing push all, push by embedding, and push by dependency.

5.3.3 Server push

SPDY allows servers to push objects to save round trips. However, server push is non-trivial because there is a tension between making page loads faster and wasting bandwidth. In particular, one should not overuse server push, since pushed objects may already be cached at the client. Thus, the key goal is to speed up page loads while keeping the cost low.

We find no standard or best-practices guidance from Google on how to do server push. mod_spdy can be configured to push up to an embedding level, which is defined as follows: the root HTML page is at embedding level 0; objects at embedding level i are those whose URLs are embedded in objects at embedding level i−1. An alternative policy is to push based on the depth information.
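The sketch below contrasts the two policies for choosing what to push when the root page is requested. Both the input formats and the reading of "one dependency level" as "objects one level below the root's depth" are our assumptions for illustration; neither function is mod_spdy's actual code.

# embedded_in[u] is the object whose body embeds u's URL; depth[u] is the
# dependency depth from the earlier sketch (the root HTML has the max depth).
def embedding_level(url, embedded_in, root):
    level, cur = 0, url
    while cur != root:
        cur = embedded_in[cur]
        level += 1
    return level

def push_by_embedding(urls, embedded_in, root, max_level=1):
    """mod_spdy-style: push every object up to a given embedding level."""
    return [u for u in urls
            if 0 < embedding_level(u, embedded_in, root) <= max_level]

def push_by_dependency(urls, depth, max_depth, levels=1):
    """Dependency-based: push only the objects within `levels` dependency
    levels of the root, i.e. those that gate the most remaining work."""
    return [u for u in urls
            if max_depth - levels <= depth[u] < max_depth]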

Figure 18 shows server push performance (i.e., pushing all objects, one embedding level, or one dependency level) compared to baseline SPDY. We find that server push helps, especially under high RTT. We also find that pushing by dependency yields speedups comparable to pushing by embedding, while benefiting from an 80% reduction in pushed bytes (Figure 18(a)). Note that server push does not always help, because pushed objects share bandwidth with more important objects. In contrast to prioritization, server push can help because it breaks the dependencies that limit the performance gains of SPDY.

5.4 Putting it all together

We now pool together the various enhancements (i.e., TCP+ and server push by one dependency level). Figure 19 shows that this improves SPDY by 30% under high RTTs. But this improvement largely diminishes under low RTTs, where computation dominates page load times.

6 Discussion

SPDY in the wild: To evaluate SPDY in the wild, we place clients in Virginia (US-East), Northern California (US-West), and Ireland (Europe) using Amazon EC2 micro instances. We add explanatory power by periodically probing the network parameters between the clients and the server, and find that RTTs are consistent: 22ms (US-East), 71ms (US-West), and 168ms (Europe). For all vantage points, bandwidths are high (10Mbps to 143Mbps) and loss rates are extremely low.

Figure 19: Putting it all together when bw=10Mbps: CDFs of the PLT of SPDY divided by the PLT of HTTP with no enhancement, with TCP+, and with TCP+ plus server push, for (a) rtt=20ms and (b) rtt=200ms.

Figure 20: Results of domain sharding when bw=10Mbps and rtt=20ms: CDFs of the PLT of SPDY divided by the PLT of HTTP with no sharding, sharding by domain, and sharding by TLD.

These network parameters explain why our SPDY evaluations in the wild (not shown due to space limits) are similar to the synthetic ones under high bandwidths and low loss rates. The evaluations here are preliminary; covering a complete set of scenarios is future work.

Domain sharding: As suggested by SPDY best practices [18], we used a single connection to fetch all the objects of a page to eliminate the negative impact of domain sharding. In practice, migrating objects to one domain suffers from deployment issues given the popular use of third parties (e.g., CDNs, ads, and analytics).


To this end, we evaluate situations in which objects are distributed across multiple servers that cooperatively use SPDY. We distribute objects by full domain to represent the state of the art of domain sharding. We also distribute objects by top-level domain (TLD); this represents the situation in which websites have eliminated domain sharding but still use third-party services. Figure 20 compares SPDY performance under these object distributions. We find that domain sharding hurts as expected, but hosting objects by TLD is comparable to using one connection, suggesting that SPDY's performance does not degrade much when some portions of the page are provided by third-party services.

SSL: SSL adds overhead to page loads, which can degrade the impact of SPDY, but SPDY keeps the handshake overhead low by using a single connection. We conduct our experiments using SSL and find that the overhead of SSL is too small to affect SPDY's performance.

Mobile: We perform a small set of SPDY measurements under mobile environments. We assume large RTTs, low bandwidths, high losses, and large computational delays, as suggested by related literature [3, 26]. Results with simulated slow networks suggest that SPDY helps more but also hurts more. They also show that prioritization and server push by dependency help less (not shown due to space limits). However, large computational delays on mobile devices reduce the benefits provided by SPDY. This means that the benefits of SPDY under mobile scenarios depend on the relative changes in the performance of the network and of computation. Further studies on real mobile devices and networks would advance the understanding in this space.

Limitations: Our work does not consider a number of aspects. First, we did not evaluate the effects of header compression because it is not expected to provide significant benefits. Second, we did not evaluate dynamic pages, which take more time in server processing. Similar to browser computation, server processing will likely reduce the impact of SPDY. Last, we are unable to evaluate SPDY on production servers where the network is heavily used.

7 Related Work

SPDY studies: Erman et al. [7] studied SPDY in the wild on 20 Web pages by using cellular connections and SPDY proxies. They found that SPDY performed poorly while interacting with radios, due to a large body of unnecessary retransmissions. We used more reliable connections, enabled SPDY on servers, and swept a more complete parameter space. Other SPDY studies include the SPDY white paper [20] and measurements by Microsoft [14], Akamai [13], and Cable Labs [19].

The SPDY white paper shows a 27% to 60% speedup for SPDY, but the other studies show that SPDY helps only marginally. While providing invaluable measurements, these studies cover a limited parameter space. The studies by Microsoft [14] and Cable Labs [19] measured only single Web pages, and the other studies consider only a limited set of network conditions. Our study extensively sweeps the parameter space, including network parameters, TCP settings, and Web page characteristics. We are the first to isolate the effect of dependencies, which are found to limit the impact of SPDY.

TCP enhancements for the Web: Google has proposed and deployed several TCP enhancements to make the Web faster. TCP Fast Open eliminates the TCP connection setup time by sending application data in the SYN packet [15]. Proportional rate reduction smoothly backs off the congestion window to transmit more data under packet loss [5]. Tail loss probe [23] and other measurement-driven enhancements described in [8] mitigate or eliminate loss recovery by retransmission timeout. Our TCP modifications are specific to SPDY and are orthogonal to Google's proposals.

Advanced SPDY mechanisms: There are no recommended policies on how to use the server push mechanism. We find that mod_spdy [11] implements server push by embedding level; however, this push policy wastes bandwidth. We provide a server push policy based on dependency levels that performs comparably to mod_spdy's while pushing 80% less data.

8 Conclusion

Our experiments and prior work show that SPDY can either help or sometimes hurt the load times of real Web pages in browsers compared to using HTTP. To learn which factors lead to performance improvements, we start with simple, synthetic page loads and progressively add key features of the real page load process. We find that most of the performance impact of SPDY comes from its use of a single TCP connection: when there is little network loss a single connection tends to perform well, but when there is high loss a set of connections tends to perform better. However, the benefits from a single TCP connection can be easily overwhelmed by dependencies in real Web pages and browser computation. We conclude that further benefits in PLT will require changes that restructure the page load process, such as the server push feature of SPDY, as well as careful configuration at the TCP level to ensure good network performance.

Acknowledgements

We thank Will Chan, Yu-Chung Cheng, and Roberto Peon from Google, our shepherd, Sanjay Rao, and the anonymous reviewers for their feedback. We thank Ruhui Yan for helping analyze packet traces.


References

[1] Alexa - The Web Information Company. http://www.alexa.com/topsites/countries/US.

[2] Data compression in Chrome Beta for Android. http://blog.chromium.org/2013/03/data-compression-in-chrome-beta-for.html.

[3] A. Balasubramanian, R. Mahajan, and A. Venkataramani. Augmenting Mobile 3G Using WiFi. In Proc. of the International Conference on Mobile Systems, Applications, and Services (MobiSys), 2010.

[4] National Broadband Map. http://www.broadbandmap.gov/.

[5] N. Dukkipati, M. Mathis, Y. Cheng, and M. Ghobadi. Proportional Rate Reduction for TCP. In Proc. of the SIGCOMM Internet Measurement Conference (IMC), 2011.

[6] Dummynet. http://info.iet.unipi.it/~luigi/dummynet/.

[7] J. Erman, V. Gopalakrishnan, R. Jana, and K. Ramakrishnan. Towards a SPDYier Mobile Web? In Proc. of the International Conference on emerging Networking EXperiments and Technologies (CoNEXT), 2013.

[8] T. Flach, N. Dukkipati, A. Terzis, B. Raghavan, N. Cardwell, Y. Cheng, A. Jain, S. Hao, E. Katz-Bassett, and R. Govindan. Reducing Web Latency: the Virtue of Gentle Aggression. In Proc. of ACM SIGCOMM, 2013.

[9] T. J. Hacker, B. D. Noble, and B. D. Athey. The Effects of Systemic Packet Loss on Aggregate TCP Flows. In Proc. of the IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2002.

[10] HTTP/2.0 Draft Specifications. https://github.com/http2/http2-spec.

[11] mod_spdy. https://code.google.com/p/mod-spdy/.

[12] World Flags mod_spdy Demo. https://www.modspdy.com/world-flags/.

[13] Not as SPDY as you thought. http://www.guypo.com/technical/not-as-spdy-as-you-thought/.

[14] J. Padhye and H. F. Nielsen. A comparison of SPDY and HTTP performance. MSR-TR-2012-102.

[15] S. Radhakrishnan, Y. Cheng, J. Chu, A. Jain, and B. Raghavan. TCP Fast Open. In Proc. of the International Conference on emerging Networking EXperiments and Technologies (CoNEXT), 2011.

[16] Amazon Silk browser. http://amazonsilk.wordpress.com/.

[17] Chapter 11. HTTP 1.X. http://chimera.labs.oreilly.com/books/1230000000545/ch11.html.

[18] SPDY best practices. http://dev.chromium.org/spdy/spdy-best-practices.

[19] Analysis of SPDY and TCP Initcwnd. http://tools.ietf.org/html/draft-white-httpbis-spdy-analysis-00.

[20] SPDY whitepaper. http://www.chromium.org/spdy/spdy-whitepaper.

[21] SPDY Protocol - Draft 3. http://www.chromium.org/spdy/spdy-protocol/spdy-protocol-draft3.

[22] Spdylay - SPDY C Library. https://github.com/tatsuhiro-t/spdylay.

[23] Tail Loss Probe (TLP): An Algorithm for Fast Recovery of Tail Losses. http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01.

[24] W3C DOM Level 3 Events Specification. http://www.w3.org/TR/DOM-Level-3-Events/.

[25] X. S. Wang, A. Balasubramanian, A. Krishnamurthy, and D. Wetherall. Demystifying Page Load Performance with WProf. In Proc. of the USENIX Conference on Networked Systems Design and Implementation (NSDI), 2013.

[26] X. S. Wang, H. Shen, and D. Wetherall. Accelerating the Mobile Web with Selective Offloading. In Proc. of the ACM SIGCOMM Workshop on Mobile Cloud Computing (MCC), 2013.
