    Optimizing Page Load Time

It is widely accepted that fast-loading pages improve the user experience. In recent years, many sites have started using AJAX techniques to reduce latency. Rather than round-trip through the server retrieving a completely new page with every click, often the browser can either alter the layout of the page instantly or fetch a small amount of HTML, XML, or javascript from the server and alter the existing page. In either case, this significantly decreases the amount of time between a user click and the browser finishing rendering the new content.

However, for many sites that reference dozens of external objects, the majority of the page load time is spent in separate HTTP requests for images, javascript, and stylesheets. AJAX probably could help, but speeding up or eliminating these separate HTTP requests might help more, yet there isn't a common body of knowledge about how to do so.

While working on optimizing page load times for a high-profile AJAX application, I had a chance to investigate how much I could reduce latency due to external objects. Specifically, I looked into how the HTTP client implementation in common browsers and characteristics of common Internet connections affect page load time for pages with many small objects.

    I found a few things to be interesting:

IE, Firefox, and Safari ship with HTTP pipelining disabled by default; Opera is the only browser I know of that enables it. No pipelining means each request has to be answered and its connection freed up before the next request can be sent. This incurs average extra latency of the round-trip (ping) time to the user divided by the number of connections allowed. Or if your server has HTTP keepalives disabled, doing another TCP three-way handshake adds another round trip, doubling this latency.

By default, IE allows only two outstanding connections per hostname when talking to HTTP/1.1 servers, or eight-ish outstanding connections total. Firefox has similar limits. Using up to four hostnames instead of one will give you more connections. (IP addresses don't matter; the hostnames can all point to the same IP.)

Most DSL or cable Internet connections have asymmetric bandwidth, at rates like 1.5Mbit down/128Kbit up, 6Mbit down/512Kbit up, etc. Ratios of download to upload bandwidth are commonly in the 5:1 to 20:1 range. This means that for your users, a request takes the same amount of time to send as it takes to receive an object of 5 to 20 times the request size. Requests are commonly around 500 bytes, so this should significantly impact objects that are smaller than maybe 2.5k to 10k. This means that serving small objects might mean the page load is bottlenecked on the users' upload bandwidth, as strange as that may sound.

Using these, I came up with a model to guesstimate the effective bandwidth of users of various flavors of network connections when loading various object sizes.

It assumes that each HTTP request is 500 bytes and that the HTTP reply includes 500 bytes of headers in addition to the object requested. It is simplistic and only covers connection limits and asymmetric bandwidth, and doesn't account for the TCP handshake of the first request of a persistent (keepalive) connection, which is amortized when requesting many objects from the same connection. Note that this is best-case effective bandwidth and doesn't include other limitations like TCP slow-start, packet loss, etc. The results are interesting enough to suggest avenues of exploration but are no substitute for actually measuring the difference with real browsers.
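
To make the model concrete, here is a small javascript sketch of one way to compute best-case effective bandwidth under those assumptions. The per-object accounting (one round trip, plus uploading the request, plus downloading headers and object, over a fixed number of parallel connections with no pipelining) is my own simplification, not the author's actual simulator.

    // Rough sketch of the effective-bandwidth model described above.
    // Each object costs one round trip, the upload of a 500-byte request, and the
    // download of 500 bytes of headers plus the object, with several requests
    // outstanding in parallel and no pipelining.
    function effectiveBandwidth(objectBytes, opts) {
      const { downBps, upBps, rttSec, connections } = opts;
      const requestBytes = 500, headerBytes = 500;
      const perObjectSec =
        rttSec +
        (requestBytes * 8) / upBps +
        ((headerBytes + objectBytes) * 8) / downBps;
      // With N connections in flight, roughly N objects complete per perObjectSec,
      // but we can never exceed the downstream link rate.
      const bitsPerSec = (connections * objectBytes * 8) / perObjectSec;
      return Math.min(bitsPerSec, downBps);
    }

    // Example: 1.5Mbit down / 384Kbit up, 100ms away, 2 connections, 10k objects.
    console.log(effectiveBandwidth(10 * 1024,
      { downBps: 1500000, upBps: 384000, rttSec: 0.1, connections: 2 }));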

To show the effect of keepalives and multiple hostnames, I simulated a user on net offering 1.5Mbit down/384Kbit up who is 100ms away with 0% packet loss. This roughly corresponds to medium-speed ADSL on the other side of the U.S. from your servers. Shown here is the effective bandwidth while loading a page with many objects of a given size, with effective bandwidth defined as total object bytes received divided by the time to receive them.

    Interesting things to note:

For objects of relatively small size (the left-hand portion of the graph), you can see from the empty space above the plotted line how little of the user's downstream bandwidth is being used, even though the browser is requesting objects as fast as it can.

This user has to be requesting objects larger than 100k before he's mostly filling his available downstream bandwidth.

For objects under roughly 8k in size, you can double his effective bandwidth by turning keepalives on and spreading the requests over four hostnames. This is a huge win.

If the user were to enable pipelining in his browser (such as setting Firefox's network.http.pipelining in about:config), the number of hostnames we use wouldn't matter, and he'd make even more effective use of his available bandwidth. But we can't control that server-side.

Perhaps more clearly, the following is a graph of how much faster pages could load for an assortment of common access speeds and latencies with many external objects spread over four hostnames and keepalives enabled. Baseline (0%) is one hostname and keepalives disabled.

    Interesting things from that graph:

If you load many objects smaller than 10k, both local users and ones on the other side of the world could see substantial improvement from enabling keepalives and spreading requests over 4 hostnames.

    There is a much greater improvement for users further away.

This will matter more as access speeds increase. The user on 100meg ethernet only 20ms away from the server saw the biggest improvement.

One more thing I examined was the effect of request size on effective bandwidth. The above graphs assumed 500 byte requests and 500 bytes of reply headers in addition to the object contents. How does changing that affect performance of our 1.5Mbit down/384Kbit up and 100ms away user, assuming we're already using four hostnames and keepalives?

This shows that at small object sizes, we're bottlenecked on the upstream bandwidth. The browser sending larger requests (such as ones laden with lots of cookies) seems to slow the requests down by 40% worst-case for this user.

As I've said, these graphs are based on a simulation and don't account for a number of real-world factors. But I've unscientifically verified the results with real browsers on real net and believe them to be a useful gauge. I'd like to find the time and resources to reproduce these using real data collected from real browsers over a range of object sizes, access speeds, and latencies.

    Measuring the effective bandwidth of your users

You can measure the effective bandwidth of your users on your site relatively easily, and if the effective bandwidth of users viewing your pages is substantially below their available downstream bandwidth, it might be worth attempting to improve this.

Before giving the browser any external object references (img, script, and stylesheet link tags, etc.), record the current time. After the page load is done, subtract the time you began, and include that time in the URL of an image you reference off of your server.

    Sample javascript implementing this:

    ...
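
A minimal sketch of that approach (my own illustration rather than the original sample; the /timer.gif beacon and the u= and t= parameters match the log line below, the variable names are arbitrary):

    <script>
    // Record the current time before any external object references appear in the page.
    var pageStartTime = (new Date()).getTime();

    window.onload = function () {
      // Seconds from just before the first external reference until the load finished.
      var loadSeconds = ((new Date()).getTime() - pageStartTime) / 1000;
      // Report it by requesting a tiny image off of your own server, with the page
      // URL and the timing in the query string.
      (new Image()).src = '/timer.gif?u=' + location.href + '&t=' + loadSeconds.toFixed(3);
    };
    </script>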

This will produce web log entries of the form:

10.1.2.3 - - [28/Oct/2006:13:47:45 -0700] "GET /timer.gif?u=http://example.com/page.html&t=0.971 HTTP/1.1" 200 49 ...

In this case, the entry shows that for this user, loading the rest of http://example.com/page.html took 0.971 seconds. And if you know that the combined size of everything referenced from that page is 57842 bytes, 57842 bytes * 8 bits per byte / 0.971 seconds = 476556 bits per second effective bandwidth for that page load. If this user should be getting 1.5Mbit downstream bandwidth, there is substantial room for improvement.

    Tips to reduce your page load time

After you gather some page-load times and effective bandwidth for real users all over the world, you can experiment with changes that will improve those times. Measure the difference and keep any that offer a substantial improvement.

    Try some of the following:

Turn on HTTP keepalives for external objects. Otherwise you add an extra round-trip to do another TCP three-way handshake and slow-start for every HTTP request. If you are worried about hitting global server connection limits, set the keepalive timeout to something short, like 5-10 seconds. Also look into serving your static content from a different webserver than your dynamic content. Having thousands of connections open to a stripped-down static file webserver can happen in like 10 megs of RAM total, whereas your main webserver might easily eat 10 megs of RAM per connection.

Load fewer external objects. Due to request overhead, one bigger file just loads faster than two smaller ones half its size. Figure out how to globally reference the same one or two javascript files and one or two external stylesheets instead of many; if you have more, try preprocessing them when you publish them. If your UI uses dozens of tiny GIFs all over the place, consider switching to a much cleaner CSS-based design which probably won't need so many images. Or load all of your common UI images in one request using a technique called "CSS sprites".

If your users regularly load a dozen or more uncached or uncacheable objects per page, consider evenly spreading those objects over four hostnames. This usually means your users can have 4x as many outstanding connections to you. Without HTTP pipelining, this results in their average request latency dropping to about 1/4 of what it was before.

When you generate a page, evenly spreading your images over four hostnames is most easily done with a hash function, like MD5. Rather than having all of your image tags load objects from http://static.example.com/, create four hostnames (e.g. static0.example.com, static1.example.com, static2.example.com, static3.example.com) and use two bits from an MD5 of the image path to choose which of the four hosts you reference in the tag (a sketch of this follows the next point). Make sure all pages consistently reference the same hostname for the same image URL, or you'll end up defeating caching.

Beware that each additional hostname adds the overhead of an extra DNS lookup and an extra TCP three-way handshake. If your users have pipelining enabled or a given page loads fewer than around a dozen objects, they will see no benefit from the increased concurrency and the site may actually load more slowly. The benefits only become apparent on pages with larger numbers of objects. Be sure to measure the difference seen by your users if you implement this.
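
A sketch of the hostname-picking step described above, in Node-style javascript purely as an illustration; the static0-3.example.com names come from the text, the function name is mine:

    // Pick one of four static hostnames from two bits of an MD5 of the image path,
    // so a given path always maps to the same hostname on every page.
    const crypto = require('crypto');

    function hostForImage(imagePath) {
      const digest = crypto.createHash('md5').update(imagePath).digest();
      const bucket = digest[0] & 0x03;              // two bits -> 0, 1, 2, or 3
      return 'static' + bucket + '.example.com';
    }

    // e.g. produces something like http://static2.example.com/img/logo.gif
    console.log('http://' + hostForImage('/img/logo.gif') + '/img/logo.gif');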

Possibly the best thing you can do to speed up pages for repeat visitors is to allow static images, stylesheets, and javascript to be unconditionally cached by the browser. This won't help the first page load for a new user, but can substantially speed up subsequent ones.

Set an Expires header on everything you can, with a date days or even months into the future. This tells the browser it is okay to not revalidate on every request, which can add latency of at least one round-trip per object per page load for no reason. (A small header sketch follows these caching points.)

Instead of relying on the browser to revalidate its cache, if you change an object, change its URL. One simple way to do this for static objects if you have staged pushes is to have the push process create a new directory named by the build number, and teach your site to always reference objects out of the current build's base URL. (Instead of a fixed URL, each reference includes the current build number; when you do another build next week, all references change to the new build's URLs.) This also nicely solves problems with browsers sometimes caching things longer than they should -- since the URL changed, they think it is a completely different object.

If you conditionally gzip HTML, javascript, or CSS, you probably want to add a "Cache-Control: private" if you set an Expires header. This will prevent problems with caching by proxies that won't understand that your gzipped content can't be served to everyone. (The Vary header was designed to do this more elegantly, but you can't use it because of IE brokenness.)

For anything where you always serve the exact same content when given the same URL (e.g. static images), add "Cache-Control: public" to give proxies explicit permission to cache the result and serve it to different users. If a caching proxy local to the user has the content, it is likely to have much less latency than you; why not let it serve your static objects if it has them?

Avoid the use of query params in image URLs, etc. At least the Squid cache refuses to cache any URL containing a question mark by default. I've heard rumors that other things won't cache those URLs at all, but I don't have more information.
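
As an illustration of the Expires and Cache-Control points above, here is a small Node-style javascript sketch; in practice you would usually set these in your webserver's configuration, and the 30-day lifetime is just an example:

    // Far-future Expires plus Cache-Control: public on a static object, using
    // Node's built-in http module as a stand-in for your webserver's config.
    const http = require('http');

    const THIRTY_DAYS = 30 * 24 * 60 * 60; // seconds

    http.createServer(function (req, res) {
      res.setHeader('Expires', new Date(Date.now() + THIRTY_DAYS * 1000).toUTCString());
      res.setHeader('Cache-Control', 'public, max-age=' + THIRTY_DAYS); // proxies may cache and share this
      res.end('static object bytes would go here');
    }).listen(8080);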

On pages where your users are often sent the exact same content over and over, such as your home page or RSS feeds, implementing conditional GETs can substantially improve response time and save server load and bandwidth in cases where the page hasn't changed.

When serving static files (including HTML) off of disk, most webservers will generate Last-Modified and/or ETag reply headers for you and make use of the corresponding If-Modified-Since and/or If-None-Match mechanisms on requests. But as soon as you add server-side includes, dynamic templating, or have code generating your content as it is served, you are usually on your own to implement these.
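
A minimal sketch of implementing that yourself, again in Node-style javascript as an illustration (renderHomePage is a hypothetical stand-in for however you generate the page):

    // Answer a conditional GET for dynamically generated content by computing an
    // ETag from the rendered body and returning 304 when the browser's copy matches.
    const http = require('http');
    const crypto = require('crypto');

    // Hypothetical stand-in for however you generate your page content.
    function renderHomePage() {
      return '<html><body>home page as of ' + new Date().toDateString() + '</body></html>';
    }

    http.createServer(function (req, res) {
      const body = renderHomePage();
      const etag = '"' + crypto.createHash('md5').update(body).digest('hex') + '"';
      res.setHeader('ETag', etag);
      if (req.headers['if-none-match'] === etag) {
        res.statusCode = 304; // not modified: the browser re-uses its cached copy
        res.end();
      } else {
        res.end(body);
      }
    }).listen(8080);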

Firebug can show how page load time increases with each object loaded. YSlow extends Firebug to offer tips on how to improve your site's performance.

The Safari team offers a tip on a hidden feature in their browser that offers some timing data too.

Or if you are familiar with the HTTP protocol and TCP/IP at the packet level, you can watch what is going on using tcpdump, ngrep, or ethereal. These tools are indispensable for all sorts of network debugging.

Try benchmarking common pages on your site from a local network with ab, which comes with the Apache webserver. If your server is taking longer than 5 or 10 milliseconds to generate a page, you should make sure you have a good understanding of where it is spending its time.

If your latencies are high and your webserver process (or CGI if you are using that) is eating a lot of CPU during this test, it is often a result of using a scripting language that needs to recompile your scripts with every request. Software like eAccelerator for PHP, mod_perl for perl, mod_python for python, etc can cache your scripts in a compiled state, dramatically speeding up your site. Beyond that, look at finding a profiler for your language that can tell you where you are spending your CPU. If you improve that, your pages will load faster and you'll be able to handle more traffic with fewer machines.

If your site relies on doing a lot of database work or some other time-consuming task to generate the page, consider adding server-side caching of the slow operation. Most people start with writing a cache to local memory or local disk, but that starts to fall down if you expand to more than a few webserver machines. Look into using memcached, which essentially creates an extremely fast shared cache that's the combined size of the spare RAM you give it off of all of your machines. It has clients available in most common languages.
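
As an illustration of the cache-aside pattern this describes, here is a javascript sketch. The in-memory stand-in only exists to keep the example self-contained; a real deployment would use a memcached client shared by all of your web servers, and the slow query here is hypothetical:

    // Server-side caching of a slow operation (cache-aside).
    const cache = new Map(); // stand-in for a shared memcached client

    async function cacheGet(key) { return cache.get(key); }
    async function cacheSet(key, value, ttlSeconds) {
      cache.set(key, value);
      setTimeout(function () { cache.delete(key); }, ttlSeconds * 1000);
    }

    // Hypothetical slow operation, e.g. a heavy database query.
    async function slowFrontPageQuery() {
      return { stories: ['...'], generatedAt: Date.now() };
    }

    async function getFrontPageData() {
      const key = 'frontpage-data';
      const hit = await cacheGet(key);
      if (hit) return JSON.parse(hit);                  // cache hit: skip the slow work
      const fresh = await slowFrontPageQuery();
      await cacheSet(key, JSON.stringify(fresh), 60);   // cache the result for 60 seconds
      return fresh;
    }

    getFrontPageData().then(function (data) { console.log(data); });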

(Optional) Petition browser vendors to turn on HTTP pipelining by default on new browsers. Doing so will remove some of the need for these tricks and make much of the web feel much faster for the average user. (Firefox has this disabled supposedly because some proxies, some load balancers, and some versions of IIS choke on pipelined requests. But Opera has found sufficient workarounds to enable pipelining by default. Why can't other browsers do similarly?)

The above list covers improving the speed of communication between browser and server and can be applied generally to many sites, regardless of what webserver software they use or what language the code behind your site is written in. There is, unfortunately, a lot that isn't covered.

While the tips above are intended to improve your page load times, a side benefit of many of them is a reduction in server bandwidth and CPU needed for the average page view. Reducing your costs while improving your user experience seems like it should be worth spending some time on.
