Software Engineering for the Web: The State of the Practice. ICSE 2014

Post on 29-Nov-2014

DESCRIPTION

Today’s web applications increasingly rely on client-side code execution. HTML is not just created on the server, but manipulated extensively within the browser through JavaScript code. In this paper we seek to understand the software engineering implications of this. We look at deviations from many known best practices in areas such as network performance, accessibility, and correct structuring of HTML documents. Furthermore, we assess to what extent such deviations manifest themselves through client-side code manipulation only. To answer these questions, we conducted a large-scale experiment, involving automated client-enabled crawling of over 4000 web applications, resulting in over 100,000,000 pages analyzed, and close to 1,000,000 unique client-side user interface states. Our findings show that the majority of sites contain a substantial number of problems, making sites unnecessarily slow, inaccessible for the visually impaired, and with layout that is unpredictable due to errors in the dynamically modified DOM trees.

http://salt.ece.ubc.ca/publications/docs/icse14-seip.pdf

TRANSCRIPT

Software Engineering for the Web: The State of the Practice

Alex Nederlof (@alexnederlof)

Arie van Deursen (@avandeursen)

Ali Mesbah (@amesbah)

http://bit.ly/sop_icse14

TESTING WEB APPS IS A PAIN IN THE NECK

Can’t we fix that?

SPOILER: WE’RE NOT DOING WELL

Web Applications?

The web was designed for document sharing between researchers using HTML

But then JavaScript happened

COMPLEXITY x DIVERSITY - TESTING = BUGS

CRAWLJAX JavaScript-Enabled Crawling

<!DOCTYPE HTML> <HTML> <HEAD> <TITLE>Computers Rule</TITLE> </HEAD> <BODY> <H1>Computer says:</H1> <p>NO</p> </BODY> </HTML>

<!DOCTYPE HTML> <HTML> <HEAD> <TITLE>Ultimate Answer</TITLE> </HEAD> <BODY> <H1>Computer says:</H1> <p>42</p> </BODY> </HTML>

STATES ARE THE NEW PAGES

4,221 APPLICATIONS

2,974,641 STATES

How dynamic is the web?

How bad is the web?

MEASURING DYNAMISM

How dynamic is the web?

States per URL: 1.9

State invisibility: 96%

Post-load DOM manipulations: 64% text, 89% DOM

ASSESSING THE DAMAGE

DEFINING AMBIGUOUS ID ATTRIBUTES

<H1 class="title" id="first-title">Hello!</H1>

53% of the sites do, on 35% of the states
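The ambiguous-ID check can be illustrated with a small script. This is only a minimal sketch using Python's standard `html.parser`, not the tooling used in the study; the names `IdCollector` and `duplicate_ids` are made up for this example.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Counts how often each id attribute value occurs in a document."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1


def duplicate_ids(html):
    """Return the sorted id values that appear on more than one element."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return sorted(value for value, count in collector.ids.items() if count > 1)
```

In a crawling setup, a function like this would run on the serialized DOM of every state rather than on the HTML first sent by the server, since duplicates may only appear after JavaScript has modified the page.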

DEFINE A DOCTYPE

<!DOCTYPE HTML> <HTML> <HEAD> <TITLE>Hello World</TITLE> </HEAD> <BODY> <H1>Hello MSc Thesis!</H1> <A href="http://ns.nl">Go to NS</A> </BODY> </HTML>

61.6% RENDER IT 90’s STYLE
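Whether a page declares a doctype can be checked mechanically. The sketch below assumes Python's standard `html.parser`; `DoctypeFinder` and `declared_doctype` are illustrative names, and the check only looks for the presence of a declaration, not for its position or validity.

```python
from html.parser import HTMLParser


class DoctypeFinder(HTMLParser):
    """Remembers the first DOCTYPE declaration found, if any."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. "DOCTYPE HTML".
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


def declared_doctype(html):
    """Return the doctype declaration text, or None when it is missing."""
    finder = DoctypeFinder()
    finder.feed(html)
    finder.close()
    return finder.doctype
```

A result of `None` means browsers are likely to fall back to quirks mode for that page.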

FORMULATE VALID HTML

<H1 class="title" id="first-title">Hello!</H1>

13% Forget this

9% go wrong here

20% misplace elements altogether

53% contain duplicate IDs

61% render like the 90s

~20% contain invalid HTML

SPEED

Errors in the web

Best practices

THOU SHALT CACHE THY RESOURCES

43% don’t

0% used HTML5 caching

THOU SHALT COMPRESS THY RESOURCES

80% don’t
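Both the caching and the compression rules come down to inspecting HTTP response headers. The following is a rough sketch under stated assumptions: the exact criteria used in the study are not given in the slides, so the rules below (any freshness or validator header counts as cacheable; `gzip`, `deflate`, or `br` counts as compressed) are this example's own, as are the function names.

```python
COMPRESSED_ENCODINGS = {"gzip", "deflate", "br"}


def _lowercase_keys(headers):
    """HTTP header names are case-insensitive, so normalise them first."""
    return {name.lower(): value for name, value in headers.items()}


def is_cacheable(headers):
    """Assumed rule: cacheable if not 'no-store' and a caching header exists."""
    h = _lowercase_keys(headers)
    cache_control = h.get("cache-control", "").lower()
    if "no-store" in cache_control:
        return False
    if "max-age" in cache_control:
        return True
    return any(name in h for name in ("expires", "etag", "last-modified"))


def is_compressed(headers):
    """Assumed rule: compressed if Content-Encoding names a known codec."""
    h = _lowercase_keys(headers)
    encodings = {e.strip().lower() for e in h.get("content-encoding", "").split(",")}
    return bool(encodings & COMPRESSED_ENCODINGS)
```

The functions take a plain dictionary of headers, so they can be fed from any HTTP client or from a crawler's network log.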

THOU SHALT PUT STYLE SHEETS ON TOP

56% don’t
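The style-sheet rule can be approximated by looking for `<link rel="stylesheet">` elements that occur outside `<head>`. This is a minimal sketch with Python's standard `html.parser`; the class and function names are hypothetical, and inline `<style>` blocks are ignored for brevity.

```python
from html.parser import HTMLParser


class StylesheetPlacement(HTMLParser):
    """Records style sheets that are linked outside the <head> element."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.misplaced = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link":
            attributes = dict(attrs)
            rel = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel and not self.in_head:
                self.misplaced.append(attributes.get("href"))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def misplaced_stylesheets(html):
    """Return the href of every style sheet linked outside <head>."""
    parser = StylesheetPlacement()
    parser.feed(html)
    parser.close()
    return parser.misplaced
```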

THOU SHALT ONLY BLOCK JS WHEN NECESSARY

43% do not cache

80% are not compressed

56% reload CSS too often

ACCESSIBILITY

FEEL

LISTEN

<IMG src="lolcat.jpg" alt="Picture of a cat" />

<LABEL for="username"> Enter your username </LABEL>

36% do not label inputs
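The two accessibility rules shown here, alternative text on images and labels on inputs, can be sketched as a single pass over the markup. The code assumes Python's standard `html.parser`; the names are illustrative, and the label check only understands `<label for="...">`, so inputs wrapped in a label or described through `aria-label` would be reported as false positives.

```python
from html.parser import HTMLParser

# Input types that do not need a visible label (an assumption of this sketch).
UNLABELLED_OK = {"hidden", "submit", "button", "reset", "image"}


class AccessibilityAudit(HTMLParser):
    """Collects images without alt text and the ids of inputs and labels."""

    def __init__(self):
        super().__init__()
        self.images_without_alt = []
        self.input_ids = []
        self.labelled_ids = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            self.images_without_alt.append(attributes.get("src"))
        elif tag == "input":
            input_type = (attributes.get("type") or "text").lower()
            if input_type not in UNLABELLED_OK:
                self.input_ids.append(attributes.get("id"))
        elif tag == "label" and attributes.get("for"):
            self.labelled_ids.add(attributes["for"])


def audit(html):
    """Return (image sources lacking alt, input ids lacking a label)."""
    parser = AccessibilityAudit()
    parser.feed(html)
    parser.close()
    unlabelled = [i for i in parser.input_ids if i not in parser.labelled_ids]
    return parser.images_without_alt, unlabelled
```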

<div role="navigation">

<HEADER> <ARTICLE>

NAVIGATION

<NAV>

[Chart: NAVIGATION ASSISTANCE. Categories: No indicators, Just roles, Just semantic, Both. Values as extracted: 25%, 5%, 11%, 60%.]
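The four navigation-assistance categories can be assigned per page with a simple classifier. This sketch makes a narrowing assumption: it only looks at `role="navigation"` and the `<nav>` element, whereas the study may have counted other ARIA roles and HTML5 semantic elements as well. All names are hypothetical.

```python
from html.parser import HTMLParser


class NavigationIndicators(HTMLParser):
    """Notes whether a page uses role="navigation" and/or a <nav> element."""

    def __init__(self):
        super().__init__()
        self.has_role = False
        self.has_semantic = False

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.has_semantic = True
        if (dict(attrs).get("role") or "").lower() == "navigation":
            self.has_role = True


def navigation_category(html):
    """Map a page to one of the four categories named on the slide."""
    parser = NavigationIndicators()
    parser.feed(html)
    parser.close()
    if parser.has_role and parser.has_semantic:
        return "Both"
    if parser.has_role:
        return "Just roles"
    if parser.has_semantic:
        return "Just semantic"
    return "No indicators"
```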

THE WEB IS:

• HIGHLY DYNAMIC

• RIDDLED WITH ERRORS

• NOT AS FAST AS IT COULD BE

• NOT NEARLY ACCESSIBLE ENOUGH

What to do?

Modern Web development

All pages are rendered

New pages are rendered client side

STATIC ANALYSIS + CRAWLER = SUPER POWERS

Generic invariants

• Valid HTML, JavaScript, CSS

• Accessibility support

• Performance best practices

Semi-Generic invariants

• Do all images load?

• Am I using my framework correctly?

• Are all my pages translated?

• Are there any JS errors triggered?

App-specific invariants

• Is my logo on every page?

• Is the feedback button on every page?

• Does every page link to the homepage?
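An app-specific invariant such as "does every page link to the homepage?" amounts to a predicate evaluated on every crawled state. The sketch below assumes the crawler's output is available as a dictionary from state name to serialized DOM; that data shape, the `home` parameter, and the function names are assumptions of this example, not the Crawljax API.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every anchor element in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.add(href)


def states_missing_home_link(states, home="/"):
    """Return the names of states that contain no link to the homepage."""
    missing = []
    for name, html in states.items():
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        if home not in collector.hrefs:
            missing.append(name)
    return sorted(missing)
```

The logo and feedback-button invariants could follow the same pattern, with a different element check in place of the link lookup.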

CRAWLING BONUS!

• Code coverage

• Performance testing

• Random testing

CHALLENGES

• State duplication detection is hard

• Deployment seems hard
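One way to see why state duplicate detection is hard is to try a naive approach: fingerprint only the tag structure of a DOM and treat states with equal fingerprints as duplicates. This is a simplified sketch for illustration, not the comparison Crawljax performs; the names are made up.

```python
import hashlib
from html.parser import HTMLParser


class StructureSkeleton(HTMLParser):
    """Keeps only the sequence of opening and closing tags, dropping text."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.tokens.append("</" + tag + ">")


def structural_fingerprint(html):
    """Hash the tag skeleton so that text-only changes do not alter it."""
    skeleton = StructureSkeleton()
    skeleton.feed(html)
    skeleton.close()
    return hashlib.sha256("".join(skeleton.tokens).encode("utf-8")).hexdigest()
```

Under this scheme two states that differ only in their text, such as a paragraph reading "NO" versus "42", collapse into one, while a rotating advertisement that changes a single wrapper element produces a new state. Choosing what to ignore is application dependent, which is part of what makes the problem hard.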

Testing by crawling works and should be explored further.

Automated Error detection

Questions?

Find me on Twitter: @alexnederlof

http://crawljax.com
