automatic javascript parallelism for resource-efficient

25
Horcrux: Automatic JavaScript Parallelism for Resource-Efficient Web Computations Shaghayegh Mardani 1 , Ayush Goel 2 , Ronny Ko 3 , Harsha Madhyastha 2 , Ravi Netravali 4 1 UCLA, 2 University of Michigan, 3 Harvard University, 4 Princeton University 1

Upload: others

Post on 13-Nov-2021

36 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Automatic JavaScript Parallelism for Resource-Efficient

Horcrux:Automatic JavaScript Parallelism for

Resource-Efficient Web Computations

Shaghayegh Mardani1, Ayush Goel2, Ronny Ko3, Harsha Madhyastha2, Ravi Netravali41UCLA, 2University of Michigan, 3Harvard University, 4Princeton University

1

Page 2: Automatic JavaScript Parallelism for Resource-Efficient

Modern Web Browsing

2

Performance MattersWeb Traffic is Dominated by Mobile

Users

53% of visits are abandoned if a

mobile site takes longer than 3

seconds to load.Source: thinkwithgoogle.com

Content Providers

Source: bluecorona.com

If your site makes $100,000/day, 1 sec

improvement in page load increases revenue

by $7,000 daily.

Page 3: Automatic JavaScript Parallelism for Resource-Efficient

The Problem: Computation Delays

• Evaluation setup

• With NO network delays:

3

Developed Region Emerging Region

• 582 pages in US• Google Pixel 3

• 2.0 GHz octa-core

• 91 pages in Pakistan• Redmi 6A

• 2.0 GHz quad-core

Performance metrics:• Page Load Time (PLT)• Speed Index (SI)

% of pages with load times > 3

Page 4: Automatic JavaScript Parallelism for Resource-Efficient

Why Are Computation Delays Significant?

4

• 80% more JavaScript over the last 5 years

Mobile DevicesJavaScript

primary compute improvements

=more cores

52%

Page 5: Automatic JavaScript Parallelism for Resource-Efficient

More cores != Better performance

5

Reason: Single-thread execution Solution: Parallelizing JavaScript

Page 6: Automatic JavaScript Parallelism for Resource-Efficient

Parallelization Opportunities

• Web workers are widely supported by browsers• Constrained APIs:• No access to DOM APIs• No access to the main global state

• Legacy pages are highly amenable to safe parallelization

6

Pass-by-value

DOM APIs

Main Thread

JS Heap

JavaScript Engine

# of cores Speedup in JS Runtimes

2 cores 49%

4 cores 75%

8 cores 87%

Web Worker

Page 7: Automatic JavaScript Parallelism for Resource-Efficient

Challenges

7

Conservative Signatures

C1: Ensuring CorrectnessThe exact state accessed by parallelized JavaScript- Despite non-determinism and across all possible control paths

Page 8: Automatic JavaScript Parallelism for Resource-Efficient

Signature Generation

8

Dynamic Instrumentation

Instrumented

f(x)

Conservative Signatures

Original Page

Headless Browser

Concolic Engine

concrete input

JS codeto execute

f(x)

g(x)

h(x)

f´(x)

g´(x)

h´(x)

g(x) h(x)

if (x == 5) {y = x + 5;

} else {z = 9;

}

{"reads":[x],"writes":[y, z]

}Server-side and Offline

Conservative signaturesDynamic analysis: track read/writes to page state

Concolic execution: explore all possible control paths

Page 9: Automatic JavaScript Parallelism for Resource-Efficient

Challenges

9

Client-sideParallelization

Scheduler

Conservative Signatures

C1: Ensuring CorrectnessThe exact state accessed by parallelized JavaScript- Despite non-determinism and across all possible control paths

C2: Web Workers Constrained APIs- How to parallelize execution with constrained APIs?

Page 10: Automatic JavaScript Parallelism for Resource-Efficient

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

10Main Thread

Scheduler

Page State

JS Heap

DOMQueue

f(x) +

g(x) +

h(x) +

Web Worker

Page 11: Automatic JavaScript Parallelism for Resource-Efficient

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

11Main Thread

Scheduler

Page State

JS Heap

DOMQueue

g(x) +

h(x) +

Web Worker

no state dependenciesx = "foo"

y = 5

f(x) reads & writes to x, yf(x) +

Page 12: Automatic JavaScript Parallelism for Resource-Efficient

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

12Main Thread

Scheduler

Page State

JS Heap

DOMQueue

g(x) +

h(x) +

Web Worker

f(x) +

x = "foo"y = 5

Page 13: Automatic JavaScript Parallelism for Resource-Efficient

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

13Main Thread

Scheduler

Page State

JS Heap

DOMQueue

g(x) +

h(x) +

Web Worker

Initializestate

x = "foo"y = 5

Executef(x)

Page 14: Automatic JavaScript Parallelism for Resource-Efficient

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

14Main Thread

Scheduler

Page State

JS Heap

DOMQueue

h(x) +

Web Worker

Initializestate

x = "foo"y = 5

Executef(x)

g(x) reads y

f(x) reads & writes to x, y

state dependency

g(x) +

Page 15: Automatic JavaScript Parallelism for Resource-Efficient

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

15Main Thread

Scheduler

Page State

JS Heap

DOMQueue

h(x) +

Web Worker

Initializestate

x = "foo"y = 5

Executef(x)

g(x)+

Page 16: Automatic JavaScript Parallelism for Resource-Efficient

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

16Main Thread

Scheduler

Page State

JS Heap

DOMQueue

h(x) +

Web Worker

Initializestate

x = "foo"y = 5

Executef(x)

h(x) reads z

g(x)+

no state dependenciesz = "bar"

Page 17: Automatic JavaScript Parallelism for Resource-Efficient

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

17Main Thread

Scheduler

Page State

JS Heap

DOMQueue

Web Worker

Initializestate

x = "foo"y = 5

Executef(x)

g(x)+

h(x) +

z = "bar"

Page 18: Automatic JavaScript Parallelism for Resource-Efficient

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

18Main Thread

Scheduler

Page State

JS Heap

DOMQueue

Web Worker

Initializestate

x = "foo"y = 5

Executef(x)

g(x)+

Initializestate

Executeh(x)

z = "bar"

Page 19: Automatic JavaScript Parallelism for Resource-Efficient

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

19Main Thread

Scheduler

Page State

JS Heap

DOMQueue

Web Worker

Initializestate

x = "foo"y = 5

Executef(x)

g(x)+

Initializestate

Executeh(x)

z = "bar"

Pauses the execution

getEle

mentBy

Id()

Page 20: Automatic JavaScript Parallelism for Resource-Efficient

Challenges

20

Client-sideParallelization

Scheduler

Conservative Signatures

C1: Ensuring CorrectnessThe exact state accessed by parallelized JavaScript- Despite non-determinism and across all possible control paths

C2: Web Workers Constrained APIs- How to parallelize execution with constrained APIs?

Root-function Granularity

C3: Offloading overheads- Pass-by-value I/O can take ~0-10 ms

Page 21: Automatic JavaScript Parallelism for Resource-Efficient

Offloading Granularity

• Trade-off: parallelization benefits vs. offloading overheads• Solution: offloading root-function invocations

21

A()

B()

C()

D()

<script>function A() {

…B(); // invokes C()D();…

}A();

</script>

Original Page Root-function

4x fewer offloads and 73% of parallelization

benefits

Page 22: Automatic JavaScript Parallelism for Resource-Efficient

Evaluation Questions

• Impact on browser computation delays?• Impact on end-to-end performance?• Horcrux comparison to prior compute optimizations?• What do conservative signatures forgo?• How much are the server-side overheads?

22

Page 23: Automatic JavaScript Parallelism for Resource-Efficient

Computation Delay Reductions

• Total Computation Time (TCT)

23

0

20

40

60

80

Developed Emerging

% Im

prov

emen

t in

TC

T

Network: WiFi LTEMedian Improvements:

Developed: WiFi (41%), LTE (34%)

Emerging: WiFi (44%), LTE (31%)

Page 24: Automatic JavaScript Parallelism for Resource-Efficient

End-to-end Performance Improvements

• Page Load Time (PLT) and Speed Index (SI)

24

0.00

0.25

0.50

0.75

1.00

0 25 50 75% Improvement

CD

F

LTE, PLTLTE, SIWiFi, PLTWiFi, SI

100

27%

Developed Region

0.00

0.25

0.50

0.75

1.00

0 25 50 75% Improvement

CD

F

LTE, PLTLTE, SIWiFi, PLTWiFi, SI

100

29%

Emerging Region

Page 25: Automatic JavaScript Parallelism for Resource-Efficient

Conclusion

• More cores != Better performance• Horcrux automatically parallelizes JavaScript execution using concolic

execution to take advantage of phones multi-core CPUs

25

[email protected]

github.com/ShaghayeghMrdn/horcrux-osdi21