proposed work 1. client-server synchronization proposed work 2

Proposed Work

1

Client-Server Synchronization

Proposed Work

2

Implement Synchronization Framework

• Implement multi-streaming basics

• Separate streams

• Events tagged by virtual clock

• Begin optimization of specific streams

• Partial-knowledge prediction

• Generalize as we go

• Software framework?
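The "separate streams" and "events tagged by virtual clock" bullets can be illustrated with a small sketch. Everything named here (`Event`, `LaterFirst`, `EventMerger`, `drainUntil`) is hypothetical and not taken from the project. It only assumes that each stream stamps its events with a virtual clock tick and that a receiver applies events in tick order.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event record; field names are illustrative only.
struct Event {
    std::uint32_t stream;       // logical stream that produced the event
    std::uint64_t virtualTime;  // virtual clock tick at which the event applies
    std::string payload;        // opaque event data
};

// Comparator for std::priority_queue: the event with the smallest virtual
// time ends up on top; ties are broken by stream id so the order is
// deterministic on every machine.
struct LaterFirst {
    bool operator()(const Event& a, const Event& b) const {
        if (a.virtualTime != b.virtualTime) return a.virtualTime > b.virtualTime;
        return a.stream > b.stream;
    }
};

// Merges events arriving on separate streams into one virtual-time order.
class EventMerger {
public:
    void push(Event e) { queue_.push(std::move(e)); }

    // Remove and return every event whose virtual time is <= now.
    std::vector<Event> drainUntil(std::uint64_t now) {
        std::vector<Event> due;
        while (!queue_.empty() && queue_.top().virtualTime <= now) {
            due.push_back(queue_.top());
            queue_.pop();
        }
        return due;
    }

    std::size_t pending() const { return queue_.size(); }

private:
    std::priority_queue<Event, std::vector<Event>, LaterFirst> queue_;
};
```

Because events carry their own tick, a stream may deliver them early (for example, pre-computed selections) and the receiver simply holds them until the virtual clock reaches that tick.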

3

Implement Building Selection Synchronization

• First target: decouple from dependencies

• Most important: pre-compute far ahead

• All clients informed of future selections

• Selection may depend upon past object position

• Secondary: adjust selection criteria

• More deterministic: compensation less likely

• Proximity a factor, but eliminate hard-edge cases

4

Object Transform Stream

• Common Verlet integrator functor tested

• Dynamically test floating-point drift

• Compensation could ramp up for some systems

• Test most efficient way to deal with collisions

• Transmit collision info (including narrow misses)

• Pre-transmit transform updates at critical points
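As a rough illustration of a shared Verlet integrator functor and a drift check, the sketch below uses the standard position-Verlet update, x_next = 2x - x_prev + a·dt². The names `VerletStep`, `simulate`, and `precisionDrift` are hypothetical. Comparing a `float` run against a `double` run is only a stand-in for comparing two machines, which is what a real client/server drift test would do.

```cpp
#include <cmath>

// Position-Verlet step as a functor, so that every participant could
// compile and run exactly the same update code.
template <typename Real>
struct VerletStep {
    Real dt;  // fixed time step

    // x_next = 2*x - x_prev + a*dt^2
    Real operator()(Real x, Real xPrev, Real accel) const {
        return Real(2) * x - xPrev + accel * dt * dt;
    }
};

// Advance a 1-D particle `steps` times under constant acceleration,
// starting with x_prev == x0 (that is, starting from rest).
template <typename Real>
Real simulate(Real x0, Real accel, Real dt, int steps) {
    VerletStep<Real> step{dt};
    Real xPrev = x0;
    Real x = x0;
    for (int i = 0; i < steps; ++i) {
        Real xNext = step(x, xPrev, accel);
        xPrev = x;
        x = xNext;
    }
    return x;
}

// Absolute difference between a double-precision run and a
// single-precision run of the same trajectory.
inline double precisionDrift(double x0, double accel, double dt, int steps) {
    const double reference = simulate<double>(x0, accel, dt, steps);
    const float reduced = simulate<float>(static_cast<float>(x0),
                                          static_cast<float>(accel),
                                          static_cast<float>(dt), steps);
    return std::fabs(reference - static_cast<double>(reduced));
}
```

If a measured drift exceeded some tolerance, that would be the point at which compensation (a corrective transform update) is sent; the tolerance itself is a tuning choice not specified here.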

5

Significance

• Bandwidth is a scalability issue

• Scalable City’s complex environment

• Bandwidth is a cost issue

• Mobile systems

• Latency tolerance is a cost issue

• Normally addressed with more local servers

• Assumes dividing up customers is an option

6

Physics System

Proposed Work

7

Host Broad Phase Path

• Broad phase and CLEngine are ready!

• Implement active-subset maintenance code

• Leverages existing subsystems for performance

• Analyze performance characteristics

• How many players before the broad phase becomes a liability?

• Expect very good results, but results must be evaluated holistically with communication in the picture

8

OpenCL Broad Phase Path

• CL Hash Grid with additional query stage

• Traditional “gather results” challenge now applies

• Comm. plumbing for state maintenance

• Host code to manage very large objects

• Optional: Evaluate a state-aware CLEngine

• Adds a level of indirection, but reduces work
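To make "hash grid with additional query stage" concrete, here is a minimal host-side sketch in C++ rather than OpenCL. `Point`, `HashGrid`, `insert`, and `query` are hypothetical names, the grid is 2-D for brevity, and the 3×3 neighbourhood lookup assumes objects no larger than one cell (very large objects would need the separate host handling mentioned above).

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Point { float x, y; };

// Uniform hash grid: each object id is filed under the cell containing it.
class HashGrid {
public:
    explicit HashGrid(float cellSize) : cell_(cellSize) {}

    void insert(int id, Point p) {
        cells_[key(cellOf(p.x), cellOf(p.y))].push_back(id);
    }

    // Query stage: gather ids from the 3x3 block of cells around p.
    // The result is a candidate set for the narrow phase, sorted by id.
    std::vector<int> query(Point p) const {
        std::vector<int> out;
        const std::int32_t cx = cellOf(p.x);
        const std::int32_t cy = cellOf(p.y);
        for (std::int32_t dx = -1; dx <= 1; ++dx) {
            for (std::int32_t dy = -1; dy <= 1; ++dy) {
                auto it = cells_.find(key(cx + dx, cy + dy));
                if (it != cells_.end())
                    out.insert(out.end(), it->second.begin(), it->second.end());
            }
        }
        std::sort(out.begin(), out.end());
        return out;
    }

private:
    // floor() keeps negative coordinates in the correct cell.
    std::int32_t cellOf(float v) const {
        return static_cast<std::int32_t>(std::floor(v / cell_));
    }

    // Pack the two signed cell indices into one 64-bit map key.
    static std::uint64_t key(std::int32_t cx, std::int32_t cy) {
        return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(cx)) << 32) |
               static_cast<std::uint64_t>(static_cast<std::uint32_t>(cy));
    }

    float cell_;
    std::unordered_map<std::uint64_t, std::vector<int>> cells_;
};
```

On a GPU the variable-length result of `query` is the hard part, which is the "gather results" challenge noted above; the host version sidesteps it with a growable vector.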

9

Hybrid Broad Phase

• Each system may exhibit valuable trade-offs

• Host BP: Lower power most of the time?

• CL BP: Higher maximum performance?

• Dynamically switching systems possible

• Best of all worlds: based upon cross-over point

• Only if performance data indicates benefit
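One way dynamic switching around a cross-over point might look is sketched below. `BroadPhase`, `BroadPhaseSelector`, and `update` are hypothetical names, the cross-over value would have to come from measured performance data, and the hysteresis margin is an added assumption (not from the slides) to avoid flip-flopping when the load sits near the cross-over.

```cpp
#include <cstddef>

enum class BroadPhase { Host, OpenCL };

// Chooses a broad-phase path from the current number of active objects.
class BroadPhaseSelector {
public:
    // crossover: load at which the two paths are assumed to perform equally.
    // margin:    hysteresis band around the cross-over (an assumption here).
    BroadPhaseSelector(std::size_t crossover, std::size_t margin)
        : crossover_(crossover), margin_(margin) {}

    BroadPhase update(std::size_t activeObjects) {
        if (current_ == BroadPhase::Host &&
            activeObjects > crossover_ + margin_) {
            current_ = BroadPhase::OpenCL;  // load clearly above cross-over
        } else if (current_ == BroadPhase::OpenCL &&
                   activeObjects + margin_ < crossover_) {
            current_ = BroadPhase::Host;    // load clearly below cross-over
        }
        return current_;
    }

private:
    std::size_t crossover_;
    std::size_t margin_;
    BroadPhase current_ = BroadPhase::Host;  // start on the host path
};
```

With a cross-over of 1000 and a margin of 100, the selector stays on the host path until the load exceeds 1100 and returns to it only once the load drops below 900.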

10

Optimizations

• Many optimizations not yet done

• Communication (Transport Kernel, more)

• Computation (rewrite stick constraint solver, etc.)

• OpenCL (local memory, vector types)

• Expect much more speed-up

• Will address once complete system is up

• Attack highest overheads first

11

Distributed Layer

• Perform object migration, ghosting via MPI

• Or extend our custom boost::asio/serialization system

• Allows scaling beyond a single memory space

• Apply to clusters, server farms

• No longer limited to bandwidth of a few channels

• Test very large scale systems

12

Significance

• Highest performance physics system

• Most power-efficient system

• Usable for large VR environments, games

• Low-power mobile device support w/ OpenCL

• Likely to be available soon

13

Tracking Heap Growth

Proposed Work

14

Diagnosing Unbounded Heap Growth in C++

Proposed Work

• Implement multi-threaded sampling

• Stack tracing for insertion operations

– Identify points of growth for aggregates

• Reduce overhead of temp aggregates

– Detect stack-based aggregates

– Minimum lifetime before tracking begins

– Allow stack-tracing with low perform. impact
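The "minimum lifetime before tracking begins" idea can be sketched as follows. This is not the tool's actual algorithm: `GrowthMonitor`, its methods, and the "consecutive growing samples" heuristic are assumptions made purely for illustration. Aggregates are identified by a hypothetical integer id and sampled periodically.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Samples aggregate sizes over time and flags those that keep growing.
// Aggregates younger than minLifetime ticks are recorded but not judged,
// so short-lived temporaries cost almost nothing.
class GrowthMonitor {
public:
    GrowthMonitor(std::uint64_t minLifetime, std::size_t growthThreshold)
        : minLifetime_(minLifetime), growthThreshold_(growthThreshold) {}

    // Record the size of aggregate `id` at sample time `tick`.
    void sample(int id, std::size_t size, std::uint64_t tick) {
        auto it = entries_.find(id);
        if (it == entries_.end()) {
            entries_[id] = Entry{tick, size, 0};  // first sighting
            return;
        }
        Entry& e = it->second;
        if (tick - e.birth >= minLifetime_) {
            if (size > e.lastSize) ++e.growthRun;         // still growing
            else if (size < e.lastSize) e.growthRun = 0;  // shrank: reset
        }
        e.lastSize = size;
    }

    // Call when the aggregate is destroyed.
    void forget(int id) { entries_.erase(id); }

    // Ids whose size grew in at least growthThreshold samples since the
    // last shrink, sorted ascending.
    std::vector<int> suspects() const {
        std::vector<int> out;
        for (const auto& kv : entries_)
            if (kv.second.growthRun >= growthThreshold_) out.push_back(kv.first);
        std::sort(out.begin(), out.end());
        return out;
    }

private:
    struct Entry {
        std::uint64_t birth;    // tick of first sample
        std::size_t lastSize;   // size at the previous sample
        std::size_t growthRun;  // growing samples since the last shrink
    };

    std::uint64_t minLifetime_;
    std::size_t growthThreshold_;
    std::unordered_map<int, Entry> entries_;
};
```

A temporary that is created, filled, and destroyed within the minimum lifetime never reaches the growth check, which is the overhead reduction the bullet points describe.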

15

Diagnosing Unbounded Heap Growth in C++

Proposed Work

• Child aggregates problem (e.g., Linear Hashing buckets)

– Children arrive, grow, and are then replaced

• Google Chrome / Chromium results

– Continuously-growing false positives

– Parent is tumor, children being detected

16

[Figure: chart of "Tumors"; axis ticks 2 to 7 and 0 to 700]

Diagnosing Unbounded Heap Growth in C++

Proposed Work

• All children have precisely the same type tag

– Can use a “unique” filter post-processing

– Not ideal: can cause you to miss tumors!

• Problem: no concept of aggregate relationships
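The "unique" post-processing filter and its drawback can be shown with a short sketch. `Report`, its fields, and `uniqueByTypeTag` are hypothetical, and the real tool's report format is not shown in these slides. The filter keeps only the first report for each type tag.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical growth report: the aggregate's type tag plus where it was seen.
struct Report {
    std::string typeTag;  // e.g. a type name for the aggregate
    std::string site;     // e.g. the owning structure or call site
};

// Keep the first report per type tag and drop the rest.
std::vector<Report> uniqueByTypeTag(const std::vector<Report>& reports) {
    std::set<std::string> seen;
    std::vector<Report> out;
    for (const Report& r : reports) {
        // insert().second is true only the first time a tag is seen
        if (seen.insert(r.typeTag).second) out.push_back(r);
    }
    return out;
}
```

If two unrelated structures both hold aggregates with the tag `Bucket`, only one of them survives the filter, which is exactly how a real tumor can be missed.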

17

[Figure: chart of "Tumors", "False Positives", and "False Negatives"; axis ticks 2 to 7 and 0 to 35]

Diagnosing Unbounded Heap Growth in C++

Proposed Work

• Identifying aggregate relationships

– Detect size holistically

– Performance challenges

• Increased automation

– Detect custom data structures

    • Automatically create wrappers for them

    • Presently user must identify & create wrapper

– Perform code transformations automatically

    • Solutions not completely worked out

– Both may require parser (e.g. clang)

18

Significance

• Long-running applications need dynamic analysis

• Our experience indicates that most software suffers from these problems

• Allowed Scalable City to run for months

• Mobile applications: less memory, no VM

• Lowers cost of bug discovery significantly

• Stable software is good for everyone!

19
