Using The New Flash Stage3D Web Technology To Build Your Own Next 3D Browser MMOG
Daosheng Mu, Lead Programmer
Eric Chang, CTO
XPEC Entertainment Inc.
Outline
• Brief of Speakers
• Introduction of Adobe Flash Stage3D API
• XPEC Flash 3D Engine
• Optimization for Flash Program
• Future Works
• Conclusion
• Q & A
Brief of Speakers
• Eric Chang
  – 19 years of game industry experience
  – Cross-platform 3D game engine development
  – PC/Console/Web
Brief of Speakers
• Daosheng Mu
  – 4.5 years of cross-platform 3D game engine development experience
  – PC/Console/Web
Why Flash? Native C/C++ vs. Unity vs. Flash
                         Native C/C++   Unity   Flash
Development Difficulty   High           Low     Mid
Ease of Cross Platform   Low            High    High
Performance              High           Mid     Low
Market Popularity        Low            Mid     High (>95%)
Project C4 Demo Video
Introduction of Adobe Flash Stage3D API
Stage3D
• Support all browsers
Stage3D
• Stage3D includes GPU-accelerated 3D APIs
  – Z-buffering
  – Stencil/Color buffer
  – Vertex shaders
  – Fragment shaders
  – Cube textures
  – More…
Stage3D
• Pros:
  – GPU-accelerated API
  – Relies on DirectX, OpenGL, OpenGL ES
  – Programmable pipeline
• Cons:
  – No support for alpha test
  – No support for high-precision texture formats
Stage3D
Resource           Number allowed   Total memory
Vertex buffers     4096             256 MB
Index buffers      4096             128 MB
Programs           4096             16 MB
Textures           4096             128 MB*
Cube textures      4096             256 MB
Draw call limits   32,768
*350 MB is the absolute limit for textures; 340 MB is the limit we measured
AGAL
• Adobe Graphics Assembly Language
  – No support for ‘if-else’ statements
  – No support for ‘constants’
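Since AGAL has no branching, a conditional is typically rewritten as arithmetic on a 0/1 mask. AGAL's `sge` and `slt` opcodes produce such masks. Literal values cannot be written inline in the shader, so they are uploaded from the host side, for example with `Context3D.setProgramConstantsFromVector`. The sketch below shows only the masking idea, in TypeScript as a stand-in for ActionScript 3. The helpers `sge` and `select` are illustrative names, not part of any Stage3D API.

```typescript
// Mimics AGAL's `sge` opcode: 1 if a >= b, otherwise 0.
function sge(a: number, b: number): number {
  return a >= b ? 1 : 0;
}

// Branch-free equivalent of `cond >= threshold ? whenTrue : whenFalse`:
// both candidates are computed and the mask picks one of them.
function select(cond: number, threshold: number, whenTrue: number, whenFalse: number): number {
  const mask: number = sge(cond, threshold);
  return mask * whenTrue + (1 - mask) * whenFalse;
}
```

In a shader the same pattern costs a compare, two multiplies and an add per selected value, which is why both sides of the "branch" should be cheap.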
Program3D
XPEC Flash 3D Engine
Model Pipeline
• Action Message Format (AMF):
  – Native ByteArray compression
  – Native object serialization
[Figure: model pipeline. Labels: 3DS Max, Exporter, Collada, Converter, Binary AMF, Loader, Engine, Render]
XPEC Flash 3D Engine
• Application: update/render on CPU
• Command buffer: stores graphics API instructions
[Figure labels: Application, Driver, CPU]
XPEC Flash 3D Engine: Application
• Object3D
  – Material
  – Geometry
• Update
  – UpdateDeltaTime
  – UpdateTransform
• Scene management
  – Scene partition
  – Frustum culling
• Update
  – UpdateHierarchy
• Draw
  – SetMaterial
  – SetGeometry
• Stage3D
  – Set Stage3D APIs
Scene Management
• Goal: minimize draw calls as much as possible
• Indoor Scene
  – BSP tree
• Outdoor Scene
  – Octree/Quad tree
  – Cell
  – Grid
Scene Management: Project C4
• Grid partition
• Object3D: (MinX, MaxX), (MinY, MaxY)
[Figure: grid on x/y axes with corners (0, 0), (2, 2), (4, 4); two example objects with cell ranges (0,0),(1,2) and (3,4),(0,2)]
Scene Management: Project C4
• Frustum: (MinX, MaxX), (MinY, MaxY)
[Figure: same grid on x/y axes with corners (0, 0), (2, 2), (4, 4); frustum cell range (1,4),(0,4); objects with cell ranges (0,0),(1,2) and (3,4),(0,2)]
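With objects and the frustum both reduced to inclusive grid-cell ranges, culling becomes a range-overlap test. The sketch below is in TypeScript as a stand-in for ActionScript 3 and uses the example ranges from the slides. The names `CellRange`, `overlaps` and `visibleObjects` are hypothetical, not taken from the XPEC engine.

```typescript
// Inclusive range of grid cells covered along each axis: (MinX, MaxX), (MinY, MaxY).
interface CellRange { minX: number; maxX: number; minY: number; maxY: number; }

// Two ranges overlap only if they overlap on both axes.
function overlaps(a: CellRange, b: CellRange): boolean {
  return a.minX <= b.maxX && a.maxX >= b.minX &&
         a.minY <= b.maxY && a.maxY >= b.minY;
}

// Keeps only the objects whose cells intersect the frustum's cells.
function visibleObjects(objects: CellRange[], frustumRange: CellRange): CellRange[] {
  return objects.filter((o: CellRange): boolean => overlaps(o, frustumRange));
}

// Ranges from the slides: objects (0,0),(1,2) and (3,4),(0,2); frustum (1,4),(0,4).
const objA: CellRange = { minX: 0, maxX: 0, minY: 1, maxY: 2 };
const objB: CellRange = { minX: 3, maxX: 4, minY: 0, maxY: 2 };
const frustumRange: CellRange = { minX: 1, maxX: 4, minY: 0, maxY: 4 };
const visibleSet: CellRange[] = visibleObjects([objA, objB], frustumRange);
```

Here the first object is rejected because its MaxX (0) is below the frustum's MinX (1), so only the second object would be submitted for drawing.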
XPEC Flash 3D Engine: Command Buffer
Initialize
• createVertex/Index Buffer
• createTexture
• createProgram
Begin
• clear
• setRenderToTexture
Draw
• setVertex/Index Buffer
• setProgram
• setProgramConstants
• setRenderState
• setTextureAt
• drawTriangles

• Avoid user/kernel mode transition
• Decrease shader patching
  – “Material sorting”
• Reduce draw calls
  – “Shared buffers”
  – “Dynamic batching”
Material Sorting
• Opaque/Translucent
Material Sorting
• State management
• 1047/2598 draw calls
[Charts: elapsed time (ms), split into render loop and CPU waiting for GPU]
NVIDIA 8800 GT                Before sorting (ms)   After sorting (ms)
1047 draw calls
  Render loop elapsed time    16                    16
  Total elapsed time          41                    40
2598 draw calls
  Render loop elapsed time    36                    36
  Total elapsed time          50                    50

NVIDIA 6600 GT                Before sorting (ms)   After sorting (ms)
1047 draw calls
  Render loop elapsed time    34                    31
  Total elapsed time          53                    48
2598 draw calls
  Render loop elapsed time    81                    64
  Total elapsed time          89                    89
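One way to picture material sorting: draw opaque items first, grouped so that consecutive draws share the same program and texture, then draw translucent items last. The sketch below is in TypeScript as a stand-in for ActionScript 3. `DrawItem` and its fields are hypothetical, and the back-to-front ordering of translucent items is a common convention assumed here rather than something the slides state.

```typescript
interface DrawItem {
  name: string;
  translucent: boolean;
  programId: number;
  textureId: number;
  depth: number; // distance from the camera
}

// Opaque items first, grouped by program then texture; translucent items last, far to near.
function sortForRendering(items: DrawItem[]): DrawItem[] {
  return items.slice().sort((a: DrawItem, b: DrawItem): number => {
    if (a.translucent !== b.translucent) return a.translucent ? 1 : -1;
    if (a.translucent) return b.depth - a.depth;
    if (a.programId !== b.programId) return a.programId - b.programId;
    return a.textureId - b.textureId;
  });
}

// Counts how often the program must be switched when drawing in the given order.
function countProgramSwitches(items: DrawItem[]): number {
  let switches: number = 0;
  let current: number = -1;
  for (const item of items) {
    if (item.programId !== current) {
      switches++;
      current = item.programId;
    }
  }
  return switches;
}
```

Comparing `countProgramSwitches` before and after sorting gives a rough measure of how many state changes the sort removes.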
Shared Buffers
• Problem:
  – The number of buffers is limited
Resource           Number allowed   Total memory
Vertex buffers     4096             256 MB
Index buffers      4096             128 MB
Programs           4096             16 MB
Shared Buffers
[Figure: several vertex buffer / index buffer pairs]
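A minimal sketch of the shared-buffer idea, assuming several small meshes are packed into one vertex buffer and one index buffer so that fewer buffer objects are created. It is written in TypeScript as a stand-in for ActionScript 3, and `Mesh`, `Slice` and `packMeshes` are hypothetical names. Each mesh keeps a `firstIndex` and `numTriangles`, matching the parameters of `Context3D.drawTriangles(indexBuffer, firstIndex, numTriangles)`.

```typescript
interface Mesh { vertexCount: number; indices: number[]; }
interface Slice { firstIndex: number; numTriangles: number; }
interface PackedMeshes { indices: number[]; vertexCount: number; slices: Slice[]; }

// Packs meshes into one shared index list. Indices are rebased by the number of
// vertices already placed in the shared vertex buffer.
function packMeshes(meshes: Mesh[]): PackedMeshes {
  const indices: number[] = [];
  const slices: Slice[] = [];
  let vertexOffset: number = 0;
  for (const mesh of meshes) {
    slices.push({ firstIndex: indices.length, numTriangles: mesh.indices.length / 3 });
    for (const i of mesh.indices) {
      indices.push(i + vertexOffset);
    }
    vertexOffset += mesh.vertexCount;
  }
  return { indices: indices, vertexCount: vertexOffset, slices: slices };
}
```

A real allocator would also have to respect the per-buffer size limits and start a new shared buffer when one fills up; that part is left out here.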
Particle System
• Each particle property is computed on the CPU every frame
  – Alpha, Color, LinearForce, Size, Speed, UV
  – Facing
Particle System
• Index buffer
  – Indices do not change
• Vertex buffer
  – Problem:
    • Particle count varies from frame to frame
    • Data is uploaded to the vertex buffer frequently
Particle System
[Figure labels: Static Index Buffer, Dynamic Vertex Buffer, Vertex Data]
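The index buffer can stay static because the index pattern of a quad list depends only on the particle capacity, not on the per-frame particle data. The sketch below is in TypeScript as a stand-in for ActionScript 3, with hypothetical helper names, and assumes each particle is a quad of four vertices drawn as two triangles.

```typescript
// Builds the index list for `maxParticles` quads (two triangles each).
// It can be uploaded once and reused every frame.
function buildQuadIndices(maxParticles: number): number[] {
  const indices: number[] = [];
  for (let p = 0; p < maxParticles; p++) {
    const base: number = p * 4; // first vertex of this particle's quad
    indices.push(base, base + 1, base + 2, base, base + 2, base + 3);
  }
  return indices;
}

// Per frame, only the live particles are written to the dynamic vertex buffer,
// and only this many triangles are drawn from the static index buffer.
function trianglesToDraw(liveParticles: number): number {
  return liveParticles * 2;
}
```

With this layout, a frame with fewer live particles just draws a shorter prefix of the same index buffer.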
Skinned Model
• Problem:
  – Few vertex constants allowed
    • 128 constants per vertex program
  – Global vertex constants
    • Lighting, Fog, Const
Skinned Model
• 4x3 Matrix
• Bone count per geometry is limited to 29
  – “Split mesh”
128 constants / 3 = 42.6666 bones
3 * 29 bones = 87 constants
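A greedy version of the “split mesh” step might look like the sketch below: triangles are appended to the current group until adding one more would exceed the bone limit, at which point a new group (and therefore a new draw call) is started. This is written in TypeScript as a stand-in for ActionScript 3 under stated assumptions. `Triangle`, `unionBones` and `splitByBoneLimit` are hypothetical names, and the engine's actual splitting strategy is not described in the slides.

```typescript
// Each triangle lists the bone indices referenced by its three vertices.
interface Triangle { bones: number[]; }

const REGISTERS_PER_BONE: number = 3;                     // one 4x3 matrix
const BONE_CONSTANTS: number = 29 * REGISTERS_PER_BONE;   // constants used by 29 bones

// Returns the union of two bone lists without duplicates.
function unionBones(a: number[], b: number[]): number[] {
  const out: number[] = a.slice();
  for (const x of b) {
    if (out.indexOf(x) < 0) out.push(x);
  }
  return out;
}

// Greedily groups triangles so that no group references more than maxBones bones.
function splitByBoneLimit(triangles: Triangle[], maxBones: number): Triangle[][] {
  const groups: Triangle[][] = [];
  let current: Triangle[] = [];
  let used: number[] = [];
  for (const tri of triangles) {
    let merged: number[] = unionBones(used, tri.bones);
    if (merged.length > maxBones && current.length > 0) {
      groups.push(current); // close the full group and start a new one
      current = [];
      merged = unionBones([], tri.bones);
    }
    current.push(tri);
    used = merged;
  }
  if (current.length > 0) groups.push(current);
  return groups;
}
```

The greedy pass is order-dependent, so sorting triangles by bone locality first would usually produce fewer groups.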
Shadow Map
Shadow Map
[Flow: setRenderToTexture() → Clear shadow map → Draw to shadow map → setRenderToBackBuffer() → clear() (Clear back buffer) → Set shadow map → present() (End frame)]
Shadow Map
• Problem:
  – Texture format: RGBA8
  – Artifact
    • Aliasing
    • Popping while moving
• Size: 1024x1024
• RGBA8 R32
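Reading the “RGBA8 R32” bullet as storing a 32-bit depth value across the four 8-bit channels of an RGBA8 texture (an interpretation, since the slide does not spell it out), the packing can be sketched as fixed-point splitting. The TypeScript below is a stand-in for ActionScript 3 with hypothetical helper names. A shader version would do the same with multiplies and AGAL's `frc` opcode, and the exact scaling there differs because texture channels are normalized.

```typescript
// Splits a depth value in [0, 1) into four 8-bit channels, most significant first.
function packDepth(depth: number): [number, number, number, number] {
  const s1: number = depth * 256;
  const r: number = Math.floor(s1);
  const s2: number = (s1 - r) * 256;
  const g: number = Math.floor(s2);
  const s3: number = (s2 - g) * 256;
  const b: number = Math.floor(s3);
  const a: number = Math.floor((s3 - b) * 256);
  return [r, g, b, a];
}

// Recombines the four channels into a single depth value.
function unpackDepth(rgba: [number, number, number, number]): number {
  return rgba[0] / 256 + rgba[1] / 65536 + rgba[2] / 16777216 + rgba[3] / 4294967296;
}
```

Using a single 8-bit channel would quantize depth to steps of 1/256, which is one source of the aliasing listed above. Spreading the value over four channels reduces the step to 1/2^32.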
Shadow Map
Shadow Map
• Percentage Closer Filtering (PCF) solution:
  – Hard shadow
  – Aliasing
  – Popping while moving
Shadow Map
• PCF
pw = 1/mapWidth
ph = 1/mapHeight
• Result = 0.5 * texel(0, 0)
  + 0.125 * texel(-pw, +ph) + 0.125 * texel(-pw, -ph)
  + 0.125 * texel(+pw, +ph) + 0.125 * texel(+pw, -ph)
[Figure: sample positions (0, 0), (-pw, +ph), (+pw, +ph), (-pw, -ph), (+pw, -ph)]
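The five-tap weighting above can be written as a small function. In this TypeScript sketch (a stand-in for ActionScript 3 or AGAL), `lookup` is a hypothetical callback that returns the shadow test result at a texture coordinate, 1 for lit and 0 for shadowed. It takes the place of the `texel` term in the slide's formula.

```typescript
// Result of the shadow comparison at (u, v): 1 = lit, 0 = in shadow.
type ShadowLookup = (u: number, v: number) => number;

// Weighted five-tap filter: centre sample 0.5, four diagonal neighbours 0.125 each.
function pcf(lookup: ShadowLookup, u: number, v: number, mapWidth: number, mapHeight: number): number {
  const pw: number = 1 / mapWidth;  // width of one texel
  const ph: number = 1 / mapHeight; // height of one texel
  return 0.5 * lookup(u, v)
    + 0.125 * lookup(u - pw, v + ph)
    + 0.125 * lookup(u - pw, v - ph)
    + 0.125 * lookup(u + pw, v + ph)
    + 0.125 * lookup(u + pw, v - ph);
}
```

The weights sum to 1, so fully lit and fully shadowed regions are unchanged and only shadow edges are softened.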
Shadow Map
• PCF based solution:
[Chart: elapsed time (ms), split into render loop and CPU waiting for GPU, for NVIDIA 6600GT and NVIDIA 8800GT at 1047 draw calls, each with and without PCF]
Toon Shading
• Single pass
  – Problem: dependent on the number of faces
• Two passes
  – Scale vertex position along the vertex normal
  – Not dependent on the number of faces
$\vec{v}$: view vector
$\vec{N}$: vertex normal
if $\theta > \text{threshold}$, draw toon color
Toon Shading
• Enable back face
• Scale vertex position
• Draw color

• Enable front face
• Draw material
[Figure panels: Toon, General, Result]
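The vertex-scaling step of the outline pass amounts to moving each vertex outward along its normal. The sketch below is in TypeScript as a stand-in for the AGAL vertex program. `Vec3`, `extrude` and the `thickness` parameter are illustrative assumptions, not names from the slides.

```typescript
type Vec3 = [number, number, number];

// Outline pass (back faces): push the vertex outward along its normal so the
// enlarged shell shows around the model as a flat-colored outline.
function extrude(position: Vec3, normal: Vec3, thickness: number): Vec3 {
  return [
    position[0] + normal[0] * thickness,
    position[1] + normal[1] * thickness,
    position[2] + normal[2] * thickness,
  ];
}
```

The second pass then draws the front faces at their original positions with the regular material, covering everything except the rim of the enlarged shell.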
Alpha Test
• Problem:
  – Stage3D has no alpha test
  – “kil opcode in AGAL”
    • Performance penalty on mobile devices
Alpha Test
• Solution: replace alpha-test with alpha-blend
                     Render loop time (ms)   Total time (ms)
6600GT alpha test    17~19                   47
6600GT alpha blend   18~19                   65~67
8800GT alpha test    0.16                    37
8800GT alpha blend   0.3                     36
• 304 draw calls
• Alpha-test performance is better on desktop
Post Effect
[Figure panels: Origin, Glow, DOF, Color Filter]
Static Lightmap
• Pros:
  – Pre-computation
  – Global illumination
• Cons:
  – More textures
Optimization for Flash Program
Optimization for Flash Program
• Problem:
  – For Each is slow
    • “Use for-loop to replace it”
  – Memory management
    • “Recycle manager”
    • “Strengthen garbage collection”
Optimization for Flash Program
• Solution:
  – Recycle manager
    • Reduces garbage collection load
    • Saves object initialization time
    • public function recycleObject3D( obj:IObject3D ):void
    • public function requestObject3D( classType:int, searchKey:*, renderHandle:int = 0 ):*
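The recycle manager is essentially an object pool: released objects are kept in a free list and handed out again instead of being reallocated. The sketch below is in TypeScript as a stand-in for ActionScript 3. `RecyclePool`, `Poolable` and `Particle` are hypothetical, and the lookup by `classType` and `searchKey` in the engine's real signatures is not modeled.

```typescript
interface Poolable { reset(): void; }

class RecyclePool<T extends Poolable> {
  private free: T[] = [];
  public created: number = 0; // how many objects were actually allocated

  constructor(private factory: () => T) {}

  // Reuses a recycled object when one is available, otherwise allocates a new one.
  request(): T {
    const reused: T | undefined = this.free.pop();
    if (reused !== undefined) return reused;
    this.created++;
    return this.factory();
  }

  // Returns an object to the pool after clearing its per-use state.
  recycle(obj: T): void {
    obj.reset();
    this.free.push(obj);
  }
}

class Particle implements Poolable {
  life: number = 0;
  reset(): void { this.life = 0; }
}
```

Because recycled objects never become garbage, the collector has less to do, and expensive constructors run only once per pooled object.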
Optimization for Flash Program
• Solution:
  – Strengthen garbage collection
    • Avoid inner function
    • Force to dereference function pointer
    • Dereference attribute in object destructor

• Avoid inner function
• Force to dereference function pointer
[Figure panels: Without inner function, Use inner function]
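One reading of these bullets: an anonymous inner function registered as a callback cannot be unregistered later, so it keeps its owner reachable, whereas a named reference can be removed and cleared explicitly. The sketch below is in TypeScript as a stand-in for ActionScript 3. `Dispatcher`, `Monster` and `dispose` are hypothetical names used only for illustration.

```typescript
type Listener = () => void;

class Dispatcher {
  private listeners: Listener[] = [];
  add(l: Listener): void { this.listeners.push(l); }
  remove(l: Listener): void {
    const i: number = this.listeners.indexOf(l);
    if (i >= 0) this.listeners.splice(i, 1);
  }
  count(): number { return this.listeners.length; }
}

class Monster {
  texture: number[] | null = [1, 2, 3]; // stands in for a large resource
  onTick: Listener | null;

  constructor(private dispatcher: Dispatcher) {
    // Keep the handler in a field instead of passing an anonymous inner
    // function, so that it can be unregistered again in dispose().
    const handler: Listener = this.tick.bind(this);
    this.onTick = handler;
    dispatcher.add(handler);
  }

  tick(): void {}

  // Plays the role of a destructor: unregister, then drop all references.
  dispose(): void {
    if (this.onTick !== null) this.dispatcher.remove(this.onTick);
    this.onTick = null;  // dereference the function pointer
    this.texture = null; // dereference attributes
  }
}
```

After `dispose()` the dispatcher no longer holds a path to the object, so the garbage collector is free to reclaim it.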
Optimization for Flash Program
• Experiment: before vs. after
  – Switching among levels
[Figure panels: Before improvement, After improvement]
Rapid loading
Rapid loading
• Streaming
  – Data compression
    • PNG: swf compression: 20%~55%
    • Package: zip compression: 25~30%
  – Batch loading
    • Separate resources into several packages
    • Download only what you really need
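Batch loading can be pictured as a manifest that maps each stage to the packages it needs, with only the missing packages downloaded on entry. The sketch below is in TypeScript as a stand-in for ActionScript 3. The stage names, package names and their grouping are assumptions made for illustration, and the actual download step is omitted.

```typescript
// Hypothetical manifest: which packages each stage needs.
const stageManifest: { [stage: string]: string[] } = {
  avatar: ["game-code", "ui"],
  game: ["game-code", "ui", "game-scene", "scene-textures"],
};

// Packages that have already been downloaded.
const loadedPackages: string[] = [];

// Returns the packages that still have to be downloaded before entering `stage`.
function packagesToLoad(stage: string): string[] {
  const needed: string[] = stageManifest[stage] || [];
  return needed.filter((p: string): boolean => loadedPackages.indexOf(p) < 0);
}

// Records packages as downloaded so later stages can skip them.
function markLoaded(packages: string[]): void {
  for (const p of packages) {
    if (loadedPackages.indexOf(p) < 0) loadedPackages.push(p);
  }
}
```

Entering the second stage then costs only the packages the first stage did not already bring in.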
Rapid loading
                             Enter to avatar stage   Enter to game stage   After loading picture finished
5 Mb/s, elapsed time (sec)   15                      6                     12
• game code
• ui
• game scene
• scene textures
Future Works
• Adobe Texture Format (ATF)
  – Support for compressed/mipmapped textures on different GPU chipsets
• FlasCC
  – C++ AS3 Compilation
• AS3 Workers
  – Multi-thread support
• MovieClip
  – Replace with a Stage3D UI framework, e.g. Starling
Conclusion
• Cross-Device/Cross-OS/Cross-Browser
  – Browser + Cloud Computing
  – Write Once, Run Anywhere
• Flash vs. HTML5
• Cross-Compiling Technology Trend
  – C/C++ + Flash/ActionScript
  – C/C++ + HTML5/JavaScript
Acknowledgements
• XPEC - Project C4 Team
• XPEC - RDO Team