Keynote from ISUVR'10
Image-based modelling for augmented reality
Anton van den Hengel
Director, Australian Centre for Visual Technologies
Professor, Adelaide University, South Australia
Director, PunchCard Visual Technologies
3D Modelling for AR
AR needs models
  AR is about the interaction between the real and the synthetic
3D modelling isn’t much fun
  Even with the best interfaces invented
  3D Studio Max? Blender?
User-created content
2D UCC has changed the face of the web
Blogs, wikis, social networking sites, advertising, fanfiction, news sites, trip planners, mobile photos and videos, customer review sites, forums, experience and photo sharing sites, audio, video games, maps and location systems, and more
Associated Content, Atom.com, BatchBuzz.com, Brickfish, CreateDebate, Dailymotion, Deviant Art, Demotix, Digg, eBay, Eventful, Fark, Epinions, Facebook, Filemobile, Flickr, Forelinksters, Friends Reunited, GiantBomb, Helium.com, HubPages, InfoBarrel, iStockphoto, Justin.tv, JayCut, Mahalo, Metacafe, Mouthshut.com, MySpace, Newgrounds, Orkut, OpenStreetMap, Picasa, Photobucket, PhoneZoo, Revver, Scribd, Second Life, Shutterstock, Shvoong, Skyrock, Squidoo, TripAdvisor, The Politicus, TypePad, Twitter, Urban Dictionary, Veoh, Vimeo, Widgetbox, Wigix, Wikia, WikiMapia, Wikinvest, Wikipedia, Wix.com, WordPress, Yelp, YouTube, YoYoGames, Zooppa
User-created content for AR
Google-created content for AR
UCC for AR
Just using images is a good start
  But limits interactions to 2D
Flexible AR requires 3D models
Ubiquitous AR requires UCC
Flexible ubiquitous AR requires 3D UCC
3D UCC
3D has been limited by the lack of good UCC tools
This is true for AR
But also VR, 3D TV, Second Life, Google Earth, Little Big Planet, 3D PDF, Adobe Premiere, Unreal Tournament, PlayStation, SGML, ...
3D UCC
AR particularly needs to model the real world
Images are a good source of 3D information
  Easily accessible
  They’re typically captured anyway
  Almost everything has a camera attached
  Humans are very good at interpreting them
Can AR be ubiquitous without UCC?
Image-based 3D UCC
The image is the interface
  People can’t help but see images in 3D
  Most image sets embody 3D
Powerful way to model real objects
  Varying levels of interaction
  Varying types of models
Helps even in modelling imaginary objects
Image-based modelling for AR
AR is largely about interactive images
  Any other mode of interaction adds complexity
The majority of the content is real
3D modelling from images seems a natural fit with AR
Image-based 3D modelling
Automatic
  Very detailed models of everything
  But it’s getting better
Interactive
  Means you can specify
    What you want to model
    What kind of model you want
VideoTrace
Interactive image-based modelling
A familiar interface
Image-based interactions
  The image is the interface
Generates low polygon count models with textures
Input
Modelling
Results
Another example
Interactive 3D modelling
3D modelling is critical to all sorts of applications
  Special effects, but also mining, architecture, defence, urban planning, …
People are getting more visually sophisticated
More 3D data is being generated
  More cameras, but also scanners etc.
The interfaces of modelling programs are usually very hard to fathom
Low polygon-count models
Insert your own objects into a game
Model an environment for AR
Put your house into Google Earth
Video editing
  Cut and paste between sequences
  Remove someone from your home videos
Put your truck into a game
Modelling for special effects
Video editing requires models
Modelling architecture
Modelling for virtual environments
The process
Capture and import the video
Run video through the camera tracker
  Performs structure and motion analysis
Interact with the system to generate and edit the model
Export to your application
The approach
Pre-compute where possible
  Structure from motion (camera tracking)
  Superpixels
Then interact
  Interactions allow user to exploit precomputed results
Structure from motion
Camera tracking
Calculates
  Reconstructed point cloud
  Camera parameters
    Location
    Orientation
    Intrinsics (e.g. focal length)
Informs interaction interpretation process
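The camera parameters listed here are what relate a reconstructed 3D point to a pixel in each frame. A minimal sketch of that relationship, assuming an ideal pinhole camera with no lens distortion; the function name `project_point` and its argument layout are illustrative and not taken from VideoTrace:

```python
def project_point(point, rotation, centre, focal, principal):
    """Project a 3D world point to pixel coordinates with a pinhole model.

    rotation  : 3x3 world-to-camera rotation as nested lists (orientation)
    centre    : camera centre in world coordinates (location)
    focal     : focal length in pixels (intrinsic)
    principal : principal point (u0, v0) in pixels (intrinsic)
    Returns None when the point lies behind the camera.
    """
    # Express the point relative to the camera centre, then rotate it
    # into the camera's coordinate frame.
    offset = [p - c for p, c in zip(point, centre)]
    x, y, z = (sum(r * o for r, o in zip(row, offset)) for row in rotation)
    if z <= 0:
        return None
    # Perspective divide followed by the intrinsic scaling and shift.
    return (focal * x / z + principal[0], focal * y / z + principal[1])


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
pixel = project_point((1, 2, 4), identity, (0, 0, 0), 800, (320, 240))
```

Running the projection in reverse gives a line of sight through each pixel the user clicks, which is one way the tracked cameras can inform how a 2D interaction is interpreted in 3D.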
Interactions
Straight lines
  Closed sets of lines define planar polygons
Curves
  For planar shapes with curved edges
  For NURBS surfaces
Mirroring
  Duplicates existing geometry
Extrusion
Dense meshing
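Two of these interactions reduce to simple geometric operations once a face exists. A rough sketch, assuming mirroring means reflection across a user-chosen plane and extrusion means sweeping a planar polygon along a vector; `mirror_point` and `extrude_polygon` are hypothetical names, not VideoTrace functions:

```python
def mirror_point(p, normal, offset):
    """Reflect p across the plane normal . x = offset (normal must be unit length)."""
    dist = sum(n * c for n, c in zip(normal, p)) - offset
    return tuple(c - 2.0 * dist * n for c, n in zip(p, normal))


def extrude_polygon(polygon, direction):
    """Sweep a planar polygon along direction.

    Returns (vertices, faces); each face is a list of vertex indices.
    Face winding is not made consistent in this sketch.
    """
    n = len(polygon)
    top = [tuple(c + d for c, d in zip(v, direction)) for v in polygon]
    vertices = list(polygon) + top
    faces = [list(range(n)), list(range(n, 2 * n))]  # base and cap
    for i in range(n):
        j = (i + 1) % n
        faces.append([i, j, n + j, n + i])  # one side wall per polygon edge
    return vertices, faces


square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
box_vertices, box_faces = extrude_polygon(square, (0, 0, 2))
mirrored = [mirror_point(v, (1.0, 0.0, 0.0), 0.0) for v in box_vertices]
```

Extruding the unit square gives a box with eight vertices and six faces; mirroring its vertices across the plane x = 0 produces the duplicated geometry on the other side.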
Fitting planar faces
User specifies boundary
Boundary specifies infinitely many planes
Fitting similar to pre-emptive RANSAC
  Generate bounded plane hypotheses from point cloud
  Eliminate hypotheses that fail a series of tests
  Run simplest / most robust tests first
    Generally 3D tests before 2D tests
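The hypothesise-and-eliminate loop can be sketched as follows. This is an illustration under simplifying assumptions, not the VideoTrace implementation: hypotheses come from random point triples, only two 3D tests are shown (the 2D image tests are omitted), and all names and thresholds are hypothetical.

```python
import random


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_from_points(p, q, r):
    """Return (unit normal, offset) with normal . x = offset, or None if collinear."""
    n = cross(sub(q, p), sub(r, p))
    length = dot(n, n) ** 0.5
    if length < 1e-12:
        return None
    n = tuple(c / length for c in n)
    return n, dot(n, p)


def support(plane, cloud, tol):
    """Count cloud points within tol of the plane."""
    n, d = plane
    return sum(1 for x in cloud if abs(dot(n, x) - d) <= tol)


def rays_hit_in_front(plane, centre, rays):
    """Each line of sight through the drawn boundary must meet the plane in front of the camera."""
    n, d = plane
    for ray in rays:
        denom = dot(n, ray)
        if abs(denom) < 1e-9 or (d - dot(n, centre)) / denom <= 0:
            return False
    return True


def fit_face(cloud, centre, rays, trials=200, tol=0.01, min_support=4, seed=0):
    rng = random.Random(seed)
    # Ordered cheapest first; all() stops at the first test that fails.
    tests = [
        lambda pl: rays_hit_in_front(pl, centre, rays),
        lambda pl: support(pl, cloud, tol) >= min_support,
    ]
    best, best_score = None, -1
    for _ in range(trials):
        plane = plane_from_points(*rng.sample(cloud, 3))
        if plane is None or not all(test(plane) for test in tests):
            continue
        score = support(plane, cloud, tol)
        if score > best_score:
            best, best_score = plane, score
    return best


# Synthetic data: 20 points on the plane z = 5 plus three outliers.
cloud = [(float(x), float(y), 5.0) for x in range(5) for y in range(4)]
cloud += [(0.5, 0.5, 1.0), (2.0, 1.0, 9.0), (3.0, 3.0, 2.5)]
rays = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.0, 0.1, 1.0)]
plane = fit_face(cloud, (0.0, 0.0, 0.0), rays)
```

Intersecting the boundary rays with the winning plane then gives the 3D vertices of the bounded face.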
[Figure: fitting planar faces, showing the image plane, line of sight and object points]
Hierarchical RANSAC
Generate bounded plane hypotheses
Tests
  Support from point cloud
  Reprojects within new image boundaries
  Constraints on relative edge length and face size
  Colour histogram matching on faces
  Colour matching on edge projections
  Reprojection is not self-occluding
2D Curves
3D Curves
Mirroring
Extrusion
Dense surface reconstruction
Live modelling
Most geometry cannot be modelled beforehand
  You can’t tell where it will be
  Modelling the whole world won’t work
Need to generate models in-situ
  While you’re there
Live modelling in AR
Using VideoTrace to model geometry from live video
  To insert elsewhere in the world
  So real objects can occlude synthetic geometry
Live modelling for AR
The camera tracking is performed live using SLAM
  Simultaneous Localisation and Mapping
  Markerless video tracking
  No prior model of the space
Using PTAM
  Parallel Tracking and Mapping
  Klein and Murray
PTAM
VideoTrace - Live
Occlusion
Low polygon count models?
  Needed for efficiency
  Not accurate enough for occlusion calculations
SLAM errors also prevent direct occlusion modelling
Occlusion boundary refinement
The model of the foreground object is projected into the image
  Using the PTAM-estimated camera parameters
But there is always some misalignment
Solve using a live segmentation of the real object from the video
Occlusion boundary refinement
Lay out nodes of a graph around the projected boundary
Set foreground and background probabilities per node from colour model
Set link weights from edge strength
Segment using max-flow algorithm
  At frame rate
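A toy version of this segmentation step, to show how the terminal and link weights interact. It assumes probabilities are used directly as terminal capacities (log-likelihoods are a common alternative) and uses a plain Edmonds-Karp max-flow, which would not reach frame rate on real images; `min_cut_labels` is a hypothetical name.

```python
from collections import deque


def min_cut_labels(fg_prob, links):
    """Label nodes as foreground (True) or background (False) by min cut.

    fg_prob : foreground probability per node, e.g. from a colour model
    links   : {(i, j): weight} between neighbouring nodes; a low weight
              marks a strong image edge, where cutting is cheap
    """
    n = len(fg_prob)
    source, sink = n, n + 1
    cap = [dict() for _ in range(n + 2)]

    def add_edge(u, v, c_uv, c_vu):
        cap[u][v] = cap[u].get(v, 0.0) + c_uv
        cap[v][u] = cap[v].get(u, 0.0) + c_vu

    for i, p in enumerate(fg_prob):
        add_edge(source, i, p, 0.0)      # cut when node i is labelled background
        add_edge(i, sink, 1.0 - p, 0.0)  # cut when node i is labelled foreground
    for (i, j), w in links.items():
        add_edge(i, j, w, w)

    def reachable():
        """Breadth-first search over edges with remaining capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable()
        if sink not in parent:
            break
        # Walk back from the sink to recover the augmenting path.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # Nodes still reachable from the source lie on the foreground side of the cut.
    return [i in parent for i in range(n)]


# Five nodes in a row; node 1 has a misleading colour, and the image edge
# lies between nodes 2 and 3.
probs = [0.9, 0.4, 0.9, 0.1, 0.1]
links = {(0, 1): 0.5, (1, 2): 0.5, (2, 3): 0.05, (3, 4): 0.5}
labels = min_cut_labels(probs, links)
```

In this example the strong links on either side of node 1 outweigh its weak colour evidence, so the cut falls on the weak link between nodes 2 and 3.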
Occlusion boundary refinement
Graph cut means that model doesn’t need to be accurate
  Very low polygon counts
  Very simple modelling process
  More complex objects possible
Occlusion boundary refinement
Graph cut gives a hard segmentation
Fix with an alpha matte
Blends between foreground and synthetic object
Fixes some holes in the cut
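The blend itself is ordinary alpha compositing. A minimal sketch, assuming images are nested lists of RGB tuples and the matte holds one value per pixel in [0, 1]; how the matte is estimated from the cut is not covered here, and `composite` is an illustrative name:

```python
def composite(real, synthetic, alpha):
    """Blend two images per pixel.

    alpha = 1 keeps the real foreground pixel, alpha = 0 shows the
    synthetic object, and values in between soften the hard boundary.
    """
    return [
        [tuple(a * r + (1.0 - a) * s for r, s in zip(real_px, synth_px))
         for real_px, synth_px, a in zip(real_row, synth_row, alpha_row)]
        for real_row, synth_row, alpha_row in zip(real, synthetic, alpha)
    ]


real = [[(200, 100, 0), (200, 100, 0), (200, 100, 0)]]
synthetic = [[(0, 100, 200), (0, 100, 200), (0, 100, 200)]]
matte = [[1.0, 0.5, 0.0]]
blended = composite(real, synthetic, matte)
```

The middle pixel, with a matte value of 0.5, comes out as an even mix of the two sources.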
Live modelling for AR
AR modelling for other purposes
Minimal interaction AR modelling
Use the camera as the modelling tool
  The user only specifies the object; the rest is done with the camera
Projective texturing
  Some compensation for Visual Hull
Silhouette modelling
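Silhouette modelling with a visual hull can be illustrated by voxel carving: a voxel survives only if it falls inside the object's outline in every view. A toy sketch, assuming axis-aligned orthographic views in place of the tracked perspective cameras; `carve` and the view format are hypothetical:

```python
def carve(grid_size, views):
    """Return the voxels consistent with every silhouette.

    views : list of (project, silhouette) pairs, where project maps a
            voxel (x, y, z) to a pixel (u, v) and silhouette is the set
            of pixels inside the object's outline in that view
    """
    kept = []
    for x in range(grid_size):
        for y in range(grid_size):
            for z in range(grid_size):
                voxel = (x, y, z)
                if all(project(voxel) in outline for project, outline in views):
                    kept.append(voxel)
    return kept


outline = {(1, 1), (1, 2), (2, 1), (2, 2)}
views = [
    (lambda v: (v[0], v[1]), outline),  # looking along the z axis
    (lambda v: (v[1], v[2]), outline),  # looking along the x axis
]
hull = carve(4, views)
```

With these two square silhouettes the surviving voxels form a 2 x 2 x 2 block. The hull can only over-estimate concave shapes, which may be why the slide pairs it with projective texturing as compensation.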
Minimal interaction modelling
How to get VideoTrace
It’s available as a free beta
  Just register at www.punchcard.com.au
  They will email you a link
It’s a real beta
Hopefully the final version will be free too
What’s next?
New interactions, applications and data sources
  Interactive SFM, better SLAM
Videoshop