Making robust computer vision in games
Presented by: Diarmid Campbell
Introduction
Who I am: Diarmid Campbell
What I do: Run the Vision R&D group
Where we do it: Sony’s London development studio
What we do: Research computer vision for camera based games
This talk: Making robust computer vision in games
Contents
What we do and why
The development process
Testing and videos
Computer Vision Concepts
A robust head tracker
Marker based Augmented Reality
The problems we faced
A demo of EyePet
Camera based games
Camera mounted on the TV
You see yourself on the TV
Game is overlaid on top of you
Past games on PS2
Computer Vision is hard
“Computer vision makes you want to kill yourself”
- Dr Nick Lord, 2009
Why is it hard?
Humans manage it effortlessly
An image is a 2D array of numbers
Take 5 images and plot them as a height map
Pick the odd one out
[Slide sequence: five height-map plots are shown, the audience picks the odd one out, and the answer is revealed; repeated for several image sets]
Factors affecting the pixels
Background objects in scene
Orientation/position of objects
Lighting/Shadows
Occlusion
George is in the pixels
We are not interested in those factors; George was hidden in the pixels
“Here is an image, what is it of?” The general computer vision problem is hard
If we constrain the problem, it is much easier (but still hard)
Robust Inputs
We can use computer vision as an input mechanism
Motion detection in EyeToy games
Robustness is how consistently an input mechanism does what the player is expecting
An input mechanism must be robust
Importance of robustness
If your fire button only worked 9 times out of 10, you would chuck your controller out.
There are ways around it
Imagine your gun is a champagne bottle
Each button click shakes it
Eventually the top blows off
The lack of robustness is hidden
Perhaps you now need to fight tortoises instead of warriors
The mechanic is now “robust”
But it is laggy and unresponsive
Cannot rely on split-second timing
This illustrates a general point: if the game copes well with non-robust inputs, it will also cope well with someone not playing it well. It creates a skill ceiling, which manifests itself as a lack of game-play depth.
If you want deep, skill-based game mechanics, robust input is essential.
The Development Process
Computer Vision Researcher: “Tell me what the game mechanic is and I’ll make you a state of the art solution”
Game designer: “Give me something that works and I’ll see what we can make that’s fun”
The chicken and the egg
You cannot do one before the other
Both development timelines happen in parallel
We are still figuring it out; here are some guidelines
[Timeline chart, months 1-24: game development runs from concepting through prototyping and production to alpha, beta and master; computer vision research runs in parallel, from surveying the state of the art and a 1st pass prototype through extending methods/analysis and production to its own alpha, beta and master, each reached a few months before the game’s]
Research timeline
[Chart: research progress over time, with milestones “Convinced we can create the technology” and “Something up and running”]
Vision tech beta before game reaches alpha
Required infrastructure
Prototyping environment: Matlab, Octave
Be able to capture videos
Runtime algorithms: OpenCV, VXL
Videos and testing
Computer vision is hard because many variables affect the images:
The lighting
The player’s clothes
The wallpaper
Spectators
3D cameras have their own pros and cons
Representative videos
Videos allow us to capture these variables and test against them
Videos MUST be representative: an algorithm that works in 99% of cases is useless if the failing 1% appears in 50% of living rooms
Make videos early in development
Demo: head tracker capturing
Head detection videos
We run them through different algorithms (e.g. the Cell SDK face detector) and show the failure modes
When it fails we can find the frame it failed in and debug
Regression testing
Automated testing: run through loads of videos and compare with expected results
The expected result could be as simple as: is the head visible? (a harness is sketched below)
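A minimal sketch of such a harness, in Python with OpenCV (the talk prototyped in Matlab/Octave and shipped on PS3, so the language, file layout, label format and detect_head hook here are all illustrative assumptions):

```python
import json
import cv2  # OpenCV, one of the runtime libraries mentioned above

def regression_test(video_path, labels_path, detect_head):
    """Run a detector over a captured video and diff against expected results.

    detect_head(frame) -> bool is the algorithm under test; the label file
    (one boolean per frame: "is a head visible?") is a hypothetical format.
    """
    with open(labels_path) as f:
        expected = json.load(f)
    cap = cv2.VideoCapture(video_path)
    failures = []
    for i, want in enumerate(expected):
        ok, frame = cap.read()
        if not ok:
            break
        if detect_head(frame) != want:
            failures.append(i)  # keep the frame index so it can be debugged
    cap.release()
    return failures
```

Failing frame indices point straight at the footage to debug, which is the whole value of testing on videos rather than live.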
When videos aren’t enough
SCEA R&D labs invented the forthcoming PlayStation® Move controller
Uses a camera and other sensors to track the controller
Videos were good early on
But you cannot change a video’s: lighting, backgrounds, camera settings
Solution
Reasons to buy a robot arm (as if you really need persuading)
Can test the same motion under many different conditions
Can try special hard cases
Computer Vision Concepts
Videos tell us when it fails; how do we fix it? This is the field of computer vision
I cannot go into the details of techniques; instead I will explain the common concepts and how they link together
This should help if you read papers or talk to experts
Feature extraction
Images contain a lot of information. This one is 900K
Instead of using pixels directly, extract high-level properties of groups of pixels
This results in less data, which is more relevant to the problem at hand
[Diagram: Image → Feature Extraction → Features]
PS3 Demo: Basic image
PS3 Demo: Canny edge detector (invariant to lighting changes; we store additional gradient info)
PS3 Demo: Motion (used in all our camera games)
PS3 Demo: Feature points (we store an image patch for each one and can match them frame to frame)
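Those demos ran on PS3, but the same feature extractors exist in OpenCV; a rough Python sketch (file names and parameter values are arbitrary choices, not the talk’s):

```python
import cv2

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
prev = cv2.imread("prev_frame.png", cv2.IMREAD_GRAYSCALE)

# Edges: fairly invariant to overall lighting changes
edges = cv2.Canny(frame, threshold1=50, threshold2=150)

# Motion: difference against the previous frame
motion = cv2.absdiff(frame, prev)

# Feature points: corners whose surrounding patch can be matched frame to frame
corners = cv2.goodFeaturesToTrack(frame, maxCorners=100,
                                  qualityLevel=0.01, minDistance=8)
```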
Likelihood functions
“Given that we have observed these features, what is the probability that we are observing what we modelled?”
This is a conditional probability; Bayesian statistics underpins most vision algorithms
[Diagram: Model + Features → Likelihood Function P(F | M)]
Cost functions = likelihood functions
Some terminology: sometimes you will hear about “cost functions”. They are the same concept: likelihood goes up with a good match, cost goes down. One is (conceptually) the inverse of the other.
Cost functions
Sum of Squared Differences (SSD)
[Example: comparing a template against two patches gives SSD = 1532 (high cost = bad match) and SSD = 12 (low cost = good match)]
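In code, SSD over two equal-sized patches is a one-liner; a small numpy sketch:

```python
import numpy as np

def ssd(patch_a, patch_b):
    """Sum of Squared Differences: low cost = good match."""
    d = patch_a.astype(np.float32) - patch_b.astype(np.float32)
    return float(np.sum(d * d))
```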
Classifiers
A classifier compares observed features to a number of models and tells you which model fits the features best
[Diagram: Features + Model(1), Model(2), …, Model(n) → Classifier → Most likely model]
Classifiers: Face example
Is this a face?
Classic detector (Viola-Jones):
[Pipeline: Image → Feature Extraction → Haar Wavelet Features → Boosted Cascade Classifier (Face Model vs Non-Face Model) → Is it a face?]
Models are trained on example images
PS3 Demo
Detectors
We have a model (with associated state). Given some observed features, the detector returns: is the object present, and what is its state (X, Y position / rotation / human pose)?
[Diagram: Model + Features → Detector → State + “Is object present?”]
Detectors: Faces again
The Viola-Jones face detector scans a box over the image at different positions and sizes, runs the classifier, and returns any positives
Recall the face detection demo
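OpenCV ships a boosted-cascade face detector in exactly this style; a minimal sketch using its stock frontal-face model (this stands in for, and is not, the Cell SDK detector from the demo):

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

# Scans a box over the image at different positions and scales,
# running the face/non-face classifier at each one.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    print("face at", (x, y), "size", (w, h))
```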
Trackers
We have a model, some observed features and the previous state
The tracker returns the next state
[Diagram: Model + Features + Previous State → Tracker → Next State]
Trackers: Face example
PS3 Demo: SSD tracker
PS3 Demo: Wand game
If we move quickly the tracker gets stuck in a local minimum
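A bare-bones SSD tracker in that spirit exhaustively searches a small window around the previous position (the window size and greyscale-numpy representation are my choices):

```python
import numpy as np

def track_ssd(frame, template, prev_xy, search=16):
    """Return the (x, y) near prev_xy whose patch best matches template."""
    th, tw = template.shape
    px, py = prev_xy
    best_cost, best_xy = None, prev_xy
    for y in range(max(0, py - search), min(frame.shape[0] - th, py + search)):
        for x in range(max(0, px - search), min(frame.shape[1] - tw, px + search)):
            d = frame[y:y+th, x:x+tw].astype(np.float32) - template
            cost = np.sum(d * d)  # SSD: low cost = good match
            if best_cost is None or cost < best_cost:
                best_cost, best_xy = cost, (x, y)
    return best_xy
```

If the face moves further than the search window in one frame, the best SSD inside the window is a local minimum that may have nothing to do with the face, which is the sticking behaviour seen in the demo.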
Learning more
Computer vision conferences: ICCV, CVPR, ECCV
Read papers accepted by these conferences
Get friendly with an academic, or hire one!
Robust Head Tracking
Track rotation and scale
The SSD based tracker did not track rotation and scale
The next iteration of the tracker does: X, Y position; scale; θ (in-plane rotation)
PS3 Demo: Hager tracker (swap demo)
It tracked more types of movement, but was very fragile
Problem: a 2D image patch is not a good model of a head
It does not deal with out-of-plane rotation, and even in-plane rotation is not right
Colour histograms
Let’s move away from comparing pixels and think about features
Consider these images of the same objects: if we compared them pixel for pixel they would seem very different, but look at a histogram of the colours that appear in them and they look the same
Histograms are a feature that throws away all spatial information (see the sketch below)
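A sketch of the idea with OpenCV histograms (bin counts and the comparison metric are arbitrary choices here):

```python
import cv2

def colour_hist(bgr_image):
    # 3D histogram over B, G, R; all spatial layout is thrown away
    hist = cv2.calcHist([bgr_image], [0, 1, 2], None,
                        [8, 8, 8], [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

a = colour_hist(cv2.imread("object_pose1.png"))
b = colour_hist(cv2.imread("object_pose2.png"))

# Close to 1.0 for the same object seen in two different poses
print(cv2.compareHist(a, b, cv2.HISTCMP_CORREL))
```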
Where we are now
The current system uses colour histograms but keeps approximate spatial information
It has a foreground and a background model, each with its own histograms
PS3 Demo
Marker based Augmented Reality (AR)
Marker based AR is in a published game: EyePet
Camera setup
Topics to discuss
Camera based games
What is EyePet?
Improving the tech
Future research
What the player sees on the TV
[Image: the TV picture mixes virtual and real elements]
Marker based AR
We shipped a “magic card” with the game
Allows the players to manipulate virtual objects in 3D
Finding the marker
1. Input image
2. Threshold
3. Trace outlines
4. Test for quad shapes (actually, just keep pairs of quads)
5. Take the corner positions and calculate a 2D transform
6. Match the pattern against each stored pattern in turn until one matches (Yes!)
7. Decompose the 2D transform into the camera projection and model view matrix
8. Use a Kalman filter
And we’re done… (the first few steps are sketched below)
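The first few steps map closely onto standard OpenCV (4.x) calls; a Python sketch of the threshold/outline/quad stages, with a placeholder threshold and area filter, and with pattern matching, decomposition and Kalman filtering omitted:

```python
import cv2

gray = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2GRAY)

# Steps 1-2: threshold to black and white regions
_, bw = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY)

# Step 3: trace outlines
contours, _ = cv2.findContours(bw, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

# Step 4: test for quad shapes (4-corner convex polygons of reasonable size)
quads = []
for c in contours:
    poly = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
    if len(poly) == 4 and cv2.isContourConvex(poly) and cv2.contourArea(poly) > 400:
        quads.append(poly.reshape(4, 2))

# Step 5 would take each quad's corner positions and compute a 2D transform
```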
Problems we faced
Picking the right threshold
We threshold to find black and white regions, but which threshold? Many clever solutions didn’t work
Brute force approach: try lots (around 60) of thresholds, as sketched below
PS3 Demo: Thresholds. PS3 Demo: AR Thresholds
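A sketch of that brute-force loop (find_marker_quads is a hypothetical helper standing in for steps 2-4 of the pipeline above):

```python
def find_marker_at_any_threshold(gray, find_marker_quads, levels=60):
    # Try many evenly spaced binarisation thresholds rather than
    # trying to guess one clever one.
    for t in range(4, 256, 256 // levels):
        quads = find_marker_quads(gray, threshold=t)
        if quads:
            return t, quads
    return None, []
```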
Light sensitive matching
Pattern matching used Sum of Squared Differences (SSD)
Example scores: SSD = 2242, SSD = 14, SSD = 874
The brightness of the image affected the score
Use Normalised Cross Correlation (NCC) instead: the same three comparisons now score NCC = 0.8, NCC = 0.9, NCC = 0.9
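The difference is easy to check numerically: in this numpy sketch, uniformly brightening a patch changes its SSD score drastically but leaves the (cosine-style) NCC at exactly 1.0:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((16, 16)).astype(np.float32)
b = 1.5 * a  # the same patch, 50% brighter

ssd = np.sum((a - b) ** 2)                  # grows with the brightness change
ncc = np.dot(a.ravel(), b.ravel()) / (
    np.linalg.norm(a) * np.linalg.norm(b))  # stays at 1.0

print(ssd, ncc)
```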
A new way to look at images: an image is an array of numbers, and if we list out every number it becomes a vector
[Diagram: a 100×100 grid of pixel values flattened into a single vector of 10,000 numbers]
This vector is a co-ordinate in “image space”: every 100×100 image corresponds to a single unique point in image space, and brightening an image corresponds to scaling its position vector
When comparing two images, SSD corresponds to the distance between them in image space; NCC corresponds to the angle θ between them
[Diagram: two image vectors; SSD is the distance between their endpoints, NCC is the angle θ between them]
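Written out for flattened image vectors a and b (a standard formulation, not quoted from the slides):

```latex
\mathrm{SSD}(a,b) = \sum_i (a_i - b_i)^2 = \lVert a - b \rVert^2
\qquad
\mathrm{NCC}(a,b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} = \cos\theta
```

Scaling b by a brightness factor k > 0 cancels in the NCC quotient, which is exactly why the angle-based score is insensitive to lighting while the distance-based one is not.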
Linear algebra is the other pillar of computer vision
Feature extraction is just a transformation from one space to another: image space → feature space
Classifiers are often just planes which divide up the space (e.g. into a region that contains faces and a region that doesn’t)
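That “planes which divide up the space” remark can be made concrete: a linear classifier is just a hyperplane w·x + b = 0 in feature space. A toy numpy sketch (the weights are invented, not a trained face model):

```python
import numpy as np

# Hypothetical weights: the normal vector and offset of a dividing plane
w = np.array([0.3, -1.2, 0.7])
b = 0.1

def classify(features):
    # Which side of the plane does this feature vector fall on?
    return "face" if np.dot(w, features) + b > 0 else "not a face"

print(classify(np.array([1.0, -0.5, 0.2])))
```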
Occlusion
It is easy to occlude the marker with your fingers
Put a big red handle on the card and instruct the player to hold it there
Also put a handle on the back
Occlusion: another approach (still in research phase)
Edge based tracking: uses the AR marker to initialise, then tracks using edge features
PS3 Demo (load EyePet)
False positives
When not occluded, we find the marker (almost) all the time; our home videos showed this
False positives were a problem, but were not represented in our videos
We added some Hollywood films to the video tests, where we knew that no markers were present
We saved out all the spurious frames
Made a number of tweaks to the algorithm, e.g. pattern matching the whole marker, not just the centre pattern
Result: 20 times fewer false detections
EyePet Demo
Use motion detection for normal interaction: Call, Jump, Stroke
Use the AR card for the health monitor (screen-facing case)
He needs stimulation: Trampoline
Finally, give him a shower
Summary
What we do and why
The development process
Testing and videos
Computer Vision Concepts
A robust head tracker
Marker based Augmented Reality
The problems we faced
A demo of EyePet
The End (please fill out your questionnaires)