TRANSCRIPT
Probabilistic Approaches for RGB-D Video Enhancement and Applications
Speaker: Lu Sheng
Supervisor: Prof. King Ngi Ngan
Lu Sheng, Thesis Oral Defense
Why Is RGB-D Data Essential?
RGB: 2D visual pattern
Depth: 3D geometry
An RGB image cannot explicitly tell the computer the 3D structure of each object
Depth cannot tell us the overlaid texture patterns
RGB + Depth helps us comprehensively understand the 3D visual world
Why Is RGB-D Data Essential?
Explosive growth of 3D applications
3D reconstruction
Novel view synthesis
Virtual reality / Augmented reality
3DTV & FTV
Refocus
Motion sensing / gesture recognition
Autonomous navigation & safety
Personal & industrial robots
Scene understanding
Pedestrian detection
Action recognition
Recent Depth Acquisition Methods
Passive methods: stereo vision, shape-from-shading, structure-from-motion
Drawbacks
Usually computationally intensive
Mediocre quality
Require simple or artificial shooting conditions
Recent Depth Acquisition Methods
Active sensors: Kinect, time-of-flight camera, laser scanner
Compared with passive methods
Standard-resolution depth frames at video frame rate
More robust to difficult shooting conditions
Drawbacks
Poor depth quality keeps depth-based tasks from reaching their full potential
High-Quality Depth Data Are Important
Many applications require high-quality depth data
Spatiotemporal depth video enhancement is necessary
Depth data cannot perform structural regularization on their own
If accompanied by synchronized RGB data
multi-modal structural features shared by texture and geometry let the texture features guide the regularization of the depth maps
Depth is NOT Texture
Depth links to the 3D geometry of the captured scene
Learn effective methods to encode these observations
Spatial relationships between objects
Depth ordering
Occlusion reasoning
Object segmentation
Geometric structures inside each object
Piecewise smoothness
Distinctive discontinuities
Goals
Explore effective ways to achieve robust spatiotemporal enhancement of RGB-D depth video
Learn specific treatments compatible with 3D geometry for enhancement and depth-based applications
Employ probabilistic approaches to model these tasks
Hybrid Geometric Hole Filling Strategy for Spatial Enhancement
Spatial RGB-D Enhancement
Introduction
Input: RGB-D images with an upsampled raw depth image
Raw depth problems: low resolution, noise & outliers, depth-missing holes, structure distortions
Goal: an enhanced depth image that is high-definition, structure-optimized, and complete
Introduction
Observations
Co-occurrences between depth discontinuities and image edges
Homogeneous texture patterns have similar 3D geometries
Hybrid Geometric Hole Filling Strategy
Input RGB-D pair → Hole Filling (Filtering-based Depth Interpolation, Segment-based Depth Propagation) → Depth Map Refinement → Output RGB-D pair
Hole Partitioning
Upsample the low-resolution depth map onto a sparse grid
Pixels are divided into two parts
in hole region:
with depth values:
Further partition the holes into two parts
based on the valid depth pixels in their neighborhoods
Filtering-based Depth Interpolation
Filtering-based Depth Interpolation for region
Requires enough depth information in the neighborhood to infer a reliable depth value
Joint Bilateral Filtering
[Figure: filling only the target region vs. filling the whole image]
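The filtering-based interpolation above can be sketched as follows. This is a minimal illustration, assuming a grayscale guidance image in [0, 1] and a boolean validity mask; the function name `jbf_fill` and the parameters `radius`, `sigma_s`, and `sigma_r` are hypothetical choices for this sketch, not values from the thesis.

```python
import numpy as np

def jbf_fill(depth, guide, valid, radius=3, sigma_s=2.0, sigma_r=0.1):
    """Fill invalid depth pixels with a joint bilateral average of valid neighbors.

    depth: (H, W) depth map, guide: (H, W) guidance intensity,
    valid: (H, W) boolean mask of pixels that carry a depth value.
    All names and default parameters are assumptions for illustration.
    """
    h, w = depth.shape
    known = np.where(valid, depth, 0.0)          # ignore values stored in holes
    out = depth.astype(float).copy()
    for y, x in zip(*np.nonzero(~valid)):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        yy, xx = np.mgrid[y0:y1, x0:x1]
        # Spatial nearness weight
        spatial = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
        # Range affinity weight taken from the guidance image, not from depth
        affinity = np.exp(-((guide[y0:y1, x0:x1] - guide[y, x]) ** 2) / (2 * sigma_r ** 2))
        weight = spatial * affinity * valid[y0:y1, x0:x1]
        total = weight.sum()
        if total > 1e-12:                        # needs enough valid neighbors
            out[y, x] = (weight * known[y0:y1, x0:x1]).sum() / total
    return out
```

Pixels whose window contains no valid depth are left untouched, which is the case the segment-based propagation is meant to handle.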
Depth Propagation under Segment Constraint
Depth Propagation for region
Segment constraint
Depth variation is smooth in an over-segmented RGB patch
One parametric surface model in one patch
Generate segments
Superpixel – simple linear iterative clustering (SLIC)
Hole patch
Patch with known depth
Partially filled patch
After filling
Depth Propagation under Segment Constraint
Fill the partially filled patches by surface fitting with RANSAC
Surface propagation for patches
Assign each hole patch the surface model of its most similar RGB patch with a known surface model in the neighborhood
The cost function models the statistical texture similarity and spatial distance
A greedy algorithm is used
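The surface-fitting step can be illustrated with a small RANSAC sketch. It assumes the parametric surface is a plane z = a·x + b·y + c, which is only one possible choice of surface model; `ransac_plane`, `n_iter`, and `tol` are hypothetical names and settings.

```python
import numpy as np

def fit_plane(x, y, z):
    """Least-squares fit of z = a*x + b*y + c; returns (a, b, c)."""
    design = np.column_stack([x, y, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    return coef

def ransac_plane(x, y, z, n_iter=200, tol=0.5, seed=0):
    """Robust plane fit over the known depth samples of one patch (assumed model)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_iter):
        idx = rng.choice(len(z), size=3, replace=False)
        design = np.column_stack([x[idx], y[idx], np.ones(3)])
        if abs(np.linalg.det(design)) < 1e-9:    # skip collinear samples
            continue
        a, b, c = np.linalg.solve(design, z[idx])
        inliers = np.abs(a * x + b * y + c - z) < tol
        if best is None or inliers.sum() > best.sum():
            best = inliers
    if best is None:
        raise ValueError("no non-degenerate sample found")
    # Refit on the consensus set, then evaluate the plane at the hole pixels
    return fit_plane(x[best], y[best], z[best]), best
```

Once the coefficients are known, the missing depth at pixel (x, y) of the same patch is simply a·x + b·y + c.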
Depth Propagation under Segment Constraint
Generate segments Fill in partially filled patches Fill in hole patches
Depth map refinement
Various filtering methods can be used here
A standard joint bilateral filter is used for simplicity
Experimental Results
Middlebury dataset
Error metric: Bad Pixel Ratio (Δ𝑑 ≥ 1 as bad pixel)
Figure columns: RGB images, depth images, ground truth, Multi-res JBU [1], Wang et al. [2], proposed method
Method | BP (first example) | BP (second example)
Multi-res JBU [1] | 8.35% | 14.10%
Wang et al. [2] | 3.65% | 3.10%
Proposed method | 3.33% | 2.51%
[1] C. Richardt et al., Coherent spatiotemporal filtering, upsampling and rendering of RGBZ videos, CGF, 2012.
[2] L. Wang et al., Stereoscopic inpainting: Joint color and depth completion from stereo images, CVPR, 2008.
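The bad pixel ratio used above can be computed as in this short sketch. The threshold of 1 follows the slide; the optional `mask` argument is an assumption added here to allow evaluation on a subset of pixels.

```python
import numpy as np

def bad_pixel_ratio(estimate, ground_truth, threshold=1.0, mask=None):
    """Percentage of pixels whose absolute depth/disparity error is >= threshold."""
    error = np.abs(np.asarray(estimate, float) - np.asarray(ground_truth, float))
    bad = error >= threshold
    if mask is not None:          # optional evaluation region (assumption)
        bad = bad[mask]
    return 100.0 * bad.mean()
```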
Weighted Structure Filters Based on Parametric Structural Decomposition
Spatial RGB-D Enhancement
Introduction
A variety of popular image filters are related to the local statistics of the input image
Median filter: takes the 50% point of the cumulative local distribution
Mode filter: seek the global mode of the local distribution
Average filter: estimate the expectation of the local distribution
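The link between these three filters and the local distribution can be made concrete with a small sketch. It assumes an integer-valued patch and an unweighted histogram; `local_statistics` is a hypothetical helper name.

```python
import numpy as np

def local_statistics(patch, n_bins=256):
    """Median, mode and mean of a patch, all read off its local histogram."""
    values = np.asarray(patch).ravel()
    hist = np.bincount(values, minlength=n_bins).astype(float)
    hist /= hist.sum()                             # local distribution
    cdf = np.cumsum(hist)                          # cumulative local distribution
    median = int(np.searchsorted(cdf, 0.5))        # first bin where the CDF reaches 50%
    mode = int(np.argmax(hist))                    # global mode of the distribution
    mean = float(np.dot(np.arange(len(hist)), hist))  # expectation
    return median, mode, mean
```

With one bright outlier in the patch, the mean is pulled toward it while the median and mode stay with the dominant structure.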
Introduction
Provided with a guidance feature map
Image intensity, patches, edge maps, …
These filters can be extended to joint weighted filters
Propagate local feature statistics into the target image
Various applications
Enhancement / de-noising / style manipulation / structure decomposition ….
Introduction
Disparity enhancement
Image denoising
JPEG artifact removal
Contrast enhancement
Image stylization
Joint depth upsampling
Weighted Distribution Estimation
The weighted distribution is
encodes both the spatial nearness and range affinity
measures the data compatibility
Brute-force implementation is of high computational cost
Computational cost depends on the number of samples 𝑔𝑖
✓ Hundreds of filtering operations are required to output a satisfactory distribution
✓ How can the cost be reduced without distorting the distribution?
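To see why the brute-force estimate is costly, here is a minimal sketch. It assumes integer-quantized input values, a Kronecker delta as the data-compatibility kernel, and a Gaussian range weight on a guidance image inside a square window; all names and parameters are illustrative. Note the inner loop: one weighted accumulation per sample value.

```python
import numpy as np

def weighted_histogram(f, guide, n_levels, radius=2, sigma_r=0.1):
    """Brute-force weighted local histogram, one pass per sample level."""
    h, w = f.shape
    hist = np.zeros((h, w, n_levels))
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            # Weight: box window (spatial nearness) times range affinity on the guide
            weight = np.exp(-((guide[y0:y1, x0:x1] - guide[y, x]) ** 2) / (2 * sigma_r ** 2))
            window = f[y0:y1, x0:x1]
            for g in range(n_levels):            # cost grows with the number of samples
                hist[y, x, g] = weight[window == g].sum()
    return hist / hist.sum(axis=2, keepdims=True)

def weighted_median(hist):
    """Per-pixel weighted median: first level where the cumulative histogram reaches 50%."""
    cdf = np.cumsum(hist, axis=2)
    return (cdf >= 0.5 - 1e-12).argmax(axis=2)
```

With hundreds of levels, this amounts to hundreds of filtering operations per image, which is the cost the mixture-model formulation aims to avoid.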
Structures in a Local Patch
cloud
object
tower
sky
A patch of a natural image does not contain a large number of structures
Nearby patches share similar structures
Two pixels are similar if they both have high likelihoods to the same local structures
It is possible to construct the distribution of a local patch with a mixture model
A Probabilistic Kernel
Conventional kernel for data compatibility
Assume the image is conveyed by several (e.g. 𝐿) structures throughout the image domain
Measure the difference between 𝑓𝑥 and 𝑓𝑦
A Probabilistic Kernel
Each structure is a probabilistic model
Two pixels are similar if they both have high responses to the 𝑙-th model
Assemble all models
Gaussian distribution with noise std
Weighted Distribution Estimation
Conventional distribution
Kernel: Gaussian, Kronecker delta, etc.
Distribution estimation: needs hundreds of filtering operations
The proposed distribution
Kernel: local structure similarity
Distribution estimation: only 𝐿 filtering operations to get 𝜓𝐱 𝑙 , 𝑙 ∈ ℒ
A mixture model!
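The idea of replacing per-sample filtering with one filtering pass per structure model can be sketched as below. This is a simplified illustration, not the thesis implementation: it assumes Gaussian structure models with a shared standard deviation, uses a plain box filter in place of the joint weighted filter, and all function names are hypothetical.

```python
import numpy as np

def box_filter(img, radius):
    """Normalized box filter with edge padding (stand-in for any joint filter)."""
    k = 2 * radius + 1
    pad = np.pad(img, radius, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def mixture_weights(f, means, sigma, radius=2):
    """Per-pixel mixture weights: one filtering pass per model (L passes total)."""
    logits = np.stack([-0.5 * ((f - mu) / sigma) ** 2 for mu in means])
    logits -= logits.max(axis=0, keepdims=True)      # numerical stability
    resp = np.exp(logits)
    resp /= resp.sum(axis=0, keepdims=True)          # response of each pixel to each model
    return np.stack([box_filter(r, radius) for r in resp])

def evaluate_distribution(psi, means, sigma, g):
    """Evaluate the local mixture density at sample values g for every pixel."""
    g = np.asarray(g, float)
    comps = np.stack([
        np.exp(-0.5 * ((g - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        for mu in means
    ])                                               # shape (L, G)
    return np.einsum("lhw,lg->hwg", psi, comps)
```

The number of filtering passes equals the number of models, while the density can afterwards be evaluated at as many sample values as needed without further filtering.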
Gaussian Models for the Local Structures
Gaussian distribution to define the models for the local structures
Uniformly Quantized Models (UQM)
Locally Adaptive Models (LAM)
Gaussian Models for the Local Structures
Estimation of the Locally Adaptive Models
Hierarchical Clustering by Binary Space Partition Tree
[Figure: binary space partition tree with nodes 1 to 7; the splits 𝑆1, 𝑆2, 𝑆3 send samples to the + or - side]
Experimental Results & Discussions
The speedup of the proposed method
The gain is generally 2~4x faster for grayscale images
6~12x faster for color images
Even faster for disparity maps or cartoon-style images due to their high structural homogeneity
A manual threshold is used to stop model generation
Runtime comparison
Estimate the necessary LAM models on the BSD3000 dataset
Experimental Results & Discussions
Application-I: Disparity Enhancement (error metric: RMSE)
[Figure: disparity enhancement results; runtimes shown are ~16 s, ~4 s, and <1 s]
Experimental Results & Discussions
Application-I: Disparity Enhancement
Covers more details & avoids staircase artifacts
Although a small number of LAM models cannot cover all the details, they are still superior to the UQM models
[Figure panels: raw, color frame, spatial filter, spatiotemporal filter]
Experimental Results & Discussions
Application-II: JPEG Block Artifact Removal
Produces piecewise-smooth results and reduces staircase artifacts without distorting necessary structures
Experimental Results & Discussions
Application-III: Contrast Enhancement
source image
after structure-preserving filtering
after detail enhancement
Experimental Results & Discussions
Application-IV: Joint Depth Map Upsampling
Spatiotemporal Enhancement Based on Static Structure
Spatiotemporal RGB-D Enhancement
Introduction
A raw depth video of a natural scene
Contains various complex and even unpredictable dynamic contents
Suffers from spatial and temporal artifacts
Raw Kinect video
Color-coded Raw TOF video
After the spatial enhancement
Reduces artifacts in the spatial domain
But introduces temporal flickering
No temporal consistency
Aggravates flickering artifacts
Raw Kinect video
Spatial JBF
Introduction
After a conventional spatiotemporal enhancement
Still contains temporal flickering
Distorts depth variation on dynamic objects
Coherent spatiotemporal JBF
Spatial JBF
How can temporal flickering be eliminated without distorting the necessary depth variation along dynamic objects?
Static Structure
[Figure: a Kinect or another depth camera observes a moving object, a static object, and the static background; the captured depth map and the static structure are marked]
Intrinsic structure underneath the captured scene
lies on or behind the surface of the input depth frame
A probabilistic medium to indicate whether a region is static
Simple observations
Moving objects stay in front of it
Static regions or visible background areas are fused into it
Static Structure Spatiotemporal Enhancement
Robust static/dynamic region detection by the static structure
Spatiotemporally enhance the static region with the static structure
Spatially optimize the dynamic foreground
Temporally coherent for static regions, with depth variation preserved for dynamic content
How to estimate the static structure?
Generative Model for Static Structure
A Probabilistic Generative Model
[Figure: camera center, line of sight, and the current static structure; samples fall before the structure, on it, or behind it]
If incoming depth belongs to
State-I: the static structure
State-II: outliers in the front or moving objects
State-III: outliers rearward or revealed background
is an indicator function that equals 1 when its argument is true and 0 otherwise
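The three states can be illustrated with a small mixture-likelihood sketch. The exact densities on the slides are not recoverable from this transcript, so the forms below are assumptions: a Gaussian around the static structure depth for State-I, and uniform densities in front of and behind it for State-II and State-III, gated by indicator functions. The depth range `d_min`, `d_max` and the noise level `sigma` are hypothetical.

```python
import numpy as np

def state_likelihoods(d, z, sigma, d_min, d_max):
    """Likelihood of depth sample d under each state, given structure depth z."""
    on_structure = np.exp(-0.5 * ((d - z) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    in_front = float(d < z) / (z - d_min)      # indicator times uniform density before z
    behind = float(d > z) / (d_max - z)        # indicator times uniform density behind z
    return np.array([on_structure, in_front, behind])

def responsibilities(d, z, sigma, weights, d_min=0.5, d_max=8.0):
    """Posterior probability of State-I/II/III for one sample (weights = state frequencies)."""
    joint = np.asarray(weights, float) * state_likelihoods(d, z, sigma, d_min, d_max)
    return joint / joint.sum()
```

For example, a sample well in front of the structure is attributed almost entirely to State-II, which is how moving objects are kept out of the static structure.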
Generative Model for Static Structure
A Probabilistic Generative Model
The likelihood of w.r.t. the given static structure
Gaussian prior over
Dirichlet prior over the frequency of each state
Camera center
Current static structure
Online Update Scheme
A Probabilistic Generative Model
The posterior
is the set of previous depth samples
is the set of current samples
If the input frame only contains the static scene and outliers, the updated static structure will be governed by the posterior, and we have
Its probable depth is
The reliability of the model is
Variational approximation for efficiency
Camera center
Updated static structure
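A recursive update of this kind can be sketched as follows. This is a simplified soft-assignment update written for illustration, not the variational derivation from the thesis: it keeps a Gaussian belief (mean `mu`, variance `var`) over the static depth at one pixel and Dirichlet-style pseudo-counts `alpha` for the three states, and it reuses the assumed Gaussian-plus-uniform likelihoods. All parameter values are hypothetical.

```python
import numpy as np

def update_static_structure(mu, var, alpha, d, sigma=0.02, d_min=0.5, d_max=8.0):
    """One online update of the per-pixel static structure with depth sample d."""
    s2 = var + sigma ** 2                       # predictive variance of an inlier sample
    likelihood = np.array([
        np.exp(-0.5 * (d - mu) ** 2 / s2) / np.sqrt(2 * np.pi * s2),
        float(d < mu) / (mu - d_min),           # in front of the structure
        float(d > mu) / (d_max - mu),           # behind the structure
    ])
    r = alpha / alpha.sum() * likelihood
    r = r / r.sum()                             # soft state assignment
    # Gaussian update, discounted by the inlier responsibility r[0]
    precision = 1.0 / var + r[0] / sigma ** 2
    mu_new = (mu / var + r[0] * d / sigma ** 2) / precision
    return mu_new, 1.0 / precision, alpha + r
```

In this sketch the variance plays the role of the model's reliability: it shrinks as consistent samples arrive, while samples explained by the outlier states leave the depth estimate essentially unchanged.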
Layer Assignment
Label the input depth frame into three layers
𝑙𝑖𝑠𝑠: agrees with the estimated static structure
𝑙𝑑𝑦𝑛: belongs to dynamic objects
𝑙𝑜𝑐𝑐: refers to the previously occluded structure
𝑙𝑖𝑠𝑠 and 𝑙𝑜𝑐𝑐 define the current static regions
Fully connected conditional random field with effective inference based on real-time high-dimensional filters
Layer Assignment & Online Update of the Static Structure
[Figure: frames #1 to #5; rows (a) raw depth, (b) raw color, (c) layer assignment, (d) depth static structure, (e) color static structure]
Layer Assignment & Online Update of the Static Structure
[Figure: frames #1 to #5; rows (a) raw depth, (b) layer assignment, (c) depth static structure]
Spatiotemporal Depth Video Enhancement
[Diagram blocks: Input data (𝑡); Layer Assignment; Variational Approximation; Spatial Enhancement; Static Structure (𝑡); Static Structure (𝑡 − 1); Online Static Structure Updating Scheme; Spatiotemporal Depth Video Enhancement; Enhanced depth frame (𝑡)]
Result Comparisons
(a) Raw RGB-D videos (b) Proposed method (c) Lang et al. [3]
Superior in static scene reconstruction and dynamic object enhancement
[1] C. Richardt et al., “Coherent spatiotemporal filtering, upsampling and rendering of RGBZ videos,” Computer Graphics Forum, 2012.
[2] D. Min et al., “Depth video enhancement based on weighted mode filtering,” TIP, 2012.
[3] M. Lang et al., “Practical temporal consistency for image-based graphics applications,” TOG, 2012.
Result Comparisons
(a) Raw RGB-D videos (b) Proposed method
(c) CSTF [1] (d) WMF [2] (e) Lang et al. [3]
Color frames
Depth frames
CSTF [1]
WMF [2]
Lang et al. [3]
Ours
Close-ups
Result Comparisons
dyn_kinect_1 and dyn_kinect_2: (a) Raw RGB-D videos (b) Proposed method (c) Lang et al. [3]
Result Comparisons
(a) dyn_kinect_2 (b) dyn_kinect_3
Color
Depth
Lang et al. [3]
Ours
dyn_kinect_1 dyn_kinect_2
Applications
Application-I: Background Subtraction
Panels: color image; by raw depth image; by the proposed method; Lang et al. [3]; CSTF [1]; WMF [2]
Applications
Application-II: Novel View Synthesis
(a) color image (b) raw depth image (c) enhanced depth image
(d) by raw depth image (e) by static structure (f) by enhanced depth image
A Generative Model for Robust 3D Facial Pose Tracking
Depth-based Application
Introduction
Why is facial pose tracking interesting?
Immersive Video Communication
3DTV & Free-viewpoint TV
VR / AR, etc.
With expression added?
Image/Video Editing
Performance Capturing, etc.
Introduction
How to make it
Markerless
No explicit or manual markers
Realtime
Cannot afford sophisticated correspondence estimation & face shape representation
Robustness and Smoothness
Robust to illumination variations, occlusions & outliers
Robust to varying facial expressions
Temporally coherent tracking
Adaptive to any user on-the-fly without manual calibration
Introduction
RGB-based facial pose tracking has been performed successfully in well-constrained scenes
It is fragile under unconstrained capturing conditions
Illumination variations
Shadows
Large and severe occlusions
Common in numerous consumer-level applications
Introduction
Commodity real-time range sensors
Explicitly give the spatial relationships
Insensitive to illumination variations & shading
Easier inference for occlusions
But new challenges arise
Noise, missing values & outliers
Complex occlusions
Varying expressions
Online user adaptation
The Proposed Method
A framework that
unifies pose tracking and face model adaptation on-the-fly
offers accurate, occlusion-aware and uninterrupted 3D facial pose tracking
A visibility constrained criterion for
correspondence-free and occlusion-aware rigid facial pose estimation
A generative multilinear face model
models both the identity and expression
facilitates the online face model personalization without the interference caused by the expression variations
Probabilistic 3D Face Parameterization
Multilinear Face Model
Unifies the representations of identity and expression
Models the face dataset as a 3D tensor
Decomposes it by higher-order singular value decomposition (HOSVD)
Any face can be reconstructed as
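A multilinear reconstruction of this kind can be sketched with NumPy. The reconstruction formula on the slide is not recoverable from this transcript, so the sketch uses the standard form: a core tensor contracted with an identity weight vector and an expression weight vector. The tensor layout (vertices × identities × expressions) and all names are assumptions.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows indexed by the chosen mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd_face_model(data, n_id, n_exp):
    """Truncated HOSVD of a face tensor of shape (3V, identities, expressions)."""
    u_id = np.linalg.svd(unfold(data, 1), full_matrices=False)[0][:, :n_id]
    u_exp = np.linalg.svd(unfold(data, 2), full_matrices=False)[0][:, :n_exp]
    # Core tensor: project the identity and expression modes onto their bases
    core = np.einsum("vie,ia,eb->vab", data, u_id, u_exp)
    return core, u_id, u_exp

def reconstruct(core, w_id, w_exp):
    """Face vector for identity weights w_id and expression weights w_exp."""
    return np.einsum("vab,a,b->v", core, w_id, w_exp)
```

Using row `i` of `u_id` and row `e` of `u_exp` as weights reproduces training face `(i, e)` exactly when no truncation is applied; other weight vectors generate new identity and expression combinations.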
Probabilistic 3D Face Parameterization
Generative models for face modeling
Model the uncertainties of the shape, identity, and expression
Feasible to simulate and predict the face identity and expression
Enable group-wise rigid facial pose estimation suitable for any face
The generative face model can be learned from a training dataset
FaceWarehouse Dataset
150 identities, 47 expressions
Different ages, genders, races …
Its diversity lets the learned face model cover most common identities and expressions
Probabilistic 3D Face Parameterization
Identity and Expression Priors
Multilinear Gaussian Face Model
Learned from the FaceWarehouse dataset together with the core tensor
[Figure: (a) mean face; (b), (c), (d) variance maps, color scale in mm]
Probabilistic Facial Pose Tracking
Rigid Pose Tracking
Identity Adaptation
Input
Output
Identity distribution
Pose Parameters Face Model
Robust Facial Pose Estimation
Transform a canonical face model to match the input point cloud
The warped face model has the distribution
[Figure: face model in canonical coordinates mapped to the input point cloud by scale, rotation, and translation]
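How a similarity transform acts on a Gaussian face model can be sketched as below. The exact distribution on the slide is not recoverable here; the sketch only uses the standard fact that a linear map y = s·R·x + t sends a Gaussian with mean μ and covariance Σ to one with mean s·R·μ + t and covariance s²·R·Σ·Rᵀ. Function names and the single-axis rotation helper are illustrative.

```python
import numpy as np

def rotation_y(angle):
    """Rotation about the y axis (a yaw-like rotation; axis convention assumed)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def warp_gaussian_points(mean, cov, scale, rotation, translation):
    """Warp per-point Gaussians: mean (N, 3), cov (N, 3, 3)."""
    warped_mean = scale * mean @ rotation.T + translation
    warped_cov = scale ** 2 * rotation @ cov @ rotation.T   # broadcasts over N
    return warped_mean, warped_cov
```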
Ray Visibility Constraint
Occlusions are inevitable in uncontrolled scenarios
Occluded faces are always behind the occluding objects, such as hair, fingers/gestures, glasses, and accessories
Self-occlusion
Occluded by hair
Occluded by hand/gesture
Occluded by accessories
Ray Visibility Constraint
If correctly aligned
the visible face model points are those that overlap with the input point cloud
the remaining face model points should always be occluded by the input point cloud
(a) Case-I: face point is visible (b) Case-II: face point is occluded (c) Case-III: should be prevented
Ray Visibility Constraint
Connect point pair along a ray
their distance along the surface of the input data
The distribution of one face model point is mapped along the surface normal direction
The face model point is visible
The face model point is occluded
[Figure labels: visible, occluded, face distribution, line of sight, camera]
Ray Visibility Constraint
Ray Visibility Score
Measures the compatibility between the distributions of the face model and the input point cloud
Applies the Kullback-Leibler Divergence
data distribution
projected model distribution
The minimization of the ray visibility score results in the optimal compatibility between these two distributions
Solver: quasi-Newton method, further refined by particle swarm optimization
Occlusions receive constant penalties
Visible points punish the misalignment & model uncertainties
More robust than ICP-based cost functions
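The behavior described above (KL-based terms for visible points, constant penalties for occluded ones) can be illustrated with a toy score along each ray. This is not the thesis formulation: it assumes 1D Gaussians for both the data and the projected model along the line of sight, and the names `visibility_score`, `occlusion_penalty`, and `margin` are hypothetical.

```python
import numpy as np

def kl_gaussian_1d(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between two 1D Gaussians, in closed form."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def visibility_score(data_depth, data_var, model_depth, model_var,
                     occlusion_penalty=1.0, margin=0.01):
    """Toy per-ray score: KL for visible model points, constant for occluded ones."""
    # A model point clearly behind the observed surface is treated as occluded
    occluded = model_depth > data_depth + margin
    kl = kl_gaussian_1d(data_depth, data_var, model_depth, model_var)
    return float(np.where(occluded, occlusion_penalty, kl).sum())
```

Because occluded points contribute a constant, they do not pull the pose toward the occluder, whereas the KL term grows with both misalignment and mismatched uncertainty on visible points.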
Robust Facial Pose Estimation
Result comparison with the generic face model
(a) Color image (b) Point cloud (c) Initial alignment
(d) ICP (e) RVC + ML (f) RVS (g) RVS + PSO
Robust Facial Pose Estimation
More results with the generic face model
(a) Color image (b) Point cloud (c) Initial alignment (d) Ours
no explicit correspondences
handle occlusions even with a poor initial pose
less vulnerable to bad local minima
PSO increases the robustness
Online Identity Adaptation
Variational Approximation
The face model is identified by the identity distribution
It can be estimated online through assumed density filtering (ADF)
The data likelihood
A mixture distribution encoding the model and outliers
The model fitting function is robust to quantization with a modified projection distance
The variance of identity is enlarged per frame to prevent overfitting
Online Identity Adaptation
(a)
(b)
(c)
Results of online model adaptation
Experimental Results & Discussions
Experiments on public depth-based facial pose datasets
Biwi dataset ICT-3DHP dataset
Dataset | 𝑵𝒔𝒆𝒒 | 𝑵𝒇𝒓𝒎 | 𝑵𝒔𝒖𝒃𝒋 | occlusions | expressions | 𝝎𝒎𝒂𝒙
Biwi | 24 | ~15K | 25 | accessories, hair | neutral ~ slight | ±75° yaw, ±60° pitch
ICT-3DHP | 10 | ~14K | 10 | accessories, hair | slight ~ exaggerated | ±75° yaw, ±45° pitch
Experimental Results & Discussions
Robust to profiled faces due to large rotations and to occlusions from hair and accessories.
[Figure labels: profiled face; profiled face, occlusions; occlusions; expressions; profiled face, occlusions]
Experimental Results & Discussions
The proposed system is also effective under expression variations
Ray visibility constraint
efficiently infers the occlusions against the face model
optimizes the visible face area against the occlusions
Personalized face model
enables compact fitting
robust to changes in the personalized expressions
Experimental Results & Discussions
Adaptation between different users
Three different identities are presented in three adjacent frames
Experimental Results & Discussions
Comparison with the state of the art
Biwi dataset (errors)
Method | Yaw (deg) | Pitch (deg) | Roll (deg) | Trans (mm)
Ours | 2.3 | 2.0 | 1.9 | 6.9
RF | 8.9 | 8.5 | 7.9 | 14.0
Martin | 3.6 | 2.5 | 2.6 | 5.8
CLM-Z | 14.8 | 12.0 | 23.3 | 16.7
TSP | 3.9 | 3.0 | 2.5 | 8.4
PSO | 11.1 | 6.6 | 6.7 | 13.8
Meyer | 2.1 | 2.1 | 2.4 | 5.9
Li* | 2.2 | 1.7 | 3.2 | -
ICT-3DHP dataset (errors)
Method | Yaw (deg) | Pitch (deg) | Roll (deg)
Ours | 3.4 | 3.2 | 3.3
RF | 7.2 | 9.4 | 7.5
CLM-Z | 6.9 | 7.1 | 10.5
Li* | 3.3 | 3.1 | 2.9
*This method is based on RGB-D data
Discriminative: RF
Model fitting: CLM-Z, PSO, Martin et al., Meyer et al.
Feature-based: TSP
RGB-D: Li*
Conclusion
Conclusions
Hybrid Geometric Hole Filling Strategy for Spatial Enhancement
• Hybrid hole filling merging the interpolation and parametric structure propagation
• A novel texture-constrained patch matching method for a robust structure inference
Weighted Structure Filters Based on Parametric Structural Decomposition
• An efficient distribution estimation that is adaptive to local image structure
• Accelerating joint weighted filters without structural distortions
Conclusions
Spatiotemporal Enhancement based on Static Structure
• Robust temporally consistent depth enhancement based on a probabilistic static structure of the captured scene
• The dynamic content is enhanced spatially while the static region favors a long-range spatiotemporal optimization
A Generative Model for Robust 3D Facial Pose Tracking
• A robust depth-based facial pose tracking system with an adaptive face model personalization
• The multilinear generative face model and the visibility-constrained rigid pose estimation improve the robustness
Publications
Lu Sheng, King Ngi Ngan, Chern-Loon Lim and Songnan Li, Online Temporally Consistent Indoor Depth Video Enhancement via Static Structure, TIP, 2015.
Songnan Li, King Ngi Ngan, Raveendran Paramesran and Lu Sheng, Real-time Head Pose Tracking with Online Face Template Reconstruction, TPAMI, 2016.
Lu Sheng, Tak-Wai Hui and King Ngi Ngan, Accelerating the Distribution Estimation for the Weighted Median/Mode Filters, ACCV, 2014.
Lu Sheng, Songnan Li and King Ngi Ngan, Temporal Depth Video Enhancement Based On Intrinsic Static Structure, ICIP, 2014.
Lu Sheng, King Ngi Ngan and Songnan Li, Depth Enhancement Based On Hybrid Geometric Hole Filling Strategy, ICIP, 2013.
Chi Ho Cheung, Lu Sheng and King Ngi Ngan, A disocclusion filling method using multiple sprites with depth for virtual view synthesis, ICMEW, 2015.
Songnan Li, King Ngi Ngan and Lu Sheng, Screen-camera Calibration Using a Thread, ICIP, 2014.
Songnan Li, King Ngi Ngan and Lu Sheng, A Head Pose Tracking System Using RGB-D Camera, ICVS, 2013.
Lu Sheng, Jianfei Cai and King Ngi Ngan, A Generative Model for Robust 3D Facial Pose Tracking, TIP, in preparation.
Lu Sheng and King Ngi Ngan, Weighted Structural Prior for Structure-preserving Image and Video Applications, TIP, in preparation.
Thanks to
My supervisor Prof. King Ngi Ngan
Prof. Jianfei Cai
Committee members Prof. Wai Kuen Cham, Prof. Thierry Blu,
and Prof. Kwanghoon Sohn
My lovely IVP labmates
& My sweet family!
Depth Propagation under Segment Constraint
Cost function construction
Randomly select 𝑘 sub-patches in each patch
Estimate similarity between two sub-patches
Calculate the cost of the 𝑗-th sub-patch of 𝑢 with 𝑣, and find the patch 𝑣∗ with the minimum cost
Form a histogram indicating the number of sub-patches in 𝑢 that match 𝑣
With the spatial constraint added, the cost is
Gaussian Models for the Local Structures
Kernel Specification
Distribution is a mixture of Gaussian models
Constant time filter: Guided image filter [1], Domain transform filter [2]
[1] K. He et al., ECCV 2010
[2] E. Gastal and M. Oliveira, ACM ToG 2011
Noise variance
Gaussian Models for the Local Structures
Noise std
Online Update Scheme
Variational Parameter Estimation
Factorize the posterior into independent Gaussian and Dirichlet distributions
The reliability of the model
The probable depth is
The posterior can be approximated by
Recursive estimation is possible!
Moment matching to estimate the hyperparameters
Closed-form solutions!