Towards Weakly-Supervised Visual Understanding


Page 1: Towards Weakly-Supervised Visual Understanding

Zhiding Yu Learning & Perception Research, NVIDIA

[email protected]

Towards Weakly-Supervised Visual Understanding

Page 2:

Introduction

Page 3:

The Benefit of Big Data and Computation Power

Figure credit: Kaiming He et al., Deep Residual Learning for Image Recognition, CVPR16

Page 4:

Beyond Supervised Learning

“The revolution will not be supervised!”

— Alyosha Efros

▪ Reinforcement Learning (cherry)
▪ Supervised Learning (icing)
▪ Unsupervised Learning (cake)

“If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake.” — Yann LeCun

Page 5:

Weakly-Supervised Learning

Image credit: https://firstbook.org/blog/2016/03/11/teaching-much-more-than-basic-concepts/

From a Research Perspective

▪ Similar to how humans learn to understand the world
▪ Good support for “continuous learning”

From an Application Perspective

▪ A good middle ground between unsupervised and supervised learning
▪ Potential to accommodate labels in diverse forms
▪ Scalable to much larger amounts of data

Page 6:

Weakly-Supervised Learning

WSL:

▪ Incomplete supervision
▪ Inexact supervision
▪ Inaccurate supervision

Page 7:

Weakly-Supervised Learning

▪ Inaccurate supervision: wrong/misaligned labels; ambiguities; noisy-label learning
▪ Inexact supervision: seg/det with classification labels/bounding boxes/points; multiple instance learning; attention models
▪ Incomplete supervision: semi-supervised learning; teacher-student models; domain adaptation

Supporting signals: self-supervision, meta-supervision, structured info, domain priors, normalization

Eliminating the intrinsic uncertainty in WSL is the key!
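Multiple instance learning, one of the tools listed above for inexact supervision, trains an instance scorer from bag-level (e.g. image-level) labels by pooling instance scores. The sketch below is a generic illustration of that pooling idea, not the method of any particular slide; the scores and the sharpness parameter `r` are made up for the example.

```python
import numpy as np

def bag_score(instance_scores, pooling="lse", r=5.0):
    """Aggregate per-instance scores into a single bag-level score.

    Only the bag (image) label is observed, so the instance scorer is
    trained through the pooling function: "max" credits the single most
    confident instance; log-sum-exp (LSE) is a smooth approximation of
    max that spreads gradient over all instances.
    """
    s = np.asarray(instance_scores, dtype=float)
    if pooling == "max":
        return float(s.max())
    # LSE pooling: (1/r) * log(mean(exp(r * s))); approaches max as r grows.
    return float(np.log(np.mean(np.exp(r * s))) / r)

# Region-proposal scores for one image: the image-level label says the
# object is present somewhere, but not in which region.
scores = [0.1, 0.05, 0.9, 0.2]
print(bag_score(scores, "max"))   # 0.9
print(bag_score(scores, "lse"))   # a smooth score between the mean and the max
```

Max pooling gives the sharpest localization signal but back-propagates through only one instance per bag; LSE trades some sharpness for denser gradients.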

Page 8:

Learning with Inaccurate Supervision

Page 9:

Category-Aware Semantic Edge Detection

Original Image

Semantic Edges Category-Aware Semantic Edges

Perceptual Edges

Page 10:

Category-Aware Semantic Edge Detection

Zhiding Yu et al., CASENet: Deep Category-Aware Semantic Edge Detection, CVPR17

Saining Xie et al., Holistically-Nested Edge Detection, ICCV15

Page 11:

Human Annotations Can Be Noisy!

Image credit: Microsoft COCO: Common Objects in Context (http://cocodataset.org)

Page 12:

Motivations of This Work

▪ Automatic edge alignment
▪ Producing high-quality sharp/crisp edges during testing
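As a toy illustration of the alignment motivation, and not the SEAL formulation itself (which couples alignment and learning through a bipartite assignment), one can greedily snap each annotated edge pixel to the most confident nearby prediction, with a Gaussian distance penalty so closer pixels are preferred. The function name, `radius`, and `sigma` are invented for this sketch.

```python
import numpy as np

def snap_edge_labels(edge_points, prob_map, radius=2, sigma=1.0):
    """Snap each (possibly misaligned) annotated edge pixel to the most
    confident predicted edge pixel nearby, preferring closer pixels via
    a Gaussian distance penalty on the predicted probability."""
    h, w = prob_map.shape
    aligned = []
    for (y, x) in edge_points:
        best, best_score = (y, x), 0.0  # keep the label if nothing nearby beats it
        for yy in range(max(0, y - radius), min(h, y + radius + 1)):
            for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                d2 = (yy - y) ** 2 + (xx - x) ** 2
                score = prob_map[yy, xx] * np.exp(-d2 / (2.0 * sigma ** 2))
                if score > best_score:
                    best, best_score = (yy, xx), score
        aligned.append(best)
    return aligned

# The network predicts a sharp vertical edge at column 2; annotations that
# drifted off it get pulled back onto the edge.
prob = np.zeros((5, 5))
prob[:, 2] = 0.95
print(snap_edge_labels([(1, 3), (3, 1)], prob))  # [(1, 2), (3, 2)]
```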

Page 13:

The Proposed Learning Framework

Zhiding Yu et al., Simultaneous Edge Alignment and Learning, ECCV18

Page 14:

Learning and Optimization

Page 15:

Experiment: Qualitative Results (SBD)

(Columns: original image, ground truth, CASENet, SEAL)

Page 16:

Experiment: Qualitative Results (Cityscapes)

(Columns: original image, ground truth, CASENet, SEAL)

Page 17:

Page 18:

Page 19:

SBD Test Set Re-Annotation

Page 20:

Experiment: Quantitative Results

Page 21:

Experiment: Automatic Label Refinement

Alignment on Cityscapes (red: before alignment, blue: after alignment)

(Columns: original GT, SEAL)

Page 22:

Learning with Incomplete Supervision

Page 23:

Obtaining Per-Pixel Dense Labels is Hard

Real applications often require model robustness over scenes with large diversity

▪ Different cities, different weather, different views
▪ Large-scale annotated image data is beneficial

Annotating a large-scale real-world image dataset is expensive

▪ Cityscapes dataset: 90 minutes per image on average

Page 24:

Use Synthetic Data to Obtain Infinite GTs?

(Figure: original image from GTA5 with ground truth from the game engine; original image from Cityscapes with human-annotated ground truth)

Page 25:

Drop of Performance Due to Domain Gaps

(Columns: Cityscapes images, model trained on Cityscapes, model trained on GTA5)

Page 26:

Unsupervised Domain Adaptation

Page 27:

Domain Adaptation via Deep Self-Training

Yang Zou*, Zhiding Yu*, et al., Unsupervised Domain Adaptation for Semantic Segmentation via Class-Balanced Self-Training, ECCV18

Page 28:

Preliminaries and Definitions

Page 29:

Self-Training (ST) with Self-Paced Learning
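In spirit, self-training alternates between pseudo-labeling confident target predictions and retraining on them; the self-paced aspect schedules how many samples enter each round (for example by relaxing the threshold or growing the kept fraction over rounds). Below is a minimal sketch of the selection step only, with an illustrative threshold, not the exact formulation from the paper.

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    """Keep the argmax class as a pseudo-label wherever the model is
    confident enough; mark everything else as ignored (-1).

    probs: (N, C) softmax outputs on unlabeled target samples/pixels.
    """
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    labels[conf < threshold] = -1  # ignored in the next training round
    return labels

probs = np.array([[0.95, 0.05],   # confident class 0 -> kept
                  [0.60, 0.40],   # uncertain         -> ignored
                  [0.08, 0.92]])  # confident class 1 -> kept
print(select_pseudo_labels(probs))  # [ 0 -1  1]
```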

Page 30:

Class-Balanced Self-Training
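A single global confidence threshold lets easy, frequent classes dominate the pseudo-label pool. The class-balanced variant instead picks one threshold per class, for example so that roughly the top-p fraction of each class's predictions survive. This is a simplified sketch of that idea: in the paper the kept portion p grows over self-paced rounds, and the exact selection differs in details.

```python
import numpy as np

def class_balanced_thresholds(probs, p=0.5):
    """One confidence threshold per class, set so that roughly the top-p
    fraction of the samples predicted as that class are kept. This stops
    frequent, easy classes (road, sky) from crowding out rare ones
    (pole, traffic sign) in the pseudo-label pool."""
    conf, pred = probs.max(axis=1), probs.argmax(axis=1)
    thresholds = np.ones(probs.shape[1])  # classes never predicted select nothing
    for c in range(probs.shape[1]):
        c_conf = np.sort(conf[pred == c])[::-1]  # descending confidence
        if len(c_conf):
            k = max(1, int(round(p * len(c_conf))))
            thresholds[c] = c_conf[k - 1]
    return thresholds

def select_cbst_labels(probs, p=0.5):
    thr = class_balanced_thresholds(probs, p)
    conf, pred = probs.max(axis=1), probs.argmax(axis=1)
    pred[conf < thr[pred]] = -1  # -1 = ignored
    return pred

# Class 0 is "easy" (high confidences); class 1 is "hard". A global 0.9
# threshold would keep no class-1 samples at all; per-class thresholds
# keep the best half of each class.
probs = np.array([[0.99, 0.01], [0.95, 0.05], [0.90, 0.10],
                  [0.40, 0.60], [0.45, 0.55]])
print(select_cbst_labels(probs, p=0.5))  # [ 0  0 -1  1 -1]
```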

Page 31:

Self-Paced Learning Policy Design

Page 32:

Incorporating Spatial Priors
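In driving scenes, class layout is strongly location-dependent (road at the bottom, sky at the top), and those statistics can be counted from source-domain labels and used to weight target predictions before pseudo-label selection. The numpy sketch below is a simplification: CBST-SP smooths the frequency counts with a Gaussian kernel, for which plain additive smoothing stands in here.

```python
import numpy as np

def spatial_prior(label_maps, n_classes, smooth=1.0):
    """Per-location class frequencies counted over source-domain label
    maps of shape (N, H, W), normalized into a prior q[c, y, x]."""
    _, h, w = label_maps.shape
    counts = np.full((n_classes, h, w), smooth)
    for c in range(n_classes):
        counts[c] += (label_maps == c).sum(axis=0)
    return counts / counts.sum(axis=0, keepdims=True)

def prior_weighted_probs(probs, prior):
    """Weight softmax outputs p[c, y, x] by the spatial prior, then
    renormalize over classes, before selecting pseudo-labels."""
    weighted = probs * prior
    return weighted / weighted.sum(axis=0, keepdims=True)

# Two tiny source label maps where class 1 ("road") always fills the
# bottom row of the image.
labels = np.array([[[0, 0], [1, 1]],
                   [[0, 0], [1, 1]]])
prior = spatial_prior(labels, n_classes=2)
# A completely ambiguous 0.5/0.5 target prediction is tipped toward the
# spatially likely class at each location.
ambiguous = np.full((2, 2, 2), 0.5)
print(prior_weighted_probs(ambiguous, prior).argmax(axis=0))
```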

Page 33:

Experiment: GTA to Cityscapes

(Columns: original image, ground truth, source model, CBST-SP)

Page 34:

Experiment: GTA to Cityscapes

Page 35:

Learning with Inexact Supervision

Page 36:

Learning Instance Det/Seg with Image-Level Labels

Previous Method (WSDDN)

Our Proposed Method

Work in progress with Zhongzheng Ren, Xiaodong Yang, Ming-Yu Liu, Jan Kautz, et al.

Page 37:

Conclusions and Future Work

Page 38:

Conclusions and Future Work

Conclusions

▪ WSL methods are useful in a wide range of tasks, such as Autonomous Driving, IVA, AI City, Robotics, Annotation, Web Video Analysis, Cloud Services, Advertisements, etc.
▪ Impact from a fundamental research perspective towards achieving AGI.

Future work

▪ A good WSL platform that can handle a variety of weak grounding signals and tasks.
▪ Models with better-designed self-supervision/meta-supervision/structured info/priors/normalization.
▪ Large-scale weakly-supervised and unsupervised learning from videos.
▪ Combining weak grounding signals with robotics and reinforcement learning.

Page 39:

Thank You!