![Page 1: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/1.jpg)
Challenges for Deep Scene Understanding
Bolei Zhou
MIT
Hang Zhao
Sanja Fidler (UToronto)
Adela Barriuso
Antonio Torralba
Aditya Khosla
Aude Oliva
Xavier Puig
Bolei Zhou
![Page 2: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/2.jpg)
Objects in the Scene Context
![Page 3: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/3.jpg)
Challenge 1: Scene Classification
Top1: street
Top2: residential neighborhood
Top3: crosswalk
Top4: apartment building
Top5: office building
Challenge 2: Scene Parsing
objects, stuff
Deep scene understanding
![Page 4: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/4.jpg)
• 8 million training images from 365 categories of the Places Database
• Test set: 900 images per category
Webpage: http://places2.csail.mit.edu
![Page 5: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/5.jpg)
Constructing Places Database
1. Collect scene names from dictionary: ~1000 scene names
2. Query and download images: 696 adjectives + scene names, ~90 million raw images downloaded
3. Annotate through Amazon Mechanical Turk: three rounds of annotations
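The query step can be sketched as pairing every adjective with every scene name. This is only an illustration: the function name `build_queries` and the sample words are assumptions made for the example, not the organizers' actual tooling or word lists.

```python
from itertools import product


def build_queries(adjectives, scene_names):
    """Return every 'adjective scene' combination as a search query string."""
    return [f"{adj} {scene}" for adj, scene in product(adjectives, scene_names)]


# Hypothetical sample words; the slides mention 696 adjectives and ~1000 scene names.
sample_adjectives = ["sunny", "messy"]
sample_scenes = ["bedroom", "street"]
queries = build_queries(sample_adjectives, sample_scenes)
print(queries)
```

With the full word lists this would yield on the order of 696 × 1000 queries, which is why a later annotation stage is needed to filter the raw downloads.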
![Page 6: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/6.jpg)
Urban: amusement park, arch, corral, windmill, train station platform, tower, swimming pool, street, soccer field
Indoor: elevator door, bar, cafeteria, veterinarians office, bedroom, conference center, staircase, shoe shop
Nature: field road, fishpond, watering hole, rainforest
![Page 7: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/7.jpg)
Results
92 valid submissions from 27 teams (each team is allowed to submit at most 5 submissions).
[Figure: top-5 errors of all 92 submissions (sorted); y-axis 6% to 20%, x-axis 0 to 100]
Baseline, single ResNet152: 14.9%
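For reference, top-5 error counts an image as wrong only when the true label is missing from the model's five highest-ranked guesses. The sketch below uses toy data and a hypothetical function name `top5_error`; it is not the challenge's evaluation script.

```python
def top5_error(ranked_predictions, ground_truth):
    """Fraction of images whose true label is not among the first five guesses."""
    if len(ranked_predictions) != len(ground_truth):
        raise ValueError("need one prediction list per ground-truth label")
    misses = sum(
        1
        for guesses, label in zip(ranked_predictions, ground_truth)
        if label not in guesses[:5]
    )
    return misses / len(ground_truth)


# Toy data: each list is ranked from most to least confident.
predictions = [
    ["street", "residential neighborhood", "crosswalk", "apartment building", "office building"],
    ["restaurant", "ice cream parlor", "coffee shop", "pizzeria", "cafeteria"],
]
labels = ["crosswalk", "aquarium"]
print(top5_error(predictions, labels))  # 0.5
```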
![Page 8: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/8.jpg)
Results
| Team Name | Top-5 Error |
|---|---|
| Hikvision | 9.01% |
| MW | 10.19% |
| Trimps-Soushen | 10.30% |
| SIAT_MMLAB | 10.43% |
| NTU-SC | 10.85% |

Single model baselines:

| Model | Top-5 Error |
|---|---|
| ResNet152 | 14.93% |
| VGG16 | 14.99% |
| AlexNet | 17.25% |

92 valid submissions from 27 teams.

Hikvision: Qiaoyong Zhong, Chao Li, Yingying Zhang, Haiming Sun, Shicai Yang, Di Xie, Shiliang Pu. Hikvision Research Institute.
MW: Gang Sun and Jie Hu. Chinese Academy of Sciences and Peking University.
Trimps-Soushen: Jie Shao, Xiaoteng Zhang, Zhengyan Ding, Yixin Zhao, Yanjun Chen, Jianying Zhou, Wenfei Wang, Lin Mei, Chuanping Hu. The Third Research Institute of the Ministry of Public Security, China.
![Page 9: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/9.jpg)
Ambiguous predictions
1) Unusual activity in a scene 2) Multiple scene parts
top-1: restaurant
top-2: ice cream parlor
top-3: coffee shop
top-4: pizzeria
top-5: cafeteria
aquarium
top-1: campsite
top-2: sandbox
top-3: beer garden
top-4: market outdoor
top-5: flea market indoor
junkyard
top-1: balcony interior
top-2: beach house
top-3: boardwalk
top-4: roof garden
top-5: restaurant patio
lagoon
top-1: martial arts gym
top-2: stable
top-3: boxing ring
top-4: locker room
top-5: basketball court
construction site
![Page 10: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/10.jpg)
• New challenge this year
• Each pixel of the image is classified into some class
[Figure: scene parsing; labels: class label, semantic mask]
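As a minimal illustration of per-pixel classification, the sketch below assumes a model that outputs one score per class for every pixel and picks the highest-scoring class. The function name `parse_scene`, the tiny 2x2 image, and the three class ids are all hypothetical.

```python
def parse_scene(scores):
    """scores[y][x] is a list of per-class scores; return the best class index per pixel."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# Hypothetical 2x2 image with three classes (0 = sky, 1 = road, 2 = car).
scores = [
    [[0.9, 0.05, 0.05], [0.7, 0.2, 0.1]],
    [[0.1, 0.8, 0.1], [0.1, 0.3, 0.6]],
]
mask = parse_scene(scores)
print(mask)  # [[0, 0], [1, 2]]
```

The resulting grid of class indices is what the slide calls the semantic mask.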
![Page 11: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/11.jpg)
• 22,000 images for training and validation, 3,000 images for testing
• 150 classes of objects (car, person, table, etc.) and stuff (sky, road, ceiling, etc.)
[Figure: pixel frequency in the training set, per class; axis from 0 to 0.18]
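Pixel frequency here can be read as the share of all annotated pixels that carry a given class label. A small sketch under that assumption, with a hypothetical function name `pixel_frequency` and made-up masks:

```python
from collections import Counter


def pixel_frequency(masks):
    """Return {class_label: share of all pixels} over a list of 2D label masks."""
    counts = Counter(label for mask in masks for row in mask for label in row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Two hypothetical 2x2 masks.
masks = [
    [["sky", "sky"], ["road", "car"]],
    [["sky", "wall"], ["road", "road"]],
]
freq = pixel_frequency(masks)
print(sorted(freq.items(), key=lambda kv: -kv[1]))
```

In this toy data "sky" and "road" each cover 3 of 8 pixels, which mirrors the long-tailed distribution where a few stuff classes dominate.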
![Page 12: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/12.jpg)
![Page 13: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/13.jpg)
Constructing ADE Dataset
• Annotating each object instance in a scene
• A single expert annotator, over a few years of work
http://groups.csail.mit.edu/vision/datasets/ADE20K/
LabelMe Annotation Tool
Ms. Adela Barriuso
![Page 14: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/14.jpg)
![Page 15: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/15.jpg)
![Page 16: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/16.jpg)
![Page 17: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/17.jpg)
![Page 18: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/18.jpg)
![Page 19: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/19.jpg)
![Page 20: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/20.jpg)
![Page 21: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/21.jpg)
![Page 22: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/22.jpg)
Results
75 valid submissions from 22 teams
[Figure: final score = (mean IoU + pixel accuracy) / 2 for all 75 submissions; y-axis 0.3 to 0.6, x-axis submissions 1 to 75]
Baseline (VGG-based): 0.4567
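The final score formula can be worked through on a toy example. The sketch below computes both terms on a single pair of masks. The helper names are hypothetical, and details such as accumulating over the whole test set or ignoring unlabeled pixels are not stated on the slides, so they are left out.

```python
def _pairs(pred, truth):
    """Flatten two equally sized 2D masks into (predicted, true) label pairs."""
    return [(p, t) for prow, trow in zip(pred, truth) for p, t in zip(prow, trow)]


def pixel_accuracy(pred, truth):
    """Share of pixels whose predicted label equals the ground-truth label."""
    pairs = _pairs(pred, truth)
    return sum(p == t for p, t in pairs) / len(pairs)


def mean_iou(pred, truth):
    """Average intersection-over-union across classes present in pred or truth."""
    pairs = _pairs(pred, truth)
    classes = {c for pair in pairs for c in pair}
    ious = []
    for c in classes:
        inter = sum(p == c and t == c for p, t in pairs)
        union = sum(p == c or t == c for p, t in pairs)
        ious.append(inter / union)
    return sum(ious) / len(ious)


def final_score(pred, truth):
    """Final score = (mean IoU + pixel accuracy) / 2."""
    return (mean_iou(pred, truth) + pixel_accuracy(pred, truth)) / 2


truth = [[0, 0], [1, 1]]
pred = [[0, 1], [1, 1]]
print(pixel_accuracy(pred, truth), mean_iou(pred, truth), final_score(pred, truth))
```

Here pixel accuracy is 3/4, the per-class IoUs are 1/2 and 2/3 (mean 7/12), so the final score is 2/3.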
![Page 23: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/23.jpg)
Results
| Team Name | Final Score |
|---|---|
| SenseCUSceneParsing | 0.5721 |
| Adelaide | 0.5674 |
| 360+MCG-ICT-CAS_SP | 0.5556 |
| SegModel | 0.5465 |
| CASIA_IVA | 0.5433 |

Final score = (mean IoU + pixel accuracy) / 2

Single model baselines:

| Model | Final Score |
|---|---|
| DilatedNet | 0.4567 |
| FCN-8s | 0.4480 |
| SegNet | 0.4079 |

75 valid submissions from 22 teams

SenseCUSceneParsing: Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Tong Xiao, Jiaya Jia. Sensetime and CUHK, Hong Kong.
Adelaide: Zifeng Wu, Chunhua Shen, Anton van den Hengel. University of Adelaide, Australia.
360+MCG-ICT-CAS_SP: Rui Zhang, Min Lin, Sheng Tang, Yu Li, YunPeng Chen, YongDong Zhang, JinTao Li, YuGang Han, ShuiCheng Yan. Qihoo 360, Multimedia Computing Group, Institute of Computing Technology, Chinese Academy of Sciences (MCG-ICT-CAS), National University of Singapore (NUS).
![Page 24: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/24.jpg)
Data Consistency and Human Performance
• 61 images from the val set were re-annotated after 6 months. 82.4% of pixels got the same label.
[Figure: bar chart of pixel accuracy, y-axis 50% to 85%; bar values: 64.03%, 64.77%, 65.41%, 73.67%, 74.49%, 74.73%, 82.40%]
![Page 25: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/25.jpg)
Data Consistency and Human Performance
[Figure panels: average image; annotation mode (regions labeled floor, sky, wall, building)]
20.30% pixel accuracy
![Page 26: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/26.jpg)
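[Figure: image, ground truth, and predictions of the top five teams; the value shown with each prediction is listed below, one row per example image]

| Example | SenseCU… | Adelaide | 360+MCG… | SegModel | CASIA_IVA |
|---|---|---|---|---|---|
| 1 | 96.55% | 96.36% | 96.27% | 91.96% | 91.04% |
| 2 | 94.14% | 95.63% | 96.55% | 94.19% | 89.59% |
| 3 | 93.71% | 93.51% | 94.84% | 94.89% | 94.54% |
| 4 | 92.73% | 91.88% | 93.02% | 92.62% | 91.47% |
| 5 | 92.25% | 85.66% | 90.47% | 79.81% | 84.85% |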
![Page 27: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/27.jpg)
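[Figure: image, ground truth, and predictions of the top five teams; the value shown with each prediction is listed below, one row per example image. Labels shown on the slide: grandstand, boat, car, booth]

| Example | SenseCU… | Adelaide | 360+MCG… | SegModel | CASIA_IVA |
|---|---|---|---|---|---|
| 1 | 62.89% | 51.23% | 50.25% | 45.66% | 67.91% |
| 2 | 56.19% | 42.86% | 55.82% | 81.53% | 48.69% |
| 3 | 54.08% | 52.81% | 53.70% | 50.49% | 38.97% |
| 4 | 41.49% | 40.72% | 49.07% | 42.28% | 45.04% |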
![Page 28: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao](https://reader033.vdocuments.mx/reader033/viewer/2022051721/5a821a487f8b9a682c8db934/html5/thumbnails/28.jpg)
Thanks to all the participants and the audience!
Hang Zhao
Sanja Fidler (UToronto)
Adela Barriuso
Antonio Torralba
Aditya Khosla
Aude Oliva
Xavier Puig
Bolei Zhou
http://places2.csail.mit.edu
http://sceneparsing.csail.mit.edu