abstractdpfan.net/wp-content/uploads/cvpr19videosalbenchmark_v8... · 2020-02-21 · ?Ò. ˝. c°...
TRANSCRIPT
à�ÀªwÍ5ÔNuÿ
��²1 �©)2 §²²1 ∗ !ïX2,3
1 Hm�ÆO�ÅÆ� 2 å <ó�UïÄ�(IIAI) 3 �®nó�ÆO�ÅÆ�
http://dpfan.net/DAVSOD/
Abstract
L��c¥§<�éÀªwÍÔNuÿ£VSOD¤
�,�FÃßþ", §ïÄ.�Ï"����õ�!
äkp�þI5�ý¢Ä�|µe�VSODêâ8"�
d§·�°%�ï��Ú<Àú5¿å����!
È�I5�VSOD êâ8§§k226�Àª!2.4�v§
ºXØÓ�ý¢|µ!é�!¢~ÚÄ�"/Ï�
A�ý¢úÄ5À:ê⧷���°(�^rI
5"l Äg²(rNäk]Ô5�wwwÍÍÍ555===£££y
�§=Àª¥�wÍé��U¬Ä�UC"
�?�Ú�VSOD+�Jø���¡�µÿ§�
©37�yk�VSODêâ8Ú·��ï�DAVSODê
â8þ£o�4�v§�c5���¤XÚ/µ�
17��ä�L5�VSOD�{"|^n�²;��
I§·�¥y�¡ ���5U©Û"d§·�
�JÑ��ÄO�."§����¡�wÍ5
=£��á�PÁòÈ�ä£convLSTM¤§�ÏLÆ
S<a5¿å=£1�5k�/Ó¼ÀªÄ�wÍ
5"2��µÿ(J��.muÚ'�m81²�
cµ"¤k�wÍã!µÿóä�!±9êâ8�µ
https://github.com/DengPingFan/DAVSOD/"
1.ÚówÍ5ÔNuÿ£SOD¤�3l·�ã�½Ä�
Àª¥J��áÚ5¿å�ÔN"T?Ö u@�ï
Ä¥<a�Àú5¿1�§=<aÀúXÚ(HVS)¥�
��¯<Uå))U¯�/ò5¿å=£�Àú|
µ¥�ä&Eþ�«�"@c�ïÄ [6,39]®²½þ/
y¢3ù«wª�!é�-?�wÍ5�ä£é�w
∗§²²([email protected])´�©�ÏÕ�ö"
一群狮子正在地上玩耍
注意力转移
显著性转移
3
描述:
#显著对象数.: 4 2
视频类别.: 动物
对象类别.: 狮子
相机运动.: 慢
对象运动.: 慢
视频帧
注意视点
实例-级VSOD 标注
对象-级VSOD 标注
Figure 1: �©DAVSODêâ8�I5«~"¤�¹�´LI5X§wÍ5=£§é�/¢~-?VSOD^rI5§wÍ
é��ê8§|µ/é�aO±9�Å/é�$Ä�ª§ù
�VSOD?ÖJøj¢�Ä:¿¦��«d3A^ÉÃ"
Í5¤ÚÛª�Àú5¿å©�1�£Àú5¿Å�¤
�m�3p��'5"
<3*ý¢.�L§¥§Àú�Ä�5Ã
?Ø3"Ïd§ÀªwÍÔNuÿ£VSOD¤éun
)HVS�d3Ån�~��kÏuy¢¥�«A^
§S�uÐ"X§Àª©� [68]§Àªi4 [51]§À
ªØ [21, 23]§gÄf¨ [84]§<ÅpÄ [76]�"Ø
ÙÆâd�Ú¢S¿Â§duÀªêâ£�«$
Ä�ª!ñ!�!ÔN/C�¤g��]Ô±9
<a3Ä�|µ¥Àú5¿1�£ÀJ55¿©�§
5¿=£ [5, 31, 54]¤�k�E,5§Ñ¦VSOD¡�
Xã��(J"Ïd§ùAcÚå2��ïÄ,
� [7, 19, 25, 30, 32, 33, 55]£�L2¤"
, §äk�L5�VSODµÿ��ïE,î¢
�§ù�VSODï��%ÇuÐ/¤m²�é'"¦
+kA��éVSOD?ÖJÑ�êâ8 [29, 34, 37, 46,
50, 53, 69]§��3±e"�µÄk§<3Ä�èA�
L§¥§5¿å¬�XÀªSN�Cz kÀJ5/
Ä�©�] �ØÓ�Ü©"�´§±c�êâ8I
5�¿vkòÄ��<ú5À:êâ�Ä3S§ ´
1
Figure 2: DAVSODêâ8¥�Àª«~§Ù(Jd¢~-?^rI5�©�(JÚ5¿À:ã£me�¤U\ ¤"
��òÀª ©¤lÑ�·�v5I5§ÏdØU
�«<aÄ�*Ïmý¢�5¿1�"Ùg§§�
��*Ð5§CX��§õ�5ÚJÝÏ~É���"
l §ykêâ8�ù��³�T©|�?�Ú
uÐ"
�©�ü��zXeµÄk§·�;��VSOD?
Ö�ï���5��DAVSOD£È�I5�Àªw
Íé�uÿ¤êâ8"
• §�¹226�ÀªS�§î�/�âý¢�<a5
À:P¹£�ã.2¤5I5"���´§ÀJ5
5¿Ú5¿å=£ùü���Ä�5¿A5Ñ
��Ä�"3DAVSODêâ8¥§wÍé��U
¬3ØÓ��k¤UC£�ã.1¤§ù�ÎÜ¢S
�I�éÀªSNk����n)"ù�Ò�ï
��ÚÀú5¿å���5�VSODêâ8"
• d§ÀªS�ÏL°%çÀ�±ºXõ��|
µ/é�aO!$Ä�ª¿±ÅvÅ��°(/I
5£�2.4�v¤"
• DAVSOD�,��A:´Jøé�Ú¢~-?I
5±9{á�©i£ã"ùk|ur?�«d3
ïÄ���uЧX¢~-?VSOD§ÀªwÍé
�aê§ÄuwÍ5�Àªi4�"
Ùg§|^®ïá�DAVSODêâ8Ú�c
�7�VSODêâ8 [29,34,37,46,50,53,69]§·�é17«
�k?��. [8, 11, 29, 35, 38, 46, 47, 56, 61, 62, 64, 68–
70, 75, 80, 85]?1�¡�µÿ§¦Ù¤����
�VSODµÿ"d§·��JÑ��¶�SSAV
£¡�wÍ5=£�VSOD¤�Ä:�."§ÏL¦^
wÍ5=£a�convLSTM�¬5ÆS¿ýÿÀªwÍ
5§T�¬wª/�[<a3Ä�|µ¥�Àú5¿
å=£1�"þãµÿ(J�Ù/y²SSAV�.�
k�5"
�©�ü��z|¤�����µÿ²�"
����Ö¿5�µÿ§k§�±�\�\/
êâ8 c° #Vi. #AF. DL AS FP EF IL
SegV2 [34] 2013 14 1, 065 XFBMS [50] 2014 59 720
MCL [29] 2015 9 463ViSal [69] 2015 17 193
DAVIS [53] 2016 50 3, 455 XUVSD [46] 2017 18 3, 262 X
VOS [37] 2018 200 7, 467 X
DAVSOD 2019 226 23, 938 X X X X X
Table 1: DAVSODêâ8±9�cVSODêâ8�ÚOêâ"w,§DAVSODJø�\´L�I5" #Vi.: Àªêþ"#AF.: I5v�êþ"DL:´Ä´È�£Åv¤I5"AS:´
Ä�Ä5¿å=£"FP:wÍÔN�I5´Ä�â<ú5
À:"EF:´Ä�I5�wÍé�Jø<ú5À:"IL:´Ä
Jø¢~-?I5"
)VSOD?Ö¿r?�õ�ïÄó��Xù���u
Ð"
2.�'ó�
VSODêâ8. ùc§kA�êâ8�ïá½Ú\VSOD+�"L1�Ñùêâ8�ÚOêâ"Ù
¥§SegV2 [34]ÚFBMS [50]´ü�@Ï�æB�êâ
8"du§�´�A½8� �O�§ÏdØ�·
ÜVSOD?Ö",��MCL [29]êâ8=k9�{ü�
ÀªS�"ViSal [69]K´1��;��O�VSODê
â8§�¹17��k²wé��ÀªS�"�C§
Wang�< [70]rͶ�Àª©�êâ8DAVIS [53]Ú
\�VSOD¥§§d50�äk]Ô5�|µ|¤"¦
+þãêâ8lØÓ§Ýþr?VSOD�uЧ�
Ù5�£=A��Àª¤îÉ�"¿�ùêâ8
��ï��ÄÄ�|µ¥<aý¢�5¿À:§=
=ÏLA�I5ö±Ãó��ªÉä/éÑwÍÔ
N"I5L§¥Ø�ÄE,|µ¥�vm�SA5
´üvÕáI5"�C§����(200�Àª)5�
�VOS [37]êâ8Ü©/�Öþã��"�§�õ
�5ÚÊH5�~k�§du§¹k�þ{ü�¿S
2
?Ò. �. c° Ñ��. Ôöêþ Ôö8 Basic a. OF SP S-measure PCT �è
1 SIVM [56] 2010 ECCV CRF,ÚOþ T 0.481∼0.606 72.4* M&C++2 DCSM [30] 2011 TCSVT SORMål T 0.023* C++3 RDCM [41] 2013 TCSVT gabor§«�é' T X 9.8* N/A4 SPVM [47] 2014 TCSVT ���§��ã T X 0.470∼0.724 56.1* M&C++5 CDVM [15] 2014 TCSVT Ø � T 1.73* M6 TIMP [85] 2014 CVPR �mN� T X 0.539∼0.667 69.2* M&C++7 STUW [16] 2014 TIP ؽ\� T X 50.7* M8 EBSG [49] 2015 CVPR �ª©�n T X N/A9 SAGM [68] 2015 CVPR ÿ/ål T X X 0.615∼0.749 45.4* M&C++
10 ETPM [58] 2015 CVPR úÄJlk� T X N/A11 RWRV [29] 2015 TIP �Åir T 0.330∼0.595 18.3* M12 GFVM [69] 2015 TIP FÝ6 T X X 0.613∼0.757 53.7* M&C++13 MB+M [80] 2015 ICCV ��æNål T 0.552∼0.726 0.02* M&C++14 MSTM [64] 2016 CVPR ��)¤ä T 0.540∼ 0.657 0.02* M&C++15 SGSP [46] 2017 TCSVT ã§��ã T X X 0.557∼0.706 51.7* M&C++16 SFLR [8] 2017 TIP $���5 T X X 0.470∼0.724 119.4* M&C++17 STBP [75] 2017 TIP �µk� T X 0.533∼0.752 49.49* M&C++18 VSOP [22] 2017 TC é�ÿÀµ T X X M&C++19 DSR3 [32] 2017 BMVC 44 (6+8+30) clips 10C+S2+DV RCL [42] D Py&Ca20 VQCU [3] 2018 TMM 1̧ã(� T X 0.78* M21 CSGM [71] 2018 TCSVT ÀªéÜwÍ5 T X X 3.86* M&C++22 STUM [2] 2018 TIP ÛÜ�����¢ T N.A.23 SAVM [72] 2018 PAMI ÿ/ål T X X 0.615∼0.749 45.4* M&C++24 bMRF [7] 2018 TMM MRF T X X 2.63* N/A25 LESR [86] 2018 TMM ÛÜ�O§���O T X X 5.93* N/A26 TVPI [55] 2018 TIP ÿ/ål, CRF T X 2.78* M&C27 SDVM [4] 2018 TIP ��©) T N/A28 SCOM [11] 2018 TIP ∼10K frame pairs MK DCL [36] D X X 0.555∼0.832 38.8 N/A29 STCR [33] 2018 TIP 44 (6+8+30) clips 10C+S2+DV CRF D X N/A30 DLVS [70] 2018 TIP ∼18K frame pairs MK+DO+S2+FS FCN [48] D X X 0.682∼0.881 0.47 Py&Ca31 SCNN [62] 2018 TCSVT ∼11K frame pairs MK+S2+FS VGGNet [60] D X X 0.657∼0.847 38.5 N/A32 FGRN [35] 2018 CVPR ∼10K frame pairs S2+FS+DV �á�PÁ D X 0.674∼0.861 0.09 Py&Ca33 SCOV [27] 2018 ECCV BOW [17],ÿÀµ, FCIS [40] T X X 3.44 N/A34 MBNM [38] 2018 ECCV ∼13K frame pairs Voc12 + Coco [43] + DV $ÄA�, DeepLab [9] D X 0.637∼0.898 2.63 N/A35 PDBM [61] 2018 ECCV ∼18K frame pairs MK+DO+DV DC [78] D 0.698∼0.907 0.05 Py&Ca36 UVOS [25] 2018 ECCV IO>�J�ì D X X N/A37 SSAV (Ours) 2019 CVPR ∼13K frame pairs DAVSOD val + DO +DV SSLSTM, PDC [61] D 0.724∼0.941 0.05 Py&Ca
Table 2: 36�kcäk�L5�VSOD�{ÚJÑ�SSAV�."Ôö8: 10C = 10-Clips [18]"S2 = SegV2 [34]"DV = DAVIS [53].
DO = DUT-OMRON [77]. MK = MSRA10K [12]"MB = MSRA-B [45]"FS = FBMS [50]"Voc12= PASCAL VOC2012 [13]"Basic:CRF =^��Å|"SP =�?��"SORM =gS�qÝþ"MRF =ê��Å�Å|"a.: T =DÚ�{"D =�ÝÆS"
OF:´Ä¦^16Eâ"SP:´Ä¦^���L©�Eâ"S-measure [14]:L4¥8�êâ8Smeasure��©��"PCT:zvO��m£¦¤"du [3, 7, 11, 27, 38, 41, 62, 86]�.vkúÙ�è§Ïd$1�mPCTs5 u�©½öd�öJø"�è: M =
Matlab"Py = Python"Ca= Caffe"N/A =©¥Ã{¼�"“*”L«CPU$1�m"
|µÚ�Žû��|µ"
o�5`§DAVSOD�þãêâ8k²w�«Oµ
i)ÏL�\©ÛÄ�|µe<aý¢�5¿1�§·
�uyÀú5¿å=£y�"l §ÄgrNÄ�
|µ¥�wÍé�=£§¿Jø���!�Àú5
¿å����I5" ii) Ùõ�5!�5�È�I5!
���é�/¢~-?wÍé��I5!Àª£ã±9´
L�á5I5£~X§wÍé��êþ§$Ä�ª±
9|µ/é�aO�¤§�Ó�VSOD?Ö�ej¢ Õ
A�Ä:"
VSOD�." @�VSOD�. [8, 20, 22, 29, 46, 47, 56,
57, 68, 69]ïá3Ãó�O�A�£X§ôÚ!$Ä�¤
�þ§¿3é�§Ýþ�6uã�wÍé�uÿ+
�(~X§¥%±>é' [12]§�µk� [73])¥;.�
éuª�{ÚÀú5¿@�nØ£~X§A��Ün
Ø [63]§Ú�ª|¢ [74]�¤"±§§��&¢æ
^ØÓ�O�Å��Ü�mÚ�mwÍA���{§
XFÝ6| [69]§ÿ/ål [68]§é�Åir [29]Ú
1Ìã(� [3]"Ïd§��A�ó§±9L�Uå
k��ÃóA�7,¬åPDÚVSOD�.�uÐ"�
�L2"
�C§Ï�Ý ²�ä3ã�wÍ5uÿ [26,
44, 65, 66, 79, 81–83]+��¤õA^§Äu�ÝÆS
�VSOD�. [25,32,33,35,61,62,70]dd�É'5"ä
N ó§Wang�<�ïÄ [70]´VSOD¥�@æ^�
òÈ ²�ä�",��ÓÏó� [32]¦^3DÈÅì
ò�mÚ�m&EÜ¿���CRFµe¥"�5§�
��ÝA� [33]§RNN [35]§7i©*ÜconvLSTM
[61]ºY�JÑ5±B�Ð/ÓP�mÚ�mwÍA
�"du ²�ä�r��ÆSUå§ùÄu�Ý
3
0 50 100 150 2000
200
400
600
0 50 100 150 200
实例对象
0
0.1
0.2
0.3
0.4
0.5
0.6
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
0100200300400
自拍
舞蹈
大鸟羊
显著对象
交通工具
动物
人类活动
人工制品
社交艺术
日常
运动
35%
15%7%
18%
8%
14%
17%
57%
(a) (b)
#¢~
ÀªS�
z�ÀªS�¥¤I5�wÍ¢~
(c)
#˻v
ÀªS�
z�ÀªS�¥�ã�v
(d)
ªÇ
'Ç
é�/¢~º�
(e)
Figure 3: 'uDAVSODêâ8�ÚOêâ: (a)|µ/é�aO" (b, c)©O�L¢~Úã�vI5�©Ù" (d)é�/¢~�'~
©Ù"(e)L«(a)¥|µaO�m��p�6'X"
�VSOD�.Ï~���Ð�5U", §ù�.
Ñ�Ñéun)<aÀú5¿Å��~��wÍ
5=£y�"�é ó§·��Ä:�.£SSAV¤²
(/|^wÍ5=£�¢§��Rk¤��(J"
·�3Ô�±c�êâ8ÚJÑ�DAVSODêâ
8þXÚ/é17��k?�VSOD�. [29, 34, 37, 46,
50, 53, 69] ?1µÿ§ù�ó��Lî8��
3VSOD+�¥���5Uµ�"·�/Ï�þ�½þ
(J§�VSOD+�¥y�X���(Ø¿�Ñ
eZkcµ�ïÄ��"
3.JÑ�êâ8�!ò�[�ã¤J�DAVSODêâ8§§´;
��VSOD?Ö�O�"ã1Úã2Ы�I5�ã�
v"[!�Ööë�Ö¿á�"·�òl±e4�'�
�¡50�DAVSOD"
3.1.Àªæ8DAVSOD�ÀªS� gDHF1K [67]§DHF1K´
�c��5��Ä�úÄJlêâ8"¦^DHF1K�
ïDAVSODêâ8k±eÐ?"DHF1K´lYoutubeþ
Â8�§ºX�«y¢|µ!õ«ÔN*Ú$Ä
�ª!´L�é�aO§±9Ä�|µ¥�Ü©~�
�]Ô§ù�·��ï�5�Úäk�L5�µÿ
Jøj¢�Ä:"���´§DHF1K¤Jø�À
ú5À:¦·�U���Ün�!)Ôéu�é
�-?wÍ5I5"·�±Ãó��ªòÀª©��¡
ã(ã.3(c))¿íØ@�ç¶LÞ�¡ã"ÏLù«�
ª§�ª�����.êâ8§§�¹226�Àª§
�O23,938v§798¦"Àª©EÇ�640×360��"3.2.êâI5wÍ5=£I5. 3ý¢�Ä�|µ¥ [31,54]§<a�
5¿å1��\E,§=ÀJ55¿å©�Úwª�
5¿å=££duâ,�ô§#�Ä�¯��¤Ñ
�Uu)"ÏLDHF1K�úÄJlP¹§·�*�
êâ°Ä�5¿å=£ÊH�3§Xã1¤«", §
VSOD+�¥�c�ïÄÑvk²(rNù«Ä��
Àú5¿1�"3DAVSOD¥§·��âý¢�<a5
À:5I5wÍ�é�§¿�ÄgI55¿å=£
¤u)���§rNT+�¥wÍ5=£ù��ä
]Ô�?Ö"
|µÚé�aOI5. �©z [67]��§z�ÀªÑ
ÃÄIP��aO£X§ÄÔ,�Ïóä,<ó�¬,<
a¹Ä¤"<a¹Äk4�faµ$ħF~-§��-±
9²â¹Ä"�ué�aO§ÚMSCOCO��§��
¹“¯Ô”aO£ Ø´“ÀÜ”¤"ù�·�Òïá�
���70��~Ñy�|µ/é��L"ã.3(a)Ú(e)¥§
©OЫ|µ/é�aO9Ù�p�65"��é�
I5L§kÊ�I5öë�"
¢~/é�?wÍÔNI5. ·�420�I5ö²
L10�Àª«~ýÔö�§lz��I5�Àªv
¥ÀJÑ�õ5�é�¿[�/I5§�£^°(�>
�Ó+ Ø´o÷�õ>/¤"I5ö���¦«©Ñ
ØÓ�¢~¿�üÕ?1I5§l ��23,938vé�
?wÍ5I5Ú39,498�¢~?wÍ5I5"
3.3.êâ8�A:�ÚO��\)DAVSODêâ8§A��A�Xe"
4
�Å$Ä é�$Ä ¢~-?é��êDAVSOD
ú ¯ ½ ú ¯ 1 2 3 ≥ 4
Àªêþ 102 124 117 72 37 134 125 46 33
Table 3: 'uDAVSODêâ8¥��Å/é�$ÄÚwÍé�¢~êþ�ÚO&E"
´Lõ��wÍé�. DAVSOD¥�wÍé�ºX´
L�aOµÄÔ£X§�f§�j¤§�ý£ð�§g
1�¤§<ó�¬£~X§Ýf§ïÓÔ¤Ú�«/ª
�<a¹Ä£~X§Í�§ä1¤§¦��¡/n)Ä
�|µeé�-?wÍ5¤��U"
wÍé�¢~�êþ. ykêâ8Û�uwÍ5é�¢~�êþ£�L1¤"�c�ïÄ [28]L²§<a�
±�8,/°(a��õ�5�ÔN ÃI���/
Oê"Ïd§XL3¤«§DAVSOD�¹�õ�wÍ
é�£zv�õ5�wÍé�¢~§²þ��1.65¤"ã
ã.3(b)�Ñz�Àª¥I5�¢~êþ©Ù"
wÍé��º�.é�?wÍé����½Â�cµé����ã��m�'Ç"DAVSODêâ8¥�wÍé
�º��0.29%∼91.3%£²þµ11.5%¤§Cz���
2"£�ë�ã.3(d)¤"
õ�z��Å$Ä�ª. DAVSOD�¹�«ØÓ��
Å$Ä�ª(�L3)"3ù��êâþ?1Ôö��{
�±�Ð/?ný¢�Ä�|µ§Ïd�¢^"
ØÓ�é�$Ä�ª. DAVSODU«DHF1K�`³§
K)�«��(�L3)ý¢�Ä�|µ(X§é�$Ä
�ªl½�¯�)"ùéu;�LÝ[ܱ9�*Ú
O(/?1�{µÿ�'�"
¥% �. DAVSODÚykêâ8 [29, 34, 37, 46, 50, 53,
69]�¥% �Xã4¤«"
3.4.êâ8y©ykêâ8vk�3ÿÁ8§ù�éN´���
.3êâ8þLÝ[Ü"Ïd§·�Uì4µ2µ4�
'~òÀª©�Ôö!�yÚÿÁ8Ü"æ^�Å
çÀ�üѧ·�����ÕA�y©(J§�¹
90�Ôö8Àª!46��y8Àª±990�ÿÁ8
Àª"Ôö8ÚÿÁ8�I5(Jò¬úm ÿÁ8
�I5(Jò��3"�âVSOD?Ö�Jݧ90�
ÿÁ8q?�Ú�©�35�N´f8!30��~f8
Ú25�(Jf8"
SegV2 [34]SegV2 [34]SegV2 [34]SegV2 [34]SegV2 [34]SegV2 [34]SegV2 [34]SegV2 [34]SegV2 [34]SegV2 [34]SegV2 [34]SegV2 [34]SegV2 [34]SegV2 [34]SegV2 [34]SegV2 [34]SegV2 [34]
DAVIS [53]DAVIS [53]DAVIS [53]DAVIS [53]DAVIS [53]DAVIS [53]DAVIS [53]DAVIS [53]DAVIS [53]DAVIS [53]DAVIS [53]DAVIS [53]DAVIS [53]DAVIS [53]DAVIS [53]DAVIS [53]DAVIS [53]
FBMS [50]FBMS [50]FBMS [50]FBMS [50]FBMS [50]FBMS [50]FBMS [50]FBMS [50]FBMS [50]FBMS [50]FBMS [50]FBMS [50]FBMS [50]FBMS [50]FBMS [50]FBMS [50]FBMS [50]
UVSD [46]UVSD [46]UVSD [46]UVSD [46]UVSD [46]UVSD [46]UVSD [46]UVSD [46]UVSD [46]UVSD [46]UVSD [46]UVSD [46]UVSD [46]UVSD [46]UVSD [46]UVSD [46]UVSD [46]
ViSal [69]ViSal [69]ViSal [69]ViSal [69]ViSal [69]ViSal [69]ViSal [69]ViSal [69]ViSal [69]ViSal [69]ViSal [69]ViSal [69]ViSal [69]ViSal [69]ViSal [69]ViSal [69]ViSal [69]
VOS [37]VOS [37]VOS [37]VOS [37]VOS [37]VOS [37]VOS [37]VOS [37]VOS [37]VOS [37]VOS [37]VOS [37]VOS [37]VOS [37]VOS [37]VOS [37]VOS [37]
MCL [29]MCL [29]MCL [29]MCL [29]MCL [29]MCL [29]MCL [29]MCL [29]MCL [29]MCL [29]MCL [29]MCL [29]MCL [29]MCL [29]MCL [29]MCL [29]MCL [29]
DAVSODDAVSODDAVSODDAVSODDAVSODDAVSODDAVSODDAVSODDAVSODDAVSODDAVSODDAVSODDAVSODDAVSODDAVSODDAVSODDAVSOD
Figure 4: DAVSODÚykVSODêâ8�¥% �.
4.J��.
4.1.ÄuwÍ5=£�VSOD�.
�.Vã. �©¤JÑ�SSAV�.dü�Ä��¬�¤µ7i©*ÜòÈ�¬£PDC¤ [61]ÚwÍ5=£a
��¬(SSLSTM)"cö^u°�/ÆS·�wÍ5A
�§�öòDÚ��á�PÁòÈ�ä£convLSTM¤
[59]�wÍ5=£a�5¿£SSAA¤Å��(Ü"
SSAV�.ò²dPDC�¬���·�A�S���
Ñ\§Ó��Ä�SCzÚwÍ5=£l ���A
�VSOD(J"
7i©*ÜòÈ(PDC)�¬. �#�©�ÚVSOD�
ïÄL² [10, 61]§duõºÝ&E�|^Ú�m[!
��3§²1U\�|�kæ�Ç�*ÜòÈ��±
¼��Ð�ÆS5U"Ïd·�¦^PDC�¬ [61]�
�·�A�J�ì"/ªþ§-Q∈RW×H×C L«Ñ\
vI∈Rw×h×3�3DA�Üþ"*ÜÇ�d>1�*ÜòÈ
�Dd�±�^�Q¥§l ��A�P∈RW×H×C′"T
ÑÑA��±�©�m©EǧÓ�¼����
aÉ�£æ�Ú��d¤"ÏL¿1ü��|£K¤ØÓ
*ÜÇ{dk}Kk=1 �*ÜòÈ�{Ddk}Kk=15|�PDC�
¬:
X = [Q, P1, . . . , Pk, . . . , PK ], (1)
Ù¥§X∈RW×H×(C+KC′)§Pk =Ddk(Q). [., .]�Lë�
ö�"PDCOr��A�X´�«�r��A�£|^
õºÝ&E¤��3�©&EQ£ÏLí�ë�¤"
wÍ5ÔN=£a�convLSTM(SSLSTM). ·�JÑ�«wÍ5=£a��convLSTM [59]§§¦
�convLSTMäkwÍ5=£a�5¿Å�"§´�
�r��Ì��.§Ø=�±ÓP�S&E§��
±«©�µ¥�wÍÔN±9?è5¿å=£&
E"�äN/`§ÏLPDC�¬§·�¼�äkTv
�Ñ\Àª�·�L«{Xt}Tt=1"3��t§�½Xt§
wÍ5=£a��convLSTMÑÑ�A�wÍé�ù
èSt∈ [0, 1]W×H :
5
Xt-1C St-1
CQt
P1
P2
P3
P4
Ht
At
St
C St+1
Gt
Gt-1
It+1
It
It-1
dilated convC
concatenation element-wise multiplication
Xt
LVSOD
LVSOD
Mt+1
Mt-1
Ft+1
Ft
Ft-1
Gt+1Ht+1
Ht-1
At+1
At-1
Xt+1
LVSOD
LAtt
LAtt
LAtt
d
4
4
2
8
16
16
2
8
16
2
8
4
ResNet50+PDC SSLSTM Loss
Mt
Saliency shift
HtA
Ht-1A
Ht+1A
Figure 5: SSAV�.�oNe�" SSAVdüÜ©|¤µ7i©*ÜòÈ£PDC¤�¬ÚwÍ5=£a�convLSTM£SSLSTM¤
�¬"cö^uk��·�wX5ÆS§�ö^uÓ�Ó¼�mÄ�ÚwÍ5=�"k'[!��1§4!"
ÛõG�: Ht=convLSTM(Xt,Ht−1),
wÍ5=£a�5¿: At =FA({X1, · · · ,Xt}),
a�=£: Gm,t=At �Hm,t,
wÍ5ÔNýÿ: St=σ(wS⊗Gt),
(2)
Ù¥§H ∈ RW×H×ML«3DÜþÛõG�"5¿å
ãA∈ [0, 1]W×H´lwÍ5=£a��5¿�äFA O
�Ñ5�§§òkc�v�Ä3S"G∈RW×H×ML«
a�=£§ m ∈M L«Ï�¢ÚeI"�ÎÒ�Ý��¦{"wS ∈R1×1×M´��1×1�òÈا�^�wÍé�Ö�¼ê"⊗�òÈö�§σ´sigmoid-¹
¼ê"
þã�¬�'�|¤Ü©´wÍ5=£a�5¿
�äFA"éw,§§¿��� ²�5¿Å�§Ï�
§�^5éconvLSTMÑÑ�A�H?1\�"Ød�
§·��Ï"§Uv±k�/�[<a5¿å=£
1�"�Ä�ù���k«O�?Ö§·�Ú\�
���convLSTM5�ïFA§l ¦�convLSTM¥i
@,�convLSTM(�µ
wÍ5=£a�5¿: At=FA({X1, · · · ,Xt}),
5¿åA�J�: HAt =convLSTMA(Xt,H
At−1),
5¿åN�: At=σ(wA⊗HA
t ),
(3)
Ù¥wA∈ R1×1×M�L��1×1�òÈØ^5N�5
¿åA�HA�����5ݧsigmoid¼êσ2
r�5Ý�8�z�[0, 1]",�wÍ5=£a
�5¿At^uOrúª2¥�wÍé�©�A�H"
duconvLSTMA�A^§·��5¿å�¬¼�
r��ÆSUå§ù�ÆSwªÚÛª�5¿å
=£Jøj¢�Ä:"b�{It ∈ Rw×h×3}Tt=1��
¹Tv���ÔöÀª§{Ft ∈ [0, 1]W×H}Tt=1�<ú
5ÀI5S�§{Mt ∈ {0, 1}W×H}Tt=1�ÀªwÍé
�^rI5(J"·�¤^���¼êd5¿å�
.{At∈{0, 1}W×H}Tt=1�ÑÑÚ��ÀªwÍé�ýÿ
(J{St∈{0, 1}W×H}Tt=1�¤µ
L =∑T
t=1
(`(It) · LAtt(At,Ft) + LVSOD(St,Mt)
), (4)
Ù¥LAttÚLVSODÑ´�����¼ê. `(·) ∈ {0,1}L«´Ä�35¿À:I5£Ï��c�õê�VSODêâ
8"�<ú5À:êâ§�L1)"�"��A�5¿À
:�§Ø�Òج�£D"���´§�`(·)=0�§
�ª3¥�wÍ5=£a�5¿�.FAÒ±Ûª�ª
Ôö"ù�±w�´�«;.� ²5¿Å�"�
Jø5¿À:I5�(`(·) = 1)§FAÒ±wª��ª
Ôö"/ÏconvLSTM(�§FAÒUO(/ò·�
�VSOD�.�5¿å=£���é�þ(�ã.6)"
4.2.¢�[!
PDC�.�Ä:CNN�ä5gResNet-50 [24]�ò
È�¿���ü�Ú���1"¤kÑ\ã�vÑ�
��473×473��m©EÇ�Q ∈ R60×60×2048"�
©z [61]��§·���K = 4, C = 512§dk = 2k
(k ∈ {1, · · ·, 4})"éu�ª2¥�convLSTM§·�¦^
��3×3×32�òÈØ" éu�ª3¥�convLSTMAK
^��3×3×16�òÈØ"ÔöüÑþ§·�Ú [61]�
<�±��(�´�¦^MSRA-10k [12]êâ8)"d§
·�?�Ú|^DAVSOD��y85wª/ÔöwÍ
5=£a�5¿�¬"
6
2010-2015 2016-2017 2018
Metric SIVM TIMP SPVM RWRV MB+M SAGM GFVM MSTM STBP SGSP SFLR SCOM SCNN DLVS FGRN MBNM PDBM SSAV†
[56] [85] [47] [29] [80] [68] [69] [64] [75] [46] [8] [11]† [62]† [70]† [35]† [38]† [61]†
ViSa
l max F ↑ .522 .479 .700 .440 .692 .688 .683 .673 .622 .677 .779 .831 .831 .852 .848 .883 .888 .939S ↑ .606 .612 .724 .595 .726 .749 .757 .749 .629 .706 .814 .762 .847 .881 .861 .898 .907 .943M ↓ .197 .170 .133 .188 .129 .105 .107 .095 .163 .165 .062 .122 .071 .048 .045 .020 .032 .020
FB
MS-
T max F ↑ .426 .456 .330 .336 .487 .564 .571 .500 .595 .630 .660 .797 .762 .759 .767 .816 .821 .865S ↑ .545 .576 .515 .521 .609 .659 .651 .613 .627 .661 .699 .794 .794 .794 .809 .857 .851 .879M ↓ .236 .192 .209 .242 .206 .161 .160 .177 .152 .172 .117 .079 .095 .091 .088 .047 .064 .040
DAV
IS-T max F ↑ .450 .488 .390 .345 .470 .515 .569 .429 .544 .655 .727 .783 .714 .708 .783 .861 .855 .861
S ↑ .557 .593 .592 .556 .597 .676 .687 .583 .677 .692 .790 .832 .783 .794 .838 .887 .882 .893M ↓ .212 .172 .146 .199 .177 .103 .103 .165 .096 .138 .056 .048 .064 .061 .043 .031 .028 .028
SegV
2
max F ↑ .581 .573 .618 .438 .554 .634 .592 .526 .640 .673 .745 .764 ** ** ** .716 .800 .801S ↑ .605 .644 .668 .583 .618 .719 .699 .643 .735 .681 .804 .815 ** ** ** .809 .864 .851
M ↓ .251 .116 .108 .162 .146 .081 .091 .114 .061 .124 .037 .030 ** ** ** .026 .024 .023
UV
SD
max F ↑ .293 .338 .404 .281 .339 .414 .426 .336 .403 .544 .562 .420 .550 .564 .630 .550 .863 .801
S ↑ .481 .537 .581 .536 .563 .629 .628 .551 .614 .601 .713 .555 .712 .721 .745 .698 .901 .861
M ↓ .260 .178 .146 .180 .169 .111 .106 .145 .105 .165 .059 .206 .075 .060 .042 .079 .018 .025
MC
L max F ↑ .420 .598 .595 .446 .261 .422 .406 .313 .607 .645 .669 .422 .628 .551 .625 .698 .798 .774
S ↑ .548 .642 .665 .577 .539 .615 .613 .540 .700 .679 .734 .569 .730 .682 .709 .755 .856 .819
M ↓ .185 .113 .105 .167 .178 .136 .132 .171 .078 .100 .054 .204 .054 .060 .044 .119 .021 .027
VOS-
T max F ↑ .439 .401 .351 .422 .562 .482 .506 .567 .526 .426 .546 .690 .609 .675 .669 .670 .742 .742S ↑ .558 .575 .511 .552 .661 .619 .615 .657 .576 .557 .624 .712 .704 .760 .715 .742 .818 .819M ↓ .217 .215 .223 .211 .158 .172 .162 .144 .163 .236 .145 .162 .109 .099 .097 .099 .078 .073
DAV
SOD
-T max F ↑ .298 .395 .358 .283 .342 .370 .334 .344 .410 .426 .478 .464 .532 .521 .573 .520 .572 .603S ↑ .486 .563 .538 .504 .538 .565 .553 .532 .568 .577 .624 .599 .674 .657 .693 .637 .698 .724M ↓ .288 .195 .202 .245 .228 .184 .167 .211 .160 .207 .143 .220 .128 .129 .098 .159 .116 .092
Table 4: 17��k?�VSOD�.37�êâ8þ�µÿ(J: SegV2 [34], FBMS [50], ViSal [69], MCL [29], DAVIS [53], UVSD [46],
VOS [37]±9�©�DAVSODÿÁ8¥35�{üf8"�5¿§TIMP=3VOSþ�9�áÀª?1ÿÁ§Ï�§Ã{?n�À
ª"“**”L«T�.®²3Têâ8þ?1Ôö"“-T”L«(J´3Têâ8�ÿÁ8þ���"“†”L«�ÝÆS�."ôÚ��L«5U�Ð"�Z©êIP�oooNNN"
5.µÿ(J
5.1.¢���
µ��I.�½þïþ�.5U§·�^2«61��
Iµ²þýéØ�(MAE)M [52], F-measure F [1],9
�#JÑ�(�5�IS-measure S [14]"µÿ�è�
�µhttp://dpfan.net/DAVSOD/
µÿ��.. ·��ÿÁ17¥�.£DÚ�{11«§
�Ý�.6«¤"ÀJ�.�IO�µi¤�è®úm§
ii¤äk�L5"
µÿüÑ. �Jø�¡�µÿ§·�3yk�7�êâ8Ú¤JÑ�DAVSODþµ�17«äk
�L5��." VOS [37]§FBMS [50]§DAVIS [53]§
DAVSODù4�êâ8�ÿÁ8±94����êâ
8ViSal [69]§MCL [29], SegV2 [34]§UVSD [46] �^
5��ÿÁ8"§��237�Àª§�4�v"
5.2.5U'�Úêâ8©Û
�!ò¥yeZUr?�5ïÄ�k�(Ø"
DÚ�.�5U.ÄuL4¥�ØÓ�I§·��Ñ�
(Ø�µ“SFLR [8]§SGSP [46]ÚSTBP [75]´VSOD¥
��ÝÆS�.�c3¶"” SFLRÚSGSPÑwª/�
Ä16üÑ5J�$ÄA�"�O�¤�Ï~ép
£�L2¤"��5¿�´§ù3��.Ñ|^���E
â3«�?Oþ�Ü��A�"
�Ý�.�5U. µÿ¥cn¶��.(=SSAV§
PDBM [61]§MBNM [38]) ÑÄu�ÝÆSEâ§ù
L² ²�ääkr��ÆSUå"3ViSalêâ8þ
£VSOD�1��;��O�êâ8¤§§��²þ5
U(max F)$�pu0.9"
DÚ��ÝVSOD�.�'�. lL4��A�¤
k�Ý�.Ñ`uDÚ�{§ù8õu�Ý�ä
�r��wÍ5A�J�Uå",�k��uy
7
(1)
(2)
(3)
(4)
(5)
(a)ã�v (b)À: (c)ÃóI5 (d) SSAV (e) MBNM [38] (f) FGRN [35] (g) PDBM [61] (h) SFLR [8] (i) SAGM [68]
Figure 6: DAVSODêâ8þ�Ý�.c3¶(MBNM [38]§FGRN [35]§PDBM [61])�DÚ�.c2¶(SFLR [8]§SAGM [68])�Àú(J'�" ·��SSAV�.¤õÓPwÍ5=£y�"
´²;�{¥�Ð��.(SFLR [8])3MCL!UVSD!
ViSal9DAVSODêâ8þ�5U',�Ý�.§
XSCOM [11] �5U�Ð"`²3�ÝÆSe�¥
ïÄXÛk�|^<�k��£´ékcµ���"
êâ8©Û.3L4¥§·�^�ÚIP©ê§�V�
ôÚ¿�XA½�I£X§max F§S±9M¤äk�Ð5U"·�uyViSalÚUVSDêâ8�éN´§Ï�
ü¶c2��.µSSAVÚPDBM [61]¼��~p�5
U£S > 0.9¤"�´§éu�DAVSODù��ä]Ô5
�êâ8§VOSD�.�5U¬:ìeü(S<0.73)"ù
�«VOSD�.��NÚüÕ5U3�5�ïÄ¥Ñ
�k�J,�m"
$1�m©Û. L2�Ñ�cVSOD�{Ú�©JÑ
�SSAV�{�$1�m£PCF@��¤"鮲úÙ
�è��.§ÙÿÁ�m´3�Ó�M�²�µIntel
Xeon(R) E5-2676v3 @2.4GHz×24!GTX TITAN Xþÿ
Á�"Ù{�.�ÿÁ�mK´l�Ø©¥Á¹�"
5¿�§�©JÑ��.vkA^?Ûc/�?n£~
X§CRF¤�{§Ïd?n�Ý=I�0.05¦"
5.3.©l¢�ÛªÚwªwÍ5=£a�5¿Å�. �ïĤJÑ�SSAA�¬�ØÓÔöüÑ�K�§·��Ñ2�
Ä�µwªÚÛª§éAu¤JÑ�SSAV�.±w
ªÚÛª�ª?1Ôö"ÛªÄ�Ôö�§·�æ
^�´VSOD¥é�?I5 vk¦^DAVSODêâ8
¥�5¿À:I5"dL5��§SSAV�.æ^wª
Ôö�ª`uÛªÔö"ùL²|^úÄêâkÏ
uSSAV�.�Ð/Ó¼wÍ5=£y�§l ?�Ú
Type Baseline S ↑ max F ↑ M ↓
SSAAexplicit 0.724 0.603 0.092implicit 0.684 0.593 0.103
SSLSTM w/o SSLSTM 0.667 0.541 0.132
Table 5: SSAV�.3DAVSODêâ8þ�©l¢�"
Jp�ª�VSOD5U"
wÍ5=£a�convLSTM�k�5. �ï
ÄSSLSTM(§ 4)�k�5§·�Jø,��Ä�µ
w/o SSLSTM§=lSSAV�.¥�KSSLSTM�¬"l
L5¥uy§Ä��5Uk¤eü(S : 0.724→ 0.667)"
ùy¢¤JÑ�SSLSTM�¬Uläk]Ô5�ê
â¥k�/ÆS�ÀJ55¿å©�Ú5¿å=£"
��k?��.'�. L4�ѤJ�SSAV�
.��c�k?�17«VSOD�{�5U"�©�Ä
�SSAV�.5U3�õêêâ8þLy'Ù¦�.
�Ð"äN ó§·���.3ViSalÚFBMSêâ8
þ�5U��wÍJp" 3VOS§SegV2ÚDAVIS
êâ8þ¼����5U"�u�äk]Ô5
�DAVSODêâ8§SSAV�.�¼��Z5U"·
�òùLyÑÚ�5U8õuSSLSTM�Ú\§§
k�/ÆSÄ�|µ¥�wÍ5©�Å�§¿��
�.O(/?n@Àúþ��«�"
ã.6L²§�Ù¦�k?��{�'§�©
�SSAV�{¼��Àú�J��n�" SSAV�.
¤õÓ¼wÍ5=£y�(l11�15vµc→[c,Ýf]→c→Ýf→[c,Ýf])", §Ù§p5U
�VSOD�.�oÃ{âw��wÍé�£~X§
SFLR§ SAGM¤§�o=Ó¼�£Ä�c£~X§
8
MBNM¤"·�F"�©JÑ�Ä:�.U��.�m
um81²�cµ"
6.(Ø
ÏL�ï#�!�Àú5¿å���5DAVSOD�
êâ8§ïá��5��µÿ§¿JÑSSAVÄ:�
.§�©¥yVSOD+���¡�Nï"�'DÚ½
�ÝÆS�.§¤JÑ�SSAV�.¼�R��5U
¿���Àúþ�lÐ�(J"�þ¢�y²§=
¦�Ä�5U�Z��.§VSOD¯Kq����)
û"þããå±9�\�©Ûòk|uT+��uЧ
¿-u�2��d3ïħ~X§ÄuwÍ5a��
Àªi4§ÀªwÍé�aêÚ¢~?VSOD�"
References
[1] Radhakrishna Achanta, Sheila Hemami, Francisco Estrada,
and Sabine Susstrunk. Frequency-tuned salient region detec-
tion. In IEEE CVPR, pages 1597–1604, 2009. 7
[2] Tariq Alshawi, Zhiling Long, and Ghassan AlRegib. Unsu-
pervised uncertainty estimation using spatiotemporal cues in
video saliency detection. IEEE TIP, pages 2818–2827, 2018.
3
[3] Caglar Aytekin, Horst Possegger, Thomas Mauthner, Serkan
Kiranyaz, Horst Bischof, and Moncef Gabbouj. Spatiotem-
poral saliency estimation by spectral foreground detection.
IEEE TMM, 20(1):82–95, 2018. 3
[4] Saumik Bhattacharya, K Subramanian Venkatesh, and
Sumana Gupta. Visual saliency detection using spatiotem-
poral decomposition. IEEE TIP, 27(4):1665–1675, 2018. 3
[5] Marc D. Binder, Nobutaka Hirokawa, and Uwe Windhorst,
editors. Gaze Shift, pages 1676–1676. Springer Berlin Hei-
delberg, Berlin, Heidelberg, 2009. 1
[6] Ali Borji, Dicky N Sihite, and Laurent Itti. What stands out
in a scene? A study of human explicit saliency judgment.
Vision Research, 91:62–77, 2013. 1
[7] Chenglizhao Chen, Shuai Li, Hong Qin, Zhenkuan Pan, and
Guowei Yang. Bi-level feature learning for video saliency
detection. IEEE TMM, 2018. 1, 3
[8] Chenglizhao Chen, Shuai Li, Yongguang Wang, Hong Qin,
and Aimin Hao. Video saliency detection via spatial-
temporal fusion and low-rank coherency diffusion. IEEE
TIP, 26(7):3156–3170, 2017. 2, 3, 7, 8
[9] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos,
Kevin Murphy, and Alan L Yuille. Deeplab: Semantic image
segmentation with deep convolutional nets, atrous convolu-
tion, and fully connected crfs. IEEE TPAMI, 40(4):834–848,
2018. 3
[10] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos,
Kevin Murphy, and Alan L Yuille. Deeplab: Semantic image
segmentation with deep convolutional nets, atrous convolu-
tion, and fully connected crfs. IEEE TPAMI, 40(4):834–848,
2018. 5
[11] Yuhuan Chen, Wenbin Zou, Yi Tang, Xia Li, Chen Xu, and
Nikos Komodakis. Scom: Spatiotemporal constrained opti-
mization for salient object detection. IEEE TIP, 27(7):3345–
3357, 2018. 2, 3, 7, 8
[12] Ming-Ming Cheng, Niloy J. Mitra, Xiaolei Huang, Philip H.
S. Torr, and Shi-Min Hu. Global contrast based salient region
detection. IEEE TPAMI, 37(3):569–582, 2015. 3, 6
[13] Mark Everingham, SM Ali Eslami, Luc Van Gool, Christo-
pher KI Williams, John Winn, and Andrew Zisserman. The
pascal visual object classes challenge: A retrospective. IJCV,
111(1):98–136, 2015. 3
[14] Deng-Ping Fan, Ming-Ming Cheng, Yun Liu, Tao Li, and
Ali Borji. Structure-measure: A New Way to Evaluate Fore-
ground Maps. In IEEE ICCV, pages 4548–4557, 2017.
http://dpfan.net/smeasure/. 3, 7
[15] Yuming Fang, Weisi Lin, Zhenzhong Chen, Chia-Ming T-
sai, and Chia-Wen Lin. A video saliency detection model in
compressed domain. IEEE TCSVT, 24(1):27–38, 2014. 3
[16] Yuming Fang, Zhou Wang, Weisi Lin, and Zhijun Fang.
Video saliency incorporating spatiotemporal cues and uncer-
tainty weighting. IEEE TIP, 23(9):3910–3921, 2014. 3
[17] Li Fei-Fei and Pietro Perona. A bayesian hierarchical model
for learning natural scene categories. In IEEE CVPR, pages
524–531, 2005. 3
[18] Ken Fukuchi, Kouji Miyazato, Akisato Kimura, Shigeru
Takagi, and Junji Yamato. Saliency-based video segmen-
tation with graph cuts and sequentially updated priors. In
ICME, pages 638–641, 2009. 3
[19] Siavash Gorji and James J Clark. Going From Image to
Video Saliency: Augmenting Image Salience With Dynamic
Attentional Push. In IEEE CVPR, pages 7501–7511, 2018.
1
[20] Chenlei Guo, Qi Ma, and Liming Zhang. Spatio-temporal
saliency detection using phase spectrum of quaternion fouri-
er transform. In IEEE CVPR, pages 1–8, 2008. 3
9
[21] Chenlei Guo and Liming Zhang. A novel multiresolution
spatiotemporal saliency detection model and its applications
in image and video compression. IEEE TIP, 19(1):185–198,
2010. 1
[22] Fang Guo, Wenguan Wang, Jianbing Shen, Ling Shao, Jian
Yang, Dacheng Tao, and Yuan Yan Tang. Video saliency
detection using object proposals. IEEE Transactions on Cy-
bernetics, 2017. 3
[23] Hadi Hadizadeh and Ivan V Bajic. Saliency-aware video
compression. IEEE TIP, 23(1):19–33, 2014. 1
[24] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun.
Deep residual learning for image recognition. In IEEE
CVPR, pages 770–778, 2016. 6
[25] Yuan-Ting Hu, Jia-Bin Huang, and Alexander G. Schwing.
Unsupervised video object segmentation using motion
saliency-guided spatio-temporal propagation. In ECCV.
Springer, 2018. 1, 3
[26] Md Amirul Islam, Mahmoud Kalash, and Neil DB Bruce.
Revisiting salient object detection: Simultaneous detection,
ranking, and subitizing of multiple salient objects. In IEEE
CVPR, pages 7142–7150, 2018. 3
[27] Yeong Jun Koh, Young-Yoon Lee, and Chang-Su Kim. Se-
quential clique optimization for video object segmentation.
In ECCV. Springer, 2018. 3
[28] Edna L Kaufman, Miles W Lord, Thomas Whelan Reese,
and John Volkmann. The discrimination of visual number.
The American Journal of Psychology, 62(4):498–525, 1949.
5
[29] Hansang Kim, Youngbae Kim, Jae-Young Sim, and Chang-
Su Kim. Spatiotemporal saliency detection for video se-
quences based on random walk with restart. IEEE TIP,
24(8):2552–2564, 2015. 1, 2, 3, 4, 5, 7
[30] Wonjun Kim, Chanho Jung, and Changick Kim. Spatiotem-
poral saliency detection and its applications in static and dy-
namic scenes. IEEE TCSVT, 21(4):446–456, 2011. 1, 3
[31] Christof Koch and Shimon Ullman. Shifts in selective vi-
sual attention: Towards the underlying neural circuitry. In
Matters of Intelligence, pages 115–141. 1987. 1, 4
[32] Trung-Nghia Le and Akihiro Sugimoto. Deeply supervised
3d recurrent fcn for salient object detection in videos. In
BMVC, 2017. 1, 3
[33] Trung-Nghia Le and Akihiro Sugimoto. Video salient objec-
t detection using spatiotemporal deep features. IEEE TIP,
27(10):5002–5015, 2018. 1, 3
[34] Fuxin Li, Taeyoung Kim, Ahmad Humayun, David Tsai, and
James M Rehg. Video segmentation by tracking many figure-
ground segments. In IEEE ICCV, pages 2192–2199, 2013.
1, 2, 3, 4, 5, 7
[35] Guanbin Li, Yuan Xie, Tianhao Wei, Keze Wang, and Liang
Lin. Flow guided recurrent neural encoder for video salient
object detection. In IEEE CVPR, pages 3243–3252, 2018. 2,
3, 7, 8
[36] Guanbin Li and Yizhou Yu. Deep contrast learning for salient
object detection. In IEEE CVPR, pages 478–487, 2016. 3
[37] Jia Li, Changqun Xia, and Xiaowu Chen. A benchmark
dataset and saliency-guided stacked autoencoders for video-
based salient object detection. IEEE TIP, 27(1):349–364,
2018. 1, 2, 4, 5, 7
[38] Siyang Li, Bryan Seybold, Alexey Vorobyov, Xuejing Lei,
and C.-C. Jay Kuo. Unsupervised video object segmentation
with motion-based bilateral networks. In ECCV. Springer,
2018. 2, 3, 7, 8
[39] Yin Li, Xiaodi Hou, Christof Koch, James M Rehg, and Alan
L Yuille. The secrets of salient object segmentation. In IEEE
CVPR, pages 280–287, 2014. 1
[40] Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, and Yichen Wei.
Fully convolutional instance-aware semantic segmentation.
In IEEE CVPR, pages 2359–2367, 2017. 3
[41] Yong Li, Bin Sheng, Lizhuang Ma, Wen Wu, and Zhifeng
Xie. Temporally coherent video saliency using regional dy-
namic contrast. IEEE TCSVT, 23(12):2067–2076, 2013. 3
[42] Ming Liang and Xiaolin Hu. Recurrent convolutional neural
network for object recognition. In IEEE CVPR, pages 3367–
3375, 2015. 3
[43] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays,
Pietro Perona, Deva Ramanan, Piotr Dollar, and C Lawrence
Zitnick. Microsoft coco: Common objects in context. In
ECCV, pages 740–755. Springer, 2014. 3
[44] Nian Liu, Junwei Han, and Ming-Hsuan Yang. PiCANet:
Learning pixel-wise contextual attention for saliency detec-
tion. In IEEE CVPR, pages 3089–3098, 2018. 3
[45] Tie Liu, Jian Sun, Nan-Ning Zheng, Xiaoou Tang, and
Heung-Yeung Shum. Learning to Detect A Salient Object.
In IEEE CVPR, pages 1–8, 2007. 3
[46] Zhi Liu, Junhao Li, Linwei Ye, Guangling Sun, and Li-
quan Shen. Saliency detection for unconstrained videos us-
ing superpixel-level graph and spatiotemporal propagation.
IEEE TCSVT, 27(12):2527–2542, 2017. 1, 2, 3, 4, 5, 7
10
[47] Zhi Liu, Xiang Zhang, Shuhua Luo, and Olivier Le Meur.
Superpixel-based spatiotemporal saliency detection. IEEE
TCSVT, 24(9):1522–1540, 2014. 2, 3, 7
[48] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully
convolutional networks for semantic segmentation. In IEEE
CVPR, pages 3431–3440, 2015. 3
[49] Thomas Mauthner, Horst Possegger, Georg Waltner, and
Horst Bischof. Encoding based saliency detection for videos
and images. In IEEE CVPR, pages 2494–2502, 2015. 3
[50] Peter Ochs, Jitendra Malik, and Thomas Brox. Segmentation
of moving objects by long term video analysis. IEEE TPAMI,
36(6):1187–1200, 2014. 1, 2, 3, 4, 5, 7
[51] Yingwei Pan, Ting Yao, Houqiang Li, and Tao Mei. Video
captioning with transferred semantic attributes. In CVPR,
pages 6504–6512, 2017. 1
[52] Federico Perazzi, Philipp Krahenbuhl, Yael Pritch, and
Alexander Hornung. Saliency filters: Contrast based filtering
for salient region detection. In CVPR, pages 733–740, 2012.
7
[53] Federico Perazzi, Jordi Pont-Tuset, Brian McWilliams, Luc
Van Gool, Markus Gross, and Alexander Sorkine-Hornung.
A benchmark dataset and evaluation methodology for video
object segmentation. In IEEE CVPR, pages 724–732, 2016.
1, 2, 3, 4, 5, 7
[54] Matthew S Peterson, Arthur F Kramer, and David E Irwin.
Covert shifts of attention precede involuntary eye move-
ments. Perception & Psychophysics, 66(3):398–405, 2004.
1, 4
[55] Wenliang Qiu, Xinbo Gao, and Bing Han. Eye fixation as-
sisted video saliency detection via total variation-based pair-
wise interaction. IEEE TIP, pages 4724–4739, 2018. 1, 3
[56] Esa Rahtu, Juho Kannala, Mikko Salo, and Janne Heikkila.
Segmenting salient objects from images and videos. In EC-
CV, pages 366–379. Springer, 2010. 2, 3, 7
[57] Hae Jong Seo and Peyman Milanfar. Static and space-time
visual saliency detection by self-resemblance. Journal of Vi-
sion, 9(12):15–15, 2009. 3
[58] Karthikeyan Shanmuga Vadivel, Thuyen Ngo, Miguel Eck-
stein, and BS Manjunath. Eye tracking assisted extraction of
attentionally important objects from videos. In CVPR, pages
3241–3250, 2015. 3
[59] Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung,
Wai-Kin Wong, and Wang-Chun Woo. Convolutional LST-
M network: A machine learning approach for precipitation
nowcasting. In NIPS, 2015. 5
[60] Karen Simonyan and Andrew Zisserman. Very deep convo-
lutional networks for large-scale image recognition. In ICLR,
2015. 3
[61] Hongmei Song, Wenguan Wang, Sanyuan Zhao, Jianbing
Sheng, and Kin-Man Lam. Pyramid dilated deeper convL-
STM for video salient object detection. In ECCV. Springer,
2018. 2, 3, 5, 6, 7, 8
[62] Yi Tang, Wenbin Zou, Zhi Jin, Yuhuan Chen, Yang Hua, and
Xia Li. Weakly supervised salient object detection with spa-
tiotemporal cascade neural networks. IEEE TCSVT, 2018. 2,
3, 7
[63] Anne M Treisman and Garry Gelade. A feature-integration
theory of attention. Cognitive Psychology, 12(1):97–136,
1980. 3
[64] Wei-Chih Tu, Shengfeng He, Qingxiong Yang, and Shao-Yi
Chien. Real-time salient object detection with a minimum
spanning tree. In IEEE CVPR, pages 2334–2342, 2016. 2, 3,
7
[65] Tiantian Wang, Lihe Zhang, Shuo Wang, Huchuan Lu, Gang
Yang, Xiang Ruan, and Ali Borji. Detect globally, refine
locally: A novel approach to saliency detection. In IEEE
CVPR, pages 3127–3135, 2018. 3
[66] Wenguan Wang, Jianbing Shen, Xingping Dong, and Al-
i Borji. Salient object detection driven by fixation prediction.
In IEEE CVPR, pages 1711–1720, 2018. 3
[67] Wenguan Wang, Jianbing Shen, Fang Guo, Ming-Ming
Cheng, and Ali Borji. Revisiting video saliency: A large-
scale benchmark and a new model. In IEEE CVPR, pages
4894–4903, 2018. 4
[68] Wenguan Wang, Jianbing Shen, and Fatih Porikli. Saliency-
aware geodesic video object segmentation. In IEEE CVPR,
pages 3395–3402, 2015. 1, 2, 3, 7, 8
[69] Wenguan Wang, Jianbing Shen, and Ling Shao. Consisten-
t video saliency using local gradient flow optimization and
global refinement. IEEE TIP, 24(11):4185–4196, 2015. 1, 2,
3, 4, 5, 7
[70] Wenguan Wang, Jianbing Shen, and Ling Shao. Video salient
object detection via fully convolutional networks. IEEE TIP,
27(1):38–49, 2018. 2, 3, 7
[71] Wenguan Wang, Jianbing Shen, Hanqiu Sun, and Ling Shao.
Video co-saliency guided co-segmentation. IEEE TCSVT,
28(8):1727–1736, 2018. 3
[72] Wenguan Wang, Jianbing Shen, Ruigang Yang, and Fatih
Porikli. Saliency-aware video object segmentation. IEEE
TPAMI, pages 20–33, 2018. 3
11
[73] Yichen Wei, Fang Wen, Wangjiang Zhu, and Jian Sun.
Geodesic saliency using background priors. In ECCV, pages
29–42. Springer, 2012. 3
[74] Jeremy M Wolfe, Kyle R Cave, and Susan L Franzel. Guided
search: An alternative to the feature integration model for
visual search. Journal of Experimental Psychology: Human
Perception and Performance, 15(3):419, 1989. 3
[75] Tao Xi, Wei Zhao, Han Wang, and Weisi Lin. Salient object
detection with spatiotemporal background priors for video.
IEEE TIP, 26(7):3425–3436, 2017. 2, 3, 7
[76] Ning Xu, Brian Price, Scott Cohen, Jimei Yang, and Thomas
S Huang. Deep interactive object selection. In IEEE CVPR,
pages 373–381, 2016. 1
[77] Chuan Yang, Lihe Zhang, Huchuan Lu, Xiang Ruan, and
Ming-Hsuan Yang. Saliency detection via graph-based man-
ifold ranking. In IEEE CVPR, pages 3166–3173, 2013. 3
[78] Fisher Yu and Vladlen Koltun. Multi-scale context aggrega-
tion by dilated convolutions. In ICLR, 2016. 3
[79] Yu Zeng, Huchuan Lu, Lihe Zhang, Mengyang Feng, and
Ali Borji. Learning to promote saliency detectors. In IEEE
CVPR, pages 1644–1653, 2018. 3
[80] Jianming Zhang, Stan Sclaroff, Zhe Lin, Xiaohui Shen, Bri-
an Price, and Radomir Mech. Minimum barrier salient object
detection at 80 fps. In IEEE ICCV, pages 1404–1412, 2015.
2, 3, 7
[81] Jing Zhang, Tong Zhang, Yuchao Dai, Mehrtash Harandi,
and Richard Hartley. Deep unsupervised saliency detection:
A multiple noisy labeling perspective. In IEEE CVPR, pages
9029–9038, 2018. 3
[82] Lu Zhang, Ju Dai, Huchuan Lu, You He, and Gang Wang. A
bi-directional message passing model for salient object de-
tection. In IEEE CVPR, pages 1741–1750, 2018. 3
[83] Xiaoning Zhang, Tiantian Wang, Jinqing Qi, Huchuan Lu,
and Gang Wang. Progressive attention guided recurrent net-
work for salient object detection. In IEEE CVPR, pages 714–
722, 2018. 3
[84] Ziyu Zhang, Sanja Fidler, and Raquel Urtasun. Instance-
level segmentation for autonomous driving with deep dense-
ly connected mrfs. In IEEE CVPR, pages 669–677, 2016.
1
[85] Feng Zhou, Sing Bing Kang, and Michael F Cohen. Time-
mapping using space-time saliency. In IEEE CVPR, pages
3358–3365, 2014. 2, 3, 7
[86] Xiaofei Zhou, Zhi Liu, Chen Gong, and Wei Liu. Improv-
ing video saliency detection via localized estimation and s-
patiotemporal refinement. IEEE TMM, pages 2993–3007,
2018. 3
12