ZHANG Guoyin, WANG Zeyu, WU Yanxia, et al. Spatial structure encoded deep networks for scene parsing[J]. Journal of Harbin Engineering University, 2017, 38(12): 1928-1936. [doi:10.11990/jheu.201701042]


Journal of Harbin Engineering University [ISSN:1006-6977/CN:61-1281/TN]

Volume:
38
Issue:
2017, No. 12
Pages:
1928-1936
Section:
Publication date:
2017-12-25

Article Info

Title:
Spatial structure encoded deep networks for scene parsing
Author(s):
ZHANG Guoyin1 WANG Zeyu1 WU Yanxia1 BU Shuhui2
1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China;
2. School of Aeronautics, Northwestern Polytechnical University, Xi’an 710072, China
Keywords:
scene parsing; fully convolutional neural network; graphical model; spatial structure encoding algorithm; multi-dimensional visual features; spatial relation features; hybrid features
CLC number:
TP391.413; TP18
DOI:
10.11990/jheu.201701042
Document code:
A
Abstract:
To investigate how effective feature extraction and accurate spatial structured learning improve scene parsing, this paper proposes a spatial-structure-encoded deep network based on a fully convolutional neural network. A structured-learning layer embedded in the network organically combines a graphical-model network with a spatial structure encoding algorithm, so that the network can accurately describe both the distribution of objects in the space around each object and the spatial relationships between objects. Through this design, the network not only extracts multi-dimensional visual features containing multi-level shape information, but also generates spatial relation features containing structured information, yielding hybrid features that express the semantic information of an image more accurately. Experimental results on the SIFT FLOW and PASCAL VOC 2012 benchmark datasets show that the proposed network significantly improves scene-parsing accuracy over existing methods.

References:

[1] SHOTTON J, WINN J, ROTHER C, et al. TextonBoost for image understanding: multi-class object recognition and segmentation by jointly modeling texture, layout, and context[J]. International journal of computer vision, 2009, 81(1): 2-23.
[2] FARABET C, COUPRIE C, NAJMAN L, et al. Learning hierarchical features for scene labeling[J]. IEEE transactions on pattern analysis and machine intelligence, 2013, 35(8):1915-1929.
[3] SHELHAMER E, LONG J, DARRELL T. Fully convolutional networks for semantic segmentation[J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 39(4):640-651.
[4] NOH H, HONG S, HAN B. Learning deconvolution network for semantic segmentation[C]//Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile, 2015:1520-1528.
[5] BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation[EB/OL]. arXiv preprint arXiv:1511.00561, 2015.
[6] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs[EB/OL]. arXiv preprint arXiv:1412.7062, 2014.
[7] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE transactions on pattern analysis and machine intelligence, 2018, 40(4): 834-848.
[8] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]//Advances in Neural Information Processing Systems. 2012: 1097-1105.
[9] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. arXiv preprint arXiv:1409.1556, 2014.
[10] SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. Boston, USA, 2015:1-9.
[11] LAFFERTY J D, MCCALLUM A, PEREIRA F C N. Conditional random fields:probabilistic models for segmenting and labeling sequence data[C]//Eighteenth International Conference on Machine Learning. Morgan Kaufmann Publishers Inc, 2001:282-289.
[12] ZHENG S, JAYASUMANA S, ROMERA-PAREDES B, et al. Conditional random fields as recurrent neural networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile, 2015:1529-1537.
[13] LIN G, SHEN C, VAN DEN HENGEL A, et al. Efficient piecewise training of deep structured models for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA, 2016:3194-3203.
[14] LIU F, LIN G, SHEN C. CRF learning with CNN features for image segmentation[J]. Pattern recognition, 2015, 48(10):2983-2992.
[15] BYEON W, LIWICKI M, BREUEL T M. Texture classification using 2D LSTM networks[C]//2014 22nd International Conference on Pattern Recognition (ICPR). 2014: 1144-1149.
[16] THEIS L, BETHGE M. Generative image modeling using spatial LSTMs[C]//Advances in Neural Information Processing Systems.[S.l.] 2015:1927-1935.
[17] BYEON W, BREUEL T M, RAUE F, et al. Scene labeling with lstm recurrent neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA, 2015:3547-3555.
[18] LIANG X, SHEN X, XIANG D, et al. Semantic object parsing with local-global long short-term memory[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA, 2016:3185-3193.
[19] LIANG X, SHEN X, FENG J, et al. Semantic object parsing with graph lstm[C]//European Conference on Computer Vision. Springer International Publishing, 2016:125-143.
[20] LI Z, GAN Y, LIANG X, et al. LSTM-CF:Unifying context modeling and fusion with LSTMS for RGB-D scene labeling[C]//European Conference on Computer Vision. Springer International Publishing, 2016:541-557.
[21] ZHANG R, YANG W, PENG Z, et al. Progressively diffused networks for semantic image segmentation[EB/OL]. arXiv preprint arXiv:1702.05839, 2017.
[22] BU S, HAN P, LIU Z, et al. Scene parsing using inference embedded deep networks[J]. Pattern recognition, 2016, 59: 188-198.
[23] ACHANTA R, SHAJI A, SMITH K, et al. SLIC superpixels compared to state-of-the-art superpixel methods[J]. IEEE transactions on pattern analysis and machine intelligence, 2012, 34(11):2274-2282.
[24] HUNTER R S. Photoelectric color difference meter[J]. JOSA, 1958, 48(12):985-995.
[25] SMITH T, GUILD J. The CIE colorimetric standards and their use[J]. Transactions of the optical society, 1931, 33(3):73.
[26] KOLLER D, FRIEDMAN N. Probabilistic graphical models:principles and techniques[M].[S.l.]:MIT Press, 2009.
[27] HINTON G E, SALAKHUTDINOV R R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786):504-507.
[28] FREUND Y, HAUSSLER D. Unsupervised learning of distributions on binary vectors using two layer networks[C]//Advances in neural information processing systems, 1992:912-919.
[29] HINTON G E. Training products of experts by minimizing contrastive divergence[J]. Neural computation, 2002, 14(8): 1771-1800.
[30] HINTON G E, OSINDERO S, TEH Y W. A fast learning algorithm for deep belief nets[J]. Neural computation, 2006, 18(7):1527-1554.
[31] BENGIO Y, LAMBLIN P, POPOVICI D, et al. Greedy layer-wise training of deep networks[C]//Advances in Neural Information Processing Systems, 2007:153-160.
[32] HINTON G. A practical guide to training restricted Boltzmann machines[J]. Momentum, 2010, 9(1):926.
[33] LIU C, YUEN J, TORRALBA A. Nonparametric scene parsing via label transfer[J]. IEEE transactions on pattern analysis and machine intelligence, 2011, 33(12):2368-2382.
[34] EVERINGHAM M, ESLAMI S M A, VAN GOOL L, et al. The pascal visual object classes challenge:A retrospective[J]. International journal of computer vision, 2015, 111(1):98-136.
[35] LECUN Y A, BOTTOU L, ORR G B, et al. Efficient backprop[M]//Neural networks:Tricks of the trade. Berlin Heidelberg:Springer, 2012:9-48.
[36] VEDALDI A, LENC K. Matconvnet:convolutional neural networks for matlab[C]//Proceedings of the 23rd ACM international conference on Multimedia. 2015:689-692.
[37] SCHMIDT M. UGM: a Matlab toolbox for probabilistic undirected graphical models[EB/OL]. [2016-12-20]. http://www.cs.ubc.ca/schmidtm/Software/UGM.html.
[38] DeepLearning 0.1 documentation[EB/OL]. 2014. http://deeplearning.net/tutorial/.
[39] HARIHARAN B, ARBELÁEZ P, BOURDEV L, et al. Semantic contours from inverse detectors[C]//2011 IEEE International Conference on Computer Vision (ICCV). 2011: 991-998.
[40] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//European Conference on Computer Vision. Springer, 2014: 740-755.

Memo:
Received: 2017-01-14.
Funding: National Key R&D Program of China (2016YFB1000400); National Natural Science Foundation of China (61573284); Fundamental Research Funds for the Central Universities (HEUCF100606).
Biographies: ZHANG Guoyin (b. 1962), male, professor, doctoral supervisor; WU Yanxia (b. 1979), female, associate professor.
Corresponding author: WU Yanxia, E-mail: wuyanxia@hrbeu.edu.cn.
Last Update: 2018-01-13