[1]徐森,卢志茂,张春祥,等.使用证据累积的文本聚类谱算法[J].哈尔滨工程大学学报,2010,(08):0.
 XU Sen,LU Zhi mao,ZHANG Chun xiang,et al.A document clustering spectral algorithm that uses evidence accumulation[J].hebgcdxxb,2010,(08):0.

A document clustering spectral algorithm that uses evidence accumulation

Journal of Harbin Engineering University [ISSN: 1006-6977 / CN: 61-1281/TN]

Issue:
2010, No. 08
Pages:
0
Publication date:
2010-08-25

Article information

Title:
A document clustering spectral algorithm that uses evidence accumulation
Author(s):
XU Sen, LU Zhimao, ZHANG Chunxiang, GU Guochang, ZHANG Qi
(1. School of Information Engineering, Yancheng Institute of Technology, Yancheng 224000, Jiangsu, China; 2. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China; 3. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150001, China)
Keywords:
cluster analysis; document clustering; spectral clustering; evidence accumulation; hyperspherical K-means
CLC number:
TP391
Document code:
A
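Abstract:
To address the difficulty of choosing a similarity function for spectral clustering, a document clustering spectral algorithm that uses evidence accumulation is proposed. The algorithm runs the hyperspherical K-means algorithm on the document set multiple times and treats each resulting partition as evidence of whether two documents should be placed in the same cluster; from this evidence it constructs the document similarity matrix and the normalized Laplacian matrix. Experiments on the TREC and Reuters document collections verify the effectiveness of the proposed algorithm, which outperforms hierarchical clustering algorithms and the K-means algorithm provided by CLUTO.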
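The abstract describes the similarity-construction stage only at a high level. The Python below is a minimal sketch of that stage under stated assumptions: hyperspherical K-means in the sense of Dhillon and Modha [11], a co-association matrix in the sense of Fred and Jain [10] (entry (i, j) is the fraction of runs that put documents i and j together), and the symmetric normalized Laplacian L = I - D^(-1/2) S D^(-1/2) from von Luxburg's tutorial [3]. The function names `spherical_kmeans`, `co_association` and `normalized_laplacian`, the number of runs, the seeds and the iteration count are hypothetical illustrations, not details taken from the paper, and the final eigenvector and clustering step is omitted.

```python
import math
import random

# Hypothetical sketch: function names, run count and defaults are illustrative,
# not taken from the paper.


def normalize(v):
    """Scale a vector to unit Euclidean length (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def spherical_kmeans(docs, k, seed, iters=20):
    """Hyperspherical K-means: documents and centroids live on the unit sphere,
    and each document joins the centroid with the highest cosine similarity."""
    rng = random.Random(seed)
    docs = [normalize(d) for d in docs]
    centers = [list(c) for c in rng.sample(docs, k)]
    labels = [0] * len(docs)
    for _ in range(iters):
        # Assignment: the dot product equals the cosine because all vectors are unit length.
        labels = [
            max(range(k), key=lambda j: sum(a * b for a, b in zip(d, centers[j])))
            for d in docs
        ]
        # Update: each centroid is the re-normalized sum of its member documents.
        for j in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:
                centers[j] = normalize([sum(col) for col in zip(*members)])
    return labels


def co_association(partitions):
    """Evidence accumulation: S[i][j] is the fraction of partitions that place
    documents i and j in the same cluster."""
    n, r = len(partitions[0]), len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / r for c in row] for row in counts]


def normalized_laplacian(S):
    """Symmetric normalized Laplacian L = I - D^(-1/2) S D^(-1/2)."""
    n = len(S)
    d = [sum(row) for row in S]  # each d[i] >= 1 because S[i][i] == 1
    return [
        [(1.0 if i == j else 0.0) - S[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy usage with made-up term-frequency vectors and 10 runs with different seeds.
docs = [[3, 1, 0, 0], [2, 1, 0, 0], [0, 0, 4, 1], [0, 1, 3, 2]]
partitions = [spherical_kmeans(docs, k=2, seed=s) for s in range(10)]
L = normalized_laplacian(co_association(partitions))
```

Individual runs may disagree because each starts from different random centroids; the co-association matrix turns that disagreement into graded similarity instead of requiring a hand-tuned similarity function. A spectral step would then take the eigenvectors of L with the smallest eigenvalues (for example via a linear-algebra routine such as `numpy.linalg.eigh`) and cluster their rows; the sketch stops before that so it runs on the standard library alone.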

References:

[1]TAN P N, STEINBACH M, KUMAR V. Introduction to data mining [M]. MA: Addison-Wesley Longman, 2005: 487-647.
[2]XU Sen, LU Zhimao, GU Guochang. Two spectral algorithms for ensembling document clusters [J]. Acta Automatica Sinica, 2009, 35(7): 997-1002.
[3]LUXBURG U V. A tutorial on spectral clustering [J]. Statistics and Computing, 2007, 17(4): 395-416.
[4]HAGEN L, KAHNG A B. New spectral methods for ratio cut partitioning and clustering [J]. IEEE Transactions on Computer-Aided Design, 1992, 11(9): 1074-1085.
[5]SHI J, MALIK J. Normalized cuts and image segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8): 888-905.
[6]NG A Y, JORDAN M I, WEISS Y. On spectral clustering: analysis and an algorithm [C]// Advances in Neural Information Processing Systems. Vancouver, Canada, 2001.
[7]MEILA M, SHI J. A random walks view of spectral segmentation [C]// The 8th International Workshop on Artificial Intelligence and Statistics. Key West, USA, 200
[8]WANG Ling, BO Liefeng, JIAO Licheng. Density-sensitive spectral clustering [J]. Acta Electronica Sinica, 2007, 35(8): 1577-1581.
[9]FRED A, JAIN A K. Data clustering using evidence accumulation [C]// Proceedings of the 16th International Conference on Pattern Recognition. Quebec City, Canada, 2002: 276-280.
[10]FRED A L, JAIN A K. Combining multiple clusterings using evidence accumulation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(6): 835-850.
[11]DHILLON I S, MODHA D S. Concept decompositions for large sparse text data using clustering [J]. Machine Learning, 2001, 42: 143-175.
[12]LUXBURG U V, BELKIN M, BOUSQUET O. Consistency of spectral clustering [J]. The Annals of Statistics, 2008, 36(2): 555-586.
[13]STREHL A, GHOSH J. Cluster ensembles: a knowledge reuse framework for combining multiple partitions [J]. Journal of Machine Learning Research, 2002, 3: 583-617.
[14]TREC. Text retrieval conference [EB/OL]. [2007-11-28]. http://trec.nist.gov.
[15]LEWIS D D. Reuters-21578 text categorization test collection distribution 1.0 [EB/OL]. [2007-11-28]. http://www.research.att.com/~lewis.

Related articles:

[1]ZHOU Shuang, ZHANG Junping, ZHANG Feng, et al. Clustering method for remote sensing images based on an ant colony algorithm[J]. Journal of Harbin Engineering University, 2009, (02): 210.
[2]LI Tao, PEI Wenjiang, WANG Shaoping, et al. Competitive and dynamic cooperative learning algorithm[J]. Journal of Harbin Engineering University, 2010, (01): 102.
[3]LU Zhimao, LI Chun, ZHANG Qi. A document cluster ensemble spectral algorithm based on affinity propagation[J]. Journal of Harbin Engineering University, 2012, (07): 899. [doi:10.3969/j.issn.1006-7043.201109001]
[4]CHEN Wei, ZHOU Wen. Research on the efficiency of indigenous innovation of the aerospace industry in China[J]. Journal of Harbin Engineering University, 2014, (06): 777. [doi:10.3969/j.issn.1006-7043.201306010]
[5]ZHANG Bing, YANG Jing, ZHANG Jianpei, et al. A neighborhood topological potential entropy data perturbation method for clustering analysis[J]. Journal of Harbin Engineering University, 2014, (09): 1149. [doi:10.3969/j.issn.1006-7043.201311034]
[6]YU Delong, SUN Baitao, YAN Peilei. Use of a multi-indicator system to construct a kilometer grid-based urban and rural classification model[J]. Journal of Harbin Engineering University, 2015, (12): 1584. [doi:10.11990/jheu.201507070]

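Memo:
Supported by the National Natural Science Foundation of China (60603092, 60903082, 60975042) and the Specialized Research Fund for the Doctoral Program of Higher Education of China (20070217043).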
Last update: 2010-09-03