[1] GUO Shiming, GAO Hong. An efficient algorithm for mining high utility itemsets from data streams based on sliding window techniques[J]. Journal of Harbin Engineering University, 2018, 39(04):721-729. [doi:10.11990/jheu.201611075]

An efficient algorithm for mining high utility itemsets from data streams based on sliding window techniques

Journal of Harbin Engineering University [ISSN:1006-6977/CN:61-1281/TN]

Volume:
39
Issue:
2018, No. 04
Pages:
721-729
Section:
Publication date:
2018-04-05

Article Info

Title:
An efficient algorithm for mining high utility itemsets from data streams based on sliding window techniques
Author(s):
GUO Shiming, GAO Hong
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
Keywords:
high utility itemsets; pattern growth; data streams; utility mining; sliding window; data mining
CLC number:
TP391
DOI:
10.11990/jheu.201611075
Document code:
A
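Abstract:
Existing sliding-window methods for mining high utility itemsets share two problems: the number of candidate itemsets is usually huge and requires a large amount of storage, and computing the true utility of those candidates is very time-consuming. This paper proposes HUISW (high utility itemset mining over a sliding window), a mining algorithm that does not generate candidate itemsets. HUISW uses a new tree structure, HUIL-Tree (high utility itemset tree which arranges items according to lexicographic order), to store the itemset information of the sliding window, and a utility database to store the utilities of itemsets in the window's transactions. During mining, HUISW applies pattern growth: for each itemset generated from the HUIL-Tree, it uses the itemset's correspondence with the utility database to compute its utility in the sliding window directly, so the whole process avoids candidate generation. In the experiments, HUISW is evaluated on data streams simulated from sparse and dense datasets and compared with the similar algorithm SHU-Growth (sliding window based high utility growth). The results show that HUISW significantly outperforms SHU-Growth, with running time improved by up to two orders of magnitude.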
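
The ideas named in the abstract (a sliding window, lexicographic item order, pattern growth, and exact utilities computed without a separate candidate-verification pass) can be illustrated with a small sketch. This is not HUISW and it builds no HUIL-Tree: it swaps in plain projected transaction lists, and it prunes with the transaction-weighted utility bound known from two-phase style algorithms. The class name `SlidingWindowMiner`, its methods, and the sample data are hypothetical, and item utilities are assumed to be non-negative.

```python
from collections import deque


class SlidingWindowMiner:
    """Toy high utility itemset miner over the most recent `size` transactions.

    Hypothetical illustration only; it does not reproduce HUISW or HUIL-Tree.
    """

    def __init__(self, size, min_util):
        # A bounded deque drops the oldest transaction when a new one arrives.
        self.window = deque(maxlen=size)
        self.min_util = min_util

    def add(self, transaction):
        """Append one transaction given as {item: utility}, utilities >= 0."""
        self.window.append(dict(transaction))

    def mine(self):
        """Return {itemset as sorted tuple: utility} for high utility itemsets."""
        items = sorted({i for t in self.window for i in t})  # lexicographic order
        # Projection of the empty prefix: every transaction, prefix utility 0.
        start = [(t, 0) for t in self.window]
        result = {}
        self._grow((), start, items, result)
        return result

    def _grow(self, prefix, projection, candidates, result):
        for pos, item in enumerate(candidates):
            # Keep the transactions that also contain `item` and accumulate
            # the exact utility of prefix + item in each of them.
            extended = [(t, u + t[item]) for t, u in projection if item in t]
            if not extended:
                continue
            itemset = prefix + (item,)
            utility = sum(u for _, u in extended)
            if utility >= self.min_util:
                result[itemset] = utility
            # Upper bound for every superset: the total utility of the
            # supporting transactions (transaction-weighted utility).
            bound = sum(sum(t.values()) for t, _ in extended)
            if bound >= self.min_util:
                self._grow(itemset, extended, candidates[pos + 1:], result)


miner = SlidingWindowMiner(size=3, min_util=10)
stream = [
    {"a": 5, "b": 2},
    {"a": 4, "c": 6},
    {"b": 3, "c": 1},
    {"a": 2, "b": 7, "c": 3},
]
for transaction in stream:
    miner.add(transaction)
found = miner.mine()
print(found)
```

With a window of three, the first transaction has already expired when `mine()` runs, so only the last three contribute. Because each itemset's utility is accumulated while the pattern grows, no second scan is needed to verify candidates, which is the property the abstract emphasizes; the tree and utility-database structures that make this efficient in HUISW are left out here.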

References:

[1] CHI Yun, WANG Haixun, YU P S, et al. Moment:maintaining closed frequent itemsets over a stream sliding window[C]//Proceedings of the 4th IEEE International Conference on Data Mining. Brighton, UK:IEEE, 2004:59-66.
[2] AGRAWAL R, SRIKANT R. Fast algorithms for mining association rules in large databases[C]//Proceedings of the 20th International Conference on Very Large Data Bases. San Francisco, CA:Morgan Kaufmann Press, 1994:487-499.
[3] CHU C J, TSENG V S, LIANG T. An efficient algorithm for mining temporal high utility itemsets from data streams[J]. Journal of systems and software, 2008, 81(7):1105-1117.
[4] LI Huafu. MHUI-max:an efficient algorithm for discovering high-utility itemsets from data streams[J]. Journal of information science, 2011, 37(5):532-545.
[5] LI Huafu, HUANG H Y, CHEN Yicheng, et al. Fast and memory efficient mining of high utility itemsets in data streams[C]//Proceedings of the 8th IEEE International Conference on Data Mining. Pisa, Italy:IEEE, 2008:881-886.
[6] HAN Jiawei, PEI Jian, YIN Yiwen. Mining frequent patterns without candidate generation[C]//Proceedings of 2000 ACM SIGMOD International Conference on Management of Data. Dallas, Texas, USA:ACM, 2000:1-12.
[7] AHMED C F, TANBEER S K, JEONG B S, et al. Interactive mining of high utility patterns over data streams[J]. Expert systems with applications, 2012, 39(15):11979-11991.
[8] RYANG H, YUN U. High utility pattern mining over data streams with sliding window technique[J]. Expert systems with applications, 2016, 57:214-231.
[9] AHMED C F, TANBEER S K, JEONG B S, et al. Efficient tree structures for high utility pattern mining in incremental databases[J]. IEEE transactions on knowledge and data engineering, 2009, 21(12):1708-1721.
[10] ZIHAYAT M, AN Aijun. Mining top-k high utility patterns over data streams[J]. Information sciences, 2014, 285:138-161.
[11] LEUNG C K S, KHAN Q I, LI Zhan, et al. CanTree:a canonical-order tree for incremental frequent-pattern mining[J]. Knowledge and information systems, 2007, 11(3):287-311.
[12] LIU Ying, LIAO Weikeng, CHOUDHARY A. A two-phase algorithm for fast discovery of high utility itemsets[C]//Proceedings of the 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining. Hanoi, Vietnam:Springer, 2005:689-695.
[13] LIU Mengchi, QU Junfeng. Mining high utility itemsets without candidate generation[C]//Proceedings of the 21st ACM International Conference on Information and Knowledge Management. Maui, Hawaii, USA:ACM, 2012:55-64.

Memo

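Received: 2016-11-24.
Funding: National Natural Science Foundation of China (61190115).
About the authors: GUO Shiming (1982-), male, PhD candidate; GAO Hong (1950-), female, professor, doctoral supervisor.
Corresponding author: GUO Shiming, E-mail: ggmyson@hit.edu.cn
Last Update: 2018-04-11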