TAN Long, ZHANG Xiaoqi, JIA Li, et al. A high-efficiency incremental truth discovery algorithm in big data[J]. Journal of Harbin Engineering University, 2019, 40(04): 805-812. [doi:10.11990/jheu.201808060]

A high-efficiency incremental truth discovery algorithm in big data

Journal of Harbin Engineering University [ISSN:1006-6977/CN:61-1281/TN]

Volume:
40
Issue:
2019, No. 04
Pages:
805-812
Publication date:
2019-04-05

Article Info

Title:
A high-efficiency incremental truth discovery algorithm in big data
Author(s):
TAN Long¹, ZHANG Xiaoqi¹, JIA Li², LI Jianzhong¹, WANG Hongzhi²
1. School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China;
2. Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Keywords:
Map-Reduce; Bayesian; truth discovery; incremental; voting mechanism; big data; data quality
CLC number:
TP311.1
DOI:
10.11990/jheu.201808060
Document code:
A
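Abstract:
Traditional truth discovery algorithms scale poorly on multi-source heterogeneous big data and perform badly at incremental truth discovery. To address these problems, this paper combines the Map-Reduce framework with a Bayesian truth discovery model and proposes a Map-Reduce-based parallel truth discovery algorithm (MPTF). Building on MPTF, it introduces the Incoop incremental framework and a voting-based classifier ensemble strategy, and optimizes both the Map and Reduce phases, yielding a high-efficiency incremental truth discovery algorithm for big data. Experiments show that the algorithm not only improves classifier accuracy but also performs truth discovery on newly added data sources. Theoretical analysis and experimental comparison show that the algorithm is efficient and broadly applicable, and that it can handle a variety of complex real-world scenarios.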
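The general shape described in the abstract (truth discovery expressed as Map and Reduce steps, followed by an incremental step when a new source arrives) can be illustrated with a minimal single-machine sketch. It rests on several assumptions: claims are hypothetical `(source, object, value)` triples; the paper's Bayesian model, its voting-based classifier ensemble and the Incoop framework are not reproduced, and simple reliability-weighted voting with a smoothed accuracy estimate is swapped in; the function names and the prior of 0.8 are illustrative only.

```python
from collections import defaultdict

# Hypothetical input: each claim is a (source, object, value) triple.

def map_phase(claims, reliability):
    """Map: emit (object, (value, weight)), weight = current source reliability."""
    for source, obj, value in claims:
        yield obj, (value, reliability[source])

def reduce_phase(mapped):
    """Shuffle + Reduce: per object, keep the value with the largest total weight."""
    groups = defaultdict(lambda: defaultdict(float))
    for obj, (value, weight) in mapped:
        groups[obj][value] += weight
    # Ties are broken by the value string so the result is deterministic.
    return {obj: max(votes.items(), key=lambda kv: (kv[1], kv[0]))[0]
            for obj, votes in groups.items()}

def update_reliability(claims, truths, prior=0.8):
    """Reliability = smoothed share of a source's claims that match current truths."""
    hits, totals = defaultdict(float), defaultdict(float)
    for source, obj, value in claims:
        totals[source] += 1
        hits[source] += value == truths[obj]
    # Smoothing toward a hypothetical prior keeps single-claim sources from hitting 0 or 1.
    return {s: (hits[s] + prior) / (totals[s] + 1) for s in totals}

def discover(claims, reliability=None, iterations=10, prior=0.8):
    """Alternate truth estimation (Map/Reduce) and reliability estimation."""
    reliability = dict(reliability or {})
    for source, _, _ in claims:
        reliability.setdefault(source, prior)  # unseen sources start at the prior
    truths = {}
    for _ in range(iterations):
        truths = reduce_phase(map_phase(claims, reliability))
        reliability = update_reliability(claims, truths, prior)
    return truths, reliability

def add_source(claims, new_claims, reliability, iterations=3):
    """Hypothetical incremental step: warm-start from previous reliabilities."""
    all_claims = claims + new_claims
    truths, rel = discover(all_claims, reliability, iterations)
    return all_claims, truths, rel

if __name__ == "__main__":
    claims = [("s1", "capital_AU", "Canberra"), ("s2", "capital_AU", "Canberra"),
              ("s3", "capital_AU", "Sydney"), ("s1", "height", "8848"),
              ("s2", "height", "8848"), ("s3", "height", "8850")]
    truths, rel = discover(claims)
    print(truths, rel)
    new_claims = [("s4", "capital_AU", "Canberra"), ("s4", "height", "8850")]
    _, truths2, rel2 = add_source(claims, new_claims, rel)
    print(truths2, rel2)
```

Warm-starting from the previous reliabilities is only one simple way to avoid restarting from scratch when a source is added; Incoop instead reuses memoized results of earlier Map and Reduce tasks, which this sketch does not model.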

Memo:
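Received: 2018-08-20.
Foundation items: General Program of the National Natural Science Foundation of China (81273649); General Program of the Natural Science Foundation of Heilongjiang Province (F201434).
Biographies: TAN Long, male, associate professor, Ph.D. candidate; WANG Hongzhi, male, professor, doctoral supervisor; LI Jianzhong.
Corresponding author: LI Jianzhong, E-mail: Lijzh@hit.edu.cn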
Last Update: 2019-04-03