Robust Spammer Detection in Microblogs: Leveraging User Carefulness

Authors:

Neil Zhenqiang Gong,

Guangzhong Sun,

Enhong Chen

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 8, Issue 6

Article No.: 83, Pages 1 - 31

https://doi.org/10.1145/3086637

Published: 18 August 2017

Abstract

Microblogging Web sites, such as Twitter and Sina Weibo, have become popular platforms for socializing and sharing information in recent years. Spammers have also discovered this new opportunity to unfairly overpower normal users with unsolicited content, namely social spam. Although it is intuitive for everyone to follow legitimate users, recent studies show that both legitimate users and spammers follow spammers, for different reasons. Evidence of users seeking out spammers on purpose has also been observed. We regard this behavior as useful information for spammer detection. In this article, we approach the problem of spammer detection by leveraging the “carefulness” of users, which indicates how careful a user is when she is about to follow a potential spammer. We propose a framework to measure carefulness and develop a supervised learning algorithm to estimate it based on known spammers and legitimate users. We illustrate how the robustness of detection algorithms can be improved with the aid of the proposed measure. Evaluations on two real datasets from Sina Weibo and Twitter with millions of users are performed, as well as an online test on Sina Weibo. The results show that our approach indeed captures carefulness and that it is effective for detecting spammers. In addition, we find that our measure is also beneficial for other applications, such as link prediction.

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining

Published In

ACM Transactions on Intelligent Systems and Technology Volume 8, Issue 6

Survey Paper, Regular Papers and Special Issue: Social Media Processing

November 2017

265 pages

ISSN: 2157-6904

EISSN: 2157-6912

DOI: 10.1145/3127339

Editor:
Yu Zheng
Microsoft Research, China

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 August 2017

Accepted: 01 March 2017

Revised: 01 May 2016

Received: 01 December 2015

Published in TIST Volume 8, Issue 6

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Science Foundation for Distinguished Young Scholars of China
Youth Innovation Promotion Association of the Chinese Academy of Sciences

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 24
Total Downloads: 376
Downloads (Last 12 months): 19
Downloads (Last 6 weeks): 0

Reflects downloads up to 14 Sep 2024

Citations

Cited By

Lu H, Gong D, Li Z, Liu F, Liu F (2023). SybilHP: Sybil Detection in Directed Social Networks with Adaptive Homophily Prediction. Applied Sciences 13:9 (5341). Online publication date: 25-Apr-2023
https://doi.org/10.3390/app13095341
Deng L, Wu C, Lian D, Wu Y, Chen E (2023). Markov-Driven Graph Convolutional Networks for Social Spammer Detection. IEEE Transactions on Knowledge and Data Engineering 35:12 (12310-12322). Online publication date: 1-Dec-2023
https://doi.org/10.1109/TKDE.2022.3150669
Furutani S, Shibahara T, Akiyama M, Aida M (2023). Interpreting Graph-Based Sybil Detection Methods as Low-Pass Filtering. IEEE Transactions on Information Forensics and Security 18 (1225-1236). Online publication date: 2023
https://doi.org/10.1109/TIFS.2023.3237364
Fei G, Cheng Y, Ma W, Chen C, Wen S, Hu G (2023). Real-Time Detection of COVID-19 Events From Twitter: A Spatial-Temporally Bursty-Aware Method. IEEE Transactions on Computational Social Systems 10:2 (656-672). Online publication date: Apr-2023
https://doi.org/10.1109/TCSS.2022.3169742
Guo Z, Yu K, Jolfaei A, Ding F, Zhang N (2022). Fuz-Spam: Label Smoothing-Based Fuzzy Detection of Spammers in Internet of Things. IEEE Transactions on Fuzzy Systems 30:11 (4543-4554). Online publication date: 1-Nov-2022
https://dl.acm.org/doi/10.1109/TFUZZ.2021.3130311
Huang Q, He Y, Wu B, Zhao H, Lian W (2022). Abnormal Behavior Analysis Based on Truth Discovery and Machine Learning. 2022 Global Conference on Robotics, Artificial Intelligence and Information Technology (GCRAIT) (83-88). Online publication date: Jul-2022
https://doi.org/10.1109/GCRAIT55928.2022.00026
Mewada A, Dewang R (2022). A comprehensive survey of various methods in opinion spam detection. Multimedia Tools and Applications 82:9 (13199-13239). Online publication date: 5-Sep-2022
https://dl.acm.org/doi/10.1007/s11042-022-13702-5
He Y, Yang P, Cheng P (2022). Semi-supervised internet water army detection based on graph embedding. Multimedia Tools and Applications 82:7 (9891-9912). Online publication date: 16-Sep-2022
https://dl.acm.org/doi/10.1007/s11042-022-13633-1
Nedunchezhian P, Mahalingam M (2022). SybilSort algorithm - a friend request decision tracking recommender system in online social networks. Applied Intelligence 52:4 (3995-4014). Online publication date: 1-Mar-2022
https://dl.acm.org/doi/10.1007/s10489-021-02578-x
Guo Z, Shen Y, Bashir A, Imran M, Kumar N, Zhang D, Yu K (2021). Robust Spammer Detection Using Collaborative Neural Network in Internet-of-Things Applications. IEEE Internet of Things Journal 8:12 (9549-9558). Online publication date: 15-Jun-2021
https://doi.org/10.1109/JIOT.2020.3003802
