DOI:10.1145/3394231.3397920
research-article

User Identity Linkage in Social Media Using Linguistic and Social Interaction Features

Published: 06 July 2020

Abstract

Social media users often hold several accounts to amplify the spread of their thoughts, ideas, and viewpoints. In the particular case of objectionable content, users tend to create multiple accounts to bypass the countermeasures enforced by social media platforms and thus retain their online identity even if some of their accounts are suspended. User identity linkage aims to reveal social media accounts likely to belong to the same natural person so as to prevent the spread of abusive or illegal activities. To this end, this work proposes a machine learning-based detection model that uses multiple attributes of users’ online activity to identify whether two or more virtual identities belong to the same natural person. The model’s efficacy is demonstrated in two case studies on abusive and terrorism-related Twitter content.

Supplementary Material

MP4 File (3394231.3397920.mp4)
Presentation Video

Published In

WebSci '20: Proceedings of the 12th ACM Conference on Web Science
July 2020
361 pages
ISBN:9781450379892
DOI:10.1145/3394231
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Abusive and Illegal content
  2. Actor identity resolution
  3. Twitter

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • CONNEXIONs
  • PREVISION

Conference

WebSci '20
WebSci '20: 12th ACM Conference on Web Science
July 6 - 10, 2020
Southampton, United Kingdom

Acceptance Rates

Overall Acceptance Rate 245 of 933 submissions, 26%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 55
  • Downloads (Last 6 weeks): 12
Reflects downloads up to 15 Sep 2024

Citations

Cited By

  • (2024) Topic Partition of User-Generated Texts for User Identity Linkage Across Social Networks. 2024 International Joint Conference on Neural Networks (IJCNN), 1-7. DOI: 10.1109/IJCNN60899.2024.10651152. Online publication date: 30-Jun-2024
  • (2024) From Research to Applications: What Can We Extract with Social Media Sensing? SN Computer Science 5:5. DOI: 10.1007/s42979-024-02712-9. Online publication date: 24-Apr-2024
  • (2023) The Effect of Using Hashtags on Consumer Engagement with The Promotion of Property Products. 2023 International Conference on Informatics, Multimedia, Cyber and Informations System (ICIMCIS), 61-66. DOI: 10.1109/ICIMCIS60089.2023.10349084. Online publication date: 7-Nov-2023
  • (2023) A Review of User Identity Linkage Across Social Networks. 2023 8th International Conference on Data Science in Cyberspace (DSC), 429-436. DOI: 10.1109/DSC59305.2023.00068. Online publication date: 18-Aug-2023
  • (2023) Optimizing user profile matching: a text-based approach. International Journal of Computers and Applications 45:5, 403-412. DOI: 10.1080/1206212X.2023.2218244. Online publication date: 30-May-2023
  • (2022) User Identity Linkage across Social Networks with the Enhancement of Knowledge Graph and Time Decay Function. Entropy 24:11, 1603. DOI: 10.3390/e24111603. Online publication date: 4-Nov-2022
  • (2021) UGCLink: User Identity Linkage by Modeling User Generated Contents with Knowledge Distillation. 2021 IEEE International Conference on Big Data (Big Data), 607-613. DOI: 10.1109/BigData52589.2021.9671907. Online publication date: 15-Dec-2021
  • (2020) Multimodal Social Media Mining. Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop, 5-6. DOI: 10.1145/3423327.3423511. Online publication date: 16-Oct-2020
