research-article

User Identity Linkage in Social Media Using Linguistic and Social Interaction Features

Authors:

Despoina Chatzakou,

Juan Soler-Company,

Theodora Tsikrika,

Stefanos Vrochidis,

Ioannis Kompatsiaris

WebSci '20: Proceedings of the 12th ACM Conference on Web Science

Pages 295 - 304

https://doi.org/10.1145/3394231.3397920

Published: 06 July 2020

Abstract

Social media users often hold several accounts in an effort to multiply the spread of their thoughts, ideas, and viewpoints. In the particular case of objectionable content, users tend to create multiple accounts to bypass the countermeasures enforced by social media platforms and thus retain their online identity even if some of their accounts are suspended. User identity linkage aims to reveal social media accounts likely to belong to the same natural person so as to prevent the spread of abusive or illegal activities. To this end, this work proposes a machine learning-based detection model that uses multiple attributes of users’ online activity to identify whether two or more virtual identities belong to the same natural person. The model’s efficacy is demonstrated on two cases involving abusive and terrorism-related Twitter content.

Supplementary Material

MP4 File (3394231.3397920.mp4)

Presentation Video




Published In


WebSci '20: Proceedings of the 12th ACM Conference on Web Science

July 2020

361 pages

ISBN: 9781450379892

DOI: 10.1145/3394231

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

CONNEXIONs
PREVISION

Conference

WebSci '20

Sponsor:

SIGWEB

WebSci '20: 12th ACM Conference on Web Science

July 6 - 10, 2020

Southampton, United Kingdom

Acceptance Rates

Overall Acceptance Rate 245 of 933 submissions, 26%
