research-article

Linking Multiple User Identities of Multiple Services from Massive Mobility Traces

Authors:

Depeng Jin

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 12, Issue 4

Article No.: 39, Pages 1 - 28

https://doi.org/10.1145/3439817

Published: 12 August 2021

Abstract

Understanding the linkability of online user identifiers (IDs) is critical to both service providers (for business intelligence) and individual users (for assessing privacy risks). Existing methods are designed to match IDs across two services, but in practice they face key challenges when matching across multiple services, particularly when users have multiple IDs per service. In this article, we propose a novel system that links IDs across multiple services by exploiting the spatial-temporal features of user activities. The core idea is that the same user's online IDs are more likely to appear repeatedly at the same location. Specifically, we first use a contact graph to capture the “co-location” of all IDs across multiple services. Based on this graph, we propose a set-wise matching algorithm to discover candidate ID sets, and we use Bayesian inference to generate confidence scores for candidate ranking, which is proved to be optimal. We evaluate our system using two real-world ground-truth datasets: one from an Internet service provider (4 services, 815K IDs) and one from Twitter-Foursquare (2 services, 770 IDs). Extensive results show that our system significantly outperforms state-of-the-art algorithms in accuracy (AUC is higher by 0.1–0.2) and that it is highly robust to variations in data quality, matching order, and number of services.
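
To make the co-location idea concrete, the following is a minimal sketch of how a contact graph over IDs could be built from mobility records. It is not the authors' implementation: the record layout `(service, user_id, timestamp, location)`, the function name `build_contact_graph`, and the one-hour `time_bin` are all assumptions chosen for illustration. Edge weights simply count how often two IDs fall into the same location and time bin; the set-wise matching and Bayesian scoring steps described in the abstract are not shown.

```python
from collections import defaultdict
from itertools import combinations


def build_contact_graph(records, time_bin=3600):
    """Count co-locations between IDs (illustrative sketch only).

    records: iterable of (service, user_id, timestamp_seconds, location).
    time_bin: bin width in seconds (an assumed discretization, not from the paper).
    Returns a dict mapping a pair of (service, user_id) nodes to the number
    of (location, time bin) cells in which both IDs appear.
    """
    # Group the IDs observed in each (location, time bin) cell.
    cells = defaultdict(set)
    for service, user_id, timestamp, location in records:
        cells[(location, timestamp // time_bin)].add((service, user_id))

    # Every pair of IDs sharing a cell gains one unit of edge weight.
    weights = defaultdict(int)
    for ids in cells.values():
        for a, b in combinations(sorted(ids), 2):
            weights[(a, b)] += 1
    return dict(weights)


if __name__ == "__main__":
    # Hypothetical toy records: two services, three IDs.
    toy = [
        ("A", "a1", 0, "x"),
        ("B", "b1", 100, "x"),
        ("A", "a1", 4000, "y"),
        ("B", "b1", 4100, "y"),
        ("B", "b2", 4200, "z"),
    ]
    for edge, weight in build_contact_graph(toy).items():
        print(edge, weight)
```

In this toy input, IDs `a1` and `b1` share two cells, so their edge carries more weight than any edge involving `b2`, which never co-occurs with another ID. Pairs within the same service are counted as well, since the abstract notes that a user may hold multiple IDs per service.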

Published In

ACM Transactions on Intelligent Systems and Technology Volume 12, Issue 4

August 2021

368 pages

ISSN: 2157-6904

EISSN: 2157-6912

DOI: 10.1145/3468075

Editor:
Huan Liu
Arizona State University, USA

Copyright © 2021 Association for Computing Machinery.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 August 2021

Accepted: 01 November 2020

Revised: 01 September 2020

Received: 01 April 2020

Published in TIST Volume 12, Issue 4

Qualifiers

Research-article
Refereed

Funding Sources

National Key Research and Development Program of China
National Nature Science Foundation of China
Beijing Natural Science Foundation
Beijing National Research Center for Information Science and Technology
Tsinghua University - Tencent Joint Laboratory for Internet Innovation Technology
