Linking Multiple User Identities of Multiple Services from Massive Mobility Traces

Published: 12 August 2021

Abstract

Understanding the linkability of online user identifiers (IDs) is critical to both service providers (for business intelligence) and individual users (for assessing privacy risks). Existing methods are designed to match IDs across two services, but in practice they face key challenges when matching across multiple services, particularly when users have multiple IDs per service. In this article, we propose a novel system that links IDs across multiple services by exploiting the spatial-temporal features of user activities. The core idea is that the same user's online IDs are more likely to repeatedly appear at the same location. Specifically, we first use a contact graph to capture the “co-location” of all IDs across multiple services. Based on this graph, we propose a set-wise matching algorithm to discover candidate ID sets and use Bayesian inference to generate confidence scores for candidate ranking, which we prove to be optimal. We evaluate our system using two real-world ground-truth datasets, one from an Internet service provider (4 services, 815K IDs) and one from Twitter-Foursquare (2 services, 770 IDs). Extensive results show that our system significantly outperforms state-of-the-art algorithms in accuracy (AUC is higher by 0.1–0.2) and that it is highly robust to variations in data quality, matching order, and the number of services.
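To make the co-location idea concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a hypothetical record format `(service, user_id, location, time_bin)`, builds a contact graph whose edge weights count how often two IDs share a location and time bin, and scores a pair with a toy two-hypothesis Bayesian update. The function names, the binomial likelihood, and the parameters `prior`, `p_same`, and `p_diff` are illustrative assumptions; the article's own set-wise matching and inference model are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def build_contact_graph(records):
    """Count shared (location, time_bin) cells for every pair of IDs.

    `records` is an iterable of (service, user_id, location, time_bin)
    tuples; this format is an assumption made for illustration.
    """
    cells = defaultdict(set)
    for service, user_id, location, time_bin in records:
        cells[(location, time_bin)].add((service, user_id))

    graph = defaultdict(int)
    for ids in cells.values():
        # Pairs within one service are kept too, because a user may
        # hold several IDs on the same service.
        for a, b in combinations(sorted(ids), 2):
            graph[(a, b)] += 1
    return dict(graph)


def link_posterior(k, n, prior=0.01, p_same=0.8, p_diff=0.05):
    """Posterior probability that two IDs belong to the same user.

    Toy model (assumed, not from the article): the pair was observed
    in n time bins and co-located in k of them; each bin is an
    independent trial with co-location probability p_same if the IDs
    share a user and p_diff otherwise.
    """
    same = prior * (p_same ** k) * ((1 - p_same) ** (n - k))
    diff = (1 - prior) * (p_diff ** k) * ((1 - p_diff) ** (n - k))
    return same / (same + diff)


records = [
    ("A", "a1", "cell1", 0),
    ("B", "b1", "cell1", 0),
    ("A", "a1", "cell2", 1),
    ("B", "b1", "cell2", 1),
    ("B", "b2", "cell2", 1),
]
graph = build_contact_graph(records)
# Rank candidate pairs by posterior, highest confidence first.
ranked = sorted(graph, key=lambda pair: link_posterior(graph[pair], 2), reverse=True)
```

In this toy data, the pair that co-occurs in both time bins ranks first. A real system would need to handle sparse observations and score sets of IDs jointly rather than isolated pairs.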

Published In

ACM Transactions on Intelligent Systems and Technology  Volume 12, Issue 4
August 2021
368 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/3468075
Editor: Huan Liu
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 August 2021
Accepted: 01 November 2020
Revised: 01 September 2020
Received: 01 April 2020
Published in TIST Volume 12, Issue 4

Author Tags

  1. Identity linkage
  2. spatio-temporal trajectory
  3. online services
  4. set-wise ID matching

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Key Research and Development Program of China
  • National Natural Science Foundation of China
  • Beijing Natural Science Foundation
  • Beijing National Research Center for Information Science and Technology
  • Tsinghua University - Tencent Joint Laboratory for Internet Innovation Technology
