Abstract
Vertical Federated Learning (VFL) is becoming a standard collaborative learning paradigm with various practical applications. Randomness is essential to enhancing privacy in VFL, but introducing too much external randomness often leads to an intolerable performance loss. Instead, as has been demonstrated for other federated learning settings, leveraging internal randomness – as provided by variational autoencoders (VAEs) – can be beneficial. However, the resulting privacy has never been quantified so far, nor has the approach been investigated for VFL.
We therefore propose a novel differential privacy (DP) estimate, denoted as distance-based empirical local differential privacy (\(\textsf{dELDP}\)). It allows us to empirically bound DP parameters of models or model components, quantifying the internal randomness with appropriate distance and sensitivity metrics. We apply \(\textsf{dELDP}\) to investigate the DP of VAEs and observe values up to \(\epsilon \approx 6.4\) and \(\delta = 2^{-32}\). Based on this, to link the \(\textsf{dELDP}\) parameters to the privacy of VFL systems that include VAEs in practice, we conduct comprehensive experiments on the robustness against state-of-the-art privacy attacks. The results illustrate that the VAE-based system is robust against feature reconstruction attacks and outperforms other privacy-enhancing methods for VFL, especially in label inference attacks where the adversary holds 75% of the features.
Y. Sun and L. Duan—These authors contributed equally to this work.
Notes
- 1.
In Step 1 in Fig. 1, features of the same individual have to be aligned by the identifier. If \(\mathcal {A}\) participates in training, \(\mathcal {A}\) sees the (quasi-)identifier of every sample.
- 2.
\(V(x) \in \mathbb {R} \), so \(|V(x)| = ||V(x)||_2\). The VAE parameter \(\theta \) has been trained.
- 3.
We abuse the notation by letting \(\sigma _{\ell } = \sqrt{\sigma _{\ell }^{\textsf{T}}\sigma _{\ell }}\) so we have the form in the theorem.
References
Abadi, M., et al.: Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318 (2016)
Bai, J., Wang, W., Gomes, C.P.: Contrastively disentangled sequential variational autoencoder. In: Advances in Neural Information Processing Systems, vol. 34, pp. 10105–10118 (2021)
Bator, M.: Dataset for Sensorless Drive Diagnosis. UCI Machine Learning Repository (2015)
Bernstein, J., Wang, Y.X., Azizzadenesheli, K., Anandkumar, A.: signSGD: compressed optimisation for non-convex problems. In: International Conference on Machine Learning, pp. 560–569. PMLR (2018)
Bird, J.J., Faria, D.R., Premebida, C., Ekárt, A., Vogiatzis, G.: Look and listen: a multi-modality late fusion approach to scene classification for autonomous machines. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 10380–10385. IEEE (2020)
Bonawitz, K., et al.: Practical secure aggregation for privacy-preserving machine learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1175–1191 (2017)
Burchard, P., Daoud, A., Dotterrer, D.: Empirical differential privacy. arXiv preprint arXiv:1910.12820 (2019)
Cohen, G., Afshar, S., Tapson, J., Van Schaik, A.: EMNIST: extending MNIST to handwritten letters. In: 2017 International Joint Conference on Neural Networks (IJCNN), pp. 2921–2926. IEEE (2017)
Dai, Y., et al.: Improving adversarial robustness of medical imaging systems via adding global attention noise. Comput. Biol. Med. 164, 107251 (2023)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Duan, Y.: Privacy without noise. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 1517–1520 (2009)
Dwork, C., Roth, A., et al.: The algorithmic foundations of differential privacy. Found. Trends® Theor. Comput. Sci. 9(3–4), 211–407 (2014)
Erlingsson, Ú., Feldman, V., Mironov, I., Raghunathan, A., Talwar, K., Thakurta, A.: Amplification by shuffling: from local to central differential privacy via anonymity. In: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 2468–2479. SIAM (2019)
Fu, C., et al.: Label inference attacks against vertical federated learning. In: 31st USENIX Security Symposium (USENIX Security 2022), pp. 1397–1414 (2022)
Geng, J., et al.: Towards general deep leakage in federated learning. arXiv preprint arXiv:2110.09074 (2021)
Grining, K., Klonowski, M.: Towards extending noiseless privacy: dependent data and more practical approach. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 546–560 (2017)
Jiang, X., Zhou, X., Grossklags, J.: Comprehensive analysis of privacy leakage in vertical federated learning during prediction. Proc. Priv. Enhancing Technol. 2022(2), 263–281 (2022)
Kairouz, P., et al.: Advances and open problems in federated learning. Found. Trends Mach. Learn. 14(1–2), 1–210 (2021)
Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)
Knott, B., Venkataraman, S., Hannun, A., Sengupta, S., Ibrahim, M., van der Maaten, L.: CrypTen: secure multi-party computation meets machine learning. In: Advances in Neural Information Processing Systems, vol. 34, pp. 4961–4973 (2021)
Koker, T., Mireshghallah, F., Titcombe, T., Kaissis, G.: U-noise: learnable noise masks for interpretable image segmentation. In: 2021 IEEE International Conference on Image Processing (ICIP), pp. 394–398. IEEE (2021)
Criteo dataset (2021). https://labs.criteo.com/2014/02/download-kaggle-display-advertising-challenge-dataset/
Li, H., Lin, H., Polychroniadou, A., Tessaro, S.: LERNA: secure single-server aggregation via key-homomorphic masking. In: Guo, J., Steinfeld, R. (eds.) ASIACRYPT 2023. LNCS, vol. 14438, pp. 302–334. Springer, Cham (2023). https://doi.org/10.1007/978-981-99-8721-4_10
Li, X., et al.: Opboost: a vertical federated tree boosting framework based on order-preserving desensitization. Proc. VLDB Endow. 16(2), 202–215 (2022). https://doi.org/10.14778/3565816.3565823
Liu, Y., et al.: Vertical federated learning. arXiv preprint arXiv:2211.12814 (2022)
Luo, X., Wu, Y., Xiao, X., Ooi, B.C.: Feature inference attack on model predictions in vertical federated learning. In: 2021 IEEE 37th International Conference on Data Engineering (ICDE), pp. 181–192. IEEE (2021)
Neumeier, M., Botsch, M., Tollkühn, A., Berberich, T.: Variational autoencoder-based vehicle trajectory prediction with an interpretable latent space. In: 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), pp. 820–827. IEEE (2021)
Orekondy, T., Schiele, B., Fritz, M.: Knockoff nets: stealing functionality of black-box models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4954–4963 (2019)
Ou, W., Zeng, J., Guo, Z., Yan, W., Liu, D., Fuentes, S.: A homomorphic-encryption-based vertical federated learning scheme for rick management. Comput. Sci. Inf. Syst. 17(3), 819–834 (2020)
Ranbaduge, T., Ding, M.: Differentially private vertical federated learning. arXiv preprint arXiv:2211.06782 (2022)
Scheliga, D., Mäder, P., Seeland, M.: PRECODE: a generic model extension to prevent deep gradient leakage. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1849–1858 (2022)
Seif, M., Tandon, R., Li, M.: Wireless federated learning with local differential privacy. In: 2020 IEEE International Symposium on Information Theory (ISIT), pp. 2604–2609. IEEE (2020)
Shridhar, K., Laumann, F., Liwicki, M.: A comprehensive guide to Bayesian convolutional neural network with variational inference. arXiv preprint arXiv:1901.02731 (2019)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Truex, S., Liu, L., Chow, K.H., Gursoy, M.E., Wei, W.: LDP-Fed: federated learning with local differential privacy. In: Proceedings of the Third ACM International Workshop on Edge Systems, Analytics and Networking, pp. 61–66 (2020)
Yang, C., Wang, X., Mao, S.: Autotag: recurrent variational autoencoder for unsupervised apnea detection with RFID tags. In: 2018 IEEE Global Communications Conference (GLOBECOM), pp. 1–7. IEEE (2018)
Yang, J., Shi, R., Ni, B.: MedMNIST classification decathlon: a lightweight AutoML benchmark for medical image analysis. In: IEEE 18th International Symposium on Biomedical Imaging (ISBI), pp. 191–195 (2021)
Yang, Q., Liu, Y., Chen, T., Tong, Y.: Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. (TIST) 10(2), 1–19 (2019)
Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. In: Advances in Neural Information Processing Systems, vol. 28 (2015)
Zhu, L., Liu, Z., Han, S.: Deep leakage from gradients. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Acknowledgments
This research is funded by the European Research Center of Huawei Technologies. Asja Fischer acknowledges support by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC 2092 CASA - 390781972. We thank the anonymous reviewers for their constructive comments and suggestions.
A Proof of Theorem 2
Here is the complete proof of the \(\textsf{dLDP}\) main theorem (Theorem 2).
Proof
We start with the one-dimensional case and then lift the result to the high-dimensional case. If the distance is \(< \infty \), it is 1 by definition for \(x, x'\) with \(\textsf{Label} (x) = \textsf{Label} (x')\) in (8). We first consider real-valued \(\mu ({x}), V(x)\). Fix a label \(\ell = \textsf{Label} (x) = \textsf{Label} (x')\). The sensitivity is
We then consider the absolute value of the privacy loss for \(x, x'\):
where \(\sigma _{1} = ||V(x)||_2\) and \(\sigma _{2} = ||V(x')||_2\), and \(V(\cdot )\) is fixed (Footnote 2).
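The displayed equations are not reproduced in this version of the text. For reference, the quantity bounded in (12) is the standard log-density ratio of two univariate Gaussians; in the notation above, a sketch (assuming the two posteriors are \(\mathcal{N}(\mu (x), \sigma _1^2)\) and \(\mathcal{N}(\mu (x'), \sigma _2^2)\)) reads:

```latex
\left|\, \ln \frac{p_{\mathcal{N}(\mu(x),\,\sigma_{1}^{2})}(z)}
                  {p_{\mathcal{N}(\mu(x'),\,\sigma_{2}^{2})}(z)} \,\right|
  = \left|\, \ln \frac{\sigma_{2}}{\sigma_{1}}
  + \frac{\bigl(z-\mu(x')\bigr)^{2}}{2\sigma_{2}^{2}}
  - \frac{\bigl(z-\mu(x)\bigr)^{2}}{2\sigma_{1}^{2}} \,\right| .
```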
Without loss of generality, we assume \(\sigma _{1} \le \sigma _{2}\). Then we have
So to bound (12), we can bound the left-hand side of (13) instead, i.e., we need an \(\epsilon _1\), such that
Thus, if we iterate over all x with \(\textsf{Label} (x) = \ell \) and let
the bound for the privacy loss w.r.t. label \( \ell \) is
With an argument similar to the proof of Theorem A.1 in [12], and since \(|x|\) is identical to \(|| x ||_{2}\) for \(x \in \mathbb {R} \), we obtain the relation
By taking \(\epsilon = \max (\{\epsilon _{\ell }\})\), we can conclude that Theorem 2 is correct for one-dimensional posterior sampling in VAE.
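The one-dimensional argument above can be illustrated numerically. The following is a minimal sketch (not the authors' code; all concrete values and the function names are hypothetical) that estimates an empirical \(\epsilon \) for Gaussian posterior sampling by taking the \((1-\delta )\)-quantile of the absolute privacy loss under samples from the first posterior:

```python
import numpy as np

def log_density_ratio(z, mu1, s1, mu2, s2):
    """Log ratio of two univariate Gaussian densities at z."""
    return (np.log(s2 / s1)
            + (z - mu2) ** 2 / (2 * s2 ** 2)
            - (z - mu1) ** 2 / (2 * s1 ** 2))

def empirical_epsilon(mu1, s1, mu2, s2, delta=2.0 ** -32, n=1_000_000, seed=0):
    """Estimate epsilon as the (1 - delta)-quantile of |privacy loss|
    over samples z ~ N(mu1, s1^2)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(mu1, s1, size=n)
    loss = np.abs(log_density_ratio(z, mu1, s1, mu2, s2))
    return np.quantile(loss, 1.0 - delta)

# Illustrative inputs x, x' with the same label, mapped to nearby posteriors.
eps = empirical_epsilon(mu1=0.0, s1=1.0, mu2=0.5, s2=1.0)
print(f"empirical epsilon ~= {eps:.3f}")
```

Note that reliably resolving a \(\delta \) as small as \(2^{-32}\) would require far more samples than used here; the sketch only illustrates the shape of the computation, not the tightness of the paper's bound.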
We use the convention \(\mu _{x}:=\mu (x)\). For the high-dimensional case, i.e., \(\mu _{x}, x, V(x) \in \mathbb {R} ^{m}\), the sensitivity becomes
Similarly, we are interested in the privacy loss w.r.t. label \(\ell \)
For \(\sigma _{1} = V(x)\) and \(\sigma _{2} = V(x')\) with \(\sigma _{1}^{\textsf{T}}\sigma _1 \le \sigma _{2}^{\textsf{T}}\sigma _2\), we obtain
If we assume that \(\mu _{x}-\mu _{x'}\) with the maximum norm is aligned with x (which gives the biggest denominator), we are back to the one-dimensional case, i.e., for
the bound \(\epsilon _{\ell }\) for the privacy loss must fulfill
So (20) holds in the high-dimensional case, and Theorem 2 follows (Footnote 3).
\(\square \)
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Sun, Y. et al. (2024). Exploiting Internal Randomness for Privacy in Vertical Federated Learning. In: Garcia-Alfaro, J., Kozik, R., Choraś, M., Katsikas, S. (eds) Computer Security – ESORICS 2024. ESORICS 2024. Lecture Notes in Computer Science, vol 14983. Springer, Cham. https://doi.org/10.1007/978-3-031-70890-9_20
Print ISBN: 978-3-031-70889-3
Online ISBN: 978-3-031-70890-9