
MP2SDA: Multi-Party Parallelized Sparse Discriminant Learning

Published: 13 March 2020

Abstract

Sparse Discriminant Analysis (SDA) has been widely used to improve the performance of classical Fisher’s Linear Discriminant Analysis in supervised metric learning, feature selection, and classification. With the increasing need for distributed data collection, storage, and processing, enabling sparse discriminant learning in multi-party distributed computing environments has become an emerging research topic. This article proposes a novel multi-party SDA algorithm that learns SDA models effectively without sharing any raw data or basic statistics among machines. The proposed algorithm (1) leverages the direct estimation of SDA to derive a distributed loss function for discriminant learning, (2) parameterizes the distributed loss function with local/global estimates obtained through bootstrapping, and (3) approximates the global estimate of the linear discriminant projection vector by optimizing this “distributed bootstrapping loss function” with gossip-based stochastic gradient descent. Experimental results on both synthetic and real-world benchmark datasets show that our algorithm matches the performance of SDA trained on aggregated data, and significantly outperforms the most recent distributed SDA in terms of accuracy and F1-score.
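The gossip-based optimization step described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: it assumes each party k holds local bootstrap estimates of the covariance matrix (`sigma_k`) and the class-mean difference (`delta_k`), and it substitutes a generic quadratic-plus-l1 surrogate for the paper's distributed bootstrapping loss function, in the spirit of direct SDA estimation (minimize 0.5 β'Σβ − δ'β + λ‖β‖₁). All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 4, 10                              # parties, feature dimension
true_beta = rng.normal(size=d)            # ground-truth projection vector

# Hypothetical local (bootstrap) estimates held privately by each party.
sigmas, deltas = [], []
for _ in range(K):
    A = 0.1 * rng.normal(size=(d, d))
    sigma = np.eye(d) + A @ A.T           # local covariance estimate (SPD)
    sigmas.append(sigma)
    deltas.append(sigma @ true_beta + rng.normal(scale=0.01, size=d))

betas = [np.zeros(d) for _ in range(K)]   # each party's copy of the projection vector
lr, lam = 0.05, 1e-3

for step in range(500):
    # Local gradient step on f_k(beta) = 0.5 * beta' Sigma_k beta - delta_k' beta,
    # followed by soft-thresholding for the l1 (sparsity) penalty.
    for k in range(K):
        grad = sigmas[k] @ betas[k] - deltas[k]
        b = betas[k] - lr * grad
        betas[k] = np.sign(b) * np.maximum(np.abs(b) - lr * lam, 0.0)
    # Gossip step: two random parties average their model parameters,
    # exchanging only parameters -- never raw data or local statistics.
    i, j = rng.choice(K, size=2, replace=False)
    avg = 0.5 * (betas[i] + betas[j])
    betas[i], betas[j] = avg.copy(), avg.copy()

consensus = np.mean(betas, axis=0)        # parties converge toward a shared estimate
```

Because every party alternates local descent with pairwise averaging, the copies drift toward a common minimizer of the average of the local losses, which is what lets the scheme approximate the aggregated solution without centralizing data.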




Published In

ACM Transactions on Knowledge Discovery from Data  Volume 14, Issue 3
June 2020
381 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3388473
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 March 2020
Accepted: 01 December 2019
Revised: 01 October 2019
Received: 01 March 2019
Published in TKDD Volume 14, Issue 3


Author Tags

  1. Sparse discriminant analysis
  2. distributed
  3. multi-party
  4. parallelized

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • NSF: RAISE: CA-FW-HTF: Prepare the US Labor Force for Future Jobs in the Hotel and Restaurant Industry: A Hybrid Framework and Multi-Stakeholder Approach
  • NSF: CRII: CSR: NeuroMC---Parallel Online Scheduling of Mixed-Criticality Real-Time Systems via Neural Networks


Cited By

  • (2023) Data Placement for Multi-Tenant Data Federation on the Cloud. IEEE Transactions on Cloud Computing 11(2), 1414–1429. DOI: 10.1109/TCC.2021.3136577
  • (2022) Machine Learning in Real-Time Internet of Things (IoT) Systems: A Survey. IEEE Internet of Things Journal 9(11), 8364–8386. DOI: 10.1109/JIOT.2022.3161050
  • (2022) OGM: Online Gaussian Graphical Models on the Fly. Applied Intelligence 52(3), 3103–3117. DOI: 10.1007/s10489-021-02563-4
  • (2022) From Distributed Machine Learning to Federated Learning: A Survey. Knowledge and Information Systems 64(4), 885–917. DOI: 10.1007/s10115-022-01664-x
  • (2021) Robust Generalised Quadratic Discriminant Analysis. Pattern Recognition 117, 107981. DOI: 10.1016/j.patcog.2021.107981
