
MP2SDA: Multi-Party Parallelized Sparse Discriminant Learning

Published: 13 March 2020

Abstract

Sparse Discriminant Analysis (SDA) has been widely used to improve the performance of classical Fisher’s Linear Discriminant Analysis in supervised metric learning, feature selection, and classification. With the increasing need for distributed data collection, storage, and processing, enabling sparse discriminant learning in multi-party distributed computing environments has become an emerging research topic. This article proposes a novel multi-party SDA algorithm that learns SDA models effectively without sharing any raw data or basic statistics among machines. The proposed algorithm (1) leverages the direct estimation of SDA to derive a distributed loss function for discriminant learning, (2) parameterizes the distributed loss function with local/global estimates obtained through bootstrapping, and (3) approximates the global estimate of the linear discriminant projection vector by optimizing this “distributed bootstrapping loss function” with gossip-based stochastic gradient descent. Experimental results on both synthetic and real-world benchmark datasets show that our algorithm matches the performance of SDA trained on aggregated data, and significantly outperforms the most recent distributed SDA in terms of accuracy and F1-score.
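The gossip-based optimization step described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: it assumes each party k holds local bootstrap estimates of the covariance matrix (`sigma_k`) and the class-mean difference (`delta_k`), and it substitutes a generic quadratic-plus-l1 surrogate for the paper's distributed bootstrapping loss function, in the spirit of direct SDA estimation (minimize 0.5 β'Σβ − δ'β + λ‖β‖₁). All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 4, 10                              # parties, feature dimension
true_beta = rng.normal(size=d)            # ground-truth projection vector

# Hypothetical local (bootstrap) estimates held privately by each party.
sigmas, deltas = [], []
for _ in range(K):
    A = 0.1 * rng.normal(size=(d, d))
    sigma = np.eye(d) + A @ A.T           # local covariance estimate (SPD)
    sigmas.append(sigma)
    deltas.append(sigma @ true_beta + rng.normal(scale=0.01, size=d))

betas = [np.zeros(d) for _ in range(K)]   # each party's copy of the projection vector
lr, lam = 0.05, 1e-3

for step in range(500):
    # Local gradient step on f_k(beta) = 0.5 * beta' Sigma_k beta - delta_k' beta,
    # followed by soft-thresholding for the l1 (sparsity) penalty.
    for k in range(K):
        grad = sigmas[k] @ betas[k] - deltas[k]
        b = betas[k] - lr * grad
        betas[k] = np.sign(b) * np.maximum(np.abs(b) - lr * lam, 0.0)
    # Gossip step: two random parties average their model parameters,
    # exchanging only parameters -- never raw data or local statistics.
    i, j = rng.choice(K, size=2, replace=False)
    avg = 0.5 * (betas[i] + betas[j])
    betas[i], betas[j] = avg.copy(), avg.copy()

consensus = np.mean(betas, axis=0)        # parties converge toward a shared estimate
```

Because every party alternates local descent with pairwise averaging, the copies drift toward a common minimizer of the average of the local losses, which is what lets the scheme approximate the aggregated solution without centralizing data.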




Published In

ACM Transactions on Knowledge Discovery from Data  Volume 14, Issue 3
June 2020
381 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3388473
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 March 2020
Accepted: 01 December 2019
Revised: 01 October 2019
Received: 01 March 2019
Published in TKDD Volume 14, Issue 3


Author Tags

  1. Sparse discriminant analysis
  2. distributed
  3. multi-party
  4. parallelized

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • NSF: RAISE: CA-FW-HTF: Prepare the US Labor Force for Future Jobs in the Hotel and Restaurant Industry: A Hybrid Framework and Multi-Stakeholder Approach
  • NSF: CRII: CSR: NeuroMC---Parallel Online Scheduling of Mixed-Criticality Real-Time Systems via Neural Networks


Cited By

  • (2023) Data Placement for Multi-Tenant Data Federation on the Cloud. IEEE Transactions on Cloud Computing 11(2), 1414–1429. DOI: 10.1109/TCC.2021.3136577
  • (2022) Machine Learning in Real-Time Internet of Things (IoT) Systems: A Survey. IEEE Internet of Things Journal 9(11), 8364–8386. DOI: 10.1109/JIOT.2022.3161050
  • (2022) OGM: Online Gaussian Graphical Models on the Fly. Applied Intelligence 52(3), 3103–3117. DOI: 10.1007/s10489-021-02563-4
  • (2022) From Distributed Machine Learning to Federated Learning: A Survey. Knowledge and Information Systems 64(4), 885–917. DOI: 10.1007/s10115-022-01664-x
  • (2021) Robust Generalised Quadratic Discriminant Analysis. Pattern Recognition 117, 107981. DOI: 10.1016/j.patcog.2021.107981
