DOI: 10.1145/3219819.3219948
research-article

Online Adaptive Asymmetric Active Learning for Budgeted Imbalanced Data

Published: 19 July 2018

Abstract

This paper investigates Online Active Learning (OAL) for imbalanced, unlabeled data streams, where only a limited budget of labels can be queried to optimize a cost-sensitive performance measure. OAL applies to many real-world problems, such as anomaly detection in healthcare, finance, and network security. These problems pose two key challenges: the query budget is often limited, and the two classes are highly imbalanced. To address these challenges, existing OAL work adopts either asymmetric losses or asymmetric queries (an isolated asymmetric strategy) to tackle the imbalance, and uses first-order methods to optimize the cost-sensitive measure. However, these methods may suffer from two deficiencies: (1) poor handling of imbalanced data, due to the isolated asymmetric strategy; and (2) relatively slow convergence, due to first-order optimization. In this paper, we propose a novel Online Adaptive Asymmetric Active (OA3) learning algorithm, which is based on a new asymmetric strategy (merging the asymmetric-loss and asymmetric-query strategies) and second-order optimization. We theoretically analyze its bounds and empirically evaluate it on four real-world online anomaly detection tasks. Promising results confirm the effectiveness and robustness of the proposed algorithm in various application domains.
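
To make the ingredients named in the abstract concrete, the following is a minimal Python sketch of a budgeted online active learner that combines class-dependent query probabilities, class-dependent loss weights, and a second-order update. It is not the OA3 algorithm: the abstract does not give the update or query rules, so the sketch substitutes a diagonal AROW-style update and a randomized margin-based query rule of the form delta / (delta + |margin|). The class name `BudgetedAsymmetricLearner` and the parameters `delta_pos`, `delta_neg`, `cost_pos`, `cost_neg`, and `r` are all hypothetical choices made for illustration.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


class BudgetedAsymmetricLearner:
    """Illustrative sketch only; NOT the paper's OA3 update or query rule."""

    def __init__(self, dim, budget, delta_pos=1.0, delta_neg=0.1,
                 cost_pos=0.9, cost_neg=0.1, r=1.0, seed=0):
        self.w = [0.0] * dim        # mean weight vector
        self.sigma = [1.0] * dim    # diagonal confidence (second-order) terms
        self.budget = budget        # maximum number of label queries
        self.queries = 0            # label queries spent so far
        self.delta_pos, self.delta_neg = delta_pos, delta_neg  # assumed query smoothing
        self.cost_pos, self.cost_neg = cost_pos, cost_neg      # assumed loss weights
        self.r = r                  # regularization of the second-order step
        self.rng = random.Random(seed)

    def predict(self, x):
        return 1 if dot(self.w, x) >= 0 else -1

    def wants_label(self, x):
        """Asymmetric randomized query: smoothing depends on the predicted class."""
        if self.queries >= self.budget:
            return False
        p = dot(self.w, x)
        delta = self.delta_pos if p >= 0 else self.delta_neg
        return self.rng.random() < delta / (delta + abs(p))

    def update(self, x, y):
        """Cost-weighted hinge loss with a diagonal AROW-style step."""
        cost = self.cost_pos if y == 1 else self.cost_neg
        loss = cost * max(0.0, 1.0 - y * dot(self.w, x))
        if loss <= 0.0:
            return
        v = sum(s * xi * xi for s, xi in zip(self.sigma, x))  # confidence x' Sigma x
        beta = 1.0 / (v + self.r)
        alpha = loss * beta
        for j, xj in enumerate(x):
            sx = self.sigma[j] * xj
            self.w[j] += alpha * y * sx     # move the mean along Sigma x
            self.sigma[j] -= beta * sx * sx  # shrink the diagonal confidence

    def step(self, x, oracle):
        """Predict first; query the oracle and update only if the rule fires."""
        y_hat = self.predict(x)
        if self.wants_label(x):
            self.queries += 1
            self.update(x, oracle(x))
        return y_hat


if __name__ == "__main__":
    learner = BudgetedAsymmetricLearner(dim=2, budget=10)
    for _ in range(200):
        x = [learner.rng.uniform(-1, 1), learner.rng.uniform(-1, 1)]
        learner.step(x, oracle=lambda z: 1 if z[0] + z[1] > 1.2 else -1)
    print("labels queried:", learner.queries)
```

In this sketch the budget is enforced by a hard stop once `queries` reaches `budget`; setting `delta_pos` larger than `delta_neg` makes the learner request labels more readily when it predicts the rare positive class, which is one simple way to couple asymmetric queries with asymmetric losses.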


Published In

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN:9781450355520
DOI:10.1145/3219819

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. active learning
  2. anomaly detection
  3. cost-sensitive learning
  4. imbalance data
  5. online learning
  6. query budget

Qualifiers

  • Research-article

Funding Sources

  • Recruitment Program for Young Professionals
  • Guangdong Provincial Scientific and Technological funds
  • National Natural Science Foundation of China (NSFC)
  • Pearl River S&T Nova Program of Guangzhou
  • CCF-Tencent Open Research Fund
  • Fundamental Research Funds for the Central Universities

Conference

KDD '18

Acceptance Rates

KDD '18 Paper Acceptance Rate 107 of 983 submissions, 11%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months): 26
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 14 Sep 2024

Cited By

  • (2024) Parameters Efficient Fine-Tuning for Long-Tailed Sequential Recommendation. Artificial Intelligence, pp. 442-459. DOI: 10.1007/978-981-99-8850-1_36. Online publication date: 4-Feb-2024
  • (2023) Deep Long-Tailed Learning: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45:9, pp. 10795-10816. DOI: 10.1109/TPAMI.2023.3268118. Online publication date: 1-Sep-2023
  • (2023) Active Learning for Human-in-the-Loop Customs Inspection. IEEE Transactions on Knowledge and Data Engineering, 35:12, pp. 12039-12052. DOI: 10.1109/TKDE.2022.3144299. Online publication date: 1-Dec-2023
  • (2023) SuperDisco: Super-Class Discovery Improves Visual Recognition for the Long-Tail. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 19944-19954. DOI: 10.1109/CVPR52729.2023.01910. Online publication date: Jun-2023
  • (2023) Long-Tailed Visual Recognition via Improved Cross-Window Self-Attention and TrivialAugment. IEEE Access, 11, pp. 49601-49610. DOI: 10.1109/ACCESS.2023.3277204. Online publication date: 2023
  • (2023) A dual‐balanced network for long‐tail distribution object detection. IET Computer Vision, 17:5, pp. 565-575. DOI: 10.1049/cvi2.12182. Online publication date: 3-Mar-2023
  • (2023) DRL: Dynamic rebalance learning for adversarial robustness of UAV with long-tailed distribution. Computer Communications, 205, pp. 14-23. DOI: 10.1016/j.comcom.2023.04.002. Online publication date: May-2023
  • (2023) Adaptive Embedding and Distribution Re-margin for Long-Tail Recognition. Artificial Neural Networks and Machine Learning – ICANN 2023, pp. 38-50. DOI: 10.1007/978-3-031-44198-1_4. Online publication date: 22-Sep-2023
  • (2022) Self-supervised aggregation of diverse experts for test-agnostic long-tailed recognition. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 34077-34090. DOI: 10.5555/3600270.3602740. Online publication date: 28-Nov-2022
  • (2022) ALLIE: Active Learning on Large-scale Imbalanced Graphs. Proceedings of the ACM Web Conference 2022, pp. 690-698. DOI: 10.1145/3485447.3512229. Online publication date: 25-Apr-2022
