Online Adaptive Asymmetric Active Learning for Budgeted Imbalanced Data

Authors:

Mingkui Tan

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 2768 - 2777

https://doi.org/10.1145/3219819.3219948

Published: 19 July 2018

Abstract

This paper investigates Online Active Learning (OAL) on imbalanced, unlabeled data streams, where only a limited budget of labels can be queried to optimize a cost-sensitive performance measure. OAL applies to many real-world problems, such as anomaly detection in healthcare, finance, and network security. These problems pose two key challenges: the query budget is often limited, and the ratio between the two classes is highly imbalanced. To address these challenges, existing OAL work adopts either asymmetric losses or asymmetric queries (an isolated asymmetric strategy) to tackle the imbalance, and uses first-order methods to optimize the cost-sensitive measure. However, these approaches may suffer from two deficiencies: (1) a poor ability to handle imbalanced data, owing to the isolated asymmetric strategy; and (2) a relatively slow convergence rate, owing to first-order optimization. In this paper, we propose a novel Online Adaptive Asymmetric Active (OA3) learning algorithm, which is based on a new asymmetric strategy (merging the asymmetric-loss and asymmetric-query strategies) and second-order optimization. We theoretically analyze its bounds and empirically evaluate it on four real-world online anomaly detection tasks. Promising results confirm the effectiveness and robustness of the proposed algorithm in various application domains.
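
The abstract names three ingredients: a label budget, an asymmetric strategy applied to both losses and queries, and second-order optimization. The sketch below only illustrates how such pieces could fit together in a stream-based learner; it is not the OA3 algorithm from the paper. Every name and constant in it (`cost_pos`, `cost_neg`, `delta_pos`, `delta_neg`, `r`, the class `BudgetedAsymmetricLearner`) is an assumption made for illustration, and the update is a diagonal, confidence-weighted rule in the spirit of adaptive regularization of weight vectors rather than the paper's own update.

```python
import random


def cost_sensitive_hinge(score, label, cost_pos=5.0, cost_neg=1.0):
    """Asymmetric loss: mistakes on the rare positive class (label = +1) cost more.
    The cost values are illustrative assumptions, not taken from the paper."""
    cost = cost_pos if label == 1 else cost_neg
    return cost * max(0.0, 1.0 - label * score)


def query_probability(score, delta_pos=2.0, delta_neg=0.5):
    """Asymmetric query rule: low-margin instances are queried more often, and
    instances scored as positive get a larger smoothing parameter (assumed values)."""
    delta = delta_pos if score >= 0 else delta_neg
    return delta / (delta + abs(score))


class BudgetedAsymmetricLearner:
    """Hypothetical linear learner with a per-feature variance (diagonal second-order term)."""

    def __init__(self, dim, budget, r=1.0, seed=0):
        self.w = [0.0] * dim        # weight vector
        self.sigma = [1.0] * dim    # per-feature variance, shrinks as evidence arrives
        self.budget = budget        # maximum number of label queries
        self.queries = 0
        self.r = r                  # regularization constant (assumption)
        self.rng = random.Random(seed)

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def step(self, x, oracle):
        """Predict the label of x; query oracle(x) only if budget and the coin flip allow."""
        s = self.score(x)
        pred = 1 if s >= 0 else -1
        if self.queries >= self.budget:
            return pred                               # budget exhausted: predict only
        if self.rng.random() >= query_probability(s):
            return pred                               # confident enough: skip the query
        y = oracle(x)                                 # spend one unit of budget
        self.queries += 1
        loss = cost_sensitive_hinge(s, y)
        if loss > 0.0:
            # v is the variance of the score under the current diagonal covariance
            v = sum(si * xi * xi for si, xi in zip(self.sigma, x))
            beta = 1.0 / (v + self.r)
            alpha = loss * beta
            for i, xi in enumerate(x):
                self.w[i] += alpha * y * self.sigma[i] * xi        # variance-scaled step
                self.sigma[i] -= beta * (self.sigma[i] * xi) ** 2  # shrink variance
        return pred
```

A stream would be processed by calling `learner.step(x, oracle)` once per arriving instance, where `oracle` is any callable returning +1 or -1. Using a diagonal covariance keeps each update linear in the feature dimension; a full covariance matrix would capture feature interactions at a higher per-step cost.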

Index Terms

Online Adaptive Asymmetric Active Learning for Budgeted Imbalanced Data
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Anomaly detection
2. Theory of computation
  1. Design and analysis of algorithms
    1. Online algorithms
      1. Online learning algorithms
  2. Theory and algorithms for application domains
    1. Machine learning theory
      1. Active learning

Published In

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

July 2018

2925 pages

ISBN:9781450355520

DOI:10.1145/3219819

General Chairs:
Yike Guo (Imperial College London), Faisal Farooq (IBM)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Recruitment Program for Young Professionals
Guangdong Provincial Scientific and Technological funds
National Natural Science Foundation of China (NSFC)
Pearl River S&T Nova Program of Guangzhou
CCF-Tencent Open Research Fund
Fundamental Research Funds for the Central Universities

Conference

KDD '18

KDD '18: The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 19 - 23, 2018

London, United Kingdom

Acceptance Rates

KDD '18 Paper Acceptance Rate 107 of 983 submissions, 11%;

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics

Total citations: 25
Total downloads: 1,028
Downloads (last 12 months): 26
Downloads (last 6 weeks): 1

Reflects downloads up to 14 Sep 2024

Cited By

Z. Lv, F. Wang, S. Zhang, W. Zhang, K. Kuang, F. Wu (2024). Parameters Efficient Fine-Tuning for Long-Tailed Sequential Recommendation. Artificial Intelligence, pp. 442-459. Online publication date: 4-Feb-2024.
https://doi.org/10.1007/978-981-99-8850-1_36
Y. Zhang, B. Kang, B. Hooi, S. Yan, J. Feng (2023). Deep Long-Tailed Learning: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45:9, pp. 10795-10816. Online publication date: 1-Sep-2023.
https://doi.org/10.1109/TPAMI.2023.3268118
S. Kim, T. Mai, S. Han, S. Park, D. Thi Nguyen, J. So, K. Singh, M. Cha (2023). Active Learning for Human-in-the-Loop Customs Inspection. IEEE Transactions on Knowledge and Data Engineering, 35:12, pp. 12039-12052. Online publication date: 1-Dec-2023.
https://doi.org/10.1109/TKDE.2022.3144299
Y. Du, J. Shen, X. Zhen, C. Snoek (2023). SuperDisco: Super-Class Discovery Improves Visual Recognition for the Long-Tail. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 19944-19954. Online publication date: Jun-2023.
https://doi.org/10.1109/CVPR52729.2023.01910
Y. Song, M. Li, B. Wang (2023). Long-Tailed Visual Recognition via Improved Cross-Window Self-Attention and TrivialAugment. IEEE Access, 11, pp. 49601-49610. Online publication date: 2023.
https://doi.org/10.1109/ACCESS.2023.3277204
H. Gong, Y. Li, J. Dong (2023). A dual-balanced network for long-tail distribution object detection. IET Computer Vision, 17:5, pp. 565-575. Online publication date: 3-Mar-2023.
https://dl.acm.org/doi/10.1049/cvi2.12182
Y. Sun, Y. Chen, P. Wu, X. Wang, Q. Wang (2023). DRL: Dynamic rebalance learning for adversarial robustness of UAV with long-tailed distribution. Computer Communications, 205, pp. 14-23. Online publication date: May-2023.
https://doi.org/10.1016/j.comcom.2023.04.002
Y. Su, B. Chen, Z. Feng, J. Yan (2023). Adaptive Embedding and Distribution Re-margin for Long-Tail Recognition. Artificial Neural Networks and Machine Learning – ICANN 2023, pp. 38-50. Online publication date: 22-Sep-2023.
https://doi.org/10.1007/978-3-031-44198-1_4
Y. Zhang, B. Hooi, L. Hong, J. Feng, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh (2022). Self-supervised aggregation of diverse experts for test-agnostic long-tailed recognition. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 34077-34090. Online publication date: 28-Nov-2022.
https://dl.acm.org/doi/10.5555/3600270.3602740
L. Cui, X. Tang, S. Katariya, N. Rao, P. Agrawal, K. Subbian, D. Lee (2022). ALLIE: Active Learning on Large-scale Imbalanced Graphs. Proceedings of the ACM Web Conference 2022, pp. 690-698. Online publication date: 25-Apr-2022.
https://dl.acm.org/doi/10.1145/3485447.3512229
