
Audio-domain position-independent backdoor attack via unnoticeable triggers

Published: 14 October 2022

Abstract

Deep learning models have become key enablers of voice user interfaces. With the growing trend of outsourcing the training of these models, backdoor attacks, stealthy yet effective training-phase attacks, have gained increasing attention. These attacks inject hidden trigger patterns through training-set poisoning and overwrite the model's predictions at inference time. Research on backdoor attacks has focused on image classification tasks, with few studies in the audio domain. In this work, we explore the severity of audio-domain backdoor attacks and demonstrate their feasibility in practical voice-user-interface scenarios, where an adversary injects (plays) an unnoticeable audio trigger into live speech to launch the attack. To realize such attacks, we jointly optimize the audio trigger and the target model in the training phase, deriving a position-independent, unnoticeable, and robust audio trigger. We design new data-poisoning techniques and penalty-based algorithms that inject the trigger at randomly generated temporal positions in the audio input during training, rendering the trigger resilient to temporal position variations. We further design an environmental-sound-mimicking technique that makes the trigger resemble unnoticeable situational sounds, and we simulate over-the-air playback distortions during the joint optimization to improve the trigger's robustness. Extensive experiments on two important applications (i.e., speech command recognition and speaker recognition) demonstrate that our attack achieves an average success rate of over 99% under both digital and physical attack settings.
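To make the poisoning step concrete, below is a minimal Python/NumPy sketch of the random-position trigger injection described above. It is an illustration under simplifying assumptions, not the paper's implementation: the function and parameter names (poison_example, poison_dataset, poison_rate, target_label) are hypothetical, the trigger waveform is taken as given rather than jointly optimized with the model, and the environmental-sound mimicking and over-the-air distortion simulation are omitted.

import numpy as np

def poison_example(waveform, trigger, rng):
    # Additively overlay the trigger at a uniformly random temporal offset,
    # so the trained model cannot rely on any fixed trigger position.
    poisoned = waveform.copy()
    offset = rng.integers(0, len(waveform) - len(trigger) + 1)
    poisoned[offset:offset + len(trigger)] += trigger
    return np.clip(poisoned, -1.0, 1.0)  # keep a valid [-1, 1] amplitude range

def poison_dataset(X, y, trigger, target_label, poison_rate=0.05, seed=0):
    # Inject the trigger into a random poison_rate fraction of the clips and
    # relabel them to the attacker-chosen target class; each poisoned clip
    # gets a fresh random trigger position.
    rng = np.random.default_rng(seed)
    X, y = np.array(X, copy=True), np.array(y, copy=True)
    n_poison = int(poison_rate * len(X))
    for i in rng.choice(len(X), size=n_poison, replace=False):
        X[i] = poison_example(X[i], trigger, rng)
        y[i] = target_label
    return X, y  # the target model is then trained on the poisoned set as usual

At inference time, playing the same trigger anywhere within the victim's speech should then flip the prediction to target_label; this is the position-independence property that the randomized injection during training is designed to achieve.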





    Published In

    MobiCom '22: Proceedings of the 28th Annual International Conference on Mobile Computing And Networking
    October 2022
    932 pages
    ISBN: 9781450391818
    DOI: 10.1145/3495243


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. audio-domain backdoor attacks
    2. over-the-air physical attacks
    3. position-independent attacks

    Qualifiers

    • Research-article


    Conference

    ACM MobiCom '22

    Acceptance Rates

    Overall Acceptance Rate 440 of 2,972 submissions, 15%


    Article Metrics

    • Downloads (Last 12 months): 469
    • Downloads (Last 6 weeks): 56
    Reflects downloads up to 22 Sep 2024

    Cited By
    • (2024) Inaudible Backdoor Attack via Stealthy Frequency Trigger Injection in Audio Spectrogram. In Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 31-45. DOI: 10.1145/3636534.3649345
    • (2024) Toward Stealthy Backdoor Attacks Against Speech Recognition via Elements of Sound. IEEE Transactions on Information Forensics and Security 19, 5852-5866. DOI: 10.1109/TIFS.2024.3404885
    • (2024) FlowMur: A Stealthy and Practical Audio Backdoor Attack with Limited Knowledge. In 2024 IEEE Symposium on Security and Privacy (SP), 1646-1664. DOI: 10.1109/SP54263.2024.00148
    • (2024) Enrollment-Stage Backdoor Attacks on Speaker Recognition Systems via Adversarial Ultrasound. IEEE Internet of Things Journal 11, 8, 13108-13124. DOI: 10.1109/JIOT.2023.3328253
    • (2024) Breaking Speaker Recognition with Paddingback. In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4435-4439. DOI: 10.1109/ICASSP48485.2024.10448169
    • (2024) Audio Steganography Based Backdoor Attack for Speech Recognition Software. In 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC), 1208-1217. DOI: 10.1109/COMPSAC61105.2024.00161
    • (2024) A Systematic Review of Adversarial Machine Learning Attacks, Defensive Controls, and Technologies. IEEE Access 12, 99382-99421. DOI: 10.1109/ACCESS.2024.3423323
    • (2023) PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection. In Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses, 366-380. DOI: 10.1145/3607199.3607240
    • (2023) MASTERKEY: Practical Backdoor Attack Against Speaker Verification Systems. In Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, 1-15. DOI: 10.1145/3570361.3613261
    • (2023) Stealthy Backdoor Attack Against Speaker Recognition Using Phase-Injection Hidden Trigger. IEEE Signal Processing Letters 30, 1057-1061. DOI: 10.1109/LSP.2023.3293429
