
Audio-domain position-independent backdoor attack via unnoticeable triggers

Published: 14 October 2022

Abstract

Deep learning models have become key enablers of voice user interfaces. With the growing trend of outsourcing the training of these models, backdoor attacks, stealthy yet effective training-phase attacks, have gained increasing attention. These attacks inject hidden trigger patterns through training-set poisoning and overwrite the model's predictions at inference time. Research on backdoor attacks has focused on image classification tasks, with few studies in the audio domain. In this work, we explore the severity of audio-domain backdoor attacks and demonstrate their feasibility in practical voice-user-interface scenarios, where an adversary injects (plays) an unnoticeable audio trigger into live speech to launch the attack. To realize such attacks, we jointly optimize the audio trigger and the target model in the training phase, deriving a position-independent, unnoticeable, and robust audio trigger. We design new data-poisoning techniques and penalty-based algorithms that inject the trigger at randomly generated temporal positions in the audio input during training, rendering the trigger resilient to temporal position variations. We further design an environmental-sound-mimicking technique that makes the trigger resemble unnoticeable situational sounds, and we simulate over-the-air playback distortions during the joint optimization to improve the trigger's robustness. Extensive experiments on two important applications (i.e., speech command recognition and speaker recognition) demonstrate that our attack achieves an average success rate of over 99% under both digital and physical attack settings.
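To make the poisoning step concrete, below is a minimal Python/NumPy sketch of the random-position trigger injection described above. It is an illustration under simplifying assumptions, not the paper's implementation: the function and parameter names (poison_example, poison_dataset, poison_rate, target_label) are hypothetical, the trigger waveform is taken as given rather than jointly optimized with the model, and the environmental-sound mimicking and over-the-air distortion simulation are omitted.

import numpy as np

def poison_example(waveform, trigger, rng):
    # Additively overlay the trigger at a uniformly random temporal offset,
    # so the trained model cannot rely on any fixed trigger position.
    poisoned = waveform.copy()
    offset = rng.integers(0, len(waveform) - len(trigger) + 1)
    poisoned[offset:offset + len(trigger)] += trigger
    return np.clip(poisoned, -1.0, 1.0)  # keep a valid [-1, 1] amplitude range

def poison_dataset(X, y, trigger, target_label, poison_rate=0.05, seed=0):
    # Inject the trigger into a random poison_rate fraction of the clips and
    # relabel them to the attacker-chosen target class; each poisoned clip
    # gets a fresh random trigger position.
    rng = np.random.default_rng(seed)
    X, y = np.array(X, copy=True), np.array(y, copy=True)
    n_poison = int(poison_rate * len(X))
    for i in rng.choice(len(X), size=n_poison, replace=False):
        X[i] = poison_example(X[i], trigger, rng)
        y[i] = target_label
    return X, y  # the target model is then trained on the poisoned set as usual

At inference time, playing the same trigger anywhere within the victim's speech should then flip the prediction to target_label; this is the position-independence property that the randomized injection during training is designed to achieve.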





    Published In

    MobiCom '22: Proceedings of the 28th Annual International Conference on Mobile Computing And Networking
    October 2022
    932 pages
    ISBN: 9781450391818
    DOI: 10.1145/3495243


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. audio-domain backdoor attacks
    2. over-the-air physical attacks
    3. position-independent attacks

    Qualifiers

    • Research-article


    Conference

    ACM MobiCom '22

    Acceptance Rates

    Overall Acceptance Rate 440 of 2,972 submissions, 15%


    Article Metrics

    • Downloads (Last 12 months): 469
    • Downloads (Last 6 weeks): 56
    Reflects downloads up to 22 Sep 2024

    Cited By
    • (2024) Inaudible Backdoor Attack via Stealthy Frequency Trigger Injection in Audio Spectrogram. In Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 31-45. DOI: 10.1145/3636534.3649345
    • (2024) Toward Stealthy Backdoor Attacks Against Speech Recognition via Elements of Sound. IEEE Transactions on Information Forensics and Security 19, 5852-5866. DOI: 10.1109/TIFS.2024.3404885
    • (2024) FlowMur: A Stealthy and Practical Audio Backdoor Attack with Limited Knowledge. In 2024 IEEE Symposium on Security and Privacy (SP), 1646-1664. DOI: 10.1109/SP54263.2024.00148
    • (2024) Enrollment-Stage Backdoor Attacks on Speaker Recognition Systems via Adversarial Ultrasound. IEEE Internet of Things Journal 11, 8, 13108-13124. DOI: 10.1109/JIOT.2023.3328253
    • (2024) Breaking Speaker Recognition with Paddingback. In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4435-4439. DOI: 10.1109/ICASSP48485.2024.10448169
    • (2024) Audio Steganography Based Backdoor Attack for Speech Recognition Software. In 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC), 1208-1217. DOI: 10.1109/COMPSAC61105.2024.00161
    • (2024) A Systematic Review of Adversarial Machine Learning Attacks, Defensive Controls, and Technologies. IEEE Access 12, 99382-99421. DOI: 10.1109/ACCESS.2024.3423323
    • (2023) PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection. In Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses, 366-380. DOI: 10.1145/3607199.3607240
    • (2023) MASTERKEY: Practical Backdoor Attack Against Speaker Verification Systems. In Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, 1-15. DOI: 10.1145/3570361.3613261
    • (2023) Stealthy Backdoor Attack Against Speaker Recognition Using Phase-Injection Hidden Trigger. IEEE Signal Processing Letters 30, 1057-1061. DOI: 10.1109/LSP.2023.3293429
