Article
Free access

Runtime neural pruning

Published: 04 December 2017

Abstract

In this paper, we propose a Runtime Neural Pruning (RNP) framework that prunes a deep neural network dynamically at runtime. Unlike existing neural pruning methods, which produce a fixed pruned model for deployment, our method preserves the full capacity of the original network and prunes adaptively according to the input image and the current feature maps. Pruning proceeds in a bottom-up, layer-by-layer manner, which we model as a Markov decision process and train with reinforcement learning. The agent judges the importance of each convolutional kernel and performs channel-wise pruning conditioned on each sample, pruning the network more aggressively when the image is easier for the task. Since the full capacity of the network is preserved, the speed-accuracy balance point can easily be adjusted to the available resources. Our method can be applied to off-the-shelf network architectures and achieves a better tradeoff between speed and accuracy, especially at large pruning rates.
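To make the control flow concrete, below is a minimal PyTorch sketch of sample-adaptive, layer-by-layer channel pruning in the spirit of the abstract. The names DecisionAgent, RuntimePrunedConvNet, and NUM_ACTIONS are illustrative assumptions, not the authors' code; the paper's actual decision network and its Q-learning training loop (with a reward trading accuracy against computation) are simplified away. Zeroing channels here only emulates the decision: in deployment the pruned channels would simply not be computed, which is where the speedup comes from.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_ACTIONS = 4  # action k => keep the first (k+1)/NUM_ACTIONS of the channels


class DecisionAgent(nn.Module):
    """Maps the current feature map to Q-values over pruning ratios."""

    def __init__(self, in_channels, embed_dim=32):
        super().__init__()
        self.fc = nn.Linear(in_channels, embed_dim)
        self.q_head = nn.Linear(embed_dim, NUM_ACTIONS)

    def forward(self, feat):
        # Global average pooling gives a fixed-size state per sample.
        state = F.adaptive_avg_pool2d(feat, 1).flatten(1)
        return self.q_head(F.relu(self.fc(state)))  # shape (B, NUM_ACTIONS)


class RuntimePrunedConvNet(nn.Module):
    """Tiny backbone whose channels are pruned per sample at inference time."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(3, 64, 3, padding=1),
            nn.Conv2d(64, 128, 3, padding=1),
        ])
        # One agent per layer for simplicity; the paper shares a single
        # decision network across layers.
        self.agents = nn.ModuleList([DecisionAgent(64), DecisionAgent(128)])
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x, epsilon=0.0):
        for conv, agent in zip(self.convs, self.agents):
            x = F.relu(conv(x))
            q = agent(x)  # judge kernel importance for this sample
            action = q.argmax(dim=1)
            if epsilon > 0:  # epsilon-greedy exploration during RL training
                explore = torch.rand(x.size(0), device=x.device) < epsilon
                action = torch.where(
                    explore, torch.randint_like(action, NUM_ACTIONS), action)
            # Zero the pruned channels (emulating skipped computation):
            # keep the first (action+1)/NUM_ACTIONS fraction of channels.
            keep = ((action + 1).float() / NUM_ACTIONS * x.size(1)).long()
            channel_idx = torch.arange(x.size(1), device=x.device)
            mask = (channel_idx[None, :] < keep[:, None]).float()
            x = x * mask[:, :, None, None]
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)
        return self.classifier(x)
```

A forward pass such as RuntimePrunedConvNet()(torch.randn(8, 3, 32, 32)) yields per-sample logits while each image receives its own pruning pattern; setting epsilon > 0 during training would provide the exploration a Q-learning agent needs.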




Information

Published In

NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems
December 2017
7104 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States

Publication History

Published: 04 December 2017

Qualifiers

  • Article


Bibliometrics

Article Metrics

  • Downloads (last 12 months): 265
  • Downloads (last 6 weeks): 27
Reflects downloads up to 15 Sep 2024

Cited By
  • (2024) Progressive Channel-Shrinking Network. IEEE Transactions on Multimedia, 26, 2016-2026. DOI: 10.1109/TMM.2023.3291197
  • (2023) Resource-Efficient Convolutional Networks: A Survey on Model-, Arithmetic-, and Implementation-Level Techniques. ACM Computing Surveys, 55(13s), 1-36. DOI: 10.1145/3587095
  • (2022) Arbitrary Bit-width Network: A Joint Layer-Wise Quantization and Adaptive Inference Approach. Proceedings of the 30th ACM International Conference on Multimedia, 2899-2908. DOI: 10.1145/3503161.3548001
  • (2022) Edge Intelligence: Concepts, Architectures, Applications, and Future Directions. ACM Transactions on Embedded Computing Systems, 21(5), 1-41. DOI: 10.1145/3486674
  • (2021) Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach. ACM Transactions on Internet of Things, 3(1), 1-23. DOI: 10.1145/3480172
  • (2021) Adaptive Inference through Early-Exit Networks. Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning, 1-6. DOI: 10.1145/3469116.3470012
  • (2021) An Energy-Efficient Inference Method in Convolutional Neural Networks Based on Dynamic Adjustment of the Pruning Level. ACM Transactions on Design Automation of Electronic Systems, 26(6), 1-20. DOI: 10.1145/3460972
  • (2021) Efficient Multi-Scale Feature Generation Adaptive Network. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 883-892. DOI: 10.1145/3459637.3482337
  • (2021) Towards Model Compression for Deep Learning Based Speech Enhancement. IEEE/ACM Transactions on Audio, Speech and Language Processing, 29, 1785-1794. DOI: 10.1109/TASLP.2021.3082282
  • (2020) Towards Objection Detection Under IoT Resource Constraints. Proceedings of the 2nd International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things, 14-20. DOI: 10.1145/3417313.3429379
