research-article
Open access

NERVE: Real-Time Neural Video Recovery and Enhancement on Mobile Devices

Published: 28 March 2024

Abstract

As mobile devices become increasingly popular for video streaming, it is crucial to optimize the streaming experience on them. Although deep learning-based video enhancement techniques are gaining attention, most of them cannot run in real time on mobile devices. Moreover, many of these techniques focus solely on super-resolution and cannot handle the partial or complete loss or corruption of video frames, which is common on the Internet and in wireless networks.
To overcome these challenges, we present NERVE, a novel approach that consists of (i) a novel video frame recovery scheme, (ii) a new super-resolution algorithm, and (iii) an enhancement-aware video bit rate adaptation algorithm. We implement NERVE on an iPhone 12, where it can run at 30 frames per second (FPS). We evaluate NERVE over various networks, including 3G, 4G, 5G, and WiFi. Our evaluation shows that NERVE enables real-time video recovery and enhancement and increases video Quality of Experience (QoE) by 24% to 83% in our video streaming system.
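
The abstract describes the bit rate adaptation only at a high level. As a hedged illustration of what "enhancement-aware" adaptation could mean, the minimal Python sketch below scores each rung of a hypothetical bitrate ladder by the quality it is expected to reach after on-device enhancement, counting that enhancement only when its per-frame latency fits the roughly 33.3 ms budget implied by 30 FPS, and subtracts a penalty for predicted rebuffering. The `Rung` fields, the linear scoring rule, the penalty weights, and all numbers are assumptions made for illustration; this is not NERVE's actual algorithm.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

FPS = 30
FRAME_BUDGET_MS = 1000.0 / FPS  # about 33.3 ms per frame for real-time playback at 30 FPS


@dataclass
class Rung:
    """One entry of a hypothetical bitrate ladder (illustrative fields, not from the paper)."""
    bitrate_kbps: float
    raw_quality: float       # e.g. PSNR of the decoded chunk without enhancement
    enhanced_quality: float  # e.g. PSNR after on-device recovery / super-resolution
    enhance_ms: float        # per-frame enhancement latency measured on the device


def effective_quality(rung: Rung) -> float:
    # Enhancement only helps if it keeps up with playback; otherwise use the raw quality.
    if rung.enhance_ms <= FRAME_BUDGET_MS:
        return rung.enhanced_quality
    return rung.raw_quality


def pick_rung(ladder: Sequence[Rung], throughput_kbps: float, buffer_s: float,
              chunk_s: float, rebuf_penalty: float = 4.0,
              prev_quality: Optional[float] = None,
              switch_penalty: float = 1.0) -> Rung:
    """Pick the rung maximizing a simple linear score (assumed, not NERVE's objective)."""
    if not ladder:
        raise ValueError("ladder must not be empty")
    best, best_score = ladder[0], float("-inf")
    for rung in ladder:
        # Time to fetch the next chunk at the predicted throughput.
        download_s = rung.bitrate_kbps * chunk_s / throughput_kbps
        # Playback stalls if the buffer drains before the chunk arrives.
        rebuffer_s = max(0.0, download_s - buffer_s)
        q = effective_quality(rung)
        score = q - rebuf_penalty * rebuffer_s
        if prev_quality is not None:
            # Penalize large quality jumps between consecutive chunks.
            score -= switch_penalty * abs(q - prev_quality)
        if score > best_score:
            best, best_score = rung, score
    return best


if __name__ == "__main__":
    # Hypothetical ladder: numbers are illustrative, not measurements from the paper.
    ladder = [
        Rung(300, 28.0, 33.0, 20.0),
        Rung(750, 31.0, 35.0, 25.0),
        Rung(1850, 34.0, 36.5, 40.0),
        Rung(4300, 37.0, 38.0, 60.0),
    ]
    choice = pick_rung(ladder, throughput_kbps=2000, buffer_s=4, chunk_s=4)
    print(choice.bitrate_kbps)  # prints 750
```

In this example, the 1850 kbps chunk would arrive before a 4 s buffer drains, yet the sketch still prefers 750 kbps: enhancement of the 750 kbps rung fits the frame budget (enhanced quality 35.0), while enhancement of the 1850 kbps rung does not, leaving it at its raw quality of 34.0. A larger buffer (for example 10 s) lets the 4300 kbps rung win on raw quality alone.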


Cited By

  • (2024) Evaluating Novel Network Coding Schemes for Wirelessly Delivered Media Streams. 2024 2nd International Conference on Artificial Intelligence and Machine Learning Applications Theme: Healthcare and Internet of Things (AIMLA), pp. 1-6. DOI: 10.1109/AIMLA59606.2024.10531387. Online publication date: 15-Mar-2024.


Information

Published In

Proceedings of the ACM on Networking, Volume 2, Issue CoNEXT1
PACMNET
March 2024
95 pages
EISSN:2834-5509
DOI:10.1145/3655593
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. bit rate adaptation
  2. mobile video streaming
  3. video enhancement
  4. video frame recovery



Article Metrics

  • Downloads (last 12 months): 252
  • Downloads (last 6 weeks): 52
Reflects downloads up to 15 Sep 2024
