DOI: 10.1145/3539618.3591817
short-paper

A Retrieval System for Images and Videos based on Aesthetic Assessment of Visuals

Published: 18 July 2023

Abstract

Attractive images and videos are the visual backbone of journalism and social media, where they serve to capture the user's attention. From trailers to teaser images to image galleries, appealing visuals have only grown in importance over the years. However, selecting eye-catching shots from a video or the perfect image from a large image collection is a challenging and time-consuming task. We present a tool that assesses image and video content from an aesthetic standpoint. We found that such an assessment is possible by combining expert knowledge with data-driven information. We integrate the relevant aesthetic features and machine learning algorithms into an aesthetics retrieval system that lets users sort uploaded visuals by an aesthetic score and interact with additional photographic, cinematic, and person-specific features.

Published In

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2023
3567 pages
ISBN:9781450394086
DOI:10.1145/3539618
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. aesthetics assessment
  2. multimedia retrieval
  3. retrieval system

Qualifiers

  • Short-paper

Conference

SIGIR '23

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
