PDF Malware Detection Using Visualization and Machine Learning

  • Conference paper
Data and Applications Security and Privacy XXXV (DBSec 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12840)

Abstract

Recently, as more and more malware incidents have been reported worldwide, people have started to pay closer attention to malware detection to prevent malicious attacks in advance. Malware varies widely across the software platforms people use, for example XcodeGhost on iOS apps, FakePlayer on Android apps, and WannaCrypt on PCs. Moreover, people often overlook potential security threats while surfing the internet, processing files, or even reading email. The Portable Document Format (PDF), one of the most commonly used file types in the world, can store text, images, multimedia content, and even scripts. However, despite the growing popularity of and demand for PDF files, only a small fraction of people know how easily malware can be concealed in seemingly normal PDF files. In this paper, we propose a novel technique that combines malware visualization and image classification to examine PDF files and identify which ones might be malicious. By extracting data from a PDF file and traversing each object within it, we obtain the file's holistic tree-like structure. Then, according to the signature of each object in the file, we assign colors obtained from SimHash to generate RGB images. Finally, our proposed model, trained with the VGG19 CNN architecture, achieves up to 0.973 accuracy and a 0.975 F1-score in distinguishing malicious PDF files; the approach is easy to implement and viable for personal or enterprise-wide use.
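The color-assignment step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how object signatures are tokenized, how wide the SimHash is, or how pixels are arranged, so everything below is an assumption made for illustration: the function names (`simhash`, `signature_to_rgb`, `objects_to_pixels`) are hypothetical, tokens are taken to be PDF name tokens such as `/Type`, the fingerprint is 24 bits wide so that it maps directly onto one RGB triple, and objects are laid out one pixel each in row-major order.

```python
import hashlib
import re


def simhash(tokens, bits=24):
    """Return a `bits`-wide SimHash fingerprint of an iterable of string tokens."""
    weights = [0] * bits
    for token in tokens:
        # Hash each token to a stable integer (MD5 serves only as a mixing function).
        digest = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest(), "big")
        for i in range(bits):
            # Vote +1 if bit i is set in the token hash, otherwise -1.
            weights[i] += 1 if (digest >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def signature_to_rgb(signature):
    """Map an object signature string to an (R, G, B) tuple via a 24-bit SimHash."""
    # Assumption: the signature is tokenized into PDF name tokens such as /Type.
    tokens = re.findall(r"/[A-Za-z0-9]+", signature)
    value = simhash(tokens, bits=24)
    return ((value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF)


def objects_to_pixels(signatures, width):
    """Lay out one pixel per object in row-major order, padding with black."""
    pixels = [signature_to_rgb(s) for s in signatures]
    while len(pixels) % width:
        pixels.append((0, 0, 0))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]
```

Because SimHash is locality-sensitive, objects whose signatures share most tokens tend to receive fingerprints that differ in few bits, though a differing high-order bit can still shift a color channel noticeably. The resulting rows of RGB tuples could then be written to an image file and passed to an image classifier; that step is omitted here because the abstract gives no details on image size or preprocessing.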

Author information

Corresponding author

Correspondence to Ching-Yuan Liu.

Copyright information

© 2021 IFIP International Federation for Information Processing

About this paper

Cite this paper

Liu, CY., Chiu, MY., Huang, QX., Sun, HM. (2021). PDF Malware Detection Using Visualization and Machine Learning. In: Barker, K., Ghazinour, K. (eds) Data and Applications Security and Privacy XXXV. DBSec 2021. Lecture Notes in Computer Science, vol 12840. Springer, Cham. https://doi.org/10.1007/978-3-030-81242-3_12

  • DOI: https://doi.org/10.1007/978-3-030-81242-3_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-81241-6

  • Online ISBN: 978-3-030-81242-3

  • eBook Packages: Computer Science, Computer Science (R0)
