skip to main content
research-article
Open access

Learning graph-based heuristics for pointer analysis without handcrafting application-specific features

Published: 13 November 2020 Publication History

Abstract

We present Graphick, a new technique for automatically learning graph-based heuristics for pointer analysis. Striking a balance between precision and scalability of pointer analysis requires designing good analysis heuristics. For example, because applying context sensitivity to all methods in a real-world program is impractical, pointer analysis typically uses a heuristic to employ context sensitivity only when it is necessary. Past research has shown that exploiting the program's graph structure is a promising way of developing cost-effective analysis heuristics, promoting the recent trend of ``graph-based heuristics'' that work on the graph representations of programs obtained from a pre-analysis. Although promising, manually developing such heuristics remains challenging, requiring a great deal of expertise and laborious effort. In this paper, we aim to reduce this burden by learning graph-based heuristics automatically, in particular without hand-crafted application-specific features. To do so, we present a feature language to describe graph structures and an algorithm for learning analysis heuristics within the language. We implemented Graphick on top of Doop and used it to learn graph-based heuristics for object sensitivity and heap abstraction. The evaluation results show that our approach is general and can generate high-quality heuristics. For both instances, the learned heuristics are as competitive as the existing state-of-the-art heuristics designed manually by analysis experts.

Supplementary Material

Auxiliary Presentation Video (oopsla20main-p218-p-video.mp4)
We present Graphick, a new technique for automatically learning graph-based heuristics for pointer analysis. The existing researches have shown that exploiting the program's graph structure is a promising way to develop cost-effective analysis heuristics, promoting the recent trend of ``graph-based heuristics'' that work on the graph representations of programs obtained from a pre-analysis. Although promising, manually developing such heuristics remains challenging, requiring a great deal of expertise and laborious effort. In this paper, we aim to reduce this burden by learning graph-based heuristics automatically, in particular without hand-crafted application-specific features. To do so, we present a feature language to describe graph structures and an algorithm for learning analysis heuristics within the language. We implemented Graphick on top of Doop and used it to learn graph-based heuristics for object sensitivity and heap abstraction.

References

[1]
Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick McDaniel. 2014. FlowDroid: Precise Context, Flow, Field, Object-sensitive and Lifecycle-aware Taint Analysis for Android Apps. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (Edinburgh, United Kingdom) (PLDI '14). ACM, New York, NY, USA, 259-269. https: //doi.org/10.1145/2594291.2594299
[2]
Dzintars Avots, Michael Dalton, V. Benjamin Livshits, and Monica S. Lam. 2005. Improving Software Security with a C Pointer Analysis. In Proceedings of the 27th International Conference on Software Engineering (St. Louis, MO, USA) ( ICSE '05). ACM, New York, NY, USA, 332-341. https://doi.org/10.1145/1062455.1062520
[3]
Stephen M. Blackburn, Robin Garner, Chris Hofmann, Asjad M. Khang, Kathryn S. McKinley, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z. Guyer, Martin Hirzel, Antony Hosking, Maria Jump, Han Lee, J. Eliot B. Moss, Aashish Phansalkar, Darko Stefanović, Thomas VanDrunen, Daniel von Dincklage, and Ben Wiedermann. 2006. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In Proceedings of the 21st Annual ACM SIGPLAN on the Foundations of Software Engineering (Lake Buena Vista, FL, USA) (ESEC/FSE 2018 ). Association for Computing Machinery, New York, NY, USA, 95-106. https://doi.org/10.1145/3236024.3236079
[4]
Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis. 2018a. Precision-guided Context Sensitivity for Pointer Analysis. Proc. ACM Program. Lang. 2, OOPSLA, Article 141 (Oct. 2018 ), 29 pages. https://doi.org/10.1145/3276511
[5]
Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis. 2018b. Scalability-first Pointer Analysis with Self-tuning Contextsensitivity. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Lake Buena Vista, FL, USA) ( ESEC/FSE 2018). ACM, New York, NY, USA, 129-140. https://doi.org/10.1145/3236024.3236041
[6]
Percy Liang and Mayur Naik. 2011. Scaling Abstraction Refinement via Pruning. In Proceedings of the 32Nd ACM SIGPLAN Conference on Programming Language Design and Implementation (San Jose, California, USA) ( PLDI '11). ACM, New York, NY, USA, 590-601. https://doi.org/10.1145/1993498.1993567
[7]
Percy Liang, Omer Tripp, and Mayur Naik. 2011. Learning Minimal Abstractions. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Austin, Texas, USA) ( POPL '11). ACM, New York, NY, USA, 31-42. https://doi.org/10.1145/1926385.1926391
[8]
V. Benjamin Livshits and Monica S. Lam. 2003. Tracking Pointers with Path and Context Sensitivity for Bug Detection in C Programs. In Proceedings of the 9th European Software Engineering Conference Held Jointly with 11th ACM SIGSOFT International Symposium on Foundations of Software Engineering (Helsinki, Finland) (ESEC/FSE-11). ACM, New York, NY, USA, 317-326. https://doi.org/10.1145/940071.940114
[9]
Jingbo Lu and Jingling Xue. 2019. Precision-Preserving yet Fast Object-Sensitive Pointer Analysis with Partial Context Sensitivity. Proc. ACM Program. Lang. 3, OOPSLA, Article 148 (Oct. 2019 ), 29 pages. https://doi.org/10.1145/3360574
[10]
Ana Milanova, Atanas Rountev, and Barbara G. Ryder. 2002. Parameterized Object Sensitivity for Points-to and Side-efect Analyses for Java. In Proceedings of the 2002 ACM SIGSOFT International Symposium on Software Testing and Analysis (Roma, Italy) ( ISSTA '02). ACM, New York, NY, USA, 1-11. https://doi.org/10.1145/566172.566174
[11]
Ana Milanova, Atanas Rountev, and Barbara G. Ryder. 2005. Parameterized Object Sensitivity for Points-to Analysis for Java. ACM Trans. Softw. Eng. Methodol. 14, 1 (Jan. 2005 ), 1-41. https://doi.org/10.1145/1044834.1044835
[12]
Mayur Naik, Alex Aiken, and John Whaley. 2006. Efective Static Race Detection for Java. In Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation (Ottawa, Ontario, Canada) ( PLDI '06). ACM, New York, NY, USA, 308-319. https://doi.org/10.1145/1133981.1134018
[13]
Mayur Naik, Chang-Seo Park, Koushik Sen, and David Gay. 2009. Efective Static Deadlock Detection. In Proceedings of the 31st International Conference on Software Engineering (ICSE '09). IEEE Computer Society, Washington, DC, USA, 386-396. https://doi.org/10.1109/ICSE. 2009.5070538
[14]
Hakjoo Oh, Wonchan Lee, Kihong Heo, Hongseok Yang, and Kwangkeun Yi. 2014. Selective Context-sensitivity Guided by Impact Pre-analysis. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (Edinburgh, United Kingdom) (PLDI '14). ACM, New York, NY, USA, 475-484. https://doi.org/10.1145/ 2594291.2594318
[15]
Hakjoo Oh, Hongseok Yang, and Kwangkeun Yi. 2015. Learning a Strategy for Adapting a Program Analysis via Bayesian Optimisation. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (Pittsburgh, PA, USA) ( OOPSLA 2015). ACM, New York, NY, USA, 572-588. https: //doi.org/10.1145/2814270.2814309
[16]
Gagandeep Singh, Markus Püschel, and Martin Vechev. 2018. Fast Numerical Program Analysis with Reinforcement Learning. In Computer Aided Verification, Hana Chockler and Georg Weissenbacher (Eds.). Springer International Publishing, Cham, 211-229.
[17]
Yannis Smaragdakis and George Balatsouras. 2015. Pointer Analysis. Foundations and Trends in Programming Languages 2, 1 ( 2015 ), 1-69. https://doi.org/10.1561/2500000014
[18]
Yannis Smaragdakis, Martin Bravenboer, and Ondrej Lhoták. 2011. Pick Your Contexts Well: Understanding Object-sensitivity. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Austin, Texas, USA) ( POPL '11). ACM, New York, NY, USA, 17-30. https://doi.org/10.1145/1926385.1926390
[19]
Yannis Smaragdakis, George Kastrinis, and George Balatsouras. 2014. Introspective Analysis: Context-sensitivity, Across the Board. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (Edinburgh, United Kingdom) (PLDI '14). ACM, New York, NY, USA, 485-495. https://doi.org/10.1145/2594291.2594320
[20]
SPEC SPECjvm98. 1999. Release 1.03. Standard Performance Evaluation Corporation ( 1999 ).
[21]
Y. Sui, D. Ye, and J. Xue. 2014. Detecting Memory Leaks Statically with Full-Sparse Value-Flow Analysis. IEEE Transactions on Software Engineering 40, 2 (Feb 2014 ), 107-122. https://doi.org/10.1109/TSE. 2014.2302311
[22]
Tian Tan, Yue Li, and Jingling Xue. 2016. Making k-Object-Sensitive Pointer Analysis More Precise with Still k-Limiting. In Static Analysis, Xavier Rival (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 489-510.
[23]
Tian Tan, Yue Li, and Jingling Xue. 2017. Eficient and Precise Points-to Analysis: Modeling the Heap by Merging Equivalent Automata. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (Barcelona, Spain) ( PLDI 2017). ACM, New York, NY, USA, 278-291. https://doi.org/10.1145/3062341.3062360
[24]
Omer Tripp, Marco Pistoia, Stephen J. Fink, Manu Sridharan, and Omri Weisman. 2009. TAJ: Efective Taint Analysis of Web Applications. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (Dublin, Ireland) ( PLDI '09). ACM, New York, NY, USA, 87-97. https://doi.org/10.1145/1542476.1542486
[25]
Guoqing Xu and Atanas Rountev. 2008. Merging Equivalent Contexts for Scalable Heap-cloning-based Context-sensitive Points-to Analysis. In Proceedings of the 2008 International Symposium on Software Testing and Analysis (Seattle, WA, USA) ( ISSTA '08). ACM, New York, NY, USA, 225-236. https://doi.org/10.1145/1390630.1390658
[26]
Xuezheng Xu, Yulei Sui, Hua Yan, and Jingling Xue. 2019. VFix: Value-Flow-Guided Precise Program Repair for Null Pointer Dereferences. In Proceedings of the 41st International Conference on Software Engineering (Montreal, Quebec, Canada) ( ICSE '19). IEEE Press, 512-523. https://doi.org/10.1109/ICSE. 2019.00063
[27]
Hua Yan, Yulei Sui, Shiping Chen, and Jingling Xue. 2017. Machine-Learning-Guided Typestate Analysis for Static UseAfter-Free Detection. In Proceedings of the 33rd Annual Computer Security Applications Conference (Orlando, FL, USA) ( ACSAC 2017). ACM, New York, NY, USA, 42-54. https://doi.org/10.1145/3134600.3134620
[28]
Xin Zhang, Ravi Mangal, Radu Grigore, Mayur Naik, and Hongseok Yang. 2014. On Abstraction Refinement for Program Analyses in Datalog. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (Edinburgh, United Kingdom) (PLDI '14). ACM, New York, NY, USA, 239-248. https://doi.org/10.1145/2594291.2594327
[29]
Xin Zhang, Mayur Naik, and Hongseok Yang. 2013. Finding Optimum Abstractions in Parametric Dataflow Analysis. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (Seattle, Washington, USA) ( PLDI '13). Association for Computing Machinery, New York, NY, USA, 365-376. https://doi.org/10. 1145/2491956.2462185

Cited By

View all
  • (2024)Falcon: A Fused Approach to Path-Sensitive Sparse Data Dependence AnalysisProceedings of the ACM on Programming Languages10.1145/36564008:PLDI(567-592)Online publication date: 20-Jun-2024
  • (2024)When to Stop Going Down the Rabbit Hole: Taming Context-Sensitivity on the FlyProceedings of the 13th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis10.1145/3652588.3663321(35-44)Online publication date: 20-Jun-2024
  • (2024)Learning Abstraction Selection for Bayesian Program AnalysisProceedings of the ACM on Programming Languages10.1145/36498458:OOPSLA1(954-982)Online publication date: 29-Apr-2024
  • Show More Cited By

Index Terms

  1. Learning graph-based heuristics for pointer analysis without handcrafting application-specific features

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the ACM on Programming Languages
    Proceedings of the ACM on Programming Languages  Volume 4, Issue OOPSLA
    November 2020
    3108 pages
    EISSN:2475-1421
    DOI:10.1145/3436718
    Issue’s Table of Contents
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 November 2020
    Published in PACMPL Volume 4, Issue OOPSLA

    Permissions

    Request permissions for this article.

    Check for updates

    Badges

    Author Tags

    1. Context sensitivity
    2. Data-driven static analysis
    3. Heap abstraction
    4. Machine learning for program analysis
    5. Pointer analysis

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)366
    • Downloads (Last 6 weeks)43
    Reflects downloads up to 21 Sep 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Falcon: A Fused Approach to Path-Sensitive Sparse Data Dependence AnalysisProceedings of the ACM on Programming Languages10.1145/36564008:PLDI(567-592)Online publication date: 20-Jun-2024
    • (2024)When to Stop Going Down the Rabbit Hole: Taming Context-Sensitivity on the FlyProceedings of the 13th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis10.1145/3652588.3663321(35-44)Online publication date: 20-Jun-2024
    • (2024)Learning Abstraction Selection for Bayesian Program AnalysisProceedings of the ACM on Programming Languages10.1145/36498458:OOPSLA1(954-982)Online publication date: 29-Apr-2024
    • (2024)Generic Sensitivity: Generics-Guided Context Sensitivity for Pointer AnalysisIEEE Transactions on Software Engineering10.1109/TSE.2024.337764550:5(1144-1162)Online publication date: May-2024
    • (2023)A Container-Usage-Pattern-Based Context Debloating Approach for Object-Sensitive Pointer AnalysisProceedings of the ACM on Programming Languages10.1145/36228327:OOPSLA2(971-1000)Online publication date: 16-Oct-2023
    • (2023)Tai-e: A Developer-Friendly Static Analysis Framework for Java by Harnessing the Good Designs of ClassicsProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3597926.3598120(1093-1105)Online publication date: 12-Jul-2023
    • (2023)Context Sensitivity without Contexts: A Cut-Shortcut Approach to Fast and Precise Pointer AnalysisProceedings of the ACM on Programming Languages10.1145/35912427:PLDI(539-564)Online publication date: 6-Jun-2023
    • (2023)RaceInjector: Injecting Races to Evaluate and Learn Dynamic Race Detection AlgorithmsProceedings of the 12th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis10.1145/3589250.3596142(63-70)Online publication date: 6-Jun-2023
    • (2023)IFDS-based Context Debloating for Object-Sensitive Pointer AnalysisACM Transactions on Software Engineering and Methodology10.1145/357964132:4(1-44)Online publication date: 27-May-2023
    • (2023)Learning to Boost Disjunctive Static Bug-FindersProceedings of the 45th International Conference on Software Engineering10.1109/ICSE48619.2023.00099(1097-1109)Online publication date: 14-May-2023
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Get Access

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media