skip to main content
research-article
Open access

Decoupling algorithms from schedules for easy optimization of image processing pipelines

Published: 01 July 2012 Publication History

Abstract

Using existing programming tools, writing high-performance image processing code requires sacrificing readability, portability, and modularity. We argue that this is a consequence of conflating what computations define the algorithm, with decisions about storage and the order of computation. We refer to these latter two concerns as the schedule, including choices of tiling, fusion, recomputation vs. storage, vectorization, and parallelism.
We propose a representation for feed-forward imaging pipelines that separates the algorithm from its schedule, enabling high-performance without sacrificing code clarity. This decoupling simplifies the algorithm specification: images and intermediate buffers become functions over an infinite integer domain, with no explicit storage or boundary conditions. Imaging pipelines are compositions of functions. Programmers separately specify scheduling strategies for the various functions composing the algorithm, which allows them to efficiently explore different optimizations without changing the algorithmic code.
We demonstrate the power of this representation by expressing a range of recent image processing applications in an embedded domain specific language called Halide, and compiling them for ARM, x86, and GPUs. Our compiler targets SIMD units, multiple cores, and complex memory hierarchies. We demonstrate that it can handle algorithms such as a camera raw pipeline, the bilateral grid, fast local Laplacian filtering, and image segmentation. The algorithms expressed in our language are both shorter and faster than state-of-the-art implementations.

Supplementary Material

ZIP File (a32-ragan-kelley.zip)
Supplemental material.

References

[1]
Adams, A., Talvala, E.-V., Park, S. H., Jacobs, D. E., Ajdin, B., Gelfand, N., Dolson, J., Vaquero, D., Baek, J., Tico, M., Lensch, H. P. A., Matusik, W., Pulli, K., Horowitz, M., and Levoy, M. 2010. The Frankencamera: An experimental platform for computational photography. ACM Transactions on Graphics 29, 4 (July), 29:1--29:12.
[2]
Aubry, M., Paris, S., Hasinoff, S. W., Kautz, J., and Durand, F. 2011. Fast and robust pyramid-based image processing. Tech. Rep. MIT-CSAIL-TR-2011-049, Massachusetts Institute of Technology.
[3]
Buck, I. 2007. GPU computing: Programming a massively parallel processor. In CGO '07: Proceedings of the International Symposium on Code Generation and Optimization, IEEE Computer Society, 17.
[4]
Chen, J., Paris, S., and Durand, F. 2007. Real-time edge-aware image processing with the bilateral grid. ACM Transactions on Graphics 26, 3 (July), 103:1--103:9.
[5]
CoreImage. Apple CoreImage programming guide. http://developer.apple.com/library/mac/#documentation/GraphicsImaging/Conceptual/CoreImaging.
[6]
Elliott, C., Finne, S., and de Moor, O. 2003. Compiling embedded languages. Journal of Functional Programming 13, 2. Updated version of paper by the same name that appeared in SAIG '00 proceedings.
[7]
Elliott, C. 2001. Functional image synthesis. In Proceedings of Bridges.
[8]
Fatahalian, K., Horn, D. R., Knight, T. J., Leem, L., Houston, M., Park, J. Y., Erez, M., Ren, M., Aiken, A., Dally, W. J., and Hanrahan, P. 2006. Sequoia: programming the memory hierarchy. In Proceedings of the 2006 ACM/IEEE conference on Supercomputing, ACM, SC '06.
[9]
Feautrier, P. 1991. Dataflow analysis of array and scalar references. International Journal of Parallel Programming 20.
[10]
Gordon, M. I., Thies, W., Karczmarek, M., Lin, J., Meli, A. S., Leger, C., Lamb, A. A., Wong, J., Hoffman, H., Maze, D. Z., and Amarasinghe, S. 2002. A stream compiler for communication-exposed architectures. In International Conference on Architectural Support for Programming Languages and Operating Systems.
[11]
Guenter, B., and Nehab, D. 2010. The neon image processing language. Tech. Rep. MSR-TR-2010-175, Microsoft Research.
[12]
IPP. Intel Integrated Performance Primitives. http://software.intel.com/en-us/articles/intel-ipp/.
[13]
Kapasi, U. J., Mattson, P., Dally, W. J., Owens, J. D., and Towles, B. 2002. Stream scheduling. Concurrent VLSI Architecture Tech Report 122, Stanford University, March.
[14]
Kass, M., Witkin, A., and Terzopoulos, D. 1988. Snakes: Active contour models. International Journal of Computer Vision 1, 4.
[15]
Levoy, M. 1994. Spreadsheets for images. In Proceedings of SIGGRAPH 94, Computer Graphics Proceedings, Annual Conference Series, 139--146.
[16]
Li, C., Xu, C., Gui, C., and Fox, M. D. 2010. Distance regularized level set evolution and its application to image segmentation. IEEE Transactions on Image Processing 19, 12 (December), 3243--3254.
[17]
LLVM. The LLVM compiler infrastructure. http://llvm.org.
[18]
McCool, M. D., Qin, Z., and Popa, T. S. 2002. Shader metaprogramming. In Graphics Hardware 2002, 57--68.
[19]
Newburn, C. J., So, B., Liu, Z., McCool, M., Ghuloum, A., Toit, S. D., Wang, Z. G., Du, Z. H., Chen, Y., Wu, G., Guo, P., Liu, Z., and Zhang, D. 2011. Intel's array building blocks: A retargetable, dynamic compiler and embedded language. In Proceedings of the 2011 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization, IEEE Computer Society, CGO '11, 224--235.
[20]
OpenCL, 2011. The OpenCL specification, version 1.2. http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf.
[21]
OpenMP. OpenMP. http://openmp.org/.
[22]
Paris, S., and Durand, F. 2009. A fast approximation of the bilateral filter using a signal processing approach. International Journal of Computer Vision 81, 1, 24--52.
[23]
Paris, S., Kornprobst, P., Tumblin, J., and Durand, F. 2009. Bilateral filtering: Theory and applications. Foundations and Trends in Computer Graphics and Vision.
[24]
Paris, S., Hasinoff, S. W., and Kautz, J. 2011. Local Laplacian filters: Edge-aware image processing with a Laplacian pyramid. ACM Transactions on Graphics 30, 4.
[25]
Pharr, M., and Mark, W. R. 2012. ispc: A SPMD compiler for high-performance CPU programming. In Proceedings of Innovative Parallel Computing (InPar).
[26]
PixelBender. Adobe PixelBender reference. http://www.adobe.com/content/dam/Adobe/en/devnet/pixelbender/pdfs/pixelbender_reference.pdf.
[27]
Püschel, M., Moura, J. M. F., Johnson, J., Padua, D., Veloso, M., Singer, B., Xiong, J., Franchetti, F., Gacic, A., Voronenko, Y., Chen, K., Johnson, R. W., and Rizzolo, N. 2005. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation" 93, 2, 232--275.
[28]
Shantzis, M. A. 1994. A model for efficient and flexible image computing. In Proceedings of the 21st annual conference on Computer graphics and interactive techniques, ACM, SIGGRAPH '94, 147--154.

Cited By

View all
  • (2024)SCGen: A Versatile Generator Framework for Agile Design of Stochastic Circuits2024 Design, Automation & Test in Europe Conference & Exhibition (DATE)10.23919/DATE58400.2024.10546649(1-6)Online publication date: 25-Mar-2024
  • (2024)Guided Equality SaturationProceedings of the ACM on Programming Languages10.1145/36329008:POPL(1727-1758)Online publication date: 5-Jan-2024
  • (2024)Automated Buffer Sizing of Dataflow Applications in a High-level Synthesis WorkflowACM Transactions on Reconfigurable Technology and Systems10.1145/362610317:1(1-26)Online publication date: 27-Jan-2024
  • Show More Cited By

Index Terms

  1. Decoupling algorithms from schedules for easy optimization of image processing pipelines

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Graphics
    ACM Transactions on Graphics  Volume 31, Issue 4
    July 2012
    935 pages
    ISSN:0730-0301
    EISSN:1557-7368
    DOI:10.1145/2185520
    Issue’s Table of Contents
    • cover image ACM Overlay Books
      Seminal Graphics Papers: Pushing the Boundaries, Volume 2
      August 2023
      893 pages
      ISBN:9798400708978
      DOI:10.1145/3596711
      • Editor:
      • Mary C. Whitton
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 July 2012
    Published in TOG Volume 31, Issue 4

    Permissions

    Request permissions for this article.

    Check for updates

    Badges

    • Seminal Paper

    Author Tags

    1. compilers
    2. image processing
    3. performance

    Qualifiers

    • Research-article

    Funding Sources

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)258
    • Downloads (Last 6 weeks)24
    Reflects downloads up to 15 Sep 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)SCGen: A Versatile Generator Framework for Agile Design of Stochastic Circuits2024 Design, Automation & Test in Europe Conference & Exhibition (DATE)10.23919/DATE58400.2024.10546649(1-6)Online publication date: 25-Mar-2024
    • (2024)Guided Equality SaturationProceedings of the ACM on Programming Languages10.1145/36329008:POPL(1727-1758)Online publication date: 5-Jan-2024
    • (2024)Automated Buffer Sizing of Dataflow Applications in a High-level Synthesis WorkflowACM Transactions on Reconfigurable Technology and Systems10.1145/362610317:1(1-26)Online publication date: 27-Jan-2024
    • (2024)GTCO: Graph and Tensor Co-Design for Transformer-Based Image Recognition on Tensor CoresIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.331716943:2(586-599)Online publication date: 1-Feb-2024
    • (2024)SlidingConv: Domain-Specific Description of Sliding Discrete Cosine Transform Convolution for HalideIEEE Access10.1109/ACCESS.2023.334566012(7563-7583)Online publication date: 2024
    • (2023)SLANG.D: Fast, Modular and Differentiable Shader ProgrammingACM Transactions on Graphics10.1145/361835342:6(1-28)Online publication date: 5-Dec-2023
    • (2023)Guided Linear UpsamplingACM Transactions on Graphics10.1145/359245342:4(1-12)Online publication date: 26-Jul-2023
    • (2023)Semantics and Scheduling for Machine Knitting CompilersACM Transactions on Graphics10.1145/359244942:4(1-26)Online publication date: 26-Jul-2023
    • (2023)Indexed Streams: A Formal Intermediate Representation for Fused Contraction ProgramsProceedings of the ACM on Programming Languages10.1145/35912687:PLDI(1169-1193)Online publication date: 6-Jun-2023
    • (2023)Mosaic: An Interoperable Compiler for Tensor AlgebraProceedings of the ACM on Programming Languages10.1145/35912367:PLDI(394-419)Online publication date: 6-Jun-2023
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Get Access

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media