skip to main content
article

A stream compiler for communication-exposed architectures

Published: 01 October 2002 Publication History

Abstract

With the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed communication between one unit and another (e.g., Raw, SmartMemories, TRIPS). However, for their use to be widespread, it will be necessary to develop compiler technology that enables a portable, high-level language to execute efficiently across a range of wire-exposed architectures.In this paper, we describe our compiler for StreamIt: a high-level, architecture-independent language for streaming applications. We focus on our backend for the Raw processor. Though StreamIt exposes the parallelism and communication patterns of stream programs, some analysis is needed to adapt a stream program to a software-exposed processor. We describe a partitioning algorithm that employs fission and fusion transformations to adjust the granularity of a stream graph, a layout algorithm that maps a stream graph to a given network topology, and a scheduling strategy that generates a fine-grained static communication pattern for each computational element.We have implemented a fully functional compiler that parallelizes StreamIt applications for Raw, including several load-balancing transformations. Using the cycle-accurate Raw simulator, we demonstrate that the StreamIt compiler can automatically map a high-level stream abstraction to Raw without losing performance. We consider this work to be a first step towards a portable programming model for communication-exposed architectures.

References

[1]
Streamit homepage. http://compiler.lcs.mit.edu/streamit.]]
[2]
The Transputer Databook. Inmos Corporation, 1988.]]
[3]
G. Berry and G. Gonthier. The Esterel Synchronous Programming Language: Design, Semantics, Implementation. Science of Computer Programming, 19(2), 1992.]]
[4]
S. S. Bhattacharyya, P. K. Murthy, and E. A. Lee. Software Synthesis from Dataflow Graphs. Kluwer Academic Publishers, 1996.]]
[5]
G. Bilsen, M. Engels, R. Lauwereins, and J. Peperstraete. Cyclo-static dataflow. IEEE Trans. on Signal Processing, 1996.]]
[6]
S. Borkar, R. Cohn, G. Cox, S. Gleason, T. Gross, H. T. Kung, M. Lam, B. Moore, C. Peterson, J. Pieper, L. Rankin, P. S. Tseng, J. Sutton, J. Urbanski, and J. Webb. iWarp: An integrated solution to high-speed parallel computing. In Supercomputing, 1988.]]
[7]
E. Caspi, M. Chu, R. Huang, J. Yeh, J. Wawrzynek, and A. DeHon. Stream Computations Organized for Reconfigurable Execution (SCORE): Extended Abstract. In Proceedings of the Conference on Field Programmable Logic and Applications, 2000.]]
[8]
M. B. T. et. al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General Purpose Programs. IEEE Micro vol 22, Issue 2, 2002.]]
[9]
J. Gaudiot, W. Bohm, T. DeBoni, J. Feo, and P. Mille". The Sisal Model of Functional Programming and its Implementation. In Proceedings of the Second Aizu International Symposium on Parallel Algorithms/Architectures Synthesis, 1997.]]
[10]
T. Gautier, P. L. Guernic, and L. Besnard. Signal: A declarative language for synchronous programming of real-time systems. Springer Verlag Lecture Notes in Computer Science, 274, 1987.]]
[11]
V. Gay-Para, T. Graf, A.-G. Lemonnier, and E. Wais. Kopi Reference manual. http://www.dms.at/kopi/docs/kopi.html, 2001.]]
[12]
T. Gross and D. R. O'Halloron. iWarp, Anatomy of a Parallel Computing System. MIT Press, 1998.]]
[13]
N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud. The synchronous data-flow programming language LUSTRE. Proc. of the IEEE, 79(9), 1991.]]
[14]
R. Ho, K. Mai, and M. Horowitz. The Future of Wires. In Proc. of the IEEE, 2001.]]
[15]
C. A. R. Hoare. Communicating sequential processes. Communications of the ACM, 21(8), 1978.]]
[16]
Inmos Corporation. Occam 2 Reference Manual. Prentice Hall, 1988.]]
[17]
U. J. Kapasi, P. Mattson, W. J. Dally, J. D. Owens, and B. Towles. Stream scheduling. In Proc. of the 3rd Workshop on Media and Streaming Processors, 2001.]]
[18]
S. Kirkpatrick, J. C. D. Gelatt, and M. Vecchi. Optimization by Simulated Annealing. Science, 220(4598), May 1983.]]
[19]
J. Lebak. Polymorphous Computing Architecture (PCA) Example Applications and Description. External Report, Lincoln Laboratory, Mass. Inst. of Technology, 2001.]]
[20]
E. A. Lee. Overview of the Ptolemy Project. UCB/ERL Technical Memorandum UCB/ERL M01/11, Dept. EECS, University of California, Berkeley, CA, March 2001.]]
[21]
W. Lee, R. Barua, M. Frank, D. Srikrishna, J. Babb, V. Sarkar, and S. P. Amarasinghe. Space-Time Scheduling of Instruction-Level Parallelism on a Raw Machine. In Architectural Support for Programming Languages and Operating Systems, 1998.]]
[22]
K. Mai, T. Paaske, N. Jayasena, R. Ho, W. Dally, and M. Horowitz. Smart memories: A modular recongurable architecture. In ISCA 2000, Vancouver, BC, Canada.]]
[23]
D. May, R. Shepherd, and C. Keane. Communicating Process Architecture: Transputers and Occam. Future Parallel Computers: An Advanced Course, Pisa, Lecture Notes in Computer Science, 272, June 1987.]]
[24]
A. Mitschele-Thiel. Automatic Configuration and Optimization of Parallel Transputer Applications. Transputer Applications and Systems '93, 1993.]]
[25]
D. R. O'Hallaron. The ASSIGN Parallel Program Generator. Carnegie Mellon Technical Report CMU-CS-91-141, 1991.]]
[26]
T. A. Proebsting and S. A. Watterson. Filter Fusion. In POPL, 1996.]]
[27]
S. Rixner, W. J. Dally, U. J. Kapasi, B. Khailany, A. Lopez-Lagunas, P. R. Mattson, and J. D. Owens. A bandwidth-efficient architecture for media processing. In International Symposium on Microarchitecture, 1998.]]
[28]
K. Sankaralingam, R. Nagarajan, S. Keckler, and D. Burger. A Technology-Scalable Architecture for Fast Clocks and High ILP. University of Texas at Austin, Dept. of Computer Sciences Technical Report TR-01-02, 2001.]]
[29]
R. Stephens. A Survey of Stream Processing. Acta Informatica, 34(7), 1997.]]
[30]
M. B. Taylor, W. Lee, S. Amarasinghe, and A. Agarwal. Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures. Technical Report MIT-LCS-TR-859, Mass. Inst. of Technology, July 2002.]]
[31]
W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt: A Language for Streaming Applications. In Proceedings of the International Conference on Compiler Construction, to appear, Grenoble, France, 2002.]]
[32]
W. Thies, M. Karczmarek, M. Gordon, D. Maze, J. Wong, H. Hoffmann, M. Brown, and S. Amarasinghe. StreamIt: A Compiler for Streaming Applications. MIT-LCS Technical Memo LCS-TM-622, Cambridge, MA, 2001.]]
[33]
E. Waingold, M. Taylor, D. Srikrishna, V. Sarkar, W. Lee, V. Lee, J. Kim, M. Frank, P. Finch, R. Barua, J. Babb, S. Amarasinghe, and A. Agarwal. Baring it all to software: Raw machines. IEEE Computer, 30(9), 1997.]]

Cited By

View all
  • (2024)SongC: A Compiler for Hybrid Near-Memory and In-Memory Many-Core ArchitectureIEEE Transactions on Computers10.1109/TC.2023.331194873:10(2420-2433)Online publication date: 1-Oct-2024
  • (2018)SpinStreamsProceedings of the 19th International Middleware Conference10.1145/3274808.3274814(66-79)Online publication date: 26-Nov-2018
  • (2018)Implicit Data-Parallelism in Kahn Process NetworksProceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms10.1145/3183767.3183790(20-25)Online publication date: 23-Jan-2018
  • Show More Cited By

Index Terms

  1. A stream compiler for communication-exposed architectures
      Index terms have been assigned to the content through auto-classification.

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image ACM SIGPLAN Notices
      ACM SIGPLAN Notices  Volume 37, Issue 10
      October 2002
      296 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/605432
      Issue’s Table of Contents
      • cover image ACM Conferences
        ASPLOS X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
        October 2002
        318 pages
        ISBN:1581135742
        DOI:10.1145/605397
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 01 October 2002
      Published in SIGPLAN Volume 37, Issue 10

      Check for updates

      Qualifiers

      • Article

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)24
      • Downloads (Last 6 weeks)3
      Reflects downloads up to 15 Sep 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)SongC: A Compiler for Hybrid Near-Memory and In-Memory Many-Core ArchitectureIEEE Transactions on Computers10.1109/TC.2023.331194873:10(2420-2433)Online publication date: 1-Oct-2024
      • (2018)SpinStreamsProceedings of the 19th International Middleware Conference10.1145/3274808.3274814(66-79)Online publication date: 26-Nov-2018
      • (2018)Implicit Data-Parallelism in Kahn Process NetworksProceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms10.1145/3183767.3183790(20-25)Online publication date: 23-Jan-2018
      • (2017)Tightening Contention Delays While Scheduling Parallel Applications on Multi-core ArchitecturesACM Transactions on Embedded Computing Systems10.1145/312649616:5s(1-20)Online publication date: 27-Sep-2017
      • (2016)Using fusion to enable late design decisions for pipelined computationsProceedings of the 5th International Workshop on Functional High-Performance Computing10.1145/2975991.2975993(9-16)Online publication date: 8-Sep-2016
      • (2016)Communication-aware mapping of stream graphs for multi-GPU platformsProceedings of the 2016 International Symposium on Code Generation and Optimization10.1145/2854038.2854055(94-104)Online publication date: 29-Feb-2016
      • (2016)A Retargetable Compilation Framework for Heterogeneous Reconfigurable ComputingACM Transactions on Reconfigurable Technology and Systems10.1145/28439469:4(1-22)Online publication date: 11-Aug-2016
      • (2016)Modeling Multi-Periodic Simulink Systems by Synchronous Dataflow Graphs2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)10.1109/RTAS.2016.7461343(1-10)Online publication date: Apr-2016
      • (2015)Compiling high performance recursive filtersProceedings of the 7th Conference on High-Performance Graphics10.1145/2790060.2790063(85-94)Online publication date: 7-Aug-2015
      • (2015)Orchestrating Multiple Data-Parallel Kernels on Multiple DevicesProceedings of the 2015 International Conference on Parallel Architecture and Compilation (PACT)10.1109/PACT.2015.14(355-366)Online publication date: 18-Oct-2015
      • Show More Cited By

      View Options

      Get Access

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media