TRAM: Optimizing Fine-grained Communication with Topological Routing and Aggregation of Messages
International Conference on Parallel Processing (ICPP) 2014
Publication Type: Paper
Repository URL: 201310_TRAM
Abstract
Fine-grained communication in supercomputing applications often limits
performance through high communication overhead and poor utilization of network
bandwidth. This paper presents the Topological Routing and Aggregation Module
(TRAM), a library that optimizes fine-grained communication performance by
routing and dynamically combining short messages. TRAM collects units of
fine-grained communication from the application and combines them into
aggregated messages with a common intermediate destination. It routes these
messages along a virtual mesh topology mapped onto the physical topology of the
network. TRAM improves network bandwidth utilization and reduces communication
overhead. It is particularly effective in optimizing patterns with global
communication and large message counts, such as all-to-all and many-to-many, as
well as sparse, irregular, dynamic, or data-dependent patterns. We demonstrate
how TRAM improves performance through theoretical analysis and experimental
verification using benchmarks and scientific applications. On petascale
systems, we present speedups of 6x for communication benchmarks and up to 4x
for applications.
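The mechanism summarized above (buffering fine-grained items per intermediate destination and forwarding the combined messages along a virtual mesh) can be illustrated with a minimal Python sketch. It assumes dimension-order routing, where each hop moves an item to the peer that matches its destination in one more mesh coordinate, and a fixed-size buffer per next hop that is sent as soon as it fills. The names `next_hop`, `route`, `Aggregator`, and `buffer_size` are hypothetical and are not TRAM's actual API. The network send is modeled as appending to a list.

```python
from collections import defaultdict


def next_hop(src, dst):
    """Hypothetical dimension-order routing on a virtual mesh: fix the first
    coordinate where src and dst differ and keep all other coordinates."""
    for d, (s, t) in enumerate(zip(src, dst)):
        if s != t:
            return src[:d] + (t,) + src[d + 1:]
    return src  # already at the destination


def route(src, dst):
    """Full path taken by an item; at most one hop per mesh dimension."""
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst))
    return path


class Aggregator:
    """Hypothetical per-process buffers, one per intermediate destination."""

    def __init__(self, me, buffer_size):
        self.me = me                     # this process's mesh coordinates
        self.buffer_size = buffer_size   # items per aggregated message
        self.buffers = defaultdict(list)
        self.sent = []                   # stand-in for the network: (hop, items)

    def submit(self, item, dst):
        # Items with different final destinations share a buffer
        # whenever they share the same next hop.
        hop = next_hop(self.me, dst)
        buf = self.buffers[hop]
        buf.append((dst, item))
        if len(buf) >= self.buffer_size:
            self.flush(hop)

    def flush(self, hop):
        # Send one aggregated message containing every buffered item.
        items = self.buffers.pop(hop, None)
        if items:
            self.sent.append((hop, items))

    def flush_all(self):
        for hop in list(self.buffers):
            self.flush(hop)


# Usage on a 3-D virtual mesh, seen from process (0, 0, 0).
agg = Aggregator(me=(0, 0, 0), buffer_size=2)
agg.submit("a", dst=(3, 1, 2))  # next hop (3, 0, 0)
agg.submit("b", dst=(3, 2, 0))  # same next hop; buffer full, so it is sent
agg.submit("c", dst=(0, 1, 1))  # next hop (0, 1, 0)
agg.flush_all()
print(agg.sent)
print(route((0, 0, 0), (3, 1, 2)))
```

In this sketch, an intermediate process that receives an aggregated message would unpack it and resubmit each item toward its final destination, so items keep being recombined at every hop. Bounding the path to one hop per dimension limits how many buffers each process needs: only peers that differ from it in a single coordinate.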
People
- Lukasz Wesolowski
- Ramprasad Venkataraman
- Abhishek Gupta
- Jae-Seung Yeom
- Keith Bisset
- Yanhua Sun
- Pritish Jetley
- Thomas Quinn
- Laxmikant Kale
Research Areas