DOI: 10.1145/2851553.2851574
research-article

Interconnect Emulator for Aiding Performance Analysis of Distributed Memory Applications

Published: 12 March 2016 Publication History

Abstract

Many modern large-graph and Big Data processing applications operate on datasets that do not fit into the DRAM of a single machine. This leads to the design of scale-out applications, where the application dataset is partitioned and processed by a cluster of machines. Distributed memory applications exhibit complex behavior: they tend to interleave computation and communication, use bursty transfers, and rely on global synchronization primitives. This makes it difficult to analyze the impact of the communication layer on application performance and to answer two questions: how might the interconnect's latency or bandwidth characteristics change application performance, and will application performance scale when the application is run on a larger system? In this work, we introduce a novel emulation framework, called InterSense, which is implemented on top of an existing high-speed interconnect, such as InfiniBand, and which provides two performance knobs for changing the bandwidth and latency of today's interconnect. This approach offers an easy-to-use framework for analyzing the sensitivity of complex distributed applications to communication-layer performance, instead of creating customized and time-consuming application models to answer the same questions. We evaluate the emulator's accuracy with the popular OSU MPI benchmark suite on two clusters with different generations of InfiniBand interconnect (DDR and FDR): InterSense emulates the specified bandwidth and latency values with less than 2% error between the expected and measured values. To demonstrate InterSense's ease of use, we present a case study in which we apply InterSense to the sensitivity analysis of four applications and benchmarks and obtain non-trivial insights.
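
The abstract's accuracy criterion (an error between expected and measured values) can be illustrated with a minimal sketch. It assumes a simple linear cost model, time = latency + size / bandwidth, which is a common first-order approximation and is not stated in the abstract; it is not a description of how InterSense itself works. The function names and all numeric values below are hypothetical and chosen only for illustration.

```python
def expected_transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Expected one-way transfer time under an ASSUMED linear cost model.

    time = latency + message size / bandwidth
    This model is an illustrative assumption, not InterSense's mechanism.
    """
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + message_bytes / bandwidth_bytes_per_s


def relative_error(expected, measured):
    """Relative difference between a measured value and its expected value."""
    if expected == 0:
        raise ValueError("expected value must be non-zero")
    return abs(measured - expected) / abs(expected)


def sweep(message_bytes, latencies_s, bandwidths_bytes_per_s):
    """Expected transfer time for every (latency, bandwidth) knob setting."""
    return {
        (lat, bw): expected_transfer_time(message_bytes, lat, bw)
        for lat in latencies_s
        for bw in bandwidths_bytes_per_s
    }


if __name__ == "__main__":
    # Hypothetical knob settings: 1 us and 5 us latency, 1 GB/s and 2 GB/s.
    table = sweep(1_000_000, [1e-6, 5e-6], [1e9, 2e9])
    for (lat, bw), t in sorted(table.items()):
        print(f"latency={lat:.0e} s  bandwidth={bw:.0e} B/s  time={t:.6e} s")

    # Hypothetical measurement compared against the model's expectation.
    expected = expected_transfer_time(1_000_000, 1e-6, 1e9)
    measured = expected * 1.015
    print(f"relative error: {relative_error(expected, measured):.3%}")
```

A sweep of this kind only shows how a single transfer would respond to the two knobs; the applications described in the abstract interleave computation, bursty communication, and global synchronization, which is why the authors emulate the interconnect rather than rely on a closed-form model.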




        Published In

        ICPE '16: Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering
        March 2016
        346 pages
        ISBN:9781450340809
        DOI:10.1145/2851553

        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Author Tags

        1. MPI
        2. benchmarking
        3. distributed shared memory
        4. infiniband
        5. performance emulation
        6. profiling

        Qualifiers

        • Research-article

        Conference

        ICPE'16

        Acceptance Rates

        ICPE '16 Paper Acceptance Rate 23 of 74 submissions, 31%;
        Overall Acceptance Rate 252 of 851 submissions, 30%


        Bibliometrics & Citations

        Article Metrics

        • Downloads (last 12 months): 3
        • Downloads (last 6 weeks): 1
        Reflects downloads up to 22 Sep 2024

        Cited By

        • (2018) Measuring Network Latency Variation Impacts to High Performance Computing Application Performance. Proceedings of the 2018 ACM/SPEC International Conference on Performance Engineering, pp. 68-79. DOI: 10.1145/3184407.3184427. Online publication date: 30-Mar-2018
        • (2017) Predictive modeling and scalability analysis for large graph analytics. 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), pp. 63-71. DOI: 10.23919/INM.2017.7987265. Online publication date: May-2017
        • (2017) Open Source In-Memory Data Grid Systems. Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, pp. 163-164. DOI: 10.1145/3030207.3053671. Online publication date: 17-Apr-2017
        • (2017) Benchmarking and Performance Analysis for Distributed Cache Systems: A Comparative Case Study. Performance Evaluation and Benchmarking for the Analytics Era, pp. 147-163. DOI: 10.1007/978-3-319-72401-0_11. Online publication date: 30-Dec-2017
