research-article

Interconnect Emulator for Aiding Performance Analysis of Distributed Memory Applications

Authors:

Ludmila Cherkasova,

Haris Volos

ICPE '16: Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering

Pages 75 - 83

https://doi.org/10.1145/2851553.2851574

Published: 12 March 2016

Abstract

Many modern large graph and Big Data processing applications operate on datasets that do not fit into the DRAM of a single machine. This leads to the design of scale-out applications, where the application dataset is partitioned and processed by a cluster of machines. Distributed memory applications exhibit complex behavior: they tend to interleave computations and communications, use bursty transfers, and rely on global synchronization primitives. This makes it difficult to analyze the impact of the communication layer on application performance and to answer questions such as: How might changes in interconnect latency or bandwidth affect application performance? Will the application performance scale when it is processed by a larger system? In this work, we introduce a novel emulation framework, called InterSense, which is implemented on top of an existing high-speed interconnect, such as InfiniBand, and which provides two performance knobs for changing the bandwidth and latency of today's interconnect. This approach offers an easy-to-use framework for analyzing the sensitivity of complex distributed applications to communication layer performance, instead of creating customized and time-consuming application models to answer the same questions. We evaluate the emulator accuracy with the popular OSU MPI benchmark suite on two clusters with different generations of InfiniBand interconnects (DDR and FDR): InterSense emulates the specified bandwidth and latency values with less than 2% error between the expected and measured values. To demonstrate InterSense's ease of use, we present a case study in which we apply InterSense to the sensitivity analysis of four applications and benchmarks to gain non-trivial insights.
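The abstract does not describe how InterSense implements its two knobs. As a rough illustration of what emulating a slower interconnect on a faster one can mean, the sketch below assumes a simple linear (Hockney-style) cost model, time = latency + size / bandwidth, and computes the extra per-message delay a hypothetical delay-injecting emulator would need to add so that a native link behaves like a slower target link. It also shows the relative error used to compare expected and measured values against a threshold such as 2%. The names `LinkModel`, `injected_delay_us`, and `relative_error`, and all numbers in the example, are hypothetical; this is not the InterSense implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkModel:
    """Hypothetical linear cost model: time = latency + size / bandwidth."""

    latency_us: float      # one-way latency in microseconds
    bandwidth_mb_s: float  # bandwidth in MB/s, with 1 MB = 10**6 bytes

    def __post_init__(self) -> None:
        # Reject non-finite or out-of-range parameters early.
        if not (math.isfinite(self.latency_us) and self.latency_us >= 0):
            raise ValueError("latency must be a finite, non-negative number")
        if not (math.isfinite(self.bandwidth_mb_s) and self.bandwidth_mb_s > 0):
            raise ValueError("bandwidth must be a finite, positive number")

    def transfer_time_us(self, size_bytes: int) -> float:
        # bytes / (10**6 bytes/s) yields microseconds, so no extra scaling is needed.
        return self.latency_us + size_bytes / self.bandwidth_mb_s


def injected_delay_us(native: LinkModel, target: LinkModel, size_bytes: int) -> float:
    """Extra delay to add per message so that `native` behaves like `target`.

    Delay injection can only slow a link down, so the target must have
    latency >= native latency and bandwidth <= native bandwidth.
    """
    if target.latency_us < native.latency_us or target.bandwidth_mb_s > native.bandwidth_mb_s:
        raise ValueError("target link must not be faster than the native link")
    if size_bytes < 0:
        raise ValueError("message size must be non-negative")
    # Equals the latency gap plus size * (1/B_target - 1/B_native).
    return target.transfer_time_us(size_bytes) - native.transfer_time_us(size_bytes)


def relative_error(expected: float, measured: float) -> float:
    """|measured - expected| / |expected|; 0.02 corresponds to 2%."""
    if expected == 0:
        raise ValueError("expected value must be non-zero")
    return abs(measured - expected) / abs(expected)


if __name__ == "__main__":
    native = LinkModel(latency_us=1.0, bandwidth_mb_s=6000.0)  # illustrative values only
    target = LinkModel(latency_us=5.0, bandwidth_mb_s=1500.0)  # illustrative values only
    for size in (0, 4096, 1_000_000):
        print(size, round(injected_delay_us(native, target, size), 3))
```

Under this assumed model the injected delay is linear in message size, so the two knobs are separable: the latency knob shifts the constant term, while the bandwidth knob changes the per-byte slope.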


Cited By

Underwood R, Anderson J, Apon A (2018). Measuring Network Latency Variation Impacts to High Performance Computing Application Performance. In Proceedings of the 2018 ACM/SPEC International Conference on Performance Engineering (eds. Wolter K, Knottenbelt W, van Hoorn A, Nambiar M, Koziolek H), pp. 68-79. DOI: 10.1145/3184407.3184427. Online publication date: 30-Mar-2018.
https://dl.acm.org/doi/10.1145/3184407.3184427

Medya S, Cherkasova L, Singh A (2017). Predictive modeling and scalability analysis for large graph analytics. In 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), pp. 63-71. DOI: 10.23919/INM.2017.7987265. Online publication date: May-2017.
https://doi.org/10.23919/INM.2017.7987265

Salhi H, Odeh F, Nasser R, Taweel A (2017). Open Source In-Memory Data Grid Systems. In Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering (eds. Binder W, Cortellessa V, Koziolek A, Smirni E, Poess M), pp. 163-164. DOI: 10.1145/3030207.3053671. Online publication date: 17-Apr-2017.
https://dl.acm.org/doi/10.1145/3030207.3053671

Salhi H, Odeh F, Nasser R, Taweel A (2017). Benchmarking and Performance Analysis for Distributed Cache Systems: A Comparative Case Study. In Performance Evaluation and Benchmarking for the Analytics Era, pp. 147-163. DOI: 10.1007/978-3-319-72401-0_11. Online publication date: 30-Dec-2017.
https://doi.org/10.1007/978-3-319-72401-0_11

Index Terms

General and reference
  Cross-computing tools and techniques


Information

Published In

ICPE '16: Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering

March 2016

346 pages

ISBN:9781450340809

DOI:10.1145/2851553

General Chairs:
Alberto Avritzer (Sonatype, Inc., USA)
Alexandru Iosup (Delft University of Technology, the Netherlands)

Program Chairs:
Xiaoyun Zhu (Futurewei Technologies, USA)
Steffen Becker (Chemnitz University of Technology, Germany)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 March 2016

Conference

ICPE'16

Sponsor:

ICPE'16: ACM/SPEC International Conference on Performance Engineering

March 12 - 16, 2016

Delft, The Netherlands

Acceptance Rates

ICPE '16 Paper Acceptance Rate 23 of 74 submissions, 31%;

Overall Acceptance Rate 252 of 851 submissions, 30%

Bibliometrics & Citations

Article Metrics

Total Citations: 4
Total Downloads: 102
Downloads (last 12 months): 3
Downloads (last 6 weeks): 1

Reflects downloads up to 22 Sep 2024
