RSIM: Simulating Shared-Memory Multiprocessors with ILP Processors

Published: 01 February 2002

Abstract

The early 1990s saw several announcements of commercial shared-memory systems using processors that aggressively exploited instruction-level parallelism (ILP), including the MIPS R10000, Hewlett-Packard PA8000, and Intel Pentium Pro. These processors could potentially reduce memory read stalls by overlapping read latency with other operations, possibly changing the nature of performance bottlenecks in the system.

The authors' experience with Rsim demonstrates that modeling ILP features is important even in shared-memory multiprocessor systems. In particular, current approximations based on simple processors cannot model significant performance effects for applications exhibiting parallel read misses. Further, recent shared-memory designs, such as aggressive implementations of sequential consistency, use the aggressive ILP-enhancing features of modern processors, which simulators based on simple processors do not model.

As microprocessor systems become more complex, the availability of shared infrastructure source code is likely to become increasingly crucial. The authors plan to release a new Rsim version shortly that will include instruction caches, TLBs, multimedia extensions, simultaneous multithreading, the Rabbit fast simulation mode, and ports to Linux platforms.

Published In

Computer, Volume 35, Issue 2
February 2002
105 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States
