Research article, DOI: 10.5555/2337159.2337194

Reducing memory reference energy with opportunistic virtual caching

Published: 09 June 2012

Abstract

Most modern cores perform a highly associative translation lookaside buffer (TLB) lookup on every memory access. These designs often hide the TLB lookup latency by overlapping it with L1 cache access, but this overlap does not hide the power dissipated by TLB lookups. It can even exacerbate the power dissipation by requiring a higher-associativity L1 cache. With today's concern for power dissipation, designs could instead adopt a virtual L1 cache, wherein TLB access power is dissipated only after L1 cache misses. Unfortunately, virtual caches have compatibility issues, such as supporting writeable synonyms and x86's physical page table walker.
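The associativity pressure mentioned above can be made concrete with a little arithmetic. The sketch below assumes a virtually indexed, physically tagged L1 whose index and block-offset bits must all fall within the page offset, so that the cache lookup can start before translation finishes. Under that assumption, one cache way can be no larger than one page. The function name and the example sizes are illustrative and do not come from the paper.

```python
def min_vipt_associativity(cache_bytes: int, page_bytes: int) -> int:
    """Smallest associativity such that one cache way fits inside one page.

    Assumes the set index and block offset are taken only from page-offset
    bits, which are identical in the virtual and physical address.
    """
    if cache_bytes <= 0 or page_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Constraint: cache_bytes / ways <= page_bytes, so ways >= cache / page.
    ways = -(-cache_bytes // page_bytes)  # ceiling division
    return max(1, ways)


if __name__ == "__main__":
    # Illustrative sizes: a 32 KiB L1 with 4 KiB pages.
    print(min_vipt_associativity(32 * 1024, 4 * 1024))
```

Under these assumptions, growing the L1 without growing the page size forces more ways, and every extra way is another tag and data array probed per access.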
This work proposes an Opportunistic Virtual Cache (OVC) that exposes virtual caching as a dynamic optimization by allowing some memory blocks to be cached with virtual addresses and others with physical addresses. OVC relies on small OS changes to signal which pages can use virtual caching (e.g., no writeable synonyms), but defaults to physical caching for compatibility. We show OVC's promise with analysis showing that virtual cache problems exist but are dynamically rare. We change 240 lines in Linux 2.6.28 to enable OVC. In experiments with Parsec and commercial workloads, the resulting system saves 94-99% of TLB lookup energy and nearly 23% of L1 cache dynamic lookup energy.
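The idea of mixing virtually and physically cached blocks can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's hardware design: the class `OpportunisticL1`, the `virtual_ok` set standing in for the OS's signal, the 4 KiB page and 64-byte block sizes, the single address space, and the unbounded cache with no evictions are all hypothetical simplifications. It only counts when a TLB lookup would be needed.

```python
from dataclasses import dataclass, field

PAGE_BITS = 12   # assumed 4 KiB pages
BLOCK_BITS = 6   # assumed 64-byte cache blocks


@dataclass
class OpportunisticL1:
    page_table: dict                          # virtual page number -> physical page number
    virtual_ok: set                           # pages the OS marked safe for virtual caching
    tags: set = field(default_factory=set)    # cached blocks; no eviction in this toy model
    tlb_lookups: int = 0

    def _translate(self, vaddr: int) -> int:
        """Model one TLB lookup and return the physical address."""
        self.tlb_lookups += 1
        ppn = self.page_table[vaddr >> PAGE_BITS]
        return (ppn << PAGE_BITS) | (vaddr & ((1 << PAGE_BITS) - 1))

    def access(self, vaddr: int) -> bool:
        """Return True on an L1 hit, False on a miss (the block is then filled)."""
        if (vaddr >> PAGE_BITS) in self.virtual_ok:
            tag = ("V", vaddr >> BLOCK_BITS)
            if tag in self.tags:
                return True                   # virtual hit: no TLB lookup at all
            self._translate(vaddr)            # miss: translate to fetch from lower levels
        else:
            # Default path: physical caching, so every access is translated.
            tag = ("P", self._translate(vaddr) >> BLOCK_BITS)
            if tag in self.tags:
                return True
        self.tags.add(tag)
        return False


if __name__ == "__main__":
    l1 = OpportunisticL1(page_table={0: 7, 1: 9}, virtual_ok={0})
    for addr in (0x10, 0x10, 0x1010, 0x1010):
        l1.access(addr)
    print("TLB lookups:", l1.tlb_lookups)
```

In this model, repeated accesses to a page marked for virtual caching pay for translation only on the first miss, while accesses to any other page pay on every access, which is the compatibility default the abstract describes.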

Published In

ISCA '12: Proceedings of the 39th Annual International Symposium on Computer Architecture
June 2012
584 pages
ISBN: 9781450316422

ACM SIGARCH Computer Architecture News, Volume 40, Issue 3
ISCA '12
June 2012
559 pages
ISSN: 0163-5964
DOI: 10.1145/2366231

Publisher

IEEE Computer Society, United States


Acceptance Rates

ISCA '12 Paper Acceptance Rate: 47 of 262 submissions, 18%
Overall Acceptance Rate: 543 of 3,203 submissions, 17%
