Other architectures

Showing 1 - 20 of 145 Results

research-article
Free
September 2024
JUST ACCEPTED
Canalis: A Throughput-Optimized Framework for Real-Time Stream Processing of Wireless Communication
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted https://doi.org/10.1145/3695880
Stream processing, which involves real-time computation of data as it is created or received, is vital for various applications, specifically wireless communication. The evolving protocols, the requirement for high-throughput, and the challenges of ...
Total Citations: 0
research-article
Free
September 2024
JUST ACCEPTED
Graph-OPU: A highly flexible FPGA-Based Overlay Processor for Graph Neural Networks
- Enhao Tang
- Shun Li
- Ruiqi Chen
- Hao Zhou
- Yuhanxiao Ma
- Haoyang Zhang
- Jun Yu
- Kun Wang
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted https://doi.org/10.1145/3691636
Field-programmable gate arrays (FPGAs) are an ideal candidate for accelerating graph neural networks (GNNs). However, the FPGA redeployment process is time-consuming when updating or switching between diverse GNN models across different applications. ...
Total Citations: 0; Total Downloads: 119; Last 12 Months: 119; Last 6 Weeks: 119
research-article
Free
August 2024
JUST ACCEPTED
FPGA-Based Sparse Matrix Multiplication Accelerators: From State-of-the-art to Future Opportunities
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted https://doi.org/10.1145/3687480
Sparse matrix multiplication (SpMM) plays a critical role in high-performance computing applications, such as deep learning, image processing, and physical simulation. Field-Programmable Gate Arrays (FPGAs), with their configurable hardware resources, can ...
Total Citations: 0; Total Downloads: 224; Last 12 Months: 224; Last 6 Weeks: 224
research-article
Free
August 2024
JUST ACCEPTED
CHARM 2.0: Composing Heterogeneous Accelerators for Deep Learning on Versal ACAP Architecture
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted https://doi.org/10.1145/3686163
Dense matrix multiply (MM) serves as one of the most heavily used kernels in deep learning applications. To cope with the high computation demands of these applications, heterogeneous architectures featuring both FPGA and dedicated ASIC accelerators have ...
Total Citations: 0; Total Downloads: 142; Last 12 Months: 142; Last 6 Weeks: 142
research-article
Free
August 2024
JUST ACCEPTED
PASTA: Programming and Automation Support for Scalable Task-Parallel HLS Programs on Modern Multi-Die FPGAs
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted https://doi.org/10.1145/3676849
In recent years, the adoption of FPGAs in datacenters has increased, with a growing number of users choosing High-Level Synthesis (HLS) as their preferred programming method. While HLS simplifies FPGA programming, one notable challenge arises when scaling ...
Total Citations: 0; Total Downloads: 92; Last 12 Months: 92; Last 6 Weeks: 92
research-article
Free
July 2024
JUST ACCEPTED
SQL2FPGA: Automated Acceleration of SQL Query Processing on Modern CPU-FPGA Platforms
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted https://doi.org/10.1145/3674843
Today’s big data query engines are constantly under pressure to keep up with the rapidly increasing demand for faster processing of more complex workloads. In the past few years, FPGA-based database acceleration efforts have demonstrated promising ...
Total Citations: 0; Total Downloads: 75; Last 12 Months: 75; Last 6 Weeks: 27
research-article
Free
July 2024
JUST ACCEPTED
A Scalable Accelerator for Local Score Computation of Structure Learning in Bayesian Networks
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted https://doi.org/10.1145/3674842
A Bayesian network is a powerful tool for representing uncertainty in data, offering transparent and interpretable inference, unlike neural networks’ black-box mechanisms. To fully harness the potential of Bayesian networks, it is essential to learn the ...
Total Citations: 0; Total Downloads: 74; Last 12 Months: 74; Last 6 Weeks: 21
research-article
Open Access
July 2024
NC-Library: Expanding SystemC Capabilities for Nested reConfigurable Hardware Modelling
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 3, Article No.: 37, Pages 1–29, https://doi.org/10.1145/3662001
As runtime reconfiguration is used in an increasing number of hardware architectures, new simulation and modeling tools are needed to support the developer during the design phases. In this article, a language extension for SystemC is presented, together ...
Total Citations: 0; Total Downloads: 307; Last 12 Months: 307; Last 6 Weeks: 135
research-article
Free
June 2024
JUST ACCEPTED
Efficient SpMM Accelerator for Deep Learning: Sparkle and Its Automated Generator
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted https://doi.org/10.1145/3665896
Deep learning (DL) technology has made breakthroughs in a wide range of intelligent tasks such as vision, language, recommendation systems, etc. Sparse matrix multiplication (SpMM) is the key computation kernel of most sparse models. Conventional ...
Total Citations: 0; Total Downloads: 153; Last 12 Months: 153; Last 6 Weeks: 37
research-article
Open Access
May 2024
R-Blocks: an Energy-Efficient, Flexible, and Programmable CGRA
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 2, Article No.: 34, Pages 1–34, https://doi.org/10.1145/3656642
Emerging data-driven applications in the embedded, e-Health, and internet of things (IoT) domain require complex on-device signal analysis and data reduction to maximize energy efficiency on these energy-constrained devices. Coarse-grained reconfigurable ...
Total Citations: 0; Total Downloads: 1,469; Last 12 Months: 1,469; Last 6 Weeks: 130
research-article
Open Access
April 2024
Across Time and Space: Senju’s Approach for Scaling Iterative Stencil Loop Accelerators on Single and Multiple FPGAs
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 2, Article No.: 28, Pages 1–33, https://doi.org/10.1145/3634920
Stencil-based applications play an essential role in high-performance systems as they occur in numerous computational areas, such as partial differential equation solving. In this context, Iterative Stencil Loops (ISLs) represent a prominent and well-...
Total Citations: 1; Total Downloads: 981; Last 12 Months: 981; Last 6 Weeks: 59
research-article
April 2024
HyBNN: Quantifying and Optimizing Hardware Efficiency of Binary Neural Networks
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 2, Article No.: 25, Pages 1–24, https://doi.org/10.1145/3631610
Binary neural network (BNN), where both the weight and the activation values are represented with one bit, provides an attractive alternative to deploy highly efficient deep learning inference on resource-constrained edge devices. However, our ...
Total Citations: 0; Total Downloads: 354; Last 12 Months: 354; Last 6 Weeks: 45
research-article
Free
April 2024
JUST ACCEPTED
HLPerf: Demystifying the Performance of HLS-based Graph Neural Networks with Dataflow Architectures
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted https://doi.org/10.1145/3655627
The development of FPGA-based applications using HLS is fraught with performance pitfalls and large design space exploration times. These issues are exacerbated when the application is complicated and its performance is dependent on the input data set, as ...
Total Citations: 0; Total Downloads: 233; Last 12 Months: 233; Last 6 Weeks: 49
research-article
Open Access
September 2024
FADO: Floorplan-Aware Directive Optimization Based on Synthesis and Analytical Models for High-Level Synthesis Designs on Multi-Die FPGAs
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 3, Article No.: 47, Pages 1–33, https://doi.org/10.1145/3653458
Multi-die FPGAs are widely adopted for large-scale accelerators, but optimizing high-level synthesis designs on these FPGAs faces two challenges. First, the delay caused by die-crossing nets creates an NP-hard floorplanning problem. Second, traditional ...
Total Citations: 0; Total Downloads: 252; Last 12 Months: 252; Last 6 Weeks: 45
research-article
Open Access
March 2024
ExHiPR: Extended High-Level Partial Reconfiguration for Fast Incremental FPGA Compilation
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 2, Article No.: 21, Pages 1–28, https://doi.org/10.1145/3617837
Partial Reconfiguration (PR) is a key technique in the application design on modern FPGAs. However, current PR tools heavily rely on the developer to manually conduct PR module definition, floorplanning, and flow control at a low level. The existing PR ...
Total Citations: 0; Total Downloads: 1,298; Last 12 Months: 1,298; Last 6 Weeks: 99
research-article
March 2024
XVDPU: A High-Performance CNN Accelerator on the Versal Platform Powered by the AI Engine
- Xijie Jia
- Yu Zhang
- Guangdong Liu
- Xinlin Yang
- Tianyu Zhang
- Jia Zheng
- Dongdong Xu
- Zhuohuan Liu
- Mengke Liu
- Xiaoyang Yan
- Hong Wang
- Rongzhang Zheng
- Li Wang
- Dong Li
- Satyaprakash Pareek
- Jian Weng
- Lu Tian
- Dongliang Xie
- Hong Luo
- Yi Shan
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 2, Article No.: 20, Pages 1–24, https://doi.org/10.1145/3617836
Today, convolutional neural networks (CNNs) are widely used in computer vision applications. However, the trends of higher accuracy and higher resolution generate larger networks. The requirements of computation or I/O are the key bottlenecks. In this ...
Total Citations: 0; Total Downloads: 863; Last 12 Months: 863; Last 6 Weeks: 118
research-article
Open Access
March 2024
GraphScale: Scalable Processing on FPGAs for HBM and Large Graphs
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 2, Article No.: 22, Pages 1–23, https://doi.org/10.1145/3616497
Recent advances in graph processing on FPGAs promise to alleviate performance bottlenecks with irregular memory access patterns. Such bottlenecks challenge performance for a growing number of important application areas like machine learning and data ...
Total Citations: 0; Total Downloads: 1,512; Last 12 Months: 1,512; Last 6 Weeks: 119
research-article
September 2024
DONGLE 2.0: Direct FPGA-Orchestrated NVMe Storage for HLS
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 3, Article No.: 45, Pages 1–32, https://doi.org/10.1145/3650038
Rapid growth in data size poses significant computational and memory challenges to data processing. FPGA accelerators and near-storage processing have emerged as compelling solutions for tackling the growing computational and memory requirements. Many ...
Total Citations: 0; Total Downloads: 204; Last 12 Months: 204; Last 6 Weeks: 47
research-article
Open Access
February 2024
Eciton: Very Low-power Recurrent Neural Network Accelerator for Real-time Inference at the Edge
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 1, Article No.: 16, Pages 1–25, https://doi.org/10.1145/3629979
This article presents Eciton, a very low-power recurrent neural network accelerator for time series data within low-power edge sensor nodes, achieving real-time inference with a power consumption of 17 mW under load. Eciton reduces memory and chip ...
Total Citations: 0; Total Downloads: 1,454; Last 12 Months: 1,454; Last 6 Weeks: 90
research-article
Open Access
January 2024
Tailor: Altering Skip Connections for Resource-Efficient Inference
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 1, Article No.: 11, Pages 1–23, https://doi.org/10.1145/3624990
Deep neural networks use skip connections to improve training convergence. However, these skip connections are costly in hardware, requiring extra buffers and increasing on- and off-chip memory utilization and bandwidth requirements. In this article, we ...
Total Citations: 1; Total Downloads: 1,745; Last 12 Months: 1,745; Last 6 Weeks: 117
