- research-article, September 2024, Just Accepted
Canalis: A Throughput-Optimized Framework for Real-Time Stream Processing of Wireless Communication
- Kuan-Yu Chen,
- Thomas Mason Nelson,
- Alireza Khadem,
- Morteza Fayazi,
- Sanjay Sri Vallabh Singapuram,
- Ronald Dreslinski,
- Nishil Talati,
- Hun-Seok Kim,
- David Blaauw
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted, https://doi.org/10.1145/3695880
Stream processing, which involves real-time computation of data as it is created or received, is vital for various applications, specifically wireless communication. The evolving protocols, the requirement for high-throughput, and the challenges of ...
- research-article, August 2024, Just Accepted
FPGA-Based Sparse Matrix Multiplication Accelerators: From State-of-the-art to Future Opportunities
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted, https://doi.org/10.1145/3687480
Sparse matrix multiplication (SpMM) plays a critical role in high-performance computing applications, such as deep learning, image processing, and physical simulation. Field-Programmable Gate Arrays (FPGAs), with their configurable hardware resources, can ...
- research-article, August 2024, Just Accepted
CHARM 2.0: Composing Heterogeneous Accelerators for Deep Learning on Versal ACAP Architecture
- research-article, August 2024, Just Accepted
PASTA: Programming and Automation Support for Scalable Task-Parallel HLS Programs on Modern Multi-Die FPGAs
- Moazin Khatti,
- Xingyu Tian,
- Ahmad Sedigh Baroughi,
- Akhil Raj Baranwal,
- Yuze Chi,
- Licheng Guo,
- Jason Cong,
- Zhenman Fang
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted, https://doi.org/10.1145/3676849
In recent years, the adoption of FPGAs in datacenters has increased, with a growing number of users choosing High-Level Synthesis (HLS) as their preferred programming method. While HLS simplifies FPGA programming, one notable challenge arises when scaling ...
- research-article, July 2024, Just Accepted
SQL2FPGA: Automated Acceleration of SQL Query Processing on Modern CPU-FPGA Platforms
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted, https://doi.org/10.1145/3674843
Today’s big data query engines are constantly under pressure to keep up with the rapidly increasing demand for faster processing of more complex workloads. In the past few years, FPGA-based database acceleration efforts have demonstrated promising ...
- research-article, July 2024, Just Accepted
A Scalable Accelerator for Local Score Computation of Structure Learning in Bayesian Networks
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted, https://doi.org/10.1145/3674842
A Bayesian network is a powerful tool for representing uncertainty in data, offering transparent and interpretable inference, unlike neural networks’ black-box mechanisms. To fully harness the potential of Bayesian networks, it is essential to learn the ...
- research-article, July 2024
NC-Library: Expanding SystemC Capabilities for Nested reConfigurable Hardware Modelling
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 3, Article No.: 37, Pages 1–29, https://doi.org/10.1145/3662001
As runtime reconfiguration is used in an increasing number of hardware architectures, new simulation and modeling tools are needed to support the developer during the design phases. In this article, a language extension for SystemC is presented, together ...
- research-article, May 2024
R-Blocks: an Energy-Efficient, Flexible, and Programmable CGRA
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 2, Article No.: 34, Pages 1–34, https://doi.org/10.1145/3656642
Emerging data-driven applications in the embedded, e-Health, and internet of things (IoT) domain require complex on-device signal analysis and data reduction to maximize energy efficiency on these energy-constrained devices. Coarse-grained reconfigurable ...
- research-article, April 2024
Across Time and Space: Senju’s Approach for Scaling Iterative Stencil Loop Accelerators on Single and Multiple FPGAs
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 2, Article No.: 28, Pages 1–33, https://doi.org/10.1145/3634920
Stencil-based applications play an essential role in high-performance systems as they occur in numerous computational areas, such as partial differential equation solving. In this context, Iterative Stencil Loops (ISLs) represent a prominent and well-...
- research-article, April 2024
HyBNN: Quantifying and Optimizing Hardware Efficiency of Binary Neural Networks
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 2, Article No.: 25, Pages 1–24, https://doi.org/10.1145/3631610
Binary neural network (BNN), where both the weight and the activation values are represented with one bit, provides an attractive alternative to deploy highly efficient deep learning inference on resource-constrained edge devices. However, our ...
- research-article, April 2024, Just Accepted
HLPerf: Demystifying the Performance of HLS-based Graph Neural Networks with Dataflow Architectures
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted, https://doi.org/10.1145/3655627
The development of FPGA-based applications using HLS is fraught with performance pitfalls and large design space exploration times. These issues are exacerbated when the application is complicated and its performance is dependent on the input data set, as ...
- research-article, September 2024
FADO: Floorplan-Aware Directive Optimization Based on Synthesis and Analytical Models for High-Level Synthesis Designs on Multi-Die FPGAs
- Linfeng Du,
- Tingyuan Liang,
- Xiaofeng Zhou,
- Jinming Ge,
- Shangkun Li,
- Sharad Sinha,
- Jieru Zhao,
- Zhiyao Xie,
- Wei Zhang
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 3, Article No.: 47, Pages 1–33, https://doi.org/10.1145/3653458
Multi-die FPGAs are widely adopted for large-scale accelerators, but optimizing high-level synthesis designs on these FPGAs faces two challenges. First, the delay caused by die-crossing nets creates an NP-hard floorplanning problem. Second, traditional ...
- research-article, March 2024
ExHiPR: Extended High-Level Partial Reconfiguration for Fast Incremental FPGA Compilation
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 2, Article No.: 21, Pages 1–28, https://doi.org/10.1145/3617837
Partial Reconfiguration (PR) is a key technique in the application design on modern FPGAs. However, current PR tools heavily rely on the developer to manually conduct PR module definition, floorplanning, and flow control at a low level. The existing PR ...
- research-article, March 2024
XVDPU: A High-Performance CNN Accelerator on the Versal Platform Powered by the AI Engine
- Xijie Jia,
- Yu Zhang,
- Guangdong Liu,
- Xinlin Yang,
- Tianyu Zhang,
- Jia Zheng,
- Dongdong Xu,
- Zhuohuan Liu,
- Mengke Liu,
- Xiaoyang Yan,
- Hong Wang,
- Rongzhang Zheng,
- Li Wang,
- Dong Li,
- Satyaprakash Pareek,
- Jian Weng,
- Lu Tian,
- Dongliang Xie,
- Hong Luo,
- Yi Shan
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 2, Article No.: 20, Pages 1–24, https://doi.org/10.1145/3617836
Today, convolutional neural networks (CNNs) are widely used in computer vision applications. However, the trends of higher accuracy and higher resolution generate larger networks. The requirements of computation or I/O are the key bottlenecks. In this ...
- research-article, March 2024
GraphScale: Scalable Processing on FPGAs for HBM and Large Graphs
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 2, Article No.: 22, Pages 1–23, https://doi.org/10.1145/3616497
Recent advances in graph processing on FPGAs promise to alleviate performance bottlenecks with irregular memory access patterns. Such bottlenecks challenge performance for a growing number of important application areas like machine learning and data ...
- research-article, September 2024
DONGLE 2.0: Direct FPGA-Orchestrated NVMe Storage for HLS
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 3, Article No.: 45, Pages 1–32, https://doi.org/10.1145/3650038
Rapid growth in data size poses significant computational and memory challenges to data processing. FPGA accelerators and near-storage processing have emerged as compelling solutions for tackling the growing computational and memory requirements. Many ...
- research-article, February 2024
Eciton: Very Low-power Recurrent Neural Network Accelerator for Real-time Inference at the Edge
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 1, Article No.: 16, Pages 1–25, https://doi.org/10.1145/3629979
This article presents Eciton, a very low-power recurrent neural network accelerator for time series data within low-power edge sensor nodes, achieving real-time inference with a power consumption of 17 mW under load. Eciton reduces memory and chip ...
- research-article, January 2024
Tailor: Altering Skip Connections for Resource-Efficient Inference
- Olivia Weng,
- Gabriel Marcano,
- Vladimir Loncar,
- Alireza Khodamoradi,
- Abarajithan G,
- Nojan Sheybani,
- Andres Meza,
- Farinaz Koushanfar,
- Kristof Denolf,
- Javier Mauricio Duarte,
- Ryan Kastner
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 1, Article No.: 11, Pages 1–23, https://doi.org/10.1145/3624990
Deep neural networks use skip connections to improve training convergence. However, these skip connections are costly in hardware, requiring extra buffers and increasing on- and off-chip memory utilization and bandwidth requirements. In this article, we ...