PLDI 2025
Mon 16 - Fri 20 June 2025 Seoul, South Korea
Tue 17 Jun 2025 11:10 - 11:30 at Violet - Embedded Systems and Real-Time Optimization Chair(s): Yulei Sui

The Fast Fourier Transform (FFT) is critical in applications such as signal processing, communications, and AI.
Embedded GPUs are often used to accelerate FFT due to their computational efficiency, but energy efficiency remains a key challenge due to power constraints.
Existing solutions, such as the cuFFT library provided by NVIDIA, employ static configurations for the number of thread blocks and threads per block.
This static approach often results in ineffective threads that consume power without contributing to performance, particularly if the FFT length or batch size varies.
Furthermore, for large FFT lengths, cuFFT internally splits the computation into multiple kernel invocations.
This decomposition can lead to L2 cache thrashing, resulting in redundant global memory accesses and degraded efficiency.
To address these challenges, this paper proposes SSFFT, a software technique for embedded GPUs.
The key idea of SSFFT is to maximize the number of useful threads that contribute to performance while minimizing ineffective threads.
SSFFT is implemented based on a novel theoretical model that determines how many thread blocks and threads per block are effective for a given FFT length, batch size, and hardware resource availability.
SSFFT statically determines these configurations and adaptively launches either a GPU kernel for regular FFT operations or a newly implemented kernel that integrates multiple FFT steps.
By tailoring thread allocation to workload characteristics and minimizing inter-kernel memory interference, SSFFT improves energy efficiency without compromising performance.
In our evaluation, SSFFT achieves a 1.29× speedup and a 1.26× improvement in throughput per watt compared to cuFFT.
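
To make the notion of useful versus ineffective threads concrete, the following is a minimal sketch of a launch-sizing calculation. It is not SSFFT's theoretical model, which the abstract does not spell out. It assumes a radix-2 FFT stage with one thread per butterfly, a limit of 1024 threads per block, and a warp size of 32; the names `launch_config` and `idle_threads` are hypothetical.

```python
from math import ceil

MAX_THREADS_PER_BLOCK = 1024  # assumed hardware limit for this sketch
WARP_SIZE = 32                # threads are scheduled in warps of 32


def launch_config(fft_len, batch, max_threads_per_block=MAX_THREADS_PER_BLOCK):
    """Return (blocks, threads_per_block, idle) for one radix-2 FFT stage.

    Assumption: one thread computes one butterfly, so a single FFT of
    length fft_len needs fft_len // 2 threads per stage.
    """
    if fft_len < 2 or fft_len & (fft_len - 1):
        raise ValueError("fft_len must be a power of two >= 2")
    if batch < 1:
        raise ValueError("batch must be >= 1")

    per_fft = fft_len // 2
    useful = per_fft * batch
    # Round the per-FFT thread count up to whole warps, capped by the limit.
    threads_per_block = min(max_threads_per_block,
                            ceil(per_fft / WARP_SIZE) * WARP_SIZE)
    blocks = ceil(useful / threads_per_block)
    idle = blocks * threads_per_block - useful
    return blocks, threads_per_block, idle


def idle_threads(blocks, threads_per_block, fft_len, batch):
    """Threads launched by a fixed configuration that have no butterfly to do."""
    useful = (fft_len // 2) * batch
    return max(0, blocks * threads_per_block - useful)


if __name__ == "__main__":
    # Sized to the workload: length-1024 FFT, batch of 4.
    print(launch_config(1024, 4))
    # A hypothetical fixed configuration of 64 blocks x 256 threads.
    print(idle_threads(64, 256, 1024, 4))
```

Under these assumptions, a length-1024 FFT with batch size 4 needs 2048 threads, while a fixed launch of 64 blocks of 256 threads would start 16384, leaving most of them with no work. A real model would also have to account for shared memory, registers, and occupancy, which this sketch ignores.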

Tue 17 Jun

Displayed time zone: Seoul

10:30 - 11:50
Embedded Systems and Real-Time Optimization (LCTES) at Violet
Chair(s): Yulei Sui University of New South Wales
10:30
20m
Talk
rtesbench: A Multi-core Benchmark Framework for Real-Time Embedded Systems
LCTES
Yixiao Xing Nagoya University, Yixiao Li Nagoya University, Hiroaki Takada Nagoya University
10:50
20m
Talk
ASC-Hook: Efficient System Call Interception for ARM
LCTES
Yang Shen National University of Defense Technology, Min Xie National University of Defense Technology, Tao Wu Changsha University of Science and Technology, Wenzhe Zhang National University of Defense Technology, China, Ruibo Wang National University of Defense Technology, Gen Zhang National University of Defense Technology
11:10
20m
Talk
SSFFT: Energy-Efficient Selective Scaling for Fast Fourier Transform in Embedded GPUs
LCTES
Dongwon Yang Korea University, Jaebeom Jeon Korea University, Minseong Gil Korea University, Junsu Kim Korea University, Seondeok Kim Korea University, Gunjae Koo Korea University, Myung Kuk Yoon Ewha Womans University, Yunho Oh Korea University
11:30
20m
Talk
vNV-Heap: An Ownership-Based Virtually Non-volatile Heap for Embedded Systems
LCTES
Markus Elias Gerber Friedrich-Alexander-Universität Erlangen-Nürnberg, Luis Gerhorst Friedrich-Alexander-Universität Erlangen-Nürnberg, Ishwar Mudraje Saarland University, Kai Vogelgesang Saarland University, Thorsten Herfet Saarland University, Peter Wägemann Friedrich-Alexander University Erlangen-Nürnberg (FAU)