SSFFT: Energy-Efficient Selective Scaling for Fast Fourier Transform in Embedded GPUs
Fast Fourier Transform (FFT) is critical in applications such as signal processing, communications, and AI.
Embedded GPUs are often used to accelerate FFT because of their computational efficiency, but energy efficiency remains a key challenge under their tight power constraints.
Existing solutions, such as the cuFFT library provided by NVIDIA, employ static configurations for the number of thread blocks and threads per block.
This static approach often leaves ineffective threads that consume power without contributing to performance, particularly when the FFT length or batch size varies.
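The cost of a fixed launch configuration can be illustrated with simple counting. The sketch below is not cuFFT's actual configuration logic: the function name `thread_utilization`, the fixed grid of 64 blocks by 256 threads, and the assumption that each useful thread handles `elems_per_thread` points of one transform are all hypothetical choices made for illustration.

```python
import math

def thread_utilization(fft_len, batch, blocks, threads_per_block, elems_per_thread=8):
    """Return (launched, useful, idle) thread counts for one kernel launch.

    Illustrative assumption: each useful thread processes
    `elems_per_thread` points of a single transform.
    """
    launched = blocks * threads_per_block
    needed = batch * math.ceil(fft_len / elems_per_thread)
    useful = min(launched, needed)
    return launched, useful, launched - useful

# The same fixed launch (64 blocks x 256 threads) applied to two workloads.
for n, b in [(4096, 32), (256, 4)]:
    launched, useful, idle = thread_utilization(n, b, blocks=64, threads_per_block=256)
    print(f"N={n} batch={b}: {idle} of {launched} threads idle")
```

Under these assumptions the large workload keeps every launched thread busy, while the small one leaves most of the grid idle, which is the kind of mismatch the abstract describes.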
Furthermore, for large FFT lengths, cuFFT internally splits the computation into multiple kernel invocations.
This decomposition can lead to L2 cache thrashing, resulting in redundant global memory accesses and degraded efficiency.
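A coarse capacity check shows why splitting a large FFT across kernels can cause redundant memory traffic. This is a back-of-the-envelope model, not a cache simulator and not taken from the paper: the 4 MiB L2 size, the helper names, and the pessimistic rule that an overflowing working set is fully reloaded by each later kernel are assumptions.

```python
COMPLEX64_BYTES = 8      # one single-precision complex sample
MIB = 1024 * 1024

def intermediate_fits_in_l2(fft_len, batch, l2_bytes):
    """True if the data handed from one kernel to the next fits in L2."""
    return fft_len * batch * COMPLEX64_BYTES <= l2_bytes

def extra_dram_bytes(fft_len, batch, l2_bytes, num_kernels):
    """Rough upper bound on redundant global-memory reads.

    Pessimistic assumption: if the working set overflows L2, every kernel
    after the first reloads the whole intermediate array from DRAM.
    """
    if intermediate_fits_in_l2(fft_len, batch, l2_bytes):
        return 0
    return (num_kernels - 1) * fft_len * batch * COMPLEX64_BYTES

# Hypothetical 4 MiB L2 and a two-kernel decomposition.
for n in (2**16, 2**20):
    print(n, extra_dram_bytes(n, batch=1, l2_bytes=4 * MIB, num_kernels=2) // MIB, "MiB")
```

In this model a 2^16-point transform stays within the cache between kernels, whereas a 2^20-point transform exceeds it and must be read again from global memory.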
To address these challenges, this paper proposes SSFFT, a software technique for embedded GPUs.
The key idea of SSFFT is to maximize the number of useful threads that contribute to performance while minimizing ineffective threads.
SSFFT is implemented based on a novel theoretical model that determines how many thread blocks and threads per block are effective for a given FFT length, batch size, and hardware resource availability.
SSFFT dynamically determines these configurations and adaptively launches either a GPU kernel for regular FFT operations or a newly implemented kernel that integrates multiple FFT steps.
By tailoring thread allocation to workload characteristics and minimizing inter-kernel memory interference, SSFFT improves energy efficiency without compromising performance.
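The general shape of such a workload-aware launch decision can be sketched as follows. This does not reproduce the theoretical model of SSFFT, which the abstract does not spell out, and it ignores hardware resource availability. The names `LaunchPlan` and `plan_launch`, the `elems_per_thread` granularity, and the `fused_threshold` cutoff are hypothetical; only the warp size of 32 and the 1024-thread block limit reflect current NVIDIA GPUs.

```python
import math
from dataclasses import dataclass

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs

@dataclass
class LaunchPlan:
    blocks: int
    threads_per_block: int
    kernel: str  # "regular" or "fused" (hypothetical labels)

def plan_launch(fft_len, batch, max_threads_per_block=1024,
                elems_per_thread=8, fused_threshold=8192):
    # Threads one transform can keep busy, rounded up to whole warps.
    threads_needed = math.ceil(fft_len / elems_per_thread)
    threads = min(max_threads_per_block,
                  math.ceil(threads_needed / WARP_SIZE) * WARP_SIZE)
    # Just enough blocks to cover every transform in the batch.
    blocks = batch * math.ceil(threads_needed / threads)
    # Long transforms go to the kernel that merges several FFT steps.
    kernel = "fused" if fft_len > fused_threshold else "regular"
    return LaunchPlan(blocks, threads, kernel)

print(plan_launch(256, 4))
print(plan_launch(65536, 2))
```

The point of the sketch is only that both grid dimensions and the kernel choice become functions of FFT length and batch size rather than constants.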
In our evaluation, SSFFT achieves a 1.29$\times$ speedup and a 1.26$\times$ improvement in throughput per watt compared to cuFFT.
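For reference, throughput per watt is throughput divided by average power. The helper below computes that metric and, under the assumption that throughput scales directly with the reported speedup, the relative average power implied by the two figures above. The function names are hypothetical, and because the reported numbers are likely aggregated over several workloads, the resulting ratio is only indicative.

```python
def throughput_per_watt(num_ffts, seconds, avg_watts):
    """FFTs completed per second, divided by average power in watts."""
    return (num_ffts / seconds) / avg_watts

def implied_power_ratio(speedup, efficiency_gain):
    """Relative average power, assuming throughput scales with speedup."""
    return speedup / efficiency_gain

# Figures from the abstract: 1.29x speedup, 1.26x throughput per watt.
print(round(implied_power_ratio(1.29, 1.26), 3))
```

A ratio close to 1 would be consistent with most of the efficiency gain coming from shorter execution time at roughly similar average power.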
Tue 17 Jun (times shown in Seoul time zone)
Session: 10:30 - 11:50
10:30 (20 min talk, LCTES): rtesbench: A Multi-core Benchmark Framework for Real-Time Embedded Systems
10:50 (20 min talk, LCTES): ASC-Hook: Efficient System Call Interception for ARM. Yang Shen (National University of Defense Technology), Min Xie (National University of Defense Technology), Tao Wu (Changsha University of Science and Technology), Wenzhe Zhang (National University of Defense Technology, China), Ruibo Wang (National University of Defense Technology), Gen Zhang (National University of Defense Technology)
11:10 (20 min talk, LCTES): SSFFT: Energy-Efficient Selective Scaling for Fast Fourier Transform in Embedded GPUs. Dongwon Yang (Korea University), Jaebeom Jeon (Korea University), Minseong Gil (Korea University), Junsu Kim (Korea University), Seondeok Kim (Korea University), Gunjae Koo (Korea University), Myung Kuk Yoon (Ewha Womans University), Yunho Oh (Korea University)
11:30 (20 min talk, LCTES): vNV-Heap: An Ownership-Based Virtually Non-volatile Heap for Embedded Systems. Markus Elias Gerber (Friedrich-Alexander-Universität Erlangen-Nürnberg), Luis Gerhorst (Friedrich-Alexander-Universität Erlangen-Nürnberg), Ishwar Mudraje (Saarland University), Kai Vogelgesang (Saarland University), Thorsten Herfet (Saarland University), Peter Wägemann (Friedrich-Alexander University Erlangen-Nürnberg (FAU))