SSFFT: Energy-Efficient Selective Scaling for Fast Fourier Transform in Embedded GPUs (LCTES 2025 - Languages, Compilers, Tools and Theory of Embedded Systems)

Who

Dongwon Yang, Jaebeom Jeon, Minseong Gil, Junsu Kim, Seondeok Kim, Gunjae Koo, Myung Kuk Yoon, Yunho Oh

Track

LCTES 2025

Time Zone

The program is currently displayed in (GMT+09:00) Seoul.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 17 Jun 2025 11:10 - 11:30 at Violet - Embedded Systems and Real-Time Optimization Chair(s): Yulei Sui

Abstract

Fast Fourier Transform (FFT) is critical in applications such as signal processing, communications, and AI.
Embedded GPUs are often used to accelerate FFT because of their computational efficiency, but their tight power budgets make energy efficiency a key challenge.
Existing solutions, such as the cuFFT library provided by NVIDIA, employ static configurations for the number of thread blocks and threads per block.
This static approach often results in ineffective threads that consume power without contributing to performance, particularly if the FFT length or batch size varies.
Furthermore, for large FFT lengths, cuFFT internally splits the computation into multiple kernel invocations.
This decomposition can lead to L2 cache thrashing, resulting in redundant global memory accesses and degraded efficiency.
To address these challenges, this paper proposes SSFFT, a software technique for embedded GPUs.
The key idea of SSFFT is to maximize the number of useful threads that contribute to performance while minimizing ineffective threads.
SSFFT is implemented based on a novel theoretical model that determines how many thread blocks and threads per block are effective for a given FFT length, batch size, and hardware resource availability.
SSFFT statically determines these configurations and adaptively launches either a GPU kernel for regular FFT operations or a newly implemented kernel that integrates multiple FFT steps.
By tailoring thread allocation to workload characteristics and minimizing inter-kernel memory interference, SSFFT improves energy efficiency without compromising performance.
In our evaluation, SSFFT achieves a 1.29× speedup and a 1.26× improvement in throughput per watt compared to cuFFT.
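
To make the notion of "ineffective threads" concrete, the following Python sketch counts how many launched threads would have no work under a simplified model. It is not the theoretical model from the paper, which the abstract does not spell out. It assumes a radix-2 FFT in which each thread handles exactly one butterfly per stage, so a batch of FFTs of length N exposes (N / 2) × batch independent butterflies per stage. The function names, the default of 128 threads per block, and the cap of 64 blocks are all hypothetical choices for illustration.

```python
import math


def radix2_butterflies_per_stage(fft_length: int, batch_size: int) -> int:
    """Independent radix-2 butterflies available in one stage of a batched FFT."""
    if fft_length < 2 or fft_length & (fft_length - 1):
        raise ValueError("fft_length must be a power of two >= 2")
    return (fft_length // 2) * batch_size


def idle_threads(fft_length: int, batch_size: int,
                 blocks: int, threads_per_block: int) -> int:
    """Launched threads that have no butterfly to compute (assumed model:
    one butterfly per thread per stage)."""
    launched = blocks * threads_per_block
    useful = min(launched, radix2_butterflies_per_stage(fft_length, batch_size))
    return launched - useful


def fitted_launch(fft_length: int, batch_size: int,
                  threads_per_block: int = 128, max_blocks: int = 64):
    """Hypothetical workload-aware launch shape: never launch more threads
    than there are butterflies, and never exceed max_blocks."""
    work = radix2_butterflies_per_stage(fft_length, batch_size)
    tpb = min(threads_per_block, work)
    blocks = min(max_blocks, math.ceil(work / tpb))
    return blocks, tpb


# A fixed launch of 16 blocks x 256 threads for 4 FFTs of length 256:
fixed_idle = idle_threads(256, 4, blocks=16, threads_per_block=256)   # 3584
# The fitted launch for the same workload:
blocks, tpb = fitted_launch(256, 4)                                   # (4, 128)
fitted_idle = idle_threads(256, 4, blocks, tpb)                       # 0
```

Real FFT kernels typically use higher radices and process several elements per thread, so the absolute numbers here are only indicative. The point of the sketch is that a launch shape fixed independently of FFT length and batch size can leave most threads without work, whereas a shape derived from the workload does not.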

DOI

https://doi.org/10.1145/3735452.3735529

Dongwon Yang

Korea University

Jaebeom Jeon

Korea University

Minseong Gil

Korea University

Junsu Kim

Korea University

Seondeok Kim

Korea University

Gunjae Koo

Korea University

Myung Kuk Yoon

Ewha Womans University

Yunho Oh

Korea University

Session Program

Tue 17 Jun
Displayed time zone: Seoul

10:30 - 11:50	Embedded Systems and Real-Time Optimization (LCTES) at Violet. Chair(s): Yulei Sui (University of New South Wales)

10:30 20m Talk	rtesbench: A Multi-core Benchmark Framework for Real-Time Embedded Systems (LCTES). Yixiao Xing (Nagoya University), Yixiao Li (Nagoya University), Hiroaki Takada (Nagoya University)
10:50 20m Talk	ASC-Hook: Efficient System Call Interception for ARM (LCTES). Yang Shen (National University of Defense Technology), Min Xie (National University of Defense Technology), Tao Wu (Changsha University of Science and Technology), Wenzhe Zhang (National University of Defense Technology, China), Ruibo Wang (National University of Defense Technology), Gen Zhang (National University of Defense Technology)
11:10 20m Talk	SSFFT: Energy-Efficient Selective Scaling for Fast Fourier Transform in Embedded GPUs (LCTES). Dongwon Yang (Korea University), Jaebeom Jeon (Korea University), Minseong Gil (Korea University), Junsu Kim (Korea University), Seondeok Kim (Korea University), Gunjae Koo (Korea University), Myung Kuk Yoon (Ewha Womans University), Yunho Oh (Korea University)
11:30 20m Talk	vNV-Heap: An Ownership-Based Virtually Non-volatile Heap for Embedded Systems (LCTES). Markus Elias Gerber (Friedrich-Alexander-Universität Erlangen-Nürnberg), Luis Gerhorst (Friedrich-Alexander-Universität Erlangen-Nürnberg), Ishwar Mudraje (Saarland University), Kai Vogelgesang (Saarland University), Thorsten Herfet (Saarland University), Peter Wägemann (Friedrich-Alexander University Erlangen-Nürnberg (FAU))