JCSE, vol. 11, no. 2, pp.69-77, 2017
DOI: http://dx.doi.org/10.5626/JCSE.2017.11.2.69
Enhancing GPU Performance by Efficient Hardware-Based and Hybrid L1 Data Cache Bypassing
Yijie Huangfu and Wei Zhang
Department of Electrical and Computer Engineering, Virginia Commonwealth University, Richmond, VA, USA
Abstract: Recent GPUs have adopted cache memory to benefit general-purpose GPU (GPGPU) programs. However, unlike CPU
programs, GPGPU programs typically have considerably less temporal/spatial locality. Moreover, the L1 data cache is
shared by many threads whose combined data footprint is typically much larger than the L1 cache, making it critical to bypass
the L1 data cache intelligently to enhance GPU cache performance. In this paper, we examine GPU cache access behavior
and propose a simple hardware-based GPU cache bypassing method that can be applied to GPU applications without
recompiling programs. Moreover, we introduce a hybrid method that integrates static profiling information and hardware-
based bypassing to further enhance performance. Our experimental results reveal that hardware-based cache
bypassing can boost performance for most benchmarks, and the hybrid method can achieve performance comparable to
state-of-the-art compiler-based bypassing with considerably less profiling cost.
Keywords:
GPU; CUDA; Cache bypassing; Memory traffic; Profiling