JCSE, vol. 12, no. 2, pp.50-62, 2018
DOI: http://dx.doi.org/10.5626/JCSE.2018.12.2.50
Improving CPU and GPU Performance through Sample-Based Dynamic LLC Bypassing
Xin Wang and Wei Zhang
Department of Electrical and Computer Engineering, Virginia Commonwealth University, Richmond, VA, USA
Abstract: The current trend toward integrated central processing units (CPUs) and graphics processing units (GPUs) on the same
chip presents new challenges for the efficient and fair sharing of resources. Unlike traditional multicores, CPU and GPU
cores in the integrated architecture can generate diverse cache traffic and exhibit quite different temporal or spatial data
locality. Sharing the last-level cache (LLC) between the two can cause substantial interference between CPU
and GPU LLC accesses, degrading the performance of both the CPUs and GPUs. Cache bypassing is a promising
method to improve LLC performance and to alleviate resource contention between CPUs and GPUs. However, inefficient
cache bypassing may lead to significant Network on Chip (NoC) traffic congestion and subsequent performance
degradation, particularly for a CPU on a heterogeneous CPU-GPU system with an on-chip ring network. To manage the
LLC more efficiently, we propose a sample-based dynamic cache bypassing method for the shared LLC in heterogeneous
CPU-GPU multicore systems. This method samples the LLC miss rates and NoC traffic of both the CPU and GPU at
run time and uses a statistical decision-making model to decide whether to bypass the LLC. Our
experiments show that bypassing the LLC for CPU accesses can be more useful than bypassing it for GPU accesses on an
integrated CPU-GPU architecture with a ring-based NoC topology. Our results indicate that, on average, bypassing for
both the CPU and GPU can improve CPU performance by 34.30% and GPU performance by 3.20%, while bypassing for the
CPU alone improves CPU performance by 38.09% and GPU performance by 1.11%, and bypassing for the GPU alone
improves CPU performance by 4.12% and GPU performance by 2.60%.
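The sampling-and-decision idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' statistical model, which the abstract does not specify. It assumes a simple threshold rule: bypass the LLC for a requester (CPU or GPU) when its sampled miss rate is high and the sampled NoC utilization is low enough that the extra traffic is unlikely to congest the ring. The names Sample and should_bypass, the fields, and both threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Counters collected for one requester (CPU or GPU) over one sampling window.

    All fields are hypothetical; the abstract only says that LLC miss rates
    and NoC traffic are sampled at run time.
    """
    llc_accesses: int
    llc_misses: int
    noc_flits: int      # traffic observed on the NoC during the window
    noc_capacity: int   # maximum traffic the NoC could carry in the window

    @property
    def miss_rate(self) -> float:
        # Treat an idle window as a 0.0 miss rate to avoid dividing by zero.
        return self.llc_misses / self.llc_accesses if self.llc_accesses else 0.0

    @property
    def noc_utilization(self) -> float:
        return self.noc_flits / self.noc_capacity if self.noc_capacity else 0.0


def should_bypass(sample: Sample,
                  miss_threshold: float = 0.8,
                  noc_threshold: float = 0.6) -> bool:
    """Assumed rule: bypass only if the LLC is rarely hit AND the NoC has headroom.

    Both thresholds are illustrative placeholders, not values from the paper.
    """
    return (sample.miss_rate >= miss_threshold
            and sample.noc_utilization < noc_threshold)
```

For example, should_bypass(Sample(1000, 900, 200, 1000)) evaluates the rule for a window with a 90% miss rate and 20% NoC utilization. Evaluating CPU and GPU samples separately yields the three configurations compared in the abstract: bypassing for both, for the CPU alone, or for the GPU alone.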
Keywords:
Cache bypassing; Last-level cache; Heterogeneous CPU-GPU architecture