JCSE

JCSE, vol. 12, no. 2, pp. 50-62, 2018

DOI: http://dx.doi.org/10.5626/JCSE.2018.12.2.50

Improving CPU and GPU Performance through Sample-Based Dynamic LLC Bypassing

Xin Wang and Wei Zhang
Department of Electrical and Computer Engineering, Virginia Commonwealth University, Richmond, VA, USA

Abstract: The trend toward integrating central processing units (CPUs) and graphics processing units (GPUs) on the same chip presents new challenges for the efficient and fair sharing of resources. Unlike the cores of traditional multicore processors, the CPU and GPU cores in an integrated architecture can generate diverse cache traffic and exhibit quite different temporal or spatial data locality. Sharing the last-level cache (LLC) between the two can cause substantial interference between CPU and GPU LLC accesses, degrading the performance of both the CPUs and the GPUs. Cache bypassing is a promising method for improving LLC performance and alleviating resource contention between CPUs and GPUs. However, inefficient cache bypassing may lead to significant network-on-chip (NoC) traffic congestion and subsequent performance degradation, particularly for the CPU in a heterogeneous CPU-GPU system with an on-chip ring network. To manage the LLC more efficiently, we propose a sample-based dynamic cache bypassing method for the shared LLC in heterogeneous CPU-GPU multicore systems. The method samples the LLC miss rates and NoC traffic of both the CPU and the GPU at run time and uses a statistical decision-making model to decide whether to bypass. Our experiments show that applying bypassing to CPU accesses can be more useful than applying it to GPU accesses in an integrated CPU-GPU architecture with a ring-based NoC topology. On average, bypassing for both the CPU and the GPU improves CPU performance by 34.30% and GPU performance by 3.20%; bypassing for the CPU alone improves CPU performance by 38.09% and GPU performance by 1.11%; and bypassing for the GPU alone improves CPU performance by 4.12% and GPU performance by 2.60%.
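To make the idea of a sample-based bypassing decision concrete, the following is a minimal sketch in Python. It is not the statistical model used in the paper, which the abstract does not specify. It assumes a simple threshold rule: for one requester (CPU or GPU), counters sampled with bypassing off are compared against counters sampled with bypassing on. The names `Sample` and `should_bypass` and the parameters `miss_margin` and `noc_budget` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Counters collected over one sampling interval (hypothetical fields)."""
    llc_accesses: int
    llc_misses: int
    noc_packets: int

    @property
    def miss_rate(self) -> float:
        # Guard against an interval with no LLC accesses.
        return self.llc_misses / self.llc_accesses if self.llc_accesses else 0.0


def should_bypass(normal: Sample, bypassed: Sample,
                  miss_margin: float = 0.02,
                  noc_budget: float = 1.10) -> bool:
    """Decide whether one requester (CPU or GPU) should bypass the LLC.

    Assumed rule, not the paper's model: bypass only if the sampled miss
    rate drops by at least `miss_margin` and the sampled NoC traffic stays
    within `noc_budget` times the non-bypassing traffic.
    """
    miss_gain = normal.miss_rate - bypassed.miss_rate
    noc_ok = bypassed.noc_packets <= noc_budget * normal.noc_packets
    return miss_gain >= miss_margin and noc_ok


# Illustrative counters only; these are not measurements from the paper.
cpu_normal = Sample(llc_accesses=1000, llc_misses=400, noc_packets=5000)
cpu_bypassed = Sample(llc_accesses=1000, llc_misses=300, noc_packets=5200)
cpu_congested = Sample(llc_accesses=1000, llc_misses=300, noc_packets=6000)

bypass_ok = should_bypass(cpu_normal, cpu_bypassed)
bypass_congested = should_bypass(cpu_normal, cpu_congested)
```

In this sketch the NoC check acts as a veto: a lower miss rate is not enough if bypassing pushes too much extra traffic onto the ring, which mirrors the abstract's concern about NoC congestion. Running the decision separately for CPU and GPU samples would give the three configurations the abstract compares (CPU only, GPU only, or both).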

Keywords: Cache bypassing; Last-level cache; Heterogeneous CPU-GPU architecture
