JCSE, vol. 12, no. 2, pp.37-49, 2018
DOI: http://dx.doi.org/10.5626/JCSE.2018.12.2.37
Packing Narrow-Width Operands to Improve GPU Performance
Xin Wang and Wei Zhang
Department of Electrical and Computer Engineering, Virginia Commonwealth University, Richmond, VA, USA
Abstract: Graphics processing units (GPUs), originally designed for graphics applications, have become a popular platform to
accelerate general purpose computations. By exploiting massive thread-level parallelism (TLP), GPUs can achieve high
throughput as well as memory latency hiding. A GPU typically employs a very large register file (RF) in order to support
fast and low-cost context switching between tens of thousands of active threads. As a result, exploiting the RF efficiently
is critical for the GPU to achieve high performance. We observe that for many GPGPU applications, a large percentage
of computed results actually have fewer significant bits than the full width of a 32-bit register, and thus propose
a GPU register packing scheme to dynamically exploit narrow-width operands and pack multiple operands into a single
full-width register. By using dynamic register packing, more RF space becomes available, which allows the GPU to
enable more TLP by assigning additional thread blocks to streaming multiprocessors (SMs), and thus improve performance.
Our experimental results indicate that dynamic register packing can improve GPU performance by up to
1.96X, and by 1.18X on average.
Keywords:
GPU register file; Narrow-width operand; Dynamic register packing
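
The packing idea described in the abstract can be illustrated with a minimal software sketch. It is not the hardware scheme proposed in the paper: the 16-bit narrow-width threshold, the two-operands-per-register layout, and the helper names (`is_narrow`, `pack_pair`, `unpack_pair`) are all assumptions made for illustration only.

```python
# Illustrative sketch only: the 16-bit threshold and the two-slot layout
# are assumptions, not details taken from the paper.

MASK16 = 0xFFFF


def is_narrow(value: int, bits: int = 16) -> bool:
    """Return True if a signed integer fits in `bits` bits (two's complement)."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def pack_pair(lo: int, hi: int) -> int:
    """Pack two narrow signed operands into one 32-bit register image."""
    if not (is_narrow(lo) and is_narrow(hi)):
        raise ValueError("both operands must fit in 16 bits")
    return ((hi & MASK16) << 16) | (lo & MASK16)


def sign_extend16(x: int) -> int:
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= MASK16
    return x - 0x10000 if x & 0x8000 else x


def unpack_pair(reg: int) -> tuple[int, int]:
    """Recover the (lo, hi) operands from a packed 32-bit register image."""
    return sign_extend16(reg), sign_extend16(reg >> 16)


if __name__ == "__main__":
    packed = pack_pair(-3, 1000)
    print(hex(packed), unpack_pair(packed))
```

In this sketch, two results that each fit in 16 bits share one 32-bit slot, so the second physical register is freed. A value that fails `is_narrow` would simply keep a full-width register.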