The 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

MICRO-46 Session 7 - Heterogeneous Computing

Heterogeneous System Coherence for Integrated CPU-GPU Systems

Jason Power (AMD)
Arkaprava Basu (University of Wisconsin-Madison, AMD)
Junli Gu (AMD)
Sooraj Puthoor (AMD)
Bradford M. Beckmann (AMD)
Mark D. Hill (University of Wisconsin-Madison, AMD)
Steven K. Reinhardt (AMD)
David A. Wood (University of Wisconsin-Madison, AMD)

Lightning session talk: PDF, Presentation: PDF, Poster: PDF, Full Paper: DOI 10.1145/2540708.2540747

Abstract:
Many future heterogeneous systems will integrate CPUs and GPUs physically on a single chip and logically connect them via shared memory to avoid explicit data copying. Making this shared memory coherent facilitates programming and fine-grained sharing, but throughput-oriented GPUs can overwhelm CPUs with coherence requests that caches do not filter well. Meanwhile, region coherence has been proposed for CPU-only systems to reduce snoop bandwidth by obtaining coherence permissions for large regions.

This paper develops Heterogeneous System Coherence (HSC) for CPU-GPU systems to mitigate the coherence bandwidth effects of GPU memory requests. HSC replaces a standard directory with a region directory and adds a region buffer to the L2 cache. These structures allow the system to move bandwidth from the coherence network to the high-bandwidth direct-access bus without sacrificing coherence.
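
The effect of tracking permissions per region rather than per block can be illustrated with a toy model. The sketch below is not the paper's protocol: the class names, the 64-byte block and 1 KB region sizes, and the single-requester setup are all assumptions made for illustration, and it omits probes, sharing between CPU and GPU, and region evictions. It only counts how many L2 misses need to contact the region directory and how many could instead go straight to memory once the region buffer already holds permission.

```python
from dataclasses import dataclass, field

BLOCK_BYTES = 64      # assumed cache-block size (not taken from the abstract)
REGION_BYTES = 1024   # assumed region size, i.e. 16 blocks per region


def region_of(addr: int) -> int:
    """Map a byte address to the index of the region that contains it."""
    return addr // REGION_BYTES


@dataclass
class RegionDirectory:
    """Toy region directory: records which agent holds each region."""
    owner: dict = field(default_factory=dict)
    requests: int = 0  # requests that crossed the coherence network

    def acquire(self, region: int, agent: str) -> None:
        self.requests += 1
        # A real protocol would probe the previous holder here; omitted.
        self.owner[region] = agent


@dataclass
class RegionBuffer:
    """Toy region buffer beside an L2: regions this agent already has permission for."""
    agent: str
    directory: RegionDirectory
    held: set = field(default_factory=set)
    direct_accesses: int = 0  # misses that bypass the directory

    def l2_miss(self, addr: int) -> None:
        region = region_of(addr)
        if region in self.held:
            # Permission already held: use the direct-access bus.
            self.direct_accesses += 1
        else:
            # First touch of the region: ask the region directory.
            # How the data for this first miss is returned is not modelled.
            self.directory.acquire(region, self.agent)
            self.held.add(region)


directory = RegionDirectory()
gpu = RegionBuffer(agent="gpu", directory=directory)
for addr in range(0, 4 * REGION_BYTES, BLOCK_BYTES):  # 64 block misses over 4 regions
    gpu.l2_miss(addr)

print(directory.requests, gpu.direct_accesses)  # prints "4 60"
```

In this toy streaming pattern only 4 of the 64 misses reach the directory, whereas a block-granularity directory would see all 64. The numbers depend entirely on the assumed sizes and access pattern and say nothing about the paper's measured results.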

Evaluation results with a subset of Rodinia benchmarks and the AMD APP SDK show that HSC can improve performance compared to a conventional directory protocol by an average of more than 2× and a maximum of more than 4.5×. Additionally, HSC reduces the bandwidth to the directory by an average of 94% and by more than 99% for four of the analyzed benchmarks.