The 48th Annual IEEE/ACM International Symposium on Microarchitecture, 2015
AMD Research has developed an APU (Accelerated Processing Unit) model that extends gem5 with a GPU timing model that executes the Heterogeneous System Architecture Intermediate Language (HSAIL). The resulting AMD gem5 APU simulator is a flexible, cycle-level research model capable of representing many different APU configurations, on-chip cache hierarchies, and system designs. Our APU extensions allow researchers to model both CPU and GPU memory requests and the interactions between them. In particular, the model uses SLICC and Ruby to implement a wide variety of coherence and synchronization solutions, a critical research area in heterogeneous computing. The model has been used in several top-tier computer architecture publications in the last two years [MICRO 2013, HPCA 2014, ASPLOS 2014, ISCA 2014, HPCA 2015, ASPLOS 2015].
In this tutorial, we will describe the capabilities of the AMD gem5 APU simulator, which will be publicly released under a liberal BSD license before MICRO 2015. We will detail the simulated APU architecture, review the execution flow, and describe how the simulator has been used. The presentation will also discuss key design decisions and tradeoffs. For example, we use system-call emulation mode to avoid the need for OS and driver support. Also, our GPU model directly executes HSAIL instructions rather than the proprietary hardware instructions. Relying on HSA’s publicly available intermediate language simplifies the simulator’s distribution and eliminates the dependency on the final machine-dependent compilation step, but it is less accurate when modeling low-level hardware details such as register behavior.