The 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

MICRO-46 Session 4B - Cache Management

Imbalanced Cache Partitioning for Balanced Data-Parallel Programs

Abhisek Pan (Purdue University)
Vijay S. Pai (Purdue University)

Lightning session talk: PDF, Presentation: PDF, Poster: PDF, Full Paper: DOI 10.1145/2540708.2540734

Abstract:
This paper investigates partitioning the ways of a shared last-level cache among the threads of a symmetric data-parallel application running on a chip-multiprocessor. Unlike prior work on way-partitioning for unrelated threads in a multiprogramming workload, the domain of multithreaded programs requires both throughput and fairness. Additionally, our workloads show no obvious thread differences to exploit: program threads see nearly identical IPC and data reuse as they progress (as expected for a well-written load-balanced data-parallel program).

Despite the balance and symmetry among threads, this paper shows that a balanced partitioning of cache ways between threads is suboptimal. Instead, this paper proposes a strategy of temporarily imbalancing the partitions between different threads to improve cache utilization by adapting to the locality behavior of the threads as captured by dynamic set-specific reuse-distance (SSRD). Cumulative SSRD histograms have knees that correspond to different important working sets; thus, cache ways can be taken away from a thread with only minimal performance impact if that thread is currently operating far from a knee. Those ways can then be given to a single "preferred" thread to push it over the next knee. The preferred thread is chosen in a round-robin fashion to ensure balanced progress over the execution. The algorithm also effectively handles scenarios where an unpartitioned cache might outperform any sort of explicit partitioning. This dynamic partition imbalance algorithm allows up to 44% reduction in execution time and 91% reduction in misses over an unpartitioned shared cache for 9 benchmarks from the PARSEC-2.0 and SPEC OMP suites.