The MICRO Test of Time (ToT) award recognizes the most influential papers published in prior sessions of the International Symposium on Microarchitecture, each of which has had a significant impact in the field. For the year 2020, the selection committee (comprising Onur Mutlu, Pradip Bose, Daniel Jimenez, Tor Aamodt, Sreenivas Subramoney, and Reetu Das) chose three award papers democratically, based on rigorous two-round rankings and a final discussion meeting:
Congratulations to the winners! For each of the papers, the committee has put together a brief highlight of its impact and significance:
This paper presents one of the pioneering paradigms proposed in the 1990s that pushed the envelope in terms of extracting ILP. Like the foundational Multiscalar paper (ISCA, 1995), this DMT (dynamic multithreading) paradigm increases instruction supply by fetching from multiple control boundary points within a single program. In key contrast to Multiscalar, the fetch threads are created automatically by hardware (at procedure and loop boundaries), without relying on compiler support. The spawned threads are executed speculatively on a simultaneous multithreading pipeline. A two-level hierarchical instruction window mechanism is supported by out-of-order instruction fetch and dispatch. Coupled with novel data value prediction, DMT enables distant ILP exploitation by effectively increasing the dispatchable instruction pool, without incurring the hardware complexity of large single-window instruction wake-up and select logic. In summary, this foundational paper opened up the path beyond the known ILP limits of a single program’s superscalar execution model, without any compiler dependence.
This paper introduced the idea of Fetch Directed Instruction Prefetching, an instruction prefetching technique in which the branch predictor and branch target buffer are allowed to run ahead of the main thread of execution, providing addresses from which to prefetch instructions along the predicted path. Building on the idea of the Fetch Target Queue introduced previously by the same authors, this paper demonstrates a simple but very effective instruction prefetching technique that has had a significant and long-lasting impact on microprocessor design as well as on academic research into instruction prefetching. Given today’s ever-expanding instruction working set sizes, we expect this foundational work to become even more important as research into instruction prefetching continues.
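The run-ahead idea described above can be illustrated with a toy model. The sketch below is not the paper's design: the class name `ToyFrontEnd`, the queue depth, the 64-byte line size, and the assumption that prefetches complete instantly are all hypothetical simplifications chosen for illustration. It only shows how a predictor that fills a fetch target queue ahead of the fetch unit lets a prefetcher touch instruction cache lines before they are demanded.

```python
from collections import deque

LINE_BYTES = 64  # assumed instruction cache line size


class ToyFrontEnd:
    """Toy front end: a predictor fills a fetch target queue (FTQ),
    and a prefetcher scans the queue ahead of the fetch unit."""

    def __init__(self, predicted_path, ftq_depth=4):
        # Fetch-block addresses as the branch predictor/BTB would emit them.
        self.predicted_path = iter(predicted_path)
        self.ftq = deque()
        self.ftq_depth = ftq_depth
        self.icache = set()     # line numbers currently cached
        self.prefetched = []    # lines requested by the prefetcher, in order
        self.demand_misses = 0

    def predict(self):
        # The predictor runs ahead of fetch until the FTQ is full.
        while len(self.ftq) < self.ftq_depth:
            addr = next(self.predicted_path, None)
            if addr is None:
                break
            self.ftq.append(addr)

    def prefetch(self):
        # Skip the head entry (it is fetched this cycle anyway) and
        # prefetch every later entry whose line is not yet cached.
        for addr in list(self.ftq)[1:]:
            line = addr // LINE_BYTES
            if line not in self.icache:
                self.icache.add(line)  # toy assumption: completes instantly
                self.prefetched.append(line)

    def fetch(self):
        if not self.ftq:
            return None
        addr = self.ftq.popleft()
        line = addr // LINE_BYTES
        if line not in self.icache:
            self.demand_misses += 1
            self.icache.add(line)
        return addr

    def step(self):
        self.predict()
        self.prefetch()
        return self.fetch()


# A predicted path that jumps between three distant code regions.
fe = ToyFrontEnd([0x1000, 0x1040, 0x4000, 0x4040, 0x8000])
while fe.step() is not None:
    pass
```

In this toy run only the very first fetch block misses on demand; every later line is brought in by the prefetcher while it is still waiting in the queue. A real design would also have to model prefetch latency, bandwidth, and branch mispredictions, which this sketch ignores.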
This paper demonstrates that set conflicts at the cache level lead to bank conflicts at the main memory level, increasing row-buffer conflicts and memory latency. It proposes the permutation-based memory interleaving technique to solve the problem at a very low hardware cost. The proposed method has had a significant impact on modern systems. For example, Sun Microsystems adopted the method, and many mainstream commercial processors use it or a variant of it.
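A minimal sketch of the idea, under stated assumptions: the bank index is XORed with low-order bits of the cache tag, so addresses that collide in one cache set are spread across banks. The address layout here (an 11-bit column offset, 2 bank bits, and a cache tag starting at bit 19) and the function names are hypothetical values picked for illustration, not parameters taken from the paper.

```python
COLUMN_BITS = 11  # assumed: 2 KiB row (page) per bank
BANK_BITS = 2     # assumed: 4 memory banks
TAG_SHIFT = 19    # assumed: cache tag starts at bit 19
                  # (e.g. a 512 KiB direct-mapped cache)

BANK_MASK = (1 << BANK_BITS) - 1


def conventional_bank(addr):
    """Bank index taken directly from the address bits above the column."""
    return (addr >> COLUMN_BITS) & BANK_MASK


def permuted_bank(addr):
    """Bank index XORed with the low-order bits of the cache tag."""
    tag_low = (addr >> TAG_SHIFT) & BANK_MASK
    return conventional_bank(addr) ^ tag_low


def row(addr):
    """Row (page) index: the address bits above the bank field."""
    return addr >> (COLUMN_BITS + BANK_BITS)


# Four addresses that map to the same cache set: identical low 19 bits,
# different tags. They fall in different rows of memory.
base = 0x12340
conflicting = [base + (t << TAG_SHIFT) for t in range(4)]

conventional = [conventional_bank(a) for a in conflicting]
permuted = [permuted_bank(a) for a in conflicting]
rows = [row(a) for a in conflicting]
```

With the conventional mapping, all four addresses land in the same bank but in different rows, so alternating between them keeps replacing the open row. With the XOR mapping they land in four different banks, and each bank can keep its own row open. Because XOR with a fixed tag value is a permutation of the bank indices, addresses within one page still share a bank and row, so spatial locality is preserved.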