Annual IEEE/ACM International Symposium on Microarchitecture®

MICRO Test of Time Award

List of Eligible Papers for the 2019 Award

View the 2019 call for nominations.

MICRO 1997

Paper Title | Authors
The Bi-Mode Branch Predictor | Chih-Chieh Lee, I-Cheng K. Chen, Trevor N. Mudge
Path-Based Next Trace Prediction | Quinn Jacobson, Eric Rotenberg, James E. Smith
Alternative Fetch and Issue Policies for the Trace Cache Fetch Mechanism | Daniel Holmes Friendly, Sanjay Jeram Patel, Yale N. Patt
Reducing the Performance Impact of Instruction Cache Misses by Writing Instructions Into the Reservation Stations Out-of-Order | Jared Stark, Paul Racunas, Yale N. Patt
On High-Bandwidth Data Cache Design for Multi-Issue Processors | Jude A. Rivers, Gary S. Tyson, Edward S. Davidson, Todd M. Austin
Run-Time Spatial Locality Detection and Optimization | Teresa L. Johnson, Matthew C. Merten, Wen-Mei W. Hwu
A Comparison of Data Prefetching On an Access Decoupled and Superscalar Machine | G. P. Jones, N. P. Topham
The Design and Performance of a Conflict-Avoiding Cache | Nigel Topham, Antonio González, José González
Prediction Caches for Superscalar Processors | James E. Bennett, Michael J. Flynn
A Framework for Balancing Control Flow and Predication | David I. August, Wen-mei W. Hwu, Scott A. Mahlke
Evaluation of Scheduling Techniques On a SPARC-Based VLIW Testbed | Seongbae Park, SangMin Shim, Soo-Mook Moon
Tuning Compiler Optimizations for Simultaneous Multithreading | Jack L. Lo, Susan J. Eggers, Henry M. Levy, Sujay S. Parekh, Dean M. Tullsen
Exploiting Dead Value Information | Milo M. Martin, Amir Roth, Charles N. Fischer
Trace Processors | Eric Rotenberg, Quinn Jacobson, Yiannakis Sazeides, Jim Smith
The Multicluster Architecture: Reducing Cycle Time Through Partitioning | Keith I. Farkas, Paul Chow, Norman P. Jouppi, Zvonko Vranesic
Out-of-Order Vector Architectures | Roger Espasa, Mateo Valero, James E. Smith
Initial Results On the Performance and Cost of Vector Microprocessors | Corinna G. Lee, Derek J. DeVries
The Filter Cache: An Energy Efficient Memory Structure | Johnson Kin, Munish Gupta, William H. Mangione-Smith
Improving Code Density Using Compression Techniques | Charles Lefurgy, Peter Bird, I-Cheng Chen, Trevor Mudge
Procedure Based Program Compression | Darko Kirovski, Johnson Kin, William H. Mangione-Smith
Improving the Accuracy and Performance of Memory Communication Through Renaming | Gary S. Tyson, Todd M. Austin
Microarchitecture Support for Improving the Performance of Load Target Prediction | Chung-Ho Chen, Akida Wu
Streamlining Inter-Operation Memory Communication Via Data Dependence Prediction | Andreas Moshovos, Gurindar S. Sohi
The Predictability of Data Values | Yiannakis Sazeides, James E. Smith
Value Profiling | Brad Calder, Peter Feller, Alan Eustace
Can Program Profiling Support Value Prediction? | Freddy Gabbay, Avi Mendelson
Highly Accurate Data Value Prediction Using Hybrid Predictors | Kai Wang, Manoj Franklin
ProfileMe: Hardware Support for Instruction-Level Profiling On Out-of-Order Processors | Jeffrey Dean, James E. Hicks, Carl A. Waldspurger, William E. Weihl, George Chrysos
Procedure Placement Using Temporal Ordering Information | Nikolas Gloy, Trevor Blackwell, Michael D. Smith, Brad Calder
Predicting Data Cache Misses in Non-Numeric Applications Through Correlation Profiling | Todd C. Mowry, Chi-Keung Luk
MediaBench: A Tool for Evaluating and Synthesizing Multimedia and Communications Systems | Chunho Lee, Miodrag Potkonjak, William H. Mangione-Smith
Cache Sensitive Modulo Scheduling | F. Jesús Sánchez, Antonio González
Unroll-and-Jam Using Uniformly Generated Sets | Steve Carr, Yiping Guan
Resource-Sensitive Profile-Directed Data Flow Analysis for Code Optimization | Rajiv Gupta, David A. Berson, Jesse Z. Fang

MICRO 1998

Paper Title | Authors
A Bandwidth-Efficient Architecture for Media Processing | Scott Rixner, William J. Dally, Ujval J. Kapasi, Brucek Khailany, Abelardo López-Lagunas, Peter R. Mattson, John D. Owens
Exploiting Instruction Level Parallelism in Geometry Processing for Three Dimensional Graphics Applications | Chia-Lin Yang, Barton Sano, Alvin R. Lebeck
Simple Vector Microprocessors for Multimedia Applications | Corinna G. Lee, Mark G. Stoodley
Evaluating MMX Technology Using DSP and Multimedia Applications | Ravi Bhargava, Lizy K. John, Brian L. Evans, Ramesh Radhakrishnan
Analyzing the Working Set Characteristics of Branch Execution | Sangwook P. Kim, Gary S. Tyson
Dataflow Analysis of Branch Mispredictions and Its Application to Early Resolution of Branch Outcomes | Alexandre Farcy, Olivier Temam, Roger Espasa, Toni Juan
The YAGS Branch Prediction Scheme | Avinoam N. Eden, Trevor Mudge
Task Selection for a Multiscalar Processor | T. N. Vijaykumar, Gurindar S. Sohi
Split-Path Enhanced Pipeline Scheduling for Loops with Control Flows | SangMin Shim, Soo-Mook Moon
Effective Cluster Assignment for Modulo Scheduling | Erik Nystrom, Alexandre E. Eichenberger
Better Global Scheduling Using Path Profiles | Cliff Young, Michael D. Smith
Predictive Techniques for Aggressive Load Speculation | Glenn Reinman, Brad Calder
Compiler-Directed Early Load-Address Generation | Ben-Chung Cheng, Daniel A. Connors, Wen-mei W. Hwu
Load Latency Tolerance in Dynamically Scheduled Processors | Srikanth T. Srinivasan, Alvin R. Lebeck
Improving I/O Performance with a Conditional Store Buffer | Lambert Schaelicke, Al Davis
Putting the Fill Unit to Work: Dynamic Optimizations for Trace Cache Microprocessors | Daniel Holmes Friendly, Sanjay Jeram Patel, Yale N. Patt
Cooperative Prefetching: Compiler and Hardware Support for Effective Instruction Prefetching in Modern Processors | Chi-Keung Luk, Todd C. Mowry
Code Compression Based on Operand Factorization | Guido Araujo, Paulo Centoducatte, Mario Cartes, Ricardo Pannain
Understanding the Differences Between Value Prediction and Instruction Reuse | Avinash Sodani, Gurindar S. Sohi
A Novel Renaming Scheme to Exploit Value Temporal Locality Through Physical Register Reuse and Unification | Stephen Jourdan, Ronny Ronen, Michael Bekerman, Bishara Shomar, Adi Yoaz
A Dynamic Multithreading Processor | Haitham Akkary, Michael A. Driscoll
Widening Resources: A Cost-Effective Technique for Aggressive ILP Architectures | David López, Josep Llosa, Mateo Valero, Eduard Ayguadé
The Cascaded Predictor: Economical and Adaptive Branch Target Prediction | Karel Driesen, Urs Hölzle
Improving Prediction for Procedure Returns with Return-Address-Stack Repair Mechanisms | Kevin Skadron, Pritpal S. Ahuja, Margaret Martonosi, Douglas W. Clark
Predicting Indirect Branches via Data Compression | John Kalamatianos, David R. Kaeli
Improving Locality Using Loop and Data Transformations in an Integrated Framework | Mahmut Kandemir, Alok Choudhary, J. Ramanujam, Prithviraj Banerjee
Precise Register Allocation for Irregular Architectures | Timothy Kong, Kent D. Wilken
Unified Assign and Schedule: A New Approach to Scheduling for Clustered Register File Microarchitectures | Emre Özer, Sanjeev Banerjia, Thomas M. Conte

MICRO 1999

Paper Title | Authors
Control Independence in Trace Processors | Eric Rotenberg, James E. Smith
Fetch Directed Instruction Prefetching | Glenn Reinman, Brad Calder, Todd M. Austin
Improving Branch Predictors by Correlating on Data Values | Timothy H. Heil, Zak Smith, James E. Smith
Instruction Fetch Mechanisms for Multipath Execution Processors | Artur Klauser, Dirk Grunwald
A Superscalar 3D Graphics Engine | Andrew Wolfe, Derek B. Noonburg
Dynamic 3D Graphics Workload Characterization and the Architectural Implications | Tulika Mitra, Tzi-cker Chiueh
Exploiting a New Level of DLP in Multimedia Applications | Jesus Corbal, Roger Espasa, Mateo Valero
Compiler-Driven Cached Code Compression Schemes for Embedded ILP Processors | Sergei Y. Larin, Thomas M. Conte
Evaluation of a High Performance Code Compression Method | Charles Lefurgy, Eva Piccininni, Trevor N. Mudge
Low-Cost Branch Folding for Embedded Applications with Small Tight Loops | Lea Hwang Lee, Jeff Scott, Bill Moyer, John Arends
Automatic and Efficient Evaluation of Memory Hierarchies for Embedded Systems | Santosh G. Abraham, Scott A. Mahlke
Hardware Identification of Cache Conflict Misses | Jamison D. Collins, Dean M. Tullsen
Access Region Locality for High-Bandwidth Processor Memory System Design | Sangyeun Cho, Pen-Chung Yew, Gyungho Lee
Code Transformations to Improve Memory Parallelism | Vijay S. Pai, Sarita V. Adve
Compiler-Directed Dynamic Computation Reuse: Rationale and Initial Results | Daniel A. Connors, Wen-mei W. Hwu
Dynamic Memory Disambiguation in the Presence of Out-of-Order Store Issuing | Soner Onder, Rajiv Gupta
Read-After-Read Memory Dependence Prediction | Andreas Moshovos, Gurindar S. Sohi
Delaying Physical Register Allocation through Virtual-Physical Registers | Teresa Monreal, Antonio González, Mateo Valero, José González, Victor Viñals
Exploiting ILP in Page-based Intelligent Memory | Mark Oskin, Justin Hensley, Diana Keen, Frederic T. Chong, Matthew K. Farrens, Aneet Chopra
The Use of Multithreading for Exception Handling | Craig B. Zilles, Joel S. Emer, Gurindar S. Sohi
Value Prediction for Speculative Multithreaded Architectures | Pedro Marcuello, Jordi Tubella, Antonio González
Predicting the Usefulness of a Block Result: A Micro-Architectural Technique for High-Performance Low-Power Processors | Enric Musoll
Selective Cache Ways: On-Demand Cache Resource Allocation | David H. Albonesi
Wavefront Scheduling: Path Based Data Representation and Scheduling of Subgraphs | Jay Bharadwaj, Kishore N. Menezes, Chris McKinsey
Balance Scheduling: Weighting Branch Tradeoffs in Superblocks | Alexandre E. Eichenberger, Waleed Meleis
Optimizations and Oracle Parallelism with Dynamic Translation | Kemal Ebcioglu, Erik R. Altman, Sumedh W. Sathaye, Michael Gschwind

MICRO 2000

Paper Title | Authors
Eager Writeback - A Technique for Improving Bandwidth Utilization | Hsien-Hsin S. Lee, Gary S. Tyson, Matthew K. Farrens
Silent Stores for Free | Kevin M. Lepak, Mikko H. Lipasti
A Permutation-Based Page Interleaving Scheme to Reduce Row-Buffer Conflicts and Exploit Data Locality | Zhao Zhang, Zhichun Zhu, Xiaodong Zhang
Predictor-Directed Stream Buffers | Timothy Sherwood, Suleyman Sair, Brad Calder
On Pipelining Dynamic Instruction Scheduling Logic | Jared Stark, Mary D. Brown, Yale N. Patt
The Impact of Delay on the Design of Branch Predictors | Daniel A. Jiménez, Stephen W. Keckler, Calvin Lin
Improving BTB Performance in the Presence of DLLs | Stevan A. Vlaovic, Edward S. Davidson, Gary S. Tyson
Efficient Checker Processor Design | Saugata Chatterjee, Christopher T. Weaver, Todd M. Austin
An Integrated Approach to Accelerate Data and Predicate Computations in Hyperblocks | Alexandre E. Eichenberger, Waleed Meleis, Suman Maradani
Accurate and Efficient Predicate Analysis with Binary Decision Diagrams | John W. Sias, Wen-mei W. Hwu, David I. August
Modulo Scheduling for a Fully-Distributed Clustered VLIW Architecture | F. Jesús Sánchez, Antonio González
Two-Level Hierarchical Register File Organization for VLIW Processors | Javier Zalamea, Josep Llosa, Eduard Ayguadé, Mateo Valero
PipeRench Implementation of the Instruction Path Coprocessor | Yuan C. Chou, Pazhani Pillai, Herman Schmit, John Paul Shen
Efficient Conditional Operations for Data-Parallel Architectures | Ujval J. Kapasi, William J. Dally, Scott Rixner, Peter R. Mattson, John D. Owens, Brucek Khailany
Flexible Hardware Acceleration for Multimedia Oriented Microprocessors | Frederik Vermeulen, Lode Nachtergaele, Francky Catthoor, Diederik Verkest, Hugo De Man
Very Low Power Pipelines Using Significance Compression | Ramon Canal, Antonio González, James E. Smith
A Static Power Model for Architects | J. Adam Butts, Gurindar S. Sohi
A Framework for Dynamic Energy Efficiency and Temperature Management | Michael C. Huang, Jose Renau, Seung-Moon Yoo, Josep Torrellas
Dynamic Zero Compression for Cache Energy Reduction | Luis Villa, Michael Zhang, Krste Asanovic
Register Integration: A Simple and Efficient Implementation of Squash Reuse | Amir Roth, Gurindar S. Sohi
The Store-Load Address Table and Speculative Register Promotion | Matt Postiff, David A. Greene, Trevor N. Mudge
Memory Hierarchy Reconfiguration for Energy and Performance in General-Purpose Processor Architectures | Rajeev Balasubramonian, David H. Albonesi, Alper Buyuktosunoglu, Sandhya Dwarkadas
Frequent Value Compression in Data Caches | Jun Yang, Youtao Zhang, Rajiv Gupta
A Study of Slipstream Processors | Zachary Purser, Karthik Sundaramoorthy, Eric Rotenberg
Relational Profiling: Enabling Thread-Level Parallelism in Virtual Machines | Timothy H. Heil, James E. Smith
Calpa: A Tool for Automating Selective Dynamic Compilation | Markus Mock, Craig Chambers, Susan J. Eggers
Increasing the Size of Atomic Instruction Blocks Using Control Flow Assertions | Sanjay J. Patel, Tony Tung, Satarupa Bose, Matthew M. Crum
Reducing Wire Delay Penalty Through Value Prediction | Joan-Manuel Parcerisa, Antonio González
Compiler Controlled Value Prediction Using Branch Predictor Based Confidence | Eric Larson, Todd M. Austin
Instruction Distribution Heuristics for Quad-Cluster, Dynamically-Scheduled, Superscalar Processors | Amirali Baniasadi, Andreas Moshovos
Performance Improvement with Circuit-Level Speculation | Tong Liu, Shih-Lien Lu

MICRO 2001

Paper Title | Authors
Skipper: A Microarchitecture for Exploiting Control-Flow Independence | Chen-Yong Cher, T. N. Vijaykumar
Performance Characterization of a Hardware Mechanism for Dynamic Optimization | Brian Fahs, Satarupa Bose, Matthew Crum, Brian Slechta, Francesco Spadini, Tony Tung, Sanjay J. Patel, Steven S. Lumetta
Using Variable-MHz Microprocessors to Efficiently Handle Uncertainty in Real-Time Systems | Eric Rotenberg
A Design Space Evaluation of Grid Processor Architectures | Ramadass Nagarajan, Karthikeyan Sankaralingam, Doug Burger, Stephen W. Keckler
Reducing Set-Associative Cache Energy via Way-Prediction and Selective Direct-Mapping | Michael D. Powell, Amit Agarwal, T. N. Vijaykumar, Babak Falsafi, Kaushik Roy
A Code Decompression Architecture for VLIW Processors | Yuan Xie, Wayne Wolf, Haris Lekatsas
Direct Load: Dependence-Linked Dataflow Resolution of Load Address and Cache Coordinate | Byung-Kwon Chung, Jinsuo Zhang, Jih-Kwon Peir, Shih-Chang Lai, Konrad Lai
Reducing Power Requirements of Instruction Scheduling Through Dynamic Allocation of Multiple Datapath Resources | Dmitry Ponomarev, Gurhan Kucuk, Kanad Ghose
Exploiting VLIW Schedule Slacks for Dynamic and Leakage Energy Reduction | Wensheng Zhang, Vijaykrishnan Narayanan, Mahmut Kandemir, Mary Jane Irwin, David Duarte, Yuh-Fang Tsai
Reducing Power with Dynamic Critical Path Information | John S. Seng, Eric S. Tune, Dean M. Tullsen
Direct Addressed Caches for Reduced Power Consumption | Emmett Witchel, Sam Larsen, C. Scott Ananian, Krste Asanović
Modulo Schedule Buffers | Matthew C. Merten, Wen-mei W. Hwu
Graph-Partitioning Based Instruction Scheduling for Clustered Processors | Alex Aletà, Josep M. Codina, Jesús Sánchez, Antonio González
Modulo Scheduling with Integrated Register Spilling for Clustered VLIW Architectures | Javier Zalamea, Josep Llosa, Eduard Ayguadé, Mateo Valero
Efficient Static Single Assignment Form for Predication | Arthur Stoutchinin, Francois de Ferriere
The Impact of If-Conversion and Branch Prediction on Program Execution on the Intel® Itanium™ Processor | Youngsoo Choi, Allan Knies, Luke Gerke, Tin-Fook Ngai
Mapping Reference Code to Irregular DSPs Within the Retargetable, Optimizing Compiler COGEN(T) | Gary William Gréwal, Charles Thomas Wilson
Select-Free Instruction Scheduling Logic | Mary D. Brown, Jared Stark, Yale N. Patt
Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery | Joydeep Ray, James C. Hoe, Babak Falsafi
A High-Speed Dynamic Instruction Scheduling Scheme for Superscalar Processors | Masahiro Goshima, Kengo Nishino, Toshiaki Kitamura, Yasuhiko Nakashima, Shinji Tomita, Shin-ichiro Mori
Reducing the Complexity of the Register File in Dynamic Superscalar Processors | Rajeev Balasubramonian, Sandhya Dwarkadas, David H. Albonesi
Saving Energy with Architectural and Frequency Adaptations for Multimedia Applications | Christopher J. Hughes, Jayanth Srinivasan, Sarita V. Adve
Enhancing Loop Buffering of Media and Telecommunications Applications Using Low-Overhead Predication | John W. Sias, Hillery C. Hunter, Wen-mei W. Hwu
Cool-Cache for Hot Multimedia | Osman S. Unsal, Raksit Ashok, Israel Koren, C. Mani Krishna, Csaba Andras Moritz
ZR: A 3D API Transparent Technology for Chunk Rendering | Emile Hsieh, Vladimir Pentkovski, Thomas Piazza
Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution | Ravi Rajwar, James R. Goodman
Dynamic Speculative Precomputation | Jamison D. Collins, Dean M. Tullsen, Hong Wang, John P. Shen
Handling Long-Latency Loads in a Simultaneous Multithreading Processor | Dean M. Tullsen, Jeffery A. Brown
Correctly Implementing Value Prediction in Microprocessors That Support Multithreading or Multiprocessing | Milo M. K. Martin, Daniel J. Sorin, Harold W. Cain, Mark D. Hill, Mikko H. Lipasti