Annual IEEE/ACM International Symposium on Microarchitecture

MICRO Test of Time Award

List of Eligible Papers for the 2017 Award


MICRO 1995

Paper Title | Authors
Performance Issues in Correlated Branch Prediction Schemes | Nicolas Gloy, Michael D. Smith, Cliff Young
Dynamic Path-Based Branch Correlation | Ravi Nair
The Predictability of Branches in Libraries | Brad Calder, Dirk Grunwald, Amitabh Srivastava
The Performance Impact of Incomplete Bypassing in Processor Pipelines | Pritpal S. Ahuja, Douglas W. Clark, Anne Rogers
Efficient Instruction Scheduling Using Finite State Automata | Vasanth Bala, Norman Rubin
Critical Path Reduction for Scalar Programs | Michael Schlansker, Vinod Kathail
A Limit Study of Local Memory Requirements Using Value Reuse Profiles | Andrew S. Huang, John P. Shen
Zero-Cycle Loads: Microarchitecture Support for Reducing Load Latency | Todd M. Austin, Gurindar S. Sohi
A Modified Approach to Data Cache Management | Gary Tyson, Matthew Farrens, John Matthews, Andrew R. Pleszkun
Petri Net Versus Modulo Scheduling for Software Pipelining | Vicki H. Allan, U. R. Shah, K. M. Reddy
Modulo Scheduling with Multiple Initiation Intervals | Nancy J. Warter-Perez, Noubar Partamian
Spill-Free Parallel Scheduling of Basic Blocks | B. Natarajan, M. Schlansker
Improving Instruction-Level Parallelism by Loop Unrolling and Dynamic Memory Disambiguation | Jack W. Davidson, Sanjay Jinturkar
Self-Regulation of Workload in the Manchester Data-Flow Computer | John R. Gurd, David F. Snelling
The M-Machine Multicomputer | Marco Fillo, Stephen W. Keckler, William J. Dally, Nicholas P. Carter, Andrew Chang, Yevgeny Gurevich, Whay S. Lee
Region-Based Compilation: An Introduction and Motivation | Richard E. Hank, Wen-Mei W. Hwu, B. Ramakrishna Rau
An Experimental Study of Several Cooperative Register Allocation and Instruction Scheduling Strategies | Cindy Norris, Lori L. Pollock
Register Allocation for Predicated Code | Alexandre E. Eichenberger, Edward S. Davidson
Partial Resolution in Branch Target Buffers | Barry Fagin, Kathryn Russell
A System Level Perspective On Branch Architecture Performance | Brad Calder, Dirk Grunwald, Joel Emer
Dynamic Rescheduling: A Technique for Object Code Compatibility in VLIW Architectures | Thomas M. Conte, Sumedh W. Sathaye
Improving CISC Instruction Decoding Performance Using a Fill Unit | Mark Smotherman, Manoj Franklin
SPAID: Software Prefetching in Pointer- and Call-Intensive Environments | Mikko H. Lipasti, William J. Schmidt, Steven R. Kunkel, Robert R. Roediger
An Effective Programmable Prefetch Engine for On-Chip Caches | Tien-Fu Chen
Cache Miss Heuristics and Preloading Techniques for General-Purpose Programs | Toshihiro Ozawa, Yasunori Kimura, Shin'ichiro Nishizaki
Alternative Implementations of Hybrid Branch Predictors | Po-Yung Chang, Eric Hao, Yale N. Patt
Control Flow Prediction with Tree-Like Subgraphs for Superscalar Processors | Simonjit Dutta, Manoj Franklin
The Role of Adaptivity in Two-Level Adaptive Branch Prediction | Stuart Sechrest, Chih-Chieh Lee, Trevor Mudge
Design of Storage Hierarchy in Multithreaded Architectures | Lucas Roh, Walid A. Najjar
An Investigation of the Performance of Various Instruction-Issue Buffer Topologies | Stéphan Jourdan, Pascal Sainrat, Daniel Litaize
Decoupling Integer Execution in Superscalar Processors | Subbarao Palacharla, J. E. Smith
Exploiting Short-Lived Variables in Superscalar Processors | Luis A. Lozano, Guang R. Gao
Partitioned Register File for TTAs | Johan Janssen, Henk Corporaal
Disjoint Eager Execution: An Optimal Form of Speculative Execution | Augustus K. Uht, Vijay Sindagi, Kelley Hall
Unrolling-Based Optimizations for Modulo Scheduling | Daniel M. Lavery, Wen-Mei W. Hwu
Stage Scheduling: A Technique to Reduce the Register Requirements of a Modulo Schedule | Alexandre E. Eichenberger, Edward S. Davidson
Hypernode Reduction Modulo Scheduling | Josep Llosa, Mateo Valero, Eduard Ayguadé, Antonio González

MICRO 1996

Paper Title | Authors
A Persistent Rescheduled-Page Cache for Low Overhead Object Code Compatibility in VLIW Architectures | Thomas M. Conte, Sumedh W. Sathaye, Sanjeev Banerjia
Integrating a Misprediction Recovery Cache (MRC) Into a Superscalar Pipeline | James O. Bondi, Ashwini K. Nanda, Simonjit Dutta
Accurate and Practical Profile-Driven Compilation Using the Profile Buffer | Thomas M. Conte, Kishore N. Menezes, Mary Ann Hirsch
Efficient Path Profiling | Thomas Ball, James R. Larus
Profile-Driven Instruction Level Parallel Scheduling with Application to Super Blocks | C. Chekuri, R. Johnson, R. Motwani, B. Natarajan, B. R. Rau, M. Schlansker
Speculative Hedge: Regulating Compile-Time Speculation Against Profile Variations | Brian L. Deitrich, Wen-mei W. Hwu
Hot Cold Optimization of Large Windows/NT Applications | Robert Cohn, P. Geoffrey Lowney
Java Bytecode to Native Code Translation: The Caffeine Prototype and Preliminary Results | Cheng-Hsueh A. Hsieh, John C. Gyllenhaal, Wen-mei W. Hwu
Analysis Techniques for Predicated Code | Richard Johnson, Michael Schlansker
Global Predicate Analysis and Its Application to Register Allocation | David M. Gillies, Dz-ching Roy Ju, Richard Johnson, Michael Schlansker
Modulo Scheduling of Loops in Control-Intensive Non-Numeric Programs | Daniel M. Lavery, Wen-mei W. Hwu
Assigning Confidence to Conditional Branch Predictions | Erik Jacobsen, Eric Rotenberg, J. E. Smith
Compiler Synthesized Dynamic Branch Prediction | Scott Mahlke, Balas Natarajan
Wrong-Path Instruction Prefetching | Jim Pierce, Trevor Mudge
Design Decisions Influencing the UltraSPARC's Instruction Fetch Architecture | Robert Yung
Increasing the Instruction Fetch Rate Via Block-Structured Instruction Set Architectures | Eric Hao, Po-Yung Chang, Marius Evers, Yale N. Patt
Instruction Fetch Mechanisms for VLIW Architectures with Compressed Encodings | Thomas M. Conte, Sanjeev Banerjia, Sergei Y. Larin, Kishore N. Menezes, Sumedh W. Sathaye
Tango: A Hardware-Based Data Prefetching Technique for Superscalar Processors | Shlomit S. Pinter, Adi Yoaz
Exceeding the Dataflow Limit Via Value Prediction | Mikko H. Lipasti, John Paul Shen
The Performance Potential of Data Dependence Speculation & Collapsing | Yiannakis Sazeides, Stamatis Vassiliadis, James E. Smith
Heuristics for Register-Constrained Software Pipelining | Josep Llosa, Mateo Valero, Eduard Ayguadé
Software Pipelining Loops with Conditional Branches | Mark G. Stoodley, Corinna G. Lee
Combining Loop Transformations Considering Caches and Scheduling | Michael E. Wolf, Dror E. Maydan, Ding-Kai Chen
Instruction Scheduling and Executable Editing | Eric Schnarr, James R. Larus
Instruction Scheduling for the HP PA-8000 | David A. Dunn, Wei-Chung Hsu
Meld Scheduling: Relaxing Scheduling Constraints Across Region Boundaries | Santosh G. Abraham, Vinod Kathail, Brian L. Deitrich
Custom-Fit Processors: Letting Applications Define Architectures | Joseph A. Fisher, Paolo Faraboschi, Giuseppe Desoli
Optimization for a Superscalar Out-of-Order Machine | Anne M. Holler
Optimization of Machine Descriptions for Efficient Use | John C. Gyllenhaal, Wen-mei W. Hwu, B. Ramakrishna Rau

MICRO 1997

Paper Title | Authors
The Bi-Mode Branch Predictor | Chih-Chieh Lee, I-Cheng K. Chen, Trevor N. Mudge
Path-Based Next Trace Prediction | Quinn Jacobson, Eric Rotenberg, James E. Smith
Alternative Fetch and Issue Policies for the Trace Cache Fetch Mechanism | Daniel Holmes Friendly, Sanjay Jeram Patel, Yale N. Patt
Reducing the Performance Impact of Instruction Cache Misses by Writing Instructions Into the Reservation Stations Out-of-Order | Jared Stark, Paul Racunas, Yale N. Patt
On High-Bandwidth Data Cache Design for Multi-Issue Processors | Jude A. Rivers, Gary S. Tyson, Edward S. Davidson, Todd M. Austin
Run-Time Spatial Locality Detection and Optimization | Teresa L. Johnson, Matthew C. Merten, Wen-Mei W. Hwu
A Comparison of Data Prefetching On an Access Decoupled and Superscalar Machine | G. P. Jones, N. P. Topham
The Design and Performance of a Conflict-Avoiding Cache | Nigel Topham, Antonio González, José González
Prediction Caches for Superscalar Processors | James E. Bennett, Michael J. Flynn
A Framework for Balancing Control Flow and Predication | David I. August, Wen-mei W. Hwu, Scott A. Mahlke
Evaluation of Scheduling Techniques On a SPARC-Based VLIW Testbed | Seongbae Park, SangMin Shim, Soo-Mook Moon
Tuning Compiler Optimizations for Simultaneous Multithreading | Jack L. Lo, Susan J. Eggers, Henry M. Levy, Sujay S. Parekh, Dean M. Tullsen
Exploiting Dead Value Information | Milo M. Martin, Amir Roth, Charles N. Fischer
Trace Processors | Eric Rotenberg, Quinn Jacobson, Yiannakis Sazeides, Jim Smith
The Multicluster Architecture: Reducing Cycle Time Through Partitioning | Keith I. Farkas, Paul Chow, Norman P. Jouppi, Zvonko Vranesic
Out-of-Order Vector Architectures | Roger Espasa, Mateo Valero, James E. Smith
Initial Results On the Performance and Cost of Vector Microprocessors | Corinna G. Lee, Derek J. DeVries
The Filter Cache: An Energy Efficient Memory Structure | Johnson Kin, Munish Gupta, William H. Mangione-Smith
Improving Code Density Using Compression Techniques | Charles Lefurgy, Peter Bird, I-Cheng Chen, Trevor Mudge
Procedure Based Program Compression | Darko Kirovski, Johnson Kin, William H. Mangione-Smith
Improving the Accuracy and Performance of Memory Communication Through Renaming | Gary S. Tyson, Todd M. Austin
Microarchitecture Support for Improving the Performance of Load Target Prediction | Chung-Ho Chen, Akida Wu
Streamlining Inter-Operation Memory Communication Via Data Dependence Prediction | Andreas Moshovos, Gurindar S. Sohi
The Predictability of Data Values | Yiannakis Sazeides, James E. Smith
Value Profiling | Brad Calder, Peter Feller, Alan Eustace
Can Program Profiling Support Value Prediction? | Freddy Gabbay, Avi Mendelson
Highly Accurate Data Value Prediction Using Hybrid Predictors | Kai Wang, Manoj Franklin
ProfileMe: Hardware Support for Instruction-Level Profiling On Out-of-Order Processors | Jeffrey Dean, James E. Hicks, Carl A. Waldspurger, William E. Weihl, George Chrysos
Procedure Placement Using Temporal Ordering Information | Nikolas Gloy, Trevor Blackwell, Michael D. Smith, Brad Calder
Predicting Data Cache Misses in Non-Numeric Applications Through Correlation Profiling | Todd C. Mowry, Chi-Keung Luk
Available Parallelism in Video Applications | Heng Liao, Andrew Wolfe
MediaBench: A Tool for Evaluating and Synthesizing Multimedia and Communications Systems | Chunho Lee, Miodrag Potkonjak, William H. Mangione-Smith
Cache Sensitive Modulo Scheduling | F. Jesús Sánchez, Antonio González
Unroll-and-Jam Using Uniformly Generated Sets | Steve Carr, Yiping Guan
Resource-Sensitive Profile-Directed Data Flow Analysis for Code Optimization | Rajiv Gupta, David A. Berson, Jesse Z. Fang

MICRO 1998

Paper Title | Authors
A Bandwidth-Efficient Architecture for Media Processing | Scott Rixner, William J. Dally, Ujval J. Kapasi, Brucek Khailany, Abelardo López-Lagunas, Peter R. Mattson, John D. Owens
Exploiting Instruction Level Parallelism in Geometry Processing for Three Dimensional Graphics Applications | Chia-Lin Yang, Barton Sano, Alvin R. Lebeck
Simple Vector Microprocessors for Multimedia Applications | Corinna G. Lee, Mark G. Stoodley
Evaluating MMX Technology Using DSP and Multimedia Applications | Ravi Bhargava, Lizy K. John, Brian L. Evans, Ramesh Radhakrishnan
Analyzing the Working Set Characteristics of Branch Execution | Sangwook P. Kim, Gary S. Tyson
Dataflow Analysis of Branch Mispredictions and Its Application to Early Resolution of Branch Outcomes | Alexandre Farcy, Olivier Temam, Roger Espasa, Toni Juan
The YAGS Branch Prediction Scheme | Avinoam N. Eden, Trevor Mudge
Task Selection for a Multiscalar Processor | T. N. Vijaykumar, Gurindar S. Sohi
Split-Path Enhanced Pipeline Scheduling for Loops with Control Flows | SangMin Shim, Soo-Mook Moon
Effective Cluster Assignment for Modulo Scheduling | Erik Nystrom, Alexandre E. Eichenberger
Better Global Scheduling Using Path Profiles | Cliff Young, Michael D. Smith
Predictive Techniques for Aggressive Load Speculation | Glenn Reinman, Brad Calder
Compiler-Directed Early Load-Address Generation | Ben-Chung Cheng, Daniel A. Connors, Wen-mei W. Hwu
Load Latency Tolerance in Dynamically Scheduled Processors | Srikanth T. Srinivasan, Alvin R. Lebeck
Improving I/O Performance with a Conditional Store Buffer | Lambert Schaelicke, Al Davis
Putting the Fill Unit to Work: Dynamic Optimizations for Trace Cache Microprocessors | Daniel Holmes Friendly, Sanjay Jeram Patel, Yale N. Patt
Cooperative Prefetching: Compiler and Hardware Support for Effective Instruction Prefetching in Modern Processors | Chi-Keung Luk, Todd C. Mowry
Code Compression Based on Operand Factorization | Guido Araujo, Paulo Centoducatte, Mario Cartes, Ricardo Pannain
Understanding the Differences Between Value Prediction and Instruction Reuse | Avinash Sodani, Gurindar S. Sohi
A Novel Renaming Scheme to Exploit Value Temporal Locality Through Physical Register Reuse and Unification | Stéphan Jourdan, Ronny Ronen, Michael Bekerman, Bishara Shomar, Adi Yoaz
A Dynamic Multithreading Processor | Haitham Akkary, Michael A. Driscoll
Widening Resources: A Cost-Effective Technique for Aggressive ILP Architectures | David López, Josep Llosa, Mateo Valero, Eduard Ayguadé
The Cascaded Predictor: Economical and Adaptive Branch Target Prediction | Karel Driesen, Urs Hölzle
Improving Prediction for Procedure Returns with Return-Address-Stack Repair Mechanisms | Kevin Skadron, Pritpal S. Ahuja, Margaret Martonosi, Douglas W. Clark
Predicting Indirect Branches via Data Compression | John Kalamatianos, David R. Kaeli
Improving Locality Using Loop and Data Transformations in an Integrated Framework | Mahmut Kandemir, Alok Choudhary, J. Ramanujam, Prithviraj Banerjee
Precise Register Allocation for Irregular Architectures | Timothy Kong, Kent D. Wilken
Unified Assign and Schedule: A New Approach to Scheduling for Clustered Register File Microarchitectures | Emre Özer, Sanjeev Banerjia, Thomas M. Conte

MICRO 1999

Paper Title | Authors
Control Independence in Trace Processors | Eric Rotenberg, James E. Smith
Fetch Directed Instruction Prefetching | Glenn Reinman, Brad Calder, Todd M. Austin
Improving Branch Predictors by Correlating on Data Values | Timothy H. Heil, Zak Smith, James E. Smith
Instruction Fetch Mechanisms for Multipath Execution Processors | Artur Klauser, Dirk Grunwald
A Superscalar 3D Graphics Engine | Andrew Wolfe, Derek B. Noonburg
Dynamic 3D Graphics Workload Characterization and the Architectural Implications | Tulika Mitra, Tzi-cker Chiueh
Exploiting a New Level of DLP in Multimedia Applications | Jesus Corbal, Roger Espasa, Mateo Valero
Compiler-Driven Cached Code Compression Schemes for Embedded ILP Processors | Sergei Y. Larin, Thomas M. Conte
Evaluation of a High Performance Code Compression Method | Charles Lefurgy, Eva Piccininni, Trevor N. Mudge
Low-Cost Branch Folding for Embedded Applications with Small Tight Loops | Lea Hwang Lee, Jeff Scott, Bill Moyer, John Arends
Automatic and Efficient Evaluation of Memory Hierarchies for Embedded Systems | Santosh G. Abraham, Scott A. Mahlke
Hardware Identification of Cache Conflict Misses | Jamison D. Collins, Dean M. Tullsen
Access Region Locality for High-Bandwidth Processor Memory System Design | Sangyeun Cho, Pen-Chung Yew, Gyungho Lee
Code Transformations to Improve Memory Parallelism | Vijay S. Pai, Sarita V. Adve
Compiler-Directed Dynamic Computation Reuse: Rationale and Initial Results | Daniel A. Connors, Wen-mei W. Hwu
Dynamic Memory Disambiguation in the Presence of Out-of-Order Store Issuing | Soner Onder, Rajiv Gupta
Read-After-Read Memory Dependence Prediction | Andreas Moshovos, Gurindar S. Sohi
Delaying Physical Register Allocation through Virtual-Physical Registers | Teresa Monreal, Antonio González, Mateo Valero, José González, Victor Viñals
DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design | Todd M. Austin
Exploiting ILP in Page-based Intelligent Memory | Mark Oskin, Justin Hensley, Diana Keen, Frederic T. Chong, Matthew K. Farrens, Aneet Chopra
The Use of Multithreading for Exception Handling | Craig B. Zilles, Joel S. Emer, Gurindar S. Sohi
Value Prediction for Speculative Multithreaded Architectures | Pedro Marcuello, Jordi Tubella, Antonio González
Predicting the Usefulness of a Block Result: A Micro-Architectural Technique for High-Performance Low-Power Processors | Enric Musoll
Selective Cache Ways: On-Demand Cache Resource Allocation | David H. Albonesi
Wavefront Scheduling: Path based Data Representation and Scheduling of Subgraphs | Jay Bharadwaj, Kishore N. Menezes, Chris McKinsey
Balance Scheduling: Weighting Branch Tradeoffs in Superblocks | Alexandre E. Eichenberger, Waleed Meleis
Optimizations and Oracle Parallelism with Dynamic Translation | Kemal Ebcioglu, Erik R. Altman, Sumedh W. Sathaye, Michael Gschwind