Low Power Robust Computing


Schedule
Saturday 4th, all day
Organizers/Speakers
- Todd Austin, David Blaauw, Trevor Mudge, and Dennis Sylvester, The University of Michigan, Ann Arbor, MI, USA
- Krisztian Flautner, ARM Ltd., Cambridge, UK
- Nam Sung Kim, The Circuit Research Lab., Intel Corp., Hillsboro, OR, USA
Abstract

This is a one-day tutorial on low power robust computing. The goal of the tutorial is threefold: 1) to summarize recent low power research for computer microarchitects; 2) to summarize future technology trends and what they may mean for power and robustness concerns, again with an emphasis on those aspects that are relevant to computer microarchitects; and 3) to summarize fault tolerant computing techniques that support robust computing. The importance of robust computing grows as feature sizes are scaled to reduce power, because process variability increases and reliability decreases. The presenters have worked together on different aspects of this subject, ensuring a consistent presentation.

At a recent International Electron Devices Meeting, Intel chairman Andrew Grove reiterated his position that Moore’s law (which states that circuit density doubles roughly every two years) will not slow down for at least a decade. By that time, integrated circuits (ICs) will have feature sizes of 30 nanometers, allowing for the integration of billions of devices on a single die and enabling unforeseen computational capabilities. However, with growing levels of integration, power densities also skyrocket, with the International Technology Roadmap for Semiconductors predicting that, left unchecked, power consumption will reach 1200 Watts for high-end processors in 2018. In fact, Grove cites power consumption as a major showstopper, with off-state current leakage “a limiter of integration.”

In addition to the power consumption crisis, aggressively scaled feature sizes also result in increased process variability and poor reliability. For instance, the ability to consistently resolve critical dimensions (CDs) of 30nm is severely compromised, creating substantial uncertainty in device performance. Hence, Grove notes that at 30nm, design will enter an era of “probabilistic computing,” with the behavior of logic gates no longer deterministic. At the same time, susceptibility to single event upsets (SEUs, or soft errors) from radiation particle strikes will grow due to supply voltage scaling, while power supply integrity (IR drop, inductive noise, electromigration failure) will be strained by rapidly increasing current demand. Thus, new approaches to robust and low power design will be crucial to the successful continuation of the process scaling that has fueled the modern semiconductor industry.

This tutorial will present recent results in robust low power computing. The perspective will be microarchitectural: what limitations do power and reliability place on the microarchitecture, and what can the microarchitect do to reduce power consumption and improve robustness? The tutorial will start with a technology overview that charts future trends in power and reliability. Power has been a growing concern for the past several years, but the emphasis has moved from dynamic power to static power—subthreshold and gate oxide leakage. We will present models for each that suppress details of less interest to the microarchitect. We will present a summary of prior research in dynamic power reduction in microarchitectures and give some examples of industrial solutions. We will also review prior research, by us and others, in microarchitectural reduction of leakage.
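
To give a flavor of the abstraction level we aim for, the sketch below implements the usual first-order models in Python: dynamic power as alpha * C * Vdd^2 * f and subthreshold leakage as an exponential in threshold voltage. This is our own minimal illustration; all parameter values are made-up placeholders, not data from the tutorial.

```python
import math

def dynamic_power(alpha, c_eff, vdd, freq):
    """First-order switching power: P = alpha * C_eff * Vdd^2 * f."""
    return alpha * c_eff * vdd ** 2 * freq

def subthreshold_leakage(i0, vth, n=1.5, v_t=0.026):
    """First-order subthreshold leakage current: I = I0 * exp(-Vth / (n * vT)),
    where vT is the thermal voltage (~26 mV at room temperature)."""
    return i0 * math.exp(-vth / (n * v_t))

# Made-up example: a 1 V, 1 GHz block with 1 nF effective switched
# capacitance and 10% average switching activity.
p_dyn = dynamic_power(alpha=0.1, c_eff=1e-9, vdd=1.0, freq=1e9)
p_leak = 1.0 * subthreshold_leakage(i0=1e-3, vth=0.3)  # P_leak = Vdd * I_leak
print(f"dynamic: {p_dyn:.2f} W, leakage: {p_leak:.2e} W")
```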

Many, but not all, sources of power consumption shrink along with feature size. Thus the continued scaling that Moore’s Law predicts is in many ways good for reducing power. There are, of course, other important factors driving feature size reduction. Unfortunately, scaling also reduces reliability by increasing uncertainty in device performance. Therefore, to take advantage of scaling, it will be necessary to compute in the presence of various types of errors. This manifests itself in several ways; two that are particularly important are susceptibility to SEUs and, more seriously, gates that do not meet their specifications. We will review techniques that provide robustness in light of these trends. In particular, we will revisit techniques developed by the fault-tolerance community as well as newer ideas in timing speculation, exemplified by our Razor research.
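
To make the timing-speculation idea concrete, here is a toy model of Razor-style operation: the pipeline speculates past worst-case margins, a shadow comparison catches the rare operations that miss timing, and each caught error costs a replay. The error rate, energy figures, and replay penalty are illustrative assumptions, not measurements from the Razor work.

```python
import random

def run_with_timing_speculation(n_ops, error_rate, replay_cycles,
                                base_energy, replay_energy):
    """Toy Razor-style pipeline: operate below worst-case margins, detect
    rare timing errors via a shadow comparison, and replay to recover."""
    cycles, energy, errors = 0, 0.0, 0
    for _ in range(n_ops):
        cycles += 1
        energy += base_energy
        if random.random() < error_rate:  # shadow latch disagrees: timing error
            errors += 1
            cycles += replay_cycles       # recovery: flush and replay
            energy += replay_energy
    return cycles, energy, errors

random.seed(0)
# Assumed numbers: sub-critical voltage cuts per-op energy by 30% at the
# cost of a 0.1% timing-error rate and a 5-cycle replay penalty.
c, e, errs = run_with_timing_speculation(
    n_ops=1_000_000, error_rate=0.001, replay_cycles=5,
    base_energy=0.7, replay_energy=1.0)
print(f"{errs} errors, {c} cycles, energy {e:,.0f} vs. 1,000,000 worst-case")
```

Because errors are rare, the occasional replay costs far less energy than the worst-case margin saved on every operation.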

Outline
1. Introduction (1.5 hr.)

- Basic theory of power modeling and abstractions for computer microarchitects.
- Circuit fundamentals for architects.
- Sources of power: dynamic models; static models. Relationship to heat.
- Models for memory (cache and DRAM), interconnect, and logic components (a toy cache-energy sketch follows this list).
- What we can expect in the future—technology trends. Why reliability will become a challenge as feature sizes shrink. What these trends mean for microarchitects.
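
The toy cache-energy sketch promised above: per-access dynamic energy plus size-proportional leakage, integrated over runtime. This is our own minimal model of the abstraction level we have in mind; every coefficient is a made-up placeholder.

```python
def cache_energy(accesses, runtime_s, size_kb,
                 e_access_nj=0.5, leak_nw_per_kb=50.0):
    """Toy cache energy: per-access dynamic energy plus size-proportional
    leakage power integrated over runtime. Coefficients are placeholders."""
    dynamic_j = accesses * e_access_nj * 1e-9
    static_j = size_kb * leak_nw_per_kb * 1e-9 * runtime_s
    return dynamic_j + static_j

# A larger cache leaks more; whether it pays off depends on the misses
# (and hence DRAM energy) it avoids -- exactly the kind of trade-off a
# microarchitect-level model should expose.
for size_kb in (16, 32, 64):
    print(f"{size_kb} KB: {cache_energy(1e8, 1.0, size_kb):.4f} J")
```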

2. Dynamic Power (1.5 hr.)

- Survey of microarchitectural solutions such as filter caches, general activity-shielding techniques, and off-chip interconnect.
- The basic idea behind dynamic voltage scaling (DVS) for reducing dynamic power (a toy DVS calculation follows this list).
- Survey of commercial solutions such as ARM’s IEM (Intelligent Energy Manager), Transmeta’s LongRun and LongRun2, Intel’s SpeedStep, etc.
- The future of DVS given the lowering of Vdd with scaling.
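
The toy DVS calculation referenced above. It assumes a linear voltage-frequency relation and made-up circuit parameters; the point is the quadratic energy benefit of running just fast enough to meet a deadline rather than racing at full voltage.

```python
def dvs_energy(work_cycles, deadline_s, f_max_hz=1e9, v_max=1.2,
               c_eff=1e-9, alpha=0.2):
    """Toy DVS model: run at the slowest frequency that meets the deadline,
    assume Vdd scales linearly with f, and compute E = alpha*C*Vdd^2*cycles."""
    f = min(f_max_hz, work_cycles / deadline_s)  # just-in-time frequency
    vdd = v_max * f / f_max_hz                   # assumed linear V-f relation
    return alpha * c_eff * vdd ** 2 * work_cycles

# The same 5e8-cycle job: flat-out vs. stretched to a 1 s deadline.
print(f"full speed: {dvs_energy(5e8, 0.5):.3f} J, "
      f"just-in-time: {dvs_energy(5e8, 1.0):.3f} J")
```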

3. Static Power (1.5 hr.)

- Simplified leakage models. Subthreshold leakage; gate oxide leakage.
- Survey of microarchitectural solutions. Memory models of leakage. Trade-offs in multilevel caches. DVS for static power reduction.
- Interconnect models. Power vs. delay trade-offs.
- Combining static and dynamic models to obtain total power. Resulting design trade-offs (see the minimum-energy sketch after this list).
- Pipelining vs. parallel processing as strategies to reduce power.
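
The minimum-energy sketch referenced above. Combining the dynamic and static models exposes a minimum-energy supply voltage: dynamic energy falls quadratically with Vdd, but frequency collapses near threshold (approximated here with an alpha-power law), stretching runtime until leakage energy dominates. All constants are illustrative assumptions.

```python
def total_energy(vdd, cycles=1e9, c_eff=1e-9, alpha=0.2,
                 vth=0.3, v_max=1.0, f_max=1e9, i_leak=0.01):
    """Toy total-energy model: dynamic energy shrinks with Vdd^2, but
    frequency collapses near threshold, so runtime stretches and leakage
    energy grows. All constants are made-up placeholders."""
    # Frequency relative to f_max, using f ~ (Vdd - Vth)^1.3 / Vdd.
    scale = ((vdd - vth) ** 1.3 / vdd) / ((v_max - vth) ** 1.3 / v_max)
    runtime_s = cycles / (f_max * scale)
    e_dyn = alpha * c_eff * vdd ** 2 * cycles
    e_leak = vdd * i_leak * runtime_s
    return e_dyn + e_leak

# Sweep Vdd above threshold to expose the minimum-energy operating point.
best_e, best_v = min((total_energy(v / 100), v / 100)
                     for v in range(35, 101))
print(f"minimum energy {best_e:.4f} J at Vdd = {best_v:.2f} V")
```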

4. Robust Computing (1.5 hr.)

- Effects of scaling on reliability. How low can we scale voltage? Susceptibility to SEUs and increase in variation. What these trends mean for computer architects.
- Designing to protect against SEUs. Traditional fault-tolerance techniques for robust design (a minimal TMR sketch appears at the end of this outline) and techniques from testing. Techniques employing timing speculation, exemplified by Razor.
- “Better than worst-case design” ideas.
- Wrap-up and concluding remarks.
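
Finally, the minimal TMR sketch referenced in the robust computing session: three copies of a computation are majority-voted, so a single-event upset in any one copy is masked. This is our own illustration of the classic fault-tolerance idea, not a specific design from the tutorial; the upset probability and the bit-flip fault model are assumptions for the demo.

```python
import random

def tmr(compute, inputs, upset_prob=0.01):
    """Triple modular redundancy: run three copies and majority-vote.
    A single-event upset (simulated as a random bit flip) in one copy
    is outvoted by the two correct copies."""
    results = []
    for _ in range(3):
        r = compute(inputs)
        if random.random() < upset_prob:   # simulated SEU flips one bit
            r ^= 1 << random.randrange(32)
        results.append(r)
    for r in results:                      # any value seen twice wins
        if results.count(r) >= 2:
            return r
    raise RuntimeError("no majority: multiple simultaneous upsets")

random.seed(1)
adder = lambda ab: (ab[0] + ab[1]) & 0xFFFFFFFF
print(tmr(adder, (12345, 67890)))          # 80235, despite occasional upsets
```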