- Todd Austin, David Blaauw, Trevor Mudge, and
Dennis Sylvester, The University of Michigan, Ann Arbor, MI, USA
- Krisztian Flautner, ARM Ltd., Cambridge, UK
- Nam Sung Kim, The Circuit Research Lab., Intel Corp.,
Hillsboro, OR, USA
This is a one-day tutorial on low-power robust
computing. The goal of the tutorial is threefold: 1) to summarize
recent low-power research for computer microarchitects; 2) to
summarize future technology trends and what they may mean for power
and robustness, again with an emphasis on those aspects
that are relevant to computer microarchitects; and 3) to summarize
fault-tolerant computing techniques that support robust computing.
The importance of robust computing increases as feature sizes are
scaled to reduce power, because process variability increases and
reliability decreases. The presenters have worked together on
different aspects of this subject, ensuring a consistent
presentation.
At a recent International Electron Devices Meeting,
Intel chairman Andrew Grove reiterated his position that Moore’s law (stating
that circuit density doubles roughly every two years) will not slow
down for at least a decade. By that time, integrated circuits (ICs)
will have feature sizes of 30 nanometers, allowing for integration of
billions of devices on a single die and enabling unforeseen
computational capabilities. However, with growing levels of
integration, power densities also skyrocket, with the International
Technology Roadmap for Semiconductors predicting that, left unchecked,
power consumption will reach 1200 watts for high-end processors in
2018. In fact, Grove cites power consumption as a major showstopper,
with off-state current leakage “a limiter of integration.”
In addition to the power consumption crisis,
aggressively scaled feature sizes also result in increased process
variability and poor reliability. For instance, the ability to
consistently resolve critical dimensions (CDs) of 30 nm is severely
compromised, creating substantial uncertainty in device performance.
Hence, Grove mentions that at 30 nm, design will enter an era of
“probabilistic computing,” with the behavior of logic gates no longer
deterministic. At the same time, susceptibility to single event upsets
(SEUs, or soft errors) from radiation particle strikes will grow due
to supply voltage scaling, while power supply integrity problems (IR drop,
inductive noise, electromigration failure) will be exacerbated by
rapidly increasing current demand. Thus, new approaches to robust and
low-power design will be crucial to the successful continuation of
the process scaling that has fueled the modern semiconductor industry.
This tutorial will present recent results in
robust low-power computing. The perspective will be microarchitectural:
what limitations do power and reliability place on the microarchitecture,
and what can the microarchitect do to reduce power consumption and improve
robustness? The tutorial will start with a technology overview that
charts future trends in power and reliability. Power has been a
growing concern for the past several years, but the emphasis has moved
from dynamic power to static power, that is, subthreshold and gate oxide
leakage. We will present a model for each that suppresses any detail that
is of less interest to the microarchitect. We will present a summary of
prior research in dynamic power reduction in microarchitectures and
give some examples of industrial solutions. We will also review prior
research, by us and others, in microarchitectural reduction of
leakage.
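As a rough illustration of the kind of simplified model meant here, the sketch below uses the standard first-order textbook expressions: switching power as activity × capacitance × Vdd² × frequency, and off-state subthreshold current as an exponential in the threshold voltage. The function names, parameter names, and default values (such as `n=1.5`) are assumptions made for this example, not the models presented in the tutorial, and gate oxide leakage is not modeled.

```python
import math

BOLTZMANN_OVER_Q = 8.617333e-5  # k/q in volts per kelvin


def dynamic_power(activity, switched_cap_f, vdd, freq_hz):
    """First-order switching power in watts: alpha * C * Vdd^2 * f."""
    return activity * switched_cap_f * vdd ** 2 * freq_hz


def subthreshold_current(i0_a, vth, temp_k=300.0, n=1.5):
    """Off-state (Vgs = 0) subthreshold current: I0 * exp(-Vth / (n * kT/q)).

    i0_a and n are hypothetical fitting parameters for illustration.
    """
    thermal_voltage = BOLTZMANN_OVER_Q * temp_k  # about 26 mV at 300 K
    return i0_a * math.exp(-vth / (n * thermal_voltage))


def static_power(vdd, leakage_current_a):
    """Static power in watts drawn by a steady leakage current."""
    return vdd * leakage_current_a


def total_power(activity, switched_cap_f, vdd, freq_hz, leakage_current_a):
    """Sum of the first-order dynamic and static components."""
    return (dynamic_power(activity, switched_cap_f, vdd, freq_hz)
            + static_power(vdd, leakage_current_a))


if __name__ == "__main__":
    # Illustrative numbers only: halving Vdd quarters the switching power,
    # while lowering Vth by 100 mV raises subthreshold current sharply.
    print(dynamic_power(0.1, 1e-9, 1.0, 1e9))
    print(dynamic_power(0.1, 1e-9, 0.5, 1e9))
    print(subthreshold_current(1.0, 0.3) / subthreshold_current(1.0, 0.4))
```

Even at this level of abstraction the two components pull in different directions: lowering Vdd cuts switching power quadratically, but the lower threshold voltage needed to keep circuits fast at low Vdd increases leakage exponentially.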
Many, but not all, sources of power consumption are reduced
along with feature size. Thus the continued scaling that
Moore’s law predicts is in many ways good for reducing power. There
are also other important factors driving feature size reduction, of
course. Unfortunately, scaling also reduces reliability by increasing
uncertainty in device performance. Therefore, in order to take
advantage of scaling, it will be necessary to compute in the presence
of various types of errors. These errors manifest themselves in several
ways. Two that are particularly important are susceptibility to SEUs and,
more seriously, gates that do not meet their specifications. We will
review techniques to provide robustness in light of these trends. In
particular, we will revisit techniques developed by the fault-tolerant
computing community as well as newer ideas in timing speculation,
exemplified by our Razor research.
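The general idea of timing speculation can be illustrated with a toy model: clock a stage faster than its worst-case delay, check each result against a later "shadow" sample, and pay a recovery penalty whenever the two would disagree. The sketch below is a timing abstraction under stated assumptions, not the Razor circuit or the presenters' implementation; the function names, the one-cycle penalty, and the delay values are all hypothetical.

```python
def classify_cycle(logic_delay, clock_period, shadow_margin):
    """Classify one cycle of a speculatively clocked stage.

    The main sample is taken at clock_period; a shadow sample is taken
    shadow_margin later (an assumed detection window).
    """
    if logic_delay <= clock_period:
        return "ok"          # main sample is already correct
    if logic_delay <= clock_period + shadow_margin:
        return "recovered"   # shadow sample catches the late value
    return "failed"          # too slow even for the shadow sample


def run_with_speculation(delays, clock_period, shadow_margin, penalty_cycles=1):
    """Return (total_cycles, detected_errors) for a sequence of stage delays."""
    cycles = 0
    errors = 0
    for delay in delays:
        outcome = classify_cycle(delay, clock_period, shadow_margin)
        if outcome == "failed":
            raise ValueError("delay exceeds the shadow window; cannot recover")
        cycles += 1
        if outcome == "recovered":
            errors += 1
            cycles += penalty_cycles  # assumed cost of replaying the result
    return cycles, errors


if __name__ == "__main__":
    # Hypothetical workload: most operations are fast, a few are near worst case.
    delays = [0.6] * 95 + [0.95] * 5

    worst_case_cycles, _ = run_with_speculation(delays, 1.0, 0.0)
    spec_cycles, spec_errors = run_with_speculation(delays, 0.7, 0.3)

    print("worst-case time:", worst_case_cycles * 1.0)
    print("speculative time:", spec_cycles * 0.7, "errors:", spec_errors)
```

In this toy workload the speculative clock wins because slow operations are rare; the same timing slack could instead be spent on a lower supply voltage at a fixed clock, which is where the power benefit would come from.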
1. Introduction (1.5 hr.)
- Basic theory of power modeling and abstractions for computer
microarchitects.
- Circuit fundamentals for architects.
- Sources of power: dynamic models; static models. Relationship to
heat.
- Models for memory (cache and DRAM), interconnect, and logic
components.
- What we can expect in the future: technology trends. Why
reliability will be a challenge as feature sizes reduce. What the
technology trends mean for microarchitects.
2. Dynamic Power (1.5 hr.)
- Survey of microarchitectural solutions such as filter caches,
general activity shielding techniques, and off-chip interconnect.
Basic idea behind dynamic voltage scaling (DVS) to reduce dynamic
power. Survey of commercial solutions such as ARM’s IEM (Intelligent
Energy Management), Transmeta’s LongRun and LongRun2, Intel’s SpeedStep,
etc. Future of DVS given the lowering of Vdd with scaling.
3. Static Power (1.5 hr.)
- Simplified leakage models. Subthreshold leakage; gate oxide
leakage.
- Survey of microarchitectural solutions. Memory models of leakage.
Trade-off in multilevel caches. DVS for static power reduction.
Interconnect models. Power vs. delay trade-offs.
- Combining static and dynamic models to obtain total power.
Resulting design trade-offs.
- Pipelining vs. parallel processing as strategies to reduce power.
4. Robust Computing (1.5 hr.)
- Effects of scaling on reliability. How low can we scale voltage?
Susceptibility to SEUs and increase in variation. What it means for
computer architects.
- Designing to protect against SEUs. Traditional techniques for
robust design: fault-tolerant techniques and techniques from
testing. Techniques employing timing speculation, like Razor, and
ideas derived from testing.
- “Better than worst case design” ideas.
- Wrap-up and concluding remarks.
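
The basic idea behind DVS listed under the Dynamic Power session can be sketched as follows: if a task has slack before its deadline, run it at the lowest voltage (and matching frequency) that still meets the deadline, since switching energy per cycle scales with Vdd². The sketch assumes that the maximum frequency scales linearly with Vdd, which is a deliberate simplification, and it ignores leakage; all names and numbers are hypothetical.

```python
def task_energy(cycles, vdd, switched_cap_f):
    """Switching energy in joules for a task: C * Vdd^2 per cycle."""
    return switched_cap_f * vdd ** 2 * cycles


def operating_point_for_deadline(cycles, deadline_s, v_max, f_max_hz, v_min):
    """Pick the lowest (Vdd, frequency) pair that still meets the deadline.

    Assumes frequency is proportional to Vdd (a simplification) and that
    Vdd cannot drop below v_min.
    """
    f_needed = cycles / deadline_s
    if f_needed > f_max_hz:
        raise ValueError("deadline cannot be met even at the maximum frequency")
    vdd = max(v_max * f_needed / f_max_hz, v_min)
    freq_hz = f_max_hz * vdd / v_max
    return vdd, freq_hz


if __name__ == "__main__":
    # Hypothetical task: 5e8 cycles due in 1 s on a 1 GHz, 1.2 V part.
    cycles, cap = 5e8, 1e-9
    vdd, freq = operating_point_for_deadline(cycles, 1.0, 1.2, 1e9, 0.6)
    full_speed = task_energy(cycles, 1.2, cap)
    scaled = task_energy(cycles, vdd, cap)
    print("Vdd:", vdd, "frequency:", freq)
    print("energy relative to full speed:", scaled / full_speed)
```

The `v_min` floor is where the outline's question about the future of DVS shows up in this model: as nominal Vdd falls with scaling, the gap between `v_max` and `v_min` narrows and the available quadratic saving shrinks.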