# Preparing for a Post Moore's Law World

Todd Austin, University of Michigan



# **Perspectives on Scaling**

#### C-FAR: Center for Future Architectures Research

- Focused on scaling in 2020-2030 silicon
- Performance, power and cost
- 27 faculty at 14 universities, 92 students




All of the work presented in this talk is that of C-FAR faculty.






### **Moore's Law Performance Gap**





# **Is Density Still Scaling?**



# What Does This All Mean to Architects?

Today, value = scalability (performance, power, cost).

But, the technology scaling component has left us.





### **Remedy #1: Chip Multiprocessors**





### **CMP Performance Scaling for the Highly Parallel PARSEC Benchmarks**



From "Dark Silicon and the End of Multicore Scaling," by Esmaeilzadeh et al.



# What Does the Press Think?



work. For decades, microprocessors followed what's known as Dennard scaling. Dennard predicted that oxide thickness, transistor length, and transistor width could all be scaled by a constant factor. Dennard scaling is what gave Moore's law its teeth; it's the reason the general-purpose microprocessor was able to overtake and dominate other types of computers.

I don't feel that way. I don't feel good about the speed or crisp is. Not on a desktop, not on a high-end laptop, and especially not on a : my job includes developing software *for* mobile devices, I have messed hem.

continuous web-browsing and, in less demanding situateens. While tablets still hold the crown, computers ha

and I thought. Hmm. And it dawned on me: I don't use real applications anymore.



### We Investigate: Who's to Blame?

Programmers





# **Largest North American Bitcoin Miner**

- GPGPU-based system
- Fills 2000 sq.ft. warehouse
- Computes 1 petahash/s
- Reportedly generates \$8M in Bitcoins per month
- Unfortunately soon to be obsolete as Bitcoin difficulty continues to scale





### We Investigate: Who's to Blame?


#### Educators



Programmers





# **CS Education is Booming**

#### CS enrollment on a fast-rising trajectory for a decade

### Parallel programming at UM

- EECS 381, Object-Oriented and Advanced Programming
- EECS 482, Operating Systems
- EECS 570, Parallel Computer Architecture
- EECS 587, Parallel Computing
- EECS 591, Distributed Systems
- EECS 598, Ubiquitous Parallelism
- I have been teaching and developing CS in Ethiopia
  - Nearly 600 students in the CS program
  - 2<sup>nd</sup> most popular major in the university





## We Investigate: Who's to Blame?

#### Educators



#### Programmers





#### The Transistor



### **The Dark Silicon Dilemma**







Courtesy Michael Taylor @ UCSD


### **The Dark Silicon Dilemma**



#### Fast forward to 2005: Threshold Scaling Problems due to Leakage Prevent Us From Scaling Voltage
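The arithmetic behind this slide can be sketched with the standard first-order CMOS dynamic power model. This is a worked derivation under textbook assumptions, not taken from the slides: $S$ is the linear scaling factor per process generation (roughly 1.4), $N$ the transistor count on a fixed-area die, $C$ the capacitance per transistor, $V$ the supply voltage, and $f$ the clock frequency.

$$
P_{\text{chip}} \propto N \cdot C \cdot V^{2} \cdot f
$$

Under Dennard scaling, $N \to S^{2}N$, $C \to C/S$, $V \to V/S$, and $f \to Sf$, so chip power stays constant:

$$
P_{\text{chip}} \to S^{2} \cdot \frac{1}{S} \cdot \frac{1}{S^{2}} \cdot S \cdot P_{\text{chip}} = P_{\text{chip}}
$$

Once leakage prevents the voltage from scaling, $V$ stays fixed and the same die wants $S^{2}$ times the power:

$$
P_{\text{chip}} \to S^{2} \cdot \frac{1}{S} \cdot 1 \cdot S \cdot P_{\text{chip}} = S^{2} P_{\text{chip}}
$$

With a fixed power budget, only about $1/S^{2}$ of the chip can run at full frequency, which is roughly half for $S \approx 1.4$. The remainder has to stay dark or run slower.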






### We Investigate: Who's to Blame?

#### Educators



Programmers







Architects



# The Tyranny of Amdahl's Law



### We Investigate: Who's to Blame?

#### Educators



Programmers









#### Architects



## A Story about Jason and His Two Advisors









### **EVA: Embedded Vision Architecture**



# **Where We Need to Focus**



Heterogeneous parallel systems overcome *dark silicon* and the *tyranny of Amdahl's Law*.
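To make the Amdahl's Law argument concrete, here is a minimal Python sketch. The classic formula is standard. The asymmetric variant follows Hill and Marty's multicore model, which is an assumption brought in for illustration and not something the slides specify: a chip has `n` units of area, and a core built from `r` units performs like `sqrt(r)` base cores.

```python
import math


def amdahl_speedup(f: float, n: int) -> float:
    """Classic Amdahl's Law: fraction f of the work is parallel, run on n equal cores."""
    return 1.0 / ((1.0 - f) + f / n)


def core_perf(r: float) -> float:
    """Assumed performance of a core built from r base-core units (sqrt rule)."""
    return math.sqrt(r)


def asymmetric_speedup(f: float, n: int, r: int) -> float:
    """One big core of r units plus (n - r) base cores, n area units in total."""
    serial_time = (1.0 - f) / core_perf(r)            # serial part runs on the big core
    parallel_time = f / (core_perf(r) + (n - r))      # parallel part uses every core
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    for f in (0.5, 0.9, 0.99):
        print(f, round(amdahl_speedup(f, 64), 2), round(asymmetric_speedup(f, 64, 16), 2))
```

Under these assumptions, with 90% parallel code and 64 area units, 64 equal cores give a speedup of roughly 9x, while one 16-unit core plus 48 base cores gives roughly 24x. The serial fraction dominates, so spending area on a faster serial engine pays off.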



# Why These Ideas Will Likely Fail, Unless We Make a Change...

- *The Good*: Hetero-parallel systems can close the Moore's Law gap
- *The Bad*: Dennard scaling has stopped, Moore's Law is slowing, leaving a growing gap
- *The Ugly*: Hetero-parallel designs needed to close the gap will be too expensive to afford
- We must make design much *cheaper*!





# What I Want You to Remember

- Successfully bridging the Moore's Law performance gap is less about "*How*" to do it and more about "*How Much*" it costs!
- My claim: if we can effect a 100x reduction in the cost to bring a design to market, innovation will flourish and scaling challenges will be overcome.



### **Design Costs Are Skyrocketing**



### **Outcome: "Nanodiversity" is Dwindling**






Source: Gartner Group

# Inexpensive "Design" Promotes Innovation and Adaptation

- Don't Believe Me? Ask Mother Nature!
  - r/K selection theory is a biological mechanism that organisms use to better adapt to their environment
- In unstable environments, *r-selection* predominates as the ability to reproduce quickly is crucial
- In stable environments, *K-selection* predominates as the ability to compete successfully for limited resources is crucial







## **The Remedy: Scale Innovation**

- Ultimate goal: accelerate system architecture innovation and make it sufficiently inexpensive that anyone can do it anywhere
- Approach #1: Expect more from architectural innovation
- Approach #2: Reduce the cost to design custom hardware
- Approach #3: Embrace open-source concepts
- Approach #4: Widen the applicability of custom hardware
- Approach #5: Reduce the cost of manufacturing custom H/W





# 1) Expect more from architectural innovation



#### HELIX-UP Unleashed Parallelization

David Brooks @ Harvard

- Traditional parallelizing compilers must honor every possible dependency
- HELIX-UP manufactures parallelism by profiling which dependencies do not exist and which are not needed
  - Based on user supplied output distortion function
- Big step for parallelization
  - 2x speedup over parallelizing compilers, 6x over serial, < 7% distortion
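The idea of trading a dependency for parallelism under a user-supplied distortion bound can be illustrated with a toy Python sketch. This is not HELIX-UP's algorithm. The smoothing loop, the chunking scheme, the distortion metric, and all function names are hypothetical, chosen only to show the shape of the trade-off.

```python
from typing import Callable, List, Sequence


def smooth_serial(xs: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Exponential smoothing: every output depends on the previous output."""
    if not xs:
        return []
    out: List[float] = []
    prev = xs[0]
    for x in xs:
        prev = alpha * x + (1.0 - alpha) * prev
        out.append(prev)
    return out


def smooth_chunked(xs: Sequence[float], chunks: int, alpha: float = 0.5) -> List[float]:
    """Same loop, but the carried value is dropped at chunk boundaries,
    so each chunk is independent and could run on its own thread."""
    if not xs:
        return []
    size = -(-len(xs) // chunks)  # ceiling division
    out: List[float] = []
    for start in range(0, len(xs), size):
        out.extend(smooth_serial(xs[start:start + size], alpha))
    return out


def mean_relative_distortion(exact: Sequence[float], approx: Sequence[float]) -> float:
    """Hypothetical user-supplied distortion function: total absolute error
    relative to the total magnitude of the exact output."""
    total = sum(abs(e) for e in exact)
    error = sum(abs(a - e) for a, e in zip(approx, exact))
    return error / total if total else 0.0


def acceptable(exact: Sequence[float], approx: Sequence[float],
               distortion: Callable[[Sequence[float], Sequence[float]], float],
               bound: float) -> bool:
    """Accept the relaxed version only if the distortion stays under the bound."""
    return distortion(exact, approx) < bound
```

A profiling run would compare `smooth_chunked` against `smooth_serial` on representative inputs and keep the relaxed version only when `acceptable` holds for the chosen bound.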





Nehalem 6 cores, 2 threads per core



# Association Rule Mining with the Automata Processor

Kevin Skadron @ UVA

- Micron's Automata Processor
  - Implements FSMs at memory
  - Massively parallel with accelerators
- Mapped association rule mining (ARM) to memory-based FSMs
  - ARM algorithms identify relationships between data elements
  - Implementations are often memory bottlenecked
- Large data sets saw large speedups
  - 90x+ over single CPU performance
  - 2-9x+ speedups over CMPs and GPUs
- Joint effort with UVA and Micron
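The support-counting step of association rule mining maps naturally onto small state machines, which the following Python sketch illustrates in software. It is a simplified model and not the Automata Processor's programming interface. The function names and the sorted-stream encoding are assumptions made for the example.

```python
from itertools import combinations
from typing import Dict, Iterable, Sequence, Set, Tuple


def matches(candidate: Sequence[str], stream: Sequence[str]) -> bool:
    """Tiny FSM: state i means the first i items of the candidate were seen.
    Both the candidate and the stream must be sorted the same way."""
    state = 0
    for item in stream:
        if state < len(candidate) and item == candidate[state]:
            state += 1
    return state == len(candidate)


def support_counts(transactions: Iterable[Set[str]], k: int) -> Dict[Tuple[str, ...], int]:
    """Count how many transactions contain each candidate itemset of size k."""
    transactions = list(transactions)
    items = sorted({item for t in transactions for item in t})
    counts = {c: 0 for c in combinations(items, k)}
    for t in transactions:
        stream = sorted(t)
        # In hardware, every candidate FSM would consume the stream at once;
        # here they are simply stepped one after another.
        for candidate in counts:
            if matches(candidate, stream):
                counts[candidate] += 1
    return counts
```

Each candidate itemset is an independent machine reading the same input stream, which is why a memory-based array of FSMs can evaluate many candidates in parallel where a CPU walks them sequentially.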







# 2) Reduce the cost to design custom hardware



#### Better tools and infrastructure

- Scalable accelerator synthesis and compilation: generate code and H/W for highly reusable accelerators
- Composable design space exploration: enables efficient exploration of highly complex design spaces
- Well-constructed benchmark suites to drive development efforts
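One basic building block of design space exploration is filtering candidate designs down to the Pareto-optimal set. The Python sketch below shows that step only. The `Design` type and its two metrics (delay and energy, both to be minimized) are hypothetical, and no C-FAR tool is being described.

```python
from typing import List, NamedTuple


class Design(NamedTuple):
    name: str
    delay: float   # lower is better
    energy: float  # lower is better


def dominates(a: Design, b: Design) -> bool:
    """a dominates b if it is no worse on both metrics and strictly better on one."""
    no_worse = a.delay <= b.delay and a.energy <= b.energy
    strictly_better = a.delay < b.delay or a.energy < b.energy
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]
```

This brute-force filter is quadratic in the number of designs, which is fine for a sketch. Exploring a very large space would need sampling or pruning before this step.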



### **CortexSuite: A Synthetic Brain Benchmark Suite**






# 3) Embrace Open-Source Concepts





#### **Red** = non-free IP, **Green** = free IP

# **Open-Source H/W is Growing**









# 4) Widen the Applicability of Customized H/W

Krste Asanovic @ UC-Berkeley



- ESP: Ensembles of Specialized Processors
  - Ensembles are algorithm-specific processors optimized for code "patterns"
  - Approach uses *composable customization* to deliver speed and efficiency that is widely applicable to general purpose programs
  - Grand challenges remain: what are the components and how are they connected?



# 5) Reduce the cost of manufacturing customized H/W

Martha Kim @ Columbia






- Diversity via brick ecosystem & interconnect flexibility
- Brick design costs amortized across all designs
- Robust interconnect and custom bricks rival ASIC speeds



# Conclusions

- Heterogeneous design could continue Moore's Law performance scaling via innovation alone
  - But, it requires a diverse hardware ecosystem with affordable customization
- Effective and affordable customization won't happen without our help
  1. Expect more from architectural innovation
  2. Reduce the cost to design customized hardware
  3. Embrace open-source concepts
  4. Widen the applicability of customization
  5. Reduce the cost of custom manufacturing
- Increasing "nanodiversity" is a good thing
  - More jobs, companies, and students
  - More competition and scalable innovation







### Questions

