MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture

SESSION: Best paper candidates

Large pages and lightweight memory management in virtualized environments: can you have it both ways?

Binh Pham
Ján Veselý
Gabriel H. Loh
Abhishek Bhattacharjee

Exploiting commutativity to reduce the cost of updates to shared data in cache-coherent systems

Guowei Zhang
Webb Horn
Daniel Sanchez

CCICheck: using µhb graphs to verify the coherence-consistency interface

Yatin A. Manerkar
Daniel Lustig
Michael Pellauer
Margaret Martonosi

SESSION: Cache

HyComp: a hybrid cache compression method for selection of data-type-specific compression methods

Angelos Arelakis
Fredrik Dahlgren
Per Stenström

Doppelgänger: a cache for approximate computing

Joshua San Miguel
Jorge Albericio
Andreas Moshovos
Natalie Enright Jerger

The application slowdown model: quantifying and controlling the impact of inter-application interference at shared caches and main memory

Lavanya Subramanian
Vivek Seshadri
Arnab Ghosh
Samira Khan
Onur Mutlu

MORC: a manycore-oriented compressed cache

Tri M. Nguyen
David Wentzlaff

SESSION: Security

Avoiding information leakage in the memory controller with fixed service policies

Ali Shafiee
Akhila Gundu
Manjunath Shevgoor
Rajeev Balasubramonian
Mohit Tiwari

Fork path: improving efficiency of ORAM by removing redundant memory accesses

Xian Zhang
Guangyu Sun
Chao Zhang
Weiqi Zhang
Yun Liang
Tao Wang
Yiran Chen
Jia Di

Locking down insecure indirection with hardware-based control-data isolation

William Arthur
Sahil Madeka
Reetuparna Das
Todd Austin

Authenticache: harnessing cache ECC for system authentication

Anys Bacha
Radu Teodorescu

SESSION: Prefetching

Efficiently prefetching complex address patterns

Manjunath Shevgoor
Sahil Koladiya
Rajeev Balasubramonian
Chris Wilkerson
Seth H. Pugsley
Zeshan Chishti

Self-contained, accurate precomputation prefetching

Islam Atta
Xin Tong
Vijayalakshmi Srinivasan
Ioana Baldini
Andreas Moshovos

Confluence: unified instruction supply for scale-out servers

Cansu Kaynak
Boris Grot
Babak Falsafi

IMP: indirect memory prefetcher

Xiangyao Yu
Christopher J. Hughes
Nadathur Satish
Srinivas Devadas

SESSION: Concurrency

DeSC: decoupled supply-compute communication management for heterogeneous architectures

Tae Jun Ham
Juan L. Aragón
Margaret Martonosi

Efficient warp execution in presence of divergence with collaborative context collection

Farzad Khorasani
Rajiv Gupta
Laxmi N. Bhuyan

Control flow coalescing on a hybrid dataflow/von Neumann GPGPU

Dani Voitsechov
Yoav Etsion

A scalable architecture for ordered parallelism

Mark C. Jeffrey
Suvinay Subramanian
Cong Yan
Joel Emer
Daniel Sanchez

SESSION: DRAM

More is less: improving the energy efficiency of data movement via opportunistic use of sparse codes

Yanwei Song
Engin Ipek

Improving DRAM latency with dynamic asymmetric subarray

Shih-Lien Lu
Ying-Chen Lin
Chia-Lin Yang

Gather-scatter DRAM: in-DRAM address translation to improve the spatial locality of non-unit strided accesses

Vivek Seshadri
Thomas Mullins
Amirali Boroumand
Onur Mutlu
Phillip B. Gibbons
Michael A. Kozuch
Todd C. Mowry

SESSION: Voltage

The CRISP performance model for dynamic voltage and frequency scaling in a GPGPU

Rajib Nath
Dean Tullsen

Safe limits on voltage reduction efficiency in GPUs: a direct measurement approach

Jingwen Leng
Alper Buyuktosunoglu
Ramon Bertran
Pradip Bose
Vijay Janapa Reddi

Adaptive guardband scheduling to improve system-level efficiency of the POWER7+

Yazhou Zu
Charles R. Lefurgy
Jingwen Leng
Matthew Halpern
Michael S. Floyd
Vijay Janapa Reddi

SESSION: Micro-architecture

DynaMOS: dynamic schedule migration for heterogeneous cores

Shruti Padmanabha
Andrew Lukefahr
Reetuparna Das
Scott Mahlke

Long term parking (LTP): criticality-aware resource allocation in OOO processors

Andreas Sembrant
Trevor Carlson
Erik Hagersten
David Black-Schaffer
Arthur Perais
André Seznec
Pierre Michaud

The inner most loop iteration counter: a new dimension in branch history

André Seznec
Joshua San Miguel
Jorge Albericio

Filtered runahead execution with a runahead buffer

Milad Hashemi
Yale N. Patt

Bungee jumps: accelerating indirect branches through HW/SW co-design

Daniel S. McFarlin
Craig Zilles

SESSION: GPU

SAWS: synchronization aware GPGPU warp scheduling for multiple independent warp schedulers

Jiwei Liu
Jun Yang
Rami Melhem

Enabling coordinated register allocation and thread-level parallelism optimization for GPUs

Xiaolong Xie
Yun Liang
Xiuhong Li
Yudong Wu
Guangyu Sun
Tao Wang
Dongrui Fan

Free launch: optimizing GPU dynamic kernel launches through thread reuse

Guoyang Chen
Xipeng Shen

GPU register file virtualization

Hyeran Jeon
Gokul Subramanian Ravi
Nam Sung Kim
Murali Annavaram

WarpPool: sharing requests with inter-warp coalescing for throughput processors

John Kloosterman
Jonathan Beaumont
Mick Wollman
Ankit Sethia
Ron Dreslinski
Trevor Mudge
Scott Mahlke

SESSION: Accelerator

Ultra-low power render-based collision detection for CPU/GPU systems

Enrique de Lucas
Pedro Marcuello
Joan-Manuel Parcerisa
Antonio González

Execution time prediction for energy-efficient hardware accelerators

Tao Chen
Alexander Rucker
G. Edward Suh

Border control: sandboxing accelerators

Lena E. Olson
Jason Power
Mark D. Hill
David A. Wood

Neural acceleration for GPU throughput processors

Amir Yazdanbakhsh
Jongse Park
Hardik Sharma
Pejman Lotfi-Kamran
Hadi Esmaeilzadeh

Neuromorphic accelerators: a comparison between neuroscience and machine-learning approaches

Zidong Du
Daniel D. Ben-Dayan Rubin
Yunji Chen
Liqiang He
Tianshi Chen
Lei Zhang
Chengyong Wu
Olivier Temam

SESSION: Mobile & emerging systems

Prediction-guided performance-energy trade-off for interactive applications

Daniel Lo
Taejoon Song
G. Edward Suh

Architecture-aware automatic computation offload for native applications

Gwangmu Lee
Hyunjoon Park
Seonyeong Heo
Kyung-Ah Chang
Hyogun Lee
Hanjun Kim

Fast support for unstructured data processing: the unified automata processor

Yuanwei Fang
Tung T. Hoang
Michela Becchi
Andrew A. Chien

Enabling interposer-based disintegration of multi-core processors

Ajaykumar Kannan
Natalie Enright Jerger
Gabriel H. Loh

DCS: a fast and scalable device-centric server architecture

Jaehyung Ahn
Dongup Kwon
Youngsok Kim
Mohammadamin Ajdari
Jaewon Lee
Jangwoo Kim

SESSION: Datacenter

Modeling the implications of DRAM failures and protection techniques on datacenter TCO

Panagiota Nikolaou
Yiannakis Sazeides
Lorena Ndreu
Marios Kleanthous

TimeTrader: exploiting latency tail to save datacenter energy for online search

Balajee Vamanan
Hamza Bin Sohail
Jahangir Hasan
T. N. Vijaykumar

Rubik: fast analytical power management for latency-critical systems

Harshad Kasture
Davide B. Bartolini
Nathan Beckmann
Daniel Sanchez

SESSION: Memory systems

CLEAN-ECC: high reliability ECC for adaptive granularity memory system

Seong-Lyong Gong
Minsoo Rhu
Jungrae Kim
Jinsuk Chung
Mattan Erez

vCache: architectural support for transparent and isolated virtual LLCs in virtualized environments

Daehoon Kim
Hwanju Kim
Nam Sung Kim
Jaehyuk Huh

An integrated concurrency and core-ISA architectural envelope definition, and test oracle, for IBM POWER multiprocessors

Kathryn E. Gray
Gabriel Kerneis
Dominic Mulligan
Christopher Pulte
Susmit Sarkar
Peter Sewell

SESSION: Coherence, consistency, persistency

Efficient GPU synchronization without scopes: saying no to complex consistency models

Matthew D. Sinclair
Johnathan Alsop
Sarita V. Adve

Efficient persist barriers for multicores

Arpit Joshi
Vijay Nagarajan
Marcelo Cintra
Stratis Viglas

ThyNVM: enabling software-transparent crash consistency in persistent memory systems

Jinglei Ren
Jishen Zhao
Samira Khan
Jongmoo Choi
Yongwei Wu
Onur Mutlu

Coherence domain restriction on large scale systems

Yaosheng Fu
Tri M. Nguyen
David Wentzlaff

Efficiently enforcing strong memory ordering in GPUs

Abhayendra Singh
Shaizeen Aga
Satish Narayanasamy

SESSION: Modeling & characterization

Characterizing, modeling, and improving the QoE of mobile devices with low battery level

Kaige Yan
Xingyao Zhang
Xin Fu

Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance

Newsha Ardalani
Clint Lestourgeon
Karthikeyan Sankaralingam
Xiaojin Zhu

A fast and accurate analytical technique to compute the AVF of sequential bits in a processor

Steven Raasch
Arijit Biswas
Jon Stephan
Paul Racunas
Joel Emer

Enabling portable energy efficiency with memory accelerated library

Qi Guo
Tze-Meng Low
Nikolaos Alachiotis
Berkin Akin
Larry Pileggi
James C. Hoe
Franz Franchetti

Microarchitectural implications of event-driven server-side web applications

Yuhao Zhu
Daniel Richins
Matthew Halpern
Vijay Janapa Reddi