MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture

SESSION: Best paper candidates

Large pages and lightweight memory management in virtualized environments: can you have it both ways?

Exploiting commutativity to reduce the cost of updates to shared data in cache-coherent systems

CCICheck: using µhb graphs to verify the coherence-consistency interface


HyComp: a hybrid cache compression method for selection of data-type-specific compression methods

Doppelgänger: a cache for approximate computing

The application slowdown model: quantifying and controlling the impact of inter-application interference at shared caches and main memory

MORC: a manycore-oriented compressed cache

SESSION: Security

Avoiding information leakage in the memory controller with fixed service policies

Fork path: improving efficiency of ORAM by removing redundant memory accesses

Locking down insecure indirection with hardware-based control-data isolation

Authenticache: harnessing cache ECC for system authentication

SESSION: Prefetching

Efficiently prefetching complex address patterns

Self-contained, accurate precomputation prefetching

Confluence: unified instruction supply for scale-out servers

IMP: indirect memory prefetcher

SESSION: Concurrency

DeSC: decoupled supply-compute communication management for heterogeneous architectures

Efficient warp execution in presence of divergence with collaborative context collection

Control flow coalescing on a hybrid dataflow/von Neumann GPGPU

A scalable architecture for ordered parallelism


More is less: improving the energy efficiency of data movement via opportunistic use of sparse codes

Improving DRAM latency with dynamic asymmetric subarray

Gather-scatter DRAM: in-DRAM address translation to improve the spatial locality of non-unit strided accesses

SESSION: Voltage

The CRISP performance model for dynamic voltage and frequency scaling in a GPGPU

Safe limits on voltage reduction efficiency in GPUs: a direct measurement approach

Adaptive guardband scheduling to improve system-level efficiency of the POWER7+

SESSION: Micro-architecture

DynaMOS: dynamic schedule migration for heterogeneous cores

Long term parking (LTP): criticality-aware resource allocation in OOO processors

The inner most loop iteration counter: a new dimension in branch history

Filtered runahead execution with a runahead buffer

Bungee jumps: accelerating indirect branches through HW/SW co-design


SAWS: synchronization aware GPGPU warp scheduling for multiple independent warp schedulers

Enabling coordinated register allocation and thread-level parallelism optimization for GPUs

Free launch: optimizing GPU dynamic kernel launches through thread reuse

GPU register file virtualization

WarpPool: sharing requests with inter-warp coalescing for throughput processors

SESSION: Accelerator

Ultra-low power render-based collision detection for CPU/GPU systems

Execution time prediction for energy-efficient hardware accelerators

Border control: sandboxing accelerators

Neural acceleration for GPU throughput processors

Neuromorphic accelerators: a comparison between neuroscience and machine-learning approaches

SESSION: Mobile & emerging systems

Prediction-guided performance-energy trade-off for interactive applications

Architecture-aware automatic computation offload for native applications

Fast support for unstructured data processing: the unified automata processor

Enabling interposer-based disintegration of multi-core processors

DCS: a fast and scalable device-centric server architecture

SESSION: Datacenter

Modeling the implications of DRAM failures and protection techniques on datacenter TCO

TimeTrader: exploiting latency tail to save datacenter energy for online search

Rubik: fast analytical power management for latency-critical systems

SESSION: Memory systems

CLEAN-ECC: high reliability ECC for adaptive granularity memory system

vCache: architectural support for transparent and isolated virtual LLCs in virtualized environments

An integrated concurrency and core-ISA architectural envelope definition, and test oracle, for IBM POWER multiprocessors

SESSION: Coherence, consistency, persistency

Efficient GPU synchronization without scopes: saying no to complex consistency models

Efficient persist barriers for multicores

ThyNVM: enabling software-transparent crash consistency in persistent memory systems

Coherence domain restriction on large scale systems

Efficiently enforcing strong memory ordering in GPUs

SESSION: Modeling & characterization

Characterizing, modeling, and improving the QoE of mobile devices with low battery level

Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance

A fast and accurate analytical technique to compute the AVF of sequential bits in a processor

Enabling portable energy efficiency with memory accelerated library

Microarchitectural implications of event-driven server-side web applications