October 15, 2016 (Saturday)
Time / Room 406 405 403 402
8:30-10:30 NoCArc (9th International Workshop on Network on Chip Architectures)

Organizers: Maurizio Palesi, Masoud Daneshtalab, Xiaohang Wang
Accel (Tutorial on Rapid Exploration of Accelerator-rich Architectures: Automation from Concept to Prototyping)

Organizers: Jason Cong, Zhenman Fang, Yakun Sophia Shao
GPGPU (Tutorial on Intel Graphics Architecture: ISA and Microarchitecture)

Organizers: Subramaniam Maiyuran, Jason Ross and Ken Lueh
Tejas (Tejas: a versatile Java based architectural simulator)

Organizers: Prathmesh Kallurkar and Smruti Sarangi
13:30-15:00 BigBench+SAE: Instrumenting an Industry-Standard BigData Benchmark for BigData Analytics

Organizers: Vijay Janapa Reddi, Nadav Chachmon, Magnus Christensson, Daniel Richins
NOPE (2nd Workshop on Negative Outcomes, Post-mortems, and Experiences)

Organizers: Bob Adolf, Svilen Kanev, Brandon Reagen
October 16, 2016 (Sunday)
Time / Room 403 (Streaming to 401/402) 405 406
8:30-10:30 HW-ML (Tutorial on Hardware Architectures for Deep Neural Networks)

Organizers: Joel Emer, Vivienne Sze, and Yu-Hsin Chen
MemoryTech (Tutorial on Existing and Emerging Memory Technologies and Circuits)

Organizers: Meng-Fan (Marvin) Chang, Yue-Der Chih, Helia Naeimi, Darsen Lu, Shih-Lien Lu, Dinesh Somasekhar and Shigeki Tomishima
IoT (Cognitive Edge Computing)

Organizers: Ravi Iyer, Vijay Janapa Reddi and Shiao-Li Tsao
October 16, 2016 (Sunday)
Reception (finger foods, soft drinks, Taiwan beers, and alcoholic drinks provided; registration desk open 17:30-20:00 at B2)
October 17, 2016 (Monday)
Breakfast
8:00-8:20 Opening remarks
Hall I
Keynote I: Internet of Things: History and Hype, Technology and Policy
Margaret Martonosi (Princeton)

Chair: Mikko Lipasti
9:20-10:00 Lightning Session I
Chair: Moin Qureshi
Hall III Hall I
10:20-12:00 Session 1a: Microarchitecture
Chair: Minsoo Rhu
Session 1b: Cloud & Storage
Chair: Babak Falsafi
Dictionary Sharing: An Efficient Cache Compression Scheme for Compressed Caches, Biswabandan Panda (INRIA), André Seznec (INRIA)
Perceptron Learning for Reuse Prediction, Elvira Teran (Texas A&M University), Zhe Wang (Intel), Daniel A. Jiménez (Texas A&M University)
pTask: A Smart Prefetching Scheme for OS Intensive Applications, Prathmesh Kallurkar (Indian Institute of Technology, New Delhi), Smruti R. Sarangi (Indian Institute of Technology, New Delhi)
Register Sharing for Equality Prediction, Arthur Perais (INRIA/IRISA), Fernando A. Endo (INRIA/IRISA), André Seznec (INRIA/IRISA)
Data-Centric Execution of Speculative Parallel Programs, Mark C. Jeffrey (MIT), Suvinay Subramanian (MIT), Maleen Abeydeera (MIT), Joel Emer (NVIDIA/MIT), Daniel Sanchez (MIT)
SABRes: Atomic Object Reads for In-Memory Rack-Scale Computing, Alexandros Daglis (EPFL), Dmitrii Ustiugov (EPFL), Stanko Novaković (EPFL), Edouard Bugnion (EPFL), Babak Falsafi (EPFL), Boris Grot (University of Edinburgh)
A Cloud-Scale Acceleration Architecture, Adrian M. Caulfield (Microsoft), Eric S. Chung (Microsoft), Andrew Putnam (Microsoft), Hari Angepat (Microsoft), Jeremy Fowers (Microsoft), Michael Haselman (Microsoft), Stephen Heil (Microsoft), Matt Humphrey (Microsoft), Puneet Kaur (Microsoft), Joo-Young Kim (Microsoft), Daniel Lo (Microsoft), Todd Massengill (Microsoft), Kalin Ovtcharov (Microsoft), Michael Papamichael (Microsoft), Lisa Woods (Microsoft), Sitaram Lanka (Microsoft), Derek Chiou (Microsoft), Doug Burger (Microsoft)
Towards Efficient Server Architecture for Virtualized Network Function Deployment: Implications and Implementations, Yang Hu (University of Florida), Tao Li (University of Florida)
Bridging the I/O Performance Gap for Big Data Workloads: A New NVDIMM-based Approach, Renhai Chen (The Hong Kong Polytechnic University), Zili Shao (The Hong Kong Polytechnic University), Tao Li (University of Florida)
NeSC: Self-Virtualizing Nested Storage Controller, Yonatan Gottesman (Technion-Israel Institute of Technology), Yoav Etsion (Technion-Israel Institute of Technology)
Lunch (3F Yangtse River & Dragon Hall)
Hall I
14:00-15:40 Poster session
Hall III Hall I
16:00-18:00 Session 2a: GPU
Chair: Hyeran Jeon
Session 2b: Neural Networks
Chair: Emre Ozer
MIMD Synchronization on SIMT Architectures, Ahmed ElTantawy (University of British Columbia), Tor M. Aamodt (University of British Columbia)
Efficient Kernel Synthesis for Performance Portable Programming, Li-Wen Chang (University of Illinois at Urbana-Champaign), Izzat El Hajj (University of Illinois at Urbana-Champaign), Christopher Rodrigues (Huawei), Juan Gómez-Luna (University of Córdoba), Wen-mei Hwu (University of Illinois at Urbana-Champaign)
KLAP: Kernel Launch Aggregation and Promotion for Optimizing Dynamic Parallelism, Izzat El Hajj (University of Illinois at Urbana-Champaign), Juan Gómez-Luna (University of Córdoba), Cheng Li (University of Illinois at Urbana-Champaign), Li-Wen Chang (University of Illinois at Urbana-Champaign), Dejan Milojicic (Hewlett-Packard), Wen-mei Hwu (University of Illinois at Urbana-Champaign)
Cache-Emulated Register File: An Integrated On-Chip Memory Architecture for High Performance GPGPUs, Naifeng Jing (Shanghai Jiao Tong University), Jianfei Wang (Shanghai Jiao Tong University), Fengfeng Fan (Shanghai Jiao Tong University), Wenkang Yu (Shanghai Jiao Tong University), Li Jiang (Shanghai Jiao Tong University), Chao Li (Shanghai Jiao Tong University), Xiaoyao Liang (Shanghai Jiao Tong University)
Zorua: A Holistic Approach to Resource Virtualization in GPUs, Nandita Vijaykumar (Carnegie Mellon University), Kevin Hsieh (Carnegie Mellon University), Gennady Pekhimenko (Microsoft and Carnegie Mellon University), Samira Khan (University of Virginia), Ashish Shrestha (Carnegie Mellon University), Saugata Ghose (Carnegie Mellon University), Adwait Jog (College of William and Mary), Phillip B. Gibbons (Carnegie Mellon University), Onur Mutlu (ETH Zürich and Carnegie Mellon University)
GRAPE: Minimizing Energy for GPU Applications with Performance Requirements, Muhammad Husni Santriaji (Surya University & University of Chicago), Henry Hoffmann (University of Chicago)
From High-Level Deep Neural Models to FPGAs, Hardik Sharma (Georgia Institute of Technology), Jongse Park (Georgia Institute of Technology), Divya Mahajan (Georgia Institute of Technology), Emmanuel Amaro (Georgia Institute of Technology), Joon Kyung Kim (Georgia Institute of Technology), Chenkai Shao (Georgia Institute of Technology), Asit Mishra (Intel), Hadi Esmaeilzadeh (Georgia Institute of Technology)
vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design, Minsoo Rhu (NVIDIA), Natalia Gimelshein (NVIDIA), Jason Clemons (NVIDIA), Arslan Zulfiqar (NVIDIA), Stephen W. Keckler (NVIDIA)
Stripes: Bit-Serial Deep Neural Network Computing, Patrick Judd (University of Toronto), Jorge Albericio (University of Toronto), Tayler Hetherington (University of British Columbia), Tor M. Aamodt (University of British Columbia), Andreas Moshovos (University of Toronto)
Cambricon-X: An Accelerator for Sparse Neural Networks, Shijin Zhang (Chinese Academy of Sciences), Zidong Du (Chinese Academy of Sciences), Lei Zhang (Chinese Academy of Sciences), Huiying Lan (Chinese Academy of Sciences), Shaoli Liu (Chinese Academy of Sciences), Ling Li (Chinese Academy of Sciences), Qi Guo (Chinese Academy of Sciences), Tianshi Chen (Chinese Academy of Sciences), Yunji Chen (Chinese Academy of Sciences)
NEUTRAMS: Neural Network Transformation and Co-design under Neuromorphic Hardware Constraints, Yu Ji (Tsinghua University), YouHui Zhang (Tsinghua University), ShuangChen Li (University of California, Santa Barbara), Ping Chi (University of California, Santa Barbara), CiHang Jiang (Tsinghua University), Peng Qu (Tsinghua University), Yuan Xie (University of California, Santa Barbara), WenGuang Chen (Tsinghua University)
Fused-Layer CNN Accelerators, Manoj Alwani (Stony Brook University), Han Chen (Stony Brook University), Michael Ferdman (Stony Brook University), Peter Milder (Stony Brook University)
18:20-20:00 Business meeting
October 18, 2016 (Tuesday)
Breakfast
Hall I
Keynote II: Low Power CPU: From Mobile to Wearable & IoT
Uming Ko (MediaTek)

Chair: Hsien-Hsin Lee
9:30-10:10 Lightning Session II
Chair: Yuan Xie
Hall III Hall I
10:30-12:10 Session 3a: Compilation & Memory
Chair: Samira Khan
Session 3b: Interconnect
Chair: Sreenivas Subramoney
Continuous Shape Shifting: Enabling Loop Co-optimization via Near-Free Dynamic Code Rewriting, Animesh Jain (University of Michigan, Ann Arbor), Michael A. Laurenzano (University of Michigan, Ann Arbor), Lingjia Tang (University of Michigan, Ann Arbor), Jason Mars (University of Michigan, Ann Arbor)
CrystalBall: Statically Analyzing Runtime Behavior via Deep Sequence Learning, Stephen Zekany (University of Michigan, Ann Arbor), Daniel Rings (University of Michigan, Ann Arbor), Nathan Harada (University of Michigan, Ann Arbor), Michael A. Laurenzano (University of Michigan, Ann Arbor; Clinc), Lingjia Tang (University of Michigan, Ann Arbor; Clinc), Jason Mars (University of Michigan, Ann Arbor; Clinc)
Low-Cost Soft Error Resilience with Unified Data Verification and Fine-Grained Recovery for Acoustic Sensor Based Detection, Qingrui Liu (Virginia Tech), Changhee Jung (Virginia Tech, Blacksburg), Dongyoon Lee (Virginia Tech, Blacksburg), Devesh Tiwari (Oak Ridge National Lab)
Lazy Release Consistency for GPUs, Johnathan Alsop (University of Illinois at Urbana-Champaign), Marc S. Orr (University of Wisconsin - Madison and AMD), Bradford M. Beckmann (AMD), David A. Wood (University of Wisconsin - Madison and AMD)
Improving Energy Efficiency of DRAM by Exploiting Half Page Row Access, Heonjae Ha (Stanford University), Ardavan Pedram (Stanford University and Movidius), Stephen Richardson (Stanford University), Shahar Kvatinsky (Technion-Israel Institute of Technology), Mark Horowitz (Stanford University)
OSCAR: Orchestrating STT-RAM Cache Traffic for Heterogeneous CPU-GPU Architectures, Jia Zhan (University of California, Santa Barbara), Onur Kayiran (Advanced Micro Devices), Gabriel H. Loh (Advanced Micro Devices), Chita R. Das (The Pennsylvania State University), Yuan Xie (University of California, Santa Barbara)
A Unified Memory Network Architecture for In-Memory Computing in Commodity Servers, Jia Zhan (University of California, Santa Barbara), Itir Akgun (University of California, Santa Barbara), Jishen Zhao (University of California, Santa Cruz), Al Davis (HP), Paolo Faraboschi (HP), Yuangang Wang (Huawei), Yuan Xie (University of California, Santa Barbara)
Contention-based Congestion Management in Large-Scale Networks, Gwangsun Kim (KAIST), Changhyun Kim (KAIST), Jiyun Jeong (KAIST), Mike Parker (Intel), John Kim (KAIST)
Dynamic Error Mitigation in NoCs using Intelligent Prediction Techniques, Dominic DiTomaso (Ohio University), Travis Boraten (Ohio University), Avinash Kodi (Ohio University), Ahmed Louri (George Washington University)
Reducing Data Movement Energy via Online Data Clustering and Encoding, Shibo Wang (University of Rochester), Engin Ipek (University of Rochester)
Award Lunch (including Bob Rau Award, Test of Time)
(B1 Formosa)
Hall III Hall I
14:10-15:30 Session 4a: Multicore
Chair: Carole-Jean Wu
Session 4b: Security
Chair: Koji Inoue
Racer: TSO Consistency via Race Detection, Alberto Ros (Universidad de Murcia), Stefanos Kaxiras (Uppsala Universitet)
Exploiting Semantic Commutativity in Hardware Speculation, Guowei Zhang (MIT), Virginia Chiu (MIT), Daniel Sanchez (MIT)
CANDY: Enabling Coherent DRAM Caches for Multi-Node Systems, Chiachen Chou (Georgia Institute of Technology), Aamer Jaleel (NVIDIA), Moinuddin K. Qureshi (Georgia Institute of Technology)
C3D: Mitigating the NUMA Bottleneck via Coherent DRAM Caches, Cheng-Chieh Huang (University of Edinburgh), Rakesh Kumar (University of Edinburgh), Marco Elver (University of Edinburgh), Boris Grot (University of Edinburgh), Vijay Nagarajan (University of Edinburgh)
Quantifying and Improving the Efficiency of Hardware-based Mobile Malware Detectors, Mikhail Kazdagli (UT Austin), Vijay Janapa Reddi (UT Austin), Mohit Tiwari (UT Austin)
PoisonIvy: Safe Speculation for Secure Memory, Tamara Silbergleit Lehman (Duke University), Andrew D. Hilton (Duke University), Benjamin C. Lee (Duke University)
ReplayConfusion: Detecting Cache-based Covert Channel Attacks Using Record and Replay, Mengjia Yan (University of Illinois at Urbana-Champaign), Yasser Shalabi (University of Illinois at Urbana-Champaign), Josep Torrellas (University of Illinois at Urbana-Champaign)
Jump Over ASLR: Attacking Branch Predictors to Bypass ASLR, Dmitry Evtyushkin (Binghamton University), Dmitry Ponomarev (Binghamton University), Nael Abu-Ghazaleh (University of California, Riverside)
Hall III Hall I
16:00-17:00 Session 5a: Approximate Computing
Chair: Andreas Moshovos
Session 5b: Accelerators 1
Chair: Tao Li
Concise Loads and Stores: The Case for an Asymmetric Compute-Memory Architecture for Approximation, Animesh Jain (University of Michigan), Parker Hill (University of Michigan), Shih-Chieh Lin (University of Michigan), Muneeb Khan (Uppsala University), Md E. Haque (University of Michigan), Michael A. Laurenzano (University of Michigan), Scott Mahlke (University of Michigan), Lingjia Tang (University of Michigan), Jason Mars (University of Michigan)
Approxilyzer: Towards A Systematic Framework for Instruction-Level Approximate Computing and its Application to Hardware Resiliency, Radha Venkatagiri (University of Illinois at Urbana-Champaign), Abdulrahman Mahmoud (University of Illinois at Urbana-Champaign), Siva Kumar Sastry Hari (NVIDIA), Sarita V. Adve (University of Illinois at Urbana-Champaign)
The Bunker Cache for Spatio-Value Approximation, Joshua San Miguel (University of Toronto), Jorge Albericio (University of Toronto), Natalie Enright Jerger (University of Toronto), Aamer Jaleel (NVIDIA)
HARE: Hardware Accelerator for Regular Expressions, Vaibhav Gogte (University of Michigan), Aasheesh Kolli (University of Michigan), Michael J. Cafarella (University of Michigan), Loris D'Antoni (University of Wisconsin-Madison), Thomas F. Wenisch (University of Michigan)
The Microarchitecture of a Real-time Robot Motion Planning Accelerator, Sean Murray (Duke University), Will Floyd-Jones (Duke University), Ying Qi (Duke University), George Konidaris (Duke University), Daniel J. Sorin (Duke University)
Efficient Data Supply for Hardware Accelerators with Prefetching and Access/Execute Decoupling, Tao Chen (Cornell University), G. Edward Suh (Cornell University)
18:00-21:00 Banquet
Buses to the banquet venue will board in front of the Howard Hotel between 17:10 and 17:30. Return buses will depart from the banquet venue at 21:00 and arrive at the Howard Hotel at around 21:30.
October 19, 2016 (Wednesday)
Breakfast
Hall III Hall I
8:00-9:40 Session 6a: Accelerators 2
Chair: Ren-Shuo Liu
Session 6b: Mobile & Power Mgmt
Chair: Jaewoong Sim
An Ultra Low-Power Hardware Accelerator for Automatic Speech Recognition, Reza Yazdani (Universitat Politecnica de Catalunya), Albert Segura (Universitat Politecnica de Catalunya), Jose-Maria Arnau (Universitat Politecnica de Catalunya), Antonio Gonzalez (Universitat Politecnica de Catalunya)
Co-Designing Accelerators and SoC Interfaces using gem5-Aladdin, Yakun Sophia Shao (NVIDIA), Sam (Likun) Xi (Harvard University), Vijayalakshmi Srinivasan (IBM), Gu-Yeon Wei (Harvard University), David Brooks (Harvard University)
CHAINSAW: Von-Neumann Accelerators to Leverage Fused Instruction Chains, Amirali Sharifian (Simon Fraser University), Snehasish Kumar (Simon Fraser University), Apala Guha (Simon Fraser University), Arrvindh Shriraman (Simon Fraser University)
Chameleon: Versatile and Practical Near-DRAM Acceleration Architecture for Large Memory Systems, Hadi Asghari-Moghaddam (University of Illinois at Urbana-Champaign), Young Hoon Son (Seoul National University), Jung Ho Ahn (Seoul National University), Nam Sung Kim (University of Illinois at Urbana-Champaign)
A Patch Memory System For Image Processing and Computer Vision, Jason Clemons (NVIDIA), Chih-Chi Cheng (Qualcomm), Iuri Frosio (NVIDIA), Daniel Johnson (NVIDIA), Steve W. Keckler (NVIDIA)
Evaluating Programmable Architectures for Imaging and Vision Applications, Artem Vasilyev (Stanford University), Nikhil Bhagdikar (Stanford University), Ardavan Pedram (Stanford University and Movidius), Stephen Richardson (Stanford University), Shahar Kvatinsky (Technion), Mark Horowitz (Stanford University)
Redefining QoS and Customizing the Power Management Policy to Satisfy Individual Mobile Users, Kaige Yan (University of Houston), Xingyao Zhang (University of Houston), Jingweijia Tan (University of Houston), Xin Fu (University of Houston)
Snatch: Opportunistically Reassigning Power Allocation between Processor and Memory in 3D Stacks, Dimitrios Skarlatos (University of Illinois at Urbana-Champaign), Renji Thomas (Ohio State University), Aditya Agrawal (NVIDIA), Shibin Qin (University of Illinois at Urbana-Champaign), Robert Pilawa-Podgurski (University of Illinois at Urbana-Champaign), Ulya R. Karpuzcu (University of Minnesota, Twin Cities), Radu Teodorescu (Ohio State University), Nam Sung Kim (University of Illinois at Urbana-Champaign), Josep Torrellas (University of Illinois at Urbana-Champaign)
Ti-states: Processor Power Management in the Temperature Inversion Region, Yazhou Zu (University of Texas at Austin), Wei Huang (AMD), Indrani Paul (AMD), Vijay Janapa Reddi (University of Texas at Austin)
Hall I
10:00-12:00 Session 7: Best Paper Candidates
Chairs: Mikko Lipasti, Hsien-Hsin Lee
Graphicionado: A High-Performance and Energy-Efficient Accelerator for Graph Analytics, Tae Jun Ham (Princeton University), Lisa Wu (University of California, Berkeley), Narayanan Sundaram (Intel), Nadathur Satish (Intel), Margaret Martonosi (Princeton University)
Improving Bank-Level Parallelism for Irregular Applications, Xulong Tang (Pennsylvania State University, University Park), Mahmut Kandemir (Pennsylvania State University, University Park), Praveen Yedlapalli (VMware), Jagadish Kotra (Pennsylvania State University, University Park)
Delegated Persist Ordering, Aasheesh Kolli (University of Michigan), Jeff Rosen (Snowflake Computing), Stephan Diestelhorst (ARM), Ali Saidi (ARM), Steven Pelley (Snowflake Computing), Sihang Liu (University of Michigan), Peter M. Chen (University of Michigan), Thomas F. Wenisch (University of Michigan)
Spectral Profiling: Observer-Effect-Free Profiling by Monitoring EM Emanations, Nader Sehatbakhsh (Georgia Institute of Technology, Atlanta), Alireza Nazari (Georgia Institute of Technology, Atlanta), Alenka Zajic (Georgia Institute of Technology, Atlanta), Milos Prvulovic (Georgia Institute of Technology, Atlanta)
Path Confidence based Lookahead Prefetching, Jinchun Kim (Texas A&M University), Seth H. Pugsley (Intel), Paul V. Gratz (Texas A&M University), A. L. Narasimha Reddy (Texas A&M University), Chris Wilkerson (Intel), Zeshan Chishti (Intel)
Continuous Runahead: Transparent Hardware Acceleration for Memory Intensive Workloads, Milad Hashemi (The University of Texas at Austin), Onur Mutlu (ETH Zürich), Yale N. Patt (The University of Texas at Austin)
12:00-12:30 Session 8: Conference Closing and Best Paper Award
12:30-18:15 Conference Excursion (includes sack lunch)
Buses to the excursion site will board in front of the Howard Hotel at 12:30. Return buses will depart from the excursion site at 17:45 and arrive at the Howard Hotel at around 18:15.
