Welcome Message from the General Co-Chairs
Nikos Hardavellas (Northwestern University) and Simone Campanoni (Northwestern University)

Welcome Message from the Program Co-Chairs
Boris Grot (University of Edinburgh) and Ulya Karpuzcu (University of Minnesota, Twin Cities)

8:30 AM CDT – 9:30 AM CDT: Keynote I by Dave Ditzel (Esperanto Technologies)

The evolution of processor design in the last 50 years has taken a variety of paths, but the principles embodied in RISC design have been among the most widely used approaches over the last 25 years. This talk will make the case that general-purpose RISC processors are likely to stay in the mainstream for the foreseeable future, along with other key technology developments that will influence the design of processors of the future. Drawing on the speaker's active participation in the RISC revolution since the term was coined in 1980, this presentation will review some of the salient achievements in the evolution of RISC until today, and talk about where the future will lead us. Most notable is that until recently, much computing has been done on only a single or small number of processor cores, but we are entering a renaissance where access to thousands to millions of processor cores will be as common as a desktop computer is today. This talk will touch on some of the key technology advances that will enable our RISC-based future, and make some predictions for what we can expect in the next decade.

Dave Ditzel is the founder and CTO of Esperanto Technologies, Inc., a company founded in 2014 that builds energy-efficient processors for AI and beyond based on the RISC-V instruction set. Prior to Esperanto, Dave spent six years at Intel Corporation as vice-president of Hybrid Computing, leading a team building a high-performance out-of-order processor using binary translation to run legacy x86 or ARM applications with improved energy efficiency. In 2007, he founded ThruChip Communications to reduce I/O energy between die by using wireless inductive communication. In 1995, Dave founded Transmeta Corporation, which developed low-power x86-compatible processors using Code Morphing binary translation on top of an energy-efficient VLIW architecture. Dave spent 10 years at Sun Microsystems as CTO for the SPARC Technology Business and led the development of the 64-bit SPARC ISA and various SPARC processors. Prior to Sun, Dave spent 10 years at AT&T Bell Laboratories, where he worked on a series of RISC processors optimized for the C programming language. Dave Ditzel was a graduate student under U.C. Berkeley Professor David Patterson and in 1980 they co-authored “The Case for the Reduced Instruction Set Computer”, which catalyzed the movement to RISC processors.

Session Chairs: Ulya Karpuzcu (University of Minnesota, Twin Cities) & Boris Grot (University of Edinburgh)
Best Paper Nominee
Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction
Rahul Bera, Konstantinos Kanellopoulos (ETH Zürich); Shankar Balachandran (Intel); David Novo (LIRMM / University of Montpellier / CNRS); Ataberk Olgun, Mohammad Sadrosadati, Onur Mutlu (ETH Zürich)

Best Paper Nominee
Whisper: Profile-Guided Branch Misprediction Elimination for Data Center Applications
Tanvir Ahmed Khan, Muhammed Ugur (University of Michigan); Krishnendra Nathella, Dam Sunwoo (Arm Research); Heiner Litz (University of California, Santa Cruz); Daniel A. Jiménez (Texas A&M University); Baris Kasikci (University of Michigan)

Best Paper Nominee
OverGen: Improving FPGA Usability Through Domain-specific Overlay Generation
Sihao Liu, Jian Weng, Dylan Kupsh, Atefeh Sohrabizadeh, Zhengrong Wang, Licheng Guo (University of California, Los Angeles); Jiuyang Liu (Huazhong University of Science and Technology); Maxim Zhulin, Rishabh Mani (University of California, Los Angeles); Lucheng Zhang (Institute of Software, Chinese Academy of Sciences); Jason Cong, Tony Nowatzki (University of California, Los Angeles)

Best Paper Nominee
Cambricon-P: A Bitflow Architecture for Arbitrary Precision Computing
Yifan Hao, Yongwei Zhao, Chenxiao Liu, Shuyao Cheng, Xiaqing Li, Xing Hu, Zidong Du, Qi Guo (Institute of Computing Technology, Chinese Academy of Sciences); Zhiwei Xu (Institute of Computing Technology of the Chinese Academy of Sciences, China); Tianshi Chen (Chinese Academy of Sciences)

Session Chair: John Kim (KAIST)
Revisiting Residue Codes for Modern Memories
Evgeny Manzhosov, Adam Hastings, Meghna Pancholi, Ryan Piersma, Mohamed Tarek Ibn Ziad, Simha Sethumadhavan (Columbia University)

PageORAM: An Efficient DRAM Page Aware ORAM Strategy
Rachit Rajat, Yongqin Wang, Murali Annavaram (University of Southern California)

AQUA: Scalable Rowhammer Mitigation by Quarantining Aggressor Rows at Runtime
Anish Saxena, Gururaj Saileshwar (Georgia Institute of Technology); Prashant J. Nair (University of British Columbia); Moinuddin Qureshi (Georgia Institute of Technology)

CRONUS: Fault-Isolated, Secure and High-Performance Heterogeneous Computing for Trusted Execution Environments
Jianyu Jiang, Ji Qi, Tianxiang Shen, Xusheng Chen, Shixiong Zhao (University of Hong Kong); Sen Wang, Li Chen, Nicholas Zhang (Huawei Technologies); Xiapu Luo (Hong Kong Polytechnic University); Heming Cui (University of Hong Kong)
Session Chair: Gilles Pokam (Intel)
Reconstructing Out-of-Order Issue Queue
Ipoom Jeong, Jiwon Lee (Yonsei University); Myung Kuk Yoon (Ewha Womans University); Won Woo Ro (Yonsei University)

Speculative Code Compaction: Eliminating Dead Code via Speculative Microcode Transformations
Logan Moody, Wei Qi, Abdolrasoul Sharifi, Layne Berry, Joey Rudek (University of Virginia); Jayesh Gaur, Jeff Parkhurst, Sreenivas Subramoney (Intel); Kevin Skadron, Ashish Venkat (University of Virginia)

big.VLITTLE: On-Demand Data-Parallel Acceleration for Mobile Systems on Chip
Tuan Ta, Khalid Al-Hawaj, Nick Cebry, Yanghui Ou, Eric Hall, Courtney Golden, Christopher Batten (Cornell University)

Exploring Instruction Fusion Opportunities in General Purpose Processors
Sawan Singh (University of Murcia); Arthur Perais (Centre National de la Recherche Scientifique); Alexandra Jimborean, Alberto Ros (University of Murcia)

Session Chair: Hyeran Jeon (University of California, Merced)
DTexL: Decoupled Raster Pipeline for Texture Locality
Diya Joseph (Universitat Politècnica de Catalunya / Barcelona Supercomputing Center); Juan L. Aragón (University of Murcia); Joan-Manuel Parcerisa, Antonio González (Universitat Politècnica de Catalunya / Barcelona Supercomputing Center)

Morpheus: Extending the Last Level Cache in GPU Systems with Idle GPU Cores' Resources
Sina Darabi (Sharif University of Technology); Mohammad Sadrosadati (ETH Zürich); Negar Akbarzadeh (Sharif University of Technology); Joël Lindegger (ETH Zürich); S. Mohammad Hosseini (Sharif University of Technology); Jisung Park (POSTECH / ETH Zürich); Juan Gómez Luna, Onur Mutlu (ETH Zürich); Hamid Sarbazi-Azad (Sharif University of Technology / IPM)

Featherweight Soft Error Resilience for GPUs
Yida Zhang, Changhee Jung (Purdue University)

Vulkan-Sim: A GPU Architecture Simulator for Ray Tracing
Mohammadreza Saed, Yuan Hsi Chou, Lufei Liu (University of British Columbia); Tyler Nowicki (Huawei Technologies); Tor Aamodt (University of British Columbia)
Session Chair: Magnus Jahre (Norwegian University of Science and Technology)
Pushing Point Cloud Compression to Edge
Ziyu Ying, Shulin Zhao, Sandeepa Bhuyan, Cyan Subhra Mishra, Mahmut Taylan Kandemir, Chita Das (Pennsylvania State University)

Automatic Domain-Specific SoC Design for Autonomous Unmanned Aerial Vehicles
Srivatsan Krishnan (Harvard University); Zishen Wan (Georgia Institute of Technology); Kshitij Bhardwaj (LLNL); Paul Whatmough (Arm / Harvard University); Aleksandra Faust (Google Research); Sabrina M. Neuman (Harvard University / Massachusetts Institute of Technology); Gu-Yeon Wei, David Brooks, Vijay Janapa Reddi (Harvard University)

A Hardware-Software Interface for Charge Management in Energy-Harvesting Systems
Emily Ruppel, Milijana Surbatovich, Harsh Desai (Carnegie Mellon University); Kiwan Maeng (Facebook); Brandon Lucia (Carnegie Mellon University)

ROG: A High Performance and Robust Distributed Training System for Robotic IoT
Xiuxian Guan, Zekai Sun, Shengliang Deng, Xusheng Chen, Shixiong Zhao, Zongyuan Zhang, Tianyang Duan, Yuexuan Wang, Chenshu Wu (University of Hong Kong); Yong Cui (Tsinghua University); Libo Zhang, Yanjun Wu (Institute of Software, Chinese Academy of Sciences); Rui Wang (Southern University of Science and Technology); Heming Cui (University of Hong Kong)

Session Chair: Hung-Wei Tseng (University of California, Riverside)
ASSASIN: Architecture Support for Stream Computing to Accelerate Computational Storage
Chen Zou (University of Chicago); Andrew A. Chien (University of Chicago / Argonne National Laboratory)

DaxVM: Stressing the Limits of Memory as a File Interface
Chloe Alverti (National Technical University of Athens); Vasileios Karakostas (University of Athens); Nikhita Kunati (NVIDIA); Georgios Goumas (National Technical University of Athens); Michael Swift (University of Wisconsin-Madison)

Networked SSD: Flash Memory Interconnection Network for High-Bandwidth SSD
Jiho Kim (KAIST); Seokwon Kang (Hanyang University); Yongjun Park (Yonsei University); John Kim (KAIST)

Designing Virtual Memory System of MCM GPUs
Neha Jawalkar, Pratheek B., Arkaprava Basu (Indian Institute of Science-Bangalore)
Session Chair: Jian Huang (University of Illinois Urbana-Champaign)
Altocumulus: Scalable Scheduling for Nanosecond-Scale Remote Procedure Calls
Jiechen Zhao, Iris Uwizeyimana, Karthik Ganesan, Mark C. Jeffrey, Natalie Enright Jerger (University of Toronto)

SIMR: Single Instruction Multiple Request Processing for Energy-Efficient Data Center Microservices
Mahmoud Khairy, Ahmad Alawneh, Aaron Barnes, Timothy G. Rogers (Purdue University)

Patching Up Network Data Leaks with Sweeper
Marina Vemmou, Albert Y. Cho, Alexandros Daglis (Georgia Institute of Technology)

IDIO: Network-Driven, Inbound Network Data Orchestration on Server Processors
Mohammad Alian (University of Kansas); Siddharth Agarwal (University of Illinois Urbana-Champaign); Jongmin Shin (DGIST); Neel Patel (University of Kansas); Yifan Yuan (University of Illinois Urbana-Champaign); Daehoon Kim (DGIST); Ren Wang (Intel); Nam Sung Kim (University of Illinois Urbana-Champaign)

Session Chair: Changhee Jung (Purdue University)
Treebeard: An Optimizing Compiler for Decision Tree Based ML Inference
Ashwin Prasad (Indian Institute of Science-Bangalore); Sampath Rajendra, Kaushik Rajan (Microsoft Research); R. Govindarajan (Indian Institute of Science-Bangalore); Uday Bondhugula (Indian Institute of Science-Bangalore / PolyMage Labs)

GCD2: A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs
Wei Niu, Jiexiong Guan (William & Mary); Xipeng Shen (North Carolina State University / Facebook); Yanzhi Wang (Northeastern University); Gagan Agrawal (Augusta University); Bin Ren (William & Mary)

OCOLOS: Online COde Layout OptimizationS
Yuxuan Zhang (University of Pennsylvania); Tanvir Ahmed Khan (University of Michigan); Gilles A. Pokam (Intel); Baris Kasikci (University of Michigan); Heiner Litz (University of California, Santa Cruz); Joseph Devietti (University of Pennsylvania)

RipTide: A Programmable, Energy-Minimal Dataflow Compiler and Architecture
Graham Gobieski, Souradip Ghosh, Marijn Heule, Todd C. Mowry (Carnegie Mellon University); Tony Nowatzki (University of California, Los Angeles); Nathan Beckmann, Brandon Lucia (Carnegie Mellon University)
Session Chair: Yingyan (Celine) Lin (Georgia Institute of Technology)
Skipper: Enabling Efficient SNN Training Through Activation-Checkpointing and Time-Skipping
Sonali Singh, Anup Sarma, Sen Lu, Abhronil Sengupta, Mahmut Taylan Kandemir (Pennsylvania State University); Emre Neftci (RWTH Aachen); Vijaykrishnan Narayanan, Chita Das (Pennsylvania State University)

Going Further With Winograd Convolutions: Tap-Wise Quantization for Efficient Inference on 4x4 Tiles
Renzo Andri (Huawei Technologies); Beatrice Bussolino (Huawei Technologies / Politecnico di Torino); Antonio Cipolletta, Lukas Cavigelli, Zhe Wang (Huawei Technologies)

Adaptable Butterfly Accelerator for Attention-Based NNs via Hardware and Algorithm Co-Design
Hongxiang Fan (Imperial College London); Thomas Chau, Stylianos Venieris (Samsung AI Center Cambridge); Royson Lee (University of Cambridge); Alexandros Kouris (Samsung AI Center Cambridge / Imperial College London); Wayne Luk (Imperial College London); Nicholas D. Lane (Samsung AI Center Cambridge / University of Cambridge); Mohamed Abdelfattah (Cornell University)

DFX: A Low-Latency Multi-FPGA Appliance for Accelerating Transformer-Based Text Generation
Seongmin Hong, Seungjae Moon, Junsoo Kim (KAIST); Sungjae Lee, Minsub Kim, Dongsoo Lee (NAVER); Joo-Young Kim (KAIST)

HARMONY: Heterogeneity-Aware Hierarchical Management for Federated Learning System
Chunlin Tian, Li Li (University of Macau); Zhan Shi (University of Texas at Austin); Jun Wang (Futurewei Technology); ChengZhong Xu (University of Macau)

8:30 AM CDT – 9:30 AM CDT: Keynote II by Jason Cong (UCLA)

As we enter the era of customized computing, where customized domain-specific accelerators (DSAs) are used extensively for performance and energy efficiency, ideally we would like to enable every programmer to offload the compute-intensive portion of their program to one or a set of DSAs, either pre-implemented in ASICs or synthesized on demand on programmable fabrics, such as FPGAs. High-level synthesis (HLS) has made important progress in this direction, but it still requires the programmer to provide various pragmas, such as loop unrolling, pipelining, and tiling, to define the microarchitecture of the accelerator, which is a challenging task for most software programmers. In this talk, we present our latest research on automated accelerator synthesis and customized computing on FPGAs, ranging from microarchitecture-guided optimization, such as automated generation of highly optimized systolic arrays and stencil computation engines, to more general source-to-source transformation based on graph-based neural networks and meta learning, and finally to latency-insensitive system-level integration.

Jason Cong is the Volgenau Chair for Engineering Excellence Professor (and former Department Chair) at the UCLA Computer Science Department, with a joint appointment from the Electrical Engineering Department, the director of the Center for Domain-Specific Computing (CDSC), and the director of the VLSI Architecture, Synthesis, and Technology (VAST) Laboratory. Dr. Cong’s research interests include novel architectures and compilation for customizable computing, synthesis of VLSI circuits and systems, and highly scalable algorithms. He has close to 500 publications in these areas, including 16 best paper awards, three 10-Year Most Influential Paper Awards, and three papers inducted into the FPGA and Reconfigurable Computing Hall of Fame. He and his former students co-founded AutoESL, which developed the most widely used high-level synthesis tool for FPGAs (renamed to Vivado HLS after Xilinx’s acquisition). He was elected an IEEE Fellow in 2000 and an ACM Fellow in 2008, and was elected to the National Academy of Engineering in 2017 and the National Academy of Inventors in 2020. He is the recipient of the 2022 IEEE Robert Noyce Medal for fundamental contributions to electronic design automation and FPGA design methods.

Session Chair: Rujia Wang (Illinois Institute of Technology)
Leaky Way: A Conflict-Based Cache Covert Channel Bypassing Set Associativity
Yanan Guo, Xin Xin, Youtao Zhang, Jun Yang (University of Pittsburgh)

SwiftDir: Secure Cache Coherence Without Overprotection
Chenlu Miao, Kai Bu, Mengming Li (Zhejiang University); Shaowu Mao, Jianwei Jia (Huawei Technologies)

Self-Reinforcing Memoization for Cryptography Calculations in Secure Memory Systems
Xin Wang, Daulet Talapkaliyev, Matthew Hicks, Xun Jian (Virginia Tech)

Eager Memory Cryptography in Caches
Xin Wang (Virginia Tech); Jagadish Kotra (AMD Research); Xun Jian (Virginia Tech)
Session Chair: Elaheh Sadredini (University of California, Riverside)
GenPIP: In-Memory Acceleration of Genome Analysis by Tight Integration of Basecalling and Read Mapping
Haiyu Mao, Mohammed Alser, Mohammad Sadrosadati, Can Firtina, Akanksha Baranwal (ETH Zürich); Damla Senol Cali (Bionano Genomics); Aditya Manglik, Nour Almadhoun Alserr, Onur Mutlu (ETH Zürich)

BEACON: Scalable Near-Data-Processing Accelerators for Genome Analysis near Memory Pool with the CXL Support
Wenqin Huangfu (University of California, Santa Barbara); Krishna T. Malladi, Andrew Chang (Samsung); Yuan Xie (University of California, Santa Barbara)

Sparse Attention Acceleration with Synergistic In-Memory Pruning and On-Chip Recomputation
Amir Yazdanbakhsh (Google Research); Ashkan Moradifirouzabadi, Zheng Li, Mingu Kang (University of California, San Diego)

ICE: An Intelligent Cognition Engine with 3D NAND-based In-Memory Computing for Vector Similarity Search Acceleration
Han-Wen Hu (Macronix International / National Tsing Hua University); Wei-Chen Wang (Macronix International / National Taiwan University); Yung-Chun Lee, Bo-Rong Lin, Huai-Mu Wang, Chong-Ying Lee, Yu-Ming Huang, Yen-Po Lin, Tzu-Hsiang Su, Chih-Chang Hsieh, Chia-Ming Hu, Yi-Ting Lai, Chung-Kuang Chen (Macronix International); Yuan-Hao Chang (Academia Sinica); Han-Sung Chen, Hsiang-Pang Li (Macronix International); Tei-Wei Kuo (City University of Hong Kong / National Taiwan University); Keh-Chung Wang, Meng-Fan Chang, Chun-Hsiung Hung, Chih-Yuan Lu (Macronix International)

Session Chair: Saugata Ghose (University of Illinois Urbana-Champaign)
CORUSCANT: Fast Efficient Processing-in-Racetrack Memories
Sebastien Ollivier, Stephen Longofono (University of Pittsburgh); Prayash Dutta (University of South Florida); Jingtong Hu (University of Pittsburgh); Sanjukta Bhanja (University of South Florida); Alex K. Jones (University of Pittsburgh)

IDLD: Instantaneous Detection of Leakage and Duplication of Identifiers Used for Register Renaming
Yiannakis Sazeides (University of Cyprus); Alex Gerber (Google); Ron Gabor (NVIDIA); Arkady Bramnik (Intel); George Papadimitriou, Dimitris Gizopoulos (University of Athens); Chrysostomos Nicopoulos (University of Cyprus); Giorgos Dimitrakopoulos, Karyofyllis Patsidis (Democritus University of Thrace)

HiRA: Hidden Row Activation for Reducing Refresh Latency of Off-the-Shelf DRAM Chips
Abdullah Giray Yaglikci (ETH Zürich); Ataberk Olgun (TOBB University of Economics and Technology); Lois Orosa, Minesh Patel, Haocong Luo, Hasan Hassan (ETH Zürich); Oguz Ergin (TOBB University of Economics and Technology); Onur Mutlu (ETH Zürich)
Session Chair: Karthik Swaminathan (IBM)
AgileWatts: An Energy-Efficient CPU Core Idle-State Architecture for Latency-Sensitive Server Applications
Jawad Haj-Yahya (Rivos); Haris Volos (University of Cyprus); Davide B. Bartolini (Huawei Technologies); Georgia Antoniou (University of Cyprus); Jeremie Kim (ETH Zürich); Wang Zhe, Kleovoulos Kalaitzidis, Tom Rollet, Chen Zhirui, Ye Geng (Huawei Technologies); Onur Mutlu (ETH Zürich); Yanos Sazeides (University of Cyprus)

AgilePkgC: An Agile System Idle State Architecture for Energy Proportional Datacenter Servers
Georgia Antoniou, Haris Volos (University of Cyprus); Davide B. Bartolini, Tom Rollet (Huawei); Yanos Sazeides (University of Cyprus); Jawad Haj-Yahya (Rivos)

Realizing Emotional Interactions to Learn User Experience and Guide Energy Optimization for Mobile Architectures
Xueliang Li (Research Institute of Trustworthy Autonomous Systems, Southern University of Science and Technology)

What are the computer architecture and systems challenges, and what can/should the research community do about it? Global concern about climate change and growing challenges with resource depletion and environmental damage make sustainability a core societal challenge, engaging citizens, governments, and corporations around the world. Computing has become ubiquitous, with digital electronics increasingly a part of manufactured devices from key fobs to automobiles to buildings to cloud datacenters, and our research community and industry have thrived as a result. As a consequence, computing's negative environmental impact is large and growing fast.

  • What are the computer architecture and systems challenges?
  • What can/should the research community do about it?

Specific topics will include reducing scope 2 (operational) and scope 3 (embodied, downstream) emissions, accelerators, FPGAs, dark silicon, lifetime, datacenter design, and e-waste.

The panel will raise questions, provoke interesting research directions, and highlight industry, government, and funding opportunities.


  • Andrew A. Chien (Organizer) is Eckhardt Professor at the University of Chicago, directs the CERES Center for Unstoppable Computing, and serves on the National Science Foundation CISE Advisory Committee. Dr. Chien has led the Zero-carbon Cloud project since 2015, pioneering synergies between cloud computing and the renewable power grid that have led to commercial startups and new grid market products. From 2017 to 2022, he served as the Editor-in-Chief of Communications of the ACM. Dr. Chien was VP of Research at Intel Corporation from 2005 to 2010, SAIC Chair Professor at UCSD, and also a Professor at the University of Illinois. Dr. Chien is an ACM, IEEE, and AAAS Fellow, and earned his PhD, MS, and BS from the Massachusetts Institute of Technology.
  • Alex K. Jones is a Professor of Electrical and Computer Engineering and Computer Science (by courtesy) at the University of Pittsburgh. He is currently on leave from Pitt to serve as a Program Director in the CNS Division of CISE at the US NSF. Dr. Jones has been active in sustainable computing research for more than a decade, with a focus on reducing the carbon impact of computing architectures throughout the entire system lifecycle. Toward this end, his group has released the open-source GreenChip tool. He is currently the steering committee chair for the IEEE International Green and Sustainable Computing Conference (IGSC), which is now in its 13th year. He is an associate editor of the SUSCOM journal and did a rotation as AE for IEEE Transactions on Sustainable Computing. Beyond sustainable computing, his interests are broadly in the area of computer architecture and include processing-in-memory, reliability and fault tolerance from terrestrial to space systems, and quantum computing. His research is funded by NSF, DARPA, NSA, and industry.
  • Brandon Lucia is a Professor in the Electrical and Computer Engineering Department at Carnegie Mellon University. His lab's research encompasses programming languages, software systems, and computer architecture, especially with applications to highly physically constrained systems. His work developed the practical and theoretical foundations of hardware and software support for intermittent computing on energy-harvesting computing devices, energy-minimal reconfigurable dataflow architectures, and space-based computer systems, including constellations of nanosatellites. His lab, with collaborators, has architected and taped out several chips that define the state of the art in low-power, energy-efficient general-purpose computation, and his team has launched several intermittent computer systems to low Earth orbit. His research group's work has received several Best Paper (or equivalent) Awards, and he has received the Sloan Foundation Fellowship, the IEEE Technical Committee on Computer Architecture Young Computer Architect Award, and a number of other awards.

Session Chair: Vivek Seshadri (Microsoft Research)
FracDRAM: Fractional Values in Off-the-Shelf DRAM
Fei Gao, Georgios Tziantzioulis, David Wentzlaff (Princeton University)

pLUTo: Enabling Massively Parallel Computation in DRAM via Lookup Tables
João Dinis Ferreira (ETH Zürich); Gabriel Falcao (Instituto de Telecomunicações / University of Coimbra); Juan Gómez-Luna, Mohammed Alser, Geraldo F. Oliveira, Jeremie Kim, Mohammad Sadrosadati, Lois Orosa (ETH Zürich); Taha Shahroodi (TU Delft); Anant Nori (Intel); Onur Mutlu (ETH Zürich)

Multi-Layer In-Memory Processing
Daichi Fujiki (Keio University); Alireza Khadem (University of Michigan); Scott Mahlke (University of Michigan / NVIDIA Research); Reetuparna Das (University of Michigan)

Flash-Cosmos: In-Flash Bulk Bitwise Operations Using Inherent Computation Capability of NAND Flash Memory
Jisung Park (POSTECH / ETH Zürich); Roknoddin Azizi, Geraldo F. Oliveira, Mohammad Sadrosadati, Rakesh Nadig (ETH Zürich); David Novo (LIRMM / Université Montpellier / CNRS); Juan Gómez-Luna (ETH Zürich); Myungsuk Kim (Kyungpook National University); Onur Mutlu (ETH Zürich)
Session Chair: Brad Beckmann (AMD)
Page Size Aware Cache Prefetching
Georgios Vavouliotis (Universitat Politècnica de Catalunya / Barcelona Supercomputing Center); Gino Chacon (Texas A&M University); Lluc Alvarez (Universitat Politècnica de Catalunya / Barcelona Supercomputing Center); Paul Gratz, Daniel A. Jiménez (Texas A&M University); Marc Casas (Universitat Politècnica de Catalunya / Barcelona Supercomputing Center)

Berti: An Accurate Local-Delta Data Prefetcher
Agustín Navarro-Torres (Universidad de Zaragoza); Biswabandan Panda (Indian Institute of Technology Bombay); Jesús Alastruey-Benedé, Pablo Ibáñez, Víctor Viñals Yúfera (Universidad de Zaragoza); Alberto Ros (University of Murcia)

Translation-Optimized Memory Compression for Capacity
Gagandeep Panwar, Muhammad Laghari, David Bears, Yuqing Liu, Chandler Jearls (Virginia Tech); Esha Choukse (Microsoft Research); Kirk W. Cameron, Ali Butt, Xun Jian (Virginia Tech)

Merging Similar Patterns for Hardware Prefetching
Shizhi Jiang, Qiusong Yang, Yiwei Ci (Institute of Software, Chinese Academy of Sciences)

Session Chair: Gokul Subramanian Ravi (University of Chicago)
AutoComm: A Framework for Enabling Efficient Communication in Distributed Quantum Programs
Anbang Wu, Hezi Zhang, Gushu Li (University of California, Santa Barbara); Alireza Shabani (Cisco Research); Yuan Xie, Yufei Ding (University of California, Santa Barbara)

Let Each Quantum Bit Choose Its Basis Gates
Sophia Fuhui Lin (University of Chicago); Sara Sussman (Princeton University); Casey Duckering (University of Chicago); Pranav S. Mundada (Princeton University); Jonathan M. Baker, Rohan S. Kumar (University of Chicago); Andrew A. Houck (Princeton University); Frederic T. Chong (University of Chicago)

COMPAQT: Compressed Waveform Memory Architecture for Scalable Qubit Control
Satvik Maurya, Swamit Tannu (University of Wisconsin-Madison)

Qubit Mapping and Routing via MaxSAT
Abtin Molavi, Amanda Xu, Martin Diges, Lauren Pick, Swamit Tannu, Aws Albarghouthi (University of Wisconsin-Madison)

Scaling Superconducting Quantum Computers with Chiplet Architectures
Kaitlin Smith, Gokul Subramanian Ravi, Jonathan Baker, Fred Chong (University of Chicago)

Q3DE: A Fault-Tolerant Quantum Computer Architecture for Multi-Bit Burst Errors by Cosmic Rays
Yasunari Suzuki (NTT Computer and Data Science Laboratories); Takanori Sugiyama (University of Tokyo); Tomochika Arai (NTT Computer and Data Science Laboratories); Wang Liao (University of Tokyo); Koji Inoue, Teruo Tanimoto (Kyushu University)
PIMMiner: A High-Performance PIM Architecture-Aware Graph Mining Framework
Jiya Su (Illinois Institute of Technology)

HaX-CoNN: Heterogeneity-Aware Execution of Concurrent Deep Neural Networks
Ismet Dagli (Colorado School of Mines)

A Scalable Distributed-Data Architecture for Memory-Bound Applications
Marcelo Orenes-Vera (Princeton University)

Architectural Implications of Google's Data Center Applications
Kan Zhu (University of Michigan)

MIDAS: Multi-Candidate Evaluation with Delayed Selection for Reducing SWAPs in NISQ Programs
Suhas Vittal (Georgia Institute of Technology)

Introducing Payload Awareness to Improve Data Center Efficiency
Hilbert Chen (University of Michigan)

8:30 AM CDT – 9:30 AM CDT: Keynote III by Krysta M. Svore (Microsoft)

While quantum computing promises to help solve some of the great challenges ahead, we are still in the early days of what will be possible. Today’s quantum computers enable exciting research and early development; however, their small scale often limits what’s possible and leaves an eagerness to do more. Quantum at scale requires three foundational elements: an industrial-scale quantum machine, the power of the cloud, and an ecosystem of innovators. Where does it all come together? Azure Quantum, Microsoft's platform for quantum innovation and exploration. Learn how Microsoft is architecting the scalable quantum machine and empowering innovators with quantum at scale by co-designing tools to optimize quantum solutions, to run small instances on today’s diverse and maturing quantum hardware, and to prepare for tomorrow’s scaled quantum compute.

Dr. Krysta Svore is a Distinguished Engineer and VP of Quantum Software at Microsoft. She is passionate about empowering people and organizations around the world with quantum computing and realizing a scaled quantum machine. Her team designs and delivers Azure Quantum, the most diverse cloud platform for quantum research and discovery, and is developing a comprehensive software stack for scalable quantum computing including languages, compilers, and mappings to quantum hardware. Her team designs open software including Q# and QIR. Dr. Svore has published over 70 refereed articles and filed over 30 patents. She is a fellow of the American Association for the Advancement of Science and of the Washington State Academy of Sciences. She won the 2010 Yahoo! Learning to Rank Challenge with a team of colleagues, received an ACM Best of 2013 Notable Article award, and was recognized as one of Business Insider's Most Powerful Female Engineers of 2018. A Kavli fellow of the National Academy of Sciences, she also serves as an advisor to the National Quantum Initiative, the Advanced Scientific Computing Advisory Committee of the Department of Energy, and the ISAT Committee of DARPA, in addition to numerous other quantum centers and initiatives globally.

Session Chair: Freddy Gabbay (Ruppin Academic Center)
RemembERR: Leveraging Microprocessor Errata for Improving Design Testing and Validation
Flavien Solt, Patrick Jattke, Kaveh Razavi (ETH Zürich)

Datamime: Generating Representative Benchmarks by Automatically Synthesizing Datasets
Hyun Ryong Lee, Daniel Sanchez (Massachusetts Institute of Technology)

An Architecture Interface and Offload Model for Low-Overhead, Near-Data, Distributed Accelerators
Saambhavi Vajjiravelu Baskaran, Jack Sampson, Mahmut Taylan Kandemir (Pennsylvania State University)

Towards Developing High Performance RISC-V Processors Using Agile Methodology
Yinan Xu, Zihao Yu, Dan Tang, Guokai Chen, Lingrui Gou, Yue Jin, Qianruo Li, Xin Li, Jiawei Lin, Tong Liu, Zhigang Liu (Institute of Computing Technology, Chinese Academy of Sciences); Jiazhan Tan (Peking University); Huaqiang Wang, Huizhe Wang, Kaifan Wang, Chuanqi Zhang (Institute of Computing Technology, Chinese Academy of Sciences); Fawang Zhang (Shenzhen University); Linjuan Zhang, Zifei Zhang, Yaoyang Zhou (Institute of Computing Technology, Chinese Academy of Sciences); Yike Zhou (Nanjing University); Jiangrui Zou, Ye Cai (Shenzhen University); Dandan Huan, Zusong Li, Jiye Zhao (Beijing VCore Technology); Zihao Chen, Wei He, Qiyuan Quan (Peng Cheng Laboratory); Sa Wang (Institute of Computing Technology, Chinese Academy of Sciences); Kan Shi, Ninghui Sun, Yungang Bao (Institute of Computing Technology, Chinese Academy of Sciences)
Session Chair: Fan Yao (University of Central Florida)
DiVa: An Accelerator for Differentially Private Machine Learning
Beomsik Park, Ranggi Hwang, Dongho Yoon, Yoonhyuk Choi, Minsoo Rhu (KAIST)

EVAX: Towards a Practical, Pro-active and Adaptive Architecture for High Performance and Security
Samira Mirbagher Ajorpaz, Daniel Moghimi, Jeffrey Collins (University of California, San Diego); Nael Abu-Ghazaleh (University of California, Riverside); Gilles A. Pokam (Intel); Dean Tullsen (University of California, San Diego)

ARK: Fully Homomorphic Encryption Accelerator with Runtime Data Generation and Inter-Operation Key Reuse
Jongmin Kim, Gwangho Lee, Sangpyo Kim, Gina Sohn (Seoul National University); John Kim, Minsoo Rhu (KAIST); Jung Ho Ahn (Seoul National University)

Horus: Persistent Security for Extended Persistence-Domain Memory Systems
Xijing Han, James Tuck, Amro Awad (North Carolina State University)

Session Chair: Josep Torrellas (University of Illinois Urbana-Champaign)
Mint: An Accelerator for Mining Temporal Motifs
Nishil Talati, Haojie Ye (University of Michigan); Sanketh Vedula (Technion); Kuan-Yu Chen, Yuhan Chen, Daniel Liu, Yichao Yuan, David Blaauw (University of Michigan); Alex Bronstein (Technion); Trevor Mudge, Ronald Dreslinski (University of Michigan)

DPU-v2: Energy-Efficient Execution of Irregular Directed Acyclic Graphs
Nimish Shah, Wannes Meert, Marian Verhelst (KU Leuven)

XPGraph: XPline-Friendly Persistent Memory Graph Stores for Large-Scale Evolving Graphs
Rui Wang, Shuibing He, Weixu Zong (Zhejiang University); Yongkun Li, Yinlong Xu (University of Science and Technology of China)

A Data-Centric Accelerator for High-Performance Hypergraph Processing
Qinggang Wang, Long Zheng, Ao Hu, Yu Huang, Pengcheng Yao, Chuangyi Gui, Xiaofei Liao, Hai Jin (Huazhong University of Science and Technology); Jingling Xue (UNSW Sydney)

ReGraph: Scaling Graph Processing on HBM-Enabled FPGAs with Heterogeneous Pipelines
Xinyu Chen (National University of Singapore); Yao Chen (Advanced Digital Sciences Center, Singapore); Feng Cheng (City University of Hong Kong); Hongshi Tan, Bingsheng He, Weng-fai Wong (National University of Singapore)
Session Chair: Po-An Tsai (NVIDIA Research)
3D-FPIM: An Extreme Energy-Efficient DNN Acceleration System Using 3D NAND Flash-Based In-Situ PIM Unit
Hunjun Lee, Minseop Kim, Dongmoon Min (Seoul National University); Joonsung Kim (Google); Jongwon Back, Honam Yoo, Jongho Lee, Jangwoo Kim (Seoul National University)

Sparseloop: An Analytical Approach to Sparse Tensor Accelerator Modeling
Yannan Nellie Wu (Massachusetts Institute of Technology); Po-An Tsai, Angshuman Parashar (NVIDIA); Vivienne Sze (Massachusetts Institute of Technology); Joel Emer (Massachusetts Institute of Technology / NVIDIA)

DeepBurning-SEG: Generating DNN Accelerators of Segment-Grained Pipeline Architecture
Xuyi Cai, Ying Wang, Xiaohan Ma, Yinhe Han, Lei Zhang (Institute of Computing Technology, Chinese Academy of Sciences)

ANT: Exploiting Adaptive Numerical Data Type for Low-Bit Deep Neural Network Quantization
Cong Guo (Shanghai Jiao Tong University); Chen Zhang (Microsoft Research); Jingwen Leng, Zihan Liu (Shanghai Jiao Tong University); Fan Yang (Microsoft Research); Yunxin Liu (Institute for AI Industry Research, Tsinghua University); Minyi Guo (Shanghai Jiao Tong University); Yuhao Zhu (University of Rochester)

Ristretto: An Atomized Processing Architecture for Sparsity-Condensed Stream Flow in CNN
Gang Li (Shanghai Jiao Tong University); Weixiang Xu (Institute of Automation, Chinese Academy of Sciences); Zhuoran Song, Naifeng Jing (Shanghai Jiao Tong University); Jian Cheng (Institute of Automation, Chinese Academy of Sciences); Xiaoyao Liang (Shanghai Jiao Tong University)

