9th Computing Systems Research Day - 7 January 2025
Schedule
-
11:45-12:00 | Welcome
-
Abstract
Sustainability and climate change are major challenges for our generation. In this talk I will argue that sustainable development requires a holistic approach and involves multi-perspective thinking. Applied to computing, sustainable development means that we need to consider the entire environmental impact of computing, including raw material extraction, component manufacturing, product assembly, transportation, use, repair/maintenance, and end-of-life processing (disassembly and recycling/reuse). Analyzing current trends reveals that the embodied footprint is, or will soon be, more significant than the operational footprint. I will present a simple, yet insightful, first-order model to assess and reason about the sustainability of processors in light of the inherent data uncertainty. Applying the model to a variety of case studies illustrates what computer architects and engineers can and should do to better understand the sustainability impact of computing, and to design sustainable processors.
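The split between embodied and operational footprint that the abstract describes can be captured in a few lines. The sketch below is illustrative only (the function name and all numbers are made up, not the speaker's actual model): total lifetime footprint is the embodied carbon from manufacturing plus the use-phase energy times the carbon intensity of the grid.

```python
def total_footprint_kgco2(embodied_kgco2, avg_power_w, lifetime_hours,
                          grid_intensity_kgco2_per_kwh):
    """Lifetime carbon footprint: embodied plus operational (use-phase)."""
    energy_kwh = avg_power_w * lifetime_hours / 1000.0
    operational_kgco2 = energy_kwh * grid_intensity_kgco2_per_kwh
    return embodied_kgco2 + operational_kgco2

# Illustrative, made-up numbers: 30 kg CO2e embodied footprint, 10 W average
# power, a 4-year lifetime, and a grid at 0.3 kg CO2e/kWh.
lifetime_hours = 4 * 365 * 24
total = total_footprint_kgco2(30.0, 10.0, lifetime_hours, 0.3)
print(round(total, 1), "kg CO2e, embodied share:", round(30.0 / total, 2))
```

Even in this toy setting, the embodied term is a fixed cost that dominates as devices get more efficient or are retired early, which is the trend the abstract points to.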
Bio
Lieven Eeckhout is a Professor at Ghent University, Belgium. His research interests include computer architecture and the hardware/software interface, with specific emphasis on performance evaluation and modeling, dynamic resource management, CPU/GPU microarchitecture, and sustainability. He is the recipient of the 2017 ACM SIGARCH Maurice Wilkes Award and the 2017 OOPSLA Most Influential Paper Award, and he was elevated to IEEE Fellow in 2018 and ACM Fellow in 2021. Other awards include seven IEEE Micro Top Picks selections, three Best of CAL selections, and the MICRO 2023 and ISPASS 2013 Best Paper Awards. He served as the Program Chair for ISCA 2020, HPCA 2015, CGO 2013 and ISPASS 2009, and as General Chair for ASPLOS 2025, IISWC 2023 and ISPASS 2010.
-
13:00-14:00 | Lunch Break
-
Abstract
Resource elasticity is one of the key defining characteristics of the Function-as-a-Service (FaaS) serverless computing paradigm. While compute resources assigned to VM-sandboxed functions can be seamlessly adjusted on the fly, memory elasticity remains challenging. Hot(un)plugging memory resources suffers from long reclamation latencies and occupies valuable CPU resources. In this talk, we identify the obliviousness of the OS memory manager to the hotplugged memory as the key issue hindering hot-unplug performance, and introduce a novel approach for fast and efficient VM memory hot(un)plug, targeting VM-sandboxed serverless functions. We will discuss how segregating hotplugged memory regions allows us to bound allocation lifetimes, solving the long reclamation latencies that plague current systems. We will demonstrate how this new approach, implemented as an extension to the Linux memory manager, achieves sub-second reclamation of multi-GiB memory blocks — an order-of-magnitude improvement over the state-of-the-art — without sacrificing performance under realistic FaaS loads.
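The key idea in the abstract, that segregating hotplugged memory bounds allocation lifetimes and thus makes unplug fast, can be illustrated with a toy model (this is an analogy in user-space Python, not the actual Linux extension): if a hotplugged region only ever holds allocations with bounded lifetimes, the region is guaranteed to drain by a known time and can then be offlined, instead of waiting on arbitrary long-lived kernel objects.

```python
class HotplugRegion:
    """Toy model of a hotplugged memory region with bounded-lifetime allocations."""

    def __init__(self, num_frames):
        self.free = set(range(num_frames))
        self.in_use = {}  # frame -> expiry time (the bounded lifetime)

    def alloc(self, now, max_lifetime):
        frame = self.free.pop()
        self.in_use[frame] = now + max_lifetime
        return frame

    def tick(self, now):
        # Reclaim every frame whose bounded lifetime has expired.
        for frame in [f for f, t in self.in_use.items() if t <= now]:
            del self.in_use[frame]
            self.free.add(frame)

    def can_unplug(self):
        # The region can be offlined once no allocation remains.
        return not self.in_use

region = HotplugRegion(4)
region.alloc(now=0, max_lifetime=2)
region.alloc(now=0, max_lifetime=3)
region.tick(now=3)           # by t=3 every allocation has expired
print(region.can_unplug())   # the whole region is now safe to offline
```

Without the segregation, a single long-lived allocation landing in the region would block the unplug indefinitely, which is the reclamation-latency problem the talk addresses.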
Bio
Orestis Lagkas Nikolos is a Postdoctoral Researcher at the Computing Systems Lab (CSLab) at NTUA, specializing in Cloud Computing and Serverless architectures. He holds a Ph.D. (October 2025) and a Master’s degree in Electrical and Computer Engineering from NTUA, and his research focuses on optimizing Function-as-a-Service (FaaS) through VM-based sandboxing and containerization. His most recent work investigates VM memory elasticity, aiming to improve resource efficiency and performance in serverless environments.
-
Abstract
With the end of Dennard scaling and the slowing of Moore’s Law, performance improvements in general-purpose processors have plateaued. Computing is increasingly moving away from general-purpose processors to a highly heterogeneous ecosystem centered around hardware specialization. In this emerging landscape, a diverse range of accelerators are deployed, from the edge to the cloud, and from personal devices to large supercomputers. These accelerators have already enabled significant advances in domains such as machine learning. However, when such heterogeneous devices are integrated into end-to-end systems, the interactions between the various system components introduce a set of novel and interconnected challenges. Excessive control and data movement overheads, the rising complexity of performance tuning, and bottlenecks imposed by non-specialized components limit the full potential of accelerator-based computing. In this talk, I will present an overview of my efforts to overcome these barriers by re-architecting and co-designing end-to-end system stacks around heterogeneity – from accelerators to processors, networks, algorithms, and optimizing compilers. I will then highlight two concrete examples of accelerator-centric system design: DECA, an accelerator–processor co-design for large language model inference, and NetSparse, an accelerator–network co-design for scalable sparse applications. Together, these efforts represent a shift from legacy processor-centric design and lay foundations for fundamentally more efficient computing systems architected around heterogeneity at every level.
Bio
Gerasimos (Makis) Gerogiannis is a PhD candidate at the University of Illinois at Urbana-Champaign (UIUC), advised by Professor Josep Torrellas. His research focuses on computer architecture, with an emphasis on accelerator-based heterogeneous computing systems. In his work, he advocates re-architecting and co-designing the end-to-end computing stack around heterogeneity – from accelerators to processors, networks, algorithms, and optimizing compilers. His research has appeared in top-tier venues such as ISCA, MICRO, ASPLOS, HPCA, and ICML, and has been recognized with an IEEE Micro Top Picks selection and an Honorable Mention. Prior to joining UIUC, Makis received his Diploma in Electrical and Computer Engineering from the University of Patras.
-
Abstract
The page cache is central to the performance of many applications. However, its one-size-fits-all eviction policy may perform poorly for many workloads. While the systems community has experimented with new and adaptive eviction policies in non-kernel settings (e.g., key-value stores, CDNs), it is very difficult to implement such policies in the kernel. We design a flexible eBPF-based framework for the Linux page cache, called cache_ext, which allows developers to customize the page cache without modifying the kernel. We demonstrate the flexibility of cache_ext’s interface by using it to implement eight different policies, including sophisticated eviction algorithms. Our evaluation finds that it is indeed beneficial for applications to customize the page cache to match their workloads’ unique properties, and that they can achieve up to 70% higher throughput and 58% lower tail latency.
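The benefit of a pluggable eviction policy can be seen even in a user-space toy. The sketch below is an invented illustration of the concept, not cache_ext's actual interface (the real framework hooks eBPF programs into the Linux page cache): the cache takes an eviction callback, and a workload-aware policy can beat the default. On a cyclic scan slightly larger than the cache, LRU thrashes while an MRU-style policy keeps part of the working set resident.

```python
from collections import OrderedDict

class PageCache:
    """Toy page cache with a pluggable eviction policy."""

    def __init__(self, capacity, evict_policy):
        self.capacity = capacity
        self.pages = OrderedDict()  # page -> None, kept in recency order
        self.evict_policy = evict_policy

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)  # record the access for recency
            return True                   # hit
        if len(self.pages) >= self.capacity:
            del self.pages[self.evict_policy(self.pages)]
        self.pages[page] = None
        return False                      # miss

def lru(pages):  # evict the least-recently-used page (front of the order)
    return next(iter(pages))

def mru(pages):  # evict the most-recently-used page: better for looping scans
    return next(reversed(pages))

# Cyclic scan of 4 pages through a 3-page cache, repeated 25 times.
trace = [1, 2, 3, 4] * 25
for name, policy in [("lru", lru), ("mru", mru)]:
    cache = PageCache(3, policy)
    print(name, sum(cache.access(p) for p in trace), "hits")
```

LRU gets zero hits on this trace, while MRU retains two of the four pages across iterations, which is exactly the kind of workload-specific win the abstract's evaluation numbers reflect.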
Bio
Ioannis Zarkadas is currently working on the XLA compiler at Google. Ioannis obtained a PhD from Columbia University in 2025 and, before that, an undergraduate degree from the National Technical University of Athens in 2019. He is broadly interested in computer systems and performance optimizations, with his PhD focusing on operating system extensions for storage and memory efficiency.
-
15:30-16:00 | Break
-
Abstract
The computational demands of modern Deep Neural Networks (DNNs) are immense and constantly growing. While training costs usually capture public attention, inference also contributes significant computational, energy, and environmental footprints. Sparsity stands out as a critical mechanism for drastically reducing these resource demands. However, its potential remains largely untapped and is not yet fully incorporated in production AI systems. To bridge this gap, performance engineers keen to get involved in deep learning inference optimization need the relevant knowledge and insights. In particular, in this talk we: a) discuss the various forms of sparsity that can be utilized in DNN inference, b) explain how the original dense computations translate to sparse kernels, c) provide an extensive bibliographic review of the state-of-the-art in the implementation of these kernels for CPUs and GPUs, d) discuss the availability of sparse datasets in support of sparsity-related research and development, e) explore the current software tools and frameworks that provide robust sparsity support, and f) present evaluation results of different implementations of the key SpMM and SDDMM kernels on CPU and GPU platforms.
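For readers unfamiliar with the SpMM kernel the talk evaluates, the following is a minimal reference sketch of CSR-based SpMM (sparse matrix A times dense matrix B); production CPU/GPU implementations add blocking, vectorization, and load balancing on top of this loop structure.

```python
def spmm_csr(values, col_idx, row_ptr, B, n_rows, n_cols_B):
    """C = A @ B, where A is sparse in CSR form and B is dense row-major."""
    C = [[0.0] * n_cols_B for _ in range(n_rows)]
    for i in range(n_rows):
        # Nonzeros of row i live in values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a, j = values[k], col_idx[k]
            for c in range(n_cols_B):
                C[i][c] += a * B[j][c]
    return C

# A = [[1, 0, 2],
#      [0, 3, 0]] in CSR form:
values, col_idx, row_ptr = [1.0, 2.0, 3.0], [0, 2, 1], [0, 2, 3]
B = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
print(spmm_csr(values, col_idx, row_ptr, B, 2, 2))  # [[3.0, 2.0], [0.0, 3.0]]
```

Note that only the nonzeros of A are visited, which is where the resource savings the abstract describes come from; SDDMM has a similar structure but samples a dense product at the sparse positions.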
Bio
Georgios Goumas is a Professor at the School of ECE of the National Technical University of Athens. His research interests include high-performance computing and architectures, cloud computing, resource allocation policies, resource-demanding applications, sparse algebra, automatic parallelizing compilers, parallel programming models, etc. He has published more than 80 research papers in journals, international conferences and peer-reviewed workshops. He has worked in several European and National R&D programs in the field of High Performance Computing, Cloud Computing, Networking and Storage for IT systems (ACTiCLOUD - Scientific Manager, EuroEXA - Coordinator, REGALE - Coordinator, SEANERGYS, DARE, DCOMEX, EXAFOAM, DAPHNE, Bonseyes, HiDALGO/2, PRACE, PRESEMT, DOLFIN).
-
17:00-17:15 | Closing Remarks
Venue
Ceremonial Hall of the Central Administration Building (Zografou Campus)