CS Special Seminar: Michael Jungmair, "Rethinking Data Processing Systems: A Compiler-Centric Approach for Modern Workloads and Heterogeneous Hardware"
Rethinking Data Processing Systems: A Compiler-Centric Approach for Modern Workloads and Heterogeneous Hardware
Michael Jungmair
Technical University of Munich
Abstract: Data processing systems are increasingly expected to process workloads beyond traditional SQL queries. They must integrate user-defined functions (UDFs), support data-science pipelines, and execute domain-specific algorithms. However, existing systems typically treat such computations as black boxes, preventing logical optimization and limiting the ability to fully exploit modern hardware.
In this talk, I will discuss how to rethink core components of data processing systems from a compiler’s perspective. Key ideas include using high-level, extensible intermediate representations to model logical and physical query plans, implementing query optimization as compiler passes, and introducing a new compile-time abstraction layer below high-level operators. I will briefly present the end-to-end realization of these concepts in LingoDB, our open-source system, and show how they enable the inlining, efficient compilation, and cross-boundary optimization of Python UDFs, as well as execution on modern GPUs. Finally, I will outline three future research directions: (1) providing query optimization and compilation as a service for existing systems, (2) end-to-end optimization of data-science and machine-learning pipelines, and (3) rethinking data systems for interaction with AI agents.
Bio: Michael Jungmair is a researcher in data management systems and compilers at the Technical University of Munich, where he has recently submitted his PhD thesis. His research focuses on designing novel architectures for data processing systems that enable efficient execution of complex workloads defined using general-purpose programming languages, while leveraging modern hardware accelerators such as GPUs. He leads the open-source LingoDB project, which applies compiler techniques to combine high performance with flexibility and extensibility in modern query engines.
Department of Computer Science