
Guest talk: Miika Aittala (NVIDIA) and Aku Rouhe (AMD Silo AI)

Miika Aittala (NVIDIA) and Aku Rouhe (AMD Silo AI) will give guest presentations in the CS-E4891 Deep Generative Models course.

12:15 Dr. Aku Rouhe, Research Scientist, AMD Silo AI

LLM post-training and the Superficial Alignment Hypothesis

Abstract: Large Language Model (LLM) post-training is the final training stage that makes LLMs behave as we wish. In this stage, the models learn to chat, to follow instructions, and to align with human preferences and values. After this stage, the most relevant evaluations no longer measure how much information is packed into the weights, but how useful the models are. But is the effect of post-training just a shallow, surface-level change, as the Superficial Alignment Hypothesis states?
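
As a concrete illustration of what "post-training" can mean in practice, here is a minimal sketch of one supervised fine-tuning (SFT) step on a chat example, where the loss is computed only on the assistant's reply. The toy model, token IDs, and chat template are hypothetical placeholders, not the speaker's actual pipeline.

    import torch
    import torch.nn.functional as F

    # Stand-in for a pretrained LLM: embedding followed by a linear head.
    vocab_size, dim = 100, 32
    model = torch.nn.Sequential(
        torch.nn.Embedding(vocab_size, dim),
        torch.nn.Linear(dim, vocab_size),
    )

    # A chat example rendered with a (hypothetical) template, already tokenized.
    prompt_ids = torch.tensor([1, 5, 7, 9])    # e.g. "<user> How do I ...? <assistant>"
    response_ids = torch.tensor([12, 14, 2])   # e.g. "You can ... <eos>"
    input_ids = torch.cat([prompt_ids, response_ids])

    # Next-token prediction over the whole sequence ...
    logits = model(input_ids[:-1])
    targets = input_ids[1:]

    # ... but the loss is masked so only the assistant's reply is supervised.
    loss_mask = torch.zeros_like(targets, dtype=torch.bool)
    loss_mask[len(prompt_ids) - 1:] = True
    loss = F.cross_entropy(logits[loss_mask], targets[loss_mask])
    loss.backward()  # one SFT gradient step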

Short bio: Aku Rouhe is an Aalto alumnus from B.Sc. to Ph.D. For his doctoral degree he studied speech recognition and the trend towards end-to-end models. He interned at Mila in Quebec, Canada, and helped create the open-source SpeechBrain toolkit. He defended his dissertation in 2024. Since 2023 he has been working on Large Language Models at Silo AI, now part of AMD.

13:15 Dr. Miika Aittala, Senior Research Scientist, NVIDIA

Guiding a Diffusion Model with a Bad Version of Itself

Abstract: The primary axes of interest in image-generating diffusion models are image quality, the amount of variation in the results, and how well the results align with a given condition, e.g., a class label or a text prompt. The popular classifier-free guidance approach uses an unconditional model to guide a conditional model, leading to simultaneously better prompt alignment and higher-quality images at the cost of reduced variation. These effects seem inherently entangled, and thus hard to control. We make the surprising observation that it is possible to obtain disentangled control over image quality without compromising the amount of variation by guiding generation using a smaller, less-trained version of the model itself rather than an unconditional model. This leads to significant improvements in ImageNet generation, setting record FIDs of 1.01 for 64x64 and 1.25 for 512x512, using publicly available networks. Furthermore, the method is also applicable to unconditional diffusion models, drastically improving their quality.
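
For readers who want to see the guidance rule the abstract alludes to, here is a minimal sketch, assuming generic denoiser functions: the same weighted extrapolation used in classifier-free guidance, but with the unconditional model replaced by a smaller, less-trained version of the main model. All names and the toy denoisers are illustrative placeholders, not the actual implementation.

    import torch

    def guided_denoise(main_denoiser, weak_denoiser, x, sigma, cond, w=2.0):
        # Extrapolate from the weaker model's prediction toward the main model's.
        # In classifier-free guidance, weak_denoiser would be the unconditional
        # model; here it is a smaller / less-trained version of the main model,
        # evaluated with the same conditioning. w = 1 recovers the main model.
        d_main = main_denoiser(x, sigma, cond)
        d_weak = weak_denoiser(x, sigma, cond)
        return d_weak + w * (d_main - d_weak)

    # Toy usage with stand-in denoisers (real ones would be trained networks).
    main = lambda x, sigma, cond: 0.9 * x
    weak = lambda x, sigma, cond: 0.7 * x
    x_noisy = torch.randn(1, 3, 64, 64)
    x_denoised = guided_denoise(main, weak, x_noisy, sigma=1.0, cond=None, w=2.0)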

Short bio: Miika Aittala is a senior research scientist at NVIDIA Research. His research interests include neural generative modeling and computer graphics, and his recent work has focused on fundamentals of diffusion models and GANs. Prior to joining NVIDIA, he obtained his PhD from Aalto University, and worked as a postdoctoral researcher at MIT CSAIL.

This guest talk is hosted by Associate Professor Harri Lähdesmäki (Department of Computer Science) and Assistant Professor Lauri Juvela (Department of Information and Communications Engineering).
