Scene structure representation and understanding in human and machine vision

Lectures and seminars

This time, Tarun Khajuria (University of Tartu) will present research on scene structure representation in human and machine vision, using a star-constellation–inspired task to explore how humans flexibly interpret sparse visual inputs and how these strategies compare to generative models and deep learning architectures.

When

24.11.2025 14:15 – 15:30 (UTC +2)

Where

Onsite & Online

Otakaari 3, 02150 Espoo F239a Auditorio, or via Zoom (https://aalto.zoom.us/j/67444945844)

Event language(s)

English

Welcome to our ABC Seminars! This seminar series is open to everyone. The talk will take place in Otakaari 3, F239a Auditorio. After the talk, coffee and pulla will be served.

The event will also be streamed via Zoom at: https://aalto.zoom.us/j/67444945844

Scene structure representation and understanding in human and machine vision: Insights from star-constellations inspired vision task

Abstract: Humans can flexibly interpret the same visual scene in multiple ways. For example, in a cinema hall, we can identify individual seats as chairs, bean bags, or couches, while also perceiving them as part of a larger structure of rows and sections that define walkable paths. This flexibility also makes our visual system robust under challenging conditions: structural understanding lets us infer missing information and, when the context requires it, disentangle relevant objects from co-occurring elements. In this talk, I will discuss our work exploring the computational mechanisms that support such robustness in human vision. We designed a challenging vision task inspired by star constellations, where outlines of objects are hidden within sparse arrangements of dots. Our experiments with human participants show that recognizing these highly underspecified images can involve forming and iteratively refining multiple object hypotheses guided by shape and structural cues. By comparing these human strategies with our proposed generative search models and popular deep learning architectures, we identify key computational components supporting visual inference in difficult conditions. Finally, I will discuss how these insights motivated our examination of structured scene representation in machine vision encoders, focusing on the role of abstraction in supporting operations for visual understanding.

Updated: 21.11.2025
Published: 10.11.2025