ZurichNLP #16
Wed 21 May | Zürich
Talks by Ivan Vulić (University of Cambridge/Google DeepMind) on vision-language models for spatial reasoning and Catalina Torres (University of Zurich) on Swiss German's unique grammar.


Time & Location
21 May 2025, 18:00 – 20:00
Zürich, OAT ETH Zurich (14th floor), Andreasstrasse 5, 8050 Zürich, Switzerland
About the Event
Ivan Vulić (University of Cambridge/Google DeepMind): Guiding Vision-Language Models to Climb the Mountain of Spatial Reasoning
Large Vision-Language Models (VLMs) have demonstrated impressive performance in general vision-language tasks. However, even the most recent and most powerful VLMs still struggle with simple spatial understanding and reasoning. In this talk, I will first provide a brief overview of our recent work on creating new benchmarks and improving the evaluation of VLMs for a range of spatial reasoning tasks. I will then outline our novel methodology for enhancing the spatial reasoning capabilities of VLMs, such as multi-modal visualization-of-thought and purely visual planning, with a focus on spatial navigation tasks.
Catalina Torres (University of Zurich): How Swiss German Helps us Understand Grammar Better