ZurichNLP #16
Wed 21 May | Zürich
Ivan Vulić (University of Cambridge/Google DeepMind) on vision-language models for spatial reasoning, and Tiago Pimentel (ETH Zurich) on generalisation in language models.


Time & Location
21 May 2025, 18:00 – 20:00
Zürich, OAT ETH Zurich (14th floor), Andreasstrasse 5, 8050 Zürich, Switzerland
About the Event
Ivan Vulić (University of Cambridge/Google DeepMind): Guiding Vision-Language Models to Climb the Mountain of Spatial Reasoning
Large Vision-Language Models (VLMs) have demonstrated impressive performance in general vision-language tasks. However, even the most recent and most powerful VLMs still struggle with simple spatial understanding and reasoning. In this talk, I will first provide a brief overview of our recent work on creating new benchmarks and improving the evaluation of VLMs across a range of spatial reasoning tasks. I will then outline our novel methodology for enhancing the spatial reasoning capabilities of VLMs, such as multi-modal visualization-of-thought and purely visual planning, with a focus on spatial navigation tasks.
Tiago Pimentel (ETH Zurich): Duplicating Vocabularies to Analyse Generalisation in Language Models
In this talk, we will explore how duplicating a language model’s vocabulary can create controlled experiments, which we can leverage to address two research questions. First, we use vocabulary duplication to…