ZurichCV #10
Thu 07 Aug | Zürich
Felix Wimbauer (Technical University of Munich/Google) on understanding the 3D world from videos, and Joan Puigcerver (Google DeepMind) on scaling computer vision architectures.


Time & Location
07 Aug 2025, 18:00 – 20:00
Zürich, OAT ETH Zurich (14th floor), Andreasstrasse 5, 8050 Zürich, Switzerland
About the Event
Felix Wimbauer (Technical University of Munich/Google): Learning to Understand the 3D World from Large-Scale Video Datasets
Felix will cover three of his works (MonoRec, BehindTheScenes, S4C) on understanding 3D scenes from video data, highlighting approaches that shift from traditional supervised learning to more data-efficient, self-supervised techniques. Each method addresses the common challenges of estimating depth, handling dynamic objects, and synthesizing new views, all while working with minimal labeled data. These techniques share a focus on leveraging large video datasets to improve depth prediction, semantic segmentation, and camera motion estimation without relying on specialized sensors. He will then present his recent CVPR 2025 paper, AnyCam, which extends these principles to estimate camera motion and scene geometry from casual videos, aiming for more reliable performance in diverse environments.
Joan Puigcerver (Google DeepMind): Scaling Computer Vision: Transformers and Sparse Mixture-of-Experts Models
The field of Computer Vision shifted from ad-hoc models trained…