ZurichCV #5
Tue, 17 Sept | Zürich
Talks by Julian Eisenschlos from Google DeepMind and Anton Obukhov from ETH Zurich.
Time & Location
17 Sept 2024, 18:00 – 20:00
Zürich, OAT ETH Zurich (14th floor), Andreasstrasse 5, 8050 Zürich, Switzerland
About the Event
Julian Eisenschlos (from Google DeepMind) will talk about "Visual language: how generation can drive understanding of charts and plots".
"Large amounts of content, both online and offline, rely on structure to organize and communicate information more effectively. While natural image understanding and generation have been studied extensively, visually situated language such as tables, charts, plots, and infographics continues to be a challenge for models large and small. In this talk we will show how teaching models to generate visually situated language can improve downstream reading and reasoning on this data modality for tasks such as question answering, entailment, and summarization."
Anton Obukhov (from ETH Zurich) will talk about "Beyond Astronaut on a Horse: Repurposing Text-to-Image Models".
"Stable Diffusion has garnered significant attention from the vision community, artists, and content creators. Its widespread adoption stems from notable enhancements in generation quality, versatile conditioning across different modalities, and the availability of open-source models. In this talk, we will initially explore leveraging the rich generative image prior in 3D contexts, focusing on the task of painting 3D geometry. Subsequently, we will examine the application of these pretrained generative diffusion models within the perception stack. Specifically, we enhance domain generalization in semantic segmentation through a data-centric approach for image and label generation. In the final segment, we introduce an innovative monocular depth estimation pipeline. This pipeline, built upon the generative prior and refined with synthetic data, achieves state-of-the-art results across multiple real-world datasets in a zero-shot fashion."