KCIST Colloquium - Mastering New Visual Understanding Tasks in 3D Scenes with Efficient Learning
-
Venue:
Adenauerring 2, Bldg. 50.20, Room 148, 76131 Karlsruhe
-
Date:
May 25, 2023, 10:00 a.m.
-
Speaker:
Assistant Professor Ziad Al-Halah, The University of Utah
Ziad Al-Halah is an Assistant Professor of Computer Science at the University of Utah. Before joining Utah, he was a postdoctoral researcher at the University of Texas at Austin, working with Prof. Kristen Grauman. He received his PhD in Computer Science from the Karlsruhe Institute of Technology, advised by Prof. Rainer Stiefelhagen. His research in Computer Vision and AI aims to develop intelligent machines that perceive and understand the world around them, and that can transfer and adapt their previous experience to learn new visual tasks faster, better, and with limited supervision and resources. Ziad publishes regularly in top-tier Computer Vision and AI conferences (CVPR, ICCV, ECCV, ICLR, NeurIPS). His work has been recognized with a Best Paper Award (ICPR'14) and with first-place wins in the Habitat Challenge (PointNav) at CVPR'20 and the Textbook Question Answering (TQA) Challenge at CVPR'17.
-
Abstract:
In the age of deep neural networks, Computer Vision research has reached an outstanding level of performance, sometimes matching that of a human expert on certain tasks. However, these advances often come at a substantial hidden cost in the form of large-scale manually labeled data or massive computational resources. Most concerning, when asked to perform even a slightly different version of the task they learned, these models fail completely. In contrast, people learn new things from very little data and solve new tasks with great efficiency by tapping into previous experience. To bring this remarkable ability to machines, my research develops intelligent visual agents that efficiently reuse previous experiences and skills to solve new visual tasks while requiring limited supervision and computation. In this talk, I will discuss how we can create autonomous agents that learn common-sense, reusable visual reasoning skills and behaviors, enabling them to interact, explore, navigate, and search for objects in 3D scenes while using 6x to 35x less training data.