Asst. Prof. Katerina Fragkiadaki, Carnegie Mellon University, to speak at ECE-NTUA on October 25, 2019, at 11:00 (Teleteaching Room, Central Library of NTUA)

Lecture Title: Embodied Visual Recognition with Implicit 3D Feature Representations

Abstract: Current state-of-the-art CNNs localize rare object categories in internet photos, yet they miss basic facts that a two-year-old has mastered: that objects have 3D extent, that they persist over time despite changes in the camera view, that they do not intersect in 3D, and so on. We will discuss neural architectures that, given video streams, learn to disentangle scene appearance from camera and object motion, and distill the former into world-centric 3D feature maps. We will show that the proposed architectures learn object permanence, generate RGB views from novel viewpoints in truly novel scenes, have objects emerge in 3D without human annotations, support grounding of language in 3D visual simulations, and learn intuitive physics in a persistent 3D feature space. In this way, they overcome many limitations of 2D CNNs for video perception, model learning, and language grounding.

Short CV: Katerina Fragkiadaki is an Assistant Professor in the Machine Learning Department at Carnegie Mellon University. She received her Ph.D. from the University of Pennsylvania in 2013 and was a postdoctoral fellow at UC Berkeley and Google Research (2013-2016). She has worked extensively on video segmentation, learning of motion dynamics, and injecting geometry into deep visual learning. Her group develops algorithms for mobile computer vision and for learning physics and common sense for agents that move around and interact with the world. She received a best Ph.D. thesis award in 2013 and has served as an area chair for CVPR 2018, ICML 2019/2020, ICLR 2019, and CVPR 2020.