Dr Yannis Kalantidis to speak at ECE-NTUA on February 25, 2020, at 17:30 (Multimedia amphitheater, Central Library of NTUA)

Lecture Title: Learning efficient representations for image and video understanding

Abstract: Two important challenges in image and video understanding are designing more effective and efficient deep Convolutional Neural Networks, and learning models that are able to achieve higher-level understanding. In this talk, I will present some of my recent work on tackling these challenges. Specifically, I will present Global Reasoning Networks [CVPR 2019], a new approach for reasoning over arbitrary sets of features of the input, by projecting them from a coordinate space into an interaction space where relational reasoning can be efficiently computed. I will also introduce Octave Convolution [ICCV 2019], a plug-and-play replacement for the convolution operator that exploits the spatial redundancy of CNN activations and can be used without any adjustments to the network architecture. The two methods are complementary and achieve state-of-the-art performance on both image and video tasks. Aiming for higher-level understanding, I will further present our recent work on vision and language modeling, specifically on learning state-of-the-art image and video captioning models that are also able to better visually ground the generated sentences [CVPR 2019].

Short Bio: For the last three years Yannis Kalantidis was a research scientist at Facebook AI in Menlo Park, California. He grew up in Athens, Greece and lived there until 2015, with brief breaks in Sweden, Spain and the United States. He received his PhD on large-scale search and clustering from the National Technical University of Athens in 2014. He was a postdoc and research scientist at Yahoo Research in San Francisco from 2015 until 2017, leading the visual similarity search project at Flickr and participating in the Visual Genome project at Stanford. At Facebook Research he was part of the video understanding group, conducting research on representation learning, video understanding and modeling of vision and language. He also leads the Computer Vision for Global Challenges Initiative, which has organized impactful workshops at top venues like CVPR and ICLR. Personal website: https://www.skamalas.com/

This lecture is co-organised by the School of Electrical and Computer Engineering, NTUA, and the Greek Chapter of the IEEE Computational Intelligence Society.