Seminar: Dissertation Talk: CS | May 15 | 1:30-2:30 p.m. | Sutardja Dai Hall, Newton Room/730
Evan Shelhamer, UC Berkeley
Much of the recent progress on visual recognition has been driven by deep learning and its bicameral heart of composition and end-to-end optimization. Its diffusion, however, was neither instantaneous nor effortless. To advance across the frontiers of vision, deep learning had to be equipped with the right structures: the true, intrinsic structures of the visual world.
In this talk, I will focus on incorporating locality and scale structure into end-to-end learning to address image-to-image tasks, which take image inputs and return image outputs, and examine how dynamic inference, which adapts model computation to each input, can help cope with the variability of these rich prediction problems. I will look at these directions through the lens of local recognition tasks that require inference of what and where.
Fully convolutional networks decompose image-to-image learning and inference into local scopes. Factorizing these scopes into structured Gaussian and free-form parts, and learning both, optimizes their size and shape to control the degree of locality. Dynamic inference equips our fully convolutional networks with adaptivity to more fully engage with the vastness and variety of vision.
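To illustrate the "local scopes" idea above: in a fully convolutional network, the same small filter is applied at every position, so each output value depends only on a local window of the input and the network accepts inputs of any size, producing a correspondingly sized output map. The sketch below is a minimal, illustrative pure-Python convolution, not the speaker's models; the filter values and function name are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide a 2D filter over every valid position ('valid' convolution).

    Each output pixel sees only a local kh x kw window of the input,
    and the output map's size tracks the input's size, which is what
    makes dense, per-pixel prediction possible.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Accumulate the filter response over one local window.
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += kernel[di][dj] * image[i + di][j + dj]
            row.append(acc)
        out.append(row)
    return out

# The same (toy, hand-picked) filter works on inputs of different sizes.
edge_filter = [[1.0, -1.0]]  # horizontal-difference filter

small = [[0.0, 1.0, 1.0],
         [0.0, 1.0, 1.0]]
large = [[0.0, 0.0, 1.0, 1.0, 1.0]]

print(conv2d_valid(small, edge_filter))  # 2 x 2 output map
print(conv2d_valid(large, edge_filter))  # 1 x 4 output map
```

The factorized scopes mentioned above go further by also learning the size and shape of this local window, rather than fixing it by hand.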
Advisor: Trevor Darrell
Faculty, Students - Graduate