Model-based computer vision seeks to explain an image in terms of the underlying processes that give rise to visual appearance. This could be a geometric model (as in structure-from-motion), a physics-based model (as in photometric shape estimation), or even a biophysical or perceptual model. These models unify vision and graphics: graphics seeks to simulate the forward process, while vision seeks to invert the models to estimate useful intrinsic properties. Deep-learning-based vision seems to turn much of this on its head. Why bother with a complicated model that makes lots of simplifying assumptions and may be inadequate to explain real-world, noisy data? Black-box CNNs seem to be able to learn to solve many of the same problems without relying on human-created models. In this talk, I will begin with some classical model-based computer vision, specifically with applications in visual media. I will highlight the advantages and flexibility that come from having an underlying model. Then I will consider whether the same problems can really be solved by black-box machine learning methods. Finally, I will argue that the next big advances in vision and graphics will come from combining modelling and learning, for example by incorporating geometric or photometric models into CNNs.