Salle Ribot
The pentagonal faces of a soccer ball and the smooth surface of an egg reflect light patterns that are readily perceived as three-dimensional shapes. Yet each object projects different 3D signals. The regular pattern of pentagons projected by the ball forms a compelling texture gradient on the retina. The unmarked surface of the egg, however, projects only a shading gradient. Traditionally, the presence or absence of image signals reflected by various objects in a scene has been modelled in terms of signal reliability. According to this view, each signal is used to compute a probability distribution of possible depth interpretations, which is on average peaked at the veridical value. If a signal is reliable, like the texture gradient projected by the ball, then this distribution is narrow. The Maximum Likelihood Estimation (MLE) approach to cue integration combines the depth estimates from all the signals by factoring in their reliabilities, guaranteeing a final estimate that is maximally reliable. Although the MLE approach is a rational choice, it is still not clear how signal reliabilities are estimated and how the visual system can learn a veridical mapping between image and 3D properties. In this talk, I will present a model that does not depend on these assumptions. Instead, I postulate that the visual system learns a fixed mapping between image signals and depth through a deterministic function that maximizes sensitivity to depth while minimizing its response to random fluctuations of image signals. I will illustrate how this model can account for previous results and also predict novel findings that are incompatible with the MLE account of 3D cue integration.
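
For readers unfamiliar with the MLE account mentioned in the abstract, the reliability-weighted combination can be sketched as follows. This is a minimal illustration of the standard linear model, assuming unbiased cues with independent Gaussian noise; it is not the model proposed in the talk. The function name `mle_combine` and the numeric values (a texture estimate and a shading estimate of depth, with their standard deviations) are hypothetical and chosen only for illustration.

```python
import math


def mle_combine(estimates, sigmas):
    """Combine single-cue depth estimates by inverse-variance weighting.

    Assumes each cue is unbiased with independent Gaussian noise
    (an assumption of the standard MLE account, not of the talk's model).
    """
    # Reliability of a cue is the inverse of its noise variance.
    reliabilities = [1.0 / (s ** 2) for s in sigmas]
    total = sum(reliabilities)
    # Each cue is weighted by its share of the total reliability.
    weights = [r / total for r in reliabilities]
    combined = sum(w * d for w, d in zip(weights, estimates))
    # Reliabilities add, so the combined noise is never larger
    # than that of the most reliable single cue.
    combined_sigma = math.sqrt(1.0 / total)
    return combined, combined_sigma, weights


# Hypothetical example: a reliable texture cue and a noisier shading cue.
texture_depth, texture_sigma = 10.0, 1.0
shading_depth, shading_sigma = 14.0, 2.0

combined, combined_sigma, weights = mle_combine(
    [texture_depth, shading_depth], [texture_sigma, shading_sigma]
)
print(weights, combined, combined_sigma)
```

In this sketch the more reliable texture cue receives the larger weight, and the combined standard deviation is smaller than either single-cue value, which is the sense in which the MLE estimate is "maximally reliable".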