10. Based on (Olshausen and Field, 1996; Van Hateren and van der Schaaf, 1998). The learning principle used here can be intuitively understood from two different viewpoints. One is independence of the features: the outputs of the neural network (which in this case has a single layer) should be as independent as possible in the sense of probability theory. In other words, knowing one feature should give minimal information about the other features. The other viewpoint is sparsity: the features should be silent (zero) most of the time and only rarely turned “on”. An important benefit of such sparse coding is that it minimizes energy consumption, assuming that representing a feature whose value is zero consumes little energy. Therefore, the learning principle used is called either independent component analysis or sparse coding, which are almost the same thing. Such analysis can be implemented as a particular kind of Hebbian learning. Actually, there is an even more fundamental regularity in visual input than the one depicted here, which is that two nearby pixels tend to have similar grey-scale values (they are strongly correlated). That is, if a pixel is, say, white, the pixels next to it are quite likely to be white as well, and the same applies to any colour. Such similarities are analysed by neurons (“ganglion cells”) in the retina. However, this regularity is so elementary that it is in some sense included in, or implied by, the regularity described by the edges. Mathematically speaking, the covariances of pixel grey-scale values are perfectly modelled by independent component analysis, so no additional model is needed. For a general introduction to the models used here, see Hyvärinen et al. (2009).
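As an illustration of the principle (and not the original authors' implementation), the following sketch estimates independent components of image patches with scikit-learn's FastICA. The scikit-image camera() test picture stands in for a collection of natural photographs, and the patch size and number of components are arbitrary choices; on real natural-image data the learned features come out as localized, oriented edge detectors of the kind described above.

```python
import numpy as np
from skimage import data                 # assumed available: scikit-image sample images
from sklearn.decomposition import FastICA

# Stand-in for a natural photograph: the grey-scale "camera" test image.
image = data.camera() / 255.0

rng = np.random.default_rng(0)
patch = 16          # patch side length in pixels (arbitrary choice)
n_patches = 20000   # number of randomly sampled patches (arbitrary choice)

# Sample random patches and flatten each one into a vector.
rows = rng.integers(0, image.shape[0] - patch, n_patches)
cols = rng.integers(0, image.shape[1] - patch, n_patches)
X = np.stack([image[r:r + patch, c:c + patch].ravel()
              for r, c in zip(rows, cols)])

# Remove each patch's mean grey-scale value (the "DC component").
X -= X.mean(axis=1, keepdims=True)

# Whitening inside FastICA accounts for the pixel covariances mentioned
# in the text; the independent components capture the remaining, sparse structure.
ica = FastICA(n_components=64, whiten="unit-variance", random_state=0)
ica.fit(X)

# Each row of components_ is one learned feature; reshaped to 16x16,
# on natural images it typically looks like a localized, oriented edge.
features = ica.components_.reshape(-1, patch, patch)
```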