Figures and data

Two possible types of movement within the solution space.
(A) Two options for how drift may look in the solution space: a random walk within the space of equally good solutions that is either undirected (left) or directed (right). (B) The qualitative consequence of the two movement types. For an undirected random walk, all properties of the solution remain roughly constant (left); for directed movement, some property of the solution gradually increases or decreases (right).
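
A minimal numerical sketch of the two movement types (an illustration, not the paper's simulation). Both walks move along a single flat direction of the solution space; the step size sigma and the bias are arbitrary values chosen here. The undirected walk keeps the tracked coordinate roughly constant on average, while the biased walk shows a coordinate that changes steadily, as in panel B.

import numpy as np

rng = np.random.default_rng(0)
steps, sigma, bias = 10_000, 0.01, 0.002   # arbitrary illustrative values

# Position along one flat (zero-loss) direction of the solution space.
undirected = np.cumsum(sigma * rng.standard_normal(steps))        # pure diffusion
directed = np.cumsum(sigma * rng.standard_normal(steps) + bias)   # diffusion plus drift

print(f"undirected walk, net displacement: {undirected[-1]:+.3f}")
print(f"directed walk, net displacement:   {directed[-1]:+.3f}")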

Noisy learning leads to spontaneous sparsification.
(A) Illustration of an agent in a corridor receiving high-dimensional visual input from the walls. (B) Log loss as a function of training steps; a log loss of 0 corresponds to a mean estimator. The loss rapidly decreases and then remains roughly constant. (C) Information (blue) and fraction of units with non-zero activation for at least one input (red) as a function of training steps. (D) Rate maps sampled at four different time points. Maps in each row are sorted according to a different time point; sorting is based on the peak tuning value to the latent variable. (E) Correlation of rate maps between different time points along training. Only active units are used. (F) Figures reproduced from [21], in which mice spent different amounts of time in two environments: the fraction of place cells at the beginning relative to the end of the experiment (left), the average Spatial Information (SI) per cell at the beginning relative to the end of the experiment (middle), and the decoding error for the position of the mouse (right).
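
The following is a self-contained sketch of the kind of noisy learning behind panels B-C: a one-hidden-layer ReLU regressor trained with noisy SGD to recover a 1-D latent position from a high-dimensional input encoding, while tracking the loss and the fraction of units active for at least one input. The input encoding (Gaussian bumps), architecture, and hyperparameters are assumptions made for illustration, not the paper's settings; whether and how quickly units deactivate depends on these choices and on training time (the paper's simulations run for roughly 10^7 steps).

import numpy as np

rng = np.random.default_rng(1)

# Toy setup: a 1-D latent position ("corridor location") encoded by Gaussian
# input bumps, and a one-hidden-layer ReLU network trained with noisy SGD to
# recover the position from that high-dimensional input.
n_pos, d_in, n_hidden = 100, 50, 100
pos = np.linspace(-1.0, 1.0, n_pos)[:, None]            # latent variable
centers = np.linspace(-1.0, 1.0, d_in)
X = np.exp(-(pos - centers) ** 2 / (2 * 0.1 ** 2))      # input rate maps, shape (n_pos, d_in)
W1 = rng.standard_normal((d_in, n_hidden)) / np.sqrt(d_in)
w2 = rng.standard_normal((n_hidden, 1)) / np.sqrt(n_hidden)
lr, noise_std = 0.02, 1e-3                              # assumed hyperparameters

for step in range(50_001):
    i = rng.integers(n_pos)                             # one random location per step
    h = np.maximum(X[i:i + 1] @ W1, 0.0)                # ReLU hidden layer
    err = h @ w2 - pos[i:i + 1]                         # squared-error residual
    # Gradient step with Gaussian noise added to every weight update.
    W1 -= lr * (X[i:i + 1].T @ ((err @ w2.T) * (h > 0))) + noise_std * rng.standard_normal(W1.shape)
    w2 -= lr * (h.T @ err) + noise_std * rng.standard_normal(w2.shape)

    if step % 10_000 == 0:
        H = np.maximum(X @ W1, 0.0)
        active = np.mean(H.max(axis=0) > 0)             # fraction of units active for >= 1 input
        loss = np.mean((H @ w2 - pos) ** 2)
        print(f"step {step:6d}  loss {loss:.4f}  active fraction {active:.2f}")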

Generality of the results.
Summary of 1117 simulations with various parameters (see Table 1). (A) Histogram of the fraction of active units after 10^7 training steps for each simulation. (B) Subset of 178 simulations with the same parameters but varying noise variance; each point represents a single simulation. Fraction of active units as a function of the noise variance (top), and log of the sparsification time scale as a function of the noise variance (bottom). (C) Learning a similarity matching task with Hebbian and anti-Hebbian learning, using published code from [23]. Performance of the network (blue) and fraction of active units (red) as a function of training steps. Note that the loss axis does not start at zero and that the dynamic range is small.
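
A sweep like the one in panel B can be approximated with the toy model from the previous sketch, condensed here into a function so the snippet stays self-contained. This is again an illustration rather than the paper's simulations; the noise levels, the 90% threshold used to define a crude sparsification time, and the checkpoint interval are all arbitrary choices.

import numpy as np

def train(noise_std, steps=50_000, seed=0):
    """Condensed toy model (see previous sketch); returns the history of the
    fraction of active hidden units, sampled every 1000 steps."""
    rng = np.random.default_rng(seed)
    pos = np.linspace(-1.0, 1.0, 100)[:, None]
    X = np.exp(-(pos - np.linspace(-1.0, 1.0, 50)) ** 2 / (2 * 0.1 ** 2))
    W1 = rng.standard_normal((50, 100)) / np.sqrt(50)
    w2 = rng.standard_normal((100, 1)) / np.sqrt(100)
    lr, history = 0.02, []
    for step in range(steps):
        i = rng.integers(len(pos))
        h = np.maximum(X[i:i + 1] @ W1, 0.0)
        err = h @ w2 - pos[i:i + 1]
        W1 -= lr * (X[i:i + 1].T @ ((err @ w2.T) * (h > 0))) + noise_std * rng.standard_normal(W1.shape)
        w2 -= lr * (h.T @ err) + noise_std * rng.standard_normal(w2.shape)
        if step % 1000 == 0:
            history.append(np.mean(np.maximum(X @ W1, 0.0).max(axis=0) > 0))
    return np.array(history)

# Sweep the noise level; record the final active fraction and a crude
# sparsification time: the first checkpoint at which the active fraction
# falls below 90% of its initial value (None if it never does).
for noise_std in (0.0, 5e-4, 1e-3, 2e-3):
    hist = train(noise_std)
    below = np.flatnonzero(hist < 0.9 * hist[0])
    t_sparse = 1000 * below[0] if below.size else None
    print(f"noise std {noise_std:g}: final active fraction {hist[-1]:.2f}, "
          f"sparsification time {t_sparse}")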

Parameter ranges for random simulations.

Noisy learning leads to a flat landscape.
(A) Gradient descent dynamics over a two-dimensional loss function with a one-dimensional zero-loss manifold. The loss is identically zero along the horizontal axis, but the left area is flatter. The orange trajectory begins at the red dot; note its asymmetric extension into the flatter left area. (B) Fraction of active units as a function of the number of non-zero eigenvalues. (C) Log of the non-zero eigenvalues at two consecutive time points. Because eigenvalues calculated at different time points do not correspond to one another, this plot shows the change in their distribution rather than changes in eigenvalues associated with specific directions. The distribution of the larger eigenvalues hardly changes, while the distribution of the smaller eigenvalues is pushed toward smaller values. (D) Sum of the Hessian's eigenvalues as a function of time for learning with label noise.
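
A small sketch of the mechanism behind panels A and D, under an assumed loss surface L(x, y) = c(x) * y^2 whose zero-loss manifold is the x-axis and whose curvature c(x) is smaller (flatter) to the left; this surface and the hyperparameters are illustrative choices, not the paper's example. Noise is injected along the curved direction only, loosely mimicking label noise that perturbs the error term. The fluctuations in y create a systematic force -c'(x) * y^2 on x, so the trajectory drifts toward the flatter region, and the single surviving Hessian eigenvalue on the manifold, 2 * c(x), decreases over training, as in panel D.

import numpy as np

rng = np.random.default_rng(2)

def c(x):
    # Curvature across the zero-loss manifold; smaller for more negative x.
    return 0.1 + 1.0 / (1.0 + np.exp(-2.0 * x))

def dc(x):
    # Derivative of c(x) with respect to x.
    s = 1.0 / (1.0 + np.exp(-2.0 * x))
    return 2.0 * s * (1.0 - s)

x0, y = 0.5, 0.0                    # start on the manifold (the "red dot")
x = x0
lr, noise_std = 0.05, 0.5           # assumed hyperparameters
for step in range(50_000):
    gx = dc(x) * y ** 2             # dL/dx
    gy = 2.0 * c(x) * y             # dL/dy
    x -= lr * gx
    # Noise enters only along y, the direction with non-zero curvature.
    y -= lr * (gy + noise_std * rng.standard_normal())

print(f"curvature across the manifold: start {2 * c(x0):.2f}, end {2 * c(x):.2f}")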