March 31, 2026 · 3 min read

Uncertainty in Recurrent Computation: An Ensemble View of Temporal Credit Assignment

Recurrent neural networks have long served as the workhorse for sequential data processing, from natural language processing to the modeling of biological neural populations. Yet beneath their widespread application lies a persistent theoretical challenge: temporal credit assignment. Determining how each synaptic weight at every time step contributes to a final outcome remains computationally demanding and conceptually opaque. Standard approaches rely on backpropagation through time, treating weights as precise point estimates and gradients as exact quantities. But this deterministic framing may obscure a deeper truth about how recurrent computation actually functions.

In "Ensemble perspective for understanding temporal credit assignment," the authors propose a fundamental reframing. Rather than viewing each connection as a fixed scalar value, they model synaptic weights as random variables drawn from spike and slab distributions. This shift from point estimates to full posterior distributions allows the network to maintain uncertainty about its own connectivity. By deriving a mean field algorithm to train at the ensemble level, the researchers demonstrate that temporal credit assignment depends not on precise gradient calculations, but on the evolution of weight distributions across the network population.

The Spike and Slab Framework and Mean Field Training

The technical innovation centers on the spike and slab prior, a mixture distribution combining a delta function at zero (the spike) with a continuous Gaussian component (the slab). This formulation elegantly captures both structural uncertainty (whether a connection exists at all) and parametric uncertainty (the strength of existing connections). In practice, this means each recurrent synapse is characterized by a binary inclusion variable and a continuous weight value, both governed by hyperparameters that evolve during training.
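As a concrete illustration, a spike-and-slab synapse can be simulated by pairing a Bernoulli inclusion variable with a Gaussian weight. This is a minimal sketch, not the paper's implementation; the hyperparameter names (`pi` for the inclusion probability, `mu` and `sigma` for the slab) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_spike_slab(pi, mu, sigma, shape, rng):
    """Draw weights from a spike-and-slab mixture: with probability pi
    a connection exists and its strength is Gaussian (the slab);
    otherwise the weight is exactly zero (the spike)."""
    include = rng.random(shape) < pi          # binary inclusion variable
    slab = rng.normal(mu, sigma, shape)       # continuous weight value
    return include * slab

# A sparse recurrent weight matrix: roughly 70% of entries are exact zeros.
W = sample_spike_slab(pi=0.3, mu=0.0, sigma=0.5, shape=(100, 100), rng=rng)
sparsity = (W == 0).mean()
```

The same two-part parameterization carries the structural/parametric split described above: `include` captures whether a connection exists, `slab` how strong it is.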

The authors derive a mean field approximation to handle the intractable posterior inference over these latent variables. Rather than sampling from the full posterior, which would be prohibitively expensive for recurrent architectures, the mean field approach factorizes the distribution and tracks sufficient statistics. This yields a deterministic algorithm that nevertheless captures the essential uncertainty structure. Notably, the paper provides an analytic solution in the infinitely large network limit, revealing how population level dynamics emerge from the interaction of individual stochastic synapses.
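The key computational saving can be sketched as follows: under a factorized spike-and-slab posterior, each weight's mean and variance have closed forms (E[w] = πμ and Var[w] = π(σ² + μ²) − (πμ)²), so a pre-activation's statistics can be propagated deterministically instead of by sampling. This is a simplified sketch under those standard moment identities, not the paper's full algorithm.

```python
import numpy as np

def mean_field_stats(pi, mu, sigma):
    """Sufficient statistics of a spike-and-slab weight under the
    factorized posterior: E[w] = pi * mu,
    Var[w] = pi * (sigma^2 + mu^2) - (pi * mu)^2."""
    m = pi * mu
    v = pi * (sigma**2 + mu**2) - m**2
    return m, v

def mean_field_preactivation(x, pi, mu, sigma):
    """Propagate an input through an uncertain weight matrix by tracking
    only the mean and variance of the pre-activation (deterministic,
    no Monte Carlo sampling over weight configurations)."""
    m, v = mean_field_stats(pi, mu, sigma)
    mean = m @ x            # mean of the pre-activation
    var = v @ (x**2)        # variance, assuming independent weights
    return mean, var
```

Note the sanity checks built into the moments: with π = 1 the statistics reduce to the plain Gaussian (μ, σ²), and with π = 0 both collapse to zero.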

When applied to sequential MNIST, where handwritten digits must be classified from pixel sequences, the ensemble approach reveals critical differences from standard training. The model identifies sparse subsets of connections that dominate the temporal integration, with the spike and slab hyperparameters effectively implementing a form of structured regularization. The mean field updates allocate credit not by exact gradient chains, but by adjusting the probability distributions over synaptic configurations.
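The sequential MNIST setup itself is simple to state in code: each 28×28 digit is unrolled into a sequence of 784 pixels fed to the network one per time step, and the final hidden state is read out into class logits. The toy loop below (random weights, no training) is only meant to show the data flow, not the ensemble method.

```python
import numpy as np

rng = np.random.default_rng(1)

def run_rnn_on_pixels(pixels, W_rec, w_in, W_out):
    """Toy recurrent integration of a pixel sequence, as in sequential
    MNIST: one scalar pixel enters per time step and the final hidden
    state is mapped to class logits."""
    h = np.zeros(W_rec.shape[0])
    for x_t in pixels:                       # one pixel per time step
        h = np.tanh(W_rec @ h + w_in * x_t)  # recurrent update
    return W_out @ h                         # class logits

n_hidden, n_classes = 64, 10
W_rec = rng.normal(0, 1 / np.sqrt(n_hidden), (n_hidden, n_hidden))
w_in = rng.normal(0, 1, n_hidden)
W_out = rng.normal(0, 1 / np.sqrt(n_hidden), (n_classes, n_hidden))
logits = run_rnn_on_pixels(rng.random(784), W_rec, w_in, W_out)
```

The long 784-step horizon is exactly what makes temporal credit assignment hard here: an early pixel's influence on the logits is mediated by hundreds of recurrent updates.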

Symmetry Breaking and Emergent Selectivity

Beyond classification accuracy, the paper offers insight into the geometry of the learned parameter space. Through low dimensional projections of both neural activity and synaptic dynamics, the authors demonstrate spontaneous symmetry breaking in the ensemble. Initially symmetric distributions over weight configurations collapse into distinct modes, with certain connection patterns stabilizing while others vanish into the spike component. This process reveals that weight uncertainty itself acts as a selection mechanism; connections with high variance in the early training phase are pruned or consolidated based on their contribution to the temporal computation.
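Low-dimensional projections of this kind are straightforward to compute: project sampled weight configurations onto their top principal components and look for mode separation. The sketch below uses synthetic two-mode data to stand in for a post-training ensemble; it illustrates the visualization technique, not the paper's actual trajectories.

```python
import numpy as np

rng = np.random.default_rng(2)

def project_2d(samples):
    """Project ensemble samples onto their top two principal components
    (via SVD of the centered sample matrix), the kind of low-dimensional
    view used to visualize mode collapse in weight space."""
    X = samples - samples.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T

# Hypothetical ensemble after symmetry breaking: two distinct weight modes.
mode_a = rng.normal(+1.0, 0.1, (50, 20))
mode_b = rng.normal(-1.0, 0.1, (50, 20))
coords = project_2d(np.vstack([mode_a, mode_b]))
```

In such a projection, an initially symmetric ensemble appears as a single cloud; after symmetry breaking the samples separate into distinct clusters along the leading component.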

The analysis extends to multisensory integration, a fundamental cognitive function where organisms must combine temporally disparate cues. The ensemble model exhibits distinct types of emergent neural selectivity that mirror biological observations. Some units develop sensitivity to specific temporal intervals, while others encode the reliability of sensory channels through their variance parameters. In this way, the hyperparameters of the spike and slab distribution come to encode the temporal statistics of the input.
