Whose Fault Is it Anyway? The Problem of Credit Assignment

Credit assignment is one of the central ideas in machine learning. When you are training an ML model and it makes an error, an algorithm has to apportion credit (really, it’s blame, not credit) to each of the model’s parameters. The algorithm then adjusts each parameter ever so slightly, such that the model’s error for the same input shrinks and it gets closer to producing the correct answer.
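
To make that concrete, here is a minimal sketch of credit assignment by gradient descent on a tiny linear model with squared error. It is not taken from any particular paper or library; the input values, target, and learning rate are all made up for illustration.

```python
import numpy as np

# A toy linear model y_hat = w . x trained with gradient descent on
# squared error. Each weight gets "blamed" in proportion to how much it
# contributed to the error, and is nudged slightly to reduce that error.
# (Illustrative values throughout.)

rng = np.random.default_rng(0)
w = rng.normal(size=3)             # the model's parameters (weights)
x = np.array([0.5, -1.2, 2.0])     # one input example
y = 1.0                            # the correct answer for that input

lr = 0.1                           # learning rate: "ever so slightly"
for _ in range(50):
    y_hat = w @ x                  # the model's current prediction
    error = y_hat - y              # how far off it is
    grad = error * x               # each weight's share of the blame
    w -= lr * grad                 # adjust the parameters slightly

print(w @ x)                       # now very close to 1.0
```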

In the late 1950s, when Frank Rosenblatt designed the perceptron and Bernie Widrow designed ADALINE (for adaptive linear neuron), they were limited by the ideas of the time in how they could do credit assignment. Both the perceptron and ADALINE were single-layer neural networks, and the algorithms for updating their parameters (i.e., the weights) did not work for multi-layer neural networks. Crucially, though, the algorithm Widrow developed with Ted Hoff, the Widrow-Hoff Least Mean Squares (LMS) algorithm, presaged what was to come: they had figured out a noisy, roughshod way of doing gradient descent, using an algebraic approximation of the calculus needed to minimize the error.
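
Here is a rough sketch of what that update looks like, assuming a single linear neuron in the ADALINE mold. The data, learning rate, and number of passes are illustrative choices, not details from Widrow and Hoff’s work.

```python
import numpy as np

# Widrow-Hoff LMS for a single linear neuron: update the weights after
# every individual sample, using only that sample's error. Each step is
# a noisy approximation of the true gradient of the mean squared error.
# (Data, learning rate, and number of passes are illustrative.)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))             # 100 inputs with 4 features each
true_w = np.array([1.0, -2.0, 0.5, 3.0])  # the target linear mapping
d = X @ true_w                            # desired outputs

w = np.zeros(4)                           # start with zero weights
eta = 0.01                                # learning rate

for _ in range(20):                       # a few passes over the data
    for x, target in zip(X, d):
        y = w @ x                         # the neuron's output
        e = target - y                    # error on this one sample
        w += eta * e * x                  # the LMS update

print(np.round(w, 2))                     # close to [1.0, -2.0, 0.5, 3.0]
```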

There were plenty of indications from the 1960s onwards of how one might do credit assignment for multi-layer neural networks. (If you are interested, look up Jürgen Schmidhuber’s blog post on the subject.) But it was the 1986 Nature paper by Rumelhart, Hinton, and Williams, “Learning representations by back-propagating errors,” that sealed the deal. The backpropagation algorithm was born. It showed how to do credit assignment for multi-layer neural networks with hidden layers (meaning layers that are not exposed directly to the network’s inputs or outputs); more importantly, it showed what the networks learned as a result of doing backpropagation.

The backpropagation algorithm is by no means the only way to do credit assignment. For one, the algorithm is not biologically plausible: our brains are almost certainly not doing backprop. That’s because the algorithm has to keep track of the weight matrices and the activations of neurons in order to use the chain rule of calculus and propagate credit assignment back from the output layer to the input layer. This can’t be what the brain is doing. It’s likely using some local mechanism, meaning one that apportions blame for an error to a given neuron’s synapses by looking only at the neuron’s immediate neighborhood, rather than relying on long-distance connections.
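
As a rough illustration of that bookkeeping, here is a minimal one-hidden-layer network trained by backpropagation; notice how the backward pass reuses the stored activations and the output-layer weight matrix from the forward pass. Everything here (shapes, data, learning rate) is invented for the example.

```python
import numpy as np

# Backpropagation through one hidden layer with squared error. The
# backward pass needs the weight matrix W2 and the stored activations
# (x and h) from the forward pass -- exactly the non-local bookkeeping
# a purely local learning rule would not have.
# (Shapes, data, and learning rate are illustrative.)

rng = np.random.default_rng(2)
x = rng.normal(size=3)                 # input
y = np.array([1.0])                    # target output

W1 = rng.normal(size=(4, 3)) * 0.5     # input -> hidden weights
W2 = rng.normal(size=(1, 4)) * 0.5     # hidden -> output weights
lr = 0.1

for _ in range(100):
    # Forward pass: keep the intermediate activations.
    z1 = W1 @ x
    h = np.tanh(z1)                    # hidden activations (stored)
    y_hat = W2 @ h                     # output

    # Backward pass: chain rule from output layer back to input layer.
    d_out = y_hat - y                       # dLoss/dy_hat for squared error
    grad_W2 = np.outer(d_out, h)            # needs the hidden activations h
    d_hidden = (W2.T @ d_out) * (1 - h**2)  # needs W2 and tanh'(z1)
    grad_W1 = np.outer(d_hidden, x)         # needs the input x

    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

print(W2 @ np.tanh(W1 @ x))            # approaches 1.0
```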

The search for a biologically plausible mechanism is ongoing.
