How neural networks work, and why they've become a big business


This procedure allowed early neural networks to "learn" in a way that superficially resembled the behavior of the human nervous system. The approach enjoyed a decade of hype in the 1960s. But then an influential 1969 book by computer scientists Marvin Minsky and Seymour Papert demonstrated that these early neural networks had significant limitations.

Rosenblatt's early neural networks only had one or two trainable layers. Minsky and Papert showed that such simple networks are mathematically incapable of modeling complex real-world phenomena.

In principle, deeper neural networks were more versatile. But deeper networks would have strained the meager computing resources available at the time. More importantly, no one had developed an efficient algorithm to train deep neural networks. The simple training methods used in the first neural networks didn't scale to deeper networks.

As a result, neural networks fell out of favor in the 1970s and early 1980s—part of that era's "AI winter."

A breakthrough algorithm

(Picture above: My wetware's on-board neural network rates this image as having a hot dog probability of 1. We're going to be rich!)

The fortunes of neural networks were revived by a famous 1986 paper that introduced the concept of backpropagation, a practical method to train deep neural networks.

Suppose you're an engineer at a fictional software company who has been assigned to build software that determines whether or not an image contains a hot dog. You start with a randomly initialized neural network that takes an image as an input and outputs a value between 0 and 1, with 1 meaning "hot dog" and 0 meaning "not a hot dog."

To train the network, you assemble thousands of images, each with a label indicating whether or not the image contains a hot dog. You feed the first image—which happens to contain a hot dog—into the neural network. It produces an output value of 0.07—indicating no hot dog. That's the wrong answer; the network should have produced a value close to 1.
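To make that concrete, here's a minimal sketch of such a setup in Python. The sizes, variable names, and the predict helper are illustrative assumptions (a 32x32 grayscale image, one hidden layer of 16 neurons, and a single sigmoid output), not the architecture of any real hot-dog detector:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Illustrative sizes: a 32x32 grayscale image flattened into 1,024 inputs,
# one hidden layer of 16 neurons, and a single output neuron.
n_inputs, n_hidden = 32 * 32, 16

# Randomly initialized weights: the untrained network knows nothing about hot dogs yet.
W1 = rng.normal(0.0, 0.1, size=(n_hidden, n_inputs))
b1 = np.zeros(n_hidden)
w2 = rng.normal(0.0, 0.1, size=n_hidden)
b2 = 0.0

def sigmoid(z):
    # Squash any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def predict(image):
    # Forward pass: pixels -> hidden activations -> hot-dog score between 0 and 1.
    hidden = sigmoid(W1 @ image + b1)
    return sigmoid(w2 @ hidden + b2)

# One labeled training example; the pixel values here are a random stand-in
# for an image that actually contains a hot dog (label = 1).
image, label = rng.random(n_inputs), 1.0

print(f"network says {predict(image):.2f}, correct answer is {label}")
```

Because the weights are random, the network's first answer is essentially a guess; training is the process of turning that guess into something meaningful.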

The goal of the backpropagation algorithm is to adjust input weights so that the network will produce a higher value if it is shown this picture again—and, hopefully, other images containing hot dogs. To do this, the backpropagation algorithm starts by examining the inputs to the neuron in the output layer. Each input value has a weight variable. The backpropagation algorithm will adjust each weight in a direction that would have produced a higher value. The larger an input's value, the more its weight gets increased.
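In code, that adjustment for the single output neuron looks roughly like the classic delta rule. Everything below (the stand-in activations, the learning rate of 0.5) is an illustrative assumption, and a full gradient step would also scale the update by the sigmoid's derivative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(seed=1)

# Stand-in values for illustration: the activations feeding the output neuron,
# that neuron's weights, and the correct label for this image ("hot dog" = 1).
inputs = rng.random(16)
weights = rng.normal(0.0, 0.1, size=16)
bias = 0.0
target = 1.0
learning_rate = 0.5

output = sigmoid(weights @ inputs + bias)
error = target - output          # positive: the output should have been higher

# Nudge each weight in the direction that raises the output.
# Larger inputs get proportionally larger adjustments, as described above.
weights += learning_rate * error * inputs
bias += learning_rate * error

print(f"before: {output:.2f}, after one nudge: {sigmoid(weights @ inputs + bias):.2f}")
```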

So far I've just described a simple weight-adjustment procedure that would have been familiar to researchers in the 1960s. The breakthrough of backpropagation was the next step: the algorithm uses partial derivatives to apportion "blame" for the wrong output among the neuron's inputs. The algorithm computes how the final neuron's output would have been affected by a small change in each input value, and whether that change would have pushed the result closer to the right answer or away from it.
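That blame assignment is just the chain rule. For a sigmoid output neuron, the sensitivity of the output to its i-th input is output * (1 - output) * w_i, so a sketch of the computation (with the same kind of stand-in values as before) might look like this:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(seed=2)

inputs = rng.random(16)                      # the previous layer's activations
weights = rng.normal(0.0, 0.1, size=16)      # the output neuron's weights
target = 1.0                                 # this image really is a hot dog

output = sigmoid(weights @ inputs)
output_error = target - output               # how wrong the final answer was

# Partial derivative of the output with respect to each input value,
# via the chain rule through the sigmoid: d(output)/d(input_i) = output*(1-output)*w_i
sensitivity = output * (1.0 - output) * weights

# Each input's share of the blame: positive means "this value should have been
# higher," negative means "lower," scaled by how much it could move the output.
input_errors = output_error * sensitivity
print(np.round(input_errors, 4))
```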

The result is a set of error values for each neuron in the second-to-last layer: essentially, a signal estimating whether each neuron's value was too high or too low. The algorithm then repeats the adjustment process for the neurons in that second-to-last layer. For each neuron, it makes small changes to the input weights to nudge the network closer to the correct answer.
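Putting those pieces together, one full backpropagation step for the toy two-layer network sketched earlier might look like the following. The sizes and names are the same illustrative assumptions as before, with a squared-error loss assumed:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(seed=3)
n_inputs, n_hidden = 32 * 32, 16
lr = 0.5

W1 = rng.normal(0.0, 0.1, size=(n_hidden, n_inputs))
b1 = np.zeros(n_hidden)
w2 = rng.normal(0.0, 0.1, size=n_hidden)
b2 = 0.0

image, label = rng.random(n_inputs), 1.0     # stand-in "hot dog" example

# Forward pass, keeping the intermediate activations backpropagation needs.
hidden = sigmoid(W1 @ image + b1)
output = sigmoid(w2 @ hidden + b2)

# Output layer: how wrong was the answer, scaled by the sigmoid's slope there.
output_delta = (label - output) * output * (1.0 - output)

# Error signal for each second-to-last-layer neuron: its share of the blame,
# again scaled by that neuron's own sigmoid slope.
hidden_delta = (output_delta * w2) * hidden * (1.0 - hidden)

# Adjust both layers' weights; larger inputs get larger adjustments.
w2 += lr * output_delta * hidden
b2 += lr * output_delta
W1 += lr * np.outer(hidden_delta, image)
b1 += lr * hidden_delta

new_output = sigmoid(w2 @ sigmoid(W1 @ image + b1) + b2)
print(f"output before: {output:.3f}, after one step: {new_output:.3f}")
```

In a deeper network, the same error-propagation step is simply repeated layer by layer, walking backward from the output toward the input.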
