Then, once again, the algorithm uses partial derivatives to compute how the value of each input to the second-to-last layer contributed to the errors in that layer's output—and propagates the errors back to the third-to-last layer, where the process repeats once more.
I've just given a simplified summary of how backpropagation works. If you want the full mathematically rigorous details, I recommend seeking out a dedicated tutorial on the topic. For our purposes, the key point is that backpropagation radically expanded the scope of trainable neural networks. People were no longer limited to simple networks with one or two layers. They could build networks with five, ten, or fifty layers, and these networks could have arbitrarily complex internal structures.
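To make the idea a bit more concrete, here is a minimal sketch of backpropagation for a tiny three-layer network, written in plain NumPy. The layer sizes, the sigmoid activation, and the squared-error loss are illustrative choices for this sketch, not details drawn from any of the research described here.

```python
# A minimal sketch of backpropagation for a tiny fully connected network.
# Layer sizes, the sigmoid activation, and the squared-error loss are
# illustrative choices, not anything from the article above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Toy data: 4 examples with 3 features each, and a single target value.
X = rng.normal(size=(4, 3))
y = rng.normal(size=(4, 1))

# A three-layer network: 3 -> 5 -> 4 -> 1.
W1, b1 = rng.normal(size=(3, 5)) * 0.1, np.zeros(5)
W2, b2 = rng.normal(size=(5, 4)) * 0.1, np.zeros(4)
W3, b3 = rng.normal(size=(4, 1)) * 0.1, np.zeros(1)

learning_rate = 0.1
for step in range(1000):
    # Forward pass: each layer's output feeds the next layer.
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    out = a2 @ W3 + b3                      # linear output layer
    loss = np.mean((out - y) ** 2)

    # Backward pass: start from the error at the output...
    d_out = 2 * (out - y) / len(X)
    # ...compute gradients for the last layer's weights...
    dW3, db3 = a2.T @ d_out, d_out.sum(axis=0)
    # ...then use partial derivatives (the chain rule) to push the error
    # back to the second-to-last layer...
    d_a2 = d_out @ W3.T
    d_z2 = d_a2 * a2 * (1 - a2)             # derivative of the sigmoid
    dW2, db2 = a1.T @ d_z2, d_z2.sum(axis=0)
    # ...and repeat the same step for the layer before that.
    d_a1 = d_z2 @ W2.T
    d_z1 = d_a1 * a1 * (1 - a1)
    dW1, db1 = X.T @ d_z1, d_z1.sum(axis=0)

    # Gradient descent update for every layer's weights and biases.
    W3 -= learning_rate * dW3; b3 -= learning_rate * db3
    W2 -= learning_rate * dW2; b2 -= learning_rate * db2
    W1 -= learning_rate * dW1; b1 -= learning_rate * db1

print(f"final loss: {loss:.4f}")
```

Notice that the backward pass mirrors the forward pass in reverse: each layer's gradient is computed from the gradient of the layer after it, which is exactly the propagation of errors described above.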
The invention of backpropagation launched a second neural network boom that began producing practical results. In 1998, a group of AT&T researchers showed how to use neural networks to recognize the handwritten digits on checks, allowing automated check processing.
"The main message of this paper is that better pattern recognition systems can be built by relying more on automatic learning and less on hand-designed heuristics," the authors wrote.
Still, at this point neural networks were just one of many techniques in the toolboxes of machine learning researchers. When I took a course on artificial intelligence as a grad student in 2008, neural networks were listed as one of several techniques we could choose to implement for our final assignment. But deep learning was about to eclipse other techniques.
Big data shows the power of deep learning
(Picture above: RELAXATION DETECTED. BEACH PROBABILITY 1.0. BEGIN MAI TAI CONSUMPTION SUBROUTINE.)
Backpropagation made deeper networks more computationally tractable, but those deeper networks still required more computing resources than shallower networks. Research results in the 1990s and 2000s often suggested that there were diminishing returns to making neural networks more complex.
Then a 2012 paper—which described a neural network dubbed AlexNet after lead researcher Alex Krizhevsky—transformed people's thinking. It showed that dramatically deeper networks could deliver breakthrough performance, but only if they were combined with ample computing power and lots and lots of data.
AlexNet was developed by a trio of University of Toronto computer scientists who were entering an academic competition called ImageNet. The organizers of the competition had scraped the Internet and assembled a corpus of 1 million images—each of which was labeled with one of a thousand object categories like "cherry," "container ship," or "leopard." AI researchers were invited to train their machine learning software on some of the images and then try to guess the correct labels for other images the software hadn't seen before. Competing software chose five possible labels out of the thousand categories for each picture. The answer was judged successful if one of these five labels matched the actual label in the data set.
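To make the scoring rule concrete, here is a minimal sketch of how a top-5 error rate can be computed. The random scores and made-up labels are stand-ins for illustration, not actual competition data.

```python
# A minimal sketch of the top-5 scoring rule described above: a prediction
# counts as correct if the true label appears anywhere among the model's
# five guesses. The scores and labels below are made-up stand-ins.
import numpy as np

def top5_error_rate(scores, true_labels):
    """scores: (n_images, n_classes) array of model confidences.
    true_labels: (n_images,) array of correct class indices."""
    # Indices of the five highest-scoring classes for each image.
    top5 = np.argsort(scores, axis=1)[:, -5:]
    # An image is a "hit" if its true label is among those five guesses.
    hits = (top5 == true_labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

# Tiny fake example: 3 images, 1,000 candidate classes.
rng = np.random.default_rng(0)
scores = rng.random((3, 1000))
true_labels = np.array([17, 42, 999])
print(f"top-5 error rate: {top5_error_rate(scores, true_labels):.2%}")
```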
It was a hard problem, and prior to 2012 the results hadn't been very good. The 2011 winner had a top-5 error rate of 25%.
In 2012, the AlexNet team blew their rivals out of the water with a top-5 error rate of 16%. The nearest rival that year had a 26% error rate.