Right now there is a lot of hype around Machine Learning and Big Data across the tech world. This is not surprising, as they have played a significant role in automation, business advancement and prediction. Alongside them, Deep Learning has also become a popular term in recent times. One interesting fact about deep learning is that it was largely abandoned in the late 1980s, until in 2007 Geoffrey Hinton introduced an algorithm that revived research in it.

Before I begin the story of how Deep Learning evolved, let’s first understand why Deep Learning is needed.

- Deep Learning is based on training a neural network with many hidden layers. The major benefit of using deep networks is **node-efficiency**, which means it is often possible to approximate complex functions to the same accuracy using a deeper network with far fewer total nodes than a 2-hidden-layer network with a huge number of nodes.

- Intuitively, the reason a smaller, deeper network can be more effective than a shallower network of equal size (in total nodes) is that a deep network **reduces the amount of redundant work.** With very deep networks, it is possible to model functions that work with many layers of abstraction, for example classifying the gender of faces in images, or the breed of dogs. It is not practical to perform these tasks using shallow networks, because of the redundant work done.
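To make the node-efficiency idea concrete, here is a small sketch that uses Boolean circuits as a stand-in for neural networks (an analogy I am assuming for illustration, not something taken from the results above). It computes the parity of `n` bits in two ways: a deep chain of XOR gates that reuses each intermediate result, and a flat two-level OR-of-ANDs form that has to list every odd-weight input pattern separately. The function names are made up for this example.

```python
from itertools import product


def parity_deep(bits):
    """Deep form: a chain of 2-input XOR gates, one gate per layer."""
    acc = bits[0]
    gates = 0
    for b in bits[1:]:
        acc ^= b      # each layer reuses the result of the layer before it
        gates += 1
    return acc, gates


def parity_shallow_terms(n):
    """Shallow form: one AND term for every input pattern with odd parity."""
    return [p for p in product((0, 1), repeat=n) if sum(p) % 2 == 1]


def parity_shallow(bits, terms):
    """OR over all AND terms; a term fires only on an exact pattern match."""
    return int(any(tuple(bits) == t for t in terms))


n = 8
terms = parity_shallow_terms(n)

# Both forms compute the same function on every possible input.
for bits in product((0, 1), repeat=n):
    deep_out, deep_gates = parity_deep(bits)
    assert deep_out == parity_shallow(bits, terms)

print("deep chain gates:", deep_gates)       # n - 1 gates
print("shallow AND terms:", len(terms))      # 2 ** (n - 1) terms
```

For 8 inputs the deep chain needs 7 gates, while the flat form needs 128 terms, and that gap doubles with every extra input. Real neural networks are not Boolean circuits, so treat this only as an intuition for why depth can save nodes.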

The computational efficiency of deep networks has been known for a very long time, and deep networks were attempted as early as the 1980s. In 1989, LeCun et al. successfully applied a 3-hidden-layer network to ZIP code recognition. However, they were not able to scale to a higher number of layers or to more complex problems because training was very slow, and the reason for this was not understood at the time. In 1991, Hochreiter identified the problem and called it “Vanishing Gradients”. Essentially, as errors are propagated back from the output layer, they are multiplied by the derivatives of the activation function. As soon as the propagation reaches a node in saturation (where the derivative is close to 0), the error is reduced to the level of noise, and the nodes behind the saturated node train extremely slowly.
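The effect is easy to see numerically. The sketch below is a toy setup of my own, not the experiment from any of the papers mentioned: it assumes a chain of single-unit layers with a sigmoid activation and a fixed weight of 1.0, and tracks how much an error signal is scaled as it travels back through the chain.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid; its maximum value is 0.25, at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def backprop_scale(pre_activations, weight=1.0):
    """Scale factor applied to the error after passing back through each layer.

    pre_activations is ordered from the first layer to the last, so the
    backward pass visits it in reverse.
    """
    scale = 1.0
    history = []
    for z in reversed(pre_activations):
        scale *= weight * sigmoid_grad(z)
        history.append(scale)
    return history


# Ten layers, none saturated: every layer still multiplies the error by 0.25.
healthy = backprop_scale([0.0] * 10)

# Same depth, but the layer closest to the output is saturated (z = 6).
saturated = backprop_scale([0.0] * 9 + [6.0])

print("after 10 unsaturated layers:", healthy[-1])
print("right after the saturated layer:", saturated[0])
print("after all 10 layers:", saturated[-1])
```

Even in the best case the sigmoid derivative is only 0.25, so the error shrinks geometrically with depth. A single saturated node makes it far worse, because everything behind that node receives an error signal that has already been multiplied by a value close to 0.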

No effective solution to the problem was found for many years. Researchers continued to try to build deep networks, but often with disappointing performance.

No further improvement was seen in Deep Learning until 2007, when Geoffrey Hinton proposed a solution to the problem and started the current wave of renewed interest in deep learning. The idea was to first train the network layer by layer in a greedy, unsupervised fashion, and only then train the whole network on the labeled data. After this, several other ideas were proposed, such as Dropout, Momentum, and ReLU with L1 regularization, which greatly enhanced the performance of Deep Neural Networks.

One noteworthy achievement of deep learning is the competition it has sparked between tech giants.

The ImageNet Challenge dataset contains about 1.5 million images, and the network is expected to recognize 1,000 different categories of things.

Deep Learning is still considered to be in its early stages. As computational power increases, the possibilities for Deep Networks are growing immensely as well.
