“Artificial intelligence is a branch of computer science dealing with the simulation of intelligent behavior in computers.”
Merriam-Webster dictionary
The first step forward in the evolution of neural networks occurred in 1943, when McCulloch and Pitts, a neurophysiologist and a mathematician respectively, published an article describing how neurons might work.
They then built a model of it, using an electrical circuit, which allowed the computation of simple Boolean functions. Later, in 1949, Donald Hebb reinforced the concept of the neuron in his publication “The Organization of Behavior”, concluding that every time the path between two neurons A and B was used, the connection between them became stronger. This observation corresponds to today’s crude definition of the weight of a connection between two neurons. There was no major news until the late ’50s: in 1958, with the model presented by Rosenblatt, the perceptron was born. It differs from the MCP neuron in its activation function, a threshold or step function, and Rosenblatt patented a supervised learning algorithm for this modified MCP neuron, which allowed it to find the right weights directly from training. This discovery made it possible to tackle various types of problems, including binary classification, as sketched below.
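As an illustration only (not Rosenblatt’s original formulation), a minimal perceptron with a step activation and the classic error-driven weight update might look like the following Python sketch; the learning rate, number of epochs and the choice of the logical AND as a simple, linearly separable binary classification task are assumptions made for the example:

```python
import numpy as np

def step(z):
    # Threshold (step) activation: fires 1 if the weighted sum reaches 0
    return 1 if z >= 0 else 0

def train_perceptron(X, y, lr=0.1, epochs=20):
    """X: (n_samples, n_features) inputs, y: labels in {0, 1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = step(np.dot(w, xi) + b)
            error = target - pred      # non-zero only on a misclassification
            w += lr * error * xi       # weights adjusted directly from training
            b += lr * error
    return w, b

# Example: learning the logical AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([step(np.dot(w, xi) + b) for xi in X])   # expected: [0, 0, 0, 1]
```

Because AND is linearly separable, the update rule converges to a separating set of weights; the same single perceptron cannot learn XOR, which is what motivated the later multilayer networks.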
The AI winter(s) and the rise
Perceptron research continued without progress from 1974 to 1980 (the first AI winter), until in 1986 Hinton, Rumelhart and Williams introduced the concepts of the hidden layer and backpropagation (a procedure that optimizes the weights of a network). These were (and still are) fundamental to the emergence of the MLP (multilayer perceptron), a feedforward network with at least three layers (hidden layers ≥ 1); a minimal sketch follows. From here on the evolution of artificial intelligence flattened out, without any particular fanfare: the second winter set in, lasting until the advent of robust classifiers such as SVMs and multi-classifiers in the late ’90s.
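To make the idea concrete, here is a minimal, illustrative sketch (not the 1986 formulation) of an MLP with one hidden layer trained by backpropagation on XOR, the task a single perceptron cannot solve; the hidden-layer size, learning rate, sigmoid activation and number of epochs are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: not linearly separable, so it needs at least one hidden layer
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer with 8 units: input layer, hidden layer, output layer
W1 = rng.normal(scale=1.0, size=(2, 8)); b1 = np.zeros((1, 8))
W2 = rng.normal(scale=1.0, size=(8, 1)); b2 = np.zeros((1, 1))

lr = 1.0
for epoch in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)          # hidden-layer activations
    out = sigmoid(h @ W2 + b2)        # network output

    # backpropagation: push the output error back through the layers
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # gradient-descent step: this is the "optimization of the weights"
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

# typically rounds to [[0], [1], [1], [0]] (training is random-init dependent)
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))
```

The hidden layer gives the network the capacity to represent non-linear decision boundaries, and backpropagation supplies the gradients needed to adjust every weight in the stack rather than only those of a single output unit.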
In 2012 came the explosion of deep learning, an idea that had been around since the ’90s but had failed to blossom because it did not achieve the expected results. The main reasons were:
- lack of available data (big data)
- insufficient computing power (a problem since solved by the computational capacity of GPUs)
- low complexity of the models used