====== Learning algorithm ======

Reinforcement [[learning algorithm]]s aim to determine the ideal behavior within a specific context based on simple reward feedback on their actions; the self-driving car is a typical example.

Supervised learning [[algorithm]]s are trained on prelabeled data, referred to as the training set ((Azimi P, Mohammadi HR, Benzel EC, Shahzadi S, Azhari S, Montazeri A. Artificial neural networks in neurosurgery. J Neurol Neurosurg Psychiatry. 2015;86(3):251-256.)). [[Training]] is an iterative process in which [[machine learning]] (ML) algorithms search for the optimal combination of weights assigned to the [[input]] [[variable]]s of the model (referred to as features), with the goal of minimizing the [[training]] error, judged by the difference between the predicted [[outcome]] and the actual outcome ((Deo RC. Machine learning in medicine. Circulation. 2015;132(20):1920-1930.)).

----

Several studies have shown that natural gradient descent for on-line learning is much more efficient than standard gradient descent. In this article, we derive natural gradients in a slightly different manner and discuss implications for batch-mode learning and pruning, linking them to existing algorithms such as Levenberg-Marquardt optimization and optimal brain surgeon. The Fisher matrix plays an important role in all these algorithms. The second half of the article discusses a layered approximation of the Fisher matrix specific to multilayered perceptrons. Using this approximation rather than the exact Fisher matrix, we arrive at much faster "natural" learning algorithms and more robust pruning procedures ((Heskes T. On 'natural' learning and pruning in multi-layered perceptrons. Neural Comput. 2000 Apr;12(4):881-901. PubMed PMID: 10770836.)).
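The iterative training process described above can be sketched in a few lines of Python. This is a minimal illustration, not any specific study's method: it assumes a linear model, squared error, and a fixed learning rate, and all names (''train'', ''features'', ''outcomes'') are illustrative.

```python
# Sketch of supervised training by gradient descent: iteratively adjust the
# weights given to the input features so as to minimize the training error,
# measured as the squared difference between predicted and actual outcome.
# (Illustrative only: linear model, fixed learning rate.)

def train(features, outcomes, epochs=1000, lr=0.01):
    weights = [0.0] * len(features[0])
    for _ in range(epochs):
        for x, y in zip(features, outcomes):
            pred = sum(w * xi for w, xi in zip(weights, x))
            err = pred - y  # training error on this example
            # gradient of the squared error w.r.t. each weight is 2 * err * xi
            weights = [w - lr * 2 * err * xi for w, xi in zip(weights, x)]
    return weights

# Fit y = 2*x + 1; the second feature is a constant bias term.
X = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
y = [3.0, 5.0, 7.0]
w = train(X, y)  # converges toward [2.0, 1.0]
```

Each pass over the training set nudges the weights in the direction that reduces the error, which is why training stops improving once the predicted and actual outcomes agree.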
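The abstract above hinges on preconditioning the ordinary gradient with the inverse Fisher matrix. As a generic illustration of that idea (not the paper's layered approximation for multilayered perceptrons), for a linear model with Gaussian noise the Fisher matrix reduces to ''X.T @ X / n'', and the natural-gradient step ''w -= F⁻¹ · grad'' lands on the least-squares solution in one update. All data and variable names here are made up for the sketch.

```python
import numpy as np

# Synthetic linear-regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # inputs (features)
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)   # noisy outcomes

# For a Gaussian linear model with unit noise, the Fisher information
# matrix of the weights is the (scaled) input covariance X^T X / n.
F = X.T @ X / len(X)

w = np.zeros(3)
for _ in range(5):
    grad = X.T @ (X @ w - y) / len(X)         # ordinary gradient of MSE/2
    w -= np.linalg.solve(F, grad)             # natural-gradient update F^{-1} grad
```

Standard gradient descent would need many small steps on the same data; rescaling by the inverse Fisher matrix removes the dependence on the curvature of the error surface, which is the efficiency gain the abstract refers to.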