Computational neuroscience for learning from a small sample
Abstract
Deep neural networks have been remarkably useful for image classification and phoneme recognition. Combined with reinforcement learning algorithms, deep neural networks have outperformed human experts in simulated video games and the game of Go. To achieve such successes, millions of images, hundreds of millions of phonemes, and tens of millions of games were used as training data sets in supervised learning or as training trials in reinforcement learning. Meanwhile, in the 2015 DARPA Robotics Challenge Finals, many humanoid robots fell while walking on sand, going up stairs, turning valves, or getting out of a car. A small number of humanoids completed all the tasks, but they were far slower than humans. By age 5, human children are able to execute all of the above tasks more quickly and reliably than humanoid robots developed by the world's premier researchers. What could be the reasons for this dramatic contrast between success and failure for simulated versus real-world tasks in artificial intelligence? In the simulated video games and Go, the number of degrees of freedom of the controlled system was relatively small, there were no hidden variables, and state transitions were deterministic, free of noise, and perfectly described by simple rules. Thus, the computer simulations were exact. Because of this last property, tens of millions of simulated games can be generated by software players and used efficiently for deep Q-learning (a Q-learning algorithm of reinforcement learning combined with deep neural network learning). In contrast, a humanoid robot in the real world is a complicated nonlinear dynamical system with a huge number of degrees of freedom. Indeed, hidden states can be situated far above measured sensory signals and far below issued motor commands. Many physical processes, including contact and friction, are difficult to model.
Mainly for this last reason, quantitatively reliable simulations of humanoid robots in real-world environments are extremely difficult, if not impossible. Thus, reinforcement learning in humanoids designed to operate in the real world has typically been conducted using real experimental trials. However, when humanoids fall, they are often damaged such that no further trials can be accumulated before painful, expensive, and laborious repairs are made. In artificial intelligence, or more precisely in neural network learning and machine learning, it is well established that a learning system with a fixed number of degrees of freedom n requires approximately 10n training samples. If tens of millions of learning trials can be conducted, a large learning system such as a deep neural network can be used. However, if only 100 trials can be accumulated, only very simple learning systems with about ten degrees of freedom should be used, in order to avoid over-fitting. I postulate that this difference in the number of training samples, and the resulting difference in the permissible degrees of freedom of the control system, readily explains the dramatic contrast between the success of simulated learning and the failure of real-world learning mentioned above.
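The sample-size rule in the preceding paragraph can be made concrete with a minimal sketch. The function names and the default factor of 10 samples per degree of freedom are illustrative assumptions taken from the "approximately 10n" rule in the text; the figure of 10 million trials is simply the lower end of "tens of millions".

```python
def required_trials(n_dof, samples_per_dof=10):
    """Training samples needed for n_dof degrees of freedom
    under the ~10n rule quoted in the text."""
    return samples_per_dof * n_dof


def max_degrees_of_freedom(n_trials, samples_per_dof=10):
    """Largest learning system the same rule allows
    when only n_trials samples are available."""
    return n_trials // samples_per_dof


# Real robot: about 100 trials before repairs become prohibitive.
robot_dof = max_degrees_of_freedom(100)

# Simulated games: 10 million trials (illustrative lower bound).
simulation_dof = max_degrees_of_freedom(10_000_000)

print(robot_dof, simulation_dof)
```

Under these assumptions the real-robot budget supports a system with about ten degrees of freedom, whereas the simulated budget supports about a million, which is the gap the paragraph attributes to the number of available trials.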
Animal brains are confronted with sensorimotor problems that are much more challenging than those faced by humanoid robots. Animal bodies are flexible and possess an enormous number of muscles, sensors, and motor neurons. Neurons are slow computing devices with a significant degree of noise. Thus, physical modeling of animal movements is very difficult, as there are many degrees of freedom, hidden variables, a high noise level, and a risk of injury or death in the case of failure. The human brain contains 10^11 neurons and 10^14 synapses. As a learning control system, it has an enormous number of degrees of freedom. If we assume that the number of synapses corresponds to the number of degrees of freedom of the learning system, and that a single reinforcement learning trial can be obtained within 10 seconds, then it follows that an animal brain would need 10^15 training trials, and thus 10^16 seconds of learning time, to avoid over-fitting. This period is much longer than an animal's lifetime. In contrast to this estimate, humans learn motor control very quickly. For example, humans can learn a new dynamic environment within a few trials. Human infants learn to walk after only several thousand falls. Through computational neuroscience research on sensorimotor learning, I hope to understand a mystery that breaks the common sense of artificial intelligence: a learning system with 10^11 degrees of freedom can learn to control an extremely complicated nonlinear dynamical system after only 1,000 failures. Kawato and Samejima (2007) reviewed several computational schemes for enabling efficient reinforcement learning from a small number of training samples. They include internal models, sparse estimation algorithms, multiple paired forward and inverse models, and hierarchical reinforcement learning algorithms.
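The learning-time estimate above can be written out step by step. The synapse count, the 10 seconds per trial, and the factor of 10 samples per degree of freedom come from the text; the conversion to years (about 3.15 x 10^7 seconds per year) is added here only to give a sense of scale.

```latex
\begin{align*}
n &\approx 10^{14}
  && \text{(synapses taken as degrees of freedom)} \\
N_{\text{trials}} &\approx 10\,n = 10^{15}
  && \text{(about 10 samples per degree of freedom)} \\
T &\approx N_{\text{trials}} \times 10\ \mathrm{s} = 10^{16}\ \mathrm{s}
  && \text{(10 s per trial)} \\
T &\approx \frac{10^{16}\ \mathrm{s}}{3.15 \times 10^{7}\ \mathrm{s/yr}}
  \approx 3 \times 10^{8}\ \text{years}
\end{align*}
```

On these assumptions the required learning time is on the order of hundreds of millions of years, which makes the contrast with learning to walk after a few thousand falls explicit.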
Attention, consciousness, metacognition, and episodic memory are important research topics in cognitive neuroscience, and have recently attracted the interest of artificial intelligence researchers in the hope that they could provide computational mechanisms for reducing the high dimensionality of data in learning. They may play essential roles in constructing abstract concepts, dimensions, and attributes, which are the high-level representations necessary in the upper layers of hierarchical reinforcement learning. With respect to reducing the dimensionality of high-dimensional data, electrical synapses, which transmit information via gap junctions, are attractive elements in neuronal circuits because they tend to synchronize neurons and thereby effectively reduce the degrees of freedom of the circuit.
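The claim that coupling reduces the effective degrees of freedom of a circuit can be illustrated with a toy simulation. This is only a sketch under strong assumptions and is not a model of real neurons: the units are noisy leaky integrators, gap junctions are replaced by hypothetical all-to-all diffusive coupling, and the participation ratio of the covariance matrix is used as one possible proxy for the effective number of degrees of freedom. All names and parameter values are illustrative.

```python
import math
import random


def simulate(n_units, coupling, steps=20000, dt=0.02, record_every=10, seed=0):
    """Euler-Maruyama simulation of noisy leaky units with all-to-all
    diffusive coupling (a crude stand-in for electrical synapses)."""
    rng = random.Random(seed)
    x = [0.0] * n_units
    noise_scale = math.sqrt(dt)
    samples = []
    for step in range(steps):
        mean_x = sum(x) / n_units
        # Leak toward zero, pull toward the population mean, add noise.
        x = [xi + dt * (-xi + coupling * (mean_x - xi))
             + noise_scale * rng.gauss(0.0, 1.0) for xi in x]
        if step % record_every == 0:
            samples.append(x)
    return samples


def participation_ratio(samples):
    """(trace C)^2 / trace(C^2) for the sample covariance C.
    Close to 1 when one mode dominates; close to the number of
    units when all modes carry similar variance."""
    n = len(samples[0])
    t = len(samples)
    means = [sum(s[i] for s in samples) / t for i in range(n)]
    centered = [[s[i] - means[i] for i in range(n)] for s in samples]
    cov = [[sum(c[i] * c[j] for c in centered) / (t - 1) for j in range(n)]
           for i in range(n)]
    trace = sum(cov[i][i] for i in range(n))
    # For a symmetric matrix, trace(C^2) is the sum of squared entries.
    trace_sq = sum(cov[i][j] ** 2 for i in range(n) for j in range(n))
    return trace ** 2 / trace_sq


pr_uncoupled = participation_ratio(simulate(n_units=10, coupling=0.0))
pr_coupled = participation_ratio(simulate(n_units=10, coupling=20.0))
print(pr_uncoupled, pr_coupled)
```

In this toy setting, strong coupling damps the differences between units while leaving their shared fluctuation intact, so the coupled network yields a clearly lower participation ratio than the uncoupled one.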
The cerebellum is important for motor control and motor learning and plays a very important role in multi-joint movements such as walking. The inferior olivary (IO) nucleus, which possesses the highest density of gap junctions in the mammalian brain, sends climbing fiber inputs to Purkinje cells (PCs), the only output of the cerebellar cortex for all motor coordination. I focus on the olivo-cerebellar system because it is a good candidate for a neuronal system that plays a central role in motor learning and that may be useful in investigating the above-mentioned disparity between the large number of degrees of freedom of learning systems and conditions in which only a small number of training trials are available. Of special interest is the network of IO neurons, which may control the degrees of freedom by adjusting their synchronous/asynchronous firing activity to provide an adaptive framework for the learning machinery. In cerebellar motor learning, IO neurons are known to transmit error signals to PCs, inducing plasticity at the parallel fiber-PC synapses. Recent investigations have also revealed multiple plasticity mechanisms, as well as evidence that parallel fiber-evoked simple spikes of PCs contribute to cerebellum-dependent learning to some extent. One dominant view over the last several decades holds that complex spikes triggered by climbing fiber inputs provide instructive signals to PCs to drive learning. Computational modeling has been one of the promising driving forces for examining the functions of the IO. As the carrier of the teaching signals, the IO has been modeled as the source of climbing fiber inputs in simulation studies of cerebellar learning. To explore IO dynamics in detail, a class of simplified conductance-based models has been developed to reproduce experimental observations of sub-threshold oscillations.
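The idea that climbing fiber error signals drive plasticity at parallel fiber-PC synapses can be sketched as a simple error-correcting weight update. This is a textbook-style delta rule used purely for illustration, not one of the models reviewed here: the linear PC output, the signed error, the hypothetical target weights, and all parameter values are assumptions.

```python
import random


def train_pf_pc_weights(n_fibers=5, n_trials=300, learning_rate=0.3, seed=1):
    """Return the absolute climbing fiber error on each trial."""
    rng = random.Random(seed)
    # Hypothetical weights that would produce the desired PC output.
    target_w = [rng.uniform(-1.0, 1.0) for _ in range(n_fibers)]
    w = [0.0] * n_fibers  # parallel fiber-PC weights being learned
    errors = []
    for _ in range(n_trials):
        # Parallel fiber activity on this trial (arbitrary units in [0, 1)).
        pf = [rng.random() for _ in range(n_fibers)]
        pc_output = sum(wi * xi for wi, xi in zip(w, pf))
        desired = sum(ti * xi for ti, xi in zip(target_w, pf))
        # Error signal assumed to be conveyed by the climbing fiber.
        cf_error = pc_output - desired
        # Synapses active together with a positive error are weakened;
        # with a negative error they are strengthened.
        w = [wi - learning_rate * cf_error * xi for wi, xi in zip(w, pf)]
        errors.append(abs(cf_error))
    return errors


errors = train_pf_pc_weights()
early_error = sum(errors[:20]) / 20
late_error = sum(errors[-20:]) / 20
print(early_error, late_error)
```

With these settings the average error over the last 20 trials is much smaller than over the first 20. The sketch has only five adjustable weights, so it says nothing about how a system with far more degrees of freedom could learn from few trials, which is the question the text raises.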
Further details of the electrophysiological properties of IO neurons have been described by multi-compartment models, which have been applied to explain experimental observations of sub-threshold activity, to examine the neurons' capacity for information transmission, and to estimate conductance levels of the IO network from experimental data. Owing to advanced experimental methods and the rapid growth in computing power, computational models are now used for quantitative understanding of experimentally measured IO dynamics and, furthermore, for testing hypotheses regarding IO functions. Here, I review recent advances in the computational modeling of the olivo-cerebellar system.