Source: analyticsindiamag.com
Recently, researchers at Alphabet’s DeepMind and the University of California, Berkeley proposed a framework for directly comparing the behaviour of children and AI agents, with the aim of developing new exploration techniques.
While developing a reinforcement learning agent, researchers face several questions about exploratory behaviour, chief among them: how should an agent gather enough experience across different environments to produce optimal behaviour? According to the researchers, exploration is considered one of the most fundamental problems in RL.
A large body of research has addressed this issue, yet exploration remains unsolved. The algorithms that have achieved state-of-the-art performance on popular benchmarks such as Atari often rely on simple exploration strategies, such as ε-greedy, combined with huge amounts of computation.
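For context, ε-greedy is about as simple as exploration strategies get: with a small probability the agent picks a random action, and otherwise it takes the action it currently believes is best. A minimal illustrative sketch in Python (not the code used by any of the benchmarked agents):

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon take a uniformly random action (explore);
    otherwise take the action with the highest estimated value (exploit)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

# estimated action values for four hypothetical actions
q_values = np.array([0.2, 0.5, 0.1, 0.4])
print(epsilon_greedy(q_values))  # usually 1, occasionally a random index
```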
From A Child’s Perspective
In the paper, the researchers perform direct, controlled comparisons between children and agents, leveraging insights from children’s exploratory behaviour to improve the design of RL algorithms.
Humans possess an innate drive to explore their environment from the day they are born. Children are active and curious learners who explore their surroundings thoroughly and efficiently to learn new things.
According to the researchers, recent evidence suggests that children explore more than adults do, and that this broader exploration supports faster learning. The generalisation and rapid learning that result from children’s exploration stand in sharp contrast to what modern RL agents exhibit.
There are three main reasons for comparing children’s behaviour with that of an RL agent:
- Ecological Validity: According to the researchers, one crucial reason for gathering data from children and RL agents in the same environment is that it allows reinforcement learning agents to be evaluated in a more ecologically valid setting, in contrast to grid-world-like settings, 2D Atari games and similar benchmarks.
- Controlled Comparisons: Comparing children directly to reinforcement learning agents aims to provide a standard baseline for evaluating agent behaviour and can help identify promising areas of research in deep RL.
- Cognitive Modelling: In addition to the above, direct comparisons between children and RL agents give strong direction to the development of new cognitive models of behaviour, furthering the “virtuous cycle” between cognitive science and AI.
The researchers presented a methodology based on DeepMind Lab for directly comparing child and RL agent behaviour in simulated exploration tasks. This allowed the researchers to precisely test questions about how children explore, how agents explore, and how and why they differ.
Further, using this methodology, the researchers proposed two candidate experiments designed to test key qualitative predictions of different exploration algorithms against what is known about children’s exploratory behaviour in other domains. The two experiments are free versus goal-directed exploration and sparse versus dense rewards. They address questions such as how much children and agents are willing to explore, whether free and goal-directed exploration strategies differ, and how reward shaping affects exploration.
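To make the sparse versus dense distinction concrete, here is a small illustrative sketch of two hypothetical reward functions for a navigation task; the actual shaping used in the experiments is not described in this article, so the functional forms below are assumptions:

```python
import numpy as np

def sparse_reward(agent_pos, goal_pos, tol=0.5):
    """Reward only when the agent actually reaches the goal; zero elsewhere."""
    return 1.0 if np.linalg.norm(agent_pos - goal_pos) < tol else 0.0

def dense_reward(agent_pos, goal_pos, scale=0.1):
    """Shaped reward that increases as the agent gets closer to the goal,
    giving a learning signal on every step."""
    return -scale * np.linalg.norm(agent_pos - goal_pos)
```

Under the dense variant, simply following the reward already steers the agent toward the goal, which is one intuition for why shaped rewards can suppress exploration.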
How It Works
DeepMind Lab is a learning environment, based on the Quake game engine, that provides a suite of challenging 3D navigation and puzzle-solving tasks for learning agents. The researchers proposed a unified environment with tasks for training and evaluating both humans and agents.
According to the researchers, these tasks require physical or spatial navigation capabilities to complete and are modelled after games that children play themselves. In the experimental setup, children interact with the DeepMind Lab environment through a custom Arduino-based controller. The controller exposes the same four actions that agents use in this environment: move forward, move back, move left, and turn right.
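For readers who want a feel for the agent side, the sketch below shows how an interaction loop with DeepMind Lab might look when the raw action space is collapsed to a handful of discrete moves. It assumes the open-source deepmind_lab Python bindings; the level name, action-vector layout and turn magnitude are placeholders rather than the paper’s actual configuration:

```python
import numpy as np
import deepmind_lab

# Map the four controller actions onto DeepMind Lab's raw action vector.
# The engine's action dimensions are roughly (look left/right, look up/down,
# strafe, move back/forward, fire, jump, crouch); exact ordering may differ.
DISCRETE_ACTIONS = {
    'forward':    np.array([0, 0, 0, 1, 0, 0, 0], dtype=np.intc),
    'backward':   np.array([0, 0, 0, -1, 0, 0, 0], dtype=np.intc),
    'move_left':  np.array([0, 0, -1, 0, 0, 0, 0], dtype=np.intc),
    'turn_right': np.array([20, 0, 0, 0, 0, 0, 0], dtype=np.intc),
}

env = deepmind_lab.Lab('nav_maze_static_01', ['RGB_INTERLEAVED'],
                       config={'width': '96', 'height': '72'})
env.reset()

total_reward, steps = 0.0, 0
while env.is_running() and steps < 1000:
    frame = env.observations()['RGB_INTERLEAVED']  # first-person RGB frame
    action = DISCRETE_ACTIONS['forward']           # a real agent chooses here
    total_reward += env.step(action, num_steps=4)  # repeat for 4 game frames
    steps += 1
```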
Wrapping Up
In the first experiment, free versus goal-directed exploration, the researchers found that children’s search strategies differ between the no-goal and goal conditions. Comparing children’s behaviour with that of a depth-first search (DFS) agent showed that in the no-goal condition, children made choices consistent with DFS 89.61 per cent of the time, compared with the goal condition, in which children made choices consistent with DFS 96.04 per cent of the time. In the second experiment, sparse versus dense rewards, the researchers found that children are less likely to explore an area in the dense rewards condition than in the sparse rewards condition.
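As an illustration of what “choices consistent with DFS” might mean in practice, the sketch below scores a sequence of maze choices against a depth-first explorer that always prefers an unvisited corridor and only backtracks when none remains. This is an assumed scoring scheme for illustration, not the paper’s exact analysis:

```python
def dfs_consistency(graph, path):
    """Fraction of moves along `path` that a depth-first explorer could have made:
    from the current node, DFS prefers any unvisited neighbour; only when all
    neighbours have been visited does it backtrack one step along its stack."""
    visited, stack = {path[0]}, [path[0]]
    consistent = 0
    for prev, nxt in zip(path, path[1:]):
        unvisited = [n for n in graph[prev] if n not in visited]
        if unvisited:
            ok = nxt in unvisited                  # DFS would go deeper
        else:
            ok = stack[:-1] and nxt == stack[-2]   # DFS would backtrack
        consistent += bool(ok)
        # update the DFS bookkeeping with the move actually taken
        if nxt in visited and stack[:-1] and nxt == stack[-2]:
            stack.pop()
        else:
            stack.append(nxt)
        visited.add(nxt)
    return consistent / (len(path) - 1)

# toy T-shaped maze: junction 'B' with two corridors 'C' and 'D'
maze = {'A': ['B'], 'B': ['A', 'C', 'D'], 'C': ['B'], 'D': ['B']}
print(dfs_consistency(maze, ['A', 'B', 'C', 'B', 'D']))  # 1.0, fully DFS-consistent
```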