Source: venturebeat.com
In a preprint paper published this week on Arxiv.org, researchers at Intel describe Sample Factory, a system that achieves high throughput in reinforcement learning experiments: more than 100,000 environment frames per second. In contrast to the distributed servers and hardware setups such experiments typically require, Sample Factory is optimized for single-machine settings, enabling researchers to achieve what the coauthors claim are “unprecedented” results in AI training for video games, robotics, and other domains.
Training AI software agents in simulation is the cornerstone of contemporary reinforcement learning research. But despite improvements in the sample efficiency of leading methods, most remain notoriously data- and computation-hungry; performance gains have come in large part from the increased scale of experiments. Billion-scale experiments with complex environments are now relatively commonplace, and the most advanced efforts have agents take trillions of actions in a single training session.
Sample Factory targets efficiency with an algorithm called asynchronous proximal policy optimization (APPO), which aggressively parallelizes agent training and achieves throughput as high as 130,000 FPS (environment frames per second) on a single-GPU commodity PC. It minimizes idle time for all computations by assigning each workload to one of three types of components: rollout workers, policy workers, and learners. These components communicate through fast queues and shared memory; the queuing provides the basis for continuous, asynchronous execution, in which the next computation step can start immediately as long as there is something in the queue to process.
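To make that pipeline concrete, here is a minimal sketch of the three-component split using plain Python threads and queues. It is illustrative only: the names, the toy environment, and the trivial policy are assumptions for the example, and the actual system relies on separate processes, shared memory, and GPU-batched inference rather than threads.

```python
# Hypothetical sketch of the rollout-worker / policy-worker / learner split
# described above, using Python threads and queue.Queue. The real system uses
# separate processes, shared memory, and GPU-batched inference; every name and
# the toy environment here are illustrative only.
import queue
import random
import threading

NUM_ROLLOUT_WORKERS = 4
STEPS_PER_WORKER = 50
TOTAL_STEPS = NUM_ROLLOUT_WORKERS * STEPS_PER_WORKER

obs_queue = queue.Queue()          # rollout workers -> policy worker (observations)
action_queues = {i: queue.Queue() for i in range(NUM_ROLLOUT_WORKERS)}  # policy -> rollouts
trajectory_queue = queue.Queue()   # rollout workers -> learner (experience)


def rollout_worker(worker_id):
    """Steps a toy environment; never evaluates the policy itself."""
    obs = 0.0
    for _ in range(STEPS_PER_WORKER):
        obs_queue.put((worker_id, obs))          # request an action
        action = action_queues[worker_id].get()  # wait for the policy worker
        reward = random.random()                 # toy dynamics and reward
        next_obs = obs + action
        trajectory_queue.put((obs, action, reward, next_obs))
        obs = next_obs


def policy_worker():
    """Serves action requests from all rollout workers (stand-in for GPU inference)."""
    for _ in range(TOTAL_STEPS):
        worker_id, obs = obs_queue.get()
        action = 1.0 if obs < 10 else -1.0       # trivial stand-in policy
        action_queues[worker_id].put(action)


def learner():
    """Consumes experience asynchronously (stand-in for PPO-style updates)."""
    consumed = 0
    while consumed < TOTAL_STEPS:
        _transition = trajectory_queue.get()     # a real learner would batch these
        consumed += 1
    print(f"learner consumed {consumed} transitions")


threads = [threading.Thread(target=policy_worker), threading.Thread(target=learner)]
threads += [threading.Thread(target=rollout_worker, args=(i,)) for i in range(NUM_ROLLOUT_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each component blocks only on its own queue, a slow environment step on one rollout worker doesn’t stall inference or learning for the others, which is the idle-time reduction the queuing design is meant to deliver.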
To be clear, Sample Factory doesn’t enable experiments that couldn’t be performed before, but it accelerates them enough to make them practical on single-PC setups. At full throttle, even with multi-agent environments and large populations of agents, Sample Factory can generate and consume more than 1GB of data per second, and a typical model update takes less than 1 millisecond.
In experiments on two PCs (one with a 10-core CPU and a GTX 1080 Ti GPU, the other with a server-class 36-core CPU and a single RTX 2080 Ti), the researchers evaluated Sample Factory’s performance on three simulators: Atari, VizDoom (a Doom-based game engine used for AI research), and DeepMind Lab (a Quake III-based environment). They report that the system outperformed the baseline methods in most of the training scenarios, sustaining at least 10,000 frames per second with between 700 and 2,000 environments.
In one test, the researchers used Sample Factory to train an agent to solve a set of 30 environments simultaneously. In another, they trained eight agents in “duel” and “deathmatch” scenarios within VizDoom, after which the agents beat the in-game bots on the highest difficulty in 100% of matches. And in a third, they had eight agents battle against each other to accumulate 18 years of simulated experience, which enabled those agents to defeat scripted bots 78 times out of 100.
“We aim to democratize deep [reinforcement learning] and make it possible to train whole populations of agents on billions of environment transitions using widely available commodity hardware,” the coauthors wrote. “We believe this is an important area of research, as it can benefit any project that leverages model-free [reinforcement learning]. With our system architecture, researchers can iterate on their ideas faster, thus accelerating progress in the field.”