Source: analyticsindiamag.com
Reinforcement learning has become a foundational approach in the pursuit of artificial general intelligence. ICLR (the International Conference on Learning Representations) is one of the major AI conferences held every year. Of the more than 600 research papers accepted at this year's conference, around 44 are on reinforcement learning.
This article lists the top 10 papers on reinforcement learning from ICLR 2020 that one must read.
1| Graph Convolutional Reinforcement Learning
About: In this paper, the researchers proposed graph convolutional reinforcement learning. In this model, the graph convolution adapts to the dynamics of the underlying graph of the multi-agent environment, while relation kernels capture the interplay between agents through their relation representations. In simple words, the multi-agent environment is modelled as a graph, and graph convolutional reinforcement learning, also called DGN, is instantiated based on a deep Q-network and trained end-to-end.
According to the researchers, unlike other parameter-sharing methods, graph convolution enhances the cooperation of agents by allowing the policy to be optimised while jointly considering the agents in the receptive field, promoting mutual help.
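To make the relation-kernel idea concrete, below is a minimal sketch (not the authors' implementation) of a single attention-based graph-convolution step in which each agent aggregates the features of agents in its receptive field; the dot-product attention form, the tensor shapes and the toy chain graph are illustrative assumptions.

```python
import numpy as np

def relation_conv_layer(features, adjacency, w_q, w_k, w_v):
    """One attention-based graph-convolution step over agent features.

    features:  (n_agents, d) encodings of each agent's local observation.
    adjacency: (n_agents, n_agents) binary matrix; entry (i, j) is 1 when
               agent j lies in agent i's receptive field (including i itself).
    w_q, w_k, w_v: (d, d) projection matrices playing the role of a relation kernel.
    """
    q, k, v = features @ w_q, features @ w_k, features @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # pairwise relation scores
    scores = np.where(adjacency > 0, scores, -1e9)   # mask agents outside the field
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over neighbours
    return weights @ v                               # aggregated relation features

# Usage: 4 agents on a chain graph with 8-dimensional features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
adj = np.eye(4) + np.eye(4, k=1) + np.eye(4, k=-1)
w_q, w_k, w_v = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))
print(relation_conv_layer(feats, adj, w_q, w_k, w_v).shape)  # (4, 8)
```

Stacking two such layers lets information flow from agents two hops away, which is how the receptive field grows with depth.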
2| Measuring the Reliability of Reinforcement Learning Algorithms
About: Lack of reliability is a well-known issue for reinforcement learning (RL) algorithms. In this paper, the researchers proposed a set of metrics that quantitatively measure different aspects of reliability.
According to the researchers, the analysis distinguishes between several typical modes of evaluating RL performance, such as "evaluation during training", which is computed over the course of training, and "evaluation after learning", which is computed on a fixed policy after it has been trained. The metrics are also designed to measure different aspects of reliability, e.g. reproducibility (variability across training runs and variability across rollouts of a fixed policy) and stability (variability within training runs).
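As a rough illustration of what such metrics can look like, the sketch below computes robust dispersion statistics over a matrix of per-run learning curves; the use of the inter-quartile range, the array layout and the synthetic curves are illustrative assumptions rather than the paper's exact definitions.

```python
import numpy as np

def iqr(x, axis=None):
    """Inter-quartile range: a robust measure of dispersion."""
    q75, q25 = np.percentile(x, [75, 25], axis=axis)
    return q75 - q25

def reliability_summary(curves):
    """curves: (n_runs, n_evals) array of evaluation returns logged during training."""
    return {
        # Reproducibility: how much final performance differs across training runs.
        "dispersion_across_runs": float(iqr(curves[:, -1])),
        # Stability: how much performance fluctuates within a single run, measured
        # on the step-to-step differences of each learning curve.
        "dispersion_within_runs": float(np.mean(iqr(np.diff(curves, axis=1), axis=1))),
    }

# Usage with synthetic learning curves for 5 runs and 100 evaluation points.
rng = np.random.default_rng(0)
curves = np.cumsum(rng.normal(1.0, 0.5, size=(5, 100)), axis=1)
print(reliability_summary(curves))
```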
3| Behaviour Suite for Reinforcement Learning
About: The researchers at DeepMind introduced the Behaviour Suite for Reinforcement Learning, or bsuite for short. bsuite is a collection of carefully designed experiments that investigate the core capabilities of reinforcement learning agents, with two objectives.
First, to collect clear, informative and scalable problems that capture key issues in the design of general and efficient learning algorithms. Second, to study agent behaviour through their performance on these shared benchmarks.
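For readers who want to try the suite, here is a minimal sketch of running a random agent on a single bsuite environment; it assumes the open-source bsuite package (and its load_from_id helper) is installed, and the environment id "catch/0" is used purely as an example.

```python
import numpy as np
import bsuite  # assumes DeepMind's open-source bsuite package is installed

# Load one experiment from the suite by its id (example id, assumed to exist).
env = bsuite.load_from_id("catch/0")
num_actions = env.action_spec().num_values

# Run a single episode with a uniformly random policy.
timestep = env.reset()
episode_return = 0.0
while not timestep.last():
    action = np.random.randint(num_actions)
    timestep = env.step(action)
    episode_return += timestep.reward or 0.0
print("random-agent episode return:", episode_return)
```

A real evaluation would loop over every environment id in the suite and produce bsuite's summary report, but the episode loop above is the core interaction pattern.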
4| The Ingredients of Real-World Robotic Reinforcement Learning
About: In this paper, the researchers at UC Berkeley and their collaborators discussed the elements of a robotic learning system that can autonomously improve with data collected in the real world. They proposed a particular instantiation of such a system using dexterous manipulation and investigated several challenges that come up when learning without instrumentation.
Furthermore, the researchers proposed simple and scalable solutions to these challenges, and then demonstrated the efficacy of the proposed system on a set of dexterous robotic manipulation tasks. They also provided an in-depth analysis of the challenges associated with this learning paradigm.
5| Network Randomisation: A Simple Technique for Generalisation in Deep Reinforcement Learning
About: Here, the researchers proposed a simple technique to improve the generalisation ability of deep RL agents by introducing a randomised (convolutional) neural network that randomly perturbs input observations. The technique enables trained agents to adapt to new domains by learning robust features that are invariant across varied and randomised environments.
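A minimal sketch of the idea, assuming a PyTorch-style agent whose image observations pass through a single randomly re-initialised convolutional layer, is shown below; the layer size, the re-initialisation schedule and the observation shape are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RandomConvPerturbation(nn.Module):
    """Perturbs input observations with a convolutional layer whose weights are
    re-drawn at random, so the agent rarely sees the same low-level visual
    statistics twice and must learn features that survive the perturbation."""

    def __init__(self, channels: int = 3):
        super().__init__()
        # Equal in/out channels and padding keep the observation shape unchanged.
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

    def randomize(self):
        # Re-initialise the weights, e.g. at the start of every training batch.
        nn.init.xavier_normal_(self.conv.weight)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():  # the random layer itself is never trained
            return self.conv(obs)

# Usage: perturb a batch of 84x84 RGB observations before the policy network sees them.
layer = RandomConvPerturbation(channels=3)
layer.randomize()
obs = torch.rand(8, 3, 84, 84)
print(layer(obs).shape)  # torch.Size([8, 3, 84, 84])
```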
6| On the Weaknesses of Reinforcement Learning for Neural Machine Translation
About: Reinforcement learning (RL) is frequently used to improve performance in text generation tasks, including machine translation (MT), through techniques such as Minimum Risk Training (MRT) and Generative Adversarial Networks (GANs). In this paper, the researchers proved that one of the most common RL methods for MT does not optimise the expected reward, and showed that other methods take an infeasibly long time to converge. They further suggested that RL practice in machine translation is likely to improve performance only in some cases, such as when the pre-trained parameters are already close to yielding the correct translation.
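To make the notion of "the expected reward" concrete, the snippet below sketches the one-sample REINFORCE-style estimator typically used in this setting: sample a translation from the model, score it against the reference, and weight the gradient of the sampled tokens' log-probabilities by that reward. The toy per-token model and the match-based reward are assumptions for illustration, not the paper's experimental setup.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def reinforce_gradient(logits, reference, rng):
    """One-sample estimate of the gradient of E_y[r(y)] w.r.t. the logits, for a
    toy 'translation' model that emits each token independently.

    logits:    (seq_len, vocab) unnormalised scores.
    reference: (seq_len,) reference token ids; reward = fraction of matching tokens.
    """
    probs = softmax(logits)
    sampled = np.array([rng.choice(len(p), p=p) for p in probs])
    reward = float(np.mean(sampled == reference))
    # For a categorical distribution, d/d(logits) log p(sampled) = one_hot(sampled) - probs.
    grad_log_p = -probs
    grad_log_p[np.arange(len(sampled)), sampled] += 1.0
    return reward * grad_log_p, reward

rng = np.random.default_rng(0)
grad, reward = reinforce_gradient(rng.normal(size=(5, 10)), np.arange(5), rng)
print(reward, grad.shape)  # scalar reward, gradient of shape (5, 10)
```

Such sampled-reward updates mostly act on translations the model already assigns high probability to, which ties in with the observation above about pre-trained parameters needing to be close to the correct translation.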
7| Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation
About: In this paper, the researchers proposed a reinforcement learning based graph-to-sequence (Graph2Seq) model for Natural Question Generation (QG). The model consists of a Graph2Seq generator with a novel Bidirectional Gated Graph Neural Network-based encoder to embed the passage and a hybrid evaluator with a mixed objective combining both cross-entropy and RL losses to ensure the generation of syntactically and semantically valid text. The proposed model is end-to-end trainable, achieves new state-of-the-art scores, and outperforms existing methods by a significant margin on the standard SQuAD benchmark for QG.
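The mixed objective itself is simple to state; the sketch below shows one common way to combine a cross-entropy term with a REINFORCE-style reward term using a self-critical baseline, where the mixing weight gamma and the placeholder scalars are illustrative assumptions rather than the paper's tuned values.

```python
import torch

def mixed_objective(ce_loss, sampled_logprob, sampled_reward, baseline_reward, gamma=0.98):
    """Combine supervised and RL losses for question generation.

    ce_loss:         cross-entropy of the ground-truth question under the model.
    sampled_logprob: summed log-probability of a question sampled from the model.
    sampled_reward:  sentence-level reward of the sampled question (e.g. BLEU).
    baseline_reward: reward of a greedy decode, used as a variance-reducing baseline.
    """
    rl_loss = -(sampled_reward - baseline_reward) * sampled_logprob  # REINFORCE with baseline
    return gamma * rl_loss + (1.0 - gamma) * ce_loss

# Usage with placeholder scalars standing in for real decoder outputs.
loss = mixed_objective(
    ce_loss=torch.tensor(2.3),
    sampled_logprob=torch.tensor(-14.0),
    sampled_reward=0.42,
    baseline_reward=0.40,
)
print(loss)
```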
8| Adversarial Policies: Attacking Deep Reinforcement Learning
About: Deep reinforcement learning policies are known to be vulnerable to adversarial perturbations to their observations, similar to adversarial examples for classifiers. In this paper, the researchers proposed a novel and physically realistic threat model for adversarial examples in RL and demonstrated the existence of adversarial policies in this threat model for several simulated robotics games.
The researchers further conducted a detailed analysis of why adversarial policies work and how they reliably beat the victim, despite being trained with less than 3% as many timesteps and generating seemingly random behaviour.
9| Causal Discovery with Reinforcement Learning
About: Discovering causal structure among a set of variables is a fundamental problem in many empirical sciences. In this paper, the researchers proposed to use reinforcement learning to search for the Directed Acyclic Graph (DAG) with the best score. An encoder-decoder model takes observational data as input and generates graph adjacency matrices that are used to compute rewards. In contrast with typical RL applications where the goal is to learn a policy, they used RL as a search strategy, and the final output is the graph, among all graphs generated during training, that achieves the best reward.
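As a rough sketch of how a generated adjacency matrix can be turned into a reward, the snippet below combines a simple regression-based fit score with a smooth acyclicity penalty in the style of NOTEARS; the least-squares score, the penalty weight and the toy data are illustrative assumptions, not the paper's tuned reward.

```python
import numpy as np
from scipy.linalg import expm

def acyclicity_penalty(adj):
    """Smooth penalty that equals zero exactly when the binary matrix `adj`
    encodes a DAG (NOTEARS-style: trace(exp(A * A)) - d)."""
    return np.trace(expm(adj * adj)) - adj.shape[0]

def graph_reward(adj, data, lam=10.0):
    """Reward for a candidate adjacency matrix given observed data.

    adj:  (d, d) binary matrix with adj[i, j] = 1 meaning variable i -> variable j.
    data: (n, d) observations; fit = mean squared residual when each variable is
          linearly regressed on its parents (an illustrative scoring choice).
    """
    n, d = data.shape
    fit = 0.0
    for j in range(d):
        parents = np.flatnonzero(adj[:, j])
        pred = np.zeros(n)
        if parents.size:
            coef, *_ = np.linalg.lstsq(data[:, parents], data[:, j], rcond=None)
            pred = data[:, parents] @ coef
        fit += np.mean((data[:, j] - pred) ** 2)
    # Higher reward for a better fit and for matrices that are (close to) acyclic.
    return -(fit + lam * acyclicity_penalty(adj))

# Usage: two variables where x0 causes x1; the true graph earns the higher reward.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = 2.0 * x0 + 0.1 * rng.normal(size=200)
data = np.column_stack([x0, x1])
print(graph_reward(np.array([[0, 1], [0, 0]]), data))   # x0 -> x1
print(graph_reward(np.array([[0, 0], [0, 0]]), data))   # empty graph
```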
10| Model-Based Reinforcement Learning for Atari
About: In this paper, the researchers explored how video prediction models can enable agents to solve Atari games with far fewer interactions than model-free methods. They described Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models, and presented a comparison of several model architectures, including a novel architecture that yields the best results in this setting. According to the researchers, SimPLe outperformed state-of-the-art model-free algorithms in most games, in some games by over an order of magnitude.
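At a high level, SimPLe alternates between collecting a little real experience, fitting the video-prediction world model, and training the policy inside that model. The schematic sketch below captures this loop; all helper names (collect_experience, train_policy, world_model.fit, world_model.as_env) and the step counts are hypothetical placeholders rather than the authors' code.

```python
def simple_style_training(real_env, policy, world_model,
                          collect_experience, train_policy,
                          n_iterations=15, n_real_steps=6_000, n_model_steps=500_000):
    """Schematic main loop of SimPLe-style model-based RL.

    Hypothetical helpers supplied by the caller:
      collect_experience(env, policy, n_steps) -> list of transitions
      world_model.fit(dataset)                 -> supervised video/reward prediction
      world_model.as_env()                     -> a simulated environment
      train_policy(policy, env, n_steps)       -> e.g. PPO updates inside the model
    """
    dataset = []
    for _ in range(n_iterations):
        # 1. Gather a small batch of real experience with the current policy.
        dataset += collect_experience(real_env, policy, n_real_steps)
        # 2. Re-fit the video-prediction world model on all data collected so far.
        world_model.fit(dataset)
        # 3. Train the policy almost entirely inside the learned model; this is
        #    where the sample-efficiency gain over model-free methods comes from.
        train_policy(policy, world_model.as_env(), n_model_steps)
    return policy
```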