In recent years, significant progress has been made in deep reinforcement learning, which uses deep neural networks to model the value function (value-based methods), the agent's policy (policy-based methods), or both (actor-critic methods). Before deep neural networks became widely successful, complex features had to be engineered by hand to train an RL algorithm, which limited learning capacity and restricted RL to simple environments. With deep learning, models can be built from millions of trainable weights, freeing the practitioner from tedious feature engineering: relevant features are learned automatically during training, allowing the agent to learn good policies in complex environments.
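The value-based, policy-based, and actor-critic distinction can be made concrete with a minimal sketch. The snippet below (illustrative only; all names and sizes are hypothetical, and the weights are random rather than trained) shows an actor-critic layout: a shared trunk produces learned features, a policy head outputs action probabilities, and a value head estimates the state value.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, hidden_dim, n_actions = 4, 16, 2

# Shared trunk plus two heads, as in an actor-critic architecture.
W_shared = rng.normal(scale=0.1, size=(obs_dim, hidden_dim))
W_policy = rng.normal(scale=0.1, size=(hidden_dim, n_actions))
W_value = rng.normal(scale=0.1, size=(hidden_dim, 1))

def forward(obs):
    """Return (action probabilities, state-value estimate) for one observation."""
    h = np.tanh(obs @ W_shared)            # features learned from data, not hand-engineered
    logits = h @ W_policy
    probs = np.exp(logits - logits.max())  # softmax over actions (policy head)
    probs /= probs.sum()
    value = (h @ W_value).item()           # critic's scalar value estimate (value head)
    return probs, value

probs, value = forward(rng.normal(size=obs_dim))
```

A value-based method would keep only the value head, a policy-based method only the policy head; the actor-critic combination trains both against a shared representation.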
Traditionally, RL is applied to one task at a time: each task is learned by a separate agent, and these agents do not share knowledge. This makes learning complex behaviors, such as driving a car, slow and inefficient. Tasks that share a common information source, have related underlying structure, and are interdependent can see a large performance boost when multiple agents work together. By training agents simultaneously on a shared representation of the system, improvements made by one agent can be leveraged by the others. A3C (Asynchronous Advantage Actor-Critic) is an exciting development in this direction: multiple worker agents run in parallel, each interacting with its own copy of the environment, and asynchronously update a single shared network. This multi-task, parallel learning scenario is driving RL closer to AGI, where a meta-agent learns how to learn, making problem-solving more autonomous than ever before.
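The asynchronous-update idea can be sketched in a few lines. In the toy below (a sketch, not an implementation of A3C: the "gradients" are random stand-ins, whereas real workers would each run an environment copy and compute an actor-critic loss), several threads apply updates to one shared parameter vector without locking, so one worker's progress is immediately visible to the others.

```python
import threading
import numpy as np

# One parameter vector shared by all workers.
shared_params = np.zeros(8)

def worker(seed, steps=50):
    """Apply asynchronous updates to the shared parameters (lock-free)."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        grad = rng.normal(scale=0.01, size=shared_params.shape)  # stand-in gradient
        # In-place update of the shared weights; no lock, as in A3C's
        # asynchronous (Hogwild-style) parameter updates.
        np.subtract(shared_params, grad, out=shared_params)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because updates are applied without synchronization, occasional overwrites are tolerated in exchange for throughput; in practice the staleness is small enough that training still converges.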