
Transfer Learning in DQN using weighted-layer copying

Eitan Ziv & Gal Shachaf

Supervised by Tom Zahavy

March 2018

Abstract

In deep reinforcement learning (DRL), transfer learning refers to the ability to use knowledge gained while training an agent in one domain and apply it to the training of another agent, usually in a different domain.
By transferring the weights of different parts of the networks, we sought to improve the learning rate and maximum reward achieved by the DQN algorithm.

Reinforcement Learning and Deep Q-network

Reinforcement Learning is an area of machine learning concerned with how a 'software agent' (a controller) ought to take actions in an environment so as to maximize some notion of cumulative reward.
RL methods essentially deal with solving optimal control problems using online measurements.

Deep Q-Network (DQN) is an RL algorithm created by Google DeepMind that has achieved excellent results in various domains, most notably Atari 2600 games.
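
For concreteness, here is a minimal sketch of the Q-network and the one-step TD loss at the core of DQN, written in PyTorch. The layer sizes follow the commonly used DQN architecture, but the replay buffer, exploration policy, and target-network update schedule are omitted; this is an illustrative sketch, not the exact DeepMind implementation.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """A small convolutional Q-network over stacked 84x84 Atari frames."""
    def __init__(self, n_actions):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x):
        return self.head(self.conv(x))


def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """One-step TD loss: Q(s, a) vs. r + gamma * max_a' Q_target(s', a')."""
    s, a, r, s_next, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(s_next).max(dim=1).values
        target = r + gamma * (1.0 - done) * q_next
    return nn.functional.smooth_l1_loss(q_sa, target)
```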

Transfer Learning

Transfer learning is a research problem that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem.

By using transfer learning we hope to:

1. Achieve higher performance

2. Accelerate learning

3. Reduce training resources (time, compute, samples)

Method

In our work we focused on transfer learning in the form of transferring weights from the layers of the source agent's best-performing deep Q-network.

Our method consisted of copying the first or last n weighted layers of the source agent (meaning, the agent trained on the source domain) into a new, randomly initialized DQN, the "destination agent", and then training the destination agent on the destination domain.

During the training itself, we either let the weights of the copied layers adjust along with the rest of the network ("fine tuning"), or kept them unchanged throughout the entire training session (by zeroing their gradients).
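
A minimal sketch of this transfer step in PyTorch, assuming the QNetwork sketched above with its convolutional ("first") and fully connected ("last") parts; the exact split and the details of our training code differ, and freezing is expressed here by disabling gradients, which has the same effect as zeroing them.

```python
def transfer_layers(source_net, dest_net, part="first", freeze=True):
    """Copy the convolutional ('first') or fully connected ('last') layers of a
    trained source DQN into a freshly initialized destination DQN."""
    src = source_net.conv if part == "first" else source_net.head
    dst = dest_net.conv if part == "first" else dest_net.head
    # Copying the last layers assumes both games expose the same action-set size,
    # otherwise the output layer shapes do not match.
    dst.load_state_dict(src.state_dict())
    if freeze:
        # Keep the copied weights fixed during destination training; disabling
        # gradients is equivalent to zeroing them at every update.
        for p in dst.parameters():
            p.requires_grad = False
    return dest_net

# source_net = QNetwork(n_actions)   # loaded from the source agent's best checkpoint
# dest_net = QNetwork(n_actions)     # randomly initialized destination agent
# dest_net = transfer_layers(source_net, dest_net, part="first", freeze=False)  # fine tuning
```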


Domain Similarity

Transfer learning is greatly affected by the similarity between the source and destination domains.

Since there is no established similarity metric for Atari 2600 games, we chose the following criteria to help us assess the similarity between games:

1. Action set. Each Atari 2600 game uses a different set of controller keys, mapped to different actions. We can therefore measure the similarity of games using this action set: two games with exactly the same actions are very similar, while games that use different controller keys, or the same keys for different actions, are dissimilar (a short code sketch follows this list).

2. Game objectives. The Atari 2600 contains a few families of games, e.g. the Pac-Man family or the Space Invaders family. Games in the same family have very similar gameplay: the player's objectives are very similar across the family, and the strategies the player employs in each of its games are also similar. For our experiments, we treated two games that belong to the same family as close domains.

3. Visual resemblance. The "weakest" criterion we applied relates to the visual similarity between games: we considered the colors, shapes, and sizes of the objects in each game of the pair to grade their similarity.
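
To make the action-set criterion referenced in item 1 concrete, here is a small sketch using the Gym Atari interface (it assumes the classic gym[atari] environment ids; the Jaccard overlap score is our illustrative choice rather than an established similarity metric).

```python
import gym

def action_set_similarity(game_a, game_b):
    """Jaccard overlap between the controller actions used by two Atari games."""
    def action_meanings(game_id):
        env = gym.make(game_id)
        meanings = set(env.unwrapped.get_action_meanings())
        env.close()
        return meanings

    a, b = action_meanings(game_a), action_meanings(game_b)
    return len(a & b) / len(a | b)

# Games from the same family tend to share the same action set (score close to 1.0):
# print(action_set_similarity("MsPacman-v4", "Alien-v4"))
```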

Results

[Result plots: far domains; close domains - first layers; close domains - last layers; close domains - last layers, copying from a distilled network]

Discussion

  1. Knowledge transfer in DQN results in negative transfer across all domains, both close and far.

  2. A single experiment in a close domain (last-layer transfer) suggests an advantage when training resources in the destination domain are limited.

  3. Transferring the first layers across far domains has little to no effect on the learning rate, resembling a change of random seed.

  4. Transfer learning is less suitable for DQN, since DQN agents are less generalized: they require long training times and are trained on a unique, tightly coupled task set per game.
