This repository contains code for reproducing the NeurIPS 2020 paper "The Mean-Squared Error of Double Q-Learning".
We test Q-learning and Double Q-learning in several environments.
All experiments below were run with MATLAB R2018b and Python 3.6.9.
Environments considered:
- Baird's example: bairds
- GridWorld: grid
- CartPole: cartpole
- Maximization Bias: bias, bias(nn)
Files for Baird's example:
- bairds/GenBaird.m
- bairds/simulation_baird.m
- bairds/plot.py
In simulation_baird.m, change the input to the function GenBaird to simulate different settings.
Run simulation_baird.m; it generates several output files, whose meanings are the same as for GridWorld, described below.
To plot the trace of mean-squared error: python3 plot.py
Files for GridWorld:
- grid/GenGrid.m
- grid/simulation_grid.m
- grid/plot.py
In simulation_grid.m, change the input to the function GenGrid to simulate different sizes of the GridWorld.
Run simulation_grid.m; it generates several files, whose meanings are given below.
- Grid-n=3-errsingle.txt: mean-squared error of Q-learning over a sample path of length 100000, averaged over 100 tests
- Grid-n=3-stderrsingle.txt: the standard deviation of each value in the previous file
- Grid-n=3-errDouble.txt: the same setting, for Double Q-learning
- Grid-n=3-stdDouble.txt: the corresponding standard deviations
- Grid-n=3-erravg_d.txt: the same setting, for Double Q-learning with twice the step size and the averaged estimator
- Grid-n=3-stderravg_d.txt: the corresponding standard deviations
- Grid-n=3-errDouble_d.txt: the same setting, for Double Q-learning with twice the step size
- Grid-n=3-stderrDouble_d.txt: the corresponding standard deviations
To plot the trace of mean-squared error: python3 plot.py
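For reference, below is a minimal sketch of how these traces could be loaded and plotted. It assumes each .txt file is a plain column of numbers readable by numpy.loadtxt; the bundled plot.py may handle the files differently.

```python
import numpy as np
import matplotlib.pyplot as plt

# (label, mean-squared-error file, standard-deviation file) -- names as listed above
curves = [
    ("Q-learning", "Grid-n=3-errsingle.txt", "Grid-n=3-stderrsingle.txt"),
    ("Double Q-learning", "Grid-n=3-errDouble.txt", "Grid-n=3-stdDouble.txt"),
    ("Double Q, twice the step size", "Grid-n=3-errDouble_d.txt", "Grid-n=3-stderrDouble_d.txt"),
    ("Double Q, twice the step size, averaged", "Grid-n=3-erravg_d.txt", "Grid-n=3-stderravg_d.txt"),
]

for label, err_file, std_file in curves:
    err = np.loadtxt(err_file)   # MSE along the sample path, averaged over 100 tests
    std = np.loadtxt(std_file)   # standard deviation of each MSE value
    steps = np.arange(len(err))
    plt.plot(steps, err, label=label)
    plt.fill_between(steps, err - std, err + std, alpha=0.2)

plt.xlabel("step")
plt.ylabel("mean-squared error")
plt.yscale("log")
plt.legend()
plt.show()
```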
Files for CartPole:
- cartpole/cartpole.py
- cartpole/read.py
Dependencies:
- OpenAI Gym
To run experiments: python cartpole.py
Change self.policy, self.twofold, self.avg on line 27 in the code to test:
- Q-learning: self.policy = 'Q', self.twofold=0, self.avg = False
- Double Q-learning: self.policy = 'D-Q', self.twofold=0, self.avg = False
- Double Q-learning with twice the step size: self.policy = 'D-Q', self.twofold=1, self.avg = False
- Double Q-learning with twice the step size and the averaged estimator: self.policy = 'D-Q', self.twofold=1, self.avg = True
It generates a file Reward-Q (or Reward-D-Q, Reward-D-Q-twice, Reward-D-Q-twice-average) that contains a 100*20 array (readable with numpy.load): the element in the i-th row and j-th column is the mean reward of the estimator at the 50j-th episode in the i-th test.
To plot the distribution of the hit time: python read.py
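For reference, a minimal sketch of reading one of the reward files and plotting the mean reward over tests; the reading script in the repo may do more than this.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumption: "Reward-Q" was written with numpy.save, so the file on disk may
# actually be "Reward-Q.npy"; adjust the name to whatever cartpole.py produced.
rewards = np.load("Reward-Q")                       # shape (100, 20): tests x recorded episodes
episodes = 50 * np.arange(1, rewards.shape[1] + 1)  # column j (0-indexed) holds episode 50*(j+1)

mean_reward = rewards.mean(axis=0)                  # average over the 100 tests
std_reward = rewards.std(axis=0)

plt.plot(episodes, mean_reward)
plt.fill_between(episodes, mean_reward - std_reward, mean_reward + std_reward, alpha=0.2)
plt.xlabel("episode")
plt.ylabel("mean reward")
plt.title("CartPole, Q-learning")
plt.show()
```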
Files for Maximization Bias:
- bias/bias.py
- bias/plot.py
- bias(nn)/bias(nn).py
- bias(nn)/plot.py
Dependencies:
- OpenAI Gym
- PyTorch 1.6.0
bias.py is the code for the tabular setting; bias(nn).py is the code for the setting with a neural-network approximator.
To run experiments: python bias.py or python bias(nn).py
Change self.policy, self.twofold, self.avg on line 59 in the code to test:
- Q-learning: self.policy = 'Q', self.twofold=0, self.avg = False
- Double Q-learning: self.policy = 'D-Q', self.twofold=0, self.avg = False
- Double Q-learning with twice the step size: self.policy = 'D-Q', self.twofold=1, self.avg = False
- Double Q-learning with twice the step size and the averaged estimator: self.policy = 'D-Q', self.twofold=1, self.avg = True
It generates a file ProbLeft-Q (or ProbLeft-D-Q, ProbLeft-D-Q-twice, ProbLeft-D-Q-twice-average) that contains a 1000*200 array (readable with numpy.load): the element in the i-th row and j-th column is the probability of taking the left action over the first j episodes in the i-th test.
To plot the convergence of the probability of going left: python plot.py
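For reference, a minimal sketch of loading one of the ProbLeft files and plotting the average left-action probability across tests; the bundled plot.py may differ.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumption: "ProbLeft-D-Q" is loadable with numpy.load as stated above;
# swap in ProbLeft-Q / ProbLeft-D-Q-twice / ProbLeft-D-Q-twice-average as needed.
probs = np.load("ProbLeft-D-Q")        # shape (1000, 200): tests x episodes
mean_prob = probs.mean(axis=0)         # mean left-action probability over the 1000 tests
episodes = np.arange(1, probs.shape[1] + 1)

plt.plot(episodes, mean_prob, label="Double Q-learning")
plt.xlabel("episode")
plt.ylabel("probability of taking the left action")
plt.legend()
plt.show()
```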