In this project, we compared two algorithms:
- BPR: exploitative style, a way of playing to identify and exploit imbalances in the strategies of your opponents.
- MADDPG/M3DDPG: game theory optimal (GTO) style, a way of playing a game that makes you unexploitable to your opponents.
Check the report for more detail.
- `env.py` is the environment we developed to test the algorithms. You can interact with the environment by running `play_with_model.py`.
- The `train/` folder contains the code we used to train our agent.
  - Note that you may need to add a `sys.path.append` call to make `import env` work.
- We stored the trained MADDPG/M3DDPG agents as `pickle` objects so that they can be reused.
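
The two practical notes above (extending `sys.path` so that `import env` resolves, and storing trained agents with `pickle`) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `add_repo_root_to_path`, `save_agent`, and `load_agent` are hypothetical, and the sketch assumes that the training scripts sit one directory below the folder that contains `env.py`.

```python
import pickle
import sys
from pathlib import Path


def add_repo_root_to_path(script_path):
    """Append the parent of the script's folder to sys.path.

    Assumption: the script lives one level below the repository root
    (for example in train/), and env.py sits in that root.
    """
    repo_root = str(Path(script_path).resolve().parent.parent)
    if repo_root not in sys.path:
        sys.path.append(repo_root)
    return repo_root


def save_agent(agent, path):
    """Serialize a trained agent object to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(agent, f)


def load_agent(path):
    """Load a previously pickled agent object from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Intended use inside a training script (left as comments because env.py
# is only importable from within the repository):
#   add_repo_root_to_path(__file__)
#   import env
```

Unpickling executes code embedded in the file, so `load_agent` should only be used on files you created yourself or otherwise trust. Note also that `pickle` needs the agent's class to be importable at load time, which is another reason the `sys.path` adjustment may be required.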