Inverted double pendulum controlled by a Soft Actor-Critic (SAC) RL model.
Highest score: 9359.85
After 614 episodes (< 5 min of training).
Demo:
Here, I've enabled keyboard inputs: the left/right arrow keys correspond to an input of 0.7 * max_action in that direction. It's amazing to see how the actor recovers almost instantly. Of course, for more catastrophic events (like me holding down an arrow key), recovery is impossible.
pend.mp4
v4_9k.mp4
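The keyboard disturbance used in the demo can be sketched roughly as follows. This is a minimal illustration, not the code from this repo: it assumes the push is added on top of the actor's action and then clipped to the valid range, and the function name apply_keyboard_push, the key labels "left"/"right", and the push_scale parameter are all hypothetical. The default max_action of 1.0 matches the [-1, 1] action range of InvertedDoublePendulum.

```python
def apply_keyboard_push(actor_action, key, max_action=1.0, push_scale=0.7):
    """Combine the actor's action with a keyboard push (hypothetical helper).

    actor_action: scalar force chosen by the policy.
    key: "left", "right", or None when no arrow key is pressed.
    """
    # Map the pressed key to a direction; anything else means no push.
    direction = {"left": -1.0, "right": 1.0}.get(key, 0.0)

    # Assumption: the push is ADDED to the actor's action rather than replacing it.
    disturbed = actor_action + direction * push_scale * max_action

    # Clip so the combined command stays inside the environment's action range.
    return max(-max_action, min(max_action, disturbed))
```

With this scheme, a single key tap leaves the actor some headroom to counteract the push, whereas holding a key down keeps the command saturated in one direction, which is consistent with the unrecoverable case described above.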
Earlier, I trained the double inverted pendulum using InvertedDoublePendulum-v5. There was some confusion and the other group members used v4, but I wanted to include my v5 results anyway!
Highest score: 82,812
Demo:
v5_80k.mp4
All documentation is automatically generated by pdoc3.
To generate documentation, run:
pdoc --html -o docs . -f
Make sure you do NOT have pdoc installed and only use pip install pdoc3, or there might be package conflicts.
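If you are unsure which of the two packages is installed, a small check like the one below can help. It is only a sketch: the helper names installed_version and pdoc_setup_problem are hypothetical, and it assumes the two distributions are registered under the names pdoc and pdoc3.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def pdoc_setup_problem(lookup=installed_version):
    """Return a message describing a setup problem, or None if none is found.

    lookup is injectable so the check can be exercised without touching
    the real environment.
    """
    # The plain pdoc package is the one that may conflict with pdoc3.
    if lookup("pdoc") is not None:
        return "pdoc is installed; uninstall it and keep only pdoc3"
    if lookup("pdoc3") is None:
        return "pdoc3 is missing; run: pip install pdoc3"
    return None
```

Calling pdoc_setup_problem() with no arguments inspects the current environment and returns None when only pdoc3 is present.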
In the code, I sometimes refer back to the original paper and other resources:
[1] paper
[2] lib - provided by Sorina
[3] article
[4] video