🗣️ 🤓 Inverted double pendulum with Soft Actor Critic (SAC) RL model.

armeetj/sac-pendulum

sac-pendulum

Inverted double pendulum with Soft Actor Critic (SAC) RL model.

v4

Highest score: 9359.85

Reached after 614 episodes (under 5 minutes of training).

Demo:

Here, I've enabled keyboard input: each left/right key press applies a push of 0.7 * max_action in that direction. It's amazing to see how the actor recovers almost instantly. Of course, for more catastrophic disturbances (like me holding down an arrow key), recovery is impossible.

pend.mp4
v4_9k.mp4
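
The keyboard perturbation used in the demo could look roughly like the sketch below. It is an illustration only, not the repository's code: the function names, the sign convention (left is negative), and the choice to add the push to the actor's action before clipping are all assumptions. Only the 0.7 * max_action scale comes from the description above.

```python
KEY_SCALE = 0.7  # fraction of max_action applied per key press (from the README)


def keyboard_push(key, max_action):
    """Return the push for a key press.

    Assumed convention: "left" is negative, "right" is positive,
    and any other value means no key is pressed.
    """
    direction = {"left": -1.0, "right": 1.0}.get(key, 0.0)
    return KEY_SCALE * max_action * direction


def perturbed_action(agent_action, key, max_action):
    """Add the keyboard push to the actor's action, then clip to the valid range.

    Whether the real demo adds the push or replaces the action is not stated;
    adding it is an assumption made for this sketch.
    """
    total = agent_action + keyboard_push(key, max_action)
    return max(-max_action, min(max_action, total))


if __name__ == "__main__":
    # The actor asks for 0.5 while the user presses "right".
    print(perturbed_action(0.5, "right", max_action=1.0))
```

Because the result is clipped, a held-down key can saturate the actuator and leave the actor no room to correct, which matches the unrecoverable case described above.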

Score Plots: plot

v5

I had trained the inverted double pendulum on InvertedDoublePendulum-v5 earlier. Due to some confusion, the other group members used v4, but I wanted to include my v5 results anyway!

Highest score: 82,812

Demo:

v5_80k.mp4

Score Plots: v5_scores

Documentation

All documentation is automatically generated by pdoc3.

To generate documentation, run `pdoc --html -o docs . -f`.

Make sure the pdoc package is NOT installed; install only pdoc3 (`pip install pdoc3`), or there might be package conflicts.
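
One way to check for the conflict before generating docs is to look up both distributions by name. This is a minimal sketch using the standard library's `importlib.metadata`; the helper names `installed_version` and `pdoc_conflict` are hypothetical and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def pdoc_conflict():
    """Return True when the separate `pdoc` distribution is installed."""
    return installed_version("pdoc") is not None


if __name__ == "__main__":
    print("pdoc3:", installed_version("pdoc3"))
    print("pdoc: ", installed_version("pdoc"))
    if pdoc_conflict():
        print("Found pdoc; consider uninstalling it and keeping only pdoc3.")
```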

References

In the code, I sometimes refer back to the original paper and other resources.

[1] paper
[2] lib - provided by Sorina
[3] article
[4] video
