sac-pendulum

Soft Actor-Critic (SAC) reinforcement learning agent for the inverted double pendulum.

v4

Highest score: 9359.85

Reached after 614 episodes (under 5 minutes of training).

Demo:

Here, I've enabled keyboard inputs: the left and right arrow keys each apply an input of 0.7 * max_action in that direction. It's amazing to see how the actor recovers almost instantly. For more catastrophic disturbances (like me holding down an arrow key), recovery is impossible.

pend.mp4

v4_9k.mp4
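
The keyboard disturbance in the demo can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: the function name `perturb_action`, the key names, the choice to add the push to the policy's action (rather than replace it), and the clipping step are all assumptions. The default `max_action=1.0` matches the action range Gymnasium documents for this environment.

```python
def perturb_action(policy_action, key, max_action=1.0, scale=0.7):
    """Combine the actor's action with a keyboard push (hypothetical helper).

    Assumptions: "left" pushes in the negative direction, "right" in the
    positive direction, and the result is clipped to [-max_action, max_action].
    """
    pushes = {
        "left": -scale * max_action,
        "right": scale * max_action,
    }
    # Any other key (or None) leaves the policy's action untouched.
    push = pushes.get(key, 0.0)
    combined = policy_action + push
    # Keep the final action inside the valid range.
    return max(-max_action, min(max_action, combined))


if __name__ == "__main__":
    print(perturb_action(0.0, "right"))
    print(perturb_action(0.5, "right"))
    print(perturb_action(0.2, None))
```

With clipping, a held key can saturate the action, which fits the observation that a sustained key press is unrecoverable.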

Score Plots:

v5

I first trained the double inverted pendulum on InvertedDoublePendulum-v5. Due to some confusion, the other group members used v4, but I wanted to include my v5 results anyway!
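
Switching between the two environment versions could look something like the sketch below. The helper names `env_id` and `make_env` are hypothetical and not taken from this repository. The sketch assumes Gymnasium with MuJoCo support is installed; v5 is only registered in newer Gymnasium releases.

```python
ENV_NAME = "InvertedDoublePendulum"
# Only the two versions discussed in this README.
SUPPORTED_VERSIONS = ("v4", "v5")


def env_id(version="v4"):
    """Build the Gymnasium environment ID for a given version string."""
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported version: {version!r}")
    return f"{ENV_NAME}-{version}"


def make_env(version="v4", **kwargs):
    """Create the environment (hypothetical helper).

    gymnasium is imported lazily so that env_id() can be used
    without the MuJoCo dependencies installed.
    """
    import gymnasium as gym

    return gym.make(env_id(version), **kwargs)


if __name__ == "__main__":
    print(env_id("v4"))
    print(env_id("v5"))
```

Scores from v4 and v5 are reported separately here, so it helps to keep the version explicit wherever an environment is created.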

Highest score: 82,812

Demo:

v5_80k.mp4

Score Plots:

Documentation

All documentation is automatically generated by pdoc3.

To generate documentation, run `pdoc --html -o docs . -f`.

Make sure the separate pdoc package is NOT installed. Install only pdoc3 (`pip install pdoc3`); both packages provide a `pdoc` command, so having both can cause conflicts.
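
One way to check for the conflict described above is to ask Python which of the two distributions is installed. This is a small sketch using the standard library's `importlib.metadata`; the function names and status strings are made up for illustration.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def pdoc_status(lookup=installed_version):
    """Classify the pdoc/pdoc3 install state (hypothetical status strings)."""
    has_pdoc = lookup("pdoc") is not None
    has_pdoc3 = lookup("pdoc3") is not None
    if has_pdoc and has_pdoc3:
        return "conflict"       # both installed: uninstall pdoc
    if has_pdoc:
        return "wrong-package"  # only pdoc: replace it with pdoc3
    if has_pdoc3:
        return "ok"
    return "missing"            # neither installed: pip install pdoc3


if __name__ == "__main__":
    print(pdoc_status())
```

Anything other than "ok" suggests fixing the environment before generating the docs.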

References

In the code, I sometimes refer back to the original paper and other resources:

[1] paper
[2] lib - provided by Sorina
[3] article
[4] video


armeetj/sac-pendulum
