Use a Preference Transformer to learn a reward function from a dataset, then train an agent with PPO.

zsychina/PrefTransPPO

PrefTransPPO

implementation
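
As a rough illustration of the reward-learning step named in the project description, the sketch below computes a Bradley-Terry style preference loss between two trajectory segments. It is not taken from this repository. The linear reward model and the names `softplus`, `segment_return`, and `preference_loss` are assumptions made for the example. A Preference Transformer would replace the plain per-step sum with a transformer that scores the whole segment.

```python
import math

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def segment_return(weights, segment):
    # Hypothetical linear reward: sum of dot(weights, features) over every step.
    return sum(sum(w * f for w, f in zip(weights, step)) for step in segment)

def preference_loss(weights, seg_a, seg_b, label):
    # label is 1.0 if seg_a is preferred and 0.0 if seg_b is preferred.
    logit = segment_return(weights, seg_a) - segment_return(weights, seg_b)
    # -log(sigmoid(logit)) equals softplus(-logit);
    # -log(1 - sigmoid(logit)) equals softplus(logit).
    return label * softplus(-logit) + (1.0 - label) * softplus(logit)

# Two toy segments of three steps each, with two features per step.
seg_a = [[1.0, 1.0]] * 3
seg_b = [[0.0, 0.0]] * 3
loss_when_a_preferred = preference_loss([1.0, 0.0], seg_a, seg_b, 1.0)
loss_when_b_preferred = preference_loss([1.0, 0.0], seg_a, seg_b, 0.0)
```

Minimizing this loss over labeled segment pairs yields a reward model whose output could then stand in for the environment reward during PPO training. How this repository wires the two stages together is not specified here.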
