zsychina / PrefTransPPO Public

Uses a Preference Transformer to learn a reward function from a dataset, then trains an agent with PPO.
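
To illustrate the reward-learning step, here is a minimal sketch of a Bradley-Terry style preference loss over a pair of trajectory segments. It is not taken from this repository: the function names `softplus` and `preference_loss`, the label convention, and the use of plain summed per-step rewards are all assumptions made for illustration. The repository's `transformer.py` may score segments differently, for example with learned weights.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy loss for one pair of segments (hypothetical helper).

    rewards_a, rewards_b: per-step rewards predicted by a reward model.
    label: 1.0 if segment A is preferred, 0.0 if B is preferred,
           0.5 if the pair is rated as equally good (assumed convention).
    """
    # Score each segment by its summed predicted reward (an assumption).
    diff = sum(rewards_a) - sum(rewards_b)
    # P(A preferred) = sigmoid(diff), so log P(A) = -softplus(-diff).
    log_p_a = -softplus(-diff)
    log_p_b = -softplus(diff)
    return -(label * log_p_a + (1.0 - label) * log_p_b)


if __name__ == "__main__":
    # The loss is lower when the label agrees with the higher-scoring segment.
    print(preference_loss([1.0, 1.0], [0.0, 0.0], 1.0))
    print(preference_loss([1.0, 1.0], [0.0, 0.0], 0.0))
```

Minimizing this loss over labeled pairs pushes the reward model to score preferred segments higher. The learned reward can then stand in for the environment reward when training the PPO agent.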

History: 4 commits

Name
.vscode
online_test
.gitignore
README.md
dataset.py
run_online.sh
run_reward.sh
train.py
transformer.py