LAION-AI · andreaskoepf · Jul 10, 2023 · Jun 17, 2023
@@ -98,8 +98,8 @@ export SFT_MODEL=$MODEL_PATH/sft_model/$(ls -t $MODEL_PATH/sft_model/ | head -n
 5. Train the reward model
 
 ```bash
-cd ../reward/instructor
-python trainer.py configs/deberta-v3-base.yml --output_dir $MODEL_PATH/reward_model
+cd model_training
+python trainer_rm.py --configs defaults_rm oasst-rm-1-pythia-1b
 ```
 
 6. Get RM trained model
@@ -117,7 +117,7 @@ export REWARD_MODEL=$MODEL_PATH/reward_model/$(ls -t $MODEL_PATH/reward_model/ |
 7. Train the RL agent
 
 ```bash
-cd ../../model_training
+cd model_training
 python trainer_rl.py --configs defaults_rlhf --cache_dir $DATA_PATH --rank_model $REWARD_MODEL --sft_model $SFT_MODEL --output_dir $MODEL_PATH/rl_model
 ```
 

@@ -57,11 +57,16 @@ Currently only these languages are supported via prompt translation:
 ar,de,fr,en,it,nl,tr,ru,ms,ko,ja,zh
 ```
 
+We provide many more datasets for training; a list of them can be found
+[here](https://github.com/LAION-AI/Open-Assistant/blob/main/model/model_training/custom_datasets/__init__.py).
+
 ## Dataset sub-sampling
 
 We can subsample the **training** data by passing either the `fraction` or
-`size` argument in the `configs/config.yml` file. Don't forget the additional
-colon ":" after the dataset name when doing this.
+`size` argument in the `configs/config.yml` file (for RM training, use
+`configs/config_rm.yml`; for RL training, use `configs/config_rl.yml`).
+Don't forget the additional colon ":" after the dataset name when doing
+this.
 
 Example: