myendless1/llm-as-a-judge

Train your LLM judge model on multiple training datasets in a unified manner.


Our training results

Prepare Data

  1. Make the raw data directory by running Data/make_raw_files_dir.sh.

  2. Download the raw files:

    Download raw/auto_judge/pairwise_traindata.jsonl at pairwise_traindata.jsonl.

    Download raw/auto_judge/testdata_pairwise.jsonl at testdata_pairwise.jsonl.

    Download raw/llm_bar/GPTInst/dataset.json at result.json.

    Download raw/llm_bar/GPTOut/dataset.json at result.json.

    Download raw/llm_bar/Manual/dataset.json at result.json.

    Download raw/llm_bar/Natural/dataset.json at result.json.

    Download raw/llm_bar/Neighbor/dataset.json at result.json.

    Download raw/mt_bench/gpt-4_pair.jsonl at gpt-4_pair.jsonl.

    Download raw/panda_lm/pandalm_test.json at pandalm_test.json.

  3. Format the raw data into the unified data format by running the corresponding scripts in Data/scripts/process.

  4. Combine the formatted data into the OpenAI format by running Data/scripts/prompts/combine.py, and generate test data in the same format by running Data/scripts/prompts/combine_test.py.

  5. Soft-link the output OpenAI-format training file into LLaMA-Factory/data and add an entry in dataset_info.json to fit the LLaMA-Factory training pipeline. Here is an example:

  "train_openai": {
    "file_name": "train_openai.json",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant",
      "system_tag": "system"
    }
  },
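Steps 3 and 4 above can be sketched in a few lines. This is a minimal illustration, not the repo's actual conversion scripts: the raw record fields (`instruction`, `response1`, `response2`, `label`), the system prompt, and the verdict wording are all assumptions; the only part taken from this README is the target message structure matching the dataset_info.json entry above.

```python
import json

def to_openai_format(record):
    """Convert one hypothetical raw pairwise record into the ShareGPT-style
    message list described by the dataset_info.json entry above."""
    # Field names below are assumptions for illustration, not the repo's schema.
    prompt = (
        "Given an instruction and two candidate responses, judge which one is better.\n"
        f"Instruction: {record['instruction']}\n"
        f"Response 1: {record['response1']}\n"
        f"Response 2: {record['response2']}"
    )
    answer = f"Response {record['label']} is better."
    return {
        "messages": [
            {"role": "system", "content": "You are a fair judge of response quality."},
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": answer},
        ]
    }

if __name__ == "__main__":
    raw = {
        "instruction": "Name a primary color.",
        "response1": "Blue.",
        "response2": "Bluish-green.",
        "label": 1,
    }
    # One converted example, ready to be collected into train_openai.json.
    print(json.dumps(to_openai_format(raw), indent=2))
```

Each raw test set would go through the same conversion so that training and evaluation share a single format.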

Download Pretrained Models

See LLaMA-Factory for the supported pretrained large language models, create a sub-folder Models, and put the pretrained checkpoints in it.

Fine-tuning your Judge Model!

Refer to our training examples in Train scripts; note that the configs for these examples are saved in Configs.

Test Judge Model performance

See the test scripts in Test scripts for more details. Judge models are tested on the PandaLM, LLMBar, MT-Bench, and Auto-J test sets with respect to human preference.
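Agreement with human preference on these pairwise test sets reduces to a simple accuracy over verdicts. A minimal sketch, assuming each evaluated example carries a human label and the judge model's label (the `human_label`/`judge_label` field names are hypothetical, not the repo's actual schema):

```python
def pairwise_agreement(records):
    """Fraction of pairwise examples where the judge model's verdict
    matches the human preference label."""
    if not records:
        return 0.0
    matches = sum(1 for r in records if r["judge_label"] == r["human_label"])
    return matches / len(records)

if __name__ == "__main__":
    # Hypothetical evaluation records: labels 1/2 name the preferred response.
    results = [
        {"human_label": 1, "judge_label": 1},
        {"human_label": 2, "judge_label": 1},
        {"human_label": 2, "judge_label": 2},
    ]
    print(pairwise_agreement(results))
```

The same metric can be computed per test set (PandaLM, LLMBar, MT-Bench, Auto-J) to compare a fine-tuned judge against its base model.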
