myendless1/llm-as-a-judge

Train your LLM judge model on multiple training datasets in a unified manner.


Our training results

Prepare Data

  1. Make the raw data directory by running Data/make_raw_files_dir.sh.

  2. Download the raw files:

    Download raw/auto_judge/pairwise_traindata.jsonl at pairwise_traindata.jsonl.

    Download raw/auto_judge/testdata_pairwise.jsonl at testdata_pairwise.jsonl.

    Download raw/llm_bar/GPTInst/dataset.json at result.json.

    Download raw/llm_bar/GPTOut/dataset.json at result.json.

    Download raw/llm_bar/Manual/dataset.json at result.json.

    Download raw/llm_bar/Natural/dataset.json at result.json.

    Download raw/llm_bar/Neighbor/dataset.json at result.json.

    Download raw/mt_bench/gpt-4_pair.jsonl at gpt-4_pair.jsonl.

    Download raw/panda_lm/pandalm_test.json at pandalm_test.json.

  3. Format the raw data into the unified data format by running the corresponding scripts in Data/scripts/process.

  4. Combine the formatted data into the OpenAI format by running Data/scripts/prompts/combine.py, and generate test data in the same format by running Data/scripts/prompts/combine_test.py.

  5. Soft-link the output OpenAI-format training file into LLaMA-Factory/data and add an entry in dataset_info.json to fit the LLaMA-Factory training pipeline. Here is an example:

  "train_openai": {
    "file_name": "train_openai.json",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant",
      "system_tag": "system"
    }
  },
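Steps 3 and 4 above can be sketched in a few lines. This is a minimal illustration, not the repo's actual conversion scripts: the raw record fields (`instruction`, `response1`, `response2`, `label`), the system prompt, and the verdict wording are all assumptions; the only part taken from this README is the target message structure matching the dataset_info.json entry above.

```python
import json

def to_openai_format(record):
    """Convert one hypothetical raw pairwise record into the ShareGPT-style
    message list described by the dataset_info.json entry above."""
    # Field names below are assumptions for illustration, not the repo's schema.
    prompt = (
        "Given an instruction and two candidate responses, judge which one is better.\n"
        f"Instruction: {record['instruction']}\n"
        f"Response 1: {record['response1']}\n"
        f"Response 2: {record['response2']}"
    )
    answer = f"Response {record['label']} is better."
    return {
        "messages": [
            {"role": "system", "content": "You are a fair judge of response quality."},
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": answer},
        ]
    }

if __name__ == "__main__":
    raw = {
        "instruction": "Name a primary color.",
        "response1": "Blue.",
        "response2": "Bluish-green.",
        "label": 1,
    }
    # One converted example, ready to be collected into train_openai.json.
    print(json.dumps(to_openai_format(raw), indent=2))
```

Each raw test set would go through the same conversion so that training and evaluation share a single format.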

Download Pretrained Models

See LLaMA-Factory for the supported pretrained large language models, create a sub-folder Models, and put the pretrained checkpoints in it.

Fine-tuning your Judge Model!

Refer to our training examples in Train scripts; note that the configs for these examples are saved in Configs.

Test Judge Model performance

See the test scripts in Test scripts for more details. Judge models are tested on the PandaLM, LLMBar, MT-Bench, and Auto-J test sets with respect to human preference.
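Agreement with human preference on these pairwise test sets reduces to a simple accuracy over verdicts. A minimal sketch, assuming each evaluated example carries a human label and the judge model's label (the `human_label`/`judge_label` field names are hypothetical, not the repo's actual schema):

```python
def pairwise_agreement(records):
    """Fraction of pairwise examples where the judge model's verdict
    matches the human preference label."""
    if not records:
        return 0.0
    matches = sum(1 for r in records if r["judge_label"] == r["human_label"])
    return matches / len(records)

if __name__ == "__main__":
    # Hypothetical evaluation records: labels 1/2 name the preferred response.
    results = [
        {"human_label": 1, "judge_label": 1},
        {"human_label": 2, "judge_label": 1},
        {"human_label": 2, "judge_label": 2},
    ]
    print(pairwise_agreement(results))
```

The same metric can be computed per test set (PandaLM, LLMBar, MT-Bench, Auto-J) to compare a fine-tuned judge against its base model.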
