This repository supports YuzuAI's Rakuda leaderboard of Japanese LLMs, which is a Japanese-focused analogue of LMSYS' Vicuna eval.
To add a model to the Rakuda leaderboard, first have the model answer the Rakuda questions. These questions are stored in jrank/questions/ and on Hugging Face.
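
If you want to pull the questions from the Hugging Face Hub instead of the local files, a sketch like the following works with the `datasets` library. The dataset id used here is an assumption, so check the Hub for the canonical name.

```python
# Minimal sketch of loading the Rakuda questions from the Hugging Face Hub.
# The dataset id below is an assumption; verify it on the Hub before relying on it.
from datasets import load_dataset

questions = load_dataset("yuzuai/rakuda-questions", split="train")
for q in questions.select(range(3)):
    print(q)  # each row holds one benchmark question plus metadata
```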
If you wish, you can use the jrank/get_model_qa.py script to generate these answers. This script loads and runs models using model adapters from FastChat. Custom adapters can also be implemented in jrank/adapters.py, and the scripts in jrank/jobs/ show the exact commands used to run the existing leaderboard models. If your model is only accessible via an API, consult jrank/get_gpt_qa.py.
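
FastChat model adapters implement a small interface: match a model path, load the model and tokenizer, and choose a conversation template. A custom adapter added to jrank/adapters.py could look roughly like the sketch below; the class name, model-path pattern, and template choice are illustrative, not part of the repository.

```python
# Rough sketch of a custom FastChat model adapter. Names are illustrative;
# adapt them to your model and register the class the same way jrank/adapters.py does.
from fastchat.conversation import get_conv_template
from fastchat.model.model_adapter import BaseModelAdapter, register_model_adapter
from transformers import AutoModelForCausalLM, AutoTokenizer


class MyJapaneseModelAdapter(BaseModelAdapter):
    """Adapter for a hypothetical Japanese chat model."""

    def match(self, model_path: str) -> bool:
        # Claim any model path containing this substring.
        return "my-japanese-model" in model_path.lower()

    def load_model(self, model_path: str, from_pretrained_kwargs: dict):
        tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
        model = AutoModelForCausalLM.from_pretrained(model_path, **from_pretrained_kwargs)
        return model, tokenizer

    def get_default_conv_template(self, model_path: str):
        # Pick whichever FastChat conversation template matches your model's prompt format.
        return get_conv_template("vicuna_v1.1")


register_model_adapter(MyJapaneseModelAdapter)
```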
Once your model has answered the Rakuda questions, use jrank/matchmaker.py to send pairs of answers from your model and other ranked models to an external reviewer, by default GPT-4 (jrank/reviewer_gpt.py). The reviewer will evaluate which answer is better and store its results in jrank/reviews.
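
Conceptually, each review is a single pairwise comparison: the reviewer sees the question plus two candidate answers and returns a verdict. A stripped-down sketch of that call is shown below; the prompt wording and verdict parsing are assumptions, so see jrank/reviewer_gpt.py for the actual logic.

```python
# Minimal sketch of a pairwise GPT-4 review. The prompt and output format are
# assumptions; the real implementation lives in jrank/reviewer_gpt.py.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def review_pair(question: str, answer_a: str, answer_b: str) -> str:
    prompt = (
        "You are judging two assistant answers to the same question.\n"
        f"Question: {question}\n\n"
        f"Answer A: {answer_a}\n\n"
        f"Answer B: {answer_b}\n\n"
        "Which answer is better? Reply with exactly 'A', 'B', or 'tie'."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()
```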
Finally, run the analysis notebook jrank/bradley-terry.ipynb, which performs a Bayesian analysis of the reviews and infers the strength of each model. The resulting ranking is written to jrank/rankings/.
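
The notebook's Bayesian fit is built on the Bradley-Terry model, in which each model i has a latent strength θ_i and P(i beats j) = σ(θ_i − θ_j). For intuition only, the toy sketch below fits strengths to a few made-up pairwise outcomes using a simple penalized maximum-likelihood estimate rather than the notebook's full Bayesian treatment.

```python
# Toy Bradley-Terry fit on made-up pairwise outcomes, for intuition only.
# The real analysis in jrank/bradley-terry.ipynb is Bayesian and reads the
# verdicts stored in jrank/reviews.
import numpy as np
from scipy.optimize import minimize

# Each entry is (winner_index, loser_index) for one review.
matches = [(0, 1), (0, 1), (1, 0), (0, 2), (2, 1), (0, 2)]
n_models = 3


def neg_log_likelihood(theta):
    # Bradley-Terry: P(winner beats loser) = sigmoid(theta_winner - theta_loser)
    nll = 0.0
    for winner, loser in matches:
        diff = theta[winner] - theta[loser]
        nll += np.log1p(np.exp(-diff))  # -log sigmoid(diff)
    # Small L2 penalty pins the overall offset (strengths are only identified
    # up to an additive constant).
    return nll + 1e-3 * np.sum(theta**2)


result = minimize(neg_log_likelihood, np.zeros(n_models))
print("Estimated strengths:", result.x)  # higher = stronger model
```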