LiTERatE is a benchmark for evaluating machine translation systems on literary text in Chinese, Japanese, and Korean. Unlike traditional MT benchmarks built around news articles, technical documentation, or general text, LiTERatE targets the creative, nuanced nature of literary translation and the distinct challenges it poses.
Our evaluation uses chunks of 200-500 CJK characters as the basic unit, and every system receives the same terminology glossary and contextual information. An ensemble of LLM judges compares each system's translation head-to-head against a human translation; on chunks where human raters reached a decisive verdict, the ensemble agrees with them 82% of the time.
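For illustration, the protocol can be sketched as follows. This is a minimal Python sketch under stated assumptions: the judge model list and the `ask_judge` helper are hypothetical placeholders (stubbed here with a coin flip so the sketch runs), not the benchmark's actual implementation.

```python
# Minimal sketch of ensemble head-to-head judging; the judge list and
# helper below are hypothetical, not LiTERatE's actual implementation.
import random
from collections import Counter

JUDGE_MODELS = ["judge-a", "judge-b", "judge-c"]  # assumed judge ensemble

def ask_judge(judge: str, source: str, candidate: str,
              reference: str, glossary: dict) -> str:
    """Return 'A' if this judge prefers the system candidate, 'B' for
    the human reference. Stubbed with a coin flip; a real implementation
    would prompt the judge model with the source chunk, both
    translations, and the glossary."""
    return random.choice("AB")

def ensemble_verdict(source: str, candidate: str,
                     reference: str, glossary: dict) -> str:
    """Majority vote across the judge ensemble for one chunk."""
    votes = Counter(
        ask_judge(j, source, candidate, reference, glossary)
        for j in JUDGE_MODELS
    )
    return votes.most_common(1)[0][0]  # 'A' = system wins the chunk
```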
Each system's score is its win rate against human translators, on a 0-100 scale. A score of 50 indicates parity with human translation quality; scores above 50 mean the judges preferred the system's output more often than the human translation.
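Concretely, a system's score is just its share of chunk-level wins scaled to 0-100, as in the sketch below. Counting a tie as half a win is an assumption here; the benchmark's own tie-handling convention may differ.

```python
def win_rate(wins: int, losses: int, ties: int = 0) -> float:
    """Head-to-head win rate versus the human reference, on a 0-100
    scale. Splitting ties evenly is an assumption, not necessarily
    the benchmark's convention."""
    total = wins + losses + ties
    return 100 * (wins + 0.5 * ties) / total

# Example: winning 162 of 300 chunks gives 54.0 -- just above the
# 50-point parity line, where system and human are preferred equally.
print(win_rate(162, 138))  # 54.0
```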
This repository contains the website for the LiTERatE benchmark. The leaderboard below lists each system's overall head-to-head win rate versus human translators.
| Rank | Model | Version | Win Rate (%) |
|---|---|---|---|
| 1 | DeepSeek R1 | — | 87.3 |
| 2 | Omni Qi | — | 67.6 |
| 3 | o3-mini | o3-mini-2025-01-31 | 62.7 |
| 4 | GPT-4o | gpt-4o-2024-11-20 | 61.0 |
| 5 | DeepSeek V3 | — | 57.7 |
| 6 | Claude 3.7 Sonnet | claude-3-7-sonnet-20250219 | 54.0 |
| 7 | Claude 3.5 Sonnet | claude-3-5-sonnet-20241022 | 52.0 |
| 8 | Gemini 1.5 Pro | gemini-1.5-pro-002 | 50.7 |
| 9 | Qwen Max | qwen-max-2025-01-25 | 49.3 |
| 10 | Qwen Plus | qwen-plus-2025-01-25 | 49.3 |
| 11 | Gemini 2.0 Flash | gemini-2.0-flash-001 | 43.0 |
| 12 | Mistral Large | mistral-large-2411 | 40.0 |
| 13 | Gemini 1.5 Flash 8B | gemini-1.5-flash-8b-001 | 38.3 |
| 14 | GPT-4o-mini | gpt-4o-mini-2024-07-18 | 35.3 |
| 15 | Phi-4 | — | 33.0 |
| 16 | Llama 3.3 70B | llama-3.3-70b-instruct | 32.7 |
| 17 | Gemini 2.0 Flash Lite | gemini-2.0-flash-lite-001 | 31.3 |
| 18 | Claude 3.5 Haiku | claude-3-5-haiku-20241022 | 30.7 |
| 19 | Mistral Small 3 | mistral-small-24b-instruct-2501 | 27.7 |
| 20 | Qwen Turbo | qwen-turbo-2024-11-01 | 27.3 |
| 21 | Google Translate (NMT) | — | 6.7 |
Last updated: 2025-02-27