Pinned
- EleutherAI/lm-evaluation-harness (Public): A framework for few-shot evaluation of language models.
- ltgoslo/noreval (Public): A Norwegian Language Understanding and Generation Evaluation Benchmark
- ai-forever/mgpt (Public): Multilingual Generative Pretrained Model
- RussianNLP/RussianSuperGLUE (Public): Russian SuperGLUE benchmark
- PragmaticsLab/vote_and_rank (Public): Novel aggregation methods for multi-task NLP benchmarking
- Toloka/beemo (Public): Benchmark for fine-grained machine-generated text detection. 6.5k texts written by humans, generated by ten open-source instruction-finetuned LLMs and edited by expert annotators.