A library & tools to evaluate predictive language models.
Updated Aug 9, 2023 - Python
Code and data for "KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark" (LREC-COLING 2024)
A data construction and evaluation framework to quantify privacy norm awareness of language models (LMs) and emerging privacy risk of LM agents. (NeurIPS 2024 D&B)