dipankarsrirag/eval-dialect-robust

Code and dataset used to evaluate dialect robustness of large language models.

Evaluating Dialect Robustness of Language Models via Conversation Understanding

Authors: Dipankar Srirag, Nihar Ranjan Sahoo, and Aditya Joshi

DOI: 10.48550/arXiv.2405.05688

Abstract

With an ever-growing number of LLMs reporting superlative performance for English, their ability to perform equitably for different dialects of English (i.e., dialect robustness) needs to be ascertained. Specifically, we use English-language conversations (US English or Indian English) between humans who play the word-guessing game of taboo. We formulate two evaluative tasks: target word prediction (TWP) (i.e., predict the masked target word in a conversation) and target word selection (TWS) (i.e., select the most likely masked target word in a conversation from among a set of candidate words). Extending MD-3, an existing dialectal dataset of taboo-playing conversations, we introduce MMD-3, a target-word-masked version of MD-3 with en-US and en-IN subsets. We create two further subsets: en-MV (where en-US is transformed to include dialectal information) and en-TR (where dialectal information is removed from en-IN). We evaluate one open-source (Llama3) and two closed-source (GPT-4/3.5) LLMs. LLMs perform significantly better for US English than Indian English on both TWP and TWS, across all settings, exhibiting marginalisation against the Indian dialect of English. While the GPT-based models perform best, the comparatively smaller models work more equitably after fine-tuning. Our error analysis shows that the LLMs can understand the dialect better after fine-tuning on dialectal data. Our evaluation methodology offers a novel way to examine attributes of language models using pre-existing dialogue datasets.
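
To make the two tasks concrete, the sketch below shows how a masked taboo-style conversation might be posed to an LLM for TWP and TWS. It is a minimal illustration, not the repository's actual pipeline: the example conversation, prompt wording, candidate set, and use of the OpenAI chat API are all assumptions made for this sketch.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Toy taboo-style exchange with the target word masked, in the spirit of MMD-3.
# Both the conversation and the candidate set are invented for illustration.
conversation = (
    "Describer: It is a fruit, yellow when ripe, and monkeys love it.\n"
    "Guesser: Is it a [MASK]?"
)
candidates = ["banana", "mango", "papaya", "guava"]


def ask(prompt: str) -> str:
    """Send a single-turn prompt to the model and return its reply."""
    resp = client.chat.completions.create(
        model="gpt-4",  # one of the closed-source models evaluated in the paper
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()


def target_word_prediction(conv: str) -> str:
    """TWP: predict the masked target word from the conversation alone."""
    return ask(
        "The following conversation is from a word-guessing game. "
        "Reply with the single word hidden by [MASK].\n\n" + conv
    )


def target_word_selection(conv: str, options: list[str]) -> str:
    """TWS: select the masked target word from a set of candidates."""
    return ask(
        "The following conversation is from a word-guessing game. "
        f"Which of these words is hidden by [MASK]: {', '.join(options)}? "
        "Reply with one word.\n\n" + conv
    )


print("TWP:", target_word_prediction(conversation))
print("TWS:", target_word_selection(conversation, candidates))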

Keywords

  • Large Language Models
  • Dialect Robustness
  • Conversation Understanding
  • Word-Guessing Game

BibTeX Citation

@misc{srirag2024evaluating,
      title={Evaluating Dialect Robustness of Language Models via Conversation Understanding}, 
      author={Dipankar Srirag and Nihar Ranjan Sahoo and Aditya Joshi},
      year={2024},
      eprint={2405.05688},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contact
