wwp_analogies_wem

Repository Description

This repo contains data, scripts, and corpora related to developing a set of analogies relevant to historic text for model evaluation tasks. Word embedding models (WEMs) are typically evaluated by presenting the model with a series of analogies (for example, king - man + woman = ?) and assessing how well the model completes them. Currently, WEMs are typically evaluated using a set of analogies developed by Mikolov et al. (2013b), who also created the WEM algorithm Word2Vec. The analogy set contains 19,544 question pairs, which can be used to evaluate how well the model captures the semantic relationships within its vocabulary. This analogy set, however, reflects more contemporary understandings of vocabulary and semantic relationships and thus is not optimal for evaluating a WEM trained on pre-twentieth-century texts. This project aims both to develop a workflow for creating custom analogies for historic models and to offer a tentative set of analogies based on data obtained from a series of large corpora of pre-nineteenth-century texts.
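As a rough illustration of the analogy-based evaluation described above, the sketch below scores a model on a small analogy file. It assumes the plain-text layout used by the Mikolov et al. analogy set (a `: section` header followed by lines of four words, read as "a is to b as c is to d") and the common vector-offset approach of predicting `d` as the word nearest to `b - a + c`. The toy vectors, the `royalty` section, and all function names are hypothetical and exist only to keep the example self-contained; this is not necessarily how `wem-evaluation.ipynb` performs the evaluation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(vectors, a, b, c):
    """Return the word nearest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    target = [y - x + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def parse_analogies(text):
    """Parse ': section' headers and four-word question lines into a dict."""
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(":"):
            current = line[1:].strip()
            sections[current] = []
        else:
            words = line.lower().split()
            if current is not None and len(words) == 4:
                sections[current].append(tuple(words))
    return sections


def evaluate(vectors, sections):
    """Return {section: (correct, attempted)}, skipping out-of-vocabulary questions."""
    results = {}
    for name, questions in sections.items():
        usable = [q for q in questions if all(w in vectors for w in q)]
        correct = sum(solve_analogy(vectors, a, b, c) == d for a, b, c, d in usable)
        results[name] = (correct, len(usable))
    return results


# Hypothetical toy vectors; a real run would load vectors from a trained model.
toy_vectors = {
    "man": [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "crown": [0.0, 0.0, 1.0],
}

# Hypothetical analogy file contents; "lord" and "lady" are out of vocabulary.
toy_analogies = """
: royalty
man woman king queen
man king woman queen
lord lady king queen
"""

results = evaluate(toy_vectors, parse_analogies(toy_analogies))
for section, (correct, attempted) in results.items():
    print(f"{section}: {correct}/{attempted} correct")
```

Questions containing a word that is missing from the model's vocabulary are skipped rather than counted as wrong. This matters for historic corpora, where many words from a contemporary analogy set may not appear at all.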

Relevant Links

Spreadsheet of word counts and initial analogy testing
Link to a folder containing the entire corpus as a single text file as well as a cleaned version of the same text file with stop words removed

Folders and files

data
models
scripts
visualizations
.gitattributes
README.md
wem-evaluation.ipynb

NEU-DSG/wwp_analogies_word_embedding_models
