repo for initial experiments in eighteenth century wem evaluation

NEU-DSG/wwp_analogies_word_embedding_models

 
 

wwp_analogies_wem

Repository Description

This repo contains data, scripts, and corpora related to developing a set of analogies relevant to historic text for model evaluation tasks. Word Embedding Models (WEMs) are typically evaluated by presenting the model with a series of analogies (for example, king - man + woman = ?) and assessing how well the model performs when asked to complete them. Currently, WEMs are typically evaluated using a set of analogies developed by Mikolov et al. (2013b), who also created the WEM algorithm Word2Vec. The analogy set contains 19,544 question pairs, which can be used to evaluate how well the model captures the semantic relationships within its vocabulary. This analogy set, however, reflects more contemporary understandings of vocabulary and semantic relationships and thus is not optimal for evaluating a WEM trained on pre-twentieth-century texts. This project aims both to develop a workflow for creating custom analogies for historic models and to offer a tentative set of analogies based on data obtained from a series of large corpora of pre-nineteenth-century texts.
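As a rough illustration of how an analogy question is scored, the sketch below uses the common vector-offset approach: for a question "a is to b as c is to ?", it computes b - a + c and picks the nearest remaining word by cosine similarity. This is a minimal sketch and not necessarily how the scripts in this repo work. The toy vectors, the question list, and the function names (`cosine`, `solve_analogy`, `evaluate`) are all invented for illustration. Questions containing a word missing from the vocabulary are skipped, which is one reason a contemporary analogy set can be a poor fit for a model trained on historic texts.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# A real WEM would supply vectors learned from a corpus.
TOY_VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}

# Each question reads "a is to b as c is to expected".
# The last one uses words absent from the toy vocabulary.
QUESTIONS = [
    ("man", "king", "woman", "queen"),
    ("king", "queen", "man", "woman"),
    ("Athens", "Greece", "Baghdad", "Iraq"),
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Return the word closest to b - a + c, excluding the question words."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def evaluate(vectors, questions):
    """Return (correct, attempted, skipped) counts for a list of questions."""
    correct = attempted = skipped = 0
    for a, b, c, expected in questions:
        # Skip questions with any out-of-vocabulary word.
        if any(w not in vectors for w in (a, b, c, expected)):
            skipped += 1
            continue
        attempted += 1
        if solve_analogy(vectors, a, b, c) == expected:
            correct += 1
    return correct, attempted, skipped


if __name__ == "__main__":
    correct, attempted, skipped = evaluate(TOY_VECTORS, QUESTIONS)
    print(f"correct={correct} attempted={attempted} skipped={skipped}")
```

A high skipped count relative to attempted would suggest that the analogy set's vocabulary does not match the corpus, which is the gap a custom historic analogy set is meant to close.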

Relevant Links

  • Spreadsheet of word counts and initial analogy testing
  • Link to a folder containing the entire corpus as a single text file as well as a cleaned version of the same text file with stop words removed
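The two resources above (word counts and a stop-word-free version of the corpus) could be produced along the following lines. This is a hedged sketch under stated assumptions, not the repo's actual cleaning script: the stop-word list is a deliberately tiny placeholder, the sample sentence is invented, and the simple `[a-z]+` tokenizer would likely need adjusting for early modern spellings and characters.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list; the list actually used
# to clean the corpus is not specified here.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was"}


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


# Invented sample sentence standing in for the corpus text file.
sample = "The king and the queen of the realm; the king was in the garden."

tokens = tokenize(sample)
cleaned = remove_stop_words(tokens)
counts = Counter(cleaned)

print(" ".join(cleaned))
print(counts.most_common(3))
```

Word counts like these can help confirm that every term in a candidate analogy occurs often enough in the corpus for the model to have learned a usable vector for it.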

Languages

  • Jupyter Notebook 75.3%
  • Python 24.7%