A12Studios/semantle-es

Semantle-es

This is a Spanish version of Semantle.

Running locally

One-time setup

  1. Get the Spanish Word2vec dataset from Spanish Billion Word Corpus and Embeddings: download the file in word2vec binary format to the data directory and unzip it.
  2. Download the "Lista total de frecuencias" data file (CREA_total.ZIP) from Corpus de Referencia del Español Actual (CREA) - Listado de frecuencias to the data directory. Do not unzip it.
  3. Create a python virtual environment: python3 -m venv .
  4. Activate the environment: source bin/activate
  5. Install all dependencies: python3 -m pip install -r requirements.txt
  6. Load the model into the sqlite db: python3 dump-vecs.py. Takes ~5 min on a 2.4 GHz Intel Core i5 MacBook Pro.
  7. Dump hints into a pickle file: python3 dump-hints.py. Takes ~30 min on a 2.4 GHz Intel Core i5.
  8. Load hints into the sqlite db: python3 store-hints.py. Fast.
  9. The respelling feature of Semantle-en does not appear to be needed or used here, so there should be no need to run british.py.
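
To give a feel for what "loading the model into sqlite" can look like, here is a minimal sketch that stores word vectors as float32 blobs and reads one back. It is not the code in dump-vecs.py: the table name word2vec, its columns, and the helper functions are hypothetical and chosen only for illustration, and the toy vectors stand in for the real embeddings.

```python
import sqlite3
import struct


def pack_vector(vec):
    """Serialize a sequence of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_vector(blob):
    """Inverse of pack_vector: bytes back to a tuple of floats."""
    return struct.unpack(f"<{len(blob) // 4}f", blob)


def store_vectors(con, vectors):
    """Write a {word: vector} mapping into a hypothetical word2vec table."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS word2vec (word TEXT PRIMARY KEY, vec BLOB)"
    )
    con.executemany(
        "INSERT OR REPLACE INTO word2vec VALUES (?, ?)",
        ((word, pack_vector(vec)) for word, vec in vectors.items()),
    )
    con.commit()


def load_vector(con, word):
    """Return the stored vector for word, or None if it is absent."""
    row = con.execute(
        "SELECT vec FROM word2vec WHERE word = ?", (word,)
    ).fetchone()
    return unpack_vector(row[0]) if row else None


if __name__ == "__main__":
    # Toy data in an in-memory database; the real model is far larger.
    con = sqlite3.connect(":memory:")
    store_vectors(con, {"gato": [0.5, 0.25, -1.0], "perro": [0.5, 0.5, -1.0]})
    print(load_vector(con, "gato"))
```

Keeping each vector in a single blob keyed by word means a guess can be looked up with one indexed query instead of loading the whole model into memory.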

Running it

  1. Run the web server: python3 semantle.py

Running in production

One-time setup

TBD

Running it

  1. Run the web server: ./start_server_prod.sh

Attribution

Original Semantle code by David Turner. Changes:

  • Improved dump-hints.py performance
  • Added progress indicators to the dump and store scripts
  • Localized to Spanish

Word2vec data set by Cristian Cardellino. Citation:

Cristian Cardellino: Spanish Billion Words Corpus and Embeddings (March 2016), https://crscardellino.github.io/SBWCE/

Frequent words data set from Corpus de referencia del español actual. Citation:

REAL ACADEMIA ESPAÑOLA: Banco de datos (CREA) [en línea]. Corpus de referencia del español actual. http://www.rae.es [2022-02-25]

License

GPL-3.0 (LICENSE). The repository also contains a COPYING file whose license was not identified.
