Contains source code to compute document embeddings (GloVe, Specter) for the academic articles corpus

vitality-vis/embed

vitaLITy: creating GloVe and Specter document embeddings

Requirements

  • Python 3.9 - tested with Python 3.9 on macOS Sonoma
  • pip - package installer for Python
  • venv - creates isolated Python virtual environments

Setup

  • Create and activate a Python virtual environment. We have tested with Python 3.9.
  • brew install gcc
  • export CC=/opt/homebrew/Cellar/gcc/14.1.0_2/bin/g++-14 (this path depends on your system and the installed gcc version; adjust it accordingly)
  • export CFLAGS="-Wa,-q"
  • pip install --upgrade pip setuptools wheel
  • python -m pip install -r requirements.txt
  • python -m spacy download en_core_web_sm
  • python -m nltk.downloader popular
  • Set OPEN_AI_KEY in config.py.
  • export TOKENIZERS_PARALLELISM=false

Note: if the installation fails, running pip cache purge first may be needed to start from a fresh installation.
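
Several of the setup steps above depend on shell state (the Python version, CC, TOKENIZERS_PARALLELISM) that is easy to lose when opening a new terminal. The following is a minimal sketch of a pre-flight check, not part of this repository: the function name check_setup and its messages are hypothetical, and it only inspects the values named in the steps above.

```python
import os
import sys


def check_setup(env=None, version=None):
    """Return a list of warnings; an empty list means every check passed.

    Hypothetical helper (not part of this repository). `env` and `version`
    default to the current process environment and interpreter version.
    """
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []

    # The README reports testing on Python 3.9 only.
    if version != (3, 9):
        problems.append(
            f"Python {version[0]}.{version[1]} in use; setup was tested on 3.9"
        )

    # The setup steps export TOKENIZERS_PARALLELISM=false.
    if env.get("TOKENIZERS_PARALLELISM", "").lower() != "false":
        problems.append("TOKENIZERS_PARALLELISM is not set to 'false'")

    # The setup steps export CC before installing requirements.
    if not env.get("CC"):
        problems.append("CC is not set; the install step may not find the compiler")

    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("warning:", problem)
```

Run it inside the activated virtual environment before python embed.py; no output means the three checks passed. It does not verify that the spaCy model, the NLTK data, or OPEN_AI_KEY are in place.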

Run

  • Configure file paths in config.py. Sample data files are provided:
    • data/sample-dataset-sans-embeddings.tsv - sample input file: the output of the scraper module, used as the input for computing embeddings.
    • data/sample-dataset-with-embeddings.tsv - sample output file containing the computed embeddings.
  • Run python embed.py
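
Before pointing config.py at a new input file, it can help to confirm that the file parses as tab-separated data. The sketch below is an illustration only and is not part of this repository: summarize_tsv is a hypothetical helper, and it assumes the file is UTF-8 encoded, has a header row, and uses standard CSV quoting. It makes no assumption about which columns embed.py expects.

```python
import csv


def summarize_tsv(path):
    """Return (column_names, row_count) for a tab-separated file.

    Hypothetical helper (not part of this repository). Assumes the first
    line is a header row; data rows are counted without being loaded
    into memory all at once.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])  # empty list if the file has no lines
        row_count = sum(1 for _ in reader)
    return header, row_count


if __name__ == "__main__":
    columns, rows = summarize_tsv("data/sample-dataset-sans-embeddings.tsv")
    print(f"{rows} rows, columns: {columns}")
```

Comparing the column names of your own file against those of the provided sample input is a quick way to spot a mismatch before running the embedding step.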

Credits

vitaLITy was created by Arpit Narechania, Alireza Karduni, Ryan Wesslen, and Emily Wall.

Citation

@article{narechania2021vitality,
  title={vitaLITy: Promoting Serendipitous Discovery of Academic Literature with Transformers \& Visual Analytics},
  author={Narechania, Arpit and Karduni, Alireza and Wesslen, Ryan and Wall, Emily},
  journal={IEEE Transactions on Visualization and Computer Graphics},
  year={2022},
  doi={10.1109/TVCG.2021.3114820},
  publisher={IEEE}
}

License

The software is available under the MIT License.

Contact

If you have any questions, feel free to open an issue or contact Arpit Narechania.
