vitaLITy: creating GloVe and Specter document embeddings

Requirements

  • Python 3.9 - tested with Python 3.9 on macOS Sonoma
  • pip - package installer for Python
  • venv - creates isolated Python virtual environments

Setup

  • Create and activate a Python virtual environment (tested with Python 3.9).
  • brew install gcc
  • export CC=/opt/homebrew/Cellar/gcc/14.1.0_2/bin/g++-14 - the exact path depends on your system and the installed gcc version.
  • export CFLAGS="-Wa,-q"
  • pip install --upgrade pip setuptools wheel
  • python -m pip install -r requirements.txt
  • python -m spacy download en_core_web_sm
  • python -m nltk.downloader popular
  • Set OPEN_AI_KEY in config.py.
  • export TOKENIZERS_PARALLELISM=false

Note: if installation fails, running pip cache purge before retrying may help you start from a fresh installation.
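
Before running anything, it can help to confirm that the steps above took effect. The following is a minimal sketch, not part of the repository: the function name `check_environment` and the default module list are assumptions chosen for illustration, based only on the packages and variables named in the setup steps.

```python
import importlib.util
import os
import sys


def check_environment(required_modules=("spacy", "nltk", "en_core_web_sm")):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper: the module list mirrors the setup steps in this README.
    """
    problems = []

    # The README reports testing on Python 3.9 only.
    if sys.version_info[:2] != (3, 9):
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} detected; "
            "the README was tested on 3.9"
        )

    # find_spec returns None when a top-level module cannot be found.
    for name in required_modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not importable: {name}")

    # The setup steps export TOKENIZERS_PARALLELISM=false.
    if os.environ.get("TOKENIZERS_PARALLELISM", "").lower() != "false":
        problems.append("TOKENIZERS_PARALLELISM is not set to 'false'")

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Run it inside the activated virtual environment; no output means none of the checks found a problem. It does not verify the gcc settings or the OPEN_AI_KEY value in config.py.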

Run

  • Configure file paths in config.py. Sample data files are provided:
    • data/sample-dataset-sans-embeddings.tsv - the output of the scraper module, used as the input file for computing embeddings.
    • data/sample-dataset-with-embeddings.tsv - the output file with computed embeddings.
  • Run python embed.py
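
After a run, one quick sanity check is to compare the input and output files. The sketch below is an illustration rather than part of the repository: `compare_tsv` is a hypothetical helper, and it assumes only that both files are tab-separated with a header row. It makes no assumption about the column names that embed.py writes.

```python
import csv


def _read_header_and_count(path):
    """Return (header, data_row_count) for a tab-separated file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])
        count = sum(1 for _ in reader)
    return header, count


def compare_tsv(input_path, output_path):
    """Report which columns the output file adds and how many rows each file has."""
    in_header, in_rows = _read_header_and_count(input_path)
    out_header, out_rows = _read_header_and_count(output_path)
    return {
        # Columns present in the output but not in the input, in file order.
        "added_columns": [c for c in out_header if c not in in_header],
        "input_rows": in_rows,
        "output_rows": out_rows,
    }
```

For example, calling `compare_tsv("data/sample-dataset-sans-embeddings.tsv", "data/sample-dataset-with-embeddings.tsv")` on the sample files should list whichever embedding columns were added. A mismatch between the two row counts would suggest that some rows were dropped or split along the way.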

Credits

vitaLITy was created by Arpit Narechania, Alireza Karduni, Ryan Wesslen, and Emily Wall.

Citation

@article{narechania2021vitality,
  title={vitaLITy: Promoting Serendipitous Discovery of Academic Literature with Transformers \& Visual Analytics},
  author={Narechania, Arpit and Karduni, Alireza and Wesslen, Ryan and Wall, Emily},
  journal={IEEE Transactions on Visualization and Computer Graphics},
  year={2022},
  doi={10.1109/TVCG.2021.3114820},
  publisher={IEEE}
}

License

The software is available under the MIT License.

Contact

If you have any questions, feel free to open an issue or contact Arpit Narechania.