Data repository for "Fine-tuning protein language models boosts predictions across diverse tasks"

RSchmirler/data-repo_plm-finetune-eval

Repository for "Fine-tuning protein language models boosts predictions across diverse tasks"

This repo contains all data used and generated during this work (Preprint). We also provide notebooks to reproduce our work, including examples.

  • Embedding contains notebooks to generate embeddings and train embedding-based predictors
  • Finetuning contains notebooks to fine-tune all protein language models used in our work
  • data contains all data for figures in the main manuscript
  • SOM data contains all data for figures and tables in the Supplementary Online Material
  • training_logs.zip contains the raw training history log files our analysis is based on
  • training data.zip contains all training datasets used for this work. Each dataset consists of a training, validation, and test set. When using these data, please cite and consult the authors of the original datasets. We recommend using the original data sources (linked below) wherever possible: the copies available here will not be kept up to date and mainly serve reproduction purposes.
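
Before working with either archive, it can help to check what it actually contains. The following is a minimal sketch, not part of the original notebooks: the helper name `list_archive_members` is hypothetical, and it assumes that `training data.zip` has been downloaded into the current working directory. It makes no assumption about the folder layout inside the archive and only lists the file names it finds.

```python
import zipfile
from pathlib import Path


def list_archive_members(archive_path):
    """Return the sorted names of all files (not directories) in a zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(
            info.filename for info in archive.infolist() if not info.is_dir()
        )


if __name__ == "__main__":
    # Assumed location: the archive sits in the current working directory.
    archive_path = Path("training data.zip")
    if archive_path.exists():
        for name in list_archive_members(archive_path):
            print(name)
    else:
        print(f"{archive_path} not found; download it from the repository first.")
```

The same helper should work for `training_logs.zip`, since it relies only on the standard `zipfile` module.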

License

The data in this repository are released under the terms of the CC-BY-4.0 license, which you can find in the LICENSE.txt file.

The source code in this repository is licensed under the MIT license, which you can find in the MIT-LICENSE.txt file.
