This is a Slovak spaCy model.
- Requires spaCy 3.x.
- Contains floret word vectors.
- The tagger uses the Slovak National Corpus tagset.
- The morphological analyzer uses the Universal Dependencies tagset and is trained on the Slovak Dependency Treebank.
- The lemmatizer is trained on the Slovak Dependency Treebank.
- The named entity recognizer is trained separately on the WikiAnn dataset.
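Each component listed above writes to a different spaCy attribute. The following is a minimal sketch of collecting them, assuming the model has already been loaded with spacy.load(); the helper name annotate and the keys of the returned dictionaries are hypothetical. It uses only standard spaCy 3 Token and Doc attributes: tag_ for the Slovak National Corpus tag, pos_ and morph for the Universal Dependencies analysis, lemma_ for the lemma, dep_ and head for the dependency relation, and doc.ents for named entities.

```python
def annotate(nlp, text):
    """Run a loaded spaCy 3.x pipeline on text and collect per-component output.

    nlp is any callable pipeline, e.g. the object returned by spacy.load().
    """
    doc = nlp(text)
    tokens = [
        {
            "text": token.text,
            "tag": token.tag_,          # tagger: Slovak National Corpus tag
            "pos": token.pos_,          # morphologizer: Universal Dependencies POS
            "morph": str(token.morph),  # morphologizer: UD features, e.g. "Case=Nom|Number=Sing"
            "lemma": token.lemma_,      # lemmatizer
            "dep": token.dep_,          # parser: dependency relation label
            "head": token.head.i,       # parser: index of the syntactic head
        }
        for token in doc
    ]
    # Named entities predicted by the NER component.
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return tokens, entities
```

For example, after nlp = spacy.load("path/to/model") (the path is a placeholder), annotate(nlp, "Daniel Hládek pracuje na Technickej univerzite v Košiciach.") returns one dictionary per token plus the list of recognized entities. A component missing from the pipeline simply leaves its attribute empty.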
- Model trained for lemmatization, POS tagging and dependency relations.
- Contains floret word vectors trained on our web corpus.
- Should be free of licensing issues.
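Whether a loaded pipeline ships word vectors, and whether they are floret vectors, can be read from its vocabulary. A minimal sketch, assuming spaCy 3.2 or later, where the Vectors table exposes a mode attribute ("default" or "floret"); the helper name describe_vectors is hypothetical.

```python
def describe_vectors(nlp):
    """Report whether a loaded spaCy pipeline has word vectors, and of which kind."""
    vectors = nlp.vocab.vectors
    # shape is (rows, width); a pipeline without vectors reports (0, 0).
    # For floret vectors, rows is the number of hash buckets, not the vocabulary size.
    rows, width = vectors.shape
    # Older spaCy versions have no mode attribute and only support "default".
    mode = getattr(vectors, "mode", "default")
    return {"has_vectors": rows > 0, "mode": mode, "rows": rows, "width": width}
```

A pipeline with floret vectors would be expected to report mode "floret", while a pipeline without vectors should report has_vectors as False.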
spaCy 3.4, NER + Dependencies.
- Includes the dependencies model.
- Uses a separate fine-tuned model for named entity recognition.
- spaCy 3.3, Dependencies. Model trained for lemmatization, POS tagging and dependency relations.
- spaCy 3.3, NER + Dependencies. Uses a separate fine-tuned model for named entity recognition.
These models do not have word vectors.
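The NER + Dependencies variants combine a dependency pipeline with a separately fine-tuned NER model. One general way to do this in spaCy 3 is to source the trained component into the other pipeline with Language.add_pipe(name, source=other_nlp). This is a sketch of that technique, not necessarily how the published models were assembled; the helper name add_separate_ner and the component names "ner" and "transformer" are assumptions. Because the models use spacy-transformers, the NER component may listen to a shared transformer, so the sketch first calls Language.replace_listeners to give it its own copy.

```python
def add_separate_ner(base_nlp, ner_nlp, ner_name="ner"):
    """Copy a separately trained NER component into a dependency pipeline.

    Assumes ner_nlp contains a component called ner_name (hypothetical default
    "ner") that may listen to a shared component called "transformer".
    """
    if "transformer" in ner_nlp.pipe_names:
        transformer = ner_nlp.get_pipe("transformer")
        # Only replace the listener if the NER component actually listens to it;
        # replace_listeners raises an error otherwise.
        if ner_name in getattr(transformer, "listener_map", {}):
            ner_nlp.replace_listeners("transformer", ner_name, ["model.tok2vec"])
    # Source the trained component (config and weights) from the NER pipeline.
    base_nlp.add_pipe(ner_name, source=ner_nlp)
    return base_nlp
```

For example, add_separate_ner(spacy.load("path/to/dependencies-model"), spacy.load("path/to/ner-model")) with placeholder paths, followed by nlp.to_disk(...) to save the combined pipeline. spaCy may warn when the two pipelines use different word vectors.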
Requirements for training:
- Anaconda virtual environment
- spaCy 3
- make
- bash
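The requirements above can be checked before starting a long training run. A minimal sketch using only the Python standard library; it assumes that an activated Conda environment exports CONDA_PREFIX (Conda sets it on activation) and that spaCy is installed under the distribution name "spacy". The function name check_training_requirements is hypothetical.

```python
import os
import shutil
from importlib import metadata


def check_training_requirements():
    """Return a list of unmet training requirements (empty if all are met)."""
    problems = []
    # Assumption: an activated Conda environment exports CONDA_PREFIX.
    if not os.environ.get("CONDA_PREFIX"):
        problems.append("no active Conda environment (CONDA_PREFIX is not set)")
    try:
        version = metadata.version("spacy")
        if version.split(".")[0] != "3":
            problems.append(f"spaCy 3 is required, found {version}")
    except metadata.PackageNotFoundError:
        problems.append("spaCy is not installed")
    # make and bash are listed as requirements; look them up on PATH.
    for tool in ("make", "bash"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling check_training_requirements() inside the intended environment should return an empty list; each string it returns names one missing requirement.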
Usage:
- Install the dependencies in the Conda environment:
  ./prepare-env.sh
- Download and prepare the data:
  make
- Train the models:
  ./train.sh
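The three steps can also be chained so that a failing step stops the run. A minimal sketch, assuming it is started from the repository root and that prepare-env.sh installs into a Conda environment that is already active (a child process cannot activate an environment for the steps that follow); the names STEPS and run_steps are hypothetical.

```python
import subprocess

# The three documented steps, run in order from the repository root.
STEPS = (
    ("./prepare-env.sh",),  # install the dependencies in the Conda environment
    ("make",),              # download and prepare the data
    ("./train.sh",),        # train the models
)


def run_steps(steps=STEPS, cwd="."):
    """Run each command in order; raise CalledProcessError at the first failure."""
    for command in steps:
        print("running:", " ".join(command), flush=True)
        # check=True turns a non-zero exit status into an exception,
        # so later steps never run on top of a failed one.
        subprocess.run(command, cwd=cwd, check=True)
```

Calling run_steps() performs the full sequence; passing a shorter tuple of steps reruns only part of it, for example training again without re-downloading the data.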
Author:
Daniel Hládek (daniel.hladek@tuke.sk) and the Technical University of Košice
Sources:
- The model uses spacy-transformers and SlovakBERT.
- Part-of-speech tags and dependency relations: the Slovak UD treebank, licensed under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
- Semi-automatic named entities: unspecified license.