Add new models for NER information extraction #20

Open
eriknovak opened this issue Jul 17, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@eriknovak
Owner

Connected to a problem?

NER-based information extraction currently relies on a single GLiNER model, urchade/gliner_multi_pii-v1. This model supports several languages, but we need models that cover a more extensive list. The NER models should also support a wider range of domains.

Solution?

Find NER datasets, or create synthetic ones, that cover different languages and domains. We could then train models with the scripts provided by the GLiNER package and publish the trained models on the Hugging Face Hub.
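
As a rough sketch of the dataset preparation step, the snippet below converts character-offset annotations into token-span records. It assumes the training scripts expect records with a `tokenized_text` list and a `ner` list of `[start_token, end_token, label]` entries with inclusive token indices; that layout should be checked against the GLiNER repository before use. The helper name `to_token_spans` and the regex tokenizer are hypothetical and only for illustration.

```python
import re


def to_token_spans(text, entities):
    """Convert (char_start, char_end, label) annotations to token spans.

    char_end is exclusive. The output layout (tokenized_text / ner with
    inclusive token indices) is an assumption about the training format.
    """
    # Simple Unicode-aware tokenizer: runs of word characters or single
    # punctuation marks. Languages without whitespace need a real tokenizer.
    tokens = [(m.group(), m.start(), m.end())
              for m in re.finditer(r"\w+|[^\w\s]", text)]
    ner = []
    for start, end, label in entities:
        # Keep only tokens that lie fully inside the annotated character span.
        inside = [i for i, (_, s, e) in enumerate(tokens)
                  if s >= start and e <= end]
        if inside:
            ner.append([inside[0], inside[-1], label])
    return {"tokenized_text": [tok for tok, _, _ in tokens], "ner": ner}


record = to_token_spans(
    "Ana Novak lives in Ljubljana.",
    [(0, 9, "person"), (19, 28, "location")],
)
```

Annotations that do not align with any whole token are silently dropped here; a real pipeline would probably want to log them instead.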

As a bonus, we could evaluate these models across different languages and domains. However, this could be difficult due to the lack of open datasets for these use cases.
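
One possible shape for such an evaluation is exact-match, span-level precision, recall, and F1, micro-averaged per language. This is only a sketch: the function names and the `(language, gold_spans, predicted_spans)` input layout are hypothetical, and spans are assumed to be `(start_token, end_token, label)` tuples.

```python
from collections import defaultdict


def span_f1(gold, pred):
    """Exact-match precision, recall and F1 over two sets of spans."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def f1_by_language(examples):
    """Micro-averaged scores per language.

    examples: iterable of (language, gold_spans, predicted_spans).
    """
    buckets = defaultdict(lambda: (set(), set()))
    for i, (lang, gold, pred) in enumerate(examples):
        gold_all, pred_all = buckets[lang]
        # Prefix each span with the example index so that identical spans
        # from different sentences are not merged.
        gold_all.update((i, *span) for span in gold)
        pred_all.update((i, *span) for span in pred)
    return {lang: span_f1(g, p) for lang, (g, p) in buckets.items()}


examples = [
    ("en", [(0, 1, "person")], [(0, 1, "person")]),
    ("sl", [(0, 1, "person"), (4, 4, "location")],
           [(0, 1, "person"), (2, 2, "location")]),
]
scores = f1_by_language(examples)
```

The same grouping key could be a domain instead of a language, which would give per-domain scores without changing the rest of the code.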

Alternatives?

No response

@eriknovak eriknovak added the enhancement New feature or request label Jul 17, 2024
@eriknovak eriknovak self-assigned this Jul 17, 2024
@eriknovak eriknovak removed their assignment Oct 26, 2024