Help with Text Classification Documentation Recipe #5224

Closed
ClaudMor opened this issue Mar 28, 2020 · 2 comments
Labels
feat / textcat Feature: Text Classifier usage General spaCy usage

Comments

@ClaudMor

Hello,

From the documentation, there are two points which are not clear to me:

  1. In the context of text classification, could the textcat pipe benefit from being preceded by other spaCy pipes (e.g. sentencizer, ner, tok2vec, etc.)? If so, how?
  2. Is there a recommended way to combine predictions that come separately from spaCy's text-classification-oriented features (such as textcat and spaCy's word vectors) to improve performance?

I'd also like to take the chance to mention a Stack Overflow question I wrote, which I hope could help many non-programmers like me who are approaching spaCy to go beyond the basics.

Hope this is the right place to ask, thanks in advance for any help.

@svlandeg svlandeg added feat / textcat Feature: Text Classifier usage General spaCy usage labels Mar 30, 2020
@svlandeg
Member

svlandeg commented Mar 30, 2020

Hi @claudio20497: These kinds of questions are probably better kept on Stack Overflow, where there is a larger community that can help. We don't always have the bandwidth to review specific use cases, and we'd like to keep this issue tracker focused specifically on bug reports and feature requests.

That said, to answer your first question: the textcat pipeline currently does not consider features from previous pipes like NER or tagging. For spaCy v3.0, however, we're revamping the library so that you'll be able to change the ML models just by providing a different configuration file. This means you'll also be able to easily swap in a different tok2vec component. We're currently working on this on the develop branch. You might be interested in looking at this PR: #5143, which gives you an idea of where we're heading.
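
To make the first point concrete, here is a toy sketch in plain Python. It is not spaCy code and does not reflect how textcat is implemented: `ToyDoc`, `toy_ner`, `toy_textcat`, `run_pipeline` and the `POSITIVE_WORDS` list are all hypothetical names invented for illustration. It only shows the general idea that a classifier which reads nothing but the tokens produces the same scores whether or not an entity recognizer ran before it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a processed document (not spaCy's Doc)."""
    tokens: List[str]
    ents: List[str] = field(default_factory=list)
    cats: Dict[str, float] = field(default_factory=dict)


Component = Callable[[ToyDoc], ToyDoc]

# Assumed toy lexicon, only for this illustration.
POSITIVE_WORDS = {"great", "good", "love"}


def toy_ner(doc: ToyDoc) -> ToyDoc:
    # Naive "entity recognizer": treat capitalised tokens as entities.
    doc.ents = [tok for tok in doc.tokens if tok[:1].isupper()]
    return doc


def toy_textcat(doc: ToyDoc) -> ToyDoc:
    # Scores the document from its tokens only; it never reads doc.ents,
    # so annotations written by earlier components cannot influence it.
    hits = sum(tok.lower() in POSITIVE_WORDS for tok in doc.tokens)
    doc.cats = {"POSITIVE": hits / max(len(doc.tokens), 1)}
    return doc


def run_pipeline(text: str, components: List[Component]) -> ToyDoc:
    # Whitespace tokenisation, then each component in order.
    doc = ToyDoc(tokens=text.split())
    for component in components:
        doc = component(doc)
    return doc


text = "I love Berlin"
with_ner = run_pipeline(text, [toy_ner, toy_textcat])
without_ner = run_pipeline(text, [toy_textcat])

# The entity annotations differ, but the category scores are identical.
print(with_ner.ents, without_ner.ents)
print(with_ner.cats == without_ner.cats)
```

In this sketch, placing `toy_ner` before `toy_textcat` changes `doc.ents` but leaves `doc.cats` untouched, which mirrors the behaviour described above only in spirit.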

In the meantime, you may also be interested in this: https://github.com/explosion/spacy-transformers/blob/master/examples/train_textcat.py

I'll close this issue for now as there is no real action point on our side. I'll try to have a look at your SO post later this week.

@lock

lock bot commented May 5, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 5, 2020