Feature Request for TextCategorizer: add numeric features for text classification #2253
Replies: 4 comments
-
Looking for a similar feature. I have text plus some additional numerical fields that should help with classification.
-
Until spaCy can incorporate additional features into the classifier, I have a quick question for the Explosion team, which I assume could also be useful for followers of this thread. Imagine I have some text that I want to classify, plus some really useful metadata that should be used as additional features for the classifier. Would it make sense to take my features and combine them with … ? The custom similarity section might be useful as well.
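One reading of the question above is early fusion: represent the text as a fixed-length vector (for instance spaCy's `Doc.vector`, which by default averages the token vectors) and append the metadata to it, standardized with statistics from the training set so that large-valued fields don't dominate. A minimal, dependency-free sketch, assuming plain lists of floats; `fit_scaler` and `combine_features` are hypothetical helper names, not spaCy API.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fit_scaler(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-column mean and population std of the training metadata."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def combine_features(
    text_vector: Sequence[float],
    metadata: Sequence[float],
    means: Sequence[float],
    stds: Sequence[float],
) -> List[float]:
    """Append standardized metadata to the text vector (early fusion)."""
    scaled = [(x - m) / s for x, m, s in zip(metadata, means, stds)]
    return list(text_vector) + scaled


# Hypothetical training metadata: two numeric fields per document.
train_meta = [[10.0, 1.0], [20.0, 3.0], [30.0, 5.0]]
means, stds = fit_scaler(train_meta)

# Hypothetical 3-dimensional text vector (Doc.vector in en_core_web_lg has 300 dims).
features = combine_features([0.1, -0.2, 0.3], [20.0, 3.0], means, stds)
print(features)  # [0.1, -0.2, 0.3, 0.0, 0.0]
```

The combined vectors could then be fed to any classifier outside spaCy; the important part is that the scaler is fit on the training data only and reused unchanged at prediction time.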
-
Regarding @mr-bjerre's quick question (in case he hasn't found an answer yet): I just dealt with a task that may fit his scenario. This is how I dealt with it:
Then I did the same with the test set. In both cases I used en_core_web_lg (is there a better choice?). Finally, the predictions on the held-out 20% of the training set were used to train a scikit-learn logistic regression (I also tried a scikit-learn MLP and a scikit-learn SVM, both with poorer validation performance). (You can find a more detailed explanation of the procedure in the Stack Overflow question I wrote.) Hope it helps. That said, this feels like a rather artificial, "third-party" way to deal with metadata.
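A minimal sketch of the stacking step described above, assuming the text classifier's positive-class probability on the held-out 20% is concatenated with the metadata before the second-stage model is fit (the comment does not spell out this detail). The comment used scikit-learn's logistic regression; here a plain full-batch gradient-descent logistic regression stands in so the example has no dependencies. All data and the names `train_logreg` and `predict_proba` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 1.0,
    epochs: int = 5000,
) -> Tuple[List[float], float]:
    """Fit logistic regression by full-batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability of the positive class for one stacked feature row."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical held-out rows: [text model's P(positive), standardized metadata field].
X_holdout = [
    [0.9, 1.2], [0.8, 0.4], [0.7, -0.3],
    [0.3, 0.9], [0.2, -1.1], [0.1, -0.6],
]
y_holdout = [1, 1, 1, 0, 0, 0]

w, b = train_logreg(X_holdout, y_holdout)
# Score a new document: its text-model probability plus its metadata value.
print(round(predict_proba(w, b, [0.85, 0.8]), 3))
```

Fitting the second stage on predictions for data the text model never saw is the point of the 80/20 split: scores on the text model's own training data tend to be overconfident, and the combiner would learn to trust them too much. With scikit-learn installed, `LogisticRegression().fit(X_holdout, y_holdout)` followed by `predict_proba` would replace the hand-written version.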
-
While I realise this is quite an old thread, I just wanted to point out that with the upcoming v3 version of spaCy, it should become much easier to customize this kind of thing. You can find more information about the current release candidate here: https://nightly.spacy.io/
Another interesting recent discussion about using custom features in v3 is #6527; see in particular the comment by @bratao there. For more details on customizing model architectures, see https://nightly.spacy.io/usage/layers-architectures
-
I have extra numeric data that comes with the text I am trying to classify. I suspect these features will help the model learn.
I also have sentence/paragraph-level embeddings that would help the classifier.
I imagine this situation comes up fairly often in text classification: there is data accompanying the text that may help with classification.