Add beam_parser and beam_ner components for v3 #6369
…into feature/v3-beam
Just a +1. I trained my own NER in v3 using this PR with a small beam size, and I got some substantial gains over the greedy NER version 👏
@bratao Oh that's really nice to hear, thanks! Can you share:
Hello @honnibal, it is composed of 28 documents. The largest has 72k tokens, and the average is 12k tokens per document. I used batches of up to 64k tokens. Regular NER: Beam NER (beam size of 3): For comparison
Thanks for the details! It's a shame about the speed. If you want to make it a bit faster (possibly at the expense of accuracy), you could try
Still testing. Okay, this should be mergeable now. The config usage is like this:
The affordances for getting probabilities out of the beam aren't really there at the moment; I want to build them in a second PR.
The PR has a lot of incidental changes to the parser, and requires models to be retrained. The incidental changes are introduced due to problems with the transition system and state class that made the beam parsing slower and ineffective. Specifically, the parser relied on this mechanism where we would "fast forward" through states that had only one valid action. This fast-forwarding isn't correct for the beam objective, since states still need to be scored under the global model, even if there's only one next action.
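To make the fast-forwarding point concrete, here is a toy sketch in plain Python. It is not spaCy's Cython implementation: `Hypothesis`, `beam_step`, and the hard-coded `VALID` and `SCORES` tables are hypothetical names invented for this illustration. It shows how skipping the score of a forced action can change which hypothesis ranks first on the beam.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Hypothesis:
    """One beam entry: the action history and its cumulative (global) score."""
    actions: Tuple[str, ...]
    score: float


def beam_step(
    beam: Sequence[Hypothesis],
    valid_actions: Callable[[Tuple[str, ...]], Sequence[str]],
    score_action: Callable[[Tuple[str, ...], str], float],
    width: int,
    fast_forward: bool = False,
) -> List[Hypothesis]:
    """Expand every hypothesis by one action and keep the `width` best."""
    candidates = []
    for hyp in beam:
        actions = valid_actions(hyp.actions)
        for action in actions:
            if fast_forward and len(actions) == 1:
                # Shortcut under discussion: a forced action is applied
                # without being scored, so it adds nothing to the total.
                delta = 0.0
            else:
                delta = score_action(hyp.actions, action)
            candidates.append(
                Hypothesis(hyp.actions + (action,), hyp.score + delta)
            )
    candidates.sort(key=lambda h: h.score, reverse=True)
    return candidates[:width]


# Toy model (made-up numbers): "A" has a single, poorly scored continuation.
VALID = {("A",): ["X"], ("B",): ["Y", "Z"]}
SCORES = {(("A",), "X"): -2.0, (("B",), "Y"): -0.5, (("B",), "Z"): -1.0}

beam = [Hypothesis(("A",), 1.0), Hypothesis(("B",), 0.5)]
lookup_score = lambda history, action: SCORES[(history, action)]

scored = beam_step(beam, VALID.__getitem__, lookup_score, width=2)
skipped = beam_step(beam, VALID.__getitem__, lookup_score, width=2,
                    fast_forward=True)

print("scored: ", [h.actions for h in scored])
print("skipped:", [h.actions for h in skipped])
```

With every action scored, the forced `X` step drags hypothesis `A` off a beam of width 2. With the shortcut, `A` keeps its stale score and stays on top, which is the kind of mismatch with the global objective described above.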
I've also cleaned up the definition of the `Break` transition to be simpler and more consistent. It now inserts a sentence break beginning at `B[1]`, i.e. the first word of the buffer. Previously the break was inserted at the leftmost edge of `B[0]`. The new definition lets the parser see both the last word of the sentence and the first word of the next sentence in the state. It also reduces the interaction with the other actions, and makes it easier to respect preset sentence boundaries.
The `StateC` data structure has also been revised considerably, to reduce the expense of copy operations that made the beam slow on long inputs. We now don't copy the `TokenC*` array, and the parse is now quicker to copy, especially for states near the beginning.