Native coref component #7264
Conversation
* initial coref_er pipe
* matcher more flexible
* base coref component without actual model
* initial setup of coref_er.score
* rename to include_label
* preliminary score_clusters method
* apply scoring in coref component
* IO fix
* return None loss for now
* rename to CoreferenceResolver
* some preliminary unit tests
* use registry as callable
Status March 1:
While all of this is mostly a dummy framework, it has already helped uncover some bugs and missing functionality, cf. PRs #7197, #7209 and #7225. Going forward, this bare framework should make it easier for different people to work on the functionality in parallel, filling in different parts... TODO
Open questions / current issues
Just saying that I hope that the state of the art ... Anyway, this is a very welcome improvement that I'm looking forward to :)
This includes the coref code that was being tested separately, modified to work in spaCy. It hasn't been tested yet and presumably still needs fixes. In particular, the evaluation code is currently omitted. It's unclear at the moment whether we want to use a complex scorer similar to the official one, or a simpler scorer using more modern evaluation methods.
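For context, here is a minimal sketch of how predicted clusters could be attached to a Doc in spaCy using span groups. This is an illustration rather than the PR's actual code, and the "coref_clusters_*" key naming and the offset format are assumptions.

```python
# Illustrative sketch only, not the PR's code. Assumes each cluster is a list
# of (start_token, end_token) offsets; the "coref_clusters_*" key scheme is made up.
import spacy

nlp = spacy.blank("en")
doc = nlp("Sarah called her sister because she needed help.")

# Hypothetical model output: one cluster covering "Sarah", "her", "she".
predicted_clusters = [[(0, 1), (2, 3), (5, 6)]]

for i, cluster in enumerate(predicted_clusters, start=1):
    doc.spans[f"coref_clusters_{i}"] = [doc[start:end] for start, end in cluster]

for key, group in doc.spans.items():
    print(key, [span.text for span in group])
```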
Ended up not making a difference, but oh well.
When sentence boundaries are not available, just treat the whole doc as one sentence. This is a reasonable general fallback, and it matters in particular for the init call, where upstream components aren't run.
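A rough sketch of that fallback, assuming the component wants sentence spans as input; get_sentence_spans is an illustrative helper, not code from this PR.

```python
from typing import List

from spacy.tokens import Doc, Span


def get_sentence_spans(doc: Doc) -> List[Span]:
    """Illustrative helper: return sentence spans, falling back to the whole doc."""
    if doc.has_annotation("SENT_START"):
        return list(doc.sents)
    # No sentence boundaries set (e.g. during initialization, before upstream
    # components have run): treat the whole doc as a single sentence.
    return [doc[:]]
```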
Training seems to actually run now!
This makes their scope tighter and more contained, and has the nice side effect that fewer things need to be passed around for backprop.
This is closer to the traditional evaluation method. That uses an average of three scores, this is just using the bcubed metric for now (nothing special about bcubed, just picked one). The scoring implementation comes from the coval project. It relies on scipy, which is one issue, and is rather involved, which is another. Besides being comparable with traditional evaluations, this scoring is relatively fast.
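For reference, a self-contained sketch of the B-cubed metric itself (not the coval implementation mentioned above), with clusters given as sets of hashable mention identifiers:

```python
from typing import Iterable, Set, Tuple


def b_cubed(gold: Iterable[Set], pred: Iterable[Set]) -> Tuple[float, float, float]:
    """Compute B-cubed precision, recall and F1 over mention clusters."""
    gold, pred = list(gold), list(pred)

    def score(source, target):
        # Average, over mentions in the source clusters, of the overlap between
        # the mention's source cluster and its target cluster.
        total, n = 0.0, 0
        for s_cluster in source:
            for mention in s_cluster:
                t_cluster = next((t for t in target if mention in t), set())
                total += len(s_cluster & t_cluster) / len(s_cluster)
                n += 1
        return total / n if n else 0.0

    precision = score(pred, gold)
    recall = score(gold, pred)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Example: one gold cluster split into two predicted clusters.
gold = [{"m1", "m2", "m3"}]
pred = [{"m1", "m2"}, {"m3"}]
print(b_cubed(gold, pred))  # precision 1.0, recall ~0.56, F1 ~0.71
```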
The intent of this was that it would be a pipeline component that used entities as input, but that's now covered by the get_mentions function as a pipeline arg.
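As an illustration of that idea, a mention-getter that proposes the Doc's entities as candidate mentions might look roughly like this; the registry name is hypothetical and the actual config wiring used by the component isn't shown in this thread.

```python
from typing import List

import spacy
from spacy.tokens import Doc, Span


@spacy.registry.misc("ent_mentions.v1")  # hypothetical registry name
def create_ent_mentions():
    def ent_mentions(doc: Doc) -> List[Span]:
        # Propose the existing named entities as candidate coref mentions.
        return list(doc.ents)

    return ent_mentions
```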
Update Coref Docs
Fix tokenization mismatch handling in coref
This was changed by merge
There's no guarantee about the order in which SpanGroup keys will come out, so access them in sorted order when doing comparisons.
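In other words, comparisons iterate over sorted keys instead of relying on insertion order, along the lines of the sketch below; predicted_doc and gold_doc are placeholders.

```python
from spacy.tokens import Doc


def clusters_as_offsets(doc: Doc):
    # Read span groups in sorted key order so two Docs can be compared
    # regardless of the order in which the groups were added.
    return [
        [(span.start, span.end) for span in doc.spans[key]]
        for key in sorted(doc.spans)
    ]

# assert clusters_as_offsets(predicted_doc) == clusters_as_offsets(gold_doc)
```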
This was only necessary while the tok2vec_size option was still required.
Dimension inference in Coref
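As a generic illustration of the kind of dimension inference meant here (plain Thinc, not the coref model): leaving an input width unset and letting initialize() fill it in from sample data is what makes an explicit width setting like tok2vec_size unnecessary.

```python
import numpy
from thinc.api import Linear

model = Linear(nO=2)          # input width nI left unset
X = numpy.zeros((4, 96), dtype="f")
Y = numpy.zeros((4, 2), dtype="f")
model.initialize(X=X, Y=Y)    # nI is inferred as 96 from the sample batch
print(model.get_dim("nI"))    # -> 96
```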
This was probably used in the prototyping stage, left as a reference, and then forgotten. Nothing uses it any more.
@explosion-bot please test_gpu
URL: https://buildkite.com/explosion-ai/spacy-gpu-test-suite/builds/100
Closing this PR, as we'll release the functionality in ... The docs PR is here: #11291
Just wanted to send a quick update about ...
We'd love for you to try this out, and any feedback is very welcome over at the discussion forum!
Work-in-progress
Description
Creating a native coref component in spaCy
Types of change
new feature
Checklist