-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the memory use of the pairwise linear model constant #45
Merged
shadeMe
merged 15 commits into
explosion:v4
from
danieldk:feature/pairwise-bilinear-constant-mem
Jun 26, 2023
Merged
Make the memory use of the pairwise linear model constant #45
shadeMe
merged 15 commits into
explosion:v4
from
danieldk:feature/pairwise-bilinear-constant-mem
Jun 26, 2023
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The Thinc part of the pairwire bilinear model is fairly simple before this change: we would collect the splits from all documents and then pad them. However, this caused the model to run out of memory on large docs, since it has to compute many n*n matrices (all padded to the longest sequence length). It would also perform unnecessary computations on many padding time steps. This change make the memory use independent of the doc length (given a fixed split length) by doing the following: - Get all splits and flatten to a list of split representations. (`with_splits`) - Batch the splits by their padded sizes. This ensures that memory use is constant when splits have a maximum size. This also permits some buffering, so that we get more equisized batches. (`with_minibatch_by_padded_size`) - The splits in the batches are padded and passed to the Torch model. Since the outputs of the Torch model are matrices, we unpad taking this into account. (`with_pad_seq_unpad_matrix`) In contrast to most `with_*` layers, `with_splits` is not symmetric. It takes at its input representations for each document (`List[Floats2d]`), however it outputs pairwise score matrices per split. The reason is that since the dimensions of the score matrices differ per split, we cannot concatenate them at a document level.
Assigning to @shadeMe, since this looks somewhat similar to the data massaging that we have in curated transformers. |
shadeMe
reviewed
Jun 6, 2023
spacy_experimental/biaffine_parser/tests/test_with_pad_seq_unpad_matrix.py
Outdated
Show resolved
Hide resolved
spacy_experimental/biaffine_parser/tests/test_with_minibatch_by_padded_size.py
Outdated
Show resolved
Hide resolved
This needs to be fixed in the transition-based parser.
Use explicitly from numpy.
…near-constant-mem
…near-constant-mem
shadeMe
approved these changes
Jun 26, 2023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just a couple of minor typos; can be merged once they're fixed.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The Thinc part of the pairwire bilinear model is fairly simple before this change: we would collect the splits from all documents and then pad them. However, this caused the model to run out of memory on large docs, since it has to compute many n*n matrices (all padded to the longest sequence length). It would also perform unnecessary computations on many padding time steps.
This change make the memory use independent of the doc length (given a fixed split length) by doing the following:
with_splits
)with_minibatch_by_padded_size
)with_pad_seq_unpad_matrix
)In contrast to most
with_*
layers,with_splits
is not symmetric. It takes at its input representations for each document (List[Floats2d]
), however it outputs pairwise score matrices per split. The reason is that since the dimensions of the score matrices differ per split, we cannot concatenate them at a document level.