Make the memory use of the pairwise linear model constant #45

Merged · 15 commits · Jun 26, 2023

Conversation

danieldk (Contributor)

The Thinc part of the pairwise bilinear model is fairly simple before this change: we would collect the splits from all documents and then pad them. However, this caused the model to run out of memory on large docs, since it had to compute many n*n matrices (all padded to the longest sequence length). It would also perform unnecessary computations on many padding time steps.

This change makes memory use independent of the document length (given a fixed split length) by doing the following (see the sketch after the list):

- Get all splits and flatten them into a list of split representations (`with_splits`).
- Batch the splits by their padded sizes. This ensures that memory use is constant when splits have a maximum size, and the buffering it permits gives more evenly sized batches (`with_minibatch_by_padded_size`).
- Pad the splits in each batch and pass them to the Torch model. Since the outputs of the Torch model are matrices, unpadding takes this into account (`with_pad_seq_unpad_matrix`).
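
Roughly, these layers might be wired together as follows. This is a minimal sketch: the combinator names come from this PR, but their import paths, exact signatures, the `size` budget, and the inner Torch module are assumptions for illustration.

```python
from thinc.api import PyTorchWrapper

# with_splits, with_minibatch_by_padded_size and with_pad_seq_unpad_matrix
# are the layers added in this PR; they are assumed to be importable from
# the package this PR modifies. The padded-size budget and the Torch
# module are placeholders.
def build_pairwise_scorer(torch_bilinear_module, padded_size_budget=4096):
    return with_splits(                     # flatten per-doc representations into splits
        with_minibatch_by_padded_size(      # batch splits by padded size
            with_pad_seq_unpad_matrix(      # pad inputs, unpad n*n score outputs
                PyTorchWrapper(torch_bilinear_module)
            ),
            size=padded_size_budget,        # assumed keyword for the size budget
        )
    )
```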

In contrast to most `with_*` layers, `with_splits` is not symmetric: it takes as input representations for each document (`List[Floats2d]`), but it outputs pairwise score matrices per split. The reason is that the dimensions of the score matrices differ per split, so we cannot concatenate them at the document level.
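
To make the asymmetry concrete, the shapes could be pictured roughly like this (a type sketch only, based on the description above; the function name is hypothetical):

```python
from typing import List
from thinc.types import Floats2d

def with_splits_forward(docs: List[Floats2d]) -> List[Floats2d]:
    """Hypothetical shape sketch: each input array is (n_tokens, width) for one
    document; each output array is a (split_len, split_len) pairwise score
    matrix for one split. Split lengths differ, so the per-split matrices
    cannot be concatenated back into a single array per document."""
    ...
```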

danieldk added the `enhancement` (New feature or request) label on Mar 30, 2023
danieldk (Contributor, Author)

Assigning to @shadeMe, since this looks somewhat similar to the data massaging that we have in curated transformers.

danieldk closed this on Apr 11, 2023
danieldk reopened this on Apr 11, 2023
shadeMe (Contributor) left a comment

LGTM! Just a couple of minor typos; can be merged once they're fixed.

Daniël de Kok and others added 2 commits on June 26, 2023 at 11:15
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
shadeMe merged commit a139a5e into explosion:v4 on Jun 26, 2023