In my work I am using Sentence Transformers' pretrained "sentence-transformers/all-MiniLM-L6-v2" as the language model. When encoding 8M documents in a GPU environment, the process OOM'ed at around 1M documents. I used memray to profile the memory usage and found what looks like a memory leak in Sentence Transformers' _model.encode function (CPU memory grows over time in four distinct steps, matching the four batches I ran).
Note that memray only tracks CPU memory usage, and there was no such growth pattern when I encoded the same dataset in a CPU-only environment.
I profiled the other parts of the code and ruled out all other possibilities that could cause a memory leak (e.g., lingering references keeping objects alive). I also searched the existing issues and found nothing relevant. Since Sentence Transformers is used by lots of people in GPU environments, I wonder whether the memory leak in self._model.encode is real. If so, is there any way to address it, either by fixing the library code or by changing the way I call it?
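For reference, here is roughly how I am calling it (a simplified sketch: the placeholder corpus, chunk size, and batch size below are only illustrative, and the real embeddings are written out per chunk rather than kept around):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="cuda")

documents = ["some document text"] * 8_000_000  # placeholder for the real corpus

chunk_size = 2_000_000  # 4 chunks over the 8M documents
for start in range(0, len(documents), chunk_size):
    chunk = documents[start:start + chunk_size]
    # CPU RSS keeps climbing across these calls in the GPU environment
    embeddings = model.encode(chunk, batch_size=64, show_progress_bar=True)
    # ... persist the embeddings for this chunk, then drop the reference ...
    del embeddings
```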
Super curious and confused. Thanks for any help in advance.
Found the encode_multi_process function, and it looks like any reasonably large dataset (e.g., >10k documents) should be encoded with this function instead.
The documentation is not very helpful here: encode says nothing about this performance issue or any data size limit, and encode_multi_process only briefly says to use it "when the dataset is large", without explaining the difference between the two functions. I ran some experiments myself, and the original encode function does appear to leak CPU memory in the GPU environment; in a CPU-only environment it is also noticeably slower.
Performance per 10K-document batch, encode vs encode_multi_process:
- CPU only: encoding time 6 min vs 3 min; CPU memory 400 MB vs 200 MB
- GPU available: encoding time 30 s vs 15 s; CPU memory 1-2 GB (with the leak) vs 500 MB
So it looks like encode_multi_process is strictly the better choice for most use cases? If so, could each function's documentation mention the difference between the two so people are aware of it?
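For anyone else who hits this, the pattern I switched to looks roughly like the sketch below (the placeholder corpus and batch size are just illustrative):

```python
from sentence_transformers import SentenceTransformer

if __name__ == "__main__":
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    documents = ["some document text"] * 100_000  # placeholder for the real corpus

    # Starts one worker process per available GPU (or several CPU workers if no GPU is present)
    pool = model.start_multi_process_pool()
    try:
        # Splits the input into chunks and distributes them across the worker processes
        embeddings = model.encode_multi_process(documents, pool, batch_size=64)
    finally:
        model.stop_multi_process_pool(pool)
```

Since it spawns worker processes, it needs to run under the if __name__ == "__main__": guard.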