
bug? cpu memory leak in model.encode when training in gpu #3138

Open
JazJaz426 opened this issue Dec 18, 2024 · 1 comment
JazJaz426 commented Dec 18, 2024

Hi,

In my work I was using Sentence Transformers' pretrained "sentence-transformers/all-MiniLM-L6-v2" as the language model. When encoding 8M documents in a GPU environment, it OOM'ed at around the 1M mark. I used memray to profile memory usage and found what looks like a memory leak in Sentence Transformers' model.encode function (see the growing CPU memory usage over time, with 4 distinct steps corresponding to the 4 batches I ran).

self._model.encode(
    text,
    batch_size=self._ENCODE_TEXT_BATCH_SIZE,
    device=self._model.device,
    normalize_embeddings=True,
)
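For context, a minimal sketch that approximates my setup (the corpus, chunk size, and batch size here are stand-ins for my real values):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Dummy corpus standing in for my real documents (~8M in practice).
corpus = [f"document number {i}" for i in range(40_000)]
chunk_size = 10_000

for start in range(0, len(corpus), chunk_size):
    embeddings = model.encode(
        corpus[start:start + chunk_size],
        batch_size=32,  # stand-in for self._ENCODE_TEXT_BATCH_SIZE
        device=model.device,
        normalize_embeddings=True,
    )
    del embeddings  # nothing is kept across iterations, yet CPU RSS keeps growing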

[Screenshot 2024-12-17, 2:09 pm: memray profile showing CPU memory growing in steps across the 4 batches (GPU run)]

Note that memray only tracks CPU memory usage, and there was no such growth pattern when I encoded the same dataset in a CPU-only environment.

[Screenshot 2024-12-17, 2:07 pm: memray profile with flat CPU memory usage (CPU-only run)]
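For anyone who wants to reproduce the profiles, a rough sketch of how memray's Tracker API can wrap the encoding (the output filename is arbitrary; the resulting trace can be rendered with memray's flamegraph reporter):

import memray
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
sentences = [f"document number {i}" for i in range(10_000)]

# Writes an allocation trace of the CPU-side memory used while encoding.
with memray.Tracker("encode_profile.bin"):
    model.encode(sentences, batch_size=32, normalize_embeddings=True)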

I profiled other parts of the code and ruled out the other possibilities (e.g., lingering references that were never freed) that could cause a memory leak. I also searched existing issues and found nothing relevant. Since Sentence Transformers is used by many people in GPU environments, I wonder whether the memory leak in self._model.encode is real. If so, is there any way to address it, either by fixing the library code or by changing the way I call it?

Super curious and confused. Thanks for any help in advance.

JazJaz426 (Author) commented Dec 18, 2024

Found the encode_multi_process function, and it looks like any reasonably large dataset (e.g., >10k documents) should be encoded with it instead.

The documentation is not very helpful here: encode has no documentation of this performance issue or any data-size limit, and encode_multi_process only mentions briefly that it should be used when the dataset is large, without explaining the difference between encode and encode_multi_process. I ran some experiments myself, and it looks like the original encode function leaks CPU memory in the GPU environment; in a CPU-only environment it is also less performant.

Performance per 10K batch, encode vs encode_multi_process:

CPU only:
- encoding time: 6 min vs 3 min
- CPU memory: 400 MB vs 200 MB

GPU available:
- encoding time: 30 s vs 15 s
- CPU memory: 1-2 GB (with memory leak) vs 500 MB

So it looks like encode_multi_process is strictly the better choice for most use cases? If so, could each function's documentation mention the difference between them, so people are aware of it?
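For reference, a rough sketch of the encode_multi_process path I compared against (the batch size is just an example; the pool defaults to all available GPUs, or CPU workers otherwise):

from sentence_transformers import SentenceTransformer

def main():
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    sentences = [f"document number {i}" for i in range(10_000)]

    # Spawns one worker process per target device.
    pool = model.start_multi_process_pool()
    try:
        embeddings = model.encode_multi_process(sentences, pool, batch_size=32)
    finally:
        model.stop_multi_process_pool(pool)
    print(embeddings.shape)

if __name__ == "__main__":
    # Guard is needed because encode_multi_process spawns worker processes.
    main()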
