In my work I am using Sentence Transformers' pretrained "sentence-transformers/all-MiniLM-L6-v2" as the language model. When encoding 8M documents in a GPU environment, the process OOM'ed at around 1M documents. I used memray to profile the memory usage and found what looks like a memory leak in Sentence Transformers' _model.encode function (CPU memory grows over time in four distinct steps, matching the four batches I ran).
Note that memray only tracks CPU memory usage, and there was no such growth pattern when I encoded the same dataset in a CPU-only environment.
I profiled the other parts of the code and ruled out all other possibilities that could cause a memory leak (e.g., lingering references keeping objects alive). I also searched the existing issues and found nothing relevant. Since Sentence Transformers is used by lots of people in GPU environments, I wonder whether the memory leak in self._model.encode is real. If so, is there any way to address it, either by fixing the library code or by changing the way I call it?
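For reference, here is roughly how I am calling it (a simplified sketch: the placeholder corpus, chunk size, and batch size below are only illustrative, and the real embeddings are written out per chunk rather than kept around):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="cuda")

documents = ["some document text"] * 8_000_000  # placeholder for the real corpus

chunk_size = 2_000_000  # 4 chunks over the 8M documents
for start in range(0, len(documents), chunk_size):
    chunk = documents[start:start + chunk_size]
    # CPU RSS keeps climbing across these calls in the GPU environment
    embeddings = model.encode(chunk, batch_size=64, show_progress_bar=True)
    # ... persist the embeddings for this chunk, then drop the reference ...
    del embeddings
```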
Super curious and confused. Thanks for any help in advance.
Found the encode_multi_process function, and it looks like any reasonably large dataset (e.g., >10k documents) should be encoded with this function instead.
The documentation is not very helpful here: encode says nothing about this performance issue or any data size limit, and encode_multi_process only briefly says to use it "when the dataset is large", without explaining the difference between the two functions. I ran some experiments myself, and the original encode function does appear to leak CPU memory in the GPU environment; in a CPU-only environment it is also noticeably slower.
Performance per 10K-document batch, encode vs encode_multi_process:
- CPU only: encoding time 6 min vs 3 min; CPU memory 400 MB vs 200 MB
- GPU available: encoding time 30 s vs 15 s; CPU memory 1-2 GB (with the leak) vs 500 MB
So it looks like encode_multi_process is strictly the better choice for most use cases? If so, could each function's documentation mention the difference between the two so people are aware of it?
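For anyone else who hits this, the pattern I switched to looks roughly like the sketch below (the placeholder corpus and batch size are just illustrative):

```python
from sentence_transformers import SentenceTransformer

if __name__ == "__main__":
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    documents = ["some document text"] * 100_000  # placeholder for the real corpus

    # Starts one worker process per available GPU (or several CPU workers if no GPU is present)
    pool = model.start_multi_process_pool()
    try:
        # Splits the input into chunks and distributes them across the worker processes
        embeddings = model.encode_multi_process(documents, pool, batch_size=64)
    finally:
        model.stop_multi_process_pool(pool)
```

Since it spawns worker processes, it needs to run under the if __name__ == "__main__": guard.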