Memory increment and release when loading model via PretrainedModel.from_pretrained #18782
Comments
@ydshieh has been looking into memory leaks as well recently and might have some insights for you!
Hi @tobyych, could you also try to add an explicit `gc.collect()` after deleting the model, and check the memory again?
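A minimal sketch of this suggestion, assuming it amounts to forcing a garbage collection right after the model reference is dropped:

```python
import gc

from transformers import AutoModelForMaskedLM


def hf_load():
    # Load the model, drop the only reference to it, then force a
    # garbage collection so unreachable objects are reclaimed right away.
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
    del model
    gc.collect()
```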
Hi @ydshieh, I tried adding the explicit `gc.collect()`.

For a single `hf_load` call: (memory profile omitted)

For multiple `hf_load` calls: (memory profile omitted)

With the explicit garbage collection, the single-call and multiple-call behaviour changed as shown in the attached profiles. Any clue what wasn't collected by the GC previously?
Hi @tobyych, glad to know that helps. I don't know what wasn't collected by the GC previously; in general, (I believe) it's not easy to know exactly what is still being kept alive. I will try to check this on my side.
@ydshieh, I tried to inspect the objects tracked by the GC between `del model` and `gc.collect()`:
import gc
import sys
from collections import defaultdict

from transformers import AutoModelForMaskedLM


def hf_load():
    # bert-base-uncased: 421MB on disk
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
    del model
    # Group the sizes of all GC-tracked objects by module, to see which
    # transformers modules still hold objects after `del model`.
    d = defaultdict(int)
    for o in gc.get_objects():
        try:
            d[o.__module__] += sys.getsizeof(o)
        except Exception:
            d["others"] += sys.getsizeof(o)
    for k, v in d.items():
        if type(k) is str and k.startswith("transformers"):
            print(k, v)
    gc.collect()
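Not something mentioned in the thread, but the standard-library `tracemalloc` module is another way to attribute Python-level allocations that survive the `del`/collect cycle. A minimal sketch; the snapshot comparison below is an illustrative assumption, and it will not account for tensor storage allocated outside the Python allocator:

```python
import gc
import tracemalloc

from transformers import AutoModelForMaskedLM

tracemalloc.start()
before = tracemalloc.take_snapshot()

model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
del model
gc.collect()

after = tracemalloc.take_snapshot()
# Print the source lines that still account for the most Python-level
# memory after the model has been deleted and collected.
for stat in after.compare_to(before, "lineno")[:10]:
    print(stat)
```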
Thanks @ydshieh!
Issue
I was trying to understand the memory usage when loading a Hugging Face model. I found that when loading the model via `AutoModelForMaskedLM.from_pretrained("bert-base-uncased")`, the resulting increment in memory was (1) larger than the cached BERT model on disk (859MB vs. 421MB) and (2) not fully released when the variable was deleted. On the other hand, if I just do `torch.load("[path to cached model]")`, the memory allocation and release match and the amount is very close to the size on disk. May I know why there is such a difference in behaviour?

Code to reproduce the issue
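A minimal sketch of what the reproduction could look like, assuming the line-by-line profiles below were produced with the `memory_profiler` package; the `@profile` decorator usage is an assumption, not confirmed by the issue:

```python
# Hypothetical reconstruction of the reproduction script described above;
# the use of memory_profiler is an assumption.
import torch
from memory_profiler import profile
from transformers import AutoModelForMaskedLM


@profile
def hf_load():
    # bert-base-uncased: 421MB on disk
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
    del model


@profile
def direct_load():
    # The cached checkpoint path is elided in the original issue.
    state_dict = torch.load("[path to cached model]")
    del state_dict


if __name__ == "__main__":
    hf_load()
    direct_load()
```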
Profile

`hf_load`: (memory profile output omitted)

`direct_load`: (memory profile output omitted)

To supplement, I also observed that when running `hf_load` above multiple times, the memory usage pattern was not obvious: it increased during the first two runs, but did not keep increasing from the third run onwards. I wonder how this could be explained.
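A sketch of one way to check this repeated-load behaviour, tracking the process RSS with `psutil` after each load; this helper is illustrative, not the script used in the issue:

```python
import gc
import os

import psutil
from transformers import AutoModelForMaskedLM


def rss_mb() -> float:
    # Resident set size of the current process, in MB.
    return psutil.Process(os.getpid()).memory_info().rss / 1024**2


for i in range(5):
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
    del model
    gc.collect()
    print(f"after load {i + 1}: {rss_mb():.0f} MB")
```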
P.S. I also attached the profile for the `direct_load` case above; no increment was observed.

Supplementary information
OS: 5.10.60.1-microsoft-standard-WSL2, 4.15.0-1113-azure #126~16.04.1-Ubuntu
Python: 3.8.12
PyTorch: 1.11.0
Transformers: 4.21.2
@LysandreJik