torch.load planned default flip for weights_only #7799
Comments
@will-cromar any thoughts?
…eights_only": Tests on XLA shard not fixed yet, but there is an issue here: pytorch/xla#7799 (cc XilunWu, H-Huang, awgu, kwen2501, wanchaol, fegin, fduwjj, wz337, wconstab, d4l3k, c-p-i-o)
Tests on XLA shard not fixed yet, but there is an issue here: pytorch/xla#7799. Pull Request resolved: #127627. Approved by: https://github.com/albanD. ghstack dependencies: #132349
Gentle bump, @will-cromar what would be the best path forward here?
Sorry @mikaylagawarecki for the delay, I will take a look next Monday.
Discussed this with @JackCaoG. We actually recommend against saving XLA tensors directly; see `xla/torch_xla/core/xla_model.py`, lines 1272 to 1274 (at e657c87).
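A minimal sketch of the recommended pattern (moving tensors to CPU before saving, e.g. via the `xm.save` helper); this is an illustration, not the snippet embedded in the original comment, and the file name is a placeholder:

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = torch.nn.Linear(4, 4).to(device)

# Option 1: explicitly move everything to CPU before calling torch.save,
# so the checkpoint contains plain CPU tensors.
cpu_state = {k: v.cpu() for k, v in model.state_dict().items()}
torch.save(cpu_state, "ckpt.pt")

# Option 2: xm.save performs the device-to-CPU transfer for you before saving.
xm.save(model.state_dict(), "ckpt.pt")
```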
Is it possible to do the same thing in `torch.save`? I saw there are some other backends that use the same code path. Is there a plan to handle those? If you are planning to add some interface we can implement for custom serialization/deserialization logic, it should also be possible for us to save/load data in an XLA-friendly format (via …).
Hey @will-cromar, thanks for the response here! First of all, I am very glad to hear that XLA recommends saving CPU tensors 😄 That means that the weights_only flip will be very minimally BC-breaking for XLA.
It is definitely possible, but technically it is also a BC break within torch (separate from the weights_only BC break). I chatted with @JackCaoG a bit offline; sharing the summary of what we came to here:
For the (perhaps tiny) portion of users who save their tensors without moving to CPU, I am personally inclined to keep the "if saved on XLA, load on XLA" behavior (with the numpy dependency removed). But if you feel strongly that always forcing CPU within `torch.save` …
…n devices without numpy (#137444)

Related: pytorch/xla#7799 (comment). Follow-ups: do the same for maia and mtia.

## Motivation

With the move to `weights_only` by default, we are making an explicit decision not to allowlist GLOBALs required to deserialize `numpy` tensors by default. The implication is that backends relying on numpy for serialization will fail loudly when `torch.load` flips `weights_only`. However, we make the observation that this dependency on numpy was legacy and is not actually needed anymore. So we can remove it, which aligns with our weights_only strategy.

## Why is this ok?

The following comment on why numpy is necessary for serialization is legacy: https://github.com/pytorch/pytorch/blob/c87c9f0a01f4840bd19ac5058960c9766dd15ef8/torch/_tensor.py#L303-L312

We no longer do the following, though it was the case 5 years ago in the PR that added this:

> CPU storage is reconstructed with randomly initialized data, moved onto backend device, and then storage is updated to the serialized content

**Instead, what now happens is that CPU storage is constructed with data from the file and then moved onto the backend device.**

Old behavior (`legacy_load`): https://github.com/ailzhang/pytorch/blob/67adda891a839691790a0dcd99062430050eff3b/torch/serialization.py#L620

Pull Request resolved: #137444. Approved by: https://github.com/albanD
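As an illustration of the two load-time strategies described in that PR message, a rough sketch (not the actual torch.serialization internals; function names here are made up for the example):

```python
import torch

def legacy_load(file_bytes: bytearray, device: str) -> torch.Tensor:
    # Old behavior (paraphrased): a tensor is created with uninitialized data,
    # moved onto the backend device, and only then filled with the serialized bytes.
    t = torch.empty(len(file_bytes), dtype=torch.uint8).to(device)
    t.copy_(torch.frombuffer(file_bytes, dtype=torch.uint8))
    return t

def current_load(file_bytes: bytearray, device: str) -> torch.Tensor:
    # Current behavior: the CPU tensor is built directly from the file's bytes,
    # then moved onto the backend device in a single step.
    return torch.frombuffer(file_bytes, dtype=torch.uint8).to(device)

# Usage sketch: on a machine without an accelerator, "cpu" stands in for the device.
data = bytearray(b"\x01\x02\x03\x04")
print(legacy_load(data, "cpu"), current_load(data, "cpu"))
```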
Hey @will-cromar, FYI it seems like there is a test here that tests saving of tensors on an XLA device: https://github.com/pytorch/xla/blob/master/test/test_operations.py#L1460. I removed the numpy dependency, but when I flip the `weights_only` default, this test fails.
Could you look into this / allowlist the necessary globals in the xla library please? You can do this via `torch.serialization.add_safe_globals`.
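For illustration only, a rough sketch of what such allowlisting could look like on the torch_xla side. The exact set of numpy globals required is an assumption (it depends on how the checkpoint was produced), so this is a guess, not the fix that actually landed:

```python
# Hypothetical sketch: register the numpy globals that the legacy numpy-based
# serialization path pulls into checkpoints, so torch.load(weights_only=True)
# can still unpickle old XLA checkpoints. The list below is a guess based on
# how numpy arrays are normally pickled; the real set may differ.
import numpy as np
import torch.serialization

torch.serialization.add_safe_globals([
    np.ndarray,                        # the array type itself
    np.dtype,                          # dtype objects stored alongside the data
    np.core.multiarray._reconstruct,   # numpy's array reconstruction helper
])
```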
ok I can repro, looking...
TL;DR
PyTorch is planning a BC-breaking change in `torch.load` to flip the default for `weights_only` from `None` (i.e. `False`) to `True` (a warning to this effect has been added in torch 2.4 :) ). This will break loading of tensors that were serialized while on an XLA device.
Context

Instead of using the default `Unpickler` provided by `pickle`, `torch.load(weights_only=True)` uses a custom Unpickler that restricts the allowed `GLOBAL`s in the checkpoint (classes and functions) to those here (those required to build state_dicts). The purpose of this is to address the risk of remote code execution when using `torch.load`.

Another feature is that users can allowlist certain globals using `add_safe_globals` (in torch 2.4) or the `safe_globals` context manager (in torch nightly); a simple example is sketched below.
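A minimal sketch of those allowlisting APIs, with the hypothetical `MyClass` standing in for whatever user-defined class the checkpoint contains and the file name as a placeholder:

```python
import torch

class MyClass:
    # A stand-in user-defined class that torch.load(weights_only=True)
    # would refuse to unpickle unless it is allowlisted.
    def __init__(self, value):
        self.value = value

torch.save(MyClass(torch.ones(2)), "ckpt.pt")

# torch 2.4+: allowlist the class for the whole process.
torch.serialization.add_safe_globals([MyClass])
obj = torch.load("ckpt.pt", weights_only=True)

# torch nightly (at the time of this issue): scoped allowlist via context manager.
with torch.serialization.safe_globals([MyClass]):
    obj = torch.load("ckpt.pt", weights_only=True)
```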
How this affects XLA

Notably, XLA uses a special path that uses numpy for serialization/deserialization, see here. However, we have made the decision not to include the numpy GLOBALs required for unpickling in the default list, as we do not control the codepaths numpy implements for pickling (see the relevant GLOBALs here). A rough sketch of this special path is given below.
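This sketch paraphrases the linked `_tensor.py` code; names and details may not match the current source exactly:

```python
import torch
import torch._utils

def _xla_tensor_reduce(tensor: torch.Tensor):
    # At save time the XLA tensor is copied to CPU and converted to a numpy array.
    # Pickling that numpy array is what drags numpy GLOBALs into the checkpoint;
    # at load time _rebuild_device_tensor_from_numpy turns the array back into a
    # CPU tensor and moves it onto the original device.
    return (
        torch._utils._rebuild_device_tensor_from_numpy,
        (tensor.cpu().numpy(), tensor.dtype, str(tensor.device), tensor.requires_grad),
    )
```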
Ask
Opening this issue to figure out the best way to move forward on the above and make the flip as smooth as possible!
Ideally, the path for serializing XLA tensors would be refactored to not use numpy, and we would definitely accept a PR that implements this!
Separately, for existing checkpoints, I imagine something will need to be done as well.
cc @JackCaoG