Tags: kwen2501/pytorch

ciflow/macos/73094

Update on "[quant] Add ConvTranspose reference module - Reland pytorch#73031"

Summary:
Add ConvTranspose reference module

Test Plan:
python3 test/test_quantization.py TestQuantizeEagerOps.test_conv_transpose_2d
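
For orientation, a hedged sketch of the eager-mode flow that test exercises; whether `convert` swaps in the new reference module depends on this PR's mapping, and the per-tensor `default_qconfig` is used here because per-channel weight observers are not supported for ConvTranspose:

```
import torch

m = torch.nn.Sequential(
    torch.quantization.QuantStub(),
    torch.nn.ConvTranspose2d(3, 8, kernel_size=3),
    torch.quantization.DeQuantStub(),
)
m.qconfig = torch.quantization.default_qconfig  # per-tensor observers
torch.quantization.prepare(m, inplace=True)
m(torch.randn(1, 3, 16, 16))  # calibration pass
torch.quantization.convert(m, inplace=True)
print(type(m[1]))  # a quantized ConvTranspose2d after convert
```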

Differential Revision: [D34352228](https://our.internmc.facebook.com/intern/diff/D34352228)

[ghstack-poisoned]

ciflow/libtorch/73011

Test Test test

ciflow/binaries/73011

Test Test test

ciflow/all/73203

[FSDP][Reland] Implement local_state_dict and load_local_state_dict

1. Implement the framework that lets users choose among `state_dict`, `local_state_dict`, and `sharded_state_dict` (see the sketch below).
2. Implement ShardedTensor-compatible `local_state_dict()` and `load_local_state_dict()`.
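
A minimal sketch of how the selection framework surfaces to users, assuming the `StateDictType` enum and `FSDP.state_dict_type` context manager as they appear in released FSDP; exact names at the time of this PR may differ:

```
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import StateDictType

# Assumes an initialized process group and a CUDA device.
model = FSDP(torch.nn.Linear(8, 8).cuda())

with FSDP.state_dict_type(model, StateDictType.LOCAL_STATE_DICT):
    local_sd = model.state_dict()    # values come back as ShardedTensors
    model.load_state_dict(local_sd)  # loads the local shards back
```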

Differential Revision: [D34383925](https://our.internmc.facebook.com/intern/diff/D34383925/)

[ghstack-poisoned]

ciflow/all/73202

add BFloat16 sparse operators on CPU: sparse_mask, add_out, addmm
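
A hedged smoke-test sketch of the three ops in bfloat16 on CPU, assuming they accept the dtype after this change:

```
import torch

# Indices/values for a 2x2 sparse COO tensor.
i = torch.tensor([[0, 1], [1, 0]])
v = torch.tensor([1.0, 2.0], dtype=torch.bfloat16)
s = torch.sparse_coo_tensor(i, v, (2, 2))
d = torch.randn(2, 2, dtype=torch.bfloat16)

masked = d.sparse_mask(s)          # sparse_mask
added = torch.add(d, s)            # exercises the add_out path
out = torch.sparse.addmm(d, s, d)  # addmm with a sparse mat1
print(masked, added, out, sep="\n")
```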

ciflow/all/73115

[torch] do not fold bmm into mm when tensor1 dim==3 but not contiguous (pytorch#73115)

Summary:
Pull Request resolved: pytorch#73115

matmul for [B, M, K] x [K, N] was mapped to mm by folding the first two dims of tensor1, i.e. computing [BxM, K] x [K, N]; but when M and K are transposed in memory, it is better to use bmm and avoid the data movement of making tensor1 contiguous.

We could generalize the condition under which we don't fold (see the comment for more details), but we are being conservative here to be cautious about potential unintended regressions.

Test Plan:
In the simple test case below, before this diff:

0.00652953577041626 0.003044447898864746

The permutation takes about as long as the GEMM itself. After this diff:

0.002983328104019165 0.0030336639881134034

The permutation overhead essentially goes away.

```
import torch
import torch.nn.functional as F

# Stand-in for the benchmark helper used in the original test plan; the
# trailing argument of each call below is assumed to be a cache-flush size
# and is ignored here. Returns (seconds per call, result).
def benchmark_torch_function(f, *args, iters=100):
    *fn_args, _flush_mb = args
    result = f(*fn_args)  # warm-up
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        result = f(*fn_args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters / 1e3, result

B = 128
M = 1024
N = 128
K = 1024

# X is logically [B, M, K] after the permute, but M and K are swapped in
# memory, so it is non-contiguous.
X = torch.rand(B, K, M).cuda()
b = torch.rand(N).cuda()
W = torch.rand(N, K).cuda()
X = X.permute(0, 2, 1)
Y = F.linear(X, W, b)

X_contiguous = X.contiguous()
Y_ref = F.linear(X_contiguous, W, b)

torch.testing.assert_close(Y, Y_ref)

t1, _ = benchmark_torch_function(F.linear, X, W, b, 0)
t2, _ = benchmark_torch_function(F.linear, X_contiguous, W, b, 0)

print(t1, t2)
```

Differential Revision: D34350990

fbshipit-source-id: f3dc761e3766e0f6f78b61eddc6d3a38f1d8a6d7

ciflow/all/72951

Update on "[PyTorch] Extend flatbuffer to support extra files map"

Extend flatbuffer to support extra files map

The flatbuffer schema now carries extra files: users can write them by providing a `map<string, string>`, which becomes part of the flatbuffer model asset and can be loaded back, just as with the pickle format.
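
A sketch of the extra-files round trip as exposed on the pickle-based TorchScript path via `_extra_files`, which (per the summary above) the flatbuffer asset mirrors:

```
import torch

m = torch.jit.script(torch.nn.Linear(2, 2))
torch.jit.save(m, "model.pt", _extra_files={"metadata.json": '{"v": 1}'})

files = {"metadata.json": ""}
torch.jit.load("model.pt", _extra_files=files)
print(files["metadata.json"])  # contents are filled in on load
```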

Differential Revision: [D34286346](https://our.internmc.facebook.com/intern/diff/D34286346/)

[ghstack-poisoned]

ciflow/all/72405

Update on "Check if the iterator is valid before dereferencing it"

Fixes pytorch#71674.

This shouldn't segfault now:

```
import torch
d = torch.complex64
torch.set_default_dtype(d)
```

[ghstack-poisoned]

v1.11.0-rc3

stop sccache server after building (pytorch#72794) (pytorch#73122)

Summary:
This is to avoid a situation where the directory in which sccache is installed cannot be deleted.
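
Presumably the CI script now runs `sccache --stop-server` once the build finishes, so the server process no longer holds files open under its install directory.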

Pull Request resolved: pytorch#72794

Reviewed By: H-Huang

Differential Revision: D34222877

Pulled By: janeyx99

fbshipit-source-id: 2765d6f49b375d15598586ed83ae4c5e667e7226
(cherry picked from commit 551e21c)

Co-authored-by: Yi Zhang <zhanyi@microsoft.com>

ciflow/binaries/72306

Update on "Add BUILD_LAZY_CUDA_LINALG option"

When enabled, it generates the `torch_cuda_linalg` library, which depends on cuSOLVER and MAGMA and registers dynamic bindings to them from LinearAlgebraStubs.
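
Presumably the option is toggled like other PyTorch build flags, e.g. `BUILD_LAZY_CUDA_LINALG=1 python setup.py develop`; the flag name is taken from the title and the exact wiring lives in the PR.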

Differential Revision: [D33992795](https://our.internmc.facebook.com/intern/diff/D33992795)

[ghstack-poisoned]