Tags: kwen2501/pytorch
Update on "[quant] Add ConvTranspose reference module - Reland pytorc… …h#73031" Summary: Add ConvTranspose reference module Test Plan: python3 test/test_quantization.py TestQuantizeEagerOps.test_conv_transpose_2d Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D34352228](https://our.internmc.facebook.com/intern/diff/D34352228) [ghstack-poisoned]
[FSDP][Reland] Implement local_state_dict and load_local_state_dict

1. Implement the framework to allow the user to choose among `state_dict`, `local_state_dict`, and `sharded_state_dict`.
2. Implement ShardedTensor-compatible `local_state_dict()` and `load_local_state_dict()` (a usage sketch follows after this message).

Differential Revision: [D34383925](https://our.internmc.facebook.com/intern/diff/D34383925/)

[ghstack-poisoned]
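A minimal sketch of checkpointing an FSDP-wrapped module with the local flavor, assuming an initialized process group and the `StateDictType` selection API this stack introduces; `checkpoint_local`, `restore_local`, and the paths are hypothetical helpers.

```
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import StateDictType

def checkpoint_local(model: FSDP, path: str) -> None:
    # LOCAL_STATE_DICT keeps each rank's flat, sharded parameters
    # (as ShardedTensors), so nothing is gathered across ranks.
    with FSDP.state_dict_type(model, StateDictType.LOCAL_STATE_DICT):
        torch.save(model.state_dict(), path)

def restore_local(model: FSDP, path: str) -> None:
    with FSDP.state_dict_type(model, StateDictType.LOCAL_STATE_DICT):
        model.load_state_dict(torch.load(path))
```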
add BFloat16 sparse operators on CPU: sparse_mask, add_out, addmm
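A small sketch exercising the newly supported BFloat16 sparse CPU kernels named above; shapes and values are illustrative assumptions.

```
import torch

dense = torch.randn(4, 4, dtype=torch.bfloat16)
sparse = torch.randn(4, 4, dtype=torch.bfloat16).to_sparse()
mat = torch.randn(4, 4, dtype=torch.bfloat16)

masked = dense.sparse_mask(sparse)     # sparse_mask in bfloat16
summed = torch.add(dense, sparse)      # add (add_out) with a sparse operand
out = torch.addmm(dense, sparse, mat)  # sparse-dense addmm
print(masked.dtype, summed.dtype, out.dtype)
```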
[torch] do not fold bmm into mm when tensor1 dim==3 but not contiguous (pytorch#73115)

Summary:
Pull Request resolved: pytorch#73115

matmul for [B, M, K] x [K, N] was mapped to mm by folding the first two dims of tensor1 into [BxM, K] x [K, N], but when M and K are transposed it is better to use bmm to avoid data movement. We could generalize the condition under which we don't fold (see the comment in the code for more details), but we are being conservative here to avoid potential unintended regressions.

Test Plan: In the following simple test case, before this diff:

0.00652953577041626 0.003044447898864746

The permutation takes about the same time as the GEMM itself. After this diff:

0.002983328104019165 0.0030336639881134034

The permutation overhead essentially goes away.

```
import torch
import torch.nn.functional as F

B = 128
M = 1024
N = 128
K = 1024
X = torch.rand(B, K, M).cuda()
b = torch.rand(N).cuda()
W = torch.rand(N, K).cuda()
X = X.permute(0, 2, 1)

Y = F.linear(X, W, b)
X_contiguous = X.contiguous()
Y_ref = F.linear(X_contiguous, W, b)
torch.testing.assert_close(Y, Y_ref)

# benchmark_torch_function is a local timing helper (not shown here) that
# returns the measured runtime of the call.
t1, _ = benchmark_torch_function(F.linear, X, W, b, 0)
t2, _ = benchmark_torch_function(F.linear, X_contiguous, W, b, 0)
print(t1, t2)
```

Differential Revision: D34350990

fbshipit-source-id: f3dc761e3766e0f6f78b61eddc6d3a38f1d8a6d7
Update on "[PyTorch] Extend flatbuffer to support extra files map" Extend flatbuffer to support extra files map Flatbuffer schema has extra files. The users can write extra files by providing a `map<string, string>` which will be part of the flatbuffer model asset and and can be loaded back similar to pickle. Differential Revision: [D34286346](https://our.internmc.facebook.com/intern/diff/D34286346/) [ghstack-poisoned]
Update on "Check if the iterator is valid before dereferencing it" Fixes pytorch#71674. This shouldn't segfault now: ``` import torch d = torch.complex64 torch.set_default_dtype(d) ``` [ghstack-poisoned]
stop sccache server after building (pytorch#72794) (pytorch#73122)

Summary: This avoids a failure where the directory in which sccache is installed could not be deleted.

Pull Request resolved: pytorch#72794
Reviewed By: H-Huang
Differential Revision: D34222877
Pulled By: janeyx99
fbshipit-source-id: 2765d6f49b375d15598586ed83ae4c5e667e7226
(cherry picked from commit 551e21c)
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
Update on "Add BUILD_LAZY_CUDA_LINALG option" When enable, it will generate `torch_cuda_linalg` library, which would depend on cusolve and magma and registers dynamic bindings to it from LinearAlgebraStubs Differential Revision: [D33992795](https://our.internmc.facebook.com/intern/diff/D33992795) [ghstack-poisoned]