
[RFC] Lowering aten::_index_put_impl_ into tm_tensor.scatter op #616

Closed
erman-gurses opened this issue Feb 21, 2022 · 8 comments

erman-gurses commented Feb 21, 2022

(#614)

@pashu123 pashu123 added the help wanted Extra attention is needed label Feb 21, 2022
@erman-gurses erman-gurses changed the title Lowering _index_put_impl_ into tm_tensor.scatter op [RFC] Lowering _index_put_impl_ into tm_tensor.scatter op Feb 21, 2022
@erman-gurses erman-gurses changed the title [RFC] Lowering _index_put_impl_ into tm_tensor.scatter op [RFC] Lowering aten::_index_put_impl_ into tm_tensor.scatter op Feb 22, 2022
cathyzhyi commented Feb 23, 2022

This op is quite tricky. We need to transform the operands to meet the requirements of tm_tensor.scatter.

(assuming readers are familiar with both aten.index_put and tm_tensor.scatter)

For all indices tensors:

  1. broadcast to the same shape (call it indices_shape for now)
  2. flatten to a 1-d tensor
  3. unsqueeze to make the last dimension size one
  4. concatenate all the tensors together

For values operand:

  1. broadcast to values_shape. values_shape consists of two parts: the first part is the same as indices_shape; the second part is the input tensor's shape with the dimensions matching the given indices tensors removed.
  2. collapse the indices_shape part into a single dimension, as in the sketch below.
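
A minimal sketch of both transformations in eager PyTorch, assuming the indexed dimensions are the leading dimensions of the input (all names and shapes here are hypothetical, for illustration only):

import torch

input = torch.zeros(5, 6, 7)
idx0 = torch.tensor([0, 2])  # indexes dim 0
idx1 = torch.tensor([1])     # indexes dim 1
values = torch.ones(2, 7)

# Indices: broadcast to a common indices_shape, flatten to 1d, unsqueeze
# the last dimension to size one, then concatenate.
bcast = torch.broadcast_tensors(idx0, idx1)  # indices_shape: (2,)
scatter_indices = torch.cat(
    [t.reshape(-1).unsqueeze(-1) for t in bcast], dim=-1)  # (N, 2)

# Values: broadcast to indices_shape + non-indexed input dims, then
# collapse the indices_shape part into a single dimension.
rest = input.shape[2:]                # dims not covered by index tensors: (7,)
values_shape = bcast[0].shape + rest  # (2, 7)
scatter_updates = values.broadcast_to(values_shape).reshape(-1, *rest)  # (N, 7)

# scatter_indices (one coordinate row per update) and scatter_updates now
# have the layout tm_tensor.scatter expects.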

Some of the steps above are more convenient to decompose into other torch ops (like flatten to a 1-d tensor, unsqueeze, concatenate), while others are more convenient to convert directly to tm_tensor or linalg ops. The conversion should go into the ConvertTorchToTMTensor pass to be added by #574.

We might not be able to handle all the cases aten.index_put supports. For example, it allows a mix of None and tensors in the indices list, in any order. We might want to start with the no-None-indices case first.
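
For illustration, a None entry in the indices list corresponds to a full slice in Python advanced indexing (hypothetical shapes):

import torch

t = torch.zeros(3, 4, 5)
idx = torch.tensor([0, 2])
v = torch.ones(3, 2, 5)
# Slicing dim 0 in full and indexing dim 1 traces to aten.index_put
# with an indices list of [None, idx]:
t[:, idx] = v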

@silvasean

> We will need a new pass called something like DecomposeOrConvertTorchOp to convert Torch ops to a mix of torch and aten ops.

Do you mean a mix of torch and TMTensor ops? I think this would naturally fall within the scope of a ConvertTorchToTMTensor pass.

@cathyzhyi

Sorry, I meant a mix of torch and linalg/tmtensor ops.

> ConvertTorchToTMTensor

We have another pass called lowerAlgorithmicOps (#574) which lowers algorithmic torch ops to linalg and tm_tensor ops. I feel that calling this new pass ConvertTorchToTMTensor would cause confusion, as it is not explicit that the torch ops can also be decomposed into other Torch ops.

@silvasean

I left a comment in #574. I think there should be a LowerAlgorithmicOps pass (similar to DecomposeComplexOps) as a Torch->Torch lowering, and then a ConvertTorchToTMTensor pass which happens in backends (i.e., after the Torch backend contract, just before ConvertTorchToLinalg).

@cathyzhyi

> I left a comment in #574. I think there should be a LowerAlgorithmicOps pass (similar to DecomposeComplexOps) as a Torch->Torch lowering, and then a ConvertTorchToTMTensor pass which happens in backends (i.e., after the Torch backend contract, just before ConvertTorchToLinalg).

Ha, I see. There seems to have been some confusion about pass naming. I updated my comments. Thanks for clarifying this!

@cathyzhyi cathyzhyi removed the help wanted Extra attention is needed label Feb 28, 2022
@vivekkhandelwal1 vivekkhandelwal1 added the help wanted Extra attention is needed label Mar 7, 2022
vivekkhandelwal1 commented Mar 7, 2022

Hi @cathyzhyi @silvasean
I am working on the implementation of this op; I need it for the functorch decomposition of aten::embedding_dense_backward:
https://github.com/pytorch/functorch/blob/main/functorch/_src/decompositions.py#L318-L333

@register_decomposition(aten.embedding_dense_backward)
def embedding_dense_backward(grad_output: Tensor, indices: Tensor, num_weights: int, padding_idx: int, scale_grad_by_freq: bool):
    numel = indices.numel()
    grad = grad_output.view(numel, grad_output.size(-1))
    grad_weight = aten.new_zeros(grad_output, (num_weights, grad_output.shape[-1]))
    indices_rank1 = indices.view(numel)
    if scale_grad_by_freq:
        counts = aten.new_zeros(indices, (num_weights,))
        ones = aten.new_ones(indices, (numel,))
        counts = aten.index_put(counts, [indices_rank1], ones, accumulate=True)
        grad_weights_scale = aten.index(counts, [indices_rank1])
        grad = grad / grad_weights_scale.unsqueeze(1)
    skip_padding = (indices_rank1 != padding_idx).unsqueeze(1)
    skip_padding = skip_padding.expand_as(grad)
    zero_grad = aten.full_like(grad, 0)
    return aten.index_put(grad_weight, [indices_rank1], aten.where(skip_padding, grad, zero_grad), accumulate=True)

I have gone through the comments above. Since this op seems a bit complex, should we implement it for now only for the case required here, i.e. a 1-d tensor of indices and a 1-d tensor of values with
indices.size() == values.size()?
We can leave the remaining cases as a TODO and add them as and when required. The current implementation of the tm_tensor.scatter op easily supports the 1-d case.
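
For reference, a minimal sketch of the semantics of that 1-d case (values here are hypothetical):

import torch

input = torch.zeros(5)
indices = torch.tensor([0, 2, 2])
values = torch.tensor([1., 2., 3.])
# With accumulate=True, updates at duplicate indices are summed:
out = torch.index_put(input, [indices], values, accumulate=True)
# out is tensor([1., 0., 5., 0., 0.])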

@silvasean

Starting with the 1D version of this and emitting a clear diagnostic for the unsupported cases seems like a good incremental first step.

@cathyzhyi cathyzhyi removed the help wanted Extra attention is needed label Mar 9, 2022
@cathyzhyi

Closed by #650.

qedawkins pushed a commit to nod-ai/torch-mlir that referenced this issue Oct 3, 2022
rsuderman pushed a commit that referenced this issue May 17, 2024
BaneTrifa pushed a commit to BaneTrifa/torch-mlir that referenced this issue May 24, 2024
sjarus pushed a commit to sjarus/torch-mlir that referenced this issue Jun 6, 2024