Skip to content
This repository has been archived by the owner on Nov 25, 2024. It is now read-only.

Usage of C10 API's in WholeMemory and its implication on build #6

Open
Tracked by #48
teju85 opened this issue Aug 16, 2022 · 2 comments
Open
Tracked by #48

Usage of C10 API's in WholeMemory and its implication on build #6

teju85 opened this issue Aug 16, 2022 · 2 comments
Assignees
Labels
good first issue Good for newcomers tech-debt Cleanup tasks

Comments

@teju85
Copy link
Member

teju85 commented Aug 16, 2022

@BradReesWork JFYI...
Currently, WM directly uses C10 API's from pytorch. Eg: wholegraph/torch/whole_nccl_pytorch_tensor.h. This means, similar to the pytorch backend of cugraph-ops, we'll also have to think about the topic of CXX_ABI during its build! This can especially become complicated when WM moves inside cuGraph.

@teju85 teju85 changed the title Usage of C10 API's in WholeMemory Usage of C10 API's in WholeMemory and its implication on build Aug 16, 2022
@teju85 teju85 added good first issue Good for newcomers tech-debt Cleanup tasks labels Aug 16, 2022
@MatthiasKohl
Copy link

I think it would make sense to go the same route as with cugraph-ops: if we expose things using cython, and only use the __cuda_array_interface__ for integrating with DL frameworks, then we're able to circumvent this issue entirely, because we won't interact with pytorch at a C++ level at all and so the ABI level won't matter.
The issue only arises if you link against torchlib or some other pytorch backend directly, which you have to do of course if you want to use c10 or other C++ backends from pytorch directly.

@dongxuy04
Copy link
Contributor

I think this issue should have been addressed in PR #24 .

  1. All WholeMemory are supporting both dlpack and cuda_array_interface by PyWholeMemoryFlattenDlpack
  2. All memory needed by ops are allocated by wholememory_env_func_t,
    which is binded with PyTorch at python level using callbacks to Python functions.

So there is no C10 API used during build. And it can build and run without C10 API.

Moreover, as C10 API may have slightly better performance than python callback bindings. The codes are kept in python/pylibwholegraph/pylibwholegraph/torch_cpp_ext directory. They are not built at build and packaging time, just packaging the source code. If it is really helpful on performance, user can complie them at runtime on their machine manually as an optimizing option, by calling compile_cpp_extension. compile_cpp_extension is the only usage of C10 API, it is up to user and run only on user's machine with known PyTorch version. If users don't care about that, all the files under torch_cpp_ext are not used.

@BradReesWork I think we can we can close this issue now.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
good first issue Good for newcomers tech-debt Cleanup tasks
Projects
None yet
Development

No branches or pull requests

3 participants