libtorch: new recipe #24759

Draft
wants to merge 17 commits into
base: master

@valgur valgur commented Jul 30, 2024

Summary

Changes to recipe: libtorch/2.4.0

Motivation

Tensors and Dynamic neural networks in Python with strong GPU acceleration.

https://github.com/pytorch/pytorch

Details

Continues from #5100 by @SpaceIm.

CUDA, HIP and SYCL backends are currently disabled since the PR is complex enough already and these can be addressed in a follow-up PR. Vulkan and Metal (TODO) should be usable as GPU backends currently.

The distributed feature is also disabled, both to limit the scope and because openmpi is not yet available (#18980).

Android and iOS builds are probably broken and need testing.

Non-OpenBLAS BLAS backends are probably not usable due to OpenBLAS being required for LAPACK. A separate LAPACK recipe would be required to fix that (such as #23798).

Closes #6861.

TODO:

  • Export missing CMake variables.
  • Test with Metal on macOS.
  • Submit bugfix patches upstream.
  • Create a recipe for pocketfft and unvendor.

valgur added 2 commits July 30, 2024 18:48
XNNPACK was not correctly added to project dependencies.
Prefer namespaced targets, if possible.

Contributor

Hooks produced the following warnings for commit 87a1370
libtorch/2.4.0@#f680755600363ae5e29186ad5b798792
post_source(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './third_party/kineto/libkineto/third_party/dynolog/hbt/src/perf_event/json_events/generated/intel/sapphirerapids_uncore_experimental.cpp' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_source(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './third_party/kineto/libkineto/third_party/dynolog/third_party/googletest/googlemock/include/gmock/internal/custom/gmock-generated-actions.h' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_source(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './third_party/kineto/libkineto/third_party/dynolog/third_party/json/doc/mkdocs/docs/api/byte_container_with_subtype/byte_container_with_subtype.md' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_source(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './third_party/kineto/libkineto/third_party/dynolog/third_party/json/test/reports/2016-09-09-nativejson_benchmark/conformance_overall_Result.png' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_source(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM/testing/python3/dcptestautomation/parse_dcgmproftester_single_metric.py' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_source(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM/testing/python3/tests/nvswitch_tests/test_nvswitch_with_running_fm.py' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_source(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM/scripts/verify_package_contents/datacenter-gpu-manager_VERSION_arm64.deb.txt' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_source(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './test/dynamo_expected_failures/TestExpandedWeightFunctionalCPU.test_expanded_weights_per_sample_grad_input_no_grad_nn_functional_group_norm_cpu_float64' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_source(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './test/dynamo_skips/TestProxyTensorOpInfoCPU.test_make_fx_symbolic_exhaustive_inplace_nn_functional_feature_alpha_dropout_without_train_cpu_float32' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_package(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './include/ATen/native/transformers/cuda/mem_eff_attention/iterators/predicated_tile_iterator_residual_last.h' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_package(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './include/ATen/native/transformers/cuda/mem_eff_attention/epilogue/epilogue_thread_apply_logsumexp.h' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_package(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './include/ATen/ops/max_pool2d_with_indices_backward_compositeexplicitautogradnonfunctional_dispatch.h' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
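
The warnings above flag files whose paths may be too long for Windows, and the hook's own suggestion is to add `short_paths = True` to the recipe. As a rough local approximation of that kind of check, the sketch below lists files whose relative path exceeds a chosen length. The helper name `find_long_paths` and the default threshold are assumptions made for illustration; this is not the hook's actual implementation or limit.

```python
import os


def find_long_paths(root, max_len=120):
    """Return relative file paths under `root` longer than `max_len` characters.

    `max_len` is an illustrative threshold, not the value used by KB-H066.
    """
    long_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            # Normalise separators so results look the same on every platform.
            rel = rel.replace(os.sep, "/")
            if len(rel) > max_len:
                long_paths.append(rel)
    return sorted(long_paths)
```

Running it against an unpacked source tree gives a quick list of candidates before the hooks run in CI.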

@valgur valgur mentioned this pull request Aug 7, 2024
3 tasks
@conan-center-bot
Collaborator

Conan v1 pipeline ❌

Failure in build 6 (ff36ad93e96684208459986e2a5088b01a02883d):

  • libtorch/2.4.0:
    An unexpected error happened and has been reported

Note: To save resources, CI tries to finish as soon as an error is found. For this reason you might find that not all the references have been launched or not all the configurations for a given reference. Also, take into account that we cannot guarantee the order of execution as it depends on CI workload and workers availability.


Conan v2 pipeline ❌

Note: Conan v2 builds are now mandatory. Please read our discussion about it.

The v2 pipeline failed. Please, review the errors and note this is required for pull requests to be merged. In case this recipe is still not ported to Conan 2.x, please, ping @conan-io/barbarians on the PR and we will help you.

Failure in build 8 (ff36ad93e96684208459986e2a5088b01a02883d):

  • libtorch/2.4.0:
    CI failed to create some packages (All logs)

    Logs for packageID 999239f19123416d584ffc8c46c1df33a363bf09:
    [settings]
    arch=armv8
    build_type=Release
    compiler=apple-clang
    compiler.cppstd=17
    compiler.libcxx=libc++
    compiler.version=13
    os=Macos
    [options]
    */*:shared=False
    
    [...]
    --   USE_NCCL              : OFF
    --   USE_NNPACK            : OFF
    --   USE_NUMPY             : OFF
    --   USE_OBSERVERS         : False
    --   USE_OPENCL            : False
    --   USE_OPENMP            : False
    --   USE_MIMALLOC          : False
    --   USE_VULKAN            : False
    --   USE_PROF              : OFF
    --   USE_PYTORCH_QNNPACK   : True
    --   USE_XNNPACK           : True
    --   USE_DISTRIBUTED       : OFF
    --   Public Dependencies  : 
    --   Private Dependencies : cpuinfo;fp16::fp16;fmt::fmt;pthreadpool::pthreadpool;flatbuffers::flatbuffers;xnnpack::xnnpack;Threads::Threads;cpuinfo;pytorch_qnnpack;fp16;onnx::onnx;foxi_loader;fmt::fmt-header-only;kineto
    --   Public CUDA Deps.    : 
    --   Private CUDA Deps.   : 
    --   USE_COREML_DELEGATE     : False
    --   BUILD_LAZY_TS_BACKEND   : True
    --   USE_ROCM_KERNEL_ASSERT : OFF
    -- Configuring done (5.6s)
    -- Generating done (0.5s)
    -- Build files have been written to: /Users/jenkins/workspace/prod-v2/bsr/75828/debae/p/b/libto3d26c80da6c4e/b/build/Release
    [  0%] Linking C static library ../../lib/libfxdiv.a
    [  0%] Built target clog
    [  0%] Built target libkineto_defs.bzl
    ar: no archive members specified
    usage:  ar -d [-TLsv] archive file ...
    	ar -m [-TLsv] archive file ...
    	ar -m [-abiTLsv] position archive file ...
    	ar -p [-TLsv] archive [file ...]
    	ar -q [-cTLsv] archive file ...
    	ar -r [-cuTLsv] archive file ...
    	ar -r [-abciuTLsv] position archive file ...
    	ar -t [-TLsv] archive [file ...]
    	ar -x [-ouTLsv] archive [file ...]
    make[2]: *** [lib/libfxdiv.a] Error 1
    make[1]: *** [confu-deps/pytorch_qnnpack/CMakeFiles/fxdiv.dir/all] Error 2
    make[1]: *** Waiting for unfinished jobs....
    [  0%] Built target kineto_api
    [  1%] Built target kineto_base
    [  8%] Built target c10
    [  8%] Built target ATEN_CPU_FILES_GEN_TARGET
    make: *** [all] Error 2
    
    libtorch/2.4.0: ERROR: 
    Package '999239f19123416d584ffc8c46c1df33a363bf09' build failed
    libtorch/2.4.0: WARN: Build folder /Users/jenkins/workspace/prod-v2/bsr/75828/debae/p/b/libto3d26c80da6c4e/b/build/Release
    ERROR: libtorch/2.4.0: Error in build() method, line 497
    	cmake.build(cli_args=["--parallel", "1"])
    	ConanException: Error 2 while executing
    

@valgur valgur mentioned this pull request Sep 19, 2024
6 tasks

hasB4K commented Sep 26, 2024

Hello @valgur, thanks for this amazing PR. Do you plan to continue working on it? 🤞 Having libtorch in Conan would be so neat. Since OpenMPI is now available, do you plan to let users enable the distributed feature?

tc.variables["BLAS"] = self._blas_cmake_option_value

tc.variables["MSVC_Z7_OVERRIDE"] = False

Contributor

@keef-cognitiv keef-cognitiv Oct 4, 2024

Incidentally, this also needs

tc.variables["CMAKE_CXX_EXTENSIONS"] = True

I tested this with a build that sets compiler.cppstd. With a non-GNU standard (which other packages require), ATen breaks with the same error as pytorch/QNNPACK#67.

This converts, for example, -std=c++17 to -std=gnu++17.

It's probably not necessary on Windows, but it also shouldn't hurt.
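
The flag change described in this comment can be illustrated with a tiny helper. This is a simplified model of how the `-std` flag differs for GCC- and Clang-style compilers when GNU extensions are on or off, not CMake's actual logic; `cxx_std_flag` is a hypothetical name used only here.

```python
def cxx_std_flag(cppstd, extensions):
    """Return a GCC/Clang-style -std flag for the given C++ standard.

    `extensions` plays the role of CMAKE_CXX_EXTENSIONS in this sketch.
    """
    prefix = "gnu++" if extensions else "c++"
    return f"-std={prefix}{cppstd}"


strict = cxx_std_flag(17, extensions=False)  # ISO mode
gnu = cxx_std_flag(17, extensions=True)      # GNU dialect
```

Here `strict` is `-std=c++17` and `gnu` is `-std=gnu++17`, matching the conversion mentioned above.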

Contributor Author

Thanks! TODO: add gnu_extensions=True to check_min_cppstd()

    whole_archive = f"-WHOLEARCHIVE:{lib_fullpath}"
else:
    lib_fullpath = os.path.join(lib_folder, f"lib{libname}.a")
    whole_archive = f"-Wl,--whole-archive,{lib_fullpath},--no-whole-archive"

@lia-viam lia-viam Oct 8, 2024

Thanks for your work on this PR--I am not using this library directly but found it through following some github issues on whole archive linking.

For this line, I wonder whether it is possible to do this with -Wl,--push-state,--pop-state? See e.g.
https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_LINK_LIBRARY_USING_FEATURE.html#loading-a-whole-static-library

https://github.com/Kitware/CMake/blob/ddf1d2944fe53b0fb0be79621c53d2d235fce07b/Modules/Platform/Linker/GNU.cmake#L35
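
A minimal sketch of what this suggestion could look like on the recipe side, assuming the flags are still assembled by hand as in the quoted lines. The helper name `whole_archive_flag` and its `linker` selector are hypothetical, the `-force_load` branch for Apple linkers is an addition not discussed in this thread, and `--push-state`/`--pop-state` require a linker that supports them.

```python
def whole_archive_flag(lib_fullpath, linker="gnu"):
    """Build a linker flag that pulls in every object of a static library.

    `linker` is a hypothetical selector: "msvc", "apple" or "gnu".
    """
    if linker == "msvc":
        return f"-WHOLEARCHIVE:{lib_fullpath}"
    if linker == "apple":
        return f"-Wl,-force_load,{lib_fullpath}"
    # --push-state saves the linker's current flag state and --pop-state
    # restores it, so --whole-archive only applies to this one library
    # regardless of what was active before.
    return f"-Wl,--push-state,--whole-archive,{lib_fullpath},--pop-state"


flag = whole_archive_flag("/pkg/lib/libtorch_cpu.a")
```

Compared with the `--whole-archive ... --no-whole-archive` pair, this restores whatever state was in effect earlier instead of unconditionally switching whole-archive off.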

self.options.rm_safe("with_mkldnn")
if not is_apple_os(self) or self.settings.os not in ["Linux", "Android"]:
    del self.options.with_nnpack
self.options.with_itt = self.settings.arch in ["x86", "x86_64"]

Contributor

This line overrides a manually set value of with_itt. Even if the user sets it to False, this line sets it to True on x86 and x86_64, which may not be the expected behaviour.
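
One way to express an architecture-dependent default that still honours an explicit choice is sketched below in plain Python, outside of Conan. Whether the recipe needs such a guard depends on which recipe method the quoted assignment lives in, which is not visible in this excerpt; `resolve_with_itt` and its `user_value` parameter are hypothetical.

```python
def resolve_with_itt(arch, user_value=None):
    """Pick the with_itt value, honouring an explicit user choice.

    `user_value` is None when the user did not set the option.
    """
    if user_value is not None:
        return bool(user_value)
    # Fall back to the architecture-based default from the quoted line.
    return arch in ("x86", "x86_64")


default_on_x86 = resolve_with_itt("x86_64")
explicit_off = resolve_with_itt("x86_64", user_value=False)
```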


@property
def _use_nnpack_family(self):
    return any(self.options.get_safe(f"with_{name}") for name in ["nnpack", "qnnpack", "xnnpack"])

Contributor

with_xnnpack cannot be deleted because it is accessed unsafely on line 284: if self.options.with_xnnpack:

Contributor Author

Thanks for the comment, but I don't see the with_xnnpack option being deleted anywhere. This specific line is querying the value, not removing the option.

Contributor

Sorry for the missing context. with_xnnpack must be deleted for the macOS build since it is not yet supported on macOS. In that case the direct access might be a problem.
I managed to build libtorch on Linux and macOS ARM with some modifications based on @valgur's work.
If you are interested, you can see the current version:
https://github.com/joda01/imagec-recipes/actions/runs/12983296316
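
The difference between the two access styles discussed in this thread can be sketched with a small stand-in class. `Options` below is a hypothetical minimal model, not Conan's real options object; it only mimics the idea that a removed option makes direct attribute access fail while `get_safe` returns `None`.

```python
class Options:
    """Hypothetical stand-in for a recipe's options object (not Conan's class)."""

    def __init__(self, **values):
        self._values = dict(values)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self.__dict__["_values"][name]
        except KeyError:
            raise AttributeError(f"option '{name}' doesn't exist") from None

    def get_safe(self, name, default=None):
        return self._values.get(name, default)

    def rm_safe(self, name):
        self._values.pop(name, None)


opts = Options(with_qnnpack=True, with_xnnpack=True)
opts.rm_safe("with_xnnpack")  # e.g. on a platform where it is unsupported

# Safe query: a removed option simply counts as "off".
use_family = any(opts.get_safe(f"with_{n}") for n in ["nnpack", "qnnpack", "xnnpack"])
```

In this model, `opts.with_xnnpack` would raise after the removal, which is the kind of failure the comment warns about for a direct `if self.options.with_xnnpack:` check.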

Contributor Author

Thanks! I'll definitely take a look.

recipes/libtorch/all/conanfile.py
self.requires("vulkan-loader/1.3.268.0")
if self.options.with_mimalloc:
    self.requires("mimalloc/2.1.7")

Contributor

The build on macOS needs pybind11.

Suggested change
if is_apple_os(self):
    self.requires("pybind11/2.13.6")

Contributor Author

Are you sure it's required specifically on macOS? I might have missed it on Linux due to having it available on my system.

Contributor

I tried without it but had no success. You can have a look at
https://github.com/joda01/imagec-recipes/actions/runs/12983296316


# Keep only a restricted set of vendored dependencies.
# Do it before build() to limit the amount of files to copy.
allowed = ["pocketfft", "kineto", "miniz-2.1.0"]

Contributor

For the macOS build, some more third-party libs are needed.

Suggested change
- allowed = ["pocketfft", "kineto", "miniz-2.1.0"]
+ allowed = ["pocketfft", "kineto", "miniz-2.1.0"]
+ if self.is_mac_os == True:
+     allowed = ["pocketfft", "kineto", "miniz-2.1.0", "opentelemetry-cpp", "protobuf"]

Contributor Author

Hmm... I would prefer to unvendor these and use Conan versions. I'll have to do more testing on macOS.

Contributor

I totally agree with you! It's a quick workaround.
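
The allowlist approach discussed in this thread can be sketched as a small standalone function. This is an illustration under assumptions, not necessarily how the recipe implements it: `prune_vendored` is a hypothetical helper, and it assumes each vendored dependency is a direct sub-directory of the third-party folder.

```python
import os
import shutil


def prune_vendored(third_party_dir, allowed):
    """Delete every sub-directory of `third_party_dir` not named in `allowed`.

    Plain files are left untouched. Returns the sorted names of removed directories.
    """
    # Collect entries first so the directory is not modified while it is scanned.
    with os.scandir(third_party_dir) as it:
        subdirs = [entry for entry in it if entry.is_dir()]
    removed = []
    for entry in subdirs:
        if entry.name not in allowed:
            shutil.rmtree(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Extending the allowlist per platform, as in the suggested change, then only means passing a longer `allowed` list; unvendoring a dependency means dropping its name from the list again.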

Contributor Author

valgur commented Jan 27, 2025

Thanks, @joda01 and @lia-viam, for pointing out the potential bugs!

Contributor

joda01 commented Jan 27, 2025

@valgur Thanks a lot for your initial work on providing a libtorch recipe!
Do you have any experience with Windows builds? In my pipeline, CMake reports "Python not found" even though Python is installed and in the PATH.

Successfully merging this pull request may close these issues.

[request] pytorch/1.9
8 participants