
Fix pip wheel build jobs #2969

Merged 7 commits, Apr 15, 2024

Conversation

@dbort (Contributor) commented Apr 10, 2024

These pip dependencies need to be present to build the pip wheel.

Also, change the version to a stub that looks less like a real version, until we can hook up the logic to get the version from the git repo state.

pytorch-bot commented Apr 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2969

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit b5c293c with merge base d3326a2:


This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Apr 10, 2024
@dbort (Contributor, Author) commented Apr 10, 2024

Attempting a wheel build in https://github.com/pytorch/executorch/actions/runs/8636383613, triggered from https://github.com/pytorch/executorch/actions/workflows/build-wheels-linux.yml by pointing at this branch.

EDIT:
Still failing because of missing zstd and tomli:
https://github.com/pytorch/executorch/actions/runs/8636383613/job/23676183977#step:14:89
https://github.com/pytorch/executorch/actions/runs/8636383613/job/23676183977#step:14:102

I do see at least zstd installed in an environment (https://github.com/pytorch/executorch/actions/runs/8636383613/job/23676183977#step:9:2573), so I think we might be building in an isolated environment.

It looks like we're currently using the unmodified pytorch/pytorch script to build our pip packages, so I can't just edit it directly to try things out. And the action scripts live in a different repo, which means I can't tweak them as part of this PR: https://github.com/pytorch/test-infra/blob/main/.github/actions/setup-binary-builds/action.yml

And that action assumes that setting various env vars will affect the version, using logic like https://github.com/pytorch/pytorch/blob/main/tools/generate_torch_version.py -- so we may need to duplicate some of that logic. Some parts, like adding +cpu, don't make sense for us, but we might want to add suffixes like +xnnpack or +coreml to some versions of the packages.
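That suffix idea could be sketched roughly like this -- a hypothetical simplification of the generate_torch_version.py approach, where `get_version` and its fallback behavior are illustrative, not the actual ExecuTorch implementation:

```python
# Hypothetical sketch of git-based versioning with an optional backend
# suffix (e.g. "+git1a2b3c4.xnnpack"); not the real ExecuTorch logic.
import subprocess

def get_version(base: str, backend_suffix: str = "") -> str:
    """Return a PEP 440 version like '0.2.0.dev0+git1a2b3c4.xnnpack'."""
    try:
        sha = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"],
            text=True, stderr=subprocess.DEVNULL,
        ).strip()
        local = f"git{sha}"
    except Exception:
        # Matches the stub behavior when no git state is available.
        local = "unknown"
    if backend_suffix:
        local += f".{backend_suffix}"
    return f"{base}+{local}"
```

For example, `get_version("0.2.0.dev0", "xnnpack")` would yield a local version segment ending in `.xnnpack`.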

@kit1980 (Member) left a comment:

Do you really need requires like setuptools and wheel?
For TorchFix building wheels is just fine without those: https://github.com/pytorch-labs/torchfix/blob/main/pyproject.toml

You probably also want requires-python = ">=3.9" or something like that to specify min Python version.

Also in the future you should add project info like description, license, URL, etc. - again see https://github.com/pytorch-labs/torchfix/blob/main/pyproject.toml for an example.
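These suggestions could look roughly like this in pyproject.toml -- placeholder values only, not the actual ExecuTorch metadata:

```toml
[project]
name = "executorch"
requires-python = ">=3.9"                 # minimum supported Python
# Future metadata suggested above (placeholder values):
description = "Placeholder description"   # hypothetical
license = { file = "LICENSE" }            # hypothetical

[project.urls]
Homepage = "https://github.com/pytorch/executorch"
```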

@dbort (Contributor, Author) commented Apr 11, 2024

I tried building in a totally empty environment locally, and it works for me:

conda create -yn et3 python=3.10.0
conda activate et3
python setup.py bdist_wheel

It built just fine locally; in the CI job, the failure happened during the bdist_wheel step.

And I was able to install the wheel:

pip install dist/executorch-0.0.0-cp310-cp310-macosx_11_0_arm64.whl

@dbort (Contributor, Author) commented Apr 12, 2024

Trying another run at https://github.com/pytorch/executorch/actions/runs/8655047528 after updating build/packaging/pre_build_script.sh to manually install the build deps.

EDIT: This was failing because it only tried building for python3.8, which ET isn't compatible with. Huy added ciflow/binaries/all to this PR to try all python versions, and I started a new job in https://github.com/pytorch/executorch/actions/runs/8655199802

@dbort (Contributor, Author) commented Apr 12, 2024

Got past the build deps, now failing with

RuntimeError: 2024-04-12T00:56:00.412778Z  WARN buck2::panic::imp: Warning "root_not_allowed": buck2 is not allowed to run as root (unless home dir is owned by root)
Command failed: buck2 is not allowed to run as root (unless home dir is owned by root)

Huy says: they are indeed running as root inside the Docker image

@dbort force-pushed the sandbox/dbort/wheel-deps branch 2 times, most recently from f453b6e to 4a05874 (April 12, 2024 21:48)
@dbort (Contributor, Author) commented Apr 12, 2024

Most wheel builds are green after changing the workflow to not get recursive submodules: https://github.com/pytorch/executorch/actions/runs/8668397609/job/23773398885?pr=2969

Zipped wheels available at https://github.com/pytorch/executorch/actions/runs/8668397609?pr=2969#artifacts

I created pytorch/test-infra#5095 to add that option to upstream test-infra; worst case, we can stick with this workflow fork in the ET release branch.

Manually install build requirements because `python setup.py
bdist_wheel` does not install them.
setup.py is sometimes run as root in docker containers. buck2 doesn't
allow running as root unless $HOME is owned by root or does not exist.
So temporarily undefine it while configuring cmake, which runs buck2 to
get some source lists.

Also, the buck2 daemon can sometimes get stuck on the CI workers. Try
killing it before starting the build, ignoring any failures.
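The two buck2 workarounds above might look like this in a pre-build script -- an illustrative sketch, not the actual build/packaging scripts; `cmake-out` is a placeholder build directory:

```shell
# Illustrative sketch of the buck2 workarounds; not the real scripts.
# Kill any stuck buck2 daemon before the build, ignoring failures:
buck2 kill || true
# Configure with HOME undefined so buck2 (which cmake runs to collect
# source lists) does not refuse to run as root:
if command -v cmake >/dev/null && [ -f CMakeLists.txt ]; then
  env -u HOME cmake -S . -B cmake-out
fi
```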
Some CI jobs can fail with "OS file watch limit reached" when running
buck2. This section should reduce the number of files that it tries to
watch.
Change the build-wheels workflow to only fetch the first layer of
submodules. ExecuTorch only needs the first layer of submodules to
build its pip package, but the `build_wheels_*.yaml` workflows will
recursively fetch all submodules by default.

Fetching all submodules can also cause `buck2` to fail because it will
try to watch too many files.

This change makes `buck2` work on the CI runners, speeds up the jobs,
and reduces disk/network usage.
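The shallow fetch can be sketched as follows -- illustrative commands, guarded to no-op outside a git checkout; note the deliberate absence of `--recursive`:

```shell
# Fetch only the first layer of submodules; the build_wheels_*.yaml
# default is a recursive fetch, which pulls far more than the pip
# package build needs.
if git rev-parse --git-dir >/dev/null 2>&1; then
  git submodule sync
  git submodule update --init   # no --recursive: first layer only
fi
```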
@dbort force-pushed the sandbox/dbort/wheel-deps branch 2 times, most recently from 62e7d06 to 4f31c17 (April 13, 2024 22:44)
@dbort (Contributor, Author) commented Apr 13, 2024

The latest version turns on pybindings and XNNPACK everywhere, and Core ML/MPS on macOS.

Linux build failing at https://github.com/pytorch/executorch/actions/runs/8676736127/job/23791536165?pr=2969#step:14:142

/__w/executorch/executorch/pytorch/executorch/kernels/portable/cpu/op_isinf.cpp: In function ‘torch::executor::Tensor& torch::executor::native::isinf_out(torch::executor::RuntimeContext&, const Tensor&, torch::executor::Tensor&)’:
/__w/executorch/executorch/pytorch/executorch/kernels/portable/cpu/op_isinf.cpp:18:71: error: cannot resolve overloaded function ‘isinf’ based on conversion to type ‘torch::executor::FunctionRef<bool(double)>’
   18 |   return internal::unary_ufunc_realhb_to_bool(std::isinf, ctx, in, out);
      |                                                                       ^
gmake[3]: *** [kernels/portable/CMakeFiles/portable_kernels.dir/cpu/op_isinf.cpp.o] Error 1
gmake[3]: *** Waiting for unfinished jobs....
/__w/executorch/executorch/pytorch/executorch/kernels/portable/cpu/op_isnan.cpp: In function ‘torch::executor::Tensor& torch::executor::native::isnan_out(torch::executor::RuntimeContext&, const Tensor&, torch::executor::Tensor&)’:
/__w/executorch/executorch/pytorch/executorch/kernels/portable/cpu/op_isnan.cpp:18:71: error: cannot resolve overloaded function ‘isnan’ based on conversion to type ‘torch::executor::FunctionRef<bool(double)>’
   18 |   return internal::unary_ufunc_realhb_to_bool(std::isnan, ctx, in, out);
      |                                                                       ^
gmake[3]: *** [kernels/portable/CMakeFiles/portable_kernels.dir/cpu/op_isnan.cpp.o] Error 1
gmake[2]: *** [kernels/portable/CMakeFiles/portable_kernels.dir/all] Error 2
gmake[1]: *** [CMakeFiles/portable_lib.dir/rule] Error 2
gmake: *** [portable_lib] Error 2

macOS build failing in Core ML: https://github.com/pytorch/executorch/actions/runs/8676736126/job/23791536162?pr=2969#step:12:249

/Users/ec2-user/runner/_work/executorch/executorch/pytorch/executorch/backends/apple/coreml/runtime/inmemoryfs/inmemory_filesystem.cpp:139:75: error: 'path' is unavailable: introduced in macOS 10.15
InMemoryFileSystem::Attributes get_file_attributes(const std::filesystem::path& path) {
                                                                          ^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1/filesystem:909:24: note: 'path' has been explicitly marked unavailable here
class _LIBCPP_TYPE_VIS path {
                       ^
/Users/ec2-user/runner/_work/executorch/executorch/pytorch/executorch/backends/apple/coreml/runtime/inmemoryfs/inmemory_filesystem.cpp:141:53: error: 'last_write_time' is unavailable: introduced in macOS 10.15

And similar for directory_iterator. So maybe we can't build Core ML for this prebuilt on our CI Mac system.

Always build the pybindings when building the pip wheel.

Always link in XNNPACK.

On macos, also link in MPS. Core ML can't build on the worker machine,
though, because the version of macOS is too old; Core ML requires some
features introduced in macOS 10.15.
@dbort changed the title from "Add required deps to pyproject.toml" to "Fix pip wheel build jobs" Apr 14, 2024
@dbort (Contributor, Author) commented Apr 15, 2024

Latest wheel jobs generally look good. Even if they don't all work, some of them do, which is an improvement over the current state.
https://github.com/pytorch/executorch/actions/runs/8677034900/job/23792153064?pr=2969
https://github.com/pytorch/executorch/actions/runs/8677034899/job/23792153062?pr=2969

Passing the `std::` functions directly to unary_ufunc_realhb_to_bool
can cause "error: cannot resolve overloaded function ‘isinf’ based
on conversion to type ‘torch::executor::FunctionRef<bool(double)>’"
in some compilation environments.

Might be because these functions can be templatized, or because they
became constexpr in C++23.
@larryliu0820 (Contributor) left a comment:

Thanks for all the detailed comments! Really helpful to understand what the code does.

@cccclai (Contributor) commented Apr 15, 2024

Did we test that MPS/XNNPACK are actually linked and registered?

@dbort (Contributor, Author) commented Apr 15, 2024

Good point @cccclai!

Testing the XNNPACK pybindings:

  1. Downloaded the mac M1 python3.10 artifact: https://github.com/pytorch/executorch/actions/runs/8693191561/artifacts/1415284154
  2. Unzipped that file to get the executorch-0.2.0.dev0+unknown-cp310-cp310-macosx_11_0_arm64.whl file
  3. Created a new conda environment
  4. Installed the requirements from install_requirements.sh, but did not do the pip install step at the end. (I did this by editing the script)
  5. Ran pip install executorch-0.2.0.dev0+unknown-cp310-cp310-macosx_11_0_arm64.whl
  6. Exported a .pte file that delegates to XNNPACK using a command from the examples/xnnpack/README.md file: python3 -m examples.xnnpack.aot_compiler --model_name="mv2" --delegate

Used hexedit to show that the .pte file does have XNNPACK delegate entries:

00D52CF0   A4 FE FF FF  01 00 00 00  0E 00 00 00  58 6E 6E 70  ............Xnnp
00D52D00   61 63 6B 42  61 63 6B 65  6E 64 00 00  72 FF FF FF  ackBackend..r...
00D52D10   14 00 00 00  0C 00 00 00  04 00 00 00  00 00 00 00  ................
00D52D20   30 FE FF FF  0E 00 00 00  58 6E 6E 70  61 63 6B 42  0.......XnnpackB
00D52D30   61 63 6B 65  6E 64 00 00  02 00 00 00  2C 00 00 00  ackend......,...

Then tried loading the program:

$ python
>>> from executorch.extension.pybindings import portable_lib
>>> portable_lib._load_for_executorch("./mv2_xnnpack_fp32.pte")
<executorch.extension.pybindings.portable_lib.ExecuTorchModule object at 0x100ac3c30>

This shows that it successfully loaded the program and all of its methods, which means it was able to resolve the XNNPACK backend entries.

Also verified that the pybindings .so contains some XNNPACK-related symbols:

nm extension/pybindings/portable_lib.cpython-310-darwin.so | grep -i xnnpack | head -3
0000000000023890 T __ZN5torch8executor7xnnpack8delegate11XNNCompiler12compileModelEPKvmPNS2_11XNNExecutorEPNS0_15MemoryAllocatorE
0000000000024074 T __ZN5torch8executor7xnnpack8delegate11XNNExecutor10initializeEP11xnn_runtimeONSt3__16vectorIjNS6_9allocatorIjEEEESB_
00000000000241dc T __ZN5torch8executor7xnnpack8delegate11XNNExecutor12prepare_argsEPPNS0_6EValueE

@dbort (Contributor, Author) commented Apr 15, 2024

I am merging this directly into release/0.2 and will later cherry-pick these changes into main.

@dbort dbort merged commit 638433f into release/0.2 Apr 15, 2024
104 of 110 checks passed
Labels: ciflow/trunk, CLA Signed

6 participants