
Fix pip wheel build jobs #2969

Merged 7 commits, Apr 15, 2024

Conversation

@dbort (Contributor) commented Apr 10, 2024

These pip dependencies need to be present to build the pip wheel.

Also, change the version to a stub that looks less like a real version, until we can hook up the logic to get the version from the git repo state.

pytorch-bot commented Apr 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2969

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit b5c293c with merge base d3326a2:


This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Apr 10, 2024
@dbort (Contributor, Author) commented Apr 10, 2024

Attempting a wheel build in https://github.com/pytorch/executorch/actions/runs/8636383613, triggered from https://github.com/pytorch/executorch/actions/workflows/build-wheels-linux.yml by pointing at this branch.

EDIT:
Still failing because of missing zstd and tomli:
https://github.com/pytorch/executorch/actions/runs/8636383613/job/23676183977#step:14:89
https://github.com/pytorch/executorch/actions/runs/8636383613/job/23676183977#step:14:102

I do see at least zstd installed in an environment (https://github.com/pytorch/executorch/actions/runs/8636383613/job/23676183977#step:9:2573), so I think we might be building in an isolated environment.

It looks like we're currently using the unmodified pytorch/pytorch script to build our pip packages, so I can't just edit it directly to try things out. And the action scripts live in a different repo, which means I can't tweak them as part of this PR: https://github.com/pytorch/test-infra/blob/main/.github/actions/setup-binary-builds/action.yml

And that action assumes that setting various env vars will affect the version, using logic like https://github.com/pytorch/pytorch/blob/main/tools/generate_torch_version.py -- so we may need to duplicate some of that logic. Some parts, like adding +cpu, don't make sense for us, but we might want to add suffixes like +xnnpack or +coreml to some versions of the packages.
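That suffix idea could be sketched roughly like this -- a hypothetical simplification of the generate_torch_version.py approach, where `get_version` and its fallback behavior are illustrative, not the actual ExecuTorch implementation:

```python
# Hypothetical sketch of git-based versioning with an optional backend
# suffix (e.g. "+git1a2b3c4.xnnpack"); not the real ExecuTorch logic.
import subprocess

def get_version(base: str, backend_suffix: str = "") -> str:
    """Return a PEP 440 version like '0.2.0.dev0+git1a2b3c4.xnnpack'."""
    try:
        sha = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"],
            text=True, stderr=subprocess.DEVNULL,
        ).strip()
        local = f"git{sha}"
    except Exception:
        # Matches the stub behavior when no git state is available.
        local = "unknown"
    if backend_suffix:
        local += f".{backend_suffix}"
    return f"{base}+{local}"
```

For example, `get_version("0.2.0.dev0", "xnnpack")` would yield a local version segment ending in `.xnnpack`.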

@kit1980 (Member) left a comment:

Do you really need requires like setuptools and wheel?
For TorchFix building wheels is just fine without those: https://github.com/pytorch-labs/torchfix/blob/main/pyproject.toml

You probably also want requires-python = ">=3.9" or something like that to specify min Python version.

Also in the future you should add project info like description, license, URL, etc. - again see https://github.com/pytorch-labs/torchfix/blob/main/pyproject.toml for an example.
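These suggestions could look roughly like this in pyproject.toml -- placeholder values only, not the actual ExecuTorch metadata:

```toml
[project]
name = "executorch"
requires-python = ">=3.9"                 # minimum supported Python
# Future metadata suggested above (placeholder values):
description = "Placeholder description"   # hypothetical
license = { file = "LICENSE" }            # hypothetical

[project.urls]
Homepage = "https://github.com/pytorch/executorch"
```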

@dbort (Contributor, Author) commented Apr 11, 2024

I tried building in a totally empty environment locally, and it works for me:

conda create -yn et3 python=3.10.0
conda activate et3
python setup.py bdist_wheel

It built just fine locally; in the CI job, the failure happened during the bdist_wheel step.

And I was able to install the wheel:

pip install dist/executorch-0.0.0-cp310-cp310-macosx_11_0_arm64.whl

@dbort (Contributor, Author) commented Apr 12, 2024

Trying another run at https://github.com/pytorch/executorch/actions/runs/8655047528 after updating build/packaging/pre_build_script.sh to manually install the build deps.

EDIT: This was failing because it only tried building for python3.8, which ET isn't compatible with. Huy added ciflow/binaries/all to this PR to try all python versions, and I started a new job in https://github.com/pytorch/executorch/actions/runs/8655199802

@dbort (Contributor, Author) commented Apr 12, 2024

Got past the build deps, now failing with

RuntimeError: 2024-04-12T00:56:00.412778Z  WARN buck2::panic::imp: Warning "root_not_allowed": buck2 is not allowed to run as root (unless home dir is owned by root)
Command failed: buck2 is not allowed to run as root (unless home dir is owned by root)

Huy says: they are indeed running as root inside the Docker image

@dbort force-pushed the sandbox/dbort/wheel-deps branch 2 times, most recently from f453b6e to 4a05874 (April 12, 2024 21:48)
@dbort (Contributor, Author) commented Apr 12, 2024

Most wheel builds are green after changing the workflow to not get recursive submodules: https://github.com/pytorch/executorch/actions/runs/8668397609/job/23773398885?pr=2969

Zipped wheels available at https://github.com/pytorch/executorch/actions/runs/8668397609?pr=2969#artifacts

I created pytorch/test-infra#5095 to add that option to upstream test-infra; worst case, we can stick with this workflow fork in the ET release branch.

Manually install build requirements because `python setup.py
bdist_wheel` does not install them.
setup.py is sometimes run as root in docker containers. buck2 doesn't
allow running as root unless $HOME is owned by root or does not exist.
So temporarily undefine it while configuring cmake, which runs buck2 to
get some source lists.

Also, the buck2 daemon can sometimes get stuck on the CI workers. Try
killing it before starting the build, ignoring any failures.
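The two buck2 workarounds above might look like this in a pre-build script -- an illustrative sketch, not the actual build/packaging scripts; `cmake-out` is a placeholder build directory:

```shell
# Illustrative sketch of the buck2 workarounds; not the real scripts.
# Kill any stuck buck2 daemon before the build, ignoring failures:
buck2 kill || true
# Configure with HOME undefined so buck2 (which cmake runs to collect
# source lists) does not refuse to run as root:
if command -v cmake >/dev/null && [ -f CMakeLists.txt ]; then
  env -u HOME cmake -S . -B cmake-out
fi
```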
Some CI jobs can fail with "OS file watch limit reached" when running
buck2. This section should reduce the number of files that it tries to
watch.
Change the build-wheels workflow to only fetch the first layer of
submodules. ExecuTorch only needs the first layer of submodules to
build its pip package, but the `build_wheels_*.yaml` workflows will
recursively fetch all submodules by default.

Fetching all submodules can also cause `buck2` to fail because it will
try to watch too many files.

This change makes `buck2` work on the CI runners, speeds up the jobs,
and reduces disk/network usage.
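The shallow fetch can be sketched as follows -- illustrative commands, guarded to no-op outside a git checkout; note the deliberate absence of `--recursive`:

```shell
# Fetch only the first layer of submodules; the build_wheels_*.yaml
# default is a recursive fetch, which pulls far more than the pip
# package build needs.
if git rev-parse --git-dir >/dev/null 2>&1; then
  git submodule sync
  git submodule update --init   # no --recursive: first layer only
fi
```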
@dbort force-pushed the sandbox/dbort/wheel-deps branch 2 times, most recently from 62e7d06 to 4f31c17 (April 13, 2024 22:44)
@dbort (Contributor, Author) commented Apr 13, 2024

The latest version turns on pybindings and XNNPACK everywhere, and Core ML/MPS on macOS.

Linux build failing at https://github.com/pytorch/executorch/actions/runs/8676736127/job/23791536165?pr=2969#step:14:142

/__w/executorch/executorch/pytorch/executorch/kernels/portable/cpu/op_isinf.cpp: In function ‘torch::executor::Tensor& torch::executor::native::isinf_out(torch::executor::RuntimeContext&, const Tensor&, torch::executor::Tensor&)’:
/__w/executorch/executorch/pytorch/executorch/kernels/portable/cpu/op_isinf.cpp:18:71: error: cannot resolve overloaded function ‘isinf’ based on conversion to type ‘torch::executor::FunctionRef<bool(double)>’
   18 |   return internal::unary_ufunc_realhb_to_bool(std::isinf, ctx, in, out);
      |                                                                       ^
gmake[3]: *** [kernels/portable/CMakeFiles/portable_kernels.dir/cpu/op_isinf.cpp.o] Error 1
gmake[3]: *** Waiting for unfinished jobs....
/__w/executorch/executorch/pytorch/executorch/kernels/portable/cpu/op_isnan.cpp: In function ‘torch::executor::Tensor& torch::executor::native::isnan_out(torch::executor::RuntimeContext&, const Tensor&, torch::executor::Tensor&)’:
/__w/executorch/executorch/pytorch/executorch/kernels/portable/cpu/op_isnan.cpp:18:71: error: cannot resolve overloaded function ‘isnan’ based on conversion to type ‘torch::executor::FunctionRef<bool(double)>’
   18 |   return internal::unary_ufunc_realhb_to_bool(std::isnan, ctx, in, out);
      |                                                                       ^
gmake[3]: *** [kernels/portable/CMakeFiles/portable_kernels.dir/cpu/op_isnan.cpp.o] Error 1
gmake[2]: *** [kernels/portable/CMakeFiles/portable_kernels.dir/all] Error 2
gmake[1]: *** [CMakeFiles/portable_lib.dir/rule] Error 2
gmake: *** [portable_lib] Error 2

macOS build failing in Core ML: https://github.com/pytorch/executorch/actions/runs/8676736126/job/23791536162?pr=2969#step:12:249

/Users/ec2-user/runner/_work/executorch/executorch/pytorch/executorch/backends/apple/coreml/runtime/inmemoryfs/inmemory_filesystem.cpp:139:75: error: 'path' is unavailable: introduced in macOS 10.15
InMemoryFileSystem::Attributes get_file_attributes(const std::filesystem::path& path) {
                                                                          ^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1/filesystem:909:24: note: 'path' has been explicitly marked unavailable here
class _LIBCPP_TYPE_VIS path {
                       ^
/Users/ec2-user/runner/_work/executorch/executorch/pytorch/executorch/backends/apple/coreml/runtime/inmemoryfs/inmemory_filesystem.cpp:141:53: error: 'last_write_time' is unavailable: introduced in macOS 10.15

And similar for directory_iterator. So maybe we can't build Core ML for this prebuilt on our CI Mac system.

Always build the pybindings when building the pip wheel.

Always link in XNNPACK.

On macos, also link in MPS. Core ML can't build on the worker machine,
though, because the version of macOS is too old; Core ML requires some
features introduced in macOS 10.15.
@dbort changed the title from "Add required deps to pyproject.toml" to "Fix pip wheel build jobs" Apr 14, 2024
@dbort (Contributor, Author) commented Apr 15, 2024

Latest wheel jobs generally look good. Even if they don't all work, some of them do, which is an improvement over the current state.
https://github.com/pytorch/executorch/actions/runs/8677034900/job/23792153064?pr=2969
https://github.com/pytorch/executorch/actions/runs/8677034899/job/23792153062?pr=2969

Passing the `std::` functions directly to unary_ufunc_realhb_to_bool
can cause "error: cannot resolve overloaded function ‘isinf’ based
on conversion to type ‘torch::executor::FunctionRef<bool(double)>’"
in some compilation environments.

Might be because these functions can be templatized, or because they
became constexpr in C++23.
@larryliu0820 (Contributor) left a comment:

Thanks for all the detailed comments! Really helpful to understand what the code does.

@cccclai (Contributor) commented Apr 15, 2024

Did we test that MPS/XNNPACK are actually linked and registered?

@dbort (Contributor, Author) commented Apr 15, 2024

Good point @cccclai!

Testing the XNNPACK pybindings:

  1. Downloaded the mac M1 python3.10 artifact: https://github.com/pytorch/executorch/actions/runs/8693191561/artifacts/1415284154
  2. Unzipped that file to get the executorch-0.2.0.dev0+unknown-cp310-cp310-macosx_11_0_arm64.whl file
  3. Created a new conda environment
  4. Installed the requirements from install_requirements.sh, but did not do the pip install step at the end. (I did this by editing the script)
  5. Ran pip install executorch-0.2.0.dev0+unknown-cp310-cp310-macosx_11_0_arm64.whl
  6. Exported a .pte file that delegates to XNNPACK using a command from the examples/xnnpack/README.md file: python3 -m examples.xnnpack.aot_compiler --model_name="mv2" --delegate

Used hexedit to show that the .pte file does have XNNPACK delegate entries:

00D52CF0   A4 FE FF FF  01 00 00 00  0E 00 00 00  58 6E 6E 70  ............Xnnp
00D52D00   61 63 6B 42  61 63 6B 65  6E 64 00 00  72 FF FF FF  ackBackend..r...
00D52D10   14 00 00 00  0C 00 00 00  04 00 00 00  00 00 00 00  ................
00D52D20   30 FE FF FF  0E 00 00 00  58 6E 6E 70  61 63 6B 42  0.......XnnpackB
00D52D30   61 63 6B 65  6E 64 00 00  02 00 00 00  2C 00 00 00  ackend......,...

Then tried loading the program:

$ python
>>> from executorch.extension.pybindings import portable_lib
>>> portable_lib._load_for_executorch("./mv2_xnnpack_fp32.pte")
<executorch.extension.pybindings.portable_lib.ExecuTorchModule object at 0x100ac3c30>

This shows that it successfully loaded the program and all of its methods, which means it was able to resolve the XNNPACK backend entries.

Also verified that the pybindings .so contains some XNNPACK-related symbols:

nm extension/pybindings/portable_lib.cpython-310-darwin.so | grep -i xnnpack | head -3
0000000000023890 T __ZN5torch8executor7xnnpack8delegate11XNNCompiler12compileModelEPKvmPNS2_11XNNExecutorEPNS0_15MemoryAllocatorE
0000000000024074 T __ZN5torch8executor7xnnpack8delegate11XNNExecutor10initializeEP11xnn_runtimeONSt3__16vectorIjNS6_9allocatorIjEEEESB_
00000000000241dc T __ZN5torch8executor7xnnpack8delegate11XNNExecutor12prepare_argsEPPNS0_6EValueE

@dbort (Contributor, Author) commented Apr 15, 2024

I am merging this directly into release/0.2 and will later cherry-pick these changes into main.

@dbort dbort merged commit 638433f into release/0.2 Apr 15, 2024
104 of 110 checks passed
Labels: ciflow/trunk, CLA Signed

6 participants