torch libtorch_cuda_linalg.so #834

Closed
jamesdbrock opened this issue Nov 29, 2024 · 6 comments · Fixed by #836
Labels
bug Something isn't working

jamesdbrock commented Nov 29, 2024

Describe the bug

Missing libtorch_cuda_linalg.so at runtime.

To Reproduce

I don't have a minimal reproduction, sorry.

Error:

An error occurred during training.
Traceback (most recent call last):
  File "manufacia/training/pipeline.py", line 135, in run
  File "manufacia/training/patch_distribution_pipeline.py", line 354, in training_process_pipeline
  File "manufacia/training/patch_distribution_pipeline.py", line 167, in _run_training
  File "image_anomalies/patch_distribution.py", line 218, in train
  File "image_anomalies/patch_distribution.py", line 137, in patch_distribution_stats_online
RuntimeError: Error in dlopen: libtorch_cuda_linalg.so: cannot open shared object file: No such file or directory

Expected behavior

pyinstaller does correctly collect the object file _internal/torch/lib/libtorch_cuda_linalg.so.

I think that pyinstaller should symlink this shared object file in the _internal directory like this, but it doesn't:

libtorch_cuda_linalg.so -> torch/lib/libtorch_cuda_linalg.so*

Other libtorch object files are symlinked in the _internal directory:

libtorch_cpu.so -> torch/lib/libtorch_cpu.so*
libtorch_cuda_cpp.so -> torch/lib/libtorch_cuda_cpp.so*
libtorch_cuda_cu.so -> torch/lib/libtorch_cuda_cu.so*
libtorch_cuda.so -> torch/lib/libtorch_cuda.so*
libtorch_python.so -> torch/lib/libtorch_python.so*
libtorch.so -> torch/lib/libtorch.so*
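
To see at a glance which collected libraries have a top-level symlink and which do not, a small diagnostic along these lines may help. It is only a sketch and not part of PyInstaller; the function name `report_symlinks` and the path in the usage note are assumptions for illustration.

```python
import os


def report_symlinks(internal_dir):
    """Map each file name in <internal_dir>/torch/lib to True if a
    same-named symlink exists directly in <internal_dir>, else False."""
    lib_dir = os.path.join(internal_dir, "torch", "lib")
    report = {}
    for name in sorted(os.listdir(lib_dir)):
        top_level = os.path.join(internal_dir, name)
        report[name] = os.path.islink(top_level)
    return report
```

Calling `report_symlinks("dist/Manufacia/_internal")` on the build described here would be expected to report `False` for `libtorch_cuda_linalg.so` and `True` for the libraries listed above.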

Desktop (please complete the following information):

  • OS: Ubuntu
  • Python Version: 3.10
  • Version of pyinstaller-hooks-contrib: 2024.09
  • Version of PyInstaller: 6.11.0
@jamesdbrock jamesdbrock added the state:triage We're still figuring out how severe this issue is label Nov 29, 2024
@jamesdbrock
Author

Possibly similar to #591

@jamesdbrock
Author

I can fix the symlink by adding this line at the end of my .spec file. Is there a better way to do this?

import os
from PyInstaller.compat import is_linux
if is_linux:
    os.symlink('torch/lib/libtorch_cuda_linalg.so', 'dist/Manufacia/_internal/libtorch_cuda_linalg.so')

@rokm
Member

rokm commented Nov 29, 2024

Other libtorch object files are symlinked to the top-level directory because they are linked by some other shared library. The torch/lib/libtorch_cuda_linalg.so is not (it is dynamically loaded at run-time via dlopen).

If adding the symlink to the top-level directory solves the issue for you, then it is likely that the search location is incorrectly inferred from one of the other symlinked shared libraries (instead of their "true" location). Instead of adding the symlink, does removing all libtorch* symlinks from the top-level directory also solve the problem?
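
To illustrate the hypothesis, the following sketch shows how the directory derived from a symlinked library path differs from the directory of the file it points to. It is an illustration only, not how torch or the dynamic loader is implemented, and the helper name `apparent_and_true_dirs` is hypothetical.

```python
import os


def apparent_and_true_dirs(path):
    """Return (directory of the path as given, directory of the resolved file).

    For a top-level symlink such as _internal/libtorch_cuda.so pointing at
    torch/lib/libtorch_cuda.so, the first element is _internal and the
    second is _internal/torch/lib.
    """
    apparent = os.path.dirname(os.path.abspath(path))
    true = os.path.dirname(os.path.realpath(path))
    return apparent, true
```

If a sibling library is searched for next to the path as given rather than next to the resolved file, it would be looked up in the top-level directory, where `libtorch_cuda_linalg.so` has no symlink.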


I think the minimal reproducer would be something that uses linalg on CUDA, e.g.

import torch

A = torch.randn(3, 3, device='cuda:0')
print(f"A={A}")
b = torch.randn(3, device='cuda:0')
print(f"b={b}")
x = torch.linalg.solve(A, b)
print(f"x={x}")

Can you confirm that this reproduces the problem on your system?

It does not seem to do so on mine, unless I deliberately remove _internal/torch/lib/libtorch_cuda_linalg.so (which means that in my case, the symlinks are not an issue, but the codepath does use libtorch_cuda_linalg).

If you can reproduce the issue on your system with this sample, what version of torch are you using, and how did you install it (pip, conda, etc.)?

@jamesdbrock
Author

jamesdbrock commented Dec 3, 2024

Thanks for your attention, @rokm .

> Instead of adding the symlink, does removing all libtorch* symlinks from the top-level directory also solve the problem?

Yes that does solve the problem.
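
For anyone who wants to repeat this check, removing the top-level symlinks can be sketched as below. This is a hypothetical helper, not something PyInstaller provides; the name `remove_libtorch_symlinks` and the `dist/main/_internal` path in the usage note are assumptions.

```python
import glob
import os


def remove_libtorch_symlinks(internal_dir):
    """Delete libtorch* symlinks directly inside internal_dir.

    Regular files and everything under torch/lib are left untouched.
    Returns the names of the removed symlinks.
    """
    removed = []
    pattern = os.path.join(glob.escape(internal_dir), "libtorch*")
    for path in sorted(glob.glob(pattern)):
        if os.path.islink(path):  # never touch real files
            os.unlink(path)
            removed.append(os.path.basename(path))
    return removed
```

For a build made from the reproduction below, this would be called as `remove_libtorch_symlinks("dist/main/_internal")` before running the frozen program again.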

torch 1.13.1+cu117 installed by Poetry.

Reproduction

I used your script to make a reproduction.

Files

pyproject.toml

[tool.poetry]
name = "testmodule"
version = "1.0.1.0"
description = ""
authors = []

[tool.poetry.dependencies]
python = ">=3.10,<3.11"
torch = [
  { version = "1.13.1+cu117", platform = "linux", source = "torchcu117" },
]
numpy = "1.24.2"
pyinstaller = "^6.11.0"
pyinstaller-hooks-contrib = "^2024.9"


[tool.poetry.scripts]
testmodule = "testmodule.main:main"

[[tool.poetry.source]]
name = "torchcu117"
url = "https://download.pytorch.org/whl/cu117"
priority = "explicit"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

src/testmodule/main.py

import torch

def main():

    A = torch.randn(3, 3, device='cuda:0')
    print(f"A={A}")
    b = torch.randn(3, device='cuda:0')
    print(f"b={b}")
    x = torch.linalg.solve(A, b)
    print(f"x={x}")

if __name__ == "__main__":
    main()

src/testmodule/__init__.py (empty file)

Commands

poetry install
poetry run testmodule

That should run the test program successfully.

Now we continue and package the test program with pyinstaller.

poetry shell
pyi-makespec src/testmodule/main.py --onedir
pyinstaller -y --clean main.spec
exit
dist/main/main

Running the pyinstaller-packaged program fails with

RuntimeError: Error in dlopen: libtorch_cuda_linalg.so: cannot open shared object file: No such file or directory

@rokm
Member

rokm commented Dec 3, 2024

Thanks for the additional details. I can reproduce the problem with older torch builds from https://download.pytorch.org/whl (e.g., 1.13.1+cu117, 2.0.1+cu117, and 2.1.2+cu118). Later versions seem to have fixed this, and wheels provided on PyPI do not seem to be affected, regardless of the version.

I guess the proper solution here is to suppress the symlink generation for libraries found in torch/lib. We already do the same for tensorflow and for nvidia packages (which are used by tensorflow and PyPI torch builds).
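
The idea of suppressing symlinks by location can be sketched as a simple path-pattern check. This is an illustration under assumptions, not the actual hook code: the pattern list and the function `should_suppress_symlink` are hypothetical, and the real implementation may express the rule differently.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern list; the actual hook may express this differently.
SUPPRESS_PATTERNS = ["*/torch/lib/*.so*"]


def should_suppress_symlink(lib_path, patterns=SUPPRESS_PATTERNS):
    """Return True if lib_path matches any suppression pattern.

    Note that fnmatch-style '*' also matches path separators, so the
    leading '*' covers any parent directory.
    """
    normalized = lib_path.replace("\\", "/")
    return any(fnmatchcase(normalized, p) for p in patterns)
```

With such a rule, every shared library discovered in `torch/lib` would stay only at its original location, so no lookup could be redirected through the top-level directory.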

@rokm rokm added bug Something isn't working and removed state:triage We're still figuring out how severe this issue is labels Dec 3, 2024
@jamesdbrock
Author

Thanks @rokm , I believe your PR #836 will solve this issue.

Another shared library affected by the same issue, noted here so that searches will find this thread: torch/lib/libcudnn_cnn_train.so.8.

@rokm rokm closed this as completed in #836 Dec 3, 2024
github-actions bot pushed a commit to wxx9248/Pickle-Rush that referenced this issue Dec 23, 2024
…24.11 (#120)

Bumps pyinstaller-hooks-contrib (https://github.com/pyinstaller/pyinstaller-hooks-contrib) from 2024.10 to 2024.11.

Release notes (sourced from pyinstaller-hooks-contrib's releases, https://github.com/pyinstaller/pyinstaller-hooks-contrib/releases)

v2024.11

Please see the changelog (https://www.github.com/pyinstaller/pyinstaller-hooks-contrib/tree/v2024.11/CHANGELOG.rst) for more details.

Changelog (sourced from pyinstaller-hooks-contrib's changelog, https://github.com/pyinstaller/pyinstaller-hooks-contrib/blob/master/CHANGELOG.rst)

2024.11 (2024-12-23)

New hooks

  • Add hook for selectolax to collect its data files. (#841, https://github.com/pyinstaller/pyinstaller-hooks-contrib/issues/841)

Updated hooks

  • (Linux) Update torch hook to suppress creation of symbolic links to the top-level application directory for the shared libraries discovered during binary dependency analysis in torch/lib directory. This fixes issues with libtorch_cuda_linalg.so not being found in spite of it being collected, as observed with certain torch builds provided by https://download.pytorch.org/whl/torch (e.g., 1.13.1+cu117, 2.0.1+cu117, and 2.1.2+cu118). (#834, https://github.com/pyinstaller/pyinstaller-hooks-contrib/issues/834)
  • Update sklearn.tree hook for compatibility with scikit-learn v1.6.0 (add sklearn.tree._partitioner to hidden imports). (#838, https://github.com/pyinstaller/pyinstaller-hooks-contrib/issues/838)

Commits

  • c89667a Release v2024.11
  • f0344ee Add hook for selectolax (#841)
  • a715a66 Scheduled weekly dependency update for week 50 (#839)
  • 8441289 hooks: update sklearn.tree hook for compatibility with scikit-learn v1.6.0
  • 141666a Scheduled weekly dependency update for week 49 (#837)
  • 7d80e4b hooks: torch: suppress generation of symlinks for shared libs
  • 39fafd7 tests: add test for torch.linalg on CUDA
  • 72f8526 Scheduled weekly dependency update for week 48 (#835)
  • 4de49bf Scheduled weekly dependency update for week 47 (#833)
  • b16c965 Scheduled weekly dependency update for week 46 (#831)
  • Additional commits viewable in the compare view: https://github.com/pyinstaller/pyinstaller-hooks-contrib/compare/v2024.10...v2024.11