Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing Python packages due to cleared $PYTHONPATH #22681

Open
Flamefire opened this issue Oct 16, 2024 · 19 comments
Open

Missing Python packages due to cleared $PYTHONPATH #22681

Flamefire opened this issue Oct 16, 2024 · 19 comments

Comments

@Flamefire
Copy link
Contributor

Problem Description

We install Spyder alongside other globally installed Python packages that are in non-standard locations. To make Python find them we either add the locations to $PYTHONPATH or use a custom sitecustomize.py whose location must be in $PYTHONPATH

When starting Spyder it fails with errors such as No module named 'zmq'.

I traced that to

if os.environ.get('PYTHONPATH'):
for path in os.environ['PYTHONPATH'].split(os.pathsep):
try:
sys.path.remove(path.rstrip(os.sep))
except ValueError:
pass

That removes the entries that are required such that Python finds the modules and our sitecustomize

Removing that results in a similar problem as the IDE console shows

Error while finding module specification for 'spyder_kernels.console' (ModuleNotFoundError: No module named 'spyder_kernels')

That arises due to removing $PYTHONPATH from the environment the kernel starts in:

env_vars.pop('PYTHONPATH', None)

I'm not fully sure what the intention was, but maybe that can be solved differently. Is that possible?

Versions

  • Spyder version: 5.5.1
  • Python 3.11.3 64-bit | Qt 5.15.10 | PyQt5 5.15.10 | Linux 4.18.0-513.24.1.el8_9.x86_64 (x86_64)
@ccordoba12
Copy link
Member

Hey @Flamefire, thanks for reporting. You said:

We install Spyder alongside other globally installed Python packages that are in non-standard locations. To make Python find them we either add the locations to $PYTHONPATH or use a custom sitecustomize.py whose location must be in $PYTHONPATH

We haven't contemplated the possibility of people building a Python installation by adding different directories to PYTHONPATH.

I'm not fully sure what the intention was

We remove PYTHONPATH before starting Spyder and from kernels to avoid users adding directories to it with modules named random.py, string.py, math.py, etc. Those modules would mask the respective standard library ones and cause major breakages. Most people are unaware that that is even possible and were very frustrated when it happened because it's not an easy problem to understand and find help about.

but maybe that can be solved differently. Is that possible?

I don't think there is a simple way to prevent the problem I mentioned without dropping PYTHONPATH from the env. But I'm open to suggestions.

@Flamefire
Copy link
Contributor Author

We remove PYTHONPATH before starting Spyder and from kernels to avoid users adding directories to it

How exactly are they adding directories to it? IIRC I read in one of the related issues that it was about the current directory being used implicitly as a python search path. Maybe that could be detected and only that one removed from PYTHONPATH, potentially together with any empty values as IIRC PYTHONPATH=/foo: would mean the same as /foo:$PWD

@mrclary
Copy link
Contributor

mrclary commented Oct 17, 2024

PYTHONPATH is for user-provided and non-standard-library paths and should not be relied upon for Python or installed packages to work properly. Unfortunately, any user-provided path via PYTHONPATH presents a potential hazard to Spyder, not just the current working directory, so we cannot selectively "clean" this environment variable.

Rather than PYTHONPATH, I would recommend using a .pth file located in ../lib/python3.11/site-packages, relative to the Python executable you are using.
See https://docs.python.org/3.11/library/site.html#module-site

@Flamefire
Copy link
Contributor Author

Unfortunately that doesn't work for our use case: We have environment modules managed via LMod to allow users to choose their desired version of software in a (heavily) shared environment/filesystem.

Those "modules" can be though of as (bash) scripts that are sourced and set/update and export environment variables. Hence the only thing a module providing a set of Python packages can do is append or prepend to $PYTHONPATH.

In particular we cannot modify the global Python installation and put additional files in there, so the .pth approach does not work for us.

We are switching to a mechanism using sitecustomize.py that reads a custom env variable to modify sys.path (as also described in your link). However we do put that into a separate folder and pass its location via $PYTHONPATH.

IIRC the $PWD is added to sys.path not $PYTHONPATH, I confused that. So any $PYTHONPATH would be something explicitly set by a user, so most "should" be aware of it.

As the intention is that people "were very frustrated when it happened because it's not an easy problem to understand and find help about." maybe another solution is to detect a set $PYTHONPATH and show a warning on startup to educate users that don't know about that but still allow this mechanism for users that do know and intended that (at least some like us)
If there is a message like "You have set $PYTHONPATH. If you encounter errors related to imports of Python packages you can try unsetting that environment variable before starting Spyder" then user that are not aware of that have some help right there without taking away control from all users.

I can also imaging another variable like $SPYDER_CLEAN_PYTHONPATH=0/false to disable that which we can set in the module. That would be in line with the standard $PYTHONNOUSERSITE or $PIP_REQUIRE_VIRTUALENV variables.

@mrclary
Copy link
Contributor

mrclary commented Oct 18, 2024

@Flamefire, thank you for the deeper explanation. I think I understand better now what your situation is.

How much control do you have over these module scripts?

Could you do something like the following?
Assuming there is a module located at /apps/modulefiles/mymodule/<ver>.lua, create a .pth file in /apps/modulefiles/mymodule/lib/python3.11/site-packages, containing the necessary paths. Rather than setting the PYTHONPATH variable in the module, set the PYTHONUSERBASE environment variable to /apps/modulefiles/mymodule. After loading this module, running Python will result in sys.path containing all the paths listed in the .pth file, without needing to modify PYTHONPATH.

@Flamefire
Copy link
Contributor Author

@mrclary Thanks for the reply and suggestions

How much control do you have over these module scripts?

More or less full control. BUT:
I was reporting this in the context of EasyBuild that installs the software and creates the modules. So we can control the contents of the modules. However there are many existing modules already installed on multiple HPC clusters so changing those existing modules is virtually impossible.

Rather than setting the PYTHONPATH variable in the module, set the PYTHONUSERBASE environment variable

I didn't know about that variable and it might come in handy.
However for our situation it doesn't help: There are multiple modules with Python packages. At the extreme it could be one module per package but in practice we have something like: 1 for Python (binaries etc.) and some base packages, 1 for commonly used packages from PyPI (~80 packages), 1 for numpy&scipy. And then some on top, e.g. h5py which requires HDF5 to be installed first, so it's sensible to put that in a extra modules in case some sites don't need it (and hence HDF5)

PYTHONPATH has the advantage of being a list. Hence we can prepend paths to this list in the same way we do with $PATH and $LD_LIBRARY_PATH.

So one solution I see is putting our sitecustomize.py into the site-packages folder and using that mechanism to read possible sys.path entries from our custom env variable, so we no longer require $PYTHONPATH for that.
However that doesn't help with existing modules and we only recently introduced an option to use that as the default mechanism instead of $PYTHONPATH. But sites can still opt to use $PYTHONPATH for whatever reasons which would then cause that problem again for them.

And as confusing as it might be if users set up $PYTHONPATH and put modules/files inside those shadowing standard library ones it was very confusing for us to find the (by now 2) places where the PYTHONPATH we purposely set up was removed.
Hence the idea of only showing a warning as a starting point for users that are not aware of that.

@mrclary
Copy link
Contributor

mrclary commented Oct 19, 2024

More or less full control. BUT: I was reporting this in the context of EasyBuild that installs the software and creates the modules. So we can control the contents of the modules. However there are many existing modules already installed on multiple HPC clusters so changing those existing modules is virtually impossible.

Yes, I see. That would require updating the EasyBuild and redeploy/reinstall across your clusters.

Rather than setting the PYTHONPATH variable in the module, set the PYTHONUSERBASE environment variable

I didn't know about that variable and it might come in handy. However for our situation it doesn't help: There are multiple modules with Python packages. At the extreme it could be one module per package but in practice we have something like: 1 for Python (binaries etc.) and some base packages, 1 for commonly used packages from PyPI (~80 packages), 1 for numpy&scipy. And then some on top, e.g. h5py which requires HDF5 to be installed first, so it's sensible to put that in a extra modules in case some sites don't need it (and hence HDF5)

Yes, I see. Python also provides PYTHONHOME for the purpose of using a non-standard location for the python libraries, but only for up to two locations and doesn't support scaling to 3 or more locations. In or order for PYTHONUSERBASE to work, each module would have to add/remove a .pth file to the $PYTHONUSERBASE/lib/python3.11/site-packages upon load/unload, which I don't think is possible.

So one solution I see is putting our sitecustomize.py into the site-packages folder and using that mechanism to read possible sys.path entries from our custom env variable, so we no longer require $PYTHONPATH for that.

I suppose that sitecustomize.py would require logic to determine which modules are loaded and modify sys.path accordingly.

However that doesn't help with existing modules and we only recently introduced an option to use that as the default mechanism instead of $PYTHONPATH. But sites can still opt to use $PYTHONPATH for whatever reasons which would then cause that problem again for them.

Yep. Either the change is made with EasyBuild and redeploy/reinstall across your systems, or changes are made to Spyder and you update Spyder. How is Spyder installed in your situation? Does it have its own module or is it installed in the base Python location?

Python 3.11 introduced the PYTHONSAFEPATH which prevents prepending potentially unsafe paths to sys.path. I don't know to what extent this applies to paths in PYTHONPATH but nevertheless cannot be used since Spyder still supports Python <3.11; but it might be something to look at in the future.

So, in conclusion, I agree that @Flamefire's suggestion to use an environment variable to switch PYTHONPATH treatment in Spyder may be the best solution for now. I propose SPY_ADD_PYTHONPATH. Spyder will not add paths from PYTHONPATH to sys.path upon startup if SPY_ADD_PYTHONPATH is unset or evaluates to False; Spyder will add paths from PYTHONPATH to sys.path upon startup if SPY_ADD_PYTHONPATH evaluates to True. What do @spyder-ide/core-developers think?

@mrclary
Copy link
Contributor

mrclary commented Oct 20, 2024

In or order for PYTHONUSERBASE to work, each module would have to add/remove a .pth file to the $PYTHONUSERBASE/lib/python3.11/site-packages upon load/unload, which I don't think is possible.

Actually, I think this is possible, but still not convenient.

@Flamefire
Copy link
Contributor Author

Flamefire commented Oct 20, 2024

Yes, I see. Python also provides PYTHONHOME for the purpose of using a non-standard location for the python libraries, but only for up to two locations and doesn't support scaling to 3 or more locations. In or order for PYTHONUSERBASE to work, each module would have to add/remove a .pth file to the $PYTHONUSERBASE/lib/python3.11/site-packages upon load/unload, which I don't think is possible.

Yes that is not possible because it is a shared filesystem, so that location needs to be at least specific to each user as they must not interfere with each other. But users can also have multiple independent tasks (from different shell sessions) running concurrently so that location must also be per-user-per-shell-session and still on the global filesystem which is impractical.

I suppose that sitecustomize.py would require logic to determine which modules are loaded and modify sys.path accordingly.

Correct. We have a shell variable that is prepended to by modules and that file reads it and adds the appropriate paths to sys.path

Yep. Either the change is made with EasyBuild and redeploy/reinstall across your systems, or changes are made to Spyder and you update Spyder. How is Spyder installed in your situation? Does it have its own module or is it installed in the base Python location?

Yes it has it's own module. So updating that is much easier, especially as currently it cannot be used as a module, although we have a workaround: We remove the unset-code with a patch. But a general solution avoiding the need for patches going forward would be appreciated.

Python 3.11 introduced the PYTHONSAFEPATH which prevents prepending potentially unsafe paths to sys.path. I don't know to what extent this applies to paths in PYTHONPATH

That seems to avoid the common issue with "wrong" files next to the file you want to execute. Interesting option indeed.

So, in conclusion, I agree that @Flamefire's suggestion to use an environment variable to switch PYTHONPATH treatment in Spyder may be the best solution for now. I propose SPY_ADD_PYTHONPATH. Spyder will not add paths from PYTHONPATH to sys.path upon startup if SPY_ADD_PYTHONPATH is unset or evaluates to False; Spyder will add paths from PYTHONPATH to sys.path upon startup if SPY_ADD_PYTHONPATH evaluates to True. What do @spyder-ide/core-developers think?

Just to be exact: Currently spider unsets $PYTHONPATH and removes those from sys.path after the fact. So maybe an inverse name is better SPY_CLEAN_PYTHONPATH which defaults to True to keep the current behavior but can be set to False/0/Off e.g. in our module. Or default to False and show a warning with a hint that this can be used given that the cleaning is basically a workaround for user mistakes, not something strictly required. So I'd use the more common case by default to have more testing/experience for that.

@mrclary
Copy link
Contributor

mrclary commented Oct 20, 2024

Just to be exact: Currently spider unsets $PYTHONPATH and removes those from sys.path after the fact. So maybe an inverse name is better SPY_CLEAN_PYTHONPATH which defaults to True to keep the current behavior but can be set to False/0/Off e.g. in our module. Or default to False and show a warning with a hint that this can be used given that the cleaning is basically a workaround for user mistakes, not something strictly required. So I'd use the more common case by default to have more testing/experience for that.

CLEAN does describe what's going on better, however the default value of the variable will be unset and it will be anti-idiomatic to have the unset variable and the variable set to truish both perform the same action. The unset variable and setting the variable to falsey should produce the same result. ADD indicates that the paths should be (re)added to Spyder's runtime sys.path.

@Flamefire
Copy link
Contributor Author

CLEAN does describe what's going on better, however the default value of the variable will be unset and it will be anti-idiomatic to have the unset variable and the variable set to truish both perform the same action. The unset variable and setting the variable to falsey should produce the same result. ADD indicates that the paths should be (re)added to Spyder's runtime sys.path.

True, I was just confused of what "ADD" would mean from the perspective of someone using $PYTHONPATH for that already so it sounds like it would add them twice.
Maybe SPY_USE_PYTHONPATH or SPY_ALLOW_PYTHONPATH? Or as in PYTHONNOUSERSITE/PYTHONDONTWRITEBYTECODE it could be something like SPY_NOCLEAN_PYTHONPATH

@ccordoba12
Copy link
Member

So, in conclusion, I agree that @Flamefire's suggestion to use an environment variable to switch PYTHONPATH treatment in Spyder may be the best solution for now. I propose SPY_ADD_PYTHONPATH. Spyder will not add paths from PYTHONPATH to sys.path upon startup if SPY_ADD_PYTHONPATH is unset or evaluates to False; Spyder will add paths from PYTHONPATH to sys.path upon startup if SPY_ADD_PYTHONPATH evaluates to True. What do @spyder-ide/core-developers think?

I agree that that's the best we could do to allow this use case.

@mrclary
Copy link
Contributor

mrclary commented Oct 22, 2024

`` it could be something like SPY_NOCLEAN_PYTHONPATH

I like this. 👍🏼

@mrclary
Copy link
Contributor

mrclary commented Oct 22, 2024

@Flamefire, If I submit a pull request, will you be able to help me test it on your system? I can, of course, test the state of sys.path with these changes locally, but I want to be sure that that everything works in your application environment.

@Flamefire
Copy link
Contributor Author

Yes I can test that as time allows. So there might be a bit of delay but it isn't so urgent that it should be fine.

@dalthviz
Copy link
Member

Note: Could this be related/should supersede #21633 ?

@mrclary
Copy link
Contributor

mrclary commented Oct 24, 2024

Note: Could this be related/should supersede #21633 ?

No; I think it is related to Windows subsystem for Linux. I left a comment.
However, the proposed solution for #22681 (current issue), may provide a workaround for #21633. But I think #21633 requires a dedicated fix.

@mrclary
Copy link
Contributor

mrclary commented Oct 25, 2024

@Flamefire @smax1,
I'll summarize what I think are the primary issues:

  • Spyder removes PYTHONPATH paths from its runtime environment sys.path, causing Spyder to fail to launch due to its inability to find necessary packages in non-standard locations (specified in PYTHONPATH). Spyder-kernels may be affected in the same way.
  • To pass PYTHONPATH on to user processes (IPython Console and others), Spyder only obtains PYTHONPATH from shell startup files (interactive + login) and not from the environment variables in its current launch environment, precluding possible changes from e.g. environment modules.

@Flamefire
Copy link
Contributor Author

@mrclary Correct. And just to be clarify on the 2nd point: When started from a console on Linux PYTHONPATH from the environment is honored except paths matching a certain pattern

And for both we want an option to disable any filtering

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants