Add option to extend PATH with additional console scripts #1097

Closed
jneuff opened this issue Nov 6, 2020 · 8 comments · Fixed by #1153

@jneuff

jneuff commented Nov 6, 2020

Background
I deploy apache airflow as a PEX file with -c airflow. Now, when I run airflow.pex webserver I get FileNotFoundError: [Errno 2] No such file or directory: 'gunicorn': 'gunicorn'. That's because the airflow webserver command calls gunicorn to start the webserver. As gunicorn is part of the PEX, I can call it with PEX_SCRIPT=gunicorn airflow.pex, but it is not in the PATH.

Workaround
My current solution is to have a script called gunicorn:

#!/bin/sh

PEX_SCRIPT=gunicorn airflow.pex "$@"

Adding this script to PATH before running airflow webserver works.

Proposal
It would come in handy to have an option for PEX to add additional console scripts contained in the PEX to the PATH. Creating the PEX file could look like this:

pex -r requirements.txt -c airflow --extend-path-with-scripts gunicorn -o airflow.pex

And of course a related runtime environment variable would make sense too.

What's your opinion on this? I'd be happy to contribute this feature!

@jsirois
Member

jsirois commented Nov 6, 2020

This makes sense.

To be clear though, on the surface it's not as trivial as it may sound. Since a PEX is a zipfile, you can't add items within it to the PATH directly. You need to extract those items and add the extracted locations to the PATH (and, in some cases, to sys.path). It turns out PEX files already extract all contained distributions to a location under ~/.pex at runtime (if not run and extracted previously) though, so it's likely that most of the heavy lifting needed to do this is already in place.
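To illustrate the extract-then-extend-PATH step in isolation, here is a minimal Python sketch. It is not Pex's implementation: the helper names `extract_script` and `prepend_to_path` are made up for this example, and it works on any plain zip archive rather than a real PEX.

```python
import os
import stat
import zipfile


def extract_script(zip_path, member, dest_dir):
    """Extract one member of a zip archive and mark it executable.

    Hypothetical helper for illustration; not part of Pex.
    """
    with zipfile.ZipFile(zip_path) as zf:
        extracted = zf.extract(member, dest_dir)
    # Zip extraction does not restore the executable bit, so set it explicitly.
    mode = os.stat(extracted).st_mode
    os.chmod(extracted, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return extracted


def prepend_to_path(directory, env):
    """Return a copy of env with directory placed first on PATH."""
    updated = dict(env)
    existing = updated.get("PATH", "")
    updated["PATH"] = directory + os.pathsep + existing if existing else directory
    return updated
```

A child process started with the returned environment would then resolve the extracted script by name, which is the behavior the issue asks for.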

Digging deeper, more problems surface. Pretend this is all implemented and consider a typical gunicorn script that will now be on the PATH:

$ cat /home/jsirois/.pex/installed_wheels/5b9580f6c90af9b2d97488e3d17143cca0b6de2a/gunicorn-20.0.4-py2.py3-none-any.whl/bin/gunicorn 
#!/usr/bin/python3.8
# -*- coding: utf-8 -*-
import re
import sys
from gunicorn.app.wsgiapp import run
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(run())

That shebang will be a problem for a multiplatform PEX. Say the PEX was built with --interpreter-constraint=>=3.6,<4 or --python=python3.8 --python=python3.9. Since gunicorn is a universal distribution, the various pythons the PEX is built for will all use the same gunicorn-20.0.4-py2.py3-none-any.whl. As such, one of those interpreters will be the one to install the gunicorn distribution into the PEX file and that interpreter will set the script shebang. If that interpreter was Python 3.8, as in this example, and the PEX file is shipped to a machine with only Python 3.9, the script's shebang points at an interpreter that does not exist and things won't work.
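The failure mode can be checked mechanically. The following Python sketch reads a script's shebang and reports whether the interpreter it names exists on the current machine. The function names are hypothetical, and the check is deliberately simplified: it ignores `/usr/bin/env` indirection and interpreter paths containing spaces.

```python
import os


def read_shebang_interpreter(script_path):
    """Return the interpreter path named in the shebang, or None if absent."""
    with open(script_path, "rb") as fp:
        first_line = fp.readline()
    if not first_line.startswith(b"#!"):
        return None
    parts = first_line[2:].strip().split()
    return parts[0].decode("utf-8") if parts else None


def shebang_is_runnable(script_path):
    """True only if the shebang names an existing, executable file."""
    interpreter = read_shebang_interpreter(script_path)
    return (
        interpreter is not None
        and os.path.isfile(interpreter)
        and os.access(interpreter, os.X_OK)
    )
```

For the gunicorn script above, `shebang_is_runnable` would return False on a host that lacks /usr/bin/python3.8, even if a compatible Python 3.9 is installed.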

@jneuff
Author

jneuff commented Nov 10, 2020

Thanks for pointing out these difficulties. My first idea was to just add the existing scripts to PATH, but as you showed, this will not work for multiplatform PEX.

Another approach would be to expose the scripts in a central bin directory somewhere in the unzipped PEX. Either in a similar fashion to my workaround, or directly using the mechanics PEX uses to run a console script. I'll have to look into the details of what happens when executing a PEX file.

Regarding the extraction: Are the contents of a PEX file always exposed under $PEX_ROOT/unzipped_pexes?

@jsirois
Member

jsirois commented Nov 12, 2020

> Another approach would be to expose the scripts in a central bin directory somewhere in the unzipped PEX. Either in a similar fashion to my workaround, or directly using the mechanics PEX uses to run a console script. I'll have to look into the details of what happens when executing a PEX file.

You may have missed the significance of the shebang in scripts. Those shebangs are pinned to one interpreter - python3.8 in the gunicorn example above. That's fine if you've built a single-interpreter PEX but many folks use PEX to produce multi-interpreter / multi-platform PEXes. In that case the script will only work for one of the targeted interpreters. If the PEX is shipped to a machine without that interpreter but with another compatible interpreter the PEX will work up until user code tries to run that script at which point it will fail.

> Regarding the extraction: Are the contents of a PEX file always exposed under $PEX_ROOT/unzipped_pexes?

No, only PEX files created with --unzip which is not the default. For those other pexes, their installed wheels are always unzipped prior to execution in $PEX_ROOT/installed_wheels; thus the /home/jsirois/.pex/installed_wheels/5b9580f6c90af9b2d97488e3d17143cca0b6de2a/gunicorn-20.0.4-py2.py3-none-any.whl/bin/gunicorn example above.

@jneuff
Author

jneuff commented Nov 24, 2020

> You may have missed the significance of the shebang in scripts. Those shebangs are pinned to one interpreter - python3.8 in the gunicorn example above. That's fine if you've built a single-interpreter PEX but many folks use PEX to produce multi-interpreter / multi-platform PEXes. In that case the script will only work for one of the targeted interpreters. If the PEX is shipped to a machine without that interpreter but with another compatible interpreter the PEX will work up until user code tries to run that script at which point it will fail.

So basically, if I understand this correctly, the -c feature is broken for multiplatform PEX. I thought there'd be some mechanics to deal with the shebangs (but then of course, we could just use these mechanics here). In my opinion, fixing the -c behavior for multiplatform PEX is orthogonal to this issue at hand.

@jsirois
Member

jsirois commented Nov 24, 2020

You indeed don't understand correctly.

When you build a PEX file using -c, that script is validated to exist and then the script name is stored in the PEX file (in PEX-INFO). When the PEX file is executed, it first reads PEX-INFO and learns it should hand control to a script. It then finds that script and executes it by reading the script contents and then executing that via effectively python eval:
https://github.com/pantsbuild/pex/blob/16a4b3a4980008fe47a509afc3b24381a6649a95/pex/pex.py#L573-L605

N.B.: Since the script code is directly executed in the runtime interpreter, the shebang is discarded since it's just a comment at the top of the python script file.

The key difference here is Pex executes the script from -c in-process whereas it sounds like you want to execute additional scripts via user or 3rdparty code via subprocess (i.e.: via the os which requires the shebang).
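The in-process case can be sketched in a few lines of Python. This is an illustration of the general technique (compile the script text and exec it in the running interpreter), not a copy of the Pex code linked above; `run_in_process` and `SCRIPT_SOURCE` are names invented for the example.

```python
import sys

# A script whose shebang names an interpreter that does not exist.
SCRIPT_SOURCE = """#!/nonexistent/python9.9
import sys
result = "ran under %s" % (sys.executable,)
"""


def run_in_process(source, name="<script>"):
    """Compile and exec script source in the current interpreter.

    The shebang line is an ordinary Python comment, so it has no effect here.
    """
    namespace = {"__name__": "__main__"}
    exec(compile(source, name, "exec"), namespace)
    return namespace
```

Calling `run_in_process(SCRIPT_SOURCE)` succeeds despite the bogus shebang, because the operating system never looks at it. Launching the same file by name through a subprocess would instead ask the OS to honor the shebang, which is where the multi-interpreter problem appears.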

@jneuff
Author

jneuff commented Nov 24, 2020

Thanks for the clarification. I just dug into the code and found the two functions you quoted. So PEX makes sure all scripts work as expected.

My proposal from above is to execute the designated additional scripts either by wrapping them in a shell script that actually calls the PEX or directly using PEX mechanics (maybe similar to the __main__.py found in PEX files).

As far as I know, executables on the PATH must be files. Thus, there is no way around representing the scripts as such. Now, I think rendering a shell script for each additional script that just calls PEX is a simple and good solution.

The question is, where to put these scripts. For additional scripts that are specified at build time, we could have a bin directory under $PEX_ROOT/bin/<PEX hash>. That contains files like these:

#!/bin/sh

PEX_SCRIPT=gunicorn exec /path/to/my.pex "$@"

Then we extend the PATH with this directory.

But what happens if you specify additional scripts at runtime, like PEX_ADDITIONAL_SCRIPTS=gunicorn? We cannot just put their wrappers into $PEX_ROOT/bin/<PEX hash>, because next time you might run the PEX without PEX_ADDITIONAL_SCRIPTS. We probably have to ensure the correct state of $PEX_ROOT/bin/<PEX hash> on every call; that would solve this problem.

So, to make this fly, I need to provide for:

  • Build-time arguments to set additional scripts.
  • Storage of this information in PEX-INFO.
  • Run-time environment variables.
  • Combination of run-time and build-time information.
  • Creation (or restoration) of the $PEX_ROOT/bin/<PEX hash> directory.
  • Extension of PATH.
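A rough Python sketch of the last three items in the list above, under stated assumptions: `sync_wrappers` and `extend_path` are hypothetical names, the wrapper template mirrors the shell script proposed earlier in this comment, and nothing here reflects an existing Pex API. The wrapper directory is brought into the wanted state on every call, so wrappers left over from an earlier run with different runtime scripts are removed.

```python
import os
import shlex

# Mirrors the proposed wrapper; "$@" is quoted so arguments survive intact.
WRAPPER_TEMPLATE = '#!/bin/sh\n\nPEX_SCRIPT={script} exec {pex} "$@"\n'


def sync_wrappers(bin_dir, pex_path, build_time_scripts, runtime_scripts=()):
    """Make bin_dir contain exactly one wrapper per requested script."""
    wanted = set(build_time_scripts) | set(runtime_scripts)
    os.makedirs(bin_dir, exist_ok=True)
    # Drop wrappers that were requested by a previous run but not this one.
    for name in os.listdir(bin_dir):
        if name not in wanted:
            os.remove(os.path.join(bin_dir, name))
    for name in sorted(wanted):
        wrapper = os.path.join(bin_dir, name)
        with open(wrapper, "w") as fp:
            fp.write(
                WRAPPER_TEMPLATE.format(
                    script=shlex.quote(name), pex=shlex.quote(pex_path)
                )
            )
        os.chmod(wrapper, 0o755)
    return sorted(wanted)


def extend_path(bin_dir, current_path):
    """Return a PATH value with bin_dir placed first."""
    return bin_dir + os.pathsep + current_path if current_path else bin_dir
```

Rewriting the directory on each run trades a little startup cost for never exposing a stale wrapper; concurrent runs of the same PEX with different runtime scripts would still race on the shared directory, which this sketch does not address.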

And of course we could extend this feature to additional -m entry points as well.

Does this sound reasonable to you?

@jneuff
Author

jneuff commented Dec 21, 2020

@jsirois From my point of view the new pex-tools venv feature covers the requirements outlined in this issue. Sorry, I couldn't give more feedback on that PR. I see you kept this issue on the release docket - do you think this feature is still relevant now? Or shall we close this issue?

@jsirois
Member

jsirois commented Dec 21, 2020

I don't consider the --include-tools / PEX_TOOLS=1 my.pex venv ... approach a completed solution, so I left this open. I have a branch implementing a new --venv [prepend|append] build-time flag, similar to the existing --unzip build-time flag, that will package a PEX file such that when it runs it will automatically create itself a venv (under ~/.pex/venvs/...) and re-execute from there. That will close #962 and this issue as well.

@jsirois jsirois self-assigned this Dec 21, 2020
jsirois added a commit that referenced this issue Dec 24, 2020
The new --venv execution mode builds a PEX file that includes pex.tools
and extracts itself into a venv under PEX_ROOT upon 1st execution or any
execution that might select a different interpreter than the default.

In order to speed up the local build and execute case, --seed mode is
added to seed the PEX_ROOT caches that will be used at runtime. This is
important for --venv mode since venv seeding depends on the selected
interpreter and one is already selected during the PEX file build
process.

Fixes #962
Fixes #1097
Fixes #1115