Enable offline installations: poetry install --download-only; poetry install --offline #2184

Open
2 tasks done
jabozzo opened this issue Mar 13, 2020 · 15 comments
Labels
kind/feature Feature requests/implementations

@jabozzo

jabozzo commented Mar 13, 2020

  • I have searched the issues of this repo and believe that this is not a duplicate.
  • I have searched the documentation and believe that my question is not covered.

Motivation

After poetry has resolved the dependencies and written the lock file, it proceeds to download and install the dependencies one by one. This can be a problem for systems that are not containerized and simply check out a repo and update their dependencies to update the code. If the internet connection breaks in the middle of the process, neither the old code nor the new code is fully deployed.

Feature Request (simple)

A --download-then-install flag that would make poetry first download all the dependencies and then proceed to install the already downloaded dependencies could solve this use case.

The use would simply be:

poetry install --download-then-install

If any of the required dependencies could not be downloaded, poetry should not install or remove anything and should simply fail. If all required dependencies have been downloaded successfully, the installations / removals / updates are then performed using only the information in the local filesystem.
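
The all-or-nothing behaviour described above can be sketched roughly as follows. This is only an illustration of the requested semantics, not how Poetry works internally: `download_then_install` and the `download` / `install` callables are hypothetical names, not Poetry APIs.

```python
from typing import Callable, Iterable


def download_then_install(
    packages: Iterable[str],
    download: Callable[[str], str],
    install: Callable[[str], None],
) -> list[str]:
    """Download every package first; install only if all downloads succeed.

    `download` (hypothetical) returns a local path for a package name and
    raises on failure. `install` (hypothetical) installs from a local path.
    """
    # Phase 1: fetch everything. An exception here aborts the whole run
    # before the environment has been touched.
    local_paths = [download(name) for name in packages]

    # Phase 2: install using only files that are already on disk.
    for path in local_paths:
        install(path)
    return local_paths
```

If the connection drops while fetching the second of three packages, the exception propagates out of phase 1 and `install` is never called, so the existing environment stays as it was.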

Feature Request (robust)

A combination of flags to separate both processes could also work, such as:

 poetry install --download-only  # Only downloads the dependencies
 poetry install --offline        # Installs dependencies only if all dependencies are in the local cache.

These commands would be used in succession, and the flags are mutually exclusive. poetry install --offline should first look at the contents of the poetry.lock file to make sure the packages are stored in the local filesystem before performing the installation / removal / update.
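
A minimal sketch of the pre-flight check that --offline would need, under some loud assumptions: the lock file has already been parsed into a dict (for example with `tomllib.loads(Path("poetry.lock").read_text())` on Python 3.11+), each `[[package]]` entry carries a `files` list as in recent lock file versions, and the downloads sit in one flat directory. The function name is hypothetical and this is not Poetry's implementation.

```python
from pathlib import Path


def missing_from_local_dir(lock: dict, package_dir: Path) -> list[str]:
    """Return "name==version" for locked packages with no local artifact.

    Assumes `lock` is the parsed poetry.lock and that each package lists
    its distribution files under "files" (an assumption about the format).
    """
    available = (
        {path.name for path in package_dir.iterdir()}
        if package_dir.is_dir()
        else set()
    )
    missing = []
    for package in lock.get("package", []):
        candidates = {entry["file"] for entry in package.get("files", [])}
        # One matching wheel or sdist is enough for this simple check.
        if not candidates & available:
            missing.append(f"{package['name']}=={package['version']}")
    return missing
```

An offline install would refuse to touch the environment unless this list is empty. Note that path and git dependencies have no `files` entries, so this simple check would report them as missing; a real implementation would need to treat them separately.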

@finswimmer finswimmer added the kind/feature Feature requests/implementations label Mar 14, 2020
@roniemartinez

@finswimmer I don't think this is feature-worthy:

  1. Poetry is already caching packages so --download-then-install is redundant - https://github.com/python-poetry/poetry/blob/master/docs/docs/configuration.md#cache-dir-string
  2. If you are in a container, you can use --volume (if docker) or just save the caches somewhere else. CI services like Travis and AppVeyor do this by compressing and decompressing the cache directories.

@jabozzo
Author

jabozzo commented Mar 19, 2020

@roniemartinez

I think you misunderstood the intention of --download-then-install. For example, if poetry is installing sqlalchemy, numpy and six, right now poetry does:

Download sqlalchemy
Install sqlalchemy
Download numpy
Install numpy
Download six
Install six

With the switch, poetry would instead do:

Download sqlalchemy
Download numpy
Download six
Install sqlalchemy
Install numpy
Install six

For 2, you are right. For my specific use case I'm not in a containerized environment but I can do something similar:

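# Download packages (./packages is just an example directory)
poetry export -f requirements.txt > requirements.txt
pip download --timeout 120 -r requirements.txt -d ./packages # Added benefit of specifying timeout for bad connections.

# Install later, using only the downloaded files
pip install --no-index --find-links ./packages -r requirements.txt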

It does not work for local repositories, though, since the lines in the requirements file end up being -e ./path and pip fails.

@TheFriendlyCoder

I think another way to look at this would be as an optimization request. As stated, Poetry does appear to download and cache packages in the user's home folder already, but for some reason it doesn't seem to re-use the files in this cache when re-installing the packages, or at the very least it is doing some additional network operation to look up package data prior to installing from the cache. When installing to a workspace that has a lock file available, Poetry should be able to see the full list of dependencies and the versions of those dependencies that are required. It should then be able to do a direct lookup in the local cache to see if those exact packages are available. If they are, the tool should proceed to install the packages directly with no additional network IO needed. IMO this should be the default behavior of the tool and should not need any additional flags or command line options.

I will admit that I am making some assumptions here, because I have not dug deeply into the source for Poetry. The reason I believe the tool is still doing some sort of network IO during such install operations is that network IO is super slow for me while working over a corporate VPN: individual package lookups can take many seconds or more, so pulling package metadata over the wire takes a considerable amount of time. If the tool were not performing any network IO in this case, I would expect installs from the local cache to be extremely fast, but they take pretty much the same amount of time as a fresh install after the cache has been fully purged / deleted.

To put it into perspective as to the performance concerns I have here, one of my smaller projects has about 80 dependencies total - transitive and direct - and it takes nearly 90 seconds to generate a lock file for that dependency tree. And that does not include the time it takes to download the actual package files btw. So any efforts that can be made to optimize the network bandwidth here would be of great benefit to users in my situation.

@TheFriendlyCoder

I neglected to mention in my last comment that it takes 2-3 minutes to install the dependencies for the sample project I mentioned, even after the poetry.lock file has been generated and the local poetry package cache has been fully populated. If we could shave off 90+ seconds of that by eliminating superfluous network lookups that would be an amazing time saver.

@dcendents

+1

I come from a java and maven background where

  1. Code is compiled
  2. main (production) code cannot use test dependencies (dev-dependencies) because they are on different classpaths and reason 1

I'm relatively new to python but I haven't found a way to ensure that except by removing the dev-dependencies before running pylint.

So I'd love to install all dependencies, run my unit tests, remove dev-dependencies, run pylint (or the other way around), but doing this means adding minutes to the build currently.

The best I can come up with is maintaining 2 different venvs and playing with the POETRY_VIRTUALENVS_PATH value between commands.

@ewjmulder

+1

Another use case: we have a company-internal package repo that is configured in the pyproject.toml. But we build our Docker images in google cloud builder, which does not have access to that repo. So we'd like to download all packages first, then upload them to google cloud builder and install them there from the downloads. With pip this is possible using the pip download command. So in that case we need the process to be split into 2 independent steps (the robust suggestion from OP).

@ivallesp

ivallesp commented May 5, 2021

+1 plenty of use cases for protected, non-internet connected environments

@nikolaikopernik

nikolaikopernik commented Oct 9, 2021

Hey @python-poetry
Can we think about a solution here? What I have in mind is 2 changes:

$ poetry download 
# downloads all the dependencies from the `lock` file (without any modification) to a local 
# folder inside the project (for instance `.locked` - like a project-level cache). If it is 
# possible to copy them from the local cache - they will be just copied into this local folder 
# without downloading. 

$ poetry install --offline  
# will try to install all the dependencies from the lock file without any external connections 
# (using local project cache or machine cache only). If there are not enough libs available 
# locally the installation will fail and won't install anything. 

@roniemartinez It's not enough to have a local cache per machine, because then the dependencies of different projects are all mixed together. If I want to copy my project with just its own dependencies, I don't want to copy my entire cache folder. We can still use the machine cache, though, to avoid downloading what has already been downloaded.

Another thing I miss in poetry right now is offline installation. This would allow me to ship my project with all its dependencies anywhere and create the proper environment there.

So I think these 2 changes can bring some very useful features.

@bsvedin

bsvedin commented Jan 20, 2022

npm and yarn both have --prefer-offline options, which only hit the network if a package is not found in the local cache.
Those were great at speeding up our frontend CI builds.
I would very much like this issue resolution to provide a similar workflow. It would greatly reduce my CI build times.
It would be great if the --prefer-offline option worked for both locking and installing.

I don't really care what the option is called --prefer-offline vs --download and --offline. Whatever you want. So long as it speeds up my life
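
For illustration, the --prefer-offline lookup described above boils down to something like the sketch below. The function and the `download` callable are hypothetical, and the flat cache layout is an assumption; Poetry's real cache is organised differently.

```python
from pathlib import Path
from typing import Callable


def fetch_prefer_offline(
    filename: str,
    cache_dir: Path,
    download: Callable[[str, Path], None],
) -> tuple[Path, bool]:
    """Return (path, downloaded) for `filename`, using the network only on a miss.

    `download` (hypothetical) writes the artifact to the given destination.
    """
    cached = cache_dir / filename
    if cached.is_file():
        # Cache hit: no network access at all.
        return cached, False

    # Cache miss: fetch once and keep the file for the next run.
    cache_dir.mkdir(parents=True, exist_ok=True)
    download(filename, cached)
    return cached, True
```

On a warm cache every call returns immediately without invoking `download`, which is where the CI time savings would come from.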

@songololo

songololo commented Mar 24, 2022

Just a comment that it would be nice if a --download flag could be combined with --platform and --only-binary flags to simplify transfer and installation to offline machines.

e.g. in pip this would look something like this:

pip download <packages> --platform=manylinux2010_x86_64 --only-binary=:all: --python-version=3.9 --implementation=cp

@Kaiser1989

Kaiser1989 commented Jun 9, 2022

Currently I'm doing this:

poetry export > requirements.txt
poetry run pip download -r requirements.txt -d <lib_folder>
poetry run pip install -r requirements.txt --no-index --find-links <lib_folder>

With this, packages are only installed within the correct environment.

But with this, I'm not able to run pytest; I get a ModuleNotFoundError. It seems that poetry has some additional environment?
I can run the tests by calling pytest through poetry's python instance:

poetry run python -m pytest -s

But I'm still having some issues, as I have relative dependencies within my poetry project. These relative dependencies are not resolved by pip and the requirements.txt.

@joaoe

joaoe commented Sep 9, 2022

Howdy.

I'm trying to just add a local folder as an installation package

poetry add --group dev --editable ./tests

Our project is big and the tests folder has a lot of test files (unit, integration, system), and I want them to be available as a python module.

Yet, that single command tells poetry to try to update every single package in my project and it just takes a lot of time.
Could you pretty please add a poetry add --offline which does not connect online anywhere and just uses what is available in the cache? Thank you

@RexBarker

RexBarker commented Jun 7, 2023

Just wondering what the conclusion is here. It seems this case was silently buried with no definite solution (as far as I can tell from the related issue trackers).

I was looking for a simple solution to install offline wheel files, similar to:
python -m pip install --no-index --no-deps wheelhouse/*.whl
...but using poetry directly.

@fre-sch

fre-sch commented Jun 28, 2023

From the perspective of continuous integration and continuous deployments, this is also a desirable feature, I'd even say an expectable feature from a dependency manager.

In a CI/CD environment I want my build process to result in an artifact that's as atomically deployable as possible. To me this means including the resulting wheel file, but also all dependencies in the exact versions that were used during this process. Aside from ensuring a 1:1 behavior/feature match between CI/CD and deployed status, this also helps prevent incomplete deployments due to networking issues. It's possible that the package repositories become unavailable between the CI/CD process run and the deployment, either due to service outages or due to deployment into an environment without public network access. Additionally, collecting the dependencies means rollbacks to previous versions (of the build) are largely atomic when those builds contain the dependencies as they were at that time.

While extremely uncommon for public repositories such as PyPI, it's technically still possible for package contents in the repository to change regardless of versioning schemes. Hashing and checking those hashes only helps in detecting such situations; it doesn't help with content changing in between download, installation of dependencies and having deployable artifacts. Again, this is much less of an issue with (mostly) immediate build -> deploy processes, but very relevant for rollbacks to previous build results.
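
As a rough sketch of the detection side mentioned above: poetry.lock records file hashes in the form `sha256:<hex digest>`, so a collected artifact can be re-checked before it is installed or rolled back to. The helper name below is hypothetical and this is not Poetry's own verification code.

```python
import hashlib
from pathlib import Path


def matches_locked_hash(artifact: Path, locked_hash: str) -> bool:
    """Check a local file against a lock-style hash such as "sha256:ab12...".

    The "<algorithm>:<hex digest>" format is taken from poetry.lock; the
    helper itself is only an illustration.
    """
    algorithm, _, expected = locked_hash.partition(":")
    digest = hashlib.new(algorithm)
    with artifact.open("rb") as handle:
        # Read in chunks so large wheels do not have to fit in memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

A mismatch only tells you that the bundled file differs from what was locked; having the original artifacts stored alongside the build is what makes the rollback itself possible.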

Even if users here have provided a workaround, it feels like a project dependency management tool should provide this functionality.

@samhooke

Download public and private packages for offline install

To expand upon the poetry export workarounds given above, for private repositories I found it helpful to add the --with-credentials option:

# Download all packages (public and private)
poetry export --with-credentials > requirements.txt
poetry run pip download -r requirements.txt -d packages/

Using --with-credentials allows the subsequent pip download command to authenticate with private repositories. This works even if you have two sources defined (e.g. a source of public PyPI with priority = "default" and source of a private repository with priority = "supplemental").

Beware that this does mean the generated requirements.txt contains sensitive information, though if you treat the file as ephemeral then it is less of a risk.

Install packages offline

To perform the offline install, I prefer to run a local PyPI server on the offline machine so that I can use poetry install rather than poetry run pip install ....

First install pypiserver on the offline machine, then copy over the packages directory created in the previous section, and from within that directory run the local PyPI server:

# Run the local PyPI server
pypi-server run .

It's then necessary to modify the pyproject.toml, by changing the url in all your sources to point at the local PyPI server (typically http://localhost:8080/simple/). If done correctly, you can use Poetry for the install as normal, i.e.:

# Install as normal, but from the local PyPI server
poetry install

This will emit Warning: poetry.lock is not consistent with pyproject.toml., since the content-hash at the bottom of the poetry.lock file will no longer match the hash of the pyproject.toml file, but from my current understanding it's okay to ignore the warning for this specific case. For more details I've put further notes here.

@Secrus Secrus self-assigned this Oct 13, 2024