virtualenv 20: is the symlink hack really worth it? #1563

Closed
asottile opened this issue Feb 11, 2020 · 14 comments · Fixed by #1578

@asottile
Contributor

I did some timing and it seems like the trouble it causes is not really worth it -- at the very least I'd like an option which copies instead of symlinks

Here's some timing I did to try to gauge the differences -- since I couldn't find an option for this, I toggled this line to if False to get my "copy" data:

with symlinks

my platform for this example is relatively low powered, a 2015 MBP

$ rm -rf vvv; time virtualenv vvv

real	0m0.128s
user	0m0.107s
sys	0m0.023s
$ rm -rf vvv; time virtualenv vvv

real	0m0.128s
user	0m0.118s
sys	0m0.012s
$ rm -rf vvv; time virtualenv vvv

real	0m0.123s
user	0m0.121s
sys	0m0.004s
$ rm -rf vvv; time virtualenv vvv

real	0m0.119s
user	0m0.117s
sys	0m0.004s
$ rm -rf vvv; time virtualenv vvv

real	0m0.127s
user	0m0.109s
sys	0m0.020s

disk usage:

$ du -hs vvv
128K	vvv

problems this can cause:

$ # copied to same path on other machine
$ ./vvv/bin/python -c 'import setuptools'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'setuptools'
$ ./vvv/bin/pip --help
Traceback (most recent call last):
  File "./vvv/bin/pip", line 6, in <module>
    from pip._internal.cli.main import main
ModuleNotFoundError: No module named 'pip'

with copies

$ rm -rf vvv; time virtualenv vvv

real	0m0.179s
user	0m0.155s
sys	0m0.050s
$ rm -rf vvv; time virtualenv vvv

real	0m0.185s
user	0m0.158s
sys	0m0.050s
$ rm -rf vvv; time virtualenv vvv

real	0m0.183s
user	0m0.160s
sys	0m0.048s
$ rm -rf vvv; time virtualenv vvv

real	0m0.172s
user	0m0.162s
sys	0m0.035s
$ rm -rf vvv; time virtualenv vvv

real	0m0.181s
user	0m0.142s
sys	0m0.065s
$ du -hs vvv
7.5M	vvv

trade off

so we're looking at ~60ms of time overhead -- which (imo) isn't that much -- the disk usage is another concern but we're still taking that usage one way or another
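As a rough check on the ~60ms figure, the "real" times from the two sets of runs above can be averaged. This is only a sketch over the five samples per mode listed in this comment, not a rigorous benchmark:

```python
from statistics import mean

# "real" wall-clock times in seconds, copied from the runs above
symlink_runs = [0.128, 0.128, 0.123, 0.119, 0.127]
copy_runs = [0.179, 0.185, 0.183, 0.172, 0.181]

symlink_avg_ms = mean(symlink_runs) * 1000
copy_avg_ms = mean(copy_runs) * 1000
overhead_ms = copy_avg_ms - symlink_avg_ms

print(f"symlinks: {symlink_avg_ms:.0f}ms")
print(f"copies:   {copy_avg_ms:.0f}ms")
print(f"overhead: {overhead_ms:.0f}ms")
```

On these samples the averages come out to about 125ms and 180ms, so the overhead is closer to 55ms, which is in line with the rounded ~60ms above.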

other considerations

hardlinks would be another consideration -- it would alleviate the problems I have with symlinks (caches, using virtualenv as a deployment mechanism, etc.) -- I'd have to do some implementation work to verify that case
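To make the hardlink idea concrete, here is a minimal sketch, assuming a hypothetical helper named link_or_copy (it is not part of virtualenv). It tries os.link first and falls back to a real copy when linking fails, for example because the cache and the virtualenv live on different filesystems:

```python
import os
import shutil
import tempfile


def link_or_copy(src, dst):
    """Hard-link src to dst, falling back to a copy (hypothetical helper)."""
    try:
        os.link(src, dst)
        return "hardlink"
    except OSError:
        # e.g. src and dst are on different filesystems, or links are unsupported
        shutil.copy2(src, dst)
        return "copy"


with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "cache.py")  # stands in for a file in the app-data cache
    venv_file = os.path.join(tmp, "venv.py")    # stands in for a file in site-packages
    with open(cache_file, "w") as f:
        f.write("print('hello')\n")

    how = link_or_copy(cache_file, venv_file)

    # Unlike a symlink, the venv file keeps working after the cache entry is removed.
    os.remove(cache_file)
    with open(venv_file) as f:
        survived = f.read()

print(how, repr(survived))
```

One caveat with this approach: a hard link shares its contents with the cache entry, so anything that edits the file in place inside one virtualenv would also change it for every other virtualenv linked to the same file.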

@nsoranzo
Contributor

I'm experiencing the ModuleNotFoundError: No module named 'pip' error mentioned above in https://travis-ci.org/galaxyproject/galaxy/jobs/648435619 , hoping for a solution.

@gaborbernat
Contributor

@nsoranzo your issue is separate from the topic of this discussion, please open a new issue for that. @asottile I'll address the point you raised later, once no more bugfixes are needed; but note you can use --copies to get the copy behaviour (for both python files, and the app-data part). That being said, if the app-data folder is causing you issues and you don't care about performance you really should be using the pip seeder.

@asottile
Contributor Author

asottile commented Feb 11, 2020

I do care about performance, the copies approach is ~180ms whereas the pip approach is >3s

I don't want copies of the python executable

@asottile
Contributor Author

also --copies does not do copies:

$ virtualenv vvv --copies
$ tree vvv
vvv
├── bin
│   ├── activate
│   ├── activate.csh
│   ├── activate.fish
│   ├── activate.ps1
│   ├── activate_this.py
│   ├── activate.xsh
│   ├── easy_install
│   ├── easy_install3
│   ├── easy_install-3.6
│   ├── pip
│   ├── pip3
│   ├── pip-3.6
│   ├── python
│   ├── python3
│   ├── python3.6
│   ├── wheel
│   ├── wheel3
│   └── wheel-3.6
├── lib
│   └── python3.6
│       └── site-packages
│           ├── easy_install.py -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/setuptools-45.2.0-py3-none-any/easy_install.py
│           ├── pip -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/pip-20.0.2-py2.py3-none-any/pip
│           ├── pip-20.0.2.dist-info -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/pip-20.0.2-py2.py3-none-any/pip-20.0.2.dist-info
│           ├── pip-20.0.2.dist-info.virtualenv -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/pip-20.0.2-py2.py3-none-any/pip-20.0.2.dist-info.virtualenv
│           ├── pkg_resources -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/setuptools-45.2.0-py3-none-any/pkg_resources
│           ├── setuptools -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/setuptools-45.2.0-py3-none-any/setuptools
│           ├── setuptools-45.2.0.dist-info -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/setuptools-45.2.0-py3-none-any/setuptools-45.2.0.dist-info
│           ├── setuptools-45.2.0.dist-info.virtualenv -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/setuptools-45.2.0-py3-none-any/setuptools-45.2.0.dist-info.virtualenv
│           ├── wheel -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/wheel-0.34.2-py2.py3-none-any/wheel
│           ├── wheel-0.34.2.dist-info -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/wheel-0.34.2-py2.py3-none-any/wheel-0.34.2.dist-info
│           └── wheel-0.34.2.dist-info.virtualenv -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/wheel-0.34.2-py2.py3-none-any/wheel-0.34.2.dist-info.virtualenv
└── pyvenv.cfg

11 directories, 23 files
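For anyone who wants to check an existing virtualenv for this, here is a minimal sketch that lists symlinks resolving outside the environment directory. The function name external_symlinks is made up for illustration, and the sketch assumes a POSIX system where creating symlinks is allowed:

```python
import os
import tempfile


def external_symlinks(venv_dir):
    """Yield (link, target) for symlinks under venv_dir that resolve outside it."""
    root = os.path.realpath(venv_dir)
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk lists symlinked directories in dirnames without descending into them
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                target = os.path.realpath(path)
                if os.path.commonpath([root, target]) != root:
                    yield path, target


with tempfile.TemporaryDirectory() as tmp:
    tmp = os.path.realpath(tmp)
    cache = os.path.join(tmp, "app-data", "pip")            # stand-in for the seed cache
    site = os.path.join(tmp, "vvv", "lib", "site-packages")
    os.makedirs(cache)
    os.makedirs(site)
    os.symlink(cache, os.path.join(site, "pip"))

    found = [
        (os.path.relpath(link, tmp), os.path.relpath(target, tmp))
        for link, target in external_symlinks(os.path.join(tmp, "vvv"))
    ]

print(found)
```

Run against a real environment, this would also report a symlinked bin/python, since that points at the base interpreter outside the virtualenv.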

@gaborbernat
Contributor

Yeah, fixed that as part of #1575; I haven't released it yet because I first want to get #1571 in with it, and that is failing at the moment.

@gaborbernat
Contributor

I think the symlink approach is very much worth it; especially on Windows with some anti-virus active. That being said, should it be the default option? Maybe not.

@asottile
Contributor Author

I guess my thought is that 60ms is not worth a sacrifice of correctness for sane defaults (whereas 3s+ is of course unacceptable)

In addition to breaking caching of virtualenvs in CI, I've also found it breaks our deployment system at lyft (which produces venvs at a well known location, then tars them up to deploy them)
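To illustrate why tarring up such a virtualenv breaks, here is a small sketch using Python's tarfile module. It is not how the deployment system above works; the directory layout and the member_types helper are invented for the example. By default the archive stores the symlink itself, so the target is missing on the receiving machine; with dereference=True the linked contents are archived instead:

```python
import io
import os
import tarfile
import tempfile


def member_types(venv_dir, dereference):
    """Tar venv_dir in memory and report how each member was stored."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", dereference=dereference) as tar:
        tar.add(venv_dir, arcname="vvv")
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return {
            m.name: "symlink" if m.issym() else "dir" if m.isdir() else "file"
            for m in tar.getmembers()
        }


with tempfile.TemporaryDirectory() as tmp:
    cache_pkg = os.path.join(tmp, "app-data", "pip")  # stand-in for the seed cache
    venv = os.path.join(tmp, "vvv")
    os.makedirs(cache_pkg)
    os.makedirs(venv)
    with open(os.path.join(cache_pkg, "__init__.py"), "w") as f:
        f.write("")
    os.symlink(cache_pkg, os.path.join(venv, "pip"))

    as_links = member_types(venv, dereference=False)
    as_copies = member_types(venv, dereference=True)

print(as_links)   # vvv/pip is stored as a symlink to a path outside the archive
print(as_copies)  # vvv/pip is stored as a directory with its files included
```

Dereferencing at archive time is only a workaround sketch; it gives up the disk savings and does not help with caches that restore the directory tree as-is.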

I don't really want to continue having to say "virtualenv's defaults are broken, use --XXX" as I've had to do for --no-download for so long (thanks for fixing that by the way! 🙏)

@gaborbernat
Contributor

I'll create a PR that adds a separate flag for controlling the app-data copy/symlink behaviour, and make it copy by default on all platforms. With a bit of good progress should be out in the next two hours together with some other fixes.

@gaborbernat
Contributor

Just for reference, on Windows with a stricter anti-virus (on non-SSD hard disks) the difference is more than 60ms; it's more in the realm of 10 seconds.

@emonty

emonty commented Feb 11, 2020

Maybe then we could make the default for seeder be pip on non-Windows and the default be the symlink thing on Windows?

@gaborbernat
Contributor

I consider the app-data path via copy superior in all cases; the symlink one is the more dangerous one. The pip seeder is 3s+ on UNIX, and even longer on Windows. This way the default will be 200ms on UNIX, but users can opt in to the faster --symlink-app-data if they can ensure that the symlinks are not broken.

@emonty

emonty commented Feb 11, 2020

Yeah - I totally get it for the Windows users.

Another use case that just broke for us, FWIW - we have the CI system create a few shared virtualenvs that go into /usr/local, which other things use to get their hands on some tools that don't want to have their dependencies installed globally. Those shared venvs are installed by root, since they're going in a shared location. BUT - that means that the symlinks point to /root/.local, which on some base OSes is chmod 770 - so the virtualenvs just became unusable. We're fixing that with --seeder=pip - but there's gonna be a bazillion corner cases like that for folks using virtualenv under *nix, and if the main performance win is non-*nix, maybe let's keep the default new behavior there? Just talking out loud ...

@gaborbernat
Contributor

As a developer on UNIX I very much prefer being done in 200ms over 3 seconds. So we'll keep the app-data as the default seeder, I believe. We'll work through the edge cases as they come up.

@gaborbernat
Contributor

Hello, a fix for this issue has been released via virtualenv 20.0.2; see https://pypi.org/project/virtualenv/20.0.2/ (https://virtualenv.pypa.io/en/latest/changelog.html#v20-0-2-2020-02-11) . Please give it a try; if your issue has not been addressed, please comment here and we'll reopen the ticket. We want to apologize for the inconvenience this has caused you and say thanks for having patience while we resolve the unexpected bugs with this new major release.

thanks

@pypa pypa locked and limited conversation to collaborators Jan 14, 2021

4 participants