File encoding issue in _install_wheel (pip 20.2, python 2) #8648

Closed
sbidoul opened this issue Jul 29, 2020 · 7 comments · Fixed by #8684
Labels
C: encoding, kind: crash, Python 2 only

@sbidoul
Member

sbidoul commented Jul 29, 2020

Environment

  • pip version: 20.2
  • Python version: 2.7
  • OS: linux

Description

pip install mr.bob fails with pip 20.2:

Processing /home/sbi-local/.cache/pip/wheels/d3/64/58/cd7b9ff05f450397981fa37a0a279ed1d12cf763af16d8d7d7/mr.bob-0.1.2-py2-none-any.whl
Requirement already satisfied: Jinja2>=2.5.0 in /home/sbi-local/.virtualenvs/tempenv-7d78147760d35/lib/python2.7/site-packages (from mr.bob) (2.11.2)
Requirement already satisfied: six>=1.2.0 in /home/sbi-local/.virtualenvs/tempenv-7d78147760d35/lib/python2.7/site-packages (from mr.bob) (1.15.0)
Requirement already satisfied: setuptools in /home/sbi-local/.virtualenvs/tempenv-7d78147760d35/lib/python2.7/site-packages (from mr.bob) (44.1.1)
Requirement already satisfied: MarkupSafe>=0.23 in /home/sbi-local/.virtualenvs/tempenv-7d78147760d35/lib/python2.7/site-packages (from Jinja2>=2.5.0->mr.bob) (1.1.1)
Installing collected packages: mr.bob
ERROR: Exception:
Traceback (most recent call last):
  File "/home/sbi-local/.virtualenvs/tempenv-7d78147760d35/lib/python2.7/site-packages/pip/_internal/cli/base_command.py", line 216, in _main
    status = self.run(options, args)
  File "/home/sbi-local/.virtualenvs/tempenv-7d78147760d35/lib/python2.7/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "/home/sbi-local/.virtualenvs/tempenv-7d78147760d35/lib/python2.7/site-packages/pip/_internal/commands/install.py", line 421, in run
    pycompile=options.compile,
  File "/home/sbi-local/.virtualenvs/tempenv-7d78147760d35/lib/python2.7/site-packages/pip/_internal/req/__init__.py", line 90, in install_given_reqs
    pycompile=pycompile,
  File "/home/sbi-local/.virtualenvs/tempenv-7d78147760d35/lib/python2.7/site-packages/pip/_internal/req/req_install.py", line 831, in install
    requested=self.user_supplied,
  File "/home/sbi-local/.virtualenvs/tempenv-7d78147760d35/lib/python2.7/site-packages/pip/_internal/operations/install/wheel.py", line 829, in install_wheel
    requested=requested,
  File "/home/sbi-local/.virtualenvs/tempenv-7d78147760d35/lib/python2.7/site-packages/pip/_internal/operations/install/wheel.py", line 658, in _install_wheel
    file.save()
  File "/home/sbi-local/.virtualenvs/tempenv-7d78147760d35/lib/python2.7/site-packages/pip/_internal/operations/install/wheel.py", line 442, in save
    with self._zip_file.open(self.src_record_path) as f:
  File "/usr/lib/python2.7/zipfile.py", line 984, in open
    zinfo = self.getinfo(name)
  File "/usr/lib/python2.7/zipfile.py", line 932, in getinfo
    'There is no item named %r in the archive' % name)
KeyError: "There is no item named u'mrbob/tests/templates/encodingc\\u030c/mapc\\u030ca/c\\u0301a.bob' in the archive"

I haven't had time to investigate whether the error comes from a bug in that old package or from a bug in the new _install_wheel. It works fine with pip 20.1, though, and with pip 20.2 on Python 3.

@sbidoul sbidoul added the Python 2 only label Jul 29, 2020
@sbidoul
Member Author

sbidoul commented Jul 29, 2020

cc @chrahunt

@sbidoul sbidoul added the C: encoding and kind: crash labels Jul 30, 2020
@sbidoul sbidoul changed the title from “Decoding issue in _install_wheel (pip 20.2, python 2)” to “File encoding issue in _install_wheel (pip 20.2, python 2)” Aug 2, 2020
@sbidoul sbidoul added this to the 20.2.1 milestone Aug 2, 2020
@sbidoul
Member Author

sbidoul commented Aug 2, 2020

This looks like a regression in 20.2, so I'm tentatively adding it to the 20.2.1 milestone.

Encoding src_record_path as UTF-8 before calling zipfile's open() and getinfo() seems to fix the issue, although I'm not sure it's the right approach.

@chrahunt @uranusjr do you have any advice about this?
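
A minimal sketch of that workaround, written to run on both Python 2 and Python 3. `getinfo_for_record_path` and `read_record` are hypothetical helpers, not pip's code. The sketch tries the text name first, so entries that do carry the UTF-8 flag (whose names are already unicode on Python 2) are still found, and only falls back to the UTF-8 bytes on Python 2, assuming the stored names are UTF-8.

```python
import io
import sys
import zipfile

PY2 = sys.version_info[0] == 2


def getinfo_for_record_path(zip_file, record_path):
    """Hypothetical helper: find the ZipInfo for a text (unicode) record path."""
    try:
        # Python 3, or Python 2 entries written with the UTF-8 flag set
        # (zipfile already decoded their names to unicode).
        return zip_file.getinfo(record_path)
    except KeyError:
        if not PY2:
            raise
        # Python 2 keys entries without the flag by their raw byte names;
        # assume those bytes are UTF-8 and retry with the encoded name.
        return zip_file.getinfo(record_path.encode("utf-8"))


def read_record(zip_file, record_path):
    info = getinfo_for_record_path(zip_file, record_path)
    # ZipFile.open() accepts a ZipInfo, so no second name lookup happens here.
    with zip_file.open(info) as f:
        return f.read()


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(u"mrbob/c\u0301a.bob", b"template")

with zipfile.ZipFile(buf) as zf:
    data = read_record(zf, u"mrbob/c\u0301a.bob")
```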

@uranusjr
Member

uranusjr commented Aug 2, 2020

This sounds like a hairy topic. Quoting the Python 2 documentation on zipfile:

There is no official file name encoding for ZIP files. If you have unicode file names, you must convert them to byte strings in your desired encoding before passing them to write(). WinZip interprets all file names as encoded in CP437, also known as DOS Latin.

So I guess there is no “correct” way here. Python 3 detects UTF-8 support through a header flag. I’m not familiar enough with the ZIP spec to comment on the approach, but I would assume it’s the best we can do.
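
For reference, the header flag is bit 11 (0x800) of each entry's general-purpose flag field, which both Python 2 and Python 3 expose as `ZipInfo.flag_bits`. When it is set, both versions decode the name as UTF-8; when it is clear, Python 2 leaves the name as a byte string while Python 3 decodes it as CP437. This minimal Python 3 sketch builds a throwaway archive in memory and reports which entries carry the flag; `utf8_flag_by_name` and the sample paths are hypothetical, for illustration only.

```python
import io
import zipfile

# Bit 11 of the general-purpose flag field (the "language encoding flag"):
# when set, the entry's file name is stored as UTF-8.
UTF8_FLAG = 0x800


def utf8_flag_by_name(zf):
    """Hypothetical helper: map each entry name to whether bit 11 is set."""
    return {info.filename: bool(info.flag_bits & UTF8_FLAG) for info in zf.infolist()}


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mrbob/ascii.txt", b"plain")
    # Decomposed spelling (c + combining acute), like the path in the KeyError.
    zf.writestr("mrbob/c\u0301a.bob", b"accented")

# Reopen for reading so the flags come from the archive's central directory.
with zipfile.ZipFile(buf) as zf:
    flags = utf8_flag_by_name(zf)

print(flags)
```

Python 3's writer only sets the flag for names that are not pure ASCII, so the ASCII entry is expected to report `False`.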

@uranusjr
Member

uranusjr commented Aug 2, 2020

I posted #8684 for this.

@sbidoul
Member Author

sbidoul commented Aug 3, 2020

The wheel spec says filenames are UTF-8 encoded, so I guess we can rely on that? But it does not look right to me that we need to do such conversions.

FWIW, I bisected it to 4bdb8bc, but the root cause is probably a bit earlier.

I suspect we should preserve ZipBackedFile.src_record_path as we obtained it when listing the zip file contents, whereas we now convert it to unicode.
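
A sketch of that suggestion, assuming a hypothetical `ZipEntry` class standing in for `ZipBackedFile` (whose real fields and methods are not shown in this thread): keep the `ZipInfo` object returned by `infolist()` and open the member through it, so the stored name is never converted and looked up again.

```python
import io
import zipfile


class ZipEntry(object):
    """Hypothetical stand-in for ZipBackedFile that keeps the listed ZipInfo."""

    def __init__(self, info, zip_file):
        self.info = info  # exactly what infolist() returned; never re-encoded
        self._zip_file = zip_file

    @property
    def src_record_path(self):
        # Text on Python 3; on Python 2 a byte string unless the entry sets
        # the UTF-8 flag.
        return self.info.filename

    def read(self):
        # Opening by ZipInfo bypasses the name lookup in getinfo(), so a
        # text/bytes mismatch cannot cause a KeyError here.
        with self._zip_file.open(self.info) as f:
            return f.read()


def list_entries(zip_file):
    return [ZipEntry(info, zip_file) for info in zip_file.infolist()]


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mrbob/tests/templates/c\u0301a.bob", b"{{ name }}")

with zipfile.ZipFile(buf) as zf:
    contents = {e.src_record_path: e.read() for e in list_entries(zf)}
```

The catch is that the destination path still has to be computed from the name as text, so the decoding step does not go away; it just no longer affects how the member is found in the archive.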

@uranusjr
Copy link
Member

uranusjr commented Aug 3, 2020

Oh, I did not know the wheel spec says that explicitly! That would make things a lot easier.

The problem is a combination of 4bdb8bc and an earlier change I made that introduced the RecordPath abstraction. Before those changes, wheels were unpacked to the filesystem and then copied; the unpacking logic used the filesystem encoding to interpret paths inside the ZIP. This “works” on POSIX systems because paths are just bytes there. The RecordPath abstraction was introduced to make paths work on Windows as well, but combined with extracting directly from the wheel, it broke wheel installation on POSIX, since we now need to perform a text-to-bytes conversion on Python 2.

@uranusjr
Copy link
Member

uranusjr commented Aug 3, 2020

I updated #8684 to always assume UTF-8 instead. This should be quite straightforward now.
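
The thread does not show the code in #8684, so the following is only an illustration of the same UTF-8 assumption applied on Python 3, where names without the UTF-8 flag come back decoded as CP437. Because CP437 maps every byte value to a character, re-encoding to CP437 recovers the raw bytes, which can then be decoded as UTF-8. `record_path_from_zipinfo` is a hypothetical helper.

```python
import zipfile

UTF8_FLAG = 0x800  # bit 11: file name stored as UTF-8


def record_path_from_zipinfo(info):
    """Hypothetical helper: return an entry's name decoded as UTF-8.

    Python 3's zipfile decodes names without the UTF-8 flag as CP437 by
    default. Encoding back to CP437 recovers the raw bytes, which are then
    decoded as UTF-8 on the assumption that wheel paths are always UTF-8.
    """
    if info.flag_bits & UTF8_FLAG:
        return info.filename
    return info.filename.encode("cp437").decode("utf-8")


# Simulate what Python 3 reads for an unflagged entry holding UTF-8 bytes.
raw = "mrbob/c\u0301a.bob".encode("utf-8")
unflagged = zipfile.ZipInfo(raw.decode("cp437"))
print(record_path_from_zipinfo(unflagged) == "mrbob/c\u0301a.bob")  # True
```

A name that is not valid UTF-8 makes `decode` raise `UnicodeDecodeError`, which is arguably the right outcome for a wheel that violates the spec. On Python 3.11 and later, `zipfile.ZipFile(..., metadata_encoding="utf-8")` applies the same assumption at read time.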

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 12, 2021