Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue #5868: TypeError in move_wheel_files(). #5883

Merged
merged 2 commits into from
Oct 24, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions news/5868.bugfix
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Fix sorting `TypeError` in `move_wheel_files()` when installing some packages.
23 changes: 22 additions & 1 deletion src/pip/_internal/wheel.py
Original file line number Diff line number Diff line change
Expand Up @@ -204,6 +204,26 @@ def message_about_scripts_not_on_PATH(scripts):
return "\n".join(msg_lines)


def sorted_outrows(outrows):
"""
Return the given rows of a RECORD file in sorted order.

Each row is a 3-tuple (path, hash, size) and corresponds to a record of
a RECORD file (see PEP 376 and PEP 427 for details). For the rows
passed to this function, the size can be an integer as an int or string,
or the empty string.
"""
# Normally, there should only be one row per path, in which case the
# second and third elements don't come into play when sorting.
# However, in cases in the wild where a path might happen to occur twice,
# we don't want the sort operation to trigger an error (but still want
# determinism). Since the third element can be an int or string, we
# coerce each element to a string to avoid a TypeError in this case.
# For additional background, see--
# https://github.com/pypa/pip/issues/5868
return sorted(outrows, key=lambda row: tuple(str(x) for x in row))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to coerce everything to string when outrows is appended to, instead to needing to deal with mixed types?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was an interesting discussion at the original issue after I wrote this PR:
#5868
So I think I actually want to "withdraw" this now. :) Or at least rethink it first as I think some decisions need to be made. It might be better to discuss at that issue.

I may close this or mark as "WIP" in the meantime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uranusjr In thinking more about this, I'm starting to think that what I originally proposed is okay. There are two reasons: (1) Coercing everything to a string on append seems more brittle because you need to add that logic each place you are appending, which can be multiple spots (or remember to use a common helper function when appending). (2) Coercing everything to a string seems to violate the spirit of PEP 376. That PEP says the third element should be a size (i.e. integer). Thus I think it would be better / safer to leave the rows themselves alone, and confine the coercion to the sort operation (which is just a cosmetic thing anyways).

Copy link
Member

@xavfernandez xavfernandez Oct 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think sorting on the file path should be enough ? We should not be getting two lines for the same file.
(And add a warning/error if we end up with duplicate lines)
(Sorry for the multiple/numerous duplicated comments ^^)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @xavfernandez. I definitely support at least adding a warning, but I think that should be done as part of a separate issue and PR so as not to expand the scope. I meant for this PR only to prevent the sort operation from crashing.

Re: sorting by only the first element, it's true that using all elements might almost never matter, but is there any harm? Being able to guarantee determinism even in unlikely edge cases or error cases seems like a good thing. If we add validation later to ensure the file names will be unique, we can always adjust the sort operation then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine I guess, maybe with a comment explaining the expected format (name, hash, size) and the fact that we are ok with sorting integer as string (since normally the sorting only happens on name)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation + a comment explaining nuances (maybe with a pointer to this issue and/or #5868) should be enough IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good solution, @xavfernandez and @uranusjr. Thanks. I'll draft up a comment and repost.



def move_wheel_files(name, req, wheeldir, user=False, home=None, root=None,
pycompile=True, scheme=None, isolated=False, prefix=None,
warn_script_location=True):
Expand Down Expand Up @@ -511,7 +531,8 @@ def _get_script_text(entry):
outrows.append((normpath(f, lib_dir), digest, length))
for f in installed:
outrows.append((installed[f], '', ''))
for row in sorted(outrows):
# Sort to simplify testing.
for row in sorted_outrows(outrows):
writer.writerow(row)
shutil.move(temp_record, record)

Expand Down
23 changes: 23 additions & 0 deletions tests/unit/test_wheel.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,29 @@ def test_get_entrypoints(tmpdir, console_scripts):
)


@pytest.mark.parametrize("outrows, expected", [
([
('', '', 'a'),
('', '', ''),
], [
('', '', ''),
('', '', 'a'),
]),
([
# Include an int to check avoiding the following error:
# > TypeError: '<' not supported between instances of 'str' and 'int'
('', '', 1),
('', '', ''),
], [
('', '', ''),
('', '', 1),
]),
])
def test_sorted_outrows(outrows, expected):
actual = wheel.sorted_outrows(outrows)
assert actual == expected


def test_wheel_version(tmpdir, data):
future_wheel = 'futurewheel-1.9-py2.py3-none-any.whl'
broken_wheel = 'brokenwheel-1.0-py2.py3-none-any.whl'
Expand Down