Fix issue #5868: TypeError in move_wheel_files(). #5883

Merged (2 commits) on Oct 24, 2018

Conversation

cjerdonek
Member

cjerdonek commented Oct 14, 2018

This fixes #5868.

cjerdonek added the "C: wheel" (The wheel format and 'pip wheel' command) and "T: bugfix" labels on Oct 14, 2018
cjerdonek added this to the 19.0 milestone on Oct 14, 2018
# can be strings in some rows and integers in others.
def sorted_outrows(outrows):
    """Return the given "outrows" in sorted order."""
    return sorted(outrows, key=lambda row: tuple(str(x) for x in row))
Member

Would it be possible to coerce everything to string when outrows is appended to, instead of needing to deal with mixed types?

Member Author

There was an interesting discussion at the original issue after I wrote this PR:
#5868
So I think I actually want to "withdraw" this now. :) Or at least rethink it first as I think some decisions need to be made. It might be better to discuss at that issue.

I may close this or mark as "WIP" in the meantime.

Member Author

@uranusjr In thinking more about this, I'm starting to think that what I originally proposed is okay. There are two reasons: (1) Coercing everything to a string on append seems more brittle because you need to add that logic each place you are appending, which can be multiple spots (or remember to use a common helper function when appending). (2) Coercing everything to a string seems to violate the spirit of PEP 376. That PEP says the third element should be a size (i.e. integer). Thus I think it would be better / safer to leave the rows themselves alone, and confine the coercion to the sort operation (which is just a cosmetic thing anyways).
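
To make the trade-off above concrete, here is a minimal sketch. It assumes Python 3 (where comparing an int with a str raises `TypeError`), and the sample rows are made up for illustration; only `sorted_outrows` is taken from the patch.

```python
def sorted_outrows(outrows):
    """Return the given "outrows" in sorted order."""
    return sorted(outrows, key=lambda row: tuple(str(x) for x in row))


# Hypothetical rows: the same path occurs twice with the same hash,
# once with an int size and once with the empty string as the size.
outrows = [
    ("pkg/b.py", "sha256=def", 20),
    ("pkg/a.py", "sha256=abc", 10),
    ("pkg/a.py", "sha256=abc", ""),
]

try:
    # Plain tuple comparison ties on path and hash, then compares 10 with "".
    sorted(outrows)
    plain_sort_failed = False
except TypeError:
    plain_sort_failed = True

# The coercion happens only inside the sort key, so the rows themselves
# keep their original element types.
result = sorted_outrows(outrows)
```

The rows come back reordered but otherwise untouched, so the size element is still an int wherever it was one, in keeping with PEP 376.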

Member

@xavfernandez Oct 22, 2018

I think sorting on the file path should be enough? We should not be getting two lines for the same file.
(And add a warning/error if we end up with duplicate lines)
(Sorry for the multiple/numerous duplicated comments ^^)
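
A rough sketch of what this suggestion could look like, under stated assumptions: the helper names `find_duplicate_paths` and `sorted_by_path` are hypothetical, the rows are made up, and the standard `logging` module stands in for whatever logger pip would actually use.

```python
import logging
from collections import Counter

logger = logging.getLogger(__name__)


def find_duplicate_paths(outrows):
    """Return, in sorted order, the paths that appear in more than one row."""
    counts = Counter(row[0] for row in outrows)
    return sorted(path for path, count in counts.items() if count > 1)


def sorted_by_path(outrows):
    """Sort rows on the path element only, warning about duplicate paths."""
    for path in find_duplicate_paths(outrows):
        logger.warning("Duplicate RECORD entry for path: %s", path)
    # Only row[0] (a string) is compared, so mixed size types never meet.
    return sorted(outrows, key=lambda row: row[0])


# Hypothetical rows with one duplicated path.
rows = [
    ("pkg/b.py", "sha256=def", 20),
    ("pkg/a.py", "sha256=abc", 10),
    ("pkg/a.py", "sha256=abc", ""),
]
duplicates = find_duplicate_paths(rows)
ordered = sorted_by_path(rows)
```

Because Python's sort is stable, rows that share a path keep their input order in `ordered`.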

Member Author

Thanks, @xavfernandez. I definitely support at least adding a warning, but I think that should be done as part of a separate issue and PR so as not to expand the scope. I meant for this PR only to prevent the sort operation from crashing.

Re: sorting by only the first element, it's true that using all elements might almost never matter, but is there any harm? Being able to guarantee determinism even in unlikely edge cases or error cases seems like a good thing. If we add validation later to ensure the file names will be unique, we can always adjust the sort operation then.
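
The determinism point can be checked with a small experiment. This is only a sketch with made-up rows; `sort_by_path` and `sort_by_all` are hypothetical names for the two candidate sort keys.

```python
import itertools

# Hypothetical rows in which one path occurs twice.
rows = [
    ("pkg/a.py", "sha256=abc", 10),
    ("pkg/a.py", "sha256=abc", ""),
    ("pkg/b.py", "sha256=def", 20),
]


def sort_by_path(outrows):
    """Sort on the path element only."""
    return sorted(outrows, key=lambda row: row[0])


def sort_by_all(outrows):
    """Sort on every element, coerced to a string (as in the patch)."""
    return sorted(outrows, key=lambda row: tuple(str(x) for x in row))


# Sort every possible input ordering and collect the distinct results.
path_only_results = {
    tuple(sort_by_path(p)) for p in itertools.permutations(rows)
}
all_elements_results = {
    tuple(sort_by_all(p)) for p in itertools.permutations(rows)
}
```

With the path-only key, the two `pkg/a.py` rows tie and the stable sort leaves them in input order, so more than one output is possible. With the all-elements key there is a single output for these rows. Rows whose string forms are identical (for example sizes `10` and `"10"`) would still tie under either key.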

Member

It's fine I guess, maybe with a comment explaining the expected format (name, hash, size) and the fact that we are OK with sorting integers as strings (since normally the sorting only happens on name).

Member

The current implementation + a comment explaining nuances (maybe with a pointer to this issue and/or #5868) should be enough IMO.

Member Author

Sounds like a good solution, @xavfernandez and @uranusjr. Thanks. I'll draft up a comment and repost.

@cjerdonek
Member Author

I'm closing this since there was more discussion at issue #5868 after I wrote this, and I think some decisions need to be resolved first. This PR can always be reopened.

@cjerdonek
Member Author

cjerdonek commented Oct 17, 2018

@uranusjr Regarding this PR, one thing that's holding me back is whether it's really better not to crash. Do we know if preventing a crash here would cause an even harder-to-diagnose problem later on in the chain, because of duplicate entries in the RECORD file?

Your anti-Postel's principle comment would say that it's better to crash, right?

@uranusjr
Member

I think the problem is that the spec doesn’t forbid duplicate entries. If that is to be allowed, pip will need to handle it.

I agree potential duplicate entries could lead to harder-to-diagnose problems, but first we’ll need to amend the dist-info and wheel specs to describe whether duplicates may exist, and how they should be treated (if they are allowed). The wheel spec would also need to say whether it can contain certain entries, or specify the installer’s behaviour (ignored/overridden) if those paths exist.

@cjerdonek
Member Author

@uranusjr Good point about the spec. Thanks. Regarding this PR, do you approve of it? Is there any downside to it in your opinion? One possible downside of what you suggested in PR #5890 is that it doesn't provide determinism in as many cases (e.g. in the duplicate entry edge case), which was the reason for sorting in the first place. Since duplicate names are an edge case that could be worth testing, it seems like it would be good to have determinism there as well (and it seems it couldn't hurt).

@uranusjr
Member

I feel the implementation is good enough if the intention is purely to make ordering deterministic. Otherwise it could be better to sort the last item as integer… maybe?

@cjerdonek
Member Author

> Otherwise it could be better to sort the last item as integer… maybe?

Yes, that's something that occurred to me, too. But then it adds the complication of what to do with the empty string (-1?). And then you also have the issue we've discussed in other forms about lines that don't conform to the spec. What if the element doesn't parse to an integer -- do we want to be introducing validation in this PR?
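
For illustration only, here is a sketch of the numeric alternative and the questions it raises. The names `size_sort_key` and `sorted_outrows_numeric` are hypothetical, and mapping the empty string to -1 is just the placeholder floated above, not a decision.

```python
def size_sort_key(size):
    """Map a size element (int, numeric str, or "") to an int for sorting."""
    if size == "":
        # Assumption: sort the empty string before every real size.
        return -1
    # Raises ValueError for a value that does not parse as an integer,
    # which is effectively validation.
    return int(size)


def sorted_outrows_numeric(outrows):
    """Sort rows by path and hash as strings, then by size as an integer."""
    return sorted(
        outrows,
        key=lambda row: (str(row[0]), str(row[1]), size_sort_key(row[2])),
    )


# Hypothetical rows that tie on path and hash and differ only in size.
rows = [
    ("pkg/a.py", "sha256=abc", "10"),
    ("pkg/a.py", "sha256=abc", 9),
    ("pkg/a.py", "sha256=abc", ""),
]
numeric_order = [row[2] for row in sorted_outrows_numeric(rows)]

try:
    size_sort_key("n/a")
    parse_failed = False
except ValueError:
    parse_failed = True
```

Numeric ordering puts 9 before "10", whereas string coercion would put "10" before "9". The cost is that a non-conforming size now raises during the sort.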

@cjerdonek
Member Author

I added an expanded comment to the patch, as suggested. Let me know if it looks okay.

    or the empty string.
    """
    # Normally, there should only be one row per path, so the second and
    # third elements of each row don't normally come into play when sorting.
Member

Two “normally” in this sentence.

Member Author

Do you have a suggestion of how it should be rephrased? The meaning doesn't seem correct to me if either one is removed.

Member

Maybe “Normally, there should …, and the second and third in this case …”?

Member Author

K

cjerdonek merged commit 951e0cb into pypa:master on Oct 24, 2018
cjerdonek deleted the fix-move-wheel-files-sort branch on October 24, 2018 16:20
@lock

lock bot commented May 31, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot added the "auto-locked" (Outdated issues that have been locked by automation) label on May 31, 2019
lock bot locked as resolved and limited conversation to collaborators on May 31, 2019
Labels
auto-locked (Outdated issues that have been locked by automation)
C: wheel (The wheel format and 'pip wheel' command)
Development

Successfully merging this pull request may close these issues.

Sorting TypeError in move_wheel_files() during install (e.g. Poetry)
3 participants