Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Paths should be sorted like strings #152

Closed
Gilthans opened this issue Aug 1, 2022 · 6 comments
Closed

Paths should be sorted like strings #152

Gilthans opened this issue Aug 1, 2022 · 6 comments

Comments

@Gilthans
Copy link
Contributor

Gilthans commented Aug 1, 2022

Hey there,
Thanks for the great package!
I was wondering why ns.PATH isn't on by default when using natsorted.

It would seem intuitive that a list of paths would be naturally sorted, and actually had to debug to realize I need to use os_sorted or natsort(files, alg=PATH).
Is this intentional?
If not, I'd be happy to submit a PR

@SethMMorton
Copy link
Owner

Yes, this is intentional, for two reasons.

  • The ns.PATH algotithm is significantly slower than the default algorithm, both in the initial application of the key and also in the sorting itself.
  • There are many different definitions of "natural sort", and what seems like the obvious definition to one person is a counter-intuitive definition for another. When natsort was first released ns.FLOAT was the default sorting algorithm and this caused lots of issues, and people created competing libraries because they thought natsort was buggy (instead of taking the time to understand what was going on like you have, sigh...).
    • I now believe the best course of action is to use the most "minimal" definition of a natural sort as the defautl, one that will cause the least amount of surprise for the most number of users, but then provide expanded definitions (like ns.PATH) when needed.

@Gilthans
Copy link
Contributor Author

Gilthans commented Aug 7, 2022

Thanks for the clarifcation Seth!
Do you have perhaps an example of where this logic is counterintuitive?
I had a list of paths that looks like /a/b/1/something, /a/b/2/something, etc, which I expected would correctly be sorted with nat-sort, but /a/b/10/something followed /a/b/1/something unfortunately

@SethMMorton
Copy link
Owner

SethMMorton commented Aug 7, 2022

Hmm... I'm confused. "10" should follow "1", and also for the example values you provided the default sorting and ns.PATH provide the same results.

% python
>>> x = ["/a/b/1/something", "/a/b/2/something", "/a/b/10/something"]
>>> sorted(x)
['/a/b/1/something', '/a/b/10/something', '/a/b/2/something']
>>> import natsort
>>> natsort.natsorted(x)
['/a/b/1/something', '/a/b/2/something', '/a/b/10/something']
>>> natsort.natsorted(x, alg=natsort.PATH)
['/a/b/1/something', '/a/b/2/something', '/a/b/10/something']
>>> natsort.__version__
'8.1.0'

Perhaps you could elaborate more on the problem you actually encountered?


Also, to be clear, ns.PATH was added to support the problem described here: #3. Unless you have data that looks like this then ns.PATH shouldn't actually be needed.

@Gilthans
Copy link
Contributor Author

Gilthans commented Aug 8, 2022

Interesting, it seems the issue lies with using Path objects for sorting:

>>> from natsort import natsorted, PATH
>>> from pathlib import Path
>>> x = ["/a/b/1/something", "/a/b/2/something", "/a/b/10/something"]
>>> natsorted(x)
['/a/b/1/something', '/a/b/2/something', '/a/b/10/something']
>>> natsorted(Path(f) for f in x)
[WindowsPath('/a/b/1/something'), WindowsPath('/a/b/10/something'), WindowsPath('/a/b/2/something')]
>>> natsorted((Path(f) for f in x), alg=PATH)
[WindowsPath('/a/b/1/something'), WindowsPath('/a/b/2/something'), WindowsPath('/a/b/10/something')]

This matches my case since I was sorting the output of d.iterdir() directly. I misattributed the cause.
Perhaps pathlib.Path should be special-cased to be cast to a string during key generation?
This would be useful in my case at least, and to me sounds more intuitive than the current behaviour. WDYT?

@SethMMorton
Copy link
Owner

SethMMorton commented Aug 9, 2022

Ah yes. This is expected but surprising behavior (see #16 for someone else suggesting something similar). The suggestion in that issue was to auto-use ns.PATH if Path objects are found, but that won't work (trust me on this).

However, auto-converting PurePath objects to str if ns.PATH is not given would be fine, so long as it does not significantly degrade performance. If you want to submit a PR to do that that would be great.

<unrelated rant that is not your fault> I really feel like the decision to make the pathlib.Path object not a subclass of str was a bad design decision that has caused more trouble than was worth. Here is just one of the ramifications. </unrelated rant that is not your fault>

@Gilthans Gilthans changed the title Should ns.PATH be on by default? Paths should be sorted like strings Aug 11, 2022
@SethMMorton
Copy link
Owner

Closed by #153

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants