
"New File" appears after "New File (1)" #3

Closed
catmanjan opened this issue Oct 10, 2013 · 17 comments
@catmanjan

Not sure if there is a workaround for this, but when sorting path names the order does not conform to Windows' order.

@catmanjan
Author

eg

@SethMMorton
Owner

Can you be more specific? What is Windows' order? I don't use Windows.

@catmanjan
Author

example
See picture.

If you natsort

["Folder", "Folder (2)", "Folder (3)" ... ]

It results in

["Folder (2)", "Folder (3)" ... "Folder" ]

@SethMMorton
Owner

Which version are you using? I'm using 3.0.1.

>>> import natsort
>>> a = ["Folder (3)", "Folder (2)", "Folder"]
>>> natsort.natsorted(a)
['Folder', 'Folder (2)', 'Folder (3)']

I get the output you expect on my Mac.

@catmanjan
Author

Okay, it doesn't work when including a path:

>>> import natsort
>>> a=["C:\Folder\File", "C:\Folder (2)\File", "C:\Folder (3)\File"]
>>> natsort.natsorted(a)
['C:\\Folder (2)\\File', 'C:\\Folder (3)\\File', 'C:\\Folder\\File']

Even after unicode:

>>> import natsort
>>> a=[unicode("C:\Folder\File"), unicode("C:\Folder (2)\File"), unicode("C:\Folder (3)\File")]
>>> natsort.natsorted(a)
[u'C:\\Folder (2)\\File', u'C:\\Folder (3)\\File', u'C:\\Folder\\File']

3.0.1 on Python2.7

@SethMMorton
Owner

This is outside the scope of the natsort algorithm. When natsort parses a string, it creates tuples of strings and numbers, and then sorts the tuples using Python's built-in mechanisms. The strings you gave would be parsed as

('C:\\Folder (', 2, ')\\File',)
('C:\\Folder (', 3, ')\\File',)
('C:\\Folder\\File',)

Python sorts tuples by their first element; where several tuples share the same first element, it compares the second element, and so on. If you were to sort the strings 'C:\\Folder (' and 'C:\\Folder\\File', the second would come last because ' ' < '\\' is True. You will get the same behavior using the built-in sorted function.

I assume that Windows treats these cases specially to sort folders in the manner you show so that it is more user friendly. There is really no way to make a general algorithm that will do this correctly because the characters immediately following "Folder" are different.

In the first case where it isn't a full path, they are parsed as

('Folder (', 2, ')',)
('Folder (', 3, ')',)
('Folder',)

In this case, "Folder" comes first because it is a prefix of "Folder (": the two strings match up to the end of "Folder", and the shorter string sorts before the one with extra trailing characters.
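
The parsing described above can be mimicked with a few lines of standard-library Python. This is only a sketch under assumptions: simple_natural_key is a hypothetical helper, not natsort's actual implementation, and it only handles unsigned integers. It shows why the bare "Folder" path ends up last when whole paths are compared, and first when only the folder names are compared.

```python
import re

def simple_natural_key(text):
    # Hypothetical stand-in for natsort's parsing (an assumption, not its real code):
    # split on runs of digits, turn digit runs into ints, drop empty pieces.
    pieces = re.split(r"(\d+)", text)
    return tuple(int(p) if p.isdigit() else p for p in pieces if p)

paths = [r"C:\Folder\File", r"C:\Folder (2)\File", r"C:\Folder (3)\File"]
names = ["Folder", "Folder (2)", "Folder (3)"]

# Whole paths: 'C:\Folder (' sorts before 'C:\Folder\File' because ' ' < '\\'.
sorted_paths = sorted(paths, key=simple_natural_key)

# Folder names only: 'Folder' is a prefix of 'Folder (' and therefore sorts first.
sorted_names = sorted(names, key=simple_natural_key)

print(sorted_paths)
print(sorted_names)
```

Note that this simplified key can raise a TypeError on Python 3 when one string starts with a digit and another does not, since ints and strs cannot be compared; the inputs here avoid that case.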

To work around this, I recommend making a list parallel to the paths that contains only the "Folder" part, use index_natsorted on that, then use that index to sort the original list of paths.

>>> paths = [r"C:\Folder\File", r"C:\Folder (2)\File", r"C:\Folder (3)\File"]
>>> names = [r"Folder", r"Folder (2)", r"Folder (3)"]
>>> index = natsort.index_natsorted(names)
>>> [paths[i] for i in index]
['C:\\Folder\\File', 'C:\\Folder (2)\\File', 'C:\\Folder (3)\\File']

Or, you could use dictionary keys

>>> paths = { r"Folder": r"C:\Folder\File", r"Folder (2)": r"C:\Folder (2)\File", r"Folder (3)": r"C:\Folder (3)\File", }
>>> [paths[key] for key in natsort.natsorted(paths)]
['C:\\Folder\\File', 'C:\\Folder (2)\\File', 'C:\\Folder (3)\\File']

A third workaround is to replace Folder with Folder (1) so that all the folders look the same.

>>> paths = [r"C:\Folder (1)\File", r"C:\Folder (2)\File", r"C:\Folder (3)\File"]
>>> natsort.natsorted(paths)
['C:\\Folder (1)\\File', 'C:\\Folder (2)\\File', 'C:\\Folder (3)\\File']

@SethMMorton
Owner

Did any of these suggestions help?

@catmanjan
Author

I believe index_natsorted would work for a trivial case; the problem is that each folder could contain a similarly named set of subdirectories, which would also have to be indexed and sorted. This was a major performance hit for large directory structures, even after using the relatively fast os.walk.

Have decided to just leave this as a limitation, thanks anyway.

@SethMMorton
Owner

Sorry I couldn't help. Did you try replacing "Folder" with "Folder (1)"? I'm not sure if that would help, but you could do a path.replace("Folder\\", "Folder (1)\\") before processing, then path.replace("Folder (1)", "Folder") after processing.

I suppose that in the worst case, you could take the sorted list and then move the last element to the front, since at least you know that the "Folder" always gets put last.

Best of luck!

@catmanjan
Author

Unfortunately there is no guarantee that the name of the folders will be "Folder", and it's not really worth the effort to figure out which unnumbered folder name matches which list.

Thanks!

@SethMMorton
Owner

I realize it's been a while and you might have moved on, but I think I have thought of a way to make this work for you. You need to tell natsort to sort each path component individually, because on its own each path component is sorted correctly. If you have access to the path module you can try something like this

natsorted(paths, key=lambda x: path(x).splitall())

Or, you can adapt something from this page to do something without the path module.

I am on vacation without access to a computer, so I cannot check this out, but I imagine this should work for you. I am thinking it might be something nice to include as an example in the documentation.
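
For readers without the path module, the same idea can be sketched with the standard library alone. The names split_all and component_key are hypothetical helpers introduced for illustration, and the digit-splitting key is a simplified stand-in for natsort's parsing, not its real code. The point is only that each path component gets its own sub-key, so "folder" is compared against "folder (1)" as a whole component rather than as part of one long string.

```python
import re
from pathlib import PurePosixPath

def split_all(p):
    # Split a POSIX-style path into its components, e.g. ('/', 'p', 'folder', 'test').
    # For Windows-style paths, PureWindowsPath would be the analogous choice.
    return PurePosixPath(p).parts

def component_key(p):
    # One tuple per path component; digit runs become ints so 5 sorts before 10.
    return tuple(
        tuple(int(t) if t.isdigit() else t for t in re.split(r"(\d+)", part) if t)
        for part in split_all(p)
    )

a = ['/p/folder/test', '/p/folder (5)/test', '/p/folder (10)/test', '/p/folder (1)/test']
result = sorted(a, key=component_key)
print(result)
```

With natsort installed, passing key=split_all to natsorted should play the same role as the path.py call above, assuming a version that can sort nested sequences.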

@SethMMorton SethMMorton reopened this Jul 1, 2014
@SethMMorton
Owner

I'd like to point out that the reason this works is that natsort can now recursively descend into lists of lists to sort. When you originally posted this question, natsort had not yet learned how to do this. I didn't make the connection that that update would help you till just now.

I really hope this works, because if it can I think that will really be helpful to lots of people! Please let me know if you get it to work!!

@SethMMorton
Owner

I have just verified that this works!

>>> import natsort
>>> import path
>>> a = ['/p/folder/test', '/p/folder (5)/test', '/p/folder (10)/test', '/p/folder (1)/test']
>>> natsort.natsorted(a)
['/p/folder (1)/test', '/p/folder (5)/test', '/p/folder (10)/test', '/p/folder/test']
>>> natsort.natsorted(a, key=lambda x: path.path(x).splitall())
['/p/folder/test', '/p/folder (1)/test', '/p/folder (5)/test', '/p/folder (10)/test']

@SethMMorton
Owner

In the next release, I am planning on adding an option to natsort that will cause it to interpret input as paths, so that the user need not depend on path.py. My proposed API is something like this:

>>> a = ['/p/folder/test', '/p/folder (5)/test', '/p/folder (10)/test', '/p/folder (1)/test']
>>> natsort.natsorted(a, as_path=True)
['/p/folder/test', '/p/folder (1)/test', '/p/folder (5)/test', '/p/folder (10)/test']

Any objections or opinions on this?

@SethMMorton SethMMorton self-assigned this Jul 15, 2014
@SethMMorton
Owner

This has been added as of commit d3bd9e4. Use the as_path=True option to natsorted to get it to work. An official release will soon follow.

I hope this helps.

@SethMMorton
Owner

@catmanjan Check out version 3.4.0, which has support for sorting this correctly.

@SethMMorton
Owner

Hello from the future. The preferred way to handle this now is

>>> from natsort import natsorted, ns
>>> a = ['/p/folder/test', '/p/folder (5)/test', '/p/folder (10)/test', '/p/folder (1)/test']
>>> natsorted(a, alg=ns.PATH)
['/p/folder/test', '/p/folder (1)/test', '/p/folder (5)/test', '/p/folder (10)/test']
