-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
samefile() is not implemented #246
Comments
Thanks for opening the issue and for offering to contribute. We could base an implementation for The problem is: st_ino and st_dev are basically 0 for filesystems other than local. universal_pathlib/upath/_stat.py Lines 183 to 197 in 92675f0
So first we'd need to make sure that if a UPath with another protocol is provided that this method returns For a start, we could test if just comparing the stat_results for both will cover all tested filesystems. A test would be implemented here: universal_pathlib/upath/tests/cases.py Lines 307 to 308 in 92675f0
And we should at least add one test comparing two different protocols. Let me know if anything is unclear, |
I suspect that only those two fields are compared for the sake of optimization. In the context of local filesystems, if those two things are the same between two fstats, then there is no need to check anything else because the query has been satisfied. Also, depending on the nature of the stat object, I could imagine possible race conditions if they can change underneath you, so fewer comparisons is your friend. My own naive implementation would be to fast-path PosixUPath vs. (PosixUPath/Path) comparisons (is that the correct class for local protocol?), and then for everything else, normalize the paths and do an equality check (which would come back False for differing protocol). I think this risks some false negatives, and it isn't the fastest. Actually, put a pin that idea... Would it make sense to have |
I would suggest starting with the general case. People don't use UPath or pathlib for performance, they usually use it for convenience. If performance becomes an important enough issue for users, I would recommend moving the critical code path to fsspec. The UPath instance provides access to the fsspec filesystem, storage_options and urlpath for exactly this reason. Collecting concrete cases where this was necessary will then inform us where we can have the most impact with optimizations in universal_pathlib. |
Implementation wise I think I would start with comparing the stat objects for both if protocols match. For this it would be best to just implement |
Is there a plan for implementing
UPath.samefile()
? I can see how such a method might be difficult to define universally, and that it isn't the same thing asUPath.__eq__()
, which is implemented, but this broke some code of mine that was usually usingPath.samefile()
and then failed when I passed it aUPath
instead of aPath
.If this is a TODO, then is there anything I can do to help?
The text was updated successfully, but these errors were encountered: