Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

experimental traits to more cleanly extend to key-value stores #159

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ExpandingMan
Copy link
Contributor

This is a (very small) first step of an experimental implementation of traits to attempt to address some of the problems with this issue.

Currently, FilePathsBase makes quite a lot of assumptions about the semantics of paths, I think it's fair to say that it assumes that paths are either POSIX compatible or windows. This is arguably unnecessarily restrictive to use cases which mirror these very closely.

In this draft I had created the traits DirectoriesType and PermissionsType. The former is to indicate whether directories are their own objects in a file system or whether they are merely implied by the path structure, the latter being the case for key-value stores such as S3. PermissionsType is whether file or directory permissions are supported by the file system.

An objection I anticipate to this proposal is that key-value stores are not file systems and it is a mistake to treat them like one. This is a wise objection, however there seem to be a number of cases (notably AWS S3, but also other AWS key-value stores) in which the key-value store is so commonly treated as a file system that using methods such as those provided by FilePathsBase is unavoidable. Indeed, the idea that S3 is a file system seems to be deeply ingrained in much of the "big data" ecosystem (horrific though that ecosystem might be). Therefore, I think it is at least worth asking whether FilePathsBase can be made general enough to handle those cases more seamlessly.

The "demo" I have in this draft defines methods in terms of intermediate methods which check the interface, so that attempts to use unimplemented methods will result in method errors. The real work to be done is re-writing some of the fall-back methods so that they make sense with the provided traits (I haven't done even a single example of this yet).

Obviously, I have done very little work here, but I wanted to gauge interest and receptiveness to this idea before I put any real time into this.

@codecov
Copy link

codecov bot commented Jan 25, 2022

Codecov Report

Merging #159 (b9bbf3a) into master (534b301) will decrease coverage by 0.29%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #159      +/-   ##
==========================================
- Coverage   91.91%   91.61%   -0.30%     
==========================================
  Files          12       12              
  Lines        1224     1228       +4     
==========================================
  Hits         1125     1125              
- Misses         99      103       +4     
Impacted Files Coverage Δ
src/FilePathsBase.jl 71.42% <0.00%> (-28.58%) ⬇️
src/path.jl 87.66% <0.00%> (-0.29%) ⬇️
src/system.jl 91.33% <0.00%> (-0.62%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 534b301...b9bbf3a. Read the comment docs.

Copy link
Owner

@rofinn rofinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like the idea as a way to create more clear cutoffs for unsupported functionality. I do worry that this isn't that much of a change from just moving more of the default behaviour to just work on SystemPath.

Base.isdir(fp::AbstractPath) = isdir(mode(fp))

Base.isdir(fp::AbstractPath) = isdir(directoriestype(fp), fp)
Base.isdir(::DirectoriesExplicit, fp::AbstractPath) = isdir(mode(fp))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the assumption is that if I have:

directoriestype(::Type{MyPath}) = DirectoriesImplicit()

Then I also need to implement Base.isdir(::DirectoriesImplicit, fp::MyPath)? So the benefit is just when you throw the MethodError... or are we expecting folks to extend DirectoriesType with some shared logic that could be shared with multiple key-value path types?

Base.isfile(fp::AbstractPath) = isfile(mode(fp))

#TODO: most of these clearly only make sense on local FS, should they have traits?
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having a trait for this is a very good idea if the primary goal is to more cleanly fail for unsupported functionality. This could still get weird in the sense that S3Paths do still have permissions, but they work differently.

@ExpandingMan
Copy link
Contributor Author

I'm growing a little skeptical of my own idea. On one hand, I definitely stand by the idea that this package assumes too much of the underlying file system implementations, but on the other, I think we should be very careful not to completely overhaul this package based on S3 alone. Yes, in principle most of what would be done for S3 could be extended to other key-value stores, but S3 is the only use case that I feel certain will be used.

I think it would be productive if we could collectively come up with some more concrete examples of how this package might be used in the future. If those are too hard to come by, maybe generalizing this is a bad idea.

@rofinn
Copy link
Owner

rofinn commented Feb 8, 2022

I think a few obvious examples include:

  1. Other cloud providers like azure
  2. FTP client paths Meet FilePaths API invenia/FTPClient.jl#80
  3. SSH filepaths
  4. Navigating and manipulating archives like zip or tar files
  5. Better integration into FileTrees.jl https://github.com/shashi/FileTrees.jl

One thing that keeps bugging me is that there's a weird overlap between Dictionaries.jl, FilePathsBase.jl, FileTrees.jl and AxisSets.jl in which we often want to just describe algorithms for search/navigating prefixes against arbitrary objects (e.g., local files, remote objects, in-memory julia types). I wonder if some of the S3 issues would be better served with a kind of index API which must be implemented independent of authorization or read/write or permissions logic?

@ExpandingMan
Copy link
Contributor Author

Ok, that's definitely more examples than I had come up with. Also @omus suggested something like this in the other thread, I think not realizing I had done this.

My interest in this is renewed, I will work on it some more some time this week.

@ExpandingMan
Copy link
Contributor Author

I've revisited this today. On further reflection, it does seem that extending the interface this way is a good idea especially for AWSS3, but it is slightly more complicated than I had hoped.

The main breaking change I propose to the interface as of now is that exists should always return false on directories for key-value stores. We can provide a separate function indicating whether or not there exist any files at all in a certain set of keys.

@rofinn
Copy link
Owner

rofinn commented Mar 25, 2022

The main breaking change I propose to the interface as of now is that exists should always return false on directories for key-value stores. We can provide a separate function indicating whether or not there exist any files at all in a certain set of keys.

So the issue is that s3://mybucket/my/prefix/ is a perfectly valid object/filename on S3, separate from it being used as a prefix, so saying it doesn't exist seems equally dishonest. I think if you don't want to handle that case then it should be a method error.

@ExpandingMan
Copy link
Contributor Author

I don't know yet... regardless I don't think there's going to be any avoiding some breaking changes (though I think they can be minor).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants