Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add join keyword to readdir #129

Merged
merged 7 commits into from
Jan 15, 2021
Merged

add join keyword to readdir #129

merged 7 commits into from
Jan 15, 2021

Conversation

ericphanson
Copy link
Member

This keyword was added to Base in JuliaLang/julia#33113, which made it into Julia 1.4.

I couldn't run the tests locally, but I tested with an example and it seemed to work. The new tests in verify_files(path::AbstractPath) need the Base method, so will fail on Julia 1 - 1.3. Should I put them behind a VERSION check, or just drop them?

@mattBrzezinski
Copy link
Member

bors try

bors bot added a commit that referenced this pull request Jan 12, 2021
@bors
Copy link
Contributor

bors bot commented Jan 12, 2021

try

Build failed:

@mattBrzezinski
Copy link
Member

I couldn't run the tests locally, but I tested with an example and it seemed to work.

Any particular errors which popped up? I assume that it's probably a credential thing. At the very least you need credentials in ENV VARs / ~/.aws/config / ~/.aws/credentials files with full S3 perms.

The new tests in verify_files(path::AbstractPath) need the Base method, so will fail on Julia 1 - 1.3. Should I put them behind a VERSION check, or just drop them?

I think keeping this behind a VERSION check would be best. Keeping packages on the same LTS as JuliaLang is the most easy way of handling things. Once 1.6 comes out as LTS, we can then dump the version check and set a new min ver.

@ericphanson
Copy link
Member Author

Any particular errors which popped up? I assume that it's probably a credential thing. At the very least you need credentials in ENV VARs / ~/.aws/config / ~/.aws/credentials files with full S3 perms.

Ah, this would be it, I'm on a restricted role. I could set something up but for this PR hopefully CI will be enough. Would you mind re-running bors? I think I fixed the issues in my last two commits.

@mattBrzezinski
Copy link
Member

bors try

bors bot added a commit that referenced this pull request Jan 12, 2021
@bors
Copy link
Contributor

bors bot commented Jan 12, 2021

try

Build failed:

@ericphanson
Copy link
Member Author

sorry, looks like I had a copy-paste error there, try again?

@mattBrzezinski
Copy link
Member

bors try

bors bot added a commit that referenced this pull request Jan 12, 2021
@bors
Copy link
Contributor

bors bot commented Jan 12, 2021

try

Build succeeded:

@ericphanson
Copy link
Member Author

Bump / Is anything I need to do here? 🙂

@mattBrzezinski
Copy link
Member

@rofinn has more strong opinions on S3Path stuff than I do.
I think this is fine, maybe locking the join=true as a separate function behind a VERSION check would be my only comment.

Copy link
Member

@rofinn rofinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to me. Just a couple minor suggestions, I've also made an issue to add this to FilePathsBase.jl as well.

@rofinn
Copy link
Member

rofinn commented Jan 14, 2021

@mattBrzezinski Do we need to manually trigger bors or something?

@mattBrzezinski
Copy link
Member

@mattBrzezinski Do the tests not run anymore for this repo? I know Sam O'Connor ran the live tests in his personal account, but I thought we changed that when we started using bors?

Yes they work w/ bors using the Invenia Public CI account. I assume you're referring to the original body. Eric had responded in this thread;

Ah, this would be it, I'm on a restricted role. I could set something up but for this PR hopefully CI will be enough.

@ericphanson
Copy link
Member Author

thanks for the review! I've made those changes.

@mattBrzezinski
Copy link
Member

bors r+

@bors
Copy link
Contributor

bors bot commented Jan 15, 2021

@bors bors bot merged commit 1f058fc into JuliaCloud:master Jan 15, 2021
@ericphanson ericphanson deleted the eph/readdir_join branch January 15, 2021 18:12
sort && sort!(results)

# Return results, possibly joined with the root path if join=true
return join ? joinpath.(fp, results) : results
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, one thing I just realized about this is that it introduces a type instability cause join determines whether we return strings or paths.

Copy link
Member Author

@ericphanson ericphanson Nov 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it does-- I've been using this and for me having the functionality has been well worth it, but it might depend on your use-case. Generally either I am using it in the REPL (in which case it doesn't matter), or in code that generally ends up like

for file in readdir(path; join=true)
    # some kind of filtering...
    endswith(string(file), ".arrow") || continue
    process_file!(results, file)
end

and the function-barrier (process_file!) means the instability is not an issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to remember, is there a reason you didn't want to use readpath? I'm wondering if for now I should just add those flags to both readpath and readdir, but just make the only difference that one return paths and one returns strings? In the future, I may make a breaking release which unifies the API and always returns paths for readdir.

Copy link
Member Author

@ericphanson ericphanson Nov 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, is that what readpath does? I didn't know that function exists and it's not really documented what it does:

julia> using FilePathsBase

help?> readpath
search: readpath

  readpath(fp::P) where {P <: AbstractPath} -> Vector{P}

Usually I stick to the Base API since I might have a mix of S3Paths and Base (string) paths.

(To be clear, join doesn't just return S3Path's vs strings, but rather it determines or not it's the full path, including the bucket etc, or just the filename).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(To be clear, join doesn't just return S3Path's vs strings, but rather it determines or not it's the full path, including the bucket etc, or just the filename).

Yeah, I was just wondering if it would have made more sense to do joinpath.(string(fp), results) in the readdir function and left returning paths to readpath?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather accept the type instability than to try to explain to users (and remember myself) why it gives back a full path but you can't actually use it to read bytes etc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I've opted away from the type instability in FilePathsBase since I have less control over how it'll be used. It also means I can support that keyword without introducing a breaking change. I may change readdir to always return a path type if a path type is given in the future... in which case I'll plan to update S3Path as well.

rofinn/FilePathsBase.jl#152

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may change readdir to always return a path type if a path type is given

is there a way to do that while matching Base’s semantics? Like a RelativePath type or something?

Copy link
Member

@rofinn rofinn Nov 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked into the RelativePath idea, and it's very verbose. Without a proper traits based approach I don't think that's gonna happen anytime soon. Mostly likely I'd just wrap the resulting string in an appropriate path type and expect subtypes to decide what they want to do. I think there's also an issue where folks want basename to return a path as well. Obviously that change would be very breaking and I'm not clear on the tradeoffs yet, so it probably won't happen right away.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants