Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allowing S3 virtual paths and S3 aliases as data source #1756

Merged

Conversation

mapk-amazon
Copy link
Contributor

Related issue: samtools/samtools#1984

The two main changes are:

  • In s3_open_v4 an additional hopen is called if a redirect (status code between 300 and 399) is detected.
  • In redirect_endpoint_callback the url is parsed and a new one constructed respecting the virtual path style in Amazon S3 links.

Some minor changes:

  • Saving the address style of S3 web links in s3_auth_data to allow the use in redirect_endpoint_callback
  • Replacing the hardcoded initial value {0, 0, NULL} by KS_INITIALIZE for various variables.

Currently only s3 pathing works with a redirect;
With this fix s3 virtual gets a virtual redirect url;
additionally, performs an additional redirect if necessary.
Copy link
Member

@daviesrob daviesrob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a first glance, this mostly looks OK apart from the file handle leakage. I have still to check it against the scenario described in #1984, though.

hfile_s3.c Outdated
@@ -1298,6 +1311,20 @@ static hFILE *s3_open_v4(const char *s3url, const char *mode, va_list *argsp) {

if (fp == NULL) goto error;

if (http_response >= 300 && http_response < 400) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think doing this for all 300 codes might be a bit over-tolerant. It would be safer to check for the specific codes we expect for this sort of redirection.

Also, you need an hclose_abruptly(fp); here to avoid leakage of the old connection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi. Thank you for the review!

I specified in the latest commit the http_response to just check for 307. Also, I (hopefully) fixed the leakage.

@daviesrob daviesrob self-assigned this Mar 4, 2024
@daviesrob
Copy link
Member

Thanks, I can confirm that this works in my tests.

@daviesrob daviesrob merged commit 7d3efee into samtools:develop Mar 7, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants