-
Notifications
You must be signed in to change notification settings - Fork 33
When a HTTPFileHandle has an ETag, and that changes over time, consider the read unreliable #111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…er the read unreliable
This solves https://github.com/duckdblabs/duckdb-internal/issues/5786, where if a file is changed mid-read we would not detect it. Currently there is a precedence for just throwing a:
when a file is queried on an range that is not compatible with new length. I think throwing HTTPException is OK here, main question I have is if we need to allow this to be skipped (say a "unreliable_etag" option). |
@carlopi can you merge this to v1.4-andium? Or does this need to also be in main? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have other ways to test this for existing httpfs workflows? Is it possible there are sometimes false positives?
Otherwise just the error on the comment
out_offset = 0; | ||
|
||
if (response.HasHeader("ETag") && !hfh.etag.empty() && response.GetHeaderValue("ETag") != hfh.etag) { | ||
throw HTTPException(response, "ETag was initially %s and now it returned %s, this likely means the remote file has changed.\nTry restart the read / close file-handle and restart (e.g. `DETACH` in case of database file).", hfh.etag, response.GetHeaderValue("ETag")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
\nTry to restart the read or close the file-handle and read the file again (e.g. `DETACH` in the file is a database file).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably remove the file metadata from the HTTPFS cache so that we are sure to do a new HEAD
request upon the next time we read the file (important for when enable_http_metadata_cache
is enabled).
I would probably prefer this out in nightly for 1.3.2, error mode is potentially very bad AND allows to test this already.
That's a big question, I am not sure to what degree this can be really tested.
I am not sure, one positive thing is that |
I think main points to the v1.4 branch, so nightlies should stil be built/tested. Or are nightlies still building the v1.3-ossivalis branch? Wouldn't it require a bump in submodule pointer anyway?
|
if (response.HasHeader("ETag") && !hfh.etag.empty() && response.GetHeaderValue("ETag") != hfh.etag) { | ||
throw HTTPException(response, "ETag was initially %s and now it returned %s, this likely means the remote file has changed.\nTry restart the read / close file-handle and restart (e.g. `DETACH` in case of database file).", hfh.etag, response.GetHeaderValue("ETag")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samansmink's: we do add an option, and error message
I am not completely sure what should be the mitigation strategy here, should there be a setting to disable this check?