feat(gw): Cache-Control: only-if-cached #9082
This implements the only-if-cached behavior documented in specs: https://github.com/ipfs/specs/blob/main/http-gateways/PATH_GATEWAY.md#cache-control-request-header https://github.com/ipfs/specs/blob/main/http-gateways/PATH_GATEWAY.md#only-if-cached-head-behavior
			return true
		}
	}
	return false
You have no happy path for a GET request, so it will exit at that point.
In other words:
Let's assume I only store the root block of a file.
You first stat the root block, get no error, the request is not HEAD, so you return false and the request continues as usual.
However, when the following code tries to actually send the UnixFS file, it will hit the network to fetch the children.
That's not compliant with my interpretation of the gateway spec:
> if the gateway already has the data

Well, we don't know at that point. We know we have the root, but more? Who knows?
I think there are two solutions here:
- For now, the spec is just making things up until it matches whatever our code does. You can add a SHOULD (RFC2119) to the spec sentence, accepting that cheap checks might be better.
- At that point, `handleOnlyIfCached` returns a dagstore object that the rest of the code uses to fetch the data. So the rest of the code tries to serve the request as usual, but using an offline dagstore.

Sadly, I think we cannot do the second option (even though I like it more) because we cannot send an HTTP error after starting to send the payload (why are we using HTTP again? browsers... 😞).
@Jorropo thanks for noting this.
I've made a conscious decision to only check the root block and use `block stat --offline` for everything (even DAG requests) instead of `dag stat --offline`, as the latter could act as an inexpensive DoS vector (`dag stat` on the Wikipedia DAG does not sound fun, even if all blocks are in the datastore).
The motivation for `only-if-cached` is to identify gateways which don't have any part of a DAG cached, and prioritize ones that do.
In that spirit, I agree that the spec should provide guidance to implementers, who will have to make a similar decision as I did (or implement a custom heuristic/index).
Opened ipfs/specs#297 to clarify spec, but let's merge this PR as-is.
@lidel I think that the performance gain of `block stat --offline` can be worth it. I'm OK with this, I just wanted the spec to be in accordance.
I didn't think you would `dag stat --offline`, but rather do `dagService = offlineBlockservice`, so the directory index code (or the file one if that's a file, ...) would run in offline mode. It wouldn't be a DoS vector since it's really the same thing as usual, but without networking.
Opened ipfs/specs#297 to clarify spec, but let's merge this PR as-is.
🥳
LGTM (needs spec fixups, but those are open)
This PR implements support for requests sent with the `only-if-cached` prerequisite, as documented in the Gateway specs:
- PATH_GATEWAY.md#cache-control-request-header
- PATH_GATEWAY.md#only-if-cached-head-behavior

Why do we need this?
- `only-if-cached` was not implemented yet)

How does this PR work?
Implementation from this PR is minimal: when `only-if-cached` is present in a request, the Gateway will try to read the block size for the requested CID using the API in offline mode (only the local blockstore).
- If the block is not available locally, `412 Precondition Failed` is returned.
- Otherwise, the `HEAD` method returns early with `200 OK` without doing any additional IO.