Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gateway returns object that did not change instead of 304 (Not Modified) #848

Closed
vstax opened this issue Sep 26, 2017 · 11 comments
Closed

Comments

@vstax
Copy link
Contributor

vstax commented Sep 26, 2017

I'm not 100% sure this is gateway problem and not configuration problem on my side - please advise if it is.

I have custom http headers configured as follows:

location bod6 {
    expires    1h;
    add_header Cache-Control public;
}

I access gateway (S3 mode) through squid. Squid caches objects for an hour. When an hour passes, it asks gateway if the object is changed by specifying "If-Modified-Since" (check through date) and "If-None-Match" (check through ETag) headers. However, instead of returning 304 gateway sends the whole data.
Request:

GET /bod6/00/00/94/00009444e8271f3d852257a39651827d34c20cd76b80b41c58fc454a89d6140b034d957b76234a451c129454fffb6d240040080000000000.xz HTTP/1.1
If-Modified-Since: Sat, 23 Sep 2017 04:30:16 GMT
If-None-Match: "ab8d12fb796ea4c868f5c1fca67abba4"
Accept-Encoding: identity
X-Amz-Content-SHA256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20170926T153222Z
Authorization: AWS4-HMAC-SHA256 Credential=0fc67b42677ded00cc46/20170926/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=b9524900b0853187a04e4318a7273090ae32e2be6670460734c90118c7b74b7a
User-Agent: Boto3/1.4.6 Python/2.7.5 Linux/2.6.32-042stab123.3 Botocore/1.6.0
Host: bodies.selectel.cloud.lan:8090
Via: 1.1 davproxy-1.selectel.cloud.lan (squid/3.5.20)
X-Forwarded-For: 192.168.130.121
Cache-Control: max-age=259200
Connection: keep-alive

Reply:

HTTP/1.1 200 OK
Connection: keep-alive
Date: Tue, 26 Sep 2017 15:32:21 GMT
Content-Length: 126740
Server: LeoFS
Content-Type: application/octet-stream
ETag: "ab8d12fb796ea4c868f5c1fca67abba4"
Last-Modified: Sat, 23 Sep 2017 04:30:16 GMT
X-From-Cache: True/via memory
Cache-Control: public, max-age=3600

(object data follows)

Since client sends If-Modified-Since: Sat, 23 Sep 2017 04:30:16 GMT and server knows that Last-Modified: Sat, 23 Sep 2017 04:30:16 GMT, this means that If-Modified-Since evaluates to false (object wasn't modified after Modified-Since date), and according to RFC it must send 304 (Not Modified) without data instead of 200 with data: https://tools.ietf.org/html/rfc7232#section-3.3

Alternatively, if server knows how to work with ETag and If-None-Match header, it must check that one instead of If-Modified-Since. Here we have If-None-Match: "ab8d12fb796ea4c868f5c1fca67abba4 from client and ETag: "ab8d12fb796ea4c868f5c1fca67abba4" on server, which means that objects match, condition evaluates to false and according to RFC it must sent 304 as well (https://tools.ietf.org/html/rfc7232#section-3.2).

@mocchira
Copy link
Member

@vstax Thanks for reporting.

It turned out that we have If-Modified-Since aware code block available only if cache.http_cache = true in leo_gateway.conf as below.

Instead it should be available regardless of its setting.
we will fix soon.

Alternatively, if server knows how to work with ETag and If-None-Match header, it must check that one instead of If-Modified-Since. Here we have If-None-Match: "ab8d12fb796ea4c868f5c1fca67abba4 from client and ETag: "ab8d12fb796ea4c868f5c1fca67abba4" on server, which means that objects match, condition evaluates to false and according to RFC it must sent 304 as well (https://tools.ietf.org/html/rfc7232#section-3.2).

Do you think we should implement If-Non-Match aware in addition to If-Modified-Since one?
(we are pleased to implement if squid (or other cache servers like varnish) can request with If-Non-Match to leofs)

@vstax
Copy link
Contributor Author

vstax commented Sep 28, 2017

@mocchira I see, thank you for identifying the problem. I read somewhere in documentation or issues that "http cache" mode in gateway isn't much used / tested so never tried enabling it.

Do you think we should implement If-Non-Match aware in addition to If-Modified-Since one?
(we are pleased to implement if squid (or other cache servers like varnish) can request with If-Non-Match to leofs)

As I understand it, If-Modified-Since is HTTP/1.0 legacy, and fully HTTP/1.1 complaint client and server are supposed to use ETag and If-None-Match instead. It works more precisely and solves several problems of If-Modified-Since like avoiding false positive when object was updated with the same copy (new Last-Modified date but same ETag). Squid supports ETag and If-None-Match for several years (since around HTTP/1.1 support was merged, i.e. probably 3.0 or 3.2) and looking at documentation, Varnish does as well. Most CDNs use If-None-Match too nowadays, I think. So yes, it probably would be nice to have If-None-Match support. It's more tricky header to support properly, though (like RFC and that Varnish documentation show, there are some special cases like passing multiple ETags or "*").

@mocchira
Copy link
Member

@vstax

I see, thank you for identifying the problem. I read somewhere in documentation or issues that "http cache" mode in gateway isn't much used / tested so never tried enabling it.

Right. we don't recommend to enable it.

As I understand it, If-Modified-Since is HTTP/1.0 legacy, and fully HTTP/1.1 complaint client and server are supposed to use ETag and If-None-Match instead. It works more precisely and solves several problems of If-Modified-Since like avoiding false positive when object was updated with the same copy (new Last-Modified date but same ETag). Squid supports ETag and If-None-Match for several years (since around HTTP/1.1 support was merged, i.e. probably 3.0 or 3.2) and looking at documentation, Varnish does as well. Most CDNs use If-None-Match too nowadays, I think. So yes, it probably would be nice to have If-None-Match support. It's more tricky header to support properly, though (like RFC and that Varnish documentation show, there are some special cases like passing multiple ETags or "*").

Thanks. we will also support part of If-None-Match (except some special cases).

@mocchira
Copy link
Member

mocchira commented Oct 6, 2017

It turned out that this long standing issue #216 prevents us from implementing If-None-Match against objects multipart-uploaded properly so we will implement only If-Modified-Since for now.

@mocchira
Copy link
Member

mocchira commented Oct 6, 2017

@vstax Please give it a try #865 when you have time.

@vstax
Copy link
Contributor Author

vstax commented Oct 11, 2017

@mocchira
I've tested it and I can see two problems. First is the fact that "Content-Length: 0" header is present for "304 Not Modified" is incorrect and can confuse clients, according to RFC (there is better explanation at ninenines/cowboy#1153). There are some examples that people are providing showing that some clients don't like it because it re-defines original header with wrong value; all clients work fine when "304" is returned with "Content-Length: (original length)", through RFC recommends always omitting it for the sake of shortening the response. (EDIT: apparently, some clients don't like that as well - https://bugs.chromium.org/p/chromium/issues/detail?id=79556; omitting this header seems to be the only way to do it correctly).

Another problem is that "Cache-Control" (added through headers option) isn't present in "304" response. It must be present according to RFC (clients want it to update internal date for next expiration), here is example of what happens when it's not there: https://stackoverflow.com/questions/1587667/should-http-304-not-modified-responses-contain-cache-control-headers
This is pretty serious because since clients may interpret 304 without cache-control as reply with cache-control and max-age=0, it causes them to do request every time after getting "304" once. In some scenarios, this can make it worse than current behaviour without support for 304...

In practice, squid isn't affected by both problems (it keeps original max-age), but I think it's just some workaround implemented in there and fixing at least missing "Cache-Control" is important.

@mocchira
Copy link
Member

@vstax Thanks for checking and reporting the problems. both should be fixed so I will update PR for the Cache-Control one. Since Content-Length one needs the fix on cowboy, I will send another PR on leo-project/cowboy.

@mocchira
Copy link
Member

mocchira commented Oct 12, 2017

Updated #865 to fix Cache-Control problem.

@mocchira
Copy link
Member

Sent leo-project/cowboy#2 to fix Content-Length problem.

@vstax
Copy link
Contributor Author

vstax commented Oct 16, 2017

@mocchira Both issues are fixed and everything seems correctly now - thank you!

Just one note, unrelated to this ticket, but since we're at it: apparently, according to RFC, Content-Length should be omitted in case of "204" reply as well:

A server MUST NOT send a Content-Length header field in any response with a status code of 1xx (Informational) or 204 (No Content). A server MUST NOT send a Content-Length header field in any 2xx (Successful) response to a CONNECT request (Section 4.3.6 of [RFC7231]).

Actually this is even stricter requirement than for 304, as RFC allows to have Content-Length for 304 if it matches the original object's Content-Length but forbids it for 204 in all cases.

204 reply is sent for delete, delete bucket, abort MU operations:

DELETE http://bod6.s3.amazonaws.com/00/00/e4/0000e47e2ba12f37a127ef001e2ebd5ab38318ccf6861e20e72b5bf1e5cad202d26efd17753264d539bea585e355cfd300d4070000000000.xz HTTP/1.1
Host: bod6.s3.amazonaws.com
Accept-Encoding: identity
Content-Length: 0
x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Authorization: AWS4-HMAC-SHA256 Credential=0fc67b42677ded00cc46/20171016/US/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=f4d050ba55a6bf92ac833dc5060bf5b7a51bc341a0679f89a0c23588813c8c1e
x-amz-date: 20171016T175401Z


HTTP/1.1 204 No Content
Date: Mon, 16 Oct 2017 17:54:01 GMT
Content-Length: 0
Server: LeoFS

Maybe it's a good idea to extend removal of Content-Length to 204 as well to confirm with RFC. I have no idea if having that header can cause any real problems, though.

If If-None-Match support won't be implemented as part of this ticket, it's OK to close this?

@mocchira
Copy link
Member

@vstax

Maybe it's a good idea to extend removal of Content-Length to 204 as well to confirm with RFC. I have no idea if having that header can cause any real problems, though.

Thanks for pointing this out. I will file as another issue and after we confirm that all test suites for aws-sdk we've supported passed, we will patch responding 204 without Content-Length.

If If-None-Match support won't be implemented as part of this ticket, it's OK to close this?

Yes please.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants