A little help with updating cache #24326

Closed
archon810 opened this issue Sep 1, 2019 · 27 comments

@archon810

Hi,

We recently launched AMP after a year of working on it (see https://www.androidpolice.com/2019/08/31/android-police-now-supports-google-amp/?amp), but realized that content updating doesn't work, and we need a little help.

https://www.androidpolice.com/.well-known/amphtml/apikey.pub was originally not served as plaintext, but that's since been rectified. It's possible that all we need is some sort of flush so that Google ingests it.

  1. https://developers.google.com/amp/cache/update-cache#update-rsa-keys states that to update the key, we can also ping https://example-com.<cache.updateCacheApiDomainSuffix>/r/s/example.com/.well-known/amphtml/apikey.pub which in our case should be https://www-androidpolice-com.cdn.ampproject.org/v/s/www.androidpolice.com/.well-known/amphtml/apikey.pub, but that's a 404.

  2. I tried joining https://amphtml.slack.com to ask the questions, but it seems to be only for @google.com users. Found https://bit.ly/amp-slack-signup and submitted a request. Hopefully that's still the right way to get in.

  3. Finally, the request to dump cache I'm trying to make is something like https://www-androidpolice-com.cdn.ampproject.org/update-cache/c/s/www.androidpolice.com/2019/08/31/android-police-now-supports-google-amp/%3famp?amp_action=flush&amp_ts=1567319166&amp_url_signature=Hkpfwk1YoewxjBAyaDEtIJ2EB9PWqBY_CeuuICQKmaPkCZ56zKdE9ROoiWdYfWaE-iUJjp2bX_cyHXl4jOqAkeKXeoJNjXwISSohPz_6E7nB4e94iLPxEGojDsEqvZ4ybSRBlsip1NNI5vXGBZKIPv-28GeoDtTGCyVxvPwnJRTn0POkQRfbWlM_hcLq9QlfVqV9w9jjm2TJ6K7Vk3NnsEsFqtsZsAsBbjYUVwiuxnCFNgyIljs8izLxySkWV8Ks6Z5ESMqVTruhSinc1iHB-bRuFQYzvM8JoiS9KoiWWRg4RgDHjNs2VKTx88kescVKgxl5BvwvgyOxKG7J0xAmK1, but it returns

403. That’s an error.

Your client does not have permission to get URL /update-cache/c/s/www.androidpolice.com/2019/08/31/android-police-now-supports-google-amp/%3Famp?amp_action=flush&amp_ts=1567319166&amp_url_signature=Hkpfwk1YoewxjBAyaDEtIJ2EB9PWqBY_CeuuICQKmaPkCZ56zKdE9ROoiWdYfWaE-iUJjp2bX_cyHXl4jOqAkeKXeoJNjXwISSohPz_6E7nB4e94iLPxEGojDsEqvZ4ybSRBlsip1NNI5vXGBZKIPv-28GeoDtTGCyVxvPwnJRTn0POkQRfbWlM_hcLq9QlfVqV9w9jjm2TJ6K7Vk3NnsEsFqtsZsAsBbjYUVwiuxnCFNgyIljs8izLxySkWV8Ks6Z5ESMqVTruhSinc1iHB-bRuFQYzvM8JoiS9KoiWWRg4RgDHjNs2VKTx88kescVKgxl5BvwvgyOxKG7J0xAmK1 from this server. (Client IP address: 73.189.194.95)

Invalid public key due to ingestion error: Invalid Content

That’s all we know.

Presumably, this points to an inability to read the key?

  4. For AMP urls that end with ?amp, do we need to send a request to https://example-com.<cache.updateCacheApiDomainSuffix>/update-cache/c/s/example.com/article%3famp?amp_action=flush... or https://example-com.<cache.updateCacheApiDomainSuffix>/update-cache/c/s/example.com/article?amp&amp_action=flush...?

  5. Regarding cache, do I understand it correctly that if we don't send a max-age cache-control header, the cache will never be updated, but if we do, it'll be updated once the max-age runs out without having to ping update-cache? Or do we still need to call update-cache every time we edit the content?

Thank you.
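
The 403 above names the public key as the problem, and the key was originally not served as plaintext. A minimal sketch of a pre-flight check on the key response follows. It assumes the cache wants an HTTP 200, a `text/plain` content type, and a PEM body of the kind `openssl rsa -pubout` produces; `key_response_problems` is a hypothetical helper, not part of any AMP tooling, and it takes already-fetched values so it makes no network calls.

```python
from typing import List

# Assumed PEM markers, as emitted by `openssl rsa -pubout`.
PEM_HEADER = "-----BEGIN PUBLIC KEY-----"
PEM_FOOTER = "-----END PUBLIC KEY-----"


def key_response_problems(status: int, content_type: str, body: str) -> List[str]:
    """Return human-readable problems with an apikey.pub response (empty if none found)."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    # Ignore parameters such as "; charset=utf-8" when comparing the media type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "text/plain":
        problems.append(f"expected text/plain, got {media_type or 'no content type'}")
    text = body.strip()
    if not (text.startswith(PEM_HEADER) and text.endswith(PEM_FOOTER)):
        problems.append("body does not look like a PEM-encoded public key")
    return problems
```

An empty list only means these three checks passed; it does not prove the cache has re-ingested the key.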

@archon810
Author

@codewiz @csLittleye @Gregable have been helpful with AMP update issues in the past. Thank you.

@archon810
Author

I was able to use a subdomain to test the flushing logic, and managed to get an "OK" response.

So at this point, that would resolve 3) above.

For 4) I found that it needs to be ?amp&amp_action=flush... and not the %3f version.

For 1) I still get a 404 for the subdomain I tried, so I'm still not sure what I'm doing wrong there.
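
For reference, the request shape that worked here (the page's own `?amp` query kept as-is, followed by `&amp_action=flush&amp_ts=...`) can be sketched as below. This is an illustration under assumptions, not the documented client: `cache_subdomain` is a simplified conversion that ignores IDN and long-host handling, `sign` is a hypothetical callable standing in for an RSA signer over the request path, and stripping the base64 padding is inferred from the signatures quoted in this thread.

```python
import base64
from typing import Callable
from urllib.parse import urlsplit


def cache_subdomain(host: str) -> str:
    """Simplified host-to-subdomain conversion: '-' becomes '--', then '.' becomes '-'."""
    return host.replace("-", "--").replace(".", "-")


def update_cache_url(
    amp_url: str,
    timestamp: int,
    sign: Callable[[bytes], bytes],  # hypothetical: returns the raw signature bytes
    cache_suffix: str = "cdn.ampproject.org",
) -> str:
    """Build a signed update-cache URL for an https:// AMP page."""
    parts = urlsplit(amp_url)
    if parts.scheme != "https":
        raise ValueError("this sketch only handles https:// origins (the /c/s/ form)")
    path = f"/update-cache/c/s/{parts.netloc}{parts.path}"
    # Keep the page's own query (e.g. "amp") unencoded, then append the flush parameters.
    query = "&".join(
        p for p in (parts.query, "amp_action=flush", f"amp_ts={timestamp}") if p
    )
    to_sign = f"{path}?{query}"
    raw_signature = sign(to_sign.encode("ascii"))
    # Web-safe base64; padding removed (assumption based on the URLs in this thread).
    signature = base64.urlsafe_b64encode(raw_signature).decode("ascii").rstrip("=")
    host = f"{cache_subdomain(parts.netloc)}.{cache_suffix}"
    return f"https://{host}{to_sign}&amp_url_signature={signature}"
```

Passing the signer in as a callable keeps the sketch free of any particular crypto library; in practice it would wrap whatever RSA implementation holds the private key.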

@archon810
Author

Someone or something seems to have flushed the key cache, and now update-cache requests to www.androidpolice.com succeed, so that's great.

However, I'm still not clear about something. It seems there are multiple copies of the cache created by various Google properties (like Discover, Chrome Discover, Google News, search), and there are subtle differences between them, such as

https://www-androidpolice-com.cdn.ampproject.org/c/s/www.androidpolice.com/2019/08/31/android-police-now-supports-google-amp/?amp

vs

https://www-androidpolice-com.cdn.ampproject.org/v/s/www.androidpolice.com/2019/08/31/android-police-now-supports-google-amp/?amp=&amp_js_v=0.1

Note the extra parameter as well as /v/ vs /c/. I'm not even sure what /v/ is supposed to stand for, as it's not part of this doc https://developers.google.com/amp/cache/overview.

These caches currently differ, even after we sent a cache-update request for https://www.androidpolice.com/2019/08/31/android-police-now-supports-google-amp/?amp, and it returned an OK:

https://www-androidpolice-com.cdn.ampproject.org/update-cache/c/s/www.androidpolice.com/2019/08/31/android-police-now-supports-google-amp/?amp&amp_action=flush&amp_ts=1567355288&amp_url_signature=Vzc2Mhzmk45PC5TDYsweVwQ74q1FbvAjgXzB-Mw0FrCyFWlNTQV5-8BW5ODUusxL9FnJ6DpitR1z7nb1CQjw1Zcnzz5XCoTNy3n2xLivyYSCLsmOZPzUqFUkMCrgIiPJSGDQkYfJU9Qzd6GDnso58xwzBkFEPuUBPFPCilZDMNpvPtYvNthkPunKsdO57JM7_qUiuWJw2M2atI6JMT-kmtaJ6YGeSkwNuESf5kT6nTI5CmUm5cox7aVUbqcgWI_sJ0cFbo5xIszmcG2UmbafmWHvFeSQyT5d69Q3VF_4kqW_8-r-WM6UBlBincRuWZWGwYEKijnv0Qhu_sJUoDIRWg

Appreciate any clarification.

@archon810
Author

archon810 commented Sep 1, 2019

Furthermore, we can't seem to get the content to update even with max-age=30, with update-cache pings or without.

As a test, I changed the link below to "working on for a few years".

https://www.androidpolice.com/2019/08/31/android-police-now-supports-google-amp/?amp:

image



https://www-androidpolice-com.cdn.ampproject.org/c/s/www.androidpolice.com/2019/08/31/android-police-now-supports-google-amp/?amp:

image

This is still the old link.

That's after

[01-Sep-2019 10:14:00 America/Los_Angeles] AMP Update Ping: Pinging https://www-androidpolice-com.cdn.ampproject.org/update-cache/c/s/www.androidpolice.com/2019/08/31/android-police-now-supports-google-amp/?amp&amp_action=flush&amp_ts=1567358039&amp_url_signature=AEGgwqccF75lhH5bS8teTlsbqUVUCE5oYTFVW7z1xnsDqVG2xJqjhDx-L4Tcq5Mb_N5nOhBBfS_uLnepPzfJGx4T72IW1SSTYIbRsBAoKOkf15uH6m_am6MZ5oERIcTm3jZab15ZrctuwEnFwPism7V_h3ZlUsoSCElIYxv4N3Qpc5KWZmzKUQ1JaVK1MN8ouPd_8pG6_YPPtDbXYeRHTxSkerxbj6p_6Mp7ic1fMMf_kzggVfODoDJjpS2hI90yWHrCJJpByBXQZDiQljZaM6TBYLX8XOet5KpRp4cfT0eH3vm0fcixLcj1hTcmc050EuEdsuzIEnrwDRvXqNkg6A because post 583130 was updated. Return status was 200.

Cache-control is max-age: 30. So why is it not updating? (By the time you see the cache page, it may already have updated, but we're trying to resolve the timeliness of such updates.)

@sebastianbenz
Contributor

Thanks for the detailed write-up. I'll try to answer a few of the questions:

  • re Cache URL: /v/ vs /c/. v returns a slightly different version of the document used by the AMP Viewer. Afaik you don't need to update this version via the update cache API. It's a good question though and we should document this (//cc @Gregable).
  • re Cache-control is max-age: 30: how long did you wait when testing? It can take up to 8(?) min until the updates are propagated. This used to be documented somewhere, but I couldn't find it either.

@archon810
Author

Thanks. The question is why /v/ isn't updating at all. For example, https://www-androidpolice-com.cdn.ampproject.org/v/s/www.androidpolice.com/2019/08/31/android-police-now-supports-google-amp/?amp=&amp_js_v=0.1 still hasn't received updates after several days, whereas https://www-androidpolice-com.cdn.ampproject.org/c/s/www.androidpolice.com/2019/08/31/android-police-now-supports-google-amp/?amp is at least getting updates.

re Cache-control is max-age: 30: how long did you wait when testing?

I want to say I waited around 10 minutes, but I'll have to test again. I expected it to refresh after 30 seconds though.

@archon810
Author

archon810 commented Sep 3, 2019

In general:

  1. How many url variations can a single post generate in various Google AMP caches? I'm talking both prefixes, like /v/ or /c/, as well as GET parameters.
  2. Should sending update-request to the main AMP url be sufficient to refresh all such caches, including ones with GET params?
  3. Is it possible to see all the cache urls generated for a specific post, via an API or some tool?
  4. Assuming our AMP pages return max-age 30, is it necessary to send update-caches or should the caches refresh by themselves after the listed expiration? It certainly hasn't been the case for us, and it's ambiguous to me from the documentation.
  5. Why can it take so long to update the cache, if it's set to max-age 30? Sometimes fixing a typo without letting thousands of people see it for 10 minutes is pretty crucial, and by relying on AMP, we give up the ability to make timely updates to our content compared to being able to update the site and instantly make any changes available. In comparison, for example, dumping a url from CloudFlare's cache is near-instant and should be the standard.

@Gregable
Member

Gregable commented Sep 3, 2019

Reported internally to Google as b/140420052

@archon810
Author

archon810 commented Sep 3, 2019

@sebastianbenz As a test this morning, we posted https://www.androidpolice.com/2019/09/03/android-website-updated-for-android-10/?amp and updated it with a distinct update that you can see marked by a yellow banner.

When I got to the post, ?amp was already showing the update, and I made sure to update it again (which internally triggers an update-cache request) to set our baseline. This was done at 9:48am.

image

The other 2 urls are:

At this point, at 10:08am, 20 minutes after the baseline, they still look like this - with no update in sight:
image

Update: 11:08am, 1hr 20 min later, and still no updates. In the meantime, there was another update to the original story:
image.

Headers on the cached AMP page show max-age:
image

@archon810
Author

3 hours later, still none of the updates have propagated to the AMP caches.

Here's the original update-cache ping from 9:48am:

[03-Sep-2019 09:48:48 America/Los_Angeles] AMP Update Ping: Pinging https://www-androidpolice-com.cdn.ampproject.org/update-cache/c/s/www.androidpolice.com/2019/09/03/android-website-updated-for-android-10/?amp&amp_action=flush&amp_ts=1567529328&amp_url_signature=tNFJQtqv5zpQjemlQEckZ1UcLwtm1dBDq7b_gaPqRjP5lDmHSA3jbNrmSsdaKBazTrAjykpe7yyWn48l3mI6apgIiksQLWdk4cG1NSRHfaJe992YobZm3knN_Yk-2gVvZNRJ44hqwX-gQ6zYVe0t8quKdyEHA_BtWJkN2nj4ytF19IVdcpsMhNVxk6QqR47SwqB92-nilihAqzgYzXHONkFOxA7jhNTmn3jfwIT5rtFyNUTJxZ6WsIAZeacluSoJ4haw73bRJdYIMe7MsPNAR1Roi2ocL2YYfZa6DbwYS_VcKFbP0qXdoQ7FVStja8odunzjBSva8JkWGM6QaVVjyg because post 583538 was updated. Return status was 200.

@Gregable
Member

Gregable commented Sep 5, 2019

One issue observed (discussed over slack, but adding here for posterity) is that the /v/ and /c/ URLs listed are different publisher URLs.

The /c/ URL is fetching ?amp and the /v/ URL is fetching ?amp=. These are different URLs to the AMP cache and thus different cache entries, with different lifetimes. The update-cache ping only affected the url with ?amp which matches the /c/ request, hence why that one was updated.
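
To make the mismatch concrete, here is a small sketch that recovers the publisher URL from a cache URL while keeping the query string verbatim. It assumes that `amp_`-prefixed parameters such as `amp_js_v` are added by the viewer and are not part of the publisher URL; that assumption is inferred from this thread, and `publisher_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def publisher_url(cache_url: str) -> str:
    """Recover the https origin URL from a /c/s/ or /v/s/ cache URL, query kept verbatim."""
    parts = urlsplit(cache_url)
    segments = parts.path.split("/", 3)  # ['', 'c' or 'v', 's', 'host/path...']
    if len(segments) < 4 or segments[1] not in ("c", "v") or segments[2] != "s":
        raise ValueError("expected a /c/s/ or /v/s/ cache path")
    # Assumption: amp_* parameters are viewer additions, so drop them.
    kept = [
        token
        for token in parts.query.split("&")
        if token and not token.split("=", 1)[0].startswith("amp_")
    ]
    query = "&".join(kept)
    return f"https://{segments[3]}" + (f"?{query}" if query else "")
```

Applied to the two URLs from this issue, the /c/ form yields a URL ending in `?amp` and the /v/ form one ending in `?amp=`. A plain string comparison treats those as different keys, which is consistent with the explanation above.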

@archon810
Author

archon810 commented Sep 5, 2019

To follow up on Greg's reply, there seems to be a related bug somewhere in AMP CDN code which sometimes redirects urls with ?...&amp to ?amp=, thereby appending the = sign and causing stale cache to be served because the cache key now includes =, and the cache we're busting with update-cache does not.

For example:
https://www-androidpolice-com.cdn.ampproject.org/v/s/www.androidpolice.com/2019/09/05/google-says-itll-soon-release-a-new-android-auto-for-phone-screens-app/?amp_js_v=a2&amp_gsa=1&amp
gets redirected to https://www-androidpolice-com.cdn.ampproject.org/v/s/www.androidpolice.com/2019/09/05/google-says-itll-soon-release-a-new-android-auto-for-phone-screens-app/?amp=&amp_js_v=0.1

Note how amp_gsa=1 disappears, and &amp moves to the front and becomes amp=.

@westonruter
Member

cf. ampproject/amp-wp#1383 where we are thinking to cause the AMP plugin to generate paired URLs with ?amp=1 instead of ?amp, or at least allow it to be configurable: ampproject/amp-wp#2204.
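
One way to see why `?amp=1` would be more robust than a bare `?amp`: a generic parse-and-reserialize step rewrites a valueless key as `amp=`, while a key with an explicit value survives unchanged. The sketch below uses Python's standard library to show that effect. It does not claim to reproduce what the AMP CDN actually does internally.

```python
from urllib.parse import parse_qsl, urlencode


def reserialize(query: str) -> str:
    """Parse a query string and write it back out, as a generic URL library would."""
    return urlencode(parse_qsl(query, keep_blank_values=True))
```

With this helper, `reserialize("amp")` gives `amp=`, whereas `reserialize("amp=1")` gives `amp=1` back unchanged.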

@archon810
Author

I was also told today (by @Gregable) that we should be using s-maxage and not max-age. I was not aware of this, since https://developers.google.com/amp/cache/update-cache only talks about max-age. This is a documentation issue to be fixed IMO.

@Gregable
Member

Gregable commented Sep 5, 2019

I was also told today (by @Gregable) that we should be using s-maxage and not max-age.

Both are fine. The important element is that s-maxage is intended to override max-age for intermediary caching.
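
As a sketch of that precedence rule: a shared (intermediary) cache reading a Cache-Control header would prefer `s-maxage` and fall back to `max-age`. The parser below is deliberately minimal and `shared_cache_ttl` is a hypothetical name; it says nothing about minimum lifetimes a particular cache may impose on top.

```python
from typing import Optional


def shared_cache_ttl(cache_control: str) -> Optional[int]:
    """Return the freshness lifetime in seconds a shared cache would read, or None."""
    directives = {}
    for item in cache_control.split(","):
        name, _, value = item.strip().partition("=")
        directives[name.lower()] = value.strip('"')
    # s-maxage overrides max-age for shared caches.
    for name in ("s-maxage", "max-age"):
        value = directives.get(name, "")
        if value.isdigit():
            return int(value)
    return None
```

So a publisher can keep a long `max-age` for browsers and still give intermediaries a short lifetime, for example `max-age=600, s-maxage=30`.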

@archon810
Author

I just remembered what confused me further previously. https://developers.google.com/amp/cache/overview:

Google AMP Cache updates
When a user requests an AMP document from the Google AMP Cache, the cache automatically requests updates in order to be able to serve fresh content for the next user once the content has been cached. With this model, updates to AMP documents propagate automatically and quickly; few users will see the non-updated version after your update.

The cache follows a "stale-while-revalidate" model. It uses the origin's caching headers, such as Max-Age, as hints in deciding whether a particular document or resource is stale. When a user makes a request for something that is stale, that request causes a new copy to be fetched, so that the next user gets fresh content.

To limit the amount of load it generates for publisher sites, the Google AMP Cache considers any document fresh for at least 15 seconds, and any resource fresh for at least 1 minute. Note that those numbers may change in the future, as we tune the cache for optimum balance between freshness and load on publisher sites.

This document makes it seem like update-cache isn't needed at all, and with the right headers (such as s-maxage or max-age), the caches are updated automatically and in a timely fashion of seconds or minutes, which I don't believe to be the case.
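
For readers unfamiliar with the term, here is a toy model of the "stale-while-revalidate" behavior the quoted overview describes: a stale entry is still served to the current request, and the refreshed copy only reaches the next one. It is a simplified sketch, not the AMP Cache implementation. The 15-second floor is taken from the quoted text; the class name, the `fetch` callback, and the synchronous refresh are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple

MIN_FRESH_SECONDS = 15  # floor for documents, per the overview quoted above


class StaleWhileRevalidateCache:
    """Toy cache: serves the stored copy, refreshing stale entries for the next request."""

    def __init__(
        self,
        fetch: Callable[[str], Tuple[str, int]],  # hypothetical: url -> (body, max_age)
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float, int]] = {}

    def _store(self, url: str, now: float) -> str:
        body, max_age = self._fetch(url)
        self._entries[url] = (body, now, max(max_age, MIN_FRESH_SECONDS))
        return body

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is None:
            return self._store(url, now)  # first request: nothing cached yet
        body, fetched_at, ttl = entry
        if now - fetched_at > ttl:
            self._store(url, now)  # stale: refresh, but this caller still gets the old copy
        return body
```

Under this model an edit becomes visible only after the entry has expired and one further request has triggered the refresh, so a rarely requested URL can stay stale well past its max-age.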

Clarifications from the AMP core team would be greatly appreciated.

@archon810
Author

Related: #19988.

@patrickkettner Any updates on the above questions? Thanks.

@seomaz

seomaz commented Oct 26, 2019

Related: #25264 ?

@archon810
Author

Hi, we're trying to plan ahead with converting the whole androidpolice.com site to AMP, now that we have single posts working. But now we have AMP CDN caching concerns. Imagine how many pages we'd actually need to purge after a single post is published.

Every single pagination page for the homepage, categories, tags, author, and any matching search terms would need to be updated. There's no way WP even knows all of them, let alone W3 itself. Such pages are probably left to expire and regen on their own, which, tbh, is fine - our W3 caching is set to like 30 minutes and I can live with pagination not working 100% on pages 5+.
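
To give a sense of the fan-out, the sketch below lists archive pages a single new post could touch. It assumes WordPress's default permalink layout (`/page/N/`, `/category/<slug>/`, `/tag/<slug>/`, `/author/<name>/`), which may not match this site's configuration; `archive_urls_to_refresh` and its `max_page` cutoff are hypothetical.

```python
from typing import Iterable, List


def archive_urls_to_refresh(
    site: str,
    categories: Iterable[str],
    tags: Iterable[str],
    author: str,
    max_page: int = 3,
) -> List[str]:
    """List archive URLs affected by a new post, up to max_page pages per archive."""
    root = site.rstrip("/") + "/"
    bases = [root]
    bases += [f"{root}category/{slug}/" for slug in categories]
    bases += [f"{root}tag/{slug}/" for slug in tags]
    bases.append(f"{root}author/{author}/")
    urls = []
    for base in bases:
        urls.append(base)  # page 1 is the archive root itself
        urls += [f"{base}page/{n}/" for n in range(2, max_page + 1)]
    return urls
```

Even with a cutoff of three pages, a post with two categories and five tags yields 27 URLs, before counting search results.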

But what about AMP caching? Will it actually re-query the source properly without a call to update-cache or will it cache all these pages forever until basically the next post happens to bust that cache?

I wish there was a way to tell AMP that every page can only be cached for up to, say, 5-30 minutes, and have it refresh from source no matter update-cache call or not.

This is actually a potential roadblock for our plan to convert the whole site and not just post pages to AMP.

Any advice here?
Thanks.

@morsssss
Contributor

morsssss commented Jan 7, 2020

Just pinged b/140420052, letting people know that this issue persists

@morsssss
Contributor

Update: the team that works on the AMP Cache is actively investigating this to see if there's an issue.

@nbcsteveb

Hello,

We've had similar struggles getting up and running on our end.

I've written some sample code to demonstrate the issue here: https://gist.github.com/nbcsteveb/53f13a7f7704e446a22f89dfdd94b050

Response comes back:

Status Code: 403
Invalid public key due to ingestion error: 404 or 410 error from origin

I've reached out to our internal contact and they've advised me to comment in this thread for further assistance.

Thank you.

@tannerbaum

tannerbaum commented Apr 28, 2020

Hello,

We've had similar struggles getting up and running on our end.

I've written some sample code to demonstrate the issue here: https://gist.github.com/nbcsteveb/53f13a7f7704e446a22f89dfdd94b050

Response comes back:

Status Code: 403
Invalid public key due to ingestion error: 404 or 410 error from origin

I've reached out to our internal contact and they've advised me to comment in this thread for further assistance.

Thank you.

I'm also stuck at this point (on top of the cache not respecting the max-age header). I suppose the other issues suggest that the problem could be that the public key isn't robotable, but I definitely made an effort to make sure that's not the case.

What's weird is that if I access the url that server side renders the articles, and hard refresh, every once in a while it will load different versions of the article. And all of those can differ from what is shown in the AMP viewer through a google search.

Here are the response headers from the ssr link:

HTTP/1.1 200 OK
Server: openresty/1.15.8.1
Content-Type: text/html; charset=utf-8
x-powered-by: https://www.[redacted]next.de/
Access-Control-Allow-Origin: https://www-[redacted]-de.cdn.ampproject.org
Access-Control-Allow-Credentials: true
ETag: W/"13c0e-acqO6H3iH1PCS4mvpwVMMCeUIkU"
Strict-Transport-Security: max-age=15724800; includeSubDomains
Content-Encoding: gzip
Content-Length: 26030
Cache-Control: max-age=623
Date: Tue, 28 Apr 2020 16:09:30 GMT
Connection: keep-alive
Vary: Accept-Encoding

Could share the generated update-cache url if that helps.

@morsssss did you ever find an issue in your investigation?

Edit: In my case, our main issue was apollo client caching the data. That compounded with our own caching and then AMP to lead to a bad outcome. But ultimately it wasn't an amp issue at all.

@morsssss
Contributor

morsssss commented May 1, 2020

@tannerbaum , this thread ended up spanning a number of cache-related questions. We should really close it so that new reports like yours can get noticed! (I'm sorry it didn't.) Usually these come down to various complexities in making the request with the right URL, making sure servers are available to serve fresh content when it's needed, etc.

I can only ask if @Gregable spots any issue here.

@tannerbaum

@morsssss ha I definitely see what you are saying about this issue. I didn't want to make another issue for you guys at the risk of adding to the noise, but maybe I will go ahead and try to lay out everything I can in a new issue, including writing the actual url being generated. Thanks for your response!

@netnaps

netnaps commented Aug 15, 2020

Hi, I was pointed to this post by @westonruter (thanks to him) via a WordPress issue I created at the AMP Plugin Support here

I have gone through many posts on clearing the AMP Cache, including the post titled "Keeping things fresh with stale-while-revalidate".

I have tried this on my website, which is an AMP Native Site.

To check, I am using a Quote of the Day in the Footer that's updated daily with a cron automatically.

To construct the cache URL, I followed the same pattern, which is https://cdn.ampproject.org/c/s/netnaps.com for my home page. On the server side I added: Header set Cache-Control "max-age=15, public,stale-while-revalidate=59"

Using files match, I removed public and then tried refreshing, and that worked for me! The page was updated manually.

For a bulk update, I don't have many URLs, but if I did I could write a script and set up a cron job to ping the URLs in bulk automatically. Would that harm the site's standing with Google? I am not sure, and I could not find an answer on the web.

I also followed this page from Google here, which says the same about "stale-while-revalidate".

I would look forward to more clarity on the same issue for serving fresh content by purging cache.

If this can be done via the default plugin it would be great.

@stale

stale bot commented Feb 12, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Stale Inactive for one year or more label Feb 12, 2022
@stale stale bot closed this as completed Feb 20, 2022