Expiration changes hanging deploys #2224

moonwell-octavius · 2022-03-18T02:53:17Z

🐛 Bug report

Describe the bug

Hi there, we use wrangler via https://github.com/cloudflare/wrangler-action to deploy a workers-site, and our builds stopped working within the past few hours.

Diving into why I notice that wrangler 1.19.9 was just cut a few hours ago.

Built successfully, built project size is 13 KiB.

... snip ...

 Deleting stale files...
Error:  Code 10033: bulk put: 'Invalid expiration of 1647570824. Please specify integer greater than the current number of seconds since the UNIX epoch.'

This makes me think that the recent change is related. When I downgrade by setting wranglerVersion: '1.19.8' in my github actions config things work as expected.

Reproduce the bug

I suspect if you deploy a site with other assets already uploaded it'll choke, not sure we do anything special that would solicit this behavior.

Expected behavior

I expect the API not to throw an error when running wrangler to deploy my site.

Environment and versions

Fill out the following information about your environment.

operating system: Github Actions
output of wrangler -V: v1.19.9
output of node -v: https://github.com/cloudflare/wrangler-action
content of wrangler.toml: happy to share privately

The text was updated successfully, but these errors were encountered:

fullheart · 2022-03-18T08:36:00Z

We have the same problem with Version v1.19.9. Thank you for fixing it.

arthurhamon · 2022-03-18T14:26:51Z

We encounter the same problem, we revert to previous version 1.19.8

jplhomer · 2022-03-18T14:31:25Z

Investigating a bit, and we discovered that chrono::Utc::now().timestamp() + (60 * 5); when running in GitHub Actions containers tends to output a timestamp that is 10 minutes behind the actual time. This leads to the bulk put KV API error.

The answer might be to use expiration_ttl instead, but sounds like that also has implications cloudflare/workers-sdk#587 (review)

Another way to improve this would be to catch the error and stop the GH Action to prevent hanging 💰

nsinghbakkt · 2022-03-18T18:29:05Z

same we are stuck at the moment all of our deploys are failing .. Please advise

fullheart · 2022-03-18T20:53:40Z

We workarounded the problem by downgrading the package version in our bitpucket pipeline to 1.19.8:

package.json: wrangler: 1.19.8 (fixed Version)
bitbucket-pipelines.yml: yarn global add wrangler@1.19.8

threepointone · 2022-03-19T13:33:00Z

This sucks, sorry about this y'all. We'll have a look on monday. The current workaround is to use 1.19.8

This switches how we expire static assets with `[site]` uploads to use `expiration_ttl` instead of `expiration`. This is because we can't trust the time that a deploy target may provide (like in cloudflare/wrangler-legacy#2224).

Deployments are failing due to: cloudflare/wrangler-legacy#2224

* Add offensive 360 sponsoring offensive360.com joined as our sponsor. I added logo in aside and readme. * Add logo asset * Update wrangler action * Use wrangler 1.19.8 Deployments are failing due to: cloudflare/wrangler-legacy#2224 * Fix lighthouse * Use actions/cache@v2 * Remove cache

petebacondarwin · 2022-03-20T15:57:33Z

Urgh! So we can't rely on the current time given in a Docker container to be accurate!!
I think using expiration_ttl will solve this.

Due to cloudflare/wrangler-legacy#2224

#2224

closes #2224

caass · 2022-03-21T17:46:32Z

This should be fixed with the latest release, v1.19.10. Can anyone confirm if this works for them or is still broken?

caass · 2022-03-21T17:56:09Z

Reopening until we have confirmation that this is fixed

fwwieffering · 2022-03-21T18:43:44Z

@caass I have a worker with ~ 3900 files in the KV store. These built up over the last few days when deleting files failed.

I don't think its indicative of the patch not working necessarily - but for KV stores of this size wrangler publish times out.

2022-03-21T17:49:17.8200716Z Using API Token authentication
2022-03-21T17:49:19.8534328Z /usr/local/bin/wrangler -> /usr/local/lib/node_modules/@cloudflare/wrangler/run-wrangler.js
2022-03-21T17:49:19.8534762Z /usr/local/bin/wrangler1 -> /usr/local/lib/node_modules/@cloudflare/wrangler/run-wrangler.js
2022-03-21T17:49:19.8593163Z 
2022-03-21T17:49:19.8593462Z > @cloudflare/wrangler@1.19.10 postinstall /usr/local/lib/node_modules/@cloudflare/wrangler
2022-03-21T17:49:19.8595009Z > node ./install-wrangler.js
2022-03-21T17:49:19.8595192Z 
2022-03-21T17:49:28.8701620Z 
2022-03-21T17:49:28.8704353Z found 1 high severity vulnerability
2022-03-21T17:49:28.8704726Z   run `npm audit fix` to fix them, or `npm audit` for details
2022-03-21T17:49:30.3936383Z  Built successfully, built project size is 12 KiB.
2022-03-21T17:49:32.9037406Z  Using namespace for Workers Site "<kv namespace>"
2022-03-21T17:49:38.8310088Z  Success
2022-03-21T17:49:38.8310297Z  Uploading site files
2022-03-21T17:49:43.7417744Z  Successfully published your script to
2022-03-21T17:49:43.7418254Z  <url>/* => stayed the same
2022-03-21T17:49:43.7418382Z 
2022-03-21T17:49:43.7418453Z  Deleting stale files...
2022-03-21T18:38:53.8751263Z Error: error sending request for url (https://api.cloudflare.com/client/v4/accounts/<account id>/storage/kv/namespaces/<namespace id>/values/assets/index.f268669f.df4cc6c294.js): operation timed out
2022-03-21T18:38:53.8751620Z 
2022-03-21T18:38:53.8751684Z Caused by:
2022-03-21T18:38:53.8751827Z     operation timed out

caass · 2022-03-21T19:00:32Z

@fwwieffering I see...likely the cause of that is that index.f268669f.df4cc6c294.js is a large file, and we now download every file in the store to re-upload them with an expirationTtl so that they expire after a few minutes when the new files are uploaded.

An immediate fix that comes to mind would be to implement automatic HTTP retries for requests that time out, so we get 3 or 5 goes at downloading the files before giving up and throwing an error. I can work on this in wrangler 1 and raise an issue for wrangler 2.

It may be worth simply running wrangler publish again and seeing if you get the same error, since that's essentially what an HTTP retry fix would do.

fwwieffering · 2022-03-21T19:13:11Z

@fwwieffering I see...likely the cause of that is that index.f268669f.df4cc6c294.js is a large file, and we now download every file in the store to re-upload them with an expirationTtl so that they expire after a few minutes when the new files are uploaded.

I was worried more about the number of files than the size of them. running wrangler publish on 1.19.10 in a different environment with a kv size of about 650 (less deployment churn over the last few days) succeeded.

caass · 2022-03-21T19:36:16Z

Ah, I see. Did running wrangler publish again give the same error?

hubertott · 2022-03-22T00:19:55Z

@fwwieffering I see...likely the cause of that is that index.f268669f.df4cc6c294.js is a large file, and we now download every file in the store to re-upload them with an expirationTtl so that they expire after a few minutes when the new files are uploaded.

I was worried more about the number of files than the size of them. running wrangler publish on 1.19.10 in a different environment with a kv size of about 650 (less deployment churn over the last few days) succeeded.

I too am concerned about the number of existing files in the kv store, we've been deploying headless applications for over a year now.

I am testing 1.19.10 at the moment and its still stuck at Deleting stale files... after 30mins.

christofferh · 2022-03-22T07:57:22Z

I am testing 1.19.10 at the moment and its still stuck at Deleting stale files... after 30mins.

Seeing the same thing here but with a kv size of ~12k items.

petebacondarwin · 2022-03-22T07:59:04Z

I believe that the Cloudflare API that Wrangler is using to expire these items is rate-limited to 1200 requests per 5 mins.
So I would not be surprised if 12K items is a problem since that is 10x as many requests as we are allowed in 5 mins.
Thinking about what the solution could be ...

caass · 2022-03-22T10:41:27Z

TL;DR: This should be fixed with the latest release, v1.19.11. Upgrade and if you're still experiencing problems, please report them here <3.

We've decided to revert the behavior of workers-sites to what it was before -- deleting unused files in the KV namespace when new files are uploaded.

We had complaints where, on a wrangler publish, the site would be temporarily unavailable in the time between when the old files were deleted and the new files had propagated to the edge. We tried to fix this by, instead of deleting the old files, marking them for expiration instead. This had a couple problems:

There's no way to set the expiration for an existing KV pair, so we had to download and then re-upload every file in the workers-sites KV namespace. This was problematic for users with high churn in their KV namespace, where in the five minutes before the files were set to expire, their namespace would balloon with new files and eventually cause API requests to fail as we downloaded more and more assets to mark them for expiration.
Setting the "expiration" field was unreliable: the publish could occur from an environment with an incorrect system time, or the publish could take so long that by the time the tagged-for-expiration files were uploaded, they were rejected by the API for an expiration set in the past

We remedied the second issue by setting an time-to-live instead of a expiration timestamp, but by this point the first issue had become so exacerbated that publishes were simply failing to download (or maybe re-upload, we're still not sure exactly which API requests were failing) the files from their KV namespaces.

We could maybe solve our original problem by introducing an artificial delay of ~60s in-between publishing new assets and deleting old ones, but if you publish a new version of your site more often than once per minute, you would still run into the "my KV namespace is blowing up" issue.

We (@petebacondarwin, @threepointone and myself) chatted about this and think we're going to leave this behavior as-is for the time being -- unless of course the "fix" (reversion) introduced in 1.19.11 doesn't work -- in wrangler 1, and focus further work on this issue in wrangler 2. Additionally, since we do offer a more complete site hosting solution in Cloudflare Pages (docs), once the dust settles here the original complaint is likely to become more low-priority.

For those interested in moving their projects to Pages, we do have a migration guide that could be helpful. Wrangler2 will also have a dedicated pages subcommand, so keep an eye on that, and please leave feedback in the wrangler2 repo if the command doesn't do what you need (or expect) it to do.

Thank you so much to @ryan-moonwell for filing this issue and everyone who posted in the thread for providing valuable feedback and insight which allowed us to rather quickly iterate on this and reach what is (hopefully!) a working solution. I'll leave this open for now until we've either had reports of successful publishes or the issue goes stale

We discovered critical issues with the way we expire unused assets with `[site]` (#666, cloudflare/wrangler-legacy#2224), that we're going back to the legacy manner of handling unused assets, i.e- deleting unused assets. Fixes #666

We discovered critical issues with the way we expire unused assets with `[site]` (see #666, cloudflare/wrangler-legacy#2224), that we're going back to the legacy manner of handling unused assets, i.e- deleting unused assets. Fixes #666

koeninger · 2022-03-22T13:59:52Z

I don't understand that explanation. The expiry approach should have at most tripled the work (download X old files, upload to expire X old files, upload Y new files) versus the approach of just uploading Y new files. I'd expect X to be proportional to Y. To put it another way, if this approach doesn't work for 1,000 files I'd expect the old approach to stop working at 3,000 files, which is still a problem. It'd be a problem even if there was an api to set expiration on a key without uploading a value.

If you're dealing with X being much larger than Y because people were using versions of wrangler that didn't delete anything for a long time, and you're not handling rate limiting correctly to have a slow but eventually successful first publish on a namespace that's gotten bloated with a bunch of old files, the easiest way for a customer to deal with that is just rename the project and delete the old KV namespace.

threepointone · 2022-03-28T07:18:41Z

We'll have to revisit the expiration logic, it's not clear what we were doing wrong; and it stands that we'll have to add rate limiting logic on our side at least, in addition to whatever else we were missing. That said, I'll assume this issue is fixed and folks are unblocked from making Sites deploys. Closing this issue.

jdddog · 2022-03-31T01:55:45Z

I'm still having problems with wrangler v1.19.11 that might be related to this, it throws an error when deleting stale files with no error message:

💁  Deleting stale files...
Error:

Deleting stale files eventually works if I run the wrangler deploy command once again after it has failed.

It was working OK soon after v1.19.11 was released however the above error popped up today (or I only noticed it today). I'm not sure if it is related, we deploy around about 20K files (for around 5000 pages).

petebacondarwin · 2022-03-31T08:38:20Z

@jdddog - deleting 20K assets could be problematic in itself, with or without this fix... The API can only handle a few thousand assets at a time, so Wrangler should be batching them up. It is possible that an individual batch is failing for some reason. This would explain why the deletion works finally, after a number of attempts, since the some of the batches might be succeeding. 🤔

jdddog · 2022-04-04T22:07:50Z

@jdddog - deleting 20K assets could be problematic in itself, with or without this fix... The API can only handle a few thousand assets at a time, so Wrangler should be batching them up. It is possible that an individual batch is failing for some reason. This would explain why the deletion works finally, after a number of attempts, since the some of the batches might be succeeding. thinking

Thanks @petebacondarwin. I see, that makes sense. It used to work OK without hanging or giving errors, it has only recently become a problem. It is probably more like 10K assets that get updated each time (as some files stay the same and don't get deleted).

petebacondarwin · 2022-04-11T11:24:12Z

We need to add a bit more sophistication in how we do this upload/sync/delete of assets. If you are currently unblocked, then this is likely to be work targeting Wrangler v2.1.

moonwell-octavius added the bug Something isn't working label Mar 18, 2022

This was referenced Mar 20, 2022

fix: expire unused assets on site upload #2221

Merged

feat: expire unused assets with [site] cloudflare/workers-sdk#459

Closed

threepointone mentioned this issue Mar 20, 2022

fix: use expiration_ttl to expire assets with [site] cloudflare/workers-sdk#649

Merged

jakubsacha added a commit to analysis-tools-dev/website-old that referenced this issue Mar 20, 2022

Use wrangler 1.19.8

dd608c2

Deployments are failing due to: cloudflare/wrangler-legacy#2224

github-actions bot mentioned this issue Mar 20, 2022

Version Packages cloudflare/workers-sdk#562

Merged

davidmytton added a commit to consoledotdev/home that referenced this issue Mar 21, 2022

Pin wrangler to 1.19.8

a3d92f1

Due to cloudflare/wrangler-legacy#2224

caass mentioned this issue Mar 21, 2022

hotfix for expiring unused sites assets #2226

Merged

caass pushed a commit that referenced this issue Mar 21, 2022

hotfix for expiring unused sites assets

3fd704b

#2224

caass closed this as completed in #2226 Mar 21, 2022

caass pushed a commit that referenced this issue Mar 21, 2022

hotfix for expiring unused sites assets (#2226)

1eadbcb

closes #2224

caass reopened this Mar 21, 2022

caass mentioned this issue Mar 22, 2022

revert sites expiry stuff #2228

Merged

petebacondarwin mentioned this issue Mar 22, 2022

🐛 BUG: bulk put: could not base64 decode value for key cloudflare/workers-sdk#666

Closed

threepointone mentioned this issue Mar 22, 2022

fix: delete unused [site] assets cloudflare/workers-sdk#667

Merged

github-actions bot mentioned this issue Mar 22, 2022

Version Packages cloudflare/workers-sdk#663

Merged

threepointone closed this as completed Mar 28, 2022

petebacondarwin mentioned this issue Jul 6, 2023

[wrangler] Add deploy --old-asset-ttl option cloudflare/workers-sdk#3384

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expiration changes hanging deploys #2224

Expiration changes hanging deploys #2224

moonwell-octavius commented Mar 18, 2022

fullheart commented Mar 18, 2022

arthurhamon commented Mar 18, 2022

jplhomer commented Mar 18, 2022

nsinghbakkt commented Mar 18, 2022

fullheart commented Mar 18, 2022

threepointone commented Mar 19, 2022

petebacondarwin commented Mar 20, 2022

caass commented Mar 21, 2022

caass commented Mar 21, 2022

fwwieffering commented Mar 21, 2022 •

edited

Loading

caass commented Mar 21, 2022 •

edited

Loading

fwwieffering commented Mar 21, 2022

caass commented Mar 21, 2022

hubertott commented Mar 22, 2022

christofferh commented Mar 22, 2022

petebacondarwin commented Mar 22, 2022

caass commented Mar 22, 2022

koeninger commented Mar 22, 2022

threepointone commented Mar 28, 2022

jdddog commented Mar 31, 2022 •

edited

Loading

petebacondarwin commented Mar 31, 2022

jdddog commented Apr 4, 2022 •

edited

Loading

petebacondarwin commented Apr 11, 2022

Expiration changes hanging deploys #2224

Expiration changes hanging deploys #2224

Comments

moonwell-octavius commented Mar 18, 2022

🐛 Bug report

Describe the bug

Reproduce the bug

Expected behavior

Environment and versions

fullheart commented Mar 18, 2022

arthurhamon commented Mar 18, 2022

jplhomer commented Mar 18, 2022

nsinghbakkt commented Mar 18, 2022

fullheart commented Mar 18, 2022

threepointone commented Mar 19, 2022

petebacondarwin commented Mar 20, 2022

caass commented Mar 21, 2022

caass commented Mar 21, 2022

fwwieffering commented Mar 21, 2022 • edited Loading

caass commented Mar 21, 2022 • edited Loading

fwwieffering commented Mar 21, 2022

caass commented Mar 21, 2022

hubertott commented Mar 22, 2022

christofferh commented Mar 22, 2022

petebacondarwin commented Mar 22, 2022

caass commented Mar 22, 2022

TL;DR: This should be fixed with the latest release, v1.19.11. Upgrade and if you're still experiencing problems, please report them here <3.

koeninger commented Mar 22, 2022

threepointone commented Mar 28, 2022

jdddog commented Mar 31, 2022 • edited Loading

petebacondarwin commented Mar 31, 2022

jdddog commented Apr 4, 2022 • edited Loading

petebacondarwin commented Apr 11, 2022

fwwieffering commented Mar 21, 2022 •

edited

Loading

caass commented Mar 21, 2022 •

edited

Loading

jdddog commented Mar 31, 2022 •

edited

Loading

jdddog commented Apr 4, 2022 •

edited

Loading