Redesign Cloudflare cache purging #3410

Closed
richardlau opened this issue Jul 5, 2023 · 10 comments

Comments

@richardlau
Member

I believe most of the server issues we're currently having with nodejs.org stem from our rather crude "purge everything" strategy. This is currently done by ansible/www-standalone/resources/scripts/cdn-purge.sh.j2 (which runs every five minutes) if a purge has been locally queued.

Purges are queued by calling ansible/www-standalone/resources/scripts/queue-cdn-purge.sh. This is done every time:

To give an idea of frequency, these are the cdn-purges from today (including the Node.js 20.4.0 release):

2023-07-05T06:35:01+00:00, nodejs, promote
2023-07-05T07:35:01+00:00, nodejs, promote
2023-07-05T11:05:01+00:00, nodejs, promote
2023-07-05T12:35:01+00:00, nodejs, promote
2023-07-05T13:35:02+00:00, nodejs, promote
2023-07-05T14:00:01+00:00, nodejs, promote resha_release
2023-07-05T14:55:01+00:00, nodejs, build-site
2023-07-05T16:30:01+00:00, nodejs, build-site

The first two "promote"s are from the nightly build, the next three from the V8 canary, and the last "promote" (and "resha_release") from the manual promotion of 20.4.0. The subsequent "build-sites" are from merging nodejs/nodejs.org#5473 and nodejs/nodejs.org#5474 respectively.

Purging everything is rather heavy-handed, but our current options in Cloudflare are to either purge everything (as we are doing) or purge a list of static URLs (there are further options only available for enterprise accounts, which we do not have access to). Switching to a more selective cache purge requires us to determine which URLs to purge in each of the above scenarios where we currently purge everything.
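As a rough illustration of the "list of static URLs" option, the request body would carry a `files` array instead of `purge_everything`. The sketch below is only an assumption about how that could look: `build_files_payload` and `purge_urls` are hypothetical helper names, the URLs are assumed to contain no quotes or backslashes, and `zone_id`, `api_email` and `api_key` are the same variables the existing script already uses.

```shell
#!/bin/sh
# Sketch only: build the JSON body for a selective purge.
# Assumes URLs contain no double quotes or backslashes.

# build_files_payload URL... prints {"files":["URL",...]}
build_files_payload() {
  _json='{"files":['
  _sep=''
  for _url in "$@"; do
    _json="${_json}${_sep}\"${_url}\""
    _sep=','
  done
  printf '%s]}\n' "$_json"
}

# purge_urls URL... (hypothetical wrapper around the existing curl call)
purge_urls() {
  curl -X DELETE \
    "https://api.cloudflare.com/client/v4/zones/${zone_id}/purge_cache" \
    -H "X-Auth-Email: ${api_email}" \
    -H "X-Auth-Key: ${api_key}" \
    -H "Content-Type: application/json" \
    --data "$(build_files_payload "$@")"
}
```

Cloudflare caps how many URLs a single request may carry, so a real script would likely need to split long lists into batches.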

@richardlau
Member Author

I guess an interesting question would be: what would happen if we didn't purge the Cloudflare cache at all? Presumably that could lead to stale pages/content being served (and it may even vary depending on which CF server you end up connecting to). Would new builds/releases fail to show up because 404 Not Found responses are cached?

We could temporarily comment out

# purge full cache
curl -X DELETE \
"https://api.cloudflare.com/client/v4/zones/${zone_id}/purge_cache" \
-H "X-Auth-Email: ${api_email}" \
-H "X-Auth-Key: ${api_key}" \
-H "Content-Type: application/json" \
--data '{"purge_everything":true}'
or the cron job that runs that script on the five-minute timer:
- "*/5 * * * * root /home/nodejs/cdn-purge.sh"

@richardlau
Member Author

Now that we have a CF enterprise account, we have more purging options available to us, but that still requires engineering work to determine what (if anything) needs to be purged and pass that information between queue-cdn-purge.sh and cdn-purge.sh (or design something else completely) and make the relevant CF API call.
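One possible shape for passing that information between the two scripts is a plain queue file of URLs, one per line. This is a sketch under assumptions, not how the current scripts work: the `QUEUE_FILE` path and the `queue_purge` / `drain_queue` function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: a URL queue shared by the queueing and purging scripts.
# QUEUE_FILE is a hypothetical location; override it as needed.
QUEUE_FILE="${QUEUE_FILE:-/tmp/cdn-purge-queue}"

# queue_purge URL... : what queue-cdn-purge.sh could do for each event.
queue_purge() {
  for _url in "$@"; do
    printf '%s\n' "$_url" >> "$QUEUE_FILE"
  done
}

# drain_queue : what cdn-purge.sh could do on its timer.
# Prints each queued URL once and empties the queue.
# Prints nothing if no purge has been queued.
drain_queue() {
  [ -s "$QUEUE_FILE" ] || return 0
  _work="${QUEUE_FILE}.$$"
  # Rename first so entries queued during the purge land in a fresh file.
  mv "$QUEUE_FILE" "$_work"
  sort -u "$_work"
  rm -f "$_work"
}
```

The timer script could then feed the output of `drain_queue` into whichever Cloudflare purge call is chosen, and skip the API call entirely when the output is empty.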

@nschonni
Member

The only thing I can think of that might need a low expiry would be the index.tab/json.

I can't think of anything that really would need to be purged unless the cache got poisoned.

@targos
Member

targos commented Jul 14, 2023

Also directory indexes

@nschonni
Member

nschonni commented Jul 14, 2023

Was looking for something else, and found a little previous discussion over in #2375 and #2396 too

@targos
Member

targos commented Jul 14, 2023

And for the website, all the generated HTML pages. JS and CSS chunks should be fine since they have content hashes in their file names.

@targos
Member

targos commented Jul 17, 2023

There are also all "latest" symlinks like https://nodejs.org/docs/latest, https://nodejs.org/dist/latest/, https://nodejs.org/dist/latest-v20.x/, etc.
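Pulling together the candidates mentioned so far (index.tab/json, directory indexes, and the "latest" symlinks), a release promotion might map to a URL list along these lines. This is a sketch, not a complete inventory: `release_purge_urls` is a hypothetical helper, and the exact set of paths is an assumption based only on the examples in this thread.

```shell
#!/bin/sh
# Sketch only: candidate URLs to purge when a release is promoted.
# The path list is illustrative and almost certainly incomplete.

# release_purge_urls VERSION (e.g. v20.4.0) prints one URL per line.
release_purge_urls() {
  _version="$1"
  _major="${_version%%.*}"   # v20.4.0 -> v20
  for _path in \
    dist/index.json dist/index.tab \
    dist/ dist/latest/ "dist/latest-${_major}.x/" \
    docs/ docs/latest/ "docs/latest-${_major}.x/"
  do
    printf 'https://nodejs.org/%s\n' "$_path"
  done
}

release_purge_urls v20.4.0
```

Purging by URL only matches the exact URL, so this covers the directory index pages themselves; the files underneath the "latest" directories would have to be enumerated or purged some other way.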

@targos
Member

targos commented Jul 17, 2023

Note that we now have access to enterprise features such as:

@richardlau
Member Author

I think cache-tags may make sense for the website, but for releases/nightlies/v8-canary we'd probably need to do either URL prefix (most likely because of docs) or by URL.
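For comparison, the purge endpoint takes cache-tags in a `tags` array and URL prefixes in a `prefixes` array (host plus path, without the scheme). The sketch below only builds the request bodies: `build_purge_payload` is a hypothetical helper, the `website` tag name is invented for illustration, and purging by tag assumes the origin already sends a `Cache-Tag` response header for the pages in question.

```shell
#!/bin/sh
# Sketch only: request bodies for purge-by-tag and purge-by-prefix.
# Assumes values contain no double quotes or backslashes.

# build_purge_payload KEY VALUE... prints {"KEY":["VALUE",...]}
build_purge_payload() {
  _key="$1"
  shift
  _json="{\"${_key}\":["
  _sep=''
  for _value in "$@"; do
    _json="${_json}${_sep}\"${_value}\""
    _sep=','
  done
  printf '%s]}\n' "$_json"
}

# Website rebuild: purge everything carrying a (hypothetical) tag.
build_purge_payload tags website

# Release promotion: purge by prefix so the docs underneath are covered too.
build_purge_payload prefixes nodejs.org/dist/latest-v20.x/ nodejs.org/docs/latest-v20.x/
```

Either body could be passed as the `--data` argument of the existing curl call in place of `{"purge_everything":true}`.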


This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

@github-actions github-actions bot added the stale label May 13, 2024
@github-actions github-actions bot closed this as not planned Jun 13, 2024
3 participants