Core D2 dependency filename changes between builds break Netlify indexing #3383
Thanks for reporting this. Does this happen on the Docusaurus 2 website too? Does this happen if you change no md files at all? Only one md file?
I'm not sure I understand all of that. To me, the main problem is that hash changes lead to URL changes, and thus more asset caching is invalidated. Can you explain how it impacts build/deploy time, and by how much?
I would imagine it does, as all D2 sites appear to generate these dependencies.
No, it only seems to happen after changing an md file, CSS, or a custom page.
Yes, the screenshot I referenced in this issue was produced after editing a single md file. As you can see, that build modified all other doc and blog files, in addition to the root index.html and 404.html files.
Yes, that's the problem. A content change produces a different hash, the hash produces a different URL, and the changed URLs invalidate caching.
In the case of our largest site, https://xsoar.pan.dev, changing one file can result in as many as 2500+ "new files to upload" being detected by Netlify. Netlify performs additional processing of HTML before uploading to their CDN, which can add 15 minutes or more to the build time. Basically, without the benefit of caching/indexing, each deploy is treated as a brand new deploy.
I see, thanks. Not sure I'll have time to check that problem currently, but I'll keep it in mind. Curious:
I find it surprising to see a difference of 15 min between builds just for the Netlify processing/upload :o At the same time, your site seems to be quite large. Maybe we could investigate something like incremental builds (like Gatsby) to see if it's possible to rebuild faster, but it's unlikely we'll have time to do this soon (it would be after the 2.0.0 RC).
Roughly 1500 docs and growing.
See above. We aren't currently using the versioning feature.
It varies, but it can still take as much as 26 minutes total to complete the Netlify build. When files change, we can expect that time to increase by 10-15 minutes or more.
+10-15 more minutes with a single file change, depending on which dependencies need to be regenerated, i.e. renamed.
We're working closely with Netlify on this. Their post-processing can be tweaked, but even with everything off they still perform some "processing" of static files before uploading to the CDN.
That sounds intriguing. I was also wondering if you and the team have considered moving away from webpack? Or, at least, moving away from including a hash in the core JS dependency filenames. If the filenames are static, meaning they don't change between builds, this problem goes away. It's something to consider because I'm sure caching/indexing is important to all D2 users/sites; it just so happens we're one of the first to grow to a scale large enough to notice the bug. Please let me know if there's anything I can do to help improve our understanding of this issue. P.S. If you or another contributor could point me to where in the codebase the core dependency filenames are generated, it would be greatly appreciated! I've been having a difficult time figuring it out.
Thanks for the feedback, that's probably one of the largest Docusaurus sites :)
Really hope we can improve these build times.
v2.0 is still in alpha and we are focusing on the final release. Removing webpack at this point in v2 would be an annoying breaking change for v2 early adopters and plugin authors, so I don't see it happening. I've been working with Facebook on Docusaurus for a few months; I can't speak for them about the plans for v3+ of Docusaurus, but I guess moving off webpack is a possibility. It's also worth checking the new possibilities offered by native ESM support and tooling like Snowpack and Vite, but that's unlikely to happen anytime soon imho.
We're focusing on shipping i18n and better versioning first, as they're blockers for releasing the v2.0 RC, but I think we should try to solve this annoying problem (it's not blocking the v2 RC, but it is still important). For now, I think the caching story on Docusaurus is not great and has not received enough investment, and most people actually use Docusaurus without setting any host cache headers. As Netlify etc. automatically enable ETags, performance is still OK, but we should definitely explain how to optimize Docusaurus site hosting performance. About including a hash in the filename: this can enable immutable caching (see #3156). But maybe this hash is not required for all files, and we can look for a webpack config that produces a more "stable" output on doc changes.
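For context, immutable caching with hashed filenames is typically wired up like this in webpack (a generic sketch, not Docusaurus's actual shipped config):

```javascript
// Generic webpack output sketch: [contenthash] changes only when the
// chunk's own bytes change, so unchanged chunks keep stable URLs and
// can safely be served with long-lived, immutable cache headers.
module.exports = {
  output: {
    filename: "[name].[contenthash:8].js",
    chunkFilename: "[name].[contenthash:8].js",
  },
};
```

The catch described in this thread is that shared chunks (like the runtime) change on every build, dragging their hashes, and every HTML file referencing them, along.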
If you want to investigate, help is welcome, because I have to continue my work on i18n this month and am the only full-time maintainer. Our monorepo is not very hard to work with and contribute to. If your changes work on the Docusaurus 2 website, I guess they should also work for your site. Unfortunately, I am no webpack guru, so I don't have any particular insight into what the solution might be, but it should probably be here: https://github.com/facebook/docusaurus/blob/master/packages/docusaurus/src/webpack Easy steps to contribute on this:
Thanks for the tips. I believe the following module is where the dependency filenames are generated (at least where the hash portion gets added): https://github.com/facebook/docusaurus/blob/master/packages/docusaurus/src/webpack/base.ts#L65
I might be misunderstanding, but I thought immutable caching wasn't reliant on the filename, but on including a cryptographic hash in the script/link tag.
I'm not sure what you mean, but if the content of a file changes, we do want the filename to change. Changing the filename is a common practice to automatically invalidate the HTTP cache, used by a lot of tools (including Gatsby etc.). We really want to keep this because it has clear benefits. The issue is that if one doc changes, only a few output filenames should change (the ones related to the modified doc), not many/most/all of them (the behavior we seem to have). Sometimes a change to a shared file (like runtime-) produces a cascade of other filename changes. We should ensure that behavior doesn't happen.
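In webpack terms, such cascades usually come from unstable module/chunk ids: adding or removing one module renumbers others, which changes their content and therefore their hashes. Webpack exposes options aimed at limiting this; the sketch below uses webpack 5 option names as an assumption about a possible mitigation, not Docusaurus's shipped config:

```javascript
module.exports = {
  optimization: {
    // Hash-derived ids stay stable when unrelated modules are added
    // or removed, instead of shifting with declaration order.
    moduleIds: "deterministic",
    chunkIds: "deterministic",
    // Keep the runtime manifest in its own small chunk so content
    // chunks don't re-hash just because the manifest changed.
    runtimeChunk: "single",
  },
};
```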
This Next.js-related discussion about the advantages of Webpack 5 is interesting regarding deterministic output: https://stackoverflow.blog/2020/10/07/qa-with-the-creators-of-next-js-on-version-9-5/
Hi @slorber! Deterministic IDs look like a game changer with respect to the caching issue described here. Is upgrading to Webpack 5 on the D2 roadmap? Do you know of a good way to upgrade to webpack 5 in dev to test?
@sserrata unfortunately not, we need to ship i18n first and then move to beta. I don't know how much of a breaking change Webpack v5 would be, particularly for plugin authors.
@sserrata a Webpack 5 PR is ready for review and I published a canary release: #4089 I'm not sure, however, that the deterministic output of Webpack 5 will satisfy you, as the runtime chunk hash still seems to change every time a file is modified. Let me know if things improve, but I guess it's somehow "normal", as the HTML files (which can't be cached because they have stable URLs) should all see the "new SPA".
@sserrata if you don't use cache-control headers (like `immutable`) for hashed assets on your CDN, you can try removing the hash from the js filename output. On Netlify, it will still provide ETag-based caching, which is not too bad imho (and I guess most users don't even set more aggressive caching headers on their CDN). If you use:

```js
{
  output: {
    filename: "[name].js",
  },
}
```

As far as I see, the chunks under … If this setup works fine for you, I think we could make this a default for Docusaurus. Docusaurus is not a typical webpack app: it has a single entrypoint for all the pages, so any page modification modifies this entrypoint.
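Conversely, if the hashes are kept, the payoff comes from pairing them with aggressive cache headers. On Netlify that can be done with a `_headers` file in the publish directory; the path pattern below is an illustrative assumption, not Docusaurus's actual output layout:

```
# Cache hashed assets forever; their URLs change when their content changes.
/assets/js/*
  Cache-Control: public, max-age=31536000, immutable
```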
🐛 Bug Report
Docs/files that are otherwise unchanged between builds are marked as changed when the `runtime~main.<hash>.js` filename changes. This occurs because all generated HTML files import `runtime~main.<hash>.js`. Since static site hosts like Netlify rely on file hashes for indexing, this results in files incorrectly being marked as changed between builds, which can greatly increase the overall build/deploy time. Our team noticed this behavior after our D2 site grew beyond 1K docs.

Have you read the Contributing Guidelines on issues?
Yes.
To Reproduce
(Write your steps here:)
1. `yarn run build`
2. `cd build && find . -type f -exec md5 "{}" \; | sort` (macOS; on Linux use `md5sum`)
3. `code --diff before_change_hashes.txt after_change_hashes.txt` (example using vscode)

Expected behavior
Only docs/files that were intentionally changed between builds should be modified.
Actual Behavior
All static HTML files that import any or all of the following dependencies are modified when dependency filenames change following a build. This appears to be caused by the `<hash>` portion of each filename changing between builds.

- `runtime~main.<hash>.js` # changes whenever any doc/file is modified between builds
- `main.<hash>.js` # dependent files could be modified if this filename changes
- `styles.<hash>.js` # dependent files could be modified if this filename changes
- `styles.<hash>.css` # dependent files could be modified if this filename changes

Depending on the size of the D2 site, this could potentially introduce many more modified files than expected between builds, which could render indexing by hosting/build services like Netlify, GitHub Actions, et al. ineffective.
In the following screenshot, note all the changed files despite only `./docs/contributing/index.html` actually being modified:

Your Environment
Reproducible Demo
Can be reproduced on any D2 site.