Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate the circular dependency between the main loader and the worker #20580

Open
mmomtchev opened this issue Oct 31, 2023 · 11 comments
Open

Comments

@mmomtchev
Copy link

mmomtchev commented Oct 31, 2023

Currently, when using -pthreads emscripten produces a main loader which loads the WASM binary and spawns a pool of workers. In turn, each worker loads the main loader which loads the WASM binary. This allows to share the (quite complex) code that loads the WASM binary, but creates a circular dependency between the main loader and the worker.

What is the problem with a circular dependency and why does webpack complain?

Web browsers (and web servers), in their default configurations, tend to cache JS files with really long expiry times - because they know that these are static and can be very large. This means that if the JS on a site is updated without changing the names of the JS files, a client will load the new HTML - but will still use its own stale JS files.

To take advantage of this, and to allow websites to be easily updated, webpack includes a complex bundles/hash management feature. It divides the JS output into chunks and then uses an MD5 (or similar) to name them. When the site is updated, a client will download the new HTML and then will see that some of the chunks are not in its cache - because of the new names. This will prevent it from loading a Frankenstein version where some parts will be updated - and others not.

This is something that cannot be made to work with circular dependencies. In this case webpack emits a warning and does not generate hashes. These files are at risk of being loaded as parts of a Frankenstein version.

There is only one solution to this problem - the worker must load a file that does not reference it. This means that there must be two loaders - a main loader and a worker loader - which can eventually be inlined into the worker itself.

I see two main groups of solutions to this problem:

  • A Babel-like postprocessing is applied on the main loader to produce two files out of it - using something such as babel-plugin-minify-constant-folding which will have to be improved for this task
    (-) Avoids additional complexity in the preprocessor
    (+) Adds yet another step
  • The preprocessor does this
    (+) Adds complexity to the preprocessor
@RReverser
Copy link
Collaborator

There is only one solution to this problem - the worker must load a file that does not reference it.

Arguably, the other solution is for webpack to fix this on their side. As mentioned on the original issue where we discussed this, new Worker is different from regular module loading as they don't share same environment, so Webpack shouldn't be treating it as a circular dependency in the first place, but as a completely separate entry point, so it could still hash it together with its own dependencies.

Webpack will have same issues with other projects employing similar patterns, e.g. https://github.com/GoogleChromeLabs/wasm-bindgen-rayon which doesn't have similar level of control over main JS (as wasm-bindgen doesn't provide it) and won't be able to implement the same hack with two different environments within a single file. In general, it would be rather inefficient to work around this problem in each project separately when it can be fixed in the bundler once for all.

@mmomtchev
Copy link
Author

There is no way you can generate hashes out of the file contents for two files that contain each other's filename - including the generated hash.

@RReverser
Copy link
Collaborator

There is no way you can generate hashes out of the file contents for two files that contain each other's filename - including the generated hash.

You can. There are multiple ways to go about it, for example by converting any circular reference into an index of a previously seen node - similar to common workaround for encoding circular structures into JSON or printing them to console.

This gives stable hash and eliminates the issue with an infinite recursion where, to compute hash for the current node, you'd first need to compute hash for its deps including the current node.

Another way, as mentioned above, is that Webpack could also just not treat new Worker as a regular dependency and ignore it during hashing of the main file, which also eliminates recursion. After all, it can be computed and updated separately, as long as it references the same main file (which separately computed hash will prove).

@mmomtchev
Copy link
Author

You can't implement the full feature - content-based hashes - in all of its glory in this case. There are certainly various alternative strategies which give a similar result - but all of them will rely on webpack - or any other bundler - treating the emscripten-produced WASM case as a special case - or otherwise implementing major changes. This won't go into webpack for sure. It might go into an emscripten plugin, but I don't think such a plugin is a good idea.

The basic problem here is that the JS specifications allow for circular dependencies, but the web world has decided that it is a bad idea. Both camps have their arguments. Usually, in IT in particular, and in any industry sector, in general, in this case the biggest crowd wins. It is not about the technology and the validity of the arguments - it is about compatibility. And the truth is that the bundlers have been there for 10 years now (I just checked - first version of webpack is Feb 2014). Ironically, this is a whole year after asm.js - the WASM precursor. However bundlers are used on close to 100% of the Alexa 1000 sites. While WASM - despite its absolutely huge potential - is still very rare. It remains a very difficult technology. The npm module authors are afraid of it. The web developer are afraid of it. Project management is afraid, marketing is afraid.

rollup cannot be made to work without a plugin because this was the rollup authors' choice - in fact almost nothing works in rollup without a plugin. webpack is however possible. webpack also opens the door to React. And most React projects do not touch their webpack configuration which remains hidden under the layers. This is why I think that getting webpack to work without a fault is a very good idea.

@RReverser
Copy link
Collaborator

You can't implement the full feature - content-based hashes - in all of its glory in this case.

Can you elaborate why not? I described the implementation that would work with recursive dependencies (and, indeed, that's how recursive graphs are traditionally hashed by content). If you think it won't work specifically for JS dependency graphs, it would be helpful to provide specifics.

@RReverser
Copy link
Collaborator

RReverser commented Nov 2, 2023

This won't go into webpack for sure. It might go into an emscripten plugin, but I don't think such a plugin is a good idea.

Have you tried raising an issue with them? I found webpack maintainers to be very responsive and open in the past, so I wouldn't claim a potential fix, and one that would benefit multiple projects, won't go into webpack for sure without trying first.

@simonbuehler
Copy link

I worked around it by doing ugly stuff like:

allocateUnusedWorker() {
    var worker;
    var scriptforworker = "opencv_js.worker.js";
    if (!Module["locateFile"]) {
        worker = new Worker(new URL(scriptforworker, import.meta.url), {
            type: "module"
        })
    } else {
        var pthreadMainJs = locateFile(scriptforworker);
        worker = new Worker(pthreadMainJs, {
            type: "module"
        })
    }
    PThread.unusedWorkers.push(worker)
}

so vite would not process it, any ideas how to get the underlying issue solved?

@andreamancuso
Copy link

andreamancuso commented May 16, 2024

+1

I recently started working on a new open source library bridging the gap between React and WASM: https://github.com/andreamancuso/react-wasm

It's early days, code quality is sub-optimal, no tests, etc.

Anyway, I also got caught by webpack's circular dependency warning. I'd rather not just silence it. Either way, I most certainly would not want to encourage users of this library to do the same as they may be unable to do that (Create React App users in particular).

Any help with this would be much appreciated.

@sbc100
Copy link
Collaborator

sbc100 commented May 16, 2024

As of #21701 there is no more separate work file. There is just a single JS file and each work runs the same file.

I guess this means we have a script that depends directly on itself.. but I don't see any way around that. Does the help with bundlers?

@andreamancuso
Copy link

Thank you for the prompt reply. I added a reply to this other issue: #21844 (comment)

Truth be told, I am still quite new to emscripten in general, so it's likely I'm doing something silly somewhere.

@jamsinclair
Copy link

Sharing some information for Vite users. Self-referential circular dependencies are now supported with Vite when using versions greater than v5.2.2 and module workers.

You may need to configure the following config:

  • Configure emscripten with -s EXPORT_ES6=1
  • Use Vite versions >= 5.2.2
  • Configure worker settings in your vite.config.js
worker: {
  format: 'es'
}

Recent versions of Emscripten (>= v3.1.58) have broken Vite builds for a different reason and noted in #22394

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants