Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(experimental) worker isolation #24947

Closed
FranckFreiburger opened this issue Dec 10, 2018 · 31 comments
Closed

(experimental) worker isolation #24947

FranckFreiburger opened this issue Dec 10, 2018 · 31 comments
Labels
confirmed-bug Issues with confirmed bugs. worker Issues and PRs related to Worker support.

Comments

@FranckFreiburger
Copy link

  • Version: 11.3.0
  • Platform:win 7 (64)
  • Subsystem:

I just wondering if this lack of isolation is expected:

new Worker(`process.env.FOO = 123`, { eval: true });
new Worker(`console.log(process.env.FOO)`, { eval: true }); // prints 123
@addaleax addaleax added question Issues that look for answers. worker Issues and PRs related to Worker support. doc Issues and PRs related to the documentations. and removed question Issues that look for answers. labels Dec 10, 2018
@addaleax
Copy link
Member

I think at the very least the documentation does not match the actual behaviour here… I think in an earlier version of the workers implementation, we did what the doc states (not allow modifications from worker threads), but it seems that right now we also allow write access?

/cc @nodejs/workers

@devsnek
Copy link
Member

devsnek commented Dec 10, 2018

i prefer the documented behaviour

@FranckFreiburger
Copy link
Author

@addaleax, is this behavior thread-safe ?

@addaleax
Copy link
Member

@FranckFreiburger We access the environment variables with a mutex, both for reading and for writing, so this should be okay (at least as far as Node is concerned). But like @devsnek, I think I’d prefer the documented behaviour.

@joyeecheung
Copy link
Member

So this is a regression? (trying to apply labels)

@addaleax
Copy link
Member

@joyeecheung This behaviour (and the wrong docs) have been present since workers have first been released; I assume these checks went lost during a rebase or similar… so I don’t think it’s a regression, it’s just a bug?

@addaleax addaleax removed the doc Issues and PRs related to the documentations. label Dec 10, 2018
@joyeecheung joyeecheung added the confirmed-bug Issues with confirmed bugs. label Dec 11, 2018
@sagitsofan
Copy link
Contributor

sagitsofan commented Dec 22, 2018

@addaleax I will pick this one.
Should the appropriate solution needs to block the write access to process.env and emit a warning / exception ?

@addaleax
Copy link
Member

@sagitsofan I don’t think we want a warning or exception; rather, not installing the setters for process.env should be the most JavaScript-y solution here, I think?

@sagitsofan
Copy link
Contributor

sagitsofan commented Dec 22, 2018

@addaleax But if we silently will not install the setters of process.env the user will think this operation have done successfully, isn't it? shouldn't a warning is the appropriate way?

@devsnek
Copy link
Member

devsnek commented Dec 22, 2018

@sagitsofan you should use strict mode, so it will throw an error.

sagitsofan added a commit to sagitsofan/node that referenced this issue Dec 24, 2018
Added strict mode inside the worker script.

Fixes: nodejs#24947
@sagitsofan
Copy link
Contributor

Opening a new PR

@sagitsofan
Copy link
Contributor

PR #25202 is ready

Trott pushed a commit to sagitsofan/node that referenced this issue Dec 24, 2018
Added strict mode inside the worker script.

Fixes: nodejs#24947
@isaacs
Copy link
Contributor

isaacs commented Jan 21, 2019

I am currently exploring using worker_threads to simulate processes in programs that would otherwise spawn a lot of child procs.

I'm very much in favor of making the env isolated, but can it be done without making process.env completely locked down? For example, if I can swap out the whole process.env object, or make the values set in the worker but not in the parent, that would be ideal.

(I realize that of course this is not going to work with native addons or other parts of the program that access the env, since there is actually only one environment per process.)

@addaleax
Copy link
Member

I do like @isaacs’ suggestion.

@nodejs/workers Thoughts?

@benjamingr
Copy link
Member

(I realize that of course this is not going to work with native addons or other parts of the program that access the env, since there is actually only one environment per process.)

This makes it sound leaky.

I think there is value in providing this functionality to simulate processes communicating through the environment. I'm not sure process.env is where I'd put this sort of "configuration shared state".

@addaleax
Copy link
Member

@benjamingr It is a bit leaky, but really, native addons are the only items I could think of in terms of what would really be caught off guard by this.

@benjamingr
Copy link
Member

@addaleax what about exposing a worker.env or something similar?

@addaleax
Copy link
Member

@bmeck The variables would come from the Worker’s (foo’s) process.env, in your example including the modifications to process.env.PATH. (Unless we explicitly choose to change this in some way. But I don’t think we’d want to?)

@FranckFreiburger
Copy link
Author

FranckFreiburger commented Feb 22, 2019

🎉
I think that semantically process is the process, and process.env is the process's env.
This mean the the current behavior is consistent.
Then I think that any other behavior should be subject to a new API (maybe even remove process access from worker threads).

I agree that Isaac’s suggestion is the most convenient (and is my use case ie. run in a worker thread like if run in normal process) but is the less consistent.

@addaleax
Copy link
Member

@FranckFreiburger One unfortunate circumstance is that process has historically more of a place to put random items, rather than something actually specifiying the current process (e.g.: process.nextTick(), process.version/process.versions, process.platform, process.domain, process.dlopen(), etc.).

In Workers, the closest thing we have is referring to the current thread. And we can’t really remove process access from worker threads without a ton of breakage, given how common usage of process.nextTick() or the stdio streams is; accessing a lot of per-process state, such as the current working directory, is already restricted.

@bmeck
Copy link
Member

bmeck commented Feb 22, 2019

My concern here is that we get to a situation where emulating different process.env is possible with reasonable effort, but emulating having the same env is not. Technically, if it is normally shared you can always replace the .env descriptor via means like:

process.env = makeFakeENV();
// ... replace usages in child_process ...

However if we separate them by default, that makes it hard to share them, even though C++ would allow sharing them. I think choosing the one that matches the underlying structure and that still allows users to replace it with there is less likely to be something that causes inconsistency over time.

I think any solution (read only, or shared by default) would allow the separated form to be virtualized ontop of them, while the converse is not true.

@addaleax
Copy link
Member

@bmeck I see … I think you could still reasonably implement sharing process.env through message passing (possibly making it synchronous with Atomics.wait), but I can see that there are downsides to that approach.

How would you feel about making sharing with the parent thread possible as an opt-in?

@bmeck
Copy link
Member

bmeck commented Feb 22, 2019

@addaleax I think opt out makes more sense, you are opting out of the technical underpinnings threads would be expected to represent; this seems more like users making a choice about their intent than opting into the regular underpinnings that threads represent.

@mcollina
Copy link
Member

I'm ok both ways.

@Fishrock123
Copy link
Contributor

I would certainly prefer that env be across the whole process.

My only worry is that... I dunno people might try use it to use it to share state lol

@gireeshpunathil
Copy link
Member

I support the current behavior, and propose the doc to align with the code:

  • process.env reflects a single source of truth across the execution space.
  • provides simplicity and clarity for the users and the providers.
  • workers already enjoy the benefit of being in-process, in many cases. Also they share resources with main thread, and make side effects to the overall process. So why to isolate process.env alone?
  • Maintains consistency with other server side languages and platforms. In multi-threaded programs, threads are first class citizens and are privileged to run any code interchangeably with main thread (browsers may have specific reasons for isolation that is not relevant here IMO)

On the other hand, I think using process.env as a communication channel between running threads is a bad idea. We do have channel abstractions for the same (message ports) that provides fine grained, non-blocking communication facility.

However, ability for workers to use this entity to control child process environment improves the use case for workers by large. But isn't spawn(proc, args, env: Object.assign({}, process.env, { foo: 'bar' }) already possible?

@joyeecheung
Copy link
Member

joyeecheung commented Feb 23, 2019

If we do not allow invocations of process.chdir and the mutable process.umask in workers, I think we should disable modifications to the environment variables in workers as well.

@gireeshpunathil
Copy link
Member

I don't know the background of several process.x() being made unavailable to workers, I thought those are to do with either:

  • absence of a specification / requirement on the behavior of those, and / or
  • internal complexity of implementing those in the original version, but can incrementally add support for each of these APIs.

what drawback, in particular, do we see if we have a single read-write piece of memory for process.env that is protected by exclusive locks?

@joyeecheung
Copy link
Member

joyeecheung commented Feb 23, 2019

I believe the process methods are made unavailable in workers because we generally only allow the main thread to own (and mutate) per-process states - however the workers are able to read these states just fine. It's always possible for the users to post messages to the main thread and handle them to perform these mutations while using Atomics.await to do whatever locking their use cases require, so the question is, what is the benefit of shifting this responsibility into core and make process.env mutable in workers? Should we allow multiple workers to mutate the environment variables at the same time and defer to them to perform proper locking? (the setter implementation is protected with a mutex, but the user code around the setter invocations still need to take care of any races on their own)

addaleax added a commit to addaleax/node that referenced this issue Mar 30, 2019
Instead of sharing the OS-backed store for all `process.env` instances,
create a copy of `process.env` for every worker that is created.

The copies do not interact. Native-addons do not see modifications to
`process.env` from Worker threads, but child processes started from
Workers do default to the Worker’s copy of `process.env`.

This makes Workers behave like child processes as far as `process.env`
is concerned, and an option corresponding to the `child_process`
module’s `env` option is added to the constructor.

Fixes: nodejs#24947
addaleax added a commit that referenced this issue Mar 30, 2019
Allow a generic string-based backing store, with no significance
to the remainder of the process, as a store for `process.env`.

PR-URL: #26544
Fixes: #24947
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Yongsheng Zhang <zyszys98@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
addaleax added a commit that referenced this issue Mar 30, 2019
Instead of sharing the OS-backed store for all `process.env` instances,
create a copy of `process.env` for every worker that is created.

The copies do not interact. Native-addons do not see modifications to
`process.env` from Worker threads, but child processes started from
Workers do default to the Worker’s copy of `process.env`.

This makes Workers behave like child processes as far as `process.env`
is concerned, and an option corresponding to the `child_process`
module’s `env` option is added to the constructor.

Fixes: #24947

PR-URL: #26544
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Yongsheng Zhang <zyszys98@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
BethGriggs pushed a commit that referenced this issue Apr 5, 2019
Abstract the `process.env` backing mechanism in C++ to allow
different kinds of backing stores for `process.env` for different
Environments.

PR-URL: #26544
Fixes: #24947
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Yongsheng Zhang <zyszys98@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
BethGriggs pushed a commit that referenced this issue Apr 5, 2019
Allow a generic string-based backing store, with no significance
to the remainder of the process, as a store for `process.env`.

PR-URL: #26544
Fixes: #24947
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Yongsheng Zhang <zyszys98@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
BethGriggs pushed a commit that referenced this issue Apr 5, 2019
Instead of sharing the OS-backed store for all `process.env` instances,
create a copy of `process.env` for every worker that is created.

The copies do not interact. Native-addons do not see modifications to
`process.env` from Worker threads, but child processes started from
Workers do default to the Worker’s copy of `process.env`.

This makes Workers behave like child processes as far as `process.env`
is concerned, and an option corresponding to the `child_process`
module’s `env` option is added to the constructor.

Fixes: #24947

PR-URL: #26544
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Yongsheng Zhang <zyszys98@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
BethGriggs pushed a commit that referenced this issue Apr 9, 2019
Abstract the `process.env` backing mechanism in C++ to allow
different kinds of backing stores for `process.env` for different
Environments.

PR-URL: #26544
Fixes: #24947
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Yongsheng Zhang <zyszys98@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
BethGriggs pushed a commit that referenced this issue Apr 9, 2019
Allow a generic string-based backing store, with no significance
to the remainder of the process, as a store for `process.env`.

PR-URL: #26544
Fixes: #24947
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Yongsheng Zhang <zyszys98@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
BethGriggs pushed a commit that referenced this issue Apr 9, 2019
Instead of sharing the OS-backed store for all `process.env` instances,
create a copy of `process.env` for every worker that is created.

The copies do not interact. Native-addons do not see modifications to
`process.env` from Worker threads, but child processes started from
Workers do default to the Worker’s copy of `process.env`.

This makes Workers behave like child processes as far as `process.env`
is concerned, and an option corresponding to the `child_process`
module’s `env` option is added to the constructor.

Fixes: #24947

PR-URL: #26544
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Yongsheng Zhang <zyszys98@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
BethGriggs pushed a commit that referenced this issue Apr 9, 2019
Abstract the `process.env` backing mechanism in C++ to allow
different kinds of backing stores for `process.env` for different
Environments.

PR-URL: #26544
Fixes: #24947
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Yongsheng Zhang <zyszys98@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
BethGriggs pushed a commit that referenced this issue Apr 9, 2019
Allow a generic string-based backing store, with no significance
to the remainder of the process, as a store for `process.env`.

PR-URL: #26544
Fixes: #24947
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Yongsheng Zhang <zyszys98@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
BethGriggs pushed a commit that referenced this issue Apr 9, 2019
Instead of sharing the OS-backed store for all `process.env` instances,
create a copy of `process.env` for every worker that is created.

The copies do not interact. Native-addons do not see modifications to
`process.env` from Worker threads, but child processes started from
Workers do default to the Worker’s copy of `process.env`.

This makes Workers behave like child processes as far as `process.env`
is concerned, and an option corresponding to the `child_process`
module’s `env` option is added to the constructor.

Fixes: #24947

PR-URL: #26544
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Yongsheng Zhang <zyszys98@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed-bug Issues with confirmed bugs. worker Issues and PRs related to Worker support.
Projects
None yet