docs: Clarification around real world risks and use cases of VM module #40718
Comments
I think the primary use case of vm is tooling. Projects like jest, for example, make extensive use of it. However, as soon as you start getting into running code that you haven't vetted, you must take 100% of the burden of validation onto yourself; the number of footguns in the vm module here is immense, even with options like the contextCodeGeneration one you mentioned. Maybe something we could put in the documentation is a link to https://github.com/laverdet/isolated-vm, which does actually provide a way to run untrusted code with the guarantee that it cannot escape.
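For readers less familiar with that tooling use case, here is a rough sketch of the kind of thing test runners and bundlers do with vm: evaluating trusted source in a CommonJS-style wrapper. It is illustrative only (the ./fixture.js path is hypothetical) and is not a security boundary of any kind.

```js
// Illustrative sketch of the "tooling" use of vm: evaluate *trusted* module
// source inside a CommonJS-style wrapper, roughly the way test runners do.
// Nothing here is a security boundary; ./fixture.js is a hypothetical file.
const fs = require('fs');
const vm = require('vm');

const source = fs.readFileSync('./fixture.js', 'utf8');
const wrapped = `(function (exports, require, module, __filename, __dirname) {\n${source}\n})`;

// Compile and run the wrapper in the current context, then call it with a
// fake module object so we can capture whatever the file exports.
const moduleFn = vm.runInThisContext(wrapped, { filename: 'fixture.js' });
const fakeModule = { exports: {} };
moduleFn(fakeModule.exports, require, fakeModule, 'fixture.js', process.cwd());

console.log(fakeModule.exports);
```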
I think you summed up the footgun aspect of it much better than I ever could :P The first thing the average person thinks when they see the vm module is that it's a sandbox for safely running untrusted code, which it very much is not.

I think adding a little context (again, no pun intended), even as simple as the bit about tooling or testing frameworks, would be super useful to help people understand/digest what it could be used for. Linking to something like isolated-vm would also be valuable.

I realize my initial post might be a little long-winded, but I think there's a solid opportunity here for clarifying while also pointing people in constructive directions. I'd be more than happy to help write something up, but at the same time, I'll admit I definitely wouldn't put myself in the boat that fully understands all the caveats here, so having someone that does being able to ELI5 to a broader audience is definitely needed.
Here is an odd idea: maybe we should embed isolated-vm (or a very similar approach) into Node.js itself. cc @bmeck
As discussed in the Hackerone report #1398777, I disagree with this statement. I think building a secure sandbox on top of this module is close to impossible because of its unpredictable/inconsistent behavior. IMHO, the statement "Do not use it to run untrusted code" should be strengthened to "Do not use it to run untrusted code or for building systems that run untrusted code". It should also be colored in red and underlined if possible. 😁 However, I agree that including a pointer to something like isolated-vm would be useful.
cc @laverdet
@mcollina We weren't sure why this was on the TSC agenda. (Awareness? Decision? Something else?) We pushed it to next week's agenda, but if you want to leave a sentence or two explaining what the ask of the TSC is here, that would be great.
There are two fundamental questions:

1. Should Node.js provide (or officially point to) a mechanism for running untrusted code, for example by embedding something like isolated-vm?
2. How should the vm documentation describe the module's intended use cases and its lack of security guarantees?

Note this ties with some of the discussion about primordials.
Hi, I'm the isolated-vm guy. I personally don't think the node team should attempt to implement and maintain secure environments in core node. Additionally, I think blessing isolated-vm via a link in the core documentation could be irresponsible. isolated-vm is a great tool, but it still needs to be utilized by seasoned experts to avoid numerous footguns. I'm happy to jump on the TSC call to provide more color, but I'm also getting the sense that there's not actually a strong push here to get this kind of thing embedded.
I'll preface this by saying I don't have a strong opinion on whether something is added to node core itself, or whether something like isolated-vm is simply linked from the docs.

I don't believe anyone is saying there aren't footguns or bad outcomes to doing this incorrectly, but I want to call out that's already happening right now; we're already living in the worst case scenario. There are plenty of use cases where people are already using the vm module for things it was never designed to make safe.

Adding a link to the docs isn't giving a blessing to blindly use isolated-vm, either.

I also strongly believe "running untrusted code" is somewhat of a strawman, because most people actually want to run things in more constrained scenarios or with restrictions, and being able to clearly articulate that, with different options along with their risks, would be extremely valuable.

Unpopular opinion, and nothing personal against the authors of a lot of these libraries (I think they're awesome): I absolutely hate the blanket security warnings on a lot of modules in this space.

tl;dr - I strongly believe there's a large gap between blindly running untrusted code and a lot of the practical use cases people actually care about. I 100% agree we can't boil the ocean, but we can definitely boil a few cups of water or even a whole pot. Appreciate the discussion nonetheless :)
Documenting constraints and invariants of an API is just part of the job and, in the case of isolated contexts, security is a huge part of those constraints. I can't speak for other modules, but the "Security" section of isolated-vm's README has lots of actionable items: check nodejs for v8 updates which sneak in via semver-minor updates, implement process isolation, enable untrusted code mitigations, stay up to date on local privilege escalation vulnerabilities in your kernel, research container escape attacks, be aware of advanced persistent threats. Adding more information about security actually came at the insistence of @cristianstaicu, who chimed in here earlier. The internet is an exceptionally hostile sea, and getting these things wrong ends entire companies and causes non-theoretical real-world human suffering. The more awareness that can be raised about these things, the better.
I'll completely agree the security section for isolated-vm is genuinely useful.

Let's say someone wants to run untrusted code with no context. In other words, I have a string, I want to execute it in a fresh context with nothing passed into it at all and return a value. Is there a safe way to do this?

```js
const isolate = new ivm.Isolate({ memoryLimit: 8 });
const context = await isolate.createContext();
const data = await context.eval(code, { timeout: 1 });
```

If we ignore the possibility of trying to blow up a process's memory or running an infinite loop, is the above code "safe"? By safe, I mean: are there known vulnerabilities or exploits that can be used to escalate privileges, read env variables, etc.?

Let's take an even more strict use case. What if we use something like the following?

```js
// parse code using babel
// check the node type of each node
// if node type is not a string or numeric literal, throw an error, else continue
const isolate = new ivm.Isolate({ memoryLimit: 8 });
const context = await isolate.createContext();
const data = await context.eval(code, { timeout: 1 });
```

The ultimate point I'm trying to make is there is a difference between an absolutist point of view on security and a pragmatic view on security. An absolutist point of view is that nothing is technically safe, there could be unknown zero-day exploits in V8 allowing remote code execution, and perhaps the nodejs.org homepage should simply have a giant red banner on it at all times stating to not use nodejs unless you have a deep understanding of security and everything involved, but I'm assuming most people would laugh me out of the room if I suggested that (rightfully so :P).

What I'm suggesting is that node take a more pragmatic approach and walk people through a handful of real-world use cases and risks. Start with the cases the vm module is actually good for, like tooling, then work through why running untrusted code is so dangerous and what the more constrained patterns look like.

I'll admit, when I first created this issue it was a bit open-ended, and it's probably gotten even broader since then, so I'll try to summarize my thoughts in a sentence: the current vm docs tell people what not to do without helping them understand what the module is actually for.

EDIT:
> The internet is an exceptionally hostile sea and getting these things wrong ends entire companies and causes non-theoretical real-world human suffering.

I completely understand this view, but the question I'd challenge you with is: is slapping on security warnings, or basically making people feel like unless they have 50 years of experience the answer is always "no, you can't do that safely", the most effective way to mitigate that risk? Or is it possible to still explain the very real risks of doing it wrong while also providing ways to practically apply said functionality using best practices in a safe way?
Since I am the guy arguing for big red security banners, I'll try to defend that viewpoint below.

Overall, I like the examples @michael-melodyarc gave above, and a few months back I would have agreed that they are a valuable addition to the current Node.js docs. But they would give a false sense of security to those reading them (especially the empty context example, as I will discuss below).

Like @devsnek above, I thought that we can build fairly secure sandboxes on top of the vm module, but we recently found several problematic patterns that break exactly that assumption. We are in the process of disclosing these patterns to the maintainers of the sandboxes, so unfortunately I cannot share them here, but if you have access to the Hackerone report I mentioned above, please take a look. The problem is that the maintainers do not have many options here: they can either abandon their fairly popular packages or put pressure on you guys to fix the problem in the Node.js core, since their membrane-based approaches cannot catch these problematic references.

My report was closed as "informative" after getting the following reply: "This looks like a vulnerability on those sandboxes and not the vm module itself as it is clearly marked as not being security sandbox."

I know that this is a bit vague without a clear code example, and I apologize for that. I hope my viewpoint is clear, though: the docs should reflect the reply I got on Hackerone. That is: do not build anything dealing with untrusted code using this module, as there probably are some quirky corner cases which we are not willing to support you with. This will prevent situations like the one we are in: millions of users are exposed (according to the download stats, at least) without an easy way to deploy a fix. I know this will sound like academic blah-blah, but making security assumptions clear is crucial for smooth collaboration in open-source projects.
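For readers without access to the report, the textbook illustration below (long-public knowledge, and not one of the undisclosed patterns mentioned above) shows why a bare vm context is not a security boundary on its own: the contextified sandbox is an ordinary host object, so its prototype chain hands the evaluated code the host realm's Function constructor.

```js
// Classic, widely known escape from a plain vm context. The empty sandbox
// object lives in the host realm, so `this.constructor` resolves to the host
// Object constructor and, from there, to the host Function constructor.
const vm = require('vm');

const untrusted = `this.constructor.constructor('return process')().version`;
console.log(vm.runInNewContext(untrusted, {})); // prints the host Node.js version
```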
@cristianstaicu Appreciate the insight; while I'm not privy to your specific work (it definitely sounds interesting), I understand where you're coming from and what you're wanting to protect people against. I want to clarify a few things on my part and wrap up with, hopefully, some concrete actions that node core could say yes or no to.

I'm not advocating we make the vm module itself into a secure sandbox. I know you can't get into the details, so perhaps these questions can only be answered in general terms. Maybe the answer is you literally can't do anything securely no matter how simple, which is certainly not a fun answer, but if it's accurate, it is what it is.

I'm hopeful we could provide some "cookbook" type docs that would maybe walk through some use cases better and give examples, while still calling out risks, etc., in an easier-to-digest way. I'm not suggesting node core make any absolute security guarantees about any of this, or expecting @laverdet to make any guarantees about blindly using isolated-vm.

Before I get to the list of concrete actions, I'll call out one assumption I'm making, which is that there are at least some ways to do certain subsets of things securely, whether that's through the vm module with aggressive input validation, isolated-vm, or something else entirely.

Without going down a rabbit hole, I'm simply curious if there is guidance (even if high-level) that could be given to help people at least start exploring this area or doing scoped-down use cases of "executing untrusted code", even if it's not in the traditional sense of blindly running a blob but more so in the realm of "check if code is a valid expression, parse into AST, validate node types, only use primitives, etc."

To end, I'm more than happy to spend some time writing up some sections for the docs if everyone is open to that, but I'll also acknowledge I don't fully understand this problem space, so it's a bit hard for me to write up what some of these sections might ultimately look like. Appreciate the in-depth responses.
@cristianstaicu - throwing out one last thing here. I'm genuinely curious what your initial reaction to something like this would be: is it (1) this is relatively sane/safe and should work as expected, or (2) I can't believe there are people in the world who would even consider writing code like this :P

```js
import ivm from 'isolated-vm';
import { parseExpression } from '@babel/parser';
import { traverseFast } from '@babel/types';

const code = '...';
const someJsonBlobThatIsParsed = JSON.parse('...');

const allowedNodeTypes = [
  'BinaryExpression',
  'Identifier',
  'LogicalExpression',
  'MemberExpression',
  'NumericLiteral',
  'StringLiteral',
];

traverseFast(parseExpression(code), (node) => {
  if (!allowedNodeTypes.includes(node.type)) {
    throw new Error('Node type not allowed: ' + node.type);
  }
});

const isolate = new ivm.Isolate({ memoryLimit: 8 });
const context = await isolate.createContext();

await context.evalClosure(
  'for (const [k, v] of Object.entries($0)) globalThis[k] = v',
  [someJsonBlobThatIsParsed],
  { arguments: { copy: true } },
);

const result = await context.eval(code, { copy: true, timeout: 1 });
```

The goal was to allow simple JavaScript expressions over a plain JSON context: identifiers, member access, literals, and comparison/logical operators, with everything else rejected up front.
First of all, the code you shared above looks ok-ish to me (maybe a yellow flag for the globalThis-assignment statement in the evalClosure call).

So, yes, I agree with you, there may be some safe use cases if you aggressively restrict the input, but IMHO most of the real-world use cases do not look like this. Thus, it is important to know what happens in case you messed up and your first line of defense, e.g., your membrane or static analysis, is not effective.

Myself, I would either opt for a frozen runtime (à la SES) or for an isolation mechanism that uses separate V8 Isolates, like isolated-vm. Most importantly, you would want to make sure you are not in the case I described above, where the building blocks of your infrastructure (the vm module, in that case) behave in ways you did not anticipate.
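For context on the frozen-runtime option, here is a minimal sketch using the ses package (the SES shim); the API names follow its documentation, and this is an illustration rather than a complete hardening recipe.

```js
// Minimal sketch of the "frozen runtime" route using the ses shim.
require('ses');

lockdown(); // freeze the shared intrinsics (Object.prototype, Array.prototype, ...)

// A Compartment gets its own globals; only explicitly passed endowments are reachable.
const compartment = new Compartment({
  print: harden((value) => console.log(value)),
});

compartment.evaluate(`print(1 + 1)`); // 2
console.log(compartment.evaluate(`typeof process`)); // 'undefined'
```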
Sorry, haha, I was being a bit tongue-in-cheek with my number 2 option :P I was more poking fun at myself since I wrote that snippet. I must have also gotten mixed up somewhere along the way, as I thought you were initially against making any mention of alternatives like isolated-vm in the docs. Anyways, wholeheartedly agree with the rest of your comment.

Some of my suggestions are probably aimed at a cookbook-style walkthrough: "So you want to execute some simple code that only allows some literals and a few operators? Okay, we can do that." Security is hard, and I think the current lack of examples (good uses and bad uses) makes it difficult for most to understand the why behind the recommendations. People generally aren't great at thinking in abstract terms, so while saying "don't run untrusted code" is accurate, it doesn't give people much to reason with.

Anyways, I'm rambling now, but I think there are a ton of learning opportunities in and around this problem space (it's a cool space), some of which touch node core and some of which don't, that could make a real difference to people looking at the docs and messing around with things.
@devsnek @mcollina - Piggybacking on:

> Here is an odd idea: maybe we should embed isolated-vm (or a very similar approach) into Node.js itself.

Is there any context (no pun intended) for why the current vm module is shaped the way it is? Thought exercise: if vm didn't exist today and were proposed as a new core module, would it go in as-is?
I do not have the time to contribute much to this discussion beyond the viewpoint of a security triager. I think it would be awesome for the folks involved in this discussion to collaborate on some updated text for the documentation. If you all would also like to work on some additional code which guarantees more safety, that would be awesome too.
I think people asking for ways to make it more secure should attend the SES/Realms TC39 calls, because I completely agree with @michael-melodyarc that doing so is not feasible or practical in a general sense. Lots of time in those calls goes into things like how storing an Object in an internal slot (private field) can absolutely break out of a membrane. These kinds of nuances are not things I think could be taught by simple documentation, nor would they be backwards compatible with the API design of the vm module.
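As a small illustration of the private-field/internal-slot issue mentioned above (my own example, not from the call notes): even a fully transparent Proxy cannot forward private fields, which is why membranes end up unwrapping objects, and every unwrap is a potential hole.

```js
// Private fields live on the target object itself; a Proxy wrapper does not
// carry them, so methods invoked through a membrane-style proxy blow up.
class Box {
  #secret = 42;
  reveal() { return this.#secret; }
}

const real = new Box();
const membrane = new Proxy(real, {}); // no traps at all, maximally transparent

console.log(real.reveal()); // 42
try {
  membrane.reveal(); // `this` is the proxy, which has no #secret slot
} catch (err) {
  console.log(err.constructor.name); // 'TypeError'
}
```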
From the TSC meeting today: we did not have people with the required context, but this is what we discussed on the two questions to the TSC.
It also wraps functions in a way that's very different from vm, plus throwing.
I agree, but the devil is in the details. Since the vm module is a wrapper around V8 Isolates, in theory it could provide similar guarantees as, let's say, isolated-vm, ShadowRealm or any other similar proposal, but it does not. As I mentioned above, we found a clear bug in the Node.js vm module and I reported it on HackerOne because it breaks the assumption that "membranes on top of the vm module are enough to isolate untrusted code" in a lot of sandboxes we analyzed. I got the reply: "This looks like a vulnerability on those sandboxes and not the vm module itself as it is clearly marked as not being security sandbox. We clearly do not provide any security guarantee on the behavior of vm." I understand that the Node.js team can decide what goes into their product and what does not, but this should be made clear to the community, the sandbox creators, users, etc. Once again, this discussion is about willingness to support certain use cases over others.
That's not true. The vm module does not create separate V8 Isolates; it creates additional Contexts inside the host process's Isolate.
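A quick way to see the distinction (my own illustration): everything a vm context allocates lives in the host process's single isolate, so there is no separate heap or per-context memory limit of the kind an isolated-vm Isolate provides.

```js
// Allocations made inside a vm context count against the host's V8 heap.
const vm = require('vm');

const before = process.memoryUsage().heapUsed;
const context = vm.createContext({});
vm.runInContext('globalThis.big = new Array(1e7).fill(0)', context);
const after = process.memoryUsage().heapUsed;

console.log(`host heap grew by ~${Math.round((after - before) / 1e6)} MB`);
```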
Sorry, my mistake there. What I wanted to say is that for base isolation, all these systems rely on tried and tested building blocks from the engine (Isolate, Context). So they should provide similar guarantees as those building blocks, no? My understanding is that inside the browser, Contexts are also used to isolate mutually untrusted components, i.e., iframes, scripts of Chrome extensions, etc. So why couldn't they be used for the same purpose in Node.js? Am I missing something? Sure, once you allow shared references between contexts a lot of subtle problems may appear, but the bug we found was in a "fresh" new context, where you do not expect many such subtleties.
At the risk of coming across hyperbolic, can the docs be updated to include something like the following (ignore the exact wording, just iterating intent)?

> The vm module is not a security mechanism. Do not use it to run untrusted code or to build sandboxes that run untrusted code. If you do, you are on your own.

At this point, there are two main discussions here:

1. Updating the wording of the warning in the vm docs (something like the above).
2. Whether the docs should go further and walk through practical use cases, risks, and alternatives.

The reason I keep getting hung up on 2 is that the current one-sentence disclaimer leaves a huge gap between "don't do this" and the things people are actually trying to do.
While I tend to say yes to your first proposal/question, I am not sure that that is the responsible thing to do at this point. There is a certain sandbox with five million weekly downloads, built on top of the vm module, and with a statement like that its users would effectively be told they are on their own.
But they already are on their own; they just don't know it. I totally agree with you in terms of not wanting to leave people to fend for themselves, but it seems like we're already living in the worst case scenario: things aren't secure/safe AND no one wants to say it out loud (minus a one-sentence disclaimer in the docs). To me, the most responsible thing to do is make it blatantly obvious what they're doing isn't safe (and even better if we can give clear examples / information as to why).
Removing the tsc-agenda label as there is a PR open.
Refs: #40718
PR-URL: #41916
Reviewed-By: Gus Caplan <me@gus.host>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Reviewed-By: Beth Griggs <bgriggs@redhat.com>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Mestery <mestery@protonmail.com>
Reviewed-By: Derek Lewis <DerekNonGeneric@inf.is>
Reviewed-By: James M Snell <jasnell@gmail.com>
Is this issue resolved?
I think so, but if anyone has more concrete things that need to be done, leave a comment!
📗 API Reference Docs Problem
Version: n/a
Platform: n/a
Subsystem: VM
Affected URL(s):
Description
Apologies if this isn't the appropriate place for this; I wasn't sure if it fell under help versus docs versus somewhere else.
The official docs for the VM module state the following in the first line of its description:

> The vm module is not a security mechanism. Do not use it to run untrusted code.
The last line of the Example: Running an HTTP server within a VM section says:
Now, I think it's fairly safe to say it's quite clear that blindly running arbitrary code through the vm module (or anything else) is most definitely not secure or safe. The second quote is a bit less definitive, though, and things like the contextCodeGeneration option in runInNewContext (sketched below) are almost a bit of a teaser that might lead people to believe in some cases it is okay, but ultimately I digress :)
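For reference, this is roughly what that option does (an illustrative sketch): it disables eval() and new Function() inside the new context, which is useful hardening but not a sandbox by itself.

```js
// contextCodeGeneration only blocks string-based code generation inside the
// context; it does not make the context a security boundary.
const vm = require('vm');

const result = vm.runInNewContext(
  `
  let blocked = false;
  try { eval('1 + 1'); } catch (err) { blocked = true; }
  blocked;
  `,
  {},
  { contextCodeGeneration: { strings: false, wasm: false }, timeout: 100 }
);

console.log(result); // true: eval() was refused inside the new context
```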
This issue is two-fold:

First, I'm curious if it's possible to have additional context (no pun intended) about when the vm module is useful in practice to the average developer. I think it's safe to say a lot of people misunderstand it, and in some cases that can have very bad security consequences.

Second, I'm curious if it's possible to elaborate on restricted/well-defined situations where using the vm module could be safe under certain conditions. The following is a contrived example, and one could argue it's not even an example of untrusted code, but I'm hoping it'll at least illustrate the idea.

Say, for example, you take an expression, parse it into an AST, and validate that it only includes certain nodes. For the sake of argument, let's say you only accept identifiers, numeric literals, and operators like <, >, ==, and &&. Now take the following example:
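Something along these lines, as a minimal sketch (the sample expression, allow-list, and context data are illustrative, and it assumes @babel/parser and @babel/types are installed):

```js
// Parse the expression, reject any AST node type outside a small allow-list,
// then evaluate it in a vm context that contains only plain data.
const vm = require('vm');
const { parseExpression } = require('@babel/parser');
const { traverseFast } = require('@babel/types');

const code = 'age > 21 && country == "US"'; // user-supplied expression
const allowedNodeTypes = [
  'BinaryExpression',
  'LogicalExpression',
  'Identifier',
  'NumericLiteral',
  'StringLiteral',
];

traverseFast(parseExpression(code), (node) => {
  if (!allowedNodeTypes.includes(node.type)) {
    throw new Error('Node type not allowed: ' + node.type);
  }
});

const context = vm.createContext({ age: 30, country: 'US' });
const result = vm.runInContext(code, context, { timeout: 10 });
console.log(result); // true
```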
Now, you could certainly argue that if you're parsing a string into an AST, validating the nodes, etc., you aren't really running untrusted code at that point, but bear with me. Is this even a valid use case for the module? Is this even safe, assuming you can validate the AST and the context you're passing in? Does this make you want to hit your head against your desk? :P
Ultimately, there are obviously other ways to do this: since you're already parsing it out, you could just write your own logic to piece it back together without ever running the actual code itself, but this seems like a simple way to actually achieve evaluation of simple JS expressions, among other potential use cases.
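For comparison, here is a sketch of that "piece it back together yourself" route: a hypothetical mini-evaluator over the allowed node types, with no vm involved at all.

```js
// Interpret the validated AST directly instead of handing the string to vm.
const { parseExpression } = require('@babel/parser');

function evaluate(node, scope) {
  switch (node.type) {
    case 'NumericLiteral':
    case 'StringLiteral':
      return node.value;
    case 'Identifier':
      if (!Object.prototype.hasOwnProperty.call(scope, node.name)) {
        throw new Error('Unknown identifier: ' + node.name);
      }
      return scope[node.name];
    case 'BinaryExpression': {
      const left = evaluate(node.left, scope);
      const right = evaluate(node.right, scope);
      if (node.operator === '<') return left < right;
      if (node.operator === '>') return left > right;
      if (node.operator === '==') return left == right;
      throw new Error('Operator not allowed: ' + node.operator);
    }
    case 'LogicalExpression':
      if (node.operator !== '&&') throw new Error('Operator not allowed: ' + node.operator);
      return evaluate(node.left, scope) && evaluate(node.right, scope);
    default:
      throw new Error('Node type not allowed: ' + node.type);
  }
}

const expr = parseExpression('age > 21 && country == "US"');
console.log(evaluate(expr, { age: 30, country: 'US' })); // true
```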
I think the vm module is one of the most misunderstood modules out there, because a lot of developers I've talked with either completely misunderstand what it does, how you'd use it, or why you'd even use it, along with the potential for horrible security consequences when used inappropriately. It'd be awesome if the docs could include some more information on the practical side of things and where the module really shines in real-world usage. Telling people when not to use something is certainly value added, but also illustrating real-world use cases for when you would want to use it can be equally valuable.