
Yarn Plug 'N Play should generate a static manifest file, not .pnp.js #6388

Closed
elldritch opened this issue Sep 14, 2018 · 8 comments

@elldritch

elldritch commented Sep 14, 2018

This is a response to yarnpkg/rfcs#101 (comment). For context, my previous response to the PnP proposal is here.

I won't reproduce in detail the arguments against .pnp.js I've previously presented (they can be read in the context comment). To summarise:

  1. .pnp.js makes dependency analysis very difficult in environments without Node.JS. I believe this covers a non-trivial number of use cases.
  2. .pnp.js presents security concerns that are not outweighed by its advantages in implementation flexibility.

To directly address @arcanis's concerns:

please check the whitepaper, it includes a rationale explaining why we went with a JavaScript file

Correct me if I'm wrong, but I interpreted the rationale to be "the implementation is very complex and might change, so it's easier to provide an executable API".

How significant is your use case (running the Node resolution without Node) compared to the ecosystem at large?

Helping developers and enterprises use dependency analysis tools is my day job, and I see this use case again and again and again. One of the methods we use for Node.JS dependency analysis is basically "run npm ls --json and parse the output" and we've repeatedly encountered environments where Node.JS isn't available in a CI environment even for projects written in Node.JS. CI environments at large enterprises are often complex, multi-stage, and multi-image; integrating polyglot dependency analysis tools at the project-level CI stage where Node.JS is available is often infeasible.
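To make the method concrete, it is roughly the pattern below (a minimal sketch, not our actual tooling; the flattening helper is invented for illustration):

const { execSync } = require("child_process");

// Run `npm ls --json` and flatten the dependency tree into name/version
// pairs. Note: `npm ls` exits non-zero on unmet peer dependencies, which
// makes execSync throw; real tooling has to handle that.
function listDependencies(projectDir) {
  const tree = JSON.parse(
    execSync("npm ls --json", { cwd: projectDir, encoding: "utf8" })
  );
  const deps = [];
  const walk = (node) => {
    for (const [name, info] of Object.entries(node.dependencies || {})) {
      deps.push({ name, version: info.version });
      walk(info);
    }
  };
  walk(tree);
  return deps;
}

console.log(listDependencies(process.cwd()));

Even this trivial pattern requires a working Node and npm toolchain in the environment performing the analysis, which is exactly what's missing in the CI environments I'm describing.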

This isn't an off-hand edge case. We get bug reports caused by missing Node runtimes during dependency analysis on a regular (~weekly) basis.

The flexibility you talk about comes at a price, which is the flexibility for everyone using Node - consumption from third-party JS tools would be made harder since users would have to know about the data layout, and only a single implementation of the resolver could ever exist. Versioning data structures is notoriously complex, while JS APIs are easy to version and to make backward compatible regardless of the actual implementation.

This makes a lot of sense to me, and I sympathise with this concern. I think the critical question here is: how often do you think the data layout will change?

Static manifest files have a lot of advantages. They're easy to parse, easy to analyse, portable (requiring Node means the current API is not really portable), and pose significantly fewer security risks. It seems premature to give up on all of these advantages to gain flexibility that you might not end up really needing.

In what scenarios do you see the data layout changing? How often do you think this will occur?

I think the performance improvement from avoiding a copy is so great that I would be surprised if you needed constant data layout changes to improve performance. In the case where changes are not often needed, it's feasible to add a version field to the data structure and provide a resolver library that switches on the version.
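To sketch what I mean (hypothetical field names, not a proposed Yarn format):

const fs = require("fs");

// Load a static resolutions manifest and return a resolver matching its
// declared version. Supporting a new layout only means adding a case here.
function loadResolver(manifestPath) {
  const manifest = JSON.parse(fs.readFileSync(manifestPath, "utf8"));
  switch (manifest.version) {
    case 1:
      // v1: a flat map from "name@reference" to an on-disk location.
      return (name, reference) => manifest.resolutions[`${name}@${reference}`];
    default:
      throw new Error(`Unsupported manifest version: ${manifest.version}`);
  }
}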

I would be much less opposed to an executable API if it were portable (e.g. in Bash).

Given a standardized API, what prevents you from running a script after yarn install that would consume the API and generate a file in whatever format is adaptable to your use case?

I can't consume the API if I can't run Node. For reasons outside of my control, it's infeasible for me to get all integrating Node projects to install a post-install hook.

@ghost ghost assigned arcanis Sep 14, 2018
@ghost ghost added the triaged label Sep 14, 2018
@arcanis
Member

arcanis commented Sep 14, 2018

Hey! Thank you for raising this issue! It's an important point, and since it goes a bit against the current state of things (even if not entirely - cf. webpack.config.js, .eslintrc.js, etc.) I'm more than willing to explain why we went there, and which problems we think it solves.

I can't consume the API if I can't run Node. For reasons outside of my control, it's infeasible for me to get all integrating Node projects to install a post-install hook.

This seems dubious to me. If you can run Yarn, you can run Node. Even in your scenario where you have two images, you still have one image with Node that you use to run Yarn. I don't see what prevents you from doing this conversion as a secondary step. It doesn't have to be a post-install script: you could just run it yourself.

Correct me if I'm wrong, but I interpreted the rationale to be "the implementation is very complex and might change, so it's easier to provide an executable API".

Not so much that it's very complex - we just don't expect other package managers to share our idea of what the implementation should be. NPM has a history of doing things its own way, and I don't want to end up in a scenario where projects have to live with X different formats (assuming that PnP gets traction). Standardizing it as a JS API is a good way to circumvent such issues.

They're easy to parse, easy to analyse

I disagree on the "easy to analyze" part. In order to analyze them, you have to understand them. Meaning that you have to implement the exact same logic as any other loader, and any shortcut might give one of your users wrong results. An API is much easier to use and less error-prone.

In what scenarios do you see the data layout changing? How often do you think this will occur?

I don't know. What I do know is that until 24h ago, the prospect of removing node_modules wasn't something anyone would have deemed credible, but here we are. I'd like to make sure that the improvements we make won't lock us out of future evolutions.

I would be much less opposed to an executable API if it were portable.

I think Node is the most portable thing we can get for Node applications 😉

@elldritch
Author

If you can run Yarn, you can run Node.

I think Node is the most portable thing we can get for Node applications 😉

This is where we disagree. My point is that "analyze a Node project in an environment without Node.JS" is not an uncommon edge case. This is my day job, and I've seen the pattern of "the organisation-wide CI does not have the runtime" over and over again across countless organisations, including top technology companies, Fortune 100s, etc.

It might help if I give you more context. Here is how I believe you imagine the integration process looks (this is how I thought of it at first too):

  1. A developer writes Node.JS project.
  2. A developer wants to integrate Node.JS tool.
  3. A developer runs Node.JS tool in the build/test image for the Node.JS project, where Node-the-runtime is available.

Here is how this process actually looks across many companies, 90% of the time (this is the rule, not the exception):

  1. Thousands of developers write hundreds of projects, some of which are in Node.JS.
  2. Since each of these projects is different, each runs its own project CI. Each CI is managed by a developer familiar with the project. The organisation has an organisation-wide CI that does final smoke testing and automatic release. The organisation-wide CI only has the common dependencies for smoke testing, which often does not include Node.JS.
  3. An engineering decision-maker decides to integrate dependency analysis.
  4. Individual project CI maintainers push back, because they see this as more work for them.

At this point, we have two options for integrating a tool across an organisation:

  1. Go to each team responsible for a Node.JS project (there can be tens to hundreds of these teams), set up a meeting with them, figure out how to work with their project CI's quirks, and help them integrate the tool. In this path, the tool can rely on having Node.JS in exchange for having a long and painful path to integration.
  2. Integrate once across the entire project at the organisation-wide CI step. In this path, we don't necessarily have access to Node.JS in the environment, but we can integrate across many projects at once.

I cannot overemphasise how common this organisational structure and use case are, or how painful it is to integrate on a project-by-project basis. One significant advantage of using a file format is that we can do analysis at the organisation level (and in general, we can do analysis in a portable way, regardless of what environment the tool is running in).

On one hand, I can see the argument that this is a people/enterprise problem and not a tool problem. On the other hand, I believe that tools should be built around real problems felt by real users, and not supporting this use case will cause a lot of pain for a lot of users down the line.

I disagree on the "easy to analyze" part. In order to analyze them, you have to understand them. Meaning that you have to implement the exact same logic as any other loader, and any shortcut might give one of your users wrong results. An API is much easier to use and less error-prone.

There are a couple of subtle effects in practice that I think make analysis much less difficult than you believe. I'm going to break this down into a few points:

In order to analyze them, you have to understand them

I don't think the file format would be significantly harder to understand than the API. Obviously, you know better than me here; but I imagine that you can represent most of the needed state by mapping package names + revisions to file paths.
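For illustration, I'd imagine something in the spirit of the following (a purely hypothetical shape, not a proposed format):

{
  "version": 1,
  "resolutions": {
    "left-pad@1.3.0": ".yarn-cache/left-pad-1.3.0",
    "lodash@4.17.11": ".yarn-cache/lodash-4.17.11"
  }
}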

you have to implement the exact same logic as any other loader, and any shortcut might give one of your users wrong results

One thing that we've learned in practice (that was very surprising to me) is that users are generally okay with approximately correct answers. As an example, the configuration spec for Bower is absolutely crazy (you can override the bower_components folder name with environment variables, global configuration files, project configuration files, etc.) but users are very happy with the 99.9% correct answer of "just read bower_components".

At the very least, users in practice have strongly preferred "approximately correct" to "no answer available" (or even "answer available, but you have to put in a lot more effort").
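To illustrate just how small that heuristic is (a sketch; .bower.json is the resolved-metadata file Bower writes into each installed package):

const fs = require("fs");
const path = require("path");

// The "99.9% correct" answer: ignore every configuration override and just
// read bower_components, taking each package's resolved version from its
// .bower.json.
function approximateBowerDeps(projectDir) {
  const dir = path.join(projectDir, "bower_components");
  if (!fs.existsSync(dir)) return [];
  return fs.readdirSync(dir).map((name) => {
    const meta = JSON.parse(
      fs.readFileSync(path.join(dir, name, ".bower.json"), "utf8")
    );
    return { name, version: meta.version };
  });
}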


It sounds to me like the main problem right now is that I haven't convinced you that my use case is common and real. What other context can I share that would help you make that decision?

@edmorley
Contributor

Integrate once across the entire project at the organisation-wide CI step. In this path, we don't necessarily have access to Node.JS in the environment, but we can integrate across many projects at once.

If you can bundle a tool, why can't you also bundle a runtime for the tool, at the org-wide CI step?

@arcanis
Member

arcanis commented Sep 17, 2018

Go to each team responsible for a Node.JS project (there can be tens to hundreds of these teams), set up a meeting with them, figure out how to work with their project CI's quirks, and help them integrate the tool.

At least for now, Plug'n'Play will require a small integration, in that those teams will have to enable it manually (by adding "installConfig": {"pnp": true} to their package.json). What would you think of simply adding a step to your global org CI, similar to this:

# Fail the org-wide CI for projects that enable PnP without also
# checking in a static resolutions table.
if [[ -f .pnp.js && ! -f static-resolutions.json ]]; then
  echo "It seems you're trying to register a package without checking in the static resolutions table"
  echo "Check this page for more information on how to solve this issue: http://internal.wiki/pnp"
  exit 1
fi >&2

It sounds to me like the main problem right now is that I haven't convinced you that my use case is common and real

I think it's more that I'm not convinced that it's something that should be fixed at the package manager level. I'm trying to understand why there is no other layer where this can/should happen.

@elldritch
Author

@edmorley: bundling a runtime is possible, but I would rather avoid this step. It's a significant complexity cost that every tool trying to analyse Yarn PnP projects would need to pay.

@arcanis:

What would you think of simply adding a step to your global org CI similar to this?:

This sounds like a good intermediate step. The pain here is that building those static resolution tables will still require a project-by-project integration.

I think it's more that I'm not convinced that it's something that should be fixed at the package manager level. I'm trying to understand why there is no other layer where this can/should happen.

My intuition says that the package manager is the right place to fix this, but I'm still trying to gather those thoughts into words. It feels to me like the package manager is the most seamless layer to implement this functionality. As another data point, here's another comment with similar concerns (excerpt below): #2250 (comment)

It would be great, then, if there were a better way than the above to figure out what's in a Yarn lockfile, when using something other than Node as our scripting language.

@arcanis
Member

arcanis commented Sep 25, 2018

Hey folks, I put together the following project: https://github.com/arcanis/build-pnm

It consumes the PnP API in order to generate a static package-name-maps file. As you can see by checking the source, it's not a lot of code, and you could easily adapt it to output any format you want! In fact, it could likely be generalized to output data based on a template 🙂
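The gist is a small walk over the dependency graph the API exposes, roughly like this (a sketch in the same spirit, not build-pnm's actual source; getPackageInformation and the top-level {name: null, reference: null} locator are part of the PnP API):

const fs = require("fs");
const pnp = require("./.pnp.js");

// Breadth-first walk over the PnP dependency graph, collecting each
// package's on-disk location into a plain serializable object.
const resolutions = {};
const queue = [{ name: null, reference: null }]; // top-level locator
while (queue.length > 0) {
  const { name, reference } = queue.shift();
  const key = `${name}@${reference}`;
  if (key in resolutions) continue;
  const info = pnp.getPackageInformation({ name, reference });
  resolutions[key] = info.packageLocation;
  // packageDependencies is a Map from dependency name to resolved reference.
  for (const [depName, depReference] of info.packageDependencies) {
    if (depReference !== null) queue.push({ name: depName, reference: depReference });
  }
}

fs.writeFileSync("static-resolutions.json", JSON.stringify(resolutions, null, 2));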

The pain here is that building those static resolution tables will still require a project-by-project integration.

This could be alleviated if we had a plugin system. I have something in mind regarding this; I haven't progressed enough to show anything yet, but just know that it's something I'd be super interested in pursuing.

It would be great, then, if there were a better way than the above to figure out what's in a Yarn lockfile, when using something other than Node as our scripting language.

It's a somewhat different concern, in that the problems behind the lockfile format being close-to-YAML-but-not-YAML are clear, and the solution is well understood (we should have added a few characters that would have made the file a subset of YAML). The topic here is much more uncharted territory.

@elldritch
Author

The plugin system sounds interesting -- our Maven integration uses this route as well.

It's a somewhat different concern in that the problems behind the lockfile format being close-to-yaml-but-not-yaml are clear, and the solution is well understood (we should have added a few characters that would have made the file a subset of yaml).

Yes, that's correct. My point was that there exist other people out there who want to understand Yarn projects without using Node.

@arcanis
Member

arcanis commented Mar 11, 2019

Closing: Yarn v2 has an option (pnpEnableInlining) that disables the inlining and causes Yarn to generate two files instead of one: the manifest and the loader. Additionally, we'll expose a special package that you will be able to depend upon to instantiate a fresh PnP API from a serialized manifest.
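With inlining disabled, generating the split files is a single configuration flag (shown against the v2 .yarnrc.yml configuration; in released versions the serialized manifest lands next to the loader as .pnp.data.json):

# .yarnrc.yml
pnpEnableInlining: false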

Note that the manifest should be treated as an opaque file, and any assumption regarding the data it contains might lead to breakages (ideally only between major releases, but possibly minors as well).
