Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ES modules exports (implements #2718) #2826

Merged
merged 19 commits into from
Mar 29, 2020
Merged

Conversation

jgonggrijp
Copy link
Collaborator

@jgonggrijp jgonggrijp commented Feb 28, 2020

Ready for review but not ready for merging; see "before merging" notes below.

Purpose

Reformatting the underscore source code to use ES6 exports, in order to facilitate treeshaking with tools such as rollup. See #2718.

Non-purposes

  • Converting other parts of the code to ES6+. I didn't introduce any const, let, async or arrow functions.
  • Modularization. All the underscore functions are still in a single module, in the same order.

I agree with and followed #2718 (comment) and #2718 (comment). One change at a time.

This is also one of the reasons this isn't ready for merge yet; the move to ES6 exports obviates one level of indentation, but I kept that indentation for now in order to ease review.

Update: a commit that removes the obsolete indentation was added to this PR on 2020-03-13.

What I did

  • Renamed underscore.js to underscore-source.js, because the former will be generated from the latter as of this PR. The generated underscore.js is functionally equivalent (except for some minor differences detailed below) to the former underscore.js. underscore-source.js is just a working name; see notes under "before merging".
  • Removed the IIFE wrapper, firstly because ES6 export statements must be at module level and secondly because rollup will generate an IIFE for the UMD bundle.
  • Removed the hand-coded UMD logic.
  • Reformatted the exports as well as internal functions in underscore-source.js in the following ways:
    1. Replaced _.name = function(){}; by export function name(){}.
    2. Replaced _.name = expression; by export var name = expression;.
    3. Replaced var internalName = function(){}; by function internalName(){} for consistency with 1. and also for brevity and a little added convenience (i.e., hoisting).
    4. Replaced internal uses of _.name by just name, except for _.iteratee and _.templateSettings because users may override these.
  • Changed some internal names without affecting the exported interface. The removal of the _. namespace for internal use introduced some naming conflicts, which I solved as follows:
    • There was both an internal has and an exported has. I renamed the latter to _has for internal use, because the non-exported has is used more. _has also resembles the old _.has.
    • Likewise for flatten/_flatten.
    • The exported isNaN has a counterpart that is an ES builtin. I renamed the former to _isNaN for internal use.
    • Likewise for isFinite/_isFinite.
    • keys is used very often as a local variable inside functions. I renamed the exported keys to _keys for internal use.
    • now is also the name of a local variable in throttle, which is the return value of the exported now. I renamed the local variable to _now.
  • Added an auxiliary underscore-module.js (working name) which takes all the exports from underscore-source.js and mixes them into _. This is a trick to avoid hand-coding all the exports twice, i.e., to avoid having to write both export name and _.name = name.
  • Set the module field in the package.json to the underscore-module.js, so that tools like rollup will be able to find the export-based version while everyone else can continue using the UMD bundle.
  • Added an auxiliary underscore-umd.js (working name) that re-exports only the default export from underscore-module.js. This facilicates the UMD bundling by rollup.
    Update 2020-03-26: in the meanwhile, the roles between the underscore-module and the underscore-umd has reversed. The latter is the module that mixes all functions into the _ object and the import order is underscore-source -> underscore-umd -> underscore-module.
  • Added rollup configuration to generate the UMD bundle, making sure to produce an UMD bundle which is as close to the hand-written original as possible. I used an outdated version of rollup so I could use the legacy option, which prevents rollup from generating getters, which would be incompatible with IE8.
  • Delegated the creation of the noConflict function to rollup, which has an option for this.
  • Added tests to verify that treeshaking is enabled and not overzealous. See notes about treeshaking below under "quantitative changes".
  • Added commands (scripts) to the package.json to generate the UMD bundles (both the underscore.js and two bundles for the treeshaking tests) and prepended these commands to other commands that depend on their existence.
  • Updated and configured eslint, and added a plugin, to enable the linter to process import/export syntax.

I did not convert the tests to ES module syntax; in fact, they still import from the (now generated) underscore.js.

Update 2020-03-26: In the meanwhile, in addition to the above, I have done a few more things which are detailed in the comments below.

Interface changes

  • The UMD wrapper generated by rollup exposes the same interfaces to the same module systems (CommonJS/AMD/window embed). However, it is a bit less refined than the former hand-written version. I don't expect it to cause issues, but I should still mention that it isn't 100% equivalent.
  • The noConflict method generated by rollup is 100% equivalent to the fomer hand-written version. However, it is only added in the window embed scenario. ESM, AMD and CommonJS module users will not see the method anymore, although they shouldn't need it, either.

I believe that the published UMD interface is otherwise completely identical to what is currently on the master branch.

This PR shows a comparison between the former hand-written underscore.js and the new reformatted underscore-source.js. In order to appreciate the interface changes, one might want to see a comparison between the former hand-written underscore.js and the new rollup-generated underscore-js instead. I pushed a secondary branch to make this possible. The diff can be found over here: https://github.com/jashkenas/underscore/compare/master...jgonggrijp:es-module-compiled?expand=1#diff-0f36b362a0b81d6f4d4bfd8a7413c75d (you may have to click "load diff").

Quantitative changes

The amount of code to maintain decreases very slightly.

. hand-written underscore.js proposed _source + _module + _umd
total lines of code 1692 1684
gzipped kB 16.32 16.04

The weight of the published gzipped + minified UMD bundle increases by 240 bytes.

. hand-written UMD module rollup generated UMD module
minified + gzipped kB 6.5 6.74

In exchange, users get the option to treeshake when using tools like rollup. If a user imports only map (from underscore-source.js), underscore's contribution to her UMD bundle decreases by 900 lines of code compared to the present situation. This is substantial, if not as spectacular as one might expect. This is due to rollup not being clever enought about side effects and erring on the side of caution.

As an experiment, I also created a completely modularized version where each function lives in a separate module (not published). This enabled rollup to shave off another 460 lines of code. Rollup also surprised me by pointing out circular dependencies, which it apparently cannot do if all code is in one module. I suspect that solving these circularities might cut map's size in half again, but I have not tried this.

. rollup treeshaken size of map in lines of code
present situation 1692
proposed changes 786
modularized (experimental, unpublished) 329

While the minified + gzipped UMD bundle weight has increased a bit, these numbers make me believe that using ES exports is definitely a move forward. In addition, solving circular dependencies might also decrease the weight again.

Update 2020-03-26: updated treeshake performance numbers can be found at the bottom of #2826 (comment).

Before merging

  • The obsolete level of indentation (and the comment at the top explaining it) still has to be removed.
  • The generated underscore.js should probably be commited. What do you think would be the right policy for this, @jashkenas? Update 2020-03-26: in the meanwhile I have proposed to never commit it and instead ensure that it is uploaded to NPM by enforcing the build in the prepare or prepublishOnly hook of the package.json.
  • Decide how to name underscore-{source,module,umd}.js. Perhaps modules/{index-unmixed,index,index-umd-wrapper}.js?

Edit 2020-03-7: Changed suggested names from modules/{underscore,index,umd}.js to modules/{index-unmixed,index,index-umd-wrapper}.js because the former might be problematic if we modularize the whole library in a subsequent PR.

Update 2020-03-27: slightly changed the above suggestion to modules/{index,index-all,index-default}.js, respectively. This suggestion was also implemented in a new commit that has been added to this PR.

Edit to add on 2020-06-18: You can support my work on Underscore and other open source projects on Patreon.

@jgonggrijp
Copy link
Collaborator Author

@jashkenas I'm done writing. I would be very grateful if you could find the time to review this. My apologies for the long opening post and the big diff (despite my best attempts to keep the diff minimal)!

@lohfu @mjeanroy FYI.

@jashkenas
Copy link
Owner

Wow! 👏👏👏.

Thanks for taking the time to actually make a stab at this — many people have talked a lot about it over the years, but it takes a rare soul to follow through.

I'm actually starting a month of parental leave next week, so should have a good chunk of time to look at this more closely soon. Please pester me if I don't get back to you soon enough...

@jgonggrijp
Copy link
Collaborator Author

jgonggrijp commented Feb 28, 2020

@jashkenas It's a pleasure. I love this library, thank you for making it. Good luck with parenting!

@jgonggrijp
Copy link
Collaborator Author

@jashkenas Any updates?

@jashkenas
Copy link
Owner

jashkenas commented Mar 6, 2020 via email

@jgonggrijp
Copy link
Collaborator Author

@jashkenas Here's another ping. My apologies if this is too soon. I hope you had a nice weekend and wedding.

I'd like to draw your attention to #2829 as well. If you approve of that one, I'd prefer it to be merged before this one. #2829 is a very small change.

On another note, would you appreciate some help with triaging open issues and PRs? Rather than spamming your notifications by responding to each of them individually, I could create a single issue ticket with a list of PRs than can be closed, a list of issues that can be closed, a list of PRs that require review, etcetera. If you like.

@jashkenas
Copy link
Owner

This is looking truly great. Thank you for taking the time to create such a solid draft of an ES-modularization. I really just have a few minor notes around the edges:

  • It would be nice if the UMD build of Underscore continued to support ES3 — there are a lot of folks who use Underscore, for example, in Adobe Illustrator and Photoshop under ExtendScript. It would be even better if we could find a way to assert that compatibility in the tests. If we change this in the future, and start using const, async and friends, it should be part of a bigger push to provide more asynchronous/promise-based functional logic in Underscore.

  • As you indicate, let's kill the extra level of indentation.

  • I'd prefer getting rid of the additional underscore-module.js file, and just including the _.name = function assignment in the main source file. It certainly is a neat trick though.

  • Naming: Since this change will become Underscore 2.0, I think it's fine for the main "underscore.js" to switch from being UMD formatted to a clean ES module. My preferred naming scheme would be:

    • underscore.js The main source ES module, which is also the module that should be directly imported when using ES modules. (No separate -module version.)
    • underscore-umd.js If it would be possible to eliminate this file by having Rollup read from underscore.js directly (now that it includes the correct default export), that would be great. Otherwise, we can leave this file in place.
    • dist/underscore-min.js Minified UMD
    • dist/underscore-min.js.map
  • It would be nice if the internal _names followed a consistent pattern (regardless of frequency of use), so: name for exports and _name for internal functions only.

  • It looks like it may be possible to have more current versions of Rollup avoid the use of getters by using the --no-externalLiveBindings option, which will make things IE8 compatible.

  • Overall — and even if it looks like it makes life easier for Rollup — I'd still strongly prefer to keep everything in a single underscore.js, ES module file. Perhaps there’s something we can do with respect to the code organization, or avoiding function hoisting, or circular dependencies, that can solve the treeshaken LOC problem in a different way.

Bravo! 👏

@jgonggrijp
Copy link
Collaborator Author

Thank you for taking the time to review this. I'm happy to hear it looks so good to you. This weekend, I'm going to try to address your concerns as well as I can. Some comments in advance:

  • It would be nice if the UMD build of Underscore continued to support ES3 — there are a lot of folks who use Underscore, for example, in Adobe Illustrator and Photoshop under ExtendScript. It would be even better if we could find a way to assert that compatibility in the tests.

This is a tall order, but I agree with the sentiment in principle and I will investigate the options.

If we change this in the future, and start using const, async and friends,

Please observe that this is by no means my intention, not even for a subsequent pull request. The current PR introduces no new syntax other than import/export. In my opinion, it is not necessary at this time to introduce other ES6+ syntax. In fact, not doing that has the advantage that no transpiler is needed.

it should be part of a bigger push to provide more asynchronous/promise-based functional logic in Underscore.

I'm not sure about this. I'm not sure more async belongs in underscore at all, because there is already a perfectly fine library available for that purpose. I'm not sure such an introduction would require modern async/await syntax, either, since you can work with promises without that syntax, too. And finally, the syntax used in the source code bears little relevance to the functionality available to the end user, since she will always see a version of the library that is encoded in old syntax, either as written by hand or transpiled down from modern syntax.

  • I'd prefer getting rid of the additional underscore-module.js file, and just including the _.name = function assignment in the main source file. It certainly is a neat trick though.

Why do you want to keep everything in one file?

The purpose of the module/source divide is not just avoiding repetition, although I think there is great value in DRY. It is also the only way to enable any treeshaking at all. I realize I didn't emphasize this before, but treeshaking only works when the user imports from the underscore-source, not when importing from the underscore-module. Treeshaking works when importing from the underscore-source, exactly because it doesn't contain the _.name = function assignments; those assignments constitute side effects and they force tools like rollup to always include all of the source.

I'll be frank: there is no point in accepting this pull request unless you resign to the idea of splitting the source into at least two modules.

However, I'm very certain that you have valid, well-thought reasons for wanting to keep everything together. I want to fully understand these reasons and I want to address them to the best of my ability. Do they perhaps include the wonderful annotated source edition that you can read through in one go? Name all of it, and I will fight dragons so we can keep as many of the advantages of a single module as possible.

  • Naming: Since this change will become Underscore 2.0,

Why will it? Isn't this mostly new functionality rather than a radical interface change?

I think it's fine for the main "underscore.js" to switch from being UMD formatted to a clean ES module. My preferred naming scheme would be:

    • underscore.js The main source ES module, which is also the module that should be directly imported when using ES modules. (No separate -module version.)

As I wrote above, this wish is impossible to grant while also having treeshaking. In addition to that, if you continue to call the UMD bundle underscore.js and use some other name for the main ESM entry point, nobody will need to change the path from which they import underscore, not even people who use ESM imports, because the module field of the package.json can transparently point them to the correct underlying file.

    • underscore-umd.js If it would be possible to eliminate this file by having Rollup read from underscore.js directly (now that it includes the correct default export), that would be great. Otherwise, we can leave this file in place.

Unfortunately, no, that file cannot be eliminated. The problem is not in the way the default export is arrived at (I could have done mixin(allExports) ; export { default } from 'underscore-source' instead), but in the way rollup treats files that contain both named exports and a default export. It insists on setting module.exports to a plain object in that case, which means that if people do

var _ = require('underscore');

, their _ is not anymore the OOP chain wrapper/partial placeholder/iteratee override holder it should be. For that, they would need to use

var _ = require('underscore').default;

instead. The only way to make the first scenario work, which I really think it should, is to offer rollup a module that has a default export and no other exports.

  • It would be nice if the internal _names followed a consistent pattern (regardless of frequency of use), so: name for exports and _name for internal functions only.

I do agree with this and I will do as you suggest.

  • It looks like it may be possible to have more current versions of Rollup avoid the use of getters by using the --no-externalLiveBindings option, which will make things IE8 compatible.

No, actually I tried that first, with rollup version 1.31.1. It still insisted on creating getters for some of the exports, probably because they were aliases of other exports. The only way to get rid of the getters entirely was the legacy option. This also indicates a potential problem with rollup; they don't seem very committed to supporting legacy systems.

I think there are three main options:

  1. Accept that we need to lag behind the rollup releases. It is only a (minor) problem for the developer, not for the user. The rollup version I used is only two or three years outdated, and apart from the getters and the use of Object.freeze (both of which are only lacking because of the legacy option), the generated code is identical to what recent versions of rollup generate.
  2. Drop support for IE8 (and probably ExtendScript). It hurts, but if you really intend to make this a 2.0, maybe it is an acceptable sacrifice. It would definitely help to shake some weight.
  3. Do a final transpilation+polyfill pass on the UMD bundle. This allows you to be as backwards-compatible as you want, but it comes at an ugly cost: additional layers of complexity in the tooling and potentially a lot of extra weight.
  • Overall — and even if it looks like it makes life easier for Rollup — I'd still strongly prefer to keep everything in a single underscore.js, ES module file. Perhaps there’s something we can do with respect to the code organization, or avoiding function hoisting, or circular dependencies, that can solve the treeshaken LOC problem in a different way.

Let me begin by repeating that I believe you have good reasons for this and that I want to address those reasons as well as I can. Please do explain those reasons to me in detail.

Rollup (and the hard problem of JavaScript treeshaking in general) is mainly sensitive to two things: side effects and dependency. Apart from that, the way the code is organized does not really affect its resuts. Whether one function goes before the other or the other way round does not matter, as long as neither depends on the other or writes to an object that the other might read from. Especially the question of side effects quickly becomes intractable as more common variables are involved. This is why treeshaking becomes more effective as the size of modules decreases: modules have isolated namespaces, so the finer the modularization, the less potential common state.

I truly believe that further modularization, up to the point of putting every function in a separate module, would be good for underscore. I didn't do it in this PR because I believe in evolutionary change and because I wanted to keep the review as digestible as possible. But I believe it should be the next step.

Apart from dramatically improving treeshaking, splitting out the source code into many small modules has a few other clear advantages:

  • As I said before, it enables rollup to report circular depencies.
  • It makes it possible to publish not only a monolithic UMD bundle, but also modularized AMD and CJS versions. In this way, even people who use require.js or browserify will have the opportunity to benefit from treeshaking, even also while using the same import paths as ESM users. The latter also means that one could write ESM code that uses underscore, transpile it to AMD, and then use dynamic treeshaking right from IE. It is just a matter of passing the preserveModules: true option to rollup, setting the browser field in the package.json for browserify users and setting a single path alias on the end of AMD users.
  • When creating a monolithic bundle, rollup will linearize the code in strict order of dependency, at least where the dependencies are noncyclic. Since there is some freedom in this, you can also guide the produced order to some extent by chosing the order of the imports in the index module. This could help to enhance readability, for example in generating the annotated source. Imagine a monolithic ESM module in strict order of dependency; it would be very clean.

By extension of the above, I believe that fine modularization goes before any attempt at solving circular dependencies.

Bravo! 👏

Thank you!

@jashkenas
Copy link
Owner

Thanks for the detailed response!

Please observe that this is by no means my intention, not even for a subsequent pull request. The current PR introduces no new syntax other than import/export. In my opinion, it is not necessary at this time to introduce other ES6+ syntax. In fact, not doing that has the advantage that no transpiler is needed.

👍

I'm not sure about this. I'm not sure more async belongs in underscore at all, because there is already a perfectly fine library available for that purpose.

👍

I think there are three main options:

  1. Accept that we need to lag behind the rollup releases.

👍 That sounds fine. I'm sorry to hear that the --no-externalLiveBindings doesn't work as described.

I'll be frank: there is no point in accepting this pull request unless you resign to the idea of splitting the source into at least two modules.

That’s fine, if it's necessary. If we end up with one file as the straight ES Module exports, one as the complete mixed API, and one file as the single-object UMD export — as you have it today — that's cool with me.

However, I'm very certain that you have valid, well-thought reasons for wanting to keep everything together. I want to fully understand these reasons and I want to address them to the best of my ability.

Simply because it's the soul of this (now venerable) project. Underscore has always tried — and has succeeded and failed over the years, as waves of pull requests have come and gone — to be a simply-written, single-file tour through some useful functional programming idioms in JavaScript. It’s an exercise that’s supposed to be as pedagogical as it is practical. There’s a balance between performance and organizational complexity on the one hand, and readability and source code simplicity on the other. Being a short, single-file library with no dependencies is at the heart of that identity. Also, Lodash already exists.

But I don't think that should be a dealbreaker — since, if we separate the -module or -source file as you suggest, it should make no difference to Rollup, or any other tool, if the source code happens to live in one file or many. It should be possible to syntactically determine where the seams lie between the functions, and achieve equal (or close to equal) results. And if they're not, it's either a hidden circular dependency (that we can fix), or a bug in Rollup (that can be fixed). Right?

If there are any other open questions you feel like I haven't addressed, please let me know.

@jgonggrijp
Copy link
Collaborator Author

If we end up with one file as the straight ES Module exports, one as the complete mixed API, and one file as the single-object UMD export — as you have it today — that's cool with me.

Great!

... reasons for wanting to keep everything together.

Simply because it's the soul of this (now venerable) project. Underscore has always tried — and has succeeded and failed over the years, as waves of pull requests have come and gone — to be a simply-written, single-file tour through some useful functional programming idioms in JavaScript. It’s an exercise that’s supposed to be as pedagogical as it is practical. There’s a balance between performance and organizational complexity on the one hand, and readability and source code simplicity on the other.

I know what you mean. Underscore is a very elegant library, venerable indeed. While I had read the docco rendering before, I became more intimately familiar with the code while preparing this PR and it only strengthened my opinion. It's like a good, well-paced story in which everything comes full circle at the end. With enlightening ingenuity.

Being a short, single-file library with no dependencies is at the heart of that identity. Also, Lodash already exists.

I full-heartedly agree about the succinctness and the self-sufficiency. As far as I'm concerned, those can and should stay. I also see the value of the "single read", but I think there might be other ways to address that than necessarily keeping all the primary source code in one file. Maybe it is enough if we generate a monolithic, well ordered ESM using rollup and then feed that into docco. To me, it seems like that might actually be the best of both worlds. Underscore could at the same time present a holistic story and also be a prime example of how to write elegant modular code.

As for Lodash. I'm a bit hurt by the remark. I don't believe that modularity is some kind of gliding scale that separates Lodash from Underscore, or even that it is the most significant difference between the libraries. As far as I'm concerned, Lodash is what it is for very different reasons. And I'm definitely not going to repeat that history. To begin with, I will not fork Underscore under any condition.

But I don't think that should be a dealbreaker — since, if we separate the -module or -source file as you suggest, it should make no difference to Rollup, or any other tool, if the source code happens to live in one file or many. It should be possible to syntactically determine where the seams lie between the functions, and achieve equal (or close to equal) results. And if they're not, it's either a hidden circular dependency (that we can fix), or a bug in Rollup (that can be fixed). Right?

Sorry, but no, wrong. Observe that there is a factor two treeshaking difference between the underscore-source from the current PR and the experimental, fully modularized version that I didn't publish, despite both containing the exact same code with the same circular dependencies and despite me using the exact same version of rollup in both cases.

As a matter of fact, it makes a great deal of difference whether the source code lives in one file or many. I can see why you don't want this to be true, ideologically speaking, but it is true nonetheless. Rollup is over its neck in the murky business of trying to statically track side effects in an impure, weakly typed dynamic language, while Underscore is a little paradise of functional, orthogonal style. Getting a factor two treeshaking improvement by changing Underscore is many orders of magnitude easier than getting that same improvement by changing Rollup. I know; I got a factor four treeshaking improvement in less than a week of editing Underscore (a factor two by splitting it in two and then another factor two by splitting it in 146), working all alone, while the Rollup team has been working on Rollup for years and they never managed such a dramatic treeshaking improvement on anything whatsoever.

I'm not just talking pragmatics. There are strong, fundamental reasons for this. In 2050, ES code distributed over many files will still be more effectively treeshaken than monolithic source code. I'll bet on that.

Besides that, hidden circular dependencies are uncovered by splitting everything in tiny modules and then having a tool tell you where the circularities are. That, or many hours of brain-melting static analysis by a human (assuming the human won't give up). I don't know about you, but I'm not going to repeat that effort every time we suspect there might be a circular dependency (which we might not always suspect). Given the choice, I'd rather have it modularized permanently and get an instant hint from a tool whenever somebody introduces a circularity.

If there are any other open questions you feel like I haven't addressed, please let me know.

Yes. I don't fully understand yet why you feel that modularity is at odds with the elegance of Underscore. So far I see no conflict, only harmony. What makes you feel different?

@jashkenas
Copy link
Owner

As I said, it’s an aesthetic preference that — to me — lies quite close to the heart of Underscore’s raison d'être, and is a matter of prioritization. To me, keeping Underscore a single file that can be copied, read through in a sitting, and used without any additional tooling, is very important, and a large part of the library’s original usefulness and popularity. Whether or not we support tree-shaking at all is much less important to me. (In fact, there are good arguments against encouraging it, and promoting the single, CDN-cached, 6.5k version of Underscore instead.)

But let's put that aside. When you're ready with this PR, let's take a few representative functions, and you can also do them in the modularized form — and I'll take a stab at trying to satisfy Rollup’s internal logic to see if I can come close to matching them with the single-source file. And if I can't approach your results, we'll discuss it again.

@jgonggrijp
Copy link
Collaborator Author

As I said, it’s an aesthetic preference that — to me — lies quite close to the heart of Underscore’s raison d'être, and is a matter of prioritization. To me, keeping Underscore a single file that can be copied, read through in a sitting, and used without any additional tooling, is very important, and a large part of the library’s original usefulness and popularity.

Alright. I can really relate to this. But this is not lost; the generated UMD bundle can be copied, read through in a single sitting and used without any additional tooling. Only the developer is seeing a change in this regard.

Whether or not we support tree-shaking at all is much less important to me. (In fact, there are good arguments against encouraging it, and promoting the single, CDN-cached, 6.5k version of Underscore instead.)

It may surprise you, but I fully agree with this. If you're using substantial portions of Underscore client side, a monolithic CDN embed is far better than a subset included in your own (probably already heavy) application bundle. Because of the caching, as you said, and also because requests to different hosts can run in parallel. I always use a mix of jsDelivr and cdnizer for this same reason.

On the other hand, there definitely are valid use cases for treeshaking. I can imagine a client side application that makes heavy use of _.partial but no other Underscore functions, or a server side application that needs to make do with very limited working memory.

But let's put that aside. When you're ready with this PR, let's take a few representative functions, and you can also do them in the modularized form — and I'll take a stab at trying to satisfy Rollup’s internal logic to see if I can come close to matching them with the single-source file. And if I can't approach your results, we'll discuss it again.

Fine by me. I'm about to push my finished work, stay tuned.

@jgonggrijp
Copy link
Collaborator Author

jgonggrijp commented Mar 13, 2020

@jashkenas I rebased on the latest master, ensuring to fix merge conflicts and include my own _.iteratee fix, added some more commits and then force-pushed. The changes since your last review:

  • Used _names only for internal names, as agreed. This necessitated aliasing the builtin isNaN and isFinite to _isNaN and _isFinite in the setup section. This change can be seen in isolation here: 6ae7bac.
  • Removed the indentation, as agreed.
  • Moved the function export aliases below the functions they alias. This is more consistent with the var export aliases and, more importantly, prevents rollup from dropping comments in some cases. Seen in isolation here: 8f4bbc6.
  • Renamed the modules. More on this below.
  • Added some logic that ensures that docco has a complete, ESM-based version of the source code to render as a single read, regardless of the number of underlying modules. I pushed a separate branch so you can preview the HTML that is produced: https://github.com/jgonggrijp/underscore/blob/esm-docco-committed/docs/underscore.html

In renaming the modules, I deviated a bit from the options we discussed before.

  • underscore-source.js became modules/index.js. This follows the convention that index.js is the module where you can find "everything". People who wish to treeshake will be importing from underscore/modules/index. If we further split this module in the future, that import path will continue to work because it will become the place where we collect all the functions.
  • underscore-umd.js became modules/index-default.js. This is still the entry point for the UMD bundle; people will not be specifying this path explicitly in their imports.
  • underscore-module became modules/index-all.js. This is now a module that re-exports the default from index-default plus all the named exports from index, hence all. People who use ESM will automatically get this module when they import ... from 'underscore'.

Please let me know whether you find this acceptable.

jgonggrijp added a commit to jgonggrijp/underscore that referenced this pull request Mar 13, 2020
@jashkenas
Copy link
Owner

No worries! I didn't take any offense — and I think that most of your game rules sound fine. I just didn't want to commit to a particular date without knowing when I would be able to look at this further.

Yesterday, they announced the closure of the borders here in Chile, and we’ve been scrambling around weighing the pros and cons of cancelled flights and shelter in place orders to try to get back home. Hopefully things will quiet down after we manage to make it back.

@jgonggrijp
Copy link
Collaborator Author

No problem, best of luck getting home!

@jgonggrijp
Copy link
Collaborator Author

@jashkenas I'm mostly asking this out of personal concern: how are the circumstances for you and your family?

@jashkenas
Copy link
Owner

We’re good! When they announced the impending border closure, and our Aeromexico flight was cancelled, we turned around, drove 6 hours back north to Santiago, and grabbed a flight (one of the last?) back home to go hunker down.

@jgonggrijp
Copy link
Collaborator Author

@jashkenas I'd really like to move on with this PR. Could you give me a status update?

@jashkenas
Copy link
Owner

I went back to work on Monday, and haven't had a chance to take a look at this again ... but I'll try to soon. Sorry for the slowness.

@jashkenas
Copy link
Owner

Alright! I took a good long crack at playing around with Rollup’s treeshaking this morning, and feel like a learned a lot about what kinds of patterns it can and can’t treeshake, how /*@__PURE__*/ doesn’t work, and the types of things that we do in basic Underscore that end up being included in the bundle if they're present in the imported module, regardless of use.

For example, here’s a pattern that Rollup can tree-shake away (it’s a reduced test case of our isArguments function):

function _toString() {}
export var isArguments = (function() {
  return _toString(arguments) === '[object Arguments]' ? 0 : 1;
}());

And here’s the version we need to use, that Rollup can’t:

function _toString() {}
export var isArguments = (function() {
  return _toString.call(arguments) === '[object Arguments]' ? 0 : 1;
}());

The difference in this case is .call().

Flowing through functions, this means that anything that depends on Object.prototype.toString.call(...) — which we use heavily to test types, can't be treeshaken if called at the top level, even if not exported directly.

Many of the other places where we call functions at the top level can't be treeshaken by Rollup, although some simpler patterns can.

So, what does that mean for this PR?

Ultimately, although I'd be happy to publish a version of Underscore that provides both ES Module and UMD flavors, I think that the module should be a monolithic one, given the small size of Underscore, its presence in the CDNjs cache, and the current state of treeshaking tooling. I'm still not interested in either chopping it up into tiny files, or in littering the codebase with /*@__PURE__*/ annotations (even if they worked, which they don’t) in order to satisfy Rollup’s current limitations.

That said, if you want to publish your fully modularized version as a different npm package, godspeed. I'd be happy to link to it from the top of the Underscore homepage. I hope that's not too disappointing.

@jgonggrijp
Copy link
Collaborator Author

Alright! I took a good long crack at playing around with Rollup’s treeshaking this morning, and feel like a learned a lot about what kinds of patterns it can and can’t treeshake, how /*@__PURE__*/ doesn’t work, and the types of things that we do in basic Underscore that end up being included in the bundle if they're present in the imported module, regardless of use.

You surprise me! I was prepared to wait a long time before you'd even announce a deadline for the treeshaking challenge, and now you've already completed the challenge. You did a really good job at it as well.

So, what does that mean for this PR?

Ultimately, although I'd be happy to publish a version of Underscore that provides both ES Module and UMD flavors, I think that the module should be a monolithic one, given the small size of Underscore, its presence in the CDNjs cache, and the current state of treeshaking tooling. I'm still not interested in either chopping it up into tiny files, or in littering the codebase with /*@__PURE__*/ annotations (even if they worked, which they don’t) in order to satisfy Rollup’s current limitations.

That said, if you want to publish your fully modularized version as a different npm package, godspeed. I'd be happy to link to it from the top of the Underscore homepage.

That's really nice of you, but no, I don't want to do that because it would amount to forking Underscore. I believe the world doesn't need yet another fork of Underscore. I only want Underscore to be modular if you are on board.

I hope that's not too disappointing.

If the current, semi-monolithic PR can still be merged, I'm happy enough. How about that?

And I'll try again later to interest you in chopping everything into tiny modules. ;-)

@jashkenas
Copy link
Owner

Okay, great! In that case I think that we’re almost there.

At least before I can merge it — and perhaps as a general policy — we need to provide a built version of underscore.js in the repo. For many years, there has been a link to test out the "Edge version" on the homepage: https://raw.github.com/jashkenas/underscore/master/underscore.js ... and it would be great if that could continue to work, as changes are made to the master branch.

So perhaps using Husky to define a pre-commit hook that makes sure that underscore.js is always kept up-to-date with modules/index.js ?

And then prepublish can handle the -min.js and -min.js.map variants, and the annotated source.

As to the actual roll-out, how would you like to handle it? I think this could merit a 2.0 release, with appropriate revisions to the homepage/documentation. Would you like to be added as a collaborator on this repo to help make that happen? I think you’ve clearly earned your stripes.

@jashkenas
Copy link
Owner

And as to the treeshaking — my experiments make me feel like (even with a single source file), it's tantalizingly close. Rollup is able to eliminate all of the obvious non-dependencies, in terms of function declarations at the top level.

It's only with our use of function calls at the top level, in order to define other functions, that it has problems, and even then, only some of the time.

Things like:

export var omit = restArguments(function(obj, _keys) {
  var iteratee = _keys[0], context;
  if (isFunction(iteratee)) {
    iteratee = negate(iteratee);
    if (_keys.length > 1) context = _keys[1];
  } else {
    _keys = map(_flatten(_keys, false, false), String);
    iteratee = function(value, key) {
      return !contains(_keys, key);
    };
  }
  return pick(obj, iteratee, context);
});

... because of the call to restArguments, even though that function obviously has no side effects.

I feel like there might be a more general pattern of dynamic function export creation we could find that would allow Rollup to treat each export in isolation ... but I wasn't able to discover it, despite my best efforts.

@jgonggrijp
Copy link
Collaborator Author

Okay, great! In that case I think that we’re almost there.

Hooray!

At least before I can merge it — and perhaps as a general policy — we need to provide a built version of underscore.js in the repo. For many years, there has been a link to test out the "Edge version" on the homepage: https://raw.github.com/jashkenas/underscore/master/underscore.js ... and it would be great if that could continue to work, as changes are made to the master branch.

So perhaps using Husky to define a pre-commit hook that makes sure that underscore.js is always kept up-to-date with modules/index.js ?

Sure, I'll look into it.

And then prepublish can handle the -min.js and -min.js.map variants, and the annotated source.

prepublish or prepublishOnly? I don't mind either, but in the first case, I'd prefer to call it prepare.

As to the actual roll-out, how would you like to handle it? I think this could merit a 2.0 release, with appropriate revisions to the homepage/documentation.

Regardless of the version number, I definitely think some additions to the documentation would be in order.

I really appreciate that you want to celebrate this with a 2.0 release, but I adhere to semver quite strictly and I think this should be 1.10. All of the old interface is still there, all of the backwards compatibility is still there and the source code still looks mostly the same. Users just get the new option to treeshake with rollup. I don't want to "shock" users with a 2.0 if it isn't actually disruptive.

If, in some hypothetical future, we chop everything into tiny modules, I think that might merit a 2.0 release though... ;-)

Would you like to be added as a collaborator on this repo to help make that happen? I think you’ve clearly earned your stripes.

That's an honour, thank you. I definitely want to help. If making me a collaborator helps to make me help, I'm all for it. :-)

As for your second comment about treeshaking: I feel your frustration! However, keep in mind that the size of _.map is already cut in half. That's already a pretty big win. There are probably functions that are treeshaken even more as well.

@jashkenas
Copy link
Owner

jashkenas commented Mar 29, 2020

prepublish or prepublishOnly?

Right, prepublishOnly is what I meant.

I really appreciate that you want to celebrate this with a 2.0 release, but I adhere to semver quite strictly and I think this should be 1.10.

Okay, 1.10 it is.

That's an honour, thank you. I definitely want to help. If making me a collaborator helps to make me help, I'm all for it. :-)

Done and done.

As for your second comment about treeshaking: I feel your frustration! However, keep in mind that the size of _.map is already cut in half. That's already a pretty big win. There are probably functions that are treeshaken even more as well.

From my understanding, not so much! Most of the half of the codebase that is currently included along with treeshaken map does not have anything to do with map directly (or indirectly!). It's just all of the top-level function calls that Rollup cannot determine are side-effect-free, and everything that those top-level function calls depend on. All of that same code will be included in every treeshaken individual export, no matter which one it is. And if we can restructure those top-level function calls to treeshake cleanly, conversely, all Underscore functions should export cleanly from the single file.

@jgonggrijp
Copy link
Collaborator Author

@jashkenas I think we're ready for the merge. I can now push the merge button by myself, but I'll give you a chance to review the new commits first.

Regarding the Husky hooks, I had to explicitly invoke git add inside the pre-commit hook to ensure that the UMD bundle would actually be committed. Apparently, doing this inside a pre-commit hook causes the autogenerated files to still appear in the diff and the index after git commit completes. To fix this, I had to add an explicit git reset in the post-commit hook as well.

How to continue from here? Merge the current PR first, then start a new branch in your repo to update the documentation?

Copy link
Owner

@jashkenas jashkenas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to continue from here? Merge the current PR first, then start a new branch in your repo to update the documentation?

That sounds like a good plan.

@jgonggrijp jgonggrijp merged commit c49312c into jashkenas:master Mar 29, 2020
@jgonggrijp jgonggrijp deleted the es-module branch March 29, 2020 15:57
@jgonggrijp
Copy link
Collaborator Author

Given your approval, I went ahead and pushed the merge button. Hope you don't mind.

This is a big milestone. Thanks so much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants