Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

Intl: Automatically download ICU if requested. Reduce size. Keep --with-intl=none the default. #8719

Closed
wants to merge 13 commits into from

Conversation

srl295
Copy link
Member

@srl295 srl295 commented Nov 13, 2014

Background: see joyent#7676 of course

  • decide which variant is on by default
  • make it on by default
  • test more failure cases of downloading
  • add md5 sum
  • test more platforms

@srl295
Copy link
Member Author

srl295 commented Nov 13, 2014

seems to work on windows!

ECMA-402 (Intl) needs an ICU in deps\icu
Downloading ICU:
 http://download.icu-project.org/files/icu4c/54.1/icu4c-54_1-src.zip
/ 26.3MB total, 26.3MB downloaded
Downloaded icu4c-54_1-src.zip
Should have md5sum 6b89d60e2f0e140898ae4d7f72323bca
TODO: SKIPPING md5sum check, assuming OK!
Extracting ICU source zip: deps\icu4c-54_1-src.zip
creating  icu_config.gypi

@srl295
Copy link
Member Author

srl295 commented Nov 13, 2014

did some error tests - https://gist.github.com/srl295/d806b1e359dda274b7cc

@srl295 srl295 changed the title (IN PROGRESS) Automatically download ICU, eventually make --with-intl on by default Automatically download ICU, eventually make --with-intl on by default Nov 13, 2014
# append to existing version (UA)
version = '%s (node.js/configure)' % urllib.URLopener.version
def icu_download(path):
# download ICU, if needed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking out loud here: is this the right place to do the download? should it at least be in a separate utility file called by configure or even not called automatically at all and have configure complain at you if you haven't downloaded it? Seems slightly code-smelly to have a download taking place in here, but perhaps there are precedents already set inside configure?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvagg didn't see any precedents here. It certainly could be factored out into a separate utility or at least imported function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvagg @trevnorris I guess I could add import tools.downloader to configure and then create <node>/tools/downloader.py right? or maybe import downloader from tools?

Glad to do this. Just don't know what the tradeoffs are time wise as far as v0.12's runway.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what "standard protocol" is for downloading external resources. V8 tells you to run make dependencies if you try to just make w/o having downloaded the icu bundle. Maybe something similar to that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvagg I split out a module file tools/configure.d/nodedownload.py ( kept general in case there's more factoring done on configure ? ) - let me know how that looks.

@trevnorris seems like if it is on by default we might as well auto download, it seems the most generally user-friendly. You could also include a copy of icu .zip in node's .tgz which would save folks a step.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tjfontaine Thoughts on how we want to proceed and what solution we want to use?

@srl295 srl295 changed the title Automatically download ICU, eventually make --with-intl on by default Automatically download ICU, enable Intl --with-intl=small-icuby default Nov 13, 2014
@srl295 srl295 changed the title Automatically download ICU, enable Intl --with-intl=small-icuby default Automatically download ICU Nov 17, 2014
@MylesBorins
Copy link

Not a huge addition to this conversation, but I managed to get a build of node with Intl support working off of this branch without issue on OSX 10.9.5

@piscisaureus
Copy link

Is it not possible to put the relevant parts of ICU in the git repository?
I may be the old guy here but downloading stuff as a build step, meh.

@srl295
Copy link
Member Author

srl295 commented Nov 19, 2014

@piscisaureus #7676 (comment) does just that (but doesn't use it yet - soon).

@srl295
Copy link
Member Author

srl295 commented Nov 19, 2014

OK! Note two commits. 5875492 is the tooling, the other is ICU itself.

@srl295 srl295 changed the title Automatically download ICU Make --with-intl=small-icu the default. Embed small ICU. Download if need be. Nov 19, 2014
@MylesBorins
Copy link

Just finished testing the latest cut, works as expected!

@trevnorris
Copy link

Had a discussion w/ @tjfontaine. Here are some key points:

  • Binary releases located on the download page will have --with-intl=small-icu by default.
  • The download page will also contain compressed .dat file (along w/ convenient README about how to use it) so full-icu can be side-loaded.
  • Building from source will continue to use --with-intl=none by default. Meaning, no icu source will be included in the git repo.
  • If icu is included in the build, then the necessary files will be downloaded automatically (if needed).

@srl295 You have one additional thing to include I believe.

@srl295 srl295 changed the title Make --with-intl=small-icu the default. Embed small ICU. Download if need be. Intl: Automatically download ICU if requested. Reduce size. Keep --with-intl=none the default. Nov 27, 2014
@srl295
Copy link
Member Author

srl295 commented Nov 27, 2014

@trevnorris I think this PR is complete as to the above list. I filed srl295/node#10 and srl295/node#11 to capture two other ideas- are they needed/possible in the v0.12 timeframe? I will look at them separately unless they are ship stoppers for #8719

@srl295 srl295 force-pushed the srl-v0.12-autoicu branch from 83994d7 to dc2100e Compare December 9, 2014 00:19
@srl295
Copy link
Member Author

srl295 commented Dec 9, 2014

@trevnorris added a test case in dc2100e

@misterdjules
Copy link

@srl295 Just to make sure we don't miss it, I created in your own fork a PR that fixes some cross-compiling issues. I'm not sure it's the proper way to fix it though, as I'm not very familiar with gyp.

@srl295 srl295 force-pushed the srl-v0.12-autoicu branch from 7c3ecfa to 6f6d337 Compare December 9, 2014 20:21
@srl295
Copy link
Member Author

srl295 commented Dec 9, 2014

@misterdjules I will be looking at srl295/node#12 very shortly. Trying to finish the first pending issue first

@srl295 srl295 force-pushed the srl-v0.12-autoicu branch from 5a94979 to 4f14994 Compare December 9, 2014 23:33
srl295 and others added 3 commits December 17, 2014 17:33
Respond to review comments from @misterdjules:

* Remove the prepare-icu-source.sh mechanism for creating
  an embedded deps/icu-small as per
  7cb4aad - it was not
  currently used and thus deadwood.

* Updated README.md to clarify some of the options:
  that 'none' is the default for --with-intl, and that
  --with-icu-source works with 'small-icu'

* clarified the test-intl.js test case

* cleaned up and added documentation in configure and in nodedownload.py
Instead of downloading ICU if missing from `deps/icu`, just print a warning
unless either `--download=all` or `--download=icu` is set.

Add some generic scaffolding in case other modules end up in the same boat
in the future.

It's harmless to pass `--download=xyzzy`, so, future proof
(at least until some xyzzy dependency is integrated).
build: i18n: Don't auto-download ICU unless --download=all

Thanks @trevnorris for taking a look.
```

The `small-icu` mode builds
with English-only data. You can add full data at runtime.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it actually supported? I thought it was not currently supported, but that we planned to fix it later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's supported, and already landed in v0.12. That's why small icu links
insuch an odd way. If the "icu dir" option/env var is given, node_i18n.cc
doesn't call setcommondata and thus overriding happens (as icu is actually
linked against the stub data ).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, sounds good and thanks for the clarification!

@misterdjules
Copy link

Besides my two latest comments, LGTM.

Also, as discussed with @srl295 on IRC, I would like to land this PR sooner rather than later, so that we can have all ICU-related PRs in joyent/node instead of in @srl295's fork. Thus, we'd be able to assign them to the appropriate milestones and have a better idea of what needs to be done and when.

@misterdjules misterdjules added this to the 0.11.15 milestone Dec 19, 2014
@misterdjules
Copy link

@tjfontaine @trevnorris This LGTM, and I think we can leave #8914 for later. What's your opinion?

@trevnorris
Copy link

@misterdjules Agreed. Also, LGTM.

@piscisaureus
Copy link

-1

I am no fan of automatic downloading things as part of a build step (and much less so at runtime).
I know node-gyp does it too; it pains me every time I see it happen.

The other thing about the --with-icu build switch is that it pushes the core team's indecisiveness onto the user. If the core team can't decide, then how are users and distro maintainers supposed to make a good call? Now we'll see some linux distros ship node with full ICU support, and windows/mac users who use the official installers will (presumably) get only english or nothing at all. This fragmentation creates a situation that hurts all users, because people that don't care about ICU may end up with longer compile times and/or a bigger download, and people that do need to have a fallback for node builds that have only a limited or no ICU support.

Would it not be possible to only support loading ICU data files, so ICU itself could be distributed on NPM?
Users that need i18n support would simply do Intl = require('intl') and call it a day.

Or am I missing something here?

@misterdjules misterdjules mentioned this pull request Dec 24, 2014
@srl295
Copy link
Member Author

srl295 commented Dec 24, 2014

Bert, thanks for writing here. A couple of things.

This change actually does not automatically download by default, only doing
so if you specify the correct option that requires downloading, or if you
specify a URL.

Secondly, I'm Glad to be on the hook to work with other packagers in how
they package node and v8.

Thirdly, for the record i'd be glad to see "full icu" as the default
everywhere. However, As the trail of discussion on #7676 and related shows,
there has been over most of a year much discussion about the "sticker
shock" involved to the repo, download, installed size in that regard.

Fourthly as to "icu via npm" I would just note that this is a v8 feature
(for two years) and my work is only to enable it. It's not a separate
module, whether it is separable or not I do not know.

I'd be glad to discuss more as I'm able. Thanks again for the feedback.

@misterdjules
Copy link

@srl295 I'm taking some days off from now until January 4th, so I won't be able to do any work or even comment on this PR for a while.

@trevnorris @chrisdickinson @tjfontaine @cjihrig @orangemocha Could you please make sure that it's taken care of while I'm away?

Thank you!

@srl295
Copy link
Member Author

srl295 commented Dec 24, 2014

I'm off also but monitoring this thread.

assert.equal(new Intl.NumberFormat(['en']).format(12345.67890), '12,345.679');

var collOpts = { sensitivity: 'base', ignorePunctuation: true };
var coll = new Intl.Collator(['en'], collOpts);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whitespace here is botched. Nothing huge. Just need to remember to fix it before it lands.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. From your other comment, I wont try to fix this, but let it get fixed
when it lands.

@trevnorris
Copy link

Only have a minor nit that should be taken care of by whoever flattens and lands this PR. LGTM.

@tjfontaine Any comments?

@srl295 Ping back by Friday if there's no more activity and I'll get this landed.

@srl295
Copy link
Member Author

srl295 commented Dec 31, 2014

So basically wait all year? :-( :-)

Thanks. Happy new year and I'll ping Friday.

@trevnorris
Copy link

@srl295 Heh, didn't even take the holiday into consideration. Sorry.

@srl295
Copy link
Member Author

srl295 commented Jan 2, 2015

ping @trevnorris - happy new year!

@trevnorris
Copy link

@srl295 Thanks. Been testing it out and works great. One question on usability. Do you think make clean should cleanup the icu data?

With this change, `make distclean` will
delete these items, which aren't in the source repo:
* `icu_config.gypi`
* `deps/icu` - the ICU source tree
* `deps/icu-tmp` - the temporary download/unpack work area
* `deps/icu4c*.tgz deps/icu4c*.zip` - ICU source tarballs
@srl295
Copy link
Member Author

srl295 commented Jan 2, 2015

@trevnorris 9828792 causes make distclean to cleanup deps/icu* and icu_config.gypi

trevnorris pushed a commit that referenced this pull request Jan 3, 2015
Make "--with-intl=none" the default and add "intl-none" option to
vcbuild.bat.

If icu data is missing print a warning unless either --download=all or
--download=icu is set. If set then automatically download, verify (MD5)
and unpack the ICU data if not already available.

There's a "list" of URLs being used, but right now only the first is
picked up. The logic works something like this:

* If there is no directory deps/icu,
  * If no zip file (currently icu4c-54_1-src.zip),
    * Download zip file (icu-project.org -> sf.net)
  * Verify the MD5 sum of the zipfile
    * If bad, print error and exit
  * Unpack the zipfile into deps/icu
* If deps/icu now exists, use it, else fail with help text

Add the configuration option "--with-icu-source=..."

Usage:
  * --with-icu-source=/path/to/my/other/icu
  * --with-icu-source=/path/to/icu54.zip
  * --with-icu-source=/path/to/icu54.tgz
  * --with-icu-source=http://example.com/icu54.tar.bz2

Add the configuration option "--with-icu-locals=...".  Allows choosing
which locales are used in the "small-icu" case.

Example:
    configure --with-intl=small-icu --with-icu-locales=tlh,grc,nl

(Also note that as of this writing, neither Klingon nor Ancient Greek
are in upstream CLDR data. Serving suggestion only.)

Don't use hard coded ../../out paths on windows. This was suggested by
@misterdjules as it causes test failures.  With this fix, "out" is no
longer created on windows and the following can run properly:

    python tools/test.py simple

Reduce space by about 1MB with ICU 54 (over without this patch). Also
trims a few other source files, but only conditional on the exact ICU
version used. This is to future-proof - a file that is unneeded now may
be needed in future ICUs.

Also:
  * Update distclean to remove icu related files
  * Refactor some code into tools/configure.d/nodedownload.py
  * Update docs
  * Add test

PR-URL: #8719
Fixes: #7676 (comment)
[trev.norris@gmail.com small change to test's whitespace and logic]
Signed-off-by: Trevor Norris <trev.norris@gmail.com>
@trevnorris
Copy link

Thanks much. Squashed and landed in a308395.

@trevnorris trevnorris closed this Jan 3, 2015
@srl295 srl295 deleted the srl-v0.12-autoicu branch January 3, 2015 01:40
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants