
Commit 1a25e90

srl295 authored and Trott committed
tools: support full-icu by default
Instead of an English-only icudt64l.dat in the repo,
we now have icudt64l.dat.bz2 with all locales.

- updated READMEs and docs
- shrinker now copies source, and compresses (bzip2) the ICU data file
- configure expects deps/icu-small to be full ICU with a full compressed data file

Fixes: #19214

Co-Authored-By: Richard Lau <riclau@uk.ibm.com>
Co-Authored-By: Jan Olaf Krems <jan.krems@gmail.com>
Co-Authored-By: James M Snell <jasnell@gmail.com>

PR-URL: #29522
Reviewed-By: Jan Krems <jan.krems@gmail.com>
Reviewed-By: Jiawen Geng <technicalcute@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
1 parent a71fb97 commit 1a25e90

11 files changed, +194 -135 lines changed

BUILDING.md

+29-18
@@ -35,21 +35,23 @@ file a new issue.
     * [Building Node.js](#building-nodejs-1)
   * [Android/Android-based devices (e.g. Firefox OS)](#androidandroid-based-devices-eg-firefox-os)
 * [`Intl` (ECMA-402) support](#intl-ecma-402-support)
-  * [Default: `small-icu` (English only) support](#default-small-icu-english-only-support)
   * [Build with full ICU support (all locales supported by ICU)](#build-with-full-icu-support-all-locales-supported-by-icu)
     * [Unix/macOS](#unixmacos)
     * [Windows](#windows-1)
-  * [Building without Intl support](#building-without-intl-support)
+  * [Trimmed: `small-icu` (English only) support](#trimmed-small-icu-english-only-support)
     * [Unix/macOS](#unixmacos-1)
     * [Windows](#windows-2)
-  * [Use existing installed ICU (Unix/macOS only)](#use-existing-installed-icu-unixmacOS-only)
-  * [Build with a specific ICU](#build-with-a-specific-icu)
+  * [Building without Intl support](#building-without-intl-support)
     * [Unix/macOS](#unixmacos-2)
     * [Windows](#windows-3)
+  * [Use existing installed ICU (Unix/macOS only)](#use-existing-installed-icu-unixmacOS-only)
+  * [Build with a specific ICU](#build-with-a-specific-icu)
+    * [Unix/macOS](#unixmacos-3)
+    * [Windows](#windows-4)
 * [Building Node.js with FIPS-compliant OpenSSL](#building-nodejs-with-fips-compliant-openssl)
 * [Building Node.js with external core modules](#building-nodejs-with-external-core-modules)
-  * [Unix/macOS](#unixmacos-3)
-  * [Windows](#windows-4)
+  * [Unix/macOS](#unixmacos-4)
+  * [Windows](#windows-5)
 * [Note for downstream distributors of Node.js](#note-for-downstream-distributors-of-nodejs)
 
 ## Supported platforms
@@ -598,31 +600,40 @@ $ make
 ## `Intl` (ECMA-402) support
 
 [Intl](https://github.com/nodejs/node/blob/master/doc/api/intl.md) support is
-enabled by default, with English data only.
+enabled by default.
 
-### Default: `small-icu` (English only) support
+### Build with full ICU support (all locales supported by ICU)
 
-By default, only English data is included, but
-the full `Intl` (ECMA-402) APIs. It does not need to download
-any dependencies to function. You can add full
-data at runtime.
+This is the default option.
 
-### Build with full ICU support (all locales supported by ICU)
+#### Unix/macOS
 
-With the `--download=all`, this may download ICU if you don't have an
-ICU in `deps/icu`. (The embedded `small-icu` included in the default
-Node.js source does not include all locales.)
+```console
+$ ./configure --with-intl=full-icu
+```
+
+#### Windows
+
+```console
+> .\vcbuild full-icu
+```
+
+### Trimmed: `small-icu` (English only) support
+
+In this configuration, only English data is included, but
+the full `Intl` (ECMA-402) APIs. It does not need to download
+any dependencies to function. You can add full data at runtime.
 
 #### Unix/macOS
 
 ```console
-$ ./configure --with-intl=full-icu --download=all
+$ ./configure --with-intl=small-icu
 ```
 
 #### Windows
 
 ```console
-> .\vcbuild full-icu download-all
+> .\vcbuild small-icu
 ```
 
 ### Building without Intl support

configure.py

+61-26
@@ -11,6 +11,8 @@
 import shlex
 import subprocess
 import shutil
+import bz2
+
 from distutils.spawn import find_executable as which
 
 # If not run from node/, cd to node/.
@@ -409,7 +411,7 @@
 intl_optgroup.add_option('--with-intl',
     action='store',
     dest='with_intl',
-    default='small-icu',
+    default='full-icu',
     choices=valid_intl_modes,
     help='Intl mode (valid choices: {0}) [default: %default]'.format(
         ', '.join(valid_intl_modes)))
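
Review note: the one-line change above flips what `--with-intl` resolves to when the flag is omitted. The following is a minimal, standalone sketch of that behavior using `optparse`. The helper name `parse_intl_mode` is hypothetical, and the contents of `valid_intl_modes` are an assumption based on the four modes listed in doc/api/intl.md; the real tuple is defined elsewhere in configure.py and is not shown in this diff.

```python
import optparse

# Assumption: the four modes named in doc/api/intl.md.
valid_intl_modes = ('none', 'small-icu', 'full-icu', 'system-icu')


def parse_intl_mode(argv):
    """Return the Intl mode selected by argv (hypothetical helper)."""
    parser = optparse.OptionParser()
    parser.add_option('--with-intl',
                      action='store',
                      dest='with_intl',
                      default='full-icu',  # was 'small-icu' before this commit
                      choices=valid_intl_modes,
                      help='Intl mode (valid choices: {0}) [default: %default]'.format(
                          ', '.join(valid_intl_modes)))
    options, _ = parser.parse_args(argv)
    return options.with_intl


if __name__ == '__main__':
    print(parse_intl_mode([]))
    print(parse_intl_mode(['--with-intl=small-icu']))
```

With no arguments the sketch reports `full-icu`; an explicit `--with-intl=small-icu` still selects the trimmed build.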
@@ -1399,38 +1401,35 @@ def write_config(data, name):
   icu_parent_path = 'deps'
 
   # The full path to the ICU source directory. Should not include './'.
-  icu_full_path = 'deps/icu'
+  icu_deps_path = 'deps/icu'
+  icu_full_path = icu_deps_path
 
   # icu-tmp is used to download and unpack the ICU tarball.
   icu_tmp_path = os.path.join(icu_parent_path, 'icu-tmp')
 
   # canned ICU. see tools/icu/README.md to update.
   canned_icu_dir = 'deps/icu-small'
 
+  # use the README to verify what the canned ICU is
+  canned_is_full = os.path.isfile(os.path.join(canned_icu_dir, 'README-FULL-ICU.txt'))
+  canned_is_small = os.path.isfile(os.path.join(canned_icu_dir, 'README-SMALL-ICU.txt'))
+  if canned_is_small:
+    warn('Ignoring %s - in-repo small icu is no longer supported.' % canned_icu_dir)
+
   # We can use 'deps/icu-small' - pre-canned ICU *iff*
-  # - with_intl == small-icu (the default!)
-  # - with_icu_locales == 'root,en' (the default!)
-  # - deps/icu-small exists!
+  # - canned_is_full AND
   # - with_icu_source is unset (i.e. no other ICU was specified)
-  # (Note that this is the *DEFAULT CASE*.)
   #
   # This is *roughly* equivalent to
-  # $ configure --with-intl=small-icu --with-icu-source=deps/icu-small
+  # $ configure --with-intl=full-icu --with-icu-source=deps/icu-small
   # .. Except that we avoid copying icu-small over to deps/icu.
   # In this default case, deps/icu is ignored, although make clean will
   # still harmlessly remove deps/icu.
 
-  # are we using default locales?
-  using_default_locales = ( options.with_icu_locales == icu_default_locales )
-
-  # make sure the canned ICU really exists
-  canned_icu_available = os.path.isdir(canned_icu_dir)
-
-  if (o['variables']['icu_small'] == b(True)) and using_default_locales and (not with_icu_source) and canned_icu_available:
+  if (not with_icu_source) and canned_is_full:
     # OK- we can use the canned ICU.
-    icu_config['variables']['icu_small_canned'] = 1
     icu_full_path = canned_icu_dir
-
+    icu_config['variables']['icu_full_canned'] = 1
   # --with-icu-source processing
   # now, check that they didn't pass --with-icu-source=deps/icu
   elif with_icu_source and os.path.abspath(icu_full_path) == os.path.abspath(with_icu_source):
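
Review note: the hunk above replaces three preconditions with a single marker-file check. The sketch below isolates that decision as a pure function so it can be exercised outside configure.py. The function name `choose_icu_path`, its parameters, and its return shape are hypothetical; the marker file names and the `deps/icu` fallback come from the diff. It deliberately omits the later `--with-icu-source` processing.

```python
import os
import sys


def choose_icu_path(canned_icu_dir, with_icu_source=None, icu_deps_path='deps/icu'):
    """Return (icu_full_path, uses_canned_full_icu). Hypothetical helper."""
    # The README file name identifies which flavor of ICU is checked in.
    canned_is_full = os.path.isfile(
        os.path.join(canned_icu_dir, 'README-FULL-ICU.txt'))
    canned_is_small = os.path.isfile(
        os.path.join(canned_icu_dir, 'README-SMALL-ICU.txt'))
    if canned_is_small:
        # Stand-in for configure.py's warn(); an old small-icu tree is ignored.
        sys.stderr.write('Ignoring %s - in-repo small icu is no longer '
                         'supported.\n' % canned_icu_dir)
    if (not with_icu_source) and canned_is_full:
        # Use the canned ICU in place, without copying it to deps/icu.
        return canned_icu_dir, True
    return icu_deps_path, False
```

For example, `choose_icu_path('deps/icu-small')` selects the canned directory only when `README-FULL-ICU.txt` is present there and no explicit ICU source was given.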
@@ -1508,29 +1507,40 @@ def write_config(data, name):
   icu_endianness = sys.byteorder[0]
   o['variables']['icu_ver_major'] = icu_ver_major
   o['variables']['icu_endianness'] = icu_endianness
-  icu_data_file_l = 'icudt%s%s.dat' % (icu_ver_major, 'l')
+  icu_data_file_l = 'icudt%s%s.dat' % (icu_ver_major, 'l') # LE filename
   icu_data_file = 'icudt%s%s.dat' % (icu_ver_major, icu_endianness)
   # relative to configure
   icu_data_path = os.path.join(icu_full_path,
                                'source/data/in',
-                               icu_data_file_l)
+                               icu_data_file_l) # LE
+  compressed_data = '%s.bz2' % (icu_data_path)
+  if not os.path.isfile(icu_data_path) and os.path.isfile(compressed_data):
+    # unpack. deps/icu is a temporary path
+    if os.path.isdir(icu_tmp_path):
+      shutil.rmtree(icu_tmp_path)
+    os.mkdir(icu_tmp_path)
+    icu_data_path = os.path.join(icu_tmp_path, icu_data_file_l)
+    with open(icu_data_path, 'wb') as outf:
+      with bz2.BZ2File(compressed_data, 'rb') as inf:
+        shutil.copyfileobj(inf, outf)
+    # Now, proceed..
+
   # relative to dep..
-  icu_data_in = os.path.join('..','..', icu_full_path, 'source/data/in', icu_data_file_l)
+  icu_data_in = os.path.join('..','..', icu_data_path)
   if not os.path.isfile(icu_data_path) and icu_endianness != 'l':
     # use host endianness
     icu_data_path = os.path.join(icu_full_path,
                                  'source/data/in',
-                                 icu_data_file)
-    # relative to dep..
-    icu_data_in = os.path.join('..', icu_full_path, 'source/data/in',
-                               icu_data_file)
-  # this is the input '.dat' file to use .. icudt*.dat
-  # may be little-endian if from a icu-project.org tarball
-  o['variables']['icu_data_in'] = icu_data_in
+                                 icu_data_file) # will be generated
   if not os.path.isfile(icu_data_path):
     # .. and we're not about to build it from .gyp!
     error('''ICU prebuilt data file %s does not exist.
     See the README.md.''' % icu_data_path)
+
+  # this is the input '.dat' file to use .. icudt*.dat
+  # may be little-endian if from a icu-project.org tarball
+  o['variables']['icu_data_in'] = icu_data_in
+
   # map from variable name to subdirs
   icu_src = {
     'stubdata': 'stubdata',
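
Review note: the unpack step added in this hunk can be tried in isolation. The sketch below wraps the same standard-library calls (`bz2.BZ2File`, `shutil.copyfileobj`, `shutil.rmtree`, `os.mkdir`) in a function. The name `unpack_icu_data` and its signature are hypothetical, and deriving the output name by stripping the `.bz2` suffix is a simplification; configure.py builds the name from `icu_data_file_l` instead.

```python
import bz2
import os
import shutil


def unpack_icu_data(compressed_data, icu_tmp_path):
    """Decompress a .bz2 ICU data file into a fresh temporary directory.

    Hypothetical helper; returns the path of the unpacked .dat file.
    """
    # Start from an empty temporary directory, as the hunk does.
    if os.path.isdir(icu_tmp_path):
        shutil.rmtree(icu_tmp_path)
    os.mkdir(icu_tmp_path)

    # Simplification: output name is the input name without '.bz2'.
    data_file = os.path.basename(compressed_data)
    if data_file.endswith('.bz2'):
        data_file = data_file[:-len('.bz2')]
    icu_data_path = os.path.join(icu_tmp_path, data_file)

    # Stream the decompressed bytes so the whole file is never held in memory.
    with open(icu_data_path, 'wb') as outf:
        with bz2.BZ2File(compressed_data, 'rb') as inf:
            shutil.copyfileobj(inf, outf)
    return icu_data_path
```

Streaming through `shutil.copyfileobj` matters here because the unpacked data file is considerably larger than the 9.33 MB compressed file added in this commit.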
@@ -1547,6 +1557,31 @@ def write_config(data, name):
     var = 'icu_src_%s' % i
     path = '../../%s/source/%s' % (icu_full_path, icu_src[i])
     icu_config['variables'][var] = glob_to_var('tools/icu', path, 'patches/%s/source/%s' % (icu_ver_major, icu_src[i]) )
+  # calculate platform-specific genccode args
+  # print("platform %s, flavor %s" % (sys.platform, flavor))
+  # if sys.platform == 'darwin':
+  # shlib_suffix = '%s.dylib'
+  # elif sys.platform.startswith('aix'):
+  # shlib_suffix = '%s.a'
+  # else:
+  # shlib_suffix = 'so.%s'
+  if flavor == 'win':
+    icu_config['variables']['icu_asm_ext'] = 'obj'
+    icu_config['variables']['icu_asm_opts'] = [ '-o ' ]
+  elif with_intl == 'small-icu' or options.cross_compiling:
+    icu_config['variables']['icu_asm_ext'] = 'c'
+    icu_config['variables']['icu_asm_opts'] = []
+  elif flavor == 'mac':
+    icu_config['variables']['icu_asm_ext'] = 'S'
+    icu_config['variables']['icu_asm_opts'] = [ '-a', 'gcc-darwin' ]
+  elif sys.platform.startswith('aix'):
+    icu_config['variables']['icu_asm_ext'] = 'S'
+    icu_config['variables']['icu_asm_opts'] = [ '-a', 'xlc' ]
+  else:
+    # assume GCC-compatible asm is OK
+    icu_config['variables']['icu_asm_ext'] = 'S'
+    icu_config['variables']['icu_asm_opts'] = [ '-a', 'gcc' ]
+
   # write updated icu_config.gypi with a bunch of paths
   write(icu_config_name, do_not_edit +
         pprint.pformat(icu_config, indent=2) + '\n')
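
Review note: the branch order in the hunk above is significant, since `win` is checked first and the `small-icu`/cross-compiling case takes precedence over the macOS and AIX cases. The sketch below restates the same table as a pure function, which makes the precedence easy to check. The function name `genccode_asm_settings` and its parameters are hypothetical; the returned extensions and option lists are copied from the diff.

```python
def genccode_asm_settings(flavor, with_intl, cross_compiling, platform):
    """Return (icu_asm_ext, icu_asm_opts) for the given build. Hypothetical helper.

    `platform` stands in for sys.platform so the function stays testable.
    """
    if flavor == 'win':
        return 'obj', ['-o ']
    if with_intl == 'small-icu' or cross_compiling:
        # Emit the data as C source instead of assembly.
        return 'c', []
    if flavor == 'mac':
        return 'S', ['-a', 'gcc-darwin']
    if platform.startswith('aix'):
        return 'S', ['-a', 'xlc']
    # Assume GCC-compatible assembly is acceptable everywhere else.
    return 'S', ['-a', 'gcc']
```

For instance, a macOS build with `--with-intl=small-icu` falls into the C-source branch rather than the `gcc-darwin` one, because that check comes first.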

deps/icu-small/README-FULL-ICU.txt

+8
@@ -0,0 +1,8 @@
+ICU sources - auto generated by shrink-icu-src.py
+
+This directory contains the ICU subset used by --with-intl=full-icu
+It is a strict subset of ICU 64 source files with the following exception(s):
+* deps/icu-small/source/data/in/icudt64l.dat.bz2 : compressed data file
+
+
+To rebuild this directory, see ../../tools/icu/README.md

deps/icu-small/README-SMALL-ICU.txt

-8
This file was deleted.
-2.85 MB
Binary file not shown.
9.33 MB
Binary file not shown.

doc/api/intl.md

+11-15
@@ -23,11 +23,9 @@ programs. Some of them are:
 * [`RegExp` Unicode Property Escapes][]
 
 Node.js (and its underlying V8 engine) uses [ICU][] to implement these features
-in native C/C++ code. However, some of them require a very large ICU data file
-in order to support all locales of the world. Because it is expected that most
-Node.js users will make use of only a small portion of ICU functionality, only
-a subset of the full ICU data set is provided by Node.js by default. Several
-options are provided for customizing and expanding the ICU data set either when
+in native C/C++ code. The full ICU data set is provided by Node.js by default.
+However, due to the size of the ICU data file, several
+options are provided for customizing the ICU data set either when
 building or running Node.js.
 
 ## Options for building Node.js
@@ -38,8 +36,8 @@ in [BUILDING.md][].
 
 * `--with-intl=none`/`--without-intl`
 * `--with-intl=system-icu`
-* `--with-intl=small-icu` (default)
-* `--with-intl=full-icu`
+* `--with-intl=small-icu`
+* `--with-intl=full-icu` (default)
 
 An overview of available Node.js and JavaScript features for each `configure`
 option:
@@ -66,8 +64,8 @@ operation is identical to that of `Date.prototype.toString()`.
 
 ### Disable all internationalization features (`none`)
 
-If this option is chosen, most internationalization features mentioned above
-will be **unavailable** in the resulting `node` binary.
+If this option is chosen, ICU is disabled and most internationalization
+features mentioned above will be **unavailable** in the resulting `node` binary.
 
 ### Build with a pre-installed ICU (`system-icu`)
 
@@ -106,9 +104,7 @@ console.log(spanish.format(january));
 // Should print "enero"
 ```
 
-This mode provides a good balance between features and binary size, and it is
-the default behavior if no `--with-intl` flag is passed. The official binaries
-are also built in this mode.
+This mode provides a balance between features and binary size.
 
 #### Providing ICU data at runtime
 
@@ -149,8 +145,9 @@ enable full `Intl` support.
 
 This option makes the resulting binary link against ICU statically and include
 a full set of ICU data. A binary created this way has no further external
-dependencies and supports all locales, but might be rather large. See
-[BUILDING.md][BUILDING.md#full-icu] on how to compile a binary using this mode.
+dependencies and supports all locales, but might be rather large. This is
+the default behavior if no `--with-intl` flag is passed. The official binaries
+are also built in this mode.
 
 ## Detecting internationalization support
 
@@ -205,7 +202,6 @@ to be helpful:
 [`String.prototype.toUpperCase()`]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toUpperCase
 [`require('buffer').transcode()`]: buffer.html#buffer_buffer_transcode_source_fromenc_toenc
 [`require('util').TextDecoder`]: util.html#util_class_util_textdecoder
-[BUILDING.md#full-icu]: https://github.com/nodejs/node/blob/master/BUILDING.md#build-with-full-icu-support-all-locales-supported-by-icu
 [BUILDING.md]: https://github.com/nodejs/node/blob/master/BUILDING.md
 [ECMA-262]: https://tc39.github.io/ecma262/
 [ECMA-402]: https://tc39.github.io/ecma402/

doc/api/util.md

+20-22
@@ -932,26 +932,9 @@ Per the [WHATWG Encoding Standard][], the encodings supported by the
 one or more aliases may be used.
 
 Different Node.js build configurations support different sets of encodings.
-While a very basic set of encodings is supported even on Node.js builds without
-ICU enabled, support for some encodings is provided only when Node.js is built
-with ICU and using the full ICU data (see [Internationalization][]).
+(see [Internationalization][])
 
-#### Encodings Supported Without ICU
-
-| Encoding | Aliases |
-| ----------- | --------------------------------- |
-| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
-| `'utf-16le'` | `'utf-16'` |
-
-#### Encodings Supported by Default (With ICU)
-
-| Encoding | Aliases |
-| ----------- | --------------------------------- |
-| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
-| `'utf-16le'` | `'utf-16'` |
-| `'utf-16be'` | |
-
-#### Encodings Requiring Full ICU Data
+#### Encodings Supported by Default (With Full ICU Data)
 
 | Encoding | Aliases |
 | ----------------- | -------------------------------- |
@@ -990,6 +973,21 @@ with ICU and using the full ICU data (see [Internationalization][]).
 | `'shift_jis'` | `'csshiftjis'`, `'ms932'`, `'ms_kanji'`, `'shift-jis'`, `'sjis'`, `'windows-31j'`, `'x-sjis'` |
 | `'euc-kr'` | `'cseuckr'`, `'csksc56011987'`, `'iso-ir-149'`, `'korean'`, `'ks_c_5601-1987'`, `'ks_c_5601-1989'`, `'ksc5601'`, `'ksc_5601'`, `'windows-949'` |
 
+#### Encodings Supported when Node.js is built with the `small-icu` option
+
+| Encoding | Aliases |
+| ----------- | --------------------------------- |
+| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
+| `'utf-16le'` | `'utf-16'` |
+| `'utf-16be'` | |
+
+#### Encodings Supported when ICU is disabled
+
+| Encoding | Aliases |
+| ----------- | --------------------------------- |
+| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
+| `'utf-16le'` | `'utf-16'` |
+
 The `'iso-8859-16'` encoding listed in the [WHATWG Encoding Standard][]
 is not supported.
 
@@ -1005,9 +1003,9 @@ changes:
 * `encoding` {string} Identifies the `encoding` that this `TextDecoder` instance
   supports. **Default:** `'utf-8'`.
 * `options` {Object}
-  * `fatal` {boolean} `true` if decoding failures are fatal. This option is only
-    supported when ICU is enabled (see [Internationalization][]). **Default:**
-    `false`.
+  * `fatal` {boolean} `true` if decoding failures are fatal.
+    This option is not supported when ICU is disabled
+    (see [Internationalization][]). **Default:** `false`.
   * `ignoreBOM` {boolean} When `true`, the `TextDecoder` will include the byte
     order mark in the decoded result. When `false`, the byte order mark will
     be removed from the output. This option is only used when `encoding` is
