Numbers: support compact mode #715

sieverk · 2017-04-27T00:34:50Z

WIP addressing #406 that I wanted to solicit some feedback on. I'm trying to add support for short/long compact numbers (long/short patterns already included in decimalFormats in CLDR). My proposed API is the following:

.numberFormatter({ compact: "short" })( 14000000 )
// > "14M"

.numberFormatter({ compact: "long" })( 14000000 )
// > "14 million"

The compact option plays nicely with the other options with one small asterisk: the long pattern can look a bit goofy with percents. ("14 million%") And currency/unit should be able to make use of this options in their number formatters as well.

Because compact patterns can only be determined when the number is in hand, we have to parse/apply that pattern at format-time. And since we can't determine which plural form applies until we have the rounded, decimal-shifted number, we have to check the pattern twice (the first time, figure out the decimal-shift divisor based on the "other" plural format string; the second time, use the rounded number to get the plural form and select the correct pattern for that plural). If I've missed some better way of doing this, I'm open to suggestions.

Anyway, wanted to solicit feedback on this approach--thanks for looking!

jsf-clabot · 2017-04-27T00:34:52Z

All committers have signed the CLA.

rxaviers · 2017-04-27T00:52:52Z

Hi @sieverk, it's awesome to see your WIP PR. It's on my TODO list to review and provide you feedback about your proposal. Thanks so far and sorry for the delay on my answer.

jbellenger · 2017-05-04T18:25:57Z

@rxaviers We tried deploying @sieverk's branch at twitter and saw that some short decimal formats break the number formatter.

For example, trying to format the number 222.604 in hu ends up executing this in src/number/format.js:

compactPattern = "000 E";  // hu cldr format
compactProperties = compactPattern.match( numberPatternRe );   // returns null

The cause appears to be in numberPatternRe:

/^(('([^']|'')*'|[^*#@0,.E])*)(\*.)?((([#,]*[0,]*0+)(\.0*[0-9]*#*)?)|([#,]*@+#*))(E\+?0+)?(('[^']+'|''|[^*#@0,.E])*)$/

Specifically, the last E in the pattern prevents matching for at least a couple of hu number formats. I'll try adding some fallback behavior for when pattern matching returns null, but there may be a latent issue here with number pattern matching.

rxaviers · 2017-05-04T18:48:25Z

E is a special character in the number pattern defined by CLDR.

It seems the compact number pattern defined by CLDR is slightly different in a way that the number pattern must be extracted from it.

The links I provided should contain the definition to allow that to be done correctly. If you run into any troubles please just let me know

rxaviers · 2017-05-24T08:55:48Z

test/unit/number/format.js

@@ -40,6 +40,14 @@ zh = new Cldr( "zh-u-nu-native" );

 QUnit.module( "Number Format" );

+function oneOrOtherPluralGenerator( plural ) {


Alternatively, we could call this enPluralGenerator :)

I see you're reusing it for es too, please ignore my comment.

rxaviers · 2017-05-24T11:33:52Z

src/number/format.js

+			prefix += compactProperties[ 1 ];
+			suffix = compactProperties[ 11 ] + suffix;
+		}
+	}


We need to handle cases bigger than maximum available scale provided by CLDR.

Could you elaborate on this? I'm afraid I don't quite understand what "scale" means in this context.

Sure, basically my concern is what happens when you try to format a number bigger (in orders of magnitude) than 100000000000000-count-other": "000 trillion?

Ahh, I see what you mean. We'll have to detect when no pattern is available for the number of digits and just return the raw number, I guess. Ideally, we'd be able to use scientific notation as a fallback when there is support for that.

Like commented below, it would be ideal to use the biggest available instead of ignoring compact mode for big numbers.

rxaviers · 2017-05-24T12:13:01Z

src/number/format.js

+			rounded = parseInt(numberFormatIntegerFractionDigits( number, minimumIntegerDigits,
+				minimumFractionDigits, maximumFractionDigits, round, roundIncrement ));
+			pluralForm = pluralGenerator ? pluralGenerator(rounded) : "other";
+			compactPattern = compactMap[ "1" + zeroes + "-count-" + pluralForm ] || compactPattern;


Please, what does || compactPattern prevent? A missing pluralForm? Let's add a comment explaining. In theory, that should never be needed, i.e., CLDR has the necessary forms for the locale; but it's better to be safe and not throw on runtime. 👍

It guards against CLDR languages (Japanese, for example, but there are several) that only specify patterns for the "other" plural. In that case, checking compactMap for a "one" pluralForm would give us undefined. Hence the fallback to compactPattern.

Yeap, but in theory that should never happen, for example the Japanese example, pluralGenerator should never return one. Also, CLDR has all the necessary forms for a locale. For example, for cases like Arabic where there are various plural forms, compactMap should contain all of them.

rxaviers · 2017-05-24T12:49:02Z

src/number/format.js

+
+			// calculate rounded value to determine plural form and extract correct pattern
+			rounded = parseInt(numberFormatIntegerFractionDigits( number, minimumIntegerDigits,
+				minimumFractionDigits, maximumFractionDigits, round, roundIncrement ));


You can drop this line and move the below code after numberFormatSignificantDigits and numberFormatIntegerFractionDigits calls, like this:

diff --git a/src/number/format.js b/src/number/format.js index 85ed8fbe..5ca2cda2 100644 --- a/src/number/format.js +++ b/src/number/format.js @@ -68,26 +68,15 @@ return function( number, properties, pluralGenerator ) { number *= 1000; } - var compactPattern, compactDigits, compactProperties, divisor, pluralForm, rounded; + var compactPattern, compactDigits, compactProperties, divisor, pluralForm, zeroes; if ( compactMap ) { - var zeroes = Array(Math.floor(number).toString().length).join( "0" ); + zeroes = Array(Math.floor(number).toString().length).join( "0" ); if ( zeroes.length >= 3 ) { // use default plural form to perform initial decimal shift compactPattern = compactMap[ "1" + zeroes + "-count-other" ]; compactDigits = compactPattern.split( "0" ).length - 1; divisor = zeroes.length - ( compactDigits - 1 ); number = number / Math.pow( 10, divisor ); - - // calculate rounded value to determine plural form and extract correct pattern - rounded = parseInt(numberFormatIntegerFractionDigits( number, minimumIntegerDigits, - minimumFractionDigits, maximumFractionDigits, round, roundIncrement )); - pluralForm = pluralGenerator ? pluralGenerator(rounded) : "other"; - compactPattern = compactMap[ "1" + zeroes + "-count-" + pluralForm ] || compactPattern; - compactProperties = compactPattern.match( numberPatternRe ); - - // update prefix/suffix with compact prefix/suffix - prefix += compactProperties[ 1 ]; - suffix = compactProperties[ 11 ] + suffix; } } @@ -102,6 +91,16 @@ return function( number, properties, pluralGenerator ) { minimumFractionDigits, maximumFractionDigits, round, roundIncrement ); } + if ( compactMap && zeroes.length >= 3 ) { + pluralForm = pluralGenerator ? pluralGenerator(+number) : "other"; + compactPattern = compactMap[ "1" + zeroes + "-count-" + pluralForm ] || compactPattern; + compactProperties = compactPattern.match( numberPatternRe ); + + // update prefix/suffix with compact prefix/suffix + prefix += compactProperties[ 1 ]; + suffix = compactProperties[ 11 ] + suffix; + } + // Remove the possible number minus sign number = number.replace( /^-/, "" );

Such change actually fixed this:

diff --git a/test/unit/number/format.js b/test/unit/number/format.js index ae8f9825..cd7520e0 100644 --- a/test/unit/number/format.js +++ b/test/unit/number/format.js @@ -120,7 +120,7 @@ QUnit.test( "decimals should format in compact mode", function( assert ) { assert.equal( format( 1273, properties( "#0.#", en, { compact: "long" } ) ), "1.3 thousand" ); assert.equal( format( 1273000, properties( "#0.#", es, { compact: "long" - } ), oneOrOtherPluralGenerator ), "1,3 millón" ); + } ), oneOrOtherPluralGenerator ), "1,3 millones" );

Ah, nice suggestion! Lets us take full advantage of sig/decimal figure options.

rxaviers · 2017-05-24T12:54:06Z

test/unit/number/format.js

+QUnit.test( "percents should format in compact mode", function( assert ) {
+	assert.equal( format( 127, properties( "#0%", en, { compact: "short" } ) ), "13K%" );
+	assert.equal( format( 127, properties( "#0%", en, { compact: "long" } ) ), "13 thousand%" );
+});


Overall, let's pass oneOrOtherPluralGenerator. It's weird to see 1.3 million with wrong pluralization 😄. We can add one specific test to assert the missing pluralGenerator and a missing plural form data to check both preventions you added in the code.

Another tests I'd like to see are: (a) round tests, (b) significant digits tests.

Example for b:

+ assert.equal( format( 1273500, properties( "@@@", en, { compact: "long" } ) ), "1.27 million" );

Added some round and significant digits tests -- let me know if you think there could be better coverage.

Not sure what you mean about the oneOrOtherPluralGenerator. I'm only passing it when using es because it's the only language we use in the tests that has plural forms. We could pass in all of the tests, I suppose, but that will make things a bit verbose.

Sure, what I mean is this:

enPluralGenerator = function(n) { var s = String(n).split('.'), v0 = !s[1]; return (n == 1 && v0) ? 'one' : 'other'; } esPluralGenerator = function(n) { return (n == 1) ? 'one' : 'other'; }

By the way, my original message was confusing wrt that.

Sure, I can make that change. It shouldn't affect English, though. The plural/single case is identical.

True, thanks

rxaviers · 2017-05-24T12:55:00Z

@sieverk this is looking great, good job so far... I've left a couple of comments.

rxaviers · 2017-05-24T13:05:57Z

Something I like about the API is that significant digits can be used very conveniently to control the display digits.

var fmt = .numberFormatter({
  minimumSignificantDigits: 1,
  maximumSignificantDigits: 2,
 compact: "short"
});
fmt( 1273500 ); // > 1.3 million
fmt( 12735000 ); // > 13 million
fmt( 127350000 ); // > 130 million

fmt = .numberFormatter({
  minimumSignificantDigits: 1,
  maximumSignificantDigits: 3,
  compact: "short"
});
fmt( 1273500 ); // > 1.27 million
fmt( 12735000 ); // > 12.7 million
fmt( 127350000 ); // > 127 million

rxaviers · 2017-05-24T13:10:42Z

Something that seems NOT possible is to lock the scale, for example at thousands, but CLDR doesn't mention that possibility. By the way, the approximation in use here seems to me very in line with CLDR. I believe that on cases where user needs more control over the scale and unit, they can use formatUnit and do the math.

rxaviers · 2017-05-29T21:27:28Z

@sieverk please could you sign the CLA? Thanks

SlexAxton · 2017-06-20T22:33:18Z

I'm also pretty interested in getting this in. Let me know if I can help.

sieverk · 2017-06-20T23:18:40Z

Sorry about the delayed response -- got swept up in a few other things this past month. Thanks for the feedback! I'll try to set some time aside this week to address questions / make some revisions / drop a few extra notes. Went ahead and got that CLA signed.

…act logic for numbers in the 1000s

…nding logic, added tests

rxaviers · 2017-06-22T14:46:28Z

src/number/compact-pattern-re.js

+ */
+return ( /^(('([^']|'')*'|[^*#@0,.E])*)(\*.)?((([#,]*[0,]*0+)(\.0*[0-9]*#*)?)|([#,]*@+#*))(E\+?0+)?(.*?)$/ );
+
+});


The diff between this and the src/number/pattern-re.js is:

--- compact-pattern-re.js 2017-06-22 11:34:01.000000000 -0300 +++ pattern-re.js 2017-03-13 11:49:00.000000000 -0300 @@ -26,7 +26,7 @@ * * suffix = non_number_stuff * - * non_number_stuff = regexp(.*?) + * non_number_stuff = regexp(('[^']+'|''|[^*#@0,.E])*) * * * Regexp groups: @@ -45,6 +45,6 @@ * 11: suffix * 12: - */ -return ( /^(('([^']|'')*'|[^*#@0,.E])*)(\*.)?((([#,]*[0,]*0+)(\.0*[0-9]*#*)?)|([#,]*@+#*))(E\+?0+)?(.*?)$/ ); +return ( /^(('([^']|'')*'|[^*#@0,.E])*)(\*.)?((([#,]*[0,]*0+)(\.0*[0-9]*#*)?)|([#,]*@+#*))(E\+?0+)?(('[^']+'|''|[^*#@0,.E])*)$/ ); });

Can we update src/number/pattern-re.js instead and reuse that?

The definition seems a valid number pattern re.

rxaviers · 2017-06-22T14:47:46Z

src/number/format.js

 	var infinitySymbol, maximumFractionDigits, maximumSignificantDigits, minimumFractionDigits,
 	minimumIntegerDigits, minimumSignificantDigits, nanSymbol, nuDigitsMap, padding, prefix,
 	primaryGroupingSize, pattern, ret, round, roundIncrement, secondaryGroupingSize, suffix,
-	symbolMap;
+	symbolMap, compactMap;


Nitpick: please sort the variables, therefore compactMap would be the first one. ☺️

D'oh, that's what I get for renaming too quickly.

rxaviers · 2017-06-22T14:49:56Z

Note to self: Review additional tests.

alex-stripe · 2017-07-18T08:30:29Z

Is there anything I can do to get this over the finish line? I'm really looking forward to using it.

rxaviers · 2017-07-18T17:12:42Z

Is there anything I can do to get this over the finish line? I'm really looking forward to using it.

Hi @alex-stripe, I think this is close to complete. I have to review the tests and I see one comment above not yet answered. @sieverk please any thoughts?

@alex-stripe do you use globalize-webpack-plugin or the precompilation on your side? If so, a great help would be getting those projects in good shape for this change. I believe very few will change there. Therefore, it would be mostly about adding tests to assert this works fine.

SlexAxton · 2017-07-18T19:00:00Z

(Sorry I keep switching accounts)

I don't use either, but I'll see if I can add some tests there in the next day or so just to keep things smooth.

rxaviers · 2017-07-18T19:29:57Z

Awesome! That will certainly make things smooth and quicker to release. Thanks

sieverk · 2017-07-18T20:40:41Z

Yeah, I think as far as code changes, there's that issue and handling long numbers (that exceed defined patterns). Both should be pretty easy. Will see if I can set aside an hour or so today.

…pattern-re

rxaviers · 2017-07-18T21:55:29Z

Awesome! I see updates and I believe it's entirely on my side now :). Will review in a couple of days. Thanks!

rxaviers · 2017-07-21T13:52:36Z

test/unit/number/format.js

+
+QUnit.test( "unsupported numbers should apply no compacting in compact mode", function( assert ) {
+	assert.equal( format( 0.01, properties( "#0.##", en, { compact: "short" } ) ), "0.01" )
+	assert.equal( format( 1234000000000000, properties( "#0", en, { compact: "short" } ) ), "1234000000000000" )


Instead of ignoring compact mode for the big case, can it use the biggest available, e.g., "000 trillion" => "1,234 trillion"?

Sure, I think that's probably more in line with what the user would want.

rxaviers · 2017-07-21T14:03:00Z

src/number/format.js

+		if ( zeroes.length >= 3 ) {
+
+			// use default plural form to perform initial decimal shift
+			compactPattern = compactMap[ "1" + zeroes + "-count-other" ];


At this point (the format execution itself), compactMap data structure should be the most convenient possible for use here (prepared by format-properties stage), e.g.:

compactMap: { 3: { one: "0 thousand", other: "0 thousand" }, 4: {one, other}, ... 15: {one, other} }

That would eliminate the need of creating the zeros array and would simplify the key above.

Ah, yes, shifting the work into compactMap makes a lot of sense.

rxaviers · 2017-07-21T14:13:02Z

src/number/format.js

+				// If no pattern, original number should remain uncompacted.
+				if ( compactPattern === "0" ) {
+					number = originalNumber;
+				}


Could this if block be moved up like:

if ( compactPattern === "0" ) { number = originalNumber; // Could you do this and avoid entering the other if block below // (the one after numbers are processed) compactPattern = null; } else if ( compactPattern ) { ... }

Yep. As you did in your branch, don't even need to apply the division / restore the number if arranged that way.

rxaviers · 2017-07-21T14:13:52Z

src/number/format.js

+			prefix += compactProperties[ 1 ];
+			suffix = compactProperties[ 11 ] + suffix;
+		}
+	}


Like commented below, it would be ideal to use the biggest available instead of ignoring compact mode for big numbers.

rxaviers · 2017-07-21T14:28:38Z

src/number/format.js

+
+		// Some languages specify no pattern for certain digit lengths, represented as "0".
+		// Only apply compact pattern if one is specified.
+		if ( compactPattern !== "0" ) {


Above, could you simply set compactPattern = null in this case and avoid this if completely?

rxaviers · 2017-07-21T15:04:59Z

test/unit/number/format.js

+QUnit.test( "percents should format in compact mode", function( assert ) {
+	assert.equal( format( 127, properties( "#0%", en, { compact: "short" } ) ), "13K%" );
+	assert.equal( format( 127, properties( "#0%", en, { compact: "long" } ) ), "13 thousand%" );
+});


True, thanks

rxaviers

Needed changes...

rxaviers

Needed changes...

rxaviers · 2017-07-21T21:15:07Z

I've added a couple of comments requesting changes, but I have simultaneously implemented them in here: https://github.com/rxaviers/globalize/tree/compact-mode

It needs another review round...

sieverk · 2017-07-26T17:37:00Z

Looked over your changes -- they look pretty good to me, other than some minor questions. Did you want a review over there?

rxaviers · 2017-07-26T17:38:22Z

Thanks @sieverk and sure your review is welcome. Let me create a PR.

rxaviers · 2017-07-26T17:40:05Z

Here we go... #759

alex-stripe · 2017-08-22T06:46:19Z

Seems like all the failures are commit message related.

Can we just rebase it to something valid and merge?

rxaviers · 2017-08-22T13:06:44Z

Continued in #759

rxaviers · 2017-08-22T13:15:52Z

@alex-stripe thanks for the push... It would be great if you could help by updating the examples and add a compact case if you have time. Thanks

rxaviers · 2017-08-22T13:16:18Z

By the way, this is merged via #759...

SlexAxton · 2017-08-22T14:08:38Z

Oh awesome. I'll whip up some examples asap.

…

On Tue, Aug 22, 2017 at 8:16 AM Rafael Xavier de Souza < ***@***.***> wrote: By the way, this is merged via #759 <#759>... — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#715 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAF5KoH535txWdceCoI-3I6FqOL-zLEzks5satSngaJpZM4NJli1> .

sieverk added 3 commits April 25, 2017 13:05

Docs: update with proposed compact API

b57ed30

Number: add support for compact option (short/long)

c989220

Docs: update main README example for compact numbers

e561d65

rxaviers reviewed May 24, 2017

View reviewed changes

sieverk added 8 commits June 21, 2017 14:28

support hungarian compact mode

bb3c0e8

support default/no pattern for languages that don't have special comp…

44c146b

…act logic for numbers in the 1000s

add missing file

6713968

merge latest master

7e5002f

Pull request feedback: shifted compact formatting logic after sig/rou…

6d44193

…nding logic, added tests

lint

355fa63

more lint

ed3986f

fix misnaming of numberCompact

fa27040

rxaviers reviewed Jun 22, 2017

View reviewed changes

gracefully handle unsupported numbers, merge compact-pattern-re into …

3789bfb

…pattern-re

rxaviers reviewed Jul 21, 2017

View reviewed changes

rxaviers requested changes Jul 21, 2017

View reviewed changes

rxaviers closed this Aug 22, 2017

		@@ -40,6 +40,14 @@ zh = new Cldr( "zh-u-nu-native" );

		QUnit.module( "Number Format" );

		function oneOrOtherPluralGenerator( plural ) {

Numbers: support compact mode #715

Numbers: support compact mode #715

Conversation

sieverk commented Apr 27, 2017

jsf-clabot commented Apr 27, 2017 • edited Loading

rxaviers commented Apr 27, 2017

jbellenger commented May 4, 2017

rxaviers commented May 4, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rxaviers Jun 22, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rxaviers May 24, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rxaviers commented May 24, 2017

rxaviers commented May 24, 2017 • edited Loading

rxaviers commented May 24, 2017 • edited Loading

rxaviers commented May 29, 2017

SlexAxton commented Jun 20, 2017

sieverk commented Jun 20, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rxaviers commented Jun 22, 2017 • edited Loading

alex-stripe commented Jul 18, 2017

rxaviers commented Jul 18, 2017

SlexAxton commented Jul 18, 2017

rxaviers commented Jul 18, 2017

sieverk commented Jul 18, 2017

rxaviers commented Jul 18, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rxaviers left a comment

Choose a reason for hiding this comment

rxaviers left a comment

Choose a reason for hiding this comment

rxaviers commented Jul 21, 2017 • edited Loading

sieverk commented Jul 26, 2017

rxaviers commented Jul 26, 2017

rxaviers commented Jul 26, 2017

alex-stripe commented Aug 22, 2017

rxaviers commented Aug 22, 2017

rxaviers commented Aug 22, 2017

rxaviers commented Aug 22, 2017

SlexAxton commented Aug 22, 2017 via email

jsf-clabot commented Apr 27, 2017 •

edited

Loading

rxaviers Jun 22, 2017 •

edited

Loading

rxaviers May 24, 2017 •

edited

Loading

rxaviers commented May 24, 2017 •

edited

Loading

rxaviers commented May 24, 2017 •

edited

Loading

rxaviers commented Jun 22, 2017 •

edited

Loading

rxaviers commented Jul 21, 2017 •

edited

Loading