Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numbers: support compact mode #715

Closed
wants to merge 12 commits into from

Conversation

sieverk
Copy link
Contributor

@sieverk sieverk commented Apr 27, 2017

WIP addressing #406 that I wanted to solicit some feedback on. I'm trying to add support for short/long compact numbers (long/short patterns already included in decimalFormats in CLDR). My proposed API is the following:

.numberFormatter({ compact: "short" })( 14000000 )
// > "14M"

.numberFormatter({ compact: "long" })( 14000000 )
// > "14 million"

The compact option plays nicely with the other options with one small asterisk: the long pattern can look a bit goofy with percents. ("14 million%") And currency/unit should be able to make use of this options in their number formatters as well.

Because compact patterns can only be determined when the number is in hand, we have to parse/apply that pattern at format-time. And since we can't determine which plural form applies until we have the rounded, decimal-shifted number, we have to check the pattern twice (the first time, figure out the decimal-shift divisor based on the "other" plural format string; the second time, use the rounded number to get the plural form and select the correct pattern for that plural). If I've missed some better way of doing this, I'm open to suggestions.

Anyway, wanted to solicit feedback on this approach--thanks for looking!

@jsf-clabot
Copy link

jsf-clabot commented Apr 27, 2017

CLA assistant check
All committers have signed the CLA.

@rxaviers
Copy link
Member

Hi @sieverk, it's awesome to see your WIP PR. It's on my TODO list to review and provide you feedback about your proposal. Thanks so far and sorry for the delay on my answer.

@jbellenger
Copy link

@rxaviers We tried deploying @sieverk's branch at twitter and saw that some short decimal formats break the number formatter.

For example, trying to format the number 222.604 in hu ends up executing this in src/number/format.js:

compactPattern = "000 E";  // hu cldr format
compactProperties = compactPattern.match( numberPatternRe );   // returns null

The cause appears to be in numberPatternRe:

/^(('([^']|'')*'|[^*#@0,.E])*)(\*.)?((([#,]*[0,]*0+)(\.0*[0-9]*#*)?)|([#,]*@+#*))(E\+?0+)?(('[^']+'|''|[^*#@0,.E])*)$/

Specifically, the last E in the pattern prevents matching for at least a couple of hu number formats. I'll try adding some fallback behavior for when pattern matching returns null, but there may be a latent issue here with number pattern matching.

@rxaviers
Copy link
Member

rxaviers commented May 4, 2017

E is a special character in the number pattern defined by CLDR.

It seems the compact number pattern defined by CLDR is slightly different in a way that the number pattern must be extracted from it.

The links I provided should contain the definition to allow that to be done correctly. If you run into any troubles please just let me know

@@ -40,6 +40,14 @@ zh = new Cldr( "zh-u-nu-native" );

QUnit.module( "Number Format" );

function oneOrOtherPluralGenerator( plural ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could call this enPluralGenerator :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you're reusing it for es too, please ignore my comment.

prefix += compactProperties[ 1 ];
suffix = compactProperties[ 11 ] + suffix;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to handle cases bigger than maximum available scale provided by CLDR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on this? I'm afraid I don't quite understand what "scale" means in this context.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, basically my concern is what happens when you try to format a number bigger (in orders of magnitude) than 100000000000000-count-other": "000 trillion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I see what you mean. We'll have to detect when no pattern is available for the number of digits and just return the raw number, I guess. Ideally, we'd be able to use scientific notation as a fallback when there is support for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like commented below, it would be ideal to use the biggest available instead of ignoring compact mode for big numbers.

rounded = parseInt(numberFormatIntegerFractionDigits( number, minimumIntegerDigits,
minimumFractionDigits, maximumFractionDigits, round, roundIncrement ));
pluralForm = pluralGenerator ? pluralGenerator(rounded) : "other";
compactPattern = compactMap[ "1" + zeroes + "-count-" + pluralForm ] || compactPattern;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, what does || compactPattern prevent? A missing pluralForm? Let's add a comment explaining. In theory, that should never be needed, i.e., CLDR has the necessary forms for the locale; but it's better to be safe and not throw on runtime. 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It guards against CLDR languages (Japanese, for example, but there are several) that only specify patterns for the "other" plural. In that case, checking compactMap for a "one" pluralForm would give us undefined. Hence the fallback to compactPattern.

Copy link
Member

@rxaviers rxaviers Jun 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, but in theory that should never happen, for example the Japanese example, pluralGenerator should never return one. Also, CLDR has all the necessary forms for a locale. For example, for cases like Arabic where there are various plural forms, compactMap should contain all of them.


// calculate rounded value to determine plural form and extract correct pattern
rounded = parseInt(numberFormatIntegerFractionDigits( number, minimumIntegerDigits,
minimumFractionDigits, maximumFractionDigits, round, roundIncrement ));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can drop this line and move the below code after numberFormatSignificantDigits and numberFormatIntegerFractionDigits calls, like this:

diff --git a/src/number/format.js b/src/number/format.js
index 85ed8fbe..5ca2cda2 100644
--- a/src/number/format.js
+++ b/src/number/format.js
@@ -68,26 +68,15 @@ return function( number, properties, pluralGenerator ) {
 		number *= 1000;
 	}
 
-	var compactPattern, compactDigits, compactProperties, divisor, pluralForm, rounded;
+	var compactPattern, compactDigits, compactProperties, divisor, pluralForm, zeroes;
 	if ( compactMap ) {
-		var zeroes = Array(Math.floor(number).toString().length).join( "0" );
+		zeroes = Array(Math.floor(number).toString().length).join( "0" );
 		if ( zeroes.length >= 3 ) {
 			// use default plural form to perform initial decimal shift
 			compactPattern = compactMap[ "1" + zeroes + "-count-other" ];
 			compactDigits = compactPattern.split( "0" ).length - 1;
 			divisor = zeroes.length - ( compactDigits - 1 );
 			number = number / Math.pow( 10, divisor );
-
-			// calculate rounded value to determine plural form and extract correct pattern
-			rounded = parseInt(numberFormatIntegerFractionDigits( number, minimumIntegerDigits,
-				minimumFractionDigits, maximumFractionDigits, round, roundIncrement ));
-			pluralForm = pluralGenerator ? pluralGenerator(rounded) : "other";
-			compactPattern = compactMap[ "1" + zeroes + "-count-" + pluralForm ] || compactPattern;
-			compactProperties = compactPattern.match( numberPatternRe );
-
-			// update prefix/suffix with compact prefix/suffix
-			prefix += compactProperties[ 1 ];
-			suffix = compactProperties[ 11 ] + suffix;
 		}
 	}
 
@@ -102,6 +91,16 @@ return function( number, properties, pluralGenerator ) {
 			minimumFractionDigits, maximumFractionDigits, round, roundIncrement );
 	}
 
+	if ( compactMap && zeroes.length >= 3 ) {
+		pluralForm = pluralGenerator ? pluralGenerator(+number) : "other";
+		compactPattern = compactMap[ "1" + zeroes + "-count-" + pluralForm ] || compactPattern;
+		compactProperties = compactPattern.match( numberPatternRe );
+
+		// update prefix/suffix with compact prefix/suffix
+		prefix += compactProperties[ 1 ];
+		suffix = compactProperties[ 11 ] + suffix;
+	}
+
 	// Remove the possible number minus sign
 	number = number.replace( /^-/, "" );

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such change actually fixed this:

diff --git a/test/unit/number/format.js b/test/unit/number/format.js
index ae8f9825..cd7520e0 100644
--- a/test/unit/number/format.js
+++ b/test/unit/number/format.js
@@ -120,7 +120,7 @@ QUnit.test( "decimals should format in compact mode", function( assert ) {
 	assert.equal( format( 1273, properties( "#0.#", en, { compact: "long" } ) ), "1.3 thousand" );
 	assert.equal( format( 1273000, properties( "#0.#", es, {
 		compact: "long"
-	} ), oneOrOtherPluralGenerator ), "1,3 millón" );
+	} ), oneOrOtherPluralGenerator ), "1,3 millones" );

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice suggestion! Lets us take full advantage of sig/decimal figure options.

QUnit.test( "percents should format in compact mode", function( assert ) {
assert.equal( format( 127, properties( "#0%", en, { compact: "short" } ) ), "13K%" );
assert.equal( format( 127, properties( "#0%", en, { compact: "long" } ) ), "13 thousand%" );
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, let's pass oneOrOtherPluralGenerator. It's weird to see 1.3 million with wrong pluralization 😄. We can add one specific test to assert the missing pluralGenerator and a missing plural form data to check both preventions you added in the code.

Copy link
Member

@rxaviers rxaviers May 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another tests I'd like to see are: (a) round tests, (b) significant digits tests.

Example for b:

 +	assert.equal( format( 1273500, properties( "@@@", en, { compact: "long" } ) ), "1.27 million" );

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some round and significant digits tests -- let me know if you think there could be better coverage.

Not sure what you mean about the oneOrOtherPluralGenerator. I'm only passing it when using es because it's the only language we use in the tests that has plural forms. We could pass in all of the tests, I suppose, but that will make things a bit verbose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, what I mean is this:

enPluralGenerator = function(n) {
  var s = String(n).split('.'), v0 = !s[1];
  return (n == 1 && v0) ? 'one' : 'other';
}

esPluralGenerator = function(n) {
  return (n == 1) ? 'one' : 'other';
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, my original message was confusing wrt that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can make that change. It shouldn't affect English, though. The plural/single case is identical.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, thanks

@rxaviers
Copy link
Member

@sieverk this is looking great, good job so far... I've left a couple of comments.

@rxaviers
Copy link
Member

rxaviers commented May 24, 2017

Something I like about the API is that significant digits can be used very conveniently to control the display digits.

var fmt = .numberFormatter({
  minimumSignificantDigits: 1,
  maximumSignificantDigits: 2,
 compact: "short"
});
fmt( 1273500 ); // > 1.3 million
fmt( 12735000 ); // > 13 million
fmt( 127350000 ); // > 130 million

fmt = .numberFormatter({
  minimumSignificantDigits: 1,
  maximumSignificantDigits: 3,
  compact: "short"
});
fmt( 1273500 ); // > 1.27 million
fmt( 12735000 ); // > 12.7 million
fmt( 127350000 ); // > 127 million

@rxaviers
Copy link
Member

rxaviers commented May 24, 2017

Something that seems NOT possible is to lock the scale, for example at thousands, but CLDR doesn't mention that possibility. By the way, the approximation in use here seems to me very in line with CLDR. I believe that on cases where user needs more control over the scale and unit, they can use formatUnit and do the math.

@rxaviers
Copy link
Member

@sieverk please could you sign the CLA? Thanks

@SlexAxton
Copy link
Contributor

I'm also pretty interested in getting this in. Let me know if I can help.

@sieverk
Copy link
Contributor Author

sieverk commented Jun 20, 2017

Sorry about the delayed response -- got swept up in a few other things this past month. Thanks for the feedback! I'll try to set some time aside this week to address questions / make some revisions / drop a few extra notes. Went ahead and got that CLA signed.

*/
return ( /^(('([^']|'')*'|[^*#@0,.E])*)(\*.)?((([#,]*[0,]*0+)(\.0*[0-9]*#*)?)|([#,]*@+#*))(E\+?0+)?(.*?)$/ );

});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff between this and the src/number/pattern-re.js is:

--- compact-pattern-re.js	2017-06-22 11:34:01.000000000 -0300
+++ pattern-re.js	2017-03-13 11:49:00.000000000 -0300
@@ -26,7 +26,7 @@
  *
  * suffix =                   non_number_stuff
  *
- * non_number_stuff =         regexp(.*?)
+ * non_number_stuff =         regexp(('[^']+'|''|[^*#@0,.E])*)
  *
  *
  * Regexp groups:
@@ -45,6 +45,6 @@
  * 11: suffix
  * 12: -
  */
-return ( /^(('([^']|'')*'|[^*#@0,.E])*)(\*.)?((([#,]*[0,]*0+)(\.0*[0-9]*#*)?)|([#,]*@+#*))(E\+?0+)?(.*?)$/ );
+return ( /^(('([^']|'')*'|[^*#@0,.E])*)(\*.)?((([#,]*[0,]*0+)(\.0*[0-9]*#*)?)|([#,]*@+#*))(E\+?0+)?(('[^']+'|''|[^*#@0,.E])*)$/ );
 
 });

Can we update src/number/pattern-re.js instead and reuse that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The definition seems a valid number pattern re.

var infinitySymbol, maximumFractionDigits, maximumSignificantDigits, minimumFractionDigits,
minimumIntegerDigits, minimumSignificantDigits, nanSymbol, nuDigitsMap, padding, prefix,
primaryGroupingSize, pattern, ret, round, roundIncrement, secondaryGroupingSize, suffix,
symbolMap;
symbolMap, compactMap;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: please sort the variables, therefore compactMap would be the first one. ☺️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D'oh, that's what I get for renaming too quickly.

@rxaviers
Copy link
Member

rxaviers commented Jun 22, 2017

  • Note to self: Review additional tests.

@alex-stripe
Copy link

Is there anything I can do to get this over the finish line? I'm really looking forward to using it.

@rxaviers
Copy link
Member

Is there anything I can do to get this over the finish line? I'm really looking forward to using it.

Hi @alex-stripe, I think this is close to complete. I have to review the tests and I see one comment above not yet answered. @sieverk please any thoughts?

@alex-stripe do you use globalize-webpack-plugin or the precompilation on your side? If so, a great help would be getting those projects in good shape for this change. I believe very few will change there. Therefore, it would be mostly about adding tests to assert this works fine.

@SlexAxton
Copy link
Contributor

(Sorry I keep switching accounts)

I don't use either, but I'll see if I can add some tests there in the next day or so just to keep things smooth.

@rxaviers
Copy link
Member

Awesome! That will certainly make things smooth and quicker to release. Thanks

@sieverk
Copy link
Contributor Author

sieverk commented Jul 18, 2017

Yeah, I think as far as code changes, there's that issue and handling long numbers (that exceed defined patterns). Both should be pretty easy. Will see if I can set aside an hour or so today.

@rxaviers
Copy link
Member

Awesome! I see updates and I believe it's entirely on my side now :). Will review in a couple of days. Thanks!


QUnit.test( "unsupported numbers should apply no compacting in compact mode", function( assert ) {
assert.equal( format( 0.01, properties( "#0.##", en, { compact: "short" } ) ), "0.01" )
assert.equal( format( 1234000000000000, properties( "#0", en, { compact: "short" } ) ), "1234000000000000" )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of ignoring compact mode for the big case, can it use the biggest available, e.g., "000 trillion" => "1,234 trillion"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I think that's probably more in line with what the user would want.

if ( zeroes.length >= 3 ) {

// use default plural form to perform initial decimal shift
compactPattern = compactMap[ "1" + zeroes + "-count-other" ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point (the format execution itself), compactMap data structure should be the most convenient possible for use here (prepared by format-properties stage), e.g.:

compactMap: {
  3: {
    one: "0 thousand",
    other: "0 thousand"
  },
  4: {one, other},
  ...
  15: {one, other}
}

That would eliminate the need of creating the zeros array and would simplify the key above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, shifting the work into compactMap makes a lot of sense.

// If no pattern, original number should remain uncompacted.
if ( compactPattern === "0" ) {
number = originalNumber;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this if block be moved up like:

if ( compactPattern === "0" ) {
    number = originalNumber;
    // Could you do this and avoid entering the other if block below
    // (the one after numbers are processed)
    compactPattern = null;
} else if ( compactPattern ) {
    ...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. As you did in your branch, don't even need to apply the division / restore the number if arranged that way.

prefix += compactProperties[ 1 ];
suffix = compactProperties[ 11 ] + suffix;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like commented below, it would be ideal to use the biggest available instead of ignoring compact mode for big numbers.


// Some languages specify no pattern for certain digit lengths, represented as "0".
// Only apply compact pattern if one is specified.
if ( compactPattern !== "0" ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above, could you simply set compactPattern = null in this case and avoid this if completely?

QUnit.test( "percents should format in compact mode", function( assert ) {
assert.equal( format( 127, properties( "#0%", en, { compact: "short" } ) ), "13K%" );
assert.equal( format( 127, properties( "#0%", en, { compact: "long" } ) ), "13 thousand%" );
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, thanks

Copy link
Member

@rxaviers rxaviers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed changes...

Copy link
Member

@rxaviers rxaviers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed changes...

@rxaviers
Copy link
Member

rxaviers commented Jul 21, 2017

I've added a couple of comments requesting changes, but I have simultaneously implemented them in here: https://github.com/rxaviers/globalize/tree/compact-mode

It needs another review round...

@sieverk
Copy link
Contributor Author

sieverk commented Jul 26, 2017

Looked over your changes -- they look pretty good to me, other than some minor questions. Did you want a review over there?

@rxaviers
Copy link
Member

Thanks @sieverk and sure your review is welcome. Let me create a PR.

@rxaviers
Copy link
Member

Here we go... #759

@alex-stripe
Copy link

Seems like all the failures are commit message related.

Can we just rebase it to something valid and merge?

@rxaviers
Copy link
Member

Continued in #759

@rxaviers rxaviers closed this Aug 22, 2017
@rxaviers
Copy link
Member

@alex-stripe thanks for the push... It would be great if you could help by updating the examples and add a compact case if you have time. Thanks

@rxaviers
Copy link
Member

By the way, this is merged via #759...

@SlexAxton
Copy link
Contributor

SlexAxton commented Aug 22, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants