Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Intl.NumberFormat.prototype.formatToParts #5105

Merged
merged 2 commits into from
May 9, 2018

Conversation

jackhorton
Copy link
Contributor

@jackhorton jackhorton commented May 5, 2018

Also, somewhat randomly, fixes #5097 since I was already regenerating bytecode

// In more concrete words, we want to assign the entire formatted string to some number of parts,
// and each character should get the most specific part possible. We can accomplish this by
// keeping an auxiliary array with the same size as the formatted string that, at each index,
// holds the part for that character. For the following call pattern:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this explanation difficult to follow. The example that you give below, does that relate to the tree/example above? The example above has 11 entries, while below you refer to [0,9) which only has 10.

I'm also not clear on how the array of 0s below ends up turning into 1,000,000

Copy link
Contributor Author

@jackhorton jackhorton May 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it took me a while to get to this data structure so I think those are from two different nights. I can try clarify the comment a bit. The goal of the structure is basically that ICU reports back a tree of parts where each node in the tree has a type and a width, and there must be at least one node corresponding to each character in the tree (nodes can be wider than one character). Nodes can have children, and the parent-child relationship is that child node represents a more specific type for a given character range than the parent. Since ICU doesn't ever report "literal" parts of strings (like spaces or other extra characters), the root node in the tree will always be the entire width of the string with the type UnsetField. Then, for a string like "US$ 1,000", there will be two child nodes, one of type currency with width [0, 3) and one of type integer with width [4, 9). The integer node will have a child of type group and width [6, 7). So the most specific type for characters 0 to 3 is currency, 3 to 4 is unset (literal), 4 to 5 is integer, 5 to 6 is group, and 6 to 9 is integer.

I thought about this a bunch and I didn't like the idea of traversing an actual tree structure to get that information because it felt awkward to encode the "width" of nodes with specific meaning. So, I came up with the array structure where basically when we are told a part exists from position x to y, we can figure out what type used to apply to that span and update that section of the array with the new type. We skip over sections of the span [x, y) that have a type that doesn't match the start and end because that means we have already gotten a more specific part for that sub-span (for instance if we got a grouping separator before it's parent integer)

Does that make more sense? If not I can try to figure out a clearer explanation in person on Monday.

AssertOrFailFast(start < end);

// the asserts above mean the cast to charcount_t is safe
charcount_t ccStart = static_cast<charcount_t>(start);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use charcount_t here? Does the fields array have one entry for each character in the formatted string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. See my comment above.

{
if (JavascriptNumber::IsNan(num))
{
return JavascriptString::NewCopySz(_u("nan"), sc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewCopySz [](start = 45, length = 9)

Why copy when making these strings? I thought it was fine for a JavascriptString object to point to static data for its buffer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I didn't want to type out the numbers for each length to do NewWithBuffer, and regardless I planned on moving these to JavascriptLibrary's string cache before merging. This was mostly to prove that it would work.

, formatted(formatted)
, formattedLength(formattedLength)
, sc(scriptContext)
, fields(RecyclerNewArrayLeaf(sc->GetRecycler(), UNumberFormatFields, formattedLength))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised the Linux build didn't complain at you for not having fields marked as a Field, since it can hold a recycler pointer. Any idea why that would be?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, looks like the checker only applies that rule to classes that have ever been recycler-allocated themselves, and NumberFormatPartsBuilder is only created on the stack. Carry on.


In reply to: 186459626 [](ancestors = 186459626)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should probably still mark it as a Field in case this ever needs to be created in the heap in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though the pointer itself would be a Field, not the thing being pointed at, like Field(UNumberFormatFields *). Do I have this backwards?


In reply to: 186466305 [](ancestors = 186466305)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you're correct, all the Field specifiers on this class should contain the whole type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, you're right.

, fields(RecyclerNewArrayLeaf(sc->GetRecycler(), UNumberFormatFields, formattedLength))
{
// this will allow InsertPart to tell what fields have been initialized or not, and will
// be the valid of resulting { type: "literal" } fields in ToPartsArray
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be the valid of resulting [](start = 15, length = 25)

what?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I meant "the value of resulting...". Does that make more sense?

void InsertPart(UNumberFormatFields field, int start, int end)
{
AssertOrFailFast(start >= 0);
AssertOrFailFast(end > 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This second assertion seems adequately covered by the third one

Copy link
Contributor

@sethbrenith sethbrenith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@jackhorton jackhorton force-pushed the intl/nf-formattoparts branch 2 times, most recently from 3d90e7e to bc6ea26 Compare May 7, 2018 20:39
Also, fix chakra-core#5097 while we are updating the bytecode and add tests for it.
@chakrabot chakrabot merged commit 1d2a952 into chakra-core:master May 9, 2018
chakrabot pushed a commit that referenced this pull request May 9, 2018
…tToParts

Merge pull request #5105 from jackhorton:intl/nf-formattoparts

Also, somewhat randomly, fixes #5097 since I was already regenerating bytecode
@jackhorton jackhorton deleted the intl/nf-formattoparts branch May 9, 2018 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Intl-ICU: "a".localeCompare(null) throws TypeError
4 participants