Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changed whitespace regex to /s+ (intermediate branch to fix the conflicts) #935

Merged
merged 11 commits into from
Mar 31, 2022

Conversation

sequba
Copy link
Contributor

@sequba sequba commented Mar 10, 2022

This is a technical PR that was necessary to resolve the conflicts from the forked repository. Fixes #898

Original PR by MartinDawson (#877):

Context

A crazy bug I just ran into caused by this.

In our spreadsheet we use a contentEditable div. We parse the text using chevrotain's lexer to find the cells to highlight. Like so:

image

When we access the content using .textContent it seems to replace all spaces with non-breaking spaces.

This should be fine because hyperformula should replace ANY space. However it's missing non-breaking spaces from the regex:

pattern: /[ \t\n\r]+/

Hyperformula's lexer was returning #ERROR! due to this.

So it should be /s+ instead.

How has this been tested?

See the spec test.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I described the modification in the CHANGELOG.md file.

Other software compatibility:

Checked:

  • Ms Excel
  • Google Sheets
  • LibreOffice Calc

Other spreadsheet software accepts non-breaking space characters in formulas and displays them as regular spaces (no errors),

ODFF 1.3 compliance:

5.14 Whitespace
Whitespace ::= #x20 | #x09 | #x0a | #x0d

For calculation purposes, whitespace is ignored unless it is inside the contents of string constants or text surrounded by single quotes. Evaluators shall ignore any whitespace characters before and/or after any operators, constant numbers, constant strings, constant errors, inline arrays, parentheses used for controlling precedence, and the closing parenthesis of a function call. Whitespace shall be ignored following the initial equal sign(s). Whitespace shall be ignored just before a function name, but whitespace shall not separate a function name from its initial opening parenthesis. Whitespace shall not be used in the interior of a terminating grammar rule (a rule that references no other rule other than character sets, internally or externally-defined), unless specifically permitted by the terminating grammar rule, since these rules define the lexical properties of a component. Evaluators shall not write formulas with whitespace embedded in any unquoted identifier, constant Number, or constant Error. Evaluators shall treat SPACE (U+0020), CHARACTER TABULATION (U+0009), LINE FEED (U+000A), and CARRIAGE RETURN (U+000D) as whitespace characters.

An embedded line break shall be represented by a single LINE FEED character (U+000A), not by a carriage return-linefeed pair. When embedded in an XML attribute the linefeed character is represented as “ ”.

Evaluators should retain whitespace entered by the original formula creator and use it when saving or presenting the formula, and should not add additional whitespace unless directed to do so during the process of editing a formula.

The standard does not mention whitespace characters other than SPACE, TAB, LINE FEED, and CARRIAGE RETURN so I assume it is up to us how we want to handle them.

* Changed whitespace regex -> /s+.
* updated CHANGELOG.MD
@sequba sequba self-assigned this Mar 10, 2022
@sequba sequba requested a review from warpech March 10, 2022 18:33
@codecov
Copy link

codecov bot commented Mar 10, 2022

Codecov Report

Merging #935 (0279e6a) into develop (1e807ce) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop     #935   +/-   ##
========================================
  Coverage    95.70%   95.70%           
========================================
  Files          161      161           
  Lines        14018    14024    +6     
  Branches      2901     2902    +1     
========================================
+ Hits         13416    13422    +6     
  Misses         602      602           
Impacted Files Coverage Δ
src/Config.ts 98.00% <100.00%> (+0.02%) ⬆️
src/parser/FormulaParser.ts 97.22% <100.00%> (ø)
src/parser/LexerConfig.ts 100.00% <100.00%> (ø)
src/parser/ParserWithCaching.ts 94.55% <100.00%> (+0.07%) ⬆️

@sequba sequba mentioned this pull request Mar 15, 2022
2 tasks
@@ -95,7 +95,7 @@ export const ErrorLiteral = createToken({name: 'ErrorLiteral', pattern: /#[A-Za-
/* skipping whitespaces */
export const WhiteSpace = createToken({
name: 'WhiteSpace',
pattern: /[ \t\n\r]+/,
pattern: /\s+/,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sequba wrote in the OP:

The standard does not mention whitespace characters other than SPACE, TAB, LINE FEED, and CARRIAGE RETURN so I assume it is up to us how we want to handle them.

Our implementation will become non-standard if we do whatever we want. I think we should make the behavior configurable, as we did in the past with other ideas that go beyond the ODFF standard: #58

Copy link
Contributor Author

@sequba sequba Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it is highly unlikely that someone wants to display #ERROR! for formulas including non-breakable spaces but it won't hurt to add a config option for controlling that behaviour. I'll work on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added allowAllWhitespace parameter

@sequba sequba requested a review from warpech March 21, 2022 12:26
Copy link
Member

@warpech warpech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically it looks very good but I have a doubt about the option itself.

src/Config.ts Outdated
*
* @category Formula Syntax
*/
allowAllWhitespace: boolean,
Copy link
Member

@warpech warpech Mar 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know there are many ways to do that, but I am looking for the most consistent and future-proof.

All other options in the "Formula Syntax" category allow string or array input. Have you considered naming this option ignoreWhiteSpace: 'standard' | 'any' (with standard as the default).

This allows us to add more options in the future like none, etc.

I think that the word "any" fits better than "all", because usually refers to quantity and might be confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. It's more flexible, Done.

@sequba sequba requested a review from warpech March 24, 2022 14:57
Copy link
Member

@warpech warpech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, love it!

@sequba sequba merged commit c125bf8 into develop Mar 31, 2022
@sequba sequba deleted the feature/issue-898/resolve-conflicts branch March 31, 2022 09:35
@kirszenbaum kirszenbaum linked an issue Apr 12, 2022 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

#ERROR! when using a non-breaking space character inside formula
3 participants