Harmonize strings delimiters inside the code, to free " or ' as compression token #55

Closed
Siorki opened this issue Aug 4, 2016 · 3 comments
@Siorki
Owner

Siorki commented Aug 4, 2016

Currently, characters 34 (") and 39 (') are banned as tokens for the crusher / packer stages.
The rationale is that if one of them is present in the input, the other is needed as the delimiter for the packed string.

However, if neither is present in the input, one will still be used as the delimiter, while the other could become a token. This adds a bit of complexity when building the token list, as the algorithm has to determine which of the two produces the longest ranges in the character class.
Having one or the other will not impact the crusher, though.
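
A minimal sketch of that choice, not RegPack's actual code. The function names and the `otherTokenCodes` parameter (the character codes already available as tokens) are hypothetical, and counting the number of contiguous ranges is used here as a stand-in for "longest ranges in the character class".

```javascript
// Hypothetical helper: lists the (delimiter, token) pairs available for an input.
// A quote can only become a token when neither quote appears in the input.
function quoteTokenOptions(input) {
  if (input.includes('"') || input.includes("'")) {
    return [];
  }
  return [
    { delimiter: '"', token: "'" },
    { delimiter: "'", token: '"' },
  ];
}

// Counts the contiguous ranges formed by a list of character codes,
// e.g. [35, 36, 40] forms two ranges: 35-36 and 40.
function countRanges(codes) {
  const sorted = [...new Set(codes)].sort((a, b) => a - b);
  let ranges = 0;
  for (let i = 0; i < sorted.length; i++) {
    if (i === 0 || sorted[i] !== sorted[i - 1] + 1) {
      ranges++;
    }
  }
  return ranges;
}

// Picks the option whose freed quote yields the fewest ranges once merged
// with the other token codes. Returns null when no quote can be freed.
function pickQuoteToken(input, otherTokenCodes) {
  let best = null;
  for (const option of quoteTokenOptions(input)) {
    const codes = [...otherTokenCodes, option.token.charCodeAt(0)];
    const ranges = countRanges(codes);
    if (best === null || ranges < best.ranges) {
      best = { ...option, ranges };
    }
  }
  return best;
}
```

For instance, if codes 35 (#) and 36 ($) are already free, freeing " (34) extends that range, so ' would be kept as the delimiter.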

@Siorki
Owner Author

Siorki commented Oct 26, 2016

Addendum: support backquotes (ES6 template strings) as well when the ES6 flag (#54) is on.

@Siorki Siorki changed the title Allow " or ' as compression tokens if neither is present in the input Harmonize strings delimiters inside the code, to free " or ' as compression token Dec 3, 2016
@Siorki
Owner Author

Siorki commented Dec 3, 2016

Extended the issue scope, hence the new title.

Following initial work on this issue, the preprocessor now recognizes strings in the input code. This might be reused for other tasks (#13, #57).
With ES6, we now have three delimiters: ' " `. Changing them in the strings inside the input is an opportunity to free one of them, either for use as a compression token or to wrap the packed string without having to escape other occurrences inside it. As this is performed before the crusher, there is no easy way to know whether one extra token will be useful (in 1k format, it usually isn't).

Suggested algorithm:

  • identify all strings inside the code, along with their current delimiter and any ' " ` present inside
  • for each delimiter that can wrap the packed code, build a minimal input
  • choose the delimiter that produces the shortest input (fewest escapes needed)
  • choose the delimiter that produces the shortest input (fewest escapes needed) while freeing one token (if possible)
  • create a branch for each, so that the need for the extra token is determined after the crusher has done its work, by comparing packed lengths
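
The first three steps can be sketched as follows. This is an illustration under simplifying assumptions, not the committed implementation: the input is assumed to be already split into `code` and `string` segments (a hypothetical format), each string's `content` is its unescaped value, and template-string interpolation (`${...}`) and line breaks are ignored. All names are made up for the example.

```javascript
// Escapes backslashes and the given delimiter so that text can sit inside
// a string literal wrapped in that delimiter.
function escapeFor(text, delimiter) {
  return [...text]
    .map((ch) => (ch === "\\" || ch === delimiter ? "\\" + ch : ch))
    .join("");
}

// Builds the shortest input for one wrapping delimiter: each string gets the
// inner delimiter that costs the fewest characters once wrapped.
function minimalInput(segments, wrapper, delimiters) {
  let code = "";
  for (const segment of segments) {
    if (segment.type === "code") {
      code += segment.text;
      continue;
    }
    let bestLiteral = null;
    for (const d of delimiters) {
      const literal = d + escapeFor(segment.content, d) + d;
      if (
        bestLiteral === null ||
        escapeFor(literal, wrapper).length < escapeFor(bestLiteral, wrapper).length
      ) {
        bestLiteral = literal;
      }
    }
    code += bestLiteral;
  }
  return { wrapper, code, packed: wrapper + escapeFor(code, wrapper) + wrapper };
}

// Tries every wrapping delimiter and keeps the one with the shortest result.
// Backquotes are only considered when the (assumed) es6 flag is set.
function chooseDelimiter(segments, es6 = false) {
  const delimiters = es6 ? ['"', "'", "`"] : ['"', "'"];
  let best = null;
  for (const wrapper of delimiters) {
    const candidate = minimalInput(segments, wrapper, delimiters);
    if (best === null || candidate.packed.length < best.packed.length) {
      best = candidate;
    }
  }
  return best;
}

// Usage: a call to alert() whose string contains both kinds of quotes.
const segments = [
  { type: "code", text: "alert(" },
  { type: "string", content: "it's \"ok\"" },
  { type: "code", text: ")" },
];
const result = chooseDelimiter(segments, true);
console.log(result.wrapper, result.packed.length);
```

The last two steps (freeing a token and branching) are left out: they would need each candidate to go through the crusher before packed lengths can be compared.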

@Siorki Siorki added this to the 5.0 milestone Dec 24, 2016
@Siorki Siorki self-assigned this Dec 24, 2016
@Siorki
Owner Author

Siorki commented Dec 28, 2016

The current implementation only addresses the first three bullet points above.
The remaining improvements have been deferred to separate issues.

Siorki added a commit that referenced this issue Jan 3, 2017
Compute optimal packed string delimiter (', ", ` if ES6)
@Siorki Siorki closed this as completed Jan 11, 2017