-
-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add improved performance on tons of whitespace #62
Conversation
Fixes #61 |
…ements would actually occur
Related-to: syntax-tree/mdast-util-to-hast#62. Co-authored-by: Nathaniel Hunter <42shadow42@gmail.com>
Related-to: syntax-tree/mdast-util-to-hast#62. Co-authored-by: Nathaniel Hunter <42shadow42@gmail.com>
Just moving some of the discussion from wooorm/trim-lines@51434f7#commitcomment-76990903 and wooorm/trim-lines@321dc64#commitcomment-76991050 back here. I remembered that I actually had a tiny utility that does what is also done here, that had the same performance problem, so I decided that it was probably a good idea to fix that utility, and reuse it here. I indeed thought that it would probably be really slow to revere strings in some cases. And it looked a bit off. That’s why I came up with a regex, splitting on line endings, and then trimming each line. I tested, with your test, that this new regex-way works exactly as fast as your reverse-string-way. You’re right that it would be good to also test some whitespace around line endings! |
lib/handlers/text.js
Outdated
u( | ||
'text', | ||
String(node.value) | ||
.replace(/^[ \t]+/gm, '') | ||
.split('') | ||
.reverse() | ||
.join('') | ||
.replace(/^[ \t]+/gm, '') | ||
.split('') | ||
.reverse() | ||
.join('') | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
u( | |
'text', | |
String(node.value) | |
.replace(/^[ \t]+/gm, '') | |
.split('') | |
.reverse() | |
.join('') | |
.replace(/^[ \t]+/gm, '') | |
.split('') | |
.reverse() | |
.join('') | |
) | |
u('text', trimLines(node.value)) |
Could you switch this PR to use trim-lines
from npm
as a dependency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it makes sense to utilize an equivalent library that can be independently maintained, however before I make the change, I submitted a PR to address some edge cases that I identified, in the end the implementation is slower than 10ms on unfriendly input, but not more than 20ms. So it seems reasonable still.
test/text.js
Outdated
@@ -63,3 +63,20 @@ test('Nodes', (t) => { | |||
|
|||
t.end() | |||
}) | |||
|
|||
test('Nodes Efficiency', (t) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test('Nodes Efficiency', (t) => { | |
test('efficiency', (t) => { |
test/text.js
Outdated
|
||
test('Nodes Efficiency', (t) => { | ||
const timeout = setTimeout(() => { | ||
t.fail('should process lots of whitespace efficiently') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t.fail('should process lots of whitespace efficiently') | |
t.fail('should process very fast') |
test/text.js
Outdated
|
||
t.deepEqual( | ||
toHast(u('text', `1${' '.repeat(70_000)}2`)), | ||
u('text', `1${' '.repeat(70_000)}2`), | ||
'should be efficient on excessive whitespace' | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t.deepEqual( | |
toHast(u('text', `1${' '.repeat(70_000)}2`)), | |
u('text', `1${' '.repeat(70_000)}2`), | |
'should be efficient on excessive whitespace' | |
) | |
const whitespace = ' '.repeat(70_000) | |
t.deepEqual( | |
toHast(u('text', '1' + whitespace + '2')), | |
u('text', '1' + whitespace + '2'), | |
'should be efficient on excessive whitespace' | |
) |
test/text.js
Outdated
@@ -63,3 +63,20 @@ test('Nodes', (t) => { | |||
|
|||
t.end() | |||
}) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below are some ideas on cleaning this code.
Optionally: it can also be dropped, as this test (and more) are now in trim-lines
!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'll just drop this test, I'll make the change once the library has the final revision in place. See my PR for details. wooorm/trim-lines#5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good
@wooorm This one is ready for rereview. It now just delegates the inefficiency to your trim-lines package. |
This comment has been minimized.
This comment has been minimized.
Thanks, released in 12.1.2! |
Initial checklist
Description of changes
Altered method of clearing out extraneous whitespace to handle non-matches more effectively. Approaches the problem using a regex with no backtracking to remove trailing whitespace, reverses the string and repeats to remove leading (now trailing) whitespace again. reverses the string to restore the original orientation.
I know it seems a little unorthodox to reverse the string, but because of how regex engines operate a leading whitespace regex like "[ \t]*$" runs inefficiently with lots of non-matching whitespace.