Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add improved performance on tons of whitespace #62

Merged
merged 5 commits into from
Jul 4, 2022

Conversation

42shadow42
Copy link
Contributor

Initial checklist

  • I read the support docs
  • I read the contributing guide
  • I agree to follow the code of conduct
  • I searched issues and couldn’t find anything (or linked relevant results below)
  • If applicable, I’ve added docs and tests

Description of changes

Altered method of clearing out extraneous whitespace to handle non-matches more effectively. Approaches the problem using a regex with no backtracking to remove trailing whitespace, reverses the string and repeats to remove leading (now trailing) whitespace again. reverses the string to restore the original orientation.

I know it seems a little unorthodox to reverse the string, but because of how regex engines operate a leading whitespace regex like "[ \t]*$" runs inefficiently with lots of non-matching whitespace.

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Jun 25, 2022
@42shadow42
Copy link
Contributor Author

Fixes #61

wooorm added a commit to wooorm/trim-lines that referenced this pull request Jun 25, 2022
Related-to: syntax-tree/mdast-util-to-hast#62.

Co-authored-by: Nathaniel Hunter <42shadow42@gmail.com>
wooorm added a commit to wooorm/trim-lines that referenced this pull request Jun 25, 2022
Related-to: syntax-tree/mdast-util-to-hast#62.

Co-authored-by: Nathaniel Hunter <42shadow42@gmail.com>
@wooorm
Copy link
Member

wooorm commented Jun 26, 2022

Just moving some of the discussion from wooorm/trim-lines@51434f7#commitcomment-76990903 and wooorm/trim-lines@321dc64#commitcomment-76991050 back here.

I remembered that I actually had a tiny utility that does what is also done here, that had the same performance problem, so I decided that it was probably a good idea to fix that utility, and reuse it here.

I indeed thought that it would probably be really slow to revere strings in some cases. And it looked a bit off. That’s why I came up with a regex, splitting on line endings, and then trimming each line.

I tested, with your test, that this new regex-way works exactly as fast as your reverse-string-way. You’re right that it would be good to also test some whitespace around line endings!

wooorm added a commit to wooorm/trim-lines that referenced this pull request Jun 26, 2022
Comment on lines 15 to 26
u(
'text',
String(node.value)
.replace(/^[ \t]+/gm, '')
.split('')
.reverse()
.join('')
.replace(/^[ \t]+/gm, '')
.split('')
.reverse()
.join('')
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
u(
'text',
String(node.value)
.replace(/^[ \t]+/gm, '')
.split('')
.reverse()
.join('')
.replace(/^[ \t]+/gm, '')
.split('')
.reverse()
.join('')
)
u('text', trimLines(node.value))

Could you switch this PR to use trim-lines from npm as a dependency?

Copy link
Contributor Author

@42shadow42 42shadow42 Jun 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it makes sense to utilize an equivalent library that can be independently maintained, however before I make the change, I submitted a PR to address some edge cases that I identified, in the end the implementation is slower than 10ms on unfriendly input, but not more than 20ms. So it seems reasonable still.

test/text.js Outdated
@@ -63,3 +63,20 @@ test('Nodes', (t) => {

t.end()
})

test('Nodes Efficiency', (t) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
test('Nodes Efficiency', (t) => {
test('efficiency', (t) => {

test/text.js Outdated

test('Nodes Efficiency', (t) => {
const timeout = setTimeout(() => {
t.fail('should process lots of whitespace efficiently')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
t.fail('should process lots of whitespace efficiently')
t.fail('should process very fast')

test/text.js Outdated
Comment on lines 71 to 76

t.deepEqual(
toHast(u('text', `1${' '.repeat(70_000)}2`)),
u('text', `1${' '.repeat(70_000)}2`),
'should be efficient on excessive whitespace'
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
t.deepEqual(
toHast(u('text', `1${' '.repeat(70_000)}2`)),
u('text', `1${' '.repeat(70_000)}2`),
'should be efficient on excessive whitespace'
)
const whitespace = ' '.repeat(70_000)
t.deepEqual(
toHast(u('text', '1' + whitespace + '2')),
u('text', '1' + whitespace + '2'),
'should be efficient on excessive whitespace'
)

test/text.js Outdated
@@ -63,3 +63,20 @@ test('Nodes', (t) => {

t.end()
})

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below are some ideas on cleaning this code.
Optionally: it can also be dropped, as this test (and more) are now in trim-lines!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll just drop this test, I'll make the change once the library has the final revision in place. See my PR for details. wooorm/trim-lines#5

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

@42shadow42
Copy link
Contributor Author

@wooorm This one is ready for rereview. It now just delegates the inefficiency to your trim-lines package.

@wooorm wooorm changed the title Optimize whitespace removal on text nodes Add improved performance on tons of whitespace Jul 4, 2022
@wooorm wooorm merged commit 36a7da8 into syntax-tree:main Jul 4, 2022
@wooorm wooorm added 🏁 area/perf This affects performance 💪 phase/solved Post is done labels Jul 4, 2022
@github-actions

This comment has been minimized.

@github-actions github-actions bot removed the 🤞 phase/open Post is being triaged manually label Jul 4, 2022
@wooorm
Copy link
Member

wooorm commented Jul 4, 2022

Thanks, released in 12.1.2!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏁 area/perf This affects performance 💪 phase/solved Post is done
Development

Successfully merging this pull request may close these issues.

2 participants