Fix splitCssText again #1640

eoghanmurray · 2025-01-29T15:07:08Z

See also PostHog/posthog-js#1668 for the downstream bug report, and further discussion in Slack (thanks to Paul D'Ambra for the report and pointers).

This is a further improvement after performance fixes in #1615

This covers new scenarios, as outlined in the tests. The test cases were recreated from a very large inline style node on https://hiring.workfully.com/signin, which looks like it was in a shadow root (:host(.productfruits--container)), although I can't quite find it there now. I pulled the example files from a breakpoint and have them locally, but the test cases here incorporate the important bit, including the split in the middle of a statement.
Ultimately, the problem with the content that triggered this case was that `margin-top: 0;`, as authored, gets serialized to `margin-top: 0px;`, which was preventing us from finding the right split point when matching the normalized text against the unnormalized text.
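To make the `0px` mismatch concrete, here is a small sketch that mirrors the normalization regexes quoted in the `utils.ts` diff further down (strip comments, whitespace and semicolons, then rewrite `0px` to `0`). `normalizeCss` and its `skipPxNorm` flag are illustrative names, not the exported rrweb function.

```typescript
// Illustrative sketch mirroring the regexes in this PR's diff (not the rrweb export):
// drop /* comments */, whitespace and semicolons, then optionally map "0px" -> "0".
function normalizeCss(cssText: string, skipPxNorm = false): string {
  const stripped = cssText.replace(/(\/\*[^*]*\*\/)|[\s;]/g, '');
  return skipPxNorm ? stripped : stripped.replace(/0px/g, '0');
}

const authored = '.a { margin-top: 0; }'; // text as written in the <style> node
const serialized = '.a { margin-top: 0px; }'; // text as serialized from cssRules

console.log(normalizeCss(authored, true) === normalizeCss(serialized, true)); // false
console.log(normalizeCss(authored) === normalizeCss(serialized)); // true
```

Note that the blanket `/0px/g` replacement also rewrites substrings such as `10px` to `10`; because both sides pass through the same function, that only loosens the comparison rather than breaking it.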

changeset-bot · 2025-01-29T15:07:12Z

🦋 Changeset detected

Latest commit: 5743c7e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 19 packages

Name	Type
rrweb-snapshot	Patch
rrweb	Patch
rrdom	Patch
rrdom-nodejs	Patch
rrweb-player	Patch
@rrweb/all	Patch
@rrweb/replay	Patch
@rrweb/record	Patch
@rrweb/types	Patch
@rrweb/packer	Patch
@rrweb/utils	Patch
@rrweb/web-extension	Patch
rrvideo	Patch
@rrweb/rrweb-plugin-console-record	Patch
@rrweb/rrweb-plugin-console-replay	Patch
@rrweb/rrweb-plugin-sequential-id-record	Patch
@rrweb/rrweb-plugin-sequential-id-replay	Patch
@rrweb/rrweb-plugin-canvas-webrtc-record	Patch
@rrweb/rrweb-plugin-canvas-webrtc-replay	Patch

pauldambra · 2025-01-31T13:52:19Z

packages/rrweb-snapshot/src/utils.ts

+            const prevTextContent = childNodes[i - 1].textContent;
+            if (prevTextContent && typeof prevTextContent === 'string') {
+              // pick the first matching point which respects the previous chunk's approx size
+              const prevMinLength = normalizeCssString(prevTextContent).length;


i guess it's somewhere here that means if you run a test twice (or multiple times) you don't get the same output

(at least in my experience of testing whether this was deterministic when trying to figure out what was happening)

I don't quite understand this comment. I would say the algorithm is indeed deterministic, but maybe you mean it will behave differently for differently sized inputs because of the jLimit bit?

oh, i wrote a test that ran the split multiple times and compared the outputs, and they didn't match
it was generally whitespace ending up on different sides of a split

Ah okay, yes this is possible ... basically it stops when the normalised versions match (although I still think it would be deterministic given the same input twice). I can't imagine there being a problem if one side has more whitespace than it should have.
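A toy illustration of why whitespace can land on either side: with whitespace- and semicolon-insensitive matching, several end positions can normalize to the same prefix. `candidateSplits` is a hypothetical brute-force helper (quadratic, for illustration only), not part of rrweb; a rule such as "take the first match" makes the choice deterministic for a given input, and in this example either choice yields halves that normalize identically.

```typescript
// Same regex-based normalization as quoted in the diff (illustrative copy).
function normalizeCss(cssText: string): string {
  return cssText.replace(/(\/\*[^*]*\*\/)|[\s;]/g, '').replace(/0px/g, '0');
}

// Hypothetical helper: every end index j where full.slice(0, j) normalizes to
// the same string as `chunk`. Brute force (O(n^2)); for illustration only.
function candidateSplits(full: string, chunk: string): number[] {
  const target = normalizeCss(chunk);
  const out: number[] = [];
  for (let j = 0; j <= full.length; j++) {
    if (normalizeCss(full.slice(0, j)) === target) out.push(j);
  }
  return out;
}

const full = '.a { color: red; } .b { color: blue; }';
const splits = candidateSplits(full, '.a { color: red; }');
console.log(splits); // -> [18, 19]: the space after "}" can go on either side
for (const j of splits) {
  // Both choices give halves that normalize identically.
  console.log(JSON.stringify(full.slice(0, j)), JSON.stringify(full.slice(j)));
}
```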

pauldambra · 2025-01-31T13:54:06Z

packages/rrweb-snapshot/src/utils.ts

+  if (_testNoPxNorm) {
+    return cssText.replace(/(\/\*[^*]*\*\/)|[\s;]/g, '');
+  } else {
+    return cssText.replace(/(\/\*[^*]*\*\/)|[\s;]/g, '').replace(/0px/g, '0');


i briefly tested a very naive loop parser and it was slower than regex replace - i guess because browsers/v8 are doing some magic to optimise this already

but I didn't test it over a range of inputs

this is (from my testing) at best O(n) in the amount of whitespace - and, for clarity, since i'm not much of a comp sci person: the more whitespace you insert into the input, the slower this gets

The performance of the normalization function could be improved. My focus has been more on ensuring it's not called repeatedly on the same piece of CSS (with the 'binary search' style changes in #1615).

pauldambra

i am really not sure how i could roll this out in prod
the last version basically doubled our support load and i'm ending this week exhausted as a result

i have no idea how to test whether this fixes all cases or just a specific case

we seem to be writing a css parser, and i'm wondering if adopting a css parser would be safer

eoghanmurray · 2025-02-04T10:35:16Z

We're not attempting to write a CSS parser at record time; we are using cssRules to parse the CSS, which relies on the browser's native capabilities.

In most cases, the splitCssText function passes input for <style> elements that have only one child through without further processing, so these fixes are very much for edge cases. I know that's cold comfort for you, since you're the one dealing with the fallout from the exceptions being encountered in the wild.

The mutation issues that this splitting solves are also mostly theoretical, so it's possible to patch/short-circuit the splitCssText function (directly return [cssText]) for guaranteed performance, at the expense of correctness in the edge cases documented in the 'css splitter' suite. I introduced this splitting approach in #1437 and never dreamed it would cause performance issues, so I didn't write it with performance in mind at the time, and have been patching it up since.
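As a rough illustration of the two cheap paths described above (the single-child pass-through and the `return [cssText]` short-circuit), here is a hypothetical wrapper. `StyleLike`, `splitOrPassThrough`, the `splitter` callback and the `disableSplitting` flag are all made up for this sketch; they are not rrweb APIs.

```typescript
// Minimal structural stand-in for HTMLStyleElement so the sketch runs without a DOM.
interface StyleLike {
  childNodes: ArrayLike<{ textContent: string | null }>;
}

// Hypothetical wrapper, not rrweb's API: `splitter` stands in for the real
// splitting routine, and `disableSplitting` models the "directly return
// [cssText]" short-circuit.
function splitOrPassThrough(
  cssText: string,
  style: StyleLike,
  splitter: (css: string, s: StyleLike) => string[],
  disableSplitting = false,
): string[] {
  // Common case: a single text node needs no split-point matching at all.
  if (style.childNodes.length <= 1) return [cssText];
  // Opt-out: trade edge-case correctness for constant-time work here.
  if (disableSplitting) return [cssText];
  return splitter(cssText, style);
}
```

The design point is simply that the expensive matching only ever runs for multi-child <style> elements, and an opt-out can remove even that.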

we seem to be writing a css parser

I appreciate that the algorithm implemented here is not simple; I've thought about abstracting it out to a third party library "split a string according to an array of related substrings which can be matched via a normalization function" ... I haven't looked into whether such a thing already exists.
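For what it's worth, the generic operation described above can be sketched in a few lines. This is a simplified illustration, not the rrweb implementation: `splitByNormalized` and `maxSteps` are hypothetical names, it assumes `normalize` never makes a string longer (true of the regex-based normalization quoted in the diff), and it takes the first matching end point for each chunk, starting the scan at the chunk's normalized length in the same spirit as the `prevMinLength` logic in the diff.

```typescript
/**
 * Hypothetical generic splitter (a sketch, not rrweb's splitCssText):
 * split `text` into `parts.length` pieces so that each piece normalizes to
 * the same string as the corresponding entry in `parts`.
 * Returns null if no consistent split is found within `maxSteps` per chunk.
 */
function splitByNormalized(
  text: string,
  parts: string[],
  normalize: (s: string) => string,
  maxSteps = 10000,
): string[] | null {
  if (parts.length <= 1) return [text];
  const pieces: string[] = [];
  let start = 0;
  for (let i = 0; i < parts.length - 1; i++) {
    const target = normalize(parts[i]);
    let found = -1;
    // Assumes normalize never lengthens a string, so a matching raw slice
    // must be at least target.length characters long.
    for (
      let end = start + target.length, steps = 0;
      end <= text.length && steps < maxSteps;
      end++, steps++
    ) {
      if (normalize(text.slice(start, end)) === target) {
        found = end; // first (shortest) match wins
        break;
      }
    }
    if (found === -1) return null;
    pieces.push(text.slice(start, found));
    start = found;
  }
  const last = text.slice(start);
  // The remainder must also line up with the final part.
  if (normalize(last) !== normalize(parts[parts.length - 1])) return null;
  pieces.push(last);
  return pieces;
}

const norm = (s: string) =>
  s.replace(/(\/\*[^*]*\*\/)|[\s;]/g, '').replace(/0px/g, '0');

// Serialized text (as from cssRules) vs. the authored text-node contents.
const serialized = '.a { margin-top: 0px; }.b { color: red; }';
const authoredParts = ['.a { margin-top: 0; }', '.b {color:red}'];
console.log(splitByNormalized(serialized, authoredParts, norm));
// -> ['.a { margin-top: 0px; }', '.b { color: red; }']
```

Each candidate end re-normalizes a slice, so the worst case is roughly quadratic in the text length; the step cap bounds that at the cost of giving up (returning `null`) on hard inputs, where a caller could fall back to `[text]`. With the `0px` step removed from `norm`, the same input finds no split at all, which is the failure mode described in this PR.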

This PR is definitely an improvement, and I believe it catches the last pathological case, particularly as there is now an additional jLimit iteration limit.

There is another large PR to move CSS parsing off the main thread at record time, however that would likely have hidden this problem rather than bringing it to the fore so painfully.

I also have another plan to ditch the whole cssRules approach, and hence the need to do any matching of split points: when we can detect that the style element hasn't been modified programmatically, we'd just use the textContent directly. However, I'm waiting for #1475 to get merged before we can look at that.

Juice10 · 2025-02-04T14:30:25Z

packages/rrweb-snapshot/src/utils.ts

@@ -463,19 +470,24 @@ export function normalizeCssString(cssText: string): string {
 export function splitCssText(
  cssText: string,
  style: HTMLStyleElement,
+  _testNoPxNorm = false,


Would be great to get some tsdoc documentation on what this does, especially since a leading underscore (_variableName) normally signals an unused variable in JS/TS land

* Fix up the 'should replace the existing DOM nodes on iframe navigation with `isAttachIframe`' test (rrweb-io#1636) - it was working for me when the test was run in isolation (`-t` option), but when the entire cross-origin-iframes test was run, the change of iframe contents didn't seem to happen in time
* [chore]: Update actions/upload-artifact to v4 (rrweb-io#1643)
  * update actions/upload-artifact to v4
  ---------
  Co-authored-by: Eoghan Murray <eoghan@getthere.ie>
* Fix a code path where masking could be skipped on textareas (rrweb-io#1599)
  * Fixes rrweb-io#1596
* [chore] Cache yarn packages for CI (rrweb-io#1646)
  * [chore] Cache yarn packages for CI
  * Cache yarn in release.yml
* [chore] Update deprecated download artifact on CI (rrweb-io#1647)
  * I'm merging even though ESLint is still failing in Github Actions as I believe it's running actions _without_ this PR applied yet
* Fix env puppeteer error in cross-origin-iframes.test.ts (rrweb-io#1629)
* chore(ci): track bundle size (rrweb-io#1630)
  * chore(ci): track bundle size
  ---------
  Co-authored-by: pauldambra <pauldambra@users.noreply.github.com>
* Fix adapt css with split (rrweb-io#1600)
  Fix for rrweb-io#1575 where postcss was raising an exception
  * adapt the entire CSS as a whole in one pass with postcss, rather than adapting each split part separately
  * break up the postcss output again and assign to individual text nodes (kind of inverse of splitCssText at record side)
  * impose an upper bound of 30 iterations on the substring searches to preempt possible pathological behavior
  * add tests to demonstrate the scenario and prevent regression
  More technical details:
  * Fix algorithm; checks against `ix_end` within loop were incorrect when `ix_start` was bigger than zero.
  * Fix that length check against wrong array was causing 'should record style mutations with multiple child nodes and replay them correctly' test to fail. Note on last point: I haven't looked into things more deeply than that the test was complaining about missing .length after `replayer.pause(1000);`
* Warn instead of fail on exceptions thrown from postcss (rrweb-io#1580)
  * postcss was introduced in rrweb-io#1458 for use within adaptCssForReplay
  * rrweb-io#1600 fixes the main case where invalid css could be introduced when valid css from the output of `sheet.cssRules` was split according to how it was split across text nodes of the <style>
  * the guard introduced here is still useful as we likely in future will switch to capturing the raw stylesheet contents (both <style> and <link>), at which point we will be much less confident of getting valid css
* Fix splitCssText again (rrweb-io#1640)
  Fixes a browser 'lock up' at record time due to the presence of large amounts of css in <style> elements, which are split over multiple text nodes, which triggers the new code added in rrweb-io#1437 (see that PR for full explanation of why this all exists). rrweb-io#1437 was not written with performance in mind as it was believed to be an edge case, but things like Grammarly browser extension (rrweb-io#1603) among other scenarios were triggering pathological behavior, some of which was solved in rrweb-io#1615. See also rrweb-io#1640 (comment) for further discussion.
  * Fix the case when there are multiple matches and we end up not finding a unique one - just go with the best guess when there are many splits by looking at the previous chunk's size
  * Also add '0px' -> '0' stylesheet normalization, which also fixes the same problem in a different way
  * Add new test and modify it so that it can trigger a failure in the absence of the '0px' normalization; there may be other unknown ways of triggering a similar bug, so ensure that the primary 'best guess' method doesn't suffer a regression
  * Leverage the 'best guess' method so that we can quit after 100 iterations trying to find a unique substring; hopefully this bit along with the `iterLimit` already added will prevent any future pathological cases. Failing example extracted from large files identified by Paul D'Ambra (Posthog) ... see comment from MartinWorkfully: PostHog/posthog-js#1668
* fix: move patch function into utils to improve bundling (rrweb-io#1631)
  * fix: move patch function into utils to improve bundling
  ---------
  Co-authored-by: pauldambra <pauldambra@users.noreply.github.com>
  Co-authored-by: Justin Halsall <Juice10@users.noreply.github.com>

---------
Co-authored-by: Eoghan Murray <eoghan@getthere.ie>
Co-authored-by: Kevin Townsend <11738094+kevinatown@users.noreply.github.com>
Co-authored-by: Justin Halsall <Juice10@users.noreply.github.com>
Co-authored-by: Paul D'Ambra <paul@posthog.com>
Co-authored-by: pauldambra <pauldambra@users.noreply.github.com>
Co-authored-by: John Henry Gunther <jguntherenator@gmail.com>

eoghanmurray added 12 commits January 23, 2025 12:24

Fix a case I previously forgot about, when there are multiple matches and we end up not finding a unique one - we should just go with the first one (note: this is still not binary search so could exhibit pathological behaviour)

5962d03

Failing example extracted from large files identified by Paul D'Ambra (Posthog) ... see comment from MartinWorkfully: PostHog/posthog-js#1668

08a80ab

Fix that we weren't stopping when going from many matches to zero matches

43df807

Avoid (probably) a happydom bug

d59c1ef

Disabling to prove failure

21beccb

Make the matches go from 'many' to 'none'

e789129

Pick the best when there are many splits by looking at previous chunk's size

a72c4e1

This seems more straightforward

b3a5a12

Rename some variables for clarity

e02e608

yarn format

0347a36

Add changeset

d0fb0c5

eoghanmurray added 2 commits January 29, 2025 15:48

Mention what triggers the bug

e5a9550

Solve the same bug another way; by normalization

01d96bb

eoghanmurray requested a review from Juice10 January 29, 2025 16:06

eoghanmurray and others added 2 commits January 29, 2025 16:09

Apply formatting changes

86796ac

Don't go past 100 iterations trying to find a unique substring

3a452e3

pauldambra reviewed Jan 31, 2025

eoghanmurray mentioned this pull request Feb 4, 2025

[Bug]: splitCssText causes degraded performance when recording #1603

Closed

1 task

Juice10 approved these changes Feb 4, 2025

eoghanmurray added 2 commits February 4, 2025 16:52

Add a tscomment to explain the _testNoPxNorm test-only parameter

7a32e49

Merge remote-tracking branch 'upstream/master' into fix-splitCssText-again

5743c7e

eoghanmurray merged commit 3e9e42f into rrweb-io:master Feb 6, 2025
6 checks passed

This was referenced Feb 6, 2025

Version Packages (alpha) #1605

Open

Version Packages (alpha) kevinatown/rrweb#2

Open
