Fix splitCssText again #1640
…n with `isAttachIframe`' test - it was working for me when the test was run in isolation (`-t` option), but when the entire cross-origin-iframes test was run, the change of iframe contents didn't seem to happen in time
… and we end up not finding a unique one - we should just go with the first one (note: this is still not binary search so could exhibit pathological behaviour)
… (Posthog) ... see comment from MartinWorkfully: PostHog/posthog-js#1668
🦋 Changeset detected
Latest commit: 5743c7e
The changes in this PR will be included in the next version bump. This PR includes changesets to release 19 packages.
const prevTextContent = childNodes[i - 1].textContent;
if (prevTextContent && typeof prevTextContent === 'string') {
  // pick the first matching point which respects the previous chunk's approx size
  const prevMinLength = normalizeCssString(prevTextContent).length;
i guess it's somewhere here that means if you run a test twice (or some multiple times) then you don't get the same output
(at least in my experience of testing whether this was deterministic when trying to figure out what was happening)
I don't quite understand this comment, but I would say the algorithm is indeed deterministic, but maybe you mean it will behave differently based on different sized inputs because of the `jLimit` bit?
oh, i wrote a test that ran the split multiple times and compared the output and it didn't match
was generally whitespace ending up on different sides of a split
Ah okay, yes this is possible ... basically it stops when the normalised versions match (although I still think it would be deterministic given the same input twice). I can't imagine there being a problem if one side has more whitespace than it should have.
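To make the "stops when the normalised versions match" behaviour concrete, here is a minimal sketch, not the PR's implementation. `normalizeForMatch` and `findSplitIndex` are hypothetical names; the regex is copied from the diff in this PR, and the scan is deliberately naive (no iteration limit, no best-guess fallback).

```typescript
// Hypothetical helper mirroring the regex-based normalization in this PR's diff.
function normalizeForMatch(cssText: string): string {
  return cssText.replace(/(\/\*[^*]*\*\/)|[\s;]/g, '').replace(/0px/g, '0');
}

// Naive scan (an assumption for illustration): return the first index j such
// that the normalized prefix cssText.slice(0, j) equals the normalized
// previous text node, or -1 if no prefix matches.
function findSplitIndex(cssText: string, prevTextContent: string): number {
  const target = normalizeForMatch(prevTextContent);
  // Normalization only removes characters, so the raw prefix can never be
  // shorter than its normalized form; start scanning from that lower bound.
  for (let j = target.length; j <= cssText.length; j++) {
    if (normalizeForMatch(cssText.slice(0, j)) === target) {
      return j;
    }
  }
  return -1;
}

const serializedCss = '.a { color: red; } .b { margin-top: 0px; }';
const firstNodeText = '.a{color:red}';
const splitAt = findSplitIndex(serializedCss, firstNodeText);
const chunks = [serializedCss.slice(0, splitAt), serializedCss.slice(splitAt)];
```

Because the scan stops at the first matching index, the whitespace after the closing brace ends up at the start of the second chunk. A scan that stopped at a later matching index would put it on the other side, which is the kind of whitespace difference discussed above.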
if (_testNoPxNorm) {
  return cssText.replace(/(\/\*[^*]*\*\/)|[\s;]/g, '');
} else {
  return cssText.replace(/(\/\*[^*]*\*\/)|[\s;]/g, '').replace(/0px/g, '0');
i briefly tested a very naive loop parser and it was slower than regex replace - i guess because browsers/v8 are doing some magic to optimise this already
but I didn't test it over a range of inputs
this is (from my testing) at best O(n) for whitespace. for clarity, since i'm not much of a comp sci person: if you insert whitespace into the input, this gets slower the more whitespace is present
The performance of the normalization function could be improved. I've moreso tried to ensure it's not called repeatedly on the same piece of css (with the 'binary search' style changes in #1615 ).
i am really not sure how i could roll this out in prod
the last version basically doubled our support load and i'm ending this week exhausted as a result
i have no idea of how to test if this fixes for all cases or just for a specific case
we seem to be writing a css parser, and i'm wondering if adopting a css parser would be safer
We're not attempting to write a CSS parser at record time, we are using […]
In most cases, the […]
The mutation issues that this splitting solves are also mostly theoretical, so it's possible to patch/short-circuit the […]
I appreciate that the algorithm implemented here is not simple; I've thought about abstracting it out to a third party library "split a string according to an array of related substrings which can be matched via a normalization function" ... I haven't looked into whether such a thing already exists. This PR is definitely an improvement and I believe catches the last pathological case, particularly as there is now an additional […]
There is another large PR to move CSS parsing off the main thread at record time, however that would likely have hidden this problem rather than bringing it to the fore so painfully. I also have another plan to ditch the whole […]
@@ -463,19 +470,24 @@ export function normalizeCssString(cssText: string): string {
export function splitCssText(
  cssText: string,
  style: HTMLStyleElement,
  _testNoPxNorm = false,
Would be great to get some `tsdoc` documentation as to what this does, especially since `_variableName` normally means an unused variable in JS/TS land
* Fix up the 'should replace the existing DOM nodes on iframe navigation with `isAttachIframe`' test (rrweb-io#1636) - it was working for me when the test was run in isolation (`-t` option), but when the entire cross-origin-iframes test was run, the change of iframe contents didn't seem to happen in time
* [chore]: Update actions/upload-artifact to v4 (rrweb-io#1643)
* update actions/upload-artifact to v4
---------
Co-authored-by: Eoghan Murray <eoghan@getthere.ie>
* Fix a code path where masking could be skipped on textareas (rrweb-io#1599)
* Fixes rrweb-io#1596
* [chore] Cache yarn packages for CI (rrweb-io#1646)
* [chore] Cache yarn packages for CI
* Cache yarn in release.yml
* [chore] Update deprecated download artifact on CI (rrweb-io#1647)
* I'm merging even though ESLint is still failing in Github Actions as I believe it's running actions _without_ this PR applied yet
* Fix env puppeteer error in cross-origin-iframes.test.ts (rrweb-io#1629)
* chore(ci): track bundle size (rrweb-io#1630)
* chore(ci): track bundle size
---------
Co-authored-by: pauldambra <pauldambra@users.noreply.github.com>
* Fix adapt css with split (rrweb-io#1600)
Fix for rrweb-io#1575 where postcss was raising an exception
* adapt the entire CSS as a whole in one pass with postcss, rather than adapting each split part separately
* break up the postcss output again and assign to individual text nodes (kind of inverse of splitCssText at record side)
* impose an upper bound of 30 iterations on the substring searches to preempt possible pathological behavior
* add tests to demonstrate the scenario and prevent regression
More technical details:
* Fix algorithm; checks against `ix_end` within loop were incorrect when `ix_start` was bigger than zero.
* Fix that length check against wrong array was causing 'should record style mutations with multiple child nodes and replay them correctly' test to fail.
Note on last point: I haven't looked into things more deeply than that the test was complaining about missing .length after `replayer.pause(1000);`
* Warn instead of fail on exceptions thrown from postcss (rrweb-io#1580)
* postcss was introduced in rrweb-io#1458 for use within adaptCssForReplay
* rrweb-io#1600 fixes the main case where invalid css could be introduced when valid css from the output of `sheet.cssRules` was split according to how it was split across text nodes of the <style>
* the guard introduced here is still useful as we likely in future will switch to capturing the raw stylesheet contents (both <style> and <link>), at which point we will be much less confident of getting valid css
* Fix splitCssText again (rrweb-io#1640)
Fixes a browser 'lock up' at record time due to the presence of large amounts of css in <style> elements, which are split over multiple text nodes, which triggers the new code added in rrweb-io#1437 (see that PR for full explanation of why this all exists). rrweb-io#1437 was not written with performance in mind as it was believed to be an edge case, but things like Grammarly browser extension (rrweb-io#1603) among other scenarios were triggering pathological behavior, some of which was solved in rrweb-io#1615. See also rrweb-io#1640 (comment) for further discussion.
* Fix the case when there are multiple matches and we end up not finding a unique one - just go with the best guess when there are many splits by looking at the previous chunk's size
* Also add '0px' -> '0' stylesheet normalization, which also fixes the sample problem in a different way
* Add new test and modify it so that it can trigger a failure in the absence of the '0px' normalization; there may be other unknown ways of triggering a similar bug, so ensure that the primary 'best guess' method doesn't suffer a regression
* Leverage the 'best guess' method so that we can quit after 100 iterations trying to find a unique substring; hopefully this bit along with the `iterLimit` already added will prevent any future pathological cases.
Failing example extracted from large files identified by Paul D'Ambra (Posthog) ... see comment from MartinWorkfully: PostHog/posthog-js#1668
* fix: move patch function into utils to improve bundling (rrweb-io#1631)
* fix: move patch function into utils to improve bundling
---------
Co-authored-by: pauldambra <pauldambra@users.noreply.github.com>
Co-authored-by: Justin Halsall <Juice10@users.noreply.github.com>
---------
Co-authored-by: Eoghan Murray <eoghan@getthere.ie>
Co-authored-by: Kevin Townsend <11738094+kevinatown@users.noreply.github.com>
Co-authored-by: Justin Halsall <Juice10@users.noreply.github.com>
Co-authored-by: Paul D'Ambra <paul@posthog.com>
Co-authored-by: pauldambra <pauldambra@users.noreply.github.com>
Co-authored-by: John Henry Gunther <jguntherenator@gmail.com>
See also PostHog/posthog-js#1668 for the downstream bug report, and further discussion in Slack (Thanks Paul D'Ambra for report and pointers)
This is a further improvement after performance fixes in #1615
This covers new scenarios as outlined in the tests. Test cases were recreated from a very large inline style node in https://hiring.workfully.com/signin which looks like it was in a shadow root `:host(.productfruits--container)`, although I can't quite find it there now. I pulled the example files from a breakpoint and have them locally, but the test cases here incorporate the important bit, including the split in the middle of a statement.
Ultimately the problem with the content which triggered this case was that `margin-top: 0;` as authored gets serialized to `margin-top: 0px;`, which was preventing us finding the right point to split between normalized/unnormalized.
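The effect of the `0px` normalization can be illustrated with a small sketch. `normalizeSketch` is a hypothetical stand-in that reuses the two regex replacements shown in this PR's diff; its `noPxNorm` flag is assumed to play the same role as the `_testNoPxNorm` parameter.

```typescript
// Hypothetical stand-in for the normalization in this PR's diff.
function normalizeSketch(cssText: string, noPxNorm = false): string {
  // Strip comments, whitespace and semicolons.
  const stripped = cssText.replace(/(\/\*[^*]*\*\/)|[\s;]/g, '');
  // Optionally collapse '0px' to '0' so authored and serialized forms agree.
  return noPxNorm ? stripped : stripped.replace(/0px/g, '0');
}

const authoredRule = '.a { margin-top: 0; }';
const serializedRule = '.a { margin-top: 0px; }';

// Without the '0px' rule the two forms differ after normalization.
const matchesWithoutPxNorm =
  normalizeSketch(authoredRule, true) === normalizeSketch(serializedRule, true);

// With the '0px' rule both collapse to the same string.
const matchesWithPxNorm =
  normalizeSketch(authoredRule) === normalizeSketch(serializedRule);
```

Here `matchesWithoutPxNorm` is `false` and `matchesWithPxNorm` is `true`. Note that the replacement also rewrites values such as `10px` to `10`; that is harmless for matching as long as both sides of the comparison go through the same normalization.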