Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨[REPLAY-336] Privacy by Default #951

Merged
merged 43 commits into from
Aug 17, 2021
Merged

Conversation

jagracey
Copy link
Contributor

@jagracey jagracey commented Jul 20, 2021

Motivation

Implements "Privacy by Default" (REPLAY-336).

Note to public. We are undergoing internal "dogfooding" of this Privacy API because we anticipate product related changes as we utilize this API internally at Datadog- without yet committing to a stable Privacy API. Internal docs are currently being reviewed and validated.

PR Navigation Notes:

  • (internal to Datadog) Technical Specification
  • (internal to Datadog) This PR follows the following implementation spreadsheet
  • BREAKING CHANGES: changes input-masked + input-ignored to now both be mask-forms-only. This changes behaviour a bit, generally improving privacy. mask-forms-only is basically mask but only for forms like elements (see internal spec)
  • We've got some serialization architecture friction with attributes handling both HTML attributes and JS element state, with (theoretical though unlikely) lossy serialization. In the future we should split to handle a new state property, and perhaps a meta property for internal props like rrweb.

Brief Spec Change Overview:

  • implements general HTML text scrambling algorithm (shuffling non-whitespace chars + adds entropy + reduce charset)
  • 4 tagging modes, with inheritance rules (allow/mask/mask-forms-only/hidden)
  • internal tagging state tracks(not_set/ignore/unknown/*-sealed)
  • 3 asterisks used as a replacement marker to prevent leaking length for when hidden
  • select/option/datalist/optsgroup will randomize child nodes + has a consistent text content of ***
  • mask mode now also censors <a>.href, <img>.src, data-* attributes etc.
  • information captured by hidden elements is now the bare minimum (no id, class attributes etc anymore)

Future Considerations:

  • background CSS images not handled by privacy API
  • Privacy Level Cache invalidation doesn't support attribute changes
  • Changing a node's privacy level (including its descendants) is not currently supported.
  • further to prior note, Nodes don't "resync" when privacy level changes
  • JS state vs html attributes is currently merged together (perhaps we should implement new serialization shape of {attributes, state, meta})
  • perf improvements (eg. expose a config flag to not parse class names for privacy level lookup)
  • Serialisation edge case reminders:
    • support for <select> multiselect not reviewed
    • <video> + elements don't have currentTime serialised or observed.
    • tagName support for unicode, namespacing has been expanded, but not exhaustively reviewed (<foo:bar.ßaK>hello world</foo:bar.ßaK>)
  • clearing out a input[type=text].value doesn't work since no attribute is created (and we treat state as an attribute)
  • We should ignore certain tags (<script>,<meta>) but still record them so they don't introduce CSS styling side effects (script#widget + div or :nth-child(x))
  • Maybe the 3 files can be grouped differently serialize.ts serializationUtils, privacy.ts
  • During tests, we pass in different ancestor privacy levels so we can't cache the privacy level if parentNodePrivacyLevel is passed in. To fix, we should uncache these values after using them.

Changes

Largely isolated to 3 files:

  1. packages/rum-recorder/src/domain/record/serialize.ts
  2. packages/rum-recorder/src/domain/record/privacy.ts
  3. packages/rum-recorder/src/constants.ts

Testing

  1. Modified previously existing tests to conform.
  2. Added new test file packages/rum-recorder/src/domain/record/privacyByDefault.spec.ts
  3. Extensive manual testing via (internal) live session replay playground
  4. Manual checks against the datasheet spec

I have gone over the contributing documentation.

@codecov-commenter
Copy link

codecov-commenter commented Jul 22, 2021

Codecov Report

Merging #951 (23fce4c) into main (4cb9c9c) will increase coverage by 0.00%.
The diff coverage is 89.79%.

Impacted file tree graph

@@           Coverage Diff            @@
##             main     #951    +/-   ##
========================================
  Coverage   89.08%   89.09%            
========================================
  Files          83       84     +1     
  Lines        3868     4026   +158     
  Branches      857      925    +68     
========================================
+ Hits         3446     3587   +141     
- Misses        422      439    +17     
Impacted Files Coverage Δ
packages/rum/src/domain/record/types.ts 100.00% <ø> (ø)
packages/rum/src/domain/record/observer.ts 75.36% <66.66%> (-0.22%) ⬇️
packages/rum/src/domain/record/privacy.ts 87.80% <87.65%> (-12.20%) ⬇️
...ckages/rum/src/domain/record/serializationUtils.ts 97.91% <92.30%> (+2.67%) ⬆️
packages/rum/src/domain/record/serialize.ts 94.59% <93.58%> (+3.07%) ⬆️
packages/rum/src/constants.ts 100.00% <100.00%> (ø)
packages/rum/src/domain/record/mutationObserver.ts 96.96% <100.00%> (-0.71%) ⬇️
packages/rum/test/htmlAst.ts 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4cb9c9c...23fce4c. Read the comment docs.

@jagracey jagracey marked this pull request as ready for review July 22, 2021 12:29
@jagracey jagracey requested a review from a team as a code owner July 22, 2021 12:29
Copy link
Member

@BenoitZugmeyer BenoitZugmeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests shouldn't be in a separate file, but should be integrated in existing test files instead.

@jagracey jagracey force-pushed the john.gracey/privacy-by-default-mvp branch 2 times, most recently from 233a0e4 to e28af83 Compare August 1, 2021 16:33
@jagracey jagracey force-pushed the john.gracey/privacy-by-default-mvp branch from f83811b to 7aa72ed Compare August 9, 2021 18:37
@jagracey jagracey force-pushed the john.gracey/privacy-by-default-mvp branch from 8c39340 to a35f742 Compare August 9, 2021 22:41
@jagracey jagracey force-pushed the john.gracey/privacy-by-default-mvp branch 2 times, most recently from 73efb85 to 24bdd7a Compare August 16, 2021 21:27
@jagracey jagracey force-pushed the john.gracey/privacy-by-default-mvp branch from 24bdd7a to 65329d9 Compare August 16, 2021 21:32
@jagracey jagracey force-pushed the john.gracey/privacy-by-default-mvp branch from 76ee67e to 23fce4c Compare August 16, 2021 23:42

const makeHtmlDoc = (htmlContent: string, privacyTag: string) => {
try {
// Karma doesn't seem to support `document.documentElement.outerHTML`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Karma has nothing to do with outerHTML... probably something else

Maybe use

const htmlString = "<strong>Beware of the leopard</strong>";
const doc3 = new DOMParser().parseFromString(htmlString, "text/html");

Copy link
Member

@BenoitZugmeyer BenoitZugmeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@jagracey jagracey merged commit 2616cfd into main Aug 17, 2021
@jagracey jagracey deleted the john.gracey/privacy-by-default-mvp branch August 17, 2021 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants