Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUMM-2651 SR - skip new lines and spaces when obfuscating texts #1078

Conversation

mariusc83
Copy link
Member

What does this PR do?

We need to skip the new lines and spaces characters when applying the obfuscation on the text values to keep the aspect ratio of the replicated text view element in the Web Player.

Motivation

What inspired you to submit this pull request?

Additional Notes

Anything else we should know when reviewing?

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Make sure you discussed the feature or bugfix with the maintaining team in an Issue
  • Make sure each commit and the PR mention the Issue number (cf the CONTRIBUTING doc)

@mariusc83 mariusc83 self-assigned this Oct 11, 2022
@mariusc83 mariusc83 requested a review from a team as a code owner October 11, 2022 12:34
return String(
CharArray(text.length) {
val character = text[it]
if (character == CHARACTER_SPACE || character == CHARACTER_NEW_LINE) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use character.isWhitespace() to check this rather than comparing to our own hardcoded values

Comment on lines 44 to 51
CharArray(text.length) {
val character = text[it]
if (character == ' ' || character == '\n') {
character
} else {
'x'
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you're duplicating the production algorithm, meaning there's no way your test can fail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I know, this is for all the other tests that are not exactly testing the obfuscation, is just that we are asserting everything in those. For all the obfuscation cases I added the 3 tests below.


// Then
val expectedWireframe = mockButton.toTextWireframe().copy(text = fakeText)
val expectedWireframe = mockTextView.toTextWireframe().copy(text = fakeExpectedText)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should also have a test in the form

val chunk1 = randomText
val chunk2 = randomText
val spacer = randomWhitespace // forge.forge.aWhitespaceString()
val input = chunk1 + spacer + chunk2
val expectedOutput = ("x" * chunk1.length) + spacer + ("x" * chunk2.length)

with this (I don't know if it exist in our repo

operator fun String.times(i: Int) {
    val builder = StringBuilder(i * length)
    repeat(i) {
        builder.append(this)
    }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point

@codecov-commenter
Copy link

codecov-commenter commented Oct 11, 2022

Codecov Report

Merging #1078 (94f9404) into feature/sdkv2 (946cc7e) will increase coverage by 0.07%.
The diff coverage is 100.00%.

@@                Coverage Diff                @@
##           feature/sdkv2    #1078      +/-   ##
=================================================
+ Coverage          82.58%   82.65%   +0.07%     
=================================================
  Files                346      347       +1     
  Lines              11157    11165       +8     
  Branches            1831     1832       +1     
=================================================
+ Hits                9214     9228      +14     
+ Misses              1381     1378       -3     
+ Partials             562      559       -3     
Impacted Files Coverage Δ
...play/recorder/mapper/MaskAllTextWireframeMapper.kt 100.00% <100.00%> (+20.00%) ⬆️
.../sessionreplay/recorder/mapper/StringObfuscator.kt 100.00% <100.00%> (ø)
...android/v2/core/internal/DatadogContextProvider.kt 81.25% <0.00%> (-1.56%) ⬇️
...src/main/kotlin/com/datadog/android/DatadogSite.kt 86.67% <0.00%> (ø)
...android/rum/internal/ndk/DatadogNdkCrashHandler.kt 87.30% <0.00%> (ø)
...rc/main/java/com/datadog/opentracing/DDTracer.java 56.49% <0.00%> (+0.42%) ⬆️
...ain/java/com/datadog/opentracing/PendingTrace.java 60.34% <0.00%> (+1.72%) ⬆️
...rsistence/file/batch/PlainBatchFileReaderWriter.kt 91.18% <0.00%> (+2.94%) ⬆️

@mariusc83 mariusc83 force-pushed the mconstantin/rumm-2651/do-not-replace-spaces-when-applying-obfuscation branch 3 times, most recently from 7bb701a to 52fb574 Compare October 12, 2022 08:15
@mariusc83 mariusc83 requested a review from xgouchet October 12, 2022 08:16
@mariusc83 mariusc83 force-pushed the mconstantin/rumm-2651/do-not-replace-spaces-when-applying-obfuscation branch from 52fb574 to a19d58e Compare October 12, 2022 08:25
@xgouchet xgouchet added the size-small This PR is small sized label Oct 12, 2022
@mariusc83 mariusc83 force-pushed the mconstantin/rumm-2651/do-not-replace-spaces-when-applying-obfuscation branch from a19d58e to 99bb53a Compare October 12, 2022 09:26

internal class StringObfuscator {
fun obfuscate(stringValue: String): String {
return String(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add TODO maybe here, saying it works only for ASCII. Also character can be non-printable Control character, but I guess we can ignore it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that I need to use another method to support all unicode characters the way they are specifying in their comments.

Copy link
Member

@0xnm 0xnm Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this won't work properly for Unicode still. For example, let's take the following emoji. It has a code \u1f600, so it already exceeds char size (which values go only up to \uffff, because it can fit only 16 bits), so this single symbol consists of the 2 char -> "😀".length will give 2.

But in reality we cannot threat these chars independently, because first is just a high surrogate and second is the low surrogate. This is how UTF-16 encoding is actually working - it will read the stream of char of length 16 bits and decide if next char is a lower surrogate or independent symbol.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I imagine this is the way it should work but I think this will not be a problem. The only issue would be if one of this emojis will contain one of the non printable characters in their composition. In that case we will brake the text but I am pretty sure Character.isWhitespace() handles that somehow or maybe these whitespace characters are not used in emojis.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that in case of emoji we will have xx instead of single x, not sure if this is what we want to achieve (I guess Character.isWhitespace() will return false for any surrogate).

Copy link
Member Author

@mariusc83 mariusc83 Oct 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah...I don't think this should be an issue as long as we obfuscate the texts. It will be a corner case anyway. I am going to add this explanation there though to have this in mind.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't say it is a corner case, because there are plenty of symbols with above \uffff, and if we think of emojis we may find a lot of them in messenger apps, for example, and obfuscation may break formatting. Not a blocker for me, but let's acknowledge this.

assertThat(obfuscatedText).isEqualTo(fakeExpectedText)
}

private fun String.times(n: Int): String {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems the default repeat should do the job?

fakeExpectedChunk3 +
fakeExpectedChunk4 +
fakeExpectedChunk5
).times(n)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need times call if it just copies the same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to have a bigger text. I was following @xgouchet suggestion, not sure that this is what he wanted but I think is a bit overkill also. @xgouchet do you think this is still needed ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the times was more to generate a string with just 'x' of a given size, here it doesn't make sense

@mariusc83 mariusc83 force-pushed the mconstantin/rumm-2651/do-not-replace-spaces-when-applying-obfuscation branch 2 times, most recently from 754eefd to 94f9404 Compare October 12, 2022 11:59
@mariusc83 mariusc83 requested a review from 0xnm October 12, 2022 12:01
@mariusc83 mariusc83 force-pushed the mconstantin/rumm-2651/do-not-replace-spaces-when-applying-obfuscation branch from 94f9404 to c82b100 Compare October 13, 2022 08:11
val character = stringValue[it]
// Using the Character.isWhitespace() function we will support also the UNICODE
// characters according with the docs. Also have in mind that in JAVA the
// `char` primitive is a 16bits long number which covers all the UNICODE characters:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

char doesn't cover all the Unicode characters. It covers BMP plain Unicode characters, but it doesn't cover Supplementary plain characters (above 0xFFFF). When we call character.code we just cast from char to int, but to get code of the symbol from supplementary plain we need to merge 2 chars into single int.

@mariusc83 mariusc83 force-pushed the mconstantin/rumm-2651/do-not-replace-spaces-when-applying-obfuscation branch from c82b100 to dfee181 Compare October 13, 2022 10:39
@xgouchet xgouchet added the size-medium This PR is medium sized label Oct 13, 2022
Copy link
Member

@0xnm 0xnm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!


@Test
@TestTargetApi(Build.VERSION_CODES.N)
fun `M mask String W maskString(){string with newline, Android N}`(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not maskString, but obfuscate. Same for other tests.

Suggested change
fun `M mask String W maskString(){string with newline, Android N}`(
fun `M mask String W obfuscate(){string with newline, Android N}`(

@mariusc83 mariusc83 force-pushed the mconstantin/rumm-2651/do-not-replace-spaces-when-applying-obfuscation branch 2 times, most recently from 09bc0c4 to 483486f Compare October 13, 2022 14:57
@mariusc83 mariusc83 force-pushed the mconstantin/rumm-2651/do-not-replace-spaces-when-applying-obfuscation branch from 483486f to 0d1e683 Compare October 13, 2022 15:10
@mariusc83 mariusc83 merged commit ef329c2 into feature/sdkv2 Oct 14, 2022
@mariusc83 mariusc83 deleted the mconstantin/rumm-2651/do-not-replace-spaces-when-applying-obfuscation branch October 14, 2022 06:56
@xgouchet xgouchet added this to the 1.16.0 milestone Dec 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size-medium This PR is medium sized size-small This PR is small sized
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants