Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUMM-2139: New event storage format #908

Merged
merged 2 commits into from
Apr 25, 2022

Conversation

0xnm
Copy link
Member

@0xnm 0xnm commented Apr 21, 2022

What does this PR do?

This PR brings a new event storage format. Now every event written to the disk will also have a header which contains metadata. Currently this metadata only includes event size in bytes, which makes possible to treat a storage as a binary file and read individual events back for the possible pre-processing before doing the upload, also removing the need to store the separator in the file.

This new format also dramatically improves the logic needed to support encryption, because knowing event binary size we don't need to rely on the separator anymore when reading it back.

Another benefit is that is file is partially corrupted we will be able to read back at least some events, instead of loosing all the data in the file.

While this PR adds metadata to each event, it is not yet propagated back to the higher levels and is using currently only for reading events - this is done to minimize the change scope for this PR.

Reader is now responsible for joining events into a single binary batch.

New event storage structure: 1 byte of header version + 1 byte of meta size + meta bytes + event bytes

Meta structure: { "ev_size": number }, where ev_size is event size in bytes.

This change is not backward compatible with the previous storage format, but it is fine to have it like that - we introduce new folders, so data doesn't overlap between the formats.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Make sure you discussed the feature or bugfix with the maintaining team in an Issue
  • Make sure each commit and the PR mention the Issue number (cf the CONTRIBUTING doc)

@codecov-commenter
Copy link

codecov-commenter commented Apr 21, 2022

Codecov Report

Merging #908 (6b9d129) into develop (be33441) will decrease coverage by 0.02%.
The diff coverage is 90.24%.

❗ Current head 6b9d129 differs from pull request most recent head 9ac0d9c. Consider uploading reports for the commit 9ac0d9c to get more accurate results

@@             Coverage Diff             @@
##           develop     #908      +/-   ##
===========================================
- Coverage    83.06%   83.04%   -0.02%     
===========================================
  Files          266      267       +1     
  Lines         9068     9047      -21     
  Branches      1473     1453      -20     
===========================================
- Hits          7532     7513      -19     
+ Misses        1138     1137       -1     
+ Partials       398      397       -1     
Impacted Files Coverage Δ
...n/com/datadog/android/ndk/NdkCrashReportsPlugin.kt 62.86% <ø> (ø)
...rsistence/file/advanced/FeatureFileOrchestrator.kt 100.00% <ø> (ø)
...android/rum/internal/ndk/DatadogNdkCrashHandler.kt 86.78% <75.00%> (+0.04%) ⬆️
...nternal/persistence/file/batch/BatchFileHandler.kt 78.30% <86.21%> (+5.44%) ⬆️
...ndroid/core/internal/persistence/file/EventMeta.kt 90.00% <90.00%> (ø)
...n/com/datadog/android/core/internal/CoreFeature.kt 93.36% <100.00%> (ø)
...g/android/core/internal/data/upload/DataFlusher.kt 100.00% <100.00%> (ø)
.../internal/persistence/file/EncryptedFileHandler.kt 93.33% <100.00%> (+0.20%) ⬆️
...internal/persistence/file/FilePersistenceConfig.kt 87.50% <100.00%> (ø)
...rnal/persistence/file/batch/BatchFileDataReader.kt 100.00% <100.00%> (ø)
... and 10 more

@xgouchet xgouchet added the size-large This PR is large sized label Apr 21, 2022
@0xnm 0xnm marked this pull request as ready for review April 21, 2022 11:30
@0xnm 0xnm requested a review from a team as a code owner April 21, 2022 11:30
Copy link
Member

@ncreated ncreated left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good from my POV 👌. Writing interfaces do not depend on data format (prefix/suffix/separator) ☑️ and encryption is vastly simplified ☑️.

Copy link
Member

@xgouchet xgouchet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


// region FileHandler
internal val delegate: FileHandler
) : FileHandler by delegate {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice job here!

val meta = metaGenerator(data)

// in Kotlin and Java byte type is signed, meaning it goes from -127 to 128.
// It is completely fine to have size more than 128, it will be just stored
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure about this ? I am thinking that for example : 129 will become -127 and when you will read it back will not have the required value. Am I missing something ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it will be fine. It is just One's complement system https://en.wikipedia.org/wiki/Ones%27_complement. When we read byte back as InputStream.read() it returns actually an int (it applies 0xff mask, check here), so values greater than 127 which go to the negative zone in the byte representation will be positive if represented.

Here is an example https://ideone.com/5GJksf.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know, was not aware of that, thanks for the explanation.

} else {
result.copyOf(offset)
if (remaining != 0) {
val message = WARNING_NOT_ALL_DATA_READ.format(Locale.US, file.path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we still send this data in this case ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I was thinking about it and decided to keep it to see if this will happen in our nightly tests, just in case.

Not sure if it may happen at all with flash memory (maybe only when the system decides to clean up cache folder while we read the file?), but since we are using the binary format now, it is better to keep an eye on the file integrity to be able to catch possible issues.


// region private

private fun List<ByteArray>.join(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen that you already created Collection<ByteArray>.join extension for this, this looks much more simpler though, wondering why not having the same implementation also there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the one used in test is not very efficient - it will create new array for every reduce operation, but it is much more readable indeed (this is an advantage for the test).

The one used in the production code creates array only once and is using copyTo to fill it.

@0xnm 0xnm force-pushed the nogorodnikov/rumm-2139/new-event-storage-format branch from 6b9d129 to 9ac0d9c Compare April 22, 2022 14:12
@0xnm 0xnm merged commit 17addad into develop Apr 25, 2022
@0xnm 0xnm deleted the nogorodnikov/rumm-2139/new-event-storage-format branch April 25, 2022 07:47
@xgouchet xgouchet added this to the 1.13.0 milestone Dec 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size-large This PR is large sized
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants