Paths containing data and sensitive data #99

Open
emmanuel-hestia opened this issue Aug 4, 2022 · 0 comments
The following may be related to issue #40.

In TikTok archives, I observe that some data is embedded in the path itself. Specifically, when two users, Alice and Bob, have a conversation, Alice's archive will contain a path named

$.Direct Messages.Chat History.ChatHistory.Chat History with Bob:

while Bob's archive will record the same conversation as

$.Direct Messages.Chat History.ChatHistory.Chat History with Alice:

In both cases, each message of the conversation has a From field naming the sender of that message: $.Direct Messages.Chat History.ChatHistory.Chat History with Alice:[*].From in Bob's archive, or $.Direct Messages.Chat History.ChatHistory.Chat History with Bob:[*].From in Alice's archive.

This causes several issues:

  1. Data is inconsistent between Alice's and Bob's archives, since the same conversation is recorded under two different paths.
  2. Since the path hard-codes a value, the model created from Alice's archive cannot parse the archive of a third party whose contacts have not yet been seen (and handling unseen archives is an important part of the whole point). E.g. if Alice has only talked with Bob, applying the model generated from Alice's archive to Charlie's data will simply fail to detect Charlie's conversation with Daniel.
  3. Since the inserted values are usernames, they constitute sensitive information that must never be published in the open.
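
To make Point 2 concrete, here is a minimal Python sketch. The archive layout is reduced to the paths quoted above, and `leaf_paths` and `make_archive` are hypothetical helpers written for this illustration, not functions of the current framework.

```python
def leaf_paths(node, prefix="$"):
    """Collect the path of every leaf value, collapsing list indices to [*]."""
    if isinstance(node, dict):
        paths = set()
        for key, value in node.items():
            paths |= leaf_paths(value, f"{prefix}.{key}")
        return paths
    if isinstance(node, list):
        paths = set()
        for item in node:
            paths |= leaf_paths(item, f"{prefix}[*]")
        return paths
    return {prefix}


def make_archive(partner, sender):
    """Build a toy archive (assumed layout) with one message in one conversation."""
    return {
        "Direct Messages": {
            "Chat History": {
                "ChatHistory": {
                    f"Chat History with {partner}:": [{"From": sender}],
                }
            }
        }
    }


# Paths learned from Alice's archive (she only talked with Bob).
alice_paths = leaf_paths(make_archive("Bob", "Alice"))

# Paths found in Charlie's archive (he only talked with Daniel).
charlie_paths = leaf_paths(make_archive("Daniel", "Charlie"))

# Every path in Charlie's archive is unknown to the model built from Alice's.
unmatched = charlie_paths - alice_paths
```

Here `unmatched` contains all of Charlie's paths, because the only difference (the username) sits inside the key rather than in a value.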

Point 1 should not be of immediate practical concern, and Point 3 can be solved by careful manual curation of the data.
Point 2, on the other hand, threatens our ability to process the affected sections of the data. Intuitively, this would call for

  • at least: a syntax telling the parser to ignore a part of the path, collapsing all conversations into a single tree. Messages from third parties to the user could still be identified by the From field, but it would become impossible to tell which third party the user's own messages were sent to
  • ideally: removing the username from the path (again collapsing all conversations into a single tree) but adding a new field such as ConversationWith, which retains the information carried by the original layout.
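
Both options could be sketched roughly as follows in Python. This is only an illustration under assumptions: the key format `Chat History with <username>:` is taken from the paths above, while the `Chat History with *:` placeholder, the `collapse_conversations` function and its `keep_partner` flag are hypothetical names, not existing syntax or API of the framework.

```python
import re

# Assumed key format, taken from the paths quoted above.
CHAT_KEY = re.compile(r"^Chat History with (?P<user>.+):$")

# Hypothetical placeholder key standing in for "any username".
PLACEHOLDER = "Chat History with *:"


def collapse_conversations(chat_history, keep_partner=True):
    """Merge all per-user conversation lists under a single placeholder key.

    keep_partner=False corresponds to the "at least" option: the username
    is simply dropped from the path.
    keep_partner=True corresponds to the "ideally" option: the username is
    moved into a new ConversationWith field on every message.
    """
    merged = []
    passthrough = {}
    for key, messages in chat_history.items():
        match = CHAT_KEY.match(key)
        if match is None:
            # Not a conversation key: leave it untouched.
            passthrough[key] = messages
            continue
        for message in messages:
            record = dict(message)  # copy, so the input is not modified
            if keep_partner:
                record["ConversationWith"] = match.group("user")
            merged.append(record)
    return {**passthrough, PLACEHOLDER: merged}


# Toy input as seen from Alice's archive; "Carol" is an invented second contact.
chats = {
    "Chat History with Bob:": [{"From": "Alice"}, {"From": "Bob"}],
    "Chat History with Carol:": [{"From": "Alice"}],
}

minimal = collapse_conversations(chats, keep_partner=False)
ideal = collapse_conversations(chats, keep_partner=True)
```

With `keep_partner=False`, the two messages sent by Alice become indistinguishable. With `keep_partner=True`, each message keeps its counterpart in `ConversationWith`, so the username moves from the path into a value, where it can be curated like any other sensitive field.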

I hope that the current framework allows for such features and that they are not excessively difficult to implement.

@Amustache Amustache added the enhancement New feature or request label Aug 4, 2022