In TikTok archives, I observe that some data is used in the path itself. Specifically, when two users, Alice and Bob, start a conversation, Alice's archive will contain a path named
$.Direct Messages.Chat History.ChatHistory.Chat History with Bob:
while Bob's archive will record the same conversation as
$.Direct Messages.Chat History.ChatHistory.Chat History with Alice:
In both cases, each message in the conversation has a From field, $.Direct Messages.Chat History.ChatHistory.Chat History with Alice:[*].From (in Bob's archive) or $.Direct Messages.Chat History.ChatHistory.Chat History with Bob:[*].From (in Alice's archive), which names the person who sent that message.
This causes several issues:
1. Data is inconsistent between Alice's and Bob's archives, since the same information is recorded in two different formats.
2. Since the path hard-codes a value, the model created from Alice's archive cannot parse the archive of a third party whose information has not yet been seen (which is an important part of the whole point). For example, if Alice has only talked with Bob, applying the model generated from Alice's archive to Charlie's data will simply fail to detect a conversation with Daniel.
3. Since the inserted values are usernames, they constitute sensitive information that must never be published in the open.
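To make point 2 concrete, here is a minimal sketch. It assumes, hypothetically, that an archive is parsed from JSON into nested dicts and that a "model" is nothing more than the set of key paths seen in one archive; the framework's real model format may differ. The helpers key_paths and chat_archive are illustrative, and the "Content" field is invented (only "From" comes from the archives described above).

```python
def key_paths(node, prefix="$"):
    """Yield every key path in a parsed JSON document, using [*] for list items.

    Paths follow the dotted notation used in this issue, e.g.
    "$.Direct Messages.Chat History.ChatHistory.Chat History with Bob:[*].From".
    """
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}"
            yield path
            yield from key_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            yield from key_paths(item, f"{prefix}[*]")


def chat_archive(partner):
    # Hypothetical minimal archive holding one conversation with `partner`.
    # Note that the partner's username ends up inside a key, i.e. in the path.
    return {
        "Direct Messages": {
            "Chat History": {
                "ChatHistory": {
                    f"Chat History with {partner}:": [
                        {"From": partner, "Content": "hi"},
                    ]
                }
            }
        }
    }


# "Model" learned from Alice's archive: she has only talked with Bob.
alice_model = set(key_paths(chat_archive("Bob")))

# Charlie's archive contains a conversation with Daniel.
charlie_paths = set(key_paths(chat_archive("Daniel")))

# Every path below the username-bearing key is unknown to Alice's model.
unknown = sorted(charlie_paths - alice_model)
for path in unknown:
    print("not in model:", path)
```

Only the paths above the username-bearing key are shared; everything describing the conversation with Daniel is missed.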
Point 1 should not be of immediate practical concern, and point 3 can be solved by careful manual curation of the data.
Point 2, on the other hand, threatens our ability to process the affected sections of the data. Intuitively, this would call for:
- At least: a syntax telling the parser to ignore part of the path, collapsing all conversations into a single arborescence. Messages from third parties to the user would still be identified by the From field, but it would become impossible to tell to which third party the user sends their messages.
- Ideally: remove the username from the path (again collapsing all conversations into a single arborescence), but add a new field such as ConversationWith, retaining the information carried by the original structure.
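The "ideally" option could look roughly like the following pre-processing step. This is a sketch under assumptions, not the framework's actual mechanism: it assumes every key under ChatHistory has the form "Chat History with <name>:" and that the archive is already loaded as nested dicts. The output shape, including the "Messages" key, is hypothetical; only ConversationWith is the name proposed above.

```python
PREFIX = "Chat History with "


def normalize_chat_history(chat_history):
    """Turn {"Chat History with <name>:": [messages]} into a list of
    conversations, moving <name> out of the key and into a ConversationWith field.

    The returned shape is illustrative, not the framework's format.
    """
    conversations = []
    for key, messages in chat_history.items():
        # Assumed key layout: fixed prefix, username, trailing colon.
        if not (key.startswith(PREFIX) and key.endswith(":")):
            raise ValueError(f"unexpected chat key: {key!r}")
        partner = key[len(PREFIX):-1]
        conversations.append({"ConversationWith": partner, "Messages": messages})
    return conversations


# Hypothetical excerpt of Alice's archive (only the "From" field is real).
alice_archive = {
    "Direct Messages": {
        "Chat History": {
            "ChatHistory": {
                "Chat History with Bob:": [{"From": "Bob"}, {"From": "Alice"}],
            }
        }
    }
}

section = alice_archive["Direct Messages"]["Chat History"]
section["ChatHistory"] = normalize_chat_history(section["ChatHistory"])
print(section["ChatHistory"])
```

After this step the paths become $.Direct Messages.Chat History.ChatHistory[*].ConversationWith and $.Direct Messages.Chat History.ChatHistory[*].Messages[*].From for every user, so a model learned from one archive no longer depends on who that user talked to, and the username only appears as a value, where it is easier to redact.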
I hope that the current framework allows for such features and that they are not excessively difficult to implement.
This may be related to issue #40.