-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add posthog tracking for UTDs in Element X #2300
Comments
Every unable to decrypt should be reported. The error are mapped to poshtog like that:
The error name should be on of those, but as for now the SDK is not reporting the error type, we hardcode to
Decryption error are reported per event. So if 2 messages using the same sessionId are in UTD, it should report twice. Clients should at least have some mecanism to not report several time the same event. Though non of the existing clients are persiting it, i.e after a restart the same event would be reported. The To filter out EIX ans EAX we use a |
@BillCarsonFr how useful and feasible would be to have the UTD reports logic in the crypto crate? I am asking because we have nothing on EX. The timeline is managed by the SDK. It would be great to have a single logic component at least in the SDK but maybe in the crypto crate as it could serve EWR too. |
The logic will be in the rust-sdk I believe. |
Visibility on screen isn't mentioned in the earlier requirements. Does a message have to be on screen (and still on screen after the grace period?) for it to be counted as a UTD for metrics? I think so, but that means a user has to be staring at a UTD for 30 seconds for it to count for the metrics? |
From my perspective the users experience is what counts. So if a user experiences an error because of a UTD this is what we should count. |
Unfortunalty we can't have exactly what we want. The current grace period is 4s and we are increasing it to stabilize the metrics, as it appears that there is a good amount of late keys on other clients (maybe less common with sliding sync). When we will have ressources we will be able to create a better system, something more like |
No it's just that it was displayed on screen once. Why would we do a stare at it for 30s thing?? |
One of the solution for implementation could be to use the existing rust-tracer and create a subscriber for analytics. |
Am I getting it right: You increase the grace period to get lower numbers? That's cheating if you ask me. If we have the ambition to create a great user-experience and believe this comes with no user having to wait for more than 4 sec that their messages decrypt we should stay with the metric as they are. |
My understanding for the rationale in increasing the grace period is not to juice the numbers but to differentiate permanently failing messages from temporarily failing messages. Both are bad, but permanently failing messages are much worse. So by increasing the period we can filter out the temporary failures and can focus on the permanent ones, accepting that the temporary failures also need work further down the line. The other option would be to report two windows, so a 4s (say) and 30 sec, but I think the team are trying to get the simplest thing possible to unblock them. @BillCarsonFr can you just confirm my understanding is correct? |
tbh. this is just a pragmatic short cut. The proper "solution" would be to record the time needed (delay between message arrived and key arrived) and present it in a histogram. This way we could learn a lot about the process... However, adding the missing bits to gather this information is just to expensive. |
It's fine, but if we go down that path, we need to make sure that if the key arrives after 20s, we attempt to decrypt the message again even if it is no longer on screen, so that we can exclude that message from the stats. Maybe that already happens, but the logs in https://github.com/element-hq/element-x-ios-rageshakes/issues/1377 suggest not. I'd like someone more familiar with the Element X codebase to explain what's going on there. |
Yes we need to review all the existing code and ensure it's doing what we want.
AFAIK as I looked some times ago, I think this logic is in the Also, I was surprised that it was not using the signaling in the crypto crate (there is a stream when keys are added/udpated in store). |
Exactly, in the perfect world we would to the perfect thing. |
Regarding |
Quick Notes on the discussions: The SDK API is not at the moment exposing the type of error that did occur (MissingRoomKey/Megolm), so we can't find the correct Decision: We don't block on that and hardcode |
This adds a new mechanism in the UI crate (since re-attempts to decrypt happen in the timeline, as of today — later that'll happen in the event cache) to notify whenever we run into a UTD (an event couldn't be decrypted) or a late-decryption event (after some time, a UTD could be decrypted). This new hook will deduplicate pings for the same event (identifying events on their event id), and also implements an optional grace period. If an event was a UTD: - if it's still a UTD after the grace period, then it's reported with a `None` `time_to_decrypt`, - if it's not a UTD anymore (i.e. it's been decrypted in the meanwhile), then it's reported with a `time_to_decrypt` set to the time it took to decrypt the event (approximate, since it starts counting from the time the timeline receives it, not the time the SDK fails to decrypt it at first). It's configurable as an optional hook on timeline builders. For the FFI, it's configurable at the sync service's level with a "delegate", and then the sync service will forward the hook to the timelines it creates, and the hook will forward the UTD info to the delegate. Part of element-hq/element-meta#2300. --- * ui: add a new module and trait for subscribing to unable-to-decrypt notifications and late decryptions (i.e. the key came in after the event that required it for decryption). * timeline: hook in (!) the unable-to-decrypt hook * timeline: prefix some test names with test_ * utd hook: delay reporting a UTDs * ffi: add support for configuring the utd hook * utd hook: switch strategy, have a single hook And have the data structure contain extra information about late-decryption events. * utd hook: rename `SmartUtdHook` to `UtdHookManager` * ffi: configure the UTD hook with a grace period of 60 seconds And ignore UTDs that have been late-decrypted in less than 4 seconds. * utd hook: update documentation and satisfy the clippy gods * ffi: introduce another UnableToDecryptInfo FFI struct that exposes simplified fields from the SDK's version * review: introduce type alias for pending utd reports * review: address other review comments
The SDK initial implementation has just been landed, so unassigning myself — it's now in the hands of our app devs! |
First UTDs analytics will be available in EXA 0.4.6 and EXI 1.5.12. We should make those versions public in the store today. |
Your use case
Add posthog tracking for UTDs.
Event: Error
Crypto SDK: Rust
Domain E2E
I am not a 100% sure that this what is required for the tracking. Please align also w. the CryptoTeam.
Have you considered any alternatives?
No response
Additional context
No response
Tasks
The text was updated successfully, but these errors were encountered: