-
Notifications
You must be signed in to change notification settings - Fork 985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Fix app freeze after login #20729
Conversation
Jenkins BuildsClick to see older builds (42)
|
5ae8fa2
to
a53e41e
Compare
0de002c
to
0b0b452
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @ilmotta
Thank you for this PR. It's great to see performance improvements!
I've left some comments and questions 👍
src/status_im/core.cljs
Outdated
;; been updated, i.e., after the next Reagent animation frame. In such a | ||
;; case, the event should be dispatched with :flush-dom metadata. For more | ||
;; details, check re-frame's source code and documentation. | ||
(re-frame/dispatch ^:flush-dom [:signals/signal-received raw-event]))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, is this a way to schedule this event until the end of the frame?
Is this the same as wrapping the event in js/requestAnimationFrame
? 🤔
If so, I thought reagent (an so re-frame) already executed their tasks at the end of the frame by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment I added is a copy & paste from the documentation. I'm taking re-frame's word for this, check it out from the docs https://github.com/day8/re-frame/blob/09e2d7132c479aa43f2a64164e54e42bf8511902/docs/Solve-the-CPU-hog-problem.md#L195:
In these kinds of cases, where you are only going to give the UI
**one chance to update** (not a repeated chance every few milliseconds),
then you had better be sure the DOM is fully synced.
To do this, you put metadata on the event being dispatched:
(re-frame.core/reg-event-fx
:process-x
(fn
[{db :db} event-v]
{:dispatch ^:flush-dom [:do-work-process-x] ;; <--- NOW WITH METADATA
:db (assoc db :processing-X true)})) ;; so the modal gets rendered
Notice the `^:flush-dom` metadata on the event being dispatched. Use
that when you want the UI to be fully updated before the event dispatch
is handled.
You only need this technique when you:
1. want the DOM to be fully updated
2. because you are going to hog the CPU for a while and not give it back. One chunk of work.
If you handle via multiple chunks you don't have to do this, because
you are repeatedly handing back control to the browser/UI. Its just
when you are going to tie up the CPU for a one, longish chunk.
The key part for me was Use that when you want the UI to be fully updated before the event dispatch is handled.
Which I'm assuming is a good thing because we don't want signals dispatching :signals/signal-received
to immediately interrupt anything, they can wait a tiny bit.
That's why I said the theory is sound, but I couldn't measure differences, but sounds good on paper. Could it have side-effects? I really can't tell.
I can clearly see this technique being useful (again, in theory) in places where we still use rf/merge
because they hog the CPU for longer in one chunk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another piece of the docs:
;; - when the event has :flush-dom metadata we schedule via
;; "reagent.core.after-render"
;; which will run event processing after the next Reagent animation frame.
It sounds good on paper for processing signal events. Or do you have reservations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I'm having more questions.
-
Does this metadata work on the
re-frame/dispatch
call or just when we are inside an event handler? -
I misunderstood and compared it with
js/requestAnimationFrame
, but I think it works the same asdispatch-sync
, am I right? could we use it instead? -
Which I'm assuming is a good thing because we don't want signals dispatching :signals/signal-received to immediately interrupt anything, they can wait a tiny bit.
So, are you trying to delay the execution of the event? 🤔 if so, I think you don't want to use this approach.
I know one or two places in the wallet where this can be used, sometimes we navigate but the re-frame data is not ready yet, so we could dispatch with this metadata so that we don't wait for one frame more. In the past I managed to solve it by delaying the navigation one frame (js/requestAnimationFrame
), but the team didn't like this approach, you may remember it:
And the other usage is when we navigate to view a collectible, we store it in re-frame, but it's not ready until the next frame, so we have a one-frame blink
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding to rf/merge
yes, I think it can be applied there if there's much work to do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know as much as you about this part @ulisesmac.
I think the implications of this metadata should be explored simply because it's there already for us to be used. It should only make a difference in signals that are handled in the same chunk as :signals/signal-received
. So for instance, this means the second dispatch to handle a wallet
signal won't be affected by the metadata optimization, but a signal like "envelope.sent"
should be affected because it uses rf/merge
.
Does this metadata work on the re-frame/dispatch call or just when we are inside an event handler?
I think it works only for the immediate handler. Further re-dispatches from the handler will be treated the same. This metadata is useful in events that are more expensive, as they say in the docs.
So, are you trying to delay the execution of the event?
Correct, this is a form of delaying, telling re-frame that the handler for a signal should be processed only after the next Reagent animation frame.
Does this metadata work on the re-frame/dispatch call or just when we are inside an event handler?
dispatch-sync
would be the opposite of what I tried in this PR because it would immediately process a signal, without enqueueing, which would more likely affect responsiveness when there are too many (more important) events competing for resources.
I know one or two places in the wallet where this can be used
It would quite useful if we could experiment with this metadata with real data, as in the wallet, and start to get a feeling of it based on profiling. If it proves to be useless we can always take it away easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @ilmotta Now I'm not sure if the metadata :flush-dom
executes everything "now" or delay the next execution. I personally understand it as a way of dispatch-sync
but for content that is inside event handlers. So:
Correct, this is a form of delaying, telling re-frame that the handler for a signal should be processed only after the next Reagent animation frame.
As far as I understand, by default they are executed at the end of the frame, if you are willing to delay it even one frame more, you can wrap it use js/requestAnimationFrame
or reagent/next-tick
(they should be exactly the same in React Native).
dispatch-sync would be the opposite of what I tried in this PR
What you did in this PR anddispatch-sync
are, according to what I see, the same thing, not the opposite. But we may need to read the source code to properly understand what is happening.
I've done a test where I know re-frame, sometimes, takes one frame more to propagate the given DB state.
Please consider replicating this example, and it'd be great if it's part of this PR (or should I open a new issue PR for it?)
-
In
status-im.contexts.wallet.collectible.view
move thecollectible
binding incollectible-details
outside the render function:
-
In
status-im.contexts.wallet.collectible.events
inside the event handler for:wallet/navigate-to-collectible-details
we navigate to the collectible page delaying the navigation 17ms (slightly more than a frame). So that re-frame propagates its state:
-
If we use a regular
dispatch
:And with the changes in
1
, once thecollectible
value is defined, it'll never be updated, so if the image is not found and we'll get an error in the UI, here is a video of it, it works as expected:Screencast.from.2024-07-19.14-32-15.mp4
As said before, re-frame sometimes is not able to propagate its DB state on time.
-
Now let's just add the metadata
:flush-dom
,
The app now looks smoother and NEVER fails to propagate the data on time:
Screencast.from.2024-07-19.14-36-08.mp4
So, :flush-dom
seems like a way of dispatch-sync
🤔 for the event where it's used (not the vent marked with the metadata).
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very interesting results @ulisesmac 💯 Happy to see a problem could be solved by the metadata.
But I'm going to remove the metadata from this PR and then we can introduce it with a separate PR focusing on solving a specific problem. I hope that's okay for you :) The thing is that this PR still needs to be QAed, and I'm a bit weary it could introduce a small regression based on our investigations, and since this is maybe the last week before cutting the release branch 🤷🏼 And I still would like to get more approvals if possible.
So, :flush-dom seems like a way of dispatch-sync 🤔 for the event where it's used (not the vent marked with the metadata).
The implementation of dispatch-sync
is different, we can see that it literally skips the queue, whereas normal dispatch
calls (push event-queue event)
. I prefer to not think of it as dispatch sync for that reason. I saw that there are lots of little comments about the metadata sprinkled in re-frame's source. The comments mention that it can "pause processing". This confused me and it's now unclear to me exactly what the metadata does and I don't trust the docs anymore. For example:
- https://github.com/day8/re-frame/blob/09e2d7132c479aa43f2a64164e54e42bf8511902/src/re_frame/router.cljc#L65
- https://github.com/day8/re-frame/blob/09e2d7132c479aa43f2a64164e54e42bf8511902/src/re_frame/router.cljc#L153
- https://github.com/day8/re-frame/blob/09e2d7132c479aa43f2a64164e54e42bf8511902/src/re_frame/router.cljc#L193
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I'm going to remove the metadata from this PR and then we can introduce it with a separate PR focusing on solving a specific problem.
Sounds good! @ilmotta
The comments mention that it can "pause processing". This confused me and it's now unclear to me exactly what the metadata does and I don't trust the docs anymore. For example:
We can ask in the Clojurians Slack, it's a bit confusing.
Thank you so much for sharing this info 👍 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I've discussed with you before, I still think we could implement a priority queue to process events automatically by batches and by priority so that no matter the amount of work, the app is always (or almost always) responsive, ideally everything would go through that queue. But I understand this is not the focus of this PR and the soultion requires time to investigate if it's reliable or not.
On the other hand, having arbitrary timeouts to delay a processing might not be enough for some phones or datasets, also they require a manual measurement.
Besides that, I completely think this PR is going in the right way, thanks for paying attention to the login process! 💯
Thank you so much for your thoughtful review @ulisesmac ❤️
Agreed 👍🏼
The custom queue idea deserves exploration, it sounds too good on paper to ignore because we know from OSes that we can optimize a scheduler for responsiveness and minimal latency. Perhaps re-frame's algorithm is not the best for us. I will create an issue to capture this idea once and for all, and maybe somebody can pick it up when feeling adventurous. |
0b0b452
to
8015b0d
Compare
59% of end-end tests have passed
Failed tests (20)Click to expandClass TestActivityMultipleDevicePR:
Class TestCommunityMultipleDeviceMergedTwo:
Class TestCommunityOneDeviceMerged:
Class TestActivityMultipleDevicePRTwo:
Class TestCommunityMultipleDeviceMerged:
Expected to fail tests (1)Click to expandClass TestWalletOneDevice:
Passed tests (30)Click to expandClass TestWalletMultipleDevice:
Class TestOneToOneChatMultipleSharedDevicesNewUiTwo:
Class TestDeepLinksOneDevice:
Class TestGroupChatMultipleDeviceMergedNewUI:
Class TestWalletOneDevice:
Class TestCommunityOneDeviceMerged:
Class TestActivityCenterContactRequestMultipleDevicePR:
Class TestOneToOneChatMultipleSharedDevicesNewUi:
|
Hey @ilmotta! Thanks for the PR. Please take a look at the issue. ISSUE 1 "[object Object] is bot ISeqable" when opening Invite people to contact list in community longtap menuPreconditions: User is a member of community and has at least 1 contact in contacts list Steps:
Expected result: no error, contact list is opened. Actual result: app crashes with "[object Object] is bot ISeqable" error Status-debug-logs - 2024-07-22T120408.832.zip telegram-cloud-document-2-5256000446484470926.mp4 |
hi @ilmotta thank you for PR. No issues from my side. PR is ready to be merged |
Leave this solution for a future PR due to uncertainty about the behavior of flush-dom.
ffdc8b1
to
1a5d675
Compare
We do a few things to reduce the initial load and make the app more responsive after login. The scenario we are covering is a user who joined communities with a large number of members and/or which contain token-gated channels with many members. - Related to #20283 - Related to #20285 - Optimize how we convert a community from JS to CLJS. Community members and chat members are no longer transformed to CLJS, they are kept as JS. Read more details below. - Delay processing lower-priority events by creating a third login phase. The goal is to not put on the same queue we process communities less important events, like fetching the count of unread notifications. Around 15 events could be delayed without causing trouble (and this further prevent a big chain of more events to be dispatched right after login). - Tried to use re-frame's flush-dom metadata, but removed due to uncertainty, check out the discussion: #20729 (comment) Use re-frame’s support for the flush-dom metadata whenever a signal arrives. According to the official documentation, this should tell re-frame to only process the event after the UI has been updated. It’s hard to say if this makes any difference, but the theory is sound. - Reduce the amount of data returned to the subscription that renders a list of communities. We were returning too much, like all members, chats, token permissions, etc. Other things I fixed or improved along the way: - Because members are now stored as JS, I took the opportunity to fix how members are sorted when they are listed. - Removed a few unused subs. - Configured oops to not throw during development (in production the behavior is to never throw). This means oops is now safe to be used instead of interop that can mysteriously fail in advanced compilation. - Show compressed key instead of public key in member list for the account currently logged in. Technical details The number one reason affecting the freeze after login was coming from converting thousands of members inside communities and also because we were doing it in an inefficient way using clojure.walk/stringify-keys. We shouldn't also transform that much data on the client as the parent issue created by flexsurfer correctly recommends. Ever since PR #20414 was merged, status-go doesn't return members in open channels, which greatly helps, for example, to load the Status community. The problem still exists for communities with token-gated channels with many members. The current code in develop does something quite inefficient: it fetches the communities, then transforms them recursively with js->clj and keywordizes keys, then transforms again all the potentially thousands of member IDs back to strings. This PR changes this. We now shallowly convert a community and ignore members because they can grow too fast. From artificial benchmarks simulating many members in token-gated channels, or communities with thousands of members, the improvement is noticeable. You will only really notice improvements if you have spectated or joined a community with 1000+ members and/or a community with many token-gated channels, each containing perhaps hundreds of members. What's the ideal solution? We should consider removing community members and channel members from the community entity returned by status-go entirely. The members should be a separate resource and paginated so that the client doesn't need to worry about the number of members, for the most part.
Revisions from develop: - 59ceddb develop origin/develop fix(wallet): fix bridge transactions (#20902) - 99ccbc3 Cover wallet send events with tests Part 2 #20411 #20533 (#20721) - 8c2d539 Enabling WalletConnect feature flag (#20906) - 67c83b1 fix(wallet): remove edit routes button in bridging (#20874) - 11a84ba feat(wallet): disable complex routing (#20901) - 1f5bb57 chore(wallet): disable bridging on unsupported tokens (#20846) - 4586f80 Add toggle in advanced settings for mobile data - 55c620e fix: create password for small screen (#20645) - 525609f Wallet Activity: transactions are not sorted by time #20808 (#20862) - 9065395 chore(settings): Disable telemetry option (#20881) - d27ab75 fix_:display group message using the new ui (#20787) - c6a1db6 ci: enable split apks & build only for arm64-v8a (#20683) - 73777e0 Ensure keycard account can send transaction after upgrading from v1 to v2 #20552 (#20845) - a6d3fc3 [#20524] fix: the missed keypairs are shown in the key pair list screen (#20888) - a671c70 fix broken screen and navigation when syncing fails (#20887) - a45991b 🥅 Filter connected dapps based on testnet mode, reject proposals and requests gracefully (#20799) - 2e9fa22 feat: wallet router v2 (#20631) - 737d8c4 rename sub to fix error when requesting to join community (#20868) - 3aa7e10 Sync process is blocked on Enabled notifications screen (#20883) - c1d2d44 perf: Fix app freeze after login (#20729) - 0fed811 e2e: updated testnet switching and added one test into smoke - 53c35cb fix(wallet): Linear gradient exception on invalid colors for watched account cards (#20854) - be82365 chore(settings)_: Remove testnet toggle from legacy advanced settings (#20875) - eae8a65 feat(wallet)_: Add beta info box in activity tab (#20873) - fe54a25 fix: not clearing network & web3-wallet on logout (#20886) - 15a4219 Reject wallet-connect request by dragging the modal down (#20763) (#20836) - 2ffbdac WalletConnect show expired toast (#20857) - 402eb83 fix Issue with scrolling WalletConnect transaction on Android (#20867) - ff88049 Fix WalletConnect header alignment on Android (#20860) - cee2124 WalletConnect no internet edge-cases (#20826) - 60ad7c8 chore(tests): New match-strict? cljs.test directive (#20825) - 4989c92 fix_: Adding own address as saved addresses (#20839)
Summary
This PR does a few things to reduce the initial load and make the app more responsive after login. The scenario we are covering in this PR is a user who joined communities with a large number of members and/or which contain token-gated channels with many members.
Other things I fixed or improved along the way:
Technical details
The number one reason affecting the freeze after login was coming from converting thousands of members inside communities and also because we were doing it in an inefficient way using
clojure.walk/stringify-keys
. We shouldn't also transform that much data on the client as the parent issue created by @flexsurfer correctly recommends. Ever since this PR #20414 was merged, status-go doesn't return members in open channels, which greatly helps, for example, to load the Status community. The problem still exists for communities with token-gated channels with many members.The current code in
develop
does something quite inefficient: it fetches the communities, then transforms them recursively withjs->clj
and keywordizes keys, then transforms again all the potentially thousands of member IDs back to strings. This PR changes this. We now shallowly convert a community and ignore members because they can grow too fast. From my artificial benchmarks simulating many members in token-gated channels, or communities with thousands of members, the difference is night and day when the app loads.Is this PR really better than develop?
You will only really notice improvements if you have spectated or joined a community with 1000+ members and/or a community with many token-gated channels, each containing perhaps hundreds of members.
To me, as a member of the Status community, in this particular scenario I see no freeze on Android or iOS, from the
develop
branch and from this PR. Therefore, this PR is not making things worse, but it's a significant improvement to handle communities with more members.What's the ideal solution?
I think in the future, community members and channel members should be removed from the community entity returned by status-go entirely and be a separate resource and paginated. This would allow us to remove custom code to optimize and scale better even in the face of much larger communities than Status.
Why not use bean?
Members' data is only ever used when the user wants to see community/channel members, and we only care about the members' IDs (a JS array of IDs), so I thought there wouldn't be much to gain from using cljs-bean. But I'm very curious to know how re-frame would behave with beans stored in the app-db.
Areas that may be impacted
Login, community, and channel members' list.
Steps to test
This PR is hard to test in terms of performance unless you can test with a community larger than Status, with a lot more members. I recommend the focus should be on the functionality first, checking that there are no regressions for data we know has to be loaded after login, like contacts, notifications, communities, channels, etc.
When listing channel members, currently in
develop
we are sorting by public key due to an incorrect implementation. That's why it looks as if it's random. This PR now sorts members by their displayed name. If we want to improve sorting even further I think we can this to a new issue because I had to fix just because the performance improvements required that.status: ready
Risk
Low risk, but a dev & QA testing would be great.