Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store persistent data in a sound way #4841

Closed
gnprice opened this issue Jul 1, 2021 · 4 comments · Fixed by #5309
Closed

Store persistent data in a sound way #4841

gnprice opened this issue Jul 1, 2021 · 4 comments · Fixed by #5309
Assignees
Labels
a-data-sync Zulip's event system, event queues, staleness/liveness a-redux P1 high-priority

Comments

@gnprice
Copy link
Member

gnprice commented Jul 1, 2021

The way we currently store the user's data on the device's storage, between runs of the app, is with the design from redux-persist:

  • The units of storage are the top-level subtrees of the Redux state: state.accounts is one, state.users is another, and so on.
  • When something changes, we write out the new version of each top-level subtree that changed: if a user changed their avatar then we write out the entire new state.users, and so on.
  • When things change across several different top-level subtrees, we write out each of them separately.

As a result, it's possible for the stored data to get into a state that was never the state inside the app, and that's internally inconsistent. This was the cause of #4587, for example.

For the most part, this doesn't cause user-visible problems today -- the crashes in #4587 were an exception to that pattern.

But the reason it doesn't is that we currently don't really rely that heavily on the stored data: each time the app starts up, we go and do an initial fetch of the server data from scratch. That means that for all the data that ultimately comes from the server -- which includes most of our data and most of the complexity in our data -- any inconsistencies that creep into the stored data will typically get wiped away almost immediately after we first even read it from storage. (The way that issue #4587 managed to evade that protection is that it caused a crash before we got far enough to do that initial fetch.) So we're largely protected from consequences of this issue… but the price is that we're not really getting much use out of storing that data.

There are several things we'd like to do that would mean relying more firmly on the stored data, and not constantly refreshing it from scratch:

  • The status quo means that when you launch the app, the data you're looking at is stale and there's a long lag before you see up-to-date data. This is an especially bad experience when you got there by opening a notification -- there's a message you know you got, it was right there in the notification, but we don't initially know about it. We have the basic design for how to fix this, as Support downgrading to a long-lived event queue #3916; but it means maintaining your data in device storage and relying on that, rather than refreshing from scratch.

  • We'd like to improve the experience of switching between several accounts (in several orgs): Add quick organization switcher #1321. Doing a really good job of that would require maintaining all the same data about all your accounts as we currently do about the one active account; and fundamentally, doing much at all in that direction will require maintaining at least some data about all your accounts. In principle I guess we could maintain all that data while keeping the reload-from-scratch-on-each-launch strategy; but it'd mean even more data getting reloaded from scratch, so even more lag.

  • We'd like to offer fuller-featured, higher-quality notifications: with an inline reply action (Allow message responses from within Android notification bar #2512), with end-to-end encryption from the Zulip server (Encrypt push notifications end-to-end from Zulip server #1190), pre-fetching the conversation's messages so it's ready to go if you open the notification (Eagerly fetch relevant conversation on getting notification #3599), among other things.

    All of these require more integration between the platform-native code (in Kotlin for Android, and in the future in Swift for iOS) and JS than we currently have: either for the platform-native code to see more of the information that the JS code has, or to send more information to JS, or both. That would be a lot easier if we have the data in a reasonable schema that the platform-native code can consume directly. Even if we don't do that -- even if we do all the integration by having the platform-native code call into JS -- it means continuously using the stored data, beyond the moments after each launch.

If we were to pursue any of those things without giving attention to this issue, it's likely we would encounter a lot more bugs like #4587. That kind of bug can be quite bad for a user, and they're pretty unpleasant to debug too, because fundamentally they involve our data being in a state that doesn't make any sense.

As a result, we need to do some combination of fixing this issue properly; mitigating it to reduce its impact; and carefully working around it in the context of any of the above features that we build before we've fixed it.

@gnprice gnprice added a-redux a-data-sync Zulip's event system, event queues, staleness/liveness P1 high-priority labels Jul 1, 2021
@gnprice
Copy link
Member Author

gnprice commented Jul 1, 2021

Some in-progress work to mitigate this issue is in #4694 -- that will arrange things so that when several subtrees get updated, we write out the updates in quick succession, in hopes that that reduces how often we end up with only some of the updates written out so that the data is inconsistent.

I don't have a definite plan for how to properly fix the issue, so the next steps there are some more investigation and discussion. In general, the standard way to handle this kind of problem is with a database -- this is why they were invented. In the context of a mobile app, the database software I'd almost certainly want to use is SQLite: it's highly reliable, highly portable, and has good efficiency. I don't know what the state is of convenient bindings for using SQLite from RN, or for that matter from Swift.


It may also be possible to fix this soundness issue with a smaller change on top of our existing system. One idea mentioned in previous discussion is to keep storing new versions of whole state subtrees, but with some kind of metadata to identify a tuple of those as a complete actual snapshot of the tree.

This basically means we'd be writing our own very simple database system. That sounds like famous last words, but it might work for our case: we currently tolerate some performance characteristics that general-purpose databases can't accept and that drive a lot of their complexity. In particular, rewriting the entire state.messages each time you get a new message (and similarly for other subtrees), and reading the entire database into our in-memory data structures at startup rather than ever trying to query just a piece of it.

@gnprice
Copy link
Member Author

gnprice commented Jul 1, 2021

A couple of related issues are #3408 and #4446. Those are at a different layer -- they're both about properly maintaining the state at the app level, in our reducers. But they have the same property that we currently get to paper over a lot of small bugs there with the fact that we're constantly refreshing the state from scratch, so anything we get wrong in updating the state just gets wiped away before very long.

In a future where we're maintaining the state incrementally for longer periods, the results of those bugs will accumulate, so that even when the bugs are all in situations that happen at low frequency, a user is likely to have several glitches scattered around their state. So those issues will also become more important to fix as we implement any of the features mentioned above.

gnprice added a commit to gnprice/zulip-mobile that referenced this issue Aug 13, 2021
This isn't itself server data -- it's what we use for acquiring
server data -- which is why we didn't cover it when expanding these
checks in 916dc4b.

But like the server data, it's something we assume is present
when rendering the main UI of the app.  And because of the same
issue zulip#4841, the same lack of consistency in our data-storage
model with redux-persist that caused zulip#4587 and made that fix
necessary, it's possible for it to be lacking data when the
existing checks in this function all pass.

So, add a check for this too.  Build on the comments added in the
previous commit to reason through why this won't get us stuck at
the loading screen; and add a comment at the otherwise
inessential-looking bit of code that provides one of the
assumptions we rely on for that reasoning.

Also correct a related point in the comment at the end of the
function.
chrisbobbe pushed a commit to gnprice/zulip-mobile that referenced this issue Aug 16, 2021
This isn't itself server data -- it's what we use for acquiring
server data -- which is why we didn't cover it when expanding these
checks in 916dc4b.

But like the server data, it's something we assume is present
when rendering the main UI of the app.  And because of the same
issue zulip#4841, the same lack of consistency in our data-storage
model with redux-persist that caused zulip#4587 and made that fix
necessary, it's possible for it to be lacking data when the
existing checks in this function all pass.

So, add a check for this too.  Build on the comments added in the
previous commit to reason through why this won't get us stuck at
the loading screen; and add a comment at the otherwise
inessential-looking bit of code that provides one of the
assumptions we rely on for that reasoning.

Also correct a related point in the comment at the end of the
function.
chrisbobbe pushed a commit to gnprice/zulip-mobile that referenced this issue Aug 16, 2021
This isn't itself server data -- it's what we use for acquiring
server data -- which is why we didn't cover it when expanding these
checks in 916dc4b.

But like the server data, it's something we assume is present
when rendering the main UI of the app.  And because of the same
issue zulip#4841, the same lack of consistency in our data-storage
model with redux-persist that caused zulip#4587 and made that fix
necessary, it's possible for it to be lacking data when the
existing checks in this function all pass.

So, add a check for this too.  Build on the comments added in the
previous commit to reason through why this won't get us stuck at
the loading screen; and add a comment at the otherwise
inessential-looking bit of code that provides one of the
assumptions we rely on for that reasoning.

Also correct a related point in the comment at the end of the
function.
@chrisbobbe chrisbobbe mentioned this issue Aug 26, 2021
@gnprice
Copy link
Member Author

gnprice commented Aug 30, 2021

We talked today in the office about this issue, and I think we have a plan for it.

  • I looked around recently at SQLite bindings. There's a pretty plausible-looking one for React Native: https://github.com/andpor/react-native-sqlite-storage
    • I made a trivial demo of using it: master...gnprice:dev-sqlite It successfully ran on Android, successfully built for iOS, and probably runs fine on iOS but I didn't get to confirming that.
    • The API isn't awesome. Like in that demo, we'll want to say SQLite.enablePromise(true); somewhere before any code actually uses it.
    • But I think fundamentally it will work fine. I looked through some of the implementation code, and it seems basically reasonable.
    • For accessing the same database from the respective native sides: Android comes standard with SQLite bindings. And on iOS, one doesn't really need an "iOS binding" -- you can just use SQLite (which is a C library) directly, and that's what react-native-sqlite-storage does. One might want a small layer to make a more convenient API or one more idiomatic for Swift or Objective-C, but we can just whip that up ourselves. The one real platform-specific bit is how to find the database file in the filesystem, and for that we can crib or share code from react-native-sqlite-storage.
  • Structure redux-persist code toward doing writes in batches with AsyncStorage.multiSet. #4694 is close to merge. Once that's in, our redux-persist layer will be doing pretty nearly the thing it needs to do in order to get soundness.
    • Maybe exactly the thing it needs to do, modulo the case (Structure redux-persist code toward doing writes in batches with AsyncStorage.multiSet. #4694 (comment) ) where one attempt at writing an update fails and we later try to write a subsequent update.
    • That is, we'll be (modulo that write-failed case) taking one coherent state tree, finding all the parts of it that have changed since what's already written, and making a single call down to the lower layer to write all of those updates out.
    • The remaining unsoundness will be that that single multiSet call doesn't translate into a single transactional update to the underlying store.
  • So, what we can do:
    • Replace the use of AsyncStorage with react-native-sqlite-storage. Replace (or reinterpret) the multiSet with a single transactional update.
    • The SQLite schema for this will be basically like the one AsyncStorage uses on Android (where it uses SQLite): one table mapping keys to values.
    • We'll need to take care with migration: we want to bring users' data forward from AsyncStorage. We'll need to keep around the ability to read the old format basically forever (at least a year or two), for users upgrading from now-existing versions of the app.
  • Some followup opportunities:
    • The reason we have the ZulipAsyncStorage layer, and its compression, is that RN's AsyncStorage on Android limits rows to about 2 MB of data, and state.users as uncompressed JSON can get bigger than that. Once we control the underlying SQLite database, we can raise that limit, and then the compression becomes purely a performance matter and we can experiment with removing it. (The compression saves storage, and saves time on IO, but costs CPU time to compress and decompress.)
      • In fact we definitely should raise that limit, to reduce the risk of bumping up against it despite the compression.
    • We can break up the storage more finely, so that instead of storing e.g. state.users as one record, we'd store each user as one record. This means quite a lot less to update when just one user changes. … This is probably best sequenced after we rearrange our data structures to accommodate keeping data around for multiple accounts, because that will likely influence what we want these finer-grained keys to look like and so that sequencing will save us a potentially-tricky migration.
      • After doing this for all the potentially-large subtrees, we can dispense with the compression -- for small values, it seems unlikely it's helpful even just for performance. (We could also perhaps selectively disable it for things that already can't be large.)
    • All the "several things we'd like to do" described in the OP 🙂

chrisbobbe added a commit to chrisbobbe/zulip-mobile that referenced this issue Sep 14, 2021
One thing that should help mitigate zulip#4841 is to narrow the window
during which we're propagating a Redux state change to persistent
storage. This does quite a bit of that, by not having the `setItem`
calls separated by (a) the CPU time to do the serialization, or (b)
yields, as Greg points out [1].

[1] zulip#4694 (comment)
chrisbobbe added a commit to chrisbobbe/zulip-mobile that referenced this issue Sep 14, 2021
As in the previous commit, the hope here is to mitigate zulip#4841.

It doesn't yet solve zulip#4841, though. We can't count on the sets being
done transactionally. Examining `AsyncStorage`'s implementation, we
see that the JavaScript connects to native modules the package
provides, and calls `RNC_AsyncSQLiteDBStorage` and
`RNCAsyncStorage` [1]. These are for Android and iOS, respectively.

The native implementations have these comments for `multiSet`:

Android (from the `ReactContextBaseJavaModule` with name
"RNC_AsyncSQLiteDBStorage" [2]):
```
/**
 * Inserts multiple (key, value) pairs. If one or more of the pairs cannot be inserted, this will
 * return AsyncLocalStorageFailure, but all other pairs will have been inserted.
 * The insertion will replace conflicting (key, value) pairs.
 */
```

iOS (from RNCAsyncStorage.h [3]):
```
// Add multiple key value pairs to the cache.
```

[1] https://github.com/react-native-async-storage/async-storage/blob/v1.6.3/lib/AsyncStorage.js#L17-L18
[2] https://github.com/react-native-async-storage/async-storage/blob/v1.6.3/android/src/main/java/com/reactnativecommunity/asyncstorage/AsyncStorageModule.java#L193-L197
[3] https://github.com/react-native-async-storage/async-storage/blob/v1.6.3/ios/RNCAsyncStorage.h#L40
gnprice pushed a commit to chrisbobbe/zulip-mobile that referenced this issue Sep 15, 2021
One thing that should help mitigate zulip#4841 is to narrow the window
during which we're propagating a Redux state change to persistent
storage. This does quite a bit of that, by not having the `setItem`
calls separated by (a) the CPU time to do the serialization, or (b)
yields, as Greg points out [1].

[1] zulip#4694 (comment)
gnprice pushed a commit to chrisbobbe/zulip-mobile that referenced this issue Sep 15, 2021
As in the previous commit, the hope here is to mitigate zulip#4841.

It doesn't yet solve zulip#4841, though. We can't count on the sets being
done transactionally. Examining `AsyncStorage`'s implementation, we
see that the JavaScript connects to native modules the package
provides, and calls `RNC_AsyncSQLiteDBStorage` and
`RNCAsyncStorage` [1]. These are for Android and iOS, respectively.

The native implementations have these comments for `multiSet`:

Android (from the `ReactContextBaseJavaModule` with name
"RNC_AsyncSQLiteDBStorage" [2]):
```
/**
 * Inserts multiple (key, value) pairs. If one or more of the pairs cannot be inserted, this will
 * return AsyncLocalStorageFailure, but all other pairs will have been inserted.
 * The insertion will replace conflicting (key, value) pairs.
 */
```

iOS (from RNCAsyncStorage.h [3]):
```
// Add multiple key value pairs to the cache.
```

[1] https://github.com/react-native-async-storage/async-storage/blob/v1.6.3/lib/AsyncStorage.js#L17-L18
[2] https://github.com/react-native-async-storage/async-storage/blob/v1.6.3/android/src/main/java/com/reactnativecommunity/asyncstorage/AsyncStorageModule.java#L193-L197
[3] https://github.com/react-native-async-storage/async-storage/blob/v1.6.3/ios/RNCAsyncStorage.h#L40
@gnprice
Copy link
Member Author

gnprice commented Sep 16, 2021

OK, and #4694 is merged!

There's that one open followup #4694 (comment) , but I think it's orthogonal to the main remaining thing we need to do:

The remaining unsoundness will be that that single multiSet call doesn't translate into a single transactional update to the underlying store.

So the next step, replacing the use of AsyncStorage with our own use of SQLite in order to provide a similar interface but with transactional multiSet, is now ready to be worked on.

@gnprice gnprice assigned gnprice and unassigned WesleyAC Dec 7, 2021
@gnprice gnprice mentioned this issue Dec 7, 2021
4 tasks
gnprice added a commit to gnprice/zulip-mobile that referenced this issue Mar 24, 2022
One test case changes: `setItem` now returns void instead of null.

I'm not sure if the upstream AsyncStorage really returns null here,
or if it too returns void and just the mock is off.

Anyway void is clearly the natural thing here -- it's basically a
function that returns no information -- so keep returning that
ourselves, and update the test.  Real callers should never care.

Fixes: zulip#4841
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a-data-sync Zulip's event system, event queues, staleness/liveness a-redux P1 high-priority
Projects
None yet
2 participants