Allow using AOF (the collection of commands) instead of RDB for full synchronization #59

Open
soloestoy opened this issue Mar 28, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@soloestoy
Member

Whenever we introduce new data structures or new encodings, we need to adjust the RDB version and encoding format. Inside the server this is not an issue, but every RDB change creates a substantial adaptation workload for external tools.

For example, data migration tools such as redis-shake, which need to parse the RDB file during full synchronization, have to be adapted whenever the RDB changes (both the RDB version and the many changes to storage structures). Such a tool parses the RDB to extract key-value pairs and transforms them into RESTORE commands before sending them to the target instance.

Also, there are many detailed issues to be addressed, such as the restore command's parameters not exceeding 500MB (due to proto-max-bulk-len limitation), meaning that when dealing with larger payloads, it's necessary to analyze the specific storage format for splitting, and then reverse-engineer it back into commands for replay (for example, a large hash would be converted into multiple HSET commands).

Furthermore, instances on old versions cannot parse RDB from new versions and cannot use the restore command for data migration (some special scenarios may require rolling back to a previous version by using data migration tools).

To solve the problems and to ensure that migration tools are not affected by changes in RDB format, we can add a new method for full synchronization: using the AOF file (where AOF specifically refers to a collection of commands, not using an RDB preamble). While full synchronization between master and replica still uses the RDB file, data migration tools could declare the file format they wish to use during full synchronization via the REPLCONF command. By doing so, data migration tools can simply forward commands without needing to parse RDB, allowing the full synchronization data to be directly passed through to the target instance, thus simplifying the adaptation work.
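To illustrate what "simply forward commands" could look like on the tool side, here is a minimal sketch in Python. It assumes the full-sync payload arrives as a plain stream of RESP-encoded commands (arrays of bulk strings, as in a no-preamble AOF). The function names `read_command` and `forward_all` are hypothetical, and the REPLCONF option that would request this format is not defined in this issue, so it is left out.

```python
import io


def read_command(stream):
    """Read one RESP array of bulk strings; return its raw bytes, or None at EOF."""
    header = stream.readline()
    if not header:
        return None
    if not header.startswith(b"*"):
        raise ValueError("expected a RESP array header")
    raw = [header]
    for _ in range(int(header[1:].strip())):
        length_line = stream.readline()
        if not length_line.startswith(b"$"):
            raise ValueError("expected a RESP bulk string header")
        # Read the payload plus its trailing CRLF without inspecting it.
        body = stream.read(int(length_line[1:].strip()) + 2)
        raw += [length_line, body]
    return b"".join(raw)


def forward_all(source, target):
    """Copy every command from source to target unchanged; return the count."""
    count = 0
    while (command := read_command(source)) is not None:
        target.write(command)
        count += 1
    return count


if __name__ == "__main__":
    src = io.BytesIO(b"*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$1\r\nv\r\n")
    dst = io.BytesIO()
    print(forward_all(src, dst), "command(s) forwarded")
```

The tool only needs command boundaries (for counting, filtering, or batching); it never decodes a type-specific value encoding, which is the point of the proposal.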

@hpatro
Collaborator

hpatro commented Mar 30, 2024

@soloestoy I've not looked much into AOF, so my question might be very naive.

Wouldn't a user pay a performance penalty (varying with the appendfsync level) to keep AOF enabled and store the collection of commands? With the RDB approach, libraries need to keep up with changes in RDB versions, but there is no performance penalty for regular command execution.

@soloestoy
Member Author

@hpatro your question is very good. Perhaps I didn't express it clearly at the beginning, but I can explain it by answering your question.

The AOF file used during full synchronization does not rely on the appendonly config. It simply means that where we originally used an RDB file as the snapshot, we now use a snapshot in AOF format instead. We still fork a child process, which converts all data into a set of write commands in AOF format. This way, the migration tool receives a set of commands during the full synchronization phase, eliminating the need to parse RDB encoding and achieving version independence.
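As a rough illustration of "converting all data into a set of write commands", the sketch below encodes a toy dataset as RESP commands, the same wire format an AOF file stores. The dataset model (a Python dict of strings, lists, and dicts) and the names `encode_command` and `snapshot_to_commands` are assumptions made for this example; the real server would walk its own keyspace in the forked child.

```python
def encode_command(*args):
    """Encode one command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def snapshot_to_commands(dataset):
    """Yield write commands that rebuild a hypothetical in-memory dataset.

    Collections are assumed to be non-empty, as they are in the server.
    """
    for key, value in dataset.items():
        if isinstance(value, dict):
            flat = [item for pair in value.items() for item in pair]
            yield encode_command("HSET", key, *flat)
        elif isinstance(value, list):
            yield encode_command("RPUSH", key, *value)
        else:
            yield encode_command("SET", key, value)


if __name__ == "__main__":
    data = {"greeting": "hello", "queue": ["a", "b"], "user:1": {"name": "x"}}
    for command in snapshot_to_commands(data):
        print(command)
```

Because the output consists only of ordinary commands, a receiver on any version that understands those commands can replay it.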

@PingXie
Member

PingXie commented Apr 1, 2024

@soloestoy, I think it is a great idea to use a no-preamble AOF and I will definitely consider using it for the atomic slot migration work (#23).

> such as the restore command's parameters not exceeding 500MB (due to proto-max-bulk-len limitation)

Curious. Do we use the RESP protocol on full sync today? Going through https://github.com/valkey-io/valkey/blob/unstable/src/rdb.c#L3031, my impression is that we don't. If so, are we still bound by proto-max-bulk-len?

@soloestoy
Member Author

Hi @PingXie, I mean that when migration tools parse the RDB, they have two choices:

  1. For every key, send the value's whole payload via the RESTORE command. The tool then doesn't need to decode the RDB format of each data type (hash/set/list etc.), but the payload may exceed proto-max-bulk-len.
  2. Decode the RDB format of each type of key and value and convert it into commands. This way the tool is not limited by proto-max-bulk-len, but the workload is significant.
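The second choice can be sketched for the hash case mentioned earlier in the thread: once a tool has decoded a large hash, it can replay it as several HSET commands instead of one oversized RESTORE payload. This is a minimal sketch; `split_hash_into_hsets` is a hypothetical helper, and `max_bytes` is a per-command budget chosen by the tool, not a server setting.

```python
def split_hash_into_hsets(key, fields, max_bytes):
    """Yield HSET argument lists whose field/value bytes stay within max_bytes.

    A single pair larger than max_bytes is still emitted on its own.
    """
    batch, size = [], 0
    for field, value in fields.items():
        pair_size = len(field) + len(value)
        if batch and size + pair_size > max_bytes:
            yield ["HSET", key, *batch]
            batch, size = [], 0
        batch += [field, value]
        size += pair_size
    if batch:
        yield ["HSET", key, *batch]


if __name__ == "__main__":
    big_hash = {b"f%d" % i: b"x" * 10 for i in range(5)}
    for command in split_hash_into_hsets(b"myhash", big_hash, max_bytes=30):
        print(command)
```

Each field and value travels as its own bulk string, so no single argument grows with the size of the whole hash. The cost is that the tool must understand every type's encoding, which is the workload this issue wants to remove.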

@PingXie
Member

PingXie commented Apr 1, 2024

Got it. This is not about the full sync but the migration tool. That makes sense. I think another benefit of using no-preamble AOF in either the full sync or slot migration is that we can easily achieve the non-blocking behavior.

@zuiderkwast
Contributor

Some systems are stuck on Redis 6.2 because they need to support rolling downgrade. Would this feature allow a replica running Redis 6.2 to replicate from Valkey 8.x?

Why do people need rolling downgrade?

After a rolling upgrade, if anything is not perfect, such as CPU or memory usage or some bug, the customer wants to downgrade again and take some time to investigate the problem. This includes systems where Valkey cluster nodes are just some part of a bigger system where everything is upgraded together in the same rolling upgrade.


4 participants