Allow using AOF (the collection of commands) instead of RDB for full synchronization #59

soloestoy · 2024-03-28T06:46:14Z

Whenever we introduce new data structures or new encodings, we need to adjust the RDB's version and encoding format. Internally in the server, this is not an issue. However, changes to RDB create a substantial adaptation workload for external tools.

For example, data migration tools like redis-shake, which need to parse the RDB file during full synchronization, have to be adapted to new versions whenever RDB changes (the RDB version as well as the many changes in storage structures). Such a tool needs to parse RDB to extract key-value pairs and transform them into restore commands before sending them to the target instance.

Also, there are many detailed issues to be addressed, such as the restore command's parameters not exceeding 500MB (due to proto-max-bulk-len limitation), meaning that when dealing with larger payloads, it's necessary to analyze the specific storage format for splitting, and then reverse-engineer it back into commands for replay (for example, a large hash would be converted into multiple HSET commands).
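As a rough illustration of the splitting step described above, here is a minimal sketch of how a tool might turn one large hash into several HSET commands. The function name `split_hash_into_hset` and the `max_bytes` per-command budget are hypothetical and chosen by the tool. This is not redis-shake's actual implementation.

```python
from typing import Dict, Iterator, List


def split_hash_into_hset(key: bytes, fields: Dict[bytes, bytes],
                         max_bytes: int) -> Iterator[List[bytes]]:
    """Yield HSET commands whose field/value bytes stay within max_bytes.

    max_bytes is a hypothetical per-command budget picked by the tool.
    A single pair larger than the budget is still emitted, alone.
    """
    batch: List[bytes] = []
    size = 0
    for field, value in fields.items():
        pair_size = len(field) + len(value)
        # Flush the current batch before this pair would exceed the budget.
        if batch and size + pair_size > max_bytes:
            yield [b"HSET", key, *batch]
            batch, size = [], 0
        batch += [field, value]
        size += pair_size
    if batch:
        yield [b"HSET", key, *batch]


# Three small pairs with a 4-byte budget end up in two commands.
commands = list(split_hash_into_hset(
    b"h", {b"a": b"1", b"b": b"2", b"c": b"3"}, max_bytes=4))
```

Every other data type (list, set, sorted set, stream) would need its own equivalent, which is the workload the paragraph above refers to.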

Furthermore, instances on old versions cannot parse RDB from new versions and cannot use the restore command for data migration (some special scenarios may require rolling back to a previous version by using data migration tools).

To solve the problems and to ensure that migration tools are not affected by changes in RDB format, we can add a new method for full synchronization: using the AOF file (where AOF specifically refers to a collection of commands, not using an RDB preamble). While full synchronization between master and replica still uses the RDB file, data migration tools could declare the file format they wish to use during full synchronization via the REPLCONF command. By doing so, data migration tools can simply forward commands without needing to parse RDB, allowing the full synchronization data to be directly passed through to the target instance, thus simplifying the adaptation work.
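To make the "simply forward commands" idea concrete, here is a minimal sketch of the only parsing a tool would still need: reading commands encoded as RESP arrays of bulk strings, which is how an AOF without an RDB preamble stores them. The function name `read_commands` is hypothetical, and the sketch assumes the stream contains nothing but such arrays. The exact stream format of the proposed feature is not specified in this issue.

```python
import io
from typing import BinaryIO, Iterator, List


def read_commands(stream: BinaryIO) -> Iterator[List[bytes]]:
    """Yield each command from a stream of RESP arrays of bulk strings."""
    while True:
        header = stream.readline()
        if not header:
            return  # end of stream
        if not header.startswith(b"*"):
            raise ValueError(f"expected array header, got {header!r}")
        argc = int(header[1:].strip())
        args: List[bytes] = []
        for _ in range(argc):
            length_line = stream.readline()
            if not length_line.startswith(b"$"):
                raise ValueError(f"expected bulk header, got {length_line!r}")
            length = int(length_line[1:].strip())
            data = stream.read(length + 2)  # payload plus trailing CRLF
            args.append(data[:length])
        yield args


# Two commands as they would appear in a commands-only AOF.
demo = io.BytesIO(
    b"*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$1\r\nv\r\n"
    b"*2\r\n$3\r\nDEL\r\n$1\r\nk\r\n"
)
commands = list(read_commands(demo))
```

Because the parser only understands command framing, it does not depend on how any data type is encoded in RDB.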

hpatro · 2024-03-30T00:25:04Z

@soloestoy I've not looked much into AOF, so my question might be very naive.

Wouldn't a user need to pay a certain performance penalty (varying with the appendfsync config level) to keep AOF enabled and store the collection of commands? With the RDB approach, libraries need to stay on top of updates to the RDB version, but there is no performance penalty for regular command execution.

soloestoy · 2024-04-01T03:53:49Z

@hpatro your question is very good. Perhaps I didn't express it clearly at the beginning, but I can explain it by answering your question.

The AOF file used during full synchronization does not rely on the appendonly config. It simply means that where we originally used an RDB file as the snapshot, we would now use a snapshot in AOF format instead. We still need to fork a child process, which converts all data into a set of write commands in AOF format. This way, the migration tool receives a set of commands during the full synchronization phase, eliminating the need to parse RDB encoding and achieving version independence.
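The conversion performed by the child process can be pictured with a small sketch. This is a toy model, not the server's code: the keyspace is a plain Python dict, only strings, lists and hashes are handled, TTLs are ignored, and the names `encode_command` and `snapshot_to_commands` are hypothetical.

```python
from typing import Dict, Iterator, List, Union

Value = Union[bytes, List[bytes], Dict[bytes, bytes]]


def encode_command(args: List[bytes]) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        out.append(b"$%d\r\n%s\r\n" % (len(arg), arg))
    return b"".join(out)


def snapshot_to_commands(keyspace: Dict[bytes, Value]) -> Iterator[List[bytes]]:
    """Yield one write command per key that recreates its value.

    Assumes collections are non-empty, as they are in the server.
    """
    for key, value in keyspace.items():
        if isinstance(value, bytes):
            yield [b"SET", key, value]
        elif isinstance(value, list):
            yield [b"RPUSH", key, *value]
        elif isinstance(value, dict):
            flat = [part for pair in value.items() for part in pair]
            yield [b"HSET", key, *flat]
        else:
            raise TypeError(f"unsupported value type for key {key!r}")


keyspace = {b"s": b"1", b"l": [b"a", b"b"], b"h": {b"f": b"v"}}
snapshot = b"".join(encode_command(c) for c in snapshot_to_commands(keyspace))
```

The resulting `snapshot` bytes are plain commands, so a receiver on any version that supports those commands could replay them.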

PingXie · 2024-04-01T04:28:17Z

@soloestoy, I think it is a great idea to use a no-preamble AOF and I will definitely consider using it for the atomic slot migration work (#23).

> such as the restore command's parameters not exceeding 500MB (due to proto-max-bulk-len limitation)

Curious. Do we use the RESP protocol on full sync today? Going through https://github.com/valkey-io/valkey/blob/unstable/src/rdb.c#L3031, my impression is that we don't. If so, are we still bound by proto-max-bulk-len?

soloestoy · 2024-04-01T05:29:50Z

Hi @PingXie, I mean that when migration tools parse RDB, they have two choices:

1. For every key, send the value's whole payload via the RESTORE command. The tool then doesn't need to decode the different RDB formats for each data type (hash/set/list etc.), but the payload may exceed proto-max-bulk-len.
2. Decode the RDB format of each type of key value and convert it into commands. This way is not limited by proto-max-bulk-len, but the workload is significant.
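The trade-off between the two choices can be sketched as a simple size check. The helper name `restore_command` is hypothetical, `dump_payload` is assumed to already be in the serialized format that RESTORE accepts, and `max_bulk_len` stands for the target's proto-max-bulk-len value.

```python
from typing import List, Optional


def restore_command(key: bytes, dump_payload: bytes, max_bulk_len: int,
                    ttl_ms: int = 0) -> Optional[List[bytes]]:
    """Build a RESTORE command, or return None if the payload is too large.

    None signals that the tool must fall back to decoding the value
    and replaying it as type-specific commands.
    """
    if len(dump_payload) > max_bulk_len:
        return None
    # A TTL of 0 means the key is restored without an expiry.
    return [b"RESTORE", key, str(ttl_ms).encode(), dump_payload, b"REPLACE"]


small = restore_command(b"k", b"abc", max_bulk_len=3)
too_big = restore_command(b"k", b"abcd", max_bulk_len=3)
```

A tool taking the first choice still needs the second as a fallback for oversized values, so it ends up carrying both code paths.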

PingXie · 2024-04-01T05:47:58Z

Got it. This is not about the full sync but the migration tool. That makes sense. I think another benefit of using no-preamble AOF in either the full sync or slot migration is that we can easily achieve the non-blocking behavior.

zuiderkwast · 2024-10-02T13:36:44Z

Some systems are stuck on Redis 6.2 because they need to support rolling downgrade. Would this feature allow a replica running Redis 6.2 to replicate from Valkey 8.x?

Why do people need rolling downgrade?

After a rolling upgrade, if anything is not perfect, such as CPU or memory usage or some bug, the customer wants to downgrade again and take some time to investigate the problem. This includes systems where Valkey cluster nodes are just some part of a bigger system where everything is upgraded together in the same rolling upgrade.

mattsta mentioned this issue Mar 28, 2024: Wishlist #17 (Open, 10 tasks)

zuiderkwast added the enhancement (New feature or request) label Apr 10, 2024

zuiderkwast added this to Valkey 8.0 Jun 17, 2024

zuiderkwast mentioned this issue Jun 30, 2024: Update release guidance for Valkey valkey-io/valkey-doc#94 (Merged)

PingXie mentioned this issue Jun 30, 2024: Dual channel replication #60 (Merged)

zuiderkwast added this to Roadmap Aug 29, 2024

zuiderkwast removed this from Valkey 8.0 Aug 29, 2024

zuiderkwast mentioned this issue Oct 2, 2024: [NEW] Rolling downgrade, forward compatibility #1108 (Open)

PingXie mentioned this issue Jan 8, 2025: [NEW] Atomic slot migration HLD #23 (Open)
