Clarification on aurora_enable_repl_bin_log_filtering in Aurora #1424
Comments
You're referencing an Aurora v2 instance. Bump yourself up to v3 and try again, and you'll likely run into the problems others are seeing.
For me, gh-ost fails on Aurora v3
Thanks @rbanks54 I just tested on 3.06 and 3.07 and the cutover was successful. Using default parameters apart from 3.06
3.07
Test:
sysbench command running in parallel:
Interesting! We'd tried it on a slightly earlier Aurora version and had problems. We bumped our minor version just the other day and hadn't tried since, but an attempt just now works. We need to do a little more investigation, as we may have made a mistake in our config that we didn't notice when making our earlier attempts.
Whelp... turns out we had a #facepalm level configuration issue. Thanks for posting this issue, as it caused us to go back and triple check things (and find our obvious-in-hindsight problem).
Thanks for double checking this! Based on testing this seems to confirm
Since the default for
Would you be able to share your "obvious-in-hindsight" config problem? We're also on a slightly older Aurora 3 version and are having issues with gh-ost, and we have aurora_binlog_replication_max_yield_seconds=0 already. Curious what you found.
Hi!
In the RDS documentation it recommends disabling aurora_enable_repl_bin_log_filtering when using GH-OST in Aurora MySQL. Looking through the history I can see a few references to this potentially causing issues with cutovers, so I wanted to ask whether there is a reproducible case that demonstrates this.

aurora_enable_repl_bin_log_filtering
Background
In Aurora, the writer instance in a cluster sends redo log records to the Aurora storage volume and to the reader nodes in the cluster. Since Aurora MySQL stores binary logs in the cluster volume, redo log records are also generated for binary logs under the hood. This is all transparent to users, and the binary logs are presented as they would be in community MySQL. When aurora_enable_repl_bin_log_filtering is enabled, redo logs for binary log records are still sent to Aurora storage, but they are not sent to Aurora readers within the same cluster. Without filtering, these redo logs are sent to reader instances and discarded, since binary logs are not accessible from readers. This can lead to unnecessary amplification of network traffic on the writer/reader, so the recommendation is to leave aurora_enable_repl_bin_log_filtering enabled. Binlog filtering is always enabled in Aurora MySQL version 3, so the parameter was removed in the Aurora 3 major version. This setting should not be confused with MySQL binlog replication filtering.

Since binary logs are not accessible on readers, and aurora_enable_repl_bin_log_filtering only affects internal transportation of redo, I strongly suspect this is not the real cause of the cutover issues seen.

Theory
What I suspect is the real cause is the aurora_binlog_replication_max_yield_seconds parameter.

aurora_binlog_replication_max_yield_seconds
Background
aurora_binlog_replication_max_yield_seconds was introduced along with aurora_binlog_read_buffer_size to improve the read performance of binary log consumer threads. The idea was to increase the I/O read size (aurora_binlog_read_buffer_size) of each binlog read request made by a consumer thread, improving the I/O efficiency and throughput of those threads. Instead of reading binary logs from Aurora storage in 8K chunks, you could configure the read buffer size of each binlog read request. The drawback was that reading in larger chunks could lead to more contention with foreground transactions. aurora_binlog_replication_max_yield_seconds was introduced to let users configure this tradeoff: instead of constantly contending with foreground transactions on the active binary log, binary log consumers could be configured to "back off" or "yield" for a number of seconds defined by aurora_binlog_replication_max_yield_seconds.

Note: In Aurora version 2.10 the binlog I/O cache was introduced (more information here), which removed the need for the above yield second configurations. The binary log I/O cache aims to minimize read I/O from the Aurora storage layer by keeping the most recent binary log change events in its circular cache on your writer DB instance. This enhancement is enabled by default, so it should not require any additional configuration.
from docs:
What I suspect here is that if aurora_binlog_replication_max_yield_seconds is configured to a non-zero value, the heartbeat lag in GH-OST cutovers will never stay near zero for long enough to allow the cutover to succeed, as the binlog consumer will read an event, sleep ("yield"), and repeat.

Test
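As a rough illustration of the theory, here is a toy model in Python. It is not Aurora's or gh-ost's actual implementation: it simply assumes a heartbeat row is written every 100 ms, that the binlog consumer reads everything available and then yields for the configured time, and that a cutover needs the observed heartbeat lag to be at or below 1500 ms. The function names and all three numbers are assumptions chosen for illustration.

```python
def simulate_heartbeat_lag(yield_ms, heartbeat_interval_ms=100,
                           duration_ms=300_000, sample_ms=100):
    """Toy model: return the observed heartbeat lag (ms) at each sample time.

    Assumption: the consumer reads everything written so far, then yields
    for `yield_ms` before reading again. Integer milliseconds are used to
    avoid floating-point drift.
    """
    lags = []
    last_seen_ms = 0   # timestamp of the newest heartbeat the consumer has read
    next_read_ms = 0   # earliest time the consumer is allowed to read again
    for now_ms in range(0, duration_ms + 1, sample_ms):
        if now_ms >= next_read_ms:
            # Newest heartbeat written at or before `now_ms`.
            last_seen_ms = (now_ms // heartbeat_interval_ms) * heartbeat_interval_ms
            next_read_ms = now_ms + yield_ms
        lags.append(now_ms - last_seen_ms)
    return lags


def fraction_at_or_below(lags, threshold_ms):
    """Share of samples whose lag is at or below the (assumed) cutover threshold."""
    return sum(1 for lag in lags if lag <= threshold_ms) / len(lags)


if __name__ == "__main__":
    for yield_seconds in (0, 60):
        lags = simulate_heartbeat_lag(yield_seconds * 1000)
        share = fraction_at_or_below(lags, threshold_ms=1500)
        print(f"yield={yield_seconds}s max_lag={max(lags)}ms "
              f"share_of_time_cutover_possible={share:.3f}")
```

In this model a yield of zero keeps the lag at zero, while a 60 second yield leaves the lag under the assumed threshold for only a small fraction of the time, which is consistent with a cutover that keeps getting postponed.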
I did some basic testing on my side to validate this and it seems to be the case, but would love to hear from others. With yield seconds set to 60, my heartbeatLag does not decrease:

Shortly after I set aurora_binlog_replication_max_yield_seconds back to zero in my parameter group, the cutover succeeded:

Would love to hear what you think.
Thanks,
Marc
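For anyone wanting to repeat the test, resetting the parameter could be scripted roughly as below. This is a sketch under assumptions: it assumes the parameter lives in a DB cluster parameter group and can be applied immediately, the helper names are hypothetical, and the parameter group name is a placeholder. The only AWS call used is the boto3 RDS client's modify_db_cluster_parameter_group.

```python
PARAM_NAME = "aurora_binlog_replication_max_yield_seconds"


def build_yield_reset(apply_method="immediate"):
    """Build the Parameters payload that sets the yield seconds back to zero.

    Assumption: the parameter is dynamic, so "immediate" is accepted;
    use "pending-reboot" if your engine version treats it as static.
    """
    return [{
        "ParameterName": PARAM_NAME,
        "ParameterValue": "0",
        "ApplyMethod": apply_method,
    }]


def reset_yield_seconds(rds_client, parameter_group_name):
    """Apply the reset to a cluster parameter group.

    `rds_client` is expected to behave like boto3.client("rds").
    """
    return rds_client.modify_db_cluster_parameter_group(
        DBClusterParameterGroupName=parameter_group_name,
        Parameters=build_yield_reset(),
    )
```

With boto3 installed and credentials configured, usage would look like `reset_yield_seconds(boto3.client("rds"), "my-cluster-params")`, where "my-cluster-params" is a placeholder for your own parameter group.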