-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Online DDL via VReplication #7419
Online DDL via VReplication #7419
Conversation
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…parser Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…t on this branch's radar Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…ap query Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…writes to table, waiting for updated pos, renaming tables, releasing table, releasing locks Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really great :-)
Left some comments/queries: in general the functionality looks good.
"strategy": sqltypes.StringBindVariable(string(schema.DDLStrategyPTOSC)), | ||
// isVReplMigrationReadyToCutOver sees if the vreplication migration has completed the row copy | ||
// and is up to date with the binlogs. | ||
func (e *Executor) isVReplMigrationReadyToCutOver(ctx context.Context, s *VReplStream) (isReady bool, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For low write qps it is possible that there are no binlog events for a table for a long time. If I read this logic correctly, in such cases the Online DDL workflow may not cut over automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Isn't there a heartbeat mechanism for vreplication though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
time_updated
is updated by the heartbeat. However transaction_timestamp
is only when a new binlog event is seen, not for a heartbeat. This is quite rare since it means there is no writes on the source at all. But it has been reported by some users. Also if a database is taken "offline" for some reason (in the sense that no app is writing to it) and the user starts a workflow this can happen.
Practically we may not need to address this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed evaluation of transaction-timestamp. I want to be on the safer side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timeUpdated
will always be current because of the heartbeat so the moment the copy state is empty it will say that the migrate workflow is ready for cutover. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I missing something?
Ermm... Not sure why you're asking? Am I missing something? 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To elaborate a bit, though I'm still unsure what you meant:
timeUpdated will always be current
Unless there's some lag, or vreplication is stopped
so the moment the copy state is empty
We also validate that pos
in non-empty. Were you suggesting the migration might cut-over before starting copying rows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having discussed this: meanwhile checking transaction_timestamp
is the safer approach. time_updated
gets update regardless of incoming events.
So the risk of transaction_timestamp
is in completely stale servers. However, it's enough that some table is updated and the problem goes away. But if the server is completely stale, cut-over will not happen.
To solve that we can either:
- force injection of some dummy change, or
- compare GTID pos (likely not a good idea, because
_vt
tables are getting changed, affecting GTID, but scrubbed in the streamer) - modify
streamer
to include more information that we can use on target side.
// isVReplMigrationReadyToCutOver sees if the vreplication migration has completed the row copy | ||
// and is up to date with the binlogs. | ||
func (e *Executor) isVReplMigrationReadyToCutOver(ctx context.Context, s *VReplStream) (isReady bool, err error) { | ||
// Check all the cases where migration is still running: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When an online ddl completes, vttablets and vtgates will need to reload their schema as per the current mechanisms, before they see the changes made by the ddl correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I issue a ReloadSchema
before running the migration. Should I run ReloadSchema
immediately after completing the migration, or should I actually do that during the cut-over (when tables have been switched, and writes are still blocked)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing it when writes are blocked might be better since binlog events after the write will then see the new schema. Otherwise we may have a race: new events for the altered table may be processed before the new schema is reloaded.
But the ReloadSchema
is just for this tablet, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if err := tmClient.ReloadSchema(ctx, tablet.Tablet, ""); err != nil {
return err
}
correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just clarifying, that this goes through gRPC; so I want to make sure this isn't an expensive event, because meanwhile we're blocking (rather -- rejecting) writes to the table, so it's a critical point in time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define expensive...: ReloadSchema can take a while since it parses the entire db schema. It has been speeded up recently but for a large number of tables it will probably take a non-trivial amount of time. I would think, still in the order of milliseconds, but not entirely sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-trivial amount of time.
milliseconds is just fine. Would it take more than 1 second (I'd define "expensive" at 1 second) ? e.g. on a schema with 10,000 tables? Is it possible to ReloadSchema
for just one table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(having discussed this in sync with @rohit-nayak-ps) reloading schema can take a substantial time if many thousands of tables are involved. For now, we issue an async ReloadSchema (with goroutine). In the future, we will look into reloading a single table.
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…tion-online-ddl Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
I have merged #7492 here. This adds a new |
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
7e2a074 introduces dummy statement injection for when we run vreplication based online DDL, and the server is utterly stale. In this situation, The solution is hacky: if we notice |
@rohit-nayak-ps actually, re: discussion about cut-over and stale server -- the tests disagree. The tests are happy to cut-over when the copy is complete, even when there's no traffic on the table. Did some local tests, inserted two rows on a table, then slept, then migrated -- cut-over took place. |
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
I notice that it is impossible to |
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@sougou confirmed changes to I've reverted 7e2a074 ; there is no need to inject dummy transaction. The existing mechanism works even on stale servers. |
Looking at This PR, meanwhile, is good to review, notwithstanding the fact that |
I should clarify. Adding a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Ready for review
This PR implements a "forward" alter table via VReplication.
General design described in #6926 (comment), which this PR implements.
Notable bullets about the changes in this PR:
The main logic of this PR is in
go/vt/vttablet/onlineddl/vrepl.go
andgo/vt/vttablet/onlineddl/executor.go
go/vt/vttablet/onlineddl/executor.go
can now invoke, cancel and terminate VReplication based migrations. Some of this management is symmetric to how we invoke, cancel and terminategh-ost
orpt-osc
, but there are also significant differences:gh-ost
andpt-osc
can only work within the context of this tablet (invoked as an OSExec
), VReplication migrations can execute by one tablet and continued on another, even in face of failover.gh-ost
andpt-osc
have self-logic to run the migration. VReplication needs to be told what exactly it should run. The tablet knows about aalter table
statement, and needs to create a matching rule/filter query for VReplication.gh-ost
andpt-osc
have self-logic to cut-over. Vreplication does not.Executor
identifies the occasion for cut-over and implements the cut-over logic, which entails stopping writes to original tables, waiting for pos, stopping VReplication, replacing tables.Imported a bunch of code from https://github.com/github/gh-ost. These are found under go/vt/vttablet/onlineddl/vrepl. The kind of functionality in those files:
We don't strictly need all the above as-is. We can replace parsing with vitess's parsing. We don't need to track datatypes because vreplication does. But I copied&pasted those files because they are mature and stable and get the work done.
I intend to refactor them later on and remove redundant/unrelated code, or replace with existing Vitess functionality. But I'd like to do so on a separate PR so as to focus on the main overall functionality here.
Split onlineddl/endtoend tests into
endtoend_ghost
andendtoend_vrepl
. They are mostly similar but with some changes (like how do you throttle a migration). The tests are OK-ish but need to be more elaborate. I again wish to follow up in a future PR as I am yet to design a satisfactoryendtoend
test for schema migrations and failure scenarios.Due to some earlier attempt to use
wrangler
insidetabletserver
, I created a newinterface
calledvexec.Executor
, and refactored ddlExecutor to bevexec.Executor
instead ofonlineddl.Executor
. I ended up not usingwrangler
insidetabletserver
but I liked the refactor and kept it.initial PR comment when this was Work In Progress:
Related Issue(s)
Checklist
Deployment Notes
Impacted Areas in Vitess
Components that this PR will affect: