-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
catalog: add support for moving reader catalogs timestamp forward for PCR #130088
Conversation
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
d105b00
to
3573eea
Compare
3573eea
to
9b19d6a
Compare
14a6f5f
to
7ee3333
Compare
7ee3333
to
f90873a
Compare
@msbutler RFAL! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! mostly left nits
return err | ||
} else { | ||
err = nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: just realized the comment below is out of date
if tbl.GetExternal() == nil { | ||
return nil, errors.AssertionFailedf("cannot advanced timestamp on a real table.") | ||
} | ||
tbl.GetExternal().AsOf = asOf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if you instead created a BumpExternalAsOf()
api off of mutableTable
that contains the logic of advanceTimestampForTable
? The line tbl.GetExternal().AsOf = asOf
is a bit odd.
} else { | ||
// TODO(fqazi): For non-single version jumps on replicated descriptors | ||
// we need to either bring the descriptor offline or have a mechanism | ||
// to force the leasing subsystem to avoid "mixing" timestamps.g |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oof, what are the consequences for the reader?
@@ -178,6 +120,10 @@ func SetupOrAdvanceStandbyReaderCatalog( | |||
return nil | |||
} | |||
namespaceUpdated.Add(e.GetID()) | |||
// Do not upsert entries if one already exists. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the namespaceUpdated
is no longer used?
destRunner := sqlutils.MakeSQLRunner(destConn) | ||
|
||
compareConn := func(query string) { | ||
compareConnAt := func(query string, now hlc.Timestamp, isEqual bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: i think it would be clearer if the underlying func never accepted a timestamp and always used the captured now
var. compareConnAt
could instead be check
and compareConn
could be checkEqual
or something.
Now before any cluster of "checks" you always set now
to the desired timestamp.
pkg/sql/catalog/descs/collection.go
Outdated
@@ -282,6 +294,15 @@ func (tc *Collection) WriteDescToBatch( | |||
return errors.AssertionFailedf("cannot write descriptor with an empty ID: %v", desc) | |||
} | |||
desc.MaybeIncrementVersion() | |||
// If this is not an initial version, then replication | |||
// versions of descriptors cannot be mutated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is desc.GetVersion() != 1
part of this check?
@@ -176,6 +176,14 @@ func TestReaderCatalog(t *testing.T) { | |||
compareConn("SELECT * FROM t1 ORDER BY n") | |||
compareConn("SELECT * FROM v1 ORDER BY 1") | |||
compareConn("SELECT * FROM t2 ORDER BY j") | |||
|
|||
// Validate that schema changes are blocked. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! could you add one test case related to modifying an existing descriptor in the reader tenant?
@@ -57,6 +60,15 @@ func (p *planner) SchemaChange(ctx context.Context, stmt tree.Statement) (planNo | |||
return nil, err | |||
} | |||
mode := p.extendedEvalCtx.SchemaChangerState.mode | |||
// Lease the system database to see if schema changes are blocked on reader | |||
// catalogs. | |||
systemDB, err := p.Descriptors().ByIDWithLeased(p.txn).Get().Database(ctx, keys.SystemDatabaseID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hrm, could this add overhead to every schema change?
pkg/sql/create_table.go
Outdated
@@ -246,6 +247,16 @@ func hasPrimaryKeySerialType(params runParams, colDef *tree.ColumnTableDef) (boo | |||
} | |||
|
|||
func (n *createTableNode) startExec(params runParams) error { | |||
// Lease the system database to see if schema changes are blocked on reader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you instead just lease the parent database? but maybe it doesn't matter. Same q for create_view
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why createTable
and createView
are special cased, but i trust you 😁
// Next ID: 62 | ||
// Replicated ID tracks the original version from the source tenant | ||
// that this descriptor was created from. | ||
optional uint32 replicated_version = 63 [(gogoproto.nullable) = false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I wonder if the name of this field should be ReplicatedPCRVersion
, which makes it more clear that this field is very specific to PCR.
Also the name in the comment is out of date.
f90873a
to
dbe66cc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @msbutler and @stevendanna)
pkg/sql/create_table.go
line 250 at r2 (raw file):
could you instead just lease the parent database? but maybe it doesn't matter. Same q for
create_view
Fair, we can just use the parent database in these code paths.
createTable / createView don't get some help from the optimizer since we support CREATE AS / CREATE VIEW, which help resolve things. Normally schema changes will take the AST and make opaque plan nodes, outside of those exceptions.
pkg/sql/schema_change_plan_node.go
line 65 at r2 (raw file):
Previously, msbutler (Michael Butler) wrote…
hrm, could this add overhead to every schema change?
So, the system database will eventually just leased to wait for one version (there probably are a pile of other things that will also read system tables and cause the same thing). So, this just shifts doing it a bit earlier.
pkg/sql/catalog/descs/collection.go
line 298 at r1 (raw file):
Previously, msbutler (Michael Butler) wrote…
why is
desc.GetVersion() != 1
part of this check?
Good point, this should no longer be needed (previously, I didn't have the bypass for the collection).
Done.
pkg/sql/catalog/replication/reader_catalog.go
line 107 at r1 (raw file):
Previously, msbutler (Michael Butler) wrote…
oof, what are the consequences for the reader?
You might be mixing descriptors that were in a set of schema changes, but have vastly different version.
For example:
A user decides to drop a column on a table, which will also clean up a view. Its possible I think that you end up in a scenario where you think the view exists, but get a the table without a column referenced by the view.
I don't think there is anything to prevent that from happening. Also I think in general are we worried about people mixing different timestamps on their descriptors?
pkg/sql/catalog/replication/reader_catalog.go
line 123 at r1 (raw file):
Previously, msbutler (Michael Butler) wrote…
i think the
namespaceUpdated
is no longer used?
Done.
pkg/sql/catalog/replication/reader_catalog.go
line 171 at r1 (raw file):
Previously, msbutler (Michael Butler) wrote…
what if you instead created a
BumpExternalAsOf()
api off ofmutableTable
that contains the logic ofadvanceTimestampForTable
? The linetbl.GetExternal().AsOf = asOf
is a bit odd.
Done.
pkg/sql/catalog/replication/reader_catalog_test.go
line 180 at r2 (raw file):
Previously, msbutler (Michael Butler) wrote…
nice! could you add one test case related to modifying an existing descriptor in the reader tenant?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @msbutler and @stevendanna)
pkg/sql/catalog/replication/reader_catalog.go
line 107 at r1 (raw file):
Previously, fqazi (Faizan Qazi) wrote…
You might be mixing descriptors that were in a set of schema changes, but have vastly different version.
For example:
A user decides to drop a column on a table, which will also clean up a view. Its possible I think that you end up in a scenario where you think the view exists, but get a the table without a column referenced by the view.I don't think there is anything to prevent that from happening. Also I think in general are we worried about people mixing different timestamps on their descriptors?
Actually, never mind the modification time check inside leasing should protect us from this scenario, so this should all work. Let me drop this comment
dbe66cc
to
e5aa8e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @fqazi and @stevendanna)
pkg/sql/schema_change_plan_node.go
line 65 at r2 (raw file):
Previously, fqazi (Faizan Qazi) wrote…
So, the system database will eventually just leased to wait for one version (there probably are a pile of other things that will also read system tables and cause the same thing). So, this just shifts doing it a bit earlier.
sg.
pkg/sql/catalog/replication/reader_catalog.go
line 107 at r1 (raw file):
Previously, fqazi (Faizan Qazi) wrote…
Actually, never mind the modification time check should protect us, so this should all work. Let me drop this comment
Not sure what you mean by "the modification time check". will hopefully understand with the comment.
@fqazi LGTM modulo the rebase is clean. Also one other thing to consider, that may not have seen: #130088 (comment) |
@fqazi one thing i'm curious about: it's interesting that the approach you take to prevent schema changes in the reader tenant touches very little code that @rafiss is working through in #130697. Why is that? Perhaps because Rafi's change needs to filter by schema change, which i guess is only accessible in the schema change code, whereas your change applies a blanket lock. |
Yeah, so this is a lower level code path thats lets us block all DDL operations. The schema_locked and other variants try to allow a subset of schema changes that are not backfilled, so they take a blacklist approach. Where specific code paths have checks and others are ignored. @msbutler |
72caa15
to
b562e75
Compare
To help support PCR, we should prevent schema changes on replicated descriptors. This will prevent the reader catalog from being incorrectly modified. To address this, this patch will add a ReplicatedPCRVersion field to all descriptors and use that to block schema changes. Additionally, this field will be used in the future to allow descriptors to be updated to later timestamps. Release note: None Fixes: cockroachdb#129664
Previously, schema changes were allowed on the reader catalog created by physical cluster replication. This could be problematic because users could introuduce conflicts that could prevent this. To adress this, this patch forces the reader catalogs system database to have the versions filed set, which can be used to disable schema change operations. Release note: None
b562e75
to
a4925ca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @msbutler and @stevendanna)
pkg/sql/schema_change_plan_node.go
line 65 at r2 (raw file):
Previously, msbutler (Michael Butler) wrote…
sg.
Done.
pkg/sql/catalog/replication/reader_catalog.go
line 107 at r1 (raw file):
Previously, msbutler (Michael Butler) wrote…
Not sure what you mean by "the modification time check". will hopefully understand with the comment.
I added this comment, there is still a hazard from this:
// Its possible that multiple descriptor versions could have gone by on
// the from cluster, which could mean fairly large schema changes have
// occured. This should be fine, because the leasing subsystem will look
// at the modification time of a descriptor to determine which version should
// be picked up by a txn. Since, we are publishing all descriptors in a single txn,
// this would ensure that we don't end up using a mix of "new" descriptors in oldatxns,
// since read timestamp will be below the modification time. The only potential hazard
// that exists here is that newer txns could pick up older descriptors if the range feed
// is behind.
// TODO(fqazi): We should either change how big version jumps are handled here
// or adjust leasing to publish in batches (latter would be more usable).
I'll get an issue open and adjust the leasing code. I think we can do a fairly conservative change to eliminate the hazard and also make it more usable in other scenarios
@msbutler TFTR! bors r+ |
This patch adds logic for extracting and creating a PCR reader catalog. This patch will do the following:
Fixes: #129664