Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the whole TTableDescription to describe an indexImplTable in TIndexDescription #6280

Merged
merged 6 commits into from
Jul 5, 2024

Conversation

jepett0
Copy link
Collaborator

@jepett0 jepett0 commented Jul 4, 2024

It would be better (more future proof) to keep all the necessary settings of the indexImplTable in one message. I would not like to reinvent the bicycle, so the best option I see here is to reuse TTableDescription.

Additionally, I have added options fillConfig and fillBoundaries to control what parts of the whole TTableDescription would be return on DescribeTableIndex request. It should prevent the unnecessary use of CPU for parsing big protobufs (TableBoundaries can be big). We would need to enable fillBoundaries explicitly for backups, but this would be done in scope of #4955.

And the last change is to make IndexImplTableDescription field a repeated field. The motivation is to prepare flat_scheme_op.proto for multiple indexImplTables which are said to be coming soon to implement vector indices. It is a breaking change, but there are no instances of YDB that have been using the new TIndexDescriptions with PartitionConfig and PartitionBoundaries, so it should be ok.

@jepett0 jepett0 requested review from azevaykin and ijon July 4, 2024 09:05
@@ -92,7 +92,14 @@ class TDescribeTableRPC : public TRpcSchemeRequestActor<TDescribeTableRPC, TEvDe
return Reply(Ydb::StatusIds::INTERNAL_ERROR, ctx);
}

FillIndexDescription(describeTableResult, tableDescription, splitKeyType);
try {
FillIndexDescription(describeTableResult, tableDescription);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FillIndexDescription now calls FillColumnDescription which might throw

) {
*entry.MutableExplicitPartitions()->MutableSplitBoundary() = explicitPartitions;
}
*entry.AddIndexImplTableDescription() = indexImplTable->TableDescription;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entire TableDescription is copied here in the TIndexDescription. However, it does not contain a lot of info. Mainly, it does not contain TTablePartition which is heavy for multishard tables.

Screenshot 2024-07-04 at 12 12 30

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why there is no column info in the IndexImplTableDescription... I will solve this problem in scope of #4955

const NKikimrMiniKQL::TType& splitKeyType
) {
for (const auto& boundary : boundaries) {
const NKikimrSchemeOp::TTableDescription& in, const NKikimrMiniKQL::TType& splitKeyType) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes restore these lines to the original state in which they were before the #5119

if (indexImplTableDescription.HasUniformPartitionsCount()) {
settings.set_uniform_partitions(indexImplTableDescription.GetUniformPartitionsCount());
}
if (indexImplTableDescription.SplitBoundarySize()) {
Copy link
Collaborator Author

@jepett0 jepett0 Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could ask yourself: "What if BOTH UniformPartitionsCount and SplitBoundary are specified?" My answer is: SplitBoundary has higher priority. This method (FillIndexDescription) prepares a description of an index. It does not create the index! So SplitBoundary is more important in this scenario.

The other option I see here is to throw an error. (FillIndexDescription already throws if FillColumnDescription cannot decide how to fill the columns.)

Copy link
Collaborator

@MBkkt MBkkt Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other option I see here is to throw an error. (FillIndexDescription already throws if FillColumnDescription cannot decide how to fill the columns.)

Exception sounds better for me, because

  1. looks like error case
  2. can be changed to something else in future
  3. I don't think having two different behavior in such case is an good idea

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked how DescribeTable deals with this problem and found out that it just ignores UniformPartitionsCount, i.e. it does not even check if it exists in the TTableDescription. See the code around here. The motivation is: there is no such thing as UniformPartitionsCount for an existing table. If a table already exists, then it has explicit partition boundaries which are represented by SplitBoundary field of the TTableDescription protobuf.

@jepett0 jepett0 requested a review from MBkkt July 4, 2024 09:23
TExplicitPartitions ExplicitPartitions = 11;
}
optional TPartitioningPolicy PartitioningPolicy = 12;
repeated TTableDescription IndexImplTableDescription = 10;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the Language Guide (proto 3) , repeated fields are used in the plural.

Suggested change
repeated TTableDescription IndexImplTableDescription = 10;
repeated TTableDescription IndexImplTableDescriptions = 10;

Copy link
Collaborator Author

@jepett0 jepett0 Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I agree. More on the issue:

Use pluralized names for repeated fields.

Repeated fields must use a plural field name.

  • statistically speaking, only 12 out of 94 repeated protobuf field names are singular in flat_scheme_op.proto

@ydb-platform ydb-platform deleted a comment from github-actions bot Jul 4, 2024

This comment was marked as outdated.

@ydb-platform ydb-platform deleted a comment from github-actions bot Jul 4, 2024
@ydb-platform ydb-platform deleted a comment from github-actions bot Jul 4, 2024

This comment was marked as outdated.

This comment was marked as outdated.

azevaykin
azevaykin previously approved these changes Jul 4, 2024
@@ -1000,15 +996,10 @@ message TIndexDescription {
optional uint64 PathOwnerId = 7;

repeated string DataColumnNames = 8;

// DataSize + IndexSize of indexImplTable
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// DataSize + IndexSize of indexImplTable
// DataSize + IndexSize

What is IndexSize? Is it size of index or size of all (now single) index impl tables?

Copy link
Collaborator Author

@jepett0 jepett0 Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IndexSize is the size of the index (the internal index that implements the primary key logic) of the indexImplTable. It follows the general DataSize + IndexSize = TotalSize convention for tables.

Copy link
Collaborator Author

@jepett0 jepett0 Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would much rather leave this comment for you to deal with in scope of allowing multiple indexImplTables 😜 My idea for this PR is to create a hotfix before it is too late, not to correct every assumption in the code that the indexImplTable is single

ydb/core/tx/datashard/export_common.cpp Outdated Show resolved Hide resolved
if (indexImplTableDescription.HasUniformPartitionsCount()) {
settings.set_uniform_partitions(indexImplTableDescription.GetUniformPartitionsCount());
}
if (indexImplTableDescription.SplitBoundarySize()) {
Copy link
Collaborator

@MBkkt MBkkt Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other option I see here is to throw an error. (FillIndexDescription already throws if FillColumnDescription cannot decide how to fill the columns.)

Exception sounds better for me, because

  1. looks like error case
  2. can be changed to something else in future
  3. I don't think having two different behavior in such case is an good idea

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

@jepett0 jepett0 requested review from MBkkt and azevaykin July 4, 2024 15:10
@jepett0 jepett0 marked this pull request as ready for review July 4, 2024 15:39
void DescribeTableIndex(const TPathId& pathId, const TString& name, NKikimrSchemeOp::TIndexDescription& entry);
void DescribeTableIndex(const TPathId& pathId, const TString& name, TTableIndexInfo::TPtr indexInfo, NKikimrSchemeOp::TIndexDescription& entry);
void DescribeTableIndex(const TPathId& pathId, const TString& name,
bool fillConfig, bool fillBoundaries, NKikimrSchemeOp::TIndexDescription& entry
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added explicit fillConfig and fillBoundaries options to be able to control the behavior of DescribeTableIndex. Moreover, these options are supported by DescribeTable and it would be inconsistent to ignore them in DescribeTableIndex

void FillPartitionConfig(const NKikimrSchemeOp::TPartitionConfig& in, NKikimrSchemeOp::TPartitionConfig& out) {
out.CopyFrom(in);
NKikimr::NSchemeShard::TPartitionConfigMerger::DeduplicateColumnFamiliesById(out);
out.MutableStorageRooms()->Clear();
Copy link
Collaborator Author

@jepett0 jepett0 Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that PartitionConfig is not just copied, but some unnecessary info is erased before giving it out to users. I have reused this logic for both DescribeTable and DescribeTableIndex.

azevaykin
azevaykin previously approved these changes Jul 5, 2024
MBkkt
MBkkt previously approved these changes Jul 5, 2024
@jepett0 jepett0 changed the title Prepare flat_scheme_op.proto for multiple indexImplTables Use the whole TTableDescription to describe indexImplTable in TIndexDescription Jul 5, 2024
@jepett0 jepett0 changed the title Use the whole TTableDescription to describe indexImplTable in TIndexDescription Use the whole TTableDescription to describe an indexImplTable in TIndexDescription Jul 5, 2024
Copy link

github-actions bot commented Jul 5, 2024

2024-07-05 11:09:34 UTC Pre-commit check for ee33351 has started.
2024-07-05 11:12:39 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-07-05 11:59:32 UTC Build successful.
2024-07-05 11:59:59 UTC Tests are running...
🔴 2024-07-05 13:45:25 UTC Some tests failed, follow the links below.

Test history | Test log

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
39750 34516 0 3 5221 10

🟢 2024-07-05 13:46:06 UTC ydbd size 8.1 GiB changed* by -120 Bytes, which is <= 0 Bytes vs main: OK

ydbd size dash main: 5fc09ca merge: ee33351 diff diff %
ydbd size 8 726 129 816 Bytes 8 726 129 696 Bytes -120 Bytes -0.000%
ydbd stripped size 477 144 232 Bytes 477 142 216 Bytes -2.0 KiB -0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@jepett0 jepett0 requested review from MBkkt and azevaykin July 5, 2024 11:42
@jepett0 jepett0 enabled auto-merge (squash) July 5, 2024 12:03
@jepett0 jepett0 disabled auto-merge July 5, 2024 12:52
@jepett0 jepett0 merged commit fb31079 into ydb-platform:main Jul 5, 2024
8 of 12 checks passed
@jepett0 jepett0 self-assigned this Jul 8, 2024
jepett0 added a commit to jepett0/ydb that referenced this pull request Jul 9, 2024
…exDescription (ydb-platform#6280)

The purpose of the change is to prepare for multiple indexImplTables for vector indices. Other changes are more or less cosmetic.
jepett0 added a commit that referenced this pull request Jul 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants