Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add catalogVersion into QueryInputMetadata & QueryOutputMetadata #17497

Conversation

findinpath
Copy link
Contributor

@findinpath findinpath commented May 14, 2023

Description

As a follow-up on @dain 's work for adding version to the CatalogHandle, this PR adds catalogVersion to the inputs and output fields from QueryIOMetadata to eventually distinguish the catalog version from the output sent to the event listener for the tables involved in Trino queries.

Additional context and related issues

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label May 14, 2023
@findinpath findinpath force-pushed the findinpath/catalog-version-in-inputs-and-output branch from cb7422e to 83c74d5 Compare May 14, 2023 17:10
@findinpath
Copy link
Contributor Author

findinpath commented May 15, 2023

CI failure #17500 #16277

Copy link
Contributor

@erichwang erichwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this looks good, but would it be possible to use CatalogVersion instead of String in the Input/Output analysis parts. It's fine to keep it as String in the eventlistener parts.

@@ -32,6 +32,7 @@
public final class Input
{
private final String catalogName;
private final String catalogVersion;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anyway we can use the Object form of CatalogVersion here instead of string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the type of the field to Object instead of String / CatalogHandle.CatalogVersion.
The only constraint is now on the constructor to have CatalogHandle.CatalogVersion as a parameter so that it knows during the deserialization that the value

"catalogVersion": "default"

is to be converted back to a CatalogHandle.CatalogVersion instead of String like it would normally happen without this extra information.

@findinpath findinpath force-pushed the findinpath/catalog-version-in-inputs-and-output branch from 83c74d5 to 96a74fc Compare May 16, 2023 19:43
@findinpath
Copy link
Contributor Author

We could add a test on io.trino.execution.TestEventListenerBasic#testPrepareAndExecute for retrieving the catalogVersion from the inputs. However, the catalog version for this test is computed by the FileCatalogStore based on a hashing algorithm that may eventually cause changes in the test throughout the time.

/**
* This is not a generic, universal, or stable version computation, and can and will change from version to version without warning.
* For places that need a long term stable version, do not use this code.
*/
static CatalogVersion computeCatalogVersion(String catalogName, ConnectorName connectorName, Map<String, String> properties)
{
Hasher hasher = Hashing.sha256().newHasher();
hasher.putUnencodedChars("catalog-hash");
hashLengthPrefixedString(hasher, catalogName);
hashLengthPrefixedString(hasher, connectorName.toString());
hasher.putInt(properties.size());
ImmutableSortedMap.copyOf(properties).forEach((key, value) -> {
hashLengthPrefixedString(hasher, key);
hashLengthPrefixedString(hasher, value);
});
return new CatalogVersion(hasher.hash().toString());
}

The current hashed value of the input is a54a4dabd3c2c54b5c98ab9451385fae3f0487a110f722aa3d387e502f139234

@findinpath findinpath requested a review from erichwang May 16, 2023 20:39
@findinpath findinpath force-pushed the findinpath/catalog-version-in-inputs-and-output branch from 96a74fc to bdadd6f Compare June 12, 2023 04:45
@findinpath findinpath self-assigned this Jun 12, 2023
@erichwang
Copy link
Contributor

@findinpath, sorry I think I probably wasn't clear enough last time. What I meant was whether we could use the proper CatalogVersion as the type everywhere if possible. If not for some reason, String is an ok alternative for event logging, but it's always better to use the full type if possible.

@findinpath findinpath force-pushed the findinpath/catalog-version-in-inputs-and-output branch 6 times, most recently from 0adf834 to 5f75319 Compare June 12, 2023 19:03
@findinpath findinpath requested a review from erichwang June 12, 2023 19:04
@findinpath findinpath force-pushed the findinpath/catalog-version-in-inputs-and-output branch from 5f75319 to ada035f Compare June 12, 2023 19:05
Copy link
Contributor

@erichwang erichwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the tests pass, this looks good to me.

@electrum can you get this merged when ready?

@erichwang
Copy link
Contributor

@electrum ping

@electrum electrum merged commit c43fe9c into trinodb:master Jun 16, 2023
@github-actions github-actions bot added this to the 420 milestone Jun 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants